Synchronization regression in 1.6.0_01-b06
There appears to be a regression in 1.6.0_01-b06 vs. 1.5.0_09-b03: modifications made to an instance field while in a synchronized block are not visible in a subsequent synchronized block. I.e. register or processor memory caches are not being flushed and/or reloaded.
A hacky workaround can be had if you're able to know when the read field value is invalid (stale) and loop re-reading the value until it becomes valid. (See the "HACK WORKAROUND" comment inThreadPool.WorkerThread.run() below.)
This seems like a serious problem. The failure is reliable but rare. (Occurs within ~ 7 to 15M iterations.)
Is anyone else seeing this? Can anyone try this on processors other than Intel & AMD?
The following multi-threaded synchronization test fails under 1.6.0_01-b06 but works on 1.5.0_09-b03. This is true on both Windows XP and Linux. On both client and server VM.
Windows XP is [Version 5.1.2600]
Linux is [Linux fsm.thetuscorp.com 2.6.18-8.1.1.el5xen #1 SMP Mon Feb 26 20:51:53 EST 2007 x86_64 x86_64 x86_64 GNU/Linux]
I've only tried Intel processors.
TEST OVERVIEW:
The ThreadPoolHang class creates and starts 15 BugWorkerThread's. It then reads user commands from stdin.
Each BugWorkerThread instances a Runnable class, schedules that to run through a custom ThreadPool class, and then waits to be notified by that Runnable. All the Runnable does is notify it's waiter. Each BugWorkerThread repeats these actions until told to stop by the main test.
The ThreadPool reuses idle WorkerThread's, or creates new a new WorkerThread if none are idle.
A WorkerThread executes its assigned Runnable (m_task), then adds itself on the ThreadPool's idle WorkerThread list (m_workerThreadCache) and waits to be notified of its next task.
The problem is that sometimes an idle WorkerThread awakes to discover its assigned Runnable (m_task) is null.
The problem appears to be that modifications made to an instance field (m_task) while in a synchronized block are not visible in a subsequent synchronized block. I.e. register or processor memory caches are not being flushed and/or reloaded.
HOW TO RUN TEST:
Compile under JDK 1.6.0_01-b06.
Run:java -cp classes ThreadPoolHang
It prompts the user with:
Commands:
status - current status of each thread
stop - drain & stop test (revealing hung threads)
"status" will print the current loop count of each BugWorkerThread.
"stop" attempts to drain all the BugWorkerThread's, causing them to print their current loop count. It then waits 5 seconds and reports any BugWorkerThread's that did not drain along with the loop count at which they hung. The current version of the test has a workaround which prevents any thread hanging.
Allow it to run for awhile. (For me it takes from 2 - 40 minutes to see the first failure.) When a problem occurs, a line like the following is printed:
Global:WT:work:11 run(): NULL m_task. Fixed after 1 sleepy re-reads.
This indicates that a WorkerThread found a null Runnable (m_task) and then invoked the workaround.
If desired, wait for more error lines to print. At some point, issue the "stop" command to stop the test and announce any hung threads.
EXAMPLE OUTPUT:
A SUCCESSFUL run on 1.5.0_09-b03 (Windows XP):
C:\Dave\Projects\ThreadPoolHang>ver
Microsoft Windows XP [Version 5.1.2600]
C:\Dave\Projects\ThreadPoolHang>java -showversion -cp classes1.5 Thread
java version "1.5.0_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)
Java HotSpot(TM) Client VM (build 1.5.0_09-b03, mixed mode)
Commands:
status - current status of each thread
stop - drain & stop test (revealing hung threads)
stop
-Draining Threads-
BWT 1 132892313
BWT 14 132175196
BWT 7 132190882
BWT 12 132860445
BWT 2 132182809
BWT 13 132873616
BWT 3 132191329
BWT 8 132910546
BWT 10 132929108
BWT 4 132867218
BWT 5 132927796
BWT 15 132899506
BWT 9 132213633
BWT 11 132174793
BWT 6 132943637
-Hung Threads-
Total Created ThreadPool threads: 22
A FAILED run on 1.6.0_01-b06 (Windows XP):
C:\Dave\Projects\ThreadPoolHang>ver
Microsoft Windows XP [Version 5.1.2600]
C:\Dave\Projects\ThreadPoolHang>java -server -showversion -cp classes ThreadPoolHang
java version "1.6.0_01"
Java(TM) SE Runtime Environment (build 1.6.0_01-b06)
Java HotSpot(TM) Server VM (build 1.6.0_01-b06, mixed mode)
Commands:
status - current status of each thread
stop - drain & stop test (revealing hung threads)
Global:WT:work:16 run(): NULL m_task. Fixed after 1 sleepy re-reads.
Global:WT:work:13 run(): NULL m_task. Fixed after 1 sleepy re-reads.
Global:WT:work:1 run(): NULL m_task. Fixed after 1 sleepy re-reads.
CODE:
The test consists of two Java files:
- ThreadPoolHang.java -
import java.util.*;
import java.io.*;
/**
* Repro ThreadPool race bug.
*
* When dispatching work to ThreadPool, eventually the work request is lost (never executed).
* The problem appears to be that modifications made to an instance field while in a synchronized block
* are not visible in a subsequent synchronized block. I.e. memory caches are not being flushed and/or
* reloaded.
*
* A workaround is to be able to know when the read field value is invalid (stale) and loop
* re-reading the value until it becomes valid.
* See the "HACK WORKAROUND" comment in <code>ThreadPool.WorkerThread.run()</code>.
*
* This appears to have been introduced between jdk1.5.0_09 and jdk1.6.0_01.
*/
publicclass ThreadPoolHang
{
privatestaticfinalint numThreads = 15;
privatestaticfinal ThreadPool myThreadPool =new ThreadPool("Global", 4 * numThreads);
privatestaticint moduleChatter = 0;
publicstaticvoid main(String args[])throws IOException
{
if (args.length > 1){
System.err.println("usage: ThreadPoolHang [moduloChatter]");
System.err.println("where moduloChatter is a number specifying how frequently to report progress (default off)");
}elseif (args.length == 1){
moduleChatter = Integer.parseInt(args[0]);
}
// Create & start the threads
ArrayList<BugWorkerThread> workers =new ArrayList<BugWorkerThread>();
for (int i = 1; i <= numThreads; i++){
BugWorkerThread bwt =new BugWorkerThread(i);
workers.add(bwt);
bwt.start();
}
// Wait for user command to stop
BufferedReader input =new BufferedReader(new InputStreamReader(System.in));
while (true){
System.out.println("Commands:\nstatus - current status of each thread\nstop - drain & stop test (revealing hung threads)");
String s = input.readLine();
if ("status".equalsIgnoreCase(s)){
System.out.println("\n-Current Progress-");
for (BugWorkerThread bwt : workers) System.out.println(bwt.getName()+" "+bwt.myCount);
}
if ("stop".equalsIgnoreCase(s))break;
}
// Drain the threads
System.out.println("\n-Draining Threads-");
for (BugWorkerThread bwt : workers) bwt.drain();
try{ Thread.sleep(5000);}catch (InterruptedException e){ e.printStackTrace();}
// Report stragglers
System.out.println("\n-Hung Threads-");
for (BugWorkerThread bwt : workers){
if (!bwt.done()) System.out.println(bwt.getName()+" HUNG @ "+bwt.myCount+" status: "+bwt.getStatus());
}
System.out.println("\nTotal Created ThreadPool threads: "+ myThreadPool.getNumCreatedThreads());
}
publicstaticsynchronizedvoid log(String msg, Throwable t)
{
System.out.println(msg);
System.out.println("Exception: "+t);
t.printStackTrace();
}
publicstaticclass BugWorkerThreadextends Thread
{
publicint myNumber;
publicint myCount = 0;
privatevolatileboolean drain =false;
privatevolatileboolean done =false;
privatevolatile InternalBugWorker myIbw =null;
public BugWorkerThread(int myNumber)
{
this.myNumber = myNumber;
setName("BWT "+myNumber);
}
/**
* Cause worker thread to terminate after the current task completes.
*/
publicvoid drain(){ drain =true;}
/**
* Return true if the thread is finished draining.
* @return true if the thread is finished draining
*/
publicboolean done(){return done;}
/**
* Return the status of the internal task:
* <ul><li>-1 : no task exists
* </li><li>0 : task waiting to be dispatched
* </li><li>> 0 : task has begun (and possibly finished) executing
* </li></ul>
*
* @return the status of the internal task
*/
publicint getStatus(){return (null == myIbw) ? -1 : myIbw.status;}
publicvoid run()
{
String label = getName() +" 0";
while (!drain){
label = getName() +" " + (++myCount);
try{
InternalBugWorker ibw =new InternalBugWorker();
myIbw = ibw;
synchronized (ibw){
myThreadPool.dispatchWorker(ibw, label);
try{ ibw.wait();}catch (Throwable t){ log(label+": doit: wait()", t);}
}
}catch (Exception e){
log(label+": doit", e);
}
if ((moduleChatter > 0) && (myCount % moduleChatter) == 0) System.out.println(label);
}
System.out.println(label);// final count
done =true;
}
staticclass InternalBugWorkerimplements Runnable
{
publicvolatileint status = 0;// indicates how far we've progressed
publicvoid run()
{
// Do nothing but wakup our waiter
status = 1;
synchronized (this){ notifyAll();}
status = 2;
}
}
}
}
- ThreadPool.java -
publicclass ThreadPool
{
privatefinal Stringm_strPoolName;// The name of the thread pool
// LOCKING ORDER: WorkerThread.m_lock is locked (if necessary) before ThreadPool.m_dispatchLock is locked.
privatefinal Objectm_dispatchLock =new Object();
private WorkerThreadm_workerThreadCache =null;// guarded by ThreadPool m_dispatchLock
privatelongm_nCreatedThreads = 0L;// guarded by ThreadPool m_dispatchLock
privatefinalintm_nMaxThreads;// maximum number of threads allowed
// --
/** Construct a thread pool.
*
* @param name The threadpool name.
* @param poolSize The threadpool size.
*/
public ThreadPool(String name,int poolSize)
{
m_strPoolName = name;
m_nMaxThreads = poolSize;
}
/**
* Cumulative number of created threads
* @return cumulative number of created threads
*/
publiclong getNumCreatedThreads(){return m_nCreatedThreads;}
/**
* Dispatch a worker thread to execute the Runnable.
*
* @param task the Runnable to execute
* @param strTaskName a name for messages
* @throws Exception for problems
*/
publicvoid dispatchWorker(Runnable task, String strTaskName)throws Exception
{
if (null == strTaskName) strTaskName ="SomeTask";
strTaskName = m_strPoolName +':' + strTaskName;
WorkerThreadworkerThread;
synchronized (m_dispatchLock){
// First try to get a worker thread from the idle cache
if (null != m_workerThreadCache){
// There is an available idle worker thread
workerThread = m_workerThreadCache;
m_workerThreadCache = workerThread.getNext();
workerThread.setNext(null);
// Next create a new thread if we haven't exceeded our cap.
}elseif (m_nCreatedThreads < m_nMaxThreads){
new WorkerThread(
m_strPoolName +':' +"WT:rest:" + m_nCreatedThreads,
m_strPoolName +':' +"WT:work:" + m_nCreatedThreads, task, strTaskName);
m_nCreatedThreads++;
return;
}else{
thrownew Exception(
"The system is overloaded and a thread cannot be dispatched to perform task: " + strTaskName);
}
}
// The worker thread is an existing idle thread. Setup the task for the thread and release it.
// We're no longer synchronized on m_dispatchLock, so it's safe to call setTask().
workerThread.setTask(task, strTaskName);
}
/**
* Return a worker thread to the thread pool cache. This is called when a worker thread has finished its
* current task and is read to be added back into the queue of available workers.
*
* @param workerThread The worker thread that just finished its task.
*/
void addWorkerToCache(WorkerThread workerThread)
{
synchronized (m_dispatchLock){
// Add it to the available thread cache.
// It will to wait for us to tell it work is available via setTask().
workerThread.setNext(m_workerThreadCache);
m_workerThreadCache = workerThread;
}
}
// =================================================================================================================
// This class manages a worker thread. Each worker thread is contained in an instance of this class. The class
// orchestrates the cycle of doing a task, and then waiting until another task arrives.
// =================================================================================================================
class WorkerThreadimplements Runnable
{
private WorkerThreadm_next =null;// guarded by ThreadPool m_dispatchLock
// LOCKING ORDER: WorkerThread m_lock is locked before ThreadPool m_dispatchLock is locked.
privatefinal Objectm_lock =new Object();
private Runnablem_task =null;// guarded by WorkerThread m_lock
private String m_strTaskName;// guarded by WorkerThread m_lock
privatefinal Stringm_strRestName;
privatefinal Stringm_strDefTaskName;
private Thread m_thread =null;
WorkerThread(String strRestName, String strDefWorkingName,
Runnable task, String strTaskName)
{
m_strRestName = strRestName;
m_strDefTaskName = strDefWorkingName;
m_task = task;
m_strTaskName = strTaskName;
m_thread =new Thread(this, strRestName);
m_thread.setDaemon(true);
m_thread.start();
}
public WorkerThread getNext(){return m_next;}// Caller must hold ThreadPool m_dispatchLock
publicvoid setNext(WorkerThread task){ m_next = task;}// Caller must hold ThreadPool m_dispatchLock
// Caller must NOT hold ThreadPool m_dispatchLock !!
publicvoid setTask(Runnable task, String strTaskName)
{
synchronized (m_lock){
// Caller should not give us a null task
if (null == task){
String s = m_strDefTaskName+" setTask("+strTaskName+"): - caller supplied NULL TASK";
System.out.println(s);
thrownew RuntimeException(s);
}
// Caller should not call us when we already have a non-null task
if (null != m_task){
String s = m_strDefTaskName+" setTask("+strTaskName+"): - already executing task: "+m_strTaskName;
System.out.println(s);
thrownew RuntimeException(s);
}
m_task = task;
m_strTaskName = strTaskName;
m_lock.notify();
}
}
publicvoid run()
{
// The thread sits in a loop that goes like this:
// > do the work (the m_task). If (null == m_task), we are done - exit the loop
// > get our lock
// > add ourselves to the available worker thread cache
// > wait for notify on our lock
// the initial setup is done and we've been released for the first task
Runnable myTask;
String myTaskName;
synchronized (m_lock){
myTask = m_task;
myTaskName = m_strTaskName;
}
if (null == myTask){
String s = m_strDefTaskName+" run(): "+myTaskName+" - NULL TASK";
System.out.println(s);
thrownew RuntimeException(s);
}
while (null != myTask){
// do the task
if (null == myTaskName) myTaskName = m_strDefTaskName;
m_thread.setName(myTaskName);
try{
myTask.run();
}catch ( Throwable t ){
System.out.println(m_thread.getName()+": Exception passed to worker thread: "+t);
if (tinstanceof ThreadDeath)throw (ThreadDeath)t;
}
// Go idle waiting for the next task to do
m_thread.setName(m_strRestName);
// Put this worker thread back in the cache and wait for work
synchronized (m_lock){
m_task =null;
m_strTaskName =null;
addWorkerToCache(this);
try{ m_lock.wait();}// wait for work
catch (InterruptedException e){
System.out.println(getClass().getName()+".run: unexpected exception while waiting for work: "+e);
}
myTask = m_task;
myTaskName = m_strTaskName;
}
if (null == m_task){
// This should never happen. Apparently the memory modification (m_task) done by setTask()
// has not propagated to this processor. But synchronized blocks should guarantee that!
// HACK WORKAROUND:
// Sleep and re-read m_task. (Usually one sleep does the trick.)
// If we re-read in a loop without sleeping, we eventually do see a non-null m_task
// (i.e. the sleep is not necessary, but the re-read needs to be done in a synchronized block)
// but it takes a few thousand re-reads.
int i = 0;
while (null == m_task){
if (i >= 50){
String s = m_strDefTaskName+" run(): NULL m_task. GIVING UP after "+i+" sleepy re-reads.";
System.out.println(s);
thrownew RuntimeException(s);
}
try{ Thread.sleep(100);}catch (InterruptedException e){/* do nothing */}
i++;
}
System.out.println(m_strDefTaskName+" run(): NULL m_task. Fixed after "+i+" sleepy re-reads.");
myTask = m_task;
myTaskName = m_strTaskName;
}
}
System.out.println(m_strDefTaskName+" run(): "+myTaskName+": EXITING THREAD");
m_thread =null;
}
}
}

