Synchronization regression in 1.6.0_01-b06

There appears to be a regression in 1.6.0_01-b06 vs. 1.5.0_09-b03: modifications made to an instance field while in a synchronized block are not visible in a subsequent synchronized block. I.e. register or processor memory caches are not being flushed and/or reloaded.

A hacky workaround can be had if you're able to know when the read field value is invalid (stale) and loop re-reading the value until it becomes valid. (See the "HACK WORKAROUND" comment inThreadPool.WorkerThread.run() below.)

This seems like a serious problem. The failure is reliable but rare. (Occurs within ~ 7 to 15M iterations.)

Is anyone else seeing this? Can anyone try this on processors other than Intel & AMD?

The following multi-threaded synchronization test fails under 1.6.0_01-b06 but works on 1.5.0_09-b03. This is true on both Windows XP and Linux. On both client and server VM.

Windows XP is [Version 5.1.2600]

Linux is [Linux fsm.thetuscorp.com 2.6.18-8.1.1.el5xen #1 SMP Mon Feb 26 20:51:53 EST 2007 x86_64 x86_64 x86_64 GNU/Linux]

I've only tried Intel processors.

TEST OVERVIEW:

The ThreadPoolHang class creates and starts 15 BugWorkerThread's. It then reads user commands from stdin.

Each BugWorkerThread instances a Runnable class, schedules that to run through a custom ThreadPool class, and then waits to be notified by that Runnable. All the Runnable does is notify it's waiter. Each BugWorkerThread repeats these actions until told to stop by the main test.

The ThreadPool reuses idle WorkerThread's, or creates new a new WorkerThread if none are idle.

A WorkerThread executes its assigned Runnable (m_task), then adds itself on the ThreadPool's idle WorkerThread list (m_workerThreadCache) and waits to be notified of its next task.

The problem is that sometimes an idle WorkerThread awakes to discover its assigned Runnable (m_task) is null.

The problem appears to be that modifications made to an instance field (m_task) while in a synchronized block are not visible in a subsequent synchronized block. I.e. register or processor memory caches are not being flushed and/or reloaded.

HOW TO RUN TEST:

Compile under JDK 1.6.0_01-b06.

Run:java -cp classes ThreadPoolHang

It prompts the user with:

Commands:

status - current status of each thread

stop - drain & stop test (revealing hung threads)

"status" will print the current loop count of each BugWorkerThread.

"stop" attempts to drain all the BugWorkerThread's, causing them to print their current loop count. It then waits 5 seconds and reports any BugWorkerThread's that did not drain along with the loop count at which they hung. The current version of the test has a workaround which prevents any thread hanging.

Allow it to run for awhile. (For me it takes from 2 - 40 minutes to see the first failure.) When a problem occurs, a line like the following is printed:

Global:WT:work:11 run(): NULL m_task. Fixed after 1 sleepy re-reads.

This indicates that a WorkerThread found a null Runnable (m_task) and then invoked the workaround.

If desired, wait for more error lines to print. At some point, issue the "stop" command to stop the test and announce any hung threads.

EXAMPLE OUTPUT:

A SUCCESSFUL run on 1.5.0_09-b03 (Windows XP):

C:\Dave\Projects\ThreadPoolHang>ver

Microsoft Windows XP [Version 5.1.2600]

C:\Dave\Projects\ThreadPoolHang>java -showversion -cp classes1.5 Thread

java version "1.5.0_09"

Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_09-b03)

Java HotSpot(TM) Client VM (build 1.5.0_09-b03, mixed mode)

Commands:

status - current status of each thread

stop - drain & stop test (revealing hung threads)

stop

-Draining Threads-

BWT 1 132892313

BWT 14 132175196

BWT 7 132190882

BWT 12 132860445

BWT 2 132182809

BWT 13 132873616

BWT 3 132191329

BWT 8 132910546

BWT 10 132929108

BWT 4 132867218

BWT 5 132927796

BWT 15 132899506

BWT 9 132213633

BWT 11 132174793

BWT 6 132943637

-Hung Threads-

Total Created ThreadPool threads: 22

A FAILED run on 1.6.0_01-b06 (Windows XP):

C:\Dave\Projects\ThreadPoolHang>ver

Microsoft Windows XP [Version 5.1.2600]

C:\Dave\Projects\ThreadPoolHang>java -server -showversion -cp classes ThreadPoolHang

java version "1.6.0_01"

Java(TM) SE Runtime Environment (build 1.6.0_01-b06)

Java HotSpot(TM) Server VM (build 1.6.0_01-b06, mixed mode)

Commands:

status - current status of each thread

stop - drain & stop test (revealing hung threads)

Global:WT:work:16 run(): NULL m_task. Fixed after 1 sleepy re-reads.

Global:WT:work:13 run(): NULL m_task. Fixed after 1 sleepy re-reads.

Global:WT:work:1 run(): NULL m_task. Fixed after 1 sleepy re-reads.

CODE:

The test consists of two Java files:

- ThreadPoolHang.java -

import java.util.*;

import java.io.*;

/**

* Repro ThreadPool race bug.

*

* When dispatching work to ThreadPool, eventually the work request is lost (never executed).

* The problem appears to be that modifications made to an instance field while in a synchronized block

* are not visible in a subsequent synchronized block. I.e. memory caches are not being flushed and/or

* reloaded.

*

* A workaround is to be able to know when the read field value is invalid (stale) and loop

* re-reading the value until it becomes valid.

* See the "HACK WORKAROUND" comment in <code>ThreadPool.WorkerThread.run()</code>.

*

* This appears to have been introduced between jdk1.5.0_09 and jdk1.6.0_01.

*/

publicclass ThreadPoolHang

{

privatestaticfinalint numThreads = 15;

privatestaticfinal ThreadPool myThreadPool =new ThreadPool("Global", 4 * numThreads);

privatestaticint moduleChatter = 0;

publicstaticvoid main(String args[])throws IOException

{

if (args.length > 1){

System.err.println("usage: ThreadPoolHang [moduloChatter]");

System.err.println("where moduloChatter is a number specifying how frequently to report progress (default off)");

}elseif (args.length == 1){

moduleChatter = Integer.parseInt(args[0]);

}

// Create & start the threads

ArrayList<BugWorkerThread> workers =new ArrayList<BugWorkerThread>();

for (int i = 1; i <= numThreads; i++){

BugWorkerThread bwt =new BugWorkerThread(i);

workers.add(bwt);

bwt.start();

}

// Wait for user command to stop

BufferedReader input =new BufferedReader(new InputStreamReader(System.in));

while (true){

System.out.println("Commands:\nstatus - current status of each thread\nstop - drain & stop test (revealing hung threads)");

String s = input.readLine();

if ("status".equalsIgnoreCase(s)){

System.out.println("\n-Current Progress-");

for (BugWorkerThread bwt : workers) System.out.println(bwt.getName()+" "+bwt.myCount);

}

if ("stop".equalsIgnoreCase(s))break;

}

// Drain the threads

System.out.println("\n-Draining Threads-");

for (BugWorkerThread bwt : workers) bwt.drain();

try{ Thread.sleep(5000);}catch (InterruptedException e){ e.printStackTrace();}

// Report stragglers

System.out.println("\n-Hung Threads-");

for (BugWorkerThread bwt : workers){

if (!bwt.done()) System.out.println(bwt.getName()+" HUNG @ "+bwt.myCount+" status: "+bwt.getStatus());

}

System.out.println("\nTotal Created ThreadPool threads: "+ myThreadPool.getNumCreatedThreads());

}

publicstaticsynchronizedvoid log(String msg, Throwable t)

{

System.out.println(msg);

System.out.println("Exception: "+t);

t.printStackTrace();

}

publicstaticclass BugWorkerThreadextends Thread

{

publicint myNumber;

publicint myCount = 0;

privatevolatileboolean drain =false;

privatevolatileboolean done =false;

privatevolatile InternalBugWorker myIbw =null;

public BugWorkerThread(int myNumber)

{

this.myNumber = myNumber;

setName("BWT "+myNumber);

}

/**

* Cause worker thread to terminate after the current task completes.

*/

publicvoid drain(){ drain =true;}

/**

* Return true if the thread is finished draining.

* @return true if the thread is finished draining

*/

publicboolean done(){return done;}

/**

* Return the status of the internal task:

* <ul><li>-1 : no task exists

* </li><li>0 : task waiting to be dispatched

* </li><li>> 0 : task has begun (and possibly finished) executing

* </li></ul>

*

* @return the status of the internal task

*/

publicint getStatus(){return (null == myIbw) ? -1 : myIbw.status;}

publicvoid run()

{

String label = getName() +" 0";

while (!drain){

label = getName() +" " + (++myCount);

try{

InternalBugWorker ibw =new InternalBugWorker();

myIbw = ibw;

synchronized (ibw){

myThreadPool.dispatchWorker(ibw, label);

try{ ibw.wait();}catch (Throwable t){ log(label+": doit: wait()", t);}

}

}catch (Exception e){

log(label+": doit", e);

}

if ((moduleChatter > 0) && (myCount % moduleChatter) == 0) System.out.println(label);

}

System.out.println(label);// final count

done =true;

}

staticclass InternalBugWorkerimplements Runnable

{

publicvolatileint status = 0;// indicates how far we've progressed

publicvoid run()

{

// Do nothing but wakup our waiter

status = 1;

synchronized (this){ notifyAll();}

status = 2;

}

}

}

}

- ThreadPool.java -

publicclass ThreadPool

{

privatefinal Stringm_strPoolName;// The name of the thread pool

// LOCKING ORDER: WorkerThread.m_lock is locked (if necessary) before ThreadPool.m_dispatchLock is locked.

privatefinal Objectm_dispatchLock =new Object();

private WorkerThreadm_workerThreadCache =null;// guarded by ThreadPool m_dispatchLock

privatelongm_nCreatedThreads = 0L;// guarded by ThreadPool m_dispatchLock

privatefinalintm_nMaxThreads;// maximum number of threads allowed

// --

/** Construct a thread pool.

*

* @param name The threadpool name.

* @param poolSize The threadpool size.

*/

public ThreadPool(String name,int poolSize)

{

m_strPoolName = name;

m_nMaxThreads = poolSize;

}

/**

* Cumulative number of created threads

* @return cumulative number of created threads

*/

publiclong getNumCreatedThreads(){return m_nCreatedThreads;}

/**

* Dispatch a worker thread to execute the Runnable.

*

* @param task the Runnable to execute

* @param strTaskName a name for messages

* @throws Exception for problems

*/

publicvoid dispatchWorker(Runnable task, String strTaskName)throws Exception

{

if (null == strTaskName) strTaskName ="SomeTask";

strTaskName = m_strPoolName +':' + strTaskName;

WorkerThreadworkerThread;

synchronized (m_dispatchLock){

// First try to get a worker thread from the idle cache

if (null != m_workerThreadCache){

// There is an available idle worker thread

workerThread = m_workerThreadCache;

m_workerThreadCache = workerThread.getNext();

workerThread.setNext(null);

// Next create a new thread if we haven't exceeded our cap.

}elseif (m_nCreatedThreads < m_nMaxThreads){

new WorkerThread(

m_strPoolName +':' +"WT:rest:" + m_nCreatedThreads,

m_strPoolName +':' +"WT:work:" + m_nCreatedThreads, task, strTaskName);

m_nCreatedThreads++;

return;

}else{

thrownew Exception(

"The system is overloaded and a thread cannot be dispatched to perform task: " + strTaskName);

}

}

// The worker thread is an existing idle thread. Setup the task for the thread and release it.

// We're no longer synchronized on m_dispatchLock, so it's safe to call setTask().

workerThread.setTask(task, strTaskName);

}

/**

* Return a worker thread to the thread pool cache. This is called when a worker thread has finished its

* current task and is read to be added back into the queue of available workers.

*

* @param workerThread The worker thread that just finished its task.

*/

void addWorkerToCache(WorkerThread workerThread)

{

synchronized (m_dispatchLock){

// Add it to the available thread cache.

// It will to wait for us to tell it work is available via setTask().

workerThread.setNext(m_workerThreadCache);

m_workerThreadCache = workerThread;

}

}

// =================================================================================================================

// This class manages a worker thread. Each worker thread is contained in an instance of this class. The class

// orchestrates the cycle of doing a task, and then waiting until another task arrives.

// =================================================================================================================

class WorkerThreadimplements Runnable

{

private WorkerThreadm_next =null;// guarded by ThreadPool m_dispatchLock

// LOCKING ORDER: WorkerThread m_lock is locked before ThreadPool m_dispatchLock is locked.

privatefinal Objectm_lock =new Object();

private Runnablem_task =null;// guarded by WorkerThread m_lock

private String m_strTaskName;// guarded by WorkerThread m_lock

privatefinal Stringm_strRestName;

privatefinal Stringm_strDefTaskName;

private Thread m_thread =null;

WorkerThread(String strRestName, String strDefWorkingName,

Runnable task, String strTaskName)

{

m_strRestName = strRestName;

m_strDefTaskName = strDefWorkingName;

m_task = task;

m_strTaskName = strTaskName;

m_thread =new Thread(this, strRestName);

m_thread.setDaemon(true);

m_thread.start();

}

public WorkerThread getNext(){return m_next;}// Caller must hold ThreadPool m_dispatchLock

publicvoid setNext(WorkerThread task){ m_next = task;}// Caller must hold ThreadPool m_dispatchLock

// Caller must NOT hold ThreadPool m_dispatchLock !!

publicvoid setTask(Runnable task, String strTaskName)

{

synchronized (m_lock){

// Caller should not give us a null task

if (null == task){

String s = m_strDefTaskName+" setTask("+strTaskName+"): - caller supplied NULL TASK";

System.out.println(s);

thrownew RuntimeException(s);

}

// Caller should not call us when we already have a non-null task

if (null != m_task){

String s = m_strDefTaskName+" setTask("+strTaskName+"): - already executing task: "+m_strTaskName;

System.out.println(s);

thrownew RuntimeException(s);

}

m_task = task;

m_strTaskName = strTaskName;

m_lock.notify();

}

}

publicvoid run()

{

// The thread sits in a loop that goes like this:

// > do the work (the m_task). If (null == m_task), we are done - exit the loop

// > get our lock

// > add ourselves to the available worker thread cache

// > wait for notify on our lock

// the initial setup is done and we've been released for the first task

Runnable myTask;

String myTaskName;

synchronized (m_lock){

myTask = m_task;

myTaskName = m_strTaskName;

}

if (null == myTask){

String s = m_strDefTaskName+" run(): "+myTaskName+" - NULL TASK";

System.out.println(s);

thrownew RuntimeException(s);

}

while (null != myTask){

// do the task

if (null == myTaskName) myTaskName = m_strDefTaskName;

m_thread.setName(myTaskName);

try{

myTask.run();

}catch ( Throwable t ){

System.out.println(m_thread.getName()+": Exception passed to worker thread: "+t);

if (tinstanceof ThreadDeath)throw (ThreadDeath)t;

}

// Go idle waiting for the next task to do

m_thread.setName(m_strRestName);

// Put this worker thread back in the cache and wait for work

synchronized (m_lock){

m_task =null;

m_strTaskName =null;

addWorkerToCache(this);

try{ m_lock.wait();}// wait for work

catch (InterruptedException e){

System.out.println(getClass().getName()+".run: unexpected exception while waiting for work: "+e);

}

myTask = m_task;

myTaskName = m_strTaskName;

}

if (null == m_task){

// This should never happen. Apparently the memory modification (m_task) done by setTask()

// has not propagated to this processor. But synchronized blocks should guarantee that!

// HACK WORKAROUND:

// Sleep and re-read m_task. (Usually one sleep does the trick.)

// If we re-read in a loop without sleeping, we eventually do see a non-null m_task

// (i.e. the sleep is not necessary, but the re-read needs to be done in a synchronized block)

// but it takes a few thousand re-reads.

int i = 0;

while (null == m_task){

if (i >= 50){

String s = m_strDefTaskName+" run(): NULL m_task. GIVING UP after "+i+" sleepy re-reads.";

System.out.println(s);

thrownew RuntimeException(s);

}

try{ Thread.sleep(100);}catch (InterruptedException e){/* do nothing */}

i++;

}

System.out.println(m_strDefTaskName+" run(): NULL m_task. Fixed after "+i+" sleepy re-reads.");

myTask = m_task;

myTaskName = m_strTaskName;

}

}

System.out.println(m_strDefTaskName+" run(): "+myTaskName+": EXITING THREAD");

m_thread =null;

}

}

}

[31016 byte] By [DavidLennerta] at [2007-11-27 4:36:29]
# 1
This became bug 6560301. And Sun pointed out that it was my bad in not handling spurious wakeups. Java 6 makes spurious wakeups more likely. (My experience is from "never" to "sometimes".)More info at: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6560301
DavidLennerta at 2007-7-12 9:46:39 > top of Java-index,Java HotSpot Virtual Machine,HotSpot Internals...