1.5 "synchronized" Holds Monitor Past End of Block?
I've got a strange problem where a synchronized(x) appears to be holding the monitor for "x", even past the end of the synchronized code block.
The basic structure of the code looks like this:
Running in a "Processor" thread:
(class-instance-level variable)"queue"
run()
{
while(!stopRequested)
{
Item item;
synchronized(queue)
{
item = queue.pop();
}
item.process();
}
}
And running in another thread (sometimes multiple if two requests appear at once):
...
synchronized(queue)// This is the same "queue" as above.
{
return(queue.getSomeInfoAboutItemsOutstanding());
}
...
What happens is the second example ends up permanently blocking (sometimes many, all in different threads) waiting for the "queue" object (both Eclipse debugger and the JMX objects tell me this).
However, suspending the threads reveals that (as expected), the first thread is not within the "synchronized" block!
In the Eclipse debugger (with "Show Monitors" on), if I look at the second thread, which it says is waiting on "queue", it tells me that "queue" is held by the first thread. However, if I look at the first thread, it tells me that it does NOT own that monitor.
Further, if I commence shutting down the application, as soon as the first thread disappears, Eclipse now tells me that the monitor is owned by some other completely nonsensical thread (and so on).
I've tried to put together a "simple" example, and haven't managed to hit on the right combination yet -- unfortunately, the application is too large to be able to post the full example.
I'm on the latest JDK 1.5 (Update 12), and this is running in a web app through Tomcat 1.5.
The only "different" things I can think of are:
1) The first thread is running at NORM_PRIORITY-1, and the second (ie, stalled) threads are running at NORM_PRIORITY-2. However, I don't think "starvation" is what is occurring here.
2) The second (stalled) threads are called through the apache XMLRPC library -- that is, the method in the second code chunk above is executed via an XMLRPC call controlled by the http://ws.apache.org/xmlrpc/ library.
I've double- and triple-checked for any possible synchronization mis-ordering and anything else I could think of.
Does anyone have any thoughts or ideas?
TIA!
# 1
Is it possible the that first block of code is being called from another block which already holds the lock?
public someCode() {
synchronized(queue) {
callTheCodeFromTheOriginalPost();
}
}
Seems unlikely given the method name and usage implied from the design, but I thought I'd ask.
# 2
No, the first block of code is the "run" method from a "Runnable" (sorry, didn't mention that), and isn't called directly (only indirectly by the Thread object when I call "start" on it).
In any case, I would hope that Eclipse would show up the lock holder, and it doesn't seem to (granted, I could be looking at some kind of Eclipse bug there, but still...)
The JMX stuff will show me the lock that a thread is waiting on -- is there anything similar that I can use to query what locks a thread is _holding_?
I think I'm going to have to put some logging around and within each of these "synchronized" and run it until the problem occurs... Hopefully then I can verify from the trace(s)...
# 3
Its not actually holding the lock outside the synchronized block. I believe it may be because there are so few instructions outside the block that it is very unlikely that another thread wanting the synchronized object will be lucky enough to be tasked in during those few instructions.
You could call Thread.yeild() after you leave the synch block to give other threads a chance.
Remeber that just because a thread leaves a synch block does not mean it yeilds. It will still be scheduled and still run. It can be tasked out when it is in a synch block but any other thread needing that synch block will not be able to obtain it. If it is outside the synch block and is tasked out then an arbitrary other thread will be tasked in. It may or may not want the synch block. If it does then it will obtain it. The probability of this happening in a tight loop containing a synch block is quite low.
Cheers
matfud
# 4
I'd actually done some further testing without the thread priority reduction in place, and I don't seem to be able to trigger the error (of course, this might simply be because of timing issues -- the condition is very difficult to produce, and is only reproducible due to length of the task involved; it takes about 2 or 3 hours of processing before this occurs).
So it appears that your analysis may be correct, except...
The one thing that I'd neglected to mention is that when I do a "queue.pop()", if the queue is empty, it goes into an Object.wait(1000)
(on a completely different object).
I've verified, and the case where it appears "hung", it actually is executing this (Object.wait) code, because the queue is completely empty -- that is, the loop is not "tight", because it is doing very little other than 1-second sleeps.
So, I can understand how that thread may not be tasked in (although I would think the scheduler should prevent starvation?), but if the VM is otherwise idle, wouldn't the thread then be tasked in?
(In the particular case that I have occuring, *every single thread* is in WAIT or TIMED_WAIT states.)
Any ideas?
# 5
Is anything ever calling notifyAll() on the object your threads are calling wait() on?
If not then a thread that sucessfully pops an object from the queue will not wake up any other threads that failed to do so (and hence called wait on the different object). This could cause all other threads to be waiting and only one thread to take all available tasks from the queue.
Cheers
matfud
# 6
PS for code clarity you might want to make the threads that fail to obtain an item from the queue call wait on that queue. As each thread sucessfully obtains an item from the queue it calls notifyAll on the queue and then exits the synch block. This wakes up all other thread that were waiting for the queue to be not-empty so they can try again.
Normally this is done with a while loop around the queue so that if they wake up and there is still nothing there they go back to sleep again.
synchronzied(queue) {
while(queue.isEmpty()) { // assuming this method checks if the queue is emtpy
queue.wait();
}
process = queue.pop();
queue.notifyAll();
}
process.doSomthing()
matfud
# 7
Actually, there's only ever one thread processing items in the queue. The other threads are either adding items to the queue, or checking the size of the queue.
And, yep, stuff is calling notify on the object that I'm waiting on. It doesn't really matter anyway, because the "wait" is not inside the sychronized(queue) (and it's not on the queue) -- so it wouldn't affect the portion that I'm having trouble with.
Now, I'd thought of the notifyAll, but I'm not sure it would apply.
For example, does this do anything:
Thread #1
synchronized(abc)
{
// do something
abc.notifyAll();
}
Thread #2
synchronized(abc)
{
// do something else
}
Assuming that Thread #2 blocks on the synchronized when Thread #1 is in "do something", will the notifyAll call "wake up" Thread #2? Remember, Thread #2 is not "wait"ing on the object. Does "notify" have any effect on something that is blocked trying to synchronize on an object? Or only on items performing a "wait"?
And this still doesn't address why if Thread #1 becomes "idle", Thread #2 doesn't pick up and start executing anyway...?
TIA!
# 8
No this does not do anything. Threads attempting to enter a synchronized block are still running. Threads that call wait are no longer runnning and need a notify to wake them (unless they have a timeout specified). You have to call notify/notiyAll on the object that those other threads called wait() on though.
If only one thread is poping stuff from the queue it does not need to "sleep()" or wait(). It may need to yeild() though. Yeild just says that the current thread has voluntarily given up its time slot so that another thread can (possibly) run and push stuff onto the queue.
matfud
# 9
This is a nice deadlock example. Your thread which pops stuff from your queue is what we call a Consumer, while all the threads which add stuff to your queue are Producers. You might want to read some multi-programming documentation on the Consumer/Producer case...
When your consumer tries to pop something from the Queue, it shouldn't expect to be receiving a result every time. Because right now, you're holding a lock on your queue, and you're waiting until someone adds an element to it because it is empty. On the other hand, your producers are waiting to be able to lock the queue so they can add stuff to it. That's why you never get out of this situation once it has happened.
# 10
> This is a nice deadlock example.
Actually, I've simplified the example, and skipped a lot of stuff. But even in the example above, there should be no possibility of a true deadlock (as matfud pointed out, there is the possibility for starvation, but the portion I skipped where I sleep if the queue is empty prevents that).
> When your consumer tries to pop something from the
> Queue, it shouldn't expect to be receiving a result
> every time. Because right now, you're holding a lock
> on your queue, and you're waiting until someone adds
> an element to it because it is empty. On the other
> hand, your producers are waiting to be able to lock
> the queue so they can add stuff to it. That's why
> you never get out of this situation once it has
> happened.
That's not quite true.
First off, I'm not holding a lock on the queue -- the only time I ever hold a lock on the queue is during the "pop". No locks are held when the queue is empty, and no locks are held during processing of items.
The producer doesn't need to wait for the consumer, unless of course the consumer is, at that exact moment, removing an item from the queue.
So there is nothing here that can deadlock:
1. There are no cases where I lock two (successive) items, so no attempts to acquire a lock from somewhere else can deadlock.
2. No blocking or time-intensive operations are ever done inside a lock.
I fully understand everything that everyone is saying here, but it has all been considered long ago, and none of has explained the issue yet.
Here, I'll post a little clearer example:
Thread #1
(class-instance-level variable) "queue"
(class-instance-level variable) "waitObject"
run()
{
while(!stopRequested)
{
Item item= null;
synchronized(queue)
{
if(queue.size() > 0)
{
item = queue.pop();
}
}
if(item != null)
{
item.process();
}
else
{
synchronized(waitObject)
{
waitObject.wait(1000);
}
}
}
}
Thread #2
...
synchronized(queue) // This is the same "queue" as above.
{
return(queue.getSomeInfoAboutItemsOutstanding());
}
...
If anyone can see any deadlock possibilities in the above, please point them out to me!
(And yes, I understand why matfud suggested doing the sync + wait on the queue itself, in essence making the "pop" into a blocking operation, but there are other considerations that make this a less efficient approach for me. In any case, it would not change nature of the problem.)
matfud,
I do need to "sleep" or "wait" in the loop, b/c otherwise I will consume all available CPU just checking the queue for new items ;-) The "pop" is not a blocking operation.
# 11
> I've got a strange problem where a synchronized(x)
> appears to be holding the monitor for "x", even past
> the end of the synchronized code block.
Disclaimer: I admit to only having skimmed the thread, so I'm not sure if this is relevant. However, the VM is free to move instructions that come after the sync block into the sync block.
// The VM is allowed to make this:
synchronized (foo) {
A
}
B
// behave like this
synchronized (foo) {
A
B
}
jverda at 2007-7-12 18:25:12 >

# 12
> Disclaimer: I admit to only having skimmed the
> thread, so I'm not sure if this is relevant. However,
> the VM is free to move instructions that come after
> the sync block into the sync block.
Hmm, now that suggestion may bear fruit...
In this case, should decompiling the .class file show me this? Or would this be done on-the-fly by the VM?
More importantly, can I prevent this? Can you point me to this documentation?
TIA!
# 13
> In this case, should decompiling the .class file show
> me this? Or would this be done on-the-fly by the
> VM?
JVM.
>
> More importantly, can I prevent this? Can you point
> me to this documentation?
Can't remember where I read it. One or more white papers, books, articles, etc. on multithreading. It should be in the VM spec, but it won't necessarily be easy to find or be stated in an obvious way.
I don't know if you can prevent it. Putting B in a separate sync block might, or putting an additional dummy sync block between your "real" one and B might, but my guess would be that the VM would be allowed to combine those to the same effect anyway.
I kind of take this stance (because it's the easiest and I'm feeling lazy): Given that the spec was deliberately made to allow this, it should be something that in the vast majority of cases should not be a problem.
Syncing is used to say, "only one thread at a time can execute this block." There should be no reason to say, "you MUST have many threads concurrently executing this block." (Which is effectively what you're saying if you insist on not having B inside the sync block.)
So I'm guessing that you're inventing a problem where there isn't one. Your program's correctness shouldn't depend on B not being in the sync block. I may be wrong--maybe you've got an unusual case here and this is a significant issue, but my gut feel is that it's not.
jverda at 2007-7-12 18:25:12 >

# 14
It's not so much a case of "I must have multiple threads executing this block", but rather "I deliberately released the lock so as to permit other threads to continue".
I agree that since Sun put this in to the JVM, it shouldn't be a problem... I guess the real question is whether that is causing me the issue or not ;-)
In any case, I would hope that the VM would not move any blocking code inside of the synchronized (that's just asking for trouble)...
Would there be any way for me to verify whether or not this could be occurring? (I'm guessing "no", unless there's a portion of the JMX that I'm missing that can tell me "what locks do I currently hold".)
# 15
> It's not so much a case of "I must have multiple
> threads executing this block", but rather "I
> deliberately released the lock so as to permit other
> threads to continue".
Right. The VM knows this, but may have decided to release the lock later anyway. It may have determined that it won't be doing a context switch that soon anyway, and so for whatever reason, a reordering there would save cycles and not negatively impact concurrency.
Note that this is just pure speculation--just guessing at what kind of reasoning might lead to this result.
> I agree that since Sun put this in to the JVM, it
> shouldn't be a problem... I guess the real question
> is whether that is causing me the issue or not ;-)
I can't comment on that because I didn't read enough to know what the actual problem is. :-)
> Would there be any way for me to verify whether or
> not this could be occurring? (I'm guessing "no",
> unless there's a portion of the JMX that I'm missing
> that can tell me "what locks do I currently hold".)
This I don't know. I think with 1.5 and 1.6 there are more tools--stuff in the API and external tools--that you can use to monitor the workings of the VM, but I haven't delved into them and don't know if they'd tell you that kind of stuff.
jverda at 2007-7-21 22:07:21 >

# 16
Humm,
You could be correct. The JMM only seems to guarentee that an unlock (synchronization exit) is guarenteed to complete before subsequent lock operations on the SAME monitor.
JMM spec
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr133.pdf
read page 35 of this as it povides another possible option
Overview
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#synchronization
taken from the overview
An unlock on a monitor happens before every subsequent lock on that same monitor.
matfud
# 17
After reading some documentation and stuff on this matter, here's my suggestion:
Thread #1
run()
{
while(!stopRequested)
{
Item item= null;
synchronized(queue)
{
if(queue.size() == 0)
{
queue.wait(); //This will release the lock during the wait, and will acquire it when it is notified
}
item = queue.pop();
}
if(item != null)
{
item.process();
}
}
}
After changing this, ensure that whatever method is used to push something on your Queue also notifies the Queue instead of your old waitObject.
# 18
I still think this is the way it shloud be done.
Perhaps adding a timeout value to the wait() method if you want it to terminate prompty.
synchronzied(queue) {
while(queue.isEmpty()) { // assuming this method checks if the queue is emtpy
queue.wait();
}
process = queue.pop();
queue.notifyAll();
}
process.doSomthing()
Other methods that need the same synch block should also call notify() or notifyAll() on the same monitor
As was said earlier, there is only one thread that pops data and processes it. It needs to wait until data is available to be able to process it. So the blocking mechanism seems to fit he bill. If it is in the wait and an interrupt() comes in then the loop will exit with an InterruptedException. You can check the isInterrupted() flag also, in case the interrupt comes when you are not in the wait() method.
So you can easily stop the thread when you want and you do not have to synchronize on two different objects.
P.S. I'm not sure why you believe you need to have the worker thread sleep for a second.
P.P.S. It is almost always advisable to have wait() called within a while loop as you are never sure when the thread will be woken up. The JMM recommends that threads that are in a wait state get woken up periodically anyway.
matfud
# 19
Why notify when you pop? surely only when you push?
ejpa at 2007-7-21 22:07:21 >

# 20
Normaly this is a symetrical operation.
so those that will push should notify those that will pop (i.e.All)
and those that will pop should notify those that will push (i.e.All)
In some specific operations you can achieve beter performance by a calling the
singular notify() but those are very rare. In any case you should almost always notify to other side of the fence that you have done something. This allows them to take action.
matfud
# 21
I disagree with you matfud here. In the producer/consumer approach, the producer doesn't need to know what the consumer is doing. Just producing is enough. I'd keep the notifies only when pushing ;)
# 22
That's what I would have thought. Notify on pop() would normally only be necessary if the queue was bounded and the producer was waiting for space.
ejpa at 2007-7-21 22:07:21 >

# 23
> I disagree with you matfud here. In the
> producer/consumer approach, the producer doesn't need
> to know what the consumer is doing.
He does if the queue is bounded, can fill up, an he's waiting for a spot to open up so he can enqueue something.
If the queue is unbounded, then you're right.
jverda at 2007-7-21 22:07:21 >

# 24
you're right jverd. That's the only case where a notify could be required on the pop side.