Help getting ThreadPoolExecutor to respawn terminated threads
I'm using ThreadPoolExecutor and have run into a problem. I submit a batch of tasks at the start, filling up the queue, and the pool then works through them until the queue is drained. Sometimes a task throws an uncaught exception and its worker thread terminates. Unfortunately, the pool does not start a replacement thread in this case, even if fewer than corePoolSize threads are running. The one exception is that if the dying thread is the last one, a new thread is started (see the code for workerDone). Looking at the code, the pool only spawns new threads when a task is submitted via execute() or when the last thread exits.
My question is, what's the best way to cause the ThreadPoolExecutor to spawn threads up to corePoolSize if there is work in the queue when a thread dies?
One idea I had was to subclass ThreadPoolExecutor and override the afterExecute() method, calling prestartAllCoreThreads() if the thread is exiting due to an exception (i.e. if the passed-in Throwable is not null). This seems a little hacky, and it will only ensure that corePoolSize-1 threads are running (since the thread in which afterExecute() is called hasn't exited yet).
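A minimal sketch of that afterExecute() idea follows. The class name RespawningExecutor and its one-argument constructor are hypothetical, chosen only for illustration; afterExecute(), prestartCoreThread() and getQueue() are the real ThreadPoolExecutor methods. It carries the limitation noted above: the failing worker still counts as alive while the hook runs, so the call can only top the pool up to corePoolSize-1 other threads, and on a JDK whose pool already replaces dead workers it is simply a no-op.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical subclass: tries to start a core thread whenever a task
// passed to execute() ends with an uncaught exception.
class RespawningExecutor extends ThreadPoolExecutor {

    RespawningExecutor(int poolSize) {
        // Fixed-size pool with an unbounded queue (an assumption of this sketch).
        super(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // t is non-null only when the task itself threw; tasks handed to
        // submit() never reach this branch because FutureTask catches the failure.
        if (t != null && !getQueue().isEmpty()) {
            // Starts at most one thread, and only if fewer than
            // corePoolSize workers are currently counted.
            prestartCoreThread();
        }
    }
}
```

The failing task's exception still propagates out of the worker thread, so it will also reach whatever uncaught exception handler is in effect for that thread.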
It seems strange to me that nobody else has run into this problem, but I couldn't find anything about it when I searched. If I missed something, please send me a pointer.
Thanks.
Andrew
[1374 byte] By [acertaina] at [2007-11-26 14:09:20]

# 1
This is a known issue addressed in Java 7. Of course the code is public domain so you could potentially incorporate it sooner ...
Two options:
1. As already indicated, override afterExecute to call prestartCoreThread, with the noted limitation.
2. Don't let tasks throw any exceptions.
You can achieve #2 by wrapping/modifying the tasks themselves, or by defining a ThreadFactory that uses a Thread subtype that doesn't allow the exceptions to propagate out of run. Of course you then need to address the issue of reporting that the exception occurred.
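As a rough illustration of the wrapping approach, the sketch below catches unchecked exceptions inside run() and hands them to a reporter callback. The class name SafeTask and the Consumer-based reporter are assumptions made for this example (and require Java 8 or later); nothing here is part of java.util.concurrent. Errors are deliberately left to propagate, which is a design choice of the sketch rather than a requirement.

```java
import java.util.function.Consumer;

// Hypothetical wrapper: keeps RuntimeExceptions from escaping run(),
// so the pool thread executing the task is not terminated by them.
class SafeTask implements Runnable {
    private final Runnable delegate;
    private final Consumer<Throwable> reporter;

    SafeTask(Runnable delegate, Consumer<Throwable> reporter) {
        this.delegate = delegate;
        this.reporter = reporter;
    }

    @Override
    public void run() {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            // Report instead of propagating; the reporter could log,
            // notify an operator, or record the failure for later.
            reporter.accept(e);
        }
    }
}
```

A task would then be queued as something like pool.execute(new SafeTask(task, reporter)), where reporter is whatever logging or notification hook the application already has.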
RuntimeExceptions and Errors - which are all run() can normally throw - aren't normal occurrences so executors tend not to drop threads due to this. For submit()ed tasks it is never an issue because FutureTask captures all exceptions.
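The point about submit() can be seen in a small sketch: the exception is stored in the returned Future and resurfaces as the cause of an ExecutionException when get() is called, while the pool thread carries on. The helper class SubmitDemo and its method failureOf are hypothetical names used only for this illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: submits a task and returns the exception it threw,
// or null if it completed normally.
class SubmitDemo {
    static Throwable failureOf(ExecutorService pool, Runnable task)
            throws InterruptedException {
        Future<?> future = pool.submit(task);
        try {
            future.get();          // blocks until the task has finished
            return null;           // completed without throwing
        } catch (ExecutionException e) {
            return e.getCause();   // the exception thrown inside the task
        }
    }
}
```

The flip side is that a failure stays silent unless something eventually calls get() on the Future, so reporting still has to be arranged.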
# 2
davidhomes's second option is absolutely right on. If a thread has an exception, the next thread will probably also have an exception. The hardest part of programming is error recovery. Find out what went wrong and fix it. Yes, it's a lot of work.
# 3
>
> If a thread has an exception, the next thread will
> probably also have an exception.
>
Yes, this is the real problem.
> The hardest part of programming is error recovery.
> Find out what went wrong and fix it. Yes, it's a lot
> of work.
What about if your system cannot stop? One possible solution could be to restart the task. Any other ideas?
Michele
# 4
> What about if your system cannot stop? One possible
> solution could be to restart the task.
I know, for example, that Quartz (http://www.opensymphony.com/quartz/) restarts the task if an unchecked exception is thrown.
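For a plain executor, restarting a task could be sketched as a wrapper that reruns it a bounded number of times. This is not how Quartz does it; RetryingTask and maxAttempts are hypothetical names for this illustration, and the bound is there because, as noted earlier in the thread, a task that failed once will often fail again.

```java
// Hypothetical wrapper: reruns a task when it throws an unchecked exception,
// up to maxAttempts times, then gives up and rethrows the last failure.
class RetryingTask implements Runnable {
    private final Runnable delegate;
    private final int maxAttempts;

    RetryingTask(Runnable delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void run() {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                delegate.run();
                return;            // success, no further attempts needed
            } catch (RuntimeException e) {
                last = e;          // remember the failure and try again
            }
        }
        throw last;                // every attempt failed
    }
}
```

Retrying immediately only helps with transient failures; a real system would probably add a delay between attempts and report each failure.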
# 5
I almost never catch Error. If a thread gets an Error, the code is bad and no amount of restarting will make any difference.
For thread pool threads, if a thread throws an Exception or fails to handle an Exception, I log a message, send a notification to the sysadmin and the thread is no longer part of the pool (no restart). When all threads in the pool are not functioning, I send a message every "n" minutes to a sysadmin that we need manual intervention.
There's a serious problem. There's a limit to how much error recovery you can write. Unless you're writing code that will run on an inter-galactic space mission or a weapons system where no one can intervene, then get someone involved when there is a serious problem.
# 6
> I almost never catch Error. If a thread gets an
> Error, the code is bad and no amount of restarting
> will make any difference.
>
Yes, agreed.
> For thread pool threads, if a thread throws an
> Exception or fails to handle an Exception, I log a
> message, send a notification to the sysadmin and the
> thread is no longer part of the pool (no restart.)
Ok, so the question is: how can you detect when an unchecked exception happens in a separate thread? The problem is that I only notice the failure because I see that the task is no longer being executed.
Michele
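One way to be told about such failures, sketched under the assumption that tasks are handed to the pool with execute(), is a ThreadFactory that installs a Thread.UncaughtExceptionHandler on every pool thread. The class name ReportingThreadFactory is hypothetical; ThreadFactory, Thread.UncaughtExceptionHandler and setUncaughtExceptionHandler() are standard since Java 5.

```java
import java.util.concurrent.ThreadFactory;

// Hypothetical factory: every thread it creates reports uncaught
// exceptions to the supplied handler before the thread dies.
class ReportingThreadFactory implements ThreadFactory {
    private final Thread.UncaughtExceptionHandler handler;

    ReportingThreadFactory(Thread.UncaughtExceptionHandler handler) {
        this.handler = handler;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // Called on the dying thread with the exception that killed it.
        t.setUncaughtExceptionHandler(handler);
        return t;
    }
}
```

The factory can be passed to a ThreadPoolExecutor constructor or to Executors.newFixedThreadPool(n, factory). The handler is the place to log or notify; it does not keep the thread alive. Tasks passed to submit() will not trigger it, since their exceptions are captured in the Future.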
# 7
> Ok, so the question is, how can you detect when a
> non-checked exception happens in a separate thread?
> The problem is that I can realize the problem only
> because I see the task is not executed any more.
The answer is: a lot of work.
You need to manage threads. Rather than repeat it all here, have a read of some of the documentation for the Tymeac product.
http://coopsoft.com/TyDoc/Tuning.html
Since it's open source, you can take what you want and leave the rest. The product info starts here:
http://coopsoft.com/JavaProduct.html
This should give you an idea of how difficult it is to manage your threads (really back-end processes.)