Core dumps: analysis and security

I have occasionally seen some obscure JVM crashes on my machine.

See, for example, my latest bug report:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6386633

I have configured the JVM on my Windows machine to generate core dump files when it crashes. In particular, I use the command line option

set errorHandling=-XX:OnError="userdump.exe %%p"

To understand the above, see Section 3.2.4 starting at page 40 and 121 of http://java.sun.com/j2se/1.5/pdf/jdk50_ts_guide.pdf

(CRITICAL: you must escape the % character if you use it in an environment variable, as above), as well as

http://support.microsoft.com/?kbid=241215

http://download.microsoft.com/download/win2000srv/Utility/3.0/NT45/EN-US/Oem3sr2.zip

http://www-128.ibm.com/developerworks/java/library/j-tiger03175.html
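
Since the % escaping is easy to get wrong, it may help to print the option exactly as it reached the JVM. The following is only a sketch: the class name ShowOnErrorOption and the helper withPrefix are made up for illustration, and it assumes a JVM that provides java.lang.management (5.0 or later).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the -XX:OnError option as the JVM received it.
final class ShowOnErrorOption {
    /** Returns the arguments that start with the given prefix, in order. */
    static List<String> withPrefix(List<String> args, String prefix) {
        List<String> matches = new ArrayList<String>();
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                matches.add(arg);
            }
        }
        return matches;
    }

    public static void main(String[] unused) {
        // getInputArguments() lists the options passed to the JVM itself,
        // not the arguments passed to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> onError = withPrefix(jvmArgs, "-XX:OnError");
        if (onError.isEmpty()) {
            System.out.println("No -XX:OnError option reached the JVM.");
        }
        for (String option : onError) {
            System.out.println(option);
        }
    }
}
```

Running it with the same environment variable on the command line shows how many % characters survived the shell's expansion.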

I have 2 questions concerning the core dump file produced by the above.

First, does anyone know how to analyse it? What tools do you use? If you could point to any useful tutorials, I would be grateful.

Second, if I were to email the core dump file to someone else for analysis (e.g. Sun), is there any security risk that I should worry about?

In particular, could they extract the complete byte code of my program (or at least the byte code for all loaded classes) and then decompile it and reconstruct my program? I ask because some of the code is proprietary.

bbatmana at 2007-10-2 13:49:51
# 1
I see that the list of loaded libraries includes iq_jni.dll. It might be worth running with -Xcheck:jni in case it reports any issues. The other thing to check is whether the problem still happens without CMS (i.e., drop the -XX:+UseConcMarkSweepGC option).
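
When switching collector options on and off, it can be useful to confirm from inside the application which collectors are actually in effect. A minimal sketch, assuming a 5.0 or later JVM; the class name ActiveCollectors is made up, and the exact collector names reported depend on the JVM build (with CMS enabled, a name such as ConcurrentMarkSweep would be expected, but treat that as an assumption):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the garbage collectors the running JVM exposes.
final class ActiveCollectors {
    /** Names of the collectors registered with the platform MBean server. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Counts and times are cumulative since JVM start.
            System.out.println(gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", total time ms=" + gc.getCollectionTime());
        }
    }
}
```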
alan.batemana at 2007-7-13 11:49:51 > top of Java-index,Developer Tools,Debugging and Profiling Tool APIs...
# 2

>I see the list of loaded libraries included iq_jni.dll. It might be worth running with -Xcheck:jni in case it reports any issues

Thanks. I will try that today. Hopefully, it will not slow the program at all. (With DTN/IqFeed's Java interface to their DLL, you mainly use JNI to start up the DLL; all stock data is read from the DLL into Java using a socket interface.)

>The other thing to check is if the problem still happens without CMS (ie: drop the -XX:+UseConcMarkSweepGC) option and see if the problem still happens.

Good observation, but I do not think that it is an issue--my original posting failed to point out these facts (sorry!):

1) the bug that has crashed the JVM several times in the last 2 months always has this listed as the cause in the hs_err_pidXXX file:

VM state:not at safepoint (normal execution)

VM Mutex/Monitor currently owned by a thread: None

so it is likely some fundamental JVM bug

2) the offending thread can be anything; the 3 examples where I have saved the hs_err_pidXXX file record:

Current thread (0x00a36a40): ConcurrentMarkSweepThread [id=2556]

Current thread (0x00a36a38): ConcurrentMarkSweepThread [id=2852]

Current thread (0x49a81a50): JavaThread "PositionManager_WatchDetector" [_thread_in_Java, id=2136]

Yes, 2/3 times it was the CMS thread. But the fact that I have even a single case where an application thread ("PositionManager_WatchDetector") encountered the same bug strongly suggests that CMS itself is not to blame. Furthermore, the CMS thread is doing a lot of the work in my program, so if this error happens at random, 2/3 times may be reasonable.
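
With more saved hs_err_pidXXX files, the same tally could be automated. This is only a sketch: the class name CrashThreadTally is made up, and the pattern assumes the "Current thread" line always has the layout shown in the three examples above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts which kind of thread was current at each crash.
final class CrashThreadTally {
    // Captures the thread kind that follows the address, e.g. "JavaThread".
    private static final Pattern CURRENT_THREAD =
        Pattern.compile("^Current thread \\(0x[0-9a-fA-F]+\\):\\s+(\\S+)");

    static Map<String, Integer> tally(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            Matcher m = CURRENT_THREAD.matcher(line.trim());
            if (m.find()) {
                String kind = m.group(1);
                Integer old = counts.get(kind);
                counts.put(kind, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Current thread (0x00a36a40): ConcurrentMarkSweepThread [id=2556]",
            "Current thread (0x00a36a38): ConcurrentMarkSweepThread [id=2852]",
            "Current thread (0x49a81a50): JavaThread \"PositionManager_WatchDetector\" [_thread_in_Java, id=2136]");
        // prints {ConcurrentMarkSweepThread=2, JavaThread=1}
        System.out.println(tally(lines));
    }
}
```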

Any idea how to analyse core dump files and if I can safely send them to others?

bbatmana at 2007-7-13 11:49:51
# 3

Did -Xcheck:jni uncover anything?

Although the crash isn't always in the ConcurrentMarkSweepThread I would suggest running for a time without the CMS options and see if the problem occurs. If it doesn't then it would be useful information to add to the bug report. It would also be good to test with -client and see if the problem happens with the Client VM.

As regards the crash dump: on Windows, the windbg debugger can be used to examine the dumps, although for crashes like this it often requires quite a bit of knowledge about the VM internals. Within Sun there are some other options to examine HotSpot crashes, but they aren't "productized" and released. Have you considered logging a support call? That would be the place to bring up your concern about sharing the crash dump.

alan.batemana at 2007-7-13 11:49:51
# 4

>Did -Xcheck:jni uncover anything?

No it didn't.

On the other hand, I did not expect it to, because JNI is only used briefly to start up the stock data feed DLL and probably not used thereafter. Also, even if JNI is the culprit, it only produces crashes about once every 2 weeks, and I have only used the additional check for 1 day of testing, which is too short a time to draw any conclusions.

>Although the crash isn't always in the ConcurrentMarkSweepThread

>I would suggest running for a time without the CMS options and see if the problem occurs.

>If it doesn't then it would be useful information to add to the bug report.

>It would also be good to test with -client and see if the problem happens with the Client VM.

Hmm, the problem with really sporadic crashes like this is that I would have to run both with CMS and without for fairly long periods of time in order to gather reliable statistics. I suppose that I have run for a while now with it and could switch to a month or so without it.

But I really like CMS! Working with an engineer at Sun (thanks, ramki), who suggested all kinds of memory tuning including CMS, I determined that with CMS on, the maximum GC pause time ever seen by my application (and I run with a decent 1.5 GB heap) was in the 60-70 ms range. For a soft real-time application like trading, this is fairly important.

Another thing to mention is that I do not recall seeing this particular crash type with earlier versions of the JDK (I am currently using the latest production version, 1.5.0_06). By the way, does anyone know if 1.5.0_07 will ever be released? It seems to be due around now...

>As regards the crash dump.

>On Windows the windbg debugger can be used to examine the dumps

>although for crashes like this it often requires quite a bit of knowledge about the VM internals.

That's what I was worried you would say.

>In Sun there are some other options to examine HotSpot crashes but they aren't "productized" and released.

Nuts.

>Have you considered logging a support call?

>That would be the place to bring up your concern about sharing the crash dump.

Do I have to be in some special kind of developer category (e.g. buy into some kind of Sun support)? I will consider that in the future.

bbatmana at 2007-7-13 11:49:51
# 5

I am tracking down a similar (likely the same) problem with my application. For me, there seem to be three key criteria present:

1. Using CMS

2. Loading resources from signed jars

3. Windows w/ multi-processor support - in my case, a hyper-threading enabled Intel processor is represented by two virtual processors.

Changing any one of the above criteria makes the problem go away, although at a cost.
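
For anyone wanting to check whether criteria 2 and 3 apply to their own setup, here is a rough sketch. The class name CrashCriteriaCheck and its methods are made up for illustration; it treats a class as "from a signed jar" when its code source carries at least one certificate, which is an assumption about what matters for this crash.

```java
import java.security.CodeSource;
import java.security.ProtectionDomain;
import java.security.cert.Certificate;

// Hypothetical helper: reports processor count and whether a class came from signed code.
final class CrashCriteriaCheck {
    /** Number of logical processors the JVM sees (2 on a single hyper-threaded CPU). */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** True if the class was loaded from a code source that carries certificates. */
    static boolean isFromSignedSource(Class<?> type) {
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        if (source == null) {
            return false; // e.g. classes loaded by the bootstrap loader
        }
        Certificate[] certs = source.getCertificates();
        return certs != null && certs.length > 0;
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
        System.out.println("This class from signed source: "
            + isFromSignedSource(CrashCriteriaCheck.class));
    }
}
```

Passing one of the application's own resource-loading classes to isFromSignedSource would show whether the signed-jar criterion applies.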

jvondrana at 2007-7-13 11:49:51
# 6

That's interesting!

I am not using signed jars at all in my application, but am using Windows w/ multi-processor support via hyper-threading --> two virtual processors just like you. Since I still occasionally see the bug, it cannot be as simple as "2 out of 3 ain't bad"--maybe just less bad!

I too would greatly prefer not to give up CMS and HT.

I went ahead and added this forum discussion to the comments section of the bug report mentioned in the first post above, especially noting your observation.

How did you ever come up with hyper-threading as a suspect in the first place? Are there a lot of bugs in general associated with HT, or with HT and Java?

bbatmana at 2007-7-13 11:49:51
# 7

>> How did you ever come up with hyper-threading as a suspect in the first place? Are there a lot of bugs in general associated with HT, or with HT and Java?

My development staff is on a hardware refresh cycle of ~2-3 years. The problem began presenting itself only on newer machines (which were HT enabled) when we began utilizing 1.5.0_04. Once I recognized the multiple virtual processors, it made sense. Testing by disabling HT verified my suspicion.

If you're not dependent on 1.5.0_04 or newer capabilities, you may want to consider reverting to an older version. I haven't experienced the JVM crashes in 1.5.0 - 1.5.0_02.

Giving up CMS, HT, or reverting to older runtimes is only recommended as a temporary workaround.

jvondrana at 2007-7-13 11:49:51
# 8
The suggestion to try without the CMS options was to see if you still observe the crash. Anyway, I checked on 5.0u7 and there are a few CMS fixes. The bugIDs are 6319688 and 6319671. 5.0u7 should be out next month.
alan.batemana at 2007-7-13 11:49:51
# 9

To confirm, running with the default GC avoids the problem for us.

We can do so as a short-term workaround, but we rely upon CMS to more aggressively manage the tenured generation (Old Gen). A little background ... We have a data intensive UI and find that objects (non-leaks) quickly move to the tenured gen. In low memory situations, we need to recognize and react by "releasing" these objects and getting this space reclaimed prior to triggering an OutOfMemoryError. The goal is to steer clear of that situation when possible and, thereby, have a more stable application.
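
One way to recognize a low memory situation of this kind is the usage threshold notification in java.lang.management. The following is a sketch under assumptions, not what our application actually does: the class LowMemoryWatch and its methods are made up, and picking "the largest heap pool that supports a usage threshold" as the tenured generation is a heuristic.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

// Hypothetical helper: runs a callback when the tenured pool crosses a usage threshold.
final class LowMemoryWatch {
    /** Heuristic: the largest heap pool that supports usage thresholds, or null. */
    static MemoryPoolMXBean findTenuredPool() {
        MemoryPoolMXBean best = null;
        long bestMax = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null && (best == null || usage.getMax() > bestMax)) {
                best = pool;
                bestMax = usage.getMax();
            }
        }
        return best;
    }

    /** Threshold in bytes for a given pool maximum and fraction (e.g. 0.8). */
    static long thresholdBytes(long maxBytes, double fraction) {
        return (long) (maxBytes * fraction);
    }

    /** Installs the callback; does nothing if no suitable pool is found. */
    static void install(final Runnable onLowMemory, double fraction) {
        MemoryPoolMXBean pool = findTenuredPool();
        if (pool == null || pool.getUsage() == null || pool.getUsage().getMax() <= 0) {
            return; // no pool, or its maximum size is undefined
        }
        pool.setUsageThreshold(thresholdBytes(pool.getUsage().getMax(), fraction));
        // The platform MemoryMXBean emits the threshold notifications.
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                    onLowMemory.run(); // release caches and other droppable objects here
                }
            }
        }, null, null);
    }
}
```

The callback should stay short and hand the real work to an application thread. MemoryPoolMXBean also offers a collection usage threshold, which is checked after a collection and may suit this case better.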

jvondrana at 2007-7-13 11:49:51
# 10

>Anyway, I checked on 5.0u7 .. .5.0u7 should be out next month.

Alan: just out of curiosity, where on Sun's website do you find out about upcoming maintenance releases like u7? I have tried searching, to no avail.

I hope that u7 has lots of good bug fixes, because they sure have been taking their time getting it out.

bbatmana at 2007-7-13 11:49:51
# 11

> ... A little background ... We have a data intensive UI and find that objects (non-leaks) quickly move to the tenured gen. In low memory situations, we need to recognize and react by "releasing" these objects and getting this space reclaimed prior to triggering an OutOfMemoryError. The goal is to steer clear of that situation when possible and, thereby, have a more stable application.

Make sure to use the survivor spaces (by means of -XX:MaxTenuringThreshold=<n> -XX:SurvivorRatio=<k>) so as to stem promotion rates into the Old Generation and thereby reduce the "pressure" on the old generation collector. That, in conjunction with a larger heap, might help avoid reaching the "saturation point".
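
To get a feel for what a given -XX:SurvivorRatio=<k> means, recall that k is the ratio of Eden to one survivor space, and that the young generation holds Eden plus two survivor spaces. A small worked sketch follows; the class name SurvivorSizing and the 64 MB young generation are made-up illustration values, and the JVM may round the real sizes for alignment.

```java
// Hypothetical helper: approximate young generation layout for a given SurvivorRatio.
final class SurvivorSizing {
    /** Size of one survivor space: young = eden + 2 * survivor, eden = k * survivor. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    /** Eden is whatever remains after the two survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 64L * 1024 * 1024; // assumed 64 MB young generation
        int k = 6;                      // as in -XX:SurvivorRatio=6
        long mb = 1024 * 1024;
        // prints: eden=48 MB, each survivor=8 MB
        System.out.println("eden=" + edenBytes(young, k) / mb
            + " MB, each survivor=" + survivorBytes(young, k) / mb + " MB");
    }
}
```

A smaller k gives larger survivor spaces, which leaves more room for objects to age in the young generation before being promoted.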

On to the crashes you mention, which you believe are related to CMS and HT. Would it be possible for you to share a test program with Sun that reproduces the problem? If so, you can either use your support contact to log the bug (preferred) or log a bug via SDN. The following URLs provide more details:

http://www.sun.com/service/warrantiescontracts/javamultiplatform.html

http://developers.sun.com/services/

http://bugs.sun.com/services/bugreport/index.jsp

ramki_at_jdca at 2007-7-13 11:49:51
# 12

> checked on 5.0u7 and there are a few CMS fixes. The bugIDs are 6319688 and 6319671. 5.0u7 should be out next month.

If you have a support contract, please contact your account representative and they should be able to get you the JDK for fix verification.

Alternatively, you can download the 6.0 weekly binaries (in beta) from:

http://download.java.net/jdk6/binaries/

where the bugs mentioned above are fixed (be sure to use appropriately recent builds; check the bug reports for which builds fixed these bugs).

ramki_at_jdca at 2007-7-13 11:49:51
# 13
Can anyone confirm whether 6386633 has been fixed in 1.6beta2? 6386633 is still listed as in progress with no activity for months. Thanks!
TomSeva at 2007-7-13 11:49:51
# 14

6386633 turns out to be a duplicate of 6415406, which was fixed in 6.0 b92.

Please download the latest JDK 6 binaries from http://jdk6.dev.java.net to verify the fix.

If you want the fixes in the 5.0 update stream urgently, please escalate 6415406 via your account representative or your product support contract. Please see:

http://developers.sun.com/services

if you do not already have a support channel for your JDK.

ramki_at_jdca at 2007-7-13 11:49:52