Need advice on GC tuning for a large-heap application
We have a Java application that runs on a 64-bit AMD Opteron machine under Windows 2003 SP1. It is configured with -Xmx8192m
(64-bit JDK 1.5.0_08).
The other JVM options are :
-Xms4096m -Xmx8192m -XX:MaxPermSize=256m -Xmn1024m -XX:SurvivorRatio=1 -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
The nature of the application is such that we bootstrap it with data which is cached. The baseline memory after that is 2.8-3 GB. After that there is 24x7 transaction processing, which leads to a lot of transient memory accumulation.
We have observed that memory use grows steadily until it reaches the peak; the application then slows down (due to GC) and eventually runs out of memory. Profiling the app doesn't point to any memory leak. What we do see is that the survivor spaces are hardly used, and a forced GC (from jConsole/jProfiler) doesn't collect some of the transient objects in the first or second pass. As a result the tenured region seems to build up.
The options we have used are with the objective of keeping the garbage in the young/survivor spaces as far as possible.
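One way to check the observation about survivor and tenured occupancy from inside the application, rather than only from jConsole, is to read the heap memory pools through java.lang.management. This is a minimal sketch; the class name HeapPoolReport is made up for illustration, and the pool names it prints depend on which collector is in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of the original application).
public class HeapPoolReport {

    /** Builds one line per heap pool: name, used MB, committed MB. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip perm gen, code cache and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            sb.append(String.format("%-28s used=%6d MB  committed=%6d MB%n",
                    pool.getName(),
                    usage.getUsed() >> 20,        // bytes to MB
                    usage.getCommitted() >> 20));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Calling report() periodically (from a timer thread, say) and logging the result would show whether the survivor pools stay near empty while the old generation pool climbs.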
It would help if someone could point out whether the collector we use is the best choice for such an application and whether we have used the options correctly.
thanks
Message was edited by:
girirajveng
# 1
Curious coincidence, but this post (published a few hours ago) is probably what you're looking for - http://blogs.sun.com/jonthecollector/entry/when_you_re_at_your
# 2
Run with all your current options but add -XX:+PrintGCTimeStamps and -XX:+PrintGCDetails and send the resulting log to HotSpotGC-Feedback (at) Sun.COM and we'll try to advise you.
How many processors do you have? How many threads does your application have? CMS mostly runs as one thread, so if your application has lots of threads generating garbage, the single CMS thread may not be able to keep up. That will end with CMS bailing out to a full mark-sweep-compact collection, which should find all the garbage, at the cost of stopping your application while it cleans up.
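The processor count, thread count, and how often each collector has run can all be read from the running JVM. The following is a rough sketch using the standard management beans; the class name GcLoadSnapshot is an assumption for illustration, and the collector names it prints depend on the GC options in effect.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of the original application).
public class GcLoadSnapshot {

    /** Summarises processors, live threads, and per-collector counts and times. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        sb.append("processors=")
          .append(Runtime.getRuntime().availableProcessors()).append('\n');
        sb.append("liveThreads=")
          .append(ManagementFactory.getThreadMXBean().getThreadCount()).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(" totalTimeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Taking two snapshots some minutes apart and comparing the old-generation collector's count and total time gives a crude view of how hard it is working; the GC log remains the better diagnostic.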
When looking for a memory leak: wait until your application gets to what you think is steady state, and then get a histogram of what's in the heap. Then let it run until it is near(er) OutOfMemory and get another histogram. The difference between those two histograms will be what's "leaking", which for Java programs means things that are still referenced even though you didn't mean them to be.
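The comparison of the two histograms can be sketched in a few lines. This assumes each histogram has already been reduced to a map from class name to bytes occupied (from a profiler export, for instance); the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper for comparing two heap histograms.
public class HistogramDiff {

    /** Growth in bytes for one class between the two histograms. */
    static class Growth implements Comparable<Growth> {
        final String className;
        final long bytes;

        Growth(String className, long bytes) {
            this.className = className;
            this.bytes = bytes;
        }

        // Largest growth first; ties broken by class name.
        public int compareTo(Growth o) {
            if (bytes != o.bytes) {
                return bytes < o.bytes ? 1 : -1;
            }
            return className.compareTo(o.className);
        }
    }

    /** Returns classes whose footprint grew from early to late, biggest first. */
    static List<Growth> growth(Map<String, Long> early, Map<String, Long> late) {
        List<Growth> out = new ArrayList<Growth>();
        for (Map.Entry<String, Long> e : late.entrySet()) {
            Long before = early.get(e.getKey());
            long delta = e.getValue() - (before == null ? 0L : before);
            if (delta > 0) {
                out.add(new Growth(e.getKey(), delta));
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

The classes at the top of the returned list are the first candidates to look at for references that are being held unintentionally.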
You say that forcing a GC from jConsole doesn't clean up the garbage "in the first or 2nd pass". That sounds like finalize() methods are getting in your way. Objects with finalize() methods survive the GC pass that finds them to be unreachable, so that their finalize() methods can be called. The space for the objects can't be recovered until the next collection. If you are using finalize() methods heavily, use something like WeakReferences instead, so you can do your own cleanup, and so you have to think about which parts of your data you need to clean up.
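As a rough illustration of the WeakReference approach, the sketch below registers a cleanup action per object and runs it once the collector has enqueued the reference. The class CleanupTracker and its method names are made up for this example; it is one possible shape, not a drop-in replacement for any particular finalize() method.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: explicit cleanup driven by a ReferenceQueue.
public class CleanupTracker {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

    // Holds each reference strongly until processed, with its cleanup action.
    private final Map<Reference<?>, Runnable> actions =
            Collections.synchronizedMap(new HashMap<Reference<?>, Runnable>());

    /**
     * Registers a cleanup action for owner. The action must not refer to
     * owner itself, or owner would never become weakly reachable.
     */
    Reference<Object> register(Object owner, Runnable cleanup) {
        Reference<Object> ref = new WeakReference<Object>(owner, queue);
        actions.put(ref, cleanup);
        return ref;
    }

    /** Runs the cleanup for every reference enqueued so far; returns how many ran. */
    int drain() {
        int ran = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable action = actions.remove(ref);
            if (action != null) {
                action.run();
                ran++;
            }
        }
        return ran;
    }
}
```

Unlike an object with a finalize() method, the owner's own memory can be reclaimed by the collection that finds it unreachable; only the small reference object and the cleanup action stay alive until drain() is called, for example from a housekeeping thread.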
It doesn't sound like Jon Masamitsu's advice applies to you, since you claim you have only 3GB of live data in an 8GB heap with a 1GB young generation, so you should have plenty of space for promotions. But a log file would be diagnostic.
Why did you turn -XX:SoftRefLRUPolicyMSPerMB=1 down so far? It seems like that is going to be cleaning your SoftReferences really fast, making them less useful. Especially as you run towards OutOfMemoryError and don't have any free space in the heap.
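To put a number on that: as documented for HotSpot, a softly reachable object is kept for roughly (free heap in MB) x SoftRefLRUPolicyMSPerMB milliseconds after its last access, and the default setting is 1000. The arithmetic below is only an approximation of that policy, and the 5120 MB figure is an assumption derived from an 8 GB heap with about 3 GB live.

```java
// Back-of-the-envelope arithmetic only; not a model of the collector itself.
public class SoftRefRetention {

    /** Approximate retention, in milliseconds, for a softly reachable object. */
    static long retentionMillis(long freeHeapMB, long msPerMB) {
        return freeHeapMB * msPerMB;
    }

    public static void main(String[] args) {
        long freeMB = 5120; // assumed: 8192 MB max heap minus about 3072 MB live
        System.out.println("msPerMB=1:    " + retentionMillis(freeMB, 1) / 1000.0 + " s");
        System.out.println("msPerMB=1000: " + retentionMillis(freeMB, 1000) / 60000.0 + " min");
    }
}
```

So with a setting of 1 the soft references last about five seconds even when the heap is mostly free, against well over an hour at the default, and the retention shrinks toward zero as free space disappears.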
# 3
Thanks, Peter. I will send the log info soon. Some questions: Is ConcurrentMarkSweep better than ParallelGC for the tenured region if the rate of garbage creation is quite high? Also, does CMS lead to fragmentation which could hamper GC?
# 4
Thanks for the logs. Based on a preliminary analysis of your GC logs,
it appears highly probable that you are running into bug id 6433335
which is fixed in 5.0 update 10 (often known as 5u10) and will also
be available in 6.0 update 1 (aka 6u1).
5u10 is available for download starting at for example:
http://java.sun.com/javase/downloads/index_jdk5.jsp
Please try 5u10 and let us know whether it fixes your problem.
Please let us know should there be further issues or
questions.
# 5
Hi, can you please provide more details on the bug? Is there some place I can see the details? I shall try update 10 of JDK 5 and let you know. Thanks.
# 6
See http://bugs.sun.com/bugdatabase/search.do?process=1&category=hotspot&bugStatus=&subcategory=garbage_collector&type=&keyword=6433335
and in particular:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6433335
If you have further questions after reading these bug reports,
contact us at hotspotgc dash feedback at sun dot com
and we may be able to help with further questions, or
contact your Sun support / services / account manager
and refer to this forum thread.
# 7
> Is ConcurrentMarkSweep better than ParallelGC for
> tenured region if the rate of garbage creation is
> quite high? Also, does CMS lead to fragmentation
> which could hamper GC?
ParallelGC (depending on heap size and # processors and
your pause time constraints) may be better, especially
for smaller heaps and if you do not have very strict pause
time constraints. It certainly offers better throughput (in terms of
using fewer CPU resources for doing the garbage collection task
and thus giving the application more time to get its work done).
Yes, CMS can fragment the old generation: ParallelGC compacts the live objects, which
CMS does not. A high rate of promotion into or mutation in
the old generation could be problematic for CMS especially
on platforms where there is not enough spare concurrency
to use for CMS.
For fuller descriptions of the collectors and of the trade-offs
between them, please refer to the documentation available
from for example:
http://java.sun.com/javase/technologies/hotspot/index.jsp
in particular:
http://java.sun.com/javase/technologies/hotspot/gc/index.jsp
Note, also wrt your previous emails and the
associated logs that you sent, that though we see
evidence of scavenges (minor collections) slowing down
(because of the bug id cited above, we believe), we do not
see any evidence of too much pressure on the CMS thread.
We also do not see any direct evidence of the kinds of
"out of memory" conditions that you state.
Perhaps we can take this discussion off-line from this
forum and on to the hotspotgc dash feedback ... alias
to get to the bottom of the issue you are concerned about.
Let us do that once you have had a chance to run with 5u10
and have new data to share with us.
