Need advice on GC tuning for a large-heap application

We have a Java application that runs on a 64-bit AMD Opteron under Windows 2003 SP1 (64-bit JDK 1.5.0_08). It is configured with -Xmx8192m.

The other JVM options are :

-Xms4096m -Xmx8192m -XX:MaxPermSize=256m -Xmn1024m -XX:SurvivorRatio=1 -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75

The nature of the application is such that we bootstrap it with data, which is then cached. That baseline memory footprint is 2.8-3 GB. After that there is 24x7 transaction processing, which creates a lot of transient objects.

We have observed that memory usage grows steadily until it reaches its peak; the application then slows down (due to GC) and eventually runs out of memory. Profiling the app doesn't point to any memory leak. What we do see is that the survivor spaces are hardly used, and a forced GC (from jConsole/jProfiler) doesn't collect some of the transient objects in the first or second pass. As a result, the tenured region seems to build up.
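One way to confirm from inside the application that the survivor spaces stay nearly empty is to read the per-pool figures that jConsole shows. The following is a minimal sketch using the standard java.lang.management API (available since JDK 5); the class name PoolUsage and the output format are illustrative assumptions, and the pool names printed depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints used/committed/max for every memory pool.
class PoolUsage {
    /** Formats one pool's usage in whole megabytes. */
    public static String describe(String name, MemoryUsage u) {
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the pool has no defined maximum.
        String max = u.getMax() < 0 ? "n/a" : (u.getMax() / mb) + "MB";
        return name + ": used=" + (u.getUsed() / mb) + "MB committed="
                + (u.getCommitted() / mb) + "MB max=" + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // null means the pool is no longer valid
                System.out.println(describe(pool.getName(), usage));
            }
        }
    }
}
```

Calling the loop in main periodically from a background thread would show whether the survivor pools ever fill while the old generation grows.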

We chose these options with the aim of keeping garbage in the young/survivor spaces as far as possible.

It would help if someone could point out whether the collector we are using is the best fit for such an application and whether we have used the options correctly.

thanks

Message was edited by:

girirajveng

[1347 byte] By [girirajvenga] at [2007-11-26 14:54:01]
# 1
Curious coincidence, but this post (published a few hours ago) is probably what you're looking for - http://blogs.sun.com/jonthecollector/entry/when_you_re_at_your
bharathcha at 2007-7-8 8:42:25 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 2

Run with all your current options but add -XX:+PrintGCTimeStamps and -XX:+PrintGCDetails and send the resulting log to HotSpotGC-Feedback (at) Sun.COM and we'll try to advise you.
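Before collecting a long log, it can be worth checking that the two logging flags actually reached the JVM (for example when the options live in a service wrapper or launcher script). This is a minimal sketch under that assumption; the class name GcLogFlagCheck is hypothetical, and it only relies on RuntimeMXBean.getInputArguments() from the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which of the requested GC logging flags are absent.
class GcLogFlagCheck {
    static final String[] REQUESTED = {"-XX:+PrintGCTimeStamps", "-XX:+PrintGCDetails"};

    /** Returns the requested flags that do not appear in the given JVM arguments. */
    public static List<String> missing(List<String> jvmArgs) {
        List<String> absent = new ArrayList<String>();
        for (String flag : REQUESTED) {
            if (!jvmArgs.contains(flag)) {
                absent.add(flag);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Missing GC logging flags: " + missing(jvmArgs));
    }
}
```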

How many processors do you have? How many threads does your application have? CMS mostly runs as one thread, so if your application has lots of threads generating garbage, the single CMS thread may not be able to keep up. That will end with CMS bailing out to a full mark-sweep-compact collection, which should find all the garbage, at the cost of stopping your application while it cleans up.

When looking for a memory leak: wait until your application gets to what you think is steady state, and then get a histogram of what's in the heap. Then let it run until it is near(er) OutOfMemory and get another histogram. The difference between those two histograms will be what's "leaking", which for Java programs means things that are still referenced even though you didn't mean them to be.
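The histogram comparison described above can be sketched as follows. This is only an illustration: it assumes each histogram is text with rows of the form rank, instance count, byte count, class name (the layout printed by tools such as `jmap -histo`), and the class name HistogramDiff and its methods are hypothetical. Adjust the parser to whatever your tool emits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: diffs two class histograms to show which classes grew.
class HistogramDiff {
    /** Parses rows such as "   1:   12345   678900  java.lang.String" into class name -> bytes. */
    public static Map<String, Long> parse(String text) {
        Map<String, Long> bytesByClass = new HashMap<String, Long>();
        for (String line : text.split("\n")) {
            String[] f = line.trim().split("\\s+");
            // Expect rank, instances, bytes, class name; skip header and total lines.
            if (f.length < 4 || !f[0].endsWith(":")) {
                continue;
            }
            try {
                bytesByClass.put(f[3], Long.parseLong(f[2]));
            } catch (NumberFormatException e) {
                // Not a data row; ignore it.
            }
        }
        return bytesByClass;
    }

    /** Returns byte growth per class (later minus earlier), keeping only classes that grew. */
    public static Map<String, Long> growth(Map<String, Long> earlier, Map<String, Long> later) {
        Map<String, Long> grown = new TreeMap<String, Long>();
        for (Map.Entry<String, Long> e : later.entrySet()) {
            Long before = earlier.get(e.getKey());
            long delta = e.getValue() - (before == null ? 0L : before);
            if (delta > 0) {
                grown.put(e.getKey(), delta);
            }
        }
        return grown;
    }
}
```

Classes that show up in the growth map between steady state and near-OutOfMemory are the candidates for unintentionally retained references.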

You say that forcing a GC from jConsole doesn't clean up the garbage "in the first or 2nd pass". That sounds like finalize() methods are getting in your way. Objects with finalize() methods survive the GC pass that finds them to be unreachable, so that their finalize() methods can be called. The space for the objects can't be recovered until the next collection. If you are using finalize() methods heavily, use something like WeakReferences instead, so you can do your own cleanup, and so you have to think about which parts of your data you need to clean up.
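As a rough illustration of the WeakReference approach, the sketch below registers a weak reference with a ReferenceQueue and performs cleanup when the application drains the queue, so the tracked object itself needs no finalize() method. The names ResourceTracker, Handle, and resourceId are hypothetical, and recording the id in a list stands in for whatever real cleanup your objects need.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: cleanup driven by a ReferenceQueue instead of finalize().
class ResourceTracker {
    /** Weak reference that remembers what to release; it never holds the owner strongly. */
    static final class Handle extends WeakReference<Object> {
        final String resourceId;

        Handle(Object owner, String resourceId, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.resourceId = resourceId;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // The handles themselves must stay strongly reachable, or they are never enqueued.
    private final Set<Handle> live = new HashSet<Handle>();
    private final List<String> released = new ArrayList<String>();

    public synchronized Handle track(Object owner, String resourceId) {
        Handle handle = new Handle(owner, resourceId, queue);
        live.add(handle);
        return handle;
    }

    /** Releases resources of owners the collector has cleared; returns how many were released. */
    public synchronized int drain() {
        int count = 0;
        Reference<? extends Object> ref;
        while ((ref = queue.poll()) != null) {
            Handle handle = (Handle) ref;
            live.remove(handle);
            released.add(handle.resourceId); // stand-in for real cleanup work
            count++;
        }
        return count;
    }

    public synchronized List<String> released() {
        return new ArrayList<String>(released);
    }
}
```

The application would call drain() periodically (for instance from a housekeeping thread), which keeps the cleanup explicit and limited to the data that actually needs it.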

It doesn't sound like Jon Masamitsu's advice applies to you, since you claim you have only 3GB of live data in an 8GB heap with a 1GB young generation, so you should have plenty of space for promotions. But a log file would be diagnostic.
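The sizing arithmetic behind that remark can be worked through as a small sketch. It uses only the figures from the original post and assumes the heap has expanded to its full -Xmx size; the class and method names are hypothetical, and the real VM may round generation sizes slightly differently.

```java
// Hypothetical helper: back-of-the-envelope generation sizes for the posted options.
class HeapBudget {
    /** Old generation = total heap minus young generation. */
    public static long oldGenMb(long heapMb, long youngMb) {
        return heapMb - youngMb;
    }

    /** Young = eden + 2 survivors and eden = ratio * survivor, so survivor = young / (ratio + 2). */
    public static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Old-generation occupancy at which the initiating fraction is reached. */
    public static long cmsTriggerMb(long oldMb, int fractionPercent) {
        return oldMb * fractionPercent / 100;
    }

    public static void main(String[] args) {
        long oldMb = oldGenMb(8192, 1024);          // -Xmx8192m, -Xmn1024m
        System.out.println("old generation: " + oldMb + " MB");
        System.out.println("each survivor space: " + survivorMb(1024, 1) + " MB"); // -XX:SurvivorRatio=1
        System.out.println("CMS initiating occupancy: " + cmsTriggerMb(oldMb, 75) + " MB"); // fraction 75
        System.out.println("reported live data: about 3072 MB");
    }
}
```

With roughly 3 GB live in a 7 GB old generation, the 75% threshold sits well above the baseline, which is why promotions should have room.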

Why did you turn -XX:SoftRefLRUPolicyMSPerMB=1 down so far? It seems like that will clear your SoftReferences very quickly, making them less useful, especially as you run towards OutOfMemoryError and have little free space left in the heap.
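To put numbers on this: HotSpot keeps a softly reachable object for roughly SoftRefLRUPolicyMSPerMB milliseconds per megabyte of free heap since its last access, and the default is 1000. The sketch below just multiplies that out; the free-heap figures are illustrative assumptions, exactly how free space is measured depends on the VM, and the class name is hypothetical.

```java
// Hypothetical helper: approximate soft-reference retention window.
class SoftRefBudget {
    /** Approximate retention in milliseconds: ms-per-MB setting times free heap in MB. */
    public static long retentionMillis(long msPerMb, long freeHeapMb) {
        return msPerMb * freeHeapMb;
    }

    public static void main(String[] args) {
        long[] freeHeapMb = {5000, 500, 50}; // illustrative free-heap levels
        for (long free : freeHeapMb) {
            System.out.println(free + " MB free: default (1000) keeps about "
                    + retentionMillis(1000, free) + " ms, a setting of 1 keeps about "
                    + retentionMillis(1, free) + " ms");
        }
    }
}
```

With a setting of 1, even a heap with 5 GB free retains an untouched soft reference for only about five seconds, and the window shrinks to milliseconds as the heap fills.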

PeterKesslera at 2007-7-8 8:42:25
# 3
Thanks Peter. I will send the log info soon. Some questions: Is ConcurrentMarkSweep better than ParallelGC for the tenured region if the rate of garbage creation is quite high? Also, does CMS lead to fragmentation which could hamper GC?
girirajvenga at 2007-7-8 8:42:25
# 4

Thanks for the logs. Based on a preliminary analysis of your GC logs, it appears highly probable that you are running into bug id 6433335, which is fixed in 5.0 update 10 (often known as 5u10) and will also be available in 6.0 update 1 (aka 6u1).

5u10 is available for download starting at, for example:

http://java.sun.com/javase/downloads/index_jdk5.jsp

Please try 5u10 and let us know whether it fixes your problem. Please let us know should there be further issues or questions.

ramki_at_jdca at 2007-7-8 8:42:25
# 5
Hi. Can you please provide more details on the bug? Is there some place I can see the details? I shall try update 10 of JDK 5 and let you know. Thanks.
girirajvenga at 2007-7-8 8:42:25
# 6

See http://bugs.sun.com/bugdatabase/search.do?process=1&category=hotspot&bugStatus=&subcategory=garbage_collector&type=&keyword=6433335

and in particular:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6433335

If you have further questions after reading these bug reports, contact us at hotspotgc dash feedback at sun dot com and we may be able to help, or contact your Sun support / services / account manager and refer to this forum thread.

ramki_at_jdca at 2007-7-8 8:42:25
# 7

> Is ConcurrentMarkSweep better than ParallelGC for tenured region if the rate of garbage creation is quite high? Also, does CMS lead to fragmentation which could hamper GC?

ParallelGC (depending on heap size, number of processors, and your pause time constraints) may be better, especially for smaller heaps and if you do not have very strict pause time constraints. It certainly offers better throughput (in terms of using fewer CPU resources for the garbage collection task and thus giving the application more time to get its work done). And yes, fragmentation is possible with CMS: ParallelGC compacts the live objects, which CMS does not. A high rate of promotion into, or mutation in, the old generation could be problematic for CMS, especially on platforms where there is not enough spare concurrency to use for CMS.

For fuller descriptions of the collectors and the trade-offs between them, please refer to the documentation available from, for example:

http://java.sun.com/javase/technologies/hotspot/index.jsp

in particular:

http://java.sun.com/javase/technologies/hotspot/gc/index.jsp

Note also, with respect to your previous emails and the associated logs that you sent, that though we see evidence of scavenges (minor collections) slowing down (because of the bug id cited above, we believe), we do not see any evidence of too much pressure on the CMS thread. We also do not see any direct evidence of the kinds of "out of memory" conditions that you state.

Perhaps we can take this discussion off-line from this forum and on to the hotspotgc dash feedback ... alias to get to the bottom of the issue you are concerned about. Let us do that once you have had a chance to run with 5u10 and have new data to share with us.

ramki_at_jdca at 2007-7-8 8:42:25