Objects getting tenured in sudden/short bursts

Hello,

I have observed the following behavior on one of our production servers running a real time system:

a) frequently, a large number (easily about 100 MB) of young generation objects are tenured over a short period of time (generally in the 2-3 minute time frame but sometimes less)

b) to see if a JVM upgrade affects/fixes this, I recently upgraded from *1.5 Update 6* to *1.6 Update 1*, but it doesnt seem to have made much difference in the sudden jumps in old generation memory

c) with 1.5, there were 'concurrent mode failure' errors in the GC log - with 1.6, those have gone away but there are "CMS: abort preclean due to time" errors, so I am not sure how effective the CMS GC really is here

I am using the following JVM options:

-Xss128k -Xmx1280m -Xms1280m -Xmn400m -XX:MaxPermSize=128m -XX:PermSize=64m

-XX:-UseParNewGC [have also tried -XX:+UseParNewGC]

-XX:+UseConcMarkSweepGC

-XX:TargetSurvivorRatio=90

-XX:MaxTenuringThreshold=1

-XX:+DisableExplicitGC

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dsun.rmi.dgc.server.gcInterval=600000 -Dsun.rmi.dgc.client.gcInterval=600000

Any help with this is very much appreciated.

Srini.

Extracts from GC log:

...

33104.784: [GC 33104.784: [DefNew: 331143K->6384K(368640K), 0.0554633 secs] 382613K->58620K(1392640K), 0.0557271 secs]

33140.657: [GC 33140.658: [DefNew: 334064K->33299K(368640K), 0.1612125 secs] 386300K->86091K(1392640K), 0.1614336 secs]

33181.425: [GC 33181.425: [DefNew: 360979K->329K(368640K), 0.2583737 secs] 413771K->118776K(1392640K), 0.2585926 secs]

33211.468: [GC 33211.469: [DefNew: 328009K->362K(368640K), 0.4760715 secs] 446456K->249969K(1392640K), 0.4762905 secs]

33261.450: [GC 33261.451: [DefNew: 328042K->609K(368640K), 0.0317898 secs] 577649K->250275K(1392640K), 0.0320174 secs]

33295.271: [GC 33295.271: [DefNew: 243172K->535K(368640K), 0.0266738 secs] 492838K->250312K(1392640K), 0.0268943 secs]

33305.028: [GC 33305.029: [DefNew: 328215K->242K(368640K), 0.9267009 secs] 577992K->512326K(1392640K), 0.9269350 secs]

33305.959: [GC [1 CMS-initial-mark: 512084K(1024000K)] 515415K(1392640K), 0.0033297 secs]

33305.963: [CMS-concurrent-mark-start]

33306.652: [CMS-concurrent-mark: 0.690/0.690 secs]

33306.652: [CMS-concurrent-preclean-start]

33306.659: [CMS-concurrent-preclean: 0.006/0.007 secs]

33306.659: [CMS-concurrent-abortable-preclean-start]

CMS: abort preclean due to time 33311.672: [CMS-concurrent-abortable-preclean: 0.512/5.013 secs]

33311.676: [GC[YG occupancy: 51916 K (368640 K)]33311.676: [Rescan (non-parallel) 33311.676: [grey object rescan, 0.0027005 secs]33311.679: [root rescan, 0.1012596 secs], 0.1040770 secs]33311.780: [weak refs processing, 0.0013688 secs] [1 CMS-remark: 512084K(1024000K)] 564001K(1392640K), 0.1059224 secs]

33311.784: [CMS-concurrent-sweep-start]

33311.844: [CMS-concurrent-sweep: 0.060/0.060 secs]

33311.844: [CMS-concurrent-reset-start]

33311.901: [CMS-concurrent-reset: 0.056/0.056 secs]

33353.364: [GC 33353.364: [DefNew: 327922K->647K(368640K), 0.0331175 secs] 622136K->294905K(1392640K), 0.0333633 secs]

33404.531: [GC 33404.531: [DefNew: 328327K->394K(368640K), 0.0343957 secs] 622585K->294837K(1392640K), 0.0346283 secs]

.....

122196.870: [GC 122196.870: [DefNew: 335723K->33398K(368640K), 0.1694331 secs] 647278K->345664K(1392640K), 0.1696536 secs]

123378.669: [GC 123378.669: [DefNew: 361078K->3955K(368640K), 0.2014420 secs] 673344K->381801K(1392640K), 0.2016926 secs]

125537.896: [GC 125537.896: [DefNew: 331635K->8170K(368640K), 0.0786210 secs] 709481K->386306K(1392640K), 0.0788437 secs]

127581.547: [GC 127581.547: [DefNew: 335850K->5213K(368640K), 0.0953335 secs] 713986K->383790K(1392640K), 0.0955752 secs]

127636.773: [GC 127636.774: [DefNew: 332893K->0K(368640K), 0.1591214 secs] 711470K->379327K(1392640K), 0.1593282 secs]

127672.690: [GC 127672.690: [DefNew: 327680K->0K(368640K), 0.4753881 secs] 707007K->510399K(1392640K), 0.4755892 secs]

127731.905: [GC 127731.905: [DefNew: 327680K->0K(368640K), 0.0300905 secs] 838079K->510400K(1392640K), 0.0302818 secs]

127747.916: [GC 127747.916: [DefNew: 92907K->0K(368640K), 0.0116197 secs] 603307K->510400K(1392640K), 0.0118132 secs]

127760.320: [GC 127760.320: [DefNew: 327680K->0K(368640K), 0.9315203 secs] 838080K->772544K(1392640K), 0.9317222 secs]

127761.253: [GC [1 CMS-initial-mark: 772544K(1024000K)] 776067K(1392640K), 0.0024436 secs]

127761.256: [CMS-concurrent-mark-start]

127762.072: [CMS-concurrent-mark: 0.816/0.816 secs]

127762.072: [CMS-concurrent-preclean-start]

127762.078: [CMS-concurrent-preclean: 0.006/0.007 secs]

127762.078: [CMS-concurrent-abortable-preclean-start]

CMS: abort preclean due to time 127767.180: [CMS-concurrent-abortable-preclean: 0.487/5.102 secs]

127767.184: [GC[YG occupancy: 38613 K (368640 K)]127767.184: [Rescan (non-parallel) 127767.184: [grey object rescan, 0.0029299 secs]127767.187: [root rescan, 0.0925664 secs], 0.0956100 secs]127767.280: [weak refs processing, 0.0040487 secs] [1 CMS-remark: 772544K(1024000K)] 811158K(1392640K), 0.1002486 secs]

127767.286: [CMS-concurrent-sweep-start]

127767.462: [CMS-concurrent-sweep: 0.175/0.175 secs]

127767.462: [CMS-concurrent-reset-start]

127767.516: [CMS-concurrent-reset: 0.054/0.054 secs]

127819.603: [GC 127819.603: [DefNew: 327680K->1K(368640K), 0.0323185 secs] 868214K->540536K(1392640K), 0.0325146 secs]

128817.575: [GC 128817.575: [DefNew: 327681K->4032K(368640K), 0.0714104 secs] 868216K->544567K(1392640K), 0.0716335 secs]

128864.744: [GC 128864.744: [DefNew: 331712K->0K(368640K), 0.1596496 secs] 872247K->541367K(1392640K), 0.1598559 secs]

....

[6192 byte] By [Srini_Padmanabhana] at [2007-11-27 1:14:49]
# 1

CMS GC will start earlier, even when you have sufficient space in your tenured space. CMS GC will run concurrently and will try to complete before the heap gets full.

Increase the -XX:MaxTenuringThreshold=1. default value is 15 with java 1.5u6 and java 1.6

Or use Parallel old gc available since java 1.5u6

-XX:+UseParallelOldGC

Thanigaivel.Ma at 2007-7-11 23:50:09 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 2

Thanks Thanigaivel - I have tried higher MaxTenuringThreshold valueswithout much success.

I added the following 2 options to see if I can prevent objects from being tenured too quickly (perhaps) to old generation:

-XX:SurvivorRatio=8

-XX:MaxTenuringThreshold=15

It doesnt seem to have helped - for the first CMS GC attempt, I see the following (note the 'abort preclean due to time'):

...

20562.702: [GC 20562.703: [DefNew: 302102K->36087K(368640K), 0.9180833 secs] 740648K->478049K(1392640K), 0.9183933 secs]

20624.978: [GC 20624.979: [DefNew: 363767K->37315K(368640K), 1.3378077 secs] 805729K->544812K(1392640K), 1.3381429 secs]

20670.089: [GC 20670.089: [DefNew: 364995K->34729K(368640K), 1.4574405 secs] 872492K->676796K(1392640K), 1.4576879 secs]

20671.566: [GC [1 CMS-initial-mark: 642067K(1024000K)] 681076K(1392640K), 0.6204847 secs]

20672.187: [CMS-concurrent-mark-start]

20674.148: [CMS-concurrent-mark: 1.961/1.961 secs]

20674.149: [CMS-concurrent-preclean-start]

20674.157: [CMS-concurrent-preclean: 0.008/0.008 secs]

20674.157: [CMS-concurrent-abortable-preclean-start]

CMS: abort preclean due to time 20679.350: [CMS-concurrent-abortable-preclean: 0.220/5.193 secs]

20679.368: [GC[YG occupancy: 78067 K (368640 K)]20679.368: [Rescan (non-parallel) 20679.368: [grey object rescan, 0.0027378 secs]20679.371: [root rescan, 0.6556769 secs], 0.6585455 secs]20680.027: [weak refs processing, 0.0302429 secs] [1 CMS-remark: 642067K(1024000K)] 720135K(1392640K), 0.6896664 secs]

20680.060: [CMS-concurrent-sweep-start]

20680.639: [CMS-concurrent-sweep: 0.553/0.579 secs]

20680.639: [CMS-concurrent-reset-start]

20680.690: [CMS-concurrent-reset: 0.051/0.051 secs]

20732.561: [GC 20732.562: [DefNew: 362409K->36191K(368640K), 0.9092512 secs] 897737K->571519K(1392640K), 0.9095599 secs]

...

Additionally, after some time, promotion fails and a full GC is performed instead (4 seconds). I am not sure if I am reading this correctly, but it seems that it is the young generation GC that fails (due to young generation guarantee not being satisfied? I thought this was not a concern starting JDK 1.5).

...

32627.882: [GC 32627.883: [DefNew: 363119K->36446K(368640K), 1.2011208 secs] 1165514K->838840K(1392640K), 1.2013846 secs]

32648.847: [GC 32648.848: [DefNew: 158656K->36811K(368640K), 1.2333596 secs] 961051K->839205K(1392640K), 1.2336504 secs]

32661.111: [GC 32661.111: [DefNew (promotion failed): 364491K->364491K(368640K), 1.8605289 secs] 1166885K->1166885K(1392640K), 1.8607401 secs]

GC locker: Trying a full collection because scavenge failed

32662.972: [Full GC 32662.972: [CMS: 802394K->300980K(1024000K), 4.0932922 secs] 1166885K->300980K(1392640K), [CMS Perm : 22534K->22275K(65536K)], 4.0934981 secs]

32716.273: [GC 32716.273: [DefNew: 327680K->1177K(368640K), 0.0396261 secs] 628660K->302157K(1392640K), 0.0398532 secs]

...

Additionally, there are still times when the old generation size jumps 100-200 MB in a matter of seconds so the main problem hasn't gone away.

Thanks.

Srini.

Srini_Padmanabhana at 2007-7-11 23:50:09 > top of Java-index,Java HotSpot Virtual Machine,Specifications...