GC hell ?

Hi,

I'm running JBoss 4.0.5 on Redhat Linux 64 and java 1.5.0_12. The server has two dual core processor and ten gigabyte of ram.

I start JBoss with :

-Xms4096m -Xmx5120m -XX:PermSize=128m -XX:MaxPermSize=256m -XX:NewSize=2048m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -Xloggc:/tmp/gc.log

I have the following problem. At server start the load is very small and the gc is called not very often, for example :

4.356: [GC 314590K->8523K(3932160K), 0.0177040 secs]

4.374: [Full GC 8523K->8318K(3932160K), 0.0849990 secs]

47.272: [GC 1581182K->59216K(3932160K), 0.0860340 secs]

314.685: [GC 1632080K->61497K(3932160K), 0.0628620 secs]

834.124: [GC 1634361K->66492K(3932160K), 0.0693030 secs]

1235.281: [GC 1639356K->66265K(3932160K), 0.0582550 secs]

1519.406: [GC 1639129K->65943K(4109248K), 0.0779130 secs]

2211.883: [GC 1815895K->72202K(3932160K), 0.0790670 secs]

2478.914: [GC 1822154K->75776K(4103296K), 0.0777540 secs]

2725.280: [GC 1993984K->75759K(4031872K), 0.0193570 secs]

2794.302: [GC 1993967K->74001K(4102336K), 0.0129670 secs]

3026.186: [GC 1989009K->77645K(4017088K), 0.0134620 secs]

3604.464: [GC 737189K->79960K(4102464K), 0.0168140 secs]

3604.481: [Full GC 79960K->79901K(4102464K), 0.3914900 secs]

4350.582: [GC 1993373K->86781K(4102464K), 0.0103160 secs]

4634.533: [GC 2000253K->88447K(4105664K), 0.0127350 secs]

5441.497: [GC 2006655K->96467K(4104000K), 0.0134960 secs]

5736.696: [GC 2014675K->94458K(4110528K), 0.0101000 secs]

6554.140: [GC 2022074K->100914K(4108544K), 0.0102490 secs]

6952.151: [GC 2028530K->104049K(4116992K), 0.0095100 secs]

Than after some time that I found depends on the value of NewSize parameter the GC is called a lot of times and the load reach 100% and sometimes 200%. For example :

143806.289: [GC 2258900K->162852K(4193728K), 0.0046960 secs]

143817.558: [GC 2258916K->162884K(4193792K), 0.0044520 secs]

143829.238: [GC 2259012K->162916K(4193792K), 0.0043090 secs]

143840.359: [GC 2259044K->163174K(4193728K), 0.0064450 secs]

143851.252: [GC 2259302K->164061K(4193792K), 0.0050780 secs]

143862.932: [GC 2260189K->164197K(4193152K), 0.0049390 secs]

143874.515: [GC 2259685K->164197K(4193472K), 0.0042450 secs]

143885.381: [GC 2259685K->164773K(4193216K), 0.0052980 secs]

143896.948: [GC 2260005K->164797K(4193344K), 0.0044450 secs]

143908.172: [GC 2260029K->164805K(4193408K), 0.0060260 secs]

143917.971: [GC 2260165K->164893K(4193408K), 0.0053640 secs]

143928.683: [GC 2260253K->164933K(4193536K), 0.0047570 secs]

143939.384: [GC 2260485K->164909K(4193472K), 0.0039900 secs]

143950.040: [GC 2260461K->164909K(4193600K), 0.0040570 secs]

143960.082: [GC 2260589K->164981K(4193536K), 0.0052420 secs]

143970.591: [GC 2260661K->164997K(4193664K), 0.0043150 secs]

143979.495: [GC 1933704K->167118K(4193600K), 0.0110230 secs]

The server is responsive even under such load but I really want to understand why this happens and how to solve this.

The applications makes heavy use of persistence and caches a lot of data (in web sessions) that it's probably not used very frequently but it has a limit on the cache size so I think it's not consuming all the memory.

Any ideas ?

How should I read the gc log ? I don't know how GC works, I tired to read the documentation but it's not very clear to me.

Thank you

[3736 byte] By [Andrea_Vettoria] at [2007-11-27 7:10:55]
# 1

> ... time that I found depends on the value of NewSize parameter

> the GC is called a lot of times

> the load reach 100% and sometimes 200%.

> For example :

>

> 143806.289: [GC 2258900K->162852K(4193728K), 0.0046960 secs]

> 143817.558: [GC 2258916K->162884K(4193792K), 0.0044520 secs]

> 143829.238: [GC 2259012K->162916K(4193792K), 0.0043090 secs]

> 143840.359: [GC 2259044K->163174K(4193728K), 0.0064450 secs]

> 143851.252: [GC 2259302K->164061K(4193792K), 0.0050780 secs]

> 143862.932: [GC 2260189K->164197K(4193152K), 0.0049390 secs]

> 143874.515: [GC 2259685K->164197K(4193472K), 0.0042450 secs]

> 143885.381: [GC 2259685K->164773K(4193216K), 0.0052980 secs]

> 143896.948: [GC 2260005K->164797K(4193344K), 0.0044450 secs]

> 143908.172: [GC 2260029K->164805K(4193408K), 0.0060260 secs]

> 143917.971: [GC 2260165K->164893K(4193408K), 0.0053640 secs]

> 143928.683: [GC 2260253K->164933K(4193536K), 0.0047570 secs]

> 143939.384: [GC 2260485K->164909K(4193472K), 0.0039900 secs]

> 143950.040: [GC 2260461K->164909K(4193600K), 0.0040570 secs]

> 143960.082: [GC 2260589K->164981K(4193536K), 0.0052420 secs]

> 143970.591: [GC 2260661K->164997K(4193664K), 0.0043150 secs]

> 143979.495: [GC 1933704K->167118K(4193600K), 0.0110230 secs]

>

> The server is responsive even under such load

> but I really want to understand why this happens

> and how to solve this.

>

> Any ideas ?

>

> How should I read the gc log?

GC runs for typically less than 0.01 sec approx. every 10 sec.

Not unreasonable.

You should probably look somewhere else for what is taking all that CPU.

tschodta at 2007-7-12 19:02:29 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 2
You need more details about GC process. Use an additional parameter -XX:+PrintGCDetails and post the detailed log.
david327a at 2007-7-12 19:02:29 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 3

Now the load is low...

33253.562: [GC [PSYoungGen: 2021120K->7488K(2061056K)] 2166028K->152396K(4158208K), 0.0108700 secs]

33663.405: [GC [PSYoungGen: 2031296K->5666K(2059904K)] 2176204K->153581K(4157056K), 0.0118820 secs]

34173.009: [GC [PSYoungGen: 2029474K->6368K(2063104K)] 2177389K->158858K(4160256K), 0.0116990 secs]

34693.014: [GC [PSYoungGen: 2034272K->6961K(2061952K)] 2186762K->164552K(4159104K), 0.0142110 secs]

35076.203: [GC [PSYoungGen: 2034865K->5305K(2065472K)] 2192456K->167683K(4162624K), 0.0200090 secs]

35332.122: [GC [PSYoungGen: 2038009K->5472K(2064384K)] 2200387K->170514K(4161536K), 0.0142350 secs]

35405.222: [GC [PSYoungGen: 2038176K->2848K(2067328K)] 2203218K->172610K(4164480K), 0.0163260 secs]

35479.758: [GC [PSYoungGen: 2039584K->2687K(2066560K)] 2209346K->174761K(4163712K), 0.0078300 secs]

35686.838: [GC [PSYoungGen: 2039423K->3279K(2069184K)] 2211497K->177441K(4166336K), 0.0103820 secs]

36013.278: [GC [PSYoungGen: 1821492K->3422K(2068096K)] 1995653K->180303K(4165248K), 0.0112310 secs]

36013.289: [Full GC [PSYoungGen: 3422K->0K(2068096K)] [PSOldGen: 176881K->95158K(2097152K)] 180303K->95158K(4165248K) [PSPermGen: 72579K->72234K(131072K)], 0.5131960 secs]

36492.221: [GC [PSYoungGen: 2040128K->7649K(2072128K)] 2135286K->102808K(4169280K), 0.0095690 secs]

36868.986: [GC [PSYoungGen: 2052897K->3739K(2070272K)] 2148056K->104325K(4167424K), 0.0120880 secs]

37159.652: [GC [PSYoungGen: 2048987K->4385K(2074240K)] 2149573K->107676K(4171392K), 0.0127090 secs]

37395.041: [GC [PSYoungGen: 2054561K->4830K(2073088K)] 2157852K->111846K(4170240K), 0.0085560 secs]

37601.763: [GC [PSYoungGen: 2055006K->4511K(2076480K)] 2162022K->115846K(4173632K), 0.0088880 secs]

37736.801: [GC [PSYoungGen: 2059231K->12335K(2075392K)] 2170566K->127525K(4172544K), 0.0227330 secs]

Reading the documentation it seems like i'm throwing away 2gbs of garbage in the young gen... is it correct ?

How can I find what's inside that 2gbs ?

Andrea_Vettoria at 2007-7-12 19:02:29 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 4

Now the load is high

44793.817: [GC [PSYoungGen: 2094912K->64K(2096064K)] 2480754K->386074K(4193216K), 0.0052980 secs]

44800.755: [GC [PSYoungGen: 2095040K->64K(2096064K)] 2481050K->386074K(4193216K), 0.0052930 secs]

44808.860: [GC [PSYoungGen: 2095040K->576K(2096000K)] 2481050K->386586K(4193152K), 0.0075910 secs]

44817.285: [GC [PSYoungGen: 2095552K->240K(2096064K)] 2481562K->386298K(4193216K), 0.0056800 secs]

44825.646: [GC [PSYoungGen: 2095216K->192K(2096192K)] 2481274K->386338K(4193344K), 0.0056710 secs]

44832.525: [GC [PSYoungGen: 2095360K->128K(2096128K)] 2481506K->386386K(4193280K), 0.0059990 secs]

44841.843: [GC [PSYoungGen: 2095296K->320K(2096256K)] 2481554K->386626K(4193408K), 0.0051910 secs]

44852.122: [GC [PSYoungGen: 2095680K->128K(2096256K)] 2481986K->386594K(4193408K), 0.0062860 secs]

44859.063: [GC [PSYoungGen: 2095488K->64K(2096320K)] 2481954K->386578K(4193472K), 0.0069760 secs]

44867.598: [GC [PSYoungGen: 2095552K->815K(2096320K)] 2482066K->387538K(4193472K), 0.0077110 secs]

Andrea_Vettoria at 2007-7-12 19:02:29 > top of Java-index,Java HotSpot Virtual Machine,Specifications...
# 5

> Now the load is high

>

> 44793.817: [GC [PSYoungGen: 2094912K->64K(2096064K)]

> 2480754K->386074K(4193216K), 0.0052980 secs]

> 44800.755: [GC [PSYoungGen: 2095040K->64K(2096064K)]

> 2481050K->386074K(4193216K), 0.0052930 secs]

> 44808.860: [GC [PSYoungGen: 2095040K->576K(2096000K)]

> 2481050K->386586K(4193152K), 0.0075910 secs]

> 44817.285: [GC [PSYoungGen: 2095552K->240K(2096064K)]

> 2481562K->386298K(4193216K), 0.0056800 secs]

> 44825.646: [GC [PSYoungGen: 2095216K->192K(2096192K)]

> 2481274K->386338K(4193344K), 0.0056710 secs]

> 44832.525: [GC [PSYoungGen: 2095360K->128K(2096128K)]

> 2481506K->386386K(4193280K), 0.0059990 secs]

> 44841.843: [GC [PSYoungGen: 2095296K->320K(2096256K)]

> 2481554K->386626K(4193408K), 0.0051910 secs]

> 44852.122: [GC [PSYoungGen: 2095680K->128K(2096256K)]

> 2481986K->386594K(4193408K), 0.0062860 secs]

> 44859.063: [GC [PSYoungGen: 2095488K->64K(2096320K)]

> 2481954K->386578K(4193472K), 0.0069760 secs]

> 44867.598: [GC [PSYoungGen: 2095552K->815K(2096320K)]

> 2482066K->387538K(4193472K), 0.0077110 secs]

You have lots of objects with a very short life. In other words, your application need lots of memory to store short-lived objects (2Gb in 7 seconds). Is there any particular reason for this? On the other hand you are only using a small portion of your tenured generation (2Gb of free memory under heavy load?).

Large stacks takes more time to garbage collect but it seems you don't have trouble with GC pauses (<0,01 secs).

You can use a profiler to discover what happens in the stack , but I suggest you to use a test environment with a much smaller stack. For this you can use Eclipse TPTP.

Also, if you have no confidence on application supplier you can do a code review and look for unneeded object creation or bad practices in codebase (you can use PMD, Checkstyle...) .

david327a at 2007-7-12 19:02:29 > top of Java-index,Java HotSpot Virtual Machine,Specifications...