Large variance in performance
I posted this question in the "Java programming" forum before: http://forum.java.sun.com/thread.jspa?threadID=5172497 but since I didn't get replies I am now trying it here.
I am performing a sequential computation on a linux pentium 4 machine with hyperthreading, j2se 5.0 ergonomics classifies it as server class machine (so it is run in the ServerVM with the ParallelGC garbage collector). When i perform 11 runs sequentially I get the following times per run:
18.237953
12.537785
21.937158
30.941151
33.154015
32.846168
12.452541
32.997084
32.312898
34.365582
32.99406
The garbage collection is as follows:
0.740: [GC [PSYoungGen: 3072K->416K(3584K)] 3072K->416K(32320K), 0.0078130 secs]
1.010: [GC [PSYoungGen: 3488K->496K(6656K)] 3488K->604K(35392K), 0.0055310 secs]
1.354: [GC [PSYoungGen: 6640K->496K(6656K)] 6748K->1004K(35392K), 0.0097850 secs]
18.253: [GC [PSYoungGen: 6640K->504K(12800K)] 7148K->4236K(41536K), 0.0464690 secs]
18.448: [GC [PSYoungGen: 12792K->496K(12800K)] 16524K->4412K(41536K), 0.0071670 secs]
19.031: [GC [PSYoungGen: 12784K->2744K(27328K)] 16700K->6660K(56064K), 0.0283700 secs]
31.391: [GC [PSYoungGen: 26808K->784K(27840K)] 30724K->4922K(56576K), 0.0110660 secs]
53.050: [GC [PSYoungGen: 24848K->368K(51712K)] 28986K->4618K(80448K), 0.0050360 secs]
116.870: [GC [PSYoungGen: 48496K->3568K(51712K)] 52746K->8330K(80448K), 0.0496570 secs]
150.434: [GC [PSYoungGen: 51696K->2336K(49024K)] 56458K->7102K(77760K), 0.0179350 secs]
195.362: [GC [PSYoungGen: 48992K->272K(45568K)] 53758K->5042K(74304K), 0.0023420 secs]
261.851: [GC [PSYoungGen: 45520K->4096K(48256K)] 50290K->8870K(76992K), 0.0405260 secs]
Changing the garbage collector, e.g. to -XX:+UseTrainGC, greatly improves the measured times:
18.52931
12.87743
15.654636
12.859898
12.854254
12.789243
12.775043
12.704667
13.45864
12.723262
12.71799
This garbage collector runs much more frequently, yet the performance of the application is much better. I believe this has to do with the size of the young generation, which continually grows in the default case, but is fixed at around 4mb in the second case:
157.140: [GC 157.140: [ParNew: 3968K->0K(4032K), 0.0003940 secs] 12251K->8283K(16320K), 0.0005060 secs]
I performed one more test where the heap size (Xms and Xmx) is set to 1 GB, which results in run times:
38.439123
33.043899
32.994742
33.035311
33.425609
32.955553
33.248379
32.792764
32.954896
32.643395
33.348452
And the garbage collector reports:
104.630: [GC [PSYoungGen: 87424K->272K(101952K)] 87424K->272K(1034048K), 0.0024790 secs]
204.088: [GC [PSYoungGen: 87696K->328K(101952K)] 87696K->328K(1034048K), 0.0019860 secs]
303.121: [GC [PSYoungGen: 87752K->360K(101952K)] 87752K->360K(1034048K), 0.0025450 secs]
Does anybody have an idea where the VM in the default (first) and third cases spends all the extra time on, or know tools that can be used to identify it ?

