Releasing tenure space explicitly?
Hi,
It might sound sacrilegious, but I would like to have better, may I say explicit, control over memory handling, especially garbage collection.
I am working on an application which has to read 120+ million data records from a file, sort and summarize them (and write them into another file). I build up a huge TreeMap for this.
The problem is, that the memory might not be enough for the whole TreeMap (not even 16Gigs). Of course, I expect this, so when there is no more memory available, I write the TreeMap into a sorted partial output file and at the end of the processing I merge the sorted partial files into the final output file.
It would be all nice and cool (and even fast enough) if Java's memory handling was not working against me.
First of all, it is not easy to say when the heap will be full. The leaves of the TreeMap obviously will end up in the tenure space, but I cannot check how much free space is in it as the only info I can get (Runtime.freeMemory/totalMemory/maxMemory) is about the status of the whole heap. So it might be possible that I believe that there are still a few gigabytes of free memory while the tenure space is full. ****Is there no way to know the tenure space usage?**** I overcome this problem by establishing a threshold: size of young space + 10% of tenure space must be free. But occasionally the young space is full while the tenure space is far from being full and I am forced to write a partial file (which takes a lot of time).
Second: after writing the partial file, I can clear and delete the TreeMap and build a new one. But since I cannot control garbage collection, I cannot tell the JVM to clear up the previous tree and give me a nice, huge, clear memory. I can only hope that it will happen in time (well, I abuse System.runFinalization() and System.gc() and that so far did the trick). ****Why can't I say: I want garbage collection now! I can wait while you clean up.**** I do not think that would be unsafe.
I have a feeling that Java assumes that the programmer is not smart enough to know when he can clear up his mess. It's tying my hands although I know exactly what I need.
All in all, I believe a System.gcNow(), a System.freeTenureSpace() (and the other related functions), and maybe functions to retrieve memory settings like young/tenure ratio would help those who know and dare. The rest could happily ignore them the way they ignored garbage collection anyway.
Free hands for the programmers!
Regards,
Szilard Barany, Idiro Technologies
[2568 byte] By [Szilarda] at [2007-10-3 0:36:28]
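
The spill-and-merge scheme described in the post above (sorted partial files, merged at the end) can be sketched as a k-way merge. This is only an illustration under stated assumptions: the class name `RunMerger`, the "key<TAB>count" line format, and summing counts for equal keys are all hypothetical choices, since the post does not say what the records or the summary look like.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.PriorityQueue;

final class RunMerger {
    /** One open sorted run plus its current record. */
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader in;
        String key;
        long count;

        Cursor(BufferedReader in) { this.in = in; }

        /** Reads the next "key<TAB>count" line; returns false at end of run. */
        boolean advance() throws IOException {
            String line = in.readLine();
            if (line == null) return false;
            int tab = line.lastIndexOf('\t');   // assumes well-formed lines
            key = line.substring(0, tab);
            count = Long.parseLong(line.substring(tab + 1));
            return true;
        }

        @Override public int compareTo(Cursor o) { return key.compareTo(o.key); }
    }

    /** Merges runs that are each sorted by key, summing counts of equal keys. */
    static void merge(List<BufferedReader> runs, Writer out) {
        try {
            PriorityQueue<Cursor> heap = new PriorityQueue<>();
            for (BufferedReader r : runs) {
                Cursor c = new Cursor(r);
                if (c.advance()) heap.add(c);
            }
            String currentKey = null;
            long total = 0;
            while (!heap.isEmpty()) {
                Cursor c = heap.poll();          // smallest key across all runs
                if (currentKey != null && !currentKey.equals(c.key)) {
                    out.write(currentKey + "\t" + total + "\n");
                    total = 0;
                }
                currentKey = c.key;
                total += c.count;
                if (c.advance()) heap.add(c);    // re-queue the run if it has more
            }
            if (currentKey != null) out.write(currentKey + "\t" + total + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only one record per partial file is held in memory during the merge, so this phase needs very little heap regardless of the total data volume.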

> --
> The problem is, that the memory might not be
> enough for the whole TreeMap (not even 16Gigs). Of
> course, I expect this, so when there is no more
> memory available, I write the TreeMap into a sorted
Do you detect the fact that there is no more heap memory available
based on getting an OutOfMemoryError?
> partial output file and at the end of the processing
> I merge the sorted partial files into the final
> output file.
> --
...
> --
> First of all, it is not easy to say when the
> heap will be full. The leaves of the TreeMap obviously
> will end up in the tenure space, but I cannot check
> how much free space is in it as the only info I can
> get (Runtime.freeMemory/totalMemory/maxMemory) is about the status
> of the whole heap. So it might be possible that I
> believe that there are still a few gigabytes of free
> memory while the tenure space is full. ****Is there
> no way to know the tenure space usage?**** I overcome
Not that I think your approach above is the right way to solve your problem,
but check the Monitoring and Management APIs in JDK 5.0
for how to query more specific details of the heap status,
as well as how to set a "low memory threshold" notification
service. See:
http://java.sun.com/j2se/1.5.0/docs/guide/management/index.html
You may redirect follow-up posts to the Monitoring and Management
Forum for more detailed expertise and help, should you decide
to use that functionality.
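
As a rough illustration of what those APIs offer, the sketch below reads the free space of the tenured pool and arms a "low memory" flag that is set when the pool is still above a threshold after a collection. The `java.lang.management` calls are real, but the class name `TenuredPoolProbe`, the 0.9 fraction, and the way the tenured pool is picked (the largest heap pool that supports a usage threshold) are assumptions; pool names and layout depend on the JVM and collector in use. The lambda syntax also assumes Java 8 or later rather than JDK 5.0.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

final class TenuredPoolProbe {
    /** Set by the listener when the pool is still nearly full after a GC. */
    static final AtomicBoolean lowMemory = new AtomicBoolean(false);

    /** Heuristic (an assumption): largest heap pool supporting a usage threshold. */
    static MemoryPoolMXBean findTenuredPool() {
        MemoryPoolMXBean best = null;
        long bestMax = Long.MIN_VALUE;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null || pool.getType() != MemoryType.HEAP
                    || !pool.isUsageThresholdSupported()) {
                continue;
            }
            if (best == null || u.getMax() > bestMax) {
                best = pool;
                bestMax = u.getMax();
            }
        }
        return best;
    }

    /** Free bytes in the tenured pool, or -1 if no such pool was identified. */
    static long tenuredFreeBytes() {
        MemoryPoolMXBean pool = findTenuredPool();
        MemoryUsage u = (pool == null) ? null : pool.getUsage();
        if (u == null) return -1;
        // getMax() is -1 when undefined; fall back to the committed size.
        long limit = u.getMax() >= 0 ? u.getMax() : u.getCommitted();
        return limit - u.getUsed();
    }

    /** Arms a post-GC threshold at the given fraction of the pool's maximum. */
    static boolean armLowMemoryFlag(double fraction) {
        if (fraction <= 0 || fraction > 1) return false;
        MemoryPoolMXBean pool = findTenuredPool();
        if (pool == null || !pool.isCollectionUsageThresholdSupported()) return false;
        long max = pool.getUsage().getMax();
        if (max <= 0) return false;
        pool.setCollectionUsageThreshold((long) (max * fraction));
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                    .equals(notification.getType())) {
                lowMemory.set(true);
            }
        };
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(listener, null, null);
        return true;
    }

    public static void main(String[] args) {
        System.out.println("tenured free bytes: " + tenuredFreeBytes());
        System.out.println("threshold armed: " + armLowMemoryFlag(0.9));
    }
}
```

The processing loop could then poll `TenuredPoolProbe.lowMemory` between records and write a partial file when it becomes true, instead of comparing whole-heap numbers.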
> this problem by establishing a threshold: size of
> young space + 10% of tenure space must be free. But
> occasionally the young space is full while the tenure
> space is far from being full and I am forced to write
> a partial file (which takes a lot of time).
>
> Second: after writing the partial file, I can
> clear and delete the TreeMap and build a new one. But
> since I cannot control garbage collection, I cannot
> tell JVM to clear up the previous tree and give me a
> nice, huge clear memory. I can only hope that it will
> happen in time (well, I abuse
> System.runFinalization() and System.gc() and that so
> far did the trick). ****Why can't I say: I want
> garbage collection now! I can wait while you clean
> up.**** I do not think that would be unsafe.
>
I do not understand your question here, since you clearly
do know about System.gc() which does precisely what you
want above. Perhaps you could elaborate a bit?
> I have a feeling that Java assumes that the
> programmer is not smart enough to know when he can
> clear up his mess. It's tying my hand although I know
> exactly what I need.
>
> All in all, I believe a System.gcNow(), a
> System.freeTenureSpace() (and the other related
> functions), and maybe functions to retrieve memory
> settings like young/tenure ratio would help those
> who know and dare. The rest could happily ignore
> them the way they ignored garbage collection
> anyway.
> --
> Free hands for the programmers!
>
As I noted, the "Monitoring and Management APIs" pointer above
may provide you some of the information you are seeking.
However, the "management" part of it does not today and is
unlikely to, at least in the near future, provide a means by which
you could ask for GC of selective subspaces of the heap,
which although theoretically possible would likely not be
what programmers would want to have to deal with and would
lead to an overly complex and unportable implementation of
the typical application.
> Regards,
>
>Szilard Barany, Idiro Technologies
Hi, thank you for your reply.
> > --
> > The problem is, that the memory might not be
> > enough for the whole TreeMap (not even 16Gigs). Of
> > course, I expect this, so when there is no more
> > memory available, I write the TreeMap into a sorted
>
> Do you detect the fact that there is no more heap memory available
> based on getting an OutOfMemoryError?
No. My understanding is that when I get an OutOfMemoryError, it is already too late: there is no memory left, not even enough for cleanup (i.e. I cannot recover from this error). I check Runtime.freeMemory() continuously in the application, and when it goes below a calculated limit (which I calculate as a given percentage of totalMemory), I save the partial results and release memory.
> Not that I think your approach above is the right way to solve your problem,
> but check the Monitoring and Management APIs in JDK 5.0
> for how to query more specific details of the heap status,
> as well as how to set a "low memory threshold" notification
> service. See:
>
> http://java.sun.com/j2se/1.5.0/docs/guide/management/index.html
>
> You may redirect follow-up posts to the Monitoring and Management
> Forum for more detailed expertise and help, should you decide
> to use that functionality.
I had a quick look at the API and it seems very interesting and useful. Nonetheless, your comment, "Not that I think your approach above is the right way to solve your problem", made me curious: where is the problem in my approach? Could you suggest something better?
> > System.runFinalization() and System.gc() and that so
> > far did the trick). ****Why can't I say: I want
> > garbage collection now! I can wait while you clean
> > up.**** I do not think that would be unsafe.
> >
>
> I do not understand your question here, since you clearly
> do know about System.gc() which does precisely what you
> want above. Perhaps you could elaborate a bit?
My understanding is that I can only request GC to run, but it is the JVM's decision when it will actually happen, thus it might happen too late for me. Am I wrong?
> However, the "management" part of it does not today and is
> unlikely to, at least in the near future, provide a means by which
> you could ask for GC of selective subspaces of the heap,
> which although theoretically possible would likely not be
> what programmers would want to have to deal with and would
> lead to an overly complex and unportable implementation of
> the typical application.
What I really hoped to have is tools that would allow those who dare (or are in need) to explicitly manage the heap, while the rest (e.g. me most of the time) could use the current, automatic solution.
A brief intro at the end: I am originally an (Oracle) database designer/developer, not a Java developer. I had this data record summary/merge problem, which was originally solved in the database. Well, not by me; I only inherited this solution, which I had to improve because it took 4 weeks (and a lot of memory) to run. With my limited Java knowledge/experience, my prototype solution did the same thing in 15 hours. If I can solve the heap handling satisfactorily, it would be a production-level solution (and would not require a lot of memory). I was thinking about C, but the ease of use of the Java APIs was too tempting.
Thanks for your help,
Szilard
Hi Szilard --
> I check
> Runtime.freeMemory() continuously in the application,
> and when it goes below a calculated limit (which I
> calculate as a given percentage of totalMemory),
> I save the partial results and release memory.
>
...
> ... Nonetheless, your comment,
> "Not that I think your approach above is the right
> way to solve your problem", made me curious: where
> is the problem in my approach? Could you suggest
> something better?
I wonder if you might be able to bound the heap usage of
your program statically to some sufficiently small
value by other means, and release memory the way
you do above but without explicitly querying the heap
state to make that decision. Armed with the knowledge
that your program will never exceed this statically
computed threshold of heap memory, you can just
let GC take care of recycling unused space at the
appropriate times.
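
One way to read "bounding the heap usage statically" is to cap the number of entries the TreeMap may hold and spill whenever the cap is reached, with no heap queries at all. The sketch below is only that reading, not something prescribed in this thread: the class name `BoundedAggregator`, the String/Long record shape, the summing of values, and the `spill` callback (which would write a sorted partial file) are hypothetical. The cap itself would have to be derived from a measured per-entry footprint and the configured -Xmx.

```java
import java.util.TreeMap;
import java.util.function.Consumer;

final class BoundedAggregator {
    private final int maxEntries;                          // static bound on map size
    private final Consumer<TreeMap<String, Long>> spill;   // e.g. writes a sorted run
    private TreeMap<String, Long> current = new TreeMap<>();

    BoundedAggregator(int maxEntries, Consumer<TreeMap<String, Long>> spill) {
        if (maxEntries < 1) throw new IllegalArgumentException("maxEntries < 1");
        this.maxEntries = maxEntries;
        this.spill = spill;
    }

    /** Adds one record, summing values of equal keys; spills at the bound. */
    void add(String key, long value) {
        current.merge(key, value, Long::sum);
        if (current.size() >= maxEntries) {
            flush();
        }
    }

    /** Hands the sorted map to the spill callback and starts a fresh one. */
    void flush() {
        if (current.isEmpty()) return;
        spill.accept(current);
        // Dropping the only reference makes the old tree garbage; the collector
        // can reclaim it whenever the new map needs the space.
        current = new TreeMap<>();
    }
}
```

A caller would invoke `add` for every input record and `flush` once at the end of the input, then merge the partial files as before.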
> My understanding is that I can only request GC to
> run, but it is the JVM's decision when it will
> actually happen, thus it might happen too late for
> me. Am I wrong?
You are right that the spec itself is weak in the sense that
an implementation matching the above description would be compliant.
Nonetheless, every JVM that I know of today interprets the
System.gc() call more strictly, and indeed, when the call
returns the JVM has already made a best effort to reclaim
unused space in the heap. The only problem is that by the time
the call returns, finalizers of all finalizable objects may not necessarily
have run, and the space used by these Reference objects and
their cohorts may not have been reclaimed as a result.
Were you asking for something that had the effect of waiting for the
finalization queue to drain before returning?
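
For anyone who wants to check how their own JVM treats the call, a small probe like the following compares used heap before and after System.gc(). It is a sketch with a hypothetical class name (`GcProbe`) and an arbitrary allocation size; the result is not guaranteed by the specification, and on HotSpot the call can be turned into a no-op with -XX:+DisableExplicitGC, so the reclaimed figure may be zero or even negative.

```java
final class GcProbe {
    /** Used heap as seen through Runtime: total minus free, in bytes. */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /**
     * Allocates the given number of 1 MB blocks, drops them, requests a GC,
     * and returns how many bytes of used heap disappeared across the call.
     */
    static long reclaimedAfterDrop(int megabytes) {
        byte[][] blocks = new byte[megabytes][];
        for (int i = 0; i < megabytes; i++) {
            blocks[i] = new byte[1 << 20];
        }
        long before = usedHeap();
        blocks = null;      // the blocks are now unreachable
        System.gc();        // a request; most JVMs collect before returning
        long after = usedHeap();
        return before - after;
    }

    public static void main(String[] args) {
        System.out.println("bytes reclaimed: " + reclaimedAfterDrop(64));
    }
}
```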
> A brief intro at the end: I am originally an (Oracle)
> database designer/developer, not a Java developer.
> I had this data record summary/merge problem, which
> was originally solved in the database. Well,
> not by me; I only inherited this solution, which I had to
> improve because it took 4 weeks (and a lot of memory)
> to run. With my limited Java knowledge/experience, my
> prototype solution did the same thing in 15 hours. If
> I can solve the heap handling satisfactorily, it would
> be a production-level solution (and would not require
> a lot of memory). I was thinking about C, but the
> ease of use of the Java APIs was too tempting.
Hopefully, some combination of the Java M&M APIs
for low memory notification or statically bounding the
heap usage would allow you to continue using Java
and achieve your end.
