How to handle garbage collection in a recursive function

I'm writing a web crawler which basically looks like this:

// Entity class representing a page on the web
class Page
{
    ...
    Page[] children;
    .....
}

public void crawl(Page parentPage)
{
    ....
    Page[] children = getChildrenUsingSomeFunction(parentPage);
    parentPage.setChildren(children);
    entitymanager.persist(parentPage); // write parentPage to database
    for (Page child : children)
    {
        crawl(child);
    }
    .....
}

After writing an object to the database, I want to remove that object from main memory. A page is always referenced by its parent page, except for the root pages, so the garbage collector will never "remove" a page object from memory. But when crawling a million pages, 1 gig of RAM is obviously not sufficient.

How would you solve this problem?

[1240 byte] By [loestera] at [2007-11-27 5:59:34]
# 1

I'd suggest not using a tree structure here. What you want is a flat associative table keyed on URL, with the parent URL stored in the data. Keep a queue of URLs to be scanned: your program repeatedly takes the top URL off the queue, checks that it isn't already in your database and, if not, stores it, finds all the links on that page, and adds them to the end of the queue.
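A minimal sketch of that queue-based loop might look like the following. It is only an illustration under stated assumptions: the class name QueueCrawler, the Entry helper, and the fetchLinks parameter are hypothetical, and a HashMap stands in for the database table keyed on URL. In a real crawler the lookup and insert would go to the database, so the in-memory state would be little more than the queue itself.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

class QueueCrawler {

    // Hypothetical queue entry: a URL plus the URL of the page it was found on.
    private static final class Entry {
        final String url;
        final String parentUrl;

        Entry(String url, String parentUrl) {
            this.url = url;
            this.parentUrl = parentUrl;
        }
    }

    /**
     * Crawls from rootUrl using a FIFO queue instead of recursion.
     * Returns a flat table mapping each visited URL to its parent URL
     * (null for the root). The map is a stand-in for the database table.
     * fetchLinks is an assumed callback that returns the links on a page.
     */
    static Map<String, String> crawl(String rootUrl,
                                     Function<String, List<String>> fetchLinks) {
        Map<String, String> parentByUrl = new HashMap<>();
        Queue<Entry> pending = new ArrayDeque<>();
        pending.add(new Entry(rootUrl, null));

        while (!pending.isEmpty()) {
            Entry current = pending.remove(); // take the URL at the head of the queue
            if (parentByUrl.containsKey(current.url)) {
                continue; // already stored, so skip it
            }
            parentByUrl.put(current.url, current.parentUrl); // "write to database"
            for (String link : fetchLinks.apply(current.url)) {
                pending.add(new Entry(link, current.url)); // append to the end of the queue
            }
            // 'current' is unreachable after this iteration, so the GC can reclaim it.
        }
        return parentByUrl;
    }

    public static void main(String[] args) {
        // Tiny made-up link graph used in place of real HTTP fetches.
        Map<String, List<String>> web = new HashMap<>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("a", "c"));
        web.put("c", Collections.<String>emptyList());

        Map<String, String> table =
            crawl("a", url -> web.getOrDefault(url, Collections.<String>emptyList()));
        System.out.println(table);
    }
}
```

Because no page object holds references to its children, nothing keeps already-processed pages alive. Note that the queue can still hold duplicate URLs until they are dequeued and skipped; checking the table before enqueuing would be one way to keep the queue smaller.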

malcolmmca at 2007-7-12 16:36:18