Crawling in Java

Hi,

I want to download an entire website to my local system so that I can browse it offline.

I found some ready-made code from different vendors, but they ship with their own GUIs and have written their own listeners and methods around them, so I would have to change my code a lot to integrate them. Also, some of the crawlers do not download .css and .js files.

Can anybody point me to a crawler that downloads every file regardless of its type or extension, and that I can integrate without changing much of my code?

[532 byte] By [Tabrezkhana] at [2007-11-27 6:33:08]
# 1
Here's a monster: http://crawler.archive.org/
quittea at 2007-7-12 17:58:56 > top of Java-index,Java Essentials,Java Programming...
# 2

> Here's a monster:

> http://crawler.archive.org/

@OP: note that with Heritrix you will have to do quite a bit of work yourself before you can navigate the crawled content on your (hard) disk *. Heritrix does not rewrite URLs; it just tries to download as much as you tell it to (including JS and CSS).

Perhaps a crawler like HTTrack (not written in Java!) is closer to what you're after: it rewrites the URLs in the documents it downloads so that you can easily browse your archived websites once the crawler is done.
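
To make "rewriting URLs" concrete, here is a rough sketch of the idea in Java. It is not how HTTrack actually does it; the class LinkRewriter, its method names and the mirror/host/path layout are all made up for illustration. It ignores query strings and anything that isn't a plain http(s) link.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns links found in a downloaded page into
// relative links that work inside a local mirror directory.
class LinkRewriter {

    // Assumed layout: <root>/<host>/<url path>, with index.html for "directory" URLs.
    static Path localPath(Path root, URI url) {
        String path = url.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path += "index.html";
        }
        // Drop the leading '/' so the path resolves below the host directory.
        return root.resolve(url.getHost()).resolve(path.substring(1)).normalize();
    }

    // Resolve href against the page it appears on, then express the target's
    // local file relative to the directory that holds the page's local file.
    static String rewrite(Path root, URI page, String href) {
        URI target = page.resolve(href);
        Path fromDir = localPath(root, page).getParent();
        Path to = localPath(root, target);
        return fromDir.relativize(to).toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        Path root = Paths.get("mirror");
        URI page = URI.create("http://example.com/docs/a/page.html");
        // prints ../../css/site.css
        System.out.println(rewrite(root, page, "/css/site.css"));
    }
}
```

A real crawler would also have to parse the HTML/CSS to find the links in the first place and deal with links to pages it never downloaded, which is exactly the work HTTrack saves you.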

Both can be operated without the graphical components they're shipped with. And best of all: they're free.
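
For example, something along these lines should let you start HTTrack from your own Java code without touching its GUI. This is only a sketch: it assumes the httrack executable is installed and on your PATH, and that -O sets the output directory as described in HTTrack's command-line documentation (check httrack --help on your install). HttrackRunner and its methods are names I made up.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper that runs the HTTrack command-line tool as a child process.
class HttrackRunner {

    // Builds the command line: httrack <url> -O <outputDir>
    static List<String> command(String url, String outputDir) {
        return Arrays.asList("httrack", url, "-O", outputDir);
    }

    // Starts HTTrack, forwards its console output, and waits for it to finish.
    // Returns the exit code of the process.
    static int mirror(String url, String outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(url, outputDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = mirror("http://www.example.com/", "mirror");
        System.out.println("httrack exited with code " + exitCode);
    }
}
```

That way your own code only needs to know a URL and a target directory; all the crawling logic stays inside the external tool.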

Good luck.

* You will need one of these tools:

Wayback: http://archive-access.sourceforge.net/projects/wayback/user_manual.html

WERA: http://archive-access.sourceforge.net/projects/wera/

prometheuzza at 2007-7-12 17:58:56