String Scraping from Websites

I'm looking to start a new program from scratch that scrapes the contents of a website to get hotel prices. How would I go about doing this, and can anyone point me to example pages in the Sun docs? I can't seem to find any.

I want to scrape these details and load them into the relevant text fields, then calculate certain things based on the data. Has anyone undertaken a similar project, or can anyone point me in the right direction?

By spencer.bowmana at 2007-11-27 6:54:39
# 1

> I'm looking to start a new program from scratch where I can scrape the contents of a website to get hotel prices. How would I go about doing this and can anyone point me to any example pages on sun docs? I can't seem to find any.

Here you go:

http://java.sun.com/docs/books/tutorial/networking/urls/index.html

prometheuzza at 2007-7-12 18:29:44 > top of Java-index,Java Essentials,Java Programming...
# 2

I once wrote a small application to scrape this site (it generates post/dukes statistics).

I wrote the program just because I wanted to test the features of the Scanner class.

This might get you started. It connects to a page, sets up a Scanner, and tries to find a match for a regular expression.

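[code]import java.net.URL;
import java.util.Scanner;

// link (the page address) and USER_REGEXP (a regex String) are defined elsewhere
URL page = new URL(link);
Scanner scanner = new Scanner(page.openStream());
// Horizon 0 means "search the whole input"; returns null if nothing matches
String userId = scanner.findWithinHorizon(USER_REGEXP, 0);[/code]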
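To try the same Scanner approach without hitting a live site, here is a minimal, self-contained sketch that loops `findWithinHorizon` over a hard-coded HTML fragment and then does a simple calculation on the results. The markup, the `price` class name, the `PRICE` pattern and the `PriceScraper`/`extractPrices` names are all made-up assumptions; a real hotel page will need its own pattern.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class PriceScraper {
    // Hypothetical pattern: assumes prices look like <span class="price">$123.45</span>
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\$(\\d+(?:\\.\\d{2})?)</span>");

    /** Returns every price found in the given HTML, in document order. */
    static List<BigDecimal> extractPrices(String html) {
        List<BigDecimal> prices = new ArrayList<>();
        Scanner scanner = new Scanner(html);
        // Horizon 0 searches all remaining input; each hit advances the scanner
        // past the match, and null is returned once no further match exists
        while (scanner.findWithinHorizon(PRICE, 0) != null) {
            MatchResult m = scanner.match();
            prices.add(new BigDecimal(m.group(1)));
        }
        scanner.close();
        return prices;
    }

    public static void main(String[] args) {
        String html = "<ul>"
                + "<li>Hotel A <span class=\"price\">$120.00</span></li>"
                + "<li>Hotel B <span class=\"price\">$89.50</span></li>"
                + "<li>Hotel C <span class=\"price\">$150</span></li>"
                + "</ul>";
        List<BigDecimal> prices = extractPrices(html);
        System.out.println("Prices: " + prices);   // prints "Prices: [120.00, 89.50, 150]"
        System.out.println("Cheapest: "
                + prices.stream().min(BigDecimal::compareTo).get()); // prints "Cheapest: 89.50"
    }
}
```

BigDecimal is used instead of double so that money values are not subject to binary floating-point rounding when you start calculating with them.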

Kaj

kajbja at 2007-7-12 18:29:44
# 3

If you are using Firefox, then you should get [url=https://addons.mozilla.org/en-US/firefox/addon/3829]this[/url] plugin, which lets you analyze HTTP headers. If you go to Google Translate, for instance, the URL string is hidden, but with the plugin you can see the whole request and build your own translator, currency converter, etc.
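Once a header inspector shows you the full request URL and its query parameters, you can rebuild that request from Java. Below is a minimal sketch of assembling such a URL with properly encoded parameters using `java.net.URLEncoder`. The host `translate.example.com`, the parameter names `q`, `from` and `to`, and the `QueryUrlBuilder` class are hypothetical placeholders, not Google's actual interface.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryUrlBuilder {
    /** Appends URL-encoded key=value pairs to a base URL, keeping the map's order. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // URLEncoder uses form encoding: spaces become '+', '&' becomes %26
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and parameters, standing in for whatever
        // the header inspector reveals for the real site
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "hotel prices & rates");
        params.put("from", "en");
        params.put("to", "fr");
        // prints "http://translate.example.com/translate?q=hotel+prices+%26+rates&from=en&to=fr"
        System.out.println(build("http://translate.example.com/translate", params));
    }
}
```

Encoding each value matters because characters such as `&`, `=` or spaces in a parameter would otherwise change the meaning of the query string. The `Charset` overload of `URLEncoder.encode` needs Java 10 or later.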

_helloWorld_a at 2007-7-12 18:29:44
# 4

I've followed the tutorials in the Sun docs but can't seem to get the website to connect: an exception is thrown saying it can't openConnection().

I can get it to work with Yahoo using this code:

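[code]import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        URLConnection yc = yahoo.openConnection();
        // Wrap the raw byte stream in a reader so the page can be read line by line
        BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()));
        String inputLine;
        // Print each line of the returned HTML until the end of the stream
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}[/code]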

But when I change the URL to the other site, it loads a page on that site called "restricted.php". Does anyone know how I can get into the site when this happens?
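If the site sends you to "restricted.php", one likely explanation (an assumption; the server might instead serve that page directly) is an HTTP redirect. `HttpURLConnection` follows redirects automatically by default, so the code never sees it happen. A minimal sketch: turn that off with `setInstanceFollowRedirects(false)` and print the status code and `Location` header. The local `HttpServer`, the `/hotels` path and the `RedirectCheck` class are hypothetical stand-ins that mimic the described behaviour so the example runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;

public class RedirectCheck {
    /** Requests the URL without following redirects; returns "status -> Location". */
    static String describe(URL url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setInstanceFollowRedirects(false); // the default (true) would follow silently
        try {
            int code = conn.getResponseCode();
            // Location is only sent with redirects; getHeaderField returns null otherwise
            return code + " -> " + conn.getHeaderField("Location");
        } finally {
            conn.disconnect();
        }
    }

    static String run() throws Exception {
        // Hypothetical stand-in for the real site: /hotels redirects to /restricted.php
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/hotels", exchange -> {
            exchange.getResponseHeaders().add("Location", "/restricted.php");
            exchange.sendResponseHeaders(302, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return describe(URI.create("http://127.0.0.1:" + port + "/hotels").toURL());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "302 -> /restricted.php"
    }
}
```

Seeing the redirect only tells you *that* the server is turning you away. If it depends on a login or session cookie, the request would also need to carry that, which this sketch does not attempt.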

spencer.bowmana at 2007-7-12 18:29:44
# 5

I'll elaborate a bit on my previous post (by the way, what is the tag for showing code on these forums?). Using the code from the Sun docs (http://java.sun.com/docs/books/tutorial/networking/urls/index.html), I was able to display the HTML of the Yahoo website in the console.

But when I change the web address to this particular site, it throws an exception, and this shows up in the console:

[code]Exception in thread "main" java.lang.Exception: Oopen connection fialedd:
java.io.IOException: Server returned HTTP response code: 403 for URL:[/code]
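HTTP 403 means the server understood the request but refused it. One common cause (an assumption here, not confirmed for this site) is that some servers reject clients sending Java's default `User-Agent` header, which looks like `Java/<version>`. Below is a minimal sketch that sets a browser-like agent with `setRequestProperty` and checks `getResponseCode()` before reading, so an error status is reported instead of `getInputStream()` throwing the `IOException` seen above. The local `HttpServer` that blocks Java's default agent, the agent string `"Mozilla/5.0 (example)"` and the `UserAgentFetch` class are hypothetical stand-ins so the example runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentFetch {
    /** Fetches the URL, optionally overriding User-Agent; returns "status: body". */
    static String fetch(URL url, String userAgent) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        int code = conn.getResponseCode();
        // getInputStream() throws for 4xx/5xx; the error body is on getErrorStream()
        InputStream stream = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
        StringBuilder body = new StringBuilder();
        if (stream != null) {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) body.append(line);
            }
        }
        conn.disconnect();
        return code + ": " + body;
    }

    static String run() throws Exception {
        // Hypothetical stand-in server: 403 for agents starting with "Java", 200 otherwise
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = agent == null || agent.startsWith("Java");
            byte[] reply = (blocked ? "forbidden" : "hotel list").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/").toURL();
            return fetch(url, null) + " | " + fetch(url, "Mozilla/5.0 (example)");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "403: forbidden | 200: hotel list"
    }
}
```

If changing the agent does not help, the 403 is probably caused by something else (for example a required cookie or login), and the error body returned by `getErrorStream()` is often the quickest clue.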

spencer.bowmana at 2007-7-12 18:29:44