String Scraping from Websites
I'm looking to start a new program from scratch where I can scrape the contents of a website to get hotel prices. How would I go about doing this and can anyone point me to any example pages on sun docs? I can't seem to find any.
I want to scrape these details and load them into relevant textfields, then calculate certain things based on the data. Anyone undertaken similar projects or can point me in the right direction?
> I'm looking to start a new program from scratch where
> I can scrape the contents of a website to get hotel
> prices. How would I go about doing this and can
> anyone point me to any example pages on sun docs? I
> can't seem to find any.
Here you go:
http://java.sun.com/docs/books/tutorial/networking/urls/index.html
I once wrote a small application to scrape this site (it generates post/dukes statistics).
I wrote the program just because I wanted to test the features of the Scanner class.
This might get you started. It connects to a page, sets up a scanner and tries to find a match for a regexp.
URL page = new URL(link);
Scanner scanner = new Scanner(page.openStream());
String userId = scanner.findWithinHorizon(USER_REGEXP, 0);
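For anyone who wants something they can compile, here is a minimal self-contained sketch of the same idea. The class name, the `findFirst` helper and the price pattern are all made up for illustration; the real pattern depends on the markup of the page being scraped, and the page address is taken from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerScrapeSketch {
    // Hypothetical pattern: a price such as "$129.99". Adjust it to the real page.
    static final Pattern PRICE_REGEXP = Pattern.compile("\\$\\d+(?:\\.\\d{2})?");

    // Returns the first match in the scanner's input, or null if there is none.
    // A horizon of 0 tells the scanner to search the whole input.
    static String findFirst(Scanner scanner, Pattern pattern) {
        return scanner.findWithinHorizon(pattern, 0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ScannerScrapeSketch <page-url>");
            return;
        }
        URL page = new URL(args[0]);
        // try-with-resources closes the stream and the scanner when done.
        try (InputStream in = page.openStream();
             Scanner scanner = new Scanner(in, "UTF-8")) {
            System.out.println(findFirst(scanner, PRICE_REGEXP));
        }
    }
}
```

Keeping the matching in its own method means it can be tried against a plain string (for example a saved copy of the page) before pointing it at the live site.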
Kaj
kajbja at 2007-7-12 18:29:44 >

I've followed the tutorials in the Sun docs but can't get my program to connect to the website: an exception is thrown saying it can't openConnection().
I get it to work with yahoo using this code:
[code]
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLConnectionReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://www.yahoo.com/");
        URLConnection yc = yahoo.openConnection();
        // Read the response line by line and echo it to the console.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
        in.close();
    }
}
[/code]
But when I change the site it loads up a page on the website called "restricted.php". Anyone know how I can get into the site when something like this happens?
I'll elaborate a bit on my previous post (btw, what is the command to show code on these forums?). Using that code found on the sun docs (http://java.sun.com/docs/books/tutorial/networking/urls/index.html) I was able to get the HTML code to display in the console from the yahoo website.
But when I change the web address to another site, this particular site throws an exception, with this showing up in the console:
"Exception in thread "main" java.lang.Exception: Open connection failed:
java.io.IOException: Server returned HTTP response code: 403 for URL:"
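A 403 means the server received the request but refused to serve it. One common cause is that the site rejects requests carrying Java's default User-Agent header. That is only a guess here, since nothing in this thread confirms it for this particular site. If that is the cause, setting the header before reading from the connection may help. A minimal sketch, where the class name, the `prepare` helper and the "Mozilla/5.0" string are placeholders:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

class UserAgentSketch {
    // Creates the connection object and sets the User-Agent header.
    // openConnection() does not contact the server yet, so the header
    // is in place before the request is actually sent.
    static URLConnection prepare(URL url, String userAgent) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UserAgentSketch <page-url>");
            return;
        }
        // "Mozilla/5.0" is a placeholder value, not something the site is known to require.
        URLConnection connection = prepare(new URL(args[0]), "Mozilla/5.0");
        // The request is sent when the input stream is first requested.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

If the site still answers with 403 or redirects to "restricted.php", it is probably checking something else (cookies, a login, a referrer), and the User-Agent alone won't be enough.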