Parsing raw HTML Text

Hello all

I am working on a program that connects to 5 different sites (Yahoo, Google, etc.), gets the stock information/points, and brings it back to one site. I have been reading a book for a while, and it tells me how to connect and everything, but it keeps saying that when we use the openStream() method of the URL class, it returns raw HTML.

How can I get the information I want from the raw HTML? I know there isn't any specific method that can be used to just get to the information I want from the raw HTML, but a little help from you all on how to parse HTML tags would really help.

Please, I would appreciate any help, or a pointer to any good sample code online on this matter.

Thanks

[753 byte] By [bhaarat_javaa] at [2007-10-2 5:39:14]
# 1

> I am working on a program that connects to 5
> different sites (Yahoo, Google, etc.) and gets the
> stock information/points and brings it back to one
> site. I have been reading a book for a while and it
> tells me how to connect and everything, but every time
> it keeps saying that when we use the openStream()
> method of the URL class, it returns raw HTML.

Wow. You request an HTML page and, guess what, you get HTML. I wouldn't consider HTML "raw", by the way. It's actually very rich in content and formatting information.

> How can I get the information I want from the raw
> HTML? I know there isn't any specific method that can
> be used to just get to the information I want from
> the raw HTML, but a little help from you all on how
> to parse HTML tags would really help.

If it's true XHTML, an XML parser might help. Otherwise, simply type "java html parser" into Google. Lots of results.

How about looking into webservices instead? If you just want raw data, why not ask for it?
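For the XHTML case mentioned above, here is a minimal sketch of what using the JDK's built-in XML parser could look like. It assumes the page is well-formed XHTML and that the value sits in the table cell right after a cell labelled "Last". That layout, the class name XhtmlQuote and the method valueAfterLabel are made up for illustration; a real page will need its own lookup logic.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlQuote {

    // Returns the text of the <td> that follows the <td> containing the label,
    // or null if no such cell is found. The "label cell, then value cell"
    // layout is an assumption made for this sketch.
    public static String valueAfterLabel(String xhtml, String label) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Collect every <td> element in document order.
            NodeList cells = doc.getElementsByTagName("td");
            for (int i = 0; i + 1 < cells.getLength(); i++) {
                String cellText = cells.item(i).getTextContent().trim();
                if (label.equals(cellText)) {
                    return cells.item(i + 1).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            // Not well-formed XML (typical for ordinary HTML pages) ends up here.
            throw new IllegalStateException("Could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, well-formed sample input.
        String page = "<html><body><table>"
                + "<tr><td>Last</td><td>42.17</td></tr>"
                + "</table></body></html>";
        System.out.println(valueAfterLabel(page, "Last"));
    }
}
```

This only works when the markup really is well-formed XML. Ordinary HTML with unclosed tags will make the parser throw, which is why an HTML parser is the fallback.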

CeciNEstPasUnProgrammeura at 2007-7-16 1:49:34
# 2

Ok

I am now able to strip all the HTML tags from the raw HTML (I can't use XML sites, though I wish I could).

So now that all the tags are off, I should look for the keyword "Last", because the information I need to extract from the data I received is near it.

The following code strips the HTML tags.

import java.io.IOException;
import java.io.OutputStreamWriter;
import javax.swing.text.html.HTMLEditorKit;

// Parser callback that writes only the text between tags to the given writer.
public class TagStripper extends HTMLEditorKit.ParserCallback {

    private final OutputStreamWriter out;
    //private String out;

    public TagStripper(OutputStreamWriter out) {
        this.out = out;
    }

    // Called by the parser for each run of text found between tags.
    @Override
    public void handleText(char[] text, int position) {
        try {
            out.write(text);
            out.flush();
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

I was thinking that somewhere in the [code]handleText(char[] text, int position)[/code] method I could search the data that is about to be written out for the keyword "Last". I saw that the String class has a method called indexOf(), which returns the index at which the supplied substring starts.

But my question is: how do I build a String and copy all the data into it, since the data is going to an OutputStreamWriter?

Please give some direction

bhaarat_javaa at 2007-7-16 1:49:34
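One possible direction for the question above, as a rough sketch rather than the definitive answer: instead of writing each chunk to an OutputStreamWriter, the callback can append it to a StringBuilder, and the finished String can then be searched with indexOf(). The class name TextCollector, the extractText helpers and the sample HTML are made up for illustration. ParserDelegator is the Swing class that drives an HTMLEditorKit.ParserCallback.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Variant of the tag stripper that collects the text in memory
// instead of writing it to a stream.
public class TextCollector extends HTMLEditorKit.ParserCallback {

    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of text between tags.
    @Override
    public void handleText(char[] data, int position) {
        // A space keeps text from neighbouring tags from running together.
        text.append(data).append(' ');
    }

    public String getText() {
        return text.toString();
    }

    // Parses HTML from any Reader, for example one wrapped around
    // the stream returned by URL.openStream().
    public static String extractText(Reader html) throws IOException {
        TextCollector collector = new TextCollector();
        new ParserDelegator().parse(html, collector, true);
        return collector.getText();
    }

    // Convenience overload for HTML that is already held in a String.
    public static String extractText(String html) {
        try (Reader reader = new StringReader(html)) {
            return extractText(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample page; a real quote page will look different.
        String html = "<html><body><table>"
                + "<tr><td>Last</td><td>42.17</td></tr>"
                + "</table></body></html>";
        String stripped = extractText(html);

        // indexOf returns -1 when the keyword does not occur.
        int index = stripped.indexOf("Last");
        if (index >= 0) {
            System.out.println(stripped.substring(index));
        }
    }
}
```

From there, what follows "Last" still has to be picked apart by hand (substring, split, or a regular expression), and that part depends entirely on how each site lays out its page.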