Parsing an HTML file

Hello, I'm connecting to a website and reading in its HTML, and I need a way of recognising tags such as <link> and <item>.

I previously wrote something that pulls out <a href> links. How can I adapt this bit of code to get tags such as <link> or <item>?

URL url = new URL(s1);
URLConnection conn = url.openConnection();
Reader read = new InputStreamReader(conn.getInputStream());

// Parse the page into a Swing HTMLDocument
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
kit.read(read, doc, 0);

// Walk every <a> element in the parsed document
HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
while (it.isValid()) {
    AttributeSet s = it.getAttributes();
    // HREF holds the link target
    String link = (String) s.getAttribute(HTML.Attribute.HREF);
    if (link != null) {
        System.out.println(link);
    }
    it.next();
}
} // closes the enclosing method (not shown)

[1216 byte] By [Unconditionala] at [2007-11-26 13:49:02]
# 1
Also, are there any better ways of doing this than using HTMLEditorKit as above?
Unconditionala at 2007-7-8 1:25:20 > top of Java-index,Java Essentials,New To Java...
# 2

If you wrote that yourself then you shouldn't have trouble adapting it to process LINK tags rather than A tags. I've never come across ITEM tags - do you mean LI?
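
For a tag that HTML.Tag has no constant for, such as <item>, one option is to skip the document model and listen to the parser directly. The following is only a sketch: the class name TagFinder, the method findTags and the sample markup are made up for illustration. It relies on the Swing parser reporting unrecognised tags through handleSimpleTag, with the closing tag marked by an "endtag" attribute.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of the original post.
class TagFinder {

    // Returns one entry per opening tag whose name is in 'names',
    // with the href value appended when the tag has one.
    static List<String> findTags(Reader in, String... names) throws IOException {
        final List<String> wanted = Arrays.asList(names);
        final List<String> found = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            // Empty tags such as <link>, and tags the parser does not
            // recognise such as <item>, arrive here.
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(t, a);
            }

            private void record(HTML.Tag t, MutableAttributeSet a) {
                // Skip the closing half of an unrecognised tag.
                if (a.getAttribute(HTML.Attribute.ENDTAG) != null) {
                    return;
                }
                if (!wanted.contains(t.toString())) {
                    return;
                }
                Object href = a.getAttribute(HTML.Attribute.HREF);
                found.add(href == null ? t.toString() : t + " href=" + href);
            }
        };

        // true = ignore any charset directive found in the page
        new ParserDelegator().parse(in, callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup, invented for this example.
        String html = "<html><head><link rel=\"alternate\" href=\"feed.xml\"></head>"
                + "<body><item>x</item><a href=\"b.html\">b</a></body></html>";
        for (String tag : findTags(new StringReader(html), "link", "item")) {
            System.out.println(tag);
        }
    }
}
```

In the original code, the Reader built from conn.getInputStream() could be passed in place of the StringReader.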

As regards alternative approaches, you could consider converting your HTML to XHTML (using something like TagSoup) and then using XPath on it.
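
A rough sketch of the XPath half, assuming the page has already been converted to well-formed XHTML (the conversion step is not shown here). The class name XhtmlLinks, the method linkHrefs and the sample input are hypothetical. The expression matches on local-name() so that it works whether or not the document declares the XHTML namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original post.
class XhtmlLinks {

    // Returns the href of every <link> element in a well-formed XHTML string.
    static List<String> linkHrefs(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // local-name() ignores the namespace the element lives in
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[local-name()='link']", doc, XPathConstants.NODESET);

        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            hrefs.add(((Element) nodes.item(i)).getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Sample input, invented for this example.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<link rel=\"alternate\" href=\"feed.xml\"/></head>"
                + "<body><p>hi</p></body></html>";
        System.out.println(linkHrefs(xhtml));
    }
}
```

Changing 'link' in the expression to 'item' would select <item> elements in the same way.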

YAT_Archivista at 2007-7-8 1:25:20