XMLParser exception

Hi..

I am getting the following exception while running a servlet on Apache Tomcat:

[Fatal Error] :15:184: The element type "HR" must be terminated by the matching end-tag "</HR>".

From XMLParser

org.xml.sax.SAXParseException: The element type "HR" must be terminated by the matching end-tag "</HR>".

Please could anyone suggest a solution?

-Neha Narkhede

[409 byte] By [Nehaa] at [2007-10-2 9:54:21]
# 1

Are you parsing HTML with an XML parser?

HTML is not XML-conformant. Specifically, for this exception: HTML allows unclosed tags such as <HR>, but XML requires every element to be closed.
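To illustrate the difference, here is a minimal sketch using only the JDK's built-in XML parser. The class name UnclosedTagDemo and the helper isWellFormed are made up for this example, and the sample markup is not taken from the original page. With the default error handler, the JDK parser should also print a "[Fatal Error]" line to stderr for the first input.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class UnclosedTagDemo {

    // Hypothetical helper: true if the markup parses as well-formed XML.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Covers SAXParseException, e.g. a missing end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal in HTML, but not well-formed XML: <HR> is never closed.
        System.out.println(isWellFormed("<html><body><HR></body></html>"));
        // The XHTML-style spelling closes the element, so an XML parser accepts it.
        System.out.println(isWellFormed("<html><body><HR/></body></html>"));
    }
}
```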

If you just want to parse it as HTML, use an HTML parser.

If you want to parse it as XML and you control the source, use XHTML.

If you want to parse arbitrary HTML as XML, you should convert the HTML to XML first. There is an application/library called Tidy (with a Java version) which cleans up HTML and should also offer the option to convert it to XML.

It's on my list of features for this very week to add HTML support to an XML analysis application I wrote :) I haven't gotten around to coding it yet, but Tidy seems to be the best tool for this. It's recommended in many places and, judging from the Javadocs, it will do the trick.

Lokoa at 2007-7-16 23:59:07 > top of Java-index,Other Topics,Patterns & OO Design...
# 2

Hi..

I am reading HTML code from a URL and applying Tidy. After that I am building a DOM object and then using TextExtractingVisitor, which extracts text from this DOM object.

Please can you tell me which files to check to solve this problem?

Thank you.

-Neha

Nehaa at 2007-7-16 23:59:07
# 3

Are you using the asXml option in Tidy, and does it run without giving any error messages?

Lokoa at 2007-7-16 23:59:07
# 4

Hi..

I am using the following options with Tidy

tidy.setQuiet(false);

tidy.setShowWarnings(false);

tidy.setXmlOut(true);

tidy.setMakeClean(true);

tidy.setHideEndTags(true);

tidy.parse(in,out);

Please tell me which other options I should try to get rid of that error about missing end tags.

Thank you very much.

-Neha

Nehaa at 2007-7-16 23:59:07
# 5

> Hi..
>
> I am using the following options with Tidy
>
> tidy.setQuiet(false);
> tidy.setShowWarnings(false);
> tidy.setXmlOut(true);
> tidy.setMakeClean(true);
> tidy.setHideEndTags(true);
> tidy.parse(in,out);
>
> Please tell me which other options I should try to
> remove that error of missing end tags..
>
> Thank you very much.
>
> -Neha

I haven't gone into the coding details yet but if you run the application version of Tidy and pass -asxml it does produce end tags.

Why are you calling tidy.setHideEndTags(true)?

Lokoa at 2007-7-16 23:59:07
# 6

Thanks a lot. But are you talking about some specific version of Tidy? Please could you tell me more on this?

-Neha

Nehaa at 2007-7-16 23:59:07
# 7

I got the Windows exe from http://tidy.sourceforge.net/ and the Java library from http://sourceforge.net/projects/jtidy.

The exe seems to do end tags correctly when using the -asxml command line argument. From this I assume the library version can do it as well, since it has setXmlOut. I can't give you any more detail because I haven't done it myself yet. When I get to it, I'll post from work with the solution, assuming I can find it :)

Again, why are you calling setHideEndTags(true)? This seems to indicate you want it to hide end tags.
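In the meantime, it may help to find out whether the text Tidy writes to out is itself well-formed, or whether the XML parser is being handed something else (for example the original HTML). The ":15:184:" in your error message is the line and column where the parser gave up. Below is a rough sketch that uses only the JDK parser and reports the same position for any byte array you give it. The class name WellFormedCheck and the method firstError are invented for this example, and it assumes you capture the Tidy output in a byte array (e.g. via a ByteArrayOutputStream) before building the DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    // Hypothetical helper: returns null if the bytes are well-formed XML,
    // otherwise "line:column: message" for the first fatal error.
    static String firstError(byte[] xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml));
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the bytes Tidy wrote to "out"; not real Tidy output.
        byte[] sample = "<html><body><HR></body></html>"
                .getBytes(StandardCharsets.UTF_8);
        String error = firstError(sample);
        System.out.println(error == null ? "well-formed" : error);
    }
}
```

If this reports no error for the Tidy output but the servlet still fails, the parser is probably being fed a different stream than the one Tidy produced.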

Lokoa at 2007-7-16 23:59:07