Making the DOM parser less strict

Hi,

I'd like to know if there's a way to make the DOM parser less strict. The document I need to parse is not mine, and I have no way of controlling how it looks. So, what do you reckon?

Here is the stack trace of the DOM parser's exception: [Fatal Error] :2:62: White spaces are required between publicId and systemId.

MyException

at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:264)

at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:292)

at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:98)

I catch the exception and then throw my own with the original stack trace assigned, which is why the top line looks so "funny".

Switching to an alternative DOM parser would also suit me. Do you know of any less strict ones? (The only requirement is that I need XPath functionality.)

By the way, the XML document looks fine. I reviewed it myself and found no unclosed tags or anything like that.

Thanks for your interest.

Message was edited by:

AdamW

[1086 byte] By [AdamWa] at [2007-11-27 7:04:57]
# 1
Hello. Would you mind posting the code, so that we can take a better look at it?
haishaia at 2007-7-12 18:56:15 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

The code doesn't matter, it's the input XML that has the problem. (The error message says what the problem is.)

No, there are no parsers that accept badly-formed XML. The XML Recommendation says that parsers, to be compliant, may only parse well-formed XML and must reject anything else.

It follows that it is the responsibility of whoever sent you that document to fix it up and send you a well-formed version.
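For what it's worth, the message quoted in the question is consistent with an HTML 4.01-style DOCTYPE that gives a public identifier but no system identifier, which XML does not allow after PUBLIC. The sketch below is only an illustration under that assumption: the markup string and the class and method names are made up, since the real document was never posted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DoctypeDemo {
    // Hypothetical sample input: a public ID with no system ID after it.
    // This is legal in HTML 4.01 but not in XML.
    static final String HTML_DOCTYPE =
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
        + "<html><body></body></html>";

    // Returns the parse error, or null if the markup parsed cleanly.
    static SAXParseException tryParse(String markup) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParseException e = tryParse(HTML_DOCTYPE);
        if (e == null) {
            System.out.println("parsed without error");
        } else {
            // Line and column point at the spot where the parser gave up.
            System.out.println(e.getLineNumber() + ":" + e.getColumnNumber()
                + " " + e.getMessage());
        }
    }
}
```

The same markup without the DOCTYPE line parses without complaint, which suggests the DOCTYPE is the first thing to check in the real input.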

DrClapa at 2007-7-12 18:56:15
# 3

Well, thanks. It's actually not an XML file I tried to parse but an HTML page from the web, so as I said I have no say over how it looks. I believe I will need regular expressions as a workaround.

My question came from the fact that I spotted methods like setValidating in the DOM factory. It seems to help a bit (I've tested it on other examples), but not enough for the site I chose to be accepted.
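As a side note on setValidating: it switches DTD validation on or off, and it has no effect on the well-formedness checks, which the parser always performs. A minimal sketch of that behaviour, assuming a made-up snippet with an unclosed br tag as the input (the class name, method name and sample strings are illustrative only):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidatingDemo {
    // Hypothetical HTML-style input: <br> is never closed, so this is not well-formed XML.
    static final String TAG_SOUP = "<html><body><p>one<br>two</p></body></html>";
    static final String WELL_FORMED = "<html><body><p>one<br/>two</p></body></html>";

    // Returns true if the markup parses, false if the parser reports a fatal error.
    static boolean parses(String markup, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable (validation) errors
        // and rethrows fatal errors.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("tag soup, validating off: " + parses(TAG_SOUP, false));
        System.out.println("tag soup, validating on:  " + parses(TAG_SOUP, true));
        System.out.println("well-formed, validating off: " + parses(WELL_FORMED, false));
    }
}
```

Whichever way the flag is set, the unclosed tag is still a fatal error, so the flag cannot be used to relax the parser for real-world HTML.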

Thanks for the answers!

AdamWa at 2007-7-12 18:56:15
# 4

> Well, thanks. It's actually not an XML file I tried to
> parse but an HTML page from the web, so as I said I have
> no say over how it looks.

In that case, yes, you can't complain that it's not well-formed XML. But you can turn it into well-formed XML by running tools like JTidy or TagSoup on it. Then you can use XML-aware software like the DOM parser on the result. I would recommend trying that.
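Once the clean-up step has produced well-formed markup, the standard JAXP classes cover the XPath requirement from the original question. Here is a rough sketch of that second half only. The CLEANED string is a hand-written stand-in for tidied output, not something JTidy or TagSoup actually produced, and the class and method names are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAfterTidy {
    // Assumed stand-in for the output of a tidy step: well-formed, no DOCTYPE, no namespace.
    static final String CLEANED =
        "<html><head><title>Example</title></head><body>"
        + "<p><a href=\"/one\">One</a><br/><a href=\"/two\">Two</a></p>"
        + "</body></html>";

    // Parses the markup into a DOM tree and collects every href attribute on an <a> element.
    static List<String> hrefs(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
            (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hrefs(CLEANED)); // prints [/one, /two]
    }
}
```

If the tidied output puts its elements in the XHTML namespace, the XPath expression would need a namespace-aware builder and a NamespaceContext; the stand-in above avoids that to keep the example short.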

DrClapa at 2007-7-12 18:56:15