Tidying up XML produced using JTidy

I am not very experienced with Java XML libraries, but thanks to some great suggestions from members of this forum I have had a head start. I am now stuck on the following problem (my goal is mainly to get data from a web page):

I have used the JTidy example provided on the website to convert a web page, from which I want to get data, into an XML file on my local hard disk.

Now, when I try to extract the data with an XSL file, using the local XML file as the source, the Xalan transformer complains and reports errors in the XML file.

Is there any way I could clean the XML file so that it becomes well-formed, thus enabling me to use XSL to get the data I want?

Any suggestions will be much appreciated.

P.S.: I tried changing the JTidy output setting from tidy.setXmlOut(true) to tidy.setXHTML(true), all to no avail!

[883 byte] By [Antananarivoa] at [2007-11-27 11:26:20]
# 1

Are you saying that JTidy produces XML that is not well-formed, or are you just guessing because you don't understand the error messages?

DrClapa at 2007-7-29 16:09:50
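
One way to tell those two cases apart is to run the file through a plain parser and look at where the first fatal error is reported. The sketch below uses only the standard JAXP/SAX classes; the class name WellFormedCheck and the method firstError are made up for illustration, and it parses strings rather than the poster's actual file. If the reported systemId names something other than the XML file itself (a DTD, for example), the markup may be fine and the problem may lie in whatever the DOCTYPE pulls in.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JTidy or Xalan.
class WellFormedCheck {

    // Returns null if the XML parses cleanly, otherwise a description
    // of the first fatal error: which resource, which line and column.
    static String firstError(String xml) {
        try {
            // DefaultHandler ignores all content and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "systemId=" + e.getSystemId()
                    + " line=" + e.getLineNumber()
                    + " column=" + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O trouble, unrelated to well-formedness.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<html><body><p>ok</p></body></html>"));
        System.out.println(firstError("<html><body><p>broken</body></html>"));
    }
}
```

For a file on disk, the same call can be given new InputSource("outPut_to_xml.xml") instead of the StringReader.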
# 2

> Are you saying that JTidy produces XML that is not
> well-formed, or are you just guessing because you
> don't understand the error messages?

Probably I am saying that because I don't understand the error messages. But the problem is that when I use a different XML file, the transformation works!

The code used for the transformation of the XML file, after it has been processed by JTidy, is as follows:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SimpleTransform
{
    public static void main(String[] args)
        throws TransformerException, TransformerConfigurationException,
               FileNotFoundException, IOException
    {
        TransformerFactory tFactory = TransformerFactory.newInstance();

        // Use the TransformerFactory to instantiate a Transformer that will work with
        // the specified stylesheet. This method call also processes the stylesheet
        // into a compiled Templates object.
        Transformer transformer = tFactory.newTransformer(new StreamSource("outPut_to_xml.xsl"));

        // Use the Transformer to apply the associated Templates object to an XML document
        // and write the output to a file.
        transformer.transform(new StreamSource("outPut_to_xml.xml"), new StreamResult(new FileOutputStream("metoerrrtest.txt")));

        System.out.println("************* The result is in metoerrrtest.txt*************");
    }
}

The error I get is as follows:

-//W3C//DTD HTML 4.01 Transitional//EN; Line #31; Column #3; The declaration for the entity "HTML.Version" must end with '>'.

BUILD SUCCESSFUL (total time: 1 second)

Antananarivoa at 2007-7-29 16:09:50
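
The location in the error message names the HTML 4.01 Transitional DTD, not the XML file. That suggests, though the thread does not confirm it, that the parser is following the DOCTYPE in the JTidy output and choking on the HTML DTD, which is written in SGML syntax rather than XML DTD syntax. Under that assumption, one possible workaround is to hand the transformer a parser whose EntityResolver answers every external entity request with an empty document, so the DTD is never read. The sketch below uses only standard JAXP/SAX classes; the class name SkipDtdTransform, the method transform, and the sample markup and stylesheet are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not the poster's code.
class SkipDtdTransform {

    // Applies the stylesheet to the XML without loading any external DTD.
    static String transform(String xml, String xsl) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // XSLT processing expects namespace-aware input
            XMLReader reader = spf.newSAXParser().getXMLReader();

            // Resolve every external entity (including the DOCTYPE's DTD) to empty content.
            reader.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));

            StringWriter out = new StringWriter();
            transformer.transform(
                    new SAXSource(reader, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input carrying an HTML 4.01 DOCTYPE.
        String xml =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" "
              + "\"http://www.w3.org/TR/html4/loose.dtd\">"
              + "<html><head><title>Hi</title></head><body/></html>";
        // Made-up stylesheet that extracts the page title as plain text.
        String xsl =
                "<xsl:stylesheet version=\"1.0\" "
              + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"text\"/>"
              + "<xsl:template match=\"/\">"
              + "<xsl:value-of select=\"/html/head/title\"/>"
              + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```

With the DTD skipped, named HTML entities such as &nbsp; are no longer declared, so pages that rely on them may still need attention. An alternative worth checking in the JTidy documentation is whether the DOCTYPE can be left out of the output altogether.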