java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.

I am getting this error when parsing an XML document that I created in another Java class.

First, how can I find out which part of the document is actually causing the error?

Second, I have tried several things to work around the problem. The document is declared with UTF-8 encoding. If I open the XML document in Stylus Studio, make a small whitespace change, and then save the document with no other changes, the parsing succeeds with no errors! Any idea why this might be happening?

[512 byte] By [mtraceyza] at [2007-10-2 4:36:02]
# 1

Why is it happening? Because Stylus Studio knows how to write a file in UTF-8 encoding and you don't. Your other class created an XML document which claimed to be encoded in UTF-8 even though it wasn't. If you are writing to a Writer there, you need to create the Writer something like this: Writer w = new OutputStreamWriter(someOutputStream, "UTF-8");
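As for the first question (which part of the document is causing the error), one option is to run the raw bytes through a strict UTF-8 decoder and report where it first fails. The following is only a minimal sketch using the standard java.nio.charset classes; the class name Utf8Check and the method firstInvalidOffset are made up for illustration, and it reads the whole file into memory, which assumes the file is reasonably small.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
class Utf8Check {
    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if everything decodes. */
    public static int firstInvalidOffset(byte[] data) {
        // REPORT makes the decoder stop at bad input instead of silently replacing it.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                return in.position(); // the decoder stops at the start of the bad sequence
            }
            if (result.isUnderflow()) {
                return -1;            // all input consumed without errors
            }
            out.clear();              // overflow: discard decoded chars and keep going
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("first invalid byte offset: " + firstInvalidOffset(data));
    }
}
```

Once you have the offset, open the file in a hex viewer at that position to see which character was written in the wrong encoding.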

DrClapa at 2007-7-16 0:08:36 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

Thanks for the reply. I may be doing something wrong, but I believe I am already doing what you suggest.

StringWriter sw = new StringWriter();

XMLSerializer serializer = new XMLSerializer(sw, new OutputFormat("XML", "UTF-8", true));

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

DocumentBuilder db = dbf.newDocumentBuilder();

Document output = db.newDocument();

...create various elements and append them to the document

serializer.serialize(output);

PrintWriter pw = new PrintWriter(new FileWriter(outputPath));

pw.write(sw.toString());

pw.flush();

pw.close();

Various try/catch blocks not included.

I don't explicitly tell the PrintWriter or FileWriter to use UTF-8, but I do tell the serializer, and it puts the expected heading at the top of the xml file:

<?xml version="1.0" encoding="UTF-8"?>

so, I think it should know how to do it.

Can you tell me more specifically how this is in error?

mtraceyza at 2007-7-16 0:08:36
# 3

Exactly what I said. You write the data to a Writer like this: PrintWriter pw = new PrintWriter(new FileWriter(outputPath));

without specifying UTF-8 as the encoding. Just to spell it out in gory detail, you need this: PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream(outputPath), "UTF-8"));
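To see why the charset given to the Writer matters, here is a small sketch that pushes the same text through an OutputStreamWriter with two different encodings and returns the bytes. The class EncodingDemo and its encode method are hypothetical names for illustration only. FileWriter(outputPath) behaves like the second case whenever the platform default charset is not UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demo class, not part of any library.
class EncodingDemo {
    /** Encodes text the way a Writer with the given charset would write it to a file. */
    public static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charsetName)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // U+00E9 (e with acute accent) is two bytes in UTF-8 and one byte in ISO-8859-1.
        System.out.println(encode("\u00e9", "UTF-8").length);
        System.out.println(encode("\u00e9", "ISO-8859-1").length);
    }
}
```

A file whose header says UTF-8 but which contains the single-byte form is exactly the kind of input a UTF-8 parser rejects.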

But why are you serializing the data to a String and then writing the String to a file? Why not just serialize direct to the file? I don't know what this XMLSerializer does but if it can be given an OutputStream as its parameter then just do that: serializer.serialize(new FileOutputStream(outputPath));

Then the serializer will be able to use UTF-8 to convert chars to bytes. Your solution didn't do that: you serialized to a String and then let FileWriter do its own conversion to bytes, using the platform's default encoding, without mentioning UTF-8.
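If you would rather not depend on a particular serializer class, the standard JAXP Transformer can write a DOM Document straight to an OutputStream. Note that this swaps in a different technique from the XMLSerializer in the earlier post; it is a sketch under that assumption, and DirectSerialize, toUtf8Bytes, sample and roundTrip are made-up names. A ByteArrayOutputStream stands in for the file so the example is self-contained; in real code you would pass new FileOutputStream(outputPath) instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of any library.
class DirectSerialize {
    /** Serializes the document to bytes; the transformer itself does the char-to-byte conversion. */
    public static byte[] toUtf8Bytes(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Builds a tiny document containing a non-ASCII character. */
    public static Document sample() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("caf\u00e9");
        doc.appendChild(root);
        return doc;
    }

    /** Writes the sample document and parses it back, returning the root element's text. */
    public static String roundTrip() throws Exception {
        byte[] xml = toUtf8Bytes(sample());
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document parsed = db.parse(new ByteArrayInputStream(xml));
        return parsed.getDocumentElement().getTextContent();
    }
}
```

Because no Writer or String sits between the serializer and the stream, there is no second place where an encoding could be chosen by accident.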

DrClapa at 2007-7-16 0:08:36
# 4

Well, thanks. I am making progress now. After writing the XML file with your suggested approach, I can read it back in.

The XMLSerializer comes from

org.apache.xml.serialize.XMLSerializer

I will look more into whether it takes an OutputStream in its constructor.

I assembled these approaches in an XML course I took last spring.

Now that I am processing this file the way you suggested, I have introduced an XSLT TransformerException that I will have to chase down.

mtraceyza at 2007-7-16 0:08:36