java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
I am getting this error when parsing an XML document that I created in another Java class.
First, how can I find out which part of the document is actually causing the error?
Second, I have tried several things to work around the problem. The document is declared with UTF-8 encoding. If I open the XML document in Stylus Studio, make a small whitespace change, and save it with no other changes, the parsing succeeds with no errors! Any idea why this might be happening?
Posted by mtraceyza at 2007-10-2 4:36:02

Why is it happening? Because Stylus Studio knows how to write a file in UTF-8 encoding and you don't. Your other class created an XML document that claimed to be encoded in UTF-8 even though it wasn't. If you are writing to a Writer there, you need to create the Writer something like this:
Writer w = new OutputStreamWriter(someOutputStream, "UTF-8");
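To get at the first question (which part of the document is at fault), one option is to scan the file's raw bytes with a strict UTF-8 decoder and report where decoding first fails. This is a minimal sketch assuming Java 7 or later; the class name Utf8Check and the sample price text are made up for illustration. In demo mode it encodes a pound sign with ISO-8859-1, standing in for a non-UTF-8 platform default, which produces the single byte 0xA3. In UTF-8 that value is only legal as a continuation byte, so it is the kind of byte that typically triggers an "Invalid byte 1 of 1-byte UTF-8 sequence" message.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    /** Returns the offset of the first byte that is not valid UTF-8, or -1 if every byte decodes cleanly. */
    static int firstInvalidUtf8Offset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // stop instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than bytes, so this buffer cannot overflow
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        // On a malformed-input result, the bad bytes begin at the input buffer's position
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) throws Exception {
        byte[] data;
        if (args.length > 0) {
            data = Files.readAllBytes(Paths.get(args[0])); // e.g. the XML file that fails to parse
        } else {
            // Demo: a pound sign written as ISO-8859-1, standing in for a non-UTF-8 default encoding
            data = "<price>\u00A35</price>".getBytes(StandardCharsets.ISO_8859_1);
        }
        int offset = firstInvalidUtf8Offset(data);
        if (offset < 0) {
            System.out.println("All bytes are valid UTF-8");
        } else {
            System.out.printf("First invalid UTF-8 byte at offset %d: 0x%02X%n", offset, data[offset] & 0xFF);
        }
    }
}
```

Run it with the path of the failing XML file as its argument and compare the reported offset with a hex view of the file. With no argument, the demo reports offset 7 (byte 0xA3), the position right after "<price>".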
Thanks for the reply. I may be doing something wrong, but I believe I am already doing what you suggest.
StringWriter sw = new StringWriter();
XMLSerializer serializer = new XMLSerializer(sw, new OutputFormat("XML", "UTF-8", true));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document output = db.newDocument();
// ...create various elements and append them to the document
serializer.serialize(output);
PrintWriter pw = new PrintWriter(new FileWriter(outputPath));
pw.write(sw.toString());
pw.flush();
pw.close();
// various try / catch blocks not included
I don't explicitly tell the PrintWriter or FileWriter to use UTF-8, but I do tell the serializer, and it puts the expected declaration at the top of the XML file:
<?xml version="1.0" encoding="UTF-8"?>
So I assumed the encoding would be handled correctly.
Can you tell me more specifically what is wrong with this?
Exactly what I said. You write the data to a Writer like this:
PrintWriter pw = new PrintWriter(new FileWriter(outputPath));
without specifying UTF-8 as the encoding. Just to spell it out in gory detail, you need this:
PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream(outputPath), "UTF-8"));
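Here is a minimal, self-contained sketch of that fix, assuming Java 7 or later (try-with-resources, StandardCharsets, java.nio.file). The names Utf8WriteDemo and writeUtf8 and the sample text are hypothetical. It writes a string through the same PrintWriter/OutputStreamWriter chain and then reads the raw bytes back, so you can see that a non-ASCII character such as the pound sign takes two bytes in UTF-8.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriteDemo {

    /** Writes text to path as UTF-8, independent of the platform default encoding. */
    static void writeUtf8(Path path, String text) throws IOException {
        // StandardCharsets.UTF_8 does the same job as the "UTF-8" name, without a checked exception
        try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(path.toFile()), StandardCharsets.UTF_8))) {
            pw.write(text);
            if (pw.checkError()) { // PrintWriter swallows IOExceptions; checkError() flushes and reports them
                throw new IOException("write to " + path + " failed");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("utf8-demo", ".xml");
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>\u00A35</price>";
        writeUtf8(out, xml);
        byte[] bytes = Files.readAllBytes(out);
        // The pound sign (U+00A3) is written as the two bytes 0xC2 0xA3, so the file is one byte longer
        System.out.println(xml.length() + " chars -> " + bytes.length + " bytes");
        Files.delete(out);
    }
}
```

The byte count should come out exactly one larger than the character count; with a plain FileWriter on a Latin-1 style default encoding, the two counts would be equal and the file would no longer match its own declaration.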
But why are you serializing the data to a String and then writing the String to a file? Why not just serialize directly to the file? I don't know what this XMLSerializer does, but if it can be given an OutputStream as its parameter, then just do that:
serializer.serialize(new FileOutputStream(outputPath));
Then the serializer will be able to use UTF-8 to convert chars to bytes. Your code didn't do that: the serializer only produced a String of characters, and then you did your own conversion to bytes through FileWriter without mentioning the encoding, so FileWriter used the platform's default encoding instead of UTF-8.
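As a sketch of serializing straight to a byte stream, the version below swaps in the standard JAXP identity Transformer that ships with the JDK instead of Apache's org.apache.xml.serialize.XMLSerializer, since that class's exact API is not shown in this thread. The names DirectSerialize and writeXml and the sample price element are hypothetical, and a ByteArrayOutputStream stands in for new FileOutputStream(outputPath) so the example runs without touching the disk. Because the transformer is handed an OutputStream, it performs the char-to-byte conversion itself, using the same encoding it writes into the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DirectSerialize {

    /** Serializes doc straight to out; the transformer itself encodes the characters as UTF-8. */
    static void writeXml(Document doc, OutputStream out) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity transform, no stylesheet
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8"); // used for both the declaration and the bytes
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element price = doc.createElement("price");
        price.setTextContent("\u00A35");
        doc.appendChild(price);

        // A ByteArrayOutputStream stands in for new FileOutputStream(outputPath)
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeXml(doc, buf);

        // Round trip: the bytes parse cleanly because they match the declared encoding
        Document back = db.parse(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(back.getDocumentElement().getTextContent().equals("\u00A35"));
    }
}
```

The design point is the same one made above: never turn the serializer's output into a String and re-encode it yourself; let whatever writes the declaration also choose the bytes.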
Well, thanks. I am making progress now: after writing the XML file with your suggested approach, I can read it back in.
The XMLSerializer class is org.apache.xml.serialize.XMLSerializer.
I will look into whether it takes an OutputStream in its constructor.
I picked up these approaches in an XML course I took last spring.
Now that I am processing this file the way you suggested, I have run into an XSLT TransformerException that I will have to chase down.