Parse exception with French characters
My parsing program throws a SAXParseException when parsing some French characters; in particular a "? character. These characters are enclosed in a CDATA section, so I'm not sure why they are being parsed. My file is encoded in UTF-8. How can I resolve this problem? Thanks
org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:264)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:292)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
# 2
> Your file is not encoded in UTF-8, although the XML
> prolog claims it is.
Meaning the sender/creator of the file is at fault? If not UTF-8, then probably ISO-8859-1, correct? Thanks
# 3
> > Your file is not encoded in UTF-8, although the XML
> > prolog claims it is.
>
> Meaning the sender/creator of the file is at fault?
> If not UTF-8, then probably ISO-8859-1, correct?
Most likely the creator of the file is at fault. But it's possible that it has been through some transformation between them and you that rewrote it in a new charset.
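If you want to confirm that diagnosis before going back to the sender, a strict decode of the raw bytes will tell you whether the file is well-formed UTF-8. The following is only a minimal sketch: the class name `Utf8Check` and the method `isValidUtf8` are made up for illustration. The error message itself is consistent with a single-byte encoding: in ISO-8859-1 and windows-1252 an "é" is the single byte 0xE9, which UTF-8 reads as the first byte of a 3-byte sequence, so the next byte is rejected when it is not a continuation byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class Utf8Check {
    /** Returns true if the bytes are a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently
        // substituting a replacement character for bad input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

You could call it as `Utf8Check.isValidUtf8(Files.readAllBytes(Paths.get("input.xml")))`, where `input.xml` stands in for your own file. A result of `false` means the prolog's UTF-8 claim cannot be right.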
Sure, the real encoding could be ISO-8859-1. Or it could be windows-1252. Other encodings are possible but unlikely, I suppose.
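If you can't get the file fixed at the source, one possible workaround is to decode the bytes yourself and hand the parser a character stream. Per the SAX `InputSource` documentation, a parser given a character stream reads it directly and disregards the encoding declaration in the prolog. This is a sketch under the assumption that you have worked out the real encoding; the class name `ParseWithCharset` and its `parse` method are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class ParseWithCharset {
    /**
     * Parses XML bytes using the charset the caller believes is the
     * real one, instead of the one named in the XML prolog.
     */
    static Document parse(byte[] xmlBytes, Charset actual) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Decoding happens in the Reader, so the parser never sees raw bytes.
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), actual)) {
            return builder.parse(new InputSource(reader));
        }
    }
}
```

For example, `ParseWithCharset.parse(bytes, StandardCharsets.ISO_8859_1)` or `ParseWithCharset.parse(bytes, Charset.forName("windows-1252"))`. The cleaner fix is still to have the producer either write real UTF-8 or declare the correct encoding in the prolog.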