Encoding problems with DOM
I am writing an application that receives an XML message as a String. I need to parse it either to extract information from a CDATA section or, if <Error> tags are present, to extract the error message. I have no further use for the XML message after this step; I just pass the extracted data or error message on to another method. I tried the following test XML message:
String sResponse ="<?xml version=\"1.0\" encoding=\"utf-16\"?>" +
"<Response xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" "+
"xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">" +
"<Errors xmlns=\"\">\n<Error>\n<Number>10009</Number>\n<Severity>Error</Severity>\n<Message>Virus detected.</Message>\n" +
"<Details><string>Virus detected in the message content; request terminated.</string>\n" +
"<string>Virus Name = HTML_TEST_VIRUS</string>\n<string>Offset = 0</string>\n"+
"</Details>\n</Error>\n</Errors>\n</Response>";
I then use the following code to begin parsing the string. (This is the only way I could find to pass an XML string to the parse method; if someone knows a better way I'm open to suggestions, but that is not my main question.)
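// Note: getBytes() with no argument encodes using the platform default charset.
ByteArrayInputStream baInput = new ByteArrayInputStream(sResponse.getBytes());
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// This is the line that throws for the test message above.
Document document = builder.parse(baInput);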
This code with this test message produces the following exception at the builder.parse line:
[Fatal Error] :1:40: Content is not allowed in prolog.
org.xml.sax.SAXParseException: Content is not allowed in prolog.
at ...
After some Google searching, it looks like this error is often thrown when there are encoding problems. If I change the encoding declared in sResponse to "utf-8", or if I remove the prolog entirely (the "<?xml ... ?>" part), parsing works.
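One likely explanation, sketched below rather than confirmed: getBytes() without an argument produces bytes in the platform default charset, so the prolog can promise utf-16 while the bytes are actually something else. The class EncodingMismatchDemo and its parses helper are hypothetical names made up for this illustration; the sketch assumes the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

class EncodingMismatchDemo {
    // Hypothetical helper: true if the DOM parser accepts these bytes.
    static boolean parses(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return true;
        } catch (SAXException | IOException e) {
            // Not well-formed, or not decodable in the declared encoding.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><Response/>";
        // The bytes are UTF-8, but the prolog declares utf-16.
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));
        // The bytes really are UTF-16 (Java writes a byte-order mark first).
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_16)));
    }
}
```

If this explanation is right, the first call should be rejected and the second accepted, which would match the observation that the failure depends on the declared encoding and not on the rest of the message.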
The problem is that I can't guarantee whether the client will send me a message declared as utf-8 or utf-16, and some of the sample messages I have from the client don't have a prolog at all. So I thought the easiest solution would be to remove the prolog before parsing. Is there a way to remove the prolog from the XML message using DOM, or will I have to use basic string manipulation and hope the XML is well-formed? Or will removing the prolog break everything, and if so, is there a better way to handle this error?
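One alternative that leaves the prolog alone, offered as a minimal sketch and not a definitive fix: since the message is already a String, it can be handed to the parser as characters through an InputSource wrapping a StringReader. The SAX documentation for InputSource states that when a character stream is supplied, the parser reads it directly and disregards any encoding declaration in the text. The class XmlStringParser and its methods are hypothetical names, and looking up the first <Message> element is only an assumption about which value is wanted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlStringParser {
    // Hypothetical helper: parse XML that is already held in a String.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream means no byte decoding happens in the parser.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    // Hypothetical helper: text of the first <Message> element, or null if none.
    static String firstErrorMessage(Document doc) {
        NodeList messages = doc.getElementsByTagName("Message");
        return messages.getLength() > 0 ? messages.item(0).getTextContent() : null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?>"
                + "<Response><Errors><Error>"
                + "<Message>Virus detected.</Message>"
                + "</Error></Errors></Response>";
        System.out.println(firstErrorMessage(parse(xml)));
    }
}
```

Under this approach the same call should work whether the message declares utf-8, declares utf-16, or has no prolog, so no string surgery on the XML is needed.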
Thanks for reading all my ramblings, I hope that made sense!

