Parsing Cyrillic data
Hi,
I am trying to parse an XML file written in Cyrillic.
When I do the following:
    InputStream stream = new FileInputStream(uri);
    InputSource is = new InputSource();
    is.setEncoding("UTF-8");
    is.setByteStream(stream);
    ContentHandler contentHandler = new MyContentHandler();
    try {
        XMLReader parser = new SAXParser();
        parser.setContentHandler(contentHandler);
        parser.parse(is);
    } catch (IOException e) {
        System.out.println("Error reading URI: " + e.getMessage());
    } catch (SAXException e) {
        System.out.println("Error in parsing: " + e.getMessage());
    }
The file parses, but the data comes out corrupted, mostly as question marks.
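One way to tell whether the question marks are really in the parsed strings, or only appear when they are printed to a console that cannot display Cyrillic, is to print the Unicode code points instead of the characters. The following is only a minimal sketch of such a check; the class name `CodePointDump` and the method `describe` are hypothetical and not part of the code above.

```java
public class CodePointDump {

    // Hypothetical helper: renders every code point of the text as U+XXXX,
    // so the result is plain ASCII and does not depend on the console encoding.
    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Cyrillic letters written as escapes, so the encoding of this
        // source file does not matter either.
        System.out.println(describe("\u0414\u0430")); // prints U+0414 U+0430
    }
}
```

If a string received in the content handler shows values in the U+0400 range, the parser decoded the file correctly and only the output is at fault. If it shows U+003F, the question marks are already in the data.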
However, when I first read the file into a byte array and then parse it through a Reader:
    InputStream stream = new FileInputStream(uri);
    byte[] bytes = new byte[stream.available()];
    stream.read(bytes);
    ByteArrayInputStream bi = new ByteArrayInputStream(bytes);
    Reader reader = new InputStreamReader(bi, "UTF-8");
    InputSource is = new InputSource(reader);
    is.setEncoding("UTF-8");
    ContentHandler contentHandler = new MyContentHandler(fileid);
    try {
        XMLReader parser = new SAXParser();
        parser.setContentHandler(contentHandler);
        parser.parse(is);
        v = ((MyContentHandler) contentHandler).resultEdge();
    } catch (IOException e) {
        System.out.println("Error reading URI: " + e.getMessage());
    } catch (SAXException e) {
        System.out.println("Error in parsing: " + e.getMessage());
    }
I am getting the following error:
[Fatal Error] :1:1 Content is not allowed in prolog.
It seems to me that the conversion from the InputStream to the InputSource is what fails here, since the first example parses the same file without errors, although the data comes out corrupted.
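A possible cause worth ruling out (an assumption, not confirmed for this file): if the file starts with a UTF-8 byte order mark, an InputStreamReader created with "UTF-8" does not strip it, so the parser sees the character U+FEFF before the XML declaration and may report an error at line 1, column 1. The following is a minimal sketch of skipping the mark before building the Reader; the class `BomSkipper` and the method `utf8ReaderWithoutBom` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class BomSkipper {

    // Hypothetical helper: returns a UTF-8 Reader positioned after the
    // byte order mark (EF BB BF) if the stream starts with one.
    public static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No mark found: put the bytes back so nothing is lost.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        Reader reader = utf8ReaderWithoutBom(new ByteArrayInputStream(data));
        System.out.println((char) reader.read()); // prints <
    }
}
```

The returned Reader could then be passed to new InputSource(reader) in place of the InputStreamReader in the second example.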
I would appreciate it if someone could point out what I am doing wrong.
Thanks
Anne

