Unicode characters get replaced with ASCII characters

I have an XML file. Depending on the number of occurrences of pii numbers, the file gets split. That is, if there are 3 occurrences of pii numbers, the file gets split into 3 files. This works perfectly. But when the files are split, the Unicode characters are not retained.

Suppose the input file contains the following:

<given-names initials="kj">k. j.</given-names>

After splitting, the Unicode character is replaced as follows:

<given-names initials="kj">k.?j.</given-names>

Could anyone please tell me what modification is needed to retain the Unicode characters as in the input file, that is:

<given-names initials="kj">k. j.</given-names>

public void splitbr(String filename)

{

try

{

File test1=new File(filename);

File testbr=new File(test1.getCanonicalPath());

DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();

DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();

doc1 = docBuilder.parse(filename);

NodeList list = doc1.getElementsByTagName("article");

NodeList piecetag = doc1.getElementsByTagName("article-id");

for(int i=0;i<list.getLength();i++)

{

Element element = (Element)list.item(i);

Node pieceval=piecetag.item(i);

if(pieceval.getNodeType() == Node.ELEMENT_NODE)

{

Element articelem = (Element)pieceval;

NamedNodeMap pieceattrs = pieceval.getAttributes();

for(int s=0;s<pieceattrs.getLength();s++)

{

if("pii".equals(((Node)pieceattrs.item(s)).getNodeName()))

{

rolevalue=((Node)pieceattrs.item(s)).getNodeValue();

}

}

}

File f1=new File(testbr.getParent(),"reviewfiles");

f1.mkdir();

temp=temp+1;

File dlnfile=new File(testbr.getParent(),"reviewfiles\\"+rolevalue+"jbr.xml");

dlnfile.createNewFile();

doc2 = docBuilder.newDocument();

Node dup = doc2.importNode(element,true);

doc2.appendChild(dup);

Source source =new DOMSource(doc2);

Result result =new StreamResult(dlnfile.getAbsolutePath());

TransformerFactory tFactory = TransformerFactory.newInstance();

Transformer transformer = tFactory.newTransformer();

transformer.setOutputProperty(javax.xml.transform.OutputKeys.DOCTYPE_SYSTEM,"C:\\DTD\\ABCD.dtd");

transformer.setOutputProperty(javax.xml.transform.OutputKeys.METHOD,"xml");

transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT,"yes");

transformer.transform(source, result);

}

}

catch(IOException ioe)

{

System.out.println("The IO EXCEPTION IS CAUGHT in filesplit for splitbr fn "+ioe);

}

catch (SAXParseException err)

{

System.out.println("The SAXParseException EXCEPTION IS CAUGHT in filesplit for splitbr fn "+err);

}

catch (SAXException e)

{

System.out.println("The SAXException EXCEPTION IS CAUGHT in filesplit for splitbr fn "+e);

}

catch(Exception ce)

{

System.out.println("The GENERAL EXCEPTION IS CAUGHT in filesplit for splitbr fn "+ce);

}

catch (Throwable t)

{

System.out.println("The THROWABLE IS CAUGHT in filesplit for splitbr fn "+t);

}

}


[4958 byte] By [sony_tja] at [2007-11-27 6:19:56]
# 1
What makes you so sure that your problem is Unicode (not "Unicodes", it sounds very amateurish) being converted to ASCII? Have you actually checked the format of the input file compared to the output file?
jellystonesa at 2007-7-12 17:34:58 > top of Java-index,Java Essentials,New To Java...
# 2
Looks to me like you have a file that is encoded in UTF-8, but you are reading it with something that incorrectly assumes it is encoded in ISO-8859-1. I don't see where that is happening in the code you posted. Maybe it's happening before, or after, that.
DrClapa at 2007-7-12 17:34:58
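To make the two possible failure modes concrete, here is a minimal sketch. It assumes the character between the initials is a non-breaking space (U+00A0), which the thread does not confirm, and the class and method names are hypothetical. It shows that decoding UTF-8 bytes as ISO-8859-1 produces extra garbage characters, while encoding to a charset that lacks the character produces the "?" seen in the question.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Decode UTF-8 bytes with the wrong charset (ISO-8859-1).
    // Each byte of a multi-byte UTF-8 sequence becomes its own character.
    static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode to US-ASCII and back. String.getBytes replaces every
    // character the charset cannot represent with '?'.
    static String encodeAsAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Assumption: a non-breaking space sits between the initials.
        String name = "k.\u00A0j.";

        String misread = decodeAsLatin1(name);
        System.out.println("original length: " + name.length());
        System.out.println("misread length:  " + misread.length());
        System.out.println("ASCII-encoded:   " + encodeAsAscii(name));
    }
}
```

A "?" in the output therefore hints that the text was encoded into a charset that cannot represent the character, rather than decoded with the wrong charset.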
# 3
"Looks to me like you have a file that is encoded in UTF-8"

Yes, my file is encoded in UTF-8. The transformation from Unicode to ASCII is happening at the following line and not before this:

transformer.transform(source, result);
sony_tja at 2007-7-12 17:34:58
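If the serialization step is the suspect, one thing worth trying is to state the output encoding explicitly and hand the Transformer a byte stream instead of a path string. The sketch below is not a confirmed fix for the code in the question. The class name Utf8Serializer and its helper methods are hypothetical, and the sample element again assumes a non-breaking space (U+00A0) between the initials.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Utf8Serializer {

    // Parse XML text into a DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Serialize a DOM document to bytes, requesting UTF-8 explicitly.
    static byte[] serialize(Document doc) {
        try {
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // A byte stream lets the serializer do the encoding itself.
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: non-breaking space between the initials.
        Document doc =
            parse("<given-names initials=\"kj\">k.\u00A0j.</given-names>");
        String xml = new String(serialize(doc), StandardCharsets.UTF_8);

        // Re-parse the output and compare the text with the input.
        String text = parse(xml).getDocumentElement().getTextContent();
        System.out.println("round trip intact: " + "k.\u00A0j.".equals(text));
    }
}
```

In the posted method, the equivalent change would be to add the OutputKeys.ENCODING property next to the existing setOutputProperty calls and to pass a FileOutputStream opened on dlnfile to StreamResult, closing the stream after the transform. It is also worth opening the result in a viewer that is known to read UTF-8, since a viewer using another charset can display a correct file incorrectly.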