Read the encoding of an XML file
Hi,
I'm trying to get the encoding of an xml file with the following code.
-
org.xml.sax.InputSource is = new InputSource( new FileInputStream( new File (filename ) ));
System.out.println( "encoding for "+ filename + " is " + is.getEncoding() );
-
Input xml file is
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Pip3A4PurchaseOrderRequest SYSTEM "3A4_MS_R02_00_PurchaseOrderRequest.dtd">
<Pip3A4PurchaseOrderRequest>
</Pip3A4PurchaseOrderRequest>
-
However, instead of returning "UTF-8" as the encoding, getEncoding() returns null.
Has anyone tried to retrieve the encoding using the InputSource class? Any help is greatly appreciated.
[760 byte] By [rajeshkz] at [2007-9-26 2:03:42]

Well, with
org.xml.sax.InputSource is = new InputSource( new FileInputStream( new File (filename ) ));
you simply instantiate InputSource. This class allows you to specify an encoding by calling setEncoding(java.lang.String). This is the String you would get back by calling is.getEncoding(). Since you didn't provide an encoding, you quite correctly get back null.
This has nothing to do with what you specify in your XML source. That information would only be available after <?xml version="1.0" encoding="UTF-8"?> has actually been parsed.
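To make the difference concrete, here is a minimal sketch, assuming Java 6 or later where the StAX API (javax.xml.stream) is available. The class name DeclaredEncoding and the method of() are made up for illustration. It prints what InputSource.getEncoding() returns when nothing was set, and then lets a parser read the declaration and report the encoding named there.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of any library.
final class DeclaredEncoding {

    // Returns the encoding named in the XML declaration, or null if none is declared.
    static String of(InputStream in) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                // Asked before any call to next(), i.e. while the reader is
                // still positioned at the start of the document.
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML declaration", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);

        // InputSource only stores what setEncoding() was given; nothing was set here.
        System.out.println("InputSource.getEncoding(): "
                + new InputSource(new ByteArrayInputStream(xml)).getEncoding());

        // The stream reader has actually parsed the declaration.
        System.out.println("declared encoding: " + of(new ByteArrayInputStream(xml)));
    }
}
```

For a file on disk, passing a FileInputStream to of() should behave the same way, since the reader works on the raw bytes.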
Hope that helps.
lk555 at 2007-6-29 8:46:45 >

How do you get the encoding stated in the XML declaration? I have loaded the XML tree into the root document, doc (see code below). The declaration in question is <?xml version="1.0" encoding="ISO-8859-1" ?>
I have searched the org.xml.sax, org.w3c.dom and javax.xml.parsers javadocs and found nothing about how to read the encoding out of a parsed and loaded XML tree.
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource src = new InputSource( new BufferedReader( new FileReader( new File("myfile.xml") ) ) );
// Load the xml - how do I get the encoding?
Document doc = db.parse( src );
> You probably can't find out how to do that because
> there is no way to do it. And there probably is no
> way to do it because the designers couldn't imagine
> why you would want to know that. Why do you want to
> know that?
So I can take an XML tree and generate a file from it. My object design doesn't implement the XML API but rather simulates it. Sure, I could parse the actual file myself, or parse the regenerated XML output if using the javax API.
The question still remains: is there a method to interpret the contents of the XML header?
Regards, Jon
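One possible route, sketched under the assumption of a DOM Level 3 parser (Java 5 or later): org.w3c.dom.Document has getXmlEncoding(), which reports the encoding named in the declaration. The class DomEncodingRoundTrip and its methods are hypothetical names for this example. Note that it hands the parser raw bytes; the FileReader in the snippet above decodes the file with the platform default charset before the parser ever sees the declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class for illustration only.
final class DomEncodingRoundTrip {

    // Parses raw bytes so the parser decodes them according to the declaration.
    static Document parse(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    // Serializes the tree with the encoding it declared, falling back to UTF-8.
    static byte[] write(Document doc) {
        String declared = doc.getXmlEncoding(); // null when no encoding was declared
        String encoding = (declared != null) ? declared : "UTF-8";
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the document", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(xml);
        System.out.println("declared encoding: " + doc.getXmlEncoding());
        System.out.println(new String(write(doc), StandardCharsets.ISO_8859_1));
    }
}
```

Document.getInputEncoding() is the related call for the encoding the parser actually used to decode the input, which can differ from the declared one.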
It's amazing that the JAXP tutorial code for reading an XML file and recreating it as an identical output file hard-codes the XML header in its output with UTF-8 encoding!
Of course, that doesn't work if the encoding is ISO 8859 or anything else.
The functionality for getting the encoding seems to have been added in SAX2 Extensions 1.1, as the Locator2 interface.
But there aren't any parsers that implement it.
So...
Denis Queffeulou
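For reference, a rough sketch of how the Locator2 route would look with a SAX handler. Whether a given parser actually hands over a Locator2 is implementation-dependent, so the code checks with instanceof and returns null otherwise. SaxEncodingProbe and encodingOf() are hypothetical names, and asking at the first start tag rather than in startDocument() is a cautious assumption that the declaration has been processed by then.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.ext.Locator2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; records the entity encoding if available.
final class SaxEncodingProbe extends DefaultHandler {
    private Locator locator;
    private String encoding;
    private boolean seenRoot;

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (seenRoot) {
            return;
        }
        seenRoot = true;
        // Only parsers that support the SAX2 extensions pass a Locator2.
        if (locator instanceof Locator2) {
            encoding = ((Locator2) locator).getEncoding();
        }
    }

    // Returns the encoding reported by the locator, or null if it is not a Locator2.
    static String encodingOf(InputStream in) {
        try {
            SaxEncodingProbe probe = new SaxEncodingProbe();
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), probe);
            return probe.encoding;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("locator encoding: " + encodingOf(new ByteArrayInputStream(xml)));
    }
}
```

According to the Locator2 javadoc, getEncoding() prefers an externally supplied encoding, then the one in the declaration, then an inferred one, so it is not strictly "the declared encoding" in every case.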
dqu at 2007-6-29 8:46:45 >
