Extra #Text node
Hi,
It seems I get extra #Text nodes in my DOM object. For example, if my XML is as follows:
<Root>
<FirstElement>Value</FirstElement>
</Root>
My DOM tree will look like:
+ ELEMENT: Root
  + #TEXT:
  + ELEMENT: FirstElement
    + #TEXT: Value
  + #TEXT:
Is there a way to avoid those extra empty #Text nodes?
[394 byte] By [marlysaa] at [2007-10-3 5:23:58]

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Create a DocumentBuilderFactory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Set the factory to be validating (the setting below relies on the content model)
factory.setValidating(true);
// Ask the parser to drop whitespace-only text nodes in element content
factory.setIgnoringElementContentWhitespace(true);
// Create a DocumentBuilder and parse the XML document
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("input.xml"));
Those are not "empty" text nodes; they have newlines in them.
If your XML were:
<Root><FirstElement>Value</FirstElement></Root>
you would not see them. By default, you get them. As pointed out, you can ask the parser to ignore them.
Dave Patterson
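To make the point above concrete, here is a minimal sketch that parses both forms of the document with default settings and counts the direct children of Root. The class name WhitespaceNodesDemo and the helper countRootChildren are made up for illustration; only standard JAXP and DOM calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class WhitespaceNodesDemo {
    // Hypothetical helper: parses the XML string with a default
    // (non-validating) parser and returns how many direct child nodes
    // the root element has, text nodes included.
    static int countRootChildren(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String withNewlines = "<Root>\n<FirstElement>Value</FirstElement>\n</Root>";
        String compact = "<Root><FirstElement>Value</FirstElement></Root>";
        // Newline text node, FirstElement, newline text node
        System.out.println(countRootChildren(withNewlines)); // 3
        // FirstElement only
        System.out.println(countRootChildren(compact)); // 1
    }
}
```

The two extra children in the first case are the newline characters before and after FirstElement, which the parser reports as text nodes.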
In fact, if you specify Xerces as the parser and schema factory, you will still get the newline text nodes.
System.setProperty("javax.xml.parsers.DocumentBuilderFactory", "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
System.setProperty("javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema", "org.apache.xerces.jaxp.validation.XMLSchemaFactory");
I think these additional text nodes are a bug, because if you use validation plus setIgnoringElementContentWhitespace(true), the #text nodes are still in the DOM tree! I solved this problem by using another parser, JDOM or DOM4J.
Timasa at 2007-7-14 23:31:05 >
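If switching to JDOM or DOM4J is not an option, one possible workaround is to remove the whitespace-only text nodes yourself after parsing. The sketch below is only an illustration, not something any parser does for you: the class WhitespaceStripper and its methods are hypothetical names, and it assumes whitespace-only text is never meaningful in your document (which is not true for mixed content).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class WhitespaceStripper {
    // Recursively removes text nodes that contain only whitespace.
    // Assumption: such nodes carry no meaning in this document.
    static void stripWhitespaceNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before a possible removal
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceNodes(child);
            }
            child = next;
        }
    }

    // Hypothetical helper for the demo: parses an XML string into a DOM.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Root>\n<FirstElement>Value</FirstElement>\n</Root>");
        stripWhitespaceNodes(doc.getDocumentElement());
        // Only FirstElement is left under Root
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 1
    }
}
```

This works the same way with or without a DTD or schema, at the cost of one extra pass over the tree.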

I don't see how it could be a bug. The source XML file had newlines in it. Therefore a true representation of the content would include those Text entries.
Don't confuse "It does something that I don't want (at least for now)" with "it is not working as it was designed."
Dave Patterson
d.patterson, setIgnoringElementContentWhitespace() is meant to ignore whitespace that is not a significant part of element content. I think the extra #Text nodes aren't a significant part of element content.
And if you validate with a DTD(!), these extra #Text nodes do not appear. But if you validate with an XSD(!), they do... And you still think it isn't a bug?
Timasa at 2007-7-14 23:31:05 >
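The DTD case described above can be reproduced with a small sketch. It assumes the JDK's built-in parser and uses a made-up internal DTD that declares Root as element-only content; the class name DtdWhitespaceDemo and its helper are hypothetical. The XSD case is not covered here, since its behaviour depends on how the schema is attached to the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DtdWhitespaceDemo {
    // The original document plus an internal DTD (an assumption made for
    // this demo) saying that Root may contain only a FirstElement.
    static final String XML_WITH_DTD =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Root [\n"
        + "<!ELEMENT Root (FirstElement)>\n"
        + "<!ELEMENT FirstElement (#PCDATA)>\n"
        + "]>\n"
        + "<Root>\n<FirstElement>Value</FirstElement>\n</Root>";

    // Parses with DTD validation on and returns the number of direct
    // children of the root element.
    static int countRootChildren(String xml, boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The DTD tells the parser the newlines in Root are ignorable
        System.out.println(countRootChildren(XML_WITH_DTD, true));  // 1
        // Same document, but the whitespace nodes are kept
        System.out.println(countRootChildren(XML_WITH_DTD, false)); // 3
    }
}
```

The parser can only drop the newlines because the DTD tells it that Root has an element-only content model; without that information it has no way to know the whitespace is insignificant.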
