Extra #Text node

Hi,

It seems I am getting extra #Text nodes in my DOM object. For example, if my XML is as follows:

<Root>
<FirstElement>Value</FirstElement>
</Root>

My DOM tree will look like:

+ ELEMENT: Root
  + #TEXT:
  + ELEMENT: FirstElement
    + #TEXT: Value
  + #TEXT:

Is there a way to avoid those extra empty #Text nodes?

[394 byte] By [marlysaa] at [2007-10-3 5:23:58]
# 1

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Create a DocumentBuilderFactory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

// Set the factory to produce validating parsers
factory.setValidating(true);

// Ask the parser to drop whitespace-only text nodes in element content
factory.setIgnoringElementContentWhitespace(true);

// Create a DocumentBuilder and parse the XML document
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File("input.xml"));

dvohra09a at 2007-7-14 23:31:05 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
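
A minimal sketch of reply #1 in a self-contained form. It assumes the document carries a DTD, since the parser needs a content model to know which whitespace is ignorable. The inline DTD, the class name IgnoreWhitespaceWithDtd and the helper rootChildCount are made up for illustration and are not from the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class IgnoreWhitespaceWithDtd {
    // Assumption: the XML from the question, plus a hypothetical inline DTD
    // declaring that Root contains only a FirstElement child.
    static final String XML =
        "<!DOCTYPE Root [\n"
        + "<!ELEMENT Root (FirstElement)>\n"
        + "<!ELEMENT FirstElement (#PCDATA)>\n"
        + "]>\n"
        + "<Root>\n<FirstElement>Value</FirstElement>\n</Root>";

    // Parses XML with a validating parser and returns the number of
    // child nodes directly under Root.
    static int rootChildCount(boolean ignoreWhitespace) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Fail loudly on validation errors instead of printing to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            Document doc = builder.parse(
                new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keeping whitespace:  " + rootChildCount(false));
        System.out.println("ignoring whitespace: " + rootChildCount(true));
    }
}
```

With the JDK's bundled parser, the second count should drop to the single FirstElement child. Without a DTD the setting has nothing to work from and the whitespace nodes stay.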
# 2

Those are not "empty" text nodes; they have newlines in them.

If your XML was:

<Root><FirstElement>Value</FirstElement></Root>

you would not see them. By default, you get them. As pointed out, you can ask the parser to ignore them.

Dave Patterson

d.pattersona at 2007-7-14 23:31:05
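
To make the point in reply #2 concrete, here is a small sketch that parses both forms of the XML with a default, non-validating factory and counts the children of Root. The class name WhitespaceTextNodes and the helper rootChildCount are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class WhitespaceTextNodes {
    // The XML from the question, with newlines between the tags.
    static final String WITH_NEWLINES =
        "<Root>\n<FirstElement>Value</FirstElement>\n</Root>";
    // The same XML written on one line, as in reply #2.
    static final String ONE_LINE =
        "<Root><FirstElement>Value</FirstElement></Root>";

    // Parses the given XML with default settings and returns the
    // number of child nodes directly under the root element.
    static int rootChildCount(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Newline text, FirstElement, newline text
        System.out.println(rootChildCount(WITH_NEWLINES)); // 3
        // FirstElement only
        System.out.println(rootChildCount(ONE_LINE));      // 1
    }
}
```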
# 3

In fact, if you specify Xerces as the parser and schema factory, you still get the newline text nodes.

System.setProperty("javax.xml.parsers.DocumentBuilderFactory", "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");

System.setProperty("javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema", "org.apache.xerces.jaxp.validation.XMLSchemaFactory");

marlysaa at 2007-7-14 23:31:05
# 4
I think these additional text nodes are a bug, because even with validation plus setIgnoringElementContentWhitespace(true), the #text nodes are still in the DOM tree! I solved this problem by using another parser, JDOM or DOM4J.
Timasa at 2007-7-14 23:31:05
# 5

I don't see how it could be a bug. The source XML file had newlines in it. Therefore a true representation of the content would include those Text entries.

Don't confuse "It does something that I don't want (at least for now)" with "it is not working as it was designed."

Dave Patterson

d.pattersona at 2007-7-14 23:31:05
# 6

d.patterson, setIgnoringElementContentWhitespace() is meant to ignore whitespace that is not a significant part of element content. I think the extra #Text nodes aren't a significant part of element content.

And if you validate with a DTD(!), these extra #Text nodes do not appear. But if you validate with an XSD(!), they do... Do you still think it isn't a bug?

Timasa at 2007-7-14 23:31:05
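
For cases where the parser keeps the whitespace nodes, such as the XSD case described in reply #6, one parser-independent option is to remove whitespace-only text nodes after parsing. This is only a sketch and nobody in the thread proposed it. The class name StripWhitespaceNodes and its helpers are made up for illustration. Note that it also removes whitespace-only text in mixed content, which may not be what you want.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class StripWhitespaceNodes {
    // Recursively removes text nodes that contain only whitespace.
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, because removeChild detaches child.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
            child = next;
        }
    }

    // Parses the XML with default settings, then strips whitespace nodes.
    static Document parseAndStrip(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            strip(doc.getDocumentElement());
            return doc;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseAndStrip(
            "<Root>\n<FirstElement>Value</FirstElement>\n</Root>");
        // Only FirstElement remains under Root.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```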