CRLF characters in CDATA section

I am trying to store a string in a CDATA section of an XML file. Many of my strings have CRLF characters (i.e. ASCII 13 + ASCII 10) and these are causing problems.

I create a DOM, then write the DOM out to a file - it looks like a reasonable XML file (with line breaks in the CDATA section where I expect them to be), though examining the actual bytes in the XML file shows that it contains 13,13,10 instead of 13,10.
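
A minimal way to reproduce the write step and look at the raw bytes, assuming the file is produced with a JAXP identity Transformer (the thread does not say which serializer was actually used). The class name CdataBytes and the element name root are made up for the illustration. The byte values printed will depend on the platform and the XML implementation, so no expected output is shown.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <root><![CDATA[text]]></root> and serializes it.
class CdataBytes {

    static byte[] serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createCDATASection(text));

            // Identity transform: DOM in, UTF-8 bytes out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Print every byte as a decimal value so that 13 and 10 are easy to spot.
        StringBuilder sb = new StringBuilder();
        for (byte b : serialize("line1\r\nline2")) {
            sb.append(b & 0xFF).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```

If the output shows 13 13 10 between line1 and line2, the extra CR is being added during serialization and not when the DOM is built.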

When I parse the file to read it back into a DOM, then extract the String from the CDATA section, the 13,10 from the original string has been replaced by 10,10, so trying to display it shows a series of non-printing characters.
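
The 10,10 on the way back in is consistent with the end-of-line handling in the XML 1.0 specification (section 2.11): before parsing, a CR LF pair and any CR not followed by LF are each translated to a single LF. This applies inside CDATA sections too. A small sketch of that behaviour with the JAXP DOM parser follows; the class name LineEndDemo and the element name r are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parses a CDATA section and returns its text.
class LineEndDemo {

    // Wraps the text in <r><![CDATA[...]]></r>, parses it, returns the content.
    // The text must not contain "]]>".
    static String parseCdata(String text) {
        try {
            String xml = "<r><![CDATA[" + text + "]]></r>";
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Renders a string as space-separated character codes.
    static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codes(parseCdata("a\r\nb")));   // 97 10 98
        System.out.println(codes(parseCdata("a\r\r\nb"))); // 97 10 10 98
    }
}
```

So a file containing 13,13,10 comes back as 10,10: the first CR becomes one LF and the CR LF pair becomes another.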

I have tried this with both JAXP and oracle.xml.parser.v2 and both approaches give very similar behaviour. I write the file out as UTF-8.

Does anyone have any suggestions about what might be going wrong?

By billRoberts at 2007-9-26 8:30:51
# 1
If you have whitespace (such as line endings) immediately adjacent to your CDATA section, the parser will probably consider the whole thing, whitespace plus CDATA, as a single text node. Which is what it is.
DrClap at 2007-7-1 19:10:55 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

The Oracle parser does seem to do that: i.e. it tells me that the CDATA section is a text node. The JAXP parser recognises space-CDATA-space as 3 separate nodes.

However, am I not allowed to have line breaks inside a CDATA section, and can I not expect them to be preserved by the parser?

billRoberts at 2007-7-1 19:10:55
# 3

I am experiencing exactly the same problem.

I can correctly create a CDATASection which contains CRLFs. However, when the Document is written out to an XML file, the file contains CRCRLF in place of CRLF within the CDATA sections.

This suggests a problem with the DOM to output stream identity transformation.

Can anyone provide a solution to this problem? I am currently considering stripping out the CRCR pairs and replacing each with a single CR, after I have constructed the DOM from the XML file.
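
One possible workaround, sketched under the assumption that the application can tolerate LF-only text inside the DOM: since an XML parser hands back LF for every line break anyway, no CR characters will be left to strip after parsing. It may be simpler to normalize to LF before the string goes into the CDATA section and convert back to CRLF after reading, if CRLF is needed. The class and method names below (LineEndings, toLf, toCrLf) are hypothetical.

```java
// Hypothetical helper for converting line endings around the XML round trip.
class LineEndings {

    // Collapse CRLF and lone CR to LF (the same mapping an XML parser applies).
    static String toLf(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Convert every line break to CRLF, whatever form it arrived in.
    static String toCrLf(String s) {
        return toLf(s).replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        String original = "first\r\nsecond";
        String stored = toLf(original);    // value to put in the CDATA section
        String restored = toCrLf(stored);  // value after reading it back
        System.out.println(restored.equals(original)); // true
    }
}
```

Calling toLf before createCDATASection and toCrLf on the text read back should keep the round trip stable, provided the original text used CRLF consistently.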

Cheers, Nathan.

nc5022 at 2007-7-1 19:10:55