Can't parse XML data

I'm receiving an XML document over a TCP socket. I then create a DOMParser and attempt to parse the data. Here's the exception:

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in markup after the end of the element content.
    at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
    at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:588)
    at org.apache.xerces.framework.XMLDocumentScanner$TrailingMiscDispatcher.dispatch(XMLDocumentScanner.java:1461)
    at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)

Here's the relevant snippet of source (there is also code that writes to the socket, but I didn't include it):

Socket socket = new Socket( hostIP, hostPort );
OutputStream outStream = socket.getOutputStream();
InputStream inStream = socket.getInputStream();
DataInputStream dataInStream = new DataInputStream( inStream );

byte[] inByteArray = new byte[ 2048 ];
int length = dataInStream.read( inByteArray );
InputStream byteData = new ByteArrayInputStream( inByteArray );

try
{
    DOMParser dp = new DOMParser();
    dp.parse( new InputSource( byteData ) );
    Document doc = dp.getDocument();
}
catch ( Exception e )
{
    e.printStackTrace();
    System.exit( 1 );
}

Is there a different way to do this? I even tried creating a new String based on the length read from the socket, minus one. The parser then saw that the final angle bracket of my root element was missing, so it doesn't seem to be an encoding issue.

Any help would be appreciated.

Jeff

[2297 byte] By [JBRanciera] at [2007-11-27 4:44:01]
# 1

> An invalid XML character (Unicode: 0x0)

The XML is corrupt.

> ...saw that the final angle bracket of my root element was missing,

After you subtracted one from the length - that would suggest that not subtracting one would be a good idea.

I would suppose that the real problem here has nothing to do with XML or DOM, but rather that you are not correctly retrieving the data from the socket.

jschella at 2007-7-12 9:55:54 > top of Java-index,Java Essentials,Java Programming...
# 2
Did you try creating a new String based on the length read from the socket, without subtracting one?
uncle_alicea at 2007-7-12 9:55:54
# 3

> > An invalid XML character (Unicode: 0x0)
>
> The XML is corrupt.
>
> > ...saw that the final angle bracket of my root element was missing,
>
> After you subtracted one from the length - that would suggest that not subtracting one would be a good idea.

Yep. That was a test to verify that the data was intact and not null-terminated. To double-check, I took an Ethereal trace of the wire, and there was no null. If I convert the byte[] to a String, everything is fine. Unfortunately, there wasn't a parser method that took the XML document as a String.

> I would suppose that the real problem here has nothing to do with XML nor DOM but rather that you are not correctly retrieving the data from the socket.

The data from the socket is fine; I just had to implement my own getAttribute() and getElementContent() and use brute force. I just thought using an already-written parser made more sense.

Thanks for the reply.

JBRanciera at 2007-7-12 9:55:54
# 4
> Did you try creating a new String based on the length read from the socket, without subtracting one?

Yes. I still have to convert from a String to an InputSource. It must be something in the conversion?
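
For what it's worth, a SAX InputSource can wrap a character stream as well as a byte stream, so a String can be handed to a parser through a StringReader. The following is only a sketch under assumptions: it assumes the payload is UTF-8, it uses the JDK's DocumentBuilder in place of the Xerces DOMParser from the original post, and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original poster's code.
class StringSourceParse {
    // Decode only the bytes actually read, then parse the resulting String.
    static Document parse(byte[] buffer, int length) throws Exception {
        // Assumption: the sender encodes the document as UTF-8.
        String xml = new String(buffer, 0, length, StandardCharsets.UTF_8);
        // InputSource has a constructor that takes a Reader.
        InputSource source = new InputSource(new StringReader(xml));
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root>hi</root>".getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[2048];
        System.arraycopy(xml, 0, buffer, 0, xml.length);
        Document doc = parse(buffer, xml.length);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

The same InputSource should also be acceptable to a parser whose parse method takes an InputSource, as in the snippet from the first post.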
JBRanciera at 2007-7-12 9:55:54
# 5

I suspect the DataInputStream is reading less than 2048 bytes, leaving the unused array elements at their default value, zero. Then the ByteArrayInputStream is passing those zeroes along to the parser. What I'm wondering is why you need either of those things? Why not construct the InputSource directly from the socket's InputStream?
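
If the intermediate buffer does have to stay, one way to keep the zero padding away from the parser is to bound the ByteArrayInputStream with the count returned by read(). This is a rough sketch, not a tested fix for the original program: it uses the JDK's DocumentBuilder instead of the Xerces DOMParser, and the class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original poster's code.
class BoundedBufferParse {
    // Parse only the first `length` bytes of `buffer`; the rest of the
    // array (still zero-filled after a short read) is never exposed.
    static Document parsePrefix(byte[] buffer, int length)
            throws ParserConfigurationException, SAXException, IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(buffer, 0, length);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(in));
    }

    public static void main(String[] args) throws Exception {
        // Simulate a read() that filled only part of a 2048-byte buffer.
        byte[] xml = "<root><item id=\"1\"/></root>".getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[2048];
        System.arraycopy(xml, 0, buffer, 0, xml.length);

        Document doc = parsePrefix(buffer, xml.length);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Passing buffer.length instead of the byte count would hand the trailing zero bytes to the parser again, which is the situation described above.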

uncle_alicea at 2007-7-12 9:55:54
# 6

> I suspect the DataInputStream is reading less than 2048 bytes, leaving the unused array elements at their default value, zero. Then the ByteArrayInputStream is passing those zeroes along to the parser.

Thanks, I didn't consider that.

> What I'm wondering is why you need either of those things? Why not construct the InputSource directly from the socket's InputStream?

I did look at that, but there's a binary application-level transport protocol wrapping the XML data, and I also need to handle other binary transactions.
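
The thread never describes the actual wire format, so the following is purely illustrative. It assumes a hypothetical frame made of a 4-byte big-endian length followed by that many bytes of XML. The point it shows is that DataInputStream.readFully() blocks until the whole payload has arrived, whereas a single read() may return fewer bytes than were sent. The class name, method name, and maxLength parameter are all invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical framing: [4-byte big-endian length][payload bytes].
class FramedXmlReader {
    // Read exactly one frame and return a payload array sized to fit,
    // so there are no unused trailing elements to leak into a parser.
    static byte[] readFrame(InputStream in, int maxLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0 || length > maxLength) {
            throw new IOException("bad frame length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // throws EOFException if the stream ends early
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Build a sample frame in memory in place of a real socket.
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(xml.length);
        out.write(xml);

        byte[] payload = readFrame(new ByteArrayInputStream(bytes.toByteArray()), 2048);
        System.out.println(new String(payload, StandardCharsets.UTF_8));
    }
}
```

Because the returned array is exactly as long as the payload, it can be wrapped in a ByteArrayInputStream without passing an explicit length.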

Thanks for the input and responses.

JBRanciera at 2007-7-12 9:55:54