Strange output?

Hello,

I am using the SAX parser to parse an XML file, which can be downloaded from http://archive.godatabase.org/latest-termdb/go_daily-termdb.obo-xml.gz.

The file is a Gene Ontology. My problem is that, of the 25000 terms described in this file, the parser cannot read the id node of 100 of them correctly. The output should be

GO:0000142

while it reads it as

GO:00001

42

As a result, the term's id ends up as just 42. As I said, this only happens randomly, in 100 of the 25000 terms. My characters() method doesn't do anything special; it just stores the value:

public void characters(char[] ch, int start, int length) throws SAXException {
    tempVal = new String(ch, start, length);
}

Any ideas?

ehsan

By Ehsan.Sa at 2007-11-27 5:06:37
# 1
This sounds like the usual problem where people assume the characters() method will always receive a complete text node in a single call. That isn't the case. The parser is allowed to split a text node into as many pieces as it likes and call the characters() method once for each piece.
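To make this concrete, here is a minimal sketch that records every chunk the parser hands to characters() and then joins them. The class names and the sample XML string are made up for illustration. How many chunks you actually get depends on the parser implementation and its internal buffering (the character reference below may or may not trigger a split), so the only thing you can rely on is the concatenation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Hypothetical handler: stores each piece of text exactly as characters() delivers it.
    static class ChunkLoggingHandler extends DefaultHandler {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            chunks.add(new String(ch, start, length));
        }
    }

    // Parses the given XML and returns the raw chunks seen by characters().
    static List<String> collectChunks(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ChunkLoggingHandler handler = new ChunkLoggingHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.chunks;
    }

    public static void main(String[] args) throws Exception {
        // &#49; is the character '1'; the full text of <id> is GO:0000142.
        List<String> chunks = collectChunks("<id>GO:0000&#49;42</id>");
        System.out.println("chunks: " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

If you print only the last chunk instead of the joined string, you can end up with exactly the kind of truncated id described in the question.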
DrClapa at 2007-7-12 10:25:24
# 2

Thanks for letting me know what was wrong. It's easy to fix, too: just accumulate the text in a StringBuffer instead of a String, so the characters() method looks like this:

public void characters(char[] ch, int start, int length) throws SAXException {
    // characters() may be called several times per text node, so append instead of overwriting
    sBuff.append(ch, start, length);
}

And you can reset the StringBuffer in the startElement() method:

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    // reset the StringBuffer
    sBuff.setLength(0);
}
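Putting the pieces together, the buffered text is only complete once the parser reaches the element's end tag, so one way to use it is to read it in endElement(). The sketch below assumes, as in this thread, that each term's identifier sits in an <id> element; the class name TermIdHandler, the parseIds() helper and the sample document are hypothetical. It uses StringBuilder, the unsynchronized counterpart of StringBuffer, which should be sufficient here because the SAX callbacks run on the thread that called parse().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TermIdHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> ids = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // A new element begins: discard text collected for the previous one.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node; keep appending.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end tag is the element's text guaranteed to be complete.
        // (With a non-namespace-aware parser, qName holds the element name.)
        if ("id".equals(qName)) {
            ids.add(text.toString().trim());
        }
    }

    public List<String> getIds() {
        return ids;
    }

    // Hypothetical helper: parses an XML string and returns all <id> values in order.
    public static List<String> parseIds(String xml) throws Exception {
        TermIdHandler handler = new TermIdHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getIds();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<obo><term><id>GO:0000142</id><name>example term</name></term>"
                + "<term><id>GO:0000&#49;43</id></term></obo>";
        System.out.println(parseIds(xml));
    }
}
```

Because the buffer is only cleared in startElement(), this is reliable for leaf elements such as id; for an element that itself contains child elements, the buffer would also contain text from its children.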

Thanks again

Ehsan

Ehsan.Sa at 2007-7-12 10:25:24