Can StAX recover from CharConversionException?

We've got a UTF-8 encoded input file coming from a partner company, and it seems they've sent a bad character (a lowercase e with a circumflex, ê) in one of their recent files...

The error I'm getting from Woodstox is:

07/02/06 17:33:24 com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x20 (at char #1205, byte #37)
07/02/06 17:33:24 at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:683)
07/02/06 17:33:24 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
07/02/06 17:33:24 at com.xyz.quartz.jobs.ExternalDataTableListener.newRecord(ExternalDataTableListener.java:139)
07/02/06 17:33:24 at com.xyz.quartz.jobs.ExternalDataTableListenerJob.execute(ExternalDataTableListenerJob.java:56)
07/02/06 17:33:24 at org.quartz.core.JobRunShell.run(JobRunShell.java:203)
07/02/06 17:33:24 at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:520)
07/02/06 17:33:24 Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x20 (at char #1205, byte #37)
07/02/06 17:33:24 at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:310)
07/02/06 17:33:24 at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:201)
07/02/06 17:33:24 at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
07/02/06 17:33:24 at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
07/02/06 17:33:24 at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:967)
07/02/06 17:33:24 at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:738)
07/02/06 17:33:24 at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1995)
07/02/06 17:33:24 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
07/02/06 17:33:24 ... 4 more

As a result of this one bad character, the entire "import file and break up into chunks, inserting each one into the database" process bombed.

Obviously I'm talking to the partner firm to get their files fixed, but in the meantime, we'd like to make our import process a little more bulletproof. Specifically, I'd like to be able to tell the XMLStreamReader to somehow skip that bad character and proceed with the rest of the file import.

Is this possible? I couldn't find anything on Google, in the various Java/XML mailing lists, or in any forums... Thanks!

[2433 byte] By [waynefaya] at [2007-11-26 17:34:02]
# 1
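You can't find anything because compliant XML parsers are not allowed to modify their input in any way to "correct" errors in the data. A byte sequence that is not legal in the document's encoding is a fatal error, and after a fatal error the parser must stop normal processing. Not well-formed, not parsed. That's a rule of XML.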
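What remains possible is cleaning the bytes before the parser ever sees them. The following is only a minimal sketch of that idea, not something Woodstox offers as a switch: it decodes the stream with a standard `CharsetDecoder` set to `CodingErrorAction.REPLACE` and hands the resulting `Reader` to the StAX factory. The class name `LenientUtf8Import` and the helpers `lenientUtf8Reader` and `readAllText` are made up for illustration. The sample bytes assume the bad character was a Latin-1 ê (0xEA) followed by a space, which would be consistent with the "middle byte 0x20" message but is a guess.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; the names here are illustrative only.
class LenientUtf8Import {

    // Wraps the raw bytes in a Reader that substitutes U+FFFD for any
    // malformed UTF-8 sequence instead of throwing.
    static Reader lenientUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Parses the document through the lenient Reader and concatenates
    // all character data, just to show that parsing runs to the end.
    static String readAllText(InputStream in) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(lenientUtf8Reader(in));
        StringBuilder text = new StringBuilder();
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(xml.getText());
                }
            }
        } finally {
            xml.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "<r>f", then 0xEA (assumed Latin-1 e-circumflex), then " te</r>".
        // 0xEA starts a 3-byte UTF-8 sequence, so the following space is invalid.
        byte[] bad = {'<', 'r', '>', 'f', (byte) 0xEA, ' ', 't', 'e', '<', '/', 'r', '>'};
        String text = readAllText(new ByteArrayInputStream(bad));
        System.out.println(text.indexOf('\uFFFD') >= 0
                ? "parsed with replacement character"
                : "parsed without replacement");
    }
}
```

Note that this changes the data: the ê is gone and a replacement character sits in its place, so it is probably worth logging which files needed it. Decoding outside the parser also means the parser no longer does its own encoding detection, so this only makes sense when the files are known to be meant as UTF-8.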
DrClapa at 2007-7-9 0:02:02