how to feed CharsetDecoder (nio Buffer problems)
Hi there,
I have no idea how to feed bytes into a CharsetDecoder in the presence of multibyte sequences. I am trying something along the following lines.
If a multibyte char is fed in (I tried c3 + a4 = ä), the first call to decode returns UNDERFLOW with bout.hasRemaining() == false, as expected, but the 2nd call yields MALFORMED[1], no matter what rewind/flip method I throw in at // *1*
I probably "misgrok" something here with Buffers. Can someone point out what?
ByteBuffer bin = ByteBuffer.allocate(3);
CharBuffer bout = CharBuffer.allocate(1);
Charset cs = Charset.forName("UTF-8");
CharsetDecoder dec = cs.newDecoder();
public void addChar( int ch ) throws Exception {
    bin.put((byte)ch);
    bin.flip();
    CoderResult res = dec.decode(bin,bout,false);
    bout.flip();
    if( bout.hasRemaining() ){
        emit( bout.get() );
        bin.clear();
        bout.clear();
    }else{
        bout.flip();
        //bin.flip(); // *1*
    }
}
[1501 byte] By [HolgerKa] at [2007-11-27 9:32:57]

# 4
But the result character is already in the output buffer?! When I apply my program code to ISO-8859-1, every character yields UNDERFLOW and the data is converted successfully.
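A small sketch of why the two charsets could behave differently here: ä is a single byte in ISO-8859-1 but two bytes in UTF-8, so only the UTF-8 decoder ever sees an incomplete sequence. The class name EncodedLengthDemo and the helper unsignedBytes are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class EncodedLengthDemo {
    // Hypothetical helper: encodes s and returns the bytes as unsigned values
    // (228 rather than -28), matching the numbers printed by the test program.
    static int[] unsignedBytes(String s, String charsetName) {
        byte[] raw = s.getBytes(Charset.forName(charsetName));
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xff;
        }
        return out;
    }

    public static void main(String[] args) {
        String aUmlaut = "\u00e4"; // the character ä
        System.out.println(Arrays.toString(unsignedBytes(aUmlaut, "ISO-8859-1"))); // [228]
        System.out.println(Arrays.toString(unsignedBytes(aUmlaut, "UTF-8")));      // [195, 164]
    }
}
```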
Here is a complete test program:
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
public class DecoderTest {
    public static void main( String[] args ) throws Exception {
        dec = Charset.forName(args[1]).newDecoder();
        new DecoderTest().run(args[0]);
    }

    ByteBuffer bin = ByteBuffer.allocate(3);
    CharBuffer bout = CharBuffer.allocate(1);
    static CharsetDecoder dec;

    public void run( String filename ) throws Exception {
        FileInputStream in = new FileInputStream(filename);
        int ch;
        while( (ch=in.read())>=0 ) {
            addChar(ch);
        }
    }

    public void addChar( int ch ) throws Exception {
        System.out.print( "byte "+(0xff & ch) + " => " );
        bin.put((byte)ch);
        bin.flip();
        CoderResult res = dec.decode(bin,bout,false);
        System.out.print( res + " => " );
        if( res.isError() ) {
            bin.clear();
            bout.clear();
        } else {
            bout.flip();
            if( bout.hasRemaining() ) {
                System.err.print( "char " + ((int)bout.get()) );
                bin.clear();
                bout.clear();
            } else {
                bout.flip();
                // what to do with bin here?
            }
        }
        System.err.println();
    }
}
Passing in ISO-8859-1 (Latin-1) input looks like this:
byte 98 => UNDERFLOW => char 98
byte 228 => UNDERFLOW => char 228
byte 104 => UNDERFLOW => char 104
byte 32 => UNDERFLOW => char 32
byte 98 => UNDERFLOW => char 98
byte 228 => UNDERFLOW => char 228
byte 104 => UNDERFLOW => char 104
byte 10 => UNDERFLOW => char 10
UTF-8 input looks like this:
byte 98 => UNDERFLOW => char 98
byte 195 => UNDERFLOW =>
byte 164 => MALFORMED[1] =>
byte 104 => UNDERFLOW => char 104
byte 32 => UNDERFLOW => char 32
byte 98 => UNDERFLOW => char 98
byte 195 => UNDERFLOW =>
byte 164 => MALFORMED[1] =>
byte 104 => UNDERFLOW => char 104
byte 10 => UNDERFLOW => char 10
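One possible reading of the UTF-8 trace, sketched below: after the UNDERFLOW on the lead byte 195, the decoder leaves that byte unconsumed, so bin still has position 0. The next put() then overwrites it, and the decoder sees the continuation byte 164 on its own. Calling compact() after each decode keeps the pending byte instead. The class PendingByteDemo and its feed method are hypothetical names for this illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

class PendingByteDemo {
    // Feeds the UTF-8 bytes c3 a4 one at a time. Returns the decoded text,
    // or the CoderResult's string form if the decoder reports an error.
    static String feed(boolean compactAfterDecode) {
        CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
        ByteBuffer bin = ByteBuffer.allocate(3);
        CharBuffer bout = CharBuffer.allocate(1);
        StringBuilder text = new StringBuilder();
        for (int b : new int[] {0xC3, 0xA4}) {
            bin.put((byte) b);   // at position 0 this overwrites a pending byte
            bin.flip();
            bout.clear();
            CoderResult res = dec.decode(bin, bout, false);
            if (res.isError()) {
                return res.toString();
            }
            bout.flip();
            text.append(bout);   // appends the chars between position and limit
            if (compactAfterDecode) {
                bin.compact();   // move undecoded bytes to the front, ready for put()
            }
            // without compact(), bin keeps position 0 and limit 1 after the lead byte
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println("without compact(): " + feed(false));
        System.out.println("with compact():    " + feed(true));
    }
}
```

The demo only feeds the two bytes of ä, which is enough to reproduce both outcomes; it is not meant as a general decoding loop.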
# 6
I missed compact() completely and it solved my problem. Thanks a lot!
Here is the working version (for the archive):
public void addChar( int ch ) throws Exception {
    System.err.print( "byte "+(0xff & ch) + " => " );
    bin.put((byte)ch);
    bin.flip();
    bout.clear();
    CoderResult res = dec.decode(bin,bout,false);
    System.err.print( res + " => " );
    if( res.isError() ) {
        // malformed or unmappable input: drop the pending bytes
        bin.clear();
    } else {
        bout.flip();
        if( bout.hasRemaining() ) {
            // a char was produced from the pending bytes, so start over
            bin.clear();
            System.err.print( "char " + ((int)bout.get()) );
        } else {
            // incomplete sequence: keep the undecoded bytes for the next call
            bin.compact();
        }
    }
    System.err.println();
}
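For the archive as well, here is a slightly more general sketch of the same idea, assuming one wants to collect the decoded text rather than print it. It always compacts the input buffer after decoding, loops on OVERFLOW, and signals end of input so that a dangling partial sequence is reported. The class IncrementalDecoder, its methods, and the buffer sizes are assumptions of this example, not part of the code above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

class IncrementalDecoder {
    private final CharsetDecoder dec;
    // bin is kept in "write mode" between calls: position marks the end of pending bytes.
    private final ByteBuffer bin = ByteBuffer.allocate(16);
    private final CharBuffer bout = CharBuffer.allocate(16);
    private final StringBuilder out = new StringBuilder();

    IncrementalDecoder(String charsetName) {
        dec = Charset.forName(charsetName).newDecoder();
    }

    // Adds one byte and decodes whatever has become complete.
    void add(int b) throws CharacterCodingException {
        bin.put((byte) b);
        drain(false);
    }

    // Signals end of input, flushes the decoder, and returns all decoded text.
    String finish() throws CharacterCodingException {
        drain(true);
        CoderResult res;
        do {
            bout.clear();
            res = dec.flush(bout);
            bout.flip();
            out.append(bout);
        } while (res.isOverflow());
        return out.toString();
    }

    private void drain(boolean endOfInput) throws CharacterCodingException {
        bin.flip();
        CoderResult res;
        do {
            bout.clear();
            res = dec.decode(bin, bout, endOfInput);
            if (res.isError()) {
                res.throwException(); // malformed or unmappable input
            }
            bout.flip();
            out.append(bout);
        } while (res.isOverflow());
        bin.compact(); // keep bytes of an incomplete sequence for the next call
    }

    public static void main(String[] args) throws Exception {
        IncrementalDecoder d = new IncrementalDecoder("UTF-8");
        for (int b : new int[] {98, 195, 164, 104}) {
            d.add(b);
        }
        System.out.println(d.finish()); // the bytes from the trace: b, ä, h
    }
}
```

Throwing on errors instead of clearing the buffer is a design choice of this sketch; the version above silently drops malformed input, which may be what you want when scanning a file.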