UTF8 incomplete byte sequence

Hi,

I have the following situation: I am reading bytes from a socket. These bytes can contain UTF-8 characters. Then I convert the bytes to a UTF-8 string. This all goes fine. The problem is when the byte sequence I have read ends with an incomplete UTF-8 byte sequence (because the remaining bytes will only be read on the next read from the socket). But I want to handle the rest of the bytes before reading the next chunk. What is the best way to do this?

Kind regards,

Marco Laponder

[489 byte] By [LaponderMLa] at [2007-11-27 5:11:47]
# 1

> Hi,
>
> I have the following situation: I am reading bytes
> from a socket. These bytes can contain UTF-8
> characters. Then I convert the bytes to a UTF-8
> string. This all goes fine.

I'm not so sure about that. If you are talking about Java, there is no such thing as a UTF-8 String. A String is always a sequence of UTF-16 code units.

> The problem is when the
> byte sequence I have read ends with an incomplete
> UTF-8 byte sequence (because the rest will be read on the next
> read from the socket).
> But I want to handle the rest
> of the bytes before reading the next chunk. What is
> the best way to do this?

You could write all bytes into a ByteArrayOutputStream first, before processing them.

CeciNEstPasUnProgrammeura at 2007-7-12 10:32:17 > top of Java-index,Java Essentials,Java Programming...
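
The ByteArrayOutputStream suggestion in reply #1 could look roughly like the sketch below. The class name BufferThenDecode, the method readAllAsUtf8, and the tiny 4-byte chunk size are made up for illustration, and the ByteArrayInputStream in main is only a stand-in for the socket stream. Note that this approach assumes the whole message can be collected before any of it is decoded, so it probably does not help if each chunk has to be processed as soon as it arrives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: collect all bytes first, decode once at the end.
final class BufferThenDecode {

    // Reads the stream to its end, then decodes the collected bytes in one step.
    static String readAllAsUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[4]; // deliberately tiny, so chunks split multi-byte characters
        int n;
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        // Decoding happens only now, so chunk boundaries no longer matter.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for socket.getInputStream(); the text contains 2- and 3-byte characters.
        byte[] wire = "h\u00e9llo w\u00f6rld \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAllAsUtf8(new ByteArrayInputStream(wire)));
    }
}
```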
# 2
You are not using the InputStreamReader class? Why?
jsalonena at 2007-7-12 10:32:17
# 3
No particular reason. If this is the solution, I will give it a go. Would the InputStreamReader give me the correct output? Kind regards, Marco Laponder, mlr@interchain.nl
LaponderMLa at 2007-7-12 10:32:17
# 4

> No particular reason, if this is the solution, I will
> give it a go. The InputStreamReader would give me the
> correct output?

API docs:

An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset.

CeciNEstPasUnProgrammeura at 2007-7-12 10:32:17
# 5
So the InputStreamReader would only give me complete characters? Right now I use new String(bytes, 0, byteLen, "UTF-8"); which gives me a string with a question mark when the last bytes were an incomplete byte sequence.
LaponderMLa at 2007-7-12 10:32:17
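
The question mark described in #5 is most likely the replacement character U+FFFD, which the String constructor substitutes for malformed input. If the chunks really have to be decoded one at a time, one option is java.nio.charset.CharsetDecoder, which can be told that more input may follow and then leaves an unfinished trailing sequence in the input buffer. The sketch below is only an illustration under that assumption: the class Utf8ChunkDecoder, its decode method, and the choice to throw IllegalArgumentException on invalid input are hypothetical and not something proposed in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: decodes chunk by chunk and carries unfinished bytes over.
final class Utf8ChunkDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Bytes of a character that was cut off at the end of the previous chunk.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    // Returns every complete character in (pending + chunk); keeps the rest for the next call.
    String decode(byte[] chunk, int length) {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + length);
        in.put(pending).put(chunk, 0, length);
        in.flip();

        // UTF-8 never yields more chars than it has bytes, so this capacity is enough.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput = false: an incomplete trailing sequence is left unread, not reported.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            throw new IllegalArgumentException("Invalid UTF-8 input: " + result);
        }

        // Whatever the decoder did not consume is the start of an unfinished character.
        ByteBuffer rest = ByteBuffer.allocate(in.remaining());
        rest.put(in);
        rest.flip();
        pending = rest;

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] wire = "a\u20acb".getBytes(StandardCharsets.UTF_8); // bytes: 61 E2 82 AC 62
        Utf8ChunkDecoder chunked = new Utf8ChunkDecoder();
        String first = chunked.decode(wire, 2); // chunk ends inside the euro sign
        String second = chunked.decode(Arrays.copyOfRange(wire, 2, 5), 3);
        System.out.println(first + "|" + second);
    }
}
```

In main, the first call returns only "a" because the euro sign is still incomplete, and the second call returns the euro sign followed by "b". Each chunk can therefore be handled before the next read without a stray replacement character appearing at the boundary.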
# 6
Wrap a reader around the InputStream and read chars/Strings instead of bytes/byte[]. No need for conversion, InputStreamReader does this for you.
quittea at 2007-7-12 10:32:17
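
For reference, the InputStreamReader approach from #2 and #6 might be sketched as follows. The names ReaderOverStream, readChars and oneByteAtATime are invented for this example; oneByteAtATime is just a stand-in for a socket stream that delivers data in very small pieces, so that every multi-byte character gets split across reads. In real code the InputStream would presumably come from socket.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: let InputStreamReader deal with split byte sequences.
final class ReaderOverStream {

    // Reads chars until end of stream; the reader keeps partial byte sequences internally.
    static String readChars(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buf = new char[16];
        int n;
        while ((n = reader.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        return text.toString();
    }

    // Stand-in for a socket stream: hands out at most one byte per read call.
    static InputStream oneByteAtATime(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(readChars(oneByteAtATime(wire)));
    }
}
```

One thing to keep in mind with this design: the reader only hands back complete characters, so on a real socket a read may block until the remaining bytes of a character have arrived.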