UTF8 incomplete byte sequence
Hi,
I have the following situation: I am reading bytes from a socket. These bytes can contain UTF-8 characters. Then I convert the bytes to a UTF-8 string. This all goes fine. The problem is when the byte sequence I have read ends with an incomplete UTF-8 byte sequence (because the rest will be read on the next read from the socket). But I want to handle the rest of the bytes before reading the next chunk. What is the best way to do this?
Kind regards,
Marco Laponder
> Hi,
>
> I have the following situation: I am reading bytes
> from a socket. These bytes can contain utf-8
> characters. Then I convert the bytes to a utf8
> string. This all goes fine.
I'm not so sure about that. If you are talking about Java, there is no such thing as a UTF-8 String. A Java String is always UTF-16 internally.
> The problem is when the
> byte sequence I have read ends with an incomplete
> UTF-8 byte sequence (because the rest will be read on the next
> read from the socket).
> But I want to handle the rest
> of the bytes before reading the next chunk. What is
> the best way to do this ?
You could write all bytes into a ByteArrayOutputStream first, before processing them.
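To illustrate the suggestion above, here is a minimal sketch in Java. It assumes the whole message can be read to the end of the stream before it is processed. The class name BufferThenDecode, the methods readAll and decodeInChunks, and the chunk size are made up for the example, and a ByteArrayInputStream stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BufferThenDecode {
    // Collects every chunk into a ByteArrayOutputStream and decodes only once,
    // after the stream has ended.
    static String readAll(InputStream in, int chunkSize) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Hypothetical convenience wrapper: an in-memory stream plays the socket.
    static String decodeInChunks(byte[] data, int chunkSize) {
        try {
            return readAll(new ByteArrayInputStream(data), chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        // A chunk size of 2 splits the two-byte e-acute across two reads.
        System.out.println(decodeInChunks(data, 2));
    }
}
```

Because decoding happens only once, a character that is split across two reads is never decoded on its own. The trade-off is that nothing can be handled until the stream ends or the message length is known some other way.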
No particular reason. If this is the solution, I will give it a go. Would the InputStreamReader give me the correct output?
Kind regards,
Marco Laponder
mlr@interchain.nl
> No particular reason, if this is the solution, I will
> give it a go. The inputstreamreader would give me the
> correct output ?
API docs:
An InputStreamReader is a bridge from byte streams to character streams: It reads bytes and decodes them into characters using a specified charset.
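As a rough sketch of what that means in practice, the Java example below reads through an InputStreamReader while the underlying stream hands over only one byte per read call. The classes ReaderDemo and OneByteAtATime and the method names are invented for the example; OneByteAtATime is just a stand-in for a socket that delivers a multi-byte character in pieces.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReaderDemo {
    // Hypothetical stand-in for a socket stream: at most one byte per read.
    static class OneByteAtATime extends FilterInputStream {
        OneByteAtATime(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Reads characters (not bytes) until the end of the stream.
    static String readChars(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[16];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper so the demo needs no checked exception handling.
    static String decodeSplit(byte[] data) {
        try {
            return readChars(new OneByteAtATime(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo \u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeSplit(data));
    }
}
```

The reader keeps the bytes of an unfinished character internally and asks the stream for more, so the caller only ever sees whole characters. Note that a read on the reader may block until the remaining bytes arrive.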
So the InputStreamReader object would only give me complete characters? Right now I use new String(bytes, 0, byteLen, "UTF-8"); which gives me a string with a question mark when the last bytes were an incomplete byte sequence.
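For what it is worth, the question mark is most likely the replacement character U+FFFD: the String constructor replaces malformed input, and a sequence cut off at the end of the array counts as malformed. One possible way to handle each chunk as it arrives is to use a CharsetDecoder directly and carry the incomplete tail over to the next chunk. The sketch below is only an illustration of that idea; the class ChunkDecoder and its feed method are hypothetical names, not an existing API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes UTF-8 chunk by chunk and keeps an
// incomplete trailing sequence for the next call.
class ChunkDecoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of a not yet complete character left over from the previous chunk.
    private byte[] pending = new byte[0];

    String feed(byte[] bytes, int off, int len) {
        // Put the leftover bytes in front of the new chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.length + len);
        in.put(pending);
        in.put(bytes, off, len);
        in.flip();

        // UTF-8 never yields more chars than bytes, so this is large enough.
        CharBuffer out = CharBuffer.allocate(in.remaining());

        // endOfInput = false: an incomplete sequence at the end is not treated
        // as an error; the decoder stops in front of it and leaves it in 'in'.
        decoder.decode(in, out, false);

        // Whatever was not consumed is saved for the next chunk.
        pending = new byte[in.remaining()];
        in.get(pending);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        ChunkDecoder d = new ChunkDecoder();
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8); // E2 82 AC
        String first = d.feed(euro, 0, 1);  // only the first byte has arrived
        String second = d.feed(euro, 1, 2); // the remaining two bytes
        // prints: true true
        System.out.println(first.isEmpty() + " " + second.equals("\u20ac"));
    }
}
```

Each call returns only the characters that are complete so far, so they can be handled before the next read from the socket. Bytes that are genuinely invalid, rather than merely incomplete, are still replaced with U+FFFD in this sketch.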