Reading Unicode files

I have a problem reading Unicode text files.

According to the books and tutorials, I should use a Reader instead of an InputStream for this purpose. I do so, and what I get is letters separated by \u0000 chars. So what is the way to convert Unicode to "normal" strings? I deleted all the 00 chars using a StringBuffer, but that doesn't work in older VMs, such as the one provided with Win98.

So, what is the way out? The second problem is going to be writing Unicode files :)

[468 byte] By [korsakoff] at [2007-9-27 21:18:48]
# 1

It sounds like you have a file encoded in UTF-16. If you use a FileReader, let's say, that object will assume the file you are reading is encoded in your system's default encoding. Unfortunately that default is not going to be UTF-16, but something else like ISO-8859-1. With a single-byte encoding like that, the zero byte in each two-byte UTF-16 code unit is decoded as a separate \u0000 character, which is what you are seeing. To read a file using an encoding that you specify, do this:

Reader r = new InputStreamReader(new FileInputStream(yourFile), "UTF-16BE");

This will give you a reader that will probably work, although I don't know whether your files are "big-endian" or "little-endian", so you might have to use "UTF-16LE" instead.
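To make the difference concrete, here is a minimal sketch that reads the same bytes twice, once with a single-byte charset and once with the matching UTF-16 one. The class name Utf16ReadSketch, the helper readAll, and the two-letter sample file are made up for illustration and are not from your code; it sticks to try/finally and StringBuffer so it should not need newer language features.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class Utf16ReadSketch {

    // Decodes the whole file with the given charset name and returns the text.
    // (Hypothetical helper, named for this example only.)
    static String readAll(File file, String charsetName) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            Reader r = new InputStreamReader(in, charsetName);
            StringBuffer sb = new StringBuffer();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: the letters "Hi" as UTF-16LE bytes, no byte order mark.
        File f = File.createTempFile("utf16-sample", ".txt");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(new byte[] { 'H', 0, 'i', 0 });
        } finally {
            out.close();
        }

        // Wrong charset: every zero byte becomes its own \u0000 character.
        System.out.println(readAll(f, "ISO-8859-1").length()); // prints 4
        // Matching charset: each byte pair decodes to one character.
        System.out.println(readAll(f, "UTF-16LE"));             // prints Hi
    }
}
```

If the file starts with a byte order mark, the plain "UTF-16" charset name may be the easier choice: when decoding it uses the mark to pick the byte order and falls back to big-endian when there is none, whereas "UTF-16BE" and "UTF-16LE" do not treat a leading mark as an order indicator.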

You can wrap that Reader in a BufferedReader if you like; that's generally more efficient for input processing. To write your data in UTF-16, do the same thing using an OutputStreamWriter with the same encoding. Check out the API documentation page for the java.io package; the line about InputStreamReader will have a link for either "encoding" or "charset" that will give you more information about UTF-16 and so on.
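For the writing side, a rough sketch along the same lines might look like this. The class name Utf16WriteSketch, the helpers writeText and readFirstLine, and the sample string are invented for the example, and "UTF-16LE" is only an assumption about which byte order you want.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf16WriteSketch {

    // Writes text to the file, encoding characters with the given charset name.
    static void writeText(File file, String text, String charsetName) throws IOException {
        FileOutputStream out = new FileOutputStream(file);
        try {
            Writer w = new BufferedWriter(new OutputStreamWriter(out, charsetName));
            w.write(text);
            w.flush(); // push buffered characters through the encoder before closing
        } finally {
            out.close();
        }
    }

    // Reads the first line back through a BufferedReader using the same charset.
    static String readFirstLine(File file, String charsetName) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            BufferedReader r = new BufferedReader(new InputStreamReader(in, charsetName));
            return r.readLine();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("utf16-out", ".txt");
        f.deleteOnExit();

        // Six characters (including the newline), written with \\u escapes so the
        // example does not depend on the encoding of the source file itself.
        writeText(f, "Gr\u00fc\u00dfe\n", "UTF-16LE");

        System.out.println(f.length()); // prints 12: six chars, two bytes each
        System.out.println("Gr\u00fc\u00dfe".equals(readFirstLine(f, "UTF-16LE"))); // prints true
    }
}
```

Note that "UTF-16LE" and "UTF-16BE" write no byte order mark, while the plain "UTF-16" name writes big-endian with a mark in front. Which one you want depends on what the program reading the file expects.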

DrClap at 2007-7-7 3:12:50