Help! Converting between Unicode and ISO 8859

Hi Friends,

I have a question.

I would like to accept a string (Chinese characters) from a TextField and save it to an ISO 8859 format file. Later I need to read the data from that file, convert it back to a Unicode string, and display it in the TextField.

I have tried the getBytes method, but it does not work correctly. Am I using it incorrectly?

Please help me.

Thank you very much!

By dragontail77a at 2007-10-1 19:01:13
# 1

I think your first problem is that you are trying to save Chinese characters to a file encoded in ISO 8859. There is no ISO 8859 encoding that covers Chinese characters, so any Chinese characters will be lost during that operation.

You should save the characters either to a UTF-8 encoded file (my recommendation) or to a file using a Chinese encoding (you need to choose one that is appropriate for either Simplified or Traditional Chinese).
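A quick way to confirm this is to ask each charset's encoder whether it can represent the text at all. This sketch uses the standard `CharsetEncoder.canEncode` method; the sample character 路 (U+8DEF) and the class name `EncodingCheck` are only illustrations. GBK and Big5 are assumed here as examples of Simplified and Traditional Chinese encodings, and the code checks `Charset.isSupported` first because the JDK does not guarantee them on every platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reports whether a charset can represent every character of a string.
final class EncodingCheck {
    static boolean canEncode(String text, Charset charset) {
        // Use a fresh encoder per call; CharsetEncoder instances are not thread-safe.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u8DEF"; // U+8DEF, a Chinese character
        System.out.println("ISO-8859-1: " + canEncode(text, StandardCharsets.ISO_8859_1)); // false
        System.out.println("UTF-8:      " + canEncode(text, StandardCharsets.UTF_8));      // true
        // GBK (Simplified) and Big5 (Traditional) ship with most JDKs but are not
        // required on every platform, so check for them before use.
        for (String name : new String[] {"GBK", "Big5"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + canEncode(text, Charset.forName(name)));
            }
        }
    }
}
```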

one_danea at 2007-7-11 14:18:57 > top of Java-index,Desktop,I18N...
# 2

Thank you very much, one_dane, but the problem is that the file I want to save the data in contains other data that I do not want to save in a different format. So I hope I can split one Chinese character into two bytes, save them in the file, and later read the data from the file and combine the two bytes back into Unicode.
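Splitting a character into two bytes is essentially what UTF-16 does: a Java `char` is one 16-bit UTF-16 code unit, so it divides into a high byte and a low byte. The sketch below does the split and the recombination by hand (the class and method names are hypothetical); for well-formed text the bytes match `String.getBytes(StandardCharsets.UTF_16BE)`. Keep in mind that the resulting bytes can take any value from 0x00 to 0xFF, so whatever writes the rest of the file must treat them as opaque bytes rather than text.

```java
// Splits each Java char (one 16-bit UTF-16 code unit) into two bytes and back.
final class TwoByteChars {
    /** Splits each char into a high byte and a low byte (big-endian order). */
    static byte[] split(String s) {
        byte[] out = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[2 * i] = (byte) (c >>> 8); // high 8 bits
            out[2 * i + 1] = (byte) c;     // low 8 bits
        }
        return out;
    }

    /** Recombines byte pairs produced by split() into a String. */
    static String join(byte[] bytes) {
        if (bytes.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of bytes");
        }
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            // Mask with 0xFF because Java bytes are signed.
            int hi = bytes[2 * i] & 0xFF;
            int lo = bytes[2 * i + 1] & 0xFF;
            chars[i] = (char) ((hi << 8) | lo);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = "\u8DEF";
        byte[] b = split(text);
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // prints "8D EF"
        System.out.println(join(b).equals(text));                    // prints "true"
    }
}
```

Because the split works per `char`, a character outside the Basic Multilingual Plane (stored in Java as a surrogate pair) takes four bytes with this scheme.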

Thank you.

dragontail77a at 2007-7-11 14:18:57
# 3
You have your work cut out for you then. What code do you use to "combine 2 bytes back to Unicode format"? Your original question related to the use of getBytes, but getBytes won't help you to do that.
one_danea at 2007-7-11 14:18:57
# 4

Thank you, one_dane.

There is a TextField called "Customer", and I save its content to a String called Customer. People can enter Chinese characters in the Customer field. I use bytes = Customer.getBytes("UTF-8") to convert Customer to a byte[], and I save those bytes to an ISO 8859 format file. Later I want to read the data from the file and use new String(/*data*/, "UTF-8") to convert it back so that the Chinese characters display correctly in the TextField. So far it does not work, and I do not know whether this is the right method.
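Saving bytes (rather than characters) to a file involves no character encoding at all, so if the UTF-8 bytes are written and read with byte streams, the "ISO 8859" label of the file does not affect them. Here is a minimal sketch using Java 7 NIO (`Files.write`, `Files.readAllBytes`); the class name and temp file are hypothetical. It assumes nothing between save and load decodes the bytes with one charset and re-encodes them with another.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Writes the UTF-8 bytes of a String to a file verbatim and reads them back.
final class RawBytesFile {
    static void save(Path file, String text) {
        try {
            // Byte-level write: no charset is applied to the byte[] itself.
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String load(Path file) {
        try {
            // Decode with the same charset that produced the bytes.
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("customer", ".dat"); // hypothetical file name
        save(tmp, "\u8DEF");
        System.out.println(Files.size(tmp));             // prints 3 (three UTF-8 bytes)
        System.out.println(load(tmp).equals("\u8DEF"));  // prints true
        Files.delete(tmp);
    }
}
```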

Thanks.

dragontail77a at 2007-7-11 14:18:57
# 5

OK, so you do use UTF-8 as the encoding. I still don't understand the part about saving the bytes to an "iso8859 format file". Which method do you use to save the data?

To preserve the Chinese characters (and be able to decode them correctly when you read the file back), you would need to construct an OutputStreamWriter on a FileOutputStream and specify the encoding (UTF-8). This would save the characters in UTF-8, however, not in ISO 8859.
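A minimal sketch of the OutputStreamWriter approach, with the matching InputStreamReader for reading the file back. It assumes Java 7 or later (try-with-resources and `StandardCharsets`); `Utf8TextFile` and the temp-file name are hypothetical. Passing the charset explicitly on both sides also avoids depending on the platform default encoding.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8TextFile {
    /** Writes text to file, encoding the characters as UTF-8. */
    static void write(File file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    /** Reads the whole file, decoding it with the same charset used to write it. */
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("customer", ".txt"); // hypothetical file name
        write(tmp, "Customer: \u8DEF");
        System.out.println(read(tmp).equals("Customer: \u8DEF")); // prints true
        tmp.delete();
    }
}
```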

If you save the file specifying iso8859 (or use FileWriter, which saves the characters using the system's default encoding), then the data will be lost (unless the system's default encoding is a Chinese code page).
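The loss is easy to reproduce without a file: encoding the string with ISO 8859-1 and decoding it again is what a save-and-reload in that charset amounts to. In this sketch (hypothetical class name), `String.getBytes` substitutes `?` for the unmappable character, which is why Chinese text saved this way typically comes back as question marks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyLatin1 {
    /** Encodes s as ISO-8859-1 and decodes it again, as a save and reload in that charset would. */
    static String throughLatin1(String s) {
        // String.getBytes replaces unmappable characters with the charset's
        // replacement byte, which for ISO-8859-1 is '?' (0x3F).
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u8DEF";
        System.out.println(Arrays.toString(original.getBytes(StandardCharsets.ISO_8859_1))); // [63]
        System.out.println(throughLatin1(original));                  // ?
        System.out.println(throughLatin1(original).equals(original)); // false
    }
}
```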

one_danea at 2007-7-11 14:18:58
# 6
Thanks, one_dane. My part is only to put the data into a byte[] and get it back out; other people will write the byte array to the file and read it back. So I would like to know how to keep the Chinese character information when using a byte[]. Thank you.
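If the people handling the file need to know where your field ends, a length prefix helps, because the UTF-8 byte count differs from the character count (路 is one character but three bytes). One option in the standard library is `DataOutputStream.writeUTF` / `DataInputStream.readUTF`, which write a 2-byte length followed by the text in *modified* UTF-8 (identical to standard UTF-8 except for U+0000 and characters outside the Basic Multilingual Plane), limited to 65535 encoded bytes. This is a sketch under the assumption that you control the layout of the shared byte array; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Packs a string into a byte[] as: 2-byte length, then modified UTF-8 bytes.
final class LengthPrefixed {
    static byte[] pack(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(text); // throws UTFDataFormatException above 65535 bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String unpack(byte[] data) {
        try {
            // readUTF reads the length prefix, then exactly that many bytes.
            return new DataInputStream(new ByteArrayInputStream(data)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = pack("\u8DEF");
        System.out.println(data.length);                   // prints 5: 2-byte length + 3 bytes
        System.out.println(unpack(data).equals("\u8DEF")); // prints true
    }
}
```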
dragontail77a at 2007-7-11 14:18:58
# 7

Well, you can definitely use String.getBytes to convert a Java String object into a byte array, you can convert a byte array back to Unicode using a String constructor, and you should get a correct result IF the data you start out with is good data. There is no way getBytes will somehow magically be able to restore Chinese characters if the data you are dealing with has been through a conversion to iso8859-1 and back again first, however.
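The direction of the conversion matters here. Encoding Chinese characters as ISO 8859-1 destroys them, but ISO 8859-1 maps every byte value 0x00 to 0xFF to exactly one character (U+0000 to U+00FF), so UTF-8 *bytes* that are merely decoded as ISO 8859-1 and re-encoded the same way come back unchanged. The sketch below illustrates that property; the names are hypothetical, and it only holds if every step in between uses ISO 8859-1 consistently.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Carrier {
    /** Decodes arbitrary bytes as ISO-8859-1: each byte becomes one char in U+0000..U+00FF. */
    static String bytesAsLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Re-encodes such a string; every char maps back to its original byte. */
    static byte[] latin1AsBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u8DEF".getBytes(StandardCharsets.UTF_8); // E8 B7 AF
        String carrier = bytesAsLatin1(utf8);   // three Latin-1 chars that look like garbage
        byte[] restored = latin1AsBytes(carrier);
        System.out.println(new String(restored, StandardCharsets.UTF_8).equals("\u8DEF")); // true
    }
}
```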

So you need to verify that the data you are trying to convert is valid first, then you can worry about getBytes. I am sure you have already consulted the documentation - there's a good introduction in the internationalization tutorial here: http://java.sun.com/docs/books/tutorial/i18n/text/string.html
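One way to check that a byte array really holds valid UTF-8 before converting it is a `CharsetDecoder` set to report errors. By contrast, `new String(bytes, StandardCharsets.UTF_8)` never fails; it silently substitutes U+FFFD for bad sequences, which can hide corrupted data. The class and method names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8 {
    /** Decodes bytes as UTF-8, throwing on malformed input instead of substituting U+FFFD. */
    static String decode(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    static boolean isValidUtf8(byte[] data) {
        try {
            decode(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xE8, (byte) 0xB7, (byte) 0xAF}; // UTF-8 for U+8DEF
        byte[] cut = {(byte) 0xE8, (byte) 0xB7};                // truncated sequence
        System.out.println(isValidUtf8(good)); // true
        System.out.println(isValidUtf8(cut));  // false
    }
}
```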

It includes sample code that does a roundtrip conversion from a String object to utf-8 and back.
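A round trip of that kind looks roughly like this. The sketch uses the `Charset` overloads (`String.getBytes(Charset)` since Java 6, `StandardCharsets` since Java 7); the `getBytes("UTF-8")` / `new String(bytes, "UTF-8")` form used in this thread behaves the same but declares the checked `UnsupportedEncodingException`. The class name and sample text are illustrative, not taken from the tutorial.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] b) {
        return new String(b, StandardCharsets.UTF_8); // must use the same charset as encode()
    }

    public static void main(String[] args) {
        String original = "Customer \u8DEF";
        byte[] bytes = encode(original);
        // ASCII characters take 1 byte each in UTF-8, U+8DEF takes 3.
        System.out.println(original.length() + " chars, " + bytes.length + " bytes"); // 10 chars, 12 bytes
        System.out.println(decode(bytes).equals(original));                            // true
    }
}
```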

one_danea at 2007-7-11 14:18:58
# 8
Thank you very much, one_dane.
dragontail77a at 2007-7-11 14:18:58
# 9
Thank you, one_dane, I have solved the problem. I just use getBytes("UTF-8") to convert the data to UTF-8 and store it in the byte array, and use new String(bytes, "UTF-8") to get it back. Thanks for your help!
dragontail77a at 2007-7-11 14:18:58
# 10

Hi dragontail77,

Do you have JavaScript code to convert a Chinese character to UTF-8 and then from UTF-8 back to the Chinese character?

For instance, when I enter 路 in a Google search, one of the parameters in the URL looks like this:

q=%E8%B7%AF

which I guess is UTF-8 in base 16 (hexadecimal).
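That guess is close: each `%XX` is one byte of the UTF-8 encoding written in hexadecimal. 路 is U+8DEF, and its UTF-8 form is the three bytes E8 B7 AF. The sketch below (hypothetical helper name) reproduces that by formatting every UTF-8 byte as `%XX`; unlike a real URL encoder it also escapes ASCII letters and digits, so treat it as an illustration only.

```java
import java.nio.charset.StandardCharsets;

final class PercentHex {
    /** Writes every UTF-8 byte of s as %XX with uppercase hex digits. */
    static String percentEncodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeAll("\u8DEF")); // prints %E8%B7%AF
    }
}
```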

Do you have JavaScript that can convert 路 -> %E8%B7%AF, and then, on the other page, take the UTF-8 (%E8%B7%AF), convert it back to 路, and display it again in the input text field?

When this %E8%B7%AF string is sent to the servlet, how do I convert it back to text so the database can search for it? Please help, thanks!
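On the Java side, the standard `java.net.URLDecoder` reverses this encoding and `URLEncoder` produces it. This sketch assumes Java 10 or later for the `Charset` overloads (older code passes the name "UTF-8" and handles `UnsupportedEncodingException`); the class name is hypothetical. Both classes implement HTML form encoding, so a space becomes `+`. In a servlet, `request.getParameter` normally performs this decoding already; the usual problem is that the container decodes with a different charset, so for POST bodies call `request.setCharacterEncoding("UTF-8")` before reading parameters, and for GET query strings check the container's URI-encoding setting. In the browser, JavaScript's built-in `encodeURIComponent` and `decodeURIComponent` do the same UTF-8 percent-encoding.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamCodec {
    /** Form-encodes a value as UTF-8 percent escapes (a space becomes '+'). */
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);   // Java 10+ overload
    }

    /** Reverses encode(): "%E8%B7%AF" becomes the single character U+8DEF. */
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        System.out.println(encode("\u8DEF"));                     // prints %E8%B7%AF
        System.out.println(decode("%E8%B7%AF").equals("\u8DEF")); // prints true
    }
}
```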

kmthiena at 2007-7-11 14:18:58