Problem with UTF-8 To Big5

Has anyone converted UTF-8 into Big5 with any luck? I seem to just get question marks. If I print out the string in UTF-8 I see 3 bytes per character, which I think is correct, and I'm trying to convert it for display on a web page using the "Big5" meta tag. However, just printing to the console gives me incorrect results after conversion.

--

before conversion utf8(console) = Φ?#9618; Φ╢?Φ τ赦

after conversion (console) = ?

should look like (console) =  ╢W ┴p ▓y

--

3 bytes to 2 bytes is correct, I think; however, the question marks aren't correct. I'm getting the data from Oracle, which is set to store Unicode.

I'm doing something like this...

code

byte[] bTitle = persistantbean.getTitle().getBytes("UTF8");

String sUTFTitle = new String(bTitle,"UTF8");

String sTitleB5 = new String(sUTFTitle.getBytes(),"Big5");

System.out.println(sUTFTitle);

System.out.println(sTitleB5);

code

Any comments and/or suggestions would be greatly appreciated. Thanks in advance for any help.

Rick

[1143 byte] By [RickGavin] at [2007-9-27 14:34:14]
# 1

byte[] bTitle = persistantbean.getTitle().getBytes("UTF8");

String sUTFTitle = new String(bTitle,"UTF8");

/** don't need

String sTitleB5 = new String(sUTFTitle.getBytes(),"Big5");

**/

System.out.println(sUTFTitle);

/** don't need

System.out.println(sTitleB5);

**/

Java will automatically convert the Unicode to the platform encoding when you call System.out.println. Are you using a multilingual console, like command.com in Windows 2000 or the Linux console?
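
To make the point about System.out concrete, here is a minimal sketch of what a console stream does with characters its encoding cannot represent. The class name ConsoleEncodingDemo, the method printedAs and the two sample characters are made up for illustration, and it assumes the JRE ships the Big5 charset (standard full JDK builds do). A PrintStream over a byte buffer stands in for the real console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class: simulates a console that uses a given encoding.
class ConsoleEncodingDemo {

    // Prints the text through a PrintStream that encodes with charsetName,
    // then decodes the bytes the same way a terminal using that charset would.
    static String printedAs(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream console = new PrintStream(sink, true, charsetName);
        console.print(text);
        console.flush();
        return sink.toString(charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String title = "\u4e2d\u6587"; // two Chinese characters, chosen for illustration

        // A console encoding without Chinese characters replaces each one with '?'.
        System.out.println(printedAs(title, "US-ASCII"));

        // A Big5 console can represent them, so the text survives unchanged.
        System.out.println(printedAs(title, "Big5").equals(title));
    }
}
```

If the real console behaves like the first case, question marks are expected no matter how the String was built.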

cwlimno1 at 2007-7-5 22:33:39 > top of Java-index,Core,Core APIs...
# 2

> 3 bytes to 2 bytes is correct, I think; however, the

> question marks aren't correct. I'm getting the data

> from Oracle, which is set to store Unicode.

>

> I'm doing something like this...

> code

>

>

> byte[] bTitle = persistantbean.getTitle().getBytes("UTF8");

> String sUTFTitle = new String(bTitle,"UTF8");

> String sTitleB5 = new String(sUTFTitle.getBytes(),"Big5");

> System.out.println(sUTFTitle);

> System.out.println(sTitleB5);

You don't want to store UTF-8 or Big5 in a String. A String contains Unicode characters, which are converted to a byte encoding only when they are written out. The appropriate way is to use an OutputStreamWriter with the appropriate encoding.

Typically you would use

Writer out = new OutputStreamWriter( someOutputStream, "Big5" );

out.write( persistantbean.getTitle() );

or

byte[] asBig5 = persistantbean.getTitle().getBytes( "Big5" );

regards

robert

r_klemme at 2007-7-5 22:33:39
# 3
Not using a multi-lang console, but I should still see the correct ASCII characters.
RickGavin at 2007-7-5 22:33:39
# 4

1. byte[] bTitle = persistantbean.getTitle().getBytes("UTF8");

2. String sUTFTitle = new String(bTitle,"UTF8");

3. String sTitleB5 = new String(sUTFTitle.getBytes(),"Big5");

You seem to have the common delusion that there is such a thing as a Big5 string and a UTF-8 string. There isn't. All strings in Java are Unicode (UTF-16). A string can be converted to an array of bytes using a particular encoding, or vice versa. So let's see exactly what that code you wrote does:

1. You get a String from persistantbean and convert it to UTF-8 bytes, using the UTF-8 encoding.

2. You convert that array of bytes back to a String, using UTF-8 encoding. This should get you back the same String data that persistantbean originally returned.

3. You convert the String to an array of bytes, but since you didn't specify the encoding, the default encoding for your platform is used. This may be ISO-8859-1 or it may be something else. Then you convert that array of bytes back to a String, using the Big5 encoding, resulting in garbage.

Here's what you should do instead:

String sTitle = persistantbean.getTitle();

byte[] bBig5Title = sTitle.getBytes("Big5");
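
The three steps above can be traced in a small sketch. Everything named here (Big5ConversionDemo, lossyStep3, toBig5, the sample characters) is made up for illustration, and the platform default is assumed to be ISO-8859-1, which is only one of the possibilities mentioned above. It also assumes the JRE provides the Big5 charset.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class contrasting the original step 3 with the suggested fix.
class Big5ConversionDemo {

    // Step 3 of the original code, with ISO-8859-1 standing in for the
    // platform default encoding (an assumption for this sketch).
    static String lossyStep3(String title) throws UnsupportedEncodingException {
        // ISO-8859-1 has no Chinese characters, so each one is encoded as '?'.
        byte[] defaultBytes = title.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding those bytes as Big5 cannot bring the lost characters back.
        return new String(defaultBytes, "Big5");
    }

    // The suggested conversion: encode the String directly as Big5 bytes.
    static byte[] toBig5(String title) throws UnsupportedEncodingException {
        return title.getBytes("Big5");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String title = "\u4e2d\u6587"; // two Chinese characters, chosen for illustration

        System.out.println(title.getBytes(StandardCharsets.UTF_8).length); // 6: three bytes per character
        System.out.println(toBig5(title).length);                          // 4: two bytes per character
        System.out.println(lossyStep3(title));                             // ??
    }
}
```

With a default encoding that cannot represent the characters, the damage happens at getBytes(), before Big5 is ever involved.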

DrClap at 2007-7-5 22:33:39
# 5

> Here's what you should do instead:

> String sTitle = persistantbean.getTitle();

> byte[] bBig5Title = sTitle.getBytes("Big5");

Forgive me if I'm ignorant about byte arrays, but what do I do with the byte array once I have it in this state? The data needs to be sent to the servletResponse's writer to be written to a web page, and the writer doesn't take a byte array. If I convert it back to a String at this point, won't I lose the encoding, since you said ALL strings are UTF-16?

So, how would I send it to a Writer (HttpServletResponse.getWriter()) and/or to the console to check against the ASCII values?

Thanks for your time..

Rick

RickGavin at 2007-7-5 22:33:39
# 6
> so, how would i send it to the a Writer

> (HttpServletResponse.getWriter()) and or to the

> console to check against the ascii values..

You don't. You access the output stream and then create a writer on top of that with the proper encoding.

robert
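
A minimal sketch of that approach, assuming the Big5 charset is available in the JRE. The class Big5WriterDemo and the method writeAsBig5 are made up for illustration, and a ByteArrayOutputStream stands in for the stream you would get from the servlet response, so this is not servlet code as such.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical demo class: wraps a raw output stream in a Big5 writer.
class Big5WriterDemo {

    // Writes the text to the given stream, encoding it as Big5 on the way out.
    static void writeAsBig5(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, "Big5");
        writer.write(text);
        // Flush so the encoded bytes reach the stream; closing is left to the caller.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real output stream (an assumption for this sketch).
        ByteArrayOutputStream standIn = new ByteArrayOutputStream();
        writeAsBig5(standIn, "\u4e2d\u6587"); // two Chinese characters, chosen for illustration
        System.out.println(standIn.size());   // 4: two Big5 bytes per character
    }
}
```

The String itself never changes encoding; only the bytes leaving the writer are Big5.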
r_klemme at 2007-7-5 22:33:39