Writing literals in program text

[nobr]Say I have a JSP that specifies "text/html; charset=utf-8" as the ContentType, and I want to write a Chinese text literal in the program text that includes characters from the Big5-HKSCS charset. Let's say I am writing this JSP on a Red Hat 8 box running Tomcat 5.5.9. How do I do that without first having to convert the literal into \uxxxx sequences with native2ascii?

I tried the following without success (I mostly get question marks where I expect to see Chinese characters):

<%
String s = new String("<insert your chinese text>".getBytes("ISO8859_1"), "UTF8");
%>

I want to see Chinese text here: <%= s %> <br>

[/nobr]

[822 byte] By [dickensla] at [2007-10-1 20:38:16]
# 1

[nobr]> I have failed the following (I get question marks mostly where I expect to see Chinese chars):
>
> <%
> String s = new String( "<insert your chinese text>".getBytes( "ISO8859_1" ), "UTF8" );
> %>
>
> I want to see Chinese text here: <%= s %> <br>

That's because it's rubbish code.

If you take some Chinese characters and encode them into bytes using ISO-8859-1, which defines only Latin characters and nothing else, then the result is all question marks: ISO-8859-1 can't encode them, so each one is replaced with a '?' byte.

Of course UTF-8 has no problem decoding those '?' bytes into '?' chars at all.
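To make that concrete, here is a minimal Java sketch of what the expression in the question does. The class name `Latin1RoundTrip` and the sample text are only illustrative, and it assumes a JDK that has `java.nio.charset.StandardCharsets` (Java 7 or later). The Chinese characters are written as escapes purely so the example does not depend on how the file itself is saved.

```java
import java.nio.charset.StandardCharsets;

class Latin1RoundTrip {
    // Encode the text as ISO-8859-1 bytes, then decode those bytes as UTF-8,
    // mirroring the expression used in the question.
    static String roundTrip(String text) {
        // Characters that ISO-8859-1 cannot represent become the byte for '?'.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // '?' is plain ASCII, so UTF-8 decodes it back to '?' unchanged.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        System.out.println(roundTrip(chinese)); // prints "??"
    }
}
```

The original characters are already lost after the `getBytes` step, so nothing done to the byte array afterwards can bring them back.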

Your code should look like this: String s = "<insert your chinese text>";

And what do you have against native2ascii? Doesn't it work?[/nobr]

DrClapa at 2007-7-13 2:36:30 > top of Java-index,Desktop,I18N...
# 2

Assuming that you really just want to hardcode the Chinese text directly into your code, I don't understand why it would be so important to avoid the \uxxxx notation, since it's a trivial, one-time conversion.
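For a sense of how small that conversion is, here is a rough Java sketch of the same idea. It is not native2ascii itself, just a hypothetical helper (`UnicodeEscaper.toEscapes`) that replaces every non-ASCII UTF-16 code unit with its four-digit hex escape and leaves ASCII alone.

```java
class UnicodeEscaper {
    // Replace each non-ASCII UTF-16 code unit with a backslash-u escape
    // followed by four hex digits; ASCII characters pass through unchanged.
    static String toEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input literal uses escapes so this file stays ASCII-only.
        System.out.println(toEscapes("Hi \u4e2d\u6587"));
    }
}
```

Because the loop works on `char` values, a character outside the Basic Multilingual Plane comes out as two escapes (its surrogate pair), which is the form a Java string literal needs.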

Secondly, I don't understand the purpose of the getBytes method in your code - you create a byte array in ISO-8859-1 encoding (almost certain to wreak havoc on any Chinese characters), and then you take that byte array and decode it back to a string using utf-8 as the charset. Even assuming that the encoding back and forth would work, what does it accomplish?

one_danea at 2007-7-13 2:36:30
# 3

Thanks to DrClap and one_dane for their replies to my query. In hindsight, I realize that I wrote something stupid.

As to what I have against native2ascii: well, all those \uxxxx sequences aren't too pleasing to human readers, right? I can include the original literal (in whatever encoding) as a comment, but that's rather cumbersome, and I wonder if there is a better way.

If you guys say "get an editor that allows you to edit text files including utf-8 encoded characters", well, my next question is gonna be "which one?"

Then again, are you guys saying that it's not even proper to write UTF-8 encoded literals in the program text (in my case, a JSP page), even if one can do it?

dickensla at 2007-7-13 2:36:30
# 4
You should be able to include your Chinese characters in UTF-8 in a JSP just fine; I don't see why not. Some text editors that will let you edit UTF-8 text are SC UniPad and UltraEdit.
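One caveat: this only works if the file is really saved as UTF-8 and the container also decodes it as UTF-8. For a JSP, that comes from the page's declared encoding (the `pageEncoding` attribute of the page directive, or, as far as I know, the charset given in `contentType` when `pageEncoding` is absent). The plain-Java sketch below is not JSP-specific; it uses a hypothetical helper, `saveAsUtf8ThenDecode`, and assumes Java 7 or later, to show what happens when UTF-8 bytes are decoded with the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceEncodingDemo {
    // Simulate saving text as a UTF-8 file and then reading it back
    // with the charset the reader believes the file is in.
    static String saveAsUtf8ThenDecode(String text, Charset assumed) {
        byte[] fileBytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, 3 UTF-8 bytes each

        String right = saveAsUtf8ThenDecode(original, StandardCharsets.UTF_8);
        String wrong = saveAsUtf8ThenDecode(original, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // prints "true"
        System.out.println(wrong.length());         // prints "6": one Latin-1 char per byte
    }
}
```

If Chinese literals in a UTF-8 JSP show up as runs of accented Latin characters rather than question marks, a mismatch of this kind is the first thing to check.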
one_danea at 2007-7-13 2:36:30