writing literals in program text
Say I have a JSP that specifies "text/html; charset=utf-8" as its ContentType, and I want to write a Chinese text literal directly in the program text, where that literal includes characters from the Big5-HKSCS charset. Let's say I am writing this JSP on a Red Hat 8 box running Tomcat 5.5.9. How do I do that without first converting the literal into \uxxxx sequences with native2ascii?
I have tried the following without success (I mostly get question marks where I expect to see Chinese characters):
<%
String s = new String("<insert your chinese text>".getBytes("ISO8859_1"), "UTF8");
%>
I want to see Chinese text here: <%= s %> <br>
By dickensla at 2007-10-1 20:38:16

> I have failed the following (I get question marks
> mostly where I expect to see Chinese chars):
>
> <%
> String s = new String("<insert your chinese text>".getBytes("ISO8859_1"), "UTF8");
> %>
>
> I want to see Chinese text here: <%= s %> <br>
That's because it's rubbish code.
If you take some Chinese characters and encode them into bytes using ISO8859-1 -- which defines only the Latin characters and nothing else -- then the result is all question marks, because ISO8859-1 can't encode them.
Of course UTF-8 has no problem decoding those '?' bytes into '?' chars at all.
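To see the failure concretely, here is a small stand-alone sketch. The class and method names are made up for illustration, and it uses `java.nio.charset.StandardCharsets`, which needs Java 7 or later (newer than what a Tomcat 5.5 setup would typically run). The Chinese text is written as escapes so the demo does not depend on the encoding of its own source file; it is encoded as ISO-8859-1 the way the original scriptlet does, then decoded as UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {

    // Mimics the original scriptlet: encode as ISO-8859-1, then decode as UTF-8.
    static String roundTrip(String text) {
        // ISO-8859-1 cannot represent Chinese characters, so getBytes()
        // substitutes the replacement byte '?' (0x3F) for each one.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding '?' bytes as UTF-8 simply yields '?' characters.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two CJK characters, U+4E2D and U+6587
        System.out.println(roundTrip(chinese)); // prints ??
    }
}
```

The information is already gone after `getBytes`, so no choice of charset on the decoding side can bring the characters back.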
Your code should look like this:

String s = "<insert your chinese text>";
And what do you have against native2ascii? Doesn't it work?
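The plain literal works only if the bytes of the page are read back with the same encoding they were saved in. In JSP 2.0 (which Tomcat 5.5 implements) this is governed by the `pageEncoding` attribute of the page directive, e.g. `<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>`; when `pageEncoding` is absent, the charset from `contentType` is used, and when neither names one, ISO-8859-1 is the default. The sketch below is a hypothetical simulation, not Tomcat code: `readSource` stands in for the container turning a file saved as UTF-8 back into text, once with the matching encoding and once with ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageEncodingDemo {

    // Hypothetical stand-in for the JSP container decoding the page's bytes.
    static String readSource(byte[] fileBytes, Charset pageEncoding) {
        return new String(fileBytes, pageEncoding);
    }

    public static void main(String[] args) {
        String literal = "\u4e2d\u6587";                               // what the author typed
        byte[] savedAsUtf8 = literal.getBytes(StandardCharsets.UTF_8); // the file on disk

        // Matching encoding: the literal survives intact.
        System.out.println(readSource(savedAsUtf8, StandardCharsets.UTF_8).equals(literal));      // true
        // Mismatched encoding: each UTF-8 byte becomes a separate Latin-1 character.
        System.out.println(readSource(savedAsUtf8, StandardCharsets.ISO_8859_1).equals(literal)); // false
    }
}
```

With the mismatch, the six UTF-8 bytes of the two characters come back as six unrelated Latin-1 characters, the usual symptom of a file saved in one encoding and read in another.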
Assuming that you really just want to hardcode the Chinese text directly into your code, I don't understand why it matters so much to avoid the \uxxxx notation, since it's a trivial, one-time conversion.
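For reference, here is a rough Java sketch of the kind of conversion native2ascii performs: every non-ASCII `char` is replaced by a \uxxxx escape. It is an illustration only, not native2ascii's actual implementation, and the class name is made up. Characters outside the Basic Multilingual Plane (some HKSCS characters are mapped there) come out as two escapes, one per UTF-16 surrogate, which is also how Java source represents them.

```java
public class UnicodeEscaper {

    // Replaces every non-ASCII UTF-16 code unit with a backslash-u escape,
    // similar in spirit to the output of native2ascii.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);                                  // plain ASCII passes through
            } else {
                out.append(String.format("\\u%04x", (int) c)); // e.g. U+4E2D becomes backslash-u 4e2d
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("abc\u4e2d\u6587"));
    }
}
```

Running it prints `abc\u4e2d\u6587`, i.e. the literal escape text that can be pasted into a source file.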
Secondly, I don't understand the purpose of the getBytes call in your code: you create a byte array in ISO-8859-1 encoding (almost certain to wreak havoc on any Chinese characters), and then you decode that byte array back into a string using UTF-8 as the charset. Even if the conversion back and forth worked, what would it accomplish?
Thanks to DrClap and one_dane for their replies to my query. In hindsight, I realize that what I wrote was stupid.
As to what I have against native2ascii: all those \uxxxx sequences aren't very pleasing to human readers, right? I can include the original literal (in whatever encoding) as a comment, but that's rather cumbersome, and I wonder if there is a better way.
If you guys say "get an editor that lets you edit text files containing UTF-8 encoded characters", well, my next question is gonna be "which one?"
Then again, are you guys saying that it's not even proper to write UTF-8 encoded literals in the program text (in my case, a JSP page), even if one can do it?