Want clarification on charset concepts

Hi,

After reading some articles about Java i18n, I am still not clear on why the problem below exists.

1. Write a Java class, say Test1 (source code in UTF-8), containing some multibyte characters, say Chinese or Japanese characters.

2. Compile with the option -encoding....

3. Run in a locale such as Chinese or Japanese: the characters display correctly (output by System.out.println(...)).

4. After changing the locale to another one, say English, the characters display incorrectly.

I have two questions:

1. For step 2, about compiling the Java source...

What will the multibyte characters (Chinese, Japanese) be in the bytecode (.class) file? Are they converted from UTF-8 to UTF-16, or stored in a platform-specific charset? How does the Java VM read these characters at run time?

2. After changing the locale from Chinese or Japanese to English (without recompiling the source file), the multibyte characters can't be displayed correctly.

Is this related to the Java VM incorrectly reading the string literals embedded in the bytecode file?

If yes, how can I instruct the Java VM to interpret the bytecode under a specific locale (without changing system settings, e.g. the regional settings in Windows XP), or compile the Java program in a locale-independent way so that it is really portable?

If no, could anybody explain the cause in detail and give me a step-by-step guide to writing a portable Java program, from choosing the source file encoding and javac options to reading multibyte characters from a database?

Thank you very much!

[1579 byte] By [pslkwana] at [2007-11-26 12:28:51]
# 1

Your problem comes from your expectation that the system console can display characters from legacy code pages other than the one it is running in, since you say that you output with System.out.println(...).

This is a limitation of the console, not anything related to bytecode, compilation options, etc.
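To illustrate the point, here is a minimal sketch, assuming a console that uses windows-1252 (the charset names and the helper asSeenByConsole are illustrative, not taken from the original program). The string is written with Unicode escapes so the source file encoding plays no part. The characters are intact in memory; they are only lost when encoded into a charset that cannot represent them.

```java
import java.nio.charset.Charset;

public class ConsoleEncodingDemo {

    // Hypothetical helper: returns what a console using the given charset would show.
    public static String asSeenByConsole(String text, String consoleCharset) {
        Charset cs = Charset.forName(consoleCharset);
        // getBytes substitutes the charset's replacement byte ('?' here)
        // for every character the charset cannot represent.
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes to stay independent of -encoding.
        String text = "\u4e2d\u6587";

        // The String holds two UTF-16 chars regardless of the locale.
        System.out.println("chars in memory: " + text.length());

        // A Western code page has no slot for these characters.
        System.out.println("windows-1252 console: " + asSeenByConsole(text, "windows-1252"));

        // A charset that covers them round-trips without loss.
        System.out.println("UTF-8 intact: " + text.equals(asSeenByConsole(text, "UTF-8")));
    }
}
```

Running this under any locale gives the same in-memory length; only the step that encodes for the output device differs.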

one_danea at 2007-7-7 15:37:41 > top of Java-index,Desktop,I18N...
# 2

Hi one_done,

You are right, the console can't output Chinese characters. I tried a JLabel, and the characters display correctly.

Actually, the problem I encountered is reading Chinese characters (Big5) from the database (Sybase 12.5 with default charset = cp850, column type = chars, but the data is actually stored in the Big5 charset).

I am using the Java Persistence API to get the data from the database. If I run my program in a Chinese locale (Windows XP), there is no problem and the characters display correctly.

****with the Sybase-specific connection property set --> charset=big5

But if I change the locale to English, the problem appears. So I want to know how the locale affects the reading of data through JDBC.

Is this problem specific to the Sybase JDBC driver, or will other databases' JDBC drivers experience the same problem?

How can I solve the problem?

Thank you...

Ping

pslkwana at 2007-7-7 15:37:41
# 3

If this was your problem all along, why on earth start out by asking a completely different question, wasting people's time?

Anyway, the first comment on your database problem: you do know that it's a recipe for disaster - if not now, then somewhere down the road - to define your DB with a Latin1 character set and then store Chinese characters in that DB? It may work for you now, in one specific, controlled environment, because you happen to know that you need to specify big5 as the DB connection setting. But what happens when somebody else tries to access your data? Or when you need to get both Chinese and French data? The only sensible solution is to define your DB with UTF8 (or, if you really only have Chinese data, with a Chinese-specific encoding - but that is of course a shortsighted solution).

To get to your specific problem: if your code works with the default locale set to Chinese, but not if your default locale is set to English, then the data is being converted to the default encoding of the JVM somewhere (question marks being an indication that a converter did not find an appropriate character in the target encoding - which would be the case for an attempted Big5 -> 1252 conversion).
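A small sketch of that effect, assuming the JDK in use ships the Big5 charset (the helper convert and the sample text are hypothetical, and no database is involved): bytes stored as Big5 are decoded correctly, then pushed through a target charset the way a hidden conversion to the JVM default encoding would do it.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {

    // Hypothetical helper: decode Big5 bytes, then re-encode in the target charset,
    // mimicking a conversion to the JVM default encoding somewhere in the stack.
    public static String convert(byte[] big5Bytes, String targetCharset) {
        String decoded = new String(big5Bytes, Charset.forName("Big5"));
        Charset target = Charset.forName(targetCharset);
        // Characters missing from the target charset become its replacement byte ('?').
        return new String(decoded.getBytes(target), target);
    }

    public static void main(String[] args) {
        // Stand-in for the raw bytes held in the database column.
        byte[] stored = "\u4e2d\u6587".getBytes(Charset.forName("Big5"));

        // This is the charset that changes when the Windows locale changes.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // English locale case: Big5 text forced through windows-1252.
        System.out.println("via windows-1252: " + convert(stored, "windows-1252"));

        // Chinese locale case: the default encoding can hold the text.
        System.out.println("via Big5 intact: " + "\u4e2d\u6587".equals(convert(stored, "Big5")));
    }
}
```

Printing Charset.defaultCharset() under each locale is a quick way to check whether the default encoding is what differs between the working and the failing run.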

one_danea at 2007-7-7 15:37:41
# 4
Hi, I am trying to implement i18n in my program, but I am unable to work out the right approach. If anyone knows how, please write a small example and give an explanation from a programmer's point of view.
y_nagireddya at 2007-7-7 15:37:41