Encoding

Hi.

Is there any character output stream (file or screen) that allows me to specify a character set?

I have some Hebrew characters which I would like to send to a file or screen.

I tried to do it with FileReader and FileWriter, but they don't let me specify encodings.

Then I tried to send the characters to the screen, but System.out.format asks for a locale instead of an encoding.

System.out.format(new Locale("hb","IS"),"%s", "תשומת ליבך! ");

It printed question marks!!! (? ?! )

Might I have specified the wrong language and country codes?

Thanks

Message was edited by:

charllescuba

[713 byte] By [charllescubaa] at [2007-11-27 9:09:53]
# 1
Is this that hard?:)
charllescubaa at 2007-7-12 21:50:37 > top of Java-index,Java Essentials,Java Programming...
# 2
Yeahhh
It is that hard :|
charllescubaa at 2007-7-12 21:50:37
# 3

I found the language and country codes for Israel by iterating over:

Locale.getAvailableLocales

It returned:

Language code: iw

Country code: IL

then I did the following:

System.out.format(new Locale("hb","IL"),"%s", " תשומת ליבך! ");

And here is what I got:

? ?!

Interestingly, while browsing the web (the ISO homepage) I found other language codes for Hebrew: he and heb

Thanks anyway!!!

I hope this unsolved case helps others

Message was edited by:

charllescuba

charllescubaa at 2007-7-12 21:50:37
# 4
You should use the UTF-8 character set. But I don't know how to do that :)) Just look it up
napstara at 2007-7-12 21:50:37
# 5
I already did. UTF-8 is known for being able to encode any Unicode code point (and so any Unicode sequence). I even tried UTF-16. It didn't work either. Thanks anyway
charllescubaa at 2007-7-12 21:50:37
# 6

> Yeahhh

> It is that hard

It is. Here's a list of places that it can break down, so you can walk through them:

1) Java represents all strings internally as Unicode. As long as your string contains the Unicode characters that you expect (see Hebrew chart here: http://www.unicode.org/charts/PDF/U0590.pdf ), then the problem is in writing the strings (#4). Otherwise it's in reading the strings (#2 or #3)

2) If you are reading your Hebrew strings from a file / database / whatever, you have to ensure that you're using the correct encoding when reading.

3) If you are writing Hebrew string literals in your Java code, then you have to verify that your editor is writing those characters in an appropriate encoding, and the Java compiler is reading them in the same encoding. Usually these are determined by your computer's locale. If you're using Unicode escapes (e.g., '\u05d0'), then this won't be your problem.

4) If the strings are correct inside the JVM, then there's a problem writing them to the terminal. This could happen because System.out is using the wrong encoding, or the terminal program is using the wrong encoding, or you don't have the proper glyphs in whatever font your terminal program is using.
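
A rough sketch of how the check in #1 could be done, assuming Java 8 or later (for String.codePoints). The class and method names (HebrewCheck, dumpCodePoints, containsHebrew) are made up for illustration. The string is written with Unicode escapes so the source file encoding (#3) cannot interfere, and the output is plain ASCII so the terminal encoding (#4) cannot interfere either:

```java
// Hypothetical helper, not from the original thread: prints the code points
// of a string so they can be compared against the Unicode Hebrew chart.
class HebrewCheck {

    // Formats every code point as U+XXXX, separated by spaces.
    static String dumpCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // True if at least one code point belongs to the Hebrew block (U+0590..U+05FF).
    static boolean containsHebrew(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.HEBREW);
    }

    public static void main(String[] args) {
        // The first word of the original message, written with Unicode escapes.
        String text = "\u05ea\u05e9\u05d5\u05de\u05ea";
        System.out.println(dumpCodePoints(text));
        System.out.println(containsHebrew(text));
    }
}
```

If the dump shows values in the U+05xx range, the string is fine inside the JVM and the problem is on the output side. If it shows U+003F (the question mark) or something else unexpected, the characters were already lost while reading or compiling.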

Edit for nit-pickers: no, System.out doesn't "have" an encoding, since it's a byte stream. It uses the platform's default encoding when converting strings for output.
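
For the writing side, one possible way to name the encoding explicitly instead of relying on the platform default is to wrap the byte stream yourself. This is only a sketch: the class name ExplicitEncoding, the helper encode, and the file name hebrew.txt are assumptions for illustration, and it needs Java 8 or later. OutputStreamWriter and PrintStream both have constructors that take an encoding:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not from the original thread.
class ExplicitEncoding {

    // Pushes text through an OutputStreamWriter with an explicit charset
    // and returns the bytes that would end up in the file or on the wire.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // The original message, written with Unicode escapes.
        String text = "\u05ea\u05e9\u05d5\u05de\u05ea \u05dc\u05d9\u05d1\u05da!";

        // File output: the same wrapper around a FileOutputStream.
        // "hebrew.txt" is an arbitrary file name chosen for this sketch.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream("hebrew.txt"), StandardCharsets.UTF_8)) {
            writer.write(text);
        }

        // Screen output: a PrintStream over System.out that encodes as UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(text);
    }
}
```

Reading works the same way in reverse, with an InputStreamReader constructed with the matching charset. Note that this only fixes the Java half of #4: the terminal still has to interpret the bytes as UTF-8 and have a font with Hebrew glyphs. Encoding with a charset that has no Hebrew characters (US-ASCII, for example) replaces each of them with a question mark, which is the symptom described in the original post.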

kdgregorya at 2007-7-12 21:50:37