Encoding
Hi.
Is there any character output stream (file or screen) that allows me to specify a character set?
I have some Hebrew characters which I would like to send to a file or screen.
I tried to do it with FileReader and FileWriter, but they don't let me specify encodings.
Then I tried to send the characters to the screen, but System.out.format asks for a locale instead of an encoding.
System.out.format(new Locale("hb","IS"),"%s", "תשומת ליבך! ");
It printed question marks!!! (? ?!)
Might I have specified the wrong language and country codes?
Thanks
Message was edited by:
charllescuba
I found the language and country codes for Israel by iterating over:
Locale.getAvailableLocales
It returned:
Language code: iw
Country code: IL
then I did the following:
System.out.format(new Locale("hb","IL"),"%s", " תשומת ליבך! ");
And here is what I got:
? ?!
Interestingly, while browsing the web (ISO's homepage), I found other language codes for Hebrew: he and heb
Thanks anyway!!!
I hope this unsolved case helps others.
> Yeahhh
> It is that hard
It is. Here's a list of places that it can break down, so you can walk through them:
1) Java represents all strings internally as Unicode (UTF-16). As long as your string contains the Unicode characters that you expect (see Hebrew chart here: http://www.unicode.org/charts/PDF/U0590.pdf ), then the problem is in writing the strings (#4). Otherwise it's in reading the strings (#2 or #3).
2) If you are reading your Hebrew strings from a file / database / whatever, you have to ensure that you're using the correct encoding when reading.
3) If you are writing Hebrew string literals in your Java code, then you have to verify that your editor is writing those characters in an appropriate encoding, and that the Java compiler is reading them in the same encoding. Usually both are determined by your computer's locale. If you're using Unicode escapes (e.g., '\u05d0'), then this won't be your problem.
4) If the strings are correct inside the JVM, then there's a problem writing them to the terminal. This could happen because System.out is using the wrong encoding, or the terminal program is using the wrong encoding, or you don't have the proper glyphs in whatever font your terminal program is using.
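For the check in step 1, here is a rough sketch of how you might look inside a string without depending on the terminal at all. It prints only ASCII, so the output is readable even when Hebrew is not. The class name HebrewCheck and both helper methods are made up for illustration, and it assumes Java 8 or later (for String.codePoints()).

```java
import java.util.StringJoiner;

// Hypothetical helper class, not part of any library.
public class HebrewCheck {

    /** Formats each code point of s as U+XXXX, separated by single spaces. */
    static String dumpCodePoints(String s) {
        StringJoiner out = new StringJoiner(" ");
        s.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    /** True if at least one code point lies in the Unicode Hebrew block. */
    static boolean containsHebrew(String s) {
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.HEBREW);
    }

    public static void main(String[] args) {
        // Unicode escapes sidestep the source-file encoding question from step 3.
        String sample = "\u05e9\u05dc\u05d5\u05dd";
        System.out.println(dumpCodePoints(sample));   // U+05E9 U+05DC U+05D5 U+05DD
        System.out.println(containsHebrew(sample));   // true
        System.out.println(containsHebrew("? ?!"));   // false
    }
}
```

If the dump shows U+003F where you expected something in the U+05xx range, the characters were already lost before the string reached the JVM (steps 2 or 3). If the code points look right, the damage is happening on output (step 4).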
Edit for nit-pickers: no, System.out doesn't "have" an encoding, since it's a byte stream. It uses the platform's default encoding when converting strings for output.
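As for the original question: the usual way to pick the character set yourself is to wrap a byte stream in an OutputStreamWriter (for files) or to build a PrintStream with an explicit encoding (for the console). Below is a minimal sketch, not the only way to do it. The class name ExplicitEncoding, the helper methods, and the file name hebrew.txt are assumptions for illustration, and it assumes Java 7 or later (for StandardCharsets and try-with-resources).

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class ExplicitEncoding {

    /** Writes text to a byte stream as UTF-8, ignoring the platform default encoding. */
    static void writeUtf8(OutputStream target, String text) {
        try {
            Writer w = new OutputStreamWriter(target, StandardCharsets.UTF_8);
            w.write(text);
            w.flush(); // flush only; the caller still owns and closes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Wraps a byte stream in an auto-flushing PrintStream that encodes as UTF-8. */
    static PrintStream utf8PrintStream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "\u05e9\u05dc\u05d5\u05dd"; // Unicode escapes, so source encoding is irrelevant

        // File output with a chosen encoding (hebrew.txt is a placeholder name).
        try (OutputStream file = new FileOutputStream("hebrew.txt")) {
            writeUtf8(file, text);
        }

        // Console output with a chosen encoding instead of the platform default.
        PrintStream out = utf8PrintStream(System.out);
        out.println(text);
    }
}
```

Note that the Locale passed to System.out.format only affects formatting rules such as number and date conventions; it has nothing to do with how characters are turned into bytes. Also, the console part only helps if the terminal itself is set to UTF-8 and has a font with Hebrew glyphs; otherwise you will still see question marks or garbage even though the bytes are correct. Opening hebrew.txt in an editor that understands UTF-8 is the more reliable test.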