Converting non-English characters to their Unicode representation
I have a series of files/templates, each containing text in a locale-specific language such as Chinese, Japanese, or German. How do I get their Unicode representations so I can send them as HTML-formatted email?
I can already send the English template as HTML-formatted email without a problem. I was also able to find a sample Unicode representation of Japanese text and send that as a test. But how do I take the templates that I have and convert their contents into Unicode?
Thanks in advance.
Please disregard. I figured it out.
chehrehk
[579 byte] By [chehrehka] at [2007-10-3 2:41:34]

You need to know what character encoding was used for the template text.
For example, you could have Japanese text encoded using UTF-8 or
encoded using ISO-2022-JP and the same Japanese characters would
be represented as a different sequence of bytes. Without knowing which
charset was used, you won't be able to convert the byte sequence back
into Unicode characters (e.g., to store in a Java String).
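To make the point about byte sequences concrete, here is a small sketch that encodes the same two Japanese characters with two charsets and prints the resulting bytes in hex. The class name EncodingBytes and the helper hex are made up for illustration, and ISO-2022-JP is treated as optional because not every Java runtime includes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {

    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two Japanese characters (U+65E5, U+672C), written as escapes so the
        // source file itself stays plain ASCII.
        String text = "\u65E5\u672C";

        System.out.println("UTF-8:       " + hex(text.getBytes(StandardCharsets.UTF_8)));

        // Assumption: ISO-2022-JP may be missing on a minimal runtime, so check first.
        if (Charset.isSupported("ISO-2022-JP")) {
            System.out.println("ISO-2022-JP: "
                    + hex(text.getBytes(Charset.forName("ISO-2022-JP"))));
        }
    }
}
```

The two lines of output differ even though the String is the same, which is why the bytes alone do not tell you which characters were meant.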
If you do know which charset was used, a java.io.Reader such as
java.io.InputStreamReader, constructed with that charset, will convert the
byte stream into Unicode characters.
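A minimal sketch of that approach, assuming the charset name is known in advance: InputStreamReader does the byte-to-character decoding. The class TemplateDecoder and its methods are hypothetical names. The second method is only a guess at what "Unicode representation" means for the HTML email in the question; it rewrites non-ASCII characters as numeric character references and does not escape characters such as < or &.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class TemplateDecoder {

    // Reads the whole stream, decoding its bytes with the named charset.
    static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    // Replaces each non-ASCII code point with an HTML numeric character
    // reference such as &#26085; and copies ASCII through unchanged.
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a template file: U+65E5 U+672C U+8A9E encoded as UTF-8.
        byte[] utf8 = {
            (byte) 0xE6, (byte) 0x97, (byte) 0xA5,
            (byte) 0xE6, (byte) 0x9C, (byte) 0xAC,
            (byte) 0xE8, (byte) 0xAA, (byte) 0x9E
        };
        String text = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(toNumericReferences("<p>" + text + "</p>"));
    }
}
```

For a real template you would pass a FileInputStream instead of the in-memory bytes. If the mail is sent with a charset such as UTF-8 declared in its Content-Type, the decoded String can usually be used directly and the numeric references are not needed.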
If the charset information is not available, there are heuristics that you
can use to try to guess the correct charset, but by their nature they're
going to be wrong sometimes.
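One simple heuristic of this kind, sketched under the assumption that you have a short list of plausible charsets: try each candidate with a strict decoder and accept the first one that decodes the bytes without error. The class CharsetGuesser and the method guess are hypothetical names, and this is only one possible approach, not a complete detector.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class CharsetGuesser {

    // Returns the first candidate charset that decodes the bytes without a
    // malformed-input or unmappable-character error, or null if none does.
    static String guess(byte[] data, String... candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this runtime does not provide
            }
            try {
                Charset.forName(name).newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return name;
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5};
        System.out.println(guess(data, "UTF-8", "ISO-8859-1"));
    }
}
```

The order of candidates matters. A single-byte charset such as ISO-8859-1 accepts any byte sequence, so it belongs at the end of the list, and a successful decode only shows that the bytes are valid in that charset, not that the guess is right.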