Translating a UTF8 binary stream of characters to ASCII characters

Hello,

I need advice on how to translate a UTF8 binary stream of characters to ASCII characters. The translation will depend on the locale (language) used.

For example, if the UTF8 character Á (C381 in hex) is used in Czech, I will need to translate it to the two ASCII characters Ae; if the same character is used in French, I will need to translate it to the single character A. The binary stream will also contain some ASCII characters, which will not need any translation.

Please advise.

Thank you.

A Mickelson

[536 byte] By [AllaMikhelsona] at [2007-10-2 12:33:30]
# 1

U+C381 is "HANGUL SYLLABLE SSEOT"; surely it's not used in Czech?

Anyway, I don't think there's anything in the standard API for such a conversion. For removing accents you can use a "normalizer" from ICU4J, but for other kinds of conversions you may have to write the code yourself.

http://icu.sourceforge.net/userguide/normalization.html
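For the accent-removal part only, here is a minimal sketch. Note the substitution: it uses java.text.Normalizer, which ships with the JDK from Java 6 onward, instead of the ICU4J normalizer mentioned above. The class name AccentStripper and the method stripAccents are hypothetical names chosen for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of the JDK or ICU4J.
class AccentStripper {

    // Decompose each accented letter into a base letter followed by
    // combining marks (NFD), then delete the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Neumannov\u00E1")); // prints Neumannova
    }
}
```

This only covers characters that have a canonical decomposition. Letters without one, such as ß, pass through unchanged, and language-specific replacements such as "Ae" would still need hand-written code.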

jsalonena at 2007-7-13 9:33:05 > top of Java-index,Desktop,I18N...
# 2

I am sure Á is used in the Czech language. My assignment is to translate all UTF8 characters. The translation is not necessarily from one UTF8 character to one ASCII character; it could be from one UTF8 character to an ASCII string such as AU or something similar. I cannot use the existing java.nio classes.

Thank you,

abelkin

AllaMikhelsona at 2007-7-13 9:33:05
# 3

> U+C381 is "HANGUL SYLLABLE SSEOT"; surely it's not used in Czech?

Probably C381 is the two bytes used by UTF-8 to represent Á.
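A quick way to check this, as a sketch: encode Á (U+00C1) to UTF-8 and print the bytes in hex, then decode the same two bytes back. The class Utf8Bytes and its two methods are hypothetical names used only for this illustration.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only.
class Utf8Bytes {

    // Encode the text as UTF-8 and render the bytes as uppercase hex.
    static String utf8Hex(String text) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : text.getBytes("UTF-8")) {
                hex.append(String.format("%02X", b & 0xFF));
            }
            return hex.toString();
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Decode raw UTF-8 bytes into a String (the opposite direction).
    static String decodeUtf8(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String aAcute = "\u00C1"; // LATIN CAPITAL LETTER A WITH ACUTE
        System.out.println(utf8Hex(aAcute)); // prints C381

        byte[] raw = { (byte) 0xC3, (byte) 0x81 };
        System.out.println(decodeUtf8(raw).equals(aAcute)); // prints true
    }
}
```

So C381 is a byte sequence, not a code point: the code point is U+00C1, and U+C381 is an unrelated character.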

In the (probably accidental) cross-post I suggested converting the bytes to Unicode using an InputStreamReader first, and then converting the Unicode characters to their language-specific replacements.

And personally I'm used to seeing Czech words like "Neumannová" transliterated as "Neumannova", i.e. just without the accent, rather than "Neumannovae". Not my problem though as long as I don't have to read the result.
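The decode-then-replace idea can be sketched roughly as follows. Everything here beyond InputStreamReader itself is an assumption for illustration: the class LocaleTransliterator, the toAscii methods, the choice of "?" for unmapped characters, and the two table entries (taken from the example in the question, not from any real transliteration standard). The charset is passed by name so that nothing from java.nio is referenced directly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; tables and names are illustrative only.
class LocaleTransliterator {

    // language code -> (character -> ASCII replacement)
    private static final Map<String, Map<Character, String>> TABLES = new HashMap<>();

    static {
        Map<Character, String> czech = new HashMap<>();
        czech.put('\u00C1', "Ae"); // example mapping from the question
        TABLES.put("cs", czech);

        Map<Character, String> french = new HashMap<>();
        french.put('\u00C1', "A"); // example mapping from the question
        TABLES.put("fr", french);
    }

    // Decode the UTF-8 stream to chars, copy ASCII through unchanged, and
    // replace everything else via the table for the locale's language.
    // Works per UTF-16 code unit, so characters outside the BMP are not handled.
    static String toAscii(InputStream utf8Stream, Locale locale) throws IOException {
        Map<Character, String> table = TABLES.get(locale.getLanguage());
        Reader reader = new InputStreamReader(utf8Stream, "UTF-8");
        StringBuilder ascii = new StringBuilder();
        int unit;
        while ((unit = reader.read()) != -1) {
            char ch = (char) unit;
            if (ch < 0x80) {
                ascii.append(ch);
            } else {
                String replacement = (table == null) ? null : table.get(ch);
                ascii.append(replacement != null ? replacement : "?");
            }
        }
        return ascii.toString();
    }

    // Convenience overload for in-memory data.
    static String toAscii(byte[] utf8Bytes, Locale locale) {
        try {
            return toAscii(new ByteArrayInputStream(utf8Bytes), locale);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xC3, (byte) 0x81, 'b', 'c' }; // UTF-8 for "Ábc"
        System.out.println(toAscii(input, Locale.forLanguageTag("cs"))); // prints Aebc
        System.out.println(toAscii(input, Locale.FRENCH));               // prints Abc
    }
}
```

Keeping the replacements in per-language tables means a one-to-many mapping is no harder than a one-to-one mapping; the real work is filling in the tables.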

DrClapa at 2007-7-13 9:33:05
# 4
Á to Ae was an example only. I have to do the translation for multiple languages, so for some of them one UTF8 character must be translated to multiple ASCII characters. I will try using an InputStreamReader to get Unicode first, because String.getBytes("UTF-8") didn't do any good for me.

Thank you!

AllaMikhelsona at 2007-7-13 9:33:05