Effective Unicode Handling
Hello all,
I am reading a data file that returns the string \u6210, and I want to convert this string into its UTF-8 encoding.
"\u6210".getBytes("UTF-8");
I believed this was the normal way to do it.
But when I try this with the string read from the input data file, the '\u' Unicode escape sequence is read as ordinary characters and not as an escape sequence. Is there any way to force the compiler to treat '\u' as an escape sequence?
Thanks for all the help
irwin74 at 2007-9-26 1:30:19

Hi,
Can you give more specific details about what you are describing in this line?
>> But when i try this from the input datafile <<
Regards,
Joe
joefk at 2007-6-29 1:28:13 >

Well, my data file has the value \u6210 hardcoded as XML content.
What I am trying to do is read this XML data file, which returns the string \u6210 (without quotes), and convert that string into the encoding I want.
I get back a String whose value is \u6210. When I call getBytes with the encoding on this string, the '\u' is treated as ordinary characters and not as a Unicode escape sequence.
As a result I get the wrong bytes.
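To make the difference concrete, here is a minimal sketch of what seems to be happening. The class name EscapeDemo and the helper utf8Length are made up for illustration, and the doubled backslash stands in for the six characters that come back from the data file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original program.
final class EscapeDemo {
    private EscapeDemo() {}

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // In source code, javac turns the escape into one character.
        String compiled = "\u6210";
        // Doubling the backslash keeps all six characters, which is
        // what the string read from the data file looks like.
        String fromFile = "\\u6210";

        System.out.println(compiled.length() + " char, " + utf8Length(compiled) + " bytes");
        System.out.println(fromFile.length() + " chars, " + utf8Length(fromFile) + " bytes");
    }
}
```

The first string is one character that encodes to three UTF-8 bytes, while the second is six ASCII characters that encode to six bytes, which would explain the unexpected result.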
Thanks
OK, I'm only slightly familiar with JAXB, but here's what I think is going on:
Interpretation of a six-character escape sequence (e.g. \u6210) as a single Unicode character happens in only a few specific places. If such a sequence is in a *.java file, javac translates it into a Unicode character at compile time. If such a sequence is in a *.properties file, it is translated into a single Unicode character when the file is loaded at runtime, for example through ResourceBundle.getBundle.
However, to the best of my knowledge, this capability (converting a six-character escape sequence into a single Unicode character) is *not* exposed as a general-purpose method anywhere in the Java 2 API. Therefore, when your program reads the XML data file using the unmarshal method, it will treat your sequence as a six-character string. If you want to store Unicode characters above 255 in your XML file, there would be two ways to do it:
1) Does the unmarshal method have an optional encoding argument? (I don't have its documentation.)
2) You could write your own method to decode the six-character escape sequence into a single Unicode character.
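Option 2 could be sketched roughly as follows. This is only an illustration under some assumptions: the class name UnicodeEscapeDecoder and its methods are invented here, only a backslash, a lowercase u and exactly four hex digits are recognised, and anything malformed is copied through unchanged. It does not try to reproduce every rule javac applies (for example, it does not treat a doubled backslash specially).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative, not from any library.
final class UnicodeEscapeDecoder {
    private UnicodeEscapeDecoder() {}

    // Replaces each backslash-u escape that is followed by four hex
    // digits with the single character it names. Other text is copied as is.
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 6 <= input.length() && input.charAt(i + 1) == 'u') {
                int codeUnit = parseHex4(input, i + 2);
                if (codeUnit >= 0) {
                    out.append((char) codeUnit);
                    i += 6; // skip the whole six-character escape
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Parses four hex digits starting at 'start'; returns -1 if any is invalid.
    private static int parseHex4(String s, int start) {
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        String fromFile = "\\u6210"; // six characters, as read from the data file
        byte[] utf8 = decode(fromFile).getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints E6 88 90
    }
}
```

You would call decode on the string that comes back from the XML file and only then call getBytes on the result.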
Regards,
Joe
P.S. post back if I'm not being clear.
joefk at 2007-6-29 1:28:13 >
