Passing Unicode string to Java from Native function
I have a native function that returns a Unicode string; I use the JNI NewString() function to return it. My native string is UTF-16LE on Windows and UTF-16BE on Unix (the platform default encoding). Currently I am only using ASCII characters, represented in UTF-16. When I get the string in Java, its byte array is indeed UTF-16LE or BE depending on the platform. However, when I try to use this string in the GUI, it shows up as all spaces on Unix (BE).
Here is what I tried:
// get the string from my native method
String strUTF16 = getMyString();
// get the byte array
byte[] byteArray = strUTF16.getBytes();
// According to the Java reference, this should construct a new String
// by decoding the specified array of bytes using the platform's default charset,
// but it does not seem to be working. WHY?
String str1 = new String (byteArray);
// this works on unix
String str2 = new String (byteArray, "UTF-16BE");
My question is, why is my first try to construct a string not working? If I have to specify the encoding as in my second try, how can I detect the platform default encoding? I do not want to hardcode the encoding if I don't have to...
Thanks much.
Nikki
[1281 byte] By [cactuara] at [2007-11-27 9:15:33]

# 1
You told us that you have a problem.
You did not tell us what that problem is.
You need to tell us what exactly you are getting.
You need to tell us exactly what you expect to get.
> String strUTF16 = getMyString();
Yes you have a string.
> byte[] byteArray = strUTF16.getBytes();
Yes, this definitely, and correctly, encodes the string you had into bytes using the platform's default character set.
> String str2 = new String (byteArray, "UTF-16BE");
This takes the bytes and maps them via the encoding provided to the string. It only "works" when the byte array represents that encoding. It might seem to work when the byte array contains other encodings by happenstance.
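To make the point above concrete, here is a minimal sketch of encoding and decoding with the same explicit charset, so that nothing depends on the platform default. The class name `CharsetRoundTrip` and the helper `nativeUtf16()` are made up for illustration, and the helper rests on the assumption that the native side produces UTF-16 in the machine's byte order. Note that the platform default charset reported by `Charset.defaultCharset()` is usually not a UTF-16 variant, which is why `new String(byteArray)` cannot be relied on for UTF-16 bytes.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode and decode with the same explicit charset, so the result
    // does not depend on the platform default.
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        return new String(bytes, cs);
    }

    // Assumption: the native side produces UTF-16 in the machine's byte order.
    static Charset nativeUtf16() {
        return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? StandardCharsets.UTF_16BE
                : StandardCharsets.UTF_16LE;
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("native UTF-16:   " + nativeUtf16());
        System.out.println(roundTrip("abc", nativeUtf16()).equals("abc"));
    }
}
```

Picking the charset from `ByteOrder.nativeOrder()` avoids hardcoding "UTF-16BE", but it only helps if the bytes really were produced in native byte order.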
# 2
My question was this.
When I do:
String str1 = new String (byteArray);
The str1's byte sequence is still in UTF-16BE, where each ascii character is represented in double bytes.
But if I do this:
String str2 = new String (byteArray, "UTF-16BE");
The str2's byte sequence shows the UTF8 format.
The str1 does not show up on my GUI, probably because the first byte value is 0.
Thanks for your help.
# 3
> My question was this.
>
> When I do:
> String str1 = new String (byteArray);
>
> The str1's byte sequence is still in UTF-16BE, where
> each ascii character is represented in double bytes.
ASCII is an encoding.
If you have bytes that represent an encoding of UTF-16BE then that is the encoding, not ASCII.
Now perhaps you mean that the characters are all in the 0-127 range, and that for those characters there is a one-to-one correspondence between UTF-16BE and ASCII. All Unicode encodings map the first 128 characters to the same code points as the ASCII character set.
Notice that this is a 'map', though. It doesn't mean the bytes are the same. In particular, there are sequences of ASCII bytes that decode to something different when read as UTF-16BE.
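A small sketch of why the byte sequences are not interchangeable, using only the standard `String(byte[], Charset)` constructor. The class and method names are invented for the example. Two ASCII bytes read as UTF-16BE collapse into one unrelated character, and the UTF-16BE bytes of one ASCII character read as ASCII turn into a NUL followed by that character.

```java
import java.nio.charset.StandardCharsets;

public class AsciiVsUtf16 {
    static String decodeAsUtf16BE(byte[] b) {
        return new String(b, StandardCharsets.UTF_16BE);
    }

    static String decodeAsAscii(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "AB" in ASCII is the byte pair 0x41 0x42.
        byte[] asciiBytes = {0x41, 0x42};
        String one = decodeAsUtf16BE(asciiBytes);
        // One character, U+4142, not "AB".
        System.out.println(one.length() + " char, U+"
                + Integer.toHexString(one.charAt(0)).toUpperCase());

        // "A" in UTF-16BE is the byte pair 0x00 0x41.
        byte[] utf16Bytes = {0x00, 0x41};
        String two = decodeAsAscii(utf16Bytes);
        // Two characters: U+0000 followed by 'A'.
        System.out.println(two.length() + " chars, first is U+0000: "
                + (two.charAt(0) == '\u0000'));
    }
}
```

The second case is the likely shape of the problem in this thread: a string with a NUL in front of every visible character.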
>
> But if I do this:
> String str2 = new String (byteArray, "UTF-16BE");
>
> The str2's byte sequence shows the UTF8 format.
ALL strings in Java use the same internal Unicode encoding (UTF-16). Period. Nothing you can do will change that.
So if you have a string then the characters in there are in fact that encoding.
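As an illustration of that point, the sketch below decodes the same text from two different byte encodings and ends up with equal strings. The class name and sample text are arbitrary. The charset only describes the bytes going in or coming out; it never changes what a `String` holds.

```java
import java.nio.charset.StandardCharsets;

public class SameStringDifferentBytes {
    static final String TEXT = "h\u00e9llo"; // 'e' with an acute accent

    public static void main(String[] args) {
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8);
        byte[] utf16be = TEXT.getBytes(StandardCharsets.UTF_16BE);

        // The byte arrays differ in length and content...
        System.out.println(utf8.length + " vs " + utf16be.length);

        // ...but decoding each with its own charset gives the same String.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(utf16be, StandardCharsets.UTF_16BE);
        System.out.println(fromUtf8.equals(fromUtf16));
    }
}
```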
>
> The str1 does not show up on my GUI, probably because
> the first byte value is 0.
>
All characters in a String are mapped to the "screen" when you attempt to display them. How they are mapped depends on your OS. But if a mapping cannot be done then you will see a '?'.
How some characters get mapped depends on various things. If you actually have a null character (\u0000) in your string then it will either be mapped to something specific or to a '?'. On my computer it gets mapped to a space character. You can test that on your computer by printing the string with something around it like the following....
String s = ...
System.out.println("s=|" + s + "|");
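Since a console may render a null character as a space or as nothing at all, it can also help to print the numeric value of every character. The following is a rough sketch; `dump` is a hypothetical helper name, not part of any library.

```java
public class StringDump {
    // Render each UTF-16 code unit of the string as four hex digits,
    // separated by spaces, so invisible characters become obvious.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A string that looks like "AB" but carries NULs in front of each letter.
        String suspicious = "\u0000A\u0000B";
        System.out.println("length=" + suspicious.length());
        System.out.println(dump(suspicious));
    }
}
```

If the dump shows `0000` in front of every expected character, the bytes were decoded with the wrong charset somewhere before the string reached the GUI.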
# 4
Thanks for your reply.
> ASCII is an encoding.
Yes, I meant that the bytes are only in the 0-127 range. Sorry for the confusion.
Since I am dealing with only the characters in the 0-127 range, my byte values are the same for ASCII and UTF8 encoding.
> System.out.println("s=|" + s + "|");
The strange thing is that when I do System.out.println, it prints the string just fine for my both examples (str1 and str2). But on GUI, the str1 shows nothing, not even '?'.
# 5
> > System.out.println("s=|" + s + "|");
> The strange thing is that when I do
> System.out.println, it prints the string just fine
> for my both examples (str1 and str2). But on GUI,
> the str1 shows nothing, not even '?'.
Print the length of the string in the GUI (not the console) before you display it.
If it is empty then you have a problem elsewhere.
If it is not empty then the problem is in how it maps to the gui for display.