Unicode characters
Hello,
I want to use the Unicode encoding "ISO-10646-UCS-2".
I have the trademark (™) symbol in a database (PostgreSQL 8.0, UNICODE encoding). With the PostgreSQL JDBC driver, I read the symbol into a String.
Example:
while (rs.next()) {
    String t = rs.getString("DATA");
}
When I print "t", I get a "?" (I think that is normal, because Java is running in "ANSI").
So I do this:
byte[] test = t.getBytes("ISO-10646-UCS-2");
When I print the page (high byte) and code (low byte) of test, I don't get the expected result, page 0x21 and code 0x22.
Instead I get page 0x00 and code 0x99.
Why?
PS: I have also tried System.setProperty("file.encoding", "ISO-10646-UCS-2"), with no success.
By s1ll4gea at 2007-10-2 0:32:12
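The "page" and "code" in the question are the high and low bytes of a 16-bit character. A minimal sketch (the class and method names are illustrative, not from the thread) computes them directly from a char, for ™ (U+2122) and for the value actually observed. Page 0x00 / code 0x99 is U+0099, a C1 control character, not the trademark sign; for characters like these the two bytes are the same as what getBytes("UTF-16BE") returns.

```java
// Sketch: split a 16-bit char into the "page" (high byte) and "code" (low byte).
// For characters such as ™ this matches UCS-2 / UTF-16BE byte order.
public class PageAndCode {

    // High byte of the 16-bit code unit (the "page").
    static int pageOf(char c) {
        return c >>> 8;
    }

    // Low byte of the 16-bit code unit (the "code").
    static int codeOf(char c) {
        return c & 0xFF;
    }

    public static void main(String[] args) {
        char trademark = '\u2122'; // ™, what the question expects
        char c1Control = '\u0099'; // what page 0x00 / code 0x99 corresponds to
        System.out.printf("U+2122: page 0x%02x, code 0x%02x%n", pageOf(trademark), codeOf(trademark));
        System.out.printf("U+0099: page 0x%02x, code 0x%02x%n", pageOf(c1Control), codeOf(c1Control));
    }
}
```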

Correction: the call is rs.getString, not rs.get.
Another test:
The function printBytes(byte[], String encodingName) just does a System.out.println of the hexadecimal value of each byte, like:
"EncodingName[index] = 0x..";
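The printBytes helper itself was not posted. A possible version, hypothetical and written only to match the result format shown in the thread ("label[i] = 0xNN"), could look like this; it masks each byte with 0xFF so values of 0x80 and above print as two hex digits:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reconstruction of the printBytes helper described in the thread.
public class ByteDump {

    // Builds one "label[i] = 0xNN" line per byte.
    static String formatBytes(byte[] bytes, String label) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append('\n');
            }
            // Mask with 0xFF so negative bytes are not sign-extended to ffffffNN.
            sb.append(label).append('[').append(i).append("] = 0x")
              .append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    static void printBytes(byte[] bytes, String label) {
        System.out.println(formatBytes(bytes, label));
    }

    public static void main(String[] args) {
        // Prints "UTF-8[0] = 0xc3" and "UTF-8[1] = 0xa9" on separate lines.
        printBytes("\u00e9".getBytes(StandardCharsets.UTF_8), "UTF-8");
    }
}
```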
public static void main(String[] argv) {
    try {
        String t = "\u00e9"; // é (U+00E9); the escape keeps the source file encoding out of the test
        // Encode with the default charset (file.encoding), implicitly and explicitly
        printBytes(t.getBytes(), System.getProperty("file.encoding"));
        printBytes(t.getBytes(System.getProperty("file.encoding")), System.getProperty("file.encoding"));
        // Decode the encoded bytes as ISO-10646-UCS-2, then print them with the default charset
        String t1 = new String(t.getBytes(System.getProperty("file.encoding")), "ISO-10646-UCS-2");
        printBytes(t1.getBytes(), "Default -> ISO 10646");
        String t2 = new String(t.getBytes("UTF-8"), "ISO-10646-UCS-2");
        printBytes(t2.getBytes(), "UTF-8 -> ISO 10646");
        String t3 = new String(t.getBytes("ISO-10646-UCS-2"), "ISO-10646-UCS-2");
        printBytes(t3.getBytes(), "ISO 10646 -> ISO 10646");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The result:
ANSI_X3.4-1968[0] = 0x3f
ANSI_X3.4-1968[0] = 0x3f
Default -> ISO 10646[0] = 0x3f
UTF-8 -> ISO 10646[0] = 0x3f
ISO 10646 -> ISO 10646[0] = 0x3f
So the encoding doesn't work? I always get the same result :/
(In UCS-2, "é" should be 2 bytes: page 0x00 and code 0xe9.)
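Every line of the result above ends with a no-argument getBytes(), which encodes with the default charset, here ANSI_X3.4-1968 (US-ASCII). US-ASCII cannot represent é, so Java substitutes '?' (0x3f) no matter what happened before. Setting file.encoding with System.setProperty at runtime, as in the PS, generally has no effect because the default charset is fixed when the JVM starts; on JDK 18 and later the default is UTF-8 unless configured otherwise. A minimal sketch (class name illustrative) passing the charset explicitly:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: the same String gives different bytes depending on the charset passed in.
// US-ASCII has no é, so the encoder substitutes '?' (0x3f).
public class DefaultCharsetDemo {

    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String t = "\u00e9"; // é
        System.out.println("default charset: " + Charset.defaultCharset());
        byte[] ascii = encode(t, StandardCharsets.US_ASCII); // { 0x3f }
        byte[] ucs2 = encode(t, StandardCharsets.UTF_16BE);  // { 0x00, 0xe9 }
        System.out.println(ascii.length + " byte(s) in US-ASCII, " + ucs2.length + " in UTF-16BE");
    }
}
```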
Try this:
while (rs.next())
{
    byte[] dataArray = rs.getBytes("DATA");
    String data = new String(dataArray, "UTF-16BE");
    JOptionPane.showMessageDialog(null, data);
}
It does not work: I see a "square" [] instead of the trademark symbol.
With your code, the bytes for TM are: page ffffffc2 and code ffffff99,
and for "é" they are page ffffffc3 and code ffffffa9, instead of 0x00 and 0xe9. I'm trying other encodings. Thanks for the method.
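The ffffff prefix in these values is not part of the data. Java bytes are signed, so Integer.toHexString on a byte widens it to int with sign extension, and any byte from 0x80 upward prints as ffffffNN. Masking with 0xFF gives the real byte. A small sketch (names illustrative):

```java
// Sketch: why a hex dump shows ffffffc2 instead of c2.
// A Java byte is signed; widening it to int sign-extends values >= 0x80.
public class HexByte {

    static String signExtended(byte b) {
        return Integer.toHexString(b);        // (byte) 0xc2 -> "ffffffc2"
    }

    static String unsigned(byte b) {
        return Integer.toHexString(b & 0xFF); // (byte) 0xc2 -> "c2"
    }

    public static void main(String[] args) {
        byte b = (byte) 0xc2;
        System.out.println(signExtended(b) + " vs " + unsigned(b)); // ffffffc2 vs c2
    }
}
```

So the bytes actually returned are c2 99 for TM and c3 a9 for "é".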
OK, as I have seen, with your method ™ is 0xc2, 0x99.
0x00c2 is the character "Â", and 0x99 is 153 in decimal:
&#153; = &#8482; = ™
I think this is UTF-8 (UTF-8 is ASCII plus extra bytes for special characters, often a lead byte like "Â").
"é" = UTF-8 ("Ã" + "©")
And if we take the "é" example again, your code with "UTF-16BE" or "ISO-10646-UCS-2" returns 0xc3 = "Ã" and 0xa9 = "©".
So the data is in UTF-8, and I need to find a way to convert UTF-8 to "ISO-10646-UCS-2", or maybe to another encoding.
Thanks again
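In Java, converting UTF-8 to UCS-2 means decoding the bytes with the charset they were written in, UTF-8, and then encoding the resulting String as UTF-16BE, which is byte-for-byte UCS-2 big-endian for characters in the Basic Multilingual Plane. A minimal sketch (class and method names are illustrative) shows that this gives 00 e9 for "é", but for the reported bytes c2 99 it gives 00 99, the same page/code as in the original question, because c2 99 is the UTF-8 form of U+0099, not of ™.

```java
import java.nio.charset.StandardCharsets;

// Sketch: UTF-8 bytes -> String -> UTF-16BE bytes (UCS-2 big-endian for BMP characters).
public class Utf8ToUcs2 {

    static byte[] utf8ToUcs2(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8); // decode with the real charset
        return decoded.getBytes(StandardCharsets.UTF_16BE);        // re-encode as 16-bit units
    }

    public static void main(String[] args) {
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // é in UTF-8
        byte[] fromDb = {(byte) 0xC2, (byte) 0x99}; // bytes reported from the database
        for (byte[] in : new byte[][] {eAcute, fromDb}) {
            StringBuilder sb = new StringBuilder();
            for (byte b : utf8ToUcs2(in)) {
                sb.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```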
byte[] dataArray = rs.getBytes(A_VALUE + "_" + VALUE);
//debug
for (int i = 0; i < dataArray.length; i++) {
    byte b = dataArray[i];
    System.out.println(Integer.toHexString(b & 0xFF)); // mask so bytes >= 0x80 are not sign-extended
}
As I said, this returns 0xc2, 0x99.
String value = new String(dataArray, "UTF-16BE");
dataArray = value.getBytes(); // no charset argument: encodes with the default charset
//debug
for (int i = 0; i < dataArray.length; i++) {
    byte b = dataArray[i];
    System.out.println(Integer.toHexString(b & 0xFF));
}
This returns 0x3f (with UTF-16BE, UTF-16LE, UTF-8 and ISO-10646-UCS-2).
So is String responsible?
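String itself is probably not at fault. Decoding the UTF-8 bytes c2 99 as UTF-16BE pairs them into a single 16-bit unit, U+C299 (a character in the Hangul Syllables block), and the no-argument getBytes() then encodes it with the ASCII default, which has no mapping for it and substitutes 0x3f. The square seen earlier in the dialog is likely the font's placeholder for that character. A minimal sketch (names illustrative) that uses US-ASCII explicitly to stand in for the poster's default charset:

```java
import java.nio.charset.StandardCharsets;

// Sketch: decoding UTF-8 bytes as UTF-16BE produces a different, unrelated character.
public class WrongDecode {

    static String decodeAsUtf16be(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE); // c2 99 -> one char, U+C299
    }

    public static void main(String[] args) {
        String value = decodeAsUtf16be(new byte[] {(byte) 0xC2, (byte) 0x99});
        System.out.printf("U+%04X%n", (int) value.charAt(0)); // U+C299
        // US-ASCII stands in for the ANSI_X3.4-1968 default seen in the thread.
        byte[] ascii = value.getBytes(StandardCharsets.US_ASCII);
        System.out.printf("0x%02x%n", ascii[0] & 0xFF);      // 0x3f
    }
}
```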
> With your code, the page of TM is: Page: ffffffc2 and
> Code: ffffff99
What are the exact bytes returned by rs.getBytes("DATA")?
If you don't get 0x21 0x22, don't be surprised that you can't get a trademark sign.
Here is a simple test with "my code":
int[] codeArray = { 0x21, 0x22 };
// bCodeArray represents what is supposed to be returned from your DB.
byte[] bCodeArray = new byte[codeArray.length];
for (int i = 0; i < bCodeArray.length; i++)
bCodeArray[i] = (byte)codeArray[i];
String code = new String(bCodeArray, "UTF-16BE");
JOptionPane.showMessageDialog(null, code);
This shows the trademark sign.
The database encoding is UNICODE, but that means UTF-8, not UCS-2. I want to convert UTF-8 to UCS-2! (PostgreSQL does not support UCS-2; do you know an open-source database that supports UCS-2?)
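The earlier observation that &#153; shows up as &#8482; fits a common pattern, though the thread does not confirm it: in the windows-1252 code page, byte 0x99 is ™ (U+2122), while in ISO-8859-1 the same byte is the control character U+0099. If text written in windows-1252 was stored as if it were ISO-8859-1, the UTF-8 database would hold U+0099, encoded as c2 99, exactly the bytes reported. A sketch of a repair under that assumption (the class and method names are hypothetical; it only makes sense for values whose characters are all in U+0000..U+00FF, and storing ™ correctly as e2 84 a2 in the first place is the cleaner fix):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch of ONE possible repair, assuming the text was originally windows-1252
// but was stored as if it were ISO-8859-1 (so byte 0x99 became U+0099).
public class Cp1252Repair {

    static String repair(String misread) {
        // Map each char U+0000..U+00FF back to the single byte it came from...
        byte[] original = misread.getBytes(StandardCharsets.ISO_8859_1);
        // ...and decode those bytes with the charset they were really written in.
        return new String(original, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String fromDb = new String(new byte[] {(byte) 0xC2, (byte) 0x99}, StandardCharsets.UTF_8);
        String fixed = repair(fromDb);
        System.out.printf("U+%04X -> U+%04X%n", (int) fromDb.charAt(0), (int) fixed.charAt(0));
        // U+0099 -> U+2122
    }
}
```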
OK:
byte[] TM$utf8 = {(byte) 0xE2, (byte) 0x84, (byte) 0xA2}; // trade mark sign in UTF-8
String TM = new String(TM$utf8, "UTF-8");
A Java String already holds its text as 16-bit UTF-16 code units (the same as UCS-2 for a character like ™), but if you want, you can convert it to bytes too:
byte[] TM$utf16be = TM.getBytes("UTF-16BE");