Unicode characters

Hello,

I want to use the Unicode encoding "ISO-10646-UCS-2".

I have the trademark symbol (™) in a database (PostgreSQL 8.0, UNICODE encoding). With the PostgreSQL JDBC driver, I read the symbol into a String.

Example:

while (rs.next()) {
    String t = rs.get("DATA");
}

When I print t, I get a "?" (I think this is normal, because Java is in "ANSI").

So I do this:

byte[] test = t.getBytes("ISO-10646-UCS-2");

When I print the page and code bytes of test, I do not get the expected values (page: 0x21, code: 0x22).

Instead I get page: 0x0 and code: 0x99.

Why?

PS: I have tried System.setProperty("file.encoding", "ISO-10646-UCS-2") with no success.

[705 byte] By [s1ll4gea] at [2007-10-2 0:32:12]
# 1
I made a mistake: it is rs.getString, not rs.get.
s1ll4gea at 2007-7-15 16:46:40 > top of Java-index,Desktop,I18N...
# 2

Another test:

The function printBytes(byte[], String encodingName) just calls System.out.println with the hexadecimal value of each byte, like:

"EncodingName = 0x.....";

public static void main(String argv[]) {
    try {
        String t = "é"; // "é" = 0xC3 0xA9 in UTF-8
        printBytes(t.getBytes(), System.getProperty("file.encoding"));
        printBytes(t.getBytes(System.getProperty("file.encoding")), System.getProperty("file.encoding"));

        String t1 = new String(t.getBytes(System.getProperty("file.encoding")), "ISO-10646-UCS-2");
        printBytes(t1.getBytes(), "Default -> ISO 10646");

        String t2 = new String(t.getBytes("UTF-8"), "ISO-10646-UCS-2");
        printBytes(t2.getBytes(), "UTF-8 -> ISO 10646");

        String t3 = new String(t.getBytes("ISO-10646-UCS-2"), "ISO-10646-UCS-2");
        printBytes(t3.getBytes(), "ISO 10646 -> ISO 10646");
    } catch (Exception e) {
        System.out.println("e");
    }
}

The result:

ANSI_X3.4-1968[0] = 0x3f

ANSI_X3.4-1968[0] = 0x3f

Default -> ISO 10646[0] = 0x3f

UTF-8 -> ISO 10646[0] = 0x3f

ISO 10646 -> ISO 10646[0] = 0x3f

So the encoding does not work? :/ Always the same result.

(In UCS-2, "é" is normally 2 bytes: page 0x00 and code 0xe9.)

s1ll4gea at 2007-7-15 16:46:40
# 3

Try this:

while (rs.next())
{
    byte[] dataArray = rs.getBytes("DATA");
    String data = new String(dataArray, "UTF-16BE");
    JOptionPane.showMessageDialog(null, data);
}

jfbrierea at 2007-7-15 16:46:40
# 4
It does not work: I see a "square" [] instead of the trademark symbol.
s1ll4gea at 2007-7-15 16:46:40
# 5
With your code, the page of TM is: Page: ffffffc2 and Code: ffffff99
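
The ffffff prefix in these values is most likely sign extension rather than extra data: a Java byte is signed, so widening 0xc2 to an int before formatting gives 0xffffffc2. A minimal sketch, assuming the values were printed with Integer.toHexString on a byte (the class and helper names are hypothetical):

```java
final class HexBytes {
    // Formats one byte as two hex digits; the 0xFF mask drops the sign-extension bits.
    static String hex(byte b) {
        return String.format("%02x", b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC2;
        System.out.println(Integer.toHexString(b)); // prints ffffffc2 (sign-extended)
        System.out.println(hex(b));                 // prints c2 (masked)
    }
}
```

So ffffffc2 and ffffff99 are simply the bytes 0xc2 and 0x99.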
s1ll4gea at 2007-7-15 16:46:40
# 6
And with "é" and your code, the page is ffffffc3 and the code is ffffffa9, instead of 0x00 and 0xe9. I'm trying other encodings. Thanks for the method.
s1ll4gea at 2007-7-15 16:46:40
# 7

OK, as I have seen, with your method "™" is 0xc2, 0x99.

0x00c2 is the symbol "Â", and 0x99 is 153 in base 10:

&#153; = &#8482; = ™

I think this is UTF-8 (UTF-8 = ASCII, plus an extra byte for special characters, generally "Â"):

"™" = UTF-8 ("Â" + "™")

And if we take the example with "é" again, your code with "UTF-16BE" or "ISO-10646-UCS-2" returns 0xc3 = "Ã" and 0xa9 = "©".

So my data is in UTF-8, and I need to find a way to convert UTF-8 to "ISO-10646-UCS-2", or maybe to another encoding.

Thanks again

s1ll4gea at 2007-7-15 16:46:40
# 8

byte[] dataArray = rs.getBytes(A_VALUE+"_"+VALUE);
//debug
for (int i = 0; i < dataArray.length; i++) {
    byte b = dataArray[i];
    System.out.println(Integer.toHexString((int) b));
}

This returns, as I said, 0xc2, 0x99.

String value = new String(dataArray, "UTF-16BE");
dataArray = value.getBytes();
//debug
for (int i = 0; i < dataArray.length; i++) {
    byte b = dataArray[i];
    System.out.println(Integer.toHexString((int) b));
}

This returns 0x3f (with UTF-16BE, UTF-16LE, UTF-8 and ISO-10646-UCS-2).

So, is String responsible?
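
A likely explanation for the 0x3f is the argument-less getBytes(), not String itself: it encodes with the platform default charset, and the results earlier in this thread show that default as ANSI_X3.4-1968, which is an alias for US-ASCII. Characters that US-ASCII cannot represent are replaced with "?" (0x3f). The sketch below illustrates this under the assumption of Java 7 or later (for StandardCharsets); the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class DefaultCharsetDemo {
    // Renders a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String tm = "\u2122"; // trademark sign, written as an escape to avoid source-encoding issues

        // US-ASCII cannot represent U+2122, so the encoder substitutes '?'.
        System.out.println(hex(tm.getBytes(StandardCharsets.US_ASCII))); // prints 3f

        // Naming the charset explicitly gives the expected UCS-2 style bytes.
        System.out.println(hex(tm.getBytes(StandardCharsets.UTF_16BE))); // prints 21 22
    }
}
```

Passing an explicit charset to every getBytes and new String call avoids depending on file.encoding at all.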

s1ll4gea at 2007-7-15 16:46:40
# 9

> With your code, the page of TM is: Page: ffffffc2 and

> Code: ffffff99

What are the exact bytes returned by getBytes("DATA"); ?

If you don't have 0x21 0x22, then don't be surprised not to be able to catch a trademark sign.

Here is a simple test with "my code":

int[] codeArray = { 0x21, 0x22 };
// bCodeArray represents what is supposed to be returned from your DB.
byte[] bCodeArray = new byte[codeArray.length];
for (int i = 0; i < bCodeArray.length; i++)
    bCodeArray[i] = (byte)codeArray[i];
String code = new String(bCodeArray, "UTF-16BE");
JOptionPane.showMessageDialog(null, code);

This shows the trademark sign.
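
For comparison, the bytes 0xc2 0x99 reported earlier in the thread can be decoded the same way. Read as UTF-8 they give U+0099, a control character, not U+2122. In windows-1252 the single byte 0x99 is the trademark sign, so one possible explanation (an assumption, nothing in the thread confirms it) is that the value was inserted as windows-1252 text but stored as if it were Latin-1. A small sketch with hypothetical names, assuming the JRE provides the windows-1252 charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeC299 {
    // Decodes the bytes with the given charset and returns the first code point.
    static int firstCodePoint(byte[] bytes, Charset cs) {
        return new String(bytes, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        byte[] fromDb = {(byte) 0xC2, (byte) 0x99};
        // UTF-8 decoding of C2 99 yields U+0099, not the trademark sign.
        System.out.printf("U+%04X%n", firstCodePoint(fromDb, StandardCharsets.UTF_8)); // prints U+0099

        byte[] cp1252 = {(byte) 0x99};
        // In windows-1252, byte 0x99 maps to U+2122 (trademark sign).
        System.out.printf("U+%04X%n", firstCodePoint(cp1252, Charset.forName("windows-1252"))); // prints U+2122
    }
}
```

This would also account for the 0x0 0x99 result from getBytes("ISO-10646-UCS-2") in the first post.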

jfbrierea at 2007-7-15 16:46:40
# 10
The database encoding is UNICODE, but that means UTF-8, not UCS-2. I want to convert UTF-8 to UCS-2! (PostgreSQL does not support UCS-2. Do you know an open source database that supports UCS-2?)
s1ll4gea at 2007-7-15 16:46:40
# 11

OK

byte[] TM$utf8 = {(byte) 0xE2, (byte) 0x84, (byte) 0xA2}; // trade mark sign in UTF-8
String TM = new String(TM$utf8, "UTF-8");

String is already in UCS2 but if you want you can convert it to bytes too

byte[] TM$utf16be = TM.getBytes("UTF-16BE");
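
Put together as a runnable check, the conversion might look like the sketch below. It assumes Java 7 or later for StandardCharsets, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ToUtf16 {
    // Decodes UTF-8 bytes into a String, then re-encodes it as big-endian UTF-16.
    static byte[] utf8ToUtf16be(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] tmUtf8 = {(byte) 0xE2, (byte) 0x84, (byte) 0xA2}; // trademark sign in UTF-8
        byte[] tmUtf16 = utf8ToUtf16be(tmUtf8);
        for (byte b : tmUtf16) {
            System.out.printf("0x%02x%n", b & 0xFF); // prints 0x21 then 0x22
        }
    }
}
```

The two output bytes are the page 0x21 and code 0x22 expected in the original question.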

jsalonena at 2007-7-15 16:46:40