Problems with String.getBytes()
Hello everybody,
I'm working with Eclipse 3.1, JDK 1.4.2
I have the following problem.
I have a string containing generated Unicode signs for a display connected via a serial COM port.
The string looks like:
\u0096\u00AD\u00A2\u008D\u008E
These signs don't make sense by themselves; they are converted Arabic Unicode signs. They had to be converted to get the driver for the display running.
Because the display is connected over a serial line, I need bytes rather than chars.
But when I call String.getBytes(), the decimal representations of the signs are:
63, -83, -94, 63, 63
which doesn't make much sense. Why so many 63s, for example, and why is not every byte negative?
I made another check with
byte[] textcmd = {(byte) 0x96, (byte) 0xAD, (byte) 0xA2, (byte) 0x8D, (byte) 0x8E};
which is essentially just the least significant byte of every char in the string.
The decimal representations in this case are:
-106, -83, -94, -115, -114
which does make far more sense to me.
How can I convert my String into a byte array that looks the way I want it (second case)? Is there any Java method to do so? Why does the getBytes() method fail?
Did you understand what I tried to tell you?
Can you help me?
Thank you all very much.
Greetings,
Stefan
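A likely explanation for the 63s, offered as a sketch rather than a diagnosis: 63 is the ASCII code of '?', and getBytes() substitutes the charset's replacement byte for every character the target encoding cannot represent. The platform default encoding in the post above is unknown, so the example uses US-ASCII, which cannot represent any of the five characters. The class name is made up, and StandardCharsets requires Java 7 or later (on JDK 1.4.2 the charset name would have to be passed as a string instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementByteDemo {
    // Encodes with US-ASCII. getBytes(Charset) replaces every unmappable
    // character with the charset's replacement byte, which is '?' (63).
    static byte[] toAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String value = "\u0096\u00AD\u00A2\u008D\u008E";
        System.out.println(Arrays.toString(toAscii(value))); // [63, 63, 63, 63, 63]
        System.out.println(Arrays.toString(toAscii("AB")));  // [65, 66]
    }
}
```

With a default encoding that can represent some of the characters but not others, the output would be a mix of real byte values and 63s, which matches the pattern reported above.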
> Hello everybody,
>
> I'm working with Eclipse 3.1, JDK 1.4.2
> I have the following problem.
> I have a string containing generated Unicode signs for
> a display connected via a serial COM port.
> The string looks like:
> \u0096\u00AD\u00A2\u008D\u008E
> These signs don't make sense, they are converted
> arabic unicode signs. They needed to be converted to
I'm quite sure they aren't. They're still in the Latin-1 ("extended ASCII") range, U+0080 to U+00FF, not in the Arabic block. 00A2 is the cent sign.
> Did you understand what I tried to tell you?
Not quite. :|
Are you sure you're really providing the correct input? Especially if you have those chars hardcoded as literal characters instead of as Unicode escapes, the compiler might mess them up when it reads the source file with the wrong encoding.
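One way to check the input, sketched here with a made-up helper name, is to print every char of the string as a hex code unit before any byte conversion happens. If the dump does not show the expected values, the problem is in the source file or the mapping, not in getBytes(). String.format and StringBuilder need Java 5 or later.

```java
class CharDump {
    // Formats every char of the string as a four-digit hex UTF-16 code unit.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 0096 00AD 00A2 008D 008E
        System.out.println(hexDump("\u0096\u00AD\u00A2\u008D\u008E"));
    }
}
```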
OK, I'll try to explain it in more detail.
I need to program a driver for a display (it's huge, 4x2 meters) which must display Arabic letters. Since there is no Windows machine inside that display, I need to program that driver on my own.
I have a manual here saying that when I send the byte 0x80, the display will show a particular Arabic sign. This sign is in the Unicode table at, let's say, FE97. I have a map in my driver which converts the Unicode signs into the signs needed by the display. I can map the Unicode chars, for example \uFE97, into my chars, for example \u0080.
All I need to do now is convert an array of chars (which contain my "unicode") into a byte array.
Is this understandable?
Greetings,
Stefan
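The mapping described in the post above could also go straight from Unicode chars to display bytes, skipping the intermediate String of \u0080-style chars. The following is only a sketch under assumptions: the class name and the table are hypothetical, the single entry reuses the FE97 to 0x80 pair that the post itself calls a "let's say" example, and throwing on unmapped characters is a design choice, not something the display manual is known to require.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DisplayEncoder {
    // Hypothetical table: Unicode char -> display code.
    private static final Map<Character, Byte> TABLE = new HashMap<>();
    static {
        // Example pair from the post (U+FE97 -> 0x80), not taken from a real manual.
        TABLE.put('\uFE97', (byte) 0x80);
    }

    // Converts text to display bytes, one byte per char.
    // Fails loudly for chars that have no entry in the table.
    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Byte code = TABLE.get(c);
            if (code == null) {
                throw new IllegalArgumentException(
                        "No display code for U+" + Integer.toHexString(c).toUpperCase());
            }
            out[i] = code;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("\uFE97"))); // [-128]
    }
}
```

Working on bytes directly means no character encoding is involved at all, so the platform default encoding cannot interfere.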
> Thank you very much, your method will work fine. With
> the "UTF-8" parameter the byte[].length is double,
> because every valid byte is preceded by a -62, but I
> will just filter the valid bytes into a new array.
This sounds very, very wrong! The UTF-8 for \u0096\u00AD\u00A2\u008D\u008E is [-62, -106, -62, -83, -62, -94, -62, -115, -62, -114],
and the -62 values are required. They are part of the UTF-8 representation of each character, so you cannot just strip them out of a UTF-8 stream. If you do strip them, the result is the same as if you had used the ISO-8859-1 encoding in the first place. Run the following code fragment and see for yourself!
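// Needs: import java.util.Arrays; (Arrays.toString(byte[]) requires Java 5 or later)
// getBytes(String) throws the checked UnsupportedEncodingException
String value = "\u0096\u00AD\u00A2\u008D\u008E";
System.out.println(Arrays.toString(value.getBytes("UTF-8")));      // 10 bytes
System.out.println(Arrays.toString(value.getBytes("ISO-8859-1"))); // 5 bytes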
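A further reason not to filter out the -62 bytes, shown as a small sketch with a made-up class name: -62 (0xC2) is the UTF-8 lead byte only for U+0080 to U+00BF. From U+00C0 upwards the lead byte changes, and the second byte no longer equals the low byte of the character, so the filter would silently produce wrong values. StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8LeadByteDemo {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00A2: UTF-8 is C2 A2, so the second byte equals the Latin-1 byte.
        System.out.println(Arrays.toString(utf8("\u00A2")));   // [-62, -94]
        System.out.println(Arrays.toString(latin1("\u00A2"))); // [-94]
        // U+00E9: UTF-8 is C3 A9, while the Latin-1 byte is E9.
        System.out.println(Arrays.toString(utf8("\u00E9")));   // [-61, -87]
        System.out.println(Arrays.toString(latin1("\u00E9"))); // [-23]
    }
}
```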
> Thank you very much, your method will work fine. With
> the "UTF-8" parameter the byte[].length is double,
> because every valid byte is preceded by a -62, but I
> will just filter the valid bytes into a new array.
>
> Thanks again,
> Stefan
Actually what you need to do is to find the character encoding that your device expects, and then you can code your strings in Arabic.
That's the way Java does things: Strings and char values are always in Unicode (see www.unicode.org), which means \u0600 to \u06FF for Arabic, and Java uses a specified character encoding when translating these to and from a byte stream.
Each national character encoding has a name. Most of them are identical to ASCII for 0-127 and code their national characters in 128-255.
Find the encoding name for your display and, odds are, the JRE has it in the library.
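Whether the JRE knows a given encoding, and whether that encoding can represent a particular string, can be checked at runtime. The sketch below uses a made-up helper name. The candidate names ISO-8859-6 and windows-1256 are standard Arabic encodings listed purely as guesses; nothing in this thread says the display uses either of them, and a display that expects presentation-form glyph codes may well use a custom table that no JRE charset matches.

```java
import java.nio.charset.Charset;

class CharsetCheck {
    // Returns true if the JRE supports the named charset and the charset
    // can encode every character of the given text.
    static boolean canEncodeAll(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false;
        }
        Charset cs = Charset.forName(charsetName);
        return cs.canEncode() && cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0644"; // alef, lam
        // Candidate names are guesses; availability depends on the JRE.
        String[] candidates = {"ISO-8859-6", "windows-1256", "ISO-8859-1"};
        for (String name : candidates) {
            System.out.println(name + ": " + canEncodeAll(name, arabic));
        }
    }
}
```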
BTW, the character encoding ISO-8859-1 simply maps Unicode characters 0-255 one-to-one onto byte values 0-255.
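For the string from the original question, that one-to-one mapping gives exactly the "least significant byte" array that was wanted. A minimal sketch with made-up names follows; the manual loop is included only to show the equivalence and to reject chars above U+00FF instead of silently substituting them. StandardCharsets requires Java 7 or later; on older JDKs, getBytes("ISO-8859-1") with the name as a string is the counterpart.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1Bytes {
    // Lets the ISO-8859-1 charset do the conversion.
    static byte[] viaCharset(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Manual equivalent for chars up to U+00FF; anything above is rejected.
    static byte[] viaCast(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("Char above U+00FF at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "\u0096\u00AD\u00A2\u008D\u008E";
        System.out.println(Arrays.toString(viaCharset(value))); // [-106, -83, -94, -115, -114]
        System.out.println(Arrays.toString(viaCast(value)));    // [-106, -83, -94, -115, -114]
    }
}
```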