Problems with String.getBytes()

Hello everybody,

I'm working with Eclipse 3.1, JDK 1.4.2

I have the following problem.

I have a string containing generated Unicode characters for a display connected via a serial COM port.

The string looks like:

\u0096\u00AD\u00A2\u008D\u008E

These characters don't make sense by themselves; they are converted Arabic Unicode characters. They had to be converted to get the driver for the display running.

Because the connection is serial, I need bytes rather than chars.

But when I call String.getBytes(), the decimal representations of the bytes are:

63, -83, -94, 63, 63

which doesn't make much sense. Why so many 63s, for example, and why isn't every byte negative?

I made another check using

byte textcmd[] ={(byte) 0x96,(byte) 0xAD,(byte) 0xA2,(byte) 0x8D,(byte) 0x8E};

Which is essentially just the least significant byte of every char in the string.

Well, the decimal representations in this case are:

-106, -83, -94, -115, -114

which does make far more sense to me.

How can I convert my String into a byte array that looks the way I want it (second case)? Is there a Java method to do so? Why does the getBytes() method fail?

Did you understand what I tried to tell you?

Can you help me?

Thank you all very much.

Greetings,

Stefan

[1585 byte] By [Stefan-Schottlanda] at [2007-10-2 20:17:07]
# 1

> Hello everybody,
>
> I'm working with Eclipse 3.1, JDK 1.4.2
> I have the following problem.
> I have a string containing generated Unicode characters for
> a display connected via a serial COM port.
> The string looks like:
> \u0096\u00AD\u00A2\u008D\u008E
> These characters don't make sense by themselves; they are converted
> Arabic Unicode characters. They had to be converted to

I'm quite sure they aren't. They're all below U+0100, i.e. still in the Latin-1 ("extended ASCII") range. U+00A2 is the cent sign.

> Did you understand what I tried to tell you?

Not quite. :|

Are you sure you're really providing the correct input? Especially if you have those chars hardcoded as literal characters instead of as Unicode escapes, the compiler might mess them up when it reads the source file with the wrong encoding.

CeciNEstPasUnProgrammeura at 2007-7-13 22:59:34 > top of Java-index,Java Essentials,Java Programming...
# 2

OK, I'll try to explain it in more detail.

I need to write a driver for a display (it's huge, 4x2 meters) which must display Arabic letters. Since there is no Windows machine inside that display, I need to write the driver on my own.

I have a manual here saying that when I send the byte 0x80, the display will show a particular Arabic character. This character is in the Unicode table at, let's say, FE97. I have a map in my driver which converts the Unicode characters into the codes needed by the display. I can map the Unicode chars, for example \uFE97, into my chars, for example \u0080.

All I need to do now is convert an array of chars (which contain my "unicode") into a byte array.
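
A minimal sketch of the mapping described above, going straight from Unicode chars to display bytes without the intermediate \u0080-style chars. Only the FE97 -> 0x80 pair comes from this post; the class name DisplayEncoder, the '?' fallback and the assumption that the display is ASCII-compatible below 0x80 are made up for illustration, and the code assumes a current JDK (JDK 1.4.2 has no generics).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DisplayEncoder {
    // Hypothetical table: Unicode code unit -> display byte.
    // Only FE97 -> 0x80 is taken from the post; a real table would be
    // filled in from the display's manual.
    private static final Map<Character, Byte> TABLE = new HashMap<Character, Byte>();
    static {
        TABLE.put(Character.valueOf('\uFE97'), Byte.valueOf((byte) 0x80));
    }

    // Assumed fallback for characters the table does not know.
    private static final byte FALLBACK = (byte) '?';

    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Byte mapped = TABLE.get(Character.valueOf(c));
            if (mapped != null) {
                out[i] = mapped.byteValue();
            } else if (c < 0x80) {
                out[i] = (byte) c; // assumption: plain ASCII passes through unchanged
            } else {
                out[i] = FALLBACK;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("A\uFE97"))); // [65, -128]
    }
}
```

Producing the bytes directly avoids any dependency on the platform's default encoding, since String.getBytes() is never called.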

Is this understandable?

Greetings,

Stefan

Stefan-Schottlanda at 2007-7-13 22:59:34
# 3
63 is the ASCII code for the '?' character, which means that the character could not be converted to a byte using your default encoding. This could mean that your default encoding is ASCII, and anything above 0x7F is outside the ASCII range.

Try String.getBytes("UTF-8");

sabre150a at 2007-7-13 22:59:34
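
A small sketch of the '?' substitution, assuming a current JDK (StandardCharsets needs Java 7 or later; on JDK 1.4.2 you would pass the charset name as a String). The helper countUnencodable is a made-up name. Note that with pure US-ASCII all five bytes become 63, whereas the original post shows two bytes surviving, so the default encoding there was presumably not plain ASCII but some 8-bit charset lacking a few of these code points; that is a guess, not something the thread confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class QuestionMarkDemo {
    // Counts the chars in text that the given charset cannot encode.
    static int countUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String value = "\u0096\u00AD\u00A2\u008D\u008E";
        // US-ASCII cannot represent anything above 0x7F, so every char becomes '?'.
        System.out.println(Arrays.toString(value.getBytes(StandardCharsets.US_ASCII))); // [63, 63, 63, 63, 63]
        System.out.println(countUnencodable(value, StandardCharsets.US_ASCII)); // 5
        // The result for the platform default depends on the machine.
        System.out.println(Charset.defaultCharset() + ": "
                + countUnencodable(value, Charset.defaultCharset()));
    }
}
```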
# 4
Thank you very much, your method will work fine. With the "UTF-8" parameter the byte[].length is doubled, because every valid byte is preceded by a -62, but I will just filter the valid bytes into a new array.

Thanks again,

Stefan

Stefan-Schottlanda at 2007-7-13 22:59:34
# 5
> I need to program a driver for a display (it's huge
> 4x2 meter) which must display arabic letters.

I didn't know Arabic letters are that large. ;)

CeciNEstPasUnProgrammeura at 2007-7-13 22:59:34
# 6

> Thank you very much, your method will work fine. With
> the "UTF-8" parameter the byte[].length is doubled,
> because every valid byte is preceded by a -62, but I
> will just filter the valid bytes into a new array.

This sounds very, very wrong! The UTF-8 encoding of \u0096\u00AD\u00A2\u008D\u008E is [-62, -106, -62, -83, -62, -94, -62, -115, -62, -114]

and the -62 values are required. You cannot just get rid of them, as they are part of the representation of each character! If you strip them, the result you get is the same as if you had used the ISO-8859-1 encoding in the first place! Run the following code fragment and see for yourself!!

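// Needs import java.util.Arrays (Arrays.toString requires Java 5 or later)
// and an enclosing method that declares UnsupportedEncodingException.
String value = "\u0096\u00AD\u00A2\u008D\u008E";
// UTF-8: two bytes per character here, each pair starting with -62 (0xC2)
System.out.println(Arrays.toString(value.getBytes("UTF-8")));
// ISO-8859-1: exactly one byte per character
System.out.println(Arrays.toString(value.getBytes("ISO-8859-1")));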
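
To illustrate why filtering out -62 is fragile, here is a sketch (assuming a current JDK; stripLeadByte is a made-up helper that mimics the filter proposed in the thread). It only happens to work for U+0080 to U+00BF. From U+00C0 upwards the UTF-8 lead byte is -61 (0xC3) and the second byte no longer equals the low byte of the char.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FilterPitfall {
    // The naive filter from the thread: drop every -62 (0xC2) byte.
    static byte[] stripLeadByte(byte[] utf8) {
        int kept = 0;
        for (byte b : utf8) {
            if (b != -62) {
                kept++;
            }
        }
        byte[] out = new byte[kept];
        int i = 0;
        for (byte b : utf8) {
            if (b != -62) {
                out[i++] = b;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\u00A2\u00E9"; // cent sign, e with acute accent
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(utf8));                // [-62, -94, -61, -87]
        System.out.println(Arrays.toString(stripLeadByte(utf8))); // [-94, -61, -87]
        System.out.println(Arrays.toString(latin1));              // [-94, -23]
    }
}
```

The filtered array has the wrong length and the wrong value for the second character, while ISO-8859-1 yields one byte per char directly.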

sabre150a at 2007-7-13 22:59:34
# 7

> Thank you very much, your method will work fine. With
> the "UTF-8" parameter the byte[].length is doubled,
> because every valid byte is preceded by a -62, but I
> will just filter the valid bytes into a new array.
>
> Thanks again,
> Stefan

Actually what you need to do is to find the character encoding that your device expects, and then you can code your strings in Arabic.

That's the way Java does things: Strings and char values are always in Unicode (see www.unicode.org), which means \u0600 to \u06FF for Arabic, and Java uses a specified character encoding when translating these to and from a byte stream.

Each national character encoding has a name. Most of them are identical to ASCII for 0-127 and encode their national characters in 128-255.

Find the encoding name for your display and, odds are, the JRE has it in the library.
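
A rough sketch of how that lookup could go, assuming a current JDK. The candidate names "ISO-8859-6" and "windows-1256" are only guesses at Arabic encodings a JRE might ship; the thread never says which encoding the display really uses, and a ByteArrayOutputStream stands in for the serial port's output stream. EncodingLookup, firstSupported and send are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.Arrays;

class EncodingLookup {
    // Returns the first candidate name this JRE supports, or null if none is.
    static String firstSupported(String[] candidates) {
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                return name;
            }
        }
        return null;
    }

    // Writes text to the stream; the Writer does the char-to-byte translation.
    static void send(OutputStream port, String text, String charsetName) {
        try {
            Writer writer = new OutputStreamWriter(port, charsetName);
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new IllegalStateException("could not write to port", e);
        }
    }

    public static void main(String[] args) {
        // ISO-8859-1 is guaranteed to exist, so the lookup never returns null here.
        String[] candidates = {"ISO-8859-6", "windows-1256", "ISO-8859-1"};
        String name = firstSupported(candidates);
        ByteArrayOutputStream fakePort = new ByteArrayOutputStream();
        send(fakePort, "\u0644", name); // Arabic letter lam
        System.out.println(name + " -> " + Arrays.toString(fakePort.toByteArray()));
    }
}
```

Which name comes back, and therefore which bytes are printed, depends on the charsets installed in the JRE.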

BTW, the character encoding ISO-8859-1 simply maps Unicode characters 0-255 onto bytes.
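
Because of that one-to-one mapping, the same bytes can also be produced by hand with a cast, which is effectively the "least significant byte" array from the first post. A sketch, with Latin1.toBytes as a made-up helper name; the range check is an added safety measure so that chars above 0xFF are rejected instead of silently truncated.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1 {
    // Narrows each char to its low byte; rejects anything above 0xFF.
    static byte[] toBytes(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException(
                        "not in 0-255: U+" + Integer.toHexString(c));
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "\u0096\u00AD\u00A2\u008D\u008E";
        byte[] byCast = toBytes(value);
        byte[] byCharset = value.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(byCast));           // [-106, -83, -94, -115, -114]
        System.out.println(Arrays.equals(byCast, byCharset));  // true
    }
}
```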

malcolmmca at 2007-7-13 22:59:34