unsigned byte String representation
Hi,
first of all I know that Java doesn't have unsigned types. But please spend one minute if you wish and try to understand my problem.
I'm calling a servlet from a VB6 ActiveX control, asking for a string. The problem appears when the string has accented characters. ActiveX uses windows-1252... I cannot change the response character encoding because I use Java 1.3 (please correct me if I'm wrong). For all of you who want to ask why I don't convert it in VB6: believe me, UTF support in VB is very poor. I decided to return the output string as a byte[] representation: I will construct a string in which the bytes are separated by commas. I can easily decode such a string in VB. I have already created an ActiveX demo:
ą # ę # ó representation is (generated in ActiveX):
185 35 234 35 243
but when I do this in java servlet:
ą # ę # ó representation is:
-60 -123 35 -60 -103 35 -61 -77
I perform the conversion in servlet in a following way:
String a = "ą#ę#ó";
byte[] bytes = a.getBytes();
As I said, this servlet is compiled with Java 1.3. In Tomcat's catalina.bat I specified:
set CATALINA_OPTS=%CATALINA_OPTS% -Dfile.encoding=UTF-8
I understand that "ą" is represented by a byte larger than 127, so it appears in the byte[] as -60 -123. So my question is: how do I convert these two bytes to 185? Once again, I know there are no unsigned bytes in Java, but all I need is to create a string with unsigned bytes separated by commas, for example:
flower: 102,108,111,119,101,114
Thanks in advance!
[1746 byte] By [
-gnom-a] at [2007-11-26 18:21:33]
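A minimal sketch of the "unsigned bytes separated by commas" part of the question. The class and method names (UnsignedCsv, toCsv) are made up for illustration, and the code sticks to calls that should also exist on Java 1.3 (StringBuffer, getBytes with an encoding name). Note that this only removes the sign; it does not turn -60 -123 into 185, because that depends on the encoding passed to getBytes.

```java
// Sketch: turn signed Java bytes into a comma-separated list of 0..255 values.
class UnsignedCsv {

    // Masking with 0xFF widens the byte to an int and drops the sign extension,
    // so -60 becomes 196 while 102 stays 102.
    static String toCsv(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(bytes[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws java.io.UnsupportedEncodingException {
        // prints 102,108,111,119,101,114
        System.out.println(toCsv("flower".getBytes("US-ASCII")));
        // prints 196,133 (the two UTF-8 bytes of "ą", shown unsigned)
        System.out.println(toCsv(new byte[] { -60, -123 }));
    }
}
```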

Hi, I don't know if this will help or not, but the String class has a method with the signature getBytes(String enc).
Kaj
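To illustrate the getBytes(String enc) suggestion, here is a small sketch that encodes the same string twice. The class and helper names are hypothetical, and "windows-1250" is an assumption: it is the code page whose table contains ą, ę and ó at 185, 234 and 243, and it should be available on a full JDK but is not one of the guaranteed charsets. The string is written with \u escapes so the result does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;

// Sketch: the same text yields different bytes depending on the encoding name.
class EncodingCompare {

    // Encodes text with the named charset; an unknown name becomes an unchecked error.
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
    }

    // Formats the bytes as unsigned decimal values separated by spaces.
    static String show(byte[] bytes) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(bytes[i] & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u0105#\u0119#\u00F3"; // ą#ę#ó
        // prints 196 133 35 196 153 35 195 179
        System.out.println(show(encode(a, "UTF-8")));
        // prints 185 35 234 35 243
        System.out.println(show(encode(a, "windows-1250")));
    }
}
```

The UTF-8 line is the same data as the -60 -123 35 -60 -103 35 -61 -77 from the servlet, just printed unsigned; the windows-1250 line matches the ActiveX output.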
Okay, let's go back about three steps. You are sending these characters with ogoneks from your ActiveX to a servlet (as you said) via an HTTP request? (You used the word "calling" which is confusing and ambiguous.) If so, a GET or POST request? And when you send the characters, what do you see in the servlet? What bytes, I mean? Let's find out what's happening before we design bizarre solutions.
And forget about that unsigned byte business. That isn't going to be part of the problem or of the solution.
No, as I said (I'm calling a servlet from VB6 ActiveX asking for a string), I have the opposite situation. Let me make it clear:
1. In activeX I execute Java servlet method
2. Java servlet returns some string back to activex
As I said, when the string contains Unicode characters the problem begins, because ActiveX uses the windows-1252 encoding and the servlet uses UTF-8. My solution is to return from the servlet not the string in its original form, but its byte representation (the bytes will be separated by commas).
You could try using Character.getNumericValue() or Character.codePointAt() on each character in the string, maybe. But then I doubt those values would match up to their corresponding encodings in windows-1252. Oh well, things to try anyway, I guess.
I already tried :) in fact I tried almost everything and the only solution that might work is byte encoding.
So can anyone tell me how java is converting ą to -60 -123
I assume that the conversion made in VB6 is correct, so it should be 185.
How to get 185 from -60 and -123 ?!?
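One way to read "how to get 185 from -60 and -123": the two bytes are the UTF-8 form of ą, so they can be decoded back to a String and encoded again with a single-byte code page. A hedged sketch follows; the class name and the choice of "windows-1250" as the target are assumptions, not something confirmed in this thread.

```java
import java.io.UnsupportedEncodingException;

// Sketch: re-encode bytes from one charset into another by going through a String.
class Transcode {

    static byte[] transcode(byte[] input, String fromCharset, String toCharset) {
        try {
            String text = new String(input, fromCharset); // decode the source bytes
            return text.getBytes(toCharset);              // encode with the target charset
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = { -60, -123 }; // UTF-8 bytes of ą (U+0105)
        byte[] out = transcode(utf8, "UTF-8", "windows-1250");
        // prints 1 byte: 185
        System.out.println(out.length + " byte: " + (out[0] & 0xFF));
    }
}
```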
Yay! I got it. You're lucky I have nothing to do at work today :) Have a look at:
http://en.wikipedia.org/wiki/Windows-1252
I couldn't find any of the characters in your list on the encoding chart, except for ó (243)
When you run
System.out.println(Character.codePointAt("ó", 0));
it prints 243. So presumably there's something weird going on with the other characters; maybe they're from a different encoding or something. But that one works at least :)
Oh, and # of course, but that one probably wasn't causing you any problems
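A short sketch of why ó "works" and the others do not: the Unicode value of a character and its byte in a code page are two separate things, and they only coincide for some characters. The names below are hypothetical, "windows-1250" is an assumed code page, and a plain (int) cast is used instead of Character.codePointAt(), which as far as I know is not available on Java 1.3.

```java
import java.io.UnsupportedEncodingException;

// Sketch: compare a character's Unicode value with its byte in a code page.
class CodePointVsByte {

    // Unicode value of a char (valid for characters in the Basic Multilingual Plane).
    static int unicodeValue(char c) {
        return (int) c;
    }

    // First byte of the character in the named charset, as an unsigned value.
    static int byteIn(char c, String charsetName) {
        try {
            return String.valueOf(c).getBytes(charsetName)[0] & 0xFF;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
    }

    public static void main(String[] args) {
        // ó is U+00F3: prints 243 243
        System.out.println(unicodeValue('\u00F3') + " " + byteIn('\u00F3', "windows-1250"));
        // ą is U+0105: prints 261 185
        System.out.println(unicodeValue('\u0105') + " " + byteIn('\u0105', "windows-1250"));
    }
}
```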
> You could try using Character.getNumericValue()
That method is often misunderstood. From its API:
For example, the character '\u216C' (the roman numeral fifty) will return an int with a value of 50
Is that really the translation required?!
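The API quote can be checked with a few lines. This is only a sketch with made-up names; it contrasts the numeric meaning of the character with its Unicode value.

```java
// Sketch: getNumericValue() returns what the character means as a number,
// not its position in Unicode.
class NumericValueDemo {

    static int numericMeaning(char c) {
        return Character.getNumericValue(c);
    }

    static int unicodeValue(char c) {
        return (int) c;
    }

    public static void main(String[] args) {
        char romanFifty = '\u216C';
        // prints 50
        System.out.println(numericMeaning(romanFifty));
        // prints 8556 (0x216C)
        System.out.println(unicodeValue(romanFifty));
    }
}
```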
> No, as I said(I'm calling a servlet from VB6
> activex asking for a string) I have opposite
> situation - let me make it clear:
> 1. In activeX I execute Java servlet method
> 2. Java servlet returns some string back to activex
Yes, I know what you said. All the ActiveX components I know of run in the browser, on the client, and servlets run on the host. So how is this "calling" working? You aren't sending an HTTP request which results in calling the servlet's doGet() method? If not, then what?
> Yay! I got it. You're lucky I have nothing to do at
> work today :) Have a look at:
> http://en.wikipedia.org/wiki/Windows-1252
>
> I couldn't find any of the characters in your list on
> the encoding chart, except for ó (243)
Yes, windows-1250 would be more suitable for Eastern European alphabets.
> > You could try using Character.getNumericValue()
>
> That method is often misunderstood. From its API:
>
> For example, the character '\u216C' (the roman
> numeral fifty) will return an int with a value of 50
>
> Is that really the translation required?!
Yeah, I didn't read the docs. I meant codePointAt()
> Yeah, I didn't read the docs. I meant codePointAt()
If you meant that, why did you write:
You could try using Character.getNumericValue() or Character.codePointAt()
:-)
OK, I finally figured out why you get the ę representation as -60 -103:
http://www.utf8-chartable.de/unicode-utf8-table.pl
it is written that:
U+0119    ę    11000100 10011001
so if you convert 11000100 to an int you get 196, and 10011001 gives 153,
and the byte representation of 196 is -60, and of 153 is -103 :]
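The arithmetic above can be replayed in code. A small sketch with hypothetical names; the last helper applies the usual two-byte UTF-8 layout (110xxxxx 10xxxxxx) to show that the same two bytes carry U+0119.

```java
// Sketch: binary string -> unsigned value -> signed Java byte, plus the
// code point carried by a two-byte UTF-8 sequence.
class BitArithmetic {

    static int unsignedValue(String bits) {
        return Integer.parseInt(bits, 2);
    }

    // Narrowing to byte keeps the low 8 bits, so values above 127 come out negative.
    static byte asSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    // Two-byte UTF-8: 5 payload bits from the lead byte, 6 from the trail byte.
    static int twoByteCodePoint(int lead, int trail) {
        return ((lead & 0x1F) << 6) | (trail & 0x3F);
    }

    public static void main(String[] args) {
        int lead = unsignedValue("11000100");
        int trail = unsignedValue("10011001");
        // prints 196 153
        System.out.println(lead + " " + trail);
        // prints -60 -103
        System.out.println(asSignedByte(lead) + " " + asSignedByte(trail));
        // prints 119 (hex), i.e. U+0119
        System.out.println(Integer.toHexString(twoByteCodePoint(lead, trail)));
    }
}
```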
This gave me the clue that the VB6 function I'm using computes the windows-1252 byte representation, not UTF-8.
Well, it doesn't matter... I have to use byte[] bytes = a.getBytes("windows-1252");
then change the negative bytes back to their positive representation, and then in the ActiveX I will be able to decode this byte representation of the string back into text.
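A rough sketch of the plan just described, with a decode step added so the round trip can be checked on the Java side (the real decoding would happen in VB6). All names are hypothetical. One loud assumption: the values 185 and 234 quoted earlier in the thread match the windows-1250 table rather than windows-1252, so the example passes "windows-1250"; swap in whichever code page the ActiveX side really uses.

```java
import java.io.UnsupportedEncodingException;
import java.util.StringTokenizer;

// Sketch: string -> "185,35,234,..." and back, for a given single-byte code page.
class CsvRoundTrip {

    static String encode(String text, String charsetName) {
        try {
            byte[] bytes = text.getBytes(charsetName);
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < bytes.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(bytes[i] & 0xFF); // negative bytes become 128..255
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
    }

    static String decode(String csv, String charsetName) {
        try {
            StringTokenizer st = new StringTokenizer(csv, ",");
            byte[] bytes = new byte[st.countTokens()];
            for (int i = 0; st.hasMoreTokens(); i++) {
                bytes[i] = (byte) Integer.parseInt(st.nextToken().trim());
            }
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
    }

    public static void main(String[] args) {
        String a = "\u0105#\u0119#\u00F3"; // ą#ę#ó
        String csv = encode(a, "windows-1250");
        // prints 185,35,234,35,243
        System.out.println(csv);
        // prints true
        System.out.println(a.equals(decode(csv, "windows-1250")));
    }
}
```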
And about the communication: yes, it calls doGet. I wouldn't have this encoding problem if I could do one of the following things:
1. Set HttpServletResponse.setCharacterEncoding to windows-1252
or
2. Specify the request encoding in the ActiveX as UTF
I cannot do either of these two...
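Regarding point 1, one possible alternative is to skip the writer and push already-encoded bytes to an output stream. The sketch below is only an illustration under assumptions: a ByteArrayOutputStream stands in for the stream a servlet would get from response.getOutputStream(), the names are hypothetical, and "windows-1250" is again an assumed code page. Whether the ActiveX side then reads the bytes as expected has not been verified here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: write text to a stream as raw bytes in a chosen encoding.
class RawBytesWriter {

    // In a servlet, 'out' would be the response's output stream (assumption).
    static void writeEncoded(OutputStream out, String text, String charsetName) throws IOException {
        out.write(text.getBytes(charsetName)); // bytes are fixed here, no writer involved
        out.flush();
    }

    // Helper for trying it out without a servlet container.
    static byte[] capture(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeEncoded(buffer, text, charsetName);
        } catch (IOException e) {
            throw new IllegalStateException("Write failed: " + e.getMessage());
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sent = capture("\u0105#\u0119#\u00F3", "windows-1250");
        // prints 5 (one byte per character in a single-byte code page)
        System.out.println(sent.length);
        // prints 185
        System.out.println(sent[0] & 0xFF);
    }
}
```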