Problem with URL encoding conversion

Hi all,

I am working on an I18N application in which one component sends an HTTP request to another component, which then extracts the query parameters from that request.

The problem is that the input to the first component can be given in any one of five character encodings:

UTF-8

Shift_JIS

EUC_JP

Windows-31J

ISO-2022-JP

I have written a test program that converts an encoded URL from one character encoding to another. It works for the first four encodings above, but it fails for the last one, ISO-2022-JP. The test program is:

import java.net.URLDecoder;
import java.net.URLEncoder;

class JPtoUTF8 {

    public static void main(String[] args) {
        try {
            // The same text, percent-encoded in three different charsets
            String shift_jis = "%82%C8%82%A4%82%8B%82%E8"; // This is Shift_JIS encoded URL
            String iso2022jp = "%1B%24B%24J%24%26%23k%24j%1B%28B"; // This is ISO-2022-JP encoded URL
            String utf8 = "%E3%81%AA%E3%81%86%EF%BD%8B%E3%82%8A"; // This is the result that should be obtained

            // Decode each URL using its source charset
            String decodedShift_jis = URLDecoder.decode(shift_jis, "Shift_JIS");
            String decodedIso2022jp = URLDecoder.decode(iso2022jp, "ISO-2022-JP");

            // Re-encode the decoded text as UTF-8
            String encodedShift_JIS = URLEncoder.encode(decodedShift_jis, "UTF-8");
            String encodedIso2022jp = URLEncoder.encode(decodedIso2022jp, "UTF-8");

            System.out.println("expected utf8 = " + utf8);
            System.out.println("shift_jis= " + shift_jis);
            System.out.println("encodedShift_JIS = " + encodedShift_JIS);
            System.out.println("iso2022jp= " + iso2022jp);
            System.out.println("encodedIso2022jp = " + encodedIso2022jp);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I am using JDK 5 for this application.

Please give your valuable suggestions.

Thanks in advance.

[2823 byte] By [Prashant001a] at [2007-10-3 3:32:54]
# 1

Hi all,

I think my question did not explain how I got the encoded URL.

A request can come to our component from any site, and each site may use a different character encoding. We decode the request according to that encoding and then re-encode it as UTF-8 before any further processing is done.

We are able to do this with many encodings, but not with ISO-2022-JP.

Can somebody shed any light on this?

Prashant001a at 2007-7-14 21:27:12
# 2

Could the cause be that ISO-2022-JP is not a single encoding but has several variants? See:

http://www.w3.org/TR/japanese-xml/#AEN28427904

Maybe what you are getting is one of the flavors, while Java's URLDecoder uses another? Or maybe the string you are getting is incorrectly encoded to begin with (it might have been incorrectly converted from Shift_JIS)?
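
To see which flavors a particular JVM knows about, something like the following might help. It is only a sketch: the charset names in the list are my assumption about what a JDK with the extended charsets may ship, and FlavorCheck and supportedFlavors are made-up names.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class FlavorCheck {
    // Candidate names are an assumption; a given JVM may support only some of them.
    static final String[] CANDIDATES = {
        "ISO-2022-JP", "ISO-2022-JP-2", "x-windows-iso2022jp",
        "x-windows-50220", "x-windows-50221"
    };

    // Returns the candidate charset names that this JVM can actually load.
    static List<String> supportedFlavors() {
        List<String> found = new ArrayList<String>();
        for (String name : CANDIDATES) {
            if (Charset.isSupported(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("Supported ISO-2022-JP flavors: " + supportedFlavors());
    }
}
```

If more than one name is reported, decoding the same bytes with each of them and comparing the results would show whether the flavor matters for this input.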

With the shift-in shift-out design it is a difficult encoding to deal with under the best of circumstances, so you have my sympathy.
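
Because the encoding is stateful, an escape sequence such as ESC $ B has to reach the charset decoder together with the bytes that follow it. As far as I remember, URLDecoder converts each run of consecutive %XX escapes on its own and copies unescaped characters (such as the B in %1B%24B) straight through, so the escape sequence may never arrive in one piece. That is my assumption about the implementation, not something I have verified on JDK 5. A way to test it is to percent-decode the whole string into bytes first and convert only once at the end. WholeBufferDecode, percentDecodeToBytes and recode below are hypothetical names for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class WholeBufferDecode {

    // Turns a percent-encoded string into raw bytes without any charset conversion.
    // Unescaped characters are assumed to be plain ASCII and are copied as single bytes.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c == '+' ? ' ' : c); // '+' means space in query strings
                i++;
            }
        }
        return out.toByteArray();
    }

    // Decodes the complete byte sequence in one step, then re-encodes it as UTF-8.
    static String recode(String encoded, String sourceCharset) {
        try {
            String text = new String(percentDecodeToBytes(encoded), sourceCharset);
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + sourceCharset);
        }
    }

    public static void main(String[] args) {
        String iso2022jp = "%1B%24B%24J%24%26%23k%24j%1B%28B";
        // Compare this with the expected UTF-8 string from the test program.
        System.out.println(recode(iso2022jp, "ISO-2022-JP"));
    }
}
```

If this prints the expected UTF-8 string, the input was valid ISO-2022-JP and the problem is in how the bytes are handed to the decoder, not in the flavor.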

one_danea at 2007-7-14 21:27:12