URLEncoding of URLs with UTF-8 characters

Hi,

I'm trying to URL-encode a URL that contains some Chinese characters, but it isn't working properly.

That is, when I decode the encoded URL (using java.net.URLEncoder.encode(String) and java.net.URLDecoder.decode(String) respectively), I don't get the same URL back. The Chinese characters are lost somehow.

I tried writing it to a file instead of printing the output, but no luck.

input URL=http://down.chinamp3.com/down.php?id=70201&song_name=I%20Will%20Be%20Fine&singer_name=莫文蔚

encoded URL=http%3a//down.chinamp3.com/down.php%3fid%3d70201%26song_name%3dI%2520Will%2520Be%2520Fine%26singer_name%3d%e8%8e%ab%e6%96%87%e8%94%9a

decoded URL=http://down.chinamp3.com/down.php?id=70201&song_name=I%20Will%20Be%20Fine&singer_name=猫沤芦忙鈥撯€∶ㄢ€澟?

Whereas the decoded URL should be the same as the input URL!

I'm completely stuck. Any advice would be highly appreciated!

Regards

Pratim

[991 byte] By [pratimdas] at [2007-9-27 18:45:19]
# 1
I also tried java.net.URLDecoder.decode(String url, String encoding) from JDK 1.4; that didn't help either!

Pratim
pratimdas at 2007-7-6 20:00:50 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2
Where is your complete code? The problem might not be in the decoder but somewhere else.
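
To illustrate why the decoder itself may not be at fault, here is a minimal sketch of a round trip. It assumes both sides name UTF-8 explicitly; the class and method names are made up for this example. The string is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {
    // Percent-encode a single value using UTF-8 bytes.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode percent-escapes, interpreting the bytes in the given charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String singer = "\u83ab\u6587\u851a"; // the singer_name value from the question
        String encoded = encode(singer);
        System.out.println(encoded);                                      // %E8%8E%AB%E6%96%87%E8%94%9A
        System.out.println(singer.equals(decode(encoded, "UTF-8")));      // true
        System.out.println(singer.equals(decode(encoded, "ISO-8859-1"))); // false
    }
}
```

If the round trip succeeds like this but the printed or saved text still looks wrong, the console or file writer encoding is a likely suspect rather than URLDecoder.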
euxx at 2007-7-6 20:00:50
# 3

Hi!

Try this, calling encodeURL(yoururl, "UTF8"):

// Percent-encodes every byte that is not an ASCII letter (A-Z, a-z).
// Note: ':', '/', '?', '&' and '=' are encoded too, so this suits
// individual parameter values better than a whole URL.
public static String encodeURL(String Input, String DestinationEncoding)
{
    try
    {
        byte[] lBytes = Input.getBytes(DestinationEncoding);
        int lLength = lBytes.length;
        StringBuffer lOutput = new StringBuffer(lLength * 3);
        for (int i = 0; i < lLength; i++)
        {
            if ((lBytes[i] >= 65 && lBytes[i] <= 90)
                || (lBytes[i] >= 97 && lBytes[i] <= 122))
            {
                lOutput.append((char) lBytes[i]);
            }
            else
            {
                // Java bytes are signed; map -128..-1 to 128..255.
                int lByte = lBytes[i];
                if (lByte < 0) lByte = 256 + lByte;
                lOutput.append('%');
                lOutput.append("0123456789ABCDEF".charAt(lByte / 16));
                lOutput.append("0123456789ABCDEF".charAt(lByte & 0xf));
            }
        }
        return lOutput.toString();
    }
    catch (java.io.UnsupportedEncodingException ueex)
    {
        // Fall back to the (deprecated) platform-default encoder.
        return java.net.URLEncoder.encode(Input);
    }
}
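
As a usage sketch, the same idea can be applied per parameter rather than to the whole URL, so that the "://", "?", "&" and "=" delimiters stay intact. The example below uses the standard java.net.URLEncoder instead of the hand-written encoder; the class name and the buildUrl helper are made up for illustration. Note that the values are passed in unencoded ("I Will Be Fine", not "I%20Will%20Be%20Fine"), otherwise the percent signs get encoded a second time, as the %2520 in the original question shows.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class QueryUrlSketch {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Appends name=value pairs to base, encoding each name and value separately.
    static String buildUrl(String base, String[][] params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (String[] p : params) {
            sb.append(separator).append(enc(p[0])).append('=').append(enc(p[1]));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = buildUrl("http://down.chinamp3.com/down.php", new String[][] {
            {"id", "70201"},
            {"song_name", "I Will Be Fine"},
            {"singer_name", "\u83ab\u6587\u851a"},
        });
        // prints http://down.chinamp3.com/down.php?id=70201&song_name=I+Will+Be+Fine&singer_name=%E8%8E%AB%E6%96%87%E8%94%9A
        System.out.println(url);
    }
}
```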

Regards,

Harrz

harrz at 2007-7-6 20:00:50
# 4

I tried something similar in different ways. After spending days on it, I came up with something that works fine for me in any language. I tested the following code with input from a Java applet and from an HTML page, and it seems to work fine. I hope this helps.

Do not use request.getParameter() to retrieve Chinese or other 2- or 3-byte input characters. Instead, create your own hashtable and read your request input stream directly.

- code -

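String queryString = request.getQueryString();
String paramStr = "";
// isPost is assumed to be set elsewhere, e.g. from request.getMethod().
if (isPost)
{
    // Read the raw POST body instead of letting the container parse it.
    StringBuffer buffer = new StringBuffer();
    char[] buf = new char[4 * 1024]; // 4K char buffer
    int len;
    BufferedReader reader = request.getReader();
    while ((len = reader.read(buf, 0, buf.length)) != -1)
    {
        buffer.append(buf, 0, len);
    }
    paramStr = buffer.toString();
    // Decode the percent-escapes as UTF-8. Caveat: decoding before splitting
    // means a value containing an encoded '&' or '=' will be split wrongly.
    paramStr = URLDecoder.decode(paramStr, "UTF-8");
    p(paramStr); // p() is a print/log helper not shown here
}
Hashtable requestHash = createHash(paramStr);
// Merge in the query-string parameters (these are not URL-decoded here).
Hashtable getRequestHash = createHash(queryString == null ? "" : queryString); // HttpUtils.parseQueryString(queryString == null ? "" : queryString);
for (Enumeration e = getRequestHash.keys(); e.hasMoreElements();)
{
    Object key = e.nextElement();
    Object value = getRequestHash.get(key);
    requestHash.put(key, value);
}
p("Request Hash size : " + requestHash.size());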

end code

And now the createHash function:

-- code --

// Splits "a=1&b=2" into a table mapping each key to a one-element String[].
// A repeated key overwrites the earlier value.
Hashtable createHash(String str)
{
    Hashtable hash = new Hashtable();
    StringTokenizer st = new StringTokenizer(str, "&");
    while (st.hasMoreTokens())
    {
        StringTokenizer st1 = new StringTokenizer(st.nextToken(), "=");
        String key = st1.nextToken();
        String value = (st1.hasMoreTokens() ? st1.nextToken() : "");
        hash.put(key, new String[]{value});
    }
    return hash;
}

// Returns the first value stored for key, or null if the key is absent.
String getRequestValue(Hashtable requestHash, String key)
{
    if (requestHash.get(key) == null)
        return null;
    String value = ((String[]) requestHash.get(key))[0];
    return value;
}

// Returns all values stored for key, or null if the key is absent.
String[] getRequestValues(Hashtable requestHash, String key)
{
    if (requestHash.get(key) == null)
        return null;
    String[] value = ((String[]) requestHash.get(key));
    return value;
}

end code

You can get the request value like this:

String act=getRequestValue(requestHash,"act");
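
A variant of the same idea, as a rough sketch rather than a drop-in replacement: split the raw string on '&' and '=' first, and only then decode each name and value as UTF-8. This way an encoded '&' or '=' inside a value survives, and repeated keys keep all their values. The class name and the use of java.util collections (Java 8 or later) are assumptions made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class QueryParseSketch {
    static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Parses a raw (still percent-encoded) query string or form body.
    static Map<String, List<String>> parse(String raw) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = dec(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : dec(pair.substring(eq + 1));
            params.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> p =
            parse("act=a%26b%3Dc&singer_name=%E8%8E%AB%E6%96%87%E8%94%9A&act=x");
        System.out.println(p.get("act")); // [a&b=c, x]
        System.out.println(p.get("singer_name").get(0).equals("\u83ab\u6587\u851a")); // true
    }
}
```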

I hope this one helps

naveen1975 at 2007-7-6 20:00:50
# 5

> I tried something similar in different ways. After
> spending days on it, I came up with something that
> works fine for me in any language. I tested the
> following code with input from a Java applet and from
> an HTML page, and it seems to work fine. I hope this helps.
>
> Do not use request.getParameter() to retrieve
> Chinese or other 2- or 3-byte input characters. Instead,
> create your own hashtable and read your request input
> stream directly.

That was required for JSDK 2.2, but if you are working with the latest JSDK, there is a setCharacterEncoding() method. The only thing you need to do is call it before using any of the getParameter() methods.

http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)

euxx at 2007-7-6 20:00:51