URLEncoding of URLs with UTF-8 characters
Hi,
I'm trying to URL-encode a URL that contains some Chinese characters, but it isn't working properly.
That is, when I decode the encoded URL (using java.net.URLEncoder.encode(String) and java.net.URLDecoder.decode(String) respectively), I don't get the same URL back. The Chinese characters are lost somehow.
I tried writing the output to a file instead of printing it, but no luck.
input URL=http://down.chinamp3.com/down.php?id=70201&song_name=I%20Will%20Be%20Fine&singer_name=莫文蔚
encoded URL=http%3a//down.chinamp3.com/down.php%3fid%3d70201%26song_name%3dI%2520Will%2520Be%2520Fine%26singer_name%3d%e8%8e%ab%e6%96%87%e8%94%9a
decoded URL=http://down.chinamp3.com/down.php?id=70201&song_name=I%20Will%20Be%20Fine&singer_name=猫沤芦忙鈥撯€∶ㄢ€澟?
Whereas the decoded URL should be the same as the input URL!
I'm completely stuck. Any advice would be highly appreciated!
Regards
Pratim
[991 byte] By [pratimdas] at [2007-9-27 18:45:19]

Hi!
Try this, calling it as encodeURL(yoururl, "UTF-8"):
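public static String encodeURL(String Input, String DestinationEncoding)
{
    try
    {
        // Convert the string to bytes in the target encoding (e.g. UTF-8).
        byte[] lBytes = Input.getBytes(DestinationEncoding);
        int lLength = lBytes.length;
        StringBuffer lOutput = new StringBuffer(lLength * 3);
        for (int i = 0; i < lLength; i++)
        {
            // ASCII letters A-Z and a-z are copied through unchanged.
            if ((lBytes[i] >= 65 && lBytes[i] <= 90)
                || (lBytes[i] >= 97 && lBytes[i] <= 122))
            {
                lOutput.append((char) lBytes[i]);
            }
            else
            {
                // Every other byte (digits and URL delimiters included) becomes %XX.
                // Java bytes are signed, so map negative values to 128-255 first.
                int lByte = lBytes[i];
                if (lByte < 0) lByte = 256 + lByte;
                lOutput.append('%');
                lOutput.append("0123456789ABCDEF".charAt(lByte / 16));
                lOutput.append("0123456789ABCDEF".charAt(lByte & 0xf));
            }
        }
        return lOutput.toString();
    }
    catch (java.io.UnsupportedEncodingException ueex)
    {
        // Unknown encoding name: fall back to the platform-default encoder.
        return java.net.URLEncoder.encode(Input);
    }
}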
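If Java 1.4 or later is available, the standard library can probably do the same job: URLEncoder.encode(String, String) and URLDecoder.decode(String, String) take the charset name explicitly, so the round trip no longer depends on the platform default. A minimal sketch, assuming only the parameter value (not the whole URL) is encoded; the class name Utf8RoundTrip and its two helper methods are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the original post.
class Utf8RoundTrip {
    // Percent-encodes one parameter value using its UTF-8 bytes.
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverses encodeValue(), interpreting the escaped bytes as UTF-8.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The singer name from the question, written with Unicode escapes
        // so the source file encoding does not matter.
        String singer = "\u83ab\u6587\u851a";
        String encoded = encodeValue(singer);
        String url = "http://down.chinamp3.com/down.php?id=70201&singer_name=" + encoded;
        System.out.println(url);
        System.out.println(decodeValue(encoded).equals(singer));
    }
}
```

Passing the whole URL through the encoder would also escape ':', '/', '?', '&' and '=', which is why the sketch encodes only the value and then appends it to the URL.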
Regards,
Harrz
harrz at 2007-7-6 20:00:50 >

I tried something similar in several different ways. After spending days on it, I came up with something that works for me in any language. I tested the following code with input from a Java applet and from an HTML page, and it seems to work fine. I hope this helps.
Do not use request.getParameter() to retrieve Chinese or other 2- or 3-byte input characters. Instead, create your own Hashtable and read the request input stream directly.
- code -
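String queryString = request.getQueryString();
String paramStr = "";
// isPost is set elsewhere (not shown); presumably true for POST requests.
if (isPost)
{
    // Read the raw request body instead of calling request.getParameter().
    StringBuffer buffer = new StringBuffer();
    char[] buf = new char[4 * 1024]; // 4K-char buffer
    int len;
    BufferedReader reader = request.getReader();
    while ((len = reader.read(buf, 0, buf.length)) != -1)
    {
        buffer.append(buf, 0, len);
    }
    paramStr = buffer.toString();
    // Decode the percent-escapes as UTF-8. Note that this happens before the
    // string is split, so an escaped '&' or '=' inside a value will split it.
    paramStr = URLDecoder.decode(paramStr, "UTF-8");
    p(paramStr); // p() is the poster's own print helper (not shown)
}
Hashtable requestHash = createHash(paramStr);
// Alternative: HttpUtils.parseQueryString(queryString == null ? "" : queryString)
Hashtable getRequestHash = createHash(queryString == null ? "" : queryString);
// Merge the query-string parameters into the body parameters.
for (Enumeration e = getRequestHash.keys(); e.hasMoreElements();)
{
    Object key = e.nextElement();
    Object value = getRequestHash.get(key);
    requestHash.put(key, value);
}
p("Request Hash size : " + requestHash.size());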
end code
And now the create hash function
-- code --
// Splits "a=1&b=2" into a table mapping each key to a one-element String[].
Hashtable createHash(String str)
{
    Hashtable hash = new Hashtable();
    StringTokenizer st = new StringTokenizer(str, "&");
    while (st.hasMoreTokens())
    {
        StringTokenizer st1 = new StringTokenizer(st.nextToken(), "=");
        String key = st1.nextToken();
        // A key without a value ("a=" or "a") maps to the empty string.
        String value = (st1.hasMoreTokens() ? st1.nextToken() : "");
        hash.put(key, new String[]{value});
    }
    return hash;
}

// Returns the first value stored for key, or null if the key is absent.
String getRequestValue(Hashtable requestHash, String key)
{
    if (requestHash.get(key) == null)
        return null;
    String value = ((String[]) requestHash.get(key))[0];
    return value;
}

// Returns all values stored for key, or null if the key is absent.
String[] getRequestValues(Hashtable requestHash, String key)
{
    if (requestHash.get(key) == null)
        return null;
    String[] value = ((String[]) requestHash.get(key));
    return value;
}
end code
You can get the request value like this:
String act=getRequestValue(requestHash,"act");
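One thing to watch with this kind of hand-rolled parsing is the order of splitting and decoding: if a value contains an escaped '&' or '=' (%26, %3D), decoding before splitting breaks the value apart. A possible variant, sketched here under the assumption that one value per key is enough, splits first and decodes each key and value separately. The class QueryParser and its methods are hypothetical names, not part of the code above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, shown only to illustrate the split-then-decode order.
class QueryParser {
    // Splits on '&' and on the first '=', then decodes each piece as UTF-8.
    // If a key occurs more than once, the last value wins.
    static Map<String, String> parse(String raw) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (raw == null || raw.length() == 0) {
            return params;
        }
        for (String pair : raw.split("&")) {
            if (pair.length() == 0) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    // Decodes percent-escapes (and '+') using UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, QueryParser.parse(request.getQueryString()).get("singer_name") would return the decoded name, and it treats the query string and the POST body the same way.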
I hope this one helps
> I tried something similar in several different ways.
> After spending days on it, I came up with something
> that works for me in any language. I tested the
> following code with input from a Java applet and from
> an HTML page, and it seems to work fine. I hope this helps.
>
> Do not use request.getParameter() to retrieve
> Chinese or other 2- or 3-byte input characters. Instead,
> create your own Hashtable and read the request input
> stream directly.
That workaround was required for JSDK 2.2, but if you are working with the latest JSDK, there is a setCharacterEncoding() method on the request. The only thing you need to do is call it before the first call to any of the getParameter() methods.
http://java.sun.com/products/servlet/2.3/javadoc/javax/servlet/ServletRequest.html#setCharacterEncoding(java.lang.String)
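Why the call matters can be seen without a servlet container. When a request names no charset, the servlet specification has the container fall back to ISO-8859-1, which turns every UTF-8 byte into its own character. The standalone sketch below only imitates that effect with java.net.URLDecoder; it does not use the servlet API, and the class CharsetMismatch and method decodeAs are made-up names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes the same escaped bytes with two charsets.
class CharsetMismatch {
    // Decodes percent-escapes, interpreting the resulting bytes in the given charset.
    static String decodeAs(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // The escaped singer_name value from the question: nine UTF-8 bytes.
        String encoded = "%e8%8e%ab%e6%96%87%e8%94%9a";

        String right = decodeAs(encoded, "UTF-8");      // bytes grouped into 3 characters
        String wrong = decodeAs(encoded, "ISO-8859-1"); // one character per byte

        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```

In a servlet, calling request.setCharacterEncoding("UTF-8") before the first getParameter() is meant to select the first behaviour for the request body. As far as the specification goes, it covers the body only, so parameters in the query string may still depend on container configuration.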
euxx at 2007-7-6 20:00:51 >
