NetBeans Issue: Servlet does not display Chinese UTF-8 properly
Java Version: JDK1.5.0_01, JRE1.5.0_01 (International version)
NetBeans Version: NetBeans IDE 4.0
OS: Windows XP Personal Edition
Dear Sirs,
First of all, thanks for reading this post. I am having the following issue. I am creating an application using HTML pages and servlets. I am using Chinese and English on them (HTML encoding UTF-8).
I created a project in NetBeans and added an index.html page that submits to a servlet. Both index.html and the HTML page generated by the servlet contain the line:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Additionally, I set up the character encoding settings in NetBeans:
(Tools > Options > Java Sources > Expert > Default Encoding = UTF-8)
When I run the project, index.html displays perfectly, with the Chinese characters shown properly. The problem comes when the HTML created by the servlet is displayed: instead of the Chinese characters, some strange characters are displayed (浣 instead of Chinese).
I have tried different encodings from http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html without any luck. I also set the encoding of the file itself (using right click > Properties in the NetBeans project window).
Also, when I am editing the servlet, the characters are displayed properly. I type them directly without any issue, but then the display is wrong at runtime.
Also, just in case this has something to do with the problem, my PC was bought in the US, so the default character set is not Chinese. I had to install the Chinese input support later on. But as I said earlier, the HTML page is displayed properly, so I really think it is some problem with NetBeans.
After a week of trying to find a solution, I decided to post it here in the hope that someone will show me the light.
Thanks in advance for any ideas or help provided
Aral.
Calling HttpServletRequest.setCharacterEncoding is essential for form data sent with the POST method, and it must be called before any of the HttpServletRequest.getParameter methods. HttpServletRequest.setCharacterEncoding() normally applies only to the request body, NOT the URI.
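To make the ordering rule concrete, here is a small JDK-only sketch. `SketchRequest` is a hypothetical stand-in, not the real `HttpServletRequest`: like a container, it parses the whole body on the first `getParameter` call using whatever charset is in effect at that moment, and afterwards it ignores `setCharacterEncoding`. The ISO-8859-1 default and the `name=...` body are assumptions for illustration; `URLDecoder.decode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class ParamOrderDemo {

    /** Hypothetical stand-in for HttpServletRequest; NOT the real servlet API. */
    static final class SketchRequest {
        private final byte[] body;                              // raw POST body
        private Charset encoding = StandardCharsets.ISO_8859_1; // assumed container default
        private Map<String, String> params;                     // null until first parse

        SketchRequest(byte[] body) {
            this.body = body;
        }

        /** Mirrors the documented rule: no effect once parameters have been read. */
        void setCharacterEncoding(Charset cs) {
            if (params == null) {
                encoding = cs;
            }
        }

        /** Parses the whole body once, with whatever charset is set at that moment. */
        String getParameter(String name) {
            if (params == null) {
                params = new HashMap<>();
                // Percent-encoded form data is plain ASCII on the wire.
                String raw = new String(body, StandardCharsets.US_ASCII);
                for (String pair : raw.split("&")) {
                    int eq = pair.indexOf('=');
                    if (eq < 0) {
                        continue;
                    }
                    params.put(URLDecoder.decode(pair.substring(0, eq), encoding),
                               URLDecoder.decode(pair.substring(eq + 1), encoding));
                }
            }
            return params.get(name);
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters, POSTed from a UTF-8 page.
        byte[] body = "name=%E4%BD%A0%E5%A5%BD".getBytes(StandardCharsets.US_ASCII);

        SketchRequest early = new SketchRequest(body);
        early.setCharacterEncoding(StandardCharsets.UTF_8);       // before getParameter
        System.out.println(early.getParameter("name").length());  // 2 characters

        SketchRequest late = new SketchRequest(body);
        late.getParameter("name");                                // something reads params first
        late.setCharacterEncoding(StandardCharsets.UTF_8);        // silently ignored
        System.out.println(late.getParameter("name").length());   // 6 Latin-1 characters
    }
}
```

Any code that reads a parameter first (a filter, a logging component) triggers the parse with the old charset, which is the outcome of the second case in `main`.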
If the GET method is used to send request data to the servlet, you need to set the URI encoding of Tomcat's HTTP Connector to UTF-8 so that form data embedded in the URL string is decoded properly by the servlet. This is normally done in Tomcat's conf/server.xml file, via URIEncoding="UTF-8", or via useBodyEncodingForURI="true" if request.setCharacterEncoding("UTF-8") is used.
Typically:
<Connector port="8080"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
debug="0" connectionTimeout="20000"
URIEncoding="UTF-8" useBodyEncodingForURI="true"
disableUploadTimeout="true" />
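A rough illustration of why URIEncoding matters for GET: the browser percent-encodes the UTF-8 bytes of the text, and the connector decides which charset turns those bytes back into characters. In this sketch `URLDecoder` merely stands in for the connector's own decoding, `decodeQueryValue` is a hypothetical helper, and ISO-8859-1 models the connector default of the Tomcat versions discussed in this thread. The `Charset` overloads of `URLEncoder`/`URLDecoder` need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UriEncodingDemo {

    /** Hypothetical helper: decode a query-string value with the connector's URI charset. */
    static String decodeQueryValue(String encoded, Charset uriEncoding) {
        return URLDecoder.decode(encoded, uriEncoding);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // two Chinese characters
        // A browser on a UTF-8 page percent-encodes the UTF-8 bytes of the text.
        String inUrl = URLEncoder.encode(text, StandardCharsets.UTF_8);
        System.out.println(inUrl); // %E4%BD%A0%E5%A5%BD

        // Connector left at ISO-8859-1: six wrong characters.
        System.out.println(decodeQueryValue(inUrl, StandardCharsets.ISO_8859_1).length()); // 6
        // Connector with URIEncoding="UTF-8": the original two characters.
        System.out.println(decodeQueryValue(inUrl, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```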
References:
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12253
http://java.sun.com/developer/qow/archive/179/index.jsp
http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/http.html
Hi, thanks for your help. However, I think the problem is more complex than it seems. Here is my doPost method (the important parts, anyway):
response.setCharacterEncoding("UTF-8"); //Not necessary because the next line should take care of it, but anyways...
response.setContentType("text/html; charset=UTF-8"); // pageEncoding is a JSP page-directive attribute, not a Content-Type parameter
PrintWriter out = response.getWriter();
out.println("<html>");
out.println("<head>");
out.println("<meta http-equiv='Content-Language' content='en-us'>");
out.println("<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>"); //Again not necessary line, but anyways
out.println("<title>Servlet</title>");
out.println("</head>");
out.println("<body>");
out.println("this is a test 你好 this is a test");
out.println("</body>");
out.println("</html>");
out.close();
This servlet is called from an HTML file. Also, when I load it in the browser, I right-click on the page and can see that UTF-8 is set up correctly. Doing some detective work, I found that:
- If the editor displays the characters perfectly and...
- Other html pages (no servlets) in the same application can display chinese characters well and...
- I have configured the encoding as UTF-8 in the servlet properties and in general properties and...
- The file "web.xml" contains the encoding UTF-8 in its first line then...
...my only guess is that something goes wrong while building the project itself (Ant?). Unfortunately I don't know how to configure Ant at that level, but I am beginning to think the problem may be there, during compilation...
Any ideas?
Once more, thanks for any help or advice provided.
Aral.
I know Ant allows you to specify the encoding of the source files passed to the javac compiler, if that is what you need:
<javac srcdir="${src}" destdir="${build}" encoding="UTF-8" ... />
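To see why a wrong compiler encoding garbles only the text hard-coded in the source, here is a sketch that simulates it: the editor saves the literal as UTF-8, and javac decodes those bytes with whatever encoding it was given (the platform default, unless the Ant `encoding` attribute or `javac -encoding UTF-8` says otherwise). ISO-8859-1 stands in for a Western Windows default such as Cp1252, which is an assumption; `literalAsCompiled` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SourceEncodingDemo {

    /**
     * Hypothetical simulation: the editor saved the literal as UTF-8,
     * and the compiler reads those bytes back using sourceCharset.
     */
    static String literalAsCompiled(String literalInEditor, Charset sourceCharset) {
        byte[] bytesOnDisk = literalInEditor.getBytes(StandardCharsets.UTF_8); // editor saves UTF-8
        return new String(bytesOnDisk, sourceCharset);                         // compiler decodes
    }

    public static void main(String[] args) {
        String typed = "\u4F60\u597D"; // what the NetBeans editor shows
        String wrong = literalAsCompiled(typed, StandardCharsets.ISO_8859_1);
        String right = literalAsCompiled(typed, StandardCharsets.UTF_8);
        System.out.println(wrong.length());      // 6: six wrong characters end up in the class file
        System.out.println(right.equals(typed)); // true: what encoding="UTF-8" gives you
    }
}
```

Because the servlet then writes these already-wrong characters through a correctly configured UTF-8 writer, the browser faithfully shows garbage even though every header says UTF-8, while text from databases or files is unaffected.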
Some servlet containers (including Tomcat) have problems when transmitting UTF-8 encoded Strings using the POST method. Tomcat in particular always uses ISO-8859-1, no matter what the setting on the JSP page is. Here's my solution for sending Greek characters back and forth in a JSP-Struts-Tomcat application:
JSP Page:
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Servlet:
Use the getBytes method to re-encode the incoming string into any other encoding. Example (the incoming string is encoded in ISO-8859-1, although you entered it in your own language on the JSP page):
String correctString = new String(incomingString.getBytes("ISO-8859-1"), "UTF-8");
What the above does is convert the incoming string to a byte array (reading the string as if it was encoded in ISO-8859-1) and then construct a new string from that byte array using UTF-8. You could even encode the string into any other encoding you like.
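A minimal sketch of the workaround in isolation, using Greek text as in the post above. The misreading step is simulated (an assumption standing in for the container decoding a UTF-8 body as ISO-8859-1); `reencode` is the one-liner above, written with `StandardCharsets` constants so there is no checked `UnsupportedEncodingException` to handle.

```java
import java.nio.charset.StandardCharsets;

final class ReencodeDemo {

    /** The workaround: recover the raw bytes via ISO-8859-1, then decode them as UTF-8. */
    static String reencode(String misread) {
        byte[] original = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u03B1\u03B2\u03B3"; // Greek alpha, beta, gamma typed on the JSP page
        // Simulated container behaviour: UTF-8 bytes decoded as ISO-8859-1.
        String misread = new String(sent.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());               // 6: two bytes per Greek letter
        System.out.println(reencode(misread).equals(sent)); // true
    }
}
```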
Hope this helps,
Angelos Anagnostopoulos
Actually, what your code does is create a byte array in ISO-8859-1, and then create a new Java string from that byte array, specifying UTF-8 as the encoding of the byte array.
I have seen this kind of code a number of times, and the only reason it produces the right result in many cases is that the original characters were mangled on the way in to begin with. So in this case, UTF-8 is sent, interpreted as ISO-8859-1 by Java (because of a configuration problem, possibly with Tomcat as mentioned) and thus incorrectly converted to UTF-16, converted back to ISO-8859-1 from UTF-16, and then read as UTF-8 and converted back to UTF-16.
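The flip side, sketched below: if the container has already decoded the parameter correctly (for example because `request.setCharacterEncoding("UTF-8")` was called in time), the same re-encoding destroys it. ISO-8859-1 cannot represent the characters, so `getBytes` substitutes its replacement byte `?` for each one. `reencode` is a hypothetical helper wrapping the one-liner from the previous post.

```java
import java.nio.charset.StandardCharsets;

final class ReencodeMisuseDemo {

    /** The ISO-8859-1 -> UTF-8 workaround, applied unconditionally. */
    static String reencode(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "\u4F60\u597D"; // already decoded correctly by the container
        // Unmappable in ISO-8859-1, so each character becomes '?' (0x3F).
        System.out.println(reencode(correct)); // ??
        System.out.println(reencode("abc"));   // abc: plain ASCII survives either way
    }
}
```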
I just tested a JSP using Tomcat, and I don't see any issues with getting UTF-8 correctly returned from the POST method (while the GET method will not work).
I have utf-8 declared in the html file, and the following in my jsp:
<%
request.setCharacterEncoding("UTF-8");
%>
<%@page pageEncoding="UTF-8"%>
<html>
<head>
<META http-equiv="Content-Type" content="text/html;charset=UTF-8">
<%@page contentType="text/html;charset=UTF-8"%>
I tried
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
in my JSP. In Tomcat 4.1.0 it displays the Chinese UTF-8 words correctly, but when I try it on Tomcat 4.1.31 and Tomcat 5.x.x it cannot display them.
Maybe it is a JSP compiler problem.
yamya, 2007-7-9 17:15:23

Hi,
I have tried all the above methods to make my Tomcat show Chinese characters,
even trying Big5, but nothing seems to work.
My setup
Fedora Core 3
Tomcat 5.5.9
Mysql 4.1.18
I am trying to use this setup with Mondrian and a Chinese database.
I can see Chinese characters if Tomcat is on Windows and the database on Linux,
but when Tomcat and the database are both on Linux I can't see any Chinese.
Tried:
UTF-8
Big5
export JAVA_OPTS="-server -Xms256m -Xmx512m -Dfile.encoding=UTF-8"
export JAVA_OPTS="-server -Xms256m -Xmx512m -Dfile.encoding=Big5"
Using
URIEncoding="UTF-8" useBodyEncodingForURI="true"
in server.xml
Adding
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
(even for Big5),
even changing the header of all pages to the above statements.
I have tried everything, but I don't know what to do or where I am going wrong.
All I can see is sometimes ?, sometimes square boxes, etc.
PLEASE, PLEASE...
Somebody help me!
I have been going crazy for the past 8 days trying to make this work.
Thanks in advance.
You will have to get the Unicode equivalents of the Chinese characters using the native2ascii utility, and use those Unicode values to display the Chinese characters.
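For reference, a rough sketch of what native2ascii-style conversion produces: each non-ASCII character becomes a `\uXXXX` escape, so the source file is plain ASCII and no longer depends on javac's source encoding. `toUnicodeEscapes` is a hypothetical helper, not part of the JDK; it escapes UTF-16 code units, as Java source escapes do.

```java
final class UnicodeEscapeDemo {

    /** Rough imitation of native2ascii: escape each non-ASCII char as backslash, 'u', 4 hex digits. */
    static String toUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);                                  // plain ASCII is kept as-is
            } else {
                sb.append(String.format("\\u%04x", (int) c)); // one escape per UTF-16 code unit
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // This literal is itself written with escapes, so it is pure ASCII in the source file.
        String hello = "\u4F60\u597D";
        System.out.println(toUnicodeEscapes("this is a test " + hello));
    }
}
```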
This problem has nothing to do with using native2ascii or not. UTF-8 encoded characters can display just fine in web pages using JSPs. This post does not mention using resource bundles anywhere, so bringing up native2ascii will only confuse the issue.
See if this info helps you: http://vietunicode.sourceforge.net/howto/java/
Aral,
I have the same problem you have.
Pages are in UTF-8. Text on pages comes from different sources:
- some comes from database
- some comes from header/footer files saved in UTF-8 format
- some of the text is hard coded in the source files
Page headers and the character set are correctly set to UTF-8 (using
response.setContentType and HTML meta tags), and I see that browsers
understand that the page is UTF-8.
As a result, the UTF-8 data coming from the database and the header/footer
files is shown correctly, but the hard-coded text is
shown as (?U?匲?.
I have set the UTF-8 character set in each source file's properties in
NetBeans, but this does not fix the problem. I think javac does not read the source files as UTF-8, and this is why only the text in the source files gets garbled.
Have you found any solution to your problem?
Regards,
Mac
Hello,
Finally adding below settings to web.xml helped me out of this.
I wish it can help you too.
<locale-encoding-mapping-list>
<locale-encoding-mapping>
<locale>fa</locale>
<encoding>UTF-8</encoding>
</locale-encoding-mapping>
</locale-encoding-mapping-list>
You will need to change the locale to Chinese (zh); in my case it was Persian.
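A simplified model of what this mapping does, based on how `ServletResponse.setLocale` is described from Servlet 2.4 onward: the container consults `locale-encoding-mapping-list` only when `setLocale` is called, and only if no charset was set explicitly through `setContentType` or `setCharacterEncoding` (and `getWriter` has not been called yet). `charsetFor` is hypothetical, and the lookup order (full locale, then language) is an assumption; a real container may differ. One consequence: the mapping does nothing for a servlet that never calls `setLocale`.

```java
import java.util.Locale;
import java.util.Map;

final class LocaleEncodingDemo {

    /** Hypothetical model of the charset choice made inside response.setLocale(locale). */
    static String charsetFor(Locale locale, Map<String, String> mappings, String explicitCharset) {
        if (explicitCharset != null) {
            return explicitCharset;                       // setContentType/setCharacterEncoding win
        }
        String full = mappings.get(locale.toString());    // e.g. "zh_CN"
        if (full != null) {
            return full;
        }
        String lang = mappings.get(locale.getLanguage()); // e.g. "zh"
        return lang != null ? lang : "ISO-8859-1";        // servlet default charset
    }

    public static void main(String[] args) {
        // Mirrors <locale>/<encoding> pairs from web.xml.
        Map<String, String> webXml = Map.of("fa", "UTF-8", "zh", "UTF-8");
        System.out.println(charsetFor(Locale.CHINA, webXml, null));                   // UTF-8
        System.out.println(charsetFor(Locale.forLanguageTag("el"), webXml, null));    // ISO-8859-1
    }
}
```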
Regards,
Mac
Hello
Did you manage to solve your problem?
I have the exact same problem and I was trying to figure out what is happening for the last 9 hours!
It's because there is logging happening behind the scenes that dumps all parameters. The JspServlet class dumps the parameters and, of course, does not set any encoding. Once request.getParameter has been called, request.setCharacterEncoding("utf-8") no longer plays any role.
and this is my log:
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:247)- JspEngine --> /bar.jsp
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:248)- ServletPath: /bar.jsp
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:249)- PathInfo: null
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:250)- RealPath: d:\jakarta\tomcat\webapps\foo\bar.jsp
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:251)-RequestURI: /bar.jsp
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:252)- QueryString: null
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:253)-Request Params:
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:257)- f_submit = true
2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:257)- foo = I盜睮?br>
Not sure what to do now.
Hi,
I am facing the same problem.
I want to display some Greek characters.
I have written the following in web.xml:
<locale-encoding-mapping-list>
<locale-encoding-mapping>
<locale>el</locale>
<encoding>UTF-8</encoding>
</locale-encoding-mapping>
</locale-encoding-mapping-list>
but it shows junk data when I display it through a servlet.
Please help ASAP.
Thanks,
Neelam.
Hey, I think the problem can be solved if you set LANG="en_US" or LANG=en_US:ISO8859 for the server the servlets are running on. If you have any start scripts for the application, please make sure this is set up correctly there.
a14a, 2007-7-20 4:25:20
