Netbeans Issue: Servlet does not display Chinese UTF-8 properly

Java Version: JDK1.5.0_01, JRE1.5.0_01 (International version)

Netbeans Version: Netbeans IDE 4.0

OS: Windows XP Personal Edition

Dear Sirs,

First of all, thanks for reading this post. I am having the following issue. I am creating an application using HTML pages and servlets, with both Chinese and English text on them (HTML encoding UTF-8).

I created a project in Netbeans and added an index.html page that submits to a servlet. Both index.html and the HTML page generated by the servlet contain the line:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Additionally, I set the character encoding in Netbeans:

(tools-options-Java sources-Expert-default encoding=UTF-8)

When I run the project, index.html displays perfectly, with the Chinese characters shown properly. The problem comes when the HTML page created by the servlet is displayed: instead of the Chinese characters, some strange characters appear (浣 instead of Chinese).

I have tried different encodings from http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html without any luck. I also set the encoding of the file itself (using right click-properties in the project menu of Netbeans).

Also, when I am editing the servlet, the characters are displayed properly. I type them directly without any issue, but then the display is wrong at runtime.

Also, just in case this has something to do with the problem, my PC was bought in the US, so the default character set is not Chinese. I had to install the Chinese input support later on. But as I said earlier, the HTML page is displayed properly, so I really think it is some problem with Netbeans.

After a week of trying to find a solution, I decided to post it here in the hope that someone will show me the light.

Thanks in advance for any ideas or help provided

Aral.

[2013 byte] By [aral_ocrama] at [2007-10-1 6:48:50]
# 1

Calling HttpServletRequest.setCharacterEncoding is essential for form data sent with the POST method. It must be called before any of the HttpServletRequest.getParameter methods. HttpServletRequest.setCharacterEncoding() normally applies only to the request body, NOT the URI.

If the GET method is used to send request data to the servlet, you need to set the URI encoding of Tomcat's HTTP Connector to UTF-8 so that form data embedded in the URL string is decoded properly by the servlet. This setting is normally made in Tomcat's conf/server.xml file via URIEncoding="UTF-8", or via useBodyEncodingForURI="true" if request.setCharacterEncoding("UTF-8") is used.

Typically:

<Connector port="8080"

maxThreads="150" minSpareThreads="25" maxSpareThreads="75"

enableLookups="false" redirectPort="8443" acceptCount="100"

debug="0" connectionTimeout="20000"

URIEncoding="UTF-8" useBodyEncodingForURI="true"

disableUploadTimeout="true" />
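
To see why the connector's URI encoding matters, here is a minimal standalone sketch (not Tomcat code). It decodes the same percent-encoded query value with two different charsets, roughly the way a container would with and without URIEncoding="UTF-8". The class and method names are made up for the demo, and java.net.URLDecoder only stands in for the container's own decoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's own URI decoding.
final class UriEncodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for a two-character Chinese value on a UTF-8 page.
        String raw = "%E4%BD%A0%E5%A5%BD";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println("decoded as UTF-8: " + asUtf8.length() + " characters");
        System.out.println("decoded as ISO-8859-1: " + asLatin1.length() + " characters");
    }
}
```

With UTF-8 the six bytes become the two intended characters. With ISO-8859-1 each byte becomes its own character, which is the kind of garbage the servlet then gets back from getParameter.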

References:

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23929

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12253

http://java.sun.com/developer/qow/archive/179/index.jsp

http://jakarta.apache.org/tomcat/tomcat-5.0-doc/config/http.html

nguyenq87a at 2007-7-9 17:15:23 > top of Java-index,Desktop,I18N...
# 2

Hi, thanks for your help. However, I think the problem is more complex than it seems. Here is my doPost method (the important parts anyway):

response.setCharacterEncoding("UTF-8"); //Not necessary because the next line should take care of it, but anyways...

response.setContentType("text/html; charset=UTF-8; pageEncoding=UTF-8");

PrintWriter out = response.getWriter();

out.println("<html>");

out.println("<head>");

out.println("<meta http-equiv='Content-Language' content='en-us'>");

out.println("<meta http-equiv='Content-Type' content='text/html; charset=utf-8; pageEncoding=utf-8'>"); //Again not necessary line, but anyways

out.println("<title>Servlet</title>");

out.println("</head>");

out.println("<body>");

out.println("this is a test 你好 this is a test");

out.println("</body>");

out.println("</html>");

out.close();

This servlet is called from an HTML file. Also, when I load this in the browser and right-click on the page, I can see that UTF-8 is set all right. Doing some detective work, I found that:

- If the editor displays the characters perfectly and...

- Other HTML pages (no servlets) in the same application can display Chinese characters well and...

- I have configured the encoding as UTF-8 in the servlet properties and in general properties and...

- The file "web.xml" contains the encoding UTF-8 in its first line then...

...my only guess is that something goes wrong during the building of the project itself (ant?). Unfortunately I have no idea about configuring ant to that level, but I began to think that the problem may be there, during the compilation...
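
For what it's worth, the response side can be checked in isolation. The following is a minimal sketch under assumptions: the class name is hypothetical, and plain java.io stands in for the servlet API. It pushes a string through a PrintWriter with an explicit charset, which is roughly what response.getWriter() does once the response encoding is set, and prints the resulting bytes in hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; plain java.io is used instead of the servlet API.
final class WriterEncodingDemo {

    // Writes the text through a PrintWriter that encodes with the named charset.
    static byte[] write(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charsetName));
            out.print(text);
            out.close(); // closing also flushes the encoder into the sink
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file ASCII-only, so the source encoding cannot interfere.
        String text = "\u4f60\u597d";
        System.out.println("UTF-8:      " + hex(write(text, "UTF-8")));
        System.out.println("ISO-8859-1: " + hex(write(text, "ISO-8859-1")));
    }
}
```

The UTF-8 line should show E4 BD A0 E5 A5 BD, and the ISO-8859-1 line 3F 3F, that is, two literal question marks. If a UTF-8 writer produces the right bytes for an escaped literal but the typed-in literal still comes out wrong, the string was probably already damaged before it reached the writer, which would point at how the source file was compiled.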

Any ideas?

Once more, thanks for any help or advice provided.

Aral.

aral_ocrama at 2007-7-9 17:15:23
# 3
I know Ant allows you to specify the encoding of the source files to the javac compiler, if that is what you need:

<javac srcdir="${src}" destdir="${build}" encoding="UTF-8" ... />
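
As a rough illustration of why the source encoding matters (a standalone sketch, not what javac or Ant literally run): the compiler has to turn the bytes of the source file into characters, and if it assumes a different charset from the one the editor saved in, every non-ASCII string literal is already wrong inside the compiled class. The class name and the use of ISO-8859-1 as the "wrong" charset are assumptions for the demo, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class imitating how a compiler decodes a string literal from disk.
final class SourceEncodingDemo {

    // Turns the bytes stored in the source file into characters using the assumed charset.
    static String readLiteral(byte[] bytesOnDisk, Charset assumed) {
        return new String(bytesOnDisk, assumed);
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 editor writes for a two-character Chinese literal.
        byte[] onDisk = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);

        String right = readLiteral(onDisk, StandardCharsets.UTF_8);
        String wrong = readLiteral(onDisk, StandardCharsets.ISO_8859_1);

        System.out.println("read as UTF-8:      " + right.length() + " characters");
        System.out.println("read as ISO-8859-1: " + wrong.length() + " characters");
    }
}
```

Read with the right charset, the six bytes give back the two original characters. Read with the wrong one, they become six unrelated characters, and no response header can repair that afterwards. If I am not mistaken, the 浣 mentioned in the first post is what the UTF-8 bytes of 你 tend to look like when read as GBK, which would be consistent with this, but that is only a guess.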
nguyenq87a at 2007-7-9 17:15:23
# 4

Some servlet containers (including Tomcat) have problems when transmitting UTF-8 encoded strings using the POST method. Tomcat specifically always uses ISO-8859-1, no matter what the setting on the JSP page is. Here's my solution for sending Greek characters back and forth in a JSP-Struts-Tomcat application:

JSP Page:

<%@ page language="java" contentType="text/html; charset=UTF-8"

pageEncoding="UTF-8"%>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Servlet:

Use the getBytes method to re-encode the incoming string to any other encoding. Example (the incoming string is encoded in ISO-8859-1, although you entered it in your own language on the JSP page):

String correctString = new String(incomingString.getBytes("ISO-8859-1"), "UTF-8");

What the above does is convert the incoming string to a byte array (reading the string as if it was encoded in ISO-8859-1) and then construct a new string from that byte array using UTF-8. You could even encode the string into any other encoding you like.

Hope this helps,

Angelos Anagnostopoulos

A.Anagnostopoulosa at 2007-7-9 17:15:23
# 5

Actually, what your code does is create a byte array in ISO-8859-1 and then create a new Java string from that byte array, specifying UTF-8 as the encoding of the byte array.

I have seen this kind of code a number of times, and the only reason it produces the right result in many cases is that the original characters were mangled on the way in to begin with. So in this case, UTF-8 is sent, interpreted as ISO-8859-1 by Java (because of a configuration problem, possibly with Tomcat as mentioned) and thus incorrectly converted to UTF-16, converted back to ISO-8859-1 from UTF-16, and then read as UTF-8 and converted back to UTF-16.
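
The chain of conversions can be reproduced outside any container. This is a small sketch with a hypothetical class name; the "mangled" string simulates what arrives when UTF-8 request bytes are decoded as ISO-8859-1. StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class reproducing the ISO-8859-1 / UTF-8 round trip.
final class RoundTripDemo {

    // The workaround under discussion: encode as ISO-8859-1, then decode those bytes as UTF-8.
    static String reinterpret(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4f60\u597d";

        // Simulates the container wrongly decoding UTF-8 request bytes as ISO-8859-1.
        String mangled = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        // The workaround undoes the earlier mistake...
        System.out.println("repairs mangled input: " + reinterpret(mangled).equals(original));

        // ...but corrupts a string that was decoded correctly in the first place.
        System.out.println("keeps correct input:   " + reinterpret(original).equals(original));
    }
}
```

The first check prints true and the second false: once the container decodes the request correctly, the same line of code turns good text into question marks. That is why fixing the configuration is safer than leaving the workaround in place.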

one_danea at 2007-7-9 17:15:23
# 6

I just tested a JSP using Tomcat, and I don't see any issues with getting UTF-8 correctly returned from the POST method (while the GET method will not work).

I have UTF-8 declared in the HTML file, and the following in my JSP:

<%

request.setCharacterEncoding("UTF-8");

%>

<%@page pageEncoding="UTF-8"%>

<html>

<head>

<META http-equiv="Content-Type" content="text/html;charset=UTF-8">

<%@page contentType="text/html;charset=UTF-8"%>

one_danea at 2007-7-9 17:15:23
# 7

I tried

<%@ page language="java" contentType="text/html; charset=UTF-8"

pageEncoding="UTF-8"%>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

in my JSP. In Tomcat 4.1.0 it correctly displays the Chinese UTF-8 text, but when I try it on Tomcat 4.1.31 and Tomcat 5.x.x it does not display ...

Maybe it is a JSP compiler problem.

yamya at 2007-7-9 17:15:23
# 8

Hi,

I have tried all the above methods to make my Tomcat show Chinese characters, even trying Big5, but nothing seems to work.

My setup:

Fedora Core 3

Tomcat 5.5.9

MySQL 4.1.18

I am trying to use this setup with Mondrian and a Chinese database.

I can see Chinese characters if Tomcat is on Windows and the database is on Linux, but when Tomcat and the database are both on Linux I can't see any Chinese.

tried

UTF-8

Big5

export JAVA_OPTS="-server -Xms256m -Xmx512m -Dfile.encoding=UTF-8"

export JAVA_OPTS="-server -Xms256m -Xmx512m -Dfile.encoding=Big5"

Using

URIEncoding="UTF-8" useBodyEncodingForURI="true"

in server.xml

Adding

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

(also with Big5)

Even changing the header of all pages to the above statement.

I have tried everything, but I don't know what to do or where I am going wrong. All I can see is sometimes ?, sometimes square boxes, etc.

Please, somebody help me. I have been going crazy for the past 8 days trying to make this work.

Thanks in advance

solversaa at 2007-7-9 17:15:23
# 9
You will have to get the Unicode equivalent of the Chinese characters using the native2ascii utility, and use the Unicode values to display the Chinese characters.
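
For anyone unfamiliar with that kind of conversion, here is a minimal sketch of the idea: every non-ASCII character is replaced by a \uXXXX escape, so the text survives whatever encoding the file is later read with. The class and method names are hypothetical, and this only approximates what the real native2ascii tool does.

```java
// Hypothetical demo class approximating a native-to-ASCII conversion.
final class UnicodeEscapeDemo {

    // Replaces each non-ASCII UTF-16 code unit with a backslash, the letter u and four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c); // plain ASCII is copied through unchanged
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal below is itself written with escapes, so this file stays ASCII-only.
        System.out.println(escapeNonAscii("this is a test \u4f60\u597d"));
    }
}
```

The printed line contains only ASCII, with the two Chinese characters written out as escapes that can be pasted into a .java or .properties file.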
nknemaa at 2007-7-9 17:15:23
# 10
This problem has nothing to do with using native2ascii or not. UTF-8 encoded characters can display just fine in web pages using JSPs. This post does not mention using resource bundles anywhere, so bringing up native2ascii will only confuse the issue.
one_danea at 2007-7-9 17:15:23
# 11
See if this info helps you: http://vietunicode.sourceforge.net/howto/java/
nguyenq87a at 2007-7-9 17:15:23
# 12

Aral,

I have the same problem you have.

Pages are in UTF-8. Text on pages comes from different sources:

- some comes from database

- some comes from header/footer files saved in UTF-8 format

- some of the text is hard coded in the source files

Page headers and character set are correctly set to UTF-8 (using response.setContentType and HTML meta tags) and I see that browsers understand that the page is UTF-8.

As a result, the UTF-8 data coming from the database and header/footer files is shown correctly, but the hard-coded text is shown as (?U?匲?.

I have set the UTF-8 character set in each source file's properties in NetBeans, but this does not fix the problem. I think javac does not treat the source files as UTF-8, and this is why only the text in the source files gets messed up.

Have you found any solution to your problem?

Regards,

Mac

sarmadysa at 2007-7-9 17:15:23
# 13

Hello,

Finally, adding the settings below to web.xml helped me out of this.

I hope it can help you too.

<locale-encoding-mapping-list>

<locale-encoding-mapping>

<locale>fa</locale>

<encoding>UTF-8</encoding>

</locale-encoding-mapping>

</locale-encoding-mapping-list>

You will need to change the locale to Chinese. (In my case it was Persian.)

Regards,

Mac

sarmadysa at 2007-7-9 17:15:23
# 14

Hello

Did you manage to solve your problem?

I have the exact same problem and I was trying to figure out what is happening for the last 9 hours!

It's because there is logging happening behind the scenes which dumps all parameters. The JspServlet class dumps the parameters and of course does not set any encoding. If request.getParameter has already been called, then request.setCharacterEncoding("utf-8") no longer has any effect.

and this is my log:

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:247)- JspEngine --> /bar.jsp

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:248)- ServletPath: /bar.jsp

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:249)- PathInfo: null

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:250)- RealPath: d:\jakarta\tomcat\webapps\foo\bar.jsp

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:251)-RequestURI: /bar.jsp

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:252)- QueryString: null

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:253)-Request Params:

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:257)- f_submit = true

2006-04-03 17:51:36,421 DEBUG [ajp-8009-1] servlet.JspServlet (JspServlet.java:257)- foo = I盜睮?br>

not sure what to do now.

cherouvima at 2007-7-9 17:15:23
# 15

Hi,

I am facing the same problem.

I want to display some Greek characters.

I have written the following in web.xml:

<locale-encoding-mapping-list>

<locale-encoding-mapping>

<locale>el</locale>

<encoding>UTF-8</encoding>

</locale-encoding-mapping>

</locale-encoding-mapping-list>

But it shows some junk data when I display it through the servlet.

Please help ASAP.

Thanks,

Neelam.

neelamthitea at 2007-7-20 4:25:20
# 16
Hey, I think the problem can be solved if you set LANG="en_US" or set LANG=en_US:ISO8859 for the server the servlets are served from. If you have any start scripts for the application, please make sure this is set up correctly there.
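
To check what the JVM actually picked up from the environment, a tiny diagnostic like the following may help. It is a sketch with a hypothetical class name; run it with the same user, start script and JAVA_OPTS as the server, since on the JDKs of this era the default charset normally follows the locale settings or an explicit -Dfile.encoding.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class; run it in the same environment as the servlet container.
final class DefaultCharsetCheck {

    // Collects the settings that usually decide the JVM's default text encoding.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding")
                + ", LANG = " + System.getenv("LANG");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the Windows and Linux machines report different default charsets, any code that converts between bytes and strings without naming a charset will behave differently on the two, which could explain a setup that works on one and not on the other.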
a14a at 2007-7-20 4:25:20