A little help with form POST data not arriving in UTF-8 (JSP/servlet)

Hello,

I am trying to update a MySQL database record containing UTF-8 characters from my JSP application.

1) I have MySQL correctly configured to handle UTF8 and have tested insert/update/select with UTF8 characters

2) I have an "editRecord.jsp" page. At the top of the page, I specify:

<% request.setCharacterEncoding("UTF-8"); %>

3) That page contains an input form, declared as follows:

<form action="<c:url value="/updateRecord.jsp"/>"
      name="updatetForm" method="post"
      ACCEPT-CHARSET="UTF-8"
      enctype="multipart/form-data">

4) I have a servlet filter that sets the character encoding on every HttpServletRequest in doFilter:

doFilter(...){
    ...
    request.setCharacterEncoding("UTF-8");
    chain.doFilter(...)
}

5) In updateRecord.jsp, I fill a JavaBean with the form data. Here's an example of the form input:

name = Company

Comments=Here's some unicode text: "يني إن بلاده م

6) When I set a breakpoint and inspect the contents of the UpdateBean after posting the form and running the request through the filter, I see:

name = Company

Comments=Here's some unicode text: „禺丕乇...."

What could I be missing that would force UTF-8 encoding of these values?

[1405 byte] By [matlasa] at [2007-10-2 11:40:41]
# 1
I should have mentioned that I am processing my multipart form data with the Apache Commons FileUpload classes, not JSTL tags.
matlasa at 2007-7-13 5:32:30 > top of Java-index,Desktop,I18N...
# 2
Going further, in my FileUpload bean I even specify:

FileItemFactory factory = new DiskFileItemFactory();
ServletFileUpload upload = new ServletFileUpload(factory);
upload.setHeaderEncoding("UTF-8");
......
matlasa at 2007-7-13 5:32:30
# 3

I found a fix for this.

The following thread has good information that solved my problem of processing multipart form data containing UTF-8 (Unicode) characters in Commons FileUpload:

http://www.theserverside.com/discussions/thread.tss?thread_id=28944

Here's a snippet of my code that processes the UTF-8 form data using Commons FileUpload:

http://java.pastebin.ca/40594

The most important lines are 9 and 26:

upload.setHeaderEncoding("UTF-8");

...

String value = new String(item.getString().getBytes("ISO-8859-1"), "UTF-8"); ...

matlasa at 2007-7-13 5:32:30
# 4

If you have to use the following code to fix the encoding:

String value = new String(item.getString().getBytes("ISO-8859-1"), "UTF-8"); ...

then you are just putting a band-aid on something that is broken somewhere else. This code encodes the string into a byte array as ISO-8859-1 and then builds a new string from that array on the assumption that the bytes are UTF-8. That only works if the string you start out with is already mangled.

In many cases such a fix will actually "work" (in the sense that you end up with something correctly encoded), but it will start corrupting your data as soon as the source of the original corruption is removed.
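
To make the point concrete, here is a minimal stand-alone sketch (not the original poster's code). The class and method names are made up for illustration, and the "mangling" step only assumes that some component upstream decoded UTF-8 bytes as ISO-8859-1, which is the situation the workaround presupposes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class EncodingRoundTrip {

    // Simulates an upstream component that decodes UTF-8 bytes
    // with the wrong charset (ISO-8859-1), producing a mangled string.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround from post #3: re-encode as ISO-8859-1 to get the
    // raw bytes back, then decode those bytes as UTF-8.
    static String bandAid(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Arabic sample text, written with escapes so the source file
        // encoding does not matter.
        String original = "\u064A\u0646\u064A";

        // Case 1: the input really was mangled, so the workaround repairs it.
        String mangled = misdecode(original);
        System.out.println(bandAid(mangled).equals(original));  // prints true

        // Case 2: the upstream bug has been fixed and the input is already
        // correct. Arabic cannot be encoded in ISO-8859-1, so the characters
        // are replaced and the text is lost.
        System.out.println(bandAid(original).equals(original)); // prints false
    }
}
```

Case 2 is the risk described above: once the string arrives correctly decoded, the same line destroys it. Commons FileUpload's FileItem also has a getString(String encoding) overload, so asking for item.getString("UTF-8") directly may be the cleaner route, though I have not checked that against this particular setup.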

one_danea at 2007-7-13 5:32:30