Some I18n questions

Hi,

I need some answers to the problems I'm currently facing. Thanks in advance for any help...

The environment :

Database :

Oracle 9i - using SELECT * FROM V$NLS_PARAMETERS;

NLS_LANGUAGE = AMERICAN

NLS_DATE_LANGUAGE=AMERICAN

NLS_CHARACTERSET = UTF8

NLS_NCHAR_CHARACTERSET=AL16UTF16

All Japanese/Chinese/Korean characters are stored in NVARCHAR2 columns in the database.

Webserver:

1. Tomcat 5.5.

2. Set in the Java options: -Dfile.encoding=UTF-8

3. All JSP pages contain

<%@ page language="java" pageEncoding="UTF-8" contentType="text/html;charset=utf-8" %>

and

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

4. Using Struts with an Action superclass that is inherited by all action classes. The superclass contains the following code:

response.setContentType( "text/html; charset=UTF-8" );

response.setCharacterEncoding("UTF-8");

request.setCharacterEncoding("UTF-8");

5. Using ResultSet.getString() to read the NVARCHAR2 fields.
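
To make item 2 concrete, here is a minimal sketch for checking what the JVM actually reports as its default charset, and how that default differs from naming a charset explicitly. The class and method names (DefaultCharsetCheck, utf8Bytes, hex) are made up for illustration and are not part of the application. String.getBytes() with no argument follows the default charset, while getBytes(StandardCharsets.UTF_8) does not depend on -Dfile.encoding at all. Newer JDKs may default to UTF-8 regardless of the platform, so the first two lines of output can vary by Java version.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DefaultCharsetCheck {

    // Encodes with an explicitly named charset; independent of file.encoding.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String sample = "\u65E5"; // U+65E5, a CJK ideograph
        // Depends on the default charset, so it can differ between machines.
        System.out.println("getBytes()      = " + hex(sample.getBytes()));
        // Always the UTF-8 encoding of the sample.
        System.out.println("getBytes(UTF-8) = " + hex(utf8Bytes(sample)));
    }
}
```

Running this once with and once without -Dfile.encoding=UTF-8 shows whether any code path that relies on the default charset would behave differently between the two setups.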

The Questions:

1. In the development environment (Windows XP), I am not able to display the desired characters if I start Tomcat with the Java option -Dfile.encoding=UTF-8, but I am able to display them if I start Tomcat WITHOUT that option. I observed that when -Dfile.encoding is not used, the system property file.encoding has the value "CP1250", and when the option is used, it has the value "UTF-8".

In the production environment (Linux RedHat AS3), I am not able to display the desired characters. The System property file.encoding has a value of "UTF-8".

How does -Dfile.encoding=UTF-8 affect the display of the characters in my Windows XP development environment?

What can I do to display the correct characters if I still want file.encoding set to UTF-8? This is the same file.encoding setting as in the production environment (please correct me if I'm wrong).

2. I also did some tests on the data. Using the ASCIISTR function to retrieve an NVARCHAR2 field from the table, I got some \AAAA values. Using the AAAA values, I did a lookup in UNIHAN (http://unicode.org/charts/unihan.html). The lookup returns the same funny characters I see in the SQuirreL SQL Client. I also observed that the same funny characters are displayed on the JSP pages. Do these observations imply that the data stored in the database is not correct?
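
The ASCIISTR check in question 2 can also be done in code. Below is a minimal sketch that turns \XXXX escapes back into characters and lists their code points. The class name AsciiStrDecoder and its methods are hypothetical, and the sample values are made up rather than taken from my table. It assumes well-formed input where every backslash is followed by exactly four hex digits. If the code points fall in the CJK range (for example U+65E5), the stored text is plausibly correct. If they are Latin-range values such as U+00E6 U+2014 U+00A5, the stored text itself is probably already garbled.

```java
// Hypothetical helper class, for illustration only.
class AsciiStrDecoder {

    // Replaces each \XXXX escape with the UTF-16 code unit it names.
    // Assumes well-formed input: a backslash is always followed by 4 hex digits.
    static String decode(String asciiStr) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < asciiStr.length()) {
            char c = asciiStr.charAt(i);
            if (c == '\\' && i + 5 <= asciiStr.length()) {
                String hex = asciiStr.substring(i + 1, i + 5);
                out.append((char) Integer.parseInt(hex, 16));
                i += 5;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Lists the code points of a string as "U+XXXX" separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up samples: the first is what intact CJK data could look like,
        // the second is the same character after its UTF-8 bytes were read as CP1252.
        String intact = "\\65E5\\672C";
        String garbled = "\\00E6\\2014\\00A5";
        System.out.println(describe(decode(intact)));
        System.out.println(describe(decode(garbled)));
    }
}
```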

Regards,

[2574 byte] By [lohtwa] at [2007-11-27 8:22:10]
# 1

Hi,

After one week, I realized that the content in the database could be in "CP1252".

Using new String(xxx.getBytes("CP1252"), "UTF-8"), I am able to retrieve some decent characters to be displayed on the web. The problem I'm facing now is that some characters are displayed as "?", while others are displayed correctly.

Can anyone explain what is happening? Could the conversion between CP1252 and UTF-8 cause data loss?
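
One possible explanation, sketched below under the assumption that the UTF-8 bytes were at some point decoded as CP1252: in Java's windows-1252 charset the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no character assigned, so any UTF-8 sequence containing one of them cannot survive the round trip. The class and method names (Cp1252RoundTrip, misdecode, repair) are made up for illustration. Whether the database or driver lost the bytes in exactly this way is not something the sketch can confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Cp1252RoundTrip {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the assumed damage: UTF-8 bytes wrongly decoded as CP1252.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // The repair from the post: re-encode as CP1252, then decode as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    // True if the text comes back unchanged after damage plus repair.
    static boolean survives(String original) {
        return repair(misdecode(original)).equals(original);
    }

    public static void main(String[] args) {
        // U+65E5 is E6 97 A5 in UTF-8; all three bytes are assigned in CP1252.
        System.out.println("U+65E5 survives: " + survives("\u65E5"));
        // U+3042 is E3 81 82 in UTF-8; 0x81 is unassigned in CP1252.
        System.out.println("U+3042 survives: " + survives("\u3042"));
    }
}
```

If the pattern of failing characters in the real data matches the characters whose UTF-8 encoding contains one of those five bytes, the information was most likely lost before the repair step and cannot be recovered by another conversion.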

Thanks in advance...

lohtwa at 2007-7-12 20:10:48