Problem While Displaying Chinese char

Hi

I have some Chinese characters that I am storing in a database.

My approach is as follows:

Step 1:

I convert the file containing the Chinese characters into a file of Unicode escapes using the native2ascii tool, as shown below:

native2ascii Chinese_char.txt UniChinese_char.txt
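
For reference, native2ascii rewrites every non-ASCII character as a literal \uXXXX escape, so the converted file contains nothing but ASCII text. The sketch below imitates that step in plain Java. The names EscapeSketch and toAsciiEscapes are made up for illustration, and this is only an approximation of the tool's behaviour, not its actual implementation.

```java
public class EscapeSketch {
    // Replace each char above 0x7F with a six-character ASCII escape
    // (backslash, 'u', four lowercase hex digits); copy ASCII unchanged.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = toAsciiEscapes("\u6c49\u5b57");
        System.out.println(escaped);          // the escapes as plain ASCII text
        System.out.println(escaped.length()); // 12 chars for 2 Chinese characters
    }
}
```

The point to keep in mind is that the output is ordinary text: two Chinese characters become twelve ASCII characters, and nothing turns them back automatically when the file is read later.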

Step 2:

Now I copy the Unicode escapes from the converted UniChinese_char.txt and paste them into my program, as shown below.

String uniStr = "\u6c49\u5b57/\u6f22\u5b57\u4e0d\u6b63\u786e"; // string copied from the file converted by native2ascii
String fileStr = "";
FileInputStream f = new FileInputStream("c:/UniChinese_char.txt"); // read the file converted by native2ascii
InputStreamReader sr = new InputStreamReader(f, "GBK");
BufferedReader bf = new BufferedReader(sr);
System.out.println(sr.getEncoding());
String data = "";
while ((data = bf.readLine()) != null)
{
    fileStr = data; // store the line read from the converted file in fileStr
}
byte[] utf8Bytes = fileStr.getBytes("GBK"); // convert chars into GBK
String result = new String(utf8Bytes, "UTF8"); // convert chars into UTF8
System.out.print(uniStr + " " + result);

It displays the uniStr string properly, i.e. as Chinese characters.

But the problem arises when I read the UniChinese_char.txt file with FileInputStream, as shown in the program above.

It displays the Unicode escape sequences when it should display the Chinese characters they represent.

That is, when I print the result variable, which holds the text read from the file, it shows the \uXXXX escapes rather than the Chinese characters.

My file.encoding is GBK and my user.language is zh.

I have already set the OS font to Chinese.

I also tried converting the Chinese file with native2ascii using its utf8 encoding option, but that did not help.

Thanks for your help.

[1880 byte] By [gowher_naika] at [2007-10-2 7:56:35]
# 1

I'll just comment on one place that seems clearly wrong:

> byte[] utf8Bytes = fileStr.getBytes("GBK"); //convert chars into GBK
> String result=new String(utf8Bytes,"UTF8"); //convert chars into UTF8
>
> It displays Unicode string whenever it should display Chinese format of Unicode string.
>
> i.e. when I print result variable which contains Unicode chars stored by string in Unicode format
>
> it shows Unicode chars not Chinese format of Unicode string.

What those two lines of code do is create a byte array in GBK encoding and then create a new string from that byte array while stating that the bytes are in UTF8.

Your comment "//convert chars into UTF8" seems to indicate that you think the code in that line performs a conversion INTO UTF8. It does no such thing. A string in Java is always Unicode, and the line in question takes the bytes and converts them FROM UTF8 into Unicode. Since the byte array is presumably in GBK (if the characters were correctly encoded to begin with), the result would be wrong.

I don't know what you mean by "it shows Unicode chars not Chinese format of Unicode string", but I would not expect the characters to display correctly.
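
To make the mismatch concrete, here is a minimal sketch, assuming a JDK that ships the GBK charset. The class and method names (DecodeMismatch, encodeGbkThenDecode) are invented for this illustration and do not come from the original program.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    static final Charset GBK = Charset.forName("GBK");

    // Encode the text as GBK bytes, then decode those bytes with the given charset.
    static String encodeGbkThenDecode(String text, Charset decodeWith) {
        byte[] gbkBytes = text.getBytes(GBK);
        return new String(gbkBytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u6c49\u5b57";
        String wrong = encodeGbkThenDecode(original, StandardCharsets.UTF_8);
        String right = encodeGbkThenDecode(original, GBK);
        System.out.println(original.equals(wrong)); // false: GBK bytes are not valid UTF-8 here
        System.out.println(original.equals(right)); // true: decoded with the charset used to encode
    }
}
```

One side effect worth noting: for pure ASCII input, such as the escape text that native2ascii produces, the GBK and UTF-8 byte sequences are identical, so the mismatch goes unnoticed and the text comes back unchanged.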

one_danea at 2007-7-16 21:46:46
# 2
Should have added: You seem to have already gotten your data into the fileStr variable, so why don't you just use that? Why try to create a new String (using the incorrect conversions)?
one_danea at 2007-7-16 21:46:46
# 3

hi

First of all, thanks for the response.

My aim is simple. When I print the uniStr variable, the program above displays the correct Chinese string, which is 汉字/漢字不正确. But when I print the fileStr variable, it displays \u6c49\u5b57 / \u6f22\u5b57 \u4e0d\u6b63\u786e

when the fileStr variable should also display 汉字/漢字不正确.

You may not be able to see the Chinese characters above correctly.
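
A likely explanation for the difference: the Java compiler translates \uXXXX escapes in source code, so uniStr already holds real Chinese characters, whereas the file holds the literal six-character sequences and nothing decodes them at run time. The sketch below shows one way to decode them by hand. UnescapeSketch and fromAsciiEscapes are hypothetical names, and the code only handles well-formed four-digit escapes. Running native2ascii with its -reverse option on the file is the tool-level equivalent.

```java
public class UnescapeSketch {
    // Turn each literal escape (backslash, 'u', four hex digits) into the
    // single char it names; copy everything else unchanged.
    static String fromAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 5 < text.length() && text.charAt(i + 1) == 'u') {
                int code = 0;
                boolean valid = true;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(text.charAt(k), 16);
                    if (digit < 0) {
                        valid = false;
                        break;
                    }
                    code = code * 16 + digit;
                }
                if (valid) {
                    out.append((char) code);
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fromFile = "\\u6c49\\u5b57"; // what readLine() returns: 12 ASCII chars
        String decoded = fromAsciiEscapes(fromFile);
        System.out.println(decoded.equals("\u6c49\u5b57")); // true
    }
}
```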

thanks

gowher_naika at 2007-7-16 21:46:46
# 4
So my aim is that both variables (uniStr and result) should display the same Chinese string, which is 汉字/漢字不正确. Thanks
gowher_naika at 2007-7-16 21:46:46
# 5

It would seem that your characters get garbled in the process of getting from the file to your output. Have you tried reading the unconverted file (Chinese_char.txt) instead of the file that you ran native2ascii against? Since you specify the GBK encoding in your InputStreamReader, it would seem that the unconverted file is the one you should read.
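
A minimal sketch of that suggestion, assuming the original file really is GBK-encoded. The class ReadGbkFile and method readAll are illustrative names, and the temp file below merely stands in for Chinese_char.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadGbkFile {
    // Read the whole file, decoding its bytes with the named charset
    // and joining the lines with '\n'.
    static String readAll(Path file, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), Charset.forName(charsetName)))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    text.append('\n');
                }
                text.append(line);
                first = false;
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for Chinese_char.txt: a temp file written as GBK bytes.
        Path file = Files.createTempFile("Chinese_char", ".txt");
        Files.write(file, "\u6c49\u5b57".getBytes(Charset.forName("GBK")));
        String fileStr = readAll(file, "GBK");
        System.out.println(fileStr.equals("\u6c49\u5b57")); // true
        Files.delete(file);
    }
}
```

Unlike the loop in the original program, which keeps only the last line it reads, this version accumulates every line.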

one_danea at 2007-7-16 21:46:46
# 6

Yes, I stored the characters from the original Chinese file (Chinese_char.txt) in an Oracle database, and the program stores the Chinese characters perfectly.

But the problem occurs when I try to read the same characters back from the database using a ResultSet: it shows junk characters.

The code I used to store the original characters is:

private void createStatementFromTxtFile(String query) throws Exception
{
    pstmt = (oracle.jdbc.OraclePreparedStatement) conn.prepareStatement(query);
    pstmt.setFormOfUse(1, oracle.jdbc.OraclePreparedStatement.FORM_NCHAR);
    String fileData = readFromFile(txtFile); // call to readFromFile method
    pstmt.setString(1, new String(fileData.getBytes(), "ISO-8859-1"));
    pstmt.execute();
    pstmt.close();
    System.out.println("String " + fileData + " Stored In DataBase Successfully.");
}

private String readFromFile(String txtFile) throws Exception
{
    InputStream str = new FileInputStream(txtFile);
    InputStreamReader in = new InputStreamReader(str); // uses the platform default encoding
    int c;
    String data = "";
    while ((c = in.read()) != -1)
    {
        data = data + (char) c;
    }
    in.close();
    str.close();
    return data;
}

The code I used to display the Chinese characters from the database is:

private void displayTable(String query) throws Exception
{
    try
    {
        pstmt = (oracle.jdbc.OraclePreparedStatement) conn.prepareStatement(query);
        ResultSet rset = pstmt.executeQuery();
        String name = "";
        while (rset.next())
        {
            name = rset.getString(1);
            System.out.println("The Table Data Is :" + name); // here name contains junk chars
        }
    }
    catch (SQLException sqe)
    {
        sqe.printStackTrace();
    }
}

Thanks

gowher_naika at 2007-7-16 21:46:46
# 7

I see that you use ISO-8859-1 when you store your Chinese characters in the DB, and that would seem to be an issue (although I have seen statements that this kind of workaround is actually recommended by Oracle). Since you say the characters are stored correctly, it seems to work.
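
To illustrate why that step looks suspicious, here is a sketch of what re-decoding GBK bytes as ISO-8859-1 does to a string, and how the step can be undone. It assumes the platform default encoding is GBK, as stated earlier in the thread. The names Latin1RoundTrip, disguise and restore are made up for this example, and whether Oracle and the JDBC driver hand back exactly the disguised string is an assumption that has not been verified here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    static final Charset GBK = Charset.forName("GBK");

    // Mirrors the insert code when the default encoding is GBK:
    // each GBK byte becomes one char in the range U+0000..U+00FF.
    static String disguise(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // The inverse step: recover the original bytes, then decode them as GBK.
    static String restore(String disguised) {
        return new String(disguised.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u6c49\u5b57";
        String disguised = disguise(original);
        System.out.println(disguised.length());                  // 4: one char per GBK byte
        System.out.println(original.equals(disguised));          // false: this is the junk text
        System.out.println(original.equals(restore(disguised))); // true
    }
}
```

If the value returned by rset.getString(1) is such a disguised string, applying the inverse step might recover the Chinese text. Passing fileData to setString directly, without the ISO-8859-1 step, may be the cleaner fix, but that depends on the database character set.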

I don't know specifics about Oracle, so I can't really speak to the code you use to fetch the results, but maybe you need to specify an encoding in the connect statement?

But as I said, I have no experience with Oracle.

one_danea at 2007-7-16 21:46:46