Failing to read Polish characters from a Unicode CSV file

I am having a problem reading Polish characters from a CSV file. The file contains comma-delimited Unicode text and displays the data correctly in Notepad.

However, the LATIN CAPITAL LETTER L WITH STROKE and LATIN CAPITAL LETTER Z WITH ACUTE characters (these special characters do not display in this forum) are read as "?", which comes out as 3F in Codepage 1250. The output should be 8F for LATIN CAPITAL LETTER Z WITH ACUTE and A3 for LATIN CAPITAL LETTER L WITH STROKE.

Does anyone have experience reading data in this format? Any help would be greatly appreciated.

[617 byte] By [petercknighta] at [2007-10-1 11:37:05]
# 1

Most likely you're reading the file using the wrong encoding. Find out what encoding the file is in, then do this:

Reader r = new InputStreamReader(new FileInputStream(filename), encodingname);
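As a fuller illustration of the suggestion above, here is a minimal sketch that reads the first line of a file with an explicitly named encoding. The class name ReadWithEncoding, the helper readFirstLine, and the temporary sample file are hypothetical, introduced only so the example runs on its own; the real file's encoding still has to be determined separately.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class ReadWithEncoding {

    // Hypothetical helper: decodes the file's bytes with the named charset
    // instead of the platform default, and returns the first line.
    static String readFirstLine(String filename, String encodingName) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(filename), encodingName))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-16 sample file so the sketch is self-contained.
        // \u0141 is L WITH STROKE, \u0179 is Z WITH ACUTE.
        File sample = File.createTempFile("jobcode", ".csv");
        sample.deleteOnExit();
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(sample), "UTF-16")) {
            writer.write("\u0141,\u0179\n");
        }

        String line = readFirstLine(sample.getPath(), "UTF-16");
        // Compare against the expected characters rather than printing them,
        // so the check does not depend on what the console can display.
        System.out.println(line.equals("\u0141,\u0179")); // prints "true"
    }
}
```

If the encoding name passed in does not match the file's actual encoding, the characters are decoded incorrectly without any exception being thrown, which is why the encoding has to be known rather than guessed.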

The other possibility is that your "display" is not working right. I'm going to have to guess on that one too, since you didn't say anything about it. If it's a GUI display and you're getting boxes then your font can't render the characters. If it's the console then forget it -- unless you can actually display the file accurately on the console outside Java.
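One way to separate a decoding problem from a display problem is to print the numeric value of each character instead of the character itself, since that output is plain ASCII and does not depend on the font or console. The sketch below is only an illustration; the class name CharDump and the helper describeChars are made up for this example.

```java
public class CharDump {

    // Hypothetical helper: formats each UTF-16 code unit of the string as U+XXXX.
    static String describeChars(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // L WITH STROKE followed by Z WITH ACUTE, written as escapes so the
        // source file's own encoding does not matter.
        System.out.println(describeChars("\u0141\u0179")); // prints "U+0141 U+0179"
    }
}
```

If the line read from the file shows U+0141 and U+0179 here, the file was decoded correctly and any remaining problem lies in how the text is displayed or re-encoded afterwards.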

DrClapa at 2007-7-10 13:13:59
# 2

Hi, thanks for your timely response. I believe I'm doing as you suggest but still having no joy; this is my implementation:

// read in using Unicode charset
String filename = "jobcode.csv";
FileInputStream fis = new FileInputStream(filename);
BufferedReader br = new BufferedReader(new InputStreamReader(fis, "UTF-16"));
String myLine = br.readLine();

// write out the bytes received
byte[] bs = myLine.getBytes();
StringBuffer s = new StringBuffer();
for (int i = 0; i < bs.length; i++) {
    s.append(Integer.toHexString((int) bs[i]) + ",");
}

System.out.println("string value: " + myLine);
System.out.println("hex value: " + s.toString());

Best regards

petercknighta at 2007-7-10 13:13:59
# 3

According to my copy of the Unicode characters, LATIN CAPITAL LETTER Z WITH ACUTE is U+0179. If you encode that using Codepage 1250, it's supposed to be converted to 8F, I suppose? You may or may not be using Codepage 1250: this line of code

byte[] bs = myLine.getBytes();

encodes it using your system's default charset. You can pass a charset name to the getBytes() method if you want to specify the charset it should use. So according to this document:

http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html

you could try

byte[] bs = myLine.getBytes("Cp1250");
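To show the effect of naming the charset, here is a small sketch that encodes a string with a given charset and prints the bytes as hex. The class name Cp1250Hex and the helper toHex are hypothetical. It assumes the Java runtime provides the Cp1250 charset. The 0xFF mask is a detail not raised in the thread: Java bytes are signed, so without it a value such as 8F would print sign-extended as ffffff8f.

```java
import java.io.UnsupportedEncodingException;

public class Cp1250Hex {

    // Hypothetical helper: encodes the text with the named charset and
    // returns the resulting bytes as comma-separated two-digit hex values.
    static String toHex(String text, String charsetName) throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes(charsetName);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                hex.append(',');
            }
            // Mask to 0..255 because Java bytes are signed.
            hex.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Z WITH ACUTE (U+0179) followed by L WITH STROKE (U+0141).
        System.out.println(toHex("\u0179\u0141", "Cp1250")); // prints "8F,A3"

        // A charset that cannot represent the character substitutes '?',
        // which is the 3F byte described in the question.
        System.out.println(toHex("\u0179", "US-ASCII")); // prints "3F"
    }
}
```

The second call illustrates why 3F can appear: when the charset used by getBytes() has no mapping for a character, a question mark is written in its place.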

DrClapa at 2007-7-10 13:13:59