MS-DOS Charset
Hi,
I am writing a Java application on JDK 1.4.2.
1. I need to get the default charset name. How can I do that on 1.4.2?
2. Another (non-Java) application uses the machine's default MS-DOS charset (for example, cp850 on a French machine) to encode a file, and then sends the file to the Java application.
I need to retrieve the machine's default MS-DOS charset at runtime so that I can decode the data in the file.
How can I do that in Java 1.4.2?
Thanks,
Eitan.
1. The default charset name can be obtained with System.getProperty("file.encoding")
2. I think there is an MS-DOS command for getting the current MS-DOS code page: chcp. So I guess you can obtain the code page number by calling Runtime.exec() with the command "cmd.exe /C chcp" and then parsing the process output.
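For point 1, here is a minimal sketch of two ways to read the default encoding that should both work on 1.4.2. The class and method names (DefaultEncoding, fromProperty, fromWriter) are made up for illustration. Note that the two values may be spelled differently, since getEncoding() can return a historical name such as "Cp1252".

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

// Hypothetical helper class, for illustration only.
public class DefaultEncoding {

    // Reads the system property mentioned above.
    static String fromProperty() {
        return System.getProperty("file.encoding");
    }

    // Asks a writer that was created without an explicit encoding
    // which encoding it is actually using.
    static String fromWriter() {
        OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream());
        return w.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding  = " + fromProperty());
        System.out.println("writer default = " + fromWriter());
    }
}
```

Either way, this is the default of the JVM (on Windows, normally the Windows code page), not the MS-DOS code page.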
Are you trying to get the charset name of the machine your Java program is running on, or the charset name that the other application used when it created the file?
For example, the file "FileFromNonJavaProgram.txt" was created on a machine with code page 850, while the Java program is running on a machine with code page 1252.
If you call getEncoding() on an InputStreamReader that was opened without an explicit encoding, I believe it will return code page 1252 ("Cp1252"). To decode the file properly, you would have to pass "Cp850" to the InputStreamReader that wraps the FileInputStream, as shown below.
InputStreamReader is =
    new InputStreamReader(
        new FileInputStream("FileFromNonJavaProgram.txt"), "Cp850");
The problem is knowing in advance what the proper encoding is. I am facing a similar problem, which I could solve because the files I am working with have specific combinations of file extensions and encodings (.pf = UTF16LE, .xat = Cp1252, etc.). I would love to determine the encoding dynamically, but I am not sure Java can do this.
Someone please prove me wrong.
One last note: Java only supports specific encodings. http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html lists the ones for version 1.4.
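If you want to check at runtime whether a given encoding name is available, something like the sketch below might help. It uses Charset.isSupported() from java.nio.charset, which exists since 1.4. The class and method names (EncodingCheck, canDecode) are made up for illustration, and whether "Cp850" is reported as supported depends on the JRE installation.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class EncodingCheck {

    // Returns true if this JVM knows the given encoding name.
    // Syntactically illegal names and null make isSupported() throw
    // an IllegalArgumentException (or a subclass), treated here as "no".
    static boolean canDecode(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 supported: " + canDecode("UTF-8"));
        System.out.println("Cp850 supported: " + canDecode("Cp850"));
    }
}
```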
You are right, Java does not have a charset detection API. ICU4J does: http://icu.sourceforge.net/userguide/charsetDetection.html
This will only be a best-effort identification, of course, with no guarantees. Small text size, ASCII mixed into multibyte text, etc. will confuse the algorithm.
Hi guys,
Thanks a lot for your replies.
But I would like to ask: can a Java application get the MS-DOS code page of the current machine? I have tried, and I got only the Windows code page.
Thanks,
Eitan.
Did you try my proposal (using chcp)?
Here is a quick and rough code sample:
Process p = Runtime.getRuntime().exec("cmd.exe /C chcp");
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String codePage = reader.readLine();
System.out.println(codePage);
I guess you can easily extract the code page number from it and build the encoding name.
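The extraction step could be sketched like this. It assumes the number is the last run of digits on the line (the text before it is localized, e.g. "Active code page: 850" on an English system) and that the Java name is simply "Cp" plus that number, which holds for Cp850 but not necessarily for every code page. The class and method names (CodePageName, toEncodingName) are made up for illustration.

```java
// Hypothetical helper class, for illustration only.
public class CodePageName {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Builds a name such as "Cp850" from a chcp output line.
    // Returns null if the line contains no digits.
    static String toEncodingName(String chcpLine) {
        if (chcpLine == null) {
            return null;
        }
        int end = chcpLine.length();
        // Skip anything after the number (whitespace, a trailing period).
        while (end > 0 && !isAsciiDigit(chcpLine.charAt(end - 1))) {
            end--;
        }
        int start = end;
        // Walk back over the digits themselves.
        while (start > 0 && isAsciiDigit(chcpLine.charAt(start - 1))) {
            start--;
        }
        if (start == end) {
            return null;
        }
        return "Cp" + chcpLine.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(toEncodingName("Active code page: 850"));
    }
}
```

You would pass the line read from the process to toEncodingName() and then hand the result to the InputStreamReader constructor. It is probably worth checking the resulting name with Charset.isSupported() first.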