Problem in choosing Encoding at runtime.

Hi

I have a problem detecting the encoding of a file's content at runtime.

To reproduce the problem, please follow these steps:

1. Create a text file in WordPad (Windows) containing just one or two lines,

for example:

JAVA PROBLEM

2. Save this file as test1.txt, selecting "Text Document" as the save type.

Then save the same data as test2.txt, this time selecting "Unicode Text Document" as the save type.

3. Now compile the following program:

import java.io.*;

public class ReadFile
{
    public static void main(String[] args) throws Exception
    {
        //InputStreamReader aInputStreamReader = new InputStreamReader(new FileInputStream("test.txt"),"UTF-16");

        // No charset is given here, so the platform default encoding is used.
        InputStreamReader aInputStreamReader = new InputStreamReader(new FileInputStream("test1.txt"));
        BufferedReader aReader = new BufferedReader(aInputStreamReader);

        // Print the file line by line.
        String aStr = "";
        while ((aStr = aReader.readLine()) != null)
            System.out.println(aStr);

        aReader.close();
    }
}

4. Run the program and look at the output. It will be:

JAVA PROBLEM

5. Now change the file name in the program to test2.txt, run it again, and look at the output. It will be something like:

JAVA PROBLEM

6. Now repeat the same steps using the commented-out InputStreamReader (the first line of the main method).

You will see that reading test1.txt now throws an exception, while test2.txt is printed properly:

JAVA PROBLEM

Similarly, if I use the UTF-8 encoding, the problem is reversed.

Is there any way to detect which encoding the content of a file was saved in, or to get the file type, so that I can decide at runtime which constructor to use for InputStreamReader?

Or is there some other encoding I can choose that works for both?

I have tried everything with BufferedReader and InputStream, but I get the same problem.

kapil

[2024 byte] By [kapil_ji] at [2007-9-26 3:12:42]
# 1

If I remember correctly (probably not :), when you do not pass a charset, InputStreamReader uses the platform's default encoding, which on Windows is normally a single-byte code page rather than UTF-16; hence test2.txt is not read correctly with your code. If you use the first line, you will get an exception for test1.txt because its encoding is not UTF-16.
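To make the mismatch concrete, here is a small sketch that decodes the same bytes with two different charsets. It assumes that WordPad's "Unicode Text Document" writes UTF-16 little-endian with a leading byte order mark (FF FE), which is the usual behaviour but is not confirmed in this thread. The class name DefaultCharsetDemo and the helper decode are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // "JAVA" as a UTF-16 little-endian file would typically store it (assumption):
    // the byte order mark FF FE, then two bytes per character.
    static final byte[] UTF16_BYTES = {
        (byte) 0xFF, (byte) 0xFE, 'J', 0, 'A', 0, 'V', 0, 'A', 0
    };

    // Hypothetical helper: decode raw bytes with an explicit charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // What InputStreamReader falls back to when no charset is passed.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // The UTF-16 decoder consumes the byte order mark and yields 4 characters.
        String asUtf16 = decode(UTF16_BYTES, StandardCharsets.UTF_16);
        System.out.println("As UTF-16: " + asUtf16 + " (" + asUtf16.length() + " chars)");

        // A single-byte charset turns every byte into its own character,
        // so the mark and the zero bytes show up as junk between the letters.
        String asLatin1 = decode(UTF16_BYTES, StandardCharsets.ISO_8859_1);
        System.out.println("As ISO-8859-1: " + asLatin1.length() + " chars");
    }
}
```

The same bytes give 4 characters in one case and 10 in the other, which is why a reader opened with the wrong charset prints extra characters around the text.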

As for your second question, I am also trying to figure that one out :) I will let you know when I solve it.

shadow0_0 at 2007-6-29 11:21:24
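One possible approach to the second question, offered only as a sketch: look at the first bytes of the file for a byte order mark and pick the charset from that. This only works when the file actually starts with a mark (the assumption here is that the "Unicode Text Document" file does and the plain "Text Document" file does not); a file without one cannot be identified this way, so a fallback charset has to be supplied. The names BomSniffer, open and firstLine are hypothetical, and UTF-32 marks are deliberately ignored.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Returns a Reader whose charset is chosen from the byte order mark, if any.
    // The fallback charset is used when no known mark is found.
    static Reader open(InputStream in, Charset fallback) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to 3 bytes; a single read call may return fewer.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;

        Charset charset = fallback;
        int bomLength = 0;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        } else if (b0 == 0xFE && b1 == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        }
        // Push back everything that was read beyond the mark itself.
        if (n > bomLength) {
            pin.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pin, charset);
    }

    // Hypothetical helper for the demo: first line of an in-memory "file".
    static String firstLine(byte[] data, Charset fallback) {
        try (BufferedReader reader =
                 new BufferedReader(open(new ByteArrayInputStream(data), fallback))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for test1.txt (no mark) and test2.txt (UTF-16LE with mark).
        byte[] plain = "JAVA PROBLEM".getBytes(StandardCharsets.ISO_8859_1);
        byte[] unicode = "\uFEFFJAVA PROBLEM".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(firstLine(plain, StandardCharsets.ISO_8859_1));
        System.out.println(firstLine(unicode, StandardCharsets.ISO_8859_1));
    }
}
```

For real files, the same method could be called as open(new FileInputStream("test2.txt"), Charset.defaultCharset()), so that one code path handles both test files. The fallback is still a guess for files without a mark.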