Read Arabic from a file
Hello everybody,
I am trying to read Arabic Unicode characters from a file into a String within my program.
I created the file using plain Windows Notepad.
The characters I am writing look perfectly Arabic (although I can't actually read them).
Now when I start my program and read the characters, it doesn't read Arabic characters but characters that are within the Unicode range 0000 - 00FF, which is Basic Latin + Latin-1 Supplement.
But I expect the characters to be somewhere around 0600 - 06FF.
I have the feeling it's related to a code page or encoding issue, but I don't really know a lot about that.
Can you somehow help me?
Do you need more information?
Can you tell me where I can get some useful information?
Thank you!
Stefan
> Now when I start my program and read the characters, it
> doesn't read Arabic characters but characters that are within
> the Unicode range 0000 - 00FF, which is Basic Latin + Latin-1
> Supplement.
Maybe you're reading it incorrectly?
> Can you somehow help me?
Yes.
> Do you need more information?
Yes. Like: what does the reading code look like?
OK, here is the code:
private static ArrayList<String> readLinesFromFile(File file){
    ArrayList<String> toReturn = new ArrayList<String>();
    try {
        Scanner scanner = new Scanner(file);
        while(scanner.hasNextLine()){
            String line = scanner.nextLine();
            toReturn.add(line);
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return toReturn;
}
public Scanner(InputStream source)
Constructs a new Scanner that produces values scanned from the specified input stream. Bytes from the stream are converted into characters using the underlying platform's default charset.
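The platform's default charset is not necessarily a Unicode encoding; on Windows it is typically a legacy code page such as windows-1252. The same applies to the Scanner(File) constructor used in your code. Maybe you want to use the constructor that lets you specify the encoding, such as Scanner(File source, String charsetName), and pass the encoding the file was actually saved in (Notepad lets you pick it in the Save As dialog).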
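As a rough sketch of that suggestion, the reading method could take the charset name as a parameter and hand it to the Scanner(File, String) constructor. The class name EncodedLineReader and the method name readLines are made up for illustration. "UTF-8" is only an assumption about how the file was saved; if Notepad's "Unicode" option was used, "UTF-16" would be the name to try. The sketch also drops a leading byte order mark (U+FEFF), which Notepad may write at the start of the file and which Java does not strip when decoding UTF-8.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original post.
class EncodedLineReader {

    // Reads all lines of the file, decoding its bytes with the named charset
    // instead of the platform default.
    static List<String> readLines(File file, String charsetName)
            throws FileNotFoundException {
        List<String> lines = new ArrayList<String>();
        // try-with-resources closes the Scanner (and the file) when done.
        try (Scanner scanner = new Scanner(file, charsetName)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // Assumption: the editor may have written a byte order mark;
                // remove it from the first line so it does not end up in the data.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }
}
```

It would be called as EncodedLineReader.readLines(file, "UTF-8"). If the charset name matches the encoding chosen when saving, the characters of an Arabic line should then fall in the 0600 - 06FF range.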