Problem reading Scandinavian characters (with InputStreamReader)
Hi,
My problem is as follows.
My application reads an .html file from a URL specified by the user, removes all the tags and displays the plain text in a JTextArea. Most of the time everything works out all right, and everybody's happy. However, if the .html page happens to contain characters such as ä (a with dots) or ö (o with dots), we run into trouble.
All the Scandinavian characters are displayed as the literal text &auml; or &ouml; instead of the actual characters. What might be the problem?
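To make the symptom concrete: &auml; and &ouml; are HTML character entity references, so one possible explanation is that the page itself contains them and the tag-stripping step passes them through unchanged. Below is a minimal sketch of decoding such references after the tags are removed. The class name EntityDecoder, the method decode and the small lookup table are hypothetical and only for illustration; a complete table would cover all named HTML entities.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityDecoder {
    // Illustrative subset only (an assumption); HTML defines many more named entities.
    private static final Map<String, Character> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("auml", '\u00E4');   // a with dots
        ENTITIES.put("ouml", '\u00F6');   // o with dots
        ENTITIES.put("aring", '\u00E5');  // a with ring
        ENTITIES.put("Auml", '\u00C4');
        ENTITIES.put("Ouml", '\u00D6');
        ENTITIES.put("Aring", '\u00C5');
        ENTITIES.put("amp", '&');
    }

    // Replaces known "&name;" references with their characters; anything else is copied as-is.
    public static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // Only look for a closing ';' when we are standing on an '&'.
            int semi = (c == '&') ? text.indexOf(';', i + 1) : -1;
            Character replacement =
                    (semi > i + 1) ? ENTITIES.get(text.substring(i + 1, semi)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = semi + 1;   // skip past the whole "&name;" reference
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("hyv&auml;&auml; y&ouml;t&auml;"));
    }
}
```

If this is indeed the cause, something like EntityDecoder.decode(text) could be applied to the parser's output before it is passed to the JTextArea.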
Here's the method I use to read the URL:
import java.io.*;
import java.net.*;

void ReadURL(String fName) {
    try {
        StringBuffer sBuffer = new StringBuffer();
        String line;

        // Open a connection to the user-supplied URL.
        URLConnection uConn = new URL(fName).openConnection();
        uConn.connect();

        // No charset is given here, so the platform default encoding is used.
        InputStreamReader in = new InputStreamReader(uConn.getInputStream());
        System.out.println(in.getEncoding());

        // Read the page line by line.
        BufferedReader buffer = new BufferedReader(in);
        while ((line = buffer.readLine()) != null)
            sBuffer.append(line).append("\n");
        buffer.close();

        // Strip the tags and show the plain text in the text area.
        String text = parser.parseString(sBuffer.toString());
        textScreen.setText(text);
    } catch (IOException e) {
        System.out.println("Error -- " + e.toString());
    }
}
--
I'm guessing the problem is with the InputStreamReader, as reading
normal text files with the FileReader works all right. However, I have
no clue how to fix the problem.
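One way to test the InputStreamReader guess would be to pass an explicit charset instead of relying on the platform default. The sketch below assumes the page is encoded as ISO-8859-1, which is only a guess. The class name CharsetRead and the helper readAll are hypothetical. Note that this only affects how raw bytes are decoded; it would not by itself turn the literal text &auml; into ä.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRead {
    // Reads the whole stream as text, decoding bytes with the given charset.
    static String readAll(InputStream stream, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 0xE4 is the single-byte ISO-8859-1 encoding of a with dots.
        byte[] latin1 = { (byte) 'p', (byte) 0xE4, (byte) 0xE4 };
        String text = readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("p\u00E4\u00E4\n"));
    }
}
```

In my method the stream would come from uConn.getInputStream() rather than a byte array.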
Any ideas?
Thank you for any help,
Ossi

