Content Retrieval and Display.

Hi,

I am reading an HTML file (the HTML source) and getting an InputStream.

I convert the bytes in the InputStream into a String and then parse it to extract information based on some match criteria.
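For reference, the bytes-to-String step can be done by draining the stream into a byte array and decoding it with an explicitly named charset instead of the platform default. This is a minimal sketch that assumes the page is UTF-8 (substitute whatever charset the page really uses); the class and method names are hypothetical. On its own it only controls how bytes map to characters; it does not rewrite sequences such as "=3D".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: reads every byte from the stream, then decodes
    // them with the given charset. The caller is responsible for closing the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the real HTML source (assumed UTF-8)
        byte[] page = "<a href=\"x.html\">link</a>".getBytes(StandardCharsets.UTF_8);
        String html = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(html); // prints <a href="x.html">link</a>
    }
}
```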

My problem is that the retrieved content is not being converted into a String properly.

Characters like "=" and '"' (double quotes) in the HTML file are not converted properly. They are displayed as "=3D" and "=22" in the retrieved String.
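These sequences look like quoted-printable encoding (RFC 2045), the MIME transfer encoding used in e-mail, in which "=" followed by two hexadecimal digits stands for the byte with that value. 0x3D is 61, the ASCII code for "=", and 0x22 is 34, the ASCII code for a double quote. That diagnosis is an inference from the symptoms; it would fit, for example, if the HTML came out of an e-mail message or a saved .eml/.mht file. A tiny check of the arithmetic, with a hypothetical class and method name:

```java
public class HexEscapeDemo {

    // Decodes a single "=XX" escape into the character whose code is the hex value XX.
    // The cast to char is only correct for ASCII values (0x00-0x7F); higher bytes
    // must be collected as bytes and decoded with the content's charset instead.
    static char decodeEscape(String escape) {
        if (escape.length() != 3 || escape.charAt(0) != '=') {
            throw new IllegalArgumentException("expected =XX, got " + escape);
        }
        return (char) Integer.parseInt(escape.substring(1), 16);
    }

    public static void main(String[] args) {
        System.out.println(decodeEscape("=3D")); // prints =
        System.out.println(decodeEscape("=22")); // prints "
    }
}
```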

Is there any way I could convert the retrieved bytes properly?

I tried the replace() method of String to replace "=3D" and "=22" with "=" and '"' respectively.

I also tried passing the charset to the InputStreamReader constructor, but it didn't work.
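Replacing "=3D" and "=22" by hand only covers those two codes: quoted-printable text can contain any "=XX" byte (for example "=20" for a trailing space or "=C3=A9" for a UTF-8 "é") plus "soft line breaks", where a line ending in "=" continues on the next line. Passing a charset to InputStreamReader cannot undo this, because the escapes are themselves plain ASCII text. A more general approach, sketched below under the assumption that the whole content is quoted-printable, is to decode the escapes back to raw bytes first and only then build a String with the page's real charset. The class and method names are hypothetical; Apache Commons Codec's QuotedPrintableCodec or JavaMail's MimeUtility.decode are library alternatives if either is already available.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintable {

    // Decodes quoted-printable text (RFC 2045) to raw bytes, then builds a String
    // with the charset the content was originally written in.
    // Lenient sketch: malformed "=" sequences are copied through unchanged.
    static String decode(String encoded, Charset charset) {
        // Valid quoted-printable text is pure ASCII, so this conversion is lossless.
        byte[] in = encoded.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            if (in[i] != '=') {
                out.write(in[i]);   // ordinary character: copy as-is
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;             // soft line break "=\r\n": drop it
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;             // soft line break with a bare "\n": drop it
            } else if (i + 2 < in.length
                    && Character.digit(in[i + 1], 16) >= 0
                    && Character.digit(in[i + 2], 16) >= 0) {
                // "=XX": write the single byte whose hex value is XX
                out.write(Character.digit(in[i + 1], 16) * 16 + Character.digit(in[i + 2], 16));
                i += 3;
            } else {
                out.write(in[i]);   // lone "=": keep it literally
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String raw = "<a href=3D=22page.html=22>Next</a>";
        System.out.println(decode(raw, StandardCharsets.UTF_8));
        // prints <a href="page.html">Next</a>
        System.out.println(decode("a very long li=\r\nne", StandardCharsets.UTF_8));
        // prints a very long line
    }
}
```

Decoding to bytes before applying the charset matters for multi-byte characters: "=C3=A9" is two bytes that only become one character once they are decoded together as UTF-8.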

Can someone please help me with this?

It's very urgent.

Regards

Saurabh

By sauravimta at 2007-11-26 15:59:02
# 1
Try running it through JTidy (http://jtidy.sourceforge.net). This might solve more problems than just this one: HTML parsing is quite problematic, and JTidy tries to work out little idiosyncrasies like this.
kevjavaa at 2007-7-8 22:20:13
# 2

Maybe I am missing something, but why can't you do something as simple as this:

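import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadHtml {

    public static void main(String[] args) {
        // try-with-resources closes the reader even if an exception is thrown.
        // Note: FileReader decodes with the platform's default charset.
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader("test.html"))) {
            String data;
            // readLine() returns null at end of file
            while ((data = bufferedReader.readLine()) != null) {
                System.out.println(data);
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}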

You may want to brush up on low-level vs. high-level I/O: byte streams (InputStream) versus character streams (Reader).
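Since the original question starts from an InputStream rather than a file name, the same high-level approach works by placing an InputStreamReader between the byte stream and the BufferedReader; the InputStreamReader is where the charset is applied. This is a minimal sketch that assumes UTF-8 and uses an in-memory stream in place of the real source; the class and method names are hypothetical. It reads the text correctly but will not, by itself, turn "=3D" back into "=": quoted-printable content still has to be decoded separately.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadHtmlStream {

    // Wraps a byte-level InputStream in a character-level Reader and collects its lines.
    // Closing the reader (via try-with-resources) also closes the underlying stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // InputStreamReader turns bytes into chars using the named charset;
        // BufferedReader adds buffering and readLine().
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>\n<body>Hello</body>\n</html>\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(page))) {
            System.out.println(line);
        }
    }
}
```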

filestreama at 2007-7-8 22:20:13