Web page ripping help
I've made a program that rips the HTML code off a web page, but the result seems to get corrupted mid-stream (this is a forum-type site, and because it's active the code changes a lot).
This is my code:
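public sourcelist ripfile(String asdf)
{
    sourcelist head = null;
    sourcelist back = null;
    try
    {
        String temped = "";
        URL url = new URL(asdf);
        InputStream in = url.openStream();
        byte[] data = new byte[1024];
        int length = -1;
        // each read() returns up to 1024 bytes, or -1 at end of stream
        while ((length = in.read(data)) != -1)
        {
            String b = new String(data, 0, length);
            temped += b;
            // once the buffered text contains a newline, store it as one node
            if (temped.lastIndexOf('\n') != -1)
            {
                sourcelist temp = new sourcelist(temped);
                if (head == null)
                {
                    head = temp;
                    back = temp;
                }
                else
                {
                    back.setNext(temp);
                    back = temp;
                }
                temped = "";
            }
        }
        // note: text after the last newline is still in temped and is not added
        in.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return head;
}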
If you can ignore my bad coding conventions: sourcelist is a linked list.
What I think is happening (and I am probably wrong) is that, because it reads the page a chunk at a time (up to 1024 bytes per read) and the page changes in the meantime, it gets corrupted data.
Is there a way to download the whole page first, so I can then just take the page and plug it in?
Thanks, guys.
[1247 byte] By [kotmfua] at [2007-11-26 20:15:42]
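A side note on the "corrupted mid-stream" theory, offered as a hedged sketch rather than a diagnosis: an HTTP response is normally a single snapshot of the page, so the page changing on the server while it is being read is an unlikely cause. One thing that can garble text in a loop like the one above is decoding each chunk of bytes separately: a multi-byte character that straddles two reads cannot be decoded correctly from either half. The demo below is hypothetical (the class name ChunkDecodeDemo, the helper decodePerChunk and the sample string are made up for illustration, and UTF-8 is only an assumption about the page's encoding). It uses an in-memory stream instead of a real URL so the effect is reproducible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ChunkDecodeDemo {
    // Decodes every chunk on its own, like the loop in the question.
    // The charset is made explicit (UTF-8, an assumption) so the result
    // does not depend on the platform default.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder text = new StringBuilder();
        byte[] data = new byte[chunkSize];
        int length;
        while ((length = in.read(data)) != -1) {
            text.append(new String(data, 0, length, StandardCharsets.UTF_8));
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "caf\u00e9"; // the last character takes two bytes in UTF-8
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8); // 5 bytes in total

        // A chunk size of 4 splits the two-byte character across two reads.
        String split = decodePerChunk(new ByteArrayInputStream(bytes), 4);
        // A chunk size of 8 reads all 5 bytes in one go.
        String whole = decodePerChunk(new ByteArrayInputStream(bytes), 8);

        System.out.println(original.equals(split)); // false
        System.out.println(original.equals(whole)); // true
    }
}
```

With a real page the split points depend on how the bytes arrive from the network, so the damage would show up in different places on different runs, which can look like the page changing underneath you.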

I can't seem to edit my first post, so sorry about the tags. Do you reckon someone could help?
import java.io.*;
import java.net.*;
import java.util.*;

public class ReadURL {
    // Reads the page at urlString and returns its text, one list entry per line.
    public static List<String> readURL(String urlString) throws IOException {
        URL url = new URL(urlString);
        // The reader decodes bytes to characters (platform default charset) and buffers them.
        BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
        List<String> listing = new ArrayList<String>();
        try {
            String line;
            // readLine() returns null at end of stream
            while ((line = br.readLine()) != null) {
                listing.add(line);
            }
        } finally {
            br.close(); // always close the stream, even if reading fails
        }
        return listing;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readURL("http://java.sun.com")) {
            System.out.println(line);
        }
    }
}
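On the original question of downloading the whole page first and processing it afterwards: here is a minimal sketch, assuming the page is UTF-8 encoded (that is an assumption; the real encoding should be taken from the response headers or from the page itself). It collects the raw bytes until end of stream and decodes them once, so no character can be split across two reads. The names WholePage, readAllBytes, readAll and download are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WholePage {
    // Reads the stream to the end and returns every byte; nothing is decoded yet.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int length;
        while ((length = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, length);
        }
        return buffer.toByteArray();
    }

    // Decodes once, after all bytes have arrived.
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(readAllBytes(in), charset);
    }

    // Hypothetical convenience wrapper; UTF-8 is an assumption, not detected.
    static String download(String address) throws IOException {
        InputStream in = new URL(address).openStream();
        try {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java WholePage <url>");
            return;
        }
        String page = download(args[0]);
        System.out.println(page.length() + " characters downloaded");
    }
}
```

Once the page is held in a single String it can be split into lines (for example with page.split("\n")) and each line appended to the sourcelist, without touching the network again.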