web page ripping help

I've made a program that rips the HTML code off a webpage, but the result seems to get corrupted mid-stream (this is a forum-type site and it's active, so the code changes a lot).

This is my code:

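public sourcelist ripfile(String asdf)
{
sourcelist head = null;
sourcelist back = null;
try
{
String temped = "";
URL url = new URL(asdf);
InputStream in = url.openStream();
byte[] data = new byte[1024];
int length = -1;
// read() fills up to 1024 bytes per call and returns -1 at end of stream
while ((length = in.read(data)) != -1)
{
String b = new String(data, 0, length);
temped += b;
// once the buffered text contains a newline, store it as a new list node
if (temped.lastIndexOf('\n') != -1)
{
sourcelist temp = new sourcelist(temped);
if (head == null)
{
head = temp;
back = temp;
}
else
{
back.setNext(temp);
back = temp;
}
temped = "";
}
}
return head;
}
catch (IOException e)
{
// assumed error handling: report the problem and return what was read so far
e.printStackTrace();
return head;
}
}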

If you can ignore my bad coding conventions, sourcelist is a linked list.

What I think is happening (and I'm probably wrong): it reads the page a chunk at a time, the page changes while I'm still reading, and I end up with corrupted data.

Is there a way to download the whole page in one go, so I can then just take the page and plug it in?

thanks guys
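
A note on the hypothesis above: an HTTP response is normally generated once per request, so later changes to the page on the server should not alter a download already in progress. A more likely cause sits in the loop itself. A buffer is stored only when it contains a newline, so any text after the last newline is never added to the list, and the stored pieces are arbitrary chunks, not lines. The sketch below reproduces that loop shape against an in-memory stream. The class and method names are hypothetical, a plain List stands in for sourcelist, and ISO-8859-1 is an assumed charset chosen only to keep the demo deterministic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ChunkedReadDemo {

    // Same loop shape as the posted code. When flushTail is false, text that
    // follows the last newline is never stored, as in the original loop.
    static List<String> readPieces(InputStream in, int chunkSize, boolean flushTail)
            throws IOException {
        List<String> pieces = new ArrayList<>();
        byte[] data = new byte[chunkSize];
        String temped = "";
        int length;
        while ((length = in.read(data)) != -1) {
            temped += new String(data, 0, length, StandardCharsets.ISO_8859_1);
            if (temped.lastIndexOf('\n') != -1) {
                pieces.add(temped);
                temped = "";
            }
        }
        // The fix: store whatever is still buffered when the stream ends.
        if (flushTail && !temped.isEmpty()) {
            pieces.add(temped);
        }
        return pieces;
    }

    // Concatenates the stored pieces back into one string.
    static String join(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String piece : pieces) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html>\n<body>hello</body>\n</html>";
        byte[] bytes = page.getBytes(StandardCharsets.ISO_8859_1);

        String lossy = join(readPieces(new ByteArrayInputStream(bytes), 8, false));
        String full = join(readPieces(new ByteArrayInputStream(bytes), 8, true));

        System.out.println("without flush: " + lossy.length() + " of " + page.length() + " characters");
        System.out.println("with flush:    " + full.length() + " of " + page.length() + " characters");
    }
}
```

The small chunk size of 8 is only there to force several reads on a short input; with a real network stream, read() may return fewer bytes than the buffer holds on any call, which makes the piece boundaries unpredictable.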

[1247 byte] By [kotmfua] at [2007-11-26 20:15:42]
# 1
Please use code tags: http://forum.java.sun.com/help.jspa?sec=formatting
DrLaszloJamfa at 2007-7-9 23:22:29 > top of Java-index,Java Essentials,Java Programming...
# 2

I can't seem to edit my first post :S

So sorry about the tags; here it is again....

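public sourcelist ripfile(String asdf)
{
sourcelist head = null;
sourcelist back = null;
try
{
String temped = "";
URL url = new URL(asdf);
InputStream in = url.openStream();
byte[] data = new byte[1024];
int length = -1;
// read() fills up to 1024 bytes per call and returns -1 at end of stream
while ((length = in.read(data)) != -1)
{
String b = new String(data, 0, length);
temped += b;
// once the buffered text contains a newline, store it as a new list node
if (temped.lastIndexOf('\n') != -1)
{
sourcelist temp = new sourcelist(temped);
if (head == null)
{
head = temp;
back = temp;
}
else
{
back.setNext(temp);
back = temp;
}
temped = "";
}
}
return head;
}
catch (IOException e)
{
// assumed error handling: report the problem and return what was read so far
e.printStackTrace();
return head;
}
}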

kotmfua at 2007-7-9 23:22:29
# 3
Okay, now indent properly ;-)
DrLaszloJamfa at 2007-7-9 23:22:29
# 4

w00t indents... now reckon someone could help?

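public sourcelist ripfile(String asdf)
{
    sourcelist head = null;
    sourcelist back = null;
    try
    {
        String temped = "";
        URL url = new URL(asdf);
        InputStream in = url.openStream();
        byte[] data = new byte[1024];
        int length = -1;
        // read() fills up to 1024 bytes per call and returns -1 at end of stream
        while ((length = in.read(data)) != -1)
        {
            String b = new String(data, 0, length);
            temped += b;
            // once the buffered text contains a newline, store it as a new list node
            if (temped.lastIndexOf('\n') != -1)
            {
                sourcelist temp = new sourcelist(temped);
                if (head == null)
                {
                    head = temp;
                    back = temp;
                }
                else
                {
                    back.setNext(temp);
                    back = temp;
                }
                temped = "";
            }
        }
        return head;
    }
    catch (IOException e)
    {
        // assumed error handling: report the problem and return what was read so far
        e.printStackTrace();
        return head;
    }
}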

kotmfua at 2007-7-9 23:22:29
# 5

import java.io.*;
import java.net.*;
import java.util.*;

public class ReadURL {

    // Reads the page at urlString and returns its lines in order.
    public static List<String> readURL(String urlString) throws IOException {
        URL url = new URL(urlString);
        // InputStreamReader without a charset argument uses the platform default encoding
        BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
        List<String> listing = new ArrayList<String>();
        try {
            String line;
            // readLine() returns null at end of stream, so a final line without a newline is kept
            while ((line = br.readLine()) != null) {
                listing.add(line);
            }
        } finally {
            br.close();
        }
        return listing;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readURL("http://java.sun.com")) {
            System.out.println(line);
        }
    }
}
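
For the original question of getting the whole page as one piece of text, the same idea can be sketched with a single Reader and a StringBuilder. Decoding through one Reader also avoids a problem with converting each byte chunk to a String separately: a multi-byte character split across two reads would be decoded wrongly. This is only a sketch under assumptions: the class name PageText and its methods are hypothetical, and UTF-8 is an assumed encoding (the page's real charset would normally come from its Content-Type header).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageText {

    // Reads the stream to the end and returns everything as one String.
    // The Reader keeps decoder state between reads, so characters that
    // straddle a buffer boundary are still decoded correctly.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper for a URL; UTF-8 is an assumption here.
    static String fetch(String urlString) throws IOException {
        return readAll(new URL(urlString).openStream(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String html;
        if (args.length > 0) {
            // Pass a URL on the command line to fetch a real page.
            html = fetch(args[0]);
        } else {
            // Without arguments, use an in-memory sample so no network is needed.
            byte[] sample = "<html>\n<body>caf\u00e9</body>\n</html>".getBytes(StandardCharsets.UTF_8);
            html = readAll(new ByteArrayInputStream(sample), StandardCharsets.UTF_8);
        }
        System.out.println(html.length() + " characters read");
    }
}
```

If the rest of the program wants lines rather than one big string, the list returned by readURL above is the better fit; the single-string form is handy when the page is going to be searched or parsed as a whole.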

DrLaszloJamfa at 2007-7-9 23:22:29
# 6
thanks dude.. seems to be working heaps better
kotmfua at 2007-7-9 23:22:29
# 7
Indented better, too.
DrLaszloJamfa at 2007-7-9 23:22:29