PHP's file_get_contents

Hi

I'm trying to emulate PHP's file_get_contents (www.php.net/file_get_contents) using Java.

import java.io.*;
import java.net.*;

public class test23
{
    public static void main(String[] args) throws Exception
    {
        String s = (String)file_get_contents("http://google.com/");
        System.out.println(s);
    }

    public static Object file_get_contents(String strURL)
    {
        StringBuffer content = new StringBuffer();
        byte[] b = new byte[10];
        try
        {
            URL oURL = new URL(strURL);
            URLConnection oURLC = oURL.openConnection();
            oURLC.connect();
            InputStream in = oURLC.getInputStream();
            while (in.read(b) != -1)
            {
                content.append(b);
            }
        }
        catch (MalformedURLException e)
        {
            return false;
        }
        catch (IOException e)
        {
            return false;
        }
        return content.toString();
    }
}

BufferedReader's readLine() can read line by line, but I want to read either byte by byte or in chunks of bytes. How do I do that? Here, obviously, the ASCII value is getting appended.

Is there an easier way?


[2481 byte] By [anjanesha] at [2007-11-27 4:38:43]
# 1

What happens when you compile and run your code?

> I want to read either byte by byte or in chunks of bytes. How do I do that?

InputStream (and BufferedInputStream) offer a few read() methods. Note that you are reusing the same byte array. This means that it will have "junk" in it from the previous read. Also you are appending the byte array to a StringBuffer; check the API documentation for this to make sure this is what you want to do.

Finally, I would reconsider the return statements. The way you have written it, the caller will never know whether they are receiving a Boolean or a String, and will have to do a lot of checking. It might be better to return null if you intend file_get_contents() to deal with the error, or declare the exception with "throws" if you want the caller to deal with it. Either way, someone should deal with the exceptions: at the moment you are getting lots of potentially useful information (especially from the IOException) but are simply throwing it away. At a bare minimum call printStackTrace(), but, as suggested above, deal with the exception or throw it.
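
A minimal sketch of the "declare it with throws" option, not the original poster's code: the method returns a String and lets IOException reach the caller. The names FetchSketch and readText are hypothetical, and decoding the response as UTF-8 is an assumption, since the thread never names a charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class FetchSketch {
    // Declares IOException instead of returning false, so the return type can be String.
    static String fileGetContents(String strURL) throws IOException {
        URL url = new URL(strURL); // MalformedURLException is a subclass of IOException
        try (InputStream in = url.openStream()) {
            return readText(in);
        }
    }

    // Decodes the stream as UTF-8 text (assumption) and returns everything read.
    static String readText(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder content = new StringBuilder();
        char[] buf = new char[1024];
        int count;
        while ((count = reader.read(buf)) != -1) {
            content.append(buf, 0, count); // append only the chars actually read
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "http://google.com/";
        try {
            System.out.println(fileGetContents(target));
        } catch (IOException e) {
            e.printStackTrace(); // the caller decides what to do with the failure
        }
    }
}
```

With this shape the caller no longer needs a cast or a Boolean check; a failure arrives as an exception that still carries its message and stack trace.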

pbrockway2a at 2007-7-12 9:49:13 > top of Java-index,Java Essentials,New To Java...
# 2

I'll deal with the exceptions later - and the return type.

For now, I'm trying to get the HTML content as a string from a given URL.

What's normally a one-liner in PHP ($c = file_get_contents("http://google.com");) is way too long in Java. I don't understand why we have to deal with strings in this way.

How else can a byte array be converted to a StringBuffer or String without the trailing junk?

Thanks

anjanesha at 2007-7-12 9:49:13
# 3

> I'll deal with the exceptions later - and the return type.

Spoken like a true php-er ;)

> For now, I'm trying to get the HTML content as a string from a given URL.

Hang on! For that you don't want a Stream at all, but a BufferedReader. readLine() your way through the document, appending each line to a StringBuffer/StringBuilder, and return the result of toString(). Why did you reject this in your OP (in favour of doing things with bytes)?
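
The readLine() approach could look roughly like the sketch below. ReadLines and readAllLines are illustrative names, and the method takes any Reader (an assumption made so it can be tried without a network connection).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLines {
    // Reads the source line by line and returns the accumulated text.
    static String readAllLines(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder content = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so one is re-added here
            content.append(line).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAllLines(new StringReader("first\r\nsecond")));
    }
}
```

For a URL, the Reader would be new InputStreamReader(oURLC.getInputStream()), as in the thread. Because readLine() discards line terminators, the result does not preserve the original line endings exactly.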

pbrockway2a at 2007-7-12 9:49:13
# 4

The return type of URLConnection.getInputStream() is InputStream, so I thought:

InputStream in = oURLC.getInputStream();

import java.io.*;
import java.net.*;

public class test23
{
    public static void main(String[] args) throws Exception
    {
        String s = (String)file_get_contents("http://google.com/");
        System.out.println(s);
    }

    public static Object file_get_contents(String strURL)
    {
        StringBuffer content = new StringBuffer();
        char[] c = new char[100];
        try
        {
            URL oURL = new URL(strURL);
            URLConnection oURLC = oURL.openConnection();
            oURLC.connect();
            BufferedReader in = new BufferedReader(new InputStreamReader(oURLC.getInputStream()));
            while (in.read(c) != -1)
            {
                content.append(c);
            }
            in.close();
        }
        catch (MalformedURLException e)
        {
            return false;
        }
        catch (IOException e)
        {
            return false;
        }
        return content.toString();
    }
}

Anyway, I've changed it to BufferedReader, but I'm guessing content.append(c); still isn't right, as some lines have a few extra chars.

The reason I don't want to use readLine() is that readLine() assumes you're reading a text file. file_get_contents shouldn't return just the content of HTML links but also images, PDFs, etc. Basically, file_get_contents will return the content of any given link.

anjanesha at 2007-7-12 9:49:13
# 5

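> while (in.read(c) != -1)
> {
>     content.append(c);
> }

Oldest Java I/O problem in the book. read() returns the number of chars it actually put into the array, and only that many should be appended:

int count;
while ((count = in.read(c)) != -1)
{
    content.append(c, 0, count);
}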
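
To see why the count matters, here is a small self-contained demonstration. It uses a StringReader and a 4-char buffer in place of the network stream; both are assumptions for illustration, as are the class and method names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadCountDemo {
    // Buggy variant: ignores how many chars read() delivered.
    static String appendWholeBuffer(Reader in, int bufSize) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] c = new char[bufSize];
        while (in.read(c) != -1) {
            content.append(c); // appends the full array, stale chars included
        }
        return content.toString();
    }

    // Fixed variant: appends only the chars filled by the latest read().
    static String appendWithCount(Reader in, int bufSize) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] c = new char[bufSize];
        int count;
        while ((count = in.read(c)) != -1) {
            content.append(c, 0, count);
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // The last read fills only 2 of the 4 slots; "gh" is left over from the read before.
        System.out.println(appendWholeBuffer(new StringReader("abcdefghij"), 4)); // abcdefghijgh
        System.out.println(appendWithCount(new StringReader("abcdefghij"), 4));   // abcdefghij
    }
}
```

A real network stream can return short reads at any point, not just at the end, so the stray characters may show up in the middle of the output as well.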

ejpa at 2007-7-12 9:49:13
# 6

> The reason I don't want to use readLine() is that readLine()
> assumes you're reading a text file.

What difference does it make if in the end you convert the bytes you read to a String? You might as well assume you are reading text all the way. If you really want the method to handle binary data it should return an array of bytes.

If you decide to really treat the data as binary, I think you'll find ByteArrayOutputStream useful. It implements a buffer of bytes that grows automatically when you add to it, much like StringBuffer.
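
A minimal sketch of that idea, assuming the data should come back as a byte array. ReadBytes and readAllBytes are hypothetical names, and the 4096-byte buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadBytes {
    // Copies everything from the stream into a growing in-memory buffer.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int count;
        while ((count = in.read(buf)) != -1) {
            out.write(buf, 0, count); // write only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {0, (byte) 0xFF, 10, 13, 65};
        byte[] copy = readAllBytes(new ByteArrayInputStream(sample));
        System.out.println(copy.length);
    }
}
```

No charset is involved, so binary content such as images or PDFs passes through unchanged. For a URL, the InputStream would be the one returned by oURLC.getInputStream().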

jsalonena at 2007-7-12 9:49:13
# 7

Thanks! Strange that there's no built-in function that simply returns the contents as a string.

I'm using Java SE Update 1. Are there any new additions to the library that make life easier, so that I don't have to concentrate on string manipulation?

jsalonen: Don't you mean ByteArrayInputStream?


anjanesha at 2007-7-12 9:49:13
# 8

> Thanks! Strange that there's no built-in function that
> simply returns the contents as a string.

In most typical Java applications, such a function is less useful than you think.

> I'm using Java SE Update 1. Are there any new
> additions to the library that make life easier,
> so that I don't have to concentrate on string manipulation?

Easier in what way? What are you trying to achieve?

jsalonena at 2007-7-12 9:49:13
# 9

> jsalonen: Don't you mean ByteArrayInputStream?

No; Output. The class you mention is meant for reading bytes from an array, while what you want to do is store (that is, write) bytes in an array. http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayOutputStream.html

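
The difference between the two classes can be shown in a few lines. This is only an illustration with made-up names (ByteArrayStreamsDemo, collect, sum): one method writes chunks into a ByteArrayOutputStream, the other reads an existing array back through a ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class ByteArrayStreamsDemo {
    // Output: accumulate bytes written piece by piece into one array.
    static byte[] collect(byte[]... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    // Input: read bytes back out of an array that already exists.
    static int sum(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int total = 0;
        int b;
        while ((b = in.read()) != -1) {
            total += b; // read() returns each byte as an int in the range 0..255
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] all = collect(new byte[] {1, 2}, new byte[] {3});
        System.out.println(all.length + " bytes, sum " + sum(all));
    }
}
```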
jsalonena at 2007-7-12 9:49:13