Help with Java I/O: reading from a URL

Hello,

I have just started learning Java I/O and decided to write a program that reads a URL, gets the source of the website, and then isolates every link on the root web page and its sister pages.

My problem right now is: how do I isolate the links in the web page? I have developed the following code:

public String getNextHrefURL( Scanner in )
    throws Exception
{
    while ( in.hasNext() )
    {
        if ( in.hasNext( "<a href=" ) )
        {
            in.useDelimiter( "\"" );
            in.useDelimiter( "><a href=" );
            in.useDelimiter( ">" );
            return in.next();
        }
        else
        {
            return null;
        }
    }
}

The entire source of the web page has been delimited to one line, and I am using Scanners to evaluate the file.

Am I on the right track, or is there a certain method I should use with my Scanner object to acquire the link between the quotes of an <a href="...">?

Thank you

[983 byte] By [dmbballer25a] at [2007-11-26 20:30:56]
# 1
I don't think you are using Scanner correctly, but the good news is that you don't need to! Why not just use String's indexOf(String str, int fromIndex) method?
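
A minimal sketch of the indexOf approach, offered as an illustration rather than a complete HTML parser. It assumes the whole page is already in one String and that every link is written exactly as href="..." with double quotes; the class name HrefFinder and the method findLinks are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class HrefFinder {
    // Assumption: attributes are written exactly as href="..." (double quotes, no spaces).
    private static final String MARKER = "href=\"";

    /** Returns every double-quoted href value in html, in order of appearance. */
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(MARKER, from);   // next href=" at or after 'from'
            if (start < 0) {
                break;                                // no more links
            }
            int valueStart = start + MARKER.length();
            int end = html.indexOf('"', valueStart);  // closing quote of the value
            if (end < 0) {
                break;                                // unterminated attribute, stop
            }
            links.add(html.substring(valueStart, end));
            from = end + 1;                           // continue after this link
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"a.html\">A</a> <a href=\"http://example.com/b\">B</a></p>";
        System.out.println(findLinks(html)); // prints [a.html, http://example.com/b]
    }
}
```

An empty list here plays the role of "no links on the page", which avoids having to check for null.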
DrLaszloJamfa at 2007-7-10 1:20:36 > top of Java-index,Java Essentials,New To Java...
# 2

After reformatting my code to use a String, will this new code be able to look through the source of a URL and isolate all the links? And if there are no links on the page, will it return null properly to indicate that there are no links?

public String getNextHrefURL( Scanner in )
    throws Exception
{
    while ( in.hasNext() )
    {
        source = in.next();
        if ( source.startsWith( "href=" ) )
        {
            quote = source.indexOf("\"", 5 );
            source = source.substring( 6, quote );
            return source;
        }
        else
        {
            return null;
        }
    }
}

dmbballer25a at 2007-7-10 1:20:36
# 3

> after formatting my code to use a string, will this
> new code be able to look through the source of a URL
> and isolate all the links? and if there are no links
> on the page, will it return NULL properly to indicate
> that there are no links?

COULD it? Definitely. WILL it? It depends on you :)

> public String getNextHrefURL( Scanner in )
>     throws Exception
> {
>     while ( in.hasNext() )
>     {
>         source = in.next();
>         if ( source.startsWith( "href=" ) )
>         {
>             quote = source.indexOf("\"", 5 );
>             source = source.substring( 6, quote );
>             return source;
>         }
>         else
>         {
>             return null;
>         }
>     }
> }
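
Two things in the quoted code are worth checking. The else branch returns null as soon as the first token is not a link, so the loop never reaches later tokens. Also, in a token that starts with href=" the character at index 5 is the opening quote, so indexOf("\"", 5) returns 5 and substring(6, 5) throws. A minimal sketch of one way around both, assuming whitespace-delimited tokens and double-quoted values with no spaces in them (the class name HrefScanner is made up for this example):

```java
import java.util.Scanner;

class HrefScanner {
    /** Returns the next href value from the scanner, or null once no tokens remain. */
    static String getNextHrefURL(Scanner in) {
        while (in.hasNext()) {
            String token = in.next();                 // whitespace-delimited token
            if (token.startsWith("href=\"")) {
                int close = token.indexOf('"', 6);    // index 5 is the opening quote
                if (close > 0) {
                    return token.substring(6, close); // text between the two quotes
                }
            }
            // Not a link: keep scanning instead of returning null here.
        }
        return null; // input exhausted without finding another link
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("<a href=\"one.html\">One</a> <a href=\"two.html\">Two</a>");
        String link;
        while ((link = getNextHrefURL(in)) != null) {
            System.out.println(link);
        }
    }
}
```

Calling the method repeatedly on the same Scanner walks through the links one at a time, and null then means "no more links" rather than "the first token was not a link".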

tjacobs01a at 2007-7-10 1:20:36
# 4

Well, the difficulty with this program is that I am doing it for a class, using the BlueJ IDE, and normally we have created programs using the objectdraw API. This is our first program not using that library, and this code is being developed for a single class.

I guess my next problem would be: how do I check whether my program is working in BlueJ? Will the terminal come up for me to input the URL, or do I pass the URL string as a parameter to the class's constructor when I create a new instance of the class on the object bench?

Thanks for all your help so far, it's been great!

dmbballer25a at 2007-7-10 1:20:37
# 5
I don't know how many people here know BlueJ. Perhaps it has its own forum.
DrLaszloJamfa at 2007-7-10 1:20:37
# 6
I am doing a project on data mining. I want to know how to read a web page from a given URL and how to read the source code of that web page.
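
For reference, a minimal sketch of reading a page's source with the standard library, not a robust downloader. It assumes the page is UTF-8 encoded and does no handling of redirects, timeouts, or HTTP errors; the class name PageSource and the methods readAll and fetch are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PageSource {
    /** Reads everything from reader into one String, ending each line with '\n'. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    /** Downloads the page at address. Assumption: the content is UTF-8 text. */
    static String fetch(String address) throws IOException {
        try (Reader r = new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8)) {
            return readAll(r);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PageSource <url>
        System.out.println(fetch(args[0]));
    }
}
```

Keeping readAll separate from fetch means the reading logic can be tried on a StringReader without any network access.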
tahir@irshada at 2007-7-10 1:20:37
# 7
And you haven't even read the thread you've hijacked.
ejpa at 2007-7-10 1:20:37