Help with Java I/O: reading from a URL
Hello,
I have just started learning Java I/O and decided to write a program that reads a URL, gets the source of the website, and then isolates every link on the root web page and its sister pages.
My problem right now is: how do I isolate the links on the web page? I have developed the following code:
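public String getNextHrefURL( Scanner in )
    throws Exception
{
    while ( in.hasNext() )
    {
        if ( in.hasNext( "<a href=" ) )
        {
            // Each useDelimiter call replaces the previous one, so only ">" is in effect.
            in.useDelimiter( "\"" );
            in.useDelimiter( "><a href=" );
            in.useDelimiter( ">" );
            return in.next();
        }
        else
        {
            return null;
        }
    }
    // Needed to compile: reached when the scanner has no tokens at all.
    return null;
}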
The entire source of the web page has been collapsed onto one line, and I am using Scanners to evaluate the file.
Am I on the right track, or is there a certain method I should use with my Scanner object to get the link between the quotes of an <a href="...">?
Thank you.
After changing my code to use a String, will this new code be able to look through the source of a URL and isolate all the links? And if there are no links on the page, will it return null properly to indicate that there are no links?
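public String getNextHrefURL( Scanner in )
    throws Exception
{
    while ( in.hasNext() )
    {
        String source = in.next();
        // Only tokens that begin with href=" carry a link
        if ( source.startsWith( "href=\"" ) )
        {
            // Start at index 6, just past the opening quote, to find the closing quote
            int quote = source.indexOf( "\"", 6 );
            if ( quote != -1 )
            {
                return source.substring( 6, quote );
            }
        }
        // Not a link: keep scanning instead of returning null on the first miss
    }
    // No more tokens, so there are no more links
    return null;
}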
> After changing my code to use a String, will this
> new code be able to look through the source of a URL
> and isolate all the links? And if there are no links
> on the page, will it return null properly to indicate
> that there are no links?
COULD it? Definitely. WILL it? It depends on you :)
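One possible way to get there, offered as a rough sketch rather than the definitive answer: Scanner has a findWithinHorizon method that searches ahead for a regular expression and ignores the current delimiter, which suits pulling out the text between the quotes. The class name HrefScanner, the HREF pattern, and the sample page string are made up for illustration. The pattern only handles quoted href values and is not a full HTML parser.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class HrefScanner {
    // Hypothetical pattern: an <a ...> tag with href="..." or href='...'.
    // Group 1 is the quote character, group 2 is the text between the quotes.
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the next link in the input, or null when no links remain.
    static String getNextHrefURL(Scanner in) {
        // A horizon of 0 means "search to the end of the input".
        if (in.findWithinHorizon(HREF, 0) == null) {
            return null;
        }
        // match() exposes the groups of the search that just succeeded.
        return in.match().group(2);
    }

    public static void main(String[] args) {
        // Stand-in for a downloaded page, so the sketch runs without a network.
        String page = "<html><body><a href=\"http://example.com/a\">A</a>"
                    + "<p>text</p><a class=\"x\" href='b.html'>B</a></body></html>";
        Scanner in = new Scanner(page);
        String link;
        while ((link = getNextHrefURL(in)) != null) {
            System.out.println(link);
        }
    }
}
```

Calling the method in a loop until it returns null walks through every link on the page, and a page with no links gives null on the first call.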
Well, the difficulty with this program is that I am doing it for a class, using the BlueJ IDE, and normally we have written programs using the objectdraw API. This is our first program that does not use that library, and this code is being developed for a single class.
I guess my next problem is: how do I check whether my program is working in BlueJ? Will the terminal come up for me to input the URL, or do I pass the URL string as a parameter to the class's constructor and enter it when I create a new instance of the class on the object bench?
Thanks for all your help so far, it's been great!
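For the constructor-parameter route, here is a minimal sketch, assuming a class that takes the address as a String. The name PageReader, its getSource method, and the second constructor are all hypothetical and not part of any assignment API. In BlueJ, creating an instance should bring up a dialog for the constructor parameter, where the URL is typed in double quotes. Reading from System.in instead should open the BlueJ terminal window for typing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class PageReader {
    private String source;

    // Intended for the object bench: pass the address, e.g. "http://example.com/".
    PageReader(String address) throws IOException {
        this(new URL(address).openStream());
    }

    // Extra constructor (an assumption) so the class can be tried offline.
    PageReader(InputStream stream) {
        // UTF-8 is assumed here; a real page may declare another encoding.
        try (Scanner in = new Scanner(stream, "UTF-8")) {
            // "\\A" matches only at the start of input, so next() returns everything.
            in.useDelimiter("\\A");
            source = in.hasNext() ? in.next() : "";
        }
    }

    // Returns the whole page as one String.
    String getSource() {
        return source;
    }

    public static void main(String[] args) {
        // Fake page content stands in for a real download.
        byte[] fake = "<a href=\"index.html\">home</a>".getBytes(StandardCharsets.UTF_8);
        PageReader reader = new PageReader(new ByteArrayInputStream(fake));
        System.out.println(reader.getSource());
    }
}
```

Once the object is on the object bench, calling getSource on it and inspecting the returned String is one quick way to check that the page was actually read.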