Parsing href=".." with regular expressions

I need to get all the hyperlinks on a web page. I use this code, but it doesn't work 100%:

public void parseLinks(){
    String A = "([Hh][Rr][Ee][Ff]\\s*=\\s*\")";
    String B = "(?!#|[Hh]ttp|[Mm]ailto|.cgi|.css)";
    String C = "(.*)";
    String D = "(\\s*\")";
    String exp = A+B+C+D;
    Pattern p = Pattern.compile(exp);
    Matcher m = p.matcher(s);
    while(m.find()){
        System.out.println(m.group());
    }
}

where s is the string being parsed (the HTML document). It more or less works on regular links, e.g.:

<a name="qwerty" href="qwerty.html"> gives output href="qwerty.html"

but I want the output to be just qwerty.html. How can I do this? It also doesn't work on links like:

<a name="qwerty" href="/aktuell/index.html" class="link"> gives the output href="/aktuell/index.html" class="link".

How can I just get the path?

[927 byte] By [geranm] at [2007-9-30 10:41:15]
# 1
Then get group 2 instead of group 0.
dubwai at 2007-7-3 19:53:56 > top of Java-index,Other Topics,Algorithms...
# 2
Can you please explain more? I don't get it.
geranm at 2007-7-3 19:53:56
# 3
Calling the group() method with no parameter is the same as calling group(0). Each set of capturing parentheses in your regex marks a group. 0 is the whole match, 1 is the group that starts with the first left paren, 2 is the group that starts with the second capturing left paren, and so on. A lookahead such as (?!...) does not capture anything, so it is not counted.
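
To see the numbering concretely, here is a minimal sketch that prints every group of the first match, using the pattern pieces from the question. The class name GroupNumbering and the helper groupOfFirstMatch are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Same pieces as in the question. Only capturing parentheses get a number;
    // the negative lookahead (?!...) is skipped when counting.
    static final Pattern HREF = Pattern.compile(
            "([Hh][Rr][Ee][Ff]\\s*=\\s*\")"          // group 1: href="
          + "(?!#|[Hh]ttp|[Mm]ailto|.cgi|.css)"      // lookahead, no group number
          + "(.*)"                                    // group 2: the path
          + "(\\s*\")");                              // group 3: the closing quote

    // Hypothetical helper: returns the given group of the first match, or null.
    static String groupOfFirstMatch(String s, int group) {
        Matcher m = HREF.matcher(s);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String s = "<a name=\"qwerty\" href=\"qwerty.html\">";
        Matcher m = HREF.matcher(s);
        if (m.find()) {
            // groupCount() does not include group 0, hence <=
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + m.group(i));
            }
        }
    }
}
```

For this input, group 0 is href="qwerty.html", group 1 is href=", group 2 is qwerty.html and group 3 is the closing quote.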
dubwai at 2007-7-3 19:53:56
# 4

If s = "<a name = \"asd\" href=\"asd.html\"" or s = "<a href=\"asd.html\"href<\\a>", it works with group 2.

But it doesn't work with s = "<a href=\"/openpos/index.html\" class=\"link\"><b>Open positions</b></a>".

It gives the output:

/openpos/index.html" class="link

Any suggestions? It seems that it fails if there is more than one " before the >.

Example code:

public void parseLinks(){
    String s = "<a name = \"asd\" href=\"asd.html\"";
    /*String s = "<a href=\"/openpos/index.html\" class=\"link\"><b>Open positions</b></a>";*/
    String A = "([Hh][Rr][Ee][Ff]\\s*=\\s*\")";
    String B = "(?!#|[Hh]ttp|[Mm]ailto|[Ll]ocation.|[Jj]avascript|.cgi|.css)";
    String C = "(.*)";
    String D = "(\\s*\")";
    String exp = A+B+C+D;
    Pattern p = Pattern.compile(exp);
    Matcher m = p.matcher(s);
    while(m.find()){
        // group 2 is C, the path; the lookahead B is not a capturing group
        System.out.println(m.group(2));
    }
}

geranm at 2007-7-3 19:53:56
# 5
.* is greedy. Try the reluctant version, .*?, though I'm not sure it will work.
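
A small sketch of the difference, assuming the pattern pieces from the post above. The class name GreedyVsReluctant and the helper paths are hypothetical, chosen only for this illustration. The greedy version runs on to the last " on the line, while the reluctant one stops at the first " it can.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // href=" followed by the lookahead from the post (not a capturing group)
    static final String PREFIX = "([Hh][Rr][Ee][Ff]\\s*=\\s*\")"
            + "(?!#|[Hh]ttp|[Mm]ailto|[Ll]ocation.|[Jj]avascript|.cgi|.css)";
    static final String SUFFIX = "(\\s*\")";

    // .* takes as much as possible, .*? as little as possible
    static final Pattern GREEDY = Pattern.compile(PREFIX + "(.*)" + SUFFIX);
    static final Pattern RELUCTANT = Pattern.compile(PREFIX + "(.*?)" + SUFFIX);

    // Hypothetical helper: collects group 2 (the path) of every match.
    static List<String> paths(Pattern p, String html) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "<a href=\"/openpos/index.html\" class=\"link\"><b>Open positions</b></a>";
        System.out.println(paths(GREEDY, s));
        System.out.println(paths(RELUCTANT, s));
    }
}
```

With the sample link, the greedy pattern yields /openpos/index.html" class="link and the reluctant one yields /openpos/index.html.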
dubwai at 2007-7-3 19:53:56
# 6
Thank you, works perfectly!
geranm at 2007-7-3 19:53:56