Finding URLs using a regular expression

I have a requirement where the user types some text containing URLs, like "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you". This text has to be modified as shown below before it is saved to the database.

"Please visit this site <a href='http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747'>http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747</a>. Thank you"

I am using the regular expression (http|https)://.+?\\s, which marks the end of the URL with a whitespace character. This pattern doesn't work if the URL is at the end of the string, since there is no whitespace after it.

For example, if the string is "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747", the regex finds no match.

My actual problem is to find the URL irrespective of its position within the string.

Pattern urlPattern = Pattern.compile("(http|https)://.+?\\s", Pattern.CASE_INSENSITIVE);
Matcher matcher = urlPattern.matcher(plainText);
Map<Integer, String> stringIndexMap = new HashMap<Integer, String>();

// Search the input string for urlPattern
while (matcher.find()) {
    String urlString = matcher.group();
    // Store each URL in the map with its start index as the key
    stringIndexMap.put(Integer.valueOf(matcher.start()), urlString.trim());
}

// clickableURLString is declared elsewhere (not shown); presumably a
// StringBuffer holding a copy of plainText.
// Replace each URL in the text with
// <a href="#" onclick="window.open('<urlString>')">urlString</a>, using string indices
for (String urlString : stringIndexMap.values()) {
    int start = clickableURLString.indexOf(urlString);
    clickableURLString.replace(start, start + urlString.length(),
            "<a href=\"#\" onclick=\"window.open('" + urlString
            + "')\">" + urlString + "</a>");
}

return clickableURLString.toString();

[2568 byte] By [Vishwas_Prasannaa] at [2007-11-27 10:54:34]
# 1

In a regex, '$' matches the end of the input.

hiwaa at 2007-7-29 11:51:02 > top of Java-index,Java Essentials,Java Programming...
# 2

Basically, you might simply use: "https?://\\S+"

But how do you handle the trailing dot? (i.e. shouldn't you handle punctuation marks that follow the URL somehow?)

"Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747. Thank you."

"Please visit this site http://www.google.com, or perform direct image search: http://images.google.com/images. Thank you."

TimTheEnchantora at 2007-7-29 11:51:02
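
One possible way to deal with the punctuation question raised above is to match with "https?://\\S+" and then strip sentence punctuation from the end of each match. This is only a sketch: the class name UrlPunctuationSketch, the method findUrls, and the choice of trailing characters (.,;:!?) are assumptions made for illustration, and a URL that legitimately ends in one of those characters would be cut short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPunctuationSketch {

    // A run of non-whitespace after the scheme; punctuation is trimmed afterwards.
    private static final Pattern URL =
            Pattern.compile("https?://\\S+", Pattern.CASE_INSENSITIVE);

    // Assumption: these characters count as sentence punctuation when they end a match.
    private static final String TRAILING = ".,;:!?";

    public static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String url = m.group();
            int end = url.length();
            // Walk back over any trailing punctuation characters.
            while (end > 0 && TRAILING.indexOf(url.charAt(end - 1)) >= 0) {
                end--;
            }
            urls.add(url.substring(0, end));
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "Please visit this site http://www.google.com, or perform direct "
                + "image search: http://images.google.com/images. Thank you.";
        for (String url : findUrls(text)) {
            System.out.println(url);
        }
    }
}
```

For the second sample sentence this prints the two URLs without the comma and the final dot.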
# 3

Vishwas's problem is to find URLs inside the string as well as at the end of the string. '$' alone will help only if the URL is at the end of the string.
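
To cover both positions with one pattern, '$' can be combined with the whitespace test. The sketch below uses a lookahead, (?=\\s|$), so the match ends before whitespace or at the end of the input without consuming the whitespace, which means no trim() is needed. The class name UrlEndSketch and the method firstUrl are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlEndSketch {

    // (?=\\s|$) requires whitespace or end of input after the URL, but does not consume it.
    private static final Pattern URL =
            Pattern.compile("https?://.+?(?=\\s|$)", Pattern.CASE_INSENSITIVE);

    // Returns the first URL found in the text, or null if there is none.
    public static String firstUrl(String text) {
        Matcher m = URL.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // URL in the middle of the string
        System.out.println(firstUrl("Please visit http://www.google.com/images today"));
        // URL at the very end of the string
        System.out.println(firstUrl(
                "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747"));
    }
}
```

Note that a dot directly after the URL (as in "...7747. Thank you") is still part of the match with this pattern.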

Kiran_Joisa at 2007-7-29 11:51:02
# 4

> The end of the input is '$' as a regex.

import java.util.regex.*;

public class Prasanna {

    public static void main(String[] args) {
        String text
            = "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747";
        //String regex = "(http|https)://.+?(?:\\s|$)"; // this works
        String regex = "(http|https)://[^ ]+"; // this also works
        Pattern pat = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher mat = pat.matcher(text);
        while (mat.find()) {
            System.out.println(mat.group());
        }
    }
}
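
Building on the [^ ]+ pattern above, the replacement step from the original question could be sketched with Matcher.appendReplacement and Matcher.appendTail instead of index arithmetic. The class name UrlLinkSketch and the method linkify are hypothetical. Matcher.quoteReplacement matters here because the sample URL contains '$', which appendReplacement would otherwise treat as a group reference. The sketch does no HTML escaping and, like the pattern it uses, keeps a trailing dot as part of the URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlLinkSketch {

    private static final Pattern URL =
            Pattern.compile("https?://[^ ]+", Pattern.CASE_INSENSITIVE);

    // Wraps every URL found in plainText in an <a href='...'> element.
    public static String linkify(String plainText) {
        Matcher m = URL.matcher(plainText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group();
            String anchor = "<a href='" + url + "'>" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as group references or escapes.
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        // Copy whatever text follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "Please visit this site http://www.google.com/e/qHvQcWco`~!@#$%^&*()-7747"));
    }
}
```

Because each match is replaced at its own position, the same URL appearing twice in the text is handled without the indexOf lookups used in the question.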

hiwaa at 2007-7-29 11:51:02
# 5

Thank you very much, Hiwa.

Problem solved.

Vishwas_Prasannaa at 2007-7-29 11:51:02