How to use replaceAll to replace one URL with another

I'm parsing some HTML, and need to replace some URLs with another URL. Here's an example:

String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";

// I would normally extract the src value from the String, but for our purposes I am just hardcoding it

String src = "http://mywebsite.com/img1.gif?blah1=test&blah2=test";

// Strings are immutable, so the result has to be assigned
imgTag = imgTag.replaceAll(src, "http://anothersite.com/img2.gif");

Is there an 'easy' way to do this without having to go through character by character and escaping all of the regex characters?

[696 byte] By [apc123a] at [2007-10-3 3:17:22]
# 1

The regex depends on how much of the cat you want to skin!

String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";

System.out.println(imgTag.replaceAll("'http://[^']+'", "'http://anothersite.com/img2.gif'"));

sabre150a at 2007-7-14 21:08:56 > top of Java-index,Java Essentials,Java Programming...
# 2
Is it just that you want to make

String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";

changed to:

imgTag = "<img src='http://anothersite.com/img2.gif'>"

?

abillconsla at 2007-7-14 21:08:56
# 3

The example you gave doesn't work correctly. I tried this as an example

String str1 = "<a href=\"http://mysite.com/\"><img src=\"http://mysite.com/image.gif/\" /></a>";

String str2 = "http://mysite.com/image2.gif";

System.out.println(str1.replaceAll("\"http://[^']+\"", str2));

...

// Output

// 06/08/22 10:47:21 <a href=http://mysite.com/image2.gif /></a>

Also, I need a way to replace certain URLs, not all of them. In the example above, I would only want to replace the src URL, not the href URL. The only solution that I have currently is the following, but it seems like overkill on the replaceAll calls.

String url1 = "http://mysite.com/image1.gif";

String url2 = "http://mysite.com/image2.gif";

String html = "<a href=\"http://mysite.com/\"><img src=\"http://mysite.com/image1.gif/\" /></a>";

// Escape each regex metacharacter by hand.
// "$" is also special in the replacement string, so it needs its own escape there.
url1 = url1.replaceAll("\\(", "\\\\(").replaceAll("\\)", "\\\\)")
        .replaceAll("\\[", "\\\\[").replaceAll("\\]", "\\\\]")
        .replaceAll("\\.", "\\\\.").replaceAll("\\*", "\\\\*")
        .replaceAll("\\?", "\\\\?").replaceAll("\\+", "\\\\+")
        .replaceAll("\\^", "\\\\^").replaceAll("\\$", "\\\\\\$");

System.out.println(html.replaceAll(url1, url2));

// Output

// 06/08/22 10:54:15 <a href="http://mysite.com/"><img src="http://mysite.com/image2.gif/" /></a>

apc123a at 2007-7-14 21:08:56
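
The unexpected output in post #3 appears to come from the character class: the delimiters were changed to double quotes, but the class still excludes only the single quote, so the greedy [^']+ runs on to the last double quote in the string. The replacement string also drops the surrounding quotes. A minimal sketch of a corrected version follows; the class and method names are hypothetical, and it still replaces every quoted URL, not just the src one.

```java
class QuotedUrlDemo {
    // Replaces every double-quoted http:// URL with newUrl, keeping the quotes.
    // Assumes newUrl contains no '$' or '\', which are special in replacement strings.
    static String replaceQuotedUrls(String html, String newUrl) {
        // [^\"]+ stops at the next double quote, so each URL is matched on its own.
        return html.replaceAll("\"http://[^\"]+\"", "\"" + newUrl + "\"");
    }

    public static void main(String[] args) {
        String str1 = "<a href=\"http://mysite.com/\"><img src=\"http://mysite.com/image.gif/\" /></a>";
        System.out.println(replaceQuotedUrls(str1, "http://mysite.com/image2.gif"));
        // prints: <a href="http://mysite.com/image2.gif"><img src="http://mysite.com/image2.gif" /></a>
    }
}
```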
# 4
Yes.
apc123a at 2007-7-14 21:08:56
# 5

> Is it just that you want to make
>
> String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
>
> Changed to:
>
> imgTag = "<img src='http://anothersite.com/img2.gif'>"
>
> ?

Yes to this.

apc123a at 2007-7-14 21:08:56
# 6

String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";

System.out.println(imgTag.replaceAll("src=['\"]http://[^\"']+['\"]", "src='http://anothersite.com/img2.gif'"));

sabre150a at 2007-7-14 21:08:56
# 7
Then why change it? Why not just reset the value of the variable:

String newValue = "'http://anothersite.com/img2.gif'";

imgTag = "<img src=" + newValue + ">";

It's probably me being dense.

abillconsla at 2007-7-14 21:08:56
# 8

> String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
>
> System.out.println(imgTag.replaceAll("src=['\"]http://[^\"']+['\"]", "src='http://anothersite.com/img2.gif'"));

This sort of works. But suppose I have HTML such as

<a href="http://mysite1.com"><img src="http://mysite1.com/image1.gif" /></a><a href="http://mysite2.com"><img src="http://mysite2.com/image2.gif" /></a>

What if I only want to replace the first image URL and not the second one?

If my solution is the only one, just tell me! I am just not that familiar with regular expressions, so I didn't know if there was an easier way...

apc123a at 2007-7-14 21:08:56
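
One possible way to replace a single, known image URL is to combine the src= prefix from post #6 with the \p{Punct} escaping suggested later in the thread, so that the old URL is matched literally and only inside a src attribute. This is a sketch under assumptions, not a robust HTML rewriter: the names ReplaceOneSrc, escapeLiteral and replaceSrc are hypothetical, the attribute is assumed to be written exactly as src= followed by a quote, and the new URL is assumed to contain no '$' or '\'. It uses only java.util.regex, which is available on JDK 1.4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceOneSrc {
    // Backslash-escape every punctuation character so the text is matched literally.
    static String escapeLiteral(String s) {
        return s.replaceAll("\\p{Punct}", "\\\\$0");
    }

    // Replace oldUrl with newUrl, but only where it is the whole value of a src attribute.
    static String replaceSrc(String html, String oldUrl, String newUrl) {
        // Group 1 is src= plus the opening quote, group 2 is the quote character itself.
        // The back-reference \2 requires the closing quote to match the opening one.
        Pattern p = Pattern.compile("(src=(['\"]))" + escapeLiteral(oldUrl) + "\\2");
        Matcher m = p.matcher(html);
        return m.replaceAll("$1" + newUrl + "$2");
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://mysite1.com\"><img src=\"http://mysite1.com/image1.gif\" /></a>"
                + "<a href=\"http://mysite2.com\"><img src=\"http://mysite2.com/image2.gif\" /></a>";
        System.out.println(replaceSrc(html, "http://mysite1.com/image1.gif",
                "http://anothersite.com/img2.gif"));
    }
}
```

With the sample HTML from the post above, only the first img tag changes; the href attributes and the second image are left alone because they do not match the escaped URL inside a src attribute.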
# 9
If you're using JDK5, you can also use the replace(CharSequence, CharSequence) method and not have to deal with this regex stuff. If not, here's an easier way to escape the string:

url1 = url1.replaceAll("\\p{Punct}", "\\\\$0");
uncle_alicea at 2007-7-14 21:08:56
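
As a rough sketch of the escaping approach in post #9 applied to the original question: the helper below escapes the search text with \p{Punct} and also escapes '\' and '$' in the replacement, since replaceAll treats those two specially there. The names LiteralReplace, escapeRegex, escapeReplacement and replaceLiteral are hypothetical; nothing here needs JDK 5.

```java
class LiteralReplace {
    // Backslash-escape every punctuation character so the regex matches the text literally.
    static String escapeRegex(String literal) {
        return literal.replaceAll("\\p{Punct}", "\\\\$0");
    }

    // In a replacement string only '\' and '$' are special; prefix each with a backslash.
    static String escapeReplacement(String replacement) {
        return replacement.replaceAll("([\\\\$])", "\\\\$1");
    }

    // Replace every literal occurrence of target in text with replacement.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replaceAll(escapeRegex(target), escapeReplacement(replacement));
    }

    public static void main(String[] args) {
        String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
        String src = "http://mywebsite.com/img1.gif?blah1=test&blah2=test";
        System.out.println(replaceLiteral(imgTag, src, "http://anothersite.com/img2.gif"));
        // prints: <img src='http://anothersite.com/img2.gif' />
    }
}
```

On JDK 5 and later, imgTag.replace(src, "http://anothersite.com/img2.gif") should give the same result without any escaping.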
# 10

> If you're using JDK5, you can also use the replace(CharSequence, CharSequence) method and not have to deal with this regex stuff. If not, here's an easier way to escape the string: url1 = url1.replaceAll("\\p{Punct}", "\\\\$0");

Thanks. I didn't know about the {Punct} character class.

This works, but I'm curious as to why the correct text is still found and replaced even when some of the punctuation that is escaped isn't a regex special character?

apc123a at 2007-7-14 21:08:56
# 11

> > If you're using JDK5, you can also use the replace(CharSequence, CharSequence) method and not have to deal with this regex stuff. If not, here's an easier way to escape the string: url1 = url1.replaceAll("\\p{Punct}", "\\\\$0");
>
> Thanks. I didn't know about the {Punct} character class.
>
> This works, but I'm curious as to why the correct text is still found and replaced even when some of the punctuation that is escaped isn't a regex special character?

By the way - I can't use JDK 1.5 for this project that I'm working on...

apc123a at 2007-7-14 21:08:56
# 12
Unless you are using strictly structured HTML (usually machine generated), it generally isn't a good idea to rely on regex for processing it. Either you are constantly adding special cases or you end up with a parser.
jschella at 2007-7-14 21:08:56
# 13

> This works, but I'm curious as to why the correct text is still found and replaced even when some of the punctuation that is escaped isn't a regex special character?

It's because they may want to assign special meaning to other punctuation characters in the future. Notice that you can't do the same thing with letters. If you put a backslash in front of a 'j', for instance, it will throw an exception, because they might want to assign a meaning to \j in the future.

uncle_alicea at 2007-7-14 21:08:56
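
The behaviour described in post #13 can be checked with a short sketch: a backslash in front of a non-alphabetic character is accepted and matches that character literally, while a backslash in front of a letter with no assigned meaning is rejected at compile time. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeRules {
    // Returns true if the regex compiles, false if the pattern syntax is rejected.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ':' , '/' and '-' are not metacharacters here, but escaping them is harmless.
        System.out.println("http://a-b".matches("http\\:\\/\\/a\\-b")); // prints: true
        // 'j' has no meaning as an escape, so \j is a syntax error.
        System.out.println(compiles("\\j")); // prints: false
    }
}
```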
# 14

> By the way - I can't use JDK 1.5 for this project that I'm working on...

By the way, despite what the docs say, it isn't thread safe if that matters to you. Or at least I believe that particular bug was only fixed in 1.5.

If you prime the pump by running one query that uses a class like 'Punct' (any actual search will do) before using it from multiple threads, then it is OK.

jschella at 2007-7-14 21:08:56