How to use replaceAll to replace one URL with another
I'm parsing some HTML, and need to replace some URLs with another URL. Here's an example:
String imgTag ="<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
// I would normally extract the src value from the String, but for our purposes I am just hardcoding it
String src ="http://mywebsite.com/img1.gif?blah1=test&blah2=test";
imgTag = imgTag.replaceAll(src, "http://anothersite.com/img2.gif");
Is there an 'easy' way to do this without having to go through character by character and escaping all of the regex characters?
By apc123a at 2007-10-3 3:17:22

The regex depends on how much of the cat you want to skin!
String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
System.out.println(imgTag.replaceAll("'http://[^']+'", "'http://anothersite.com/img2.gif'"));
Is it just that you want to change
String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
to:
imgTag = "<img src='http://anothersite.com/img2.gif'>";
?
The example you gave doesn't work correctly. I tried this as an example
String str1 = "<a href=\"http://mysite.com/\"><img src=\"http://mysite.com/image.gif/\" /></a>";
String str2 = "http://mysite.com/image2.gif";
System.out.println(str1.replaceAll("\"http://[^']+\"", str2));
...
// Output
// 06/08/22 10:47:21 <a href=http://mysite.com/image2.gif /></a>
Also, I need a way to replace certain URLs, not all of them. In the example above, I would only want to replace the src URL, not the href URL. The only solution that I have currently is the following, but it seems like overkill on the replaceAll calls.
String url1 = "http://mysite.com/image1.gif";
String url2 = "http://mysite.com/image2.gif";
String html = "<a href=\"http://mysite.com/\"><img src=\"http://mysite.com/image1.gif/\" /></a>";
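// Escape each regex metacharacter so that url1 is matched literally.
url1 = url1.replaceAll("\\(", "\\\\(")
           .replaceAll("\\)", "\\\\)")
           .replaceAll("\\[", "\\\\[")
           .replaceAll("\\]", "\\\\]")
           .replaceAll("\\.", "\\\\.")
           .replaceAll("\\*", "\\\\*")
           .replaceAll("\\?", "\\\\?")
           .replaceAll("\\+", "\\\\+")
           .replaceAll("\\^", "\\\\^")
           // A bare $ in a replacement string is read as a group reference, so it needs its own backslash.
           .replaceAll("\\$", "\\\\\\$");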
System.out.println(html.replaceAll(url1, url2));
// Output
// 06/08/22 10:54:15 <a href="http://mysite.com/"><img src="http://mysite.com/image2.gif/" /></a>
> Is it just that you want to change
> String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
> to:
> imgTag = "<img src='http://anothersite.com/img2.gif'>";
> ?
Yes to this.
String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
System.out.println(imgTag.replaceAll("src=['\"]http://[^\"']+['\"]", "src='http://anothersite.com/img2.gif'"));
Message was edited by:
sabre150
Then why change it? Why not just reset the value of the variable:
String newValue = "'http://anothersite.com/img2.gif'";
imgTag = "<img src=" + newValue + ">";
It's probably me being dense.
> String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
> System.out.println(imgTag.replaceAll("src=['\"]http://[^\"']+['\"]", "src='http://anothersite.com/img2.gif'"));
This sort of works. If I have html such as
<a href="http://mysite1.com"><img src="http://mysite1.com/image1.gif" /></a><a href="http://mysite2.com"><img src="http://mysite2.com/image2.gif" /></a>
What if I only want to replace the first image url and not the second one?
If my solution is the only one, just tell me! I am just not that familiar with regular expressions, so I didn't know if there was an easier way...
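For the "first image only" case, one option is replaceFirst, which is already available in JDK 1.4. The following is only a sketch under some assumptions: the class and method names are made up for illustration, the markup is assumed to be regular enough that the first src attribute is the one you want, and the new URL is assumed to contain no $ or \ characters (those have special meaning in a replacement string).

```java
class FirstSrcOnly {
    // Hypothetical helper: swaps the URL of the first src attribute and leaves everything else alone.
    static String replaceFirstSrc(String html, String newUrl) {
        // Group 1 remembers which quote opened the attribute; \1 requires the same quote to close it.
        // The "src=" prefix keeps href attributes from matching.
        return html.replaceFirst("src=([\"'])http://[^\"']+\\1", "src=$1" + newUrl + "$1");
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://mysite1.com\"><img src=\"http://mysite1.com/image1.gif\" /></a>"
                + "<a href=\"http://mysite2.com\"><img src=\"http://mysite2.com/image2.gif\" /></a>";
        // Only the first img tag gets the new URL; the second img and both href values are untouched.
        System.out.println(replaceFirstSrc(html, "http://anothersite.com/img2.gif"));
    }
}
```

If a specific image rather than the first one has to be targeted, the pattern would need that image's URL in it, escaped so that it is matched literally.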
If you're using JDK5, you can also use the replace(CharSequence, CharSequence) method and not have to deal with this regex stuff. If not, here's an easier way to escape the string: url1 = url1.replaceAll("\\p{Punct}", "\\\\$0");
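To make the two options concrete, here is a rough sketch using the URLs from the original question. The class and helper names are invented for the example, and it assumes the replacement URL contains no $ or \ characters, since replaceAll treats those specially in the replacement string.

```java
class LiteralUrlReplace {
    // Put a backslash in front of every ASCII punctuation character,
    // so the resulting pattern matches the original text literally.
    static String escapeForRegex(String literal) {
        return literal.replaceAll("\\p{Punct}", "\\\\$0");
    }

    // Hypothetical helper: replace every literal occurrence of target using only JDK 1.4 calls.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replaceAll(escapeForRegex(target), replacement);
    }

    public static void main(String[] args) {
        String imgTag = "<img src='http://mywebsite.com/img1.gif?blah1=test&blah2=test' />";
        String src = "http://mywebsite.com/img1.gif?blah1=test&blah2=test";
        String newSrc = "http://anothersite.com/img2.gif";

        // Unescaped, the '?' makes the preceding 'f' optional instead of matching a literal '?',
        // so the pattern does not match and the tag comes back unchanged.
        System.out.println(imgTag.replaceAll(src, newSrc));

        // JDK 1.4: escape the URL first, then use replaceAll.
        System.out.println(replaceLiteral(imgTag, src, newSrc));

        // JDK 5 and later: replace(CharSequence, CharSequence) involves no regex at all.
        System.out.println(imgTag.replace(src, newSrc));
    }
}
```

On JDK 5 and later the plain replace call is the simpler choice; the escaping helper is only worth having when you are stuck on 1.4.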
> If you're using JDK5, you can also use the replace(CharSequence, CharSequence) method and not have to deal with this regex stuff. If not, here's an easier way to escape the string:
> url1 = url1.replaceAll("\\p{Punct}", "\\\\$0");
Thanks. I didn't know about the {Punct} character class.
This works, but I'm curious as to why the correct text is still found and replaced even when some of the punctuation that is escaped isn't a regex special character?
By the way - I can't use JDK 1.5 for this project that I'm working on...
Unless you are using strictly structured HTML (usually machine generated), it generally isn't a good idea to rely on regex for processing it. Either you are constantly adding special cases or you end up with a parser.
> This works, but I'm curious as to why the correct
> text is still found and replaced even when some of
> the punctuation that is escaped isn't a regex special
> character?
It's because they may want to assign special meaning to other punctuation characters in the future. Notice that you can't do the same thing with letters. If you put a backslash in front of a 'j', for instance, it will throw an exception, because they might want to assign a meaning to \j in the future.
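A quick way to see the difference for yourself is something like the following sketch. The class and method names are just for illustration, and it relies on the documented behaviour that a backslash before a non-alphabetic character is always allowed, while a backslash before a letter with no defined meaning is rejected.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeRules {
    // Hypothetical helper: reports whether the regex engine accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ':' is not a regex metacharacter, but escaping it is harmless: it still matches a plain ':'.
        System.out.println("a:b".matches("a\\:b"));
        // '.' is a metacharacter; escaped, it matches only a literal dot.
        System.out.println("a.b".matches("a\\.b"));
        System.out.println("axb".matches("a\\.b"));
        // 'j' has no defined escape, so the pattern is rejected at compile time.
        System.out.println(compiles("\\j"));
    }
}
```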
> By the way - I can't use JDK 1.5 for this project
> that I'm working on...
By the way, despite what the docs say, it isn't thread safe if that matters to you. Or at least I believe that particular bug was only fixed in 1.5.
If you prime the pump by running a query (any actual search) with one of the classes like 'Punct' before using it in threads, then it is OK.