Question on regex

Hi,

I am trying to write a regular expression that matches everything within < >, brackets included, so that it can be removed.

From:-

123

<a>456

To:-

123456

The objective is to remove all the HTML tags. Can anybody shed some light on a regular expression that could cater for this?

Thanks.

Joseph

[376 byte] By [yjhung011a] at [2007-10-2 21:03:35]
# 1
Try this:

String str = "123<a>456";
String regex = "[<][\\D]*[>]";
System.out.println(str.replaceAll(regex, ""));

Good luck.
prometheuzza at 2007-7-13 23:48:42 > top of Java-index,Java Essentials,New To Java...
# 2

Note: my previous (naive) pattern only works when the tags themselves contain no digits and the text between the tags consists of digits. Check this page for details on regex patterns:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
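
To make that limitation concrete, here is a small sketch that runs the pattern from reply #1 against three inputs. The class name NaivePatternDemo and the helper strip are made up for this illustration; only String.replaceAll comes from the standard library.

```java
class NaivePatternDemo {
    // Pattern from reply #1: '<', then a greedy run of non-digit characters, then '>'.
    static final String NAIVE = "[<][\\D]*[>]";

    // Hypothetical helper: removes every match of the naive pattern.
    static String strip(String input) {
        return input.replaceAll(NAIVE, "");
    }

    public static void main(String[] args) {
        // Digits on both sides of the tag: the tag is removed.
        System.out.println(strip("123<a>456"));      // prints 123456

        // Letters between two tags: the greedy \D* runs to the last '>',
        // so the text between the tags is removed as well.
        System.out.println(strip("<b>bold</b>"));    // prints an empty line

        // A digit inside the tag name stops \D* before '>', so nothing matches.
        System.out.println(strip("<h1>Title</h1>")); // prints <h1>Title</h1>
    }
}
```

So the pattern behaves as intended only for input shaped like the original example.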

If you're interested in parsing real HTML files, I suggest using an HTML parser:

http://java-source.net/open-source/html-parsers

Good luck.

prometheuzza at 2007-7-13 23:48:42
# 3
str = str.replaceAll("<[^>]*>", "");
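
For anyone who wants to try it, a minimal self-contained sketch of that one-liner follows. The class name TagStripper and the method stripTags are invented for the example, and the pattern is precompiled only as a convenience for repeated calls. The character class [^>] matches any character except '>', so each match stops at the first closing bracket.

```java
import java.util.regex.Pattern;

class TagStripper {
    // '<', then any characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns the input with every <...> span removed.
    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("123<a>456"));                       // prints 123456
        System.out.println(stripTags("<b>bold</b> and <h1>Title</h1>"));  // prints bold and Title
    }
}
```

This is fine for simple input like the example in the question. It will likely go wrong on real HTML where a '>' appears inside an attribute value or a comment, which is where a proper parser is the safer choice.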
uncle_alicea at 2007-7-13 23:48:43
# 4
> str = str.replaceAll("<[^>]*>", "");

Ah yes, that's it. Thanks uncle_alice.
prometheuzza at 2007-7-13 23:48:43
# 5
Thanks everyone!
yjhung011a at 2007-7-13 23:48:43