String Regular Expression for uncommon characters

Hi,

I am trying to get text out of an HTML file, for which I am using the EditorKit and Document classes. After I obtain the text, the String contains some characters like á. This character looks like an "a" with a French-style acute accent. My problem is how to use a regular expression to find and replace (with the replaceAll method) these unwanted characters.
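For context, here is a minimal sketch of the extraction step described above. It assumes Swing's HTMLEditorKit (the HTML subclass of EditorKit) and an HTML string that has already been read into memory. The class and method names (HtmlToText, extractText) are hypothetical. If the accented characters turn out to be garbage rather than real letters, one thing worth checking is whether the file was read with the wrong charset. That cause is an assumption; the post does not confirm it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlToText {
    // Hypothetical helper: parse an HTML string with HTMLEditorKit and
    // return the plain text held by the resulting Document.
    static String extractText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            // Parse the markup into the document, starting at offset 0.
            kit.read(new StringReader(html), doc, 0);
            // Pull out all of the text content (tags are not included).
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
    }

    public static void main(String[] args) {
        String text = extractText("<html><body><p>gar\u00e7on</p></body></html>");
        System.out.println(text.trim());
    }
}
```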

Is there a regular expression pattern for such characters?
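One general pattern that may fit, assuming "unwanted" simply means anything outside 7-bit ASCII, is a negated character class over U+0000 to U+007F. The sketch below uses it with replaceAll. The names StripNonAscii and replaceNonAscii are hypothetical. Note that it deletes or masks the accented letters outright rather than converting them to plain letters.

```java
public class StripNonAscii {
    // Hypothetical helper: replace every character outside the 7-bit ASCII
    // range (\x00 to \x7F) with the given replacement string.
    static String replaceNonAscii(String s, String replacement) {
        return s.replaceAll("[^\\x00-\\x7F]", replacement);
    }

    public static void main(String[] args) {
        String in = "caf\u00e9 cr\u00e8me";
        // Remove the accented characters entirely.
        System.out.println(replaceNonAscii(in, ""));
        // Or mark where they were.
        System.out.println(replaceNonAscii(in, "?"));
    }
}
```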

Thanks!

Rahul.

By Rahul.Joshia at 2007-10-1 23:45:36
# 1
I'm too lazy to test this myself, but can't you use Unicode escape sequences (\u####) in your regular expressions?
paulcwa at 2007-7-15 15:36:21
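A small sketch of the two ways a \u#### escape can reach a Java regex. The class and method names are hypothetical. With a single backslash, javac converts the escape into the character itself before the regex engine ever sees it. With a doubled backslash, the pattern string keeps the six-character escape, and java.util.regex decodes it. Both should match U+00E7 (c with cedilla).

```java
public class UnicodeEscapes {
    // Form 1: javac converts the Unicode escape in this literal into the
    // character itself, so the regex engine receives a one-character pattern.
    static String viaSourceEscape(String s) {
        return s.replaceAll("\u00e7", "c");
    }

    // Form 2: the doubled backslash keeps the six characters of the escape
    // in the pattern string, and java.util.regex decodes them to U+00E7.
    static String viaRegexEscape(String s) {
        return s.replaceAll("\\u00e7", "c");
    }

    public static void main(String[] args) {
        String in = "gar\u00e7on";
        System.out.println(viaSourceEscape(in));
        System.out.println(viaRegexEscape(in));
    }
}
```

The second form is handy when the pattern is built at runtime, for example read from a config file, because no compiler is involved in that case.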
# 2

Hmm, I would recommend looking at the specific patterns. A simplified site to use as a reference is here: http://www.p3m.org/wiki?regex

If you don't know regular expressions, use

http://www.perl.com/doc/manual/html/pod/perlre.html

The only way I could think of to construct the regex is to use \s and add the characters you want to that character class :s You could also look into regex lookahead and lookbehind...

m0Oa at 2007-7-15 15:36:21
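Here is a sketch of the "list the characters you want" idea as a character class built from Unicode escapes. The set of accented letters, the class name UnwantedChars, and the mask helper are all hypothetical and only for illustration. A Matcher is used so the code can both report each hit and then replace them all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnwantedChars {
    // Hypothetical set of unwanted characters: a with grave/acute/circumflex,
    // c with cedilla, e with acute/grave.
    private static final Pattern UNWANTED =
            Pattern.compile("[\\u00e0\\u00e1\\u00e2\\u00e7\\u00e9\\u00e8]");

    // Replace every unwanted character with an underscore.
    static String mask(String s) {
        return UNWANTED.matcher(s).replaceAll("_");
    }

    public static void main(String[] args) {
        String in = "Voil\u00e0, un gar\u00e7on \u00e9l\u00e8ve";
        Matcher m = UNWANTED.matcher(in);
        // Report each match and its position in the input.
        while (m.find()) {
            System.out.println("found '" + m.group() + "' at index " + m.start());
        }
        System.out.println(mask(in));
    }
}
```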
# 3
OK, I did test it, and it works:

    String in = "gar\u00e7on";
    System.out.println(in);
    in = in.replaceAll("\u00e7", "c");
    System.out.println(in);

Trivial.
paulcwa at 2007-7-15 15:36:21
# 4
If what you are looking for is removing the accent (i.e. changing á to a and so on), have a look at this: http://www.rgagnon.com/javadetails/java-0456.html
jsalonena at 2007-7-15 15:36:21
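The linked page is not reproduced here. The sketch below shows one common way to strip accents, using java.text.Normalizer, which is available since Java 6. It is not necessarily the technique the page describes. NFD decomposition splits each accented letter into its base letter plus a combining mark, and a Unicode-block regex then deletes the marks. RemoveAccents and stripAccents are hypothetical names.

```java
import java.text.Normalizer;

public class RemoveAccents {
    // Decompose accented letters (NFD) into base letter + combining mark,
    // then delete the combining marks with a Unicode-block regex.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("gar\u00e7on \u00e1 \u00e9l\u00e8ve"));
    }
}
```

Unlike deleting non-ASCII characters, this keeps the base letter, so "garçon" becomes "garcon" rather than "garon". Characters with no decomposition, such as the German ß, pass through unchanged.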