What's wrong with the regular expression?

Hi all,

For the life of me I cannot figure out what is wrong with this regular expression:

.*\bA specific phrase\.\b.*

This is just an example; the actual phrase can be any specific phrase. My problem comes when the specific phrase ends in a period. I've escaped the period but it still gives me an error. The only time I don't get an error is when I take off the end boundary character, which will not suffice as a solution. I need to be able to capture all the text before and after said phrase. If the phrase doesn't have a period it would look like this...

.*\bA specific phrase\b.*

which works fine. So what is it about the \.\b combination that is not matching?

I've been banging my head on this for a while and I'm getting nowhere.

The application highlights text that comes from a server. The user builds custom highlights that have some options: highlight entire line, match partial word, and ignore case. The code that builds my pattern is here:

    String strHighlight = _strHighlight;
    strHighlight = strHighlight.replaceAll("\\*", "\\\\*");
    strHighlight = strHighlight.replaceAll("\\.", "\\\\.");
    String strPattern = strHighlight;
    if (_bEntireParagraph)
    {
        if (_bPartialWord)
            strPattern = ".*" + strHighlight + ".*";
        else
            strPattern = ".*\\b" + strHighlight + "\\b.*";
    }
    else
    {
        if (_bPartialWord)
            strPattern = strHighlight;
        else
            strPattern = "\\b" + strHighlight + "\\b";
    }
    if (_bIgnoreCase)
        _patHighlight = Pattern.compile(strPattern, Pattern.CASE_INSENSITIVE);
    else
        _patHighlight = Pattern.compile(strPattern);

So, for example, say I'm matching the phrase: The dog ate the cat. And that phrase came over in the following text: Look there's a dog. The dog ate the cat. "Oh my!"

If my user has the entire line and ignore case options selected, then my regex would look like this: .*\bThe dog ate the cat\.\b.*

That should get highlighted, but for some reason it doesn't. Correct me if I'm wrong but doesn't the regex read as follows:

any characters

word boundary

The dog ate the cat[period]

word boundary

any characters until newline.
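
The behaviour described above can be reproduced in isolation. The following is only a minimal sketch: the class name BoundaryRepro and the helper matchesLine are made up for illustration, and it assumes the text arrives as a single line and is tested with Matcher.matches() under the ignore case option.

```java
import java.util.regex.Pattern;

class BoundaryRepro {
    // Text from the example above, assumed to arrive as a single line.
    static final String TEXT = "Look there's a dog. The dog ate the cat. \"Oh my!\"";

    // Hypothetical helper: compiles the regex with the ignore case option
    // and tests it against the whole line.
    static boolean matchesLine(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(TEXT)
                      .matches();
    }

    public static void main(String[] args) {
        // Phrase without the period: the trailing \b sits between 't' and '.'.
        System.out.println(matchesLine(".*\\bThe dog ate the cat\\b.*"));    // true
        // Phrase with the period: the trailing \b must sit between '.' and ' '.
        System.out.println(matchesLine(".*\\bThe dog ate the cat\\.\\b.*")); // false
    }
}
```

Only the position of the trailing \b differs between the two patterns, which narrows the problem down to what follows the escaped period.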

Any help will be much appreciated.

[2916 byte] By [James057a] at [2007-11-27 11:45:16]
# 1

A word boundary (in the context of regexes) is a position that is either followed by a word character and not preceded by one (start of word) or preceded by a word character and not followed by one (end of word). A word character is defined as a letter, a digit, or an underscore. Since a period is not a word character, the only way the position following it could be a word boundary is if the next character is a letter, digit or underscore. But a sentence-ending period is normally followed by whitespace, if anything, so it makes no sense to look for a word boundary there. I think, instead of \b, you should use negative lookarounds, like so:

strPattern = ".*(?<!\\w)" + strHighlight + "(?!\\w).*";
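
To make the difference concrete, here is a small sketch of the lookaround version run against the example text from the question. The class name LookaroundHighlight and the method wholePhraseLine are invented for this illustration, and the phrase is assumed to be escaped already, as the original code does before building the pattern.

```java
import java.util.regex.Pattern;

class LookaroundHighlight {
    // Builds the "entire line, whole word" pattern with negative lookarounds
    // instead of \b. The phrase is assumed to be escaped by the caller.
    static Pattern wholePhraseLine(String escapedPhrase) {
        return Pattern.compile(".*(?<!\\w)" + escapedPhrase + "(?!\\w).*",
                               Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String text = "Look there's a dog. The dog ate the cat. \"Oh my!\"";

        // The period is followed by a space, so (?!\w) succeeds where \b failed.
        Pattern p = wholePhraseLine("The dog ate the cat\\.");
        System.out.println(p.matcher(text).matches());                               // true

        // A phrase buried inside a longer word is still rejected.
        System.out.println(wholePhraseLine("cat").matcher("concatenate").matches()); // false
    }
}
```

The lookarounds only require that the phrase is not touching a word character on either side, so they behave like \b when the phrase starts and ends with a letter and still work when it ends in punctuation.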


uncle_alicea at 2007-7-29 18:00:47 > top of Java-index,Java Essentials,Java Programming...
# 2

Uncle_alice, did you ever know that you're my hero?

~

yawmarka at 2007-7-29 18:00:47
# 3

It's all in the reflexes.

^_^

uncle_alicea at 2007-7-29 18:00:47
# 4

And I thought I knew something about regular expressions. Thanks, I'll give this a try and post the results.

James057a at 2007-7-29 18:00:47
# 5

That worked. Thanks!

James057a at 2007-7-29 18:00:47