Regular Expression
Hi,
I am new to Java and have been trying to solve this for several days without luck. Could anyone here help me out?
I need to write a regular expression for a word-search function. It should match the target word when it is followed by whitespace or punctuation. Also, if the word is in its base form, say "bake", the program should also search for "baked" and "baking".
Any advice would be appreciated.
Thank you very much in advance.
http://www.regular-expressions.info/
- Saish
> if the word is original form, say "bake", the program should also search for "baked" and "baking".
This will be tough. You could search on 'bak' as a prefix, but there might be other words that match. Having a computer figure out conjugations of words will be a hard problem. The best idea I can think of would be to leverage an existing dictionary API. But again, this will not be trivial.
- Saish
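To make the prefix idea a little more concrete, here is a rough sketch that builds one pattern for a base word plus its "-ed" and "-ing" forms. The class name NaiveInflections and the helper forWord are made up for illustration, and the single rule used (drop a trailing "e" before adding the suffix) is an assumption that only covers regular verbs like "bake" or "walk". It will not handle doubled consonants ("stopped") or irregular forms, which is where a dictionary or stemming library would come in.

```java
import java.util.regex.Pattern;

final class NaiveInflections {
    // Hypothetical helper: matches the base word, or the stem followed by "ed"/"ing".
    // The stem is the base word with one trailing "e" removed (a naive rule).
    static Pattern forWord(String base) {
        String stem = base.endsWith("e") ? base.substring(0, base.length() - 1) : base;
        String regex = "\\b(?:" + Pattern.quote(base)
                + "|" + Pattern.quote(stem) + "(?:ed|ing))\\b";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = forWord("bake");
        for (String s : new String[] {"bake", "baked", "baking", "baker", "bakery"}) {
            // Only the first three are accepted; "baker" and "bakery" are rejected.
            System.out.println(s + " -> " + p.matcher(s).matches());
        }
    }
}
```

Unlike a bare "bak" prefix, this does not match "bakery" or "baker", but it is still only a heuristic.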
Thank you. The way I tried to solve the first problem is with this regular expression: target_word + "\\W|\\s". But this doesn't work. Any ideas?
> ...
> But this doesn't work, any ideas?
Yes, see reply #1.
Thank you. I read through the article you suggested and other regular expression tutorials online, but I still can't tell why the expression I wrote doesn't work.
\\W matches a non-word character.
\\s matches a whitespace character.
What I wrote should match a word that ends with a non-word character or a whitespace character.
Could you tell me what's wrong with my expression?
Thank you very much
Post your exact code and describe exactly what's happening and exactly what you're trying to do.
When you post code, please use [code] and [/code] tags as described in [url=http://forum.java.sun.com/help.jspa?sec=formatting]Formatting tips[/url] on the message entry page. It makes it much easier to read.
Code would be good, but your expression has some obvious flaws.
If your target word is "bake", then you have "bake\\W|\\s", which is either (the word "bake" followed by a non-word character), or (any whitespace character). The alternation operator | has the lowest precedence, so it splits the whole expression in two.
You're looking for "bake(\\W|\\s)", or if you want a non-capturing group, "bake(?:\\W|\\s)". This can occur anywhere in a String, unless you add the end anchor: "bake(?:\\W|\\s)$", which only matches at the end of the String (or at the end of each line in MULTILINE mode).
This still won't do what you're looking for, but should explain why your expression didn't behave as you expected.
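A small sketch of the precedence problem, in case it helps. The class name AlternationPrecedence and the helper found are made up for this example. The sample text deliberately contains spaces but not the word "bake".

```java
import java.util.regex.Pattern;

final class AlternationPrecedence {
    // Hypothetical helper: true if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String ungrouped = "bake\\W|\\s";    // parsed as (bake\W) OR (\s)
        String grouped = "bake(?:\\W|\\s)";  // "bake" followed by (\W or \s)
        String text = "no match here";       // has whitespace, but no "bake"

        System.out.println(found(ungrouped, text)); // true: any whitespace satisfies it
        System.out.println(found(grouped, text));   // false: "bake" is required
        System.out.println(found(grouped, "we bake bread")); // true
    }
}
```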
Forgot to mention that the alternation "(\\W|\\s)" is redundant, since whitespace characters *are* non-word characters. =)
"bake\\W" covers the set.
Message was edited by:
cafal
Here's my code:
if (!(strWord.equals(""))) { // target word is available
    for (int a = 0; a < mycontent.split("\r").length; a++) {
        String thisparagraph = " " + mycontent.split("\n")[a].trim();
        int length = thisparagraph.toLowerCase().split("\\s" + strWord.toLowerCase() + "\\W|\\s").length;
        inUserFile = inUserFile + length;
        if (length > 1) {
            for (int s = 0; s < length - 1; s++) {
                ...
                result = result + userdata.getThreeWord(thisparagraph.toLowerCase().split("\\s" + strWord.toLowerCase() + "\\W|\\s")[s], true);
                result = result + strWord;
                result = result + userdata.getThreeWord(thisparagraph.toLowerCase().split("\\s" + strWord.toLowerCase() + "\\W|\\s")[s + 1], false);
            }
        }
    }
}
Thanks
Message was edited by:
sherryswyu
Thanks, Cafal. At this point, maybe I should focus on the first issue and deal with the variations later. I tried strWord + "(?:\\W|\\s)$", but it's not working. Any ideas?
String value = "bake.";
System.out.println(value.matches("bake\\W"));
value = "baked";
System.out.println(value.matches("bake\\W"));
Output:
true
false
Here's where we need to see your code - include the section with your regular expression, and if you can, show the value of the String being tested against the expression, the results of the expression, and what you expected to happen.
--
Never mind, your code's already there. Input, output, and expectations then, if you please!
Message was edited by:
cafal
If you're trying to find a word within a larger text, you don't want to use the $ anchor, because it only matches at the end of the text. You don't want to use split() either. For this kind of job, you should be looking at the find() method in java.util.regex.Matcher. But it looks like you're trying to write a search engine, like [url=http://lucene.apache.org/]Lucene[/url]. Regexes can be part of a project like that, but you can't implement it entirely in regexes.
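Here is a rough sketch of the find() approach. The class name WordFinder and the method findOffsets are invented for the example. Note that it swaps the "\\W" suffix discussed earlier for the word-boundary anchor "\\b", which is my own substitution: "\\b" consumes no character, so it also matches when the word sits at the very start or end of the text. It assumes the search word begins and ends with a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordFinder {
    // Hypothetical helper: start offset of every whole-word occurrence of word in text.
    static List<Integer> findOffsets(String word, String text) {
        // Pattern.quote() keeps any regex metacharacters in the word literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<Integer> offsets = new ArrayList<>();
        while (m.find()) {          // each call resumes after the previous match
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        // "baked" is skipped because there is no word boundary between "e" and "d".
        System.out.println(findOffsets("bake", "Bake it. I baked it; we bake, they bake"));
        // prints [0, 24, 35]
    }
}
```

With the offsets from m.start() and m.end() you can pull out the surrounding words directly from the original text, instead of splitting the paragraph and stitching the pieces back together.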
Use the Pattern and Matcher classes; they are easy to use. The String class also has an instance method, matches(regex), which takes the regular expression as a String and returns true if the entire string matches it, and false otherwise.
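One thing that trips people up is that String.matches() has to match the whole string, while Matcher.find() looks for a match anywhere inside it. A minimal sketch of the difference (the class and method names are just for illustration):

```java
import java.util.regex.Pattern;

final class MatchesVersusFind {
    // True only if the regex matches the entire text.
    static boolean wholeMatch(String regex, String text) {
        return text.matches(regex);
    }

    // True if the regex matches any substring of the text.
    static boolean partialMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = "bake\\W";
        System.out.println(wholeMatch(regex, "bake."));     // true
        System.out.println(wholeMatch(regex, "I bake."));   // false: "I " is left over
        System.out.println(partialMatch(regex, "I bake.")); // true
    }
}
```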
Thank you very much for your replies. Is there a way to detect a word that ends with punctuation but not with other characters? For example, if the string is "bake.", it returns true, and if it is "baked", it returns false. Any ideas? Thanks.
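One possible way to do this, as a sketch rather than a definitive answer, is the POSIX character class \p{Punct} that java.util.regex supports. It matches ASCII punctuation only, so whitespace, letters, and digits are all rejected. The class and method names below are made up for the example.

```java
import java.util.regex.Pattern;

final class EndsWithPunctuation {
    // Hypothetical helper: true if text is exactly the word followed by
    // one ASCII punctuation character.
    static boolean wordThenPunctuation(String word, String text) {
        return text.matches(Pattern.quote(word) + "\\p{Punct}");
    }

    public static void main(String[] args) {
        System.out.println(wordThenPunctuation("bake", "bake."));  // true
        System.out.println(wordThenPunctuation("bake", "baked"));  // false
        System.out.println(wordThenPunctuation("bake", "bake "));  // false: space is not punctuation
    }
}
```

Note that "bake\\W" would also accept "bake " because whitespace is a non-word character. Going the other way, \p{Punct} accepts an underscore, which \W does not.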