Hi,
It depends on what a word is. Is "it's" one word or two? The most common way is to treat everything between blanks as a word. So this is what you basically need to do.
1) Trim all whitespace from the beginning and the end of the string.
2) Start looking for a whitespace character.
3) If one is found, increase the word counter, and skip all consecutive whitespace until you find a character which isn't whitespace.
4) Go to 2.
5) When you reach the end of the string, add one to the counter for the last word (unless the trimmed string is empty).
You can do all of this using methods of the String class (e.g. trim(), length() and charAt(i)).
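A minimal sketch of those steps, in case it helps. The class and method names (WordCounter, countWords, isBlank) are made up for illustration, and "whitespace" here is assumed to mean any character at or below the space character, which is the same definition trim() uses. The counter starts at 1 for a non-empty string because N gaps separate N+1 words.

```java
// Hypothetical helper class; the names are illustrative only.
class WordCounter {

    // Same notion of whitespace that String.trim() uses: anything <= ' '.
    static boolean isBlank(char c) {
        return c <= ' ';
    }

    // Counts words, treating everything between runs of whitespace as a word.
    static int countWords(String text) {
        String s = text.trim();              // step 1: drop leading/trailing whitespace
        if (s.isEmpty()) {
            return 0;                        // nothing left, so there are no words
        }
        int count = 1;                       // a non-empty trimmed string has at least one word
        int i = 0;
        while (i < s.length()) {
            if (isBlank(s.charAt(i))) {      // step 2: found a separator
                count++;                     // step 3: another word follows it
                while (i < s.length() && isBlank(s.charAt(i))) {
                    i++;                     // skip the rest of the whitespace run
                }
            } else {
                i++;                         // still inside a word
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  It's   a  small world "));  // prints 4
    }
}
```

With this definition "It's" counts as one word, because only whitespace separates words.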
Kaj
I always use java.util.StringTokenizer... probably easier than writing your own algorithm
import java.util.StringTokenizer;

// sentence is the String whose words you want to walk through
StringTokenizer st = new StringTokenizer(sentence);
while (st.hasMoreTokens())
{
    String word = st.nextToken();
    // do what you want with the word
}
You have to split on sequences of non-word characters.
In Java:
String sentence = "...";
String[] words = sentence.split("\\W+");
If you want to do it yourself, the difficulty lies in deciding what counts as part of a word and what does not. With Unicode...
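To make that difficulty concrete, here is a small sketch comparing two split patterns. By default \W in Java only treats ASCII letters, digits and the underscore as word characters, so accented letters and apostrophes both act as separators. The second pattern is just one possible choice, assuming that letters and digits in any script, plus the apostrophe, belong to a word. The class and method names (SplitDemo, asciiWords, unicodeWords) are made up for the example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitDemo {

    // \W+ : runs of anything other than ASCII letters, digits and underscore.
    static String[] asciiWords(String sentence) {
        return sentence.split("\\W+");
    }

    // Assumption: Unicode letters (\p{L}), digits (\p{N}) and the apostrophe
    // are word characters; everything else separates words.
    static String[] unicodeWords(String sentence) {
        return sentence.split("[^\\p{L}\\p{N}']+");
    }

    public static void main(String[] args) {
        // \u00e9 is an e with an acute accent, written as an escape so the
        // source file compiles regardless of its encoding.
        String sentence = "It's a caf\u00e9, isn't it?";
        System.out.println(Arrays.toString(asciiWords(sentence)));   // [It, s, a, caf, isn, t, it]
        System.out.println(Arrays.toString(unicodeWords(sentence)));
    }
}
```

The first call yields seven pieces, cutting "It's" and "isn't" in two and dropping the accented letter, while the second yields the five words you would probably expect. Also note that split returns an empty first element when the sentence starts with a separator, so you may want to skip empty strings when counting.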
Your question is best answered by using Java's built-in StringTokenizer class:
import java.util.StringTokenizer;
....
....
StringTokenizer word =
    new StringTokenizer("You have the answer, Spider.", " \n.,");
while (word.hasMoreTokens())
{
    System.out.println(word.nextToken());
}
Output:
You
have
the
answer
Spider
The string tokenizer is given a sentence and a set of separators, in this case the blank, newline, comma and period. So, the basic idea is:
Start from the beginning of the sentence provided.
Scan each character and check whether it is one of the specified separators.
If it is, print the characters between the previous separator and this one, if there are any.
Start again from the position right after the separator and repeat the same technique until you have gone through the whole sentence.
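A rough sketch of that idea written by hand, not how StringTokenizer itself is implemented. The names (SimpleTokenizer, tokenize) are made up, and the tokens are collected in a list instead of being printed straight away so the caller can also count them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class; a hand-rolled stand-in for the scanning idea above.
class SimpleTokenizer {

    // Splits sentence at any character contained in separators.
    static List<String> tokenize(String sentence, String separators) {
        List<String> tokens = new ArrayList<>();
        int start = 0;  // index where the current token begins
        for (int i = 0; i <= sentence.length(); i++) {
            boolean atEnd = (i == sentence.length());
            // The end of the sentence closes the last token just like a separator does.
            if (atEnd || separators.indexOf(sentence.charAt(i)) >= 0) {
                if (i > start) {
                    // Only keep non-empty tokens, so ", " does not yield an empty word.
                    tokens.add(sentence.substring(start, i));
                }
                start = i + 1;  // the next token starts right after the separator
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> words = tokenize("You have the answer, Spider.", " \n.,");
        for (String w : words) {
            System.out.println(w);
        }
        System.out.println(words.size() + " words");  // prints "5 words"
    }
}
```

Run on the same sentence and separators, it prints the same five words as the StringTokenizer version, followed by the count.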
I hope that will help you to implement your own algorithm.
Good luck!