Using StringTokenizer to determine word frequencies

Hi

I'm currently using StringTokenizer to split a string into its component tokens.

I'm having a problem though: after getting the tokens, I need to print out only those which occur exactly once in the string, and I don't know the best way to do this.

Is it a good idea to use Vectors?

Any hints/tips would be greatly appreciated.

[375 byte] By [slack_alice] at [2007-9-26 3:50:58]
# 1

For each token you read from the StringTokenizer, maintain a counter in a separate object, say a java.util.HashMap, with the token as the key and the counter as the value (you can use java.lang.Integer to hold the counter). Each time you read a token, check whether it already exists in the HashMap. If it is found, increment its counter and put the updated value back into the HashMap. If it is not found yet, put it in with a counter of 1, using new Integer(1). Once all the tokens have been read, the tokens whose counter is still 1 are the ones that occur only once.
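A minimal sketch of the counting approach described above, in case it helps. The class name TokenCounts and the method name tokensOccurringOnce are made up for illustration. LinkedHashMap (a subclass of HashMap) is an assumption on my part, used only so that the output follows the order of the input; a plain HashMap works the same way but gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of any library.
class TokenCounts {

    // Returns the tokens that occur exactly once, in order of first appearance.
    static List<String> tokensOccurringOnce(String text) {
        // Token -> number of times seen so far.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            Integer count = counts.get(token);
            if (count == null) {
                counts.put(token, 1);          // first sighting starts at 1
            } else {
                counts.put(token, count + 1);  // seen before: bump the counter
            }
        }

        // Keep only the tokens whose counter never went past 1.
        List<String> once = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 1) {
                once.add(entry.getKey());
            }
        }
        return once;
    }

    public static void main(String[] args) {
        // "the" appears twice, so it is left out.
        System.out.println(tokensOccurringOnce("the cat and the dog"));
        // prints [cat, and, dog]
    }
}
```

Note that StringTokenizer with a single argument splits on whitespace only, so "dog" and "dog." would count as different tokens.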

neville_sequeira at 2007-6-29 12:36:56 > top of Java-index,Archived Forums,New To Java Technology Archive...
# 2

Add each token to a set - each can only be added once, and duplicates will be ignored:

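import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// stringToTokenize is the input string
StringTokenizer st = new StringTokenizer(stringToTokenize);
Set<String> set = new HashSet<String>();

// Adding a token that is already in the set leaves the set unchanged
while (st.hasMoreTokens())
    set.add(st.nextToken());

// Print each distinct token once
String[] words = set.toArray(new String[set.size()]);
for (int i = 0; i < words.length; i++)
    System.out.println(words[i]);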
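Note that a single set gives you every distinct token, including the ones that were repeated. If you only want the tokens that occur exactly once, one possible variation (a sketch, not the only way) is to keep two sets and rely on the fact that Set.add returns false when the element is already present. The class name UniqueTokens and the method name tokensOccurringOnce are made up for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of any library.
class UniqueTokens {

    // Returns the tokens that occur exactly once, in order of first appearance.
    static Set<String> tokensOccurringOnce(String text) {
        Set<String> seen = new HashSet<String>();        // every token met so far
        Set<String> once = new LinkedHashSet<String>();  // candidates seen only once

        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (seen.add(token)) {
                once.add(token);     // add returned true: first time we see it
            } else {
                once.remove(token);  // add returned false: it is a repeat
            }
        }
        return once;
    }

    public static void main(String[] args) {
        System.out.println(tokensOccurringOnce("the cat and the dog"));
        // prints [cat, and, dog]
    }
}
```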

mattbunch at 2007-6-29 12:36:56
# 3

Hi Alice! You can use a data object and a hashtable to achieve this. Here's how.

public class Counter
{
    private String m_token;
    private int m_counter;

    // A token has been seen once by the time its Counter is created
    public Counter( String token )
    {
        m_token = token;
        m_counter = 1;
    }

    public int getCounterValue()
    {
        return m_counter;
    }

    public String getToken()
    {
        return m_token;
    }

    public void increment()
    {
        m_counter++;
    }
} // class Counter

Use a StringTokenizer to separate the input string into tokens. For each token, perform the following.

// tokenTable is an instance of java.util.Hashtable
// token is the current token
// tokenCounter is an instance of Counter
tokenCounter = (Counter) tokenTable.get( token );
if ( tokenCounter == null )
{
    tokenTable.put( token, new Counter( token ) );
}
else
{
    tokenCounter.increment();
}

After you have parsed all the tokens, enumerate through all the entries in the hashtable. The result you want is the list of all Counter objects whose getCounterValue() method returns 1.
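Putting the pieces together, the whole thing might look roughly like the sketch below. The wrapper class CounterDemo and the method tokensOccurringOnce are made up for illustration, and Counter is nested (with the same fields and methods as above) only so that the example compiles on its own. The enumeration uses Hashtable.elements(), which makes no promise about order.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical wrapper class, for illustration only.
class CounterDemo {

    // Same shape as the Counter class above, nested to keep this self-contained.
    static class Counter {
        private final String m_token;
        private int m_counter = 1;  // created on first sighting

        Counter(String token) { m_token = token; }
        int getCounterValue() { return m_counter; }
        String getToken() { return m_token; }
        void increment() { m_counter++; }
    }

    // Returns the tokens that occur exactly once (order is not guaranteed).
    static List<String> tokensOccurringOnce(String text) {
        Hashtable<String, Counter> tokenTable = new Hashtable<String, Counter>();

        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            Counter tokenCounter = tokenTable.get(token);
            if (tokenCounter == null) {
                tokenTable.put(token, new Counter(token));
            } else {
                tokenCounter.increment();
            }
        }

        // Walk every Counter in the table and keep those still at 1.
        List<String> once = new ArrayList<String>();
        Enumeration<Counter> counters = tokenTable.elements();
        while (counters.hasMoreElements()) {
            Counter c = counters.nextElement();
            if (c.getCounterValue() == 1) {
                once.add(c.getToken());
            }
        }
        return once;
    }

    public static void main(String[] args) {
        // Prints cat, and, dog in some order; "the" is left out.
        System.out.println(tokensOccurringOnce("the cat and the dog"));
    }
}
```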

Hope this helps! Feel free to ask if something's not too clear.

Cheers!

amolk at 2007-6-29 12:36:56