Comparing strings that are almost the same

I am comparing strings to see if they're equal to each other. Usually the .equals method works fine. The issue is that I also want to identify two strings that are ALMOST equal, to a certain degree.

Say that you had two strings "It is a hot day" and "It is a very hot day".

I would like these two strings to be identified as almost equal.

I figure that I could do this using loops and charAt(), but I was wondering if there was a less messy way to do this.

Thanks for taking the time to read all of this

[536 byte] By [snoboardera] at [2007-11-27 7:35:20]
# 1

You would first need to define what you want "almost" to mean, and then translate that definition into something that can be implemented in Java.

You will most likely be comparing characters, and possibly byte-size metrics as well. Note that the semantics of English words, meaning what they actually mean, cannot be translated into something that the Java programming language easily understands.

Natural language parsers already exist so you might want to do some research on them to learn more.

Good luck!

GhostRadioTwoa at 2007-7-12 19:15:49 > top of Java-index,Java Essentials,New To Java...
# 2

One possibility, if you are dealing with sentences, is to break each sentence into words and compare the words separately. If more than a certain percentage of the words match, you treat the sentences as almost equal.
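A rough sketch of that word-percentage idea in plain Java follows. The class name WordOverlap, the helper names, and the choice to divide by the larger word set are all my own assumptions, not anything from a library; pick whatever threshold and definition fit your data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class WordOverlap {

    // Splits a sentence on whitespace into a set of lower-cased words.
    static Set<String> words(String sentence) {
        String[] parts = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
        Set<String> result = new HashSet<>(Arrays.asList(parts));
        result.remove(""); // an empty sentence splits into one empty token; drop it
        return result;
    }

    // Fraction of distinct words shared by both sentences, relative to the
    // larger of the two word sets. 1.0 means the word sets are identical.
    static double similarity(String a, String b) {
        Set<String> wordsA = words(a);
        Set<String> wordsB = words(b);
        int larger = Math.max(wordsA.size(), wordsB.size());
        if (larger == 0) {
            return 1.0; // assumption: two empty sentences count as identical
        }
        Set<String> shared = new HashSet<>(wordsA);
        shared.retainAll(wordsB);
        return (double) shared.size() / larger;
    }

    // "Almost equal" here means the overlap reaches the caller's threshold.
    static boolean almostEqual(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        String a = "It is a hot day";
        String b = "It is a very hot day";
        // 5 of the 6 distinct words are shared, so the ratio is 5/6.
        System.out.println(similarity(a, b));
        System.out.println(almostEqual(a, b, 0.8));
    }
}
```

Note that this ignores word order and repeated words, which may or may not be what you want.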

Then you could also map known misspellings or contractions to a standard form, so that "It is" and "it's" are compared as the same.
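The mapping idea could look something like the sketch below. The PhraseNormalizer class and its tiny lookup table are made up for illustration; a real table would be much larger, and some contractions are ambiguous ("it's" can also mean "it has"), so treat this as a starting point only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class PhraseNormalizer {

    // Hand-written sample table: contraction or misspelling -> standard form.
    private static final Map<String, String> STANDARD_FORMS = new HashMap<>();
    static {
        STANDARD_FORMS.put("it's", "it is");
        STANDARD_FORMS.put("don't", "do not");
        STANDARD_FORMS.put("teh", "the"); // a common misspelling
    }

    // Lower-cases the sentence and replaces every word found in the table.
    static String normalize(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            String standard = STANDARD_FORMS.get(word);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(standard != null ? standard : word);
        }
        return out.toString();
    }

    // Two sentences count as the same if their normalized forms are equal.
    static boolean sameAfterNormalizing(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(normalize("It's a hot day")); // prints "it is a hot day"
        System.out.println(sameAfterNormalizing("It's a hot day", "It is a hot day")); // prints "true"
    }
}
```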

But as mentioned, it depends on what you are really looking for.

bsampieria at 2007-7-12 19:15:49
# 3

I took a class in school on file organization and processing, which used a book of the same name. The book had a section on spell-checking algorithms that used a trie. I don't remember the specifics, but the algorithm could skip over the current character if it didn't match and keep processing the rest of the string. One part of the matching was that if there was an extra character (treating words as characters in your example), it would still match the rest of the word. The book was by Alan Tharp if you want to look it up. That's probably overkill for your problem, but it might give you some inspiration.
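To be clear, the following is not the trie-based algorithm from the book, which I can't reproduce from memory. It is only a small sketch of the "skip one extra element and keep matching" idea, with words treated as the elements. The class and method names are invented for this example, and the comparison is case-sensitive.

```java
// Hypothetical helper class, for illustration only.
final class OneExtraWord {

    // Returns true if the two sentences have the same words in the same
    // order, except that the longer one may contain one extra word.
    static boolean matchesWithOneExtraWord(String a, String b) {
        String[] x = a.trim().split("\\s+");
        String[] y = b.trim().split("\\s+");
        String[] shorter = x.length <= y.length ? x : y;
        String[] longer = x.length <= y.length ? y : x;
        if (longer.length - shorter.length > 1) {
            return false; // more than one extra word
        }
        boolean canSkip = longer.length > shorter.length;
        int i = 0; // position in shorter
        int j = 0; // position in longer
        while (i < shorter.length && j < longer.length) {
            if (shorter[i].equals(longer[j])) {
                i++;
                j++;
            } else if (canSkip) {
                canSkip = false; // skip the extra word, only once
                j++;
            } else {
                return false;
            }
        }
        // Any leftover word in the longer sentence is the single extra word.
        return i == shorter.length;
    }

    public static void main(String[] args) {
        // "very" is the one extra word, so this prints "true".
        System.out.println(matchesWithOneExtraWord("It is a hot day", "It is a very hot day"));
    }
}
```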

hunter9000a at 2007-7-12 19:15:49
# 4

You may find this useful:

org.apache.commons.lang.StringUtils.getLevenshteinDistance(String s, String t):
http://jakarta.apache.org/commons/lang/api/org/apache/commons/lang/StringUtils.html#getLevenshteinDistance(java.lang.String,%20java.lang.String)

http://jakarta.apache.org/commons/lang/
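If you would rather not pull in the library, or just want to see what the number means, here is a minimal plain-Java sketch of the Levenshtein distance: the number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The EditDistance class, its method names, and the maxEdits threshold are my own for this example; this is not the Commons Lang source.

```java
// Hypothetical helper class, for illustration only.
final class EditDistance {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j inserts
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletes
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // reuse the arrays for the next row
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // "Almost equal" here means at most maxEdits single-character edits apart.
    static boolean almostEqual(String s, String t, int maxEdits) {
        return levenshtein(s, t) <= maxEdits;
    }

    public static void main(String[] args) {
        // Inserting "very " takes 5 single-character edits, so this prints 5.
        System.out.println(levenshtein("It is a hot day", "It is a very hot day"));
    }
}
```

You still have to decide what distance counts as "almost". A fixed limit works for short strings; for longer ones you might compare the distance to the length of the longer string instead.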


yawmarka at 2007-7-12 19:15:49
# 5
Wow, that getLevenshteinDistance thing is amazing, thank you.
snoboardera at 2007-7-12 19:15:49