autocorrection searching agorithm

I have a database includes list of book names and price.

when key in a book name, system will return its selling price.

however, if user keyed in name wrongly, for exmaple. "risl management" instead of real name "risk management", my system won't know how to handle this type of problem.

another problem is : if user doesn't key in the full name of the book, how the system could know which book the user is looking fot?

I see the google search provieds some "suggerted correction search". like if key in "alorithm" , googel will ask whether I am searching "algorithm" stuff like that.

so,any search algorithm can solve this kinds of problems?

thanks, waiting for your help asap.

[728 byte] By [javafun] at [2007-9-30 19:48:15]
# 1

Well, your problem formulation is very inexact. But if I understand the first problem right:

1. You need to find matches when you have typed in the "almost" correct answer.

That is solved with a fuzzy index. Etiher your database supports it, and you need to specifically tell it that some column should be indexed that way, or you need to search google for "fuzzy index" or "fuzzy indexing"

2. The other problem is similar. E.g. The full title is "The Lord of the Rings", but "The Lord" or "Lord of" should also give results. If you have a normal database, "The Lord" coud be solved with the "LIKE" condition in your SQL-query. To be able to search for text in the middle of the text, you need a context-index, which exists in the big database systems such as Oracle, DB2, SQL Server etc. But then you can only find exact matches, not misspellings etc.

Nor fuzzy index or context index are trivial matters. Even if your database system has that functionality, it is rather difficult to use it and to use it correctly.

Google has context index. Fuzzy index is really slow, and impossible for large data sets. But google has a dictionary, where it looks up similar words by fuzzy search, and recomends similar word that is more "common", and suggests you to use that exact phrase instead.

Gil

gilroitto at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 2

First question:

1) Get a list of commonly misspelt words and match accordingly - might need a dictionary if search is not limited to a few terms

2) Match the first few characters and the last few characters, and ask the user to choose from the suggestions. Regular expression searching will do it.

Second question:

For good search results, you should also search the description in addition to searching the title. For example, searching for "java books" will probably not get you anything from the title alone.

gilroitto at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 3

On Google:

If you search for "algxxxhm", it comes up with "algorithm".

http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=algxxxhm

However, for "alxxxhm", its not able to find anything.

http://www.google.com/search?num=20&hl=en&lr=&safe=off&q=alxxxhm

But its not just the first few or last few words. Search for "deminat" and it comes up with "dominant".

Searching for the first few and last few words would probably get you a good solution, but not for words that are misspelt beyond human recognition.

gilroitto at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 4

One approach to fuzzy searches, at least for single words is Soundex

http://www.google.com/search?hl=en&q=soundex&btnG=Google+Search

Saw it many years ago as a function provided by Sybase/MS-SQL, and looked into it then, but never actually had a reason to use it. Looking at some of the links that search turns up, it appears to be the mechanism used to store US Census data.

kdgregory at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 5

soundex was developed by the immigrations office to deal with the problem of indexing peoples names that may have come from a foreigh language and thus did not map nicely into english. Soundex is thus a form of fuzzy search where it will hash the name you supply so that you can look things up by hash codes. The hashing has been chosen to make sure that similar sounding names hash to the same index.

However, what it does not do is correct typos. Typos are based on text substitution based on proximity on the keyboard rather than by proximity in the audio domain, which is what Soundex is all about.

We used soundex on a site to deal with just this fuzzy match and it helped (with folks that couldn't spell Tchaikovsky) but it was not much use for typos.

How does google do it?

Many programmers and many years. You want it - they (or any of those search engine types) will no doubt license their code to you.

You want it for free? take the word you are searching for. Stuff it to google. Strip all the **** off their reply except for the "Were you really looking for xxxx"

search for that.

Enjoy!

marlin314 at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 6
> You want it for free? take the word you are searching> for. Stuff it to google. Strip all the **** off their> reply except for the "Were you really looking for> xxxx"> > search for that.example?
marlin314 at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 7
then is there any alorithm or journal that I can approach? I don't want sophisticated searching engine like google, just a small one.
javafun at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...
# 8
This site seems to have some information on approximate pattern matching which might be helpful. http://www.tgries.de/agrep/
javafun at 2007-7-7 0:36:01 > top of Java-index,Other Topics,Algorithms...