Searching for a string in many documents
Hello,
I'm trying to implement a search system that looks through documents to find a specific phrase. Does anyone have ideas on what the fastest method of doing this would be?
Would it be sufficient to use indexOf(String) to determine whether the string is present in a file, or should I implement a fast search algorithm such as Boyer-Moore?
thanks,
BBB
The Boyer-Moore algorithm is considered the most efficient
string-matching algorithm in applications where only a few searches
are done per text. If many searches are done on the same text,
you can speed up the search by building an index.
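
For anyone who wants to see the idea in code, here is a minimal sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm with the good-suffix rule). The class name HorspoolSearch and the choice to work on byte arrays are assumptions made for illustration, not something taken from the linked thread.

```java
import java.util.Arrays;

final class HorspoolSearch {
    private HorspoolSearch() {}

    /** Returns the index of the first occurrence of pattern in text, or -1 if absent. */
    static int indexOf(byte[] text, byte[] pattern) {
        int n = text.length;
        int m = pattern.length;
        if (m == 0) return 0;
        if (m > n) return -1;

        // Bad-character table: how far the window may slide, keyed by the
        // text byte currently aligned with the last pattern position.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) return pos;
            // Mismatch: slide by the shift for the window's last byte.
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

With ASCII input, HorspoolSearch.indexOf("hello world".getBytes(), "world".getBytes()) returns 6. The table is built once per pattern, which is why this pays off for a few searches over long texts; for many different searches over the same text an index is still the better fit.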
http://forum.java.sun.com/thread.jspa?threadID=662166&messageID=3882744#3882744
parza at 2007-7-15 22:36:30 >

Check out reply 21 here: http://forum.java.sun.com/thread.jspa?threadID=674409&start=15
As a first try I'd memory map a file and do indexOf() (or, since memory mapping gives you raw bytes rather than a String, I'd write a method "static int indexOf(byte what[], byte data[])"). If that doesn't run at disk speed then start thinking about Boyer-Moore or its derivatives.
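
A rough sketch of that first try, under some assumptions: in Java, FileChannel.map returns a MappedByteBuffer rather than a byte[], so the helper below takes a ByteBuffer as the data argument. The class name MappedSearch and the method indexOfInFile are made up for illustration, and a single mapping only covers files up to Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSearch {
    private MappedSearch() {}

    /** Naive scan: index of the first occurrence of what in data, or -1 if absent. */
    static int indexOf(byte[] what, ByteBuffer data) {
        int n = data.limit();
        int m = what.length;
        if (m == 0) return 0;
        outer:
        for (int pos = 0; pos <= n - m; pos++) {
            for (int j = 0; j < m; j++) {
                // Absolute get: does not move the buffer's position.
                if (data.get(pos + j) != what[j]) continue outer;
            }
            return pos;
        }
        return -1;
    }

    /** Maps the whole file read-only and scans it for the given bytes. */
    static int indexOfInFile(Path file, byte[] what) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single mapping");
            }
            if (size == 0) return what.length == 0 ? 0 : -1;
            ByteBuffer data = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            return indexOf(what, data);
        }
    }
}
```

The search phrase has to be encoded with the same charset as the file, for example phrase.getBytes(StandardCharsets.UTF_8), before calling indexOfInFile. Whether this naive scan keeps up with the disk is something to measure before reaching for anything cleverer.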
Oh bother, now I'm getting interested in the performance of that and I'm going to have to try it.