Searching for a string in many documents

Hello,

I'm trying to implement a search system that searches through documents to find a specific phrase. I was wondering if anyone had any ideas on what the fastest method of doing this would be?

Would it be sufficient to use indexOf(String) to determine whether the string is present in a file, or should I implement a fast search algorithm such as Boyer-Moore?

thanks,

BBB

[413 byte] By [BigBadBurrowa] at [2007-10-2 3:26:52]
# 1

The Boyer-Moore algorithm is generally considered one of the most efficient string-matching algorithms for applications where only a few searches are done per text. If many searches are done on the same text, you can speed them up by building an index.

http://forum.java.sun.com/thread.jspa?threadID=662166&messageID=3882744#3882744
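
For illustration, here is a minimal sketch of the Horspool simplification of Boyer-Moore (not the full algorithm, which also uses a good-suffix table). The class name HorspoolSearch and its method are made up for this example, and it assumes the text and the pattern are already available as byte arrays.

```java
import java.util.Arrays;

/** Hypothetical helper: Boyer-Moore-Horspool search over raw bytes. */
final class HorspoolSearch {
    private HorspoolSearch() {}

    /** Returns the index of the first occurrence of pattern in text, or -1. */
    static int indexOf(byte[] text, byte[] pattern) {
        int n = text.length;
        int m = pattern.length;
        if (m == 0) return 0;   // same convention as String.indexOf("")
        if (m > n) return -1;

        // Shift table: how far the window may jump, keyed by the byte
        // that is aligned with the last position of the pattern.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) return pos;  // every byte matched
            pos += shift[text[pos + m - 1] & 0xFF];
        }
        return -1;
    }
}
```

The longer the pattern, the larger the jumps tend to be, which is where the advantage over a naive scan comes from. For short patterns the difference may be small, so it is worth measuring first.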

parza at 2007-7-15 22:36:30 > top of Java-index,Other Topics,Algorithms...
# 2
In either case, I think that using indexOf(...) behind an interface (or class) is a good idea. This way you do the minimum amount of work now and, if needed, you can speed it up later without making any changes to your existing code.
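
A rough sketch of what that could look like, assuming each document's text has already been read into a String. The names PhraseSearcher and IndexOfSearcher are invented for this example.

```java
/** Hypothetical abstraction: callers depend only on this interface. */
interface PhraseSearcher {
    /** Returns true if the phrase occurs anywhere in the document text. */
    boolean contains(String documentText, String phrase);
}

/** First implementation: simply delegates to String.indexOf. */
final class IndexOfSearcher implements PhraseSearcher {
    @Override
    public boolean contains(String documentText, String phrase) {
        return documentText.indexOf(phrase) >= 0;
    }
}
```

If indexOf turns out to be too slow, a Boyer-Moore or index-based implementation of the same interface could be swapped in, and the calling code would stay as it is.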
parza at 2007-7-15 22:36:30
# 3
Thanks parz, yes, I think I will try indexOf first to see what speeds I get. If it's too slow, as you say, I always have a plan B.
BigBadBurrowa at 2007-7-15 22:36:30
# 4

Check out reply 21 here: http://forum.java.sun.com/thread.jspa?threadID=674409&start=15

As a first try I'd memory map a file and do indexOf() (or since memory mapping gives a byte array I'd write a method "static int indexOf(byte what[], byte data[])"). If that doesn't run at disk speed then start thinking about Boyer-Moore or its derivatives.
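
Here is a minimal sketch of that first try. Note that in Java, FileChannel.map returns a MappedByteBuffer rather than a byte[], so this version of indexOf takes a ByteBuffer as its second argument. The class name MappedFileSearch and the helper indexOfInFile are made up for this example, and it assumes the file fits in a single mapping (at most Integer.MAX_VALUE bytes).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: naive byte search over a memory-mapped file. */
final class MappedFileSearch {
    private MappedFileSearch() {}

    /** Naive scan: index of the first occurrence of what in data, or -1. */
    static int indexOf(byte[] what, ByteBuffer data) {
        int last = data.limit() - what.length;
        outer:
        for (int i = 0; i <= last; i++) {
            for (int j = 0; j < what.length; j++) {
                // Absolute get: does not move the buffer's position.
                if (data.get(i + j) != what[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Maps the whole file read-only and searches it. */
    static int indexOfInFile(Path file, byte[] what) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: the file is small enough for one mapping.
            ByteBuffer data = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return indexOf(what, data);
        }
    }
}
```

The search works on raw bytes, so the phrase has to be encoded with the same charset as the file before it is passed in.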

Oh bother, now I'm getting interested in the performance of that and I'm going to have to try it.

sjasjaa at 2007-7-15 22:36:30
# 5
You may want to take a look at the StringSearch package at http://johannburkard.de/
horstmeyera at 2007-7-15 22:36:30
# 6
You could index the documents (with Lucene, for example) and then search the index.
barbywarea at 2007-7-15 22:36:30
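
To show the idea behind indexing, here is a toy inverted index in plain Java. It does not use Lucene's API and is much simpler than what Lucene does. The class name TinyInvertedIndex is invented for this example, and it assumes the phrase is made of whole words, that matching is case-insensitive, that words are runs of ASCII letters, digits and underscores, and that all document texts fit in memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy index: maps each word to the documents containing it. */
final class TinyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, String> texts = new HashMap<>();

    /** Indexes one document under the given id. */
    void add(String docId, String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        texts.put(docId, lower);
        for (String word : lower.split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of the documents that contain the phrase. */
    Set<String> search(String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        Set<String> candidates = null;
        // Step 1: keep only documents that contain every word of the phrase.
        for (String word : needle.split("\\W+")) {
            if (word.isEmpty()) continue;
            Set<String> docs = postings.getOrDefault(word, Collections.<String>emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(docs);
            } else {
                candidates.retainAll(docs);
            }
        }
        if (candidates == null) return new TreeSet<>();
        // Step 2: confirm the exact phrase in the few remaining candidates.
        candidates.removeIf(id -> !texts.get(id).contains(needle));
        return candidates;
    }
}
```

The index only narrows the set of documents. The exact phrase is still checked with a plain substring search, but only on the candidates instead of on every document.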