Searching algorithim ...
hi all,
I hava a txt file that contains 65000+ rows and each row has two bigints and one string. I am searching through lines based on the value that is passed to me using the following code:
URL url =new URL("url to the text file");
BufferedReader in =new BufferedReader(new InputStreamReader(url.openStream()));
while ((c = in.readLine()) !=null){
StringTokenizer st =new StringTokenizer(c,"\t");
a = Long.parseLong(st.nextToken());
b = Long.parseLong(st.nextToken());
if( passedArgument>= a && passedArgument<=b ){
text = st.nextToken();
break;
however, it takes more than 90 seconds to go through the list ! So, Is there any better way to do this?
regards
[1122 byte] By [
Raieda] at [2007-11-27 11:05:24]

The slowness probably isn't arising from the efficiency of the algorithm, since it's pretty much just a simple comparison. If you know the exact format of the file and if it's not likely to change, you should consider a different method of parsing it.
Other than that, fast and/or fault-tolerant methods of searching exist, but you might want to break out a Linear Algebra book before trying them out :/.
Joe
Joe_ha at 2007-7-29 13:08:28 >

you should sort the list first then use a binary search to find your item, binary search is much faster than the brute force method you are using since binary search divides the list in half on each iteration of the algorithm
oh and by the way don't sort the list each time, create a new text file with the 65000+ items sorted and throw away the old unsorted one (or keep it if you really want to)
thanks for your response,
I know the exact format of the file and it doesnt change. The format of the file is like this:
11111111112222222222SomeText
in words, its ten digits integer then another ten digits integer then some text and this is for the whole file. So, what other methods of parsing it that will increase the speed of the search?
by the way, I've taken an introduction course in linear algebra but i can't think of some techneques that applies to my situations. Any suggestions?
thanks
Raieda at 2007-7-29 13:08:28 >

Binary search will work wonders on a file like that if the strings are also in ascending ASCII order.
http://en.wikipedia.org/wiki/Binary_search.
Joe
Joe_ha at 2007-7-29 13:08:28 >

Another option is to split the file up in to say 6 files of 10000+, if they follow a numerical order then you can group them logically by this. You can decide which file to search a runtime given the range of the input or whatever the criteria is.
pseudo
final String FILE_NAME = "myfile";
if(input < 10000) {
searchFile(FILENAME+"1");
} else if(input <20000) {
searchFile(FILENAME+"2");
}
......
Or use arithmetic to give you the number to append to save if-else or case.
Hello,
You mentioned:
I know the exact format of the file and it doesnt change. The format of the file is like this:
1111111111 2222222222 SomeText
But is the file in a sorted format? And if so, which field is sorted?
1 or 2 or SomeText?
ROuNIN
Thanks guys, really interesting ideas.
but the binary search and breaking my file into 6 or 7 files all needs sorting as suggested. My file is not sorted so I will try to sort it and use it again and see the difference.
But I thought about another reason that may be causing the slowness! This file is being used by a JApplet, so what happens when visitor visits the page that the applet is on ? Is he going to download the applet along with the txt file that is more than 2.5 megabytes ? Or just the applet and then use the file from the server ? That is, Is the client going to download the file then search through it then display the results? Or is he going to download the applet and just use the file on the server to do the search without downloading it?
reagrds
Raieda at 2007-7-29 13:08:28 >

Try using String's split(...) method instead of StringTokenizer.
I just tested this method on a text file with 65000 lines of text:
private static String searchFile(String name, long lookingFor) throws IOException {
Scanner file = new Scanner(new File(name));
while(file.hasNextLine()) {
String line = file.nextLine();
String[] array = line.split("\\t+");
long a = Long.parseLong(array[0]);
long b = Long.parseLong(array[1]);
if(lookingFor >= a && lookingFor <= b) {
return array[2];
}
}
return null;
}
and when I look for the last row in the file, it finds it under 2 seconds on average.
prometheuzz, thanks for your response,
your way made it a littlebit faster but it still takes more than 40 seconds. I think the algorithim is not too bad. But the question is:
what happens when a visitor visits the page that the applet is on ? is he going to download the file that the applet uses? That is, is he going to download the text file that the applet searches which is more than 2.5 megabytes or the applet is going to perform the search on the file in its original place which is on the server?
thanks,
Raieda at 2007-7-29 13:08:28 >

If you're using an applet, and you're searching the file line-by-line, then yes, all data in the file is "downloaded" to the client. There will not be any code executed on the server the applet is on.
But 2.5 MB isn't that much: you could hold it in some sort of database or a load it into memory into some collection to make searching more efficient.