Help me with File.

I want to check many file in my folder.

It is very slow.

I want to increase speed.

for(f1='a';f1<='z';f1++)

{

for(f2='a';f2<='z';f2++)

{

for(f3='a';f3<='z';f3++)

{

for(f4='a';f4<='z';f4++)

{

File f=new File("image/"+String.valueOf(f1)+String.valueOf(f2)+String.valueOf(f3)+String.valueOf(f4)+".jpg");

if(f.canRead())

{

if(i<=19)

{

no[i]=("image/"+String.valueOf(f1)+String.valueOf(f2)+String.valueOf(f3)+String.valueOf(f4)+".jpg");

i++;

}

}

}

}

}

}

[1476 byte] By [SVPRMa] at [2007-11-27 5:53:48]
# 1
What the f are you doing?Why aren't you just retrieving the files from the directory instead of testing all possible names?Kaj
kajbja at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 2

Great!

Well, first improvement is:

for(f1='a';f1<='z';f1++)

{

for(f2='a';f2<='z';f2++)

{

for(f3='a';f3<='z';f3++)

{

for(f4='a';f4<='z';f4++)

{

String name = "image/"+String.valueOf(f1)+String.valueOf(f2)+String.valueOf(f3)+String.valueOf(f4)+".jpg";

File f=new File( name );

if(f.canRead())

{

if(i<=19)

{

no[ i++ ]= name;

}

}

}

}

}

}

Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 3

How about:

import java.io.File;

import java.io.FilenameFilter;

public class FnfTest {

public static void main(String[] args) {

FilenameFilter fnf = new FilenameFilter() {

public boolean accept(File dir, String name) {

return (name.matches("[A-Z]{4}.jpg"));

}

};

String[] files = null;

try {

File imageDir = new File("./image");

files = imageDir.list(fnf);

} catch (Exception e) {

e.printStackTrace();

}

for (int i = 0; i < files.length; i++) {

System.out.println(files[i] + " matches.");

}

}

}

Terminal output:

kev@mymachine ~/java$ ls image

ABCD.java ABCD.jpb ABCD.jpg BLAH.java BLAH.jpg WOOT.java WOOT.jpg

kev@mymachine ~/java$ java FnfTest

ABCD.jpg matches.

BLAH.jpg matches.

WOOT.jpg matches.

kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 4
This completely other method not original with improvements.BTW you should use [a-z] not [A-Z] due to OP code...And how about name ABCDXjpg? Will it match? :)
Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 5

> This completely other method not original with

> improvements.

> BTW you should use [a-z] not [A-Z] due to OP code...

> And how about name ABCDXjpg? Will it match? :)

Thank you, Michael, absolutely valid points... I was trying to get by lazily without having to hammer out my regex (uncle_alice would be appalled :).

Changing the 'matches' line to the following:

return (name.matches("^[a-z]{4}\\.jpg$"));

renders the following terminal output:

kev@mymachine ~/java$ ls image

BLAH.jpg abcd.java abcd.jpg abcdxjpg woot.jpg

kev@mymachine ~/java$ java FnfTest

abcd.jpg matches.

woot.jpg matches.

kev@mymachine ~/java$

kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 6
This one is better.BTW it's very interesting to check something :)Imagine we have image directory with ALL possible names from aaaa to zzzz. Which algorithm will work faster?
Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 7
Very Very Thanks kevjava.This program is run correctly.Please explain this program.
SVPRMa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 8

> Very Very Thanks kevjava.

> This program is run correctly.

> Please explain this program.

I implemented a FilenameFilter to pass to the list() method of the File class, which did a regex lookup of the filenames. The javadocs tell it all:

[url=http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#list(java.io.FilenameFilter)]http://java.sun.com/j2se/1.5.0/docs/api/java/io/File.html#list(java.io.FilenameFilter)[/url]

http://java.sun.com/j2se/1.5.0/docs/api/java/io/FilenameFilter.html

http://www.regular-expressions.info/

kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 9

> This one is better.

> BTW it's very interesting to check something :)

> Imagine we have image directory with ALL possible

> names from aaaa to zzzz. Which algorithm will work

> faster?

Well....

At my CPU/File I/O speeds, etc, the loop takes approx. six minutes to complete, slipping to almost seven minutes with 1000 files to process. The regex matching algorithm, however, takes roughly 0.00016 seconds per file with 100, which actually gets faster with more files, reaching .000046 seconds per file with 1000 files.

If all files were present from aaaa.jpg to zzzz.jpg, you would have 26^4, or 456,976 files. The brute force algorithm would still take at least 400 seconds, presumably, and the regex algorithm would presumably take less than 456,976*.000046, or 21.0 seconds. So, this regex algorithm apparently somehow edges it out either way.

Evidence and code follows:

Terminal output:

kev@mymachine ~/java$ rm -f image/*

kev@mymachine ~/java$ java AlgorithmTest 100

Making 100 files...

Starting brute force matching...

..........................Done.

Found 100 files in 368.11 seconds,

for an average of 3.6811000000000003 seconds/file.

Starting regex matching...

Found 100 files in 0.016 seconds,

for an average of 1.6E-4 seconds/file.

kev@mymachine ~/java$ rm -f image/*

kev@mymachine ~/java$ java AlgorithmTest 1000

Making 1000 files...

Starting brute force matching...

..........................Done.

Found 1000 files in 402.453 seconds,

for an average of 0.40245299999999995 seconds/file.

Starting regex matching...

Found 1000 files in 0.046 seconds,

for an average of 4.6E-5 seconds/file.

kev@mymachine ~/java$

import java.io.File;

import java.io.FilenameFilter;

import java.io.IOException;

import java.util.Random;

public class AlgorithmTest {

public static final String BASE = "image/";

public static final String EXT = ".jpg";

public static final String CHARS = "abcdefghijklmnopqrstuvwxyz";

public static Random random = new Random();

public static String getFileName() throws IOException {

String ret = BASE;

for (int i = 0; i < 4; i++) {

ret += CHARS.charAt( random.nextInt(26) );

}

ret += EXT;

return ret;

}

public static void makeACrapLoadOfFiles(int filesToCreate) throws IOException {

for (int i = 0; i < filesToCreate; i++) {

new File( getFileName() ).createNewFile();

}

}

public static int bruteForceFindFile() {

int found = 0;

for (char c1 = 'a'; c1 <= 'z'; c1++ ) {

System.out.print(".");

for (char c2 = 'a'; c2 <= 'z'; c2++ ) {

for (char c3 = 'a'; c3 <= 'z'; c3++ ) {

for (char c4 = 'a'; c4 <= 'z'; c4++ ) {

if ( new File(BASE +

String.valueOf(c1) +

String.valueOf(c2) +

String.valueOf(c3) +

String.valueOf(c4) +

EXT).exists() ) {

found++;

}

}

}

}

}

System.out.println("Done.\n");

return found;

}

public static int regexMatchingFindFile() {

int found = 0;

FilenameFilter fnf = new FilenameFilter() {

public boolean accept(File dir, String name) {

return (name.matches("^[a-z]{4}\\.jpg$"));

}

};

String[] files = null;

try {

File imageDir = new File("./image");

files = imageDir.list(fnf);

} catch (Exception e) {

e.printStackTrace();

}

for (int i = 0; i < files.length; i++) {

found++;

}

//// I could cheat and do a

// return files.length;

//// but it might violate the purpose of the exercise.

return found;

}

public static void main(String[] args) {

if (args.length < 1) {

System.out.println("Usage: pass in one integer argument saying " +

"how many files to test with.");

System.exit(-1);

}

int numFiles = Integer.parseInt(args[0]);

try {

System.out.println("\n\nMaking " + numFiles + " files...");

makeACrapLoadOfFiles(numFiles);

System.out.println("\n\nStarting brute force matching...");

long bfStart = System.currentTimeMillis();

int bfFound = bruteForceFindFile();

long bfEnd = System.currentTimeMillis();

double bfTime = ( (double)bfEnd - (double)bfStart ) / 1000.0;

double bfAverage = ( (double)bfTime/(double)bfFound );

System.out.println("Found " + bfFound + " files in " + bfTime + " seconds, ");

System.out.println(" for an average of " + bfAverage + " seconds/file." );

System.out.println("\n\nStarting regex matching...\n");

long maStart = System.currentTimeMillis();

int maFound = regexMatchingFindFile();

long maEnd = System.currentTimeMillis();

double maTime = ( (double)maEnd - (double)maStart ) / 1000.0;

double maAverage = (double)maTime/(double)maFound;

System.out.println("Found " + maFound + " files in " + maTime + " seconds, ");

System.out.println(" for an average of " + maAverage + " seconds/file.");

} catch (IOException ioe) {

ioe.printStackTrace();

}

}

}

kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 10
Very impressive :)BTW 1000 files means nothing, the question is about all 26^4. I bet correlation between time and number of files is not liner... Will check at my home PC bit later.
Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 11

> Very impressive :)

> BTW 1000 files means nothing, the question is about

> all 26^4. I bet correlation between time and number

> of files is not liner... Will check at my home PC bit

> later.

It admittedly wasn't a definitive proof, but I figured I would establish two points on the graphs. From there, the intention was to establish trends as the number of files approached the full 26^4 files, and assert by induction which method performed more efficiently. I couldn't create all the 26^4 files here as I'm running off a limited-capacity thumb drive, and can't really afford to spare the 400,000 inodes :).

However, there is an assumption made here that the file access performance trends continue in a linear or at least asymptotic fashion as the number of files approaches the maximum, which I can only speculate at from here. It was a fun little exercise, though. Let me know what you come up with :).

kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 12
I created all required files but finished too late to continue experiment. Will continue tonight :)BTW I bet regexp version will win, but it is still question...
Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 13
Well, regexp version approximately two times faster :)Just two times! So original algorithm is no as bad as it seems to be ;)
Michael.Nazarov@sun.coma at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...
# 14
> Well, regexp version approximately two times faster> :)> Just two times! So original algorithm is no as bad as> it seems to be ;)Very cool, I would have guessed the differences to be much higher. Thanks a lot.
kevjavaa at 2007-7-12 15:47:29 > top of Java-index,Java Essentials,New To Java...