Faster file copy algorithm
Hi all experts,
I am writing a method that copies a file to another location.
I need to copy many large files at one time, and I think my algorithm is quite slow.
Could anybody modify my algorithm to make it as fast as possible?
My algorithm is as follows; it is simple:
public void copyFile(final File fromFile, final File toFile)
{
try
{
File fp = toFile.getParentFile();
if (!fp.exists())
{
fp.mkdirs();
}
FileInputStream in = new FileInputStream(fromFile);
FileOutputStream fo = new FileOutputStream(toFile);
BufferedOutputStream out = new BufferedOutputStream(fo);
byte[] buffer = new byte[512];
for (int i = 0;(i = in.read(buffer)) > -1;)
{
out.write(buffer, 0, i);
}
out.close();
fo.close();
in.close();
}
catch (Exception e)
{
e.printStackTrace();
}
}
Thanks very much !
By aragon28a at 2007-9-28 13:41:07

Increase the buffer size? I think that should help. I mean, how much is 512 bytes?
Ingo
> Could anybody modify my algorithm to make it as fast
> as possible?
You could use Runtime.exec() and copy the file using the OS copy command. I doubt you are going to be able to do anything in Java that will be faster than that.
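A minimal sketch of the OS copy command suggestion, not a definitive implementation. It uses ProcessBuilder rather than Runtime.exec() (a substitution, so that the command's output is passed through instead of filling an unread pipe), and it assumes `cp` is available on Unix-like systems and `cmd /c copy` on Windows. The class name OsCopy is hypothetical, and inheritIO() needs Java 7 or later, which postdates this thread.

```java
import java.io.File;
import java.io.IOException;

public class OsCopy
{
    // Runs the platform copy command and returns its exit code.
    // By convention 0 means success; anything else (for example a full disk)
    // has to be handled by the caller.
    static int copy(File from, File to) throws IOException, InterruptedException
    {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        ProcessBuilder pb = windows
            ? new ProcessBuilder("cmd", "/c", "copy", "/y", from.getAbsolutePath(), to.getAbsolutePath())
            : new ProcessBuilder("cp", from.getAbsolutePath(), to.getAbsolutePath());
        pb.inheritIO(); // let the command's own messages reach the console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception
    {
        int exit = copy(new File(args[0]), new File(args[1]));
        System.out.println("copy command exit code: " + exit);
    }
}
```

Starting a process has a cost of its own, so for many small files this may well be slower than an in-process loop; it is worth measuring before adopting it.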
Thanks for all of your replies. I searched the forum for file copy topics and found that most people's suggestions are just like mine. When I changed the buffer size to 1024, it really was faster than before.
It would be interesting to see how much improvement you get by doubling the buffer size.
We observed that 512 was slow
1024 was better.
Try an experiment using
512
1024
2048
4096
8192
you pick the end point and see what the improvement is.
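The experiment suggested above could be run with something like the following sketch. The class name BufferSizeTest and the command-line arguments are hypothetical, and the try-with-resources syntax assumes Java 7 or later, which postdates this thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferSizeTest
{
    // Copies 'from' to 'to' with a buffer of bufferSize bytes; returns the bytes copied.
    static long copy(File from, File to, int bufferSize) throws IOException
    {
        long total = 0;
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to))
        {
            byte[] buffer = new byte[bufferSize];
            int count;
            while ((count = in.read(buffer)) != -1)
            {
                out.write(buffer, 0, count);
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException
    {
        File from = new File(args[0]);
        File to = new File(args[1]);
        // Double the buffer size each round: 512, 1024, 2048, 4096, 8192.
        for (int size = 512; size <= 8192; size *= 2)
        {
            long start = System.nanoTime();
            long bytes = copy(from, to, size);
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(size + " byte buffer: " + bytes + " bytes in " + ms + " ms");
        }
    }
}
```

Run it as `java BufferSizeTest bigfile copyOfBigfile`. The first round may look slower than the rest simply because the OS has not cached the source file yet, so repeating the whole run a few times gives a fairer comparison.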
Clearly input file sizes make a difference. I suspect that to optimize the copy time you could suspend all other threads in the program and make the buffer size the maximum available, then resume them when the copy is over.
If the input file size is smaller than the available memory, make the buffer bigger than the file size.
Then again, I suspect that, as stated previously, there'll be no Java copy method faster than the OS's copy method.
But you'll have to handle "out of disk space" messages from the OS.
Any comments?
Another experiment, though I don't know how much it is worth: could you have two threads, one a producer that reads the file and fills buffers, the other a consumer that takes the buffers and writes them? You have to synchronize the communication between the two threads.
Can we gain speed with this architecture?
Another way is to look at java.nio.channels.Pipe or SocketChannel and have two programs too, the producer and the consumer. You should be able to copy efficiently over a network...
use File.length() to set the buffer size equal to the file size, which minimizes the overhead, and read/write!
YES! dingdingdingding! Right on the money here. When I was in school, we did this exact experiment in class. Bigger buffers work better in most cases, especially for bigger files, but there's a point where it starts to level off... Of course my instructor was an idiot; he couldn't understand why different buffer sizes didn't make a difference copying a 10 byte file.
> It would be interesting to see how much improvement
> you get by doubling the buffer size.
>
> We observed that 512 was slow
> 1024 was better.
>
> Try an experiment using
> 512
> 1024
> 2048
> 4096
> 8192
> you pick the end point and see what the improvement
> is.
> Another experiment which I don't know how much it is
> worth... Could you have two threads: one is a producer
> that reads the file and creates buffers; the other one
> is the consumer that reads the buffers and write them.
> You have to synchronize the communication between the
> two threads.
> Can we gain speed with this architecture?
Either you synchronize or you risk race conditions. If you sync, one process will be waiting for the other; it's no different than one thread going back and forth. I don't think you'd get any improvement out of it.
>ChuckBing said:
>use File.length() to set the buffer size
>equal to the file size, which minimizes the
>overhead, and read/write!
That's fine with small files, but I sure wouldn't recommend that to anyone trying to copy anything even as big as an mp3. You'll just be wasting memory, and doing the same file I/Os anyway. Hard drives can only give you back so much data at once, so going arbitrarily high will definitely stop paying off after a certain point. I dunno where that point is, maybe 64k or your hd's sector size? Ask a hd for 6.3 megs, it will give you 64k, then another 64k, then another 64k, etc... Java might appear to do it in one nice swoop, but the OS will be breaking that request up into bite-size chunks for you, and Java will be using more resources than it needs.
> Either you synchronize or you risk race conditions.
> If you sync, one process will be waiting for the
> other, it's no different than one thread going back
> and forth. I dont think you'd get any improvement
> out of it,
i've done some tests here and it gives a worthwhile speedup under windows
asjfa at 2007-7-12 9:47:58 >

> i've done some tests here and it gives a worthwhile
> speedup under windows
I'd be interested in running those tests on some different OSes; I'll have to try that out. I'm wondering if Windows is doing some caching or something, because theoretically I can't see how threading it out could help. Do you have some short code you could post?
Sure - it's not pretty but it does seem to work.
Also, it's not the producer/consumer idea from above. The code I'm actually using uses the Java NIO API for copying (it's integrated into some other code so :( it can't be posted), and can take up to 500MB of RAM when it's running.
Please let me know your results!
asjf
Usage: java SpeedTest5 sourceroot targetroot maxthreads
import java.io.*;
// Test the effect of threading on copying a file tree
class SpeedTest5
{
static final int BUFF_SIZE = 1024*1024;
File source, target;
int sourcePrefixLength;
int maxThreadCount;
ThreadGroup dircopythreads;
SpeedTest5(File a, File b, int max)
{
source = a;
target = b;
sourcePrefixLength = source.getAbsolutePath().length();
maxThreadCount = max;
dircopythreads = new ThreadGroup("mythreads");
}
public void go() throws Exception
{
long start = System.currentTimeMillis();
new Thread(dircopythreads, new SubdirCopy(source), "Top level thread").start();
while(dircopythreads.activeCount()!=0)
{
dircopythreads.list();
Thread.sleep(4000); // sleep() is static, so call it on the class
}
System.out.println("Took "+(System.currentTimeMillis()-start));
}
public static void main(String [] arg) throws Exception
{
SpeedTest5 st5 = new SpeedTest5(new File(arg[0]), new File(arg[1]), Integer.parseInt(arg[2]));
st5.go();
}
// inner class for subdirectory copying threads
class SubdirCopy implements Runnable
{
File subdirectory; // the threads sub directory to copy
byte[] buffer = new byte[BUFF_SIZE]; // the threads local byte buffer for the j2se1.3 copyfile routine
SubdirCopy(File base){subdirectory=base;}
public void run(){recurse(subdirectory);}
public void copyfile(File a,File b) // nicked from http://java.sun.com/docs/books/performance/1st_edition/html/JPIOPerformance.fm.html#11078
{
try {
InputStream in = new FileInputStream(a);
FileOutputStream out = new FileOutputStream(b);
while(true) {
int count = in.read(buffer);
if(count == -1) break;
out.write(buffer,0,count);
}
out.close();
in.close();
}
catch(Exception e){e.printStackTrace();}
}
public void recurse(File base)
{
File [] children = base.listFiles();
for(int i=0; i<children.length; i++)
if(children[i].isDirectory())
{
if(dircopythreads.activeCount()<maxThreadCount)
new Thread(dircopythreads,new SubdirCopy(children[i]),"Thread assigned to "+children[i]).start();
else
recurse(children[i]);
}
else
{
File target_file = new File(target, children[i].getAbsolutePath().substring(sourcePrefixLength));
//System.out.println("Copy "+children[i]+" to "+target_file);
while(!target_file.getParentFile().exists())
target_file.getParentFile().mkdirs();
copyfile(children[i],target_file);
}
}
}
}
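The NIO-based copy mentioned in the post above was not shared, so the following is only a guess at what a channel-based copy could look like, not the poster's code. It is a minimal sketch built on FileChannel.transferTo; the class name ChannelCopy is hypothetical, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChannelCopy
{
    // Copies 'from' to 'to' through file channels; returns the bytes transferred.
    static long copy(File from, File to) throws IOException
    {
        try (FileChannel in = new FileInputStream(from).getChannel();
             FileChannel out = new FileOutputStream(to).getChannel())
        {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size)
            {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0)
                {
                    break; // nothing more to transfer (e.g. the file shrank meanwhile)
                }
                position += moved;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException
    {
        long bytes = copy(new File(args[0]), new File(args[1]));
        System.out.println("Copied " + bytes + " bytes");
    }
}
```

With transferTo there is no user-level buffer to size; whether it beats a plain read/write loop depends on the OS and file system, so it needs measuring like everything else in this thread.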

An experiment on the two-thread producer/consumer idea.
A micro-benchmark to check how efficient this solution is for copying a big file, compared with the traditional buffer/read/write loop.
The program:
import java.util.*;
import java.io.*;
/**
* ThreadCopy: copy a file using 2 threads.
*
* @author Pierre Métras
* @date 20030312
*/
public class ThreadCopy
{
/**
* The buffer size.
*/
private final static int BUFFER_SIZE = 65536; // 64KB
//private final static int BUFFER_SIZE = 8192;
private static volatile boolean _finished = false;
/**
* The queue
*/
private static List _list = new LinkedList();
/**
* Delay between the 2 threads.
*/
private static volatile int _delay = 0;
private static long _start;
/**
* Main entry point.
*/
public static void main(final String[] args)
{
// Print the usage and stop if the arguments are missing or the mode is unknown.
if (args.length < 3 || (!"2".equals(args[0]) && !"1".equals(args[0])))
{
System.err.println("ThreadCopy benchmark");
System.err.println("java ThreadCopy 2 fileIn fileOut [numberBuffersDelay]\t2 threads copy");
System.err.println("java ThreadCopy 1 fileIn fileOut\t1 thread copy");
return;
}
_start = System.currentTimeMillis();
if ("2".equals(args[0]))
{
try
{
_delay = Integer.parseInt(args[3]);
}
catch (NumberFormatException nfe)
{
_delay = 0;
}
catch (ArrayIndexOutOfBoundsException aiobe)
{
_delay = 0;
}
Thread read = new ReadThread(args[1]);
Thread write = new WriteThread(args[2]);
read.start();
write.start();
}
else
{
copyFile(args[1], args[2]);
}
}
/**
* The reader thread.
* Reads file and put buffers in the queue.
*/
static class ReadThread
extends Thread
{
private String _fileIn;
ReadThread(final String fileIn)
{
super("ReadThread");
_fileIn = fileIn;
}
public void run()
{
try
{
BufferedInputStream in = new BufferedInputStream(new FileInputStream(_fileIn), BUFFER_SIZE);
byte[] buffer = new byte[BUFFER_SIZE];
int i = 0;
int count;
while ((count = in.read(buffer)) > -1)
{
// System.out.println("Read #" + i + " " + count);
synchronized (_list)
{
_list.add(new BufferInfo(count, buffer));
_delay--;
_list.notifyAll();
}
buffer = new byte[BUFFER_SIZE];
i++;
}
in.close();
}
catch (IOException ioe)
{
System.err.println("Read error: " + ioe);
}
finally
{
_finished = true;
_delay = -1;
synchronized (_list)
{
_list.notifyAll();
}
}
}
}
/**
* The writer thread.
* Consume the buffers from the queue and write them to disk.
*/
static class WriteThread
extends Thread
{
private String _fileOut;
WriteThread(final String fileOut)
{
super("WriteThread");
_fileOut = fileOut;
}
public void run()
{
int i = 0;
long size = 0L;
try
{
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(_fileOut), BUFFER_SIZE);
while (true)
{
BufferInfo bi = null;
synchronized (_list)
{
while (_list.isEmpty() && !_finished)
{
try
{
_list.wait();
}
catch (InterruptedException ie)
{
// Awaken!
}
}
if (_list.isEmpty() && _finished)
{
out.close();
return;
}
if (_delay < 0 && !_list.isEmpty())
{
bi = (BufferInfo) _list.remove(0);
}
}
if (bi != null)
{
// System.out.println("Write #" + i);
out.write(bi.getBuffer(), 0, bi.getSize());
size += bi.getSize();
i++;
}
}
}
catch (IOException ioe)
{
System.err.println("Write error: " + ioe);
}
finally
{
printStats(size);
}
}
}
/**
* The buffer info store in the queue.
*/
static class BufferInfo
{
private final int _size;
private byte[] _buffer;
BufferInfo(final int size, final byte[] buffer)
{
_size = size;
_buffer = buffer;
}
int getSize()
{
return _size;
}
byte[] getBuffer()
{
return _buffer;
}
}
/**
* The traditional copy function, as a reference.
*/
private static void copyFile(final String fromFile, final String toFile)
{
long size = 0L;
try
{
BufferedInputStream in = new BufferedInputStream(new FileInputStream(fromFile), BUFFER_SIZE);
BufferedOutputStream out = new BufferedOutputStream(new FileOutputStream(toFile), BUFFER_SIZE);
byte[] buffer = new byte[BUFFER_SIZE];
int count = 0;
for (int i = 0; (i = in.read(buffer)) > -1; )
{
//System.out.println("ReadWrite #" + count);
out.write(buffer, 0, i);
size += i;
count++;
}
out.close();
in.close();
}
catch (IOException ioe)
{
System.err.println("copyFile error: " + ioe);
}
printStats(size);
}
/**
* Print a few stats about copy efficiency.
*/
private static void printStats(final long size)
{
long end = System.currentTimeMillis();
System.out.println();
double last = (end - _start) / 1000.0;
System.out.println("Total time=" + last + "s");
System.out.println("Throughput=" + ((long) (size / last / 1000.0)) + "KB/s");
}
}
The results:
The optional fourth parameter is the consumer's delay: it gives the producer thread a head start of that many buffers before the consumer begins writing.
Timings are the best results from 5 runs.
My disk is really fragmented, with less than 10% free space, so don't take these figures too seriously!
But I'm interested if someone can run this program on a system with a good, clean disk subsystem.
FYI, the DOS copy command takes a similar amount of time, but is less resource intensive.
J2SDK 1.4.1_01
PC Win2K 2xPII 300MHz 512MB 13GBx7200tr/s
File size = 71652 KB
BUFFER SIZE=8KB
===============
C:\Temp>java -Xms256m -Xmx256m ThreadCopy 1 bigfile toto
Total time=13.422s
Throughput=5466KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto
Total time=13.625s
Throughput=5385KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 5
Total time=14.078s
Throughput=5211KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 30
Total time=13.609s
Throughput=5391KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 100
Total time=14.14s
Throughput=5188KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 1000
Total time=15.266s
Throughput=4806KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 10000
Total time=17.172s
Throughput=4272KB/s
BUFFER SIZE=64KB
================
C:\Temp>java -Xmx256m ThreadCopy 1 bigfile toto
Total time=13.953s
Throughput=5258KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto
Total time=13.578s
Throughput=5403KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 5
Total time=14.078s
Throughput=5211KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 30
Total time=13.438s
Throughput=5459KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 50
Total time=13.265s
Throughput=5531KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 100
Total time=14.719s
Throughput=4984KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 1000
Total time=15.406s
Throughput=4762KB/s
C:\Temp>java -Xmx256m ThreadCopy 2 bigfile toto 10000
Total time=16.688s
Throughput=4396KB/s
The conclusion:
No definite winner between the two methods.
Filling the memory with many buffers is not good, but a delay of around 30 to 50 64KB buffers gave the best results.
A better solution would be to pre-allocate the buffers and keep them in a pool. This solution is left as an exercise for the reader ;-)
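One way the pooled variant could look is sketched below. This is not the author's code: it swaps the synchronized LinkedList for two bounded queues from java.util.concurrent (Java 5 or later), uses a lambda and try-with-resources (Java 8 or later), and the names PooledThreadCopy and Chunk are hypothetical. Whether it is any faster is left to measurement.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PooledThreadCopy
{
    // A pre-allocated buffer plus the number of valid bytes; size -1 marks the end.
    static final class Chunk
    {
        final byte[] data;
        int size;
        Chunk(int capacity) { data = new byte[capacity]; }
    }

    static long copy(String fileIn, String fileOut, int bufferSize, int poolSize)
        throws IOException, InterruptedException
    {
        final BlockingQueue<Chunk> free = new ArrayBlockingQueue<Chunk>(poolSize);
        final BlockingQueue<Chunk> filled = new ArrayBlockingQueue<Chunk>(poolSize);
        for (int i = 0; i < poolSize; i++)
        {
            free.add(new Chunk(bufferSize)); // all buffers are allocated once, up front
        }
        final IOException[] readError = new IOException[1];

        try (InputStream in = new FileInputStream(fileIn);
             OutputStream out = new FileOutputStream(fileOut))
        {
            // Producer: take an empty buffer, fill it, hand it to the writer.
            Thread reader = new Thread(() -> {
                try
                {
                    while (true)
                    {
                        Chunk c = free.take();
                        try { c.size = in.read(c.data); }
                        catch (IOException e) { readError[0] = e; c.size = -1; }
                        filled.put(c);
                        if (c.size < 0) return; // end of file or read error
                    }
                }
                catch (InterruptedException e)
                {
                    // the writer gave up; stop reading
                }
            }, "ReadThread");
            reader.start();

            // Consumer (the calling thread): write each buffer, then recycle it.
            long total = 0;
            try
            {
                while (true)
                {
                    Chunk c = filled.take();
                    if (c.size < 0) break;
                    out.write(c.data, 0, c.size);
                    total += c.size;
                    free.put(c);
                }
            }
            finally
            {
                reader.interrupt(); // unblocks the reader if the writer failed early
                reader.join();
            }
            if (readError[0] != null) throw readError[0];
            return total;
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println("Copied " + copy(args[0], args[1], 65536, 30) + " bytes");
    }
}
```

The pool size bounds memory use: the reader blocks as soon as every buffer is waiting to be written, so the producer can never run arbitrarily far ahead of the consumer.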
A few more comments, after some further experiments:
1) Using a pool to keep allocated buffers increases memory usage but gives no significant boost in performance. So the GC is doing a good job by itself of recycling memory...
2) Profiling shows that more than 90% of the time of WriteThread is spent writing data, in java.io.FileOutputStream.writeBytes.
Globally:
write=70%
read=10%
So whatever solution is adopted, to copy big files efficiently, use a fast disk system!
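Without a profiler, a rough read/write split like the one above can be approximated by timing the two calls inside the copy loop. This is a minimal sketch under that assumption; the class name PhaseTimer is hypothetical and the code assumes Java 7 or later. The numbers it prints only cover time spent inside read() and write(), and OS write caching can make writes look cheaper than they really are.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PhaseTimer
{
    // Returns { bytes copied, nanoseconds spent in read(), nanoseconds spent in write() }.
    static long[] copy(File from, File to, int bufferSize) throws IOException
    {
        long bytes = 0, readNanos = 0, writeNanos = 0;
        try (InputStream in = new FileInputStream(from);
             OutputStream out = new FileOutputStream(to))
        {
            byte[] buffer = new byte[bufferSize];
            while (true)
            {
                long t0 = System.nanoTime();
                int count = in.read(buffer);
                long t1 = System.nanoTime();
                readNanos += t1 - t0;
                if (count == -1)
                {
                    break;
                }
                out.write(buffer, 0, count);
                writeNanos += System.nanoTime() - t1;
                bytes += count;
            }
        }
        return new long[] { bytes, readNanos, writeNanos };
    }

    public static void main(String[] args) throws IOException
    {
        long[] r = copy(new File(args[0]), new File(args[1]), 65536);
        System.out.println("bytes=" + r[0]
            + " read=" + (r[1] / 1000000L) + "ms"
            + " write=" + (r[2] / 1000000L) + "ms");
    }
}
```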
just to add that I still think threading can benefit copying file trees, even if not single files..
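For anyone who wants to test the file tree claim on a current JVM, here is a minimal sketch that copies the files of a tree on a fixed-size thread pool. It is not the SpeedTest5 code from earlier in the thread: it assumes Java 8 or later, relies on java.nio.file.Files.copy for the per-file copy, and the class name ParallelTreeCopy is hypothetical. It also assumes the source is a directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelTreeCopy
{
    // Copies every regular file under 'source' to the same relative path under
    // 'target', using 'threads' worker threads. Returns the number of files copied.
    static int copyTree(Path source, Path target, int threads) throws Exception
    {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source))
        {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try
        {
            List<Future<?>> pending = new ArrayList<>();
            for (Path file : files)
            {
                Path dest = target.resolve(source.relativize(file));
                pending.add(pool.submit(() -> {
                    Files.createDirectories(dest.getParent());
                    Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                    return null; // makes this a Callable, so IOException may propagate
                }));
            }
            for (Future<?> f : pending)
            {
                f.get(); // waits for the copy and rethrows any failure
            }
            return files.size();
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception
    {
        long start = System.currentTimeMillis();
        int n = copyTree(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Copied " + n + " files in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Running it with `maxthreads` set to 1 and then to a few higher values gives a direct single-thread versus multi-thread comparison on the same tree.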
