zip distributes over aggregation?
hi,
for the zip algorithm in java.util.zip, does size(zip(A) + zip(B)) ==approx size(zip(A+B))?
I've run the code below on a few directories of a couple of thousand files, and the results seem to more or less say yes. This is surprising, because I'd expected zip(A+B) to compress more heavily when A and B share some characteristic that the zip algorithm picks up on (e.g. if they were identical, you'd need only just over the space to zip one of them). The effect might be small for just two files, but in the case where you're saying size(zip(f1...fn)) ==approx size(zip(f1) + ... + zip(fn)) and n is fairly large, I'd expected it to become significant.
import java.io.*;
import java.util.*;
import java.util.zip.*;

class ZipOutputStreamTest {

    public static void main(String[] arg) throws IOException {
        File base = new File(arg[0]); // dir to zip
        Set<File> allfiles = getAllFiles(base);

        // zip all files into one archive
        zipfiles(allfiles, new File("c:/allfiles.zip"));

        // zip all files individually
        File newbase = new File("c:/allfiles");
        newbase.mkdirs();
        for (File f : allfiles) {
            // note: hashCode() is not guaranteed unique, so two files could collide
            zipfiles(Collections.singleton(f), new File(newbase, "" + f.hashCode()));
        }
    }

    // recursively collect every regular file under base
    static Set<File> getAllFiles(File base) throws IOException {
        Set<File> result = new HashSet<File>();
        if (base.isDirectory()) {
            File[] child = base.listFiles();
            for (int i = 0; child != null && i < child.length; i++)
                result.addAll(getAllFiles(child[i]));
        } else {
            result.add(base);
        }
        return result;
    }

    // write each file as its own entry in a single zip archive
    public static void zipfiles(Set<File> files, File output) throws IOException {
        byte[] buffer = new byte[1024 * 1024]; // reused for every file
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(output))) {
            for (File f : files) {
                try (FileInputStream fis = new FileInputStream(f)) {
                    zos.putNextEntry(new ZipEntry(f.getPath()));
                    for (int n; (n = fis.read(buffer)) != -1; ) {
                        zos.write(buffer, 0, n);
                    }
                    zos.closeEntry();
                }
            }
        }
    }
}
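A smaller in-memory check of the identical-files case might look like the sketch below. The class name ZipEntryIndependenceCheck, the helper names, the seed, and the 10,000-byte random test data are all assumptions made for illustration and are not part of the program above. It compares an archive holding two identical entries against an archive holding one entry with the same bytes written twice. Since each zip entry is deflated on its own, only the second layout gives the compressor a chance to see the repetition.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original program.
class ZipEntryIndependenceCheck {

    // Zip each byte array as its own entry (in memory) and return the archive size in bytes.
    static int zippedSize(byte[]... entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < entries.length; i++) {
                zos.putNextEntry(new ZipEntry("entry" + i));
                zos.write(entries[i], 0, entries[i].length);
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the central directory has been written.
        return bytes.size();
    }

    // Reproducible pseudo-random (hence poorly compressible) test data.
    static byte[] randomBytes(int length, long seed) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    // Concatenate two byte arrays: the "A+B" of the question.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    public static void main(String[] args) {
        byte[] a = randomBytes(10_000, 42L); // assumed size, well under 32 KB
        System.out.println("two identical entries: " + zippedSize(a, a));
        System.out.println("one entry holding A+A: " + zippedSize(concat(a, a)));
    }
}
```

The test data is kept well under DEFLATE's 32 KB window so that, in the single-entry case, the second copy is still within reach of the first. With larger inputs the repetition would fall outside the window and the two layouts would be expected to come out much closer in size.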
Are there other compression libraries that would have this property?
thanks,
asjf

