Hashtable on disc

I've been looking crazy over the internet and in the forums, but haven't found any Hashtable implementation that stores on disc. Anyone know any good library for that?Gil
[186 byte] By [gilroittoa] at [2007-9-29 4:24:39]
# 1
could you use Serialization? HashTable and HashMap both override writeObject and readObjectasjf
asjfa at 2007-7-14 17:45:45 > top of Java-index,Other Topics,Algorithms...
# 2

Do you specifically need the performance characteristics of of a Hashtable (constant time lookup)?

Or do you need fast lookup?

Do you want the datastructure to operate live from the disk? Or just periodically back it up?

Depending on your needs, you may want to look into any number of small embedable SQL engines, and store your values as BLOB datatype.

For example:

http://sourceforge.net/projects/hsqldb/

mgbolusma at 2007-7-14 17:45:45 > top of Java-index,Other Topics,Algorithms...
# 3

Hi,

I need the hastbale to run on disc. I want it to be on accessed on disc for searches. Not just back it up with writeObject. I only need it to have the characteristics of a hashtable, that is constant time lookup, it doesn't have to be lightning fast.

I will need to modify it so I can have more than 1 hashtable in the same open file also, but that is the easy part.

I don't know if i want to embedd a whole SQL engine. i just want a hashtable, should not be more than a couple of hundred lines of codes.

Gil

gilroittoa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 4

Gil

The IRF framework contains a PersistentIrfHashtable that claims

to use a file for persistence.

http://www.itl.nist.gov/iad/894.02/projects/irf/guide/PdkcInternals.html

http://www.itl.nist.gov/iad/894.02/projects/irf/

It may be worth a look at.

As to having multiple persistent hash's sharing a file...that may be more

difficult then it appears as hashes don't have a fixed size so the second

hash can't just be appended to the first in the file (unless inserts are

very infrequent and you don't mind the cost of entirly rewriting the file

when one occurs)

matfud

matfuda at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 5

>I don't know if i want to embedd a whole SQL engine. i just want a >hashtable, should not be more than a couple of hundred lines of codes.

Probably more like a couple thousand.

I'm sure you know a file isn't like memory. You have to implement all allocation and deallocation mechanisms. When you remove an entry that's stored in the middle of the file what do you do with the "hole"? Re-use it? What will your strategy be to keep the file from becoming too fragmented?

How will you recover from errors like the app crashing halfway through a write?

Plus of course, if you're implementing a hashtable, part of that is implementing a linked list to deal with hash collisions. All this on disk.

And of course, for debugging, pull out your favourite hex editor (I recommend Axe if you can find it) and start debugging your hashtable file with 20000 entries so you can see where your missing data went.

Or write some custom code for debugging the file.

Trust me, I had to implement an on-disk balanced tree maps (AVL/RedBlack). It was no picnic.

If someone has done it already (HSQLDB) it really is worth consideration.

Mind you, sometimes you have to bite the bullet and do it yourself to get exactly what you want.

mgbolusma at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 6

I think most the difficulties above come from having nonuniform key/value sizes. If they are all the same size you should be sorted since you could take the source code for HashMap which is implemented via an array, and modify this.

you'll probably need to use RandomAccessFile (or a nio MemoryMappedBuffer if you really want)

asjf

asjfa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 7

I think the java HashMaps use closed-address hashing where each bin

(element in the internal array) is null or contains a pointer to a linked

list. Therefore to save this beasty you need to have a persistant

linkedlist implementation as well as a persistance mechanism for arrays.

If on the other hand you use open-address hashing all you require is the

ability to keep an array (and a number of integers for size etc.) on disk (memory mapping seems a good bet). Although you might wish to

split the array into blocks that can be saved out independently.

Chunking the data in this manner can allow more efficient use of disk

space. Open-address hashing has the disadvantage of requiring (for a

particular get or put operation) access to more of the array

implementaion which may hurt performance if your hashkey is not well

distributed.

In either circumstance having fixed size keys and values would

significantly simpyfy the implementation. As for the disk format you

may wish to use somthing similar to ASN1.

http://www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html

matfud

matfuda at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 8

Thanks all for your efforts.

As you have pinpointed it's not easy to do a hashtable on disc (why I hoped and prayed there was already something

out there so I didn't had to do it myself). But it is possible to do it myself I guess.

> As to having multiple persistent hash's sharing a file...

> that may be more difficult then it appears as hashes don't

> have a fixed size so the second hash can't just be appended to the first in the file

Not hard. I already have a class "PortionedFile" that gives me instances of "FilePortion". It is virtual files that

actually resides in only 1 physical file. It has some basic "memory handling". When a hash talbe grows, it requires

the file to grow (eg double size), and the PortionedFile finds new space (which might be caused be ealrier remove

portions) for the new file size and copies the old content to the new filespace and connects the virtual FilePortion

to that new area. Very simple algorithm but efficient and not too spacewasting (and discspace is not that much of a

problem anyway).

> Probably more like a couple thousand.

> I'm sure you know a file isn't like memory.

> You have to implement all allocation and deallocation mechanisms.

> When you remove an entry that's stored in the middle of the file

> what do you do with the "hole"? Re-use it?

> What will your strategy be to keep the file from becoming too fragmented?

> How will you recover from errors like the app crashing halfway through a write?

I have a memory handler for disc, which will be useful when allocating space for data (including variable size keys

and entries) in a similar way as memory allocation for primary memory is done (alloc in C, new in Java).

BUT! Does anyone know of a good "allocation and deallocation mechanisms" for disc? Would guess a existing solution

would not be too bad to use even if mine will "work".

> closed-address hashing where each bin

> (element in the internal array) is null or contains a pointer to a linked

> list. Therefore to save this beasty you need to have a persistant

> linkedlist implementation as well as a persistance mechanism for arrays.

The hashtable implementation I was thinking of would use "closed-address hashing", Which includes "pointers" to the

list for that "hash-bin". I actually don't know if this will cause more disc-reads then "open-adress hashing". The

advantage as I can see with "closed-address hashing" is that the list for the "hash-bin" can be some kind of structure with all variable length keys and data for that list, so they all can be read in 1 single read. If more data is added to a list, the whole list will be rewritten to disc.

> Trust me, I had to implement an on-disk balanced tree maps (AVL/RedBlack). It was no picnic.

I know this sad truth.

I didn't understand how the ASN1 would be of any help and what matfud meant by "As for the disk format you

may wish to use somthing similar to ASN1"

Thanks again

Gil

gilroittoa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 9

>it doesn't have to be lightning fast

in this case why not store each Map.Entry (or equiv.) in a single file? this would avoid having to manage the file space yourself

you could organize the disc like

%Whatever%\map1\304934\entry1.dat

%Whatever%\map1\334534\entry1.dat

%Whatever%\map1\334534\entry2.dat

%Whatever%\map1\334534\entry3.dat

%Whatever%\map1\323434\entry1.dat

where the number is a hashcode H, and entryN.dat holds the Nth Entry with key with hashcode H

asjf

asjfa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 10

Ha, would look funny w a thousand files on the disc.

Not good since Java has the stupid file limit of 2048 open files (see BUG 4189011 - unfixed since 1998, shame on Sun). We will have MANY of these indexes, therefor I even want to put more than 1 hashtable into the same physical file to not bump into this limit.

Gil

gilroittoa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 11

> Ha, would look funny w a thousand files on the disc.

yeah :), still other things do this all the time e.g.

"C:\Documents and Settings\asjf\Local Settings\Temporary Internet Files"

has (for me) ~2500 files in it.

> Not good since Java has the stupid file limit of 2048

> open files (see BUG 4189011 - unfixed since 1998,

> shame on Sun). We will have MANY of these indexes,

> therefor I even want to put more than 1 hashtable into

> the same physical file to not bump into this limit.

I think you'd only ever have a handful of files open simultaneously? you would only need to open each individual entry for the duration for the load/store operation (and even less if you make a caching implementation), and if you need to be thread-safe you could keep a count of open files and ensure that you never exceed a hard limit by waiting for other operations to finish.

asjf

asjfa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 12

Hi the ASN1 comment was more about debugging then anything else.

Its a resonably compact and standardised way of storing arbitrary

data. The nice thing about it is that there are may avaiable viewers that

could be used as debuggers.

one thing I was worried about was if, for example, you implemented a

LinkedList on disk. Consider adding a element to the begining then an

element to the end then one to the begining and so on. On the file these

would all be appended to the end of the file with references back to

wherever they are supposed to be in the list. All well and good until

you ask to iterate over the list and then you hit some performance

problems. The (now) middle elements of the list will be at the begining of

the file while the start and end elements are at the end. This could cause

disk thrashing esp. for a large file as each element would require a seek

to the begining of the file then one to the end. (depends on how the OS

does seeks).

> The advantage as I can see with "closed-address hashing" is that the

> list for the "hash-bin" can be some kind of structure with all

> variable length keys and data for that list, so they all can be read

> in 1 single read. If more data is added to a list, the whole list

> will be rewritten to disc.

This may or may not be a good idea. It really depends on the number of

elements that are in the bin. Im sure you know that on average there

should be few elements in each bin" but the occasional bin with a large

number of elements which could kill performance.

You could have a seperate thread (low priority) running in the

background (or intermittently) that defrags the disk array (depends

on how low level your access is to the PortionedFile class's memory

disk allocation routines.

yet more random thoughts from

matfud

matfuda at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 13
Why do you need to have the hash table on disc in the first place. Is it because it's large or is it because it's shared?
UlrikaJa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 14
Anyway, implementing a restricted (fixed size, no removes) hash table is very simple. You just use a random access file instead of an array (in memory).
UlrikaJa at 2007-7-14 17:45:46 > top of Java-index,Other Topics,Algorithms...
# 15

I need the hash on disc cos it's enormous.

It is NOT easy to do it on disc, since both keys and values are variable size. You have to have an array on disc that points out where the actual keys and values lies.

If you just use an array to represent the hashtable (open-adressing), then as metfud says: Open-address hashing has the disadvantage of requiring (for a particular get or put operation) access to more of the array implementation which may hurt performance if your hashkey is not well distributed.

My solution with a bin with all keys and values in a structure do have the problem that "the occasional bin with a large number of elements which could kill performance.". But so would the Open-adress hashing with bad distribution, since they hashCode pinpoints the same "hash-bin" no matter if you use open och closed hashing. And still if there was 10 entries in the same hash-bin, that would produce approximatly 10*100 bytes, and to seek and read or write 1000 bytes or 100 bytes is no difference. Then the linear search in that list in memory is still nothing compared to the disc read time.

Gil

gilroittoa at 2007-7-19 3:47:37 > top of Java-index,Other Topics,Algorithms...
# 16

> It is NOT easy to do it on disc, since both keys and

> values are variable size.

don't scrutinize this code too much :) but I still believe its the easiest approach for this. Its a naive implementation so there are lots of ways to make it faster (some kind of cacheing would really help), and it can obviously only take Serializable objects as keys and values

its just a quick prototype to see if the idea would get off the ground, so theres no surprise if there are (lots of) bugs (and the IOException handling isn't v. good either)

if actaully implementing something like this extending AbstractMap would possibly save a little bit of code duplication

asjfimport java.io.*;

import java.util.*;

public class DiskMap implements Map {

File base;

public static class Entry implements Map.Entry, Serializable {

Serializable key;

Serializable value;

public Object getKey() {return key;}

public Object getValue() {return value;}

public int hashCode() {return key.hashCode() + value.hashCode();}

public Object setValue(Object value) {

Object oldvalue = this.value;

this.value=(Serializable)value;

return oldvalue;

}

public Entry(Object k,Object v){

key = (Serializable) k;

value = (Serializable) v;

}

public boolean equals(Object o) {

boolean result = false;

if(o!=null && o instanceof Entry) {

Entry other = (Entry) o;

if(other.key!=null && other.value!=null)

result = other.key.equals(key) && (other.value.equals(value));

}

return false;

}

}

public DiskMap(File f){

if(f.exists()){

if(!f.isDirectory() || !f.canWrite())

throw new IllegalArgumentException();

} else {

f.mkdirs();

}

base=f;

}

public void clear(){

for(Iterator i=keySet().iterator(); i.hasNext();)

remove(i.next();

}

public Collection values(){

Set result = new HashSet();

File [] entries = base.listFiles();

for(int i=0; i<entries.length; i++) {

result.add(loadEntry(entries[i]).value);

}

return result;

}

public void putAll(Map t){

Object key = null;

for(Iterator i = t.keySet().iterator(); i.hasNext());

put(key=i.next(),t.get(key));

}

public Object remove(Object key){

Object value = null;

File f = new File(base,""+key.hashCode());

value = loadEntry(f).value;

f.delete();

return value;

}

public Set entrySet(){

Set result = new HashSet();

File [] entries = base.listFiles();

for(int i=0; i><entries.length; i++) {

result.add(loadEntry(entries[i]));

}

return result;

}

public Set keySet(){

Set result = new HashSet();

File [] entries = base.listFiles();

for(int i=0; i><entries.length; i++) {

result.add(loadEntry(entries[i]).key);

}

return result;

}

public int size(){return base.listFiles().length;}

public boolean isEmpty(){return base.listFiles().length==0;}

public int hashCode(){return base.hashCode();}

public boolean equals(Object o){return (o instanceof DiskMap) && ((DiskMap)o).base.equals(base);}

public boolean containsKey(Object key){return (new File(base,""+key.hashCode())).exists();}

public boolean containsValue(Object value){return values().contains(value);}

public Object put(Object key, Object value) {

Object result = null;

if(containsKey(key)){

result = loadEntry(key).value;

}

saveEntry(new Entry(key,value));

return result;

}

private void saveEntry(Entry entry) {

try {

File f = new File(base,""+entry.key.hashCode());

FileOutputStream fos = new FileOutputStream(f);

ObjectOutputStream out = new ObjectOutputStream(fos);

out.writeObject(entry);

out.close();

fos.close();

} catch(IOException e) {throw new RuntimeException(e);}

}

private Entry loadEntry(Object key) {

return loadEntry(new File(base,""+key.hashCode()));

}

private Entry loadEntry(File f) {

Entry entry = null;

try {

FileInputStream fis = new FileInputStream(f);

ObjectInputStream ois = new ObjectInputStream(fis);

entry = (Entry) ois.readObject();

ois.close();

fis.close();

} catch(IOException e) {throw new RuntimeException(e);

} catch(ClassNotFoundException e) {throw new RuntimeException(e);}

return entry;

}

public Object get(Object key) {

Object result = null;

if(containsKey(key)){

result = loadEntry(key).value;

}

return result;

}

public static void main(String [] arg) throws Exception {

Map m = new DiskMap(new File("h:/diskmap"));

m.put(new Integer(1),"one");

m.put("two",new Integer(2));

System.out.println(m.get(new Integer(1)));

m.put(new Integer(1),"one2");

System.out.println(m.get(new Integer(1)));

for(Iterator i = m.keySet().iterator(); i.hasNext(); )

System.out.println( i.next());

System.out.println(m.size());

m.remove(new String("two"));

m.remove(new Integer(1));

for(Iterator i = m.keySet().iterator(); i.hasNext(); )

System.out.println( i.next());

System.out.println(m.size());

}

}

>

asjfa at 2007-7-19 3:47:37 > top of Java-index,Other Topics,Algorithms...
# 17

To open a file for each key would be a performance killer (open is very slow), especially when there are millions of files - who knows the algorithm used be the operative system to find by name one of a million files in a directory? - In best case logarithmic search I would guess, might even be linear search, and then the constant time of a hashtable is lost. It would also seriously defrag the harddrive when there a millions of adds. And each file has some overhead.

But to implement Map and do it the way u do it seems very promising. And your implementation seem to really work, even if not satisfactory for really big hashtables.

Gil

gilroittoa at 2007-7-19 3:47:37 > top of Java-index,Other Topics,Algorithms...
# 18

> To open a file for each key would be a performance

> killer (open is very slow), especially when there are

> millions of files - who knows the algorithm used be

> the operative system to find by name one of a million

> files in a directory? - In best case logarithmic

> search I would guess, might even be linear search, and

> then the constant time of a hashtable is lost. It

> would also seriously defrag the harddrive when there a

> millions of adds. And each file has some overhead.

>

> But to implement Map and do it the way u do it seems

> very promising. And your implementation seem to really

> work, even if not satisfactory for really big

> hashtables.

cool, I think some performance issues might be alleviated by clever caching (ie have some scheme for not actually doing file io unless memory becomes low)

am no expert, but would have thought that simple operations such as opening a file by name should be pretty fast relative to other IO operations (especially when the OS caches like on NT)? so i wouldn't be too downhearted about this until having seen it in a real-world situtation (would be very interested to hear if you try it!! or have already :)

asjf

asjfa at 2007-7-19 3:47:37 > top of Java-index,Other Topics,Algorithms...
# 19
Try mapping the file using nio...
JProga at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 20

> Try mapping the file using nio...

It's the best solution, I guess. I have run some tests with jprofiler.

The operations are about 1000 times slower than a in-memory implementation.

The biggest bottlenecks in the example code are:

1. serialization (1.5ms per item)

2. File.listFiles() (increases like crazy)

_nuhb_a at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 21

what is your concern with memory? too many items? or items are

very very large ... ?

perhaps you could do a dual-solution....

i.e:

1) find an upper bound of elements you will have, say 2^32 per file

2) consider how many bits you'll need to store some information

about your objects location in a file (discuss below). we

will suggest 2^32 for suggesting starting offset (so) in the

file and 2^32 for suggesting ending offset (eo) as well as

2^8 for suggesting which data file this object is in. (such that

we may create new data files if one is too large, ~500 meg?,

whatever).

tell your map to save item "G", the map then gives "G" a number,

based on the order in which it has been added, (in this case 1)

and then writes the "eo" and "so" information as well as 0 to say

that this is the first data file. (considering size and if it may

like to make a new data file, etc).

all that info is written to disk in an index file. and then you store,

in a normal memory-hashmap fashion, the index at which your objects

data lies .... (in this case "1").

henceforth, you can store you objects like this (in serialized form or

whatever) and grab them, relatively, easily (and fast?)

this is, obviously, slower than a normal hashmap, but its just an

idea ....

of course, removing objects does become slow, in that it will involve

a re-adjustment of every object stored in the file after it (its

relative offsets) however, can't do much different if you're using

files, no?

anyway, just an idea ...

silkma at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 22

> what is your concern with memory? too many items? or

> items are very very large ... ?

I finally found a free jsp/servlet server. The thing is that they don't offer the normal range of features that you would find in Tomcat. They are running a Resin servlet container on a Compaq JDK with a dual processor Alpha.

The classloading mechanism and deployment has been altered severely by the system administrators, and it not possible to open server sockets. Thus if you use a database at all, it has to be run remotely. In effect you have a virtual file system that you are mounting locally. (Local file access is possible).

What's more annoying is that I can't reliably use the JNDI for storage (it just gets emptied after a few seconds) and some more quirks. I know there's HSQLDB, but I just wanted to try to code something myself, without a SQL layer on top.

> perhaps you could do a dual-solution....

I tested the pure diskbased solution first provided by one of the other posters, to see what kind of bottlenecks there are.

> is in. (such that

> we may create new data files if one is too large,

> e, ~500 meg?,

>whatever).

I just figured out how to get java.nio working correctly with a ReferenceQueue hooked on, so the code will not call new RandomAccessFile() or RandomAccessFile.getChannel() all the time. These seem to be one of the greatest bottlenecks while accessing an entry.

Besides that, opening/closing a ObjectInputStream/ObjectOutputStream seems to be very slow. but that was only the case when hooked up on a FileInputStream/FileOutputStream.

> of course, removing objects does become slow, in that

> it will involve

> a re-adjustment of every object stored in the file

It doesn't have to be. So I started to code this 'virtual memory disk' that is not indexed on any field, but instead on a simple index handed out by a sequencer (like the oid's in PostgreSQL) then you can divide a file into a few segments:

<file><segment type="metadata"/><segment type="data"/></file>

The metadata part just contains information to find the main index. The main index contains the mappings of (oid,fileoffset). It is located within the data segment itself (just assign it a oid value that other objects will never have).

A very naive implementation would:

When you add, just append to the existing data in the data segment.

When you insert, just append to the existing data in the data segment and modify the index.

When you delete, delete the oid from the index.

When modifying, only write back to the same location if the new data is equal or smallerin length.

Then do some garbage collection based on constraints to you program into the storage mechanism. (Or genetic algorithms like hotspot seem to do).

The index on disk does not need to be large, since the main index will never get very large if done correctly and allocate more space than needded for the main index to prevent it from moving around to much. Furthermore I can run the stuff through a ZIPOutputStream before you store it. You just can't read the index directly, but that wouldn't be wise I think (slow).

_nuhb_a at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 23

nuhb .. did you even read a word i said?

further, who are you? i was referring to the "gil" (op).

> The metadata part just contains information to find the main index.

> The main index contains the mappings of (oid,fileoffset). It is

> located within the data segment itself (just assign it a oid value

> that other objects will never have).

That is exactly what I suggested, except mine is more advanced and

better, in my opinion.

Anyway, relating to the rest of your post, I can barely understand

anything you have said ...

silkma at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 24

Hey silkm,

> nuhb .. did you even read a word i said?

Yes. I read every word.

> further, who are you? i was referring to the "gil"

Just me. Sorry I went off-topic. I responded, because in the source code posting the poster asked for real world performance tests. I tend not to see forum and newsgroup postings as 1-on-1 communication. I see it more like a group-meeting on a market place. (Hence the word 'forum'.:))

> That is exactly what I suggested, except mine is more

> advanced and better, in my opinion.

Another reason for responding is, that I found the topic interesting and was interested in other possibilities. I do not know which is better or more advanced. If you do think it is more advanced and better, I'd like to know why. The ultimate way would be to benchmark the two implementations. But they do need to exist. And they don't.

> Anyway, relating to the rest of your post, I can

> barely understand

> anything you have said ...

Darn. Is my English getting that bad...

_nuhb_a at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 25

>1. serialization (1.5ms per item)

>2. File.listFiles() (increases like crazy)

thanks for running those profiles - the code hasn't been optimized in any way at all.. i'd be very interested to see how much you can speed it up if you're keen :)

the OP said performance wasn't critical so its still my opinion that managing a file (via nio or RandomFile) is overkill

to improve the performance (for (2)) you could eliminate the repeated calls to File.listFiles if you know that only one Map is accessing the conceptual map on disk at a time by only calling it once on startup

i'm not sure if there is an alternative to serialization though?

asjf

asjfa at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 26

> the OP said performance wasn't critical so its still my opinion that managing a file (via nio or RandomFile) is overkill

> what is your concern with memory? too many items? or items are very very large ... ?

Well, performance IS an issue, especially for adding, since we need to add millions and millions of entries. I just don't need the "get" to be lightning fast, like superoptimezed with caching and all the correct theory being used. I think your easy-implementation suggestions with separate files are interesting, but I don't think there is this kind of shortcut to be found in my case.

Today I'm testing a third party "database" BerkleyDB with hashtables on disc from www.sleepycat.com. Adding variable size keys and values (avarage size 10 bytes) with BerkleyDB on a P4 2.2 GHz is around 1 ms per add. Getting is a bit faster. It's on that scale for add I need, and hence the suggested shortcut solutions (eg create a new file for each entry) are not even close by far.

But I don't want to use BerkleyDB since it totally corrupts data on unexpected shutdown, and everything need to be re-indexed. And then the price is 50000$ which is a few 0s too many.

Gil

gilroittoa at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 27
did you ever get this working? (and might you be willing to post the code?)
asjfa at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 28

Yes I got it working! BUT, it is alot of programming involved, thousands of lines of codes, and is rather core part of our projects, so I really cannot post it for free. It is with transactions and everything now. However, with caching I can achive 200000 updates per second (String key and long[], int[] or byte[] data). And getting the results is really quick, I read the data at the speed of disc, e.g. on my PC 8 MB data per second. So I am very happy with the result, but I just cannot give it away, my bosses wouldn't approve :( Hopefully we might do it as a separate open source project sometime in the future.

But If you need some advice I can give you that at least.

Gil

gilroittoa at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 29
And the solution is open-adress hashtable. It's easier that way and gives quicker results in average.Gil
gilroittoa at 2007-7-19 3:47:38 > top of Java-index,Other Topics,Algorithms...
# 30

it sounds really great :) i feel a bit miserly for asking for it for free now..

i've since discovered that the Map will also need a (read-only) history so you can say things likemap.get("key", date);

and have it fetch the value as it was on that date, so i'm wondering what other hidden requirements there are... back to the drawing board etc..

asjfa at 2007-7-19 3:47:42 > top of Java-index,Other Topics,Algorithms...
# 31

Ah... don't feel miserably, you never know when you get something, asking without being naggy is the way to success.

post-gres (postgre?) database does exactly what your specification says - keeps history data and allows gets for a certain date.

If that was a "hidden" requirement, then that is really bad, such a condition is really a major change.

We actually also have that requirement, and this is what we did:

Instead of just having data as data, we have tuples {data,date}. Then all data is filtered regarding date. Since date is well represented by a long, such conditions are quickly checked.

Gil

gilroittoa at 2007-7-19 3:47:42 > top of Java-index,Other Topics,Algorithms...