TreeMap vs HashMap
I have a simple question. When should I consider to use TreeMap basides HashMap in general?
More specifically. I have a list of phone number. I have another list of phone numbers, that is taken from another source. They both are unsorted. The second list is bounded to maximum 100 000 numbers, but typically would be less then 100 numbers. I should make a match for phone number in first list I should found corresponded number in the second list (there is aslo description, like HOME theie, that is what I actually need). The catch is that second list has all number with prefix, like 011. So it is not exactly equals. Should I use TreeMap or HashMap in this case? How can I define Comparator to TreeMap to make it even more simpler? (All Implementation I found was not transitive or compareTo()==0 wasn't equal equivalence that I want).
[852 byte] By [
alexsmaila] at [2007-11-27 11:48:35]

# 1
From your description it is not clear what you want to do. Apparently, you want to compare a phone number from one list with a phone number from another list.
This has nothing whatsoever to do with TreeMap vs HashMap. "All" a TreeMap does is keep its own entries sorted by key, using either the natural order of the keys (if the keys implement Comparable) or an external comparator.
# 2
I would suggest using HashMap in your situation.
TreeMap and HashMap are almost similar implementations except for the fact that, TreeMap stores the values in the order of ascending Keys..
In your situation, I dont see any need for the ordering of the keys.
Also keep in mind that, both TreeMap and HashMap are unsynchronized. If you need synchronization, do it at creation time
Map m = Collections.synchronizedMap(new HashMap(...));
# 3
Herko_ter_Horst, the naive algorithm is the following:
- iterate throw the first list (list1)
- for each phone number (num1) in list1
- iterate throw the second list (list2)
- let num2 will be the current phone number in list2
- if num1 is equal to num2, ignoring prefix, MATCH is found
Alternative is to sort list2 and makes binar search besides linear search. But there is a problem. I needs not exactly the same number, but number equals, ignoring prefix.. In order to do this I need to define Comparator. But I've failed to do this.
Another alternative is to build from list2 TreeMap with Comparator. Yes, you got it. The problem is I don't know how to define Comparator.
Another alternative is to build from list 2 HashMap. If set S of predefined prefixes is given, so one can make |S| time attempts to find the number is HashMap, where |S| is sizeof set S.
> This has nothing whatsoever to do with TreeMap vs
> HashMap. "All" a TreeMap does is keep its own entries
> sorted by key, using either the natural order of the
> keys (if the keys implement Comparable) or an
> external comparator.
See next answer.
# 4
In general case. Map is designed to quickly found. The typical usage is the following:
- let Customer be the object with data-field id (defined as sequence in DB)
- there is unsorted list<Customer> list with ids , say {23, 476, 4, 1002, 56, 6, 3004, 2005, 5, 38) (only ids shown). sizeof list, |list| is n=10.
- you want to chose from the list Customers by id's "many" time. How many? At least Ω(n),
Naive algorithm with linear search will take θ(n^2). This can be improved by sorting list first to O(n*log n). (If search algorithm is based on comparison it can be proven to be θ(n*log n)).
But the most common use is use of HashMap. In this case O(n) is taken to build hash map, but the search expected value is O(1+α) that is constant O(1) for any practical use. In this case building hashMap+search expected value is O(n) time takes roughly O(n) time that is significantly better than O(n*log n)
Note: see next.next paragraph for for rigorous calculation.
In such a case TreeMap aslo could be used. TreeMap is balanced binary search tree. All its operations after the tree is build, such as, search, insert, delete, takes O(log n) time. The building of the tree takes O(n*log n). If list list will be transformed to TreeMap, than it will be takes O(n*log n). Search for one number will take O(log n) time. There is O(n) number. Overall search will take O(n*log n) time. Building plus seacrhing takes O(n*log n) time.
Let's examine again HashMap. HashMap is good for use when the following holds:
- the key U=Universe is big (there is a lot of keys), but the keys that will actually be used, let say set A, is relatively small;
- hash-function (value that is returned by hashCode() method) is uniformly distributed.
In the real-world application this is usually ignored.
The next thing is, that building of HashMap takes O(n). But the operations over the hash map after building, such as insert and search takes O(1+α) expected time, where α is loading factoring, the relation of the actually used entries to its capacity c (maximum number of entries to be used). Note: that if hashMap reach its capacity, new hashMap of double capacity can be defined and all entries should be copied their. While this is looks like highly cost operations, it's amortized cost is O(1). If two assumptions above holds, than α has typically small value such as 3. In these cases it is ignored in the practice (and by java-doc of HashMap). Let's make calculations without such ignores. In such a case search is taken expected value is O(1+α) time. Overall search expected value is O(n+αn)=O(n+n*n/c). Consideration of building time O(n) doesn't change the total expected value is O(n+n*n/c). if α=n/c=O(1), then the total expected value is O(n) which is better than O(n*log n) in the TreeMap or sorting list case.
Message was edited by:
alexsmail
# 5
> I would suggest using HashMap in your situation.
>
> TreeMap and HashMap are almost similar
> implementations except for the fact that, TreeMap
> stores the values in the order of ascending Keys..
>
> In your situation, I dont see any need for the
> ordering of the keys.
>
> Also keep in mind that, both TreeMap and HashMap are
> unsynchronized. If you need synchronization, do it at
> creation time
> Map m = Collections.synchronizedMap(new HashMap(...));
First of all thank you for pointing me to need to synchronized the Map.
TreeMap indeed holds data in the sorted way. It does it in order to guarantee relatively quick search and another operations after building, only O(log n), see previous post.
In my situation using HashMap along is not good. Because the match is not exact, but exact ignoring prefix. If there is set S of predefined prefix, then overall search in the map will takes expected valuus as O(n)*O(|S|)*O(1+α), neglecting α it is O(|S|*n).
If proper Comparator could be created, TreeMap could be used, the search time will remain O(log n), that is independent on |S|. The build time will be O(n*log n+|S|) (Set S will be used as HashMap). Overall search is O(|S|+n*log n). Total O(|S|+n*log n). Typically |S| << n, then it will take only O(n*log n).
Message was edited by:
alexsmail
Message was edited by:
alexsmail
# 6
Summary:
General case:
HashMap
If the following condition holds, then building hashMap takes O(n) time, and search almost constant time.
- the key Universe is big (there is a lot of keys), but the keys that will actually be used, let say set A, is relatively small;
- hash-function (value that is returned by hashCode() method) is uniformly distributed.
Total expected time is O(n+n*n/c), where c is hash-map capacity (n/c is O(1) under above assumption, so it is actually O(n)).
TreeMap
Building treeMap takes O(n*log n) time. All operation, such as search, takes O(log n) time. Total O(n*log n) time.
My special case
There is set S of predefined prefixes.
HashMap
Under same assumptions as above expected value is
O(n*|S|+n*n*|S|/c). where c is hash-map capacity (n/c is O(1) under above assumption, so it is actually O(n|S|).
TreeMap with Comparator
Building O(|S|+n*log n). Overall search is O(n*log n). Total O(|S|+n*log n). That is practically O(n*log n).
TreeMap without Comparator, using natural order
Building O(n*log n). Overall search is O(n*|S|*log n). Total O(|S|*n*log n).
Interesting enough, that is set S is small, that can be considered as a constant, than complexity of TreeMap with and without Comparator a the same O(n*log n). (Up to the constant, of course in O).
Interesting enough. If there is exact set of prefixes, than overall complexity
In my case, I know that n<=100 000. So log2 n is approximately 17.
Suppose, that uniform distribution holds (second condition above). First condition is obviously holds. So α is O(1).
Note:
a) for the hashMap estimation is made on average, that is math expectation
b) Constant in the HashMap and in TreeMap is relatively small. I consider than much less then 17.
In this case
HashMap takes (expectation)
O(n*|S|+n*n*|S|/c)=O(n*|S|)<=C00*100 000+C01, C00<<17.
TreeMap with Comparator takes
O(|S|+n*log n)<=C10*17*100 000+C11, where C10<<17.
TreeMap without Comparator takes
O(|S|*n*log n)<=|S|*C20*17*100 000+C21, where C20<<17.
So, I think 17 is big enough to chose HashMap implementation.
See, next post for empiric test.
# 7
I wrote some test. Here is the code and results:
public class A {
public static int n = 10;
public static final Random r = new Random();
private static Integer[] cache = new Integer[n];
public static void printUsage(Map<Integer, Integer> tm){
long delta1 = printUsageBuild(tm);
long delta2 = printUsageSearch(tm);
System.out.println("Over all it tookes "+(delta1+delta2)+" for "+tm.getClass());
}
public static long printUsageBuild(Map<Integer, Integer> tm){
Integer tmpInteger = null;
int tmp = 0;
long start = System.nanoTime();
for(int i=0;i<n;i++){
tmpInteger = cache[i];
if(tmpInteger==null){
tmp = r.nextInt(n);
tmpInteger = tmp;
cache[i] = tmpInteger;
}
tm.put(tmpInteger, tmpInteger);
}
long finish= System.nanoTime();
long delta = finish-start;
System.out.println("It tooks "+delta+" to build "+tm.getClass());
return delta;
}
public static long printUsageSearch(Map><Integer, Integer> tm){
Integer tmpInteger = null;
int tmp = 0;
long start = System.nanoTime();
for(int i=0;i<n;i++){
tmp = r.nextInt(n);
tmpInteger = tmp;
tm.get(tmpInteger);
}
long finish= System.nanoTime();
long delta = finish-start;
System.out.println("It tooks "+delta+" to search in the "+tm.getClass());
return delta;
}
public static void main(String[] args) {
Map><Integer, Integer> m = null;
while(n<10000000){
cache = new Integer[n];
m = new HashMap<Integer, Integer>(n);
System.out.println(" n is "+n);
System.out.println();
printUsage(m);
m = new TreeMap<Integer, Integer>();
printUsage(m);
n*=10;
}
}
}
Results are (all numbers is numbers of nano-seconds to perform operations):
Notes: for some n multiple results shown. It is done in the case when search in the hashMap took more then the search in the TreeMap. This highlits that times of the hashMap is expected time and not maximum.
It is obvious, that for n>=10000, building and searching is more effective in HashMap.
It is also obvious that on average search in HashMap is more effective than search in the TreeMap.
For n<=1000, TreeMap search and overall result is on average more effective then HashMap search.
So, for small list (length is less then 1000) TreeMap should be used, otherwise HashMap.
Note: Than even for small list HashMap can be considered to be used because of small absolute difference in time.
TreeMap should be considered to be used when order persistent is important.
