Thursday, June 14, 2012

HashMap

Internal working of HashMap

A hash map is a data structure that uses a hash function to map keys to their associated values. This article describes the internal workings of a HashMap. The main source of the article is java source code.

For Q&A, visit HashMap Q&A.

Data Structure

In HashMap, the data structure is based on array and linked list. An entry finds its location in the array based on its hash value. If an array element is already occupied, the new entry replaces the old entry and the old entry is linked to the new one. Below is an example of the entries in the data structure.

HashMap Data Structure

Put Entry

When a new entry is assigned to an index, the existing entry to the index, if any, becomes the next entry to the new entry.

Adding an entry to HashMap

Entry

An Entry consists of key, value, hash and the next entry sharing the index.

Entry Class diagram

Capacity

The default initial capacity MUST be a power-of-two. Even if the caller provides a capacity c which is 2^x-1 > c < 2^x, the capacity is converted to 2^x to create the internal array.

Capacity Adjustment to match power-of-two

If the capacity of array is in power-of-two, the hash code can be easily converted to the index based on a simple AND operation:

index =  hashCode & (length-1)

Here length = 2^x.

In Hashtable, the capacity doesn't need to be in power of 2 and the internal array is created for whatever capacity is provided but the index calculation is based on modulus operand:

index = (hashCode & 0x7FFFFFFF) % length.

Since the capacity is in power of 2, a simple AND operation is able to return us the index.

Lets see how it works using an example.

Suppose the hash=311 and length=16, applying the modulus operation we get:
hash = 16 * 19 + 7 (length*quotient + index), thus index = 7.

Let us convert the hash to binary.
311 = 1 0011 0111 in binary
= 1 0011 0111 = 1 0011 0000 + 0111
= 2⁴*1 + 2⁵*1 + 2⁶*0 + 2⁷*0 + 2⁸*1 + 0111
= 2⁴(2⁰*1 + 2¹*1 + 2²*0 + 2³*0 + 2⁴*1) + 0111
= 2⁴(10011) + 0111

This is in the same form as the length*quotient + index, here index = 0111, thus if length=2^x, index is nothing but the lower x bits of hash code.

We can obtain these lower bits by doing & operation.

index = 0111 = 1 0111 0111 & 1111

Thus index = hash & (2^length - 1).

public void testFindModulusByAND() {
        int n = 311;
        int d = 1 << 4;
        int m = 7;
        assertEquals(m, n & (d -1));
    }

Re-hashing

Since HashMap uses power-of-two length hash table, the lower bits form the index. If the hash function is poor, it may lead to different hash codes but with same lower bits.

If a hash code results into the same index, we encounter collisions. In order to avoid this, HashMap applies a supplemental hash function to a given hashCode. For example, hash codes 375 and 311, has the same lower bits so they return the same index.

375 = 1 0111 0111
311 = 1 0011 0111

The index would be the lower four bits 0111=7.
HashMap applies the below hash function to refine the original hash code.

 h ^= (h >>> 20) ^ (h >>> 12);
 h ^= (h >>> 7) ^ (h >>> 4);

When we apply it on 311, we get 294=1 0010 0110, index is 6.
When we apply it on 375, we get 354=1 0110 0010, index is 2.

Collision

If two different key objects return the same hashcode, it will result in collision as the bucket location would be same. The new entry will replace the previous entry and the next reference of the new entry will point to the previous entry. For hash tables, this means that the second table insertion will be slower as the links have to be adjusted.

Hashmap Collision

Get Entry

The key is converted to its hash value and then HashMap's internal hash function is applied on it to get the actual hash code. This hash code is then used to identify the bucket.

The Object sitting in the bucket is of type Entry so it knows the key, value, hash and its next entry, if any. There may be more than one object stored in same bucket, in such case, the entry will be a linked list where next entry will be its next node.

We must traverse through the list to identify the Object for the associated key. If the entry's hash matches with key's hash value and its key matches with the key in context, it means we have found the correct node.

e.hash == hash
e.key == key || key.equals(k)

It is important that the key used in HashMap should be immutable, final object with proper equals() and hashcode() implementation.

NULL Key

HashMap accepts Null key, index 0 is reserved for the Null key entry.

HashMap with a NullKey

Re-sizing

Threshold

Threshold defines the next size value at which to resize the hash table. If the size of the map exceeds a given threshold defined by load-factor e.g. if load factor is .75, it will act to re-size the map once it filled 75%. Java HashMap does that by creating another new bucket array of size twice of previous size of HashMap. The new entry can occupy any index based on the hash value.

In the below diagram, the emphasis is on the relation between size, threshold and expansion of capacity. In order to simplify the diagram, it is assumed the new entries occupy array elements in serial order.

HashMap Threshold Mechanics

All the entries are transferred from current table to new Table.

This process is called rehashing because it also applies hash function to find new bucket location.

Transfer of elements

Whenever we have to re-size the hash table, the elements from the old array will have to be transferred to the new expanded array. Since the capacity is doubled, the elements based on the hash key may either stay at same index, or move with a power of two offset.

Re-hashing of HashMap

Iterator

Since the hash value determines the bucket location, the iterator doesn't promises any order. Iterator returns the first not null element in the array starting with index=0.
Below example puts keys in order "Hello","!", "How","are","you" to a HashMap object but iterator returns in completely different order. It returns elements from the first available entry, its next entry in the linked list, if any, else the next available entry in the table till it reaches the end of the table.

Order of elements returned by the iterator

Iterator Example

    public void testMapItr() {
        Map m = new HashMap();
        m.put("Hello", null);
        m.put("!", null);
        m.put("How", null);
        m.put("are", null);
        m.put("you", null);
        Iterator itr = m.keySet().iterator();
        while (itr.hasNext()) {
            String key = itr.next();
            int hash = hash(key.hashCode());
            System.out.println("table[" + (hash & 15) + "]=" + key);
        }
    }
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

Result:

table[3]=are
table[3]=!
table[10]=you
table[11]=How
table[14]=Hello

More Java Maps

There are more articles on Java Maps that you may want to read here:

Home

11 comments:

AnonymousDecember 1, 2012 at 4:54 AM
Nice!

the following code actually is generating same index which is an example for collision,

public class ItemTest {
public static void main(String[] args) {

Map m = new HashMap();
m.put("Hello", null);
m.put("How", null);
m.put("are", null);
m.put("you!", null);

Iterator itr = m.keySet().iterator();
while (itr.hasNext()) {
String key = (String)itr.next();
int hash = key.hashCode();
System.out.println("table[" + (hash & 15) + "]=" + key);
}
}
}

output :

table[4]=are
table[2]=you!
table[0]=How
table[2]=Hello

look at index 2
ReplyDelete
Replies
Java_DebuggerDecember 3, 2012 at 4:43 AM
The above program to be corrected as follows:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

@SuppressWarnings("all")
public class Test extends HashMap{

public static void main(String[] args) {

HashMap m = new HashMap();
m.put("Hello", null);
m.put("How", null);
m.put("are", null);
m.put("you!", null);

Iterator itr = m.keySet().iterator();
while (itr.hasNext()) {
String key = (String)itr.next();
System.out.println("table[" + (getHashMapsHash(key.hashCode()) & 15) + "]=" + key);
}

}
// Note I had to use the following method from HashMap's class hash method, because though this is marked static in HashMap, I cannot use it with class name like HashMap.hash(key.hashCode), because the method is marked with access modifier default

static int getHashMapsHash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}

}

------output:

table[3]=are
table[5]=you!
table[11]=How
table[14]=Hello
ReplyDelete
Replies
Java_DebuggerDecember 3, 2012 at 4:45 AM
Sorry , the
class declaration should be

public class Test{

not
public class Test extends HashMap{

typo error( was trying something) My Apologies

ReplyDelete
Replies
AnonymousApril 20, 2013 at 3:48 AM
Hi I was reading about HashMap and came across this Diagram:

http://en.wikipedia.org/wiki/File:Hash_table_5_0_1_1_1_1_0_LL.svg

Which made things a bit confusing, Please correct me if I am wrong:

Under the Sub-Title "put entry" , in the Diagram showing Entry(in orange) being added
The Entry being added (Shown in orange) shouldn't it be added to the array(i.e.table) instead of the LinkedList
and the "previous entry" that would be the element residing in the array would shift to become part of LinkedList .i.e it would become the overflowing entry
ReplyDelete
Replies

Add comment