Thursday, June 14, 2012

LinkedHashMap

Internal workings of LinkedHashMap

LinkedHashMap vs. HashMap


LinkedHashMap is a HashMap that also defines the iteration ordering using an additional data structure, a double linked list. By default, the iteration order is same as insertion-order. It can also be the order in which its entries were last accessed so it can be easily extended to build LRU cache.

Data Structure


The data structure of LinkedHashMap extends that of HashMap.

In HashMap, the data structure is based on array and linked list. An entry finds its location in the array based on its hash value. If an array element is already occupied, the new entry replaces the old entry and the old entry is linked to the new one.


In HashMap, there is no control on the iteration order.

In LinkedHashMap, the iteration order is defined, either by the insertion order or access order.

LinkedHashMap differs from HashMap in that it maintains a doubly-linked list running through all of its entries. The below one is a modified example of the above data structure. It defines the iteration ordering based on the order in which keys were inserted into the map. In order to do so, the entry element is extended to keep track of the after and before element. A zero size LinkedHashMap contains just the Head element with before and after pointing to itself.


LinkedHashMap data structure
LinkedHashMap data strutcture




Below is the HashMap data structure:

HashMap Data Structure
HashMap data structure









Entry



LinkedHashMap's Entry extends the HashMap's Entry so it also inherits the same properties key, value, hash and the next Entry sharing the index. Other than these, it also has couple of additional properties to maintain the double-linked list, after and before entries.


LinkedHashMap Entry class
LinkedHashMap Entry class




New Entry


LinkedHashMap inherits HashMap so its internal data structure is same as that of HashMap. Apart from that it also maintains a double-linked list which is circularly linked via the sentinel node called head. Each node contains references to the previous and to the next node . A new node is always added to the end of the list. In order to do so, the last node's and the header node's links have to be adjusted.

  1. The new node's next reference will point to the head.
  2. The new node's previous reference will point to the current last node.
  3. The current last node's next reference will point to the new node instead of head.
  4. Head's previous reference will point to the new node.
after  = head;
before = head.before;
before.after = this;
after.before = this;
Double linked-list
Double linked-list





Performance is likely to be just slightly below that of HashMap, due to the added expense of maintaining the linked list.

Access Ordered


A special LinkedHashMap(capacity, loadFactor, accessOrderBoolean) constructor is provided to create a linked hash map whose order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently. Invoking the put or get method results in an access to the corresponding entry. If the enclosing Map is access-ordered, it moves the entry to the end of the list; otherwise, it does nothing.
   public void testLinkedHashMap() {
        LinkedHashMap lru = new LinkedHashMap(16, 0.75f, true);
        lru.put("one", null);
        lru.put("two", null);
        lru.put("three", null);

        Iterator itr = lru.keySet().iterator();
        while(itr.hasNext()) {
            System.out.println(itr.next());
        }

        System.out.println("** Access one, will move it to end **");
        lru.get("one");

        itr = lru.keySet().iterator();
        while(itr.hasNext()) {
            System.out.println(itr.next());
        }

        System.out.println("** Access two, will move it to end **");
        lru.put("two", "two");

        itr = lru.keySet().iterator();
        while(itr.hasNext()) {
            System.out.println(itr.next());
        }
    }
Result:
one
two
three
** Access one, will move it to end **
two
three
one
** Access two, will move it to end **
three
one
two
Thus in access-ordered linked hash maps, merely querying the map with get is a structural modification.

Iterator


In HashMap, the iterator has to traverse through each table element and the element's own linked list, requiring time proportional to its capacity.
In LinkedHashMap, it simply has to traverse through its own double-linked list thus requires time proportional to the size of the map and not its capacity so HashMap iteration is likely to be more expensive.

Transfer


Re-sizing is supposed to be faster as it iterates through its double-linked list to transfer the contents into a new table array.
    void transfer(HashMap.Entry[] newTable) {
        int newCapacity = newTable.length;
        for (Entry e = header.after; e != header; e = e.after) {
            int index = indexFor(e.hash, newCapacity);
            e.next = newTable[index];
            newTable[index] = e;
        }
    }

Contains Value


containsValue() is Overridden to take advantage of the faster iterator.

LRU Cache


If access ordered is true, order of iteration is the order in which its entries were last accessed, from least-recently accessed to most-recently (access-order). This kind of map is well-suited to building LRU caches. The removeEldestEntry(Entry) method may be overridden to impose a policy for removing stale mappings automatically when new mappings are added to the map. For example,
    protected boolean removeEldestEntry(Map.Entry eldest) {
        return size() > maxCacheSize;
    }
The eldest entry is returned by header.after. The default implementation of removeEldestEntry() returns false.

Access Order

The main reason why one prefers LinkedHashMap over HashMap is that it can retain the order in which the elements are accessed. Below is the basic flow that a HashMap goes through to put a new entry. The blue boxes, 'Add Entry' and 'Record Access' are the ones LinkedHashMap overrides.


LinkedHashMap
LinkedHashMap



It overrides 'Record Access' to record the access order. If the user is interested in the access order, it updates its double linked list.

LinkedHashMap
LinkedHashMap


Below entry picture shows how the entry moves up the linked list. Head's next entry will point to the latest entry accessed.


LinkedHashMap
LinkedHashMap




If E2 is accessed again, HashMap identifies the entry and then calls record access.
The record access is overridden in LinkedHashMap. Below is the class diagram where LinkedHashMap extends HashMap's entry to override the recordAccess.


LinkedHashMap
LinkedHashMap



Double linked list vs Single Linked List

LinkedHashMap
LinkedHashMap


Suppose we want to remove an entry E2. 
If we have a single linked list, E3's next should point to E1 which means if want to eliminate E2, we need to know its previous entry. If it is a single linked list, we will end up traversing the entire list to search the entry previous to E2, thus the performance would in the O(n)
In case of double linked list, the previous pointer in E2 will take us to E3 in O(1).


Home

12 comments:

  1. Hi,

    This is Yashwanth. One of the nice explanation about linkedhashmap that i could find on the net. But, i think you need to explain bit more on entry array in linkedhashmap.

    HashMap maintains entry array and in each array index, a singly linkedlist runs. But in case of linkedhashmap, it extends Hashmap and uses doubly linkedlist. My question is how doubly linkedlist improves the performance of linkedhashmap. What will be the significance of array index in linkedhashmap

    Thanks,
    Yashwanth

    ReplyDelete
    Replies
    1. Hi Yashwanth,
      Thanks!
      The LinkedHashMap is HashMap + double linked list. This double linked list is in addition to the linked list that each entry of hashmMap contains. LinkedHashMap's main purpose is to retain the order in which the elements are added. Whenever a new element is added, it is also added to the double linked list. The double linked list can also be used to create a LRU cache. Please see http://javaopensourcecode.blogspot.in/2012/11/hashmap-q.html to know how HashMap iterates, there is a diagram in the end of FAQ.
      Thanks
      Ram Satish

      Delete
    2. Hi Ram , nice post , here i want to discuss about complexity of searching , lets say we have 1 2 3 4 in cache with size 4 (here in hashmap) , these are also added in to list as well simultaneously so list will be having 4 3 2 1 (most recent at head and least recent at tail) , now say if 1 is 2 is accessed , it will checked in cache since 2 exist it will be moved to head so now list will be 2 4 3 1 , so now question is how we will know if 2 is access , we need to search its position in list right , that will O(n) in worst case , can we improve this searching part , also what benefits we are getting using doubly linked list instead of singly linked list ?

      Delete
    3. Hi, Nice question.
      I have added two new sections 'Access Order' and 'Double linked list vs Single Linked List'
      Thanks
      Ram Satish

      Delete
    4. Hi Ram , again my question is how you will come to know which is accessed in doubly linked list ?? dont you need to search that element even in doubly linked list so isn't it O(n) , also even in singly linked we can easily keep track of previous pointer no need to extra search and it can be more efficient because in doubly linked list we have overhead of extra pointer of every node .

      Delete
    5. I agree we need to search the entry but the searching is done in single linked list and NOT double linked list. Double linked list comes in picture only for recording the access order and this happens only after the entry is searched. Note that LinkedHashMap extends HashMap so the search part is inherited from HashMap.

      Please see the flow in access order diagram. When you call get(key) or put(key, value), HashMap looks for the entry. This takes O(1) + O(n') where n' is the number of elements in the single linked list. These n' elements are in linked list due to collision. More the collision, more will be the keys in linked list which is why hashing becomes important.

      Once the entry is searched, record access is called and this is where we need double linked list to move the latest entry up in the ladder towards the header.

      Here again we can use single linked list instead of double linked list but then during delete we will end up with O(N). See the section Double linked list vs Single linked list.

      Feel free to comment again if it is still not clear.

      Delete
    6. Yes , ram i'm aware of that but i was talking about improvement of searching part , is there any way we can make the search in O(1) or less then O(n) , as far as access order is concerned as i told earlier we can easily keep track of previous element in singly linked list it self , i'm just just if java implementation of linkedhashmap use doubly or singly as i don't see any improvement if it use doubly rather then extra pointer with every node also as you are saying that searching part of linkedhashmap , which in run search in hashmap O(1) and in linked-list O(n) so where is improvement in searching if it uses the doubly linked list , i hope you understand my point .

      Delete
    7. Hi!,
      I think you got me wrong. I want to bring in a bit of clarity to your question.
      You are saying the search of LinkedHashMap is NOT in O(1).
      Answer is yes.

      When you say improvement in searching part, you are also assuming that LinkedHashMap and HashMap have different search algorithms.

      Answer is No. LinkedHashMap's search algorithm is same as HashMap's. Even HashMap's search is NOT in O(1).

      If there are no collisions you will get O(1) else it is O(1) + O(n'). If n is the total number of elements. Note n' is not the same as n. It is the number of elements in the linked list due to the collisions.
      The only extra thing that LinkedHashMap does is to maintain the access order. This wont have impact on the order.

      Your next question is LinkedHashMap can track access order using single linked list so why does it use double linked list.
      Answer is to make the delete efficient. If it is single linked list, the order will be O(n). With double linked list it will be O(1)

      Delete
    8. Also, forgot to mention, feel free to ask again. Your questions have added to the clarity of the post.

      Delete
    9. Hi ram , let me take an example we have a cache of size 4 and say you have accessed the 1 2 3 4 so these will added to hashmap and linked list (in order ) . now when again for request came for say 1 it will into cahe e.g. in hashmap it will find it , now how you will come to know in linked list doesn't matter its singly or double that element 1 which is duplicate, is accessed , you need to search in linked list in worst cases it will be O(n) , isn't it ? and move it to back to list to maintain the order, , because in next tern say new element 5 came which i not present in hashmap e.g. cache how cache will decide which element need to removed from it so since 1 was previously accessed and moved to back linked list , after looking in to cache it will search from list that 2 is least access element so far so it will remove 2 from hashmap and add 2 to it and to linked list as well

      let me summarize the operations
      1. look up in hashmap O(1) if found
      2. lookup in linked list O(n) no matter if its singly or doubly list.
      3. if found in step 1 , check from list which element need to remove
      3a. searching will O(n)
      3b. add this to last of list , since it have been accessed O(1)
      4. if not found in step 1
      4a. check which was last accessed O(n) from list.
      4b. remove head of list, add new element to hashmap O(n) and at last of list O(1).

      Note: we can easily keep track of previous element in singly linked list it self , while searching so that we delete in O(1) even if we having multiple thread we can make that operation or variable synchronize.

      we can also choose hashmap + balance binary tree where (insert, search ,delete all are O(logn) ) , since most of the time we need to do search it will show better performance or an index array . its going to be good discussion .

      Please let me know if anything wrong here ?

      Delete
    10. We have a cache of size 4.
      You must be first adding keys 1,2,3,4
      If it is a LinkedHashMap, assuming accessOrder true, whenever an element is accessed, it will be moved towards header
      I don't completely understand "now when again for request came for say 1 it will into cache"
      I am assuming you meant put here. If we want to add key 1 again. HashMap will search the entry.
      Key will be converted to index, if there are collisions, it will also look into linked list. Now since LinkedHashMap extends HashMap. It is the same behavior here. The linked list that got created due to collisions is NOT the same as the double linked list that is used to track access order. The collisions linked list is part of HashMap.
      "now how you will come to know in linked list doesn't matter its singly or double that element 1 which is duplicate, is accessed , you need to search in linked list in worst cases it will be O(n) , isn't it ?"
      I guess you are confused here. Search doesn't happen in double linked list
      Search happens in HashMap.
      1) O(1), when the first entry is found. Key is converted to index and the first entry is fetched table[index]. Assume the entry is e.
      2) O(n'), to search the exact key in the linked list, whose first element is e. n' are the number of collisions.
      3) Suppose the entry found is E.
      Note the above search happens in HashMap data structure. Since LinkedHashMap extends HashMap, the search returns a LinkedHashMap entry.
      Once the entry is found, it is passed on to LinkedHashMap's record access.
      Note again till now, double linked list has not come in picture.
      I guess before we need to get into other points that you have mentioned, it is important to understand that double linked list has nothing to do with search. We don't do any search in double linked list. Whenever happens in double linked list to maintain the access order is of order O(1). Once again if we don't use double linked list, deleting an element will result into O(n).

      Delete
    11. A summary of what I wanted to explain.
      1. HashMap has an [array + linked list]
      2. LinkedHashMap has an [array + linked list] + double linked list. The [array + linked list] is inherited from HashMap.
      3. Search happens in [array=O(1) and then linked list=O(n')]
      4. Search doesn't happen in double linked list. There is no need to. Search happens ONLY once in step 3.
      5. Once the entry is found. It is used to track the access order.
      6. Question is how can the entry know about the before and after links in double linked list. Answer is the entry in LinkedHashMap is an extension of entry in HashMap. LinkedHashMap E = HashMap E + (before and after links).
      7. The record access removes the entry and moves it towards the head. This takes O(1).

      before.after = after;
      after.before = before;
      after = head
      before = head.before;
      before.after = this;
      after.before = this;

      Here 'this' is the searched entry E

      Delete