Internal Implementation Of Hashmap

HashMap is popular data structure in Java which stores data in key-value format. HashMap is a member of the Java Collection Framework. It is an implementation of Map interface, which works on the principle of hashing.


Internal Structure of HashMap

HashMap stores data in an internal array of buckets called as table. In hashing terms, a bucket is a container that is used to store key-value pairs. A bucket in a hashmap is nothing but a linked list.

Each node in the linked list is a Map.Entry object which stores the key, the value and a reference to the next Map.Entry object on the same bucket.

The figure below shows the structure of a HashMap.

Note:

A HashMap can contain one null key and multiple null values. Null Key is handled in a special way in HashMap. It always maps to index 0 in the bucket array.


Storing data in HashMap

To insert data into the hashmap, we need to call its put(key, value) method, passing it the key and the value objects. HashMap will create a Map.Entry object to store our key and value objects. Before storing the Map.Entry in a bucket, it needs to identify the location of the bucket in the table. It will call the hashcode() method of the key object and apply its own hashing mechanism to on the returned value to find the location of the bucket.

It is important to understand that both the key and the value objects are stored in the bucket, not just the value object.


Retrieving data from HashMap

To retrieve the value of any key from a HashMap we need to call the get(key) method. To get the value of a particular key, HashMap needs to identify the location of the bucket. It repeats the same process used to find the location of the bucket while storing the data. It uses the hashcode returned by the key object to find the bucket location and returns the value object from the Map.Entry object in the bucket.


Handling Key objects that have same hashcode

When multiple keys return same hashcode value, the bucket location derived will be same. Each of the objects will thus be stored in the same bucket. Remember that HashMap stores data in a LinkedList. So the new object will be appended to the head of the linked list. Thus the same bucket will contain two Map.Entry objects, each having its own key and value object.

Now we understand that the bucket has multiple Map.Entry objects. To retrieve a value object associated with a particular key, the linked list at that particular bucket location needs be traversed to find the appropriate Map.Entry object. The Key object in each Map.Entry object will be compared using equals() method to identify the actual Key object and its Value object will be returned.


Rehashing

HashMap maintains a threshold of 0.75, which means whenever it is 75% full; it will resize itself to double of its previous size. Once it is resized it needs to recalculate the new bucket location of the objects already stored. This process is called as rehashing.

We know that HashMap are not threadsafe, when multiple threads tries to resize a hashmap simultaneous there is a potential race condition which can lead to an infinite loop. The infinite loop occurs because while rehashing, the linked list gets reversed as hashmap appends new node to the head of the list.

Following example demonstrates the problem. Consider two threads accessing a bucket having 2 nodes A and B. Let's assume that thread1 runs first and initializes its current and next variables as A and B. But it doesn't actually gets a chance to move the nodes to the new location.

When thread2 executes, let's say it completes the process and does the reallocation. Thus the linkedlist gets reversed.

Next when thread1 gets the chance to execute, it still has its current and next variable set as A and B. When it tries to do the movement, the infinite loop is created as shown in the figure below.

RELATED ARTICLES

Internal Implementation of ArrayList

ArrayList is an implementation of List interface, used to store a list of objects. Unlike normal arrays, it is a dynamic data structure, its size dynamically increases when its capacity is exhausted.

View Article

Understanding equals() and hashCode() methods

The equals() and hashCode() are two primary methods of Java's Object Class which work together to uniquely identify any object in Java.

View Article

Internal Implementation of HashSet

HashSet is a member of the Java Collection Framework which is used to store unique entries. This article describes how a HashSet is internally implemented to store unique entries.

View Article

Identity HashMap in Java

IdentityHashMap in Java is a special implentation of the Map interface designed to be used in special cases where reference equality is required.

View Article

Iterator and ListIterator in Java

The Java API provides the Iterator and ListIterator that are used to iterate over a Collection. This article covers the difference between the two and discuss how Emumeration is different from Iterat.

View Article

Introduction to Java Collections

The Java collections framework is a library that provides commonly reusable collection data structures. It consists of a set of various interfaces, implementations and algorithms used to represent...

View Article

Understanding TreeMap

A TreeMap is an implementation of the Map interface used to store key-value pair in sorted order of their keys. This articles describe how a Treemap is implemented in Java.

View Article