Internal Implementation of HashSet

HashSet is a member of the Java Collection Framework used to store unique entries. It is an implementation of the Set interface. A Set is a collection in Java which does not allows duplicate entries. Since a HashSet implements the Set interface, it only stores unique entries.

Similar to HashMap, HashSet also works on the principle of hashing. In fact, it internally stores data in a HashMap. Since a HashMap doesn’t maintain the order of insertion, a HashSet also cannot guarantee about the insertion order of the objects.

Shown below is the constructor of the HashSet class:

public  HashSet() {
    map = new HashMap<E,Object>(); 
}

map’ is an instance variable of HashSet:

private transient HashMap<E,Object> map;

How HashSet stores unique entries?

The objects added to the HashSet are stored as the Key object in its internal HasMap. Since HashMap doesn’t allow duplicate keys, a HashSet can only contain unique entries.

Also we know that HashMap allows null key, thus HashSet also allow null value to be stored, but only one null value can be stored.


Adding and removing objects from a HashSet

To add/ remove any object from a HashSet, we use the add(object)/remove(object) methods of the HashSet. As we already know a HashSet internally stores data in a HashMap, the add(object) and remove(object) methods internally delegates to the put(key, object) and remove(key) methods of the HashMap.

public boolean  add(E e) {
    return map.put(e, PRESENT)==null;
}

public boolean  remove(Object o) {
    return map.remove(o)==PRESENT;
}

The object added to the HashSet, serves as the key in its internal hashmap. HashMap uses a dummy Object instance ('PRESENT') to store as the Value object in the HashMap.

private static final Object PRESENT = new Object();

We know that a HashMap allows null values, so a question that arises here is why we need to use a dummy object for value.

The answer lies in the fact that the add(object) method returns a boolean value. It returns true when a new unique object is added to the HashMap and returns false when we try to add a duplicate object to it. Remember that a HashMap returns null whenever a new key-value pair is added to it, and returns the old value object when a new value object is added to an existing key in the HashMap. The PRESENT object is used to compare the value returned by the HashMap's put(key, value) method and identify if the entry added to the Hashset is a unique or duplicate entry. If the value is null its unique, duplicate otherwise.

We can see that the remove(object) method of HashSet aslo returns a boolean value depending on the value returned by the remove(key) method of the HashMap.

When an entry is removed from the HashMap, its remove(key) method returns the value object which is being removed. In the HashSet’s remove(object) method we compare this object to our dummy object(‘PRESENT’). This is done to make sure that we are removing an entry which actually exists in the HashSet. It returns true when we remove an existing entry and returns false otherwise.

RELATED ARTICLES

Internal Implementation of ArrayList

ArrayList is an implementation of List interface, used to store a list of objects. Unlike normal arrays, it is a dynamic data structure, its size dynamically increases when its capacity is exhausted.

View Article

Understanding equals() and hashCode() methods

The equals() and hashCode() are two primary methods of Java's Object Class which work together to uniquely identify any object in Java.

View Article

Internal Implementation of HashMap

HashMap is popular data structure in Java which stores data in key-value format. This article describes the internal implementation to store and retrieve data in a hashmap.

View Article

Identity HashMap in Java

IdentityHashMap in Java is a special implentation of the Map interface designed to be used in special cases where reference equality is required.

View Article

Iterator and ListIterator in Java

The Java API provides the Iterator and ListIterator that are used to iterate over a Collection. This article covers the difference between the two and discuss how Emumeration is different from Iterat.

View Article

Introduction to Java Collections

The Java collections framework is a library that provides commonly reusable collection data structures. It consists of a set of various interfaces, implementations and algorithms used to represent...

View Article

Understanding TreeMap

A TreeMap is an implementation of the Map interface used to store key-value pair in sorted order of their keys. This articles describe how a Treemap is implemented in Java.

View Article