What are Hash Tables?
Hash Tables are data structures used to store the data in key/value pair format. It uses a hash function to compute an index which will be used in an array to store the element at that index.
What is key/value pair though?
Alright, I will be digging in fundamentals here. Let’s take an example of database table. To retrieve a particular value from database table, you sometimes need to know a primary key or a unique value from the row of database table. Then you query on database table based on that unique value or primary key to get that entire row or that particular value you are looking for me.
Still complicated?
Let’s take an example of classroom. You are in 2nd grade class and when a teacher does roll call, she doesn’t necessarily call your name, she calls the number assigned to you. So example
1 – John Doe
2 – Jill Doe
3 – Mark Ranson
So the roll number assigned to the student becomes a key to identify that student.
Similarly in programming languages (Java in this case), we use a data structure called Hash Tables.
Hash function takes an input, hashes that input to generate an index which we use as a key to store the value in an array. Why so complexity? Why not we go in sequential order?
There are many reasons, first hashing gives security. If somebody exploits sequential order, it is easy to find next element. But hashing allows us to randomly store the data. But the most important, the average time required to search for an element in a hash table is O(1).
Now from the basics, we can say that hash tables have two components – one an array to store the value and a function to calculate the index of the array.
So what is a hash function and how do we write this hash function?
A hash function is a function that takes a data of any size and transforms that data into a fixed size data. In short a hash function will take an input x and transform that into output y. Now, this looks simple, but the question arises what if there are multiple inputs that can be transformed into y. Then we will have a problem. This is known as Collision.
Important characteristics of this hash function
- It should avoid collision.
- It should easily calculate the keys.
- It should uniformly distribute the keys.
How to avoid collision?
There are a couple of techniques.
One technique is open addressing. In Open Addressing, store all elements in hash table itself. At any point, the size of the hash table must be greater than or equal to that of the number of keys. This is useful in the scenario of fixed size tables. During insertion, if you found the occupied slot in the hash table, you go for the next slot. It will continue until it finds an unoccupied slot. Since this is a linear process, open addressing is also linear probing. The disadvantage of open addressing is insertion and search operation becomes linear.
The second technique is Separate Chaining. In this, make each cell of a hash table point to a linked list of records. So if a hash function returns a duplicate key, the value will be placed in a linked list which will be pointed by earlier value stored at that key. The next value will be pointed by earlier linked list element. To make this simpler – let’s assume we have a has function key % 3
and so for 9, it will return 0. For 10, it will return 1. For 16, it will return 1 again. Now when we will store a value (for 10), we will store at index 1 and the next value (for 16), will be in a linked list pointed by the value stored at 1.
When do we use hash tables?
- Hash tables offer fast insertion
- Hash tables allow fast deletion
- Hash tables can help in searching an element
References