Tag Archives: data structures

Introduction to Graphs

In my previous article, I talked about hashtables. I will discuss one more data structure in this post and it is probably one of the most important data structures of all and that is Graphs.

Clearly, our current web technologies are heavily reliant on graphs. Google, Facebook, or LinkedIn or any social media platform which includes users use graphs as a data structure. So, graphs are the most common data structure to solve problems related to finding the distance between two nodes OR the shortest path from place A to place B.

Therefore, when it comes to the social network, we are accustomed to six degrees of freedom, in such cases, we can use graphs to find how many degrees will it take to connect two nodes on the social network. In networking, most use graphs to find the fastest way to deliver the response.

How do you explain Graphs to 5-year-olds?

The easiest example, one can give to a kid to explain Graphs, is to look at City A and City B on a map. Now use the road that connects to those two cities.

City A – has bananas, and oranges, city B – has apples, and city C – has watermelons.

Now on the map, when we travel from City A to City B, what possible route we can take and what information we can exchange. City A and City B can transfer apples, bananas, oranges to each other. Once City B gets bananas and oranges, it can transfer that to other neighboring cities.

In short, we are connecting nodes (vertices) of cities A and B through a road (edge) while exchanging the products these two cities are known for.

Graphs Data Structure

In this post, we will discuss graphs from the Java perspective. Graphs allow representing real-life relationships between different types of data. There are two important aspects to graph:

  • Vertices (Nodes) – Nodes represent the points of a graph where the graph is connected. Node store the data or data points. 
  • Edges – Edges represent the relationship between different nodes. Edges can have weight or cost.

However, there is no starting node or ending node in the graph. A graph can be cyclical or acyclical. In conclusion, edges can be directed or undirected which give birth to graphs as directed or undirected. 

For instance, edges are generally represented in the form of a set of ordered pairs as in (x,y) – there is an edge from node x to node y. So (x,y) can be different from (y,x), especially in the directed graph.

Representations of Graphs

A. Adjacency Matrix –

This is a 2 dimensional array of size n*n where n is number of nodes in the graph. adj[][] is the usual way of representing this matrix.  So if adj[i][j] = 1, it represents an edge between node i and node j. Adjacency matrix for an undirected graph is symmetrical. Now if I have to represent the graph shown above in the figure, I will represent it like below:

                A               B             C        G         E
               A                 0               1             0         1         0
               B                1              0             1         0         1
               C                0              1             0         0         1
               G                1              0             0         0         1
               E                0              1             1         1         0

 B. Adjacency List –

Similarly, an array of lists is used. The size of the array is equal to the number of nodes in the graph. So arr[i] will indicate the list of vertices adjacent to node i.

 

Operations on the Graphs

There are common operations that we will use often. Likewise, graph as a data structure offers the following operations:

Additions

addNode  – Add a node in the existing graph

addEdge – Add an edge in the existing graph between two nodes

Removal

removeNode – Remove a node from the existing graph

removeEdge – Remove an edge between two nodes from the graph

Search

contains– find if the graph contains the given node

hasEdge – find if there is an edge between given two nodes

 

Time and Space Complexity of operations on Graphs

Above all, a post would be incomplete if I didn’t talk about complexity about operations on the graph data structure. Basically, this really depends on what representations you use for the graph. With adjacency matrix, addition and removal operations are O(1) operations. While search operations like contains and hasEdge are also O(1) operations. In addition, the space complexity for the adjacency matrix is O(n*n).

While with adjacency list, additions are O(1) and removal of a node is O(n) operation, removal of an edge is O(1) . Therefore, search operations are equally O(1)

Conclusion

In conclusion, I showed the basics of the graph as a data structure. The graph is a data structure that contains nodes and edges. Also, It has operations like additions, removal, and search. In future posts, I will talk about implementing Depth First Search and Breadth First Search in the graph. After that, we will solve some real problems using this data structure. Above all, Graph is an important data structure.

References

  1. Introduction to Graphs – Graphs
  2. Graph as data structure – Graph as data structure

Hash Tables

What are Hash Tables?

Hash Tables are data structures used to store the data in key/value pair format. It uses a hash function to compute an index which will be used in an array to store the element at that index.

What is key/value pair though?

Alright, I will be digging in fundamentals here. Let’s take an example of database table. To retrieve a particular value from database table, you sometimes need to know a primary key or a unique value from the row of database table. Then you query on database table based on that unique value or primary key to get that entire row or that particular value you are looking for me.

Still complicated?

Let’s take an example of classroom. You are in 2nd grade class and when a teacher does roll call, she doesn’t necessarily call your name, she calls the number assigned to you. So example

1 – John Doe

2 – Jill Doe

3 – Mark Ranson

So the roll number assigned to the student becomes a key to identify that student.

Similarly in programming languages (Java in this case), we use a data structure called Hash Tables.

Hash function takes an input, hashes that input to generate an index which we use as a key to store the value in an array. Why so complexity? Why not we go in sequential order?

There are many reasons, first hashing gives security. If somebody exploits sequential order, it is easy to find next element. But hashing allows us to randomly store the data. But the most important, the average time required to search for an element in a hash table is O(1).

Now from the basics, we can say that hash tables have two components – one an array to store the value and a function to calculate the index of the array.

So what is a hash function and how do we write this hash function?

A hash function is a function that takes a data of any size and transforms that data into a fixed size data. In short a hash function will take an input x and transform that into output y. Now, this looks simple, but the question arises what if there are multiple inputs that can be transformed into y. Then we will have a problem. This is known as Collision.

Important characteristics of this hash function

  1. It should avoid collision.
  2. It should easily calculate the keys.
  3. It should uniformly distribute the keys.

How to avoid collision?

There are a couple of techniques.

One technique is open addressing. In Open Addressing, store all elements in hash table itself. At any point, the size of the hash table must be greater than or equal to that of the number of keys. This is useful in the scenario of fixed size tables. During insertion, if you found the occupied slot in the hash table, you go for the next slot. It will continue until it finds an unoccupied slot. Since this is a linear process, open addressing is also linear probing. The disadvantage of open addressing is insertion and search operation becomes linear.

The second technique is Separate Chaining. In this, make each cell of a hash table point to a linked list of records. So if a hash function returns a duplicate key, the value will be placed in a linked list which will be pointed by earlier value stored at that key. The next value will be pointed by earlier linked list element. To make this simpler – let’s assume we have a has function key % 3 and so for 9, it will return 0. For 10, it will return 1. For 16, it will return 1 again. Now when we will store a value (for 10), we will store at index 1 and the next value (for 16), will be in a linked list pointed by the value stored at 1.

When do we use hash tables?

  1. Hash tables offer fast insertion
  2. Hash tables allow fast deletion
  3. Hash tables can help in searching an element

References

  1. Hash tables as data structures
  2. Hash Tables