Category Archives: Java

Monitoring your microservice with Micrometer

Spring Boot has made building web applications much easier. It has also pulled in a number of other critical libraries that help enterprise applications in different ways. With enterprise applications moving to the cloud, Spring Boot has made it easier to deploy Spring applications there with continuous integration. In this post, I will show how we can use the Spring Micrometer library to gather metrics from your code.

These metrics can then be shipped to different vendor databases to build metrics-based dashboards. In an earlier post, I showed how to use spring-boot-actuator to collect some metrics data.

As Spring defines it, Micrometer is a dimensional-first metrics collection facade. In simple words, it is like SLF4J, but for metrics.

Configure Micrometer for microservice

Firstly, to use Micrometer, I have created a simple microservice with REST APIs, built using Spring Boot 2. Micrometer also offers backward compatibility for Spring Boot 1.x.

You can configure Micrometer in your Spring Boot 2.x based microservice by adding the following dependency to your build file:

runtime('io.micrometer:micrometer-registry-prometheus:1.0.4')

Adding Metrics

We will discuss the different metrics that we can add through Micrometer. A meter is identified by its name and dimensions, and there are different meter types for different kinds of metrics.

Counter

A counter is a cumulative metric. Counters are mostly used to count the number of requests, the number of errors, or the number of tasks completed.

Gauges

A gauge represents a single value that can go up and down. A typical use of a gauge is measuring memory usage.

Timers

Timers measure the rate at which a particular piece of code or method is called, along with the latency of each call once it completes.
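To make these meter types concrete, here is a minimal sketch of registering all three with a MeterRegistry. The metric names and the use of SimpleMeterRegistry are my own choices for illustration; in a Spring Boot 2 application, you would inject the auto-configured MeterRegistry instead.

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.util.concurrent.atomic.AtomicInteger;

public class MetricsExample
{
    public static void main(String[] args)
    {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Counter: cumulative value that only goes up
        Counter requests = Counter.builder("http.requests.total")
                .description("Total number of requests served")
                .register(registry);
        requests.increment();

        // Gauge: single value that can go up and down
        AtomicInteger activeSessions = registry.gauge("sessions.active", new AtomicInteger(0));
        activeSessions.incrementAndGet();

        // Timer: measures how often a piece of code runs and its latency
        Timer timer = Timer.builder("orders.processing.time").register(registry);
        timer.record(() -> System.out.println("processing an order"));
    }
}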

We talked about the different metrics and how to configure Micrometer. Now we will show how to configure this library against a monitoring system. Spring Micrometer supports a number of monitoring systems; in this post, I will show how to use it with the Prometheus monitoring system.

What is Prometheus?

Prometheus is an in-memory dimensional time-series database with a built-in UI, a custom query language, and math operations. To know more, you can visit here.

We can add Prometheus to our microservice by adding the following dependencies to the Gradle file:

compile('org.springframework.boot:spring-boot-starter-actuator:2.0.3.RELEASE')
runtime('io.micrometer:micrometer-registry-prometheus:1.0.4')

To understand where Prometheus sits in the overall architecture, look at the flow below:

Spring Boot microservice -> Spring Micrometer -> Prometheus

Once the above dependencies are added, Spring Boot will automatically configure a PrometheusMeterRegistry and a CollectorRegistry to collect and export metrics data in a format that Prometheus can scrape.

To enable Prometheus endpoints

To enable the Prometheus and actuator endpoints, add the following properties to the application.properties file:

management.security.enabled = false
management.endpoints.web.exposure.include=health,info,prometheus

Now if we start our web server, we can verify these endpoints by going to http://localhost:8080/actuator/info, http://localhost:8080/actuator/health, and http://localhost:8080/actuator/prometheus. The Prometheus endpoint looks like the screenshot below:

[Screenshot: sample output of the /actuator/prometheus endpoint]

Conclusion

In this post, we showed how to use Spring Micrometer to capture metrics data and configure it with Prometheus. In the next post, I will show how to display this data in a human-readable format in a nice UI using Prometheus.

References

  1. Production-Ready Metrics – Metrics
  2. Spring Micrometer – Spring Micrometer


Introduction to Graphs

In my previous article, I talked about hash tables. In this post, I will discuss one more data structure, probably one of the most important of all: graphs.

Clearly, our current web technologies rely heavily on graphs. Google, Facebook, LinkedIn, and any social media platform with users use graphs as a data structure. Graphs are the most common data structure for solving problems like finding the distance between two nodes or the shortest path from place A to place B.

When it comes to social networks, we are accustomed to the idea of six degrees of separation; we can use graphs to find how many degrees it takes to connect two nodes on the network. In networking, graphs are used to find the fastest route for delivering a response.

How do you explain Graphs to 5-year-olds?

The easiest way to explain graphs to a kid is to look at City A and City B on a map, and at the road that connects those two cities.

City A has bananas and oranges, City B has apples, and City C has watermelons.

Now, when we travel from City A to City B on the map, what possible routes can we take, and what information can we exchange? City A and City B can transfer apples, bananas, and oranges to each other. Once City B gets bananas and oranges, it can pass them on to other neighboring cities.

In short, we are connecting the nodes (vertices) for cities A and B through a road (edge) while exchanging the products these two cities are known for.

Graphs Data Structure

In this post, we will discuss graphs from a Java perspective. Graphs let us represent real-life relationships between different types of data. There are two important aspects to a graph:

  • Vertices (Nodes) – Nodes are the points where the graph is connected; each node stores the data or data points.
  • Edges – Edges represent the relationships between nodes. An edge can have a weight or cost.

There is no fixed starting node or ending node in a graph. A graph can be cyclic or acyclic. Edges can be directed or undirected, which in turn makes the graph directed or undirected.

Edges are generally represented as ordered pairs: (x, y) means there is an edge from node x to node y. Note that (x, y) can be different from (y, x), especially in a directed graph.

Representations of Graphs

A. Adjacency Matrix –

This is a two-dimensional array of size n*n, where n is the number of nodes in the graph. adj[][] is the usual name for this matrix. If adj[i][j] = 1, there is an edge between node i and node j. The adjacency matrix of an undirected graph is symmetric. For a graph with nodes A, B, C, G, and E, the representation looks like this:

        A   B   C   G   E
    A   0   1   0   1   0
    B   1   0   1   0   1
    C   0   1   0   0   1
    G   1   0   0   0   1
    E   0   1   1   1   0

B. Adjacency List –

Here, an array of lists is used. The size of the array equals the number of nodes in the graph, and arr[i] holds the list of vertices adjacent to node i.


Operations on the Graphs

There are common operations that we will use often. A graph as a data structure offers the following operations; a minimal Java sketch implementing them follows the list.

Additions

addNode  – Add a node in the existing graph

addEdge – Add an edge in the existing graph between two nodes

Removal

removeNode – Remove a node from the existing graph

removeEdge – Remove an edge between two nodes from the graph

Search

contains – find whether the graph contains the given node

hasEdge – find whether there is an edge between the given two nodes
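
To tie these operations together, here is a minimal sketch of an undirected graph backed by an adjacency list. The class and method names simply mirror the operations listed above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Graph<T>
{
    // each node maps to the list of its adjacent nodes
    private final Map<T, List<T>> adjacencyList = new HashMap<>();

    public void addNode(T node)
    {
        adjacencyList.putIfAbsent(node, new ArrayList<>());
    }

    public void addEdge(T source, T destination)
    {
        addNode(source);
        addNode(destination);
        adjacencyList.get(source).add(destination);
        adjacencyList.get(destination).add(source); // undirected: record both directions
    }

    public void removeNode(T node)
    {
        // remove the node from every neighbor's list, then drop the node itself
        adjacencyList.values().forEach(neighbors -> neighbors.remove(node));
        adjacencyList.remove(node);
    }

    public void removeEdge(T source, T destination)
    {
        List<T> sourceNeighbors = adjacencyList.get(source);
        List<T> destinationNeighbors = adjacencyList.get(destination);
        if (sourceNeighbors != null) sourceNeighbors.remove(destination);
        if (destinationNeighbors != null) destinationNeighbors.remove(source);
    }

    public boolean contains(T node)
    {
        return adjacencyList.containsKey(node);
    }

    public boolean hasEdge(T source, T destination)
    {
        return adjacencyList.containsKey(source) && adjacencyList.get(source).contains(destination);
    }
}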


Time and Space Complexity of operations on Graphs

Above all, this post would be incomplete without discussing the complexity of the operations on the graph data structure. This really depends on which representation you use. With an adjacency matrix, adding or removing an edge is an O(1) operation, and search operations like contains and hasEdge are O(1) as well. The trade-off is space: the adjacency matrix takes O(n*n).

With an adjacency list, adding a node or an edge is O(1), and removing a node is an O(n) operation. Removing an edge, and edge searches like hasEdge, take time proportional to the degree of the node (O(n) in the worst case), while contains remains O(1). The space complexity of the adjacency list is O(n + e), where e is the number of edges.

Conclusion

In conclusion, I covered the basics of the graph as a data structure: a graph contains nodes and edges, and it supports operations like addition, removal, and search. In future posts, I will talk about implementing Depth First Search and Breadth First Search on a graph, and then we will solve some real problems using this data structure.

References

  1. Introduction to Graphs – Graphs
  2. Graph as data structure – Graph as data structure

Hash Tables

What are Hash Tables?

Hash tables are data structures used to store data in key/value pair format. A hash table uses a hash function to compute an index into an array and stores the element at that index.

What is key/value pair though?

Alright, I will be digging into the fundamentals here. Let’s take the example of a database table. To retrieve a particular value from a database table, you sometimes need to know a primary key or a unique value from the row. You then query the database table with that unique value or primary key to get the entire row, or the particular value you are looking for.

Still complicated?

Let’s take the example of a classroom. You are in a 2nd-grade class, and when the teacher takes roll call, she doesn’t necessarily call your name; she calls the number assigned to you. For example:

1 – John Doe

2 – Jill Doe

3 – Mark Ranson

So the roll number assigned to the student becomes a key to identify that student.

Similarly in programming languages (Java in this case), we use a data structure called Hash Tables.

A hash function takes an input and hashes it to generate an index, which we use as the position to store the value in an array. Why all this complexity? Why not store elements in sequential order?

There are many reasons. First, hashing adds a measure of security: if somebody exploits a sequential order, it is easy to find the next element, whereas hashing lets us store the data in effectively random positions. Most importantly, the average time required to search for an element in a hash table is O(1).

From these basics, we can say that a hash table has two components: an array to store the values, and a function to calculate the index into that array.

So what is a hash function and how do we write this hash function?

A hash function is a function that takes data of any size and transforms it into fixed-size data. In short, a hash function takes an input x and transforms it into an output y. This looks simple, but a question arises: what if multiple inputs transform into the same y? Then we have a problem. This is known as a collision.
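
As a minimal sketch of this idea, the snippet below turns a string key into an array index using Java’s built-in hashCode(). The modulo-based compression is the standard textbook approach; real implementations such as java.util.HashMap do something more elaborate.

public class IndexExample
{
    public static void main(String[] args)
    {
        int capacity = 16; // size of the backing array

        String key = "john";
        // hashCode() maps input of any size to a fixed-size int;
        // masking the sign bit and taking the modulo compresses it into a valid index
        int index = (key.hashCode() & 0x7fffffff) % capacity;
        System.out.println("Store the value for \"" + key + "\" at index " + index);
    }
}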

Important characteristics of this hash function

  1. It should avoid collisions.
  2. It should be easy to calculate.
  3. It should distribute the keys uniformly.

How to avoid collisions?

There are a couple of techniques.

One technique is open addressing. In open addressing, all elements are stored in the hash table itself, so at any point the size of the table must be greater than or equal to the number of keys. This is useful for fixed-size tables. During insertion, if you find the slot occupied, you move on to the next slot and continue until you find an unoccupied one. Since this probing is linear, this form of open addressing is called linear probing; a sketch follows. The disadvantage of open addressing is that insertion and search operations become linear in the worst case.
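
Here is a minimal sketch of an open-addressing insert with linear probing. It assumes a fixed-size table with at least one empty slot, as required above (null marks an empty slot).

public class LinearProbingTable
{
    private final String[] table = new String[16];

    public void insert(String key)
    {
        int index = (key.hashCode() & 0x7fffffff) % table.length;
        // if the slot is occupied, step to the next slot until an empty one is found
        while (table[index] != null)
        {
            index = (index + 1) % table.length;
        }
        table[index] = key;
    }
}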

The second technique is separate chaining. Here, each cell of the hash table points to a linked list of records. If the hash function returns a duplicate index, the new value is placed in a linked list chained from the value already stored at that index. To make this simpler, let’s assume we have a hash function key % 3: for 9 it returns 0, for 10 it returns 1, and for 16 it returns 1 again. When we store the value for 10, it goes at index 1, and the value for 16 goes into the linked list chained from index 1, as in the sketch below.
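
A minimal sketch of separate chaining using the key % 3 example above: each bucket holds a linked list, and the colliding keys 10 and 16 end up in the same bucket’s list.

import java.util.LinkedList;

public class ChainedHashTable
{
    private static final int BUCKETS = 3; // matches the key % 3 hash function above

    @SuppressWarnings("unchecked")
    private final LinkedList<Integer>[] buckets = new LinkedList[BUCKETS];

    public void insert(int key)
    {
        int index = key % BUCKETS; // 9 -> 0, 10 -> 1, 16 -> 1
        if (buckets[index] == null)
        {
            buckets[index] = new LinkedList<>();
        }
        buckets[index].add(key); // 10 and 16 both land in bucket 1's list
    }
}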

When do we use hash tables?

  1. Hash tables offer fast insertion.
  2. Hash tables allow fast deletion.
  3. Hash tables support fast search for an element.

References

  1. Hash tables as data structures
  2. Hash Tables


Object Oriented Design Principles

A good software developer builds software using the right design principles. If you learn design patterns and object-oriented concepts but don’t learn the principles, you do yourself a disservice as a developer. Without design principles, you will build software with no heart, no functionality to serve. I hope you don’t want to do that.

In this post, I will try to explain some design principles that I have come across or learned through my experience. If you do not understand any of these principles, please comment on the post and I will answer your questions.

Programming to an interface, not an implementation

While designing, think about how you can reuse your code or structure it so that it can be extended in the future if needed, or changed with minimal effort. One design principle that helps in such cases is to program to interfaces instead of directly to implementations.

For variables, method return types, and method argument types, use interfaces. This leaves you free to swap in whatever implementation you want, as in the small sketch below.
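
A small sketch of the idea: declare the variable as the List interface so the concrete implementation can be swapped later without touching the rest of the code.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class InterfaceExample
{
    public static void main(String[] args)
    {
        // program to the interface: only the right-hand side knows the implementation
        List<String> names = new ArrayList<>();
        names.add("John");

        // switching implementations later requires a change in exactly one place
        names = new LinkedList<>(names);
    }
}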

Single Responsibility Principle

A class or a method should always implement a single responsibility or a single piece of functionality. Putting more than one responsibility into an object means that any future change can disturb the other functionality. To reduce the impact of future changes, always implement your code with the single responsibility principle in mind, as in the sketch below.
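
A short sketch (the class names are my own): generating a report and persisting it are separate responsibilities, so they live in separate classes and change for separate reasons.

// Each class has exactly one reason to change
public class ReportGenerator
{
    public String generate()
    {
        return "report content";
    }
}

public class ReportSaver
{
    public void save(String report)
    {
        // persist the report; storage changes never touch generation logic
    }
}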

Liskov Substitution Principle

This principle states that objects should be replaceable with instances of their subclasses without altering the correctness of the program.

To understand this, let’s look at a simple class Bird and its subclasses:

public class Bird
{
    void fly()
    {
       // Fly function for bird
    }
}

public class Parrot extends Bird
{
    @Override
    void fly()
    {
        // A parrot can fly, so this override honors the Bird contract
    }
}

public class Ostrich extends Bird
{
    @Override
    void fly()
    {
        // An ostrich can't fly; an override that breaks the contract
        // like this violates the Liskov Substitution Principle
        throw new UnsupportedOperationException("Ostriches can't fly");
    }
}

A parrot, as a bird, can fly, but an ostrich, as a bird, can’t. If we end up with an implementation like this, where an Ostrich cannot be substituted wherever a Bird is expected, it violates the Liskov Substitution Principle.

Open Closed Principle

The Open Closed Principle states that objects and methods should be open for extension but closed for modification. Often, requirements are not fully clear at the beginning of design and implementation; if we follow the open closed principle in the initial design, it becomes easy to add changes to the design as requirements evolve. The sketch below illustrates the idea.
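
A hedged sketch with a Shape hierarchy of my own choosing: new shapes are added by extension, and the existing classes are never modified.

// Open for extension: add a new shape by subclassing
// Closed for modification: existing classes stay untouched
public abstract class Shape
{
    public abstract double area();
}

public class Circle extends Shape
{
    private final double radius;

    public Circle(double radius)
    {
        this.radius = radius;
    }

    @Override
    public double area()
    {
        return Math.PI * radius * radius;
    }
}

public class Square extends Shape
{
    private final double side;

    public Square(double side)
    {
        this.side = side;
    }

    @Override
    public double area()
    {
        return side * side;
    }
}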

Interface Segregation Principle

This principle requires that a client should not be forced to implement an interface it doesn’t use. In other words, make sure your interfaces are concise and implement a single piece of functionality. If an interface has more than one responsibility, a client may be forced to implement all of them when it only needs one; see the sketch below.
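
A minimal sketch with names of my own: instead of one fat Worker interface with work() and eat(), we segregate it so a Robot implements only what it needs.

// Segregated interfaces, each with a single responsibility
public interface Workable
{
    void work();
}

public interface Eatable
{
    void eat();
}

public class Robot implements Workable
{
    @Override
    public void work()
    {
        // a robot works but is never forced to implement eat()
    }
}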

Delegation Principle

Don’t do all the work yourself; delegate functionality to the respective classes. Delegation is a kind of relationship between objects in which an object forwards certain work to other objects, provided those objects implement the required functions. A minimal sketch:
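
// The class names here are my own for illustration: the Printer does not
// print by itself; it forwards the call to a RealPrinter that does the work
public class RealPrinter
{
    void print(String message)
    {
        System.out.println(message);
    }
}

public class Printer
{
    private final RealPrinter delegate = new RealPrinter();

    void print(String message)
    {
        delegate.print(message); // delegation: forward the work to the real printer
    }
}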

Dependency Inversion Principle

This principle is a way of decoupling software modules: high-level modules should not depend on low-level modules. Generally, while designing, high-level classes end up depending on low-level classes, but if every design revision forces changes to the low-level classes, that signals a bad design. To avoid this problem, we create an abstraction layer, and the low-level classes are built against that abstraction.

When this principle is used, high-level classes work with low-level classes through interfaces that act as the abstraction layer, instead of working with the low-level classes directly. Here is a hedged sketch, with names of my own:
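
// The abstraction layer
public interface MessageSender
{
    void send(String message);
}

// Low-level class built against the abstraction
public class EmailSender implements MessageSender
{
    @Override
    public void send(String message)
    {
        // send the message via email
    }
}

// High-level class depends on the interface, not on EmailSender directly
public class NotificationService
{
    private final MessageSender sender;

    public NotificationService(MessageSender sender)
    {
        this.sender = sender;
    }

    public void notifyUser(String message)
    {
        sender.send(message);
    }
}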

References

  1. Ten object oriented design principles – SOLID Principles
  2. Design Principles – design principles


All about Kafka Streaming

Lately, I have been hearing a lot about Kafka streaming. Even though I have worked on microservices, I haven’t really tackled heavy-data applications. My previous experience dealing with large amounts of data for health insurance benefits was very different.

Now, with Netflix and Amazon, data streaming has become a major target. With growing technology and information, it has become even more important to tackle growing data. In simple terms, all web applications should be able to process large data sets with good performance; data set size should not deter application usage.

What is Kafka Data Streaming?

We used to process large data in batches, but batch processing is not continuous, and it sometimes doesn’t work for real-time application scenarios; for something like Netflix, batch processing will never work. What’s the alternative? Data streaming. Data streaming is a process of sending data sets continuously, and it is the backbone of applications like Netflix and Amazon. For the growing social network platforms, too, data streaming is at the heart of handling large data.

The streamed data is often used for real-time aggregation and correlation, filtering, or sampling. One of the major benefits of data streaming is that it allows us to view and analyze data in real-time.

A few challenges that data streaming often faces:

  1. Scalability
  2. Durability of data
  3. Fault Tolerance

Tools for Data Streaming

There are a bunch of tools available for data streaming. Amazon offers Kinesis, and Apache has a few open-source tools like Kafka, Storm, and Flink. In future posts, I will talk more about Apache Kafka and its usage; here I am just giving a brief idea of it.

Apache Kafka Streaming

Apache Kafka is a real-time distributed streaming platform. Basically, it allows us to publish and subscribe to streams of records.

There are two main use cases for Apache Kafka:

  1. Building data pipelines where records are streamed continuously.
  2. Building applications that consume those data pipelines and react accordingly.

Above all, the basic idea that Kafka has adapted is from Hadoop. Kafka runs as a cluster of one or more servers that can span multiple data centers, and these clusters store streams of records. Each record in a stream comprises a key, a value, and a timestamp. Kafka provides four main APIs: Producer, Consumer, Streams, and Connector.
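
As a small taste of the Producer API, here is a minimal sketch that publishes one record. The broker address and topic name are placeholders of my own; the configuration keys and classes come from the standard Kafka clients library.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer
{
    public static void main(String[] args)
    {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props))
        {
            // each record carries a key, a value, and a timestamp
            producer.send(new ProducerRecord<>("my-topic", "key-1", "hello kafka"));
        }
    }
}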

In future posts, I will break these APIs down in detail with their usage in a sample application.

References

  1. Apache Kafka – Kafka
  2. Data Streaming – Data Streaming
  3. Stream Processing – Stream Processing