Monthly Archives: August 2021

A Complete Guide to Using ElasticSearch with Spring Boot

In this post, I will cover the details of how to use Elasticsearch with Spring Boot. I will also cover the fundamentals of Elasticsearch and how it is used in the industry.

What is Elasticsearch?

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.

It is built upon Apache Lucene. Elasticsearch is often part of the ELK stack (Elastic, LogStash, and Kibana). One can use Elasticsearch to store, search, and manage data for

  • Logs
  • Metrics
  • A search backend
  • Application Monitoring

Search has become a central idea in many fields with ever-increasing data. As most applications become data-intensive, it is important to search through a large volume of data with speed and flexibility. ElasticSearch offers both.

In this post, we will look at Spring Data Elasticsearch. It provides a simple interface to do search, store, and run analytics operations. We will show how we can use Spring Data to index and search log data.

Key Concepts of Elasticsearch

Elasticsearch has indexes, documents, and fields. The idea is simple and very similar to databases. Elasticsearch stores data as documents(Rows) in indexes(Database tables). A user can search through this data using fields(Columns).

Usually, the data in elasticsearch goes through different analyzers to split that data. The default analyzer split the data on punctuation like space or comma.

We will be using spring-data-elasticsearch library to build the demo of this post. In Spring Data, a document is nothing but a POJO object. We will add different annotations from elasticsearch in the same class.

As said previously, elasticsearch can store different types of data. Nevertheless, we will be looking at the simple text data in this demo.

Creating Spring Boot Application

Let’s create a simple spring boot application. We will be using spring-data-elasticsearch dependency.


dependencies {
 implementation 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
 implementation 'org.springframework.boot:spring-boot-starter-thymeleaf'
 implementation 'org.springframework.boot:spring-boot-starter-web'
 testImplementation 'org.springframework.boot:spring-boot-starter-test'
}

Subsequently, we need to create Elasticsearch client bean. Now there are two ways to create this bean.

The simple method to add this bean is by adding the properties in application.properties.

spring.elasticsearch.rest.uris=localhost:9200
spring.elasticsearch.rest.connection-timeout=1s
spring.elasticsearch.rest.read-timeout=1m
spring.elasticsearch.rest.password=
spring.elasticsearch.rest.username=

But in our application, we will be building this bean programmatically. We will be using Java High-Level Rest Client (JHLC). JHLC is a default client of elasticsearch.


@Configuration
@EnableElasticsearchRepositories
public class ElasticsearchClientConfiguration extends AbstractElasticsearchConfiguration
{

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient ()
    {
        final ClientConfiguration clientConfiguration =
                ClientConfiguration.builder().connectedTo("localhost:9200").build();

        return RestClients.create(clientConfiguration).rest();
    }
}

Henceforth, we have a client configuration that can also use properties from application.properties. We use RestClients to create elasticsearchClient.

Additionally, we will be using LogData as our model. Basically, we will be building a document for LogData to store in an index.


@Document(indexName = "logdataindex")
public class LogData
{
    @Id
    private String id;

    @Field(type = FieldType.Text, name = "host")
    private String host;

    @Field(type = FieldType.Date, name = "date")
    private Date date;

    @Field(type = FieldType.Text, name = "message")
    private String message;

    @Field(type = FieldType.Double, name = "size")
    private double size;

    @Field(type = FieldType.Text, name = "status")
    private String status;

    // Getters and Setters

}
  • @Document – specifies our index.
  • @Id – represents the field _id of our document and it is unique for each message.
  • @Field – represents a different type of field that might be in our data.

There are two ways one can search or create an index with elasticsearch  –

  1. Using Spring Data Repository
  2. Using ElasticsearchRestTemplate

Spring Data Repository with Elasticsearch

Overall, Spring Data Repository allows us to create repositories that we can use for writing simple CRUD methods for searching or indexing in elasticsearch. But if you want more control over the queries, you might want to use ElasticsearchRestTemplate. Especially, it allows you to write more efficient queries.

public interface LogDataRepository extends ElasticsearchRepository<LogData, String>
{
}

This repository provides basic CRUD methods that Spring takes care of from an implementation perspective.

Using ElasticsearchRestTemplate

If we want to use advanced queries like aggregation, suggestions, we can use ElasticsearchRestTemplate . Spring Data library provides this template.

 public List getLogDatasByHost(String host) {
    Query query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.matchQuery("host", host))
        .build();
    SearchHits searchHits = elasticsearchRestTemplate.search(query, LogData.class);

    return searchHits.get().map(SearchHit::getContent).collect(Collectors.toList());
  }

I will show further the usage of ElasticsearchRestTemplate when we do more complex queries.

ElasticsearchRestTemplate implements ElasticsearchOperations.  There are key queries that you can use with ElasticsearchRestTemplate that makes the use of it easier compared to Spring Data repositories.

index() OR bulkIndex() allow creating a single index or indices in bulk. One can build an index query object and use it in index() method call.


  private ElasticsearchRestTemplate elasticsearchRestTemplate;

  public List createLogData
            (final List logDataList) {

      List queries = logDataList.stream()
      .map(logData ->
        new IndexQueryBuilder()
        .withId(logData.getId().toString())
        .withObject(logData).build())
      .collect(Collectors.toList());;
    
      return elasticsearchRestTemplate.bulkIndex(queries,IndexCoordinates.of("logdataindex"));
  }

search() method helps to search documents in an index. One can perform search operations by building Query object. There are three types of Query one can build. NativeQuery, CriteriaQuery, and StringQuery.

Rest Controller to query elasticsearch instance

Let’s create a rest controller that we will use to add the bulk of data in our elasticsearch instance as well as to query the same instance.

@RestController
@RequestMapping("/v1/betterjavacode/logdata")
public class LogDataController
{
    @Autowired
    private LogDataService logDataService;

    @GetMapping
    public List searchLogDataByHost(@RequestParam("host") String host)
    {
        List logDataList = logDataService.getAllLogDataForHost(host);

        return logDataList;
    }

    @GetMapping("/search")
    public List searchLogDataByTerm(@RequestParam("term") String term)
    {
        return logDataService.findBySearchTerm(term);
    }

    @PostMapping
    public LogData addLogData(@RequestBody LogData logData)
    {

        return logDataService.createLogDataIndex(logData);
    }

    @PostMapping("/createInBulk")
    public  List addLogDataInBulk(@RequestBody List logDataList)
    {
        return (List) logDataService.createLogDataIndices(logDataList);
    }
}

Running Elasticsearch Instance

So far, we have shown how to create an index, and how to use elasticsearch client. But, we have not shown connecting this client to our elasticsearch instance.

We will be using a docker instance to run elasticsearch on our local enviornment. AWS provides its own service to run Elasticsearch.

To run your own docker instance of elasticsearch, use the following command –

docker run -p 9200:9200 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.0

Subsequently, this will start the node elasticsearch node that you can verify by visiting http://localhost:9200

Elastic Search and Spring Data - Docker Instance

Creating Index and Searching for Data

Altogether, if we start the application, we will be using a Postman to create an initial index and continue to add documents to it.

Elasticsearch and Spring Boot - Add Documents

This will also create an index and add the documents to that index. On elasticsearch instance, we can see the log as below:

{
 "type": "server",
 "timestamp": "2021-08-22T18:48:46,579Z",
 "level": "INFO",
 "component": "o.e.c.m.MetadataCreateIndexService",
 "cluster.name": "docker-cluster",
 "node.name": "e5f3b8096ca3",
 "message": "[logdataindex] creating index, cause [api], templates [], shards [1]/[1]",
 "cluster.uuid": "mi1O1od7Rju1dQMXDnCuNQ",
 "node.id": "PErAmAWPRiCS5tv-O7HERw"
}

The message clearly shows that it has created an index logdataindex. Now if add more documents to the same index, it will update that index.

Let’s run a search query now. I will run a simple query to search for the text term “Google”

Elasticsearch and Spring Boot - Search

This was a simple search query. As previously mentioned, we can write more complex search queries using different types of queries – String, Criteria, or Native.

Conclusion

Code for this demo is available on my GitHub repository.

In this post, we covered the following things

  • Elasticsearch and Key Concepts about Elasticsearch
  • Spring Data repository and ElasticsearchRestTemplate
  • Integration with Spring Boot Application
  • Execution of different queries against Elasticsearch

If you have not checked out my book about Spring Security, you can check here.

Do you find Gradle as a build tool confusing? Why is it so complex to understand? I am writing a new simple book about Gradle – Gradle For Humans. Follow me here for more updates.

Connect Spring Boot Application with AWS Dynamo DB

In this post, I show how we can connect Spring Boot Application with AWS Dynamo DB. I will also cover some fundamentals of AWS Dynamo DB which is a No-SQL database.

AWS Dynamo DB

As per Amazon Documentation, Dynamo DB is No-SQL key-value and document database. We do have some alternatives like Cassandra (key-value) or Mongo DB (document).

Dynamo DB offers

  • reliable scaleable performance
  • a simple API for allowing key-value access

Dynamo DB is usually a great fit for application with the following requirements:

  1. A large amount of data and latency requirements
  2. Data sets for recommendation systems
  3. Serverless Application with AWS Lambda

Key Concepts

Before we can use Dynamo DB, it is important to understand some key concepts about this database.

  • Tables, Items, and Attributes – These three are the fundamental blocks of Dynamo DB. A table is a grouping of data records. An item is a single data record in a table. Henceforth, each item in a table is identified using the primary key. Attributes are pieces of data in a single item.
  • Dynamo DB tables are schemaless. However, we only need to define a primary key when creating the table. A simple primary key or a composite primary key are two types of primary keys.
  • Secondary Indexes – Sometimes primary keys are not enough to access data from the table. Secondary indexes enable additional access patterns from the Dynamo DB. Nevertheless, there are two types of indexes – local secondary indexes and global secondary indexes. A local secondary index uses the same partition key as the underlying table, but a different sort key. A global secondary index uses the different partition key and sort key from the underlying table.

Applications with Dynamo DB

There is one difference with Dynamo DB compared to other SQL or NoSQL databases. We can interact with Dynamo DB through REST calls. We do not need JDBC Connection protocols where applications need to maintain consistent connections.

There are two ways we can connect applications to Dynamo DB.

  1. Use Spring Data Library with Dynamo DB
  2. Use an AWS SDK provided client

Spring Boot Application

As part of this demo, we will create some data model classes that depict entity relationships. Subsequently, the application will provide a simple REST API for crud operation and the application will store the data in Dynamo DB.

So let’s start with adding the required dependencies in our application:


dependencies {
 implementation 'org.springframework.boot:spring-boot-starter-web'
 implementation 'io.github.boostchicken:spring-data-dynamodb:5.2.5'
 implementation 'junit:junit:4.13.1'
 testImplementation 'org.springframework.boot:spring-boot-starter-test'
}

So the dependency spring-data-dynamodb allows us to represent Dynamo DB tables in model classes and create repositories for those tables.

We will create our model class Company as follows:


package com.betterjavacode.dynamodbdemo.models;

import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAttribute;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBAutoGeneratedKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBHashKey;
import com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBTable;

@DynamoDBTable(tableName = "Company")
public class Company
{
    private String companyId;

    private String name;

    private String type;

    @DynamoDBHashKey(attributeName = "CompanyId")
    @DynamoDBAutoGeneratedKey
    public String getCompanyId ()
    {
        return companyId;
    }


    public void setCompanyId (String companyId)
    {
        this.companyId = companyId;
    }

    @DynamoDBAttribute(attributeName = "Name")
    public String getName ()
    {
        return name;
    }

    public void setName (String name)
    {
        this.name = name;
    }

    @DynamoDBAttribute(attributeName = "Type")
    public String getType ()
    {
        return type;
    }

    public void setType (String type)
    {
        this.type = type;
    }
}

So this class Company maps to the Dynamo DB table of the same name. The annotation DynamoDBTable helps us with this mapping. Similarly, DynamoDBHashKey is the attribute key of this table. DynamoDBAttribute are the other attributes of this table.

We will create a REST Controller and a Service class that will allow us to call the CRUD APIs for this object.


package com.betterjavacode.dynamodbdemo.controllers;

import com.betterjavacode.dynamodbdemo.models.Company;
import com.betterjavacode.dynamodbdemo.services.CompanyService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("v1/betterjavacode/companies")
public class CompanyController
{
    @Autowired
    private CompanyService companyService;

    @GetMapping(value = "/{id}", produces = "application/json")
    public ResponseEntity getCompany(@PathVariable("id") String id)
    {

        Company company = companyService.getCompany(id);

        if(company == null)
        {
            return new ResponseEntity<>(HttpStatus.BAD_REQUEST);
        }
        else
        {
            return new ResponseEntity<>(company, HttpStatus.OK);
        }
    }

    @PostMapping()
    public Company createCompany(@RequestBody Company company)
    {
        Company companyCreated = companyService.createCompany(company);

        return company;
    }
}



So we have two methods one to get the company data and another one to create the company.


package com.betterjavacode.dynamodbdemo.services;

import com.betterjavacode.dynamodbdemo.models.Company;
import com.betterjavacode.dynamodbdemo.repositories.CompanyRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Optional;

@Service
public class CompanyService
{
    @Autowired
    private CompanyRepository companyRepository;

    public Company createCompany(final Company company)
    {
        Company createdCompany = companyRepository.save(company);
        return createdCompany;
    }

    public List getAllCompanies()
    {
        return (List) companyRepository.findAll();
    }

    public Company getCompany(String companyId)
    {
        Optional companyOptional = companyRepository.findById(companyId);

        if(companyOptional.isPresent())
        {
            return companyOptional.get();
        }
        else
        {
            return null;
        }
    }
}

Connect Spring Boot Application with AWS Dynamo DB

So far, we have seen creating some parts of the application. But, we still have an important part left and that is to connect our application to AWS Dynamo DB service in AWS.

Login to AWS Console and access Dynamo DB.

Create a new table in Dynamo DB.

Spring Boot Connect AWS Dynamo DB - Table

Assuming, you choose the primary key as CompanyId, we should be fine here. Remember, that’s the partition key we defined in our model class.

Now back to the Spring Boot Application. Create a new bean ApplicationConfig to define Dynamo DB configuration.

 


package com.betterjavacode.dynamodbdemo.config;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import org.socialsignin.spring.data.dynamodb.repository.config.EnableDynamoDBRepositories;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableDynamoDBRepositories(basePackages = "com.betterjavacode.dynamodbdemo.repositories")
public class ApplicationConfig
{
    @Value("${amazon.aws.accesskey}")
    private String amazonAccessKey;

    @Value("${amazon.aws.secretkey}")
    private String amazonSecretKey;

    public AWSCredentialsProvider awsCredentialsProvider()
    {
        return new AWSStaticCredentialsProvider(amazonAWSCredentials());
    }

    @Bean
    public AWSCredentials amazonAWSCredentials()
    {
        return new BasicAWSCredentials(amazonAccessKey, amazonSecretKey);
    }

    @Bean
    public AmazonDynamoDB amazonDynamoDB()
    {
        return AmazonDynamoDBClientBuilder.standard().withCredentials(awsCredentialsProvider()).withRegion(Regions.US_EAST_1).build();
    }
}

We will need to pass accessKey and secretKey in application.properties. Importantly, we are creating an AmazonDynamoDB bean here.

Now, let’s start our application and we will see the log that shows it has created a connection with DynamoDB table Company.

Spring Boot Application Log - Dynamo DB Table

Once the application has started, we will access Postman for REST API.
Spring Boot POST API - Dynamo DB

Conclusion

Code for this demo is available on my github repository.

In this post, we showed how we can use Dynamo DB – a No SQL database in a Spring Boot application.

  • We went over Dynamo DB concepts.
  • And we created a Spring Boot application.
  • We created a Dynamo DB table in AWS.
  • We connected the Spring Boot application to the AWS Dynamo DB table.

References

Introduction to Serverless Architecture Patterns

In this post, I will cover Serverless Architecture Patterns. With multiple cloud providers, on-premise infrastructure is out of date. By simple definition, serverless can be the absence of a server. But is that true? Not really. To start, we will find out serverless architecture basics and then its benefits and drawbacks.

What is Serverless Architecture?

Lately, serverless architecture is becoming more of a trend. The developer still writes the server-side code, but it runs in stateless compute containers instead of traditional server architecture. An event triggers this code and a third party (like AWS Lambda) manages it. Basically, this is Function as a Service (FaaS). AWS Lambda is the most popular form of FaaS.

So the definition of Serverless Architecture –

Serverless Architecture is a design pattern where applications are hosted by a third-party service, eliminating the need for server software and hardware.

In traditional architecture, a user does activity on the UI side and that sends a request to the server where the server-side code does some database transaction. With this architecture, the client has no idea what is happening as most of the logic is on the server-side.

With Serverless Architecture, we will have multiple functions (lambdas) for individual services and the client UI will call them through API-Gateway.

Serverless Architecture Patterns - Initial Design

So in the above architecture

  1. A Client UI when accessed, authenticates the user through an authentication function that will interact with the user database.
  2. Similarly, once the user logs in, he can purchase or search for products using purchase and search functions.

In traditional server-based architecture, there was a central piece that was managing flow, control, and security. In Serverless architecture, there is no central piece. The drawback of serverless architecture is that we then rely on the underlying platform for security.

Why Serverless Architecture?

With a traditional architecture, one used to own a server, and then you would configure the webserver and the application. Then came the cloud revolution and now everyone wants to be on the cloud. Even with multiple cloud providers, we still need to manage the operating system on the server and web server.

What if there is a way where you can solely focus on the code and a service manages the server and the webserver. AWS provides Lambda, Microsoft Azure provides Function to take care of physical hardware, virtual operating systems, and webserver. This is how Serverless Architecture reduces the complexity by letting developers focus on code only.

Function As A Service (FAAS)

We have covered some ground with Serverless Architecture. But this pattern is possible only with Function as a Service. Lambda is one type of Function as a service. Basically, FaaS is about running backend code without managing any server or server applications.

One key advantage of FaaS like AWS Lambda is that you can use any programming language (like Java, Javascript, Python, Ruby) to code and the Lambda infrastructure will take care of setting up an environment for that language.

Another advantage of FaaS is horizontal scaling. It is mostly automatic and elastic. Cloud provider handles horizontal scaling.

Tools

There are a number of tools available for building serverless architecture functions. This particular Serverless Framework makes it building an easy process.

Benefits

  1. The key benefit of using Serverless Architecture is reduced operational cost. Once a cloud provider takes care of infrastructure, you do not have to focus on infrastructure.
  2. Faster Deployment, great flexibility – Speed helps with innovation. With faster deployment, serverless makes it easier to change functions and test the changes.
  3. Reduced scaling cost and time. With infrastructure providers handling most of the horizontal scaling, the application owner doesn’t have to worry about scaling.
  4. Focus more on UX. Therefore, developers can focus more on User Experience (UX) with architecture becoming easier.

Drawbacks

After all, there are tradeoffs with any architectural design.

  1. Vendor Control – By using an infrastructure vendor, we are giving away the control for backend service. We have to rely on their security infrastructure instead of designing our own.
  2. Running workloads could be more costly for serverless.
  3. Repetition of logic – Database migration means repetition of code and coordination for the new database.
  4. Startup latency issues – When the initial request comes, the platform must start the required resources before it can serve the request and this causes initial latency issues.

Conclusion

In this post, I discussed Serverless Architecture Patterns and what makes it possible to use this pattern. I also discussed the benefits and drawbacks. If you have more questions about Serverless Architecture, please post your comment and I will be happy to answer. You can subscribe to my blog here.

References

  1. Serverless Architecture – Serverless