Fundamentals of a Distributed System Design

When you are starting out as a software developer, your focus is on the micro level: what happens in your code, and what happens in your application? Learning to think at the system design level can help you immensely in your career. System design is a big topic, so this post covers only the important fundamentals of distributed system design. Understanding them is key to building a good system, and every developer should make the effort to learn them.

Fundamentals of a Distributed System

In this post, we will learn the following fundamentals.

Key characteristics of a distributed system
Load balancing
Caching
Database
Database indexes
Proxies
CAP Theorem
Consistent Hashing

Key Characteristics of a distributed system

Scalability

Scalability is the system’s ability to grow and manage increased demand.
Horizontal scaling – you scale by adding more servers into your pool of resources.
Vertical scaling – you scale by adding more power to an existing server.

Reliability

Reliability is the probability that a system performs its required function without failure over a given period. The goal is to keep the probability of failure as low as possible.
To achieve reliability, redundancy is required, and redundancy has a cost.
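
The effect of redundancy can be sketched with a little arithmetic. The snippet below is a minimal illustration, assuming replicas fail independently of each other (real failures are often correlated, so treat the numbers as optimistic). The function name system_reliability is made up for this example.

```python
def system_reliability(replica_reliability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies keeps working.

    Assumption: replicas fail independently, which is an idealization.
    """
    if not 0.0 <= replica_reliability <= 1.0:
        raise ValueError("replica_reliability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The system as a whole fails only if every single replica fails.
    return 1.0 - (1.0 - replica_reliability) ** replicas


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "replica(s):", round(system_reliability(0.9, count), 4))
```

Under that assumption, a second replica takes a component that is 90% reliable to about 99%, at roughly twice the hardware cost.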

Availability

Availability is the percentage of time a system remains operational and able to perform its required function in a specific period.
If a system is reliable, it is available. The reverse does not hold: an available system is not necessarily reliable.
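
Availability is commonly reported as a percentage of uptime. As a rough sketch of the arithmetic (the function names are hypothetical, and a 365-day year is assumed):

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observed period during which the system was operational."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the observed period must be longer than zero")
    return uptime_hours / total


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime budget for a target availability such as 0.999 ("three nines")."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    print(f"{availability(999, 1):.3%}")
    print(f"{allowed_downtime_hours(0.999):.2f} hours per year")
```

A 99.9% target leaves a budget of roughly 8.76 hours of downtime per year.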

Efficiency

Latency – response time
Throughput – the number of items delivered in a given time unit

Load Balancing

A load balancer routes traffic from clients to different servers and keeps track of the status of those servers while distributing requests. By spreading the work, it reduces the load on each individual server and prevents any one application server from becoming a single point of failure. A load balancer can be added between clients and web servers, between web servers and an internal platform layer (application servers), and between the internal platform layer and database servers.

To distribute requests across servers, a load balancer can use different algorithms such as Round Robin, Weighted Round Robin, Least Connection, Least Response Time, Least Bandwidth, and IP Hash.
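
Three of these algorithms can be sketched in a few lines. This is a simplified, in-memory illustration rather than how any particular load balancer implements them; the class and function names are invented for the example, and health checks are left out.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server that currently has the fewest open connections."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server that was listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash_pick(servers, client_ip: str):
    """Maps a client IP to a server so the same client keeps hitting the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


if __name__ == "__main__":
    rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([rr.pick() for _ in range(4)])
    print(ip_hash_pick(["app-1", "app-2", "app-3"], "203.0.113.9"))
```

Round Robin ignores server load entirely, Least Connection needs the balancer to track open connections, and IP Hash gives session stickiness but reshuffles clients whenever the server list changes.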

However, the load balancer itself can become a single point of failure. To overcome this, a second load balancer can be connected to the first to form a cluster.

Caching

Caches take advantage of the locality of reference principle: recently requested data is likely to be requested again. A cache is like short-term memory: it is faster than the original data source, but it has limited space. Caches can exist at all levels of the architecture, but they are often found at the level nearest to the front end.

Application Server Cache

Placing a cache directly on a request layer node enables the local storage of response data.

Content Distribution Network

CDNs are a kind of cache that comes into play for sites serving large amounts of static data.

Cache Invalidation

Write-through cache – Write the data into the cache and the corresponding database at the same time.
Write-around cache – Write the data directly to permanent storage, bypassing the cache. As a result, a read of recently written data will be a cache miss.
Write-back cache – Write the data to the cache alone and sync it with backend storage after a specified interval.
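
The difference between the three write policies is easiest to see side by side. The sketch below uses plain dictionaries to stand in for both the cache and the database, which is an assumption made purely for illustration; the class names are hypothetical, and a real write-back cache would flush on a timer rather than on an explicit call.

```python
class ReadThroughCache:
    """Shared read path: serve from the cache, load from the database on a miss."""

    def __init__(self, database: dict):
        self.database = database
        self.cache = {}
        self.misses = 0

    def read(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.database[key]
        return self.cache[key]


class WriteThrough(ReadThroughCache):
    def write(self, key, value):
        # Cache and database are updated together.
        self.cache[key] = value
        self.database[key] = value


class WriteAround(ReadThroughCache):
    def write(self, key, value):
        # Only the database is written; drop any stale cached copy.
        self.database[key] = value
        self.cache.pop(key, None)


class WriteBack(ReadThroughCache):
    def __init__(self, database: dict):
        super().__init__(database)
        self.dirty = set()

    def write(self, key, value):
        # Only the cache is written; the database is updated later by flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()
```

Write-back gives the fastest writes, but any entries still marked dirty are lost if the cache node fails before they are flushed.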

Cache Eviction Policies

First In First Out
Last In First Out
Least Recently Used
Least Frequently Used
Most Recently Used
Random Replacement
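
As one concrete example, Least Recently Used can be sketched with collections.OrderedDict from the Python standard library. The LRUCache class below is a minimal illustration written for this post, not a production cache (no thread safety, no expiry).

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the entry that has gone unused for the longest time."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now the most recently used entry
    cache.put("c", 3)  # evicts "b"
    print(cache.get("b"), cache.get("a"), cache.get("c"))
```

For caching function results, the standard library also provides functools.lru_cache, which applies the same policy.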

Database

You will need a storage system for your data, and databases are the most common solution. There are two broad types of databases: relational and non-relational.

If your data is structured, you can use a relational database. Relational databases offer Structured Query Language (SQL) for querying the data.

Non-relational databases store unstructured or semi-structured data and are typically distributed.

SQL

Store data in rows and columns
Each row contains information about one entity
MySQL, MS SQL, Oracle, PostgreSQL, SQLite are some examples of relational databases.
SQL databases use SQL for querying.
Can be scaled vertically, but this is expensive.
Can be scaled horizontally, but this is a time-consuming process.
SQL databases are ACID (Atomicity, Consistency, Isolation, and Durability) compliant.
If you need ACID compliance and structured data, use SQL databases.
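
Atomicity, the "A" in ACID, can be demonstrated with SQLite through the sqlite3 module in the Python standard library. The accounts table, the CHECK constraint, and the helper names below are assumptions made up for this sketch. The transfer credits one account and then debits the other inside a single transaction, so if the debit fails the credit is rolled back as well.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> None:
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if it raises, so both updates apply or neither does.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    bank = open_bank()
    transfer(bank, 1, 2, 30)
    try:
        transfer(bank, 1, 2, 500)  # would overdraw account 1
    except sqlite3.IntegrityError as error:
        print("transfer rejected:", error)
    print(balances(bank))
```

After the rejected transfer, neither balance has changed, even though the credit statement had already run when the debit failed.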

NoSQL

Key-Value Stores – Redis, DynamoDB
Document databases – CouchDB and MongoDB
Wide-Column databases – best suited for analyzing large datasets – Cassandra and HBase
Graph databases – data is stored as nodes (entities), properties (information about entities), and lines, also called edges (the connections between entities) – Neo4j and InfiniteGraph
Schemas are dynamic. Columns can be added on the fly and each row doesn’t have to contain data for each column.
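There is no single standard query language; each database typically has its own query API. UnQL (Unstructured Query Language) was proposed as a common language but was never widely adopted.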
Easy to scale horizontally.
Most are not fully ACID compliant; many trade strict consistency for scalability.
Allows rapid development, stores a large volume of data with no structure.

Database Indexes

If database search performance is poor, we can create indexes to improve it. The goal of creating an index on a particular table is to make searching through that table faster.

Indexes improve read performance but decrease write performance, because every write must also update the index. Indexes also increase storage and memory usage. If your database is read-intensive, indexes are a good strategy. Avoid adding unnecessary indexes if the database is write-intensive.
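
One way to see what an index changes is to ask the database for its query plan before and after creating one. The sketch below uses SQLite through the sqlite3 module in the Python standard library; the users table, the idx_users_email index, and the helper names are invented for the example. The exact wording of the plan varies between SQLite versions, so the code prints it rather than relying on a particular string.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's plan for a query; the last column is the readable detail."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def plans_before_and_after_index():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO users (email, name) VALUES (?, ?)",
        [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
    )
    conn.commit()

    lookup = "SELECT name FROM users WHERE email = ?"
    params = ("user42@example.com",)

    before = query_plan(conn, lookup, params)  # no index: the table is scanned
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    after = query_plan(conn, lookup, params)   # the planner can now use the index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("before:", before)
    print("after: ", after)
```

The price is paid on the write side: once the index exists, every insert, and every update that touches email, has to maintain idx_users_email as well as the table.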

Proxies

A proxy server is a piece of software or hardware that acts as an intermediary for requests from clients seeking resources from other servers. Proxies are used to filter requests, log requests, and sometimes transform requests. In addition, a proxy server cache can serve many requests without contacting the origin server.

Open Proxy

An open proxy server is accessible by any internet user. As a result, any internet user is able to use the proxy for forwarding the requests.

Reverse Proxy

A reverse proxy retrieves resources on behalf of the client from one or more servers. These resources are then returned to the client as if they originated from the proxy itself.
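
The routing, logging, and caching roles of a proxy can be sketched without any networking. In the toy model below, backends are plain Python functions and the ReverseProxy class is hypothetical; a real reverse proxy would speak HTTP, honor cache headers, and cache only responses that are safe to reuse.

```python
from typing import Callable, Dict, List

Backend = Callable[[str], str]  # takes a request path, returns a response body


class ReverseProxy:
    def __init__(self, routes: Dict[str, Backend]):
        self.routes = routes              # path prefix -> backend
        self.cache: Dict[str, str] = {}   # path -> cached response
        self.log: List[str] = []          # every path that was requested

    def handle(self, path: str) -> str:
        self.log.append(path)
        if path in self.cache:
            return self.cache[path]       # served without touching a backend
        # The longest matching prefix wins, so "/api/" beats "/".
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                response = self.routes[prefix](path)
                self.cache[path] = response
                return response
        return "404 Not Found"


if __name__ == "__main__":
    proxy = ReverseProxy({
        "/api/": lambda path: "api server handled " + path,
        "/": lambda path: "static server handled " + path,
    })
    print(proxy.handle("/api/users"))
    print(proxy.handle("/index.html"))
    print(proxy.handle("/api/users"))  # second request is answered from the cache
```

The client only ever talks to the proxy and never learns which backend produced the response.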

CAP Theorem

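In any distributed system, you cannot achieve all three of consistency, availability, and partition tolerance at the same time.

The CAP theorem states that you can only get two out of these three properties. In practice, network partitions cannot be ruled out, so the real choice during a partition is between consistency and availability.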

Consistency – All nodes see the same data at the same time.

Availability – Every request gets a response, whether it succeeds or fails.

Partition Tolerance – A partition-tolerant system continues to operate despite any amount of network failure that does not result in a failure of the entire network. Data replication across nodes helps to keep the system up.
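
The trade-off shows up as soon as the network between two replicas breaks. The toy model below is only an illustration under strong assumptions: two in-memory replicas, a flag that simulates the partition, and invented names (TwoNodeStore, Unavailable). In "CP" mode the store refuses requests during a partition to stay consistent; in "AP" mode it keeps answering and lets the replicas diverge.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


class TwoNodeStore:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = {"a": None, "b": None}  # the value held by each replica
        self.partitioned = False             # True means the replicas cannot talk

    def write(self, node: str, value) -> None:
        if not self.partitioned:
            for name in self.nodes:          # healthy network: replicate everywhere
                self.nodes[name] = value
        elif self.mode == "CP":
            raise Unavailable("cannot replicate during a partition")
        else:
            self.nodes[node] = value         # AP: accept locally, replicas diverge

    def read(self, node: str):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.nodes[node]


if __name__ == "__main__":
    store = TwoNodeStore("AP")
    store.write("a", "v1")
    store.partitioned = True
    store.write("a", "v2")
    print(store.read("a"), store.read("b"))  # the two replicas now disagree
```

Real systems are more nuanced. A consistent system usually keeps serving from the side of the partition that holds a majority of replicas, which a two-node toy cannot show.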

Consistent Hashing

Consistent hashing is a mechanism for distributing data across a cluster in a way that minimizes reorganization when nodes are added or removed. With consistent hashing, resizing the hash table remaps only about k/n keys on average, where k is the total number of keys and n is the number of nodes. With traditional modulo-based hashing, by contrast, nearly all keys would be remapped.
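
A common way to implement this is a hash ring: nodes and keys are hashed onto the same circle, and each key belongs to the first node found clockwise from its position. The sketch below is one possible version, not the only design. The HashRing class, the use of SHA-256, and the 100 virtual nodes per server are all choices made for this example.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a 64-bit ring position (the same result on every run)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes    # virtual nodes per server, to even out the load
        self._positions = []    # sorted ring positions
        self._owner = {}        # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            position = stable_hash(f"{node}#{i}")
            bisect.insort(self._positions, position)
            self._owner[position] = node

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("the ring has no nodes")
        # First ring position clockwise from the key, wrapping around at the end.
        index = bisect.bisect(self._positions, stable_hash(key)) % len(self._positions)
        return self._owner[self._positions[index]]


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10_000)]
    ring = HashRing(["node-a", "node-b", "node-c"])
    before = {key: ring.get_node(key) for key in keys}
    ring.add_node("node-d")
    moved = [key for key in keys if ring.get_node(key) != before[key]]
    print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that changes owner moves to the newly added node, and the rest stay where they were. With four nodes, about a quarter of the keys should move.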

Conclusion

In conclusion, knowing these fundamentals of distributed systems can immensely help a developer when writing code or designing a system. Study these fundamentals, and also consider learning about domain-driven design.

