Apache Kafka for Modern Distributed Systems

With the growing popularity of digital services, many companies have to handle millions or even billions of requests per day. Depending on the product, those requests come from third-party services calling their APIs or from human users of their online services. Operating at this scale forces software companies to abandon the traditional centralized approach and move to distributed systems instead.

 

Challenges of Distributed Systems

Distributed systems pose many challenges. They involve many instances of different applications or services, running on different machines and communicating with each other over the network.

Network failures and network partitions are only part of the problem.

In a typical modern distributed system, multiple services call each other to complete a task. This creates a dangerous synchronous dependency along the chain of services: any one of them can go down at any moment, and when it does, every request that passes through it fails.
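
To put a rough number on this risk, here is a minimal sketch. It assumes that each service in the chain fails independently and that a request succeeds only if every service is up. The function name and the 99.9% figure are illustrative assumptions, not measurements.

```python
import math


def chain_availability(availabilities):
    """Probability that every service in a synchronous chain is up at once.

    Assumes failures are independent, which real systems only approximate.
    """
    return math.prod(availabilities)


# Five services, each hypothetically available 99.9% of the time.
combined = chain_availability([0.999] * 5)
print(f"{combined:.4f}")
```

Under these assumptions the chain as a whole is less available than any single service in it, and the gap widens with every extra hop.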


There are many reasons why an application may go down in a distributed system.

One of the reasons is pure probability. Even the most reliable hardware breaks, and the more computers we have, the higher the probability that some component will break at any given moment. Software failures due to exceptions or bugs are even more common. And even aside from failures, some downtime is necessary to roll out a new release or upgrade the operating system. This is where Apache Kafka comes in.
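
The probability argument above can be made concrete with a small calculation. This is a minimal sketch, assuming every machine fails independently with the same probability during some time window. The 0.1% figure and the function name are made up for illustration.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n machines fails.

    Assumes independent failures with identical probability p per machine.
    """
    return 1 - (1 - p) ** n


# Watch the risk grow as the fleet gets larger.
for n in (1, 100, 1000):
    print(n, round(prob_any_failure(0.001, n), 3))
```

Even with a very small per-machine failure probability, a failure somewhere in a large fleet becomes more likely than not.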

Introduction to Apache Kafka

Apache Kafka is a distributed message broker that allows services to exchange messages without the need for both services to be online and available at the moment of the exchange. Apache Kafka can sit in between services and act as a virtual buffer. You can think of it as a distributed message queue on steroids.
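
The buffering idea can be illustrated with a toy in-memory model. This is only a sketch of the concept, not Kafka's API: the ToyLog class and its methods are hypothetical names, and a real broker persists messages and replicates them across machines.

```python
class ToyLog:
    """A toy append-only message log standing in for a broker (hypothetical)."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        # The producer only needs the log to be available, not the consumer.
        self._messages.append(message)

    def read_from(self, offset):
        # Messages are retained, so a consumer can catch up from any offset.
        return self._messages[offset:]


log = ToyLog()

# The consumer is offline while the producer keeps publishing.
log.publish("order-created")
log.publish("order-paid")

# The consumer comes back later and resumes from the next offset it needs.
next_offset = 0
backlog = log.read_from(next_offset)
print(backlog)
```

The producer never waits for the consumer here, which is the decoupling that removes the synchronous dependency between the two services.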


The abstraction that Kafka uses for publishing messages is called a topic. Services can publish messages to a Kafka topic and consume messages from a Kafka topic.


Each topic can be further broken down into partitions. Messages within a single partition of a topic are ordered, but there is no global order of messages across the topic as a whole.
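
One common way to take advantage of per-partition ordering is to route all messages that share a key to the same partition. The sketch below models that idea in memory. The partition count, the event data, and the choose_partition helper are assumptions for illustration, and CRC32 is only a stand-in, since Kafka's own default partitioner uses a different hash function.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # illustrative value


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition with a stable hash (stand-in: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
]

# Append each event to the partition selected by its key.
for key, value in events:
    partitions[choose_partition(key)].append((key, value))

# All events for user-1 sit in one partition, in the order they were sent.
user1 = [v for k, v in partitions[choose_partition("user-1")] if k == "user-1"]
print(user1)
```

Events for different keys may land in different partitions, and nothing in this model says how they are ordered relative to each other. That is the lack of a global order described above.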


Also, unlike direct HTTP requests, which have a single sender and a single receiver, messages published to Kafka can be broadcast to multiple consumers, following the common pub/sub pattern.
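
The broadcast behavior can be sketched with a toy model in which every group of readers keeps its own position in the topic. In Kafka, such independent readers are organized into consumer groups. The ToyTopic class, its methods, and the group names below are hypothetical and exist only for this illustration.

```python
class ToyTopic:
    """Toy single-partition topic with per-group read positions (hypothetical)."""

    def __init__(self):
        self._messages = []
        self._offsets = {}  # group name -> index of the next message to read

    def publish(self, message):
        self._messages.append(message)

    def poll(self, group):
        # Each group tracks its own position, so reading never removes
        # messages for anyone else.
        start = self._offsets.get(group, 0)
        batch = self._messages[start:]
        self._offsets[group] = len(self._messages)
        return batch


topic = ToyTopic()
topic.publish("payment-received")

# Two independent groups each receive the same message.
billing = topic.poll("billing")
analytics = topic.poll("analytics")
print(billing, analytics)
```

A second poll by the same group returns nothing new, while a group that joins later still sees the message.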

Benefits of Apache Kafka for Distributed Systems

To ensure reliable delivery, high availability, and scalability, Apache Kafka is designed from the ground up as a distributed system in its own right.

It uses the same state-of-the-art techniques for building fault-tolerant, high-performance applications and distributed databases that power the top e-commerce, search, video-on-demand, and ride-sharing companies, as well as cloud services.

Reliability

The reliability of Apache Kafka is highly configurable. It can be used to deliver mission-critical financial events, such as money transactions, with strong guarantees on delivery semantics. But it can also be used to deliver logs and metrics, which require far less reliability, and in that case it can be configured to save on operational overhead.
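
As a rough illustration of that trade-off, the two dictionaries below contrast producer settings for the two kinds of workload. The keys are standard Kafka producer configuration names. The specific values and the variable names are assumptions for this sketch, and the right choice depends on the cluster and the workload.

```python
# Stronger delivery guarantees for critical events (illustrative values).
critical_events_config = {
    "acks": "all",                # wait for all in-sync replicas to acknowledge
    "enable.idempotence": True,   # avoid duplicates when the producer retries
}

# Cheaper, best-effort delivery for logs and metrics (illustrative values).
logs_and_metrics_config = {
    "acks": "0",                  # do not wait for any broker acknowledgement
    "compression.type": "gzip",   # trade CPU for less network and disk usage
    "linger.ms": 50,              # batch messages for up to 50 ms before sending
}
```

How these settings are passed to a producer depends on the client library in use.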

Scalability and Performance

Apache Kafka can scale to 10 GBps of throughput with under 10 ms of latency, which is one of the reasons 80% of Fortune 100 companies use it in production. And because each Kafka topic is partitioned, Kafka supports high concurrency for both producers and consumers.

Integration with Client Code Base

Apache Kafka also has client libraries for languages such as Java, C/C++, .NET, Python, Go, and many others. This lets software teams develop services in the language of their choice and integrate easily with Kafka, both for publishing messages and for consuming them.

If you want to learn how Kafka achieves its high availability and scalability so you can apply the same techniques to your own system, check out Distributed Systems & Cloud Computing with Java. In this course you will learn the fundamentals of modern distributed systems and cloud computing, as well as the practical application of Apache Kafka for your business. The course is targeted at Java developers and covers a variety of open source technologies such as Apache ZooKeeper, distributed MongoDB, HAProxy, and many others.
