Apache Kafka is a distributed streaming platform: a real-time, fault-tolerant, highly scalable messaging system. It is designed to handle streams of records and operates on a publish-subscribe model. Kafka offers three core capabilities: publishing and subscribing to streams of records, storing streams of records in a fault-tolerant way, and processing streams of records as they occur. Together, these are the building blocks of real-time streaming applications.

Kafka is used for two broad classes of applications: building real-time streaming data pipelines that reliably get data between systems or applications, and building real-time streaming applications that transform or react to the streams of data.

Kafka runs as a cluster on one or more servers, and the cluster stores streams of records in categories called topics. Each record consists of a key, a value, and a timestamp. Kafka's fault tolerance comes from replicating each topic's log partitions across multiple servers, so data is not lost if a server fails.
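To make the topic/record model concrete, here is a minimal in-memory sketch. The `Topic` and `Record` names are hypothetical, not the real Kafka API; the point is that a topic is a set of partitions, each an append-only log, and that records with the same key land in the same partition (mirroring Kafka's default key-hash partitioner).

```python
from dataclasses import dataclass
import time

@dataclass
class Record:
    key: str
    value: str
    timestamp: float

class Topic:
    """Toy model of a Kafka topic: a fixed set of append-only logs (partitions)."""
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(Record(key, value, time.time()))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

orders = Topic("orders")
p1, o1 = orders.append("user-1", "created")
p2, o2 = orders.append("user-1", "paid")
assert p1 == p2 and o2 == o1 + 1  # same key: same partition, increasing offset
```

Real Kafka additionally replicates each partition across brokers; this sketch only models the single-copy log structure.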

Kafka's clients include producers, which publish records to topics, and consumers, which subscribe to topics and process their records. This simple yet powerful architecture has made Kafka a popular tool for handling real-time data feeds.
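The producer/consumer relationship can be sketched as follows. This is a toy model with hypothetical names, not the Kafka client API: the key idea it shows is that each consumer tracks its own offset into the log, so independent consumers read the same records at their own pace without interfering with one another.

```python
log = []  # one partition's append-only log, as held by the broker

def produce(record):
    """Publish a record by appending it to the log."""
    log.append(record)

class Consumer:
    """Each consumer keeps its own read position (offset) in the log."""
    def __init__(self):
        self.offset = 0

    def poll(self):
        # Return everything published since this consumer's last poll.
        records = log[self.offset:]
        self.offset = len(log)
        return records

produce("a")
produce("b")
fast, slow = Consumer(), Consumer()
assert fast.poll() == ["a", "b"]
produce("c")
assert fast.poll() == ["c"]
assert slow.poll() == ["a", "b", "c"]  # a late consumer still sees everything
```

Because consumption only advances a per-consumer offset rather than removing records, Kafka can serve the same stream to many independent subscribers, which is what distinguishes it from a traditional work queue.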

Go to source article: http://kafka.apache.org/