Introduction to Kafka

Introduction to Kafka BY DUCAS FRANCIS


Page 1: Introduction to Kafka

Introduction to Kafka BY DUCAS FRANCIS

Page 2: Introduction to Kafka

The problem

[Diagram: Web, Mobile, API, Job; Security System, Real-time Monitoring, Logging System, Other services]

It’s simple enough at first…

Then it gets a little busy…

And ends up a mess.

Page 3: Introduction to Kafka

The solution

[Diagram: Web, Mobile, API, Job; Pub/Sub; Security System, Real-time Monitoring, Logging System, Other services]

Decouple data pipelines using a pub/sub system

Producers Brokers Consumers

Page 4: Introduction to Kafka

Apache Kafka

A UNIFIED, HIGH-THROUGHPUT, LOW-LATENCY PLATFORM FOR HANDLING REAL-TIME DATA FEEDS

Page 5: Introduction to Kafka

A brief history lesson

Originally developed at LinkedIn and open-sourced in 2011
Graduated from the Apache Incubator in 2012
Engineers from LinkedIn formed Confluent in 2014
Up to version 0.9, with 0.10 on the horizon

Page 6: Introduction to Kafka

Motivation

Unified platform for all real-time data feeds
High throughput for high volume streams
Support periodic data loads from offline systems
Low latency for traditional messaging
Support partitioned, distributed, real-time processing
Guarantee fault-tolerance

Page 7: Introduction to Kafka

Common use cases

Messaging
Website activity tracking
Metrics
Log aggregation
Stream processing
Event sourcing
Commit log

Page 8: Introduction to Kafka

Benefits of Kafka

High throughput
Low latency
Load balancing
Fault tolerant
Guaranteed delivery
Secure

Page 9: Introduction to Kafka

Performance comparison

Page 10: Introduction to Kafka

Batch performance comparison

Page 11: Introduction to Kafka

Some terminology

Topic – feed of messages
Producer – publishes messages to a topic
Consumer – subscribes to topics and processes the feed of messages
Broker – server instance that runs as part of a cluster

Page 12: Introduction to Kafka

@apachekafka

powers @

microsot…

Page 13: Introduction to Kafka

Libraries

Python – kafka-python / pykafka
Go – sarama / go_kafka_client / …
C/C++ – librdkafka / libkafka / …
.NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka
Node.js – kafka-node / sutoiku/node-kafka / ...
HTTP – kafka-pixy / kafka-rest

etc.

Page 14: Introduction to Kafka

Architecture

[Diagram: Producer, Producer; Cluster: Broker, Broker, Broker; Zookeeper; Consumer, Consumer; label "x3"]

Page 15: Introduction to Kafka

Show me the Kafka!!! VAGRANT TO THE RESCUE

Page 16: Introduction to Kafka

Anatomy of a topic

Topics are broken into partitions
Messages are assigned a sequential ID called an offset
Data is retained for a configurable period of time
Number of partitions can be increased after creation, but not decreased
Partitions are assigned to brokers

Each partition is an ordered, immutable sequence of messages that is continually appended to…a commit log.
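The "commit log" idea above can be pictured with a minimal Python sketch. It is a toy in-memory model, not how a broker stores data; `PartitionLog`, `append` and `read` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of payloads."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        """Append a message and return the sequential offset it was given."""
        self.messages.append(payload)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Return (offset, payload) pairs from `offset` onwards, in log order."""
        chunk = self.messages[offset:offset + max_messages]
        return [(offset + i, payload) for i, payload in enumerate(chunk)]


log = PartitionLog()
first = log.append(b"signup")   # first message gets offset 0
second = log.append(b"login")   # next message gets offset 1
records = log.read(0)           # reading does not remove anything
```

Nothing is ever rewritten in place: new messages only go on the end, and a reader chooses where to start by offset.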

Page 17: Introduction to Kafka

Broker

Kafka service running as part of a cluster
Receives messages from producers and serves them to consumers
Coordinated using Zookeeper
    The Zookeeper ensemble needs an odd number of nodes for quorum
Store messages on the file system
Replicate messages to/from other brokers
Answer metadata requests about brokers and topics/partitions
As of 0.9.0 – coordinate consumers

Page 18: Introduction to Kafka

Replication

Partitions on a topic should be replicated
Each partition has 1 leader and 0 or more followers
An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too far behind the leader
Replication factor can be increased after creation, not decreased

Page 19: Introduction to Kafka

./kafka-topics --create --replication-factor --partitions

./kafka-topics --describe

Page 20: Introduction to Kafka

Producers

Publishes messages to a topic
Distributes messages across partitions
    Round-robin
    Key hashing
Send synchronously or asynchronously to the broker that is the leader for the partition
ACKS = 0 (none), 1 (leader), -1 (all ISRs)
Synchronous is obviously slower, but more durable
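The two distribution strategies above can be sketched in a few lines of Python. This is an illustration only: `make_partitioner` and `choose_partition` are hypothetical names, and CRC32 is a stand-in hash (real client libraries use their own hash functions).

```python
import itertools
import zlib
from typing import Callable, Iterator, Optional


def make_partitioner(num_partitions: int) -> Callable[[Optional[bytes]], int]:
    """Build a partition chooser: round-robin without a key, hashing with one."""
    counter: Iterator[int] = itertools.count()

    def choose_partition(key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly across partitions in turn.
            return next(counter) % num_partitions
        # Key present: the same key always maps to the same partition.
        return zlib.crc32(key) % num_partitions

    return choose_partition


choose = make_partitioner(4)
unkeyed = [choose(None) for _ in range(6)]       # cycles through 0, 1, 2, 3, 0, 1
keyed = [choose(b"user-42") for _ in range(3)]   # one partition, repeated
```

Key hashing is what keeps all messages for one key in a single partition, which matters once ordering is considered.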

Page 21: Introduction to Kafka

Testing... Testing… 1 2 3

LET’S SEE HOW FAST WE CAN PUSH

Page 22: Introduction to Kafka

Consumers

Read messages from a topic
Multiple consumers can read from the same topic
Manage their offsets
Messages stay on Kafka after they are consumed

Page 23: Introduction to Kafka

Testing... Testing… 1 2 3

LET’S SEE HOW FAST WE CAN RECEIVE

Page 24: Introduction to Kafka

It’s fast! But why…?

Efficient protocol based on message set
    Batching messages to reduce network latency and small I/O operations
    Append/chunk messages to increase consumer throughput
Optimised OS operations
    pagecache
    sendfile()
Broker services consumers from cache where possible
End-to-end batch compression
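A rough way to see why compressing a whole batch helps is to compare it with compressing each message on its own. The sketch below uses Python's `zlib` as a stand-in codec and made-up JSON payloads; it says nothing about Kafka's wire format.

```python
import zlib

# 100 similar messages, as activity-tracking events often are (made-up payloads).
messages = [
    ('{"event": "page_view", "user": %d, "path": "/home"}' % i).encode()
    for i in range(100)
]

# Compress each message separately: repeated structure cannot be shared.
individually = sum(len(zlib.compress(m)) for m in messages)

# Compress the batch once: repetition across messages is exploited.
batch_blob = zlib.compress(b"\n".join(messages))
as_batch = len(batch_blob)

# The batch can stay compressed until a consumer unpacks it.
restored = zlib.decompress(batch_blob).split(b"\n")
```

With payloads like these, `as_batch` comes out well below `individually`, because the shared field names are only encoded once.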

Page 25: Introduction to Kafka

Load balanced consumers

Distribute load across instances in a group by allocating partitions
Handle failure by rebalancing partitions to other instances
Commit their offsets to Kafka

[Diagram: Cluster with Broker 1 and Broker 2 holding partitions P0, P1, P2, P3; Consumer Group 1 (C0, C1); Consumer Group 2 (C2, C3, C4, C6)]
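Partition allocation and rebalancing can be sketched as follows. The round-robin rule and the `assign` function are assumptions made for illustration; Kafka's own assignment strategies may hand out partitions differently.

```python
from typing import Dict, List


def assign(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Spread partitions round-robin over the live consumers of one group.

    Assumes at least one consumer is alive.
    """
    assignment: Dict[str, List[str]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_1 = assign(partitions, ["C0", "C1"])              # two partitions each
group_2 = assign(partitions, ["C2", "C3", "C4", "C6"])  # one partition each
after_failure = assign(partitions, ["C0"])              # C1 gone: C0 takes everything
```

Each group receives every partition exactly once, so the two groups consume the topic independently while the instances inside a group share the load.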

Page 26: Introduction to Kafka

Consumer groups and offsets

[Diagram: Cluster with Broker 1 and Broker 2 holding partitions P0, P1, P2, P3; Consumer Group 1 (C0, C1); partition P3 drawn as offsets 0 to 10 with markers for C1 read, C1 commit, C0 read and C0 commit]
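The read and commit positions can be modelled with a small Python sketch. `poll`, `commit` and the in-memory `committed` table are hypothetical; the only Kafka behaviour assumed is that a committed offset marks where the group resumes reading.

```python
from typing import Dict, List, Tuple

log: List[str] = ["m%d" % i for i in range(11)]   # offsets 0..10
committed: Dict[Tuple[str, str], int] = {}        # (group, partition) -> next offset to read


def poll(group: str, partition: str, max_messages: int) -> List[Tuple[int, str]]:
    """Read from the group's committed position without moving it."""
    start = committed.get((group, partition), 0)
    end = min(start + max_messages, len(log))
    return [(offset, log[offset]) for offset in range(start, end)]


def commit(group: str, partition: str, last_processed: int) -> None:
    """Record progress: the group will resume just after `last_processed`."""
    committed[(group, partition)] = last_processed + 1


batch = poll("group-1", "P3", 4)            # reads offsets 0..3
commit("group-1", "P3", batch[-1][0])       # progress is now durable
resumed = poll("group-1", "P3", 4)          # a restarted consumer continues at 4
again = poll("group-1", "P3", 4)            # no commit in between: same messages again
other_group = poll("group-2", "P3", 2)      # other groups track their own offsets
```

Reading ahead of the commit is why a consumer that fails between read and commit can see the same messages again after a rebalance.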

Page 27: Introduction to Kafka

Guarantees

Messages sent by a producer to a particular topic’s partition will be appended in the order they are sent

A consumer instance sees messages in the order they are stored in the log

For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log

Page 28: Introduction to Kafka

Ordered delivery

Messages are guaranteed to be delivered in order by partition, NOT topic

P0: M1 M3 M5

P1: M2 M4 M6

M1 before M3 before M5 – YES
M1 before M2 – NO
M2 before M4 before M6 – YES
M2 before M3 – NO
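The YES/NO answers above can be checked with a small Python model of the two partitions. `ordered_within_partition` is a hypothetical helper, and alternating messages between P0 and P1 simply mirrors the layout on the slide.

```python
from typing import Dict, List

# Odd-numbered messages land on P0, even-numbered ones on P1.
partitions: Dict[str, List[str]] = {"P0": [], "P1": []}
for n in range(1, 7):
    target = "P0" if n % 2 else "P1"
    partitions[target].append("M%d" % n)


def ordered_within_partition(first: str, second: str) -> bool:
    """True only if both messages share a partition and `first` precedes `second`."""
    for log in partitions.values():
        if first in log and second in log:
            return log.index(first) < log.index(second)
    return False  # different partitions: no ordering guarantee


m1_before_m3 = ordered_within_partition("M1", "M3")   # same partition
m1_before_m2 = ordered_within_partition("M1", "M2")   # different partitions
```

If global ordering matters for a set of messages, giving them the same key so they share a partition is the usual way to get it.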

Page 29: Introduction to Kafka

Enough ALT… now .NET USING RDKAFKA-DOTNET

Page 30: Introduction to Kafka

FIN. THANK YOU

Page 32: Introduction to Kafka

Log compaction

Keep the most recent payload for a key
Use cases
    Database change subscription
    Event sourcing
    Journaling for HA
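"Keep the most recent payload for a key" can be sketched in Python as a single pass over a log. The `compact` function and the sample records are made up for illustration; a real broker compacts in the background, which this toy version does not attempt to model.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, payload)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offsets and log order."""
    latest: Dict[str, int] = {}
    for offset, key, _payload in log:
        latest[key] = offset          # later records overwrite earlier ones
    return [rec for rec in log if latest[rec[1]] == rec[0]]


log = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example"),
    (4, "user-2", "bob@new.example"),
]
compacted = compact(log)   # offsets 2, 3 and 4 survive
```

A consumer replaying the compacted log still ends up with the current value for every key, which is what makes it usable for database change subscription.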

Page 33: Introduction to Kafka

Log compaction