Tuning Kafka Pipelines

October 7, 2017

Sumant Tambe

Sr. Software Engineer, Streams Infra, LinkedIn

My background

• Blogger: Coditation (Elegant Code for Big Data)

• Author (wikibook)

• Open-source contributor

• Visual Studio and Dev Tech

• Reviewer

Tuning Truly Global Production Kafka Pipelines

[Figure: a Hadoop data source pushes into the Kafka Venice Feed cluster on the East Coast. Four Mirror-Maker clusters (to east-coast, gulf-coast, west-coast, and Asia) replicate the feed into Kafka Venice clusters in the East Coast, Gulf Coast, West Coast, and Asia datacenters, each serving local Venice consumers.]

But first, some basics…

• Kafka: a distributed messaging system rethought as a distributed commit log

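[Figure: Producers 1 and 2 write to topic T in a Kafka cluster of two brokers (Broker 1 and Broker 2). Each partition is stored as a log and replicated across brokers. Consumer group A (consumers A1 and A2) and consumer group B (consumer B1) read from the topic.]

Topic T has 2 partitions, P0 and P1. P0’ and P1’ are replicas of P0 and P1.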

Moving Data Is Critical in Internet Companies

(Image Credit: Kafka Online Documentation)

Kafka Pipeline

• Why Kafka-based pipelines?

  • Producer/Consumer Throughput and Time Decoupling

  • Large, Reliable, Durable buffer

  • Data replication for high availability of data

[Figure: Producer → Source Kafka Cluster → Kafka Mirror-Maker Cluster → Destination Kafka Cluster → Consumer]

The main value Kafka provides to data pipelines is its ability to serve as a very large, reliable buffer between various stages in the pipeline, effectively decoupling producers and consumers of data within the pipeline.

Anatomy of a Kafka Pipeline

(Image Credit: Kafka Definitive Guide, O’Reilly)

Aspects of Kafka Pipelines

• Reliability and Availability

• Replication Topologies (Structure)

• Time Decoupling

• Durability

• Throughput

• Latency

• Data Integration and Schemas

• Transformations

• Fair Load Distribution

• Migration/Upgrades

• Topic Lifecycle Management

• DDoS Prevention and Quotas

• Auditing

Reliability and Availability

• Must avoid single points of failure

• Allow fast and automatic recovery

• Most systems need an at-least-once delivery guarantee

  • Do not lose data

  • But be ready for duplicates
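An at-least-once pipeline can hand the same record to a consumer more than once. This can happen after a producer retry, or after a consumer restarts from an older committed offset. One common way to be ready for duplicates is to make the consumer's processing idempotent. The sketch below assumes every record carries a unique, application-assigned `record_id` (a hypothetical field; Kafka does not add one for you) and skips IDs it has already applied:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    record_id: str  # HYPOTHETICAL application-assigned unique ID, not a Kafka field
    value: str


@dataclass
class IdempotentSink:
    """Applies each record at most once, even if the pipeline delivers it twice."""
    seen: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def process(self, record: Record) -> bool:
        if record.record_id in self.seen:
            return False  # duplicate from an at-least-once redelivery: skip it
        self.seen.add(record.record_id)
        self.applied.append(record.value)
        return True


sink = IdempotentSink()
deliveries = [Record("a", "x"), Record("b", "y"), Record("a", "x")]  # "a" is redelivered
results = [sink.process(r) for r in deliveries]
print(results, sink.applied)  # [True, True, False] ['x', 'y']
```

The in-memory `seen` set grows without bound. A real consumer would need to persist or expire it, and this sketch does not attempt either.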

Replication Topologies

Hub and Spoke Architecture (Image Credit: Kafka Definitive Guide, O’Reilly)

[Figure: five Kafka clusters, each serving local apps, connected in a hub-and-spoke layout]

Crossbar Architecture (LinkedIn)

There are many more replication topologies.

Each arrow is a Mirror-Maker cluster.

Kafka Pipelines in Industrial IoT

Coditation [link]

[Figure: Kafka pipelines carrying industrial IoT telemetry. Dotted lines and shaded shapes mean passive replication.]

https://coditation.wordpress.com/2017/05/27/kafka-in-industrial-iot/

Durability (no-loss data pipeline)

• Durability interacts with throughput and latency

• Durability levels change depending upon producer configurations

Producer config    Throughput   Latency   Durability                 Ordered
acks=0             High         Low       No guarantee               Yes
acks=1             Medium       Medium    Leader                     Yes
acks=all (-1)      Low          High      In-sync replicas (ISR)     Yes
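The Durability column can be read as "how many replicas hold the record when the producer gets its acknowledgement". The small function below spells that out. It is a hypothetical helper, not part of any Kafka client, and `isr_size` stands for the partition's current number of in-sync replicas:

```python
def replicas_needed_for_ack(acks: str, isr_size: int) -> int:
    """Replicas that must hold a record before the producer is acknowledged (sketch)."""
    if isr_size < 1:
        raise ValueError("a partition needs at least one in-sync replica (the leader)")
    if acks == "0":
        return 0          # fire-and-forget: the producer does not wait for any broker
    if acks == "1":
        return 1          # only the partition leader has written the record
    if acks in ("all", "-1"):
        return isr_size   # every replica currently in the ISR has the record
    raise ValueError(f"unknown acks value: {acks!r}")


for acks in ("0", "1", "all"):
    print(acks, replicas_needed_for_ack(acks, isr_size=3))
```

The ISR can shrink, for example when a follower falls behind. As a result, acks=all is only as strong as the ISR is large at the moment of the write.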


Throughput

• Producer and consumer throughputs are decoupled

• Add/remove producers and consumers independently

• Throughput scales with cluster size

• Increase parallelization by increasing partitions

• Throughput also depends on co-location

  • Remote consume throughput is much greater than remote produce

  • Consumers can batch much more data in a fetch response than producers can in a produce request

[Figure: a Kafka Mirror-Maker cluster replicating from a source Kafka cluster in Datacenter 1 to a destination Kafka cluster in Datacenter 2, contrasting remote produce with remote consume.]

Configurations For Tuning Throughput [link]


• Producer configurations: batch.size, linger.ms, compression.type, acks, max.in.flight.requests.per.connection, send.buffer.bytes (also TCP buffers)

• Kafka broker configurations: num.replica.fetchers, replica.fetch.max.bytes, socket.receive.buffer.bytes; disable inter-broker SSL

• KMM configurations: all producer and consumer configs are applicable; consumer-to-producer ratio

• Consumer configurations: increase # of topic partitions; fetch.message.max.bytes, fetch.min.bytes

https://kafka.apache.org/documentation/#configuration
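The producer-side knobs above map directly onto standard Kafka producer configuration keys. As a sketch, this deck's eventual Mirror-Maker overrides can be collected as a plain dictionary: 1 MB batch.size, 10 MB send.buffer.bytes, one request in flight to preserve order, and acks=all. The linger.ms and compression.type values are illustrative assumptions, and no Kafka client library is imported here:

```python
MB = 1024 * 1024


def mirror_maker_producer_config(preserve_order: bool = True) -> dict:
    """Producer overrides in the spirit of this deck, keyed by standard Kafka producer config names."""
    return {
        "acks": "all",                 # wait for all in-sync replicas (durability)
        "batch.size": 1 * MB,          # raised from 100K to 1 MB in this deck
        "send.buffer.bytes": 10 * MB,  # raised from 128K to 10 MB (TCP send buffer)
        "linger.ms": 100,              # ASSUMPTION: illustrative value, not from the deck
        "compression.type": "gzip",    # ASSUMPTION: illustrative codec, not from the deck
        # One request in flight keeps ordering under retries; 5 is Kafka's default.
        "max.in.flight.requests.per.connection": 1 if preserve_order else 5,
    }


config = mirror_maker_producer_config()
for key, value in sorted(config.items()):
    print(f"{key}={value}")
```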

Latency

• Typical latency: a few hundred milliseconds

• The latency SLA depends on the availability SLA

  • One 60-minute outage in a week is 99.4% availability (assuming a weekly report)

  • One 1-minute outage in a week is 99.99% availability (assuming a weekly report)

• But the SLA can be fragile

  • Large Mirror-Maker clusters can take minutes to rebalance

  • Maintenance of Mirror-Maker clusters can take several minutes

  • Bounce a Mirror-Maker cluster with 100% concurrency (to avoid repeated rebalances)

• Configurations that affect pipeline latency

  • Producer linger.ms and acks

  • Topic replication factor
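The availability figures above follow from a one-week window of 7 × 24 × 60 = 10,080 minutes. The snippet below gives a quick check. It also shows the inverse calculation: how much downtime a given target allows per week.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in the weekly reporting window


def weekly_availability(downtime_minutes: float) -> float:
    """Fraction of the week during which the pipeline was up."""
    return 1 - downtime_minutes / MINUTES_PER_WEEK


def allowed_weekly_downtime(target: float) -> float:
    """Minutes of downtime per week that still meet an availability target."""
    return (1 - target) * MINUTES_PER_WEEK


print(f"{weekly_availability(60):.2%}")  # 99.40%
print(f"{weekly_availability(1):.2%}")   # 99.99%
print(f"{allowed_weekly_downtime(0.999):.2f} minutes for 99.9%")
```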

Data Integration and Schemas

• Kafka is schema agnostic

• But applications must be protected from backward-incompatible schema changes

• Schema registry

• Data integration should support schema evolution

  • Only backward-compatible schema evolution

  • But bend the rules if/when needed

  • Single topic with multiple schemas

  • Propagate schema changes automatically through the pipeline
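Backward compatibility means that consumers using the new schema can still read data written with the old one. A schema registry typically enforces a rule of that kind before it accepts a new schema version. The sketch below uses a deliberately simplified, hypothetical schema model that maps each field name to a type and a has-default flag. It applies an Avro-style rule:

- shared fields keep their type;
- added fields must have defaults;
- removed fields are allowed.

Real registries also handle type promotion, unions, and other compatibility modes that are not modelled here.

```python
from typing import Dict, Tuple

# HYPOTHETICAL simplified schema model: field name -> (type name, has_default)
Schema = Dict[str, Tuple[str, bool]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """True if readers using `new` can decode data written with `old` (simplified rule)."""
    for name, (new_type, has_default) in new.items():
        if name in old:
            if old[name][0] != new_type:
                return False  # type changed (no promotion rules modelled here)
        elif not has_default:
            return False      # added field has no default to fill in for old data
    return True               # fields dropped from `new` are simply ignored when reading


v1 = {"id": ("long", False), "name": ("string", False)}
v2 = {**v1, "email": ("string", True)}             # adds a field with a default
v3 = {"id": ("long", False), "zip": ("string", False)}  # adds a field without a default
print(is_backward_compatible(v1, v2), is_backward_compatible(v1, v3))  # True False
```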

Transformations

• Extract-Transform-Load (ETL)

  • Thick pipeline (with significant processing logic)

  • Complex

  • Potentially inflexible

• Extract-Load-Transform (ELT)

  • Thin, minimal pipeline

  • Flexible

  • Repeated computations

  • Pipelines (brokers and Mirror-Makers) remain schema agnostic (and hence easy to manage)

Fair Load Distribution

• Ideally, each Kafka Mirror-Maker shares the burden equally

• But

  • When brokers go up or down, partition leadership can become imbalanced because preferred leader election is not run

  • Imbalanced partitions and changes in partition leadership may cause KMM to exceed quotas

  • Remedy: move partitions manually
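Kafka treats the first replica in a partition's replica list as its preferred leader. After a broker bounce, leadership stays on the surviving replicas until a preferred leader election runs, which is how the imbalance above builds up. The sketch below flags partitions that are not led by their preferred replica and counts leaders per broker. It uses a hypothetical two-broker assignment:

```python
from collections import Counter
from typing import Dict, List


def leadership_report(assignment: Dict[int, List[int]], leaders: Dict[int, int]):
    """Return (partitions not led by their preferred replica, leader count per broker)."""
    not_preferred = sorted(p for p, replicas in assignment.items() if leaders[p] != replicas[0])
    per_broker = Counter(leaders.values())
    return not_preferred, per_broker


# HYPOTHETICAL example: partition -> replica list (first entry is the preferred leader)
assignment = {0: [1, 2], 1: [2, 1], 2: [1, 2], 3: [2, 1]}
leaders = {0: 1, 1: 1, 2: 1, 3: 1}  # broker 2 bounced, so broker 1 now leads everything
not_preferred, per_broker = leadership_report(assignment, leaders)
print(not_preferred, dict(per_broker))  # [1, 3] {1: 4}
```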

Migration/Upgrades

• Upgrading hardware for brokers

  • More cores

  • More memory

  • Faster NIC

• If you reduce the number of brokers

  • Must increase quotas

  • Increase num.replica.fetchers

  • Increase replica.fetch.response.max.bytes
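Kafka enforces client byte-rate quotas on each broker separately. The same Mirror-Maker traffic concentrated on fewer brokers therefore needs a higher per-broker quota. The rough sizing sketch below assumes that load spreads evenly across brokers. The `headroom` factor and the traffic numbers are illustrative assumptions:

```python
def per_broker_quota_needed(total_bytes_per_sec: float, num_brokers: int,
                            headroom: float = 1.2) -> float:
    """Per-broker byte-rate quota that lets `total_bytes_per_sec` through.

    ASSUMPTIONS: traffic spreads evenly over brokers; `headroom` is an arbitrary
    safety factor, not a Kafka setting.
    """
    if num_brokers <= 0:
        raise ValueError("num_brokers must be positive")
    return headroom * total_bytes_per_sec / num_brokers


total = 400e6  # HYPOTHETICAL: 400 MB/s of mirrored traffic
for brokers in (8, 4):
    print(f"{brokers} brokers -> {per_broker_quota_needed(total, brokers) / 1e6:.0f} MB/s per broker")
```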

Topic Lifecycle Management

• Topic creation

  • The topic should be created in the destination cluster first

  • If not, Mirror-Maker will start replicating the topic and may fail to produce (or a topic with default configs gets created)

• Topic deletion

  • The topic should be deleted in the source cluster first

  • But only when no one is producing or consuming

  • If the topic is deleted in the source cluster while Mirror-Maker is still mirroring it, Mirror-Maker's metadata refresh will cause it to be recreated with default configs
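The ordering rules above can be captured as two small planning helpers. The cluster names and step strings are hypothetical. The sketch only encodes the order of operations and does not use any Kafka admin API:

```python
from typing import List


def topic_creation_plan(topic: str, source: str, destinations: List[str]) -> List[str]:
    """Create in every destination before the source, so Mirror-Maker never
    mirrors into a missing (or auto-created, default-config) topic."""
    return [f"create {topic} in {d}" for d in destinations] + [f"create {topic} in {source}"]


def topic_deletion_plan(topic: str, source: str, destinations: List[str]) -> List[str]:
    """Delete in the source first, and only once producers and consumers have stopped."""
    return ([f"verify no producers/consumers on {topic}", f"delete {topic} in {source}"]
            + [f"delete {topic} in {d}" for d in destinations])


# HYPOTHETICAL cluster names
for step in topic_creation_plan("t", "east-feed", ["west", "asia"]):
    print(step)
```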

DDoS Prevention and Quotas

• The Hadoop-to-Kafka pipeline is easily DDoSed

  • 800+ mappers in some cases

  • Should use reducers instead

• Quotas on incoming byte rate

• The byte rate may be low, but the request rate also matters

  • Request-rate throttling is available in Kafka 0.11

  • Mirror-Makers batch very well, so request-rate throttling is not necessarily needed for them
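Two back-of-the-envelope numbers sit behind this slide. First, a producer typically keeps one connection to each broker it writes to, so hundreds of mappers that all write to every partition multiply connections. Second, the request rate is the byte rate divided by the average request size, so small batches mean many more requests for the same bytes. The counts below are illustrative assumptions, including four brokers as in this deck's later setup and the chosen request sizes:

```python
MB = 1024 * 1024


def producer_connections(num_producers: int, brokers_per_producer: int) -> int:
    """Approximate client connections: one per producer per broker it writes to."""
    return num_producers * brokers_per_producer


def request_rate(bytes_per_sec: float, avg_request_bytes: float) -> float:
    """Produce requests per second needed to carry a given byte rate."""
    return bytes_per_sec / avg_request_bytes


print(producer_connections(840, 4))       # mappers writing to all 4 brokers
print(request_rate(100 * MB, 16 * 1024))  # 100 MB/s in 16 KB requests
print(request_rate(100 * MB, 1 * MB))     # the same bytes in 1 MB requests
```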

Back To Tuning Global Kafka Pipelines

Global PROD Kafka Pipelines for Venice

[Figure: the same topology. A Hadoop data source pushes into the Kafka Venice Feed cluster on the East Coast, and four Kafka Mirror-Maker (MM) clusters (to east-coast, gulf-coast, west-coast, and Asia) replicate it into Kafka Venice clusters serving Venice consumers in each datacenter. Two of the links are marked "Low throughput".]

The Slow Throughput Problem (One Topic Experiment)

[Chart: replication times, with labels 22 min and 38 min]

Replication to West Coast = 54 min

Replication to Asia = 180 min

CPU Utilization on Slow Mirror-Makers

[Charts: To Asia (this one was the slowest); To West Coast (slower)]

CPU Utilization on the Best Mirror-Makers

[Chart: To Gulf Coast (best)]

Mirror-Makers      Average CPU util (aggregate)   Max CPU util (aggregate)
To Gulf Coast      96%                            165%
To East Coast      104%                           165%
To West Coast      40%                            90%
To Asia            16%                            60%

Setup

• Producer setup

  • 100 GB of data in each push from Hadoop

  • 840 mappers producing data

• Kafka broker setup

  • 4 large brokers, 32 cores and 256 GB RAM each

  • Broker replication over SSL

  • Topic replication factor = 3

  • Producer acks = -1 (all)

  • Partitions = 200

• Mirror-Maker setup

  • 4 independent groups

  • 10 processes in each cluster

  • 8 consumers in each process

  • 80 consumers in each pipeline

  • It’s CPU bound (due to decompression)

High Ping Latency

• From the East Coast:

East Coast    Gulf Coast    West Coast    Asia
0.025 ms      29 ms         67 ms         236 ms
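A back-of-the-envelope model shows why these round-trip times matter. Suppose a producer connection may have only one request in flight, as this deck later notes is required to preserve order. It can then send at most one batch per round trip, so its throughput is bounded by batch size divided by RTT. The sketch ignores transfer time, broker-side replication, and compression, so real throughput is lower. It also assumes batch.size=100K means 100 × 1024 bytes:

```python
KB = 1024
MB = 1024 * KB


def max_throughput_one_in_flight(batch_bytes: int, rtt_ms: float, in_flight: int = 1) -> float:
    """Upper bound in bytes/sec for one producer connection.

    At most `in_flight` batches can be outstanding per round trip; transfer time,
    acks=all replication and compression are ignored, so this is only a ceiling.
    """
    return in_flight * batch_bytes / (rtt_ms / 1000)


for dest, rtt in [("Gulf Coast", 29), ("West Coast", 67), ("Asia", 236)]:
    for batch in (100 * KB, 1 * MB):
        mb_s = max_throughput_one_in_flight(batch, rtt) / MB
        print(f"{dest:<10} batch={batch // KB:>5} KB  <= {mb_s:6.2f} MB/s")
```

Under this model, at 236 ms a 100K batch caps one connection at roughly 0.41 MB/s, and a 1 MB batch at roughly 4.2 MB/s.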

Textbook Solution

• Don’t remote produce. Prefer remote consume and local produce

• Increase max.in.flight.requests.per.connection above 1

[Figure: the Venice topology (Hadoop data source, Kafka Venice Feed on the East Coast, Kafka Venice clusters and Venice consumers in the East Coast, Gulf Coast, West Coast, and Asia), with the Kafka MM clusters to west-coast and Asia drawn separately from those to east-coast and gulf-coast.]

The Textbook Solution Was Not Practical (at the Moment)

• Must guarantee order (max.in.flight.requests.per.connection must be 1)

• Must open ACLs (firewall ports) for incoming remote connections. Takes time.

• Must have hardware capacity in the destination datacenter

Key Observations and Remedies

• High ping latency

  • From the East Coast

• Four source brokers

  • 150+ under-replicated partitions (URP)

  • 840 mappers (producers) is simply way too many → replaced by reducers

  • SSL has overhead → disable inter-broker SSL

  • Imbalanced response time

    • Unequal workload on the brokers → should do manual replica movement to spread load evenly

• Kafka Mirror-Maker

  • Under-provisioned machines: 4 cores only → must change to 8 cores

  • 200 partitions and 80 consumers → 2 or 3 partitions per consumer → each consumer talks to at most 3 brokers → inefficient fetch → must increase the number of partitions

  • Producer batch.size=100K → must increase batch size (1 MB max is allowed)

  • Producer send.buffer.bytes=128K → must increase send.buffer.bytes (10 MB)

  • Just 1 producer per process → at most one request in flight at a time → can’t change that because order must be preserved
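The "2 or 3 partitions per consumer" figure follows from dividing 200 partitions among 80 consumers. The simple round-robin sketch below (not Kafka's own assignor code) makes it concrete. It also shows how more partitions give each consumer more to fetch:

```python
from collections import Counter


def round_robin_assign(num_partitions: int, num_consumers: int) -> dict:
    """Assign partitions 0..num_partitions-1 to consumers in round-robin order."""
    assignment = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        assignment[p % num_consumers].append(p)
    return assignment


assignment = round_robin_assign(200, 80)
sizes = Counter(len(ps) for ps in assignment.values())
print(dict(sizes))  # {3: 40, 2: 40}

more = round_robin_assign(800, 80)  # HYPOTHETICAL larger partition count
print({len(ps) for ps in more.values()})  # {10}
```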


The Solution That Saved the Day (Week)

• Remote produce

• max.in.flight.requests.per.connection = 1

• Increased batch.size to 1 MB and send.buffer.bytes to 10 MB

  • But there was a bug: the producer estimated compressed batch sizes incorrectly

  • It sent batches larger than 1 MB to the broker

  • Sporadic REQUEST_TOO_LARGE exceptions shut down KMM

• Disabled compression estimation

  • Pack a batch up to 1 MB, compress, and send

  • Resulting compressed batches are up to 650K (30% unutilized)
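The workaround on this slide is to stop estimating compressed sizes. Instead, fill a batch up to the limit using uncompressed bytes, then compress it. The sketch below shows that packing order, with zlib standing in for whatever codec the pipeline actually uses (an assumption). It does not reproduce any of the Kafka producer's internals:

```python
import zlib
from typing import List

MB = 1024 * 1024


def pack_then_compress(records: List[bytes], max_uncompressed: int) -> List[bytes]:
    """Group records so each batch is at most `max_uncompressed` bytes before
    compression (a single oversized record gets its own batch), then compress."""
    batches, current, current_size = [], [], 0
    for rec in records:
        if current and current_size + len(rec) > max_uncompressed:
            batches.append(zlib.compress(b"".join(current)))
            current, current_size = [], 0
        current.append(rec)
        current_size += len(rec)
    if current:
        batches.append(zlib.compress(b"".join(current)))
    return batches


records = [b"venice-record-%06d;" % i for i in range(100_000)]  # HYPOTHETICAL payloads
batches = pack_then_compress(records, max_uncompressed=1 * MB)
print(len(batches), [len(b) for b in batches])
```

The limit is applied before compression, so compressible data comes out well under it, which matches the unused space noted above. For incompressible data a codec can add a few bytes of overhead, so a strict limit would still need a final size check.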

A Well-behaved Global Kafka Pipeline (One Topic)

23 minutes (SLA = 30 mins)

Well-Behaved KMM CPU Utilization

[Charts: To West Coast; To Asia; To Gulf Coast; To East Coast]

Acknowledgements

• Kafka Dev and SRE Team, LinkedIn

• Venice Team, LinkedIn

• More reading on the LinkedIn Engineering Blog

  • Kafka articles

  • Venice articles

https://engineering.linkedin.com/blog/topic/kafka

https://engineering.linkedin.com/blog/topic/venice

Thank You!
