Streaming in Practice - Putting Apache Kafka in Production


1

Streaming in Practice: Putting Apache Kafka in Production

Roger Hoover, Engineer, Confluent

2

Apache Kafka: Online Talk Series
Part 1: September 27, Part 2: October 6, Part 3: October 27, Part 4: November 17, Part 5: December 1, Part 6: December 15
• Introduction To Streaming Data and Stream Processing with Apache Kafka
• Deep Dive into Apache Kafka
• Demystifying Stream Processing with Apache Kafka
• Data Integration with Apache Kafka
• A Practical Guide to Selecting a Stream Processing Technology
• Streaming in Practice: Putting Apache Kafka in Production

https://www.confluent.io/apache-kafka-talk-series/

3

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

4

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

5

6

Architecture
[Diagram: producers and consumers connect to a Kafka cluster of brokers (broker 1, broker 2, … broker n) hosting topic partitions, coordinated by a ZooKeeper cluster running on servers 1–3.]

7

Operations
• Simple deployment
• Rolling upgrades
• Good metrics for component monitoring

8

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

9

Two Example Apps
• User activity tracking
  • Collect page view events while users are browsing our web and mobile storefronts
  • Persist the data to HDFS for subsequent use in a recommendation engine
• Inventory adjustments
  • Track sales, maintain inventory, and re-order on demand

10

Application Priorities
• User activity tracking
  • High throughput (100x the sales stream)
  • Availability is most important
  • Low retention required – 3 days
• Inventory adjustments
  • Relatively low throughput
  • Durability is most important
  • Long retention required – 6 months

11

Knobs
- Partition count
- Replication factor
- Retention
- Batching + compression
- Producer send acknowledgements
- Minimum ISRs
- Unclean leader election

12

Partition Count
- Partitions are the unit of consumer parallelism
- Over-partition your topics (especially keyed topics)
- It is easy to add consumers, but hard to add partitions to keyed topics
- Kafka can support on the order of tens of thousands of partitions

13

Partition Count
- High throughput (User activity tracking)
  - Large number of partitions (~100)
- Fewer resources (Inventory adjustments)
  - Smaller number of partitions (< 50)
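As a sketch of how these partition counts might be applied, the snippet below creates the two example topics with the Java AdminClient. The AdminClient API shipped after the 0.10.x releases this talk targets (0.11+); on older clusters the kafka-topics.sh script does the same job. The topic names, exact partition counts, replication factors, and broker address are illustrative assumptions, not values from the talk.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateExampleTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // High-throughput activity stream: many partitions for consumer parallelism
            NewTopic pageViews = new NewTopic("page-views", 100, (short) 2);          // hypothetical names/values
            // Lower-throughput, durable stream: fewer partitions, higher replication
            NewTopic inventory = new NewTopic("inventory-adjustments", 25, (short) 3);
            admin.createTopics(Arrays.asList(pageViews, inventory)).all().get();      // block until created
        }
    }
}
```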

14

Replication Factor
- More replicas require more storage, disk I/O, and network bandwidth
- More replicas can tolerate more failures
[Diagram: four brokers, with replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2 spread across their logs.]

15

Replication Factor
- Lower cost (User activity tracking)
  - replication.factor = 2
- High fault tolerance (Inventory adjustments)
  - replication.factor = 3
- Defaults to 1

16

Retention
- Retention time can be set per topic
- Longer retention times require more storage (imagine that!)
- Longer retention allows consumers to rewind further back in time
  - Part of the consumer's SLA!

17

Retention
- Less storage (User activity tracking)
  - log.retention.hours=72 (3 days)
- Longer time travel (Inventory adjustments)
  - log.retention.hours=4380 (6 months)
- Default is 7 days
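To make the per-topic setting concrete, here is a minimal sketch that applies the 6-month retention to the inventory topic as a topic-level override. Note that the topic-level property is retention.ms, while log.retention.hours is the broker-wide default. The sketch uses the AdminClient alterConfigs call from 0.11+ clients (on a 0.10.x cluster the kafka-configs.sh tool plays the same role); the topic name and broker address are assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetTopicRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Per-topic override: 6 months (4380 hours) expressed in milliseconds
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "inventory-adjustments");
            Config retention = new Config(Collections.singleton(
                new ConfigEntry("retention.ms", String.valueOf(4380L * 60 * 60 * 1000))));
            admin.alterConfigs(Collections.singletonMap(topic, retention)).all().get();
        }
    }
}
```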

18

Side-note: Time Travel
- Kafka 0.10.1 supports rewinding by time
  - E.g. "Rewind to 10 minutes ago"
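A minimal sketch of this rewind using the consumer API that shipped with 0.10.1: offsetsForTimes looks up the earliest offset at or after a timestamp, and seek repositions the consumer there. The topic, partition, group id, and broker address are placeholder assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class RewindTenMinutes {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("group.id", "rewind-demo");              // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("page-views", 0);   // hypothetical topic/partition
            consumer.assign(Collections.singletonList(tp));

            // Ask the broker for the first offset whose timestamp is >= "10 minutes ago"
            long tenMinutesAgo = System.currentTimeMillis() - 10 * 60 * 1000L;
            Map<TopicPartition, Long> query = new HashMap<>();
            query.put(tp, tenMinutesAgo);
            Map<TopicPartition, OffsetAndTimestamp> result = consumer.offsetsForTimes(query);

            OffsetAndTimestamp oat = result.get(tp);
            if (oat != null) {
                consumer.seek(tp, oat.offset());   // rewind; the next poll() re-reads from here
            }
        }
    }
}
```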

19

Batching & Compression
- Producer: batch.size, linger.ms, compression.type
- Consumer: fetch.min.bytes, fetch.wait.max.ms
[Diagram: the producer groups send() calls into compressed batches and flushes them asynchronously to the broker; the consumer fetches the same compressed batches via poll().]

20

Batching & Compression
- High throughput (User activity tracking)
  - Producer: compression.type=lz4, batch.size (256KB), linger.ms (~10ms) or flush manually
  - Consumer: fetch.min.bytes (256KB), fetch.wait.max.ms (~10ms)
- Low latency (Inventory adjustments)
  - Producer: linger.ms=0
  - Consumer: fetch.min.bytes=1
- Defaults
  - compression.type = none
  - linger.ms = 0 (i.e. send immediately)
  - fetch.min.bytes = 1 (i.e. receive immediately)
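The settings above map directly onto client configuration. Below is a rough sketch of a throughput-oriented producer and consumer; note that the new Java consumer spells the fetch wait setting fetch.max.wait.ms. The broker address, group id, serializers, and exact sizes are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;

public class ThroughputTuning {
    public static void main(String[] args) {
        // Producer tuned for throughput: compress and batch before sending
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("compression.type", "lz4");
        p.put("batch.size", 262144);        // 256 KB batches
        p.put("linger.ms", 10);             // wait up to ~10 ms to fill a batch
        KafkaProducer<String, String> producer = new KafkaProducer<>(p);

        // Consumer tuned for throughput: let the broker accumulate data before responding
        Properties c = new Properties();
        c.put("bootstrap.servers", "broker1:9092");
        c.put("group.id", "activity-tracking");       // hypothetical consumer group
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("fetch.min.bytes", 262144);   // wait for at least 256 KB per fetch...
        c.put("fetch.max.wait.ms", 10);     // ...or ~10 ms, whichever comes first
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);

        producer.close();
        consumer.close();
    }
}
```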

21

Producer Acknowledgements on Send
[Diagram: (1) the producer sends a record to the leader replica on broker 1; (2) the follower replicas on brokers 2 and 3 fetch it; (3) the leader commits the record; (4) the leader acknowledges the producer.]
When the producer receives the ack determines latency and durability on failures:
- acks=0 (no ack): no network delay, but some data loss on failures
- acks=1 (wait for leader): 1 network roundtrip, a little data loss possible on failures
- acks=all (wait for committed): 2 network roundtrips, no data loss on failures

22

Producer Acknowledgements on Send
- Throughput++ (User activity tracking)
  - acks = 1
- Durability++ (Inventory adjustments)
  - acks = all
- Default
  - acks = 1
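As a sketch of how the acks setting is used in practice, the producer below sends with acks=all and a callback, so a failed or unacknowledged write surfaces as an exception instead of being silently dropped. The topic, key, value, and broker address are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");   // wait until the write is committed to the in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("inventory-adjustments", "sku-123", "-2");   // hypothetical topic/key/value
            // The callback fires once the broker has (or has not) acknowledged the write,
            // so durability problems show up here rather than going unnoticed.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();   // e.g. retry, alert, or divert to a dead-letter store
                } else {
                    System.out.printf("committed to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }   // close() flushes any in-flight sends
    }
}
```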

23

In-Sync Replicas (ISRs)
[Diagram: the producer writes messages m1 and m2 to the leader for topic1-part1 on broker 1, and the followers on brokers 2 and 3 replicate them; all three replicas are in the ISR and the last committed message is m2.]
- In-sync: the replica has read up to the leader's log end within replica.lag.time.max.ms

24

Minimum In-Sync Replicas
[Diagram: the leader for topic1-part1 on broker 1 has messages m1–m5, but the followers have only replicated m1 and m2, so the ISR has shrunk and the last committed message is still m2.]
- Topic config to tell Kafka how to handle writes during severe outages (rare)
- The leader will reject writes if the ISR count is too small, e.g. topic1: min.insync.replicas=2

25

Minimum In-Sync Replicas
- Availability++ (User activity tracking)
  - min.insync.replicas = 1
- Durability++ (Inventory adjustments)
  - min.insync.replicas = 2
- Defaults to 1

26

Unclean Leader Election
- Topic config to tell Kafka how to handle topic leadership during severe outages (rare)
- Allows automatic recovery in exchange for losing data
[Diagram: the leader for topic1-part1 on broker 1 fails while holding messages the followers have not yet replicated; if an out-of-sync replica is elected as the new leader, the unreplicated messages (e.g. m5) are lost.]

27

Unclean Leader Election
- Availability++ (User activity tracking)
  - unclean.leader.election.enable = true
- Durability++ (Inventory adjustments)
  - unclean.leader.election.enable = false
- Defaults to true

28

Mission Critical Data
- Producer acknowledgements
  - acks = all
- Replication factor
  - replication.factor = 3
- Minimum ISRs
  - min.insync.replicas = 2
- Unclean leader election
  - unclean.leader.election.enable = false
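Putting these settings together, here is a hedged sketch of creating a "mission critical" topic: replication factor 3 at creation time, with min.insync.replicas=2 and unclean leader election disabled as topic-level overrides. It again uses the AdminClient from 0.11+ clients; the topic name, partition count, and broker address are assumptions. Producers writing to this topic would set acks=all, as in the earlier example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class MissionCriticalTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address

        Map<String, String> configs = new HashMap<>();
        configs.put("min.insync.replicas", "2");                  // leader rejects writes below 2 in-sync replicas
        configs.put("unclean.leader.election.enable", "false");   // never elect an out-of-sync leader

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("inventory-adjustments", 25, (short) 3)   // replication factor 3
                .configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```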

29

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

30

Replica Placement
• Partitions are replicated
• Replicas are spread evenly across the cluster
• Placement happens only when the topic is created or modified
[Diagram: four brokers with replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2 spread evenly across their logs.]

31

Replica Placement
• Over time, broker load and storage become unbalanced
• Initial replica placement does not account for topic throughput or retention
• Adding or removing brokers
[Diagram: a fifth broker has been added to the cluster but holds no replicas, while brokers 1–4 still carry all of the partitions.]

32

Replica Reassignment
• Create a plan to rebalance replicas
• Upload the new assignment to the cluster
• Kafka migrates replicas without disruption
[Diagrams, before and after: before the reassignment, brokers 1–4 hold all of the replicas and broker 5 is empty; afterwards, the replicas are spread across all five brokers.]

33

Data Balancing: Tricky Parts
• Creating a good plan
  • Balance broker disk space
  • Balance broker load
  • Minimize data movement
  • Preserve rack placement
• Movement of replicas can overload I/O and bandwidth resources
  • Use the replication quota feature in 0.10.1

34

Data Balancing: Solutions
• DIY
  • kafka-reassign-partitions.sh script in Apache Kafka
• Confluent Enterprise Auto Data Balancing
  • Optimizes storage utilization
  • Rack awareness and minimal data movement
  • Leverages replication quotas during rebalance

35

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

36

Use Cases
• Disaster recovery
• Replicate data out to geo-localized data centers
• Aggregate data from other data centers for analysis
• Part of a hybrid cloud or cloud migration strategy

37

Multi-DC: Two Approaches
• Stretched cluster
• Mirroring across clusters

38

Stretched Cluster
• Low-latency links between 3 DCs; typically AZs in a single AWS region
• Applications in all 3 DCs share the same cluster and handle failures automatically
• Relies on intra-cluster replication to copy data across DCs (replication.factor >= 3)
• Use rack awareness in Kafka 0.10; manual partition placement otherwise
[Diagram: a single Kafka cluster stretched across three availability zones in one AWS region, with producers and consumers in each AZ.]

39

Mirroring Across Clusters
• Separate Kafka clusters in each DC; a mirroring process copies data between them
• Several variations of this pattern; some require manual intervention on failover and recovery

40

How to Mirror Across Clusters
• MirrorMaker tool in Apache Kafka
  • Manual topic creation
  • Manual sync of topic configuration
• Confluent Enterprise Multi-DC
  • Dynamic topic creation at the destination
  • Automatic sync of topic configurations (including access controls)
  • Can be configured and managed from the Control Center UI
  • Leverages the Connect API

41

More Information: Tuning Tradeoffs
• Apache Kafka and Confluent documentation
• "When it Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka" – Gwen Shapira and Jeff Holoman
  • https://www.confluent.io/kafka-summit-2016-ops-when-it-absolutely-positively-has-to-be-there/
• Kafka: The Definitive Guide – Neha Narkhede, Gwen Shapira, Todd Palino
  • Chapter 6: Reliability Guarantees
• Confluent Operations Training

42

More Information: Multi-DC
• "Building Large Scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka" – Jun Rao
  • Video: https://www.youtube.com/watch?v=XcvHmqmh16g
  • Slides: http://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka
• Confluent Enterprise Multi-DC – https://www.confluent.io/product/multi-datacenter/

43

More Information: Metadata Management
• "Yes, Virginia, You Really Do Need a Schema Registry" – Gwen Shapira
  • https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/

44

Thank you!
www.kafka-summit.org
• May 8, 2017 – New York City, Hilton Midtown
• August 28, 2017 – San Francisco, Hilton Union Square