Kafka to the Maxka - (Kafka Performance Tuning)
Uploaded by dataworks-summit (Category: Technology)
KAFKA TO THE MAXKA By Matt Andruff
Kafka Performance Tuning
Welcome!
Matt Andruff - Hortonworks Practice lead @ Yoppworks
@MattAndruff
Because I get asked a lot...Yoppworks
Performance Tuning...
Agenda
• Performance tuning: just some quick points
• What you can change
  • Simple changes
  • Kafka configuration changes
• Brief canned demo (beware: Kafka settings are not exciting for everyone)
• Architectural changes
Performance Tuning
What do you need to make changes?
Performance tuning
There is no magic bullet. Guesses are just guesses. Empirical fact requires testing.
It requires hardware, SMEs, time, and effort.
Performance testing is non-trivial.
Performance tuning
The better your load tests are, the better your tuning will be. Garbage in, garbage out.
Every client is different: each has a unique signature of data, hardware, and topics.
Tune for bottlenecks found through testing. Yes, there is always some low-hanging fruit.
Beyond Tuning
What your boss understands: (chart)
What you understand: (chart)
First a minor detour to the OS
I promise to move fast but it can’t be ignored.
To be complete we need to cover some of the basics.
Which OS to use?
The basics
● noatime
  ○ removes last-access-time updates from files
  ○ saves a write on every read
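If the Kafka data volume has its own filesystem, noatime can be set as a mount option. This /etc/fstab entry is only a sketch; the device, mount point, and filesystem type are assumptions:

```text
# /etc/fstab - illustrative entry; device, mount point, and fs type are assumptions
/dev/sdb1  /var/kafka-data  xfs  defaults,noatime  0  2
```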
The basics
● ext4 is widely in use
● XFS has shown better performance metrics
https://kafka.apache.org/documentation.html#filesystems
JVM settings
export KAFKA_JVM_PERFORMANCE_OPTS='...'
Java 1.8:
-Xmx6g -Xms6g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
Java 1.7 (beware of older versions):
-Xms4g -Xmx4g -XX:PermSize=48m -XX:MaxPermSize=48m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
The basics
● File descriptor limits
  ○ Per broker: partitions * segments + overhead
    ■ Watch this when you upgrade to 0.10
● Set vm.swappiness = 0
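Both of these live at the OS level; here is a sketch of where they would be set. The limit values and the `kafka` user name are illustrative assumptions, not recommendations:

```text
# /etc/sysctl.d/99-kafka.conf
vm.swappiness=0

# /etc/security/limits.conf - raise the open-file limit for the broker's user
kafka  soft  nofile  128000
kafka  hard  nofile  128000
```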
The basics
● Kafka data should be on its own disks
● If you encounter read/write issues, add more disks
● Each data folder you add to the config will be written to in round robin
Latest is the Greatest
● Have you upgraded to 0.10?
● Adds 8 bytes of timestamp per message
  ○ Not great for small messages
● The broker no longer decompresses messages
  ○ Better performance when you use compression
● File descriptor limits
  ○ Segment indexing changed
Defaults are your friends
The default when you drive is to put on your seatbelt. If you are going to change that default and not wear a seatbelt, I hope you have thought through your choice.
Kafka's defaults are set up to help keep you safe. If you are going to change a default to something else, I hope you have thought through your choice.
The Producer
Default Example
Acks:
• acks=0: no acknowledgment from the server at all ("set it and forget it"). Risk of data loss: highest. Performance: highest.
• acks=1: the leader has completed the write of the data. Risk of data loss: medium. Performance: medium.
• acks=all: the leader and all in-sync followers have written the data. Risk of data loss: lowest. Performance: lowest.
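The acks tradeoff above maps onto a single producer setting; a hedged producer.properties fragment (the choice of acks=all here is only for illustration):

```properties
# producer.properties
# acks=0   : no server acknowledgment  - fastest, highest data-loss risk
# acks=1   : leader write completes    - middle ground
# acks=all : leader plus ISR followers - safest, slowest
acks=all
```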
Definitions:
Latency: The length of time for one message to be processed.
Throughput: The number of messages processed per unit of time.
Batch:
• "Message 1" - Time 1 ← worst latency
• "Message 2" - Time 2
• "Message 3" - Time 3 ← best latency
Batch Management
(Diagram: the producer fills per-partition batches, e.g. "Batch - Partition 1 - TopicA" and "Batch - Partition 1 - TopicB", with records; the broker appends each batch to a segment of the matching partition.)
Batch Management
batch.size - the maximum size of a batch, in bytes.
linger.ms - the maximum amount of time to wait before sending a batch that is not yet full.
Other send triggers:
- a request to the same broker is going out anyway (piggy-back)
- flush() or close() is called
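The two main triggers can be sketched in plain Python (this is an illustration of the rule, not the Kafka client's actual code); a batch is sent when it reaches batch.size bytes or when linger.ms elapses, whichever comes first:

```python
# Sketch of the producer's batching decision: send when the batch is full
# (batch.size bytes) or when it has waited linger.ms, whichever comes first.
BATCH_SIZE = 16384   # default batch.size, in bytes
LINGER_MS = 5        # illustrative linger.ms, not a recommendation

def should_send(batch_bytes: int, waited_ms: float) -> bool:
    return batch_bytes >= BATCH_SIZE or waited_ms >= LINGER_MS

assert should_send(16384, 0)     # full batch sends immediately
assert should_send(100, 5)       # linger expired: send a partial batch
assert not should_send(100, 1)   # neither trigger yet: keep accumulating
```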
Batch Management
(Diagram: each batch is shipped to the broker and written to a segment of its own partition; "Partition 1 - TopicA" and "Partition 1 - TopicB" each get their own segment.)
Batch Management
Default Message size is 2048 (If linger.ms is large)
Buffer.memory / Batch.size > Message size
33554432 / 16384 > 2048
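One way to read the slide's arithmetic, assuming the default producer settings (buffer.memory = 33554432, batch.size = 16384) and the slide's 2048-byte message size:

```python
buffer_memory = 33554432   # default buffer.memory (32 MiB)
batch_size = 16384         # default batch.size (16 KiB)
message_size = 2048        # the slide's assumed message size

batches_in_buffer = buffer_memory // batch_size
messages_per_batch = batch_size // message_size
assert batches_in_buffer == 2048   # the buffer holds 2048 full batches
assert messages_per_batch == 8     # each full batch holds 8 such messages
```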
Batch Management
(Diagram: with a large linger.ms, a full batch for "Partition 1 - TopicA" is sent to the broker and written to the partition's segment.)
Batch Management
Default Message size is 2048 (If linger.ms is small)
Buffer.memory / Batch.size > Message size
33554432 / (< 16384) > (>2048)
Batch Management
(Diagram: with a small linger.ms, linger triggers the send before the batch is full, and larger messages fill a batch with fewer records; "Partition 1 - TopicA" and "Partition 1 - TopicB" each receive partially filled batches.)
Batch Management
Tune your batch.size / linger.ms:
batch.size + linger.ms => latency + throughput
Once tuned, do not forget to size your buffer.memory.
Compression
compression.type = none (the default)
Compression can improve performance by moving less data over the network, at the cost of additional CPU.
Generalization: use snappy. ****** You should do real performance tests.
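The network-versus-CPU tradeoff is easy to see with any codec. Snappy is not in the Python standard library, so this sketch uses gzip (which Kafka also supports via compression.type=gzip); the payload is a made-up, repetitive record:

```python
import gzip

# Repetitive JSON-ish records compress well; the payload is illustrative.
payload = b'{"sensor": "s1", "reading": 42.0}' * 200
compressed = gzip.compress(payload)

# Less data crosses the network, paid for with CPU time on both ends.
assert len(compressed) < len(payload)
```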
Batch Management - Producer
(Diagram: records pass through the serializer and partitioner before being grouped into per-partition batches such as "Batch - Partition 1 - TopicA".)
Did we stick with the Defaults?
Custom classes written for performance?
● Partitioner: create a custom key based on the data to help prevent skew
● Serializer: pluggable
● Interceptors: allow manipulation of records going into Kafka. Are they being used? Should they be? How are they written?
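A custom partitioner usually boils down to a stable hash of a well-chosen key, modulo the partition count. This hypothetical sketch uses md5 purely as an illustrative stand-in (the Java client's default partitioner uses murmur2):

```python
import hashlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    # Stable hash of the key, mod the partition count. A well-chosen key
    # spreads records evenly and helps prevent skew.
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p = pick_partition(b"sensor-17", 6)
assert 0 <= p < 6
assert p == pick_partition(b"sensor-17", 6)  # same key -> same partition
```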
Tuning
To tune performance you need to experiment with different settings. Data and throughput are different on every project. There is no one-size-fits-all.
Luckily there is a tool to help test configurations.
bin/kafka-run-class.sh \
  org.apache.kafka.clients.tools.ProducerPerformance \
  test 50000000 100 -1 acks=1 \
  bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 \
  buffer.memory=67108864 batch.size=8196
Or use the shortcut:
bin/kafka-producer-perf-test.sh \
  test 50000000 100 -1 acks=1 \
  bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 \
  buffer.memory=67108864 batch.size=8196
There is also one for the consumer:
bin/kafka-consumer-perf-test.sh
Time for a quick walkthrough
Monitoring
Ops Clarity: now owned by Lightbend; the Cadillac of monitoring.
Burrow: a little resource-heavy (a Kafka client per partition); the health monitor has some false positives.
Yahoo Kafka-manager
Confluent Control Center: part of the Confluent distro.
Roll your own: Kafka JMX & MBeans.
Where did they get the name Kafka?
My guess:
Putting Apache Kafka to Use for Event Streams, https://www.youtube.com/watch?v=el-SqcZLZlI ~ Jay Kreps
"I thought that since Kafka was a system optimized for writing using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project." ~ Jay Kreps
https://www.quora.com/What-is-the-relation-between-Kafka-the-writer-and-Apache-Kafka-the-distributed-messaging-system
The Broker
Broker Disk Usage
● What is your rate of growth, and when will you need to expand?
● Try to make sure the number of partitions you select covers that growth.
Broker Disk Usage
● log.retention.bytes: default is unlimited (-1)
● log.retention.[time interval]: default is 7 days (168 hours)
Broker
● num.io.threads: default is 8; should match the number of physical disks.
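The broker settings from the last two slides, gathered into one server.properties sketch (the values shown are the stated defaults, not tuning advice):

```properties
# server.properties - defaults from the slides
log.retention.bytes=-1     # unlimited
log.retention.hours=168    # 7 days
num.io.threads=8           # ideally matches the number of physical disks
```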
Beyond Tuning
How do we optimize writing: (diagram)
Measure the throughput: (diagram)
The Consumer
replica.high.watermark.checkpoint.interval.ms: you might think the high-water mark only ensures reliability; it also has implications for performance.
- Watch out for consumer lag.
Beyond Tuning
A consumer's future ability to scale is constrained by the number of partitions.
Beyond Tuning
A greater # of partitions means:
> a greater level of parallelism
> more open files: (partitions * segment count * replication) / brokers ~= # of open files per machine; tens of thousands of files is manageable on appropriate hardware
> more memory usage (broker and ZooKeeper)
> longer leader failover time (can be mitigated by an increased # of brokers)
Beyond Tuning
How do I calculate the number of partitions to have on a broker? What's the rule of thumb to start testing at?
[# partitions per broker] = c x [# brokers] x [replication factor]
c ~ your machine's awesomeness
c ~ your appetite for risk
c ~ 100 is a good, safe starting point
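The rule of thumb above, together with the earlier open-files estimate, as a small Python sketch; the example cluster numbers (3 brokers, replication 3, 10 segments per partition) are hypothetical:

```python
def partitions_per_broker(c: int, brokers: int, replication: int) -> int:
    # The slide's rule of thumb; c ~ 100 is the suggested safe starting point.
    return c * brokers * replication

def open_files_per_machine(partitions: int, segments: int,
                           replication: int, brokers: int) -> int:
    # (partitions * segment count * replication) / brokers
    return partitions * segments * replication // brokers

assert partitions_per_broker(100, 3, 3) == 900
# e.g. 900 partitions with 10 segments each, replication 3, over 3 brokers:
assert open_files_per_machine(900, 10, 3, 3) == 9000
```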
Beyond Tuning
Can I move an existing partition around? I just added a new broker, and it's not sharing the load.
Use bin/kafka-reassign-partitions.sh:
1) Create a JSON file (topics.json) of the topics you want to redistribute.
2) Use kafka-reassign-partitions.sh … --generate to suggest a partition reassignment.
3) Copy the proposed assignment to a JSON file.
4) Use kafka-reassign-partitions.sh … --execute to start the redistribution process.
   a) Can take several hours, depending on data.
5) Use kafka-reassign-partitions.sh … --verify to check the progress of the redistribution.
Link to documentation from conference sponsor.
topics.json:{"topics": [{"topic": "weather"}, {"topic": "sensors"}], "version":1}
Thanks!
Matt Andruff - Hortonworks Practice lead @ Yoppworks
@MattAndruff
I’m not an expert I just sound like one.