Kafka to the Maxka - (Kafka Performance Tuning)

Post on 22-Jan-2018


KAFKA TO THE MAXKA By Matt Andruff

Kafka Performance Tuning

Welcome!

Matt Andruff - Hortonworks Practice lead @ Yoppworks

@MattAndruff

Because I get asked a lot...Yoppworks


Performance Tuning...

Agenda

• Performance tuning - just some quick points

• What you can change
  • Simple changes
  • Kafka configuration changes

• Brief canned demo
  • Beware: Kafka settings are not exciting for everyone

• Architectural changes

Performance Tuning

What do you need to make changes?

Performance tuning

There is no magic bullet. Guesses are just guesses. Empirical fact requires testing.

It requires hardware, SMEs, time, and effort.

It’s non-trivial to do performance testing.

Performance tuning

The better your load tests are, the better your tuning will be. Garbage in, garbage out.


Everyone (every client) is different and has a unique signature of data, hardware, and topics.


Tune for bottlenecks found through testing. Yes, there is always some low-hanging fruit.

Beyond Tuning

What your boss understands:

Beyond Tuning

What you understand:

First a minor detour to the OS

I promise to move fast but it can’t be ignored.

To be complete we need to cover some of the basics.

Which OS to use?

The basics

● noatime
  ○ Removes last-access time from files
  ○ Saves a write on every read

The basics

● ext4 is widely in use
● XFS has shown better performance metrics

https://kafka.apache.org/documentation.html#filesystems

JVM settings: export KAFKA_JVM_PERFORMANCE_OPTS='...'

Java 1.8:
-Xmx6g -Xms6g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80

Java 1.7 (beware of older versions):
-Xms4g -Xmx4g -XX:PermSize=48m -XX:MaxPermSize=48m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35

The basics

● File descriptor limits
  ○ Per broker: partitions * segments + overhead
    ■ Watch this when you upgrade to 0.10
● Set vm.swappiness = 0
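As a concrete sketch of raising the file-descriptor limit, on many Linux distributions you would add limits like the following (the "kafka" user name and the value 100000 are illustrative placeholders, not recommendations; adjust for your environment):

```
# /etc/security/limits.conf (sketch) - raise open-file limits for the broker user
kafka  soft  nofile  100000
kafka  hard  nofile  100000
```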

The basics

● Kafka data should be on its own disks
● If you encounter read/write issues, add more disks
● Each data folder you add to the config will be written to in round robin

Latest is the Greatest

● Have you upgraded to 0.10?
● Adds 8 bytes of timestamp
  ○ Not great for small messages.
● The broker no longer decompresses messages
  ○ Better performance when you use compression.
● File descriptor limits
  ○ Segment indexing changed

Defaults are your friends

Defaults are your friends

The default when you drive is to put on your seatbelt. If you are going to change that default to not wearing a seatbelt, I hope you have thought through your choice.

Kafka's defaults are set up to help keep you safe. If you are going to change a default to something else, I hope you have thought through your choice.

The Producer

Default Example

Acks:

Setting    Description                                         Risk of data loss   Performance
acks=0     No acknowledgment from the server at all.           Highest             Highest
           (Set it and forget it.)
acks=1     The leader completes the write of the data.         Medium              Medium
acks=all   The leader and all in-sync followers have           Lowest              Lowest
           written the data.
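The trade-off above is chosen with a single producer setting; a minimal sketch (the value shown is illustrative, not a recommendation):

```properties
# producer.properties (sketch)
# acks=0: fire and forget; acks=1: leader ack only; acks=all: full in-sync-replica ack
acks=1
```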


Definitions:

Latency: the length of time for one message to be processed.

Throughput: the number of messages processed per unit of time.

Batch:
• "Message 1" - Time 1  ← worst latency (first in, waits for the whole batch)
• "Message 2" - Time 2
• "Message 3" - Time 3  ← best latency

Batch Management

[Slide diagram: the producer builds separate batches per topic-partition (Batch - Partition 1 - Topic A, Batch - Partition 1 - Topic B) and sends them to the broker, which appends the messages to the partition's current segment.]

Batch Management

batch.size - the maximum size of a batch, in bytes.

linger.ms - the maximum amount of time to wait before sending a batch.

Other send triggers:
- A batch to the same broker is already being sent (piggyback)
- flush() or close() is called
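A minimal producer config sketch for the two batching knobs (batch.size shown is the default; the linger.ms value is illustrative, not a recommendation):

```properties
# producer.properties (sketch) - batching knobs
# send when the batch reaches 16 KiB (the default) ...
batch.size=16384
# ... or when 5 ms have passed, whichever comes first
linger.ms=5
```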

Batch Management

[Slide diagram: the producer's batches for Partition 1 - Topic A and Partition 1 - Topic B arrive at the broker and are appended to each partition's segment.]

Batch Management

With default settings and a large linger.ms, batches fill to batch.size before they are sent, so the number of full batches the buffer can hold is:

buffer.memory / batch.size = 33554432 / 16384 = 2048 batches
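The arithmetic above can be checked directly; this is just the default-settings calculation, not a tuning recommendation:

```python
# Producer defaults: buffer.memory = 32 MiB, batch.size = 16 KiB.
buffer_memory = 33_554_432
batch_size = 16_384

# With a large linger.ms, batches fill to batch.size, so the buffer
# can hold this many full batches at once:
full_batches = buffer_memory // batch_size
print(full_batches)  # 2048
```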

Batch Management

[Slide diagram: with a large linger.ms, a full batch for Partition 1 - Topic A is sent to the broker and appended to the segment.]

Batch Management

With default settings and a small linger.ms, batches are sent before they reach batch.size, so the effective batch size is below 16384 and the buffer holds more (smaller) batches:

33554432 / (< 16384) > 2048 batches

Batch Management

[Slide diagram: with a small linger.ms, the linger timer triggers before the batch is full, so smaller, partially filled batches for Topic A and Topic B are sent; using bigger messages fills the batch sooner.]

Batch Management

Tune your batch.size / linger.ms.

batch.size + linger.ms = your latency / throughput trade-off.

Once tuned, do not forget to size your buffer.memory.

Compression

compression.type = none (the default)

Compression can improve performance by transferring less data over the network, at the cost of additional CPU.

Generalization: use snappy - but you should run real performance tests.
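If your tests agree with the snappy generalization, the change is one line; shown as a sketch - benchmark none/gzip/snappy against your own data first:

```properties
# producer.properties (sketch) - enable compression on the producer side
compression.type=snappy
```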

Batch Management

[Slide diagram: records pass through the serializer and partitioner in the producer before landing in per-partition batches.]

Did we stick with the defaults?

Custom class written for performance?

● Partitioner
  ○ Create a custom key based on the data - helps prevent skew
● Serializer
  ○ Pluggable
● Interceptors
  ○ Allow manipulation of records going into Kafka
  ○ Are they being used? Should they be? How are they written?

Tuning

To tune performance you need to experiment with different settings. Data and throughput are different with every project. There is no one-size-fits-all.

Luckily there is a tool to help test configurations.

kafka-run-class.sh:

bin/kafka-run-class.sh \
  org.apache.kafka.clients.tools.ProducerPerformance \
  test 50000000 100 -1 acks=1 \
  bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 \
  buffer.memory=67108864 batch.size=8196

Or use the shortcut:

bin/kafka-producer-perf-test.sh \
  test 50000000 100 -1 acks=1 \
  bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 \
  buffer.memory=67108864 batch.size=8196

There is also one for the consumer: bin/kafka-consumer-perf-test.sh

Time for a quick walkthrough

Monitoring

Ops Clarity
- Now owned by Lightbend
- The Cadillac of monitoring

Burrow
- A little resource-heavy (one Kafka client per partition)
- Health monitor has some false positives

Yahoo Kafka-manager

Confluent Control Center
- Part of the Confluent distro

Or roll your own with Kafka JMX & MBeans

Where did they get the name Kafka?

My Guess

Putting Apache Kafka to Use for Event Streams,https://www.youtube.com/watch?v=el-SqcZLZlI

~ Jay Kreps


“I thought that since Kafka was a system optimized for writing using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project.” ~ Jay Kreps

https://www.quora.com/What-is-the-relation-between-Kafka-the-writer-and-Apache-Kafka-the-distributed-messaging-system


The Broker

Broker Disk Usage

● What is your rate of growth, and when will you need to expand?

● Try to make sure the number of partitions you select covers that growth

Broker Disk Usage

● log.retention.bytes
  ■ Default is unlimited (-1)
● log.retention.[time interval]
  ■ Default is 7 days (168 hours)
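A broker retention sketch combining both settings (the byte value is illustrative, not a recommendation; the time value is the default):

```properties
# server.properties (sketch)
# cap each partition's log at ~1 GiB (illustrative)
log.retention.bytes=1073741824
# keep data for 7 days (the default)
log.retention.hours=168
```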

Broker

● num.io.threads
  ■ Default is 8 - should match the number of physical disks

Beyond Tuning

How do we optimize writing:

Beyond Tuning

Measure the throughput:

Beyond Tuning

The Consumer

replica.high.watermark.checkpoint.interval.ms
- You might think that the high water mark only ensures reliability; it also has implications for performance.
- Watch out for consumer lag.

Beyond Tuning

The future: a consumer's ability to scale is constrained by the number of partitions.

Beyond Tuning

A greater number of partitions means:
- Greater parallelism
- More open files:
  (Partitions * segment count * replication) / brokers ~= # of open files per machine
  Tens of thousands of files is manageable on appropriate hardware.
- More memory usage (broker and ZooKeeper)
- Longer leader failover time (can be mitigated by an increased # of brokers)
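The open-files estimate above can be sketched as a quick calculation (the inputs are made-up example numbers, not a recommendation):

```python
def open_files_per_broker(partitions, segments_per_partition, replication, brokers):
    """Approximate open segment files per machine, per the slide's formula:
    (partitions * segment count * replication) / brokers."""
    return (partitions * segments_per_partition * replication) // brokers

# Example: 1000 partitions, 10 segments each, replication factor 3, 5 brokers.
print(open_files_per_broker(1000, 10, 3, 5))  # 6000
```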

Beyond Tuning

How do I calculate the number of partitions to have on a broker? What's the rule of thumb to start testing at?

[# partitions per broker] = c x [# brokers] x [replication factor]

c ~ your machine's awesomeness
c ~ your appetite for risk
c ~ 100 is a good safe starting point
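The rule of thumb above as a one-liner (c=100 is the slide's safe starting point; the broker and replication counts are example inputs):

```python
def partitions_per_broker(brokers, replication_factor, c=100):
    """Starting point from the slide: c * [# brokers] * [replication factor]."""
    return c * brokers * replication_factor

# Example: 5 brokers, replication factor 3.
print(partitions_per_broker(5, 3))  # 1500
```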

Beyond Tuning

Can I move an existing partition around? I just added a new broker, and it's not sharing the load.

Use: bin/kafka-reassign-partitions.sh

1) Create a JSON file (topics.json) of the topics you want to redistribute.
2) Use kafka-reassign-partitions.sh ... --generate to suggest a partition reassignment.
3) Copy the proposed assignment to a JSON file.
4) Use kafka-reassign-partitions.sh ... --execute to start the redistribution process.
   a) Can take several hours, depending on data.
5) Use kafka-reassign-partitions.sh ... --verify to check the progress of the redistribution process.

Link to documentation from conference sponsor.

topics.json:
{"topics": [{"topic": "weather"}, {"topic": "sensors"}], "version": 1}
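Putting the five steps together as a CLI sketch (the ZooKeeper address and broker IDs are placeholders, and the flags shown match the 0.10-era tooling):

```shell
# 1-2) generate a proposed reassignment for the topics listed in topics.json
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics.json --broker-list "0,1,2,3" --generate

# 3-4) save the proposed assignment as reassign.json, then execute it
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --execute

# 5) check progress
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --verify
```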

Thanks!

Matt Andruff - Hortonworks Practice lead @ Yoppworks

@MattAndruff

I’m not an expert I just sound like one.