Apache Kafka Best Practices

  • Date posted: 21-Jan-2018
  • Category: Technology

Transcript of Apache Kafka Best Practices

  1. Apache Kafka Best Practices. Manikumar Reddy (@omkreddy)
  2. Apache Kafka
     • Core APIs: the Producer API, the Consumer API, the Connector API, the Streams API
     • Broad classes of applications: building real-time streaming data pipelines; building real-time streaming applications
     • Core building block in other data systems
     Hortonworks Inc. 2011-2017. All Rights Reserved
  3. Key Concepts and Terminology
  4. Component Layout
  5. Hardware Guidance
     • Kafka brokers: cluster size 3+; memory 24 GB+ (for small) or 64 GB+ (for large); CPU multi-core processors (12+ CPU cores), hyper-threading enabled; storage 6+ x 1 TB dedicated disks (RAID or JBOD)
     • ZooKeeper: cluster size 3 (for small) or 5 (for large); memory 8 GB+ (for small) or 24 GB+ (for large); CPU 2+ cores; storage SSD for transaction logs
  6. OS Tuning
     • OS page cache: for example, allocate enough to hold all the active segments of the log
     • File descriptor limits: >100k
     • Less swapping
     • TCP tuning
     • JVM configs: Java 8 with the G1 collector; 6-8 GB heap
  7. Kafka Disk Storage
     • Use multiple disk spindles, dedicated to Kafka
     • JBOD vs RAID10
     • JBOD gives all the disk I/O
     • JBOD limitations: any disk failure causes an unclean shutdown and requires lengthy recovery; data is not distributed consistently across disks
     • Multiple directories
     • KIP-112/113: necessary tools for users to manage JBOD; intelligent partition assignment; on disk failure, the broker can serve replicas on the good disks; re-assign replicas between disks of the same broker
  8. RAID
     • RAID10: can survive a single disk failure; performance and protection; balances load across disks; single mount point; performance hit and reduced usable space
     • File system: EXT or XFS
     • SSD
     • Issues on NFS, SAN, NAS
  9. Basic Monitoring: CPU load, network metrics, file handle usage, disk space, disk I/O performance, garbage collection, ZooKeeper monitoring
  10. Kafka Replication
      • A partition has replicas: a leader replica and follower replicas
      • The leader maintains the in-sync replicas (ISR)
      • replica.lag.time.max.ms, num.replica.fetchers
      • min.insync.replicas: used by the producer to ensure greater durability
      • https://www.slideshare.net/junrao/kafka-replication-apachecon2013
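
As a rough illustration of how min.insync.replicas interacts with the producer's acks setting, here is a minimal sketch of the acceptance rule. It is a simplified model, not broker code, and the function name `write_is_accepted` is hypothetical.

```python
def write_is_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Simplified model of the leader's durability check.

    Assumption: the min.insync.replicas check applies only when the
    producer uses acks=all (also written acks=-1); other acks values
    are not subject to it.
    """
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    return True


if __name__ == "__main__":
    # ISR has shrunk to one replica while min.insync.replicas is 2.
    print(write_is_accepted("all", isr_size=1, min_insync_replicas=2))
    print(write_is_accepted("1", isr_size=1, min_insync_replicas=2))
```

With acks=all, a shrunken ISR causes writes to be rejected rather than silently accepted with less redundancy, which is the durability trade-off the slide refers to.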
  11. Under Replicated Partitions
      • Number of partitions which are not fully replicated within the cluster
      • MBean: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
      • ISR shrink/expand rate
      • Under-replicated partitions: lost broker? controller issues, ZooKeeper issues, network issues
      • Solutions: tune the ISR settings; expand brokers
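
The definition above can be sketched as a small check over partition metadata. The input shape (a dict mapping a partition name to its replica list and ISR list) and the partition names are assumptions made for illustration; in practice the count comes from the UnderReplicatedPartitions MBean.

```python
from typing import Dict, List, Tuple


def under_replicated(partitions: Dict[str, Tuple[List[int], List[int]]]) -> List[str]:
    """Return partitions whose ISR is smaller than their replica set."""
    return sorted(
        name
        for name, (replicas, isr) in partitions.items()
        if len(isr) < len(replicas)
    )


if __name__ == "__main__":
    # Hypothetical cluster state: broker 3 has fallen out of the ISR of orders-1.
    state = {
        "orders-0": ([1, 2, 3], [1, 2, 3]),
        "orders-1": ([2, 3, 1], [2, 1]),
    }
    print(under_replicated(state))
```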
  12. Controller
      • Manages the partition life cycle
      • Avoid expiry of the controller's ZooKeeper session; soft failures; ISR churn/under-replicated partitions; ZK server performance; long GC pauses on the broker; bad network configuration
      • Monitoring: MBean kafka.controller:type=KafkaController,name=ActiveControllerCount (only one broker in the cluster should report 1); LeaderElectionRate
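
The ActiveControllerCount rule lends itself to a simple alert check. The sketch below assumes the per-broker metric values have already been collected into a dict keyed by broker id; the function name and the status strings are hypothetical.

```python
from typing import Dict


def controller_status(active_controller_count: Dict[int, int]) -> str:
    """Classify the cluster by the sum of ActiveControllerCount over brokers."""
    total = sum(active_controller_count.values())
    if total == 1:
        return "ok"
    if total == 0:
        return "no active controller"
    return "multiple active controllers"


if __name__ == "__main__":
    # Hypothetical readings from a three-broker cluster.
    print(controller_status({1: 0, 2: 1, 3: 0}))
```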
  13. Unclean Leader Election
      • Enables replicas not in the ISR set to be elected as leader
      • Availability vs correctness: by default, Kafka chooses availability
      • Monitoring: MBean kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
      • The default will be changed in the next release
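
To make the availability versus correctness trade-off concrete, here is a toy model of leader selection. It is a deliberate simplification for illustration and not the controller's actual algorithm; the function name and argument shapes are assumptions.

```python
from typing import List, Optional, Set


def elect_leader(
    replicas: List[int],
    isr: Set[int],
    live_brokers: Set[int],
    unclean_enabled: bool,
) -> Optional[int]:
    """Pick a leader for one partition in a simplified model."""
    # Clean election: first live replica that is still in the ISR.
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # Unclean election: any live replica, even if it lags behind.
    # This keeps the partition available but may lose messages.
    if unclean_enabled:
        for broker in replicas:
            if broker in live_brokers:
                return broker
    # No eligible leader: the partition stays offline.
    return None


if __name__ == "__main__":
    # Only broker 3 is alive, and it is not in the ISR.
    print(elect_leader([1, 2, 3], {1, 2}, {3}, unclean_enabled=True))
    print(elect_leader([1, 2, 3], {1, 2}, {3}, unclean_enabled=False))
```

With the setting disabled, the partition remains unavailable until an in-sync replica returns, which is the correctness side of the trade-off.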
  14. Broker Configs
      • log.retention.{ms, minutes, hours}, log.retention.bytes
      • message.max.bytes, replica.fetch.max.bytes
      • delete.topic.enable
      • unclean.leader.election.enable = false
      • min.insync.replicas = 2
      • replica.lag.time.max.ms, num.replica.fetchers
      • replica.fetch.response.max.bytes
      • zookeeper.session.timeout.ms = 30s
      • num.io.threads
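
The settings for which the slide gives explicit values could be written out as a properties fragment. The sketch below renders only those three values; the helper name is hypothetical, and the 30s timeout is expressed as 30000 because the setting is in milliseconds.

```python
from typing import Dict

# Only the values stated on the slide; everything else is left at its default.
BROKER_OVERRIDES: Dict[str, str] = {
    "unclean.leader.election.enable": "false",
    "min.insync.replicas": "2",
    "zookeeper.session.timeout.ms": "30000",  # 30s in milliseconds
}


def render_properties(config: Dict[str, str]) -> str:
    """Render a config dict as key=value lines, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print(render_properties(BROKER_OVERRIDES))
```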
  15. Cluster Sizing
      • Broker sizing: partition count on each broker ( high throughput, higher latency
      • linger.ms: time-based batching; larger size -> high throughput, higher latency
      • max.in.flight.requests.per.connection: better throughput, affects ordering
      • compression.type: adding more user threads can help throughput
      • acks: affects message durability
  16. Performance Tuning
      • If throughput < network capacity: add more user threads; increase batch size; add more producer instances; add more partitions
      • Latency when acks = -1: increase num.replica.fetchers
      • Cross-datacenter data transfer: tune socket buffer settings and OS TCP buffer settings
  17. Producer Monitoring: batch-size-avg, compression-rate-avg, waiting-threads, buffer-available-bytes, record-queue-time-max, record-send-rate, records-per-request-avg
  18. Kafka Consumer
      • Test in your environment: kafka-consumer-perf-test.sh
      • Throughput issues: not enough partitions; OS page cache (allocate enough to hold all the messages for your consumers for, say, 30s); application/processing logic
      • Offsets topic (__consumer_offsets): offsets.topic.replication.factor; offsets.retention.minutes; monitor ISR and topic size
      • Slow offset commits: commit async, manual commits
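
The page cache guidance is simple arithmetic: incoming bytes per second multiplied by the window consumers are expected to read from memory. A minimal sketch follows, where the 50 MiB/s ingest rate is an invented figure and the function name is hypothetical.

```python
def page_cache_bytes(ingest_bytes_per_sec: float, window_sec: float = 30.0) -> int:
    """Bytes of page cache needed to keep the last window_sec of writes in memory."""
    return int(ingest_bytes_per_sec * window_sec)


if __name__ == "__main__":
    # Hypothetical broker receiving 50 MiB/s, sized for a 30 second window.
    needed = page_cache_bytes(50 * 1024 * 1024)
    print(f"{needed / 1024 ** 3:.2f} GiB")
```

This counts only the bytes written to one broker and ignores other users of the page cache, so it is best read as a lower bound.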
  19. Consumer Configs
      • fetch.min.bytes and fetch.max.wait.ms
      • max.poll.interval.ms
      • max.poll.records
      • session.timeout.ms
      • Consumer rebalance: check timeouts; check processing times/logic; GC issues
      • Tune network settings
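
One way to "check timeouts" and "check processing times" together is to compare the worst-case time to process one batch against max.poll.interval.ms. The sketch below assumes a fixed per-record processing time, and the default values used (500 records, 300000 ms) are my understanding of the client defaults rather than something stated on the slide.

```python
def poll_loop_fits(
    per_record_ms: float,
    max_poll_records: int = 500,
    max_poll_interval_ms: int = 300_000,
) -> bool:
    """True if a full batch can be processed before the poll interval expires."""
    worst_case_batch_ms = max_poll_records * per_record_ms
    return worst_case_batch_ms < max_poll_interval_ms


if __name__ == "__main__":
    # 100 ms per record: 500 records take 50 s, well inside 300 s.
    print(poll_loop_fits(per_record_ms=100))
    # 1 s per record: 500 records take 500 s, so the consumer would be
    # considered failed and the group would rebalance.
    print(poll_loop_fits(per_record_ms=1000))
```

When the check fails, the usual options are to lower max.poll.records, raise max.poll.interval.ms, or speed up the processing logic.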
  20. Consumer Monitoring
      • Whether or not the consumer is keeping up with the messages that are being produced
      • Consumer lag: the difference between the end of the log and the consumer offset
      • Monitoring: metrics monitoring (records-lag-max); bin/kafka-consumer-groups.sh; LinkedIn's Burrow for consumer monitoring
      • Decreasing lag: analyze the consumer (GC issues, hung instance); add more consumer instances; increase the number of partitions and consumers
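
The lag definition above translates directly into a per-partition subtraction. This sketch assumes the log end offsets and committed offsets have already been fetched (for example from the output of kafka-consumer-groups.sh) into two dicts; treating a partition with no committed offset as offset 0 is an assumption made here for simplicity.

```python
from typing import Dict


def consumer_lag(
    log_end_offsets: Dict[str, int],
    committed_offsets: Dict[str, int],
) -> Dict[str, int]:
    """Lag per partition: log end offset minus the consumer's committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


if __name__ == "__main__":
    # Hypothetical offsets for a two-partition topic.
    ends = {"orders-0": 1200, "orders-1": 950}
    committed = {"orders-0": 1150, "orders-1": 950}
    lag = consumer_lag(ends, committed)
    print(lag, "max:", max(lag.values()))
```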
  21. No Data Loss Settings
      • Producer: block.on.buffer.full=true; retries=Long.MAX_VALUE; acks=all; max.in.flight.requests.per.connection=1; close the producer
      • Broker: replication factor >= 3; min.insync.replicas=2; disable unclean leader election
      • Consumer: disable automatic offset commits (enable.auto.commit=false); commit offsets only after the messages are processed
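
A checklist like this can be turned into a small validation over configuration dicts. The sketch below covers only a subset of the settings, and the function name, the dict layout, and the fallback defaults are assumptions. It uses the largest 32-bit integer for retries on the assumption that the setting is an int, and it leaves out block.on.buffer.full, which I believe newer clients replace with max.block.ms.

```python
from typing import Dict, List

PRODUCER = {
    "acks": "all",
    "retries": 2**31 - 1,  # assumption: retries is a 32-bit int setting
    "max.in.flight.requests.per.connection": 1,
}
BROKER = {"min.insync.replicas": 2, "unclean.leader.election.enable": False}
CONSUMER = {"enable.auto.commit": False}


def no_data_loss_violations(
    producer: Dict, broker: Dict, consumer: Dict, replication_factor: int
) -> List[str]:
    """Return the checklist items that the given settings do not satisfy."""
    problems = []
    if str(producer.get("acks")) not in ("all", "-1"):
        problems.append("producer: acks should be all")
    if producer.get("max.in.flight.requests.per.connection") != 1:
        problems.append("producer: max.in.flight.requests.per.connection should be 1")
    if replication_factor < 3:
        problems.append("broker: replication factor should be >= 3")
    if broker.get("min.insync.replicas", 1) < 2:
        problems.append("broker: min.insync.replicas should be 2")
    if broker.get("unclean.leader.election.enable", True):
        problems.append("broker: unclean leader election should be disabled")
    if consumer.get("enable.auto.commit", True):
        problems.append("consumer: automatic offset commits should be disabled")
    return problems


if __name__ == "__main__":
    print(no_data_loss_violations(PRODUCER, BROKER, CONSUMER, replication_factor=3))
```

The check cannot verify the behavioural items on the list, such as closing the producer or committing offsets only after processing; those depend on application code.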
  22. Authorizer - Ranger Auditing
  23. Kafka Mirror Maker: a tool to mirror a source Kafka cluster into a target (mirror) Kafka cluster
  24. Kafka Mirror Maker
      • Run multiple mirroring processes: high fault tolerance; high throughput
      • --num.streams option to specify the number of consumer threads
      • no. of threads in num.streams
  25. Kafka Mirror Maker
      • Consumer and source cluster socket buffer sizes: high value for the socket buffer size; consumer's fetch size; OS networking tuning
      • Source and target clusters are independent entities: they can have different numbers of partitions; offsets will not be the same; partitioning order is preserved on a per-key basis
      • Create topics in the target cluster
      • Monitor whether a mirror is keeping up: consumer lag
      • Running in secure clusters: we recommend using SSL; we can run MM on the source cluster
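
The point that per-key order survives a change in partition count can be shown with a toy model. The sketch below uses `zlib.crc32` as a stand-in hash (Kafka's own partitioner uses a different hash function), replays records in their original order, and checks that each key's values arrive in the same sequence in a 6-partition and a 3-partition layout. All names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in key partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def mirror(records: List[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Append each record to the partition chosen by its key."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def per_key_order(partitions: Dict[int, List[Record]]) -> Dict[str, List[int]]:
    """Collect, for every key, the order in which its values were appended."""
    order: Dict[str, List[int]] = defaultdict(list)
    for records in partitions.values():
        for key, value in records:
            order[key].append(value)
    return dict(order)


if __name__ == "__main__":
    stream = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2), ("a", 3)]
    source = per_key_order(mirror(stream, num_partitions=6))
    target = per_key_order(mirror(stream, num_partitions=3))
    print(source == target)
```

Which partition a key lands in, and therefore its offsets, may differ between the two layouts; only the relative order within a key is the same.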
  26. Open Source Operational Tools
      • Ambari Metrics: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-user-guide/content/grafana_kafka_dashboards.html
      • Removing brokers and rebalancing partitions in a cluster: https://github.com/linkedin/kafka-tools
      • Consumer lag monitoring: Burrow (https://github.com/linkedin/Burrow)
      • Kafka Manager: https://github.com/yahoo/kafka-manager
  27. Apache Kafka 0.10.2 Release
      • Includes 15 KIPs, over 200 bug fixes and improvements
      • The newest Java clients now support older brokers (0.10.0 and higher)
      • Separation of internal and external traffic
      • Create topic policy
      • Security improvements: support for SASL/SCRAM mechanisms; dynamic JAAS configuration for Kafka clients; support for authentication of multiple Kafka clients in a single JVM
      • Producer and consumer improvements
      • Connect API & Streams API improvements
  28. Thank You