Blr hadoop meetup

Transcript of Blr hadoop meetup

Page 1: Blr hadoop meetup

© 2016 24/7 CUSTOMER, INC.

BIG DATA BANGALORE JAN MEETUP - 24/7 CUSTOMER, INC.

Recipes for building resilient cross-DC data pipeline with Kafka

Sr. Engineering Manager - Big Data Platform

Suneet Grover

Page 2: Blr hadoop meetup

About [24]7

Page 3: Blr hadoop meetup

Today’s engagement is not driving successful moments

Q&A

IVR

Page 4: Blr hadoop meetup

Smart Customer Engagement

• Data-Driven: Reflecting All Available Data
• Predictive: Real-time Decisions
• Omni-channel: Across Digital & Voice
• Personalized: User Experience

Video of [24]7 in action available at http://player.vimeo.com/video/85280070

Page 5: Blr hadoop meetup

Move from
• Channel-centric engagement
• Reacting to consumer behavior
• Disconnected, fragmented channels
• Too many failed experiences

to
• Intent-driven engagement
• Anticipate consumer intent
• Holistic experience across channels
• Delivering the right moments

Page 6: Blr hadoop meetup

[24]7 by the numbers

• 1.2b smart speech calls/year
• 127m virtual agent inquiries/year
• 30m agent chats/year
• 341m web visitors/month
• 5000+ digital chat agents (#1 WW)
• 70+ data scientists (most in industry)
• 100+ patents
• 300+ software engineers & designers

Page 7: Blr hadoop meetup

Agenda
• Introduction to Kafka
• Kafka at [24]7
• From problems to solutions
• Transparency and Resiliency
• Metrics Demo
• Design for multiple data centers

Page 8: Blr hadoop meetup

Introduction to Kafka

Page 9: Blr hadoop meetup

Apache Kafka
• Distributed
• High performance and throughput
• Streaming platform, pub/sub system

Page 10: Blr hadoop meetup

Topic and Partitions

Page 11: Blr hadoop meetup

Producers Consumers

Page 12: Blr hadoop meetup

Kafka setup across DCs

[Diagram: Region 1 and Region 2, each with its own Brokers, Mirrormakers and Zookeepers]

Page 13: Blr hadoop meetup

Kafka at [24]7

Page 14: Blr hadoop meetup

• Intent Prediction
• Data Analytics
• Business Intelligence

Page 15: Blr hadoop meetup

From problems to solutions

Page 16: Blr hadoop meetup

Challenges with Kafka 0.8.0
• Broker partition stickiness prevents scaling
• ZK load and latencies keep increasing
• Range-based mirror-maker algorithm is not optimal
• Stale topics cannot be deleted
• Controller can get into a stuck state
• Conflict errors in mirror-makers
• Socket leaks leading to open file descriptors

Page 17: Blr hadoop meetup

Learnings from Kafka 0.8.0
• If the controller gets into a stuck state, delete the “/controller” node from ZooKeeper
• Always do a clean shutdown and restart of brokers
• Some issues are not always visible as errors or warnings
• Run ZK on SSD

Page 18: Blr hadoop meetup

Kafka 0.10
• Very stable release
• Easy to upgrade in place from 0.8.2 onwards
• Better client APIs
• Richer admin operations

Page 19: Blr hadoop meetup

Broker configurations that worked for us
• default.replication.factor = 3
• num.partitions = 2
• delete.topic.enable = true
• auto.leader.rebalance.enable = true
• controlled.shutdown.enable = true
• queued.max.requests = 1000
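
These settings would normally live in each broker's server.properties file. As a minimal sketch, the values from the slide can be kept in one place and rendered as properties text. The helper name `render_broker_properties` and the idea of generating the file from Python are assumptions for illustration, not something the talk describes.

```python
# Broker settings copied from the slide above.
BROKER_SETTINGS = {
    "default.replication.factor": 3,
    "num.partitions": 2,
    "delete.topic.enable": True,
    "auto.leader.rebalance.enable": True,
    "controlled.shutdown.enable": True,
    "queued.max.requests": 1000,
}


def render_broker_properties(settings):
    """Render a dict as key=value lines in Java properties style (hypothetical helper)."""
    lines = []
    for key, value in settings.items():
        # Java properties files use lowercase true/false for booleans.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_broker_properties(BROKER_SETTINGS), end="")
```

The rendered text can be appended to an existing server.properties; the other broker settings (ids, listeners, log directories) are outside the scope of this sketch.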

Page 20: Blr hadoop meetup

Transparency and Resiliency

Page 21: Blr hadoop meetup

Metrics flow

[Diagram: Kafka Broker (Metrics Reporter), Kafka MM and Zookeeper (JMXTrans), Graphite, Grafana, Host level Metrics & Alerts, Lag monitor, ELK]

Page 22: Blr hadoop meetup

Essential Broker Metrics
• Disk, CPU and throughput utilization
• Ingress, egress volume per broker and topic
• Active controller count
• Offline partitions
• Under replicated partitions
• Partitions per broker
• Log flush rate

Page 23: Blr hadoop meetup

Basic Alerts
• Disk, CPU utilization
• Open file handles
• Controller count
• Controller re-elections
• Under replicated partitions
• Offline partitions
• Stuck pending commands in zookeeper
• Conflicts in mirror-makers
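
Several of these alerts reduce to simple checks on a cluster-wide snapshot of broker metrics. The sketch below assumes the metrics have already been collected into a plain dict. The key names, the function name `evaluate_alerts`, and the 85% disk threshold are hypothetical. It relies on the fact that a healthy cluster has exactly one active controller and no offline or under-replicated partitions.

```python
def evaluate_alerts(cluster_metrics, disk_threshold_pct=85.0):
    """Return a list of alert messages for one cluster snapshot.

    cluster_metrics is a hypothetical dict with these keys:
      active_controller_count      - summed across all brokers
      offline_partitions           - count of partitions without a leader
      under_replicated_partitions  - summed across all brokers
      disk_used_pct                - {broker_id: percent of disk used}
    """
    alerts = []

    # Exactly one broker should be the active controller at any time.
    controllers = cluster_metrics["active_controller_count"]
    if controllers != 1:
        alerts.append(f"controller count is {controllers}, expected 1")

    offline = cluster_metrics["offline_partitions"]
    if offline > 0:
        alerts.append(f"{offline} offline partitions")

    under_replicated = cluster_metrics["under_replicated_partitions"]
    if under_replicated > 0:
        alerts.append(f"{under_replicated} under-replicated partitions")

    # Disk threshold is an illustrative default, not a value from the talk.
    for broker_id, used in sorted(cluster_metrics["disk_used_pct"].items()):
        if used >= disk_threshold_pct:
            alerts.append(f"broker {broker_id} disk at {used:.0f}%")

    return alerts
```

A check like this would run on a schedule against whatever store holds the metrics; alerts on re-elections or stuck ZooKeeper commands need history and are not covered here.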

Page 24: Blr hadoop meetup

JMXTrans
• Push mirror-maker metrics to Graphite
  • Throughput per topic, per thread, per instance etc.
  • WaitOnTake, WaitOnPut
• Push ZooKeeper metrics to Graphite
  • Latency, quorum, connections etc.
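
JMXTrans does this push itself. To show what arrives at Graphite, the sketch below formats a metric in Graphite's plaintext protocol: a metric path, a value, and a Unix timestamp on one line. The metric path, the function names, and the sample values are hypothetical; port 2003 is Graphite's default plaintext listener.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: path, value, timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host, port=2003):
    """Send preformatted lines to a Graphite (carbon) plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path for one mirror-maker instance and topic.
    line = graphite_line("kafka.mm.region1.instance1.topicA.msgs_per_sec", 1234.5, 1451606400)
    print(line, end="")
```

Only `graphite_line` runs in the demo; `send_to_graphite` needs a reachable Graphite host.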

Page 25: Blr hadoop meetup

Data Lag Monitoring
• Measures the event-level time delay
• Monitors data latencies per cluster, per topic, per partition
• Measures latencies between multiple steps in the Kafka pipeline
• Sampling ratio can be configured and optimized
• Supports multiple message formats (JSON, Avro, etc.)
• Alerts based on pre-defined thresholds
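
The lag measurement can be sketched as follows, assuming JSON messages that carry their creation time in a field. The field name `event_time_ms`, the function names, and the threshold are hypothetical; the talk does not describe the monitor's implementation. Sampling here is by offset, so only one in every `sample_every` messages is parsed.

```python
import json
from collections import defaultdict


def record_lag(lags, topic, partition, offset, payload, observed_ms, sample_every=100):
    """Record event-level lag in ms for sampled messages, keyed by (topic, partition)."""
    if offset % sample_every != 0:
        return  # skip unsampled messages to keep the monitor cheap
    event = json.loads(payload)
    # Lag is the time the message was observed minus the time the event was created.
    lag_ms = observed_ms - event["event_time_ms"]  # hypothetical timestamp field
    lags[(topic, partition)].append(lag_ms)


def lag_alerts(lags, threshold_ms):
    """Return (topic, partition, max_lag_ms) for every partition over the threshold."""
    return [
        (topic, partition, max(values))
        for (topic, partition), values in sorted(lags.items())
        if max(values) > threshold_ms
    ]


if __name__ == "__main__":
    lags = defaultdict(list)
    record_lag(lags, "clicks", 0, 0, '{"event_time_ms": 1000}', observed_ms=1500)
    record_lag(lags, "clicks", 1, 100, '{"event_time_ms": 1000}', observed_ms=9000)
    print(lag_alerts(lags, threshold_ms=5000))
```

Running the same measurement at each step of the pipeline (for example before and after a mirror-maker) gives the per-step latencies mentioned above.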

Page 26: Blr hadoop meetup

Indicative Broker Metrics
• Request Metrics
  • Local Time
  • Remote Time
  • Queue Time
• Request Handler Idle Percent
• Network Processor Idle Percent

Page 27: Blr hadoop meetup

Now some demo

Page 28: Blr hadoop meetup

Design for Multiple Data Centers

Page 29: Blr hadoop meetup

Range Based Mirror Makers

[Chart: Skewed Partition Assignment. Num Partitions per consumer (Consumer 1 to Consumer 4), log-scale axis from 1 to 1000; data labels 1000, 181, 14, 5]

Page 30: Blr hadoop meetup

Round Robin Mirror Makers

[Chart: Uniform Partition Assignment. Num Partitions per consumer (Consumer 1 to Consumer 4), axis from 0 to 350]
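
The difference between the two assignment strategies can be reproduced with a simplified model. In the sketch below, range assignment splits each topic's partitions among the consumers independently, so the first consumers pick up the remainder of every topic. Round robin deals all topic-partitions out in a single pass. The function names and the example topic layout are hypothetical, and the model ignores details such as multiple threads per consumer.

```python
def range_assign(topics, consumers):
    """Per-topic range assignment: earlier consumers get the extra partitions."""
    ordered = sorted(consumers)
    counts = {c: 0 for c in ordered}
    for num_partitions in topics.values():
        base, extra = divmod(num_partitions, len(ordered))
        for i, consumer in enumerate(ordered):
            # The first `extra` consumers each take one more partition of this topic.
            counts[consumer] += base + (1 if i < extra else 0)
    return counts


def round_robin_assign(topics, consumers):
    """Deal all topic-partitions out across consumers in one circular pass."""
    ordered = sorted(consumers)
    counts = {c: 0 for c in ordered}
    all_partitions = [
        (topic, p) for topic in sorted(topics) for p in range(topics[topic])
    ]
    for i, _ in enumerate(all_partitions):
        counts[ordered[i % len(ordered)]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical layout: 100 topics with 2 partitions each, 4 consumers.
    topics = {f"topic{i:03d}": 2 for i in range(100)}
    consumers = ["consumer1", "consumer2", "consumer3", "consumer4"]
    print("range:      ", range_assign(topics, consumers))
    print("round robin:", round_robin_assign(topics, consumers))
```

With many small topics, the range model leaves the later consumers idle because no topic has enough partitions to reach them, while round robin spreads the same partitions evenly.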

Page 31: Blr hadoop meetup

Mirror-maker fine tuning
• Round Robin works better than Range based in most cases
• Spread out the topics in multiple MM consumer groups
  • If you have a few large volume topics
  • Negative regex works with whitelist parameter
• Doesn’t help to have too many MM consumer threads
• Tune socket buffer size (doesn’t apply unless OS allows)
  • MM - socket.receive.buffer.bytes = 1048576
  • Broker - socket.send.buffer.bytes = 1048576
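
As an illustration of the negative-regex point, the sketch below splits topics between two hypothetical mirror-maker consumer groups. One whitelist names the large topics, and the other uses a negative lookahead to match everything else. Python's `re` module stands in for the Java regex engine that Kafka uses; the lookahead syntax shown is the same in both. Topic names and variable names are made up.

```python
import re

# Hypothetical whitelist for the group that mirrors the few large-volume topics.
LARGE_WHITELIST = r"clickstream|pageviews"
# Negative lookahead: match any topic that is NOT exactly one of the large topics.
REST_WHITELIST = r"(?!(?:clickstream|pageviews)$).*"


def topics_for(whitelist, topics):
    """Return the topics whose full name matches the whitelist pattern."""
    pattern = re.compile(whitelist)
    return [t for t in topics if pattern.fullmatch(t)]


if __name__ == "__main__":
    all_topics = ["clickstream", "pageviews", "orders", "clickstream_v2"]
    print("large group:", topics_for(LARGE_WHITELIST, all_topics))
    print("rest group: ", topics_for(REST_WHITELIST, all_topics))
```

Each topic lands in exactly one group, so the large topics get dedicated mirror-maker capacity without being mirrored twice.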

Page 32: Blr hadoop meetup

We are hiring!!!

For current open positions, please log onto our careers web page

http://www.247-inc.com/Company>Careers>Location

For further details, please reach out to: Achappa C B - [email protected], M: +91-7338458247
