Experience with Kafka & Storm
![Page 1: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/1.jpg)
Target and Connect Intelligently
Experience with Kafka & Storm
Otto Mok, Solution Architect, AcuityAds
April 30, 2014 – Toronto Hadoop User Group
![Page 2: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/2.jpg)
Agenda
• Background
  – What does AcuityAds do?
• Use case
  – What are we trying to do?
• High-level System Architecture
  – How does the data flow?
• Kafka & Storm
  – What did we do wrong?
![Page 3: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/3.jpg)
Background
Source: https://www.google.ca/search?q=banner+ads&tbm=isch&tbo=u
![Page 4: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/4.jpg)
Background
• Digital Advertising
  – Website banner, pre-roll video, free mobile app
• Buy ad impressions in ‘real time’
  – Response within 50 ms for auction
• Find best match between people and ads
  – Show ads that you care about
• Use machine learning algorithms to ‘learn’
  – Data, data, data
![Page 5: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/5.jpg)
Use case
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
  – Country, Province, Gender, Age Range, etc.
![Page 6: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/6.jpg)
High-level System Architecture
• 10+ billion daily bid requests
• Make up to 4 billion daily bids
• Serve millions of daily impressions
• 10+ TB of messages daily
• 300k+ messages / second
[Diagram components: Bidder, Adserver, Kafka, HBase/Hadoop, Storm]
![Page 7: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/7.jpg)
Kafka
Source: http://kafka.apache.org/documentation.html
![Page 8: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/8.jpg)
Kafka - Spec
• Kafka v0.8.0
• Servers – 10 x 2U (10 x 3 TB) JBOD
• Total storage – 300 TB
• Replication – 3x
• Unique data – 100 TB
• Capacity – a few days
• Producer acknowledgment – never waits
• Topic – BIDREQUEST
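The storage figures on this slide follow from simple arithmetic. The sketch below reproduces them and derives an upper bound on retention; the daily ingest value is an assumption taken from the "10+ TB of messages daily" figure on the architecture slide, so the real retention is lower, consistent with "a few days".

```python
# Capacity arithmetic for the cluster described on the spec slide.
SERVERS = 10
DISKS_PER_SERVER = 10
DISK_TB = 3
REPLICATION = 3
DAILY_INGEST_TB = 10  # assumption: lower bound of "10+ TB of messages daily"

raw_tb = SERVERS * DISKS_PER_SERVER * DISK_TB      # total raw storage
unique_tb = raw_tb / REPLICATION                   # storage left after 3x replication
max_retention_days = unique_tb / DAILY_INGEST_TB   # upper bound, ignores headroom

print(f"raw={raw_tb} TB unique={unique_tb:.0f} TB retention<={max_retention_days:.0f} days")
```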
![Page 9: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/9.jpg)
Kafka - Monitoring
• Nagios
  – Ping, CPU, memory, network I/O, disk space
• Producer-Consumer group message counting
  – Hourly consumption rate check

| Topic | Consumer Group ID | Producer Count | Consumer Count | Error | Ratio |
|---|---|---|---|---|---|
| BIDREQUEST | InventoryTopology | 122,450,812 | 122,444,294 | None | 1.00 |
| BIDREQUEST | SearchTargetingTopology | 122,450,812 | 107,755,295 | Ratio below 98% | 0.88 |
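The hourly check in the table above can be sketched as a ratio of consumed to produced messages with a 98% threshold. This is a minimal illustration, not the talk's actual monitoring code; the function name `check_consumption` and the zero-producer handling are hypothetical.

```python
THRESHOLD = 0.98  # taken from the "Ratio below 98%" error shown on the slide

def check_consumption(producer_count, consumer_count, threshold=THRESHOLD):
    """Return (ratio, error) for one topic / consumer-group pair."""
    if producer_count == 0:
        # Hypothetical handling; the slide does not cover this case.
        return None, "No messages produced"
    ratio = consumer_count / producer_count
    error = None if ratio >= threshold else f"Ratio below {threshold:.0%}"
    return ratio, error

# Counts copied from the slide.
rows = [
    ("InventoryTopology", 122_450_812, 122_444_294),
    ("SearchTargetingTopology", 122_450_812, 107_755_295),
]
for group, produced, consumed in rows:
    ratio, error = check_consumption(produced, consumed)
    print(f"BIDREQUEST {group} {error} {ratio:.2f}")
```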
![Page 10: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/10.jpg)
Kafka - Monitoring
• Kafka Web Console
  – Partition offset for each consumer group
![Page 11: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/11.jpg)
Kafka - Issues
• Issue 1 – Partitions
  – 10 partitions
  – Each partition > 1 TB a day
  – 100 TB / 1 TB – no problem!
• Each partition is stored in a directory
  – /disk05/kafka-logs/BIDREQUEST-09
  – /disk09/kafka-logs/BIDREQUEST-03
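One reading of this slide: because each partition replica lives in a single directory on a single disk, cluster-wide capacity is not what limits retention. The sketch below works through that reading using the figures from the spec slide; the 1 TB/day rate is an optimistic assumption, since the slide says "> 1 TB a day".

```python
DISK_TB = 3
DISKS = 100                 # 10 servers x 10 disks, from the spec slide
PARTITIONS = 10
REPLICATION = 3
PARTITION_TB_PER_DAY = 1    # assumption: slide says "> 1 TB a day"

replica_dirs = PARTITIONS * REPLICATION           # directories cluster-wide
disks_in_use = min(replica_dirs, DISKS)           # upper bound: one disk per directory
idle_disks = DISKS - disks_in_use                 # disks that can never hold this topic
days_until_disk_full = DISK_TB / PARTITION_TB_PER_DAY

print(f"{replica_dirs} replica directories, at least {idle_disks} idle disks, "
      f"a disk fills in at most {days_until_disk_full:.0f} days")
```

Under these assumptions a disk holding one partition replica fills in about three days, regardless of how much free space the rest of the cluster has.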
![Page 12: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/12.jpg)
Kafka - Issues
• Issue 2 – Unbalanced partition distribution
  – Some servers running out of space
  – Some servers are not “leader” for any partition
• A network glitch causes a server to drop out of the cluster; it is no longer a leader after it rejoins
• auto.leader.rebalance.enable=true
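To illustrate the imbalance, the sketch below compares each partition's current leader with its preferred leader (the first broker in the partition's replica list) and reports brokers that lead nothing. The three-broker cluster and the function name `leader_report` are hypothetical; this is not Kafka's own tooling.

```python
from collections import Counter

def leader_report(assignment, leaders, brokers):
    """assignment: partition -> replica list (first entry is the preferred leader).
    leaders: partition -> broker currently leading that partition."""
    per_broker = Counter(leaders.values())
    idle = sorted(b for b in brokers if per_broker[b] == 0)
    misplaced = sorted(p for p, replicas in assignment.items()
                       if leaders[p] != replicas[0])
    return idle, misplaced

# Hypothetical cluster: broker 2 dropped out and rejoined,
# so partition 1 is still led by broker 3 instead of broker 2.
assignment = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
leaders = {0: 1, 1: 3, 2: 3}
idle, misplaced = leader_report(assignment, leaders, brokers=[1, 2, 3])
print("brokers leading nothing:", idle)
print("partitions off their preferred leader:", misplaced)
```

With automatic leader rebalancing enabled, the controller periodically moves leadership back to the preferred replicas, which would clear both lists in this example.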
![Page 13: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/13.jpg)
Lots of data – now what?
Source: http://bookriotcom.c.presscdn.com/wp-content/uploads/2013/03/server-farm-shot.jpg
![Page 14: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/14.jpg)
Use case - again
• 10+ billion daily impressions
• 30,000+ new sites daily
• How many daily impressions by site?
• How are the impressions distributed?
  – Country, Province, Gender, Age Range, etc.
![Page 15: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/15.jpg)
Storm
Source: http://storm.incubator.apache.org/documentation/Tutorial.html
![Page 16: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/16.jpg)
Storm - Spec
• Storm v0.8.2
• Servers – 13 x Dual Quad Core Xeon, 36 GB RAM
• 4 worker slots per server
• Total logical CPUs – 208
• Total memory – 468 GB
• Total slots – 52 worker slots (JVMs)
![Page 17: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/17.jpg)
Storm - Monitor
![Page 18: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/18.jpg)
Storm - Topology
• Spout reads each BidRequest from the Kafka topic
• Determines whether it is new or existing, and emits tuples to different “streams”
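The routing step can be sketched in plain Python, outside Storm. The stream names and field names come from the bolt descriptions later in the deck; the lookup table `known_inventory` and the function `route` are hypothetical stand-ins for however the topology decides whether a site is already known.

```python
NEW_STREAM = "NewInventory"
EXISTING_STREAM = "ExistingInventory"

def route(bid_request, known_inventory):
    """Return (stream, tuple_values) for one bid request.

    known_inventory maps (sourceId, domainName) -> inventoryId (hypothetical)."""
    key = (bid_request["sourceId"], bid_request["domainName"])
    inventory_id = known_inventory.get(key)
    if inventory_id is None:
        # Unknown site: emit the fields the insert bolt groups on.
        return NEW_STREAM, key
    # Known site: emit the field the update and log bolts group on.
    return EXISTING_STREAM, (inventory_id,)

known = {(1, "example.com"): 42}
print(route({"sourceId": 1, "domainName": "example.com"}, known))
print(route({"sourceId": 1, "domainName": "new.example"}, known))
```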
![Page 19: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/19.jpg)
Storm - Topology
• InsertInventoryBolt
  – Process tuples from NewInventory stream
  – Field grouping on sourceId, domainName
  – Tick tuple every 1 second
• UpdateInventoryBolt
  – Process tuples from ExistingInventory stream
  – Field grouping on inventoryId
  – Tick tuple every 1 second
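Field grouping means tuples with equal values in the grouping fields always reach the same bolt task, so each task owns a disjoint set of keys. The sketch below shows the idea with a CRC32 hash as a stand-in; Storm uses its own hash of the field values, and `task_for` is a hypothetical helper.

```python
import zlib

def task_for(values, num_tasks):
    """Map grouping-field values to a task index; equal values give the same task."""
    key = "|".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(key) % num_tasks  # CRC32 is an assumption, not Storm's hash

# Two bid requests for the same (sourceId, domainName) land on the same task.
a = task_for((7, "example.com"), num_tasks=8)
b = task_for((7, "example.com"), num_tasks=8)
print(a == b)
```

This is why the insert bolt can group on sourceId and domainName: only one task ever sees a given new site, so it is not inserted twice by competing tasks.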
![Page 20: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/20.jpg)
Storm - Topology
• LogInventoryBolt
  – Process tuples from ExistingInventory stream
  – Field grouping on inventoryId
  – Tick tuple every 10 seconds
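The slides do not say what the bolts do when a tick tuple arrives. A common pattern, assumed here, is to buffer counts per key and flush them in a batch on each tick. The sketch below models that in plain Python; `CountingBolt`, `TICK`, and the `flush` callback are hypothetical names, not Storm API.

```python
TICK = object()  # stand-in for Storm's periodic tick tuple

class CountingBolt:
    """Buffers per-inventory counts and flushes them whenever a tick arrives."""

    def __init__(self, flush):
        self.flush = flush   # callback that persists one batch, e.g. a database write
        self.counts = {}

    def execute(self, tup):
        if tup is TICK:
            if self.counts:
                self.flush(self.counts)
                self.counts = {}   # start a fresh batch
            return
        inventory_id = tup
        self.counts[inventory_id] = self.counts.get(inventory_id, 0) + 1

flushed = []
bolt = CountingBolt(flushed.append)
for tup in [7, 7, 9, TICK, 9, TICK]:
    bolt.execute(tup)
print(flushed)
```

Batching on a tick trades a bounded delay (1 or 10 seconds on these slides) for far fewer writes than one per tuple.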
![Page 21: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/21.jpg)
Storm - Issues
• Issue – Low uptime
  – 10 workers, 100 executors
  – Not processing many tuples
  – Process latency < 10 ms
• Bolts restart due to uncaught exceptions
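The slide does not say how the restarts were resolved. One plausible mitigation, sketched here as an assumption, is to catch exceptions around per-tuple processing and report the bad tuple instead of letting the exception propagate and take the worker down. `safe_execute` and its callbacks are hypothetical.

```python
def safe_execute(process, tup, on_error):
    """Run process(tup); report failures instead of letting them propagate."""
    try:
        process(tup)
        return True
    except Exception as exc:  # broad on purpose: any failure would restart the bolt
        on_error(tup, exc)
        return False

def process(tup):
    # Hypothetical processing step that fails on malformed input.
    if "inventoryId" not in tup:
        raise KeyError("inventoryId")

errors = []
ok = safe_execute(process, {"inventoryId": 42}, lambda t, e: errors.append((t, repr(e))))
bad = safe_execute(process, {}, lambda t, e: errors.append((t, repr(e))))
print(ok, bad, len(errors))
```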
![Page 22: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/22.jpg)
Conclusion
• Cost
  – Bleeding-edge technology: bugs
  – Support: mailing lists
  – Monitoring: roll your own
  – Operation: dedicated personnel
• Benefit
  – Near real-time data on site impression volume & distribution by geo, demo, etc.
![Page 23: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/23.jpg)
Forward Looking
• Kafka v0.8.1.1
  – Allows specifying the broker hostname for producers & consumers
  – Change the number of partitions of a topic online
• Storm v0.9.1
  – Faster pure-Java Netty transport
  – View logs from each server in the Storm UI
  – Tick tuple using floating-point seconds
  – Storm on Hadoop (HDP 2.1)
![Page 24: Experience with Kafka & Storm](https://reader034.fdocuments.net/reader034/viewer/2022052321/54c6dd8c4a7959aa138b45b4/html5/thumbnails/24.jpg)
Thank you
Otto Mok
[email protected]
Source: http://jamesgieordano.files.wordpress.com/2011/05/babyelephant.jpg