Short introduction to Storm
![Page 1: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/1.jpg)
STORM
DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
Jimmy Zöger
CLC < FIB < UPC
2013-06-03
![Page 2: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/2.jpg)
INTRODUCTION
• Like Hadoop for realtime processing instead of batch
• Open source
• Developed by BackType, which was later acquired by Twitter
• Developed for analyzing Twitter data
• Similar to S4
![Page 3: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/3.jpg)
STORM TOPOLOGY
![Page 4: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/4.jpg)
SPOUTS
![Page 5: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/5.jpg)
SPOUTS
• The component responsible for feeding messages into the topology
• Emits tuples
• Can be reliable or unreliable (ack() and fail())
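The reliable case can be sketched as follows. This is a minimal toy model in plain Python, not Storm's actual API: the class name `ReliableSpout`, the method `next_tuple`, and the integer message ids are assumptions made for illustration. The spout remembers every emitted message until it is acked, and re-queues it when it is failed. An unreliable spout would simply skip that bookkeeping.

```python
from collections import deque


class ReliableSpout:
    """Toy model of a reliable spout; not Storm's actual API."""

    def __init__(self, messages):
        self._queue = deque(messages)   # messages waiting to be emitted
        self._pending = {}              # msg_id -> message awaiting ack/fail
        self._next_id = 0

    def next_tuple(self):
        """Emit the next message as (msg_id, message), or None if idle."""
        if not self._queue:
            return None
        message = self._queue.popleft()
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = message
        return msg_id, message

    def ack(self, msg_id):
        """The tuple was fully processed: forget it."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: put the message back so it is emitted again."""
        message = self._pending.pop(msg_id, None)
        if message is not None:
            self._queue.append(message)


spout = ReliableSpout(["a", "b"])
first_id, first = spout.next_tuple()
spout.fail(first_id)                 # "a" is queued for replay
second_id, second = spout.next_tuple()
spout.ack(second_id)                 # "b" is done
replayed_id, replayed = spout.next_tuple()
```

Here the failed message "a" comes back after "b" under a fresh id, so a replayed tuple can be told apart from its first attempt.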
![Page 6: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/6.jpg)
INTEGRATION
• Kestrel
• RabbitMQ
• Kafka
• JMS
• Integration is easy with the simple Spout abstraction
![Page 7: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/7.jpg)
BOLTS
![Page 8: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/8.jpg)
BOLTS
• A component that takes tuples as input and produces tuples as output
• Can do filtering, joining, functions, aggregations etc.
• Does not have to process a tuple immediately; may hold on to tuples and process them later
• Comparison with Hadoop: A bolt can be a mapper or a reducer (or anything)
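The three kinds of work listed above (functions, filtering, aggregations) can be illustrated with a small sketch. It is plain Python rather than Storm code: the class names, the `execute` method returning a list of output tuples, and the `min_length` parameter are all assumptions chosen for the example.

```python
from collections import Counter


class SplitBolt:
    """Function-style bolt: one sentence in, one tuple per word out."""

    def execute(self, tup):
        (sentence,) = tup
        return [(word.lower(),) for word in sentence.split()]


class FilterBolt:
    """Filtering bolt: drops words shorter than min_length."""

    def __init__(self, min_length):
        self.min_length = min_length

    def execute(self, tup):
        (word,) = tup
        return [tup] if len(word) >= self.min_length else []


class CountBolt:
    """Aggregating bolt: keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1
        return [(word, self.counts[word])]


split, keep_long, count = SplitBolt(), FilterBolt(min_length=3), CountBolt()
outputs = []
for sentence in [("to be or not to be",), ("Not now",)]:
    for word_tup in split.execute(sentence):
        for kept in keep_long.execute(word_tup):
            outputs.extend(count.execute(kept))
```

In Hadoop terms, `SplitBolt` plays a mapper-like role and `CountBolt` a reducer-like one, but both share the same tuples-in, tuples-out shape.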
![Page 9: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/9.jpg)
STORM TOPOLOGY
![Page 10: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/10.jpg)
STORM TOPOLOGY
• Spouts, bolts and streams
• Distributed
• Runs indefinitely until it is stopped
• Arbitrary complexity
• Streams that require multiple steps also require multiple bolts
• No intermediate queues for streams
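How spouts, bolts and streams fit together can be sketched with a toy, single-process topology. Everything here is an illustrative assumption rather than Storm's API: the `Topology` class, `set_spout`, `set_bolt` and the `subscribes_to` parameter are made-up names, and the spout is a finite list so the example terminates, whereas a real topology runs until it is stopped.

```python
from collections import defaultdict


class Topology:
    """Toy in-process topology; a stand-in for illustration, not Storm's API."""

    def __init__(self):
        self._spouts = {}                       # name -> iterable of tuples
        self._bolts = {}                        # name -> callable(tuple) -> list of tuples
        self._subscribers = defaultdict(list)   # source name -> downstream bolt names

    def set_spout(self, name, source):
        self._spouts[name] = source

    def set_bolt(self, name, func, subscribes_to):
        self._bolts[name] = func
        self._subscribers[subscribes_to].append(name)

    def _deliver(self, source_name, tup):
        # Hand the tuple straight to each subscribed bolt; no queue in between.
        for bolt_name in self._subscribers[source_name]:
            for out in self._bolts[bolt_name](tup):
                self._deliver(bolt_name, out)

    def run(self):
        # A real topology runs until stopped; this one ends when spouts run dry.
        for name, source in self._spouts.items():
            for tup in source:
                self._deliver(name, tup)


counts = defaultdict(int)


def split(tup):
    return [(word,) for word in tup[0].split()]


def count(tup):
    counts[tup[0]] += 1
    return []          # terminal bolt: emits nothing further


topology = Topology()
topology.set_spout("sentences", [("the cow jumped",), ("the moon",)])
topology.set_bolt("split", split, subscribes_to="sentences")
topology.set_bolt("count", count, subscribes_to="split")
topology.run()
```

The two-step computation (split, then count) needs two bolts, and more bolts can subscribe to any named stream to build a larger graph.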
![Page 11: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/11.jpg)
FAULT-TOLERANCE
• Nimbus daemon and Supervisor daemons are fail-fast and stateless
• Each worker sends heartbeats that Nimbus monitors (via ZooKeeper)
• Transactional topologies → Guaranteed processing
[Figure: cluster diagram showing Nimbus, ZooKeeper nodes and Supervisor nodes]
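The heartbeat idea can be illustrated with a small failure detector. This is a sketch under stated assumptions, not Storm's implementation: the `HeartbeatMonitor` class, the worker ids, the explicit `now` arguments and the 30-second timeout are all chosen for the example. A worker counts as dead once its latest heartbeat is older than the timeout, at which point its work can be reassigned.

```python
class HeartbeatMonitor:
    """Toy failure detector; names and timeout are illustrative assumptions."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}    # worker id -> time of latest heartbeat

    def heartbeat(self, worker_id, now):
        self._last_seen[worker_id] = now

    def dead_workers(self, now):
        """Workers whose latest heartbeat is older than the timeout."""
        return sorted(
            worker_id
            for worker_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.heartbeat("worker-1", now=0)
monitor.heartbeat("worker-2", now=0)
monitor.heartbeat("worker-1", now=25)     # worker-2 has gone quiet
to_reassign = monitor.dead_workers(now=40)
```

Passing `now` explicitly instead of reading the clock keeps the sketch deterministic and easy to test.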
![Page 12: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/12.jpg)
USE CASES
• Counting words!
• Realtime analytics - trending topics on Twitter
• Online machine learning
• Continuous computation
• Distributed RPC
• Extract, Transform and Load (ETL)
![Page 13: Short introduction to Storm](https://reader033.fdocuments.net/reader033/viewer/2022051611/54b730284a795942398b461f/html5/thumbnails/13.jpg)
FAST
One benchmark clocked it at over a million tuples processed per second per node
{x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠