Apache Storm Internals
STORM ANATOMY
Cloud Computing Course Prof Hanku Lee
Social Media Cloud Computing lab MS Akhmedov Khumoyun
What is Stream processing
Stream processing is a paradigm for processing large volumes of unbounded sequences of tuples (streams) in real time
Source Stream Processor
• Continuous analytics
• Online machine learning
• Sensor data monitoring
• Financial trading …
Storm at Twitter
Twitter Web Analytics
What is Storm?
Storm is a distributed realtime computation system:
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to setup & operate
• Free & open source
- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure
Conceptual View
Physical View
Concepts
Streams Spouts Bolts Topologies
Streams
Unbounded sequence of tuples
Spouts
Source of streams
• Read from Kafka queue
• Read from Twitter Streaming API
Bolts
Processes input streams and produces new streams
• Functions
• Filters
• Aggregation
• Joins
• Talk to databases
Topology
Network of spouts and bolts
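To make the four concepts concrete, here is a minimal in-process sketch in plain Python. It does not use the Storm API; the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical and exist only for illustration. Generators stand in for unbounded streams, and chaining them stands in for a (single-task, linear) topology.

```python
from collections import Counter
from itertools import cycle, islice


def sentence_spout():
    """Spout: an endless source of one-field tuples (sample data is made up)."""
    for sentence in cycle(["the cow jumped", "the moon rose"]):
        yield (sentence,)


def split_bolt(stream):
    """Bolt acting as a function: one sentence tuple in, one tuple per word out."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt acting as an aggregation: emits a running (word, count) tuple."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(limit):
    """Wire spout -> split -> count and take a finite prefix of the output."""
    topology = count_bolt(split_bolt(sentence_spout()))
    return list(islice(topology, limit))
```

Calling `run_topology(4)` pulls four tuples through the chain. In a real cluster each spout and bolt would run as many parallel tasks rather than as one generator.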
Tasks
Spouts and bolts execute as many tasks across the cluster
Stream grouping
When a tuple is emitted, which task does it go to?
• Shuffle grouping: pick a random task
• Fields grouping: hash a subset of tuple fields modulo the number of tasks, so equal field values always go to the same task
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
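The four groupings can be sketched as functions that map a tuple to the list of target task indexes. This is an illustrative model in plain Python, not Storm code; the function names and the use of CRC32 as the hash are assumptions made for the sketch.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick one task at random."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field_indexes):
    """Hash the selected fields; equal values always land on the same task."""
    key = repr(tuple(tup[i] for i in field_indexes)).encode("utf-8")
    # CRC32 is used because it is stable across runs (an assumption of this sketch).
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send everything to the task with the lowest id."""
    return [0]
```

For a word count, grouping on the word field is what keeps each word's counter on a single task: `fields_grouping(("storm", 1), 4, [0])` and `fields_grouping(("storm", 99), 4, [0])` return the same task.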
Starting topology
Storm : Fault-tolerance
Guarantees messages will be processed
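Storm's documentation describes acker tasks that follow each spout tuple's tree of derived tuples by XOR-ing random 64-bit tuple ids into a single value per spout tuple; when the value returns to zero, every tuple in the tree has been acked. The sketch below models only that bookkeeping. `AckTracker`, `new_id` and their methods are hypothetical names, and timeouts and replay are left out.

```python
import random


def new_id(rng=random):
    """Random 64-bit id, as used to label each emitted tuple."""
    return rng.getrandbits(64)


class AckTracker:
    """Tracks one XOR value per spout tuple (root)."""

    def __init__(self):
        self.pending = {}  # root id -> running XOR of outstanding tuple ids

    def emit_root(self, root_id, tuple_id):
        """Spout emitted a new tuple: start tracking its tree."""
        self.pending[root_id] = tuple_id

    def ack(self, root_id, acked_id, new_child_ids=()):
        """A bolt acked `acked_id` after emitting `new_child_ids` anchored to it.

        Returns True once the whole tree is processed.
        """
        value = self.pending[root_id] ^ acked_id
        for child_id in new_child_ids:
            value ^= child_id
        if value == 0:
            del self.pending[root_id]
            return True
        self.pending[root_id] = value
        return False
```

Each id is XOR-ed in once when emitted and once when acked, so the value cancels to zero only when nothing is outstanding. The memory cost is constant per spout tuple, regardless of tree size.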
Message Passing (ZeroMQ)
Easy to setup & operate
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in mandatory configuration in storm.yaml
• Launch daemons under supervision using the “storm” script
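As a rough illustration of the storm.yaml step, a minimal configuration for a ZeroMQ-era release might look like the fragment below. The keys are the ones the Storm cluster setup guide lists as mandatory for those releases; every hostname, path and port value is a placeholder chosen for this example.

```yaml
# Placeholder values throughout; replace with your own hosts and paths.
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"

# Local directory where Nimbus and Supervisor daemons keep small amounts of state.
storm.local.dir: "/var/storm"

# Master node that workers contact to download topology jars and configs.
nimbus.host: "nimbus.example.com"

# One worker process per listed port on each worker machine.
supervisor.slots.ports:
  - 6700
  - 6701
```

The number of ports under `supervisor.slots.ports` sets how many worker processes each machine can run.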
Cluster Summary
Topology Summary
Component Summary
Advanced Topics
• Distributed RPC
• Transactional topologies
• Trident
• Using non-JVM languages with Storm
• Unit testing
• Patterns
Real-time Twitter Analytics
Trending Topics and Sentiment Analysis
MySQL
Kafka
Storm Cluster
Hadoop (HDFS and HBase)
Twitter Crawler
THANK YOU FOR YOUR ATTENTION
Any Questions Are Welcome…