Apache Storm Internals

Page 1: Apache Storm Internals

STORM ANATOMY

Cloud Computing Course Prof Hanku Lee

Social Media Cloud Computing lab MS Akhmedov Khumoyun

Page 2: Apache Storm Internals

What is Stream processing

Stream processing is a paradigm for processing large volumes of unbounded sequences of tuples in real time

[Diagram: Source → Stream → Processor; the arrow denotes a stream]

• Continuous analytics
• Online machine learning
• Sensor data monitoring
• Financial trading …

Page 3: Apache Storm Internals

Storm at Twitter

Twitter Web Analytics

Page 4: Apache Storm Internals

What is Storm?

Storm is a

• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source

distributed realtime computation system

- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure

Page 5: Apache Storm Internals

Conceptual View

Page 6: Apache Storm Internals

Physical View

Page 7: Apache Storm Internals

Concepts

Streams Spouts Bolts Topologies

Page 8: Apache Storm Internals

Streams

Unbounded sequence of tuples
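
As a rough illustration in plain Python (not Storm's API), an unbounded stream can be modeled as a generator that never terminates, with consumers pulling only as many tuples as they need. The name `sensor_stream` and the tuple layout are assumptions made up for this sketch.

```python
import itertools
from typing import Iterator, Tuple


def sensor_stream() -> Iterator[Tuple[int, float]]:
    """Hypothetical unbounded stream of (sequence_number, reading) tuples."""
    for seq in itertools.count():
        # Deterministic fake reading so the example is reproducible.
        yield (seq, 20.0 + (seq % 5) * 0.5)


# The stream never ends, so a consumer takes a finite slice of it.
first_three = list(itertools.islice(sensor_stream(), 3))
print(first_three)
```

The generator holds no buffer of past tuples, which mirrors the idea that a stream is processed as it arrives rather than stored first.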

Page 9: Apache Storm Internals

Spouts

Source of streams

• Read from Kafka queue
• Read from Twitter Streaming API
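
A minimal sketch of the spout idea in plain Python, assuming an in-memory `deque` as a stand-in for the external queue. The class `QueueSpout` and its `next_tuple` method are hypothetical names for this illustration, not Storm's interface.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class QueueSpout:
    """Hypothetical spout: turns messages from an external queue into tuples."""

    def __init__(self, source: Deque[str]) -> None:
        self.source = source  # stand-in for a Kafka queue or an API client
        self.next_id = 0      # message id attached to each emitted tuple

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        """Return the next (message_id, payload) tuple, or None when idle."""
        if not self.source:
            return None  # nothing to emit right now
        payload = self.source.popleft()
        emitted = (self.next_id, payload)
        self.next_id += 1
        return emitted


spout = QueueSpout(deque(["tweet a", "tweet b"]))
emitted = [spout.next_tuple() for _ in range(3)]
print(emitted)
```

Attaching a message id to every emitted tuple is what later allows a failed tuple to be identified and replayed from the source.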

Page 10: Apache Storm Internals

Bolts

Processes input streams and produces new streams

Page 11: Apache Storm Internals

Bolts

• Functions
• Filters
• Aggregation
• Joins
• Talk to databases
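
The first three bolt roles can be sketched as chained Python generators. This is only an illustration of the roles; the function names `split_words`, `drop_short`, and `count_words` are assumptions, and a real bolt would run inside Storm rather than as a local function.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def split_words(sentences: Iterable[str]) -> Iterator[str]:
    """Function bolt: one input tuple may produce several output tuples."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def drop_short(words: Iterable[str], min_len: int = 3) -> Iterator[str]:
    """Filter bolt: pass through only tuples that satisfy a predicate."""
    return (w for w in words if len(w) >= min_len)


def count_words(words: Iterable[str]) -> Dict[str, int]:
    """Aggregation bolt: keep running state across tuples."""
    counts: Counter = Counter()
    for w in words:
        counts[w] += 1
    return dict(counts)


sentences = ["the cow jumped", "the moon is up"]
counts = count_words(drop_short(split_words(sentences)))
print(counts)
```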

Page 12: Apache Storm Internals

Topology

Network of spouts and bolts
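
To make "network of spouts and bolts" concrete, here is a small in-process sketch. The `Topology` class and its methods are hypothetical and run in a single thread; they only show how named components are wired into a graph by subscriptions.

```python
from typing import Callable, Dict, Iterable, List


class Topology:
    """Hypothetical in-process topology: named nodes wired by subscriptions."""

    def __init__(self) -> None:
        self.spouts: Dict[str, Iterable] = {}
        self.bolts: Dict[str, Callable[[object], List]] = {}
        self.subscribers: Dict[str, List[str]] = {}

    def add_spout(self, name: str, tuples: Iterable) -> None:
        self.spouts[name] = tuples
        self.subscribers.setdefault(name, [])

    def add_bolt(self, name: str, fn: Callable[[object], List], source: str) -> None:
        # The source component must already have been added.
        self.bolts[name] = fn
        self.subscribers.setdefault(name, [])
        self.subscribers[source].append(name)  # edge: source -> name

    def run(self) -> Dict[str, List]:
        """Push every spout tuple through the graph; record each bolt's output."""
        outputs: Dict[str, List] = {name: [] for name in self.bolts}

        def deliver(node: str, tup: object) -> None:
            for bolt in self.subscribers[node]:
                for out in self.bolts[bolt](tup):
                    outputs[bolt].append(out)
                    deliver(bolt, out)

        for name, tuples in self.spouts.items():
            for tup in tuples:
                deliver(name, tup)
        return outputs


topo = Topology()
topo.add_spout("sentences", ["a b", "b c"])
topo.add_bolt("split", lambda s: s.split(), source="sentences")
topo.add_bolt("shout", lambda w: [w.upper()], source="split")
result = topo.run()
print(result)
```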

Page 13: Apache Storm Internals

Tasks

Spouts and bolts execute as many tasks across the cluster
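
A sketch of the idea that each component runs as several tasks spread over worker machines. The round-robin placement below is an assumption chosen for simplicity, not Storm's actual scheduler, and `assign_tasks` is a hypothetical helper.

```python
from typing import Dict, List


def assign_tasks(parallelism: Dict[str, int], workers: List[str]) -> Dict[str, List[str]]:
    """Spread each component's tasks over the workers in round-robin order."""
    placement: Dict[str, List[str]] = {w: [] for w in workers}
    slot = 0
    for component, n_tasks in parallelism.items():
        for index in range(n_tasks):
            worker = workers[slot % len(workers)]
            placement[worker].append(f"{component}#{index}")
            slot += 1
    return placement


placement = assign_tasks({"spout": 2, "count-bolt": 3}, ["worker-1", "worker-2"])
print(placement)
```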

Page 14: Apache Storm Internals

Stream grouping

When a tuple is emitted, which task does it go to?

Page 15: Apache Storm Internals

Stream grouping

• Shuffle grouping: pick a random task

• Fields grouping: consistent hashing on a subset of tuple fields

• All grouping: send to all tasks

• Global grouping: pick task with lowest id
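
The four groupings above can be sketched as functions that map a tuple to the list of receiving task ids. This is an illustration only: the function names are hypothetical, and the fields grouping uses a plain CRC32 hash modulo the task count as a simple stand-in for whatever hashing Storm applies internally.

```python
import random
import zlib
from typing import Dict, List, Sequence


def shuffle_grouping(task_ids: Sequence[int], rng: random.Random) -> List[int]:
    """Shuffle grouping: pick one task at random."""
    return [rng.choice(task_ids)]


def fields_grouping(tup: Dict[str, str], fields: Sequence[str],
                    task_ids: Sequence[int]) -> List[int]:
    """Fields grouping: equal values in the chosen fields reach the same task."""
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    return [task_ids[zlib.crc32(key) % len(task_ids)]]


def all_grouping(task_ids: Sequence[int]) -> List[int]:
    """All grouping: every task receives a copy of the tuple."""
    return list(task_ids)


def global_grouping(task_ids: Sequence[int]) -> List[int]:
    """Global grouping: the task with the lowest id receives every tuple."""
    return [min(task_ids)]


tasks = [3, 4, 5, 6]
rng = random.Random(0)
word_a = {"word": "storm", "count": "1"}
word_b = {"word": "storm", "count": "7"}
same_task = (fields_grouping(word_a, ["word"], tasks)
             == fields_grouping(word_b, ["word"], tasks))
print(same_task)               # True: only the "word" field is hashed
print(all_grouping(tasks))     # [3, 4, 5, 6]
print(global_grouping(tasks))  # [3]
```

Fields grouping is what makes per-key state such as word counts correct, because every tuple for a given key lands on the same task.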

Page 16: Apache Storm Internals

Starting topology

Page 17: Apache Storm Internals

Starting topology

Page 18: Apache Storm Internals

Storm : Fault-tolerance

Page 19: Apache Storm Internals

Storm : Fault-tolerance

Page 20: Apache Storm Internals

Storm : Fault-tolerance

Page 21: Apache Storm Internals

Storm : Fault-tolerance

Page 22: Apache Storm Internals

Storm : Fault-tolerance

Page 23: Apache Storm Internals

Guarantees messages will be processed
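
The slide does not say how the guarantee is implemented. Storm's documentation describes acker tasks that track the tree of tuples descended from each spout tuple with an XOR checksum. The sketch below is a simplified version of that idea; the `Acker` class, its method names, and the small fixed ids are assumptions for illustration (Storm uses random 64-bit ids).

```python
from typing import Dict, Iterable


class Acker:
    """Simplified XOR tracker for tuple trees (names are hypothetical)."""

    def __init__(self) -> None:
        self.pending: Dict[int, int] = {}  # root id -> XOR of outstanding ids

    def register(self, root_id: int) -> None:
        """Start tracking a spout tuple; the root itself is outstanding."""
        self.pending[root_id] = root_id

    def ack(self, root_id: int, acked_id: int,
            new_child_ids: Iterable[int] = ()) -> bool:
        """XOR out the acked tuple and XOR in any children it produced.

        Returns True once the whole tree has been processed.
        """
        value = self.pending[root_id] ^ acked_id
        for child in new_child_ids:
            value ^= child
        if value == 0:
            del self.pending[root_id]  # every emitted tuple was acked
            return True
        self.pending[root_id] = value
        return False


acker = Acker()
root, child_1, child_2 = 0x1A, 0x2B, 0x3C
acker.register(root)
steps = [
    acker.ack(root, root, [child_1, child_2]),  # first bolt emits two children
    acker.ack(root, child_1),                   # downstream bolt finishes one
    acker.ack(root, child_2),                   # and then the other
]
print(steps)
```

Each id is XORed in once when created and once when acked, so the checksum returns to zero exactly when nothing is outstanding. A tree that does not complete within a timeout can then be replayed from the spout.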

Page 24: Apache Storm Internals

Message Passing (ZeroMQ)

Page 25: Apache Storm Internals

Easy to set up & operate

• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
  - ZeroMQ 2.1.7 and JZMQ
  - Java 6 and Python 2.6.6
  - unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch daemons under supervision using the “storm” script

Page 26: Apache Storm Internals

Cluster Summary

Page 27: Apache Storm Internals

Topology Summary

Page 28: Apache Storm Internals

Component Summary

Page 29: Apache Storm Internals

Advanced Topics

• Distributed RPC

• Transactional topologies

• Trident

• Using non-JVM languages with Storm

• Unit testing

• Patterns

Page 30: Apache Storm Internals

Real-time Twitter Analytics

Trending Topics and Sentiment Analysis

[Architecture diagram; components shown: Twitter, Twitter Crawler, Kafka, Storm Cluster, Hadoop (HDFS and HBase), MySQL]

Page 31: Apache Storm Internals
Page 32: Apache Storm Internals

THANK YOU FOR YOUR ATTENTION

Any Questions Are Welcome…