Storm - Altamira University Presentation

Apache Storm: A distributed, real-time computation system
Some content borrowed from Nathan Marz's presentation of a similar na
Ryan Lanman


Transcript of Storm - Altamira University Presentation

Page 2: Storm - Altamira University Presentation

Objectives
1. Their Motivation
2. Our Motivation
3. Storm Basics
4. Demo

Page 3: Storm - Altamira University Presentation

Their Motivation
How Storm Came To Be

Page 10: Storm - Altamira University Presentation

What They Wanted
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• No intermediate message brokers!
• Higher level abstraction than message passing
• "Just works"

Page 11: Storm - Altamira University Presentation

Our Motivation
Why We (eventually) Chose Storm

Page 12: Storm - Altamira University Presentation

Lumify Ingest

[Diagram labels: Raw Data, Text Extraction, Entity Extraction, Text Highlighting, Location Extraction, Full Text Indexing]

Page 13: Storm - Altamira University Presentation

Issues

• No Reducers
• High DB Read/Writes
• Batch-style processing
• M/R Overhead
• Zero Fault Tolerance

Page 14: Storm - Altamira University Presentation

What We Really Wanted

• Distributed, Stream-type Processing
• Simple Logical DAG
• Better Fault Tolerance

Page 15: Storm - Altamira University Presentation

Storm Ingest Workflow

[Diagram labels: Text, Documents, Video, Images, Raw Data Content Sorter, Text Extraction, Video Frame Splitting, Video Frame Text Extraction, Image Text Extraction]
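
The workflow labels suggest a sorter that inspects each raw item and sends it down a type-specific path. The sketch below is one way to picture that routing step in plain Java. It is an assumption made for illustration: the class `ContentSorterSketch`, the `route` method, the file extensions, and the exact path each type takes are hypothetical and are not taken from the Lumify code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical content sorter: picks a processing path for each raw item.
class ContentSorterSketch {
    // Assumed routing rule: choose the path from the file extension.
    static List<String> route(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".mp4") || name.endsWith(".avi")) {
            return Arrays.asList("Video Frame Splitting", "Video Frame Text Extraction");
        }
        if (name.endsWith(".png") || name.endsWith(".jpg")) {
            return Arrays.asList("Image Text Extraction");
        }
        // Everything else is treated as a document in this sketch.
        return Arrays.asList("Text Extraction");
    }

    public static void main(String[] args) {
        for (String f : Arrays.asList("report.pdf", "clip.mp4", "scan.PNG")) {
            System.out.println(f + " -> " + route(f));
        }
    }
}
```

In a stream-style design each returned stage would be its own processing step, so a slow video path does not hold up documents.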

Page 16: Storm - Altamira University Presentation

Storm Basics
What the heck's a Topology?

Page 17: Storm - Altamira University Presentation

Storm Cluster

[Diagram labels: Nimbus, Zookeeper (x3), Supervisor (x5)]

Page 21: Storm - Altamira University Presentation

Storm Data Concepts
• Tuples
• Streams
• Spouts
• Bolts
• Topologies

Page 22: Storm - Altamira University Presentation

Tuples

• Single unit of data in Storm
• Examples
  – Tweet
  – User Activity Log Entry
  – File Info
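
As a rough illustration of the tuple idea, the sketch below models a tuple as a list of values paired with field names. The class `SimpleTuple` and its method are hypothetical stand-ins written for this transcript, not Storm's own tuple type, and the tweet fields are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple: a list of values with named fields.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("each field needs exactly one value");
        }
        this.fields = fields;
        this.values = values;
    }

    // Look up a value by its field name.
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }

    public static void main(String[] args) {
        // A tweet represented as a two-field tuple (made-up data).
        SimpleTuple tweet = new SimpleTuple(
                Arrays.asList("user", "text"),
                Arrays.<Object>asList("alice", "storm is fast"));
        System.out.println(tweet.getValueByField("text"));
    }
}
```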

Page 23: Storm - Altamira University Presentation

Streams

[Diagram: a row of Tuples]

An unbounded sequence of Tuples

Page 24: Storm - Altamira University Presentation

Spouts

Producers of Streams

[Diagram: a Spout emitting streams of Tuples]
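
To make "producer of a stream" concrete, here is a minimal plain-Java sketch of a source that never runs out of tuples. `CyclingSpout`, `next`, and `take` are hypothetical names invented for this example, and it deliberately does not use Storm's spout interface; a tuple is reduced to a single String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical spout: hands out one tuple per call and never runs dry,
// which is what makes the resulting stream unbounded.
class CyclingSpout {
    private final List<String> source;
    private long emitted = 0;

    CyclingSpout(List<String> source) {
        if (source.isEmpty()) {
            throw new IllegalArgumentException("source must not be empty");
        }
        this.source = source;
    }

    // Return the next tuple, cycling through the source forever.
    String next() {
        String value = source.get((int) (emitted % source.size()));
        emitted++;
        return value;
    }

    // The consumer, not the spout, decides how much of the stream to read.
    static List<String> take(CyclingSpout spout, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < n; i++) {
            out.add(spout.next());
        }
        return out;
    }

    public static void main(String[] args) {
        CyclingSpout spout = new CyclingSpout(Arrays.asList("a tweet", "another tweet"));
        System.out.println(take(spout, 5));
    }
}
```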

Page 25: Storm - Altamira University Presentation

Bolts

Process input streams to create new streams

[Diagram: Tuples flowing into and out of a Bolt]
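
A bolt can be pictured as a function from one input tuple to zero or more output tuples. The sketch below shows a splitting bolt and a filtering bolt under that view. `BoltSketch`, `SimpleBolt`, and `run` are hypothetical names for this illustration only and are not Storm's bolt API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BoltSketch {
    // Hypothetical bolt contract: one input tuple in, zero or more tuples out.
    interface SimpleBolt {
        List<String> process(String input);
    }

    // Splitting bolt: emits one tuple per whitespace-separated token.
    static final SimpleBolt SPLIT = s -> Arrays.asList(s.trim().split("\\s+"));

    // Filtering bolt: drops words shorter than four characters.
    static final SimpleBolt LONG_WORDS_ONLY =
            w -> w.length() >= 4 ? Collections.singletonList(w) : Collections.<String>emptyList();

    // Feed every tuple of the input stream through a bolt to get a new stream.
    static List<String> run(SimpleBolt bolt, List<String> inputStream) {
        List<String> out = new ArrayList<String>();
        for (String tuple : inputStream) {
            out.addAll(bolt.process(tuple));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("storm is a fast system");
        List<String> words = run(SPLIT, sentences);
        System.out.println(run(LONG_WORDS_ONLY, words));
    }
}
```

Chaining `run` calls, as `main` does, mirrors how the output stream of one bolt becomes the input stream of the next.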

Page 26: Storm - Altamira University Presentation

Examples

Spout Examples
• HDFS Filesystem Spout
• Kafka Queue Spout

Bolt Examples
• Filtering
• Aggregation
• DB Operations

Page 27: Storm - Altamira University Presentation

Topologies

[Diagram labels: Spout (x3)]

Page 28: Storm - Altamira University Presentation

Demo

Page 30: Storm - Altamira University Presentation

Demo Topology

[Diagram labels: Twitter, Twitter Hosebird Spout, SentenceSplitter, WordCount, Accumulo; groupings shown: ShuffleGrouping, Field Grouping]
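
The two groupings named on this slide decide which task of the next bolt receives each tuple. The diagram labels do not show which edge uses which grouping; the usual word-count arrangement, assumed here, shuffles sentences to the splitter and groups words by field into the counter. The plain-Java sketch below shows why that matters for counting. All names in it are hypothetical, hashing stands in for the field grouping, and round-robin stands in for the shuffle, which would be randomized in practice. In Storm's Java API the corresponding calls are `shuffleGrouping` and `fieldsGrouping`, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupingSketch {
    // Field-grouping stand-in: the task is chosen from the word itself,
    // so equal words always land on the same task.
    static int fieldGroupingTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Shuffle-grouping stand-in: round-robin by arrival order
    // (deterministic here; a real shuffle would be randomized).
    static int shuffleGroupingTask(int arrivalIndex, int numTasks) {
        return arrivalIndex % numTasks;
    }

    // Each task keeps its own private word counts.
    static List<Map<String, Integer>> countPerTask(List<String> words, int numTasks, boolean byField) {
        List<Map<String, Integer>> tasks = new ArrayList<Map<String, Integer>>();
        for (int t = 0; t < numTasks; t++) {
            tasks.add(new HashMap<String, Integer>());
        }
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            int task = byField ? fieldGroupingTask(w, numTasks) : shuffleGroupingTask(i, numTasks);
            tasks.get(task).merge(w, 1, Integer::sum);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "storm", "storm", "bolt");
        System.out.println("field:   " + countPerTask(words, 2, true));
        System.out.println("shuffle: " + countPerTask(words, 2, false));
    }
}
```

With the field-style routing a single task holds the full count for each word, while the shuffle-style routing spreads one word's count over several tasks, so no single task can report the total.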