Storm - As deep into real-time data processing as you can get in 30 minutes.
![Page 1: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/1.jpg)
Storm
@danklynn
As deep into real-time data processing as you can get in 30 minutes.
![Page 2: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/2.jpg)
Keeps Contact Information Current and Complete
Based in Denver, Colorado
CTO & [email protected]
@danklynn
![Page 3: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/3.jpg)
Turn Partial Contacts Into Full Contacts
![Page 4: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/4.jpg)
Storm
![Page 5: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/5.jpg)
Storm: Distributed and fault-tolerant real-time computation
![Page 6: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/6.jpg)
![Page 7: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/7.jpg)
![Page 8: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/8.jpg)
![Page 9: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/9.jpg)
THE HARD WAY
Queues
Workers
![Page 10: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/10.jpg)
THE HARD WAY
![Page 11: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/11.jpg)
Key Concepts
![Page 12: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/12.jpg)
Tuples: Ordered list of elements
![Page 14: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/14.jpg)
Streams: Unbounded sequence of tuples
![Page 15: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/15.jpg)
Streams: Unbounded sequence of tuples
[Diagram: a stream drawn as a row of tuples]
![Page 16: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/16.jpg)
Spouts: Source of streams
![Page 17: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/17.jpg)
![Page 18: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/18.jpg)
Spouts: Source of streams
[Diagram: a spout emitting a stream of tuples]
![Page 19: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/19.jpg)
Spouts can talk with
• Queues
• Web logs
• API calls
• Event data
Some images from http://commons.wikimedia.org
![Page 20: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/20.jpg)
Bolts: Process tuples and create new streams
![Page 21: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/21.jpg)
Bolts
• Apply functions / transforms
• Filter
• Aggregation
• Streaming joins
• Access DBs, APIs, etc.
![Page 22: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/22.jpg)
Bolts
[Diagram: a bolt consuming a stream of tuples and emitting new streams of tuples]
![Page 23: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/23.jpg)
Topologies: A directed graph of Spouts and Bolts
![Page 24: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/24.jpg)
This is a Topology
![Page 25: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/25.jpg)
This is also a topology
![Page 26: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/26.jpg)
Tasks: Execute Spouts or Bolts
![Page 27: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/27.jpg)
Running a Topology
$ storm jar my-code.jar com.example.MyTopology arg1 arg2
![Page 28: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/28.jpg)
Storm Cluster
Nathan Marz
![Page 29: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/29.jpg)
Storm Cluster
If this were Hadoop...
![Page 30: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/30.jpg)
Storm Cluster
If this were Hadoop...
Nimbus would be the Job Tracker
![Page 31: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/31.jpg)
Storm Cluster
If this were Hadoop...
Supervisors would be the Task Trackers
![Page 32: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/32.jpg)
Storm Cluster
ZooKeeper coordinates everything
But it’s not Hadoop
![Page 33: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/33.jpg)
Example: Streaming Word Count
![Page 34: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/34.jpg)
Streaming Word Count

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("sentences");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

![Page 35: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/35.jpg)
![Page 36: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/36.jpg)
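Streaming Word Count
SplitSentence.java

    public static class SplitSentence extends ShellBolt implements IRichBolt {
        // ShellBolt runs the given command and hands tuples to it
        // over Storm's multi-language protocol.
        public SplitSentence() {
            super("python", "splitsentence.py");
        }

        // Each tuple emitted by this bolt has a single field named "word".
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        // No component-specific configuration.
        @Override
        public Map<String, Object> getComponentConfiguration() {
            return null;
        }
    }
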
![Page 37: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/37.jpg)
Streaming Word Count
SplitSentence.java hands each tuple to splitsentence.py, the Python script named in its constructor
![Page 38: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/38.jpg)
![Page 39: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/39.jpg)
![Page 40: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/40.jpg)
Streaming Word Count
WordCount.java

    public static class WordCount extends BaseBasicBolt {
        // Running total per word, held in memory by this task.
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null) {
                count = 0;
            }
            count++;
            counts.put(word, count);
            // Emit the updated total every time the word is seen.
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

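To see what the split and count steps produce together, here is a minimal plain-Java sketch that runs without a Storm cluster. The class `WordCountSketch` and its methods are hypothetical names, and splitting on whitespace is only an assumption about what `splitsentence.py` does; the counting step mirrors `WordCount.execute`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Storm-free sketch of the split -> count flow.
final class WordCountSketch {
    // Running totals, like the `counts` map held by the WordCount bolt.
    private final Map<String, Integer> counts = new HashMap<>();

    // Assumed behavior of the split step: one word per whitespace-separated token.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Mirrors WordCount.execute: bump the total and "emit" word=count.
    String count(String word) {
        int updated = counts.merge(word, 1, Integer::sum);
        return word + "=" + updated;
    }

    // Feeds each sentence through split and count, collecting every emitted pair.
    List<String> process(List<String> sentences) {
        List<String> emitted = new ArrayList<>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                emitted.add(count(word));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        System.out.println(sketch.process(List.of("the cow jumped", "the moon")));
    }
}
```

Unlike a batch word count, a new pair is emitted for every incoming word, so anything downstream always sees the latest total for that word.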
![Page 41: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/41.jpg)
Streaming Word Count
Groupings control how tuples are routed

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("sentences");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

![Page 42: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/42.jpg)
Shuffle grouping: Tuples are randomly distributed across all of the tasks running the bolt
![Page 43: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/43.jpg)
Fields grouping: Groups tuples by specific named fields and routes them to the same task
![Page 44: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/44.jpg)
Fields grouping
Analogous to Hadoop’s partitioning behavior
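The difference between the two groupings can be sketched in plain Java. This is an illustration only, not Storm's implementation: `GroupingSketch`, `shuffleTask`, and `fieldsTask` are hypothetical names, and hashing the field values modulo the task count is an assumed routing rule chosen to show the "same values, same task" property.

```java
import java.util.List;
import java.util.Random;

// Hypothetical illustration of the two routing strategies; not Storm's code.
final class GroupingSketch {
    // Shuffle grouping: any task may receive the tuple.
    static int shuffleTask(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping (assumed rule): hash the grouping field values so that
    // equal values are always routed to the same task index.
    static int fieldsTask(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (String word : List.of("storm", "bolt", "storm")) {
            System.out.println(word
                    + " shuffle->" + shuffleTask(random, 12)
                    + " fields->" + fieldsTask(List.of(word), 12));
        }
    }
}
```

For the word count, this is why the `count` bolt uses a fields grouping on `word`: each word's running total lives in exactly one task. With a shuffle grouping, the counts for one word would be scattered across tasks.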
![Page 45: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/45.jpg)
Trending Topics
![Page 46: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/46.jpg)
Twitter Trending Topics
• (tweets) → TwitterStreamingTopicSpout, parallelism = 1 (unless you use GNip) → (word)
• (word) → RollingCountsBolt, parallelism = n → (word, count)
• (word, count) → IntermediateRankingsBolt, parallelism = n → (rankings)
• (rankings) → TotalRankingsBolt, parallelism = 1 → (rankings)
• (rankings) → RankingsReportBolt, parallelism = 1 → (JSON rankings)
![Page 47: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/47.jpg)
Live Coding!
![Page 48: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/48.jpg)
![Page 49: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/49.jpg)
Tips
![Page 50: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/50.jpg)
Use a log aggregator
• loggly.com
• Graylog2
• logstash
![Page 51: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/51.jpg)
Rolling Deploys
"$topologyName-$buildNumber"
![Page 52: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/52.jpg)
Rolling Deploys
1. Launch new topology
2. Wait for it to be healthy
3. Kill the old one
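The naming scheme and the "kill the old one" step could be supported by a small helper like the one below. It is a sketch under assumptions: `RollingDeploySketch` and its methods are hypothetical, the list of running topology names is passed in by the caller, and nothing here talks to a Storm cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the naming scheme; it does not contact a cluster.
final class RollingDeploySketch {
    // Builds "$topologyName-$buildNumber".
    static String deployName(String topologyName, int buildNumber) {
        return topologyName + "-" + buildNumber;
    }

    // Returns running topologies that share the base name but carry an older
    // build number, i.e. candidates to kill once the new build is healthy.
    static List<String> staleDeploys(List<String> running, String topologyName, int newBuild) {
        List<String> stale = new ArrayList<>();
        String prefix = topologyName + "-";
        for (String name : running) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            String suffix = name.substring(prefix.length());
            // Only accept short, purely numeric build suffixes.
            if (suffix.matches("\\d{1,9}") && Integer.parseInt(suffix) < newBuild) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> running = List.of("trending-41", "trending-42", "other-7");
        System.out.println(deployName("trending", 42));
        System.out.println(staleDeploys(running, "trending", 42));
    }
}
```

Because the build number is part of the topology name, the old and new versions can run side by side during step 2, and the stale names are the ones to pass to whatever kill mechanism you use in step 3.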
![Page 53: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/53.jpg)
Rolling Deploys
These are under active development
![Page 54: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/54.jpg)
Tune your parallelism

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("sentences");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

See: https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
![Page 55: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/55.jpg)
Tune your parallelism
• Supervisor
  • Worker Process (JVM)
    • Executor (thread): Task, Task
    • Executor (thread): Task, Task
  • Worker Process (JVM)
    • Executor (thread): Task, Task
    • Executor (thread): Task, Task
Parallelism hints control the number of Executors
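The nesting in this diagram (tasks inside executors, executors inside worker processes) can be reproduced with a few lines of plain Java. This is only an illustration: `ParallelismSketch.assign` is a hypothetical helper, and the round-robin placement is an assumption made for the example, not a description of Storm's scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the worker -> executor -> task nesting.
final class ParallelismSketch {
    // Spreads item ids 0..items-1 over the given number of buckets, round-robin.
    static List<List<Integer>> assign(int items, int buckets) {
        List<List<Integer>> result = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            result.add(new ArrayList<>());
        }
        for (int item = 0; item < items; item++) {
            result.get(item % buckets).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // The diagram's shape: 8 tasks, 4 executors (threads), 2 workers (JVMs).
        System.out.println("tasks per executor:   " + assign(8, 4));
        System.out.println("executors per worker: " + assign(4, 2));
    }
}
```

In the topology code from the earlier slides, the third argument to `setSpout` and `setBolt` (5, 8, 12) is the parallelism hint, so it corresponds to the number of executors in this picture; the number of tasks and worker processes is configured separately.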
![Page 56: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/56.jpg)
Anchor your tuples (or not)
Unanchored:

    collector.emit(new Values(word, count));

Anchored to the input tuple:

    collector.emit(tuple, new Values(word, count));

See: https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
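What anchoring buys you can be shown with a toy bookkeeping model. This is a deliberately simplified sketch, not how Storm tracks tuples internally: `AnchoringSketch` and all of its methods are hypothetical. The idea it illustrates is that an anchored tuple is tied to the spout tuple it came from, so a downstream failure can trigger a replay, while an unanchored tuple is invisible to that tracking.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of anchoring; Storm's real tracking mechanism is different.
final class AnchoringSketch {
    // Number of tracked tuples still in flight for each spout tuple id.
    private final Map<String, Integer> pending = new HashMap<>();
    private final Set<String> completed = new HashSet<>();
    private final Set<String> replay = new HashSet<>();

    // The spout emits a tuple that should be tracked.
    void spoutEmit(String rootId) {
        pending.put(rootId, 1);
    }

    // Anchored emit: the new tuple joins the tree of the given spout tuple.
    void emitAnchored(String rootId) {
        pending.merge(rootId, 1, Integer::sum);
    }

    // Unanchored emit: nothing is recorded, so later failures cannot be traced back.
    void emitUnanchored() {
    }

    // One tracked tuple of this tree finished processing.
    void ack(String rootId) {
        Integer n = pending.get(rootId);
        if (n == null) {
            return;
        }
        if (n == 1) {
            pending.remove(rootId);
            completed.add(rootId);
        } else {
            pending.put(rootId, n - 1);
        }
    }

    // One tracked tuple of this tree failed: the spout tuple must be replayed.
    void fail(String rootId) {
        if (pending.remove(rootId) != null) {
            replay.add(rootId);
        }
    }

    boolean isCompleted(String rootId) {
        return completed.contains(rootId);
    }

    boolean needsReplay(String rootId) {
        return replay.contains(rootId);
    }

    public static void main(String[] args) {
        AnchoringSketch tracker = new AnchoringSketch();
        tracker.spoutEmit("a");
        tracker.emitAnchored("a"); // bolt emits a child anchored to "a"
        tracker.ack("a");          // bolt acks its input
        tracker.fail("a");         // the child fails downstream
        System.out.println("anchored, replay needed: " + tracker.needsReplay("a"));

        tracker.spoutEmit("b");
        tracker.emitUnanchored();  // child is not tied to "b"
        tracker.ack("b");          // bolt acks its input
        System.out.println("unanchored, completed: " + tracker.isCompleted("b"));
    }
}
```

The "(or not)" in the slide title is the trade-off: anchoring gives you replay on failure at the cost of tracking overhead, while skipping it is cheaper when losing an occasional tuple is acceptable.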
![Page 57: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/57.jpg)
But Dan, you left out Trident!
![Page 58: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/58.jpg)
if (storm == hadoop) { trident = pig / cascading}
![Page 59: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/59.jpg)
A little taste of Trident

    TridentState urlToTweeters =
        topology.newStaticState(getUrlToTweetersState());
    TridentState tweetersToFollowers =
        topology.newStaticState(getTweeterToFollowersState());

    topology.newDRPCStream("reach")
        .stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters"))
        .each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter"))
        .shuffle()
        .stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers"))
        .parallelismHint(200)
        .each(new Fields("followers"), new ExpandList(), new Fields("follower"))
        .groupBy(new Fields("follower"))
        .aggregate(new One(), new Fields("one"))
        .parallelismHint(20)
        .aggregate(new Count(), new Fields("reach"));

https://github.com/nathanmarz/storm/wiki/Trident-tutorial
![Page 60: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/60.jpg)
Thanks:
• @stormprocessor, http://github.com/nathanmarz/storm
• Nathan Marz - @nathanmarz
• Michael Noll - @miguno, http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
• Michael Rose - @xorlev, http://github.com/xorlev
![Page 61: Storm - As deep into real-time data processing as you can get in 30 minutes.](https://reader037.fdocuments.net/reader037/viewer/2022110118/554f6423b4c905c8088b4c3d/html5/thumbnails/61.jpg)
https://github.com/danklynn/storm-starter/tree/gluecon2013