Storm Gluecon
Transcript of Storm Gluecon
-
Storm
@danklynn
As deep into real-time data processing as you can get in 30 minutes.
-
Keeps Contact Information Current and Complete
Based in Denver, Colorado
CTO & [email protected]
@danklynn
-
Turn Partial Contacts Into Full Contacts
-
Storm
-
Storm
Distributed and fault-tolerant real-time computation
-
THE HARD WAY
Queues
Workers
-
THE HARD WAY
-
Key Concepts
-
Tuples
Ordered list of elements
("search-01384", "e:[email protected]")
-
Streams
Unbounded sequence of tuples
Tuple Tuple Tuple Tuple Tuple Tuple
-
Spouts
Source of streams
Tuple Tuple Tuple Tuple Tuple Tuple
-
Spouts can talk with
Queues
Web logs
API calls
Event data
some images from http://commons.wikimedia.org
-
Bolts
Process tuples and create new streams
-
Bolts
Apply functions / transforms
Filter
Aggregation
Streaming joins
Access DBs, APIs, etc...
-
Bolts
(diagram: an incoming stream of tuples is processed into several new streams of tuples)
-
Topologies
A directed graph of Spouts and Bolts
-
This is a Topology
some images from http://commons.wikimedia.org
-
This is also a topology
-
Tasks
Execute Spouts or Bolts
-
Running a Topology
$ storm jar my-code.jar com.example.MyTopology arg1 arg2
-
Storm Cluster
(diagram by Nathan Marz)
-
Storm Cluster
If this were Hadoop...
Job Tracker
-
Storm Cluster
If this were Hadoop...
Task Trackers
-
Storm Cluster
Coordinates everything
But it's not Hadoop
-
Example:Streaming Word Count
-
Streaming Word Count

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);

builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("sentences");

builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));
-
Streaming Word Count

public static class SplitSentence extends ShellBolt implements IRichBolt {
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map getComponentConfiguration() {
        return null;
    }
}

SplitSentence.java
splitsentence.py
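The transcript doesn't show the Python side. A minimal sketch of the splitting logic splitsentence.py performs (the real file in storm-starter wraps this in Storm's multilang protocol via `import storm` and a `storm.BasicBolt` subclass, which isn't reproduced here):

```python
# Sketch of the per-tuple logic inside splitsentence.py (hypothetical
# standalone version; the actual bolt emits one output tuple per word).
def split_sentence(sentence):
    # "the cow jumped" -> ["the", "cow", "jumped"]
    return sentence.split(" ")
```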
-
Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);

builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("sentences");

builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));
java
-
Streaming Word Count
public static class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
WordCount.java
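Stripped of the Storm API, the counting logic in WordCount.java can be modeled in a few lines. This is an illustrative rewrite, not part of the deck:

```python
from collections import defaultdict

# Framework-free model of the WordCount bolt: each task keeps its own
# in-memory counts and emits (word, running_count) for every input tuple.
class WordCountBolt:
    def __init__(self):
        self.counts = defaultdict(int)

    def execute(self, word):
        self.counts[word] += 1
        return (word, self.counts[word])

bolt = WordCountBolt()
bolt.execute("storm")  # ("storm", 1)
bolt.execute("storm")  # ("storm", 2)
```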
-
Streaming Word Count
TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);

builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("sentences");

builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));
java
Groupings control how tuples are routed
-
Shuffle grouping
Tuples are randomly distributed across all of the tasks running the bolt
-
Fields grouping
Groups tuples by specific named fields and routes them to the same task
Analogous to Hadoop's partitioning behavior
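A toy model of the two groupings (not Storm's actual routing code; hash-mod assignment is assumed purely for illustration):

```python
import random

# Fields grouping: route by a hash of the named field, so every tuple
# with the same field value lands on the same task within a run.
def fields_grouping(tup, field_index, num_tasks):
    return hash(tup[field_index]) % num_tasks

# Shuffle grouping: pick a task at random for load balancing.
def shuffle_grouping(num_tasks):
    return random.randrange(num_tasks)

# Every ("storm", ...) tuple maps to a single task:
tasks = {fields_grouping(("storm", 1), 0, 12) for _ in range(100)}
```

This is why WordCount above can keep per-task counts in a plain map: the fields grouping guarantees one task ever sees a given word.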
-
Trending Topics
-
Twitter Trending Topics
TwitterStreamingTopicSpout: parallelism = 1 (unless you use GNip)
  | (tweets) -> (word)
RollingCountsBolt: parallelism = n
  | (word, count)
IntermediateRankingsBolt: parallelism = n
  | (rankings)
TotalRankingsBolt: parallelism = 1
  | (rankings)
RankingsReportBolt: parallelism = 1
  | (JSON rankings)
-
Live Coding!
-
Twitter Trending Topics
TwitterStreamingTopicSpout: parallelism = 1 (unless you use GNip)
  | (tweets) -> (word)
RollingCountsBolt: parallelism = n
  | (word, count)
IntermediateRankingsBolt: parallelism = n
  | (rankings)
TotalRankingsBolt: parallelism = 1
  | (rankings)
RankingsReportBolt: parallelism = 1
  | (JSON rankings)
-
Tips
-
Use a log aggregator
loggly.com
Graylog2
logstash
-
"$topologyName-$buildNumber"
Rolling Deploys
-
1. Launch new topology
2. Wait for it to be healthy
3. Kill the old one
Rolling Deploys
-
These are under active development
Rolling Deploys
-
TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);

builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("sentences");

builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));
java
see:https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
Tune your parallelism
-
Tune your parallelism
Supervisor
  Worker Process (JVM)
    Executor (thread)
      Task
      Task
    Executor (thread)
      Task
      Task
  Worker Process (JVM)
    Executor (thread)
      Task
      Task
    Executor (thread)
      Task
      Task
Parallelism hints control the number of Executors
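The hierarchy in the diagram works out to arithmetic like this (a sketch with the diagram's illustrative numbers; the real counts come from worker configuration, parallelism hints, and task settings):

```python
# Toy arithmetic for the hierarchy shown above: a supervisor hosts
# worker processes (JVMs); each worker runs executors (threads); each
# executor runs one or more tasks of a spout or bolt.
workers = 2                # worker processes, as in the diagram
executors_per_worker = 2   # parallelism hints set executor counts
tasks_per_executor = 2

total_tasks = workers * executors_per_worker * tasks_per_executor  # 8
```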
-
Anchor your tuples (or not)

collector.emit(new Values(word, count));         // unanchored
collector.emit(tuple, new Values(word, count));  // anchored

see: https://github.com/nathanmarz/storm/wiki/Understanding-the-parallelism-of-a-Storm-topology
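Why anchoring matters, as a toy model (this is not Storm's implementation; it only mimics the replay bookkeeping):

```python
# Toy model: an anchored emit ties a downstream tuple to its root spout
# tuple, so a downstream failure marks the root for replay. Unanchored
# tuples have no such link, so failures downstream are not replayed.
class RootTuple:
    def __init__(self):
        self.needs_replay = False

def emit(root, anchored):
    # Returns the child's link back to the root (None if unanchored).
    return root if anchored else None

def fail_downstream(link):
    if link is not None:
        link.needs_replay = True  # the spout will re-emit the root tuple

root = RootTuple()
fail_downstream(emit(root, anchored=True))  # root is now marked for replay
```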
-
But Dan, you left out Trident!
-
if (storm == hadoop) { trident = pig / cascading}
-
A little taste of Trident

TridentState urlToTweeters =
    topology.newStaticState(getUrlToTweetersState());
TridentState tweetersToFollowers =
    topology.newStaticState(getTweeterToFollowersState());

topology.newDRPCStream("reach")
    .stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters"))
    .each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter"))
    .shuffle()
    .stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers"))
    .parallelismHint(200)
    .each(new Fields("followers"), new ExpandList(), new Fields("follower"))
    .groupBy(new Fields("follower"))
    .aggregate(new One(), new Fields("one"))
    .parallelismHint(20)
    .aggregate(new Count(), new Fields("reach"));

https://github.com/nathanmarz/storm/wiki/Trident-tutorial
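What that Trident topology computes, in plain terms: the distinct followers of everyone who tweeted a URL. An illustrative non-Trident version with made-up data (the two dicts stand in for the two TridentStates):

```python
# Hypothetical lookup tables standing in for urlToTweeters and
# tweetersToFollowers.
url_to_tweeters = {"http://example.com": ["alice", "bob"]}
tweeter_to_followers = {"alice": ["carol", "dave"], "bob": ["dave", "erin"]}

def reach(url):
    # Expand url -> tweeters -> followers, then count distinct followers,
    # mirroring the stateQuery / ExpandList / groupBy / Count pipeline.
    followers = set()
    for tweeter in url_to_tweeters.get(url, []):
        followers.update(tweeter_to_followers.get(tweeter, []))
    return len(followers)

reach("http://example.com")  # carol, dave, erin -> 3
```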
-
Thanks:
Nathan Marz - @nathanmarz
@stormprocessor
http://github.com/nathanmarz/storm
Michael Noll - @miguno
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
Michael Rose - @xorlev
http://github.com/xorlev
-
https://github.com/danklynn/storm-starter/tree/gluecon2013