Download - Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Transcript
Page 1: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Scalable Stream ProcessingStorm, SEEP and Naiad

Amir H. [email protected]

KTH Royal Institute of Technology

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 1 / 69

Page 2: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation

I Users of big data applications expect fresh results.

I New stream processing systems (SPS) are designed to scale to largenumbers of machines.

I SPS design issues (reminder):• SPS data flow: composiation and manipulation• SPS runtime: parallelization, fault-tolerance, optimization

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 2 / 69

Page 3: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Outline

I Storm

I SEEP

I Naiad

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 3 / 69

Page 4: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 4 / 69

Page 5: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Contribution

I Storm is a real-time distributed stream data processing engine atTwitter.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 5 / 69

Page 6: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Data Model (1/3)

I Tuple• Core unit of data.• Immutable set of key/value pairs.

I Stream• Unbounded sequence of tuples.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 6 / 69

Page 7: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Data Model (2/3)

I Spouts• Source of streams.• Wraps a streaming data source and emits tuples.

I Bolts• Core functions of a streaming computation.• Receive tuples and do stuff.• Optionally emit additional tuples.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 7 / 69

Page 8: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Data Model (3/3)

I Topology• DAG of spouts and bolts.• Data flow representation streaming computation

I Storm executes spouts and bolts as individual tasks that run inparallel on multiple machines.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 8 / 69

Page 9: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation (1/3)

I Data parallelism

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 9 / 69

Page 10: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation (2/3)

I Shuffle grouping: randomly partitions the tuples.

I Field grouping: hashes on a subset of the tuple attributes.

I All grouping: replicates the entire stream to all the consumer tasks.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 10 / 69

Page 11: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation (2/3)

I Shuffle grouping: randomly partitions the tuples.

I Field grouping: hashes on a subset of the tuple attributes.

I All grouping: replicates the entire stream to all the consumer tasks.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 10 / 69

Page 12: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation (2/3)

I Shuffle grouping: randomly partitions the tuples.

I Field grouping: hashes on a subset of the tuple attributes.

I All grouping: replicates the entire stream to all the consumer tasks.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 10 / 69

Page 13: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation (3/3)

I Global grouping: sends the entire stream to a single bolt.

I Local grouping: sends tuples to the consumer bolts in the sameexecutor.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 11 / 69

Page 14: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Word Count in Storm

public class WordCountTopology {

public static class SplitSentence implements IRichBolt { }

public static class WordCount extends BaseBasicBolt { }

public static void main(String[] args) throws Exception {

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new RandomSentenceSpout(), 5);

builder.setBolt("split", new SplitSentence(), 8)

.shuffleGrouping("spout");

builder.setBolt("count", new WordCount(), 12)

.fieldsGrouping("split", new Fields("word"));

Config conf = new Config();

conf.setMaxTaskParallelism(3);

LocalCluster cluster = new LocalCluster();

cluster.submitTopology("word-count", conf, builder.createTopology());

cluster.shutdown();

}

}

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 12 / 69

Page 15: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Architecture

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 13 / 69

Page 16: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Components (1/4)

I Nimbus• The master node.• Clients submit topologies to it.• Responsible for distributing and coordinating the execution of the

topology.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 14 / 69

Page 17: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Components (2/4)

I Zookeeper• Nimbus uses a combination of the local disk(s) and Zookeeper to

store state about the topology.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 15 / 69

Page 18: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Components (3/4)

I Worker nodes• Each worker node runs one or more worker processes.• Each worker process runs a JVM, in which it runs one or more ex-

ecutors.• Executors are made of one or more tasks, where the actual work for

a bolt or a spout is done in the task.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 16 / 69

Page 19: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Components (4/4)

I Supervisor• Each worker node runs a supervisor.• It receives assignments from Nimbus and spawns workers based on

the assignment.• Contact Nimbus with a periodic heartbeat protocol, advertising the

topologies that they are currently running, and any vacancies thatare available to run more topologies.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 17 / 69

Page 20: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Deployment (1/5)

I Topology submitter uploads topology to Nimbus.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 18 / 69

Page 21: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Deployment (2/5)

I Nimbus calculates assignments and sends to Zookeeper.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 19 / 69

Page 22: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Deployment (3/5)

I Supervisor nodes receive assignment information via Zookeeperwatches.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 20 / 69

Page 23: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Deployment (4/5)

I Supervisor nodes download topology from Nimbus.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 21 / 69

Page 24: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Storm Deployment (5/5)

I Supervisors spawn workers (JVM processes) to start the topology.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 22 / 69

Page 25: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (1/4)

I Workers heartbeat back to Supervisors and Nimbus via ZooKeeper,as well as locally.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 23 / 69

Page 26: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (2/4)

I If a worker dies (fails to heartbeat), the Supervisor will restart it.

I If a worker dies repeatedly, Nimbus will reassign the work to othernodes in the cluster.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 24 / 69

Page 27: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (3/4)

I If a supervisor node dies, Nimbus will reassign the work to othernodes.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 25 / 69

Page 28: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (4/4)

I If Nimbus dies, topologies will continue to function normally, butwon’t be able to perform reassignments.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 26 / 69

Page 29: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (1/6)

I Storm provides two types of semantic guarantees:

• At most once: each tuple is either processed once, or dropped inthe case of a failure.

• At least once (reliable processing): it guarantees that each tuplethat is input to the topology will be processed at least once.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 27 / 69

Page 30: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (2/6)

I Bolts may emit tuples anchored to the ones they received.• Tuple B is a descendant of Tuple A.

I Multiple anchorings form a Tuple tree.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 28 / 69

Page 31: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (3/6)

I Bolts can acknowledge that a tuple has been processed successfully.

I Acks are delivered via a system-level bolt.

I Bolts can also fail a tuple to trigger a spout to replay the original.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 29 / 69

Page 32: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (4/6)

I Any failure in the tuple tree will trigger a replay of the original tuple.

I How to track a large-scale tuple tree efficiently?

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 30 / 69

Page 33: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (5/6)

I Tuples are assigned a 64-bit message id at spout.

I Emitted tuples are assigned new message ids.

I These message ids are XORed and sent to the acker bolt along withthe original tuple message id.

I When the XOR checksum goes to zero, the acker bolt sends thefinal ack to the spout that admitted the tuple, and the spout knowsthat this tuple has been fully processed.

• a⊕ (a⊕ b)⊕ c⊕ (b⊕ c) == 0

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 31 / 69

Page 34: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Reliable Processing (6/6)

I It is possible that due to failure, some of the XOR checksum willnever go to zero.

I The spout initially assigns a timeout parameter to each tuple.

I The acker bolt keeps track of this timeout parameter, and if theXOR checksum does not become zero before the timeout, the tupleis considered to have failed.

• The data source will replay it back in the subsequent iteration.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 32 / 69

Page 35: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

SEEP

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 33 / 69

Page 36: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Contribution

I Build a stream processing system that scale out while remainingfault tolerant when queries contain stateful operators.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 34 / 69

Page 37: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Challenges

I Stateful operators• E.g., join or aggregation• Finite window of tuples: small amount of states

I Intra-query parallelism• Static vs. dynamic

I Fault tolerance

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 35 / 69

Page 38: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Core Idea

I Make operator state an external entity that can be managed by thestream processing system.

I Operators have direct access to states.

I The system manages states.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 36 / 69

Page 39: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Core Idea

I Make operator state an external entity that can be managed by thestream processing system.

I Operators have direct access to states.

I The system manages states.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 36 / 69

Page 40: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Core Idea

I Make operator state an external entity that can be managed by thestream processing system.

I Operators have direct access to states.

I The system manages states.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 36 / 69

Page 41: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

States (1/2)

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 37 / 69

Page 42: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

States (2/2)

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 38 / 69

Page 43: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Operator State Management

I On scale out: partition operator state correctly, maintaining consis-tency

I On failure recovery: restore state of failed operator

I Define primitives for state management and build other mechanismson top of them.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 39 / 69

Page 44: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Operator State Management

I On scale out: partition operator state correctly, maintaining consis-tency

I On failure recovery: restore state of failed operator

I Define primitives for state management and build other mechanismson top of them.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 39 / 69

Page 45: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Operator State Management

I On scale out: partition operator state correctly, maintaining consis-tency

I On failure recovery: restore state of failed operator

I Define primitives for state management and build other mechanismson top of them.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 39 / 69

Page 46: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Management Primitives

I Checkpoint• Makes state available to system.• Attaches last processed tuple timestamp.

I Backup/Restore• Moves copy of state from

one operator to another.

I Partition• Splits state to scale out an operator.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 40 / 69

Page 47: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Management Primitives

I Checkpoint• Makes state available to system.• Attaches last processed tuple timestamp.

I Backup/Restore• Moves copy of state from

one operator to another.

I Partition• Splits state to scale out an operator.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 40 / 69

Page 48: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Management Primitives

I Checkpoint• Makes state available to system.• Attaches last processed tuple timestamp.

I Backup/Restore• Moves copy of state from

one operator to another.

I Partition• Splits state to scale out an operator.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 40 / 69

Page 49: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Checkpoint

I Checkpoint state = the processing state + the buffer state

I That routing state is not included in the state checkpoint.• It only changes in case of scale out or recovery.

I The system executes checkpoint asynchronously and periodically.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 41 / 69

Page 50: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Checkpoint

I Checkpoint state = the processing state + the buffer state

I That routing state is not included in the state checkpoint.• It only changes in case of scale out or recovery.

I The system executes checkpoint asynchronously and periodically.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 41 / 69

Page 51: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Checkpoint

I Checkpoint state = the processing state + the buffer state

I That routing state is not included in the state checkpoint.• It only changes in case of scale out or recovery.

I The system executes checkpoint asynchronously and periodically.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 41 / 69

Page 52: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Backup and Restore (1/2)

I The operator state (i.e., the checkpoint output) is backed up to anupstream operator.

I After the operator state was backed up, already processed tuplesfrom output buffers in upstream operators can be discarded.

• They are no longer required for failure recovery.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69

Page 53: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Backup and Restore (1/2)

I The operator state (i.e., the checkpoint output) is backed up to anupstream operator.

I After the operator state was backed up, already processed tuplesfrom output buffers in upstream operators can be discarded.

• They are no longer required for failure recovery.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69

Page 54: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Backup and Restore (2/2)

I Backed up operator state is restored to another operator to recovera failed operator or to redistribute state across partitioned operators.

I After restoring the state, the system replays unprocessed tuples inthe output buffer from an upstream operator to bring the operator’sprocessing state up-to-date.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 43 / 69

Page 55: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Backup and Restore (2/2)

I Backed up operator state is restored to another operator to recovera failed operator or to redistribute state across partitioned operators.

I After restoring the state, the system replays unprocessed tuples inthe output buffer from an upstream operator to bring the operator’sprocessing state up-to-date.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 43 / 69

Page 56: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Partition

I Split the state of a stateful operator across the new partitionedoperators when it scales out.

I Partitioning the key space of the tuples processed by the operator.

I The routing state of its upstream operators must also be updatedto account for the new partitioned operators.

I The buffer state of the upstream operators is partitioned to ensurethat unprocessed tuples are dispatched to the correct partition.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 44 / 69

Page 57: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Partition

I Split the state of a stateful operator across the new partitionedoperators when it scales out.

I Partitioning the key space of the tuples processed by the operator.

I The routing state of its upstream operators must also be updatedto account for the new partitioned operators.

I The buffer state of the upstream operators is partitioned to ensurethat unprocessed tuples are dispatched to the correct partition.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 44 / 69

Page 58: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Partition

I Split the state of a stateful operator across the new partitionedoperators when it scales out.

I Partitioning the key space of the tuples processed by the operator.

I The routing state of its upstream operators must also be updatedto account for the new partitioned operators.

I The buffer state of the upstream operators is partitioned to ensurethat unprocessed tuples are dispatched to the correct partition.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 44 / 69

Page 59: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

State Primitives: Partition

I Split the state of a stateful operator across the new partitionedoperators when it scales out.

I Partitioning the key space of the tuples processed by the operator.

I The routing state of its upstream operators must also be updatedto account for the new partitioned operators.

I The buffer state of the upstream operators is partitioned to ensurethat unprocessed tuples are dispatched to the correct partition.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 44 / 69

Page 60: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Scale Out

I To scale out queries at runtime, the system partitions operatorson-demand in response to bottleneck operators.

I The load of the bottlenecked operator is shared among a set of newpartitioned operators.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 45 / 69

Page 61: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault-Tolerance

I Overload and failure are handled in the same fashion.

I Operator recovery becomes a special case of scale out, in which afailed operator is scaled out.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 46 / 69

Page 62: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault-Tolerant Scale Out Algorithm

I Two versions of operator’s state that can be partitioned for scaleout:

• The current state• The recent state checkpoint

I In SEEP, the system partitions the most recent state checkpoint.

I Its benefits:• Avoids adding further load to the operator, which is already

overloaded, by requesting it to checkpoint or partition its own state.• Makes the scale out process itself fault-tolerant.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 47 / 69

Page 63: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault-Tolerant Scale Out Algorithm

I Two versions of operator’s state that can be partitioned for scaleout:

• The current state• The recent state checkpoint

I In SEEP, the system partitions the most recent state checkpoint.

I Its benefits:• Avoids adding further load to the operator, which is already

overloaded, by requesting it to checkpoint or partition its own state.• Makes the scale out process itself fault-tolerant.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 47 / 69

Page 64: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault-Tolerant Scale Out Algorithm

I Two versions of operator’s state that can be partitioned for scaleout:

• The current state• The recent state checkpoint

I In SEEP, the system partitions the most recent state checkpoint.

I Its benefits:• Avoids adding further load to the operator, which is already

overloaded, by requesting it to checkpoint or partition its own state.• Makes the scale out process itself fault-tolerant.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 47 / 69

Page 65: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Naiad

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 48 / 69

Page 66: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation (1/2)

I Dataflow

I Dataflow (parallelization)

I Dataflow (iteration)

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 49 / 69

Page 67: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation (1/2)

I Dataflow

I Dataflow (parallelization)

I Dataflow (iteration)

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 49 / 69

Page 68: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation (1/2)

I Dataflow

I Dataflow (parallelization)

I Dataflow (iteration)

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 49 / 69

Page 69: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation (2/2)

I Batch iteration

I Streaming iteration

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 50 / 69

Page 70: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Motivation (2/2)

I Batch iteration

I Streaming iteration

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 50 / 69

Page 71: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Naiad (1/2)

I Naiad is a distributed system for executing data parallel, and cyclicdataflow programs.

I It satisfies:• Stream processing that produces low-latency results for non-iterative

algorithms,• Batch processing that iterates synchronously at the expense of la-

tency.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 51 / 69

Page 72: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Naiad (2/2)

I Asynchronous execution: low latency of stream processors

I Fine-grained synchronous execution: high throughput of batch pro-cessors

I Support for iterative and incremental computations

I Timely dataflow: a new computation model

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 52 / 69

Page 73: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Naiad (2/2)

I Asynchronous execution: low latency of stream processors

I Fine-grained synchronous execution: high throughput of batch pro-cessors

I Support for iterative and incremental computations

I Timely dataflow: a new computation model

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 52 / 69

Page 74: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (1/3)

I Directed graph that may have cycles (possibly nested)

I Stateful vertices that consume and produce messages asyn-chronously.

I Structured loops allow feedback in the dataflow to implement iter-ation.

I Explicit notifications for synchronous processing to indicate allrecords for a given round of input or loop iteration have been re-ceived.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 53 / 69

Page 75: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (1/3)

I Directed graph that may have cycles (possibly nested)

I Stateful vertices that consume and produce messages asyn-chronously.

I Structured loops allow feedback in the dataflow to implement iter-ation.

I Explicit notifications for synchronous processing to indicate allrecords for a given round of input or loop iteration have been re-ceived.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 53 / 69

Page 76: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (1/3)

I Directed graph that may have cycles (possibly nested)

I Stateful vertices that consume and produce messages asyn-chronously.

I Structured loops allow feedback in the dataflow to implement iter-ation.

I Explicit notifications for synchronous processing to indicate allrecords for a given round of input or loop iteration have been re-ceived.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 53 / 69

Page 77: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (2/3)

I Specified input and output vertices

I Timestamped messages passed along edges.

• Timestamp : (

epoch︷ ︸︸ ︷e ∈ N,

loop counters︷ ︸︸ ︷〈c1, . . . , ck〉 ∈ Nk)

I Epoch: each record at input is labeled with epoch number to dis-tinguish between different batches of data.

I Loop counters: a timestamp has k ≥ 0 loop counters, where k isthe depth of nesting.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 54 / 69

Page 78: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (2/3)

I Specified input and output vertices

I Timestamped messages passed along edges.

• Timestamp : (

epoch︷ ︸︸ ︷e ∈ N,

loop counters︷ ︸︸ ︷〈c1, . . . , ck〉 ∈ Nk)

I Epoch: each record at input is labeled with epoch number to dis-tinguish between different batches of data.

I Loop counters: a timestamp has k ≥ 0 loop counters, where k isthe depth of nesting.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 54 / 69

Page 79: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (2/3)

I Specified input and output vertices

I Timestamped messages passed along edges.

• Timestamp : (

epoch︷ ︸︸ ︷e ∈ N,

loop counters︷ ︸︸ ︷〈c1, . . . , ck〉 ∈ Nk)

I Epoch: each record at input is labeled with epoch number to dis-tinguish between different batches of data.

I Loop counters: a timestamp has k ≥ 0 loop counters, where k isthe depth of nesting.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 54 / 69

Page 80: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Timely Dataflow and Timestamp (3/3)

I Timestamp : (

epoch︷ ︸︸ ︷e ∈ N,

loop counters︷ ︸︸ ︷〈c1, . . . , ck〉 ∈ Nk)

I Passing ingress (I) vertex: (e, 〈c1, . . . , ck〉)→ (e, 〈c1, . . . , ck, 0〉)

I Passing egress (E) vertex: (e, 〈c1, . . . , ck〉)→ (e, 〈c1, . . . , ck−1〉)

I Passing feedback (F) vertex: (e, 〈c1, . . . , ck〉)→ (e, 〈c1, . . . , ck+1〉)

I Timestamp ordering: t1 = (e1, ~x1) and t2 = (e2, ~x2), t1 ≤ t2 ⇐⇒e1 ≤ e2 ∧ ~x1 ≤ ~x2

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 55 / 69

Page 81: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Vertex Computation (1/3)

I Timely dataflow vertex: a possibly stateful object that sends andreceives messages and requests and receives notifications.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 56 / 69

Page 82: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Vertex Computation (2/3)

I Message exchange is completely asynchronous.• u.SendBy(e: Edge, m: Message, t: Timestamp)

Sending a message by u.• v.OnRecv(e: Edge, m: Message, t: Timestamp)

Message is delivered to v.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 57 / 69

Page 83: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Vertex Computation (3/3)

I Notification delivery is synchronous.• v.NotifyAt(t: Timestamp)

Requesting a notification by v.• v.OnNotify(t: Timestamp)

A notification is delivered to v, after all messages with timestampt’ ≤ t have been delivereded.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 58 / 69

Page 84: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Parallelisation

I Data parallelism

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 59 / 69

Page 85: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Word Count in Naiad (1/2)

public static class ExtensionMethods {

public static Stream<Pair<TRecord, Int64>, Epoch> StrCount<TRecord>(Stream<TRecord, Epoch> stream) {

return stream.NewUnaryStage((i, s) => new CountVertex<TRecord>(i, s), ...);

}

internal class CountVertex<TRecord> : UnaryVertex<TRecord, Pair<TRecord, Int64>, Epoch> {

private Dictionary<TRecord, Int64> Counts = new Dictionary<TRecord, long>();

private HashSet<TRecord> Changed = new HashSet<TRecord>();

public override void OnReceive(Message<TRecord, Epoch> message) {

this.NotifyAt(message.time);

for (int i = 0; i < message.length; i++) {

var data = message.payload[i];

if (!this.Counts.ContainsKey(data))

this.Counts[data] = 0;

this.Counts[data] += 1;

this.Changed.Add(data);

}

}

// once all records of an epoch are received, we should send the counts along.

public override void OnNotify(Epoch time) {

var output = this.Output.GetBufferForTime(time);

foreach (var record in this.Changed)

output.Send(new Pair<TRecord, Int64>(record, this.Counts[record]));

// reset observed records

this.Changed.Clear();

}

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 60 / 69

Page 86: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Word Count in Naiad (2/2)

public class WordCount {

public void Execute(string[] args) {

// the first thing to do is to allocate a computation from args.

using (var computation = NewComputation.FromArgs(ref args)) {

// 1. Make a new data source, to which we will supply strings.

var source = new BatchedDataSource<string>();

// 2. Attach source, and apply count extension method.

var counts = computation.NewInput(source).StrCount();

// 3. Subscribe to the resulting stream with a callback to print the outputs.

counts.Subscribe(list => { foreach (var element in list) Console.WriteLine(element); });

// activate the execution of this graph (no new stages allowed).

computation.Activate();

if (computation.Configuration.ProcessID == 0) {

// read lines of input and hand them to the input, until an empty line appears.

for (var line = Console.ReadLine(); line.Length > 0; line = Console.ReadLine())

source.OnNext(line.Split());

}

source.OnCompleted();

computation.Join();

}

}

}

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 61 / 69

Page 87: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Naiad Architecture

I Workers: the smallest unit of computation (a single thread).

I Processes: a larger unit of computation (a single OS process).• It can contain one or more workers.• A machine may host one or more processes.

I Lock-free queue for data exchange between workers in the sameprocess, and TCP connection between two different processes.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 62 / 69

Page 88: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (1/2)

I Each stateful vertex implements a CHECKPOINT and RESTOREinterface.

I Each vertex may either:• Log data as computation proceeds,• or write a full checkpoint when requested (potentially more compact).

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 63 / 69

Page 89: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Fault Tolerance (2/2)

I In periodic checkpoints:• All processes first pause worker and message delivery threads• Flush message queues by delivering outstanding OnRecv events• Invoke CHECKPOINT on each stateful vertex.

I The system then resumes worker and message delivery threads andflushes buffered messages.

I To recover from a failed process, all live processes revert to thelast durable checkpoint, and the vertices from the failed process arereassigned to the remaining processes.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 64 / 69

Page 90: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Summary

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 65 / 69

Page 91: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Summary

I Storm• Tuple and stream• Spout, bolt, and topology• Nimbus, worker, supervisor, and zookeeper• Reliable processing: xored ids

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 66 / 69

Page 92: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Summary

I SEEP• Make operator state an external entity• Primitives for state management: checkpoint, backup/restore,

partition

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 67 / 69

Page 93: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Summary

I Naiad• Timely dataflow• Asynchronous and fine-grained synchronous• Timestamp messages, epoch, and loop counters• Streaming context and loop context• Workers and processes

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 68 / 69

Page 94: Scalable Stream Processing Storm, SEEP and Naiad...Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 42 / 69 State Primitives: Backup and Restore (1/2) I The operator state (i.e.,

Questions?

Acknowledgements

Some slides and pictures were derived from Peter Pietzuch (Imperial College) andDerek G. Murray (Google) slides.

Amir H. Payberah (KTH) Storm, SEEP and Naiad 2016/09/20 69 / 69