Programming at Scale: Dataflow and Consistency

Transcript of cs378h lecture: Programming at Scale: Dataflow and Consistency

Page 1: Programming at Scale: Dataflow and Consistency

cs378h

Programming at Scale: Dataflow and Consistency

Page 2: Programming at Scale: Dataflow and Consistency

Today

Questions?

Administrivia

• Rust lab due today!

• Project Proposal Due Thursday!

Agenda:

• Dataflow Wrapup

• Concurrency & Consistency at Scale

Page 3: Programming at Scale: Dataflow and Consistency

Review: K-Means

public void kmeans() {
    while (…) {
        for each point
            find_nearest_center(point);
        for each center
            compute_new_center(center);
    }
}

Pages 4–12: Programming at Scale: Dataflow and Consistency

Review: K-Means (incremental builds of the slide above)

The point-assignment loop ("for each point: find_nearest_center") is annotated as the map phase; the center-update loop ("for each center: compute_new_center") is annotated as the reduce phase.

[Diagram: parallel fnc (find_nearest_center) map tasks feed cnc (compute_new_center) reduce tasks, flowing from Input to Output; the map/reduce pattern repeats across iterations.]
Key idea: adapt workload to parallel patterns

Questions:
• What kinds of computations can this express?
• What other patterns could we use?
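The decomposition the slides annotate can be sketched directly: assigning each point to its nearest center is a map over points, and recomputing each center is a reduce over its assigned points. A minimal pure-Python sketch with toy 1-D data (all names are illustrative, not from the course code):

```python
from collections import defaultdict

def find_nearest_center(point, centers):
    # map phase: emit the index of the closest center for this point
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

def compute_new_center(points):
    # reduce phase: the new center is the mean of its assigned points
    return sum(points) / len(points)

def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:                       # "for each point" -> map
            groups[find_nearest_center(p, centers)].append(p)
        for i, pts in groups.items():          # "for each center" -> reduce
            centers[i] = compute_new_center(pts)
    return centers

print(kmeans([1.0, 2.0, 10.0, 11.0], [0.0, 5.0]))  # → [1.5, 10.5]
```

Each iteration is one map/reduce round; a real MapReduce k-means re-runs the job per iteration, which is exactly the cost Spark later attacks.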

Page 13: Programming at Scale: Dataflow and Consistency

How Does Parallelization Work?

INPUT FILE(s)

Page 15: Programming at Scale: Dataflow and Consistency

Execution

Group by?

Key idea → shuffle == sort!
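One way to see why shuffle reduces to sort: once map outputs are ordered by key, a single linear pass groups equal keys into contiguous runs, which is exactly the input each reduce task needs. A toy sketch (function name is illustrative):

```python
from itertools import groupby
from operator import itemgetter

def shuffle(map_outputs):
    """Group (key, value) pairs by key via sorting, as a MapReduce shuffle does."""
    ordered = sorted(map_outputs, key=itemgetter(0))      # shuffle == sort
    return {k: [v for _, v in run]
            for k, run in groupby(ordered, key=itemgetter(0))}

pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1), ("the", 1)]
print(shuffle(pairs))  # → {'cat': [1], 'sat': [1], 'the': [1, 1, 1]}
```

Note that `groupby` only merges adjacent equal keys, so the sort is what makes the grouping correct, mirroring the sort-based shuffle in MapReduce implementations.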

Page 17: Programming at Scale: Dataflow and Consistency

Task Granularity And Pipelining

|map tasks| >> |machines| -- why?
• Minimize fault recovery time
• Pipeline map with other tasks
• Easier to load balance dynamically

Page 18: Programming at Scale: Dataflow and Consistency

MapReduce: not without Controversy

• Backwards step in programming paradigm
• Sub-optimal: brute force, no indexing
• Not novel: 25-year-old ideas from the DBMS literature (it's just a group-by aggregate engine)
• Missing most DBMS features: schema, foreign keys, …
• Incompatible with most DBMS tools

So why is it such a big success?

Page 27: Programming at Scale: Dataflow and Consistency

Why is MapReduce backwards?

• Backwards step in programming paradigm
• Sub-optimal: brute force, no indexing
• Not novel: 25-year-old ideas from the DBMS literature (it's just a group-by aggregate engine)
• Missing most DBMS features: schema, foreign keys, …
• Incompatible with most DBMS tools

So why is it such a big success?

Page 34: Programming at Scale: Dataflow and Consistency

MapReduce and Dataflow

• MR is a dataflow engine
• Lots of others:
  • Dryad
  • DryadLINQ
  • Dandelion
  • CIEL
  • GraphChi/Pregel
  • Spark

Taxonomies:
• DAG instead of BSP
• Interface variety: memory FIFO, disk, network
• Flexible modular composition

Page 40: Programming at Scale: Dataflow and Consistency

Dryad (2007): 2-D Piping

• Unix pipes: 1-D
  grep | sed | sort | awk | perl

• Dryad: 2-D (the counts are per-stage replica counts)
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
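The second dimension is replication: each stage is a set of vertices processing partitions of the stream, not a single process. A toy sketch of fanning a stage across replicas (replica counts and stage functions are illustrative; real Dryad vertices run on separate machines connected by channels):

```python
def run_stage(func, replicas, items):
    # Each of `replicas` vertices processes one partition of the input;
    # here partitions are round-robin slices of the item list.
    partitions = [items[i::replicas] for i in range(replicas)]
    return [func(x) for part in partitions for x in part]

# A tiny 2-D pipeline: "upper"^3 | "exclaim"^2
data = ["a", "b", "c", "d"]
out = run_stage(str.upper, 3, data)
out = run_stage(lambda s: s + "!", 2, out)
print(sorted(out))  # → ['A!', 'B!', 'C!', 'D!']
```

The output order depends on how partitions interleave, which is why the sketch sorts before printing; real dataflow systems likewise make no cross-partition ordering guarantees.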

Page 42: Programming at Scale: Dataflow and Consistency

Dataflow Engines

Page 43: Programming at Scale: Dataflow and Consistency

Dataflow Job Structure

[Diagram: a DAG of vertices (processes) running grep, sed, sort, awk, and perl, connected by channels; the job reads input files and writes output files, and the vertices running the same program form a stage.]

How to implement?

Page 45: Programming at Scale: Dataflow and Consistency

Channels

[Diagram: vertex X feeds vertex M through a channel carrying items.]

Finite streams of items:
• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)

Key idea: encapsulate data movement behind the channel abstraction → gets the programmer out of the picture.
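The abstraction can be sketched as a uniform send/receive interface over different transports; the vertex code only sees the interface and never learns whether the channel is a file or an in-memory FIFO. A minimal sketch (class and method names are illustrative, not Dryad's API):

```python
import queue
import tempfile

class MemoryFifoChannel:
    """Intra-machine channel backed by an in-memory FIFO."""
    def __init__(self):
        self._q = queue.Queue()
    def send(self, item):
        self._q.put(item)
    def close(self):
        self._q.put(None)                 # sentinel marks end of stream
    def items(self):
        while (item := self._q.get()) is not None:
            yield item

class FileChannel:
    """Temporary-file channel, one item per line."""
    def __init__(self):
        self._f = tempfile.TemporaryFile(mode="w+")
    def send(self, item):
        self._f.write(item + "\n")
    def close(self):
        self._f.flush()
        self._f.seek(0)                   # rewind so the consumer can read
    def items(self):
        for line in self._f:
            yield line.rstrip("\n")

def vertex(in_ch, out_ch, func):
    # Vertex code is transport-agnostic: it only uses the channel interface.
    for item in in_ch.items():
        out_ch.send(func(item))
    out_ch.close()

src = MemoryFifoChannel()
for word in ["error: disk", "ok", "error: net"]:
    src.send(word)
src.close()
dst = FileChannel()
vertex(src, dst, str.upper)
result = list(dst.items())
print(result)  # → ['ERROR: DISK', 'OK', 'ERROR: NET']
```

Swapping `FileChannel` for `MemoryFifoChannel` (or, in a real system, a TCP pipe) changes nothing in `vertex`, which is the point of the slide's key idea.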

Page 47: Programming at Scale: Dataflow and Consistency

Spark (2012) Background

Commodity clusters: important platform
• In industry: search, machine translation, ad targeting, …
• In research: bioinformatics, NLP, climate simulation, …

Cluster-scale models (e.g. MR) are the de facto standard
• Fault tolerance through replicated durable storage
• Dataflow is the common theme

Slide callouts: Multi-core, Iteration

Page 49: Programming at Scale: Dataflow and Consistency

Motivation

Programming models for clusters transform data flowing from stable storage to stable storage.

E.g., MapReduce:

[Diagram: Input → Map tasks → Reduce tasks → Output.]

Benefits of dataflow: the runtime can decide where to run tasks and can automatically recover from failures.

Page 51: Programming at Scale: Dataflow and Consistency

Iterative Computations: PageRank

[Diagram: three chained MapReduce jobs; each iteration's Output is written to stable storage and re-read as the next iteration's Input.]

Solution: augment the dataflow model with “resilient distributed datasets” (RDDs)

Page 54: Programming at Scale: Dataflow and Consistency

Programming Model

• Resilient distributed datasets (RDDs)
  • Immutable collections partitioned across the cluster that can be rebuilt if a partition is lost
  • Created by transforming data in stable storage using dataflow operators (map, filter, group-by, …)
  • Can be cached across parallel operations
• Parallel operations on RDDs
  • Reduce, collect, count, save, …
• Restricted shared variables
  • Accumulators, broadcast variables
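The model above can be sketched in a few dozen lines of pure Python: transformations build a description of the computation lazily, actions force it, and cache() memoizes the result. This is a toy, loosely following the Spark paper's vocabulary, not Spark's actual implementation:

```python
class RDD:
    """Toy RDD: immutable, lazily computed, optionally cached."""
    def __init__(self, compute):
        self._compute = compute     # closure that (re)builds this dataset
        self._cache = None
        self._caching = False

    # transformations: build new RDDs without computing anything
    def map(self, f):
        return RDD(lambda: [f(x) for x in self._data()])
    def filter(self, p):
        return RDD(lambda: [x for x in self._data() if p(x)])
    def cache(self):
        self._caching = True
        return self

    # actions: force evaluation
    def collect(self):
        return self._data()
    def count(self):
        return len(self._data())

    def _data(self):
        if self._cache is not None:
            return self._cache
        data = self._compute()
        if self._caching:
            self._cache = data      # later actions reuse this result
        return data

def text_file(lines):               # stand-in for spark.textFile(...)
    return RDD(lambda: list(lines))

log = ["INFO\tok", "ERROR\tsrv1\tdisk foo", "ERROR\tsrv2\tnet bar"]
messages = (text_file(log)
            .filter(lambda l: l.startswith("ERROR"))
            .map(lambda l: l.split("\t")[2])
            .cache())
print(messages.filter(lambda m: "foo" in m).count())  # → 1
```

The first action runs the whole chain; subsequent queries over `messages` read the memoized list, which is the in-memory reuse the Log Mining example on the next slides exploits.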

Page 57: Programming at Scale: Dataflow and Consistency

Example: Log Mining

• Load error messages from a log into memory, then interactively search for various patterns

lines = spark.textFile(“hdfs://...”)            // Base RDD
errors = lines.filter(_.startsWith(“ERROR”))    // Transformed RDD
messages = errors.map(_.split(‘\t’)(2))
cachedMsgs = messages.cache()                   // Cached RDD

cachedMsgs.filter(_.contains(“foo”)).count      // Parallel operation
cachedMsgs.filter(_.contains(“bar”)).count
. . .

[Diagram, built up across pages 58–83: the Driver ships tasks to three Workers; each Worker reads one block of the input file (Block 1–3), caches its partition of messages (Cache 1–3), and returns results to the Driver. Later queries are served from the caches.]

Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)

Page 84: Programming at Scale: Dataflow and Consistency

RDD Fault Tolerance

• RDDs maintain lineage information that can be used to reconstruct lost partitions
• Ex:

cachedMsgs = textFile(...).filter(_.contains(“error”))
                          .map(_.split(‘\t’)(2))
                          .persist()

Lineage chain: HdfsRDD (path: hdfs://…) → FilteredRDD (func: contains(…)) → MappedRDD (func: split(…)) → CachedRDD
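The lineage chain on this slide can be sketched as linked RDD nodes, each of which knows only its parent and its function; losing a cached partition then just means re-running the chain from the source for that partition. Toy code whose class names mirror the slide's HdfsRDD/FilteredRDD/MappedRDD (the APIs are illustrative):

```python
class HdfsRDD:
    def __init__(self, blocks):          # stand-in for reading hdfs://... blocks
        self.blocks = blocks
    def partition(self, i):
        return self.blocks[i]

class FilteredRDD:
    def __init__(self, parent, func):
        self.parent, self.func = parent, func
    def partition(self, i):              # lineage: recompute from parent on demand
        return [x for x in self.parent.partition(i) if self.func(x)]

class MappedRDD:
    def __init__(self, parent, func):
        self.parent, self.func = parent, func
    def partition(self, i):
        return [self.func(x) for x in self.parent.partition(i)]

blocks = [["error\ta\tdisk", "info\tb\tok"], ["error\tc\tnet"]]
lineage = MappedRDD(
    FilteredRDD(HdfsRDD(blocks), lambda l: "error" in l),
    lambda l: l.split("\t")[2])

cache = {0: lineage.partition(0)}        # suppose partition 1's cache was lost
rebuilt = lineage.partition(1)           # rebuild it purely from lineage
print(cache[0], rebuilt)                 # → ['disk'] ['net']
```

Because each node stores a deterministic function of its parent, only the lost partition is recomputed; no replication of the intermediate data is needed.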

Page 85: Programming at Scale: Dataflow and Consistency

Data-Parallel Computation Systems

[Table, built up across pages 85–97; columns are systems, rows are layers:]

              Parallel       Map-Reduce       Hadoop       Dryad
              Databases
Application                  Sawzall          Pig, Hive    DryadLINQ, Scope
Language      SQL            Sawzall          ≈SQL         LINQ, SQL
Execution                    Map-Reduce       Hadoop       Dryad (Cosmos, HPC, Azure)
Storage                      GFS, BigTable    HDFS, S3     Cosmos, Azure, SQL Server

Spark (page 97) joins as the newest system in this lineage.

Page 98: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Page 99: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Consistency

Page 100: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Da

taM

od

el

Consistency

Page 101: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Da

taM

od

el

Consistency

Page 102: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 103: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

• Atomicity• Consistency• Isolation• Durability

Page 104: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

• Basically Available• Soft State• Eventually Consistent

• Atomicity• Consistency• Isolation• Durability

Page 105: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 106: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 107: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Key Value Stores

Page 108: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Key Value Stores

Document Stores

Page 109: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

Dimensions for characterizing scalable data stores, spanning the spectrum from Strong (ACID) to Eventual (BASE):

• Data Model: Key-Value Stores, Document Stores, Wide-Column Stores

• Consistency: Strong (ACID) ←→ Eventual (BASE)

• Sharding/Partitioning: Shared-Disk, Range-Sharding, Hash-Sharding, Consistent Hashing

• Replication: Primary-Backup, Commit-Consensus Protocol, Sync/Async

• Storage: Logging, Update-In-Place, Caching, In-Memory Storage

• Query Support: Secondary Indexing, Query Planning, Materialized Views, Analytics
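One of the sharding options above, consistent hashing, rewards a quick sketch: nodes and keys hash onto the same ring, and a key belongs to the first node point clockwise from it. This is an illustrative toy (the `ConsistentHashRing` class, its method names, and the virtual-node count are assumptions, not any particular system's API):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Stable hash mapping a key to a point on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=8):
        # Each physical node gets `vnodes` points on the ring to
        # smooth out the key distribution.
        self.ring = sorted(
            (ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # A key belongs to the first node point clockwise from its hash.
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

three = ConsistentHashRing(["node-a", "node-b", "node-c"])
two = ConsistentHashRing(["node-a", "node-b"])
# Removing node-c only remaps keys that node-c owned; everything else
# stays put -- the property that makes resharding cheap.
moved = sum(1 for i in range(1000)
            if three.lookup(f"k{i}") != two.lookup(f"k{i}"))
```

The payoff over plain hash-sharding (`hash(key) % N`) is that removing or adding a node disturbs only the keys adjacent to its ring points, rather than remapping nearly every key.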

Page 122: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

Still not a perfect framework

Cons:

● Many dimensions contain sub-dimensions

● Many concerns fundamentally coupled

● Dimensions are often un- or partially-ordered

Pros:

• Makes important concerns explicit

• Cleanly taxonomizes most modern systems

Page 123: Programming at Scale: Dataflow and Consistency

Consistency

How to keep data in sync?

• Partitioning → a single row is spread over multiple machines

• Redundancy → a single datum is spread over multiple machines

Page 131: Programming at Scale: Dataflow and Consistency

Consistency: the core problem

• Clients perform reads and writes

• Data is replicated among a set of servers

• Writes must be performed at all servers

• Reads return the result of one or more past writes

writer → Write(k,v) → [R1] [R2] ← Read(k,v) ← reader

• How should we implement write?

• How should we implement read?

Page 138: Programming at Scale: Dataflow and Consistency

Consistency: CAP Theorem

• A distributed system can satisfy at most two of three guarantees:

1. Consistency:

• all nodes see the same data at any time

• or reads return the latest value written by any client

2. Availability:

• the system allows operations all the time,

• and operations return quickly

3. Partition-tolerance:

• the system continues to work in spite of network partitions

Why care about CAP properties?

Availability
• Reads/writes complete reliably and quickly.
• E.g., at Amazon, each ms of added latency → $6M yearly loss.

Partitions
• Internet router outages
• Under-sea cables cut
• Rack switch outages
• The system should continue functioning normally!

Consistency
• All nodes see the same data at any time, or reads return the latest value written by any client.
• This basically means correctness!

Why is this “theorem” true?

if (partition) { keep going } → !consistent && available
if (partition) { stop } → consistent && !available
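That dichotomy can be made concrete in a few lines. This is a toy illustration (the `PartitionedReplica` class and its `mode` flag are invented for the example): during a partition, a replica either keeps answering, possibly with stale data, or refuses to answer at all.

```python
# During a partition, the update on the other side never arrives, so
# this replica's copy of the value is (possibly) stale.
class PartitionedReplica:
    def __init__(self, mode):
        self.mode = mode          # "AP": keep going, "CP": stop
        self.value = "stale"
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: refusing a possibly-stale read")
        return self.value         # AP mode keeps going, maybe inconsistent

ap = PartitionedReplica("AP")
cp = PartitionedReplica("CP")
ap.partitioned = cp.partitioned = True

answer = ap.read()                # available, but maybe not the latest value
try:
    cp.read()
    cp_answered = True
except RuntimeError:
    cp_answered = False           # consistent, but not available
```

Neither branch gives you both properties while the partition lasts, which is exactly the claim of the theorem.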

Page 154: Programming at Scale: Dataflow and Consistency

CAP Implications

• A distributed storage system can achieve at most two of C, A, and P.

• When partition-tolerance is important, you have to choose between consistency and availability

• Consistency + Availability: RDBMSs (non-replicated)

• Availability + Partition-tolerance: Cassandra, Riak, Dynamo, Voldemort

• Consistency + Partition-tolerance: HBase, HyperTable, BigTable, Spanner

CAP is flawed. PACELC:

if (partition) {
  choose A or C
} else {
  choose latency or consistency
}

Page 157: Programming at Scale: Dataflow and Consistency

Consistency Spectrum

Strong (e.g., Sequential) ←——————→ Eventual
← more consistency | faster reads and writes →

Page 158: Programming at Scale: Dataflow and Consistency

Spectrum Ends: Eventual Consistency

• Eventual Consistency

• If writes to a key stop, all replicas of that key will converge

• Originally from Amazon’s Dynamo and LinkedIn’s Voldemort systems
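The convergence property can be sketched with the simplest possible merge rule, last-writer-wins: each replica holds a (timestamp, value) pair, and an anti-entropy pass keeps the newer one. This is a toy, not Dynamo's actual protocol (which tracks causality with vector clocks), and the function names are invented:

```python
# Last-writer-wins merge: keep whichever state has the newer timestamp.
def lww_merge(a, b):
    return a if a[0] >= b[0] else b

# One anti-entropy pass: every replica adopts the newest state seen.
def anti_entropy(replicas):
    newest = replicas[0]
    for state in replicas[1:]:
        newest = lww_merge(newest, state)
    return [newest] * len(replicas)

# Three replicas saw writes in different orders; writes have now stopped.
replicas = [(3, "v3"), (1, "v1"), (2, "v2")]
replicas = anti_entropy(replicas)
```

Once writes stop, repeated merging is idempotent, so every replica settles on the same (timestamp, value), which is exactly the "replicas of the key will converge" guarantee, and nothing more.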

Page 159: Programming at Scale: Dataflow and Consistency

Spectrum Ends: Strong Consistency

• Strict:
• Absolute time ordering of all shared accesses; reads always return the last write

• Linearizability:
• Each operation is visible (or available) to all other clients in real-time order

• Sequential Consistency [Lamport]:
• “... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.”

• After the fact, find a “reasonable” ordering of the operations (can re-order operations) that obeys sanity (consistency) at all clients, and across clients.

• ACID properties

Page 160: Programming at Scale: Dataflow and Consistency

Many Many Consistency Models

Strong (e.g., Sequential, Strict) … Causal, Red-Blue, CRDTs, Per-key sequential, Probabilistic … Eventual

• Amazon S3 – eventual consistency

• Amazon SimpleDB – eventual or strong

• Google App Engine – strong or eventual

• Yahoo! PNUTS – eventual or strong

• Windows Azure Storage – strong (or eventual)

• Cassandra – eventual or strong (if R + W > N)

• ...

Question: How to choose what to use or support?
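The Cassandra condition above, R + W > N, is worth a sketch: if every write lands on W of N replicas and every read consults R of them, then any read quorum overlaps any write quorum, so some replica in the read set holds the newest version. The `QuorumKV` class and its fields are illustrative, not a real client API:

```python
import random

class QuorumKV:
    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "quorum intersection requires R + W > N"
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]
        self.version = 0   # global write counter, stands in for timestamps

    def write(self, key, value):
        self.version += 1
        # The write succeeds once W randomly-chosen replicas apply it.
        for i in random.sample(range(self.n), self.w):
            self.replicas[i][key] = (self.version, value)

    def read(self, key):
        # Consult R replicas; the highest version seen must be the newest,
        # because the read set intersects the last write set.
        seen = [self.replicas[i].get(key, (0, None))
                for i in random.sample(range(self.n), self.r)]
        return max(seen, key=lambda t: t[0])[1]

kv = QuorumKV()
kv.write("visitors", 1)
kv.write("visitors", 2)
```

With R = W = 1 and N = 3 the assertion fails, and indeed a read could then consult only a replica the last write missed; that is the eventual end of Cassandra's dial.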

Page 163: Programming at Scale: Dataflow and Consistency

Some Consistency Guarantees

Strong Consistency — See all previous writes.

Eventual Consistency — See a subset of previous writes.

Consistent Prefix — See an initial sequence of writes.

Bounded Staleness — See all “old” writes.

Monotonic Reads — See an increasing subset of writes.

Read My Writes — See all writes performed by the reader.

Ordered by strength (metric = the set of allowable read results): Strong is strongest and Eventual weakest, with Read My Writes, Monotonic Reads, Bounded Staleness, and Consistent Prefix in between.
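Monotonic Reads is the easiest of the middle guarantees to picture as a session property: the client remembers the highest version it has observed and refuses results from replicas that are behind it, so its view never travels backwards. This is an illustrative sketch with invented names:

```python
class MonotonicReadSession:
    def __init__(self):
        self.high_water = 0   # highest version this client has seen

    def read(self, replica_version, replica_value):
        if replica_version < self.high_water:
            return None       # would travel back in time: reject/retry
        self.high_water = replica_version
        return replica_value

session = MonotonicReadSession()
first = session.read(5, "C")    # fine: first observation
stale = session.read(3, "B")    # a lagging replica is rejected
later = session.read(6, "D")    # newer data is always acceptable
```

Note what is not guaranteed: version 5 may itself be arbitrarily stale. Monotonic Reads only promises an increasing subset of writes, not a fresh one.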

Page 166: Programming at Scale: Dataflow and Consistency

The Game of Soccer

for half = 1 .. 2 {
  while half not over {
    kick-the-ball-at-the-goal
    for each goal {
      if visiting-team-scored {
        score = Read (“visitors”);
        Write (“visitors”, score + 1);
      } else {
        score = Read (“home”);
        Write (“home”, score + 1);
      }
    }
  }
}

hScore = Read (“home”);
vScore = Read (“visitors”);

if (hScore == vScore)
  play-overtime
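The pseudocode above can be rendered runnable against a minimal in-memory store. `Read`/`Write` here are stand-in helpers over a local dict (no replication yet), and the "while half not over" loop is simulated with a fixed number of random attempts:

```python
import random

store = {"visitors": 0, "home": 0}   # the shared scoreboard state

def Read(key):
    return store[key]

def Write(key, value):
    store[key] = value

random.seed(7)
for half in (1, 2):
    for _attempt in range(10):        # stand-in for "while half not over"
        # kick-the-ball-at-the-goal: an attempt occasionally scores
        if random.random() < 0.2:
            team = "visitors" if random.random() < 0.5 else "home"
            score = Read(team)
            Write(team, score + 1)    # read-modify-write, as on the slides

hScore = Read("home")
vScore = Read("visitors")
overtime = (hScore == vScore)
```

With a single local dict this is trivially correct; the next slides ask which consistency guarantee each participant needs once `Read` and `Write` go against replicated storage.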

Page 186: Programming at Scale: Dataflow and Consistency

Official Scorekeeper

score = Read (“visitors”);
Write (“visitors”, score + 1);

Desired consistency? Strong… = Read My Writes! (the scorekeeper is the only writer, so seeing its own writes is enough)
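Read My Writes can be sketched as a session token: the scorekeeper records the version of its own last write and only accepts reads from replicas at least that fresh. The classes here are illustrative, not a real client library:

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = 0

class ScorekeeperSession:
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_write = 0           # version of my most recent write

    def write(self, value):
        primary = self.replicas[0]    # writes land on the primary first
        primary.version += 1
        primary.value = value
        self.last_write = primary.version

    def read(self):
        # Any replica that has applied my last write is acceptable;
        # replicas behind it could drop a goal I already recorded.
        for r in self.replicas:
            if r.version >= self.last_write:
                return r.value
        raise RuntimeError("no replica reflects my writes yet")

session = ScorekeeperSession([Replica(), Replica()])
score = session.read()                # 0: no writes yet
session.write(score + 1)              # visitors score a goal
```

Because the session only constrains reads relative to this client's own writes, it is far cheaper than strong consistency, yet the scorekeeper's read-modify-write increment never loses a goal.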

Page 191: Programming at Scale: Dataflow and Consistency

Referee

vScore = Read (“visitors”);
hScore = Read (“home”);
if vScore == hScore
  play-overtime

Desired consistency? Strong consistency

Page 194: Programming at Scale: Dataflow and Consistency

Radio Reporter

do {
  BeginTx();
  vScore = Read (“visitors”);
  hScore = Read (“home”);
  EndTx();
  report vScore and hScore;
  sleep (30 minutes);
}

Desired consistency? Consistent Prefix, Monotonic Reads, or Bounded Staleness
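Bounded Staleness, one of the reporter's options, can be sketched with a logical clock: a replica may serve reads only while it lags the newest committed version by at most `bound` versions. The class and the version-based bound are assumptions for illustration; real systems often bound staleness by wall-clock time instead:

```python
class BoundedStalenessReplica:
    def __init__(self, bound):
        self.bound = bound     # max versions this replica may lag
        self.latest = 0        # newest committed version system-wide
        self.applied = 0       # version this replica has applied
        self.state = {}

    def observe_commit(self):
        self.latest += 1       # a write committed somewhere else

    def apply_updates(self, version, snapshot):
        self.applied, self.state = version, dict(snapshot)

    def read(self, key, default=0):
        if self.latest - self.applied > self.bound:
            raise RuntimeError("replica exceeds the staleness bound")
        return self.state.get(key, default)

replica = BoundedStalenessReplica(bound=2)
replica.observe_commit()              # the home team scores elsewhere...
replica.observe_commit()              # ...twice
within_bound = replica.read("home")   # 2 versions behind: still allowed
replica.observe_commit()              # a third goal pushes it over
try:
    replica.read("home")
    too_stale = False
except RuntimeError:
    too_stale = True
```

For a reporter who broadcasts every 30 minutes, a bound of "a few minutes behind" is a fine trade: reads stay fast, and the reported score is never older than the bound.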

Page 200: Programming at Scale: Dataflow and Consistency

Sportswriter

While not end of game {
  drink beer;
  smoke cigar;
}
go out to dinner;
vScore = Read (“visitors”);
hScore = Read (“home”);
write article;

Desired consistency? Eventual, or Bounded Staleness

Page 204: Programming at Scale: Dataflow and Consistency

Statistician

Wait for end of game;
score = Read (“home”);
stat = Read (“season-goals”);
Write (“season-goals”, stat + score);

Desired consistency? Strong Consistency (1st read), Read My Writes (2nd read)

Page 208: Programming at Scale: Dataflow and Consistency

Stat Watcher

do {
  stat = Read (“season-goals”);
  discuss stats with friends;
  sleep (1 day);
}

Desired consistency? Eventual Consistency

Page 211: Programming at Scale: Dataflow and Consistency

Official scorekeeper:
score = Read (“visitors”);
Write (“visitors”, score + 1);

Statistician:
Wait for end of game;
score = Read (“home”);
stat = Read (“season-goals”);
Write (“season-goals”, stat + score);

Referee:
vScore = Read (“visitors”);
hScore = Read (“home”);
if vScore == hScore
  play-overtime

Radio reporter:
do {
  vScore = Read (“visitors”);
  hScore = Read (“home”);
  report vScore and hScore;
  sleep (30 minutes);
}

Sportswriter:
While not end of game {
  drink beer;
  smoke cigar;
}
go out to dinner;
vScore = Read (“visitors”);
hScore = Read (“home”);
write article;

Stat watcher:
stat = Read (“season-goals”);
discuss stats with friends;

Page 212: Programming at Scale: Dataflow and Consistency

Sequential Consistency

• Weaker than strict/strong consistency

• All operations are executed in some sequential order

• Each process issues operations in program order

• Any valid interleaving is allowed

• All agree on the same interleaving

• Each process preserves its program order

• Why is this weaker than strict/strong?

• Nothing is said about the “most recent write”
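The definition can be turned into a brute-force checker for tiny histories: a history is sequentially consistent if SOME interleaving preserves every process's program order and makes each read return the latest preceding write. This exponential toy (names invented; keys start at value 0) is only meant to make the definition concrete:

```python
from itertools import permutations

def sequentially_consistent(histories):
    # histories[p] is process p's ops in program order:
    # ("w", key, value) or ("r", key, value-observed).
    ops = [(p, i) for p, h in enumerate(histories) for i in range(len(h))]
    for order in permutations(ops):
        pos = {op: k for k, op in enumerate(order)}
        # program order must be preserved within each process
        if any(pos[(p, i)] > pos[(p, i + 1)]
               for p, h in enumerate(histories)
               for i in range(len(h) - 1)):
            continue
        # replay the interleaving; every read must see the latest write
        mem, legal = {}, True
        for p, i in order:
            kind, key, val = histories[p][i]
            if kind == "w":
                mem[key] = val
            elif mem.get(key, 0) != val:
                legal = False
                break
        if legal:
            return True
    return False

# P0 writes x=1; P1 reads 0 then 1. Sequentially consistent: order P1's
# first read before the write, even if the write came earlier in real
# time -- exactly the "most recent write" freedom strict consistency lacks.
ok = sequentially_consistent([[("w", "x", 1)],
                              [("r", "x", 0), ("r", "x", 1)]])
# Reading 1 then 0 cannot be serialized at all: no interleaving works.
bad = sequentially_consistent([[("w", "x", 1)],
                               [("r", "x", 1), ("r", "x", 0)]])
```

The `ok` history is the whole point of the slide: sequential consistency may legally reorder the read before a write that already happened in wall-clock time.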

Page 215: Programming at Scale: Dataflow and Consistency

Linearizability

• Assumes sequential consistency, and:
• if TS(x) < TS(y), then OP(x) should precede OP(y) in the sequence

• Stronger than sequential consistency

• Difference between linearizability and serializability?
• Granularity: reads/writes versus transactions

• Example: stay tuned… relevant for lock-free data structures

• Importantly: a property of concurrent objects

Page 218: Programming at Scale: Dataflow and Consistency

Causal consistency

Page 219: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

Page 220: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order. • Causally?

Page 221: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order. • Causally?

Causal:If a write produces a value thatcauses another write, they are causally related

X = 1

if (X > 0) {
    Y = 1
}

Causal consistency → all see X=1, Y=1 in same order

Page 222: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

Page 223: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Page 224: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Page 225: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted

Page 226: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted

Page 227: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted / Permitted
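The permitted/not-permitted distinction can be sketched in a few lines (a hypothetical model, not from the lecture): two writes are causally related when one process read the first value before issuing the second write, as in the X/Y example above. Causal consistency constrains only such pairs; truly concurrent writes may be delivered in different orders at different replicas.

```python
# W(x)1 happens on P1; P2 reads x=1 and then writes y=1,
# so W(x)1 → W(y)1 is a causal pair. W(z)1 is concurrent with both.
causal_pairs = {("W(x)1", "W(y)1")}

def causally_consistent(view):
    """A replica's view (a list of writes in the order it observed them)
    is valid iff every causal pair appears cause-before-effect."""
    return all(view.index(a) < view.index(b) for a, b in causal_pairs)

print(causally_consistent(["W(x)1", "W(y)1", "W(z)1"]))  # True
print(causally_consistent(["W(z)1", "W(x)1", "W(y)1"]))  # True: W(z)1 is concurrent
print(causally_consistent(["W(y)1", "W(x)1", "W(z)1"]))  # False: effect before cause
```

A view that places the effect W(y)1 before its cause W(x)1 is the "not permitted" history; views that differ only in where the concurrent W(z)1 lands are all "permitted."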

Page 228: Programming at Scale: Dataflow and Consistency

Consistency models summary

Page 229: Programming at Scale: Dataflow and Consistency

Consistency models summary

Consistency      Description

Strict           Absolute time ordering of all shared accesses matters.

Linearizability  All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.

Sequential       All processes see all shared accesses in the same order. Accesses are not ordered in time.

Causal           All processes see causally-related shared accesses in the same order.

FIFO             All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.

(a)

Consistency  Description

Weak         Shared data can be counted on to be consistent only after a synchronization is done.

Release      Shared data are made consistent when a critical region is exited.

Entry        Shared data pertaining to a critical region are made consistent when a critical region is entered.

(b)