Programming at Scale: Dataflow and Consistency

Transcript of cs378h lecture: Programming at Scale: Dataflow and Consistency

Page 1: Programming at Scale: Dataflow and Consistency

cs378h

Programming at Scale: Dataflow and Consistency

Page 2: Programming at Scale: Dataflow and Consistency

Today

Questions?

Administrivia

• Rust lab due today!

• Project Proposal Due Thursday!

Agenda:

• Dataflow Wrapup

• Concurrency & Consistency at Scale

Page 3: Programming at Scale: Dataflow and Consistency

Review: K-Means

public void kmeans() {
    while (…) {
        for each point
            find_nearest_center(point);
        for each center
            compute_new_center(center);
    }
}

Pages 4–12: Programming at Scale: Dataflow and Consistency

Review: K-Means (incremental builds of the slide above)

The point-assignment loop ("for each point: find_nearest_center") is annotated as the map phase; the center-update loop ("for each center: compute_new_center") is annotated as the reduce phase.

[Diagram: parallel fnc (find_nearest_center) map tasks feed cnc (compute_new_center) reduce tasks, flowing from Input to Output; the map/reduce pattern repeats across iterations.]
Key idea: adapt workload to parallel patterns

Questions:
• What kinds of computations can this express?
• What other patterns could we use?
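The decomposition the slides annotate can be sketched directly: assigning each point to its nearest center is a map over points, and recomputing each center is a reduce over its assigned points. A minimal pure-Python sketch with toy 1-D data (all names are illustrative, not from the course code):

```python
from collections import defaultdict

def find_nearest_center(point, centers):
    # map phase: emit the index of the closest center for this point
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))

def compute_new_center(points):
    # reduce phase: the new center is the mean of its assigned points
    return sum(points) / len(points)

def kmeans(points, centers, iterations=10):
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:                       # "for each point" -> map
            groups[find_nearest_center(p, centers)].append(p)
        for i, pts in groups.items():          # "for each center" -> reduce
            centers[i] = compute_new_center(pts)
    return centers

print(kmeans([1.0, 2.0, 10.0, 11.0], [0.0, 5.0]))  # → [1.5, 10.5]
```

Each iteration is one map/reduce round; a real MapReduce k-means re-runs the job per iteration, which is exactly the cost Spark later attacks.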

Page 13: Programming at Scale: Dataflow and Consistency

How Does Parallelization Work?

INPUT FILE(s)

Page 15: Programming at Scale: Dataflow and Consistency

Execution

Group by?

Key idea → shuffle == sort!
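One way to see why shuffle reduces to sort: once map outputs are ordered by key, a single linear pass groups equal keys into contiguous runs, which is exactly the input each reduce task needs. A toy sketch (function name is illustrative):

```python
from itertools import groupby
from operator import itemgetter

def shuffle(map_outputs):
    """Group (key, value) pairs by key via sorting, as a MapReduce shuffle does."""
    ordered = sorted(map_outputs, key=itemgetter(0))      # shuffle == sort
    return {k: [v for _, v in run]
            for k, run in groupby(ordered, key=itemgetter(0))}

pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1), ("the", 1)]
print(shuffle(pairs))  # → {'cat': [1], 'sat': [1], 'the': [1, 1, 1]}
```

Note that `groupby` only merges adjacent equal keys, so the sort is what makes the grouping correct, mirroring the sort-based shuffle in MapReduce implementations.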

Page 17: Programming at Scale: Dataflow and Consistency

Task Granularity And Pipelining

|map tasks| >> |machines| -- why?
• Minimize fault recovery time
• Pipeline map with other tasks
• Easier to load balance dynamically

Page 18: Programming at Scale: Dataflow and Consistency

MapReduce: not without Controversy

• Backwards step in programming paradigm
• Sub-optimal: brute force, no indexing
• Not novel: 25-year-old ideas from the DBMS literature (it's just a group-by aggregate engine)
• Missing most DBMS features: schema, foreign keys, …
• Incompatible with most DBMS tools

So why is it such a big success?

Page 27: Programming at Scale: Dataflow and Consistency

Why is MapReduce backwards?

• Backwards step in programming paradigm
• Sub-optimal: brute force, no indexing
• Not novel: 25-year-old ideas from the DBMS literature (it's just a group-by aggregate engine)
• Missing most DBMS features: schema, foreign keys, …
• Incompatible with most DBMS tools

So why is it such a big success?

Page 34: Programming at Scale: Dataflow and Consistency

MapReduce and Dataflow

• MR is a dataflow engine
• Lots of others:
  • Dryad
  • DryadLINQ
  • Dandelion
  • CIEL
  • GraphChi/Pregel
  • Spark

Taxonomies:
• DAG instead of BSP
• Interface variety: memory FIFO, disk, network
• Flexible modular composition

Page 40: Programming at Scale: Dataflow and Consistency

Dryad (2007): 2-D Piping

• Unix pipes: 1-D
  grep | sed | sort | awk | perl

• Dryad: 2-D (the counts are per-stage replica counts)
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
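The second dimension is replication: each stage is a set of vertices processing partitions of the stream, not a single process. A toy sketch of fanning a stage across replicas (replica counts and stage functions are illustrative; real Dryad vertices run on separate machines connected by channels):

```python
def run_stage(func, replicas, items):
    # Each of `replicas` vertices processes one partition of the input;
    # here partitions are round-robin slices of the item list.
    partitions = [items[i::replicas] for i in range(replicas)]
    return [func(x) for part in partitions for x in part]

# A tiny 2-D pipeline: "upper"^3 | "exclaim"^2
data = ["a", "b", "c", "d"]
out = run_stage(str.upper, 3, data)
out = run_stage(lambda s: s + "!", 2, out)
print(sorted(out))  # → ['A!', 'B!', 'C!', 'D!']
```

The output order depends on how partitions interleave, which is why the sketch sorts before printing; real dataflow systems likewise make no cross-partition ordering guarantees.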

Page 42: Programming at Scale: Dataflow and Consistency

Dataflow Engines

Page 43: Programming at Scale: Dataflow and Consistency

Dataflow Job Structure

[Diagram: a DAG of vertices (processes) running grep, sed, sort, awk, and perl, connected by channels; the job reads input files and writes output files, and the vertices running the same program form a stage.]

How to implement?

Page 45: Programming at Scale: Dataflow and Consistency

Channels

[Diagram: vertex X feeds vertex M through a channel carrying items.]

Finite streams of items:
• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)

Key idea: encapsulate data movement behind the channel abstraction → gets the programmer out of the picture.
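The abstraction can be sketched as a uniform send/receive interface over different transports; the vertex code only sees the interface and never learns whether the channel is a file or an in-memory FIFO. A minimal sketch (class and method names are illustrative, not Dryad's API):

```python
import queue
import tempfile

class MemoryFifoChannel:
    """Intra-machine channel backed by an in-memory FIFO."""
    def __init__(self):
        self._q = queue.Queue()
    def send(self, item):
        self._q.put(item)
    def close(self):
        self._q.put(None)                 # sentinel marks end of stream
    def items(self):
        while (item := self._q.get()) is not None:
            yield item

class FileChannel:
    """Temporary-file channel, one item per line."""
    def __init__(self):
        self._f = tempfile.TemporaryFile(mode="w+")
    def send(self, item):
        self._f.write(item + "\n")
    def close(self):
        self._f.flush()
        self._f.seek(0)                   # rewind so the consumer can read
    def items(self):
        for line in self._f:
            yield line.rstrip("\n")

def vertex(in_ch, out_ch, func):
    # Vertex code is transport-agnostic: it only uses the channel interface.
    for item in in_ch.items():
        out_ch.send(func(item))
    out_ch.close()

src = MemoryFifoChannel()
for word in ["error: disk", "ok", "error: net"]:
    src.send(word)
src.close()
dst = FileChannel()
vertex(src, dst, str.upper)
result = list(dst.items())
print(result)  # → ['ERROR: DISK', 'OK', 'ERROR: NET']
```

Swapping `FileChannel` for `MemoryFifoChannel` (or, in a real system, a TCP pipe) changes nothing in `vertex`, which is the point of the slide's key idea.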

Page 47: Programming at Scale: Dataflow and Consistency

Spark (2012) Background

Commodity clusters: important platform
• In industry: search, machine translation, ad targeting, …
• In research: bioinformatics, NLP, climate simulation, …

Cluster-scale models (e.g. MR) are the de facto standard
• Fault tolerance through replicated durable storage
• Dataflow is the common theme

Slide callouts: Multi-core, Iteration

Page 49: Programming at Scale: Dataflow and Consistency

Motivation

Programming models for clusters transform data flowing from stable storage to stable storage.

E.g., MapReduce:

[Diagram: Input → Map tasks → Reduce tasks → Output.]

Benefits of dataflow: the runtime can decide where to run tasks and can automatically recover from failures.

Page 51: Programming at Scale: Dataflow and Consistency

Iterative Computations: PageRank

[Diagram: three chained MapReduce jobs; each iteration's Output is written to stable storage and re-read as the next iteration's Input.]

Solution: augment the dataflow model with “resilient distributed datasets” (RDDs)

Page 54: Programming at Scale: Dataflow and Consistency

Programming Model

• Resilient distributed datasets (RDDs)
  • Immutable collections partitioned across the cluster that can be rebuilt if a partition is lost
  • Created by transforming data in stable storage using dataflow operators (map, filter, group-by, …)
  • Can be cached across parallel operations
• Parallel operations on RDDs
  • Reduce, collect, count, save, …
• Restricted shared variables
  • Accumulators, broadcast variables
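The model above can be sketched in a few dozen lines of pure Python: transformations build a description of the computation lazily, actions force it, and cache() memoizes the result. This is a toy, loosely following the Spark paper's vocabulary, not Spark's actual implementation:

```python
class RDD:
    """Toy RDD: immutable, lazily computed, optionally cached."""
    def __init__(self, compute):
        self._compute = compute     # closure that (re)builds this dataset
        self._cache = None
        self._caching = False

    # transformations: build new RDDs without computing anything
    def map(self, f):
        return RDD(lambda: [f(x) for x in self._data()])
    def filter(self, p):
        return RDD(lambda: [x for x in self._data() if p(x)])
    def cache(self):
        self._caching = True
        return self

    # actions: force evaluation
    def collect(self):
        return self._data()
    def count(self):
        return len(self._data())

    def _data(self):
        if self._cache is not None:
            return self._cache
        data = self._compute()
        if self._caching:
            self._cache = data      # later actions reuse this result
        return data

def text_file(lines):               # stand-in for spark.textFile(...)
    return RDD(lambda: list(lines))

log = ["INFO\tok", "ERROR\tsrv1\tdisk foo", "ERROR\tsrv2\tnet bar"]
messages = (text_file(log)
            .filter(lambda l: l.startswith("ERROR"))
            .map(lambda l: l.split("\t")[2])
            .cache())
print(messages.filter(lambda m: "foo" in m).count())  # → 1
```

The first action runs the whole chain; subsequent queries over `messages` read the memoized list, which is the in-memory reuse the Log Mining example on the next slides exploits.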

Page 57: Programming at Scale: Dataflow and Consistency

Example: Log Mining

• Load error messages from a log into memory, then interactively search for various patterns

lines = spark.textFile(“hdfs://...”)            // Base RDD
errors = lines.filter(_.startsWith(“ERROR”))    // Transformed RDD
messages = errors.map(_.split(‘\t’)(2))
cachedMsgs = messages.cache()                   // Cached RDD

cachedMsgs.filter(_.contains(“foo”)).count      // Parallel operation
cachedMsgs.filter(_.contains(“bar”)).count
. . .

[Diagram, built up across pages 58–83: the Driver ships tasks to three Workers; each Worker reads one block of the input file (Block 1–3), caches its partition of messages (Cache 1–3), and returns results to the Driver. Later queries are served from the caches.]

Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)

Page 84: Programming at Scale: Dataflow and Consistency

RDD Fault Tolerance

• RDDs maintain lineage information that can be used to reconstruct lost partitions
• Ex:

cachedMsgs = textFile(...).filter(_.contains(“error”))
                          .map(_.split(‘\t’)(2))
                          .persist()

Lineage chain: HdfsRDD (path: hdfs://…) → FilteredRDD (func: contains(…)) → MappedRDD (func: split(…)) → CachedRDD
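The lineage chain on this slide can be sketched as linked RDD nodes, each of which knows only its parent and its function; losing a cached partition then just means re-running the chain from the source for that partition. Toy code whose class names mirror the slide's HdfsRDD/FilteredRDD/MappedRDD (the APIs are illustrative):

```python
class HdfsRDD:
    def __init__(self, blocks):          # stand-in for reading hdfs://... blocks
        self.blocks = blocks
    def partition(self, i):
        return self.blocks[i]

class FilteredRDD:
    def __init__(self, parent, func):
        self.parent, self.func = parent, func
    def partition(self, i):              # lineage: recompute from parent on demand
        return [x for x in self.parent.partition(i) if self.func(x)]

class MappedRDD:
    def __init__(self, parent, func):
        self.parent, self.func = parent, func
    def partition(self, i):
        return [self.func(x) for x in self.parent.partition(i)]

blocks = [["error\ta\tdisk", "info\tb\tok"], ["error\tc\tnet"]]
lineage = MappedRDD(
    FilteredRDD(HdfsRDD(blocks), lambda l: "error" in l),
    lambda l: l.split("\t")[2])

cache = {0: lineage.partition(0)}        # suppose partition 1's cache was lost
rebuilt = lineage.partition(1)           # rebuild it purely from lineage
print(cache[0], rebuilt)                 # → ['disk'] ['net']
```

Because each node stores a deterministic function of its parent, only the lost partition is recomputed; no replication of the intermediate data is needed.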

Page 85: Programming at Scale: Dataflow and Consistency

Data-Parallel Computation Systems

[Table, built up across pages 85–97; columns are systems, rows are layers:]

              Parallel       Map-Reduce       Hadoop       Dryad
              Databases
Application                  Sawzall          Pig, Hive    DryadLINQ, Scope
Language      SQL            Sawzall          ≈SQL         LINQ, SQL
Execution                    Map-Reduce       Hadoop       Dryad (Cosmos, HPC, Azure)
Storage                      GFS, BigTable    HDFS, S3     Cosmos, Azure, SQL Server

Spark (page 97) joins as the newest system in this lineage.

Page 98: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Page 99: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Consistency

Page 100: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Da

taM

od

el

Consistency

Page 101: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Da

taM

od

el

Consistency

Page 102: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 103: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

• Atomicity• Consistency• Isolation• Durability

Page 104: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

• Basically Available• Soft State• Eventually Consistent

• Atomicity• Consistency• Isolation• Durability

Page 105: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 106: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Page 107: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Key Value Stores

Page 108: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

22

Strong: ACID Eventual: BASE

Da

taM

od

el

Consistency

Key Value Stores

Document Stores

Page 109: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

Dimensions for characterizing scalable data stores, spanning the spectrum from Strong (ACID) to Eventual (BASE):

• Data Model: Key-Value Stores, Document Stores, Wide-Column Stores

• Consistency: Strong (ACID) ←→ Eventual (BASE)

• Sharding/Partitioning: Shared-Disk, Range-Sharding, Hash-Sharding, Consistent Hashing

• Replication: Primary-Backup, Commit-Consensus Protocol, Sync/Async

• Storage: Logging, Update-In-Place, Caching, In-Memory Storage

• Query Support: Secondary Indexing, Query Planning, Materialized Views, Analytics
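One of the sharding options above, consistent hashing, rewards a quick sketch: nodes and keys hash onto the same ring, and a key belongs to the first node point clockwise from it. This is an illustrative toy (the `ConsistentHashRing` class, its method names, and the virtual-node count are assumptions, not any particular system's API):

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Stable hash mapping a key to a point on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=8):
        # Each physical node gets `vnodes` points on the ring to
        # smooth out the key distribution.
        self.ring = sorted(
            (ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # A key belongs to the first node point clockwise from its hash.
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

three = ConsistentHashRing(["node-a", "node-b", "node-c"])
two = ConsistentHashRing(["node-a", "node-b"])
# Removing node-c only remaps keys that node-c owned; everything else
# stays put -- the property that makes resharding cheap.
moved = sum(1 for i in range(1000)
            if three.lookup(f"k{i}") != two.lookup(f"k{i}"))
```

The payoff over plain hash-sharding (`hash(key) % N`) is that removing or adding a node disturbs only the keys adjacent to its ring points, rather than remapping nearly every key.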

Page 122: Programming at Scale: Dataflow and Consistency

(Yet) Another Framework

Still not a perfect framework

Cons:

● Many dimensions contain sub-dimensions

● Many concerns fundamentally coupled

● Dimensions are often un- or partially-ordered

Pros:

• Makes important concerns explicit

• Cleanly taxonomizes most modern systems

Page 123: Programming at Scale: Dataflow and Consistency

Consistency

How to keep data in sync?

• Partitioning → a single row is spread over multiple machines

• Redundancy → a single datum is spread over multiple machines

Page 131: Programming at Scale: Dataflow and Consistency

Consistency: the core problem

• Clients perform reads and writes

• Data is replicated among a set of servers

• Writes must be performed at all servers

• Reads return the result of one or more past writes

writer → Write(k,v) → [R1] [R2] ← Read(k,v) ← reader

• How should we implement write?

• How should we implement read?

Page 138: Programming at Scale: Dataflow and Consistency

Consistency: CAP Theorem

• A distributed system can satisfy at most two of three guarantees:

1. Consistency:

• all nodes see the same data at any time

• or reads return the latest value written by any client

2. Availability:

• the system allows operations all the time,

• and operations return quickly

3. Partition-tolerance:

• the system continues to work in spite of network partitions

Why care about CAP properties?

Availability
• Reads/writes complete reliably and quickly.
• E.g., at Amazon, each ms of added latency → $6M yearly loss.

Partitions
• Internet router outages
• Under-sea cables cut
• Rack switch outages
• The system should continue functioning normally!

Consistency
• All nodes see the same data at any time, or reads return the latest value written by any client.
• This basically means correctness!

Why is this “theorem” true?

if (partition) { keep going } → !consistent && available
if (partition) { stop } → consistent && !available
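That dichotomy can be made concrete in a few lines. This is a toy illustration (the `PartitionedReplica` class and its `mode` flag are invented for the example): during a partition, a replica either keeps answering, possibly with stale data, or refuses to answer at all.

```python
# During a partition, the update on the other side never arrives, so
# this replica's copy of the value is (possibly) stale.
class PartitionedReplica:
    def __init__(self, mode):
        self.mode = mode          # "AP": keep going, "CP": stop
        self.value = "stale"
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: refusing a possibly-stale read")
        return self.value         # AP mode keeps going, maybe inconsistent

ap = PartitionedReplica("AP")
cp = PartitionedReplica("CP")
ap.partitioned = cp.partitioned = True

answer = ap.read()                # available, but maybe not the latest value
try:
    cp.read()
    cp_answered = True
except RuntimeError:
    cp_answered = False           # consistent, but not available
```

Neither branch gives you both properties while the partition lasts, which is exactly the claim of the theorem.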

Page 154: Programming at Scale: Dataflow and Consistency

CAP Implications

• A distributed storage system can achieve at most two of C, A, and P.

• When partition-tolerance is important, you have to choose between consistency and availability

• Consistency + Availability: RDBMSs (non-replicated)

• Availability + Partition-tolerance: Cassandra, Riak, Dynamo, Voldemort

• Consistency + Partition-tolerance: HBase, HyperTable, BigTable, Spanner

CAP is flawed. PACELC:

if (partition) {
  choose A or C
} else {
  choose latency or consistency
}

Page 157: Programming at Scale: Dataflow and Consistency

Consistency Spectrum

Strong (e.g., Sequential) ←——————→ Eventual
← more consistency | faster reads and writes →

Page 158: Programming at Scale: Dataflow and Consistency

Spectrum Ends: Eventual Consistency

• Eventual Consistency

• If writes to a key stop, all replicas of that key will converge

• Originally from Amazon’s Dynamo and LinkedIn’s Voldemort systems
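The convergence property can be sketched with the simplest possible merge rule, last-writer-wins: each replica holds a (timestamp, value) pair, and an anti-entropy pass keeps the newer one. This is a toy, not Dynamo's actual protocol (which tracks causality with vector clocks), and the function names are invented:

```python
# Last-writer-wins merge: keep whichever state has the newer timestamp.
def lww_merge(a, b):
    return a if a[0] >= b[0] else b

# One anti-entropy pass: every replica adopts the newest state seen.
def anti_entropy(replicas):
    newest = replicas[0]
    for state in replicas[1:]:
        newest = lww_merge(newest, state)
    return [newest] * len(replicas)

# Three replicas saw writes in different orders; writes have now stopped.
replicas = [(3, "v3"), (1, "v1"), (2, "v2")]
replicas = anti_entropy(replicas)
```

Once writes stop, repeated merging is idempotent, so every replica settles on the same (timestamp, value), which is exactly the "replicas of the key will converge" guarantee, and nothing more.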

Page 159: Programming at Scale: Dataflow and Consistency

Spectrum Ends: Strong Consistency

• Strict:
• Absolute time ordering of all shared accesses; reads always return the last write

• Linearizability:
• Each operation is visible (or available) to all other clients in real-time order

• Sequential Consistency [Lamport]:
• “... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.”

• After the fact, find a “reasonable” ordering of the operations (can re-order operations) that obeys sanity (consistency) at all clients, and across clients.

• ACID properties

Page 160: Programming at Scale: Dataflow and Consistency

Many Many Consistency Models

Strong (e.g., Sequential, Strict) … Causal, Red-Blue, CRDTs, Per-key sequential, Probabilistic … Eventual

• Amazon S3 – eventual consistency

• Amazon SimpleDB – eventual or strong

• Google App Engine – strong or eventual

• Yahoo! PNUTS – eventual or strong

• Windows Azure Storage – strong (or eventual)

• Cassandra – eventual or strong (if R + W > N)

• ...

Question: How to choose what to use or support?
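The Cassandra condition above, R + W > N, is worth a sketch: if every write lands on W of N replicas and every read consults R of them, then any read quorum overlaps any write quorum, so some replica in the read set holds the newest version. The `QuorumKV` class and its fields are illustrative, not a real client API:

```python
import random

class QuorumKV:
    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "quorum intersection requires R + W > N"
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]
        self.version = 0   # global write counter, stands in for timestamps

    def write(self, key, value):
        self.version += 1
        # The write succeeds once W randomly-chosen replicas apply it.
        for i in random.sample(range(self.n), self.w):
            self.replicas[i][key] = (self.version, value)

    def read(self, key):
        # Consult R replicas; the highest version seen must be the newest,
        # because the read set intersects the last write set.
        seen = [self.replicas[i].get(key, (0, None))
                for i in random.sample(range(self.n), self.r)]
        return max(seen, key=lambda t: t[0])[1]

kv = QuorumKV()
kv.write("visitors", 1)
kv.write("visitors", 2)
```

With R = W = 1 and N = 3 the assertion fails, and indeed a read could then consult only a replica the last write missed; that is the eventual end of Cassandra's dial.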

Page 163: Programming at Scale: Dataflow and Consistency

Some Consistency Guarantees

Strong Consistency — See all previous writes.

Eventual Consistency — See a subset of previous writes.

Consistent Prefix — See an initial sequence of writes.

Bounded Staleness — See all “old” writes.

Monotonic Reads — See an increasing subset of writes.

Read My Writes — See all writes performed by the reader.

Ordered by strength (metric = the set of allowable read results): Strong is strongest and Eventual weakest, with Read My Writes, Monotonic Reads, Bounded Staleness, and Consistent Prefix in between.
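Monotonic Reads is the easiest of the middle guarantees to picture as a session property: the client remembers the highest version it has observed and refuses results from replicas that are behind it, so its view never travels backwards. This is an illustrative sketch with invented names:

```python
class MonotonicReadSession:
    def __init__(self):
        self.high_water = 0   # highest version this client has seen

    def read(self, replica_version, replica_value):
        if replica_version < self.high_water:
            return None       # would travel back in time: reject/retry
        self.high_water = replica_version
        return replica_value

session = MonotonicReadSession()
first = session.read(5, "C")    # fine: first observation
stale = session.read(3, "B")    # a lagging replica is rejected
later = session.read(6, "D")    # newer data is always acceptable
```

Note what is not guaranteed: version 5 may itself be arbitrarily stale. Monotonic Reads only promises an increasing subset of writes, not a fresh one.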

Page 166: Programming at Scale: Dataflow and Consistency

The Game of Soccer

for half = 1 .. 2 {
  while half not over {
    kick-the-ball-at-the-goal
    for each goal {
      if visiting-team-scored {
        score = Read (“visitors”);
        Write (“visitors”, score + 1);
      } else {
        score = Read (“home”);
        Write (“home”, score + 1);
      }
    }
  }
}

hScore = Read (“home”);
vScore = Read (“visitors”);

if (hScore == vScore)
  play-overtime
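The pseudocode above can be rendered runnable against a minimal in-memory store. `Read`/`Write` here are stand-in helpers over a local dict (no replication yet), and the "while half not over" loop is simulated with a fixed number of random attempts:

```python
import random

store = {"visitors": 0, "home": 0}   # the shared scoreboard state

def Read(key):
    return store[key]

def Write(key, value):
    store[key] = value

random.seed(7)
for half in (1, 2):
    for _attempt in range(10):        # stand-in for "while half not over"
        # kick-the-ball-at-the-goal: an attempt occasionally scores
        if random.random() < 0.2:
            team = "visitors" if random.random() < 0.5 else "home"
            score = Read(team)
            Write(team, score + 1)    # read-modify-write, as on the slides

hScore = Read("home")
vScore = Read("visitors")
overtime = (hScore == vScore)
```

With a single local dict this is trivially correct; the next slides ask which consistency guarantee each participant needs once `Read` and `Write` go against replicated storage.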

Page 186: Programming at Scale: Dataflow and Consistency

Official Scorekeeper

score = Read (“visitors”);
Write (“visitors”, score + 1);

Desired consistency? Strong… = Read My Writes! (the scorekeeper is the only writer, so seeing its own writes is enough)
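Read My Writes can be sketched as a session token: the scorekeeper records the version of its own last write and only accepts reads from replicas at least that fresh. The classes here are illustrative, not a real client library:

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = 0

class ScorekeeperSession:
    def __init__(self, replicas):
        self.replicas = replicas
        self.last_write = 0           # version of my most recent write

    def write(self, value):
        primary = self.replicas[0]    # writes land on the primary first
        primary.version += 1
        primary.value = value
        self.last_write = primary.version

    def read(self):
        # Any replica that has applied my last write is acceptable;
        # replicas behind it could drop a goal I already recorded.
        for r in self.replicas:
            if r.version >= self.last_write:
                return r.value
        raise RuntimeError("no replica reflects my writes yet")

session = ScorekeeperSession([Replica(), Replica()])
score = session.read()                # 0: no writes yet
session.write(score + 1)              # visitors score a goal
```

Because the session only constrains reads relative to this client's own writes, it is far cheaper than strong consistency, yet the scorekeeper's read-modify-write increment never loses a goal.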

Page 191: Programming at Scale: Dataflow and Consistency

Referee

vScore = Read (“visitors”);
hScore = Read (“home”);
if vScore == hScore
  play-overtime

Desired consistency? Strong consistency

Page 194: Programming at Scale: Dataflow and Consistency

Radio Reporter

do {
  BeginTx();
  vScore = Read (“visitors”);
  hScore = Read (“home”);
  EndTx();
  report vScore and hScore;
  sleep (30 minutes);
}

Desired consistency? Consistent Prefix, Monotonic Reads, or Bounded Staleness
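Bounded Staleness, one of the reporter's options, can be sketched with a logical clock: a replica may serve reads only while it lags the newest committed version by at most `bound` versions. The class and the version-based bound are assumptions for illustration; real systems often bound staleness by wall-clock time instead:

```python
class BoundedStalenessReplica:
    def __init__(self, bound):
        self.bound = bound     # max versions this replica may lag
        self.latest = 0        # newest committed version system-wide
        self.applied = 0       # version this replica has applied
        self.state = {}

    def observe_commit(self):
        self.latest += 1       # a write committed somewhere else

    def apply_updates(self, version, snapshot):
        self.applied, self.state = version, dict(snapshot)

    def read(self, key, default=0):
        if self.latest - self.applied > self.bound:
            raise RuntimeError("replica exceeds the staleness bound")
        return self.state.get(key, default)

replica = BoundedStalenessReplica(bound=2)
replica.observe_commit()              # the home team scores elsewhere...
replica.observe_commit()              # ...twice
within_bound = replica.read("home")   # 2 versions behind: still allowed
replica.observe_commit()              # a third goal pushes it over
try:
    replica.read("home")
    too_stale = False
except RuntimeError:
    too_stale = True
```

For a reporter who broadcasts every 30 minutes, a bound of "a few minutes behind" is a fine trade: reads stay fast, and the reported score is never older than the bound.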

Page 200: Programming at Scale: Dataflow and Consistency

Sportswriter

While not end of game {
  drink beer;
  smoke cigar;
}
go out to dinner;
vScore = Read (“visitors”);
hScore = Read (“home”);
write article;

Desired consistency? Eventual, or Bounded Staleness

Page 204: Programming at Scale: Dataflow and Consistency

Statistician

Wait for end of game;
score = Read (“home”);
stat = Read (“season-goals”);
Write (“season-goals”, stat + score);

Desired consistency? Strong Consistency (1st read), Read My Writes (2nd read)

Page 208: Programming at Scale: Dataflow and Consistency

Stat Watcher

do {
  stat = Read (“season-goals”);
  discuss stats with friends;
  sleep (1 day);
}

Desired consistency? Eventual Consistency

Page 211: Programming at Scale: Dataflow and Consistency

Official scorekeeper:
score = Read (“visitors”);
Write (“visitors”, score + 1);

Statistician:
Wait for end of game;
score = Read (“home”);
stat = Read (“season-goals”);
Write (“season-goals”, stat + score);

Referee:
vScore = Read (“visitors”);
hScore = Read (“home”);
if vScore == hScore
  play-overtime

Radio reporter:
do {
  vScore = Read (“visitors”);
  hScore = Read (“home”);
  report vScore and hScore;
  sleep (30 minutes);
}

Sportswriter:
While not end of game {
  drink beer;
  smoke cigar;
}
go out to dinner;
vScore = Read (“visitors”);
hScore = Read (“home”);
write article;

Stat watcher:
stat = Read (“season-goals”);
discuss stats with friends;

Page 212: Programming at Scale: Dataflow and Consistency

Sequential Consistency

• Weaker than strict/strong consistency

• All operations are executed in some sequential order

• Each process issues operations in program order

• Any valid interleaving is allowed

• All agree on the same interleaving

• Each process preserves its program order

• Why is this weaker than strict/strong?

• Nothing is said about the “most recent write”
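The definition can be turned into a brute-force checker for tiny histories: a history is sequentially consistent if SOME interleaving preserves every process's program order and makes each read return the latest preceding write. This exponential toy (names invented; keys start at value 0) is only meant to make the definition concrete:

```python
from itertools import permutations

def sequentially_consistent(histories):
    # histories[p] is process p's ops in program order:
    # ("w", key, value) or ("r", key, value-observed).
    ops = [(p, i) for p, h in enumerate(histories) for i in range(len(h))]
    for order in permutations(ops):
        pos = {op: k for k, op in enumerate(order)}
        # program order must be preserved within each process
        if any(pos[(p, i)] > pos[(p, i + 1)]
               for p, h in enumerate(histories)
               for i in range(len(h) - 1)):
            continue
        # replay the interleaving; every read must see the latest write
        mem, legal = {}, True
        for p, i in order:
            kind, key, val = histories[p][i]
            if kind == "w":
                mem[key] = val
            elif mem.get(key, 0) != val:
                legal = False
                break
        if legal:
            return True
    return False

# P0 writes x=1; P1 reads 0 then 1. Sequentially consistent: order P1's
# first read before the write, even if the write came earlier in real
# time -- exactly the "most recent write" freedom strict consistency lacks.
ok = sequentially_consistent([[("w", "x", 1)],
                              [("r", "x", 0), ("r", "x", 1)]])
# Reading 1 then 0 cannot be serialized at all: no interleaving works.
bad = sequentially_consistent([[("w", "x", 1)],
                               [("r", "x", 1), ("r", "x", 0)]])
```

The `ok` history is the whole point of the slide: sequential consistency may legally reorder the read before a write that already happened in wall-clock time.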

Page 215: Programming at Scale: Dataflow and Consistency

Linearizability

• Assumes sequential consistency, and:
• if TS(x) < TS(y), then OP(x) should precede OP(y) in the sequence

• Stronger than sequential consistency

• Difference between linearizability and serializability?
• Granularity: reads/writes versus transactions

• Example: stay tuned… relevant for lock-free data structures

• Importantly: a property of concurrent objects

Page 218: Programming at Scale: Dataflow and Consistency

Causal consistency

Page 219: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

Page 220: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order. • Causally?

Page 221: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order. • Causally?

Causal:If a write produces a value thatcauses another write, they are causally related

X = 1

if (X > 0) {
    Y = 1
}

Causal consistency → all see X=1, Y=1 in same order

Page 222: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

Page 223: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Page 224: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Page 225: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted

Page 226: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted

Page 227: Programming at Scale: Dataflow and Consistency

Causal consistency

• Causally related writes seen by all processes in same order.

• Causally?

• Concurrent writes may be seen in different orders on different machines

Not permitted / Permitted
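The permitted/not-permitted distinction can be sketched in a few lines (a hypothetical model, not from the lecture): two writes are causally related when one process read the first value before issuing the second write, as in the X/Y example above. Causal consistency constrains only such pairs; truly concurrent writes may be delivered in different orders at different replicas.

```python
# W(x)1 happens on P1; P2 reads x=1 and then writes y=1,
# so W(x)1 → W(y)1 is a causal pair. W(z)1 is concurrent with both.
causal_pairs = {("W(x)1", "W(y)1")}

def causally_consistent(view):
    """A replica's view (a list of writes in the order it observed them)
    is valid iff every causal pair appears cause-before-effect."""
    return all(view.index(a) < view.index(b) for a, b in causal_pairs)

print(causally_consistent(["W(x)1", "W(y)1", "W(z)1"]))  # True
print(causally_consistent(["W(z)1", "W(x)1", "W(y)1"]))  # True: W(z)1 is concurrent
print(causally_consistent(["W(y)1", "W(x)1", "W(z)1"]))  # False: effect before cause
```

A view that places the effect W(y)1 before its cause W(x)1 is the "not permitted" history; views that differ only in where the concurrent W(z)1 lands are all "permitted."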

Page 228: Programming at Scale: Dataflow and Consistency

Consistency models summary

Page 229: Programming at Scale: Dataflow and Consistency

Consistency models summary

Consistency      Description

Strict           Absolute time ordering of all shared accesses matters.

Linearizability  All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.

Sequential       All processes see all shared accesses in the same order. Accesses are not ordered in time.

Causal           All processes see causally-related shared accesses in the same order.

FIFO             All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.

(a)

Consistency  Description

Weak         Shared data can be counted on to be consistent only after a synchronization is done.

Release      Shared data are made consistent when a critical region is exited.

Entry        Shared data pertaining to a critical region are made consistent when a critical region is entered.

(b)