Big data processing systems research

Big Data Processing Systems Study Vasiliki Kalavri, emjd-dc 3 Dec 2012


MapReduce: Simplified Data Processing on Large Clusters

OSDI 2004


MapReduce

● Specify a map and a reduce function

● The system takes care of

○ parallelization

○ partitioning

○ scheduling

○ communication

○ fault-tolerance
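
The division of labour above can be illustrated with a minimal single-process sketch. The user writes only `map_fn` and `reduce_fn`; `run_mapreduce` is a hypothetical stand-in for the system and performs only the grouping step, with none of the parallelization, scheduling, or fault tolerance a real cluster provides.

```python
from collections import defaultdict

def map_fn(_doc_id, text):
    """User-supplied map: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """User-supplied reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    """Hypothetical toy runtime: map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)   # stands in for the shuffle
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

documents = [("d1", "big data"), ("d2", "big clusters")]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```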


Hadoop MapReduce 1.0


MapReduce Limitations

● Static Pipeline

● No support for common operations

● Data materialization after every job

● Slow - not fit for interactive analysis

● Complex configuration


YARN (MapReduce v.2)


What is Hadoop/MR NOT good for?

● All the things it wasn't built for

○ Iterative computations

○ Stream processing

○ Incremental computations

○ Interactive Analysis

○ [insert research paper here]


Improving Hadoop performance

● Reduce Network & Disk I/O

● Skewed Datasets

● DB-like optimizations

○ column-oriented storage

○ indexes


Map-Reduce Inspired Systems

Extending the Programming Model


Map-Reduce Inspired Systems

● Extend the programming model to support

○ Iterative applications

○ Stream applications


Iterative Processing

● Characteristics

○ Datasets already stored

○ Need to reuse a dataset more than once, possibly multiple times

○ Iterative jobs, e.g. estimates, convergence

● Problems with iterative MR applications

○ manual orchestration of several MR jobs

○ re-loading & re-processing of invariant data

○ no explicit way to define a termination condition


HaLoop: Efficient Iterative Data Processing on Large Clusters

VLDB 2010


System Overview


Programming Model

● Iterative Programming Model

$R_{i+1} = R_0 \cup (R_i \bowtie L)$

● Extensions to MR

○ loop body

○ termination condition

○ loop-invariant data
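
As a rough illustration of the recurrence and the three extensions, the sketch below evaluates it on a toy reachability problem in plain Python. It is not HaLoop's API: `build_join_index` and `iterate_to_fixpoint` are hypothetical names, the join is assumed to match the second column of R_i against the first column of L, and termination is assumed to mean that the result stopped changing.

```python
from collections import defaultdict

def build_join_index(invariant):
    """Index the loop-invariant relation L once, before the loop starts."""
    index = defaultdict(set)
    for src, dst in invariant:
        index[src].add(dst)
    return index

def iterate_to_fixpoint(r0, invariant, max_iterations=100):
    """Apply R_{i+1} = R_0 union (R_i join L) until the result stops changing."""
    index = build_join_index(invariant)        # loop-invariant data, built once
    current = set(r0)
    for _ in range(max_iterations):
        # Loop body: join R_i with L, then union the result with R_0.
        joined = {(a, c) for a, b in current for c in index.get(b, ())}
        following = set(r0) | joined
        if following == current:               # termination condition: fixpoint
            return current
        current = following
    raise RuntimeError("no fixpoint within max_iterations")

edges = {("a", "b"), ("b", "c"), ("c", "d")}    # L: toy edge list
reachable = iterate_to_fixpoint({("a", "b")}, edges)
```

Building the index outside the loop mirrors the idea of treating L as loop-invariant data instead of re-loading it in every iteration.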


Loop-Aware Scheduling

● Inter-Iteration Locality

○ schedule tasks of different iterations which access the same data on the same machines


Caching and Indexing

● Reducer Input Cache

○ caches and indexes reducer inputs

○ reduces M->R I/O

● Reducer Output Cache

○ stores and indexes most recent local reducer outputs

○ reduces termination condition computation cost

● Mapper Input Cache

○ avoids non-local data reads in mappers


Stream Processing

● Characteristics

○ Data continuously comes into the system

○ Usually needs to be processed as it arrives

○ Frequent updates

● Problems with stream MR applications

○ a job runs on a static snapshot of a dataset

○ computations need to finish


Muppet: MapReduce-Style Processing of Fast Data

VLDB 2012


Programming Model

● MapUpdate

○ operates on streams, i.e. sequences of events with the same id in increasing timestamp order

● Slates

○ in-memory data structures which "summarize" all events with key k that an Update function has seen so far
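
A minimal sketch of the MapUpdate idea, assuming events arrive in timestamp order and a slate is simply a per-key dictionary. This is plain Python rather than Muppet's API; `map_checkin`, `update_count`, and `run_map_update` are hypothetical names, and the check-in events are made-up sample data.

```python
from collections import defaultdict

def map_checkin(event):
    """Map: emit the event under the key whose slate should absorb it."""
    yield event["retailer"], event

def update_count(key, event, slate):
    """Update: fold one event into the slate kept for `key`."""
    slate["count"] = slate.get("count", 0) + 1
    slate["last_ts"] = event["ts"]

def run_map_update(stream, mapper, updater):
    """Feed events to the application one at a time, as they arrive."""
    slates = defaultdict(dict)           # in memory only; nothing is persisted
    for event in stream:
        for key, mapped in mapper(event):
            updater(key, mapped, slates[key])
    return slates

checkins = [
    {"ts": 1, "retailer": "cafe"},
    {"ts": 2, "retailer": "books"},
    {"ts": 3, "retailer": "cafe"},
]
slates = run_map_update(checkins, map_checkin, update_count)
```

Unlike a reduce function, the update function never sees all values for a key at once; it only sees the next event and the summary accumulated so far.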


Example Applications

● An application that monitors the Foursquare check-in stream to count the number of check-ins per retailer and displays the count on a Web page

● Detect "hot" topics in Twitter


System Overview

● Uses Cassandra to persist slate states


Map-Reduce Inspired Systems

Improving Performance


Map-Reduce Inspired Systems

● Improve performance by○ reusing data

○ building caches / indexes

○ DBMS-like optimizations

○ reducing I/O


Incoop: MapReduce for Incremental Computations

SoCC 2011


System Overview


Inc-HDFS

● Content-based chunking

● Fingerprint calculation
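
The two steps can be sketched as follows. This is an illustration, not Inc-HDFS code: the boundary test (sum of a small sliding window modulo a divisor) is a toy stand-in for a real rolling fingerprint, SHA-256 is an assumed choice of fingerprint function, and `split_chunks` and `fingerprints` are hypothetical names.

```python
import hashlib

def split_chunks(data: bytes, window: int = 4, divisor: int = 8) -> list:
    """Cut a chunk wherever the last `window` bytes satisfy a content test."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        # The decision depends only on nearby content, not on the offset.
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])      # trailing bytes form the last chunk
    return chunks

def fingerprints(data: bytes) -> list:
    """Fingerprint every chunk so unchanged chunks can be recognised later."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in split_chunks(data)]

sample = bytes((i * 37 + i // 7) % 251 for i in range(500))   # made-up input
before = fingerprints(sample)
after = fingerprints(b"new" + sample)                          # insert a prefix
unchanged = set(before) & set(after)
```

Because boundaries follow the content, inserting bytes at the front shifts offsets but leaves later chunks, and therefore their fingerprints, identical. With fixed-size blocks every block after the insertion point would change.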


Incremental MapReduce

● Incremental Map

○ persistently store intermediate results

○ insert reference to memoization server

○ query memoization server and fetch result if already computed

● Incremental Reduce

○ persistently store entire task computations

○ store and map sub-computations used in the Contraction phase


Contraction Phase

● Break up large Reduce tasks into many applications of the Combine function

● Only a subset of Combiners needs to be re-executed
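
A small sketch of why only some Combiners re-run, assuming a deterministic, associative Combine function and a non-empty input. `contraction_reduce` is a hypothetical helper, and the dictionary `memo` stands in for a memoization store keyed by each Combiner's inputs; Incoop's actual data structures may differ.

```python
def contraction_reduce(values, combine, memo, fanout=2):
    """Reduce `values` through a tree of Combine calls, memoizing each node.

    Returns the final result and the number of Combiners actually executed.
    """
    level = list(values)
    executed = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = tuple(level[i:i + fanout])
            if group not in memo:            # inputs changed: re-run Combiner
                result = group[0]
                for value in group[1:]:
                    result = combine(result, value)
                memo[group] = result
                executed += 1
            next_level.append(memo[group])   # otherwise reuse the old result
        level = next_level
    return level[0], executed

memo = {}
add = lambda a, b: a + b
first_run = contraction_reduce([1, 2, 3, 4, 5, 6, 7, 8], add, memo)
second_run = contraction_reduce([1, 2, 3, 4, 5, 6, 7, 9], add, memo)
```

In this toy run the first pass executes all seven Combiners, while the second pass, where a single input changed, executes only the three on the path from the changed leaf to the root.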


HAIL: Only Aggressive Elephants are Fast Elephants

VLDB 2012


System Overview


Upload Pipeline

● HDFS upload pipeline is changed so that:

○ the Client creates PAX blocks

○ DataNodes do not flush data or checksums to disk

○ After all chunks of a block have been received, the block is sorted in memory and flushed

○ Each DataNode computes its own checksums


Query Pipeline

Transparency is achieved using UDFs:

● HailInputFormat

○ elaborate splitting policy

○ scheduling takes relevant indexes into account

● HailRecordReader

○ uses user annotation / configuration info to select records for map phase

○ transforms records from PAX to row format


Themis: An I/O Efficient MapReduce

SoCC 2012


How to limit Disk I/O?

● Process records in memory and spill to disk as rarely as possible

● Relax fault-tolerance guarantees

○ job-level recovery

● Dynamic memory management

○ pluggable policies

● Per-node I/O management

○ organize data in large batches


Memory policies

● Pool-based

○ fixed-sized pre-allocated buffers

● Quota-based

○ controls dataflow between computational stages using queues

● Constraint-based

○ dynamically adjusts memory allocation based on requests and available memory


System Overview

Data-flow graph consisting of stages:

● Phase Zero extracts information about distribution of records and keys

● Phase One implements mapping and shuffling

● Phase Two implements the sorting and reduce, always keeping results in memory


ReStore: Reusing Results of MapReduce Jobs

VLDB 2012


System Overview

● Built as an extension to Pig

● When a workflow is submitted, ReStore:

○ re-writes the query to reuse stored results

○ stores outputs of the workflow

○ stores results of sub-jobs

○ decides which outputs to store in HDFS and which to delete


System Architecture


Example


MANIMAL: Automatic Optimization for MapReduce Programs

VLDB 2011


Idea

● Apply well-known query optimization techniques to Map-Reduce jobs

● Static analysis of compiled code

● Apply optimizations only when "safe"


System Architecture


Example Optimizations

● Selection

○ if the map function is a filter, use a B+Tree to only scan the relevant portion of the input

● Projection

○ eliminate unnecessary fields from input records
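
Both optimizations can be sketched on in-memory records. A sorted list searched with `bisect` is used here as a simple stand-in for a B+Tree; it is not how the system itself stores its index. The record layout and the names `build_index`, `select_at_least`, and `project` are hypothetical.

```python
import bisect

def build_index(records, field):
    """Sorted (value, position) pairs: a stand-in for a B+Tree on `field`."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))

def select_at_least(records, index, threshold):
    """Selection: visit only index entries whose value is >= threshold."""
    start = bisect.bisect_left(index, (threshold, -1))
    return [records[pos] for _, pos in index[start:]]

def project(records, fields):
    """Projection: keep only the fields the map function actually reads."""
    return [{name: rec[name] for name in fields} for rec in records]

pages = [                                     # made-up sample input
    {"user": "a", "rank": 5, "url": "x"},
    {"user": "b", "rank": 12, "url": "y"},
    {"user": "c", "rank": 30, "url": "z"},
    {"user": "d", "rank": 7, "url": "w"},
]
rank_index = build_index(pages, "rank")
relevant = project(select_at_least(pages, rank_index, 10), ["user"])
```

A map function of the form "emit the record if rank >= 10" then only has to run over `relevant` instead of scanning every input record.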


SkewTune: Mitigating Skew in MapReduce Applications

SIGMOD 2012


Common Types of Skew

● Uneven distribution of input data

○ partitioning which does not guarantee even distribution

○ popular key groups

● Expensive records

○ some portions of the input take longer to process than others


System Overview

● Per-task progress estimation

● Per-task statistics

● Late skew detection

○ skew mitigation is delayed until a slot is available

● Only re-partition one task at a time

○ only when half the time remaining is greater than the re-partitioning overhead


Implementation

Re-partition a map task

● mitigators execute as mappers within a new MapReduce job

● output is written to HDFS

Re-partition a reduce task

● a mitigator job with an identity map reads its input from the task tracker


Starfish: A Self-Tuning System for Big Data Analytics

CIDR 2011


System Overview


Job-Level Tuning

● Just-in-Time Optimizer

○ choose efficient execution techniques, e.g. joins

● Profiler

○ learns performance models, job profiles

● Sampler

○ collects statistics about input, intermediate and output data

○ helps the profiler build approximate models


Workflow-Level Tuning

● Workflow-aware Scheduler

○ explores data locality at the workflow level instead of making locally optimal decisions

● What-If Engine

○ answers questions based on simulations of job executions


Workload-Level Tuning

● Workload Optimizer

○ Data-flow sharing

○ Materialization of intermediate results for reuse

○ Reorganization

● Elastisizer

○ node and network configuration automation


Big-Data Processing Beyond MapReduce


Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

EuroSys 2007


System Overview


Graph Description


Communication


Graph Optimizations

● Schedule vertices close to the input data

● If a computation is associative and commutative, use an aggregation tree

● Dynamically refine the graph based on output data sizes

○ vary number of vertices in each stage, connectivity


SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets

VLDB 2008


System Overview


SCOPE scripting language

● resembles SQL with C# expressions

● commands are data transformation operators

● extensible mapreduce-like commands


SCOPE Execution

● The Compiler creates an internal parse tree

● The Optimizer creates a parallel execution plan, i.e. a Cosmos job

● The Job Manager constructs the graph and schedules execution


Spark: Cluster Computing with Working Sets

HotCloud 2010


RDDs

Read-only collection of objects

● partitioned across machines

● store their "lineage"

● can be re-constructed

● users can control persistence and partitioning
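
The properties listed above can be mimicked in a few lines of plain Python. `ToyRDD` is a hypothetical teaching class, not Spark's API: partitions are ordinary lists in one process, lineage is a list of step names plus a closure that recomputes from the parent, and `evict` simulates losing cached partitions.

```python
class ToyRDD:
    """Read-only, partitioned dataset that records how it was derived."""

    def __init__(self, compute, lineage):
        self._compute = compute      # rebuilds all partitions from the parent
        self.lineage = lineage       # names of the steps that produced this data
        self._persist = False
        self._cache = None

    @classmethod
    def parallelize(cls, items, num_partitions=2):
        items = list(items)
        parts = [items[i::num_partitions] for i in range(num_partitions)]
        return cls(lambda: parts, ["parallelize"])

    def map(self, fn):
        return ToyRDD(lambda: [[fn(x) for x in p] for p in self.partitions()],
                      self.lineage + ["map"])

    def filter(self, keep):
        return ToyRDD(lambda: [[x for x in p if keep(x)] for p in self.partitions()],
                      self.lineage + ["filter"])

    def persist(self):
        self._persist = True         # user-controlled persistence
        return self

    def partitions(self):
        if self._cache is not None:
            return self._cache
        parts = self._compute()      # replay the lineage
        if self._persist:
            self._cache = parts
        return parts

    def evict(self):
        self._cache = None           # simulate losing the in-memory copy

    def collect(self):
        return [x for p in self.partitions() for x in p]

even_squares = (ToyRDD.parallelize(range(6))
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0)
                .persist())
first = even_squares.collect()
even_squares.evict()
rebuilt = even_squares.collect()     # recomputed from lineage, not from a copy
```

Transformations never modify an existing dataset; each returns a new object, which is what makes recomputation from lineage safe.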


Programming Model

● Scala API

● driver program

○ defines RDDs and actions on them

● workers

○ long-lived processes

○ store and process RDD partitions in-memory


Job Stages


Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing

SoCC 2010


The Stratosphere Stack


System Overview

● Execution plan in the form of a DAG

● Abstracts parallelization and communication

● Optimizer to choose best execution strategy


Programming Model

● Input Contracts:

○ give guarantees on how data is organized into independent subsets

○ Map, Reduce, Match, Cross, CoGroup

● Output Contracts:

○ define properties on the output data

○ Same-Key, Super-Key, Unique-Key
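
To make the two-input contracts concrete, the sketch below shows one plausible reading of how each groups its inputs before calling the user function: Cross pairs every record with every other, Match pairs records with equal keys, and CoGroup hands over all records of a key from both sides at once. This is plain Python on lists of (key, value) pairs, not the PACT API, and the function signatures are assumptions.

```python
from collections import defaultdict
from itertools import product

def _group(pairs):
    """Group (key, value) pairs into key -> list of values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def cross(left, right, udf):
    """Cross: call udf on every combination of a left and a right record."""
    return [udf(l, r) for l, r in product(left, right)]

def match(left, right, udf):
    """Match: call udf on every pair of values that share a key."""
    lg, rg = _group(left), _group(right)
    return [udf(key, lv, rv)
            for key in sorted(lg.keys() & rg.keys())
            for lv, rv in product(lg[key], rg[key])]

def cogroup(left, right, udf):
    """CoGroup: call udf once per key with all values from both inputs."""
    lg, rg = _group(left), _group(right)
    return [udf(key, lg.get(key, []), rg.get(key, []))
            for key in sorted(lg.keys() | rg.keys())]

left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]
joined = match(left, right, lambda k, l, r: (k, l, r))
sizes = cogroup(left, right, lambda k, ls, rs: (k, len(ls), len(rs)))
pairs = cross(left, right, lambda l, r: (l, r))
```

Each call to the user function receives an independent subset of the data, which is what lets a runtime execute the calls in parallel.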


ASTERIX: Scalable, Semi-structured Data Platform for Evolving-World Models

Distributed and Parallel Databases 2011


Evolving World Model

● As-of queries

○ What is the best route to get to the Olympic Stadium right now?

○ What is the traffic situation like on Saturday nights close to the city center?

○ How many visitors that visited the City Hall during the past year also went for dinner in that nearby restaurant?


Data Model - Query Language

● Semi-structured data model, ADM

○ dataset ~ table: indexed, partitioned, replicated

○ dataverse ~ database

○ DDL: primary key, partitioning key

○ "open" data schemas

● AQL query language

○ declarative, inspired by Jaql and XQuery

○ logical plan -> DAG -> Hyracks Job


System Overview


Dremel: Interactive Analysis of Web-Scale Datasets

VLDB 2010


Columnar Storage

● lossless representation

○ save field types, repetition/definition levels

● fast encoding

○ recursively traverses record and computes levels

● efficient record assembly

○ use a FSM to reconstruct records


Query Execution

● Language based on SQL

● Tree architecture

○ Root server

■ receives incoming queries

■ reads table metadata

■ routes queries to the next level of the tree

○ Leaf servers

■ communicate with storage layer


Query Dispatcher

● Schedules queries to available slots

● Balances the load

● Assures fault-tolerance

● Specifies what percentage of tablets must be scanned before returning a result


CIEL: A Universal Execution Engine for Distributed Data-Flow Computing

NSDI 2011


Dynamic Task Graph


System Architecture


Skywriting Language

● Turing-complete

● Arbitrary data-dependent control flow

○ while loops

○ recursive functions

● Supports invocation of code written in other languages


References

www.citeulike.org/user/vasiakalavri
