HONGBO ZENG, Airbnb: Data Platform at Airbnb • Cluster Evolution • Incremental Data Replication - ReAir • Unified Streaming and Batch Processing - AirStream



Agenda

• Data Platform at Airbnb
• Cluster Evolution
• Incremental Data Replication - ReAir
• Unified Streaming and Batch Processing - AirStream


Scale of Data Infrastructure at Airbnb

• >13B #Events Collected
• >35PB Warehouse Size
• 1400+ Machines
• Hadoop + Presto + Spark
• 5x YoY Data Growth

Data Platform

[Figure: data platform architecture. Components shown: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Cluster and Silver Cluster (HDFS, Hive, Yarn), ReAir, Spark Cluster (Spark), AirStream, Airflow Scheduling, S3, Presto Cluster, AirPal, SuperSet, Tableau.]


Original Cluster

Setup
• Single HDFS, MR and Hive installation
• c3.8xlarge (32 cores / 60G mem / 640GB disk) + 3TB of EBS volume
• 800 nodes
• Tested DataNodes (DN) in different AZs
• All data managed by Hive

Challenges
• Limited isolation between production / adhoc
• Adhoc
  - Difficult to meet SLAs
  - Harder to plan capacity
• Disaster recovery
• Difficult roll outs

Two Clusters

• Two independent HDFS, MR, Hive metastores
• d2.8xlarge w/ 48TB local
• ~250 instances in final setup
• Replication of common / critical data - Silver is a superset of Gold
• For disaster recovery, separate AZs

[Figure: Gold Cluster (HDFS, Hive, Yarn) and Silver Cluster (HDFS, Hive, Yarn), linked by Replication.]

Multi-Cluster Trade-Offs

Advantages
• Failure isolation with user jobs
• Easy capacity planning
• Guarantee SLAs
• Able to test new versions
• Disaster Recovery

Disadvantages
• Data synchronization
• User confusion
• Operational overhead


Warehouse Replication Approaches

Batch
• Scan HDFS, metastore
• Copy relevant entries
• Simple, no state
• High latency

Incremental
• Record changes in source
• Copy/re-run operations on destination
• More complex, more state
• Low latency (seconds)

Incremental Replication

• Record Changes on Source
• Convert Changes to Replication Primitives
• Run Primitives on the Destination

Record Changes On Source

• Hive provides a hooks API that fires at specific points
  - Pre-execute
  - Post-execute
  - Failure
• Use post-execute to log objects that are created into an audit log
• In critical path for queries

[Figure: Example Audit Log Entry]

Convert Changes to Primitive Operations

• 3 types of objects - DB, table, partition
• 3 types of operations - Copy, rename, drop
• 9 different primitive operations
• Idempotent
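The nine primitives on this slide are the cross product of the three object types and the three operations. A minimal Python sketch follows. The names `ObjectType`, `Operation`, `Primitive` and `drop_table` are hypothetical and are not ReAir's actual classes; a plain dict stands in for a warehouse to show what idempotent means here.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ObjectType(Enum):
    DB = "db"
    TABLE = "table"
    PARTITION = "partition"


class Operation(Enum):
    COPY = "copy"
    RENAME = "rename"
    DROP = "drop"


@dataclass(frozen=True)
class Primitive:
    operation: Operation
    object_type: ObjectType


# Every (operation, object type) pair is one primitive: 3 x 3 = 9.
PRIMITIVES = [Primitive(op, obj) for op, obj in product(Operation, ObjectType)]


def drop_table(warehouse: dict, name: str) -> None:
    """Idempotent drop: removing a table that is already gone is a no-op."""
    warehouse.pop(name, None)
```

Because each primitive can be applied repeatedly with the same result, a replication worker can retry after a failure without first checking how far the previous attempt got.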

Primitive Example

• CREATE TABLE srcpart (key STRING) PARTITIONED BY (ds STRING) -> Copy Table
• INSERT OVERWRITE TABLE srcpart PARTITION(ds='1') SELECT key FROM src -> Copy Partition
• ALTER TABLE srcpart SET FILEFORMAT TEXTFILE -> Copy Table
• ALTER TABLE srcpart RENAME TO srcpart_old -> Rename Table

Copy Table Flow

[Flowchart: two decision nodes with Y/N branches, "source exists?" and "dest exists and the same?", followed by the steps: copy to temp location -> verify the copy -> copy tmp -> dest -> add metadata -> done.]
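The Copy Table flow can be sketched in Python as an illustration only. The warehouses are plain dicts, and the function name `copy_table` and the returned status strings are hypothetical. The branch targets are also an assumption, since the extracted flowchart does not show its edges: here a missing source or an identical destination both end the flow.

```python
import copy


def copy_table(source: dict, dest: dict, name: str) -> str:
    """Copy table `name` from `source` to `dest` (toy in-memory warehouses)."""
    # source exists?
    if name not in source:
        return "done: nothing to copy"
    # dest exists and the same?
    if dest.get(name) == source[name]:
        return "done: already in sync"
    # copy to temp location
    tmp = copy.deepcopy(source[name]["files"])
    # verify the copy before touching the destination
    if tmp != source[name]["files"]:
        raise RuntimeError("copy verification failed")
    # copy tmp -> dest, then add metadata
    dest[name] = {
        "files": tmp,
        "metadata": copy.deepcopy(source[name]["metadata"]),
    }
    return "done: copied"
```

Running the function a second time hits the "already in sync" branch, which is the idempotence property the primitives rely on.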


Batch Infrastructure

[Figure: batch infrastructure. Components shown: Event Logs, MySQL Dumps, Kafka, Sqoop, Gold Cluster and Silver Cluster (HDFS, Hive, Yarn), ReAir, Spark Cluster (Spark), Airflow Scheduling, S3, Presto Cluster, AirPal, SuperSet, Tableau.]

AirStream

[Figure: Source -> Process -> Sink]

Streaming at Airbnb - AirStream

• Sources: Kafka, S3, HDFS
• Cluster: Spark Streaming, Airflow Scheduling, HBase, HDFS
• Sinks: Datadog, Kafka, DynamoDB, ElasticSearch


Lambda Architecture

[Figure: Lambda Architecture in AirStream. Batch path: Hive, Spark SQL. Streaming path: Kafka, Spark Streaming. Both paths use State Storage.]

Sources

Streaming

    source: [
      {
        name: source_example,
        type: kafka,
        config: {
          topic: "example_topic",
        }
      }
    ]

Batch

    source: [
      {
        name: source_example,
        type: hive,
        sql: {
          select * from db.table where ds='2017-06-05';
        }
      }
    ]

Computation

Streaming/Batch

    process: [{
      name = process_example,
      type = sql,
      sql = """
        SELECT listing_id, checkin_date, context.source as source
        FROM source_example
        WHERE user_id IS NOT NULL
      """
    }]

Sinks

Streaming

    sink: [
      {
        name = sink_example
        input = process_example
        type = hbase_update
        hbase_table_name = test_table
        bulk_upload = false
      }
    ]

Batch

    sink: [
      {
        name = sink_example
        input = process_example
        type = hbase_update
        hbase_table_name = test_table
        bulk_upload = true
      }
    ]

Computation Flow

[Figure: two panels, Streaming and Batch, showing the same flow with the nodes Source, Process_A, Process_B, Process_A1, Sink_A2, Sink_B2.]

Unified API through AirStream

• Declarative job configuration
• Streaming source vs static source
• A computation operator or sink can be shared by streaming and batch jobs
• The computation flow is shared by streaming and batch
• A single driver executes jobs in both streaming and batch mode
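The points on this slide can be illustrated with a small Python sketch of one driver running the same computation in either mode. It is an assumption-laden toy, not AirStream's code: `run_job`, the `mode` argument and the sink callback signature are hypothetical, and plain Python iterables stand in for Kafka and Hive sources. The filter mirrors the SQL on the Computation slide.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def process_example(records: Iterable[Record]) -> Iterator[Record]:
    """Shared computation, mirroring the SQL on the Computation slide."""
    for r in records:
        if r.get("user_id") is not None:
            yield {
                "listing_id": r["listing_id"],
                "checkin_date": r["checkin_date"],
                "source": r["context"]["source"],
            }


def run_job(mode: str, source, sink: Callable[[List[Record], bool], None]) -> None:
    """Single driver: only the source shape and the bulk flag differ by mode."""
    if mode == "batch":
        # Static source: one finite collection, written as a bulk upload.
        sink(list(process_example(source)), True)
    elif mode == "streaming":
        # Streaming source: an iterable of micro-batches, written incrementally.
        for micro_batch in source:
            sink(list(process_example(micro_batch)), False)
    else:
        raise ValueError(f"unknown mode: {mode}")
```

The design point is that `process_example` is written once; switching between streaming and batch changes only how records arrive and how the sink writes them.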


Shared State Storage

Shared Global State Store

[Figure: within AirStream, several Spark Streaming jobs and several Spark Batch jobs share the same HBase Tables.]

Why HBase

• Well integrated with the Hadoop ecosystem
• Efficient API for streaming writes and bulk uploads
• Rich API for sequential scans and point lookups
• Merged view based on version

Unified Write API

[Figure: a DataFrame is re-partitioned by HBase region into <Region 1, [RowKey, Value]>, <Region 2, [RowKey, Value]>, ..., <Region N, [RowKey, Value]>, then written to HBase (Region 1, Region 2, ..., Region N) via Puts or HFile BulkLoad.]
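The re-partition step in this figure can be sketched in Python. This is a sketch under stated assumptions, not AirStream's implementation: `region_for`, `repartition` and `write` are hypothetical names, the region boundaries are given as a sorted list of start keys (the first being the empty string), and `write` returns a plan instead of calling HBase. HFiles store cells in sorted key order, so each group is sorted before it is handed off.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Sequence


def region_for(row_key: str, region_start_keys: Sequence[str]) -> int:
    """Index of the region whose [start, next_start) range holds row_key."""
    return bisect_right(region_start_keys, row_key) - 1


def repartition(rows, region_start_keys):
    """Group (row_key, value) pairs by region, sorted by row key per region."""
    groups = defaultdict(list)
    for row_key, value in rows:
        groups[region_for(row_key, region_start_keys)].append((row_key, value))
    return {region: sorted(cells) for region, cells in groups.items()}


def write(rows, region_start_keys, bulk_upload: bool):
    """Return the planned write operations instead of calling HBase."""
    op = "bulkload" if bulk_upload else "put"
    grouped = repartition(rows, region_start_keys)
    return [(op, region, cells) for region, cells in sorted(grouped.items())]
```

The same grouping serves both write paths: a streaming job would issue puts per region, while a batch job would build one sorted file per region for a bulk load.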

Rich Read API

[Figure: Spark Streaming/Batch Jobs read HBase Tables through Multi-Gets, Prefix Scan, and Time Range Scan.]

Merged Views

[Figure: versions of Row Key R1 along a Time axis. Streaming Writes produce V01 at TS01, V150 at TS150, and V200 at TS200. A Batch Bulk Upload adds V100 at TS100 to the same row.]
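A version-based merged view can be sketched in Python using the values from this slide. The function `merged_view` and its `as_of` parameter are hypothetical; the sketch assumes the rule that, per row key, the cell with the highest timestamp wins regardless of whether it came from a streaming write or a batch bulk upload.

```python
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[str, str, int]  # (row key, value, timestamp used as version)


def merged_view(cells: Iterable[Cell], as_of: Optional[int] = None) -> Dict[str, str]:
    """Latest value per row key, optionally ignoring versions newer than as_of."""
    latest: Dict[str, Tuple[int, str]] = {}
    for row_key, value, ts in cells:
        if as_of is not None and ts > as_of:
            continue
        if row_key not in latest or ts > latest[row_key][0]:
            latest[row_key] = (ts, value)
    return {row_key: value for row_key, (_, value) in latest.items()}


streaming = [("R1", "V01", 1), ("R1", "V150", 150), ("R1", "V200", 200)]
batch = [("R1", "V100", 100)]
view = merged_view(streaming + batch)
```

Because the result depends only on timestamps, the order in which streaming and batch cells arrive does not matter, which is what lets a late bulk upload coexist with newer streaming writes.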

Our Foundations

• Unify streaming and batch processing
• Shared global state store


MySQL DB Snapshot Using Binlog Replay

Database Snapshot

• Large amount of data: multiple large MySQL DBs
• Realtime-ness: minutes delay / hours delay
• Transaction: need to keep transactions across different tables
• Schema change: table schema evolves

Move Elephant

Binlog Replay on Spark

[Figure: binlog replay pipeline. Labels shown: seed, spinal tap, AirStream Job. Timings shown: 20+ hr, 4+ hr, 1 hr, 15, 5 mins.]

• Streaming and batch share logic: binlog file reader, DDL processor, transaction processor, DML processor.
• Merged by binlog position: <filenum, offset>
• Idempotent: log can be replayed multiple times.
• Schema changes: full schema change history.

[Figure: Lambda Architecture. Binlog (realtime/history) from a MySQL instance passes through Log Parser, Transaction Processor, Change Processor, and Schema Processor into HBase. Labels shown: DML, DDL, XVID.]
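Merging by binlog position is what makes replay idempotent, and a short Python sketch shows why. Everything here is hypothetical (the `replay` function, the event dict layout, and using `None` as a delete marker); the only idea taken from the slide is that each change carries a `<filenum, offset>` position and that the state keeps the change with the highest position.

```python
from typing import Dict, Iterable, Optional, Tuple

Position = Tuple[int, int]  # (filenum, offset), compared lexicographically
Key = Tuple[str, int]       # (table, primary key)
State = Dict[Key, Tuple[Position, Optional[dict]]]


def replay(state: State, events: Iterable[dict]) -> State:
    """Apply binlog events; older or already-applied positions are skipped."""
    for event in events:
        key = (event["table"], event["pk"])
        position = tuple(event["position"])
        current = state.get(key)
        if current is not None and current[0] >= position:
            continue  # replayed or out-of-order event: no effect
        state[key] = (position, event["row"])  # row=None marks a delete
    return state
```

Replaying the same log twice, or replaying it in a different order, leaves the state unchanged, so a streaming job and a historical batch job can both write into the same store.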


Streaming Ingestion & Realtime Interactive Query

Realtime Ingestion and Interactive Query

[Figure: components shown: Kafka, AirStream (Spark Streaming), HBase, Query Engine (Spark SQL, Hive SQL, Presto SQL), DataPortal.]


Interactive Query in SqlLab


Thanks


Realtime OLAP with Druid

Realtime Ingestion for Druid

[Figure: components shown: Kafka, AirStream (Spark Streaming), Druid Beam, Druid. Labels shown: Dimension, Metrics.]


Superset Powered by Druid


Realtime Indexing

[Figure: realtime indexing pipeline. Components shown: Kafka (Event, Event, Event, ...), Hive (Table A, Table B, Table C), AirStream (Spark Streaming, Spark Batch), Elastic Search. Labels shown: es_version=, mutation id.]


Backup Slides


Tips


Moving Window Computation

Long Window Computation

What if the window is weeks, months, or even years?

Distinct in a Large Window

I don’t want approximation. What should I do?

Distinct Count

[Figure: rows for Listing 1 along a Time axis, one per visitor: Visitor 01 at TS100, Visitor 02 at TS100, Visitor 03 at TS99, Visitor 04 at TS98. Each window is read with a Prefix Scan with TimeRange.]
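An exact distinct count via a prefix scan with a time range can be sketched in Python with the values from this slide. The row key layout (`listing|visitor`), the function name `distinct_visitors`, and the half-open `[start_ts, end_ts)` window are assumptions made for the sketch; a list of `(row_key, timestamp)` pairs stands in for the HBase table.

```python
from typing import Iterable, Tuple


def distinct_visitors(
    cells: Iterable[Tuple[str, int]], listing: str, start_ts: int, end_ts: int
) -> int:
    """Count rows under the listing prefix whose timestamp is in [start_ts, end_ts)."""
    prefix = listing + "|"
    return len(
        {
            row_key
            for row_key, ts in cells
            if row_key.startswith(prefix) and start_ts <= ts < end_ts
        }
    )


cells = [
    ("Listing 1|Visitor 01", 100),
    ("Listing 1|Visitor 02", 100),
    ("Listing 1|Visitor 04", 98),
    ("Listing 1|Visitor 03", 99),
    ("Listing 2|Visitor 01", 100),
]
```

Since each visitor occupies one row per listing, repeated visits only update that row's timestamp, and the count of matching rows is exact rather than approximate.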

Moving Average

[Figure: versions of the row "Listing 1 Total Review" along a Time axis: Cnt 01 at TS01, Cnt 50 at TS50, Cnt 98 at TS99, Cnt 100 at TS100. For each of Window 1 and Window 2, the average is Count Difference / Time Elapsed.]
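The "Count Difference / Time Elapsed" calculation can be sketched in Python with the counts from this slide. The helper names `count_at` and `average_rate` are hypothetical, and the sketch assumes the running total is stored as timestamped versions sorted by time, with the value at a boundary taken from the latest version at or before it.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Version = Tuple[int, int]  # (timestamp, running total count)


def count_at(versions: Sequence[Version], ts: int) -> int:
    """Running total as of ts: the latest version at or before ts, else 0."""
    timestamps = [t for t, _ in versions]
    i = bisect_right(timestamps, ts)
    return versions[i - 1][1] if i else 0


def average_rate(versions: Sequence[Version], start_ts: int, end_ts: int) -> float:
    """Count difference divided by time elapsed over [start_ts, end_ts]."""
    if end_ts <= start_ts:
        raise ValueError("end_ts must be greater than start_ts")
    return (count_at(versions, end_ts) - count_at(versions, start_ts)) / (end_ts - start_ts)


versions = [(1, 1), (50, 50), (99, 98), (100, 100)]
```

Only two point lookups are needed per window, so the cost does not grow with the window length, which is the appeal for windows spanning weeks or months.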

Schema Enforcement

Streaming Events

Thrift -> DataFrame

https://github.com/airbnb/airbnb-spark-thrift

[Figure: converting a Thrift Event. Thrift Class -> FieldMetaData -> Struct Type; Thrift Object -> FieldValue -> Row; Struct Type and Row together form the DataFrame.]
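The two conversion paths in this figure (class metadata to a schema, object values to a row) can be shown with a toy Python analogy. It does not use Thrift, Spark, or the airbnb-spark-thrift API: a Python dataclass stands in for a Thrift-generated class, and `SearchEvent`, `struct_type` and `to_row` are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass
class SearchEvent:
    """Hypothetical stand-in for a Thrift-generated event class."""
    user_id: int
    query: str


def struct_type(cls) -> List[Tuple[str, str]]:
    """Class metadata -> schema: one (field name, type name) pair per field."""
    return [(f.name, f.type.__name__) for f in fields(cls)]


def to_row(obj) -> tuple:
    """Object field values -> row, in the same order as the schema."""
    return tuple(getattr(obj, f.name) for f in fields(obj))


schema = struct_type(SearchEvent)
row = to_row(SearchEvent(user_id=1, query="paris"))
```

Deriving the schema from the class once and the rows from each object keeps the two in the same field order, which is what allows them to be combined into a tabular structure.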


Summary


Unify Batch and Streaming Computation


Global State Store Using HBase


Run Primitives on Destination

• Serial execution
  - Easy to reason about operations
  - Very slow
• Parallel execution
  - Fast and scalable
  - Ordering is important: e.g. create table before copying a partition
  - DAG of primitive operations
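Ordering primitives as a DAG can be sketched with Python's standard `graphlib` module. The function `execution_waves` and the operation labels are hypothetical; the sketch assumes each primitive lists the primitives it must wait for, and it groups operations into waves where everything in one wave could run in parallel.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group operations into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Each key maps to the set of operations that must finish first.
dependencies = {
    "copy_table srcpart": set(),
    "copy_partition srcpart/ds=1": {"copy_table srcpart"},
    "copy_partition srcpart/ds=2": {"copy_table srcpart"},
    "rename_table srcpart -> srcpart_old": {
        "copy_partition srcpart/ds=1",
        "copy_partition srcpart/ds=2",
    },
}
waves = execution_waves(dependencies)
```

Here the table copy runs first, both partition copies can then run side by side, and the rename waits for both, which matches the slide's example of creating a table before copying a partition.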
