Go Stream Matvey Arye, Princeton/Cloudflare Albert Strasheim, Cloudflare.


Go Stream

Matvey Arye, Princeton/Cloudflare
Albert Strasheim, Cloudflare

Awesome CDN service for websites big & small

Millions of requests per second at peak

24 data centers across the globe

Data Analysis

– Customer facing analytics

– System health monitoring

– Security monitoring

=> Need global view

Functionality

• Calculate aggregate functions on fast, big data

• Aggregate across nodes (across datacenters)

• Data stored at different time granularities

Basic Design Requirements

1. Reliability – Exactly-once semantics

2. High Data Volumes

Our Environment

[Diagram: multiple Sources feed Stream processing, which writes to Storage]

Basic Programming Model

[Diagram: a graph of operators (Op) connected to each other and to Storage nodes]

Existing Systems

S4: The reliability model is not consistent

Storm: Exactly-once semantics requires batching

Reliability only inside the stream processing system
What if a source goes down? The DB?

The Need For End-to-End Reliability

Source => Stream Processing => Storage

When a source comes back up, where does it start sending data from?

If using something like Storm, additional reliability mechanisms are needed

The Takeaway

Need end-to-end reliability

- Or -

Multiple reliability mechanisms

Reliability of stream processing not enough

Design of Reliability

• Avoid queuing because destination has failed
– Rely on storage at the edges
– Minimize replication

• Minimize edge cases

• No specialized hardware

Big Design Decisions

End-to-end reliability

Only transient operator state

Recovering From Failure

Source: I am starting a stream with you. What have you already seen from me?

Storage: I’ve seen <X>

Source: Okie dokie. Here is all the new stuff.
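
The exchange above can be sketched in Go. This is a minimal illustration, not the system's actual code: the names `Item`, `Storage`, `Seen`, `Store`, and `Resume` are hypothetical, and the sketch assumes the source keeps a local log of items sorted by ID so it can resend anything the storage has not yet seen.

```go
package main

import "fmt"

// Item is a hypothetical record carrying an ID assigned by the source.
// IDs are assumed to increase monotonically within one stream.
type Item struct {
	ID      uint64
	Payload string
}

// Storage remembers the highest ID it has stored for each stream.
type Storage struct {
	highest map[string]uint64
	stored  []Item
}

func NewStorage() *Storage {
	return &Storage{highest: make(map[string]uint64)}
}

// Seen answers "what have you already seen from me?" for one stream.
func (s *Storage) Seen(stream string) uint64 { return s.highest[stream] }

// Store persists an item and advances the stream's highest seen ID.
func (s *Storage) Store(stream string, it Item) {
	s.stored = append(s.stored, it)
	s.highest[stream] = it.ID
}

// Resume asks the storage what it has seen, then sends only newer items.
// The log must be sorted by ID. It returns the number of items sent.
func Resume(stream string, log []Item, st *Storage) int {
	seen := st.Seen(stream)
	sent := 0
	for _, it := range log {
		if it.ID <= seen {
			continue // already stored before the failure
		}
		st.Store(stream, it)
		sent++
	}
	return sent
}

func main() {
	log := []Item{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}}
	st := NewStorage()
	fmt.Println(Resume("edge-1", log[:2], st)) // items 1 and 2 are sent
	fmt.Println(Resume("edge-1", log, st))     // after a restart, only 3 and 4 are sent
}
```

Because the source replays from the storage's answer rather than from its own guess, each item ends up stored once even though the source restarts from the beginning of its log.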

Tracking what you have seen

Store identifier for all items

Store one identifier for highest number

[Figure: items numbered 1, 2, 3, 4]

Tracking what you have seen

Store identifier for all items
– The answer to "what have I seen" is huge
– Requires lots of storage for IDs

Store one identifier for highest number
– Parallel processing of ordered data is tricky
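
The trade-off between the two options can be sketched as follows. The types `SeenSet` and `HighWater` and the helper `countAccepted` are hypothetical names for illustration. The sketch shows that a single highest-ID value filters replayed duplicates as well as a full set does, but wrongly drops an item if delivery is reordered.

```go
package main

import "fmt"

// SeenSet remembers every ID, so its memory grows with the stream.
type SeenSet map[uint64]struct{}

// Accept reports whether id is new, recording it if so.
func (s SeenSet) Accept(id uint64) bool {
	if _, dup := s[id]; dup {
		return false
	}
	s[id] = struct{}{}
	return true
}

// HighWater remembers one ID. Memory is constant, but the answer is
// only correct if items arrive in increasing ID order.
type HighWater struct{ max uint64 }

// Accept reports whether id is above the highest ID seen so far.
func (h *HighWater) Accept(id uint64) bool {
	if id <= h.max {
		return false
	}
	h.max = id
	return true
}

// countAccepted feeds ids to accept and counts how many are accepted.
func countAccepted(ids []uint64, accept func(uint64) bool) int {
	n := 0
	for _, id := range ids {
		if accept(id) {
			n++
		}
	}
	return n
}

func main() {
	replayed := []uint64{1, 2, 2, 3, 4, 4} // in order, with duplicates
	reordered := []uint64{1, 3, 2, 4}      // no duplicates, out of order

	set, hw := SeenSet{}, &HighWater{}
	fmt.Println(countAccepted(replayed, set.Accept), countAccepted(replayed, hw.Accept)) // 4 4

	set, hw = SeenSet{}, &HighWater{}
	fmt.Println(countAccepted(reordered, set.Accept), countAccepted(reordered, hw.Accept)) // 4 3
}
```

In the reordered case the high-water mark rejects item 2 because item 3 arrived first, which is why ordering has to be preserved for this scheme to work.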

Tension between

Parallelization (for high-volume data)

Ordering (for reliability)

Go Makes This Easier

Language from Google written for concurrency

Goroutine: "I run code"

Goroutine: "I run code"

Channels send data between goroutines

Most synchronization is done by passing data
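
A small sketch of this style, using only standard Go features. The function names `squares` and `sum` are made up for the example. One goroutine produces values, another consumes them, and the only synchronization is the channel itself.

```go
package main

import "fmt"

// squares starts a goroutine that sends the first n squares on a
// channel and closes the channel when it is done.
func squares(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i * i
		}
	}()
	return out
}

// sum receives values until the channel is closed and adds them up.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(squares(4))) // 1 + 4 + 9 + 16 = 30
}
```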

Goroutine Scheduling

Channels are FIFO queues that have a maximum capacity

So a goroutine can be in one of 4 states:
1. Executing code
2. Waiting for a thread to execute code
3. Blocking to receive data from a channel
4. Blocking to send data to a channel

Scheduler optimizes assignment of goroutines to threads.
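
The blocking conditions in states 3 and 4 can be observed with a buffered channel. In this sketch the helpers `trySend` and `tryRecv` are hypothetical; they use `select` with a `default` case so the program reports when a plain send or receive would block instead of actually blocking.

```go
package main

import "fmt"

// trySend attempts a non-blocking send. A false result means a plain
// send would block because the channel is at capacity.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// tryRecv attempts a non-blocking receive. A false result means a
// plain receive would block because the channel is empty.
func tryRecv(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 2) // FIFO queue with a maximum capacity of 2

	fmt.Println(trySend(ch, 10), trySend(ch, 20), trySend(ch, 30)) // true true false

	v, ok := tryRecv(ch)
	fmt.Println(v, ok, len(ch), cap(ch)) // 10 true 1 2
}
```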

Efficient Ordering Under The Hood

Source distributes items to workers in a specific order

Reading from each worker:
1. Read one tuple off the count channel. Assign count to X
2. Read X tuples off the result channel

[Diagram: items numbered 1 to 4 flow from the source to the workers. Labels: input tuple; count of output tuples for each input; actual result tuples]

Intuition behind design

Multiple output channels allow each worker to write independently.

Count channel tells reader how many tuples to expect. Does not block except when result needed to satisfy ordering.

Judicious blocking allows scheduler to use blocking as a signal for which worker to schedule.
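
The count-channel scheme can be sketched as below. This is an illustration under stated assumptions, not the talk's implementation: the names `worker`, `OrderedMap`, and `repeat` are hypothetical, the source is assumed to distribute inputs round-robin, and the channel buffer size of 4 is arbitrary.

```go
package main

import "fmt"

// worker holds the three channels of one hypothetical worker.
type worker struct {
	in      chan int // input tuples from the source
	counts  chan int // number of output tuples for each input
	results chan int // the output tuples themselves
}

// run applies op to each input. For every input it first announces how
// many outputs follow, then sends the outputs.
func (w *worker) run(op func(int) []int) {
	for x := range w.in {
		out := op(x)
		w.counts <- len(out)
		for _, y := range out {
			w.results <- y
		}
	}
}

// OrderedMap applies op on n parallel workers (n >= 1) and returns the
// outputs in the order of the inputs that produced them.
func OrderedMap(inputs []int, n int, op func(int) []int) []int {
	ws := make([]*worker, n)
	for i := range ws {
		ws[i] = &worker{make(chan int, 4), make(chan int, 4), make(chan int, 4)}
		go ws[i].run(op)
	}
	// Source: distribute inputs round-robin, an order the reader can replay.
	go func() {
		for i, x := range inputs {
			ws[i%n].in <- x
		}
		for _, w := range ws {
			close(w.in)
		}
	}()
	// Reader: visit the workers in the same round-robin order.
	var ordered []int
	for i := range inputs {
		w := ws[i%n]
		x := <-w.counts // step 1: how many tuples to expect
		for j := 0; j < x; j++ {
			ordered = append(ordered, <-w.results) // step 2: read X tuples
		}
	}
	return ordered
}

// repeat emits x copies of x, so inputs yield different output counts.
func repeat(x int) []int {
	out := make([]int, x)
	for i := range out {
		out[i] = x
	}
	return out
}

func main() {
	fmt.Println(OrderedMap([]int{3, 0, 1, 2}, 3, repeat)) // [3 3 3 1 2 2]
}
```

The reader only blocks on the worker whose output is next in order; the other workers keep filling their own channels in the meantime.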

Throughput does not suffer

[Figure: Tuples per Second (y-axis, 0 to 10000) against Floating Point Operations (x1000) (x-axis: 2, 4, 8, 16, 32), with two series: Ordered and Unordered]

The Big Picture - Reliability

• Sources provide monotonically increasing IDs
– per stream

• Stream processor preserves ordering
– per source-stream

• Central DB maintains a mapping of:
Source-stream => highest ID processed

Functionality of Stream Processor

• Compression, serialization

• Partitioning for distributed sinks

• Bucketing
– Take individual records and construct aggregates
  • Across source nodes
  • Across time – adjustable granularity

• Batching
– Submitting many records at once to the DB

• Bucketing and batching all done with transient state
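
Bucketing across time and batching can be sketched with nothing but in-memory values. The names `Record`, `Bucket`, and `Batch` are hypothetical, and timestamps are assumed to be non-negative seconds.

```go
package main

import "fmt"

// Record is a hypothetical input tuple: a timestamp in seconds and a
// count to aggregate.
type Record struct {
	Unix  int64
	Count int
}

// Bucket sums the counts of records that fall into the same time
// bucket. granularity is the bucket width in seconds.
func Bucket(recs []Record, granularity int64) map[int64]int {
	buckets := make(map[int64]int)
	for _, r := range recs {
		start := r.Unix - r.Unix%granularity // start of the record's bucket
		buckets[start] += r.Count
	}
	return buckets
}

// Batch groups records into slices of at most size elements, each
// meant to be submitted to the DB in one call.
func Batch(recs []Record, size int) [][]Record {
	if size < 1 {
		size = 1
	}
	var batches [][]Record
	for len(recs) > size {
		batches = append(batches, recs[:size])
		recs = recs[size:]
	}
	if len(recs) > 0 {
		batches = append(batches, recs)
	}
	return batches
}

func main() {
	recs := []Record{{60, 1}, {61, 2}, {119, 3}, {120, 4}}

	perMinute := Bucket(recs, 60)
	fmt.Println(perMinute[60], perMinute[120]) // 6 4

	fmt.Println(len(Batch(recs, 3))) // 2
}
```

Changing the `granularity` argument is all it takes to report at a different time resolution.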

Data Model

Streaming OLAP-like cubes
Useful summaries of high-volume data

Cube Dimensions

[Figure: a cube with a Time axis (01:01:00, 01:01:01) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m)]

Cube Aggregates

[Figure: each cell holds aggregates, here (Count, Max), shown for URL bar.com/m at time 01:01:01]

Updating A Cube

Request #1: bar.com/m at 01:01:00, Latency: 90 ms

[Figure: cube with a Time axis (01:01:00, 01:01:01) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m); the four cells at 01:01:00 all hold (0,0)]

Map Request To Cell

Request #1: bar.com/m at 01:01:00, Latency: 90 ms

[Figure: cube with a Time axis (01:01:00, 01:01:01) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m); the request is mapped to its cell, and the four cells at 01:01:00 still hold (0,0)]

Update The Aggregates

Request #1: bar.com/m at 01:01:00, Latency: 90 ms

[Figure: cube with a Time axis (01:01:00, 01:01:01) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m); the cells at 01:01:00 now hold (1,90), (0,0), (0,0), (0,0)]

Update In-Place

Request #2: bar.com/m at 01:01:00, Latency: 50 ms

[Figure: cube with a Time axis (01:01:00, 01:01:01) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m); the cells at 01:01:00 now hold (2,90), (0,0), (0,0), (0,0)]
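
The two requests can be replayed in a short sketch. The types `Cell`, `Agg`, and `Cube` are hypothetical, and the sketch assumes Count is the number of requests and Max is the largest latency seen, which is consistent with the cell going from (0,0) to (1,90) to (2,90).

```go
package main

import "fmt"

// Cell identifies one cube cell by its dimension values.
type Cell struct {
	URL  string
	Time string
}

// Agg holds the two aggregates (Count, Max).
type Agg struct {
	Count int
	Max   int
}

// Cube maps cells to aggregates. A missing cell reads as (0,0).
type Cube map[Cell]Agg

// Update maps a request to its cell and updates the cell's aggregates
// in place.
func (c Cube) Update(url, ts string, latencyMs int) {
	key := Cell{URL: url, Time: ts}
	a := c[key]
	a.Count++
	if latencyMs > a.Max {
		a.Max = latencyMs
	}
	c[key] = a
}

func main() {
	cube := Cube{}
	cell := Cell{"bar.com/m", "01:01:00"}

	cube.Update("bar.com/m", "01:01:00", 90) // Request #1
	fmt.Println(cube[cell])                  // {1 90}

	cube.Update("bar.com/m", "01:01:00", 50) // Request #2
	fmt.Println(cube[cell])                  // {2 90}
}
```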

Cube Slice

[Figure: cube with a Time axis (01:01:00, 01:01:01, 01:01:58, 01:01:59) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m), with one slice of the cube marked "Slice"]

Cube Rollup

[Figure: cube with a Time axis and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m), with two rolled-up cells:]

URL: bar.com/*, Time: 01:01:01

URL: foo.com/*, Time: 01:01:01

Rich Structure

[Figure: cube with a Time axis (01:01:00, 01:01:01, 01:01:58, 01:01:59) and a URL axis (foo.com/r, foo.com/q, bar.com/n, bar.com/m); sample cell values (5,90), (3,75), (8,199), (21,40); rolled-up cells labeled A to E]

Cell  URL        Time
A     bar.com/*  01:01:01
B     *          01:01:01
C     foo.com/*  01:01:01
D     foo.com/r  01:01:*
E     foo.com/*  01:01:*

Key Property

2 types of rollups

1. Across Dimensions
2. Across Sources

We use the same aggregation function for both
Powerful conceptual constraints

Semantic properties preserved when changing the granularity of reporting
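
A sketch of why one aggregation function can serve both kinds of rollup, assuming (Count, Max) aggregates over non-negative values. The names `Agg`, `Merge`, `RollupURL`, and `MergeSources` are hypothetical, and the sample cell values are illustrative only. Since `Merge` is associative and commutative, rolling up after merging sources gives the same cells as merging sources after rolling up.

```go
package main

import (
	"fmt"
	"strings"
)

// Agg holds (Count, Max). The zero value (0,0) is the identity for
// Merge as long as Max values are non-negative.
type Agg struct{ Count, Max int }

// Merge combines two aggregates: counts add, maxima take the larger.
func Merge(a, b Agg) Agg {
	m := a.Max
	if b.Max > m {
		m = b.Max
	}
	return Agg{a.Count + b.Count, m}
}

type Cell struct{ URL, Time string }
type Cube map[Cell]Agg

// RollupURL rolls up across a dimension: every URL collapses to "host/*".
func RollupURL(c Cube) Cube {
	out := Cube{}
	for cell, a := range c {
		host := cell.URL
		if i := strings.Index(host, "/"); i >= 0 {
			host = host[:i]
		}
		key := Cell{host + "/*", cell.Time}
		out[key] = Merge(out[key], a)
	}
	return out
}

// MergeSources rolls up across sources: cubes are merged cell by cell.
func MergeSources(cubes ...Cube) Cube {
	out := Cube{}
	for _, c := range cubes {
		for cell, a := range c {
			out[cell] = Merge(out[cell], a)
		}
	}
	return out
}

func main() {
	s1 := Cube{
		Cell{"foo.com/r", "01:01:01"}: Agg{5, 90},
		Cell{"foo.com/q", "01:01:01"}: Agg{3, 75},
	}
	s2 := Cube{
		Cell{"foo.com/r", "01:01:01"}: Agg{8, 199},
		Cell{"bar.com/m", "01:01:01"}: Agg{21, 40},
	}
	a := RollupURL(MergeSources(s1, s2))
	b := MergeSources(RollupURL(s1), RollupURL(s2))
	foo := Cell{"foo.com/*", "01:01:01"}
	fmt.Println(a[foo], b[foo]) // {16 199} {16 199}
}
```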