Consistent Streaming Through Time: A Vision for Event Stream Processing

by Jonathan Goldstein (speaker), Roger Barga, Mohamed Ali, and Mingsheng Hong, Microsoft Research

Page 1: Consistent Streaming Through Time: A Vision for Event Stream Processing

Page 2: Are StreamSQL semantics ok?

Suppose we want to monitor the bandwidth of a device. We create an input stream which has one field, bytes sent, and an output stream which computes a windowed sum.

What are the StreamSQL semantics when the system gets overloaded (a strange question to ask)?

Either events must be dropped, or they must be queued at the receiver or sender for later processing.

Since window semantics are based on system time (StreamSQL server time), if the device has constant bandwidth, apparent bandwidth will decrease, because each window only counts the events the server actually processed during it!

In StreamSQL, the user has no reasonable way of knowing!

Conclusion: Something is deeply wrong with the use of time in StreamSQL query semantics!
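A minimal Python sketch of the effect, under hypothetical numbers: a device sends 100 bytes at every application-time tick, but the overloaded server processes only one event every two ticks, so event t is queued and stamped with server time 2t. The same tumbling-window sum (window size 5, an assumed value) is computed twice, once keyed on the device's own timestamp and once on server time. This is not StreamSQL code; `windowed_sums` and the field names are made up for illustration.

```python
from collections import defaultdict

def windowed_sums(events, window, time_of):
    """Tumbling-window sum of bytes, bucketing each event by time_of(event)."""
    sums = defaultdict(int)
    for e in events:
        sums[time_of(e) // window] += e["bytes"]
    return dict(sums)

# Hypothetical device: 100 bytes at every application-time tick 0..9.
# Hypothetical overload: the server handles one event every 2 ticks,
# so event t is queued and stamped with server time 2*t.
events = [{"app_time": t, "server_time": 2 * t, "bytes": 100} for t in range(10)]

by_app = windowed_sums(events, 5, lambda e: e["app_time"])
by_server = windowed_sums(events, 5, lambda e: e["server_time"])
print(by_app)     # {0: 500, 1: 500}
print(by_server)  # {0: 300, 1: 200, 2: 300, 3: 200}
```

Keyed on application time, every window reports 500 bytes; keyed on server time, windows alternate between 300 and 200 bytes, so the constant-rate device appears to have lost half its bandwidth.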

Page 3: What’s in the paper?

A laundry list of CEDR features either unsupported or poorly supported in existing streaming systems (read the paper).

Some of these features come from event processing; some come from specific scenarios which we believe to be important.

These features are described formally through a query language description.

Page 4: What’s in the talk (and the paper)?

Formal definitions of CEDR streams and operator semantics, which provide a clear and intuitive framework for discussing subtle semantic issues.

A formalization of materialized view update semantics in standing queries, and a discussion of why they are inadequate in isolation.

The definition of a non-view update compliant operator which can express a very wide range of seemingly disparate streaming features: a myriad of window types, the separation of inserts and deletes, etc.

A theoretical discussion of both the expression and the correct handling of data delivered out of order and of data retraction.

Different formal notions of correctness, which lead to different consistency levels and associated performance tradeoffs.

Page 5: What is a stream and a standing query?

A stream is a (possibly infinite) collection of events, where each event contains a payload (P); a key which uniquely identifies the event (K); an interval of (application) time for which the payload is valid, [Vs, Ve); and a time at which it arrives at a listener (C, for CEDR time).

A standing query is an operator graph, where each operator takes 0 or more input streams and produces 0 or more output streams.

K   Vs  Ve  C  P
K1  1   5   1  …
K2  2   3   3  …

Acknowledgement: This is inspired by and built on Rick Snodgrass’s temporal work
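The event schema above can be sketched as a Python dataclass. The class name `Event`, its field names, and the `valid_at` helper are hypothetical; the half-open check follows the [Vs, Ve) notation. The slide elides the payloads, so `None` stands in for them.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Event:
    key: str      # K: uniquely identifies the event
    vs: float     # Vs: start of the valid (application) time interval, inclusive
    ve: float     # Ve: end of the valid time interval, exclusive
    c: float      # C: CEDR time, when the event arrives at a listener
    payload: Any  # P

    def valid_at(self, t: float) -> bool:
        """True if application time t lies in the half-open interval [Vs, Ve)."""
        return self.vs <= t < self.ve

# The two events from the slide; their payloads are elided, so None stands in.
stream = [Event("K1", 1, 5, 1, None), Event("K2", 2, 3, 3, None)]

for t in range(0, 6):
    print(t, [e.key for e in stream if e.valid_at(t)])
```

Note that K2 arrives later in CEDR time (C = 3) even though its valid interval starts earlier than K1 ends; valid time and arrival time are independent dimensions.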

Page 6: What properties do operators have?

All operators should be well behaved:

Definition 6: A CEDR operator O is well behaved iff for all (combinations of) inputs to O which are logically equivalent to infinity, O’s outputs are also logically equivalent to infinity.

Any well behaved operator, when given two sets of input streams that are identical except for CEDR time, should produce sets of output streams that are identical except for CEDR time.

Query semantics are independent of CEDR time.

K   Vs  Ve  C  P
K1  1   5   1  …
K2  2   3   3  …

K   Vs  Ve  C  P
K1  1   5   3  …
K2  2   3   1  …
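A small Python check of CEDR-time independence, using the two input streams above. Their payloads are elided, so "p1" and "p2" are placeholders, and "identical except for CEDR time" is approximated by comparing the sets of (K, Vs, Ve, P) tuples. Both operators are hypothetical: `select_long_lived` looks only at valid time, while `first_arrival_only` depends on arrival order and therefore is not well behaved.

```python
def project_without_c(stream):
    """The (K, Vs, Ve, P) content of a stream, ignoring CEDR time C."""
    return {(e["K"], e["Vs"], e["Ve"], e["P"]) for e in stream}

def equivalent_ignoring_c(s1, s2):
    return project_without_c(s1) == project_without_c(s2)

def select_long_lived(stream):
    """Well behaved: keeps events valid for at least 2 time units; never reads C."""
    return [e for e in stream if e["Ve"] - e["Vs"] >= 2]

def first_arrival_only(stream):
    """Not well behaved: keeps only the event with the smallest C."""
    return [min(stream, key=lambda e: e["C"])]

# The two input streams from the slide ("p1"/"p2" are placeholder payloads).
s1 = [{"K": "K1", "Vs": 1, "Ve": 5, "C": 1, "P": "p1"},
      {"K": "K2", "Vs": 2, "Ve": 3, "C": 3, "P": "p2"}]
s2 = [{"K": "K1", "Vs": 1, "Ve": 5, "C": 3, "P": "p1"},
      {"K": "K2", "Vs": 2, "Ve": 3, "C": 1, "P": "p2"}]

print(equivalent_ignoring_c(s1, s2))                                          # True
print(equivalent_ignoring_c(select_long_lived(s1), select_long_lived(s2)))    # True
print(equivalent_ignoring_c(first_arrival_only(s1), first_arrival_only(s2)))  # False
```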

Page 7: What properties do operators have?

Some operators are also view update compliant:

Definition 11: A unary CEDR operator O is view update compliant iff for all R, S s.t. *(R) and *(S) are identical, *(O(R)) and *(O(S)) are also identical.

If we interpret the stream as describing a changing relation where each row’s lifetime is specified by valid time, then a view update compliant operator produces snapshot identical output for snapshot identical input.

K   Vs  Ve  C  P
K1  1   5   1  P1

K   Vs  Ve  C  P
K1  1   2   2  P1
K2  2   5   3  P1
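The two streams on this slide are snapshot identical: P1 is valid over [1, 5) in both, even though the second stream splits that lifetime across two events. A Python sketch of that check, where a snapshot at application time t is taken to be the multiset of payloads with Vs <= t < Ve. `snapshot` and `snapshot_identical` are hypothetical helpers; checking only the integer times 0 through 6 suffices here because every endpoint in this example is an integer.

```python
from collections import Counter

def snapshot(stream, t):
    """Multiset of payloads valid at application time t (Vs <= t < Ve)."""
    return Counter(p for _k, vs, ve, _c, p in stream if vs <= t < ve)

def snapshot_identical(r, s, times):
    """True if r and s have equal snapshots at every time in `times`."""
    return all(snapshot(r, t) == snapshot(s, t) for t in times)

# Rows are (K, Vs, Ve, C, P), copied from the two tables on the slide.
R = [("K1", 1, 5, 1, "P1")]
S = [("K1", 1, 2, 2, "P1"), ("K2", 2, 5, 3, "P1")]

print(snapshot_identical(R, S, range(0, 7)))  # True
```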

Page 8: What are our operators?

We may now happily use all our favorite relational operators:

Definition 9 (Join):

$$\bowtie_{\theta(P_1,P_2)}(S_1,S_2) = \{(V_s,\ V_e,\ e_1.\mathit{Payload} \,\|\, e_2.\mathit{Payload}) \mid e_1 \in E(S_1),\ e_2 \in E(S_2),\ V_s = \max\{e_1.V_s, e_2.V_s\},\ V_e = \min\{e_1.V_e, e_2.V_e\},\ V_s < V_e,\ \theta(e_1.\mathit{Payload}, e_2.\mathit{Payload})\}$$

where $\|$ denotes concatenation of the two payloads.

These operators’ output streams describe the changing contents of a materialized view computed over the changing input relation(s) described by the input streams.
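Definition 9 translates almost directly into a nested-loop sketch. Events here are (Vs, Ve, payload) triples with tuple payloads, so `+` concatenates them; keys and CEDR time are omitted because the definition does not mention them. The function name and the readings/owners data are hypothetical.

```python
from itertools import product

def temporal_join(s1, s2, theta):
    """Pair events whose valid intervals overlap and whose payloads satisfy theta.

    Each output lives on the intersection [max(Vs), min(Ve)) of its inputs'
    lifetimes and carries the concatenation of the two payloads.
    """
    out = []
    for (vs1, ve1, p1), (vs2, ve2, p2) in product(s1, s2):
        vs, ve = max(vs1, vs2), min(ve1, ve2)
        if vs < ve and theta(p1, p2):
            out.append((vs, ve, p1 + p2))
    return out

# Hypothetical example: join device readings with device owners on device id.
readings = [(1, 5, ("dev1", 100)), (3, 8, ("dev2", 40))]
owners = [(0, 4, ("dev1", "alice")), (4, 10, ("dev1", "bob")), (2, 6, ("dev2", "carol"))]
joined = temporal_join(readings, owners, lambda a, b: a[0] == b[0])
for row in joined:
    print(row)
```

The dev1 reading overlaps two ownership periods, so it is split at time 4 into one output row per owner; this is exactly the "changing contents of a materialized view" reading of the join.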

Page 9: Non-view update compliant operators

Moving window: all output valid end times are set to their valid start times plus the window size.

Insert separation (CQL): all output valid end times are set to infinity.

The semantics of these operations, plus many more, can be easily captured using AlterLifetime:

Definition 12 (AlterLifetime):

$$\Pi_{f_{V_s}, f_\Delta}(S) = \{(|f_{V_s}(e)|,\ |f_{V_s}(e)| + |f_\Delta(e)|,\ e.\mathit{Payload}) \mid e \in E(S)\}$$

For example, a moving window of size $w$ is $\Pi_{f_{V_s}, f_\Delta}$ with $f_{V_s}(e) = e.V_s$ and $f_\Delta(e) = w$, while insert separation uses $f_\Delta(e) = \infty$.

AlterLifetime allows the lifetime of input events to be recomputed. It is not view update compliant, but it is well behaved.
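A Python sketch of Definition 12 with moving window and insert separation as instances. Events are (Vs, Ve, payload) triples; reading |·| as absolute value is an assumption of this sketch, and `alter_lifetime`, `moving_window`, and `insert_separation` are hypothetical names. The last lines reuse the two snapshot identical streams from Page 7 to show why AlterLifetime is not view update compliant.

```python
import math
from collections import Counter

def alter_lifetime(stream, f_vs, f_delta):
    """Recompute each lifetime as [|f_vs(e)|, |f_vs(e)| + |f_delta(e)|) (|.| read as abs)."""
    out = []
    for e in stream:
        vs = abs(f_vs(e))
        out.append((vs, vs + abs(f_delta(e)), e[2]))
    return out

def moving_window(stream, size):
    # Valid end time = valid start time + window size
    return alter_lifetime(stream, lambda e: e[0], lambda e: size)

def insert_separation(stream):
    # Valid end time = infinity
    return alter_lifetime(stream, lambda e: e[0], lambda e: math.inf)

def snapshot(stream, t):
    """Multiset of payloads valid at application time t."""
    return Counter(p for vs, ve, p in stream if vs <= t < ve)

# Snapshot identical inputs: P1 is valid over [1, 5) in both.
R = [(1, 5, "P1")]
S = [(1, 2, "P1"), (2, 5, "P1")]

R2, S2 = moving_window(R, 2), moving_window(S, 2)
print(R2)  # [(1, 3, 'P1')]
print(S2)  # [(1, 3, 'P1'), (2, 4, 'P1')]
print(snapshot(R2, 2) == snapshot(S2, 2))  # False: one copy of P1 vs two
```

The outputs differ only because the inputs split the same lifetime differently, yet the operator still depends on nothing but valid time and payload, which is why it can be well behaved without being view update compliant.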

Page 10: But is this implementable?

Input:

K   Vs  Ve  C  P
K1  2   6   1  15
K2  1   5   3  5

Avg(P): the usual average operator, in materialized view update compliant form.

Correct Output:

K   Vs  Ve  C  P
K1  -∞  1   …  ?
K2  1   2   …  5
K3  2   5   …  10
K4  5   6   …  15
K5  6   ∞   …  ?

The "?" rows cover periods in which no input event is valid, so the average is undefined there.

But how could CEDR know, when it saw K1, that it needed to wait for K2 before producing output? It couldn't have, without waiting indefinitely or without some external guarantee.
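The "Correct Output" table can be reproduced by averaging P over each maximal interval during which the set of valid input events does not change. The sketch below (hypothetical function `snapshot_avg`; events as (K, Vs, Ve, C, P) tuples) skips intervals with no valid input, which the slide marks with "?", and omits output keys and CEDR times. The second call shows what the operator would compute if it produced output at CEDR time 1, when only K1 had arrived.

```python
def snapshot_avg(events):
    """Avg(P) over each interval where the set of valid events is constant.

    Events are (K, Vs, Ve, C, P) tuples; intervals with no valid event are skipped.
    """
    points = sorted({t for _, vs, ve, _, _ in events for t in (vs, ve)})
    out = []
    for lo, hi in zip(points, points[1:]):
        vals = [p for _, vs, ve, _, p in events if vs <= lo < ve]
        if vals:
            out.append((lo, hi, sum(vals) / len(vals)))
    return out

events = [("K1", 2, 6, 1, 15), ("K2", 1, 5, 3, 5)]
print(snapshot_avg(events))      # [(1, 2, 5.0), (2, 5, 10.0), (5, 6, 15.0)]

# What could be computed at CEDR time 1, when only K1 has arrived:
seen_at_c1 = [e for e in events if e[3] <= 1]
print(snapshot_avg(seen_at_c1))  # [(2, 6, 15.0)]
```

The first result matches the P column of the correct output; the second assigns 15 to all of [2, 6), which becomes wrong for [2, 5) once K2 arrives.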

Page 11: But is this implementable?

We need the ability to retract previously output results in the stream:

K    Vs  Ve  C  P
K1   1   5   …  1
-K1  1   2   …  1
K2   2   7   …  2

is logically equivalent to:

K    Vs  Ve  C  P
K1   1   2   …  1
K2   2   7   …  2
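A sketch of folding retractions into a canonical stream, reproducing the equivalence on this slide. The row layout (K, Vs, Ve, P) drops C, and the retraction semantics are an assumption read off the example: "-K1 1 2" is taken to set K1's valid end time to 2, and a retraction whose Ve does not exceed the event's Vs is taken to remove the event entirely. `canonicalize` is a hypothetical name.

```python
def canonicalize(stream):
    """Fold retraction rows into the events they modify.

    Rows are (K, Vs, Ve, P) in arrival order. A key prefixed with '-' is a
    retraction; as an assumption, it sets that event's valid end time to Ve,
    and removes the event entirely if Ve <= the event's Vs.
    """
    live = {}
    for key, vs, ve, p in stream:
        if key.startswith("-"):
            k = key[1:]
            old_vs, _old_ve, old_p = live[k]
            if ve <= old_vs:
                del live[k]  # lifetime shrunk to nothing
            else:
                live[k] = (old_vs, ve, old_p)  # lifetime shortened
        else:
            live[key] = (vs, ve, p)
    return sorted((k, vs, ve, p) for k, (vs, ve, p) in live.items())

with_retraction = [("K1", 1, 5, 1), ("-K1", 1, 2, 1), ("K2", 2, 7, 2)]
without = [("K1", 1, 2, 1), ("K2", 2, 7, 2)]
print(canonicalize(with_retraction) == canonicalize(without))  # True
```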

Page 12: But is this implementable?

Our real definition of well behavedness: any well behaved operator, when given logically equivalent sets of input streams, produces logically equivalent sets of output streams.

Avg may now fully retract incorrect previous output and issue new correct output for the appropriate time period.

We can denote operator semantics in a very clean manner, even in a system with arbitrarily out-of-order data.

The use of retractions to handle out-of-order data induces a spectrum of formally defined consistency levels for operators. These levels expose interesting tradeoffs between various aspects of performance and correctness (much more in the paper).
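An Avg operator can emit output as soon as each input arrives and later retract rows that new input invalidates. In this Python sketch (hypothetical names throughout), output messages are ('+', row) inserts and ('-', row) full retractions, a simplification of the lifetime-shortening retraction on Page 11; `snapshot_avg` is included so the block stands alone. Applying all messages yields the same rows as computing Avg over the complete input, which is the logical-equivalence property stated above.

```python
def snapshot_avg(events):
    """Avg(P) over each interval where the set of valid (K, Vs, Ve, C, P) events is constant."""
    points = sorted({t for _, vs, ve, _, _ in events for t in (vs, ve)})
    out = []
    for lo, hi in zip(points, points[1:]):
        vals = [p for _, vs, ve, _, p in events if vs <= lo < ve]
        if vals:
            out.append((lo, hi, sum(vals) / len(vals)))
    return out

def optimistic_avg(arrivals):
    """Emit Avg output eagerly; retract rows that later arrivals invalidate."""
    seen, current, messages = [], set(), []
    for e in arrivals:  # processed in CEDR-time (arrival) order
        seen.append(e)
        new = set(snapshot_avg(seen))
        messages += [("-", row) for row in sorted(current - new)]
        messages += [("+", row) for row in sorted(new - current)]
        current = new
    return messages

def apply_messages(messages):
    """Fold inserts and full retractions into the final set of output rows."""
    rows = set()
    for op, row in messages:
        if op == "+":
            rows.add(row)
        else:
            rows.discard(row)
    return sorted(rows)

events = [("K1", 2, 6, 1, 15), ("K2", 1, 5, 3, 5)]  # Page 10 input, in C order
msgs = optimistic_avg(events)
for m in msgs:
    print(m)
```

The first message optimistically reports 15 over [2, 6); when K2 arrives it is retracted and replaced by the three correct rows.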

Page 13: Imperfections in Event Streaming

How do current systems cope?

Wait until we’re sure we have all data that affects our results up to a point in time (High Consistency). This has high latency, requires application and network guarantees, and requires high memory, but gives absolutely correct answers. It is useful for standing queries that result in some expensive form of corrective or examination action, e.g. a human must examine something because some aggregation (avg) or negation based alert tripped.

Provide an answer quickly as of the current time, but ignore late arriving data (Low Consistency). This has low latency, requires no application or network guarantees, and needs low memory, but sacrifices answer correctness. It is useful in applications which are unable to provide guarantees about data arrival timeliness and where exact answers aren’t required, e.g. aggregations in internet scale monitoring.
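The two strategies can be contrasted on the Page 10 input, where K2 arrives after K1 but starts earlier in application time. In this sketch, high consistency buffers events until a hypothetical ('guarantee', g) item promises that no event with Vs < g will arrive later, then emits exact Avg rows clipped at g; low consistency accepts events immediately and drops any whose Vs precedes the largest Vs already accepted. The guarantee item, the lateness rule, and all function names are assumptions made for illustration, and both functions simply return the final output of a finite stream.

```python
def snapshot_avg(events):
    """Avg(P) over each interval where the set of valid (K, Vs, Ve, C, P) events is constant."""
    points = sorted({t for _, vs, ve, _, _ in events for t in (vs, ve)})
    out = []
    for lo, hi in zip(points, points[1:]):
        vals = [p for _, vs, ve, _, p in events if vs <= lo < ve]
        if vals:
            out.append((lo, hi, sum(vals) / len(vals)))
    return out

def high_consistency(stream):
    """Buffer events; on ('guarantee', g), emit exact Avg rows for times before g."""
    buffered, done_up_to, out = [], float("-inf"), []
    for kind, item in stream:
        if kind == "event":
            buffered.append(item)  # memory grows until guarantees arrive
            continue
        g = item
        for lo, hi, avg in snapshot_avg(buffered):
            lo, hi = max(lo, done_up_to), min(hi, g)  # only the newly settled part
            if lo < hi:
                out.append((lo, hi, avg))
        done_up_to = max(done_up_to, g)
    return out

def low_consistency(stream):
    """Accept events immediately; drop late ones (Vs before the largest Vs accepted)."""
    accepted, frontier = [], float("-inf")
    for kind, item in stream:
        if kind != "event":
            continue  # no guarantees needed
        if item[1] < frontier:
            continue  # late arrival: ignored, sacrificing correctness
        frontier = item[1]
        accepted.append(item)
    return snapshot_avg(accepted)

k1, k2 = ("K1", 2, 6, 1, 15), ("K2", 1, 5, 3, 5)
stream = [("event", k1), ("event", k2), ("guarantee", 10)]
print(high_consistency(stream))  # [(1, 2, 5.0), (2, 5, 10.0), (5, 6, 15.0)]
print(low_consistency(stream))   # [(2, 6, 15.0)]
```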

Page 14: Imperfections in Event Streaming

With retractions: compute our output early in an optimistic fashion and retract later if necessary (Middle Consistency). This has low latency and doesn’t require application and network guarantees, but has high memory requirements (equal to the high consistency case if we have guarantees) and may produce more output. It is useful in situations where we don’t want to block, but where we want eventual correctness:

Stock ticker data example: we want to compute real time info about stock data, but compensate when a correction is issued.

Shared expressions between two queries, one running at the high level of consistency and one at the low.
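A sketch of the stock ticker scenario in Python. Everything here is hypothetical: the message format ('+', trade_id, symbol, price) for trades and ('-', trade_id) for corrections, the symbol, the prices, and the function name. The operator keeps a per-symbol average price and, in the spirit of middle consistency, emits a new result immediately and compensates (retract, then reissue) when a correction arrives. Valid time is ignored for brevity, so every trade counts for the whole window.

```python
def optimistic_symbol_avg(messages):
    """Per-symbol average trade price with eager output and compensation.

    Input: ('+', trade_id, symbol, price) or ('-', trade_id) for a correction.
    Output: ('-', symbol, old_avg) / ('+', symbol, new_avg) whenever it changes.
    """
    trades, current, out = {}, {}, []
    for msg in messages:
        if msg[0] == "+":
            _, tid, sym, price = msg
            trades[tid] = (sym, price)
        else:
            _, tid = msg
            sym, _ = trades.pop(tid)  # the corrected trade no longer counts
        prices = [p for s, p in trades.values() if s == sym]
        new = sum(prices) / len(prices) if prices else None
        old = current.get(sym)
        if old != new:
            if old is not None:
                out.append(("-", sym, old))  # retract the stale result
            if new is not None:
                out.append(("+", sym, new))  # issue the revised result
            current[sym] = new
    return out

ticker = [("+", "T1", "MSFT", 30.0), ("+", "T2", "MSFT", 32.0),
          ("-", "T2"), ("+", "T3", "MSFT", 31.0)]
out = optimistic_symbol_avg(ticker)
for m in out:
    print(m)
```

The net effect of the seven output messages is a single current average of 30.5, illustrating both eventual correctness and the extra output that optimism costs.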