
Page 1: Adaptive Processing in Data Stream Systems Shivnath Babu stanfordstreamdatamanager Stanford University.

Adaptive Processing in Data Stream Systems

Shivnath Babu

stanfordstreamdatamanager

Stanford University

Page 2: Data Streams

• New applications -- data as continuous, rapid, time-varying data streams

– Sensor networks, RFID tags

– Network monitoring and traffic engineering

– Financial applications

– Telecom call records

– Web logs and click-streams

– Manufacturing processes

• Traditional databases -- data stored in finite, persistent data sets

Page 3: Using Traditional Database

[Figure: a Loader populates stored tables R and S; the User/Application issues each Query and receives a Result.]

Page 4: New Approach for Data Streams

[Figure: the User/Application registers a continuous query with the Stream Query Processor, which reads the input streams and returns results.]

Page 5: Example Continuous Queries

• Web

– Amazon’s best sellers over last hour

• Network Intrusion Detection

– Track HTTP packets with destination address matching a prefix in given table and content matching “*\.ida”

• Finance

– Monitor NASDAQ stocks between $20 and $200 that have moved down more than 2% in the last 20 minutes
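As a rough illustration of what evaluating such a continuous query involves, here is a minimal Python sketch of the Finance example. The class name `DropMonitor`, the `(symbol, ts, price)` tuple layout, and the choice to compare against the oldest price still inside the 20-minute window are all assumptions made for this sketch; the talk does not specify them.

```python
from collections import deque

WINDOW_SECS = 20 * 60   # "last 20 minutes"
MAX_DROP = 0.02         # "more than 2%"


class DropMonitor:
    """Hypothetical per-symbol sliding window of (timestamp, price) quotes."""

    def __init__(self):
        self.windows = {}  # symbol -> deque of (ts, price), oldest first

    def on_quote(self, symbol, ts, price):
        """Process one stream tuple; return True if the query matches it."""
        win = self.windows.setdefault(symbol, deque())
        win.append((ts, price))
        # Expire quotes that have slid out of the 20-minute window.
        while win and win[0][0] < ts - WINDOW_SECS:
            win.popleft()
        # Price-range predicate: between $20 and $200.
        if not (20.0 <= price <= 200.0):
            return False
        # Assumption: "moved down" is measured against the oldest
        # price still inside the window.
        oldest_price = win[0][1]
        return price < oldest_price * (1 - MAX_DROP)
```

The query is registered once and then evaluated on every arriving tuple, which is the main contrast with the one-shot queries of a traditional database.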

Page 6: Data Stream Management System (DSMS)

[Figure: the DSMS takes input streams and registered continuous queries and produces streamed results and stored results. Other labels: Archive, Stored Tables.]

Page 7: Primer on Database Query Processing

[Figure: inside the database system, a declarative query passes through Preprocessing (giving a canonical form), Query Optimization (giving the best query execution plan), and Query Execution over the data (giving results).]

Page 8: Traditional Query Optimization

• Optimizer: Finds “best” query plan to process this query

• Executor: Runs chosen plan to completion

• Statistics Manager: Periodically collects statistics, e.g., data sizes, histograms

[Figure labels: Query; Chosen query plan; Which statistics are required; Estimated statistics; Data, auxiliary structures, statistics.]

Page 9: Optimizing Continuous Queries is Challenging

• Continuous queries are long-running

• Stream properties can change while query runs

– Data properties: value distributions

– Arrival properties: bursts, delays

• System conditions can change

• Performance of a fixed plan can change significantly over time

Adaptive processing: use plan that is best for current conditions

Page 10: Roadmap

• StreaMon: Our adaptive query processing engine

• Adaptive ordering of commutative filters

• Adaptive caching for multiway joins

• Current and future work

– Similar techniques apply to conventional databases

Page 11: Traditional Optimization → StreaMon

Traditional optimization:

• Optimizer: Finds “best” query plan to process this query

• Executor: Runs chosen plan to completion

• Statistics Manager: Periodically collects statistics, e.g., table sizes, histograms

StreaMon:

• Re-optimizer: Ensures that plan is efficient for current characteristics

• Profiler: Monitors current stream and system characteristics

• Executor: Executes current plan on incoming stream tuples

[Figure labels: Query; Chosen query plan; Which statistics are required; Estimated statistics; Decisions to adapt; Combined in part for efficiency.]

Page 12: Pipelined Filters

• Commutative filters over a stream

• Example: Track HTTP packets with destination address matching a prefix in given table and content matching “*\.ida”

• Simple to complex filters

– Boolean predicates

– Table lookups

– Pattern matching

– User-defined functions

[Figure: packets flow through Filter1, Filter2, and Filter3; the output is the bad packets.]

Page 13: Pipelined Filters: Problem Definition

• Continuous Query: F1 ∧ F2 ∧ … ∧ Fn

• Plan: Tuples → F(1) → F(2) → … → F(n)

• Goal: Minimize expected cost to process a tuple
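One way to write the goal down, using notation that is introduced here for illustration and does not appear on the slide: let c_(i) be the per-tuple cost of F(i), and let d(j | j−1) be the fraction of tuples dropped by F(j) among those not dropped by F(1), …, F(j−1). A tuple pays for F(i) only if it survives every earlier filter, so under these assumptions the expected cost of a plan is:

```latex
\mathbb{E}[\text{cost per tuple}]
  = \sum_{i=1}^{n} c_{(i)} \prod_{j=1}^{i-1} \bigl(1 - d(j \mid j-1)\bigr)
```

The empty product for i = 1 equals 1, since every tuple reaches the first filter.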

Page 14: Pipelined Filters: Example

[Figure: numbered input tuples flow through filters F1, F2, F3, F4; each filter drops some tuples, and the tuples that pass all four are output.]

Informal Goal: If tuple will be dropped, then drop it as cheaply as possible

Page 15: Why is Our Problem Hard?

• Filter drop-rates and costs can change over time

• Filters can be correlated

– E.g., Protocol = HTTP and DestPort = 80

Page 16: Metrics for an Adaptive Algorithm

• Speed of adaptivity

– Detecting changes and finding new plan

• Run-time overhead

– Re-optimization, collecting statistics, plan switching

• Convergence properties

– Plan properties under stable statistics

[Figure: StreaMon components: Profiler, Re-optimizer, Executor.]

Page 17: Pipelined Filters: Stable Statistics

• Assume statistics are not changing

– Order filters by decreasing drop-rate/cost [MS79,IK84,KBZ86,H94]

– With correlations: NP-Hard

• Greedy algorithm: Use conditional statistics

– F(1) has maximum drop-rate/cost ratio

– F(2) has maximum drop-rate/cost ratio for tuples not dropped by F(1)

– And so on
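A minimal Python sketch of the first rule above, ordering by decreasing drop-rate/cost, together with the expected per-tuple cost of a plan. It assumes the filters are independent, so unconditional drop-rates suffice. The function names and the numeric values are hypothetical.

```python
def rank_order(drop_rates, costs):
    """Order filter indices by decreasing drop-rate/cost ratio."""
    return sorted(range(len(costs)),
                  key=lambda i: drop_rates[i] / costs[i],
                  reverse=True)


def expected_cost(order, drop_rates, costs):
    """Expected per-tuple cost of a plan, assuming independent filters."""
    total, reach_prob = 0.0, 1.0
    for i in order:
        total += reach_prob * costs[i]     # paid only if the tuple reaches filter i
        reach_prob *= 1.0 - drop_rates[i]  # fraction that survives filter i
    return total


drop_rates = [0.2, 0.9, 0.5]   # hypothetical values
costs = [1.0, 3.0, 1.0]        # hypothetical values
best = rank_order(drop_rates, costs)
```

With correlated filters the unconditional ratios can mislead, which is why the Greedy algorithm switches to drop-rates measured on the tuples that survive the filters already placed.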

Page 18: Adaptive Version of Greedy

• Greedy gives strong guarantees

– 4-approximation, best poly-time approx. possible assuming P ≠ NP [MBM+05]

– For arbitrary (correlated) characteristics

– Usually optimal in experiments

• Challenge:

– Online algorithm

– Fast adaptivity to Greedy ordering

– Low run-time overhead

A-Greedy: Adaptive Greedy

Page 19: A-Greedy

• Profiler: Maintains conditional filter drop-rates and costs over recent tuples

• Executor: Processes tuples with current Greedy ordering

• Re-optimizer: Ensures that filter ordering is Greedy for current statistics

[Figure labels: Estimated statistics; Which statistics are required; Changes in filter ordering; Combined in part for efficiency.]

Page 20: A-Greedy’s Profiler

• Responsible for maintaining current statistics

– Filter costs

– Conditional filter drop-rates: exponentially many!

• Profile Window: Sampled statistics from which required conditional drop-rates can be estimated
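A minimal Python sketch of how a profile window might be maintained. The class name `ProfileWindow`, the `sample_prob` and `window_size` parameters, and the convention that each filter is a predicate returning True when a tuple passes are assumptions for this sketch. The 0/1 encoding, with 1 meaning the filter drops the tuple, follows the profile window figures in this talk.

```python
import random
from collections import deque


class ProfileWindow:
    """Hypothetical sliding window of profile rows, one 0/1 flag per filter."""

    def __init__(self, filters, window_size, sample_prob, rng=None):
        self.filters = filters                 # predicates: tuple -> True if it passes
        self.rows = deque(maxlen=window_size)  # oldest rows fall out automatically
        self.sample_prob = sample_prob
        self.rng = rng or random.Random()

    def maybe_profile(self, tup):
        """With probability sample_prob, evaluate every filter on tup."""
        if self.rng.random() >= self.sample_prob:
            return False
        # 1 means "this filter drops the tuple", independent of the others.
        self.rows.append([0 if f(tup) else 1 for f in self.filters])
        return True
```

Because a sampled tuple is tested against all filters rather than only up to the first one that drops it, any conditional drop-rate can later be estimated from the rows without storing an exponential number of counters.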

Page 21: Profile Window

[Figure: sampled input tuples are evaluated against F1, F2, F3, F4, and each becomes one row of 0/1 flags in the Profile Window. Rows shown: 0 1 1 0; 0 0 1 1; 1 0 0 1; 1 0 0 1.]

Page 22: Greedy Ordering Using Profile Window

Profile Window (columns F1 F2 F3 F4):

1 0 1 0

0 0 0 1

1 0 1 0

0 1 0 0

0 1 0 0

0 0 1 1

Column sums (F1 F2 F3 F4): 2 2 3 2

Matrix View, built one row per Greedy step:

– Row 1, all tuples: F3 = 3, F1 = 2, F2 = 2, F4 = 2, so F3 goes first

– Row 2, tuples not dropped by F3: F2 = 2, F4 = 1, F1 = 0, so F2 goes second

– Row 3, tuples not dropped by F3 or F2: F4 = 1, F1 = 0, so F4 goes third

Greedy Ordering: F3 F2 F4 F1
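The Greedy ordering for this profile window can be recomputed with a short Python sketch. It assumes unit filter costs, so drop counts stand in for drop-rate/cost, and it breaks ties by lowest filter index. The function name and the dictionary-per-row layout of the view are choices made for this sketch, not the incremental data structure of [BMM+04].

```python
def greedy_matrix_view(window):
    """Return (ordering, view) for a list of 0/1 profile rows.

    ordering holds 0-based filter indices in Greedy order; view[i] maps
    each filter still unplaced at step i to its drop count over the
    rows that survive the filters placed so far.
    """
    n = len(window[0])
    remaining = list(range(n))   # filters not yet placed, ascending index
    rows = list(window)          # profile rows not dropped so far
    ordering, view = [], []
    while remaining:
        counts = {f: sum(r[f] for r in rows) for f in remaining}
        chosen = max(remaining, key=lambda f: counts[f])  # first max wins ties
        ordering.append(chosen)
        view.append(counts)
        remaining.remove(chosen)
        rows = [r for r in rows if r[chosen] == 0]        # keep survivors only
    return ordering, view


window = [
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
ordering, view = greedy_matrix_view(window)
```

This recomputes everything from scratch; the point of the Matrix View in A-Greedy is that the same counts can be kept up to date incrementally as rows enter and leave the window.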

Page 23: A-Greedy’s Re-optimizer

• Maintains Matrix View over Profile Window

– Easy to incorporate filter costs

– Efficient incremental update

– Fast detection/correction of changes in Greedy order

Details in [BMM+04]: “Adaptive Processing of Pipelined Stream Filters”, SIGMOD 2004

Page 24: Next

• Tradeoffs and variations of A-Greedy

• Experimental results for A-Greedy

Page 25: Tradeoffs

• Suppose:

– Changes are infrequent

– Slower adaptivity is okay

– Want best plans at very low run-time overhead

• Three-way tradeoff among speed of adaptivity, run-time overhead, and convergence properties

• Spectrum of A-Greedy variants

Page 26: Variants of A-Greedy

Algorithm     Convergence Properties     Run-time Overhead             Adaptivity

A-Greedy      4-approx.                  High (relative to others)     Fast

[Figure: the Profile Window and Matrix View from the Greedy ordering example.]

Page 27: Variants of A-Greedy

Algorithm     Convergence Properties            Run-time Overhead              Adaptivity

A-Greedy      4-approx.                         High (relative to others)      Fast

Sweep         4-approx.                         Less work per sampling step    Slow

Local-Swaps   May get caught in local optima    Less work per sampling step    Slow

Independent   Misses correlations               Lower sampling rate            Fast

Page 28: Experimental Setup

• Implemented A-Greedy, Sweep, Local-Swaps, and Independent in StreaMon

• Studied convergence properties, run-time overhead, and adaptivity

• Synthetic testbed

– Can control stream data and arrival properties

• DSMS server running on 700 MHz Linux machine, 1 MB L2 cache, 2 GB memory

Page 29: Converged Processing Rate

[Figure: average processing rate (tuples/sec, axis 20000 to 55000) versus number of filters (3, 4, 6, 8, 10) for Optimal, Sweep, A-Greedy, Local-Swaps, and Independent.]

Page 30: Effect of Filter Drop-Rate

[Figure: average processing rate (tuples/sec, axis 30000 to 80000) versus drop-rate for each of 3 filters (0.1 to 0.9) for Optimal, A-Greedy, and Independent.]

Page 31: Effect of Correlation

[Figure: average processing rate (tuples/sec, axis 20000 to 55000) versus correlation factor (1 to 4) for Optimal, A-Greedy, and Independent.]

Page 32: Run-time Overhead

[Figure: average time per tuple (microsecs, axis 0 to 250) for Optimal, A-Greedy, Sweep, Local-Swaps, and Independent, split into tuple processing and profiling + re-optimization overhead.]

Page 33: Adaptivity

[Figure: number of filters evaluated per 2000 tuples (axis 3000 to 5000) versus progress of time (x1000 tuples processed) for Sweep, Local-Swaps, A-Greedy, and Independent; an annotation marks the point where selectivities are permuted.]

Page 34: Roadmap

• StreaMon: Our adaptive processing engine

• Adaptive ordering of commutative filters

• Adaptive caching for multiway joins

• Current and future work

Page 35: Stream Joins

[Figure: Sensor R, Sensor S, and Sensor T send observations in the last minute to the DSMS, which outputs join results.]

Page 36: MJoins (VNB04)

[Figure: windows on R, S, and T; new tuples of each stream flow through their own pipeline of joins (⋈) with the windows of the other two streams.]

Page 37: Excessive Recomputation in MJoins

[Figure: one MJoin pipeline (⋈R, ⋈T) over the windows on R, S, and T.]

Page 38: Materializing Join Subexpressions

[Figure: windows on R, S, and T, with a fully-materialized join subexpression.]

Page 39: Tree Joins: Trees of Binary Joins

[Figure: a tree of binary joins over streams R, S, and T with windows on R, T, and S; the intermediate node is a fully-materialized join subexpression.]

Page 40: Hard State Hinders Adaptivity

[Figure: a plan switch between two tree-join plans over R, S, and T, one materializing WR ⋈ WT and the other materializing WS ⋈ WT.]

Page 41: Can we get best of both worlds?

• MJoin drawback: recomputation

• Tree Join drawbacks: less adaptive, higher memory use

[Figure: an MJoin plan with per-stream join pipelines over R, S, and T, beside a tree-join plan that materializes WR ⋈ WT.]

Page 42: MJoins + Caches

[Figure: in the MJoin pipeline ⋈R, ⋈T over the windows on R, S, and T, an S tuple probes a cache of WR ⋈ WT; a hit bypasses the pipeline segment.]
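A minimal Python sketch of the probe-and-bypass idea. It assumes all three streams join on a single shared key stored as the first field of each tuple, ignores window expiry, and keeps the cache consistent by simply discarding an entry whenever a matching R or T tuple arrives. These are simplifications chosen for this sketch; the cache maintenance in [BMW+05] may differ.

```python
class CachedPipeline:
    """Hypothetical pipeline for new S tuples (join R, then T) with a cache."""

    def __init__(self):
        self.window_r = []   # (key, payload) tuples in the window on R
        self.window_t = []   # (key, payload) tuples in the window on T
        self.cache = {}      # key -> list of (r, t) pairs joining on that key
        self.hits = 0

    def insert_r(self, r):
        self.window_r.append(r)
        self.cache.pop(r[0], None)   # soft state: drop the stale entry

    def insert_t(self, t):
        self.window_t.append(t)
        self.cache.pop(t[0], None)

    def process_s(self, s):
        key = s[0]
        if key in self.cache:        # probe
            self.hits += 1
            pairs = self.cache[key]  # hit: bypass the pipeline segment
        else:
            # Miss: run the pipeline segment (join with R, then with T).
            rs = [r for r in self.window_r if r[0] == key]
            pairs = [(r, t) for r in rs
                     for t in self.window_t if t[0] == key]
            self.cache[key] = pairs
        return [(r, s, t) for r, t in pairs]
```

Since an entry can be dropped at any time without affecting correctness, the cache is soft state: removing it only costs a recomputation on the next probe.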

Page 43: MJoins + Caches (contd.)

• Caches are soft state

– Adaptive

– Flexible with respect to memory usage

• Captures whole spectrum from MJoins to Tree Joins and plans in between

• Challenge: adaptive algorithm to choose join operator orders and caches in pipelines

Page 44: Adaptive Caching (A-Caching)

• Adaptive join ordering with A-Greedy or variant

– Join operator orders determine candidate caches

• Adaptive selection from candidate caches

• Adaptive memory allocation to chosen caches

Page 45: A-Caching (caching part only)

• Profiler: Estimates costs and benefits of candidate caches

• Executor: MJoins with caches

• Re-optimizer: Ensures that maximum-benefit subset of candidate caches is used

[Figure labels: List of candidate caches; Estimated statistics; Add/remove caches; Combined in part for efficiency.]

Page 46: Performance of Stream-Join Plans (1)

Arrival rates of streams are in the ratio 1:1:1:10, other details of input are given in [BMW+05]

[Figure: join over streams R, S, T, and U; average processing rate (tuples/sec, axis 0 to 450000) for MJoin, TreeJoin, and A-Caching.]

Page 47: Performance of Stream-Join Plans (2)

Arrival rates of streams are in the ratio 15:10:5:1, other details of input are given in [BMW+05]

[Figure: average processing rate (tuples/sec, axis 0 to 250000) for MJoin, TreeJoin, and A-Caching.]

Page 48: A-Caching: Results at a glance

• Capture whole spectrum from Fully-pipelined MJoins to Tree-based joins adaptively

• Approximation algorithms scalable

• Different types of caches

• Up to 7x improvement with respect to MJoin and 2x improvement with respect to TreeJoin

• Details in [BMW+05]: “Adaptive Caching for Continuous Queries”, ICDE 2005 (To appear)

Page 49: Current and Future Work

• Broadening StreaMon’s scope, e.g.,

– Shared computation among multiple queries

– Parallelism

• Rio: Adaptive query processing in conventional database systems

• Plan logging: A new overall approach to address certain “meta issues” in adaptive processing

Page 50: Related Work

• Adaptive processing of continuous queries

– E.g., Eddies [AH00], NiagaraCQ [CDT+00]

• Adaptive processing in conventional databases

– Inter-query adaptivity, e.g., Leo [SLM+01], [BC03]

– Intra-query adaptivity, e.g., Re-Opt [KD98], POP [MRS+04]

• New approaches to query optimization

– E.g., parametric [GW89,INS+92,HS03], expected-cost based [CHS99,CHG02], error-aware [VN03]

Page 51: Summary

• New trends demand adaptive query processing

– New applications, e.g., continuous queries, data streams

– Increasing data size and query complexity

• CS-wide push towards autonomic computing

• Our goal: Adaptive Data Management System

– StreaMon: Adaptive Data Stream Engine

– Rio: Adaptive Processing in Conventional DBMS

• Google keywords: shivnath, stanford stream

Page 52: Performance of Stream-Join Plans

[Figure: average processing rate (tuples/sec, axis 0 to 450000) for MJoin, TreeJoin, and A-Caching at sample points D1 to D8 from the spectrum of input properties.]

Page 53: Adaptivity to Memory Availability

[Figure: average processing rate (tuples/sec, axis 10000 to 28000) versus memory available for storing join subresults (0 to 70 KB) for TreeJoin, A-Caching, and MJoin.]

Page 54: Plan Logging

• Log the profiling and re-optimization history

– Query is long-running

– Example view over log for R ⋈ S ⋈ T

Rate(R)    …..    σ(R,S)    Plan    Cost

1024       …..    0.75      P1      12762

5642       …..    0.72      P2      72332

934        …..    0.76      P1      12003

Plans lying in a high-dimensional space of statistics

[Figure: regions for plans P1 and P2 in a space with axes Rate(R) and σ(R,S).]

Page 55: Uses of Plan Logging

• Reducing re-optimization overhead

– Create a cache of plans

• Reducing profiling overhead

– Track how changes in a statistic contribute to changes in best plan

[Figure: regions for plans P1 and P2 in the space of the logged statistics, with Rate(R) on one axis.]
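A minimal Python sketch of what a cache of plans built from the log could look like. Bucketing each statistic into fixed-width ranges is an assumption made here to keep the lookup simple; the talk does not say how logged statistics are matched. The class name `PlanCache` and the bucket widths are hypothetical, and the two logged entries reuse the first two statistics columns of the example log view.

```python
class PlanCache:
    """Hypothetical map from a bucketed statistics vector to a logged plan."""

    def __init__(self, bucket_widths):
        self.bucket_widths = bucket_widths   # one width per logged statistic
        self.plans = {}

    def _bucket(self, stats):
        # Nearby statistics vectors fall into the same bucket.
        return tuple(int(v // w) for v, w in zip(stats, self.bucket_widths))

    def log(self, stats, plan):
        """Record the plan the re-optimizer chose under these statistics."""
        self.plans[self._bucket(stats)] = plan

    def lookup(self, stats):
        """Return a previously logged plan, or None to fall back to re-optimization."""
        return self.plans.get(self._bucket(stats))


cache = PlanCache(bucket_widths=[1000.0, 0.1])
cache.log((1024, 0.75), "P1")
cache.log((5642, 0.72), "P2")
```

A hit lets the system switch plans without invoking the re-optimizer; a miss falls back to normal re-optimization, whose result can then be logged.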

Page 56: Uses of Plan Logging (contd.)

• Tracking “Return on Investment” of adaptive processing

– Track cost versus benefit of adaptivity

– Is there a single plan that would have good overall performance?

• Avoiding thrashing

– Which statistics have transient changes?

Page 57: Adaptive Processing in Traditional DBMS

• Optimizer: Finds “best” query plan to process this query

• Executor: Runs chosen plan to completion

• Statistics Manager: Periodically collects statistics, e.g., data sizes, histograms

[Figure labels: Query; Chosen query plan; Which statistics are required; Estimated statistics; Errors.]

Page 58: Proactive Re-optimization with Rio

Traditional DBMS:

• Statistics Manager: Periodically collects statistics, e.g., data sizes, histograms

• Optimizer: Finds “best” query plan to process this query

• Executor: Runs chosen plan to completion

Rio:

• Stats. Mgr. + Profiler: Collects statistics at run-time based on random samples

• (Re-)optimizer: Considers pairs of (estimate, uncertainty) during optimization

• Executor: Executes current plan

[Figure labels: Query; Which statistics are required; Estimates + uncertainty; “Robust” plans; Combined for efficiency.]