Adaptive Query Processing
Transcript of Adaptive Query Processing
Amol Deshpande, University of Maryland
Joseph M. Hellerstein, University of California, Berkeley
Vijayshankar Raman, IBM Almaden Research Center
Outline
20th Century Adaptivity: Intuition from the Classical Systems
Adaptive Selection Ordering
Adaptive Join Processing
Research Roundup
Outline
20th Century Adaptivity: Intuition from the Classical Systems
– Data Independence and Adaptivity
– The Adaptivity Loop
– Two case studies
• System R
• INGRES
– Tangential topics
Adaptive Selection Ordering
Adaptive Join Processing
Research Roundup
Data Independence Redux
The taproot of modern database technology: separation of specification (“what”) from implementation (“how”)
Refamiliarizing ourselves: why do we care about data independence?
d(app)/dt << d(env)/dt
(applications change much more slowly than their environments)
D.I. → Adaptivity
Query Optimization: the key to data independence
– bridges specification and implementation
– isolates static applications from dynamic environments
How does a DBMS account for dynamics in the environment?
This tutorial is on a 30-year-old topic
– With a 21st-Century renaissance
ADAPTIVITY
Why the Renaissance?
Breakdown of traditional query optimization
– Queries over many tables
– Unreliability of traditional cost estimation
– Success & maturity make problems more apparent, critical
• c.f. Oracle v6!
Query processing in new environments
– E.g. data integration, web services, streams, P2P, sensornets, hosting, etc.
– Unknown and dynamic characteristics for data and runtime
– Increasingly aggressive sharing of resources and computation
– Interactivity in query processing
Note two separate themes:
– Unknowns: even static properties often unknown in new environments
• and often unknowable a priori
– Dynamics: can be very high -- motivates intra-query adaptivity
The Adaptivity Loop
Measure/Model → Plan → Actuate (and back to Measure/Model)
– Measure/Model: When? What features & model? Overhead?
– Plan: When? What plan space? Overhead?
– Actuate: When? What can be changed? Overhead?
Need not happen at the same timescales!
An example query
SQL:
SELECT * FROM Professor P, Course C, Student S
WHERE P.pid = C.pid AND S.sid = C.sid
QUEL:
RANGE OF P IS Professor
RANGE OF C IS Course
RANGE OF S IS Student
RETRIEVE (P.ALL, C.ALL, S.ALL)
WHERE P.pid = C.pid AND S.sid = C.sid
System R Optimizer: Dynamic Programming
> UPDATE STATISTICS (gathers cardinalities, index lo/hi keys)
> SELECT * FROM ...
System R Adaptivity
Measure/Model → Plan → Actuate
– Measure/Model: When? Daily/Weekly. Features? Cardinalities, indexes, hi/lo keys. Model? (1-bucket) histograms. Overhead? Offline scans of tables.
– Plan: When? Query compilation. Plan space? Trees of Binary Ops (TOBO). Overhead? Exponential dynamic programming.
– Actuate: When? Query runtime. What can be changed? Initial plan choice. Overhead? None.
Note different timescales
INGRES “Query Decomposition” 1
> RANGE OF P IS ...
> RANGE OF C_T IS … WHERE C_T.pid=44…
> RANGE OF CT IS … WHERE CT.pid=26…
> RANGE OF T_T IS … WHERE T_T.sid = 273
[Figure: one-variable queries (OVQP) apply selections to P, C, S; results stored as hashed temps; output tuples produced]
INGRES “Query Decomposition” 2
> RANGE OF P IS ...
> RANGE OF ST IS … WHERE ST.sid=441
[Figure: OVQP applies selections to P, C, S; hashed temps; output tuples]
INGRES: Post-Mortem
Case 1: P, S, PC
Case 2: P, PC, S
Plan choice determined by number of C matches per P.
Each P tuple either Type1 or Type2.
[Figure: two join trees over P, C, S, mixing index nested loops (INL) and hash joins over hashed temps]
“Post-mortem” behavior
– Horizontal partitioning of inputs into different static plans [Ives02]
• “Driving” input relation effectively partitioned by join keys
• Each partition participates in a different static plan
• Recurses up each different join tree
– End result can be described as a union of static plans over partitions
• In general, many such plans!
Note: post-mortem always has a relational description of some sort
– But often “unusual”: plans that are simply not considered in System R!
• Often cannot know the partitioning prior to query execution
– So: plan-space and adaptivity loop settings have strong interactions!
• A theme we’ll see throughout.
Horizontal Partitioning
INGRES Adaptivity
Measure/Model → Plan → Actuate
– Measure/Model: When? After each one-variable query. Features? Cardinalities (including temps). Model? (1-bucket) histograms. Overhead? Offline tablescans, count temp rows.
– Plan: When? Before calling OVQP. Plan space? Single-table access methods. Overhead? Simple FindMin operation.
– Actuate: When? Each OVQP call. What can be changed? Choice of relation. Overhead? None.
All ‘round the loop each time…
Observations on 20thC Systems
Both INGRES & System R used adaptive query processing
– To achieve data independence
They “adapt” at different timescales
– INGRES goes ’round the whole loop many times per query
– System R decouples parts of the loop, and is coarser-grained
• Measurement/modeling: periodic
• Planning/actuation: once per query
Query post-mortem reveals different relational plan spaces
– System R is direct: each query mapped to a single relational algebra stmt
– INGRES’ decision space generates a union of plans over horizontal partitions
• This “super-plan” is not materialized -- recomputed via FindMin
Both have zero-overhead actuation
– Never waste query processing work
20th Century Summary
System R’s optimization scheme deemed the winner for 25 years
Nearly all 20thC research varied System R’s individual steps
– More efficient measurement (e.g. sampling)
– More efficient/effective models (samples, histograms, sketches)
– Expanded plan spaces (new operators, bushy trees, richer queries and data models, materialized views, parallelism, remote data sources, etc.)
– Alternative planning strategies (heuristic and enumerative)
Speaks to the strength of the scheme
– Independent innovation on multiple fronts
– As compared with the tight coupling of INGRES
But… minimal focus on the interrelationship of the steps
– Which, as we saw from INGRES, also affects the plan space
21st Century Adaptive Query Processing (well, starts in the late 1990’s)
Revisit the basic architecture of System R
– In effect, change the basic adaptivity loop!
As you examine schemes, keep an eye on:
– Rate of change in the environment that is targeted
– How radical the scheme is w.r.t. the System R scheme
• Ease of evolutionary change
– Increase in plan space: are there new, important opportunities?
• Even if the environment is ostensibly static!
– New overheads introduced
– How amenable the scheme is to independent innovation at each step
• Measure/Analyze/Plan/Actuate
Tangentially Related Work
An incomplete list!!!
Competitive Optimization [Antoshenkov93]
– Choose multiple plans, run in parallel for a time, let the most promising finish
• 1x feedback: execution doesn’t affect planning after the competition
Parametric Query Optimization [INSS92, CG94, etc.]
– Given partial stats in advance, do some planning and prune the space. At runtime, given the rest of the statistics, quickly finish planning.
• Changes the interaction of Measure/Model and Planning
• No feedback whatsoever, so nothing to adapt to!
“Self-Tuning”/“Autonomic” Optimizers [CR94, CN97, BC02, etc.]
– Measure query execution (e.g. cardinalities, etc.)
• Enhances measurement; on its own doesn’t change the loop
– Consider building non-existent physical access paths (e.g. indexes, partitions)
• In some senses a separate loop: adaptive database design
• Longer timescales
Tangentially Related Work II
Robust Query Optimization [CHG02, MRS+04, BC05, etc.]
– Goals:
• Pick plans that remain predictable across wide ranges of scenarios
• Pick the least expected cost plan
– Changes the cost function for planning, not necessarily the loop.
• If such functions are used in adaptive schemes, less fluctuation [MRS+04]
– Hence fewer adaptations, less adaptation overhead
Adaptive query operators [NKT88, KNT89, PCL93a, PCL93b]
– E.g. memory-adaptive sort and hash-join
– Doesn’t address whole-query optimization problems
– However, if used with AQP, can result in complex feedback loops
• Especially if their actions affect each other’s models!
Extended Topics in Adaptive QP
An incomplete list!!
Parallelism & Distribution
– River [A-D03]
– FLuX [SHCF03, SHB04]
– Distributed eddies [TD03]
Data Streams
– Adaptive load shedding
– Shared query processing
Adaptive Selection Ordering
Selection Ordering
Complex predicates on relations are common
– E.g., on an employee relation:
((salary > 120000) AND (status = 2)) OR
((salary between 90000 and 120000) AND (age < 30) AND (status = 1)) OR …
Selection ordering problem: decide the order in which to evaluate the individual predicates against the tuples
We focus on evaluating conjunctive predicates (containing only ANDs)
Example query:
select * from R where R.a = 10 and R.b < 20 and R.c like ‘%name%’;
Why Study Selection Ordering
Many join queries reduce to this problem
– Queries posed against a star schema
– Queries where only pipelined left-deep plans are considered
– Queries involving web indexes
Increasing interest in recent years
– Web indexes [CDY’95, EHJKMW’96, GW’00]
– Web services [SMWM’06]
– Data streams [AH’00, BMMNW’04]
– Sensor networks [DGMH’05]
Similar to many problems in other domains
– Sequential testing (e.g. for fault detection) [SF’01, K’01]
– Learning with attribute costs [KKM’05]
Why Study Selection Ordering
Simpler to understand and analyze
– Many fundamental AQP ideas can be demonstrated with these queries
– Very good analytical and theoretical results known
• No analogues for general multi-way joins
Big differences to look out for
– These queries are stateless; queries involving joins are not
• No burden of routing history
– Selections are typically very inexpensive
• The costs of AQP techniques become important
Execution Strategies
Pipelined execution (tuple-at-a-time):
For each tuple r ∈ R:
apply predicate R.a = 10 first;
if the tuple satisfies the selection, apply R.b < 20;
if both are satisfied, apply R.c like ‘%name%’
Operator-at-a-time execution:
Apply predicate R.a = 10 to all tuples of R; materialize the result as R1.
Apply predicate R.b < 20 to all tuples of R1; materialize the result as R2. …
Execution Strategies (contd.)
The two strategies are fundamentally different from an adaptivity perspective.
Pipelined execution is preferred for selection ordering.
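The two strategies can be sketched in a few lines of Python, using the running example’s predicates (relation R modeled as a list of dicts; all names are illustrative):

```python
# Predicates of the running example: R.a = 10 AND R.b < 20 AND R.c LIKE '%name%'
preds = [
    lambda r: r["a"] == 10,
    lambda r: r["b"] < 20,
    lambda r: "name" in r["c"],
]

def pipelined(R, preds):
    """Tuple-at-a-time: each tuple flows through all predicates in order."""
    out = []
    for r in R:
        if all(p(r) for p in preds):   # short-circuits on the first failure
            out.append(r)
    return out

def operator_at_a_time(R, preds):
    """Apply each predicate to the whole input, materializing intermediates."""
    temp = R
    for p in preds:
        temp = [r for r in temp if p(r)]   # the materialized R1, R2, ...
    return temp

R = [{"a": 10, "b": 15, "c": "AnameA"}, {"a": 15, "b": 10, "c": "x"}]
assert pipelined(R, preds) == operator_at_a_time(R, preds)
```

Both produce the same result; they differ in when intermediate results exist, which is exactly what matters for adaptivity.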
Outline
20th Century Adaptivity: Intuition from the Classical Systems
Adaptive Selection Ordering
– Setting and motivation
– Four Approaches
• Static Selinger-style optimization
– KBZ Algorithm for independent selections [KBZ’86]
– A 4-approx greedy algorithm for correlated selections [BMMNW’04]
• Mid-query reoptimization [KD’98]
– Adapted to handle selection ordering
• A-Greedy [BMMNW’04]
• Eddies [AH’00]
• Other related work
Adaptive Join Processing
Research Roundup
Static Selinger-style Optimization
Find a single order of the selections to be used for all tuples
Query:
select * from R where R.a = 10 and R.b < 20 and R.c like ‘%name%’;
Query plans considered: pipelines of the three selections in some order; 3! = 6 distinct plans possible
Static Selinger-style Optimization
Cost metric: CPU instructions
Computing the cost of a plan
– Need to know the costs and the selectivities of the predicates
For the pipeline R → (R.a = 10) → (R.b < 20) → (R.c like …) → result, with per-tuple costs c1, c2, c3 and selectivities s1, s2, s3:
cost per tuple = c1 + s1 c2 + s1 s2 c3 (independence assumption)
cost(plan) = |R| * (c1 + s1 * c2 + s1 * s2 * c3)
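The cost formula above generalizes to any number of predicates; a minimal sketch (the numbers below are illustrative):

```python
def plan_cost(n_tuples, costs, sels):
    """Expected cost of applying predicates in the given order,
    under the independence assumption:
    cost(plan) = |R| * (c1 + s1*c2 + s1*s2*c3 + ...)."""
    per_tuple = 0.0
    passed = 1.0              # fraction of tuples reaching the next predicate
    for c, s in zip(costs, sels):
        per_tuple += passed * c
        passed *= s
    return n_tuples * per_tuple

# |R| = 1000, costs c1..c3 = 1 each, selectivities s1..s3:
print(plan_cost(1000, [1, 1, 1], [0.05, 0.1, 0.2]))  # ~ 1000 * 1.055
```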
Static Selinger-style Optimization
Dynamic programming algorithm: compute the optimal order and cost for
– 1-subsets of predicates: R.a = 10; R.b < 20; R.c like …
• using 1-d histograms or random samples, etc.
– 2-subsets of predicates: R.a = 10 AND R.b < 20; R.a = 10 AND R.c like …; …
• using 2-d histograms or random samples, or by assuming independence
– 3-subsets of predicates: R.a = 10 AND R.b < 20 AND R.c like …
Complexity: O(2^n)
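A sketch of the subset dynamic program, assuming independence when combining selectivities (a real optimizer would consult multi-dimensional histograms or samples instead):

```python
from itertools import combinations

def dp_order(costs, sels):
    """best[S] holds (cost-per-tuple, order) for the optimal ordering of
    predicate subset S. Recurrence: placing predicate i first costs
    c_i + s_i * cost(best order of S - {i})."""
    n = len(costs)
    best = {frozenset(): (0.0, [])}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            S = frozenset(subset)
            cands = []
            for i in S:
                rest_cost, rest_order = best[S - {i}]
                cands.append((costs[i] + sels[i] * rest_cost, [i] + rest_order))
            best[S] = min(cands)
    return best[frozenset(range(n))]

cost, order = dp_order([1, 1, 1], [0.05, 0.1, 0.2])
print(order)  # with equal costs, ascending selectivity wins: [0, 1, 2]
```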
Static Selinger-style Optimization
KBZ algorithm for independent selections [KBZ’86]
– Apply the predicates in decreasing order of: (1 – s) / c, where s = selectivity, c = cost
Correlated selections
– NP-hard under several different formulations
• E.g., when given a random sample of the relation
– Greedy algorithm:
• Apply the selection with the highest (1 – s)/c
• Compute the selectivities of the remaining selections over the result
– Conditional selectivities
• Repeat
– Can be shown to be 4-approximate [BMMNW’04]
• Best possible unless P = NP
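Both orderings can be sketched as follows. The values are illustrative, and `cond_sel` is a hypothetical callback standing in for sampling or histogram lookups of conditional selectivities:

```python
def kbz_order(costs, sels):
    """Independent selections: sort by decreasing rank (1 - s) / c."""
    rank = lambda i: (1 - sels[i]) / costs[i]
    return sorted(range(len(costs)), key=rank, reverse=True)

def greedy_order(costs, cond_sel, n):
    """Correlated selections: repeatedly pick the best predicate given
    those already applied (4-approximate per [BMMNW'04]).
    cond_sel(i, applied) = sel(pred i | preds in `applied` passed)."""
    applied, order = [], []
    remaining = set(range(n))
    while remaining:
        best = max(remaining,
                   key=lambda i: (1 - cond_sel(i, applied)) / costs[i])
        order.append(best)
        applied.append(best)
        remaining.remove(best)
    return order

print(kbz_order([1, 1, 1], [0.05, 0.1, 0.2]))  # [0, 1, 2]
```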
Static Selinger-Style
Measure/Model → Plan → Actuate
– Measure/Model: When? Daily/Weekly. Features? Selectivities. Model? Histograms, random samples, etc. Overhead? Offline tablescans.
– Plan: When? Query compilation. Plan space? Linear orderings of the selections. Overhead? Dynamic programming or simple sorting.
– Actuate: When? Query runtime. What can be changed? Selection ordering used. Overhead? None.
Mid-query Reoptimization
At materialization points, re-evaluate the rest of the query plan
Example:
– Initial query plan chosen: R → (R.a = 10) → Materialize R1 → (R.b < 20) → (R.c like …) → result
– Estimated selectivities: 0.05, 0.1, 0.2
– Materialization is a free opportunity to re-evaluate the rest of the query plan; exploit it by gathering information about the materialized result (e.g., build 1-d histograms while materializing R1)
– Re-estimated selectivities of the remaining predicates: 0.5, 0.01
– Significantly different: the original plan is probably sub-optimal, so reoptimize the remaining part of the query over R1
Mid-query Reoptimization
Explored plan space identical to static
– The operators are applied to the tuples in the same order
• The order is determined lazily
– The specific approach is equivalent to the 4-Approx Greedy algorithm
Cost of adaptivity:
– Materialization cost
• Many (join) query plans typically have materialization points
• May want to introduce materialization points if there is high uncertainty
– Constructing statistics on intermediate results
• Depends on the statistics maintained
– Re-optimization cost
• Optimizer should be re-invoked only if the estimates are significantly wrong
Mid-query Reoptimization
Advantages:
– Easy to implement in a traditional query processing system
– Familiar plan space; easy to optimize and understand what’s going on
• Operator-at-a-time query processing
Disadvantages:
– Granularity of adaptivity is coarse
• Once an operator starts executing, can’t change that decision
– Explored plan space identical to static optimization
• Can’t apply different orders to different sets of tuples
– Requires materialization
• Cost of materialization can be high
• Ill-suited for data streams and similar environments
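The reoptimization decision at a materialization point might be sketched like this. The 2x divergence threshold and the rank-based replanner are illustrative choices, not [KD’98]’s exact policy:

```python
def should_reoptimize(estimated, observed, threshold=2.0):
    """True if any observed selectivity is off by more than `threshold`x
    from the optimizer's estimate (re-invoke only then)."""
    return any(obs / est > threshold or est / obs > threshold
               for est, obs in zip(estimated, observed)
               if est > 0 and obs > 0)

def reoptimize_rest(costs, observed_sels):
    """Reorder the remaining selections by descending rank (1 - s) / c."""
    idx = range(len(costs))
    return sorted(idx,
                  key=lambda i: (1 - observed_sels[i]) / costs[i],
                  reverse=True)

estimated = [0.1, 0.2]   # planner's guesses for the remaining predicates
observed = [0.5, 0.01]   # measured on the materialized result
if should_reoptimize(estimated, observed):
    print(reoptimize_rest([1, 1], observed))  # more selective predicate first
```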
Mid-query Adaptivity
Measure/Model → Plan → Actuate
– Measure/Model: When? At each materialization point. Features? Selectivities of remaining predicates. Model? Histograms, etc. Overhead? Statistics collection, materialization cost.
– Plan: When? At each materialization point. Plan space? Linear orderings of the selections and materialization points. Overhead? Dynamic programming or simple sorting.
– Actuate: When? After planning. What can be changed? Selection ordering used. Overhead? Minimal.
Adaptive Greedy [BMMNW’04]
Context: pipelined query plans over streaming data
Example: three independent predicates R.a = 10, R.b < 20, R.c like …
– Initial estimated selectivities: 0.05, 0.1, 0.2
– Costs: 1 unit each
Optimal execution plan orders by selectivities (because costs are identical):
R → (R.a = 10) → (R.b < 20) → (R.c like …) → result
Adaptive Greedy [BMMNW’04]
Monitor the selectivities; switch the order if the predicates are not ordered by selectivities
– Randomly sample R; estimate the selectivities of the predicates over the sampled tuples (the Profile)
– Reoptimizer: IF the current plan is not optimal w.r.t. these new selectivities, THEN reoptimize using the Profile
Adaptive Greedy [BMMNW’04]
Correlated selections
– Must monitor conditional selectivities
• Monitor selectivities sel(R.a = 10), sel(R.b < 20), sel(R.c …)
• Monitor conditional selectivities:
sel(R.b < 20 | R.a = 10)
sel(R.c like … | R.a = 10)
sel(R.c like … | R.a = 10 and R.b < 20)
– Reoptimizer uses the conditional selectivities to detect violations, and the profile to reoptimize
– O(n^2) selectivities need to be monitored
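A-Greedy’s profile maintenance can be sketched as follows: a sampled subset of tuples is evaluated against all predicates, so conditional selectivities can be estimated later. This is an illustrative structure, not the paper’s exact sufficient statistic:

```python
import random

class Profile:
    def __init__(self, preds, sample_rate=0.01):
        self.preds = preds
        self.sample_rate = sample_rate
        self.rows = []            # one bitvector per sampled tuple:
                                  # the outcome of every predicate on it

    def maybe_sample(self, tup):
        """Evaluate a random fraction of tuples against ALL predicates."""
        if random.random() < self.sample_rate:
            self.rows.append([p(tup) for p in self.preds])

    def cond_selectivity(self, i, given):
        """Estimate sel(pred i | all preds in `given` passed)."""
        matching = [r for r in self.rows if all(r[j] for j in given)]
        if not matching:
            return 1.0            # no evidence; assume non-selective
        return sum(r[i] for r in matching) / len(matching)
```

The reoptimizer periodically re-runs the greedy algorithm over these profile rows instead of re-scanning the stream.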
Adaptive Greedy [BMMNW’04]
Cost of adaptivity:
– Profile maintenance
• Must evaluate a (random) fraction of tuples against all operators
– Detecting violations
• Periodic checks for detecting if the current order is optimal
• Doing this per tuple is too expensive
– Reoptimization cost
• Can require multiple passes over the profile
Adaptive Greedy: Post-Mortem
Plan space explored
– “Horizontal partitioning” by order of arrival
If the selectivities are correlated with tuple arrival order, this can lead to huge savings
[Figure: tuples before the plan-switch point flow through (R.a = 10, R.b < 20, R.c like …); tuples after it through (R.b < 20, R.a = 10, R.c like …)]
Adaptive Greedy [BMMNW’04]
Advantages:
– Can adapt very rapidly
– Theoretical guarantees on performance
• Not known for any other AQP protocols
Disadvantages:
– Limited applicability
• Only applies to selection ordering and specific types of join queries
– Possibly high runtime overheads
• Several heuristics described in the paper
A-Greedy Adaptivity
Measure/Model → Plan → Actuate
– Measure/Model: When? Periodically during execution. Features? Conditional selectivities. Model? Random sample, a specific sufficient statistic. Overhead? Statistics maintenance, suboptimality detection.
– Plan: When? Current plan suboptimal on last data window. Plan space? Linear orderings of the selections. Overhead? Running the greedy algorithm over profile tuples.
– Actuate: When? After planning. What can be changed? Selection ordering used for remaining tuples. Overhead? Minimal.
Eddies [AH’00]
Treat query processing as routing of tuples through operators
A traditional pipelined query plan:
R → (R.a = 10) → (R.b < 20) → (R.c like …) → result
Pipelined query execution using an eddy:
An eddy operator
• Intercepts tuples from sources and output tuples from operators
• Executes the query by routing source tuples through the operators
Encapsulates the full adaptivity loop in a “standard” dataflow operator: measure, model, plan and actuate.
Eddies [AH’00]
An R tuple r1: (a=15, b=10, c=“AnameA”, …), initially with ready = 111, done = 000
– ready bit i: 1 means operator i can be applied; 0 means operator i can’t be applied
– done bit i: 1 means operator i has been applied; 0 means operator i hasn’t been applied
– The bits are used to decide the validity and need of applying operators
– For a query with only selections, ready = complement(done)
Example: the eddy routes r1 to an operator; if the predicate is satisfied, the corresponding bits are updated (e.g., ready = 101, done = 010) and r1 returns to the eddy for further routing; if the predicate is not satisfied, r1 is discarded and the eddy looks at the next tuple
An R tuple r2: (a=10, b=15, c=“AnameA”, …) satisfies all three predicates; once done = 111, the eddy sends r2 to the output
Eddies [AH’00]
Adapting the order is easy
– Just change the operators to which tuples are sent
– Can be done on a per-tuple basis
– Can be done in the middle of a tuple’s “pipeline”
How are the routing decisions made?
– Using a routing policy
Routing Policy 1: Non-adaptive
Simulating a single static order
– E.g., operator 1, then operator 2, then operator 3
Routing policy: if done =
000, route to 1
100, route to 2
110, route to 3
Table lookups: very efficient
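Routing Policy 1 can be sketched as a tiny eddy loop over done-bitsets (illustrative; a real eddy interleaves many tuples in flight rather than finishing one at a time):

```python
# Predicates of the running example; bit i of `done` tracks operator i+1.
preds = [
    lambda r: r["a"] == 10,       # operator 1
    lambda r: r["b"] < 20,        # operator 2
    lambda r: "name" in r["c"],   # operator 3
]

# done bitset -> next operator index (the static-order lookup table)
ROUTE = {"000": 0, "100": 1, "110": 2}

def eddy(R):
    out = []
    for r in R:
        done = ["0"] * 3
        alive = True
        while alive and "".join(done) != "111":
            i = ROUTE["".join(done)]      # table lookup: very efficient
            alive = preds[i](r)           # "route" r to operator i
            done[i] = "1"
        if alive:
            out.append(r)                 # done = 111: send to output
    return out

# Only the first tuple satisfies all three predicates:
print(eddy([{"a": 10, "b": 15, "c": "AnameA"}, {"a": 15, "b": 10, "c": "x"}]))
```

Swapping ROUTE for a different table (or a function of statistics) is all it takes to change the policy, which is the point of the eddy architecture.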
Overhead of Routing
PostgreSQL implementation of eddies using bitset lookups [Telegraph Project]
Queries with 3 selections, of varying cost
– Routing policy uses a single static order, i.e., no adaptation
[Figure: normalized cost vs. selection cost (0, 10, 100 μsec) for no-eddies and eddies]
Routing Policy 2: Deterministic
Monitor costs and selectivities continuously; reoptimize periodically using KBZ
Statistics maintained: costs and selectivities of the operators
Routing policy: use a single order for a batch of tuples; periodically apply KBZ
Can use the A-Greedy policy for correlated predicates
Overhead of Routing and Reoptimization
Adaptation using batching
– Reoptimized every X tuples using monitored selectivities
– Identical selectivities throughout the experiment, so this measures only the overhead
[Figure: normalized cost vs. selection cost (0, 10, 100 μsec) for no-eddies, eddies with no reoptimization, and eddies with batch sizes of 100 tuples and 1 tuple]
Routing Policy 3: Lottery Scheduling
Originally suggested routing policy [AH’00]
Applicable when each operator runs in a separate “thread”
– Can also be done single-threaded, via an event-driven query executor
Uses two easily obtainable pieces of information for making routing decisions:
– Busy/idle status of operators
– Tickets per operator
Routing Policy 3: Lottery Scheduling
Routing decisions based on busy/idle status of operators
Rule: IF an operator is busy, THEN do not route more tuples to it
Rationale: every thread gets equal time, SO IF an operator is busy, THEN its cost is perhaps very high
[Figure: eddy with operator 1 busy and operators 2, 3 idle]
Routing Policy 3: Lottery Scheduling
Routing decisions based on tickets
Rules:
1. Route a new tuple randomly, weighted according to the number of tickets
2. On routing a tuple to an operator Oi: tickets(Oi)++
3. When Oi returns a tuple to the eddy: tickets(Oi)--
Example: with tickets(O1) = 10, tickets(O2) = 70, tickets(O3) = 20, a new tuple r is routed to O1 w.p. 0.1, O2 w.p. 0.7, O3 w.p. 0.2. Suppose r goes to O1: tickets(O1) becomes 11; when O1 returns r to the eddy, tickets(O1) drops back to 10, and r is next routed to O2 w.p. 0.777 or O3 w.p. 0.222 (O1 has already been applied).
Rationale: tickets(Oi) roughly corresponds to (1 – selectivity(Oi)), so more tuples are routed to the more selective operators
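The three ticket rules can be sketched as follows (illustrative and single-threaded; a real eddy runs operators asynchronously). Operators that rarely return tuples, i.e. selective ones, accumulate tickets and win more lotteries:

```python
import random

def route(tickets, eligible):
    """Rule 1: pick an eligible operator, weighted by ticket counts."""
    ops = list(eligible)
    weights = [tickets[i] for i in ops]
    return random.choices(ops, weights=weights)[0]

def run_tuple(tup, preds, tickets):
    remaining = set(range(len(preds)))
    while remaining:
        i = route(tickets, remaining)
        tickets[i] += 1               # rule 2: ++ on routing to Oi
        if not preds[i](tup):
            return False              # tuple dropped; Oi keeps its ticket
        tickets[i] -= 1               # rule 3: -- when Oi returns the tuple
        remaining.remove(i)
    return True                       # all predicates satisfied

tickets = [10, 70, 20]
preds = [lambda r: r > 0, lambda r: r < 100, lambda r: r % 2 == 0]
print(run_tuple(42, preds, tickets))  # True; all tickets end up unchanged
```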
Routing Policy 3: Lottery Scheduling
Effect of the combined lottery scheduling policy:
– Low-cost operators get more tuples
– Highly selective operators get more tuples
– Some tuples are randomly, knowingly routed according to sub-optimal orders
• To explore
• Necessary to detect selectivity changes over time
Routing Policy 4: Content-based Routing
Routing decisions made based on the values of the attributes [BBDW’05]
Also called “conditional planning” in a static setting [DGHM’05]
Less useful unless the predicates are expensive
– At the least, more expensive than evaluating r.d > 100
Example: the eddy notices that
R.d > 100 implies sel(op1) > sel(op2), and R.d < 100 implies sel(op1) < sel(op2)
Routing decisions for a new tuple r:
IF (r.d > 100): route to op1 first w.h.p.; ELSE route to op2 first w.h.p.
(op1 and op2 are expensive predicates)
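A minimal sketch of content-based routing; the learned rule and the "with high probability" coin are illustrative stand-ins for what an eddy would gather online:

```python
import random

def first_operator(r, p_follow=0.9):
    """Route r to the operator believed better for its content, with
    high probability; otherwise explore the other order."""
    preferred = "op1" if r["d"] > 100 else "op2"
    other = "op2" if preferred == "op1" else "op1"
    return preferred if random.random() < p_follow else other

print(first_operator({"d": 250}, p_follow=1.0))  # op1
```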
Eddies: Post-Mortem
Plan space explored
– Allows arbitrary “horizontal partitioning”
– Not necessarily correlated with order of arrival (unlike A-Greedy)
• E.g., with the lottery scheduling policy, content-based routing
[Figure: interleaved tuples, by order of arrival, following either (R.a = 10, R.b < 20, R.c like …) or (R.b < 20, R.a = 10, R.c like …)]
Eddies: Post-Mortem
Cost of adaptivity
– Routing overheads
• Minimal with careful engineering
– E.g., using bitset-indexed routing arrays
• “Batching” helps tremendously
– Statistics maintenance
– Executing the routing policy logic
Lottery Scheduling
Measure/Model → Plan → Actuate
– Measure/Model: When? Continuously during execution. Features? Operator status (busy/idle), tickets per operator. Model? Random samples (in essence). Overhead? Statistics collection, cost of exploration.
– Plan: When? Continuously during execution. Plan space? Linear orderings of the selections. Overhead? None.
– Actuate: When? Continuously during execution. What can be changed? Selection ordering used. Overhead? Routing overhead.
Related Problems
Multi-way join queries
– Some classes of queries reduce to selection ordering with minor differences
– Many classes of execution plans are equivalent to selection ordering for adaptivity purposes
• E.g., plans with only index joins, pipelined left-deep plans
– Will study next
Queries with parallel or distributed selections
– E.g., parallel multi-processor systems [CDHW’06], queries over web data sources [EHJ+’96, SMWM’06]
– Main differences
• The optimization metric is throughput
– Requires use of aggressive horizontal partitioning
• Precedence constraints between the selections
Many open problems
Recap
Studied four query processing techniques for selection ordering:
Static Selinger-Style Optimization → Mid-query Reoptimization → A-Greedy → Eddies
– Left to right: increasing adaptivity, increasing explored plan space
– A-Greedy and Eddies exploit horizontal partitioning
Discussion
Benefits of AQP techniques come from two places
– Increased explored plan space
• Can use different plans for different parts of the data
– Adaptation
• Can change the plan according to changing data characteristics
Selection ordering is STATELESS
– No inherent “cost of switching plans”
• Can switch the plan without worrying about operator state
– Key to the simplicity of the techniques
Discussion
Adaptation is not free– Costs of monitoring and maintaining statistics can be very high
• A selection operation may take only 1 instruction to execute• Comparable to updating a count
– “Sufficient statistics”• May need to maintain only a small set of statistics to detect
violations • E.g. the O(n²) matrix in Adaptive-Greedy [BMMNW'04]
Adaptive Join Processing
Outline
20th Century Adaptivity: Intuition from the Classical Systems
Adaptive Selection Ordering
Adaptive Join Processing– Additional complexities beyond selection ordering– Four plan spaces
• Simplest: pipelines of Nested Loop Joins• Traditional: Trees of Binary Operators (TOBO)• Multi-TOBO: horizontal partitioning• Dataflows of unary operators
– Handling asynchrony Research Roundup
Select-Project-Join Processing Query: select count(*) from R, S, T
where R.a=S.a and S.b=T.b and S.c like ‘%name%’ and T.d = 10
An execution plan
Cost metric: CPU + I/O Plan Space:
– Traditionally, tree of binary join operators (TOBO):• access methods• Join algorithms• Join order
– Adaptive systems: • Some use the same plan space, but switch between plans during execution• Others use much larger plan spaces
– different adaptation techniques adapt within different plan spaces
S
T
R
SMJ
NLJ
Differences from Adaptive Selections
Actuate  Plan
Measure/Model
When?
What Features? Join fanouts
When? More sparingly
When? At points where it is easy to switch plans -- complicated by stateful operators
Model?
Plan Space? Graphs of stateful, multi-ary operators
What can be changed? Join order, access methods, join algorithms, …
Overhead? Same as before
Overhead? Optimization cost
Overhead? Switching costs
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Pipelined Nested Loops Join
Simplest method of joining tables– Pick a driver table (R). Call the rest driven tables– Pick access methods (AMs) on the driven tables– Order the driven tables– Flow R tuples through the driven tables
For each r in R do:
    look for matches for r in A;
    for each match a do:
        look for matches for <r,a> in B;
        …
RB
NLJ
C
NLJ
A
NLJ
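The driver/driven pseudocode above amounts to a chain of dependent lookups; a minimal Python sketch (the driver list and lookup functions are hypothetical stand-ins for tables R, A, B, …):

```python
def pipelined_nlj(driver, driven):
    """driver: iterable of values from the driver table R.
    driven: list of lookup functions, one per driven table in pipeline order;
    each maps the partial result tuple to an iterable of matches."""
    def extend(tup, i):
        if i == len(driven):
            yield tup          # all driven tables matched: emit result
            return
        for match in driven[i](tup):
            yield from extend(tup + (match,), i + 1)
    for r in driver:
        yield from extend((r,), 0)

# Hypothetical tables: A keyed on R's value, B keyed on A's value
R = [1, 2, 3]
A = {1: ['a1'], 2: ['a2a', 'a2b'], 3: []}
B = {'a1': ['b1'], 'a2a': ['b2'], 'a2b': []}
out = list(pipelined_nlj(R, [lambda t: A[t[0]], lambda t: B[t[-1]]]))
# out: [(1, 'a1', 'b1'), (2, 'a2a', 'b2')]
```

Tuples flow through the driven tables one at a time, so changing the order of `driven` between tuples is exactly the adaptation knob discussed on the next slides.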
Adapting a Pipelined Nested Loops Join
Simplest method of joining tables– Pick a driver table (R). Call the rest driven tables– Pick access methods (AMs) on the driven tables– Order the driven tables– Flow R tuples through the driven tables
For each r in R do:
    look for matches for r in A;
    for each match a do:
        look for matches for <r,a> in B;
        …
RB
NLJ
C
NLJ
A
NLJ
Keep this fixed for now
“competition”
Almost identical to selection ordering
Ordering the driven tables
Let ci = cost/lookup into i’th driven table, si = fanout of the lookup
As with selection, cost = |R| x (c1 + s1c2 + s1s2c3) Only difference from selection ordering
– Fanouts s1,s2,… can be > 1– So the cost formula, and rank ordering, don’t quite work (why?)
Otherwise, both static optimization and adaptive techniques are exactly as for selection ordering– rank ordering, eddies, etc. are all applicable
RB
NLJ
C
NLJ
A
NLJ
RC
NLJ
B
NLJ
A
NLJ
(c1, s1) (c2, s2) (c3, s3)
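The cost formula above, cost = |R| × (c1 + s1·c2 + s1·s2·c3), can be sketched directly; since fanouts can exceed 1, the rank-ordering shortcut is not guaranteed to find the best order, so this illustrative sketch falls back to enumerating all orders:

```python
from itertools import permutations

def pipeline_cost(n_outer, stages):
    """stages: list of (cost_per_lookup c_i, fanout s_i) in pipeline order.
    Implements cost = |R| * (c1 + s1*c2 + s1*s2*c3 + ...)."""
    total, mult = 0.0, 1.0
    for c, s in stages:
        total += mult * c   # each stage is reached mult = s1*...*s_{i-1} times
        mult *= s
    return n_outer * total

def best_order(n_outer, stages):
    """Brute-force search over driven-table orders (fine for small n)."""
    return min(permutations(stages), key=lambda p: pipeline_cost(n_outer, p))

# Example: two driven tables, same lookup cost, different fanouts
cheap_first = best_order(10, [(1, 2), (1, 0.5)])
# the low-fanout table goes first: ((1, 0.5), (1, 2))
```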
ASIDE: Cache effects in Join Processing
cached lookups are much faster than others– especially important when fanout>1:
for each tuple of R, all but 1 of the s1 rows in RxA will hit in the cache of B
– Say each lookup costs 1 (non-cached) and 0 (cached); c1 = c2 = 1– Say s2 = 2– Say s1 = 0 for 50% of the tuples; s1 = 7 for 50% of the tuples
=> on average s1 = 3.5 Rank ordering suggests (RBA) (same cost, B has lower fanout) BUT, cost(RBA) = ( 1 + 1*1) = 2 |R|
cost(RAB) = ( 1 + 0.5*1) = 1.5 |R|– fanout and cost are both random variables– fanout and cost are correlated– so E(cs) ≠ E(c)E(s)
RB
NLJ
A
NLJ
(c1, s1) (c2, s2)
Cache is an ugly detail in join processing. Not well studied.
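The arithmetic in the example above can be checked with a tiny sketch; the numbers (c1 = c2 = 1 non-cached, 0 cached; s2 = 2; s1 = 0 or 7 with equal probability) and the caching model (all but one probe in a matching group hits the cache) are the slide's assumptions:

```python
def avg_cost_RAB():
    # Per R tuple: one probe of A (cost 1), then at most one non-cached
    # probe of B, since all but one of the s1 matches hit B's cache.
    costs = [1 + (1 if s1 > 0 else 0) for s1 in (0, 7)]  # the two 50% cases
    return sum(costs) / len(costs)                       # 1.5 per R tuple

def avg_cost_RBA():
    # Per R tuple: one probe of B (cost 1), then one non-cached probe of A
    # out of the s2 = 2 matches (the other hits A's cache).
    return 1 + 1                                         # 2 per R tuple
```

Even though B has the lower average fanout (2 vs. 3.5), probing A first is cheaper, because A's cost and fanout are correlated through the cache.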
Picking access methods (AM) on each table
Multiple AMs / table => Combinatorial explosion of possible plans– choice depends on selectivity, clustering, memory availability, …– correlations
Static optimization: explore all possibilities and pick best Adaptive: Run multiple plans in parallel for a while,
and then pick one and discard the rest (DEC RDB 96)– Cannot easily explore combinatorial options– Problem of duplicates: will return to this later
RB
NLJ
C
NLJ
A
NLJ
RA
NLJ
C
NLJ
B
Pipelined NLJ Adaptivity
Actuate  Plan
Measure/Model
When? On each tuple
When? On each tuple
When? On each tuple
What Features? Fanouts for each table; costs for each AM
Plan Space? Ordering of driven tables, choice of AMs
What can be changed? Ordering of driven tables, choice of AMs
Overhead? Statistics collection, cost of exploration
Overhead? None
Overhead? Switching AM can lead to duplicates
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Tree Of Binary Join Operators (TOBO) recap
Standard plan space considered by most DBMSs today– Pick access methods, join order, join algorithms– search in this plan space done by dynamic programming
A
B
R
MJ
NLJ
NLJ
DC
HJ
Switching plans during query execution
Monitor cardinalities at well-defined checkpoints
If actual cardinality is too different from estimated, re-optimize to switch to a new plan
Challenges:
– Provide plentiful opportunities for switching plans
– Avoid unnecessary plan switching (where the plan doesn't change)
– Avoid loss of work during plan switching
Most widely studied technique: Federated systems (InterViso 90, MOOD 96), Red Brick, Query scrambling (96), Mid-query re-optimization (98), Progressive Optimization (04), Proactive Reoptimization (05), …
Where?
How?
Threshold?
A
C
B
R
MJ
NLJ
MJ
B
C
HJ
MJ
sort
sort
Challenges
Where to place Checkpoints? (1)
Lazy checkpoints: placed above materialization points – No work need be wasted if we switch plans here
Eager checkpoints: can be placed anywhere– May have to discard some partially computed results– useful where optimizer estimates have high uncertainty
A
C
B
R
MJ
NLJ
MJ
sort
Lots of checkpoints => lots of opportunities for switching plans– Overhead of monitoring is small [LEO 01]
BUT, it is easier to switch plans at some checkpoints than others
sort
Lazy
Eager
Where to place checkpoints? (2) Lazy checkpoints depend on materialization
Materializations are quite common – Sorts– Inner of hash joins– Explicit materializations operators (e.g., for common sub-expressions, …)
Forced materialization– E.g., outer of nested loops join (NLJ)– Outer cardinality must be low for NLJ to be good
(if it isn’t low, the plan is likely suboptimal anyway)
“Rescheduling” plan operators so that checkpoints are reached early– This allows mis-estimations to be caught early– Especially useful in federated queries
When to re-optimize? Say at a checkpoint actual cardinality is different from estimates:
how high a difference should trigger a re-optimization?
Idea: do not re-optimize if current plan is still the best
Validity range: range of a parameter within which plan is optimal– During execution, re-optimize if value at a checkpoint falls outside this range– Place eager checkpoints wherever the validity range is narrow
ASIDE: This is also related to parametric and robust optimization; quite a few recent papers. E.g., bounding boxes [BBD05] model uncertainty of estimates.
Validity Range Determination
when plans P1 and P2 are considered during optimizer pruning– cost_P1(est_card_outer) < cost_P2(est_card_outer)
– numerically find lower and upper bounds (lb, ub) on card_outer s.t. P1 beats P2
– validity_range_P1 = validity_range_P1 ∩ (lb, ub) This finds the correct validity range w.r.t. plans with the same join order
[Figure: cost vs. cardinality curves for plans P1, P2, P3 (with subplans L1, L2 over P ⋈ Q), showing the validity range of P1]
ASIDE
• Cost functions are rarely well-behaved (see [Haritsa 05])
• Still, numerical techniques work okay in 1 dimension
• Open problems:
  • multi-dimensional cost functions
  • validity ranges w.r.t. plans with different join orders
How to switch to a new plan? (1)
One approach: re-invoke the optimizer
Getting a better plan:
– Plug in actual cardinality information learned during this query,and re-run the optimizer
Reusing work when switching to the better plan:– Treat fully computed intermediate results as materialized views
• Everything that is under a materialization point
– Note: It is optional for the optimizer to use these in the new plan
ASIDE: It is not always easy to exploit cardinalities over intermediate results. E.g., given actual |σ_a∧b(R)| and |σ_b∧c(R)|, what is the best estimate for |σ_a∧b∧c(R)|? Recent work using maximum entropy [MMKTHS 05]
How to switch to a new plan? (2) Other approaches also possible
Query Scrambling [UFA’98]– Reacts to delayed data sources by rescheduling operators in the plan
Proactive reoptimization [BBD’05]– Chooses plans that are easy to switch from, during optimization
Post-Mortem
Switching plans during execution has been the most popular adaptation method till now– Has proven to be fairly easy to add to existing query processors
Works well for plans with lots of dams Where it doesn’t work well
– Pipelined plans– Databases with endemic correlation
Open issues– Switching between parallel query plans; need global barriers
Adaptivity in Switching Plans during Execution
Actuate  Plan
Measure/Model
When? At checkpoints
When? At failed checkpoints
When? At failed checkpoints
What Features? Actual cardinalities
Model? Usual optimizer statistics
Plan Space? TOBO
What can be changed? Entire plan
Overhead? Monitoring overhead
Overhead? Re-optimization cost
Overhead? Lose all the work that is not captured in mat. views or is not reused
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Convergent Query Processing [Ives02] Switch plans, but use
pipelined query operators
Measure cardinalities at runtime and compare to optimizer estimates
Replan when different– Can be done mid-pipeline
Execute new plan starting on new data inputs
End of query: cleanup phase computes cross-phase results
CQP Discussion Direct connection to horizontal partitioning
– Clean algebraic interpretation Easy to extend to more complex queries
– Aggregation, grouping, subqueries, etc.
Cross-phase results postponed to final phase– Despite all the data having arrived
Adaptivity in CQP
Actuate  Plan
Measure/Model
When? After a tuple is consumed in pipeline
When? Upon drift of execution from estimates
When? Upon generation of new plan
What Features? Actual cardinalities
Model? Usual optimizer statistics
Plan Space? Optimizer's native space (TOBO)
What can be changed? Plan for next phase, eventual state of cleanup phase
Overhead? Monitoring overhead
Overhead? Re-optimization cost
Overhead? Postponement of results to cleanup
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Transition to Eddies with Joins We have just seen one method that adapts, within TOBO
– Switch plans at checkpoints – primarily at materialization points
Now we return to eddies, but this time using them to do joins. – Eddy is a router that sits between operators,
intercepts tuples from sources and output tuples from operators, and – Uses feedback from the operators to route
In plans with join operators, the Eddy can change the plan in a more fine-grained fashion, provided:– Operators give it control in a fine-grained fashion
• Needs pipelined join operator
Example Database
Name Level
Joe Junior
Jen Senior
Name Course
Joe CS1
Jen CS2
Course Instructor
CS2 Smith
select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course
Students Enrolled
Name Level Course
Joe Junior CS1
Jen Senior CS2
Enrolled Courses
Students Enrolled
Courses
Name Level Course Instructor
Jen Senior CS2 Smith
Symmetric Hash Join
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level Course
Jen Senior CS2
Joe Junior CS1
Joe Junior CS2
Students Enrolled
Pipelined hash join [WA’91] Simultaneously builds and probes hash tables on
both sides
Widely used: adaptive qp, stream joins, online aggregation, …
Naïve version degrades to NLJ once memory runs out– Quadratic time complexity– memory needed = sum of inputs
Improved by XJoins [UF 00] Needs more investigation
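The naïve in-memory version described above can be sketched in a few lines of Python (a minimal illustration; memory overflow and XJoin-style spilling are ignored):

```python
from collections import defaultdict

def symmetric_hash_join(stream):
    """stream yields (side, key, tuple) with side in {'R', 'S'}.
    Pipelined: each result is emitted as soon as both inputs have arrived."""
    tables = {'R': defaultdict(list), 'S': defaultdict(list)}
    other = {'R': 'S', 'S': 'R'}
    for side, key, tup in stream:
        tables[side][key].append(tup)              # build into own hash table
        for match in tables[other[side]][key]:     # probe the other table
            yield (tup, match) if side == 'R' else (match, tup)

# The slide's Students (R) / Enrolled (S) example, in arrival order
stream = [('R', 'Joe', ('Joe', 'Junior')),
          ('S', 'Joe', ('Joe', 'CS1')),
          ('S', 'Jen', ('Jen', 'CS2')),
          ('R', 'Jen', ('Jen', 'Senior')),
          ('S', 'Joe', ('Joe', 'CS2'))]
out = list(symmetric_hash_join(stream))
# three results, the first produced as soon as ('Joe', 'CS1') arrives
```

Because each arriving tuple is first built and then probed against the other side, no match is ever missed regardless of arrival order, which is what makes the operator usable under an eddy.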
Query Execution using Eddies
EddySEC
Probe to find matches
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Junior
No matches; Eddy processes the next tuple
Output
Insert with key hash(joe)
Query Execution using Eddies
EddySEC
InsertProbe
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Jr
Jen Sr
Joe CS1
Joe Jr CS1
Output
CS2 Smith
Query Execution using Eddies
EddySEC
Output
Probe
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Jr
Jen Sr
CS2 Smith
Jen CS2
Joe CS1
Joe Jr CS1
Jen CS2
Jen CS2 Smith
Probe
Jen Sr. CS2 Smith
Eddies: Postmortem (1)
• Eddy executes different TOBO plans for different parts of data
Students Enrolled
Output
Courses
E C
S E
Courses Enrolled
Output
Students
E S
C E
Course InstructorCS2 Smith
Course InstructorCS2 Smith
Name CourseJoe CS1
Name LevelJoe Junior
Jen Senior
Name LevelJoe Junior
Jen Senior
Name CourseJen CS2
Eddies: Postmortem (2)
• By changing the routing of tuples, Eddy adapts the join order during query execution• Access methods and join algorithms still chosen up front
Students Enrolled
Output
S E
Courses
C E
Course InstructorCS2 Smith
Name CourseJoe CS1
Jen CS2
Name LevelJoe Junior
Jen Senior
Routing Policies routing of tuples determines join order
same policies as described for selections– can simulate static order– lottery scheduling– can tune policy for interactivity metric [RH02]
much more work remains to be done
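One of the policies named above, lottery scheduling, can be sketched as a weighted random pick (a hypothetical minimal version; the ticket accounting itself, crediting an operator when it consumes a tuple and debiting when it returns one, is omitted):

```python
import random

def lottery_route(ops, tickets, rng=random):
    """Pick the next operator for a tuple, weighted by each operator's
    ticket count; operators with more tickets win the lottery more often."""
    total = sum(tickets[o] for o in ops)
    r = rng.uniform(0, total)
    for o in ops:
        r -= tickets[o]
        if r <= 0:
            return o
    return ops[-1]   # guard against floating-point rounding

# An operator holding all the tickets always wins
winner = lottery_route(['sel_a', 'sel_b'], {'sel_a': 1, 'sel_b': 0})
```

Tuning how tickets are granted and decayed is exactly where the interactivity-metric work [RH02] plugs in.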
Summary
Eddies dynamically reorder pipelined operators, on a per-tuple basis– selections, nested loop joins, symmetric hash joins– extends naturally to combinations of the above
This can also be extended to non-pipelined operators (e.g. hybrid hash)
But, eddy can adapt only when it gets control (e.g., after hash table build)– just like with mid-query reoptimization
Next we study two extensions that widen the plan space still further– STAIRs: moving state across join operators– SteMs: unary operators for adapting access methods,
join algorithms dynamically
Adaptivity in Eddies with binary join operators
Actuate  Plan
Measure/Model
When? Continuously during query execution
When? Continuously during query execution
When? Continuously during query execution
What Features? Cardinalities, costs
Model?
Plan Space? Multiple Trees of Binary Join Operators
What can be changed? Operator order
Overhead? Monitoring cost, exploration cost
Overhead? None
Overhead? Routing cost
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Burden of Routing History
EddySEC
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Joe Jr
Jen Sr
Joe CS1
Joe Jr CS1
Output
CS2 Smith
As a result of routing decisions, state got embedded inside the operators
What if these turn out to be wrong ?
Burden of Routing History
EddySEC
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Output
Hypothetical scenario:
1. Tuples from E arrive before either S or C.
2. Eddy routes the E tuples to the S ⋈ E operator
3. Turns out to be a bad idea; not selective
4. NO WAY TO RECOVER !!
Embedded state: the E tuples are already built into the S ⋈ E hash table
STAIRs [DH’04] Observation:
– Changing the operator ordering not sufficient– Must allow manipulation of state
New operator: STAIR– Expose join state to the eddy
• By splitting a join into two halves– Provide state management primitives
• That guarantee correctness of execution• Able to lift the burden of history
– Enable many other adaptation opportunities• E.g. adapting spanning trees, selective caching, pre-computation
STAIRs [DH’04]
EddySEC
S EHashTable
S.NameHashTable
E.Name
E C
HashTableE.Course
HashTableC.Course
Output
ES
STAIRs [DH’04]
EddySEC
Output
HashTable
SS.Name STAIR
HashTable
EE.Name STAIR
HashTable
E.course STAIR
HashTable
C.Course STAIR
E C
Provide state manipulation primitives - For moving state from one STAIR to another - Guaranteed to be correct
Can be used to move E from the E.name STAIR to the E.course STAIR
Prevents S JOIN E
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Granularity of Query Operators
Mostly, we have seen plans that use binary join operators
But a Join operator encapsulates multiple physical operations– Tuple access– Matching tuples on join predicate– Composing matching input tuples– Caching– ...– Access methods and join algorithms embedded inside the join
This encapsulation is in itself a hindrance to adaptation
Will now see another style of join processing– Unary operators and N-ary joins– Varying the tuple routing to the unary operators changes:
• Join algorithm• Access methods used
Index Join
EddyP
cacheP
index
N-way Symmetric Hash Join
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level Course Instructor
Jen Senior CS2 Prof. B
Joe Junior CS1 Prof. A
Joe Junior CS2 Prof. B
Students Enrolled
Pipelined join– XJoin-like disk spilling possible [VNB'03]
Simplest version atomically does one build + (n-1) probes
Breaking this atomicity is key to adaptation
Cour Inst
CS1 P. A
CS2 P. B
Levels
N-way Joins State Modules (SteMs)
Name Level
Jen Senior
Joe Junior
Name Course
Joe CS1
Jen CS2
Joe CS2
select * from students, enrolled where students.name = enrolled.name
Name Level Course Instructor
Jen Senior CS2 Prof. B
Joe Junior CS1 Prof. A
Joe Junior CS2 Prof. B
Students Enrolled
Cour Inst
CS1 P. A
CS2 P. B
Levels
SteM is an abstraction of a unary operator By adapting the routing between SteMs, we can
– Adapt the join ordering (as before)– Adapt access method choices– Adapt Join Algorithms
• Hybridized Join Algos (e.g. hash join index join on memory ov.flow)• Much larger space of join algorithms
– Adapt join spanning trees
Also useful for Sharing State across Joins– Useful for continuous queries [MSHR’02, VNB’03]
State Modules (SteMs)
EddyRS
T
JOINRS
JOINRT
EddyRS
TT
SteMS
SteMT
SteMR
Query Processing with Unary operators (1)
SELECT FROM R,S,TWHERE <conditions>
Pick one access method / tablePick one spanning treePick join algorithms
EddyRS
T
JOINRS
JOINRT
SELECT FROM R,S,TWHERE <conditions>
Pick all access methods (AMs)Create a SteM on each base table(optionally on intermediate tables also)
EddyRS
T
SteMS
SteMT
SteMR
T
Example
Benefit: Exposes internal join operations to Eddy– For monitoring/measurement– For control
Query Processing with Unary operators (2)
R
Eddy
ind.jn
R
Eddy
P PR
Eddy
P
hash jn
+P
State Modules (SteMs): data structures– Dictionary of homogeneous tuples– (insert) build and (search) probe operations– Evictions possible, see paper
Access modules (AMs)– probe with tuples and get matches– scan: probe with empty “seed” tuple
All operators can return results asynchronously, and will bounce back incoming tuples in some circumstances
– Returning the tuple to the Eddy (because it still needs to probe other tables to generate output tuples)
Two Unary operators
[Diagram: two unary operators — SteM_R (accepts builds of R and probes by ST tuples, returning RST matches) and AM_S (accepts probes, returning S matches)]
Query Execution
EddySEC
Insert
Probe
S SteM
Joe Jr
Jen Sr
Joe CS1
CS2 Smith
E SteM
C SteM
Jen CS2
Jen CS2 Smith
Jen CS2
Jen CS2 Smith
Jen Sr. CS2 Smith
Probe
Join algorithm determined
by how we route tuples
Symmetric Hash Join– Build SteMR
– Probe SteMS with R tuple– Build SteMS
– Probe SteMR with S tuple– Repeat
(BLD_R PROBE_SteM_S | BLD_S PROBE_SteM_R)*
Symmetric Hash join
R
Eddy
S
S probe, S bld, R bld, R probe
Symmetric Hash Join: (BLD_R PROBE_SteM_S | BLD_S PROBE_SteM_R)*
– Alternate builds and probes
Synchronous Index Join: (PROBE_Idx_S)*
Synchronous Index join
R
Eddy
Synchronous index join S
with Caching:– S matches are cached in SteMS
– R tuples can probe SteMS first before AMS
Asynchronous Index join
R
Eddy: S matches, S probe, S bld, R bld, RS, R probe; cache (rendezvous buffer)
First introduced for deep-web (WSQ/DSQ [GW’00])
With SteMs– build R into SteMR (rendezvous buffer)
– probe AMS with R
– S matches async. probe SteMR
– (BLD_R PROBE_Idx_S)* | (PROBE_SteM_R)*
ASIDE: Asynchrony is a very useful technique in several contexts – disk, parallel cpus, distributed data, …
Understudied in research literature
Symmetric Hash Join: (BLD_R PROBE_SteM_S BLD_S PROBE_SteM_R)*
– Alternate builds and probes
Synchronous Index Join: (PROBE_Idx_S)*
Grace/Hybrid Hash Join– Build all R into SteMR
– Build all S into SteMS
– probe SteMR with all S– (BLD_R)* (BLD_S)* (PROBE_SteM_R)*– For I/O efficiency, we must cluster probes by partitions
• Importance of asynchronous operations
Grace Hash join
R
Eddy
S
S probe, S bld, R bld, R probe
Eddy can adapt between both joins within the same query
Hybridized Join Algorithms, e.g.– Start with index join
(for pipelining first few rows)– Later switch to hash join for performance– When memory runs out, switch back to index join
Alternate AMs (access modules) run competitively– But lookup cache (SteMS) shared by both AMs
• eliminates duplicates• Avoids cache fragmentation => switch join algorithms w/o losing work
Hybridized Joins
R
Eddy
S matches, S probe, S bld, R bld, RS, R probe
Hybridized Hash-Index Join
0 50 100 150 200 Time(seconds)N
umbe
r of r
esul
t tup
les
outp
ut200
150
100
50
0
Num
ber o
f res
ult t
uple
s ou
tput
indexjoin
hybridhybrid
hashjoin
idx. join
1000
800
600
400
200
00 10 20 30
Time(seconds)
R
EddyRS
S probe, S bld, R bld, RS, R probe; cache (rendezvous buffer); S scan
Starts off as async index join
Ends like symm hash join
But, routing constraints needed to make this work correctly!
Routing Constraints Flexibility given by SteMs can allow incorrect algorithms E.g. BLD_R BLD_S PROBE_SteM_R PROBE_SteM_S
– Causes False Duplicates– Arises whenever we decouple build & probe
Cyclic queries, Competitive AMs, Asynchrony also cause errors
Need Routing constraints on order of builds and probes– define space of permissible query executions– Subject to these constraint, Eddy routes to maximize performance
Papers [R01, R+03] give routing constraints that characterize the space of correct routing policies
– Essentially, a space of correct join algorithms• Much broader than TOBO
R
Eddy
S
S probe, S bld, R bld, R probe
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Approach Pipelined nested-loops plans
– Static Rank Ordering– Dynamic Rank Ordering – Eddies, Competition
Trees of Binary Join Operators (TOBO)– Static: System R– Dynamic: Switching plans during execution
Multiple Trees of Binary Join Operators– Convergent Query Processing– Eddies with Binary Join Operators– STAIRs: Moving state across joins
Dataflows of Unary Operators– N-way joins
switching join algorithms during execution
Asynchrony in Query Processing
BNLJ
CNLJ
ANLJ
Need for Asynchrony Many events in query processing are inherently asynchronous
– I/O– Remote data sources– Parallel computation
Forcing these to be synchronous hinders flexibility in doing joins
Fascinating topic with plenty of possibilities– Will only touch on this briefly here– See papers for more info: WSQ/DSQ, SteMs, Query scrambling
Symmetric Hash Join: (BLD_R PROBE_SteM_S BLD_S PROBE_SteM_R)*
– Alternate builds and probes Asynchronous Index Join
– More general & flexible than sync. join– build R into SteMR (rendezvous buffer)– probe AMS with R– S matches async. probe SteMR
– (BLD_R PROBE_Idx_S)* | (PROBE_SteM_R)*
First introduced for web data sources – WSQ/DSQ [GW’00]
Asynchronous Index Join
R
Eddy
S matches
S probe
R
R bldRS
S
Symmetric Hash Join: (BLD_R PROBE_SteM_S BLD_S PROBE_SteM_R)*
– Alternate builds and probes Asynchronous Index Join
– More general & flexible than sync. join– build R into SteMR (rendezvous buffer)– probe AMS with R– S matches async. probe SteMR
– (BLD_R PROBE_Idx_S)* | (PROBE_SteM_R)*
– with Caching:• S matches are cached in SteMR
• R tuples can probe SteMS first before AMS
Asynchronous Index Join
R probe
R
Eddy
S matches, S probe, S bld, R bld, RS
Research Roundup
Metrics What are the metrics of choice for database users? How do you get them?
– Completion time– Query throughput– Interactivity (& ability to change query on the fly)– Business process based
Digging deeper, parameterize these metrics:– Expected value of metric (and for what distributions)– Variance in value of metric -- i.e. predictability– Quality relative to worst-case value of metric (“competitive analysis”)
• Under what kind of adversarial model for manipulating environment?
Cost and Benefit of adaptivity in these settings:– Benefit: for which is a static plan likely to fail badly? – Cost: for which is it low-overhead to change plans?– Relative costs of the different adaptivity schemes for various metrics?
As yet, not systematically explored
Measurement & Models What is the right mix of model granularity and measurement timescales?
– Rich models periodically built on data (a la standard practice)– Lighter-weight models built during query processing
• For use in that query (as discussed today)• For use in future queries -- i.e. query feedback
– Some mixture:• To support both intra- and inter-query measurement• How to synchronize multiple models?
Can we deal with correlation without combinatorial explosions in stats?– Or “fleeing from knowledge to ignorance”
As yet, not systematically explored
Execution Space Identify the “complete” space of post-mortem executions:
– Partitioning– Caching– State migration– Competition & redundant work
What aspects of this space are important? When?– A buried lesson of AQP work : “non-Selingerian” plans can win big!
• But characterize the circumstances? Can low-overhead “switchpoints” be well characterized?
– Generalize “moments of symmetry” argument from Eddy paper [AH00] Given this (much!) larger plan space, navigate it efficiently
– Especially on-the-fly
As yet, not systematically explored
Robustness How do users get to understand and control the system
– What is the equivalent of EXPLAIN in an adaptive system?– Is this important? Will it be important in 10 years?
Synergy between "robust optimization" and adaptivity? Robust metrics may be important to "gate" feedback overheads. Adaptivity may help the system stay within predictable parameters– Track these metrics on the fly?– Do they significantly prune the plan space?– What sacrifice is made in terms of ideal performance?
As yet, not systematically explored
Formalisms and Tie-ins There are clear connections here to:
– Online algorithms– Machine Learning and Control Theory
• Bandit Problems• Reinforcement Learning
– Operations Research scheduling
Connections in their infancy– Mostly for selection ordering, and even there, mostly static– Can we capture adaptivity and correlation together?– Can we capture join processing
• Access method selection• The “burden” of state, and state migration• Cache effects
– For many metrics and computational models• Including parallelism and asynchrony!
As yet, not systematically explored
Engineering: Evolutionary & Revolutionary
What can/should be exploited by the Big Three?
– Based on an understanding of the expanded plan space
– How "hard" is it to graft techniques onto a traditional engine?
• Iterator-based dataflows
• Encapsulated join algorithms
• Carefully tuned but simple cost models, measurement, and modeling
– Note: this is the subject of some controversy
• E.g. how "fundamental" a shift is it to add eddies?
What is the right "from-scratch" architecture for new systems?
– What is the execution model?
• Pull-oriented iterators are wrong for anything that accesses a network
• Eddy-centric architecture: overkill? Eddy as operator, not architecture.
• Middle ground -- inspiration from network routers or search engines?
– What is the role of an explicit optimizer?
• Is it used for a priori planning?
• Is it called during execution?
– Connection to an auto-tuning database design component?
As yet, not systematically explored
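For context, the pull-oriented iterator model in question is the classic open/next/close interface. This minimal sketch (illustrative, not from any real engine) shows why control flow is frozen into the operator tree: each `next()` call blocks on a single child, which is exactly what hurts when that child is a slow network source:

```python
class Scan:
    """Leaf iterator over an in-memory list of rows."""
    def __init__(self, rows): self.rows = rows
    def open(self): self.it = iter(self.rows)
    def next(self): return next(self.it, None)  # None signals end-of-stream
    def close(self): pass

class Select:
    """Pull-based selection: open/next/close over a single child."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        # Pull from the (one, fixed) child until a tuple qualifies.
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self):
        self.child.close()

q = Select(Scan([1, 2, 3, 4]), lambda t: t % 2 == 0)
q.open()
out = []
while (t := q.next()) is not None:
    out.append(t)
q.close()
```

An eddy-as-operator fits this interface (it is just another iterator that reorders work internally), which is one side of the "evolutionary vs. revolutionary" controversy above.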
In Sum
Adaptivity is the future (and past!) of query execution
– No new system can afford to ignore it
– Old systems probably can't either
Lots of innovation over the last few years
– Basic reassessment of 25 years of conventional wisdom
– Exploration of architectures, algorithms, and optimizations
Lessons and structure emerging
– The adaptivity "loop" and its separable components
– Selection ordering as a clean "kernel", and its limitations
– Horizontal partitioning as a logical framework for understanding/explaining adaptive execution
• Expanding the plan space
• The corresponding challenge of learning/exploiting correlations
– The critical and tricky role of state in join processing
– Asynchrony and its challenges
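The selection-ordering "kernel" has a clean static solution that adaptive schemes can be measured against: for independent, commuting selections with per-tuple cost c and pass probability p < 1, expected cost is minimized by sorting on the rank c / (1 - p). A minimal sketch with made-up costs and selectivities:

```python
def order_selections(filters):
    """Static selection-ordering kernel: sort commuting filters by
    rank c / (1 - p), where c is per-tuple cost and p is the
    probability a tuple passes (assumed independent, p < 1)."""
    return sorted(filters, key=lambda f: f[1] / (1 - f[2]))

# (name, cost, pass probability) -- illustrative numbers only.
fs = [("a", 1.0, 0.9), ("b", 2.0, 0.1), ("c", 0.5, 0.5)]
order = [f[0] for f in order_selections(fs)]  # cheap, selective first
```

Its limitations are exactly the ones listed above: the rank rule assumes independence (no correlations) and static statistics, which is what the adaptive variants in this tutorial relax.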
In Sum
But given all this innovation…
There is a lot of science and engineering to be done
– Much new material has been exposed, but little of it is fully understood
• Either theoretically or empirically
– As with most of query processing, there are many opportunities for various modes of research
• Architectural
• Algorithmic
• Experimental
Success depends on future applications of DB query processing ideas
– Challenge: identify (or create!) the new environments that matter
References
[A-D03] R. Arpaci-Dusseau: Runtime Adaptation in River. ACM TOCS 2003.
[AH00] R. Avnur, J. M. Hellerstein: Eddies: Continuously Adaptive Query Processing. SIGMOD Conference 2000: 261-272
[Antoshenkov93] G. Antoshenkov: Dynamic Query Optimization in Rdb/VMS. ICDE 1993: 538-547
[BBD'05] S. Babu, P. Bizarro, D. J. DeWitt: Proactive Reoptimization. VLDB 2005: 107-118
[BBDW'05] P. Bizarro, S. Babu, D. J. DeWitt, J. Widom: Content-Based Routing: Different Plans for Different Data. VLDB 2005: 757-768
[BC02] N. Bruno, S. Chaudhuri: Exploiting Statistics on Query Expressions for Optimization. SIGMOD Conference 2002: 263-274
[BC05] B. Babcock, S. Chaudhuri: Towards a Robust Query Optimizer: A Principled and Practical Approach. SIGMOD Conference 2005: 119-130
[BMMNW'04] S. Babu, et al.: Adaptive Ordering of Pipelined Stream Filters. SIGMOD Conference 2004: 407-418
[CDHW06] A. Condon, A. Deshpande, L. Hellerstein, N. Wu: Flow Algorithms for Two Pipelined Filter Ordering Problems. PODS 2006.
[CDY'95] S. Chaudhuri, U. Dayal, T. W. Yan: Join Queries with External Text Sources: Execution and Optimization Techniques. SIGMOD Conference 1995: 410-422
[CG94] R. L. Cole, G. Graefe: Optimization of Dynamic Query Evaluation Plans. SIGMOD Conference 1994: 150-160
[CHG02] F. C. Chu, J. Y. Halpern, J. Gehrke: Least Expected Cost Query Optimization: What Can We Expect? PODS 2002: 293-302
[CN97] S. Chaudhuri, V. R. Narasayya: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. VLDB 1997: 146-155
References (2)
[CR94] C.-M. Chen, N. Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
[DGHM'05] A. Deshpande, C. Guestrin, W. Hong, S. Madden: Exploiting Correlated Attributes in Acquisitional Query Processing. ICDE 2005: 143-154
[DGMH'05] A. Deshpande, et al.: Model-based Approximate Querying in Sensor Networks. VLDB Journal, 2005
[DH'04] A. Deshpande, J. Hellerstein: Lifting the Burden of History from Adaptive Query Processing. VLDB 2004.
[EHJKMW'96] O. Etzioni, et al.: Efficient Information Gathering on the Internet. FOCS 1996: 234-243
[GW'00] R. Goldman, J. Widom: WSQ/DSQ: A Practical Approach for Combined Querying of Databases and the Web. SIGMOD Conference 2000: 285-296
[INSS92] Y. E. Ioannidis, R. T. Ng, K. Shim, T. K. Sellis: Parametric Query Optimization. VLDB 1992.
[Ives02] Z. G. Ives: Efficient Query Processing for Data Integration. Ph.D. thesis, U. Washington, 2002.
[K'01] M. S. Kodialam: The Throughput of Sequential Testing. IPCO 2001.
[KBZ'86] R. Krishnamurthy, H. Boral, C. Zaniolo: Optimization of Nonrecursive Queries. VLDB 1986.
[KD'98] N. Kabra, D. J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. SIGMOD Conference 1998: 106-117
[KKM'05] H. Kaplan, E. Kushilevitz, Y. Mansour: Learning with Attribute Costs. ACM STOC, 2005.
[KNT89] M. Kitsuregawa, M. Nakayama, M. Takagi: The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method. VLDB 1989.
References (3)
[LEO 01] M. Stillger, G. M. Lohman, V. Markl, M. Kandil: LEO - DB2's LEarning Optimizer. VLDB 2001.
[MRS+04] V. Markl, et al.: Robust Query Processing through Progressive Optimization. SIGMOD Conference 2004: 659-670
[MSHR'02] S. Madden, M. A. Shah, J. M. Hellerstein, V. Raman: Continuously Adaptive Continuous Queries over Streams. SIGMOD Conference 2002: 49-60
[NKT88] M. Nakayama, M. Kitsuregawa, M. Takagi: Hash Partitioned Join Method Using Dynamic Destaging Strategy. VLDB 1988.
[PCL93a] H. Pang, M. J. Carey, M. Livny: Memory-Adaptive External Sorting. VLDB 1993: 618-629
[PCL93b] H. Pang, M. J. Carey, M. Livny: Partially Preemptive Hash Joins. SIGMOD Conference 1993.
[RH'05] N. Reddy, J. Haritsa: Analyzing Plan Diagrams of Database Query Optimizers. VLDB 2005.
[SF'01] M. A. Shayman, E. Fernandez-Gaucherand: Risk-Sensitive Decision-Theoretic Diagnosis. IEEE Trans. Automatic Control, 2001.
[SHB04] M. A. Shah, J. M. Hellerstein, E. Brewer: Highly-Available, Fault-Tolerant, Parallel Dataflows. SIGMOD 2004.
[SHCF03] M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin: Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE 2003.
[SMWM'06] U. Srivastava, K. Munagala, J. Widom, R. Motwani: Query Optimization over Web Services. VLDB 2006.
[TD03] F. Tian, D. J. DeWitt: Tuple Routing Strategies for Distributed Eddies. VLDB 2003.
[UFA'98] T. Urhan, M. J. Franklin, L. Amsaleg: Cost Based Query Scrambling for Initial Delays. SIGMOD Conference 1998: 130-141
[UF 00] T. Urhan, M. J. Franklin: XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Eng. Bull. 23(2): 27-33 (2000)
[VNB'03] S. Viglas, J. F. Naughton, J. Burger: Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. VLDB 2003: 285-296
[WA'91] A. N. Wilschut, P. M. G. Apers: Dataflow Query Execution in a Parallel Main-Memory Environment. PDIS 1991: 68-77