The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm

New England Database Summit 2010

Elke A. Rundensteiner, Worcester Polytechnic Institute
Elisa Bertino, Purdue University
Rimma V. Nehme, Microsoft Jim Gray Systems Lab

Thanks go to NSF 0917017 for partial support of this project.


Motivation

A variety of modern applications face data with non-uniform characteristics: ubiquitous healthcare, location-based services, financial tickers, network monitoring…

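[Figure: conventional query processing. A query (SELECT * FROM …) is submitted to the database engine; the query optimizer uses overall statistics about the data sources to choose one query execution plan (plan cost 1.234), which the query executor runs over the data to produce the query results.]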

"I want my results quickly. I don't care how exactly they are computed."

Typically, ONE execution plan is used for ALL data.


Concrete Example: Network Monitoring

[Figure: network packets arrive as data streams at a DSMS, where the query optimizer turns a continuous network-monitoring query into a query execution plan. Single-plan query processing is contrasted with multi-plan (multi-route) query processing using Plan 1, Plan 2 and Plan 3.]

Opportunity for improvement: it may be more efficient to use different plans for different subsets of the data.


• This example uses streaming data.
• Similar examples can be found with static data.
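To make the opportunity concrete, here is a minimal Python sketch with made-up numbers (the filters A and B, their costs and their selectivities are hypothetical, not taken from the project). When the best operator order differs between two subsets of the data, every single plan pays for the wrong order on one of them, while choosing a plan per subset does not.

```python
from itertools import permutations

# Hypothetical toy workload: two filters A and B with equal per-tuple cost, whose
# selectivities (fraction of tuples that pass) differ between two data subsets.
COST = {"A": 1.0, "B": 1.0}
SELECTIVITY = {
    "subset1": {"A": 0.1, "B": 0.9},  # A rejects most tuples of subset1
    "subset2": {"A": 0.9, "B": 0.1},  # B rejects most tuples of subset2
}
SIZE = {"subset1": 1000, "subset2": 1000}

def plan_cost(order, subset):
    """Expected cost of running the filters in `order` over one data subset."""
    remaining, total = SIZE[subset], 0.0
    for op in order:
        total += remaining * COST[op]         # every surviving tuple is evaluated
        remaining *= SELECTIVITY[subset][op]  # only the passing fraction continues
    return total

plans = list(permutations(COST))  # [('A', 'B'), ('B', 'A')]

# One plan for all data: pick the order with the lowest total cost.
single_plan_cost = min(sum(plan_cost(p, s) for s in SIZE) for p in plans)
# One plan per subset: pick the best order separately for each subset.
multi_plan_cost = sum(min(plan_cost(p, s) for p in plans) for s in SIZE)

print(single_plan_cost, multi_plan_cost)
```

With these numbers either single order costs 3000 units, while choosing the order per subset costs 2200 units.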


Outline
• Introduction & Motivation
• Background: Query Mesh
  - Model
  - Optimization
  - Execution
• Dynamic Re-Optimization with Query Mesh
  - Challenges
  - Architecture Details
  - Experimental Evaluation
• Ongoing and Future Work
• Conclusion


Multi-Plan Query Processing Using Query Mesh

(Here, route = execution plan.)

Query Mesh provides a middle ground between systems that use a single pre-computed route and systems that choose routes at runtime:

• Traditional query optimization: a single "route-oriented" solution; coarse optimization, small overhead.
• Eddies and their descendants: a multi-route "route-less" solution (the Eddy routes tuples at runtime); fine-granularity optimization, significant overhead.
• Query Mesh: a multi-route "route-oriented" solution (a classifier plus multiple routes); fine-granularity optimization, less overhead.

[Figure: Physical Architecture of Query Mesh Framework]
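The Query Mesh shape (a classifier in front of several routes, each route an ordering of the same operators) can be sketched in a few lines of Python. This is only an illustration: the operators, the attribute names (temp, rate, src) and the classification rule are hypothetical, and tuples are processed one at a time for readability.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]

# Hypothetical filter operators, keyed by operator id (True = keep the tuple).
OPERATORS: Dict[int, Callable[[Row], bool]] = {
    1: lambda t: t["temp"] > 20,
    2: lambda t: t["rate"] < 100,
    3: lambda t: t["src"] != 0,
}

# Each route is one execution plan: an ordering of the same operators.
ROUTES: Dict[str, List[int]] = {
    "r1": [1, 2, 3],  # cool readings: operator 1 rejects them immediately
    "r2": [2, 1, 3],  # warm readings: operator 1 always passes, so start with 2
}

def classifier(t: Row) -> str:
    """Hypothetical classifier: choose a route from the tuple's attributes."""
    return "r1" if t["temp"] <= 20 else "r2"

def execute(t: Row) -> Tuple[str, bool]:
    """Send one tuple down the route chosen for it; stop at the first rejection."""
    route = classifier(t)
    for op_id in ROUTES[route]:
        if not OPERATORS[op_id](t):
            return route, False
    return route, True

print(execute({"temp": 10, "rate": 50, "src": 1}))  # ('r1', False)
print(execute({"temp": 30, "rate": 50, "src": 1}))  # ('r2', True)
```

Unlike an Eddy, no per-tuple routing decision is made inside the operator chain: the only runtime decision is the classifier call, after which the route is fixed.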


Query Mesh Search Space

Search space: the set of all possible solutions.

The set of training tuples {1,2,3,4} has cardinality n = 4. (We denote {{1},{2,3}} as "1/23" for brevity.) Its partitions form the lattice-shaped Query Mesh search space:

• 1 subset (one plan for all data): 1234
• 2 subsets: 14/23, 1/234, 124/3, 13/24, 123/4, 134/2, 12/34
• 3 subsets: 1/23/4, 14/2/3, 1/24/3, 13/2/4, 12/3/4, 1/2/34
• 4 subsets: 1/2/3/4

In every partition, each subset has an individual route.

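Search Space Complexity

The size of the search space is the Bell number B_n, the sum of the Stirling numbers of the second kind S(n, k):

$$B_n = \sum_{k=1}^{n} S(n,k)$$

The Stirling number of the second kind S(n, k) is the number of ways to partition a set of cardinality n into exactly k nonempty subsets. For n = 4: B_4 = 1 + 7 + 6 + 1 = 15, matching the lattice.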
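The search-space size can be checked with a short Python sketch. It computes S(n, k) with the standard recurrence S(n, k) = S(n-1, k-1) + k·S(n-1, k) (the new element either forms its own subset or joins one of the k existing subsets), sums the values into B_n, and independently enumerates every partition of {1,2,3,4}.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """S(n, k): ways to partition an n-element set into exactly k nonempty subsets."""
    if n == k:
        return 1  # includes S(0, 0) = 1
    if n == 0 or k == 0:
        return 0
    # element n either forms its own subset, or joins one of the k existing ones
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

def bell(n: int) -> int:
    """B_n: total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))

def partitions(items):
    """Enumerate all set partitions of a list; each partition is a list of subsets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        # put `first` into each existing subset ...
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        # ... or into a new subset of its own
        yield [[first]] + p

print([stirling2(4, k) for k in range(1, 5)])        # [1, 7, 6, 1]
print(bell(4), len(list(partitions([1, 2, 3, 4]))))  # 15 15
```

The growth is steep: bell(10) already returns 115975, so exhaustive search over partitions quickly becomes impractical as the training set grows.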


Query Mesh Optimization Problem

Query Mesh Cost Model (main idea)

Cost(QM) = Cost of classifier + Cost of routes + Multi-route overhead

Query Mesh Search Algorithms

Optimal Query Mesh Search (Opt-QM). Main idea:
(1) Form all possible subsets of the training tuples (the powerset).
(2) Form partitions out of the above subsets.
Too expensive! Need heuristics!

Query Mesh Search Heuristics

[Figure: a heuristic search path through the lattice from a start solution to a final solution, marking the explored solutions.]

Three components of search heuristics:
(1) Start solution: 5 different approaches (extreme-1, extreme-N, random, content-driven, route-driven), evaluated experimentally.
(2) Search strategy: randomized algorithms (iterative improvement, simulated annealing).
(3) Stop condition: largely depends on the search strategy employed (K iterations, plateau, time-bounded, resource-bounded).
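To illustrate how the three components fit together, here is a minimal Python sketch of one heuristic: iterative improvement over partitions of four training tuples, scored with a toy version of the cost model (classifier + routes + multi-route overhead). Everything numeric is an assumption for illustration: the per-tuple plan costs, a classifier cost that grows linearly with the number of routes, and a fixed overhead per extra route. A move relocates one tuple to another subset or to a new one; the search stops after K iterations or on a plateau. Simulated annealing would differ mainly in occasionally accepting a worse neighbor.

```python
import random
from typing import Dict, Iterator, List

Partition = List[List[int]]

# Hypothetical per-tuple cost of two candidate plans for training tuples 1-4.
PLAN_COST: Dict[int, Dict[str, float]] = {
    1: {"p1": 1.0, "p2": 5.0},
    2: {"p1": 1.0, "p2": 5.0},
    3: {"p1": 5.0, "p2": 1.0},
    4: {"p1": 5.0, "p2": 1.0},
}
CLASSIFIER_COST_PER_ROUTE = 0.5  # assumed: classifier cost grows with #routes
OVERHEAD_PER_EXTRA_ROUTE = 0.5   # assumed: multi-route overhead

def route_cost(block: List[int]) -> float:
    """A subset gets the single plan that is cheapest for all of its tuples."""
    return min(sum(PLAN_COST[t][p] for t in block) for p in PLAN_COST[block[0]])

def qm_cost(partition: Partition) -> float:
    """Cost(QM) = cost of classifier + cost of routes + multi-route overhead."""
    k = len(partition)
    return (CLASSIFIER_COST_PER_ROUTE * k
            + sum(route_cost(b) for b in partition)
            + OVERHEAD_PER_EXTRA_ROUTE * (k - 1))

def neighbors(partition: Partition) -> Iterator[Partition]:
    """Partitions reachable by moving one tuple to another or to a new subset."""
    for i, block in enumerate(partition):
        for t in block:
            rest = [list(b) for b in partition]
            rest[i].remove(t)
            rest = [b for b in rest if b]  # drop a subset that became empty
            for j in range(len(rest) + 1):
                cand = [list(b) for b in rest]
                if j == len(rest):
                    cand.append([t])       # t starts a new subset
                else:
                    cand[j].append(t)      # t joins an existing subset
                yield cand

def iterative_improvement(start: Partition, max_iters: int = 100, seed: int = 0):
    rng = random.Random(seed)
    current, cost = start, qm_cost(start)
    for _ in range(max_iters):             # stop condition: K iterations
        cands = list(neighbors(current))
        rng.shuffle(cands)                 # randomized neighbor order
        better = next((c for c in cands if qm_cost(c) < cost), None)
        if better is None:                 # stop condition: plateau
            break
        current, cost = better, qm_cost(better)
    return current, cost

# "extreme-1" start, read here (an assumption) as one subset for all tuples
best, best_cost = iterative_improvement([[1, 2, 3, 4]])
print(sorted(sorted(b) for b in best), best_cost)  # [[1, 2], [3, 4]] 5.5
```

Starting from one route for all data (cost 12.5), the search splits the tuples by their preferred plan and stops at two routes, because any further split costs more in classifier and overhead than it saves.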


Query Mesh Optimization Overview

[Figure: the QM optimizer repeatedly takes a sample of tuples (training dataset) from the data stream t1, t2, …; for each sample it computes routes (i.e., plans) r1 to r4, induces a classifier that assigns tuples to routes, and passes the resulting Query Mesh to the QM executor.]

[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.
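The two optimizer steps in the figure (compute routes for a sample, then induce a classifier) can be sketched in Python as below. This is a hedged illustration rather than the QM optimizer itself: the per-route cost function is made up, and a one-split decision stump on a single hypothetical attribute (temp) stands in for whatever classifier the system actually induces.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]

def route_costs(t: Row) -> Dict[str, float]:
    """Hypothetical cost model: r1 is cheap for low-'temp' tuples, r2 for high ones."""
    return {"r1": t["temp"], "r2": 100.0 - t["temp"]}

def label_sample(sample: List[Row]) -> List[Tuple[Row, str]]:
    """Step 1 (compute routes): label each sampled tuple with its cheapest route."""
    labeled = []
    for t in sample:
        costs = route_costs(t)
        labeled.append((t, min(costs, key=costs.get)))
    return labeled

def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]

def induce_stump(labeled: List[Tuple[Row, str]], attr: str) -> Callable[[Row], str]:
    """Step 2 (induce classifier): the single split on `attr` with fewest errors."""
    best = None
    for thr in sorted({t[attr] for t, _ in labeled}):
        left = [r for t, r in labeled if t[attr] <= thr]
        right = [r for t, r in labeled if t[attr] > thr]
        l_lab = majority(left)
        r_lab = majority(right) if right else l_lab
        errors = sum(r != l_lab for r in left) + sum(r != r_lab for r in right)
        if best is None or errors < best[0]:
            best = (errors, thr, l_lab, r_lab)
    _, thr, l_lab, r_lab = best
    return lambda t: l_lab if t[attr] <= thr else r_lab

# One optimization round on one sample; the figure repeats this for each new sample.
sample = [{"temp": v} for v in (10, 20, 30, 70, 80, 90)]
classify = induce_stump(label_sample(sample), "temp")
print(classify({"temp": 25}), classify({"temp": 60}))  # r1 r2
```

Labeling by cheapest route turns route selection into an ordinary supervised learning problem, which is what lets a learned classifier replace per-tuple routing decisions at runtime.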

Query Mesh Execution Overview

[Figure: the QM executor reads the data stream through a classification window (a tumbling window). After classification, the tuples in a window are grouped by route into rusters, each consisting of an r-token and its data tuples: t1, t3, t4, t5 → route r1, r-token <1,4,3,2>; t2, t6, t9 → route r2, r-token <2,4,3,1>; t7, t8, t10 → route r3, r-token <3,4,1,2>. The rusters are sent to the self-routing fabric.]

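The execution steps in the figure can be sketched in Python as follows. The route names and r-tokens come from the figure; the Ruster class, the operators, the tuple attributes and the classifier are hypothetical stand-ins, and the self-routing fabric is reduced to a single loop for readability.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, float]

@dataclass
class Ruster:
    """Tuples that were assigned the same route, plus that route's r-token."""
    r_token: Tuple[int, ...]  # operator ids in execution order, e.g. (1, 4, 3, 2)
    tuples: List[Row]

def tumbling_windows(stream: List[Row], size: int) -> Iterator[List[Row]]:
    """Non-overlapping classification windows over the stream."""
    for i in range(0, len(stream), size):
        yield stream[i:i + size]

def classify_window(window: List[Row], classify: Callable[[Row], str],
                    routes: Dict[str, Tuple[int, ...]]) -> List[Ruster]:
    """Classify every tuple in the window, then group tuples by route."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for t in window:
        groups[classify(t)].append(t)
    return [Ruster(routes[r], ts) for r, ts in groups.items()]

def self_routing_fabric(ruster: Ruster,
                        operators: Dict[int, Callable[[Row], bool]]) -> List[Row]:
    """The r-token alone decides which operator the ruster visits next."""
    batch = ruster.tuples
    for op_id in ruster.r_token:
        batch = [t for t in batch if operators[op_id](t)]
        if not batch:
            break
    return batch

ROUTES = {"r1": (1, 4, 3, 2), "r2": (2, 4, 3, 1), "r3": (3, 4, 1, 2)}
OPS = {1: lambda t: t["a"] > 0, 2: lambda t: t["b"] > 0,
       3: lambda t: t["c"] > 0, 4: lambda t: True}

def classify(t: Row) -> str:
    """Hypothetical classifier: route chosen by the tuple's 'kind' attribute."""
    return ("r1", "r2", "r3")[int(t["kind"])]

stream = [{"kind": i % 3, "a": 1, "b": 1 if i % 2 == 0 else -1, "c": 1}
          for i in range(10)]
results = []
for window in tumbling_windows(stream, size=4):
    for ruster in classify_window(window, classify, ROUTES):
        results.extend(self_routing_fabric(ruster, OPS))
print(len(results))  # 5: only tuples with b > 0 (even i) survive
```

Classification cost is paid once per window, and each route decision is then amortized over every tuple in the ruster instead of being made tuple by tuple.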

But… data characteristics may change…

[Figure: the data distribution at times T, T + 1, T + 2 and T + 3.]

Dynamic Re-optimization with Query Mesh

Can we have an execution strategy that:
• is plan-based,
• supports different plans for distinct subsets of data, and
• is as adaptive as Eddies?

Self-Tuning Query Mesh (ST-QM)

[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.


Outline
• Introduction & Motivation
• Background: Query Mesh
  - Model
  - Optimization
  - Execution
• Dynamic Re-Optimization with Query Mesh
  - Challenges
  - Architecture Details
• Conclusion
• Current and Future Work


Challenges

[Figure: a Query Mesh, consisting of a classifier and multiple routes.]

1. What should be monitored to determine whether the current QM solution is no longer adequate?
2. How do we determine whether the current QM solution should be adapted?
3. How can we efficiently execute the physical migration from the current QM solution to a new one while the query is being executed?

Contributions of Self-Tuning Query Mesh:
1. Data and statistics monitoring
2. Concept drift analysis, QM cost model, improvement measure
3. A single lightweight operation to physically adapt QM


ST-QM Architecture

[Figure: the static QM framework consists of a query optimizer and a query executor; the adaptive QM framework adds ST-QM to them.]


ST-QM Components

• ST-QM Monitor continuously samples data and execution statistics, which are used to determine whether a concept drift has occurred (i.e., whether the QM needs to be adapted).
• ST-QM Analyzer determines whether a concept drift has actually occurred and recommends whether and how the QM solution should be adapted.
• ST-QM Actuator takes these recommendations and physically adapts the QM solution.

[Figure: the ST-QM Monitor samples the running Query Mesh and passes measurements to the ST-QM Analyzer; the Analyzer passes recommendations to the ST-QM Actuator, whose actuation turns the Query Mesh into a new Query Mesh.]
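A minimal Python sketch of the monitor, analyzer and actuator loop described above. It is a simplification under stated assumptions: the monitor tracks average per-route execution cost, the analyzer uses a simple cost-degradation threshold in place of ST-QM's concept drift analysis, and the actuator only replaces routes (it does not retrain the classifier). All class names, the re-optimizer hook and the numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Monitor:
    """Samples recent per-route execution costs (the statistics are hypothetical)."""
    window: int = 100
    costs: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, route: str, cost: float) -> None:
        samples = self.costs.setdefault(route, [])
        samples.append(cost)
        del samples[:-self.window]  # keep only the most recent `window` samples

    def measurements(self) -> Dict[str, float]:
        return {r: sum(c) / len(c) for r, c in self.costs.items() if c}

@dataclass
class Analyzer:
    """Flags a route when its observed cost exceeds the cost expected by the
    optimizer by more than `threshold` (an assumed stand-in for drift analysis)."""
    expected: Dict[str, float]
    threshold: float = 0.2

    def recommend(self, observed: Dict[str, float]) -> List[str]:
        return [r for r, c in observed.items()
                if r in self.expected and c > self.expected[r] * (1 + self.threshold)]

@dataclass
class Actuator:
    """Applies recommendations by installing re-optimized routes."""
    routes: Dict[str, Tuple[int, ...]]
    reoptimize: Callable[[str], Tuple[int, ...]]  # hypothetical re-optimizer hook

    def actuate(self, flagged: List[str]) -> None:
        for r in flagged:
            self.routes[r] = self.reoptimize(r)

monitor = Monitor()
analyzer = Analyzer(expected={"r1": 1.0, "r2": 2.0})
actuator = Actuator(routes={"r1": (1, 2, 3), "r2": (2, 1, 3)},
                    reoptimize=lambda r: (3, 2, 1))
for _ in range(10):
    monitor.record("r1", 1.0)
    monitor.record("r2", 3.0)  # r2 now costs 50% more than expected
flagged = analyzer.recommend(monitor.measurements())
actuator.actuate(flagged)
print(flagged, actuator.routes)  # ['r2'] {'r1': (1, 2, 3), 'r2': (3, 2, 1)}
```

Keeping the three roles separate means the detection rule in the analyzer can be swapped (for example, for a classifier-accuracy test) without touching monitoring or actuation.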


ST-QM Actuator: Physical Query Mesh Adaptation

All possible recommendations:
Case 1: virtual concept drift recommendation
Case 2: real concept drift recommendation
Case 3: hybrid concept drift recommendation

[Figure: classifier modification. Three ways to turn the current Query Mesh into a new one: R1, new classifier + old routes; R2, old classifier + new routes; R3, new classifier + new routes.]

[Figure: physical adaptation. Incoming data passes through the online classifier, which groups tuples into rusters for routes r1, r2 and r3 and sends them to the self-routing fabric of operator modules (op_i, …, op_l, with an OI-array) that produces the query results; adaptation replaces the current classifier with the new classifier.]

The beauty of the proposed design!
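Why a single lightweight operation can suffice is sketched in Python below, under an assumption of ours for illustration: the online classifier owns both the classifier and the route table, and routes reach the operators only through r-tokens. Replacing that pair is then one reference swap, and the operators in the fabric never need to be paused or rewired. The class and method names are hypothetical; the three adaptation kinds R1 to R3 from the figure correspond to which arguments are passed to adapt.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
Classifier = Callable[[Row], str]
Routes = Dict[str, Tuple[int, ...]]

class OnlineClassifier:
    """Holds the current (classifier, routes) pair; adapting the Query Mesh is a
    single reference swap, because each ruster carries its own r-token."""

    def __init__(self, classifier: Classifier, routes: Routes) -> None:
        self._lock = threading.Lock()
        self._state: Tuple[Classifier, Routes] = (classifier, routes)

    def adapt(self, classifier: Optional[Classifier] = None,
              routes: Optional[Routes] = None) -> None:
        """R1: new classifier + old routes; R2: old classifier + new routes;
        R3: new classifier + new routes."""
        with self._lock:  # serialize concurrent adaptations
            old_classifier, old_routes = self._state
            self._state = (classifier if classifier is not None else old_classifier,
                           routes if routes is not None else old_routes)

    def classify_window(self, window: List[Row]) -> List[Tuple[Tuple[int, ...], Row]]:
        classifier, routes = self._state  # one snapshot per window: no mixed versions
        return [(routes[classifier(t)], t) for t in window]

oc = OnlineClassifier(lambda t: "r1", {"r1": (1, 2, 3), "r2": (3, 2, 1)})
print(oc.classify_window([{"x": 1.0}]))  # [((1, 2, 3), {'x': 1.0})]
oc.adapt(classifier=lambda t: "r2")      # R1-style: new classifier, old routes
print(oc.classify_window([{"x": 1.0}]))  # [((3, 2, 1), {'x': 1.0})]
```

Because each window reads the state once, tuples already classified keep their old r-tokens while later windows use the new Query Mesh, so no in-flight work has to be migrated.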


Experimental Evaluation

ST-QM was implemented inside CAPE, a Java-based continuous query engine.

We compared the performance of adaptive QM against competitor systems:
• static (non-adaptive) QM
• adaptive "plan-less" Eddies
• adaptive "plan-less" Eddies with a CBR-based routing policy

Results can be found in [NRB09] (EDBT 2009).


Summary of ST-QM Experimental Results

• ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddies, and the single-plan execution approach.
• The runtime overhead of ST-QM relative to query execution is small (2% on average).
• The actuation cost of physical adaptivity is nearly negligible: 0.02% of the total execution cost.
• Even if no adaptivity is needed, ST-QM is at most 2-3% slower than static QM in the worst case.


Conclusion

• Query Mesh is a practical query optimization approach:
  - eliminates the single-plan assumption
  - feasibility shown
  - has low overhead and high potential benefit
  - is easily implemented and integrated with existing systems
• Query Mesh leads to novel solutions:
  - usage of machine learning in query optimization and query processing
  - usage of network-inspired techniques in query optimization and query processing


Next Steps in QM Project

• Consider state caching and indexing in QM stream context

• Work with alternate classification methods for route decisions

• Design customized query optimization and processing strategies

• Study multi-query processing and optimization

• Scale by applying distributed processing technologies

• Do QM principles also apply in the static DB context?!


Thank you for listening!


Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more.