Query Task Model (QTM): Modeling Query Execution with Tasks

Steffen Zeuch and Johann-Christoph Freytag

Motivation

✤ Different DBMS execute the same query execution plan (QEP) using different schedules
✤ Focus: run-time execution, not query optimization
✤ No uniform scheduling format
✤ Query execution in different DBMS is not comparable
✤ Major differences between DBMS:
    ✤ Chunk Size: size of an operator's input
    ✤ Scheduling Strategy: execution model vs. run-time scheduler

How can different schedules be made comparable, to explain why one schedule performs better than another?

Outline

1. Parallel Query Execution
2. QTM: Query Task Model
3. Evaluation
4. Outlook

Chunk Size

[Figure: a Selection operator consuming tuples t1 to t6 at three granularities: tuple-at-a-time (one tuple per call), buffer-at-a-time (t1, t2, t3 and then t4, t5, t6), and column-at-a-time]

Chunk Size               DBMS
1 tuple                  System R, MySQL, (PostgreSQL)
"Fit into cache"         MonetDB X100, DB2 with BLU
Fixed number of tuples   Hyper
Fixed block size         C-Store
Column                   MonetDB MIL
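
As a rough illustration of the three chunk sizes, the sketch below runs the same selection over a six-tuple table with a chunk size of 1, 3, and the whole column. The helper names `chunks` and `select`, and the use of plain integers as tuples, are assumptions made for this example and are not taken from the talk.

```python
from typing import Callable, Iterator, List, Tuple


def chunks(table: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most chunk_size tuples."""
    for start in range(0, len(table), chunk_size):
        yield table[start:start + chunk_size]


def select(table: List[int],
           predicate: Callable[[int], bool],
           chunk_size: int) -> Tuple[List[int], int]:
    """Apply a selection chunk by chunk.

    Returns the qualifying tuples and the number of operator calls,
    which is the only thing the chunk size changes here.
    """
    result: List[int] = []
    calls = 0
    for chunk in chunks(table, chunk_size):
        calls += 1
        result.extend(t for t in chunk if predicate(t))
    return result, calls


table = [1, 2, 3, 4, 5, 6]          # stands in for t1 .. t6
is_even = lambda t: t % 2 == 0

tuple_at_a_time = select(table, is_even, 1)            # one call per tuple
buffer_at_a_time = select(table, is_even, 3)           # one call per buffer
column_at_a_time = select(table, is_even, len(table))  # a single call
```

All three variants return the same tuples; only the number of operator invocations differs (6, 2, and 1 for this table).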

Scheduling Strategy

[Figure: example QEP over relations R, S, and T with two Hash Build operators, a Selection, HashProbe (S), and HashProbe (R)]

Volcano Execution Model (Open-Next-Close Iterator)

[Figure: the same QEP over R, S, and T (Hash Build, Hash Build, Selection, HashProbe (S), HashProbe (R)); Next calls are passed between the operators and single tuples are returned in response]
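
The Open-Next-Close protocol can be sketched in a few lines. The classes `Scan` and `Selection` and the driver `run` below are hypothetical names chosen for this example; they model only the tuple-at-a-time pull interface, not any particular DBMS.

```python
from typing import Callable, List, Optional


class Scan:
    """Leaf operator: returns the rows of an in-memory table one by one."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[int]:
        if self.pos >= len(self.rows):
            return None              # None signals end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        self.pos = len(self.rows)


class Selection:
    """Pulls from its child until a row satisfies the predicate."""

    def __init__(self, child: Scan, predicate: Callable[[int], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[int]:
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self) -> None:
        self.child.close()


def run(root) -> List[int]:
    """Drive the plan: open, call next until exhausted, close."""
    root.open()
    out: List[int] = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out


result = run(Selection(Scan([1, 2, 3, 4]), lambda r: r > 2))
```

Every result tuple costs one `next` call per operator on its path, which is the per-tuple overhead that larger chunk sizes amortize.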

(Run-time) Scheduler

[Figure: pipeline over T consisting of Selection, HashProbe (S), and HashProbe (R). Two orderings of the calls Sel, Prob_S, and Prob_R for tuples t1 and t2 are shown along a time axis, one labeled "Spatial Locality" and one labeled "Temporal Locality"]

Further optimization criteria: I/O, NUMA, or memory usage
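
A run-time scheduler is free to choose the order in which operator calls are issued. The sketch below enumerates two such orders for the calls Sel, Prob_S, and Prob_R on tuples t1 and t2. The function names `operator_major` and `tuple_major` are assumptions for this example, and the sketch does not claim which of the two orders the slide labels as spatial or temporal locality.

```python
from typing import List


def operator_major(ops: List[str], tuples: List[str]) -> List[str]:
    """Run one operator over all tuples before moving to the next operator."""
    return [f"{op}({t})" for op in ops for t in tuples]


def tuple_major(ops: List[str], tuples: List[str]) -> List[str]:
    """Push one tuple through all operators before touching the next tuple."""
    return [f"{op}({t})" for t in tuples for op in ops]


ops = ["Sel", "Prob_S", "Prob_R"]
tuples = ["t1", "t2"]

order_a = operator_major(ops, tuples)
order_b = tuple_major(ops, tuples)
```

Both orders execute the same six calls; they differ only in whether consecutive calls share the operator code or the tuple.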

Dynamic Load Balancing

[Figure: tasks T1 to T5 over relations R, S, and T (including selections σ) distributed across CPU1 and CPU2]
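
One common way to obtain dynamic load balancing is to let worker threads pull tasks from a shared queue, so that a worker that finishes early simply takes the next task. The sketch below shows that pattern with Python's standard library; `run_tasks`, the two-worker default, and the toy tasks are assumptions for this example and are not the mechanism of any specific DBMS.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


def run_tasks(tasks: List[Tuple[str, Callable[[], int]]],
              num_workers: int = 2) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Execute named tasks on num_workers threads that share one queue.

    Returns the task results and, per task, the id of the worker that ran it.
    """
    task_queue: "queue.Queue[Tuple[str, Callable[[], int]]]" = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results: Dict[str, int] = {}
    executed_by: Dict[str, int] = {}
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                name, fn = task_queue.get_nowait()
            except queue.Empty:
                return                      # no work left for this worker
            value = fn()
            with lock:                      # protect the shared dictionaries
                results[name] = value
                executed_by[name] = worker_id

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, executed_by


# Five toy tasks T1 .. T5, shared by two workers (standing in for CPU1 and CPU2).
tasks = [(f"T{i}", (lambda i=i: i * i)) for i in range(1, 6)]
results, executed_by = run_tasks(tasks, num_workers=2)
```

Which worker runs which task is decided at run time and can differ between runs; only the task results are deterministic.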

DBMS Landscape

[Figure: systems placed in a two-dimensional landscape. Axis "Chunk Size": tuple-at-a-time, buffer-at-a-time, column-at-a-time. Axis "Scheduling Strategy": Volcano execution model, (run-time) scheduler, dynamic load balancing. Systems shown: System R, MySQL, PostgreSQL (in two places), DB2, MonetDB X100, DB2 BLU, StagedDB, Hyper, MonetDB MIL, SAP HANA]


QTM: Query Task Model

✤ Idea: a model that describes parallel query execution with tasks
✤ QEP: queue of tasks
✤ Task: encapsulates a piece of work on some data
✤ Goals:
    ✤ Open a design space for DBMS schedules
    ✤ Make the main aspects of query scheduling comparable: execution order, degree of parallelism and thread coordination, and partitioning
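
The two definitions above (a task encapsulates work on data, a QEP is a queue of tasks) can be sketched as follows. The `Task` fields, the `execute` function, and the sequential FIFO execution are assumptions for this example; the talk does not prescribe this interface.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Task:
    """A piece of work (a function) bound to the data it operates on."""
    name: str
    work: Callable[[List[int]], List[int]]
    data: List[int]

    def run(self) -> List[int]:
        return self.work(self.data)


def execute(task_queue: Deque[Task]) -> List[Tuple[str, List[int]]]:
    """Process the queue front to back; the queue order is the execution order."""
    results = []
    while task_queue:
        task = task_queue.popleft()
        results.append((task.name, task.run()))
    return results


# A toy "QEP": a selection task followed by a projection-like task.
qep: Deque[Task] = deque([
    Task("T1", lambda d: [x for x in d if x < 3], [1, 2, 3, 4]),
    Task("T2", lambda d: [x * 2 for x in d], [1, 2]),
])
output = execute(qep)
```

In this framing, execution order corresponds to the order of the queue, partitioning to how the data is split across tasks, and the degree of parallelism to how many workers drain the queue.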

Query Task Model

[Figure: the parts of the model: Work (a task queue with tasks T1, T2, T3), Data (a data queue with tuples t1, t2, t3 taken from a table), and Processing Strategies]

QTM Transformation: Input

✤ QEP
✤ Hardware architecture
✤ Table format

QTM Transformation

[Figure: transformation steps: QEP, choosing hash join, max. pipelines + dependency graph]

QTM: Task Configuration

[Figure: max. pipelines + dependency graph are turned into task configurations (task blueprints)]

QTM: Tasks

[Figure: instantiation turns task configurations (task blueprints) into a set of tasks (TC instantiation)]
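
The step from a task configuration (blueprint) to a set of tasks might look like the sketch below: the blueprint fixes the operators of a pipeline and a buffer size, and instantiation creates one task per buffer of the input table. `TaskConfiguration`, `Task`, and `instantiate` are hypothetical names, and instantiating one task per buffer is an assumption for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Operator = Callable[[List[int]], List[int]]


@dataclass(frozen=True)
class TaskConfiguration:
    """Blueprint: which operators a task runs and how much data it gets."""
    name: str
    operators: Tuple[Operator, ...]
    buffer_size: int


@dataclass
class Task:
    """One instantiation of a blueprint on a concrete buffer of tuples."""
    config: TaskConfiguration
    data: List[int]

    def run(self) -> List[int]:
        chunk = self.data
        for op in self.config.operators:   # apply the pipeline in order
            chunk = op(chunk)
        return chunk


def instantiate(config: TaskConfiguration, table: List[int]) -> List[Task]:
    """Create one task per buffer of config.buffer_size tuples."""
    return [Task(config, table[i:i + config.buffer_size])
            for i in range(0, len(table), config.buffer_size)]


keep_even: Operator = lambda chunk: [x for x in chunk if x % 2 == 0]
config = TaskConfiguration("selection", (keep_even,), buffer_size=4)
tasks = instantiate(config, list(range(10)))
outputs = [task.run() for task in tasks]
```

Changing only `buffer_size` in the blueprint changes the number and size of the resulting tasks without touching the operators.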

QTM: Implementation

[Figure: implementation split into a compile-time part and a run-time part]


Evaluation: Scenario

Schedule workload:
✤ Tuples per relation: 30M
✤ Selection: < 25M
✤ S1: values 0, 1, 2, …
✤ S2: values 0, 2, 4, …
✤ S3: values 0, 4, 8, …

Evaluation: Configuration

Schedule        Buffer Size   Tasks per Op   Total Tasks
1) Tup - Pipe   1             30M            90M
2) Tup - Mat    1             30M            150M
3) Tup - Seq    1             30M            150M
4) Buf - CL     4             7.5M           22.5M
5) Buf - L1     2,048         14,649         43,947
6) Buf - L2     16,384        1,832          5,496
7) Buf - L3     491,520       62             186
8) Op - Mat     7.5M          4              20
9) Op - Seq     7.5M          4              20
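
The task counts in the configuration table follow from simple arithmetic: the tasks per operator are the 30M tuples per relation divided by the buffer size, rounded up, and the total is that number times a small factor. The factors 3 (Pipe and Buf schedules) and 5 (Mat and Seq schedules) are read off the table; interpreting them as the number of task-generating operators is an assumption. The function names are hypothetical.

```python
TUPLES_PER_RELATION = 30_000_000


def tasks_per_operator(buffer_size: int,
                       tuples: int = TUPLES_PER_RELATION) -> int:
    """Number of buffers needed to cover all tuples (integer ceiling)."""
    return -(-tuples // buffer_size)


def total_tasks(buffer_size: int, factor: int) -> int:
    """Tasks per operator times the per-schedule factor from the table."""
    return tasks_per_operator(buffer_size) * factor


# Buffer sizes and factors as they appear in the table.
buf_l1 = (tasks_per_operator(2_048), total_tasks(2_048, 3))
buf_l2 = (tasks_per_operator(16_384), total_tasks(16_384, 3))
buf_l3 = (tasks_per_operator(491_520), total_tasks(491_520, 3))
tup_mat = (tasks_per_operator(1), total_tasks(1, 5))
op_mat = (tasks_per_operator(7_500_000), total_tasks(7_500_000, 5))
```

For example, 30,000,000 / 2,048 is about 14,648.4, which rounds up to 14,649 tasks per operator and 43,947 tasks in total for the L1-sized buffer.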

Evaluation: Runtimes

Evaluation: Sampling

[Figure panels: Data-related Misses, Instruction-related Misses]

Evaluation: Miss Distribution

Evaluation: Scalability

Evaluation: Insights

✤ Tradeoff between data and instruction cache performance
✤ Sweet spot: a buffer the size of the largest private cache vs. a slightly larger buffer
✤ Medium-sized tasks are data-efficient:
    ✤ Pros: buffer fits entirely into cache, high data locality
    ✤ Cons: high number of tasks and instructions
✤ Large tasks are instruction-efficient:
    ✤ Pros: fewer instructions and tasks, high instruction locality
    ✤ Cons: more data cache misses if the cache size is exceeded
✤ QTM: cache performance can be adjusted via the buffer size


Outlook

✤ Contributions:
    ✤ QTM: a model for parallel query execution using tasks
    ✤ Opens a design space for DBMS schedules
    ✤ Makes the different schedules present in different DBMS comparable
✤ Future Work:
    ✤ Cost model
    ✤ Transformation process for an arbitrary QEP

Thanks!