Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros...

21
Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data Management Technologies Lab Department of Computer Science University of Pittsburgh HDMS 2005

Transcript of Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros...

Page 1: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

Freshness-Aware Scheduling of Continuous Queries in the

Dynamic Web

Mohamed A. Sharaf Alexandros LabrinidisPanos K. Chrysanthis

Kirk Pruhs

Advanced Data Management Technologies LabDepartment of Computer Science

University of Pittsburgh

HDMS 2005

Page 2: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

2 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Motivation

Price < $5

Page 3: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

3 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Motivation

Web Server

Wish List 1

Price < $5Title = Star

Wars I

Price < $5

T P

T

Wish List 2

P

Price < $7Input data streams

Price = $8

Price = $4

Price = $4Price = $8Price = $4Price = $2Price = $2

Price = $2Price < $5

Price = $5

Price = $5

Page 4: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

4 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Continuous Queries

A continuous query is a long-running standing query whose execution is triggered with the arrival of new data

The output of a continuous query is a continuous data stream (e.g., e-mails or update personalized Web page)

A Web server should propagate updates to users as soon as they become available, however delays occur due to:

1) Time spent by a query processing updates

2) Time spent by a query waiting to be executed

Page 5: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

5 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Scheduling Multiple Continuous Queries

The execution order of continuous queries determines the overall behavior of the system. For example:

In Aurora: the Minimum Latency scheduler reduces response time [Carney et. al., VLDB’03]

In STREAM: the Chain scheduler minimizes memory usage [Babcock et. al., SIGMOD’03]

Problem Statement: Devise a policy for scheduling the execution of multiple

continuous queries (MCQ) with the objective of maximizing the overall quality of data (QoD) of output data streams

Page 6: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

6 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Outline

Motivation

Quality of Data (QoD)

Freshness-Aware Scheduling of MCQ (FAS-MCQ)

Experimental Evaluation

Conclusions and Future Work

Page 7: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

7 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Quality of Data

QoD based on freshness (deviation from the ideal) At any time instance, the output data stream is fresh

when it matches the ideal one, otherwise it is stale

Tx Ty

t1 t2 t3

Delay (ti)= wait time + processing time

Freshness = 1- (t1 + t2 + t3)/(Ty - Tx)

Ideal output data stream

Actual output data stream

Fresh

Stale

Page 8: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

9 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

How updates are processed

Q1

N1

W1

C1

S1 =1

Cost of executing operator Q1

Selectivity of operator Q1Wait time of top update in input queuefor operator Q1

Number of pending updates

Incoming updates

Operator Q1

Page 9: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

10 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Freshness-Aware Scheduling of MCQ (FAS-MCQ)

Compute loss in freshness (L) under two policies: Policy X: first Q1 then Q2

Policy Y: first Q2 then Q1

LX = (W1 + N1*C1) + (W2 + N2*C2 + N1*C1)

Current loss in freshness

Loss until processing

pending tuples

Q1’s loss

Loss due to waiting for Q1 to finish execution

Q2’s loss

Q1

N1

W1

C1

Q2

N2

W2

C2

S1 =1 S2=1

Page 10: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

11 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Freshness-Aware Scheduling of MCQs

Under policy X: first Q1 then Q2

LX = (W1 + N1*C1) + (W2 + N1*C1 + N2*C2)

Under policy Y: first Q2 then Q1

LY = (W2 + N2*C2) + (W1 + N2*C2 + N1*C1)

For LX < LY => N1 * C1 < N2 * C2

Q1

N1

W1

C1

Q2

N2

W2

C2

Priority of Qi = 1/(Ni * Ci)

S1 =1 S2=1

Page 11: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

12 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Impact of Selectivity

A query is a tree of operators Each query operator is associated with:

Cost (c): processing time Selectivity (s): probability of producing an

output after processing an input update

Maximum cost: C = c1 + c2 + c3

Total selectivity: S = s1 * s2 * s3

Average/Expected Cost: Cavg = c1 + (c2 * s1) + (c3 * s1 * s2)

1

2

3

Page 12: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

13 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Selectivity-Aware FAS-MCQ

Priority of Qi = 1/(Ni * Ciavg)

But we need to consider selectivity: If Si = 0, no appending and the output data stream is fresh

If Si = 1, appending and the output data stream is stale

Compute the staleness probability (Pi) = 1 - (1-Si)Ni

Q1

N1

W1

C1avg

Q2

N2

W2

C2avg

S1 S2

Priority of Qi = Pi / (Ni * Ciavg)

Page 13: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

15 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Intuitions underlying FAS-MCQ

Priority of Qi = Pi / (Ni * Ciavg)

The priority of a query increases if it has: Small processing cost (Ci),

Small number of pending updates (Ni),

High staleness probability (Pi)

Page 14: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

16 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Outline

Motivation

Quality of Data (QoD)

Freshness-Aware Scheduling of MCQ (FAS-MCQ)

Experimental Evaluation

Conclusions and Future Work

Page 15: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

17 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Simulation Testbed

Simulated the execution of 250 continuous queries with variable costs and selectivities

10 input data streams following Poisson distribution

Half of the input streams are bursty

Experiments to show: Impact of utilization,

Impact of Selectivity,

(Impact of burstiness),

Fairness

Page 16: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

18 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Scheduling Algorithms

FAS-MCQ:Freshness-Aware Scheduling of Multiple Continuous Queries

RR:Round-Robin (Aurora)

SRPT:Shortest Remaining Processsing Time

FCFS:First-Come First-Served

Page 17: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

19 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Impact of Utilization (Selectivity = 1)

Server Utilization

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Freshness

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

RRFCFSSRPTFAS-MCQ

22%

Qua

lity

of D

ata

Page 18: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

20 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Impact of Selectivity

Server Utilization

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Freshness

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1.1

RRFCFSSRPTFAS-MCQ

50%

Qua

lity

of D

ata

Page 19: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

21 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Fairness

Server Utilization

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Standard Deviation of Freshness

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

RRFCFSSRPTFAS-MCQ

Sta

ndar

d D

evia

tion

for

QoD

Page 20: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

22 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Conclusions

We proposed a policy for freshness-aware scheduling of multiple continuous queries.

Our policy exploits the properties of continuous queries (i.e., cost and selectivity) as well as the properties of input data streams (variability of updates)

We showed experimentally that our proposed policy outperforms the traditional scheduling policies

Future: study multi-stream queries and different definitions for QoD

Page 21: Freshness-Aware Scheduling of Continuous Queries in the Dynamic Web Mohamed A. Sharaf Alexandros Labrinidis Panos K. Chrysanthis Kirk Pruhs Advanced Data.

HDMS 2005 - Athens, Greece -

23 Sharaf, Labrinidis, Chrysanthis, Pruhs

University of Pittsburgh

Advanced Data Management Technologies Labhttp://db.cs.pitt.edu