Query Processing in a WSMS - Stanford University

Query Processing in a WSMS

Utkarsh Srivastava
Stanford University

Joint work with Jennifer Widom, Kamesh Munagala, Rajeev Motwani, and Gene Pang

March 24, 2008 InfoLab Workshop 2

What are Web Services?

Highly standardized method of sharing data and functionality

[Diagram: User/Client and Web Services. Discovery and Description: UDDI, WSDL]

What are Web Services?

Highly standardized method of sharing data and functionality

[Diagram: User/Client and Web Services. Communication: SOAP]

Example

Result of Invocation

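Query Across Web Services

WS1: symbol → company info.
WS2: symbol → stock price info.
NASDAQ

Query: find info. about all companies whose stock had >10% change

[Diagram: the User/Client issues the Query against the web services directly and assembles the Results; labeled "Complex"]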

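Taking a Cue from DBMSs

[Diagram: the User/Client sends a Query to a Database Management System over the Data and receives Results. The Declarative Interface is labeled "Simple"; the Database Management System is labeled "All the complexity"]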

Web Service Management System

WS1: symbol → company info.
WS2: symbol → stock price info.
NASDAQ

[Diagram: the User/Client sends a Query to the Web Service Management System (WSMS), which sits in front of the web services, and receives Results]

Web Service Management System

[Architecture diagram: the Client sends a Query and Input Data to the WSMS through a Declarative Interface and receives Results; the WSMS performs WS Invocations on WS1, WS2, …, WSn]

WSMS components:
  Metadata Component: WS Registration, Schema Mapper
  Query Processing Component: Plan Selection, Plan Execution
  Profiling and Statistics Component: Response Time Profiler, Statistics Tracker

Query over Web Services – An Example

Credit card company wishes to send offers to those
  a) Who have a credit rating > 50, and
  b) Who have a payment history = “Good” on a prior card.

Company has at its disposal
  L: List of potential recipient SSNs
  WS1: SSN → credit rating
  WS2: SSN → card no(s)
  WS3: card no. → payment history

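Plan 1

[Diagram: the Client sends L (SSN) to the WSMS, which invokes WS1 (SSN → cr), WS2 (SSN → ccn), and WS3 (ccn → ph) one after another and returns Results]

Data flow:
  L (SSN) → WS1 → (SSN, cr) → Filter on cr, keep SSN → (SSN) → WS2 → (SSN, ccn) → WS3 → (SSN, ccn, ph) → Filter on ph, keep SSN → Results

Note pipelined processing

Example data:
  WS1 output (SSN, cr): (1, 60), (2, 70)
  After filter on cr (SSN): 1, 2
  WS2 output (SSN, ccn): (1, xx1), (2, xx2)
  WS3 output (ccn, ph): (xx1, B), (xx2, G)
  Results (SSN): 2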
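The pipelined flow of Plan 1 can be sketched with Python generators, where each tuple moves on to the next stage as soon as it is produced. This is only an illustrative sketch: the functions `ws1`, `ws2`, `ws3` and the lookup tables are hypothetical in-memory stand-ins for the real web service calls, filled with the example data from this slide, and reading "G" as a "Good" payment history is an assumption.

```python
# Hypothetical in-memory stand-ins for the three web services,
# populated with the example data shown on the slide.
CREDIT_RATING = {1: 60, 2: 70}            # WS1: SSN -> cr
CARD_NUMBERS = {1: ["xx1"], 2: ["xx2"]}   # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {"xx1": "B", "xx2": "G"}  # WS3: ccn -> ph ("G" assumed to mean Good)


def ws1(ssn):
    return CREDIT_RATING[ssn]


def ws2(ssn):
    return CARD_NUMBERS[ssn]


def ws3(ccn):
    return PAYMENT_HISTORY[ccn]


def plan1(ssns):
    """L -> WS1 -> filter on cr -> WS2 -> WS3 -> filter on ph, one tuple at a time."""
    rated = ((ssn, ws1(ssn)) for ssn in ssns)
    kept = (ssn for ssn, cr in rated if cr > 50)          # filter on cr, keep SSN
    cards = ((ssn, ccn) for ssn in kept for ccn in ws2(ssn))
    histories = ((ssn, ccn, ws3(ccn)) for ssn, ccn in cards)
    for ssn, _ccn, ph in histories:
        if ph == "G":                                      # filter on ph, keep SSN
            yield ssn


print(list(plan1([1, 2])))
```

Because the stages are chained lazily, WS2 is only ever called for SSNs that survived the credit-rating filter. An SSN with several good cards would appear more than once in this sketch; a real plan would need to decide whether to deduplicate.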

Simple Representation of Plan 1

L → WS1 → WS2 → WS3 → Results

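Plan 2

[Diagram: the Client sends L (SSN) to the WSMS, which dispatches it in parallel to WS1 (SSN → cr) and to WS2 (SSN → ccn) followed by WS3 (ccn → ph), joins the two branches on SSN, and returns Results]

Data flow:
  Branch 1: L (SSN) → WS1 → (SSN, cr) → Filter on cr, keep SSN → (SSN)
  Branch 2: L (SSN) → WS2 → (SSN, ccn) → WS3 → (SSN, ccn, ph) → Filter on ph, keep SSN → (SSN)
  Join on SSN → Results

Example data:
  WS1 output (SSN, cr): (1, 60), (2, 70)
  WS2 output (SSN, ccn): (1, xx1), (2, xx2)
  WS3 output (ccn, ph): (xx1, B), (xx2, G)
  Branch 1 output (SSN): 1, 2
  Branch 2 output (SSN): 2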
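Plan 2 can be sketched as two branches that both start from the full list L and are joined on SSN at the WSMS. The sketch below is illustrative only: `ws1`, `ws2`, `ws3` are hypothetical in-memory stand-ins loaded with the slide's example data, "G" is assumed to mean a "Good" payment history, and a thread pool is just one possible way to dispatch the branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the three web services.
CREDIT_RATING = {1: 60, 2: 70}            # WS1: SSN -> cr
CARD_NUMBERS = {1: ["xx1"], 2: ["xx2"]}   # WS2: SSN -> ccn(s)
PAYMENT_HISTORY = {"xx1": "B", "xx2": "G"}  # WS3: ccn -> ph ("G" assumed to mean Good)


def ws1(ssn):
    return CREDIT_RATING[ssn]


def ws2(ssn):
    return CARD_NUMBERS[ssn]


def ws3(ccn):
    return PAYMENT_HISTORY[ccn]


def rating_branch(ssns):
    """L -> WS1 -> filter on cr, keep SSN."""
    return {ssn for ssn in ssns if ws1(ssn) > 50}


def history_branch(ssns):
    """L -> WS2 -> WS3 -> filter on ph, keep SSN. WS2 sees every SSN in L."""
    return {ssn for ssn in ssns for ccn in ws2(ssn) if ws3(ccn) == "G"}


def plan2(ssns):
    # Dispatch both branches at once, then join their outputs on SSN.
    with ThreadPoolExecutor(max_workers=2) as pool:
        rated = pool.submit(rating_branch, ssns)
        good = pool.submit(history_branch, ssns)
        return sorted(rated.result() & good.result())


print(plan2([1, 2]))
```

Unlike Plan 1, the second branch here calls WS2 for every SSN in L, including those the credit-rating filter would have dropped.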

Simple Representation of Plan 2

L → WS1 → Results
L → WS2 → WS3 → Results

Quiz

Plan 1: L → WS1 → WS2 → WS3 → Results

Plan 2: L → WS1 → Results
        L → WS2 → WS3 → Results

In Plan 1, WS2 has to process only filtered SSNs
In Plan 2, WS2 has to process all SSNs

Which plan is better?

Cost Metric: Steady-state throughput

Query Planning Recap

Possible plans P1, …, Pn

Statistics S

Cost Metric cost(Pi, S)

Want to find least-cost plan

Class of Queries Considered

“Select-Project-Join” queries over input data and a set of web services

Precedence constraints
  Input for WSi may be provided by the output of WSj
  e.g., WS2: SSN → ccn and WS3: ccn → ph
  Precedence constraints impose a DAG.
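One way to picture how precedence constraints arise is to derive them from each service's input and output attributes. The sketch below is an assumption about how this could be done, not the WSMS's actual mechanism: the `SERVICES` table and the function `precedence_constraints` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical description of each service: (input attributes, output attributes).
SERVICES = {
    "WS1": ({"SSN"}, {"cr"}),
    "WS2": ({"SSN"}, {"ccn"}),
    "WS3": ({"ccn"}, {"ph"}),
}


def precedence_constraints(services, input_attrs):
    """Map each service to the set of services that must run before it."""
    preds = {name: set() for name in services}
    for name, (needs, _) in services.items():
        # Attributes not present in the input data must come from another service.
        for attr in needs - input_attrs:
            for other, (_, produces) in services.items():
                if other != name and attr in produces:
                    preds[name].add(other)
    return preds


preds = precedence_constraints(SERVICES, input_attrs={"SSN"})
order = list(TopologicalSorter(preds).static_order())
print(preds)   # only WS3 has a predecessor (WS2)
print(order)   # one ordering that respects the DAG
```

Here the input list L supplies SSN, so WS1 and WS2 are unconstrained, while WS3 needs ccn and therefore must follow WS2.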

Statistics: Response Time

ci: per-tuple response time of WSi from client

Assume independent response times

[Diagram: the Client sends SSN to WS1 (SSN → cr) and receives cr]

Statistics: Selectivity

si: selectivity of WSi
  Average number of output tuples per input tuple to WSi

a) WS1: SSN → cr
  If 90% of individuals have cr > 50, s1 = 0.9

b) WS2: SSN → ccn
  If on average each SSN holds 2 credit cards, s2 = 2

Assume independent selectivities
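As a small worked example of these definitions, selectivity can be estimated as output tuples divided by input tuples, and under the independence assumption the selectivities of consecutive services multiply. The invocation counts below are made up purely to reproduce the slide's values s1 = 0.9 and s2 = 2; the function name `selectivity` is hypothetical.

```python
def selectivity(num_inputs, num_outputs):
    """Average number of output tuples per input tuple."""
    return num_outputs / num_inputs


# Made-up observation counts chosen to match the slide's example values.
s1 = selectivity(num_inputs=10, num_outputs=9)   # 9 of 10 SSNs have cr > 50
s2 = selectivity(num_inputs=4, num_outputs=8)    # 4 SSNs hold 8 cards in total

# With independent selectivities, the number of tuples that reach a service
# placed after both WS1 and WS2, per original input tuple, is the product.
reaching_ws3 = s1 * s2

print(s1, s2, reaching_ws3)
```

A selectivity below 1 shrinks the stream (WS1 with its filter), while a selectivity above 1 grows it (WS2 returning several cards per SSN).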

Bottleneck Cost Metric

Lunch Buffet: Dish 1, Dish 2, Dish 3, Dish 4

Overall per-item processing time = Response time of slowest or bottleneck stage in pipeline
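The buffet intuition can be checked with a small simulation of a pipeline in which each stage serves one item at a time. This is a simplified sketch with hypothetical stage times; the function `pipeline_makespan` is not part of the WSMS and assumes unlimited buffering between stages.

```python
def pipeline_makespan(stage_times, num_items):
    """Time at which the last item leaves the last stage of the pipeline."""
    # finished[k] = time at which the previous item left stage k
    finished = [0.0] * len(stage_times)
    for _ in range(num_items):
        done = 0.0  # time at which the current item left the previous stage
        for k, service_time in enumerate(stage_times):
            # An item starts at stage k once it has arrived and the stage is free.
            start = max(done, finished[k])
            done = start + service_time
            finished[k] = done
    return finished[-1]


stage_times = [2, 5, 3, 1]   # hypothetical per-item times for four "dishes"
num_items = 100
total = pipeline_makespan(stage_times, num_items)
print(total / num_items)     # close to max(stage_times), far below sum(stage_times)
```

For many items the average time per item approaches the slowest stage's time rather than the sum of all stage times, which is why the bottleneck metric is a natural fit for steady-state throughput.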

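Cost Expression for Plan P

Ri(P): Predecessors of WSi in plan P

Fraction of input tuples seen by WSi = $\prod_{WS_j \in R_i(P)} s_j$

Response time per original input tuple at WSi = $c_i \prod_{WS_j \in R_i(P)} s_j$

cost(P) = $\max_{1 \le i \le n} \left( c_i \prod_{WS_j \in R_i(P)} s_j \right)$

Assumption: WSMS cost is not the bottleneck

Contrast with sum cost metric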
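A minimal sketch of the two metrics, assuming the bottleneck cost of a plan is the largest value over all services of ci times the product of the selectivities of the predecessors of WSi, and the sum cost adds those same per-service terms instead of taking their maximum. The response times below are hypothetical; s1 = 0.9 and s2 = 2 follow the earlier selectivity example, and the value given for WS3 is a placeholder that never enters the result because WS3 has no successors.

```python
from math import prod


def per_service_load(predecessors, c, s):
    """Response time per original input tuple at each service."""
    return {ws: c[ws] * prod(s[p] for p in predecessors[ws]) for ws in predecessors}


def bottleneck_cost(predecessors, c, s):
    return max(per_service_load(predecessors, c, s).values())


def sum_cost(predecessors, c, s):
    return sum(per_service_load(predecessors, c, s).values())


c = {"WS1": 1.0, "WS2": 4.0, "WS3": 1.5}   # hypothetical response times
s = {"WS1": 0.9, "WS2": 2.0, "WS3": 1.0}   # WS3 value is a placeholder

# Each plan maps a service to ALL services that precede it in that plan.
plan1 = {"WS1": set(), "WS2": {"WS1"}, "WS3": {"WS1", "WS2"}}
plan2 = {"WS1": set(), "WS2": set(), "WS3": {"WS2"}}

for name, plan in (("Plan 1", plan1), ("Plan 2", plan2)):
    print(name, bottleneck_cost(plan, c, s), sum_cost(plan, c, s))
```

With these made-up numbers WS2 is the bottleneck in both plans, and Plan 1 comes out cheaper because WS2 only sees the SSNs that pass the credit-rating filter.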

Problem Statement

Input:
  Set of web services WS1, …, WSn
  Response times c1, …, cn
  Selectivities s1, …, sn
  Precedence constraints among web services

Output:
  Arrange web services into a plan P
  P respects all precedence constraints
  cost(P) by the bottleneck metric is minimized

No Precedence Constraints

All selectivities ≤ 1
  Theorem: Optimal to linearly order by increasing ci

General case

[Diagram: selective web services in increasing cost order, then proliferative web services, …, with a Local Join at WSMS producing Results]
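The theorem for the case where all selectivities are at most 1 can be sanity-checked on a tiny instance by comparing the increasing-ci order against every other linear order. This is a brute-force check on made-up numbers, not a proof and not the planning algorithm itself; the service names, response times, and selectivities are hypothetical.

```python
from itertools import permutations


def linear_cost(order, c, s):
    """Bottleneck cost of a linear plan that invokes services in the given order."""
    worst, fraction = 0.0, 1.0
    for ws in order:
        worst = max(worst, fraction * c[ws])  # load at this service per input tuple
        fraction *= s[ws]                     # fraction passed on to the next service
    return worst


c = {"A": 1.0, "B": 2.0, "C": 3.0}   # hypothetical response times
s = {"A": 0.5, "B": 0.9, "C": 0.1}   # hypothetical selectivities, all <= 1

by_increasing_c = sorted(c, key=c.get)
best = min(permutations(c), key=lambda order: linear_cost(order, c, s))

print(by_increasing_c, linear_cost(by_increasing_c, c, s))
print(list(best), linear_cost(best, c, s))
```

On this instance no other linear order beats sorting by response time, even though service C is by far the most selective: placing it first would make its own response time the bottleneck.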

With Precedence Constraints

Sum cost metric
  Hard to approximate to within a factor O(n^θ)

Bottleneck cost metric
  Surprisingly, solvable in polynomial time
  Developed an O(n^5) algorithm
    Adds one WS at a time to the plan
    WS to be added is chosen by solving a linear program.

Isn’t this the same as …?

Web Service Composition
  Targeted towards workflow-oriented applications
  Don’t give provably optimal strategies

Parallel and Distributed Query Optimization
  Freedom to move query operators around
  Much larger space of execution plans

Data Integration, Mediators
  Integrate general sources of data
  Primarily optimize the cost at the integration system itself

Implementation

Building a prototype general-purpose WSMS
  Written in Java
  Uses Apache Axis, an open-source implementation of SOAP
  Implements query planning and execution

Future Directions

Monetary cost of invoking web services
  Optimize combination of response time and cost

Variations in web service response times
  Depends on provisioning, load, network conditions
  Consider adaptive plans and/or robust plans

Statistics Collection
  Self-tuning histograms are relevant

Extension to optimizing workflows

Conclusion

[Diagram: User/Client, Query, Results, Web Services]

Questions?

http://infolab.stanford.edu/wsms