
Schema-Based Query Optimization for XQuery over XML Streams

Hong Su, Elke A. Rundensteiner, Murali Mani

Worcester Polytechnic Institute, Massachusetts, USA

VLDB 2005

Schema-Based Query Optimization (SQO)

Schema knowledge can be utilized to optimize queries

Well studied in deductive/relational databases: join elimination, predicate elimination, detection of empty answer sets, …

Equally applicable to XML for flat value filtering

SQO for XML Pattern Retrieval

General XML SQO: applicable to both static and streaming XML. E.g., query tree minimization [Amer-Yahia+02]

Static-XML-specific SQO: focuses on expediting random access of data. E.g., query rewrite using “extents” (indices built on element types) [Fernandez+98], …

Stream-specific XML SQO: focuses on expediting token-by-token sequential access of data

Stream Specific SQO Example

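Query: /seller[shipTo]

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Input: <seller><sameAddr>…<url>…<url></seller>

Plan without schema: buffer the seller element; retrieve /shipTo

Plan with schema: buffer the seller element; retrieve /shipTo; retrieve /sameAddr. When /sameAddr is retrieved, skip the remaining computation for this seller

Buffer states shown on the slide: "buffer:" (empty) and "buffer: <seller>"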
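The saving in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Raindrop implementation: the function name `evaluate_seller` is hypothetical, and the stream is reduced to the list of child start tags of one seller element.

```python
def evaluate_seller(child_tags, ending_marks=frozenset()):
    """Evaluate /seller[shipTo] over the child start tags of one seller.

    Returns (matched, children_examined). `ending_marks` holds the tags
    that, according to the schema, prove shipTo cannot appear any more.
    """
    examined = 0
    for tag in child_tags:
        examined += 1
        if tag == "shipTo":
            return True, examined      # filter satisfied
        if tag in ending_marks:
            return False, examined     # schema says: give up early
    return False, examined             # failure known only at </seller>


children = ["sameAddr", "url", "url"]
print(evaluate_seller(children))                 # (False, 3)
print(evaluate_seller(children, {"sameAddr"}))   # (False, 1)
```

Without schema knowledge the failure of the filter is only known at the end of the seller element; with <sameAddr> as an ending mark it is known after the first child.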

Related Work

YFilter [Diao02] and XSM [Ludäscher 03]: use schema to decide whether pattern results are recursive or types of child elements; essentially propose general XML SQO

FluXQuery [Koch+04]: uses schema to minimize buffer size; complementary to our focus (we aim to skip unnecessary computations)

SIX [Gupta+03]: uses indices interleaved with XML data to reduce parsing; could be combined with our techniques

Challenge: Constraint Useful?

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Query /seller/shipTo: retrieve /shipTo; retrieve /sameAddr. When /sameAddr is retrieved, there is nothing to save: /shipTo is the only pattern retrieval

Query /seller[shipTo]/billTo: retrieve /shipTo; retrieve /sameAddr; retrieve /billTo. When /sameAddr is retrieved, there is nothing to save: /billTo has already been retrieved

Challenge: Benefits/Overhead?

Maximal benefits: no beneficial optimization should be missed. Any failed pattern should be detected as early as possible

Minimal overhead: no redundant optimization should be introduced. Whether a particular pattern fails should not be checked repeatedly

Challenge: Plan Execution

Optimization at lower level than query rewrite

Specific physical implementations are needed

Query: /seller[shipTo]

Schema: <!element seller((billTo,shipTo)|sameAddr, …)>

Plan: buffer the seller element; retrieve /shipTo; retrieve /sameAddr and act when it is retrieved

No query can capture this optimization

Outline

SQO Technique Design

SQO Application

Execution of Optimized Plan

Experimentations

Physical Implementation of Pattern Retrieval

Note: it is important to understand the physical stream engine implementation in order to design effective SQO

Our implementation: the widely used automata implementation [e.g., Tukwila, YFilter]

Example Query and its Automata

Query:

for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b/*/phone=“508-123-4567”
return <auction> for $c in $a/item where $c//keyword=“auto” return $b/*/phone </auction>

[Figure: automaton for the query, with states 0, 1, 2, 9, 10, 11, 12 and transitions labeled auctions, auction, seller, shipTo, * (primary, secondary) and phone. A run-time stack of active state sets (e.g., [0], [1], [2,3], [11], [12#]) is shown as the input <auctions> <auction> … <phone> … </phone> is consumed.]

#: buffering flag


Opt. opportunities:

1. avoid transitions as much as possible

2. revoke buffering flag as soon as possible
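The stack-based automaton model behind these opportunities can be sketched as follows. This is a minimal illustration, not the engine described in the talk: the transition table, the state numbers and the helper `run` are assumptions chosen to resemble the pattern /auctions/auction/seller/*/phone, and buffering flags are left out.

```python
WILDCARD = "*"

# Hypothetical transition table for /auctions/auction/seller/*/phone.
TRANSITIONS = {
    (0, "auctions"): 1,
    (1, "auction"): 2,
    (2, "seller"): 9,
    (9, WILDCARD): 11,
    (11, "phone"): 12,
}
FINAL_STATES = {12}


def run(events):
    """Feed (kind, name) events to the automaton.

    A start tag pushes the set of states reached from the current top of
    the stack; an end tag pops it. Returns (matches, final_stack_depth).
    """
    stack = [{0}]
    matches = 0
    for kind, name in events:
        if kind == "start":
            reached = set()
            for state in stack[-1]:
                for label in (name, WILDCARD):
                    if (state, label) in TRANSITIONS:
                        reached.add(TRANSITIONS[(state, label)])
            stack.append(reached)
            matches += len(reached & FINAL_STATES)
        else:
            stack.pop()
    return matches, len(stack)


events = [
    ("start", "auctions"), ("start", "auction"), ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("end", "phone"), ("end", "primary"),
    ("start", "secondary"), ("start", "phone"), ("end", "phone"), ("end", "secondary"),
    ("end", "seller"), ("end", "auction"), ("end", "auctions"),
]
print(run(events))  # (2, 1)
```

Every start tag costs a lookup per active state, which is why cutting transitions early reduces work.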

Is Constraint Useful for Opt.?

Constraints used to find “ending marks” of a pattern within a context element

<!element seller((billTo, shipTo)|sameAddr?, …)>

<sameAddr> is an ending mark of /shipTo within the seller element context: once <sameAddr> is seen, /shipTo cannot appear in the current seller
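One way to derive such ending marks from an exclusive (choice) constraint is sketched below. The representation of the choice group as a list of branches and the function name `exclusive_ending_marks` are assumptions made for illustration; the talk does not show how the constraint is stored.

```python
def exclusive_ending_marks(branches, target):
    """Return the child names whose appearance rules out `target`.

    `branches` is a choice group: a list of alternative child sequences.
    A name is an ending mark if it occurs only in branches that do not
    contain `target`.
    """
    with_target = set()
    without_target = set()
    for branch in branches:
        if target in branch:
            with_target.update(branch)
        else:
            without_target.update(branch)
    return without_target - with_target


# (billTo, shipTo) | sameAddr
seller_choice = [["billTo", "shipTo"], ["sameAddr"]]
print(exclusive_ending_marks(seller_choice, "shipTo"))  # {'sameAddr'}
```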


Is Constraint Useful for Opt.?

An ending mark is helpful if the context element can be filtered out earlier:

The pattern may fail to appear

Query: for $a in /auctions/auction, $b in $a/seller…

With schema <!element auction(seller, …)>: the ending mark for $a/seller is not helpful

With schema <!element auction(seller?, …)>: the ending mark for $a/seller is helpful

Is Constraint Useful for Opt.?

An ending mark is helpful if the context element can be filtered out earlier:

The pattern may fail to appear

The pattern is required

Schema: <!element item (category?, desc, …)>

Query: for $c in $a/item return <c>$a/category</c>

The ending mark for $a/category is not helpful

Query: for $c in $a/item[category] return <c>$a/category</c>

The ending mark for $a/category is helpful

Is Constraint Useful for Opt.?

An ending mark is helpful if the context element can be filtered out earlier:

The pattern may fail to appear

The pattern is required

and the early filtering can be beneficial:

Transitions may happen after ending marks

Buffering flags may be raised before ending marks

SQO Design

Helpful ending marks identified by our SQO

Three SQO rules designed, using:

Occurrence constraints

Exclusive constraints

Order constraints

Example SQO Rule

Uses an occurrence constraint

The rule outputs an event-condition-action (ECA)

Query:

for $a in /auctions/auction, $b in $a/seller
where $b/*/phone = “508-123-4567”…

Schema:

<!element seller(primary, secondary, …)>

<!element primary (phone)>

<!element secondary (phone)>

Event: second </phone> is encountered in a seller

Condition: $b/*/phone = “508-123-4567” not satisfied yet

Action: skip the remaining computations within the current seller element
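The event-condition-action above can be sketched as a small loop over the events of one seller element. This is a simplified illustration under stated assumptions: events are hypothetical (kind, name, text) tuples, `process_seller` is not a function of the actual system, and "skipping" is modeled by returning early.

```python
def process_seller(events, wanted, max_phones=2):
    """Process the events inside one seller element.

    `max_phones` encodes the occurrence constraint (primary and secondary
    each hold exactly one phone). Returns (satisfied, events_handled).
    """
    phones_seen = 0
    satisfied = False
    handled = 0
    for kind, name, text in events:
        handled += 1
        if (kind, name) == ("end", "phone"):
            phones_seen += 1
            if text == wanted:
                satisfied = True
            # Event: last possible </phone>; Condition: predicate not
            # satisfied yet; Action: skip the rest of this seller.
            if phones_seen == max_phones and not satisfied:
                return False, handled
    return satisfied, handled


seller_events = [
    ("start", "primary", None), ("start", "phone", None),
    ("end", "phone", "508-000-0000"), ("end", "primary", None),
    ("start", "secondary", None), ("start", "phone", None),
    ("end", "phone", "508-111-1111"), ("end", "secondary", None),
    ("start", "url", None), ("end", "url", None),
]
print(process_seller(seller_events, "508-123-4567"))  # (False, 7)
print(process_seller(seller_events, "508-111-1111"))  # (True, 10)
```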


Properties of SQO Application

Maximal benefits

Minimal overhead

Maximal Benefit

Definition of “rule independence”

Proof of “maximal benefits” given

If the rules are all independent, then as long as each rule is applied on each pattern, maximal benefits are ensured

Minimal Overhead: Redundancy

Same-pattern redundancy: multiple ending marks adopted for the same pattern

Query: for $a in /auctions/auction, $b in $a/seller[shipTo]…

Schema: <!element seller ( shipTo?, billTo, url )>

Constraints:

Ending mark <billTo> for $b/shipTo: <billTo> is guaranteed to capture the failure of /shipTo

Ending mark <url> for $b/shipTo: redundant
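A possible way to avoid same-pattern redundancy under an order constraint is to stop collecting ending marks at the first required sibling. The sketch below assumes the content model is a plain sequence of (name, required) pairs; this representation and the name `minimal_ending_marks` are hypothetical, not taken from the talk.

```python
def minimal_ending_marks(sequence, target):
    """Return the non-redundant ending marks for `target`.

    `sequence` is an ordered list of (name, required) pairs. Siblings
    after `target` are collected up to and including the first required
    one, since that sibling always appears and later marks add nothing.
    """
    names = [name for name, _ in sequence]
    position = names.index(target)
    marks = []
    for name, required in sequence[position + 1:]:
        marks.append(name)
        if required:
            break
    return marks


# <!element seller ( shipTo?, billTo, url )>
seller = [("shipTo", False), ("billTo", True), ("url", True)]
print(minimal_ending_marks(seller, "shipTo"))  # ['billTo']
```

If billTo were optional as well, both billTo and url would be kept, because neither alone is guaranteed to appear first.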

Minimal Overhead: Redundancy?

Parent-child pattern redundancy: ending marks of child patterns already filter the parent pattern early

Query: for $a in /auctions/auction, $b in $a/seller[shipTo]…

Schema as shown first: <!element auction (seller, bidder)> <!element seller (shipTo, billTo?)>

Schema as shown second: <!element auction (seller, bidder)> <!element seller (shipTo, billTo)>

Annotations on the schema elements: “optional”, “required”

Constraints:

<billTo> for $b/shipTo

<bidder> for $a/seller

Can be used to capture failure of $a/seller[shipTo]

Redundant

SQO Application Algorithm

Input:

XQuery represented as a tree

XML Schema represented as a graph

Processing:

Query tree traversed top-down: “maximal benefits” ensured

Each tree node processed by local/regional appliers

Same-pattern redundancy excluded by the local applier

Parent-child pattern redundancy excluded by the regional applier

Output:

Event-condition-actions attached to tree nodes


Encoding ECAs in Automata

E: push-in or pop-out of a state

C: pattern result buffer checked

A: actions include:

Suspend computations by removing automata transitions

Clean up results generated within the current context element

Prepare for recovering computation for the next context element (e.g., back up transitions)
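The suspend and recover actions can be sketched as removing transitions into a backup and restoring them later. This is only an illustration: the class `Automaton`, its methods and the transition table are hypothetical and do not reflect how the Raindrop engine stores its automata.

```python
class Automaton:
    """Transition table with suspend/resume support (illustrative only)."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)  # (state, label) -> next state
        self._backup = {}

    def suspend(self, states):
        """Cut all transitions leaving `states`, keeping a backup copy."""
        for key in [k for k in self.transitions if k[0] in states]:
            self._backup[key] = self.transitions.pop(key)

    def resume(self):
        """Restore the cut transitions for the next context element."""
        self.transitions.update(self._backup)
        self._backup.clear()


automaton = Automaton({(2, "seller"): 9, (9, "shipTo"): 10, (2, "item"): 5})
automaton.suspend({2})
print(sorted(automaton.transitions))  # [(9, 'shipTo')]
automaton.resume()
print(len(automaton.transitions))     # 3
```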

Example: ECAs in Automata

Query:

for $a in /auctions/auction, $b in $a/seller[shipTo]
where $b/*/phone=“508-123-4567”
return <auction> for $c in $a/item … </auction>

[Figure: automaton for the query, with states 0, 1, 2, 3, 5, 9, 10, 11, 12, 13 and transitions labeled auctions, auction, seller, item, shipTo, sameAddr, primary, secondary and phone. ECA entries such as (1, startTag, none, state 2) and (…, state 3) are attached to states. Input fragments shown: …<auction> <seller>, <sameAddr> </sameAddr>, <item> </item>, <primary> </primary>.]

Event: 1st <sameAddr> encountered

Condition: none

Action: cut all transitions from

1. q2

2. States reachable via : q3

3. States between q2 and q13: q9


Optimization Affected by?

How often the pattern fails (pattern selectivity)

How much gain each early filtering brings (unit gain)

[Figure: execution time ratio (without SQO / with SQO, axis from 0 to 5) plotted against the selectivity of the pattern with ending marks (0% to 100%), with curves for minor, medium and major unit gain.]

Necessity of Design Guideline

[Figure: three plans compared over the selectivity of the pattern with the only useful ending mark: plan without SQO; plan with SQO (1 ending mark); plan with SQO but no guideline considered (30 ending marks).]

Conclusion

First SQO on streaming XML

Support SQO on nested XQuery with “*” or “//”

Offer criteria for “useful” constraints

Ensure maximal benefits and minimal overhead in SQO application

Provide an execution strategy in the widely used automata-based model

Implement the SQO optimizer in the Raindrop system (VLDB’04 demo)

Experimentally demonstrate that SQO brings significant improvement with little overhead

Visit the website of RAINDROP, our project on an XQuery engine over XML streams:

http://davis.wpi.edu/dsrg/raindrop/

Supported by USA National Science Foundation and IBM PhD Fellowship