Schema-Based Query Optimization for XQuery over XML Streams
Hong Su, Elke A. Rundensteiner, Murali Mani
Worcester Polytechnic Institute, Massachusetts, USA
VLDB 2005
Schema-Based Query Optimization (SQO)
Schema knowledge can be utilized to optimize queries
Well studied in deductive/relational databases
  Join elimination, predicate elimination, detection of empty answer sets, …
Equally applicable to XML for flat value filtering
SQO for XML Pattern Retrieval
General XML SQO
  Applicable to both static and streaming XML
  E.g.: Query tree minimization [Amer-Yahia+02]
Static XML specific SQO
  Focus on expediting random access of data
  E.g.: Query rewrite using “extents” (indices built on element types) [Fernandez+98], …
Stream specific XML SQO
  Focus on expediting token-by-token sequential access of data
Stream Specific SQO Example
Query: /seller[shipTo]
Input: <seller><sameAddr>…<url>…<url></seller>
Without schema:
  Buffer seller element
  Retrieve /shipTo
With schema <!element seller((billTo,shipTo)|sameAddr, …)>:
  Buffer seller element
  Retrieve /shipTo
  Retrieve /sameAddr
    When retrieved: skip computation
buffer:
buffer:<seller>
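The example above can be sketched in a few lines of Python. This is only an illustration, not the Raindrop implementation: the function name `seller_has_ship_to`, the flat list of child start tags, and the returned token count are assumptions made for the sketch.

```python
def seller_has_ship_to(child_tags, ending_marks=frozenset({"sameAddr"})):
    """Evaluate /seller[shipTo] over the child start tags of one seller element.

    Returns (matched, tokens_examined). When ending_marks is non-empty, the scan
    stops at the first ending mark, because the schema says shipTo can no longer
    appear in this seller element.
    """
    examined = 0
    for tag in child_tags:
        examined += 1
        if tag == "shipTo":
            return True, examined
        if tag in ending_marks:
            return False, examined  # early filtering: skip the rest of the element
    return False, examined


# Hypothetical seller element whose children are sameAddr, url, url.
children = ["sameAddr", "url", "url"]
with_sqo = seller_has_ship_to(children)
without_sqo = seller_has_ship_to(children, ending_marks=frozenset())
```

With the ending mark the predicate is known to fail after the first child; without it every child has to be examined.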
Related Work
YFilter [Diao02] and XSM [Ludäscher 03]
  Use schema to determine whether pattern results are recursive, or the types of child elements
  Essentially propose general XML SQO
FluXQuery [Koch+04]
  Use schema to minimize buffer size
  Is complementary to our focus (aim to skip unnecessary computations)
SIX [Gupta+03]
  Use indices interleaved with XML data to reduce parsing
  Could be combined with our techniques
Challenge: Constraint Useful?
Schema: <!element seller((billTo,shipTo)|sameAddr, …)>
Query: /seller/shipTo
  Retrieve /shipTo
  Retrieve /sameAddr
    When retrieved: nothing to save, since /shipTo is the only pattern retrieval
Query: /seller[shipTo]/billTo
  Retrieve /shipTo
  Retrieve /billTo
  Retrieve /sameAddr
    When retrieved: nothing to save, since /billTo has already been retrieved
Challenge: Benefits/Overhead?
Maximal benefits: no beneficial optimization should be missed
  Any failed pattern should be detected as early as possible
Minimal overhead: no redundant optimization should be introduced
  Whether a particular pattern fails should not be checked repeatedly
Challenge: Plan Execution
Optimization at a lower level than query rewrite
Specific physical implementations are needed
Query: /seller[shipTo]
Schema: <!element seller((billTo,shipTo)|sameAddr, …)>
  Buffer seller element
  Retrieve /shipTo
  Retrieve /sameAddr
    When retrieved
No query can capture this optimization
Outline
SQO Technique Design
SQO Application
Execution of Optimized Plan
Experiments
Physical Implementation of Pattern Retrieval
Note: understanding the physical implementation of the stream engine is important for designing effective SQO
Our implementation: the widely used automata-based implementation [e.g., Tukwila, YFilter]
Example Query and its Automata
Query:
  for $a in /auctions/auction, $b in $a/seller[shipTo]
  where $b/*/phone=“508-123-4567”
  return <auction> for $c in $a/item where $c//keyword=“auto” return $b/*/phone </auction>
[Figure: automaton for the query. States 0, 1, 2, 3, 9, 10, 11, 12, …; transition labels auctions, auction, seller, shipTo, * (primary, secondary), phone, λ.]
Input: <auctions> <auction> … <phone> … </phone>
[Figure: stack snapshots while the input is read, top of stack first: [0]; [1][0]; [2,3][1][0]; …; [11]…[2,3][1][0]; [12#][11]…[2,3][1][0].]
#: buffering flag
Opt. opportunities:
1. avoid transitions as much as possible
2. revoke buffering flag as soon as possible
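To make the two opportunities concrete, here is a minimal sketch of a stack-based automaton of the kind shown in the figure. It is an illustration under stated assumptions: the transition table, the state numbering (loosely following the figure), and the `run` function are hypothetical, and the λ-transition to state 3 and the buffering flag are left out for brevity.

```python
# Hypothetical transition table: (state, tag) -> next state; "*" matches any tag.
TRANSITIONS = {
    (0, "auctions"): 1,
    (1, "auction"): 2,
    (2, "seller"): 9,
    (9, "shipTo"): 10,
    (9, "*"): 11,
    (11, "phone"): 12,
}


def run(events, transitions):
    """Process (kind, tag) events; return the stack contents after each event."""
    stack = [{0}]  # bottom of stack holds the start state
    trace = []
    for kind, tag in events:
        if kind == "start":
            next_states = set()
            for state in stack[-1]:
                # Every lookup here is a transition the engine has to compute;
                # SQO tries to avoid these once a pattern is known to fail.
                for key in ((state, tag), (state, "*")):
                    if key in transitions:
                        next_states.add(transitions[key])
            stack.append(next_states)
        else:  # end tag: return to the states active before the start tag
            stack.pop()
        trace.append([sorted(states) for states in stack])
    return trace


events = [
    ("start", "auctions"), ("start", "auction"), ("start", "seller"),
    ("start", "primary"), ("start", "phone"), ("end", "phone"),
    ("end", "primary"), ("end", "seller"),
]
trace = run(events, TRANSITIONS)
```

Each start tag pushes the set of states reached and each end tag pops it, so the stack depth mirrors the element nesting depth of the input.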
Is Constraint Useful for Opt.?
Constraints used to find “ending marks” of a pattern within a context element
<!element seller((billTo, shipTo)|sameAddr?, …)>
<sameAddr> is ending mark of /shipTo within seller element context
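A small sketch of how ending marks could be read off a content model of this shape follows. It assumes a simplified model (one choice group of sequences followed by a tail of further elements) and assumes the target pattern occurs inside the choice group; the function `ending_marks`, its parameters, and the `url` tail element are hypothetical.

```python
def ending_marks(target, alternatives, tail=()):
    """Return element names whose start tag means `target` cannot appear anymore.

    alternatives: the branches of a choice group, each a list of element names
                  in schema order, e.g. [["billTo", "shipTo"], ["sameAddr"]].
    tail:         element names that follow the choice group in the content model.
    Assumes `target` occurs in one of the alternatives, not in the tail.
    """
    marks = set(tail)  # order constraint: the tail comes after the whole group
    for branch in alternatives:
        if target in branch:
            # order constraint inside the branch: whatever follows the target
            marks.update(branch[branch.index(target) + 1:])
        else:
            # exclusive constraint: seeing another branch rules the target out
            marks.update(branch)
    return marks


marks = ending_marks("shipTo", [["billTo", "shipTo"], ["sameAddr"]], tail=["url"])
```

Note that `billTo` is not an ending mark of `shipTo` here, because `shipTo` may still follow it within the same branch.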
Is Constraint Useful for Opt.?
Ending mark helpful if context element can be filtered out earlier:
  Pattern may fail to appear
Query: for $a in /auctions/auction, $b in $a/seller…
  + <!element auction(seller, …)>: ending mark for $a/seller is not helpful
  + <!element auction(seller?, …)>: ending mark for $a/seller is helpful
Is Constraint Useful for Opt.?
Ending mark helpful if context element can be filtered out earlier:
  Pattern may fail to appear
  Pattern is required
Schema: <!element item (category?, desc, …)>
  + for $c in $a/item return <c>$a/category</c>: ending mark for $a/category is not helpful
  + for $c in $a/item[category] return <c>$a/category</c>: ending mark for $a/category is helpful
Is Constraint Useful for Opt.?
Ending mark helpful if
  context element can be filtered out earlier:
    Pattern may fail to appear
    Pattern is required
  and the early filtering can be beneficial:
    Transitions may happen after ending marks
    Buffering flags may be raised before ending marks
SQO Design
Helpful ending marks identified by our SQO
Three SQO rules designed using
  Occurrence constraints
  Exclusive constraints
  Order constraints
Example SQO Rule
Use occurrence constraint
Event-condition-action output by rule
Query: for $a in /auctions/auction, $b in $a/seller where $b/*/phone = “508-1234567”…
+
Schema:
  <!element seller(primary, secondary, …)>
  <!element primary (phone)>
  <!element secondary (phone)>
Event: second </phone> is encountered in a seller
Condition: $b/*/phone = “508-1234567” not satisfied yet
Action: skip the remaining computations within current seller element
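The event-condition-action rule above could be evaluated along the following lines. This is a sketch only: the event tuples (with the text content carried on the end tag), the function `filter_seller`, and the sample phone numbers are assumptions made for illustration.

```python
def filter_seller(events, wanted="508-1234567", max_phones=2):
    """Scan the events inside one seller element.

    events: list of (kind, tag, text); for simplicity the text content of an
            element is carried on its end-tag event.
    Returns (matched, events_processed).
    """
    phones_seen = 0
    matched = False
    processed = 0
    for kind, tag, text in events:
        processed += 1
        if kind == "end" and tag == "phone":
            phones_seen += 1                 # Event: a </phone> is encountered
            if text == wanted:
                matched = True
            # Occurrence constraint: at most max_phones phone elements per seller.
            if phones_seen == max_phones and not matched:   # Condition
                break                        # Action: skip the rest of this seller
    return matched, processed


# Hypothetical seller content with two non-matching phones followed by a url.
seller_events = [
    ("start", "primary", None), ("start", "phone", None),
    ("end", "phone", "508-000-0000"), ("end", "primary", None),
    ("start", "secondary", None), ("start", "phone", None),
    ("end", "phone", "508-111-1111"), ("end", "secondary", None),
    ("start", "url", None), ("end", "url", None),
]
result = filter_seller(seller_events)
```

Scanning stops at the second `</phone>`, so the remaining three events of this seller are never processed.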
Properties of SQO Application
Maximal benefits
Minimal overhead
Maximal Benefit
Definition of “rule independence”
Proof of “maximal benefits” given:
  If the rules are all independent, applying each rule to each pattern ensures maximal benefits
Minimal Overhead: Redundancy
Same-pattern redundancy: multiple ending marks adopted for the same pattern
Query: for $a in /auctions/auction, $b in $a/seller[shipTo]…
Schema: <!element seller ( shipTo?, billTo, url )>
Constraints:
  Ending mark <billTo> for $b/shipTo: <billTo> is guaranteed to capture failure of /shipTo
  Ending mark <url> for $b/shipTo: redundant
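One way to avoid same-pattern redundancy is to stop collecting ending marks at the first element that is required to appear. The sketch below assumes a plain sequence content model given as (name, required) pairs; `minimal_ending_marks` is a hypothetical helper, not the local applier of the talk.

```python
def minimal_ending_marks(target, sequence):
    """Return a non-redundant list of ending marks for `target`.

    sequence: list of (element_name, required) pairs in schema order.
    Elements after the target are collected up to and including the first
    required one; later elements can never be the first to reveal the failure.
    """
    names = [name for name, _ in sequence]
    marks = []
    for name, required in sequence[names.index(target) + 1:]:
        marks.append(name)
        if required:
            break
    return marks


seller = [("shipTo", False), ("billTo", True), ("url", True)]
marks = minimal_ending_marks("shipTo", seller)
```

If `billTo` were optional as well, both `billTo` and `url` would be kept, since `billTo` alone would no longer be guaranteed to appear.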
Minimal Overhead: Redundancy
Parent-child pattern redundancy: ending marks of child patterns filter the parent pattern early
Query: for $a in /auctions/auction, $b in $a/seller[shipTo]…
Schema (annotated “optional” and “required” on the slide):
  <!element auction (seller, bidder)>
  <!element seller (shipTo, billTo?)>
Constraints:
  <billTo> for $b/shipTo: can be used to capture failure of $a/seller[shipTo]
  <bidder> for $a/seller: redundant
SQO Application Algorithm
Input:
  XQuery represented as a tree
  XML Schema represented as a graph
Processing:
  Query tree traversed top-down
    “maximal benefits” ensured
  Each tree node processed by local/regional appliers
    Same-pattern redundancy excluded by local applier
    Parent-child pattern redundancy excluded by regional applier
Output:
  Event-condition-actions attached to tree nodes
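A much simplified sketch of such a top-down pass is given below. It is not the algorithm from the paper: the `QueryNode` class, the dictionary-based schema (sequence content models only), and the ECA triples are assumptions, and only the local (same-pattern) check is modeled, not the regional applier.

```python
from dataclasses import dataclass, field


@dataclass
class QueryNode:
    pattern: str                      # element name this node retrieves
    required: bool                    # True if the query needs the pattern to produce output
    children: list = field(default_factory=list)
    ecas: list = field(default_factory=list)


def apply_sqo(node, parent_type, schema):
    """Traverse the query tree top-down and attach ECAs to its nodes.

    schema: element type -> list of (child name, required) in schema order.
    An ending mark is attached only when the pattern is required by the query
    and may fail to appear according to the schema.
    """
    sequence = schema.get(parent_type, [])
    names = [name for name, _ in sequence]
    if node.required and node.pattern in names:
        index = names.index(node.pattern)
        if not sequence[index][1]:    # pattern is optional, so it may fail
            for name, is_required in sequence[index + 1:]:
                node.ecas.append((f"<{name}>", f"{node.pattern} not seen",
                                  "skip context element"))
                if is_required:       # later marks would be redundant
                    break
    for child in node.children:
        apply_sqo(child, node.pattern, schema)


schema = {
    "auction": [("seller", True), ("bidder", True)],
    "seller": [("shipTo", False), ("billTo", True), ("url", True)],
}
ship_to = QueryNode("shipTo", required=True)
seller = QueryNode("seller", required=True, children=[ship_to])
auction = QueryNode("auction", required=True, children=[seller])
apply_sqo(auction, "auctions", schema)
```

In this run only the `shipTo` node receives an ECA (triggered by `<billTo>`); `seller` is mandatory in `auction`, so no ending mark is attached to it.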
Encoding ECAs in Automata
E: push-in or pop-out of a state
C: pattern result buffer checked
A: actions include:
  Suspend computations by removing automata transitions
  Clean up results generated within current context element
  Prepare for recovering computation for next context element (e.g., back up transitions)
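The suspend and recover actions can be pictured with a small transition table that supports backing up and restoring entries. The class `Automaton`, its method names, and the state numbers are hypothetical; this is a sketch of the idea, not the Raindrop code.

```python
class Automaton:
    """Transition table with suspend/restore, keyed by (state, tag)."""

    def __init__(self, transitions):
        self.transitions = dict(transitions)
        self.backup = {}

    def suspend(self, states):
        """Action: remove all transitions leaving `states`, keeping a backup."""
        for key in list(self.transitions):
            if key[0] in states:
                self.backup[key] = self.transitions.pop(key)

    def restore(self):
        """Recover the suspended transitions for the next context element."""
        self.transitions.update(self.backup)
        self.backup.clear()


automaton = Automaton({
    (1, "auction"): 2,
    (2, "seller"): 9,
    (2, "item"): 5,
    (9, "shipTo"): 10,
})
automaton.suspend({2, 9})          # e.g. triggered when an ending mark is seen
suspended = dict(automaton.transitions)
automaton.restore()                # e.g. triggered when the context element ends
```

While suspended, tokens inside the current context element find no matching transition and are skipped cheaply; restoring on the pop-out event re-enables processing for the next context element.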
Example: ECAs in Automata
Query:
  for $a in /auctions/auction, $b in $a/seller[shipTo]
  where $b/*/phone=“508-123-4567”
  return <auction> for $c in $a/item … </auction>
[Figure: automaton with states 0, 1, 2, 3, 5, 9, 10, 11, 12, 13; transition labels auctions, auction, seller, item, shipTo, sameAddr, * (primary, secondary), phone. Annotations: (1, startTag, none, state 2), …, (…, state 3).]
Input: …<auction> <seller> <sameAddr> </sameAddr> <item> </item> <primary> </primary> …
Event: 1st <sameAddr> encountered
Condition: none
Action: cut all transitions from
  1. q2
  2. States reachable via λ: q3
  3. States between q2 and q13: q9
Optimization Affected by?
• How often pattern fails (pattern selectivity)
• How much gain each early filtering brings (unit gain)
[Figure: execution time ratio (without SQO / with SQO), on a scale of 0 to 5, plotted against selectivity of the pattern with ending marks (0% to 100%), for minor, medium, and major unit gain.]
Necessity of Design Guideline
[Figure: plotted against selectivity of the pattern with the only useful ending mark. Plans compared:]
  Plan without SQO
  Plan with SQO (1 ending mark)
  Plan with SQO but no guideline considered (30 ending marks)
Conclusion
First SQO on streaming XML
Support SQO on nested XQuery with “*” or “//”
Offer criteria for “useful” constraints
Ensure maximal benefits and minimal overhead in SQO application
Provide execution strategy in the widely used automata-based model
Implement SQO optimizer in Raindrop system (VLDB’04 demo)
Experimentally demonstrate that SQO brings significant improvement with little overhead
Visit our XQuery engine over XML stream
project (RAINDROP) website
http://davis.wpi.edu/dsrg/raindrop/
Supported by USA National Science Foundation and IBM PhD Fellowship