Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams

Posted on 01-Feb-2016


Raindrop:

An Algebra-Automata Combined XQuery Engine over XML Streams

Hong Su, Elke Rundensteiner, Murali Mani, Ming Li

Worcester Polytechnic Institute

Worcester, MA

VLDB 2004

Stream Processing

(Figure: data sources, networks, and data requesters.)

What’s Special about XML Stream Processing

Token-by-token access manner (figure: tokens such as <auctions> arriving along a timeline)

Queries combine pattern retrieval, filtering, and restructuring:

for $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bidder[sameAddr]
where $b/*/phone = "508"
return <auction>{ $b, $c }</auction>

A token is not the counterpart of a self-contained tuple.

Pattern Retrieval on Token Streams

(Figure: the tokens <auction>, <seller>, <primary>, <phone> arrive one at a time.)
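Matching a path such as $a/seller/*/phone on tokens means remembering just enough context as the tokens arrive. The Python sketch below illustrates this under stated assumptions: the XML has no attributes, `tokenize` and `match_path` are hypothetical helpers introduced here, and a plain stack of open tags stands in for an automaton. It is not Raindrop's implementation.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # ("start" | "end" | "text", value)

def tokenize(xml: str) -> Iterator[Token]:
    """Split attribute-free XML into start-tag, end-tag and text tokens."""
    for slash, tag, text in re.findall(r"<(/?)(\w+)>|([^<]+)", xml):
        if tag:
            yield ("end" if slash else "start", tag)
        elif text.strip():
            yield ("text", text.strip())

def match_path(tokens: Iterable[Token], path: List[str]) -> List[str]:
    """Collect the text of elements whose tag path ends with `path`
    ("*" matches any tag). Only the stack of open tags is remembered,
    and each token is examined once, in document order."""
    stack: List[str] = []
    found: List[str] = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            stack.pop()
        elif len(stack) >= len(path) and all(
            step in ("*", tag) for step, tag in zip(path, stack[-len(path):])
        ):
            found.append(value)
    return found

DATA = ("<auctions><auction><seller><primary><phone>508</phone></primary>"
        "<secondary><phone>613</phone></secondary></seller></auction></auctions>")

if __name__ == "__main__":
    print(match_path(tokenize(DATA), ["auction", "seller", "*", "phone"]))  # ['508', '613']
```

The wildcard step is what lets both <primary><phone> and <secondary><phone> match without buffering either subtree.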

Two Computation Paradigms

Automata-based [yfilter, xscan, xsm, xsq, xpush…]

Algebraic [niagara00, …]

for $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bidder[sameAddr]
where $b/*/phone = "508"
return <auction>{ $b, $c }</auction>

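Automata: (figure: an automaton with numbered states 1 to 9 whose transitions are labeled auction, *, seller, homepage, *, phone, bid, bidder, and sameAddr)

Algebra (bottom-up):
Navigate stream(bids), //auction -> $a
Navigate $a, /seller -> $b
Navigate $a, /bidder -> $c
Tagger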

Comparison of Two Paradigms

Either paradigm has deficiencies, and the two complement each other:

Automata paradigm                               | Algebra paradigm
Good for pattern retrieval on tokens            | Does not support token inputs
Needs patches for filtering and restructuring   | Good for filtering and restructuring
Presents all details on the same low level      | Supports multiple descriptive levels (e.g., logical plan, physical plan)
Little studied as a query processing paradigm   | Well studied as a query processing paradigm

Four-Level Algebraic Framework

Level I, Semantics-Focused Plan: express the semantics of the query regardless of input sources.
Level II, Stream Logical Plan: accommodate tokenized streams and automata computation.
Level III, Stream Physical Plan: describe the implementation details of operators.
Level IV, Stream Execution Plan: decide how an operator is invoked (scheduling).

Abstraction level: from high (declarative) at Level I to low (procedural) at Level IV.

The Raindrop framework is designed to integrate both paradigms into one.

Level I: Semantics-Focused Plan

Express query semantics regardless of whether the input sources are stored or streamed [Rainbow-ZPR02]

Reuse existing general optimization techniques: decorrelation, cancelling duplicate navigation operators, …
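Cancelling duplicate navigation operators can be pictured as common-subexpression elimination over navigation steps. The sketch below is one possible reading, not Rainbow's or Raindrop's code: it assumes a plan is a bottom-up list of (input variable, path, output variable) triples, and the function name and sample plan are hypothetical.

```python
from typing import Dict, List, Tuple

Nav = Tuple[str, str, str]  # (input variable, path, output variable)

def cancel_duplicate_navigations(plan: List[Nav]) -> Tuple[List[Nav], Dict[str, str]]:
    """Keep the first navigation for each (input, path) pair; later duplicates
    are dropped and their output variables mapped to the surviving one."""
    first: Dict[Tuple[str, str], str] = {}
    rename: Dict[str, str] = {}
    kept: List[Nav] = []
    for src, path, out in plan:
        src = rename.get(src, src)  # follow renames made by earlier cancellations
        if (src, path) in first:
            rename[out] = first[(src, path)]
        else:
            first[(src, path)] = out
            kept.append((src, path, out))
    return kept, rename

# Hypothetical plan in which $a/seller is navigated twice.
PLAN: List[Nav] = [
    ("stream(bids)", "//auction", "$a"),
    ("$a", "/seller", "$b"),
    ("$a", "/seller", "$s"),
    ("$s", "/*/phone", "$p"),
]

if __name__ == "__main__":
    kept, rename = cancel_duplicate_navigations(PLAN)
    print(kept)    # [('stream(bids)', '//auction', '$a'), ('$a', '/seller', '$b'), ('$b', '/*/phone', '$p')]
    print(rename)  # {'$s': '$b'}
```

The returned rename map is what the operators above the navigations (selections, tagging) would use to refer to the surviving variable.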

Example Semantics-Focused Plan

Query:
for $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
where $b/*/phone = "508"
return <auction>{ $b, $c }</auction>

Stream data:
<auctions>
  <auction>
    <seller>
      <primary><phone>508</phone></primary>
      <secondary><phone>613</phone></secondary>
    </seller>
    <bid><bidder>…</bidder><bidder>…</bidder></bid>
  </auction>
  …

Plan (bottom-up) and the tuples each operator outputs:
NavUnnest stream(bids), //auction -> $a: (source = <auctions>…</auctions>, $a = <auction>…</auction>)
NavUnnest $a, /seller -> $b: (source, $a, $b = <seller>…</seller>)
NavUnnest $a, /bid/bidder -> $c: (source, $a, $b, $c = <bidder>…</bidder>)
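To make the tuples in this plan concrete, here is a minimal Python sketch of NavUnnest followed by a WHERE selection over already-parsed XML, using the standard-library xml.etree.ElementTree. Assumptions: the bidder elements and their id attributes are invented placeholders for the elided bidder content, the [homepage] and [sameAddr] predicates are omitted just as they are in the plan, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, ET.Element]

def nav_unnest(rows: Iterable[Row], src: str, path: str, out: str) -> Iterator[Row]:
    """NavUnnest src, path -> out: emit one tuple per node reachable via path."""
    for row in rows:
        for node in row[src].findall(path):
            yield {**row, out: node}

def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Keep only the tuples satisfying pred (the WHERE clause)."""
    return (row for row in rows if pred(row))

def run_plan(xml: str) -> List[Row]:
    rows: Iterable[Row] = [{"source": ET.fromstring(xml)}]
    rows = nav_unnest(rows, "source", ".//auction", "$a")  # stream(bids)//auction
    rows = nav_unnest(rows, "$a", "seller", "$b")          # $a/seller
    rows = nav_unnest(rows, "$a", "bid/bidder", "$c")      # $a/bid/bidder
    # WHERE $b/*/phone = "508": true if any phone matches (existential comparison)
    rows = select(rows, lambda r: any(p.text == "508" for p in r["$b"].findall("*/phone")))
    return list(rows)

DATA = ("<auctions><auction><seller><primary><phone>508</phone></primary>"
        "<secondary><phone>613</phone></secondary></seller>"
        "<bid><bidder id='1'/><bidder id='2'/></bid></auction></auctions>")

if __name__ == "__main__":
    for row in run_plan(DATA):
        print(row["$b"].tag, row["$c"].get("id"))  # "seller 1", then "seller 2"
```

Note that this sketch needs the whole document parsed before navigating, which is exactly what a token stream does not provide; that gap is what the stream logical plan addresses.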

Level II: Stream Logical Plan

Extend the semantics-focused plan to accommodate tokenized stream inputs:

New input data format: tokens
New operators: StreamSource, TokenNavigate, ExtractUnnest, ExtractNest, StructuralJoin
New rewrite rules: push into / pull out of automata

One Uniform Algebraic View

(Figure: Algebraic Stream Logical Plan. The XML data stream enters a token-based plan (automata plan), whose output tuple stream feeds a tuple-based plan that produces the query answer.)

Modeling Automata in Algebraic Plan: Black Box [XScan01] vs. White Box

$a := stream(bids)//auction
$b := $a/seller
$c := $a/bid/bidder

Black box: XScan (one opaque operator)

White box (bottom-up):
TokenNavigate stream(bids), //auction -> $a
TokenNavigate $a, /seller -> $b
TokenNavigate $a, /bid/bidder -> $c
ExtractUnnest $a, $b
ExtractUnnest $a, $c
StructuralJoin $a

for $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
where $b/*/phone = "508"
return <auction>{ $b, $c }</auction>
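The white-box model can be sketched in Python as a stack of automaton state sets (TokenNavigate), per-binding token buffers (ExtractUnnest), and a Cartesian combination of the $b and $c bindings found inside each $a (StructuralJoin). This is an illustrative sketch under assumptions, not Raindrop's code: attribute-free XML, a hypothetical regex tokenizer, a hypothetical PATTERNS table for the three bindings, invented bidder content, and pattern trees only two levels deep.

```python
import re
from itertools import product
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Token = Tuple[str, str]  # ("start" | "end" | "text", value)
State = Tuple[str, int]  # (variable, index of the next path step to match)

# Hypothetical pattern table: variable -> (anchor variable, [(axis, tag), ...]).
PATTERNS: Dict[str, Tuple[Optional[str], List[Tuple[str, str]]]] = {
    "$a": (None, [("desc", "auction")]),                    # stream(bids)//auction
    "$b": ("$a", [("child", "seller")]),                    # $a/seller
    "$c": ("$a", [("child", "bid"), ("child", "bidder")]),  # $a/bid/bidder
}

def tokenize(xml: str) -> Iterator[Token]:
    """Split attribute-free XML into start-tag, end-tag and text tokens."""
    for slash, tag, text in re.findall(r"<(/?)(\w+)>|([^<]+)", xml):
        if tag:
            yield ("end" if slash else "start", tag)
        elif text.strip():
            yield ("text", text.strip())

def render(tokens: List[Token]) -> str:
    return "".join(f"</{v}>" if k == "end" else f"<{v}>" if k == "start" else v
                   for k, v in tokens)

def anchored_on(var: Optional[str]) -> Set[State]:
    """Initial states of every pattern whose anchor is var."""
    return {(v, 0) for v, (anchor, _) in PATTERNS.items() if anchor == var}

def run(tokens: Iterable[Token]) -> Iterator[Dict[str, str]]:
    stack: List[Set[State]] = [anchored_on(None)]  # states to try on children of each open element
    open_recs: List[dict] = []                     # matched elements still being extracted
    depth = 0
    for kind, value in tokens:
        if kind == "start":
            depth += 1
            nxt: Set[State] = set()
            for var, i in stack[-1]:
                steps = PATTERNS[var][1]
                axis, tag = steps[i]
                if axis == "desc":
                    nxt.add((var, i))                # descendant axis: keep waiting deeper
                if tag != value:
                    continue
                if i + 1 < len(steps):
                    nxt.add((var, i + 1))            # TokenNavigate: advance one step
                else:                                # TokenNavigate: binding found here
                    deps = sorted(v for v, _ in anchored_on(var))
                    open_recs.append({"var": var, "depth": depth, "tokens": [],
                                      "children": {d: [] for d in deps}})
                    nxt |= anchored_on(var)
            stack.append(nxt)
        for rec in open_recs:                        # ExtractUnnest: buffer tokens
            rec["tokens"].append((kind, value))
        if kind == "end":
            stack.pop()
            while open_recs and open_recs[-1]["depth"] == depth:
                rec = open_recs.pop()
                text = render(rec["tokens"])
                anchor = PATTERNS[rec["var"]][0]
                if anchor is None:                   # StructuralJoin on the anchor element
                    deps = list(rec["children"])
                    for combo in product(*rec["children"].values()):
                        yield {rec["var"]: text, **dict(zip(deps, combo))}
                else:                                # hand the binding to its enclosing anchor
                    owner = next(r for r in reversed(open_recs) if r["var"] == anchor)
                    owner["children"][rec["var"]].append(text)
            depth -= 1

DATA = ("<auctions><auction><seller><primary><phone>508</phone></primary>"
        "<secondary><phone>613</phone></secondary></seller>"
        "<bid><bidder>x</bidder><bidder>y</bidder></bid></auction></auctions>")

if __name__ == "__main__":
    for row in run(tokenize(DATA)):
        print(row["$c"])  # <bidder>x</bidder>, then <bidder>y</bidder>
```

Because the $b and $c bindings are only joined when </auction> arrives, each output tuple is emitted as soon as its enclosing $a is complete, without materializing the whole stream.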

Data Model in Algebraic Plan Modeling Automata

(Figure: the white-box plan from StreamSource up through TokenNavigate stream(bids), //auction -> $a; TokenNavigate $a, /seller -> $b; TokenNavigate $a, /bid/bidder -> $c; ExtractUnnest $a, $b; ExtractUnnest $a, $c; and StructuralJoin $a. The edges are annotated with the data that flows along them: token sequences such as <auctions> <auction> <seller> <primary> <phone> 508 </phone> </primary> … and <bidder> <bidderid> 0314 … below the ExtractUnnest operators, and extracted elements such as <seller>…</seller> and <bidder>…</bidder> above them.)

For details of Levels III and IV, please refer to:

“Automaton Meets Query Algebra: Towards a Unified Model for XQuery Evaluation over XML Data Streams”, ER 2003

“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, CIKM 2003

“Raindrop: A Uniform and Layered Algebraic Framework for XQueries on XML Streams”, Journal Submission 2004

Optimization I: Computation Into or Out of Automata?

Semantics-focused plan (bottom-up):
NavUnnest stream(bids), //auction -> $a
NavigateUnnest $a, /seller -> $b
NavigateUnnest $a, /bid/bidder -> $c

Automata plan with navigation pulled out of automata (bottom-up):
TokenNavigate stream(bids), //auction -> $a
ExtractUnnest stream(bids), $a
NavigateUnnest $a, /seller -> $b
NavigateUnnest $a, /bid/bidder -> $c

Automata plan with navigation pushed into automata (bottom-up):
TokenNavigate stream(bids), //auction -> $a
TokenNavigate $a, /seller -> $b
TokenNavigate $a, /bid/bidder -> $c
ExtractUnnest $a, $b
ExtractUnnest $a, $c
StructuralJoin $a
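The push-into rewrite between the two automata plans is mechanical; whether to apply it is the question this slide's title poses. The Python sketch below rewrites the out-of-automata plan into the into-automata plan. It is a toy under assumptions: operators are plain tuples listed bottom-up, the function name is hypothetical, and it assumes no operator above needs the whole $a element once $a is no longer extracted in full.

```python
from typing import List, Set, Tuple

Op = Tuple[str, ...]  # (operator name, *arguments); plans are listed bottom-up

def push_into_automata(plan: List[Op]) -> List[Op]:
    """Turn each NavigateUnnest over an automaton-bound variable into
    TokenNavigate + ExtractUnnest, and join the extracted parts with
    StructuralJoin on that variable."""
    token_vars: Set[str] = {op[3] for op in plan if op[0] == "TokenNavigate"}
    navs: List[Op] = []
    extracts: List[Op] = []
    anchors: List[str] = []
    rest: List[Op] = []
    for op in plan:
        if op[0] == "NavigateUnnest" and op[1] in token_vars:
            _, src, path, out = op
            navs.append(("TokenNavigate", src, path, out))
            extracts.append(("ExtractUnnest", src, out))
            if src not in anchors:
                anchors.append(src)
        else:
            rest.append(op)
    # The anchor is no longer extracted whole (assumes nothing above needs it).
    rest = [op for op in rest if not (op[0] == "ExtractUnnest" and op[2] in anchors)]
    automaton = [op for op in rest if op[0] == "TokenNavigate"]
    others = [op for op in rest if op[0] != "TokenNavigate"]
    joins = [("StructuralJoin", a) for a in anchors]
    return automaton + navs + extracts + joins + others

OUT_OF_AUTOMATA: List[Op] = [
    ("TokenNavigate", "stream(bids)", "//auction", "$a"),
    ("ExtractUnnest", "stream(bids)", "$a"),
    ("NavigateUnnest", "$a", "/seller", "$b"),
    ("NavigateUnnest", "$a", "/bid/bidder", "$c"),
]

if __name__ == "__main__":
    for op in push_into_automata(OUT_OF_AUTOMATA):
        print(op)
```

The reverse (pull-out) rewrite would undo these steps; the sketch does not model the cost estimate that decides between them.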

Experimental Results

(Figure: Execution Time on 85M XML Stream Under Various Selectivity. x-axis: selectivity of selection, 0.1 to 0.9; y-axis: execution time (ms), 25,000 to 50,000; one curve each for 1 Nav, 2 Navs, 3 Navs, 4 Navs, and 5 Navs.)

Optimization II: Semantic Query Optimization

General schema-based optimizations: eliminate predicates/joins, …; focus on operators manipulating flat values.

XML-specific schema-based optimizations: focus on pattern retrieval; they fall into two categories:

General XML SQO: minimize the query tree [YCL+-AT&T 01]

Stream XML SQO (our focus)

Stream-Specific XML SQO

Observations: pattern retrieval over tokens relies solely on document-order traversal, and schema constraints help expedite document-order traversal.

State of the art: [XPush03] covers a limited query class (Boolean XPath matching) and one type of constraint.

Our goals: support a more powerful query language (XQuery) and more types of constraints (XML Schema).

Step I: Construct Query Graph

(a) Example query:

for $a in stream(bids)//auction,
    $b in $a/seller[homepage],
    $c in $a/bid/bidder[sameAddr]
where $b/*/phone = "508"
return <auction>{ $b, $c }</auction>

(Figure: (b) the query tree for this query, and an example XML Schema.)

Step II: Apply Optimization Rules

Offer optimization rules that utilize occurrence constraints, exclusive constraints, and order constraints.

Apply the rules in an order that ensures no beneficial rule is missed and no redundant rule is introduced.

Step III: Translate Rewritten Query Graph Back to Plan (I): Utilize Occurrence Constraints

When </phone> is encountered for the second time, check /*/phone: if it fails the predicate, suspend states s2 and s3.

Step III: Translate Rewritten Query Graph Back to Plan (II): Utilize Exclusive Constraints

When <billTo> or <shipTo> is encountered once, suspend states s2 and s9.

Step III: Translate Rewritten Query Graph Back to Plan (III): Utilize Order Constraints

When <primary> is encountered once, check /homepage: if it is absent, suspend states s10, s3, and s2.
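The order-constraint rule can be sketched in Python, assuming a hypothetical schema in which an optional <homepage> must precede <primary> inside <seller>. When <primary> starts and no <homepage> has been seen, seller[homepage] can no longer hold, so the remaining tokens of that seller are skipped. The helper names, sample data, and skip counter are illustrative assumptions standing in for the suspension of states such as s10, s3, and s2, not Raindrop's code.

```python
import re
from typing import Iterable, Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # ("start" | "end" | "text", value)

def tokenize(xml: str) -> Iterator[Token]:
    """Split attribute-free XML into start-tag, end-tag and text tokens."""
    for slash, tag, text in re.findall(r"<(/?)(\w+)>|([^<]+)", xml):
        if tag:
            yield ("end" if slash else "start", tag)
        elif text.strip():
            yield ("text", text.strip())

def check_homepage(tokens: Iterable[Token]) -> List[Tuple[bool, int]]:
    """For each <seller>, report whether [homepage] held and how many tokens
    were skipped after its matching work was suspended."""
    results: List[Tuple[bool, int]] = []
    stack: List[str] = []           # open element tags
    seller: Optional[dict] = None   # bookkeeping for the currently open <seller>
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if value == "seller":
                seller = {"depth": len(stack), "homepage": False,
                          "suspended": False, "skipped": 0}
                continue
        if seller is not None:
            closes = kind == "end" and len(stack) == seller["depth"]
            if seller["suspended"] and not closes:
                seller["skipped"] += 1  # suspended: no matching work on this token
            elif kind == "start" and len(stack) == seller["depth"] + 1:
                if value == "homepage":
                    seller["homepage"] = True
                elif value == "primary" and not seller["homepage"]:
                    # Order constraint: <homepage> can no longer appear in this seller.
                    seller["suspended"] = True
            if closes:
                results.append((seller["homepage"], seller["skipped"]))
                seller = None
        if kind == "end":
            stack.pop()
    return results

DATA = ("<auctions><auction><seller><primary><phone>508</phone></primary></seller></auction>"
        "<auction><seller><homepage>h</homepage><primary><phone>613</phone></primary>"
        "</seller></auction></auctions>")

if __name__ == "__main__":
    print(check_homepage(tokenize(DATA)))  # [(False, 4), (True, 0)]
```

Unlike the occurrence rule, this check needs no counting: the decision is made on the very first token that the schema says must come after <homepage>.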

http://davis.wpi.edu/dsrg/raindrop/

suhong@cs.wpi.edu

Thanks to the WPI DSRG Rainbow Team for XAT Algebra support
