1 Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang.

  • date post

    22-Dec-2015

Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)

Xin Zhang

Outline

  • XAT Decorrelation.
  • Optimization
      • XAT Computation Pushdown.
      • XAT Data Model Cleanup.
      • XAT Cutting.
  • Conclusion & Future Works.

XAT Decorrelation

  • XQuery queries are correlated.
  • Decorrelation is required for optimization:
      • XAT Computation Pushdown.
      • XAT Data Model Cleanup.
      • XAT Cutting.

Three Kinds of Decorrelation

  • Simple Decorrelation
      • No additional sources.
      • No aggregate functions.
  • Complex Decorrelation with Additional Sources
  • Complex Decorrelation with Aggregate Functions

<!ELEMENT prices (book*)>
<!ELEMENT book (title, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</prices>

Example* of XML Use Cases.

Simple Query Example

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  return
    <minprice> $t </minprice>
} </results>

In the document "prices.xml", find the book titles.

XAT tree operators:

T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice>[$t]</minprice>):col1

Simple Decorrelation

Linearize the tree: T[FOR(CB, T2[])[T1[S1]]] → T[T2[T1[S1]]]

Before decorrelation:

T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice>[$t]</minprice>):col1

After decorrelation:

T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
T (<minprice>[$t]</minprice>):col1

Is Simple Decorrelation Right?

  • Every operator except Groupby has the semantics of "for each" tuple in the input table.
  • Hence, the FOR operator can be omitted in the simple decorrelation scenario.
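
The argument can be illustrated with a minimal Python sketch. It assumes XAT tables are modelled as lists of dicts; `distinct_titles`, `tagger`, `with_for`, and `without_for` are hypothetical stand-ins for the slide's operators, not Rainbow code. Because the Tagger already works tuple by tuple, looping over the bindings with FOR and handing the whole table to the Tagger produce the same result.

```python
# Hypothetical stand-ins for XAT operators; a table is a list of dict tuples.

def distinct_titles(titles):
    """distinct(col2):$t : one tuple per distinct title, in first-seen order."""
    seen, out = set(), []
    for title in titles:
        if title not in seen:
            seen.add(title)
            out.append({"$t": title})
    return out


def tagger(table):
    """T(<minprice>[$t]</minprice>):col1 : applied to each input tuple."""
    return [dict(row, col1=f"<minprice>{row['$t']}</minprice>") for row in table]


def with_for(table):
    """Correlated form: FOR($t) runs the inner Tagger once per binding."""
    out = []
    for row in table:
        out.extend(tagger([row]))
    return out


def without_for(table):
    """Decorrelated form: the Tagger consumes the whole table directly."""
    return tagger(table)


titles = [
    "TCP/IP Illustrated",
    "TCP/IP Illustrated",
    "Data on the Web",
    "Data on the Web",
]
bindings = distinct_titles(titles)
same_result = with_for(bindings) == without_for(bindings)
```

Groupby is the exception because it looks at several tuples at once, so splitting its input per binding would change its result.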

Two Types of Navigate

  • Navigate Unnesting: U
      • Unnests the parent-children relationship and duplicates the parent value for each child.
  • Navigate Collection: C
      • Nests the parent-children relationship: creates a collection of children but keeps the single parent.

Where to Use the Two Types

  • Navigate Unnesting: U
      • FOR binding.
  • Navigate Collection: C
      • LET binding.
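
The difference between the two Navigates can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: tables are lists of dicts, and `navigate_unnest` and `navigate_collect` are hypothetical names, not Rainbow functions. The sample document is a shortened version of the prices example.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<prices>"
    "<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    "<book><title>Data on the Web</title><price>34.95</price></book>"
    "<book><title>Data on the Web</title><price>39.95</price></book>"
    "</prices>"
)


def navigate_unnest(table, src, path, out):
    """Navigate Unnesting (U): one output tuple per child, parent duplicated."""
    result = []
    for row in table:
        for child in row[src].findall(path):
            result.append({**row, out: child})
    return result


def navigate_collect(table, src, path, out):
    """Navigate Collection (C): one output tuple per parent, children kept as a list."""
    return [{**row, out: row[src].findall(path)} for row in table]


source = [{"R1": DOC}]
for_rows = navigate_unnest(source, "R1", "book/title", "$t")    # FOR-style binding
let_rows = navigate_collect(source, "R1", "book/title", "$ts")  # LET-style binding
```

Here `for_rows` holds three tuples that all repeat the same `R1` value, while `let_rows` holds a single tuple whose `$ts` column is a collection of three titles.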

Complex Query Example

<results> {
  for $t in distinct(document("prices.xml")/book/title),
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, $b/price </minprice>
} </results>

In the document "prices.xml", find the book title and its prices.

XAT tree operators:

c($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
c($b, title):col3

Complex Decorrelation with Additional Source

T[FOR(CB, T2[S2])[T1[S1]]] → T[T2[[T1[S1],S2]]]

Before decorrelation:

c($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
c($b, title):col3

After decorrelation:

C($b, price):col4
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()

Full Query Example

<results> {
  for $t in distinct(document("prices.xml")/book/title),
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, <price>min($b/price/text())</price> </minprice>
} </results>

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.

XAT tree operators:

c($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
c($b, title):col3
min(col4):col5

Complex Query Decorrelation with One Aggregation Function

T[FOR(CB, T2[Agg(T3[])])[T1[S1]]] → T[(DM(T1))[T1,T2[(DM(T1),Agg(T3[[Distinct(T1[S1]), S2]))]]]

DM(T1) is the data model computed from T1.

[Diagram: the XAT tree before decorrelation, with nodes T, FOR($rate), Agg(), T1, T2, T3, S1, S2; and after decorrelation, with nodes T, T1, T2, T3, Groupby(DM(T1), Agg()), Distinct, S1, S2.]

The Query after Decorrelation

Before decorrelation:

c($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
FOR($t)
Agg()
T (<minprice> [$t], <price>[col5]</price></minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
c($b, title):col3
min(col4):col5

After decorrelation:

C($b, price/text()):col4
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)
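
The effect of this decorrelation can be sketched in Python on the prices data. This is a sketch under assumptions, not the Rainbow implementation: books are modelled as `(title, price)` pairs, and `correlated` and `decorrelated` are hypothetical function names. The correlated form re-scans the books for every `$t`; the decorrelated form joins the distinct titles with the books once and then groups by title before taking the minimum.

```python
from collections import defaultdict

BOOKS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]


def distinct(values):
    """Keep the first occurrence of each value, preserving order."""
    return list(dict.fromkeys(values))


def correlated(books):
    """FOR form: the inner query is evaluated once per binding of $t."""
    result = []
    for t in distinct(title for title, _ in books):
        prices = [price for title, price in books if title == t]
        result.append((t, min(prices)))
    return result


def decorrelated(books):
    """Join the distinct titles with the second source, then group by title."""
    titles = distinct(title for title, _ in books)
    joined = [(t, price) for t in titles for title, price in books if title == t]
    groups = defaultdict(list)
    for t, price in joined:
        groups[t].append(price)
    return [(t, min(prices)) for t, prices in groups.items()]


per_title_min = decorrelated(BOOKS)
```

Both functions return the same list; the group-by on the title column is what lets `min` see only the prices that belong to one title after the FOR loop is gone.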

Where are we?

  • XAT Decorrelation.
  • Optimization
      • XAT Computation Pushdown.
      • XAT Data Model Cleanup.
      • XAT Cutting.
  • Conclusion & Future Works.

XAT Computation Pushdown

  • Goal: push the execution into the relational database.
  • Steps:
      • Push Navigation down.
      • Cancel out Navigation and Tagger.
      • Generate the SQL statement.

Navigation Pushdown

  • A Navigation can be pushed down through all the operators until it has a dependency on its child operator.
  • Example rewriting rules:
      • (x1, path):x2[(y1, path):y2[T]] → (y1, path):y2[(x1, path):x2[T]]   (x1 != y2)
      • (x1, path):x2[(c) [T]] → (c) [(x1, path):x2[T]]
      • (x1, path):x2[[T1, T2]] → [T1, (x1, path):x2[T2]]   (if x1 in DM(T2))
      • (x1, path):x2[[T1, T2]] → [(x1, path):x2[T1], T2]   (if x1 in DM(T1))
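
Two of these rules can be sketched in Python as a rewrite over a small operator tree. The classes `Source`, `Navigate`, and `Select` and the function `push_down` are hypothetical names chosen for this illustration, and reading the `(c)[T]` rule as a selection with condition `c` is an assumption. The sketch covers the Navigate-over-Navigate and Navigate-over-Select rules; the two-input rules are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Source:
    out: str


@dataclass(frozen=True)
class Navigate:
    src: str       # column the navigation starts from (x1)
    path: str
    out: str       # column the navigation produces (x2)
    child: "Node"


@dataclass(frozen=True)
class Select:
    cond: str
    child: "Node"


Node = Union[Source, Navigate, Select]


def push_down(node):
    """Apply one pushdown step at the root; return the node unchanged if no rule fits."""
    if isinstance(node, Navigate):
        child = node.child
        # Rule 1: swap with a child Navigate unless we consume what it produces.
        if isinstance(child, Navigate) and node.src != child.out:
            inner = Navigate(node.src, node.path, node.out, child.child)
            return Navigate(child.src, child.path, child.out, inner)
        # Rule 2: move below a selection, which produces no new column.
        if isinstance(child, Select):
            inner = Navigate(node.src, node.path, node.out, child.child)
            return Select(child.cond, inner)
    return node


independent = Navigate("$b", "price", "col4",
                       Navigate("$b", "title", "col3", Source("$b")))
dependent = Navigate("col3", "text()", "col9",
                     Navigate("$b", "title", "col3", Source("$b")))
over_select = Navigate("$b", "price", "col4", Select("col3=$t", Source("$b")))

pushed = push_down(independent)        # the two Navigates swap places
blocked = push_down(dependent)         # unchanged: it reads col3 from its child
below_select = push_down(over_select)  # the Select ends up on top
```

The `dependent` case is the stopping condition from the first bullet: the upper Navigate starts from the column its child produces, so it cannot move below it.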

Navigation Pushdown Example

XAT tree operators (the slide lists the same operators before and after the pushdown):

C($b, price/text()):col4
T (<minprice> [$t], [col4]</minprice>):col1
S(“prices.xml”):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S(“prices.xml”):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)

Navigation/Tagger Cancel Out

  • Used to simplify a composite XAT tree.
  • Transformation rule:
      • (x, /):y[T(<tag>[z]</tag>):x[s]] → s
  • Note: type analysis is also used for the cancel out.
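
Why the pair cancels can be sketched at the data level in Python. The sketch rests on assumptions: an element is modelled as a `(tag, content)` pair, the `/` path is read as "return the content of the element", and `tagger` and `navigate_content` are hypothetical names. Tagging column `z` into `x` and then navigating back into `x` only reproduces what `z` already held.

```python
def tagger(table, tag, src, out):
    """T(<tag>[src]</tag>):out : wrap each tuple's src value in a new element."""
    return [{**row, out: (tag, row[src])} for row in table]


def navigate_content(table, src, out):
    """(src, /):out : step into the element and return its content."""
    return [{**row, out: row[src][1]} for row in table]


s = [{"z": "TCP/IP Illustrated"}, {"z": "Data on the Web"}]
round_trip = navigate_content(tagger(s, "tag", "z", "x"), "x", "y")

# Column y carries exactly the values of z, so both operators can be dropped
# as long as later operators read z wherever they used to read y.
cancels = all(row["y"] == row["z"] for row in round_trip)
```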

View Query Example

<DB>
  <book>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>34.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>39.95</price>
    </row>
  </book>
</DB>

<prices> {
  for $row in distinct(DXV/book/row),
  return
    <book> $row/title, $row/price </book>
} </prices>

XAT tree operators:

T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

Cancel Out Example (1)

Before:

C($b, price/text()):col4
S(“prices.xml”):R2
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

Rule: (x, y)[op():x[s]] → op():y[s]

Cancel Out Example (2)

Before:

C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8

Cancel Out Example (3)

Before:

C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8

Cancel Out Example (4)

Before:

C($b, price):temp1
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8
C(temp1, text()):col4

After:

...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4

SQL Generation

  • Find a pattern in the XAT.
  • Translate that pattern into a SQL operator that will access the relational database.

SQL Generation Example

Before:

...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4

After:

...
SQL(select title as col3, price as temp1 from book):{col3, temp1}
C(temp1, text()):col4
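
What the generated SQL operator would hand back to the rest of the XAT can be sketched with Python's standard `sqlite3` module. The in-memory `book` table, its column types, and the sample rows are assumptions made for this illustration; only the SELECT statement comes from the slide.

```python
import sqlite3

ROWS = [
    ("TCP/IP Illustrated", 65.95),
    ("TCP/IP Illustrated", 65.95),
    ("Data on the Web", 34.95),
    ("Data on the Web", 39.95),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, price REAL)")
conn.executemany("INSERT INTO book (title, price) VALUES (?, ?)", ROWS)

# The SQL operator replaces the Source and the three Navigates over DXV.
cursor = conn.execute("SELECT title AS col3, price AS temp1 FROM book")
columns = [description[0] for description in cursor.description]
table = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()
```

The resulting `table` has the data model `{col3, temp1}`, which is what the remaining `C(temp1, text()):col4` operator consumes.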

Where are we?

  • XAT Decorrelation.
  • Optimization
      • XAT Computation Pushdown.
      • XAT Data Model Cleanup.
      • XAT Cutting.
  • Conclusion & Future Works.

XAT Data Model Cleanup

  • By default, each operator appends one additional column to the data model.
  • Used to help:
      • Execution: optimizes the data storage during execution.
      • Cutting: gets rid of the unused operators in the XQuery.
  • Equation for Data Model Cleanup
      • Only keep the columns required by ancestors.
      • DM := (DMp - Pp) ∪ Cp ∪ (P - C)
      • P and C are the columns an operator produces and consumes; the subscript p refers to its parent operator.

Data Model Example

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node  Operator              Produce    Consume  DM before                DM after
1     Agg()                 {}         {}       {$prices, R1, $b, col1}  {}
2     ($b,):col1            {col1}     {$b}     {$prices, R1, $b, col1}  {col1}
3     C($b, price):$prices  {$prices}  {$b}     {$prices, R1, $b}        {$b, $prices}
4     (R1, /book):$b        {$b}       {R1}     {R1, $b}                 {$b}
5     S(“prices.xml”):R1    {R1}       {}       {R1}                     {R1}

DM := (DMp - Pp) ∪ Cp ∪ (P - C)
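
The "DM after" column can be recomputed with a short Python sketch. It assumes that the two operators lost from the equation are set unions, that is, DM := (DMp - Pp) ∪ Cp ∪ (P - C), which is the reading that reproduces every row of the table. The operators are assumed to form a single chain from node 1 (root) down to node 5 (source), and `cleanup` is a hypothetical function name.

```python
# (node, produce, consume), listed from the root (node 1) to the source (node 5).
OPERATORS = [
    (1, set(), set()),
    (2, {"col1"}, {"$b"}),
    (3, {"$prices"}, {"$b"}),
    (4, {"$b"}, {"R1"}),
    (5, {"R1"}, set()),
]


def cleanup(operators):
    """Compute the cleaned-up data model of each operator, walking top-down."""
    dm_after = {}
    dm_p, p_p, c_p = set(), set(), set()  # the root has no parent
    for node, produce, consume in operators:
        dm = (dm_p - p_p) | c_p | (produce - consume)
        dm_after[node] = dm
        # This operator becomes the parent of the next one in the chain.
        dm_p, p_p, c_p = dm, produce, consume
    return dm_after


dm_after = cleanup(OPERATORS)
```

For node 3, for example, the parent contributes `{col1} - {col1}` (nothing) plus its consumed column `$b`, and the node adds its own `$prices`, giving `{$b, $prices}`.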

Where are we?

  • XAT Decorrelation.
  • Optimization
      • XAT Computation Pushdown.
      • XAT Data Model Cleanup.
      • XAT Cutting.
  • Conclusion & Future Works.

XAT Cutting

  • General idea: get rid of the operators that produce useless data.
  • Equations:
      • R := (Rp - P) ∪ C
      • An operator is cut when (P ∪ M) ∩ (Rp ∪ Mp) = NULL
      • P, C, M and R are the columns an operator produces, consumes, modifies and requires; the subscript p refers to its parent operator.

XAT Cutting Example

R := (Rp - P) ∪ C
(P ∪ M) ∩ (Rp ∪ Mp) = NULL

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node  Operator              Produce    Consume  Modified  Required  Cut?
1     Agg()                 {}         {}       {*}       {}        N/A
2     ($b,):col1            {col1}     {$b}     {}        {$b}      {col1}
3     C($b, price):$prices  {$prices}  {$b}     {}        {$b}      {}
4     (R1, /book):$b        {$b}       {R1}     {}        {R1}      {$b}
5     S(“prices.xml”):R1    {R1}       {}       {}        {}        {R1}
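
The "Required" and "Cut?" columns can be recomputed with a Python sketch. It assumes the lost operators in the equations are R := (Rp - P) ∪ C and (P ∪ M) ∩ (Rp ∪ Mp), that `*` in the Modified column stands for "all columns", and that the operators form one chain from node 1 down to node 5. The names `intersect` and `cutting` are hypothetical.

```python
ALL = "*"  # wildcard: the root is treated as touching every column

# (node, produce, consume, modified), from the root (node 1) to the source (node 5).
OPERATORS = [
    (1, set(), set(), {ALL}),
    (2, {"col1"}, {"$b"}, set()),
    (3, {"$prices"}, {"$b"}, set()),
    (4, {"$b"}, {"R1"}, set()),
    (5, {"R1"}, set(), set()),
]


def intersect(left, right):
    """Set intersection in which the wildcard matches every column."""
    if ALL in right:
        return set(left)
    if ALL in left:
        return set(right)
    return left & right


def cutting(operators):
    """Return ({node: required columns}, [nodes that can be cut])."""
    required, cut = {}, []
    r_p, m_p = set(), set()
    for index, (node, produce, consume, modified) in enumerate(operators):
        # The root (index 0) has no parent, so it is never a cut candidate.
        if index > 0 and not intersect(produce | modified, r_p | m_p):
            cut.append(node)
        required[node] = (r_p - produce) | consume
        r_p, m_p = required[node], modified
    return required, cut


required, cut = cutting(OPERATORS)
```

Only node 3 ends up in `cut`: it produces `$prices`, which its parent neither requires nor modifies, so the LET binding contributes nothing to the result.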

Conclusions

  • XQuery queries are heavily correlated and hence need to be decorrelated for better optimization.
  • After decorrelation, more optimization techniques can be applied:
      • Computation Pushdown.
      • Data Model Cleanup.
      • Cutting.

Future Works

  • Write a TR to formalize the XAT.
      • Compare with ORDB, ODB, and also XQA operators.
  • Wrap Up:
      • Finalize the uncertain operators that deal with collections: Union, Navigate.
      • Formalize the Pushdown Rewriting Rules by Type (Reg. Exp. Type) Analysis.
      • Finalize the XAT Rewriting Rules for:
          • Order Handling.
          • Update propagation.
      • Translation from XAT back to Query.
  • Next Step:
      • Generate Search Space and Optimization Algorithm for XAT, ready for Schema Generation.