Introduction to XML Algebra
Based on talk prepared for CS561 by Wan Liu and Bintou Kane.
Date posted: 22-Dec-2015
Data Model
- data model ~ core data structures and data types supported by DBMS
- relational database is a table (set-oriented) data model
- XML format is a tree-structured hierarchical model
Why XML Algebra?
It is common to translate a query language into an algebra.
First, the algebra is used to give a semantics for the query language.
Second, the algebra is used to support query optimization.
NIAGARA
Title: Following the paths of XML Data: An algebraic framework for XML query evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier.
Univ. of Wisconsin
Outline
Concepts of Niagara Algebra
Operations
Optimization
Goals of Niagara Algebra
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice No="1">
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>
Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
XML Data Model and Tree Graph
Example: ordered tree graph, semi-structured data
Invoice_Document
  invoice
    number: 2
    carrier: AT&T
    total: $0.25
  invoice
    number: 1
    carrier: Sprint
    total: $1.20
  …
<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>
XML Data Model [GVDNM01]
- Collection of bags of vertices.
- Vertices in a bag have no order.
Example: [Root "invoice.xml", invoice, invoice.account_number]
- invoice: <invoice> invoice-element-content </invoice>
- invoice.account_number: <account_number> element-content </account_number>
Data Model
- Bag elements are reachable by path expressions.
- A path expression consists of two parts:
  - an entry point
  - a relative forward part
- Example: account_number:invoice (entry point invoice, relative forward part account_number)
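As a minimal sketch of the two-part structure above, the following Python splits a path expression written in entry point notation. It assumes, going by the examples in these slides, that the text before the colon is the relative forward part and the text after it is the entry point. `PathExpr`, `parse_path`, and `label` are hypothetical names used only for illustration; they are not part of Niagara.

```python
from typing import NamedTuple


class PathExpr(NamedTuple):
    entry_point: str   # vertex already present in the bag, e.g. "invoice"
    forward: str       # relative forward part, e.g. "account_number"


def parse_path(text: str) -> PathExpr:
    """Split 'forward:entry_point' (assumed notation) into its two parts."""
    forward, sep, entry = text.partition(":")
    if not sep or not forward.strip() or not entry.strip():
        raise ValueError(f"expected 'forward:entry_point', got {text!r}")
    return PathExpr(entry_point=entry.strip(), forward=forward.strip())


def label(path: PathExpr) -> str:
    """Label of the vertex reached by the path, e.g. 'invoice.account_number'."""
    return f"{path.entry_point}.{path.forward}"


p = parse_path("account_number:invoice")
print(p.entry_point, p.forward, label(p))
```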
Operators
Source (S), Follow, Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.
Source Operator S
- Input: a list of documents
- Output: a collection of singleton bags
Examples:
- S(*): all known XML documents
- S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
- S(*, schema.dtd): all known XML documents that conform to schema.dtd
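The three examples above can be sketched in Python as a filter over a document catalogue. This is only an illustration under stated assumptions: `KNOWN_DOCS` is a hypothetical in-memory catalogue, the DTD check is reduced to comparing a stored DTD name, and a bag is modelled as a plain dictionary from vertex label to content.

```python
from fnmatch import fnmatch

# Hypothetical catalogue: filename -> (document text, DTD name or None).
KNOWN_DOCS = {
    "invoice1.xml": ("<Invoice_Document/>", "schema.dtd"),
    "invoice2.xml": ("<Invoice_Document/>", None),
    "customer.xml": ("<Customer_Document/>", "schema.dtd"),
}


def source(pattern="*", dtd=None, docs=KNOWN_DOCS):
    """Return a collection of singleton bags, one per matching document."""
    bags = []
    for name in sorted(docs):
        text, doc_dtd = docs[name]
        if not fnmatch(name, pattern):
            continue
        if dtd is not None and doc_dtd != dtd:
            continue
        # Each bag holds a single vertex: the root of one document.
        bags.append({f"Root {name}": text})
    return bags


print(len(source("*")))                 # all known documents
print(len(source("invoice*.xml")))      # documents matching the filename pattern
print(len(source("*", "schema.dtd")))   # documents recorded as conforming to the DTD
```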
Follow operator
- Input: a path expression in entry point notation
- Functionality: extracts vertices reachable by the path expression
- Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of unnesting follow)
Follow operator (Example*)
Follow (carrier:invoice), *unnesting follow
Input: {[Root invoice.xml, invoice]}
- invoice: <invoice> invoice-element-content </invoice>
Output: {[Root invoice.xml, invoice, invoice.carrier]}
- invoice: <invoice> invoice-element-content </invoice>
- invoice.carrier: <carrier> carrier-element-content </carrier>
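The unnesting follow in this example can be sketched with Python's standard `xml.etree.ElementTree`. The sketch makes several simplifying assumptions: a bag is a dictionary from vertex label to element, the path has a single forward step, and a bag with no reachable vertex produces no output bag. `follow_unnest` is a hypothetical helper name, not Niagara's implementation.

```python
import xml.etree.ElementTree as ET


def follow_unnest(bags, forward, entry):
    """Unnesting follow for a one-step path 'forward:entry' (a simplification)."""
    out = []
    for bag in bags:
        for child in bag[entry].findall(forward):
            new_bag = dict(bag)                     # keep the original contents
            new_bag[f"{entry}.{forward}"] = child   # add the extracted vertex
            out.append(new_bag)
    return out


invoice = ET.fromstring(
    "<invoice><account_number>1</account_number>"
    "<carrier>Sprint</carrier><total>$1.20</total></invoice>"
)
bags = [{"invoice": invoice}]
result = follow_unnest(bags, "carrier", "invoice")
print([sorted(b) for b in result])
```

Each output bag carries the original `invoice` vertex together with the new `invoice.carrier` vertex, matching the input and output bags shown in the example.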
Select operator
- Input: a set of bags
- Functionality: filters the bags of a collection using a predicate
- Output: a set of bags that conform to the predicate
- Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)
Select operator (Example)
Select (invoice.carrier = Sprint)
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
- each invoice: <invoice> invoice-element-content </invoice>
Output: {[Root invoice.xml, invoice], …}
- only the bags whose invoice satisfies the predicate remain
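A select of this kind amounts to filtering a list of bags with a predicate. The sketch below assumes bags are dictionaries from label to `ElementTree` element; `select` and `carrier_is` are hypothetical helper names, and the sample invoices are illustrative data in the shape of the slides' Invoice.xml.

```python
import xml.etree.ElementTree as ET


def select(bags, predicate):
    """Keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]


def carrier_is(name):
    """Predicate factory for the qualification invoice.carrier = name."""
    def predicate(bag):
        return (bag["invoice"].findtext("carrier") or "").strip() == name
    return predicate


docs = [
    "<invoice><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>",
    "<invoice><carrier>Sprint</carrier><total>$1.20</total></invoice>",
    "<invoice><total>$0.75</total></invoice>",   # no carrier element
]
bags = [{"invoice": ET.fromstring(d)} for d in docs]
sprint = select(bags, carrier_is("Sprint"))
print([b["invoice"].findtext("total") for b in sprint])
```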
Join operator
- Input: two collections of bags
- Functionality: joins the two collections based on a predicate
- Output: the concatenation of pairs of bags that satisfy the predicate
Join operator (Example)
Join (account_number:invoice = account:customer)
Input: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
- invoice: <invoice> invoice-element-content </invoice>
- customer: <customer> customer-element-content </customer>
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
- invoice: <invoice> invoice-element-content </invoice>
- customer: <customer> customer-element-content </customer>
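The join can be sketched as a nested loop that concatenates every pair of bags satisfying the predicate. This is a deliberately naive illustration: a real engine would likely choose a better join algorithm. The names `join` and `same_account` are hypothetical, bags are dictionaries, and the labels on the two sides are assumed to be distinct.

```python
import xml.etree.ElementTree as ET


def join(left, right, predicate):
    """Nested-loop join: concatenate each pair of bags that satisfies the predicate."""
    out = []
    for l_bag in left:
        for r_bag in right:
            if predicate(l_bag, r_bag):
                out.append({**l_bag, **r_bag})   # bag concatenation
    return out


def same_account(l_bag, r_bag):
    """invoice.account_number = customer.account, ignoring surrounding whitespace."""
    a = (l_bag["invoice"].findtext("account_number") or "").strip()
    b = (r_bag["customer"].findtext("account") or "").strip()
    return a != "" and a == b


invoices = [{"invoice": ET.fromstring(
    "<invoice><account_number>1 </account_number><carrier>Sprint</carrier></invoice>")}]
customers = [
    {"customer": ET.fromstring("<customer><account>1 </account><name>Tom </name></customer>")},
    {"customer": ET.fromstring("<customer><account>2 </account><name>George </name></customer>")},
]
joined = join(invoices, customers, same_account)
print([b["customer"].findtext("name").strip() for b in joined])
```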
Expose operator
Input: a list of path expressions of vertices to be exposed
Output: a set of bags that contains vertices in the parameter list with the same order
Expose operator (Example)
Expose (bill_period, carrier)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
- invoice: <invoice> invoice-element-content </invoice>
- invoice.carrier: <carrier> carrier-element-content </carrier>
- invoice.bill_period: <bill_period> bill_period-element-content </bill_period>
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
- invoice.bill_period: <bill_period> bill_period-element-content </bill_period>
- invoice.carrier: <carrier> carrier-element-content </carrier>
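Expose behaves much like a projection whose output order follows the parameter list. In the sketch below, the output of `expose` (a hypothetical helper) is a list of (label, vertex) pairs so that the requested order is explicit; vertex contents are placeholder strings rather than real elements.

```python
def expose(bags, labels):
    """Project each bag onto the listed vertices, in the order given."""
    out = []
    for bag in bags:
        # A list of (label, vertex) pairs keeps the requested order explicit.
        out.append([(label, bag[label]) for label in labels])
    return out


bag = {
    "invoice": "<invoice>invoice-element-content</invoice>",
    "invoice.carrier": "<carrier>carrier-element-content</carrier>",
    "invoice.bill_period": "<bill_period>bill_period-element-content</bill_period>",
}
exposed = expose([bag], ["invoice.bill_period", "invoice.carrier"])
print([label for label, _ in exposed[0]])
```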
Vertex operator
Creates the actual XML vertex that will encompass everything created by an expose operator
Example :
(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
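One possible reading of the example above is that it builds a `Customer_invoice` element containing an `account` element and an `inv_total` element, filled from the invoice's account number and total. The sketch below follows that reading with `xml.etree.ElementTree`. The exact semantics of the Niagara vertex operator are an assumption here, and `vertex` and `wrap_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """Create a new element named `tag` that wraps the given child elements."""
    elem = ET.Element(tag)
    elem.extend(children)
    return elem


def wrap_text(tag, source):
    """Create <tag> holding the text content of an existing vertex (assumed semantics)."""
    elem = ET.Element(tag)
    elem.text = (source.text or "").strip()
    return elem


invoice = ET.fromstring(
    "<invoice><account_number>1 </account_number><total>$1.20</total></invoice>"
)
result = vertex("Customer_invoice", [
    wrap_text("account", invoice.find("account_number")),
    wrap_text("inv_total", invoice.find("total")),
])
print(ET.tostring(result, encoding="unicode"))
```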
Other operators
- Group: used for arbitrary grouping of elements based on their values
  - Aggregate functions (e.g. average) can be used with the group operator
- Rename: changes the entry point annotation of elements of a bag
  - Example: (invoice.bill_period, date)
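Both operators can be sketched over dictionary-style bags. In this illustration, `rename` relabels one vertex and `group_avg` groups bags by one vertex's value and averages another. Both function names are hypothetical, and the totals are stored as plain floats (an assumption made to keep the aggregate simple).

```python
from collections import defaultdict
from statistics import mean


def rename(bags, old, new):
    """Change the label (entry point annotation) of one vertex in every bag."""
    return [{(new if k == old else k): v for k, v in bag.items()} for bag in bags]


def group_avg(bags, key, value):
    """Group bags by the value under `key` and average the values under `value`."""
    groups = defaultdict(list)
    for bag in bags:
        groups[bag[key]].append(bag[value])
    return {k: mean(vs) for k, vs in groups.items()}


bags = [
    {"invoice.account_number": "2", "invoice.total": 0.25},
    {"invoice.account_number": "1", "invoice.total": 1.20},
    {"invoice.account_number": "1", "invoice.total": 0.75},
]
averages = group_avg(bags, "invoice.account_number", "invoice.total")
renamed = rename([{"invoice.bill_period": "p"}], "invoice.bill_period", "date")
print(averages)
print(renamed)
```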
Example: XML Source Documents
Invoice.xml
<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1</account_number>
    <total>$0.75</total>
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>
Customer.xml
<Customer_Document>
  <customer>
    <account>1</account>
    <name>Tom</name>
  </customer>
  <customer>
    <account>2</account>
    <name>George</name>
  </customer>
</Customer_Document>
XQuery Example
List account number, customer name, and invoice total for all invoices that have carrier = "Sprint".
for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and
      $i/account_number = $c/account
return
  <Sprint_invoices>
    { $i/account_number,
      $c/name,
      $i/total }
  </Sprint_invoices>
Example: XQuery output
<Sprint_invoices>
  <account_number>1</account_number>
  <name>Tom</name>
  <total>$1.20</total>
</Sprint_invoices>
Algebra Tree Execution
Invoice branch:
- Source (invoices.xml)
- Follow (*.invoice) → invoice (1), invoice (2), invoice (3)
- Select (carrier = "Sprint") → invoice (2)
Customer branch:
- Source (customers.xml)
- Follow (*.customer) → customer (1), customer (2)
Combined:
- Join (*.invoice.account_number = *.customer.account) → invoice (2), customer (1)
- Expose (*.account_number, *.name, *.total) → account_number, name, total
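The execution above can be traced end to end with a short Python sketch over the sample documents. It is an illustration only: the documents are inlined as strings, each operator is written out as a plain list comprehension, and the helper `text` is a hypothetical convenience function. Nothing here reflects how Niagara actually executes the plan.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
  <invoice><account_number>1</account_number><total>$0.75</total></invoice>
</Invoice_Document>"""

CUSTOMERS = """<Customer_Document>
  <customer><account>1</account><name>Tom</name></customer>
  <customer><account>2</account><name>George</name></customer>
</Customer_Document>"""


def text(elem, tag):
    """Stripped text of the first child named `tag`, or '' if it is absent."""
    return (elem.findtext(tag) or "").strip()


# Source + Follow: one bag per invoice / customer element.
invoices = [{"invoice": e} for e in ET.fromstring(INVOICES).iter("invoice")]
customers = [{"customer": e} for e in ET.fromstring(CUSTOMERS).iter("customer")]

# Select: carrier = "Sprint".
sprint = [b for b in invoices if text(b["invoice"], "carrier") == "Sprint"]

# Join: invoice.account_number = customer.account.
joined = [
    {**i, **c}
    for i in sprint
    for c in customers
    if text(i["invoice"], "account_number") == text(c["customer"], "account")
]

# Expose: account_number, name, total (in that order).
rows = [
    (text(b["invoice"], "account_number"),
     text(b["customer"], "name"),
     text(b["invoice"], "total"))
    for b in joined
]
print(rows)
```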
Optimization with Niagara
Optimizer based on Niagara algebra:
- Use the operations more efficiently
- Produce simpler expressions by combining operations
Language Convention
- A and B are path expressions
- A < B: path expression A is a prefix of B
- AnB: common prefix of paths A and B
- AńB: greatest common prefix of paths A and B
- ┴: null path expression
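These conventions can be sketched in Python by treating a path expression as a tuple of steps that starts at the entry point. That representation, the helper names `to_steps`, `is_prefix`, and `greatest_common_prefix`, and the use of the empty tuple for the null path are all assumptions made for illustration.

```python
def to_steps(path_expr):
    """'acc_Num:invoice' -> ('invoice', 'acc_Num'); entry point first (assumed)."""
    forward, _, entry = path_expr.partition(":")
    return (entry, *forward.split("."))


def is_prefix(a, b):
    """A < B, read here as: the steps of A match the start of B."""
    return len(a) <= len(b) and b[:len(a)] == a


def greatest_common_prefix(a, b):
    """Longest shared leading run of steps; () plays the role of the null path."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return tuple(common)


A = to_steps("acc_Num:invoice")
B = to_steps("carrier:invoice")
C = to_steps("acc_Num:customer")
print(greatest_common_prefix(A, B))
print(greatest_common_prefix(A, C))
```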
Heuristics using Rewrite Rules
Allow optimization based on path selectivity when applying the unnesting follow operation Φμ.
Interchangeability of Follow operation
Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
TRUE when there exists C such that C < A and C < B and C = AńB,
or when AnB = ┴
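The side condition of this rule can be sketched as a small Python check. The sketch reflects one interpretation, not a definitive statement of the rule: paths are tuples of steps starting at the entry point, "<" is read as proper prefix (so neither path may be a prefix of the other), and an empty common prefix stands for the null path. `common_prefix` and `can_interchange` are hypothetical names.

```python
def common_prefix(a, b):
    """Greatest common prefix of two step tuples; () stands for the null path."""
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return tuple(out)


def can_interchange(a, b):
    """Side condition of the rule, as read from the slide (an interpretation)."""
    c = common_prefix(a, b)
    if not c:
        return True   # no common prefix: the null-path case
    # Otherwise the common prefix must be a proper prefix of both paths.
    return len(c) < len(a) and len(c) < len(b)


print(can_interchange(("invoice", "acc_Num"), ("invoice", "carrier")))
print(can_interchange(("invoice", "acc_Num"), ("customer", "acc_Num")))
print(can_interchange(("invoice",), ("invoice", "carrier")))
```

Under this reading, the third call is the case the rule excludes: following `invoice.carrier` depends on `invoice` already being in the bag, so the two follows cannot be swapped.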
Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *
=?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **
Application of Rule on Invoice
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]
=?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]
Equivalent because both paths share the common prefix "invoice".
Case: AńB = invoice
Benefit of Rule Application
NOTE: let us assume that acc_Num is required for each invoice element, while carrier is not required for an invoice element.
THEN:
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *
=?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **
Then which algebra tree do we prefer?
Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) makes more sense than (**). Why?
Discussion
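Reduction of input size on the first sub-operation:
Φμ(carrier:invoice)
Since carrier is optional, only invoices that actually have a carrier element reach the second follow.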
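The effect on intermediate sizes can be checked with a small Python sketch over an invoice document in which one invoice has no carrier. The sketch assumes that an unnesting follow emits nothing for a bag with no reachable vertex. `follow_unnest` is a hypothetical helper, and `account_number` stands in for the slides' `acc_Num`.

```python
import xml.etree.ElementTree as ET

DOC = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
  <invoice><account_number>1</account_number></invoice>
</Invoice_Document>"""


def follow_unnest(bags, forward, entry="invoice"):
    """One-step unnesting follow; bags without a match produce no output."""
    out = []
    for bag in bags:
        for child in bag[entry].findall(forward):
            out.append({**bag, f"{entry}.{forward}": child})
    return out


start = [{"invoice": e} for e in ET.fromstring(DOC).iter("invoice")]

# Order (*): carrier first, then account_number.
after_carrier = follow_unnest(start, "carrier")
plan_star = follow_unnest(after_carrier, "account_number")

# Order (**): account_number first, then carrier.
after_account = follow_unnest(start, "account_number")
plan_star_star = follow_unnest(after_account, "carrier")

print(len(after_carrier), len(after_account))   # bags fed to the second follow
print(len(plan_star), len(plan_star_star))      # final result sizes
```

Both orders return the same number of bags, but following the optional carrier first passes fewer bags to the second operation.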
Should we/can we apply the rule below?
Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]
"acc_Num:invoice" and "acc_Num:customer" are two totally different paths.
Case: AnB = ┴
So yes, the rule is valid.