Introduction to XML, XPath, & XQuery
![Page 1: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/1.jpg)
Introduction to XML, XPath, & XQuery
CS186, Fall 2005
R&G - Chapters 7-27
Bill Gates, The Revolution, and a Network of Trees
(based on a true story)
![Page 2: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/2.jpg)
Letter to Bill Gates
![Page 3: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/3.jpg)
“Microsoft mailing address”
![Page 4: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/4.jpg)
“Microsoft address”
![Page 5: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/5.jpg)
Web Search Today
• Web document: bag of words
• HTML: presentation language
• Difficult to identify structure/semantics
<I> Microsoft<BR> One Microsoft Way<BR> Redmond, WA<BR> </I>
<I> Teriyaki sauce<BR> One egg<BR> New York steak<BR> </I>
![Page 6: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/6.jpg)
A first step - XML
• Focus on structure/semantics instead of layout
<I> Microsoft<BR> One Microsoft Way<BR> Redmond, WA<BR> </I>
<address>
  <company name="Microsoft"/>
  <street>One Microsoft Way</street>
  <city>Redmond</city>
  <state>WA</state>
</address>
“Microsoft mailing address”
address[.*name=“Microsoft”]
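As a rough illustration of why tagged structure helps search, the sketch below looks up a mailing address by company name using Python's standard `xml.etree.ElementTree`. The `listing` and `recipe` wrappers, the `ingredient` tag, and the `mailing_address` helper are hypothetical names introduced only for this example, and `company` is assumed to be an empty element carrying a `name` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an address and a recipe that would look alike
# as italic HTML lines, but are distinguishable once tagged by meaning.
LISTING_XML = """
<listing>
  <address>
    <company name="Microsoft"/>
    <street>One Microsoft Way</street>
    <city>Redmond</city>
    <state>WA</state>
  </address>
  <recipe>
    <ingredient>Teriyaki sauce</ingredient>
    <ingredient>One egg</ingredient>
  </recipe>
</listing>
"""


def mailing_address(xml_text, company):
    """Return (street, city, state) for the named company, or None if absent."""
    root = ET.fromstring(xml_text)
    for address in root.iter("address"):
        comp = address.find("company")
        if comp is not None and comp.get("name") == company:
            return tuple(address.findtext(tag) for tag in ("street", "city", "state"))
    return None


print(mailing_address(LISTING_XML, "Microsoft"))
```

The recipe lines never match, because the query asks for `address` elements rather than for words that happen to appear near "Microsoft".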
![Page 7: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/7.jpg)
HTML vs. XML
• HTML
– Fixed set of tags for markups
– Semantically poor : tags only describe presentation of data
• XML
– Extensible set of semantically-rich tags
– Describe meaning/semantics of the data
![Page 8: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/8.jpg)
The Revolution
[Figure: the Internet and XML]
![Page 9: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/9.jpg)
XML Data (Text)
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<booklist>
  <book genre="Science" format="Hardcover">
    <author>
      <firstname>Richard</firstname>
      <lastname>Feynman</lastname>
    </author>
    <title>The character of Physical Law</title>
  </book>
  <book genre="Fiction">
    <author>
      <firstname>R.K.</firstname>
      <lastname>Narayan</lastname>
    </author>
    <title>Waiting for the Mahatma</title>
    <published>1981</published>
  </book>
</booklist>
(Figure callouts: Element, Content, Nesting)
![Page 10: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/10.jpg)
XML Data (Tree)
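[Figure: the booklist document drawn as a tree. The booklist root has two book children. The first book has attribute nodes @genre = "Science" and @format = "Hardcover", an author child (a) with firstname (f) "Richard" and lastname (l) "Feynman", and a title child (t) "The character of physical Law". The second book has a @genre attribute and author, title, and published (p) children, with values elided as "…".]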
![Page 11: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/11.jpg)
XML Basics
• Elements
– Encode “concepts” in the XML database
– Nesting denotes association/inclusion
• Attributes
– Record information specific to an element (e.g., the genre of a book)
• References
– Links between elements in different parts of the document
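The distinction between elements, attributes, and nesting can be seen by walking a parsed document. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `outline` helper is a name made up for this example, and the data is the booklist from the earlier slide.

```python
import xml.etree.ElementTree as ET

BOOKLIST = """<booklist>
  <book genre="Science" format="Hardcover">
    <author><firstname>Richard</firstname><lastname>Feynman</lastname></author>
    <title>The character of Physical Law</title>
  </book>
  <book genre="Fiction">
    <author><firstname>R.K.</firstname><lastname>Narayan</lastname></author>
    <title>Waiting for the Mahatma</title>
    <published>1981</published>
  </book>
</booklist>"""


def outline(element, depth=0):
    """Return one line per element: indentation = nesting, @name = attribute."""
    attrs = "".join(f" @{name}={value!r}" for name, value in element.attrib.items())
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + attrs + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(BOOKLIST))))
```

Each element becomes one line, attributes stay attached to the element they describe, and the indentation depth mirrors the nesting in the text form.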
![Page 12: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/12.jpg)
Example of XML References
<booklist>
  <book id="narayan_w4m" genre="Fiction">
    <author>
      <firstname>R.K.</firstname>
      <lastname>Narayan</lastname>
    </author>
    <title>Waiting for the Mahatma</title>
  </book>
  …
  <book id="tolkien_lotr" genre="Fiction">
    <author>
      <firstname>J.R.R.</firstname>
      <lastname>Tolkien</lastname>
    </author>
    <title>The Lord of the Rings</title>
    <related ref="narayan_w4m"/>
  </book>
</booklist>
Data becomes a graph
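One way to see the graph is to resolve each `ref` value against an index of `id` values. The sketch below does this by hand with Python's standard library; `related_titles` is a hypothetical helper, the author elements are omitted for brevity, and `id`/`ref` are treated as ordinary attributes (in a real document they would typically be declared as ID/IDREF in the DTD).

```python
import xml.etree.ElementTree as ET

BOOKS = """<booklist>
  <book id="narayan_w4m" genre="Fiction">
    <title>Waiting for the Mahatma</title>
  </book>
  <book id="tolkien_lotr" genre="Fiction">
    <title>The Lord of the Rings</title>
    <related ref="narayan_w4m"/>
  </book>
</booklist>"""


def related_titles(xml_text):
    """Map each book title to the titles of the books it points to via related/@ref."""
    root = ET.fromstring(xml_text)
    by_id = {book.get("id"): book for book in root.iter("book")}
    edges = {}
    for book in root.iter("book"):
        targets = [by_id[rel.get("ref")] for rel in book.findall("related")
                   if rel.get("ref") in by_id]  # dangling refs are skipped
        edges[book.findtext("title")] = [t.findtext("title") for t in targets]
    return edges


print(related_titles(BOOKS))
```

The nesting edges still form a tree; it is the extra `ref` edges that turn the data into a graph.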
![Page 13: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/13.jpg)
XML Data with References
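[Figure: the booklist drawn with the reference. Each book has author (a) and title (t) children and a @genre attribute. The first book has firstname (f) "R.K.", lastname (l) "Narayan", title "Waiting for the Mahatma", and @genre = "Fiction"; the second has firstname "J.R.R.", lastname "Tolkien", and a @ref (@r) edge pointing back to the first book, so the structure is a graph rather than a tree.]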
![Page 14: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/14.jpg)
What about a schema?
• XML does not require a schema
– After all, data is self-describing
– More flexibility, less usability!
• There are two means for defining a “schema”:
– A Document Type Definition (DTD)
– An XML Schema
– Fix vocabulary of tags (and semantics)
  • Match information across different XML documents
– Describe nesting structure
  • Know where to look for what information
![Page 15: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/15.jpg)
<!DOCTYPE BOOKLIST [
  <!ELEMENT BOOKLIST (BOOK)*>
  <!ELEMENT BOOK (AUTHOR,TITLE,PUBLISHED?)>
  <!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
  <!ELEMENT FIRSTNAME (#PCDATA)>
  <!ELEMENT LASTNAME (#PCDATA)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT PUBLISHED (#PCDATA)>
  <!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
  <!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>
Document Type Definition
• DTD specifies a regular expression for every element
• Does not specify the type of content
• “Loosely” structured data compared to relational tables
– Semistructured data
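To make "a regular expression for every element" concrete, the sketch below hand-translates the content models of the DTD above into Python regular expressions over the sequence of child tag names. This is only an illustration and not a DTD validator: it ignores attributes and text content, the `conforms` helper is a made-up name, and the `AUTHOR` model is an assumption based on the sample document.

```python
import re
import xml.etree.ElementTree as ET

# Content models rewritten by hand as regexes over space-terminated child tags.
# AUTHOR's model is assumed from the sample data (firstname, then lastname).
CONTENT_MODELS = {
    "BOOKLIST": r"(BOOK )*",                 # (BOOK)*
    "BOOK": r"AUTHOR TITLE (PUBLISHED )?",   # (AUTHOR,TITLE,PUBLISHED?)
    "AUTHOR": r"FIRSTNAME LASTNAME ",        # (FIRSTNAME,LASTNAME)
}


def conforms(element):
    """Check the child sequence of every element that has a known content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(conforms(child) for child in element)


GOOD = ET.fromstring(
    "<BOOKLIST><BOOK><AUTHOR><FIRSTNAME>R.K.</FIRSTNAME><LASTNAME>Narayan</LASTNAME>"
    "</AUTHOR><TITLE>Waiting for the Mahatma</TITLE><PUBLISHED>1981</PUBLISHED>"
    "</BOOK></BOOKLIST>"
)
BAD = ET.fromstring(
    "<BOOKLIST><BOOK><TITLE>Waiting for the Mahatma</TITLE><AUTHOR>"
    "<FIRSTNAME>R.K.</FIRSTNAME><LASTNAME>Narayan</LASTNAME></AUTHOR></BOOK></BOOKLIST>"
)
print(conforms(GOOD), conforms(BAD))
```

The second document contains the same elements but with TITLE before AUTHOR, so it fails the BOOK content model even though nothing is missing.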
![Page 16: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/16.jpg)
XML vs. Relational Data
Relation:
name   phone
John   3634
Sue    6343
Dick   6363
XML:
[Figure: the same data as a tree with three row nodes; each row has a name child and a phone child, with values ("John", 3634), ("Sue", 6343), ("Dick", 6363).]
![Page 17: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/17.jpg)
XML vs. Relational Data
• A relation instance is basically a tree with:
– Unbounded fanout at level 1 (i.e., any # of rows)
– Fixed fanout at level 2 (i.e., fixed # fields)
• XML data is essentially an arbitrary tree
– Unbounded fanout at all nodes/levels
– Any number of levels
– Variable # of children at different nodes, variable path lengths
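The "relation as a two-level tree" view can be sketched directly: one `row` child per tuple, and one child per field inside each row. The `relation_to_xml` helper and the `table` root tag are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET


def relation_to_xml(columns, rows, root_tag="table"):
    """Encode a relation as a tree: any number of rows, fixed fields per row."""
    root = ET.Element(root_tag)
    for values in rows:
        row = ET.SubElement(root, "row")          # level 1: unbounded fanout
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)  # level 2: fixed fanout
    return root


phones = relation_to_xml(
    ["name", "phone"],
    [("John", 3634), ("Sue", 6343), ("Dick", 6363)],
)
print(ET.tostring(phones, encoding="unicode"))
```

Going the other way is harder in general, since an arbitrary XML tree has no fixed depth or fanout to map onto columns.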
![Page 18: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/18.jpg)
Query Language for XML
• Must be high-level; “SQL for XML”
• Must conform to DTD/XML Schema
– But also work in absence of schema info
• Support simple and complex/nested datatypes
• Support universal and existential quantifiers, aggregation
• Operations on sequences and hierarchies of document structures
• Capability to transform and create XML structures
![Page 19: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/19.jpg)
Overview of XQuery
• Path expressions (XPath)
• Element constructors
• FLWOR (“flower”) expressions
– Several other kinds of expressions as well, including conditional expressions, list expressions, quantified expressions, etc.
• Expressions evaluated w.r.t. a context:
– Context item (current node)
– Context position (in sequence being processed)
– Context size (of the sequence being processed)
– Context also includes namespaces, variables, functions, date, etc.
![Page 20: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/20.jpg)
XPath Expressions
Examples:
• /booklist/book
• /booklist/book/author
• /booklist/book/author/lastname
Given an XML document, the value of a path expression p is a set of elements (= XML subtrees)
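These three paths can be tried out with Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath. In this sketch the paths are written relative to the parsed root element (`booklist`), since ElementTree does not accept absolute paths on an element; the equivalent absolute path is given in each comment.

```python
import xml.etree.ElementTree as ET

BOOKLIST = """<booklist>
  <book genre="Science" format="Hardcover">
    <author><firstname>Richard</firstname><lastname>Feynman</lastname></author>
    <title>The character of Physical Law</title>
  </book>
  <book genre="Fiction">
    <author><firstname>R.K.</firstname><lastname>Narayan</lastname></author>
    <title>Waiting for the Mahatma</title>
    <published>1981</published>
  </book>
</booklist>"""

root = ET.fromstring(BOOKLIST)  # root is the <booklist> element

books = root.findall("./book")                    # /booklist/book
authors = root.findall("./book/author")           # /booklist/book/author
lastname_nodes = root.findall("./book/author/lastname")  # /booklist/book/author/lastname

# Each result is a list of elements, i.e. of whole subtrees, in document order.
lastnames = [node.text for node in lastname_nodes]
print(len(books), len(authors), lastnames)
```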
![Page 21: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/21.jpg)
Path Expressions
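• XPath expressions
– Simple: /A/P/T
– Branching: /A[B]/P/T
– Values: /A/P/T[. = "v11"]
• Result is a set
[Figure: example document tree rooted at /, with element nodes A1, A2, B3, N4, B5, P6, P7, N8, B9, T10, T11, T12, T13, E14 and leaf values V4, V8, V10, V11, V12, V13, V14.]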
![Page 26: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/26.jpg)
XPath Syntax
• Path wildcards
– // = descendant at any level (or self)
– * = any (single) tag
– Example: /booklist//lastname
• Query attributes and attribute content
– Use "@"
– Examples: /booklist//book[@format="Paperback"], /booklist//book/@genre
• Branching predicates: A[pred]
– Predicate on A's subtree using logical connectives (and, or, etc.), path expressions, built-in functions (e.g., contains()), etc.
– Example: //author[contains(./lastname, "Fey")]
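The slide's example queries can be approximated with Python's standard `xml.etree.ElementTree`. This is a sketch under the limits of that module: it handles `//` and `[@attr='value']`, but a path cannot end in an attribute step and `contains()` is not available, so those two cases are finished in plain Python.

```python
import xml.etree.ElementTree as ET

BOOKLIST = """<booklist>
  <book genre="Science" format="Hardcover">
    <author><firstname>Richard</firstname><lastname>Feynman</lastname></author>
    <title>The character of Physical Law</title>
  </book>
  <book genre="Fiction">
    <author><firstname>R.K.</firstname><lastname>Narayan</lastname></author>
    <title>Waiting for the Mahatma</title>
    <published>1981</published>
  </book>
</booklist>"""

root = ET.fromstring(BOOKLIST)

# /booklist//lastname : descendants at any depth
lastnames = [e.text for e in root.findall(".//lastname")]

# /booklist//book[@format="Paperback"] and the Hardcover variant
paperbacks = root.findall(".//book[@format='Paperback']")
hardcovers = root.findall(".//book[@format='Hardcover']")

# /booklist//book/@genre : read the attribute with get() after selecting the books
genres = [book.get("genre") for book in root.findall(".//book")]

# //author[contains(./lastname, "Fey")] : substring test done in Python
fey_authors = [a for a in root.iter("author")
               if "Fey" in (a.findtext("lastname") or "")]

print(lastnames, len(paperbacks), len(hardcovers), genres, len(fey_authors))
```

Note that the second book carries no `format` attribute in the text, and no DTD default is applied when parsing a bare string like this, so the Paperback query finds nothing here.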
![Page 27: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/27.jpg)
XQuery FLWOR Expressions
• FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
[Figure: evaluation pipeline. FOR / LET clauses → list of tuples → WHERE clause → list of tuples → ORDERBY / RETURN clause → instance of XQuery data model]
![Page 28: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/28.jpg)
FOR vs. LET
• FOR $x IN path-expression
– Binds $x in turn to each element in the expression
• LET $x := path-expression
– Binds $x to the entire list of elements in the expression
– Useful for common sub-expressions and for aggregations
![Page 29: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/29.jpg)
FOR vs. LET: Example
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...
Notice that the result has several elements
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns:
<result>
  <book>...</book>
  <book>...</book>
  <book>...</book>
  ...
</result>
Notice that the result has exactly one element
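The difference between the two bindings can be mimicked in Python, as a loose analogy rather than an XQuery implementation. The three-book `BIB` document and the `result_of` helper are made up for this sketch.

```python
import copy
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>abc</title></book>
  <book><title>def</title></book>
  <book><title>ghi</title></book>
</bib>"""


def result_of(books):
    """Build one <result> element wrapping copies of the given books."""
    result = ET.Element("result")
    result.extend([copy.deepcopy(book) for book in books])
    return result


books = ET.fromstring(BIB).findall("./book")

# FOR: the variable is bound to each book in turn, so RETURN runs once per book.
for_results = [result_of([book]) for book in books]

# LET: the variable is bound to the whole sequence, so RETURN runs exactly once.
let_results = [result_of(books)]

print(len(for_results), len(let_results), len(let_results[0]))
```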
![Page 30: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/30.jpg)
XQuery Example 1
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
![Page 31: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/31.jpg)
XQuery Example 2
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
  $a,
  FOR $t IN /bib/book[author=$a]/title
  RETURN $t
</result>
distinct = a function that eliminates duplicates (after converting inputs to atomic values)
![Page 32: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/32.jpg)
Results for Example 2
<result>
  <author> Jones </author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>
Observe how nested structure of result elements is determined by the nested structure of the query.
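The same nesting can be reproduced with an outer loop over distinct authors and an inner loop over books, here as a Python analogy rather than actual XQuery. The sample `BIB` data and the `books_by_mk_authors` helper are hypothetical; authors are compared by their text value, which is how `distinct` is described above.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
  <book><publisher>Other Press</publisher><author>Jones</author><title>def</title></book>
</bib>"""


def books_by_mk_authors(xml_text):
    """One <result> per distinct Morgan Kaufmann author, listing all of that author's titles."""
    root = ET.fromstring(xml_text)
    books = root.findall("./book")

    # Outer FOR over distinct(...): unique author names, in order of first appearance.
    authors = []
    for book in books:
        if book.findtext("publisher") == "Morgan Kaufmann":
            for author in book.findall("author"):
                if author.text not in authors:
                    authors.append(author.text)

    results = []
    for name in authors:
        result = ET.Element("result")
        ET.SubElement(result, "author").text = name
        # Inner FOR: every book (from any publisher) that lists this author.
        for book in books:
            if name in [a.text for a in book.findall("author")]:
                ET.SubElement(result, "title").text = book.findtext("title")
        results.append(result)
    return results


for element in books_by_mk_authors(BIB):
    print(ET.tostring(element, encoding="unicode"))
```

The inner loop sits inside the construction of each `result` element, which is why the titles end up nested under their author.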
![Page 33: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/33.jpg)
XQuery Example 3
count = (aggregate) function that returns the number of elements
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
For each publisher p
- Let the list of books published by p be b
- Count the # books in b, and return p if the count > 100
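A Python sketch of the same idea follows. It groups and counts in one pass with `collections.Counter` instead of re-scanning the books for each publisher, which is a swapped-in technique rather than a literal translation of the FOR/LET structure. The `big_publishers` helper, its `threshold` parameter, and the sample data are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><title>ghi</title></book>
  <book><publisher>Other Press</publisher><title>def</title></book>
</bib>"""


def big_publishers(xml_text, threshold=100):
    """Return publishers with more than `threshold` books, in order of first appearance."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        publisher.text
        for book in root.iter("book")
        for publisher in book.findall("publisher")
    )
    return [name for name, n in counts.items() if n > threshold]


# A small threshold is used so the three-book sample produces a result.
print(big_publishers(BIB, threshold=1))
```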
![Page 34: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/34.jpg)
XQuery Example 4
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
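The LET-then-FOR shape of this query (compute one aggregate, then filter against it) looks like this in Python. The sketch assumes every book has exactly one numeric `price` and that there is at least one book; the sample data and the `pricier_than_average` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>"""


def pricier_than_average(xml_text):
    """Return the titles of books whose price exceeds the average price."""
    root = ET.fromstring(xml_text)
    books = root.findall("./book")
    prices = [float(book.findtext("price")) for book in books]
    average = sum(prices) / len(prices)       # LET $a := avg(...)
    return [book.findtext("title")            # FOR ... WHERE price > $a
            for book, price in zip(books, prices) if price > average]


print(pricier_than_average(BIB))
```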
![Page 35: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/35.jpg)
Collections in XQuery
• Ordered and unordered collections
– /bib/book/author = an ordered collection
– distinct(/bib/book/author) = an unordered collection
• Examples:
– LET $a := /bib/book → $a is a collection
– $b/author → also a collection (several authors...)
However:
RETURN <result> $b/author </result>
Returns a single collection!
<result>
  <author>...</author>
  <author>...</author>
  <author>...</author>
  ...
</result>
![Page 36: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/36.jpg)
Collections in XQuery
What about collections in expressions ?
• $b/price → list of n prices
• $b/price * 0.7 → list of n numbers??
• $b/price * $b/quantity → list of n x m numbers??
– Valid only if the two sequences have at most one element
– Atomization
• $book1/author eq "Kennedy" - Value Comparison
• $book1/author = "Kennedy" - General Comparison
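The two comparison styles differ in how they treat sequences: `eq` expects at most one item on each side, while `=` is true if any pair of items matches. The sketch below models sequences of already-atomized values as Python lists; `value_eq` and `general_eq` are hypothetical helper names, and `None` stands in for the empty sequence.

```python
def general_eq(left, right):
    """Model of '=': true if some item on the left equals some item on the right."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Model of 'eq': each operand must hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq requires operands with at most one item")
    if not left or not right:
        return None  # an empty operand yields an empty result
    return left[0] == right[0]


one_author = ["Kennedy"]
two_authors = ["Kennedy", "Smith"]

print(general_eq(two_authors, ["Kennedy"]))  # existential match over the sequence
print(value_eq(one_author, ["Kennedy"]))     # single item against single item
```

For a book with several authors, the general comparison is usually what is meant, while the value comparison signals an error instead of silently matching one of them.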
![Page 37: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/37.jpg)
Sorting in XQuery
<publisher_list>
FOR $p IN distinct(document("bib.xml")//publisher)
ORDERBY $p
RETURN <publisher>
  <name> $p/text() </name> ,
  FOR $b IN document("bib.xml")//book[publisher = $p]
  ORDERBY $b/price DESCENDING
  RETURN <book>
    $b/title ,
    $b/price
  </book>
</publisher>
</publisher_list>
![Page 38: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/38.jpg)
Conditional Expressions: If-Then-Else
FOR $h IN //holding
ORDERBY $h/title
RETURN <holding>
  $h/title,
  IF $h/@type = "Journal"
  THEN $h/editor
  ELSE $h/author
</holding>
![Page 39: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/39.jpg)
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
  contains($p, "sailing")
  AND contains($p, "windsurfing")
RETURN $b/title
![Page 40: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/40.jpg)
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
  contains($p, "sailing")
RETURN $b/title
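The SOME and EVERY queries above map naturally onto Python's `any()` and `all()`. This is an analogy using the standard library, with made-up sample data and hypothetical helpers `paragraphs`, `titles_some`, and `titles_every`.

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book><title>Wind and Water</title>
    <para>Notes on sailing and windsurfing.</para>
    <para>A chapter on knots.</para>
  </book>
  <book><title>Learning the Ropes</title>
    <para>Basic sailing terms.</para>
    <para>Advanced sailing tactics.</para>
  </book>
</books>"""


def paragraphs(book):
    """Text of every <para> descendant of the book (the $b//para step)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


def titles_some(root):
    """Books where SOME paragraph mentions both sailing and windsurfing."""
    return [b.findtext("title") for b in root.iter("book")
            if any("sailing" in p and "windsurfing" in p for p in paragraphs(b))]


def titles_every(root):
    """Books where EVERY paragraph mentions sailing."""
    return [b.findtext("title") for b in root.iter("book")
            if all("sailing" in p for p in paragraphs(b))]


root = ET.fromstring(BOOKS)
print(titles_some(root), titles_every(root))
```

As with `all()` on an empty list, the universal quantifier is satisfied by a book that has no `para` elements at all, which is worth keeping in mind when reading results.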
![Page 41: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/41.jpg)
Other Stuff in XQuery
• Before and After
– for dealing with order in the input
• Filter
– deletes some edges in the result tree
• Recursive functions
• Namespaces
• References, links …
• Lots more stuff …
![Page 42: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/42.jpg)
XML & PostgreSQL
• Store XML documents as text BLOBs (Binary Large Objects) inside text-valued columns
• Load XML in-memory and use external User-Defined Functions (UDFs) to process XPath expressions
– xpath_bool(xml_text_col, "xpath_query_string")
  • False/true if element set discovered is empty/nonempty
– xpath_nodeset(xml_text_col, "xpath_query_string")
  • Text result = concatenation of element subtrees
• No support for full-fledged XQuery
– Some support for XSLT transformations -- won’t discuss here…
• Pros/Cons??
![Page 43: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/43.jpg)
Summary
• XML has gained momentum as a “universal data format”
– Standard for publishing/exchange in business world
• Jury is still out for the “data model” part
– Still need a lot of work on efficient storage/indexing, query optimization, …
• Increasing support in commercial systems
– BLOB approach is common, others (e.g., DB2) map XML to/from relational
– A few “native” systems
• XML is the foundation for the next “Web Revolution”
– Semantic web, web services, ontologies, …
– XML trees will grow everywhere!
  • Click on XML/RSS tabs on web pages, or search for “XML” on your PC
![Page 44: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/44.jpg)
But, don’t just take it from me…
“Microsoft has been working with the industry to advance a new generation of software that is interoperable by design, reducing the need for custom development and cumbersome testing and certification. These efforts are centered on using XML, which makes information self-describing – and thus more easily understood by different systems. … This approach is also the foundation for XML-based Web services, which provide an Internet-based set of protocols for distributed computing. This new model for how software talks to other software has been embraced across the industry. It is the cornerstone of Microsoft .NET and the latest generation of our Visual Studio tools for software developers. This approach is also evident in the use of XML as the data interoperability framework for Office 2003 and the Office System set of products.”
Bill Gates, MS Executive Email, Feb’05
• Microsoft’s address:
– One Microsoft Way, Redmond, WA
![Page 45: Introduction to XML, XPath, & XQuery](https://reader033.fdocuments.net/reader033/viewer/2022061407/56814029550346895dab8974/html5/thumbnails/45.jpg)
Some Online Resources
• XPath tutorials
– http://www.w3schools.com/xpath/
– http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• XQuery tutorials
– http://www.w3schools.com/xquery/default.asp
– http://www.db.ucsd.edu/people/yannis/XQueryTutorial.htm
• XML reading
– http://www.rpbourret.com/xml/XMLAndDatabases.htm