XML, XML Schema, XPath and XQuery
Slides collated from various sources, many from Dan Suciu at Univ. of
Washington
CS561 - Spring 2004. 2
XML
W3C standard to complement HTML
• origins: structured text SGML
• motivation:
  – HTML describes presentation
  – XML describes content
• http://www.w3.org/TR/2000/REC-xml-20001006 (XML 1.0, second edition, 10/2000)
[Diagram: SGML, XML, HTML 4.0]
From HTML to XML
HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element
well-formed XML document: tags match and are properly nested, and there is a single root element
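A well-formedness check can be sketched with Python's standard `xml.etree.ElementTree` parser, which rejects crossed tags and a second root element. This is a minimal sketch: the helper `is_well_formed` is hypothetical, and a real parser enforces more rules (attribute quoting, legal names) than the two cases shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<book><title>Foundations of Databases</title><author>Abiteboul</author></book>"
mismatched = "<book><title>Foundations of Databases</book></title>"  # tags cross
two_roots = "<book/><book/>"  # more than one root element

print(is_well_formed(good))        # True
print(is_well_formed(mismatched))  # False
print(is_well_formed(two_roots))   # False

# <red></red> and <red/> parse to the same empty element
same = ET.tostring(ET.fromstring("<red></red>")) == ET.tostring(ET.fromstring("<red/>"))
print(same)  # True
```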
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
More XML: Oids and References
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
<children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
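Because oids and references are just attribute strings, a program has to resolve them itself. The sketch below does that with an `id -> element` index in Python; the wrapping `<people>` root element is an assumption (the slide shows three top-level `person` elements, which alone would not form a single-rooted document), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(doc)
# The parser sees only strings; build the id -> element index ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}


def children_of(person):
    """Follow the whitespace-separated ids in <children idref="...">."""
    ref = person.find("children")
    if ref is None:
        return []
    return [by_id[i].findtext("name") for i in ref.get("idref").split()]


def mother_of(person):
    """Follow the single id stored in the mother attribute, if any."""
    mid = person.get("mother")
    return by_id[mid].findtext("name") if mid else None


print(children_of(by_id["o456"]))  # ['John', 'Jane']
print(mother_of(by_id["o123"]))    # Mary
```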
XML Namespaces
• http://www.w3.org/TR/REC-xml-names (1/99)
• name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
<title> … </title>
<number> 15 </number>
<isbn:number> …. </isbn:number>
</book>
<tag xmlns:mystyle="http://…">
…
<mystyle:title> … </mystyle:title>
<mystyle:number> …
</tag>
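XML Namespaces
• syntactic: <number> and <isbn:number> are different names; the prefix is defined by the xmlns declaration
• semantic: the namespace name is a URI, often the URL of a schema (the URI is an identifier and need not be retrievable)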
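The syntactic role of a prefix shows up directly in Python's `xml.etree.ElementTree`, which replaces each prefix with its namespace name in `{uri}localpart` form. In this sketch the namespace URI is copied from the slide, while the `isbn:number` value `12345` and the query-side prefix `x` are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<book xmlns:isbn="www.isbn-org.org/def">
  <title>Foundations of Databases</title>
  <number>15</number>
  <isbn:number>12345</isbn:number>
</book>"""

root = ET.fromstring(doc)

# Prefixes are gone after parsing: each tag is "{namespace-uri}localpart"
tags = [child.tag for child in root]
print(tags)  # ['title', 'number', '{www.isbn-org.org/def}number']

# An unprefixed lookup finds only the element with no namespace
plain = root.find("number").text
# The prefix in a query is local to the query; only the URI has to match
qualified = root.find("x:number", {"x": "www.isbn-org.org/def"}).text
print(plain, qualified)  # 15 12345
```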
XML Schemas
• http://www.w3.org/TR/xmlschema-1/ (10/2000)
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
  – http://www.w3.org/TR/xmlschema-1
  – http://www.w3.org/TR/xmlschema-2
• XML-Schema is complex
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name,address)>
Elements vs. Types in XML Schema
• Types:
  – Simple types (integers, strings, ...)
  – Complex types (regular expressions, like in DTDs)
• Element-type-element alternation:
  – Root element has a complex type
  – That type is a regular expression of elements
  – Those elements have their complex types...
  – ...
  – On the leaves we have simple types
Local and Global Types in XML Schema
• Local type: <xsd:element name="person">
[define locally the person’s type] </xsd:element>
• Global type: <xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt"> [define here the type ttt] </xsd:complexType>
Global types: can be reused in other elements
Local vs. Global Elements in XML Schema
• Local element: <xsd:complexType name="ttt">
<xsd:sequence> <xsd:element name="address" type="..."/>... </xsd:sequence> </xsd:complexType>
• Global element: <xsd:element name="address" type="..."/>
<xsd:complexType name="ttt"> <xsd:sequence> <xsd:element ref="address"/> ... </xsd:sequence> </xsd:complexType>
Global elements: like in DTDs
Regular Expressions in XML Schema
Recall the element-type-element alternation:
<xsd:complexType name="....">
[regular expression on elements] </xsd:complexType>
Regular expressions:
• <xsd:sequence> A B C </...> = A B C
• <xsd:choice> A B C </...> = A | B | C
• <xsd:group> A B C </...> = (A B C)
• <xsd:... minOccurs="0" maxOccurs="unbounded"> ..</...> = (...)*
• <xsd:... minOccurs="0" maxOccurs="1"> ..</...> = (...)?
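The correspondence between content models and regular expressions can be sketched by matching the sequence of child element names of a `paper` against an ordinary Python regex for `(title, author*, year, (journal | conference))`. This is a hand translation of the `papertype` example, not a schema validator: it checks only the order and number of children, and the sample papers are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# sequence -> concatenation, choice -> |, minOccurs=0/maxOccurs=unbounded -> *
# Each child name becomes a token followed by one space.
PAPER_MODEL = re.compile(r"(title )(author )*(year )(journal |conference )")


def conforms(paper_xml: str) -> bool:
    """Check only the order/cardinality of child elements, not their contents."""
    paper = ET.fromstring(paper_xml)
    names = "".join(child.tag + " " for child in paper)
    return PAPER_MODEL.fullmatch(names) is not None


two_authors = ("<paper><title>T</title><author>A</author><author>B</author>"
               "<year>1999</year><journal>J</journal></paper>")
no_author = "<paper><title>T</title><year>1999</year><conference>C</conference></paper>"
no_venue = "<paper><title>T</title><year>1999</year></paper>"
both_venues = ("<paper><title>T</title><year>1999</year>"
               "<journal>J</journal><conference>C</conference></paper>")

results = [conforms(p) for p in (two_authors, no_author, no_venue, both_venues)]
print(results)  # [True, True, False, False]
```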
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
<xsd:sequence>
<xsd:element name="title" type="xsd:string"/>
. . . . . .
</xsd:sequence>
<xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element. Only complex types can carry attributes, so it takes more work to add attributes to a simple type.
“Mixed” Content, “Any” Type
• Better than in DTDs: can still enforce the type, but now may have text between any elements
<xsd:complexType mixed="true"> . . . .
• xsd:anyType means anything is permitted there
<xsd:element name="anything" type="xsd:anyType"/> . . . .
Derived Types by Extensions
<complexType name="Address">
<sequence> <element name="street" type="string"/>
<element name="city" type="string"/>
</sequence>
</complexType>
<complexType name="USAddress">
<complexContent>
<extension base="ipo:Address">
<sequence> <element name="state" type="ipo:USState"/>
<element name="zip" type="positiveInteger"/>
</sequence>
</extension>
</complexContent>
</complexType>
Corresponds to inheritance
Derived Types by Restrictions
• may restrict cardinalities, e.g. from (0, ∞) to (1, 1); may restrict choices; other restrictions…
<complexContent> <restriction base="ipo:Items"> … [rewrite the entire content, with restrictions]... </restriction> </complexContent>
Corresponds to set inclusion
Keys in XML Schema
<purchaseReport>
<regions>
<zip code="95819">
<part number="872-AA" quantity="1"/>
<part number="926-AA" quantity="1"/>
<part number="833-AA" quantity="1"/>
<part number="455-BX" quantity="1"/>
</zip>
<zip code="63143">
<part number="455-BX" quantity="4"/>
</zip>
</regions>
<parts>
<part number="872-AA">Lawnmower</part>
<part number="926-AA">Baby Monitor</part>
<part number="833-AA">Lapis Necklace</part>
<part number="455-BX">Sturdy Shelves</part>
</parts>
</purchaseReport>
XML Schema:
<key name="NumKey">
<selector xpath="parts/part"/>
<field xpath="@number"/>
</key>
Keys in XML Schema
• In general, two flavors:
<key name="someDummyNameHere">
<selector xpath="p"/>
<field xpath="p1"/>
<field xpath="p2"/>
. . .
<field xpath="pk"/>
</key>
<unique name="someDummyNameHere">
<selector xpath="p"/>
<field xpath="p1"/>
<field xpath="p2"/>
. . .
<field xpath="pk"/>
</unique>
Note: all XPath expressions “start” at the element currently being defined. The fields must identify a single node.
Keys in XML Schema
• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted” (relative paths only, optionally starting with .//):
  – a/b | a/c OK for selector
  – .//a/b/*/c OK for field
• Note: better than DTD’s ID mechanism
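The difference between `unique` and `key` can be sketched in Python for the simple case of one attribute field, reusing the `NumKey` constraint (selector `parts/part`, field `@number`). The helper `check` is hypothetical and far simpler than a schema validator: it supports a single attribute field and only ElementTree's XPath subset, and the unnumbered part is invented sample data.

```python
import xml.etree.ElementTree as ET


def check(scope, selector, attr, kind):
    """Simplified identity constraint with a single attribute field.

    kind="unique": field values that are present must be distinct.
    kind="key":    additionally, every selected node must have the field.
    """
    seen = set()
    for node in scope.findall(selector):
        value = node.get(attr)
        if value is None:
            if kind == "key":
                return False  # key: the field must exist
            continue          # unique: nodes without the field are skipped
        if value in seen:
            return False      # both: no duplicate values
        seen.add(value)
    return True


report = ET.fromstring("""<purchaseReport><parts>
  <part number="872-AA">Lawnmower</part>
  <part number="926-AA">Baby Monitor</part>
  <part>Unnumbered part</part>
</parts></purchaseReport>""")

unique_ok = check(report, "parts/part", "number", "unique")
key_ok = check(report, "parts/part", "number", "key")
print(unique_ok, key_ok)  # True False

# A duplicate number violates both constraints
ET.SubElement(report.find("parts"), "part", number="872-AA")
unique_after_dup = check(report, "parts/part", "number", "unique")
print(unique_after_dup)  # False
```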
Keys in XML Schema
• Examples
<key name="fullName">
<selector xpath=".//person"/>
<field xpath="forename"/>
<field xpath="surname"/>
</key>
<unique name="nearlyID">
<selector xpath=".//*"/>
<field xpath="@id"/>
</unique>
Recall: each person must have a single forename and a single surname
Foreign Keys in XML Schema
• Example
<keyref name="personRef" refer="fullName">
<selector xpath=".//personPointer"/>
<field xpath="@first"/>
<field xpath="@last"/>
</keyref>
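What `keyref` checks can be sketched in Python: collect the `fullName` key values (forename, surname) of every person, then report each `personPointer` whose (`@first`, `@last`) pair matches no key. The `<directory>` root and all names in it are hypothetical sample data; this only illustrates the foreign-key idea, not how a schema processor implements it.

```python
import xml.etree.ElementTree as ET

doc = """<directory>
  <person><forename>Jane</forename><surname>Doe</surname></person>
  <person><forename>John</forename><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Mary" last="Doe"/>
</directory>"""

root = ET.fromstring(doc)

# key "fullName": selector .//person, fields forename and surname
keys = {(p.findtext("forename"), p.findtext("surname")) for p in root.iter("person")}

# keyref "personRef": selector .//personPointer, fields @first and @last
dangling = [(ptr.get("first"), ptr.get("last"))
            for ptr in root.iter("personPointer")
            if (ptr.get("first"), ptr.get("last")) not in keys]

print(dangling)  # [('Mary', 'Doe')]
```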
XPATH
XPath
• Goal = give access to selected nodes of a document
• XPath main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• Navigation step: axis + node-test + predicates
• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers shortcuts
  – no axis means child
  – // means /descendant-or-self::node()/
XPath - Child axis navigation
• author is shorthand for child::author. Examples:
  – aaa -- all the child nodes labeled aaa (1,3)
  – aaa/bbb -- all the bbb grandchildren of aaa children (4)
  – */bbb -- all the bbb grandchildren of any child (4,6)
  – . -- the context node
  – / -- the root node
[Figure: example tree below the context node; children numbered 1 to 3, grandchildren numbered 4 to 7, labeled aaa, bbb or ccc]
XPath - child axis navigation
  – /doc -- all the doc children of the root
  – ./aaa -- all the aaa children of the context node (equivalent to aaa)
  – text() -- all the text children of the context node
  – node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children)
  – .. -- parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //text() -- all the text nodes in the document
Predicates
  – node()[2] -- the second child node of the context node
  – chapter[5] -- the fifth chapter child of the context node
  – node()[last()] -- the last child node of the context node
  – chapter[title = "introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all the text on descendant text nodes)
  – person[.//firstname = "joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "joe"
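The predicate forms can be tried with ElementTree, which supports positional predicates, `[last()]` and `[tag='text']` (comparing the child's complete text content, i.e. its string-value). It does not support `.//` inside a predicate, so the last query is filtered by hand. A minimal sketch on a hypothetical document; the helper `string_value` is an assumption, not a library function.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<book>
  <chapter><title>introduction</title></chapter>
  <chapter><title>data <i>model</i></title></chapter>
  <chapter><title>conclusion</title></chapter>
  <person><name><firstname>joe</firstname></name></person>
  <person><name><firstname>ann</firstname></name></person>
</book>""")


def string_value(elem):
    """XPath-style string-value: concatenation of all descendant text."""
    return "".join(elem.itertext())


second = root.find("chapter[2]")                       # chapter[2]
last_ch = root.find("chapter[last()]")                 # chapter[last()]
intro = root.findall("chapter[title='introduction']")  # chapter[title = "introduction"]
# Full text content is compared, so "data <i>model</i>" matches "data model"
dm = root.findall("chapter[title='data model']")

# person[.//firstname = "joe"], done by hand
joes = [p for p in root.findall("person")
        if any(string_value(f) == "joe" for f in p.iter("firstname"))]

print(string_value(second.find("title")), last_ch.findtext("title"))  # data model conclusion
print(len(intro), len(dm), len(joes))                                 # 1 1 1
```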
Axis navigation
• So far, nearly all our expressions have moved us down by moving to child nodes. Exceptions were:
  – . -- stay where you are
  – / -- go to the root
  – // -- all descendants of the root
  – .// -- all descendants of the context node
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
  – Some of these (self, parent) describe single nodes, others describe sequences of nodes.
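ElementTree exposes only downward navigation, so the upward and sideways axes have to be built from a child-to-parent map. The sketch below implements simplified `ancestor`, `following-sibling` and `preceding-sibling` axes on a hypothetical document; the helper names are assumptions, and all results are returned in document order except `ancestors`, which walks upward from the parent.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<bib><book><title>T1</title><author>A</author><year>1995</year></book></bib>"
)
# Elements have no parent pointer, so build a child -> parent map once.
parent = {child: p for p in root.iter() for child in p}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis, returned here in document order."""
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[:siblings.index(node)]


title = root.find("book/title")
year = root.find("book/year")
print([e.tag for e in ancestors(title)])           # ['book', 'bib']
print([e.tag for e in following_siblings(title)])  # ['author', 'year']
print([e.tag for e in preceding_siblings(year)])   # ['title', 'author']
```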
XPath Navigation Axes
[Diagram: the ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace and self axes, drawn relative to the context node]
XPath abbreviated syntax
(nothing)   child::
@           attribute::
//          /descendant-or-self::node()/
.           self::node()
.//         self::node()/descendant-or-self::node()/
..          parent::node()
/           (document root)
Query Languages - XQuery
Summary of XQuery
• FLWR expressions
• FOR and LET expressions
• Collections and sorting
Resources
XQuery: A Query Language for XML, Chamberlin, Florescu, et al.
W3C recommendation: www.w3.org/TR/xquery/
XQuery
• Based on Quilt (which is based on XML-QL)
• http://www.w3.org/TR/xquery/ (2/2001)
• XML Query data model (ordered)
FLWR (“Flower”) Expressions
FOR ... LET... FOR... LET...
WHERE...
RETURN...
XQuery
Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: <title> abc </title> <title> def </title> <title> ghi </title>
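The FOR / WHERE / RETURN pipeline maps naturally onto a Python list comprehension over ElementTree elements. This sketch assumes an inline string in place of `bib.xml` (the root element stands for `document("bib.xml")/bib`); the two books come from the earlier bibliography slides.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for document("bib.xml"); root is the <bib> element
root = ET.fromstring("""<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Data on the Web</title><year>1999</year></book>
</bib>""")

titles = [x.find("title")                        # RETURN $x/title
          for x in root.findall("book")          # FOR $x IN .../bib/book
          if int(x.findtext("year")) > 1995]     # WHERE $x/year > 1995

print([t.text for t in titles])  # ['Data on the Web']
```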
XQuery
For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
$a,
FOR $t IN document("bib.xml")/bib/book[author=$a]/title
RETURN $t
</result>
distinct = a function that eliminates duplicates
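The nested query can be sketched in Python as an outer loop over distinct authors and an inner loop that collects their titles. Assumptions: an inline string replaces `bib.xml` (books taken from the earlier slides), and `dict.fromkeys` is used for `distinct` because it removes duplicates while keeping first-seen order, whereas XQuery's `distinct` makes no ordering promise.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bib>
  <book><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher></book>
  <book><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    <publisher>Morgan Kaufmann</publisher></book>
</bib>""")

# distinct(.../book[publisher="Morgan Kaufmann"]/author)
mk_authors = list(dict.fromkeys(
    a.text for a in root.findall("book[publisher='Morgan Kaufmann']/author")))

results = []
for a in mk_authors:
    # inner FOR: titles of every book by this author, from any publisher
    titles = [b.findtext("title") for b in root.findall("book")
              if a in [x.text for x in b.findall("author")]]
    results.append((a, titles))

for author, titles in results:
    print(author, titles)
```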
XQuery
Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result> <result> <author> Smith </author> <title> ghi </title> </result>
XQuery
• FOR $x IN expr -- binds $x to each element in the list expr
• LET $x := expr -- binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations
XQuery
count = a (aggregate) function that returns the number of elements
<big_publishers>
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/bib/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
</big_publishers>
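The LET-then-count pattern can be sketched in Python: for each distinct publisher, bind the whole list of its books at once and test the list's length. The sample data is hypothetical, and the threshold is an assumption lowered from the slide's 100 so that the tiny sample produces a result.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: only the publishers matter here
root = ET.fromstring("""<bib>
  <book><publisher>Addison Wesley</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

THRESHOLD = 1  # the slide uses 100; lowered for the tiny sample

big_publishers = []
for p in dict.fromkeys(x.text for x in root.iter("publisher")):  # FOR $p IN distinct(...)
    b = [book for book in root.findall("book")
         if book.findtext("publisher") == p]                    # LET $b := whole list
    if len(b) > THRESHOLD:                                      # WHERE count($b) > ...
        big_publishers.append(p)                                # RETURN $p

print(big_publishers)  # ['Morgan Kaufmann']
```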
XQuery
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b in document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
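The same idea with `avg`: the LET computes one value over the whole collection, and the FOR/WHERE then compares each book against it. The titles `b1`..`b3` and their prices are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bib>
  <book price="55"><title>b1</title></book>
  <book price="30"><title>b2</title></book>
  <book price="65"><title>b3</title></book>
</bib>""")

prices = [float(b.get("price")) for b in root.findall("book")]
a = sum(prices) / len(prices)              # LET $a := avg(.../book/@price)
above = [b.findtext("title")               # RETURN $b (title only, for brevity)
         for b in root.findall("book")     # FOR $b IN .../book
         if float(b.get("price")) > a]     # WHERE $b/@price > $a

print(a, above)  # 50.0 ['b1', 'b3']
```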
XQuery
Summary:
• FOR-LET-WHERE-RETURN = FLWR
FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of the XQuery data model
FOR v.s. LET
FOR
• binds node variables → iteration
LET
• binds collection variables → one value
FOR v.s. LET
FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book></result> <result> <book>...</book></result> <result> <book>...</book></result> ...
LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>
Returns: <result> <book>...</book> <book>...</book> <book>...</book> ...</result>
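The FOR-versus-LET difference is just "one result per item" versus "one result for the whole list". The sketch builds `<result>` elements with ElementTree and deep-copies the books, mimicking the copying done by XQuery element constructors; the helper `result_of` and the sample books are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

root = ET.fromstring("<bib><book>b1</book><book>b2</book><book>b3</book></bib>")
books = root.findall("book")


def result_of(items):
    """Build <result> containing copies of the given elements."""
    r = ET.Element("result")
    r.extend([copy.deepcopy(x) for x in items])
    return r


for_output = [result_of([x]) for x in books]  # FOR: one <result> per book
let_output = [result_of(books)]               # LET: one <result> with every book

print([len(r) for r in for_output])  # [1, 1, 1]
print([len(r) for r in let_output])  # [3]
print(ET.tostring(let_output[0], encoding="unicode"))
# <result><book>b1</book><book>b2</book><book>b3</book></result>
```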
Collections in XQuery
• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection
• LET $a := /bib/book -- $a is a collection
• $b/author -- a collection (several authors...)
RETURN <result> $b/author </result>
Returns: <result> <author>...</author> <author>...</author> <author>...</author> ...</result>
Sorting in XQuery
<publisher_list>
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN <publisher>
<name> $p/text() </name> ,
FOR $b IN document("bib.xml")//book[publisher = $p]
RETURN <book> $b/title , $b/@price </book>
SORTBY(price DESCENDING)
</publisher>
SORTBY(name)
</publisher_list>
Sorting in XQuery
• Sorting arguments: refer to name space of RETURN clause, not FOR clause
• To sort on an element you don’t want to display, first return it, then remove it with an additional query.
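The two SORTBY levels of the publisher query correspond to two sorts in Python: publishers by name, then each publisher's books by numeric price, descending. Unlike the XQuery restriction noted above, a Python sort key need not appear in the output. The sample data and the `publisher_list` tuple layout are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<bib>
  <book price="30"><title>b2</title><publisher>Morgan Kaufmann</publisher></book>
  <book price="55"><title>b1</title><publisher>Addison Wesley</publisher></book>
  <book price="65"><title>b3</title><publisher>Morgan Kaufmann</publisher></book>
</bib>""")

publisher_list = []
for p in sorted({x.text for x in root.iter("publisher")}):         # SORTBY(name)
    books = [b for b in root.findall("book") if b.findtext("publisher") == p]
    books.sort(key=lambda b: float(b.get("price")), reverse=True)   # SORTBY(price DESCENDING)
    publisher_list.append((p, [(b.findtext("title"), b.get("price")) for b in books]))

print(publisher_list)
# [('Addison Wesley', [('b1', '55')]), ('Morgan Kaufmann', [('b3', '65'), ('b2', '30')])]
```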
If-Then-Else
FOR $h IN //holding
RETURN <holding>
$h/title,
IF $h/@type = "Journal"
THEN $h/editor
ELSE $h/author
</holding> SORTBY (title)
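The IF-THEN-ELSE inside RETURN becomes a Python conditional expression that picks `editor` for journals and `author` otherwise, followed by a sort on title. The `<library>` root and the holdings `J1`/`B1` are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <holding type="Journal"><title>J1</title><editor>Ed</editor></holding>
  <holding type="Book"><title>B1</title><author>Au</author></holding>
</library>""")

holdings = []
for h in root.iter("holding"):                                    # FOR $h IN //holding
    who = h.find("editor") if h.get("type") == "Journal" else h.find("author")
    holdings.append((h.findtext("title"), who.text if who is not None else None))
holdings.sort(key=lambda pair: pair[0])                            # SORTBY (title)

print(holdings)  # [('B1', 'Au'), ('J1', 'Ed')]
```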
Existential Quantifiers
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
contains($p, "sailing")
AND contains($p, "windsurfing")
RETURN $b/title
Universal Quantifiers
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
contains($p, "sailing")
RETURN $b/title
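SOME and EVERY correspond to Python's `any` and `all` over the paragraphs of each book, with `contains` as a substring test. As with EVERY over an empty sequence, `all([])` is true, so a book with no paragraphs qualifies for the universal query. The `<library>` document and its contents are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<library>
  <book><title>b1</title>
    <chapter><para>sailing and windsurfing</para></chapter>
    <para>more sailing</para>
  </book>
  <book><title>b2</title><para>sailing</para><para>rowing</para></book>
  <book><title>b3</title></book>
</library>""")


def paras(book):
    """String-values of all para descendants ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]


books = list(root.iter("book"))

# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_titles = [b.findtext("title") for b in books
               if any("sailing" in t and "windsurfing" in t for t in paras(b))]
# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_titles = [b.findtext("title") for b in books
                if all("sailing" in t for t in paras(b))]

print(some_titles)   # ['b1']
print(every_titles)  # ['b1', 'b3']
```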