Introduction to XQuery Bun Yue Professor, CS/CIS UHCL.

Post on 31-Dec-2015


Introduction to XQuery

Bun Yue, Professor, CS/CIS, UHCL

W3C Recommendations

http://www.w3.org/TR/xquery/: W3C XQuery.
http://www.w3.org/TR/xmlquery-use-cases: XQuery use cases.
http://www.w3.org/TR/xquery-operators/: XQuery and XPath functions.
http://www.w3.org/TR/xpath-datamodel/: XQuery 1.0 and XPath 2.0 Data Model.
http://www.w3.org/TR/xpath20/: XPath 2.0.
http://www.w3.org/TR/xmlschema-1/: XML Schema Part 1: Structures.
http://www.w3.org/TR/xmlschema-2/: XML Schema Part 2: Datatypes.

Introduction

XQuery is designed to query and retrieve information effectively from diverse XML sources.

The XML sources can be one or more XML documents.

XQuery is derived from Quilt, and has borrowed features from XPath, XQL, SQL, etc.

Introduction

It is a functional language where a query is an expression.

There are three faces of the XQuery language:
    A "surface" syntax that programmers will most likely use.
    An XML-based syntax that machines will most likely use (XQueryX).
    A formal semantics that XQuery engine implementers use.

Introduction

XQuery 1.0 extends XPath 2.0.

The type system of XQuery is based on XML Schema.

A limitation of XQuery 1.0: no update or insert.

The basic building blocks of XQuery are expressions. (In this sense, like SQL, XQuery is not a full programming language.)

Comparing to SQL

                 Relational DB: SQL                  XML DB: XQuery
Basic units      relations                           collections
Records          tuples or rows of schema            documents of same schema
Schema           Relational Schema                   DTD, XML Schema
Query results    Relations: unordered list of rows   Ordered sequences of nodes

Review of XPath 2.0

The value of an expression is a sequence, which is an ordered list of items.

An item is either a node or an atomic value.

There are 7 node types:
    Document
    Element
    Attribute
    Comment
    Text
    Processing Instruction
    Namespace

XQueryX

For doc("census.xml")//person[@job="Athlete"] the corresponding XQueryX can be:

<?xml version="1.0"?>
<q:query xmlns:q="http://www.w3.org/2001/06/xqueryx">
  <q:step q:axis="descendant-or-self">
    <q:function q:name="document">
      <q:constant q:datatype="xs:string">census.xml</q:constant>
    </q:function>
    <q:predicatedExpr>
      <q:identifier>person</q:identifier>
      <q:predicate>
        <q:function q:name="equals">
          <q:step q:axis="attribute">
            <q:identifier>job</q:identifier>
          </q:step>
          <q:constant q:datatype="xs:string">Athlete</q:constant>
        </q:function>
      </q:predicate>
    </q:predicatedExpr>
  </q:step>
</q:query>

Data Types

XQuery is strongly typed.

XQuery types are based on:
    XML Schema: using the namespace prefix xs and URI http://www.w3.org/2001/XMLSchema.
    XPath functions and operators: using the namespace prefix xdt and URI http://www.w3.org/2004/07/xpath-datatypes.

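Types

xdt:untyped denotes element nodes that have not yet been validated.

xdt:untypedAtomic denotes atomic values that have not been assigned a more specific type.

(The final XQuery 1.0 Recommendation places these two types in the xs namespace, as xs:untyped and xs:untypedAtomic.)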

Query

A query in XQuery is an expression for reading XML documents or fragments and returning a sequence of well-formed XML fragments.

Everything in XQuery is an expression that is evaluated to a value.

Query expressions

Some common forms of XQuery expressions are (these appear in most tutorials):
    path expressions
    element constructors
    FLWR or FLWOR (pronounced as "flower") expressions
    list expressions
    conditional expressions
    quantified expressions
    datatype expressions

More Queries

Examples of other expressions include:
    primary expressions
    sequence expressions
    arithmetic expressions
    logical expressions
    comparison expressions
    sorting expressions
    validate expressions

Comments

XQuery comments are embedded within (: and :).

Functions

XQuery supports a collection of about 200 built-in operators and functions that can be used within expressions.

Input Functions

The input functions in XQuery are doc() and collection(). They identify the sources of the XML documents to be queried.

Prolog

An XQuery may have a prolog for declarations. Examples:
    Variable declarations
    Function declarations
    Base-URI declarations
    Version declarations
    Module import
    …

Variable Declarations

Format: declare variable $name := expression;

E.g.

declare variable $a := doc("census.xml")//person ;

Path Expressions

XQuery 1.0 is a superset of XPath 2.0.

An XPath expression is also an XQuery expression.

Editix

Use “View > Windows > XQuery Builder”.

For XQ files, use “XSLT/XQuery > Transform using an XQuery Request…”
    Specify the source xq file, the xml file and the output file.
    Use the .xml extension for the output. If you use the .txt extension, only text node contents are output.

Examples

declare base-uri "whatever-path";
doc("bib.xml")/*

Returns essentially the whole of bib.xml (its root element).

Example

doc("bib.xml")//*

Returns many nodes (in a sequence).

The result is not a well-formed XML document, because the sequence has no single root element.

Examples

doc("bib.xml")//book[@year]

count(doc("census.xml")//person)
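For readers without an XQuery engine at hand, the two path expressions above can be approximated in Python. This is only a rough sketch: the inline document is a hypothetical stand-in for bib.xml (the real file is not shown in these slides), and ElementTree supports just a small subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; the third book is invented to show the filter.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title></book>
  <book><title>Undated draft</title></book>
</bib>"""

root = ET.fromstring(BIB)

# Rough analogue of doc("bib.xml")//book[@year]:
# keep only the <book> elements that carry a year attribute.
books_with_year = root.findall(".//book[@year]")

# Rough analogue of count(doc("bib.xml")//book):
# the size of the selected sequence.
book_count = len(root.findall(".//book"))

print([b.findtext("title") for b in books_with_year])
print(book_count)
```

The predicate [@year] tests only for the presence of the attribute, not for its value.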

Element Constructors

Element constructors can be used to construct XML elements.

If the name, attributes, and content of the element are all constants, the element constructor is based on standard XML notation and is called a direct element constructor (W3C).

Example

The XQuery

<authors><author>Bun Yue</author></authors>

returns

<authors><author>Bun Yue</author></authors>

Element Constructors

XQuery expressions can be embedded in the direct element constructors within a pair of curly braces, {}.

For the characters '{' and '}', use '{{' and '}}' respectively.

XQuery expressions may be separated by commas.

Example

<authors><author>Bun Yue</author>{ doc("bib.xml")//author }</authors>

Adds Bun Yue to the authors of bib.xml.

Computed Constructors

Computed constructors can also be used to construct nodes:
    Use the keywords element, attribute, document, text, processing-instruction, comment, or namespace to declare the type of the node.
    Specify the node name for those node types with names (element, attribute, processing instruction, and namespace nodes).
    Use a pair of braces to enclose the content expression.
    Note the use of commas to separate expressions in the content.

Example (from W3C)

element book {
  attribute isbn { "isbn-0060229357" },
  element title { "Harold and the Purple Crayon" },
  element author {
    element first { "Crockett" },
    element last { "Johnson" }
  }
}

Example (result)

<book isbn="isbn-0060229357">
  <title>Harold and the Purple Crayon</title>
  <author>
    <first>Crockett</first>
    <last>Johnson</last>
  </author>
</book>
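A computed constructor builds a node from a kind, a name and a content expression, much as a tree-building API does. As a loose analogy only (Python's ElementTree is not part of XQuery), the same book element could be assembled like this:

```python
import xml.etree.ElementTree as ET

# element book { attribute isbn {...}, ... }
book = ET.Element("book", {"isbn": "isbn-0060229357"})

# element title { "Harold and the Purple Crayon" }
ET.SubElement(book, "title").text = "Harold and the Purple Crayon"

# element author { element first {...}, element last {...} }
author = ET.SubElement(book, "author")
ET.SubElement(author, "first").text = "Crockett"
ET.SubElement(author, "last").text = "Johnson"

# Serialize without indentation.
serialized = ET.tostring(book, encoding="unicode")
print(serialized)
```

Apart from whitespace, the serialized string matches the result shown on the slide.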

Dynamic Element Names

Computed expressions can be used to create elements with dynamic names.

Example

<result>{
  for $author in doc("bib.xml")//author
  return element { $author/last/text() } { $author/first }
}</result>

Example Result

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <Stevens>
    <first>W.</first>
  </Stevens>
  <Stevens>
    <first>W.</first>
  </Stevens>
  <Abiteboul>
    <first>Serge</first>
  </Abiteboul>
  <Buneman>
    <first>Peter</first>
  </Buneman>
  <Suciu>
    <first>Dan</first>
  </Suciu>
</result>

Example

Note that <first> is copied into the result as a child element. Compare with the following query, which copies only the text content of <first>:

<result>{
  for $author in doc("bib.xml")//author
  return element { $author/last/text() } { $author/first/text() }
}</result>

Example

This example may also result in a runtime error, as the value of <last> may not be suitable for a QName (for example, a last name that contains a space).

FLWOR expressions

FLWOR expressions are one of the most important constructs in XQuery.

Compare them with the SELECT statement of SQL.

FLWOR (W3C)

[42]  FLWORExpr        ::=  (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
[43]  ForClause        ::=  "for" "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle)*
[45]  LetClause        ::=  "let" "$" VarName TypeDeclaration? ":=" ExprSingle ("," "$" VarName TypeDeclaration? ":=" ExprSingle)*
[123] TypeDeclaration  ::=  "as" SequenceType
[44]  PositionalVar    ::=  "at" "$" VarName
[46]  WhereClause      ::=  "where" Expr
[47]  OrderByClause    ::=  ("order" "by" | "stable" "order" "by") OrderSpecList
[48]  OrderSpecList    ::=  OrderSpec ("," OrderSpec)*
[49]  OrderSpec        ::=  ExprSingle OrderModifier
[50]  OrderModifier    ::=  ("ascending" | "descending")? (("empty" "greatest") | ("empty" "least"))? ("collation" StringLiteral)?

FLWOR

FLWOR expressions allow:
    For: Iteration through the items in XPath 2.0 sequences. Creates a tuple stream where each tuple contains a distinct binding of each variable to a value.
    Let: Variable binding.
    Where: Predicate application for inclusion in the iteration.
    Order by: Ordering the data set for the iteration.
    Return: Constructing the new result to return.

For and Let

The for and let clauses produce a tuple stream.

A tuple consists of one or more bound variables.

A variable begins with the prefix $.

A bound variable is one that has been assigned a value.

Example

declare base-uri "whatever";
let $a := doc("bib.xml")//author
return
<authors>
  { $a }
</authors>

Example Results

<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author>
    <last>Stevens</last>
    <first>W.</first>
  </author>
  <author>
    <last>Stevens</last>
    <first>W.</first>
  </author>
  …
</authors>

Example Note

In this example:
    The tuple stream is composed of only one tuple.
    The variable $a in this tuple is bound to the node sequence of 5 <author> nodes.

Example

for $a in doc("bib.xml")//author
return
<authors>
  { $a }
</authors>

Example Result

<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author>
    <last>Stevens</last>
    <first>W.</first>
  </author>
</authors>
<authors>
  <author>
    <last>Stevens</last>
    <first>W.</first>
  </author>
</authors>
…

Example Notes

In this example:
    The tuple stream is composed of five tuples.
    The variable $a in each tuple is bound to one <author> node at a time.
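The difference between the two tuple streams can be imitated outside XQuery. The following Python sketch is only an analogy: the inline document is a hypothetical reduction of bib.xml to the five authors seen in the results, and each tuple is modelled as a plain dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical reduction of bib.xml to the five <author> elements in the results.
BIB = """<bib>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </book>
</bib>"""

authors = ET.fromstring(BIB).findall(".//author")

# let $a := doc("bib.xml")//author
# One tuple; $a is bound to the whole sequence.
let_tuples = [{"a": authors}]

# for $a in doc("bib.xml")//author
# One tuple per item; $a is bound to a single node.
for_tuples = [{"a": author} for author in authors]

# The return clause is evaluated once per tuple.
let_results = [[n.findtext("last") for n in t["a"]] for t in let_tuples]
for_results = [[t["a"].findtext("last")] for t in for_tuples]

print(len(let_results), len(for_results))
```

With let, the return clause runs once and yields a single <authors> element holding all five authors. With for, it runs five times.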

Example

for $a in doc("bib.xml")//author,
    $b in doc("bib.xml")//author
return <count/>

Example Result

<?xml version="1.0" encoding="UTF-8"?>

<count/><count/><count/><count/>… (: 25 counts :)

Example Note

The tuple stream is composed of 25 tuples (5 × 5 combinations of $a and $b). The first few are:

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>)

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>)

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Abiteboul</last><first>Serge</first></author>)

Example

for $a in doc("bib.xml")//author,
    $b in $a/last
return <count/>

Example Result

<?xml version="1.0" encoding="UTF-8"?>

<count/><count/><count/><count/><count/>

Example Note

The tuple stream is composed of only 5 tuples, because $b ranges only over the <last> child of the current $a. The first few are:

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <last>Stevens</last>)

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <last>Stevens</last>)

($a: <author><last>Abiteboul</last><first>Serge</first></author>, $b: <last>Abiteboul</last>)
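The tuple counts in the last two examples (25 versus 5) follow from whether the second variable depends on the first. Here is a hedged Python analogy, again using a hypothetical five-author stand-in for bib.xml:

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical reduction of bib.xml to the five <author> elements in the results.
BIB = """<bib>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </book>
</bib>"""

authors = ET.fromstring(BIB).findall(".//author")

# for $a in //author, $b in //author
# Independent ranges: every combination of $a and $b.
independent = list(itertools.product(authors, authors))

# for $a in //author, $b in $a/last
# $b depends on $a: only the <last> children of the current $a.
dependent = [(a, b) for a in authors for b in a.findall("last")]

print(len(independent), len(dependent))
```

The nested comprehension mirrors how later variables in a for clause are evaluated once for each binding of the earlier ones.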

Example

for $a in doc("bib.xml")//author,
    $b in doc("bib.xml")//author
where $a = $b
return
<result>
  <alast>{ $a/last/text() }</alast>
  <blast>{ $b/last/text() }</blast>
</result>

Example Result

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <alast>Stevens</alast>
  <blast>Stevens</blast>
</result>
… (: three more times. :)
<result>
  <alast>Abiteboul</alast>
  <blast>Abiteboul</blast>
</result>
<result>
  <alast>Buneman</alast>
  <blast>Buneman</blast>
</result>
<result>
  <alast>Suciu</alast>
  <blast>Suciu</blast>
</result>

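Example Note

After the where clause, the tuple stream is composed of only 7 tuples. $a = $b compares the atomized (string) values of the two nodes, so the two identical Stevens authors match each other in all 2 × 2 = 4 combinations, while each of the other three authors matches only itself. The 7 tuples are:

($a: <author><last>Stevens</last><first>W.</first></author>, $b: <author><last>Stevens</last><first>W.</first></author>) (: 4 times :)

($a: <author><last>Abiteboul</last><first>Serge</first></author>, $b: <author><last>Abiteboul</last><first>Serge</first></author>)

…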
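The count of 7 can be checked with a small Python analogy. The sketch assumes a hypothetical five-author stand-in for bib.xml, and approximates the comparison of two untyped element nodes by comparing their concatenated text. The helper name string_value is illustrative, not an XQuery function.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical reduction of bib.xml to the five <author> elements in the results.
BIB = """<bib>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book><author><last>Stevens</last><first>W.</first></author></book>
  <book>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
  </book>
</bib>"""

authors = ET.fromstring(BIB).findall(".//author")


def string_value(node):
    """Concatenate all descendant text, e.g. 'StevensW.' (approximates atomization)."""
    return "".join(node.itertext())


# for $a in //author, $b in //author  gives 25 tuples
all_tuples = list(itertools.product(authors, authors))

# where $a = $b  keeps only the tuples whose values compare equal
kept = [(a, b) for a, b in all_tuples if string_value(a) == string_value(b)]

print(len(all_tuples), len(kept))
```

Note that the comparison is by value, not by node identity, which is why the two Stevens nodes pair up with each other as well as with themselves.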

Example

<figlist> {
  for $f in doc("tree-data.xml")//figure
  return
  <diagram>
    { $f/@* }
    { $f/title }
  </diagram>
}</figlist>

Example Result

<?xml version="1.0" encoding="UTF-8"?>
<figlist>
  <diagram height="400" width="400">
    <title>Traditional client/server architecture</title>
  </diagram>
  <diagram height="200" width="500">
    <title>Graph representations of structures</title>
  </diagram>
  <diagram height="250" width="400">
    <title>Examples of Relations</title>
  </diagram>
</figlist>

Example Note

There are three tuples in the tuple stream of the for clause. Each tuple has one variable, $f, which is bound in turn to each of the three <figure> elements in the input XML.

{ $f/@* } returns the attributes of the original <figure> element, which become attributes of the output <diagram> element.

Example

<authors> {
  fn:string-join(
    for $a in doc("tree-data.xml")//author return $a/text(),
    ", ")
}</authors>

Example Result

<?xml version="1.0" encoding="UTF-8"?>

<authors>Serge Abiteboul, Peter Buneman, Dan Suciu</authors>

Example Note

fn:string-join takes two arguments:
    A sequence of strings, and
    A separator string.
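Python's str.join plays a similar role to fn:string-join, with the separator as the receiver rather than the second argument. A rough sketch, where the inline fragment is a hypothetical excerpt of tree-data.xml containing only the three authors from the result:

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of tree-data.xml; the <article> wrapper is assumed.
TREE_DATA = (
    "<article>"
    "<author>Serge Abiteboul</author>"
    "<author>Peter Buneman</author>"
    "<author>Dan Suciu</author>"
    "</article>"
)

root = ET.fromstring(TREE_DATA)

# for $a in ...//author return $a/text()
names = [a.text for a in root.findall(".//author")]

# fn:string-join(<sequence of strings>, ", ")
joined = ", ".join(names)

# <authors>{ ... }</authors>
authors = ET.Element("authors")
authors.text = joined
print(ET.tostring(authors, encoding="unicode"))
```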

Example

<book> {
  for $f in doc("tree-data.xml")//figure
  return
  <figure>
    { attribute size { $f/@width * $f/@height } }
  </figure>
}</book>

Example Result

<?xml version="1.0" encoding="UTF-8"?>

<book> <figure size="160000"/> <figure size="100000"/> <figure size="100000"/></book>

Example

<book> {
  for $f in doc("tree-data.xml")//figure
  let $size := $f/@width * $f/@height
  order by $size
  return
  <figure>
    { attribute size { $size } }
  </figure>
}</book>

Example Result

<?xml version="1.0" encoding="UTF-8"?>

<book> <figure size="100000"/> <figure size="100000"/> <figure size="160000"/></book>
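The effect of let and order by in this query can be traced by hand. In this Python sketch the widths and heights are the attribute values shown in the earlier <diagram> result, and sorted() stands in for order by, whose default direction is ascending:

```python
# Width and height of the three figures, as shown in the earlier example result.
figures = [
    {"width": 400, "height": 400},
    {"width": 500, "height": 200},
    {"width": 400, "height": 250},
]

# let $size := $f/@width * $f/@height  (one binding per tuple)
sizes = [f["width"] * f["height"] for f in figures]

# order by $size  (ascending unless stated otherwise)
ordered = sorted(sizes)

print(sizes)
print(ordered)
```

The unsorted list corresponds to the previous example's output, and the sorted list to this one.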

Exercise #1

Use bib.xml. Show all books published by Addison-Wesley.

<bib>
    <book>
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
    </book>
    <book>
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
    </book>
</bib>

Exercise #2

All books by Addison-Wesley using a different format:

<bib>
    <book author="W. Stevens">
        <name>TCP/IP Illustrated</name>
    </book>
    <book author="W. Stevens">
        <name>Advanced Programming in the Unix environment</name>
    </book>
</bib>

Exercise #3

All books written by W. Stevens ordered by year:

<result>
    <book-title>Advanced Programming in the Unix environment</book-title>
    <book-title>TCP/IP Illustrated</book-title>
</result>

Exercise #4

All books written by W. Stevens ordered by year in descending order:

<result>
    <book-title>TCP/IP Illustrated</book-title>
    <book-title>Advanced Programming in the Unix environment</book-title>
</result>

Exercise #5

Use ft2.xml. Return every <person> with its <first> and <last> child elements. Add a child element <numEmail> that holds the number of email addresses.

<result>
    <person>
        <first>Boris</first>
        <last>Becker</last>
        <numEmail>2</numEmail>
    </person>
    …
</result>

Exercise #6

Return all <person> elements with all their attributes. The body of each <person> element should be the name of the person, first name followed by last name. For ft2.xml, it returns:

<result>
    <person ssn="s123456789" gender="M" luckynumber="7">Boris Becker</person>
    <person ssn="s111222333" gender="F" luckynumber="6">Valerie Becker</person>
    <person ssn="s123123123" gender="M" luckynumber="4">Chris Becker</person>
    <person ssn="s222333444" gender="F">Julie Becker</person>
    <person ssn="s555987323" gender="M">John Becker</person>
    <person ssn="s887667545" gender="F">Mary Becker</person>
</result>

Exercise #7

Return all pairs of <first> elements of persons with the same last name, not including pairing a person with oneself. Each pair is embedded in an element that has the last name of the persons as its element name. For ft2.xml, it returns:

<result>
    <Becker><first>Boris</first><first>Valerie</first></Becker>
    <Becker><first>Boris</first><first>Chris</first></Becker>
    <Becker><first>Boris</first><first>Julie</first></Becker>
    <Becker><first>Boris</first><first>John</first></Becker>
    <Becker><first>Boris</first><first>Mary</first></Becker>
    <Becker><first>Valerie</first><first>Boris</first></Becker>
    …
</result>

Exercise #8

Convert all text nodes to <text /> and all elements with name x to <element name="x" />. For ft2.xml, it returns:

<result>
    <element name="familytree"/>
    <text/>
    <text/>
    <element name="meta"/>
    …
</result>

Function Declarations

XQuery allows user-defined functions in the prolog.

[26]  FunctionDecl     ::=  "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external")
[27]  ParamList        ::=  Param ("," Param)*
[28]  Param            ::=  "$" QName TypeDeclaration?
[118] TypeDeclaration  ::=  "as" SequenceType

Example: factorial($i)

declare function local:factorial($i as xs:integer) as xs:integer
{
  if ($i < 0) then 0
  else if ($i = 0) then 1
  else $i * local:factorial($i - 1)
};

local:factorial(6)

Functions

There is a ; after the function declaration.

The namespace prefix local is used for user-defined functions. XQuery predefines the namespace prefix local to the namespace http://www.w3.org/2004/07/xquery-local-functions, and reserves this namespace for use in defining local functions. (In the final XQuery 1.0 Recommendation the namespace is http://www.w3.org/2005/xquery-local-functions.)

The types of the arguments and return values should be sequence types.

Types

A sequence type can be:
    empty() (written empty-sequence() in the final XQuery 1.0 Recommendation), or
    ItemType OccurrenceIndicator?

OccurrenceIndicator can be +, ? or *.

An item type can be:
    item(),
    an atomic type, or
    a kind test.

Kind Tests

Important kind tests include:
    node()
    text()
    comment()
    processing-instruction(): with optional name argument.
    element test
    attribute test

Element Tests

Examples of element tests are:
    element(*)
    element(familytree)
    element(man, personType)

Functions

Writing XQuery functions:
    Functional programming.
    Many are recursive in nature.
    Beware of the types of parameters and return values.

Example from W3C

declare function local:depth($e as node()) as xs:integer
{
  (: A node with no children has depth 1 :)
  (: Otherwise, add 1 to max depth of children :)
  if (fn:empty($e/*)) then 1
  else fn:max(for $c in $e/* return local:depth($c)) + 1
};

<result>{ local:depth(doc("ft2.xml")) }</result>
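The same recursion can be traced in Python to see how the base case and the recursive case combine. This is an analogy rather than a translation: the sample document is a hypothetical miniature in the spirit of ft2.xml, and ElementTree has no document node, so the extra level contributed by doc() is added by hand.

```python
import xml.etree.ElementTree as ET


def depth(element):
    """Return 1 for an element with no child elements, else 1 + the deepest child."""
    children = list(element)  # child elements only, like $e/*
    if not children:
        return 1
    return max(depth(child) for child in children) + 1


# Hypothetical miniature document; the real ft2.xml is not shown in these slides.
SAMPLE = (
    "<familytree><meta/>"
    "<person><first>Boris</first><last>Becker</last></person>"
    "</familytree>"
)

root = ET.fromstring(SAMPLE)

# familytree, then person, then first: three levels of elements.
element_depth = depth(root)

# local:depth(doc(...)) starts at the document node, one level above the root.
document_depth = element_depth + 1

print(element_depth, document_depth)
```

Because the XQuery version is called on the document node, its result is one more than the depth measured from the root element.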

Exercise #9

Write an XQuery function to count the number of elements in an element node (including itself). Try to use a recursive solution.

Exercise #10

For an XML document such as ft2.xml, write a function that returns the person nodes of all children of the parent whose social security number is $ssn.

Questions