XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.



Overview

XML: a self-describing, hierarchical data model

DTD, XML Schema: standardizing schemas for XML

XPath, XQuery: how to navigate and query XML documents

XSLT: how to transform one XML document into another XML document


XML – eXtensible Markup Language

Language: a way of communicating information

Markup: notes or meta-data that describe your data or language

Extensible: limitless ability to define new languages or data sets


Motivation

Data interchange is critical in today's networked world. Examples:
  Banking: funds transfer
  Order processing (especially inter-company orders)
  Scientific data
    Chemistry: ChemML, …
    Genetics: BSML (Bio-Sequence Markup Language), …

Paper flow of information between organizations is being replaced by electronic flow of information.

Each application area has its own set of standards for representing information.

XML has become the basis for all new generation data interchange formats.


XML Document

<accommodations>
  <hotel>
    <name> Hyatt </name>
    <amenities pool="Y" gym="N" />
    <phone number="(216) 555-1234" />
    <phone number="(216) 555-1258" />
    <address sc="OH">
      <street> South St. </street>
      <city> Cleveland </city>
      <state> OH </state>
    </address>
    <available>
      <room type="S" price="125.00">
        <number> 101 </number>
        <dates>
          <from> 10/30/2002 </from>
          <to> 11/04/2002 </to>
        </dates>
      </room>
      <room type=...> ... </room>
      ...
    </available>
  </hotel>
  <hotel> ...


Example

<class name='CS63005'>
  <location building='MSB' room='121'/>
  <professor>Yuri Breitbart</professor>
  <student_list>
    <student id='999-991'>John Smith</student>
    <student id='999-992'>Jane Doe</student>
  </student_list>
</class>


XML Genealogy

HTML is one of the parents of XML.
XML is not a replacement for HTML.
XML adds the ability to introduce new tags and to nest them.
An XML document includes both data and a description of the data.

Example: Chemical Markup Language

<molecule>
  <weight>234.5</weight>
  <Spectra>…</Spectra>
  <Figures>…</Figures>
</molecule>


Structure of XML Data

XML is a hierarchy of user-defined tags, called elements, with attributes and data.
Data is described by elements; elements are described by attributes.

<student id='999-991'>John Smith</student>

  open tag: <student id='999-991'>
  element name: student
  attribute: id
  attribute value: 999-991
  data: John Smith
  closing tag: </student>


Structure of XML Data

Elements are nested.
Every document must have a single top-level element.


Elements

<student id='999-991'>John Smith</student>

XML is case and space sensitive.
Element opening and closing tag names must be identical.
Opening tags: "<" + element name + ">"
Closing tags: "</" + element name + ">"
Empty elements have no data and no closing tag; they begin with "<" and end with "/>":

<location/>


Attributes

<student id='999-991'>John Smith</student>

Attributes are written inside the start tag of an element.
There can be zero or more attributes in every element; each one has the form:

  attribute_name='attribute_value'
  - By convention there is no space between the name and the "=" (XML itself permits whitespace there)
  - Attribute values must be surrounded by " or ' characters

Multiple attributes are separated by white space (one or more spaces, tabs, or line breaks).


Attributes Vs. Subelements

Distinction between subelement and attribute:
  In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents.
  In the context of data representation, the difference is unclear and may be confusing: the same information can be represented in two ways.

    <account account-number="A-101"> … </account>

    <account>
      <account-number>A-101</account-number> …
    </account>

Suggestion: use attributes for identifiers of elements, and use subelements for contents.


Data

<student id='999-991'>John Smith</student>

XML data is any information between an opening and closing tag.
XML data must not contain literal "<" or "&" characters; write them as &lt; and &amp;. A literal ">" is permitted but is often written as &gt;.


Nesting & Hierarchy

XML tags can be nested in a tree hierarchy.
XML documents can have only one root tag.
Between an opening and closing tag you can insert:
  1. Data
  2. More elements
  3. A combination of data and elements

<root>
  <tag1>
    Some Text
    <tag2>More</tag2>
  </tag1>
</root>


More on XML Syntax

Elements without subelements or text content can be abbreviated by ending the start tag with /> and omitting the end tag:

  <account number="A-101" branch="Perryridge" balance="200" />

To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA:

  <![CDATA[<account> … </account>]]>

Here, <account> and </account> are treated as just strings.
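
The CDATA behaviour can be checked with a short Python sketch. It assumes the standard-library ElementTree parser; the wrapper element name note is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# "note" is a hypothetical wrapper element; only the CDATA section matters here.
note = ET.fromstring("<note><![CDATA[<account> ... </account>]]></note>")

# The CDATA content arrives as plain text; no <account> child element is created.
cdata_text = note.text
child_count = len(note)

print(repr(cdata_text))
print(child_count)
```

The parser reports zero child elements, which is what "treated as just strings" means in practice.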


Namespaces

XML data has to be exchanged between organizations.
The same tag name may have different meanings in different organizations, causing confusion on exchanged documents.
Specifying a unique string as an element name avoids confusion.


Namespaces

Avoid using long unique names all over the document by using XML namespaces:

<bank xmlns:FB='http://www.FirstBank.com'>
  …
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
  …
</bank>
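
As a rough illustration of how a parser sees the prefixed names, the following Python sketch uses the standard-library ElementTree module. The prefix fb in the lookup mapping is an arbitrary choice for this example; only the namespace URI has to match the document.

```python
import xml.etree.ElementTree as ET

doc = """<bank xmlns:FB='http://www.FirstBank.com'>
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>"""
root = ET.fromstring(doc)

# Prefix used for lookups; it need not be the same prefix as in the document.
ns = {"fb": "http://www.FirstBank.com"}

branch = root.find("fb:branch", ns)
expanded_name = branch.tag            # the parser stores the name as {uri}local-name
branch_name = branch.find("fb:branchname", ns).text

print(expanded_name)
print(branch_name)
```

The expanded name shows that the short prefix is only shorthand for the long unique string.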


XML – Storage

An XML document is stored like an n-ary tree (DOM):

<root>
  <tag1>
    Some Text
    <tag2>More</tag2>
  </tag1>
</root>

NodeType: Element_Node, Name: Element, Value: root
  NodeType: Element_Node, Name: Element, Value: tag1
    NodeType: Text_Node, Name: Text, Value: Some Text
    NodeType: Element_Node, Name: Element, Value: tag2
      NodeType: Text_Node, Name: Text, Value: More
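
The tree view can be reproduced with Python's standard-library xml.dom.minidom. This is a minimal sketch: the helper name describe is made up here, and the document is written without indentation so that no whitespace-only text nodes appear.

```python
from xml.dom import minidom

doc = minidom.parseString("<root><tag1>Some Text<tag2>More</tag2></tag1></root>")

def describe(node, depth=0):
    """Return one (depth, node kind, name or text) tuple per DOM node, in document order."""
    is_element = node.nodeType == node.ELEMENT_NODE
    kind = "Element_Node" if is_element else "Text_Node"
    label = node.tagName if is_element else node.data
    rows = [(depth, kind, label)]
    for child in node.childNodes:
        rows.extend(describe(child, depth + 1))
    return rows

rows = describe(doc.documentElement)
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label}")
```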


XML vs. Relational Model

Computer Table

Id    Speed    RAM     HD
101   800Mhz   256MB   40GB
102   933Mhz   512MB   40GB

<Table>
  <Computer Id='101'>
    <Speed>800Mhz</Speed>
    <RAM>256MB</RAM>
    <HD>40GB</HD>
  </Computer>
  <Computer Id='102'>
    <Speed>933Mhz</Speed>
    <RAM>512MB</RAM>
    <HD>40GB</HD>
  </Computer>
</Table>
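
The correspondence between the two representations can be sketched in Python with the standard-library ElementTree module: each Computer element becomes one row, the Id attribute and the subelements become columns.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("""<Table>
  <Computer Id='101'><Speed>800Mhz</Speed><RAM>256MB</RAM><HD>40GB</HD></Computer>
  <Computer Id='102'><Speed>933Mhz</Speed><RAM>512MB</RAM><HD>40GB</HD></Computer>
</Table>""")

# One tuple per <Computer>: (Id, Speed, RAM, HD), mirroring the relational table.
rows = [
    (c.get("Id"), c.findtext("Speed"), c.findtext("RAM"), c.findtext("HD"))
    for c in table.findall("Computer")
]

for row in rows:
    print(row)
```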


XML Document Schema

Database schemas constrain what information can be stored, and the data types of stored values.
XML documents are not required to have an associated schema.
However, schemas are very important for XML data exchange; otherwise, a site cannot automatically interpret data received from another site.


XML Document Schema

Document Type Definition (DTD): widely used

XML Schema: newer, not yet widely used


Document Type Definition (DTD)

The type of an XML document can be specified using a DTD.
A DTD constrains the structure of XML data:
  What elements can occur
  What attributes an element can or must have
  What subelements can or must occur inside each element, and how many times

A DTD does not constrain data types: all values are represented as strings in XML.

XML protocols and languages can be standardized once a DTD is present.


Fruit Basket DTD

<?xml version='1.0'?>
<!ELEMENT Basket (Cherry+, (Apple | Orange)*) >
<!ELEMENT Cherry EMPTY>
<!ATTLIST Cherry flavor CDATA #REQUIRED>
<!ELEMENT Apple EMPTY>
<!ATTLIST Apple color CDATA #REQUIRED>
<!ELEMENT Orange EMPTY>
<!ATTLIST Orange location CDATA 'Florida'>

--------------------------------------------------------------------------------

Does not conform to the DTD (Apple precedes Cherry, and Apple lacks the required color attribute):

<Basket>
  <Apple/>
  <Cherry flavor='good'/>
  <Orange/>
</Basket>

Conforms to the DTD:

<Basket>
  <Cherry flavor='good'/>
  <Apple color='red'/>
  <Apple color='green'/>
</Basket>


Bank DTD

<!DOCTYPE bank [
  <!ELEMENT bank ((account | customer | depositor)+)>
  <!ELEMENT account (account-number, branch-name, balance)>
  <!ELEMENT customer (customer-name, customer-street, customer-city)>
  <!ELEMENT depositor (customer-name, account-number)>
  <!ELEMENT account-number (#PCDATA)>
  <!ELEMENT branch-name (#PCDATA)>
  <!ELEMENT balance (#PCDATA)>
  <!ELEMENT customer-name (#PCDATA)>
  <!ELEMENT customer-street (#PCDATA)>
  <!ELEMENT customer-city (#PCDATA)>
]>


DTD Syntax

<!ELEMENT element (subelements-specification) >
<!ATTLIST element (attributes) >


Element Specification in DTD

<!ELEMENT Basket (Cherry+, (Apple | Orange)*) >

  Name: Basket
  Children: (Cherry+, (Apple | Orange)*)

!ELEMENT declares an element name and what child elements (subelements) it should have.
Wildcards:
  * Zero or more
  + One or more


Element Specification in DTD

Subelements can be specified as
  names of elements, or
  #PCDATA (parsed character data), i.e., character strings, or
  EMPTY (no subelements) or ANY (anything can be a subelement)

Example:

<!ELEMENT depositor (customer-name, account-number)>
<!ELEMENT customer-name (#PCDATA)>
<!ELEMENT account-number (#PCDATA)>


Element Specification in DTD

Subelement specification may have regular expressions:

  <!ELEMENT bank ((account | customer | depositor)+)>

Notation:
  "|" - alternatives
  "+" - 1 or more occurrences
  "*" - 0 or more occurrences
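
Because a content model is a regular expression over child element names, it can be imitated with an ordinary regex. The Python sketch below does this for the Fruit Basket content model (Cherry+, (Apple | Orange)*). It is an illustration only, not a DTD validator: it checks the order of child elements and ignores attributes. The names BASKET_MODEL and matches_model are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content model (Cherry+, (Apple | Orange)*) written as a regex over child names.
# Each name is followed by a space so that adjacent names cannot run together.
BASKET_MODEL = re.compile(r"(Cherry )+((Apple|Orange) )*")

def matches_model(xml_text):
    """Return True if the children of the root element follow the content model."""
    basket = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in basket)
    return BASKET_MODEL.fullmatch(child_names) is not None

ok = matches_model(
    "<Basket><Cherry flavor='good'/><Apple color='red'/><Apple color='green'/></Basket>"
)
bad = matches_model("<Basket><Apple/><Cherry flavor='good'/><Orange/></Basket>")

print(ok, bad)
```

The second basket fails because an Apple appears before the first Cherry.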


Attribute Specification in DTD

Attribute specification: for each attribute
  Name
  Type of attribute
    CDATA
    ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
  Whether
    mandatory (#REQUIRED)
    has a default value (value)
    or neither (#IMPLIED)


Attribute Specification in DTD

Examples:

<!ATTLIST account acct-type CDATA "checking">

<!ATTLIST customer
  customer-id ID #REQUIRED
  accounts IDREFS #REQUIRED >


IDs and IDREFs

An element can have at most one attribute of type ID.
The ID attribute value of each element in an XML document must be distinct; thus the ID attribute value is an object identifier.

An attribute of type IDREF must contain the ID value of an element in the same document.
An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must be the ID value of an element in the same document.


DTD – Well-Formed and Valid

<?xml version='1.0'?>
<!ELEMENT Basket (Cherry+)>
<!ELEMENT Cherry EMPTY>
<!ATTLIST Cherry flavor CDATA #REQUIRED>

--------------------------------------------------------------------------------

Well-Formed and Valid:
<Basket>
  <Cherry flavor='good'/>
</Basket>

Not Well-Formed (mismatched tag names, unquoted attribute value, unclosed Cherry element):
<basket>
  <Cherry flavor=good>
</Basket>

Well-Formed but Invalid (the elements are not declared in the DTD):
<Job>
  <Location>Home</Location>
</Job>
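
Well-formedness can be tested with any XML parser. The sketch below uses Python's standard-library ElementTree, which checks well-formedness only and does not validate against a DTD, so the third document is accepted even though it is invalid with respect to the Basket DTD. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML; no DTD validation is performed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

valid_doc = is_well_formed("<Basket><Cherry flavor='good'/></Basket>")
broken_doc = is_well_formed("<basket><Cherry flavor=good></Basket>")
invalid_doc = is_well_formed("<Job><Location>Home</Location></Job>")

print(valid_doc, broken_doc, invalid_doc)
```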


Bank DTD with Attributes

Bank DTD with ID and IDREF attribute types:

<!DOCTYPE bank-2 [
  <!ELEMENT account (branch, balance)>
  <!ATTLIST account
    account-number ID #REQUIRED
    owners IDREFS #REQUIRED>
  <!ELEMENT customer (customer-street, customer-city)>
  <!ATTLIST customer
    customer-name ID #REQUIRED
    accounts IDREFS #REQUIRED>
  … declarations for branch, balance, customer-street and customer-city
]>


XML data with ID and IDREF attributes

<bank-2>
  <account account-number="A401" owners="Joe Mary">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <customer customer-name="Joe" accounts="A401">
    <customer-street>Monroe</customer-street>
    <customer-city>Madison</customer-city>
  </customer>
  <customer customer-name="Mary" accounts="A401">
    <customer-street>Erin</customer-street>
    <customer-city>Newark</customer-city>
  </customer>
</bank-2>
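
The ID and IDREFS rules can be checked by hand with a small Python sketch. It assumes the standard-library ElementTree parser (which does not enforce these rules itself), a shortened copy of the bank-2 data, and IDREFS values separated by whitespace as XML requires. The two dictionaries restate the attribute declarations from the DTD.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""<bank-2>
  <account account-number="A401" owners="Joe Mary">
    <branch-name>Downtown</branch-name>
    <balance>500</balance>
  </account>
  <customer customer-name="Joe" accounts="A401"/>
  <customer customer-name="Mary" accounts="A401"/>
</bank-2>""")

# Which attribute is the ID, and which is the IDREFS, for each element type.
ID_ATTRS = {"account": "account-number", "customer": "customer-name"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

ids = [el.get(ID_ATTRS[el.tag]) for el in bank]

# Rule 1: ID values must be distinct within the document.
ids_are_unique = len(ids) == len(set(ids))

# Rule 2: every IDREFS token must be the ID of some element in the document.
dangling = [
    ref
    for el in bank
    for ref in el.get(IDREFS_ATTRS[el.tag]).split()
    if ref not in ids
]

print(ids, ids_are_unique, dangling)
```

Note that the check cannot tell whether owners points at a customer or at another account; IDs and IDREFs are untyped.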


Limitations of DTDs

No typing of text elements and attributes
  All values are strings, no integers, reals, etc.

Difficult to specify unordered sets of subelements
  Order is usually irrelevant in databases
  (A | B)* allows specification of an unordered set, but cannot ensure that each of A and B occurs only once

IDs and IDREFs are untyped
  The owners attribute of an account may contain a reference to another account, which is meaningless
  The owners attribute should ideally be constrained to refer to customer elements


XML Schema

XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. It supports:
  Typing of values
    E.g. integer, string, etc.
    Also, constraints on min/max values
  User-defined types
  Is itself specified in XML syntax, unlike DTDs
    More standard representation, but verbose
  Is integrated with namespaces
  Many more features
    List types, uniqueness and foreign key constraints, inheritance, ...

BUT: significantly more complicated than DTDs, not yet widely used.


XML Schema of Bank DTD

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bank" type="BankType"/>
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  … definitions of customer and depositor …
  <xsd:complexType name="BankType">
    <xsd:sequence>
      <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      …


XML – Querying and Schema Transforming

Translation of XML schemas and querying are closely related, and are handled by the same tools.
Standard XML querying/translation languages:
  XPath: simple language consisting of path expressions
  XSLT: simple language designed for translation from XML to XML and XML to HTML
  XQuery: an XML query language with a rich set of features

A wide variety of other languages have been proposed.


Tree Model of XML Data

Query and transformation languages are based on a tree model of XML data.
An XML document is modeled as a tree, with nodes corresponding to elements and attributes:
  Element nodes have children nodes, which can be attributes or subelements
  Text in an element is modeled as a text node child of the element
  Children of a node are ordered according to their order in the XML document
  Element and attribute nodes (except for the root node) have a single parent, which is an element node
  The root node has a single child, which is the root element of the document


XPath

When XML is stored in a tree, XPath allows you to navigate to different nodes.

<Class>
  <Student>Jeff</Student>
  <Student>Pat</Student>
</Class>

Class
  Student
    Text: Jeff
  Student
    Text: Pat


XPath

XPath is used to address (select) parts of documents using path expressions.
A path expression is a sequence of steps separated by "/".
Result of a path expression: the set of values that, along with their containing elements/attributes, match the specified path.

E.g. /bank-2/customer/name evaluated on the bank-2 data we saw earlier returns
  <name>Joe</name>
  <name>Mary</name>

E.g. /bank-2/customer/name/text() returns the same names, but without the enclosing tags.


XPath

The initial "/" denotes the root of the document (above the top-level tag).
Path expressions are evaluated left to right.
Selection predicates may follow any step in a path, in [ ]:
  E.g. /bank-2/account[balance > 400] selects accounts whose balance is greater than 400
  E.g. /bank-2/account[balance] selects accounts that have a balance subelement
Attributes are accessed using "@":
  E.g. /bank-2/account[balance > 400]/@account-number
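
These predicates can be tried out in Python, with one caveat: the standard-library ElementTree module implements only a subset of XPath. The sketch below uses its "has this child" predicate directly and performs the numeric comparison and attribute step in plain Python. The accounts A402 and A403 are hypothetical additions to make the filtering visible.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""<bank-2>
  <account account-number="A401"><balance>500</balance></account>
  <account account-number="A402"><balance>300</balance></account>
  <account account-number="A403"/>
</bank-2>""")

# /bank-2/account[balance]: supported by ElementTree as a child-existence predicate.
with_balance = bank.findall("./account[balance]")

# /bank-2/account[balance > 400]/@account-number: the comparison and the attribute
# step are outside ElementTree's XPath subset, so they are done in Python here.
over_400 = [
    a.get("account-number")
    for a in with_balance
    if float(a.findtext("balance")) > 400
]

print(len(with_balance), over_400)
```

A full XPath engine would evaluate both expressions directly; this is only an approximation with the standard library.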


XPath

An XPath expression is similar to a file-system path, but it can select more than one node:

//Class/Student

<Class>
  <Student>Jeff</Student>
  <Student>Pat</Student>
</Class>

Here //Class/Student selects both Student nodes (Text: Jeff and Text: Pat).


XPath

<class name='CS63005'>
  <location building='MSB' room='121'/>
  <professor>Yuri Breitbart</professor>
  <student_list>
    <student id='999-991'>John Smith</student>
    <student id='999-992'>Jane Doe</student>
  </student_list>
</class>

//class[@name='CS63005']/student_list/student/@id

  Starting element: //class
  Attribute constraint: [@name='CS63005']
  Element path: /student_list/student
  Selection: /@id

Selection result: the attribute nodes containing 999-991 and 999-992


XPath - Context

Context: your current focus in an XML document.
Use //<root>/… when you want to start from the beginning of the XML document.


XPath - Context

Class
  Location (Attr: Olin)
  Prof (Text: Gehrke)
  List
    Student (Text: Jeff)
    Student (Text: Pat)

XPath: List/Student (a relative path; with Class as the context node it selects both Student nodes)


XPath - Context

Class
  Location (Attr: Olin)
  Prof (Text: Gehrke)
  List
    Student (Text: Jeff)
    Student (Text: Pat)

XPath: Student (a relative path; with List as the context node it selects both Student nodes)


XPath – Examples

<Basket>
  <Cherry flavor='sweet'/>
  <Cherry flavor='bitter'/>
  <Cherry/>
  <Apple color='red'/>
  <Apple color='red'/>
  <Apple color='green'/>
  …
</Basket>

Select all of the red apples:

//Basket/Apple[@color='red']


XPath – Examples

<Basket>
  <Cherry flavor='sweet'/>
  <Cherry flavor='bitter'/>
  <Cherry/>
  <Apple color='red'/>
  <Apple color='red'/>
  <Apple color='green'/>
  …
</Basket>

Select the cherries that have some flavor:

//Basket/Cherry[@flavor]
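
Both basket queries fall inside the XPath subset that Python's standard-library ElementTree supports, so they can be run as a quick check. In this sketch the Basket element is the parsed root, so the paths are written relative to it with "./" instead of "//Basket/".

```python
import xml.etree.ElementTree as ET

basket = ET.fromstring(
    "<Basket>"
    "<Cherry flavor='sweet'/><Cherry flavor='bitter'/><Cherry/>"
    "<Apple color='red'/><Apple color='red'/><Apple color='green'/>"
    "</Basket>"
)

# //Basket/Apple[@color='red']: attribute-value predicate
red_apples = basket.findall("./Apple[@color='red']")

# //Basket/Cherry[@flavor]: attribute-existence predicate
flavored = [c.get("flavor") for c in basket.findall("./Cherry[@flavor]")]

print(len(red_apples), flavored)
```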


XPath – Examples

<orchard>
  <tree>
    <apple color='red'/>
    <apple color='red'/>
  </tree>
  <basket>
    <apple color='green'/>
    <orange/>
  </basket>
</orchard>

Select all the apples in the orchard:

//orchard//apple

(equivalently, with the explicit descendant axis: //orchard/descendant::apple)
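
The descendant step can be tried with Python's standard-library ElementTree, where ".//apple" selects every apple element below the current node at any depth. This is a small sketch with the orchard element as the parsed root.

```python
import xml.etree.ElementTree as ET

orchard = ET.fromstring(
    "<orchard>"
    "<tree><apple color='red'/><apple color='red'/></tree>"
    "<basket><apple color='green'/><orange/></basket>"
    "</orchard>"
)

# ".//apple" walks all descendants, so apples under <tree> and <basket> are both found.
apple_colors = [a.get("color") for a in orchard.findall(".//apple")]

print(apple_colors)
```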


Functions in XPath

XPath provides several functions:
  The function count() counts the number of elements in the set generated by a path
    E.g. /bank-2/account[count(./customer) > 2]
  There is also a function for testing the position (1, 2, ..) of a node w.r.t. its siblings

Boolean connectives and and or, and the function not(), can be used in predicates.
IDREFs can be referenced using the function id():
  E.g. /bank-2/account/id(@owners)


More XPath Features

Operator "|" is used to implement union
  E.g. /bank-2/account/id(@owners) | /bank-2/loan/id(@borrower) gives customers with either accounts or loans

"//" can be used to skip multiple levels of nodes
  E.g. /bank-2//name finds any name element anywhere under the /bank-2 element, regardless of the element in which it is contained.

A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  "//", described above, is a short form for specifying "all descendants"
  ".." specifies the parent.


XSLT

A stylesheet stores formatting options for a document, usually separately from the document.
  E.g. an HTML style sheet may specify font colors and sizes for headings, etc.

The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML.
XSLT is a general-purpose transformation language.
  Can translate XML to XML, and XML to HTML

XSLT transformations are expressed using rules called templates.
  Templates combine selection using XPath with construction of results


XSLT

Amazon.com order form:

<single_book_order>
  <title>Databases</title>
  <qty>1</qty>
</single_book_order>

Supplier's order form:

<form7957>
  <purchase item='book' property='title' value='Databases' quantity='1'/>
</form7957>


XSLT – A First Look

Input:

<single_book_order>
  <title>Databases</title>
  <qty>1</qty>
</single_book_order>

Output:

<form7957>
  <purchase item='book' property='title' value='Databases' quantity='1'/>
</form7957>

Stylesheet:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  <xsl:template match='single_book_order'>
    <form7957><purchase item='book' property='title' value='{title}' quantity='{qty}'/></form7957>
  </xsl:template>
</xsl:stylesheet>
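
To make clear what the template computes, here is the same mapping written as a plain Python sketch with the standard-library ElementTree module. This is not an XSLT processor (the Python standard library does not include one); it only mimics the single template by copying title and qty into attributes of the output element, the way the {title} and {qty} placeholders do.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<single_book_order><title>Databases</title><qty>1</qty></single_book_order>"
)

# Build the supplier's form; the two findtext() calls play the role of {title} and {qty}.
form = ET.Element("form7957")
purchase = ET.SubElement(form, "purchase", {
    "item": "book",
    "property": "title",
    "value": order.findtext("title"),
    "quantity": order.findtext("qty"),
})

print(ET.tostring(form, encoding="unicode"))
```

The XSLT version has the advantage that the mapping is itself an XML document and can be exchanged and applied by any conforming processor.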


XSLT – Header

XSLT stylesheets MUST include this body:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
  …
</xsl:stylesheet>


XSLT – Templates

XSLT stylesheets are a collection of templates.
Templates are like functions.
The body of a template is the output of a transformation.


XSLT – Templates

You define a template with the <xsl:template match=''> instruction.

You call a template with the <xsl:apply-templates select=''> instruction:

1. All elements or attributes that satisfy the select attribute expression are selected.

2. For each element or attribute that is selected:
   i. A matching template is found in the stylesheet.
   ii. The body of the template is executed.


XSLT – Template Matching

Stylesheet:

<xsl:template match='basket'>
  <new_basket>
    <xsl:apply-templates select='apple'/>
    <xsl:apply-templates select='box'/>
  </new_basket>
</xsl:template>

<xsl:template match='apple'>
  <apple/>
</xsl:template>

<xsl:template match='box'>
  <box/>
  <xsl:apply-templates/>
</xsl:template>

XML:

<basket>
  <apple color='red'/>
  <apple color='green'/>
  <apple color='green'/>
  <box>
    <orange taste='good'/>
    <peach/>
    <apple color='red'/>
  </box>
</basket>

Transformed XML:

<new_basket>
  <apple/>
  <apple/>
  <apple/>
  <box/>
  <apple/>
</new_basket>


XSLT – choose Instruction

The <xsl:choose> instruction is similar to a C++ or Java switch statement.
The <xsl:when test=''> instruction is similar to the case statement.
The <xsl:otherwise> instruction is similar to the default statement.


XSLT – choose Example

Original XML:

<customer>
  <order id='5'>
    <item><title>Database Management Systems</title></item>
  </order>
</customer>

XSLT stylesheet:

<xsl:template match='customer'>                <!-- FUNCTION -->
  <xsl:choose>                                 <!-- SWITCH -->
    <xsl:when test='order/@id'>                <!-- CASE -->
      <single_book_order>
        <title><xsl:value-of select='order/item/title'/></title>
      </single_book_order>
    </xsl:when>
    <xsl:otherwise>                            <!-- DEFAULT -->
      <single_book_order><fail/></single_book_order>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Output XML:

<single_book_order><title>Database Management Systems</title></single_book_order>


XSLT – choose Example 2

Original XML:

<customer>
  <order>
    <item><title>Database Management Systems</title></item>
  </order>
</customer>

XSLT stylesheet:

<xsl:template match='customer'>                <!-- FUNCTION -->
  <xsl:choose>                                 <!-- SWITCH -->
    <xsl:when test='order/@id'>                <!-- CASE -->
      <single_book_order>
        <title><xsl:value-of select='order/item/title'/></title>
      </single_book_order>
    </xsl:when>
    <xsl:otherwise>                            <!-- DEFAULT -->
      <single_book_order><fail/></single_book_order>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Output XML:

<single_book_order><fail/></single_book_order>


XSLT – for-each Instruction

The <xsl:for-each select='item'> instruction is similar to a foreach iterator or a for loop.
The select attribute selects a set of elements from an XML document.


XSLT – if Instruction

The <xsl:if test=''> instruction is similar to an if statement in Java or C++.
The test attribute is the if condition:
  True: the statement is true, or the test returns an element or attribute.
  False: the statement is false, or the test returns nothing.

There is no 'else', so use the <xsl:choose> instruction in this situation.


XSLT – for-each and if Example

Original XML:

<basket>
  <apple color='red' condition='yummy'/>
  <apple color='green' condition='wormy'/>
  <apple color='red' condition='crisp'/>
</basket>

XSLT stylesheet:

<xsl:template match='basket'>                          <!-- FUNCTION -->
  <condition_report>
    <xsl:for-each select='apple'>                      <!-- FOR LOOP -->
      <xsl:if test="contains(@color, 'red')">          <!-- IF -->
        <condition><xsl:value-of select='@condition'/></condition>
      </xsl:if>
    </xsl:for-each>
  </condition_report>
</xsl:template>

Output XML:

<condition_report>
  <condition>yummy</condition>
  <condition>crisp</condition>
</condition_report>


Joins in XSLT

XSLT keys allow elements to be looked up (indexed) by values of subelements or attributes.
  Keys must be declared (with a name), and the key() function can then be used for lookup. E.g.

    <xsl:key name="acctno" match="account" use="account-number"/>
    <xsl:value-of select="key('acctno', 'A-101')"/>

Keys permit (some) joins to be expressed in XSLT:

<xsl:key name="acctno" match="account" use="account-number"/>
<xsl:key name="custno" match="customer" use="customer-name"/>
<xsl:template match="depositor">
  <cust-acct>
    <xsl:value-of select="key('custno', customer-name)"/>
    <xsl:value-of select="key('acctno', account-number)"/>
  </cust-acct>
</xsl:template>
<xsl:template match="*"/>


Structural Recursion

The action of a template can be to recursively apply templates to the contents of a matched element. E.g.

<xsl:template match="/bank">
  <customers>
    <xsl:apply-templates/>
  </customers>
</xsl:template>
<xsl:template match="customer">
  <customer>
    <xsl:value-of select="customer-name"/>
  </customer>
</xsl:template>
<xsl:template match="*"/>

Example output:

<customers>
  <customer> John </customer>
  <customer> Mary </customer>
</customers>


XSLT – Other Information

W3C is standardizing XPath and XSLT:

http://www.w3.org/TR/xslt.html
http://www.w3.org/TR/xpath.html

Lots of books. Here's a suggestion: D. Martin et al. Professional XML. Wrox Press, 2000.


XQuery Motivation

Separation between logical and physical data views
Declarative query language for object-oriented model
Variety of data types


XQuery

XQuery is a general purpose query language for XML data.
Currently being standardized by the World Wide Web Consortium (W3C).
Alpha version of XQuery engine available free from Microsoft.
XQuery uses a for … let … where … result … syntax:
  for corresponds to SQL from
  where corresponds to SQL where
  result corresponds to SQL select
  let allows temporary variables, and has no equivalent in SQL


Example

List each publisher and the average price of its books

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher=$p]/price)
RETURN
    <publisher>
        <name> $p/text() </name>,
        <avgprice> $a </avgprice>
    </publisher>
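To make the semantics of the query concrete, here is a hedged Python sketch that computes the same per-publisher averages. It is not XQuery; the inline `BIB_XML` document stands in for bib.xml, whose contents the slides do not show, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB_XML = """
<bib>
  <book><publisher>Wrox</publisher><price>40</price></book>
  <book><publisher>Wrox</publisher><price>60</price></book>
  <book><publisher>Kluwer</publisher><price>30</price></book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# FOR $p IN distinct(...//publisher): distinct names, kept in document order.
publishers = list(dict.fromkeys(p.text for p in bib.iter("publisher")))

averages = {}
for p in publishers:
    # LET $a := avg(...//book[publisher=$p]/price)
    prices = [
        float(b.findtext("price"))
        for b in bib.iter("book")
        if b.findtext("publisher") == p
    ]
    averages[p] = sum(prices) / len(prices)

# RETURN one <publisher> element per binding of $p.
for name, avg in averages.items():
    print(f"<publisher><name>{name}</name><avgprice>{avg}</avgprice></publisher>")
```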

Page 72: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Example

Find titles of all books in which both sailing and windsurfing are mentioned in the same paragraph.

FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing")
      AND contains($p, "windsurfing")
RETURN $b/title
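The existential test in this query maps naturally onto Python's `any()`. The sketch below is only an illustration of the semantics: the sample document and its `library` root element are invented, since the slides give no data.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book mentions both words, but never
# within a single paragraph, so it must not be returned.
DOC = """
<library>
  <book><title>Water Sports</title>
    <para>Try sailing in spring.</para>
    <para>Both sailing and windsurfing need wind.</para></book>
  <book><title>Boats</title>
    <para>All about sailing.</para>
    <para>Nothing on windsurfing here.</para></book>
</library>
"""

root = ET.fromstring(DOC)
titles = []
for b in root.iter("book"):                      # FOR $b IN //book
    paras = ["".join(p.itertext()) for p in b.iter("para")]
    # WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
    if any("sailing" in t and "windsurfing" in t for t in paras):
        titles.append(b.findtext("title"))       # RETURN $b/title

print(titles)
```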

Page 73: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

FLWR Syntax for XQuery

For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath

Simple FLWR expression in XQuery: find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag

    for $x in /bank-2/account
    let $acctno := $x/@account-number
    where $x/balance > 400
    return <account-number> $acctno </account-number>
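Each FLWR clause has a direct counterpart in an ordinary loop. The following Python sketch mirrors the query clause by clause; the `BANK2` sample data is an assumption, modelled on the attribute-based bank-2 layout the query implies.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 data: account-number is an attribute, balance a subelement.
BANK2 = """
<bank-2>
  <account account-number="A-101"><balance>500</balance></account>
  <account account-number="A-102"><balance>400</balance></account>
  <account account-number="A-201"><balance>900</balance></account>
</bank-2>
"""

root = ET.fromstring(BANK2)  # root is the <bank-2> element
results = []
for x in root.findall("account"):               # for $x in /bank-2/account
    acctno = x.get("account-number")            # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:      # where $x/balance > 400
        # return <account-number> $acctno </account-number>
        results.append(f"<account-number>{acctno}</account-number>")

print(results)
```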

Page 74: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

FLWR Syntax for XQuery

Let clause not really needed in the previous query, and selection can be done in XPath. Query can be written as:

    for $x in /bank-2/account[balance>400]
    return <account-number> $x/@account-number </account-number>

Page 75: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Path Expressions and Functions

Aggregate functions such as sum( ) and count( ) can be applied to path expression results

XQuery does not support groupby, but the same effect can be got by nested queries, with nested FLWR expressions within a result clause
    More on nested queries later
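A short Python sketch of both points: aggregates applied to the result of a path expression, and a group-by effect obtained by nesting one query inside another. The sample data and the `branch-name` element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data with a branch-name subelement to group on.
BANK = """
<bank>
  <account><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account><branch-name>Downtown</branch-name><balance>700</balance></account>
  <account><branch-name>Perryridge</branch-name><balance>400</balance></account>
</bank>
"""

root = ET.fromstring(BANK)
accounts = root.findall("account")

# count( ) and sum( ) applied to the results of a path expression.
n_accounts = len(accounts)
total = sum(float(a.findtext("balance")) for a in accounts)

# Group-by effect via nesting: the outer loop ranges over distinct branches,
# the inner query collects the balances belonging to that branch.
branches = list(dict.fromkeys(a.findtext("branch-name") for a in accounts))
per_branch = {
    br: sum(
        float(a.findtext("balance"))
        for a in accounts
        if a.findtext("branch-name") == br
    )
    for br in branches
}

print(n_accounts, total, per_branch)
```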

Page 76: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Path Expressions and Functions

Path expressions are used to bind variables in the for clause, but can also be used in other places
    E.g. path expressions can be used in let clause, to bind variables to results of path expressions

The function distinct( ) can be used to remove duplicates in path expression results

The function document(name) returns root of named document
    E.g. document("bank-2.xml")/bank-2/account

Page 77: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Joins

Joins are specified in a manner very similar to SQL

    for $b in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $b/account-number = $d/account-number
      and $c/customer-name = $d/customer-name
    return <cust-acct> $c $b </cust-acct>
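The join above is a three-way nested loop with two equality conditions. A minimal Python sketch of the same computation follows; the flat `BANK` sample data is invented for illustration, and the result is reduced to (customer name, account number) pairs rather than full elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat bank data: account, customer and depositor are siblings.
BANK = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>John</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>John</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Mary</customer-name></depositor>
</bank>
"""

root = ET.fromstring(BANK)
pairs = [
    (c.findtext("customer-name"), b.findtext("account-number"))
    for b in root.findall("account")        # $b in /bank/account
    for c in root.findall("customer")       # $c in /bank/customer
    for d in root.findall("depositor")      # $d in /bank/depositor
    if b.findtext("account-number") == d.findtext("account-number")
    and c.findtext("customer-name") == d.findtext("customer-name")
]

print(pairs)
```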

Page 78: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Joins

The same query can be expressed with the selections specified as XPath selections:

    for $b in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[account-number = $b/account-number and
                              customer-name = $c/customer-name]
    return <cust-acct> $c $b </cust-acct>

Page 79: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Changing Nesting Structure

The following query converts data from the flat structure for bank information into the nested structure used in bank-1

    <bank-1>
        for $c in /bank/customer
        return
            <customer>
                $c/*
                for $d in /bank/depositor[customer-name = $c/customer-name],
                    $a in /bank/account[account-number = $d/account-number]
                return $a
            </customer>
    </bank-1>

$c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
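The restructuring can be sketched in Python as follows. This is an illustration of the flat-to-nested conversion under an invented sample document, not a rendering of how an XQuery engine evaluates the query.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical flat bank data.
BANK = (
    "<bank>"
    "<account><account-number>A-101</account-number><balance>500</balance></account>"
    "<account><account-number>A-102</account-number><balance>400</balance></account>"
    "<customer><customer-name>John</customer-name></customer>"
    "<customer><customer-name>Mary</customer-name></customer>"
    "<depositor><account-number>A-101</account-number><customer-name>John</customer-name></depositor>"
    "<depositor><account-number>A-102</account-number><customer-name>Mary</customer-name></depositor>"
    "</bank>"
)

root = ET.fromstring(BANK)
bank1 = ET.Element("bank-1")
for c in root.findall("customer"):
    cust = ET.SubElement(bank1, "customer")
    # $c/*: copy the children of the customer, without its enclosing tag.
    cust.extend([copy.deepcopy(child) for child in c])
    for d in root.findall("depositor"):
        if d.findtext("customer-name") != c.findtext("customer-name"):
            continue
        for a in root.findall("account"):
            if a.findtext("account-number") == d.findtext("account-number"):
                cust.append(copy.deepcopy(a))   # return $a, nested in the customer

print(ET.tostring(bank1, encoding="unicode"))
```

Copies are appended rather than the original nodes so that the flat source tree is left unchanged.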

Page 80: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

XQuery Path Expressions

$c/text() gives text content of an element without any subelements/tags

XQuery path expressions support the "->" operator for dereferencing IDREFs
    Equivalent to the id( ) function of XPath, but simpler to use
    Can be applied to a set of IDREFs to get a set of results
    June 2001 version of standard has changed "->" to "=>"

Page 81: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Sorting in XQuery

Sortby clause can be used at the end of any expression. E.g. to return customers sorted by name

    for $c in /bank/customer
    return <customer> $c/* </customer> sortby(customer-name)

Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)

    <bank-1>
        for $c in /bank/customer
        return
            <customer>
                $c/*
                for $d in /bank/depositor[customer-name = $c/customer-name],
                    $a in /bank/account[account-number = $d/account-number]
                return <account> $a/* </account> sortby(account-number)
            </customer> sortby(customer-name)
    </bank-1>
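Sorting at two levels of nesting amounts to one sort per level. The Python sketch below shows the idea on invented data; it returns plain tuples rather than XML elements to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, deliberately out of order.
BANK = """
<bank>
  <customer><customer-name>Mary</customer-name></customer>
  <customer><customer-name>John</customer-name></customer>
  <depositor><customer-name>Mary</customer-name><account-number>A-305</account-number></depositor>
  <depositor><customer-name>John</customer-name><account-number>A-102</account-number></depositor>
  <depositor><customer-name>Mary</customer-name><account-number>A-101</account-number></depositor>
</bank>
"""

root = ET.fromstring(BANK)
nested = []
# Outer sortby(customer-name)
for c in sorted(root.findall("customer"), key=lambda e: e.findtext("customer-name")):
    name = c.findtext("customer-name")
    # Inner sortby(account-number), applied within each customer
    accounts = sorted(
        d.findtext("account-number")
        for d in root.findall("depositor")
        if d.findtext("customer-name") == name
    )
    nested.append((name, accounts))

print(nested)
```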

Page 82: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Functions and Other XQuery Features

User defined functions with the type system of XMLSchema

    function balances(xsd:string $c) returns list(xsd:numeric) {
        for $d in /bank/depositor[customer-name = $c],
            $a in /bank/account[account-number = $d/account-number]
        return $a/balance
    }
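For comparison, a typed Python function with the same shape as balances is sketched below. The extra `root` parameter and the sample data are assumptions; in XQuery the document is implicit in the path expressions.

```python
from typing import List
import xml.etree.ElementTree as ET

# Hypothetical flat bank data.
BANK = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <account><account-number>A-305</account-number><balance>900</balance></account>
  <depositor><customer-name>Mary</customer-name><account-number>A-101</account-number></depositor>
  <depositor><customer-name>John</customer-name><account-number>A-102</account-number></depositor>
  <depositor><customer-name>Mary</customer-name><account-number>A-305</account-number></depositor>
</bank>
"""

def balances(root: ET.Element, c: str) -> List[float]:
    """Return the balances of all accounts held by customer c."""
    return [
        float(a.findtext("balance"))
        for d in root.findall("depositor")
        if d.findtext("customer-name") == c
        for a in root.findall("account")
        if a.findtext("account-number") == d.findtext("account-number")
    ]

root = ET.fromstring(BANK)
print(balances(root, "Mary"))
```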

Page 83: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Functions and Other XQuery Features (cont'd)

Types are optional for function parameters and return values

Universal and existential quantification in where clause predicates
    some $e in path satisfies P
    every $e in path satisfies P

XQuery also supports if-then-else clauses

Page 84: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Application Program Interface

There are two standard application program interfaces to XML data:

SAX (Simple API for XML)
    Based on parser model, user provides event handlers for parsing events
        E.g. start of element, end of element
    Not suitable for database applications

DOM (Document Object Model)
    XML data is parsed into a tree representation
    Variety of functions provided for traversing the DOM tree
        E.g.: getParentNode( ), getFirstChild( ), getNextSibling( ), getAttribute( ), getData( ) (for text node), getElementsByTagName( ), …
    Also provides functions for updating DOM tree
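The difference between the two interfaces is easiest to see side by side. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules, whose property names (parentNode, firstChild, data) correspond to the getter functions listed above; the one-customer document and the `TagLogger` class name are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical one-customer document.
DOC = "<bank><customer><customer-name>John</customer-name></customer></bank>"

class TagLogger(xml.sax.ContentHandler):
    """SAX: the parser calls these handlers as it meets each parsing event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

handler = TagLogger()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# DOM: the whole document is parsed into a tree that can be
# traversed in any direction and updated in place.
dom = minidom.parseString(DOC)
name_node = dom.getElementsByTagName("customer-name")[0]
text = name_node.firstChild.data           # text node contents
parent = name_node.parentNode.tagName      # navigate upwards
name_node.firstChild.data = "Mary"         # update the tree
updated = dom.documentElement.toxml()

print(handler.events)
print(text, parent, updated)
```

SAX never holds the document in memory, so the handler sees each element exactly once and in order; DOM keeps the tree, which is what makes the upward navigation and the update possible.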

Page 85: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

Storage of XML Data

XML data can be stored in

Non-relational data stores
    Flat files
        Natural for storing XML
        But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)
    XML database
        Database built specifically for storing XML data, supporting DOM model and declarative querying
        Currently no commercial-grade systems

Relational databases
    Data must be translated into relational form
    Advantage: mature database systems
    Disadvantages: overhead of translating data and queries
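The translation step for relational storage can be sketched with Python's built-in sqlite3 module. This is one simple mapping among many, assuming each account element becomes one row; the table and column names are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML data to be stored relationally.
BANK = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
</bank>"""

root = ET.fromstring(BANK)

conn = sqlite3.connect(":memory:")
# Assumed schema: one column per subelement of <account>.
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")

# Translate the data: each <account> element becomes one tuple.
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [
        (a.findtext("account-number"), float(a.findtext("balance")))
        for a in root.findall("account")
    ],
)

# Translate the query: /bank/account[balance > 400] becomes a SQL selection.
rows = conn.execute(
    "SELECT account_number FROM account WHERE balance > 400 ORDER BY account_number"
).fetchall()
print(rows)
```

Both the data and the query had to be rewritten by hand here, which is the translation overhead the slide lists as the disadvantage.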

Page 86: XML These slides are borrowed from Silberschatz book and also from Johannes Gehrke web page.

URL Tutorials

http://msdn.microsoft.com/xml/tutorial/default.asp

http://www.ils.unc.edu/~kempa/inls259/xml/

http://www.geocities.com/SiliconValley/Peaks/5957/10minxml.html