
eXtensible Markup Language

Jesús Ibáñez, Toni Navarrete, Rocío García, Josep Blat
Universitat Pompeu Fabra

eXtensible Markup Language

• New Internet mark-up metalanguage
• Previously: SGML, HTML
• Extensibility, structure and validation
• SGML adaptation for WWW

eXtensible Markup Language

• Defined as a standard by the W3C (Generic SGML Editorial Review Board - XML Working Group)
• XML != HTML++ ; XML == SGML--
• XML, DTD (Document Type Definition) and XSL (eXtensible Stylesheet Language)

Main Characteristics

• Describes document content semantically
• Decouples the semantic description from the presentation
• Lets each user community define its own tags, for instance: <PRICE>, <AUTHOR>, <SECTION>, <DATE>, <IMPORTANCE LEVEL="Expert">

XML Example (without DTD)

<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello world!</greeting>
  <answer>Stop it, I'm getting off!</answer>
</conversation>

Example with DTD (1)

<!DOCTYPE Book [
  <!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ELEMENT Date (#PCDATA)>
  <!ELEMENT ISBN (#PCDATA)>
  <!ELEMENT Publisher (#PCDATA)>
]>

Example with DTD (2)

<?xml version="1.0" standalone="no"?>
<!DOCTYPE Book SYSTEM "file://localhost/xml-course/xsl/Book.dtd">
<Book>
  <Title>My Life and Times</Title>
  <Author>Paul McCartney</Author>
  <Date>July, 1998</Date>
  <ISBN>94303-1202143892</ISBN>
  <Publisher>McMillan Publishing</Publisher>
</Book>

DTDs

• Allow new sets of tags to be defined
• Examples:
  • <!ELEMENT Title (#PCDATA)>
  • <!ELEMENT Disk (Disk)+> (1 or more)
  • <!ELEMENT Book (Book)*> (0 or more)
  • ? (0 or 1)   , (sequence)   | (option)
• Attributes:
  • <!ATTLIST ARTICLE DATE CDATA #IMPLIED> (CDATA means Character Data)
  • <!ATTLIST PERSON GENDER (male | female) #IMPLIED> (optional)
  • <!ATTLIST PERSON GENDER (male | female) #REQUIRED> (required)
  • <!ATTLIST PERSON GENDER (male | female) "male"> (default value; a default cannot be combined with #REQUIRED)

DTDs

<!DOCTYPE Discography [
  <!ELEMENT Discography (Disk)*>
  <!ELEMENT Disk (Title, Group, Song*)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Group (#PCDATA)>
  <!ELEMENT Song (titleS, Duration)>
  <!ELEMENT titleS (#PCDATA)>
  <!ELEMENT Duration (#PCDATA)>
]>

DTDs

<Discography>
  <Disk>
    <Title>Brothers in Arms</Title>
    <Group>Dire Straits</Group>
    <Song>
      <titleS>Money for nothing</titleS>
      <Duration>5:20</Duration>
    </Song>
    <Song>
      <titleS>So far away</titleS>
      <Duration>4:10</Duration>
    </Song>
    ...
  </Disk>
  <Disk>
    <Title>On every street</Title>
    <Group>Dire Straits</Group>
    <Song>...
  </Disk>
</Discography>

DTDs

<!DOCTYPE publications [
  <!ELEMENT publications (disk | book)*>
  <!ELEMENT book ... >
  <!ELEMENT disk ... >
]>

DTDs

<publications>
  <disk>
    <titledisk>Brothers in Arms</titledisk>
    <group>Dire Straits</group>
    <song>
      <titleS>Money for nothing</titleS>
      <duration>5:20</duration>
    </song>
    ...
  </disk>
  <book>
    <titlebook>Cien años de soledad</titlebook>
    <writer>Gabriel García Márquez</writer>
    ...
  </book>
  <book>
    <titlebook>La ciudad de los prodigios</titlebook>
    <writer>Eduardo Mendoza</writer>
    ...
  </book>
</publications>

DTDs

<?xml version="1.0"?>
<!DOCTYPE file [
  <!ELEMENT file (name+, surname+, address+, picture?)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST name sex (male|female) #IMPLIED>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT picture EMPTY>
]>
<file>
  <name sex="male">Toni</name>
  <surname>Navarrete</surname>
  <surname>Terrasa</surname>
  <address>Rambla 32</address>
</file>

Well formed versus valid

• Well formed: the document complies with XML syntax
• Valid: the content also conforms to the rules of the associated DTD
  • Completeness, good format and attribute values of the XML data are ensured
• An XML document without a DTD can be well formed but, of course, cannot be valid
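
The distinction can be tried out with a non-validating parser, which checks well-formedness only. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module (which does not validate against a DTD); the helper name is_well_formed is ours, not part of any API:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it is well formed.

    Validity against a DTD is NOT checked here.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<conversation><greeting>Hello world!</greeting></conversation>"
bad = "<conversation><greeting>Hello world!</conversation>"  # <greeting> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Checking validity would additionally require a validating parser and the DTD itself.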

DTD limitations; Schemas and Namespaces

• Difficult to read (and parse)
• Not extensible
• No support for datatypes and inheritance
• Solution: namespaces and schemas
  • Different sources can be integrated
  • Re-use and better structure are supported
  • XML syntax: improved readability and processing

XML Schemas

• XML Schemas define the structure of XML documents (like DTDs, BUT in XML syntax)
• The same parser can be used to validate; tools for dynamic creation
• Use of Namespaces
• Improved data type definition (41 instead of 10, plus user-defined types)
• Object orientation allows new types to be derived by extension or restriction of existing ones
• Validation (of a document with respect to a schema, and of a schema with respect to the schema of schemas)

Schema definition

• An XML document whose root is "schema", within which elements and attributes are defined:

<?xml version="1.0"?>
<schema>
  ... elements and attributes definition
</schema>

• Element definition:

<element name="name of the element"
         type="type of the element"
         [options...]
>

Simple types of elements

• string: character string
• boolean (false, 0, true, 1)
• float (32 bits)
• double (64 bits)
• decimal (arbitrary-precision decimal number)
• timeDuration
• recurringDuration (several subtypes)
• binary
• uriReference (Uniform Resource Identifier)

Plus the types derived from these basic ones


Data type structure

Example

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookshop>
  <book isbn="84-111-1111-1">
    <title>El Quijote</title>
    <author>Miguel de Cervantes</author>
    <publisher>Plaza y Janés</publisher>
    <character>Don Quijote</character>
    <character>Sancho Panza</character>
    <character>Dulcinea</character>
    <character>Rocinante</character>
  </book>
  <book isbn="84-222-2222-2">
    <title>La ciudad de los prodigios</title>
    <author>Eduardo Mendoza</author>
    <publisher>Seix-Barral</publisher>
    <character>Onofre Bouvila</character>
    <character>Efren Castells</character>
  </book>
  <book isbn="84-333-3333-3">
    <title>Cien años de soledad</title>
    <author>Gabriel García Márquez</author>
    <publisher>Planeta</publisher>
    <character>Aureliano Buendía</character>
  </book>
</bookshop>

The XML document, before its schema is defined

Building blocks: simple elements and cardinality

• Simple elements:

<element name="title" type="string" />
<element name="author" type="string" />
<element name="publisher" type="string" />
<element name="character" minOccurs="0" maxOccurs="unbounded" />

• The DTD equivalent would be:

<!ELEMENT title (#PCDATA)>

• Cardinality is expressed with minOccurs and maxOccurs, which replace the DTD symbols ?, *, +
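
The correspondence between the DTD occurrence symbols and minOccurs/maxOccurs can be written down as a small lookup table. The following Python sketch is only an illustration: the names DTD_TO_SCHEMA and element_decl are hypothetical, and the emitted text follows the un-prefixed <element> style used on these slides.

```python
# DTD occurrence indicator -> (minOccurs, maxOccurs)
DTD_TO_SCHEMA = {
    "":  ("1", "1"),          # no symbol: exactly once (also the schema default)
    "?": ("0", "1"),          # optional
    "*": ("0", "unbounded"),  # zero or more
    "+": ("1", "unbounded"),  # one or more
}


def element_decl(name: str, dtd_symbol: str = "", type_: str = "string") -> str:
    """Build a schema <element> declaration from a DTD occurrence symbol."""
    min_occurs, max_occurs = DTD_TO_SCHEMA[dtd_symbol]
    return (f'<element name="{name}" type="{type_}" '
            f'minOccurs="{min_occurs}" maxOccurs="{max_occurs}" />')


print(element_decl("character", "*"))
# <element name="character" type="string" minOccurs="0" maxOccurs="unbounded" />
```

Unlike the three DTD symbols, minOccurs and maxOccurs can also express bounds such as "between 2 and 5".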

Building blocks: Complex types

• The element book is composite, thus we define it as a complex type:

<element name="book">
  <complexType>
    <sequence>
      <element name="title" type="string" />
      <element name="author" type="string" />
      <element name="publisher" type="string" />
      <element name="character" minOccurs="0" maxOccurs="unbounded" />
    </sequence>
  </complexType>
</element>

Alternative: naming complex types

• We could also define a complex type with a name:

<element name="book" type="Booktype" />

<complexType name="Booktype">
  <element name="title" type="string" />
  <element name="author" type="string" />
  <element name="publisher" type="string" />
  <element name="character" minOccurs="0" maxOccurs="unbounded" />
</complexType>

Remark: the combination of both is not allowed

<element name="book" type="Booktype">
  <complexType name="Booktype">
    <element name="title" type="string" />
    <element name="author" type="string" />
    <element name="publisher" type="string" />
    <element name="character" minOccurs="0" maxOccurs="unbounded" />
  </complexType>
</element>

Building blocks: empty elements

• Elements such as the HTML tags <hr> or <img ...> are empty:

<hr />
<img src="image.gif" />

• An empty element has to be declared with an implicit (unnamed) complex type:

<element name="img">
  <complexType content="empty">
    <attribute name="src" type="string" />
  </complexType>
</element>

<element name="hr">
  <complexType content="empty" />
</element>

A level upwards ...

• Let us define "bookshop":

<element name="bookshop">
  <complexType>
    <element name="book" minOccurs="0" maxOccurs="unbounded">
      <complexType>
        ...
      </complexType>
    </element>
  </complexType>
</element>

A schema definition is a BOTTOM-UP process

Attribute definition

• Elements can have attributes associated with them
• In DTDs, we would write:

<!ATTLIST book isbn CDATA #REQUIRED>

• In XML Schema:

<attribute name="name of the attribute"
           type="type of the attribute"
           [options of the attribute ...]
>

Attribute definition

• Attributes are declared at the end of the element definition:

<element name="book" minOccurs="0" maxOccurs="unbounded">
  <complexType>
    <element name="title" type="string" />
    <element name="author" type="string" />
    <element name="publisher" type="string" />
    <element name="character" minOccurs="0" maxOccurs="unbounded" />
    <attribute name="isbn" type="string" />
  </complexType>
</element>

General ordering

• The definitions are ordered for better readability:
  1) Simple types definition
  2) Attributes definition
  3) Complex types definition

Referencing the schema

• We then add the schema reference to the XML document. Assuming the document is book.xml and the bookshop schema is book.xsd, we would write:

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookshop
    xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="book.xsd"
>
  ...
</bookshop>

Namespaces

• An XML Namespace is a collection of names (of elements and attributes) identified by a URI
• Namespaces are a very flexible tool: they promote the re-use and mixing of schemas and names
• For instance, we could use elements from two namespaces:

<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:TITLE>A Suitable Boy</bk:TITLE>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>
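
To see how a parser treats these names, here is a small Python sketch using the standard-library xml.etree.ElementTree. The short prefixes b and m in the lookup dictionary are chosen arbitrarily for this example; only the namespace URIs have to match those in the document.

```python
import xml.etree.ElementTree as ET

doc = """
<BOOKS>
  <bk:BOOK xmlns:bk="urn:BookLovers.org:BookInfo"
           xmlns:money="urn:Finance:Money">
    <bk:TITLE>A Suitable Boy</bk:TITLE>
    <bk:PRICE money:currency="US Dollar">22.95</bk:PRICE>
  </bk:BOOK>
</BOOKS>
"""

# Prefixes are only local shorthand; the namespace URI is what identifies a name.
ns = {"b": "urn:BookLovers.org:BookInfo", "m": "urn:Finance:Money"}

root = ET.fromstring(doc)
title = root.find("b:BOOK/b:TITLE", ns)
price = root.find("b:BOOK/b:PRICE", ns)

print(title.text)                                # A Suitable Boy
print(price.text)                                # 22.95
print(price.get("{urn:Finance:Money}currency"))  # US Dollar
print(price.tag)                                 # {urn:BookLovers.org:BookInfo}PRICE
```

Internally ElementTree stores every qualified name as {namespace-URI}local-name, which is why the document's own prefixes (bk, money) never need to be known by the calling code.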

Namespaces

• http://www.w3.org/2000/10/XMLSchema
  • This is the namespace for schemata. The prefix xsd is usually used; with no prefix, it is the default namespace
• http://www.w3.org/2000/10/XMLSchema-instance
  • Namespace for the documents instantiated from a schema. The prefix xsi is usually used.

Example

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"                  (1)
        targetNamespace="http://www.upf.es/namespaces/Book"          (2)
        elementFormDefault="qualified"                               (3)
        xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
        xsi:schemaLocation=
          "http://www.w3.org/2000/10/XMLSchema
           http://www.w3.org/2000/10/XMLSchema.xsd"
        xmlns:bk="http://www.upf.es/namespaces/Book">

(1) Indicates the default namespace, which is XMLSchema
(2) Indicates that the elements and attributes in this schema are defined in the namespace http://www.upf.es/namespaces/Book
(3) Indicates that all the elements created in this namespace and used in the instantiated documents have to be namespace-qualified (had we used unqualified, only the global elements would be qualified)

Example (2)

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
        targetNamespace="http://www.upf.es/namespaces/Book"
        elementFormDefault="qualified"
        xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"     (4)
        xsi:schemaLocation=                                          (5)
          "http://www.w3.org/2000/10/XMLSchema                       (6)
           http://www.w3.org/2000/10/XMLSchema.xsd"                  (7)
        xmlns:bk="http://www.upf.es/namespaces/Book">

(4) Indicates that this XML document is instantiated from the general Schema on Schemata
(5) This is the namespace where the attribute schemaLocation is defined
(6) The namespace for the general Schema on Schemata
(7) URI of this Schema on Schemata

Example (3)

<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
        targetNamespace="http://www.upf.es/namespaces/Book"
        elementFormDefault="qualified"
        xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
        xsi:schemaLocation=
          "http://www.w3.org/2000/10/XMLSchema
           http://www.w3.org/2000/10/XMLSchema.xsd"
        xmlns:bk="http://www.upf.es/namespaces/Book">                (8)

(8) We give a prefix to the target namespace to make it easier to use in documents, for instance:

<element ref="bk:Title" minOccurs="1" maxOccurs="1"/>

Example (and 4)

• In the instantiated document:

<bookshop xmlns="http://www.upf.es/namespaces/Book"                       (1)
          xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"        (2)
          xsi:schemaLocation="http://www.upf.es/namespaces/book.xsd">     (3)

(1) We define the default namespace of the document
(2) We include the namespace where schema instantiation is defined (xsi)
(3) With schemaLocation we specify where the Schema for this document (book.xsd) is located

Other important concepts

• ID and IDREFS
• DOM (Document Object Model)
• XPath
• XPointer
• XLink

ID and IDREFS

• An ID attribute uniquely identifies an element, playing a role similar to that of a URI. Example assigning the identity "attack":

<paragraph id="attack">Suddenly the skies were filled with aircraft</paragraph>

• IDREFS (identity references; IDREF for a single one) is the easiest way of referring to an ID. Example: a DTD declares the employee attributes "empnumber" as an ID and "boss" as IDREFS; here we say that Hank's ID is emp126 and his boss is emp124 (defined earlier):

<employee empnumber="emp126" boss="emp124">Hank</employee>
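
How such a reference is followed can be sketched in Python. This is an illustration only: the <staff> wrapper and the employee "Ann" are invented for the example, and since xml.etree.ElementTree does not read ID/IDREFS declarations from a DTD, the code simply assumes that empnumber holds the ID and boss holds the reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <staff> and "Ann" are made up for this sketch.
doc = """
<staff>
  <employee empnumber="emp124">Ann</employee>
  <employee empnumber="emp126" boss="emp124">Hank</employee>
</staff>
"""

root = ET.fromstring(doc)

# Index every employee by its ID value (IDs are unique within a document).
by_id = {e.get("empnumber"): e for e in root.iter("employee")}

hank = by_id["emp126"]
# An IDREFS value is a whitespace-separated list of IDs; here it holds one.
bosses = [by_id[ref] for ref in hank.get("boss", "").split()]

print([b.text for b in bosses])  # ['Ann']
```

A validating parser would additionally reject a document in which two elements share an ID or a reference points to a non-existent ID.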

DOM (Document Object Model)

• DOM is a technology for accessing and manipulating parts of an XML document
• DOM models a document as a tree whose nodes are its elements
• The objects then expose properties and methods that allow access and manipulation
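
As an illustration of the tree model and of the access and manipulation methods, the following Python sketch uses xml.dom.minidom, a small DOM implementation in the Python standard library. The document is the conversation example from the earlier slide.

```python
from xml.dom import minidom

dom = minidom.parseString(
    "<conversation><greeting>Hello world!</greeting></conversation>"
)

# Access: walk the tree starting from the root element.
root = dom.documentElement
greeting = root.getElementsByTagName("greeting")[0]
print(root.tagName)              # conversation
print(greeting.firstChild.data)  # Hello world!

# Manipulation: create a new element with a text child and attach it.
answer = dom.createElement("answer")
answer.appendChild(dom.createTextNode("Stop it, I'm getting off!"))
root.appendChild(answer)

print(len(root.childNodes))      # 2
print(root.toxml())              # serialised tree, now including <answer>
```

The same interface names (documentElement, getElementsByTagName, createElement, appendChild) are defined by the W3C DOM and appear in the DOM bindings of other languages as well.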

Linking in HTML

• HTML provides <A HREF=…>, with the following properties:
  • HTML links are embedded in the source document
  • HTML links only allow navigation in one direction
  • HTML links only connect two resources
  • HTML links do not specify the behaviour of the rendering engine

Linking using XML: XLink

• XLink is a language for describing how to link resources in XML
• Links are expressed through attributes on the linking element, taken from the XLink namespace "http://www.w3.org/1999/xlink" (usually bound to the prefix xlink)
• The attributes are used to describe end-points, traversal, effect and resources

XLink

• An XLink is an explicit relationship between resources or portions of resources
  • It is possible to address a portion of a resource. For example, if the whole resource is an XML document, a useful portion of that resource might be a particular element inside the document. Following a link to it might result, for example, in highlighting that element or scrolling to that point in the document
• XLink links are able to associate all kinds of resources, not just XML-encoded ones
• One of the common uses of XLink is to create hyperlinks

Some related concepts

• An arc that has a local starting resource and a remote ending resource goes outbound, that is, away from the linking element. (An example of a link with such an arc is the HTML A element)
• If an arc's ending resource is local but its starting resource is remote, then the arc goes inbound
• If neither the starting resource nor the ending resource is local, then the arc is a third-party arc
• Documents containing collections of inbound and third-party links are called link databases, or linkbases

Types of XLinks

• XLink offers two kinds of links:
  • Simple links: a shorthand syntax for a common kind of link, an outbound link with exactly two participating resources (into which category HTML-style A and IMG links fall)
  • Extended links: full XLink functionality, such as inbound and third-party arcs, as well as links that have arbitrary numbers of participating resources
• While simple links are conceptually a subset of extended links, they are syntactically different. To convert a simple link into an extended link, several structural changes would be needed
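
A simple link can be recognised by its xlink:type="simple" attribute. The Python sketch below collects such links from a document; it assumes the namespace URI of the XLink 1.0 Recommendation (http://www.w3.org/1999/xlink), and the <review> and <cite> elements and the target books.xml are invented for the example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # namespace URI of XLink 1.0 (assumed here)

# Hypothetical document: any element may act as a simple link.
doc = f"""
<review xmlns:xlink="{XLINK}">
  See <cite xlink:type="simple" xlink:href="books.xml"
            xlink:title="Book list">the catalogue</cite>.
</review>
"""

root = ET.fromstring(doc)

# Collect (href, title) for every element marked as a simple link.
links = [
    (e.get(f"{{{XLINK}}}href"), e.get(f"{{{XLINK}}}title"))
    for e in root.iter()
    if e.get(f"{{{XLINK}}}type") == "simple"
]

print(links)  # [('books.xml', 'Book list')]
```

This only extracts the link data; deciding what traversal means (opening a window, embedding the target, and so on) is left to the XLink application.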


Simple links

Extended links

• Typically, extended linking elements are stored separately from the resources they associate (for example, in entirely different documents). Thus, extended links are important for situations where the participating resources are read-only, or where it is expensive to modify and update them but inexpensive to modify and update a separate linking element, or where the resources are in formats with no native support for embedded links (such as many multimedia formats).


Extended links

Some elements of XLink

• The extended-type element may contain a mixture of the following elements in any order, possibly along with other content and markup:
  • locator-type elements that address the remote resources participating in the link
  • arc-type elements that provide traversal rules among the link's participating resources
  • title-type elements that provide human-readable labels for the link
  • resource-type elements that supply local resources that participate in the link
• The extended-type element may have the semantic attributes role and title. They supply semantic information about the link as a whole; the role attribute indicates a property that the entire link has, and the title attribute indicates a human-readable description of the entire link

More XLinking

• For an XLink application to traverse from a starting resource to an ending resource, it needs to locate both the starting resource and the link. Locating the two pieces is not a problem in the case of outbound arcs because the starting resource is either the linking element itself or a child of the linking element. However, in the case of inbound and third-party arcs, the XLink application needs to be able to find both pieces somehow
• Linkbases are often used to make link management easier by gathering together a number of related linking elements.

XPath

• XPath is a language for referencing parts of an XML document
• It is used, for instance, when transforming a document through XSL
• XPath views a document as a tree of nodes, much as DOM does, and uses paths (similar to URLs) to reference parts of a document
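
A few path expressions can be tried with Python's standard-library xml.etree.ElementTree, which supports only a limited subset of XPath (child steps, //, and simple attribute predicates). The sketch uses a shortened version of the bookshop document from the schema slides.

```python
import xml.etree.ElementTree as ET

doc = """
<bookshop>
  <book isbn="84-111-1111-1">
    <title>El Quijote</title>
    <character>Don Quijote</character>
    <character>Sancho Panza</character>
  </book>
  <book isbn="84-222-2222-2">
    <title>La ciudad de los prodigios</title>
    <character>Efren Castells</character>
  </book>
</bookshop>
"""

root = ET.fromstring(doc)  # root is the <bookshop> element

# Child steps, like directories in a URL path.
titles = [t.text for t in root.findall("./book/title")]

# A predicate selects by attribute value.
quijote_chars = [
    c.text for c in root.findall("./book[@isbn='84-111-1111-1']/character")
]

# '//' matches at any depth below the context node.
all_chars = root.findall(".//character")

print(titles)          # ['El Quijote', 'La ciudad de los prodigios']
print(quijote_chars)   # ['Don Quijote', 'Sancho Panza']
print(len(all_chars))  # 3
```

Full XPath (functions, axes such as ancestor:: or following-sibling::) needs a complete XPath engine, which this module does not provide.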

XPointer

• XPointer is a language for pointing at a part of an XML document; it is similar to the HTML # (anchor)
• XPointer uses XPath for pointing
• XPointer enables linking

XLink available related SW

• X2X, from empolis UK Ltd., is an XML XLink engine. X2X allows linking between documents and information resources without needing to change the resources that are being linked. X2X removes the requirement to insert link information inside document content. The links are NOT in the document
• Fujitsu XLink Processor, developed by Fujitsu Laboratories Ltd., is an implementation of XLink and XPointer
• xlinkit.com is a lightweight application service which provides rule-based XLink generation and checks the consistency of distributed documents and web content. You tell xlinkit.com the information you want to link and the rules that relate the information. xlinkit.com will generate the links that you can then use for navigation. It will also diagnose inconsistent information
• Mozilla: the open-source browser supports XLink simple links
• Amaya: the W3C editor/browser now supports XLink simple links too
• XTooX is a free XLink processor that turns extended, out-of-line links into inline links. It takes as its input a linkbase (a document containing only XLinks) and puts the links into the referenced documents. XTooX is available under the GNU Lesser General Public License

XPointer available related SW

• Fujitsu XLink Processor, developed by Fujitsu Laboratories Ltd., is an implementation of XLink and (almost all of) XPointer
• libxml: the Gnome XML library has a beta implementation of XPointer. The full syntax is supported but the test suite does not cover all aspects yet
• 4XPointer: an XPointer processor written in Python by Fourthought, Inc
• At the University of Bologna two different implementations of XPointer are in progress, one in JavaScript for ASP pages and another in Java
• XPointerLib, from the Connexions project, is a mozdev.org project providing XPointer support for Mozilla / Netscape 7 / Phoenix browsers. It is an XPCOM service written in JavaScript that creates and resolves a subset of the XPointer language
• X2X, from empolis UK Ltd., is an XML XLink engine. X2X allows linking between documents and information resources without needing to change the resources that are being linked. X2X removes the requirement to insert link information inside document content. The links are NOT in the document

Some benefits of XLink

• Easier to control cycles
• Keep abstraction from resources (views)
• Promotes localization
• More info: www.w3c.org

XSL

[Diagram: an XML document and an XSL stylesheet are fed to the transformation engine (XSL parser), which produces HTML]

• Allows a design to be applied to an XML document, generating HTML, PDF, mail, SMS messages, ...
• Uses CSS and DSSSL (SGML)

XSL

<?xml version="1.0"?>
<!DOCTYPE BookCatalogue SYSTEM "file://localhost/xml-course/xsl/BookCatalogue.dtd">
<BookCatalogue>
  <Book>
    <Title>My Life and Times</Title>
    <Author>Paul McCartney</Author>
    <Date>July, 1998</Date>
    <ISBN>94303-12021-43892</ISBN>
    <Publisher>McMillin Publishing</Publisher>
  </Book>
  <Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
</BookCatalogue>

XSL

Tree representation of the document:

Document /
  PI <?xml version="1.0"?>
  DocumentType <!DOCTYPE BookCatalogue ...>
  Element BookCatalogue
    Element Book
      Element Title
        Text: My Life ...
      Element Author
        Text: Paul McCartney
      Element Date
        Text: July, 1998
      Element ISBN
        Text: 94303-12021-43892
      Element Publisher
        Text: McMillin Publishing
    Element Book
      ...
    Element Book
      ...

XSL

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="BookCatalogue">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Book">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Title">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Author">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ISBN">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Publisher">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
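
Every element template in this stylesheet just descends into its children, and the text() template copies each text node, so the output is the concatenated text of the document. The Python standard library has no XSLT processor, so the sketch below does not run the stylesheet; it only mimics its effect with xml.etree.ElementTree on a shortened catalogue.

```python
import xml.etree.ElementTree as ET

# Shortened catalogue, written without whitespace between the tags.
doc = (
    "<BookCatalogue><Book>"
    "<Title>My Life and Times</Title>"
    "<Author>Paul McCartney</Author>"
    "</Book></BookCatalogue>"
)

root = ET.fromstring(doc)

# Visit the tree in document order and emit every text node, which is what
# apply-templates (recurse) plus value-of on text() amounts to.
output = "".join(root.itertext())

print(output)  # My Life and TimesPaul McCartney
```

With an indented source document, the whitespace between the tags consists of text nodes too and would be copied to the output as well.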

XSL

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Book Catalogue</TITLE>
      </HEAD>
      <BODY>
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>
  <xsl:template match="BookCatalogue">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Book">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Title">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Author">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="ISBN">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="Publisher">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="text()">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>

BookCatalogue.xsl (the HTML elements in the root template are the ones added)

XML-based formats

• XML is an architecture, not an application
• SMIL (Synchronized Multimedia Integration Language)
• RDF (Resource Description Framework) for metadata
• CDF (Channel Definition Format), Microsoft channels
• MathML (Mathematical Markup Language)
• CML (Chemical Markup Language)
• BSML (Bioinformatic Sequence Markup Language)
• JML
• WIDL (B2B integration)

Processing

• Two approaches to processing XML documents using Java as the programming language:
  • DOM (Document Object Model)
    • Tree structure (nodes, elements and text); the most widely used
  • SAX (Simple API for XML; serial access)
    • Event based
    • Fastest, lower memory requirements, more difficult to program
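
The event-based style is sketched below. The slides refer to Java; this example uses Python's standard-library xml.sax module instead, whose callback names mirror the SAX ContentHandler interface. The class name TitleCollector is ours, and the input is a shortened bookshop document.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


doc = (
    b"<bookshop>"
    b"<book><title>El Quijote</title></book>"
    b"<book><title>La ciudad de los prodigios</title></book>"
    b"</bookshop>"
)

handler = TitleCollector()
xml.sax.parseString(doc, handler)
print(handler.titles)  # ['El Quijote', 'La ciudad de los prodigios']
```

No tree is ever built, which explains the lower memory use; the price is that the handler has to keep track of its own state (here the _in_title flag), which is what makes SAX harder to program than DOM.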

Tools

• XML Browsers (visualisers)
• XML Editors
• XML Parsers
• XML Servers
• Relational DB to XML converters
• XSL Editors
• XSL Processors

Some references

• http://www.w3.org/
  • Official site with all the standards
• http://www.xml.com/
  • Site from the publisher O'Reilly. A lot of good documentation and resources.
• http://www.xfront.com/
  • Very good tutorials on XSL and XML Schema
• http://xml.apache.org
  • Apache parsers and documentation (Xerces, Xalan, ...)
• XML and Java. B. McLaughlin. O'Reilly, 2000
  • Interesting on combining the two using Apache parsers