Page 1

© 2004 University of Greenwich 1

XML 1

Introduction

Recycled from Gill Windall’s notes

Page 2

XML Basics

• This lecture aims to cover:
  – What is XML and why it is significant
  – Content versus presentation
  – Displaying XML documents
  – Well-formed XML documents
  – Further XML syntax
  – What XML is actually used for
  – Technologies related to XML
  – Introduction to DTDs and Schemas
  – Introduction to namespaces

Page 3

What is XML?

1. A revolutionary and pervasive technology

– but pervasive things can be a bit difficult to get a handle on ...

"XML is what we should be focussing on in the industry for the next 2 to 4 years"

"XML gives us the freedom to do what we want"

Don Box - IT Guru - Dec 2001

Page 4

What is XML?

2. eXtensible Markup Language
  – HTML tags and attributes are restricted to those that the browser has been coded to recognise
  – XML is extensible because tags and attributes can be invented to suit any application, e.g.

    <book>
      <ISBN>1-34565-79-8</ISBN>
      <date>2001-07-03</date>
      <title>Hamsters and other Furry Rodents</title>
    </book>
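To see why invented tags are useful, here is a minimal sketch that reads the <book> example with Python's standard-library xml.etree.ElementTree module (our choice of parser for illustration; any XML parser would do). The program finds each field by its tag name rather than by its position:

```python
import xml.etree.ElementTree as ET

# The <book> example: every tag name was invented to describe its content
BOOK_XML = """<book>
  <ISBN>1-34565-79-8</ISBN>
  <date>2001-07-03</date>
  <title> Hamsters and other Furry Rodents </title>
</book>"""

book = ET.fromstring(BOOK_XML)           # parse the text into an element tree
isbn = book.findtext("ISBN")             # text of the first <ISBN> child
title = book.findtext("title").strip()   # remove any spaces around the title

print(book.tag)   # book
print(isbn)       # 1-34565-79-8
print(title)      # Hamsters and other Furry Rodents
```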

Page 5

What is XML?

3. A simplified version of SGML (Standard Generalized Markup Language), a language for defining markup languages
  – XML and HTML are related (hence the family likeness) via SGML:
    • XML is a subset of SGML
    • HTML and other SGML languages are defined using SGML
    • XHTML and other XML languages are defined using XML

Page 6

What is XML?
  – SGML is too complex for easy automatic processing. Generic tools for manipulating SGML documents are expensive and large.
  – XML is designed for easy automatic processing. Generic tools for manipulating XML documents are relatively cheap and efficient.

4. A W3C standard: the core specification is XML 1.0

5. More than just hype (although it has been heavily hyped)

Page 7

W3C Design Goals of XML

1. XML shall be straightforwardly usable over the Internet.

2. XML shall support a wide variety of applications.

3. XML shall be compatible with SGML.

4. It shall be easy to write programs which process XML documents.

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

6. XML documents should be human-legible and reasonably clear.

7. The XML design should be prepared quickly.

8. The design of XML shall be formal and concise.

9. XML documents shall be easy to create.

10. Terseness in XML markup is of minimal importance.

http://www.w3.org/TR/REC-xml/#sec-origin-goals

Page 8

Why XML?

• HTML tags and attributes are pre-defined in the HTML (XHTML) standard and describe presentation

• XML tags and attributes are defined to describe content and structure

XML separates content from presentation

Page 9

Separation of Content and Presentation

XML version (content meaning clear; presentation ?????):

    <book>
      <ISBN>1-56543-87-9</ISBN>
      <date>1998-03-07</date>
      <title>Frogs and Toads of the British Isles</title>
    </book>

HTML version (content meaning ?????; presentation defined):

    <tr>
      <td>1-56543-87-9</td>
      <td>1998-03-07</td>
      <td>Frogs and Toads of the British Isles</td>
    </tr>

Page 10

Separation of Content and Presentation

    <book>
      <ISBN>1-56543-87-9</ISBN>
      <date>1998-03-07</date>
      <title>Frogs and Toads of the British Isles</title>
    </book>

Presentation can be rendered differently for different devices and needs, e.g.
  – web browser on a PC
  – tablet
  – mobile phone
  – audio
  – printed paper: catalogue, advert
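The same content can feed several presentations. The sketch below renders one <book> as an HTML table row and as a plain-text catalogue line. The helper functions to_html_row and to_catalogue_line are hypothetical, written in Python for illustration; in practice this job is usually done with style sheets (CSS, XSLT), as discussed later.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOK_XML = """<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>"""

def to_html_row(book):
    """Present the book as an HTML table row (hypothetical helper)."""
    cells = "".join(f"<td>{escape(book.findtext(tag))}</td>"
                    for tag in ("ISBN", "date", "title"))
    return f"<tr>{cells}</tr>"

def to_catalogue_line(book):
    """Present the same book as a plain-text catalogue entry (hypothetical helper)."""
    return f"{book.findtext('title')} (ISBN {book.findtext('ISBN')})"

book = ET.fromstring(BOOK_XML)   # the content is parsed once...
print(to_html_row(book))         # ...and rendered in two different ways
print(to_catalogue_line(book))
```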

Page 11

Separation of Content and Presentation

Enables meaningful searches: an XML search engine can look inside many documents such as

    <book>
      <ISBN>1-56543-87-9</ISBN>
      <date>1998-03-07</date>
      <title>Frogs and Toads of the British Isles</title>
    </book>

and answer queries about their content, e.g.

    query: FIND book
           WHERE ISBN=
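A rough analogue of the FIND book WHERE ISBN= query can be written with the small XPath subset built into Python's xml.etree.ElementTree, whose [child='text'] predicate selects elements by the content of a child. The sample booklist below is illustrative data, not a real search engine index:

```python
import xml.etree.ElementTree as ET

# Sample data standing in for the documents a search engine has indexed
BOOKLIST_XML = """<booklist>
  <book><ISBN>1-34565-79-8</ISBN><date>2001-07-03</date>
        <title>Hamsters and other Furry Rodents</title></book>
  <book><ISBN>1-56543-87-9</ISBN><date>1998-03-07</date>
        <title>Frogs and Toads of the British Isles</title></book>
</booklist>"""

root = ET.fromstring(BOOKLIST_XML)

# "FIND book WHERE ISBN=...": select <book> elements by the text of their <ISBN> child
matches = root.findall(".//book[ISBN='1-56543-87-9']")
titles = [book.findtext("title") for book in matches]
print(titles)  # ['Frogs and Toads of the British Isles']
```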

Page 12

Separation of Content and Presentation

A universal format for data exchange and communication:

    Book publisher (SQL Server on Windows)  <-- XML -->  Book retailer (Oracle server on UNIX)
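The exchange works because both sides only need to agree on the XML format, not on each other's database. A minimal sketch, assuming each database row arrives as a Python dict with ISBN, date and title keys (records_to_xml and xml_to_records are hypothetical helper names):

```python
import xml.etree.ElementTree as ET

def records_to_xml(records):
    """Sending side: serialise rows (e.g. from the publisher's database) as XML."""
    root = ET.Element("booklist")
    for rec in records:
        book = ET.SubElement(root, "book")
        for field in ("ISBN", "date", "title"):
            ET.SubElement(book, field).text = rec[field]
    return ET.tostring(root, encoding="unicode")

def xml_to_records(text):
    """Receiving side (e.g. the retailer): parse the XML back into rows."""
    return [{child.tag: child.text for child in book}
            for book in ET.fromstring(text).iter("book")]

rows = [{"ISBN": "1-56543-87-9", "date": "1998-03-07",
         "title": "Frogs and Toads of the British Isles"}]
message = records_to_xml(rows)   # the only thing that crosses between systems
print(message)
print(xml_to_records(message) == rows)  # True
```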

Page 13

Separation of Content and Presentation

Data storage

An alternative to database technology?
  – Not really: XML is not a replacement for an RDBMS, but it may be used in places where a full RDBMS would be overkill.
  – XML schemas are well established, but research is ongoing into the development of XML ontologies
    • ontology: a classification of categories of being

Page 14

Displaying XML documents
• XML documents define content but not presentation
• The more recent browsers can display XML documents as a hierarchical structure

Page 15

Displaying XML documents
• So how do you tell browsers (or other presentation software) how to display documents that use XML-defined tags?
  – Using style sheets, of course:

    XML document + style sheet = presentable document

• There are two main style sheet languages:
  – CSS: Cascading Style Sheets
  – XSL: eXtensible Stylesheet Language
• XSL is much more complex and powerful; it comprises XSL-FO and XSLT
• For now we'll just use CSS to explore some possibilities

Page 16

Displaying XML documents

books.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/css" href="books.css"?>
    <booklist>
      <book>
        <ISBN>1-34565-79-8</ISBN>
        <date>2001-07-03</date>
        <title>Hamsters and other Furry Rodents</title>
      </book>
      <book>
        <ISBN>1-56543-87-9</ISBN>
        <date>1998-03-07</date>
        <title>Frogs and Toads of the British Isles</title>
      </book>
    </booklist>

books.css:

    book  { display:block }
    ISBN  { display:inline; font-family:arial; color:blue; font-size:10pt; font-weight:bold }
    title { display:inline; font-family:arial; }
    date  { display:none }

Page 17

Well Formed and Valid XML Documents
• An XML document that conforms to the strict syntax rules in the XML 1.0 specification is said to be well-formed.
• In addition, an XML document is valid if it is well-formed and also conforms to a set of grammar rules defined in:
  – a Document Type Definition (DTD), or
  – an XML Schema (XSD).
• XML documents don't need to have an associated DTD or Schema
  – in which case they can only be checked for being well-formed, not for validity.
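Checking well-formedness needs nothing more than a parser: if the parse succeeds, the document is well-formed. A minimal Python sketch using xml.etree.ElementTree (is_well_formed is a hypothetical helper; note that ElementTree does not check validity against a DTD or schema):

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as well-formed XML.

    This checks well-formedness only; it says nothing about validity.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>Frogs</title></book>"))  # True
print(is_well_formed("<book><title>Frogs</book>"))           # False
```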

Page 18

XML Syntax Rules
1. Document has a single root element
2. Tags must be properly nested
   • no overlapping tag pairs
3. All tags must have a closing tag
   • or be self-closing
4. Tag names are case sensitive
5. Tag attributes are in the opening tag
   • unique attribute name
   • attribute value must be quoted
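Each of the five rules can be seen in action by feeding a parser small documents that break them. A sketch, assuming Python's xml.etree.ElementTree (which reports any well-formedness error as ParseError); the sample snippets and the check helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# One small document breaking each rule, plus a correct one for contrast
EXAMPLES = {
    "ok":                  '<book id="b1"><title>Frogs</title><hr/></book>',
    "1 two roots":         '<head></head><body></body>',
    "2 overlapping tags":  '<b><i>text</b></i>',
    "3 missing close tag": '<p>first<p>second</p>',
    "4 case mismatch":     '<title>Frogs</TITLE>',
    "5a unquoted value":   '<film rating=PG>Snow White</film>',
    "5b duplicate name":   '<car colour="a" colour="b">KKE</car>',
}

def check(text):
    """Return None if the text is well-formed, otherwise the parser's message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

for name, text in EXAMPLES.items():
    print(f"{name:20} -> {check(text) or 'well-formed'}")
```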

Page 19

XML Syntax Rules

1. Only one root element is allowed in a document. This is called the document element.

Not well-formed:

    <head>
      <title>Some HTML doc</title>
    </head>
    <body> A bit of text</body>

Well-formed:

    <html>
      <head>
        <title>Some HTML doc</title>
      </head>
      <body> A bit of text </body>
    </html>

To be well-formed an XML document must have a document element that encloses all the other elements.

Page 20

XML Syntax Rules

2. All elements must be "properly nested"
• Any element contained inside another element has to be completely contained within it
  – you can't have one element partly within another
• The following may work as HTML, but it is not well-formed XML (or XHTML):

    <b>bold text <i>bold italic text</b> italic text</i>

• Whereas this is well-formed XML (XHTML):

    <b>bold text <i>bold italic text</i></b><i> italic text</i>

Page 21

XML Syntax Rules

Rules 1 and 2 combined mean that it is always possible to represent an XML document as a simple hierarchical tree:

    <html>
      <head><title>Some HTML doc</title></head>
      <body><p>A bit of text</p></body>
    </html>

    html
      head
        title
          "Some HTML doc"
      body
        p
          "A bit of text"
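Because every well-formed document is a tree, a few lines of code can walk it and print the hierarchy. A minimal Python sketch (tree_lines is a hypothetical helper; it prints element names and non-blank text, ignoring attributes), which can also be used to check an answer to the quiz that follows:

```python
import xml.etree.ElementTree as ET

DOC = "<html><head><title>Some HTML doc</title></head><body><p>A bit of text</p></body></html>"

def tree_lines(elem, depth=0):
    """Yield one indented line per element and per non-blank text node."""
    indent = "  " * depth
    yield indent + elem.tag
    if elem.text and elem.text.strip():
        yield indent + "  " + repr(elem.text.strip())
    for child in elem:
        yield from tree_lines(child, depth + 1)
        # text after a child element (its "tail") belongs to this element
        if child.tail and child.tail.strip():
            yield indent + "  " + repr(child.tail.strip())

lines = list(tree_lines(ET.fromstring(DOC)))
print("\n".join(lines))
```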

Page 22

XML Syntax Rules Quick Quiz

Draw a hierarchical tree to represent the following document:

    <html>
      <head><title>Flowers</title></head>
      <body>
        <p>List of <b>flowers</b></p>
        <ul>
          <li>daisy</li>
          <li><i>buttercup</i></li>
        </ul>
        <hr></hr>
      </body>
    </html>

Page 23

XML Syntax Rules

3. All elements must have a closing tag
• The following acceptable HTML is not well-formed XML:

    <p>first paragraph <p>second paragraph

• Whereas this is:

    <p>first paragraph</p> <p>second paragraph</p>

• If the element is truly empty (i.e. it has no content) then the empty-element tag notation may be used, so…

    <hr></hr>

• may be rewritten as

    <hr />
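To a parser the two notations are interchangeable: both produce an empty element with no text and no children. A quick check with Python's xml.etree.ElementTree (which, when serialising, happens to write empty elements in the short form):

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<hr></hr>")
short_form = ET.fromstring("<hr />")

# Both notations give the same element: tag 'hr', no text, no children
print(long_form.tag, long_form.text, len(long_form))     # hr None 0
print(short_form.tag, short_form.text, len(short_form))  # hr None 0

# Serialising an empty element produces the short form
print(ET.tostring(long_form, encoding="unicode"))        # <hr />
```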

Page 24

XML Syntax Rules

4. Tag names are case sensitive
• <title> is different to <Title> is different to <TITLE>
• closing tags must match case, of course, so

    <title>Hamsters and other Furry Rodents</TITLE>

• would be wrong
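Case sensitivity means <title>, <Title> and <TITLE> are three different element names to a parser, and a start/end tag pair that differs only in case is an error. A short Python sketch using xml.etree.ElementTree with made-up sample content:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<book><title>lower</title><Title>mixed</Title><TITLE>upper</TITLE></book>")

# Three distinct element names: a lookup for "title" finds only the lowercase one
print([child.tag for child in doc])  # ['title', 'Title', 'TITLE']
print(doc.findtext("title"))         # lower

# Start and end tags that differ in case are a well-formedness error
try:
    ET.fromstring("<title>Hamsters and other Furry Rodents</TITLE>")
except ET.ParseError as err:
    print("not well-formed:", err)
```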

Page 25

XML Syntax Rules

5. Some rules concerning attributes
• Start tags and empty-element tags, but not end tags, can contain attributes
• Attributes always exist as name="value" pairs
• The attribute value must always be quoted with " or '
• The attribute name must be unique within the tag
• Some bad attribute examples:

    <film rating=PG>Snow White turns ugly</film>                 <!-- value not quoted -->
    <car colour='silver trim' colour="red body">KKE 763L</car>   <!-- attribute name repeated -->
    <transaction>credit</transaction id="12543">                 <!-- attribute in an end tag -->
    <transaction synchronised>close account</transaction>        <!-- attribute without a value -->
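A parser reads the attributes in a start tag into name/value pairs and rejects each of the bad examples above. A sketch using Python's xml.etree.ElementTree; the good <film> element and the rejects helper are illustrative additions:

```python
import xml.etree.ElementTree as ET

# Either quote character may be used, as long as each value is quoted
good = ET.fromstring("""<film rating="PG" lang='en'>Snow White</film>""")
print(good.attrib)  # {'rating': 'PG', 'lang': 'en'}

# The bad examples from the slide
BAD = [
    """<film rating=PG>Snow White turns ugly</film>""",
    """<car colour='silver trim' colour="red body">KKE 763L</car>""",
    """<transaction>credit</transaction id="12543">""",
    """<transaction synchronised>close account</transaction>""",
]

def rejects(text):
    """Return True if the parser refuses the text as not well-formed."""
    try:
        ET.fromstring(text)
        return False
    except ET.ParseError:
        return True

for text in BAD:
    print(rejects(text), text)  # True for every example
```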

Page 26

Some More XML Syntax
• Knowing about elements (i.e. tags), attributes and well-formed documents allows you to create basic XML documents
• Other aspects of XML syntax include
  – XML declaration
  – Processing instructions
  – Comments
  – Character references and entities
  – Special symbols
  – CDATA sections

Page 27

XML Declaration
• Ideally all XML documents should start with an XML declaration (which has the form of an SGML processing instruction):

    <?xml version="1.0" encoding="UTF-8"?>

• If included, the declaration must:
  – come at the very start of the document (nothing, not even whitespace, may precede it)
  – begin with <?xml and end with ?>
  – include version= to indicate the version of XML
    • this is normally "1.0" (XML 1.1 was published in 2004 but is rarely used)
• The declaration may optionally include:
  – encoding= indicates the encoding used to store the file; typically this is "UTF-8" (a variable-width encoding of Unicode built from 8-bit units)
  – standalone="[yes|no]": does the document depend on external markup declarations?
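The placement rule is enforced by parsers. A sketch using Python's xml.etree.ElementTree (the parses helper is hypothetical; the xml_declaration argument to tostring needs Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>'

def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(DECL + "<note>hi</note>"))        # True: the declaration comes first
print(parses(" " + DECL + "<note>hi</note>"))  # False: a space precedes it
print(parses("<note>hi</note>" + DECL))        # False: it appears after the root

# When writing XML, ElementTree can emit a declaration itself (Python 3.8+)
out = ET.tostring(ET.fromstring("<note>hi</note>"), encoding="UTF-8", xml_declaration=True)
print(out.decode("utf-8"))
```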

Page 28

Processing Instructions
• Instructions intended for an application processing the XML document
• PIs have the form <?target instruction ?>
  – target identifies the program that the instruction is intended for
  – instruction is the instruction to the target program
• A very common PI is

    <?xml-stylesheet href="mystyle.css" type="text/css"?>

  where xml-stylesheet is the target and href="mystyle.css" type="text/css" is the instruction
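Parsers pass processing instructions on to the application as a target plus the instruction text. A minimal sketch with Python's xml.dom.minidom, chosen here because ElementTree's default parser discards processing instructions:

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="mystyle.css" type="text/css"?>
<booklist/>"""

dom = minidom.parseString(DOC)

# PIs before the root element are kept as children of the document node;
# the XML declaration itself is not reported as a PI
pis = [node for node in dom.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
for pi in pis:
    print("target:", pi.target)  # xml-stylesheet
    print("data:  ", pi.data)    # href="mystyle.css" type="text/css"
```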

Page 29

Character References
• As in HTML, these can be used to include non-standard characters in the document
  – i.e. things that can be displayed but not easily entered from a standard keyboard
• Format is:
  – &#NNN; or &#xHHH;
  – NNN is the decimal number, or HHH the hex number, representing the character in the Unicode character set.

    <test>it's Greek to me &#934; &#916; &#x394;</test>

• is displayed as: it's Greek to me Φ Δ Δ
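A parser replaces each character reference with the Unicode character it names, so the application never sees the &#...; form. A quick Python check with xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

test = ET.fromstring("<test>it's Greek to me &#934; &#916; &#x394;</test>")
print(test.text)  # it's Greek to me Φ Δ Δ

# Decimal 934 and hex 394 (= decimal 916) are the characters' Unicode code points
print(ord("Φ"), hex(ord("Δ")))  # 934 0x394
```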

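Page 30

Entities
• Some symbols have a special meaning in XML and must be entered as entities (or character references) when they appear as ordinary text
• XML predefines just five entities:
  – Less than symbol (<): &lt;
  – Greater than symbol (>): &gt;
  – Quotation mark ("): &quot;
  – Apostrophe ('): &apos;
  – Ampersand (&): &amp;
  – strictly, only < and & must always be escaped; the others are needed only in particular contexts, e.g. a quotation mark inside an attribute value delimited by quotation marks
• Others, such as the HTML entity &copy; for the copyright symbol (©), are not predefined in XML: they must be declared (e.g. in a DTD) or replaced by a character reference such as &#169;
• Customised ones, e.g. &copyw; to insert a copyright statement predefined in a DTD.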
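A quick way to see which entities a plain XML parser understands, using Python's xml.etree.ElementTree on documents with no DTD (parses is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The five predefined entities are understood by any XML parser
doc = ET.fromstring("<t>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos;</t>")
print(doc.text)  # <b> & "q" 'a'

# &copy; is an HTML entity; with no DTD declaring it, XML rejects it
print(parses("<t>&copy; 2004</t>"))  # False

# A numeric character reference needs no declaration
print(ET.fromstring("<t>&#169; 2004</t>").text)  # © 2004
```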

Page 31

CDATA Sections
• A way of including data that you don't want interpreted as XML
• Form is <![CDATA[the data not to be interpreted as XML]]>
• Why would you do this?
  – Perhaps to include examples of XML in a document which you don't want processed as XML, e.g.

    <![CDATA[ <wrong attr=val />]]>

• Comments, as in HTML, use <!-- and --> (and may not contain -- inside them)
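To a parser, the contents of a CDATA section are just text, and comments are not part of the data at all. A sketch using Python's xml.etree.ElementTree, whose default tree builder drops comments:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<example><![CDATA[ <wrong attr=val />]]></example>")

# The CDATA content arrives as plain text, not as a (badly formed) element
print(repr(doc.text))  # ' <wrong attr=val />'
print(len(doc))        # 0 child elements

# The comment is skipped, so <b> is the only child of <a>
print([child.tag for child in ET.fromstring("<a><!-- a note --><b/></a>")])  # ['b']
```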

Page 32

XML Applications

Standard vocabularies for representing and exchanging specialist data, e.g. legal, scientific, medical, mathematical vocabularies. For example, a fragment of a Chemical Markup Language (CML) document:

    <molecule convention="MDLMol" id="dopamine" title="DOPAMINE">
      <date day="22" month="11" year="1995"></date>
      <atomArray>
        <atom id="a1">
          <string builtin="elementType">C</string>
          <float builtin="x2">0.0222</float>
          <float builtin="y2">0.8115</float>
        </atom>

Page 33

XML Applications
• Used by human-facing client software, e.g.
  – eXtensible Hypertext Markup Language: XHTML
  – Wireless Markup Language: WML
  – Synchronised Multimedia Integration Language: SMIL
  – Scalable Vector Graphics: SVG
  – Mathematical Markup Language: MathML
  – Voice Extensible Markup Language: VoiceXML

Page 34

XML Applications
• Metadata (data about data) to describe resources, e.g.
  – Resource Description Framework: RDF
  – Really Simple Syndication: RSS
  – DARPA Agent Markup Language: DAML
  – Ontology Inference Layer: OIL
  – Web Ontology Language: OWL

    <rdf:Description about="http://www.gre.ac.uk/examregs.html">
      <cd:Creator>Fred Bloggs</cd:Creator>
      <cd:Date>20021212</cd:Date>
    </rdf:Description>

Page 35

XML Applications
• Web services
  – buried deep in computer-to-computer communications
  – XML-RPC, SOAP, WSDL, UDDI
• Business to business (B2B) data exchange
  – BizTalk, ebXML
• More B2B than B2C

    <SOAP-ENV:Body><proc:GetCurrentPrice xmlns:proc="proc-URI"/>

    <BusinessPartnerRole name="Buyer"><Performs initiatingRole="Buyer"/>

Page 36

XML in the Enterprise
• Web site: XML documents transformed using XSLT for multi-channel delivery (HTML, XHTML, WML, VoiceXML, XML multimedia)
• Enterprise systems: XML communication within a distributed system (SOAP, XML-RPC)
• XML-enabled databases, e.g. Oracle, DB2, SQL Server
• XML-aware search engines
• B2B links: XML data exchange
• XML-based web services: calls to third-party services, e.g. Microsoft Passport

Page 37

XML Technologies
• Applications of XML: CML, MathML, WML, VoiceXML, XHTML, SMIL, SVG, RDF, SOAP, UDDI, WSDL, ebXML, etc.
• Core XML: syntax, DTD, XSD, namespaces
• Supporting specifications: XPath, XLink, XPointer, XQuery, XSLT, XSL-FO, CSS, DOM, etc.
• Supporting tools:
  – Browsers: IE, Mozilla
  – APIs: DOM, SAX
  – Parsers: Expat, MSXML, Xerces
  – IDEs: XMLSpy, Stylus

Page 38

DTDs and Schemas
• DTDs and schemas (XSD) are alternative ways of defining an XML language.
• They contain rules that specify things such as
  – the tags in the vocabulary
  – which tags are allowed to be nested in other tags
  – which tags and attributes are optional / mandatory
  – which values are allowed for attributes
• XML languages defined by DTDs or schemas are used to create valid XML documents.

Page 39

DTDs and Schemas
• For an XML document to be valid it must conform to the rules specified in its DTD or Schema

A DTD or Schema defines an XML language; XML documents use the language defined in the DTD or Schema.

Page 40

Example XML with DTD

transactions.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE transactions SYSTEM "translang.dtd">
    <transactions>
      <transaction>
        <trantype>credit</trantype>
        <amount>2000</amount>
      </transaction>
      <transaction>
        <trantype>debit</trantype>
        <amount>1000</amount>
      </transaction>
      <transaction>
        <trantype>credit</trantype>
        <amount>300</amount>
      </transaction>
    </transactions>

The DOCTYPE declaration associates a DTD in a separate file (translang.dtd) with this document.

translang.dtd:

    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT transactions (transaction*)>
    <!ELEMENT transaction (trantype, amount)>
    <!ELEMENT trantype (#PCDATA)>
    <!ELEMENT amount (#PCDATA)>

translang.dtd says that:
• the transactions element contains zero or more transaction elements
• each transaction element contains a trantype element followed by an amount element
• each trantype element contains character data
• each amount element contains character data
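Python's standard-library XML parsers do not validate against DTDs (third-party libraries such as lxml can), so the sketch below hand-codes the four rules of translang.dtd to show what a validating parser checks. check_translang is a hypothetical helper, not a general DTD validator:

```python
import xml.etree.ElementTree as ET

def check_translang(root):
    """Hand-coded check of the rules in translang.dtd (hypothetical helper).

    Returns a list of problems; an empty list means the rules are followed.
    """
    problems = []
    if root.tag != "transactions":
        problems.append(f"root is <{root.tag}>, expected <transactions>")
    for i, tx in enumerate(root, start=1):
        # <!ELEMENT transactions (transaction*)>: only transaction children
        if tx.tag != "transaction":
            problems.append(f"child {i} is <{tx.tag}>, expected <transaction>")
            continue
        # <!ELEMENT transaction (trantype, amount)>: exactly these two, in order
        if [child.tag for child in tx] != ["trantype", "amount"]:
            problems.append(f"transaction {i}: expected trantype then amount")
        # (#PCDATA): character data only, no child elements
        for leaf in tx:
            if len(leaf) > 0:
                problems.append(f"transaction {i}: <{leaf.tag}> must hold text only")
    return problems

GOOD = """<transactions>
  <transaction><trantype>credit</trantype><amount>2000</amount></transaction>
  <transaction><trantype>debit</trantype><amount>1000</amount></transaction>
</transactions>"""
BAD = ("<transactions><transaction><amount>5</amount>"
       "<trantype>debit</trantype></transaction></transactions>")

print(check_translang(ET.fromstring(GOOD)))  # []
print(check_translang(ET.fromstring(BAD)))   # ['transaction 1: expected trantype then amount']
```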

Page 41

XML Schema
• DTDs:
  – easy for humans to cope with
  – older than schemas
    • supported by a much wider range of XML tools and software
  – have poor support for namespaces
• Schemas:
  – more verbose
  – much more expressive than DTDs
    • data types, constraints on values
  – an XML-based vocabulary
    • can be manipulated with general-purpose XML tools
  – support namespaces
  – declared in the root element of the XML document:

    <transactions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:noNamespaceSchemaLocation="translang.xsd">

Page 42

translang.xsd:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="transactions">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="transaction" minOccurs="0" maxOccurs="100"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="transaction">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="trantype"/>
            <xs:element ref="amount"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="trantype">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="credit"/>
            <xs:enumeration value="debit"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="amount" type="xs:integer"/>
    </xs:schema>

• the transactions element contains between 0 and 100 transaction elements
• the transaction element contains a trantype element followed by an amount element
• the trantype element contains a string with either the value "credit" or "debit"
• the amount element contains an integer
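The schema adds constraints that a DTD cannot express, such as a maximum count, a list of allowed values and a data type. The sketch below hand-codes just those three extra rules of translang.xsd in Python; check_translang_xsd is a hypothetical helper, and a real project would use a schema-aware validator (for example the third-party lxml library's XMLSchema class) instead:

```python
import re
import xml.etree.ElementTree as ET

MAX_TRANSACTIONS = 100                 # maxOccurs="100"
TRANTYPES = {"credit", "debit"}        # the xs:enumeration values
INTEGER = re.compile(r"[+-]?[0-9]+")   # lexical form of xs:integer

def check_translang_xsd(root):
    """Check the extra constraints of translang.xsd (hypothetical helper).

    Covers only maxOccurs, the trantype enumeration and the integer type of
    amount; it is not a general XML Schema validator.
    """
    problems = []
    transactions = root.findall("transaction")
    if len(transactions) > MAX_TRANSACTIONS:
        problems.append(f"{len(transactions)} transactions (max {MAX_TRANSACTIONS})")
    for i, tx in enumerate(transactions, start=1):
        trantype = tx.findtext("trantype")
        amount = tx.findtext("amount")
        if trantype not in TRANTYPES:
            problems.append(f"transaction {i}: trantype {trantype!r} not credit/debit")
        # xs:integer tolerates surrounding whitespace, hence strip()
        if amount is None or not INTEGER.fullmatch(amount.strip()):
            problems.append(f"transaction {i}: amount {amount!r} is not an integer")
    return problems

doc = ET.fromstring("<transactions><transaction><trantype>refund</trantype>"
                    "<amount>12</amount></transaction></transactions>")
print(check_translang_xsd(doc))  # ["transaction 1: trantype 'refund' not credit/debit"]
```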

Page 43

Quick Quiz
• Is the following document valid according to either or both of the DTD or Schema above?

    <transactions>
      <transaction>
        <trantype>credit</trantype><amount>24.75</amount>
      </transaction>
      <transaction>
        <trantype>credit</trantype><amount>650</amount>
      </transaction>
    </transactions>

Page 44

Namespaces
• Namespaces are a way of avoiding name conflicts
  – where different XML vocabularies use the same element names to mean different things.
• Consider two hypothetical XML languages: ShoeML and PicML
  – in the language ShoeML the <size> element refers to shoe size
  – in PicML the <size> element refers to the size of an image.
• The problem comes when you want to mix several vocabularies:

    <shoe>
      <style>SupaFeet</style>
      <size>39</size>
      <image>
        <filename>supafeet.jpg</filename>
        <size>100kb</size>
      </image>
    </shoe>

  What does size mean?

Page 45

Namespaces
• The previous example is well-formed XML, but it is difficult for applications to know how to process <size>.
• The solution is to use prefixes for the element names to distinguish between them
  – can also be used for attributes
• Each prefix must be bound to a namespace URI, with an xmlns:prefix attribute, before it is used
• Here shoe vocabulary element names are prefixed by shu: and image element names are prefixed by img: (the two URIs are placeholders):

    <shu:shoe xmlns:shu="http://example.com/ShoeML" xmlns:img="http://example.com/PicML">
      <shu:style>SupaFeet</shu:style>
      <shu:size>39</shu:size>
      <img:image>
        <img:filename>supafeet.jpg</img:filename>
        <img:size>100 kb</img:size>
      </img:image>
    </shu:shoe>
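A namespace-aware parser replaces each prefix with the namespace URI it is bound to, so the two <size> elements end up with different names. A sketch using Python's xml.etree.ElementTree; the URIs are placeholders, and the prefixes in the ns dictionary are the caller's own choice, which need not match the document's:

```python
import xml.etree.ElementTree as ET

SHU = "http://example.com/ShoeML"   # placeholder namespace URIs
IMG = "http://example.com/PicML"

DOC = f"""<shu:shoe xmlns:shu="{SHU}" xmlns:img="{IMG}">
  <shu:style>SupaFeet</shu:style>
  <shu:size>39</shu:size>
  <img:image>
    <img:filename>supafeet.jpg</img:filename>
    <img:size>100 kb</img:size>
  </img:image>
</shu:shoe>"""

root = ET.fromstring(DOC)
print(root.tag)  # {http://example.com/ShoeML}shoe

# Prefixes used in queries are mapped to URIs by the caller
ns = {"shu": SHU, "img": IMG}
print(root.findtext("shu:size", namespaces=ns))            # 39
print(root.findtext("img:image/img:size", namespaces=ns))  # 100 kb

# A prefix that is not bound to any URI is an error for a namespace-aware parser
try:
    ET.fromstring("<shu:shoe><shu:size>39</shu:size></shu:shoe>")
except ET.ParseError as err:
    print("rejected:", err)
```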

Page 46

References
• There are masses of XML books and websites.
  – "Professional XML", Birbeck et al., Wrox Press
    • Very comprehensive book.
    • This lecture covers much of the material in chapters 1 and 2
  – "SAMS Teach Yourself XML in 24 Hours", Morrison
    • Cheap as chips, good scope but little depth
• W3Schools online tutorial: http://www.w3schools.com
  – Try their online XML test
• World Wide Web Consortium: http://www.w3.org
  – The home of the XML specification and so much more.
• XML in practice: http://www.xml.org
  – Articles, white papers, user groups and more
• XML resources and information: http://www.xml.com
  – Provided by Tim O'Reilly

Page 47

Summary
• XML is a meta-language used to define application-specific markup languages
  – XHTML, MathML, CML, WML, ShoeML, etc.
• XML is designed to be straightforward and easy to use
• XML provides simple syntactic rules that result in well-formed, hierarchically structured documents
• DTDs or Schemas are used to define valid XML languages
  – namespaces avoid conflicts between XML languages
• XML separates content from presentation
  – CSS and XSL can be used to render XML documents in a readable form