© 2004 University of Greenwich 1
XML 1
Introduction
Recycled from Gill Windall’s notes
XML Basics
• This lecture aims to cover:
  – What XML is and why it is significant
  – Content versus presentation
  – Displaying XML documents
  – Well-formed XML documents
  – Further XML syntax
  – What XML is actually used for
  – Technologies related to XML
  – Introduction to DTDs and Schemas
  – Introduction to namespaces
What is XML?
1. A revolutionary and pervasive technology
– but pervasive things can be a bit difficult to get a handle on ...
"XML is what we should be focussing on in the industry for the next 2 to 4 years"
"XML gives us the freedom to do what we want"
Don Box - IT Guru - Dec 2001
What is XML?
2. eXtensible Markup Language
– HTML tags and attributes are restricted to those that the browser has been coded to recognise
– XML is extensible because tags and attributes can be invented to suit any application, e.g.
<book>
  <ISBN>1-34565-79-8</ISBN>
  <date>2001-07-03</date>
  <title> Hamsters and other Furry Rodents </title>
</book>
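Because the tags are invented rather than built in, a generic XML parser can read them without knowing anything about books. A minimal sketch, assuming Python 3 and its standard xml.etree.ElementTree module; the function name book_summary is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The <book> example above; none of these tag names are known to the parser in advance.
BOOK_XML = """<book>
  <ISBN>1-34565-79-8</ISBN>
  <date>2001-07-03</date>
  <title> Hamsters and other Furry Rodents </title>
</book>"""


def book_summary(xml_text: str) -> dict:
    """Map each child element's tag name to its whitespace-trimmed text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


print(book_summary(BOOK_XML))
# {'ISBN': '1-34565-79-8', 'date': '2001-07-03', 'title': 'Hamsters and other Furry Rodents'}
```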
What is XML?
3. A simplified version of SGML (Standard Generalized Markup Language), a language for defining mark-up languages
– XML and HTML are related (hence the family likeness) via SGML
[Figure: XML is a subset of SGML. HTML and other SGML languages are defined using SGML; XHTML and other XML languages are defined using XML.]
What is XML?
– SGML is too complex for easy automatic processing. Generic tools for manipulating SGML documents are expensive and large.
– XML is designed for easy automatic processing. Generic tools for manipulating XML documents are relatively cheap and efficient.
4. A W3C standard - the core specification is XML 1.0
5. More than just hype (although it has been heavily hyped)
W3C Design Goals of XML
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
http://www.w3.org/TR/REC-xml/#sec-origin-goals
Why XML?
• HTML tags and attributes are pre-defined in the HTML (XHTML) standard and describe presentation
• XML tags and attributes are defined to describe content and structure
XML separates content from presentation
Separation of Content and Presentation
<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>
XML: content meaning clear; presentation ?????
<tr>
  <td>1-56543-87-9</td>
  <td>1998-03-07</td>
  <td>Frogs and Toads of the British Isles</td>
</tr>
HTML: content meaning ?????; presentation defined
Separation of Content and Presentation
<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>
[Figure: the same <book> content rendered for a web browser on a PC, a tablet, a mobile phone, printed paper, audio, a catalogue and an advert]
Presentation can be rendered differently for different devices and needs
Separation of Content and Presentation
Enables meaningful searches
[Figure: an XML search engine querying a collection of <book> documents such as the one below]
<book>
  <ISBN>1-56543-87-9</ISBN>
  <date>1998-03-07</date>
  <title>Frogs and Toads of the British Isles</title>
</book>
query: FIND book WHERE ISBN=
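The query on the slide is pseudocode. As a minimal sketch of the same idea, assuming Python 3's standard xml.etree.ElementTree module, whose limited XPath support includes [child='text'] predicates, a search can name the ISBN field directly instead of matching raw text; the catalogue contents and the function name find_titles_by_isbn are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small catalogue in the slide's <book> format.
CATALOGUE = """<booklist>
  <book><ISBN>1-34565-79-8</ISBN><date>2001-07-03</date>
    <title>Hamsters and other Furry Rodents</title></book>
  <book><ISBN>1-56543-87-9</ISBN><date>1998-03-07</date>
    <title>Frogs and Toads of the British Isles</title></book>
</booklist>"""


def find_titles_by_isbn(xml_text: str, isbn: str) -> list:
    """Return the titles of all <book> elements whose <ISBN> child equals isbn."""
    root = ET.fromstring(xml_text)
    # [ISBN='...'] keeps only books with an ISBN child whose text matches exactly.
    # (Assumes isbn contains no single quote.)
    return [book.findtext("title") for book in root.findall(f"book[ISBN='{isbn}']")]


print(find_titles_by_isbn(CATALOGUE, "1-56543-87-9"))
# ['Frogs and Toads of the British Isles']
```

Because the markup says which text is an ISBN, a search for "1-56543-87-9" cannot accidentally match a date or a title.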
Separation of Content and Presentation
A universal format for data exchange and communication
[Figure: a book publisher and a book retailer, one running SQL Server on Windows and the other an Oracle server on UNIX, exchange data as XML]
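A minimal sketch of that exchange, assuming Python 3's standard xml.etree.ElementTree module: the sender turns database-style rows into an XML document, and the receiver reads them back without needing to know which database produced them. The rows, the FIELDS tuple and both function names are hypothetical stand-ins for real query results and integration code.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for a query result from the sender's database.
ROWS = [
    {"ISBN": "1-34565-79-8", "date": "2001-07-03", "title": "Hamsters and other Furry Rodents"},
    {"ISBN": "1-56543-87-9", "date": "1998-03-07", "title": "Frogs and Toads of the British Isles"},
]
FIELDS = ("ISBN", "date", "title")


def rows_to_xml(rows: list) -> str:
    """Sender side: serialise rows as a <booklist> document."""
    booklist = ET.Element("booklist")
    for row in rows:
        book = ET.SubElement(booklist, "book")
        for field in FIELDS:
            ET.SubElement(book, field).text = row[field]
    return ET.tostring(booklist, encoding="unicode")


def xml_to_rows(xml_text: str) -> list:
    """Receiver side: turn the document back into rows for its own database."""
    return [{child.tag: child.text for child in book}
            for book in ET.fromstring(xml_text).findall("book")]


message = rows_to_xml(ROWS)
print(message)
```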
Separation of Content and Presentation
Data storage
• An alternative to database technology?
  – Not really: XML is not a replacement for an RDBMS, but it may be used where a full RDBMS would be overkill.
  – XML schemas are well established, but research is ongoing into the development of XML ontologies
    • ontology: a classification of categories of being
Displaying XML documents
• XML documents define content but not presentation
• More recent browsers can display XML documents as a hierarchical structure
Displaying XML documents
• So how do you tell browsers (or other presentation software) how to display documents that use XML-defined tags?
  – Using style sheets, of course
• There are two main style sheet languages:
  – CSS: Cascading Style Sheets
  – XSL: eXtensible Stylesheet Language
• XSL is much more complex and powerful; the XSL family includes XSLT and XSL-FO
• For now we'll just use CSS to explore some possibilities
XML document + style sheet = presentable document
Displaying XML documents
books.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="books.css"?>
<booklist>
  <book>
    <ISBN>1-34565-79-8</ISBN>
    <date>2001-07-03</date>
    <title>Hamsters and other Furry Rodents</title>
  </book>
  <book>
    <ISBN>1-56543-87-9</ISBN>
    <date>1998-03-07</date>
    <title>Frogs and Toads of the British Isles</title>
  </book>
</booklist>
books.css:
book { display:block }
ISBN { display:inline; font-family:arial; color:blue; font-size:10pt; font-weight:bold }
title { display:inline; font-family:arial; }
date { display:none }
Well-Formed and Valid XML Documents
• An XML document that conforms to the strict syntax rules in the XML 1.0 specification is said to be well-formed.
• In addition, a well-formed XML document is valid if it also conforms to a set of grammar rules defined in:
  – a Document Type Definition (DTD), or
  – an XML Schema (XSD).
• XML documents don't need to have an associated DTD or Schema
  – in which case they can only be checked for being well-formed, not for validity.
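A well-formedness check needs nothing more than a parser. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module, whose underlying expat parser is non-validating; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document.

    The parser checks the syntax rules only; it never tests the document
    against a DTD or Schema, so this says nothing about validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<book><title>Frogs and Toads</title></book>"))  # True
print(is_well_formed("<book><title>Frogs and Toads</book>"))          # False
```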
XML Syntax Rules
1. Document has a single root element
2. Tags must be properly nested
   • no overlapping tag pairs
3. All tags must have a closing tag
   • or be self-closing
4. Tag names are case sensitive
5. Tag attributes are in the opening tag
   • unique attribute name
   • attribute value must be quoted
XML Syntax Rules
1. Only one root element is allowed in a document
   This is called the document element
Not well formed:
<head> <title>Some HTML doc</title></head><body> A bit of text</body>
Well formed:
<html> <head> <title>Some HTML doc</title> </head> <body> A bit of text </body></html>
To be well-formed, an XML document must have a document element that encloses all the other elements
XML Syntax Rules
2. All elements must be "properly nested"
• Any element contained inside another element has to be completely contained within it
  – you can't have one element partly within another
• The following may work in HTML, but it is not well-formed XML:
<b>bold text <i>bold italic text</b> italic text</i>
• Whereas this is well-formed XML (XHTML):
<b>bold text <i>bold italic text</i></b><i> italic text</i>
XML Syntax Rules
Rules 1 and 2 combined mean that it is always possible to represent an XML document as a simple hierarchical tree:
<html> <head><title>Some HTML doc</title></head> <body><p>A bit of text</p></body></html>
html
  head
    title
      Some HTML doc
  body
    p
      A bit of text
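Because every element sits entirely inside one parent, a parser can hand the document back as exactly this tree. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the helper tree_lines is hypothetical and, for brevity, ignores any text that follows a child element (ElementTree's "tail" text).

```python
import xml.etree.ElementTree as ET

DOC = "<html><head><title>Some HTML doc</title></head><body><p>A bit of text</p></body></html>"


def tree_lines(element: ET.Element, depth: int = 0) -> list:
    """List each element, and any text directly inside it, indented by depth."""
    lines = ["  " * depth + element.tag]
    if element.text and element.text.strip():
        lines.append("  " * (depth + 1) + element.text.strip())
    for child in element:  # children are visited in document order
        lines.extend(tree_lines(child, depth + 1))
    return lines


print("\n".join(tree_lines(ET.fromstring(DOC))))
```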
XML Syntax Rules Quick Quiz
Draw a hierarchical tree to represent the following document:
<html><head><title>Flowers</title></head><body><p>List of <b>flowers</b></p><ul> <li>daisy</li><li><i>buttercup</i></li></ul><hr></hr></body></html>
XML Syntax Rules
3. All elements must have a closing tag
• The following acceptable HTML is not well-formed XML:
<p>first paragraph <p>second paragraph
• Whereas this is:
<p>first paragraph</p> <p>second paragraph</p>
• If the element is truly empty (i.e. it has no content) then the empty tag notation may be used, so…
<hr></hr>
• may be rewritten as
<hr />
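A quick way to see this rule in action is to feed both paragraph versions to a parser and to compare the two spellings of an empty element. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the helper name parses is hypothetical, and the closed paragraphs are wrapped in a <body> so the test document still has a single root element.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(parses("<p>first paragraph <p>second paragraph"))                      # False
print(parses("<body><p>first paragraph</p> <p>second paragraph</p></body>"))  # True

# <hr></hr> and <hr /> produce the same element: no text and no children.
long_form = ET.fromstring("<body><hr></hr></body>")
short_form = ET.fromstring("<body><hr /></body>")
print(ET.tostring(long_form, encoding="unicode"))   # <body><hr /></body>
print(ET.tostring(short_form, encoding="unicode"))  # <body><hr /></body>
```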
XML Syntax Rules
4. Tag names are case sensitive
• <title> is different to <Title>, which is different to <TITLE>
• closing tags must match case, of course, so
<title>Hamsters and other Furry Rodents</TITLE>
• would be wrong
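A parser treats names that differ only in case as entirely separate names. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the helper parses and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A closing tag whose case differs from its opening tag is a mismatch.
BAD = "<title>Hamsters and other Furry Rodents</TITLE>"
GOOD = "<title>Hamsters and other Furry Rodents</title>"
print(parses(BAD), parses(GOOD))  # False True

# <title> and <Title> are simply two different element names.
doc = ET.fromstring("<book><title>lower</title><Title>capitalised</Title></book>")
print(doc.findtext("title"), doc.findtext("Title"), doc.find("TITLE"))
# lower capitalised None
```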
XML Syntax Rules
5. Some rules concerning attributes
• Start tags and empty tags, but not end tags, can contain attributes
• Attributes always exist as name="value" pairs
• The attribute value must always be quoted with " or '
• The attribute name must be unique within the tag
• Some bad attribute examples:
<film rating=PG>Snow White turns ugly</film>
<car colour='silver trim' colour="red body">KKE 763L</car>
<transaction>credit</transaction id="12543">
<transaction synchronised>close account</transaction>
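Each of the bad examples breaks a different attribute rule, and a parser rejects all four. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the helper parse_error is hypothetical, and the exact wording of the error messages depends on the parser, so they are printed rather than asserted.

```python
import xml.etree.ElementTree as ET
from typing import Optional

BAD_EXAMPLES = [
    "<film rating=PG>Snow White turns ugly</film>",                     # value not quoted
    """<car colour='silver trim' colour="red body">KKE 763L</car>""",  # name not unique
    '<transaction>credit</transaction id="12543">',                     # attribute on an end tag
    "<transaction synchronised>close account</transaction>",           # no value at all
]


def parse_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


for example in BAD_EXAMPLES:
    print(parse_error(example))

# The corrected first example parses, and the attribute value can be read back.
film = ET.fromstring('<film rating="PG">Snow White turns ugly</film>')
print(film.get("rating"))  # PG
```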
Some More XML Syntax
• Knowing about elements (i.e. tags), attributes and well-formed documents allows you to create basic XML documents
• Other aspects of XML syntax include:
  – XML declaration
  – Processing instructions
  – Comments
  – Character references and entities
  – Special symbols
  – CDATA sections
XML Declaration
• Ideally all XML documents should start with an XML declaration (it looks like a processing instruction, but is not treated as one)
<?xml version="1.0" encoding="UTF-8"?>
• If included, the declaration must:
  – come at the very start of the document (not even whitespace may precede it)
  – begin with <?xml and end with ?>
  – include version= to indicate the version of XML
    • normally "1.0" (XML 1.1 also exists but is rarely needed)
• The declaration may optionally include:
  – encoding= names the character encoding used to store the file; typically this is "UTF-8" (the 8-bit Unicode Transformation Format)
  – standalone="yes" or "no": does the document depend on external markup declarations?
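The "very start of the document" rule is easy to trip over when XML is built from strings. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module: the same declaration is accepted at the start of the input but rejected after a single leading space. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

DECLARED = b'<?xml version="1.0" encoding="UTF-8"?><booklist/>'
MISPLACED = b' <?xml version="1.0" encoding="UTF-8"?><booklist/>'  # one leading space


def parses(data: bytes) -> bool:
    """True if data is a well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(parses(DECLARED))        # True
print(parses(b"<booklist/>"))  # True: the declaration itself is optional
print(parses(MISPLACED))       # False: the declaration is no longer at the start
```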
Processing Instructions
• Instructions intended for an application processing the XML document
• PIs have the form <?target instruction ?>
  – target identifies the program that the instruction is intended for
  – instruction is the instruction to the target program
• A very common PI is
<?xml-stylesheet href="mystyle.css" type="text/css"?>
  (target: xml-stylesheet; instruction: href="mystyle.css" type="text/css")
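An application receives each PI as a target plus the remaining instruction text. ElementTree's default parser discards processing instructions, so this minimal sketch assumes Python 3's standard xml.dom.minidom module instead; the helper name processing_instructions is hypothetical. The XML declaration does not show up in the result, because a parser does not report it as a PI.

```python
from xml.dom import minidom

DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="mystyle.css" type="text/css"?>
<booklist/>"""


def processing_instructions(data: bytes) -> list:
    """Return (target, instruction) pairs for PIs at the top level of the document."""
    doc = minidom.parseString(data)
    return [(node.target, node.data)
            for node in doc.childNodes
            if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]


print(processing_instructions(DOC))
# [('xml-stylesheet', 'href="mystyle.css" type="text/css"')]
```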
Character References
• As in HTML, these can be used to include non-standard characters in the document
  – i.e. characters that can be displayed but not easily entered from a standard keyboard
• Format is:
  – &#NNN; or &#xHHH;
  – NNN is the decimal number, or HHH the hexadecimal number, representing the character in the Unicode character set
<test>it's Greek to me &#934; &#916; &#x394;</test>
• is displayed as: it's Greek to me Φ Δ Δ
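The decimal and hexadecimal forms are interchangeable: the parser replaces both with the same characters. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the Greek letters are written as Python escapes so the source stays plain ASCII.

```python
import xml.etree.ElementTree as ET

PHI, DELTA = "\u03a6", "\u0394"  # Greek capital letters phi (934) and delta (916)

# Decimal (&#NNN;) and hexadecimal (&#xHHH;) references to the same two characters.
from_decimal = ET.fromstring("<test>&#934; &#916;</test>").text
from_hex = ET.fromstring("<test>&#x3A6; &#x394;</test>").text

# Both are expanded to the characters themselves during parsing.
print(from_decimal == from_hex == PHI + " " + DELTA)  # True
```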
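Entities
• Some symbols have a special meaning in XML, so to use them as ordinary text they are entered as entity references (or character references)
• The five predefined entities are:
  – Less than symbol (<): &lt;
  – Greater than symbol (>): &gt;
  – Quotation mark ("): &quot;
  – Apostrophe ('): &apos;
  – Ampersand (&): &amp;
• The copyright symbol (©) is not predefined in XML (&copy; is an HTML entity), so use the character reference &#169; unless an entity has been declared for it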
• Customised ones can also be declared, e.g. &copyw; to insert a copyright statement predefined (e.g. in a DTD).
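In practice a library escapes the special symbols when writing XML, and the parser expands the predefined entities when reading it. A minimal sketch, assuming Python 3's standard xml.sax.saxutils.escape and xml.etree.ElementTree; the helper name parses and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape("Tom & Jerry <cartoons>")
print(escaped)  # Tom &amp; Jerry &lt;cartoons&gt;

# The five predefined entities are expanded by any XML parser.
expanded = ET.fromstring("<t>&lt; &gt; &quot; &apos; &amp;</t>").text
print(expanded)  # < > " ' &

# &copy; is not predefined in XML, so without a declaration it is an error...
print(parses("<t>&copy;</t>"))  # False
# ...whereas the equivalent character reference always works.
print(parses("<t>&#169;</t>"))  # True
```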
CDATA Sections
• A way of including data that you don't want interpreted as XML
• Form is <![CDATA[the data not to be interpreted as XML]]>
• Why would you do this?
  – Perhaps to include examples of XML in a document which you don't want processed as XML, e.g. <![CDATA[ <wrong attr=val />]]>
• Comments, as in HTML, use <!-- and -->
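Inside a CDATA section the parser delivers the characters as plain text, so even malformed markup is harmless there, and comments never become part of the content. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module, whose default parser discards comments; the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# "<wrong attr=val />" would not be well-formed as markup, but as CDATA it is just text.
example = ET.fromstring("<example><![CDATA[ <wrong attr=val />]]></example>")
print(repr(example.text))  # ' <wrong attr=val />'
print(len(example))        # 0: no child element was created

# The comment is dropped; only the real text content remains.
commented = ET.fromstring("<note><!-- reviewer: check this -->text</note>")
print(commented.text)      # text
```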
XML Applications
Standard vocabularies for representing and exchanging specialist data
e.g. legal, scientific, medical, mathematical vocabularies
<molecule convention="MDLMol" id="dopamine" title="DOPAMINE">
  <date day="22" month="11" year="1995"></date>
  <atomArray>
    <atom id="a1">
      <string builtin="elementType">C</string>
      <float builtin="x2">0.0222</float>
      <float builtin="y2">0.8115</float>
    </atom>
XML Applications
• Used by human-facing client software, e.g.
  – eXtensible Hypertext Markup Language - XHTML
  – Wireless Markup Language - WML
  – Synchronised Multimedia Integration Language - SMIL
  – Scalable Vector Graphics - SVG
  – Mathematical Markup Language - MathML
  – Voice Extensible Markup Language - VoiceXML
XML Applications
• Metadata (data about data) to describe resources, e.g.
  – Resource Description Framework - RDF
  – Really Simple Syndication - RSS
  – DARPA Agent Markup Language - DAML
  – Ontology Inference Layer - OIL
  – Web Ontology Language - OWL
<rdf:Description about="http://www.gre.ac.uk/examregs.html">
  <cd:Creator>Fred Bloggs</cd:Creator>
  <cd:Date>20021212</cd:Date>
</rdf:Description>
XML Applications
• Web services
  – buried deep in computer-to-computer communications
  – XML-RPC, SOAP, WSDL, UDDI
• Business-to-business (B2B) data exchange
  – BizTalk, ebXML
• More B2B than B2C
<SOAP-ENV:Body><proc:GetCurrentPrice xmlns:proc="proc-URI"/>
<BusinessPartnerRole name="Buyer"><Performs initiatingRole="Buyer"/>
XML in the Enterprise
• Web site: XML documents transformed using XSLT for multi-channel delivery (HTML, XHTML, WML, VoiceXML, XML multimedia)
• Enterprise systems: XML communication within a distributed system (SOAP, XML-RPC)
• XML-enabled databases, e.g. Oracle, DB2, SQL Server
• XML-aware search engines
• B2B links: XML data exchange
• XML-based web services: calls to third-party services, e.g. Microsoft Passport
XML Technologies
• Applications of XML: CML, MathML, WML, VoiceXML, XHTML, SMIL, SVG, RDF, SOAP, UDDI, WSDL, ebXML, etc.
• Core XML: syntax, DTD, XSD, namespaces
• Supporting specifications: XPath, XLink, XPointer, XQuery, XSLT, XSL-FO, CSS, DOM, etc.
• Supporting tools:
  – Browsers: IE, Mozilla
  – APIs: DOM, SAX
  – Parsers: Expat, MSXML, Xerces
  – IDEs: XMLSpy, Stylus
DTDs and Schemas
• DTDs and schemas (XSD) are alternative ways of defining an XML language.
• They contain rules that specify things such as:
  – the tags in the vocabulary
  – which tags are allowed to be nested in other tags
  – which tags and attributes are optional / mandatory
  – which values are allowed for attributes
• XML languages defined by DTDs or schemas are used to create valid XML documents.
DTDs and Schemas
• For an XML document to be valid it must conform to the rules specified in its DTD or Schema
[Figure: a DTD or Schema defines an XML language; XML documents use the language defined in the DTD or Schema]
Example XML with DTD
transactions.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE transactions SYSTEM "translang.dtd">
<transactions>
  <transaction>
    <trantype>credit</trantype><amount>2000</amount>
  </transaction>
  <transaction>
    <trantype>debit</trantype><amount>1000</amount>
  </transaction>
  <transaction>
    <trantype>credit</trantype><amount>300</amount>
  </transaction>
</transactions>
• The DOCTYPE declaration associates a DTD in a separate file (translang.dtd) with this document
translang.dtd:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT transactions (transaction*)>
<!ELEMENT transaction (trantype, amount)>
<!ELEMENT trantype (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
• translang.dtd says that:
  – the transactions element contains zero or more transaction elements
  – each transaction element contains a trantype element followed by an amount element
  – each trantype element contains character data
  – each amount element contains character data
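Python's standard-library parsers do not validate against DTDs, so the following is only a hand-written sketch of the element structure translang.dtd describes, assuming Python 3's xml.etree.ElementTree; the function name translang_errors is hypothetical, and the check ignores stray text between elements. For real validation, a validating parser would be used instead.

```python
import xml.etree.ElementTree as ET

TRANSACTIONS = """<transactions>
  <transaction><trantype>credit</trantype><amount>2000</amount></transaction>
  <transaction><trantype>debit</trantype><amount>1000</amount></transaction>
  <transaction><trantype>credit</trantype><amount>300</amount></transaction>
</transactions>"""


def translang_errors(xml_text: str) -> list:
    """Hand-written check of translang.dtd's content model (not a DTD validator)."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "transactions":
        errors.append(f"root element is <{root.tag}>, expected <transactions>")
    for i, tx in enumerate(root, start=1):
        # <!ELEMENT transactions (transaction*)>: any number of <transaction> children
        if tx.tag != "transaction":
            errors.append(f"child {i} is <{tx.tag}>, expected <transaction>")
            continue
        # <!ELEMENT transaction (trantype, amount)>: exactly these two, in this order
        tags = [child.tag for child in tx]
        if tags != ["trantype", "amount"]:
            errors.append(f"transaction {i} contains {tags}, expected ['trantype', 'amount']")
        # <!ELEMENT trantype (#PCDATA)> and <!ELEMENT amount (#PCDATA)>: text only
        for child in tx:
            if len(child):
                errors.append(f"<{child.tag}> in transaction {i} contains child elements")
    return errors


print(translang_errors(TRANSACTIONS))  # []
```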
XML Schema
• DTDs:
  – easy for humans to cope with
  – older than schemas
    • supported by a much wider range of XML tools and software
  – have poor support for namespaces
• Schemas:
  – more verbose
  – much more expressive than DTDs
    • data types, constraints on values
  – an XML-based vocabulary
    • can be manipulated with general-purpose XML tools
  – support namespaces
  – declared in the root element of the XML document
<transactions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="translang.xsd">
translang.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="transactions">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="transaction" minOccurs="0" maxOccurs="100"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="transaction">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="trantype"/>
        <xs:element ref="amount"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="trantype">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="credit"/>
        <xs:enumeration value="debit"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="amount" type="xs:integer"/>
</xs:schema>
• the transactions element contains between 0 and 100 transaction elements
• the transaction element contains a trantype element followed by an amount element
• the trantype element contains a string with either the value "credit" or "debit"
• the amount element contains an integer
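The extra power of the schema lies in its value constraints. The Python standard library has no XSD validator, so this is only a hand-written sketch of the three constraints listed above, assuming Python 3's xml.etree.ElementTree and re modules; the function name translang_xsd_errors is hypothetical, and the element structure is assumed to be correct already.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of xs:integer: an optional sign followed by decimal digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def translang_xsd_errors(xml_text: str) -> list:
    """Hand-written check of translang.xsd's value constraints (not an XSD validator)."""
    root = ET.fromstring(xml_text)
    errors = []
    transactions = root.findall("transaction")
    if len(transactions) > 100:  # maxOccurs="100"
        errors.append(f"{len(transactions)} transactions; at most 100 are allowed")
    for i, tx in enumerate(transactions, start=1):
        # xs:string preserves whitespace, so the text must equal an enumeration value exactly.
        trantype = tx.findtext("trantype", default="")
        if trantype not in ("credit", "debit"):
            errors.append(f"transaction {i}: trantype {trantype!r} is not 'credit' or 'debit'")
        # xs:integer ignores surrounding whitespace, then requires whole-number digits.
        amount = tx.findtext("amount", default="").strip()
        if not INTEGER.fullmatch(amount):
            errors.append(f"transaction {i}: amount {amount!r} is not an integer")
    return errors


print(translang_xsd_errors(
    "<transactions><transaction><trantype>refund</trantype>"
    "<amount>12.5</amount></transaction></transactions>"))
```

A DTD can only say that <amount> contains character data; it has no way to express "integer" or "credit or debit".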
Quick Quiz
• Is the following document valid according to either or both of the DTD or Schema above?
<transactions>
  <transaction>
    <trantype>credit</trantype><amount>24.75</amount>
  </transaction>
  <transaction>
    <trantype>credit</trantype><amount>650</amount>
  </transaction>
</transactions>
Namespaces
• Namespaces are a way of avoiding name conflicts
  – where different XML vocabularies use the same element names to mean different things.
• Consider two hypothetical XML languages, ShoeML and PicML:
  – in ShoeML the <size> element refers to shoe size
  – in PicML the <size> element refers to the size of an image.
• The problem comes when you want to mix several vocabularies:
<shoe>
  <style>SupaFeet</style>
  <size>39</size>
  <image>
    <filename>supafeet.jpg</filename>
    <size>100kb</size>
  </image>
</shoe>
What does size mean?
Namespaces
• The previous example is well-formed XML, but it is difficult for applications to know how to process <size>.
• The solution is to use prefixes on the element names to distinguish between them
  – prefixes can also be used for attributes
  – each prefix must be bound to a namespace URI by an xmlns:prefix="..." attribute, usually on the root element (the declarations are omitted below for brevity)
• Here shoe vocabulary element names are prefixed by shu: and image element names are prefixed by img:
<shu:shoe>
  <shu:style>SupaFeet</shu:style>
  <shu:size>39</shu:size>
  <img:image>
    <img:filename>supafeet.jpg</img:filename>
    <img:size>100 kb</img:size>
  </img:image>
</shu:shoe>
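A namespace-aware parser resolves each prefix to its URI, so the two <size> elements become different names. A minimal sketch, assuming Python 3's standard xml.etree.ElementTree module; the two example.com namespace URIs are hypothetical, since ShoeML and PicML are themselves hypothetical. Leaving out the xmlns declarations makes the parser reject the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs for the two hypothetical vocabularies.
SHOE_NS = "http://example.com/ns/shoeml"
PIC_NS = "http://example.com/ns/picml"

DOC = f"""<shu:shoe xmlns:shu="{SHOE_NS}" xmlns:img="{PIC_NS}">
  <shu:style>SupaFeet</shu:style>
  <shu:size>39</shu:size>
  <img:image>
    <img:filename>supafeet.jpg</img:filename>
    <img:size>100 kb</img:size>
  </img:image>
</shu:shoe>"""

# The prefixes used in queries only need to map to the same URIs.
NS = {"shu": SHOE_NS, "img": PIC_NS}

root = ET.fromstring(DOC)
shoe_size = root.findtext("shu:size", namespaces=NS)
image_size = root.findtext("img:image/img:size", namespaces=NS)
print(shoe_size, "|", image_size)  # 39 | 100 kb
print(root.tag)                    # {http://example.com/ns/shoeml}shoe


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML, including namespace rules."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Without an xmlns:shu declaration the shu: prefix is unbound.
print(parses("<shu:shoe><shu:size>39</shu:size></shu:shoe>"))  # False
```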
Summary
• XML is a meta-language used to define application-specific markup languages
  – XHTML, MathML, CML, WML, ShoeML, etc.
• XML is designed to be straightforward and easy to use
• XML provides simple syntactic rules that result in well-formed, hierarchically structured documents
• DTDs or Schemas are used to define XML languages against which documents can be validated
  – namespaces avoid conflicts between XML languages
• XML separates content from presentation
  – CSS and XSL can be used to render XML documents in a readable form