Posted on 30-Apr-2018
Module 2:
XML Schema Definition
a.Univ.-Prof. Dr. Werner Retschitzegger
Lecture: IFS in Bioinformatics
Summer Semester (SS) 2011
Johannes Kepler University Linz, www.jku.ac.at
Institute of Bioinformatics, www.bioinf.jku.at
IFS Information Systems Group, www.ifs.uni-linz.ac.at
© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
- Introduction
  - Motivation for XML
  - Document Markup Languages
  - Application Areas for XML
- XML 1.0
- Namespaces
- XML Schema
The following slides are based (among others) on: Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005
Motivation for XML 1/5: From HTML to XML
"If I invent another programming language, its name will contain the letter X."
(N. Wirth, Software Pioniere Konferenz, Bonn 2001)
Google Indicator (number of hits, as of Sep/16/08):
- SQL: 223 million
- ABC: 252 million
- "Werner Retschitzegger": 20.6 K
- Soccer: 237 million
- XML: 603 million
- Love: 2.2 billion
Motivation for XML 2/5: From HTML to XML
Brian Kernighan: "The problem with HTML-WYSIWYG is that what you see is all you've got"
HTML (HyperText Markup Language) is the "lingua franca" for representing hypertext documents on the Web
- Developed in 1989 by Tim Berners-Lee (CERN), later standardized by the W3C (World Wide Web Consortium)
- Basic concept: "markup" in terms of "tags"
Drawbacks:
- Restricted number of pre-defined tags, leading to constant extensions with proprietary tags
- Tags primarily describe layout aspects, which hampers Web search
Motivation for XML 3/5: From HTML to XML
  <h1>PDACatalog</h1>
  <h2>Nokia 8210</h2>
  <table border="1">
    <tr><td>Battery</td><td>900mAh</td></tr>
    <tr><td>Weight</td><td>141g</td></tr>
    …
  </table>
HTML describes the layout of content.
  <PDACatalog>
    <Producer name="Nokia">
      <PDA name="8210">
        <Battery>900mAh</Battery>
        <Weight>141g</Weight>
        …
      </PDA>
    </Producer>
  </PDACatalog>
XML describes the structure and semantics of content.
Tim Bray, co-editor of XML 1.0: "XML will become the ASCII of the 21st century - basic, essential, unexciting"
Motivation for XML 4/5: Features of XML
- Layout independence: the structure and semantics of the content are separated from its layout
- Platform and vendor independence: endorsed by the W3C
- Internationality: based on the Unicode standard
- Extensibility: tags can be defined and named arbitrarily (meta language)
- Structurability: tags can be nested arbitrarily
- Semi-structured: content can contain fully structured parts and fully unstructured parts
- Self-describing: tags describing the structure and semantics of the content are relatively easy for humans to read and edit, and easy for machines to generate and parse
- X-technology infrastructure: the W3C provides a set of XML-based standards, the "XML Standards Family"
- Correctness check: optionally, XML documents can be checked for correctness (validated)
Motivation for XML 5/5: Properties of XML Documents and XML Processors
Well-formedness: syntactic properties, e.g.:
- At least one tag per document
- Exactly one root tag
- Tags must not overlap (elements must be properly nested)
- Each start tag must have a matching end tag (unless the empty-element form is used)
- ...
Validity: the XML document is well-formed and conforms to a schema
- The schema defines vocabulary and grammar
- Alternatives: DTD or the XML Schema standard
XML processors parse XML documents and check either solely well-formedness (non-validating processors) or also validity (validating processors)
- Can be called from within an application (e.g., a browser)
- Decompose an XML document into its parts, forming a tree that the application can then access
[Figure: the XML processor (parser and entity manager) reads the XML document PDACatalog1.XML together with Catalog.DTD and passes the document parts, or errors, to the application]
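You can try out the well-formedness check of a non-validating processor with Python's standard library. `xml.etree.ElementTree` is built on the non-validating Expat parser, so it reports well-formedness errors but ignores any DTD or schema. The helper `is_well_formed` and the sample strings below are hypothetical, chosen to mirror the rules above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document.

    ElementTree uses the non-validating Expat parser, so only
    well-formedness is checked here, not validity.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "nested": "<PDACatalog><Producer></Producer></PDACatalog>",
    "overlapping": "<PDACatalog><Producer></PDACatalog></Producer>",
    "two roots": "<PDACatalog/><PDACatalog/>",
    "missing end tag": "<PDACatalog><Producer></PDACatalog>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the "nested" sample is accepted. Validity checking would additionally require a validating processor and the DTD or schema.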
Document Markup Languages 1/4: History
- 1945, Vannevar Bush: Memex
- 1962, Douglas Engelbart: Augment
- 1965, Ted Nelson: Xanadu
- 1967, William Tunnicliffe (GCA): GenCode
- 1969, Goldfarb, Mosher, Lorie (IBM): GML (Generalized Markup Language)
- 1978, ANSI: standardization (GenCode & GML)
- 1986, Charles Goldfarb, ISO: SGML (Standard Generalized Markup Language, ISO 8879)
- 1989, Tim Berners-Lee (CERN): HTML (Hypertext Markup Language)
- 1993, Marc Andreessen (NCSA): HTML forms (XMosaic)
- 1994, Netscape, Microsoft: HTML derivations
- 1996, Jon Bosak, Tim Bray, James Clark et al. (W3C): XML Working Group
- 10 Feb 1998: XML 1.0
- 29 Sep 2006: XML 1.1, 2nd Edition
Document Markup Languages 2/4
Memex: http://www.ps.uni-sb.de/~duchier/pub/vbush/vbush-all.shtml
Document Markup Languages 3/4: XML and OMG's Metadata Architecture [www.omg.org]
- M2, meta level: SGML, XML
- M1, language level (e.g., DTDs): e.g., HTML, XHTML, MathML, WML
- M0, instance level (documents): e.g., $e^{i\pi} + 1 = 0$ or $f(n) = \sum_{k=1}^{n} k$
Document Markup Languages 4/4: XML versus ...
... SGML
- Specification size: XML vs. SGML (60 pages vs. 600 pages)
- XML has 20% of SGML's complexity, but 80% of its functionality
- XML documents conform to an ISO revision of SGML: WebSGML (annex to the SGML standard ISO 8879)
... HTML
- XML is complementary to HTML (semantics and structure vs. layout)
- XML is not backward compatible with HTML
- Simple conversion from HTML documents to XML
... XHTML (Extensible HTML)
- W3C Recommendation, Aug. 2002 (2nd edition)
- HTML 4.01 as an "XML application", i.e., HTML was described by means of an XML DTD
Application Areas of XML 1/4: Three Main Application Areas
- Data exchange ("portable data"): using XML solely as an exchange format, or also using a common schema
- Multi-delivery: one and the same content can be delivered to different end-user devices
- Intelligent retrieval: instead of a simple keyword search on HTML documents, a structure-based search on XML documents
  "Mozart": composer or chocolate ball?
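The "Mozart" question can be made concrete with a minimal Python sketch that contrasts keyword search with structure-based search. The catalog document and its element names (`CD`, `Composer`, `Sweet`, `Brand`) are invented for illustration. The point is only that the element a keyword occurs in tells you what it means.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the same keyword "Mozart" appears in two roles.
DOC = """
<Catalog>
  <CD><Composer>Mozart</Composer><Title>Requiem</Title></CD>
  <Sweet><Brand>Mozart</Brand><Type>chocolate ball</Type></Sweet>
</Catalog>
"""

root = ET.fromstring(DOC)

# Keyword search: every element whose text mentions "Mozart".
keyword_hits = [el.tag for el in root.iter() if el.text and "Mozart" in el.text]

# Structure-based search: only Mozart in the role of a composer.
composer_hits = [cd.findtext("Title")
                 for cd in root.findall("CD")
                 if cd.findtext("Composer") == "Mozart"]

print(keyword_hits)   # ['Composer', 'Brand']
print(composer_hits)  # ['Requiem']
```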
Application Areas of XML 2/4: Industrial Sectors, "Verticalisation of XML"
[http://www.oasis-open.org/cover/xml.html#applications]
XML DTDs for ...
- Literature: "Gutenberg"
- Travel: "openTravel"
- News: "NewsML"
- Marketing: "adXML"
- Weather: "OMF"
- Human Resources: "XML-HR"
- Voice Applications: "VoxML"
- Vector Graphics: "SVG"
- Mobile Applications: "WML"
- Geo Applications: "ANZMETA"
- Health Care: "HL7"
- Mathematics: "MathML"
- Banking: "MBA"
- eGovernment: "eGovML"
Electronic Commerce:
- CBL: Common Business Library (Commerce One)
- BizTalk: Microsoft
- cXML: Commerce XML
- RosettaNet: format for online orders
- ebXML: OASIS + XML/EDI
- FpML: Financial Products Markup Language
- ...
Application Areas of XML 3/4: Sources of XML Data
- Inter-application and mobile-device communication data, e.g., Web Services
- Logs and blogs, e.g., RSS
- Metadata, e.g., XML Schema, WSDL, XMP
- Presentation data, e.g., XHTML
- Documents, e.g., Word
- Views of other data sources, e.g., relational, LDAP, CSV, Excel, etc.
- Sensor data
Application Areas of XML 4/4: XML Standardization Family (excerpt)
- XML: XML language concepts incl. DTD
- XML Namespaces: support for a global identification scheme for element names and attribute names
- XPath (XML Path Language): path expressions for navigating XML documents
- XML Schema: XML-based language for defining XML schemas
- XLink, XPointer: XML-based languages for linking (parts of) XML documents
- XSL (Extensible Stylesheet Language): XSLT for the (declarative) transformation of XML documents, XSL-FO for the (declarative) rendering of XML documents
- DOM (Document Object Model): API for accessing XML documents in a procedural manner
W3C standardization levels: (1) Note, (2) Working Draft (WD), (3) Candidate Recommendation (CR), (4) Proposed Recommendation (PR), (5) Recommendation (REC)
"It takes ten minutes to understand (base) XML, but then ten months to understand the new technologies hung around it." (Peter Chen)
© 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
- Introduction
- XML 1.0
  - XML Document
  - DTD
  - Entities
- Namespaces
- XML Schema
XML Document 1/3
Running Example: PDACatalog (file PDACatalog1.XML)
  <?xml version="1.0" encoding="UTF-8"?>
  <PDACatalog>
    <!-- NOKIA -->
    <Producer name="NOKIA">
      <ProducerNo no="h1234"/>
      <PDA name="7110">
        <Weight>141g</Weight>
        <Price contract="yes">999</Price>
        <Price contract="no">4999</Price>
      </PDA>
      <PDA name="8210">
        ...
      </PDA>
    </Producer>
  </PDACatalog>
Parts of the document:
- Prologue (optional): the "XML declaration" <?xml ... ?>
- "Root element" or "document element": <PDACatalog>
- Comment: <!-- NOKIA -->
- Start tag, e.g., <Weight>; end tag, e.g., </Weight>; element name, e.g., Weight
- Attribute, e.g., contract; attribute value, e.g., "yes"
- Text ("character data"), e.g., 141g
- "Element content" of <Producer>: subelements only
- "Empty element", e.g., <ProducerNo no="h1234"/>
- Subelement, e.g., <PDA> within <Producer>
- "Mixed content": text combined with subelements
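The following Python snippet sketches how an application sees this running example. It parses the document with the standard-library `xml.etree.ElementTree` module and walks the tree. The elided PDA "8210" is left out rather than guessed. The comment does not show up as a child node, because ElementTree's default parser discards comments.

```python
import xml.etree.ElementTree as ET

# PDACatalog1.XML from above; the elided PDA "8210" is omitted here.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<PDACatalog>
  <!-- NOKIA -->
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <PDA name="7110">
      <Weight>141g</Weight>
      <Price contract="yes">999</Price>
      <Price contract="no">4999</Price>
    </PDA>
  </Producer>
</PDACatalog>"""

root = ET.fromstring(DOC)   # the root ("document") element
print(root.tag)             # PDACatalog

for producer in root.iter("Producer"):
    # Attributes are exposed as a dict-like mapping on each element.
    print("Producer:", producer.get("name"))
    # The empty element carries its information only in an attribute.
    print("  ProducerNo:", producer.find("ProducerNo").get("no"))
    for pda in producer.findall("PDA"):
        prices = {p.get("contract"): p.text for p in pda.findall("Price")}
        print("  PDA", pda.get("name"), pda.findtext("Weight"), prices)
```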
XML Document 2/3: Elements and Attributes
- Element and attribute names must be valid "XML names":
  [ letter | '_' | ':' ] [ letter | '0'..'9' | '.' | '-' | '_' | ':' ]*
  - "letter": A-Z, a-z, and others such as ä, ê, ς
  - ':' is reserved for namespaces
  - No length restriction
  - Case-sensitive
- Empty elements can be written in long form or in short form:
  <ProducerNo no="h1234"></ProducerNo> or <ProducerNo no="h1234"/>
- Attribute values must be enclosed in quotation marks (single or double):
  <PDA name='8210'> or <PDA name="8210">
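The XML-name rule above can be approximated with a regular expression. This is a simplified sketch. The hypothetical `is_xml_name` helper treats any Unicode letter recognised by Python's `re` module as a "letter", whereas XML 1.0 defines the allowed character ranges explicitly.

```python
import re

# Simplified XML-name rule:
#   first character:      letter | '_' | ':'
#   following characters: letter | digit | '.' | '-' | '_' | ':'
# "letter" is approximated by [^\W\d_] (Unicode word characters minus
# digits and underscore); XML 1.0 lists its character ranges explicitly.
XML_NAME = re.compile(r"(?:[^\W\d_]|[_:])(?:[^\W\d_]|[0-9._:-])*")


def is_xml_name(candidate: str) -> bool:
    return XML_NAME.fullmatch(candidate) is not None


for name in ["PDACatalog", "_id", "edi:price", "8210", "-x", "Währung", "a b"]:
    print(name, is_xml_name(name))
```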
XML Document 3/3: Comments
- Can span multiple lines
- Allowed between the start tag and the end tag of an element, and before or after the root element
- Restrictions:
  - A comment within a tag is not allowed
  - Nesting of comments is not allowed
  - "--" within a comment is not allowed
  <!-- A comment may also contain <tagNames> or &entities; -->
  ...
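You can observe the comment rules with Python's `xml.etree.ElementTree` (Python 3.8 or later is needed for `insert_comments`). This is a sketch; the sample strings are invented.

```python
import xml.etree.ElementTree as ET

# Markup inside a comment is not interpreted: "<tagNames>" and "&entities;"
# are plain comment text, so the element ends up without children.
ok = ET.fromstring("<PDA><!-- may contain <tagNames> or &entities; --></PDA>")
print(len(ok))  # 0

# By default ElementTree drops comments; a TreeBuilder can keep them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
parser.feed("<PDA><!-- NOKIA --></PDA>")
kept = parser.close()
print(kept[0].tag is ET.Comment, repr(kept[0].text))  # True ' NOKIA '

# "--" inside a comment violates well-formedness.
try:
    ET.fromstring("<PDA><!-- a -- b --></PDA>")
    double_dash_rejected = False
except ET.ParseError:
    double_dash_rejected = True
print(double_dash_rejected)  # True
```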
DTD 1/8: Purpose and Characteristics
- A DTD defines vocabulary and grammar for a set of XML documents
- An XML document may reference only a single DTD ("document type declaration", DOCTYPE)
- The DTD must be referenced AFTER the prologue but BEFORE the root element
- A DTD does NOT define the root element of an XML document
- Rather, the root element is defined within the XML document itself, using the DOCTYPE declaration; it can be any element declared in the DTD
  <?xml version="1.0"?>
  <!DOCTYPE PDACatalog ...>
  <PDACatalog>
  ...
[Figure: PDACatalog1.XML references Catalog.DTD; the DOCTYPE declaration names the root element (definition), which the document then uses (usage)]
DTD 2/8: Incorporating DTDs into XML Documents (3 Alternatives)
1. External DTD, i.e., a dedicated file (*.dtd) identified by a URI ("external subset"):
   <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd">
2. Internal DTD, i.e., defined within the XML document ("internal subset"):
   <!DOCTYPE PDACatalog […]>
3. External & internal DTD, i.e., the internal subset complements the external one
Excursus, URL vs. URI:
- A URL (Uniform Resource Locator) identifies Internet resources by their location, using the Domain Name System (DNS)
- A URI (Uniform Resource Identifier) identifies arbitrary resources by their names (e.g., an ISBN) or by other properties of the resource
- Each URL is a valid URI
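A non-validating parser still records the document type declaration. In this sketch, Python's `xml.dom.minidom` exposes the root element name and the system identifier taken from the DOCTYPE. It does not load `Catalog.dtd` and does not validate against it.

```python
from xml.dom import minidom

# Document type declaration referencing the external DTD Catalog.dtd.
# minidom only records the DOCTYPE; the DTD file is neither read nor used.
DOC = """<?xml version="1.0"?>
<!DOCTYPE PDACatalog SYSTEM "Catalog.dtd">
<PDACatalog/>"""

dom = minidom.parseString(DOC)
print(dom.doctype.name)             # PDACatalog (root element named in DOCTYPE)
print(dom.doctype.systemId)         # Catalog.dtd
print(dom.documentElement.tagName)  # PDACatalog
```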
DTD 3/8: Example, Catalog.dtd
  <!-- Catalog DTD Version 1.0 -->
  <!ELEMENT PDACatalog (Producer*)>
  <!ELEMENT Producer (ProducerNo, PDA+)>
  <!ATTLIST Producer name CDATA #REQUIRED>
  <!ELEMENT ProducerNo EMPTY>
  <!ATTLIST ProducerNo no ID #REQUIRED>
  <!ELEMENT PDA (Weight, Price+)>
  <!ATTLIST PDA name CDATA #REQUIRED>
  <!ELEMENT Weight (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
  <!ATTLIST Price contract (yes|no) "no">
[Figure: the DTD as a UML class diagram, with XML elements as classes, XML attributes as class attributes, and part-of relationships as compositions: PDACatalog consists of * Producer (name); each Producer consists of exactly 1 ProducerNo (no) and 1..* PDA (name); each PDA consists of exactly 1 Weight and 1..* Price (contract). Legend: 1 = exactly once, 1..* = once or several times, * = zero or several times]
DTD 4/8: Element Declaration
  <!ELEMENT elementName (contentModel)>
- Sequence: <!ELEMENT Producer (ProducerNo, PDA+)>
- Alternative: <!ELEMENT Battery (LiIo | NiMh | NiCd)>
- Cardinality:
  - Optional (zero or once): <!ELEMENT PDA (Comment?)>
  - Zero or several times: <!ELEMENT PDACatalog (Producer*)>
  - Once or several times: <!ELEMENT Producer (PDA+)>
- Content models can be nested by means of parentheses:
  <!ELEMENT div1 (head, (p | list | note)*, div2*)>
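A content model is essentially a regular expression over child element names. The sequence, alternative, and cardinality operators above can therefore be sketched by translating a model into a Python regular expression. The helpers `model_to_regex` and `matches_model` are hypothetical and handle only element-content models; `#PCDATA`, `EMPTY`, and `ANY` are not covered, and real validating parsers typically use automata instead.

```python
import re
from typing import Sequence

# Tokens of an element-content model: names, parentheses, and operators.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.:-]*|[(),|?*+])")


def model_to_regex(model: str) -> str:
    """Translate a DTD content model such as "(ProducerNo, PDA+)" to a regex.

    Each child element is encoded as "Name," so that names cannot run into
    each other; the sequence operator ',' therefore needs no counterpart.
    """
    model = model.strip()
    parts, pos = [], 0
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"cannot tokenize {model[pos:]!r}")
        tok, pos = m.group(1), m.end()
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok == ",":
            continue                     # sequence = concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)            # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)},)")
    return "".join(parts)


def matches_model(model: str, children: Sequence[str]) -> bool:
    """Check a sequence of child element names against a content model."""
    encoded = "".join(f"{name}," for name in children)
    return re.fullmatch(model_to_regex(model), encoded) is not None


print(matches_model("(ProducerNo, PDA+)", ["ProducerNo", "PDA", "PDA"]))  # True
print(matches_model("(ProducerNo, PDA+)", ["PDA", "ProducerNo"]))         # False
print(matches_model("(LiIo | NiMh | NiCd)", ["NiMh"]))                    # True
```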
DTD 5/8: Element Declaration
- Empty element: may contain attributes, but neither text nor subelements
  <!ELEMENT ProducerNo EMPTY>
- Element content: contains subelements and optionally attributes, but no text
  <!ELEMENT PDACatalog (Producer*)>
- Mixed content: contains text and optionally subelements or attributes
  <!ELEMENT Price (#PCDATA)>
  <!ELEMENT Price (#PCDATA | Category | Discount)*>
- Element with arbitrary content: the content is not specified exactly in the DTD; elements used within it must be declared anyway
  <!ELEMENT Comment ANY>
DTD 6/8: Attribute Declaration
  <!ATTLIST elementName
            attributeName1 type default
            attributeName2 type default
            ...>
- Attribute names must be unique within an element
- Default specifications:
  - Mandatory ("NOT NULL"): #REQUIRED
  - Optional value: #IMPLIED
  - Default value: [#FIXED] "value"
DTD 7/8: Attribute Declaration, 10 Types
- CDATA: string
  <!ATTLIST Producer name CDATA #REQUIRED>
- ID, IDREF(S):
  - ID ensures the uniqueness of attribute values within a document
  - Only one attribute of type ID is allowed per element
  - IDREF is a reference to an attribute of type ID (IDREFS: a whitespace-separated list of such references)
  - "Referential integrity" is checked by a validating XML processor, but untyped: an IDREF may point to the ID of any element
  - Values of ID and IDREF(S) attributes must be valid XML names, i.e., they must not start with a digit
  <!ATTLIST Example
            identity  ID    #IMPLIED
            reference IDREF #IMPLIED>
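The checks a validating processor performs for ID and IDREF can be sketched in Python. The standard library does not read attribute types from a DTD, so this sketch assumes that the attributes to check are `identity` (ID) and `reference` (IDREF), as in the `Example` declaration above. `check_ids` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

ID_ATTR = "identity"      # declared as ID in the DTD (assumption of this sketch)
IDREF_ATTR = "reference"  # declared as IDREF in the DTD (assumption of this sketch)


def check_ids(root: ET.Element) -> List[str]:
    """Return the ID/IDREF violations found in the tree below `root`."""
    errors = []
    ids = set()
    for el in root.iter():
        value = el.get(ID_ATTR)
        if value is not None:
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    for el in root.iter():
        ref = el.get(IDREF_ATTR)
        # "Untyped": any existing ID is an acceptable target.
        if ref is not None and ref not in ids:
            errors.append(f"dangling IDREF {ref!r}")
    return errors


DOC = """<Examples>
  <Example identity="p7110"/>
  <Example identity="p8210" reference="p7110"/>
  <Example identity="p7110" reference="p9999"/>
</Examples>"""

print(check_ids(ET.fromstring(DOC)))
# ["duplicate ID 'p7110'", "dangling IDREF 'p9999'"]
```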
DTD 8/8: Attribute Declaration, 10 Types
- Enumeration type: a pre-defined set of values consisting of XML name tokens
  <!ATTLIST Price contract (yes|no) "no">
- ENTITY, ENTITIES: the attribute value is the name of a declared non-parsed entity
  <!ATTLIST Image filename ENTITY #REQUIRED>
- NMTOKEN(S): "XML name tokens" are an extended form of XML names; in addition, they can start with "0..9", "." and "-"
  <!ATTLIST journal year NMTOKEN #REQUIRED>
- NOTATION: the attribute value is the name of a declared notation (seldom used)
  <!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>
Entities 1/9: Overview
Entities are referenceable, named parts of XML documents (plain text, markup, or other arbitrary formats) or of a DTD
- Purpose: character replacement, macros, modularization
- Processing: references are expanded during parsing
Classification:
- General entities: used in XML documents
  - Pre-defined: replacement of XML-specific characters
  - Unicode: replacement of non-ASCII characters
  - User-defined: replacement of document parts
    - Internal (embedded)
    - External (file): parsed or non-parsed
- Parameter entities: used in DTDs
  - Internal or external
Entities 2/9: Pre-defined Entities
- Purpose: representation of XML-specific characters, e.g., < and > ("escaping")
- 5 pre-defined entities:
  &amp;   & (ampersand)
  &lt;    < (less than)
  &gt;    > (greater than)
  &apos;  ' (apostrophe)
  &quot;  " (quotation mark)
- Example: <formular>x &lt; y</formular>
- Usage: as element value or attribute value; interpreted as plain text, NOT as markup
- Alternative: CDATA section
  - Example: <formular>x <![CDATA[<]]> y</formular>
  - Within a CDATA section, only its end (']]>') is recognized
  - CDATA sections cannot be nested
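A short Python check shows that the entity reference and the CDATA section yield the same character data, and that serializers escape the special characters automatically. It uses only the standard library; `xml.sax.saxutils.escape` replaces `&`, `<`, and `>`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Entity reference and CDATA section produce the same character data.
via_entity = ET.fromstring("<formular>x &lt; y</formular>").text
via_cdata = ET.fromstring("<formular>x <![CDATA[<]]> y</formular>").text
print(via_entity == via_cdata == "x < y")  # True

# When building XML text by hand, escape() replaces &, < and >.
print(escape("AT&T <b>"))  # AT&amp;T &lt;b&gt;

# ElementTree escapes automatically when serializing.
el = ET.Element("formular")
el.text = "x < y"
print(ET.tostring(el, encoding="unicode"))  # <formular>x &lt; y</formular>
```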
Entities 3/9: Unicode ("Character Encoding") Entities
- Purpose: representation of characters not available on the keyboard (http://www.unicode.org/)
- Unicode classifies characters into letters, numbers, punctuation, symbols (general, technical, mathematical), etc.
  - Unique assignment of characters to numbers
  - Supports 25 living languages (Cyrillic, Hebrew, Hiragana, ...)
  - All in all approx. 50,000 different characters
- Usage: as element value or attribute value; arbitrary Unicode characters are referenced via their numbers (decimal or hexadecimal)
  &#251; (decimal), &#xFB; (hexadecimal), and û all represent the same character
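The parser resolves character references, and a serializer restricted to ASCII falls back to them. Here is a minimal Python sketch with `xml.etree.ElementTree`; the element name `town` is arbitrary.

```python
import xml.etree.ElementTree as ET

# Decimal reference, hexadecimal reference, and the literal character.
text = ET.fromstring("<town>&#251; &#xFB; \u00fb</town>").text
print(text)                              # û û û
print({ord(ch) for ch in text.split()})  # {251}

# Serializing to plain ASCII turns non-ASCII characters back into references.
el = ET.Element("town")
el.text = "\u00fb"
ascii_xml = ET.tostring(el, encoding="us-ascii")
print(ascii_xml)                         # b'<town>&#251;</town>'
```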
Entities 4/9: User-Defined Internal Entities
- Text or well-formed markup is associated with a name
- Declaration within the DTD:
  <!ENTITY entityName "replacementText or Markup">
- Usage:
  &entityName;
  - As element value or attribute value in the XML document
  - In entities themselves, but cyclic references are forbidden
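Expat, the parser behind Python's `xml.etree.ElementTree`, expands internal entities declared in the internal DTD subset, both in element content and in attribute values. It also rejects cyclic references. The `href` attribute and the cyclic entities `x` and `y` are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Internal entity declared in the internal DTD subset and used twice.
DOC = """<!DOCTYPE ProducerInfo [
  <!ENTITY linkNokia "http://www.nokia.de/8210">
]>
<ProducerInfo href="&linkNokia;">&linkNokia;</ProducerInfo>"""

root = ET.fromstring(DOC)
print(root.text)         # http://www.nokia.de/8210
print(root.get("href"))  # http://www.nokia.de/8210

# Cyclic references between entities are rejected by the parser.
CYCLIC = """<!DOCTYPE a [
  <!ENTITY x "&y;">
  <!ENTITY y "&x;">
]>
<a>&x;</a>"""
try:
    ET.fromstring(CYCLIC)
    cycle_rejected = False
except ET.ParseError:
    cycle_rejected = True
print(cycle_rejected)    # True
```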
Entities 5/9: User-Defined External Parsed Entities
- Purpose: decomposition of the XML document (similar to the SSI, Server Side Include, mechanism), because of the document's size or for reuse
- Declaration within the DTD:
  <!ENTITY entityName SYSTEM "URI">
- Characteristics: in principle well-formed, but may contain multiple root elements; a reference to a DTD is not allowed
- Usage:
  - Syntax analogous to internal entities
  - As element values of the XML document and within entities themselves
  - Cyclic references are forbidden
  - NOT within attribute values
Entities 6/9: User-Defined External Non-Parsed Entities
- Purpose: references to files in arbitrary formats, e.g., ASCII, non-well-formed XML, GIF, JPEG, QuickTime movies
  <!ENTITY entityName SYSTEM "URI" NDATA formatName>
  <!NOTATION formatName SYSTEM "URI">
- NDATA defines a "non-parsed" entity and specifies an arbitrary file format; a NOTATION declaration is necessary to identify a corresponding application (via a URI) that is able to process files in this format
- Usage:
  - Only as the value of an attribute of type ENTITY
  - Syntax: the entity name within quotation marks (note: NO &...;)
  - The processor only informs the application that a non-parsed entity exists at a certain location; there is no expansion!
- (More expressive) alternative: W3C's XLink standard
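Python's low-level `xml.parsers.expat` interface shows the "processor only informs the application" behaviour directly. The declarations of the notation and of the non-parsed entity are reported to handlers, and the attribute value arrives as the plain entity name. The declarations reuse the `bildNokia`/`jpeg` example from these slides; the handler functions and dictionaries belong to this sketch only.

```python
from xml.parsers import expat

DOC = """<!DOCTYPE PDA [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY bildNokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>
  <!ATTLIST Image filename ENTITY #REQUIRED>
]>
<PDA><Image filename="bildNokia"/></PDA>"""

notations = {}  # notation name -> system identifier
unparsed = {}   # entity name -> (system identifier, notation name)
images = []     # values of Image/@filename as seen by the application


def on_notation(name, base, system_id, public_id):
    notations[name] = system_id


def on_entity(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
    if notation_name is not None:  # NDATA => non-parsed entity, never expanded
        unparsed[name] = (system_id, notation_name)


def on_start(tag, attrs):
    if tag == "Image":
        images.append(attrs["filename"])  # just the entity name


parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.EntityDeclHandler = on_entity
parser.StartElementHandler = on_start
parser.Parse(DOC, True)

# The application itself resolves the name via the reported declarations.
for name in images:
    system_id, notation = unparsed[name]
    print(name, system_id, notation, notations[notation])
# bildNokia /pictures/8210.jpg jpeg image/jpeg
```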
Entities 7/9: User-Defined Entities, Example
  <?xml version="1.0"?>
  <!DOCTYPE PDACatalog SYSTEM "Catalog.dtd" [
    <!ENTITY linkNokia "http://www.nokia.de/8210">
    <!ENTITY address "<town>Linz</town>">
    <!ENTITY features SYSTEM "feat8210.XML">
    <!ENTITY bildNokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>
    <!NOTATION jpeg SYSTEM "image/jpeg">
    …
    <!ATTLIST Image filename ENTITY #REQUIRED>
  ]>
  …
  <PDA name="8210">
    <Picture><Image filename="bildNokia"/></Picture>
    <ProducerInfo>&linkNokia;</ProducerInfo>
    …
    &features; &address;
  </PDA>
  …
- Declaration (DOCTYPE) vs. usage (document body)
- linkNokia and address are internal; features is external and parsed; bildNokia is external and non-parsed
- &linkNokia;, &features; and &address; are used as element values; bildNokia is used as an attribute value
Entities 8/9: Parameter Entities
- Purpose: modularization of DTDs
- Syntactic difference from general entities: "% " (with a blank) in the declaration, "%" (without a blank) in the usage
- Definition of:
  - name and content model of elements
  - attribute declarations
Internal:
  <!ENTITY % Battery "(type, capacity)">
  <!ELEMENT PDABatt %Battery;>
  <!ELEMENT camcorderBatt %Battery;>
External:
  <!ENTITY % linkNokia SYSTEM "http://nokia.de">
  %linkNokia;
Entities 9/9: Parameter Entities, Overriding
External DTD:
  <!ENTITY % residental_content "address,rooms">
Internal DTD of an XML document:
  <!ENTITY % residental_content "address,rooms,baths">
- A parameter entity defined within an external DTD can be arbitrarily overridden within the internal DTD of an XML document
- This works because the internal subset is read before the external subset, and the first declaration of an entity is binding
- This allows the external DTD to be adapted to the requirements of individual XML documents without changing the external DTD itself
- Thus, the parameter entity serves as a kind of "customization hook"
Outline
- Introduction
- XML 1.0
- Namespaces
- XML Schema
Namespaces 1/5
- An XML namespace (NS) allows elements and attributes to be identified uniquely and globally
  - W3C REC "Namespaces in XML", 14th Jan. 1999 (13 pages)
- For this, the elements and attributes of a domain (e.g., MathML) are assigned to one or more NS
  - XSL, e.g., uses different namespaces for XSLT and XSL-FO
- An NS is represented by a URI
  - The URI need not refer directly to the corresponding vocabulary
  - Thus, it provides a level of indirection that decouples the location of the vocabulary from its unique identifier, the URI
- When used, the associated elements and attributes have to be qualified by means of this URI, which makes them globally unique
- This allows the reuse and especially the combination ("mixture") of different vocabularies
Namespaces 2/5: NS with Prefix vs. Default NS
- BUT: URIs cannot be used for direct qualification, since URIs normally contain characters that are not allowed in valid XML names (e.g., "/", "&")
- Instead, user-defined prefixes have to be used
- One or more NS are declared with the pre-defined attribute xmlns (xmlns:prefix="URI")
- This attribute can be specified on any element
- The element on which the NS is declared, as well as its direct and indirect subelements and attributes, can be qualified with the NS ("NS inheritance")
- Default NS: also declared via the pre-defined attribute xmlns, BUT only one per element, and without a prefix
  - Unqualified subelements are automatically associated with the default NS; attributes are NOT
  - Can be overridden within subelements
Namespaces 3/5: Declaration and Usage
  ...
  <edi:HC xmlns:edi='http://ecommerce.org/schema'
          xmlns='http://www.mobildev.com/schema'>
    <model name="8210">
      <edi:price edi:units='Euro'>32.18</edi:price>
      <price währung='USD'>25.16</price>
      ...
    </model>
    ...
  </edi:HC>
- xmlns is the pre-defined attribute for NS declaration; in xmlns:edi, edi is the (optional) NS prefix, and the attribute value is the URI of the NS
- xmlns without a prefix declares the default NS
- The NS of the element edi:price is http://ecommerce.org/schema
- The NS of the elements model and price is the default NS http://www.mobildev.com/schema
- The attributes name and währung have NO NS associated with them
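With namespace processing, a parser replaces every prefix by its URI. Python's `xml.etree.ElementTree` writes qualified names as `{URI}localname`, which makes the rules above visible. The prefixes used in queries (`m` and `edi` below) can be chosen freely; only the URIs have to match. Here is a sketch using the example document:

```python
import xml.etree.ElementTree as ET

DOC = """<edi:HC xmlns:edi='http://ecommerce.org/schema'
        xmlns='http://www.mobildev.com/schema'>
  <model name="8210">
    <edi:price edi:units='Euro'>32.18</edi:price>
    <price währung='USD'>25.16</price>
  </model>
</edi:HC>"""

root = ET.fromstring(DOC)

# Every name is expanded to "{namespace-URI}local-name"; the xmlns
# declarations themselves do not appear among the attributes.
tags = [el.tag for el in root.iter()]
for el in root.iter():
    print(el.tag, el.attrib)

# Prefixes in queries are local shortcuts: only the URIs must match.
ns = {"m": "http://www.mobildev.com/schema", "edi": "http://ecommerce.org/schema"}
print(root.find("m:model/edi:price", ns).text)  # 32.18
print(root.find("m:model/m:price", ns).text)    # 25.16
```

The unprefixed attributes `name` and `währung` keep their plain names. `edi:units` becomes `{http://ecommerce.org/schema}units`.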
Namespaces 4/5: ... and DTDs
- NS are in principle independent of DTDs; they can be used in documents with or without a DTD
- BUT: all elements and attributes that are qualified in the XML document must also be declared with exactly these qualified names in the DTD
  - Huge overhead, since DTDs are not aware of NS:
    <edi:HC> ...       <!ELEMENT edi:HC (....)>
    <edi:price> ...    <!ELEMENT edi:price (#PCDATA)>
- What can be done is to specify a default NS within the DTD:
  <!ATTLIST edi:HC xmlns CDATA #FIXED 'http://www.mobildev.com/schema'>
Namespaces 5/5: Exemplary NS URIs
- RDF: http://www.w3.org/1999/02/22-rdf-syntax-ns#, http://www.w3.org/2000/01/rdf-schema#
- MathML: http://www.w3.org/1998/Math/MathML
- XHTML: http://www.w3.org/1999/xhtml
- SMIL: http://www.w3.org/TR/REC-smil
- XSL: http://www.w3.org/1999/XSL/Transform, http://www.w3.org/1999/XSL/Format
Outline
- Introduction
- XML 1.0
- Namespaces
- XML Schema
  - Introduction
  - Elements and Attributes
  - Pre-defined Datatypes
  - User-defined Datatypes
  - Keys
  - Schema Composition
  - Schema Modeling Styles
  - Comparison DTD vs. XML Schema
Introduction: DTD versus XML Schema 1/2
XML Schema:
- Definition of the structure of XML documents
- W3C REC May 2001, approx. 420 pages; W3C REC 2nd edition October 2004
Drawbacks of DTDs:
- Own, non-XML syntax
- Few datatypes, in fact only one: string
- Global definition of elements
- Parameter entities needed for modularization and overriding
- ID, IDREF(S): severe restrictions
Advantages of XML Schema:
- XML as syntax
- Numerous pre-defined datatypes
- User-defined simple and complex datatypes
- Inheritance
- Keys, references: a flexible concept
Introduction: DTD versus XML Schema 2/2
Catalog.xsd:
  <?xml version="1.0"?>
  <schema ...>
    <simpleType name="producerNoType"> ...
    <element name="PDACatalog">
      <complexType>
        <sequence>
          <element name="Producer" minOccurs="0" maxOccurs="unbounded">
            <complexType>
              <sequence>
                <element name="ProducerNo"
                         type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
                <element name="PDA" minOccurs="1" maxOccurs="unbounded">
                  <complexType>
                    <sequence>
                      <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
                      <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
                    </sequence> ...
  </schema>
Catalog.dtd:
  ...
  <!ELEMENT PDACatalog (Producer*)>
  <!ELEMENT Producer (ProducerNo, PDA+)>
  <!ELEMENT PDA (Weight, Battery)>
  <!ELEMENT Weight (#PCDATA)>
  <!ELEMENT Battery (#PCDATA)>
  ...
Introduction: Declaration of Namespaces in the Schema
- Namespace for the own vocabulary: the namespace (NS) of the vocabulary to be defined can be declared by means of the attribute targetNamespace (optional!)
- NS of the XML Schema standard vocabulary: declaration is obligatory! Additional NS (i.e., vocabularies) can be incorporated
- A single NS can be defined as the default NS: either the own NS, the XML Schema NS, or another NS; for all other NS used, a prefix is obligatory
  <?xml version="1.0"?>
  <schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
          xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
          xmlns="http://www.w3.org/2001/XMLSchema"
          attributeFormDefault="qualified"
          elementFormDefault="qualified">
  ...
Introduction: Usage of NS in the XML Document
- The schema of an XML document is specified within the root element via the attribute xsi:schemaLocation
  - 1st part: targetNamespace of the schema
  - 2nd part: location of the schema document
- For a schema without a target namespace: xsi:noNamespaceSchemaLocation="directPathToXSD_File"
Catalog.xsd:
  <?xml version="1.0"?>
  <schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
          xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
          xmlns="http://www.w3.org/2001/XMLSchema"
          attributeFormDefault="qualified"
          elementFormDefault="qualified">
  ...
Catalog1.xml:
  <?xml version="1.0"?>
  <PDACatalog xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"
              xmlns="http://www.ifs.uni-linz.ac.at/hc"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  ...
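Python's standard library cannot validate against an XML Schema, but reading the `xsi:schemaLocation` hint is simple: its value is a whitespace-separated list of (namespace, location) pairs. The helper `schema_locations` below is a hypothetical sketch. A schema-aware validator would then load `Catalog.xsd` and check the document.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<PDACatalog xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"
    xmlns="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>"""


def schema_locations(root):
    """Map each target namespace to the schema document given for it."""
    value = root.get(f"{{{XSI}}}schemaLocation", "")
    tokens = value.split()
    # The attribute holds whitespace-separated (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


root = ET.fromstring(DOC)
print(root.tag)                # {http://www.ifs.uni-linz.ac.at/hc}PDACatalog
print(schema_locations(root))  # {'http://www.ifs.uni-linz.ac.at/hc': 'Catalog.xsd'}
```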
Elements and Attributes 1/3: Global / Local Definition
- Global definition: direct subelement of schema. NOTE: the root element of the XML document must be defined as a global element!
- Local definition: definition at an arbitrary nesting level
- Analogously for datatypes!
Element:
  <element name="name" type="type" minOccurs="int" maxOccurs="int|unbounded" ... />
  - type: simple or complex type
  - minOccurs, maxOccurs: cardinality, i.e., lower/upper bounds (only in "local" elements)
Attribute:
  <attribute name="name" type="type" use="how-its-used" default/fixed="value" ... />
  - type: simple type
  - use: required, optional, or prohibited (only in "local" attributes)
  - default/fixed: only relevant if "use" is not defined
Global or Local Datatypes
Reference to an existing Element or Attribute
<element name="name" minOccurs="int" maxOccurs="int|unbounded"...><complexType>…</complexType>
</element>
<element ref="name" minOccurs="int" maxOccurs="int|unbounded".../>
<attribute name="name" use="how-its-used" default/fixed="value"...><simpleType>...</simpleType>
</attribute>
<attribute ref="name" use="how-its-used" default/fixed="value".../>
Elements and Attributes 2/3: Global / Local Datatypes and References
<schema ...>
  <!-- Global element, local datatype -->
  <element name="Producer">
    <complexType>
      <sequence>
        <!-- Local element, global datatype -->
        <element name="ProducerNo" type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
        <!-- Reference to a global element -->
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
      <!-- Local attribute, pre-defined datatype -->
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
  <!-- Global element, local datatype -->
  <element name="PDA">
    <complexType>
      <sequence>
        <!-- Local elements, pre-defined datatype -->
        <element name="Weight" type="string"/>
        <element name="Battery" type="string"/>
      </sequence>
    </complexType>
  </element>
  <simpleType name="producerNoType"> …
Elements and Attributes 3/3: Summarizing Example – Global/Local
Orthogonality of concepts: elements and datatypes can each be defined globally or locally, in any combination.
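Type hierarchy (W3C REC, 28th Oct. 2004); "A > B" means that B is derived from A:
anyType > all complex types
anyType > anySimpleType
Primitive (atomic) types, derived from anySimpleType: string, boolean, float, double, decimal, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
Derived types:
string > normalizedString > token > language, NMTOKEN, Name
Name > NCName > ID, IDREF, ENTITY
NMTOKEN > NMTOKENS, IDREF > IDREFS, ENTITY > ENTITIES (list types)
decimal > integer > nonPositiveInteger > negativeInteger
integer > long > int > short > byte
integer > nonNegativeInteger > unsignedLong > unsignedInt > unsignedShort > unsignedByte
nonNegativeInteger > positiveInteger
Pre-Defined Datatypes 1/4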
Pre-Defined Datatypes 2/4: String Datatypes
Pre-defined primitive types (derived from anySimpleType): string, hexBinary, base64Binary, anyURI, QName, NOTATION
Pre-defined derived types: normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES
string: string datatype without whitespace replacement
normalizedString: normalized string with whitespace replacement; each tab, linefeed and carriage return is replaced by a blank
token: "tokenized" string; all whitespace characters are replaced by blanks, all leading and trailing blanks are deleted, and multiple consecutive blanks are replaced by a single one
language: standardized language codes (e.g., en, en-US, de, de-DE)
NMTOKEN: name token, i.e., a string without blanks (e.g., "CMS", "234234")
Name: XML name; must start with a letter, "_" or ":" (e.g., "CMS", "_1")
NCName: name without namespace prefix (i.e., without ":")
hexBinary, base64Binary: binary datatypes encoded as strings
QName: qualified name; supports the usage of names with NS prefix
NMTOKEN(S), ID, IDREF(S), ENTITY, ENTITIES, NOTATION: backward compatibility to DTDs; for compatibility reasons, they should be used only as types for attributes
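A few of the string datatypes can be probed with the JDK validator. This sketch assumes a hypothetical, namespace-free schema with one element per datatype; Topic and RawTopic carry the same enumeration value but differ in their base type, which shows the whitespace handling of token versus string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class StringTypesDemo {
    // Hypothetical schema: one global element per string datatype, no target namespace.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
        + "<element name='Code' type='NMTOKEN'/>"
        + "<element name='Id' type='Name'/>"
        + "<element name='Lang' type='language'/>"
        // Same enumeration value: on token (whitespace collapsed) vs. on string (preserved)
        + "<element name='Topic'><simpleType><restriction base='token'>"
        + "<enumeration value='XML Schema'/></restriction></simpleType></element>"
        + "<element name='RawTopic'><simpleType><restriction base='string'>"
        + "<enumeration value='XML Schema'/></restriction></simpleType></element>"
        + "</schema>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<Code>CMS</Code>", "<Code>234234</Code>", "<Code>C MS</Code>",
            "<Id>_1</Id>", "<Id>-1</Id>",
            "<Lang>de-DE</Lang>", "<Lang>de_DE</Lang>",
            "<Topic>  XML   Schema </Topic>", "<RawTopic>  XML   Schema </RawTopic>"
        };
        for (String doc : docs) {
            System.out.println(isValid(doc) + "\t" + doc);
        }
    }
}
```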
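Pre-defined Datatypes 3/4: Numerical Datatypes
Pre-defined primitive types (derived from anySimpleType): float, double, decimal, boolean
Pre-defined derived types: integer, nonPositiveInteger, negativeInteger, nonNegativeInteger, positiveInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte
float, double: floating point numbers with single (32 bit) and double (64 bit) precision
decimal: decimal numbers; decimal separator ".", leading "+" or "-" possible
long, int, short, byte: signed integers of 64, 32, 16 or 8 bit
unsignedLong, unsignedInt, unsignedShort, unsignedByte: unsigned integers of 64, 32, 16 or 8 bit
boolean: true, false, 1, 0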
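Similarly for the numerical datatypes: this sketch assumes a hypothetical, namespace-free schema with one element per type and shows the value ranges of byte and unsignedByte, the lexical form of decimal, the four boolean literals and the special float value INF.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class NumericTypesDemo {
    // Hypothetical schema: one global element per numerical datatype.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
        + "<element name='Byte' type='byte'/>"          // 8 bit, signed: -128..127
        + "<element name='UByte' type='unsignedByte'/>" // 8 bit, unsigned: 0..255
        + "<element name='Dec' type='decimal'/>"
        + "<element name='Flag' type='boolean'/>"
        + "<element name='Real' type='float'/>"
        + "</schema>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<Byte>127</Byte>", "<Byte>128</Byte>",
            "<UByte>255</UByte>", "<UByte>-1</UByte>",
            "<Dec>-12.50</Dec>", "<Dec>1,5</Dec>",
            "<Flag>1</Flag>", "<Flag>yes</Flag>",
            "<Real>1.5E3</Real>", "<Real>INF</Real>"
        };
        for (String doc : docs) {
            System.out.println(isValid(doc) + "\t" + doc);
        }
    }
}
```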
Pre-defined Datatypes 4/4: Date and Time Datatypes
All are primitive types derived from anySimpleType:
dateTime: "CCYY-MM-DDThh:mm:ss"
date: "CCYY-MM-DD"
gYearMonth: "CCYY-MM"
gYear: "CCYY"
gMonthDay: "--MM-DD" (day of the year)
gDay: "---DD" (day of the month)
gMonth: "--MM" (month of the year)
time: "hh:mm:ss"
duration: "PnYnMnDTnHnMnS"
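The lexical formats listed above can be checked against the JDK validator. A minimal sketch, assuming a hypothetical, namespace-free schema with one element per date/time type; note that the validator also rejects out-of-range values such as month 13, not only malformed strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class DateTimeTypesDemo {
    // Hypothetical schema: one global element per date/time datatype.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
        + "<element name='Date' type='date'/>"
        + "<element name='Stamp' type='dateTime'/>"
        + "<element name='MonthDay' type='gMonthDay'/>"
        + "<element name='Day' type='gDay'/>"
        + "<element name='Period' type='duration'/>"
        + "</schema>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<Date>2011-03-15</Date>", "<Date>15.03.2011</Date>", "<Date>2011-13-01</Date>",
            "<Stamp>2011-03-15T14:30:00</Stamp>", "<Stamp>2011-03-15 14:30:00</Stamp>",
            "<MonthDay>--12-24</MonthDay>", "<Day>---05</Day>", "<Day>05</Day>",
            "<Period>P1Y2M3DT4H5M6S</Period>", "<Period>1Y</Period>"
        };
        for (String doc : docs) {
            System.out.println(isValid(doc) + "\t" + doc);
        }
    }
}
```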
User-defined Datatypes: Alternatives
Should the type contain elements or attributes?
  no: unstructured content – <simpleType>
      derivation: <restriction>, <union> or <list>
  yes: structured content – <complexType>
      derivation: <restriction>, <extension>
      nesting: <sequence>, <all>, <choice>
      empty / mixed content
      Should the type contain elements?
        yes: attributes & elements – <complexContent>
        no: attributes – <simpleContent>
Each type can be named or anonymous.
Note: <complexContent> is only necessary in case of derivation from a user-defined type.
User-defined Datatypes: Alternatives – Examples
Simple, pre-defined, no derivation:
xsd:integer
Simple, user-defined, named, derived:
<xsd:simpleType name="longitudeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="-180"/>
    <xsd:maxInclusive value="180"/>
  </xsd:restriction>
</xsd:simpleType>
Complex, user-defined, anonymous, no derivation:
<xsd:complexType>
  <xsd:sequence>
    ....
  </xsd:sequence>
</xsd:complexType>
Complex, user-defined, named, derived:
<xsd:complexType name="BookTypeWithID">
  <xsd:complexContent>
    <xsd:extension base="BookType">
      <xsd:attribute name="ID" type="xsd:token"/>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
Restriction of a pre-defined datatype – <restriction>
Union of pre-defined datatypes (extension of the value space) – <union>
  Values must correspond to at least one of the combined datatypes
List of values of one pre-defined datatype – <list>
  The item type may be an atomic or a union datatype, but not again a list datatype
User-defined Datatypes: Derived Simple Datatypes – <simpleType>
Alternative definition possibilities:
  Referencing an existing datatype via the attribute base
  Local definition from scratch by using simpleType as subelement of the restriction element
12 possible restrictions (facets), depending on the base datatype: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace, totalDigits, fractionDigits
<simpleType name="batteryType">
  <restriction base="string">
    <enumeration value="NiMh"/>
    <enumeration value="NiCd"/>
    <enumeration value="LiIo"/>
  </restriction>
</simpleType>
<element name="Battery" type="hc:batteryType"/>
XML document:
<Battery>NiCd</Battery>
User-defined Datatypes: Derived Simple Datatypes <simpleType> – restriction
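A sketch of the batteryType restriction in action, assuming the hc target namespace used throughout the slides. The longitudeType from the examples slide is added as a second, range-based restriction (minInclusive/maxInclusive), adapted here to the unprefixed XSD vocabulary; the Longitude element and the helper doc are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class RestrictionDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        // Enumeration facet: only the three listed strings are allowed (case-sensitive)
        + "<simpleType name='batteryType'><restriction base='string'>"
        + "<enumeration value='NiMh'/><enumeration value='NiCd'/><enumeration value='LiIo'/>"
        + "</restriction></simpleType>"
        // Range facets on integer (hypothetical Longitude element)
        + "<simpleType name='longitudeType'><restriction base='integer'>"
        + "<minInclusive value='-180'/><maxInclusive value='180'/>"
        + "</restriction></simpleType>"
        + "<element name='Battery' type='hc:batteryType'/>"
        + "<element name='Longitude' type='hc:longitudeType'/>"
        + "</schema>";

    /** Builds a one-element instance document in the hc namespace. */
    static String doc(String name, String value) {
        return "<" + name + " xmlns='" + HC + "'>" + value + "</" + name + ">";
    }

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(doc("Battery", "NiCd")));     // true
        System.out.println(isValid(doc("Battery", "nicd")));     // false: enumeration is case-sensitive
        System.out.println(isValid(doc("Longitude", "180")));    // true
        System.out.println(isValid(doc("Longitude", "-181")));   // false: below minInclusive
    }
}
```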
User-defined Datatypes: Derived Simple Datatypes <simpleType> – restriction
Restrictions using a "pattern" element
Restrictions of the lexical values
Simple regular expressions:
  Normal characters: "C&A"
  Categories of characters: "\p{IsBasicLatin}"
  Sets of characters: "[\p{IsBasicLatin}-[\d]]"
  Quantifiers: "[a-zA-Z]{1,8}"
  Parentheses: "(XML(\s+|-))?Schema"
Combinations of these expressions
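Two of the pattern examples above, validated with the JDK. A sketch assuming a hypothetical, namespace-free schema with elements ShortName and Title; it also shows that XSD patterns are implicitly anchored, i.e., they must match the whole value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class PatternDemo {
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'>"
        // Quantifier: 1 to 8 ASCII letters
        + "<element name='ShortName'><simpleType><restriction base='string'>"
        + "<pattern value='[a-zA-Z]{1,8}'/></restriction></simpleType></element>"
        // Parentheses and alternatives: optional "XML" + whitespace or "-", then "Schema"
        + "<element name='Title'><simpleType><restriction base='string'>"
        + "<pattern value='(XML(\\s+|-))?Schema'/></restriction></simpleType></element>"
        + "</schema>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<ShortName>Nokia</ShortName>", "<ShortName>Nokia7110</ShortName>",
            "<Title>Schema</Title>", "<Title>XML Schema</Title>", "<Title>XML-Schema</Title>",
            "<Title>XMLSchema</Title>", "<Title>XML Schema 1.1</Title>"
        };
        for (String doc : docs) {
            System.out.println(isValid(doc) + "\t" + doc);
        }
    }
}
```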
Alternative definition possibilities:
  Referencing an existing datatype via attributes (memberTypes or itemType)
  Local definition from scratch by using simpleType as subelement of the union or list element
<simpleType name="PDAFeatureType">
  <union memberTypes="hc:PDAColor hc:PDARobustness"/>
</simpleType>
<simpleType name="PDAFeatureListType">
  <list itemType="hc:PDAFeatureType"/>
</simpleType>
<element name="PDAFeatureList" type="hc:PDAFeatureListType"/>
XML document:
<PDAFeatureList>blue waterproof shockproof</PDAFeatureList>
User-defined Datatypes: Derived Simple Datatypes <simpleType> – union/list
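The slides do not define the member types PDAColor and PDARobustness, so this sketch assumes hypothetical enumerations for them (blue/black and waterproof/shockproof). With those in place, the union and list types validate a whitespace-separated feature list as shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UnionListDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        // Hypothetical member types (not defined on the slides)
        + "<simpleType name='PDAColor'><restriction base='string'>"
        + "<enumeration value='blue'/><enumeration value='black'/></restriction></simpleType>"
        + "<simpleType name='PDARobustness'><restriction base='string'>"
        + "<enumeration value='waterproof'/><enumeration value='shockproof'/></restriction></simpleType>"
        // Union: a value must match at least one member type
        + "<simpleType name='PDAFeatureType'>"
        + "<union memberTypes='hc:PDAColor hc:PDARobustness'/></simpleType>"
        // List: whitespace-separated sequence of union values
        + "<simpleType name='PDAFeatureListType'>"
        + "<list itemType='hc:PDAFeatureType'/></simpleType>"
        + "<element name='PDAFeatureList' type='hc:PDAFeatureListType'/>"
        + "</schema>";

    /** Wraps a list value into a PDAFeatureList instance document. */
    static String doc(String features) {
        return "<PDAFeatureList xmlns='" + HC + "'>" + features + "</PDAFeatureList>";
    }

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(doc("blue waterproof shockproof"))); // true
        System.out.println(isValid(doc("blue plaid")));                 // false: plaid matches no member type
    }
}
```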
Nested elements: possible within a complex datatype only
Attributes: possible within a complex datatype only
  Independent of the existence of nested elements
Empty content: possible within a complex datatype only
  Does not have nested elements
Mixed content: datatype may contain nested elements and text
  In contrast to DTDs, the ordering and cardinality of the nested elements can be specified arbitrarily
User-defined Datatypes: <complexType> – Nested Elements / Attributes / Empty / Mixed
Sequence – <sequence>
Choice – <choice>
Arbitrary ordering – <all>: nested elements can be used in arbitrary order
Cardinality: expressed by means of minOccurs and maxOccurs
<complexType name="PDAType">
  <sequence minOccurs="1" maxOccurs="1">
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
  <attribute name="no" type="nonNegativeInteger" use="required"/>
</complexType>
User-defined Datatypes: <complexType> – Nested Elements / Attributes
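The difference between <sequence> and <all> can be made visible with the PDAType from the slide plus a hypothetical element AnyOrderPDA that uses <all> for the same two children. A minimal sketch with the JDK validator, assuming the hc target namespace:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SequenceAllDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        + "<complexType name='PDAType'><sequence minOccurs='1' maxOccurs='1'>"
        + "<element name='Weight' type='string' minOccurs='1' maxOccurs='1'/>"
        + "<element name='Battery' type='string' minOccurs='1' maxOccurs='1'/>"
        + "</sequence>"
        + "<attribute name='no' type='nonNegativeInteger' use='required'/>"
        + "</complexType>"
        + "<element name='PDA' type='hc:PDAType'/>"
        // Hypothetical variant with <all>: same children, arbitrary order
        + "<element name='AnyOrderPDA'><complexType><all>"
        + "<element name='Weight' type='string'/>"
        + "<element name='Battery' type='string'/>"
        + "</all></complexType></element>"
        + "</schema>";

    static final String NS = " xmlns='" + HC + "'";
    static final String IN_ORDER =
        "<PDA" + NS + " no='7110'><Weight>141g</Weight><Battery>900mAh</Battery></PDA>";
    static final String SWAPPED =
        "<PDA" + NS + " no='7110'><Battery>900mAh</Battery><Weight>141g</Weight></PDA>";
    static final String MISSING_BATTERY = "<PDA" + NS + " no='7110'><Weight>141g</Weight></PDA>";
    static final String NEGATIVE_NO =
        "<PDA" + NS + " no='-1'><Weight>141g</Weight><Battery>900mAh</Battery></PDA>";
    static final String ALL_SWAPPED =
        "<AnyOrderPDA" + NS + "><Battery>900mAh</Battery><Weight>141g</Weight></AnyOrderPDA>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sequence, in order: " + isValid(IN_ORDER));        // true
        System.out.println("sequence, swapped:  " + isValid(SWAPPED));         // false
        System.out.println("missing Battery:    " + isValid(MISSING_BATTERY)); // false
        System.out.println("no = -1:            " + isValid(NEGATIVE_NO));     // false
        System.out.println("all, swapped:       " + isValid(ALL_SWAPPED));     // true
    }
}
```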
<complexType name="PDAType" mixed="true">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</complexType>
<element name="PDA" type="hc:PDAType"/>
XML document:
<PDA>Type Nokia 7110 has <Weight>141g</Weight> and <Battery>900mAh</Battery>
</PDA>
User-defined Datatypes: <complexType> – Mixed Content
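A sketch contrasting mixed="true" with an element-only type. Assumptions: the hc target namespace, and a hypothetical StrictPDAType/StrictPDA pair that has the same sequence but no mixed attribute. The last case shows that mixed content still enforces the order of the nested elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class MixedContentDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        + "<complexType name='PDAType' mixed='true'><sequence>"
        + "<element name='Weight' type='string'/><element name='Battery' type='string'/>"
        + "</sequence></complexType>"
        // Hypothetical element-only variant for comparison
        + "<complexType name='StrictPDAType'><sequence>"
        + "<element name='Weight' type='string'/><element name='Battery' type='string'/>"
        + "</sequence></complexType>"
        + "<element name='PDA' type='hc:PDAType'/>"
        + "<element name='StrictPDA' type='hc:StrictPDAType'/>"
        + "</schema>";

    /** Text interleaved with the nested elements, as in the slide's XML document. */
    static String withText(String root) {
        return "<" + root + " xmlns='" + HC + "'>Type Nokia 7110 has <Weight>141g</Weight>"
             + " and <Battery>900mAh</Battery></" + root + ">";
    }

    static final String MIXED = withText("PDA");
    static final String STRICT = withText("StrictPDA");
    static final String MIXED_WRONG_ORDER = "<PDA xmlns='" + HC + "'>Battery <Battery>900mAh</Battery>"
        + " weight <Weight>141g</Weight></PDA>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mixed type, text:    " + isValid(MIXED));             // true
        System.out.println("element-only, text:  " + isValid(STRICT));            // false
        System.out.println("mixed, wrong order:  " + isValid(MIXED_WRONG_ORDER)); // false
    }
}
```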
Extension – <extension>: additional nested elements and/or attributes
Restriction – <restriction>: of the domain or the cardinality
Abstract datatypes: <complexType> with attribute abstract="true"
Prohibition of derivation: <complexType> with attribute final; potential values: #all, restriction, extension
User-defined Datatypes: <complexType> – Derivation of Complex Types
Elements are attached at the end
Extension must be specified within a <complexContent> tag
<complexType name="extendedPDAType">
  <complexContent>
    <extension base="hc:PDAType">
      <sequence>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Type hierarchy: extendedPDAType is derived from PDAType
User-defined Datatypes: <complexType> – Derivation via Extension
All declarations of the base datatype that are to be retained must be repeated
Only optional elements (minOccurs="0") of the base datatype may be omitted
Restriction must be specified within a <complexContent> tag
<complexType name="restrictedPDAType">
  <complexContent>
    <restriction base="hc:extendedPDAType">
      <sequence>
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="5"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>
User-defined Datatypes: <complexType> – Derivation via Restriction
Type hierarchy: PDAType > extendedPDAType > restrictedPDAType
Static usage: element PDA has datatype PDAType
Dynamic usage: definition of the derived datatype within the XML document via the attribute type of the XML Schema Instance (xsi) NS
<PDA>
  <Weight>141g</Weight>
  <Battery>900mAh</Battery>
</PDA>
<PDA xsi:type="extendedPDAType">
  <Weight>115g</Weight>
  <Battery>550mAh</Battery>
  <Band>Dualband</Band>
  <Feature>Waterproof</Feature>
</PDA>
Datatype extendedPDAType is derived from PDAType: extension with Band & Feature
User-defined Datatypes: <complexType> – Two Usage Possibilities
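Static and dynamic typing can be compared directly with the JDK validator. A minimal sketch, assuming the hc target namespace and the PDAType/extendedPDAType definitions from the previous slides: the extended content is accepted only when the instance announces the derived type via xsi:type.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsiTypeDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        + "<complexType name='PDAType'><sequence>"
        + "<element name='Weight' type='string'/><element name='Battery' type='string'/>"
        + "</sequence></complexType>"
        // Extension: Band and Feature are appended after the base sequence
        + "<complexType name='extendedPDAType'><complexContent>"
        + "<extension base='hc:PDAType'><sequence>"
        + "<element name='Band' type='string'/>"
        + "<element name='Feature' type='string' maxOccurs='10'/>"
        + "</sequence></extension></complexContent></complexType>"
        + "<element name='PDA' type='hc:PDAType'/>"   // statically declared type
        + "</schema>";

    static final String EXTENDED_CHILDREN = "<Weight>115g</Weight><Battery>550mAh</Battery>"
        + "<Band>Dualband</Band><Feature>Waterproof</Feature>";
    static final String STATIC =
        "<PDA xmlns='" + HC + "'><Weight>141g</Weight><Battery>900mAh</Battery></PDA>";
    // xsi:type selects the derived type; the unprefixed QName resolves via the default NS (hc)
    static final String DYNAMIC = "<PDA xmlns='" + HC + "' xmlns:xsi='" + XSI + "'"
        + " xsi:type='extendedPDAType'>" + EXTENDED_CHILDREN + "</PDA>";
    static final String NO_XSI_TYPE = "<PDA xmlns='" + HC + "'>" + EXTENDED_CHILDREN + "</PDA>";

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static PDAType:          " + isValid(STATIC));      // true
        System.out.println("xsi:type extended:       " + isValid(DYNAMIC));     // true
        System.out.println("extended, no xsi:type:   " + isValid(NO_XSI_TYPE)); // false
    }
}
```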
Characteristics of a key (key):
  Value (combination) must be unique
  Value must exist
  Key must be defined as subelement of another element, following the type definition
Candidates for keys (field):
  Elements with simple datatypes only!
  Attributes
  Combinations of elements and attributes
Scope can be defined (selector)
Reference to key can be defined (keyref)
Elements, attributes and combinations thereof can be defined to be unique (unique):
  Value (combination) must be unique
  Value need NOT exist
Keys 1/2
Keys 2/2
<element name="PDACatalog">
  <complexType> ...</complexType>
  <key name="typeKey">
    <selector xpath="hc:Producer/hc:PDA"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </key>
  <keyref name="refToTypeKey" refer="hc:typeKey">
    <selector xpath="hc:Stock/hc:PDAQuantity"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </keyref>
</element>
Illustration: PDA (Name, Version, Weight, ...) is referenced by PDAQuantity (Name, Version, Quantity)
<element name="PDACatalog">
  <complexType> ...</complexType>
  <unique name="uniqueProducerNo">
    <selector xpath="hc:Producer"/>
    <field xpath="@producerNo"/>
  </unique>
</element>
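The slide elides the complexType of PDACatalog, so this sketch fills it in with a hypothetical structure (Producer elements containing PDA elements, and a Stock element containing PDAQuantity elements, all identified by name and version attributes). The key and keyref are taken from the slide; the JDK validator then reports both a dangling reference and a duplicate key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class KeyRefDemo {
    static final String HC = "http://www.ifs.uni-linz.ac.at/hc";

    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema' xmlns:hc='" + HC + "'"
        + " targetNamespace='" + HC + "' elementFormDefault='qualified'>"
        + "<element name='PDACatalog'><complexType><sequence>"
        // Hypothetical content model (elided on the slide)
        + "<element name='Producer' maxOccurs='unbounded'><complexType><sequence>"
        + "<element name='PDA' maxOccurs='unbounded'><complexType>"
        + "<attribute name='name' type='string' use='required'/>"
        + "<attribute name='version' type='string' use='required'/>"
        + "</complexType></element>"
        + "</sequence></complexType></element>"
        + "<element name='Stock'><complexType><sequence>"
        + "<element name='PDAQuantity' minOccurs='0' maxOccurs='unbounded'><complexType>"
        + "<attribute name='name' type='string' use='required'/>"
        + "<attribute name='version' type='string' use='required'/>"
        + "<attribute name='quantity' type='nonNegativeInteger'/>"
        + "</complexType></element>"
        + "</sequence></complexType></element>"
        + "</sequence></complexType>"
        // (name, version) of each PDA must exist and be unique
        + "<key name='typeKey'><selector xpath='hc:Producer/hc:PDA'/>"
        + "<field xpath='@name'/><field xpath='@version'/></key>"
        // every PDAQuantity must refer to an existing (name, version) key
        + "<keyref name='refToTypeKey' refer='hc:typeKey'>"
        + "<selector xpath='hc:Stock/hc:PDAQuantity'/>"
        + "<field xpath='@name'/><field xpath='@version'/></keyref>"
        + "</element>"
        + "</schema>";

    /** Builds a catalog with one Producer and one Stock element. */
    static String catalog(String pdas, String quantities) {
        return "<PDACatalog xmlns='" + HC + "'><Producer>" + pdas + "</Producer>"
             + "<Stock>" + quantities + "</Stock></PDACatalog>";
    }

    static final String VALID = catalog(
        "<PDA name='7110' version='a'/><PDA name='7110' version='b'/>",
        "<PDAQuantity name='7110' version='b' quantity='5'/>");
    static final String DANGLING_REF = catalog(
        "<PDA name='7110' version='a'/>",
        "<PDAQuantity name='7110' version='c' quantity='5'/>");
    static final String DUPLICATE_KEY = catalog(
        "<PDA name='7110' version='a'/><PDA name='7110' version='a'/>", "");

    /** Compiles the schema text; a schema that does not compile is a bug, so fail loudly. */
    static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    /** True if the instance is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation (including identity constraints) failed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("valid catalog:      " + isValid(VALID));         // true
        System.out.println("dangling keyref:    " + isValid(DANGLING_REF));  // false
        System.out.println("duplicate key:      " + isValid(DUPLICATE_KEY)); // false
    }
}
```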
Group of Elements
<group name="mainData">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</group>
<complexType name="PDAType">
  <sequence>
    <group ref="hc:mainData"/>
    <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
  </sequence>
</complexType>
Schema Composition: Within a Schema 1/2
Group of Attributes
<attributeGroup name="BatteryAttributeGroup">
  <attribute name="type" type="string" default="NiMh"/>
  <attribute name="capacity" type="string" use="required"/>
</attributeGroup>
<complexType name="BatteryType">
  <sequence>...</sequence>
  <attributeGroup ref="hc:BatteryAttributeGroup"/>
</complexType>
Schema Composition: Within a Schema 2/2
Incorporation of other schemata via include, redefine and import
include, redefine and import elements must be subelements of schema, placed before any other declaration
Include of a schema – include:
  NS of the included schema must be equal to the NS of the including schema, or the included schema has no NS at all
  The included schema can be used as if it were declared directly within the including schema
<schema ...>
  <include schemaLocation="PDA.xsd"/>
  ...
Schema Composition: Different Schemata 1/2
Including and redefining a schema – redefine:
  Same functionality as include
  In addition, included components (simpleType, complexType, group, attributeGroup) can be newly defined
  New definitions replace the original ones (a redefined type must be derived from its original definition)
Import of a schema – import:
  Imported schema can have an arbitrary NS (which may differ from the current one) or none
<import namespace="http://www.somewhere.else.com" schemaLocation="Producer.xsd"/>
...
<redefine schemaLocation="PDA.xsd">
  <complexType name="PDAType">....</complexType>
  ...
</redefine>
...
Schema Composition: Different Schemata 2/2
Schema Modeling Styles: Non-Normative Data Model of XML Schema Concepts
(Figure; source: http://www.w3.org/TR/xmlschema-1/)
Schema Modeling Styles: XML Schema Concepts in Practice
Analysis of 1400 schemata from diverse standard vocabularies:
Open Travel Alliance (OTA),
Human Resource XML (HR-XML),
W3C,
Global Justice XML,
etc.
P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html
Schema Modeling Styles: Relationships / Global vs. Local / Element vs. Type
Relationships: realized by means of nesting or via references
Global element/attribute declarations:
  Prerequisite for reuse in the same or another schema
  Root element must be global
Local element/attribute declarations: used when a declaration makes sense only in combination with the declaring type
  Local element declarations: elements with the same name but a different structure can occur in different types
  Local attribute declarations: make sense, since attributes are most often tightly coupled to elements
Three stereotypical design forms: Russian Doll Design, Salami Slice Design, Venetian Blinds Design
Literature: XML Schema Best Practices (Roger Costello): www.xfront.com; P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html
Nested element declarations
Local declarations only; no global types
Advantages:
  Structure obvious (corresponds to the XML document's structure)
  Prevents side effects
Disadvantages:
  Danger of deep nesting levels
  No reuse of declarations, hence redundancies
  No extensibility in terms of derivation
Schema Modeling Styles: Russian Doll Design
Global element declarations; usage of global elements per reference (ref attribute)
Each global element can be a root element
Advantages: reuse of elements
Disadvantages:
  Large number of global elements
  Confusing
  Danger of side effects in case of changes to global elements
  No extensibility in terms of derivation
Schema Modeling Styles: Salami Slice Design
Global type declarations; elements, except the root element, are declared locally
Advantages:
  Reuse of types
  A named type is available for each element and attribute
  Types can be imported from other schemata
  Extensibility by derivation, <redefine>
Disadvantages:
  Large number of global types
  Confusing
Schema Modeling Styles: Venetian Blinds Design
Russian Doll Design: for restrictive structures; the structure of the XML documents is largely pre-defined by the schema
Salami Slice Design: for flexible structures; the structure of the XML documents can vary strongly, since different root elements are possible
Venetian Blinds Design: for flexible structures too; the structure of the XML documents can vary strongly if type inheritance is used
In practice: mixtures!
Schema Modeling Styles: Comparison
Schema Modeling Styles: Possible Mixture "Garden of Eden" (figure)
Comparison DTD – XML Schema: General Criteria
Criterion | XML Schema | DTD
Syntax | XML | proprietary
Structure | complex | simple
Namespaces | supported | not supported
Comparison DTD – XML Schema: Elements
Criterion | XML Schema | DTD
Definition of the content | simple types, complex types | text, elements, mixed content
Defined order | <sequence> | ","
Alternative | <choice> | "|"
Arbitrary order | <all> | not available
Cardinality | minOccurs and maxOccurs (more flexible) | "?", "*", "+"
Default values | default, fixed | not available
Comparison DTD – XML Schema: Attributes
Criterion | XML Schema | DTD
Optionality | supported | supported
Default values | supported | supported
Comparison DTD – XML Schema: Datatypes
Criterion | XML Schema | DTD
Pre-defined datatypes | various datatypes, e.g. boolean, integer, ... | few datatypes, in fact STRING only, e.g. CDATA, ID, ...
User-defined datatypes: domains | many possibilities: <length>, ... | enumerating all possible values (only for attributes)
User-defined datatypes: patterns for datatypes | <pattern> | very restricted (e.g. by means of cardinality)
Comparison DTD – XML Schema: Inheritance
Criterion | XML Schema | DTD
Derivation from pre-defined, simple datatypes | base | not supported
Derivation from complex datatypes (extension) | base and <extension> | not supported
Derivation from complex datatypes (restriction) | base and <restriction> | not supported
Most important advantages of DTDs: quick and easy to specify
  Beneficial for the specification of simple documents
Most important advantages of XML Schema: numerous datatypes, object-oriented approach, more modeling possibilities than with DTDs
  Beneficial for the specification of complex documents
Comparison DTD – XML Schema: Summary
Literature
Books
  Elliotte Rusty Harold, W. Scott Means: XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005
  Elliotte Rusty Harold: XML 1.1 Bible, 2nd Edition, John Wiley & Sons, 2004
Conferences
  XML Europe (XTech Conference Series): http://www.xmleurope.com
  XML Conference & Exposition: http://www.xmlconference.org
Online Resources
  O'Reilly XML.com: http://www.xml.com
  Elliotte Rusty Harold: Cafe con Leche XML News and Resources: http://www.ibiblio.org/xml
  Commented XML Standard (Tim Bray): http://www.xml.com/axml/testaxml.htm
  W3Schools: http://www.w3schools.com/xml/
  XML & DTD Patterns: http://www.xmlpatterns.com/
  Overview of XML Editors: http://www.perfectxml.com/soft.asp?cat=6
  Java and XML, Sun Microsystems, Inc.: http://java.sun.com/xml/
  IBM XML Zone: http://www.ibm.com/developer/xml/
  Microsoft XML Developer Center: http://msdn.microsoft.com/xml/default.asp
  XML Schema Test Suites from the W3C: http://www.w3.org/2001/05/xmlschema-test-collection.html
  IBM's Schema Quality Checker (SQC): http://www.alphaworks.ibm.com/tech/xmlsqc