Xml pres 1

XML From The Ground Up

description

A presentation on XML from 2006! but still useful

Transcript of Xml pres 1

Page 1: Xml pres 1

XML From The Ground Up

Page 2: Xml pres 1

?

12345678901234567890123456789
simpson   bart    springfield
flintstonefred    bedrock
rubble    barney  bedrock

Page 3: Xml pres 1

Fixed Width Field

12345678901234567890123456789
simpson   bart    springfield
flintstonefred    bedrock
rubble    barney  bedrock

Page 4: Xml pres 1

Fixed Width cont…

simpson bart springfield

flintstone fred bedrock

rubble barney bedrock
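
A fixed-width record is split by character position rather than by a delimiter. The sketch below is a minimal Python illustration. The field widths (10, 8 and 11 characters) are an assumption inferred from the 29-character ruler on the slide, and the field names are hypothetical.

```python
# Assumed layout: (field name, start offset, end offset). Not stated on the slide.
FIELDS = [("surname", 0, 10), ("forename", 10, 18), ("town", 18, 29)]

def parse_fixed_width(line):
    """Slice one fixed-width record into a dict and trim the padding."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}

records = [
    "simpson   bart    springfield",
    "flintstonefred    bedrock",
    "rubble    barney  bedrock",
]

for record in records:
    print(parse_fixed_width(record))
```

The second record shows why the positions matter: "flintstone" fills its field completely, so nothing separates it from "fred" except the agreed column boundary.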

Page 5: Xml pres 1

?

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof - loaded",4799.00

Page 6: Xml pres 1

CSV

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof - loaded",4799.00

Page 7: Xml pres 1

CSV cont…

1997 | Ford  | E350                       | ac, abs, moon                       | 3000.00
1999 | Chevy | Venture "Extended Edition" |                                     | 4900.00
1996 | Jeep  | Grand Cherokee             | MUST SELL! air - moon roof - loaded | 4799.00
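
The quoting rules in the CSV slide (commas inside quotes, doubled quotes, empty fields) are easy to get wrong by hand. A minimal sketch using Python's standard `csv` module, with the three records from the slide written one per line:

```python
import csv
import io

# The three records from the slide, one per line.
data = '''1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof - loaded",4799.00
'''

# csv.reader handles quoted commas, doubled quotes and empty fields.
rows = list(csv.reader(io.StringIO(data)))

for row in rows:
    print(row)
```

Splitting each line on commas would break the first record into seven pieces instead of five.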

Page 8: Xml pres 1

?

01041cam 2200265 a 450000100200000000300040002000

50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^\

Page 9: Xml pres 1

MARC

01041cam 2200265 a 450000100200000000300040002000

50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^\

Page 10: Xml pres 1

MARC cont…

Leader                 01041cam 2200265 a 4500
Control No.     001    ###89048230
Control No. ID  003    DLC
DTLT            005    19911106082810.9
Fixed Data      008    891101s1990 maua j 001 0 eng
LCCN            010 ## $a ###89048230
ISBN            020 ## $a 0316107514 : $c $12.95
ISBN            020 ## $a 0316107506 (pbk.) : $c $5.95 ($6.95 Can.)
Cat. Source     040 ## $a DLC $c DLC $d DLC
LC Call No.     050 00 $a GV943.25 $b .B74 1990
Dewey No.       082 00 $a 796.334/2 $2 20
…

Page 11: Xml pres 1

?

:p.Here's an example of some BASIC statements:
:xmp.
10 PRINT USING 55 A, B, C
20 LET J = K + 2
30 IF J = X GO TO 80
:exmp.
:pc.that will solve this problem.
:fig place=inline width=page frame=box.
AN INLINE, PAGE-WIDE FIGURE
Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered!
:figcap.An Inline, Page-Wide Figure
:figdesc.This is the first figure I have entered myself.
:efig.
:p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide):
:fig place=inline width=column.
Let's create another figure that is column wide, which will create a second item for a list of illustrations in a future exercise.
:figcap.A Column-Wide Figure
:efig.

Page 12: Xml pres 1

GML

:p.Here's an example of some BASIC statements:
:xmp.
10 PRINT USING 55 A, B, C
20 LET J = K + 2
30 IF J = X GO TO 80
:exmp.
:pc.that will solve this problem.
:fig place=inline width=page frame=box.
AN INLINE, PAGE-WIDE FIGURE
Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered!
:figcap.An Inline, Page-Wide Figure
:figdesc.This is the first figure I have entered myself.
:efig.
:p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide):
:fig place=inline width=column.
Let's create another figure that is column wide, which will create a second item for a list of illustrations in a future exercise.
:figcap.A Column-Wide Figure
:efig.

Page 13: Xml pres 1

GML cont…

Page 14: Xml pres 1

SGML

<QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS>

</QUOTE>

Page 15: Xml pres 1

HTML

Page 16: Xml pres 1

XML - 1

<stats21>
  <ARN ref="E008026">
    <AttendantCircumstancesRecord>
      <PoliceForce>96</PoliceForce>
      <YearOfRecord>00</YearOfRecord>
      <MonthOfRecord>00</MonthOfRecord>
      <AccidentReferenceNumber>E008026</AccidentReferenceNumber>
      <AccidentSeverity>3</AccidentSeverity>
      <NumberOfVehicles>002</NumberOfVehicles>
      <NumberOfCasualties>001</NumberOfCasualties>
      …
    </AttendantCircumstancesRecord>
  </ARN>
</stats21>

Page 17: Xml pres 1

XML - 2

Is for structuring data
Is derived from SGML (as is HTML)
Is text, but isn’t meant to be read
Is verbose by design

Page 18: Xml pres 1

Basic Syntax of XML

All XML elements must have a closing tag
Empty elements must close with /
XML tags are case sensitive
All XML elements must be properly nested
All XML documents must have a root element
Attribute values must always be quoted
XML entities must be used for special characters

Page 19: Xml pres 1

Special Characters in XML strings

&  -  &amp;
<  -  &lt;
>  -  &gt;
"  -  &quot;
'  -  &#39;

Page 20: Xml pres 1

Example of Special Characters

Invalid XML

<Organization>Logica & SE</Organization>

Valid XML

<Organization>Logica &amp; SE</Organization>
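
The difference between the two forms can be checked with a parser. A minimal Python sketch using only the standard library; the helper name `is_well_formed` is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

bad = "<Organization>Logica & SE</Organization>"
# escape() replaces &, < and > with their entity references.
good = "<Organization>" + escape("Logica & SE") + "</Organization>"

print(is_well_formed(bad))
print(good)
print(ET.fromstring(good).text)
```

The parser turns the entity back into a plain ampersand when the text is read, so the escaping is invisible to the application.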

Page 21: Xml pres 1

XML Structure

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="xmlstyle.css"?>
<bookstore xml:lang="en-US" xmlns:def="Definitions">
  <book id="1">The Bible</book>
  …
</bookstore>

Prolog (optional)

Processing Instruction (optional)

Document Element (namespace/s)

Child node/s

Closing tag of Document Element

Page 22: Xml pres 1

XML Example

<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again, place in a tin, and then bake in the oven.</step>
  </Instructions>
</Recipe>
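
Once parsed, a document like this recipe is a tree of elements, each with a name, attributes and text. A minimal Python sketch that walks a trimmed copy of the recipe (one ingredient and one step, to keep it short); the function name `describe` is hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the recipe from the slide: one ingredient, one step.
RECIPE = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
  </Instructions>
</Recipe>"""

def describe(element, depth=0):
    """Return one indented line per element: name, attributes, text."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

root = ET.fromstring(RECIPE)
print("\n".join(describe(root)))
```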

Page 23: Xml pres 1

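[Figure: tree diagram of the Recipe document. The root holds the ?xml processing instruction and the Recipe element. Recipe carries the attributes name (bread), prep_time (5 mins) and cook_time (3 hours), and the child elements title (Basic…), ingredient (attribute amount = 3, text Flour) and Instructions (three step elements: Mix…, Cover…, Knead…). Legend: root, p-i, element, attribute, text.]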

Page 24: Xml pres 1

Attributes vs Elements

Data can be stored in child elements or in attributes.

<person sex="female">
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>

<person>
  <sex>female</sex>
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>

Page 25: Xml pres 1

Namespaces

Disambiguation mechanism

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the "edi" prefix is bound to http://ecommerce.org/schema
       for the "x" element and contents -->
</x>

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
</x>
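
What identifies the element is the namespace URI, not the prefix. A minimal Python sketch with the standard library; the prefix `e` in the second lookup is an arbitrary choice made for illustration:

```python
import xml.etree.ElementTree as ET

doc = """<x xmlns:edi='http://ecommerce.org/schema'>
  <edi:price units='Euro'>32.18</edi:price>
</x>"""

root = ET.fromstring(doc)

# Prefix-to-URI map used only by this script's lookups.
ns = {"edi": "http://ecommerce.org/schema"}
price = root.find("edi:price", ns)

# ElementTree stores the expanded name: {namespace URI}local-name.
print(price.tag)
print(price.get("units"), price.text)

# A different prefix bound to the same URI finds the same element.
same = root.find("e:price", {"e": "http://ecommerce.org/schema"})
print(same is price)
```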

Page 26: Xml pres 1

XML Document Structure

Page 27: Xml pres 1

Tree Representation

Page 28: Xml pres 1

Tree

Page 29: Xml pres 1

Pruning

Page 30: Xml pres 1

Grafting

Page 31: Xml pres 1

Hierarchy

Page 32: Xml pres 1

Tree Traversal

Page 33: Xml pres 1

Tree Models

Page 34: Xml pres 1

Trees – Nested Set view

Page 35: Xml pres 1

Take Home …

XML is a syntax for marking up data
Markup tags are not pre-defined
Namespaces make identical tag names unique
An XML instance document is made up of markup tags and text (data)
XML documents are tree structures

Page 36: Xml pres 1
Page 37: Xml pres 1

XPath

language for addressing part/s of an XML document
designed to be used by XSLT
models XML document as tree of nodes
fully supports XML Namespaces

Page 38: Xml pres 1

XPath & XML Document Structure

<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>StringValue</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
  </table>
</xml>

XPathMain.htm

xml
xml/table
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec/@id
xml/table/rec[@id='2']
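
The paths above can be tried out in Python. This is a sketch under one caveat: the standard library's ElementTree supports only a subset of XPath, so paths are written relative to the root element and attribute values are read with `get()` rather than selected with `@id`.

```python
import xml.etree.ElementTree as ET

doc = """<xml><table>
  <rec id="1"><numField>123</numField><stringField>StringValue</stringField></rec>
  <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
</table></xml>"""

root = ET.fromstring(doc)  # root is the <xml> element

# xml/table/rec/@id : select the rec elements, then read the attribute.
ids = [rec.get("id") for rec in root.findall("table/rec")]

# xml/table/rec/numField
nums = [field.text for field in root.findall("table/rec/numField")]

# xml/table/rec[@id='2'] : a predicate narrows the selection.
second = root.find("table/rec[@id='2']")

print(ids, nums, second.findtext("stringField"))
```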

Page 39: Xml pres 1

XSL/XSLT

Page 40: Xml pres 1
Page 41: Xml pres 1

XSL/XSLT Example - Source

<persons>
  <person username="MP123456">
    <name>John</name>
    <family_name>Smith</family_name>
  </person>
  <person username="PK123456">
    <name>Sally</name>
    <family_name>Jones</family_name>
  </person>
</persons>

Page 42: Xml pres 1

XSLT Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <transform>
      <xsl:apply-templates/>
    </transform>
  </xsl:template>

  <xsl:template match="person">
    <record>
      <username>
        <xsl:value-of select="@username" />
      </username>
      <name>
        <xsl:value-of select="name" />
      </name>
    </record>
  </xsl:template>

</xsl:stylesheet>

Page 43: Xml pres 1

Transformed Output

<?xml version="1.0" encoding="UTF-8"?>
<transform>
  <record>
    <username>MP123456</username>
    <name>John</name>
  </record>
  <record>
    <username>PK123456</username>
    <name>Sally</name>
  </record>
</transform>
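
To make the mapping from source to output concrete, the sketch below reproduces the same restructuring by hand in Python. It is not an XSLT processor (Python's standard library does not include one). It mirrors the two templates: one `<transform>` wrapper for the document, one `<record>` per `<person>`.

```python
import xml.etree.ElementTree as ET

SOURCE = """<persons>
  <person username="MP123456"><name>John</name><family_name>Smith</family_name></person>
  <person username="PK123456"><name>Sally</name><family_name>Jones</family_name></person>
</persons>"""

def transform(source_xml):
    """Hand-written equivalent of the stylesheet's two templates."""
    out = ET.Element("transform")                  # template match="/"
    for person in ET.fromstring(source_xml).iter("person"):
        record = ET.SubElement(out, "record")      # template match="person"
        ET.SubElement(record, "username").text = person.get("username")
        ET.SubElement(record, "name").text = person.findtext("name")
    return out

result = ET.tostring(transform(SOURCE), encoding="unicode")
print(result)
```

The family_name element is dropped because nothing copies it across, which matches the transformed output on the slide.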

Page 44: Xml pres 1

XSLT Functions

current, document, element-available, format-number, function-available, generate-id, key, system-property, unparsed-entity-uri

Page 45: Xml pres 1

XPath Functions

boolean, ceiling, concat, contains, count, false, floor, id, lang, last, local-name, name, namespace-uri, normalize-space, not, number, position, round, starts-with, string, string-length, substring, substring-after, substring-before, sum, translate, true

Page 46: Xml pres 1

XSL-FO Processor

Page 47: Xml pres 1

Take Home …

XPath to address data within XML
XSLT to re-structure XML
They operate on collections of nodes
They work with any type of XML

Page 48: Xml pres 1

XSLT_test.htm

Page 49: Xml pres 1

XML Schema

A pattern for XML documents
Content
Structure
Constraints

Page 50: Xml pres 1

XML Schema Defines …

Content
  elements & attributes

Structure
  parent-child relationships
  order of child elements
  number of child elements

Constraints
  whether an element is empty or can include text
  data types for elements and attributes
  default/fixed values for elements & attributes

Page 51: Xml pres 1

Example: Simple XML File

<?xml version="1.0"?>
<note>
  <to>Peter</to>
  <from>Clare</from>
  <heading>Reminder</heading>
  <body>Don't forget the pub this weekend!</body>
</note>

Page 52: Xml pres 1

Example: XML Schema

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
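
To show what this schema asks of a document, the sketch below checks the same rules by hand in Python. It is not schema validation (the standard library has no XML Schema validator) and it is deliberately partial: it only checks that `<note>` holds the four text-only children, once each, in the order given by `xs:sequence`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Child order taken from the xs:sequence in the schema.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_schema(xml_text):
    """Hand-written approximation of the note schema's structural rules."""
    root = ET.fromstring(xml_text)
    if root.tag != "note" or [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # xs:string children are simple: no nested elements, no attributes.
    return all(len(child) == 0 and not child.attrib for child in root)

valid = ("<note><to>Peter</to><from>Clare</from><heading>Reminder</heading>"
         "<body>Don't forget the pub this weekend!</body></note>")
reordered = ("<note><from>Clare</from><to>Peter</to><heading>Reminder</heading>"
             "<body>Don't forget the pub this weekend!</body></note>")

print(matches_note_schema(valid), matches_note_schema(reordered))
```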

Page 53: Xml pres 1

Schema components [1]

The <schema> element

<?xml version="1.0"?><xs:schema …..... ...

</xs:schema>

Page 54: Xml pres 1

Schema components [2]

A simple element can contain only text. It cannot contain any other elements or attributes.

<xs:element name="to" type="xs:string"/>

Page 55: Xml pres 1

Schema components [3]

Attributes

e.g. <xs:attribute name="lang" type="xs:string"/>

<lastname lang="EN">Smith</lastname>

Page 56: Xml pres 1

Schema components [4]

Built-in data types, e.g. xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time

Page 57: Xml pres 1

Schema restrictions [restriction base]

<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Page 58: Xml pres 1

Schema restrictions [enumeration]

<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Page 59: Xml pres 1

Schema restrictions [pattern/regular expression]

<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
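
The three restrictions (range, enumeration, pattern) each accept or reject a text value. The sketch below restates them as plain Python checks to show the intended behaviour. The function names are hypothetical, the checks are written by hand rather than derived from the schema, and whitespace handling is ignored.

```python
import re

def valid_age(text):
    """xs:integer restricted by minInclusive 0 and maxInclusive 100."""
    is_integer = re.fullmatch(r"[+-]?[0-9]+", text) is not None
    return is_integer and 0 <= int(text) <= 100

def valid_car(text):
    """xs:string restricted to the enumerated values (case sensitive)."""
    return text in {"Audi", "Golf", "BMW"}

def valid_letter(text):
    """xs:string restricted by the pattern [a-z]: one lowercase letter."""
    return re.fullmatch(r"[a-z]", text) is not None

print(valid_age("42"), valid_age("101"))
print(valid_car("Golf"), valid_car("golf"))
print(valid_letter("q"), valid_letter("ab"))
```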

Page 60: Xml pres 1

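Regular Expressions

Wildcards on steroids

ab|c{2}|de   matches “ab”; “cc”; “de”

[A-Z]{1,4}   matches e.g. “ABDS”; “A”; “ZXS”

[1970-2030]   looks like a range of years, but it is a character class that matches a single character (0, 1, 2, 3, 7 or 9); a range of years needs something like 19[7-9][0-9]|20[0-2][0-9]|2030

[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}   Post Codes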
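
These patterns can be tried in Python. A schema pattern has to match the whole value, so the sketch uses `re.fullmatch`. Note that `[1970-2030]` is a character class that matches one character, not a four-digit year. The year pattern and the sample post code "EH1 3DG" are illustrative values chosen for this sketch, and Python's regex dialect differs from XML Schema's in details that do not affect these examples.

```python
import re

def matches(pattern, value):
    """True if the pattern matches the whole value, as a schema pattern must."""
    return re.fullmatch(pattern, value) is not None

print(matches(r"ab|c{2}|de", "cc"))
print(matches(r"[A-Z]{1,4}", "ZXS"))

# A character class matches exactly one character from the set.
print(matches(r"[1970-2030]", "1985"), matches(r"[1970-2030]", "7"))

# One way to express the years 1970 to 2030.
YEARS = r"19[7-9][0-9]|20[0-2][0-9]|2030"
print(matches(YEARS, "1985"), matches(YEARS, "2031"))

print(matches(r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}", "EH1 3DG"))
```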

Page 61: Xml pres 1

Restrictions for Datatypes

enumeration, fractionDigits, length, maxExclusive, maxInclusive, maxLength, minExclusive, minInclusive, minLength, pattern, totalDigits, whiteSpace

Page 62: Xml pres 1

Complex Element

contains other elements and/or attributes. [4 kinds]

1) empty elements
2) elements that contain only other elements
3) elements that contain only text
4) elements that contain both other elements and text

Page 63: Xml pres 1

Complex Element examples

a) <product pid="1345"/>

b) <employee>
     <firstname>John</firstname>
     <lastname>Smith</lastname>
   </employee>

c) <food type="dessert">Ice cream</food>

Page 64: Xml pres 1

Complex Element Definition

<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Page 65: Xml pres 1

Complex Element Definition /2

Reference to complex type

<xs:element name="employee" type="personinfo"/>

<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

Page 66: Xml pres 1

Type Reuse

Several elements based on same type

<xs:element name="employee" type="personinfo"/>

<xs:element name="student" type="personinfo"/>

<xs:element name="member" type="personinfo"/>

Page 67: Xml pres 1

Type Extension

<xs:complexType name="fullpersoninfo">
  <xs:complexContent>
    <xs:extension base="personinfo">
      <xs:sequence>
        <xs:element name="address" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Page 68: Xml pres 1

Indicators

Seven types of indicators enable composition

Order indicators: all, choice, sequence

Occurrence indicators: maxOccurs, minOccurs

Group indicators: group name, attributeGroup name

Page 69: Xml pres 1

<any>

The <any> element enables us to extend the XML document with elements not specified by the schema.

The <anyAttribute> element enables us to extend the XML document with attributes not specified by the schema.

Page 70: Xml pres 1

Where’s the beef?

XML Schema permits…

Standard libraries of data specifications
Formal specification of data models
Automated validation of XML instance files based on XML Schema
Simplified creation of structured documents

Page 71: Xml pres 1

XML Schema QA

Automated using a QA XSLT
GovTalk – Schema QA Stylesheet
schemaQA_1.htm

Page 72: Xml pres 1

Schema Libraries

GovTalk
Ordnance Survey MasterMap
Environmental Information Exchange

Page 73: Xml pres 1

XML Toolkit

Parsers (validating & non-validating)

DOM (Document Object Model)
SAX (Simple API for XML)
Hybrid pull parsers
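
The three parser styles differ in who drives the parse and how much is held in memory. A minimal Python sketch using standard-library counterparts (`minidom` for DOM, `xml.sax` for SAX, `XMLPullParser` for the pull style); the small document and the class name `NameCollector` are made up for illustration.

```python
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

DOC = "<note><to>Peter</to><from>Clare</from></note>"

# DOM: the whole document is loaded into a tree that can be navigated freely.
dom = minidom.parseString(DOC)
dom_names = [node.tagName for node in dom.documentElement.childNodes]

# SAX: the parser pushes events to a handler as it reads.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

handler = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)

# Pull: the application feeds data and asks for events when it wants them.
puller = ET.XMLPullParser(events=("start",))
puller.feed(DOC)
puller.close()
pull_names = [element.tag for _, element in puller.read_events()]

print(dom_names, handler.names, pull_names)
```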

Page 74: Xml pres 1

Schema & Validation

Schemas provide a basis for automated validation of XML

xmlValidation.dot

Page 75: Xml pres 1

Schema & Document Creation

Page 76: Xml pres 1

SAS XML Mapper

Page 77: Xml pres 1

SAS XMLMap

<?xml version="1.0" encoding="UTF-8" ?>
<SXLEMAP>
  <TABLE name="docDscr_citation__titl">
    <TABLE-PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</TABLE-PATH>
    <COLUMN name="docDscrcitationtitl">
      <PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>950</LENGTH>
      <LABEL>Full authoritative title of the documentation (DC Title)</LABEL>
    </COLUMN>
  </TABLE>
</SXLEMAP>

Page 78: Xml pres 1

SAS XMLMap Manager Plugin

Page 79: Xml pres 1
Page 80: Xml pres 1

Benefits of the XML route

Open Standards
Vendor Neutral
e-GIF/OSIAF compliant
Very flexible – one source, many uses

Page 81: Xml pres 1

Problems with the XML route

XML files tend to be large
DOM (Drudgery Object Model)
Inter-record linking & validation across records is not trivial
Many tools are not mature (but this situation is improving rapidly)

Page 82: Xml pres 1

OK, What next…?

Vocabularies
Schemas
Additional intra-record validation based on XSLT and XPath
Publish

Page 83: Xml pres 1

Vocabularies

Domain experts identify data items and agree a vocabulary.

Arrange items into logical data groupings

Page 84: Xml pres 1

XML Schemas

Model the data items (UML?)
Isolate common data definitions
Prepare Schemas
Disambiguate using namespaces
Validate model
QA Schemas for compliance with standards (automated)

Page 85: Xml pres 1

Intra-record validation

Options include…
XSLT
XPath
(SE examples: Pupil Census; Road Accident Stats.)

Page 86: Xml pres 1

Publication

Add to Schema Library
  GovTalk
  Ordnance Survey MasterMap
  Environmental Information Exchange
Example: BS7666

Page 87: Xml pres 1