Xml pres 1

Post on 19-Jun-2015


description

A presentation on XML from 2006, but still useful.

Transcript of Xml pres 1

XML From The Ground Up


Fixed Width Field

12345678901234567890123456789
simpson   bart    springfield
flintstonefred    bedrock
rubble    barney  bedrock

Fixed Width cont…

simpson bart springfield

flintstone fred bedrock

rubble barney bedrock
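A minimal sketch of reading such records by slicing at fixed offsets. The column widths (surname 10, forename 8, town 11) are an assumption inferred from the 29-character ruler and the run-together "flintstonefred"; the slide does not state them.

```python
# Assumed layout (not stated on the slide): surname 10, forename 8, town 11.
FIELDS = [("surname", 0, 10), ("forename", 10, 18), ("town", 18, 29)]

def parse_fixed_width(line):
    """Slice one fixed-width record into a dict, trimming the padding."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}

records = [
    "simpson   bart    springfield",
    "flintstonefred    bedrock",
    "rubble    barney  bedrock",
]

for record in records:
    print(parse_fixed_width(record))
```

Nothing in the data says where a field ends; the layout lives only in the reader's code, which is the weakness the slide is pointing at.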


CSV

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL! air - moon roof -
loaded",4799.00

CSV cont…

1997  Ford   E350                        ac, abs, moon                        3000.00
1999  Chevy  Venture "Extended Edition"                                       4900.00
1996  Jeep   Grand Cherokee              MUST SELL! air - moon roof - loaded  4799.00
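The quoting rules in this sample (commas inside quotes, doubled quotes, an empty field, a line break inside a quoted field) can be sketched with Python's standard `csv` module. The position of the embedded line break in the third record is an assumption; the slide text does not make it clear.

```python
import csv
import io

raw = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\n'
    # Assumed position of the line break inside the quoted field.
    '1996,Jeep,Grand Cherokee,"MUST SELL!\nair - moon roof - loaded",4799.00\n'
)

# csv.reader handles quoted commas, doubled quotes and embedded newlines.
rows = list(csv.reader(io.StringIO(raw)))
for row in rows:
    print(row)
```

Four physical lines yield three records of five fields each, so a naive split on commas or newlines would get this file wrong.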


MARC

01041cam 2200265 a 450000100200000000300040002000

50017000240080041000410100024000820200025001060200 04400131040001800175050002400193082001800217100003 20023524500870026724600360035425000120039026000370 04023000029004395000042004685200220005106500033007 30650001200763^###89048230#/AC/r91^DLC^19911106082 810.9^891101s1990####maua###j######000#0#eng##^##$ a###89048230#/AC/r91^##$a0316107514 :$c$12.95^##$a 0316107506 (pbk.) :$c$5.95 ($6.95 Can.)^##$aDLC$cD LC$dDLC^00$aGV943.25$b.B74 1990^00$a796.334/2$220^ 10$aBrenner, Richard J.,$d1941-^10$aMake the team. $pSoccer :$ba heads up guide to super soccer! /$cR ichard J. Brenner.^30$aHeads up guide to super soccer.^##$a1st ed.^##$aBoston :$bLittle, Brown,$cc19 90.^##$a127 p. :$bill. ;$c19 cm.^##$a"A Sports ill ustrated for kids book."^##$aInstructions for improving soccer skills. Discusses dribbling, heading, playmaking, defense, conditioning, mental attitud e, how to handle problems with coaches, parents, and other players, and the history of soccer.^#0$aS occer$vJuvenile literature.^#1$aSoccer.^\

MARC cont…

Leader                  01041cam 2200265 a 4500
Control No.      001    ###89048230
Control No. ID   003    DLC
DTLT             005    19911106082810.9
Fixed Data       008    891101s1990 maua j 001 0 eng
LCCN             010 ## $a ###89048230
ISBN             020 ## $a 0316107514 : $c $12.95
ISBN             020 ## $a 0316107506 (pbk.) : $c $5.95 ($6.95 Can.)
Cat. Source      040 ## $a DLC $c DLC $d DLC
LC Call No.      050 00 $a GV943.25 $b .B74 1990
Dewey No.        082 00 $a 796.334/2 $2 20
…
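A rough sketch of how the raw record maps to the table: after the 24-character leader comes a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit start offset). The leader string below restores the two blanks that appear collapsed in the transcript, which is an assumption; only the first three directory entries are shown.

```python
def parse_directory(directory):
    """Split a MARC directory into (tag, field_length, start_offset) tuples."""
    entries = []
    usable = len(directory) - len(directory) % 12  # ignore a trailing partial entry
    for i in range(0, usable, 12):
        entry = directory[i:i + 12]
        entries.append((entry[:3], int(entry[3:7]), int(entry[7:12])))
    return entries

# Leader with the blanks at positions 8-9 restored (assumed).
leader = "01041cam  2200265 a 4500"
record_length = int(leader[0:5])    # total length of the record
base_address = int(leader[12:17])   # offset where the field data starts

# First three directory entries, copied from the raw record.
directory = "001002000000" "003000400020" "005001700024"

print(record_length, base_address)
print(parse_directory(directory))
```

The tags 001, 003 and 005 line up with the Control No., Control No. ID and DTLT rows of the table.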


GML

:p.Here's an example of some BASIC statements:
:xmp.
10 PRINT USING 55 A, B, C
20 LET J = K + 2
30 IF J = X GO TO 80
:exmp.
:pc.that will solve this problem.
:fig place=inline width=page frame=box.
AN INLINE, PAGE-WIDE FIGURE

Because the contents of a figure format EXACTLY as entered, you can enter blanks on the line (before text) and the lines will print exactly the same as they were entered!

:figcap.An Inline, Page-Wide Figure
:figdesc.This is the first figure I have entered myself.
:efig.
:p.This paragraph follows the FIG end tag. Here we have another figure (inline and column wide):
:fig place=inline width=column.
Let's create another figure that is column wide, which will create a second item for a list of illustrations in a future exercise.
:figcap.A Column-Wide Figure
:efig.

GML cont…

SGML

<QUOTE TYPE="example"> typically something like <ITALICS>this</ITALICS>

</QUOTE>

HTML

XML - 1

<stats21>
  <ARN ref="E008026">
    <AttendantCircumstancesRecord>
      <PoliceForce>96</PoliceForce>
      <YearOfRecord>00</YearOfRecord>
      <MonthOfRecord>00</MonthOfRecord>
      <AccidentReferenceNumber>E008026</AccidentReferenceNumber>
      <AccidentSeverity>3</AccidentSeverity>
      <NumberOfVehicles>002</NumberOfVehicles>
      <NumberOfCasualties>001</NumberOfCasualties>
      …
    </AttendantCircumstancesRecord>
  </ARN>
</stats21>

XML - 2

Is for structuring data
Is derived from SGML (as is HTML)
Is text, but isn’t meant to be read
Is verbose by design

Basic Syntax of XML

All XML elements must have a closing tag
Empty elements must close with /
XML tags are case sensitive
All XML elements must be properly nested
All XML documents must have a root element
Attribute values must always be quoted
XML entities must be used for special characters
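As an illustration of these rules, a short sketch that feeds small documents to Python's standard `xml.etree.ElementTree` parser and reports which ones it accepts. The sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the first obeys the rules, each of the others breaks one.
samples = {
    "well-formed": '<note lang="en"><to>Peter</to><sent/></note>',
    "missing closing tag": "<note><to>Peter</note>",
    "case mismatch": "<Note><to>Peter</to></note>",
    "improper nesting": "<note><b><i>text</b></i></note>",
    "two root elements": "<note/><note/>",
    "unquoted attribute": "<note lang=en/>",
}

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

A conforming parser stops at the first violation instead of guessing what was meant, unlike a typical HTML browser.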

Special Characters in XML strings

& - &amp;
< - &lt;
> - &gt;
" - &quot;
' - &#39;

Example of Special Characters

Invalid XML
<Organization>Logica & SE</Organization>

Valid XML
<Organization>Logica &amp; SE</Organization>
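A small sketch of the same example in code, using `escape` from Python's standard `xml.sax.saxutils`. The extra mapping for the two quote characters is passed explicitly because `escape` only handles &, < and > by default.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "Logica & SE"

# The raw ampersand is rejected by the parser.
try:
    ET.fromstring(f"<Organization>{name}</Organization>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# Escape before embedding the text in markup.
escaped = escape(name, {'"': "&quot;", "'": "&#39;"})
element = ET.fromstring(f"<Organization>{escaped}</Organization>")

print(raw_accepted)   # False
print(escaped)        # Logica &amp; SE
print(element.text)   # Logica & SE
```

The parser turns the entity back into the original character, so the application never sees `&amp;`.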

XML Structure

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<?xml-stylesheet type="text/css" href="xmlstyle.css"?>
<bookstore xml:lang="en-US" xmlns:def="Definitions">
  <book id="1">The Bible</book>
  …
</bookstore>

Prolog (optional)
Processing Instruction (optional)
Document Element (namespace/s)
Child node/s
Closing tag of Document Element

XML Example

<?xml version="1.0" encoding="UTF-8"?>
<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again, place in a tin, and then bake in the oven.</step>
  </Instructions>
</Recipe>

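[Figure: tree view of the recipe document. The root has two children, the ?xml declaration (drawn as a p-i node) and the Recipe root element. Recipe carries the attributes name (bread), prep_time (5 mins) and cook_time (3 hours) and the child elements title (Basic…), ingredient (attribute amount = 3, text Flour) and Instructions, which holds three step elements (Mix…, Cover…, Knead…). Legend: root, p-i, element, attribute, text.]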
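The tree view of the recipe can be sketched in code by walking the parsed document with Python's standard `xml.etree.ElementTree`. The `walk` helper is a name made up for this example, and the XML declaration is left out of the sample string for brevity.

```python
import xml.etree.ElementTree as ET

recipe_xml = """<Recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again, place in a tin, and then bake in the oven.</step>
  </Instructions>
</Recipe>"""

def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for each element, parents first."""
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(recipe_xml)
nodes = list(walk(root))
for depth, tag, attrs, text in nodes:
    print("  " * depth + tag, attrs, repr(text))
```

Elements, attributes and text each show up as a distinct kind of node, matching the legend of the diagram.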

Attributes vs Elements

Data can be stored in child elements or in attributes.

<person sex="female">
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>

<person>
  <sex>female</sex>
  <fname>Anna</fname>
  <lname>Smith</lname>
</person>

Namespaces

Disambiguation mechanism

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the "edi" prefix is bound to http://ecommerce.org/schema
       for the "x" element and contents -->
</x>

<x xmlns:edi='http://ecommerce.org/schema'>
  <!-- the 'price' element's namespace is http://ecommerce.org/schema -->
  <edi:price units='Euro'>32.18</edi:price>
</x>
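A short sketch of how a namespace-aware parser sees the `edi:price` element, using Python's standard `xml.etree.ElementTree`. The second, unprefixed `price` element is added here purely for contrast and is not on the slide.

```python
import xml.etree.ElementTree as ET

doc = """<x xmlns:edi='http://ecommerce.org/schema'>
  <edi:price units='Euro'>32.18</edi:price>
  <price>no namespace</price>
</x>"""

root = ET.fromstring(doc)

# The prefix used for lookup is local to this code; only the URI has to match.
ns = {"edi": "http://ecommerce.org/schema"}

edi_price = root.find("edi:price", ns)
plain_price = root.find("price")

print(edi_price.tag)    # {http://ecommerce.org/schema}price
print(edi_price.text, edi_price.get("units"))
print(plain_price.tag)  # price
```

The two elements share the local name `price` but are different names to the parser, which is the disambiguation the slide refers to.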

XML Document Structure

Tree Representation

Tree

Pruning

Grafting

Hierarchy

Tree Traversal

Tree Models

Trees – Nested Set view

Take Home …

XML is a syntax for marking up data
Markup tags are not pre-defined
Namespaces make identical tag names unique
An XML instance document is made up of markup tags and text (data)
XML documents are tree structures

XPath

language for addressing part/s of an XML document
designed to be used by XSLT
models XML document as tree of nodes
fully supports XML Namespaces

XPath & XML Document Structure

<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>StringValue</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
  </table>
</xml>

XPathMain.htm

xml
xml/table
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec/@id
xml/table/rec[@id='2']
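These paths can be tried out with the limited XPath subset in Python's standard `xml.etree.ElementTree`. Two caveats in this sketch: the parsed root already is the `xml` element, so the paths below start one level lower, and the subset cannot return attribute nodes, so `@id` is read with `get` instead.

```python
import xml.etree.ElementTree as ET

doc = """<xml>
  <table>
    <rec id="1">
      <numField>123</numField>
      <stringField>StringValue</stringField>
    </rec>
    <rec id="2">
      <numField>346</numField>
      <stringField>Text Value</stringField>
    </rec>
  </table>
</xml>"""

root = ET.fromstring(doc)  # the <xml> element

recs = root.findall("table/rec")                             # xml/table/rec
nums = [e.text for e in root.findall("table/rec/numField")]  # xml/table/rec/numField
ids = [rec.get("id") for rec in recs]                        # xml/table/rec/@id
second = root.find("table/rec[@id='2']")                     # xml/table/rec[@id='2']

print(len(recs), nums, ids)
print(second.find("stringField").text)
```

Each path selects a collection of nodes; the predicate in the last one narrows that collection to the record whose `id` attribute is 2.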

XSL/XSLT

XSL/XSLT Example - Source

<persons>
  <person username="MP123456">
    <name>John</name>
    <family_name>Smith</family_name>
  </person>
  <person username="PK123456">
    <name>Sally</name>
    <family_name>Jones</family_name>
  </person>
</persons>

XSLT Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="/">
    <transform>
      <xsl:apply-templates/>
    </transform>
  </xsl:template>

  <xsl:template match="person">
    <record>
      <username>
        <xsl:value-of select="@username" />
      </username>
      <name>
        <xsl:value-of select="name" />
      </name>
    </record>
  </xsl:template>

</xsl:stylesheet>

Transformed Output

<?xml version="1.0" encoding="UTF-8"?>
<transform>
  <record>
    <username>MP123456</username>
    <name>John</name>
  </record>
  <record>
    <username>PK123456</username>
    <name>Sally</name>
  </record>
</transform>
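For comparison, a hand-written sketch of the same restructuring in Python. This is not XSLT and does not run the stylesheet (the standard library has no XSLT processor); it only mirrors what the two templates do, using `xml.etree.ElementTree`. The `transform` function is a name made up for this example.

```python
import xml.etree.ElementTree as ET

source = """<persons>
  <person username="MP123456">
    <name>John</name>
    <family_name>Smith</family_name>
  </person>
  <person username="PK123456">
    <name>Sally</name>
    <family_name>Jones</family_name>
  </person>
</persons>"""

def transform(persons):
    """Build <transform> with one <record> per <person>, like the templates."""
    out = ET.Element("transform")
    for person in persons.iter("person"):
        record = ET.SubElement(out, "record")
        # Mirrors <xsl:value-of select="@username"/>
        ET.SubElement(record, "username").text = person.get("username")
        # Mirrors <xsl:value-of select="name"/>
        ET.SubElement(record, "name").text = person.findtext("name")
    return out

result = transform(ET.fromstring(source))
print(ET.tostring(result, encoding="unicode"))
```

The stylesheet states the same mapping declaratively: each template says what a matched node becomes, and the processor does the walking.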

XSLT Functions

current, document, element-available, format-number, function-available, generate-id, key, system-property, unparsed-entity-uri

XPath Functions

boolean, ceiling, concat, contains, count, false, floor, id, lang, last, local-name, name, namespace-uri, normalize-space, not, number, position, round, starts-with, string, string-length, substring, substring-after, substring-before, sum, translate, true

XSL-FO Processor

Take Home …

XPath to address data within XML
XSLT to re-structure XML
They operate on collections of nodes
They work with any type of XML

XSLT_test.htm

XML Schema

A pattern for XML documents

Content
Structure
Constraints

XML Schema Defines …

Content
  elements & attributes
Structure
  parent-child relationships
  order of child elements
  number of child elements
Constraints
  whether an element is empty or can include text
  data types for elements and attributes
  default/fixed values for elements & attributes

Example: Simple XML File

<?xml version="1.0"?>
<note>
  <to>Peter</to>
  <from>Clare</from>
  <heading>Reminder</heading>
  <body>Don't forget the pub this weekend!</body>
</note>

Example: XML Schema

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
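To make concrete what this schema asks of a document, here is a hand-written stand-in for the check in Python. It is not a schema validator and does not read the XSD; it only hard-codes the one rule the schema expresses (a `note` root with `to`, `from`, `heading`, `body` in that order, each holding plain text). `matches_note_schema` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Child order required by the xs:sequence in the schema.
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_schema(text):
    """Rough structural check; a real validator covers much more."""
    root = ET.fromstring(text)
    if root.tag != "note" or root.attrib:
        return False
    children = list(root)
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # Simple xs:string elements: no attributes and no nested elements.
    return all(len(child) == 0 and not child.attrib for child in children)

good = ("<note><to>Peter</to><from>Clare</from><heading>Reminder</heading>"
        "<body>Don't forget the pub this weekend!</body></note>")
wrong_order = ("<note><from>Clare</from><to>Peter</to><heading>Reminder</heading>"
               "<body>Don't forget the pub this weekend!</body></note>")
missing_body = "<note><to>Peter</to><from>Clare</from><heading>Reminder</heading></note>"

print(matches_note_schema(good), matches_note_schema(wrong_order), matches_note_schema(missing_body))
```

With a schema-aware parser the same rule comes from the XSD itself, so nothing has to be hand-coded per document type.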

Schema components [1]

The <schema> element

<?xml version="1.0"?>
<xs:schema …..... ...

</xs:schema>

Schema components [2]

A simple element can contain only text. It cannot contain any other elements or attributes.

<xs:element name="to" type="xs:string"/>

Schema components [3]

Attributes

e.g. <xs:attribute name="lang" type="xs:string"/>

<lastname lang="EN">Smith</lastname>

Schema components [4]

Built-in data types, e.g.: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time

Schema restrictions [restriction base]

<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Schema restrictions [enumeration]

<xs:element name="car">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Schema restrictions [pattern/regular expression]

<xs:element name="letter">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-z]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
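The three restrictions (range, enumeration, pattern) can be read as plain predicates. The sketch below writes them out by hand in Python to show what each facet accepts; it is an illustration only, and the function names are made up for this example.

```python
import re

def valid_age(text):
    """xs:integer with minInclusive 0 and maxInclusive 100."""
    if not re.fullmatch(r"[+-]?[0-9]+", text.strip()):
        return False
    return 0 <= int(text) <= 100

def valid_car(text):
    """xs:string limited to the enumerated values (case sensitive)."""
    return text in {"Audi", "Golf", "BMW"}

def valid_letter(text):
    """xs:string matching the pattern [a-z]; the whole value must match."""
    return re.fullmatch(r"[a-z]", text) is not None

print(valid_age("42"), valid_age("101"), valid_age("4.5"))
print(valid_car("BMW"), valid_car("bmw"))
print(valid_letter("q"), valid_letter("ab"), valid_letter("Q"))
```

In a schema these rules are declared once next to the element definition, and every validating parser enforces them the same way.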

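Regular Expressions

Wildcards on steroids

ab|c{2}|de    “ab”; “cc”; “de”

[A-Z]{1,4}    “ABDS”; “A”; “ZXS”

19[7-9][0-9]|20[0-2][0-9]|2030    e.g. years in the range 1970-2030 (a character class such as [1970-2030] would match only a single character)

[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}    Post Codes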
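A quick way to try such patterns is Python's `re.fullmatch`, which requires the whole value to match, as a schema pattern does. In this sketch the year range 1970-2030 is spelled out with alternation, because a bracketed class like `[1970-2030]` is a set of single characters, not a numeric range. The test strings are made-up examples.

```python
import re

patterns = {
    "alternation": r"ab|c{2}|de",
    "capitals": r"[A-Z]{1,4}",
    "year": r"19[7-9][0-9]|20[0-2][0-9]|2030",
    "postcode": r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}",
}

def matches(name, text):
    """True if the whole text matches the named pattern."""
    return re.fullmatch(patterns[name], text) is not None

print(matches("alternation", "cc"), matches("alternation", "abc"))
print(matches("capitals", "ZXS"), matches("capitals", "ABCDE"))
print(matches("year", "1999"), matches("year", "2031"))
print(matches("postcode", "EH6 6QQ"), matches("postcode", "SW1A 1AA"))

# A character class matches one character drawn from the listed set.
print(re.fullmatch(r"[1970-2030]", "1999") is None)
```

The post code pattern is a simplification and accepts some strings that are not real post codes.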

Restrictions for Datatypes

enumeration, fractionDigits, length, maxExclusive, maxInclusive, maxLength, minExclusive, minInclusive, minLength, pattern, totalDigits, whiteSpace

Complex Element

contains other elements and/or attributes. [4 kinds]

1) empty elements
2) elements that contain only other elements
3) elements that contain only text
4) elements that contain both other elements and text

Complex Element examples

a) <product pid="1345"/>

b) <employee>
     <firstname>John</firstname>
     <lastname>Smith</lastname>
   </employee>

c) <food type="dessert">Ice cream</food>

Complex Element Definition

<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Complex Element Definition /2

Reference to complex type

<xs:element name="employee" type="personinfo"/>

<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

Type Reuse

Several elements based on same type

<xs:element name="employee" type="personinfo"/>

<xs:element name="student" type="personinfo"/>

<xs:element name="member" type="personinfo"/>

Type Extension

<xs:complexType name="fullpersoninfo">
  <xs:complexContent>
    <xs:extension base="personinfo">
      <xs:sequence>
        <xs:element name="address" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Indicators

Seven types of indicators enable composition

Order indicators: All, Choice, Sequence
Occurrence indicators: maxOccurs, minOccurs
Group indicators: Group name, attributeGroup name

<any>

The <any> element enables us to extend the XML document with elements not specified by the schema.

The <anyAttribute> element enables us to extend the XML document with attributes not specified by the schema.

Where’s the beef?

XML Schema permits…

Standard libraries of data specifications
Formal specification of data models
Automated validation of XML instance files based on XML Schema
Simplified creation of structured documents

XML Schema QA

Automated using a QA XSLT
GovTalk – Schema QA Stylesheet
schemaQA_1.htm

Schema Libraries

Govtalk
Ordnance Survey MasterMap
Environmental Information Exchange

XML Toolkit

Parsers (validating & non-validating)

DOM (Document Object Model)
SAX (Simple API for XML)
Hybrid pull parsers
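A compact sketch of the three parser styles using Python's standard library: `xml.dom.minidom` for DOM, `xml.sax` for SAX, and `ElementTree.iterparse` as a stand-in for the pull style. The sample document and the `RecCollector` class are made up for this example.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET
from xml.dom import minidom

doc = ("<table><rec id='1'><numField>123</numField></rec>"
       "<rec id='2'><numField>346</numField></rec></table>")

# DOM: the whole document is loaded into a tree, then queried.
dom = minidom.parseString(doc)
dom_ids = [rec.getAttribute("id") for rec in dom.getElementsByTagName("rec")]

# SAX: the parser pushes events to a handler; no tree is built.
class RecCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.ids = []

    def startElement(self, name, attrs):
        if name == "rec":
            self.ids.append(attrs.get("id"))

handler = RecCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)

# Pull style: the application asks for the next event in a loop.
pull_ids = [
    element.get("id")
    for event, element in ET.iterparse(io.BytesIO(doc.encode("utf-8")), events=("start",))
    if element.tag == "rec"
]

print(dom_ids, handler.ids, pull_ids)
```

DOM is the most convenient for small documents; the event-based styles keep memory use flat on large files.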

Schema & Validation

Schemas provide the basis for automated validation of XML

xmlValidation.dot

Schema & Document Creation

SAS XML Mapper

SAS XMLMap

<?xml version="1.0" encoding="UTF-8" ?>
<SXLEMAP>
  <TABLE name="docDscr_citation__titl">
    <TABLE-PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</TABLE-PATH>
    <COLUMN name="docDscrcitationtitl">
      <PATH syntax="XPath">/codeBook/docDscr/citation/titlStmt/titl</PATH>
      <TYPE>character</TYPE>
      <DATATYPE>string</DATATYPE>
      <LENGTH>950</LENGTH>
      <LABEL>Full authoritative title of the documentation (DC Title)</LABEL>
    </COLUMN>
  </TABLE>
</SXLEMAP>

SAS XMLMap Manager Plugin

Benefits of the XML route

Open Standards
Vendor Neutral
e-GIF/OSIAF compliant
Very flexible – one source, many uses

Problems with the XML route

XML files tend to be large
DOM (Drudgery Object Model)
Inter-record linking & validation across records is not trivial
Many tools are not mature (but this situation is improving rapidly.)

OK, What next…?

Vocabularies
Schemas
Additional intra-record validation based on XSLT and XPath
Publish

Vocabularies

Domain experts identify data items and agree a vocabulary.

Arrange items into logical data groupings

XML Schemas

Model the data items (UML?)
Isolate common data definitions
Prepare Schemas
Disambiguate using namespaces
Validate model
QA Schemas for compliance with standards (automated)

Intra-record validation

Options include…
XSLT
XPath
(SE examples: Pupil Census; Road Accident Stats.)

Publication

Add to Schema Library
Govtalk
Ordnance Survey MasterMap
Environmental Information Exchange
Example: BS7666