Defining XML

The Document Type Definition

Document Type Definition

• text syntax for defining
  – elements of XML
  – attributes (and possibly default values)
  – structure

• <?xml … standalone="no" … ?>
  – implies that an external definition exists and may be required to understand the content properly

Why do we need DTDs?

• Define classes of XML documents
  – For particular applications
  – Agreement on data and structure

• Validate XML data
  – DTD is used to check structure

• Document an XML class
  – DTD provides complete information about an XML class

linking an XML file to a DTD

• a document type declaration is added to the XML file

<!DOCTYPE message SYSTEM "myDTD.dtd">

[Diagram: the XML file message.xml is linked to the DTD myDTD.dtd by the DOCTYPE declaration]
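
The link can be read back out of a document programmatically. The following is a minimal sketch, assuming Python's standard xml.parsers.expat module; the helper name read_doctype and the sample text are illustrative and not part of the slides. Note that expat only reports the DOCTYPE here; it does not fetch myDTD.dtd.

```python
import xml.parsers.expat

def read_doctype(xml_text):
    """Return (root element name, system identifier) from the DOCTYPE, or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when the parser meets the <!DOCTYPE ...> declaration.
        found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None

# Hypothetical contents of message.xml, following the slide's declaration.
MESSAGE_XML = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE message SYSTEM "myDTD.dtd">\n'
    '<message>Welcome to XML!</message>'
)

doctype = read_doctype(MESSAGE_XML)
print(doctype)
```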

What Is a DTD?

• Defines a type of XML document
  – What elements are allowed?
  – What attributes do they have?
  – How can they be structured?

• DTD is in text format

• Usually external to the XML data
  – Linked by a document type declaration

• May be included in the XML data file

Element type declarations

<!ELEMENT myElement (#PCDATA)>

• <!ELEMENT … >  the element type declaration itself
• myElement  name of the element being defined
• (#PCDATA)  content that the element can have
• #PCDATA = parsed character data

Example

<!ELEMENT message (#PCDATA)>

One line of text, stored in messageML.dtd

<?xml version="1.0"?>
<!DOCTYPE message SYSTEM "messageML.dtd">
<message>Welcome to XML!</message>

Example of a message document conforming to this DTD

Internal DTD Example

<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>Welcome to XML!</message>

Defining structure

• Element declarations define the content of elements

• Content can be text or other elements

• Content defines structure
  – How are the elements nested?
  – How many elements can be included?
  – What order do elements come in?

Defining structure

<!ELEMENT classroom (teacher, student)>

a classroom contains exactly one teacher followed by exactly one student

<!ELEMENT dessert (iceCream | pastry)>

a dessert contains either one iceCream or one pastry, but not both

<!ELEMENT album (track+)>

an album contains one or more tracks

occurrence indicators

Indicator           Meaning                               Example
Plus sign (+)       Element will appear 1 to many times   <!ELEMENT album (track+)>
Asterisk (*)        Element will appear 0 to many times   <!ELEMENT library (book*)>
Question mark (?)   Element will appear 0 to 1 times      <!ELEMENT seat (person?)>
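
Sequence (,), choice (|) and the occurrence indicators behave much like a regular expression over child element names. The sketch below illustrates that idea only; it is not how a real validating parser is built. It assumes content models made purely of element names (no #PCDATA, EMPTY or ANY), and the function names are hypothetical.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD content model of element names into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Every element name becomes a token of the form "name " (trailing space),
    # so that "track" can never be confused with a longer name such as "tracks".
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group()) + " )",
        compact,
    )
    # A comma means "followed by", which in a regex is plain concatenation.
    # The indicators ?, * and + and the choice bar | carry over unchanged.
    return re.compile(pattern.replace(",", ""))

def children_match(model, child_names):
    """True if the ordered list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None

print(children_match("(teacher, student)", ["teacher", "student"]))
print(children_match("(iceCream | pastry)", ["iceCream", "pastry"]))
print(children_match("(track+)", []))
print(children_match("(book*)", []))
```

The first and last calls report a match; the second fails because a choice allows only one alternative, and the third fails because + requires at least one track.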

A Simple Document Type Definition

<!-- DTD for sample document -->
<!ELEMENT customer-details (name, address)>
<!ELEMENT address (street, city, state, postal)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT postal (#PCDATA)>

DTD Example 1

<!ELEMENT class (number, (instructor | assistant+), (credit | nocredit))>

a class must contain a number, followed by either an instructor or one or more assistants, followed by either a credit or a nocredit

<class>
  <number>CM4003</number>
  <instructor>Stewart Massie</instructor>
  <credit>15</credit>
</class>

DTD Example 2

<!ELEMENT donutBox (jam?, lemon*, ((cream | sugar)+ | iced))>

a donutBox contains 0 or 1 jam, followed by 0 to many lemon, followed by either one to many cream or sugar (in any mix) or exactly one iced

<donutBox>
  <jam>raspberry</jam>
  <lemon>sour</lemon>
  <lemon>half-sour</lemon>
  <iced>chocolate</iced>
</donutBox>

<donutBox>
  <iced>pink</iced>
</donutBox>

DTD Example 3

<!ELEMENT farm (farmer+,
                (dog* | cat?), pig*,
                (goat | cow)?, (chicken+ | duck*)
               )>

<farm>
  <farmer>Farmer Maggot</farmer>
  <cat>Tiddles</cat>
  <duck>Donald</duck>
</farm>

DTD Example 4

mixed content (narrative XML)

<!ELEMENT paragraph (#PCDATA|name|profession|date|irony)*>

A <paragraph> element may contain any combination of <name>, <profession>, <date> or <irony> elements interspersed with parsed character data.

<paragraph>Today’s date is <date month="October" day="1"/> and <name>Stewart Massie</name>, a <profession>lecturer</profession> is delivering a <irony>scintillating</irony> XML lecture.</paragraph>
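
To see how the text and the child elements interleave in mixed content, the paragraph can be walked in document order. This is a rough sketch using Python's standard xml.etree.ElementTree; the helper name flatten is hypothetical, and a plain apostrophe and single-quoted attributes are used only to keep the string literal simple.

```python
import xml.etree.ElementTree as ET

PARAGRAPH = (
    "<paragraph>Today's date is <date month='October' day='1'/> and "
    "<name>Stewart Massie</name>, a <profession>lecturer</profession> "
    "is delivering a <irony>scintillating</irony> XML lecture.</paragraph>"
)

def flatten(element):
    """List the pieces of mixed content in document order."""
    pieces = []
    if element.text:
        # Character data before the first child element.
        pieces.append(("text", element.text))
    for child in element:
        pieces.append(("element", child.tag))
        if child.tail:
            # Character data that follows this child inside the parent.
            pieces.append(("text", child.tail))
    return pieces

pieces = flatten(ET.fromstring(PARAGRAPH))
for kind, value in pieces:
    print(kind, repr(value))
```

ElementTree stores the character data that follows a child element in that child's tail attribute, which is why the loop checks both text and tail.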

Defining attributes

• attributes are assigned to elements using the <!ATTLIST …> declaration

• ATTLIST defines
  – Which element the attribute belongs to
  – The name of the attribute
  – The values the attribute can take
  – Possible default values
  – Whether the attribute MUST be present or not

Attribute values

• In HTML all attributes are text

• DTDs support 10 attribute types

• Most common are:
  – CDATA (literal text)
  – ID (unique identifier)
  – NMTOKEN (“no whitespace”)
  – Enumeration (of all possible values)

Conditions on attributes

• #REQUIRED
  – the attribute must be given a value in the XML

• #IMPLIED
  – the attribute may be omitted from the XML

• #FIXED
  – the value of the attribute is fixed and defined in the DTD

• literal
  – a default value is supplied literally in the DTD

Example attribute declarations

<!ELEMENT pig (#PCDATA)>
<!ATTLIST pig weight CDATA #REQUIRED>
<!ATTLIST pig id_code ID #REQUIRED>
<!ATTLIST pig name NMTOKEN #IMPLIED>
<!ATTLIST pig sex (M | F) "F">
<!ATTLIST pig canFly CDATA #FIXED "no">

<pig weight="1000kg" id_code="pig017">Porky</pig>
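
Default and fixed values are filled in by the parser when the attribute is left out of the XML. The sketch below assumes Python's standard xml.parsers.expat module, which reads ATTLIST declarations from an internal DTD subset even though it does not validate; the DTD is placed inline for that reason, and the helper name is hypothetical.

```python
import xml.parsers.expat

PIG_XML = """<?xml version="1.0"?>
<!DOCTYPE pig [
  <!ELEMENT pig (#PCDATA)>
  <!ATTLIST pig weight  CDATA   #REQUIRED>
  <!ATTLIST pig id_code ID      #REQUIRED>
  <!ATTLIST pig name    NMTOKEN #IMPLIED>
  <!ATTLIST pig sex     (M | F) "F">
  <!ATTLIST pig canFly  CDATA   #FIXED "no">
]>
<pig weight="1000kg" id_code="pig017">Porky</pig>"""

def attributes_seen_by_parser(xml_text):
    """Map each element name to the attributes the parser reports for it."""
    seen = {}

    def on_start(name, attrs):
        # attrs holds the attributes written in the start tag plus any
        # defaults the parser took from the ATTLIST declarations.
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

pig = attributes_seen_by_parser(PIG_XML)["pig"]
print(pig)
```

The #IMPLIED attribute name has no default, so it simply does not appear when omitted, while sex and canFly are supplied from the DTD.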

entities

• used to represent text that would cause parsing problems

• &lt; represents <

• &amp; represents &

• &gt; represents >

• &quot; represents "

• &apos; represents '
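
When XML is generated by a program, these replacements are normally applied by a library rather than by hand. A small sketch, assuming the escape and unescape helpers in Python's standard xml.sax.saxutils module; the wrapper names to_xml_text and from_xml_text are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; the two quote characters are optional extras.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

def to_xml_text(raw):
    """Replace the five special characters with their predefined entities."""
    return escape(raw, QUOTES)

def from_xml_text(encoded):
    """Turn the predefined entities back into the characters they represent."""
    return unescape(encoded, UNQUOTES)

encoded = to_xml_text('Tom & Jerry say "1 < 2"')
print(encoded)
print(from_xml_text(encoded))
```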

defining entities

• <!ENTITY label "replacementText">

• <!ENTITY super "supercalifragilisticexpialidocious">

• now &super; is replaced in the XML (or in attribute values) by supercalifragilisticexpialidocious
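
The replacement can be observed with any parser that reads the internal DTD subset. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the message element and its word attribute are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY super "supercalifragilisticexpialidocious">
]>
<message word="&super;">That lecture was &super;!</message>"""

message = ET.fromstring(DOC)

# The parser has already expanded the entity reference in both places.
print(message.text)
print(message.get("word"))
```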

CDATA or PCDATA?

• PCDATA
  – Parsed Character DATA
  – will be parsed for entities

• CDATA
  – Character DATA
  – will NOT be parsed
  – CDATA sections are sometimes included in XML to include “literal” sections of code

Writing a CDATA section

<![CDATA[
Hi! I’m a CDATA section! I can include anything that would normally upset the parser:
<?<<< &&&;; ><></> hahahahahahaha!!!
The only thing I have to avoid is a double closing square bracket followed by a greater-than sign, which means the CDATA section has ended.
]]>
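
From the application's side, a CDATA section is just character data: the parser hands the content over untouched and creates no child elements. A short sketch, assuming Python's standard xml.etree.ElementTree; the code element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# The markup-like characters inside the CDATA section are not parsed.
DOC = '<code><![CDATA[if (a < b && b > c) { return "<ok/>"; }]]></code>'

code = ET.fromstring(DOC)
print(code.text)
print(len(code))  # number of child elements

# Writing the element back out uses entities instead of a CDATA section.
serialized = ET.tostring(code, encoding="unicode")
print(serialized)
```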

Validation of xml

• Validation means checking that an XML document conforms to its DTD

• Adds security to automatic processing

• Allows free machine-machine exchange of XML

• Applied before manipulating XML
  – See XSLT, SAX, DOM later

Well-formed vs valid

• Well-formed XML
  – The data obeys the XML syntax rules

• Valid XML
  – The data is well-formed XML
  – The data has a DTD
  – The data conforms to the DTD

• XML data may be well-formed but invalid

xml parser types

• validating parser
  – checks XML is well-formed (conforms to the XML specification)
  – checks XML is valid (has and matches a DTD)

• non-validating parser
  – only checks XML is well-formed
  – may pass invalid XML
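
The difference shows up as soon as a non-validating parser meets a document that breaks its own DTD. The sketch below assumes Python's standard xml.etree.ElementTree, which is built on the non-validating expat parser; the sample documents and the helper name is_well_formed are illustrative.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid: the DTD allows only text inside <message>,
# yet the document nests a <b> element there.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
]>
<message>Welcome to <b>XML</b>!</message>"""

# Not well-formed: the <b> element is never closed.
NOT_WELL_FORMED = "<message>Welcome to <b>XML!</message>"

def is_well_formed(xml_text):
    """True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(INVALID_BUT_WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first document is accepted even though it does not match its DTD; catching that error would need a validating parser, which Python's standard library does not provide.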