SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.


Page 1: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

SNU OOPSLA Lab.

Logical structure

© copyright 2001 SNU OOPSLA Lab.

Page 2: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Contents

• Concepts
• DTD Structure
• Element Declaration
• Attribute Declarations
• Parameter Entities
• Conditional Sections
• Notation Declarations
• DTD Processing Issues

Page 3: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Concepts of DTD(1)

• DTD (Document Type Definition)
– An optional but powerful feature of XML
– Comprises a set of declarations that define a document structure tree
– Some XML processors read the DTD and use it to build the document model in memory
– Establishes formal document structure rules
• It defines the elements and dictates where they may be used in relation to each other

Page 4: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Concepts of DTD(2)

• Declare Vs. Define

– Declare “This document is a concert poster”

– Define “A concert poster must have the following features”

• A DTD defines

– Element types + Attributes + Entities

• Valid vs. Invalid

– Valid: conforms to the DTD

– Invalid: fails to conform to the DTD

(Diagram: the set of valid XML documents lies inside the set of well-formed XML documents)

Page 5: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Valid & Invalid Documents

(Assuming the DTD declares <!ELEMENT GREETING (#PCDATA)>)

• Valid:

<GREETING>various random text but no markup</GREETING>

• Invalid: anything else, including

<GREETING>
  <sometag>various random text</sometag>
  <someEmptyTag/>
</GREETING>

– or

<GREETING>
  <GREETING>various random text</GREETING>
</GREETING>

Page 6: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

DTD structure

• A DTD is composed of a number of declarations
– ELEMENT (tag definition)
– ATTLIST (attribute definitions)
– ENTITY (entity definition)
– NOTATION (data type notation definition)

• DTD can be stored in an external subset or an internal subset

Page 7: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Internal and External Subset(1)

• Internal subset
– Form: <!DOCTYPE … [ <!-- Internal Subset --> … ]>
– Pros
• Easy to write XML
• The document and its DTD can be edited together, without moving between two files
– Cons
• Other documents can't reuse the declarations without copying the internal subset

Page 8: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Internal and External Subset(2)

• External subset
– It is generally better to use external DTDs
– Why?
• Many benefits
– document management
– updating
– editing
• A few reasons
– If you use an external DTD, you can use public DTDs (capability)
– External DTDs provide for better document management
– External DTDs make it easier to validate your document
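
As a minimal sketch of what using an external subset looks like, the document below points at a separate DTD file with a SYSTEM identifier. The file name greeting.dtd and its assumed contents (shown in the comment) are illustrative, not taken from the slides at this point.

```xml
<?xml version="1.0"?>
<!-- Assumption: greeting.dtd sits next to this file and contains
     the single declaration <!ELEMENT GREETING (#PCDATA)> -->
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>Hello XML!</GREETING>
```

A public DTD is referenced with the PUBLIC keyword, which adds a public identifier in front of the system identifier. For example, XHTML 1.0 Strict documents use <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">.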

Page 9: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Element Declarations

• Used to define a new element: each declaration gives the element's name and its content model (the content it is allowed to have)
• Each element type must be declared in an <!ELEMENT> declaration.
• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

Element type declaration:

'<!ELEMENT' S Name S contentspec S? '>'

Page 10: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Content Specifications

• ANY
• #PCDATA
• Sequences
• Choices
• Mixed Content
• Modifiers
• Empty

Page 11: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

ANY

• A SEASON can contain any child element and/or raw text (parsed character data)

• Rarely used in practice, due to the lack of constraint on structure it encourages.

<!ELEMENT SEASON ANY>

Page 12: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

#PCDATA

• Parsed Character Data; i.e. raw text, no markup

• Represents normal character data. The keyword is preceded by the hash symbol, ‘#’, to avoid confusion with an element of the same name when used within a model group (for example, ‘(#PCDATA | PCDATA)*’)

<!ELEMENT YEAR (#PCDATA)>

Page 13: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Use of #PCDATA in XML

• Valid:

<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine</YEAR>

• Invalid:

<YEAR>
  <MONTH>January</MONTH>
  <MONTH>February</MONTH>
  <MONTH>March</MONTH>
  <MONTH>April</MONTH>
  <MONTH>May</MONTH>
  <MONTH>June</MONTH>
  <MONTH>July</MONTH>
  <MONTH>August</MONTH>
  <MONTH>September</MONTH>
  <MONTH>October</MONTH>
  <MONTH>November</MONTH>
  <MONTH>December</MONTH>
</YEAR>

Page 14: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Child Elements

• To declare that a LEAGUE element must have a LEAGUE_NAME child:

<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>

Page 15: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Sequences(1)

• Separate multiple required child elements with commas; e.g.

<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>

• One or More Children: +

<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>

Page 16: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Sequences(2)

• Zero or More Children: *

<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>

• Choices: exactly one of the alternatives separated by |

<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>

– or, with three alternatives

<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
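
To see how the sequence and occurrence declarations constrain an instance, here is a minimal sketch that places the DIVISION and TEAM declarations in an internal subset. PLAYER is not declared on the slides, so its (#PCDATA) content is an assumption, and all text values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE DIVISION [
  <!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
  <!ELEMENT DIVISION_NAME (#PCDATA)>
  <!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
  <!ELEMENT TEAM_CITY (#PCDATA)>
  <!ELEMENT TEAM_NAME (#PCDATA)>
  <!-- Assumption: PLAYER holds plain text -->
  <!ELEMENT PLAYER (#PCDATA)>
]>
<DIVISION>
  <DIVISION_NAME>East</DIVISION_NAME>
  <!-- TEAM+ requires at least one TEAM; PLAYER* allows none -->
  <TEAM>
    <TEAM_CITY>City A</TEAM_CITY>
    <TEAM_NAME>Team A</TEAM_NAME>
  </TEAM>
  <TEAM>
    <TEAM_CITY>City B</TEAM_CITY>
    <TEAM_NAME>Team B</TEAM_NAME>
    <PLAYER>Player One</PLAYER>
    <PLAYER>Player Two</PLAYER>
  </TEAM>
</DIVISION>
```

Swapping TEAM_CITY and TEAM_NAME, or omitting every TEAM, would make the document invalid, because a sequence fixes both the order and the presence of its members.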

Page 17: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Grouping With Parentheses

• Parentheses combine several elements into a single group.

• A parenthesized group can be nested inside other parentheses in place of a single element.

• A parenthesized group can be suffixed with an asterisk, a plus sign, or a question mark.

<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
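
As a small illustration of a repeated group, the sketch below validates a definition list against the dl declaration. The dt and dd declarations are assumptions added so that the example is complete.

```xml
<?xml version="1.0"?>
<!DOCTYPE dl [
  <!ELEMENT dl (dt, dd)*>
  <!-- Assumption: terms and definitions hold plain text -->
  <!ELEMENT dt (#PCDATA)>
  <!ELEMENT dd (#PCDATA)>
]>
<dl>
  <dt>DTD</dt>
  <dd>Document Type Definition</dd>
  <dt>PCDATA</dt>
  <dd>Parsed character data</dd>
</dl>
```

Because the asterisk applies to the whole (dt, dd) group, every dt must be followed by exactly one dd; an empty dl is also valid.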

Page 18: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Mixed Content

• Both #PCDATA and child elements, combined in a choice
• #PCDATA must come first
• #PCDATA cannot be used in a sequence
• The choice group must be followed by an asterisk

<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>

Empty elements

<!ELEMENT BR EMPTY>
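
The following sketch shows an instance that uses both mixed content and an empty element. Allowing BR inside PLAYER is an assumption made for illustration, and the text values are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEAM [
  <!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>
  <!ELEMENT TEAM_CITY (#PCDATA)>
  <!ELEMENT TEAM_NAME (#PCDATA)>
  <!-- Assumption: PLAYER is itself mixed content that may contain BR -->
  <!ELEMENT PLAYER (#PCDATA | BR)*>
  <!ELEMENT BR EMPTY>
]>
<TEAM>The <TEAM_CITY>City A</TEAM_CITY> <TEAM_NAME>Team A</TEAM_NAME>
roster begins with <PLAYER>Player One<BR/>Pitcher</PLAYER>.</TEAM>
```

Mixed content restricts which child elements may appear, but not their order or number, which is why text and elements can be interleaved freely here.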

Page 19: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Attribute Declarations

• Consider this element:

<GREETING LANGUAGE="Spanish"> Hola!</GREETING>

• It is declared like this:

<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">

• General form:

<!ATTLIST Element_name Attribute_name Type Default_value>
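
A minimal sketch of the default value at work follows. The GREETINGS wrapper element is hypothetical and exists only so that two greetings fit in one document.

```xml
<?xml version="1.0"?>
<!DOCTYPE GREETINGS [
  <!-- Hypothetical wrapper element -->
  <!ELEMENT GREETINGS (GREETING+)>
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>
<GREETINGS>
  <GREETING LANGUAGE="Spanish">Hola!</GREETING>
  <!-- No LANGUAGE given: a processor that reads the DTD
       supplies LANGUAGE="English" -->
  <GREETING>Hello!</GREETING>
</GREETINGS>
```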

Page 20: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Multiple Attribute Declarations

• Consider this element:

<RECTANGLE LENGTH="70px" WIDTH="85px"/>

• With two attribute declarations:

<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">

• With one attribute declaration:

<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">

• Indentation is a convention, not a requirement

Page 21: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Attribute Types

• CDATA
• ID
• IDREF
• IDREFS
• ENTITY
• ENTITIES
• NOTATION
• NMTOKEN
• NMTOKENS
• Enumerated

Page 22: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

CDATA

• Most general attribute type
• Value can be any string of text not containing a literal less-than sign (<) or the quotation mark that delimits the value; these can be written as &lt; and &quot; instead

Page 23: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

ID

• Value must be an XML name
– Must begin with a letter or underscore
– May include letters, digits, underscores, hyphens, and periods
– May not include whitespace
– May contain colons only if used for namespaces
• Value must be unique among all ID type attributes in the document
• The default must be #REQUIRED or #IMPLIED; #REQUIRED is the usual choice

Page 24: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

IDREF

• Value matches the ID of an element in the same document

• Used for links and the like

IDREFS

• A list of ID values in the same document
• Separated by white space
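
The sketch below shows ID, IDREF, and IDREFS attributes working together. All element and attribute names (ROSTER, PID, WHO, and so on) are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE ROSTER [
  <!ELEMENT ROSTER (PLAYER+, CAPTAIN, PITCHERS)>
  <!ELEMENT PLAYER (#PCDATA)>
  <!ATTLIST PLAYER PID ID #REQUIRED>
  <!ELEMENT CAPTAIN EMPTY>
  <!ATTLIST CAPTAIN WHO IDREF #REQUIRED>
  <!ELEMENT PITCHERS EMPTY>
  <!ATTLIST PITCHERS WHO IDREFS #REQUIRED>
]>
<ROSTER>
  <PLAYER PID="p1">Player One</PLAYER>
  <PLAYER PID="p2">Player Two</PLAYER>
  <PLAYER PID="p3">Player Three</PLAYER>
  <!-- IDREF: must match exactly one ID in this document -->
  <CAPTAIN WHO="p1"/>
  <!-- IDREFS: white-space-separated list of IDs -->
  <PITCHERS WHO="p2 p3"/>
</ROSTER>
```

A validating parser would reject this document if two PLAYER elements shared a PID, or if WHO named an ID that does not exist.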

Page 25: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

ENTITY

• Value is the name of an unparsed general entity declared in the DTD

ENTITIES

• Value is a list of unparsed general entities declared in the DTD
• Separated by white space

Page 26: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

NOTATION

• Value is the name of a notation declared in the DTD

<!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">

<!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>

(Figure: the notation Tex identifies the helper application TEXVIEW.EXE, which handles the unparsed entity LOGO.TEX)
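
To connect the notation and entity declarations to attribute types, here is a sketch that uses them through an ENTITY attribute and a NOTATION attribute. The DOCUMENT, IMAGE, and FORMULA elements and their attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE DOCUMENT [
  <!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
  <!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
  <!ELEMENT DOCUMENT (IMAGE, FORMULA)>
  <!ELEMENT IMAGE EMPTY>
  <!-- ENTITY type: value names an unparsed entity -->
  <!ATTLIST IMAGE SOURCE ENTITY #REQUIRED>
  <!ELEMENT FORMULA (#PCDATA)>
  <!-- NOTATION type: value must be one of the listed notations -->
  <!ATTLIST FORMULA FORMAT NOTATION (Tex) #REQUIRED>
]>
<DOCUMENT>
  <IMAGE SOURCE="Logo"/>
  <FORMULA FORMAT="Tex">E = mc^2</FORMULA>
</DOCUMENT>
```

The parser does not read LOGO.TEX itself; it only reports the entity and its notation to the application, which decides what to do with them.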

Page 27: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

NMTOKEN

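• Value is a name token: built from the same characters as an XML name, but it may begin with any of them (for example, a digit)

NMTOKENS

• Value is a list of name tokens
• Separated by white space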
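
A short sketch of both types follows, using a hypothetical ITEM element. Note that a name token, unlike an XML name, may begin with a digit.

```xml
<?xml version="1.0"?>
<!DOCTYPE ITEM [
  <!ELEMENT ITEM (#PCDATA)>
  <!ATTLIST ITEM CODE NMTOKEN  #REQUIRED
                 TAGS NMTOKENS #IMPLIED>
]>
<!-- "2001-A" starts with a digit, which NMTOKEN allows -->
<ITEM CODE="2001-A" TAGS="xml dtd 34-slides">Logical structure</ITEM>
```

A value such as CODE="2001 A" would be invalid, because a single name token cannot contain white space.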

Page 28: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Enumerated

• Not a keyword: the type is written as the list of values itself
• Refers to a list of possible values from which one must be chosen
• Default value is generally provided explicitly

<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">

Page 29: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Attribute Default Values

• A literal string value
• One of these three keywords
– #REQUIRED
– #IMPLIED
– #FIXED

Page 30: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

#REQUIRED

• No default value is provided in the DTD
• Document authors must provide the attribute value for each element

<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>

Page 31: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

#IMPLIED

• No default value in the DTD
• Author may (but does not have to) provide a value with each element

Page 32: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

#FIXED

• Value is the same for all elements
• Default value must be provided in the DTD
• Document author may not change the default value

<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
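
To make the three keywords concrete, here is a sketch of an instance checked against the AUTHOR declarations. The name and e-mail address are placeholder values.

```xml
<?xml version="1.0"?>
<!DOCTYPE AUTHOR [
  <!ELEMENT AUTHOR EMPTY>
  <!ATTLIST AUTHOR NAME      CDATA #REQUIRED>
  <!ATTLIST AUTHOR EMAIL     CDATA #REQUIRED>
  <!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
  <!ATTLIST AUTHOR COMPANY   CDATA #FIXED "TIC">
]>
<!-- Valid: both #REQUIRED attributes are present, the #IMPLIED one
     is omitted, and COMPANY is supplied by the DTD as "TIC" -->
<AUTHOR NAME="Jane Doe" EMAIL="jane@example.com"/>
```

Leaving out EMAIL, or writing COMPANY="Other", would make the document invalid. Writing COMPANY="TIC" explicitly is allowed, since it matches the fixed value.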

Page 33: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Example of Internal DTDs

<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>

Page 34: SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.

Internal DTD Subsets

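• Internal declarations take precedence over external declarations for entities and attribute lists (the first declaration read is the one used, and the internal subset is read first)
• An element type may be declared only once, so the external greeting.dtd below must not declare GREETING again

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>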
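
Entity declarations show the precedence rule most clearly. In this sketch the contents of greeting.dtd are an assumption (given in the comment), and the entity name salutation is hypothetical.

```xml
<?xml version="1.0"?>
<!-- Assumption: greeting.dtd contains
     <!ELEMENT GREETING (#PCDATA)>
     <!ENTITY salutation "Hello"> -->
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!-- Read before the external subset, so this declaration is used -->
  <!ENTITY salutation "Hola">
]>
<GREETING>&salutation; XML!</GREETING>
```

Under these assumptions the content of GREETING expands to "Hola XML!", because the internal declaration is read first and the later external one is ignored.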