Extensible Markup Language Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University

Extensible Markup Language

Natawut Nupairoj, Ph.D.

Department of Computer Engineering, Chulalongkorn University

Outline

Overview. Basic XML Syntax. User-Defined XML Structure: Document Type Definition.

Overview

What is a markup language? An old style of communication for editing, between writer and editor. Example:

This are is a mark-up. (The editor strikes out "are".)

We can ^ more text. (The editor writes "add" above the caret.)

Sometimes called a “metalanguage”.

Overview

Family of Computer Markup Languages

Standard Generalized Markup Language (SGML): Father of them all. Complex.

HyperText Markup Language (HTML): The most popular child. Focus on presentation: for humans.

Overview

Extensible Markup Language (XML): Becoming increasingly popular. Similar to HTML. Focuses on describing data, for both humans and machines.

Extensible: a language for creating other languages. Base syntax. User-defined structure.

Example

<?xml version="1.0" encoding="UTF-8"?>
<endangered_species>
  <animal>
    <name language="English">Tiger</name>
    <name language="Latin">Panthera tigris</name>
    <threats>
      <threat>poachers</threat>
      <threat>habitat destruction</threat>
      <threat>trade in tiger bones for traditional Chinese medicine (TCM)</threat>
    </threats>
    <weight>500 pounds</weight>
    <length>3 yards from nose to tail</length>
    ...
</endangered_species>
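As a rough illustration of how a program might read a document like this one, the sketch below uses Python's standard xml.etree.ElementTree module. The document string is a shortened variant of the example with the elided parts simply left out, and the helper name names_by_language is hypothetical, so treat this as a sketch under those assumptions rather than a reference implementation.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the example document (assumption: elided parts omitted).
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<endangered_species>
  <animal>
    <name language="English">Tiger</name>
    <name language="Latin">Panthera tigris</name>
    <threats>
      <threat>poachers</threat>
      <threat>habitat destruction</threat>
    </threats>
    <weight>500 pounds</weight>
  </animal>
</endangered_species>"""

# Passing bytes lets the parser honor the encoding named in the declaration.
root = ET.fromstring(DOC.encode("utf-8"))


def names_by_language(animal):
    """Map each <name> element's language attribute to its text (hypothetical helper)."""
    return {n.get("language"): n.text for n in animal.findall("name")}


animal = root.find("animal")
names = names_by_language(animal)
threats = [t.text for t in animal.iter("threat")]
weight = animal.findtext("weight")
```

The element names are data-oriented, so the program can ask for "the Latin name" or "all threats" directly, which is the sense in which XML serves machines as well as humans.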

XML Siblings

XML Structure Definition: Document Type Definition (DTD), XML Schema.

XML Parser: DOM, SAX.

XML-related technologies: XSLT, XPath.
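To make the two parser styles a little more concrete, here is a minimal sketch using Python's standard library: xml.dom.minidom builds the whole tree in memory (DOM style), while xml.sax delivers a stream of events to a handler (SAX style). The class name NameCollector and the small document string are assumptions made for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = (
    "<animal>"
    '<name language="English">Tiger</name>'
    '<name language="Latin">Panthera tigris</name>'
    "</animal>"
)

# DOM style: parse everything first, then navigate the in-memory tree.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]


# SAX style: react to start/characters/end events while the parser streams.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # collects text only while inside a <name>

    def startElement(self, tag, attrs):
        if tag == "name":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._buffer = None


handler = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names
```

Both approaches recover the same names; the DOM version is shorter to write, while the SAX version never holds the whole document in memory.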

XML Components

Element: tag and content. Data.

<name>Tiger</name>

XML Components

Attribute: name and value. Metadata = data about data.

<name language="English">Tiger</name>

<name>
  <language>English</language>
  <text>Tiger</text>
</name>
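The two encodings above carry the same information but are read differently by a program. A small sketch with Python's xml.etree.ElementTree, assuming the two fragments are parsed on their own:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<name language="English">Tiger</name>')
as_children = ET.fromstring(
    "<name><language>English</language><text>Tiger</text></name>"
)

# Attribute form: metadata comes from get(), data from the element text.
lang_a, text_a = as_attribute.get("language"), as_attribute.text

# Child-element form: both pieces are looked up as nested elements.
lang_c, text_c = as_children.findtext("language"), as_children.findtext("text")
```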

XML Components

Nested element

<animal>
  <name language="English">Tiger</name>
  <name language="Latin">Panthera tigris</name>
  <weight>500 pounds</weight>
</animal>

XML Components

Empty element

<animal></animal>

<animal />

<picture filename="tiger.jpg" />
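The long and short forms of an empty element are equivalent to a parser. The following sketch checks this with Python's xml.etree.ElementTree; the helper is_empty is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<animal></animal>")
short_form = ET.fromstring("<animal />")
picture = ET.fromstring('<picture filename="tiger.jpg" />')


def is_empty(element):
    """True when the element has no text and no child elements."""
    return element.text is None and len(element) == 0


# Once parsed, the two spellings are indistinguishable.
same_serialization = ET.tostring(long_form) == ET.tostring(short_form)
```

An empty element can still carry attributes, as the picture example shows.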

XML Components

Special symbols

&amp; for ampersand (&).
&lt; for less-than sign (<).
&gt; for greater-than sign (>).
&quot; for double quotation mark (").
&apos; for single quotation mark or apostrophe (').

<weight>&lt;500 pounds</weight>
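In practice these replacements are usually left to a library. As a sketch, Python's xml.sax.saxutils.escape handles &, < and > by default and accepts an extra mapping for the two quotation marks; the sample string used for escaped_all is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<500 pounds"
escaped = escape(raw)  # replaces &, < and > with entity references

# The parser turns the entity reference back into the original character.
element = ET.fromstring(f"<weight>{escaped}</weight>")
parsed_text = element.text

# Quotation marks are only escaped when asked for explicitly.
quote_entities = {'"': "&quot;", "'": "&apos;"}
sample = "a & b < c > d \"e\" 'f'"
escaped_all = escape(sample, quote_entities)
round_trip = ET.fromstring(f"<t>{escaped_all}</t>").text
```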

XML Components

Comment

<!-- This is a comment. It can span multiple lines. -->

Basic XML Syntax

All XML files/applications must conform to basic XML syntax. The XML declaration is not required (but recommended).

<?xml version="1.0"?>
<endangered_species>
  <name>Tiger</name>
</endangered_species>

Basic XML Syntax

One and only one root element.

<?xml version="1.0"?>
<endangered_species>
  <name>Tiger</name>
</endangered_species>

Basic XML Syntax

Balanced and matched opening/closing tags.

<?xml version="1.0"?>
<endangered_species>
  <name>Tiger</name>
  <picture filename="tiger.jpg" />
</endangered_species>

Basic XML Syntax

Case-sensitive. The start tag and end tag below do not match, so this is not well-formed:

<name>Tiger</Name>

Case-sensitive.

<picture filename="tiger.jpg" />
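The basic syntax rules can be checked mechanically, since any conforming parser rejects input that breaks them. The sketch below wraps Python's xml.etree.ElementTree in a hypothetical helper, is_well_formed, and tries it on one good document and several rule violations; the sample strings are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = (
    "<endangered_species><name>Tiger</name>"
    '<picture filename="tiger.jpg" /></endangered_species>'
)
two_roots = "<name>Tiger</name><name>Panthera tigris</name>"  # more than one root
unbalanced = "<endangered_species><name>Tiger</endangered_species>"  # <name> never closed
wrong_case = "<name>Tiger</Name>"  # names differ in case
unquoted = "<picture filename=tiger.jpg />"  # attribute value not quoted

results = {
    "good": is_well_formed(good),
    "two_roots": is_well_formed(two_roots),
    "unbalanced": is_well_formed(unbalanced),
    "wrong_case": is_well_formed(wrong_case),
    "unquoted": is_well_formed(unquoted),
}
```

Only the first document passes. This checks well-formedness only; it says nothing about whether the tags used are the ones a particular application expects.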

User-Defined XML Structure

XML basic syntax: The pattern of all XML documents. Says nothing about “structure”. Follows the basic syntax = well-formed document.

User-defined XML structure: Which “tags” and “attributes” are allowed. Describes the structure. Follows the “structure” = valid document.

Parser and DTD

[Diagram: an XML document and a DTD go into the XML parser, which answers Yes/No.]

The parser checks its input against the basic syntax and the DTD.

Document Type Definition (DTD)

Old-fashioned, simple, but widely used.

Internal DTD.

<?xml version="1.0"?>
<!DOCTYPE endangered_species [...]>
<endangered_species>
<animal>
...

Document Type Definition (DTD)

External DTD.

<?xml version="1.0" standalone="no"?>
<!DOCTYPE endangered_species SYSTEM "http://www.natawut.com/xml/my_xml.dtd">
<endangered_species>
<animal>
...

Defining Elements

<!ELEMENT endangered_species (animal)>

<!ELEMENT picture EMPTY>

<!ELEMENT endangered_species ANY>

Defining Elements

<!ELEMENT name (#PCDATA)>

<!ELEMENT weight (#PCDATA)>

<!ELEMENT threat (#PCDATA)>

<name language="English">Tiger</name>

<weight>500 pounds</weight>

...

Defining Elements

<!ELEMENT animal (name, threats, weight, length, source, picture, subspecies)>

<animal>
  <name language="English">Tiger</name>
  <threats>
    <threat>poachers</threat>
  </threats>
  <weight>500 pounds</weight>
  ...
</animal>

Defining Elements

<!ELEMENT characteristics ((weight, length) | picture)>

<characteristics>
  <weight>500 pounds</weight>
  <length>3 yards from nose to tail</length>
</characteristics>

<characteristics>
  <picture filename="tiger.jpg"/>
</characteristics>

Defining Elements

<!ELEMENT animal (name+, threats, weight?, length?, source, picture, subspecies*)>

<!ELEMENT threats (threat, threat, threat+)>
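The occurrence indicators behave much like regular-expression operators: + means one or more, ? means optional, * means zero or more, and a comma means "followed by". The sketch below leans on that analogy by hand-translating the two declarations above into Python regular expressions over the sequence of child tag names. This is not a DTD validator (Python's standard library does not include one); the names CONTENT_MODELS and matches_model are hypothetical, and only the direct children of one element are checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models. Each child tag is written as "tag," so the
# sequence of children can be matched as a single string.
CONTENT_MODELS = {
    # (threat, threat, threat+): at least three threat children
    "threats": r"(threat,){2}(threat,)+",
    # (name+, threats, weight?, length?, source, picture, subspecies*)
    "animal": (
        r"(name,)+(threats,)(weight,)?(length,)?"
        r"(source,)(picture,)(subspecies,)*"
    ),
}


def matches_model(element):
    """Check an element's direct children against its hand-written model."""
    children = "".join(child.tag + "," for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], children) is not None


two_threats = ET.fromstring(
    "<threats><threat>poachers</threat>"
    "<threat>habitat destruction</threat></threats>"
)
three_threats = ET.fromstring(
    "<threats><threat>poachers</threat><threat>habitat destruction</threat>"
    "<threat>trade in tiger bones</threat></threats>"
)
# weight, length and subspecies are optional, so they may be left out.
minimal_animal = ET.fromstring(
    "<animal><name>Tiger</name><threats/><source/><picture/></animal>"
)
# name+ requires at least one name, so this one should be rejected.
nameless_animal = ET.fromstring(
    "<animal><threats/><source/><picture/></animal>"
)
```

Under this translation, two_threats is rejected and three_threats is accepted, which matches the reading of (threat, threat, threat+) as "three or more".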

Defining Attributes

<!ELEMENT population (#PCDATA)>

<!ATTLIST population year CDATA #IMPLIED>

<population>445</population>

<population year="2002">445</population>

<population year="year-rabbit">445</population>

Defining Attributes

<!ELEMENT population (#PCDATA)>

<!ATTLIST population year CDATA #REQUIRED>

<!ELEMENT population (#PCDATA)>

<!ATTLIST population year (2002|2003) #REQUIRED>

<!ELEMENT population (#PCDATA)>

<!ATTLIST population year CDATA "2002">

Defining Attributes

<!ELEMENT population (#PCDATA)>

<!ATTLIST population year CDATA #FIXED "2002">
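One visible effect of an attribute declaration is defaulting: when a default value is declared in an internal DTD, the parser supplies it for elements that omit the attribute. The sketch below shows this with Python's xml.parsers.expat. The wrapper element census and the function collect_years are hypothetical names introduced for the example. Expat is a non-validating parser, so it applies defaults but does not enforce #REQUIRED, enumerated values, or #FIXED.

```python
import xml.parsers.expat

# Internal DTD declaring a default of "2002" for the year attribute.
# The <census> wrapper is an assumption added so the document has one root.
DOC = """<?xml version="1.0"?>
<!DOCTYPE census [
  <!ELEMENT census (population*)>
  <!ELEMENT population (#PCDATA)>
  <!ATTLIST population year CDATA "2002">
]>
<census>
  <population>445</population>
  <population year="2003">445</population>
</census>"""


def collect_years(text):
    """Return the year attribute seen on each <population> start tag."""
    years = []

    def start_element(name, attrs):
        if name == "population":
            years.append(attrs.get("year"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return years


years = collect_years(DOC)
```

The first population element has no year in the document, yet the handler still receives the declared default for it; the second keeps its explicit value.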

Putting Them Together

<!ELEMENT endangered_species (animal*)>

<!ELEMENT animal (name+, threats, weight?, length?, source, picture, subspecies+)>

<!ELEMENT name (#PCDATA)>

<!ATTLIST name language (English | Latin) #IMPLIED>

...