XML - Why: The HTML-Dilemma HTML, SGML, XML - How: Syntax, Concept, Language Elements Basics...


XML

- Why: The HTML-Dilemma
  • HTML, SGML, XML

- How: Syntax, Concept, Language Elements
  • Basics
  • Well-formed XML-Documents (without DTD)
  • Valid XML-Documents (with DTD)
  • Attributes, Entities, Style Sheets
  • More concepts from the "XML family"


The HTML-Dilemma

HTML - a language to mark up documents

<H1>Heading 1</H1><H2>Heading 2</H2><p>paragraph<p>...


The HTML-Dilemma

HTML is ...

simple

...but unfortunately...

Extensibility: No semantic markup

Structure: No complex structures beyond layout

Validity: Structural weakness


SGML

SGML - Rules to define markup languages

+ Metalanguage: Highly flexible

+ Architecture to process data on different media without losing the structure of the data

- Complexity (for users and programmers)


XML: The Language Concept

What is XML?

Extensible Markup Language (XML) is a text-based meta-markup language which allows you to define an unlimited number of markup languages based upon the standards defined by XML.

Rather than providing a set of pre-defined tags, as with HTML, XML specifies the standards with which you can define your own markup languages with their own sets of tags.


XML: The Language Concept

XML is, like SGML, based upon the idea of structured markup of data.

[Figure: diagram with the labels structure, layout, presentation, content]


XML: The Language Concept

• Tags and attributes can be defined individually

• Document structure in any complexity can be described

• XML-documents can - but don't have to - contain a formal description of their grammar

HTML:

<P> <strong>Bosak, Jon</strong> XML, Java, and the future of the Web </P>

XML:

<?xml version="1.0"?>
<ARTICLE>
  <AUTHOR>Bosak, Jon</AUTHOR>
  <TITLE>XML, Java, and the future of the Web </TITLE>
</ARTICLE>
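Because the XML version names the author and the title explicitly, a program can ask for them by name instead of guessing from layout tags. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser; the variable names are illustrative and not part of the slides:

```python
import xml.etree.ElementTree as ET

# The XML variant of the example above, held in a string.
ARTICLE_XML = """<ARTICLE>
<AUTHOR>Bosak, Jon</AUTHOR>
<TITLE>XML, Java, and the future of the Web </TITLE>
</ARTICLE>"""

article = ET.fromstring(ARTICLE_XML)

# Semantic tags let us request "the author" rather than "the bold text".
author = article.findtext("AUTHOR")
title = article.findtext("TITLE").strip()  # drop the trailing blank

print(author)
print(title)
```

In the HTML variant the same query is not possible: nothing marks "Bosak, Jon" as an author, only as strong text.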


XML: The Language Concept

XML consists of tags <TAG>content</TAG>

...that are nested <TAG><OneMoreTag>content</OneMoreTag></TAG>

...and that constitute an XML-document, if some well-formedness rules are met.

<?xml version="1.0"?>


Well-formed documents

• Every open tag must explicitly be closed

• Empty elements (<IMG> in HTML) are written in XML as <IMG/> or closed explicitly as <IMG></IMG>

• Attribute values must be put in quotation marks: <?xml version="1.0"?>

• Child markup must nest completely within parent markup, i.e. markup needs to be completely hierarchical (as in SGML)

• No markup characters (< or &) may appear literally in text; write them as &lt; and &amp;. Without a DTD, all attributes are treated as CDATA

• You should declare your XML version at the start: <?xml version="1.0"?>
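The rules above can be tried out with any XML parser, since a conforming parser must reject input that is not well-formed. A minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule: some obey it, some break it.
samples = {
    "closed tags": "<A><B>text</B></A>",
    "empty element": "<A><IMG/></A>",
    "unclosed tag": "<A><B>text</A>",
    "unquoted attribute": "<A b=1/>",
    "overlapping tags": "<A><B></A></B>",
    "raw ampersand": "<A>fish & chips</A>",
    "escaped ampersand": "<A>fish &amp; chips</A>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Note that this only checks well-formedness; it says nothing about validity against a DTD.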


Well-formed document "ORDER"

<?xml version="1.0" ?>
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME>
    <DATE>02.10.1998</DATE>
    <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION>cd rom drive</DESCRIPTION>
      <ARTICLE-NO>123456</ARTICLE-NO>
      <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      <DESCRIPTION>monitor</DESCRIPTION>
      <ARTICLE-NO>9876</ARTICLE-NO>
      <AMOUNT>1</AMOUNT>
    </ITEM>
  </BODY>
</ORDER>


XML Basics

XML provides rules for defining markup languages. There are two ways of defining these rules (i.e. the grammar of a particular markup language):

• XML-documents are well-formed if they conform to the basic syntax requirements.

• XML-documents can contain an explicit definition of required/allowed tags and their structure, i.e. a Document Type Definition (DTD). XML-documents that conform to a DTD are valid.


Valid document "Order"

<?xml version="1.0" ?>
<!DOCTYPE ORDER SYSTEM "ORDER.DTD">
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME>
    <DATE>02.10.1998</DATE>
    <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION>cd rom drive</DESCRIPTION>
      <ARTICLE-NO>123456</ARTICLE-NO>
      <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      <DESCRIPTION>monitor</DESCRIPTION>
      <ARTICLE-NO>9876</ARTICLE-NO>
      <AMOUNT>1</AMOUNT>
    </ITEM>
  </BODY>
</ORDER>


DTD of valid document "Order" (ORDER.DTD)

<!ELEMENT ORDER (HEAD, BODY)>
<!ELEMENT HEAD (NAME, DATE, E-MAIL)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT E-MAIL (#PCDATA)>
<!ELEMENT BODY (ITEM)+>
<!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ELEMENT ARTICLE-NO (#PCDATA)>
<!ELEMENT AMOUNT (#PCDATA)>


Declaration of elements in a DTD

Elements can contain other elements or character data

<!ELEMENT HEAD (NAME, DATE, E-MAIL)>

<!ELEMENT NAME (#PCDATA)>

Elements can have mixed content

<!ELEMENT a (#PCDATA | b | c)*>

Elements can be defined as mandatory, optional, etc.

<!ELEMENT a (b, c?, (d|e)+, f*)>
<!ELEMENT e-mail (address, cc*, message, signature?)>
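The occurrence markers in a content model (? for optional, + for one or more, * for zero or more, | for choice, comma for sequence) behave much like a regular expression over the sequence of child element names. The following Python sketch makes that analogy concrete. It is an illustration only, not how a real validating parser is implemented; the function names model_to_regex and children_match are hypothetical, and mixed content with #PCDATA is deliberately not handled:

```python
import re

# Simplified pattern for an XML name (letters, digits, '-', '.', ':', '_').
NAME = re.compile(r"[A-Za-z_:][\w.:\-]*")


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model into a regex over child names."""
    text = re.sub(r"\s+", "", model)
    parts = []
    i = 0
    while i < len(text):
        match = NAME.match(text, i)
        if match:
            # Each child name is matched together with a trailing blank.
            parts.append("(?:" + re.escape(match.group()) + " )")
            i = match.end()
        elif text[i] == "(":
            parts.append("(?:")
            i += 1
        elif text[i] in ")|?*+":
            parts.append(text[i])
            i += 1
        elif text[i] == ",":
            i += 1  # a sequence is plain concatenation in a regex
        else:
            raise ValueError(f"unsupported character: {text[i]!r}")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(sequence) is not None


MODEL_A = "(b, c?, (d|e)+, f*)"
print(children_match(MODEL_A, ["b", "d"]))                 # c and f omitted
print(children_match(MODEL_A, ["b", "c", "e", "d", "f"]))  # everything used
print(children_match(MODEL_A, ["b", "c"]))                 # d or e is missing
```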


Attributes

All elements can contain attributes:

<DESCRIPTION edifact="UNH D0062.1" lala="123">

Attributes have to be declared similarly to elements:

<!ATTLIST DESCRIPTION edifact CDATA #REQUIRED>

Attributes can be optional, mandatory or "fixed"

<!ATTLIST DESCRIPTION
  ean CDATA #REQUIRED
  picture CDATA #FIXED "http://my.pics.de/cd-rom.htm"
  status (sale | normal) "normal">
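What such a declaration means for a processor can be sketched as follows: a required attribute must be present, a fixed attribute may only carry its declared value, and an enumerated attribute falls back to its default when omitted. The Python below is a hand-written illustration under those assumptions; the table DESCRIPTION_ATTRS and the function apply_attlist are hypothetical names that mirror the declaration by hand and do not parse the DTD itself:

```python
# Hand-written mirror of the ATTLIST for DESCRIPTION.
# name -> (allowed values, or None for CDATA; kind; default value)
DESCRIPTION_ATTRS = {
    "ean": (None, "REQUIRED", None),
    "picture": (None, "FIXED", "http://my.pics.de/cd-rom.htm"),
    "status": (("sale", "normal"), "DEFAULT", "normal"),
}


def apply_attlist(decls: dict, given: dict) -> dict:
    """Return the effective attributes, or raise ValueError on a violation."""
    for name in given:
        if name not in decls:
            raise ValueError(f"undeclared attribute: {name}")
    result = dict(given)
    for name, (allowed, kind, default) in decls.items():
        if name not in result:
            if kind == "REQUIRED":
                raise ValueError(f"missing required attribute: {name}")
            if default is not None:
                result[name] = default  # supply the declared default
            continue
        value = result[name]
        if kind == "FIXED" and value != default:
            raise ValueError(f"{name} is fixed to {default!r}")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
    return result


effective = apply_attlist(DESCRIPTION_ATTRS, {"ean": "3034152204082"})
print(effective)
```

Omitting ean, or writing a status other than sale or normal, makes apply_attlist raise a ValueError, which corresponds to a validity error in a validating parser.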


Valid XML-Document

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ORDER SYSTEM "order2.dtd">
<ORDER>
  <HEAD edifact="UNH D0062.1">
    <NAME>Mustermann</NAME>
    <DATE>02.10.1998</DATE>
    <E-MAIL>[email protected]</E-MAIL>
  </HEAD>
  <BODY>
    <ITEM>
      <DESCRIPTION ean="3034152204082"
                   picture="http://my.pics.de/cd-rom.htm"
                   status="sale">cd rom drive</DESCRIPTION>
      <ARTICLE-NO>123456</ARTICLE-NO>
      <AMOUNT>5</AMOUNT>
    </ITEM>
    <ITEM>
      .......
    </ITEM>
  </BODY>
</ORDER>


DTD

<!ELEMENT ORDER (HEAD, BODY)>
<!ELEMENT HEAD (NAME, F-NAME*, DATE, E-MAIL+)>
<!ATTLIST HEAD edifact CDATA #REQUIRED>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT F-NAME (#PCDATA)>
<!ELEMENT E-MAIL (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT BODY (ITEM)+>
<!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST DESCRIPTION
  ean CDATA #REQUIRED
  picture CDATA #FIXED "http://my.pics.de/cd-rom.htm"
  status (sale | normal) "normal">
<!ELEMENT ARTICLE-NO (#PCDATA)>
<!ELEMENT AMOUNT (#PCDATA)>


Valid XML-documents

• An XML-document is valid if it is well-formed and conforms with the specifications as defined in a DTD.

• Any well-formed XML-document can become valid if it is made compliant with a DTD.

• Functionally, a DTD is analogous to a relational database schema or an IDL.

• Applications can use the DTD to check an XML-document instance for structural validity and to create new instances of the defined document type.


Internal DTDs

DTDs can also be part of a document instance:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ORDER [
  <!ELEMENT ORDER (HEAD, BODY)>
  <!ELEMENT HEAD (NAME, DATE, E-MAIL)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT DATE (#PCDATA)>
  <!ELEMENT E-MAIL (#PCDATA)>
  <!ELEMENT BODY (ITEM)+>
  <!ELEMENT ITEM (DESCRIPTION, ARTICLE-NO, AMOUNT)>
  <!ELEMENT DESCRIPTION (#PCDATA)>
  <!ELEMENT ARTICLE-NO (#PCDATA)>
  <!ELEMENT AMOUNT (#PCDATA)>
]>
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME>
    .............................
</ORDER>


Logical and physical structure of XML-documents

The logical structure is determined by the sequence of tags in the document.

Irrespective of the logical structure, an XML-document can be divided into any number of physical entities.

Thus, it is possible to combine physically distributed XML-data into one XML-document.

Entity references are used to refer to external data.

References pointing to entities are written between "&" and ";"


External entity references

XML-documents can be spread over different files:

<!DOCTYPE ORDER [
  <!ENTITY Head SYSTEM "HeadOrder.xml">
  <!ENTITY ItemsPC SYSTEM "Items/PC1.xml">
  <!ENTITY ItemsCD-ROM SYSTEM "http://cd.de/m2.xml">
]>
<ORDER>
  <CUSTOMER>&Head;</CUSTOMER>
  <SALESORDER>
    &ItemsPC;
    &ItemsCD-ROM;
  </SALESORDER>
</ORDER>


XML Entities, Unicode

<!DOCTYPE EXAMPLE [
  <!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>The new standard &xml; supports international character sets (ISO-10646 (Unicode)); the example shows different notations for number "1": &#49; (in ASCII), &#x0661; (in Arabic), &#x0967; (in Devanagari) and &#x0d67; (in Malayalam).
</EXAMPLE>
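Both kinds of reference are resolved by the parser before an application sees the text: &xml; is replaced by the declared entity text and each numeric character reference by the corresponding Unicode character. A minimal Python sketch, assuming the standard-library xml.etree.ElementTree and unicodedata modules; the shortened document DOC is an illustration, not the slide's full text. Printing the Unicode character names makes it easy to check which script each code point belongs to:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Shortened variant of the example: one internal entity, four digit "1"s.
DOC = """<!DOCTYPE EXAMPLE [
  <!ENTITY xml "Extensible Markup Language">
]>
<EXAMPLE>&xml;: &#49; &#x0661; &#x0967; &#x0d67;</EXAMPLE>"""

text = ET.fromstring(DOC).text
expanded_name, digit_part = text.split(": ")
digits = digit_part.split()

print(expanded_name)
for ch in digits:
    # Code point, official character name, and numeric value of the digit.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} = {unicodedata.digit(ch)}")
```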


Presentation of XML-documents

XML-documents are presented using style sheets.

A style sheet determines the document’s layout.

Style sheets are referred to by a processing instruction, e.g.: <?xml-stylesheet type="text/css" href="style1.css"?>

W3C is developing XSL, a style sheet language for XML.

In addition, presentation of XML-documents in a browser, for example, is possible using CSS which is also used to display HTML.


Why 2 Style-Sheet-Languages?

1) CSS: Simple; every element is assigned a layout

ORDER {background-color:blue}
NAME, DATE, E-MAIL {Display:Block; font-size:28pt; font-family:Times,serif}
E-MAIL {color:yellow}

2) XSL: More than CSS (Scripting, Transformation), but more complex

<xsl:template match="Article-No.">
  <P>
    <xsl:process-children/>
  </P>
</xsl:template>


XML and CSS

<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="style1.css"?>
<ORDER>
  <HEAD>
    <NAME>Mustermann</NAME>
    ..............
</ORDER>

ORDER { Display: Block; background-color: blue; float: left; padding: 15pt}
NAME, DATE, E-MAIL {Display: Block; font-size: 28pt; font-family: Times, serif}
E-MAIL {color:yellow}
BODY {Display: Block; background-color: green; float: left; padding: 12pt}
DESCRIPTION {font-size: 28pt; font-family: Times, sans-serif}

[Figure: XML document + CSS style sheet = formatted display]


The XML-family

Besides the specifications of XML 1.0 (recommendation since 10.02.1998) there are more W3C initiatives on XML. The most important related standards are:

XLink (Working Draft, 26.07.1999)

XPointer (Working Draft, 09.07.1999)

XML Namespaces (Recommendation, 14.01.1999)

XSL (Working Draft 21.04.1999)

DOM (Recommendation, 01.10.1998)

RDF (Recommendation, 22.02.1999)

XML Schemas (Working Drafts, 06.05.1999) (XML-Data, DCD, SOX, DDML)


Linking in XML

• XML supports much more powerful linking capabilities than HTML.

• XLink describes uni- as well as sophisticated multi-directional links.

• XPointer specifies a mechanism for pointing to fragments of a target document, even without identifiers: “book.html#section2”.

[Figure: simple link; extended link (XLink); link to an element within an instance (XPointer)]


Namespaces in XML

How can an application know which namespace is relevant if different DTDs are in use (i.e. for own documents, data exchange or search engines)?

Namespaces have been developed to prevent element and attribute names from colliding. Example: "Title" (heading, evidence of ownership)

<EXAMPLE xmlns:h="http://www.w3.org/html4"
         xmlns:b="http://www.my.server.de/bibliography"
         xmlns:p="http://www.my.server.de/claims">
  <h:caption>My XML text</h:caption>
  <b:title>XML, Java and the future of the Web</b:title>
  <p:title>realty</p:title>
</EXAMPLE>
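For a namespace-aware parser the prefix itself is irrelevant; what identifies an element is the pair of namespace URI and local name. A minimal Python sketch, assuming the standard-library xml.etree.ElementTree module; the prefixes bib and claims in the lookup table NS are chosen freely for this illustration and deliberately differ from the prefixes in the document:

```python
import xml.etree.ElementTree as ET

DOC = """<EXAMPLE xmlns:h="http://www.w3.org/html4"
         xmlns:b="http://www.my.server.de/bibliography"
         xmlns:p="http://www.my.server.de/claims">
  <h:caption>My XML text</h:caption>
  <b:title>XML, Java and the future of the Web</b:title>
  <p:title>realty</p:title>
</EXAMPLE>"""

root = ET.fromstring(DOC)

# ElementTree stores each tag as "{namespace-uri}local-name".
for child in root:
    print(child.tag, "->", child.text)

# Look the two "title" elements up by namespace URI, not by prefix.
NS = {
    "bib": "http://www.my.server.de/bibliography",
    "claims": "http://www.my.server.de/claims",
}
book_title = root.findtext("bib:title", namespaces=NS)
claim_title = root.findtext("claims:title", namespaces=NS)
print(book_title)
print(claim_title)
```

The two title elements share a local name but never collide, because their namespace URIs differ.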