XML for E-commerce Helena Ahonen-Myka University of Helsinki.

Post on 30-Dec-2015


XML for E-commerce

Helena Ahonen-Myka

University of Helsinki

XML: background

SGML: standard for markup languages (1986)

HTML: an SGML application

XML: a simplified version of SGML (developed for the Web)

software- and platform-independent representations for structured data

Example, HTML

<html>
  <head>
    <title>An HTML document</title>
  </head>
  <body>
    <h1>Heading 1</h1>
    <p>Some text content</p>
    <h2>Subheading</h2>
    <p>More text.</p>
  </body>
</html>

Lists, images, links

<body>
  <h1>Finland</h1>
  <ol>
    <li><a href="trav.html">Traveling</a>
    <li><a href="culture.html">Culture</a>
    <li><a href="sports.html">Sports</a>
  </ol>
  <p><img src="map.jpg" alt="map">
</body>

Tables

<table border="1">
  <tr><th>Year</th><th>Sales</th></tr>
  <tr><td>2000</td><td>$18M</td></tr>
  <tr><td>2001</td><td>$25M</td></tr>
  <tr><td>2002</td><td>$36M</td></tr>
</table>

Year   Sales
2000   $18M
2001   $25M
2002   $36M

Forms

<form action="http://some.com/add" method="post">
  <p>First name: <input type="text" name="fname"><br>
     Last name: <input type="text" name="lname"><br>
     <input type="submit"><input type="reset">
  </p>
</form>

First name: _________________________
Last name: __________________________
[Submit] [Reset]

HTML

easy to describe simple documents (headings, text, lists, tables, images)

easy to create links to other documents or different parts of the same document

the elements have a default presentation style

Presentation

the browsers give elements a default presentation style

often the authors want something else

it is wise to separate the presentation from the document contents: ease of modification and uniformity of appearance

CSS: Cascading Style Sheets

a stylesheet defines for each element, e.g., the font, size, color, and margin widths

the structure of a document cannot be modified

several stylesheets can be attached to a document: modularity

CSS, examples

<style type="text/css">
  body { color: black; background: white; font-family: verdana, sans-serif; }
  h1, h2 { color: red; }
  p.new { color: green; }
</style>

CSS: layout

<div class="box">The content within this DIV element will be enclosed in a box with a thin line around it.</div>

div.box {
  border: solid;
  border-width: thin;
  width: 100%;
  padding: 2em;
}

CSS2

free layout can be described for elements

dynamic changes of contents and style, animations etc.

Dynamic HTML

HTML

ECMAScript (JavaScript, JScript)

CSS

DOM

Three-tier architecture

browser

web server: processing logic

database server

Examples

1. Browser asks for a page.
2. Server sends the page.
3. Browser shows the page.

1. As above, but the page contains a form, which the user fills out.
2. Based on the data of the form, the server starts an application which queries a database and forms a new page.
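
The second scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, its sample rows, and the function names `make_database` and `handle_form` are hypothetical, and an in-memory SQLite database stands in for the database server. The form field names `fname` and `lname` are the ones used in the Forms slide.

```python
import sqlite3
from html import escape


def make_database():
    """Create a tiny in-memory database standing in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (fname TEXT, lname TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        # Hypothetical sample rows, invented for this sketch.
        [("Ashok", "Malhotra", "Hawthorne"), ("Peter", "May", "Helsinki")],
    )
    return conn


def handle_form(conn, form):
    """Processing logic on the web server: query the database, form a new page."""
    rows = conn.execute(
        "SELECT city FROM customers WHERE fname = ? AND lname = ?",
        (form["fname"], form["lname"]),
    ).fetchall()
    name = escape(f"{form['fname']} {form['lname']}")
    if rows:
        body = f"<p>{name} is registered in {escape(rows[0][0])}.</p>"
    else:
        body = f"<p>No customer named {name} was found.</p>"
    return f"<html><body>{body}</body></html>"


conn = make_database()
page = handle_form(conn, {"fname": "Ashok", "lname": "Malhotra"})
print(page)
```

The browser only ever sees the finished HTML page; the query and the page assembly stay on the server tier.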

Browser vs. server

browser interprets CSS definitions

HTML documents may include embedded JavaScript scripts, which are run in the browser

problems: the implementations of CSS vary, JavaScript may be switched off

most of the functionality on the server side?

XML

Extensible Markup Language (1998)

developed for interchanging structured documents on the Internet

used more and more as a platform-independent data format between applications

document vs. data

"Document":

<memo importance="high" date="19990323">
  <from>Paul V. Biron</from>
  <to>Ashok Malhotra</to>
  <subject>Latest draft</subject>
  <body> We need to discuss the latest draft <emph>immediately</emph>. Either email me at <email> mailto:paul.v.biron@kp.org</email> or call <phone>555-9876</phone> </body>
</memo>

"Data":

<invoice>
  <orderDate>19990121</orderDate>
  <shipDate>19990125</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 IBM Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>

<body>
  <p><b>Order date:</b> 19990121</p>
  <p><b>Shipping date:</b> 19990125</p>
  <p><b>Address:</b></p>
  <table>
    <tr><th>name<th>street<th>city<th>state<th>zip
    <tr><td>Ashok Malhotra <td>123 IBM Ave. <td>Hawthorne <td>NY <td>10532-0000
  </table>
  <p>Phone: 555-1234</p>
  <p>Fax: 555-4321</p>
</body>
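
The difference between the two kinds of XML shows up as soon as a program reads them. The sketch below uses Python's standard `xml.etree.ElementTree` module. The invoice is the one from the slide; the memo body is shortened for brevity, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

INVOICE = """<invoice>
  <orderDate>19990121</orderDate>
  <shipDate>19990125</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 IBM Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>"""

# Shortened version of the memo body (an assumption made for this sketch).
MEMO_BODY = (
    "<body>We need to discuss the latest draft "
    "<emph>immediately</emph>. Call <phone>555-9876</phone></body>"
)

# "Data": every leaf element holds exactly one value, so a flat record falls out.
invoice = ET.fromstring(INVOICE)
address = {child.tag: child.text for child in invoice.find("billingAddress")}
record = {
    "orderDate": invoice.findtext("orderDate"),
    "voice": invoice.findtext("voice"),
    **address,
}

# "Document": text and elements are interleaved (mixed content), so the
# order of the pieces matters and the natural view is running text.
body = ET.fromstring(MEMO_BODY)
running_text = "".join(body.itertext())

print(record)
print(running_text)
```

The invoice maps naturally onto a record or a database row, while the memo body only makes sense when read in document order.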

Basic concepts: logical structure

logical structure: elements

names of elements can be chosen freely

elements can have attributes

logical structure is described by a document type definition (DTD)

Elements

elements can be containers, which can contain other elements and/or text, e.g.

<name><fname>Helena</fname> <lname>Ahonen</lname></name>

an element can also be empty: <img src="picture.jpg" alt="Picture" />

Attributes

attributes express information that is not really content

attribute/value pairs are attached to the start tag of an element

<memo importance="high">…</memo>

it may be difficult to decide whether some information should be modeled as an element or as an attribute

Attribute or element?

<memo date="060600">
  <from>Ashok Malhotra</from>
  <to>Peter May</to>
  …
</memo>

<memo>
  <from>Ashok Malhotra</from>
  <to>Peter May</to>
  <date>060600</date>
  ...
</memo>
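
For a consuming program the two modelings differ only in how the value is reached. A minimal sketch, assuming Python's `xml.etree.ElementTree` and trimmed copies of the two memos above (the elided parts are simply left out); `memo_date` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<memo date="060600"><from>Ashok Malhotra</from><to>Peter May</to></memo>'
AS_ELEMENT = '<memo><from>Ashok Malhotra</from><to>Peter May</to><date>060600</date></memo>'


def memo_date(xml_text):
    """Return the memo date whether it was modeled as an attribute or an element."""
    memo = ET.fromstring(xml_text)
    if "date" in memo.attrib:
        return memo.get("date")       # attribute on the start tag
    return memo.findtext("date")      # child element


print(memo_date(AS_ATTRIBUTE), memo_date(AS_ELEMENT))
```

One practical difference: an attribute can hold only a single string and can appear at most once on an element, whereas a child element can repeat and can carry further structure.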

Defining the structure: DTD

document type definition (DTD)

describes how the elements are formed from the other elements and text

defines which attributes an element may/must have

Examples of definitions

<!ELEMENT name (fname+, lname)>
<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
<!ELEMENT contact (address, phone*, email?)>
<!ELEMENT contact2 (address | phone | email)*>

Symbols

+ : 1 or more
* : 0 or more
? : 0 or 1
| : choice (one has to be chosen)
() : grouping
, : order
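
These symbols behave like regular-expression operators applied to the sequence of child element names. The sketch below makes that analogy concrete for the `contact` and `contact2` definitions above by translating them by hand into Python regular expressions. It is only an illustration, not a DTD validator: it looks at direct children only, ignores text and attributes, and the helper name `matches` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (address, phone*, email?) written over space-terminated child names.
CONTACT = re.compile(r"(address )(phone )*(email )?")
# (address | phone | email)*
CONTACT2 = re.compile(r"(address |phone |email )*")


def matches(model, xml_text):
    """Check the names of an element's direct children against a content model."""
    element = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in element)
    return model.fullmatch(child_names) is not None


print(matches(CONTACT, "<contact><address/><phone/><phone/></contact>"))
print(matches(CONTACT, "<contact><address/><email/><phone/></contact>"))
print(matches(CONTACT2, "<contact2><email/><phone/><address/></contact2>"))
```

The second call fails because `,` fixes the order (a `phone` may not follow `email`), while `contact2` accepts the same children in any order and any number.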

DTD for the Invoice example

<!DOCTYPE invoice [
<!ELEMENT invoice (orderDate, shipDate, billingAddress, voice*, fax?)>
<!ELEMENT orderDate (#PCDATA)>
<!ELEMENT shipDate (#PCDATA)>
<!ELEMENT billingAddress (name, street, city, state, zip)>
<!ELEMENT voice (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
]>

Note:

elements cannot overlap

container elements must have end tags

empty elements: <br />

all names are case-sensitive

attribute values must be delimited by quotation marks

Well-formed XML documents

documents that adhere to the formal requirements (syntax) of the XML specification

if a document is not well-formed, it is not an XML document (and the XML tools do not have to process it)
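
A quick way to see these rules in action is to hand a few fragments to an XML parser. The sketch below uses Python's `xml.etree.ElementTree`, which raises `ParseError` on input that is not well-formed; `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<p>Some <b>text</b></p>",    # properly nested
    "<br />",                     # empty element
    "<p><b>overlap</p></b>",      # overlapping elements
    "<p>no end tag",              # container without an end tag
    "<Memo></memo>",              # names are case-sensitive
    "<memo importance=high/>",    # attribute value without quotation marks
]
for sample in samples:
    print(sample, is_well_formed(sample))
```

Only the first two samples are accepted; each of the others violates one of the rules listed in the Note slide.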

Valid documents

a document is a valid XML document if it is well-formed and adheres to the structure defined in its DTD

an XML processor can be validating or non-validating

sometimes validity is important, sometimes not

Where do the DTDs come from?

general DTDs: communities that have to be able to interchange information agree on a common DTD

also standard-like: MathML, SMIL

tailored DTDs can be designed for one's own use

XML basics: physical structure

physical structure: entities

"file structure": a document is assembled from parts:

e.g. chapters of a book (each in one file)

including parts that appear often

non-XML content: e.g. images

characters that are not found on the keyboard

Entities

In the DTD: <!ENTITY HY "Helsingin yliopisto">

Inside the document: <place>&HY;</place>
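
The expansion can be observed with any XML parser. A minimal sketch, assuming Python's `xml.etree.ElementTree` and assuming the entity is declared in the document's internal DTD subset (the slide does not say where the declaration lives):

```python
import xml.etree.ElementTree as ET

# The entity declaration sits in the internal DTD subset of the document.
DOCUMENT = """<!DOCTYPE place [
  <!ENTITY HY "Helsingin yliopisto">
]>
<place>&HY;</place>"""

place = ET.fromstring(DOCUMENT)
# The parser replaces the reference &HY; with the entity's replacement text.
print(place.text)
```

The application never sees the reference itself, only the replacement text, which is what makes entities useful for parts that appear often.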

Defining the presentation

names of elements are arbitrary: the browsers cannot know how an element should be presented

presentation is defined using a separate stylesheet (CSS, XSL)

one stylesheet - many documents

one document - many stylesheets

Extensible Stylesheet Language (XSL)

specification contains two parts: transformation language XSLT and formatting objects

an XSLT transformation can express many kinds of changes: elements can be inserted, deleted, reordered, etc.

standardization of formatting objects not ready

Transformation target

XSLT-transformations can be used for transformations into several different representations

since the standardization of general formatting objects is not ready, transforming XML into HTML is a good choice

transformations into other XML-formats, PDF, etc. also possible

<sales>
  <products>
    <product id="p1">Packing Boxes</product>
    <product id="p2">Packing Tape</product>
  </products>
  <record>
    <cust num="C1001">
      <prodsale idref="p1">100</prodsale>
      <prodsale idref="p2">200</prodsale>
    </cust>
    <cust num="C1002">
      <prodsale idref="p2">50</prodsale>
    </cust>
    <cust num="C1003">
      <prodsale idref="p1">75</prodsale>
      <prodsale idref="p2">15</prodsale>
    </cust>
  </record>
</sales>

<body>
  <h2>Record of Sales</h2>
  <ul>
    <li>C1001 - Packing Boxes - 100</li>
    <li>C1001 - Packing Tape - 200</li>
    <li>C1002 - Packing Tape - 50</li>
    <li>C1003 - Packing Boxes - 75</li>
    <li>C1003 - Packing Tape - 15</li>
  </ul>
</body>

XSLT transformations

XML document is seen as a tree

how do we get from the source tree to the target tree?

transformation rules are matched to the parts of the tree, and transformations defined by the rules are applied

tree is often traversed starting from root

contents can be picked from any part

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
  <html><head><title>Record of Sales</title></head>
  <body><h2>Record of Sales</h2>
    <xsl:apply-templates select="/sales/record"/>
  </body></html>
</xsl:template>

<xsl:template match="record">
  <ul><xsl:apply-templates/></ul>
</xsl:template>

<xsl:template match="prodsale">
  <li><xsl:value-of select="../@num"/>
  <xsl:text> - </xsl:text>
  <xsl:value-of select="id(@idref)"/>
  <xsl:text> - </xsl:text>
  <xsl:value-of select="."/></li>
</xsl:template>

</xsl:stylesheet>
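
For readers who want to trace the source-tree-to-target-tree idea step by step, the same result can be produced procedurally. The sketch below is not XSLT and does not run the stylesheet; it is a hand-written Python equivalent using `xml.etree.ElementTree`, with a dictionary standing in for XSLT's `id()` lookup. The function name `record_of_sales` is hypothetical.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <products>
    <product id="p1">Packing Boxes</product>
    <product id="p2">Packing Tape</product>
  </products>
  <record>
    <cust num="C1001">
      <prodsale idref="p1">100</prodsale>
      <prodsale idref="p2">200</prodsale>
    </cust>
    <cust num="C1002">
      <prodsale idref="p2">50</prodsale>
    </cust>
    <cust num="C1003">
      <prodsale idref="p1">75</prodsale>
      <prodsale idref="p2">15</prodsale>
    </cust>
  </record>
</sales>"""


def record_of_sales(xml_text):
    """Build the <ul> target tree from the <sales> source tree."""
    sales = ET.fromstring(xml_text)
    # Stand-in for id(@idref): map each product id to its name.
    product_names = {p.get("id"): p.text for p in sales.iter("product")}

    ul = ET.Element("ul")
    for cust in sales.findall("record/cust"):      # walk the source tree
        for sale in cust.findall("prodsale"):
            li = ET.SubElement(ul, "li")           # grow the target tree
            name = product_names[sale.get("idref")]
            li.text = f"{cust.get('num')} - {name} - {sale.text}"
    return ul


items = [li.text for li in record_of_sales(SALES)]
for item in items:
    print(item)
```

The nested loops play the role of the `record` and `prodsale` templates, and the product lookup shows that contents can be picked from a different part of the tree than the one being traversed.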

Other XML related standards

XHTML

XLink

XML Schema

DOM

RDF

XHTML

Extensible HyperText Markup Language (v. 1.0 January 2000)

redefinition of HTML using XML

XHTML documents can be processed using XML tools

XHTML: modularization

XHTML facilitates creating new document types:

a subset can be used (e.g. for presentation on different devices)

definitions can be expanded (special elements, e.g. for representation of medical information)

XLink

XML Linking Language (July 2000)

links can have several targets

types, roles, etc. can be attached to a link

links can be stored separately from the document

a link can point to an arbitrary location in the target document

behavior of the link can be defined

DOM

Document Object Model (Sep 2000)

defines a platform- and language-neutral programming interface (API) for HTML and XML documents

defines how programs and scripts can retrieve, insert, delete, and modify contents, structure and styles
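
The four operations can be tried with Python's `xml.dom.minidom`, a lightweight DOM implementation in the standard library. This is a minimal sketch on a trimmed memo; the document content is taken from the earlier memo examples, and which nodes are changed is an arbitrary choice made for illustration.

```python
from xml.dom.minidom import parseString

dom = parseString("<memo><from>Ashok Malhotra</from><to>Peter May</to></memo>")
memo = dom.documentElement

# Retrieve: read the text of the first <to> element.
recipient = memo.getElementsByTagName("to")[0].firstChild.data

# Insert: append a new <subject> element containing a text node.
subject = dom.createElement("subject")
subject.appendChild(dom.createTextNode("Latest draft"))
memo.appendChild(subject)

# Modify: set an attribute and replace existing text content.
memo.setAttribute("importance", "high")
memo.getElementsByTagName("to")[0].firstChild.data = "Paul V. Biron"

# Delete: remove the <from> element from its parent.
sender = memo.getElementsByTagName("from")[0]
memo.removeChild(sender)

result = memo.toxml()
print(recipient)
print(result)
```

The same method names (`getElementsByTagName`, `createElement`, `appendChild`, `removeChild`, `setAttribute`) are available in browser scripting, which is the point of a language-neutral interface.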

XML Schema

Sep 2000

the modeling power of DTD is restricted

datatyping: e.g. date, integer

database schema-like representation: constraints e.g. how many times the element may occur
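
To make the gap concrete: a DTD can only say that `orderDate` contains character data, so checks like the following have to be written by hand in application code, whereas a schema language with datatypes can state them declaratively. The sketch below is plain Python, not XML Schema; the `YYYYMMDD` date format is inferred from the invoice example, the limit of two `voice` elements is invented for illustration, and `check_types` is a hypothetical name.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def check_types(xml_text):
    """Return a list of datatype and occurrence problems in an invoice-like document."""
    invoice = ET.fromstring(xml_text)
    problems = []
    for tag in ("orderDate", "shipDate"):
        value = invoice.findtext(tag, default="")
        try:
            datetime.strptime(value, "%Y%m%d")   # date datatype, YYYYMMDD assumed
        except ValueError:
            problems.append(f"{tag}: {value!r} is not a date")
    # Occurrence constraint: at most two <voice> numbers (an invented limit).
    if len(invoice.findall("voice")) > 2:
        problems.append("voice: occurs more than 2 times")
    return problems


good = "<invoice><orderDate>19990121</orderDate><shipDate>19990125</shipDate></invoice>"
bad = "<invoice><orderDate>1999-01-21</orderDate><shipDate>soon</shipDate></invoice>"
print(check_types(good))
print(check_types(bad))
```

Both sample documents would pass DTD validation as far as the two date elements are concerned, since their declared content is only `#PCDATA`.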

RDF

Resource Description Framework (Mar 2000)

RDF can be used for describing metadata of web resources

metadata for search engines, for managing large collections, for depicting the parts of a large document etc.

XML vs. HTML

Good in HTML

well-known and broadly used: a large public can use it easily

browsers know how to show it: it is not necessary to define the presentation separately

heterogeneous material is simple to combine using hyperlinks

Bad in HTML

contents and presentation intermingle: reuse in different contexts is difficult

accessing parts of a document is hard

representing complex structures is difficult

automation is difficult

Good in XML

contents in one place -> several presentations for several media

automatic processing of documents is easier: more precise queries, transformations, retrieving specific data

structure of documents can be validated

Bad in XML

meaning of elements has to be known

presentation does not exist automatically: stylesheets have to be given

creating documents may require using special editors or laborious conversion

browsers do not support it well yet

XML in system architectures

basically like with HTML (three-tier)

use of XML is influenced by the nature of the contents ("data" or "document")

"data": XML as an interchange format between applications (storage e.g. in relational databases)

"document": content management systems (often based on object databases)

Browser vs. server

decision: where is the final presentation formed?

If the browser understands XSL, formatting can be given to the browser; otherwise the server transforms the document into HTML with CSS-styles

probably always some transformation from the original XML format

Tools

editors: XML, XSL, DTD, XML Schema

parsers (included in many tools)

XSL engines

content management systems (e.g., managing document components, version management, assembly)

e-commerce tools

Technology providers

Microsoft, IBM / AlphaWorks

publishing technology providers (Arbortext, SoftQuad, Chrystal Software, Poet)

database technology providers (Oracle, Sybase)

public domain software, prototypes, etc. (e.g. the Apache Cocoon project)

XML portals

www.xml.com

www.xml.org

www.w3c.org

www.oasis-open.org

www.xmlsoftware.com

www.cs.helsinki.fi/~hahonen/uumek00/sisalto/xml/ (New Media course)