SWE 444 - Internet & Web Application Development5.1 5. Data Description and Transformation 1. XML 2. XPath 3. XSL /XSLT 4. DTD 5. XSD 6. DOM

SWE 444 - Internet & Web Application Development5.2 5.1 XML What is XML? Why XML? Brief History and Versions Sample XML Documents XML Namespaces

SWE 444 - Internet & Web Application Development5.3 What is XML? XML stands for EXtensible Markup Language A meta-language for descriptive markup: you invent your own tags XML uses a Document Type Definition (DTD) or an XML Schema to describe the data XML with a DTD or XML Schema is designed to be self-descriptive Built-in internationalization via Unicode Built-in error-handling A forgotten tag, or an attribute without quotes renders an XML document unusable Tons of support from the big IT companies

SWE 444 - Internet & Web Application Development5.4 Why XML? Much shareable data resides in computer systems and databases that use incompatible formats and conflicting hardware and/or software. One of the most time-consuming challenges for developers has been exchanging data between such systems over the Internet. Converting the data to XML can greatly reduce the complexity and create data that can be read by many different applications. XML data is stored in plain text format, so it is hardware and software independent. XML can be used to create new languages: it allows us to define our own markup languages.

SWE 444 - Internet & Web Application Development5.5 Brief XML History SGML (Standard Generalized Markup Language): ISO standard, 1986, for data storage & exchange. Metalanguage for defining languages (through DTDs). A famous SGML language: HTML. Separation of content and display. Used by the U.S. government and its contractors, large manufacturing companies, technical information publishers, ... The SGML reference is 600 pages long. XML: W3C recommendation in 1998. Simple subset (80/20 rule) of SGML: ASCII of the Web, Semantic Web. The XML specification is 26 pages long.

SWE 444 - Internet & Web Application Development5.6 Brief XML History
1986: SGML becomes a standard
1989: Tim Berners-Lee creates the WWW
1994: W3C established
1998: XML 1.0 W3C Recommendation
Jan 2000: XHTML becomes W3C Recommendation (a reformulation of HTML 4 in XML 1.0)
Feb 2004: W3C XML 1.0 (Third Edition) Recommendation http://www.w3.org/TR/2004/REC-xml-20040204/
Feb 2004: XML 1.1 Recommendation http://www.w3.org/TR/2004/REC-xml11-20040204/ updates XML to use Unicode 3

SWE 444 - Internet & Web Application Development5.7 XML and HTML XML is not a replacement for HTML. In future Web development, XML is likely to be used to describe data while HTML will be used to format and display the same data (one interpretation of XML). XML and HTML were designed with different goals. XML was designed to describe data and to focus on what data is; XML describes only content, or meaning. HTML was designed to display data and to focus on how data looks; HTML describes both structure (e.g., <p>, <h2>) and appearance (e.g., <font>, <i>). XML is for computers while HTML is for humans: XML is used to mark up data so it can be processed by computers, and HTML is used to mark up text so it can be displayed to users.

SWE 444 - Internet & Web Application Development5.8 XML does not DO anything XML was not designed to DO anything A piece of software must be written to do something (send, receive or display the document) The following example is a book info, stored as XML: The Autobiography of Benjamin Franklin Benjamin Franklin 8.99
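Because XML does nothing by itself, some program has to read the book record. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The tag names book, title, author, and price are assumptions made for illustration, since the markup of the example above is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the book record; the tag names are assumed.
BOOK_XML = """<?xml version="1.0"?>
<book>
  <title>The Autobiography of Benjamin Franklin</title>
  <author>Benjamin Franklin</author>
  <price>8.99</price>
</book>"""

def read_book(xml_text):
    """Parse one <book> record and return its fields as a dict."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "price": float(root.findtext("price")),
    }

book = read_book(BOOK_XML)
print(book["title"], "by", book["author"], "costs", book["price"])
```

The XML only describes the data; it is the script that decides to convert the price to a number and print a sentence.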

SWE 444 - Internet & Web Application Development5.9 XML is Free and Extensible XML tags are not predefined You must "invent" your own tags The tags used to mark up HTML documents and the structure of HTML documents are predefined The author of HTML documents can only use tags that are defined in the HTML standard XML allows the author to define his own tags and his own document structure

SWE 444 - Internet & Web Application Development5.10 XML Future XML is going to be everywhere A large number of software vendors adopted the XML standard very quickly XML is a cross-platform, software and hardware independent tool for transmitting information. Documents Configuration Database Application X Repository XML

SWE 444 - Internet & Web Application Development5.11 Benefits of XML Open W3C standard non-proprietary Representation of data across heterogeneous environments Cross platform Allows for high degree of interoperability E.g., ability to exchange data between incompatible applications with incompatible data formats Strict rules that make it relatively easy to write XML parsers Syntax Structure Case sensitive XML can make data more useful s/w, h/w and application independence of XML makes data available to more users not only HTML browsers

SWE 444 - Internet & Web Application Development5.12 Components of an XML Document XML declaration. Processing instructions. Encoding specification (Unicode by default). Namespace declaration. Schema declaration. Elements: each element has a beginning and ending tag, <tag>...</tag>; elements can be empty (<tag/>). Attributes: describe an element, e.g. data type, data range, etc.; can only appear on the beginning tag.

SWE 444 - Internet & Web Application Development5.13 Components of an XML Document Processing Instructions Elements Elements with Attributes

SWE 444 - Internet & Web Application Development5.14 XML Declaration The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> The XML declaration is not required by browsers, and XML 1.0 makes it optional, but many XML processors expect it (so include it!). If present, the XML declaration must be first--not even whitespace should precede it. Note that the brackets are <? and ?>. The version attribute is required. encoding can be "UTF-8" (a Unicode encoding that is a superset of ASCII) or "UTF-16" (also Unicode), or something else, or it can be omitted. An XML document is standalone if it makes use of no external markup (DTD) declarations. The default value of the standalone attribute is no.

SWE 444 - Internet & Web Application Development5.15 Processing Instructions A PI is a command to the program processing the XML document to handle it in a certain way. PIs (Processing Instructions) may occur anywhere in the XML document (but usually first). XML documents are typically processed by more than one program. Programs that do not recognize a given PI should just ignore it. General format of a PI: <?target instructions?> Example (illustrative; the file name is assumed): <?xml-stylesheet type="text/css" href="style.css"?>

SWE 444 - Internet & Web Application Development5.16 XML Elements An XML element is everything from the element's start tag to the element's end tag. XML elements are extensible and they have relationships: they are related as parents and children. XML elements have simple naming rules: names can contain letters, numbers, and other characters; names must not start with a number or punctuation character (an underscore or colon is allowed); names starting with the letters xml (or XML or Xml..) are reserved and should not be used; names cannot contain spaces.

SWE 444 - Internet & Web Application Development5.17 XML Attributes XML elements can have attributes Data can be stored in child elements or in attributes Should you avoid using attributes? Here are some of the problems using attributes: attributes cannot contain multiple values (child elements can) attributes are not easily expandable (for future changes) attributes cannot describe structures (child elements can) attributes are more difficult to manipulate by program code attribute values are not easy to test against a Document Type Definition (DTD) - which is used to define the legal elements of an XML document Experience shows that attributes are handy in HTML but child elements should be used in their place in XML Use attributes only to provide information that is not relevant to the data

SWE 444 - Internet & Web Application Development5.18 An XML Document The Autobiography of Benjamin Franklin Benjamin Franklin 8.99 The Confidence Man Herman Melville 11.99

SWE 444 - Internet & Web Application Development5.19 Another XML Document 7/14/97 North Place, NX USA High Temp: 103 Low Temp: 70 Morning: Partly cloudy, Hazy Afternoon: Sunny & hot Evening: Clear and Cooler

SWE 444 - Internet & Web Application Development5.20 XML Validation There is a difference between a well-formed XML document and a valid XML document A well-formed XML document is one with correct XML syntax See next slide for well-formedness rules XML syntax is constrained by a grammar (DTD or Schema) that governs the permitted tag names, attachment of attributes to tags, and so on. A well-formed XML document that also conforms to a given DTD or schema is said to be valid. Every valid XML document is well-formed but the reverse is not necessarily the case

SWE 444 - Internet & Web Application Development5.21 Rules For Well-Formed XML There must be one, and only one, root element. All XML elements must have a closing tag. Sub-elements must be properly nested. Attributes are optional (defined by an optional schema). Attribute values must be enclosed in single or double quotes. Processing instructions are optional. XML is case-sensitive.
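The rules above can be tried out with any non-validating parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree as the parser; the sample documents are made up for illustration, one per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule; only the first one obeys all of them.
samples = {
    "one root, nested, quoted": '<note id="1"><to>Ann</to></note>',
    "two root elements": "<a/><b/>",
    "missing closing tag": "<note><to>Ann</note>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note id=1/>",
    "case mismatch": "<Note></note>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

This checks well-formedness only; validity against a DTD or schema is a separate step that this parser does not perform.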

SWE 444 - Internet & Web Application Development5.22 XML DTD A DTD defines the legal elements of an XML document: it defines the document structure with a list of legal elements. XML Schema: XML Schema is an XML based alternative to DTD. Errors in XML documents will stop the XML program: the W3C XML specification states that a program should not continue to process an XML document if it finds a well-formedness error. Processing an XML document requires a software program called an XML Parser (or XML Processor) http://www.xml.com/xml/pub/Guide/xml_parsers There are two flavors of parsers: Non-validating: checks for a document's well-formedness (e.g., browsers). Validating: checks for a document's validity.

SWE 444 - Internet & Web Application Development5.23 Browsers Support for XML Netscape 6 supports XML Internet Explorer 5.0 supports the XML 1.0 standard Internet Explorer 5.0 has the following XML support: Viewing of XML documents Displaying XML with CSS Transforming and displaying XML with XSL XML embedded in HTML as Data Islands Binding XML data to HTML elements Access to the XML DOM Full support for W3C DTD standards

SWE 444 - Internet & Web Application Development5.24 Viewing XML Documents Raw XML files can be viewed in IE 5.0 (and higher) and in Netscape 6 XML documents do not carry information about how to display the data To make them display like a web page, you have to add some display information Different solutions to the display problem, using CSS, XSL, XML Data Islands, and JavaScript Will you be writing your future Homepages in XML? Most Microsoft pages are XML based and the server converts them to HTML on-the-fly when requested

SWE 444 - Internet & Web Application Development5.25 Displaying XML with CSS With CSS (Cascading Style Sheets) you can add display information to an XML document Formatting XML with CSS is NOT the future of the Web Formatting with XSL will be the new standard

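SWE 444 - Internet & Web Application Development5.26 Example: the xml file
<CATALOG>
  <CD>
    <TITLE>Empire Burlesque</TITLE>
    <ARTIST>Bob Dylan</ARTIST>
    <COUNTRY>USA</COUNTRY>
    <COMPANY>Columbia</COMPANY>
    <PRICE>10.90</PRICE>
    <YEAR>1985</YEAR>
  </CD>
  <CD>
    <TITLE>Hide your heart</TITLE>
    <ARTIST>Bonnie Tyler</ARTIST>
    <COUNTRY>UK</COUNTRY>
    <COMPANY>CBS Records</COMPANY>
    <PRICE>9.90</PRICE>
    <YEAR>1988</YEAR>
  </CD>
  ..
</CATALOG>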

SWE 444 - Internet & Web Application Development5.27 Example: the css file CATALOG { background-color: white; width: 100%; } CD { display: block; margin-bottom: 30pt; margin-left: 0; } TITLE { color: red; font-size: 20pt; } ARTIST { color: blue; font-size: 20pt; } COUNTRY,PRICE,YEAR,COMPANY { display: block; color: black; margin-left: 20pt; }

SWE 444 - Internet & Web Application Development5.28 Displaying XML with XSL With XSL you can add display information to your XML document XSL is the preferred style sheet language of XML XSL (the eXtensible Stylesheet Language) is far more sophisticated than CSS One way to use XSL is to transform XML into HTML before it is displayed by the browser

SWE 444 - Internet & Web Application Development5.29 Example: the xml file Belgian Waffles $5.95 two of our famous Belgian Waffles with plenty of real maple syrup 650 Strawberry Belgian Waffles $7.95 light Belgian waffles covered with strawberries and whipped cream 900

SWE 444 - Internet & Web Application Development5.30 Example: the xsl file - ( calories per serving)

SWE 444 - Internet & Web Application Development5.31 View the result in IE 6

SWE 444 - Internet & Web Application Development5.32 XML Embedded in HTML XML can be embedded within HTML pages in Data Islands, which are manipulated via client side script or data binding. The unofficial <xml> tag is used to embed XML data within HTML. The id attribute of the <xml> tag defines an ID for the data island, and the src attribute points to the XML file to embed. The next step is to format and display the data in the data island by binding it to HTML elements.

SWE 444 - Internet & Web Application Development5.33 Bind Data Island to HTML Elements Data Islands can be bound to HTML elements (like HTML tables) An XML data island with ID cdcat is loaded from an external file XML file An HTML table is bound to the data Island with a datasrc attribute The td elements are bound to the XML data with a datafld attribute inside a span.

SWE 444 - Internet & Web Application Development5.34 The Microsoft XML Parser To read and update an XML document, you need an XML parser. The Microsoft XML parser comes with Microsoft Internet Explorer 5.0. Once you have installed IE 5.0, the parser is available to scripts inside HTML documents. The parser features a language-neutral programming model that supports: JavaScript, VBScript, Perl, VB, Java, C++ and more; W3C XML 1.0 and XML DOM; DTD and validation. You can create an XML document object with the following code: var xmlDoc = new ActiveXObject("Microsoft.XMLDOM")

SWE 444 - Internet & Web Application Development5.35 Loading an XML file into the parser XML files can be loaded into the parser using script code. The following code loads an XML document (note.xml) into the XML parser:
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false;
xmlDoc.load("note.xml");
//....... processing the document goes here
The first line in the code above creates an instance of the Microsoft XML parser. The second line turns off asynchronous loading, to make sure that the parser will not continue execution before the document is fully loaded. The third line tells the parser to load the XML document called note.xml. We will revisit these issues later.

SWE 444 - Internet & Web Application Development5.36 Namespaces XML allows you to define a new document format by combining and reusing other formats. This can lead to name conflicts, since the document formats being combined may have the same element names that are used for different purposes. Namespaces allow authors to differentiate between tags of the same name (using a prefix); that is, name conflicts are solved using a prefix. This frees the author to focus on the data and decide how to best describe it. The W3C namespace specification states that a namespace should be identified by a URI (Uniform Resource Identifier). A URI is a string of characters which identifies an Internet resource. A URL is the most common URI used to identify resources and their location on the Internet. Another less common type of URI is the URN (Uniform Resource Name). When a URL is used in a namespace declaration, the URL does NOT have to represent a live server; the only purpose is to give the namespace a unique name. However, very often companies use the namespace as a pointer to a real Web page containing information about the namespace.

SWE 444 - Internet & Web Application Development5.37 Namespaces: Declaration A namespace declaration has the form xmlns:bk = "http://www.example.com/bookinfo/", where xmlns is the namespace declaration keyword, bk is the prefix, and the quoted value is the URI (here a URL). Namespace declaration examples: xmlns:bk = "http://www.example.com/bookinfo/" xmlns:bk = "urn:mybookstuff.org:bookinfo"

SWE 444 - Internet & Web Application Development5.38 Namespaces: Examples All About XML Joe Developer 19.99 All About XML Joe Developer 19.99
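A sketch of how a parser sees prefixed names, using Python's standard xml.etree.ElementTree. The document below is hypothetical: it binds the bk prefix to the URI from the declaration slide, and the element name PRICE is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element is qualified with the bk prefix.
DOC = """<bk:BOOK xmlns:bk="http://www.example.com/bookinfo/">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE>19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-URI}local-name".
print(root.tag)

# Queries may use any prefix they like; only the URI has to match.
ns = {"b": "http://www.example.com/bookinfo/"}
title = root.findtext("b:TITLE", namespaces=ns)
print(title)
```

The query uses the prefix b while the document uses bk, which shows that the prefix is only shorthand and the URI is the real name of the namespace.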

SWE 444 - Internet & Web Application Development5.39 Namespaces: Default Namespace An XML namespace declared without a prefix becomes the default namespace for all sub-elements All elements without a prefix will belong to the default namespace: All About XML Joe Developer

SWE 444 - Internet & Web Application Development5.40 Namespaces: Scope Unqualified elements belong to the inner-most default namespace. BOOK, TITLE, and AUTHOR belong to the default BOOK namespace PUBLISHER and NAME belong to the default PUBLISHER namespace All About XML Joe Developer Microsoft Press
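The scoping rule can be observed by printing the expanded name of each element. This is a sketch using Python's standard xml.etree.ElementTree; the two namespace URIs are made up, and only the nesting of the two default namespaces matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs: an outer default namespace and an inner one on PUBLISHER.
DOC = """<BOOK xmlns="urn:example:book">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:example:publisher">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

root = ET.fromstring(DOC)

# Collect the expanded name of every element in document order.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

BOOK, TITLE, and AUTHOR come out in the outer namespace, while PUBLISHER and NAME come out in the inner one, because each unqualified element takes the inner-most default namespace in scope.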

SWE 444 - Internet & Web Application Development5.41 Namespaces: Attributes Unqualified attributes do NOT belong to any namespace, even if there is a default namespace. They don't need to, since the scope of attributes is only within the element for which they are attributes. This differs from elements, which belong to the default namespace.
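A small sketch of the difference between elements and attributes, again with Python's standard xml.etree.ElementTree. The URI, the attribute names, and the ISBN placeholder value are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one unqualified and one prefixed attribute.
DOC = """<BOOK xmlns="urn:example:book" xmlns:bk="urn:example:book"
      isbn="0-000-00000-0" bk:edition="2"/>"""

root = ET.fromstring(DOC)

# The element name picks up the default namespace.
print(root.tag)

# The unqualified attribute stays bare; only the prefixed one is qualified.
attribute_names = sorted(root.attrib)
print(attribute_names)
```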

SWE 444 - Internet & Web Application Development5.42 Entities Entities provide a mechanism for textual substitution for special characters, e.g. XML parsers normally parse all the text in an XML document When an XML element is parsed, the text between the XML tags is also parsed If you place special characters like

SWE 444 - Internet & Web Application Development5.43 CDATA By default, all text inside an XML document is parsed. You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ and ]]>. Any characters, even & and <, can occur inside a CDATA section; the only sequence that cannot occur inside it is ]]>. CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text).
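A minimal sketch of the effect of a CDATA section, using Python's standard xml.etree.ElementTree. The element name snippet and its content are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose CDATA section holds a fragment of HTML.
DOC = "<snippet><![CDATA[<b>Fish & chips</b> cost < 5]]></snippet>"

root = ET.fromstring(DOC)
print(root.text)   # the raw characters, with < and & left intact
print(len(root))   # number of child elements: <b> was not treated as a tag

# Without CDATA, the same text has to be written with entity references.
escaped = ET.fromstring(
    "<snippet>&lt;b&gt;Fish &amp; chips&lt;/b&gt; cost &lt; 5</snippet>"
)
print(escaped.text == root.text)
```

Both forms give the application the same character data; CDATA just spares the author from escaping every special character.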

SWE 444 - Internet & Web Application Development5.44 References W3 Schools XML Tutorial http://www.w3schools.com/xml/default.asp W3C XML page http://www.w3.org/XML/ XML Tutorials http://www.programmingtutorials.com/xml.aspx Online resource for markup language technologies http://xml.coverpages.org/ Several Online Presentations

SWE 444 - Internet & Web Application Development5.45 5.2 XPath What is XPath? Sample Syntactic Elements Path Slashes Brackets Stars Arithmetic Expressions Some XPath Functions

SWE 444 - Internet & Web Application Development5.46 What is XPath? XPath is a syntax used for selecting parts of an XML document The way XPath describes paths to elements is similar to the way an operating system describes paths to files XPath is almost a small programming language; it has functions, tests, and expressions XPath is a W3C standard http://www.w3.org/TR/xpath XPath is not itself written as XML, but is used heavily in XSLT

SWE 444 - Internet & Web Application Development5.48 Terminology library is the parent of book; book is the parent of the two chapters. The two chapters are the children of book, and the section is the child of the second chapter. The two chapters of the book are siblings (they have the same parent). library, book, and the second chapter are the ancestors of the section. The two chapters, the section, and the two paragraphs are the descendants of the book.

SWE 444 - Internet & Web Application Development5.49 Paths
Operating system: / = the root directory
XPath: /library = the root element (if named library)
Operating system: /users/dave/foo = the file named foo in dave in users
XPath: /library/book/chapter/section = every section element in a chapter in every book in the library
Operating system: . = the current directory
XPath: . = the current element
Operating system: .. = the parent directory
XPath: .. = parent of the current element
Operating system: /users/dave/* = all the files in /users/dave
XPath: /library/book/chapter/* = all the elements in /library/book/chapter
Operating system: foo = the file named foo in the current directory
XPath: section = every section element that is a child of the current element

SWE 444 - Internet & Web Application Development5.50 Slashes A path that begins with a / represents an absolute path, starting from the top of the document Example: /email/message/header/from Note that even an absolute path can select more than one element A slash by itself means the whole document A path that does not begin with a / represents a path starting from the current element Example: header/from A path that begins with // can start from anywhere in the document Example: //header/from selects every element from that is a child of an element header This can be expensive, since it involves searching the entire document
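The difference between a path anchored at the top and a // search can be sketched with Python's standard xml.etree.ElementTree, which implements only a subset of XPath. It does not accept absolute paths on an element, so the sketch starts at the root element and writes the remaining steps as a relative path. The email document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical email document using the element names from the examples above.
DOC = """<email>
  <message>
    <header><from>ann@example.com</from></header>
    <body><header><from>quoted@example.com</from></header></body>
  </message>
</email>"""

root = ET.fromstring(DOC)  # root is the <email> element

# Counterpart of /email/message/header/from: the first step (email) is the
# root element itself, so only the remaining steps are spelled out.
absolute = [e.text for e in root.findall("./message/header/from")]

# Counterpart of //header/from: look for header at every level.
anywhere = [e.text for e in root.findall(".//header/from")]

print(absolute)
print(anywhere)
```

The // form also finds the header nested inside body, at the cost of walking the whole tree.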

SWE 444 - Internet & Web Application Development5.51 Brackets and last() A number in brackets selects a particular matching child. Example: /library/book[1] selects the first book of the library. Example: //chapter/section[2] selects the second section of every chapter in the XML document. Example: //book/chapter[1]/section[2] Only matching elements are counted; for example, if a book has both sections and exercises, the latter are ignored when counting sections. The function last() in brackets selects the last matching child. Example: /library/book/chapter[last()] You can even do simple arithmetic. Example: /library/book/chapter[last()-1]
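Positional predicates and last() are among the XPath features that Python's standard xml.etree.ElementTree supports, so they can be sketched directly. The library document is made up for illustration, and the paths are written relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical library: one book, three chapters, one stray exercise.
DOC = """<library>
  <book>
    <chapter><section>1.1</section><exercise>e1</exercise><section>1.2</section></chapter>
    <chapter><section>2.1</section></chapter>
    <chapter><section>3.1</section><section>3.2</section></chapter>
  </book>
</library>"""

root = ET.fromstring(DOC)

# /library/book[1]: the first book
first_book = root.findall("./book[1]")

# //chapter/section[2]: the second section of every chapter that has one;
# the exercise in chapter 1 is not counted
second_sections = [s.text for s in root.findall(".//chapter/section[2]")]

# /library/book/chapter[last()] and /library/book/chapter[last()-1]
last_chapter = root.find("./book/chapter[last()]")
next_to_last = root.find("./book/chapter[last()-1]")

print(second_sections)
print(last_chapter.findtext("section"), next_to_last.findtext("section"))
```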

SWE 444 - Internet & Web Application Development5.52 Stars A star, or asterisk, is a wild card--it means all the elements at this level. Example: /library/book/chapter/* selects every child of every chapter of every book in the library. Example: //book/* selects every child of every book (chapters, tableOfContents, index, etc.). Example: /*/*/*/paragraph selects every paragraph that has exactly three ancestors. Example: //* selects every element in the entire document.
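The wildcard examples can be sketched with Python's standard xml.etree.ElementTree. The document is hypothetical, and each path is written relative to the root element library (so the first step of each absolute path is implied), because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical library with paragraphs at two different depths.
DOC = """<library>
  <book>
    <tableOfContents/>
    <chapter>
      <paragraph>p1</paragraph>
      <section><paragraph>p2</paragraph></section>
    </chapter>
    <index/>
  </book>
</library>"""

root = ET.fromstring(DOC)

# /library/book/chapter/*: every child of every chapter
chapter_children = [e.tag for e in root.findall("./book/chapter/*")]

# //book/*: every child of every book
book_children = [e.tag for e in root.findall(".//book/*")]

# /*/*/*/paragraph: paragraphs with exactly three ancestors; the root
# element stands for the first star
depth_three = [e.text for e in root.findall("./*/*/paragraph")]

# //*: every element in the document, root included
everything = list(root.iter())

print(chapter_children, book_children, depth_three, len(everything))
```

The paragraph inside the section is skipped by the three-star path because it has four ancestors.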

SWE 444 - Internet & Web Application Development5.53 Attributes I You can select attributes by themselves, or elements that have certain attributes. Remember: an attribute consists of a name-value pair; in the examples here, the attribute is named num. To choose the attribute itself, prefix the name with @. Example: @num will choose every attribute named num. Example: //@* will choose every attribute, everywhere in the document. To choose elements that have a given attribute, put the attribute name in square brackets. Example: //chapter[@num] will select every chapter element (anywhere in the document) that has an attribute named num.

SWE 444 - Internet & Web Application Development5.54 Attributes II //chapter[@num] selects every chapter element with an attribute num //chapter[not(@num)] selects every chapter element that does not have a num attribute //chapter[@*] selects every chapter element that has any attribute //chapter[not(@*)] selects every chapter element with no attributes

SWE 444 - Internet & Web Application Development5.55 Values of attributes //chapter[@num='3'] selects every chapter element with an attribute num with value 3 The normalize-space() function can be used to remove leading and trailing spaces from a value before comparison Example: //chapter[normalize-space(@num)="3"]
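A sketch of the attribute tests with Python's standard xml.etree.ElementTree. It supports [@num] and [@num='3'] directly, but not not() or normalize-space(), so those two are emulated in plain Python here. The sample chapters are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapters: one value padded with spaces, one without num at all.
DOC = """<book>
  <chapter num="1"/>
  <chapter num=" 3 "/>
  <chapter num="3" title="Paths"/>
  <chapter/>
</book>"""

root = ET.fromstring(DOC)

# //chapter[@num]
with_num = root.findall(".//chapter[@num]")

# //chapter[@num='3'] compares the exact string, so " 3 " does not match
num_is_3 = root.findall(".//chapter[@num='3']")

# //chapter[not(@num)], emulated with a Python test
without_num = [c for c in root.iter("chapter") if "num" not in c.attrib]

# //chapter[normalize-space(@num)="3"], emulated by trimming and
# collapsing whitespace before the comparison
normalized = [
    c for c in root.iter("chapter")
    if " ".join(c.get("num", "").split()) == "3"
]

print(len(with_num), len(num_is_3), len(without_num), len(normalized))
```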

SWE 444 - Internet & Web Application Development5.56 Arithmetic Expressions + add - subtract * multiply div (not / ) divide mod modulo (remainder)

SWE 444 - Internet & Web Application Development5.57 Equality Tests = equals (notice it's not ==); != not equals. But it's not that simple! value = node-set will be true if the node-set contains any node with a value that matches value. value != node-set will be true if the node-set contains any node with a value that does not match value. Hence, value = node-set and value != node-set may both be true at the same time!
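The "any node" rule is easy to misread, so here is a sketch that emulates it in plain Python. This is not an XPath engine: the helper functions equals and not_equals are hypothetical names that only mirror the semantics described above, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

DOC = "<book><chapter num='1'/><chapter num='2'/><chapter num='3'/></book>"
root = ET.fromstring(DOC)

# The node-set: the num values of all chapters.
node_set = [c.get("num") for c in root.findall("chapter")]

def equals(value, nodes):
    """Emulates 'value = node-set': true if ANY node's value matches."""
    return any(n == value for n in nodes)

def not_equals(value, nodes):
    """Emulates 'value != node-set': true if ANY node's value differs."""
    return any(n != value for n in nodes)

both = (equals("2", node_set), not_equals("2", node_set))
print(both)                        # both comparisons hold at once
print(not equals("2", node_set))   # not(value = node-set) is the real negation
```

To ask "no node has this value", wrap the equality in not() rather than using !=.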

SWE 444 - Internet & Web Application Development5.58 Other Boolean Operators
and (infix operator)
or (infix operator) Example: count = 0 or count = 1
not() (function)
The following are used for numerical comparisons only:
< less than
<= less than or equal to
> greater than
>= greater than or equal to

SWE 444 - Internet & Web Application Development5.59 Some XPath Functions XPath contains a number of functions on node sets, numbers, and strings; here are a few of them: count(elem) counts the number of selected elements. Example: //chapter[count(section)=1] selects chapters with exactly one section child. name() returns the name of the element. Example: //*[name()='section'] is the same as //section. starts-with(arg1, arg2) tests if arg1 starts with arg2. Example: //*[starts-with(name(), 'sec')] contains(arg1, arg2) tests if arg1 contains arg2. Example: //*[contains(name(), 'ect')] Examples: http://www.zvon.org/xxl/XPathTutorial/General/examples.html
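Python's standard xml.etree.ElementTree does not implement these XPath functions, so the sketch below emulates each example with an ordinary Python expression to show what the function computes. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: chapters with one section, two sections, and a <select>.
DOC = """<book>
  <chapter><section/></chapter>
  <chapter><section/><section/></chapter>
  <chapter><select/></chapter>
</book>"""

root = ET.fromstring(DOC)

# //chapter[count(section)=1]
one_section = [c for c in root.iter("chapter") if len(c.findall("section")) == 1]

# //*[name()='section'], the same as //section
named_section = [e for e in root.iter() if e.tag == "section"]

# //*[starts-with(name(), 'sec')]
starts_sec = [e.tag for e in root.iter() if e.tag.startswith("sec")]

# //*[contains(name(), 'ect')]
contains_ect = [e.tag for e in root.iter() if "ect" in e.tag]

print(len(one_section), len(named_section), starts_sec, contains_ect)
```

The select element shows the difference between the last two tests: its name contains 'ect' but does not start with 'sec'.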

SWE 444 - Internet & Web Application Development5.60 References W3School XPath Tutorial http://www.w3schools.com/xpath/default.asp MSXML 4.0 SDK Several online presentations

SWE 444 - Internet & Web Application Development5.61 5.3 XSL / XSLT What is XSL? Some XSLT Constructs xsl:value-of xsl:for-each xsl:if xsl:choose xsl:sort xsl:text xsl:attribute Templates XSL on the Client XSL on the Server

SWE 444 - Internet & Web Application Development5.62 What is XSL? XSL stands for eXtensible Stylesheet Language a standard recommended by the W3C http://www.w3.org/TR/xsl/ CSS was designed for styling HTML pages, and can be used to style XML pages XSL was designed specifically to style XML pages, and is much more sophisticated than CSS XSL consists of three languages: XSLT (XSL Transformations) is a language used to transform XML documents into other kinds of documents (most commonly HTML, so they can be displayed) XPath is a language to select parts of an XML document to transform with XSLT XSL-FO (XSL Formatting Objects) is a replacement for CSS The future of XSL-FO as a standard is uncertain, because much of its functionality overlaps with that provided by cascading style sheets (CSS) and the HTML tag set

SWE 444 - Internet & Web Application Development5.63 How does it work? The XML source document is parsed into an XML source tree. You use XPath to define templates that match parts of the source tree. You use XSLT to transform the matched part and put the transformed information into the result tree. The result tree is output as a result document. Parts of the source document that are not matched by a template are handled by XSLT's built-in rules, which typically copy their text content to the output unchanged.

SWE 444 - Internet & Web Application Development5.64 Simple XPath Here's a simple XML document: XML Gregory Brill Java and XML Brett Scott XPath expressions look a lot like paths in a computer file system. / means the document itself (but no specific elements). /library selects the root element. /library/book selects every book element. //author selects every author element, wherever it occurs.

SWE 444 - Internet & Web Application Development5.65 Simple XSLT loops through every book element, everywhere in the document chooses the content of the title element at the current location chooses the content of the title element for each book in the XML document

SWE 444 - Internet & Web Application Development5.66 Using XSL to Create HTML Our goal is to turn this: XML Gregory Brill Java and XML Brett Scott Into HTML that displays something like this: Book Titles: XML Java and XML Book Authors: Gregory Brill Brett Scott Note that we've grouped titles and authors separately

SWE 444 - Internet & Web Application Development5.67 What we need to do We need to save our XML into a file (let's call it books.xml). We need to create a file (say, books.xsl) that describes how to select elements from books.xml and embed them into an HTML page. We do this by intermixing the HTML and the XSL in the books.xsl file. We need to add a line to our books.xml file to tell it to refer to books.xsl for formatting information.

SWE 444 - Internet & Web Application Development5.68 books.xml, revised XML Gregory Brill Java and XML Brett McLaughlin This tells you where to find the XSL file

SWE 444 - Internet & Web Application Development5.69 Desired HTML Book Titles and Authors Book titles: XML Java and XML Book authors: Gregory Brill Brett Scott Red text is data extracted from the XML document. Blue text is our HTML template. We don't necessarily know how much data we will have.

SWE 444 - Internet & Web Application Development5.70 XSL Outline...

SWE 444 - Internet & Web Application Development5.71 Selecting Titles and Authors Book titles: Book authors:...same thing, replacing title with author Notice that XSL can rearrange the data; the HTML result can present information in a different order than the XML. Notice the xsl:for-each loop.

SWE 444 - Internet & Web Application Development5.72 All of books.xml XML Gregory Brill Java and XML Brett Scott Note: if you do View Source, this is what you will see, not the resultant HTML

SWE 444 - Internet & Web Application Development5.73 All of books.xsl Book Titles and Authors Book titles: Book authors:

SWE 444 - Internet & Web Application Development5.74 How to use it In a modern browser, such as Netscape 6, Internet Explorer 6, or Mozilla 1.0, you can just open the XML file Older browsers will ignore the XSL and just show you the XML contents as continuous text You can use a program such as Xalan, MSXML, or Saxon to create the HTML as a file This can be done on the server side, so that all the client side browser sees is plain HTML The server can create the HTML dynamically from the information currently in XML

SWE 444 - Internet & Web Application Development5.75 The result (in IE)

SWE 444 - Internet & Web Application Development5.76 XSLT XSLT stands for eXtensible Stylesheet Language Transformations XSLT is used to transform XML documents into other kinds of documents--usually, but not necessarily, XHTML XSLT uses two input files: The XML document containing the actual data The XSL document containing both the framework in which to insert the data, and XSLT commands to do so

SWE 444 - Internet & Web Application Development5.77 Understanding the XSLT Process

SWE 444 - Internet & Web Application Development5.78 The XSLT Processor

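SWE 444 - Internet & Web Application Development5.79 The .xsl file An XSLT document has the .xsl extension. The XSLT document begins with: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> Contains one or more templates, such as: <xsl:template match="/"> ... </xsl:template> And ends with: </xsl:stylesheet> The template <xsl:template match="/"> says to select the entire file. You can think of this as selecting the root node of the XML tree.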

SWE 444 - Internet & Web Application Development5.80 Where XSLT can be used A server can use XSLT to change XML files into HTML files before sending them to the client A modern browser can use XSLT to change XML into HTML on the client side This is what we will mostly be doing here Most users seldom update their browsers If you want everyone to see your pages, do any XSL processing on the server side Otherwise, think about what best fits your situation

SWE 444 - Internet & Web Application Development5.81 xsl:value-of selects the contents of an element and adds it to the output stream The select attribute is required Notice that xsl:value-of is not a container tag, hence it needs to end with a slash

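SWE 444 - Internet & Web Application Development5.82 xsl:for-each xsl:for-each is a kind of loop statement. The syntax is <xsl:for-each select="XPath expression"> Text to insert and rules to apply </xsl:for-each> Example: to select every book (//book) and make an unordered list (<ul>) of their titles (title), use: <ul> <xsl:for-each select="//book"> <li> <xsl:value-of select="title"/> </li> </xsl:for-each> </ul>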

SWE 444 - Internet & Web Application Development5.83 Filtering Output You can filter (restrict) output by adding a criterion to the select attribute's value: <xsl:value-of select="title[../author='Brett Scott']"/> This will select book titles by Brett Scott.

SWE 444 - Internet & Web Application Development5.84 Filter Details Here is the filter we just used: <xsl:value-of select="title[../author='Brett Scott']"/> author is a sibling of title, so from title we have to go up to its parent, book, then back down to author. This filter requires a quote within a quote, so we need both single quotes and double quotes. Legal filter operators are: = != < > Numbers should be quoted.

SWE 444 - Internet & Web Application Development5.85 But it doesn't work right! Here's what we did: we put the filtered xsl:value-of inside an <li> within the xsl:for-each loop. This will output <li> and </li> for every book, so we will get empty bullets for authors other than Brett Scott. There is no obvious way to solve this with just xsl:value-of.

SWE 444 - Internet & Web Application Development5.86 xsl:if xsl:if allows us to include content if a given condition (in the test attribute) is true Example: This does work correctly!
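The difference between filtering inside xsl:value-of and wrapping the output in xsl:if can be sketched in plain Python. This is an emulation of the logic only, not XSLT: the function names are hypothetical, and the sample document assumes the library, book, title, and author element names used in the XPath examples.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the two-book library used throughout this section.
DOC = """<library>
  <book><title>XML</title><author>Gregory Brill</author></book>
  <book><title>Java and XML</title><author>Brett Scott</author></book>
</library>"""

library = ET.fromstring(DOC)

def with_filtered_value_of(lib):
    """One <li> per book; only the title is subject to the filter."""
    items = []
    for book in lib.iter("book"):
        matches = book.findtext("author") == "Brett Scott"
        title = book.findtext("title") if matches else ""
        items.append("<li>" + title + "</li>")
    return items

def with_if(lib):
    """The whole <li> sits inside the test, so other books emit nothing."""
    items = []
    for book in lib.iter("book"):
        if book.findtext("author") == "Brett Scott":
            items.append("<li>" + book.findtext("title") + "</li>")
    return items

print(with_filtered_value_of(library))
print(with_if(library))
```

The first version produces an empty bullet for the book by the other author; the second produces only the matching bullet, which is the behaviour xsl:if provides.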

SWE 444 - Internet & Web Application Development5.87 xsl:choose The xsl:choose ... xsl:when ... xsl:otherwise construct is XSLT's equivalent of Java's switch ... case ... default statement. The syntax is: <xsl:choose> <xsl:when test="some condition"> ... some code ... </xsl:when> <xsl:otherwise> ... some code ... </xsl:otherwise> </xsl:choose> xsl:choose is often used within an xsl:for-each loop.

Slide 5.88: xsl:sort
You can place an xsl:sort inside an xsl:for-each.
The select attribute of the sort tells which field to sort on.
Example: sorting on author inside a loop that outputs each title, the word "by", and the author creates a list of titles and authors, sorted by author.
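
The example described above might look like the following sketch, again assuming book elements with title and author children. Note that xsl:sort has to come first inside the xsl:for-each.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//book">
        <!-- Sort key: the author child of each selected book -->
        <xsl:sort select="author"/>
        <li>
          <xsl:value-of select="title"/> by <xsl:value-of select="author"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```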

Slide 5.89: xsl:text
<xsl:text>...</xsl:text> is used inside templates to indicate that its contents should be output as text. Its contents are pure text, not elements, and white space is not collapsed.
xsl:text helps deal with two common problems.
First, XSL isn't very careful with whitespace in the document. This doesn't matter much for HTML, which collapses all whitespace anyway. xsl:text gives you much better control over whitespace; it acts like the pre element in HTML.
Second, since XML defines only five entities, you cannot readily put other entities (such as &nbsp;) in your XSL. The five are &amp; (&), &lt; (<), &gt; (>), &quot; ("), and &apos; ('). Others can be inserted using their decimal or hexadecimal number forms.
You may use the following "secret formula" for entities: an xsl:text element with disable-output-escaping="yes" whose content is the escaped entity, such as &amp;nbsp;. A yes value means special characters like & are written to the output as is, without escaping.
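
A short sketch of both uses, assuming book elements with title and author children. The numeric reference &#160; is the standard code point for a non-breaking space; the last xsl:text shows the disable-output-escaping form mentioned on the slide, which not every XSLT processor is required to honor.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="book">
    <xsl:value-of select="title"/>
    <!-- Literal text with exactly the whitespace written here -->
    <xsl:text>, by </xsl:text>
    <xsl:value-of select="author"/>
    <!-- A non-breaking space through its decimal character reference -->
    <xsl:text>&#160;</xsl:text>
    <!-- Writes the characters of the nbsp entity reference unescaped to the output -->
    <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```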

Slide 5.110: A DTD example
The example DOCTYPE describes a novel with the following rules:
A novel consists of a foreword and one or more chapters, in that order.
Each chapter must have a number attribute.
A foreword consists of one or more paragraphs.
A chapter also consists of one or more paragraphs.
A paragraph consists of parsed character data (text that cannot contain any other elements).
PCDATA is text that will be parsed by a parser: tags inside the text are treated as markup and entities are expanded.
CDATA is text that will NOT be parsed by a parser: tags inside the text are not treated as markup and entities are not expanded.
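
The DTD listing itself is missing from the transcript. The sketch below is reconstructed from the rules stated on the slide; the element names follow those rules, while the sample text in the document body is invented for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE novel [
  <!-- A novel is a foreword followed by one or more chapters -->
  <!ELEMENT novel (foreword, chapter+)>
  <!ELEMENT foreword (paragraph+)>
  <!ELEMENT chapter (paragraph+)>
  <!ELEMENT paragraph (#PCDATA)>
  <!-- Every chapter must carry a number attribute -->
  <!ATTLIST chapter number CDATA #REQUIRED>
]>
<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
  </chapter>
</novel>
```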

Slide 5.111: ELEMENT descriptions
Suffixes:
?  optional, as in foreword?
+  one or more, as in chapter+
*  zero or more, as in appendix*
Separators:
,  both, in order, as in foreword?, chapter+
|  or, as in section|chapter
Grouping:
( )  grouping, as in (section|chapter)+

Slide 5.112: Elements without children
The syntax is <!ELEMENT name category>.
The name is the element name used in start and end tags.
The category may be EMPTY. In the DTD, the element is declared with the category EMPTY; in the XML, it is written as a start tag immediately followed by an end tag, or just as a single self-closing tag.
In the XML, an empty element may not have any content between the start tag and the end tag.
An empty element may (and usually does) have attributes.

Slide 5.113: Elements with unstructured children
The syntax is <!ELEMENT name category>.
The category may be ANY. This indicates that any content (character data and any declared elements, in any order) may be used.
Since the whole point of using a DTD is to define the structure of a document, ANY should be avoided wherever possible.
The category may be (#PCDATA), indicating that only character data may be used. For example, an element declared as (#PCDATA) can hold the text "A shot rang out!". The parentheses are required!
Note: In (#PCDATA), whitespace is kept exactly as entered.
Elements may not be used within parsed character data.
Entities are character data, and may be used.

Slide 5.114: Elements with children
A category may describe one or more children, written as a parenthesized list after the element name.
Parentheses are required, even if there is only one child.
A space must precede the opening parenthesis.
Commas (,) between elements mean that all children must appear, and must be in the order specified.
| separators mean that any one child may be used.
All child elements must themselves be declared.
Children may have children.
Parentheses can be used for grouping, as in (section|chapter)+.

Slide 5.115: Elements with mixed content
#PCDATA describes elements with only character data.
#PCDATA can also be used in an "or" grouping together with element names. This is called mixed content.
Certain (rather severe) restrictions apply:
#PCDATA must be first.
The separators must be |.
The group must be starred (meaning zero or more).
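
A small sketch of a mixed-content declaration that obeys the three restrictions above. The element names paragraph and emphasis are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE paragraph [
  <!-- #PCDATA first, | as the separator, and the whole group starred -->
  <!ELEMENT paragraph (#PCDATA | emphasis)*>
  <!ELEMENT emphasis (#PCDATA)>
]>
<paragraph>A shot <emphasis>rang out</emphasis> in the night.</paragraph>
```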

Slide 5.116: Names and namespaces
All names of elements, attributes, and entities, in both the DTD and the XML, are formed as follows:
The name must begin with a letter or underscore.
The name may contain only letters, digits, dots, hyphens, underscores, and colons.
The DTD doesn't know about namespaces; as far as it knows, a colon is just part of a name. A name with a prefix and the same name without the prefix are therefore different (and both legal) names.
Avoid colons in names, except to indicate namespaces.

Slide 5.117: An expanded DTD example
(The DTD listing for this slide is not preserved in the transcript.)

Slide 5.118: Attributes and entities
In addition to elements, a DTD may declare attributes and entities.
An attribute describes information that can be put within the start tag of an element. In the XML it appears as name="value" inside the start tag; in the DTD it is declared with <!ATTLIST ...>.
An entity describes text to be substituted. In the XML it appears as a reference such as &copyright;; in the DTD it is declared with <!ENTITY ...>.

Slide 5.119: Attributes
The format of an attribute declaration is <!ATTLIST element-name name type requirement>, where the name-type-requirement triple may be repeated as many times as desired.
Note that only spaces separate the parts, so careful counting is essential.
The element-name tells which element may have these attributes.
The name is the name of the attribute.
Each attribute has a type, such as CDATA (character data).
Each attribute may be required, optional, or fixed.
In the XML, attributes may occur in any order.

Slide 5.120: Important attribute types
There are ten attribute types. These are the most important ones:
CDATA: The value is character data.
(man|woman|child): The value is one from this list.
ID: The value is a unique identifier. ID values must be legal XML names and must be unique within the document.
NMTOKEN: The value is a name token, made up only of name characters (letters, digits, dots, hyphens, underscores, colons). This is sometimes used to disallow whitespace in the value. Unlike an XML name, a name token may begin with a digit.
The other six, less frequently used, are: IDREF, IDREFS, NMTOKENS, ENTITY, ENTITIES, NOTATION. The xml: prefix marks predefined attributes such as xml:lang rather than a separate type.

Slide 5.121: Requirements
Recall that an attribute declaration has the form <!ATTLIST element-name name type requirement>.
The requirement is one of:
A default value, enclosed in quotes: used when the attribute is omitted in the XML.
#REQUIRED: The attribute must be present.
#IMPLIED: The attribute is optional.
#FIXED "value": The attribute always has the given value. If specified in the XML, the same value must be used.
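
To make the four kinds of requirement concrete, here is a hedged sketch of a single ATTLIST that uses each of them once. The person element and its attribute names are hypothetical; only the (man|woman|child) list comes from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT person (#PCDATA)>
  <!-- One name-type-requirement triple per line -->
  <!ATTLIST person
      id       ID                 #REQUIRED
      kind     (man|woman|child)  "man"
      nickname CDATA              #IMPLIED
      species  CDATA              #FIXED "human">
]>
<!-- nickname is omitted (optional); species repeats the fixed value -->
<person id="p1" kind="woman" species="human">Ada</person>
```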

Slide 5.122: Entities
There are exactly five predefined entities: &lt;, &gt;, &amp;, &quot;, and &apos;.
Additional entities can be defined in the DTD with an <!ENTITY ...> declaration.
Entities can also be defined in another document and referenced from the DTD.
Example of use in the XML: This document is &copyright; 2002.
Entities are a way to include fixed text (sometimes called "boilerplate").
Entities should not be confused with character references, which are numerical values between &# and ;
Example: &#233; or &#xE9; to indicate the character é.
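
A minimal sketch of an internal entity and of character references in one document. The entity name copyright is taken from the slide; the notice element and the replacement text are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE notice [
  <!ELEMENT notice (#PCDATA)>
  <!-- Boilerplate text that is substituted wherever the entity is referenced -->
  <!ENTITY copyright "Copyright Example Press">
]>
<!-- The decimal and hexadecimal references below name the same character -->
<notice>This document is &copyright; 2002. Caf&#233; and caf&#xE9; are the same word.</notice>
```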

Slide 5.123: Another example: XML
(The element markup is not preserved; the remaining text content is: 05/29/2002; Philadelphia, PA; USA; 84; 51.)

Slide 5.124: The DTD for this example

Slide 5.125: Inline DTDs
If a DTD is used only by a single XML document, it can be put directly in that document, between the square brackets of the DOCTYPE declaration: <!DOCTYPE myRootElement [ ... ]>.
An inline DTD can be used only by the document in which it occurs.

Slide 5.126: External DTDs
An external DTD (a DTD that is a separate document) is declared with the SYSTEM or PUBLIC keyword in the DOCTYPE declaration.
The name that appears after DOCTYPE (myRootElement in the slide's example) must match the name of the XML document's root element.
Use SYSTEM for external DTDs that you define yourself, and use PUBLIC for official, published DTDs.
The file extension for an external DTD is .dtd.
External DTDs can only be referenced with a URL.
External DTDs are almost always preferable to inline DTDs, since they can be used by more than one document.
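
The declaration from the slide is not preserved. The sketch below shows the SYSTEM form with a placeholder URL (www.example.com is an assumption, not the slide's address), and quotes the well-known XHTML 1.0 Strict declaration in a comment as an instance of the PUBLIC form.

```xml
<?xml version="1.0"?>
<!-- SYSTEM: a DTD you define yourself, located by URL -->
<!DOCTYPE myRootElement SYSTEM "http://www.example.com/mydoc.dtd">
<!-- PUBLIC: a published DTD, given a public identifier plus a URL, e.g.
     DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" -->
<myRootElement>
  content allowed by the external DTD
</myRootElement>
```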

Slide 5.127: Limitations of DTDs
DTDs are a very weak specification language.
You can't put any restrictions on the text content of elements.
It's difficult to specify that all the children must occur but may be in any order, or that an element must occur a certain number of times.
There are only ten data types for attribute values.
But most of all: DTDs aren't written in XML! If you want to do any validation, you need one parser for the XML and another for the DTD. This makes XML parsing harder than it needs to be.
There is a newer and more powerful technology: XML Schemas.
However, DTDs are still very much in use.

Slide 5.128: Validators
Opera 5 and Internet Explorer 5 can validate your XML against an internal DTD.
IE provides (slightly) better error messages.
Opera apparently just ignores external DTDs.
IE considers an external DTD to be an error.
jEdit with the XML plugin will check for well-formedness and (if the DTD is inline) will validate your XML each time you do a Save: http://www.jedit.org/
Validate [Using Inline DTD]: http://www.stg.brown.edu/service/xmlvalid/

Slide 5.129: References
W3Schools DTD Tutorial: http://www.w3schools.com/dtd/default.asp
MSXML 4.0 SDK
http://www.topxml.com
http://www.xml.org
http://www.xml.com
Several online presentations

Slide 5.130: 5.5 XML Schema Definition (XSD)
What is XSD?
An XML Document with Its Schema
Referencing a Schema from an XML Document
Simple and Complex Elements
Predefined Types: numeric types, date and time types, string types
Defining Schema Components: simple elements, attributes, restrictions or facets, enumeration, complex elements

Slide 5.131: What is XML Schema?
The origin of schema:
XML Schema documents are used to define and validate the content and structure of XML data.
XML Schema was originally proposed by Microsoft, but became an official W3C recommendation in May 2001.
http://www.w3.org/XML/Schema

Slide 5.132: Why Schema?
Separating information from structure and format.
Traditional document: information, structure, and format are all clumped together.
"Fashionable" document: the document is broken into discrete parts (information, structure, format), which can be treated separately.

Slide 5.133: Why Schema?
Schema workflow (diagram).

Slide 5.134: DTD vs. Schema
DTD | XSD
No constraints on character data | Can constrain character data, for example by requiring a string to have a fixed number of characters
Does not use XML syntax | Uses XML syntax, which frees the developer from learning another language; XML transformations can be applied, too
No support for namespaces | Supports namespaces
Very limited reusability and extensibility | Can be reused in other schemas, can define derived data types, and a document can reference multiple schemas
DTD-based validators are easier to write: they may only need to check the existence of content such as PCDATA | Schema-based validators are more difficult to write because content may have to be validated in detail
Easier to understand | More complex: the notion of "type" adds an extra layer of confusing complexity

Slide 5.135: XML.org Registry
The XML.org Registry offers a central clearinghouse for developers and standards bodies to publicly submit, publish, and exchange XML schemas, vocabularies, and related documents.

Slide 5.136: Example 1: An XML Document Instance

Slide 5.137: Schema for Example 1 (book.xsd)

Slide 5.138: Example 2: An XML Document and Its Schema
The document's text content reads: Dear Mr. John Smith. Your order 1032 will be shipped on 2001-07-13.

Slide 5.139: The XSD Document
Since the XSD is written in XML, it can get confusing which one we are talking about.
The file extension is .xsd.
The root element is schema.
The XSD starts with the XML declaration followed by the opening tag of the schema element.

Slide 5.140: Attributes of the schema element
The schema element may have attributes:
xmlns:xs="http://www.w3.org/2001/XMLSchema" indicates that the elements used in the schema (schema, element, complexType, etc.) come from this namespace.
elementFormDefault="qualified" means that elements declared in this schema must be namespace-qualified when they appear in an XML instance document, that is, they must belong to the schema's target namespace through a prefix or a default namespace.

Slide 5.141: Referring to a Schema
To refer to a DTD in an XML document, the reference goes before the root element, in a DOCTYPE declaration.
To refer to an XML Schema in an XML document, the reference goes in the root element, as attributes:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declares the schema instance namespace.
xsi:schemaLocation has two values: the first is the namespace to use, and the second is the location of the XML schema to use for that namespace.
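
The root element from the slide is missing in the transcript. The following sketch shows where the two attributes go; the namespace URI http://www.example.com/note, the file name note.xsd, and the element names are placeholders, not values from the slides.

```xml
<?xml version="1.0"?>
<!-- xsi:schemaLocation pairs a namespace with the location of its schema -->
<note xmlns="http://www.example.com/note"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.example.com/note note.xsd">
  <to>Tove</to>
  <from>Jani</from>
</note>
```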

Slide 5.142: Simple and Complex Elements
A simple element is one that contains text and nothing else:
A simple element cannot have attributes.
A simple element cannot contain other elements.
A simple element cannot be empty.
However, the text can be of many different types, and may have various restrictions applied to it.
If an element isn't simple, it's complex:
A complex element may have attributes.
A complex element may be empty, or it may contain text, other elements, or both text and other elements.

Slide 5.143: Predefined Numeric Types
Here are some of the predefined numeric types:
xs:decimal, xs:byte, xs:short, xs:int, xs:long
xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger
Allowable restrictions on numeric types: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace

Slide 5.144: Predefined Date and Time Types
xs:date: a date in the format CCYY-MM-DD, for example, 2003-11-05
xs:time: a time in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime: format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace

Slide 5.145: Predefined String Types
Recall that a simple element is defined as <xs:element name="name" type="type"/>.
Here are a few of the possible string types:
xs:string: a string
xs:normalizedString: a string that doesn't contain tabs, newlines, or carriage returns
xs:token: a string that doesn't contain any whitespace other than single spaces
Allowable restrictions on strings: enumeration, length, maxLength, minLength, pattern, whiteSpace

Slide 5.146: Defining a Simple Element
A simple element is defined as <xs:element name="name" type="type"/> where:
name is the name of the element.
The most common values for type are xs:boolean, xs:date, xs:decimal, xs:integer, xs:string, and xs:time.
Other attributes a definition of a simple element may have:
default="default value": used if no other value is specified
fixed="value": no other value may be specified

Slide 5.149: Restrictions, or Facets
The "age" element is a simple type with a restriction. The acceptable values are 20 to 100.
The same restriction can also be written as a named simple type that the element refers to through its type attribute.
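
Neither form of the slide's example survived extraction. The sketch below shows both under stated assumptions: the base type xs:integer, the type name ageType, and the second element name retirementAge are hypothetical; only the element name age and the range 20 to 100 come from the slide.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Form 1: the restriction is nested inside the element (anonymous type) -->
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="20"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Form 2: the same restriction as a named type that elements can share -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="20"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="retirementAge" type="ageType"/>
</xs:schema>
```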

Slide 5.150: Restrictions on numbers
minInclusive: number must be ≥ the given value
minExclusive: number must be > the given value
maxInclusive: number must be ≤ the given value
maxExclusive: number must be < the given value
totalDigits: number must have no more than value digits in total
fractionDigits: number must have no more than value digits after the decimal point

Slide 5.151: Restrictions on strings
length: the string must contain exactly value characters
minLength: the string must contain at least value characters
maxLength: the string must contain no more than value characters
pattern: the value is a regular expression that the string must match
whiteSpace: not really a "restriction"; it tells what to do with whitespace
value="preserve": keep all whitespace
value="replace": change all whitespace characters to spaces
value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space

Slide 5.152: Restriction with Regular Expression Patterns
A pattern facet restricts a value to strings that match a regular expression.
Test such patterns and find out whether the semantics of regular expressions is the same as in JavaScript.
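
The patterns shown on the slide are not preserved, so here is one illustrative pattern of my own choosing. The type name initialsType and the element name initials are hypothetical. One difference from JavaScript worth checking when you test: an XSD pattern must match the entire value, so no ^ or $ anchors are written.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="initialsType">
    <xs:restriction base="xs:string">
      <!-- Exactly three uppercase letters; the whole value has to match -->
      <xs:pattern value="[A-Z]{3}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="initials" type="initialsType"/>
</xs:schema>
```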

Slide 5.153: Enumeration
An enumeration restricts the value to be one of a fixed set of values, each listed in its own xs:enumeration facet.
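
As a hedged illustration (the slide's own example is missing), an enumeration might be declared as follows; the element name season and its values are invented.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="season">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <!-- Only these four values are valid content -->
        <xs:enumeration value="Spring"/>
        <xs:enumeration value="Summer"/>
        <xs:enumeration value="Autumn"/>
        <xs:enumeration value="Winter"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```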

Slide 5.154: Complex Elements
A complex element is defined as an xs:element that contains an xs:complexType element, which in turn holds the information about the complex type (its child elements and attributes).
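
A minimal sketch of such a definition, using an xs:sequence so that the children must appear in the order given. The element names person, firstName, and lastName are assumptions, since the slide's example is not preserved.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <!-- Children must occur in exactly this order -->
      <xs:sequence>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```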

Slide 5.155: Complex Elements
Another example uses a type attribute: the complex type is given a name, and the element refers to that name instead of nesting the type definition.

Slide 5.156: xs:sequence
We've already seen an example of a complex type whose elements must occur in a specific order: the child elements are listed inside an xs:sequence.

Slide 5.157: xs:all
xs:all allows elements to appear in any order.
Despite the name, the members of an xs:all group can occur once or not at all.
You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (default value is 1). In this context, n may only be 0 or 1.
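
A short sketch of an xs:all group, with hypothetical element names. The first two children are required (the default of 1 applies) and the third is optional; all three may appear in any order.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:all>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
        <!-- minOccurs="0" makes this member optional -->
        <xs:element name="nickname" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```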

Slide 5.158: Extensions
You can base a complex type on another complex type: the new type names the existing type as its base inside an xs:extension element and then adds the new content.
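
The slide's listing is missing; the sketch below shows one way an extension can be written. The type names personType and employeeType and their child elements are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The base type -->
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The derived type: everything in personType, followed by the new content -->
  <xs:complexType name="employeeType">
    <xs:complexContent>
      <xs:extension base="personType">
        <xs:sequence>
          <xs:element name="department" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="employee" type="employeeType"/>
</xs:schema>
```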

Slide 5.159: Text Element with Attributes
If a text element has attributes, it is no longer a simple type; it must be declared as a complex type with simple content.
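
A hedged sketch of a text-only element that carries an attribute, declared with xs:simpleContent. The element name price and the attribute currency are hypothetical. An instance would look like a price element with currency="USD" and the text 19.95.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:complexType>
      <!-- simpleContent: text only, but attributes are allowed -->
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```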

Slide 5.160: Empty Elements
Empty elements are (ridiculously) complex: an element with no content still has to be declared with a complex type.
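
The slide's example is not preserved. One compact way to declare an empty element, sketched here with a hypothetical product element and prodid attribute, is a complex type that lists only attributes and no child content.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- No sequence, all, or simpleContent: the element can hold nothing but attributes -->
  <xs:element name="product">
    <xs:complexType>
      <xs:attribute name="prodid" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```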

Slide 5.161: Mixed Elements
Mixed elements may contain both text and elements.
We add mixed="true" to the xs:complexType element.
The text itself is not mentioned in the element, and may go anywhere (it is basically ignored).
See Example 2 at the start of this section.
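
The schema for Example 2 is missing from the transcript. Assuming the letter wraps the name, the order number, and the shipping date in child elements (the names letter, name, orderid, and shipdate are assumptions based on the well-known W3Schools version of this example), a mixed-content declaration could be sketched as:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="letter">
    <!-- mixed="true" lets free text appear between the child elements -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="orderid" type="xs:positiveInteger"/>
        <xs:element name="shipdate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under that assumption, "John Smith", "1032", and "2001-07-13" would be the contents of the three child elements, and the rest of the sentence would be the free text around them.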

Slide 5.162: References
W3Schools XSD Tutorial: http://www.w3schools.com/schema/default.asp
MSXML 4.0 SDK
Several online presentations

Slide 5.163: Reading List
W3Schools XSD Tutorial: http://www.w3schools.com/schema/default.asp

Slide 5.164: 5.6 XML DOM
The XML DOM
XML Parsers: DOM-based, SAX-based
Examples:
Cross-browser XML DOM object creation
Creating an HTML table using XML data
Some XML DOM properties and methods
Creating XML using DOM methods

Slide 5.165: XML DOM
The DOM is a collection of interfaces that parser vendors and browser manufacturers implement to enable creation and manipulation of XML documents.
The DOM interfaces are specified in modules, making it possible for implementations to support only parts of the DOM. XML parsers, for instance, aren't required to provide support for the HTML-specific part of the DOM.
The W3C DOM is separated into different parts (Core, XML, and HTML) and different levels (DOM Level 1/2/3):
Core DOM: defines a standard set of objects for any structured document
XML DOM: defines a standard set of objects for XML documents
HTML DOM: defines a standard set of objects for HTML documents
The HTML DOM extends the Core DOM. The Core DOM provides interface definitions for manipulating and working with any XML; the HTML DOM augments these with additional interface definitions for HTML-specific elements.

Slide 5.166: XML DOM
The XML DOM is designed to be used with any programming language and any operating system. It is fully described in the W3C DOM specification: http://www.w3.org/DOM/
With the XML DOM, a programmer can create an XML document, navigate its structure, and add, modify, or delete its elements.
DOM provides generic access to DOM-compliant documents: add, edit, delete, manipulate.
DOM is language-independent.
The DOM is based on a tree view of your document. Nodes! Nodes! Nodes!


Slide 5.168: XML Parsers
As mentioned earlier, a software program called a parser is required to process an XML document.
Parsers can support the DOM and/or SAX for accessing a document's content programmatically using Java, C, JavaScript, etc.
A DOM-based parser builds a tree structure containing the XML document's data in memory.
A SAX (Simple API for XML)-based parser processes the document and generates events when tags, text, comments, etc. are encountered.
SAX and DOM are standards for XML parsers: DOM is a W3C standard; SAX is an ad-hoc (but very popular) standard.
Examples: JAXP (Java API for XML Parsing), MSXML 3.0 (Microsoft XML parser), Xerces (Apache's Xerces parser). All support both SAX and DOM.

Slide 5.169: SAX Callbacks
SAX works through callbacks: you call the parser, and it calls methods that you supply.
Your program: main(...) calls parse(...) on the SAX parser.
The SAX parser calls back into your program: startDocument(...), startElement(...), characters(...), endElement(...), endDocument(...).

Slide 5.170: Difference between SAX and DOM
DOM | SAX
Tree-based model | Event-based model: invokes methods when markup is encountered
Data can be accessed quickly (randomly) since all data is in memory | No tree structure is created: data is passed to the application from the XML document as it is found, so SAX provides only sequential access to data
Provides facilities for adding and removing nodes (i.e., modifying the document) | SAX implementations do not
Requires a lot of memory, which makes it impractical for very large XML documents | Less memory overhead

Slide 5.171: DOM components
Document: top-level view of the document, with access to all nodes (including the root element)
createElement method: creates an element node
createAttribute method: creates an attribute node
createComment method: creates a comment node
getDocumentElement method: returns the root element
appendChild method: appends a child node
getChildNodes method: returns child nodes

Slide 5.172: DOM components II
Node: represents a node. "A node is a reference to an element, its attributes, or text from the document."
cloneNode method: duplicates a node
getNodeName method: returns the node's name
getNodeType method: returns the node's type
getNodeValue method: returns the node's value
getParentNode method: returns the node's parent node
hasChildNodes method: true if the node has child nodes
insertBefore method: inserts a child before the specified child
removeChild method: removes the child node
replaceChild method: replaces one child with another
setNodeValue method: sets the node's value

Slide 5.173: DOM components III
Element: represents an element node; its methods give access to the element's attributes
getAttribute method: gets an attribute's value
getTagName method: gets the element's name
removeAttribute method: deletes an attribute
setAttribute method: sets an attribute's value

Slide 5.174: DOM Access with JavaScript
We have seen that the DOM can be mapped against XSLT style sheets to transform an XML document into a formatted Web page.
The DOM presents an XML document as a tree structure (a node tree), with the elements, attributes, and text defined as nodes.
We now show how JavaScript can be used to navigate the document tree and manipulate its data, although this does not permit saving an XML document locally.
An XML file is made accessible to scripting by loading it into a Document Object Model.
When the MSXML parser loads an XML document, it reads it from start to finish and creates a logical tree model of it.

Slide 5.175: Creating a DOM Object
    // Load books.xml into an MSXML DOM object (Internet Explorer only)
    var XMLDoc;
    function Load_DOM() {
      XMLDoc = new ActiveXObject("Microsoft.XMLDOM");
      XMLDoc.async = false;        // wait until the whole file is loaded
      XMLDoc.load("books.xml");
      if (XMLDoc.parseError.errorCode != 0) {
        alert("DOM Not Loaded: XML file has error(s)!");
      }
    }
By default (async = true), the load() method returns control to the caller before the download is finished. This avoids unusually long waits for the Web page to load while the DOM is loading a large document.
For small to moderate size XML files, use async = false so that load() returns only after the file has been fully loaded and the script can use the document immediately.
The document's parseError object returns an errorCode property as a decimal number associated with an error. If the error code is zero, then loading of the file into the DOM was successful.

Slide 5.176: Example 1: Cross-Browser Code
    var xmlDoc;
    function loadXML() {
      // load the XML file
      if (window.ActiveXObject) {
        // code for IE
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.load("note.xml");
        getmessage();
      } else if (document.implementation &&
                 document.implementation.createDocument) {
        // code for Mozilla
        xmlDoc = document.implementation.createDocument("", "", null);
        xmlDoc.onload = getmessage;  // register the handler before loading starts
        xmlDoc.load("note.xml");
      } else {
        alert('Your browser cannot handle this script');
      }
    }
    // continued

Slide 5.177: Cross-Browser Code (continued)
    // Copy the text of the to, from, and body elements into the page
    function getmessage() {
      document.getElementById("to").innerHTML =
        xmlDoc.getElementsByTagName("to")[0].firstChild.nodeValue;
      document.getElementById("from").innerHTML =
        xmlDoc.getElementsByTagName("from")[0].firstChild.nodeValue;
      document.getElementById("message").innerHTML =
        xmlDoc.getElementsByTagName("body")[0].firstChild.nodeValue;
    }
The page body shows the heading "W3Schools Internal Note" and the labels To:, From:, and Message:, each followed by an element whose id (to, from, message) is filled in by getmessage().
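
The contents of note.xml are not shown in the transcript. For the script to work, the file needs to, from, and body elements that each contain text; the sketch below is one such file, with a hypothetical note root element, heading element, and sample values.

```xml
<?xml version="1.0"?>
<!-- getmessage() reads the first text node of to, from, and body -->
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

Because the script reads firstChild.nodeValue, each of the three elements has to contain plain text as its first child; an empty element would make firstChild null and the script would fail.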
