ICS 123
XML: It’s a Good Thing
Richard N. Taylor & Eric M. Dashofy, ICS 123 S2002
Topic 10: XML
Motivation
•“I'll never go hungry again!” –Scarlett O’Hara
•“I’ll never write a parser again!” – Anonymous XML User
•Data encoding is a perpetual problem in computer applications
•Lots of time is wasted writing parsers, lexers, marshalers, unmarshalers, data bindings, even meta-languages!
Existing Problems
[Figure: App1, App2, and App3 each read and write their own file format (File Format 1, 2, 3); files are exchanged through import converters, export converters, and 3rd-party converters.]
Why is this a problem?
•Everybody has a proprietary format
•Converters must be maintained by various parties
– This is an n² problem!
•Something is usually lost in the translation
•Note: Same problems with data exchange across networked apps
Another Problem
Defining a File or Data Format
[Figure: data on Disk or the Net is read by a Parser into an In-memory Representation and written back by a Serializer; Data Bindings edit the in-memory representation; a Meta-Language helps to generate these pieces.]
Why is this a problem?
•Parsers, serializers, data bindings all have to be developed
•This development takes time
•Conflicting tools for assistance
•How do you evolve the file format?
Potential Solution
•To too many file formats:
– Intermediate format
» Even better: Common format
– An agreed-upon meta-language
– Ability to extend language and ignore unknown constructs
•To tool-building:
– Choose a suitable meta-language
– Build tools surrounding that meta-language
– Port those tools to different environments, but keep the APIs semi-standard
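To put rough numbers on why a common format helps with the n² problem noted earlier, here is a minimal Python sketch. It assumes one-way converters and compares two setups: every format converting directly to every other, versus every format having one importer and one exporter for a single shared format. The function names are hypothetical, chosen only for illustration.

```python
def pairwise_converters(n: int) -> int:
    """One-way converters needed when each of n formats converts directly to every other."""
    return n * (n - 1)


def common_format_converters(n: int) -> int:
    """One-way converters needed when each of n formats only imports from and exports to a shared format."""
    return 2 * n


# Print the two counts side by side for a few sizes.
for n in (3, 10, 50):
    print(n, pairwise_converters(n), common_format_converters(n))
```

Under these assumptions the direct approach grows quadratically while the shared-format approach grows linearly, which is the argument for an intermediate or common format.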
What is XML
•Stolen from xml-computing.com:
– eXtensible Markup Language
– A way to represent structured data
– A World Wide Web Consortium (W3C) standard
– Platform-independent
– A way to create your own custom languages
– License-free and well-supported
– The future of computing?
•Buzzword-compliant!
Origins of XML
•From SGML
– Standard Generalized Markup Language
•cf. HTML
•A document markup language
– For annotating documents with metadata to make them easier to interpret
Hi! My name is <NAME><FIRST>Eric</FIRST> <LAST>Dashofy</LAST></NAME>.
You can email me at <EMAIL>[email protected]</EMAIL>.
The Times, They Are a-Changin’
•XML is arguably more useful for simply encoding data, outside the strict context of a document
<PERSON>
  <NAME>
    <FIRST>Eric</FIRST>
    <LAST>Dashofy</LAST>
  </NAME>
  <DEPARTMENT>Information and Computer Science</DEPARTMENT>
  <EMAIL>[email protected]</EMAIL>
</PERSON>
Terminology
•Tag
– The markup of the document, enclosed in angle brackets.
» <foo> is the start tag
» </foo> is the end tag
– Tags may be nested, but may not cross
» <A>foo<B>bar</B>baz</A> -- OK!
» <A>foo<B>bar</A>baz</B> -- NO!
– Hierarchical data structure
Terminology
•Element
– Stuff in between a start and end tag
– Includes the tags
– May contain nested elements
– Ex:
» <a>foo</a>
» <a>foo<b>bar</b></a> (nested)
Terminology
•Attribute
– A way of annotating tags with additional info
– Simple name-value pairs
– Ex:
» <name lang="English">Henry</name>
» <name lang="Spanish">Enrique</name>
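As a small illustration of attributes as name-value pairs, the following Python sketch reads the `lang` attribute from the two example elements using the standard library's `xml.etree.ElementTree`. The `<names>` wrapper element is an assumption added here so that both examples fit in one document.

```python
import xml.etree.ElementTree as ET

# The <names> wrapper is hypothetical; the two <name> elements come from the slide.
doc = (
    '<names>'
    '<name lang="English">Henry</name>'
    '<name lang="Spanish">Enrique</name>'
    '</names>'
)

root = ET.fromstring(doc)

# Map each element's "lang" attribute value to its text content.
by_lang = {el.get("lang"): el.text for el in root.findall("name")}
print(by_lang)
```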
Document
•A collection of elements, usually in a file
•One top-level element
– Called the “root” element or “document” element
•Some header stuff (e.g., the XML declaration)
<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
  <email>[email protected]</email>
</person>
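One way to see the root element and the nesting in practice is to load the document with a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<email>` element is left out because its value is not legible in the source, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
</person>"""

root = ET.fromstring(document)

# The root ("document") element and the tags of its direct children.
print(root.tag)
children = [child.tag for child in root]
print(children)

# Nested elements are reached by a path relative to the root.
full_name = f"{root.findtext('name/first')} {root.findtext('name/last')}"
print(full_name)
```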
Side-note:
•“If you don’t understand it, ignore it.”
Kinds of Documents
•“Well Formed”
– Syntactically correct
– All the start tags have end tags
– All the start-quotes have end-quotes
– etc.
•“Valid”
– Well-formed, and conforms to some language specification
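Well-formedness can be checked by any XML parser, with no language definition needed. The following Python sketch wraps the standard `xml.etree.ElementTree` parser in a hypothetical helper, `is_well_formed`, and tries it on the nesting examples from the terminology slides plus an attribute with a missing end-quote. It says nothing about validity, which needs a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = is_well_formed("<A>foo<B>bar</B>baz</A>")            # properly nested tags
crossed = is_well_formed("<A>foo<B>bar</A>baz</B>")           # crossing tags
unquoted = is_well_formed('<name lang="English>Henry</name>')  # missing end-quote
print(nested, crossed, unquoted)
```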
Why a meta-language?
•To define what elements, sub-elements, attributes are allowed
•And in what order
•So different organizations can agree on a real data format
– Well-formedness alone doesn’t restrict how you encode the data, so it isn’t enough for agreeing on a format
DTDs
•Document Type Definition
– Part of XML 1.0
– The original XML meta-language
– Doesn’t look like XML
– Like production rules
<!DOCTYPE Foo [
  <!ELEMENT Foo (Bar*,Baz?,Booyah+)>
  <!ELEMENT Bar (#PCDATA)>
  <!ELEMENT Baz (#PCDATA)>
  <!ELEMENT Booyah (#PCDATA)>
]>
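To show how the content model `(Bar*,Baz?,Booyah+)` behaves like a production rule, here is a hand-rolled Python sketch that checks only the order and count of `Foo`'s children with a regular expression. This is not DTD validation (Python's standard library does not validate against DTDs), and the helper name `foo_children_ok` is hypothetical; it ignores text content and the `#PCDATA` declarations.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the content model (Bar*,Baz?,Booyah+):
# zero or more Bar, then an optional Baz, then one or more Booyah.
FOO_CONTENT = re.compile(r"(Bar )*(Baz )?(Booyah )+")


def foo_children_ok(text: str) -> bool:
    """Check that the root is Foo and its child elements follow the content model."""
    root = ET.fromstring(text)
    if root.tag != "Foo":
        return False
    # Encode the child sequence as space-terminated tag names, e.g. "Bar Bar Booyah ".
    child_names = "".join(child.tag + " " for child in root)
    return FOO_CONTENT.fullmatch(child_names) is not None


print(foo_children_ok("<Foo><Bar>a</Bar><Bar>b</Bar><Booyah>c</Booyah></Foo>"))
print(foo_children_ok("<Foo><Baz>x</Baz><Bar>y</Bar><Booyah>z</Booyah></Foo>"))
```

A real validating parser does this kind of matching for every declared element, which is what makes a document "valid" rather than merely well-formed.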
Namespaces
• “You keep on using that word, I do not think it means what you think it means.” –Inigo Montoya
• How can you make a document that draws elements from multiple DTDs?
<usa:address xmlns:usa="http://www.dtds.com/usaddress.dtd">
  <usa:street>1600 Pennsylvania Ave</usa:street>
  <usa:city>Washington</usa:city>
  <usa:state>DC</usa:state>
  <usa:zip>20509</usa:zip>
</usa:address>
<uk:address xmlns:uk="http://www.dtds.com/ukaddress.dtd">
  <uk:street>23B Baker Street</uk:street>
  <uk:city>London, England</uk:city>
  <uk:postcode>N22</uk:postcode>
</uk:address>
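A parser that understands namespaces keeps the two `address` vocabularies apart even though the local names are the same. The Python sketch below uses the standard `xml.etree.ElementTree`; the `<addresses>` wrapper element is an assumption added so that both vocabularies appear in one document, and only the city elements are kept for brevity.

```python
import xml.etree.ElementTree as ET

USA = "http://www.dtds.com/usaddress.dtd"
UK = "http://www.dtds.com/ukaddress.dtd"

# The <addresses> wrapper is hypothetical; prefixes and URIs follow the slide.
doc = f"""<addresses xmlns:usa="{USA}" xmlns:uk="{UK}">
  <usa:address><usa:city>Washington</usa:city></usa:address>
  <uk:address><uk:city>London, England</uk:city></uk:address>
</addresses>"""

root = ET.fromstring(doc)

# ElementTree expands each prefixed name to "{namespace-URI}local-name".
tags = [child.tag for child in root]
print(tags)

# Lookups use a prefix-to-URI mapping, so the same local name never collides.
ns = {"usa": USA, "uk": UK}
us_city = root.findtext("usa:address/usa:city", namespaces=ns)
uk_city = root.findtext("uk:address/uk:city", namespaces=ns)
print(us_city, "|", uk_city)
```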
Why not DTDs?
•“Uhm, DTDs are bad, mmkay?” –Mr. Mackey
– DTDs are lacking in some areas
» Don’t look like XML
» Can’t specify at a level below elements
• i.e. can’t specify regular expressions on content
» Difficult to extend/add things to existing element definitions
» Difficult to implement modular languages
XML Schemas
•A DTD replacement from W3C
– Look like XML / Easier to read
– Contribute a type system to XML
– Element, attribute definitions become types
» Single-inheritance model in the type system
– Better namespace management
Example
<complexType name="Address">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Example, cont.
<complexType name="UKAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="postcode" type="UKPostcode"/>
      </sequence>
      <attribute name="exportCode" type="positiveInteger" fixed="1"/>
    </extension>
  </complexContent>
</complexType>
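The single-inheritance type system can be pictured by analogy with classes in a programming language. The Python sketch below mirrors the three schema types as dataclasses; it is only an analogy, not something a schema tool would necessarily generate. `USState` and `UKPostcode` are not defined in the slides, so they are represented as plain strings here, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Mirrors complexType "Address": a sequence of name, street, city."""
    name: str
    street: str
    city: str


@dataclass
class USAddress(Address):
    """Mirrors the extension of Address that adds state and zip."""
    state: str      # USState in the schema; assumed to be a string here
    zip_code: int   # positiveInteger in the schema; positivity is not enforced


@dataclass
class UKAddress(Address):
    """Mirrors the extension of Address that adds postcode and exportCode."""
    postcode: str         # UKPostcode in the schema; assumed to be a string here
    export_code: int = 1  # mirrors fixed="1"; the fixed value is not enforced


home = USAddress("Example Resident", "1600 Pennsylvania Ave", "Washington", "DC", 20509)
flat = UKAddress("Example Tenant", "23B Baker Street", "London, England", "N22")

# Both derived types can be used wherever the base type is expected.
print(isinstance(home, Address), isinstance(flat, Address), flat.export_code)
```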
What do you get?
•Lots of tools for free
– Parsers
» DOM and SAX
– Serializers
– Transformation
» XSL(T)
– A meta-language (two, actually)
– Data Bindings
– Syntax-directed editors
Spotlight: DOM & SAX
•APIs for accessing XML documents
– SAX: Lightweight, callback based
» “I saw an element! Ooh, I saw an attribute!”
– DOM: Parses entire document into an object tree in memory
[Figure: XML Document → DOM Parser → In-memory Representation]
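The difference between the two styles shows up clearly when the same document goes through both. The sketch below uses Python's standard `xml.sax` and `xml.dom.minidom` modules as stand-ins for SAX and DOM implementations; the `TagLogger` class and the sample document are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<person><name><first>Eric</first><last>Dashofy</last></name></person>"


class TagLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls back once per start tag; nothing is stored unless we store it."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # "I saw an element!"
        self.seen.append(name)


handler = TagLogger()
xml.sax.parseString(DOC, handler)
print(handler.seen)

# DOM style: the whole document is parsed into a tree that can be navigated afterwards.
dom = minidom.parseString(DOC)
first = dom.getElementsByTagName("first")[0]
print(dom.documentElement.tagName, first.firstChild.data)
```

The SAX handler never holds the document in memory, which is why it is the lighter option; the DOM tree costs memory but allows random access and editing.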
Spotlight: Data Bindings
•DOM API is very, very generic
– Example functions:
» appendChild(Element n)
» setAttribute(String name, String value)
– No namespace management
•Data bindings are APIs guided by the language definition
– Example functions:
» addComponent(Component c);
» setIdentifier(String id);
•Data bindings can be generated automatically
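To make the contrast concrete, here is a minimal hand-written Python sketch of a data binding layered over the generic DOM calls in `xml.dom.minidom`. The element names `architecture` and `component`, the `id` attribute, and both classes are hypothetical; a generated binding would derive such names from the language definition.

```python
from xml.dom import minidom


class Component:
    """Hypothetical binding for a <component> element."""

    def __init__(self, doc, identifier):
        self.element = doc.createElement("component")
        self.set_identifier(identifier)

    def set_identifier(self, identifier):
        # Language-specific call that hides the generic setAttribute(name, value).
        self.element.setAttribute("id", identifier)


class Architecture:
    """Hypothetical binding for an <architecture> root element."""

    def __init__(self):
        self.doc = minidom.Document()
        self.element = self.doc.createElement("architecture")
        self.doc.appendChild(self.element)

    def add_component(self, identifier):
        # Language-specific call that hides the generic appendChild(node).
        component = Component(self.doc, identifier)
        self.element.appendChild(component.element)
        return component


arch = Architecture()
arch.add_component("parser")
arch.add_component("serializer")
print(arch.element.toxml())
```

Callers work with `add_component` and `set_identifier` rather than raw tree operations, so they cannot accidentally build a structure the language does not allow.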