Processing XML with Java
Representation and Management of Data on the Internet
A comprehensive tutorial about XML processing with Java
XML tutorial of W3Schools
Resources used for this presentation
• The Hebrew University of Jerusalem – CS Faculty.
• An Introduction to XML and Web Technologies – Course’s Literature.
Parsers
• What is a parser?
[Figure: Input → Parser → Analyzed Data; the Parser is driven by a Formal grammar]
• Analyzed data: the structure(s) of the input, according to the atomic elements and their relationships (as described in the grammar)
XML-Parsing Standards
• We will consider two standard parsing methods for accessing XML
• DOM: converts XML into a tree of objects; a “random access” protocol, standardized by the W3C
• SAX: a “serial access” protocol with event-driven parsing; a de facto standard rather than a W3C recommendation
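To make "event-driven" concrete, here is a minimal sketch of serial access with the JDK's built-in SAX parser. The class name SaxCountSketch, the helper countElements and the sample XML are illustrative assumptions, not part of the original slides. The parser calls startElement once per opening tag as it reads the input, and no tree is kept in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts elements by reacting to SAX events.
public class SaxCountSketch {

    /** Counts the start-element events whose qualified name equals tag. */
    public static int countElements(String xml, String tag) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Called by the parser each time an opening tag is read
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<countries><country><city/><city/></country></countries>";
        System.out.println(countElements(xml, "city")); // prints 2
    }
}
```

With DOM, the same question would be answered by building the whole tree first and then searching it.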
XML Examples
world.xml
<?xml version="1.0"?>
<!DOCTYPE countries SYSTEM "world.dtd">
<countries>
  <country continent="&as;">
    <name>Israel</name>
    <population year="2001">6,199,008</population>
    <city capital="yes"><name>Jerusalem</name></city>
    <city><name>Ashdod</name></city>
  </country>
  <country continent="&eu;">
    <name>France</name>
    <population year="2004">60,424,213</population>
  </country>
</countries>
Annotations:
• <countries> is the root element
• world.dtd is the validating DTD file
• &as; is a reference to an entity
XML Tree Model
[Figure: the tree of world.xml. Legend: element, attribute, simple content]
countries
  country (continent = Asia)
    name: Israel
    population (year = 2001): 6,199,008
    city (capital = yes)
      name: Jerusalem
    city (capital = no)
      name: Ashdod
  country (continent = Europe)
    name: France
    population (year = 2004): 60,424,213
world.dtd
<!ELEMENT countries (country*)>
<!ELEMENT country (name,population?,city*)>
<!ATTLIST country continent CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT city (name)>
<!ATTLIST city capital (yes|no) "no">
<!ELEMENT population (#PCDATA)>
<!ATTLIST population year CDATA #IMPLIED>
<!ENTITY eu "Europe">
<!ENTITY as "Asia">
<!ENTITY af "Africa">
<!ENTITY am "America">
<!ENTITY au "Australia">
Annotations:
• "no" is the default value of the capital attribute
• #IMPLIED marks an optional attribute, as opposed to #REQUIRED
• #PCDATA is parsed character data; CDATA is not parsed
• Open world.xml in your browser
• Check world2.xml for a #PCDATA example
Namespaces
sales.xml
<?xml version="1.0"?>
<forsale date="12/2/03"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <book>
    <title> <xhtml:em>DBI:</xhtml:em>
      <![CDATA[Where I Learned <xhtml>.]]>
    </title>
    <comment
        xmlns="http://www.cs.huji.ac.il/~dbi/comments">
      <par>My <xhtml:b> favorite </xhtml:b> book!</par>
    </comment>
  </book>
</forsale>
Annotations:
• The xmlns:xhtml attribute of <forsale> is the “xhtml” namespace declaration
• The xmlns attribute of <comment> is a default namespace declaration
• <xhtml:b> inside <comment> shows namespace overriding
• The <![CDATA[ ]]> section holds (non-parsed) character data
sales.xml
<?xml version="1.0"?>
<forsale date="12/2/03"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <book>
    <title> <xhtml:h1> DBI </xhtml:h1>
      <![CDATA[Where I Learned <xhtml>.]]>
    </title>
    <comment
        xmlns="http://www.cs.huji.ac.il/~dbi/comments">
      <par>My <xhtml:b> favorite </xhtml:b> book!</par>
    </comment>
  </book>
</forsale>
For the element <xhtml:h1>:
Namespace: “http://www.w3.org/1999/xhtml”
Local name: “h1”
Qualified name: “xhtml:h1”
For the element <par> in sales.xml:
Namespace: “http://www.cs.huji.ac.il/~dbi/comments”
Local name: “par”
Qualified name: “par”
For the element <title> in sales.xml:
Namespace: “”
Local name: “title”
Qualified name: “title”
For the element <xhtml:b> in sales.xml:
Namespace: “http://www.w3.org/1999/xhtml”
Local name: “b”
Qualified name: “xhtml:b”
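The three names listed for each element can be read back through the DOM API. The following is a minimal sketch, assuming a namespace-aware parser and a condensed copy of sales.xml held in a string; the class name NamespaceNames and the helper describe are hypothetical. Note that JAXP factories are not namespace-aware by default, and that DOM reports an absent namespace as null rather than as an empty string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class: prints namespace, local name and qualified name.
public class NamespaceNames {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Condensed version of sales.xml (an assumption made for this sketch)
    static final String SALES =
          "<forsale xmlns:xhtml='" + XHTML + "'>"
        + "<book><title><xhtml:h1>DBI</xhtml:h1></title>"
        + "<comment xmlns='http://www.cs.huji.ac.il/~dbi/comments'>"
        + "<par>My <xhtml:b>favorite</xhtml:b> book!</par>"
        + "</comment></book></forsale>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns "namespace | local name | qualified name" for the first match. */
    public static String describe(String qualifiedName) {
        Element e = (Element) parse(SALES)
                .getElementsByTagName(qualifiedName).item(0);
        return e.getNamespaceURI() + " | " + e.getLocalName()
                + " | " + e.getNodeName();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"xhtml:h1", "par", "title", "xhtml:b"}) {
            System.out.println(describe(name));
        }
    }
}
```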
DOM – Document Object Model
DOM Parser
• DOM = Document Object Model
• Parser creates a tree object out of the document
• User accesses data by traversing the tree
  - The tree and its traversal conform to a W3C standard
• The API allows for constructing, accessing and manipulating the structure and content of XML documents
The DOM Tree
[Figure: the DOM tree of world.xml, rooted at the Document node]
Document
  countries
    country (continent = Asia)
      name: Israel
      population (year = 2001): 6,199,008
      city (capital = yes)
        name: Jerusalem
      city (capital = no)
        name: Ashdod
    country (continent = Europe)
      name: France
      population (year = 2004): 60,424,213
Using a DOM Tree
[Figure: XML File → DOM Parser → DOM Tree (in memory); the Application works with the DOM Tree through the API]
Creating a DOM Tree
• A DOM tree is generated by a DocumentBuilder
• The builder is generated by a factory, in order to be implementation independent
• The factory is chosen according to the system configuration
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("world.xml");
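The same factory, builder and parse steps can be tried without a file on disk. The following minimal sketch assumes the XML is held in a string; the class name ParseFromString and the sample input are illustrative only. It uses the parse(InputSource) overload, because parse(String) expects a URI rather than XML text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: builds a DOM tree from in-memory XML text.
public class ParseFromString {

    /** Runs the factory -> builder -> parse chain on XML text. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<countries><country continent='Asia'/></countries>");
        // The document element is the root element of the tree
        System.out.println(doc.getDocumentElement().getTagName()); // prints countries
    }
}
```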
Configuring the Factory
• The methods of the document-builder factory enable you to configure the properties of the document building
• For example
  - factory.setValidating(true)
  - factory.setIgnoringComments(false)
Read more about DocumentBuilderFactory Class, DocumentBuilder Class
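As a small illustration of factory configuration, the sketch below parses the same input twice, once keeping comments and once ignoring them, and counts the children of the root element. The class name FactoryConfigSketch, the helper rootChildCount and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows the effect of setIgnoringComments.
public class FactoryConfigSketch {

    /** Parses xml and returns how many child nodes the root element has. */
    public static int rootChildCount(String xml, boolean ignoreComments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Configure the factory before asking it for a builder
            factory.setIgnoringComments(ignoreComments);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<country><!-- capital first --><city/></country>";
        System.out.println(rootChildCount(xml, false)); // comment node + city
        System.out.println(rootChildCount(xml, true));  // city only
    }
}
```

Settings apply to builders created after the call, so the factory should be configured before newDocumentBuilder() is invoked.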
The Node Interface
• The nodes of the DOM tree include
  - a special root (denoted document)
    • The Document interface retrieved by builder.parse(…) actually extends the Node interface
  - element nodes
  - text nodes and CDATA sections
  - attributes
  - comments
  - and more ...
• Every node in the DOM tree implements the Node interface
Interfaces in a DOM Tree
[Figure: the DOM interface hierarchy]
Node
  DocumentFragment (a light-weight fragment of the document; can hold several sub-trees)
  Document
  CharacterData
    Text
      CDATASection
    Comment
  Attr
  Element
  DocumentType
  Notation
  Entity
  EntityReference
  ProcessingInstruction
NodeList
NamedNodeMap
Figure as appears in “The XML Companion” - Neil Bradley
Interfaces in the DOM Tree
[Figure: an example DOM tree. The Document node has a Document Type child and a root Element child; below them appear Attribute, Element, Text, Entity Reference and Comment nodes]
Node Navigation
• Every node has a specific location in the tree
• The Node interface specifies methods for tree navigation
  - Node getFirstChild();
  - Node getLastChild();
  - Node getNextSibling();
  - Node getPreviousSibling();
  - Node getParentNode();
  - NodeList getChildNodes();
  - NamedNodeMap getAttributes();
Node Navigation (cont)
[Figure: a node and its neighbors, labeled with getParentNode(), getPreviousSibling(), getNextSibling(), getFirstChild(), getLastChild() and getChildNodes()]
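The navigation methods can be exercised on a small tree. This is a minimal sketch, assuming an in-memory document; the class name NavigationSketch, its helpers and the sample XML are illustrative and not taken from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: walks a DOM tree with the Node navigation methods.
public class NavigationSketch {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Collects the names of the element children, first child to last. */
    public static List<String> childElementNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Node country = parse(
                "<country><name>Israel</name><city/><city/></country>")
                .getDocumentElement();
        System.out.println(childElementNames(country)); // [name, city, city]
        Node name = country.getFirstChild();
        System.out.println(name.getNextSibling().getNodeName());   // city
        System.out.println(name.getParentNode().getNodeName());    // country
        System.out.println(name.getPreviousSibling());             // null
    }
}
```

A first child has no previous sibling and a last child has no next sibling; both methods return null in that case, which is what ends the loop above.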
Node Properties
• Every node has
  - a type
  - a name
  - a value
  - attributes
• The roles of these properties differ according to the node types
• Nodes of different types implement different interfaces (that extend Node)
Names, Values and Attributes
| Interface | nodeName | nodeValue | attributes |
|---|---|---|---|
| Attr | name of attribute | value of attribute | null |
| CDATASection | "#cdata-section" | content of the section | null |
| Comment | "#comment" | content of the comment | null |
| Document | "#document" | null | null |
| DocumentFragment | "#document-fragment" | null | null |
| DocumentType | doc-type name | null | null |
| Element | tag name | null | NamedNodeMap |
| Entity | entity name | null | null |
| EntityReference | name of entity referenced | null | null |
| Notation | notation name | null | null |
| ProcessingInstruction | target | entire content | null |
| Text | "#text" | content of the text node | null |
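A few of the name and value rules can be checked directly. The sketch below is illustrative only: the class name NodePropertiesSketch, the helper describe and the sample XML are assumptions, not part of the original material.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: prints nodeName and nodeValue per node type.
public class NodePropertiesSketch {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Formats a node as "nodeName = nodeValue". */
    public static String describe(Node n) {
        return n.getNodeName() + " = " + n.getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse("<city capital='yes'>Jerusalem<!--holy--></city>");
        Element city = doc.getDocumentElement();
        System.out.println(describe(doc));                              // #document = null
        System.out.println(describe(city));                             // city = null
        System.out.println(describe(city.getAttributeNode("capital"))); // capital = yes
        System.out.println(describe(city.getFirstChild()));             // #text = Jerusalem
        System.out.println(describe(city.getLastChild()));              // #comment = holy
    }
}
```

The text of an element lives in its child Text node, which is why the element's own nodeValue is null.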
Node Types - getNodeType()
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
    if (myNode.getNodeType() == Node.ELEMENT_NODE) {
      // process node …
    }
Read more about Node Interface
    import org.w3c.dom.*;
    import javax.xml.parsers.*;

    public class EchoWithDom {

      public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory =
          DocumentBuilderFactory.newInstance();
        // Drops whitespace-only text between elements; the JAXP documentation
        // notes that this relies on the content model given by the DTD
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("world.xml");
        new EchoWithDom().echo(doc);
      }

      // Prints a node, then its attributes (elements only), then its children
      private void echo(Node n) {
        print(n);
        if (n.getNodeType() == Node.ELEMENT_NODE) {
          NamedNodeMap atts = n.getAttributes();
          ++depth;
          for (int i = 0; i < atts.getLength(); i++) echo(atts.item(i));
          --depth;
        }
        depth++;
        for (Node child = n.getFirstChild(); child != null;
             child = child.getNextSibling()) echo(child);
        depth--;
      }

      // Current nesting depth, used to indent the output
      private int depth = 0;

      // Labels indexed by the constants that getNodeType() returns
      private String[] NODE_TYPES = {
        "", "ELEMENT", "ATTRIBUTE", "TEXT", "CDATA",
        "ENTITY_REF", "ENTITY", "PROCESSING_INST",
        "COMMENT", "DOCUMENT", "DOCUMENT_TYPE",
        "DOCUMENT_FRAG", "NOTATION" };

      // Prints one line per node: indentation, type, name and value
      private void print(Node n) {
        for (int i = 0; i < depth; i++) System.out.print(" ");
        System.out.print(NODE_TYPES[n.getNodeType()] + ":");
        System.out.print("Name: " + n.getNodeName());
        System.out.print(" Value: " + n.getNodeValue() + "\n");
      }
    }
Another Example
    import org.w3c.dom.*;
    import javax.xml.parsers.*;

    public class WorldParser {

      public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory =
          DocumentBuilderFactory.newInstance();
        factory.setIgnoringElementContentWhitespace(true);
        DocumentBuilder builder =
          factory.newDocumentBuilder();
        Document doc = builder.parse("world.xml");
        printCities(doc);
      }

      // Prints the name of every city element in the document
      public static void printCities(Document doc) {
        // getElementsByTagName searches within descendants
        NodeList cities = doc.getElementsByTagName("city");
        for (int i = 0; i < cities.getLength(); ++i) {
          printCity((Element) cities.item(i));
        }
      }

      // The city name is the text child of the city's first name element
      public static void printCity(Element city) {
        Node nameNode =
          city.getElementsByTagName("name").item(0);
        String cName = nameNode.getFirstChild().getNodeValue();
        System.out.println("Found City: " + cName);
      }
    }
Normalizing the DOM Tree
• Normalizing a DOM tree has two effects:
  - Combine adjacent textual nodes
  - Eliminate empty textual nodes
  (both kinds of nodes can be created by node manipulation)
• To normalize, apply the normalize() method to the document element
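Both effects of normalization can be observed on a hand-built element. In this minimal sketch the class name NormalizeSketch and the helper childCountsAroundNormalize are hypothetical; the element is created with three text nodes, one of them empty, and its children are counted before and after normalize().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: shows what normalize() does to text nodes.
public class NormalizeSketch {

    /** Returns {children before normalize(), children after normalize()}. */
    public static int[] childCountsAroundNormalize() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        Element name = doc.createElement("name");
        doc.appendChild(name);
        // Node manipulation leaves adjacent and empty text nodes behind
        name.appendChild(doc.createTextNode("Jeru"));
        name.appendChild(doc.createTextNode(""));
        name.appendChild(doc.createTextNode("salem"));
        int before = name.getChildNodes().getLength();
        name.normalize(); // merges adjacent text nodes, drops empty ones
        int after = name.getChildNodes().getLength();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = childCountsAroundNormalize();
        System.out.println(counts[0] + " -> " + counts[1]); // 3 -> 1
    }
}
```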
Node Manipulation
• Children of a node in a DOM tree can be manipulated: added, edited, deleted, moved, copied, etc.
• To construct new nodes, use the methods of Document
  - createElement, createAttribute, createTextNode, createCDATASection, etc.
• To manipulate a node, use the methods of Node
  - appendChild, insertBefore, removeChild, replaceChild, setNodeValue, cloneNode(boolean deep), etc.
Node Manipulation (cont)
[Figure: insertBefore places the New node before the Ref node; replaceChild puts the New node in place of the Old node; cloneNode with deep = 'false' copies only the node itself, while deep = 'true' also copies its subtree]
Figure as appears in “The XML Companion” - Neil Bradley
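The manipulation methods can be combined in a short sketch. The class name ManipulationSketch, its helpers and the element names used here are assumptions made for illustration; the code builds a small tree, then applies insertBefore, replaceChild and both forms of cloneNode.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class: creates, inserts, replaces and clones nodes.
public class ManipulationSketch {

    /** Returns the names of the children of parent, comma-separated. */
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    public static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM builder available", e);
        }
        // New nodes are created by the Document, then attached to a parent
        Element country = doc.createElement("country");
        doc.appendChild(country);
        Element name = doc.createElement("name");
        name.appendChild(doc.createTextNode("Israel"));
        Element city = doc.createElement("city");

        country.appendChild(city);          // children: city
        country.insertBefore(name, city);   // children: name, city
        Element town = doc.createElement("town");
        country.replaceChild(town, city);   // children: name, town

        Node shallow = name.cloneNode(false); // the element only
        Node deep = name.cloneNode(true);     // the element and its text child
        return childNames(country)
                + " | shallow=" + shallow.getChildNodes().getLength()
                + " deep=" + deep.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // name,town | shallow=0 deep=1
    }
}
```

Clones are detached: they have no parent until they are inserted somewhere with appendChild or insertBefore.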