5 XML Parsing
XML Parsing | Atul Kahate
XML Processing
  XML processing means
    Reading an XML document
    Parsing it in the desired manner
  Allows handling the contents of an XML document the way we want
XML Parser
  Software that sits between an application and the XML files
  Shields programmers from having to manually parse through XML documents
  Programmers are free to concentrate on the contents of the XML file, not its syntax
  Programmers use the parser APIs to access and manipulate an XML file
XML Processing Approaches
  Process as a sequence of events
    Simple API for XML (SAX)
  Process as a hierarchy of nodes
    Document Object Model (DOM)
  Pull approach
    Streaming API for XML (StAX)
SAX Versus DOM
StAX
  Pulls events from the XML document via the parser
  Also an event-based API, but differs from SAX
  The application, not the parser, controls the flow
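To make the pull idea concrete, here is a minimal sketch using the standard javax.xml.stream API. The class name StaxPullSketch and the helper elementNames are made up for illustration, and the XML is passed in as a string so the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this sketch
public class StaxPullSketch {

    // Returns the names of all start tags, in document order
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        // The application drives the loop: it asks for the next event when it is ready
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books><book><name>Learning XML</name></book></books>";
        System.out.println(elementNames(xml));
    }
}
```

Because the loop belongs to the application, it can stop early or skip ahead, which a SAX handler cannot easily do.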
Simple API for XML (SAX)
XML Processing as Sequence of Events – 1
  Process as a sequence of events
    An event is the occurrence of something noticeable
      e.g. in Windows, mouse movements and keyboard input are events
    The OS captures all events and sends messages to a program
    The programmer has to take an appropriate action to deal with each event
XML Processing as Sequence of Events – 2
  Process as a sequence of events
    The event-based model can also be applied to XML documents
    Various events occur while reading an XML document sequentially
      Start of document
      Start tag of an element
      End tag of an element
      Comments
XML Processing as Sequence of Events – 3
  Process as a sequence of events
    The programmer has to write code to handle these events
    Such code is called an event handler
Sequential Processing Example – 1
  Consider the following XML document
<?xml version="1.0"?>
<books>
  <book>
    <name> Learning XML </name>
    <author> Simon North </author>
    <publication> TMH </publication>
  </book>
  <book>
    <name> XML by Example </name>
    <author> Don Box </author>
    <publication> Pearson </publication>
  </book>
</books>
Sequential Processing Example – 2
  Events generated when we read the above XML file
    Start document
    Start element: books
    Start element: book
    Start element: name
    Characters: Learning XML
    End element: name
    Start element: author
    Characters: Simon North
    End element: author
    Start element: publication
    Characters: TMH
    End element: publication
    …
    End element: book
    End element: books
    End document
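The event list above can be reproduced with a small handler. The following is a minimal sketch rather than code from the slides: the class name EventTrace and the helper trace are invented for illustration, and the XML is parsed from a string instead of a file. Text is buffered and reported when the next tag arrives, because a parser may deliver one run of text in several pieces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch
public class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Report buffered text (if any) as a single Characters event
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            events.add("Characters: " + s);
        }
        text.setLength(0);
    }

    @Override public void startDocument() { events.add("Start document"); }

    @Override public void endDocument() { events.add("End document"); }

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        flushText();
        events.add("Start element: " + raw);
    }

    @Override
    public void endElement(String uri, String local, String raw) {
        flushText();
        events.add("End element: " + raw);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the given XML text and returns the recorded events in order
    static List<String> trace(String xml) throws Exception {
        EventTrace handler = new EventTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><name> Learning XML </name></book></books>";
        for (String e : trace(xml)) {
            System.out.println(e);
        }
    }
}
```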
Sample XML Tree
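Tree Processing Sequence
  [Figure: the sample XML tree with its 17 nodes numbered in the order they are processed. Read level by level, the numbers are 1 / 2 8 / 3 4 9 10 14 15 / 5 6 7 11 12 13 16 17]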
Sequential Traversal: Summary
  Order
    Top to bottom
    Left to right
  Advantages
    Simple
    Fast
    Requires less memory
  Drawback
    Not possible to look ahead
SAX Concept
JAXP
Java API for XML Processing
JAXP Concept
  [Figure: an application program written in Java for working with XML sits on top of the Java API for XML Processing (JAXP). The JAXP APIs are the Simple API for XML (SAX), for sequential processing, and the Document Object Model (DOM), for tree-based processing]
JAXP
  Java API for XML Processing
  Standardized by Sun
  Very thin layer on top of SAX or DOM
  Makes application code parser-independent
  Our programs should use JAXP, which in turn calls the parser APIs
  Include the package javax.xml.parsers.*
JAXP: API or Abstraction?
  JAXP is an API, but is better described as an abstraction layer
  Does not provide new means of parsing XML
  Does not add to SAX or DOM
  Does not give new functionality to Java or XML handling
  Makes working with SAX and DOM easier
  It is vendor-neutral
JAXP and Parsing
  JAXP is not a replacement for SAX, DOM, JDOM etc.
  Some vendor must supply the implementation of SAX, DOM, etc.
  JAXP provides APIs to use these implementations
  In the early versions of the JDK, Sun supplied a parser called Crimson
  Now, Sun provides Apache Xerces
  Neither is part of the JAXP API; they are part of the JAXP distribution
  In the JDK, the SAX and DOM interfaces that these parsers implement are in the org.xml.sax and org.w3c.dom packages
JAXP API
  The main JAXP APIs are defined in the package javax.xml.parsers
  Contains two vendor-neutral factory classes
    SAXParserFactory – Gives a SAXParser object
    DocumentBuilderFactory – Gives a DocumentBuilder object
      DocumentBuilder, in turn, gives a Document object
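The two factory chains can be sketched as follows. The class name JaxpFactories and its helper methods are invented for this example, and the document is read from a string so that no file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch
public class JaxpFactories {

    // SAXParserFactory -> SAXParser
    static SAXParser newSaxParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        return factory.newSAXParser();
    }

    // DocumentBuilderFactory -> DocumentBuilder -> Document
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX parser class: " + newSaxParser().getClass().getName());
        System.out.println("Root element: " + rootName("<books><book/></books>"));
    }
}
```

Neither method names a vendor class; which implementation is returned is decided by JAXP at run time.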
Package Details
  javax.xml.parsers
    The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
  org.w3c.dom
    Defines the Document interface (a DOM), as well as interfaces for all of the components of a DOM.
  org.xml.sax
    Defines the basic SAX APIs.
  javax.xml.transform
    Defines the XSLT APIs that let you transform XML into other forms.
Which Packages to use in JAXP?
  We need to include two sets of packages: one for JAXP and the other for SAX/DOM, as appropriate
// JAXP
import javax.xml.parsers.SAXParserFactory;
// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
SAX Programming in JAXP
SAX Approach
Key SAX APIs – 1
  SAXParserFactory
    Creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
  SAXParser
    An abstract class that defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
Key SAX APIs – 2
  XMLReader
    The SAXParser wraps an XMLReader. Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader(), so you can configure it. It is the XMLReader which carries on the conversation with the SAX event handlers you define.
  DefaultHandler
    Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with empty methods), so you can override only the ones you're interested in.
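As a rough illustration of reaching the wrapped reader, the sketch below obtains it with getXMLReader(), registers a DefaultHandler subclass on it directly, and parses a string. The class name XmlReaderSketch and the helper countElements are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch
public class XmlReaderSketch extends DefaultHandler {
    private int elements = 0;

    // Only startElement is overridden; every other callback keeps its empty default
    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        elements++;
    }

    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();   // the reader wrapped by the SAXParser
        XmlReaderSketch handler = new XmlReaderSketch();
        reader.setContentHandler(handler);          // DefaultHandler is a ContentHandler
        reader.setErrorHandler(handler);            // ... and an ErrorHandler
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 feature name; the reader can also be configured with setFeature
        System.out.println("Namespace processing: "
            + reader.getFeature("http://xml.org/sax/features/namespaces"));
        System.out.println(countElements("<root><employee>a</employee><employee>b</employee></root>"));
    }
}
```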
1 – Specify the Parser
  Various approaches are possible
    Set a system property for javax.xml.parsers.SAXParserFactory
    Specify the parser in jre_dir/lib/jaxp.properties
    Use the system-dependent default parser (check documentation)
  Usually done automatically at the time of JDK installation
1 – Specify the Parser
  Example
public static void main (String [] args) {
  String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory";
  …
}
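One way to check which factory JAXP actually selected is to print the property and the class of the factory instance. This is a small sketch under that assumption; the class name WhichParser is invented, and the output depends on the JDK and on any property you set.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical class name, used only for this sketch
public class WhichParser {
    static final String JAXP_PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Name of the factory class JAXP selected, whichever mechanism chose it
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Prints null unless the property was set, for example on the command line with -D
        System.out.println("Property value: " + System.getProperty(JAXP_PROPERTY));
        System.out.println("Factory in use: " + factoryClassName());
    }
}
```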
2 – Create a Parser Instance
  Steps
    i. Create an instance of a parser factory
    ii. Use that to create a SAXParser object
  Example
SAXParserFactory factory = SAXParserFactory.newInstance ();
SAXParser p = factory.newSAXParser ();
3 – Create an Event Handler
  Event handler responds to parsing events
    It is a subclass of DefaultHandler
public class MyHandler extends DefaultHandler { … }
  Main event methods (callbacks)
    startDocument, endDocument
    startElement, endElement
    characters, ignorableWhitespace
3 – Create an Event Handler
  Example method: startElement
  Declaration
public void startElement (String nameSpaceURI, String localName, String qualifiedName, Attributes attributes) throws SAXException
  Arguments
    nameSpaceURI    URI identifying the namespace uniquely
    localName       Element name without namespace prefix
    qualifiedName   Complete element name, including namespace prefix
    attributes      Attributes object, representing the attributes of the element
3 – Create an Event Handler
  [Figure callouts on the document below: nameSpaceURI, qualifiedName, attribute[1], localName]
<cwp:book xmlns:cwp="http://www.test.com/xml/">
  <cwp:chapter number="23" part="Server programming">
    <cwp:title> XML made easy </cwp:title>
  </cwp:chapter>
</cwp:book>
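To see these arguments at run time, the following sketch parses the cwp document with a namespace-aware factory and records what startElement receives. It is illustrative only: the class name NamespaceNames and the helper describe are invented, and setNamespaceAware(true) is an addition, since a factory is not namespace-aware by default and may then report empty namespace URIs and local names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch
public class NamespaceNames extends DefaultHandler {
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String nameSpaceURI, String localName,
                             String qualifiedName, Attributes attributes) {
        // Printed for inspection; the qualified name includes the prefix
        System.out.println("qualifiedName = " + qualifiedName);
        String line = nameSpaceURI + " | " + localName;
        // Unprefixed attributes belong to no namespace, hence the empty URI
        String number = attributes.getValue("", "number");
        if (number != null) {
            line += " | number=" + number;
        }
        seen.add(line);
    }

    static List<String> describe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // needed for URI and local name to be filled in
        NamespaceNames handler = new NamespaceNames();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cwp:book xmlns:cwp=\"http://www.test.com/xml/\">"
            + "<cwp:chapter number=\"23\" part=\"Server programming\">"
            + "<cwp:title> XML made easy </cwp:title>"
            + "</cwp:chapter></cwp:book>";
        for (String s : describe(xml)) {
            System.out.println(s);
        }
    }
}
```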
4 – Invoke the Parser
  Call the parse method, supplying:
    The content handler
    The XML document
      File or input stream
p.parse (fileName, handler);
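The input-stream form of parse can be sketched like this. The class name ParseFromStream and the helper countEmployees are hypothetical, and an in-memory stream stands in for a real file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch
public class ParseFromStream extends DefaultHandler {
    private int employees = 0;

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        if ("employee".equals(raw)) {
            employees++;
        }
    }

    // Supplies the XML document as an InputStream together with the handler
    static int countEmployees(InputStream in) throws Exception {
        ParseFromStream handler = new ParseFromStream();
        SAXParser p = SAXParserFactory.newInstance().newSAXParser();
        p.parse(in, handler);
        return handler.employees;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><employee>test 1</employee><employee>test 1</employee></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countEmployees(in));
    }
}
```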
Sample XML File (emp.xml)
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
</root>
Java Program to Count Total Number of Elements
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class SAXEmployeeCount extends DefaultHandler {
  int tagCount = 0;

  // Called for every start tag, so the root element is counted too
  public void startElement (String uri, String localName, String rawName, Attributes attributes) {
    tagCount++;
  }

  public void endDocument() {
    System.out.println("There are " + tagCount + " elements.");
  }

  public static void main(String[] args) {
    SAXEmployeeCount handler = new SAXEmployeeCount ();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      // emp.xml is the sample file shown above
      parser.parse("emp.xml", handler);
    } catch (Exception ex) {
      System.out.println(ex);
    }
  }
}
Count Only Book Elements
<?xml version="1.0"?>
<books>
  <book category="reference">
    <author>Nigel Rees</author>
    <title>Sayings of the Century</title>
    <price>8.95</price>
  </book>
  <book category="fiction">
    <author>Evelyn Waugh</author>
    <title>Sword of Honour</title>
    <price>12.99</price>
  </book>
  <book category="fiction">
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <price>8.99</price>
  </book>
</books>
Parsing Code in JAXP
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BookCount extends DefaultHandler {

  private int count = 0;

  public void startDocument() throws SAXException {
    System.out.println("Start document ...");
  }

  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    System.out.println ("Current element = " + raw);
    // Count only <book> start tags; <books>, <author> etc. are ignored
    if (raw.equals ("book")) {
      count++;
    }
  }

  public void endDocument() throws SAXException {
    System.out.println("The total number of books = " + count);
  }

  public static void main (String[] args) throws Exception {
    BookCount handler = new BookCount ();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("book.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Specifying Parser Name
import java.io.IOException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXApp extends DefaultHandler {

  // default parser to use
  // (declared but not referenced below: the parser actually comes from SAXParserFactory.newInstance())
  protected static final String DEFAULT_PARSER_NAME = "org.apache.xerces.parsers.SAXParser";

  private int count = 0;

  // Placeholder: only prints a message and is never called from main
  public void countTopics () throws IOException, SAXException {
    // create parser
    try {
      System.out.println ("Inside countTopics");
    } catch (Exception e) {
      e.printStackTrace(System.err);
    }
  }

  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    if (raw.equals("topic"))
      count++;
    System.out.println (raw);
  }

  public void endDocument() throws SAXException {
    System.out.println("There are " + count + " topics");
  }

  public static void main (String[] args) throws Exception {
    System.out.println ("Inside main ...");
    SAXApp handler = new SAXApp();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("contents.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Exercise
  Consider the following XML file and write a program to count the number of elements that have at least one attribute.
<?xml version="1.0"?>
<BOOKS>
  <BOOK pubyear="1929">
    <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
    <AUTHOR>Wolfe, Thomas</AUTHOR>
  </BOOK>
  <BOOK pubyear="1973">
    <BOOK_TITLE>Gravity's Rainbow</BOOK_TITLE>
    <AUTHOR>Pynchon, Thomas</AUTHOR>
  </BOOK>
  <BOOK pubyear="1977">
    <BOOK_TITLE>Cards as Weapons</BOOK_TITLE>
    <AUTHOR>Jay, Ricky</AUTHOR>
  </BOOK>
  <BOOK pubyear="2001">
    <BOOK_TITLE>Computer Networks</BOOK_TITLE>
    <AUTHOR>Tanenbaum, Andrew</AUTHOR>
  </BOOK>
</BOOKS>
Solution
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class countAttr extends DefaultHandler {
  private int count = 0;

  public void startDocument() throws SAXException {
    System.out.println("Start document ...");
  }

  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    System.out.println ("Current element = " + raw);
    // getLength() is the number of attributes on this element
    if (attrs.getLength () != 0) {
      count++;
    }
  }

  public void endDocument() throws SAXException {
    System.out.println("The number of elements with at least one attribute = " + count);
  }

  public static void main (String[] args) throws Exception {
    countAttr handler = new countAttr ();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("countAttr.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Exercise
  For the same XML file, display element names only if the book is published in the 1970s.
Solution
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class seventiesBooks extends DefaultHandler {
  private int count = 0;

  public void startDocument() throws SAXException {
    System.out.println("Start document ...");
  }

  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    // Only BOOK carries pubyear; elements without it are skipped
    String attrValue = attrs.getValue ("pubyear");
    if (attrValue != null) {
      int year = Integer.parseInt (attrValue);
      // The 1970s run from 1970 to 1979 inclusive
      if (year >= 1970 && year <= 1979) {
        System.out.println ("Current element = " + raw + " (" + year + ")");
        count++;
      }
    }
  }

  public void endDocument() throws SAXException {
    System.out.println("The total number of matching elements = " + count);
  }

  public static void main (String[] args) throws Exception {
    seventiesBooks handler = new seventiesBooks();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("countAttr.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Exercise
  Consider the following XML document (stock.xml)
<?xml version="1.0"?>
<stock>
  <stockinfo symbol="IFL">
    <company>i-flex solutions limited</company>
    <price>2500</price>
  </stockinfo>
  <stockinfo symbol="HLL">
    <company>Hindustan Lever</company>
    <price>1840</price>
  </stockinfo>
  <stockinfo symbol="LT">
    <company>Larsen and Toubro</company>
    <price>2678</price>
  </stockinfo>
  <stockinfo symbol="Rel">
    <company>Reliance Communications</company>
    <price>1743</price>
  </stockinfo>
</stock>
  Produce output as shown on the next slide
Expected Output
Solution
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class DisplayStockDetails extends DefaultHandler {

  public void startDocument () throws SAXException {
    System.out.println ("\nDisplaying Stock Details");
    System.out.println ("=========================\n");
  }

  public void endDocument () throws SAXException {
    System.out.println ("\nEnd of Details");
    System.out.println ("==============\n");
  }

  public void startElement (String uri, String local, String raw, Attributes attrs) throws SAXException {
    // Skip processing root element
    // (compare the raw name: the local name may be empty when the factory is not namespace-aware)
    if (raw.equals ("stock"))
      return;

    // Skip processing if there are no attributes
    if (attrs == null)
      return;

    for (int i=0; i<attrs.getLength (); i++) {
      System.out.println ("[Symbol: " + attrs.getValue (i) + "]");
    }
  }

  public void endElement (String uri, String local, String raw) throws SAXException {
    // System.out.println ();
  }

  public void characters (char[] ch, int start, int length) throws SAXException {
    // Skip the whitespace-only text between elements
    String text = new String (ch, start, length).trim ();
    if (text.length () > 0) {
      System.out.println (text);
    }
  }

  public static void main (String[] args) throws Exception {
    DisplayStockDetails handler = new DisplayStockDetails ();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("stock.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Exercise
  Consider the following XML document. Write a Java program to find the maximum price, and also display the name of the author of that book.
<?xml version="1.0"?>
<books>
  <book category="reference">
    <author>Nigel Rees</author>
    <title>Sayings of the Century</title>
    <price>8</price>
  </book>
  <book category="fiction">
    <author>Evelyn Waugh</author>
    <title>Sword of Honour</title>
    <price>12</price>
  </book>
  <book category="fiction">
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <price>8</price>
  </book>
  <book category="non-fiction">
    <author>Bill Bryson</author>
    <title>A Short History Of Everything</title>
    <price>20</price>
  </book>
  <book category="reference">
    <author>Herb Schildt</author>
    <title>Java - The Complete Reference</title>
    <price>23</price>
  </book>
  <book category="non-fiction">
    <author>Paul Smith</author>
    <title>The Scientists</title>
    <price>12</price>
  </book>
</books>
Solution
package saxexamples;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HighestPricedBook extends DefaultHandler {

  private int maxPriceBookPrice = 0;
  private String currentBookAuthor, maxPriceBookAuthor;
  private boolean flagIsCurrentElementPrice = false, flagIsCurrentElementAuthor = false;

  public void startDocument() throws SAXException {
    System.out.println("Start document ...");
  }

  // Remember which element we are inside, so that characters() knows what the text means
  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    System.out.println("Current element = " + raw);
    if (raw.equals("author")) {
      flagIsCurrentElementAuthor = true;
    } else if (raw.equals("price")) {
      flagIsCurrentElementPrice = true;
    }
  }

  public void characters(char[] ch, int start, int len) throws SAXException {
    // Only the slice of ch from start, of length len, belongs to this event
    String text = new String(ch, start, len).trim();

    if (flagIsCurrentElementAuthor) {
      flagIsCurrentElementAuthor = false;
      currentBookAuthor = text;
      System.out.println("*** author = " + currentBookAuthor + " ***");
    } else if (flagIsCurrentElementPrice) {
      flagIsCurrentElementPrice = false;
      int uprice = Integer.parseInt(text);
      // The author was seen before the price, so it belongs to the same book
      if (uprice > maxPriceBookPrice) {
        maxPriceBookPrice = uprice;
        maxPriceBookAuthor = currentBookAuthor;
      }
      System.out.println("Current maximum price = " + maxPriceBookPrice);
    }
  }

  public void endDocument() throws SAXException {
    System.out.println("The book author with the maximum price = " + maxPriceBookAuthor);
    System.out.println("And the book price = " + maxPriceBookPrice);
  }

  public static void main(String[] args) throws Exception {
    HighestPricedBook handler = new HighestPricedBook();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance();
      SAXParser parser = spf.newSAXParser();
      parser.parse("book.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
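The solution above reads each value in a single characters() call. SAX allows a parser to deliver one run of text in several calls, so a more defensive variant collects text in a buffer and interprets it in endElement. The sketch below shows that idea; the class name MaxPriceBuffered and the helper find are invented for illustration and this is not the author's solution.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this sketch
public class MaxPriceBuffered extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String currentAuthor;
    private String maxAuthor;
    private int maxPrice = 0;

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        text.setLength(0);   // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);   // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String local, String raw) {
        String value = text.toString().trim();
        if (raw.equals("author")) {
            currentAuthor = value;
        } else if (raw.equals("price")) {
            int price = Integer.parseInt(value);
            if (price > maxPrice) {
                maxPrice = price;
                maxAuthor = currentAuthor;
            }
        }
        text.setLength(0);
    }

    // Returns "author (price)" for the most expensive book
    static String find(String xml) throws Exception {
        MaxPriceBuffered handler = new MaxPriceBuffered();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.maxAuthor + " (" + handler.maxPrice + ")";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
            + "<book><author>Nigel Rees</author><price>8</price></book>"
            + "<book><author>Herb Schildt</author><price>23</price></book>"
            + "</books>";
        System.out.println(find(xml));
    }
}
```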
Exercise
  Consider the following XML file and write a program to find out and display the total cost for all CDs.
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>Candle in the wind</title>
    <artist>Elton John</artist>
    <country>UK</country>
    <company>HMV</company>
    <price>8.20</price>
    <year>1998</year>
  </cd>
</catalog>
Solution
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CDPrice extends DefaultHandler {

  // The prices in the file have decimals (10.90, 8.20), so the total must be a double
  private double total = 0;
  private boolean flagIsCurrentElementPrice = false;

  public void startDocument() throws SAXException {
    System.out.println("Start document ...");
  }

  public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
    System.out.println ("Current element = " + raw);
    if (raw.equals ("price")) {
      flagIsCurrentElementPrice = true;
    }
  }

  public void characters (char [] ch, int start, int len) throws SAXException {
    if (flagIsCurrentElementPrice) {
      // Only the slice of ch from start, of length len, belongs to this event
      String str = new String (ch, start, len).trim ();
      // Integer.parseInt would throw NumberFormatException on "10.90"
      double uprice = Double.parseDouble (str);
      total += uprice;
      flagIsCurrentElementPrice = false;
      System.out.println ("Current total = " + total);
    }
  }

  public void endDocument() throws SAXException {
    System.out.println("The total price of available CDs = " + total);
  }

  public static void main (String[] args) throws Exception {
    CDPrice handler = new CDPrice();
    try {
      SAXParserFactory spf = SAXParserFactory.newInstance ();
      SAXParser parser = spf.newSAXParser ();
      parser.parse ("cdcatalog2.xml", handler);
    } catch (SAXException e) {
      System.err.println(e.getMessage());
    }
  }
}
Document Object Model (DOM)
DOM – Basic Flow
Basic Concepts
JAXP and DOM – Overview
  Class DocumentBuilderFactory
    public abstract class javax.xml.parsers.DocumentBuilderFactory extends java.lang.Object
    Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents
    The parse method of the DocumentBuilder it creates parses the contents of an XML document and returns them as a new Document object
JAXP and DOM – Overview
  Class DocumentBuilder
    public abstract class javax.xml.parsers.DocumentBuilder extends java.lang.Object
    Defines the API to obtain DOM Document instances from an XML document
JAXP and DOM – Overview
  Interface Document
    public interface Document extends Node
    The Document interface represents the entire HTML or XML document
    Conceptually, it is the root of the document tree, and provides the primary access to the document's data
JAXP and DOM – Overview
  Interface Element
    public interface Element extends Node
    The Element interface represents an element in an HTML or XML document
    Elements may have attributes associated with them
    Inherits from Node, so the generic Node method getAttributes() may be used to retrieve the set of all attributes for an element
JAXP and DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance ();
DocumentBuilder builder = factory.newDocumentBuilder ();
Document document = builder.parse (fileName);
Element root = document.getDocumentElement ();
Example – XML File
  Count the number of BOOK elements in this XML using DOM
<?xml version="1.0"?>
<BOOKS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="book.xsd">
  <BOOK>
    <TITLE>Computer Networks</TITLE>
    <AUTHOR>Andrew Tanenbaum</AUTHOR>
    <PUBLISHER>Pearson Education</PUBLISHER>
    <PRICE>400</PRICE>
    <CATEGORY>Computer Science</CATEGORY>
  </BOOK>
  <BOOK>
    <TITLE>TCP/IP</TITLE>
    <AUTHOR>Douglas Comer</AUTHOR>
    <PUBLISHER>Pearson Education</PUBLISHER>
    <PRICE>350</PRICE>
    <CATEGORY>Computer Science</CATEGORY>
  </BOOK>
</BOOKS>
Example – Java Code
package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMCountExample {

  public static void main(String[] args) {
    try {
      DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
      DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

      // Casting
      DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

      LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");

      DOMConfiguration config = parser.getDomConfig();

      // Set error handler
      // (DOMErrorHandlerImpl is not in the imported packages, so it is presumably a class
      // in this package that implements org.w3c.dom.DOMErrorHandler; it is not shown here)
      DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
      config.setParameter("error-handler", errorHandler);

      // Set schema validation parameters
      config.setParameter("validate", Boolean.TRUE);
      config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
      config.setParameter("validate-if-schema", Boolean.TRUE);
      config.setParameter("schema-location", "book.xsd");

      Document document = parser.parseURI("book.xml");
      Element root = document.getDocumentElement();

      // All BOOK elements anywhere in the document
      NodeList nodes = document.getElementsByTagName("BOOK");
      System.out.println("There are " + nodes.getLength() + " elements.");
    } catch (Exception ex) {
      System.out.println(ex);
    }
  }
}
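The same count can also be obtained through the JAXP DocumentBuilder route described earlier, without the Load and Save classes or schema validation. A minimal sketch, assuming the XML is supplied as a string; the class name DomCountSketch and the helper count are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch
public class DomCountSketch {

    // Builds the whole tree in memory, then asks it for all elements with the given tag name
    static int count(String xml, String tag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = document.getElementsByTagName(tag);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<BOOKS>"
            + "<BOOK><TITLE>Computer Networks</TITLE></BOOK>"
            + "<BOOK><TITLE>TCP/IP</TITLE></BOOK>"
            + "</BOOKS>";
        System.out.println("There are " + count(xml, "BOOK") + " elements.");
    }
}
```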
Case Study – XML File
<?xml version="1.0"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>Candle in the wind</title>
    <artist>Elton John</artist>
    <country>UK</country>
    <company>HMV</company>
    <price>8</price>
    <year>1998</year>
  </cd>
</catalog>
Problem
  Write a program to find out if an element by the name price exists in the XML file and display its contents
Solution
package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class CountPriceElements {
  public static void main (String[] args) {
    NodeList elements;
    String elementName = "price";
    try {
      DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
      DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

      // Casting
      DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

      LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");

      DOMConfiguration config = parser.getDomConfig();

      // Set error handler
      DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
      config.setParameter("error-handler", errorHandler);

      // Set schema validation parameters
      // config.setParameter("validate", Boolean.TRUE);
      // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
      // config.setParameter("validate-if-schema", Boolean.TRUE);
      // config.setParameter("schema-location", "book.xsd");

      Document document = parser.parseURI("CDCatalog.xml");
      Element root = document.getDocumentElement();
      System.out.println ("In main ... XML file opened successfully ...");
      elements = document.getElementsByTagName(elementName);

      // is there anything to do?
      if (elements == null) {
        return;
      }

      // print all elements
      int elementCount = elements.getLength();
      System.out.println ("Count = " + elementCount);
      for (int i = 0; i < elementCount; i++) {
        Element element = (Element) elements.item(i);
        // The first child of <price> is the text node that holds its contents
        Node node = element.getFirstChild();
        System.out.println("Element Name = " + element.getNodeName());
        System.out.println("Element Type = " + node.getNodeType());
        System.out.println("Element Value = " + node.getNodeValue());
        System.out.println("Has attributes = " + node.hasAttributes());
      }
    } catch (DOMException e2) {
      System.out.println ("Exception: " + e2);
    } catch (Exception e3) {
      System.out.println ("Exception: " + e3);
    }
  }
}
Problem
  Write a program to display element names and their attribute names and values
XML Parsing | Atul Kahate 66
Solution
package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMAttributesExample {

    public static void main(String[] args) {

        NodeList elements;
        String elementName = "cd";

        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Casting
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                    "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            // config.setParameter("validate", Boolean.TRUE);
            // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
            // config.setParameter("validate-if-schema", Boolean.TRUE);
            // config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("CDCatalog.xml");
            Element root = document.getDocumentElement();

            System.out.println("In main ... XML file opened successfully ...");

            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);

            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                System.out.println("Element Name = " + element.getNodeName());
                System.out.println("Element Type = " + element.getNodeType());
                // getNodeValue() is always null for an element node
                System.out.println("Element Value = " + element.getNodeValue());
                System.out.println("Has attributes = " + element.hasAttributes());

                // If attributes exist, print them
                if (element.hasAttributes()) {
                    // if it does, store it in a NamedNodeMap object
                    NamedNodeMap attributesList = element.getAttributes();

                    // iterate through the NamedNodeMap and get the attribute names and values
                    for (int j = 0; j < attributesList.getLength(); j++) {
                        System.out.println("Attribute: " + attributesList.item(j).getNodeName()
                                + " = " + attributesList.item(j).getNodeValue());
                    }
                }
            }
        } catch (Exception e1) {
            System.out.println("Exception: " + e1);
        }
    }
}
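The listing above depends on CDCatalog.xml and on the DOMErrorHandlerImpl class, neither of which appears on this slide. As a rough, self-contained sketch of the same attribute walk, the version below parses an inline string with the standard JAXP DocumentBuilder instead of the DOM Level 3 LSParser. The class name, the sample XML, and the "element@attribute=value" output format are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeListingSketch {

    // Hypothetical sample data; the real CDCatalog.xml is not reproduced in the slides.
    static final String SAMPLE =
        "<catalog>"
        + "<cd id=\"1\" genre=\"rock\"><title>First</title></cd>"
        + "<cd id=\"2\"><title>Second</title></cd>"
        + "</catalog>";

    /** Returns one "element@attribute=value" string per attribute of the named elements. */
    static List<String> listAttributes(String xml, String elementName) {
        List<String> result = new ArrayList<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            NodeList elements = document.getElementsByTagName(elementName);
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                // getAttributes() is never null for an element; it may simply be empty
                NamedNodeMap attributes = element.getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    Attr attr = (Attr) attributes.item(j);
                    result.add(element.getNodeName() + "@"
                        + attr.getName() + "=" + attr.getValue());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : listAttributes(SAMPLE, "cd")) {
            System.out.println(line);
        }
    }
}
```

A NamedNodeMap makes no promise about attribute order, so code that consumes this list should not rely on the order in which attributes of one element appear.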
Problem
For a given element, find out all the child elements and display their types
Solution
package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMGetChildrenOfAnElement {

    public static void main(String[] args) {
        NodeList elements, children;
        String elementName = "cd";
        String local = "";

        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Casting
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                    "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            // config.setParameter("validate", Boolean.TRUE);
            // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
            // config.setParameter("validate-if-schema", Boolean.TRUE);
            // config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("CDCatalog.xml");
            Element root = document.getDocumentElement();

            System.out.println("In main ... XML file opened successfully ...");
            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);
            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                Node node = element.getFirstChild();
                System.out.println("Element Name = " + element.getNodeName());
                // an empty element has no first child, so guard against null
                if (node != null) {
                    System.out.println("Element Type = " + node.getNodeType());
                    System.out.println("Element Value = " + node.getNodeValue());
                    System.out.println("Has attributes = " + node.hasAttributes());
                }

                // Find out if child nodes exist for this element
                children = element.getChildNodes();

                if (children != null) {
                    for (int j = 0; j < children.getLength(); j++) {
                        local = children.item(j).getNodeName();
                        System.out.println("Child element name = " + local);
                        // the problem statement also asks for the type of each child
                        System.out.println("Child element type = " + children.item(j).getNodeType());
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Exception: " + e);
        }
    }
}
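One detail the listing does not show: getChildNodes() returns every kind of child node, so whitespace between tags shows up as #text nodes and comments as #comment nodes alongside the child elements. A minimal sketch of filtering down to element children follows. It assumes a small inline sample rather than CDCatalog.xml, uses the JAXP DocumentBuilder for brevity, and the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodeTypesSketch {

    // Hypothetical sample: two child elements, a comment, and whitespace between them.
    static final String SAMPLE =
        "<cd>\n  <title>First</title>\n  <!-- note -->\n  <artist>Someone</artist>\n</cd>";

    /** Returns "nodeName:nodeType" for each direct child of the root element. */
    static List<String> describeChildren(String xml, boolean elementsOnly) {
        List<String> result = new ArrayList<>();
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList children = document.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip text, comment and other non-element nodes when asked to
                if (elementsOnly && child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                result.add(child.getNodeName() + ":" + child.getNodeType());
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("All children:     " + describeChildren(SAMPLE, false));
        System.out.println("Element children: " + describeChildren(SAMPLE, true));
    }
}
```

Comparing against Node.ELEMENT_NODE rather than the literal 1 keeps the check readable; the numeric values are the ones listed under Node Types.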
Node Types

Value  Constant                     Node kind         nodeName
1      ELEMENT_NODE                 Element           The element name
2      ATTRIBUTE_NODE               Attribute         The attribute name
3      TEXT_NODE                    Text              #text
4      CDATA_SECTION_NODE           CDATA             #cdata-section
5      ENTITY_REFERENCE_NODE        Entity reference  The entity reference name
6      ENTITY_NODE                  Entity            The entity name
7      PROCESSING_INSTRUCTION_NODE  PI                The PI target
8      COMMENT_NODE                 Comment           #comment
9      DOCUMENT_NODE                Document          #document
10     DOCUMENT_TYPE_NODE           DocType           The doctype name (root element name)
11     DOCUMENT_FRAGMENT_NODE       DocumentFragment  #document-fragment
12     NOTATION_NODE                Notation          The notation name
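The earlier listings print getNodeType() as a bare number. A small helper can translate that number back into the constant names from the table; the sketch below is one way to do it. The class and method names are hypothetical, and only the constants defined on org.w3c.dom.Node are used.

```java
import org.w3c.dom.Node;

public class NodeTypeNames {

    /** Maps a value returned by Node.getNodeType() to its constant name. */
    static String typeName(short nodeType) {
        switch (nodeType) {
            case Node.ELEMENT_NODE:                return "ELEMENT_NODE";
            case Node.ATTRIBUTE_NODE:              return "ATTRIBUTE_NODE";
            case Node.TEXT_NODE:                   return "TEXT_NODE";
            case Node.CDATA_SECTION_NODE:          return "CDATA_SECTION_NODE";
            case Node.ENTITY_REFERENCE_NODE:       return "ENTITY_REFERENCE_NODE";
            case Node.ENTITY_NODE:                 return "ENTITY_NODE";
            case Node.PROCESSING_INSTRUCTION_NODE: return "PROCESSING_INSTRUCTION_NODE";
            case Node.COMMENT_NODE:                return "COMMENT_NODE";
            case Node.DOCUMENT_NODE:               return "DOCUMENT_NODE";
            case Node.DOCUMENT_TYPE_NODE:          return "DOCUMENT_TYPE_NODE";
            case Node.DOCUMENT_FRAGMENT_NODE:      return "DOCUMENT_FRAGMENT_NODE";
            case Node.NOTATION_NODE:               return "NOTATION_NODE";
            default:                               return "UNKNOWN(" + nodeType + ")";
        }
    }

    public static void main(String[] args) {
        // Print the whole table, 1 through 12
        for (short type = 1; type <= 12; type++) {
            System.out.println(type + " " + typeName(type));
        }
    }
}
```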
Problem
Write a program to create XML contents dynamically and write them to a file on the disk
Solution
package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;

import javax.xml.parsers.*;

/**
 * @author atulk
 */
public class DOMNewXMLFileCreator {

    public void saveDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.newDocument();

            // root element: <catalog publisher="McGraw-Hill">
            Element catalog = document.createElement("catalog");
            catalog.setAttribute("publisher", "McGraw-Hill");
            document.appendChild(catalog);

            // <journal edition="October 2005" section="XML">
            Element journal = document.createElement("journal");
            journal.setAttribute("edition", "October 2005");
            journal.setAttribute("section", "XML");
            catalog.appendChild(journal);

            // <article> with <title> and <author> children
            Element article = document.createElement("article");
            journal.appendChild(article);

            Element title = document.createElement("title");
            title.appendChild(document.createTextNode("DOM Parsing"));
            article.appendChild(title);

            Element author = document.createElement("author");
            author.appendChild(document.createTextNode("Anonymous"));
            article.appendChild(author);

            // obtain a DOM Level 3 Load and Save serializer
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;
            LSSerializer dom3Writer = implLS.createLSSerializer();
            LSOutput output = implLS.createLSOutput();

            System.out.println("Outputting XML Document ...");
            // this listing writes to the console; to write to a file on disk,
            // pass a java.io.FileOutputStream to setByteStream instead
            output.setByteStream(System.out);
            output.setEncoding("UTF-8");
            dom3Writer.write(document, output);

            System.out.println("\n\n" + "Outputting the journal node" + "\n");
            dom3Writer.write(journal, output);

            // a single node can also be serialized to a String
            String nodeString = dom3Writer.writeToString(journal);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        DOMNewXMLFileCreator fc = new DOMNewXMLFileCreator();
        fc.saveDocument();
    }
}
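The problem statement asks for the XML to be written to a file on disk, whereas the listing sends it to System.out. The sketch below shows one way to close that gap with the same LSSerializer and LSOutput classes, by handing LSOutput a file stream. It is a trimmed-down variation, not the author's code: the class name, the shortened document, and the use of a temporary file as the target are all assumptions made so the example runs anywhere.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class DomFileWriterSketch {

    /** Builds a small catalog document and serializes it to the given file. */
    static void writeCatalog(Path target) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element catalog = document.createElement("catalog");
            catalog.setAttribute("publisher", "McGraw-Hill");
            document.appendChild(catalog);
            Element title = document.createElement("title");
            title.appendChild(document.createTextNode("DOM Parsing"));
            catalog.appendChild(title);

            DOMImplementationLS implLS = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS 3.0");
            LSSerializer serializer = implLS.createLSSerializer();
            LSOutput output = implLS.createLSOutput();
            output.setEncoding("UTF-8");

            // try-with-resources closes the file even if serialization fails
            try (OutputStream out = Files.newOutputStream(target)) {
                output.setByteStream(out);
                serializer.write(document, output);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not write " + target, e);
        }
    }

    /** Writes to a temporary file (an assumption for this sketch) and reads it back. */
    static String writeAndReadBack() {
        try {
            Path target = Files.createTempFile("catalog", ".xml");
            writeCatalog(target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack());
    }
}
```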
Streaming API for XML (StAX)
Creating an XML Document using StAX
package xmlbook.chapter7;

import java.io.*;
import java.util.*;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.XMLStreamException;

public class CreateFTXML {

    public void CreateXML(int fromAccount, int toAccount, int amount) {

        // Create XMLOutputFactory object
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

        try {
            // Create the XMLStreamWriter object
            FileWriter fileWriter = new FileWriter("d:\\FT.xml");
            XMLStreamWriter writer = outputFactory.createXMLStreamWriter(fileWriter);
            // XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);

            // Write XML data now
            writer.writeStartDocument("1.0");
            writer.writeComment("Funds Transfer Data");
            writer.writeStartElement("FundsTransfer");

            writer.writeStartElement("FromAccount");
            writer.writeCharacters(Integer.toString(fromAccount));
            writer.writeEndElement();

            writer.writeStartElement("ToAccount");
            writer.writeCharacters(Integer.toString(toAccount));
            writer.writeEndElement();

            writer.writeStartElement("Amount");
            writer.writeCharacters(Integer.toString(amount));
            writer.writeEndElement();

            writer.writeEndElement(); // for FundsTransfer element
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            // XMLStreamWriter.close() does not close the underlying FileWriter
            fileWriter.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }
}
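The listing writes to a fixed path, d:\FT.xml, which makes it awkward to try on other machines. As a rough sketch, the same XMLStreamWriter calls can target an in-memory StringWriter so the generated XML can be inspected directly. The class name and the writeSimpleElement helper below are illustrative additions that are not part of the slides.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class FundsTransferWriterSketch {

    /** Hypothetical helper: writes <name>value</name>. */
    private static void writeSimpleElement(XMLStreamWriter writer, String name, int value)
            throws XMLStreamException {
        writer.writeStartElement(name);
        writer.writeCharacters(Integer.toString(value));
        writer.writeEndElement();
    }

    /** Builds the funds-transfer document in memory and returns it as a String. */
    static String toXml(int fromAccount, int toAccount, int amount) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartDocument("1.0");
            writer.writeStartElement("FundsTransfer");
            writeSimpleElement(writer, "FromAccount", fromAccount);
            writeSimpleElement(writer, "ToAccount", toAccount);
            writeSimpleElement(writer, "Amount", amount);
            writer.writeEndElement();   // closes FundsTransfer
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not build the XML", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(1001, 2002, 250));
    }
}
```

Each writeStartElement must be balanced by a writeEndElement; the writer keeps track of the open elements, so the end call takes no name.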
Using StAX to Read
package xmlbook.chapter7;

import java.io.*;
import java.util.*;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

public class ReadFTXML {

    public int parseXML(String whichElement) {

        XMLInputFactory inputFactory = XMLInputFactory.newInstance();

        try {
            // create an XMLStreamReader object
            FileReader fileReader = new FileReader("d:\\FT.xml");
            XMLStreamReader reader = inputFactory.createXMLStreamReader(fileReader);

            // read the XML now
            while (reader.hasNext()) {
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
                    String elementName = reader.getLocalName();
                    if (elementName.equalsIgnoreCase(whichElement)) {
                        String valueToReturn = reader.getElementText();
                        return Integer.parseInt(valueToReturn);
                    }
                }
                reader.next();
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }

        return -1; // In the case of an error
    }
}
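To make the pull model easier to experiment with, here is a minimal variation of the reader that takes its XML from a string instead of d:\FT.xml. The application drives the loop by calling next() and inspecting the event it gets back. The class name, the sample document, and the choice to return -1 for both "not found" and "not a number" are assumptions of this sketch.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FundsTransferReaderSketch {

    // Hypothetical sample matching the shape produced by the writer slide.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?><FundsTransfer>"
        + "<FromAccount>1001</FromAccount>"
        + "<ToAccount>2002</ToAccount>"
        + "<Amount>250</Amount>"
        + "</FundsTransfer>";

    /** Returns the integer content of the first matching element, or -1. */
    static int readValue(String xml, String whichElement) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application pulls the next event; the parser never calls back
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(whichElement)) {
                        return Integer.parseInt(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | NumberFormatException e) {
            return -1; // malformed XML, nested content, or non-numeric text
        }
        return -1; // element not present
    }

    public static void main(String[] args) {
        System.out.println("Amount = " + readValue(SAMPLE, "Amount"));
    }
}
```

getElementText() only works on elements that contain text alone, so asking for a container such as FundsTransfer raises an XMLStreamException, which this sketch reports as -1.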
XML and ASP.NET – An Overview
XmlReader and XmlWriter
XmlReader
  Pull-style API for XML
  Forward-only, read-only access to XML documents
  XmlReader is an abstract class that other classes derive from to provide concrete implementations, such as XmlTextReader and XmlNodeReader
  In ASP.NET 2.0, XmlReader also acts as a factory
    We need not specify which implementation of XmlReader is to be used
    We call the static Create method, supply the necessary parameters, and let .NET decide which implementation to instantiate
Example – XML Document
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-65512-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991" ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <first-name>Sidas</first-name>
      <last-name>Plato</last-name>
    </author>
    <price>9.99</price>
  </book>
</bookstore>
Example – ASP.NET Page
using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Xml;
using System.IO;

public partial class XMLReader2 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int bookCount = 0;
        XmlReaderSettings settings = new XmlReaderSettings();

        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        string booksFile = Path.Combine(Request.PhysicalApplicationPath, "Books.xml");

        using (XmlReader reader = XmlReader.Create(booksFile, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && "book" == reader.LocalName)
                {
                    bookCount++;
                }
            }
        }
        Response.Write(String.Format("Found {0} books!", bookCount));
    }
}
Validating an XML Document Against a Schema
using System.Xml.Schema;
using System;
using System.Xml;
using System.IO;

public partial class XMLReader3 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int bookCount = 0;
        XmlReaderSettings settings = new XmlReaderSettings();

        string booksSchemaFile = Path.Combine(Request.PhysicalApplicationPath, "books.xsd");
        settings.Schemas.Add(null, XmlReader.Create(booksSchemaFile));
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += new ValidationEventHandler(settings_ValidationEventHandler);
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        string booksFile = Path.Combine(Request.PhysicalApplicationPath, "Books.xml");

        using (XmlReader reader = XmlReader.Create(booksFile, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && "book" == reader.LocalName)
                {
                    bookCount++;
                }
            }
        }
        Response.Write(String.Format("Found {0} books!", bookCount));
    }

    void settings_ValidationEventHandler(object sender, System.Xml.Schema.ValidationEventArgs e)
    {
        Response.Write(e.Message);
    }
}
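For readers following the Java examples earlier in the deck, the same idea (attach a handler that collects validation messages instead of stopping at the first problem) can be sketched with the standard javax.xml.validation package. This is a Java analogue, not a translation of the .NET page: the inline schema is a simplified, hypothetical stand-in for books.xsd, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class SchemaValidationSketch {

    // Hypothetical, simplified schema: a bookstore of books with a title and a decimal price.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"bookstore\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"book\" maxOccurs=\"unbounded\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"title\" type=\"xs:string\"/>"
        + "<xs:element name=\"price\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Validates the XML against XSD and returns the collected messages (empty if valid). */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // Plays the role of the ValidationEventHandler in the .NET listing
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(e.getMessage());
                }
                @Override public void error(SAXParseException e) {
                    problems.add(e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // not well-formed: give up
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String bad = "<bookstore><book><title>T</title><price>cheap</price></book></bookstore>";
        for (String problem : validate(bad)) {
            System.out.println(problem);
        }
    }
}
```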
Creating an XML Document
Thank you!
Any Questions?