5 xml parsing

XML Parsing Atul Kahate [email protected]


Page 1: 5   xml parsing

XML Parsing

Atul Kahate

[email protected]

Page 2: 5   xml parsing

XML Parsing | Atul Kahate 2

XML Processing

XML processing means:
- Reading an XML document
- Parsing it in the desired manner

It allows handling the contents of an XML document the way we want.

Page 3: 5   xml parsing

XML Parsing | Atul Kahate 3

XML Parser

- Software that sits between an application and the XML files
- Shields programmers from having to parse XML documents manually
- Programmers are free to concentrate on the contents of the XML file, not its syntax
- Programmers use the parser APIs to access and manipulate an XML file

Page 4: 5   xml parsing

XML Parsing | Atul Kahate 4

XML Processing Approaches

- Process as a sequence of events: Simple API for XML (SAX)
- Process as a hierarchy of nodes: Document Object Model (DOM)
- Pull approach: Streaming API for XML (StAX)

Page 5: 5   xml parsing

XML Parsing | Atul Kahate 5

SAX Versus DOM

Page 6: 5   xml parsing

XML Parsing | Atul Kahate 6

StAX

- Pulls events from the XML document via the parser
- Also an event-based API, but differs from SAX
- The application, not the parser, controls the flow
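The pull model can be sketched as follows. This is a minimal illustration, assuming the standard javax.xml.stream API that ships with the JDK; the class name StaxPullSketch and the inline sample document are made up for this example. Note that the loop belongs to the application, which asks the reader for the next event whenever it is ready.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    // Returns the names of all elements in document order.
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application drives the loop and pulls one event at a time.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document, not taken from the slides
        String xml = "<books><book><name>Learning XML</name></book></books>";
        System.out.println(elementNames(xml)); // [books, book, name]
    }
}
```

With SAX, the equivalent logic would live in a callback that the parser invokes; here the application could stop pulling at any point, for example after the first match.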

Page 7: 5   xml parsing

Simple API for XML (SAX)

Page 8: 5   xml parsing

XML Parsing | Atul Kahate 8

XML Processing as Sequence of Events – 1

Process as a sequence of events:
- An event is the occurrence of something noticeable
- e.g. in Windows, mouse movement and keyboard input are events
- The OS captures all events and sends messages to a program
- The programmer has to take an appropriate action to deal with the event

Page 9: 5   xml parsing

XML Parsing | Atul Kahate 9

XML Processing as Sequence of Events – 2

Process as a sequence of events:
- The event-based model can also be applied to XML documents
- Various events occur while reading an XML document sequentially:
  - Start of document
  - Start tag of an element
  - End tag of an element
  - Comments

Page 10: 5   xml parsing

XML Parsing | Atul Kahate 10

XML Processing as Sequence of Events – 3

Process as a sequence of events:
- The programmer has to write code to handle these events
- This code is called an event handler

Page 11: 5   xml parsing

XML Parsing | Atul Kahate 11

Sequential Processing Example – 1

Consider the following XML document:

<?xml version="1.0"?>
<books>
  <book>
    <name>Learning XML</name>
    <author>Simon North</author>
    <publication>TMH</publication>
  </book>
  <book>
    <name>XML by Example</name>
    <author>Don Box</author>
    <publication>Pearson</publication>
  </book>
</books>

Page 12: 5   xml parsing

XML Parsing | Atul Kahate 12

Sequential Processing Example – 2

Events generated when we read the above XML file:

Start document
Start element: books
Start element: book
Start element: name
Characters: Learning XML
End element: name
Start element: author
Characters: Simon North
End element: author
Start element: publication
Characters: TMH
End element: publication
…
End element: book
End element: books
End document
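A handler that records this kind of event trace might look like the sketch below. It assumes the JAXP SAX classes bundled with the JDK; the class name SaxEventTrace is made up for illustration, and whitespace-only text between tags is skipped so that the trace matches the listing above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("Start document");
    }

    @Override
    public void endDocument() {
        events.add("End document");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("Start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("End element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Ignore whitespace-only text such as the indentation between tags
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("Characters: " + text);
        }
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><name>Learning XML</name></book></books>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

A parser is allowed to deliver one run of text in several characters() calls, so a production handler would usually accumulate text until the matching endElement.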

Page 13: 5   xml parsing

XML Parsing | Atul Kahate 13

Sample XML Tree

Page 14: 5   xml parsing

XML Parsing | Atul Kahate 14

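Tree Processing Sequence

[Figure: sample tree whose nodes are numbered 1 to 17 in the order they are visited. Rows of the figure, top to bottom:]

1
2, 8
3, 4, 9, 10, 14, 15
5, 6, 7, 11, 12, 13, 16, 17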

Page 15: 5   xml parsing

XML Parsing | Atul Kahate 15

Sequential Traversal: Summary

Order:
- Top to bottom
- Left to right

Advantages:
- Simple
- Fast
- Requires less memory

Drawback:
- Not possible to look ahead

Page 16: 5   xml parsing

XML Parsing | Atul Kahate 16

SAX Concept

Page 17: 5   xml parsing

JAXP

Java API for XML Processing

Page 18: 5   xml parsing

XML Parsing | Atul Kahate 18

JAXP Concept

[Figure: layered diagram]
- Application program written in Java for working with XML
- Java API for XML Processing (JAXP)
- JAXP APIs:
  - Simple API for XML (SAX): sequential processing
  - Document Object Model (DOM): tree-based processing

Page 19: 5   xml parsing

XML Parsing | Atul Kahate 19

JAXP

- Java API for XML Processing
- Standardized by Sun
- Very thin layer on top of SAX or DOM
- Makes application code parser-independent
- Our programs should use JAXP, which in turn calls the parser APIs
- Import the classes of the package javax.xml.parsers (javax.xml.parsers.*)

Page 20: 5   xml parsing

XML Parsing | Atul Kahate 20

JAXP: API or Abstraction?

- JAXP is an API, but is better described as an abstraction layer
- Does not provide new means of parsing XML
- Does not add to SAX or DOM
- Does not give new functionality to Java or XML handling
- Makes working with SAX and DOM easier
- It is vendor-neutral

Page 21: 5   xml parsing

XML Parsing | Atul Kahate 21

JAXP and Parsing

- JAXP is not a replacement for SAX, DOM, JDOM, etc.
- Some vendor must supply the implementation of SAX, DOM, etc.
- JAXP provides APIs to use these implementations
- In the early versions of the JDK, Sun supplied a parser called Crimson
- Now, Sun provides Apache Xerces
- Neither is a part of the JAXP API; they are part of the JAXP distribution
- In the JDK, the SAX and DOM interfaces that these parsers implement are in the org.xml.sax and org.w3c.dom packages

Page 22: 5   xml parsing

XML Parsing | Atul Kahate 22

JAXP API

- The main JAXP APIs are defined in the package javax.xml.parsers
- It contains two vendor-neutral factory classes:
  - SAXParserFactory: gives a SAXParser object
  - DocumentBuilderFactory: gives a DocumentBuilder object
    - DocumentBuilder, in turn, gives a Document object

Page 23: 5   xml parsing

XML Parsing | Atul Kahate 23

Package Details

- javax.xml.parsers: The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
- org.w3c.dom: Defines the Document interface (a DOM), as well as interfaces for all of the components of a DOM.
- org.xml.sax: Defines the basic SAX APIs.
- javax.xml.transform: Defines the XSLT APIs that let you transform XML into other forms.

Page 24: 5   xml parsing

XML Parsing | Atul Kahate 24

Which Packages to use in JAXP?

We need to include two sets of packages: one for JAXP and the other for SAX/DOM, as appropriate.

// JAXP
import javax.xml.parsers.SAXParserFactory;

// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

Page 25: 5   xml parsing

SAX Programming in JAXP

Page 26: 5   xml parsing

XML Parsing | Atul Kahate 26

SAX Approach

Page 27: 5   xml parsing

XML Parsing | Atul Kahate 27

Key SAX APIs – 1

SAXParserFactory
  Creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.

SAXParser
  An abstract class that defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

   

Page 28: 5   xml parsing

XML Parsing | Atul Kahate 28

Key SAX APIs – 2

XMLReader
  The SAXParser wraps an XMLReader (the slide's "SAXReader"). Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader(), so you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.

DefaultHandler
  Not shown in the diagram, a DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with empty methods), so you can override only the ones you're interested in.
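As a rough sketch of how these pieces fit together, the example below obtains the wrapped reader through getXMLReader() (the interface is org.xml.sax.XMLReader) and registers a DefaultHandler subclass that overrides a single callback. The names XmlReaderSketch and ElementCounter are hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XmlReaderSketch {

    // Overrides only startElement; every other callback keeps its empty default.
    static class ElementCounter extends DefaultHandler {
        int count = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            count++;
        }
    }

    static int countElements(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Get hold of the reader that the SAXParser wraps
        XMLReader reader = parser.getXMLReader();
        ElementCounter counter = new ElementCounter();
        // The same object can serve as content handler and error handler
        reader.setContentHandler(counter);
        reader.setErrorHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><employee>a</employee><employee>b</employee></root>";
        System.out.println("There are " + countElements(xml) + " elements.");
    }
}
```

Calling parser.parse(source, handler) directly does the same registration internally, which is why most programs never touch the reader.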

Page 29: 5   xml parsing

XML Parsing | Atul Kahate 29

1 – Specify the Parser

Various approaches are possible:
- Set a system property for javax.xml.parsers.SAXParserFactory
- Specify the parser in jre_dir/lib/jaxp.properties
- Use the system-dependent default parser (check documentation)

Usually this is done automatically when the JDK is installed.
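To see which parser a given installation actually selects, a small check along the following lines can help. It is only a sketch: the class name WhichParser is made up, and the printed class name depends on the JDK and on any system property or jaxp.properties setting in effect.

```java
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {

    // Name of the concrete factory class that JAXP resolved on this system.
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means no explicit override was set as a system property
        System.out.println("System property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("Factory in use:  " + factoryClassName());
    }
}
```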

Page 30: 5   xml parsing

XML Parsing | Atul Kahate 30

1 – Specify the Parser

Example

public static void main (String[] args) {
    String jaxpPropertyName = "javax.xml.parsers.SAXParserFactory";
    …
}

Page 31: 5   xml parsing

XML Parsing | Atul Kahate 31

2 – Create a Parser Instance

Steps
i. Create an instance of a parser factory
ii. Use that to create a SAXParser object

Example

SAXParserFactory factory = SAXParserFactory.newInstance ();
SAXParser p = factory.newSAXParser ();

Page 32: 5   xml parsing

XML Parsing | Atul Kahate 32

3 – Create an Event Handler

- The event handler responds to parsing events
- It is a subclass of DefaultHandler

public class MyHandler extends DefaultHandler { … }

Main event methods (callbacks):
- startDocument, endDocument
- startElement, endElement
- characters, ignorableWhitespace

Page 33: 5   xml parsing

XML Parsing | Atul Kahate 33

3 – Create an Event Handler

Example method: startElement

Declaration

public void startElement (String nameSpaceURI, String localName, String qualifiedName, Attributes attributes) throws SAXException

Arguments
- nameSpaceURI: URI identifying the namespace uniquely
- localName: Element name without namespace prefix
- qualifiedName: Complete element name, including namespace prefix
- attributes: Attributes object, representing the attributes of the element

Page 34: 5   xml parsing

XML Parsing | Atul Kahate 34

3 – Create an Event Handler

<cwp:book xmlns:cwp="http://www.test.com/xml/">
  <cwp:chapter number="23" part="Server programming">
    <cwp:title> XML made easy </cwp:title>
  </cwp:chapter>
</cwp:book>

[Figure labels pointing at parts of this markup: nameSpaceURI, qualifiedName, attribute[1], localName]
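The sketch below shows how the three names reach startElement for markup like this. It assumes a namespace-aware parser, which has to be requested with setNamespaceAware(true) because a factory obtained from SAXParserFactory.newInstance() is not namespace-aware by default; the class name NamespaceNames is made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceNames extends DefaultHandler {

    // One entry per element: { namespace URI, local name, qualified name }
    private final List<String[]> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        seen.add(new String[] { uri, localName, qName });
    }

    static List<String[]> names(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without this call, uri and localName may arrive as empty strings
        factory.setNamespaceAware(true);
        NamespaceNames handler = new NamespaceNames();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cwp:book xmlns:cwp=\"http://www.test.com/xml/\">"
                + "<cwp:title>XML made easy</cwp:title></cwp:book>";
        for (String[] n : names(xml)) {
            System.out.println("uri=" + n[0] + " localName=" + n[1] + " qualifiedName=" + n[2]);
        }
    }
}
```

Handlers that compare against the qualified name, as the solutions in this deck do, work with the default factory but would need adjusting if the documents used prefixes.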

Page 35: 5   xml parsing

XML Parsing | Atul Kahate 35

4 – Invoke the Parser

Call the parse method, supplying:
- The XML document (file or input stream)
- The content handler

p.parse (fileName, handler);

Page 36: 5   xml parsing

XML Parsing | Atul Kahate 36

Sample XML File (emp.xml)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
  <employee>test 1</employee>
</root>

Page 37: 5   xml parsing

XML Parsing | Atul Kahate 37

Java Program to Count Total Number of Elements

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class SAXEmployeeCount extends DefaultHandler {

    int tagCount = 0;

    // Called once for every start tag, including the root element
    public void startElement (String uri, String localName, String rawName, Attributes attributes) {
        tagCount++;
    }

    public void endDocument() {
        System.out.println("There are " + tagCount + " elements.");
    }

    public static void main(String[] args) {
        SAXEmployeeCount handler = new SAXEmployeeCount ();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();

            // The sample file from the previous slide
            parser.parse("emp.xml", handler);
        } catch (Exception ex) {
            System.out.println(ex);
        }
    }
}

Page 38: 5   xml parsing

XML Parsing | Atul Kahate 38

Count Only Book Elements

<?xml version="1.0"?>
<books>
  <book category="reference">
    <author>Nigel Rees</author>
    <title>Sayings of the Century</title>
    <price>8.95</price>
  </book>
  <book category="fiction">
    <author>Evelyn Waugh</author>
    <title>Sword of Honour</title>
    <price>12.99</price>
  </book>
  <book category="fiction">
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <price>8.99</price>
  </book>
</books>

Page 39: 5   xml parsing

XML Parsing | Atul Kahate 39

Parsing Code in JAXP

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BookCount extends DefaultHandler {

    private int count = 0;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
        System.out.println ("Current element = " + raw);

        // Count only the book elements
        if (raw.equals ("book")) {
            count++;
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The total number of books = " + count);
    }

    public static void main (String[] args) throws Exception {
        BookCount handler = new BookCount ();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("book.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 40: 5   xml parsing

XML Parsing | Atul Kahate 40

Specifying Parser Name

import java.io.IOException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXApp extends DefaultHandler {

    // default parser to use (declared here, but not referenced by the JAXP calls in main)
    protected static final String DEFAULT_PARSER_NAME = "org.apache.xerces.parsers.SAXParser";

    private int count = 0;

    public void countTopics () throws IOException, SAXException {
        // create parser
        try {
            System.out.println ("Inside countTopics");
        } catch (Exception e) {
            e.printStackTrace(System.err);
        }
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
        if (raw.equals("topic"))
            count++;
        System.out.println (raw);
    }

    public void endDocument() throws SAXException {
        System.out.println("There are " + count + " topics");
    }

    public static void main (String[] args) throws Exception {
        System.out.println ("Inside main ...");

        SAXApp handler = new SAXApp();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("contents.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 41: 5   xml parsing

XML Parsing | Atul Kahate 41

Exercise

Consider the following XML file and write a program to count the number of elements that have at least one attribute.

<?xml version="1.0"?>
<BOOKS>
  <BOOK pubyear="1929">
    <BOOK_TITLE>Look Homeward, Angel</BOOK_TITLE>
    <AUTHOR>Wolfe, Thomas</AUTHOR>
  </BOOK>
  <BOOK pubyear="1973">
    <BOOK_TITLE>Gravity's Rainbow</BOOK_TITLE>
    <AUTHOR>Pynchon, Thomas</AUTHOR>
  </BOOK>
  <BOOK pubyear="1977">
    <BOOK_TITLE>Cards as Weapons</BOOK_TITLE>
    <AUTHOR>Jay, Ricky</AUTHOR>
  </BOOK>
  <BOOK pubyear="2001">
    <BOOK_TITLE>Computer Networks</BOOK_TITLE>
    <AUTHOR>Tanenbaum, Andrew</AUTHOR>
  </BOOK>
</BOOKS>

Page 42: 5   xml parsing

XML Parsing | Atul Kahate 42

Solution

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class countAttr extends DefaultHandler {

    private int count = 0;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
        System.out.println ("Current element = " + raw);

        // Count the element if it carries at least one attribute
        if (attrs.getLength () != 0) {
            count++;
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The total number of elements with attributes = " + count);
    }

    public static void main (String[] args) throws Exception {
        countAttr handler = new countAttr ();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("countAttr.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 43: 5   xml parsing

XML Parsing | Atul Kahate 43

Exercise

For the same XML file, display element names only if the book is published in the 1970s.

Page 44: 5   xml parsing

XML Parsing | Atul Kahate 44

Solution

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class seventiesBooks extends DefaultHandler {

    private int count = 0;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {
        // Only elements that carry a pubyear attribute are candidates
        String attrValue = attrs.getValue ("pubyear");
        if (attrValue != null) {
            int year = Integer.parseInt (attrValue);
            // Published in the 1970s: 1970 to 1979 inclusive
            if (year >= 1970 && year <= 1979) {
                System.out.println ("Current element = " + raw);
                count++;
            }
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The total number of matching elements = " + count);
    }

    public static void main (String[] args) throws Exception {
        seventiesBooks handler = new seventiesBooks();
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("countAttr.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 45: 5   xml parsing

XML Parsing | Atul Kahate 45

Exercise

Consider the following XML document (stock.xml):

<?xml version="1.0"?>
<stock>
  <stockinfo symbol="IFL">
    <company>i-flex solutions limited</company>
    <price>2500</price>
  </stockinfo>
  <stockinfo symbol="HLL">
    <company>Hindustan Lever</company>
    <price>1840</price>
  </stockinfo>
  <stockinfo symbol="LT">
    <company>Larsen and Toubro</company>
    <price>2678</price>
  </stockinfo>
  <stockinfo symbol="Rel">
    <company>Reliance Communications</company>
    <price>1743</price>
  </stockinfo>
</stock>

Produce output as shown on the next slide.

Page 46: 5   xml parsing

XML Parsing | Atul Kahate 46

Expected Output

Page 47: 5   xml parsing

XML Parsing | Atul Kahate 47

Solution

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class DisplayStockDetails extends DefaultHandler {

    public void startDocument () throws SAXException {
        System.out.println ("\nDisplaying Stock Details");
        System.out.println ("=========================\n");
    }

    public void endDocument () throws SAXException {
        System.out.println ("\nEnd of Details");
        System.out.println ("==============\n");
    }

    public void startElement (String uri, String local, String raw, Attributes attrs) throws SAXException {

        // Skip processing root element. The qualified name (raw) is compared
        // because the default factory is not namespace-aware, so local may be empty.
        if (raw.equals ("stock"))
            return;

        // Skip processing if there are no attributes
        if (attrs == null)
            return;

        for (int i=0; i<attrs.getLength (); i++) {
            System.out.println ("[Symbol: " + attrs.getValue (i) + "]");
        }
    }

    public void characters (char[] ch, int start, int length) throws SAXException {
        // Skip the whitespace-only text between tags
        String text = new String (ch, start, length).trim ();
        if (!text.isEmpty ()) {
            System.out.println (text);
        }
    }

    public static void main (String[] args) throws Exception {

        DisplayStockDetails handler = new DisplayStockDetails ();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("stock.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 48: 5   xml parsing

XML Parsing | Atul Kahate 48

Exercise

Consider the following XML document. Write a Java program to find the maximum price, and also display the name of the author of that book.

<?xml version="1.0"?>
<books>
  <book category="reference">
    <author>Nigel Rees</author>
    <title>Sayings of the Century</title>
    <price>8</price>
  </book>
  <book category="fiction">
    <author>Evelyn Waugh</author>
    <title>Sword of Honour</title>
    <price>12</price>
  </book>
  <book category="fiction">
    <author>Herman Melville</author>
    <title>Moby Dick</title>
    <price>8</price>
  </book>
  <book category="non-fiction">
    <author>Bill Bryson</author>
    <title>A Short History Of Everything</title>
    <price>20</price>
  </book>
  <book category="reference">
    <author>Herb Schildt</author>
    <title>Java - The Complete Reference</title>
    <price>23</price>
  </book>
  <book category="non-fiction">
    <author>Paul Smith</author>
    <title>The Scientists</title>
    <price>12</price>
  </book>
</books>

Page 49: 5   xml parsing

XML Parsing | Atul Kahate 49

Solution

package saxexamples;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HighestPricedBook extends DefaultHandler {

    private int maxPriceBookPrice = 0;
    private String currentBookAuthor, maxPriceBookAuthor;
    private boolean flagIsCurrentElementPrice = false, flagIsCurrentElementAuthor = false;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {

        System.out.println("Current element = " + raw);

        // Remember which element the next characters() call belongs to
        if (raw.equals("author")) {
            flagIsCurrentElementAuthor = true;
        } else if (raw.equals("price")) {
            flagIsCurrentElementPrice = true;
        }
    }

    public void characters(char[] ch, int start, int len) throws SAXException {

        // Build the text in one step instead of copying it character by character
        String text = new String(ch, start, len).trim();

        if (flagIsCurrentElementAuthor) {
            flagIsCurrentElementAuthor = false;
            currentBookAuthor = text;
        } else if (flagIsCurrentElementPrice) {
            flagIsCurrentElementPrice = false;

            int uprice = Integer.parseInt(text);

            // The author element precedes price, so currentBookAuthor is already set
            if (uprice > maxPriceBookPrice) {
                maxPriceBookPrice = uprice;
                maxPriceBookAuthor = currentBookAuthor;
            }

            System.out.println("Current maximum price = " + maxPriceBookPrice);
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The book author with the maximum price = " + maxPriceBookAuthor);
        System.out.println("And the book price = " + maxPriceBookPrice);
    }

    public static void main(String[] args) throws Exception {
        HighestPricedBook handler = new HighestPricedBook();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser parser = spf.newSAXParser();
            parser.parse("book.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 50: 5   xml parsing

XML Parsing | Atul Kahate 50

Exercise

Consider the following XML file and write a program to find and display the total cost of all CDs.

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>Candle in the wind</title>
    <artist>Elton John</artist>
    <country>UK</country>
    <company>HMV</company>
    <price>8.20</price>
    <year>1998</year>
  </cd>
</catalog>

Page 51: 5   xml parsing

XML Parsing | Atul Kahate 51

Solution

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CDPrice extends DefaultHandler {

    // Prices such as 10.90 are not integers, so the total is kept as a double
    private double total = 0;
    private boolean flagIsCurrentElementPrice = false;

    public void startDocument() throws SAXException {
        System.out.println("Start document ...");
    }

    public void startElement(String uri, String local, String raw, Attributes attrs) throws SAXException {

        System.out.println ("Current element = " + raw);

        if (raw.equals ("price")) {
            flagIsCurrentElementPrice = true;
        }
    }

    public void characters (char [] ch, int start, int len) throws SAXException {

        if (flagIsCurrentElementPrice) {

            String str = new String (ch, start, len).trim ();

            // Double.parseDouble accepts "10.90"; Integer.parseInt would throw NumberFormatException
            double uprice = Double.parseDouble (str);

            total += uprice;
            flagIsCurrentElementPrice = false;
            System.out.println ("Current total = " + total);
        }
    }

    public void endDocument() throws SAXException {
        System.out.println("The total price of available CDs = " + total);
    }

    public static void main (String[] args) throws Exception {
        CDPrice handler = new CDPrice();

        try {
            SAXParserFactory spf = SAXParserFactory.newInstance ();
            SAXParser parser = spf.newSAXParser ();
            parser.parse ("cdcatalog2.xml", handler);
        } catch (SAXException e) {
            System.err.println(e.getMessage());
        }
    }
}

Page 52: 5   xml parsing

Document Object Model (DOM)

Page 53: 5   xml parsing

XML Parsing | Atul Kahate 53

DOM – Basic Flow

Page 54: 5   xml parsing

XML Parsing | Atul Kahate 54

Basic Concepts

Page 55: 5   xml parsing

XML Parsing | Atul Kahate 55

JAXP and DOM – Overview

Class DocumentBuilderFactory

public abstract class javax.xml.parsers.DocumentBuilderFactory extends java.lang.Object

- Defines a factory API that enables applications to obtain a parser that produces DOM object trees from XML documents
- parse method (called on the DocumentBuilder that the factory creates): parses the contents of an XML document and returns the contents as a new Document object

Page 56: 5   xml parsing

XML Parsing | Atul Kahate 56

JAXP and DOM – Overview

Class DocumentBuilder

public abstract class javax.xml.parsers.DocumentBuilder extends java.lang.Object

- Defines the API to obtain DOM Document instances from an XML document

Page 57: 5   xml parsing

XML Parsing | Atul Kahate 57

JAXP and DOM – Overview

Interface Document

public interface Document extends Node

- The Document interface represents the entire HTML or XML document
- Conceptually, it is the root of the document tree, and provides the primary access to the document's data

Page 58: 5   xml parsing

XML Parsing | Atul Kahate 58

JAXP and DOM – Overview

Interface Element

public interface Element extends Node

- The Element interface represents an element in an HTML or XML document
- Elements may have attributes associated with them
- Inherits from Node, the generic node interface
- The inherited getAttributes method may be used to retrieve the set of all attributes for an element
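Retrieving the attribute set of an element could look like the sketch below. It assumes the JAXP DocumentBuilder route rather than any particular parser; the class name ElementAttributesSketch and the sample markup are made up, and a TreeMap is used only to make the printed order predictable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class ElementAttributesSketch {

    // Collects the attributes of the document's root element, sorted by name.
    static Map<String, String> rootAttributes(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        // getAttributes() is inherited from Node and returns a NamedNodeMap
        NamedNodeMap attrs = root.getAttributes();
        Map<String, String> result = new TreeMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book category=\"fiction\" pubyear=\"1973\"/>";
        System.out.println(rootAttributes(xml)); // {category=fiction, pubyear=1973}
    }
}
```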

Page 59: 5   xml parsing

XML Parsing | Atul Kahate 59

JAXP and DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance ();
DocumentBuilder builder = factory.newDocumentBuilder ();
Document document = builder.parse (fileName);
Element root = document.getDocumentElement ();
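Put together as a complete program, the four calls above might be used as in this sketch. The class name DomRootSketch is hypothetical, and the XML is parsed from an in-memory string instead of a file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomRootSketch {

    // Factory -> builder -> document, as on the slide
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Counts the elements with the given tag name anywhere in the document.
    static int countElements(String xml, String tagName) throws Exception {
        return parse(xml).getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<BOOKS><BOOK><TITLE>TCP/IP</TITLE></BOOK><BOOK><TITLE>Computer Networks</TITLE></BOOK></BOOKS>";
        Element root = parse(xml).getDocumentElement();
        System.out.println("Root element = " + root.getTagName());
        System.out.println("There are " + countElements(xml, "BOOK") + " BOOK elements.");
    }
}
```

Unlike the SAX handlers earlier, nothing here runs during parsing; the whole tree is in memory before the first query is made.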

Page 60: 5   xml parsing

XML Parsing | Atul Kahate 60

Example – XML File

Count the number of BOOK elements in this XML using DOM:

<?xml version="1.0"?>
<BOOKS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="book.xsd">
  <BOOK>
    <TITLE>Computer Networks</TITLE>
    <AUTHOR>Andrew Tanenbaum</AUTHOR>
    <PUBLISHER>Pearson Education</PUBLISHER>
    <PRICE>400</PRICE>
    <CATEGORY>Computer Science</CATEGORY>
  </BOOK>
  <BOOK>
    <TITLE>TCP/IP</TITLE>
    <AUTHOR>Douglas Comer</AUTHOR>
    <PUBLISHER>Pearson Education</PUBLISHER>
    <PRICE>350</PRICE>
    <CATEGORY>Computer Science</CATEGORY>
  </BOOK>
</BOOKS>

Page 61: 5   xml parsing

XML Parsing | Atul Kahate 61

Example – Java Code

package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMCountExample {

    public static void main(String[] args) {

        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Casting
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            // DOMErrorHandlerImpl is not shown on the slides; it is assumed to be
            // a class in this package that implements org.w3c.dom.DOMErrorHandler
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            config.setParameter("validate", Boolean.TRUE);
            // The schema type for XML Schema has no slash between "XML" and "Schema"
            config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");

            config.setParameter("validate-if-schema", Boolean.TRUE);
            config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("book.xml");
            Element root = document.getDocumentElement();

            NodeList nodes = document.getElementsByTagName("BOOK");
            System.out.println("There are " + nodes.getLength() + " elements.");

        } catch (Exception ex) {
            System.out.println(ex);
        }
    }
}

Page 62: 5   xml parsing

XML Parsing | Atul Kahate 62

Case Study – XML File

<?xml version="1.0"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>Candle in the wind</title>
    <artist>Elton John</artist>
    <country>UK</country>
    <company>HMV</company>
    <price>8</price>
    <year>1998</year>
  </cd>
</catalog>

Page 63: 5   xml parsing

XML Parsing | Atul Kahate 63

Problem

Write a program to find out whether an element named price exists in the XML file, and display its contents.

Page 64: 5   xml parsing

XML Parsing | Atul Kahate 64

Solution

package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class CountPriceElements {
    public static void main(String[] args) {
        NodeList elements;
        String elementName = "price";
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Cast to the Load and Save (LS) interface
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                    "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            // (DOMErrorHandlerImpl is an application class implementing DOMErrorHandler; not shown on this slide)
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            // config.setParameter("validate", Boolean.TRUE);
            // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
            // config.setParameter("validate-if-schema", Boolean.TRUE);
            // config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("CDCatalog.xml");
            Element root = document.getDocumentElement();
            System.out.println("In main ... XML file opened successfully ...");
            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);
            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                // The first child of <price> is the text node that holds its value
                Node node = element.getFirstChild();
                System.out.println("Element Name = " + element.getNodeName());
                if (node == null) {
                    continue; // empty element, nothing more to print
                }
                System.out.println("Element Type = " + node.getNodeType());
                System.out.println("Element Value = " + node.getNodeValue());
                System.out.println("Has attributes = " + node.hasAttributes());
            }
        } catch (DOMException e2) {
            System.out.println("Exception: " + e2);
        } catch (Exception e3) {
            System.out.println("Exception: " + e3);
        }
    }
}
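The same lookup can also be sketched with the JAXP DocumentBuilder, which avoids the external error handler class and the file on disk. This is a minimal sketch under stated assumptions: the class name PriceFinder and the helper findValues are hypothetical, and the catalog is shortened and passed in as a string rather than read from CDCatalog.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original slides.
final class PriceFinder {

    // Returns the text content of every element with the given tag name.
    static List<String> findValues(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList matches = document.getElementsByTagName(tagName);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                values.add(matches.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the case-study catalog (assumption: inline string, not a file).
        String xml = "<catalog>"
                + "<cd><title>Empire Burlesque</title><price>10</price></cd>"
                + "<cd><title>Candle in the wind</title><price>8</price></cd>"
                + "</catalog>";
        List<String> prices = findValues(xml, "price");
        System.out.println("Count = " + prices.size());
        for (String price : prices) {
            System.out.println("price = " + price);
        }
    }
}
```

An empty list means the element does not exist in the document, so the caller never has to deal with a null NodeList.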

Page 65: 5   xml parsing

Problem

Write a program to display element names and their attribute names and values

Page 66: 5   xml parsing

Solution

package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMAttributesExample {

    public static void main(String[] args) {
        NodeList elements;
        String elementName = "cd";

        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Cast to the Load and Save (LS) interface
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                    "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            // (DOMErrorHandlerImpl is an application class implementing DOMErrorHandler; not shown on this slide)
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            // config.setParameter("validate", Boolean.TRUE);
            // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
            // config.setParameter("validate-if-schema", Boolean.TRUE);
            // config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("CDCatalog.xml");
            Element root = document.getDocumentElement();

            System.out.println("In main ... XML file opened successfully ...");

            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);

            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                System.out.println("Element Name = " + element.getNodeName());
                System.out.println("Element Type = " + element.getNodeType());
                // getNodeValue() is always null for an element node
                System.out.println("Element Value = " + element.getNodeValue());
                System.out.println("Has attributes = " + element.hasAttributes());

                // If attributes exist, print them
                if (element.hasAttributes()) {
                    // store the attributes in a NamedNodeMap object
                    NamedNodeMap attributes = element.getAttributes();

                    // iterate through the NamedNodeMap and get the attribute names and values
                    for (int j = 0; j < attributes.getLength(); j++) {
                        System.out.println("Attribute: " + attributes.item(j).getNodeName()
                                + " = " + attributes.item(j).getNodeValue());
                    }
                }
            }
        } catch (Exception e1) {
            System.out.println("Exception: " + e1);
        }
    }
}
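The cd elements in the case-study catalog carry no attributes, so the attribute loop prints nothing for that file. The following sketch shows the same NamedNodeMap iteration on an element that does have attributes. The class name AttributeLister, the helper attributesOf, and the inline book element are assumptions made for illustration; the result is copied into a sorted map because the order of a NamedNodeMap is not guaranteed.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
final class AttributeLister {

    // Returns the attributes of the first element with the given tag name, sorted by name.
    static Map<String, String> attributesOf(String xml, String tagName) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new TreeMap<>();
            NodeList matches = document.getElementsByTagName(tagName);
            if (matches.getLength() == 0) {
                return result; // no such element
            }
            Element element = (Element) matches.item(0);
            NamedNodeMap attributes = element.getAttributes();
            for (int j = 0; j < attributes.getLength(); j++) {
                Node attribute = attributes.item(j);
                result.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Inline sample (assumption): one book element with two attributes.
        String xml = "<bookstore><book genre=\"novel\" ISBN=\"0-201-65512-2\">"
                + "<title>The Confidence Man</title></book></bookstore>";
        for (Map.Entry<String, String> entry : attributesOf(xml, "book").entrySet()) {
            System.out.println("Attribute: " + entry.getKey() + " = " + entry.getValue());
        }
    }
}
```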

Page 67: 5   xml parsing

Problem

For a given element, find out all the child elements and display their types

Page 68: 5   xml parsing

Solution

package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.*;
import org.w3c.dom.ls.*;

public class DOMGetChildrenOfAnElement {
    public static void main(String[] args) {
        NodeList elements, children;
        String elementName = "cd";
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");

            // Cast to the Load and Save (LS) interface
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;

            LSParser parser = implLS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS,
                    "http://www.w3.org/2001/XMLSchema");

            DOMConfiguration config = parser.getDomConfig();

            // Set error handler
            // (DOMErrorHandlerImpl is an application class implementing DOMErrorHandler; not shown on this slide)
            DOMErrorHandlerImpl errorHandler = new DOMErrorHandlerImpl();
            config.setParameter("error-handler", errorHandler);

            // Set schema validation parameters
            // config.setParameter("validate", Boolean.TRUE);
            // config.setParameter("schema-type", "http://www.w3.org/2001/XMLSchema");
            // config.setParameter("validate-if-schema", Boolean.TRUE);
            // config.setParameter("schema-location", "book.xsd");

            Document document = parser.parseURI("CDCatalog.xml");
            Element root = document.getDocumentElement();

            System.out.println("In main ... XML file opened successfully ...");
            elements = document.getElementsByTagName(elementName);

            // is there anything to do?
            if (elements == null) {
                return;
            }

            // print all elements
            int elementCount = elements.getLength();
            System.out.println("Count = " + elementCount);
            for (int i = 0; i < elementCount; i++) {
                Element element = (Element) elements.item(i);
                System.out.println("Element Name = " + element.getNodeName());

                // The first child may be a whitespace text node, or missing altogether
                Node node = element.getFirstChild();
                if (node != null) {
                    System.out.println("Element Type = " + node.getNodeType());
                    System.out.println("Element Value = " + node.getNodeValue());
                    System.out.println("Has attributes = " + node.hasAttributes());
                }

                // Find out if child nodes exist for this element
                children = element.getChildNodes();
                if (children != null) {
                    for (int j = 0; j < children.getLength(); j++) {
                        Node child = children.item(j);
                        System.out.println("Child node name = " + child.getNodeName()
                                + ", type = " + child.getNodeType());
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("Exception: " + e);
        }
    }
}

Page 69: 5   xml parsing

Node Types

Value  Constant                      Node type          nodeName
1      ELEMENT_NODE                  Element            The element name
2      ATTRIBUTE_NODE                Attribute          The attribute name
3      TEXT_NODE                     Text               #text
4      CDATA_SECTION_NODE            CDATA              #cdata-section
5      ENTITY_REFERENCE_NODE         Entity reference   The entity reference name
6      ENTITY_NODE                   Entity             The entity name
7      PROCESSING_INSTRUCTION_NODE   PI                 The PI target
8      COMMENT_NODE                  Comment            #comment
9      DOCUMENT_NODE                 Document           #document
10     DOCUMENT_TYPE_NODE            DocType            The document type name (the root element name)
11     DOCUMENT_FRAGMENT_NODE        DocumentFragment   #document-fragment
12     NOTATION_NODE                 Notation           The notation name
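To see these values in practice, the sketch below lists the numeric type and nodeName of each child of a root element. The class name NodeTypeLister and the inline XML are assumptions made for illustration. Note the whitespace between the two child elements: a non-validating parser reports it as a separate text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original slides.
final class NodeTypeLister {

    // Returns one "type nodeName" entry per direct child of the root element.
    static List<String> describeChildren(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = document.getDocumentElement().getChildNodes();
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                lines.add(child.getNodeType() + " " + child.getNodeName());
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Inline sample (assumption): a comment, two elements, and whitespace between them.
        String xml = "<cd><!-- best seller --><title>Empire Burlesque</title> <price>10</price></cd>";
        for (String line : describeChildren(xml)) {
            System.out.println(line);
        }
    }
}
```

Comparing against the named constants (for example Node.ELEMENT_NODE) rather than the raw numbers keeps such code readable when filtering for child elements only.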

Page 70: 5   xml parsing

Problem

Write a program to create XML contents dynamically and write them to a file on the disk

Page 71: 5   xml parsing

Solution

package domexamples;

import org.w3c.dom.*;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;

import javax.xml.parsers.*;

/**
 * @author atulk
 */
public class DOMNewXMLFileCreator {

    public void saveDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.newDocument();

            // Build the tree: catalog > journal > article > (title, author)
            Element catalog = document.createElement("catalog");
            catalog.setAttribute("publisher", "McGraw-Hill");
            document.appendChild(catalog);

            Element journal = document.createElement("journal");
            journal.setAttribute("edition", "October 2005");
            journal.setAttribute("section", "XML");
            catalog.appendChild(journal);

            Element article = document.createElement("article");
            journal.appendChild(article);

            Element title = document.createElement("title");
            title.appendChild(document.createTextNode("DOM Parsing"));
            article.appendChild(title);

            Element author = document.createElement("author");
            author.appendChild(document.createTextNode("Anonymous"));
            article.appendChild(author);

            // Obtain a DOM Level 3 Load and Save serializer
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementation domImpl = registry.getDOMImplementation("LS 3.0");
            DOMImplementationLS implLS = (DOMImplementationLS) domImpl;
            LSSerializer dom3Writer = implLS.createLSSerializer();
            LSOutput output = implLS.createLSOutput();

            // This version writes to standard output; to write to a file on the disk,
            // pass a java.io.FileOutputStream to setByteStream() instead of System.out
            System.out.println("Outputting XML Document ...");
            output.setByteStream(System.out);
            output.setEncoding("UTF-8");
            dom3Writer.write(document, output);

            System.out.println("\n\n" + "Outputting the journal node" + "\n");
            dom3Writer.write(journal, output);

            // A node can also be serialized to a String
            String nodeString = dom3Writer.writeToString(journal);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        DOMNewXMLFileCreator fc = new DOMNewXMLFileCreator();
        fc.saveDocument();
    }
}
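The solution above sends its output to System.out. A minimal sketch of writing the serialized document to a file on the disk, as the problem statement asks, might look as follows. The class name CatalogFileWriter, the helper saveCatalog, the shortened document, and the target file name are all assumptions made for illustration.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of the original slides.
final class CatalogFileWriter {

    // Builds a small document, writes it to the target file, and returns the file contents.
    static String saveCatalog(Path target) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element catalog = document.createElement("catalog");
            catalog.setAttribute("publisher", "McGraw-Hill");
            document.appendChild(catalog);
            Element title = document.createElement("title");
            title.appendChild(document.createTextNode("DOM Parsing"));
            catalog.appendChild(title);

            DOMImplementationLS implLS = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS 3.0");
            LSSerializer serializer = implLS.createLSSerializer();
            LSOutput output = implLS.createLSOutput();
            output.setEncoding("UTF-8");

            // Point the LSOutput at a file stream instead of System.out
            try (OutputStream out = Files.newOutputStream(target)) {
                output.setByteStream(out);
                serializer.write(document, output);
            }
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("Could not save catalog", e);
        }
    }

    public static void main(String[] args) {
        // Target location is an assumption: a file in the system temp directory.
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "catalog-sketch.xml");
        System.out.println(saveCatalog(target));
    }
}
```

The try-with-resources block closes the file stream even if serialization fails, which the System.out version does not need to worry about.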

Page 72: 5   xml parsing

Streaming API for XML (StAX)

Page 73: 5   xml parsing

Creating an XML Document using StAX

package xmlbook.chapter7;

import java.io.*;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.XMLStreamException;

public class CreateFTXML {

    public void CreateXML(int fromAccount, int toAccount, int amount) {

        // Create XMLOutputFactory object
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

        try {
            // Create the XMLStreamWriter object
            FileWriter fileWriter = new FileWriter("d:\\FT.xml");
            XMLStreamWriter writer = outputFactory.createXMLStreamWriter(fileWriter);
            // XMLStreamWriter writer = outputFactory.createXMLStreamWriter(System.out);

            // Write XML data now
            writer.writeStartDocument("1.0");
            writer.writeComment("Funds Transfer Data");
            writer.writeStartElement("FundsTransfer");

            writer.writeStartElement("FromAccount");
            writer.writeCharacters(Integer.toString(fromAccount));
            writer.writeEndElement();

            writer.writeStartElement("ToAccount");
            writer.writeCharacters(Integer.toString(toAccount));
            writer.writeEndElement();

            writer.writeStartElement("Amount");
            writer.writeCharacters(Integer.toString(amount));
            writer.writeEndElement();

            writer.writeEndElement(); // for FundsTransfer element
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
    }
}
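For experimenting without a d: drive, the same sequence of writer calls can target an in-memory StringWriter. This is a minimal sketch under stated assumptions: the class name FundsTransferWriter and the helpers toXml and writeLeaf are hypothetical, and the comment node from the original is left out for brevity.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, not part of the original slides.
final class FundsTransferWriter {

    // Writes <name>value</name> using the three-call pattern from the slide.
    private static void writeLeaf(XMLStreamWriter writer, String name, int value)
            throws XMLStreamException {
        writer.writeStartElement(name);
        writer.writeCharacters(Integer.toString(value));
        writer.writeEndElement();
    }

    // Returns the funds-transfer document as a string instead of writing a file.
    static String toXml(int fromAccount, int toAccount, int amount) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartDocument("1.0");
            writer.writeStartElement("FundsTransfer");
            writeLeaf(writer, "FromAccount", fromAccount);
            writeLeaf(writer, "ToAccount", toAccount);
            writeLeaf(writer, "Amount", amount);
            writer.writeEndElement(); // closes FundsTransfer
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(1001, 2002, 500));
    }
}
```

Every writeStartElement call must be balanced by a writeEndElement call; the writer tracks the open elements itself, so the end call takes no name.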

Page 74: 5   xml parsing

Using StAX to Read

package xmlbook.chapter7;

import java.io.*;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;

public class ReadFTXML {

    public int parseXML(String whichElement) {

        XMLInputFactory inputFactory = XMLInputFactory.newInstance();

        try {
            // create an XMLStreamReader object
            FileReader fileReader = new FileReader("d:\\FT.xml");
            XMLStreamReader reader = inputFactory.createXMLStreamReader(fileReader);

            // read the XML now
            while (reader.hasNext()) {
                if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
                    String elementName = reader.getLocalName();
                    if (elementName.equalsIgnoreCase(whichElement)) {
                        String valueToReturn = reader.getElementText();
                        return Integer.parseInt(valueToReturn);
                    }
                }
                reader.next();
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }

        return -1; // In the case of an error
    }
}
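The pull loop can be tried on an in-memory document as well. In the sketch below the application asks the reader for the next event and decides what to do with it, which is the control flow that distinguishes StAX from SAX. The class name FundsTransferReader and the helper valueOf are assumptions made for illustration; unlike the slide, the match is case-sensitive and the reader is closed explicitly.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original slides.
final class FundsTransferReader {

    // Returns the integer content of the first element with the given name, or -1 if
    // the element is missing, is not a number, or the XML cannot be read.
    static int valueOf(String xml, String elementName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application pulls the next event from the parser
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        return Integer.parseInt(reader.getElementText().trim());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | NumberFormatException e) {
            return -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String xml = "<FundsTransfer><FromAccount>1001</FromAccount>"
                + "<ToAccount>2002</ToAccount><Amount>500</Amount></FundsTransfer>";
        System.out.println("Amount = " + valueOf(xml, "Amount"));
    }
}
```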

Page 75: 5   xml parsing

XML and ASP.NET – An Overview

Page 76: 5   xml parsing

XmlReader and XmlWriter

XmlReader
    Pull-style API for XML
    Forward-only, read-only access to XML documents
    XmlReader is an abstract class that other classes derive from to provide concrete implementations, such as XmlTextReader and XmlNodeReader
    In ASP.NET 2.0, XmlReader also acts as a factory
        We need not specify which implementation of XmlReader to use
        We call the static Create method, supply the necessary parameters, and let .NET decide how to instantiate it

Page 77: 5   xml parsing

Example – XML Document

<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
    <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
        <title>The Autobiography of Benjamin Franklin</title>
        <author>
            <first-name>Benjamin</first-name>
            <last-name>Franklin</last-name>
        </author>
        <price>8.99</price>
    </book>
    <book genre="novel" publicationdate="1967" ISBN="0-201-65512-2">
        <title>The Confidence Man</title>
        <author>
            <first-name>Herman</first-name>
            <last-name>Melville</last-name>
        </author>
        <price>11.99</price>
    </book>
    <book genre="philosophy" publicationdate="1991" ISBN="1-861001-57-6">
        <title>The Gorgias</title>
        <author>
            <first-name>Sidas</first-name>
            <last-name>Plato</last-name>
        </author>
        <price>9.99</price>
    </book>
</bookstore>

Page 78: 5   xml parsing

Example – ASP.NET Page

using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Xml;
using System.IO;

public partial class XMLReader2 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int bookCount = 0;
        XmlReaderSettings settings = new XmlReaderSettings();

        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        string booksFile = Path.Combine(Request.PhysicalApplicationPath, "Books.xml");

        // Count the <book> elements while reading forward through the document
        using (XmlReader reader = XmlReader.Create(booksFile, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && "book" == reader.LocalName)
                {
                    bookCount++;
                }
            }
        }
        Response.Write(String.Format("Found {0} books!", bookCount));
    }
}

Page 79: 5   xml parsing

Validating an XML Document Against a Schema

using System.Xml.Schema;
using System;
using System.Xml;
using System.IO;

public partial class XMLReader3 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        int bookCount = 0;
        XmlReaderSettings settings = new XmlReaderSettings();

        // Load the schema and switch on schema validation
        string booksSchemaFile = Path.Combine(Request.PhysicalApplicationPath, "books.xsd");
        settings.Schemas.Add(null, XmlReader.Create(booksSchemaFile));
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += new ValidationEventHandler(settings_ValidationEventHandler);
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        string booksFile = Path.Combine(Request.PhysicalApplicationPath, "Books.xml");

        // Validation happens as the reader moves through the document
        using (XmlReader reader = XmlReader.Create(booksFile, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && "book" == reader.LocalName)
                {
                    bookCount++;
                }
            }
        }
        Response.Write(String.Format("Found {0} books!", bookCount));
    }

    // Called once for each validation warning or error
    void settings_ValidationEventHandler(object sender, System.Xml.Schema.ValidationEventArgs e)
    {
        Response.Write(e.Message);
    }
}

Page 81: 5   xml parsing

Thank you!

Any Questions?