Bottom-up Evaluation of XPath Queries

Page 1: Bottom-up Evaluation of  XPath Queries

Bottom-up Evaluation of XPath Queries

Stephanie H. Li, Zhiping Zou

Page 2: Bottom-up Evaluation of  XPath Queries

Outline

Overview of XPath
Motivation
Algorithms: bottom-up evaluation
Design and implementation

Page 3: Bottom-up Evaluation of  XPath Queries

Introduction: Overview

Overview of XPath
XPath is a query language designed for addressing nodes of XML documents.
    Data model
    Syntax
    Expressions
        Location paths
        Operators
        Functions
    Evaluation (context)

Page 4: Bottom-up Evaluation of  XPath Queries

Data Model

XML document = tree of nodes
7 kinds of nodes:
    Element
    Attribute
    Text
    Namespace
    Processing-instruction
    Comment
    Document (root)

Page 5: Bottom-up Evaluation of  XPath Queries

Data Model (Example)

XML document:

<a><b/><b/><b/><b/></a>

Document tree:

r                (the root node)
    a            (the root element)
        b  b  b  b

Page 6: Bottom-up Evaluation of  XPath Queries

Expression

XPath uses expressions to select nodes from XML documents

The main types of expressions are location paths, functions, and operators.

Page 7: Bottom-up Evaluation of  XPath Queries

Location Paths

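Although there are many different kinds of XPath expressions, the one of primary use in Java programs is the location path.

Location path: /child::movies/child::movie[position()=5]

This location path consists of two steps, child::movies and child::movie[position()=5]. In the second step, child is the axis, movie is the node test, and [position()=5] is the predicate.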
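The split of a step into axis, node test, and predicates can be sketched in Python. This is only an illustration, not the Parser.java of this project: the names parse_step and parse_location_path and the regular expressions are assumptions, and the sketch handles only unabbreviated steps whose predicates contain no nested brackets and no "/".

```python
import re

# Hypothetical helpers (not part of any XPath library): they split an
# unabbreviated location path into its steps and each step into its parts.
STEP_RE = re.compile(r"^\s*([a-z-]+)::\s*([^\[\]\s]+)\s*((?:\[[^\[\]]*\]\s*)*)$")
PREDICATE_RE = re.compile(r"\[([^\[\]]*)\]")


def parse_step(step):
    """Return (axis, node test, list of predicate strings) for one step."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError(f"not an unabbreviated location step: {step!r}")
    axis, node_test, predicate_part = match.groups()
    predicates = [p.strip() for p in PREDICATE_RE.findall(predicate_part)]
    return axis, node_test, predicates


def parse_location_path(path):
    """Return (is_absolute, list of parsed steps)."""
    # A leading "/" marks an absolute path; steps are separated by "/".
    absolute = path.startswith("/")
    steps = [parse_step(s) for s in path.strip("/").split("/")]
    return absolute, steps


if __name__ == "__main__":
    print(parse_location_path("/child::movies/child::movie[position()=5]"))
```

For the path above, the second step is returned as the axis child, the node test movie, and the single predicate position()=5.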

Page 8: Bottom-up Evaluation of  XPath Queries

Location Step

axis::nodetest[predicates]
    Axis: chooses the direction to move from the context node
    Node test: determines what kinds of nodes will be selected along that axis
    Predicates: further filter the node-set

Page 9: Bottom-up Evaluation of  XPath Queries

XPath Axis

Axis: the main navigator for an XML document
    ancestor: nodes along the path to the root
    ancestor-or-self: same, but including the context node
    child: children of the context node
    descendant: descendants of the context node
    descendant-or-self: same, but including the context node
    following: nodes after the context node in document order, excluding descendants
    following-sibling: following siblings of the context node
    parent: the parent of the context node
    preceding: nodes before the context node in document order, excluding ancestors
    preceding-sibling: preceding siblings of the context node
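The axis definitions can be made concrete with a small Python sketch over a hand-built tree. The Node class, the function names, and build_example are assumptions made for illustration; only element-like nodes are modelled, and the example tree is the r / a / b1..b4 document from the Data Model slide.

```python
class Node:
    """A bare-bones tree node; children are kept in document order."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def ancestor(n):
    # Walk up to the root, nearest ancestor first.
    result = []
    while n.parent is not None:
        n = n.parent
        result.append(n)
    return result


def descendant(n):
    # Pre-order traversal, which is document order.
    result = []
    for c in n.children:
        result.append(c)
        result.extend(descendant(c))
    return result


def following_sibling(n):
    if n.parent is None:
        return []
    siblings = n.parent.children
    return siblings[siblings.index(n) + 1:]


def preceding_sibling(n):
    if n.parent is None:
        return []
    siblings = n.parent.children
    return siblings[:siblings.index(n)]


def following(n):
    # Nodes after n in document order, excluding descendants of n.
    result = []
    for a in [n] + ancestor(n):
        for s in following_sibling(a):
            result.append(s)
            result.extend(descendant(s))
    return result


def preceding(n):
    # Nodes before n in document order, excluding ancestors of n.
    ancestors = ancestor(n)
    root = ancestors[-1] if ancestors else n
    in_order = [root] + descendant(root)
    excluded = {id(a) for a in ancestors}
    return [m for m in in_order[:in_order.index(n)] if id(m) not in excluded]


def build_example():
    """Tree for <a><b/><b/><b/><b/></a> with root node r."""
    r = Node("r")
    a = Node("a", r)
    bs = [Node(f"b{i}", a) for i in range(1, 5)]
    return r, a, bs


if __name__ == "__main__":
    r, a, bs = build_example()
    print([n.name for n in following(bs[1])])
    print([n.name for n in preceding(bs[2])])
```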

Page 10: Bottom-up Evaluation of  XPath Queries

Node Test

Node type test
    Example (for the document <a><b/><b/><b/><b/></a>):
        T(root()) = {r}
        T(element()) = {a, b1, ..., b4}
        T(element(a)) = {a}
        T(element(b)) = {b1, ..., b4}

Node name test
    Element node name

Page 11: Bottom-up Evaluation of  XPath Queries

Operators and Functions

Arithmetic operators

Operators for comparisons and boolean logic: {<, >, <=, >=, =, !=} and {or, and}

Functions: position(), last()

Page 12: Bottom-up Evaluation of  XPath Queries

XPath Query Evaluation

Query evaluation is a major algorithmic problem.
The main construct is the expression.
Each expression is evaluated to yield an object of one of these four types:
    Node-set (an unordered collection of nodes without duplicates)
    Boolean (true or false)
    Number (a floating-point number)
    String

Page 13: Bottom-up Evaluation of  XPath Queries

Context

All XPath expressions are evaluated with respect to a context, which consists of
    a context node
    a context position (int)
    a context size (int)

The input context for query evaluation is chosen by the user.
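As an illustration of how the three parts of a context are used, the following Python sketch filters a node list with the predicate [position() != last()]. The Context type and the helper names are assumptions made for this example; nodes are represented by plain labels.

```python
from typing import NamedTuple


class Context(NamedTuple):
    node: str       # the context node (a plain label in this sketch)
    position: int   # 1-based context position
    size: int       # context size


def contexts_for(node_list):
    """Return one context per node of a node list given in axis order."""
    size = len(node_list)
    return [Context(node, i, size) for i, node in enumerate(node_list, start=1)]


def filter_by_predicate(node_list, predicate):
    """Keep the nodes whose context satisfies the predicate."""
    return [c.node for c in contexts_for(node_list) if predicate(c)]


def not_last(c):
    # Models the XPath predicate [position() != last()].
    return c.position != c.size


if __name__ == "__main__":
    print(filter_by_predicate(["b1", "b2", "b3", "b4"], not_last))
```

Here position() reads the context position and last() reads the context size, so the predicate removes exactly the final node of the list.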

Page 14: Bottom-up Evaluation of  XPath Queries

Motivation

Claim: The way XPath is defined in the W3C XPath recommendation motivates an inefficient (exponential-time) implementation.

The paper proposes a more efficient (polynomial-time) approach.

Page 15: Bottom-up Evaluation of  XPath Queries

Basic query evaluation strategy

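Procedure process-location-step(n0, Q)
/* n0 is the context node; query Q is a list of location steps */
Begin
    node set S := apply Q.first to node n0;
    if (Q.tail is not empty) then
        for each node n ∈ S do
            process-location-step(n, Q.tail);
End

The algorithm recursively evaluates each remaining step for each matching node of the current step. In the worst case a step selects up to |D| nodes, where |D| is the size of the document, so the running time satisfies

    Time(|Q|) = |D| * Time(|Q|-1)    when |Q| > 0
    Time(|Q|) = 1                    when |Q| = 0

that is, Time(|Q|) = |D|^|Q|, which is exponential in the size of the query.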
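The blow-up can be observed by counting procedure calls. The Python sketch below transcribes the procedure above for a toy tree and a query that alternates child and parent steps; the Node class, apply_step, and count_calls are assumptions made for this illustration, and node tests are omitted.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def apply_step(axis, node):
    # Only the two axes needed for the demonstration are modelled.
    if axis == "child":
        return list(node.children)
    if axis == "parent":
        return [node.parent] if node.parent is not None else []
    raise ValueError(f"unsupported axis: {axis}")


def process_location_step(n0, steps, counter):
    """Naive strategy from the slide; counter[0] counts procedure calls."""
    counter[0] += 1
    node_set = apply_step(steps[0], n0)
    if len(steps) > 1:
        for n in node_set:
            process_location_step(n, steps[1:], counter)


def count_calls(m, fanout=2):
    """Calls needed for child/parent repeated m times, starting at node a."""
    a = Node("a")
    for i in range(fanout):
        Node(f"b{i + 1}", a)
    counter = [0]
    process_location_step(a, ["child", "parent"] * m, counter)
    return counter[0]


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, count_calls(m))
```

With two b children the call count is 3 * (2^m - 1), so each added child/parent pair roughly doubles the work even though the document has only three nodes.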

Page 16: Bottom-up Evaluation of  XPath Queries

XPath Evaluation in PTime

Theorem: Let e be an arbitrary XPath expression. Then, for context node x, position k, and size n, the value of e is v, where v is the unique value such that <x,k,n,v> ∈ E↑[e].

The main principle that the paper proposes to obtain an XPath evaluation algorithm with PTime complexity is the notion of a context-value table (CVT).

Page 17: Bottom-up Evaluation of  XPath Queries

Context-value table Principle

Given an expression e, the CVT of e specifies all valid combinations of contexts c = <x,k,n> and values v such that e evaluates to v in context c.

Such a table for expression e is obtained by first computing the CVTs of the direct subexpressions of e and then combining them into the CVT for e.

The size of each of the CVTs has a polynomial bound.
Each of the combination steps can be effected in PTime.
Thus, query evaluation in total under our principle also has a PTime bound.
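A minimal Python sketch of this principle for the expression position() != last(): the tables for the two subexpressions are built first and then combined context by context. Representing a CVT as a dictionary from context to value, and the function names used here, are assumptions made for illustration and are not taken from ContextValTable.java.

```python
from itertools import product


def all_contexts(dom):
    """All contexts <x, k, n> with x in dom and 1 <= k <= n <= |dom|."""
    size = len(dom)
    return [(x, k, n)
            for x, n in product(dom, range(1, size + 1))
            for k in range(1, n + 1)]


def cvt_position(dom):
    # position() evaluates to the context position k.
    return {(x, k, n): k for (x, k, n) in all_contexts(dom)}


def cvt_last(dom):
    # last() evaluates to the context size n.
    return {(x, k, n): n for (x, k, n) in all_contexts(dom)}


def combine(op, left, right):
    """CVT of op(e1, e2) from the CVTs of e1 and e2, context by context."""
    return {c: op(left[c], right[c]) for c in left if c in right}


if __name__ == "__main__":
    dom = ["a", "b1", "b2"]
    table = combine(lambda u, v: u != v, cvt_position(dom), cvt_last(dom))
    for context, value in sorted(table.items()):
        print(context, value)
```

In this sketch each table has |dom| * |dom| * (|dom| + 1) / 2 rows, which is polynomial in the document size, and the combination step touches each row once.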

Page 18: Bottom-up Evaluation of  XPath Queries

Bottom-up evaluation of XPath

Page 19: Bottom-up Evaluation of  XPath Queries

Bottom-up evaluation of XPath

Algorithm (Bottom-up algorithm for XPath)
Input: An XPath query Q;
Output: E↑[Q]
Method:
    Let Tree(Q) be the parse tree of query Q;
    R := Ø;
    For each atomic expression l ∈ leaves(Tree(Q)) do
        compute table E↑[l] and add it to R;    [Note: we use JDom to do this]
    While E↑[root(Tree(Q))] ∉ R do
    Begin
        take an Op(l1,…,ln) ∈ nodes(Tree(Q)) s.t. E↑[l1],…,E↑[ln] ∈ R;
        compute E↑[Op(l1,…,ln)] using E↑[l1],…,E↑[ln];
        add E↑[Op(l1,…,ln)] to R;
    End;
    Return E↑[root(Tree(Q))]

By a bottom-up algorithm we mean a method of processing XPath while traversing the parse tree of the query from its leaves up to its root.
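The control flow of the algorithm can be sketched in Python for a restricted case. The sketch assumes a parse tree given as nested tuples, covers only scalar expressions built from position(), last(), constants, and a few binary operators (no location paths), and uses invented names throughout; it is not the QueryEval.java of this project.

```python
import operator

# Binary operators supported by this sketch.
OPS = {"!=": operator.ne, ">": operator.gt,
       "and": lambda u, v: bool(u) and bool(v)}


def contexts(dom):
    size = len(dom)
    return [(x, k, n) for x in dom
            for n in range(1, size + 1) for k in range(1, n + 1)]


def is_leaf(node):
    return node[0] not in OPS


def leaf_table(leaf, dom):
    """Context-value table of an atomic expression."""
    kind = leaf[0]
    if kind == "position":
        return {c: c[1] for c in contexts(dom)}
    if kind == "last":
        return {c: c[2] for c in contexts(dom)}
    if kind == "const":
        return {c: leaf[1] for c in contexts(dom)}
    raise ValueError(f"unknown leaf: {leaf!r}")


def nodes(tree):
    """All nodes of the parse tree, root first."""
    yield tree
    if not is_leaf(tree):
        for child in tree[1:]:
            yield from nodes(child)


def evaluate_bottom_up(tree, dom):
    tables = {}  # plays the role of R: tables computed so far
    for node in nodes(tree):
        if is_leaf(node):
            tables[node] = leaf_table(node, dom)
    while tree not in tables:
        for node in nodes(tree):
            if node in tables:
                continue
            children = node[1:]
            # Only combine once every child table is available.
            if all(child in tables for child in children):
                op = OPS[node[0]]
                tables[node] = {c: op(*(tables[ch][c] for ch in children))
                                for c in contexts(dom)}
    return tables[tree]


if __name__ == "__main__":
    query = ("and", ("!=", ("position",), ("last",)),
             (">", ("position",), ("const", 1)))
    result = evaluate_bottom_up(query, ["a", "b1", "b2"])
    print(result[("a", 2, 3)])
```

Each subexpression is evaluated exactly once, for all contexts at the same time, which is what separates this strategy from the recursive one.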

Page 20: Bottom-up Evaluation of  XPath Queries

Bottom-up evaluation of XPath

Example XML:

<?xml version="1.0"?>
<people>
    <person born="1912" died="1954" id="p342">
        <name> Alan Turing </name>
        <!-- Did the word computer scientist exist in Turing's day? -->
        <profession>computer scientist</profession>
        <profession>mathematician</profession>
        <profession>cryptographer</profession>
        <homepage>href="http://www.turing.org.uk/"</homepage>
    </person>
    <person born="1918" died="1988" id="p4567">
        <name>Richard M. Feynman</name>
        <profession>physicist</profession>
        <hobby>Playing the bongoes</hobby>
    </person>
</people>

Page 21: Bottom-up Evaluation of  XPath Queries

Example: XML Doc Tree

Page 22: Bottom-up Evaluation of  XPath Queries

Example: XPath Query tree

XPath query: descendant::profession/following-sibling::*[position() != last()]

Parse tree: [figure]
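To see what this query selects on the example document, the following Python sketch evaluates it by hand with the standard library's xml.etree.ElementTree, taking the people element as the context node. The function name evaluate is an assumption, and the code hard-wires this one query rather than implementing the bottom-up algorithm. The default ElementTree parser drops comments, which matches the * node test selecting element nodes only.

```python
import xml.etree.ElementTree as ET

XML = """<people>
  <person born="1912" died="1954" id="p342">
    <name> Alan Turing </name>
    <!-- Did the word computer scientist exist in Turing's day? -->
    <profession>computer scientist</profession>
    <profession>mathematician</profession>
    <profession>cryptographer</profession>
    <homepage>href="http://www.turing.org.uk/"</homepage>
  </person>
  <person born="1918" died="1988" id="p4567">
    <name>Richard M. Feynman</name>
    <profession>physicist</profession>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>"""


def evaluate(root):
    """descendant::profession/following-sibling::*[position() != last()]"""
    # ElementTree elements do not store their parent, so build a parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    selected = set()
    # root.iter() also visits root itself; that is harmless here because
    # the context node is <people>, not a <profession> element.
    for prof in root.iter("profession"):
        siblings = list(parent_of[prof])
        following = siblings[siblings.index(prof) + 1:]
        last = len(following)
        for position, node in enumerate(following, start=1):
            if position != last:
                selected.add(node)
    # A node-set has no duplicates; report it in document order.
    return [n for n in root.iter() if n in selected]


if __name__ == "__main__":
    for element in evaluate(ET.fromstring(XML)):
        print(element.tag, element.text)
```

The query selects the mathematician and cryptographer profession elements: each is a following sibling of an earlier profession element without being the last of its following siblings, while homepage and hobby are always last and are filtered out.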

Page 23: Bottom-up Evaluation of  XPath Queries

Example: Evaluate subexpressions

Page 24: Bottom-up Evaluation of  XPath Queries

Example: Evaluate subexpressions

Page 25: Bottom-up Evaluation of  XPath Queries

Example: Evaluate subexpressions

Page 26: Bottom-up Evaluation of  XPath Queries

Design and Implementation

Environment: Java (JDK 1.5.0), JDom 1.0, XPath 1.0

Features:
    Only element nodes are queried
    Abbreviated XPath expressions are not supported
    Location steps inside predicates are not supported

Page 27: Bottom-up Evaluation of  XPath Queries

System Structure

User input (MyDriver.java): XML file, query, and context node

Query parser (Parser.java, BinaryTree.java, Node.java): query -> query tree

JDom XML parser (org.jdom.input.SAXBuilder): XML file -> XML document tree

Evaluator (QueryEval.java): takes the query tree, the XML document tree, and the context node

Context-value tables (ContextValTable.java and others): built by the evaluator

Output: result for the full XPath query

Page 28: Bottom-up Evaluation of  XPath Queries

Conclusion

An XPath query evaluation algorithm that runs in polynomial time with respect to the size of both the data and the query (linear in the size of queries and quadratic in the size of data).

No optimization; the implementation strictly adheres to the specification given in the paper.

Page 29: Bottom-up Evaluation of  XPath Queries

References

G. Gottlob, C. Koch, and R. Pichler. "XPath Processing in a Nutshell". ACM SIGMOD Record, 2003.

G. Gottlob, C. Koch, and R. Pichler. "Efficient Algorithms for Processing XPath Queries". In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.

G. Gottlob, C. Koch, and R. Pichler. "XPath Query Evaluation: Improving Time and Space Efficiency". In Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE'03), Bangalore, India, Mar. 2003.

http://www.ibiblio.org/xml/books/xmljava/chapters/ch16.html