Trie Indexes for Efficient XML Query Processing

Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz
Indiana University, Bloomington
{sbrenesb, yuqwu, vgucht, psantacr}@cs.indiana.edu


Transcript of Trie Indexes for Efficient XML Query Processing

Page 1: Trie  Indexes for Efficient XML Query Processing

Trie Indexes for Efficient XML Query Processing

Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz

Indiana University, Bloomington
{sbrenesb, yuqwu, vgucht, psantacr}@cs.indiana.edu

Page 2: Trie  Indexes for Efficient XML Query Processing

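XML and Queries – An Example

Query 1: //A/B/C
Query 2: //B/C
Query 3: //A/B[./D]/C
Query 4: //A[./B[./D]]/B/C

Example document tree (each node name is its label plus an index; indentation shows parent-child edges):

A1
  B1
    C1
  A2
    B2
      C2
      D1
    B3
      C3
  B4
    B5
      C4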

Page 3: Trie  Indexes for Efficient XML Query Processing

Index and XML Query Evaluation

Challenges
Structure
◦ Data: containment relationship
◦ Query: pattern matching, (nested) predicates

Page 4: Trie  Indexes for Efficient XML Query Processing

Structural Indices for XML Data

Consider both value and structure

Index features               Structural indices
Pure structural summaries    DataGuide, T-index
Local bi-similarity          A(k), UD(k,i), D(k), M(k)
Workload-aware               D(k), M(k), M*(k)
Encoded sequence             ViST, Index Fabric
Index chooser                XIST

Page 5: Trie  Indexes for Efficient XML Query Processing

Expected Features for an XML Index

Reasonable size
Easy to construct and adjust
Query evaluation
◦ Index-only plan for most queries

Page 6: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 7: Trie  Indexes for Efficient XML Query Processing


Rewind – back to the world of RDB

RDBMS Theory

RDBMS Engineering Techniques

Page 8: Trie  Indexes for Efficient XML Query Processing

Our approach

Study the XML query language and its fragments
Study the indistinguishability of components in an XML document
Reason about existing XML indices
Design new XML indices

Page 9: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 10: Trie  Indexes for Efficient XML Query Processing

XML Data Model

Represent an XML document D as a finite unordered node-labeled tree

$D = (V, Ed, r, \lambda)$

Nodes: $V$
Edges: $Ed$
Root: $r$
Labels: $\lambda : V \to L$

[Figure: example document tree]

Page 11: Trie  Indexes for Efficient XML Query Processing

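Label Path

LP(m, n): the sequence of labels on the path from node m down to its descendant n
◦ LP(m, n) = (A, B, C)

LP(n, k): the label path of length at most k that ends at n (it stops early when the root is reached)
◦ LP(n, 0) = (C)
◦ LP(n, 1) = (B, C)
◦ LP(n, 4) = (A, A, B, C)
◦ LP(n, 7) = (A, A, B, C)

[Figure: example document tree with two marked nodes, m and n]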
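The label-path definitions above can be sketched in a few lines of Python. This is only an illustration: the `PARENT` map encodes the example tree as implied by the partition tables on later slides, the function names are hypothetical, and taking C2 as the marked node n and A2 as m is an assumption, since the figure markers did not survive.

```python
# Example tree, child -> parent (None marks the root). Node names are label + index.
PARENT = {
    "A1": None,
    "B1": "A1", "A2": "A1", "B4": "A1",
    "C1": "B1", "B2": "A2", "B3": "A2", "B5": "B4",
    "C2": "B2", "D1": "B2", "C3": "B3", "C4": "B5",
}


def label(node):
    """The label of a node is the first character of its name (assumed naming scheme)."""
    return node[0]


def label_path_to(node, k):
    """LP(node, k): labels along the upward path of at most k edges, listed top-down."""
    labels = []
    current = node
    for _ in range(k + 1):
        if current is None:  # reached the root before using all k steps
            break
        labels.append(label(current))
        current = PARENT[current]
    return tuple(reversed(labels))


def label_path_between(m, n):
    """LP(m, n): labels from m down to n, or None if m is not an ancestor-or-self of n."""
    labels = []
    current = n
    while current is not None:
        labels.append(label(current))
        if current == m:
            return tuple(reversed(labels))
        current = PARENT[current]
    return None
```

With n = C2, `label_path_to("C2", 4)` and `label_path_to("C2", 7)` both give (A, A, B, C), because the walk stops at the root.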

Page 12: Trie  Indexes for Efficient XML Query Processing

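N[k] Equivalence

Given an XML document and a value k:

$n_1 \equiv_{N[k]} n_2 \iff LP(n_1, k) = LP(n_2, k)$

Examples:
$B_1 \equiv_{N[1]} B_2$, since LP(B1, 1) = LP(B2, 1) = (A, B)
$B_1 \not\equiv_{N[2]} B_2$, since LP(B1, 2) = (A, B) but LP(B2, 2) = (A, A, B)

[Figure: example document tree]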

Page 13: Trie  Indexes for Efficient XML Query Processing

N[k] Partition

$n_1 \equiv_{N[k]} n_2 \iff LP(n_1, k) = LP(n_2, k)$

N[1] partition:

Label path    Block
(A)           {A1}
(A,A)         {A2}
(A,B)         {B1, B2, B3, B4}
(B,B)         {B5}
(B,C)         {C1, C2, C3, C4}
(B,D)         {D1}

N[1][(A,B)] = {B1, B2, B3, B4}

[Figure: example document tree]
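A minimal sketch of how the N[k] partition could be computed: group every node by LP(n, k). The `PARENT` map and the function names are illustrative assumptions, with the tree taken from the example used throughout the slides.

```python
from collections import defaultdict

# Example tree, child -> parent (None marks the root).
PARENT = {
    "A1": None,
    "B1": "A1", "A2": "A1", "B4": "A1",
    "C1": "B1", "B2": "A2", "B3": "A2", "B5": "B4",
    "C2": "B2", "D1": "B2", "C3": "B3", "C4": "B5",
}


def label_path_to(node, k):
    """LP(node, k): labels on the upward path of at most k edges, listed top-down."""
    labels = []
    current = node
    for _ in range(k + 1):
        if current is None:
            break
        labels.append(current[0])  # label = first character of the node name
        current = PARENT[current]
    return tuple(reversed(labels))


def n_partition(k):
    """Map each label path to its N[k] block: the nodes n with LP(n, k) equal to that path."""
    blocks = defaultdict(set)
    for node in PARENT:
        blocks[label_path_to(node, k)].add(node)
    return dict(blocks)
```

For k = 1 this yields the six blocks in the table above; for k = 2 the block for (A,B) shrinks to {B1, B4} because B2 and B3 move to (A,A,B).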

Page 14: Trie  Indexes for Efficient XML Query Processing

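P[k] Equivalence

Given an XML document and a value k:

$(m_1, n_1) \equiv_{P[k]} (m_2, n_2) \iff LP(m_1, n_1) = LP(m_2, n_2) \ \text{and} \ |LP(m_1, n_1)| \le k$

Here the length of a label path counts edges, so P[1] covers single nodes and parent-child pairs.

Examples:
$(A_1, C_1) \equiv_{P[2]} (A_2, C_2)$, since both label paths are (A, B, C)
$(A_1, C_2) \not\equiv_{P[3]} (A_1, C_4)$, since LP(A1, C2) = (A, A, B, C) but LP(A1, C4) = (A, B, B, C)

[Figure: example document tree]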

Page 15: Trie  Indexes for Efficient XML Query Processing

P[k] Partition

P[1] partition:

Label path    Block
(A)           {(A1, A1), (A2, A2)}
(B)           {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C)           {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D)           {(D1, D1)}
(A,A)         {(A1, A2)}
(A,B)         {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B)         {(B4, B5)}
(B,C)         {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D)         {(B2, D1)}

P[1][(A,A)] = {(A1, A2)}

[Figure: example document tree]

Page 16: Trie  Indexes for Efficient XML Query Processing

P[k] Partition

P[2] partition:

Label path    Block
(A)           {(A1, A1), (A2, A2)}
(B)           {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C)           {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D)           {(D1, D1)}
(A,A)         {(A1, A2)}
(A,B)         {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B)         {(B4, B5)}
(B,C)         {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D)         {(B2, D1)}
(A,A,B)       {(A1, B2), (A1, B3)}
(A,B,B)       {(A1, B5)}
(A,B,C)       {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D)       {(A2, D1)}
(B,B,C)       {(B4, C4)}

P[2][(A,B,C)] = {(A1, C1), (A2, C2), (A2, C3)}

[Figure: example document tree]
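The P[k] partition can be sketched the same way, except that blocks hold (ancestor, descendant) pairs and every path of length 0 up to k contributes a pair. The tree encoding and the name `p_partition` are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

# Example tree, child -> parent (None marks the root).
PARENT = {
    "A1": None,
    "B1": "A1", "A2": "A1", "B4": "A1",
    "C1": "B1", "B2": "A2", "B3": "A2", "B5": "B4",
    "C2": "B2", "D1": "B2", "C3": "B3", "C4": "B5",
}


def p_partition(k):
    """Map each label path of length <= k (in edges) to the set of pairs (m, n) realising it."""
    blocks = defaultdict(set)
    for n in PARENT:
        labels = []
        m = n
        for _ in range(k + 1):
            if m is None:  # ran past the root
                break
            labels.insert(0, m[0])  # prepend the ancestor's label
            blocks[tuple(labels)].add((m, n))
            m = PARENT[m]
    return dict(blocks)
```

On the example tree, `p_partition(1)` has 9 blocks and `p_partition(2)` has 14, matching the tables on the two P[k] Partition slides.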

Page 17: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 18: Trie  Indexes for Efficient XML Query Processing

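XPath Algebra

Path semantics:

$\varepsilon(D) = \{(m, m) \mid m \in V\}$
$\hat{\ell}(D) = \{(m, m) \mid m \in V,\ \lambda(m) = \ell\}$
$\downarrow(D) = Ed$
$\uparrow(D) = Ed^{-1}$
$(E_1 \diamond E_2)(D) = \{(m, n) \mid \exists w : (m, w) \in E_1(D) \wedge (w, n) \in E_2(D)\}$
$\pi_1(E_1)(D) = \{(m, m) \mid \exists n : (m, n) \in E_1(D)\}$

Node semantics:

$E(D)[nodes] = \{n \mid \exists m : (m, n) \in E(D)\}$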
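To make the two semantics concrete, here is a small sketch that evaluates algebra expressions as sets of node pairs over the example tree. The function names (`lab`, `down`, `compose`, `pi1`, `chain`) are hypothetical, and the translation of the XPath queries into algebra expressions is an assumption made for illustration.

```python
from functools import reduce

# Example tree, child -> parent (None marks the root).
PARENT = {
    "A1": None,
    "B1": "A1", "A2": "A1", "B4": "A1",
    "C1": "B1", "B2": "A2", "B3": "A2", "B5": "B4",
    "C2": "B2", "D1": "B2", "C3": "B3", "C4": "B5",
}
NODES = set(PARENT)
EDGES = {(p, c) for c, p in PARENT.items() if p is not None}


def epsilon():
    """Identity relation on all nodes."""
    return {(m, m) for m in NODES}


def lab(l):
    """Label test: identity pairs on nodes labelled l."""
    return {(m, m) for m in NODES if m[0] == l}


def down():
    """Child axis: the edge relation Ed."""
    return set(EDGES)


def up():
    """Parent axis: the inverse edge relation."""
    return {(c, p) for (p, c) in EDGES}


def compose(r1, r2):
    """Relational composition of two path-semantics results."""
    return {(m, n) for (m, w) in r1 for (w2, n) in r2 if w == w2}


def pi1(r):
    """First projection: identity pairs on nodes that start some pair in r."""
    return {(m, m) for (m, _n) in r}


def nodes(r):
    """Node semantics: the end nodes of all pairs in r."""
    return {n for (_m, n) in r}


def chain(*rels):
    """Compose several relations left to right."""
    return reduce(compose, rels)


# Query 1 (//A/B/C) and Query 3 (//A/B[./D]/C), written as algebra expressions.
query1 = chain(lab("A"), down(), lab("B"), down(), lab("C"))
query3 = chain(lab("A"), down(), lab("B"),
               pi1(chain(down(), lab("D"))), down(), lab("C"))
```

Here the predicate [./D] is expressed through `pi1`, which is why removing π1 from the algebra removes predicates.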

Page 19: Trie  Indexes for Efficient XML Query Processing

Fragments of XPath Algebra

D algebra: XPath algebra without ↑ and π1
D[ ] algebra: XPath algebra without ↑
D[k] algebra: D algebra up to length k
D[ ][k] algebra: D[ ] algebra up to length k

Page 20: Trie  Indexes for Efficient XML Query Processing

D[k] Equivalence

Given an XML document D, a value k, and pairs (m1, n1), (m2, n2) in DownPairs(D):

$(m_1, n_1) \equiv_{D[k]} (m_2, n_2) \iff \text{for every } E \in D[k]:\ (m_1, n_1) \in E(D) \Leftrightarrow (m_2, n_2) \in E(D)$

Page 21: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 22: Trie  Indexes for Efficient XML Query Processing

Coupling Theorem

Let D be a document and k an integer.
◦ The P[k]-partition of D and the D[k]-partition of D are the same under the path semantics.
◦ The N[k]-partition of D and the D[k]-partition of D are the same under the node semantics.

In symbols: $\equiv_{P[k]} \,=\, \equiv_{D[k]}$ (path semantics) and $\equiv_{N[k]} \,=\, \equiv_{D[k]}$ (node semantics).

Page 23: Trie  Indexes for Efficient XML Query Processing

k-Label-Path Set

LPS(E, k): the set of label paths of length k in an XML document that satisfy an XPath expression E in the D algebra.

Example: for an expression E over the labels A and B,

$LPS(E, 2) = \{(A, A, B), (A, B, B)\}$

[Figure: example document tree]

Page 24: Trie  Indexes for Efficient XML Query Processing

Label-Union Theorem

Let D be a document, k an integer, and E a D[k] expression. Then there exists a class of partition blocks of the P[k]-partition (N[k]-partition) of D such that

$E(D) = \bigcup_{lp \in LPS(E, k)} P[k][lp]$

$E(D)[nodes] = \bigcup_{lp \in LPS(E, k)} N[k][lp]$

Page 25: Trie  Indexes for Efficient XML Query Processing

Query Evaluation Using Label-Union Theorem

N[2] partition:

Label path    Block
(A)           {A1}
(A,A)         {A2}
(A,B)         {B1, B4}
(A,A,B)       {B2, B3}
(A,B,B)       {B5}
(A,B,C)       {C1, C2, C3}
(B,B,C)       {C4}
(A,B,D)       {D1}

Query 2: //B/C
LPS(E, 2) = {(A,B,C), (B,B,C)}
Answer: N[2][(A,B,C)] ∪ N[2][(B,B,C)] = {C1, C2, C3, C4}

[Figure: example document tree]
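The evaluation step above amounts to a union over selected blocks. The sketch below hard-codes the N[2] table from the slide and, as an assumption for illustration, computes LPS(E, 2) for a query of the form //l1/.../lj as the indexed label paths that end with the query's labels; the function names are hypothetical.

```python
# N[2] partition of the example document, copied from the slide's table.
N2 = {
    ("A",): {"A1"},
    ("A", "A"): {"A2"},
    ("A", "B"): {"B1", "B4"},
    ("A", "A", "B"): {"B2", "B3"},
    ("A", "B", "B"): {"B5"},
    ("A", "B", "C"): {"C1", "C2", "C3"},
    ("B", "B", "C"): {"C4"},
    ("A", "B", "D"): {"D1"},
}


def label_path_set(blocks, query_labels):
    """Indexed label paths whose suffix equals the (non-empty) query label sequence."""
    q = tuple(query_labels)
    return {lp for lp in blocks if lp[-len(q):] == q}


def evaluate(blocks, query_labels):
    """Label-union evaluation: union the blocks of every label path in the label-path set."""
    result = set()
    for lp in label_path_set(blocks, query_labels):
        result |= blocks[lp]
    return result
```

For Query 2, `label_path_set(N2, ("B", "C"))` picks the two paths shown on the slide and `evaluate` returns all four C nodes without touching the document itself. The suffix shortcut is only meaningful for queries no longer than k.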

Page 26: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 27: Trie  Indexes for Efficient XML Query Processing

N[k]-Trie Index

Keep track of the N[k]-partitions
Use the reverse label path as key

N[2] partition:

Label path    Block
(A)           {A1}
(A,A)         {A2}
(A,B)         {B1, B4}
(A,A,B)       {B2, B3}
(A,B,B)       {B5}
(A,B,C)       {C1, C2, C3}
(B,B,C)       {C4}
(A,B,D)       {D1}

[Figure: example document tree]
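One way to picture why the reverse label path is a convenient key: with reversed keys, every label path ending in the query's labels sits under a single trie node, so a //-query becomes one descent plus a subtree scan. The sketch below is a plain in-memory trie written for illustration only; the class and function names are hypothetical and it is not the TIMBER implementation.

```python
# N[2] partition of the example document, copied from the slide's table.
N2 = {
    ("A",): {"A1"},
    ("A", "A"): {"A2"},
    ("A", "B"): {"B1", "B4"},
    ("A", "A", "B"): {"B2", "B3"},
    ("A", "B", "B"): {"B5"},
    ("A", "B", "C"): {"C1", "C2", "C3"},
    ("B", "B", "C"): {"C4"},
    ("A", "B", "D"): {"D1"},
}


class TrieNode:
    def __init__(self):
        self.children = {}   # label -> TrieNode
        self.block = set()   # partition block stored at this key, if any


def build_trie(blocks):
    """Insert each block under its label path read in reverse (last label first)."""
    root = TrieNode()
    for path, members in blocks.items():
        node = root
        for l in reversed(path):
            node = node.children.setdefault(l, TrieNode())
        node.block |= members
    return root


def lookup(root, query_labels):
    """Answer //l1/.../lj: descend along the reversed labels, then union the subtree's blocks."""
    node = root
    for l in reversed(query_labels):
        node = node.children.get(l)
        if node is None:
            return set()
    result, stack = set(), [node]
    while stack:
        current = stack.pop()
        result |= current.block
        stack.extend(current.children.values())
    return result


trie = build_trie(N2)
```

For example, `lookup(trie, ("B", "C"))` descends C then B and collects the blocks stored under both (A,B,C) and (B,B,C). As with the label-union sketch, this is only valid for queries of length at most k.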

Page 28: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with N[k]-Trie Index

N[2] partition:

Label path    Block
(A)           {A1}
(A,A)         {A2}
(A,B)         {B1, B4}
(A,A,B)       {B2, B3}
(A,B,B)       {B5}
(A,B,C)       {C1, C2, C3}
(B,B,C)       {C4}
(A,B,D)       {D1}

Query 1: //A/B/C
LPS(E, 2) = {(A,B,C)}
Answer: N[2][(A,B,C)] = {C1, C2, C3}

[Figure: example document tree]

Page 29: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with N[k]-Trie Index

N[2] partition:

Label path    Block
(A)           {A1}
(A,A)         {A2}
(A,B)         {B1, B4}
(A,A,B)       {B2, B3}
(A,B,B)       {B5}
(A,B,C)       {C1, C2, C3}
(B,B,C)       {C4}
(A,B,D)       {D1}

Query 2: //B/C
LPS(E, 2) = {(A,B,C), (B,B,C)}
Answer: N[2][(A,B,C)] ∪ N[2][(B,B,C)] = {C1, C2, C3, C4}

[Figure: example document tree]

Page 30: Trie  Indexes for Efficient XML Query Processing

P[k]-Trie Index

Keep track of the P[k]-partitions
Use the reverse label path as key

P[2] partition:

Label path    Block
(A)           {(A1, A1), (A2, A2)}
(B)           {(B1, B1), (B2, B2), (B3, B3), (B4, B4), (B5, B5)}
(C)           {(C1, C1), (C2, C2), (C3, C3), (C4, C4)}
(D)           {(D1, D1)}
(A,A)         {(A1, A2)}
(A,B)         {(A1, B1), (A2, B2), (A2, B3), (A1, B4)}
(B,B)         {(B4, B5)}
(B,C)         {(B1, C1), (B2, C2), (B3, C3), (B5, C4)}
(B,D)         {(B2, D1)}
(A,A,B)       {(A1, B2), (A1, B3)}
(A,B,B)       {(A1, B5)}
(A,B,C)       {(A1, C1), (A2, C2), (A2, C3)}
(A,B,D)       {(A2, D1)}
(B,B,C)       {(B4, C4)}

[Figure: example document tree]

Page 31: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with P[k]-Trie Index

Query 1: //A/B/C

[Figure: example document tree]

Page 32: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with P[k]-Trie Index

Query 2: //B/C

[Figure: example document tree]

Page 33: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with P[k]-Trie Index

Query 3: //A/B[./D]/C

[Figure: example document tree]

Page 34: Trie  Indexes for Efficient XML Query Processing

Query Evaluation with P[k]-Trie Index

Query 3: //A/B[./D]/C

[Figure: example document tree]
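The slides do not spell out how Query 3 is decomposed, so the following is only one plausible index-only plan, sketched under that assumption: because P[k] blocks store (ancestor, descendant) pairs, the branching query can be answered by joining three P[1] blocks on their shared B node. The dictionary `P1` copies the relevant rows of the P[1] table, and the function name is hypothetical.

```python
# The three P[1] blocks needed for //A/B[./D]/C, copied from the P[1] table.
P1 = {
    ("A", "B"): {("A1", "B1"), ("A2", "B2"), ("A2", "B3"), ("A1", "B4")},
    ("B", "C"): {("B1", "C1"), ("B2", "C2"), ("B3", "C3"), ("B5", "C4")},
    ("B", "D"): {("B2", "D1")},
}


def evaluate_query3(blocks):
    """Join the (A,B), (B,D) and (B,C) blocks on the B node; return matching C nodes."""
    b_under_a = {b for (_a, b) in blocks[("A", "B")]}   # B nodes with an A parent
    b_with_d = {b for (b, _d) in blocks[("B", "D")]}    # B nodes satisfying [./D]
    qualifying_b = b_under_a & b_with_d
    return {c for (b, c) in blocks[("B", "C")] if b in qualifying_b}
```

On the example document only B2 has both an A parent and a D child, so the plan returns C2. An N[k] index stores single nodes rather than pairs, so it cannot be joined in this way.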

Page 35: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Future Directions

Page 36: Trie  Indexes for Efficient XML Query Processing

Experimental Setup

Indices prototyped in the TIMBER system
Report results on DBLP data
◦ 127 MB
◦ 3.3M nodes

Page 37: Trie  Indexes for Efficient XML Query Processing


Index Sizes

Page 38: Trie  Indexes for Efficient XML Query Processing


Index Creation Time

Page 39: Trie  Indexes for Efficient XML Query Processing

Query Evaluation

Query: //dblp/inproceedings/title/i/sub

Page 40: Trie  Indexes for Efficient XML Query Processing

Query Evaluation

Query: //dblp/inproceedings[./title[./i]/sub]/ee

Page 41: Trie  Indexes for Efficient XML Query Processing

Outline

Introduction
Methodology
Partition induced by structural characteristics of XML
Partition induced by fragments of XPath Algebra
Coupling and Block Union Theorems
Trie Indices and Query Evaluation
Experimental Evaluation
Conclusion

Page 42: Trie  Indexes for Efficient XML Query Processing

Conclusion

The P[k]-Trie index supports index-only plans for most queries, and it consistently and significantly outperforms the N[k]-Trie index and the A(k)-index.

A modest k value is sufficient to provide significant performance improvements.

Page 43: Trie  Indexes for Efficient XML Query Processing

Thanks!! Questions?

Page 44: Trie  Indexes for Efficient XML Query Processing

Research Direction

Further study of query decomposition and inversion algorithms
Study workload-driven index creation
Develop other appropriate index structures