Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi,...

18
Sum-Max Monotonic Ranked Joins for Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Evaluating Top-K Twig Queries on Weighted Data Graphs Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State University Maria Luisa Sapino, University of Torino VLDB ’07, September 23-28, 2007, Vienna, Austria VLDB ’07, September 23-28, 2007, Vienna, Austria 2008. 01. 25 Summarized by Dongjoo Lee, IDS Lab., Seoul National University Presented by Dongjoo Lee, IDS Lab., Seoul National University

description

Copyright  2008 by CEBT Motivation  Query Processing on Metadata with Conflicts (FICSR, SIGMOD07) F eedback-based I n C on S istency R esolution and Query Processing on Misaligned Data Sources 3

Transcript of Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi,...

Page 1: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Sum-Max Monotonic Ranked Joins for Evaluating Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data GraphsTop-K Twig Queries on Weighted Data Graphs

Yan Qi, Arizona State UniversityK. Selcuk Candan, Arizona State University

Maria Luisa Sapino, University of Torino

VLDB ’07, September 23-28, 2007, Vienna, AustriaVLDB ’07, September 23-28, 2007, Vienna, Austria

2008. 01. 25Summarized by Dongjoo Lee, IDS Lab., Seoul National University

Presented by Dongjoo Lee, IDS Lab., Seoul National University

Page 2: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

ContentsContents Motivation Top-k Twig Queries over Weighted Data Graphs Answering Twig Queries on Weighted Graphs

Sum-Max Monotonicity Progressive Result Enumeration

HR-Join, MHR-Join Experiments Conclusion

2

Page 3: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

MotivationMotivation Query Processing on Metadata with Conflicts (FICSR,

SIGMOD07) Feedback-based InConSistency Resolution and Query

Processing on Misaligned Data Sources

3

Page 4: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

FICSR Integrated RepresentationFICSR Integrated Representation

4

Internal FICSR Representation

Simplified Visualization for the User

Page 5: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

FICSR AggrementsFICSR Aggrements Based on source analysis and user feedback

5

Page 6: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Top-k Twig Queries over Weighted Data Top-k Twig Queries over Weighted Data GraphsGraphs XPath, XQuery

6

More desirable!Can we find it before the other?

Page 7: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Answering Twig Queries on Weighted Answering Twig Queries on Weighted GraphsGraphs

NP-complete problem By reduction from the Group Steiner Tree Problem.

7

VLSI design

Page 8: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

How to Solve the Problem?How to Solve the Problem? Use ranked-join algorithms for top-k queries

A query plan with better sub-plans is always more desirable Must be monotonic

Twig query? Sum-max monotonicity is held! We can enumerate the result incrementally!

8

Page 9: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Sum-Max Monotonicity (1)Sum-Max Monotonicity (1)

9

A//C A//D A[//C]//D

A

E

C B

D

5

7 3

210

12

A

E

C B

D

5

7 3

210

12

A

E

C B

D

5

7 3

210

12

Cost = 12 Cost = 10 Cost = 17 < 22

PROPOSITION 1q = Tq (Vq, Eq)r = <µnode, uedge>SR = {sr1, sr2, …, srm}

Rsr

iiRsri

i

srRsr )(cost)(cost))(cost(max

(max(10, 12) = 12) < (cost = 17) < (12 + 10 = 22)

Page 10: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Sum-Max Monotonicity (2)Sum-Max Monotonicity (2)

10

PROPERTY 1answer r1, r2

sets of sub result R1, R2

)(cost)(cost))(cost(max)(cost 211

2

rrsrsrRsr

jRsrii

j

Use for pruning

Page 11: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Progressive Result EnumerationProgressive Result Enumeration

11

cost

Join for A[//C]//BA//B A//C

v3(4) v2(2)

v1(7)v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v8(9)

Horizon = ∞

Horizon = 14Horizon = 11

14 ( <=5+9)

v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v4(10) v8(9)v1(10) v5(10)

14

11 ( <=10+7)

1411

v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v4(10) v8(9)v1(10) v5(10)

v11(13) v15(10)v16(12)

14

11

Horizon = 14Horizon Tightening

Horizon Relaxation

The best can be returned as the top-1 result

submatchesallsubmatchesnecessarysubmatchesalladdataoverhe

_#_#_#

Page 12: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

HR-Join : Horizon based Ranked JoinHR-Join : Horizon based Ranked Join

12

Result SieveCreates cost ranked output streamUse heap sort Alg.Control the horizon valves

Horizon ValveControl the data availability on a given stream of cost-ranked dataUse horizon variable that externally controlled

Page 13: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Query Plan using HR-Join OperatorsQuery Plan using HR-Join Operators

13

A query twig and sub-queries

Page 14: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

M-way HR-Joins (HRM-Joins)M-way HR-Joins (HRM-Joins)

14

A query twig and sub-queries

Page 15: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

Sub JobsSub Jobs Sub-result enumeration

K-shortest simple paths problem O(k|V| (|E| + |V|log|V|))

Dealing with “*” wildcards in twigs Can be expensive Query rewriting

15

Page 16: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

ExperimentsExperiments Data

FICSR weighted graph data Query plans

HR-Join, M-way HR-Join 2 significantly different join selectivity distributions

– ~10% and 1 to 1

16

Page 17: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

ResultsResults

17

Page 18: Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi, Arizona State University K. Selcuk Candan, Arizona State.

Copyright 2008 by CEBT

ConclusionConclusion Twig query processing over weighted data graphs. Optimization using HR-Join based on sum-max

monotonicity HR-Join, MHR-Join

18