Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi,...

Post on 18-Jan-2018

217 views 0 download

description

Copyright  2008 by CEBT Motivation  Query Processing on Metadata with Conflicts (FICSR, SIGMOD07) F eedback-based I n C on S istency R esolution and Query Processing on Misaligned Data Sources 3

Transcript of Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs Yan Qi,...

Sum-Max Monotonic Ranked Joins for Evaluating Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data GraphsTop-K Twig Queries on Weighted Data Graphs

Yan Qi, Arizona State UniversityK. Selcuk Candan, Arizona State University

Maria Luisa Sapino, University of Torino

VLDB ’07, September 23-28, 2007, Vienna, AustriaVLDB ’07, September 23-28, 2007, Vienna, Austria

2008. 01. 25Summarized by Dongjoo Lee, IDS Lab., Seoul National University

Presented by Dongjoo Lee, IDS Lab., Seoul National University

Copyright 2008 by CEBT

ContentsContents Motivation Top-k Twig Queries over Weighted Data Graphs Answering Twig Queries on Weighted Graphs

Sum-Max Monotonicity Progressive Result Enumeration

HR-Join, MHR-Join Experiments Conclusion

2

Copyright 2008 by CEBT

MotivationMotivation Query Processing on Metadata with Conflicts (FICSR,

SIGMOD07) Feedback-based InConSistency Resolution and Query

Processing on Misaligned Data Sources

3

Copyright 2008 by CEBT

FICSR Integrated RepresentationFICSR Integrated Representation

4

Internal FICSR Representation

Simplified Visualization for the User

Copyright 2008 by CEBT

FICSR AggrementsFICSR Aggrements Based on source analysis and user feedback

5

Copyright 2008 by CEBT

Top-k Twig Queries over Weighted Data Top-k Twig Queries over Weighted Data GraphsGraphs XPath, XQuery

6

More desirable!Can we find it before the other?

Copyright 2008 by CEBT

Answering Twig Queries on Weighted Answering Twig Queries on Weighted GraphsGraphs

NP-complete problem By reduction from the Group Steiner Tree Problem.

7

VLSI design

Copyright 2008 by CEBT

How to Solve the Problem?How to Solve the Problem? Use ranked-join algorithms for top-k queries

A query plan with better sub-plans is always more desirable Must be monotonic

Twig query? Sum-max monotonicity is held! We can enumerate the result incrementally!

8

Copyright 2008 by CEBT

Sum-Max Monotonicity (1)Sum-Max Monotonicity (1)

9

A//C A//D A[//C]//D

A

E

C B

D

5

7 3

210

12

A

E

C B

D

5

7 3

210

12

A

E

C B

D

5

7 3

210

12

Cost = 12 Cost = 10 Cost = 17 < 22

PROPOSITION 1q = Tq (Vq, Eq)r = <µnode, uedge>SR = {sr1, sr2, …, srm}

Rsr

iiRsri

i

srRsr )(cost)(cost))(cost(max

(max(10, 12) = 12) < (cost = 17) < (12 + 10 = 22)

Copyright 2008 by CEBT

Sum-Max Monotonicity (2)Sum-Max Monotonicity (2)

10

PROPERTY 1answer r1, r2

sets of sub result R1, R2

)(cost)(cost))(cost(max)(cost 211

2

rrsrsrRsr

jRsrii

j

Use for pruning

Copyright 2008 by CEBT

Progressive Result EnumerationProgressive Result Enumeration

11

cost

Join for A[//C]//BA//B A//C

v3(4) v2(2)

v1(7)v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v8(9)

Horizon = ∞

Horizon = 14Horizon = 11

14 ( <=5+9)

v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v4(10) v8(9)v1(10) v5(10)

14

11 ( <=10+7)

1411

v3(4) v2(2)

v8(5) v1(7)v9(7) v19(8)

v4(10) v8(9)v1(10) v5(10)

v11(13) v15(10)v16(12)

14

11

Horizon = 14Horizon Tightening

Horizon Relaxation

The best can be returned as the top-1 result

submatchesallsubmatchesnecessarysubmatchesalladdataoverhe

_#_#_#

Copyright 2008 by CEBT

HR-Join : Horizon based Ranked JoinHR-Join : Horizon based Ranked Join

12

Result SieveCreates cost ranked output streamUse heap sort Alg.Control the horizon valves

Horizon ValveControl the data availability on a given stream of cost-ranked dataUse horizon variable that externally controlled

Copyright 2008 by CEBT

Query Plan using HR-Join OperatorsQuery Plan using HR-Join Operators

13

A query twig and sub-queries

Copyright 2008 by CEBT

M-way HR-Joins (HRM-Joins)M-way HR-Joins (HRM-Joins)

14

A query twig and sub-queries

Copyright 2008 by CEBT

Sub JobsSub Jobs Sub-result enumeration

K-shortest simple paths problem O(k|V| (|E| + |V|log|V|))

Dealing with “*” wildcards in twigs Can be expensive Query rewriting

15

Copyright 2008 by CEBT

ExperimentsExperiments Data

FICSR weighted graph data Query plans

HR-Join, M-way HR-Join 2 significantly different join selectivity distributions

– ~10% and 1 to 1

16

Copyright 2008 by CEBT

ResultsResults

17

Copyright 2008 by CEBT

ConclusionConclusion Twig query processing over weighted data graphs. Optimization using HR-Join based on sum-max

monotonicity HR-Join, MHR-Join

18