Containment of Partially Specified Tree-Pattern Queries
Dimitri Theodoratos (NJIT, USA)
Theodore Dalamagas (NTUA, GREECE)
Pawel Placek (NJIT, USA)
Stefanos Souldatos (NTUA, GREECE)
Timos Sellis (NTUA, GREECE)
Introduction
Data Model
Additional Concepts
Query Containment
Experiments
Conclusion
Stefanos Souldatos - HDMS 2006 3
Motivating Example: a tree structure (e.g. XML) stores motorbike spare parts, and we search it for spare parts. BUT…
[Figure: example tree rooted at r, with countries (GREECE, USA), locations (ATHENS, NJ), brands (HONDA, YAMAHA, BMW), types (TRAVEL, ON-OFF), models (VARADERO, SERROW, F650, F650GS) and engines (125cc, 200cc, 650cc, 1000cc)]
Motivating Example: Dimitri Theodoratos lives in NJ. He has a Yamaha Serrow motorbike in Greece. He searches for spare parts in Greece or the USA.
structural difference
[Figure: the example tree again]
Motivating Example: Theodore Dalamagas has a BMW motorbike. He looks for spare parts worldwide.
structural inconsistency: ../F650GS/650cc in one part of the tree vs. ../650cc/F650GS in another
[Figure: the example tree again]
Motivating Example: Stefanos Souldatos has a Honda Varadero, but he is not fully aware of the tree structure.
unknown structure
[Figure: the example tree again]
Motivating Example: Pawel Placek wants to buy a motorbike for which he can easily find spare parts. He searches in many different tree structures.
source integration
[Figure: several source trees to be searched together]
Motivation
Querying tree-structured data
BUT
the structure is not always strictly defined
the user does not always want to deal with the structure, e.g. "Find Honda spare parts in Greece."
Dimension Graph
[Figure: the example tree and its dimension graph over the dimensions R, C, B, T, M, L, E]
DIMENSIONS: R (Root), C (Country), B (Brand), T (Type), L (Location), M (Model), E (Engine)
dimension graph = summary of the tree structure
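The slide describes the dimension graph only as a summary of the tree structure. A minimal sketch of that idea, assuming each tree node is already tagged with its dimension (the `Node` class and the tagged fragment below are hypothetical): the graph gets an edge X → Y whenever some node of dimension X has a child of dimension Y.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    value: str       # data value, e.g. "GREECE" or "650cc"
    dimension: str   # dimension label, e.g. "C" or "E" (assigned by hand here)
    children: list[Node] = field(default_factory=list)


def dimension_graph(root: Node) -> set[tuple[str, str]]:
    """Return every (parent dimension, child dimension) pair occurring in the tree."""
    edges: set[tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            edges.add((node.dimension, child.dimension))
            stack.append(child)
    return edges


# Hypothetical fragment of the example tree: model and engine appear
# in both orders under two TRAVEL nodes.
example = Node("r", "R", [
    Node("GREECE", "C", [
        Node("BMW", "B", [
            Node("TRAVEL", "T", [Node("F650GS", "M", [Node("650cc", "E")])]),
            Node("TRAVEL", "T", [Node("650cc", "E", [Node("F650GS", "M")])]),
        ]),
    ]),
])

print(sorted(dimension_graph(example)))
```

Because the model/engine order differs between the two TRAVEL subtrees, the summary contains both M → E and E → M: a dimension graph can have cycles even though the data itself is a tree.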
Partially Specified Tree-Pattern Query
Query: Find shops with spare parts for all models and all engines of BMW motorbikes in Greece. (+ structural info)
DIMENSIONS: R (Root), C (Country), B (Brand), T (Type), L (Location), M (Model), E (Engine)
[Figure: the dimension graph next to the query]
partially specified paths (PSPs): PSP *p2 and PSP p1, with nodes annotated C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ? (? = any value)
output path (*)
parent-child and ancestor-descendant edges
node sharing expression (NSE)
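To make the parts of a PSTPQ concrete, here is a hypothetical Python encoding (the class names `QNode`, `PSP` and `PSTPQ` are ours, not from the talk): each PSP is a sequence of nodes joined by parent-child (`->`) or ancestor-descendant (`=>`) edges, a `?` annotation becomes `values=None`, the output path carries a flag, and a node sharing expression is a pair of (path, dimension) references. Which PSP holds which nodes, and the edge kinds in the example, are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QNode:
    dimension: str                              # e.g. "C" for Country
    values: Optional[frozenset[str]] = None     # None plays the role of "?"

    def __str__(self) -> str:
        if self.values is None:
            return f"{self.dimension} = ?"
        return f"{self.dimension} = {{{', '.join(sorted(self.values))}}}"


@dataclass
class PSP:
    name: str
    nodes: list[QNode]
    edges: list[str]        # edges[i] links nodes[i] to nodes[i + 1]
    output: bool = False    # True for the output path (*)

    def __post_init__(self) -> None:
        if len(self.edges) != len(self.nodes) - 1:
            raise ValueError("need exactly one edge between consecutive nodes")
        if any(e not in ("->", "=>") for e in self.edges):
            raise ValueError("edges must be '->' (parent-child) or '=>' (ancestor-descendant)")

    def __str__(self) -> str:
        parts = [str(self.nodes[0])]
        for edge, node in zip(self.edges, self.nodes[1:]):
            parts += [edge, str(node)]
        return ("*" if self.output else "") + self.name + ": " + " ".join(parts)


@dataclass
class PSTPQ:
    paths: list[PSP]
    # node sharing expressions, e.g. B[p1] shared with B[p2]
    shared: list[tuple[tuple[str, str], tuple[str, str]]] = field(default_factory=list)


# Hypothetical encoding of the example query (assignment of nodes to PSPs assumed).
p2 = PSP("p2", [QNode("C", frozenset({"Greece"})), QNode("B", frozenset({"BMW"})), QNode("M")],
         ["=>", "=>"], output=True)
p1 = PSP("p1", [QNode("B", frozenset({"BMW"})), QNode("E")], ["=>"])
query = PSTPQ([p2, p1], shared=[(("p1", "B"), ("p2", "B"))])

for p in query.paths:
    print(p)
```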
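Additional Concepts
Full Form Query
[Figure: the example query (nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?; PSP *p2, PSP p1) in full form, where the node C = {Greece} also appears in the PSP with B = {BMW}, E = ?]
Dimension Trees
[Figure: a dimension tree of the query: R, C = {Greece}, B = {BMW}, T, with M and E below]
DIMENSION TREES = QUERY + GRAPH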
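"Dimension trees = query + graph" suggests that a dimension tree is obtained by embedding the query into the dimension graph. As a rough illustration, assuming the graph is given as an adjacency list and considering one PSP at a time (combining several PSPs through node sharing is not shown), the sketch below lists every fully specified dimension path that realizes a PSP: a `->` edge must be a graph edge, while a `=>` edge may expand into any simple graph path. The function names and the graph are hypothetical.

```python
from __future__ import annotations


def simple_paths(graph: dict[str, list[str]], src: str, dst: str, used: set[str]) -> list[list[str]]:
    """Graph paths from src to dst (src excluded) that avoid the dimensions in `used`."""
    found: list[list[str]] = []
    for nxt in graph.get(src, []):
        if nxt in used:
            continue
        if nxt == dst:
            found.append([dst])
        else:
            for rest in simple_paths(graph, nxt, dst, used | {nxt}):
                found.append([nxt] + rest)
    return found


def expansions(graph: dict[str, list[str]], psp: list[str], edges: list[str]) -> list[list[str]]:
    """All fully specified dimension paths realizing one PSP.

    psp lists the PSP's dimensions from the root; edges[i] ("->" or "=>") links
    psp[i] to psp[i + 1]. No dimension may repeat on a path (simplification).
    """
    results: list[list[str]] = []

    def extend(i: int, path: list[str]) -> None:
        if i == len(psp) - 1:
            results.append(path)
            return
        target = psp[i + 1]
        if edges[i] == "->":
            if target in graph.get(path[-1], []) and target not in path:
                extend(i + 1, path + [target])
        else:
            for segment in simple_paths(graph, path[-1], target, set(path)):
                extend(i + 1, path + segment)

    extend(0, [psp[0]])
    return results


# Hypothetical dimension graph (adjacency list).
G = {"R": ["C"], "C": ["B", "L"], "L": ["B"], "B": ["T"], "T": ["M", "E"], "M": ["E"], "E": ["M"]}
for path in expansions(G, ["R", "B", "M"], ["=>", "=>"]):
    print("/".join(path))
```

The M/E cycle is why a PSP such as B=>M yields both .../T/M and .../T/E/M, in the spirit of the r/Greece/BMW/*T/*E/*M tree listed in the appendix.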
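Absolute Containment
Q1 ⊆ Q2: each result of Q1 is a result of Q2.
Check: homomorphism from Q2 to Q1
[Figure: example queries Q1 (PSP *p1, PSP p2; nodes C, B, M, B, E, C) and Q2 (PSP *p3, PSP p4; nodes C, M, E, C), with a homomorphism from Q2 to Q1]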
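For intuition, a homomorphism test between tree patterns can be sketched as follows. This is the classical construction for patterns with parent-child and ancestor-descendant edges, simplified to a single tree per query (PSPs joined by node sharing expressions are not modeled). How the talk treats value sets and the output path is not shown, so the rules in `label_ok` and the output check are assumptions. Such a homomorphism is a sufficient condition for containment; we do not rely on it being necessary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from functools import lru_cache
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be cache keys
class TP:
    dim: str
    values: Optional[frozenset[str]] = None   # None stands for "?" (any value)
    output: bool = False
    kids: list[tuple[str, TP]] = field(default_factory=list)  # ("->" | "=>", child)


def descendants(n: TP) -> list[TP]:
    """All proper descendants of n, whatever the edge kinds."""
    out: list[TP] = []
    for _, c in n.kids:
        out.append(c)
        out.extend(descendants(c))
    return out


def label_ok(n2: TP, n1: TP) -> bool:
    """Same dimension; a constrained Q2 node needs an equally or more constrained Q1 node."""
    if n2.dim != n1.dim:
        return False
    if n2.values is None:
        return True
    return n1.values is not None and n1.values <= n2.values


def homomorphism_exists(q2: TP, q1: TP) -> bool:
    """True if Q2's root maps onto Q1's root and all edges are preserved."""
    @lru_cache(maxsize=None)
    def maps(n2: TP, n1: TP) -> bool:
        if not label_ok(n2, n1) or (n2.output and not n1.output):
            return False
        for edge, c2 in n2.kids:
            # "->" must land on a Q1 child edge; "=>" on any proper descendant
            targets = [c for e, c in n1.kids if e == "->"] if edge == "->" else descendants(n1)
            if not any(maps(c2, t) for t in targets):
                return False
        return True

    return maps(q2, q1)


# Hypothetical queries: Q1 = R -> C{Greece} -> B{BMW} => *M, Q2 = R => B => *M
q1 = TP("R", kids=[("->", TP("C", frozenset({"Greece"}), kids=[
    ("->", TP("B", frozenset({"BMW"}), kids=[("=>", TP("M", output=True))]))]))])
q2 = TP("R", kids=[("=>", TP("B", kids=[("=>", TP("M", output=True))]))])
print(homomorphism_exists(q2, q1))
```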
Relative Containment (w.r.t. G)
Q1 ⊆_G Q2: each result of Q1 in G is a result of Q2 in G.
Check: homomorphism from the dimension trees of Q2 to the dimension trees of Q1
[Figure: a dimension tree of Q1 (R, C, B, T, with E and M) and a dimension tree of Q2 (R, C, B, T, E)]
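A minimal sketch of the relative check, assuming each query's dimension trees have already been computed and contain only parent-child edges (as the dimension trees listed in the appendix suggest): Q1 ⊆_G Q2 is taken to hold when every dimension tree of Q1 receives a homomorphism from at least one dimension tree of Q2. This reading of the slide, and the names `DT`, `maps_into` and `relatively_contained`, are ours.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DT:
    """Dimension-tree node; only parent-child edges (simplifying assumption)."""
    dim: str
    values: Optional[frozenset[str]] = None   # None stands for "?" (any value)
    output: bool = False
    kids: list[DT] = field(default_factory=list)


def maps_into(t2: DT, t1: DT) -> bool:
    """Homomorphism from t2 into t1: root to root, child edges to child edges."""
    if t2.dim != t1.dim or (t2.output and not t1.output):
        return False
    if t2.values is not None and (t1.values is None or not t1.values <= t2.values):
        return False
    return all(any(maps_into(c2, c1) for c1 in t1.kids) for c2 in t2.kids)


def relatively_contained(trees1: list[DT], trees2: list[DT]) -> bool:
    """Sketch of Q1 ⊆_G Q2: every dimension tree of Q1 is hit by some tree of Q2."""
    return all(any(maps_into(t2, t1) for t2 in trees2) for t1 in trees1)


# Hypothetical dimension trees: Q1 asks for BMW models in Greece (M is output),
# Q2 asks for models of any brand in any country.
q1_tree = DT("R", kids=[DT("C", frozenset({"Greece"}), kids=[
    DT("B", frozenset({"BMW"}), kids=[DT("T", kids=[DT("M", output=True), DT("E")])])])])
q2_tree = DT("R", kids=[DT("C", kids=[DT("B", kids=[DT("T", kids=[DT("M", output=True)])])])])

print(relatively_contained([q1_tree], [q2_tree]))
```

If Q1 has no dimension tree at all (it cannot be embedded in G), the check succeeds vacuously, which matches the idea that Q1 then has no results in G.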
Relative Containment Heuristic
Absolute Containment (AC): 1 msec
Relative Containment (RC): 100 msec
Relative Containment Heuristic (RCH): sound but not complete (when it reports Q1 ⊆_G Q2, the containment holds, but some containments may go undetected)
extract structural information from the dimension graph, insert it into the query Q1, and check Q1 ⊆ Q2 instead of Q1 ⊆_G Q2
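The slide leaves open exactly which structural information RCH extracts. One sound possibility, shown here as an assumption and not as the authors' procedure: if every root-to-X path in the dimension graph passes through dimension D, every X node has a D ancestor (D=>X); if X has a single parent P in the graph, every X node's parent is a P node (P->X). On a hypothetical graph this yields, among others, the facts R->C and C=>B used in the example. The derived facts are valid only if the dimension graph summarizes the data exactly.

```python
from __future__ import annotations


def dominators(edges: set[tuple[str, str]], root: str) -> dict[str, set[str]]:
    """dom[x] = dimensions lying on every root-to-x path of the dimension graph."""
    nodes = {root} | {a for a, _ in edges} | {b for _, b in edges}
    preds = {n: {a for a, b in edges if b == n} for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:                      # iterate to a fixpoint (also works with cycles)
        changed = False
        for n in nodes - {root}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def implied_constraints(edges: set[tuple[str, str]], root: str, query_dims: set[str]) -> set[str]:
    """Structural facts the graph forces on the given query dimensions (hypothetical rules)."""
    dom = dominators(edges, root)
    facts: set[str] = set()
    for x in query_dims:
        for d in dom.get(x, {x}):
            if d != x:
                facts.add(f"{d}=>{x}")          # every x node has a d ancestor
            parents = {a for a, b in edges if b == d}
            if len(parents) == 1:               # every d node's parent is a p node
                (p,) = parents
                facts.add(f"{p}->{d}")
    return facts


# Hypothetical dimension graph: B can be reached from C directly or via L.
G = {("R", "C"), ("C", "B"), ("C", "L"), ("L", "B"), ("B", "T"),
     ("T", "M"), ("T", "E"), ("M", "E"), ("E", "M")}
print(sorted(implied_constraints(G, "R", {"B", "T"})))
```

The facts are inserted into Q1, and the cheap absolute test Q1 ⊆ Q2 is run on the extended query; a "no" answer says nothing, which is why the heuristic is sound but not complete.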
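Relative Containment Heuristic: Example
Q1: PSP *p1 with nodes B = ?, T = ?
Q2: PSP *p2 with nodes B = ?, C = ?
[Figure: Q1 and Q2 next to the dimension graph over R, C, B, T, M, L, E]
B=>T : R->C, C=>B (structural information extracted from the dimension graph)
Q1 extended with the nodes R = ? and C = ?
Q1 ⊆_G Q2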
Experiments
We measured…
execution time for Absolute Containment (AC), Relative Containment (RC) and the Relative Containment Heuristic (RCH)
accuracy of RCH
…for various graph sizes and for various query sizes
[Figure: execution time (msec) of AC, RC and RCH. Varying graph size: 20, 30 and 40 graph dimensions (10-80, 15-120 and 20-160 graph paths). Varying query size: 1 and 2 query PSPs, 3-6 nodes per PSP.]
Accuracy of RCH
80% for graphs of common sizes (based on XML benchmarks: XMach, XMark, etc.)
50% for graphs of higher density
Conclusion
Query containment for Partially Specified Tree-Pattern Queries (PSTPQs)
Sound technique for checking relative query containment
Time: one order of magnitude faster than exact relative containment checking
Accuracy: over 80%
Future Work
Heuristics for checking relative containment: precomputed and on-the-fly; trade-off between time and accuracy
Special forms of queries, e.g. swings:
[Figure: a swing query with PSPs p1, p2 and *p3 over nodes A, B, C]
Questions?
Appendix
Who defines the dimensions?
Automatic: XML tags (dimension graph = “path summary”, “path index”, “structural summary”)
Semi-automatic: graph administrator + XML tags (dimension = group of XML tags); graph administrator + ontology
Manual: graph administrator
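The automatic and semi-automatic options can be illustrated together. In this sketch (the function name, document and grouping are hypothetical), each XML tag is mapped to a dimension through an administrator-supplied dictionary; tags missing from it fall back to being their own dimension, which gives the purely tag-based path summary. It uses only `xml.etree.ElementTree` from the Python standard library.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Optional


def dimension_graph_from_xml(xml_text: str, tag_to_dim: Optional[dict[str, str]] = None) -> set[tuple[str, str]]:
    """Summarize an XML document as edges between dimensions.

    tag_to_dim groups XML tags into dimensions (semi-automatic case); a tag not in
    the mapping is its own dimension (automatic, purely tag-based case).
    """
    mapping = tag_to_dim or {}

    def dim(element: ET.Element) -> str:
        return mapping.get(element.tag, element.tag)

    edges: set[tuple[str, str]] = set()
    for parent in ET.fromstring(xml_text).iter():   # root and all descendants
        for child in parent:                        # direct children only
            edges.add((dim(parent), dim(child)))
    return edges


# Hypothetical document and tag grouping chosen by a graph administrator.
sample = "<shops><greece><yamaha/><bmw/></greece><usa><nj><bmw/></nj></usa></shops>"
grouping = {"shops": "R", "greece": "C", "usa": "C", "nj": "L", "yamaha": "B", "bmw": "B"}

print(sorted(dimension_graph_from_xml(sample, grouping)))
```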
Inference Rules
[Figure: the dimension graph and the example query (nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?; PSP *p2, PSP p1), with C = {Greece} added in full form]
1. Full Form Query
INFERENCE RULES
Notation: A[p] -> B[p] parent-child, A[p] => B[p] ancestor-descendant, A[p1] ≡ A[p2] node sharing
(IR1) |- R[p1] ≡ R[p2]
(IR2) A[p1] ≡ A[p2], A[p2] ≡ A[p3] |- A[p1] ≡ A[p3]
(IR3) a structural expression that involves A[p] |- R[p] => A[p]
(IR4) A[p] -> B[p] |- A[p] => B[p]
(IR5) A[p] => B[p], B[p] => C[p] |- A[p] => C[p]
(IR6) A[p] -> B[p], A[p] => C[p] |- B[p] => C[p]
(IR7) A[p] -> B[p], C[p] => B[p] |- C[p] => A[p]
(IR8) A[p1] -> B[p1], B[p1] ≡ B[p2] |- A[p2] -> B[p2]
(IR9) A[p1] => B[p1], B[p1] ≡ B[p2] |- A[p2] => B[p2]
(IR10) A[p1] => B[p1], A[p1] ≡ A[p2], R[p2] => B[p2] |- A[p2] => B[p2]
(IR11) A[p1] => B[p1], B[p1] ≡ B[p2] |- A[p1] ≡ A[p2]
(IR12) A[p1] -> B[p1], C[p2] -> B[p2], D[p1] ≡ D[p2] |- D[p1] => A[p1]
(IR13) A[p1] -> B[p1], A[p2] -> C[p2], D[p1] ≡ D[p2] |- D[p1] => A[p1]
(IR14) A[p1] => B[p1], B[p2] => A[p2], C[p1] ≡ C[p2] |- C[p1] => A[p1]
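The relationship symbols in these rules were lost in extraction; reading them as -> (parent-child), => (ancestor-descendant) and ≡ (node sharing) is our reconstruction. Under that assumption, the sketch below forward-chains a subset of the rules (IR2, IR4, IR5, IR8, IR9, plus symmetry of ≡, written `==` in code) to a fixpoint. On a hypothetical encoding of the example query it derives C[p1] => B[p1], i.e. the C = {Greece} node that the full form query adds to the PSP with B = {BMW}, E = ?.

```python
from __future__ import annotations

from itertools import product
from typing import Set, Tuple

Node = Tuple[str, str]          # (dimension, path): ("B", "p1") stands for B[p1]
Fact = Tuple[str, Node, Node]   # ("->" | "=>" | "==", x, y); "==" is node sharing


def close(facts: Set[Fact]) -> Set[Fact]:
    """Apply a subset of the inference rules until no new fact appears."""
    facts = set(facts)
    while True:
        new: Set[Fact] = set()
        for (o1, a, b), (o2, c, d) in product(facts, repeat=2):
            if o1 == "->":
                new.add(("=>", a, b))                 # IR4: parent-child implies ancestor-descendant
            if o1 == "=>" and o2 == "=>" and b == c:
                new.add(("=>", a, d))                 # IR5: ancestor-descendant is transitive
            if o1 == "==":
                new.add(("==", b, a))                 # node sharing is symmetric
                if o2 == "==" and b == c and a != d:
                    new.add(("==", a, d))             # IR2: node sharing is transitive
            if o1 in ("->", "=>") and o2 == "==" and b == c:
                new.add((o1, (a[0], d[1]), d))        # IR8/IR9: copy the edge into the other path
        if new <= facts:
            return facts
        facts |= new


# Hypothetical encoding: p2 = C => B => M, p1 = B => E, with B[p1] ≡ B[p2].
query_facts = {
    ("=>", ("C", "p2"), ("B", "p2")),
    ("=>", ("B", "p2"), ("M", "p2")),
    ("=>", ("B", "p1"), ("E", "p1")),
    ("==", ("B", "p1"), ("B", "p2")),
}
for fact in sorted(close(query_facts)):
    print(fact)
```

Every derived fact only relates dimensions and paths already in the input, so the loop terminates.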
Dimension Trees
[Figure: the dimension graph (R, C, B, T, M, L, E), the example query (nodes C = {Greece}, B = {BMW}, M = ? and B = {BMW}, E = ?; PSP *p2, PSP p1, with C = {Greece} added in full form) and its dimension trees, each rooted at R with C = {Greece} and B = {BMW} above T, M and E]
r/Greece/BMW/*T[*E]/*M
r/Greece/BMW/*T/*M[*E]
r/Greece/BMW/*T[*M/*E]/*E*M
r/Greece/BMW/*T/*E/*M
Previous Approaches
Keyword-based search approach: absence of structure
Naive approach: all possible query patterns are generated (Honda=>Greece, Greece=>Honda)
Approximation techniques: relax the query to get more answers
Traditional integration approach: global structure and mapping rules
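The cost of the naive approach is easy to see for linear patterns: every ordering of the keywords becomes one ancestor-descendant chain. The helper name is hypothetical, and branching patterns are ignored, which only understates the blow-up: n keywords already give n! chains.

```python
from __future__ import annotations

from itertools import permutations


def naive_patterns(keywords: list[str]) -> list[str]:
    """One ancestor-descendant chain (joined by '=>') per ordering of the keywords."""
    return ["=>".join(order) for order in permutations(keywords)]


print(naive_patterns(["Honda", "Greece"]))
```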