Yinghui Wu, LFCS DB talk
Database Group Meeting Talk
Yinghui Wu
10/11/2010
Simulation Revised for Graph Pattern Matching
Outline
Graph Simulation
• label equality, edge-to-edge matching relation
Bounded Simulation
• node predicates, edge bound, edge-to-path matching relation
Reachability Queries and Graph Pattern Queries
• query containment and minimization – cubic time
• query evaluation – cubic time
Conclusion
A first step towards revising simulation for graph pattern matching
Graph Pattern Matching: the problem
Given a pattern graph P and a data graph G, decide whether G matches P and, if so, find all the matches of P in G.
Applications
• social queries, social matching
• biology and chemistry network querying
• keyword search, proximity search, …
Widely employed in a variety of emerging real-life applications
How to define?
Graph Simulation
Node label equivalence
Edge-to-edge relation
Identical label matching, edge-to-edge relations
Capable enough?
[Figure: an example pattern graph P and data graph G over labels A, B, D and E; data nodes v1 and v2 both carry label B]
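A minimal Python sketch of graph simulation as characterized above (label equality plus edge-to-edge matching). It assumes the pattern and data graph are given as label dictionaries and edge lists; the function and parameter names are hypothetical, not from the talk. It starts from label equality and repeatedly discards a candidate v for pattern node u whenever some pattern edge (u, u') has no data edge (v, v') with v' still a candidate for u', stopping at a fixpoint, which is the maximum simulation.

```python
def graph_simulation(p_labels, p_edges, g_labels, g_edges):
    """Maximum graph simulation of pattern P in data graph G (hypothetical interface).

    p_labels / g_labels: dict node -> label; p_edges / g_edges: iterable of (src, dst).
    Returns dict pattern_node -> set of matching data nodes, or None if G does not match P.
    """
    p_succ = {u: set() for u in p_labels}
    for u, w in p_edges:
        p_succ[u].add(w)
    g_succ = {v: set() for v in g_labels}
    for v, x in g_edges:
        g_succ[v].add(x)

    # Node condition: identical labels.
    sim = {u: {v for v in g_labels if g_labels[v] == p_labels[u]} for u in p_labels}

    # Edge condition: each pattern edge (u, w) needs a data edge (v, x) with x in sim[w].
    changed = True
    while changed:
        changed = False
        for u in p_labels:
            for w in p_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[w]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True

    # G matches P only if every pattern node keeps at least one match.
    return sim if all(sim.values()) else None
```

Because each pattern edge must be matched by a single data edge, a node that is reachable only through an intermediate node can never match, which is why graph simulation turns out to be too restrictive for the social-matching example.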
An example from real-life social matching
[Figure: pattern P (Alice, biologist, doctors; edge labels 3, 1, 1, 3) and data graph G, where edges of P are matched by paths in G (edge-to-path mappings)]
Graph simulation is too restrictive!
Bounded Simulation
data graph G = (V, E, fA)
pattern graph P = (Vp, Ep, fv, fe)
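G matches P via bounded simulation if there is a binary relation S ⊆ Vp × V such that every pattern node has at least one match in S, every (u, v) ∈ S has v satisfying the predicate fv(u), and, for every edge (u, u') of P and every (u, v) ∈ S, G contains a nonempty path from v to some v' with (u', v') ∈ S whose length is within the bound fe(u, u').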
Bounded simulation vs. graph simulation
• node predicates vs. label equality
• edge-to-path matching vs. edge-to-edge matching
Enriched model for capturing meaningful matches
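Graph simulation is a special case of bounded simulation.
[Figure: pattern P with nodes Id = ‘Alice’, Job = ‘biologist’ and Job = ‘doctors’ and edge bounds 3, 1, 1, 3; data graph G with three Job = ‘biologist’ nodes, two Job = ‘doctors’ nodes, one Job = ‘CTO’ node and one Id = ‘Alice’ node]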
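The same fixpoint idea extends to bounded simulation. The Python sketch below illustrates the definition under stated assumptions; it is not the talk's cubic-time algorithm. Node predicates are assumed to be Python callables over an attribute dict, an edge bound is an integer k (or None for an unbounded edge), and shortest nonempty-path lengths are precomputed by BFS from every data node. All names are hypothetical.

```python
from collections import deque


def shortest_nonempty_paths(succ, src):
    """BFS from src: maps each node to the length of its shortest nonempty path from src."""
    dist, queue = {}, deque()
    for x in succ[src]:
        if x not in dist:
            dist[x] = 1
            queue.append(x)
    while queue:
        y = queue.popleft()
        for z in succ[y]:
            if z not in dist:
                dist[z] = dist[y] + 1
                queue.append(z)
    return dist


def bounded_simulation(p_pred, p_bounds, g_attrs, g_edges):
    """Maximum bounded simulation (hypothetical interface).

    p_pred: dict pattern_node -> callable(attrs) -> bool, playing the role of fv.
    p_bounds: dict (u, u2) -> int k, or None for an unbounded edge, playing the role of fe.
    g_attrs: dict data_node -> attribute dict; g_edges: iterable of (v, v2).
    Returns dict pattern_node -> set of data nodes, or None if G does not match P.
    """
    succ = {v: [] for v in g_attrs}
    for v, x in g_edges:
        succ[v].append(x)
    dist = {v: shortest_nonempty_paths(succ, v) for v in g_attrs}

    # Node condition: the data node satisfies the pattern node's predicate.
    sim = {u: {v for v in g_attrs if pred(g_attrs[v])} for u, pred in p_pred.items()}

    # Edge-to-path condition: some candidate for u2 is reachable from v within the bound.
    changed = True
    while changed:
        changed = False
        for (u, u2), k in p_bounds.items():
            keep = {v for v in sim[u]
                    if any(x in sim[u2] and (k is None or d <= k)
                           for x, d in dist[v].items())}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim if all(sim.values()) else None
```

With every bound set to 1 and label-equality predicates, the path condition collapses to a single edge and the sketch behaves like graph simulation. Precomputing one BFS row per data node costs O(|V|(|V| + |E|)) time and is only one of several possible strategies.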
Basic results for bounded simulation
For any graph G and pattern P, if G matches P, then there is a
unique maximum match in G for P.
The graph pattern matching problem via bounded simulation
can be solved in cubic time.
The incremental bounded simulation problem
Efficient approaches for graph pattern matching
Extension to multiple edge colors?
Considering edge types…
Real-life graphs have multiple edge types
Essembly Network
• friends-allies (fa)
• friends-nemeses (fn)
• strangers-nemeses (sn)
• strangers-allies (sa)
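Querying Essembly network: an example
[Figure: the Essembly network (edge colors fa, fn, sn, sa) and a pattern P with nodes Alice, "Biologists supporting cloning" and "Doctors against cloning"; pattern edges are labeled fa<=2, sa<=2, fn, fn, fa<=2 sn and fa+]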
Pattern queries with multiple edge types
Graph reachability and pattern queries
Real-life graphs usually carry different edge types…
data graph G = (V, E, fA, , fC)
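• Reachability query (RQ): Qr = (u1, u2, fu1, fu2, fe), where fu1 and fu2 are search conditions on the source and target nodes, and fe is drawn from the following subclass of regular expressions:
F ::= c | c≤k | c+ | FF
Here c is an edge color, c≤k denotes at most k edges of color c, c+ denotes one or more edges of color c, and FF denotes concatenation.
Qr(G): the set of node pairs (v1, v2) such that v1 satisfies fu1, v2 satisfies fu2, and there is a nonempty path from v1 to v2 whose sequence of edge colors matches fe.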
Example RQ: source Job = ‘biologist’, sp = ‘cloning’; target Job = ‘doctors’; edge constraint fa<=2 fn
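One way to evaluate an RQ is a BFS over triples (data node, position in fe, edges used in that position), which amounts to running the graph in lockstep with an automaton for fe. The Python sketch below assumes fe is encoded as a list of atoms (color, bound), with bound 1 for c, k for c≤k and None for c+, and that every atom consumes at least one edge; these encodings and the function names are hypothetical, not the talk's.

```python
from collections import deque


def _advance(fe, i, n, color):
    """Automaton moves from state (atom i, n edges used in it) on an edge of the given color."""
    moves = []
    atom_color, bound = fe[i]
    # Stay in atom i; c+ needs no counter, so n is kept at 1 for unbounded atoms.
    if color == atom_color and (bound is None or n < bound):
        moves.append((i, 1 if bound is None else n + 1))
    # Atom i already holds at least one edge, so atom i + 1 may start here.
    if i + 1 < len(fe) and fe[i + 1][0] == color:
        moves.append((i + 1, 1))
    return moves


def evaluate_rq(g_attrs, g_edges, f_src, f_dst, fe):
    """Qr(G) under a hypothetical encoding.

    g_edges: iterable of (v, v2, color); f_src / f_dst: node predicates on attribute dicts;
    fe: list of (color, bound) atoms.
    """
    succ = {v: [] for v in g_attrs}
    for v, x, color in g_edges:
        succ[v].append((x, color))

    answer = set()
    for v1 in g_attrs:
        if not f_src(g_attrs[v1]):
            continue
        # BFS over (data node, atom index, edges used in atom); the first edge opens atom 0.
        seen, queue = set(), deque()
        for x, color in succ[v1]:
            if color == fe[0][0] and (x, 0, 1) not in seen:
                seen.add((x, 0, 1))
                queue.append((x, 0, 1))
        while queue:
            y, i, n = queue.popleft()
            if i == len(fe) - 1 and f_dst(g_attrs[y]):
                answer.add((v1, y))
            for z, color in succ[y]:
                for j, m in _advance(fe, i, n, color):
                    if (z, j, m) not in seen:
                        seen.add((z, j, m))
                        queue.append((z, j, m))
    return answer
```

For the example above, fe = fa<=2 fn would be encoded as [('fa', 2), ('fn', 1)]. The number of states per source is bounded by |V| times the sum of the atom bounds, so the search always terminates.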
Graph pattern queries
Graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u'), Qe = (u, u', fv(u), fv(u'), fe(e)) is an RQ.
Qp(G) is the maximum set of pairs (e, Se), one per edge e, with Se ⊆ Qe(G), such that:
• for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;
• for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.
PQ vs. simulation and bounded simulation
• search condition on query nodes
• mapping edges to paths
• constrain the edges on the path with a regular expression
RQ and bounded simulation are special cases of PQ
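Given the per-edge answers Qe(G), for example from an RQ evaluator, the two conditions on Qp(G) can be enforced by deleting violating pairs until nothing changes, which leaves the maximum sets Se. A minimal Python sketch, assuming pattern edges are (u, u') tuples and each Se is a set of (v, v') pairs; the function name is hypothetical and this is not presented as the talk's JoinMatch or SplitMatch.

```python
def refine_pattern_query(p_edges, answers):
    """Shrink per-edge answer sets to the maximum sets Se satisfying both PQ conditions.

    p_edges: list of pattern edges (u, u2); answers: dict edge -> set of (v, v2),
    initially Qe(G). Returns a new dict; an empty Se means G does not match the PQ.
    """
    S = {e: set(answers[e]) for e in p_edges}
    changed = True
    while changed:
        changed = False
        # Data nodes that currently start at least one pair in each Se.
        sources = {e: {v for v, _ in S[e]} for e in p_edges}
        for e1 in p_edges:
            u1, u2 = e1
            out_u1 = [e for e in p_edges if e[0] == u1]
            out_u2 = [e for e in p_edges if e[0] == u2]
            keep = {(v1, v2) for v1, v2 in S[e1]
                    if all(v2 in sources[e2] for e2 in out_u2)    # e2 = (u2, u3)
                    and all(v1 in sources[e2] for e2 in out_u1)}  # e2 = (u1, u3)
            if keep != S[e1]:
                S[e1] = keep
                changed = True
    return S
```

Pairs are only ever removed when they violate a condition with respect to a superset of the final answer, so the loop converges to the greatest sets satisfying both conditions.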
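Reachability and graph pattern queries: examples
[Figure: an RQ from Job = ‘biologist’, sp = ‘cloning’ to Job = ‘doctors’ with edge constraint fa<=2 fn; a PQ with nodes Id = ‘Alice’, (Job = ‘biologist’, sp = ‘cloning’) and (Job = ‘doctors’, dsp = ‘cloning’), whose edges are labeled fa<=2, sa<=2, fn, fn, fa<=2 sn and fa+; edge colors fa, fn, sn, sa]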
Fundamental problems: query containment
A PQ Q1 = (V1, E1, fv1, fe1) is contained in a PQ Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that, for any data graph G and any edge e in E1, Se ⊆ Sλ(e), where Se is taken from Q1(G) and Sλ(e) from Q2(G); i.e., λ is a renaming function under which Q1(G) is mapped into Q2(G).
The query containment and equivalence problems for PQs can both be decided in cubic time
• query similarity based on a revision of graph simulation
• determine the query similarity in cubic time
Query containment and equivalence for PQs can be solved efficiently
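Query containment: example
[Figure: three PQs over edge color h: Q1 with nodes B1, C1, C2, C3 and edges h<=1, h<=2, h<=3; Q2 with nodes B2, C4 and edge h<=1; Q3 with nodes B3, C5, C6 and edges h<=1, h<=3]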
Fundamental problems: query minimization
Query minimization problem
• input: a PQ Qp
• output: a minimized PQ Qm equivalent to Qp
The query minimization problem can be solved in cubic time:
• compute the maximum node equivalence classes based on a revision of graph simulation;
• determine the number of redundant nodes and edges based on the equivalence classes;
• remove redundant and isolated nodes and edges.
Query minimization for PQs can be solved efficiently
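The first minimization step can be illustrated with a small Python sketch. It assumes a simplified stand-in for the "revision of graph simulation": pattern node v simulates u when both carry the same predicate and every edge out of u is matched by an edge out of v with an identical constraint; nodes that simulate each other fall into one class. The talk's actual revision, and its later redundancy-removal steps, may differ, and all names here are hypothetical.

```python
def self_simulation(preds, edges):
    """Maximum relation R on pattern nodes where (u, v) in R means v simulates u.

    preds: dict node -> hashable predicate; edges: dict (u, w) -> hashable edge constraint.
    Assumption: predicates and constraints are compared by equality only.
    """
    out = {u: [] for u in preds}
    for (u, w), c in edges.items():
        out[u].append((w, c))
    R = {(u, v) for u in preds for v in preds if preds[u] == preds[v]}
    changed = True
    while changed:
        changed = False
        for u, v in list(R):
            # Each edge (u, w) with constraint c needs an edge (v, x) with the same c and (w, x) in R.
            if any(not any(c2 == c and (w, x) in R for x, c2 in out[v]) for w, c in out[u]):
                R.discard((u, v))
                changed = True
    return R


def equivalence_classes(preds, edges):
    """Group pattern nodes that simulate each other (a candidate first minimization step)."""
    R = self_simulation(preds, edges)
    classes, assigned = [], set()
    for u in preds:
        if u not in assigned:
            cls = {v for v in preds if (u, v) in R and (v, u) in R}
            assigned |= cls
            classes.append(cls)
    return classes
```

Mutual simulation is reflexive and transitive, so the classes partition the pattern nodes; two B-nodes with identical outgoing constraints land in the same class, while a B-node with an extra g<=3 edge does not.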
Query minimization: example
[Figure: a PQ Q1 with root R, B-labeled and C-labeled nodes and edges labeled f, g, h<=2 and g<=3, together with PQs Q2 and Q3 over the same labels]
Evaluating graph pattern queries
PQs can be answered in cubic time.
• Join-based algorithm JoinMatch
  matrix index vs. distance cache
  a join operation for each PQ edge, repeated until a fixpoint is reached (w.r.t. a reversed topological order)
• Split-based algorithm SplitMatch
  blocks: treating pattern nodes and data nodes uniformly
  partition-relation pairs
Graph pattern matching can be solved in polynomial time
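The slide contrasts a precomputed distance matrix with a distance cache. As a hedged illustration of the cache side only, the hypothetical Python class below computes a BFS row for a source node the first time a join asks for it and reuses that row afterwards. It ignores edge colors and is not the talk's implementation.

```python
from collections import deque


class DistanceCache:
    """Hypothetical on-demand distance cache over an uncolored adjacency list.

    Each source's BFS row (shortest nonempty-path lengths) is computed on first use,
    instead of materializing the full |V| x |V| distance matrix up front.
    """

    def __init__(self, succ):
        self.succ = succ   # dict node -> list of successor nodes
        self.rows = {}     # source node -> {target: distance}

    def dist(self, v, w):
        """Shortest nonempty-path length from v to w, or None if w is unreachable."""
        if v not in self.rows:
            self.rows[v] = self._bfs(v)
        return self.rows[v].get(w)

    def _bfs(self, src):
        row, queue = {}, deque()
        for x in self.succ.get(src, ()):
            if x not in row:
                row[x] = 1
                queue.append(x)
        while queue:
            y = queue.popleft()
            for z in self.succ.get(y, ()):
                if z not in row:
                    row[z] = row[y] + 1
                    queue.append(z)
        return row
```

A full matrix answers every lookup in constant time but needs |V|² space before any query runs; the cache pays only for source nodes that actually appear as candidates during the joins.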
Example of JoinMatch
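[Figure: JoinMatch evaluating the example PQ (nodes Id = ‘Alice’, (Job = ‘biologist’, sp = ‘cloning’) and (Job = ‘doctors’, dsp = ‘cloning’); edges fa<=2, sa<=2, fn, fn, fa<=2 sn, fa+) over the Essembly network (edge colors fa, fn, sn, sa)]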
Experimental results – effectiveness of PQs
Effectiveness of PQs: edge-to-path relations
Experimental results – querying real life graphs
The evaluation algorithms are sensitive to the number of pattern edges
[Figure panels: varying |Vp|; varying |Ep|]
Experimental results – querying real life graphs
The algorithms are sensitive to the number of predicates
[Figure panels: varying |pred|; varying b]
Experimental results – querying synthetic graphs
The algorithms scale well over large synthetic graphs
[Figure panels: varying |V| (×10^5); varying b]
Experimental results – querying synthetic graphs
The algorithms scale well over large synthetic graphs
[Figure panels: varying α; varying cr]
Conclusion
Simulation revised for graph pattern matching
• Bounded simulation: node predicates, edge bounds, edge-to-path matching relation
• Reachability queries and graph pattern queries
  query containment and minimization – cubic time
  query evaluation – cubic time
Future work
• extending RQs and PQs to support general regular expressions
• incremental evaluation of RQs and PQs
Simulation revised for graph pattern matching
“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)
Terrorist Collaboration Network (1970 - 2010)
Thank you!