STUN: SPATIO-TEMPORAL UNCERTAIN (SOCIAL) NETWORKS

Chanhyun Kang
Computer Science Dept.
University of Maryland, USA
chanhyun@cs.umd.edu

Andrea Pugliese
DEIS Dept.
University of Calabria, Italy

John Grant, V.S. Subrahmanian
Computer Science Dept.
University of Maryland, USA
{grant,vs}@cs.umd.edu

Motivation
Suppose there is a social network whose relationships carry spatio-temporal information and certainty values.

(Figure: an example network over the regions Maryland, Bethesda, and Potomac.)

Motivation
• Query example
  • Find all people who attended a party in Maryland at time point 5 with certainty at least 0.5
  • “in Maryland” is a spatial constraint, “at time point 5” is a temporal constraint, “with certainty at least 0.5” is a certainty constraint, and “people who attended a party” is an ordinary subgraph matching query
• The query contains not only an ordinary graph query but also constraints on spatio-temporal information and certainty values

Motivation
• In graph query research
  • Several subgraph matching algorithms and index structures have been proposed
  • These indexes and algorithms consider only the graph structure
• But to answer such queries efficiently, we need to consider
  • The graph structure
  • The spatio-temporal information
  • The certainty information
• So we propose a new index structure that takes these properties into account, and a query processing algorithm that uses the index

In this paper
• Introduce STUN: Spatio-Temporal Uncertainty (Social) Network
• Define the STUN query language
• Develop the STUN index, a disk-based index structure
• Develop a query processing algorithm that uses the STUN index
• Evaluate the algorithms

STUN
• A Spatio-Temporal Uncertainty (Social) Network is an extension of social networks
• Supports aspects of spatio-temporal uncertainty in networks
  • Where and when the relationships are/were true
  • How certain we are that the relationships hold/held
• Defined by a set of STUN tuples
  • STUN tuple: STUN quadruple + STUN annotation
  • STUN quadruple: two vertices, a relationship, and a certainty value
  • STUN annotation: spatio-temporal information

Syntax: STUN quadruple
• STUN quadruple: (v, l, v’; c)
  • v, v’ ∈ V (vertices) and l ∈ L (labels)
  • Certainty factor c ∈ [0,1]

For example, (Jim, Friend, Ed; 0.7): “Jim” is a friend of “Ed” with certainty 0.7
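The quadruple can be modeled as a small immutable record. The following is a minimal Python sketch, not the authors' Java prototype; the class `StunQuadruple` and its field names are hypothetical, and the only rule it enforces is c ∈ [0,1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StunQuadruple:
    """A STUN quadruple (v, l, v'; c)."""
    v: str        # first vertex, from V
    label: str    # relationship label l, from L
    v2: str       # second vertex v', from V
    c: float      # certainty factor

    def __post_init__(self) -> None:
        # The definition requires c to lie in [0, 1].
        if not 0.0 <= self.c <= 1.0:
            raise ValueError(f"certainty must be in [0, 1], got {self.c}")


# (Jim, Friend, Ed; 0.7): "Jim" is a friend of "Ed" with certainty 0.7
jim_friend_ed = StunQuadruple("Jim", "Friend", "Ed", 0.7)
```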

Syntax: STUN annotation
• STUN annotation: [R, T]
  • Expresses spatial information and temporal information
• R is a region, i.e., a set of space points in a spatial reference system S
  • S ⊆ [0,M] × [0,N] with M, N ∈ ℝ (real numbers)
  • A space point is a member of S
• T is a time interval, a pair (st, et) with st ≤ et
  • st and et are time points marking the start and the end of a specific period
  • A time point is a member of a temporal reference system [L, U]
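A sketch of the annotation under stated assumptions: the slides define a region as an arbitrary set of space points, but here, as a simplifying assumption, a region is an axis-aligned rectangle, and the bounds M, N, L, U get hypothetical values (`T_LOW` and `T_HIGH` stand for L and U). The class names are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical bounds; the slides leave M, N, L and U unspecified.
M, N = 100.0, 100.0        # spatial reference system [0, M] x [0, N]
T_LOW, T_HIGH = 0, 1000    # temporal reference system [L, U]


@dataclass(frozen=True)
class Rect:
    """A region, simplified (assumption) to an axis-aligned rectangle."""
    x1: float
    y1: float
    x2: float
    y2: float

    def __post_init__(self) -> None:
        if not (0 <= self.x1 <= self.x2 <= M and 0 <= self.y1 <= self.y2 <= N):
            raise ValueError("region must lie inside [0, M] x [0, N]")


@dataclass(frozen=True)
class Interval:
    """A time interval (st, et) with st <= et inside [L, U]."""
    st: int
    et: int

    def __post_init__(self) -> None:
        if not (T_LOW <= self.st <= self.et <= T_HIGH):
            raise ValueError("need L <= st <= et <= U")


@dataclass(frozen=True)
class Annotation:
    """A STUN annotation [R, T]."""
    region: Rect
    interval: Interval


party = Annotation(Rect(10, 10, 20, 20), Interval(15, 15))
```

A single time point, such as time 15, is simply the degenerate interval (15, 15).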

Syntax: STUN tuple
• STUN tuple: (v, l, v’; c) : [R, T]
  • STUN quadruple + STUN annotation
• A STUN knowledge base is a finite set of STUN tuples

Ex. (Phil, Organized, Party2; 1):[Bethesda, (15,15)]

“Phil” organized “Party2” with certainty 1, and the event occurred at time 15 at some location within the region “Bethesda”
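Because a STUN knowledge base is just a finite set of STUN tuples, a plain Python set of named tuples is enough for a minimal sketch. Here regions are kept as symbolic names such as "Bethesda"; `StunTuple` and `tuples_with_label` are hypothetical names, and the second tuple (including its annotation) is invented for illustration.

```python
from typing import NamedTuple, Set, Tuple


class StunTuple(NamedTuple):
    """A STUN tuple (v, l, v'; c) : [R, T]."""
    v: str
    label: str
    v2: str
    c: float
    region: str                 # region kept as a symbolic name here
    interval: Tuple[int, int]   # (st, et) with st <= et


# A STUN knowledge base is a finite set of STUN tuples.
kb: Set[StunTuple] = {
    StunTuple("Phil", "Organized", "Party2", 1.0, "Bethesda", (15, 15)),
    StunTuple("Jim", "Friend", "Ed", 0.7, "Potomac", (10, 20)),  # hypothetical annotation
}


def tuples_with_label(kb: Set[StunTuple], label: str) -> Set[StunTuple]:
    """Return the tuples of `kb` whose relationship label is `label`."""
    return {t for t in kb if t.label == label}
```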

STUN QUERY LANGUAGE

STUN Queries
• A STUN query q contains
  • A graph part (Gq)
    • A subgraph query
    • Minimum certainty values for the relationships in the graph query
  • A constraint part (Cq)
    • Constraints on spatial information
    • Constraints on temporal information

Example. In “Find all people who attended a party in Maryland at time point 5 with certainty at least 0.5”, “Find all people who attended a party” is the subgraph query, “in Maryland at time point 5” is the constraint on spatio-temporal information, and “with certainty at least 0.5” is the minimum certainty value.

STUN Queries
• Graph part: Gq
  • Subgraph query and minimum certainty values
  • A set of query graph tuples
  • Variables are denoted using “?”; output variables are underlined
  • A query graph tuple is (v, l, v’; c) : [R, T] where
    • v, v’ ∈ V ∪ VAR_V, l ∈ L ∪ VAR_L, c ∈ [0,1]
    • R ∈ VAR_R and T ∈ VAR_T

Example. Find all people (?I) who attended a party (?P) in Maryland at time point 5 with certainty at least 0.5. The subgraph query together with the minimum certainty value gives
Gq = {(?I, attended, ?P; 0.5):[?s, ?t]}
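Matching one query graph tuple against one ground KB tuple amounts to checking the certainty threshold and then unifying each position: variables (terms starting with “?”) are bound consistently, and ground terms must match themselves. This is a minimal sketch; `QueryTuple`, `match`, and the sample tuple for “Ann” are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


def is_var(term: object) -> bool:
    """Query variables are written with a leading '?', as in the slides."""
    return isinstance(term, str) and term.startswith("?")


class QueryTuple(NamedTuple):
    """A query graph tuple (v, l, v'; c) : [R, T]; any position may be a variable."""
    v: str
    label: str
    v2: str
    min_c: float   # minimum certainty the matching KB tuple must have
    region: str    # region variable, e.g. "?s"
    interval: str  # time-interval variable, e.g. "?t"


def match(q: QueryTuple, kb_tuple: tuple,
          binding: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Extend `binding` so that q matches the ground tuple
    (v, l, v', c, region, interval); return None if that is impossible."""
    v, label, v2, c, region, interval = kb_tuple
    if c < q.min_c:
        return None                      # certainty too low
    new = dict(binding)
    pairs = ((q.v, v), (q.label, label), (q.v2, v2),
             (q.region, region), (q.interval, interval))
    for term, value in pairs:
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None              # variable already bound elsewhere
        elif term != value:
            return None                  # a ground term must match itself
    return new


gq = QueryTuple("?I", "attended", "?P", 0.5, "?s", "?t")
```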

STUN Queries
• Constraint part: Cq
  • Specifies spatial constraints and temporal constraints
  • Expressed by
    • Predicate symbols, each representing a spatial relation or a temporal relation
    • Parameters for the predicates: ground terms or variables from the graph part

Example. Find all people (?I) who attended a party (?P) in Maryland at time point 5 with certainty at least 0.5
Cq = {inside(?s, Maryland), during(?t, [5,5])}
Here inside(?s, Maryland) is the spatial constraint and during(?t, [5,5]) is the temporal constraint.
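The predicates can be sketched as plain Boolean functions. The slides do not spell out their exact semantics, so this sketch assumes non-strict containment (endpoints included) and, as a further assumption, treats regions as axis-aligned rectangles with hypothetical coordinates for Maryland and Bethesda.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed rectangular
Interval = Tuple[int, int]                  # (st, et)


def inside(r: Rect, container: Rect) -> bool:
    """Spatial predicate: region r lies entirely within `container`."""
    return (container[0] <= r[0] and container[1] <= r[1]
            and r[2] <= container[2] and r[3] <= container[3])


def during(t: Interval, window: Interval) -> bool:
    """Temporal predicate: interval t lies within `window`, endpoints included."""
    return window[0] <= t[0] and t[1] <= window[1]


# Hypothetical coordinates for the regions in the running example.
MARYLAND: Rect = (0.0, 0.0, 50.0, 30.0)
BETHESDA: Rect = (20.0, 10.0, 25.0, 15.0)


def satisfies_cq(binding: Dict[str, object]) -> bool:
    """Cq = {inside(?s, Maryland), during(?t, [5,5])}."""
    return inside(binding["?s"], MARYLAND) and during(binding["?t"], (5, 5))
```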

STUN Query example
• Find all people (?I) who attended a party (?P) in Maryland at time point 5 with certainty at least 0.5

Gq = {(?I, attended, ?P; 0.5):[?s, ?t]}

Cq = {inside(?s, Maryland), during(?t, [5,5])}

STUN Query example
• Find all people (?I)
  • who have been a friend of ‘Jim’ in the time interval [10,20] with certainty at least 0.9, as well as a friend of ‘Phil’ in the same interval with certainty at least 0.6
  • and who attended a party (?P) in Maryland, organized by ‘Phil’, that occurred during the time interval [0,20]

Gq = {(?I, attended, ?P; 1.0):[?s1, ?t1], (?I, friend, Jim; 0.9):[?s2, ?t2], (?I, friend, Phil; 0.6):[?s2, ?t2], (Phil, organized, ?P; 1.0):[?s1, ?t1]}

Cq = {inside(?s1, Maryland), during(?t1, [0,20]), during(?t2, [10,20])}

STUN query answer
• A substitution θ maps variables to ground terms
  • Each ground term maps to itself
  • The application of θ to a term x is denoted xθ
• A substitution θ is an answer to a STUN query q = (Gq, Cq) if
  • The tuples obtained by applying θ to Gq exist in the STUN KB
  • The certainty values of those tuples in the STUN KB are at least the minimum certainty values in Gq
  • The spatio-temporal information of those tuples satisfies all constraints in Cq

Example: a substitution θ with ?Pθ = Party3 maps the query tuple (Phil, Organized, ?P) to (Phil, Organized, Party3).
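The three answer conditions translate directly into a checker. In this minimal sketch (all names hypothetical), θ must bind every variable, regions are symbolic names with a hypothetical containment table, and the Bethesda/(15,15) annotation attached to Party3 is invented for illustration.

```python
def apply(theta: dict, term: object) -> object:
    """x·θ: a variable is replaced by its image; a ground term maps to itself."""
    if isinstance(term, str) and term.startswith("?"):
        return theta[term]          # θ is assumed to bind every query variable
    return term


def is_answer(theta: dict, gq: list, cq: list, kb: list) -> bool:
    """Check the answer conditions for θ against q = (Gq, Cq).
    Gq entries: (v, l, v', min_c, region_var, interval_var)
    Cq entries: (predicate, args); KB entries: (v, l, v', c, region, interval)."""
    for v, label, v2, min_c, s_var, t_var in gq:
        wanted = (apply(theta, v), apply(theta, label), apply(theta, v2),
                  apply(theta, s_var), apply(theta, t_var))
        # The instantiated tuple must be in the KB with certainty >= min_c.
        if not any((k[0], k[1], k[2], k[4], k[5]) == wanted and k[3] >= min_c
                   for k in kb):
            return False
    # Every constraint must hold once θ is applied to its arguments.
    return all(pred(*(apply(theta, a) for a in args)) for pred, args in cq)


# Hypothetical toy data: regions are symbolic names with a containment table.
CONTAINED_IN = {("Bethesda", "Maryland"), ("Potomac", "Maryland")}


def inside(r: str, big: str) -> bool:
    return r == big or (r, big) in CONTAINED_IN


def during(t: tuple, w: tuple) -> bool:
    return w[0] <= t[0] and t[1] <= w[1]


kb = [("Phil", "Organized", "Party3", 1.0, "Bethesda", (15, 15))]
gq = [("Phil", "Organized", "?P", 1.0, "?s", "?t")]
cq = [(inside, ("?s", "Maryland")), (during, ("?t", (0, 20)))]
```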

STUN INDEX

STUN Index
• A balanced tree
  • Each leaf node represents a portion of the STUN knowledge base
  • Each inner node captures the subgraph represented by its child nodes

STUN Index
• Each node occupies a disk page and contains
  • An MBR (minimum bounding rectangle), which envelops the regions associated with the STUN tuples in the subgraph of its child nodes
  • An MBI (minimum bounding interval), which envelops the time intervals associated with the STUN tuples in the subgraph of its child nodes
• During query processing, MBRs and MBIs are checked against the spatial and temporal constraints to prune nodes that cannot contain answers

(Figure: regions R1, R2, R3 in a spatial reference system and the MBRs of index nodes N1, N2, N3.)
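A minimal sketch of how MBRs and MBIs can be computed and used for pruning, assuming rectangular regions: a node can be skipped when its MBR does not intersect the query region or its MBI does not intersect the query time window, since then no tuple below it can satisfy an inside- or during-style constraint. Function names and coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Interval = Tuple[int, int]                  # (st, et)


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-aligned rectangle enclosing all (non-empty input) rectangles."""
    xs1, ys1, xs2, ys2 = zip(*rects)
    return (min(xs1), min(ys1), max(xs2), max(ys2))


def mbi(intervals: Iterable[Interval]) -> Interval:
    """Smallest interval enclosing all (non-empty input) intervals."""
    starts, ends = zip(*intervals)
    return (min(starts), max(ends))


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intervals_intersect(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def can_prune(node_mbr: Rect, node_mbi: Interval,
              query_rect: Rect, query_window: Interval) -> bool:
    """True if the node cannot hold a tuple inside `query_rect` during `query_window`."""
    return (not rects_intersect(node_mbr, query_rect)
            or not intervals_intersect(node_mbi, query_window))
```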

STUN Index
• Goal: reduce the number of nodes read when answering queries
• Each index node should have
  • Few cross edges with other nodes at the same level
  • A small MBR (minimum bounding rectangle) and a small MBI (minimum bounding interval)
  • Small MBR overlap with other nodes at the same level
  • Small MBI overlap with other nodes at the same level
• To meet these requirements
  • Build a vertex- and edge-weighted undirected graph (WUG) from the STUN KB
  • Then take the weights into account when building the index

Building STUN Index

I. Initial step
• Build a vertex- and edge-weighted undirected graph (WUG) from the STUN KB
• The weights are used to meet the index requirements
  • Few cross edges
  • Small MBRs (minimum bounding rectangles) and small MBIs (minimum bounding intervals)
  • Small MBR overlaps and small MBI overlaps

II. Coarsening step
• Merge vertices using the weights of vertices and edges

III. Partitioning step
• Build a tree index from the coarsened graphs

Building Index - Initial Step

I. Initial Step
• Build a vertex- and edge-weighted undirected graph (WUG)
  • Assign every vertex a weight of 1
  • Compute edge weights using a spatio-temporal vertex distance function δ
  • Compute MBRs (minimum bounding rectangles) and MBIs (minimum bounding intervals) for edges

(Figure: a STUN graph over v0, v1, v2 with edges e0, e1, e2, each carrying spatio-temporal information and a certainty value, is turned into a WUG. Every vertex has weight 1; the edge v0-v1 has weight δ(v0, v1), an MBR, an MBI, and label set {l_e0}; the edge v1-v2 has weight δ(v1, v2), an MBR, an MBI, and label set {l_e1, l_e2}.)
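The initial step can be sketched as follows. Tuples between the same pair of vertices collapse into one undirected edge that keeps its label set, MBR, and MBI, mirroring the figure. Regions are assumed rectangular, the distance function is passed in as a parameter (a constant placeholder is used in the example), and all names are hypothetical; the certainty value is read but not stored in this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Rect = Tuple[float, float, float, float]           # (x1, y1, x2, y2)
Interval = Tuple[int, int]                         # (st, et)
Tup = Tuple[str, str, str, float, Rect, Interval]  # (v, l, v', c, region, interval)


def build_wug(kb: List[Tup], delta: Callable[[str, str], float]):
    """Build a vertex- and edge-weighted undirected graph from a STUN KB.
    All tuples between the same two vertices collapse into one undirected edge
    that records its label set, MBR and MBI."""
    vertex_w: Dict[str, int] = {}
    grouped = defaultdict(list)
    for v, label, v2, _c, rect, iv in kb:
        vertex_w.setdefault(v, 1)       # every vertex starts with weight 1
        vertex_w.setdefault(v2, 1)
        grouped[tuple(sorted((v, v2)))].append((label, rect, iv))

    edges = {}
    for (a, b), items in grouped.items():
        rects = [r for _, r, _ in items]
        ivs = [t for _, _, t in items]
        edges[(a, b)] = {
            "weight": delta(a, b),
            "labels": frozenset(lbl for lbl, _, _ in items),
            "mbr": (min(r[0] for r in rects), min(r[1] for r in rects),
                    max(r[2] for r in rects), max(r[3] for r in rects)),
            "mbi": (min(t[0] for t in ivs), max(t[1] for t in ivs)),
        }
    return vertex_w, edges


# Mirrors the figure: e0 joins v0-v1, while e1 and e2 both join v1-v2.
kb = [("v0", "l_e0", "v1", 0.9, (0, 0, 2, 2), (1, 3)),
      ("v1", "l_e1", "v2", 0.6, (1, 1, 4, 3), (2, 5)),
      ("v2", "l_e2", "v1", 0.8, (3, 0, 5, 2), (4, 9))]
vertex_w, edges = build_wug(kb, delta=lambda a, b: 0.5)  # placeholder δ
```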

Building Index - Initial Step: Spatio-temporal vertex distance function
• Looks at the neighborhoods of the two vertices
• Measures the “amount” of space and time the vertices share with each other, with respect to their neighborhoods
• The distance δ(v, v’) combines a spatial component and a temporal component, weighted by α and β, where α + β = 1 and α, β ∈ ℝ
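Since the exact formula is not reproduced here, the following is only a hypothetical stand-in, not the authors' δ: it summarizes each vertex's neighborhood by an MBR and an MBI, measures shared space as intersection area over joint bounding-box area and shared time as overlap length over joint span, and returns 1 − (α · spatial share + β · temporal share). Vertices that share more space and time therefore get a smaller distance, which fits merging along minimum-weight edges.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Interval = Tuple[float, float]             # (st, et)


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def spatial_share(a: Rect, b: Rect) -> float:
    """Intersection area divided by the area of the joint bounding box."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    joint = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    if area(joint) == 0:
        return 1.0 if a == b else 0.0   # degenerate (point or line) regions
    return area(inter) / area(joint)


def temporal_share(a: Interval, b: Interval) -> float:
    """Overlap length divided by the length of the joint span."""
    span = max(a[1], b[1]) - min(a[0], b[0])
    if span == 0:
        return 1.0                      # both are the same single time point
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0])) / span


def delta(mbr_v: Rect, mbi_v: Interval, mbr_w: Rect, mbi_w: Interval,
          alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical distance: 1 - (alpha * spatial share + beta * temporal share).
    The MBR/MBI arguments summarize each vertex's neighbourhood."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    return 1.0 - (alpha * spatial_share(mbr_v, mbr_w)
                  + beta * temporal_share(mbi_v, mbi_w))
```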

Building Index - Coarsening
• Coarsen the graph until the coarsened graph fits in one disk page
• At each coarsening level l, the number of vertices in Gl is half the number of vertices in Gl-1

(Figure: the original graph G0 at level 0, with N vertices, is coarsened by repeatedly merging vertices into G1 at level 1 (N/2 vertices), G2 at level 2 (N/4 vertices), …, and Gk at level k (N/2^k vertices).)
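Because each level halves the vertex count, the number of levels k grows logarithmically: it is the smallest k with ⌈N / 2^k⌉ ≤ P, where P is the number of vertices that fit in one page. Measuring page capacity in vertices is a simplifying assumption (real pages are limited in bytes), and `coarsening_levels` is a hypothetical helper.

```python
def coarsening_levels(n_vertices: int, page_capacity: int) -> int:
    """Number of halving steps k until ceil(N / 2**k) <= page_capacity."""
    if page_capacity < 1:
        raise ValueError("page_capacity must be positive")
    k, n = 0, n_vertices
    while n > page_capacity:
        n = (n + 1) // 2   # each level keeps half of the vertices (rounded up)
        k += 1
    return k
```

For example, with N = 1,000,000 and P = 1,000 the loop stops after k = 10 levels, since 1,000,000 / 2^10 ≈ 977 ≤ 1,000 while 1,000,000 / 2^9 ≈ 1,953 > 1,000.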

How to merge vertices
1. Choose a vertex v at random to merge
2. Select a neighbor m of v with minimum edge weight (v is merged into m)
3. Update the weight of vertex m: w(m) ← w(m) + w(v)
4. Update the weight, MBR, and MBI of the edges of v and m (if there is no edge between m and a neighbor of v, add an edge between m and that neighbor)
5. Delete the edge between v and m, and delete the vertex v
6. Update the mapping information to record that v is mapped to m
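A minimal sketch of a single merge on an adjacency-map graph. The slides do not say how the weights of two combined edges are merged, so keeping the smaller weight is an assumption; MBRs and MBIs are combined by taking their union. `merge_one` and its data layout are hypothetical, and the mapping records only the direct v → m step.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Rect = Tuple[float, float, float, float]
Interval = Tuple[int, int]


def union_rect(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def union_interval(a: Interval, b: Interval) -> Interval:
    return (min(a[0], b[0]), max(a[1], b[1]))


def merge_one(vertex_w: Dict[str, int], adj: Dict[str, Dict[str, dict]],
              mapping: Dict[str, str],
              pick: Callable[[Sequence[str]], str] = random.choice) -> Tuple[str, str]:
    """Merge a randomly chosen vertex v into the neighbour m with minimum edge weight.
    adj[x][y] = {"weight", "mbr", "mbi"} is stored symmetrically (undirected)."""
    v = pick([x for x in adj if adj[x]])           # random vertex that has a neighbour
    m = min(adj[v], key=lambda y: adj[v][y]["weight"])
    vertex_w[m] += vertex_w[v]                     # w(m) <- w(m) + w(v)
    for y, e in adj[v].items():
        if y == m:
            continue
        if y in adj[m]:                            # m and y already adjacent: combine
            old = adj[m][y]
            new = {"weight": min(old["weight"], e["weight"]),  # assumed rule
                   "mbr": union_rect(old["mbr"], e["mbr"]),
                   "mbi": union_interval(old["mbi"], e["mbi"])}
        else:                                      # no edge m-y yet: add one
            new = dict(e)
        adj[m][y] = adj[y][m] = new
        del adj[y][v]
    del adj[m][v]                                  # delete the edge v-m ...
    del adj[v], vertex_w[v]                        # ... and the vertex v
    mapping[v] = m                                 # record that v now lives in m
    return v, m
```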

Building Index - Partitioning
1. Store Gk as the root page, together with MBR(all edges of Gk) and MBI(all edges of Gk)
2. Partition the graph
3. Induce subgraphs using the mapping information from Gk-1 to Gk
4. Store the subgraphs as child pages; each child page (e.g., a and b) stores MBR(all edges of the page) and MBI(all edges of the page)
5. Repeat these steps recursively down to the lowest coarsening level

(Figure: the coarsened graphs Gk, Gk-1, Gk-2; each edge already has an MBR and an MBI.)
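The partitioning recursion can be sketched top-down. In this hypothetical sketch, each G0 vertex carries an (MBR, MBI) summary of its incident tuples instead of per-edge boxes, vertices not listed in a level's mapping keep their own name, and `partition` is a caller-supplied stand-in for the real partitioner (the example simply puts each vertex in its own part).

```python
from typing import Callable, Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]
Interval = Tuple[int, int]


class Page:
    """An index page: the vertices it covers at its level, MBR/MBI, child pages."""
    def __init__(self, level: int, vertices: Set[str]) -> None:
        self.level = level
        self.vertices = vertices
        self.children: List["Page"] = []
        self.mbr: Rect = (0, 0, 0, 0)
        self.mbi: Interval = (0, 0)


def build_index(levels: List[Set[str]], maps: List[Dict[str, str]],
                summary: Dict[str, Tuple[Rect, Interval]],
                partition: Callable[[Set[str]], List[Set[str]]]) -> Page:
    """levels[l] = vertices of G_l; maps[l] sends G_l vertices to G_{l+1}
    (vertices absent from maps[l] keep their name); summary gives the
    (MBR, MBI) of the tuples incident to each G_0 vertex."""
    def make(level: int, vertices: Set[str]) -> Page:
        page = Page(level, vertices)
        if level == 0:                                   # leaf page
            boxes = [summary[v] for v in vertices]
        else:
            for part in partition(vertices):             # 2. partition
                # 3. induce the G_{l-1} subgraph mapped into this part
                below = {u for u in levels[level - 1]
                         if maps[level - 1].get(u, u) in part}
                page.children.append(make(level - 1, below))  # 4. child page
            boxes = [(c.mbr, c.mbi) for c in page.children]
        rects = [r for r, _ in boxes]
        ivs = [t for _, t in boxes]
        page.mbr = (min(r[0] for r in rects), min(r[1] for r in rects),
                    max(r[2] for r in rects), max(r[3] for r in rects))
        page.mbi = (min(t[0] for t in ivs), max(t[1] for t in ivs))
        return page
    return make(len(levels) - 1, levels[-1])            # 1. root holds G_k
```

The recursion in `make` plays the role of step 5, stopping at level 0.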

Query Answering
• The STUN index is used to get candidates for variables
  • Traverse the index tree using the mapping information for the ground terms (constants) in a query
  • Use MBRs (minimum bounding rectangles) and MBIs (minimum bounding intervals) to filter out pages that cannot contribute to the answer, based on the spatial and temporal constraints
    - Check the MBRs and MBIs of pages against the constraints for pruning

(Figure: the query graph with edges ?I friend Jim, ?I friend Phil, and Phil organized ?P, evaluated against the STUN index.)
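Candidate retrieval with MBR/MBI pruning can be sketched as a depth-first walk that only descends into children whose MBR meets the query region and whose MBI meets the query window. This simplified sketch (hypothetical names and data) models only the pruning; navigating through the mapping information for constants is left out. The leaf vertices it returns are meant as a superset of the true candidates and still need to be verified.

```python
from typing import List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]
Interval = Tuple[int, int]


class Node:
    """A simplified index page: MBR, MBI, child pages, and (for leaves) vertices."""
    def __init__(self, mbr: Rect, mbi: Interval,
                 children: Optional[List["Node"]] = None,
                 vertices: Optional[Set[str]] = None) -> None:
        self.mbr, self.mbi = mbr, mbi
        self.children = children or []
        self.vertices = vertices or set()


def qualifies(node: Node, region: Rect, window: Interval) -> bool:
    """True if the node's MBR meets `region` and its MBI meets `window`."""
    r, t = node.mbr, node.mbi
    return (r[0] <= region[2] and region[0] <= r[2] and r[1] <= region[3]
            and region[1] <= r[3] and t[0] <= window[1] and window[0] <= t[1])


def candidates(root: Node, region: Rect, window: Interval,
               read: Optional[List[Node]] = None) -> Set[str]:
    """Collect vertices from qualifying leaves; pruned pages are never read.
    If `read` is given, every page actually read is appended to it."""
    out: Set[str] = set()
    stack = [root] if qualifies(root, region, window) else []
    while stack:
        node = stack.pop()
        if read is not None:
            read.append(node)
        if node.children:
            stack.extend(c for c in node.children if qualifies(c, region, window))
        else:
            out |= node.vertices
    return out


leaf1 = Node((0, 0, 5, 5), (0, 10), vertices={"Jim", "Phil"})
leaf2 = Node((50, 50, 60, 60), (0, 10), vertices={"Ed"})
leaf3 = Node((1, 1, 4, 4), (30, 40), vertices={"Ann"})
root = Node((0, 0, 60, 60), (0, 40), children=[leaf1, leaf2, leaf3])
```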

Query Answering
• Overall algorithm

I. Get candidates for each variable of a query

II. Select the variable that has the smallest number of candidates

III. Substitute each candidate for the variable

IV. For each substitution, repeat steps II and III recursively for the remaining variables, in a depth-first manner

V. If no variable is left, return the substitutions
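Steps II to V can be sketched as a generator-based backtracking search. The `consistent` callback, which rejects a partial substitution as soon as a fully bound query edge is missing from the KB, is an assumption added for early pruning; certainties and spatio-temporal constraints are omitted, and the toy edges are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterator, Set


def answers(cands: Dict[str, Set[str]],
            consistent: Callable[[Dict[str, str]], bool]) -> Iterator[Dict[str, str]]:
    """Depth-first search over candidate substitutions (steps II-V)."""
    def search(theta: Dict[str, str], remaining: FrozenSet[str]) -> Iterator[Dict[str, str]]:
        if not remaining:                                  # V. nothing left
            yield dict(theta)
            return
        var = min(remaining, key=lambda x: len(cands[x]))  # II. fewest candidates
        for value in sorted(cands[var]):                   # III. try each candidate
            theta[var] = value
            if consistent(theta):                          # early check (assumption)
                yield from search(theta, remaining - {var})  # IV. recurse
            del theta[var]
    yield from search({}, frozenset(cands))               # I. cands computed beforehand


# Hypothetical toy KB and query: ?I friend Jim, ?I friend Phil,
# Phil organized ?P, ?I attended ?P.
EDGES = {("Ann", "friend", "Jim"), ("Ann", "friend", "Phil"), ("Bob", "friend", "Jim"),
         ("Phil", "organized", "Party1"), ("Ann", "attended", "Party1"),
         ("Bob", "attended", "Party2")}
QUERY = [("?I", "friend", "Jim"), ("?I", "friend", "Phil"),
         ("Phil", "organized", "?P"), ("?I", "attended", "?P")]


def consistent(theta: Dict[str, str]) -> bool:
    """Every query edge whose terms are all bound must exist in EDGES."""
    for s, l, o in QUERY:
        s2, o2 = theta.get(s, s), theta.get(o, o)
        if s2.startswith("?") or o2.startswith("?"):
            continue  # not fully bound yet
        if (s2, l, o2) not in EDGES:
            return False
    return True
```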

EVALUATION

Experiment: Environment
• We developed a prototype implementation in about 10,600 lines of Java code
• Ran the code on a laptop
  • A dual-core 2.8 GHz CPU with 8 GB of RAM running Windows 7
  • Indexes are kept on disk (no explicit buffer is used to load the index)
• Experiments test the scalability of the STUN index by varying
  • The size of the graph
  • The complexity of queries
  • The number of constraints in queries
• Queries are randomly generated from STUN KBs
  • Each query has at least one answer
• More than 10,000 queries were tested

Experiment: Dataset
• YouTube dataset
• Vertices: people and groups
  • 20% of groups have a randomly assigned region
• Edge relations
  • ‘follow’: person to person, with a time interval
  • ‘membership’: person to group, with a time interval
  • ‘co-located’: person to group, with a time interval and a region
• Time intervals are randomly assigned to ‘follow’ and ‘membership’ relationships
• A ‘co-located’ edge is added between two members if
  • They have ‘membership’ relationships with the same group
  • Their membership time intervals with that group overlap
  • That group has an assigned region
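The co-located rule can be sketched as follows. The slides do not say which time interval or region the new edge receives, so this sketch assumes the overlap of the two membership intervals and the group's region; the function name and sample data are hypothetical, and touching intervals (sharing an endpoint) count as overlapping.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]


def co_located_edges(memberships: List[Tuple[str, str, Interval]],
                     group_region: Dict[str, Optional[str]]
                     ) -> List[Tuple[str, str, str, Interval]]:
    """Return (person, person, region, overlap) for every pair of members of the
    same group that has a region, whose membership intervals overlap."""
    by_group = defaultdict(list)
    for person, group, iv in memberships:
        by_group[group].append((person, iv))
    edges = []
    for group, members in by_group.items():
        region = group_region.get(group)
        if region is None:                  # the group needs an assigned region
            continue
        for (p, a), (q, b) in combinations(members, 2):
            st, et = max(a[0], b[0]), min(a[1], b[1])
            if p != q and st <= et:         # the two intervals overlap
                edges.append((p, q, region, (st, et)))
    return edges
```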

Experiment: Result
• Each data point was obtained by running 200 queries

Experiment: Result
• The query processing time increases slightly super-linearly with the size of the database, though the slope of the curve increases with the complexity of the query

Conclusion
• Introduce the Spatio-Temporal Uncertainty (Social) network
• Define the STUN query language
• Develop a disk-based index structure
• Develop a query processing algorithm
• Run experiments to evaluate the STUN system

Questions