Applications of Relative Importance


Page 1: Applications of Relative Importance

Applications of Relative Importance

Why is relative importance interesting?
- Web
- Social networks
- Citation graphs
- Biological data

Graphs become too complex for manual analysis

Page 2: Applications of Relative Importance

Existing Techniques
- Web: PageRank (Google)
- Social networks: ‘centrality’

All focus on global measures of node importance – we’re interested in importance relative to a set of root nodes R

Page 3: Applications of Relative Importance


Use Existing Techniques?

Use global algorithm on the subgraph surrounding root nodes?

No preferential treatment of root nodes – just ranking surrounding nodes.

Page 4: Applications of Relative Importance

Organization: Relative Importance Algorithms
- Notation
- Problem Formulation
- General Framework
- Algorithms

Page 5: Applications of Relative Importance

Notation
- Digraph: G = (V, E)
- Edges: ordered pairs of nodes (u, v)
- Graphs are directed, unweighted, simple
- Walk from u to v: $u \to u_1 \to u_2 \to \dots \to u_k \to v$, a.k.a. the edge sequence $(u, u_1), (u_1, u_2), \dots, (u_k, v)$
- A path is a walk with no repeated nodes

Page 6: Applications of Relative Importance

Notation
- k-short paths
- P(u,v): set of paths between u and v
- $S_{out}(u)$: set of distinct out-going edges from u, with $d_{out}(u) = |S_{out}(u)|$
- Similarly, we have $d_{in}(u) = |S_{in}(u)|$
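The notation above can be made concrete with a small sketch. This is only an illustration, not code from the talk: the function names (`build_digraph`, `d_out`, `d_in`) and the toy edge list are hypothetical, and the edge sets are stored in plain dictionaries.

```python
from collections import defaultdict

def build_digraph(edges):
    """Return (s_out, s_in): each maps a node to its set of out-going / in-coming edges."""
    s_out = defaultdict(set)
    s_in = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a simple graph has no self-loops
        s_out[u].add((u, v))  # sets drop duplicate edges, keeping the graph simple
        s_in[v].add((u, v))
    return s_out, s_in

def d_out(s_out, u):
    """Out-degree: the size of the set of distinct out-going edges of u."""
    return len(s_out.get(u, ()))

def d_in(s_in, u):
    """In-degree: the size of the set of distinct in-coming edges of u."""
    return len(s_in.get(u, ()))

# Hypothetical toy graph (not one of the graphs in the slides); ("r", "a") is duplicated on purpose.
edges = [("r", "a"), ("r", "b"), ("a", "t"), ("b", "t"), ("r", "a")]
s_out, s_in = build_digraph(edges)
```

With this toy graph, `d_out(s_out, "r")` counts the two distinct edges leaving r, and `d_in(s_in, "t")` counts the two edges arriving at t.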

Page 7: Applications of Relative Importance

Problem Formulation

1. Given G and nodes r and t, where $\{r, t\} \subset G$, compute the “importance” of t w.r.t. root node r: $I(t \mid r)$.

Page 8: Applications of Relative Importance

Problem Formulation

2. Given G and node $r \in G$, rank all vertices in T(G), $T \subseteq V$, w.r.t. r.

Page 9: Applications of Relative Importance

Problem Formulation

3. Given G, a set of nodes T(G) to rank, and a set of root nodes R(G) where $R \subseteq V$, rank all vertices in T w.r.t. R.

This is similar to the last case, except that we compute $I(t \mid R)$ rather than $I(t \mid r)$.

Average importance:

$$I(t \mid R) = \frac{1}{|R|} \sum_{r \in R} I(t \mid r)$$

Page 10: Applications of Relative Importance

Problem Formulation (3 cont’d.)

Rather than average each node’s importance score, we could define

$$I(t \mid R) = \min \{ I(t \mid r) : r \in R \}$$

This requires ‘important’ nodes to have a high importance score with respect to every node in R
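To see how the average and the minimum can disagree, here is a minimal sketch. It assumes the per-root scores I(t|r) have already been computed by some algorithm and are stored in a nested dictionary; the scores, node names, and function names are all hypothetical.

```python
def average_importance(scores, t, roots):
    """I(t|R) as the mean of I(t|r) over all r in R."""
    return sum(scores[r][t] for r in roots) / len(roots)

def min_importance(scores, t, roots):
    """I(t|R) as the smallest I(t|r) over all r in R."""
    return min(scores[r][t] for r in roots)

def rank(scores, targets, roots, combine):
    """Sort the targets by combined importance, most important first."""
    return sorted(targets, key=lambda t: combine(scores, t, roots), reverse=True)

# Hypothetical scores[r][t] = I(t|r); not taken from the experiments in the slides.
scores = {
    "r1": {"x": 0.9, "y": 0.4},
    "r2": {"x": 0.1, "y": 0.4},
}
roots = ["r1", "r2"]
by_average = rank(scores, ["x", "y"], roots, average_importance)
by_minimum = rank(scores, ["x", "y"], roots, min_importance)
```

Node x is very important to r1 but barely to r2, so it wins under the average and loses under the minimum, which rewards y for being moderately important to both roots.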

Page 11: Applications of Relative Importance


Problem Formulation

4. Given G, rank all nodes where R=T=V.

Page 12: Applications of Relative Importance

General Framework: Weighted Paths

Nodes are related according to the paths that connect them

The longer the path, the less importance it carries:

$$I(t \mid r) = \sum_{i=1}^{|P(r,t)|} \lambda^{-|p_i|}$$

- $\lambda$ is a scalar coefficient, $\lambda \ge 1$
- P(r,t) is a set of paths from r to t; $p_i$ is the ith path in P(r,t) and $|p_i|$ is its length
- Importance decays exponentially with path length
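As a rough illustration of the weighted-paths score, the sketch below sums one term per path, with each path contributing the scalar coefficient raised to minus the path length. It assumes a path is given as a list of nodes (so its length is the number of edges), calls the coefficient `lam`, and picks `lam = 2.0` arbitrarily; the function name is hypothetical.

```python
def weighted_path_importance(paths, lam=2.0):
    """Sum lam ** (-length) over the given paths; length = number of edges in the path."""
    return sum(lam ** (-(len(path) - 1)) for path in paths)

# Two paths of length 2, written as node lists.
two_short = [["R", "C", "T"], ["R", "D", "T"]]
# The same set plus one hypothetical path of length 3.
with_longer = two_short + [["R", "A", "B", "T"]]

score_short = weighted_path_importance(two_short)     # 2 * 2**-2
score_longer = weighted_path_importance(with_longer)  # adds 2**-3
```

Each extra edge divides a path's contribution by `lam`, so the length-3 path adds only half as much as either length-2 path.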

Page 13: Applications of Relative Importance

How to choose P(r,t)?

Path examples

[Figure: two example graphs, a. and b., each on nodes A, B, C, D, E, F, R, T]

Shortest paths from R to T: {R-C-T, R-D-T}, which fail to capture much of the connectivity from R to T.

Page 14: Applications of Relative Importance

Shortest Path

e.g.: transport cargo from r to t

Shortest paths don’t always give a good approximation of importance, e.g.: the web (graph b.)

Page 15: Applications of Relative Importance

k-Short Paths
- Paths of length at most K
- Idea: there are often paths longer than the shortest ones that are important to take into account
- Fixes the Shortest Paths problem of ignoring longer, important paths
  - e.g.: graph b., 3-short
- Problem: does not respect capacity constraints
  - e.g.: network topology
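One way to collect a k-short path set is a depth-first enumeration of simple paths, cut off at K edges. The sketch below is an assumption about how this could be done, not the procedure from the talk; the adjacency-list format, the function name, and the toy graph are hypothetical. Note that enumerating all simple paths can be exponentially expensive on dense graphs.

```python
def k_short_paths(adj, r, t, k):
    """Yield every simple path (as a node list) from r to t that has at most k edges."""
    stack = [[r]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == t:
            yield path
            continue
        if len(path) - 1 == k:
            continue  # already k edges long; do not extend further
        for nxt in adj.get(node, ()):
            if nxt not in path:  # a path never repeats a node
                stack.append(path + [nxt])

# Hypothetical toy graph, stored as node -> list of successors.
adj = {"r": ["c", "d", "a"], "c": ["t"], "d": ["t"], "a": ["b"], "b": ["t"]}

paths_k2 = sorted(k_short_paths(adj, "r", "t", 2))
paths_k3 = sorted(k_short_paths(adj, "r", "t", 3))
```

With K = 2 only the two shortest paths are found; raising K to 3 also picks up the longer route through a and b.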

Page 16: Applications of Relative Importance

k-Short Node-Disjoint Paths
- No nodes and no edges are shared between paths
- Implicitly enforces capacity constraints
- Motivated by ‘mass flow’, where importance can ‘flow’ along paths
  - e.g.: graph b.
- Found breadth-first with some heuristic, for some K and some $\lambda$
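The slide does not say which breadth-first heuristic is used, so the following is only one plausible greedy sketch: repeatedly take a shortest path of at most K edges by BFS, then block its interior nodes so that later paths cannot reuse them. The function names and the toy graph are hypothetical, and a greedy choice like this is not guaranteed to find the largest possible set of disjoint paths.

```python
from collections import deque

def bfs_path(adj, r, t, k, blocked_nodes, blocked_edges):
    """Return a shortest r -> t path with at most k edges that avoids blocked items, or None."""
    parent = {r: None}
    depth = {r: 0}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:  # walk the parent links back to r
                path.append(u)
                u = parent[u]
            return path[::-1]
        if depth[u] == k:
            continue  # extending this node would exceed k edges
        for v in adj.get(u, ()):
            if v in parent or v in blocked_nodes or (u, v) in blocked_edges:
                continue
            parent[v] = u
            depth[v] = depth[u] + 1
            queue.append(v)
    return None

def node_disjoint_paths(adj, r, t, k):
    """Greedily collect r -> t paths (at most k edges each) that share no interior nodes."""
    if r == t:
        return [[r]]
    blocked_nodes, blocked_edges, paths = set(), set(), []
    while True:
        path = bfs_path(adj, r, t, k, blocked_nodes, blocked_edges)
        if path is None:
            return paths
        paths.append(path)
        blocked_nodes.update(path[1:-1])  # interior nodes may be used only once
        if len(path) == 2:
            blocked_edges.add((r, t))     # the direct edge r -> t may be used only once

# Hypothetical toy graph: four simple paths of at most 3 edges, but only two are node-disjoint.
adj = {"r": ["c", "d", "a"], "c": ["t", "d"], "d": ["t"], "a": ["c"]}
disjoint = node_disjoint_paths(adj, "r", "t", 3)
```

Blocking interior nodes is what enforces the capacity constraint here: once c carries one path, the routes r-c-d-t and r-a-c-t are no longer available.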

Page 17: Applications of Relative Importance

Markov Chains & Relative Importance
- Graph viewed as a stochastic process
- Explanation of Markov chains
- Token traversing the chain…
- A natural fit for modeling the web

Page 18: Applications of Relative Importance

Markov Chains & Relative Importance

Markov Centrality: Mean First Passage Time

$$m_{rt} = \sum_{n=1}^{\infty} n \, f_{rt}^{(n)}$$

- $m_{rt}$: expected number of steps until first arrival at node t starting at node r
- $f_{rt}^{(n)}$: probability that the chain, started at r, first reaches state t in exactly n steps
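Rather than summing the series term by term, mean first passage times to a fixed target t can be approximated from the standard recurrence m_rt = 1 + sum over k != t of p_rk * m_kt. The sketch below iterates that recurrence to a fixed point in pure Python; it swaps in simple iteration for the matrix inversion mentioned in the talk, assumes t is reachable from every state, and uses hypothetical names and toy transition matrices.

```python
def mean_first_passage_times(P, t, tol=1e-12, max_iter=100000):
    """Return a list m where m[r] approximates the expected steps to first reach t from r.

    P is a row-stochastic transition matrix given as a list of lists.
    m[t] is the expected time to return to t after leaving it.
    """
    n = len(P)
    m = [0.0] * n
    for _ in range(max_iter):
        # One step is always taken; if it does not land on t, continue from state k.
        new = [1.0 + sum(P[r][k] * m[k] for k in range(n) if k != t) for r in range(n)]
        if max(abs(a - b) for a, b in zip(new, m)) < tol:
            return new
        m = new
    raise RuntimeError("did not converge; is t reachable from every state?")

# Hypothetical two-state chain: from either state, move to state 0 or 1 with probability 1/2.
coin = [[0.5, 0.5], [0.5, 0.5]]
m_coin = mean_first_passage_times(coin, t=1)

# Hypothetical deterministic cycle 0 -> 1 -> 2 -> 0.
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
m_cycle = mean_first_passage_times(cycle, t=2)
```

For the two-state chain the first arrival at state 1 happens at step n with probability (1/2)^n, and the series sum of n * (1/2)^n equals 2, which is what the iteration approaches.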

Page 19: Applications of Relative Importance

Markov Chains & Relative Importance

$$I(t \mid R) = \frac{1}{\frac{1}{|R|} \sum_{r \in R} m_{rt}}$$

- Bias toward ‘central nodes’
- COMPLEX!!
  - Time: $O(|V|^3)$ (inversion of a |V| x |V| transition matrix)
  - Space: $O(|V|^2)$
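Given a table of mean first passage times, the Markov centrality score is the inverse of the average passage time from the root nodes to t. The sketch below assumes those times are already available in a nested dictionary `m[r][t]`; the values, node names, and function name are hypothetical.

```python
def markov_centrality(m, t, roots):
    """I(t|R) = 1 / ((1/|R|) * sum of m[r][t] over r in R)."""
    mean_time = sum(m[r][t] for r in roots) / len(roots)
    return 1.0 / mean_time

# Hypothetical mean first passage times m[r][t]; not results from the slides.
m = {
    "a": {"x": 2.0, "y": 8.0},
    "b": {"x": 4.0, "y": 2.0},
}
roots = ["a", "b"]
score_x = markov_centrality(m, "x", roots)  # average time 3 steps
score_y = markov_centrality(m, "y", roots)  # average time 5 steps
```

Nodes that the chain reaches quickly from the root set get a high score, so x (3 steps on average) outranks y (5 steps on average).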

Page 20: Applications of Relative Importance

Markov Chains & Relative Importance

PageRank
- Uses backlinks to assign importance to web pages

Page 21: Applications of Relative Importance

Markov Chains & Relative Importance

PageRank
- Less complex than Markov centrality
- Converges in a number of iterations roughly logarithmic in the graph size: 322 million links processed in 52 iterations

Page 22: Applications of Relative Importance

Markov Chains & Relative Importance

Retrofit PageRank such that all nodes in R have a uniform bias at the start

‘Surfer’ begins at a root node, traverses the graph, and returns to the root set R with probability $\beta$ at each time-step

I(t|R) = probability that the surfer visits t during a walk
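A rooted variant of PageRank along these lines can be sketched with plain power iteration. Assumptions in this sketch: the return probability is called `beta` and set to 0.3 arbitrarily, the surfer jumps back to a uniformly chosen root, and a node with no out-links sends its mass back to the root set (a choice made here, not stated in the talk). The function name and the toy graph are hypothetical.

```python
def pagerank_with_priors(adj, roots, beta=0.3, tol=1e-10, max_iter=1000):
    """Power iteration in which the surfer returns to the root set with probability beta."""
    nodes = sorted(set(adj) | {v for vs in adj.values() for v in vs})
    # Uniform bias over the root set R, zero elsewhere.
    prior = {v: (1.0 / len(roots) if v in roots else 0.0) for v in nodes}
    pi = dict(prior)  # the surfer starts at a root node
    for _ in range(max_iter):
        new = {v: beta * prior[v] for v in nodes}
        for u in nodes:
            out = adj.get(u, ())
            if out:
                share = (1.0 - beta) * pi[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Assumption: a dangling node returns its mass to the root set.
                for v in nodes:
                    new[v] += (1.0 - beta) * pi[u] * prior[v]
        if max(abs(new[v] - pi[v]) for v in nodes) < tol:
            return new
        pi = new
    return pi

# Hypothetical toy graph: a cycle a -> b -> c -> a, plus d, which links in but has no in-links.
adj = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank_with_priors(adj, roots=["a"])
```

Because d is neither a root nor the target of any link, its score drops to zero, while the score decays along the cycle with distance from the root a.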

Page 23: Applications of Relative Importance

Experiments (Simulated Data)

[Figure: simulated graph on nodes A-J]

Page 24: Applications of Relative Importance

Experiments (Simulated Data)

[Figure: simulated graph on nodes A-J]

- More complex
- In- and out-degrees changed
- Shortest-path lengths between nodes changed (e.g.: A-B)
- In the analysis that follows, R = {A, F}

Page 25: Applications of Relative Importance

Experiments (Simulated Data)

[Figure: simulated graph on nodes A-J]

HITSPa    HITSPh
A .252    F .225
F .241    A .186
G .128    D .162
C .110    B .119
E .099    E .090
H .052    I .067
D .032    H .061
J .025    J .050
I .032    G .028
B .024    C .008

Page 26: Applications of Relative Importance

Experiments (Simulated Data)

[Figure: simulated graph on nodes A-J]

MarkovC   KSMarkov
J .180    H .146
C .133    G .142
G .130    E .142
H .129    J .140
E .111    C .120
I .101    I .098
F .069    F .087
D .051    D .061
A .047    A .034
B .044    B .024

Page 27: Applications of Relative Importance

Experiments (9/11 Terrorist Network)
- 63 nodes (terrorists)
- 308 edges (interactions)

Page 28: Applications of Relative Importance

Rank  PRankP        HITSP         WKPaths       MarkovC     KSMarkov
1     Khemais       Khemais       Beghal        Atta        Khemais
2     Beghal        Beghal        Khemais       Al-Shehhi   Beghal
3     Moussaoui     Atta          Moussaoui     Al-Shibh    Moussaoui
4     Maaroufi      Moussaoui     Maaroufi      Moussaoui   Maaroufi
5     Qatada        Maaroufi      Bensakhria    Jarrah      Qatada
6     Daoudi        Qatada        Daoudi        Hanjour     Daoudi
7     Courtaillier  Bensakhria    Qatada        Al-Omari    Bensakhria
8     Bensakhria    Daoudi        Walid         Khemais     Courtaillier
9     Walid         Courtaillier  Courtaillier  Qatada      Walid
10    Khammoun      Khammoun      Khammoun      Bahaji      Khammoun

Page 29: Applications of Relative Importance

Conclusion

Provides a first step toward addressing ‘relative importance’

Scaling for algorithms such as Markov Chaining can be an issue

Using different algorithms and comparing results can reveal interesting information

…Paper Analysis…

Page 30: Applications of Relative Importance

References
- White, Smyth. Algorithms for Estimating Relative Importance in Networks. SIGKDD ’03.
- Page, Brin, Motwani, Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford University, Computer Science Department Technical Report.
- Wikipedia on Markov Chains: http://en.wikipedia.org/wiki/Markov_chain and http://en.wikipedia.org/wiki/Examples_of_Markov_chains