NetMine: Mining Tools for Large Graphs

Deepayan Chakrabarti, Yiping Zhan, Daniel Blandford, Christos Faloutsos, Guy Blelloch


Transcript of NetMine: Mining Tools for Large Graphs

Page 1: NetMine: Mining Tools for Large Graphs

NetMine: Mining Tools for Large Graphs

Deepayan Chakrabarti, Yiping Zhan, Daniel Blandford, Christos Faloutsos, Guy Blelloch

Page 2: NetMine: Mining Tools for Large Graphs

Introduction

► Graphs are ubiquitous

Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]

Page 3: NetMine: Mining Tools for Large Graphs

Graph “Patterns”

What does a real-world graph look like? Patterns/“Laws”:

• Degree distributions/power-laws

[Figure: Count vs. Outdegree, labeled “Power Laws”]
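A count-vs-outdegree plot of the kind shown on this slide can be sketched as follows. This is a minimal illustration, assuming the graph is available as a list of directed (source, target) pairs; the function names `outdegree_counts` and `loglog_slope` and the toy edge list are hypothetical and are not part of NetMine. A roughly straight line on log-log axes is what suggests a power law.

```python
import math
from collections import Counter

def outdegree_counts(edges):
    """Map each outdegree value to the number of nodes that have it."""
    outdeg = Counter(src for src, _ in edges)
    return Counter(outdeg.values())

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) pairs."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy edge list (made up): node 0 has outdegree 4, node 1 has outdegree 2,
# and nodes 2..5 each have outdegree 1.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3),
         (2, 0), (3, 0), (4, 0), (5, 0)]
counts = outdegree_counts(edges)              # outdegree -> number of nodes
slope = loglog_slope(sorted(counts.items()))  # negative for a decaying count
```

On real data the fit is usually restricted to the range where the points actually look linear; that choice is left out of this sketch.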

Page 4: NetMine: Mining Tools for Large Graphs

Graph “Patterns”

What does a real-world graph look like? Patterns/“Laws”:

• Degree distributions/power-laws
• “Small-world”
• “Scree” plots
• and others…

[Figure: Hop-plot; label: Effective Diameter]
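The hop-plot and effective diameter named on this slide can be sketched with breadth-first search. The sketch assumes the common reading of these terms: the hop-plot counts node pairs within h hops, and the effective diameter is the smallest h that covers a chosen fraction of reachable pairs. The 90% default, the function names, and the toy path graph are assumptions for illustration, not taken from the slides.

```python
from collections import deque

def hop_plot(adj):
    """Return pairs_within[h]: the number of ordered node pairs (u, v),
    u != v, at shortest-path distance at most h. adj maps node -> neighbors."""
    dist_counts = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:
                dist_counts[d] = dist_counts.get(d, 0) + 1
    pairs_within, total = {}, 0
    for h in sorted(dist_counts):
        total += dist_counts[h]
        pairs_within[h] = total
    return pairs_within

def effective_diameter(pairs_within, fraction=0.9):
    """Smallest hop count that covers `fraction` of all reachable pairs."""
    target = fraction * max(pairs_within.values())
    return min(h for h, n in pairs_within.items() if n >= target)

# Toy path graph 0-1-2-3-4 (made up for illustration).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
plot = hop_plot(path)
```

Running one BFS per node is quadratic in the worst case, so this only illustrates the definition; large graphs need approximate counting.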

Page 5: NetMine: Mining Tools for Large Graphs

Graph “Patterns”

Why do we like them?

• They capture interesting properties of graphs.
• They provide “condensed information” about the graph.
• They are needed to build/test realistic graph generators (useful for simulation studies, extrapolations/sampling).
• They help detect abnormalities and outliers.

Page 6: NetMine: Mining Tools for Large Graphs

Graph Patterns

• Degree-distributions: “power laws”
• “Small-world”: small diameter
• “Scree” plots
• …
• What else?

Page 7: NetMine: Mining Tools for Large Graphs

Our Work

The NetMine toolkit contains all the patterns mentioned before, and adds:

• The “min-cut” plot: a novel pattern which carries interesting information about the graph.
• A-plots: a tool to quickly find suspicious subgraphs/nodes.

Page 8: NetMine: Mining Tools for Large Graphs

Outline

• Problem definition
• “Min-cut” plots (+ experiments)
• A-plots (+ experiments)
• Conclusions

Page 9: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

What is a min-cut? A split of the graph into two partitions of almost equal size that minimizes the number of edges cut.

[Figure: example graph, size of mincut = 2]
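The definition above can be made concrete with an exhaustive search on a tiny graph. This is only an illustrative sketch: it tries every balanced split, which takes exponential time, and the function name `balanced_min_cut` and the example graph are made up. It is not the partitioning method used in the experiments.

```python
from itertools import combinations

def balanced_min_cut(nodes, edges):
    """Exhaustively find a split into two halves (sizes differ by at most 1)
    that cuts the fewest edges. Exponential time: for tiny graphs only."""
    nodes = list(nodes)
    best_cut, best_half = None, None
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        # An edge is cut when its endpoints fall on different sides.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best_cut is None or cut < best_cut:
            best_cut, best_half = cut, side
    return best_cut, best_half

# Two triangles joined by two edges (a made-up example).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (1, 4)]
cut_size, half = balanced_min_cut(range(6), edges)
```

Here the best split separates the two triangles and cuts only the two joining edges.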

Page 10: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

Do min-cuts recursively.

[Figure: N nodes, mincut size = sqrt(N); plot of log (mincut-size / #edges) vs. log (# edges)]

Page 11: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

Do min-cuts recursively.

[Figure: N nodes, with a new min-cut; plot of log (mincut-size / #edges) vs. log (# edges)]

Page 12: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

Do min-cuts recursively.

[Figure: N nodes, with a new min-cut; plot of log (mincut-size / #edges) vs. log (# edges), slope = -0.5]

For a d-dimensional grid, the slope is -1/d.
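The -1/d slope can be checked numerically. The sketch below is not a partitioner: it assumes that the balanced min-cut of a d-dimensional grid with n nodes per side is the cut through the middle along one axis, of size n^(d-1), and uses closed-form edge counts for grids of decreasing side length as stand-ins for the subgraphs that recursive bisection would produce. The function names and the side lengths are hypothetical.

```python
import math

def grid_point(n, d):
    """(log #edges, log(mincut / #edges)) for a d-dimensional grid with
    n nodes per side, cut through the middle along one axis."""
    num_edges = d * n ** (d - 1) * (n - 1)   # (n - 1) edges per line, per axis
    mincut = n ** (d - 1)                    # one edge per line crossing the cut
    return math.log(num_edges), math.log(mincut / num_edges)

def fitted_slope(points):
    """Least-squares slope through (x, y) points."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def grid_slope(d, sides=(16, 32, 64, 128, 256, 512, 1024)):
    """Slope of the min-cut plot for d-dimensional grids of the given sides."""
    return fitted_slope([grid_point(n, d) for n in sides])

slope_2d = grid_slope(2)   # close to -1/2
slope_3d = grid_slope(3)   # close to -1/3
```

The fitted values approach -1/d as the grids grow; small grids pull the fit slightly away from the limit.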

Page 13: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

[Figure: log (mincut-size / #edges) vs. log (# edges), slope = -1/d]

For a d-dimensional grid, the slope is -1/d.

[Figure: log (mincut-size / #edges) vs. log (# edges)]

For a random graph, the slope is 0.

Page 14: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

Min-cut sizes have important effects on graph properties, such as:

• efficiency of divide-and-conquer algorithms
• compact graph representation
• difference of the graph from well-known graph types (for example, slope = 0 for a random graph)

Page 15: NetMine: Mining Tools for Large Graphs

“Min-cut” plot

What does it look like for a real-world graph?

[Figure: empty plot of log (mincut-size / #edges) vs. log (# edges), marked “?”]

Page 16: NetMine: Mining Tools for Large Graphs

Experiments

Datasets:

• Google Web Graph: 916,428 nodes and 5,105,039 edges
• Lucent Router Graph: undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
• User Website Clickstream Graph: 222,704 nodes and 952,580 edges

Page 17: NetMine: Mining Tools for Large Graphs

Experiments

Used the METIS algorithm [Karypis, Kumar, 1995]

[Figure: Google Web graph, log (mincut-size / #edges) vs. log (# edges); slope ~ -0.4]

• Google Web graph
• Values along the y-axis are averaged
• We observe a “lip” for large numbers of edges
• A slope of -0.4 corresponds to a 2.5-dimensional grid!

Page 18: NetMine: Mining Tools for Large Graphs

Experiments

Same results for other graphs too…

[Figure: Lucent Router graph, log (mincut-size / #edges) vs. log (# edges); slope ~ -0.57]

[Figure: Clickstream graph, log (mincut-size / #edges) vs. log (# edges); slope ~ -0.45]
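Reading a measured slope as a grid dimension, as done for the Google Web graph (slope -0.4, 2.5-dimensional grid), amounts to inverting slope = -1/d. The sketch below applies the same arithmetic to the two slopes on this slide. That step is an extrapolation rather than a claim from the slides, and it assumes the slope labels pair up with the graph names in the order they appear. The function name is hypothetical.

```python
def implied_grid_dimension(slope):
    """Invert slope = -1/d to get the dimension d of a grid with that slope."""
    if slope >= 0:
        raise ValueError("a grid-like slope must be negative")
    return -1.0 / slope

# Slopes as reported on the slides (approximate values).
observed = {
    "Google Web graph": -0.4,
    "Lucent Router graph": -0.57,
    "Clickstream graph": -0.45,
}
dims = {name: implied_grid_dimension(s) for name, s in observed.items()}
# dims["Google Web graph"] is 2.5; the other two come out near 1.75 and 2.22.
```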

Page 19: NetMine: Mining Tools for Large Graphs

Observations

• Linear slope for some range of values
• “Lip” for high #edges
• Far from random graphs (because slope ≠ 0)

Page 20: NetMine: Mining Tools for Large Graphs

Outline

• Problem definition
• “Min-cut” plots (+ experiments)
• A-plots (+ experiments)
• Conclusions

Page 21: NetMine: Mining Tools for Large Graphs

A-plots

How can we find abnormal nodes or subgraphs?

• Visualization, but most graph visualization techniques do not scale to large graphs!

Page 22: NetMine: Mining Tools for Large Graphs

A-plots

However, humans are pretty good at “eyeballing” data

Our idea: Sort the adjacency matrix in novel ways and plot the matrix so that patterns become visible to the user

We will demonstrate this on the SCAN+Lucent Router graph (284,805 nodes and 898,492 edges)

Page 23: NetMine: Mining Tools for Large Graphs

A-plots

Three types of such plots for undirected graphs…

RV-RV (RankValue vs RankValue): Sort nodes based on their “network value” (~first eigenvector)

[Figure: Rank of Network Value vs. Rank of Network Value]
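One way to produce the dots of an RV-RV plot is sketched below. It assumes that “network value” means a node's entry in the principal eigenvector of the adjacency matrix, and it approximates that vector with plain power iteration. The slides do not say how NetMine computes it, so the method, the function names, and the toy graph are illustrative assumptions.

```python
import math

def network_values(adj, iterations=200):
    """Approximate the principal eigenvector of the adjacency matrix by
    power iteration on A + I (the shift avoids oscillation on bipartite
    graphs and leaves the eigenvectors unchanged)."""
    value = {u: 1.0 for u in adj}
    for _ in range(iterations):
        nxt = {u: value[u] + sum(value[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(x * x for x in nxt.values()))
        value = {u: x / norm for u, x in nxt.items()}
    return value

def rank_by(score):
    """Rank 0 goes to the highest score; ties are broken by node id."""
    order = sorted(score, key=lambda u: (-score[u], u))
    return {u: r for r, u in enumerate(order)}

def rv_rv_points(adj):
    """One (rank, rank) dot per nonzero entry of the adjacency matrix."""
    rank = rank_by(network_values(adj))
    return sorted((rank[u], rank[v]) for u in adj for v in adj[u])

# Toy undirected graph (made up): triangle 0-1-2 with node 3 hanging off node 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
points = rv_rv_points(adj)
```

Plotting `points` as a scatter gives the sorted adjacency matrix; for an undirected graph the picture is symmetric about the diagonal.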

Page 24: NetMine: Mining Tools for Large Graphs

A-plots

Three types of such plots for undirected graphs…

RD-RD (RankDegree vs RankDegree): Sort nodes based on their degree

[Figure: Rank of Degree of node vs. Rank of Degree of node]
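The RD-RD ordering needs only node degrees. A minimal sketch, assuming the graph is an adjacency dictionary and that rank 0 goes to the highest-degree node; the tie-breaking rule, the function name, and the toy graph are choices made here for illustration.

```python
def rd_rd_points(adj):
    """(rank, rank) dots for every adjacency-matrix entry, with nodes
    ranked by degree (rank 0 = highest degree, ties broken by node id)."""
    order = sorted(adj, key=lambda u: (-len(adj[u]), u))
    rank = {u: r for r, u in enumerate(order)}
    return sorted((rank[u], rank[v]) for u in adj for v in adj[u])

# Toy graph (made up): a star around "hub" plus a separate 2-node component.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"],
       "x": ["y"], "y": ["x"]}
points = rd_rd_points(adj)
```

In this toy example the 2-node component x-y ends up as a pair of dots at the low-degree end of both axes, away from the dots that involve the hub.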

Page 25: NetMine: Mining Tools for Large Graphs

A-plots

Three types of such plots for undirected graphs…

D-RV (Degree vs RankValue): Sort nodes according to “network value”, and show their corresponding degree

Page 26: NetMine: Mining Tools for Large Graphs

RV-RV plot (RankValue vs RankValue)

We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal (even though there are no self-loops)!

[Figure: Rank of Network Value vs. Rank of Network Value; label: Stripes]

Page 27: NetMine: Mining Tools for Large Graphs

RV-RV plot (RankValue vs RankValue)

[Figure: Rank of Network Value vs. Rank of Network Value]

The “teardrop” structure can be explained by degree-1 and degree-2 nodes

NV1 = 1/λ * NV2

[Figure: nodes labeled 1 and 2]
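The relation NV1 = 1/λ * NV2 follows from the eigenvector equation if node 1 is a degree-1 node whose only neighbor is node 2 and λ is the largest eigenvalue: row 1 of A·NV = λ·NV reads NV2 = λ·NV1. That reading of the figure is an assumption. The sketch below checks the relation numerically on a made-up graph, using power iteration as a stand-in for whatever eigen-solver was actually used.

```python
import math

def principal_eigenpair(adj, iterations=500):
    """Power iteration on A + I; returns (lambda, unit eigenvector) for A."""
    vec = {u: 1.0 for u in adj}
    for _ in range(iterations):
        nxt = {u: vec[u] + sum(vec[v] for v in adj[u]) for u in adj}
        norm = math.sqrt(sum(x * x for x in nxt.values()))
        vec = {u: x / norm for u, x in nxt.items()}
    # Rayleigh quotient v^T A v of the unit vector gives the eigenvalue of A.
    lam = sum(vec[u] * vec[v] for u in adj for v in adj[u])
    return lam, vec

# Toy graph (made up): triangle 0-1-2, with degree-1 node 3 attached to node 0.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
lam, nv = principal_eigenpair(adj)
ratio_gap = abs(nv[3] - nv[0] / lam)   # close to zero if the relation holds
```

Because a degree-1 node's value is its neighbor's value scaled down by λ, such nodes rank well below their neighbors, which is consistent with the slide's use of low-degree nodes to explain the teardrop.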

Page 28: NetMine: Mining Tools for Large Graphs

RV-RV plot (RankValue vs RankValue)

[Figure: Rank of Network Value vs. Rank of Network Value]

Strong diagonal: nodes are more likely to connect to “similar” nodes

Page 29: NetMine: Mining Tools for Large Graphs

RD-RD (RankDegree vs RankDegree)

[Figure: Rank of Degree of node vs. Rank of Degree of node]

• Isolated dots due to 2-node isolated components

Page 30: NetMine: Mining Tools for Large Graphs

D-RV (Degree vs RankValue)

[Figure: Degree vs. Rank of Network Value]

Page 31: NetMine: Mining Tools for Large Graphs

D-RV (Degree vs RankValue)

[Figure: Degree vs. Rank of Network Value]

Why?

Page 32: NetMine: Mining Tools for Large Graphs

Explanation of “Spikes” and “Stripes”

RV-RV plot had stripes; D-RV plot shows spikes. Why?

• “Stripe” nodes: degree-2 nodes connecting only to the “Spike” nodes
• “Spike” nodes: high degree, but all edges to “Stripe” nodes

[Figure label: Stripe]
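The two definitions above can be turned into a simple filter over an adjacency dictionary. This is a sketch of the stated criteria only, not the procedure used to find the subgraph in the router data. The cutoff `min_spike_degree` for “high degree” is a made-up parameter, as are the function name and the toy graph.

```python
def find_spikes_and_stripes(adj, min_spike_degree=3):
    """Spike: degree >= min_spike_degree and every neighbor has degree 2.
    Stripe: degree-2 node whose two neighbors are both spikes."""
    spikes = {u for u in adj
              if len(adj[u]) >= min_spike_degree
              and all(len(adj[v]) == 2 for v in adj[u])}
    stripes = {u for u in adj
               if len(adj[u]) == 2 and all(v in spikes for v in adj[u])}
    return spikes, stripes

# Toy graph (made up): two spike nodes sharing three stripe nodes,
# plus an ordinary triangle a-b-c that should not be flagged.
adj = {
    "s1": ["t1", "t2", "t3"], "s2": ["t1", "t2", "t3"],
    "t1": ["s1", "s2"], "t2": ["s1", "s2"], "t3": ["s1", "s2"],
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
}
spikes, stripes = find_spikes_and_stripes(adj)
```

The triangle's nodes also have degree 2, but none of their neighbors are spikes, so they are left out.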

Page 33: NetMine: Mining Tools for Large Graphs

A-plots

They helped us detect a buried abnormal subgraph in a large real-world dataset, which can then be taken to the domain experts.

Page 34: NetMine: Mining Tools for Large Graphs

Outline

• Problem definition
• “Min-cut” plots (+ experiments)
• A-plots (+ experiments)
• Conclusions

Page 35: NetMine: Mining Tools for Large Graphs

Conclusions

We presented:

• “Min-cut” plot: a novel graph pattern with relevance for many algorithms and applications
• A-plots: which help us find interesting abnormalities

All the methods are scalable. Their usage was demonstrated on large real-world graph datasets.

Page 36: NetMine: Mining Tools for Large Graphs

RV-RV plot (RankValue vs RankValue)

We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal.

[Figure: Rank of Network Value vs. Rank of Network Value]

Page 37: NetMine: Mining Tools for Large Graphs

RD-RD (RankDegree vs RankDegree)

[Figure: Rank of Degree of node vs. Rank of Degree of node]

• Isolated dots due to 2-node isolated components