Graph Mining & Community Detection An Introduction to Social Networks Data Analysis Etienne Cuvelier 1 1 Ecole Centrale Paris eBISS 2011, 07/07/2011 Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 1 / 56


Graph Mining & Community Detection
An Introduction to Social Networks Data Analysis

Etienne Cuvelier (Ecole Centrale Paris)

eBISS 2011, 07/07/2011

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 1 / 56


Graph Mining and SNDA. Why?

My set of Facebook friends reduces to 3 main communities.

What you don't tell, your network can tell for you (MIT's "Gaydar" experiment).


A talk with a minimum of formulas?

Stephen Hawking: "My editor told me that each equation I included in the book would halve sales." ("A Brief History of Time")

But... min > 0! (Argh! A first formula!)


Outline

1 Social Networks and Graphs Basics
2 Defining Communities in Social Networks
3 Measures of Belonging to a Community
4 Detection Algorithms
5 Software
6 Conclusions


1 Social Networks and Graphs Basics


The Start of Graph Theory

How can one walk through the city of Königsberg so as to cross each bridge once and only once? (A typical mathematician's game.)

Figures (source: Wikipedia).

Using graphs, Euler proved that the problem has no solution.
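Euler's argument can be checked mechanically. The sketch below is only an illustration: it assumes the usual encoding of the problem as a multigraph with four land masses (the labels A to D and the bridge list are not in the slides) and uses the classical necessary condition that a walk crossing every edge exactly once requires zero or two vertices of odd degree.

```python
from collections import Counter

# Assumed encoding of the seven bridges: each pair is one bridge between
# two land masses (the labels A-D are illustrative, not from the slides).
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def odd_degree_vertices(edges):
    """Return the sorted vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def may_have_euler_walk(edges):
    """Necessary degree condition for a walk using each edge exactly once.

    Connectivity of the graph is assumed, not checked.
    """
    return len(odd_degree_vertices(edges)) in (0, 2)


odd = odd_degree_vertices(BRIDGES)
print(odd, may_have_euler_walk(BRIDGES))
```

With this encoding all four land masses have odd degree, so the condition fails, which is consistent with Euler's conclusion.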


The Graph Theory Paradigm

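A graph is denoted $G = (V, E)$, with
- $V$ the set of vertices,
- $E$ the set of edges.

$\{v, w\}$ is the edge connecting vertices $v$ and $w$.

$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{pmatrix} \quad \text{where} \quad a_{i,j} = \begin{cases} 1 & \text{if } v_i \text{ and } v_j \text{ are connected,} \\ 0 & \text{otherwise.} \end{cases}$$

If $G$ is a weighted graph, then $a_{i,j} = \omega(v_i, v_j)$, and thus $A = W = (w_{i,j}) = (\omega(v_i, v_j))$.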
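As a small illustration of the matrix encoding, here is a minimal sketch that builds A (or W) from an edge list. The function name, the 0-based vertex numbering and the toy graph are assumptions made for the example, not part of the talk.

```python
def adjacency_matrix(n, edges, weights=None):
    """Build the n x n matrix A of an undirected graph on vertices 0..n-1.

    edges   -- iterable of pairs (i, j), one per edge {v_i, v_j}
    weights -- optional dict mapping (i, j) to the weight of that edge;
               without it every edge gets the value 1
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        w = 1 if weights is None else weights[(i, j)]
        A[i][j] = w
        A[j][i] = w  # undirected graph: the matrix is symmetric
    return A


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant vertex 3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, EDGES)
W = adjacency_matrix(4, EDGES, weights={(0, 1): 0.5, (0, 2): 2.0,
                                        (1, 2): 1.0, (2, 3): 3.0})
for row in A:
    print(row)
```

A list of lists is enough at this scale; for large sparse social networks a sparse representation would normally be preferred.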


The Graph Theory Goes Social

- Moreno (1933) was the first to use points and lines to depict social configurations.
- Cartwright and Harary (1956) made the link with graph theory.
- Individuals are represented by points, called nodes or vertices, and social relationships are represented by lines.

Figure: Zachary's Karate Club network (34 nodes, numbered 0 to 33).


Directed or not?

Figure: an undirected graph on 10 nodes (e.g. Facebook), with edges written $\{v, w\}$.

Figure: a directed graph on 10 nodes (e.g. Twitter), with edges written $(v, w)$.


Graph Theory Basic Notions

- $|V| = n$ is called the order of the graph.
- $|E| = m$ is called the size of the graph.
- If $|E| = n(n - 1)/2$, i.e. every pair of vertices is connected, the graph is complete.
- $v$ and $u$ are neighbours if they are connected by an edge.
- The neighbourhood of a node $v$ is denoted $\Gamma(v)$.
- A subgraph $G' = (V', E')$ of $G = (V, E)$ is such that $V' \subset V$, $E' \subset E$ and $\{v, u\} \in E' \Rightarrow v, u \in V'$.
- A subset $C$ of $V$ defines an induced subgraph $G(C) = (C, E(C))$, where $E(C) = \{\{v, u\} \in E \mid v, u \in C\}$.
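These notions translate almost literally into code. The following is a minimal sketch on a hypothetical toy graph stored as an edge list; the function names are illustrative choices.

```python
def neighbourhood(edges, v):
    """Gamma(v): the set of vertices sharing an edge with v."""
    return {b if a == v else a for a, b in edges if v in (a, b)}


def induced_edges(edges, C):
    """E(C): the edges of E with both endpoints in C."""
    C = set(C)
    return [(u, v) for u, v in edges if u in C and v in C]


def is_complete(n, edges):
    """True if |E| = n(n-1)/2 (simple graph: no loops, no repeated edges)."""
    return len(edges) == n * (n - 1) // 2


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant vertex 3.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (1, 2), (2, 3)]
TRIANGLE = induced_edges(E, {0, 1, 2})

print("order n =", len(V), "size m =", len(E))
print("Gamma(2) =", neighbourhood(E, 2))
print("E({0,1,2}) =", TRIANGLE)
print("complete?", is_complete(len(V), E), is_complete(3, TRIANGLE))
```

The whole graph is not complete, but the subgraph induced by the triangle is.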


Graph Theory Basic Notions (ctd)

- A path $P$ is a subgraph $P = (V(P), E(P))$ such that $V(P) = \{v_{i_0}, \cdots, v_{i_k}\}$ and $E(P) = \{\{v_{i_0}, v_{i_1}\}, \{v_{i_1}, v_{i_2}\}, \cdots, \{v_{i_{k-1}}, v_{i_k}\}\}$.
- $k$ is the length of the path.
- If no vertex is repeated, then the path is simple.
- If there exists a path between $v$ and $u$, they are connected.
- The graph is a connected graph if, for all $v, u$, there is at least one path connecting $v$ and $u$.
- A maximal connected subgraph is called a connected component.

Figure: diameter of the Zachary Karate Club network (34 nodes, numbered 0 to 33).
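Path length, connectivity and diameter can be sketched with a breadth-first search. This is a minimal illustration, assuming an unweighted undirected graph stored as a dict of neighbour sets (the toy graph and the function names are hypothetical) and taking the diameter to be the longest shortest path of a connected graph.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    """Vertex sets of the maximal connected subgraphs."""
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            component = set(bfs_distances(adj, v))
            seen |= component
            components.append(component)
    return components


def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


# Hypothetical toy graph: a path 0-1-2-3 and a separate edge 4-5.
ADJ = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
COMPONENTS = connected_components(ADJ)
PATH = {v: ADJ[v] for v in (0, 1, 2, 3)}
print(COMPONENTS, diameter(PATH))
```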


Graph Theory Basic Notions (ctd)

- Density of a subgraph $G(C) = (V(C), E(C))$: ratio between $|E(C)|$ and the maximum possible number of edges:

$$\delta(G(C)) = \frac{|E(C)|}{|V(C)|(|V(C)| - 1)/2} \tag{1}$$

- A partition of $V$ into two subsets $C$ and $V \setminus C$ is called a cut.
- The cut size is the number of edges joining vertices of $C$ with vertices of $V \setminus C$:

$$c(C, V \setminus C) = |\{\{u, v\} \in E \mid u \in C, v \in V \setminus C\}| \tag{2}$$

- For an unweighted graph, the degree of a vertex $v$, $\deg(v)$, is the number of incident edges.
- For a weighted graph:

$$\deg(v_i) = \sum_{j=1}^{n} w_{i,j}. \tag{3}$$
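Equations (1) to (3) can be tried on a small example. The sketch below assumes an undirected graph given as an edge list and a weight matrix given as a list of rows; the toy data and function names are illustrative only.

```python
def density(n_vertices, n_edges):
    """Equation (1): edges present over the n(n-1)/2 possible ones."""
    return n_edges / (n_vertices * (n_vertices - 1) / 2)


def cut_size(edges, C):
    """Equation (2): number of edges with exactly one endpoint in C."""
    C = set(C)
    return sum(1 for u, v in edges if (u in C) != (v in C))


def weighted_degree(W, i):
    """Equation (3): sum of row i of the weight matrix W."""
    return sum(W[i])


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
E = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
C = {0, 1, 2}
INSIDE = [(u, v) for u, v in E if u in C and v in C]
# Hypothetical weights on a separate 3-vertex graph.
W = [[0, 2.0, 0.5], [2.0, 0, 1.0], [0.5, 1.0, 0]]

print(density(len(C), len(INSIDE)), cut_size(E, C), weighted_degree(W, 0))
```

The triangle has density 1, and a single edge crosses the cut between the two triangles.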


2 Defining Communities in Social Networks


Detecting Communities

Figure: an example network of 21 nodes (numbered 0 to 20).


But What is a Community?

- In the clustering framework, a community is a cluster of nodes in a graph.
- But a very important question is: what is a cluster?
- Even in the clustering literature there is no complete agreement among authors ([EC02]).
- However, objects inside a cluster must be more similar to each other than to objects outside this cluster.
- Objects are clustered or grouped by maximizing intra-class similarity and minimizing inter-class similarity.
- In the graph framework, clustering means dividing the vertices so that the nodes of a community are more connected with nodes of this community than with nodes outside of the cluster ([Sha07], [For10]).
- This implies that there must exist at least one path between any two nodes of a cluster, and that this path must be internal to the cluster.


Then in a Community...

Connections must be minimal between groups and maximal within groups.

Figure: an example network of 21 nodes (numbered 0 to 20).


Wasserman Criteria

Four criteria:
- Complete mutuality: all members of a subgroup must be "linked" with all other members of the subgroup.
- Reachability: existence (and length) of paths between the vertices of a subgroup.
- Nodal degree: imposes a constraint on the number of adjacent vertices.
- Internal versus external cohesion: the former must be higher than the latter.


Complete mutuality

- In a very strict sense, in a community all members of a subgroup must be "friends" with all other members of the subgroup.
- In graph theory, this corresponds to a clique.
- But defining a community as a clique is far too strict (Alba (1973) calls it "quite stingy"), which leads to relaxed definitions of the notion of clique...

Figures: cliques with 4, 5, 6 and 7 nodes.
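How strict the clique requirement is can be seen on a tiny example. This is a minimal sketch, assuming an undirected graph stored as a dict of neighbour sets; the toy graph and the function name are hypothetical.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """True if every pair of distinct nodes is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


# Hypothetical toy graph: a 4-clique {0,1,2,3} plus vertex 4 attached to 3.
ADJ = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
print(is_clique(ADJ, [0, 1, 2, 3]), is_clique(ADJ, [2, 3, 4]))
```

A single missing edge (here between 2 and 4) is enough to reject a group, which is why relaxed definitions are used in practice.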


Reachability

- An n-clique is a maximal subgraph such that, for any pair of its vertices, there exists at least one geodesic (shortest path) of length no larger than n.
- The classical clique is then a 1-clique.
- But a geodesic path of an n-clique may run outside of the n-clique itself, so the diameter of the subgraph can exceed n...
- This is why the n-clan was defined: an n-clique with diameter no larger than n.
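The difference between the two distance conditions can be illustrated on a cycle of six vertices. The sketch below is an assumption-laden example, not a full n-clique finder: it only tests the distance conditions for a given vertex set and does not check maximality; the graph and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distances(adj, source, allowed=None):
    """BFS distances from source; if allowed is given, stay inside that set."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist and (allowed is None or w in allowed):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def within_n(adj, nodes, n, internal=False):
    """True if all pairs of nodes are at distance <= n.

    internal=False measures geodesics in the whole graph (n-clique condition);
    internal=True only allows paths inside nodes (diameter condition of an n-clan).
    """
    allowed = set(nodes) if internal else None
    for u, v in combinations(nodes, 2):
        if distances(adj, u, allowed).get(v, float("inf")) > n:
            return False
    return True


# Hypothetical example: the 6-cycle 0-1-2-3-4-5-0.
CYCLE = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
S = [0, 2, 4]
print(within_n(CYCLE, S, 2), within_n(CYCLE, S, 2, internal=True))
```

In this example the vertices 0, 2 and 4 are pairwise at distance 2 in the whole graph, but every connecting geodesic leaves the set, so the induced subgraph is not even connected and the set fails the diameter condition.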


Nodal degree

For a given graph $G$ and a cluster $C$:
- A k-plex is a maximal subgraph in which each vertex is adjacent to all other vertices of the subgraph except at most k of them.
- Conversely, a k-core is a maximal subgraph in which each vertex is adjacent to at least k other vertices of the subgraph.
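Both degree-based definitions can be sketched in a few lines. The k-core below is obtained by the common pruning approach (repeatedly dropping vertices with too few remaining neighbours), which is an implementation choice not stated in the slides; the k-plex function only tests the degree condition for a given vertex set and does not check maximality. The toy graph and names are hypothetical.

```python
def k_core(adj, k):
    """Vertex set of the k-core: drop vertices with fewer than k neighbours left."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive


def is_k_plex(adj, nodes, k):
    """True if each vertex misses at most k of the other vertices of nodes."""
    nodes = set(nodes)
    return all(len(nodes - adj[v] - {v}) <= k for v in nodes)


# Hypothetical toy graph: a 4-clique {0,1,2,3} followed by a tail 3-4-5.
ADJ = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(k_core(ADJ, 1), k_core(ADJ, 2), k_core(ADJ, 3))
print(is_k_plex(ADJ, {0, 1, 2, 3, 4}, 3), is_k_plex(ADJ, {0, 1, 2, 3, 4}, 2))
```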


Internal versus external cohesion

For a given graph $G$ and a cluster $C$:
- $\deg_i(C)$: the internal degree of $C$ is the number of internal edges of $C$; for a single vertex $v$, $\deg_i(v, C) = |\Gamma(v) \cap C|$.
- $\deg_e(C)$: the external degree of $C$ is the number of edges with one vertex inside $C$ and the other outside of $C$; for a single vertex $v$, $\deg_e(v, C) = |\Gamma(v) \cap (V \setminus C)|$.
- $\deg(C) = \deg_i(C) + \deg_e(C)$: the degree of $C$.
- If $\deg_e(v, C) = 0$, then assigning $v$ to $C$ is surely a good choice.
- Conversely, if $\deg_i(v, C) = 0$, then we must have $v \notin C$.
- An LS-set, or strong community, is a subgraph $C$ such that for each node $v \in C$ we have $\deg_i(v, C) > \deg_e(v, C)$.
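The vertex-level degrees and the strong community test can be written directly from the definitions. A minimal sketch, assuming an undirected graph stored as a dict of neighbour sets (toy graph and function names are hypothetical):

```python
def internal_degree(adj, v, C):
    """deg_i(v, C): number of neighbours of v inside C."""
    return len(adj[v] & set(C))


def external_degree(adj, v, C):
    """deg_e(v, C): number of neighbours of v outside C."""
    return len(adj[v] - set(C))


def is_strong_community(adj, C):
    """True if every vertex of C has more neighbours inside C than outside."""
    return all(internal_degree(adj, v, C) > external_degree(adj, v, C)
               for v in C)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(internal_degree(ADJ, 2, {0, 1, 2}), external_degree(ADJ, 2, {0, 1, 2}))
print(is_strong_community(ADJ, {0, 1, 2}), is_strong_community(ADJ, {2, 3}))
```

Each triangle passes the test, whereas the pair {2, 3} does not, since both of its vertices have more neighbours outside the pair than inside.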


Internal versus external cohesion

For a given graph $G$ and a cluster $C$:
- Graph density is the ratio between the existing number of edges and the maximum possible number of edges:

$$\delta(G(C)) = \frac{|E(C)|}{|V(C)|(|V(C)| - 1)/2}$$

- Intra-cluster density: quotient of the number of internal edges of $C$ and the maximal possible number of internal edges:

$$\delta_i(C) = \frac{|\{\{u, v\} \in E \mid u, v \in C\}|}{|C|(|C| - 1)/2}$$

- Inter-cluster density: quotient of the existing number of edges with one vertex inside $C$ and the other outside of $C$, and the maximum possible number of edges in this configuration ($|G|$ being the number of vertices of $G$):

$$\delta_e(C) = \frac{|\{\{u, v\} \in E \mid u \in C, v \notin C\}|}{|C|(|G| - |C|)}$$

- For a given partition $\{C_1, \cdots, C_k\}$ we want

$$\sum_{i=1}^{k} \delta_i(C_i) = \delta_i(G \mid C_1, \cdots, C_k) \gg \delta(G).$$
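The two densities are easy to compare on a small graph with an obvious community structure. This sketch assumes an undirected simple graph given as an edge list, with n the number of vertices of the whole graph; the data and function names are illustrative.

```python
def intra_density(edges, C):
    """Internal edges of C over the |C|(|C|-1)/2 possible ones."""
    C = set(C)
    internal = sum(1 for u, v in edges if u in C and v in C)
    return internal / (len(C) * (len(C) - 1) / 2)


def inter_density(edges, C, n):
    """Edges leaving C over the |C|(n-|C|) possible ones."""
    C = set(C)
    crossing = sum(1 for u, v in edges if (u in C) != (v in C))
    return crossing / (len(C) * (n - len(C)))


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
E = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
N = 6
GLOBAL_DENSITY = len(E) / (N * (N - 1) / 2)
print(intra_density(E, {0, 1, 2}), inter_density(E, {0, 1, 2}, N), GLOBAL_DENSITY)
```

For the triangle {0, 1, 2} the intra-cluster density is 1, well above the global density of 7/15, while the inter-cluster density is only 1/9.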


Internal versus external cohesion

Using the cut size, i.e. the number of edges joining vertices of $C$ with vertices of $V \setminus C$, the conductance of a community $C$ is defined so as to take into account the order of the cluster and of the rest of the graph:

$$\Phi(C) = \frac{c(C, V \setminus C)}{\min\{\deg(C), \deg(V \setminus C)\}}$$

where $\deg(C)$ and $\deg(V \setminus C)$ are the total degrees of $C$ and of the rest of the graph. The minimum of $\Phi(C)$ is obtained when $C$ has a low cut size and when the total degrees of the cluster and of its complement are equal.
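A minimal sketch of the conductance follows. It assumes that the total degree of a vertex set is the sum of the degrees of its vertices, and that both C and its complement contain at least one edge endpoint (otherwise the denominator is zero); the toy graph and the function name are hypothetical.

```python
def conductance(adj, C):
    """Cut size divided by the smaller total degree of the two sides."""
    C = set(C)
    rest = set(adj) - C
    cut = sum(len(adj[v] - C) for v in C)        # edges leaving C
    deg_c = sum(len(adj[v]) for v in C)          # total degree of C
    deg_rest = sum(len(adj[v]) for v in rest)    # total degree of V minus C
    return cut / min(deg_c, deg_rest)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(ADJ, {0, 1, 2}), conductance(ADJ, {0}))
```

The balanced split into the two triangles gives 1/7, whereas isolating the single vertex 0 gives 1, the worst possible value here.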


Internal versus external cohesion

- A community with strong inner ties must have a high relative density

$$\rho(C) = \frac{\deg_i(C)}{\deg(C)}.$$

- The edge connectivity of a graph $G$, denoted $k(G)$, is the minimal number of edges to be removed so that $G$ is disconnected.
- A community $C$ can be defined as a highly connected subgraph (HCS), such that

$$k(C) > \frac{n}{2},$$

where $n$ is the number of vertices of $C$.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 24 / 56

Measures of Belonging to a Community

Dissimilarity Measures
A distance measure $d$ between two objects must fulfill the following criteria:

1 separation: $d(u, u) = 0$,
2 symmetry: $d(u, v) = d(v, u)$,
3 triangle inequality: $d(u, v) \le d(u, w) + d(w, v)$.

Similarity Measures
A similarity measure $s$ must fulfill the following criteria:

1 $s(u, u) = k$, where $k$ is a constant,
2 $s(u, v) = s(v, u)$,
3 $s(u, v) \le s(u, u) = k$.

Link Between Similarity and Dissimilarity
A dissimilarity measure $d$ can be converted into a similarity measure $s$ using a strictly decreasing function $\varphi$ with some boundary conditions:

$$d = \varphi(s) \quad \text{and} \quad s = \varphi^{-1}(d).$$
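One possible conversion, given purely as an illustration, is $s = 1/(1+d)$ with inverse $d = 1/s - 1$. This particular choice of $\varphi$ is an assumption (the slides do not name one); it gives $k = 1$ and is strictly decreasing on $d \ge 0$.

```python
def distance_to_similarity(d):
    """s = 1 / (1 + d): equals 1 when d = 0 and decreases towards 0 as d grows."""
    return 1.0 / (1.0 + d)


def similarity_to_distance(s):
    """Inverse conversion d = 1 / s - 1, defined for 0 < s <= 1."""
    return 1.0 / s - 1.0


distances = [0.0, 0.5, 1.0, 3.0]
similarities = [distance_to_similarity(d) for d in distances]
print(similarities)
```

The identical object (distance 0) gets the maximal similarity $k = 1$, and larger distances map to strictly smaller similarities.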

Classical Distance Recall

Recall Distances for Classical Vectors
For two vectors in $\mathbb{R}^n$, $u = (u_1, \dots, u_n)$ and $v = (v_1, \dots, v_n)$:

Euclidean distance:

$$d(u, v) = \sqrt{\sum_{k=1}^{n} (u_k - v_k)^2}$$

Manhattan distance:

$$d_1(u, v) = \sum_{k=1}^{n} |u_k - v_k|$$

Chebyshev distance:

$$d_\infty(u, v) = \max_{k=1,\dots,n} |u_k - v_k|$$

Cosine:

$$\theta(u, v) = \frac{\sum_{k=1}^{n} u_k \cdot v_k}{\sqrt{\sum_{k=1}^{n} (u_k)^2}\,\sqrt{\sum_{k=1}^{n} (v_k)^2}}.$$
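The four measures translate directly into Python. The sketch below uses plain tuples and the standard library only; the function names and the two sample vectors are illustrative choices, not part of the slides.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


u, v = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v), cosine(u, v))
```

For these two vectors the coordinate differences are (3, 4, 0), so the Euclidean, Manhattan and Chebyshev distances are 5, 7 and 4 respectively.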

Measures for Graphs: structural equivalence

Given the adjacency matrix $A = \{a_{i,j}\}$, $u_i$ and $u_j$ are structurally equivalent if they have the same neighbors, i.e. if $d_{i,j} = 0$:

$$d_{i,j} = \sqrt{\sum_{k \ne i,j} (a_{i,k} - a_{j,k})^2}$$

$$A = \begin{pmatrix}
0 & 0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 & 0
\end{pmatrix}$$

[Figure: "Structural Equivalence", the graph on vertices 1 to 5.]
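A minimal sketch of the structural equivalence distance, applied to the rows of the matrix $A$ above. Indices are 0-based in the code (vertex 1 of the slide is index 0); the helper name is hypothetical.

```python
import math

A = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
]


def structural_distance(A, i, j):
    """d_ij: Euclidean distance between rows i and j, skipping columns i and j."""
    return math.sqrt(sum(
        (A[i][k] - A[j][k]) ** 2
        for k in range(len(A)) if k not in (i, j)
    ))


# Pairs of vertices (0-based) whose distance is exactly zero.
equivalent_pairs = [
    (i, j)
    for i in range(len(A)) for j in range(i + 1, len(A))
    if structural_distance(A, i, j) == 0.0
]
print(equivalent_pairs)
```

With this row-based reading of $A$, the pairs at distance zero are (0, 3), (0, 4) and (1, 2), i.e. vertices 1 and 4, 1 and 5, and 2 and 3 in the slide's numbering.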

Measures for Graphs: Pearson Correlation

Another measure directly defined on $A$ is the Pearson correlation matrix:

$$c_{i,j} = \frac{\sum_{k=1}^{n} (a_{i,k} - \mu_i)(a_{j,k} - \mu_j)}{n\,\sigma_i\,\sigma_j} \qquad (4)$$

with $\mu_i = \sum_k a_{i,k}/n$ and $\sigma_i = \sqrt{\sum_k (a_{i,k} - \mu_i)^2 / n}$.

$$C = \begin{pmatrix}
1.00 & 0.16 & -0.16 & 0.16 & 0.66 \\
0.16 & 1.00 & 0.66 & -0.66 & 0.66 \\
-0.16 & 0.66 & 1.00 & -1.00 & 0.16 \\
0.16 & -0.66 & -1.00 & 1.00 & -0.16 \\
0.66 & 0.66 & 0.16 & -0.16 & 1.00
\end{pmatrix}$$

[Figure: "Structural Equivalence", the graph on vertices 1 to 5.]
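Equation (4) can be sketched as follows, here applied to the rows of the 5 x 5 adjacency matrix of the structural equivalence slide. Using that matrix as input is an assumption made for illustration; the code is not claimed to reproduce the correlation matrix printed on the slide.

```python
import math

A = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
]


def pearson(A, i, j):
    """c_ij of equation (4), computed on rows i and j of A."""
    n = len(A[i])
    mu_i = sum(A[i]) / n
    mu_j = sum(A[j]) / n
    sigma_i = math.sqrt(sum((a - mu_i) ** 2 for a in A[i]) / n)
    sigma_j = math.sqrt(sum((a - mu_j) ** 2 for a in A[j]) / n)
    if sigma_i == 0 or sigma_j == 0:
        raise ValueError("correlation is undefined for a constant row")
    covariance_sum = sum((a - mu_i) * (b - mu_j) for a, b in zip(A[i], A[j]))
    return covariance_sum / (n * sigma_i * sigma_j)


C = [[pearson(A, i, j) for j in range(len(A))] for i in range(len(A))]
for row in C:
    print(" ".join(f"{c:6.2f}" for c in row))
```

By construction the diagonal of the result is 1 and the matrix is symmetric, as expected from a correlation matrix.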

Measures for Graphs: Jaccard Index

Another popular seed to build (dis)similarity measures is the Jaccard index, which measures the similarity between sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \qquad (5)$$

A first use of the Jaccard index in the graph theory context is to measure the overlap of the neighborhoods of two nodes $v$ and $u$:

$$\omega(v, u) = \frac{|\Gamma(v) \cap \Gamma(u)|}{|\Gamma(v) \cup \Gamma(u)|} \qquad (6)$$

which is equal to zero when there are no common neighbors, and to one when $v$ and $u$ are structurally equivalent. And, as $0 \le J(A, B) \le 1$, it is easy to define the Jaccard distance:

$$J_\delta(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}. \qquad (7)$$
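Equations (5) to (7) map naturally onto Python sets. In this sketch the neighbourhood $\Gamma(v)$ is read from a dictionary of neighbour sets; the small graph is hypothetical, and returning 0 for two empty sets is a convention chosen here, not something stated on the slide.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 by convention when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbourhood_overlap(adj, v, u):
    """omega(v, u): Jaccard index of the neighbourhoods of v and u."""
    return jaccard(adj[v], adj[u])


def jaccard_distance(a, b):
    return 1.0 - jaccard(a, b)


# Hypothetical graph with edges 0-2, 0-3, 1-2, 1-3 and 3-4.
adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
print(neighbourhood_overlap(adj, 0, 1))  # same neighbours
print(neighbourhood_overlap(adj, 0, 4))  # one common neighbour out of two
print(neighbourhood_overlap(adj, 2, 4))  # no common neighbour
```

Vertices 0 and 1 have identical neighbourhoods, so their overlap is 1 and their Jaccard distance is 0.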

Measures for Graphs: Tanimoto coefficient

The Tanimoto coefficient is an extension of the cosine similarity which coincides with the Jaccard index for binary vectors:

$$T(A, B) = \frac{\sum_{k=1}^{n} a_k \cdot b_k}{\sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k - \sum_{k=1}^{n} a_k \cdot b_k}. \qquad (8)$$

The Tanimoto coefficient is the number of features shared by $A$ and $B$ divided by the total number of features of $A$ and $B$.
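A short sketch of equation (8) for binary vectors, together with a check that it agrees with the Jaccard index of the corresponding feature sets. The two example vectors are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two binary (0/1) vectors, as in equation (8)."""
    shared = sum(x * y for x, y in zip(a, b))
    return shared / (sum(a) + sum(b) - shared)


def as_feature_set(vector):
    """Indices of the features that are present (value 1)."""
    return {k for k, x in enumerate(vector) if x == 1}


a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
set_a, set_b = as_feature_set(a), as_feature_set(b)
jaccard_value = len(set_a & set_b) / len(set_a | set_b)
print(tanimoto(a, b), jaccard_value)
```

Both values are 0.5 here: two shared features out of four features present in at least one vector.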

Plan

1 Social Networks and Graphs Basics
2 Define Communities in Social Network
3 Measures of Belonging to a Community
4 Detection Algorithms
5 Software
6 Conclusions

Partitional Algorithms - Introduction

Partitional Algorithms : k-means

k-means algorithms try to minimize the intra-cluster dissimilarity:

$$g_n(C) = \sum_{i=1}^{k} \sum_{x \in C_i} d(x, c_i)$$

Algorithm:
Starting: determine an initial vector of centers $(c_1^{(0)}, \dots, c_k^{(0)})$,
Repeat until stationarity: $t \leftarrow t + 1$
Assignment: observations go to the cluster with the closest centroid:

$$C_i^{(t+1)} = \left\{ x : d(x, c_i^{(t)}) \le d(x, c_j^{(t)}) \ \forall j = 1, \dots, k \right\}$$

Update: compute the new centroids $(c_1^{(t+1)}, \dots, c_k^{(t+1)})$:

$$c_i^{(t+1)} = \frac{1}{|C_i^{(t+1)}|} \sum_{x \in C_i^{(t+1)}} x$$
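The assignment and update steps can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: $d$ is taken to be the Euclidean distance, the initial centers are passed in by the caller, and the sample points are invented.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment: each observation goes to the cluster with the closest centroid.
        clusters = [[] for _ in centers]
        for x in points:
            i = min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            clusters[i].append(x)
        # Update: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:  # stationarity reached
            break
        centers = new_centers
    # Criterion g_n(C): sum of distances from each point to its cluster center.
    criterion = sum(math.dist(x, c) for cl, c in zip(clusters, centers) for x in cl)
    return clusters, centers, criterion


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, centers, criterion = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(clusters, centers, criterion)
```

Starting the same function from other initial centers is an easy way to observe that the final partition and criterion value can depend on the starting point.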

Partitional Algorithms - Instability

Partitional Algorithms - Spherical Clusters

Partitional Algorithms - Drawbacks

Drawbacks
choice of k: an inappropriate choice leads to non-significant results,
spherical clusters: the algorithms work better when the clusters in the data are spherical,
instability: the random starting partition can lead to a local optimum of the criterion function.

[Figure: "k−means", scatter plot of the clustered points, both axes running from 0 to 15.]

Complexity: $O(m^2 \times n \times k)$ where $m$ is the number of attributes, $n$ the number of objects to cluster and $k$ the number of clusters.

[Figures: four panels with points labelled 0 to 20, with criterion values 44, 43, 45 and 41 respectively.]

Agglomerative Hierarchical Algorithms - Intro

[Figure: left panel "Hierarchical Clustering: The Data", six points labelled 1 to 6 on axes x and y; right panel "HClust, Complete Link", a dendrogram with leaves ordered 6, 4, 5, 3, 1, 2 and a Height axis from 0.0 to 3.0.]

Agglomerative Hierarchical Algorithms - For SNA

[Figure: The Zachary Karate club network (vertices 0 to 33).]

At the starting point, each of the $n$ objects to cluster is its own class: $\{\{x_1\}, \dots, \{x_n\}\}$; then, at each stage, we merge the two most similar clusters.

[Figure: A dendrogram for the Zachary Karate club network, with three parts marked A, B and C.]

Agglomerative Hierarchical Algorithms - The Link Choice

For a given dissimilarity measure $d$ between objects, several dissimilarities $D$ between clusters exist:

the single linkage:

$$D(A, B) = \min\{d(x, y) : x \in A, y \in B\},$$

the complete linkage:

$$D(A, B) = \max\{d(x, y) : x \in A, y \in B\},$$

the average linkage:

$$D(A, B) = \frac{1}{|A| \cdot |B|} \sum_{x \in A} \sum_{y \in B} d(x, y).$$

Drawbacks: vertices of a community may not be correctly classified.
Complexity: $O(n^2)$ for the single linkage and $O(n^2 \log n)$ for the complete and average linkages.
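The three linkages and the merging loop can be sketched as follows. This is a deliberately naive version (it rescans all cluster pairs at every merge, so it does not reach the complexities quoted above); the function names and the one-dimensional sample data are illustrative assumptions.

```python
from itertools import combinations


def single_link(A, B, d):
    return min(d(x, y) for x in A for y in B)


def complete_link(A, B, d):
    return max(d(x, y) for x in A for y in B)


def average_link(A, B, d):
    return sum(d(x, y) for x in A for y in B) / (len(A) * len(B))


def agglomerate(objects, d, linkage, n_clusters):
    """Start from singletons and repeatedly merge the two closest clusters."""
    clusters = [[x] for x in objects]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], d),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def absolute_difference(x, y):
    return abs(x - y)


data = [0.0, 1.0, 1.5, 5.0, 5.2, 9.0]
result = agglomerate(data, absolute_difference, complete_link, n_clusters=3)
print(result)
```

Swapping `complete_link` for `single_link` or `average_link` in the call is enough to compare the behaviour of the three link choices on the same data.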

Divisive Hierarchical Algorithms - Betweenness

[Figure: example network with vertices labelled 0 to 20.]

Divisive Hierarchical Algorithms - Betweenness

Find the connecting edges to find the communities.
Edge betweenness is the number of shortest paths between all vertex pairs that run along the edge.
Divide the graph by finding and "removing" the edges connecting two communities, i.e. the edges lying on the largest number of shortest paths between these communities (maximal edge betweenness):

1 compute the edge betweenness for all edges of the running graph,
2 remove the edge with the largest value (which gives the new running graph).

Stop when there is no improvement of a criterion like the modularity:

$$M(C_1, \dots, C_k) = \sum_i \deg_e(C_i) - \sum_i \deg_i(C_i)$$

Complexity: for a graph with $n$ vertices and $m$ edges, the edge betweenness can be computed in $O(n \cdot m)$ for unweighted graphs and in $O(n \cdot m + n^2 \log n)$ for weighted graphs.
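Steps 1 and 2 can be sketched for an unweighted graph with a breadth-first accumulation in the style of Brandes' algorithm. Two assumptions are made here: when a vertex pair has several shortest paths, each path contributes an equal fraction (with unique shortest paths this is the plain count), and the toy graph of two triangles joined by one edge is invented for the example. The stopping criterion is not implemented.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {vertex: set(neighbours)}."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s to v
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        for v in reversed(order):  # push path counts back towards s
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every vertex pair was visited from both of its endpoints.
    return {edge: value / 2.0 for edge, value in score.items()}


def remove_most_central_edge(adj):
    """One divisive step: delete the edge with the largest betweenness."""
    scores = edge_betweenness(adj)
    edge = max(scores, key=scores.get)
    u, v = tuple(edge)
    adj[u].discard(v)
    adj[v].discard(u)
    return edge


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
scores = edge_betweenness(graph)
removed = remove_most_central_edge(graph)
print(scores[frozenset((2, 3))], sorted(removed))
```

All 9 shortest paths between the two triangles run along the edge 2-3, so it is removed first and the graph splits into the two triangles.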

Spectral Methods - intro

[Figure: Two connected components (vertices 0 to 3 and vertices 4 to 6).]

Adjacency Matrix

$$W = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}.$$

Spectral Decomposition

$$W = Q \Lambda Q^{-1}$$

Eigenvectors matrix

$$Q = \begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & -1 & 1 & 1 & 0 & 0 & 0 \\
-3 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & -2 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -2 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & -1 & 1 & 1
\end{pmatrix}.$$

spectrum = {4, 3, 1, 0} ∪ {3, 3, 0} (these values are the eigenvalues of the Laplacian matrix $L = D - W$ of each component, where $D$ is the diagonal matrix of the degrees).

With the eigenvalues sorted in decreasing order, spectrum = {4, 3, 3, 3, 1, 0, 0} and

$$Q = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1 \\
1 & 0 & 0 & -1 & 1 & 0 & 1 \\
-3 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & -2 & 0 & 1 \\
0 & 0 & -2 & 0 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & 1 & 0
\end{pmatrix}.$$
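The block structure of the eigenvectors can be checked by hand or with a few lines of integer arithmetic. The sketch below takes the first listing of $Q$ and the spectrum {4, 3, 1, 0} ∪ {3, 3, 0}, and tests each column against the Laplacian $L = D - W$; treating the listed values as the Laplacian spectrum is an assumption of this check, which the test itself then confirms.

```python
W = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0],
]
Q = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, -1, 1, 1, 0, 0, 0],
    [-3, 0, 0, 1, 0, 0, 0],
    [1, 0, -2, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, -2, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, -1, 1, 1],
]
spectrum = [4, 3, 1, 0, 3, 3, 0]  # one eigenvalue per column of Q

n = len(W)
degrees = [sum(row) for row in W]
# Laplacian L = D - W, with the degrees on the diagonal of D.
L = [[(degrees[i] if i == j else 0) - W[i][j] for j in range(n)] for i in range(n)]


def column(M, j):
    return [row[j] for row in M]


def is_eigenpair(M, lam, x):
    """True when M x equals lam x exactly (all entries are integers here)."""
    Mx = [sum(a * b for a, b in zip(row, x)) for row in M]
    return Mx == [lam * xi for xi in x]


laplacian_checks = [is_eigenpair(L, lam, column(Q, j)) for j, lam in enumerate(spectrum)]
print(laplacian_checks)
```

Each eigenvector is zero outside one of the two components, which is what makes the components readable directly from $Q$.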

Spectral Methods - intro

[Figure: Two 'almost' connected components.]

Adjacency Matrix

$$W = \begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 & 0
\end{pmatrix}.$$

Spectral Decomposition

$$W = Q \Lambda Q^{-1}$$

Eigenvectors matrix (only the last two columns are shown)

$$Q = \begin{pmatrix}
\cdot & \cdot & \cdot & \cdot & \cdot & -18 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & -18 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & -10 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & -18 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 15 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 25 & 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & 25 & 1
\end{pmatrix}.$$

spectrum = {· · ·}

Spectral Methods - Algorithm

For efficiency reasons, it is recommended not to work directly with the adjacency matrix $W$ but with the Laplacian matrix:

$$L = D - W,$$

where $D$ is the diagonal matrix with the degrees $\deg(v_i)$ on its diagonal, and then to choose a normalization:

$$L_{rw} = D^{-1} L \quad \text{or} \quad L_{sym} = D^{-1/2} L D^{-1/2}$$

The algorithm for spectral clustering is the following:

1 compute the eigenvalues and sort them such that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$,
2 compute the last $k$ eigenvectors $\vec{u}_{n-k+1}, \dots, \vec{u}_n$,
3 form the matrix $U \in \mathbb{R}^{n \times k}$ with $\vec{u}_{n-k+1}, \dots, \vec{u}_n$ as columns, and the matrix $Y = U^t$, whose columns are the points $y_1, \dots, y_n$,
4 cluster the points $(y_i)_{i=1,\dots,n}$ using the k-means algorithm into clusters $A_1, \dots, A_k$,
5 build the communities $C_1, \dots, C_k$ such that $C_i = \{v_j \mid y_j \in A_i\}$.
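The five steps can be sketched with NumPy as below. Several choices here are assumptions made for brevity rather than the slide's prescription: the unnormalised Laplacian $L = D - W$ is used instead of $L_{rw}$ or $L_{sym}$, the k-means step is a small hand-written loop with a deterministic farthest-point start, and the input is the "almost connected" 7-vertex graph with the edge between vertices 2 and 4 entered symmetrically.

```python
import numpy as np


def spectral_communities(W, k, n_iter=100):
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W          # Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in increasing order
    U = eigvecs[:, :k]                      # eigenvectors of the k smallest eigenvalues
    # The rows of U are the points y_1, ..., y_n to cluster.
    centers = [U[0]]
    for _ in range(1, k):                   # farthest-point initialisation
        gaps = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(gaps))])
    centers = np.array(centers)
    for _ in range(n_iter):                 # plain k-means on the rows of U
        distances = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        new_centers = np.array([
            U[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Communities C_i = {v_j : y_j in A_i}, as sorted lists of vertex indices.
    return [sorted(int(v) for v in np.flatnonzero(labels == i)) for i in range(k)]


W = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 0],
]
communities = spectral_communities(W, k=2)
print(communities)
```

On this graph the second eigenvector changes sign across the single connecting edge, so the two recovered communities are the vertex sets on either side of it.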

Spectral Methods - For More than Graphs

Efficient even for non-"graph" data: it is sufficient to have a similarity measure $s(u, v)$ to build the weight/adjacency matrix $W$.

[Figure: "spectral clustering", scatter plot of the clustered points, both axes running from 0 to 15.]

Complexity issue: the computation of the $k$ eigenvectors of the Laplacian matrix requires a time in $O(n^3)$.

Galois Lattices - Introduction

To cluster from a binary table with objects and attributes,

          Preying  Flying  Bird  Mammal
Lion         x                     x
Finch                 x      x
Eagle        x        x      x
Hare                               x
Ostrich                      x
Bee                   x
Bat                   x            x

we can compute a similarity table...

          Lion  Finch  Eagle  Hare  Ostrich  Bee  Bat
Lion        2     0      1     1       0      0    1
Finch       0     2      2     0       1      1    1
Eagle       1     2      3     0       1      1    1
Hare        1     0      0     1       0      0    1
Ostrich     0     1      1     0       1      0    0
Bee         0     1      1     0       0      1    1
Bat         1     1      1     1       0      1    2

do a hierarchical classification and choose one class per object!

[Figure: "Hierarchical Classification of Species", dendrogram with a Height axis and leaves ordered Bee, Bat, Lion, Hare, Ostrich, Finch, Eagle.]

...or extract all the concepts and shared properties!
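The similarity table can be recomputed from the binary table by counting shared attributes. That counting rule is inferred from the numbers on the slide rather than stated there; with the attribute sets below it gives back the same seven rows. The dictionary layout and names are illustrative.

```python
CONTEXT = {
    "Lion": {"Preying", "Mammal"},
    "Finch": {"Flying", "Bird"},
    "Eagle": {"Preying", "Flying", "Bird"},
    "Hare": {"Mammal"},
    "Ostrich": {"Bird"},
    "Bee": {"Flying"},
    "Bat": {"Flying", "Mammal"},
}


def similarity_table(context):
    """Number of attributes shared by each pair of objects."""
    names = list(context)
    return {a: [len(context[a] & context[b]) for b in names] for a in names}


table = similarity_table(CONTEXT)
for name, row in table.items():
    print(f"{name:8s}", *row)
```

The diagonal simply counts the attributes of each object, which is why Eagle scores 3 with itself.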

Galois Lattices: Intension

[Table: the binary table of the introduction, with the objects $O$ as rows and the attributes $A$ as columns.]

For $X \subseteq O$:

$$f(X) = \{a \in A \mid \forall o \in X,\ o\,I\,a\},$$

where $o\,I\,a$ means that object $o$ has attribute $a$.

Galois Lattices: Extension

For $Y \subseteq A$:

$$g(Y) = \{o \in O \mid \forall a \in Y,\ o\,I\,a\}.$$

Galois Lattices: Concepts

A concept is a pair $(X, Y)$ with $X \subseteq O$ and $Y \subseteq A$ such that:

$$f(X) = Y \quad \& \quad g(Y) = X.$$

Galois Lattices: POSET

For two concepts $(X_1, Y_1)$ and $(X_2, Y_2)$:

$$(X_1, Y_1) \le (X_2, Y_2) \Leftrightarrow X_1 \subseteq X_2 \ (\text{or } Y_1 \supseteq Y_2).$$

({Finch, Eagle, Ostrich}, {Bird}) ≥ ({Finch, Eagle}, {Flying, Bird}) ≥ ({Eagle}, {Preying, Flying, Bird})
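The functions $f$ and $g$ and the enumeration of all concepts can be sketched by brute force on the animal table. The approach used here (closing every subset of attributes) is a simple illustration that only suits tiny tables; it is not one of the lattice construction algorithms whose complexity is discussed in the deck, and the names are hypothetical.

```python
from itertools import combinations

CONTEXT = {
    "Lion": {"Preying", "Mammal"},
    "Finch": {"Flying", "Bird"},
    "Eagle": {"Preying", "Flying", "Bird"},
    "Hare": {"Mammal"},
    "Ostrich": {"Bird"},
    "Bee": {"Flying"},
    "Bat": {"Flying", "Mammal"},
}
OBJECTS = set(CONTEXT)
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(X):
    """f(X): attributes shared by every object of X (all attributes if X is empty)."""
    if not X:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[o] for o in X))


def extent(Y):
    """g(Y): objects that have every attribute of Y."""
    return {o for o in OBJECTS if set(Y) <= CONTEXT[o]}


def all_concepts():
    """Every concept is (g(Y), f(g(Y))) for some subset Y of the attributes."""
    found = set()
    for size in range(len(ATTRIBUTES) + 1):
        for Y in combinations(sorted(ATTRIBUTES), size):
            X = extent(Y)
            found.add((frozenset(X), frozenset(intent(X))))
    return found


concepts = all_concepts()
for X, Y in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(X), sorted(Y))
```

Each pair returned satisfies $f(X) = Y$ and $g(Y) = X$, and the chain of bird concepts used as the ordering example appears among them.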

Galois Lattices for Social Networks

[Figure: example social network with vertices 0 to 7.]

[Table: cross table of the network with the vertices 1 to 8 as both rows and columns; number of × marks per row: 1: 5, 2: 3, 3: 4, 4: 3, 5: 3, 6: 5, 7: 5, 8: 6.]

Conceptual Metrics based on Galois Lattices.

Relatedness
Relatedness(O) = % of objects which share properties with O.

Closeness
Closeness(O) = % of properties that O shares with its related objects.

                  Relatedness
Closeness         High         Low
High              Clustered
Low                            Marginal

Filter the p % of marginal objects for a chosen $p \in\ ]0, 1[$.

Galois Lattices vs Similarity Based Clusterings

Criteria                          GL        SBC
Similarity                        Equality  Proximity
Uniqueness of results             Y         N
Completeness of final structure   Y         N
Attribute weight                  N         Y
Continuous value management       Hard      Y

Complexity: if we denote by $|O|$ the number of objects, $|A|$ the number of attributes and $|L|$ the size of the lattice (i.e. the number of concepts), then the algorithms have a time complexity in $O(|O|^2 |A| |L|)$ or $O(|O| |A|^2 |L|)$.

Page 104: Graph Mining & Community Detection An Introduction to ......Graph Mining & Community Detection An Introduction to Social Networks Data Analysis Etienne Cuvelier1 1Ecole Centrale Paris

EVARIST: eBuzz Monitoring on Twitter

[Figure: complete lattice ("Treillis Complet") built from tweets tagged #ereputation; node labels are tweet keywords, hashtags, user mentions and shortened URLs.]


Pajek


igraph


Gephi


Conclusions

Defining a community, or cluster, is not an easy task.

In fact, a cluster is in the eye of the beholder: one person's noise could be another person's signal.

Cluster analysis is structure seeking, although its operation is structure imposing.

In data clustering, many choices must be made before any analysis (cluster definition, algorithms, measures, ...), and they strongly influence the result.

But, in spite of all these warnings, clustering algorithms allow us to retrieve valuable pieces of information in social networks by finding communities.


Vladimir Estivill-Castro. Why so many clustering algorithms: a position paper. SIGKDD Explor. Newsl., 4(1):65–75, 2002.

Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.

Satu Elisa Schaeffer. Graph clustering. Computer Science Review, 1:27–64, 2007.

U. von Luxburg. A tutorial on spectral clustering. Technical Report 149, Max Planck Institute for Biological Cybernetics, 2006.
