Graph Mining & Community Detection An Introduction to ......Graph Mining & Community Detection An...

Post on 29-Sep-2020

21 views 0 download

Transcript of Graph Mining & Community Detection An Introduction to ......Graph Mining & Community Detection An...

Graph Mining & Community DetectionAn Introduction to Social Networks Data Analysis

Etienne Cuvelier1

1Ecole Centrale Paris

eBISS 2011,07/07/2011

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 1 / 56

Graph Mining and SNDA. Why?

My Facebook’s friends set reduced in 3 main communities.

What you don’t tell, your network can tell it for you (MIT’s Gaydarexperience).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 2 / 56

Graph Mining and SNDA. Why?

My Facebook’s friends set reduced in 3 main communities.

What you don’t tell, your network can tell it for you (MIT’s Gaydarexperience).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 2 / 56

Graph Mining and SNDA. Why?

My Facebook’s friends set reduced in 3 main communities.

What you don’t tell, your network can tell it for you (MIT’s Gaydarexperience).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 2 / 56

A talk with a minimum of formulas?

Stephen HawkingMy editor told me that eachequation I included in the bookwould halve sales.

"A Brief History of Time"

But...min > 0 ! (Argh! A first formula!)

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 3 / 56

A talk with a minimum of formulas?

Stephen HawkingMy editor told me that eachequation I included in the bookwould halve sales.

"A Brief History of Time"

But...min > 0 ! (Argh! A first formula!)

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 3 / 56

A talk with a minimum of formulas?

Stephen HawkingMy editor told me that eachequation I included in the bookwould halve sales.

"A Brief History of Time"

But...min > 0 ! (Argh! A first formula!)

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 3 / 56

Outline

1 Social Networks and Graphs Basics

2 Define Communities in Social Network

3 Measures of Belonging to a Community

4 Detection Algoritms

5 Softwares

6 Conclusions

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 4 / 56

Plan

1 Social Networks and Graphs Basics

2 Define Communities in Social Network

3 Measures of Belonging to a Community

4 Detection Algoritms

5 Softwares

6 Conclusions

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 5 / 56

The Graph Theory start

How to walk through the city that would cross each bridge once andonly once ? (Typically mathematician’s game).

Figure: source: wikipedia

Figure: source: wikipedia

Using graphs Euler proved that the problem has no solution.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 6 / 56

The Graph Theory start

How to walk through the city that would cross each bridge once andonly once ? (Typically mathematician’s game).

Figure: source: wikipediaFigure: source: wikipedia

Using graphs Euler proved that the problem has no solution.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 6 / 56

The Graph Theory start

How to walk through the city that would cross each bridge once andonly once ? (Typically mathematician’s game).

Figure: source: wikipediaFigure: source: wikipedia

Using graphs Euler proved that the problem has no solution.Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 6 / 56

The Graph Theory Paradigm

A graph is denoted G = (V ,E)with

V the set of vertices,E the set of edges.

{v ,w} is the edge connectingvertices v and w .

A =

a1,1 · · · a1,n

.... . .

...an,1 · · · an,n

where

ai,j =

{1 if vi and vj are connected,0 else.

If G is a weighted graph, then ai,j = ω(vi , vj) and thenA = W = (wi,j) = (ω(vi , vj)).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 7 / 56

The Graph Theory Goes Social

Moreno (1933) 1st to use pointsand lines for socialconfigurations,Cartwright and Harary (1956) linkwith the graph theory,individuals are represented usingpoints, called nodes or vertices,and social relationships arerepresented using lines.

0

12

3

4

5

6

7

8

9

1011

12

13

1415

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

3233

Zachary's Karate Club Network

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 8 / 56

Directed or not?

0

1

2

3

4

5

6

7

8

9

An undirected graph (ex.:Facebook): {v ,w}

0

1

2

3

4

5

6

7

8

9

A directed graph (ex.: Twitter):(v ,w).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 9 / 56

Graph Theory Basic Notions

|V | = n is called the order the graph.|E | = m is called the size of the graph.If |E | = n(n − 1)/2, i.e. (any pair of vertices are connected), thegraph complete.v and u are neighbours if connected by an edge.The neighbourhood of a node v is denoted Γ(v).A subgraph G′ = (V ′,E ′) of G = (V ,E) is such V ′ ⊂ V , E ′ ⊂ Eand {v ,u} ∈ E ′ ⇒ v ,u ∈ V ′.A subset C of V can define an induced subgraphG(C) = (C,E(C)), where E(C) = {(v ,u) ∈ E |v ,u ∈ C}.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 10 / 56

Graph Theory Basic Notions (ctd)

A path P is a subgraphP = (V (P),E(P)) suchV (P) = {vi0 , · · · , vik} and E(P) ={{vi0 , vi1}, {vi1 , vi2}, · · · , {vik−1 , vik}

}.

k is the length path.If no vertice is repeated, then the pathis simple. length,If there exists a path between v andu, they are connected.The graph is a connected graph if∀v ,u, there is, at least, one pathconnecting v and u.A connected subgraph is called aconnected component.

0

1

2

3

45

6

7

8

9

10

11

12

13

14

1516

17

18

19

20

21

22

23

24

2526

27

28

29

30

31

3233

Diameter of the Zachary Karate Club network

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 11 / 56

Graph Theory Basic Notions (ctd)

Density of a subgraph C(V (C),E(C)) : ratio between |E(C)| andthe maximum possible number of edges:

δ(G(C)) =|E(C)|

|V (C)|(|V (C)| − 1)/2(1)

A partition of V in two subsets C and V \ C is called a cut.The cut size is the number of edges joining vertices of C withvertices of V \ C:

c(C,V \ C) = | {{u, v} ∈ E |u ∈ C, v ∈ V \ C} | (2)

For an unweighted graph, degree of a vertice v , deg(v) : numberof incident edges,For weighted graph:

deg(vi) =n∑

j=1

wi,j . (3)

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 12 / 56

Plan

1 Social Networks and Graphs Basics

2 Define Communities in Social Network

3 Measures of Belonging to a Community

4 Detection Algoritms

5 Softwares

6 Conclusions

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 13 / 56

Detecting Communities

0

1

23

4

5

6

7

8

9

10

11 12

13

14

15

16

1718

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 14 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,

But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,

Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),

But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,

Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .

In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).

It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

But What is a Community?

In the clustering framework a community is a cluster of nodes in agraph,But a very important question is what is a cluster?,Even in the clustering literature there is non complete agreementbetween all authors, ([EC02]),But, objects inside a cluster must be more similar than objectsoutside of this cluster,Objects are clustered or grouped based in maximizing intra-classsimilarity and minimizing inter-class similarity .In graph framework, clustering is dividing vertices such nodes of acommunity must be more connected with nodes of this community,than with nodes outside of the cluster ( [Sha07], [For10]).It implies that it must exists at least a path between two nodes of acluster, and this path must be internal to the cluster.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 15 / 56

Then in a Community...

Connections must be minimum between groups and maximum withingroups.

0

1

2

3

4

5

6

7

8

9

10

11

12

13 14

15

16

17

18

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 16 / 56

Then in a Community...

Connections must be minimum between groups and maximum withingroups.

0

1

23

4

5

6

7

8

9

10

11 12

13

14

15

16

1718

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 16 / 56

Wasserman Criteria

Four criteria:

Complete mutuality: all member of a subgroup must be “linked”with all members of the subgroup,Reachability: existence (and length) of paths between vertices ofa subgroup,Nodal degree: imposes a constraint on the number of adjacentvertices,Internal versus external cohesion: the former must be higher thanthe latter.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 17 / 56

Wasserman Criteria

Four criteria:Complete mutuality: all member of a subgroup must be “linked”with all members of the subgroup,

Reachability: existence (and length) of paths between vertices ofa subgroup,Nodal degree: imposes a constraint on the number of adjacentvertices,Internal versus external cohesion: the former must be higher thanthe latter.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 17 / 56

Wasserman Criteria

Four criteria:Complete mutuality: all member of a subgroup must be “linked”with all members of the subgroup,Reachability: existence (and length) of paths between vertices ofa subgroup,

Nodal degree: imposes a constraint on the number of adjacentvertices,Internal versus external cohesion: the former must be higher thanthe latter.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 17 / 56

Wasserman Criteria

Four criteria:Complete mutuality: all member of a subgroup must be “linked”with all members of the subgroup,Reachability: existence (and length) of paths between vertices ofa subgroup,Nodal degree: imposes a constraint on the number of adjacentvertices,

Internal versus external cohesion: the former must be higher thanthe latter.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 17 / 56

Wasserman Criteria

Four criteria:Complete mutuality: all member of a subgroup must be “linked”with all members of the subgroup,Reachability: existence (and length) of paths between vertices ofa subgroup,Nodal degree: imposes a constraint on the number of adjacentvertices,Internal versus external cohesion: the former must be higher thanthe latter.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 17 / 56

Complete mutuality

In a very strict sense, in acommunity, all member of asubgroup must be “friends” withall members of the subgroup,In graph theory, it corresponds toa clique,But define of a community as aclique is very too strict (Alba1973) calls it “a quite stingy”),that leads to relaxed definitions ofthe notion of clique...

A cliques with 4 nodes.

A cliques with 5 nodes.

A cliques with 6 nodes.

A cliques with 7 nodes.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 18 / 56

Reachability

An n-clique is a maximal subgraph such, for any pair of vertices,there exists at least a geodesic no larger than n.The classical clique is then a 1-clique.

But a geodesic path of an n-clique could run outside of this latter,and then the diameter of the subgraph c an exceed n...That is why was defined n-clan which is an n-clique with diameternot larger than n.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 19 / 56

Reachability

An n-clique is a maximal subgraph such, for any pair of vertices,there exists at least a geodesic no larger than n.The classical clique is then a 1-clique.But a geodesic path of an n-clique could run outside of this latter,and then the diameter of the subgraph c an exceed n...That is why was defined n-clan which is an n-clique with diameternot larger than n.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 19 / 56

Nodal degree

For a given graph G and a cluster C:A k-plex is a maximal subgraph such each vertice is adjacent to allother vertices of the subgraph except for k of them.Conversely a k-core is a maximal subgraph such each vertice isadjacent to, at least, k other vertices of the subgraph.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 20 / 56

Internal versus external cohesion

For a given graph G and a cluster C:degi(C): internal degree of C is the number of internal edges ofC: degi(v ,C) = |Γ(v) ∩ C|,dege(C): external degree of C is the number of edges with onevertice inside C, and the other outside of C:dege(v ,C) = |Γ(v) ∩ (V \ C)|,deg(C) = degi(C) + dege(C): degree of C,

If dege(v ,C) = 0, then v ∈ C is surely a good assignation for v ,Conversly if degi(v ,C) = 0, then we must have v /∈ C.A LS-set, or strong community is a subgraph C such for eachnode v ∈ C we have degi(v ,C) > dege(v ,C).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 21 / 56

Internal versus external cohesion

For a given graph G and a cluster C:degi(C): internal degree of C is the number of internal edges ofC: degi(v ,C) = |Γ(v) ∩ C|,dege(C): external degree of C is the number of edges with onevertice inside C, and the other outside of C:dege(v ,C) = |Γ(v) ∩ (V \ C)|,deg(C) = degi(C) + dege(C): degree of C,If dege(v ,C) = 0, then v ∈ C is surely a good assignation for v ,Conversly if degi(v ,C) = 0, then we must have v /∈ C.

A LS-set, or strong community is a subgraph C such for eachnode v ∈ C we have degi(v ,C) > dege(v ,C).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 21 / 56

Internal versus external cohesion

For a given graph G and a cluster C:degi(C): internal degree of C is the number of internal edges ofC: degi(v ,C) = |Γ(v) ∩ C|,dege(C): external degree of C is the number of edges with onevertice inside C, and the other outside of C:dege(v ,C) = |Γ(v) ∩ (V \ C)|,deg(C) = degi(C) + dege(C): degree of C,If dege(v ,C) = 0, then v ∈ C is surely a good assignation for v ,Conversly if degi(v ,C) = 0, then we must have v /∈ C.A LS-set, or strong community is a subgraph C such for eachnode v ∈ C we have degi(v ,C) > dege(v ,C).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 21 / 56

Internal versus external cohesion

For a given graph G and a cluster C:Graph density is the ratio between existing number of edges andmaximum possible number of edges: δ(G(C)) = |E(C)|

|V (C)|(|V (C)|−1)/2

Intra-cluster density: quotient of the number of internal edges of Cand the maximal possible number of internal nodes:δi(C) = |{{u,v}|u,v∈C}|

|C|(|C|−1)/2 ,,

Inter-cluster density: quotient of the existing number of edges withone vertice inside C, and the other outside of C and maximumpossible number of edges in this configuration:δe(C) = |{{u,v}|u∈C,v /∈C}|

|C|(|G|−|C|) ,,

For a given partition {C1, · · · ,Ck} we want

k∑i=1

δi(Ci) = δi(G|C1, · · · ,Ck ) >> δ(G).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 22 / 56

Internal versus external cohesion

For a given graph G and a cluster C:Graph density is the ratio between existing number of edges andmaximum possible number of edges: δ(G(C)) = |E(C)|

|V (C)|(|V (C)|−1)/2

Intra-cluster density: quotient of the number of internal edges of Cand the maximal possible number of internal nodes:δi(C) = |{{u,v}|u,v∈C}|

|C|(|C|−1)/2 ,,

Inter-cluster density: quotient of the existing number of edges withone vertice inside C, and the other outside of C and maximumpossible number of edges in this configuration:δe(C) = |{{u,v}|u∈C,v /∈C}|

|C|(|G|−|C|) ,,

For a given partition {C1, · · · ,Ck} we want

k∑i=1

δi(Ci) = δi(G|C1, · · · ,Ck ) >> δ(G).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 22 / 56

Internal versus external cohesion

For a given graph G and a cluster C:Graph density is the ratio between existing number of edges andmaximum possible number of edges: δ(G(C)) = |E(C)|

|V (C)|(|V (C)|−1)/2

Intra-cluster density: quotient of the number of internal edges of Cand the maximal possible number of internal nodes:δi(C) = |{{u,v}|u,v∈C}|

|C|(|C|−1)/2 ,,

Inter-cluster density: quotient of the existing number of edges withone vertice inside C, and the other outside of C and maximumpossible number of edges in this configuration:δe(C) = |{{u,v}|u∈C,v /∈C}|

|C|(|G|−|C|) ,,

For a given partition {C1, · · · ,Ck} we want

k∑i=1

δi(Ci) = δi(G|C1, · · · ,Ck ) >> δ(G).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 22 / 56

Internal versus external cohesion

For a given graph G and a cluster C:Graph density is the ratio between existing number of edges andmaximum possible number of edges: δ(G(C)) = |E(C)|

|V (C)|(|V (C)|−1)/2

Intra-cluster density: quotient of the number of internal edges of Cand the maximal possible number of internal nodes:δi(C) = |{{u,v}|u,v∈C}|

|C|(|C|−1)/2 ,,

Inter-cluster density: quotient of the existing number of edges withone vertice inside C, and the other outside of C and maximumpossible number of edges in this configuration:δe(C) = |{{u,v}|u∈C,v /∈C}|

|C|(|G|−|C|) ,,

For a given partition {C1, · · · ,Ck} we want

k∑i=1

δi(Ci) = δi(G|C1, · · · ,Ck ) >> δ(G).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 22 / 56

Internal versus external cohesion

Using the cut size, the number of edges joining vertices of C withvertices of V \ C,The conductance of a community C is defined to taking intoaccount the order of the cluster and the outside of the cluster:

Φ(C) =c(C,V \ C)

min{deg(C),deg(V \ C)}

where deg(C) and deg(V \ C) are the total degrees of C and ofthe rest of the graph. The min(Φ(C)) is obtained when C has alow cut size and when the total degree of the cluster and itscomplement are equal.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 23 / 56

Internal versus external cohesion

A community with strong inner ties must have a higher Relativedensity

ρ(C) =degi(C)

deg(C).

The edge connectivity of a graph G is the minimal number ofnodes to be removed so that G is disconnected, and is denotedk(G).A community C can be defined as an Highly connected subgraph(HCS) such

k(C) >n2.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 24 / 56

Internal versus external cohesion

A community with strong inner ties must have a higher Relativedensity

ρ(C) =degi(C)

deg(C).

The edge connectivity of a graph G is the minimal number ofnodes to be removed so that G is disconnected, and is denotedk(G).

A community C can be defined as an Highly connected subgraph(HCS) such

k(C) >n2.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 24 / 56

Internal versus external cohesion

A community with strong inner ties must have a higher Relativedensity

ρ(C) =degi(C)

deg(C).

The edge connectivity of a graph G is the minimal number ofnodes to be removed so that G is disconnected, and is denotedk(G).A community C can be defined as an Highly connected subgraph(HCS) such

k(C) >n2.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 24 / 56

Measures of Belonging to a Community

Dissimilarity MeasuresA distance measure d betweentwo objects must fulfill thefollowing criteria:

1 separation: d(u,u) = 0,2 symmetry: d(u, v) = d(v ,u),3 triangle inequality:

d(u, v) ≤ d(u,w) + d(w , v).

Similarity MeasuresA similarity measure s must fulfillthe following criteria:

1 s(u,u) = k , where k is aconstant,

2 s(u, v) = s(v ,u),3 s(u, v) ≤ s(u,u) = k .

Link Between Similarities and DissimilarityA dissimilarity measure d can be converted into a similarity measureusing a strictly decreasing functions φ with some boundary conditions :

d = φ(s) and s = φ−1(d).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 25 / 56

Measures of Belonging to a Community

Dissimilarity MeasuresA distance measure d betweentwo objects must fulfill thefollowing criteria:

1 separation: d(u,u) = 0,2 symmetry: d(u, v) = d(v ,u),3 triangle inequality:

d(u, v) ≤ d(u,w) + d(w , v).

Similarity MeasuresA similarity measure s must fulfillthe following criteria:

1 s(u,u) = k , where k is aconstant,

2 s(u, v) = s(v ,u),3 s(u, v) ≤ s(u,u) = k .

Link Between Similarities and DissimilarityA dissimilarity measure d can be converted into a similarity measureusing a strictly decreasing functions φ with some boundary conditions :

d = φ(s) and s = φ−1(d).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 25 / 56

Classical Distance Recall

Recall Distances for Classical VectorsFor two vectors in IR2, u = (u1, · · · ,un) and v = (v1, · · · , vn):

Euclidean:

d(u, v) =

√√√√ n∑k=1

(uk − vk )2

Manhattan distance: d1(u, v) =∑n

k=1 |uk − vk |Tchebychev’s distance d∞(u, v) = maxk=1,··· ,n |uk − vk |Cosine:

θ(u, v) =

∑nk=1 uk · vk√∑n

k=1(uk )2√∑n

k=1(vk )2.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 26 / 56

Measures for Graphs: structural equivalence

Given the adjacency matrix A = {ai,j}, ui and uj are structurallyequivalent if they have the same neighbors, i.e. if di,j = 0:

di,j =

√∑k 6=i,j

(ai,k − aj,k )2

A =

0 0 1 1 01 0 0 1 01 1 0 1 00 0 1 0 01 0 1 1 0

12

3

4

5

Structural Equivalence

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 27 / 56

Measures for Graphs: Pearson Correlation

Another measure directly defined on A is the Pearson correlationmatrix:

ci,j =

∑nk=1(ai,k − µi)(aj,k − µj)

nσjσj(4)

with µi =∑

k ai,k/n and σi =√∑

k (ai,k − µk )2/n.

A =

1.00 0.16 −0.16 0.16 0.660.16 1.00 0.66 −0.66 0.66−0.16 0.66 1.00 −1.00 0.16

0.16 −0.66 −1.00 1.00 −0.160.66 0.66 0.16 −0.16 1.00

12

3

4

5

Structural Equivalence

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 28 / 56

Measures for Graphs: Jaccard Index

Another popular seed to build (dis)similarity measure is the Jaccardindex which measures similarity between sets:

J(A,B) =|A ∩ B||A ∪ B|

. (5)

A first use of the Jaccard index in the graph theory context is tomeasure the overlap of the neighborhoods of two nodes v and u:

ω(v ,u) =|Γ(v) ∩ Γ(u)||Γ(v) ∪ Γ(u)|

(6)

which is equal to zero when there is no common neighbors, and onewhen v and u are structurally equivalent. And, as 0 ≤ J(A,b) ≤ 1, it iseasy to define the Jaccard distance:

Jδ(A,B) = 1− J(A,B) =|A ∪ B| − |A ∩ B|

|A ∪ B|. (7)

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 29 / 56

Measures for Graphs: Tanimoto coefficient

Tanimoto coefficient is an extension of the cosine similarity whichcoincide with the Jaccard index for binary vectors:

T (A,B) =

n∑k=1

ak · bk

n∑k=1

ak +n∑

k=1

bk −n∑

k=1

ak · bk

. (8)

The Tanimoto coefficient gives the quotient between the number ofshared features by A and B, divided by the whole number of featuresfor A and B.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 30 / 56

Plan

1 Social Networks and Graphs Basics

2 Define Communities in Social Network

3 Measures of Belonging to a Community

4 Detection Algoritms

5 Softwares

6 Conclusions

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 31 / 56

Partitional Algorithms - Introduction

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 32 / 56

Partitional Algorithms : k-means

k-means algorithms try to maximize the intra-cluster dissimilarity:

gn(C) =k∑

i=1

∑x∈Ci

d(x , ci)

Algorithm:Starting:Determine an initial vector (or centers) (c(0)

1 · · · , c(0)k ),

Repeat until stationarity: t ← t + 1Assignment: observations go to the cluster with the closest centroid:

C(t+1)i =

{x : d(x , c(t)

i ) ≤ d(x , c(t)j )∀j = 1, . . . , k

}Update: Compute the new centroids (c(t+1)

1 , · · · , c(t+1)k ):

c(t+1)i =

1

|C(t)i |

∑x∈C(t)

i

x

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 33 / 56

Partitional Algorithms - Instability

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 34 / 56

Partitional Algorithms - Spherical Clusters

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 35 / 56

Partitional Algorithms - Drawbacks

Drawbackschoice of k : an inappropriatechoice leads too nonsignificant results,spherical clusters: algorithmsworks better when sphericalclusters are in data,instability: the random startingpartition can leads to a localoptimum for the criterionfunction.

●●

●●●

● ●

●● ●

●●● ●

●●

●●●

●●●●

● ●●

●●

●●●●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●● ● ●

●●●

●● ●●

●●

●● ●

●●

●●

●●●

●●

●●

●●

●● ●●

●●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

0 5 10 150

510

15

k−means

Complexity: O(m2 × n × k) where m is the number of attributes, n thenumber of objects to cluster and k the number of clusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 36 / 56

Partitional Algorithms - Drawbacks

Drawbackschoice of k : an inappropriatechoice leads too nonsignificant results,spherical clusters: algorithmsworks better when sphericalclusters are in data,instability: the random startingpartition can leads to a localoptimum for the criterionfunction.

Criterion: 44

0

1

2

3

4

56

7

89

10

1112

13

14

1516

17

18

19

20

Complexity: O(m2 × n × k) where m is the number of attributes, n thenumber of objects to cluster and k the number of clusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 36 / 56

Partitional Algorithms - Drawbacks

Drawbackschoice of k : an inappropriatechoice leads too nonsignificant results,spherical clusters: algorithmsworks better when sphericalclusters are in data,instability: the random startingpartition can leads to a localoptimum for the criterionfunction.

Criterion: 43

0

1

2

3

4

56

7

89

10

1112

13

14

1516

17

18

19

20

Complexity: O(m2 × n × k) where m is the number of attributes, n thenumber of objects to cluster and k the number of clusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 36 / 56

Partitional Algorithms - Drawbacks

Drawbackschoice of k : an inappropriatechoice leads too nonsignificant results,spherical clusters: algorithmsworks better when sphericalclusters are in data,instability: the random startingpartition can leads to a localoptimum for the criterionfunction.

Criterion: 45

0

1

2

3

4

56

7

89

10

1112

13

14

1516

17

18

19

20

Complexity: O(m2 × n × k) where m is the number of attributes, n thenumber of objects to cluster and k the number of clusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 36 / 56

Partitional Algorithms - Drawbacks

Drawbackschoice of k : an inappropriatechoice leads too nonsignificant results,spherical clusters: algorithmsworks better when sphericalclusters are in data,instability: the random startingpartition can leads to a localoptimum for the criterionfunction.

Criterion: 41

0

1

2

3

4

56

7

89

10

1112

13

14

1516

17

18

19

20

Complexity: O(m2 × n × k) where m is the number of attributes, n thenumber of objects to cluster and k the number of clusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 36 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - Intro

1 2

3 4

5

6

0 1 2 3 4

01

23

45

6

Hierarchical Clustering: The Data

x

y

6 4 5 3 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

HClust, Complete Link

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 37 / 56

Agglomerative Hierarchical Algorithms - For SNA

0

12

3

4

5

6

7

8

9

1011

12

13

1415

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

3233

Zachary's Karate Club Network

Figure: The Zachary Karate club network.

At the starting point, the n objects to cluster are their own classes:{{x1}, · · · , {xn}}, then at each stage we merged the two more similarclusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 38 / 56

Agglomerative Hierarchical Algorithms - For SNA

01

23

45

6

24

25

23

27

28

31

9●

8●

2●

13

26

29

30

22

20

15

14

18

32

33

16

5

6

4

10

11

12

17

21

1

3

7

0

19

Dendrogram for the Zachary Karate Club network

A

B

C

Figure: A dendrogram for the Zachary Karate club network.

At the starting point, the n objects to cluster are their own classes:{{x1}, · · · , {xn}}, then at each stage we merged the two more similarclusters.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 38 / 56

Agglomerative Hierarchical Algorithms - The LinkChoice

For a given dissimilarity measure d between objects, severaldissimilarities between clusters D exist:

the single linkage:

D(A,B) = min{d(x , y) : x ∈ A, y ∈ B},

the complete linkage:

D(A,B) = max{d(x , y) : x ∈ A, y ∈ B},

the average linkage:

D(A,B) =1

|A| · |B|∑x∈A

∑y∈B

d(x , y).

Drawbacks: vertices of a community may be not correctly classified.Complexity: O(n2) for the single linkage and O(n2 log n) for completeand average linkages.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 39 / 56

Divisive Hierarchical Algorithms - Betweenness

0

1

2

3

4

5

6

7

8

9

10

11

12

13 14

15

16

17

18

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 40 / 56

Divisive Hierarchical Algorithms - Betweenness

0

1

23

4

5

6

7

8

9

10

11 12

13

14

15

16

1718

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 40 / 56

Divisive Hierarchical Algorithms - Betweenness

0

1

23

4

5

6

7

8

9

10

11 12

13

14

15

16

1718

19

20

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 40 / 56

Divisive Hierarchical Algorithms - Betweenness

Find the connecting edges to find the communities.Edge betweenness is the number of shortest paths between allvertex pairs that run along the edge.Divide the graph finding and "‘removing"’ edges connectingcommunity, two communities, i.e. edges on the maximum ofshortest paths between these Communities (max edgebetweenness):

1 compute the edge betweenness for all edges of the running graph,2 remove the edge with the largest value (which gives the new

running graph).Stop when no improvement on a criterion like the modularity:

M(C1, · · · ,Ck ) =∑

i

dege(Ci)−∑

i

degi(Ci)

Complexity edge betwneenness on a graph can be computed inO(n ·m) for unweighted graphs and in O(n ·m + n2logn) for theweighted.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 41 / 56

Divisive Hierarchical Algorithms - Betweenness

Find the connecting edges to find the communities.Edge betweenness is the number of shortest paths between allvertex pairs that run along the edge.Divide the graph finding and "‘removing"’ edges connectingcommunity, two communities, i.e. edges on the maximum ofshortest paths between these Communities (max edgebetweenness):

1 compute the edge betweenness for all edges of the running graph,2 remove the edge with the largest value (which gives the new

running graph).Stop when no improvement on a criterion like the modularity:

M(C1, · · · ,Ck ) =∑

i

dege(Ci)−∑

i

degi(Ci)

Complexity edge betwneenness on a graph can be computed inO(n ·m) for unweighted graphs and in O(n ·m + n2logn) for theweighted.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 41 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two Connected Components.

Figure: Two connectedcomponents.

Adjacency Matrix

W =

0 1 1 0 0 0 01 0 1 0 0 0 01 1 0 1 0 0 00 0 1 0 0 0 00 0 0 0 0 1 10 0 0 0 1 0 10 0 0 0 1 1 0

.

Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two Connected Components.

Figure: Two connectedcomponents.

Adjacency Matrix

W =

0 1 1 0 0 0 01 0 1 0 0 0 01 1 0 1 0 0 00 0 1 0 0 0 00 0 0 0 0 1 10 0 0 0 1 0 10 0 0 0 1 1 0

.

Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two Connected Components.

Figure: Two connectedcomponents.

Eigenvectors matrix

Q =

1 1 1 1 0 0 01 −1 1 1 0 0 0−3 0 0 1 0 0 0

1 0 −2 1 0 0 00 0 0 0 0 −2 10 0 0 0 1 1 10 0 0 0 −1 1 1

.

spectrum={4,3,1,0} ∪ {3,3,0}Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two Connected Components.

Figure: Two connectedcomponents.

Eigenvectors matrix

Q =

1 1 1 1 0 0 01 −1 1 1 0 0 0−3 0 0 1 0 0 0

1 0 −2 1 0 0 00 0 0 0 0 −2 10 0 0 0 1 1 10 0 0 0 −1 1 1

.

spectrum={4,3,1,0} ∪ {3,3,0}Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two Connected Components.

Figure: Two connectedcomponents.

Eigenvectors matrix

Q =

1 0 0 1 1 0 11 0 0 −1 1 0 1−3 0 0 0 0 0 1

1 0 0 0 −2 0 10 0 −2 0 0 1 00 1 1 0 0 1 00 −1 1 0 0 1 0

.

spectrum={4,3,3,3,1,0,0} SpectralDecomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two 'Almost' Connected Components.

Figure: Two almost connectedcomponent.

Adjacency Matrix

W =

0 1 1 0 0 0 01 0 1 0 0 0 01 1 0 1 1 0 00 0 1 0 0 0 00 0 0 0 0 1 10 0 0 0 1 0 10 0 0 0 1 1 0

.

Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - intro

0

1

2

3

4

5

6

Two 'Almost' Connected Components.

Figure: Two almost connectedcomponent.

Eigenvectors matrix

Q =

. . . . . −18 1

. . . . . −18 1

. . . . . −10 1

. . . . . −18 1

. . . . . 15 1

. . . . . 25 1

. . . . . 25 1

.

spectrum={· · · } Spectral Decomposition

W = QΛQ−1

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 42 / 56

Spectral Methods - AlgorithmFor efficiency reason it is recommended to not work directly with theadjacency matrix W but with the Laplacian matrix:

L = D −W .

where D is such that we found the degrees deg(vi ) on the diagonal, and thenchoose a normalization:

Lrw = D−1L or Lsym = D−1/2LD−1/2

Algorithm for spectral clustering is the following:

1 compute the eigenvalues and sort them such λ1 ≥ λ2 ≥ · · · ≥ λn,

2 compute the last k eigenvectors ~un−k · · · , ~un,

3 form matrix U ∈ Rn×k with ~un−k · · · , ~un as columns, and matrix Y = U t ,

4 cluster the points (yi )i=1,··· ,n using the k -means algorithm into clustersA1, · · · ,Ak ,

5 build the communities C1, · · · ,Ck such Ci = {vj |yj ∈ Ai}.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 43 / 56

Spectral Methods - For More than Graphs

Efficient even for non "‘graph"’ data: it is sufficient to have a similaritymeasure s(u, v) to build the weight/adjacency matrix W :

●●

●●●

● ●

●● ●

●●● ●

●●

●●●

●●●●

● ●●

●●

●●●●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●● ● ●

●●●

●● ●●

●●

●● ●

●●

●●

●●●

●●

●●

●●

●● ●●

●●

● ●

●●

●●

●●

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

0 5 10 15

05

1015

spectral clustering

Complexity issue: the computation of the k eigenvectors of theLaplacian matrix require a time in O(n3).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 44 / 56

Galois Lattices - Introduction

To cluster from a binary table cwith objects and attributes,

Preying Flying Bird MammalLion x xFinch x xEagle x x xHare xOstrich xBee xBat x x

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 45 / 56

Galois Lattices - Introduction

we can compute a similarity table...

Lion Finch Eagle Hare Ostrich Bee BatLion 2 0 1 1 0 0 1Finch 0 2 2 0 1 1 1Eagle 1 2 3 0 1 1 1Hare 1 0 0 1 0 0 1Ostrich 0 1 1 0 1 0 0Bee 0 1 1 0 0 1 1Bat 1 1 1 1 0 1 2

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 45 / 56

Galois Lattices - Introduction

do a hierarchical classification

Bee

Bat

Lion

Har

e

Ost

rich

Fin

ch

Eag

le

Hierarchical Classification of Species

Hei

ght

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 45 / 56

Galois Lattices - Introduction

do a hierarchical classification and choose one class per object!

Bee

Bat

Lion

Har

e

Ost

rich

Fin

ch

Eag

le

Hierarchical Classification of SpeciesH

eigh

t

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 45 / 56

Galois Lattices - Introduction

...or extract all the concepts and shared propeties!

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 45 / 56

Galois Lattices: Intension

Obj

ects

OAttributes A︷ ︸︸ ︷

Preying Flying Bird MammalLion x xFinch x xEagle x x xHare xOstrich xBee xBat x x

For X ∈ O:

f (X ) = {a ∈ A|∀o ∈ X ,oIa}.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 46 / 56

Galois Lattices: Extension

Obj

ects

OAttributes A︷ ︸︸ ︷

Preying Flying Bird MammifèreLion x xMoineau x xEagle x x xHare xOstrich xBee xBat x x

For Y ∈ A:

g(Y ) = {o ∈ O|∀a ∈ Y ,oIa}.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 47 / 56

Galois Lattices: Concepts

Obj

ects

OAttributes A︷ ︸︸ ︷

Preying Flying Bird MammifèreLion x xMoineau x xEagle x x xHare xOstrich xBee xBat x x

A concept is (X ,Y ) ∈ O × A such:

f (X ) = Y & g(Y ) = X .

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 48 / 56

Galois Lattices: POSET

Obj

ects

OAttributes A︷ ︸︸ ︷

Preying Flying Bird MammifèreLion x xMoineau x xEagle x x xHare xOstrich xBee xBat x x

(X1,Y1), (X2,Y2) ∈ O×A : (X1,Y1) ≤ (X2,Y2)⇔ X1 ⊆ X2( or Y1 ⊇ Y2).({Moineau,Eagle,Ostrich}, {Birdx}) ⊃({Moineau,Eagle}, {Flying,Birdx}) ⊃({Eagle}, {Preying,Flying,Birdx})

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 49 / 56

Galois Lattices for Social Networks

0

1

2

3

4

56

7

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 50 / 56

Galois Lattices for Social Networks

1 2 3 4 5 6 7 81 × × × × ×2 × × ×3 × × × ×4 × × ×5 × × ×6 × × × × ×7 × × × × ×8 × × × × × ×

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 50 / 56

Galois Lattices for Social Networks

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 50 / 56

Galois Lattices for Social Networks

Conceptual Metrics based on Galois Lattices.

RelatednessRelatedness(O)= % of objects which share properties with O.

ClosenessCloseness(O)= % of shared properties of with its related objects.

Relatedness→↓ Closeness

High Low

High ClusteredLow Marginal

Filter the p % of marginal objects for a chosen p ∈]0,1[.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 50 / 56

Galois Lattices vs Similarity Based Clusterings

Criteria GL SBCSimilarity Equality ProximityUniqueness of results Y NCompleteness of final structure Y NAttribute Weight N YContinuous value management Hard Y

Complexity: if we denote |O| the number of objects, |A| the number ofattributes and |L| the size of the lattices (i.e. the number of concepts),then algorithms have a complexity time in O(|O|2|A||L|) orO(|O||A|2|L|).

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 51 / 56

EVARIST: eBuzz Monitoring on Twitter

#ereputation

avis internautes eréputation @laurentbinard: #wikioconf séminaire présentation #reseauxsociaux bientôt,alerte,"e−reputation",http://ow.ly/1a7k7w (via,@nettoyeurdunet) cnil,harcèlement,internet,questions,http://ow.ly/3e5ov #personalbranding cv daily,read,twitter,newspaper,http://bit.ly/d5hdkg,(19,contributions,today) améliore,identité,numérique,,temps,mal,!! @merouanes,"mieux,vaut,prévenir,guérir,!",tard,,aussi!,http://bit.ly/bz4zqh @googlecleaner:,interview,http://bit.ly/ajhsvk,recommander,bon,article,:) blog,anthony,poncier,»,archive,tweet,semaine,http://ow.ly/1a6dk0 flux,rss,discussion,trackback,uri,commentaire,laisser,http://ow.ly/1a6dk1 truc,super,énervant,web,2.0,rates,parle,rarement,qd,réussis,#jalousie @digimind_fr:,meamedica,,site,communautaire,noter,médicaments,http://bit.ly/co28ud,#sites,#in soigner,réputation,“en,ligne”,http://bit.ly/brrgef reasons,company,#twitter,http://j.mp/agaike,@tweetsmarter,#influence,#corporatebranding

multiples,solutions,veille...,outil,efficace 70%,contenus,recommandés,publiés veillent,stratégies,http://ow.ly/1a7fwm smartbox,choisit,identik,analyser,indice,rhnet,http://ow.ly/1a6tlp cf,@fred_montagnon

dit,fred,montagnon

#wikioconf,,serge,alleyne,,fondateur,#nomao,,annonce,présente,solution,locale,#wikiobuzz...

petit,dej,#wikiobuzz,"il,suffit,audience,influent"

link,humans.,http://slidesha.re/gztsbm frontière,ténue,sphère,professionnelle,privée,http://3.ly/ztcv,@xavierbiseul,@lesechos,#web grâce,search,http://ow.ly/3dyni,#recrutement @duboissetb: jeudi,déc,,débat,http://fy.to/d04

google

réalité,augmentée...,maintenant,http://t.co/ihqy1by,@lepost @blogpb: @chloegiard:

@celinecrespin: @giselabonnaud: @ffoschiani,#cv,#google @weickmann) @agence_happy_id:

@atchikservices:

Treillis Complet

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 52 / 56

EVARIST: eBuzz Monitoring on Twitter

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 52 / 56

Pajek

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 53 / 56

igraph

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 54 / 56

Gephi

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 55 / 56

Conclusions

Definition of a community or cluster, is not an easy task.In fact, what is a cluster is in the eyes of the beholder,One person’s noise could be another person’s signal.Cluster analysis is structure seeking although its operation isstructure imposing.In data clustering many choices must be done before any analysis(cluster definition , algorithms, measures, ...) which influencestrongly the result.But, in spite of all these warnings, clustering algorithms allow us toretrieve valuable pieces of information in social networks, byfinding communities.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 56 / 56

Vladimir Estivill-Castro.Why so many clustering algorithms: a position paper.SIGKDD Explor. Newsl., 4(1):65–75, 2002.

Santo Fortunato.Community detection in graphs.Physics Reports, 486(3-5):75–174, 2010.

Satu Elisa Shaeffer.Graph clustering.Computer Science Review, 1:27–64, 2007.

U. von Luxburg.A tutorial on spectral clustering.Technical Report Technical Report 149, Max Planck Institute forBiological Cybernetics, 2006.

Cuvelier (ECP) SNDA eBISS 2011, 07/07/2011 56 / 56