
Network communities

Leonid E. Zhukov

School of Data Analysis and Artificial Intelligence

Department of Computer Science

National Research University Higher School of Economics

Social Network Analysis

MAGoLEGO course

Leonid E. Zhukov (HSE) Lecture 5 15.05.2015 1 / 42


Lecture outline

1 Cohesive subgroups
   Graph cliques

2 Network communities
   Density, cuts, modularity

3 Algorithms
   Edge betweenness
   Modularity maximization
   Label propagation
   Fast community unfolding
   Walktrap


Network communities

Connected and undirected graphs

Network communities

What makes a community (cohesive subgroup):

Mutuality of ties. Everyone in the group has ties (edges) to one another

Compactness. Closeness or reachability of group members in a small number of steps, not necessarily adjacency

Density of edges. High frequency of ties within the group

Separation. Higher frequency of ties among group members compared to non-members

Wasserman and Faust


Graph cliques

Definition

A clique is a complete (fully connected) subgraph, i.e. a set of vertices where each pair of vertices is connected.

Cliques can overlap

Graph cliques

A maximal clique is a clique that cannot be extended by including one more adjacent vertex (it is not contained in a larger clique)

A maximum clique is a clique of the largest possible size in a given graph

The clique number of a graph is the size of its maximum clique

igraph: cliques(), maximal.cliques(), largest.cliques()
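To make the difference between maximal and maximum cliques concrete, here is a minimal pure-Python sketch of the basic Bron-Kerbosch enumeration (without pivoting). It is not the igraph implementation; the function name `maximal_cliques` and the toy graph are hypothetical choices for illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours (undirected, no self-loops).
    """
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored
        if not p and not x:
            found.append(sorted(r))      # r cannot be extended: it is maximal
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = maximal_cliques(adj)                        # all maximal cliques
clique_number = max(len(c) for c in cliques)          # size of a maximum clique
largest = [c for c in cliques if len(c) == clique_number]
print(cliques, clique_number, largest)
```

In this toy graph the edge 2-3 is a maximal clique (no vertex can be added to it) but not a maximum one, since the triangle is larger.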


Graph cliques

Maximum cliques

Maximal cliques:

Clique size:         2    3    4    5
Number of cliques:  11   21    2    2

Zachary, 1977


Network communities

Definition

Network communities are groups of vertices such that vertices inside a group are connected by many more edges than vertices in different groups.

Community detection is an assignment of vertices to communities.

We will consider non-overlapping communities and graph cuts


Community density

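Graph $G(V, E)$, $n = |V|$, $m = |E|$

Community: a set of nodes $S$; $n_s$ is the number of nodes in $S$, $m_s$ is the number of edges in $S$, $m_{ext}$ is the number of edges between $S$ and the rest of the graph

Graph density

$\rho = \frac{m}{n(n-1)/2}$

Community internal density

$\delta_{int}(S) = \frac{m_s}{n_s(n_s-1)/2}$

External edge density

$\delta_{ext}(S) = \frac{m_{ext}}{n_s(n-n_s)}$

Community (cluster): $\delta_{int} > \rho$, $\delta_{ext} < \rho$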
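As a worked illustration of the three densities, the following is a minimal pure-Python sketch. The function name `densities` and the toy graph (two triangles joined by a single edge) are assumptions made for this example only.

```python
def densities(n, edges, S):
    """Return (rho, delta_int, delta_ext) for node set S in an undirected graph."""
    S = set(S)
    m = len(edges)
    n_s = len(S)
    # edges with both endpoints in S
    m_s = sum(1 for u, v in edges if u in S and v in S)
    # edges with exactly one endpoint in S
    m_ext = sum(1 for u, v in edges if (u in S) != (v in S))
    rho = m / (n * (n - 1) / 2)
    delta_int = m_s / (n_s * (n_s - 1) / 2)
    delta_ext = m_ext / (n_s * (n - n_s))
    return rho, delta_int, delta_ext


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
rho, delta_int, delta_ext = densities(6, edges, {0, 1, 2})
print(rho, delta_int, delta_ext)
```

For the set {0, 1, 2} the internal density exceeds the graph density and the external density falls below it, so the set satisfies the community condition stated above.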


Graph cuts

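Graph cut:

$Q = \mathrm{cut} = c_s$

Ratio cut:

$Q = \frac{\mathrm{cut}}{|S|} + \frac{\mathrm{cut}}{|V \setminus S|} = \frac{n\, c_s}{n_s (n - n_s)}$

Normalized cut:

$Q = \frac{\mathrm{cut}}{\mathrm{Vol}(S)} + \frac{\mathrm{cut}}{\mathrm{Vol}(V \setminus S)} = \frac{c_s}{2 m_s + c_s} + \frac{c_s}{2(m - m_s) - c_s}$

Conductance (quotient cut):

$Q = \frac{\mathrm{cut}}{\min(\mathrm{Vol}(S), \mathrm{Vol}(V \setminus S))} = \frac{c_s}{2 m_s + c_s}$ when $\mathrm{Vol}(S) \le \mathrm{Vol}(V \setminus S)$

$c_s$ - number of edges crossing the boundary between $S$ and $V \setminus S$

$\mathrm{Vol}(S) = 2 m_s + c_s$ - sum of the degrees of the nodes in $S$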
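The cut measures can be checked numerically on a small example. The sketch below is pure Python and computes volumes directly as sums of node degrees; the function name `cut_measures` and the toy graph are hypothetical.

```python
def cut_measures(nodes, edges, S):
    """Cut size, ratio cut, normalized cut and conductance of the split (S, V \\ S)."""
    S = set(S)
    rest = set(nodes) - S
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # edges with exactly one endpoint in S
    c_s = sum(1 for u, v in edges if (u in S) != (v in S))
    vol_s = sum(deg[v] for v in S)          # Vol(S): sum of degrees in S
    vol_rest = sum(deg[v] for v in rest)    # Vol(V \ S)
    return {
        "cut": c_s,
        "ratio_cut": c_s / len(S) + c_s / len(rest),
        "normalized_cut": c_s / vol_s + c_s / vol_rest,
        "conductance": c_s / min(vol_s, vol_rest),
    }


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good = cut_measures(nodes, edges, {0, 1, 2})   # split along the bridge
bad = cut_measures(nodes, edges, {0, 1})       # split through a triangle
print(good)
print(bad)
```

All three normalized measures are lower for the split along the bridge than for the split through a triangle, which is the behaviour a community criterion should have.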


Modularity

Compare the fraction of edges within the cluster to the expected fraction in a random graph with an identical degree sequence

$Q = \frac{1}{4}\,(m_s - E(m_s))$

Modularity score

$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) = \sum_u \left( e_{uu} - a_u^2 \right)$

$e_{uu}$ - fraction of edges within community $u$

$a_u = \sum_v e_{uv}$ - fraction of ends of edges attached to nodes in $u$

The higher the modularity score, the better the communities

Modularity score range $Q \in [-1/2, 1)$; for a single community $Q = 0$
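The two forms of the modularity score can be compared numerically. The following pure-Python sketch evaluates both on a toy graph; the function names `modularity_pairs` and `modularity_groups` and the example partition are assumptions made for illustration, not igraph calls.

```python
def modularity_pairs(edges, community):
    """Q = 1/(2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    nodes = sorted(community)
    m = len(edges)
    deg = {v: 0 for v in nodes}
    A = {}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        A[(u, v)] = A[(v, u)] = 1
    total = 0.0
    for i in nodes:
        for j in nodes:
            if community[i] == community[j]:
                total += A.get((i, j), 0) - deg[i] * deg[j] / (2 * m)
    return total / (2 * m)


def modularity_groups(edges, community):
    """Q = sum_u (e_uu - a_u^2)."""
    m = len(edges)
    e = {}   # fraction of edges inside each community
    a = {}   # fraction of edge ends attached to each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e[cu] = e.get(cu, 0.0) + 1 / m
        a[cu] = a.get(cu, 0.0) + 1 / (2 * m)
        a[cv] = a.get(cv, 0.0) + 1 / (2 * m)
    return sum(e.get(c, 0.0) - a[c] ** 2 for c in a)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}

q_pairs = modularity_pairs(edges, two_groups)
q_groups = modularity_groups(edges, two_groups)
q_single = modularity_pairs(edges, one_group)
print(q_pairs, q_groups, q_single)
```

Both forms give Q = 5/14 for the two-triangle partition, and the single-community partition gives Q = 0, in line with the range stated above.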


Community detection

Consider only sparse graphs, $m \ll n^2$

Each community should be connected

Combinatorial optimization problem:
- optimization criterion (cut, conductance, modularity)
- optimization method

Exact solution is NP-hard (bi-partition: $n = n_1 + n_2$, $n!/(n_1!\, n_2!)$ combinations)

Solved by greedy, approximate algorithms or heuristics

Recursive top-down 2-way partition, multiway partition

Balanced class partition vs communities


Multiway partitioning


Recursive partitioning


Edge betweenness

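Focus on edges that connect communities.

Edge betweenness: the number of shortest paths $\sigma_{st}(e)$ between $s$ and $t$ going through edge $e$, relative to the total number $\sigma_{st}$ of shortest paths between $s$ and $t$, summed over all pairs

$C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$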

Construct communities by progressively removing edges

Edge betweenness algorithm

Newman-Girvan, 2004

Algorithm: Edge Betweenness

Input: graph G(V, E)

Output: dendrogram/communities

repeat
    for all e ∈ E compute edge betweenness C_B(e);
    remove the edge e_i with the largest C_B(e_i);
until no edges are left;

For a bi-partition, stop when the graph splits into two components (check for connectedness)
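The loop above can be sketched in pure Python for the bi-partition case. This is a minimal illustration under stated assumptions (unweighted, undirected graph without self-loops), not the igraph implementation. Edge betweenness is accumulated with a Brandes-style breadth-first search, and each unordered pair {s, t} is counted once, which changes the scores by a constant factor relative to a sum over ordered pairs but not their ranking. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness of an unweighted undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # every unordered pair {s, t} was visited from both ends
    return {e: c / 2 for e, c in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_bipartition(edges):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        score = edge_betweenness(adj)    # recomputed after every removal
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
parts = girvan_newman_bipartition(edges)
print(parts)
```

Recomputing the betweenness after every removal is what makes the procedure expensive; the bridge 2-3 carries all nine cross-triangle shortest paths in the toy graph, so it is removed first.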


Edge betweenness

Hierarchical algorithm, dendrogram


Edge betweenness

Zachary karate club


Edge betweenness

Zachary karate club

igraph: edge.betweenness.community()


Edge betweenness

igraph:modularity()


Edge betweenness

best: clusters = 6, modularity = 0.345

igraph: edge.betweenness.community()

Edge betweenness

Zachary karate club

igraph:dendPlot()


Spectral graph partitioning

Indicator vector $s_i = \pm 1$

Integer optimization problem

Normalized cuts:

$Q = \frac{1}{4} s^T L s, \quad L = D - A$

Modularity optimization:

$Q = \frac{1}{4m} s^T B s, \quad B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$

Relaxation $s \to x$, $s \in \mathbb{Z}^n$, $x \in \mathbb{R}^n$

Quadratic optimization problem under constraints

Solved by finding min/max eigenvalues and eigenvectors of $L$ or $B$

$L x = \lambda D x$, or $B x = \lambda x$

The eigenvector is rounded to an indicator vector, $s = \mathrm{sign}(x)$

Fiedler 1973, Pothen and Simon 1990, Newman 2006


Spectral partitioning

Newman, 2006


Spectral modularity maximization

M. Newman, 2006

Algorithm: Spectral modularity maximization: two-way partition

Input: adjacency matrix A

Output: class indicator vector s

compute $k = \deg(A)$;

compute $B = A - \frac{1}{2m} k k^T$;

solve $B x = \lambda x$ for the eigenvector $x_{max}$ with the largest eigenvalue;

set $s = \mathrm{sign}(x_{max})$
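The four steps can be sketched without a linear algebra library by using power iteration. Power iteration is a substitute chosen here for self-containment, not something the slides prescribe: because it finds the eigenvalue of largest magnitude, the sketch first shifts B by a Gershgorin bound so that the most positive eigenvalue of B becomes dominant. The function name `spectral_bipartition`, the iteration count and the starting vector are hypothetical choices.

```python
def spectral_bipartition(A, iters=1000):
    """Two-way split from the leading eigenvector of the modularity matrix."""
    n = len(A)
    k = [sum(row) for row in A]                 # degrees
    two_m = sum(k)                              # 2m
    # modularity matrix B = A - k k^T / (2m)
    B = [[A[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    # Gershgorin bound on |eigenvalues|: B + shift*I has no negative eigenvalues,
    # so power iteration converges to the largest eigenvalue of B itself
    shift = max(sum(abs(b) for b in row) for row in B)
    x = [1.0 + 0.1 * i for i in range(n)]       # arbitrary non-constant start
    for _ in range(iters):
        y = [sum(B[i][j] * x[j] for j in range(n)) + shift * x[i]
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [1 if v >= 0 else -1 for v in x]     # s = sign(x_max)


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
s = spectral_bipartition(A)
print(s)
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters: the two triangles should receive opposite signs. In practice a library eigensolver would replace the power iteration.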


Leading eigenvector

clusters = 5, modularity = 0.437

igraph: leading.eigenvector.community()

Label propagation algorithm

U.N. Raghavan, R. Albert, S. Kumara, 2007

Algorithm: Label propagation

Input: graph G(V, E)

Output: communities

initialize labels on all nodes (a unique label per node);

randomize the node order;

repeat
    for every node, replace its label with the label occurring with the highest frequency among its neighbors (ties are broken uniformly at random);
until every node has a label that the maximum number of its neighbors have;
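A minimal pure-Python sketch of this procedure follows. It assumes an undirected graph without isolated nodes and updates labels asynchronously, reshuffling the node order on every sweep; the function name `label_propagation`, the `seed` parameter and the `max_sweeps` safety cap are hypothetical additions for reproducibility and are not part of the algorithm as stated.

```python
import random


def label_propagation(adj, seed=0, max_sweeps=1000):
    """Asynchronous label propagation; adj maps node -> set of neighbours."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}             # unique initial label per node

    def top_labels(v):
        # labels carried by the largest number of neighbours of v
        counts = {}
        for w in adj[v]:
            counts[labels[w]] = counts.get(labels[w], 0) + 1
        best = max(counts.values())
        return [lab for lab, c in counts.items() if c == best]

    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)                   # randomized node order
        for v in nodes:
            labels[v] = rng.choice(top_labels(v))   # ties broken at random
        # stop when every node carries one of its most frequent neighbour labels
        if all(labels[v] in top_labels(v) for v in nodes):
            break
    return labels


# Hypothetical toy graph: two 4-cliques joined by the edge 3-4.
clique_a, clique_b = [0, 1, 2, 3], [4, 5, 6, 7]
edges = [(u, v) for c in (clique_a, clique_b) for u in c for v in c if u < v]
edges.append((3, 4))
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

labels = label_propagation(adj)
print(labels)
```

The outcome depends on the random order and tie-breaking: on this toy graph each clique always ends up with a single label, but the two cliques may or may not share it, which illustrates why the method is usually run several times.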

Raghavan, 2007


Label propagation

clusters = 3, modularity = 0.435

igraph: label.propagation.community()

Label propagation

image from Lab41 blog


Fast community unfolding

V.D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, 2008, "The Louvain method"

Heuristic method for greedy modularity optimization

Find partitions with high modularity

Multi-level (multi-resolution) hierarchical scheme

Scalable

V. Blondel et.al., 2008


Fast community unfolding algorithm

Algorithm: Fast unfolding

Input: graph G(V, E)

Output: communities

assign every node to its own community;

repeat
    repeat
        for every node, evaluate the modularity gain from removing the node from its community and placing it in the community of each of its neighbors;
        place the node in the community maximizing the modularity gain;
    until no more improvement (local max of modularity);
    merge the nodes of each community into a "super node";
    add up the weights on the links between super nodes;
until no more changes (max modularity);
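The inner loop (local moving of nodes) can be sketched in pure Python as below. This is a partial illustration only: it covers the first phase on an unweighted graph and omits the aggregation into super nodes and the outer loop. The gain of placing an isolated node i into community C is taken as k_{i,C}/m - Σ_tot(C) k_i / (2m²), which follows from the modularity definition; the function name `louvain_local_moving` and the small tolerance used to require a strict improvement are assumptions.

```python
def louvain_local_moving(adj):
    """First (local moving) phase of the Louvain method, unweighted graph."""
    m = sum(len(nb) for nb in adj.values()) / 2     # number of edges
    k = {v: len(adj[v]) for v in adj}               # node degrees
    comm = {v: v for v in adj}                      # every node on its own
    tot = dict(k)                                   # total degree per community
    improved = True
    while improved:
        improved = False
        for v in adj:
            old = comm[v]
            tot[old] -= k[v]                        # take v out of its community
            links = {}                              # edges from v to each community
            for w in adj[v]:
                links[comm[w]] = links.get(comm[w], 0) + 1

            def gain(c):
                # modularity gain of inserting the isolated node v into c
                return links.get(c, 0) / m - tot[c] * k[v] / (2 * m * m)

            best = max(links, key=gain, default=old)
            if gain(best) <= gain(old) + 1e-12:
                best = old                          # move only on strict improvement
            tot[best] += k[v]
            comm[v] = best
            if best != old:
                improved = True
    return comm


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

comm = louvain_local_moving(adj)
print(comm)
```

On this toy graph the local phase alone already recovers the two triangles. On larger graphs the aggregation step then repeats the same procedure on the graph of super nodes.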


Fast community unfolding

V. Blondel et.al., 2008


Fast community unfolding

clusters = 4, modularity = 0.445

igraph: multilevel.community()


Walktrap community

P. Pons and M. Latapy, 2006, "Random walks based community detection"

Consider a random walk on the graph

At each time step the walk moves to a nearest neighbor (NN) chosen uniformly at random

The distance between nodes $r_{ij}(t)$ is computed from the probability $P^t_{ij}$ of getting from one node to the other in $t$ steps

Computations:
- exact, matrix multiplication
- approximate, random walk simulation

Vertex clustering (agglomerative algorithm)
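The exact computation by matrix multiplication can be sketched in pure Python. The slide does not spell out the distance formula, so the sketch assumes the definition from the Pons and Latapy paper, r_ij(t) = sqrt(Σ_k (P^t_ik - P^t_jk)² / d(k)), where d(k) is the degree of node k. The function name `walk_distances` and the default t = 3 are hypothetical.

```python
def walk_distances(A, t=3):
    """Random-walk distances r_ij(t) from an adjacency matrix (list of lists)."""
    n = len(A)
    d = [sum(row) for row in A]                       # node degrees
    # one-step transition matrix: move to a uniformly random neighbour
    P = [[A[i][j] / d[i] for j in range(n)] for i in range(n)]
    # P^t by repeated multiplication, starting from the identity
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        Pt = [[sum(Pt[i][l] * P[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]

    def r(i, j):
        # assumed Pons-Latapy distance: degree-weighted difference of the
        # t-step probability distributions starting from i and from j
        return sum((Pt[i][l] - Pt[j][l]) ** 2 / d[l] for l in range(n)) ** 0.5

    return [[r(i, j) for j in range(n)] for i in range(n)]


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
r = walk_distances(A, t=3)
print(r[0][1], r[0][5])
```

Nodes in the same triangle end up closer to each other than nodes in different triangles, which is the property the agglomerative step relies on.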

P. Pons and M. Latapy, 2006


Walktrap

Algorithm: Walktrap community detection

Input: graph G(V, E)

Output: dendrogram/communities

assign each vertex to its own community;

compute the random walk distance between adjacent vertices;

for n-1 steps do
    choose the two "closest" communities and merge them;
    update the distances between communities
end


Walktrap

P. Pons and M. Latapy, 2006


Walktrap

clusters = 4, modularity = 0.440

igraph: walktrap.community()

Overlapping communities

Community detection:

Graph partitioning (sparse cuts)

Vertex clustering (vertex similarity)

image from W. Liu , 2014


Community detection algorithms

Fortunato, 2010


References

S. Fortunato, Community detection in graphs, Physics Reports, Vol. 486, Iss. 3-5, pp. 75-174, 2010.

S. E. Schaeffer, Graph clustering, Computer Science Review, 1(1), pp. 27-64, 2007.

M.E.J. Newman, Modularity and community structure in networks, PNAS, vol 103, no 26, pp. 8577-8582, 2006.

M.E.J. Newman, M. Girvan, Finding and evaluating community structure in networks, Phys. Rev. E, 69, 2004.

U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Phys. Rev. E, 76(3), 036106, 2007.

G. Palla, I. Derenyi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435, pp. 814-818, 2005.

P. Pons, M. Latapy, Computing communities in large networks using random walks, Journal of Graph Algorithms and Applications, 10, pp. 191-218, 2006.

V.D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, J. Stat. Mech., P10008, 2008.
