Community structure in graphs
description
Transcript of Community structure in graphs
![Page 1: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/1.jpg)
Community structure in graphs
Santo Fortunato
![Page 2: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/2.jpg)
More links “inside” than “outside”
Graphs are “sparse”
“Communities”
![Page 3: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/3.jpg)
Metabolic Protein-protein
Social Economical
![Page 4: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/4.jpg)
Outline
• Elements of community detection• Graph partitioning• Hierarchical clustering• The Girvan-Newman algorithm• New methods• Testing algorithms• Conclusions
![Page 5: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/5.jpg)
Questions
• What is a community?• What is a partition?• What is a “good” partition?
![Page 6: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/6.jpg)
Communities: definition
• Local criteria• Global criteria• Vertex similarity
In general, communities are indirectlydefined by the particular algorithm used!
![Page 7: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/7.jpg)
What is a partition?
“A partition is a subdivision of a graph in groups of vertices, such that each vertex is assigned to one group”
Problems:1) Overlapping communities2) Hierarchical structure
![Page 8: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/8.jpg)
Overlapping communities
In real networks,vertices may belongto different modules
G. Palla, I. Derényi, I. Farkas, T. Vicsek,Nature 435, 814, 2005
![Page 9: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/9.jpg)
Hierarchies
Modules may embedsmaller modules, yielding differentorganizational levels
A. Clauset, C. Moore, M.E.J. Newman, Nature 453, 98, 2008
![Page 10: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/10.jpg)
What is a “good” partition?
![Page 11: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/11.jpg)
How can we compare different partitions?
?
![Page 12: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/12.jpg)
Partition P1 versus P2: which one isbetter?
Quality function Q
Is Q(P1) > Q(P2) or Q(P1) < Q(P2) ?
![Page 13: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/13.jpg)
Modularity
= # links in module i
= expected # of links in module i
li
![Page 14: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/14.jpg)
History
• 1970s: Graph partitioning in computer science
• Hierarchical clustering in social sciences• 2002: Girvan and Newman, PNAS 99,
7821-7826• 2002-onward: methods of “new
generation”, mostly by physicists
![Page 15: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/15.jpg)
Graph partitioning
“Divide a graph in n parts, such that thenumber of links between them (cut size)is minimal”
Problems:
1. Number of clusters must be specified2. Size of the clusters must be specified
![Page 16: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/16.jpg)
If cluster sizes are not specified, the minimal cut size is zero, for a partition where all nodes stay in a single cluster and the other clusters are “empty”
Bipartition: divide a graph in two clustersof equal size and minimal cut size
![Page 17: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/17.jpg)
Spectral partitioning
Laplacian matrix L
![Page 18: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/18.jpg)
Spectral properties of L:
• All eigenvalues are non-negative• If the graph is divided in g components, there are g zero eigenvalues• In this case L can be rewritten in a block-diagonal form
![Page 19: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/19.jpg)
If the network is connected, but thereare two groups of nodes weakly linked to each other, they can be identified fromthe eigenvector of the second smallest eigenvalue (Fiedler vector)
The Fiedler vector has both positive andnegative components, their sum must be 0
If one wants a split into n1 and n2=n-n1 nodes, one takes the n1 largest (smallest)components of the Fiedler vector
![Page 20: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/20.jpg)
Kernighan-Lin algorithm
Start: split in two groups
At each step, a pair of nodes of differentgroups are swapped so to decrease the cut size
Sometimes swaps are allowed that increase the cut size, to avoid local minima
![Page 21: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/21.jpg)
Hierarchical clustering
Very common in social network analysis
1. A criterion is introduced to compare nodes based on their similarity2. A similarity matrix X is constructed:the similarity of nodes i and j is Xij
3. Starting from the individual nodes, largergroups are built by joining groups of nodes based on their similarity
![Page 22: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/22.jpg)
Final result: a hierarchy of partitions (dendrogram)
![Page 23: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/23.jpg)
Problems of traditionalmethods• Graph partitioning: one needs to specify
the number and the size of the clusters • Hierarchical clustering: many partitions
recovered, which one is the best?
One would like a method that can predictthe number and the size of the partitionand indicate a subset of “good” partitions
![Page 24: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/24.jpg)
Girvan-Newman algorithm
M. Girvan & M.E.J Newman, PNAS 99, 7821-7826 (2002)
Divisive method: one removes the linksthat connect the clusters, until the latterare isolated
How to identify intercommunity links? Betweenness
![Page 25: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/25.jpg)
Link-betweenness: number of shortestpaths crossing a link
![Page 26: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/26.jpg)
Steps
1. Calculate the betweenness of all links2. Remove the one with highest
betweenness3. Recalculate the betweenness of the
remaining edges4. Repeat from 2
![Page 27: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/27.jpg)
The process delivers a hierarchy of partitions: which one is the best?
The best partition is the one correspondingto the highest modularity Q
M.E.J. Newman & M. Girvan, Phys. Rev. E69, 026113 (2004)
The algorithm runs in a time O(n3) on a sparse graph (i.e. when m ~ n)
![Page 28: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/28.jpg)
New methods
• Divisive algorithms• Modularity optimization• Spectral methods• Dynamics methods• Clique percolation• Statistical inference
![Page 29: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/29.jpg)
Modularity optimization
1) Greedy algorithms2) Simulated annealing3) Extremal optimization4) Spectral optimization
Goal: find themaximum of Q overall possible networkpartitions
Problem: NP-complete!
![Page 30: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/30.jpg)
Greedy algorithm
• Start: partition with one node in each community
• Merge groups of nodes so to obtain the highest increase of Q
• Continue until all nodes are in the same community
• Pick the partition with largest modularity
M.E.J. Newman, Phys. Rev. E 69, 066133, 2004
CPU time O(n2)
![Page 31: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/31.jpg)
Resolution limit of modularity
![Page 32: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/32.jpg)
Ll
Ll
?
S.F. & M. Barthélemy, PNAS 104, 36 (2007)
![Page 33: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/33.jpg)
Dynamic algorithms
• Potts model• Synchronization• Random walks
![Page 34: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/34.jpg)
Clique percolation
G. Palla, I. Derényi, I. Farkas, T. Vicsek,Nature 435, 814, 2005
![Page 35: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/35.jpg)
Testing algorithm
• Artificial networks• Real networks with known
community structure
![Page 36: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/36.jpg)
Benchmark of Girvan & Newman
![Page 37: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/37.jpg)
Problems
• All nodes have the same degree• All communities have equal size
In real networks the distributions of degree and community size are highly heterogeneous!
![Page 38: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/38.jpg)
New benchmark (A. Lancichinetti, S. F., F. Radicchi, arXiv:0805.4770)
• Power law distribution of degree• Power law distribution of community size• A mixing parameter μsets the ratio between
the external and the total degree of each node
![Page 39: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/39.jpg)
![Page 40: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/40.jpg)
![Page 41: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/41.jpg)
Real networks
![Page 42: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/42.jpg)
Open problems
• Overlapping communities• Hierarchies• Directed graphs• Weighted graphs• Computational complexity• Testing?
![Page 43: Community structure in graphs](https://reader035.fdocuments.net/reader035/viewer/2022062500/56815863550346895dc5c17a/html5/thumbnails/43.jpg)
S. F., C. Castellano, arXiv:0712.2716