Network theory III David Lusseau BIOL4062/5062 [email protected].
Outline
16 March: community structure
Suggested readings: Newman M.E.J. 2003. The structure and function of complex networks. SIAM Review 45, 167-256
What is a community?
A cluster of individuals that are more linked to one another than to others
Traditional techniques
Cluster analysis (hierarchical)
Multi-Dimensional Scaling
Principal Coordinate Analysis
Traditional techniques
How representative is the result?
Loss-of-information measure: stress in MDS
What is the best division?
Cluster analysis: peripheral individuals are lumped together
Girvan-Newman algorithm
Divisive clustering algorithm
Divides a population of n vertices into 1 to n communities
Find the boundaries of communities
Weakest link between communities: the edge with the highest edge betweenness
Standardise betweenness at each step
Re-calculate edge betweenness at each step
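The loop described above can be sketched in plain Python. This is a minimal illustration, not the published implementation: the function names (`edge_betweenness`, `components`, `girvan_newman_step`) and the toy graph of two triangles joined by one edge are assumptions made for the example, and the standardisation step is left out.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, sharing path counts along edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every pair of vertices was counted from both ends.
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, result = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result

def girvan_newman_step(adj):
    """Remove highest-betweenness edges until one more community appears."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)  # recalculated after every removal
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Toy graph (assumed for illustration): two triangles joined by the edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = girvan_newman_step(graph)
print(sorted(sorted(c) for c in parts))
```

Calling `girvan_newman_step` again on each resulting subgraph continues the division towards n communities.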
Zachary karate club
Girvan & Newman 2002 PNAS
Finding the best division
For each step, calculate a modularity coefficient Q
Best division will have the most edges within communities and the fewest between
Take community size into consideration
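$Q = \sum_i \left( e_{ii} - a_i^2 \right)$, where $a_i = \sum_j e_{ij}$

Example: links within and between three communities (total 108; $e_{ij}$ = table entry / 108)

      1    2    3
1    30    2    5
2     2   10    2
3     5    2   50

$Q = \left( \frac{30}{108} - \left( \frac{37}{108} \right)^2 \right) + \left( \frac{10}{108} - \left( \frac{14}{108} \right)^2 \right) + \left( \frac{50}{108} - \left( \frac{57}{108} \right)^2 \right)$
Q = 0.42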
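As a check on the arithmetic, the short Python sketch below recomputes Q = sum_i (e_ii - a_i^2) from the 3 x 3 table. It assumes the table entries are counts that are normalised by their total (108); the function name `modularity` is chosen here for illustration only.

```python
def modularity(counts):
    """Q = sum_i (e_ii - a_i^2) from a community-by-community count matrix."""
    total = sum(sum(row) for row in counts)
    q = 0.0
    for i, row in enumerate(counts):
        e_ii = row[i] / total    # fraction of links inside community i
        a_i = sum(row) / total   # fraction of links attached to community i
        q += e_ii - a_i ** 2
    return q

counts = [
    [30, 2, 5],
    [2, 10, 2],
    [5, 2, 50],
]
print(round(modularity(counts), 2))  # 0.42
```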
Zachary karate club
Newman & Girvan 2003 Physical Review E
Modularity coefficient
The principle of modularity coefficient optimisation can be applied to any community structure algorithm
Extension to weighted matrices
Edge betweenness
Transform the similarity matrix into a dissimilarity matrix
Calculate geodesic paths using Dijkstra's algorithm
Problem: more likely to remove edges between strongly connected pairs
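A minimal sketch of the weighted geodesic step, assuming the dissimilarity is taken as 1/similarity (the slides do not say which transformation is used, so this choice is an assumption). The function name `dijkstra` and the three-individual example are illustrative.

```python
import heapq

def dijkstra(similarity, source):
    """Shortest weighted path lengths from source, using 1/similarity as edge length."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, s in similarity[u].items():
            nd = d + 1.0 / s  # assumed transformation: dissimilarity = 1 / similarity
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical association indices between three individuals.
similarity = {
    "a": {"b": 0.5, "c": 0.1},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 0.1, "b": 0.5},
}
dist = dijkstra(similarity, "a")
print(dist["c"])  # 4.0: the route a-b-c (2 + 2) beats the weak direct link (10)
```

In this example the geodesic from a to c runs through the two strong links, which is why edges between strongly connected pairs accumulate betweenness and become candidates for removal.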
Alternative: Modularity optimisation
Forget edge betweenness
Optimise for high Q!
Computer intensive
Prone to false (local) maxima of Q
Difficult to detect: repeat the optimisation to detect them
Not always successful
Modularity- Greedy algorithm
Start with n communities (agglomerative clustering method)
At each step, merge the pair of communities that provides the greatest increase (or the smallest decrease) in Q
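The agglomerative procedure can be sketched as follows. This is a deliberately naive version that recomputes Q from scratch for every candidate merge, so it only suits small graphs; the names `modularity_of` and `greedy_modularity` and the toy edge list are assumptions for the example.

```python
from itertools import combinations

def modularity_of(edges, label):
    """Q = sum_c (e_cc - a_c^2) for an unweighted edge list and a community label per vertex."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        degree[label[u]] = degree.get(label[u], 0) + 1
        degree[label[v]] = degree.get(label[v], 0) + 1
        if label[u] == label[v]:
            inside[label[u]] = inside.get(label[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

def greedy_modularity(nodes, edges):
    """Merge communities pairwise, keeping the partition with the highest Q seen."""
    label = {v: v for v in nodes}  # start with n singleton communities
    best_q, best_label = modularity_of(edges, label), dict(label)
    while len(set(label.values())) > 1:
        candidates = []
        for a, b in combinations(sorted(set(label.values())), 2):
            trial = {v: (a if c == b else c) for v, c in label.items()}
            candidates.append((modularity_of(edges, trial), trial))
        # Greatest increase, or smallest decrease, in Q.
        q, label = max(candidates, key=lambda item: item[0])
        if q > best_q:
            best_q, best_label = q, dict(label)
    return best_q, best_label

# Toy graph (assumed for illustration): two triangles joined by the edge 2-3.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_q, best_label = greedy_modularity(nodes, edges)
print(round(best_q, 3), best_label)
```

The merging continues until a single community remains; the best division is the step at which Q peaked.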
Q optimisation
Girvan-Newman
Modularity- Greedy algorithm
Overlapping communities
Recognise that some individuals sit on the fence
Do not force them into one community or the other, but identify them as overlapping
Palla et al. 2005 Nature
Palla algorithm
Based on the k-clique principle: a community is composed of a number of k-cliques
k-cliques: fully connected subgraphs of k vertices
Adjacent k-cliques share k-1 vertices
Community: series of adjacent cliques
Palla et al. 2005 Nature
Palla algorithm
Find all k-cliques
Calculate the clique-clique overlap matrix
Define adjacent cliques
Issues (and advantages):
k is user-defined; find the 'best' k by trial and error
Works only on binary networks (weighted networks need to be transformed first)
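The three steps (find k-cliques, compare their overlap, chain adjacent cliques) can be sketched with a brute-force search. This is an illustration for small binary networks only, not the authors' software; the function name `k_clique_communities` and the example graph are assumptions.

```python
from itertools import combinations

def k_clique_communities(adj, k):
    """Communities as unions of k-cliques chained through overlaps of k-1 vertices."""
    # 1. Find all k-cliques (brute force over vertex subsets).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]
    # 2. Cliques sharing k-1 vertices are adjacent; group them with union-find.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # 3. A community is the set of vertices in one chain of adjacent cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())

# Toy graph (assumed for illustration): two triangles sharing vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
communities = k_clique_communities(graph, 3)
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2], [2, 3, 4]]
```

With k = 3 the two triangles share only one vertex, so they are not adjacent and vertex 2 is reported in both communities.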
Palla et al. 2005 Nature
Simply the best method
Modularity matrix
A matrix? Let’s eigenanalyse!
Let’s rewrite the modularity coefficient:
$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$
$\frac{k_i k_j}{2m}$: links distributed at random
$s_i s_j$: community identification
Newman 2006 PNAS
Modularity matrix
Rows and columns each sum to 0
One eigenvector (1,1,1,…) with eigenvalue 0
As for the graph Laplacian
Eigenvector of the dominant eigenvalue gives the best community division into 2 communities (negative and positive elements)
$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$
Magnitude of eigenvector elements
Tells us how well a vertex is classified (whether it belongs to the core or the periphery of the community)
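A rough sketch of the eigenvector step, using only the standard library. It builds B_ij = A_ij - k_i k_j / 2m and finds the eigenvector of the most positive eigenvalue by power iteration on a shifted matrix. The shift, the starting vector, the iteration count, the function name and the toy graph are all assumptions for illustration; a linear algebra library would normally be used instead.

```python
def leading_eigenvector(adjacency, iterations=500):
    """Eigenvector of the most positive eigenvalue of the modularity matrix B."""
    n = len(adjacency)
    k = [sum(row) for row in adjacency]  # degrees
    two_m = sum(k)                       # twice the number of edges
    B = [[adjacency[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    # Shift so that every eigenvalue of B + shift*I is non-negative; the
    # dominant eigenvector of the shifted matrix is then the one we want.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy graph (assumed for illustration): two triangles joined by the edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
vector = leading_eigenvector(adjacency)
groups = [0 if x > 0 else 1 for x in vector]  # sign gives the community
print(groups)
print([round(abs(x), 2) for x in vector])     # magnitude: strength of membership
```

In this example the two vertices that carry the bridge (2 and 3) have smaller magnitudes than the other four, consistent with them sitting closer to the periphery of their communities.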
Zachary karate club
Finding the best division
Repeat the process on each subgraph
Recalculate the modularity coefficient for the whole graph
If a new division makes a zero or negative contribution to modularity, do not make it
Else continue
Power of modularity matrix method
Different types of null models can be tested
As long as we have one eigenvector (1,1,1,…) with eigenvalue 0
To do so, subtract each row sum from the corresponding diagonal element
$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - P_{ij} \right) s_i s_j$
Uncertainty
Bootstrapped algorithm
m results from the community algorithm
Matrix: likelihood that 2 individuals belong to the same community
Coarse-grain community identity
Provides uncertainty overlap
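The co-membership matrix can be built directly from the m results. The sketch below assumes each result is a list giving one community label per individual, in a fixed order; the function name `comembership` and the four example partitions are invented for illustration.

```python
def comembership(partitions):
    """Proportion of runs in which each pair of individuals shares a community."""
    m = len(partitions)
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

# Hypothetical output of four bootstrap runs on four individuals.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
matrix = comembership(partitions)
for row in matrix:
    print(row)
```

Labels only need to be consistent within a run, not between runs, because the matrix records whether two individuals share a label and not which label it is.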
Girvan-Newman in Netdraw
Modularity matrix in Socprog