Physics Inspired Approaches to Community Detection

  • Date posted: 23-Aug-2014
  • Category: Science

Description

Community structure is one of the most relevant features of graphs in sociology, biology, computer science, and so on. In these slides, the following methods for community detection are reviewed: (1) synchronization and (2) spinglass.

References
[1] A. Arenas, A. Díaz-Guilera, C. J. Pérez-Vicente, Phys. Rev. Lett. 96, 114102 (2006) [arXiv:cond-mat/0511730]
[2] P. Ronhovde, Z. Nussinov, Phys. Rev. E 81, 046114 (2010) [arXiv:0803.2548]
[3] S. Fortunato, Phys. Rep. 486, 74 (2010) [arXiv:0906.0612]

Transcript of Physics Inspired Approaches to Community Detection

  • Physics Inspired Approaches to Community Detection 2012-09 1 / 52
  • Abstract: Community structure is one of the most relevant features of graphs in sociology, biology, computer science, and so on. In this talk, we review the following methods for community detection: synchronization and spinglass.
  • Outline
    1. Introduction
    2. Synchronization
    3. Spinglass
    4. Summary
  • 1. Introduction
  • Communities in real-world networks
    - biological networks: protein interaction, gene regulatory, metabolic, food chain
    - social networks: SNS, collaborators, phone/email, organizations
    - technical systems: web graph, Internet, power grid
    - other networks: citation, e-commerce/bidding, stock returns
    - dynamical phenomena: epidemics, cascades, synchronization, opinion change
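  • Notation
    - $G = (V, E)$: graph (network)
    - $v_i \in V$: vertex (node), $n = |V|$
    - $(i, j) \in E$: edge (link) from $v_i$ to $v_j$, $2m = \sum_{ij} A_{ij}$
    - $A_{ij}$: adjacency matrix, $A_{ij} = 1$ if $(i, j) \in E$ and $A_{ij} = 0$ otherwise
    - $k_i$: degree of $v_i$, $k_i^{\mathrm{out}} = \sum_j A_{ij}$, $k_j^{\mathrm{in}} = \sum_i A_{ij}$
    - $w_{ij} \ge 0$: weight of $(i, j)$, $w_i^{\mathrm{out}} = \sum_j w_{ij}$, $w_j^{\mathrm{in}} = \sum_i w_{ij}$, $2w = \sum_{ij} w_{ij}$
    - $c_s \in C$: community, $q = |C|$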
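As a quick check of these conventions, the following minimal Python sketch computes $w_i^{\mathrm{out}}$, $w_j^{\mathrm{in}}$ and $2w$ from a dense weight matrix. The function name `strengths` and the list-of-lists storage are illustrative assumptions; with a 0/1 adjacency matrix the same function returns $k_i^{\mathrm{out}}$, $k_j^{\mathrm{in}}$ and $2m$.

```python
def strengths(w):
    """Return (w_out, w_in, two_w) for a weight matrix w[i][j] (edge v_i -> v_j)."""
    n = len(w)
    w_out = [sum(w[i][j] for j in range(n)) for i in range(n)]  # w_i^out = sum_j w_ij
    w_in = [sum(w[i][j] for i in range(n)) for j in range(n)]   # w_j^in  = sum_i w_ij
    two_w = sum(w_out)                                          # 2w = sum_ij w_ij
    return w_out, w_in, two_w


# Undirected path v0 - v1 - v2 stored symmetrically: each edge appears twice,
# so the sum of all entries is 2m = 4 for m = 2 edges.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
k_out, k_in, two_m = strengths(A)
print(k_out, k_in, two_m)  # [1, 2, 1] [1, 2, 1] 4
```

For a directed matrix, the same sum simply counts each edge once, matching the definition $2w = \sum_{ij} w_{ij}$ above.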
  • What is community detection? Communities are subgraphs within which connections are dense and between which they are sparse. The concept of a community is not rigorously defined and involves some degree of arbitrariness. Finding an exact solution is NP-hard in most cases, so approximation algorithms are needed.
  • In this talk, we focus on non-overlapping communities, non-dynamical (static) graphs, and sparse graphs, i.e. $m = O(n)$.
  • Girvan-Newman algorithm [1]: a hierarchical divisive algorithm
    - iteratively remove the edge with the highest betweenness, then recalculate the betweenness of the remaining edges
    - cost: $O(m^2 n)$ (shortest-path betweenness version)
    - alternative definitions of betweenness: (1) shortest-path betweenness, (2) flow betweenness, (3) random-walk betweenness
  • Figure (Ref. [1])
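A minimal pure-Python sketch of one divisive step, assuming an unweighted, undirected graph stored as a dict of neighbour sets. It computes shortest-path edge betweenness with Brandes-style dependency accumulation (a standard technique, not necessarily the one used in Ref. [1]) and removes the top edge, recomputing after every removal, until the graph splits. The helper names are hypothetical; repeating the split on each component yields the full dendrogram.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness of an unweighted, undirected graph.
    adj: dict node -> set of neighbours."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: number of shortest paths (sigma) and predecessor lists
        dist, sigma = {s: 0}, dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, splitting dependencies over predecessors
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Every pair (s, t) was counted once from s and once from t
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of node sets (iterative DFS)."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = {s}, [s]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge, recomputing betweenness after
    every removal, until the number of components increases."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        eb = edge_betweenness(adj)
        u, v = max(eb, key=eb.get)  # assumes at least one edge remains
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(sorted(c) for c in girvan_newman_split(adj)))  # [[0, 1, 2], [3, 4, 5]]
```

The bridge carries all 9 shortest paths between the two triangles, so it is the first edge removed.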
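  • Modularity: many algorithms use modularity $Q$ as a measure of the goodness of a partition [2].
    $Q = \frac{1}{2w} \sum_{ij} \left[ w_{ij} - \frac{w_i^{\mathrm{in}} w_j^{\mathrm{out}}}{2w} \right] \delta(c_i, c_j) = \sum_{s=1}^{q} \left[ \frac{w_{ss}}{w} - \frac{w_s^{\mathrm{in}} w_s^{\mathrm{out}}}{4w^2} \right]$
    where $2w_{ss} = \sum_{ij} w_{ij} \delta(c_i, c_s) \delta(c_j, c_s)$, $w_s^{\mathrm{in/out}} = \sum_i w_i^{\mathrm{in/out}} \delta(c_i, c_s)$, and $c_i$ is the community to which $v_i$ belongs.
    The first term is the weight of within-community edges; the second term is its expected value in a randomized graph that preserves each node's in- and out-strength.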
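The two expressions for $Q$ can be checked against each other numerically. This sketch assumes a dense weight matrix `w[i][j]` (edge from $v_i$ to $v_j$) and a list `c` of community labels; both function names are hypothetical. For an undirected graph stored symmetrically, both reduce to Newman's modularity.

```python
from collections import defaultdict


def modularity_pairwise(w, c):
    """Q = (1/2w) sum_ij [w_ij - w_i^in w_j^out / (2w)] delta(c_i, c_j)."""
    n = len(w)
    w_out = [sum(row) for row in w]
    w_in = [sum(w[i][j] for i in range(n)) for j in range(n)]
    two_w = sum(w_out)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if c[i] == c[j]:
                total += w[i][j] - w_in[i] * w_out[j] / two_w
    return total / two_w


def modularity_by_community(w, c):
    """Q = sum_s [w_ss / w - w_s^in w_s^out / (4 w^2)]."""
    n = len(w)
    two_wss = defaultdict(float)  # 2 w_ss: weight inside community s
    ws_in = defaultdict(float)    # w_s^in:  sum of w_j^in over j in c_s
    ws_out = defaultdict(float)   # w_s^out: sum of w_i^out over i in c_s
    for i in range(n):
        for j in range(n):
            if c[i] == c[j]:
                two_wss[c[i]] += w[i][j]
            ws_out[c[i]] += w[i][j]
            ws_in[c[j]] += w[i][j]
    w_total = sum(ws_out.values()) / 2  # this is w, since all entries sum to 2w
    return sum(two_wss[s] / 2 / w_total - ws_in[s] * ws_out[s] / (4 * w_total ** 2)
               for s in set(c))


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3), stored symmetrically
A = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i][j] = A[j][i] = 1
labels = [0, 0, 0, 1, 1, 1]
print(modularity_pairwise(A, labels), modularity_by_community(A, labels))  # both ≈ 0.357 (5/14)
```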
  • Greedy modularity optimization: a hierarchical agglomerative algorithm
    - iteratively merge the pair of communities that produces the largest increase in modularity
    - $O((m + n)n)$ [2, Newman]
    - $O(md \log n)$ with a max-heap, where $d$ is the depth of the dendrogram (typically $d \sim \log n$) [4, Clauset-Newman-Moore]
    - an improved merging strategy is proposed in [20]
  • Louvain method [25]
    1. Assign each node its own community.
    2. Iterate until nothing changes:
       2.1. Locally optimize each node in sequential order (move it to the neighboring community with the largest modularity gain) until nothing changes.
       2.2. Replace the communities by supernodes.
    Alternatively, start with $q < n$ randomly assigned communities and try several initial conditions. A new random sequential order can be used each time.
  • Figure (Ref. [25])
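A compact sketch of the two Louvain phases for an undirected weighted graph, assuming adjacency dicts `adj[u][v]` in which a self-loop entry `adj[u][u]` holds twice the internal weight of a supernode (so node strengths stay consistent after aggregation). Moving an isolated node $u$ into community $c$ changes $Q$ by $\frac{1}{m}\left(k_{u,c} - \frac{\Sigma_c k_u}{2m}\right)$, where $k_{u,c}$ is the weight from $u$ to $c$ and $\Sigma_c$ is the total strength of $c$; the code compares only the bracket. Node order is reshuffled on every sweep, as the slide suggests. Function names are hypothetical, and the graph must have at least one edge.

```python
import random
from collections import defaultdict


def local_moving(adj, rng):
    """Phase 1: sweep nodes in random order, moving each to the neighbouring
    community with the largest modularity gain, until no node moves."""
    k = {u: sum(nbrs.values()) for u, nbrs in adj.items()}  # node strengths
    two_m = sum(k.values())
    comm = {u: u for u in adj}        # every node starts in its own community
    tot = dict(k)                     # total strength of each community
    nodes, moved_any, improved = list(adj), False, True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for u in nodes:
            cu = comm[u]
            tot[cu] -= k[u]           # take u out of its community
            links = defaultdict(float)  # weight from u to each neighbouring community
            for v, wt in adj[u].items():
                if v != u:
                    links[comm[v]] += wt
            # gain of inserting u into c is proportional to k_{u,c} - tot_c * k_u / 2m
            best, best_gain = cu, links[cu] - tot[cu] * k[u] / two_m
            for c, k_uc in links.items():
                gain = k_uc - tot[c] * k[u] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += k[u]
            if best != cu:
                comm[u] = best
                improved = moved_any = True
    return comm, moved_any


def aggregate(adj, comm):
    """Phase 2: one supernode per community; weights between communities are summed."""
    new = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new[comm[u]]            # make sure every community gets a node
        for v, wt in nbrs.items():
            row[comm[v]] += wt
    return {c: dict(row) for c, row in new.items()}


def louvain(adj, seed=0):
    rng = random.Random(seed)
    membership = {u: u for u in adj}  # original node -> current supernode
    while True:
        comm, moved = local_moving(adj, rng)
        if not moved:
            break
        membership = {u: comm[s] for u, s in membership.items()}
        adj = aggregate(adj, comm)
    groups = defaultdict(set)
    for u, s in membership.items():
        groups[s].add(u)
    return list(groups.values())


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: {} for v in range(6)}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1.0
print(sorted(sorted(c) for c in louvain(adj)))  # [[0, 1, 2], [3, 4, 5]]
```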
  • Resolution limit of modularity: modularity optimization may fail to identify communities smaller than an intrinsic scale of order $\sqrt{m}$, measured in number of links [11].
    Figure (Ref. [11]): panel A, a network of cliques $K_m$; panel B, a network of cliques $K_m$ and $K_p$.
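The effect can be reproduced with a ring of cliques: $r$ cliques of $k$ nodes (written $K_m$ in the figure), joined in a ring by single links. This sketch compares the partition into individual cliques with the one that merges neighbouring cliques in pairs. The clique size is called `k` here to avoid a clash with the edge count $m$; $r = 30$, $k = 5$ is just an example, and the helper names are hypothetical.

```python
from collections import defaultdict


def ring_of_cliques(r, k):
    """r cliques K_k (clique c on nodes c*k .. c*k+k-1), consecutive cliques
    joined in a ring by one link each."""
    edges = []
    for c in range(r):
        base = c * k
        edges += [(base + i, base + j) for i in range(k) for j in range(i + 1, k)]
        edges.append((base, ((c + 1) % r) * k + 1))  # single link to the next clique
    return edges


def modularity(edges, community_of):
    """Q = sum_s [l_s / m - (d_s / 2m)^2]: l_s internal edges, d_s total degree of c_s."""
    m = len(edges)
    internal, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[community_of(u)] += 1
        degree[community_of(v)] += 1
        if community_of(u) == community_of(v):
            internal[community_of(u)] += 1
    return sum(internal[s] / m - (degree[s] / (2 * m)) ** 2 for s in degree)


r, k = 30, 5
edges = ring_of_cliques(r, k)
q_single = modularity(edges, lambda v: v // k)       # one community per clique
q_pairs = modularity(edges, lambda v: v // (2 * k))  # neighbouring cliques merged in pairs
print(q_single < q_pairs)  # True: the clique partition has lower Q
```

With $l = k(k-1)/2$ links per clique, the two partitions give $Q_{\mathrm{single}} = \frac{l}{l+1} - \frac{1}{r}$ and $Q_{\mathrm{pairs}} = \frac{2l+1}{2l+2} - \frac{2}{r}$, so pairing wins as soon as $r > k(k-1) + 2$ (22 for $k = 5$), even though each clique is as dense as a community can be.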
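  • Infomap / map equation [22, 36]: compress the description of a random-walk trajectory with a Huffman code.
    - $p_i$: visit frequency of $v_i$ for a random walk with teleportation probability $\tau$
    - $q_s^{\mathrm{exit}}$: exit probability of community $c_s$, with $n_s = |c_s|$:
      $q_s^{\mathrm{exit}} = \tau \frac{n - n_s}{n} \sum_{i: c_i = c_s} p_i + (1 - \tau) \sum_{i: c_i = c_s} \sum_{j: c_j \neq c_s} p_i \frac{A_{ij}}{k_i^{\mathrm{out}}}$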
  • Figure (Ref. [22])
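A minimal sketch of $p_i$ and $q_s^{\mathrm{exit}}$ for a small graph given as a dense 0/1 matrix `A`, assuming uniform teleportation and that every node has at least one out-link. Visit frequencies are obtained by plain power iteration; the helper names, the `tau=0.15` default and the iteration count are illustrative assumptions, not values from the slides.

```python
from collections import Counter, defaultdict


def visit_frequencies(A, tau=0.15, iters=500):
    """Stationary visit frequencies p_i of a walk that follows an out-link with
    probability 1 - tau and teleports uniformly with probability tau.
    Assumes every node has an out-link."""
    n = len(A)
    k_out = [sum(row) for row in A]
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [tau / n] * n
        for i in range(n):
            for j in range(n):
                if A[i][j]:
                    new[j] += (1 - tau) * p[i] * A[i][j] / k_out[i]
        p = new
    return p


def exit_probabilities(A, p, label, tau=0.15):
    """q_s^exit = tau (n - n_s)/n sum_{i in c_s} p_i
                 + (1 - tau) sum_{i in c_s, j not in c_s} p_i A_ij / k_i^out."""
    n = len(A)
    size = Counter(label)  # n_s = |c_s|
    q = defaultdict(float)
    for i in range(n):
        s = label[i]
        q[s] += tau * (n - size[s]) / n * p[i]  # teleport out of c_s
        k_out = sum(A[i])
        for j in range(n):
            if A[i][j] and label[j] != s:
                q[s] += (1 - tau) * p[i] * A[i][j] / k_out  # step along an outgoing link
    return dict(q)


# Two triangles joined by the edge (2, 3), stored symmetrically
A = [[0] * 6 for _ in range(6)]
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i][j] = A[j][i] = 1
labels = [0, 0, 0, 1, 1, 1]
p = visit_frequencies(A)
print(exit_probabilities(A, p, labels))
```

With $\tau = 0$ on this undirected graph, $p_i = k_i / 2m$ and each triangle is left only through the bridge, so $q_s^{\mathrm{exit}} = \frac{3}{14} \cdot \frac{1}{3} = \frac{1}{14}$.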
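  • Minimize the description length with the Louvain method:
    $L = \Big( \sum_s q_s^{\mathrm{exit}} \Big) H + \sum_s \Big( q_s^{\mathrm{exit}} + \sum_{i: c_i = c_s} p_i \Big) H_s$
    $H$ is the entropy of the community-index codebook:
    $H = -\sum_s \frac{q_s^{\mathrm{exit}}}{\sum_r q_r^{\mathrm{exit}}} \log \frac{q_s^{\mathrm{exit}}}{\sum_r q_r^{\mathrm{exit}}}$
    $H_s$ is the entropy of a within-community codebook: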