© 2013 NTT Software Innovation Center
Fast Algorithm for Modularity-based Graph Clustering
Hiroaki Shiokawa
NTT Software Innovation Center, NTT Corporation
July 23rd, 2013
BACKGROUND & MOTIVATION
Large Graphs
• Large-scale graphs have become available
  • Facebook: 1.11 billion active users / month (*1)
  • Twitter: 140 million active users / day, 340 million new posts / day (*2)
  • And more …
• There are many techniques for analyzing massive-scale graphs
  • Massive data requires a lot of time to analyze
  • It is important to analyze large-scale data quickly
(*1) “Key Facts”, http://newsroom.fb.com/Key-Facts
(*2) http://dev.twitter.com/media/authors
Graph Clustering
• Graph clustering is one of the most important methods
  • Community detection over social networks
  • Event detection from microblogging services
  • Brain analysis, image segmentation, …
[Figure: clustering analysis of a graph; labels: densely connected, sparsely connected]
Modularity-based Graph Clustering
• Clustering methods which find the division of a graph that maximizes the modularity measure
• Improvement of clustering speed
  • 10k nodes/hour: Girvan-Newman method [Girvan et al., 2004]
  • 100k nodes/hour: Newman method [Newman et al., 2004]
  • 1M nodes/hour: CNM [Clauset et al., 2004], WT [Wakita et al., 2008]
  • 10M nodes/hour: BGLL [Blondel et al., 2008]
  • 100M to 1B nodes/hour: our research target (there are no algorithms)
Objective and Contributions
• Objective
  • Fast graph clustering method with high modularity
• 3 key techniques
  1. Incremental node aggregation
  2. Incremental node pruning
  3. Efficient ordering of node selection
• Contributions of our algorithm
  • Efficiency
    • Considerably faster than BGLL
    • Clusters 100M nodes within 3 minutes
  • High modularity
    • Achieves modularity as high as BGLL
  • Effectiveness
    • Improves performance for complex networks
PRELIMINARIES
Modularity
• Modularity evaluates the strength of a division of a graph into clusters [Newman and Girvan 2004]

$$Q = \sum_{i \in C} \left[ \frac{e_{ii}}{2m} - \left( \frac{\sum_{j \in C} e_{ij}}{2m} \right)^{2} \right]$$

  • $C$: set of clusters
  • $e_{ij}$: number of edges between clusters $i$ and $j$
  • $m$: total number of edges in the graph
  • First term: the fraction of the edges within cluster $i$
  • Second term: the fraction of outgoing edges from cluster $i$
• Finding the division which maximizes modularity is NP-hard ⇒ many greedy approaches have been proposed
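The modularity definition on this slide can be checked on a toy graph. The following is a minimal sketch and not code from the talk: the function name `modularity` and the edge-list representation are assumptions made for illustration. It also assumes that $e_{ii}$ counts each internal edge twice (once per endpoint) and that $\sum_j e_{ij}$ is the total degree of cluster $i$, which makes the formula agree with the usual Newman-Girvan modularity.

```python
from collections import defaultdict

def modularity(edges, cluster_of):
    """Q = sum_i [ e_ii / 2m - (sum_j e_ij / 2m)^2 ] for an undirected,
    unweighted graph given as a list of (u, v) edges.

    Assumption: e_ii counts edge endpoints (2 per internal edge), and
    sum_j e_ij is the total degree of cluster i.
    """
    m = len(edges)
    internal_ends = defaultdict(int)  # e_ii per cluster
    degree_sum = defaultdict(int)     # sum_j e_ij per cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_ends[cu] += 2
    return sum(
        internal_ends[c] / (2 * m) - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy graph: two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "all" for v in range(6)}

print(modularity(edges, two_clusters))  # one cluster per triangle
print(modularity(edges, one_cluster))   # everything in a single cluster
```

Splitting the graph into its two triangles scores higher than putting every node into one cluster, which is the behavior the measure is meant to reward.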
State of the art algorithm: BGLL
• Repeats the following pass until the modularity score no longer improves
• Pass = 1st phase + 2nd phase
  • 1st phase: Local clustering
    1) Selects a node
    2) Computes the modularity gain of joining each neighbor node's cluster
    3) Places the node in the same cluster as the neighbor node with the largest gain
  • 2nd phase: Data reduction
    • Aggregates all nodes in the same cluster into a single node
  • Continue to the next pass
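The first phase of BGLL can be sketched as a sweep over the nodes. This is an illustrative reading of the three steps above, not the authors' or the BGLL authors' code: `local_moving_sweep`, the dict-of-dicts graph `adj`, and the fixed node order are all assumptions. It also assumes no self-loops, so it covers only the first pass, before any aggregation. The gain of moving node `v` into cluster `c` is taken to be proportional to `links[c] - total[c] * degree[v] / two_m`, the standard simplification for a node that has been removed from its own cluster.

```python
def local_moving_sweep(adj, cluster_of):
    """One sweep of local clustering; returns the number of nodes moved.

    adj: {node: {neighbor: weight}}, undirected, no self-loops (assumption).
    cluster_of: {node: cluster id}, updated in place.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(degree.values())
    total = {}  # summed degree of the nodes in each cluster
    for v, c in cluster_of.items():
        total[c] = total.get(c, 0) + degree[v]

    moves = 0
    for v in adj:
        old = cluster_of[v]
        total[old] -= degree[v]  # temporarily take v out of its cluster
        links = {}               # edge weight from v to each neighboring cluster
        for u, w in adj[v].items():
            links[cluster_of[u]] = links.get(cluster_of[u], 0) + w
        # Start from "stay where you are" and look for a strictly better cluster.
        best = old
        best_gain = links.get(old, 0) - total[old] * degree[v] / two_m
        for c, k_in in links.items():
            gain = k_in - total[c] * degree[v] / two_m
            if gain > best_gain:
                best, best_gain = c, gain
        total[best] = total.get(best, 0) + degree[v]
        cluster_of[v] = best
        moves += best != old
    return moves

# Toy graph: two triangles joined by the bridge edge 2-3.
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
cluster_of = {v: v for v in adj}          # every node starts in its own cluster
while local_moving_sweep(adj, cluster_of):
    pass                                   # repeat until no node moves
print(cluster_of)
```

The second phase would then collapse each cluster into a single node and run the same sweep on the smaller graph.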
PROPOSED ALGORITHM
Overview of proposed algorithm
• Our method: 3 key techniques
  • Incremental aggregation
  • Incremental pruning
  • Efficient ordering
• BGLL
  • Batch-based aggregation
  • Random ordering
• Ideas (properties of complex networks)
  • Clustering coefficient
  • Power-law degree distribution
Idea 1 : Clustering coefficient
• Complex networks have a large clustering coefficient
  • The clustering coefficient measures the degree to which nodes in a graph tend to cluster together
  • There are many duplicated nodes/edges in a graph which has a large clustering coefficient
[Figure: a cluster; labels: internal nodes within a cluster, duplicated edges between different clusters, nodes whose clusters are trivially obtained]
Idea 2 : Power-law degree distribution
• Complex networks have a highly skewed degree distribution following the power-law distribution
  • Most nodes have only a few neighbor nodes, and only a few nodes have a large number of neighbor nodes
  • The frequency $F$ of nodes with degree $d$ is $F \propto d^{-\gamma}$, with exponent $\gamma > 0$
• Random ordering of node selection leads to redundant computation
[Figure: example of the degree distribution of a complex network]
Incremental Aggregation
• Incrementally aggregate nodes which belong to the same cluster
• It aggregates duplicated edges between clusters while keeping the graph semantics
• Example)
[Figure: nodes in the same cluster are aggregated into one aggregated node; the duplicated edges are merged into edges labeled 2]
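The aggregation step in the example can be sketched on a weighted adjacency map. This is a hedged illustration rather than the talk's implementation: the function `aggregate`, the dict-of-dicts layout, and the choice to store the weight of edges inside an aggregated node as a self-loop are all assumptions.

```python
def aggregate(adj, keep, absorb):
    """Merge node `absorb` into node `keep`, in place.

    adj: {node: {neighbor: weight}} for an undirected graph. A self-loop
    adj[x][x] holds the weight of edges inside aggregated node x (assumption).
    Duplicated edges to a common neighbor are merged by summing their weights.
    """
    for nbr, w in adj.pop(absorb).items():
        if nbr != absorb:
            del adj[nbr][absorb]  # drop the reverse entry pointing at `absorb`
        # Edges to `keep` and self-loops of `absorb` become internal to `keep`.
        target = keep if nbr in (keep, absorb) else nbr
        adj[keep][target] = adj[keep].get(target, 0) + w
        if target != keep:
            adj[target][keep] = adj[target].get(keep, 0) + w

# Nodes "a" and "b" are in the same cluster and both link to "c".
adj = {
    "a": {"b": 1, "c": 1},
    "b": {"a": 1, "c": 1},
    "c": {"a": 1, "b": 1},
}
aggregate(adj, "a", "b")
print(adj)
```

After the merge, the two edges from the cluster to "c" are a single edge of weight 2, so later modularity-gain computations touch one entry instead of two.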
Incremental pruning
• Incrementally prune nodes whose cluster is trivially obtained
  • We can easily assign nodes to clusters without computing modularity gains
• From the graph structure, there are 2 patterns of pruning
  • Pattern A: a node that has only a single neighbor node
  • Pattern B: a node surrounded by nodes that belong to the same cluster
[Figure: pattern A nodes are easy to prune; pattern B is non-trivial because the neighbors' cluster must be checked]
Incremental pruning (Cont.)
• Efficient pruning approach for pattern B
  • All nodes within the same cluster have been aggregated into a single node by incremental aggregation
  • We can find all prunable nodes by collecting the nodes that have only a single adjacent node
[Figure: after incremental aggregation of the cluster, a node of pattern B is converted to pattern A]
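Once clusters have been aggregated, the pruning test reduces to a degree check. The sketch below is illustrative only: `prune_single_neighbor_nodes` is a hypothetical name, and it assumes that the "trivially obtained" cluster of a node with one adjacent node is the cluster of that neighbor.

```python
def prune_single_neighbor_nodes(adj, cluster_of):
    """Assign each node with exactly one adjacent node to that neighbor's cluster.

    adj: {node: {neighbor: weight}}; self-loops are ignored when counting
    adjacent nodes. cluster_of is updated in place. Returns the pruned nodes.
    Assumption: no modularity gain needs to be computed for these nodes.
    """
    pruned = []
    for v, nbrs in adj.items():
        others = [u for u in nbrs if u != v]
        if len(others) == 1:
            cluster_of[v] = cluster_of[others[0]]
            pruned.append(v)
    return pruned

# "x" and "y" hang off hub "h"; "k", "j", "i" form a triangle attached to "h".
adj = {
    "h": {"x": 1, "y": 1, "k": 1},
    "x": {"h": 1},
    "y": {"h": 1},
    "k": {"h": 1, "j": 1, "i": 1},
    "j": {"k": 1, "i": 1},
    "i": {"k": 1, "j": 1},
}
cluster_of = {v: v for v in adj}
pruned = prune_single_neighbor_nodes(adj, cluster_of)
print(pruned, cluster_of)
```

Only "x" and "y" are pruned here; the remaining nodes still go through the normal modularity-gain computation.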
Efficient ordering of node selection
• Dynamically selects a node with the smallest degree
  • Example) Nodes A and B being assigned to the same cluster
    • Select from A: many computations
    • Select from B: 2 computations
    • It's more efficient to compute from node B than from node A
  • By selecting the node with the smallest degree, we can avoid producing super-cluster structures
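One way to select the smallest-degree node while degrees keep changing is a min-heap with lazy deletion. The slides do not say which data structure the authors use, so the class `MinDegreeQueue` below and its methods are purely an assumption for illustration.

```python
import heapq

class MinDegreeQueue:
    """Pops the node with the smallest current degree (hypothetical helper)."""

    def __init__(self, degrees):
        self.degree = dict(degrees)
        self.heap = [(d, v) for v, d in self.degree.items()]
        heapq.heapify(self.heap)

    def update(self, node, new_degree):
        """Record a degree change, e.g. after an aggregation step."""
        self.degree[node] = new_degree
        heapq.heappush(self.heap, (new_degree, node))

    def pop(self):
        """Return the node with the smallest degree, or None if empty."""
        while self.heap:
            d, v = heapq.heappop(self.heap)
            if self.degree.get(v) == d:  # skip stale heap entries
                del self.degree[v]
                return v
        return None

queue = MinDegreeQueue({"A": 5, "B": 2, "C": 3})
first = queue.pop()        # smallest degree at the start
queue.update("A", 1)       # A's degree shrinks, e.g. after edges were merged
second = queue.pop()
third = queue.pop()
print(first, second, third, queue.pop())
```

Each degree change costs one heap push, and outdated entries are discarded when they surface, so the selection stays cheap even when aggregation changes many degrees.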
EVALUATION
Datasets & Experimental Environment
• Real world datasets
  • 2 social networks and 3 Web graphs of IP domains
• Experimental environment
  • All experiments were conducted on a Linux 2.6.18 server with an Intel Xeon CPU L5640 2.27GHz and 144GB RAM
  • All methods were run on 1 core of 1 CPU

Dataset         |V|            |E|              Skewness of degree distribution
dblp-2010       326,186        1,615,400        2.82
ljournal-2008   5,363,260      79,023,142       2.29
uk-2005         39,459,925     936,364,282      1.71
webbase         118,142,155    1,019,903,190    2.14
uk2007-05       105,896,555    3,738,733,648    1.51
Computational time
• The proposed method is up to 60 times faster than the state-of-the-art algorithm BGLL
• Graphs with highly skewed degree distributions are processed faster than the other datasets
[Figure: computational time per dataset; callouts: 100 million nodes and 1 billion edges in 156 seconds, 320k nodes within 1 second]
Computational time – power-law differences
Computational time – size differences
Modularity score
• Modularity score for datasets
  • A large modularity score means the output of the algorithm is well clustered
  • The proposed method achieves modularity scores that are almost the same as, or slightly higher than, those of BGLL
CONCLUSION
Conclusion
• Fast clustering algorithm for large graphs
• Our solution
  • Incremental aggregation
  • Incremental pruning
  • Efficient ordering of node selection
• Contributions of our algorithm
  • Efficiency
    • Considerably faster than BGLL
    • Clusters 100M nodes within 3 minutes
  • High modularity
    • Achieves modularity as high as BGLL
  • Effectiveness
    • Improves performance for complex networks