Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data...
Transcript of Billion-scale Network Embedding with Iterative Random ...4 Challenge: Billion-scale Network Data...
Billion-scale Network Embedding
with Iterative Random Projection
Ziwei Zhang Peng Cui Haoyang Li Xiao Wang Wenwu Zhu
Tsinghua U Tsinghua U Tsinghua U Tsinghua U Tsinghua U
2
Network Data is Ubiquitous
Social Network Biology Network
Traffic Network
3
Network Embedding: Vector Representation of Nodes
Generate
Embed
Apply feature-based machine
learning algorithms
Fast computing of nodes similarity
Support parallel computing
Applications: link prediction,
node classification, community
detection, centrality measure,
anomaly detection ...
4
Challenge: Billion-scale Network Data
Social Networks
WeChat: 1 billion monthly active users (March,
2018)
Facebook: 2 billion active users (2017)
E-commerce Networks
Amazon: 353 million products, 310 million users, 5
billion orders (2017)
Citation Networks
130 million authors, 233 million publications, 754
million citations (Aminer, 2018)
How to conduct network embedding for
such large-scale network dataοΌ
5
Bottleneck of Existing Methods Methods based on random-walks
DeepWalk, B. Perozzi, et al. KDD 2014.
LINE, J. Tang, et al. WWW 2015.
Node2vec, A. Grover, et al. KDD 2016.
Methods based on matrix factorization
M-NMF, X. Wang, et al. AAAI 2017.
AROPE, Z. Zhang, et al. KDD 2018.
Methods based on deep learning
SDNE, D. Wang, et al. KDD 2016.
DVNE, D. Zhu, et al. KDD 2018.
Common bottleneck: based on sophisticated optimization
Computationally expansive
Hard to resort to distributed computing scheme
Optimization is entangled and needs global information
β Communication cost is high
Only handle thousands or millions of nodes and edges
Network embedding: essentially a dimension reduction problem
Random projection: optimization-free for dimension reduction
Basic idea: randomly project data into a low-dimensional subspace
Extremely efficient and friendly to distributed computing
6
Random Projection
Key network property: high-order proximity
Can solve the network sparsity problem
Measure indirect relationship between nodes
β How to design a high-order proximity preserved random projection?
7
High-Order Proximity
Target Node
First-order
Second-order
Third-order
β¦
8
Problem Formulation Objective function: matrix factorization of preserving high-order proximity
minπ,π
π β πππ π2
π = π π΄ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π
Slight modification: assuming positive semi-definite and using 2 norm
minπ
πππ β πππ2
π = π π΄ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π
Random projection:
Denote π β βπΓπ as a Gaussian random matrix
π ππ~π© 0,1
π
Surprisingly simple result:
π = ππ
9
Theoretical Guarantee
Theoretical guarantee
Basically, random projection can effectively minimize the objective function
However, calculating π is still very time consuming
10
Iterative Projection Iterative projection:
π = ππ = πΌ1π΄1 + πΌ2π΄
2 +β―+ πΌππ΄π π
= πΌ1π΄1π + πΌ2π΄
2π +β―+ πΌππ΄ππ
Can be calculated iteratively
Why efficient?
π΄: π Γπ sparse adjacency matrix
π : π Γ π low-dimensional matrix
Associative law of matrix multiplication
π΄π΄β¦π΄π΄π΄π
Γ A Γ A Γ A
π΄π΄β¦π΄π΄ π΄π
π΄π΄β¦π΄ π΄π΄π
Sparse
Low-dimensional
Sparse
Low-dimensional
Sparse
Low-dimensional
Sparse matrix multiplication!
11
Iterative Projection
A1 A2 AqΓ A Γ A Γ Aβ¦
Γ R
π1Γ A
π2 ππΓ A Γ Aβ¦
Γ R
Time Consuming!
Efficient!
12
RandNE: Iterative Random projection Network Embedding
Time Complexity: π πππ + ππ2
π/π: number of nodes/edges; π: dimension; π: order
Linear w.r.t. network size
Only need to calculate q sparse matrix products
Orders of magnitude faster than existing methods!
Advantages:
Distributed Calculation
Dynamic Updating
13
Distributed Calculation
Iterative random projection only involves matrix product ππ = π΄ππβ1
Each dimension can be calculated separately
Property of sparse matrix product
No communication is needed during calculation!
π β βπΓπ
14
Dynamic Updating Networks are dynamic in nature
E.g., in social networks, users add/delete friends, new users join, old users leave
Changes of edges β Calculate incremental parts!
ππ + Ξππ = π΄ + Ξπ΄ β (ππβ1+Ξππβ1)
β Ξππ = π΄ β Ξππβ1 + Ξπ΄ β ππβ1 + Ξπ΄ β Ξππβ1
Changes of nodes β adjust the dimensionality
TimeT T+1
15
Dynamic Updating
Linear scalability w.r.t. number of changed nodes/edges
No error accumulation
Identical results as re-running the algorithm
16
Experimental Setting: Moderate-scale Networks
Datasets: BlogCatalog, Flickr, YouTube
Baselines:
DeepWalk (KDD 2014): DFS random walk + skip-gram
LINE (WWW 2015): BFS random walk + skip-gram
Node2vec (KDD 2016): biased random walk + skip-gram
SDNE (KDD 2016): deep auto-encoder
17
Experimental Results Running time
At least dozens of times faster
20
Experimental Results Node Classification
22
Experimental Results Parameter analysis:
Effectiveness of preserving high-order proximity
Scalability
<4 minutes for network with 1 million nodes, 100 million edges with one PC
23
Experiments on a Billion-scale Network Experimental results on WeChat
250 millions nodes, 4.8 billion edges
Network Reconstruction
Dynamic link prediction
Better results and no error accumulation!
24
Experimental Results Running time and acceleration ratio
Practical running time for real billion-scale networks
Number of Computing Nodes 4 8 12 16
Running Time(s) 82157 46029 33965 24757
<7 hours!
Support distributed computing
25
Conclusion RandNE: a billion-scale network embedding method
Based on iterative random projection to preserve high-order proximities
Much more computationally efficient
Distributed algorithm
Handle dynamic networks
Experimental results on moderate-scale networks
At least one order of magnitude faster
Better or comparable performance
Linear scalability
Experiments on WeChat, a real billion-scale network
Better results in network reconstruction and link prediction
No error accumulation
Linear acceleration ratio
26
Thanks!Ziwei Zhang, Tsinghua University
http://zw-zhang.github.io/
http://nrl.thumedialab.com/