Boosting Vertex-Cut Partitioning for Streaming Graphs
Hooman Peiro Sajjad*, Amir H. Payberah†, Fatemeh Rahimian†, Vladimir Vlassov*, Seif Haridi†
* KTH Royal Institute of Technology † SICS Swedish ICT
5th IEEE International Congress on Big Data
Introduction
Graph Partitioning
Partition large graphs for applications such as:
• Complexity reduction, parallelization, and distributed graph analysis
[Figure: a graph split into four partitions P1–P4]
Partitioning Models
[Figure: partitioning models illustrated on partitions P1 and P2, contrasting cutting edges with cutting (replicating) vertices]
Vertex-cut partitioning is more efficient for power-law graphs
A Good Vertex-Cut Partitioning
• Low replication factor
• Balanced partitions with respect to the number of edges
Streaming Graph Partitioning
• Graph elements are assigned to partitions as they are being streamed
• No global knowledge
[Figure: a partitioner assigns streaming edges to partitions P1, P2, …, Pp]
State-of-the-Art Partitioners
• Centralized partitioner:
  • Single-thread partitioner
  • Multi-threaded partitioner: each thread partitions a subset of the graph and shares the state information
  → Slow partitioning time, low replication factor
• Distributed partitioner:
  • Oblivious partitioners: several independent partitioners
  → Fast partitioning time, high replication factor
Partitioning Time vs. Partition Quality
• Centralized partitioner: slow partitioning time, low replication factor
• Distributed partitioner: fast partitioning time, high replication factor
• Can we get both fast partitioning time and a low replication factor? HoVerCut
HoVerCut Framework
HoVerCut ...
• is a streaming vertex-cut partitioner
• is horizontally and vertically scalable
• scales without degrading the quality of partitions
• employs different partitioning policies
Architecture Overview
[Figure: subpartitioners 1…n, each with a Core, Partitioning Policy, Tumbling Window, and Local State, consume separate edge streams and synchronize asynchronously with a Shared State]
Architecture: Input
• Input graphs are streamed by their edges
• Each subpartitioner receives an exclusive subset of the edges
Architecture: Configurable Window
Subpartitioners collect a number of incoming edges in a tumbling window of a certain size.
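A tumbling window of this kind can be sketched as a simple batching generator. The function and variable names below are illustrative, not HoVerCut's actual API:

```python
from itertools import islice

def tumbling_windows(edge_stream, window_size):
    """Group a stream of edges into consecutive, non-overlapping windows."""
    it = iter(edge_stream)
    while True:
        window = list(islice(it, window_size))
        if not window:
            return
        yield window

# A stream of 7 edges with window size 3 yields windows of 3, 3, and 1 edges.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
windows = list(tumbling_windows(edges, 3))
# windows == [[(1, 2), (2, 3), (3, 4)], [(4, 5), (5, 6), (6, 7)], [(7, 8)]]
```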
Architecture: Partitioning Policy
Each subpartitioner assigns the edges to the partitions based on a given policy.
Architecture: Local State
Each subpartitioner has a local state, which includes information about the edges processed locally:
• partial degree of each vertex
• partitions of each vertex
• number of edges in each partition
Architecture: Shared State
The shared state is the global state accessible by all subpartitioners, through putState and getState operations.

Vertex Table:
ID | Partial Degree | Partitions
v1 | 12             | p1
v2 | 50             | p1, p2

Partition Table:
ID | Num. of edges
p1 | 5000
p2 | 6500
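A minimal in-memory sketch of such a shared state, with the vertex and partition tables above and delta-based updates, might look as follows. Class and method names (SharedState, get_state, put_state) are hypothetical stand-ins for the putState/getState operations:

```python
from collections import defaultdict

class SharedState:
    """Sketch of the shared state: a vertex table (partial degree and
    partition set per vertex) and a partition table (edge count per
    partition)."""

    def __init__(self):
        self.vertex_table = defaultdict(lambda: {"degree": 0, "partitions": set()})
        self.partition_table = defaultdict(int)

    def get_state(self, vids):
        """Return the vertex table restricted to vids, plus the partition table."""
        vt = {v: {"degree": self.vertex_table[v]["degree"],
                  "partitions": set(self.vertex_table[v]["partitions"])}
              for v in vids}
        return vt, dict(self.partition_table)

    def put_state(self, vt_delta, pt_delta):
        """Merge deltas produced by a subpartitioner into the global tables."""
        for v, d in vt_delta.items():
            self.vertex_table[v]["degree"] += d["degree"]
            self.vertex_table[v]["partitions"] |= d["partitions"]
        for p, n in pt_delta.items():
            self.partition_table[p] += n

# Two subpartitioners push deltas for the same vertex; the state accumulates.
state = SharedState()
state.put_state({"v1": {"degree": 4, "partitions": {"p1"}}}, {"p1": 3})
state.put_state({"v1": {"degree": 2, "partitions": {"p2"}}}, {"p2": 1})
vt, pt = state.get_state({"v1"})
# vt["v1"]["degree"] == 6; vt["v1"]["partitions"] == {"p1", "p2"}
```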
Architecture: Core
The core is HoVerCut's main algorithm, parametrised with a partitioning policy and the window size.
Vertex-Cut Partitioning Heuristics
For an edge with end-vertices u and v, and for every partition p:

Score(p) = ReplicationScore(p) + LoadBalanceScore(p)

Choose the partition that maximizes the score.

State-of-the-art heuristics:
• Greedy
• HDRF
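A score-based policy of this shape can be sketched as below. The replication and balance terms follow the general form of the HDRF heuristic (favoring partitions that already hold a replica, weighted so that the high-degree end-vertex is the one replicated, plus a term preferring lightly loaded partitions); the exact constants and the function name hdrf_score are illustrative, not the published formula:

```python
def hdrf_score(u, v, p, vt, pt, lam=1.0, eps=1.0):
    """HDRF-style score for placing edge (u, v) in partition p.
    vt maps vertex -> {"degree": int, "partitions": set};
    pt maps partition -> edge count."""
    du, dv = vt[u]["degree"], vt[v]["degree"]
    theta_u = du / (du + dv) if du + dv else 0.5
    theta_v = 1.0 - theta_u

    # Replication term: reward partitions that already hold a replica,
    # weighting the lower-degree vertex more, so the high-degree one is
    # the one that gets replicated elsewhere.
    rep = 0.0
    if p in vt[u]["partitions"]:
        rep += 1.0 + (1.0 - theta_u)
    if p in vt[v]["partitions"]:
        rep += 1.0 + (1.0 - theta_v)

    # Load-balance term: reward lightly loaded partitions.
    maxsize, minsize = max(pt.values()), min(pt.values())
    bal = lam * (maxsize - pt[p]) / (eps + maxsize - minsize)
    return rep + bal

# u has high degree, v low degree; both partitions are equally loaded,
# so the policy picks v's partition and replicates high-degree u.
vt = {"u": {"degree": 5, "partitions": {"p1"}},
      "v": {"degree": 1, "partitions": {"p2"}}}
pt = {"p1": 10, "p2": 10}
best = max(pt, key=lambda p: hdrf_score("u", "v", p, vt, pt))
# best == "p2"
```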
Greedy vs. HDRF
Greedy: places end-vertices u and v of an edge in a partition that already has a replica of u or v.
HDRF (High-Degree Replicated First): replicates the higher-degree end-vertex.
[Figure: for an edge (u, v) with replicas spread across partitions P1 and P2, Greedy and HDRF make different replication choices]
Partitioning a Window of Edges
vids: the set of vertex ids in the current window
edges: the set of edges in the current window
pt = get the partition table
vt = get the vertex table restricted to vids

for each e ∊ edges:
    u = e.src, v = e.dst
    increment vt(u).degree and vt(v).degree
    given a partitioning policy: select p based on vt(u), vt(v), and pt
    add p to vt(u).partitions and vt(v).partitions
    increment pt(p).size
end

Update the shared state by sending vt and pt represented as deltas:

vt deltas:
ID | Degree | Partitions
v1 | +4     | +p1
v2 | +2     | +p2

pt deltas:
ID | Size
p1 | +3
p2 | +1
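The per-window loop above can be sketched in runnable form. The function partition_window and the least-loaded policy below are illustrative assumptions; the real system plugs in policies such as Greedy or HDRF:

```python
from collections import defaultdict

def partition_window(edges, vt, pt, policy):
    """Process one window of edges against local copies of the vertex table
    (vt) and partition table (pt), returning delta tables to push to the
    shared state."""
    vt_delta = defaultdict(lambda: {"degree": 0, "partitions": set()})
    pt_delta = defaultdict(int)
    for u, v in edges:
        for w in (u, v):                    # increment partial degrees
            vt[w]["degree"] += 1
            vt_delta[w]["degree"] += 1
        p = policy(u, v, vt, pt)            # select a partition per the policy
        for w in (u, v):                    # record new vertex replicas
            if p not in vt[w]["partitions"]:
                vt[w]["partitions"].add(p)
                vt_delta[w]["partitions"].add(p)
        pt[p] += 1                          # the edge lands in partition p
        pt_delta[p] += 1
    return dict(vt_delta), dict(pt_delta)

# Example with a trivial policy: always choose the least-loaded partition.
vt = defaultdict(lambda: {"degree": 0, "partitions": set()})
pt = {"p1": 0, "p2": 0}
least_loaded = lambda u, v, vt, pt: min(pt, key=pt.get)
vt_d, pt_d = partition_window([("a", "b"), ("b", "c")], vt, pt, least_loaded)
# pt_d == {"p1": 1, "p2": 1}; vertex "b" is replicated on both partitions
```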
Evaluation
Datasets
Dataset                          | |V|  | |E|
Autonomous systems (AS)          | 1.7M | 11M
Pokec social network (PSN)       | 1.6M | 22M
LiveJournal social network (LSN) | 4.8M | 48M
Orkut social network (OSN)       | 3.1M | 117M

Number of partitions: 16
Evaluation Metrics
• Replication Factor (RF): the average number of replicas per vertex
• Load Relative Standard Deviation (LRSD): the relative standard deviation of the number of edges across partitions (LRSD = 0 indicates equal-size partitions)
• Partitioning time: the time it takes to partition a graph
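These two quality metrics are straightforward to compute from the partitioning state. A sketch, with hypothetical function names:

```python
from statistics import mean, pstdev

def replication_factor(vertex_partitions):
    """RF: average number of replicas per vertex.
    vertex_partitions maps each vertex to the set of partitions holding it."""
    return mean(len(ps) for ps in vertex_partitions.values())

def lrsd(partition_sizes):
    """Load Relative Standard Deviation: population standard deviation of the
    per-partition edge counts divided by their mean; 0 means perfectly
    balanced partitions."""
    return pstdev(partition_sizes) / mean(partition_sizes)

# Two vertices replicated on two partitions each and one on a single partition:
rf = replication_factor({"v1": {"p1", "p2"}, "v2": {"p1", "p2"}, "v3": {"p1"}})
# rf == 5/3; lrsd([100, 100]) == 0.0 for perfectly balanced partitions
```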
One Host: Summary
HoVerCut's configuration: subpartitioners (threads) = 32, window size = 32
Distributed Configuration
AS: |V| = 1.7M, |E| = 11M
OSN: |V| = 3.1M, |E| = 117M
Conclusion
• We presented HoVerCut, a parallel and distributed partitioner
• We can employ different partitioning policies in a scalable fashion
• We can scale HoVerCut to partition larger graphs without degrading the quality of partitions
Thank You!