Boosting Vertex-Cut Partitioning for Streaming Graphs
Hooman Peiro Sajjad*, Amir H. Payberah†, Fatemeh Rahimian†, Vladimir Vlassov*, Seif Haridi†
* KTH Royal Institute of Technology † SICS Swedish ICT
5th IEEE International Congress on Big Data
Introduction
Graph Partitioning
Partition large graphs for applications such as:
• Complexity reduction, parallelization, and distributed graph analysis
[Figure: a graph split into four partitions P1–P4]
Partitioning Models
[Figure: partitioning models illustrated on partitions P1 and P2, contrasting cutting edges with cutting (replicating) vertices]
Vertex-cut partitioning is more efficient for power-law graphs
A Good Vertex-Cut Partitioning
• Low replication factor
• Balanced partitions with respect to the number of edges
Streaming Graph Partitioning
• Graph elements are assigned to partitions as they are being streamed
• No global knowledge
[Figure: a partitioner assigns streaming edges to partitions P1, P2, …, Pp]
State-of-the-Art Partitioners
• Centralized partitioner:
  • Single-thread partitioner
  • Multi-threaded partitioner: each thread partitions a subset of the graph and shares the state information
  → Slow partitioning time, low replication factor
• Distributed partitioner:
  • Oblivious partitioners: several independent partitioners
  → Fast partitioning time, high replication factor
Partitioning Time vs. Partition Quality
• Centralized partitioner: slow partitioning time, low replication factor
• Distributed partitioner: fast partitioning time, high replication factor
• Can we get both fast partitioning time and a low replication factor? HoVerCut
HoVerCut Framework
HoVerCut ...
• is a streaming vertex-cut partitioner
• is horizontally and vertically scalable
• scales without degrading the quality of partitions
• employs different partitioning policies
Architecture Overview
[Figure: subpartitioners 1…n, each with a Core, Partitioning Policy, Tumbling Window, and Local State, consume separate edge streams and synchronize asynchronously with a Shared State]
Architecture: Input
• Input graphs are streamed by their edges
• Each subpartitioner receives an exclusive subset of the edges
Architecture: Configurable Window
Subpartitioners collect a number of incoming edges in a tumbling window of a certain size.
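A tumbling window of this kind can be sketched as a simple batching generator. The function and variable names below are illustrative, not HoVerCut's actual API:

```python
from itertools import islice

def tumbling_windows(edge_stream, window_size):
    """Group a stream of edges into consecutive, non-overlapping windows."""
    it = iter(edge_stream)
    while True:
        window = list(islice(it, window_size))
        if not window:
            return
        yield window

# A stream of 7 edges with window size 3 yields windows of 3, 3, and 1 edges.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
windows = list(tumbling_windows(edges, 3))
# windows == [[(1, 2), (2, 3), (3, 4)], [(4, 5), (5, 6), (6, 7)], [(7, 8)]]
```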
Architecture: Partitioning Policy
Each subpartitioner assigns the edges to the partitions based on a given policy.
Architecture: Local State
Each subpartitioner has a local state, which includes information about the edges processed locally:
• partial degree of each vertex
• partitions of each vertex
• number of edges in each partition
Architecture: Shared State
The shared state is the global state accessible by all subpartitioners, through putState and getState operations.

Vertex Table:
ID | Partial Degree | Partitions
v1 | 12             | p1
v2 | 50             | p1, p2

Partition Table:
ID | Num. of edges
p1 | 5000
p2 | 6500
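A minimal in-memory sketch of such a shared state, with the vertex and partition tables above and delta-based updates, might look as follows. Class and method names (SharedState, get_state, put_state) are hypothetical stand-ins for the putState/getState operations:

```python
from collections import defaultdict

class SharedState:
    """Sketch of the shared state: a vertex table (partial degree and
    partition set per vertex) and a partition table (edge count per
    partition)."""

    def __init__(self):
        self.vertex_table = defaultdict(lambda: {"degree": 0, "partitions": set()})
        self.partition_table = defaultdict(int)

    def get_state(self, vids):
        """Return the vertex table restricted to vids, plus the partition table."""
        vt = {v: {"degree": self.vertex_table[v]["degree"],
                  "partitions": set(self.vertex_table[v]["partitions"])}
              for v in vids}
        return vt, dict(self.partition_table)

    def put_state(self, vt_delta, pt_delta):
        """Merge deltas produced by a subpartitioner into the global tables."""
        for v, d in vt_delta.items():
            self.vertex_table[v]["degree"] += d["degree"]
            self.vertex_table[v]["partitions"] |= d["partitions"]
        for p, n in pt_delta.items():
            self.partition_table[p] += n

# Two subpartitioners push deltas for the same vertex; the state accumulates.
state = SharedState()
state.put_state({"v1": {"degree": 4, "partitions": {"p1"}}}, {"p1": 3})
state.put_state({"v1": {"degree": 2, "partitions": {"p2"}}}, {"p2": 1})
vt, pt = state.get_state({"v1"})
# vt["v1"]["degree"] == 6; vt["v1"]["partitions"] == {"p1", "p2"}
```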
Architecture: Core
The core is HoVerCut's main algorithm, parametrised with a partitioning policy and the window size.
Vertex-Cut Partitioning Heuristics
For an edge with end-vertices u and v, and for every partition p:

Score(p) = ReplicationScore(p) + LoadBalanceScore(p)

Choose the partition that maximizes the score.

State-of-the-art heuristics:
• Greedy
• HDRF
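A score-based policy of this shape can be sketched as below. The replication and balance terms follow the general form of the HDRF heuristic (favoring partitions that already hold a replica, weighted so that the high-degree end-vertex is the one replicated, plus a term preferring lightly loaded partitions); the exact constants and the function name hdrf_score are illustrative, not the published formula:

```python
def hdrf_score(u, v, p, vt, pt, lam=1.0, eps=1.0):
    """HDRF-style score for placing edge (u, v) in partition p.
    vt maps vertex -> {"degree": int, "partitions": set};
    pt maps partition -> edge count."""
    du, dv = vt[u]["degree"], vt[v]["degree"]
    theta_u = du / (du + dv) if du + dv else 0.5
    theta_v = 1.0 - theta_u

    # Replication term: reward partitions that already hold a replica,
    # weighting the lower-degree vertex more, so the high-degree one is
    # the one that gets replicated elsewhere.
    rep = 0.0
    if p in vt[u]["partitions"]:
        rep += 1.0 + (1.0 - theta_u)
    if p in vt[v]["partitions"]:
        rep += 1.0 + (1.0 - theta_v)

    # Load-balance term: reward lightly loaded partitions.
    maxsize, minsize = max(pt.values()), min(pt.values())
    bal = lam * (maxsize - pt[p]) / (eps + maxsize - minsize)
    return rep + bal

# u has high degree, v low degree; both partitions are equally loaded,
# so the policy picks v's partition and replicates high-degree u.
vt = {"u": {"degree": 5, "partitions": {"p1"}},
      "v": {"degree": 1, "partitions": {"p2"}}}
pt = {"p1": 10, "p2": 10}
best = max(pt, key=lambda p: hdrf_score("u", "v", p, vt, pt))
# best == "p2"
```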
Greedy vs. HDRF
Greedy: places end-vertices u and v of an edge in a partition that already has a replica of u or v.
HDRF (High-Degree Replicated First): replicates the higher-degree end-vertex.
[Figure: for an edge (u, v) with replicas spread across partitions P1 and P2, Greedy and HDRF make different replication choices]
Partitioning a Window of Edges
vids: the set of vertex ids in the current window
edges: the set of edges in the current window
pt = get the partition table
vt = get the vertex table restricted to vids

for each e ∊ edges:
    u = e.src, v = e.dst
    increment vt(u).degree and vt(v).degree
    given a partitioning policy: select p based on vt(u), vt(v), and pt
    add p to vt(u).partitions and vt(v).partitions
    increment pt(p).size
end

Update the shared state by sending vt and pt represented as deltas:

vt deltas:
ID | Degree | Partitions
v1 | +4     | +p1
v2 | +2     | +p2

pt deltas:
ID | Size
p1 | +3
p2 | +1
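The per-window loop above can be sketched in runnable form. The function partition_window and the least-loaded policy below are illustrative assumptions; the real system plugs in policies such as Greedy or HDRF:

```python
from collections import defaultdict

def partition_window(edges, vt, pt, policy):
    """Process one window of edges against local copies of the vertex table
    (vt) and partition table (pt), returning delta tables to push to the
    shared state."""
    vt_delta = defaultdict(lambda: {"degree": 0, "partitions": set()})
    pt_delta = defaultdict(int)
    for u, v in edges:
        for w in (u, v):                    # increment partial degrees
            vt[w]["degree"] += 1
            vt_delta[w]["degree"] += 1
        p = policy(u, v, vt, pt)            # select a partition per the policy
        for w in (u, v):                    # record new vertex replicas
            if p not in vt[w]["partitions"]:
                vt[w]["partitions"].add(p)
                vt_delta[w]["partitions"].add(p)
        pt[p] += 1                          # the edge lands in partition p
        pt_delta[p] += 1
    return dict(vt_delta), dict(pt_delta)

# Example with a trivial policy: always choose the least-loaded partition.
vt = defaultdict(lambda: {"degree": 0, "partitions": set()})
pt = {"p1": 0, "p2": 0}
least_loaded = lambda u, v, vt, pt: min(pt, key=pt.get)
vt_d, pt_d = partition_window([("a", "b"), ("b", "c")], vt, pt, least_loaded)
# pt_d == {"p1": 1, "p2": 1}; vertex "b" is replicated on both partitions
```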
Evaluation
Datasets
Dataset                          | |V|  | |E|
Autonomous systems (AS)          | 1.7M | 11M
Pokec social network (PSN)       | 1.6M | 22M
LiveJournal social network (LSN) | 4.8M | 48M
Orkut social network (OSN)       | 3.1M | 117M

Number of partitions: 16
Evaluation Metrics
• Replication Factor (RF): the average number of replicas per vertex
• Load Relative Standard Deviation (LRSD): the relative standard deviation of the number of edges across partitions (LRSD = 0 indicates equal-size partitions)
• Partitioning time: the time it takes to partition a graph
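These two quality metrics are straightforward to compute from the partitioning state. A sketch, with hypothetical function names:

```python
from statistics import mean, pstdev

def replication_factor(vertex_partitions):
    """RF: average number of replicas per vertex.
    vertex_partitions maps each vertex to the set of partitions holding it."""
    return mean(len(ps) for ps in vertex_partitions.values())

def lrsd(partition_sizes):
    """Load Relative Standard Deviation: population standard deviation of the
    per-partition edge counts divided by their mean; 0 means perfectly
    balanced partitions."""
    return pstdev(partition_sizes) / mean(partition_sizes)

# Two vertices replicated on two partitions each and one on a single partition:
rf = replication_factor({"v1": {"p1", "p2"}, "v2": {"p1", "p2"}, "v3": {"p1"}})
# rf == 5/3; lrsd([100, 100]) == 0.0 for perfectly balanced partitions
```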
One Host: Summary
HoVerCut's configuration: subpartitioners (threads) = 32, window size = 32
Distributed Configuration
AS: |V| = 1.7M, |E| = 11M
OSN: |V| = 3.1M, |E| = 117M
Conclusion
• We presented HoVerCut, a parallel and distributed partitioner
• We can employ different partitioning policies in a scalable fashion
• We can scale HoVerCut to partition larger graphs without degrading the quality of partitions
Thank You!