Boosting Vertex-Cut Partitioning for Streaming Graphs


Page 1: Boosting Vertex-Cut Partitioning for Streaming Graphs

Boosting Vertex-Cut Partitioning for Streaming Graphs

Hooman Peiro Sajjad*, Amir H. Payberah†, Fatemeh Rahimian†, Vladimir Vlassov*, Seif Haridi†

* KTH Royal Institute of Technology † SICS Swedish ICT

5th IEEE International Congress on Big Data

Page 2: Boosting Vertex-Cut Partitioning for Streaming Graphs

Introduction

Page 3: Boosting Vertex-Cut Partitioning for Streaming Graphs

Graph Partitioning

Partition large graphs for applications such as:

• Complexity reduction, parallelization and distributed graph analysis

[Figure: a graph split into four partitions, P1 to P4]

Page 9: Boosting Vertex-Cut Partitioning for Streaming Graphs

Partitioning Models

[Figure: two partitioning models, each illustrated on partitions P1 and P2]

Vertex-cut partitioning: more efficient for power-law graphs

Page 10: Boosting Vertex-Cut Partitioning for Streaming Graphs

A Good Vertex-Cut Partitioning

• Low replication factor

• Balanced partitions with respect to the number of edges

Page 11: Boosting Vertex-Cut Partitioning for Streaming Graphs

Streaming Graph Partitioning

• Graph elements are assigned to partitions as they are being streamed

• No global knowledge

[Figure: streaming edges enter a partitioner, which assigns them to partitions P1, P2, ..., Pp]

Page 16: Boosting Vertex-Cut Partitioning for Streaming Graphs

State-of-the-Art Partitioners

• Centralized partitioner: slow partitioning time, low replication factor

  • Single thread partitioner

  • Multi-threaded partitioner: each thread partitions a subset of the graph and shares the state information

• Distributed partitioner: fast partitioning time, high replication factor

  • Oblivious partitioners: several independent partitioners

Page 19: Boosting Vertex-Cut Partitioning for Streaming Graphs

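Partitioning Time vs. Partition Quality

• Centralized partitioner: slow partitioning time, low replication factor

• Distributed partitioner: fast partitioning time, high replication factor

[Figure: HoVerCut positioned against the centralized and distributed partitioners on the trade-off between partitioning time and replication factor]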

Page 20: Boosting Vertex-Cut Partitioning for Streaming Graphs

HoVerCut Framework

Page 21: Boosting Vertex-Cut Partitioning for Streaming Graphs

HoVerCut ...

• Streaming Vertex-Cut partitioner

• Horizontally and Vertically scalable

• Scales without degrading the quality of partitions

• Employs different partitioning policies


Page 22: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture Overview

[Figure: subpartitioners 1 to n, each consuming an edge stream and made up of a core, a partitioning policy, a tumbling window and a local state; every subpartitioner communicates asynchronously with the shared state]

Page 23: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Input

• Input graphs are streamed by their edges

• Each subpartitioner receives an exclusive subset of the edges

Page 24: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Configurable Window

Subpartitioners collect a number of incoming edges in a window of a certain size.
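A tumbling window groups the stream into consecutive, non-overlapping batches. The following is a minimal sketch of that idea, not HoVerCut's actual code: the function name `tumbling_windows`, the `Edge` alias and the decision to flush a final shorter window are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]  # (source vertex id, destination vertex id); hypothetical alias


def tumbling_windows(edges: Iterable[Edge], window_size: int) -> Iterator[List[Edge]]:
    """Group a stream of edges into consecutive, non-overlapping windows."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window: List[Edge] = []
    for edge in edges:
        window.append(edge)
        if len(window) == window_size:
            yield window
            window = []  # start a fresh window; windows never overlap
    if window:
        # Assumption: the last window is emitted even if it is not full.
        yield window


stream = [(1, 2), (2, 3), (3, 1), (4, 1), (4, 5)]
windows = list(tumbling_windows(stream, window_size=2))
```

With a window size of 1 every edge is processed on its own; larger windows mean fewer exchanges with the shared state per edge.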

Page 25: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Partitioning Policy

Each subpartitioner assigns the edges to the partitions based on a given policy.

Page 26: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Local State

Each subpartitioner has a local state, which includes information about the edges processed locally:

• partial degree

• partitions of each vertex

• number of edges in each partition
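One way to picture the local state is as three small maps that are updated whenever the subpartitioner places an edge. The sketch below is illustrative only; the class name `LocalState`, its field names and the `record_edge` method are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Set


@dataclass
class LocalState:
    """Per-subpartitioner bookkeeping (hypothetical layout)."""
    # Degree of each vertex as seen by this subpartitioner only.
    partial_degree: DefaultDict[int, int] = field(default_factory=lambda: defaultdict(int))
    # Partitions that hold a replica of each vertex.
    vertex_partitions: DefaultDict[int, Set[int]] = field(default_factory=lambda: defaultdict(set))
    # Number of edges placed in each partition.
    partition_edges: DefaultDict[int, int] = field(default_factory=lambda: defaultdict(int))

    def record_edge(self, u: int, v: int, p: int) -> None:
        """Update all three maps after edge (u, v) is assigned to partition p."""
        for vertex in (u, v):
            self.partial_degree[vertex] += 1
            self.vertex_partitions[vertex].add(p)
        self.partition_edges[p] += 1


state = LocalState()
state.record_edge(1, 2, p=0)
state.record_edge(1, 3, p=1)
```

The degree is "partial" because each subpartitioner only sees its own exclusive subset of the edges.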

Page 28: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Shared State

Shared-state is the global state accessible by all subpartitioners. Subpartitioners read and update it through getState and putState.

Vertex Table

ID    Partial degree    Partitions
v1    12                p1
v2    50                p1, p2

Partition Table

ID    Num. of edges
p1    5000
p2    6500
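The two tables and the getState/putState operations could be modelled roughly as follows. This is a sketch under assumptions: the slides show asynchronous communication, whereas this single-process version simply guards the tables with a lock, and the class name, method signatures and row layout are hypothetical.

```python
import threading
from typing import Dict, Iterable, Set, Tuple

VertexRow = Tuple[int, Set[str]]  # (partial degree, partitions holding a replica)


class SharedState:
    """In-memory stand-in for the shared state (hypothetical API)."""

    def __init__(self, partition_ids: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._vertex_table: Dict[str, VertexRow] = {}
        self._partition_table: Dict[str, int] = {p: 0 for p in partition_ids}

    def get_state(self, vids: Iterable[str]) -> Tuple[Dict[str, VertexRow], Dict[str, int]]:
        """Return copies of the vertex rows for vids and of the partition table."""
        with self._lock:
            vt = {}
            for v in vids:
                degree, parts = self._vertex_table.get(v, (0, set()))
                vt[v] = (degree, set(parts))
            return vt, dict(self._partition_table)

    def put_state(self, vertex_deltas: Dict[str, VertexRow],
                  partition_deltas: Dict[str, int]) -> None:
        """Merge deltas: add degree increments, union partitions, add edge counts."""
        with self._lock:
            for v, (d_degree, new_parts) in vertex_deltas.items():
                degree, parts = self._vertex_table.get(v, (0, set()))
                self._vertex_table[v] = (degree + d_degree, parts | new_parts)
            for p, d_size in partition_deltas.items():
                self._partition_table[p] = self._partition_table.get(p, 0) + d_size


# Reproduce the example tables from the slide.
shared = SharedState(["p1", "p2"])
shared.put_state({"v1": (12, {"p1"}), "v2": (50, {"p1", "p2"})},
                 {"p1": 5000, "p2": 6500})
vt, pt = shared.get_state(["v2", "v3"])
```

Because updates arrive as deltas, merges from different subpartitioners commute, so the order in which they reach the shared state does not change the result.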

Page 29: Boosting Vertex-Cut Partitioning for Streaming Graphs

Architecture: Core

The core is HoVerCut’s main algorithm, parametrised with the partitioning policy and the window size.

Page 33: Boosting Vertex-Cut Partitioning for Streaming Graphs

Vertex-Cut Partitioning Heuristics

For an edge with end-vertices u and v, and for every partition p:

Score = ReplicationScore + LoadBalanceScore

Choose the partition that maximizes the Score.

State-of-the-Art Heuristics:

• Greedy

• HDRF
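Written as a formula, the selection rule reads as below. The symbols P (the set of partitions) and p* (the chosen partition), and the argument lists of the two score terms, are notation assumed here for illustration; the slides only name the terms.

```latex
p^{*} = \arg\max_{p \in P} \mathrm{Score}(p),
\qquad
\mathrm{Score}(p) = \mathrm{ReplicationScore}(u, v, p) + \mathrm{LoadBalanceScore}(p)
```

The two terms pull in different directions: the first favours partitions that already hold u or v, the second favours partitions with fewer edges.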

Page 40: Boosting Vertex-Cut Partitioning for Streaming Graphs

Greedy vs. HDRF

Greedy: places end-vertices u and v of an edge in a partition that already has a replica of u or v.

HDRF (High Degree Replicated First): replicates the higher-degree end-vertex.

[Figure: placement of a new edge (u, v) across partitions P1 and P2 under Greedy and under HDRF]
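The difference between the two heuristics can be made concrete with score functions. The sketch below follows the commonly cited form of these heuristics rather than anything stated on the slides: the normalised-degree weighting in `hdrf_score`, the balance term, and the parameters `lam` and `eps` are assumptions, and all function names are hypothetical.

```python
from typing import Dict, Set


def balance_score(sizes: Dict[int, int], p: int, lam: float = 1.0, eps: float = 1.0) -> float:
    """Higher for emptier partitions; lam weights balance against replication."""
    max_size, min_size = max(sizes.values()), min(sizes.values())
    return lam * (max_size - sizes[p]) / (eps + max_size - min_size)


def greedy_score(p: int, parts_u: Set[int], parts_v: Set[int], sizes: Dict[int, int]) -> float:
    """One point for each end-vertex that already has a replica in p."""
    replication = (p in parts_u) + (p in parts_v)
    return replication + balance_score(sizes, p)


def hdrf_score(p: int, deg_u: int, deg_v: int, parts_u: Set[int], parts_v: Set[int],
               sizes: Dict[int, int], lam: float = 1.0) -> float:
    """Like greedy, but a replica of the lower-degree vertex counts for more."""
    total = deg_u + deg_v
    theta_u = deg_u / total if total else 0.5  # normalised degree of u
    theta_v = 1.0 - theta_u
    replication = 0.0
    if p in parts_u:
        replication += 1.0 + (1.0 - theta_u)
    if p in parts_v:
        replication += 1.0 + (1.0 - theta_v)
    return replication + balance_score(sizes, p, lam)


# u is a high-degree vertex already in partition 1, v a low-degree vertex in partition 2.
parts = {"u": {1}, "v": {2}}
sizes = {1: 5, 2: 5}
greedy = {p: greedy_score(p, parts["u"], parts["v"], sizes) for p in sizes}
hdrf = {p: hdrf_score(p, 10, 1, parts["u"], parts["v"], sizes) for p in sizes}
hdrf_choice = max(hdrf, key=hdrf.get)
```

In this example Greedy scores both partitions equally, while the HDRF-style score prefers partition 2: the edge joins the low-degree vertex v, and the high-degree vertex u is the one that gets replicated.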

Page 45: Boosting Vertex-Cut Partitioning for Streaming Graphs

Partitioning a Window of Edges

vids: the set of vertex ids in the current window
edges: the set of edges in the current window
pt = get the partition table
vt = get the vertex table restricted to vids

for each e ∊ edges:
    u = e.src, v = e.dst
    increment vt(u).degree and vt(v).degree
    given a partition policy: select p based on vt(u), vt(v) and pt
    add p to vt(u).partitions and vt(v).partitions
    increment pt(p).size
end

update the shared state by sending vt, pt represented as deltas

vt (deltas)

ID    Degree    Partitions
v1    +4        +p1
v2    +2        +p2

pt (deltas)

ID    Size
p1    +3
p2    +1
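The pseudocode for one window can be turned into a small runnable sketch. Everything named here is illustrative: `VertexEntry`, `partition_window` and the toy policy `replica_then_least_loaded` are hypothetical, and `vt` and `pt` are assumed to be local copies already fetched from the shared state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class VertexEntry:
    degree: int = 0
    partitions: Set[str] = field(default_factory=set)


Policy = Callable[[VertexEntry, VertexEntry, Dict[str, int]], str]


def partition_window(edges: List[Edge], vt: Dict[str, VertexEntry], pt: Dict[str, int],
                     policy: Policy):
    """Assign every edge in the window; return assignments and the vt/pt deltas."""
    vt_delta: Dict[str, VertexEntry] = {}
    pt_delta: Dict[str, int] = {}
    assignments: List[Tuple[Edge, str]] = []
    for u, v in edges:
        for x in (u, v):  # increment vt(u).degree and vt(v).degree
            vt.setdefault(x, VertexEntry()).degree += 1
            vt_delta.setdefault(x, VertexEntry()).degree += 1
        p = policy(vt[u], vt[v], pt)  # select p based on vt(u), vt(v) and pt
        for x in (u, v):  # add p to both end-vertices' partition sets
            if p not in vt[x].partitions:
                vt[x].partitions.add(p)
                vt_delta[x].partitions.add(p)
        pt[p] += 1  # increment pt(p).size
        pt_delta[p] = pt_delta.get(p, 0) + 1
        assignments.append(((u, v), p))
    return assignments, vt_delta, pt_delta


def replica_then_least_loaded(eu: VertexEntry, ev: VertexEntry, pt: Dict[str, int]) -> str:
    """Toy policy (an assumption): prefer partitions with a replica, then the smallest."""
    candidates = (eu.partitions | ev.partitions) or set(pt)
    return min(sorted(candidates), key=lambda p: pt[p])


vt: Dict[str, VertexEntry] = {}
pt = {"p1": 0, "p2": 0}
window = [("v1", "v2"), ("v1", "v3"), ("v4", "v5")]
assignments, vt_delta, pt_delta = partition_window(window, vt, pt, replica_then_least_loaded)
```

Only `vt_delta` and `pt_delta` would need to be sent back to the shared state, which is what keeps the update at the end of each window small.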

Page 46: Boosting Vertex-Cut Partitioning for Streaming Graphs

Evaluation

Page 47: Boosting Vertex-Cut Partitioning for Streaming Graphs

Datasets

Dataset                             |V|     |E|
Autonomous systems (AS)             1.7M    11M
Pokec social network (PSN)          1.6M    22M
LiveJournal social network (LSN)    4.8M    48M
Orkut social network (OSN)          3.1M    117M

Partitions: 16

Page 48: Boosting Vertex-Cut Partitioning for Streaming Graphs

Evaluation Metrics

• Replication Factor (RF): the average number of replicas per vertex

• Load Relative Standard Deviation (LRSD): the relative standard deviation of the number of edges per partition (LRSD = 0 indicates equal-size partitions)

• Partitioning time: the time it takes to partition a graph
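The first two metrics can be computed directly from the vertex and partition tables. This is a sketch under assumptions: the function names are hypothetical, and the choice of the population standard deviation and of reporting LRSD as a percentage is not specified on the slides.

```python
from statistics import mean, pstdev
from typing import Dict, Set


def replication_factor(vertex_partitions: Dict[str, Set[str]]) -> float:
    """Average number of partitions holding a replica of each vertex."""
    return mean(len(parts) for parts in vertex_partitions.values())


def load_rsd(partition_sizes: Dict[str, int]) -> float:
    """Relative standard deviation of edges per partition, in percent (assumed)."""
    sizes = list(partition_sizes.values())
    return 100.0 * pstdev(sizes) / mean(sizes)


rf = replication_factor({"v1": {"p1"}, "v2": {"p1", "p2"}})  # 1.5
lrsd = load_rsd({"p1": 5000, "p2": 6500})
balanced = load_rsd({"p1": 10, "p2": 10})  # 0.0
```

An RF of 1 means no vertex is replicated at all, so lower values indicate less synchronization between partitions.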

Page 49: Boosting Vertex-Cut Partitioning for Streaming Graphs

One Host: Summary

HoVerCut’s configuration:

• Subpartitioners (threads) = 32

• Window size = 32

Page 52: Boosting Vertex-Cut Partitioning for Streaming Graphs

Distributed Configuration

[Figure panels: AS (|V| = 1.7M, |E| = 11M) and OSN (|V| = 3.1M, |E| = 117M)]

Page 53: Boosting Vertex-Cut Partitioning for Streaming Graphs

Conclusion

• We presented HoVerCut, a parallel and distributed partitioner

• We can employ different partitioning policies in a scalable fashion

• We can scale HoVerCut to partition larger graphs without degrading the quality of partitions

Page 54: Boosting Vertex-Cut Partitioning for Streaming Graphs

Thank You!