Leopard: Lightweight Partitioning and Replication for Dynamic Graphs

Jiewen Huang and Daniel Abadi, Yale University

Transcript of Leopard: Lightweight Partitioning and Replication for Dynamic Graphs

Page 1

Leopard: Lightweight Partitioning and Replication for Dynamic Graphs

Jiewen Huang and Daniel Abadi, Yale University

Page 2

Facebook Social Graph

Page 3

Social Graphs

Page 4

Web Graphs

Page 5

Semantic Graphs

Page 6

Graph Partitioning

Many systems use hash partitioning

● Results in many edges being “cut”

Given a graph G and an integer k, partition the vertices into k disjoint sets such that:

● as few edges are cut as possible

● the partitions are as balanced as possible

This problem is NP-hard.
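One standard way to write these two goals formally (the balance slack ε is my addition, not on the slide):

    \min_{V_1, \dots, V_k} \bigl|\{\, (u,v) \in E : u \in V_i,\ v \in V_j,\ i \neq j \,\}\bigr|
    \quad \text{s.t.} \quad V_1 \cup \dots \cup V_k = V, \quad V_i \cap V_j = \emptyset \ (i \neq j), \quad |V_i| \le (1 + \varepsilon)\,\frac{|V|}{k}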

Page 7

State of the Art

Multilevel scheme: coarsening phase

Page 8

To Make the Problem More Complicated

The only constant is change.
-- Heraclitus

Social graphs: new people and friendships
Semantic Web graphs: new knowledge
Web graphs: new websites and links

Page 9

Dynamic Graphs

[Figure: vertex A in Partition 1, shown alongside Partition 2]

Is partition 1 still the better partition for A?

Page 10

New Framework

Repartitioning the entire graph upon every change is way too expensive.

Leopard:

● Locally reassesses partitioning as a result of changes, without a full re-partitioning

● Integrates consideration of replication with partitioning

Page 11

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Page 12

Algorithm Overview

For each added/deleted edge <V1, V2>:

● Compute the best partition for V1 using a heuristic

● Re-assign V1 if needed

● Do the same for V2
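A minimal sketch of this per-edge step in Python, assuming some heuristic score_fn(v, p) (the simple one shown on the next slides) and an in-memory partition_of map; the names are illustrative, not the actual Leopard implementation:

    # Sketch only: illustrative names, not the actual Leopard code.
    def reassess_endpoints(v1, v2, partitions, partition_of, score_fn):
        """On an added or deleted edge <v1, v2>, recompute the best partition
        for each endpoint and reassign the vertex if a better partition is found."""
        for v in (v1, v2):
            best = max(partitions, key=lambda p: score_fn(v, p))
            if partition_of.get(v) != best:
                partition_of[v] = best  # move (or initially place) the vertex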

Page 13

Example: Adding an Edge

[Figure: a new edge is added between vertex A (in Partition 1) and vertex B (in Partition 2)]

Page 14

Compute the Partition for B

Partition 1: # neighbours of B: 1, # vertices: 5
Partition 2: # neighbours of B: 3, # vertices: 3

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity)

Partition 1: 1 * (1 - 5/6) = 0.17
Partition 2: 3 * (1 - 3/6) = 1.5 (higher score)

This heuristic is simple for the sake of presentation. More advanced heuristics are discussed in the paper.
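The same heuristic as a small Python function, checked against the numbers on this slide (the capacity of 6 comes from this example, not a general constant):

    # Simple heuristic from the slide: # neighbours in p * (1 - # vertices in p / capacity)
    def simple_score(neighbours_in_p, vertices_in_p, capacity=6):
        return neighbours_in_p * (1 - vertices_in_p / capacity)

    print(round(simple_score(1, 5), 2))  # Partition 1 for B: 0.17
    print(round(simple_score(3, 3), 2))  # Partition 2 for B: 1.5 -> higher score, so B belongs in Partition 2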

Page 15

Compute the Partition for A

Partition 1: # neighbours of A: 1, # vertices: 4
Partition 2: # neighbours of A: 2, # vertices: 4

Goals: (1) few cuts and (2) balanced

Heuristic: # neighbours * (1 - # vertices / capacity)

Partition 1: 1 * (1 - 4/6) = 0.33
Partition 2: 2 * (1 - 4/6) = 0.67 (higher score)

Page 16

Example: Adding an Edge

(1) B stays put
(2) A moves to partition 2

Page 17

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Page 18

Computation Cost

For each new edge, for both vertices involved in the edge, we must calculate the heuristic for each partition (this may involve communication to look up remote vertex locations).

Page 19

Computation Skipping

Observation: As the number of neighbors of a vertex increases, the influence of a new neighbor decreases.

Page 20

Computation Skipping

Basic idea: accumulate changes for a vertex; if the changes exceed a certain threshold, recompute the partition for the vertex.

For example, let the threshold on # accumulated changes / # neighbors be 20%.

(1) Compute the partition when V has 10 neighbors. Then 2 new edges are added for V: 2 / 12 = 17% < 20%, so don't recompute.

(2) When 1 more new edge is added for V: 3 / 13 = 23% > 20%, so recompute the partition for V and reset # accumulated changes to 0.
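A minimal sketch of this bookkeeping in Python, using the 20% threshold from the example; the names are illustrative, not from the Leopard implementation:

    # Sketch only: skip recomputation until accumulated changes exceed the threshold.
    SKIP_THRESHOLD = 0.20

    accumulated_changes = {}  # vertex -> changes since its partition was last computed
    neighbor_count = {}       # vertex -> current number of neighbors

    def on_new_edge_for(v, recompute_partition):
        """Record one new edge for v; recompute v's partition only past the threshold."""
        accumulated_changes[v] = accumulated_changes.get(v, 0) + 1
        neighbor_count[v] = neighbor_count.get(v, 0) + 1
        if accumulated_changes[v] / neighbor_count[v] > SKIP_THRESHOLD:
            recompute_partition(v)
            accumulated_changes[v] = 0

    # Slide example: partition computed when V had 10 neighbors; the 3rd new edge
    # triggers recomputation, since 3 / 13 = 23% > 20%.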

Page 21

Outline

Background and Motivation

Leopard

Overview

Computation Skipping

Replication

Experiments

Page 22

Replication

Goals of replication:

● fault tolerance (k copies for each data point/block)

● further cut reduction

Page 23

Minimum-Average Replication

It takes two parameters:

● minimum: fault tolerance

● average: cut reduction

Page 24

Example

# copies    vertices
2           A, C, D, E, H, J, K, L
3           F, I
4           B, G

min = 2, average = 2.5

[Figure: the partitioned graph, with each vertex's first copy and its replicas marked]
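Check: with 12 vertices, the average number of copies is (8 * 2 + 2 * 3 + 2 * 4) / 12 = 30 / 12 = 2.5, matching average = 2.5.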


Page 26

How Many Copies?

Scores of each partition for vertex A:
Partition 1: 0.1, Partition 2: 0.2, Partition 3: 0.3, Partition 4: 0.4

minimum = 2, average = 3

Page 27

How Many Copies?

Scores of each partition for vertex A:
Partition 1: 0.1, Partition 2: 0.2, Partition 3: 0.3, Partition 4: 0.4

minimum = 2, average = 3

The two highest-scoring partitions satisfy the minimum requirement. What about the other partitions?

Page 28

Comparing against Past Scores

Always keep the last n computed scores.

[Figure: the last n scores sorted from high to low, e.g. 0.9, 0.87, 0.4, 0.3, 0.29, 0.22, ..., 0.2, 0.11, 0.1]

minimum = 2, average = 3

cutoff: top (average - 1) / (k - 1) percent of scores

Pages 29-32

Comparing against Past Scores

[Figure: the last n scores sorted from high to low, with the cutoff at the 30th highest score]

minimum = 2, average = 3

cutoff: 30th highest score

Depending on how many of the vertex's partition scores fall above this cutoff, the vertex ends up with 2, 3, or 4 copies.
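A rough sketch of this copy-count decision in Python, under the assumptions above (the minimum copies go to the best-scoring partitions; extra copies go only to partitions whose score beats the cutoff drawn from past scores); the history length and names are illustrative, not from the paper:

    from collections import deque

    # Sketch only: keep the last n computed scores (n = 100 is an arbitrary choice here).
    past_scores = deque(maxlen=100)

    def choose_copies(scores_by_partition, minimum, average, k):
        """Return how many copies a vertex gets, given its score for each of the k partitions."""
        ranked = sorted(scores_by_partition, reverse=True)
        if past_scores:
            history = sorted(past_scores, reverse=True)
            # Cutoff: top (average - 1) / (k - 1) fraction of past scores, as on the slide.
            cutoff_rank = min(len(history), max(1, int(len(history) * (average - 1) / (k - 1))))
            cutoff = history[cutoff_rank - 1]
            extra = sum(1 for s in ranked[minimum:] if s > cutoff)
        else:
            extra = 0
        past_scores.extend(ranked)
        return minimum + extra

For example, with the scores from Page 26 (0.1, 0.2, 0.3, 0.4) and minimum = 2, the two best partitions always receive a copy, and the partitions scoring 0.2 and 0.1 add further copies only if they beat the cutoff.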

Page 33

Outline

Background and Motivation

Leopard

Experiments

Page 34

Experiment Setup

● Comparison points

○ Leopard with FENNEL heuristics

○ One-pass FENNEL (no vertex reassignment)

○ METIS (static graphs)

○ ParMETIS (repartitioning for dynamic graphs)

○ Hash Partitioning

● Graph Datasets

○ Type: social graphs, collaboration graphs, Web graphs, email graphs, and synthetic graphs

○ Size: up to 66 million vertices and 1.8 billion edges

Page 35

Edge Cut

Page 36

Computation Skipping

Page 37

Effect of Replication on Edge Cut

Page 38

Thanks!

Q & A