PAGE: A Partition Aware Graph Computation Engine

Yingxia Shao, Junjie Yao, Bin Cui, Lin Ma
EECS, Peking University, China

Agenda

• Background
• Design of PAGE
• Experiment results
• Conclusion

Background

• Prevalent large-scale graphs
  – Social networks
  – Web graph
  – …
• Graph computing systems
  – Pregel (Google)
  – Giraph (Apache)
  – GPS (Stanford)
  – GraphLab (CMU)
  – …

Background

• Graph partitioning
  – Offline approach
    • METIS (Karypis Lab)
  – Online approach
    • Streaming partitioning
    • Linear Deterministic Greedy (LDG) algorithm (I. Stanton)
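
For readers unfamiliar with streaming partitioning, the following is a minimal sketch of a Linear Deterministic Greedy style heuristic: each arriving vertex goes to the partition holding most of its already placed neighbors, weighted by the remaining capacity of that partition. The function name `ldg_partition`, the `(vertex, neighbors)` stream format, and the tie-break toward the less loaded partition are assumptions made for illustration, not details taken from the slides.

```python
def ldg_partition(stream, num_parts, capacity):
    """Assign each streamed vertex to a partition, one pass, no revisiting.

    stream: iterable of (vertex, list_of_neighbors) pairs (hypothetical format).
    Returns a dict mapping vertex -> partition index.
    """
    assignment = {}
    sizes = [0] * num_parts
    for vertex, neighbors in stream:
        best_part, best_score = None, -1.0
        for part in range(num_parts):
            if sizes[part] >= capacity:
                continue  # partition is full
            # Number of this vertex's neighbors already placed in `part`.
            placed = sum(1 for n in neighbors if assignment.get(n) == part)
            # Linear penalty: fuller partitions get a smaller weight.
            score = placed * (1.0 - sizes[part] / capacity)
            # Assumed tie-break: prefer the less loaded partition.
            if score > best_score or (
                score == best_score and sizes[part] < sizes[best_part]
            ):
                best_part, best_score = part, score
        if best_part is None:
            raise ValueError("all partitions are full")
        assignment[vertex] = best_part
        sizes[best_part] += 1
    return assignment


# Two triangles streamed one after the other, two partitions of capacity 3.
triangles = [
    (0, [1, 2]), (1, [0, 2]), (2, [0, 1]),
    (3, [4, 5]), (4, [3, 5]), (5, [3, 4]),
]
parts = ldg_partition(triangles, num_parts=2, capacity=3)
```

In this toy stream each triangle ends up in its own partition, so no edge is cut.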

Problem: Existing graph computation systems cannot efficiently integrate high-quality graph partitioning.

Inefficient partition integration

[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost, and local comm. cost.]

Higher-quality graph partitioning leads to worse overall performance.

Partition quality improves from left to right.

Running PageRank on Giraph with six graph partitions of different quality.

Motivation of the PAGE

This calls for a novel graph computation engine that efficiently integrates graph partitions of varying quality.

[Diagram labels: A Novel Graph Computation Engine; High-Quality Graph Partition; Low-Quality Graph Partition.]

Agenda

• Background
• Design of PAGE
• Experiment results
• Conclusion

Message Processor

[Diagram: a Message Processor made up of several Message Process Units; messages (msg.) are grouped into a Message Block with a Header.]

Inefficient partition integration

[Figure: average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost, and local comm. cost.]

The local message processing cost dominates the overall cost.

Existing systems cannot provide enough local message processors.

Running PageRank on Giraph with six graph partitions of different quality.

Overview of the PAGE

[Diagram: PAGE worker1, PAGE worker2, and PAGE worker3, each with Partition Aware, Comm., and Computation modules, together with a Distributed In-Memory Partitioned Graph.]

PAGE applies an adaptive tuning mechanism and new cooperation methods.

Newly Designed PAGE Worker

[Diagram: a PAGE worker with three modules: Partition Aware (Monitor, DCCM), Communication (Sender, Receiver, and a Dual Concurrent MP containing a Remote MP and a Local MP), and Computation.]

Dual Concurrent Message Processor

• First type of concurrency
  – A remote MP and a local MP are embedded
• Second type of concurrency
  – Each message processor contains a set of message process units
• The concurrency is determined automatically by the system itself.
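
The two types of concurrency listed above can be illustrated with a small sketch. It models each message processor as a thread pool whose workers play the role of message process units, and it runs a local and a remote processor side by side. The class names, the `dispatch` method, and the use of Python thread pools are assumptions chosen for illustration; the slides do not describe PAGE's actual implementation at this level.

```python
from concurrent.futures import ThreadPoolExecutor


class MessageProcessor:
    """A set of message process units, modeled here as pool workers."""

    def __init__(self, num_units, handler):
        self._pool = ThreadPoolExecutor(max_workers=num_units)
        self._handler = handler

    def _process(self, block):
        # One message process unit handles one whole message block.
        return [self._handler(msg) for msg in block]

    def submit(self, block):
        return self._pool.submit(self._process, block)

    def shutdown(self):
        self._pool.shutdown(wait=True)


class DualConcurrentMP:
    """First type of concurrency: a local MP and a remote MP run together.
    Second type: each MP owns several message process units."""

    def __init__(self, n_local, n_remote, handler):
        self.local = MessageProcessor(n_local, handler)
        self.remote = MessageProcessor(n_remote, handler)

    def dispatch(self, block, is_local):
        target = self.local if is_local else self.remote
        return target.submit(block)

    def shutdown(self):
        self.local.shutdown()
        self.remote.shutdown()


# Hypothetical usage: three local units, one remote unit.
mp = DualConcurrentMP(n_local=3, n_remote=1, handler=lambda m: m * 2)
local_future = mp.dispatch([1, 2, 3], is_local=True)
remote_future = mp.dispatch([10], is_local=False)
local_result = local_future.result()
remote_result = remote_future.result()
mp.shutdown()
```

In this sketch the unit counts are fixed at construction time; in PAGE they are chosen by the system itself.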

Dynamic Concurrency Control Model

• The DCCM determines the proper parameters, such as n_mp, n_mpl, and n_mpr.
• The DCCM is built on top of two heuristic rules.
  – Ability Lower-bound
  – Workload Balance Ratio
• Monitor
  – Tracks the necessary metrics

[Diagram: the Partition Aware module, containing the Monitor and the DCCM.]
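
The slides name the two heuristic rules without defining them, so the sketch below rests on an assumed reading: the ability lower-bound is taken to mean that the message process units together must keep up with the rate at which messages are generated, and the workload balance ratio is taken to mean that units are split between the local and remote processors in proportion to their workloads. It also assumes that n_mp is the total number of message process units and that n_mpl and n_mpr are the local and remote shares. The function name and all four input metrics are hypothetical.

```python
import math


def choose_concurrency(gen_rate, unit_rate, local_load, remote_load):
    """Return (n_mp, n_mpl, n_mpr) from monitored metrics (all hypothetical).

    gen_rate:    messages generated per second
    unit_rate:   messages one message process unit handles per second
    local_load:  observed local message workload
    remote_load: observed remote message workload
    """
    # Assumed ability lower-bound: total processing ability >= generation rate.
    # At least two units, so both the local and the remote MP get one.
    n_mp = max(2, math.ceil(gen_rate / unit_rate))

    # Assumed workload balance ratio: split units in proportion to workload.
    total = local_load + remote_load
    n_mpl = round(n_mp * local_load / total) if total else n_mp // 2
    n_mpl = min(max(n_mpl, 1), n_mp - 1)  # keep one unit on each side
    n_mpr = n_mp - n_mpl
    return n_mp, n_mpl, n_mpr


# A partition with few cut edges produces mostly local messages.
mostly_local = choose_concurrency(1000, 100, local_load=900, remote_load=100)
# A partition with many cut edges produces mostly remote messages.
mostly_remote = choose_concurrency(1000, 100, local_load=100, remote_load=900)
```

Under these assumptions, a high-quality partition shifts units toward the local message processor, which matches the earlier observation that local message processing dominates the cost.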

Agenda

• Background
• Design of PAGE
• Experiment results
• Conclusion

Environment & Datasets

• Experiment environment
  – A 24-node cluster
• Dataset: uk-2007-05-u
  – Undirected
  – Vertex #: 105,153,952
  – Edge #: 6,603,753,128
• Benchmark: PageRank

Partition qualities

Scheme    Edge Cut
Random    98.52%
LDG1      82.88%
LDG2      75.69%
LDG3      66.37%
LDG4      56.34%
METIS     3.48%

Balance factor: < 1%.
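
The edge cut column can be read as the fraction of edges whose two endpoints fall in different partitions, which is the usual definition of the metric. The sketch below computes it for a toy graph; the function name and the input format are assumptions for illustration, and the numbers are not taken from the experiments.

```python
def edge_cut_ratio(edges, assignment):
    """Fraction of edges whose endpoints lie in different partitions.

    edges:      iterable of (u, v) pairs
    assignment: dict mapping vertex -> partition index
    """
    edges = list(edges)
    if not edges:
        return 0.0
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)


# A 4-cycle split into two halves: two of the four edges cross partitions.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
ratio = edge_cut_ratio(cycle, {0: 0, 1: 0, 2: 1, 3: 1})
```

A lower ratio means fewer messages have to cross partition boundaries, which is why METIS counts as the highest-quality scheme in the table.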

Partition Awareness in PAGE

[Figure: two panels, PAGE and Giraph, each plotting average time (s/iteration) for each partition scheme, broken down into overall cost, sync remote comm. cost, and sync local comm. cost.]

Comparison with the naive solution

[Figure: average time (s/iteration) for each partition scheme, comparing Giraph, Giraph-GPSop, and PAGE.]

* Giraph-GPSop is the naive solution.

Contribution & Conclusion

• We identify the problem of partition-unaware inefficiency.

• We present a new partition-aware graph computation engine, PAGE.

• We design a Dynamic Concurrency Control Model, based on several heuristic rules, to better capture the characteristics of a graph partition.

• Finally, we demonstrate PAGE’s robustness and efficiency across graph partitions of different quality.

Thanks!
