When Is Graph Reordering An Optimization?

A Cross Application and Input Graph Study on the Effectiveness of Lightweight Graph Reordering

Vignesh Balaji Brandon Lucia

Graph Processing Has Many Applications

Path Planning, Social network analysis, Recommender systems

Graph Applications Are Memory Bound

[Figure from “Optimizing Cache Performance for Graph Analytics” ArXiv v1; panels: Cycles stalled on DRAM / Total Cycles, LLC Miss Rate]

Problem: Poor LLC locality ⇒ Many long-latency DRAM accesses

Reason For Poor Locality - Irregular Memory Accesses

[Figure: an Input Graph, its Compressed Sparse Row (CSR) Representation, and a typical graph processing kernel]

Irregular accesses to vtxData array
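To make the access pattern concrete, here is a minimal Python sketch of a CSR-based kernel. The names `offsets`, `neighbors`, and `sum_neighbor_data` are illustrative assumptions and do not come from the slides; only `vtxData` is named in the original. The neighbor array is scanned sequentially, while the reads of the vertex data jump to whichever IDs the neighbor array happens to contain.

```python
def sum_neighbor_data(offsets, neighbors, vtx_data):
    """For each vertex, sum the data of its neighbors (a simple gather kernel).

    offsets[v]..offsets[v + 1] delimits v's slice of `neighbors` (CSR layout).
    """
    num_vertices = len(offsets) - 1
    result = [0.0] * num_vertices
    for v in range(num_vertices):                     # regular: v increases by 1
        for e in range(offsets[v], offsets[v + 1]):   # regular: sequential scan
            u = neighbors[e]
            result[v] += vtx_data[u]                  # irregular: u is data-dependent
    return result


# Hypothetical 4-vertex graph: 0 -> {3, 1}, 1 -> {2}, 2 -> {0, 3}, 3 -> {}
offsets = [0, 2, 3, 5, 5]
neighbors = [3, 1, 2, 0, 3]
vtx_data = [10.0, 20.0, 30.0, 40.0]
sums = sum_neighbor_data(offsets, neighbors, vtx_data)
```

In this toy graph the `vtx_data` indices are read in the order 3, 1, 2, 0, 3, which is the kind of non-sequential stream the slides call irregular.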

Irregular Accesses Have Poor Temporal And Spatial Locality

Example LLC: 2 lines, 2 words/line. Line-granular transfers.

Time    Access        Result    LLC contents after access
1       vtxData[3]    Miss      [2 3]
2       vtxData[0]    Miss      [2 3] [0 1]
3       vtxData[7]    Miss      [6 7] [0 1]    (eviction due to capacity miss)
4       vtxData[3]    Miss      [6 7] [2 3]

Working set size >> LLC capacity ⇒ Poor Temporal Locality

Line Size > Access granularity ⇒ Poor Spatial Locality

Outline

❖ Poor Locality of Graph Processing Applications ✔

❖ Improving locality through Graph Reordering

❖ Graph Reordering Challenge - Application and Input-dependent Speedups

❖ When is Graph Reordering an Optimization?

❖ Selective Graph Reordering

Real-world Graphs Offer Opportunities To Improve Locality

Power-law Degree Distribution: Hubs (Air Traffic Network)

Community Structure: Communities (Facebook friend Graph)

Right figure from “Rabbit Order: Just-in-time Parallel Reordering for Fast Graph Analysis” IPDPS 2016

Observation: A subset of vertices is accessed together

Reordering To Improve Locality of Graph Applications

Key Insight: Store commonly accessed vertices contiguously in memory

[Figure: a Power-law graph before and after Reorder]

Reordering Improves Spatial & Temporal Locality

Reordered CSR. Example LLC: 2 lines, 2 words/line.

Time    Access        Result    LLC contents after access
1       vtxData[0]    Miss      [0 1]
2       vtxData[2]    Miss      [0 1] [2 3]
3       vtxData[1]    Hit       [0 1] [2 3]
4       vtxData[0]    Hit       [0 1] [2 3]
5       vtxData[0]    Hit       [0 1] [2 3]

Graph Reordering improved Spatial and Temporal locality of vtxData accesses
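The two cache walkthroughs can be replayed with a small simulator. This is a sketch under stated assumptions: the slides do not name a replacement policy or associativity, so a fully associative LRU cache is assumed here (it reproduces the evictions shown), and `simulate_lru` is a hypothetical helper name.

```python
from collections import OrderedDict


def simulate_lru(accesses, num_lines=2, words_per_line=2):
    """Replay word indices through a tiny fully associative LRU cache.

    Returns a list of "Hit"/"Miss" strings, one per access.
    """
    cache = OrderedDict()   # keys are line numbers, ordered from LRU to MRU
    results = []
    for index in accesses:
        line = index // words_per_line      # line-granular transfers
        if line in cache:
            cache.move_to_end(line)         # refresh recency on a hit
            results.append("Hit")
        else:
            if len(cache) == num_lines:
                cache.popitem(last=False)   # evict the least recently used line
            cache[line] = True
            results.append("Miss")
    return results


original = simulate_lru([3, 0, 7, 3])        # vtxData indices before reordering
reordered = simulate_lru([0, 2, 1, 0, 0])    # vtxData indices after reordering
print(original)    # ['Miss', 'Miss', 'Miss', 'Miss']
print(reordered)   # ['Miss', 'Miss', 'Hit', 'Hit', 'Hit']
```

The only difference between the two runs is the index stream: once the frequently used entries share the lines holding words 0 to 3, both cache lines stay resident and the later accesses hit.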

Graph Reordering Is Not A Panacea

[Figure labels: Locality Benefits, Reordering Overhead, Net Speedup]

Net speedup from Reordering depends on the Application and Input Graph

Question: What are the properties of Applications and Input Graphs that benefit from Reordering?

Outline

❖ Poor Locality of Graph Processing Applications ✔

❖ Improving locality through Graph Reordering ✔

❖ Graph Reordering Challenge - Application and Input-dependent Speedups ✔

❖ When is Graph Reordering an Optimization?
    ➢ Characterization Space
    ➢ Which Applications benefit from Reordering?
    ➢ Which Input Graphs benefit from Reordering?

❖ Selective Graph Reordering

Characterization Space

3 Graph Reordering Techniques

15 Applications (Ligra, GAP)

8 Input Graphs (M vertices, B edges)

Server-class Processor (dual-Socket, 28 cores, 35MB LLC, 64GB DRAM)

Lightweight Reordering (LWR) Techniques

➢ Rabbit Ordering [Arai et al., IPDPS 2016]

➢ Frequency-based Clustering (or “Hub-Sorting”) [Zhang et al., Big Data 2017]

➢ Hub-Clustering (Our Variation of Hub Sorting)

Selection Criteria: Low reordering overhead (requires very few runs/iterations to amortize overheads)

LWR 1 - Rabbit Ordering

Figure from “Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis” IPDPS 2016

● Fast community detection using incremental aggregation

● Complexity - O(|E| - ck|V|), where c = clustering coeff. and k = avg. degree

LWR 2 & 3 - HubSorting & HubClustering

[Figure: degrees plotted against Vtx IDs, before and after HubSorting and HubClustering]

                     HubSorting                   HubClustering
Locality Benefits    Temporal AND Spatial ↑↑      Temporal ↑
Complexity           O(|V| log |V|) ↓↓            O(|V|) ↓
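A minimal sketch of the two hub-based orderings follows. It assumes a hub is a vertex whose degree exceeds the average degree, and that non-hub vertices keep their original relative order; the slides do not spell out either detail, and the function names are hypothetical. Each function returns a map from old vertex ID to new vertex ID.

```python
def _relabel(order):
    """Turn a list of old IDs in their new order into a map old ID -> new ID."""
    new_id = [0] * len(order)
    for new, old in enumerate(order):
        new_id[old] = new
    return new_id


def _split_hubs(degrees):
    """Split vertices into hubs (degree above average) and the rest."""
    average = sum(degrees) / len(degrees)
    hubs = [v for v, d in enumerate(degrees) if d > average]
    others = [v for v, d in enumerate(degrees) if d <= average]
    return hubs, others


def hub_sort(degrees):
    """Hubs first, in descending degree order; the sort costs O(|V| log |V|)."""
    hubs, others = _split_hubs(degrees)
    hubs.sort(key=lambda v: degrees[v], reverse=True)
    return _relabel(hubs + others)


def hub_cluster(degrees):
    """Hubs first, in their original relative order; a single O(|V|) pass."""
    hubs, others = _split_hubs(degrees)
    return _relabel(hubs + others)


degrees = [1, 7, 2, 1, 9, 1, 1, 2]     # hypothetical degrees; the average is 3
sorted_ids = hub_sort(degrees)         # [2, 1, 3, 4, 0, 5, 6, 7]
clustered_ids = hub_cluster(degrees)   # [2, 0, 3, 4, 1, 5, 6, 7]
```

Both variants pack vertices 1 and 4 into the lowest IDs. Only the sorted variant also places the highest-degree vertex (4) first, which is where the extra sorting cost goes.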

Legend for Results

Plot annotations: Application, Input Graph, Reordering Technique, Speedup with respect to original ordering of graph, Reordering overhead

Max Speedup: Speedup excluding the overhead of reordering

Net Speedup: Speedup including the overhead of reordering
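The two speedup definitions can be written out as simple ratios of run times. This is a sketch of one natural reading of the legend, not a formula quoted from the slides; the function names and all timings are made up for illustration.

```python
import math


def max_speedup(t_original, t_reordered):
    """Speedup excluding the overhead of reordering."""
    return t_original / t_reordered


def net_speedup(t_original, t_reordered, t_reorder_overhead):
    """Speedup including the overhead of reordering (single run)."""
    return t_original / (t_reordered + t_reorder_overhead)


def runs_to_amortize(t_original, t_reordered, t_reorder_overhead):
    """Smallest number of runs after which reordering pays for itself."""
    saved_per_run = t_original - t_reordered
    if saved_per_run <= 0:
        return None                      # reordering never pays off
    return math.ceil(t_reorder_overhead / saved_per_run)


# Made-up timings in seconds, for illustration only.
print(max_speedup(10.0, 5.0))              # 2.0
print(net_speedup(10.0, 5.0, 5.0))         # 1.0
print(runs_to_amortize(10.0, 5.0, 12.0))   # 3
```

With these numbers a 2x kernel speedup is cancelled entirely by a reordering step that costs as much as the reordered run, which is why low overhead is the selection criterion for lightweight techniques.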

15 Applications → 5 Categories

• Category 1: Applications processing large frontiers are good candidates

• Category 2: Symmetric bipartite graphs require bipartiteness-aware reordering

• Category 3: Applications processing small frontiers offer limited opportunity

• Category 4: Reordering for push-style applications introduces false sharing

• Category 5: Reordering affects convergence for applications with ID-dependent computations

Category I - Applications Processing a Large Fraction Of Edges

❖ PageRank (Ligra & GAP)

❖ Graph Radii Estimation (Ligra)

Observation 1: LWR provides end-to-end speedups in some cases

Observation 2: Maximum speedup is higher with HubSort than with HubCluster

Observation 3: Reordering overhead is higher with HubSort than with HubCluster

Observation 4: HubSort strikes a balance between effectiveness and overhead

Category II - Executions On Symmetric Bipartite Graphs

❖ Collaborative Filtering (Ligra)

Surprising trend: HubSort causes net slowdowns

Category II - Reason For Slowdown With HubSort

[Figure: heatmap over the vertex ID range [0, V) showing the part (Part 1 or Part 2) that each vertex belongs to; in the Original Ordering, Part 1 IDs and Part 2 IDs form separate ranges]

Range of irregular accesses to vtxData = #nodes in a part

Vertices from different parts assigned consecutive IDs ⇒ Increased range of irregular accesses to vtxData

Opportunity: Bipartiteness-aware Graph Reordering Techniques
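The effect on the access range can be illustrated with a toy bipartite graph. Everything in this sketch is hypothetical: the graph, the helper `access_span`, and the hub definition (degree above average, sorted by descending degree) are assumptions chosen to mirror the slides' argument, not the authors' code.

```python
def access_span(edges, new_id, sources):
    """Width of the vtxData index range read while processing `sources`."""
    touched = [new_id[dst] for src, dst in edges if src in sources]
    return max(touched) - min(touched) + 1


# Hypothetical bipartite graph: Part 1 = {0, 1, 2, 3}, Part 2 = {4, 5, 6, 7}.
# Edges are listed from Part 1 to Part 2; the graph is symmetric.
edges = [(0, 4), (0, 5), (0, 6), (0, 7), (1, 4), (2, 4), (3, 5)]
part1 = {0, 1, 2, 3}
num_vertices = 8

degree = [0] * num_vertices
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree-descending hub ordering that ignores which part a vertex is in.
average = sum(degree) / num_vertices
hubs = sorted((v for v in range(num_vertices) if degree[v] > average),
              key=lambda v: degree[v], reverse=True)
others = [v for v in range(num_vertices) if degree[v] <= average]
hub_sorted_id = [0] * num_vertices
for new, old in enumerate(hubs + others):
    hub_sorted_id[old] = new

identity = list(range(num_vertices))
span_original = access_span(edges, identity, part1)        # 4
span_hub_sorted = access_span(edges, hub_sorted_id, part1)  # 7
```

In the original ordering, Part 1 vertices only read indices 4 to 7. After the part-oblivious hub ordering, hubs from both parts share the low IDs, so the same reads are spread over indices 1 to 7.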

Category III - Applications Processing a Small Fraction of Edges

❖ Betweenness Centrality

❖ BFS

❖ K-Core Decomposition

Low speedup even without Reordering overheads

Limited reuse in vtxData accesses ⇒ Lower headroom for reordering

Speedup From HubSorting Varies Across Inputs

[Figure: speedup from HubSort for a Category I Application across input graphs, some marked ✅ and others ❌]

Need to predict speedup from HubSorting AND selectively perform HubSorting

Understanding Performance Improvement From HubSorting

[Figure: cache lines of vtxData, showing the layout of Hub Vertices in the original ordering and the layout of hubs after reordering vertices with HubSort]

HubSorting will be most effective for Graphs with:

❖ Property #1: Skew in the degree-distribution (Presence of Hubs)

❖ Property #2: Sparsely distributed hub vertices (Quality of original ordering)

Packing Factor - A Measure of Hub Density

Packing Factor is a measure of how densely the hubs are packed after HubSorting

[Figure: cache lines holding Hub Vertices in the layout of the original ordering of vertices and in the layout after reordering vertices by HubSorting]
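One way to turn this idea into a number is sketched below. It assumes Packing Factor is the ratio of cache lines that hold at least one hub in the original ordering to the minimum number of lines needed once hubs are contiguous, and that a hub is a vertex with above-average degree. Both are assumptions for illustration; the slides give only the informal description, and the released source code is the authoritative definition.

```python
import math


def packing_factor(degrees, words_per_line=8):
    """Lines holding hubs before reordering / lines needed if hubs were packed."""
    average = sum(degrees) / len(degrees)
    hubs = [v for v, d in enumerate(degrees) if d > average]
    if not hubs:
        return 1.0                                    # nothing to pack
    lines_before = len({v // words_per_line for v in hubs})
    lines_after = math.ceil(len(hubs) / words_per_line)
    return lines_before / lines_after


# Hypothetical 16-vertex graphs with 4 hubs each and 2 words per cache line.
scattered = [10 if v in (0, 5, 10, 15) else 1 for v in range(16)]
packed = [10] * 4 + [1] * 12
pf_scattered = packing_factor(scattered, words_per_line=2)   # 4 lines / 2 lines
pf_packed = packing_factor(packed, words_per_line=2)         # 2 lines / 2 lines
```

A value near 1 means the original ordering already keeps hubs together, so there is little left for HubSorting to gain; larger values mean hubs are spread thinly across many lines.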

Packing Factor Can Predict Speedup From HubSorting

[Figure: each point is the Speedup from HubSorting for a given pair of (Application, Input Graph)]

Pearson Correlation = 0.92

Packing Factor is a good predictor for Speedup
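For readers who want to reproduce this kind of check on their own measurements, the Pearson correlation coefficient can be computed as sketched here. The input values are synthetic placeholders chosen to be exactly linear; they are not data from the study and say nothing about the reported 0.92.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic placeholder values, NOT measurements from the study.
packing_factors = [1.0, 2.0, 4.0, 8.0]
speedups = [1.0, 1.1, 1.3, 1.7]
r = pearson(packing_factors, speedups)
```

A coefficient close to 1 indicates a strong positive linear relationship, which is what makes a simple threshold on Packing Factor usable as a predictor.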

Selective Graph Reordering

Unconditional reordering:

    G` = HubSort(G)
    Process(G`)

[Figure: input graphs in increasing order of Packing Factor]

Selective reordering:

    PF = ComputePF(G)
    if (PF > 4):
        G` = HubSort(G)
        Process(G`)
    else:
        Process(G)

Selective Reordering avoids slowdowns

Computing Packing Factor does not degrade performance

Selective Reordering is a viable Optimization
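The selective policy from the slide pseudocode can be expressed as a small wrapper. In this sketch the callables `compute_pf`, `reorder`, and `process` are placeholders supplied by the caller, and the stubs below are hypothetical stand-ins rather than real graph code; only the threshold of 4 comes from the slides.

```python
def process_selectively(graph, compute_pf, reorder, process, threshold=4.0):
    """Reorder only when the Packing Factor predicts a worthwhile speedup.

    Returns (result of process, whether the graph was reordered).
    """
    if compute_pf(graph) > threshold:
        return process(reorder(graph)), True
    return process(graph), False


def stub_reorder(graph):
    """Stand-in for HubSort: just marks the graph as reordered."""
    return {**graph, "reordered": True}


def stub_process(graph):
    """Stand-in for the application: reports which layout it ran on."""
    return "reordered layout" if graph.get("reordered") else "original layout"


high = process_selectively({"pf": 6.0}, lambda g: g["pf"], stub_reorder, stub_process)
low = process_selectively({"pf": 2.0}, lambda g: g["pf"], stub_reorder, stub_process)
```

Keeping the predictor, the reordering step, and the application as separate callables makes it easy to swap in a different threshold or reordering technique without touching the policy itself.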

Conclusions

❖ Graph Reordering does not benefit all Applications and Input Graphs

❖ Opportunity to design new Reordering techniques for specific applications

❖ Packing Factor enables Selective Graph Reordering

Source Code Available

● Includes code for:
    ○ Packing Factor
    ○ Lightweight Reordering Techniques
    ○ Selective HubSorting

● Open sourced at https://github.com/CMUAbstract/Graph-Reordering-IISWC18

Thank You!

Backup Slides

Use-cases Where Reordering Overhead Cannot be Amortized

Temporal Graph Mining

Left fig. from “Chronos: A Graph Engine for Temporal Graph Analysis” EuroSys 2014; Right fig. from “Graph Evolution: Densification and Shrinking Diameters” TKDD 2007

Sophisticated Reordering Techniques are impractical for cases where the graph is processed only a few times

Sophisticated Reordering Techniques Impose High Overhead

Assumption: Reordered graph will be processed multiple times

Irregular Accesses Have Poor Temporal And Spatial Locality

Time    Access      Result    LLC contents after access
1       vData[3]    Miss      [2 3]
2       vData[0]    Miss      [2 3] [0 1]
3       vData[7]    Miss      [6 7] [0 1]
4       vData[3]    Miss      [6 7] [2 3]
5       vData[1]    Miss      [0 1] [2 3]
6       vData[3]    Hit       [0 1] [2 3]
7       vData[7]    Miss      [6 7] [2 3]

Graph Applications

Ligra
➢ Page Rank
➢ Page Rank-Delta
➢ SSSP – Bellman Ford
➢ Collaborative Filtering
➢ Radii
➢ Betweenness Centrality
➢ BFS
➢ Kcore
➢ Maximal Independent Set
➢ Connected Components

GAP
➢ Page Rank
➢ SSSP – Delta Stepping
➢ Betweenness Centrality
➢ BFS
➢ Connected Components

11 Distinct Algorithms

HW Platform

❖ Dual-Socket Intel Xeon E5-2660v4 processors

❖ 14 cores per Socket (2HT/core)

❖ 35 MB Last Level Cache per processor

❖ 64 GB of main memory

[Figure: Socket 1 and Socket 2, each with Core1 through Core14, a 35MB Last Level Cache, and 32GB DRAM]

Input Graphs

Irregular working set size >> Aggregate LLC Capacity

We use the original ordering of Input Graphs

Lightweight Reordering Can Provide End-to-end Speedups

Figure from “Rabbit Order: Just-in-Time Parallel Reordering for Fast Graph Analysis” IPDPS 2016

Reordering techniques exploiting power-law distributions and community structure can have low overheads

Category II - Executions On Symmetric Bipartite Graphs

Surprising trends:
● HubSort offers the least performance benefits
● HubSort causes slowdowns

Category II - Reason For Slowdown With HubSort

[Figure: Part 1 and Part 2 of the bipartite graph]

Assigning vertices from each part of the graph a contiguous range is good for temporal locality

Need a simple mechanism to assign hub vertices from the same part a contiguous range of IDs

Category IV - Push-based Graph Applications

Push-phase
+ Work-efficient execution
- Overhead of synchronization

Pull-phase
+ No synchronization required
- Work-inefficient (iterate over all Vertices)

Reordering causes an increase in false sharing

LWR favors pull-style graph applications
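The push and pull trade-offs are easier to see side by side. The sketch below is sequential Python with hypothetical names (`push_step`, `pull_step`, `contrib`); the comments mark where a parallel implementation would need synchronization, which is also where reordering-induced false sharing would show up.

```python
def push_step(out_offsets, out_neighbors, contrib, frontier):
    """Active vertices write to their out-neighbors (work-efficient)."""
    nxt = [0.0] * (len(out_offsets) - 1)
    for u in frontier:                                   # only active vertices
        for e in range(out_offsets[u], out_offsets[u + 1]):
            v = out_neighbors[e]
            nxt[v] += contrib[u]    # parallel code would need an atomic add here
    return nxt


def pull_step(in_offsets, in_neighbors, contrib, frontier):
    """Every vertex reads from its active in-neighbors (no synchronization)."""
    active = set(frontier)
    num_vertices = len(in_offsets) - 1
    nxt = [0.0] * num_vertices
    for v in range(num_vertices):                        # all vertices, active or not
        for e in range(in_offsets[v], in_offsets[v + 1]):
            u = in_neighbors[e]
            if u in active:
                nxt[v] += contrib[u]    # only the owner of v writes nxt[v]
    return nxt


# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0 in both layouts.
out_offsets, out_neighbors = [0, 2, 3, 4], [1, 2, 2, 0]
in_offsets, in_neighbors = [0, 1, 2, 4], [2, 0, 0, 1]
contrib = [1.0, 2.0, 4.0]
pushed = push_step(out_offsets, out_neighbors, contrib, frontier=[0, 2])
pulled = pull_step(in_offsets, in_neighbors, contrib, frontier=[0, 2])
```

Both steps compute the same values. In the push version, different threads may write to neighboring entries of `nxt`, so packing hot vertices into the same cache line can increase contention between cores.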

Category V - LWR can affect convergence

Vertex IDs influence amount of work done each iteration

Increase in Iterations until convergence due to LWR

Opportunity to accelerate convergence by reordering vertices

The Need For Selective Lightweight Reordering

Unconditionally performing LWR causes net slowdowns on some input graphs

Completely avoiding LWR misses speedups of up to 1.8x

Need to predict speedup from LWR for an input graph and only selectively perform LWR

Using Packing Factor for Selective Reordering

Selective Reordering avoids slowdowns for graphs with low Packing Factor

The low overhead of Packing Factor computation does not sacrifice speedup for high Packing Factor graphs

Speedups From HubSorting Are Due To Locality Improvements

Computing Packing Factor