
Region-Based Memory Management for Task Dataflow Models

Nikolopoulos, D. (2012). Region-Based Memory Management for Task Dataflow Models: First Joint ENCORE & PEPPHER Workshop on Programmability and Portability for Emerging Architectures (EPoPPEA). Held in conjunction with the 7th International Conference on High Performance and Embedded Architectures and Compilers (HIPEAC).


Dimitrios S. Nikolopoulos

Joint work with: Spyros Lyberis and Polyvios Pratikakis

Computer Architecture and VLSI Systems Laboratory (CARV)
Institute of Computer Science (ICS)
Foundation for Research and Technology – Hellas (FORTH)

EPoPPEA, January 24, 2012

Dimitrios S. Nikolopoulos (FORTH-ICS) Region-Based Memory Management EPoPPEA 2012 1 / 25


Outline

1 Introduction
    Contribution

2 Memory Management with Nested Regions

3 Task and Region Semantics
    By example
    Formal Definition

4 Distributed Memory Regions

5 Early Experimental Results

6 Conclusions


The Problem
Manycore Processors and Shared Memory

• Shared memory can be hard to scale to many cores
  - Cache coherence becomes expensive
  - Causes excessive communication, memory contention

• Manycore processors that relax shared memory abstraction
  - AMD Opteron: 12 cores, NUMA, MMU per core
  - Cell Broadband Engine: 1 + 8 cores, no shared memory
  - GPUs: Thousands of cores, small shared memories between few cores
  - Intel Single Chip Cloud: 48 cores, no shared memory

• Increased demand for parallel software development
  - Threads and shared memory: accessible, error-prone
  - Message-passing: tedious, experts-only
  - Both are non-deterministic: difficult to debug and understand


Task-parallelism for distributed and shared memory

• Available in several contemporary programming models (Sequoia, Cilk, OmpSs)

• Recursive task-parallelism
  - Parallel programs are composed from nested parallel tasks

• Annotate each task with its memory footprint

• Runtime system performs dependency analysis, scheduling, and all required data transfers

• Support region-based memory management
  - Easier to express irregular task footprints
  - Enables use of dynamic data structures


Benefits

• Shared memory abstraction
  - Resembles shared memory programming
  - Runtime system hides data transfers among core memories
  - Use memory footprints to transfer required data locally before starting a task
  - Alternatively, schedule a task near the data

• Message-passing execution semantics
  - Tasks compute on local data
  - Hierarchical scheduling, easier to scale to more cores
  - Remove shared-memory bottleneck
  - No need for complicated SDSM protocol on every access

• Deterministic
  - Implicit, correct synchronization
  - Repeatable behavior
  - Formal proof

2 Memory Management with Nested Regions

Motivation for Regions

• Task memory footprints are easy to express if they are composed of few objects, contiguous in memory

• What if the task memory footprint is:
  - A linked list or part of it?
  - A tree or part of it?
  - A graph or part of it?

• Hard to use many common linked data structures

• Hard to express irregular algorithms and applications


Example
Region-based Memory Management

    region G, L, R;
    Object *v0;

    init() {
        G = newregion();
        L = newsubregion(G);    /* L and R are nested inside G */
        R = newsubregion(G);
        v0 = ralloc(G, sizeof(Object));
        v0->left = ralloc(L, sizeof(Object));
        v0->right = ralloc(R, sizeof(Object));
    }

[Figure, built up step by step: region G contains subregions L and R. Object v0 is allocated in G; v1 (v0->left) is allocated in L and v2 (v0->right) in R.]
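The nesting in this example can be made concrete with a small allocator. The following C sketch is only an illustration built on `malloc`, not the runtime described in the talk: the `region_t` layout and the helpers `region_object_count` and `freeregion` are hypothetical names introduced here, while `newregion`, `newsubregion` and `ralloc` follow the calls shown on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every object; the union keeps the payload aligned. */
typedef union block {
    union block *next;
    max_align_t align;
} block_t;

/* Hypothetical region record: its own objects plus a list of subregions. */
typedef struct region {
    struct region *first_child;
    struct region *next_sibling;
    block_t *blocks;
    size_t count;                 /* objects allocated directly in this region */
} region_t;

typedef struct Object {
    struct Object *left, *right;
} Object;

region_t *newregion(void) {
    return calloc(1, sizeof(region_t));
}

region_t *newsubregion(region_t *parent) {
    region_t *r = newregion();
    if (r != NULL) {
        r->next_sibling = parent->first_child;
        parent->first_child = r;
    }
    return r;
}

void *ralloc(region_t *r, size_t size) {
    block_t *b = malloc(sizeof(block_t) + size);
    if (b == NULL) return NULL;
    b->next = r->blocks;
    r->blocks = b;
    r->count++;
    return b + 1;                 /* payload starts right after the header */
}

/* Number of objects in r and, recursively, in all of its subregions. */
size_t region_object_count(const region_t *r) {
    size_t n = r->count;
    for (const region_t *c = r->first_child; c != NULL; c = c->next_sibling)
        n += region_object_count(c);
    return n;
}

/* Frees a root region, all nested subregions, and every object in them. */
void freeregion(region_t *r) {
    region_t *c = r->first_child;
    while (c != NULL) {
        region_t *next = c->next_sibling;
        freeregion(c);
        c = next;
    }
    block_t *b = r->blocks;
    while (b != NULL) {
        block_t *next = b->next;
        free(b);
        b = next;
    }
    free(r);
}
```

With G, L, R and v0 set up as in `init()`, `region_object_count(G)` covers v0, v1 and v2, which is what lets a task name the whole tree with a single region argument.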

3 Task and Region Semantics

Example
Tasks and Regions

    f(Object *v0) {
        if (done) return;
        spawn f(v0->left)  [inout region L];
        spawn f(v0->right) [inout region R];
        spawn h(v0->left)  [inout v0->left];
        spawn h(v0->right) [inout v0->right];
    }

[Figure, built up step by step: region G contains v0 and subregions L (holding v1) and R (holding v2). Tasks are attached to their footprints: f[G] on region G, f[L] on region L, f[R] on region R, h[v1] on object v1, and h[v2] on object v2.]
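One way to read the picture is as an overlap test on footprints: two `inout` footprints must be ordered when they can touch the same memory, and a region footprint covers everything nested below it. The C sketch below illustrates that reading only. The `region_t` and `footprint_t` types and both function names are hypothetical, and the slides do not say how the runtime performs this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region node: only the parent link matters for this check. */
typedef struct region {
    const struct region *parent;   /* NULL for a root region */
} region_t;

/* One footprint entry: a whole region (obj == NULL) or a single object
 * that lives in region `home`. */
typedef struct {
    const region_t *home;
    const void *obj;
} footprint_t;

/* True if `inner` is `outer` itself or nested anywhere below it. */
static bool region_covers(const region_t *outer, const region_t *inner) {
    for (const region_t *r = inner; r != NULL; r = r->parent)
        if (r == outer) return true;
    return false;
}

/* Two inout footprints overlap, and so must be ordered, when they can
 * touch the same memory. */
bool footprints_overlap(footprint_t a, footprint_t b) {
    if (a.obj != NULL && b.obj != NULL) return a.obj == b.obj;
    if (a.obj != NULL) return region_covers(b.home, a.home);
    if (b.obj != NULL) return region_covers(a.home, b.home);
    return region_covers(a.home, b.home) || region_covers(b.home, a.home);
}
```

Under this check the footprints of f[L] and f[R] are disjoint, so those tasks may run in parallel, whereas h[v1] overlaps f[L] because v1 lives in L.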


Deterministic Scheduler Operational Semantics

• Define small-step operational semantics
  - ⟨T, D, S, R⟩ →p ⟨T′, D′, S′, R′⟩

• Memory model:
  - Global address space: the store S includes all memory addresses
  - Distributed memory implementation: each task only accesses memory locations declared in its footprint

• Scheduling algorithm:
  - Dependency metadata D maintains a task queue per memory location
  - Always possible to spawn a task
  - But, it can only run when at the head of the queue for all locations in the footprint
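The scheduling rule can be sketched as a toy model in C. This is a simplified illustration under stated assumptions: locations and tasks are small integers, every footprint entry is treated as `inout`, queues have a fixed capacity, and the names `dep_meta_t`, `spawn_task`, `can_run` and `finish_task` are invented here rather than taken from the formal semantics.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCS  8   /* memory locations tracked in this toy model */
#define MAX_QUEUE 8   /* total enqueues allowed per location */

/* FIFO of task ids waiting on one memory location. */
typedef struct {
    int tasks[MAX_QUEUE];
    int head, tail;
} loc_queue_t;

/* Stand-in for the dependency metadata D: one queue per location. */
typedef struct {
    loc_queue_t q[MAX_LOCS];
} dep_meta_t;

/* Spawning always succeeds: append the task to every queue in its footprint. */
void spawn_task(dep_meta_t *d, int task, const int *locs, int n) {
    for (int i = 0; i < n; i++) {
        loc_queue_t *q = &d->q[locs[i]];
        assert(q->tail < MAX_QUEUE);
        q->tasks[q->tail++] = task;
    }
}

/* A task may run only when it heads the queue of every location it names. */
bool can_run(const dep_meta_t *d, int task, const int *locs, int n) {
    for (int i = 0; i < n; i++) {
        const loc_queue_t *q = &d->q[locs[i]];
        if (q->head == q->tail || q->tasks[q->head] != task) return false;
    }
    return true;
}

/* On completion the task leaves its queues, unblocking its successors. */
void finish_task(dep_meta_t *d, int task, const int *locs, int n) {
    assert(can_run(d, task, locs, n));
    for (int i = 0; i < n; i++) d->q[locs[i]].head++;
}
```

Because tasks enter each queue in spawn order, two tasks that share a location always run in program order, which is the property the determinism argument relies on.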


Deterministic Proof Technique (Pratikakis et al., MSPC'11)

• Define sequential operational semantics
  - ⟨S, e⟩ →s ⟨S′, e′⟩

• Prove sequential equivalence on execution traces
  - Intuitively: Every parallel execution will produce the same value and memory state as the sequential execution
  - More precisely: Given a program e and a finite (terminating) parallel execution trace for e that produces a value v and a memory state S, we can always construct a sequential execution trace for e that also returns v and produces S

• Proof by induction on the parallel trace
  - We can always reorder steps in the parallel trace to bring the next "sequential" step to the front of the trace
  - Proof does not require scoped parallelism (sync)
  - Similar to a confluence proof
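The sequential-equivalence statement can be written compactly using the two step relations from the slides. The rendering below is a paraphrase, not the theorem as published: the initial parallel configuration C_0(e) and the initial store S_0 are notation assumed here, since the slides do not spell out how a program e is loaded into ⟨T, D, S, R⟩.

```latex
% Assumed notation: C_0(e) is the initial parallel configuration for program e,
% and S_0 is the initial store. Neither symbol appears on the slides.
\[
  C_0(e) \;\rightarrow_p^{*}\; \langle T', D', S, R' \rangle
  \ \text{with result } v
  \quad\Longrightarrow\quad
  \langle S_0, e \rangle \;\rightarrow_s^{*}\; \langle S, v \rangle
\]
```

Read left to right: any terminating parallel trace that yields value v and store S has a matching sequential trace with the same v and S.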

4 Distributed Memory Regions

Distributed Memory Management API

• Object allocation:

    void *sys_alloc(size_t size, rid_t region);
    void sys_balloc(size_t size, rid_t region,
                    int num_objects, void **objects);
    void sys_free(void *ptr);
    void *sys_realloc(void *ptr, size_t size, rid_t rgn);

• Region allocation:

    rid_t sys_ralloc(rid_t parent, char level_hint);
    void sys_rfree(rid_t region);


Distributed Memory Implementation by Example

    A = malloc(10);
    B = malloc(20);
    C = malloc(5);
    D = malloc(80);

    # task in A, out B
    task1(A, B);

    # task in C, out D
    task2(C, D);

    # task in B, inout D
    task3(B, D);

    task2(C, D) {
        ...
        # task inout D
        task4(D);
        # wait on D
    }

[Figure, built up over ten animation steps: three columns labelled alloc() hash, dep waiting and ready queue, above the contents of core0, core1 and core2. Initially the alloc() hash reads A: 0, B: 1, C: 0, D: 2, with core0 holding A, C; core1 holding B; core2 holding D. Ready-queue entries carry affinity stats: 0(10), 1(20) for task1; 0(5), 2(80) for task2; 1(20), 2(80) for task3. After task1 the hash reads A: 1 and core1 holds A, B; after task2 it reads C: 2 and core2 holds C, D; when task3 is placed, core2 holds B, C, D.]

Decode: Enqueue task args into alloc() hash from left/right side; count non-ownership args in dep waiting

Issue: If dep waiting reaches 0, put task in ready queue along with affinity stats

Prepare: Decide where to run, fetch non-local data

Run: Execute code when fetch is finished

Collect: Update alloc() hash ownership; update dep waiting for these objects; maybe goto Issue
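The Prepare and Collect stages can be illustrated with the numbers from the example. The C sketch below rests on an assumption inferred from the figure rather than stated in the talk: an affinity entry such as 0(10), 1(20) is read as "core 0 holds 10 bytes of the footprint, core 1 holds 20", and the task is placed on the core holding the most bytes. The `object_t` type and the functions `pick_core` and `collect` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 3

/* Hypothetical alloc() hash entry: which core owns the object, and its size. */
typedef struct {
    int owner;
    size_t size;
} object_t;

/* Prepare: pick the core that already holds the most bytes of the
 * footprint (lowest core id wins ties). */
int pick_core(object_t **args, int n) {
    size_t bytes[NUM_CORES] = {0};
    for (int i = 0; i < n; i++) {
        assert(args[i]->owner >= 0 && args[i]->owner < NUM_CORES);
        bytes[args[i]->owner] += args[i]->size;
    }
    int best = 0;
    for (int c = 1; c < NUM_CORES; c++)
        if (bytes[c] > bytes[best]) best = c;
    return best;
}

/* Collect: after the task ran on `core`, record that core as the owner. */
void collect(object_t **args, int n, int core) {
    for (int i = 0; i < n; i++) args[i]->owner = core;
}
```

Applied to the example, this policy places task1(A, B) on core 1 and task2(C, D) on core 2, and then task3(B, D) on core 2, which matches the ownership changes A: 1 and C: 2 shown in the figure.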

5 Early Experimental Results

Hierarchical schedulers topology

[Figure: scheduler tree. Level 0: scheduler 00. Level 1: schedulers 10, 11, 12, 13. Level 2: schedulers 20 through 2F. Level 3: workers.]


Splitting region trees for hierarchical scheduling

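[Figure: a region tree under the root region, with subtrees L, M and R containing nested regions (L1, L2, Ma, Mb, Ra, among others) and objects, split among an L0 scheduler, two L1 schedulers and an L2 scheduler. Legend: root region, region, object.]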


Barnes-Hut, 1.1M bodies, Single scheduler

[Figure: execution time (seconds, 0 to 600) against number of workers (Serial, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512), broken down into Computation, Worker comm, Sched comm, Barrier wait, Simulate, Load balancing, New local tree and Fetch remote tree.]


Barnes-Hut, 1.1M bodies, Two-level scheduler hierarchy

[Figure: execution time (seconds, 0 to 600) against number of workers (number of L1 schedulers): Serial, 1(1), 2(1), 4(1), 8(1), 16(2), 32(4), 64(8), 128(16), 256(32), 512(64), broken down into Computation, Worker comm, Sched comm, Barrier wait, Simulate, Load balancing, New local tree and Fetch remote tree.]


UPC comparison, 16 worker cores

[Figure: execution time (seconds, 0 to 100) against number of objects (15,000; 30,000; 60,000; 120,000; 240,000). Series: UPC and Myrmics, each with 248-B, 504-B, 1016-B and 2040-B objects.]


UPC comparison, 15,000 504-B objects

[Figure: execution time (seconds, axis ticks 2 to 256 in powers of two) against number of workers (16, 32, 64, 128). Series: UPC; Myrmics, single scheduler; Myrmics, hierarchical schedulers.]


Conclusions

• Regions language construct:
  - arbitrary task memory footprints, better productivity
  - shared memory programming abstractions
  - distributed memory execution semantics
  - scalability

• Work in progress:
  - Distributed memory task-based runtime (Myrmics)
  - Compiler support for regions (SCOOP)
  - Implementation and testing on FPGA prototype (Formic)

Acknowledgments:


CARV (FORTH-ICS) Formic 512-core FPGA Prototype
