CPEG 852 Advanced Topics in Computing Systems The EARTH ...

78
CPEG 852 — Advanced Topics in Computing Systems The EARTH Program Execution Model Hybrid von Neumann/Dataflow Models St´ ephane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University of Delaware 140 Evans Hall Newark,DE 19716, United States [email protected] October 20-27, 2015 Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 1 / 61

Transcript of CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Page 1: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

CPEG 852 — Advanced Topics in ComputingSystems

The EARTH Program Execution ModelHybrid von Neumann/Dataflow Models

Stephane Zuckerman

Computer Architecture & Parallel Systems LaboratoryElectrical & Computer Engineering Dept.

University of Delaware140 Evans Hall Newark,DE 19716, United States

[email protected]

October 20-27, 2015

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 1 / 61

Page 2: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 2 / 61

Page 3: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 3 / 61

Page 4: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 4 / 61

Page 5: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH: The Secret Origins

I At the beginning of the 1990’s, dataflow machines are consideredobsolete compared to their vector and superscalar cousins

I Some people still think dataflow as a founding principle for models ofcomputation is still sound

I Idea: mix RISC-like processors for the low-level execution anddataflow semantics for high-level concepts

I Technically, this is more or less the definition of macro-dataflow

I EARTH is a bit different from macro-dataflow – see next slide

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 5 / 61

Page 6: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

A Short Detour: Macro-Dataflow I

Macro-Dataflow: a Description

I Concept: take a sequence of instructions, and group them into amacro-dataflow actor

I A macro-dataflow actor has input arcs (for the “tokens”) and outputarcs

I A macro-dataflow actor has no state except the one formed by itsinput tokens

I Firing rule: when all input arcs have a token present, themacro-dataflow actor may fire

I A macro-dataflow actor always places tokens on its ouputs arcs all atonce

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 6 / 61

Page 7: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

A Short Detour: Macro-Dataflow II

Why EARTH Does Not Quite Follow a Macro-Dataflow Model

I In EARTH, actors do execute an uninterruptible sequence ofinstructions (like in macro-dataflow)

I But an EARTH actor can signal/produce data at any time during itsexecution

I It is more “loose” in its asynchrony

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 7 / 61

Page 8: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

What Is a Thread in EARTH?

I Parallel function invocation (threaded function invocation)

I A code sequence defined (by the user or a compiler) to be a thread(fiber)

I Usually, a threaded function’s body will be partitioned into multiplethreads/fibers

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 8 / 61

Page 9: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 9 / 61

Page 10: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

A Simple Example: Naıve Fibonacci

u64 fib(u64 n) {

if (n < 2) return n;

return fib(n-1) + fib(n-2);

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 10 / 61

Page 11: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

A Simple Example: Naıve FibonacciComputation Tree Example

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 11 / 61

Page 12: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

A Simple Example: Naıve FibonacciActivation Frame Tree

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 12 / 61

Page 13: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Known Short Latencies, Known Long Latencies,and Unknown LatenciesAn Example

i n t f ( i n t ∗x , i n t i , i n t j ) {i n t a , b , sum , prod , f a c t ;i n t r1 , r2 , r 3 ;a = x [ i ] ;f a c t = 1 ;f a c t = f a c t ∗ a ;

b = x [ j ] ;sum = a + b ;prod = a ∗ b ;r 1 = g ( sum ) ;r 2 = g ( prod ) ;r 3 = g ( f a c t ) ;

r e t u r n r 1 + r 2 + r 3 ;}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 13 / 61

Page 14: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Known Short Latencies, Known Long Latencies,and Unknown LatenciesPartitioning Into Fibers

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 14 / 61

Page 15: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Fibers

I A fiber shares its enclosing “frame” with other fibers within the samethreaded function invocation

I The state of a fiber includes:

I Its instruction pointerI Its “temporary register set”

I A fiber is “ultra-lightweight:” it does not need dynamic storage(frame) allocation.

I Our focus: non-preemptive threads – which we call fibers

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 15 / 61

Page 16: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 16 / 61

Page 17: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Fibers Firing Rule

I A fiber becomes enabled if it has received all input signals

I An enabled fiber may be selected for execution when the requiredhardware resource has been allocated

I When a fiber finishes its execution, a signal is sent to all destinationthreads to update the corresponding synchronization slots

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 17 / 61

Page 18: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Fibers States

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 18 / 61

Page 19: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Model of Computation

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 19 / 61

Page 20: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Multithreaded Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 20 / 61

Page 21: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH vs. Cilk

Figure: EARTH ModelFigure: Cilk Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 21 / 61

Page 22: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 23: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 24: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 25: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 26: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 27: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 28: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 29: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 30: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 31: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 32: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 33: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 34: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 35: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fiber Execution Model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 22 / 61

Page 36: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 23 / 61

Page 37: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 24 / 61

Page 38: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Abstract Machine

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 25 / 61

Page 39: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Abstract Machine

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 26 / 61

Page 40: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH’s Evaluation Platforms

EARTH-Manna

Implement EARTH on a baremetal tightly coupledmulti-processor

EARTH on Clusters

I EARTH on Beowulf

I Implement EARTH on acluster of UltraSPARC SMPworkstations connected byFast Ethernet

EARTH-IBM-SP

Plan to implement EARTH on anoff-the-shelf commercial parallelmachine (IBM SP2/SP3)

Note—Benchmark codeswere all written using EARTHThreaded-C: The API forEARTH Execution andAbstract Machine Models

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 27 / 61

Page 41: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH’s Evaluation Platforms

EARTH-Manna

Implement EARTH on a baremetal tightly coupledmulti-processor

EARTH on Clusters

I EARTH on Beowulf

I Implement EARTH on acluster of UltraSPARC SMPworkstations connected byFast Ethernet

EARTH-IBM-SP

Plan to implement EARTH on anoff-the-shelf commercial parallelmachine (IBM SP2/SP3)

Note—Benchmark codeswere all written using EARTHThreaded-C: The API forEARTH Execution andAbstract Machine Models

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 27 / 61

Page 42: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH’s Evaluation Platforms

EARTH-Manna

Implement EARTH on a baremetal tightly coupledmulti-processor

EARTH on Clusters

I EARTH on Beowulf

I Implement EARTH on acluster of UltraSPARC SMPworkstations connected byFast Ethernet

EARTH-IBM-SP

Plan to implement EARTH on anoff-the-shelf commercial parallelmachine (IBM SP2/SP3)

Note—Benchmark codeswere all written using EARTHThreaded-C: The API forEARTH Execution andAbstract Machine Models

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 27 / 61

Page 43: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH’s Evaluation Platforms

EARTH-Manna

Implement EARTH on a baremetal tightly coupledmulti-processor

EARTH on Clusters

I EARTH on Beowulf

I Implement EARTH on acluster of UltraSPARC SMPworkstations connected byFast Ethernet

EARTH-IBM-SP

Plan to implement EARTH on anoff-the-shelf commercial parallelmachine (IBM SP2/SP3)

Note—Benchmark codeswere all written using EARTHThreaded-C: The API forEARTH Execution andAbstract Machine Models

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 27 / 61

Page 44: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 28 / 61

Page 45: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Open Issues

I Can a multithreaded program execution model support highscalability for large-scale parallel computing while maintaining highprocessing efficiency?

I If so, can this be achieved without exotic hardware support?

I Can these open issues be addressed both qualitatively andquantitatively with performance studies of real-life benchmarks?

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 29 / 61

Page 46: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH-MANNA Multiprocessor Testbed

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 30 / 61

Page 47: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Main Features of EARTH Multiprocessor

I Fast thread context switching

I Efficient parallel function invocation

I Good support of fine-grain dynamic load-balancing

I Efficient support of split-phase transactions

I Brings together the concept of fibers and dataflow

I All that using off-the-shelf microprocessors!

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 31 / 61

Page 48: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Main Features of EARTH Multiprocessor

I Fast thread context switching

I Efficient parallel function invocation

I Good support of fine-grain dynamic load-balancing

I Efficient support of split-phase transactions

I Brings together the concept of fibers and dataflow

I All that using off-the-shelf microprocessors!

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 31 / 61

Page 49: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH-C Compiler Environment

Figure: EARTH CompilationEnvironment

Figure: EARTH-C Compiler

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 32 / 61

Page 50: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Performance Study of EARTH

I Overview

I Microbenchmarking:

I Stress-testingI Measure performance of basic EARTH mechanisms for communication

and synchronization

I Kernel benchmarking:

I SpeedupI USE value (Uni-node Support Efficiency)

I i.e., compare pure sequential kernel vs. single-node EARTH kernel

I Latency tolerance capacity

Note—It is important to define your own performance “features” and/or“parameters” that best distinguishes your model from your competitors

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 33 / 61

Page 51: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH Benchmark Suite (EBS)

Benchmark Name Problem Size Problem Domain Characteristics

Ray Tracing 512× 512 Image Processing Class AWave-2D 150× 150 Fluid Dynamics Class ATomcatv 257 Scientific computation Class A2D-SLT 80× 80 Fluid Dynamics Class A

Matrix Multiply 480× 480 Numerical computation Class ABarnes-Hut 8192 bodies N-Body simulation Class B

MP3D 18 K particles Fluid Flow simulation Class BEM3D 20 K nodes Electromagnetic wave simulation Class B

Sampling Sorting 64 K Sorting problem Class BGauss Elimination 720× 720 Numerical computation Class B

Protein Folding 3× 3× 3 cube Chemistry Class BEigenvalue 2999 Numerical computation Class B

Vertex Enumeration 10 Pivot-based searching Class BTSP 10 Graph searching Class B

Paraffins 20 Chemistry Class BN-Queen 12 Graph searching Class B

Power 10000 Power system optimization Class B

Voronoi 64 K Graph Partitioning Class B

Heuristic-TSP 32 K Searching problem Class B

Tree-Add 1 M Graph Searching Class B

Highlighted benchmarks are implemented using a portable version of Threaded-C

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 34 / 61

Page 52: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Main Experimental Results of EARTH-MANNA

I Efficient multithreading support is possible with off-the-shelfprocessor nodes with low overhead

I At the time: context-switch time ≈ 35 cyclesI Nowadays, this figure would most likely be bigger by at least one order

of magnitude, probably ≈ 20 times bigger

I This is speculation, but even with a bare-metal implementation, thereis a ≈ 3× difference between memory bus and processor frequencies.

I A multithreaded program execution model an make a big difference

I Results from the EARTH Benchmark Suite (EBS)

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 35 / 61

Page 53: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 36 / 61

Page 54: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 37 / 61

Page 55: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Programming Models for Multithreaded ArchitecturesThe EARTH Threaded-C Experience

Threaded-C: A Base Language

I Serves as a target language for high-level language compilers

I Serves as a machine language for the EARTH architecture

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 38 / 61

Page 56: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Role of Threaded-C

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 39 / 61

Page 57: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 40 / 61

Page 58: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Features of Threaded Programs

I Thread partition

I Thread length vs. useful parallelismI Where to “cut” a dependence and create a “split-phase?”

I Split-phase synchronization and communication

I Parallel threaded function invocation

I Dynamic load-balancing

I Other advanced features: fibers and dataflow

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 41 / 61

Page 59: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Operation Set

Base operations

I Thread synchronization and scheduling ops

I SPAWN

I SYNC

I Split-phase data & synchronization operations

I GET SYNC

I DATA SYNC

I Threaded function invocation and load-balancing operations

I INVOKE

I TOKEN

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 42 / 61

Page 60: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Instruction Set I

I Basic insructions:

I Arithmetic, logic, branchingI Typical RISC instructions, e.g., those from the i860

I Thread Switching

I FETCH NEXT

I Synchronization

I SPAWN fp, ip

I SYNC fp, ss off

I INIT SYNC ss off, sync cnt, reset cnt, ip

I INCR SYNC fp, ss off, value

I Data transfers & synchronization

I DATA SPAWN value, dest addr, fp, ip

I DATA SYNC value, dest addr, fp, ss off

I BLOCKDATA SPAWN src addr, dst addr, size, fp, ip

I BLOCKDATA SYNC src addr, dst addr, size, fp, ss off

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 43 / 61

Page 61: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The EARTH Instruction Set II

I Split-phase data requests

I GET SPAWN src addr, dst addr, fp, ip

I GET SYNC src addr, dst addr, fp, ss off

I GET BLOCK SPAWN src addr, dst addr, fp, ip

I GET BLOCK SYNC src addr, dst addr, fp, ss off

I Function invocation

I INVOKE dst PE, func name, num params, params

I TOKEN func name, num params, params

I END FUNCTION

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 44 / 61

Page 62: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 45 / 61

Page 63: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

EARTH-MANNABenchmark Programs

I Ray Tracing is a program for rendering 3-D photo-realistic images

I Protein Folding is an application that computes all possible foldingstructures of a given polymer

I TSP is an application to find a minimal-length Hamiltonian cycle in agraph with N cities and weighted paths.

I Tomcatv is one of the SPEC benchmarks which operates upon amesh

I Parrafins is another application which enumerates distinct isomersparaffins

I 2D-SLT is a program implementing the 2D-SLT Semi-LagrangianAdvection Model on a Gaussian Grid for numerical weatherpredication

I N-Queens is a benchmark program typical of graph searchingproblem.

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 46 / 61

Page 64: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 47 / 61

Page 65: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Reminder: Tree of Activation Frames

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 48 / 61

Page 66: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fibonacci Example

Definition: Fibonacci SequenceU0 = 0

U1 = 1

Un = Un−1 + Un−2

, ∀n ∈ N

u64 fib(u64 n) {

if (n < 2) return n;

return fib(n-1) + fib(n-2);

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 49 / 61

Page 67: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fibonacci ExampleThreaded-C Version

THREADED fib(u64 n, u64* result , sslot_t done) {

THREAD -0:

u64 sum1 = 0, sum2 = 1;

sslot_t slot1 = ... , /∗, , ( 0 , 0 ), , ∗/,slot2 = ... , /∗, , ( 2 , 2 ), , ∗/;

if (n < 2) {

DATA_RSYNC(n, result , done);

} else {

TOKEN (fib , n-1, &sum1 , slot1 );

TOKEN (fib , n-2, &sum2 , slot2 );

}

END_THREAD ();

THREAD -1:

DATA_RSYNC(sum1+sum2;, result , done);

END_THREAD ();

END_FUNCTION;

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 50 / 61

Page 68: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

GEneral Matrix Multiplication (GEMM)Definition

Although the definition is general, we set our numbers in either R or C.We generalize with the set K.Let A, B, and C be matrices, with AM,K , BK ,N , CM,N ∈ K×K; letα, β ∈ K. Then,

CM,N ← β × CM,N + α× AM,K × BK ,N

One element ci ,j of CM,N is computed as such:

ci ,j = β · ci ,j + α ·k=K∑k=1

ai ,k · bk,j , ai ,k ∈ AM,K , bk,j ∈ BK ,N

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 51 / 61

Page 69: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

GEneral Matrix Multiplication (GEMM)Square Case — Naıve Sequential Code

Note—To simplify the problem, we assume β = 0, α = 1, 0. In other words, we compute

CN,N = AN,N × BN,N .

void dgemm(const double beta , double* C,

const double alpha ,

const double* A, const double* B,

const size_t N)

{

for (size_t i = 0; i < N; ++i) {

for (size_t j = 0; j < N; ++j) {

c[i*N+j] = beta * c[i*N+j];

for (size_t k = 0; k < N; ++k)

c[i*N+j] += alpha * A[i*N+k] * b[k*N+j];

}

}

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 52 / 61

Page 70: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fibonacci ExampleThreaded-C Version — Outer ProductNote—We assume the B matrix is correctly transposed to allow row-major traversal.

THREADED void dgemm(double* C, const double* A, const double* B,

const size_t N)

{

double *row_a , *column_b;

sslot_t slot0 = init_sslot (0,0),

slot1 = init_sslot(N*N,N*N);

THREAD -0:

for (size_t i = 0; i < M; ++i) {

for (size_t j = 0; j < N; ++j) {

row_a = a[i];

column_b = b[j];

for (size_t k = 0; k < K; ++k)

TOKEN (inner , &c[i*N+j], row_a , column_b , slot2 );

}

}

END_THREAD ();

THREAD -1:

RETURN (); , //, , do, , no th in gEND_THREAD ();

END_FUNCTION;

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 53 / 61

Page 71: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

The Fibonacci ExampleThreaded-C Version — Inner Product

Note—We assume the B matrix is correctly transposed to allow row-major traversal.

THREADED void inner( double* result , const size_t N,

const double* A, const double* B,

sslot_t done)

{

double sum , *row_a , *column_b;

sslot_t slot0 = init_sslot (0,0),

slot1 = init_sslot (2 ,2);

THREAD -0:

BLKMOV_SYNC(A, row_a , N, slot1);

BLKMOV_SYNC(B, column_b , N, slot1);

sum = 0.0;

END_THREAD ();

THREAD -1:

for (size_t i = 0; i < N; ++i)

sum += row_a[i] * column_b[j];

DATA_RSYNC (* result +=sum;, done);

END_THREAD ();

END_FUNCTION;

}

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 54 / 61

Page 72: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 55 / 61

Page 73: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Compilation Environment, Revisited

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 56 / 61

Page 74: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Compilation Environment, Revisited

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 57 / 61

Page 75: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Main Features of EARTH

∆ Fast thread context switching

I Efficient parallel function invocation

I Good support of fine-grain dynamic load-balancing

∆ Efficient support of split-phase transactions and fibers

Note—Items marked with a ∆ are features unique to EARTH, and notfound in the original Cilk model

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 58 / 61

Page 76: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Outline

1 IntroductionSecret Origins of EARTH

2 The EARTH Program Execution Model

3 The EARTH Abstract Machine ModelThe EARTH Abstract Machine

4 EARTH-Manna: An Implementation of the EARTH ArchitectureModel

5 Programming Models for Multithreaded ArchitecturesFeatures of Multithreaded Programming ModelsEARTH Instruction SetThe EARTH Benchmark Suite (EBS)Programming ExamplesCompilation Environment, Revisited

6 Summary

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 59 / 61

Page 77: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

Summary of EARTH-CExtensions

I Explicit parallelism

I Parallel vs. sequential statement sequencesI forall loops

I Locality annotation

I Local vs. remote memory references (global, local, replicate, . . . )

I Dynamic load-balancing

I Basic vs. remote function & invocation sites

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 60 / 61

Page 78: CPEG 852 Advanced Topics in Computing Systems The EARTH ...

References I

I Kevin Bryan Theobald (1999). “Earth: an efficient architecture for running threads”.AAINQ50269. PhD thesis. Montreal, Que., Canada, Canada. isbn: 0-612-50269-4

I Guang R. Gao and Vivek Sarkar (1995). “Location Consistency: Stepping Beyond theMemory Coherence Barrier”. In: ICPP (2), pp. 73–76

I Guang R. Gao and Vivek Sarkar (1997). “On the Importance of an End-To-End View ofMemory Consistency in Future Computer Systems”. In: Proceedings of the InternationalSymposium on High Performance Computing. London, UK: Springer-Verlag, pp. 30–41.isbn: 3-540-63766-4. url: http://portal.acm.org/citation.cfm?id=646346.690059

I Guang R. Gao and Vivek Sarkar (2000). “Location Consistency-A New Memory Modeland Cache Consistency Protocol”. In: IEEE Trans. Comput. 49 (8), pp. 798–813. issn:0018-9340. doi: 10.1109/12.868026. url:http://portal.acm.org/citation.cfm?id=354862.354865

Zuckerman CPEG852 – Fall ’15 – Memory Consistency Models 61 / 61