Programming with Charm++ and AMPI: Introduction (Great Lakes Consortium, August 5th, 2009)

Page 1

Programming with Charm++ and AMPI: Introduction

Laxmikant (Sanjay) Kale & Eric Bohm
http://charm.cs.uiuc.edu

Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign

Page 2

Motivation, Concepts and Benefits

Before we plunge into the details of programming, let us examine:
o Why we developed Charm++
o What are some of the central ideas
o What benefits you get by using Charm++
o And so, in what contexts it is especially useful

Specifically:
o What is object-based over-decomposition
o How Charm++ and AMPI embody this idea
o Capabilities of the system: what it automates, supports, ...
o A bit of a look under the hood: how it is implemented

Page 3

Summarizing the State of the Art

Petascale
o Very powerful parallel computers are being built
o Application domains exist that need that kind of power

New generation of applications
o Use sophisticated algorithms
o Dynamic adaptive refinements
o Multi-scale, multi-physics

Challenge:
o Parallel programming is more complex than sequential
o Difficulty in achieving performance that scales to PetaFLOPs and beyond
o Difficulty in getting correct behavior from programs

Page 4

Guiding Principles Behind Charm++

No magic
o Parallelizing compilers have achieved close to technical perfection, but are not enough
o Sequential programs obscure too much information

Seek an optimal division of labor between the system and the programmer

Design abstractions based solidly on use-cases
o Application-oriented yet computer-science-centered approach

Page 5

Charm++ and CSE Applications

Enabling CS technology of parallel objects and intelligent runtime systems has led to several CSE collaborative applications

Synergy
o Well-known biophysics molecular simulation app (Gordon Bell Award, 2002)
o Computational Astronomy
o Nano-Materials, ...

Page 6

Object‐based Over‐decomposition

Let the programmer decompose computation into objects

Work units, data‐units, composites

Let an intelligent runtime system assign objects to processors
o RTS can change this assignment (mapping) during execution

Locality of data references is a critical attribute for performance
o A parallel object can access only its own data
o Asynchronous method invocation for accessing other objects’ data

Page 7

Object‐based Over‐decomposition: Charm++


User view

System implementation

• Multiple “indexed collections” of C++ objects
• Indices can be multi-dimensional and/or sparse
• Programmer expresses communication between objects

– with no reference to processors
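
For concreteness, here is a minimal sketch (all names are hypothetical, not from the original slides) of what such an indexed collection looks like in a Charm++ interface (.ci) file, and how one element is invoked without naming a processor:

// sketch: a 2D collection of chares, declared in a .ci interface file
array [2D] Cell {
  entry Cell(void);
  entry void exchange(int n, double data[n]);
};

// sketch: C++ call site; cellProxy would come from CProxy_Cell::ckNew(M, N).
// The invocation is asynchronous and is delivered wherever element (i, j)
// currently lives; no processor is named.
cellProxy(i, j).exchange(n, buf);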

Page 8

Object‐based Over‐decomposition: AMPI

Each MPI process is implemented as a user-level thread; threads are light-weight and migratable!
o <1 microsecond context-switch time, potentially >100k threads per core

Each thread is embedded in a Charm++ object (chare)

[Figure: MPI processes are implemented as virtual processors (user-level migratable threads), which the runtime maps onto the real processors]
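
As a sketch of how this is used (file names and counts are illustrative), an existing MPI program is compiled with AMPI's compiler wrapper and run with more ranks (virtual processors) than physical cores:

ampicc -o jacobi jacobi.c          # build against AMPI instead of a native MPI
./charmrun +p4 ./jacobi +vp16      # 16 MPI ranks as user-level threads on 4 cores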

Page 9

Benefits of the Charm Model: Outline

Software engineering
o Number of virtual processors can be independently controlled
o Separate VPs for different modules

Message-driven execution
o Adaptive overlap of communication
o Predictability:
  - Automatic out-of-core
  - Prefetch to local stores
o Asynchronous reductions

Dynamic mapping
o Heterogeneous clusters: vacate, adjust to speed, share
o Automatic checkpointing
o Change set of processors used
o Automatic dynamic load balancing
o Communication optimization
o Fault tolerance

Page 10

Parallel Decomposition and Processors

MPI style encourages
o Decomposition into P pieces, where P is the number of physical processors available
o If your natural decomposition is a cube, then the number of processors must be a cube
o ...

Charm++/AMPI style “virtual processors”
o Decompose into natural objects of the application
o Let the runtime map them to processors
o Decouple decomposition from load balancing

Page 11

Decomposition Independent of Number of Cores


Rocket simulation example under traditional MPI vs. Charm++/AMPI framework

o Benefit: load balance, communication optimizations, modularity

[Figure: Under traditional MPI, each of the P processors holds one Solid piece and one Fluid piece; under Charm++/AMPI, the solid domain is decomposed into Solid1..Solidn and the fluid domain into Fluid1..Fluidm, independent of the number of processors]

Page 12

Parallel Composition: A1; (B || C ); A2

Recall: Different modules, written in different languages/paradigms, can overlap in time and on processors, without the programmer having to worry about this explicitly

Page 13

Without message-driven execution (and virtualization), you get either:

Space-division

Page 14

OR: Sequentialization

Page 15

Parallelization Using Charm++


Bhatele, A., Kumar, S., Mei, C., Phillips, J. C., Zheng, G. & Kale, L. V. 2008 Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms. In Proceedings of IEEE International Parallel and Distributed Processing Symposium, Miami, FL, USA, April 2008.

Page 16

Performance of NAMD

[Plot: time per step (ms, from 1 to 1024) vs. number of cores (128 to 32768) for STMV (~1 million atoms) and ApoA1 (~92K atoms) on Blue Gene/L, Blue Gene/P, and Cray XT4]

Blue Gene results based on work on the DCMF many-to-many pattern by Sameer Kumar, IBM Research

Page 17

[Figure: each processor runs a scheduler that pulls work from its own message queue]

Message-driven Execution

Adaptively and automatically overlaps communication and computation

Also, you can predict data access
o You can peek ahead at the queue

Can be used to scale the memory wall
o Prefetching of needed data
  - into scratch-pad memories, for example
o Automatic out-of-core execution

Page 18

Automatic Dynamic Load Balancing

Measurement-based load balancers
o Principle of persistence: in many CSE applications, computational loads and communication patterns tend to persist, even in dynamic computations
o So, the recent past is a good predictor of the near future
o Charm++ provides a suite of load balancers, with periodic measurement and migration of objects (see the command sketch below)

Seed balancers (for task-parallelism)
o Useful for divide-and-conquer and state-space-search applications
o Seeds for Charm++ objects are moved around until they take root
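
As a sketch of how a measurement-based balancer is typically selected (module and balancer names here follow the Charm++ manual; exact availability depends on the build):

charmc -o app app.o -module CommonLBs      # link in the common load balancers
./charmrun +p8 ./app +balancer RefineLB    # pick a balancer at run time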

Page 19

Fault Tolerance

Automatic checkpointing
o Migrate objects to disk
o In-memory checkpointing as an option
o Automatic fault detection and restart

“Impending fault” response
o Migrate objects to other processors
o Adjust processor-level parallel data structures

Scalable fault tolerance
o Experimental feature
o When a processor out of 100,000 fails, all 99,999 shouldn’t have to run back to their checkpoints!
o Sender-side message logging
o Latency tolerance helps mitigate costs
o Restart can be sped up by spreading out objects from the failed processor

Page 20

How to Tune Performance for a Future Machine?

For example, Blue Waters will arrive in 2011
o But need to prepare applications for it starting now

Even for extant machines
o Full-size machine may not be available as often as needed for tuning runs

Our approach: BigSim
o Full-scale emulation, leveraging virtualization
o Trace-driven simulation

Page 21

The Rest of the Day

Before lunch
o AMPI Hands-on Exercise
o Basic Charm++
o Charm++ Hands-on Exercise

After lunch
o Load Balancing
o Hands-on with Load Balancing
o Intermediate Charm++
o Overview of Advanced Charm++ Concepts

Page 22

AMPI HANDS-ON TUTORIAL
Eric J. Bohm

Page 23

Converting MPI to AMPI

Jacobi 1D Starting Point: version1.cpp

This MPI Program already works as an AMPI program under certain limitations:

o AMPI virtualization ratio must be 1 (data stored in global variables is not private to each MPI rank)

o AMPI migration is not used (no pup)

Page 24

Converting MPI to AMPI

We will fix these two problems!

Specifically:

o Non‐constant data stored in global variables must be privatized (one per AMPI thread)

o PUP (serialization) routines must be created to allow AMPI threads to migrate with user data

Page 25

Fixing First Problem

The first of these problems is fixed in: version2.cpp

Data stored in global variables is privatized by allocating it on the heap and/or stack

version1.cpp:

// in global scope
chunk cp;

version2.cpp:

// called by each AMPI rank in main()
chunk *cp = new chunk;
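
For reference, the PUP code on the following slides assumes the per-rank data looks roughly like the sketch below; the field types are inferred from the pup calls, while the comments are assumptions about the Jacobi example:

// sketch of the per-rank data (fields inferred from the PUP routine that follows)
struct chunk {
  double t[BLOCKSIZE+2];   // local block of the 1D Jacobi array, plus ghost cells (assumed)
  int xp, xm;              // neighbors in the +x and -x directions (assumed)
};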

Page 26

Hands On: Enabling Migration

Create a new file called version3.cpp with the same contents as version2.cpp

We will add some code to create a functional AMPI version with automatic load balancing

Page 27

Hands On: Enabling Migration

Pack-Unpack (PUP) routines must be created to allow AMPI threads to migrate

PUP routines serialize the user data for an AMPI rank when it migrates

In this program, all user data is stored in the heap-allocated “chunk *cp”

Page 28

Hands On: Enabling Migration

Step 1: Add this PUP routine

#ifdef AMPI
void chunk_pup(pup_er p, void *d) {
  chunk **cpp = (chunk **) d;
  if (pup_isUnpacking(p))
    *cpp = new chunk;
  chunk *cp = *cpp;
  pup_doubles(p, cp->t, (BLOCKSIZE+2)); // PUP an array of doubles
  pup_int(p, &cp->xp);                  // PUP an integer value
  pup_int(p, &cp->xm);                  // PUP an integer value
  if (pup_isDeleting(p))
    delete cp;
}
#endif

Page 29

Hands On: Enabling Migration

Step 2: Register the newly created PUP routine in main(), after the allocation of cp

All program state must be saved and recreated via this PUP routine. Normally this includes just program data, but open files, sockets, and other resources might need to be handled.

#ifdef AMPI
MPI_Register((void*)&cp, (MPI_PupFn) chunk_pup);
#endif

Page 30

Hands On: Enabling Migration

Step 3: Request load balancing every few iterations

The AMPI migration framework may migrate AMPI ranks at these points in the program

#ifdef AMPI
if (iter % 10 == 0) MPI_Migrate();
#endif
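
Putting steps 1 through 3 together, a sketch of where the calls might sit in main(); the loop structure, iteration count, and names other than chunk_pup, MPI_Register, and MPI_Migrate are illustrative:

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  chunk *cp = new chunk;                               // per-rank data on the heap (version2)
#ifdef AMPI
  MPI_Register((void*)&cp, (MPI_PupFn) chunk_pup);     // Step 2: register the PUP routine
#endif
  for (int iter = 0; iter < MAX_ITER; iter++) {        // MAX_ITER is illustrative
    // ... exchange ghost cells and relax the local block ...
#ifdef AMPI
    if (iter % 10 == 0) MPI_Migrate();                 // Step 3: allow migration here
#endif
  }
  MPI_Finalize();
  return 0;
}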

Page 31

Hands On: Enabling Migration

Step 4: Enable the desired load balancing module at link time

Turn on load balancing debug output at runtime to verify that everything is working

See the Charm++ manual for more load balancing options: http://charm.cs.uiuc.edu/manuals/


-balancer GreedyLB

+LBDebug 1
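
For example, a complete build-and-run line might look like the sketch below (the executable name, core count, and virtualization ratio are illustrative):

ampicc -o jacobi version3.cpp -balancer GreedyLB     # select the balancer at link time
./charmrun +p4 ./jacobi +vp16 +LBDebug 1             # run with load-balancing debug output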

Page 32

Using Charm++ with Arrays

Laxmikant (Sanjay) Kale & Eric Bohm
http://charm.cs.uiuc.edu

Parallel Programming Laboratory
Department of Computer Science
University of Illinois at Urbana-Champaign

Page 33

The Charm++ Model

Parallel objects (chares) communicate via asynchronous method invocations (entry methods)

The runtime system maps chares onto processors and schedules execution of entry methods

Page 34

The Charm++ Model


User View:

System View:

Page 35

Proxy Objects

Entry methods for each chare are invoked using proxy objects: lightweight handles to potentially remote chares
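
A minimal sketch with hypothetical names: creating a chare returns a proxy, and calling an entry method on the proxy sends an asynchronous message to wherever the chare lives:

// sketch: create a chare and keep its proxy (Worker and doWork are hypothetical)
CProxy_Worker w = CProxy_Worker::ckNew();
w.doWork(42);    // asynchronous entry-method invocation; returns immediately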

Page 36

Limitations of Plain Proxies

In a large program, keeping track of all the proxies is difficult

A simple proxy doesn’t tell you anything about the chare other than its type

Managing collective operations like broadcast and reduce is complicated

Page 37

Example: Molecular Dynamics

Patches: blocks of space containing particles

Computes: chares which compute the interactions between particles in two patches

Page 38

Example: Molecular Dynamics

We could write this program using just chares and plain proxies

But, it’ll be much simpler if we use Chare Arrays

Note: you can find the complete MD code in examples/charm++/Molecular2D in the Charm++ distribution

Page 39

Chare Arrays

Arrays organize chares into indexed collections

Each chare in the array has a proxy for the other array elements, accessible using simple syntax

sampleArray[i] // i’th proxy

Page 40

Array Dimensions

Anything can be used as array indices
o integers

o tuples

o bit vectors

o user‐defined types

Page 41

Arrays in MD

array [2D] Patch { ... }     // indexed by the x and y coordinates of the patch

array [4D] Compute { ... }   // indexed by the x and y coordinates of both patches involved

Note: arrays can be sparse

Page 42

Managing Mapping

Arrays let the programmer control the mapping of array elements to PEs
o Round-robin, block-cyclic, etc.
o User-defined mapping (see the sketch below)
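
A sketch of a user-defined mapping (the placement policy and names here are illustrative; see the Charm++ manual for CkArrayMap and CkArrayOptions::setMap):

#include "charm++.h"

// sketch: place 1D element i on PE floor(i / block), wrapping around the PEs
class BlockMap : public CkArrayMap {
public:
  static const int block = 4;                     // illustrative block size
  int procNum(int arrayHdl, const CkArrayIndex &idx) {
    int i = idx.data()[0];                        // 1D index of the element
    return (i / block) % CkNumPes();
  }
};
// In a full program, the map is declared as a group in the .ci file and
// attached to the array via CkArrayOptions::setMap().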

Page 43

Broadcasts

Simple way to invoke the same entry method on each array element

Syntax:

// call on one element
sampleArray[i].method()

// broadcast to whole array
sampleArray.method()

Page 44

Broadcasts in MD

Used to fire off the first timestep after initialization is complete

// All chares are created, time to
// start the simulation
patchArray.start();

Page 45

Reductions

No global flow of control, so each chare must contribute data independently using contribute()

A user callback (created using CkCallback) is invoked when the reduction is complete

Page 46

Reductions

Runtime system builds reduction tree

User specifies reduction operation

At root of tree, a callback is performed on a specified chare

Page 47

Reduction Example

Sum local error estimators to determine global error

// Function to call, and the chare that handles the callback:
CkCallback cb(CkIndex_Main::computeGlobalError(), mainProxy);

// Contribute the local data with the given reduction operation
// (CkReduction::sum_double, assuming myError is a double):
contribute(sizeof(myError), (void*)&myError, CkReduction::sum_double, cb);
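
On the receiving side, a sketch of the callback handler (assuming the classic form in which the target entry method takes a CkReductionMsg; with newer [reductiontarget] declarations the reduced value can arrive as a plain parameter):

// sketch: entry method on the Main chare that receives the reduced sum
void Main::computeGlobalError(CkReductionMsg *msg) {
  double globalError = *(double *)msg->getData();   // the summed local errors
  delete msg;
  CkPrintf("Global error = %f\n", globalError);
}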

Page 48

CHARM++ HANDS-ON TUTORIAL
Eric J. Bohm

Page 49

Ring Example

Different chare elements pass a token around 

Page 50

ring.ci

mainmodule ring {
  readonly CProxy_Main mainProxy;
  readonly int numChares;

  mainchare Main {
    entry Main(CkArgMsg *m);
    entry void done(void);
  };

  array [1D] Ring {
    entry Ring(void);
    entry void passToken(int token);
  };
};

Page 51

ring.C (Main Chare)

#include "ring.decl.h"

/*readonly*/ CProxy_Main mainProxy;
/*readonly*/ int numChares;

/*mainchare*/ class Main : public Chare {
public:
  Main(CkArgMsg* m) {
    numChares = 5;
    if (m->argc > 1)
      numChares = atoi(m->argv[1]);
    delete m;

    CkPrintf("Running Ring on %d processors for %d elements\n",
             CkNumPes(), numChares);
    mainProxy = thishandle;

    CProxy_Ring arr = CProxy_Ring::ckNew(numChares);
    arr[0].passToken(11);
  };

  void done(void) {
    CkPrintf("All done\n");
    CkExit();
  };
};

Page 52

ring.C (1D array)

/*array [1D]*/ class Ring : public CBase_Ring {
public:
  Ring() {
    // CkPrintf("Ring %d created\n", thisIndex);
  }

  Ring(CkMigrateMessage *m) { }

  void passToken(int token) {
    CkPrintf("Token No [%d] received by chare element %d\n", token, thisIndex);
    if (thisIndex < numChares-1)
      // Pass the token
      thisProxy[thisIndex+1].passToken(token+1);
    else
      // We've been around once-- we're done.
      mainProxy.done();
  }
};

#include "ring.def.h"

Page 53

Compiling a Charm++ Program

Steps in compiling a Charm++ program:

Running the program:

charmc ring.ci
charmc -O3 -c ring.C
charmc -O3 -language charm++ -o ring ring.o

./charmrun +p4 ./ring 13

Page 54

Output of the Ring Program


[bhatele@zen] ring $ ./charmrun +p4 ./ring 13

Charm++: scheduler running in netpoll mode.
Charm++> cpu topology info is being gathered.
Charm++> 1 unique compute nodes detected.
Running Ring on 4 processors for 13 elements
Token No [11] received by chare element 0
Token No [12] received by chare element 1
Token No [13] received by chare element 2
Token No [14] received by chare element 3
Token No [15] received by chare element 4
Token No [16] received by chare element 5
Token No [17] received by chare element 6
Token No [18] received by chare element 7
Token No [19] received by chare element 8
Token No [20] received by chare element 9
Token No [21] received by chare element 10
Token No [22] received by chare element 11
Token No [23] received by chare element 12
All done

[bhatele@zen] ring $

Page 55

Task

Change the entry method passToken(…) such that it passes an array of integers around

Every chare prints the entire array
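
One possible shape for the solution (a sketch, not the only way): declare the entry method with a variable-size array parameter in ring.ci, and print the array in ring.C before forwarding it:

// ring.ci (sketch)
entry void passToken(int len, int token[len]);

// ring.C (sketch)
void passToken(int len, int *token) {
  CkPrintf("Chare element %d received:", thisIndex);
  for (int i = 0; i < len; i++) CkPrintf(" %d", token[i]);
  CkPrintf("\n");
  if (thisIndex < numChares-1)
    thisProxy[thisIndex+1].passToken(len, token);   // forward the whole array
  else
    mainProxy.done();
}

// Main would start the ring with something like:
//   int initial[3] = {11, 12, 13};
//   arr[0].passToken(3, initial);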
