Distributed Array Component based on Global Arrays
CCA: Common Component Architecture
Manoj Krishnan, Jarek Nieplocha
High Performance Computing Group
Pacific Northwest National Laboratory
CCA Forum
Overview
• Global Arrays
• Distributed Array Component
• Core Capabilities
• Applications
• Future Work
Global Arrays
• shared memory model in the context of distributed dense arrays
• physically distributed dense arrays presented as a single, shared data structure with global indexing, e.g., A(4,3) rather than buf(7) on task 2
• data locality control similar to the distributed memory/message passing model
• complete environment for parallel code development, compatible with MPI (~140 functions)
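To make the global-indexing point concrete, here is a minimal sketch of the translation a distributed array layer performs behind an index like A(4,3). It assumes a regular block distribution over a procRows × procCols process grid, 0-based indices, row-major local storage, and row-major rank numbering. GA's Fortran interface is 1-based and its actual layout may differ, so the numbers are not meant to reproduce buf(7) on task 2. BlockDist2D, Location, and locate are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical descriptor of a 2D array split into blocks over a
// procRows x procCols process grid. Assumptions: regular block distribution,
// 0-based indices, row-major local storage, row-major rank numbering.
struct BlockDist2D {
    int rows, cols;          // global array shape
    int procRows, procCols;  // process grid shape

    // Block extent per dimension (ceiling division; trailing blocks may be smaller).
    int blockRows() const { return (rows + procRows - 1) / procRows; }
    int blockCols() const { return (cols + procCols - 1) / procCols; }
};

// Where a global element lives: the owning task and the offset in its local buffer.
struct Location {
    int owner;
    std::size_t offset;
};

// Translate a global index (i, j) into (owner rank, local offset).
Location locate(const BlockDist2D& d, int i, int j) {
    assert(i >= 0 && i < d.rows && j >= 0 && j < d.cols);
    const int br = d.blockRows();
    const int bc = d.blockCols();
    const int pr = i / br, pc = j / bc;   // owner's coordinates in the process grid
    const int li = i % br, lj = j % bc;   // coordinates inside the owner's block
    // The owner's block is narrower than bc if it holds the last columns.
    const int localCols = std::min(bc, d.cols - pc * bc);
    return {pr * d.procCols + pc,
            static_cast<std::size_t>(li) * static_cast<std::size_t>(localCols) +
                static_cast<std::size_t>(lj)};
}
```

For an 8 × 8 array on a 2 × 2 grid, locate returns owner 3 and local offset 6 for global element (5, 6); the application only ever names (5, 6).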
Global Array Model of Computations
[Figure: a task copies a patch of the shared object into local memory with a one-sided get, computes/updates it in local memory, and copies the result back to the shared object with a one-sided put.]
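The get, compute/update, put cycle can be sketched in a single process. SharedObject, get, put, and updatePatch are hypothetical; a std::vector stands in for the physically distributed array. In a real parallel run, concurrent updates to overlapping patches would also need synchronization (for example sync() or locks).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a shared (global) 1D array. In GA the data would be
// physically distributed and get/put would be one-sided transfers; here a
// single std::vector models the shared object (assumption).
class SharedObject {
public:
    explicit SharedObject(std::size_t n) : data_(n, 0.0) {}

    // One-sided get: copy global elements [lo, hi) into a local buffer.
    std::vector<double> get(std::size_t lo, std::size_t hi) const {
        assert(lo <= hi && hi <= data_.size());
        return std::vector<double>(data_.begin() + static_cast<std::ptrdiff_t>(lo),
                                   data_.begin() + static_cast<std::ptrdiff_t>(hi));
    }

    // One-sided put: copy a local buffer into global elements starting at lo.
    void put(std::size_t lo, const std::vector<double>& buf) {
        assert(lo + buf.size() <= data_.size());
        for (std::size_t k = 0; k < buf.size(); ++k) data_[lo + k] = buf[k];
    }

private:
    std::vector<double> data_;
};

// One task's pass through the model for the patch [lo, hi).
void updatePatch(SharedObject& a, std::size_t lo, std::size_t hi, double factor) {
    std::vector<double> local = a.get(lo, hi);      // 1. copy to local memory (get)
    for (double& x : local) x = x * factor + 1.0;   // 2. compute/update locally
    a.put(lo, local);                               // 3. copy back to the shared object (put)
}
```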
Structure of GA
• application interfaces: Fortran 77, C, C++, Python, SIDL
• distributed arrays layer: memory management, index translation
• Message Passing: process creation, run-time environment
• ARMCI: portable 1-sided communication (put, get, locks, etc.)
• system-specific interfaces: LAPI, GM/Myrinet, Elan/Quadrics, threads, VIA, ...
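The point of this layering is that the distributed arrays layer only translates indices, while a lower layer moves the data. A deliberately simplified sketch of that split, with an in-process transport in place of ARMCI; OneSidedTransport, InProcessTransport, and BlockArray1D are hypothetical and do not reflect ARMCI's or GA's actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lower layer (the role ARMCI plays on the slide): one-sided transfers
// addressed by (rank, local offset). Hypothetical interface.
class OneSidedTransport {
public:
    virtual ~OneSidedTransport() = default;
    virtual void put(int rank, std::size_t offset, double value) = 0;
    virtual double get(int rank, std::size_t offset) const = 0;
};

// In-process implementation: one buffer per rank. A real transport would
// sit on a network or shared-memory interface instead.
class InProcessTransport : public OneSidedTransport {
public:
    InProcessTransport(int nranks, std::size_t perRank)
        : mem_(static_cast<std::size_t>(nranks), std::vector<double>(perRank, 0.0)) {}
    void put(int rank, std::size_t offset, double value) override {
        mem_.at(static_cast<std::size_t>(rank)).at(offset) = value;
    }
    double get(int rank, std::size_t offset) const override {
        return mem_.at(static_cast<std::size_t>(rank)).at(offset);
    }
private:
    std::vector<std::vector<double>> mem_;
};

// Upper layer (distributed arrays layer): index translation only; all data
// movement is delegated to the transport. Block distribution assumed.
class BlockArray1D {
public:
    BlockArray1D(OneSidedTransport& t, std::size_t n, int nranks)
        : t_(t), n_(n),
          block_((n + static_cast<std::size_t>(nranks) - 1) / static_cast<std::size_t>(nranks)) {}
    void put(std::size_t i, double v) {
        const auto [rank, off] = translate(i);
        t_.put(rank, off, v);
    }
    double get(std::size_t i) const {
        const auto [rank, off] = translate(i);
        return t_.get(rank, off);
    }
private:
    std::pair<int, std::size_t> translate(std::size_t i) const {
        assert(i < n_);
        return {static_cast<int>(i / block_), i % block_};
    }
    OneSidedTransport& t_;
    std::size_t n_, block_;
};
```

Swapping InProcessTransport for another implementation leaves BlockArray1D untouched, which mirrors how the system-specific interfaces sit below the one-sided communication layer.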
Distributed Array Component
GAComponent: Classic and SIDL Interfaces
36 direct and 98 indirect Global Arrays classic methods are available through GAClassicPort (the indirect ones are methods of the GlobalArray objects the port creates)
GADADFPort provides the methods proposed by the Data Working Group of the CCA Forum for creating array descriptors and templates
[Figure: the GA component and its ports: GA Classic, DADF, and Linear Algebra.]
GA Classic Port
• GAClassicPort: provides public interfaces for creating and accessing distributed arrays, i.e., GlobalArray objects
• GlobalArray: encapsulates all details of the data distribution, addressing, and data access, and offers a set of operations for
  – one-sided data transfer (get, put, scatter, gather, etc.)
  – collective array operations
  – data locality control and queries
class GAClassicPort : public virtual ::classic::gov::cca::Port {
public:
  /* array creation methods, for example */
  virtual GlobalArray* createGA(…) = 0;
  virtual GlobalArray* createGA_Ghosts(…) = 0;
  /* utility operations like reduce, broadcast, etc. */
  virtual void brdcst(void *buf, int lenbuf, int root) = 0;
  /* cluster & process information, e.g. rank, size:
     nnodes(), clusterNnodes(), clusterNodeid(), clusterNprocs(), ... */
  /* interprocess synchronization (locks, barrier):
     lock(), unlock(), sync(), fence(), createMutexes(), … */
}; /* Total: 36 methods available through this port */
class GlobalArray {
public:
  /* one-sided communication operations:
     put(), get(), accumulate(), scatter(), gather(), ... */
  /* collective array operations (whole and patch):
     copy(), scale(), add(), gemm(), update_ghosts(), ... */
  /* element-wise operations, ghost cell methods, matrix operations, etc. */
}; /* Total: 98 methods available */
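createGA_Ghosts and update_ghosts() concern ghost cells: local copies of neighboring boundary data that stencil computations can read without remote access. A sketch of what a ghost update accomplishes for a 1D decomposition with one ghost cell per side, assuming non-periodic boundaries; LocalBlock and updateGhosts are hypothetical, and GA's own semantics (including its boundary handling) are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D block decomposition with one ghost cell on each side.
// Each task stores [ghostLeft, owned..., ghostRight].
using LocalBlock = std::vector<double>;

// Fill every task's ghost cells from its neighbors' boundary values.
// Assumption: non-periodic boundaries, so the outermost ghost cells are left
// unchanged. Only owned cells are read, so the update order does not matter.
void updateGhosts(std::vector<LocalBlock>& blocks) {
    const std::size_t p = blocks.size();
    for (std::size_t r = 0; r < p; ++r) {
        LocalBlock& b = blocks[r];
        assert(b.size() >= 3);  // at least one owned cell plus two ghosts
        if (r > 0) {
            const LocalBlock& left = blocks[r - 1];
            b.front() = left[left.size() - 2];  // last owned cell of the left neighbor
        }
        if (r + 1 < p) {
            const LocalBlock& right = blocks[r + 1];
            b.back() = right[1];                // first owned cell of the right neighbor
        }
    }
}
```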
Core Capabilities
• Distributed arrays
  – dense arrays of 1 to 7 dimensions
  – four data types: integer, real, double precision, double complex
  – global rather than per-task view of data structures
  – user control over data distribution: regular and irregular
• Collective and shared-memory style operations
• Support for ghost cells
• Interfaces to third-party parallel numerical libraries
  – PeIGS, ScaLAPACK, SUMMA, TAO
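The difference between regular and irregular distributions shows up when locating the owner of an element. In this sketch an irregular 1D distribution is described by the first global index each task owns (similar in spirit to the block map GA accepts for irregular arrays); ownerIrregular and ownerRegular are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Irregular distribution: blockStart[k] is the first global index owned by
// task k (strictly increasing, blockStart[0] == 0); n is the global length.
// Finds the owner of global index i by binary search.
int ownerIrregular(const std::vector<int>& blockStart, int n, int i) {
    assert(!blockStart.empty() && blockStart.front() == 0);
    assert(i >= 0 && i < n);
    // upper_bound finds the first start greater than i; the owner is the block before it.
    auto it = std::upper_bound(blockStart.begin(), blockStart.end(), i);
    return static_cast<int>(it - blockStart.begin()) - 1;
}

// Regular distribution for comparison: equal-sized blocks, the last may be short.
int ownerRegular(int n, int nprocs, int i) {
    assert(nprocs > 0 && i >= 0 && i < n);
    const int block = (n + nprocs - 1) / nprocs;
    return i / block;
}
```

With blockStart = {0, 2, 7} and n = 10, tasks own 2, 5, and 3 elements respectively, something a regular split of 10 elements over 3 tasks cannot express.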
GA DADF Port
• Provides a standard interface for defining, creating, and querying distributed arrays
  – supports creating, cloning, and destroying arrays, array templates, and descriptors
  – DADF (Distributed Array Descriptor Factory) was proposed by the Data Working Group of the CCA Forum
• DADF Array: creates a distributed array
• DADF Template: a virtual multi-dimensional array to which one or more actual distributed arrays may be aligned
• DADF Descriptor: used to query an existing distributed array
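Templates separate distribution from data: a template only says which process owns each index, and an array aligned to it inherits that ownership through an alignment map. A minimal 1D sketch, assuming a block-distributed template; Template1D and AlignedArray1D are hypothetical and not part of the DADF interfaces.

```cpp
#include <cassert>
#include <functional>

// Hypothetical template: a virtual 1D index space of length `extent`,
// block-distributed over `nprocs` processes. It stores no data.
struct Template1D {
    int extent;
    int nprocs;
    int ownerOf(int t) const {
        assert(t >= 0 && t < extent);
        const int block = (extent + nprocs - 1) / nprocs;
        return t / block;
    }
};

// An array aligned to a template through an alignment map (array index ->
// template index). Element i lives wherever template cell align(i) lives.
struct AlignedArray1D {
    const Template1D& templ;
    std::function<int(int)> align;
    int ownerOf(int i) const { return templ.ownerOf(align(i)); }
};
```

Two arrays aligned to the same template with the same map place corresponding elements on the same process, which is what makes alignment useful for co-locating related data.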
class DADFPort : public virtual ::classic::gov::cca::Port {
public:
  /* methods to create/clone/destroy descriptors, arrays, and templates */
  virtual DistArrayDescriptor* createDescriptor(…) = 0;
  virtual DistArray* createArray(…) = 0;
  virtual DistArrayTemplate* createTemplate(…) = 0;
  /* ... */
};
class DistArray {
public:
  /** Set data type. */
  virtual int setDataType(const enum DataType type) = 0;
  /** Associate this data object with a distribution template. */
  virtual int setTemplate(DistArrayTemplate * & templ) = 0;
  /** Set this process's location in the process topology. */
  virtual int setMyProcCoords(const int procCoords[]) = 0;
  /** Align object to template with identity mapping. */
  virtual int setIdentityAlignmentMap() = 0;
  /** Signal that data object is completely defined. */
  virtual int commit() = 0;
  /* ... set of query & miscellaneous functions */
};
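The DistArray methods describe a two-phase lifecycle: set the type, template, and alignment, then commit(). ToyDistArray mimics that pattern; it simplifies setTemplate to an integer id, omits setMyProcCoords, and assumes a 0/-1 return convention, none of which comes from the DADF specification.

```cpp
#include <cassert>

// Hypothetical element types matching GA's four data types.
enum class DataType { Integer, Real, DoublePrecision, DoubleComplex };

// Toy object mimicking the define-then-commit pattern (not the DADF API).
// Assumptions: integer template id, no process coordinates, 0 = success, -1 = error.
class ToyDistArray {
public:
    int setDataType(DataType t) {
        if (committed_) return -1;  // definition is frozen after commit
        type_ = t;
        hasType_ = true;
        return 0;
    }
    int setTemplate(int templateId) {
        if (committed_ || templateId < 0) return -1;
        templ_ = templateId;
        return 0;
    }
    int setIdentityAlignmentMap() {
        if (committed_ || templ_ < 0) return -1;  // needs a template to align to
        aligned_ = true;
        return 0;
    }
    // Succeeds only when the object is completely defined.
    int commit() {
        if (committed_ || !hasType_ || templ_ < 0 || !aligned_) return -1;
        committed_ = true;
        return 0;
    }
    bool isCommitted() const { return committed_; }
    // Queries are only meaningful on a committed object.
    DataType dataType() const {
        assert(committed_);
        return type_;
    }
private:
    DataType type_ = DataType::DoublePrecision;
    bool hasType_ = false;
    int templ_ = -1;
    bool aligned_ = false;
    bool committed_ = false;
};
```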
Class Hierarchy
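[Figure: class hierarchy linking the interfaces DistArrayTemplate and DistArray, the DADF classes DADFDescriptor, DADFArray, and DADFTemplate, and example distributed array implementations (GA, X).]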
GA/TAO Interoperability
[Figure: the GA component registers its ports (GA, DADF, LA) with CCA Services via addProvidesPort; the TAO component calls registerUsesPort and obtains the port with getPort("ga").]
TAO (Toolkit for Advanced Optimization, ANL) is an optimization component that provides advanced optimization algorithms
GA provides TAO with core linear algebra support (manipulating vectors and matrices, and linear solvers) through the LinearAlgebraPort (LA)
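The GA/TAO wiring follows the CCA pattern: the provider adds a port, the user registers a uses port, the framework connects them, and the user retrieves the port with getPort. A toy, single-process model of those steps; ToyFramework, LinearAlgebraPort, and GALinearAlgebra are hypothetical, and the real CCA Services methods have different signatures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model of CCA-style port wiring (not the gov::cca API).
struct Port { virtual ~Port() = default; };

// Hypothetical linear-algebra port a GA-like component could provide to TAO.
struct LinearAlgebraPort : Port {
    virtual double dot(const double* x, const double* y, int n) const = 0;
};

struct GALinearAlgebra : LinearAlgebraPort {
    double dot(const double* x, const double* y, int n) const override {
        assert(n >= 0);
        double s = 0.0;
        for (int k = 0; k < n; ++k) s += x[k] * y[k];
        return s;
    }
};

class ToyFramework {
public:
    // Provider side: publish a port under a name.
    void addProvidesPort(const std::string& name, std::shared_ptr<Port> p) {
        provided_[name] = std::move(p);
    }
    // User side: declare a port it will need (initially unconnected).
    void registerUsesPort(const std::string& name) { uses_.emplace(name, nullptr); }
    // Framework step: bind a registered uses port to a provided port.
    bool connect(const std::string& usesName, const std::string& providesName) {
        auto u = uses_.find(usesName);
        auto p = provided_.find(providesName);
        if (u == uses_.end() || p == provided_.end()) return false;
        u->second = p->second;
        return true;
    }
    // User side: returns nullptr if the port was never registered or connected.
    std::shared_ptr<Port> getPort(const std::string& name) const {
        auto u = uses_.find(name);
        if (u == uses_.end()) return nullptr;
        return u->second;
    }
private:
    std::map<std::string, std::shared_ptr<Port>> provided_, uses_;
};
```

The user casts the retrieved Port to the interface it expects (here with std::dynamic_pointer_cast), so TAO depends only on LinearAlgebraPort, not on the GA implementation behind it.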
GA Component in Applications (I)
[Figure: the GA component provides its GA, DADF, and LA ports via addProvidesPort; the LJMD component calls registerUsesPort and obtains the GA port with getPort("ga").]
Lennard-Jones Molecular Dynamics (LJMD)
• Force decomposition method with dynamic load balancing (improves performance over the traditional message-passing version by S. Plimpton, Sandia)
• Component overhead is negligible (<1%)
• Good scaling: a simulation of 12,000 atoms yields a speedup of 7.86 on 8 processors
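The slides do not say how LJMD's dynamic load balancing is implemented. A common approach in GA codes is a shared task counter incremented atomically (GA provides NGA_Read_inc for this); the sketch below shows the idea with threads and std::atomic in place of GA processes. runDynamic is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Dynamic load balancing with a shared task counter: each worker repeatedly
// claims the next unprocessed task with an atomic fetch-and-add, so faster
// workers simply take more tasks. Threads stand in for GA processes.
std::vector<int> runDynamic(int numTasks, int numWorkers) {
    assert(numTasks >= 0 && numWorkers > 0);
    std::atomic<int> next{0};
    // Records which worker processed each task; each slot is written by exactly one thread.
    std::vector<int> doneBy(static_cast<std::size_t>(numTasks), -1);
    std::vector<std::thread> workers;
    for (int w = 0; w < numWorkers; ++w) {
        workers.emplace_back([&, w] {
            for (int t = next.fetch_add(1); t < numTasks; t = next.fetch_add(1)) {
                doneBy[static_cast<std::size_t>(t)] = w;  // placeholder for one block of force work
            }
        });
    }
    for (auto& th : workers) th.join();
    return doneBy;
}
```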
GA Component in Applications (II)
Chemistry: molecular geometry optimization (coupling GA and TAO)
[Figure: the GA component provides its GA, DADF, and LA ports via addProvidesPort; the Solver, CFD, and Visualization components each call registerUsesPort and obtain the GA port with getPort("ga").]
Application Areas
• thermal flow simulation
• visualization and image analysis
• electronic structure
• glass flow simulation
• material sciences
• molecular dynamics
• biology
• others: financial security forecasting, astrophysics, geosciences
Future Work
• Additional capabilities in the GA component, including operations needed to support more TAO optimization algorithms; this will also involve new nonblocking communication interfaces
• Implementation of a component that interfaces to secondary storage (parallel I/O)
• Verify component usability for large applications
• Study the performance and overhead of CCA ESI (or any generic solver) interfaces to the distributed array component
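Nonblocking interfaces let a task start a transfer, keep computing, and wait only when the data is needed. A sketch of that overlap, with std::async standing in for the communication layer; nbGet and sumWithOverlap are hypothetical and do not describe the planned GA interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical nonblocking get: starts copying [lo, hi) of a "remote" array
// and returns a handle at once; calling get() on the handle waits for completion.
std::future<std::vector<double>> nbGet(const std::vector<double>& remote,
                                       std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= remote.size());
    return std::async(std::launch::async, [&remote, lo, hi] {
        return std::vector<double>(remote.begin() + static_cast<std::ptrdiff_t>(lo),
                                   remote.begin() + static_cast<std::ptrdiff_t>(hi));
    });
}

// Overlap pattern: while one chunk is being processed, the next is in flight.
double sumWithOverlap(const std::vector<double>& remote, std::size_t chunk) {
    assert(chunk > 0);
    double total = 0.0;
    std::size_t lo = 0;
    auto pending = nbGet(remote, 0, std::min(chunk, remote.size()));
    while (lo < remote.size()) {
        std::vector<double> current = pending.get();           // wait for the chunk in flight
        const std::size_t next = lo + current.size();
        if (next < remote.size())                               // start fetching the next chunk
            pending = nbGet(remote, next, std::min(next + chunk, remote.size()));
        total += std::accumulate(current.begin(), current.end(), 0.0);  // compute meanwhile
        lo = next;
    }
    return total;
}
```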
Feedback
Goal: provide a generic distributed array component. We would like to know:
• applications that need distributed array components
• functionality expected by applications
• additions/modifications required
• suggestions to make it more generic
• whether DADF should include communication interfaces (put/get)
Priorities will be set based on this feedback