
Page 1: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

- ACTS -
A Reliable Software Infrastructure for Scientific Computing

Osni Marques
Lawrence Berkeley National Laboratory (LBNL)
oamarques@lbl.gov

UC Berkeley - CS267


04/20/2006, UC Berkeley - CS267

Outline

• Keeping pace with software and hardware
  • Hardware evolution
  • Performance tuning
  • Software selection
  • What is missing?
• The DOE ACTS Collection Project
  • Goals
  • Current features
  • Lessons learned


IBM BlueGene/L

A computation that took 1 full year to complete in 1980 could be done in ~ 10 hours in 1992, in ~ 16 minutes in 1997, in ~ 27 seconds in 2001 and in ~ 1.7 seconds today!
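The speedups implied by these figures can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and reads "today" as 2006 (the date on the slides); the variable names are illustrative only.

```python
# Speedup implied by each figure on the slide, relative to a one-year run in 1980.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

runtimes_s = {
    1992: 10 * 3600,  # ~10 hours
    1997: 16 * 60,    # ~16 minutes
    2001: 27,         # ~27 seconds
    2006: 1.7,        # ~1.7 seconds ("today", assuming the 2006 slide date)
}

speedups = {year: SECONDS_PER_YEAR / t for year, t in runtimes_s.items()}

for year, factor in sorted(speedups.items()):
    print(f"{year}: about {factor:,.0f}x faster than in 1980")
```

Under these assumptions the factors range from roughly 9 x 10^2 in 1992 to roughly 2 x 10^7 in 2006.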


Challenges in the Development of Scientific Codes

• Research in computational sciences is fundamentally interdisciplinary
• The development of complex simulation codes on high-end computers is not a trivial task
• Productivity
  • Time to the first solution (prototype)
  • Time to solution (production)
  • Other requirements
• Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinarity
• Performance
  • Increasingly complex algorithms
  • Increasingly complex architectures
  • Increasingly demanding applications

• Libraries written in different languages
• Discussions about standardizing interfaces are often sidetracked into implementation issues
• Difficulties managing multiple libraries developed by third parties
• Need to use more than one language in one application
• The code is long-lived and different pieces evolve at different rates
• Swapping competing implementations of the same idea and testing without modifying the code
• Need to compose an application with some other(s) that were not originally designed to be combined


Automatic Tuning

• For each kernel
  1. Identify and generate a space of algorithms
  2. Search for the fastest one, by running them
• What is a space of algorithms?
  • Depending on kernel and input, may vary
    • instruction mix and order
    • memory access patterns
    • data structures
    • mathematical formulation
• When do we search?
  • Once per kernel and architecture
  • At compile time
  • At run time
  • All of the above
• PHiPAC: www.icsi.berkeley.edu/~bilmes/phipac
• ATLAS: www.netlib.org/atlas
• XBLAS: www.nersc.gov/~xiaoye/XBLAS
• Sparsity: www.cs.berkeley.edu/~yelick/sparsity
• FFTs and Signal Processing
  • FFTW: www.fftw.org
    • Won 1999 Wilkinson Prize for Numerical Software
  • SPIRAL: www.ece.cmu.edu/~spiral
    • Extensions to other transforms, DSPs
  • UHFFT
    • Extensions to higher dimension, parallelism
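The two-step recipe (generate a space of variants, then search by running them) can be sketched in a few lines. This is only a toy illustration, not how PHiPAC, ATLAS or FFTW are implemented: the kernel (a dot product), the three hand-written variants and the `autotune` helper are all assumptions made for the example.

```python
import timeit

# Step 1: a (tiny) space of algorithms for one kernel, the dot product.
def dot_indexed(x, y):
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_zip(x, y):
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

def dot_unrolled4(x, y):
    # Same arithmetic, different instruction mix: loop unrolled by four.
    n, i, total = len(x), 0, 0.0
    while i + 4 <= n:
        total += x[i] * y[i] + x[i + 1] * y[i + 1] + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3]
        i += 4
    while i < n:  # remainder loop
        total += x[i] * y[i]
        i += 1
    return total

VARIANTS = {"indexed": dot_indexed, "zip": dot_zip, "unrolled4": dot_unrolled4}

# Step 2: search for the fastest variant by running each one on this machine.
def autotune(variants, x, y, repeats=5, number=20):
    timings = {}
    for name, fn in variants.items():
        timings[name] = min(
            timeit.repeat(lambda fn=fn: fn(x, y), repeat=repeats, number=number)
        )
    best = min(timings, key=timings.get)
    return best, timings

if __name__ == "__main__":
    x = [float(i % 7) for i in range(10_000)]
    y = [float(i % 5) for i in range(10_000)]
    best, timings = autotune(VARIANTS, x, y)
    print("fastest variant here:", best)
```

The winner depends on the machine and the input size, which is why the search is repeated per kernel and architecture.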


What About Software Selection?

Example: Ax = b

• Use a direct solver (A=LU) if
  • Time and storage space acceptable
  • Iterative methods don’t converge
  • Many b’s for same A
• Criteria for choosing a direct solver
  • Symmetric positive definite (SPD)
  • Symmetric
  • Symmetric-pattern
  • Unsymmetric
• Row/column ordering schemes available
  • MMD, AMD, ND, graph partitioning
• Hardware

Build a preconditioning matrix K such that Kx=b is much easier to solve than Ax=b and K is somehow “close” to A (incomplete LU decompositions, sparse approximate inverses, polynomial preconditioners, preconditioning by blocks or domains, element-by-element, etc). See Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods.
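As a minimal sketch of the preconditioning idea, the code below uses the simplest possible choice, K = diag(A) (a Jacobi preconditioner, chosen here only for illustration; it is not one of the preconditioners named above), inside a Richardson iteration x <- x + K^{-1}(b - Ax). The function names and the small test matrix are assumptions for the example.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def jacobi_preconditioned_richardson(A, b, tol=1e-10, max_iter=500):
    """Iterate x <- x + K^{-1} (b - A x) with K = diag(A).

    Solving K z = r costs one division per unknown, so K is much easier
    to solve with than A, and K is "close" to A when A is strongly
    diagonally dominant.
    """
    n = len(b)
    diag = [A[i][i] for i in range(n)]
    x = [0.0] * n
    for iteration in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # residual b - A x
        if max(abs(ri) for ri in r) < tol:
            return x, iteration
        z = [ri / di for ri, di in zip(r, diag)]            # solve K z = r
        x = [xi + zi for xi, zi in zip(x, z)]
    return x, max_iter

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]
    x, iterations = jacobi_preconditioned_richardson(A, b)
    print(x, iterations)
```

This iteration converges only when K approximates A well enough (for example, for strictly diagonally dominant A); production codes pair stronger preconditioners with Krylov methods.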


Components: simple example

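Numerical integration: midpoint

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{j=1}^{n} f\!\left(\frac{x_{j-1}+x_j}{2}\right)$$

Numerical integration: Monte Carlo

$$\int_a^b f(x)\,dx \approx (b-a)\,\frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

Example integrands: $f_1(x) = x^2$, $f_2(x) = 2x$, $f_3(x) = \dfrac{4}{1+x^2}$

[Figure: f(x) plotted against x over the interval from a to b.]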
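Both integration rules on this slide fit in a few lines each. The sketch below is an illustration only: the function names are made up, the midpoint rule assumes n equal subintervals, and the Monte Carlo rule assumes sample points drawn uniformly from [a, b].

```python
import math
import random

def midpoint(f, a, b, n):
    """Midpoint rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    # The midpoint of the j-th subinterval [x_{j-1}, x_j] is a + (j - 1/2) h.
    return h * sum(f(a + (j - 0.5) * h) for j in range(1, n + 1))

def monte_carlo(f, a, b, n_samples, rng):
    """Monte Carlo estimate: (b - a) times the mean of f at uniform random points."""
    total = sum(f(rng.uniform(a, b)) for _ in range(n_samples))
    return (b - a) * total / n_samples

def f3(x):
    """Integrand whose integral over [0, 1] equals pi."""
    return 4.0 / (1.0 + x * x)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("midpoint   :", midpoint(f3, 0.0, 1.0, 1000))
    print("monte carlo:", monte_carlo(f3, 0.0, 1.0, 100_000, rng))
    print("pi         :", math.pi)
```

The two functions share the signature pattern (integrand, interval, resolution), so either can be swapped for the other without touching the caller, which is the point of treating them as interchangeable components.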


The DOE ACTS Collection

Goals
• Collection of tools for developing parallel applications
• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support ([email protected])
• Maintain ACTS information center (http://acts.nersc.gov)
• Coordinate efforts with other supercomputing centers
• Enable large scale scientific applications
• Educate and train

• High Performance Tools
  • portable
  • library calls
  • robust algorithms
  • help code optimization
• More code development in less time
• More simulation in less computer time

Levels of Support
• High
  • Intermediate level
  • Tool expertise
  • Conduct tutorials
• Intermediate
  • Basic level
  • Higher level of support to users of the tool
• Basic
  • Help with installation
  • Basic knowledge of the tools
  • Compilation of user’s reports

http://acts.nersc.gov


Current ACTS Tools and their Functionalities

Numerical
• Aztec: Algorithms for the iterative solution of large sparse linear systems.
• Hypre: Algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters.
• PETSc: Tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations.
• OPT++: Object-oriented nonlinear optimization package.
• SUNDIALS: Solvers for the solution of systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
• ScaLAPACK: Library of high performance dense linear algebra routines for distributed-memory message-passing.
• SuperLU: General-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.
• TAO: Large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization.

Code Development
• Global Arrays: Library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays.
• Overture: Object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries.

Code Execution
• CUMULVS: Framework that enables programmers to incorporate fault-tolerance, interactive visualization and computational steering into existing parallel programs.
• TAU: Set of tools for analyzing the performance of C, C++, Fortran and Java programs.

Library Development
• ATLAS: Tools for the automatic generation of optimized numerical software for modern computer architectures and compilers.


Use of ACTS Tools

Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning), eigenvalue problems solved with ScaLAPACK.

Advanced Computational Research in Fusion (SciDAC Project, PI Mitch Pindzola). Point of contact: Dario Mitnik (Dept. of Physics, Rollins College). Mitnik attended the workshop on the ACTS Collection in September 2000. Since then he has been actively using some of the ACTS tools, in particular ScaLAPACK, for which he has provided insightful feedback. Dario is currently working on the development, testing and support of new scientific simulation codes related to the study of atomic dynamics using time-dependent close coupling lattice and time-independent methods. He reports that this work could not be carried out on sequential machines and that ScaLAPACK is fundamental for the parallelization of these codes.

The international BOOMERanG collaboration announced results of the most detailed measurement of the cosmic microwave background radiation (CMB), which strongly indicated that the universe is flat (Apr. 27, 2000). Likelihood methods implemented in the MADCAP software package, using routines from ScaLAPACK, were used to examine the large dataset generated by BOOMERanG.

Performance of four science-of-scale applications that use ScaLAPACK functionalities on an IBM SP.


Use of ACTS Tools

3D overlapping grid for a submarine produced with Overture’s module ogen.

Model of a "hard" sphere included in a "soft" material, 26 million d.o.f. Unstructured meshes in solid mechanics using Prometheus and PETSc (Adams and Demmel).

Multiphase flow using PETSc, 4 million cell blocks, 32 million DOF, over 10.6 Gflops on an IBM SP (128 nodes), entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler).

3D incompressible Euler, tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson), fully implicit steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).

Electronic structure optimization performed with TAO, (UO2)3(CO3)6 (courtesy of deJong).

Molecular dynamics and thermal flow simulation using codes based on Global Arrays. GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.


Use of ACTS Tools

Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning), eigenvalue problems solved with ScaLAPACK.

OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5, courtesy of Meza, Oliva et al.).

Omega3P is a parallel distributed-memory code intended for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed for the solution of a problem of order 7.5 million with 304 million nonzeros. Finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.

Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve linear systems in AORSA (Batchelor et al.), a code based on spectral algorithms that is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.


ScaLAPACK: software structure

Global
• ScaLAPACK: Linear systems, least squares, singular value decomposition, eigenvalues.
• PBLAS: Parallel BLAS.

Local
• LAPACK
• BLACS: Communication routines targeting linear algebra operations.

Platform specific
• BLAS: Clarity, modularity, performance and portability. Atlas can be used here for automatic tuning.
• MPI/PVM/...: Communication layer (message passing).

http://acts.nersc.gov/scalapack

Version 1.7 released in August 2001; recent NSF funding for further development.


PBLAS (Parallel Basic Linear Algebra Subroutines)

• Similar to the BLAS in portability, functionality and naming:
  • Level 1: vector-vector operations
  • Level 2: matrix-vector operations
  • Level 3: matrix-matrix operations
• Built atop the BLAS and BLACS
• Provide global view of the matrix operands

BLAS:  CALL DGEXXX( M, N, A( IA, JA ), LDA, ... )
PBLAS: CALL PDGEXXX( M, N, A, IA, JA, DESCA, ... )

DESCA is the array descriptor (see next slides). The PBLAS call operates on the M x N submatrix A(IA:IA+M-1,JA:JA+N-1), which starts at row IA and column JA of the M_ x N_ global matrix.


BLACS (Basic Linear Algebra Communication Subroutines)

• A design tool: a conceptual aid in design and coding.
• Associate widely recognized mnemonic names with communication operations. This improves:
  • program readability
  • the self-documenting quality of the code.
• Promote efficiency by identifying frequently occurring operations of linear algebra which can be optimized on various computers.


BLACS: basics

• Processes are embedded in a two-dimensional grid.
• An operation which involves more than one sender and one receiver is called a scoped operation.

Example: a 3x4 grid (process numbers by grid row and grid column)

         0    1    2    3
    0    0    1    2    3
    1    4    5    6    7
    2    8    9   10   11

Scope and meaning
• Row: All processes in a process row participate.
• Column: All processes in a process column participate.
• All: All processes in the process grid participate.
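The grid and the three scopes can be mimicked in a short sketch. It assumes the row-major numbering of the 3x4 example; `grid_coords` and `scope_members` are illustrative helpers, not BLACS routines.

```python
def grid_coords(rank, nprow, npcol):
    """Map a process number to (row, column) in a row-major nprow x npcol grid."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return divmod(rank, npcol)

def scope_members(rank, nprow, npcol, scope):
    """List the process numbers that take part in a scoped operation with `rank`."""
    row, col = grid_coords(rank, nprow, npcol)
    if scope == "Row":
        return [row * npcol + c for c in range(npcol)]
    if scope == "Column":
        return [r * npcol + col for r in range(nprow)]
    if scope == "All":
        return list(range(nprow * npcol))
    raise ValueError("scope must be 'Row', 'Column' or 'All'")

if __name__ == "__main__":
    # Process 6 sits in grid row 1, grid column 2 of the 3x4 example.
    print(grid_coords(6, 3, 4))
    print(scope_members(6, 3, 4, "Row"))
    print(scope_members(6, 3, 4, "Column"))
```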


ScaLAPACK: data layouts

• 1D block and cyclic column distributions
• 1D block-cyclic column and 2D block-cyclic distribution
  • 2D block-cyclic used in ScaLAPACK for dense matrices
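The three 1D column layouts differ only in the rule that assigns a column to a process. A minimal sketch, assuming 0-based column and process indices; the function names are illustrative.

```python
def owner_block(j, n, p):
    """1D block: each process owns one contiguous chunk of ceil(n / p) columns."""
    chunk = -(-n // p)  # ceiling division
    return j // chunk

def owner_cyclic(j, p):
    """1D cyclic: columns are dealt out one at a time, round-robin."""
    return j % p

def owner_block_cyclic(j, nb, p):
    """1D block-cyclic: blocks of nb columns are dealt out round-robin."""
    return (j // nb) % p

if __name__ == "__main__":
    n, p = 8, 4
    print("block       :", [owner_block(j, n, p) for j in range(n)])
    print("cyclic      :", [owner_cyclic(j, p) for j in range(n)])
    print("block-cyclic:", [owner_block_cyclic(j, 2, 2) for j in range(n)])
```

Block-cyclic with nb = 1 reduces to the cyclic layout, and with nb = ceil(n / p) to the block layout, which is why it serves as the general case.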


ScaLAPACK: 2D Block-Cyclic Distribution

5x5 matrix partitioned in 2x2 blocks:

    a11 a12 | a13 a14 | a15
    a21 a22 | a23 a24 | a25
    --------+---------+----
    a31 a32 | a33 a34 | a35
    a41 a42 | a43 a44 | a45
    --------+---------+----
    a51 a52 | a53 a54 | a55

2x2 process grid point of view (processes 0 and 1 in the first grid row, 2 and 3 in the second):

    Process 0          Process 1
    a11 a12 a15        a13 a14
    a21 a22 a25        a23 a24
    a51 a52 a55        a53 a54

    Process 2          Process 3
    a31 a32 a35        a33 a34
    a41 a42 a45        a43 a44
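The process-grid view of the 5x5 example can be reproduced with a short sketch. It assumes 0-based global indices internally, 2x2 blocks, and that the first block lands on process row 0 and process column 0; `owner_2d` and `local_views` are illustrative names.

```python
def owner_2d(i, j, mb, nb, nprow, npcol):
    """Process (row, column) that owns global entry (i, j), 0-based indices."""
    return ((i // mb) % nprow, (j // nb) % npcol)

def local_views(m, n, mb, nb, nprow, npcol):
    """Return {(prow, pcol): local matrix of entry labels} for an m x n matrix."""
    views = {}
    for prow in range(nprow):
        rows = [i for i in range(m) if (i // mb) % nprow == prow]
        for pcol in range(npcol):
            cols = [j for j in range(n) if (j // nb) % npcol == pcol]
            views[(prow, pcol)] = [[f"a{i + 1}{j + 1}" for j in cols] for i in rows]
    return views

if __name__ == "__main__":
    views = local_views(5, 5, 2, 2, 2, 2)
    for (prow, pcol), block in sorted(views.items()):
        print(f"process {prow * 2 + pcol} at grid position ({prow}, {pcol}):")
        for row in block:
            print("   ", " ".join(row))
```

Rows 1, 2 and 5 end up on process row 0 because the third row block wraps around the two process rows; the same wrap-around applies to the columns.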


2D Block-Cyclic Distribution

http://acts.nersc.gov/scalapack/hands-on/datadist.html


ScaLAPACK: array descriptors

• Each global data object is assigned an array descriptor
• The array descriptor:
  • Contains information required to establish mapping between a global array entry and its corresponding process and memory location (uses concept of BLACS context).
  • Is differentiated by the DTYPE_ (first entry) in the descriptor.
  • Provides a flexible framework to easily specify additional data distributions or matrix types.
• User must distribute all global arrays prior to the invocation of a ScaLAPACK routine, for example:
  • Each process generates its own submatrix.
  • One processor reads the matrix from a file and sends pieces to other processors (may require message-passing for this).

SUBROUTINE PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )


Array Descriptor for Dense Matrices

Entries of the descriptor (DESC_() index, symbolic name, scope, definition):
• 1, DTYPE_A (global): Descriptor type; DTYPE_A=1 for dense matrices.
• 2, CTXT_A (global): BLACS context handle.
• 3, M_A (global): Number of rows in global array A.
• 4, N_A (global): Number of columns in global array A.
• 5, MB_A (global): Blocking factor used to distribute the rows of array A.
• 6, NB_A (global): Blocking factor used to distribute the columns of array A.
• 7, RSRC_A (global): Process row over which the first row of the array A is distributed.
• 8, CSRC_A (global): Process column over which the first column of the array A is distributed.
• 9, LLD_A (local): Leading dimension of the local array.
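The only local entry, LLD_A, depends on how many rows of A land on the calling process. The sketch below is a Python transcription of the counting logic behind ScaLAPACK's NUMROC tool routine, together with a plain dictionary standing in for the nine descriptor entries. It is an illustration under those assumptions, not a binding to the library, and `make_descriptor` is a hypothetical helper.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows (or columns) of a block-cyclically distributed dimension owned by iproc.

    n: global extent, nb: blocking factor, isrcproc: process holding the
    first block, nprocs: number of processes along this grid dimension.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # blocks every process receives
    extra = nblocks % nprocs                       # leftover complete blocks
    if mydist < extra:
        count += nb                                # one more complete block
    elif mydist == extra:
        count += n % nb                            # the trailing partial block
    return count

def make_descriptor(ctxt, m, n, mb, nb, rsrc, csrc, lld):
    """Hypothetical stand-in for the nine descriptor entries of a dense matrix."""
    return {"DTYPE_": 1, "CTXT_": ctxt, "M_": m, "N_": n, "MB_": mb,
            "NB_": nb, "RSRC_": rsrc, "CSRC_": csrc, "LLD_": lld}

if __name__ == "__main__":
    # 5x5 matrix, 2x2 blocks, 2x2 grid, seen from process row 0.
    local_rows = numroc(5, 2, 0, 0, 2)
    desc = make_descriptor(ctxt=0, m=5, n=5, mb=2, nb=2, rsrc=0, csrc=0,
                           lld=max(1, local_rows))
    print(local_rows, desc)
```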


ScaLAPACK: Functionality

Ax = b
• Matrix types: Triangular, SPD, SPD Banded, SPD Tridiagonal, General, General Banded, General Tridiagonal
• Routine types: Simple Driver, Expert Driver, Factor, Solve, Inversion, Conditioning Estimator, Iterative Refinement

Ax = λx or Ax = λBx, least squares and related problems
• Problem types: Least Squares, GQR, GRQ, Symmetric, General, Generalized BSPD, SVD
• Routine types: Simple Driver, Expert Driver, Reduction, Solution

[Table: an "x" marks each routine type available for a given matrix or problem type.]


On line tutorial: http://acts.nersc.gov/scalapack/hands-on/main.html


Global Arrays (GA) Wrappers

• Simpler than message-passing for many applications

• Complete environment for parallel code development

• Data locality control similar to distributed memory/message passing model

• Compatible with MPI

• Scalable

Distributed Data: data is explicitly associated with each processor; accessing data requires specifying the location of the data on the processor and the processor itself.

Shared Memory: data is in a globally accessible address space; any processor can access data by specifying its location using a global index.

GA: distributed dense arrays that can be accessed in a shared-memory-like style.

http://www.emsl.pnl.gov/docs/global/ga.html
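The difference between the two access styles can be illustrated with a toy, single-process model. This is not the Global Arrays API: the class, its methods and the block layout are all assumptions made for the example, and the "processes" are just Python lists.

```python
class ToyGlobalArray:
    """1D array stored as contiguous blocks, one block per simulated process."""

    def __init__(self, n, nprocs):
        self.n = n
        self.chunk = -(-n // nprocs)  # ceiling division: block length
        self.blocks = [
            [0.0] * max(0, min(self.chunk, n - p * self.chunk))
            for p in range(nprocs)
        ]

    def locate(self, i):
        """Distributed-data view: (owning process, offset inside its block)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return divmod(i, self.chunk)

    def put(self, lo, values):
        """Shared-memory-like view: write values starting at global index lo."""
        for k, value in enumerate(values):
            p, offset = self.locate(lo + k)
            self.blocks[p][offset] = value

    def get(self, lo, hi):
        """Shared-memory-like view: read global indices lo..hi-1, whoever owns them."""
        result = []
        for i in range(lo, hi):
            p, offset = self.locate(i)
            result.append(self.blocks[p][offset])
        return result

if __name__ == "__main__":
    ga = ToyGlobalArray(10, 4)
    ga.put(2, [1.0, 2.0, 3.0])  # spans the blocks of processes 0 and 1
    print(ga.get(2, 5), ga.locate(4))
```

The caller of `put` and `get` uses only global indices, while `locate` exposes the data locality that the distributed-data model forces on every access.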


TAU: Tuning and Performance Analysis

• Multi-level performance instrumentation
  • Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
  • Computer system architectures and operating systems
  • Different programming languages and compilers
• Support for multiple parallel programming paradigms
  • Multi-threading, message passing, mixed-mode, hybrid
• Support for performance mapping
• Support for object-oriented and generic programming
• Integration in complex software systems and applications


Definitions – Profiling

• Profiling
  • Recording of summary information during execution
    • inclusive, exclusive time, # calls, hardware statistics, …
  • Reflects performance behavior of program entities
    • functions, loops, basic blocks
    • user-defined “semantic” entities
  • Very good for low-cost performance assessment
  • Helps to expose performance bottlenecks and hotspots
  • Implemented through
    • sampling: periodic OS interrupts or hardware counter traps
    • instrumentation: direct insertion of measurement code
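Instrumentation-based profiling can be sketched with a decorator that inserts measurement code around each function. This is a generic illustration, not TAU's implementation; the `Profiler` class and its attribute names are assumptions, and the sketch is single-threaded.

```python
import time
from collections import defaultdict
from functools import wraps

class Profiler:
    """Summary statistics per function: call count, inclusive and exclusive time."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.inclusive = defaultdict(float)  # time in the function and its callees
        self.exclusive = defaultdict(float)  # time in the function body only
        self._child_time = [0.0]             # stack: callee time of each active frame

    def instrument(self, fn):
        name = fn.__name__

        @wraps(fn)
        def wrapper(*args, **kwargs):
            self._child_time.append(0.0)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                children = self._child_time.pop()
                self.calls[name] += 1
                self.inclusive[name] += elapsed
                self.exclusive[name] += elapsed - children
                self._child_time[-1] += elapsed  # report our time to the caller

        return wrapper

if __name__ == "__main__":
    profiler = Profiler()

    @profiler.instrument
    def inner():
        return sum(range(100_000))

    @profiler.instrument
    def outer():
        return inner() + inner()

    outer()
    for name in profiler.calls:
        print(name, profiler.calls[name], profiler.inclusive[name], profiler.exclusive[name])
```

Only summary counters are kept, so the memory cost stays constant however long the program runs; for recursive functions the inclusive time in this sketch is counted once per active call.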


Definitions – Tracing

• Tracing
  • Recording of information about significant points (events) during program execution
    • entering/exiting code region (function, loop, block, …)
    • thread/process interactions (e.g., send/receive message)
  • Save information in event record
    • timestamp
    • CPU identifier, thread identifier
    • Event type and event-specific information
  • Event trace is a time-sequenced stream of event records
  • Can be used to reconstruct dynamic program behavior
  • Typically requires code instrumentation
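An event trace can be sketched as a list of records appended at instrumented points. This is a generic illustration rather than TAU's trace format; the `Tracer` class and the record fields are assumptions, and the CPU identifier is left out because the Python standard library has no portable way to obtain it.

```python
import threading
import time
from contextlib import contextmanager

class Tracer:
    """Collects a time-sequenced stream of event records."""

    def __init__(self):
        self.events = []

    def record(self, event_type, region, **info):
        self.events.append({
            "timestamp": time.perf_counter(),
            "thread": threading.get_ident(),
            "type": event_type,   # e.g. "enter", "exit", "send"
            "region": region,
            "info": info,         # event-specific information
        })

    @contextmanager
    def region(self, name):
        """Instrument a code region: record its entry and exit."""
        self.record("enter", name)
        try:
            yield
        finally:
            self.record("exit", name)

if __name__ == "__main__":
    tracer = Tracer()
    with tracer.region("solve"):
        with tracer.region("exchange"):
            tracer.record("send", "exchange", dest=1, nbytes=8)
    for event in tracer.events:
        print(event["type"], event["region"], event["info"])
```

Unlike a profile, the trace grows with every event, which is the price of being able to reconstruct the dynamic behavior afterwards.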


TAU: Example 1 (1/4)

http://acts.nersc.gov/tau/programs/psgesv


TAU: Example 1 (2/4)


TAU: Example 1 (3/4)

      PROGRAM PSGESVDRIVER
!
!     Example Program solving Ax=b via ScaLAPACK routine PSGESV
!
!     .. Parameters ..
!**** a bunch of things omitted for the sake of space ****
!     .. Executable Statements ..
!
!     INITIALIZE THE PROCESS GRID
!
      integer profiler(2)
      save profiler

      call TAU_PROFILE_INIT()
      call TAU_PROFILE_TIMER(profiler,'PSGESVDRIVER')
      call TAU_PROFILE_START(profiler)
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
!**** a bunch of things omitted for the sake of space ****
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, &
                   INFO )
!**** a bunch of things omitted for the sake of space ****
      call TAU_PROFILE_STOP(profiler)
      STOP
      END

psgesvdriver.int.f90

NB. ScaLAPACK routines have not been instrumented and therefore are not shown in the charts.


TAU: Example 2 (1/2)

http://acts.nersc.gov/tau/programs/pdgssvx

tau-multiplecounters-mpi-papi-pdt


TAU: Example 2 (2/2)

PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact [email protected] for the corresponding TAU events). In this example we are just measuring FLOPS.

PARAPROF


Who Benefits from these tools?

... More Applications …

http://acts.nersc.gov/AppMat

Enabling sciences and discoveries… with high performance and scalability...


• Tool descriptions, installation details, examples, etc.
• Agenda, accomplishments, conferences, releases, etc.
• Goals and other relevant information
• Points of contact
• Search engine

• High Performance Tools
  • portable
  • library calls
  • robust algorithms
  • help code optimization
• Scientific Computing Centers
  • Reduce users’ code development time, which adds up to more production runs and faster, more effective scientific research results
  • Overall better system utilization
  • Facilitate the accumulation and distribution of high performance computing expertise
  • Provide better scientific parameters for procurement and characterization of specific user needs

VECPAR 2006, ACTS Workshop 2006

http://acts.nersc.gov


Journals Featuring ACTS Tools

• An Overview of the Advanced CompuTational Software (ACTS) Collection, by T. Drummond and O. Marques.
• SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers, by A. Hindmarsh, P. Brown, K. Grant, S. Lee, R. Serban, D. Shumaker and C. Woodward.
• An Overview of SuperLU: Algorithms, Implementation, and User Interface, by X. Li.
• SLEPc: A Scalable and Flexible Toolkit for the Solution of Eigenvalue Problems, by V. Hernandez, J. Roman and V. Vidal.
• An Overview of the Trilinos Project, by M. Heroux, R. Bartlett, V. Howle, R. Hoekstra, J. Hu, T. Kolda, R. Lehoucq, K. Long, R. Pawlowski, E. Phipps, A. Salinger, H. Thornquist, R. Tuminaro, J. Willenbring, A. Williams and K. Stanley.
• Pursuing Scalability for hypre's Conceptual Interfaces, by R. Falgout, J. Jones and U. Yang.
• A Component Architecture for High-Performance Scientific Computing, by D. Bernholdt, B. Allan, R. Armstrong, F. Bertrand, K. Chiu, T. Dahlgreen, K. Damevski, W. Elwasif, T. Epperly, M. Govindaraju, D. Saltz, J. Kohl, M. Krishnan, G. Kumfert, J. Larson, S. Lefantzi, M. Lewis, A. Malony, L. McInnes, J. Nieplocha, B. Norris, S. Parker, J. Ray, S. Shende, T. Windus and S. Zhou.
• CUMULVS: Interacting with High-Performance Scientific Simulations, for Visualization, Steering and Fault Tolerance, by J. Kohl, T. Wilde and D. Bernholdt.
• Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit, by J. Nieplocha, B. Palmer, V. Tipparaju, M. Krishnan, H. Trease and E. Aprà.
• The TAU Parallel Performance System, by S. Shende and A. Malony.
• High Performance Remote Memory Access Communication: The ARMCI Approach, by J. Nieplocha, V. Tipparaju, M. Krishnan and D. Panda.

Spring 2006 Issue

September 2005 Issue


ACTS Numerical Tools: Functionality

Computational problem: Systems of Linear Equations
Methodology: Direct Methods (algorithm: library)
• LU Factorization: ScaLAPACK (dense), SuperLU (sparse)
• Cholesky Factorization: ScaLAPACK
• LDLT (Tridiagonal matrices): ScaLAPACK
• QR Factorization: ScaLAPACK
• QR with column pivoting: ScaLAPACK
• LQ factorization: ScaLAPACK


ACTS Numerical Tools: Functionality

Computational problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods (algorithm: library)
• Conjugate Gradient: AztecOO (Trilinos), PETSc
• GMRES: AztecOO, PETSc, Hypre
• CG Squared: AztecOO, PETSc
• Bi-CG Stab: AztecOO, PETSc
• Quasi-Minimal Residual (QMR): AztecOO
• Transpose Free QMR: AztecOO, PETSc


ACTS Numerical Tools: Functionality

Computational problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods (cont.) (algorithm: library)
• SYMMLQ: PETSc
• Preconditioned CG: AztecOO, PETSc, Hypre
• Richardson: PETSc
• Block Jacobi Preconditioner: AztecOO, PETSc, Hypre
• Point Jacobi Preconditioner: AztecOO
• Least Squares Polynomials: PETSc


ACTS Numerical Tools: Functionality

Computational problem: Systems of Linear Equations (cont.)
Methodology: Iterative Methods (cont.) (algorithm: library)
• SOR Preconditioning: PETSc
• Overlapping Additive Schwarz: PETSc
• Approximate Inverse: Hypre
• Sparse LU preconditioner: AztecOO, PETSc, Hypre
• Incomplete LU (ILU) preconditioner: AztecOO
• Least Squares Polynomials: PETSc
Methodology: MultiGrid (MG) Methods (algorithm: library)
• MG Preconditioner: PETSc, Hypre
• Algebraic MG: Hypre
• Semi-coarsening: Hypre

Page 40: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Numerical Tools: Functionality

| Computational Problem | Methodology | Algorithm | Library |
|---|---|---|---|
| Linear Least Squares Problems | Least Squares | $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK |
| | Minimum Norm Solution | $\min_x \lVert x \rVert_2$ | ScaLAPACK |
| | Minimum Norm Least Squares | $\min_x \lVert x \rVert_2$ and $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK |
| Standard Eigenvalue Problem | Symmetric Eigenvalue Problem | $Az = \lambda z$, for $A = A^H$ or $A = A^T$ | ScaLAPACK (dense), SLEPc (sparse) |
| Singular Value Problem | Singular Value Decomposition | $A = U \Sigma V^T$, $A = U \Sigma V^H$ | ScaLAPACK (dense), SLEPc (sparse) |
| Generalized Symmetric Definite Eigenproblem | Eigenproblem | $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$ | ScaLAPACK (dense), SLEPc (sparse) |
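The standard symmetric eigenvalue problem $Az = \lambda z$ listed above can be illustrated with power iteration, one of the simplest eigensolvers. This is a sketch under stated assumptions, not the method used by ScaLAPACK or SLEPc: it assumes a real symmetric matrix with a single dominant eigenvalue and a starting vector of all ones that is not orthogonal to the dominant eigenvector. The function names are hypothetical.

```python
def matvec(a, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def power_iteration(a, iters=200):
    """Approximate the dominant eigenpair (lam, z) of a symmetric matrix a.

    Illustrative sketch: repeated multiplication by a, with normalization,
    followed by a Rayleigh quotient estimate of the eigenvalue.
    """
    z = [1.0] * len(a)
    lam = 0.0
    for _ in range(iters):
        w = matvec(a, z)
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm == 0.0:
            raise ValueError("iterate collapsed to zero; choose another start")
        z = [wi / norm for wi in w]      # unit 2-norm iterate
        az = matvec(a, z)
        lam = sum(zi * azi for zi, azi in zip(z, az))  # Rayleigh quotient z^T A z
    return lam, z
```

The returned pair can be checked against the defining relation by confirming that `matvec(a, z)` is close to `lam` times `z`.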

Page 41: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Numerical Tools: Functionality

| Computational Problem | Methodology | Algorithm | Library |
|---|---|---|---|
| Non-Linear Equations | Newton Based | Line Search | PETSc |
| | | Trust Regions | PETSc |
| | | Pseudo-Transient Continuation | PETSc |
| | | Matrix Free | PETSc |
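The "Newton based, line search" entry above can be sketched for a scalar equation $f(x) = 0$. The code below is a simplified illustration with a hypothetical function name: it uses a plain backtracking rule that halves the step until the residual decreases, which is an assumption of this sketch and not a description of the line search strategies implemented in PETSc.

```python
def newton_line_search(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of the scalar function f, given its derivative df.

    Illustrative sketch: Newton steps damped by backtracking so that
    |f| decreases at every accepted step. Assumes df(x) is nonzero
    at the iterates.
    """
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            return x
        step = -fx / df(x)  # full Newton step
        t = 1.0
        # Backtrack: halve the step length until the residual goes down.
        while abs(f(x + t * step)) >= abs(fx) and t > 1e-10:
            t *= 0.5
        x += t * step
    return x
```

The damping matters when the starting point is far from the root. For $f(x) = x / \sqrt{1 + x^2}$ the undamped Newton update maps $x$ to $-x^3$, which diverges from any start with $|x| > 1$, while the damped iteration above still converges from $x_0 = 2$.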

Page 42: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Numerical Tools: Functionality

| Computational Problem | Methodology | Algorithm | Library |
|---|---|---|---|
| Non-Linear Optimization | Newton Based | Newton | OPT++, TAO |
| | | Finite-Difference Newton | OPT++, TAO |
| | | Quasi-Newton | OPT++, TAO |
| | | Non-linear Interior Point | OPT++, TAO |
| | CG | Standard Non-linear CG | OPT++, TAO |
| | | Limited Memory BFGS | OPT++ |
| | | Gradient Projections | TAO |
| | Direct Search | No derivative information | OPT++ |
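The last entry above, direct search without derivative information, can be illustrated with compass (coordinate) search, one of the simplest members of that family. This is a sketch with hypothetical names and is not the direct search algorithm shipped with OPT++: it only evaluates the objective, tries a step of fixed length along each coordinate direction, and halves the step length when no trial point improves.

```python
def compass_search(f, x0, step=1.0, tol=1e-8, max_iter=10000):
    """Minimize f (a function of a list of floats) without derivatives.

    Illustrative sketch of compass search: poll +/- step along each
    coordinate, accept any improvement, shrink the step otherwise.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                trial = x[:]
                trial[i] += d
                ft = f(trial)
                if ft < fx:          # accept the first improving poll point
                    x, fx = trial, ft
                    improved = True
                    break
        if not improved:
            step *= 0.5              # no better neighbor: refine the mesh
    return x, fx
```

Methods of this kind are useful when gradients are unavailable or unreliable, at the cost of many more function evaluations than the Newton-based entries in the table.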

Page 44: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Numerical Tools: Functionality

| Computational Problem | Methodology | Algorithm | Library |
|---|---|---|---|
| Non-Linear Optimization (cont.) | Semismoothing | Feasible Semismooth | TAO |
| | | Infeasible Semismooth | TAO |
| Ordinary Differential Equations | Integration | Adams-Moulton (variable coefficient forms) | CVODE (SUNDIALS), CVODES |
| | Backward Differentiation Formula | Direct and Iterative Solvers | CVODE, CVODES |
| Nonlinear Algebraic Equations | Inexact Newton | Line Search | KINSOL (SUNDIALS) |
| Differential Algebraic Equations | Backward Differentiation Formula | Direct and Iterative Solvers | IDA (SUNDIALS) |
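The backward differentiation formulas listed above are implicit: each step requires solving an equation for the new value. A minimal sketch of the first-order member of the family (BDF1, the backward Euler method) for a scalar ODE $y' = f(t, y)$ follows. The function name and the fixed step size are assumptions of this illustration; CVODE and IDA use variable-order, variable-step formulas and their own nonlinear and linear solvers, none of which is reproduced here.

```python
def backward_euler(f, dfdy, t0, y0, h, steps):
    """Integrate y' = f(t, y) with fixed-step backward Euler (BDF1).

    Illustrative sketch: each step solves u - y - h * f(t + h, u) = 0
    for u with a few Newton iterations; dfdy is the partial derivative
    of f with respect to y.
    """
    t, y = t0, y0
    for _ in range(steps):
        t_next = t + h
        u = y  # predictor: start Newton from the previous value
        for _ in range(20):
            g = u - y - h * f(t_next, u)       # residual of the implicit equation
            dg = 1.0 - h * dfdy(t_next, u)     # its derivative with respect to u
            delta = g / dg
            u -= delta
            if abs(delta) <= 1e-14 * (1.0 + abs(u)):
                break
        t, y = t_next, u
    return t, y
```

For the stiff test equation $y' = -1000\,y$ with $h = 0.01$, each step multiplies the solution by $1/(1 + 1000h) = 1/11$, so the numerical solution decays for any positive step size. An explicit Euler step of the same size would multiply it by $1 - 1000h = -9$ and blow up, which is the usual motivation for BDF methods on stiff problems.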

Page 45: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Tools: Functionality

| Computational Problem | Support | Techniques | Library |
|---|---|---|---|
| Writing Parallel Programs | Distributed Arrays | Shared-Memory | Global Arrays |
| | | Distributed Memory | CUMULVS (viz), Globus (Grid) |
| | | Grid Generation | OVERTURE |
| | | Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE, PETSc |
| | | Semi-Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE |
| | Distributed Computing | GRID | Globus |
| | | Remote Steering | CUMULVS |
| | | Coupling | PAWS |

Page 46: - ACTS - A Reliable Software Infrastructure for Scientific Computing Osni Marques Lawrence Berkeley National Laboratory (LBNL) oamarques@lbl.gov UC Berkeley.

ACTS Tools: Functionality

| Computational Problem | Support | Technique | Library |
|---|---|---|---|
| Writing Parallel Programs (cont.) | Distributed Computing | Check-point/restart | CUMULVS |
| Profiling | Algorithmic Performance | Automatic Instrumentation | PETSc |
| | | User Instrumentation | PETSc |
| | Execution Performance | Automatic Instrumentation | TAU |
| | | User Instrumentation | TAU |
| Code Optimization | Library Installation | Linear Algebra Tuning | ATLAS |
| Interoperability | Code Generation | Language | BABEL, CHASM |
| | | Components | CCA |