Introduction to scientific computing using PETSc and Trilinos

Introduction to scientific computing using PETSc and Trilinos
Václav Hapla, David Horák, Michal Merta
PRACE Spring School, Cracow 2012


PETSc and Trilinos
Václav Hapla, David Horák, Michal Merta
many complex but well-known and often-used algorithms (LU, CG, matrix-vector multiply, …) have been already implemented, tested and are ready to use!
a software framework is software that provides generic functionality which can be selectively changed by user code, thus producing application-specific software (wikipedia.org)
motivation: programmers should focus on new, original algorithms that add value
Frameworks for scientific computing – why?
are parallelized on the data level (vectors & matrices) using MPI
use BLAS and LAPACK – de facto standard for dense LA
have their own implementation of sparse BLAS
include robust preconditioners, linear solvers (direct and iterative) and nonlinear solvers
can cooperate with many other external solvers and libraries (e.g. MATLAB, MUMPS, UMFPACK, …)
already support CUDA and hybrid parallelization
are licensed as open-source
Both PETSc and Trilinos…
PETSc
„essential object orientation“
for programmers used to procedural programming but seeking modular code
recommended for C and FORTRAN users
Trilinos
„pure object orientation“
for programmers who are not scared of OOP, appreciate good SW design and have some experience with C++
extensibility and reusability
Linear solvers
Debugging & profiling
PETSc = Portable, Extensible Toolkit for Scientific computation
developed by Argonne National Laboratory since 1991
data structures and routines for the scalable parallel solution of scientific applications modeled by PDE
coded primarily in C language but good FORTRAN support, can also be called from C++ and Python codes
homepage: www.mcs.anl.gov/petsc
PETSc project (1)
code and mailing lists open to anybody
portable to any parallel system supporting MPI
tightly coupled systems (Cray XT5, BG/P, Earth Simulator, Sun Blade, SGI Altix)
loosely coupled systems, such as networks of workstations (Linux, Windows, IBM, Mac, Sun)
iPhone support
PETSc project (2)
Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet.
Barry Smith
(PETSc Team)
Role of PETSc
„We will continually add new features and enhanced functionality in upcoming releases; small changes in usage and calling sequences of PETSc routines will continue to occur. Although keeping one's code accordingly up-to-date can be annoying, all PETSc users will be rewarded in the long run with a cleaner, better designed, and easier-to-use interface.“
Changes
Documentation
PETSc users manual – PDF (fully searchable, hypertext)
help topics – general topics such as „error handling“, „multigrid“, „shared memory“
manual pages – individual routines, split into 4 categories:
Beginner - basic usage
Intermediate - setting options for algorithms and data structures
Advanced - setting more advanced options and customization
Developer - interfaces intended primarily for library developers
MPI provides low-level tools to exchange data primitives between processes
PETSc provides medium-level tools such as inserting a matrix element at an arbitrary location or performing a
parallel matrix-vector product
but you can call arbitrary MPI routine directly if needed
same code for sequential and parallel runs
Parallelism in PETSc
Dense LA: BLAS, LAPACK, BLACS, ScaLAPACK, PLAPACK
Graphs & load balancing: ParMetis, Chaco, Jostle, Party, Scotch, Zoltan
Direct linear solvers: MUMPS, Spooles, SuperLU, SuperLU_Dist, UMFPack
PETSc cooperates with... (2)
Multigrid: Trilinos ML
Eigenvalue solvers: BLOPEX
Data exchange: HDF5
SLEPc - Scalable Library for Eigenvalue Problems
fluidity - a finite element/volume fluids code
Prometheus - scalable unstructured finite element solver
freeCFD - general purpose CFD solver
OpenFVM - finite volume based CFD solver
OOFEM - object oriented finite element library
libMesh - adaptive finite element library
Packages that use/extend PETSc (1)
MOOSE - Multiphysics Object-Oriented Simulation Environment developed at INL built on top of libmesh on top of PETSc
DEAL.II - sophisticated C++ based finite element simulation package
PHAML - The Parallel Hierarchical Adaptive MultiLevel Project
Chaste - Cancer, Heart and Soft Tissue Environment
Packages that use/extend PETSc (2)
PETSc has been used for modeling in all of these areas: Acoustics, Aerodynamics, Air Pollution, Arterial Flow, Bone Fractures, Brain Surgery, Cancer Surgery, Cancer Treatment, Carbon Sequestration, Cardiology, Cells, CFD, Combustion, Concrete, Corrosion, Data Mining, Dentistry, Earthquakes...
Applications (1)
Applications (2)
Fracture mechanics
Mechanics- elasticity
Real-time surgery
Magma dynamics
Václav Hapla
stable releases of PETSc can be downloaded via HTTP as a tarball
petsc-3.2-p7.tar.gz - full distribution (including all current patches) with documentation
petsc-lite-3.2-p7.tar.gz - smaller version with no documentation (all documentation may be accessed online)
Download - tarball
stable releases as well as current development release can be downloaded using Mercurial versioning system
caution – build system has its own separate repository!
stable: hg clone http://petsc.cs.iit.edu/petsc/releases/petsc-3.2
realizes PETSc auto-tuning capabilities
sets many internal variables and macros depending on the machine
generates makefile
PETSC_DIR and PETSC_ARCH are variables that control the configuration and build process of PETSc
These variables can be set as environment variables or specified on the command line.
PETSC_DIR points to the location of the PETSc installation that is used.
Multiple PETSc versions can coexist on the same file-system. By changing PETSC_DIR value, one can switch between these installed versions of PETSc.
PETSC_DIR
PETSC_ARCH variable gives a name to a configuration and build.
configure uses this value to store the generated makefiles in ${PETSC_DIR}/${PETSC_ARCH}/conf.
make uses this value to determine the location of the build
program libraries (.a or .so) of PETSc and downloaded external packages are stored into ${PETSC_DIR}/${PETSC_ARCH}/lib
Thus one can install multiple variants of PETSc libraries - by providing different PETSC_ARCH values to each configure build.
Then one can switch between using these variants of libraries by switching the PETSC_ARCH value used.
PETSC_ARCH
download and compile automatically:
--download-[pkg] - downloads and installs a package for you under ${PETSC_DIR}/${PETSC_ARCH}
use existing installation
--with-[pkg]=<bool> - test for [pkg]
--with-[pkg]-dir=<dir> the root directory of the [pkg] installation
--with-[pkg]-include=<dirs>
External packages
./configure --with-batch
for machines with a batch system
configure generates a special executable binary called conftest
run conftest on one computing node (e.g. submit the batch script)
it will generate a new ./reconfigure-$PETSC_ARCH script with machine specific variables set (cache size etc.)
run ./reconfigure-$PETSC_ARCH to complete the configuration stage
Batch mode
after the configuration stage completes successfully, you get a message like: Configure stage complete. Now build PETSc libraries with (cmake build):
make PETSC_DIR=/home/vhapla/devel/petsc-dev PETSC_ARCH=debug-so-mpich2-gnu all
you can copy and paste the make command
it will compile the source files and build the program library
it can make use of CMake if installed
significant speedup of compilation
#include "petsc.h"
#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **argv)
Declare the name of each routine by redefining __FUNCT__ macro to get more useful tracebacks on error
Program header in C
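Putting the pieces above together, a minimal sketch of a C program header might look as follows (the value given to __FUNCT__ is the user's choice; it only appears in error tracebacks):

#include "petsc.h"

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  /* ... PETSc calls go here ... */
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}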
program init
implicit none
#include "finclude/petsc.h"
FORTRAN has more limited error handling, one cannot use __FUNCT__ macro
If you are familiar with C, please use C.
We will focus on PETSc C interface.
Program header in F
You can include all PETSc headers at once by #include "petsc.h" //includes all PETSc headers
Or you can include specific headers #include "petscsys.h" //framework routines
#include "petscvec.h" //vectors
#include "petscmat.h" //matrices
#include "petscksp.h" //includes vec,mat,dm,pc
What headers to include?
#include <petscsys.h>
ierr = PetscFinalize();CHKERRQ(ierr);
return 0;
every PETSc program starts with a call to PetscInitialize() and ends with a call to PetscFinalize()
they call MPI_Init() and MPI_Finalize() internally (unless MPI has already been initialized/finalized by the user)
#include <petscsys.h>
ierr = PetscFinalize();CHKERRQ(ierr);
return 0;
argc,argv - propagate command line arguments to PETSc and MPI
help - additional help messages to print when the executable is invoked with the cmd-line-arg -help (will be discussed later)
PETSc is written in C
C has no support for C++ exceptions
instead of throwing exception, every routine returns integer error code (PetscErrorCode type)
the error code is caught by the CHKERRQ(ierr) macro
PetscErrorCode ierr;
ierr = SomePetscRoutine();CHKERRQ(ierr);
This code throws this error: PetscInitialize() must be called before PetscFinalize()
(+ stacktrace)
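For illustration, a minimal sketch of this error-handling pattern inside a user routine (SomeUserRoutine is a made-up name; any PETSc call can take the place of VecSet):

#include <petscvec.h>

#undef __FUNCT__
#define __FUNCT__ "SomeUserRoutine"
PetscErrorCode SomeUserRoutine(Vec x)
{
  PetscErrorCode ierr;
  PetscFunctionBegin;
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);  /* error code checked and propagated with a traceback */
  PetscFunctionReturn(0);
}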
Communicators
communicator = an opaque object of MPI_Comm type that defines process group and synchronization channel
PETSc built-in communicators: PETSC_COMM_SELF – just this process – for serial objects
PETSC_COMM_WORLD – all processes – for parallel objects
MPI can split communicators, spawn processes on new communicators – PETSc does not deal with it
Function Collectiveness
1. Not Collective – no communication, e.g. VecGetLocalSize(), MatSetValues()
2. Logically Collective – checked when running in debug mode, e.g. KSPSetType(), PCMGSetCycleType()
3. Neighbor-wise Collective – point-to-point communication between two processes VecScatterBegin(), MatMult()
4. Collective – global communication, synchronous VecNorm(), MatAssemblyBegin(), KSPCreate()
PETSc provides many useful utilities
prefixed by Petsc
Utility routines (1)
PETSc has its own typedefs for numeric data types
It is better to use them instead of built-in C types
Better portability and easier switching between
real and complex numbers
32-bit and 64-bit numbers
PetscOptionsGetString,
Options (1)
-help command-line argument prints essential info about the PETSc-based program:
program description (the last argument of PetscInitialize())
options specific for the program
general built-in options
PETSc version
command-line help
Input parameters include:
-m <mesh_x> : number of mesh points in x-direction
-n <mesh_n> : number of mesh points in y-direction
-----------------------------------------------------------
...
-----------------------------------------------------------
-help: prints help method for each option
-on_error_abort: cause an abort when an error is detected. Useful
only when run in the debugger
...
~/.petscrc
$PWD/.petscrc
$PWD/petscrc
PetscOptionsInsertFile()
PetscOptionsInsertString()
Ways to set options
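A small sketch of querying the options database from code (the option names -n and -myfile are made up for this example):

PetscInt  n = 10;                       /* default value used if -n is not given */
char      file[PETSC_MAX_PATH_LEN];
PetscBool flg;
PetscOptionsGetInt(PETSC_NULL, "-n", &n, PETSC_NULL);
PetscOptionsGetString(PETSC_NULL, "-myfile", file, PETSC_MAX_PATH_LEN, &flg);
if (flg) { /* -myfile was given on the command line, in a .petscrc file, etc. */ }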
C: PetscPrintf(MPI_Comm comm, const char format[], ...)
F: PetscPrintf(MPI_Comm comm, character(*) format, PetscErrorCode ierr)
Print to standard output
static char help[] = "Hello world program.\n\n";
#include <petscsys.h>
int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscMPIInt rank;
  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, help);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF, "Hello World from %d\n", rank);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}
PETSc Hello world in C - with error checking
To obtain output of the first processor followed by that of the second, etc., one can call:
PetscSynchronizedPrintf(PETSC_COMM_WORLD,
PetscSynchronizedFlush(PETSC_COMM_WORLD);
Hello World from 1
Hello World from 2
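A minimal sketch producing this ordered output (assumes the usual PetscInitialize/PetscFinalize wrapper around it):

PetscMPIInt rank;
MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
PetscSynchronizedPrintf(PETSC_COMM_WORLD, "Hello World from %d\n", rank);
PetscSynchronizedFlush(PETSC_COMM_WORLD);  /* nothing is printed until the flush */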
Hierarchy of components
MPI_Comm is the first argument of every object‘s constructor
two objects can only interact if they belong to the same communicator
Objects and communicators
PETSc uses specific and limited inheritance
every object in PETSc is an instance of a class: Vec, Mat, KSP, SNES, …
functions called on objects (= methods in C++) are prefixed by a class name: MatMult(Mat,…)
class is specified when the object is created using proper Create function (= constructor in C++): Mat A;
MatCreate(comm, &A);
classes are further subdivided into types: seqaij,mpidense,composite,…
= seq. sparse, par. dense, implicit matrix addition/multiplication
the type of the object is specified later, during the object's lifetime:
Mat A;
MatCreate(comm, &A);
MatSetType(A, MATSEQAIJ);
you don‘t access inner fields directly
in include/petscmat.h you can find typedef struct _p_Mat* Mat;
so B = A only copies pointer, not data
prevents unwanted data copying
makes pointer handling easier
PETSc object oriented design: opaque objects
Polymorphism
public interface
uniform for all types of matrices: sequential, parallel, dense, sparse, …
documented
calls private implementation based on type: MatMult_SeqDense(Mat A,Vec x,Vec y)
hidden, specific for each matrix type
PetscObject (1)
Every PETSc object can be cast to PetscObject:
Mat A;
PetscObject obj = (PetscObject) A;
Get/SetName() – name the object (used for printing, MATLAB interface, etc.)
GetType() – the type of the object
GetComm() – the communicator the object belongs to
PetscObject (2)
Mat A;
const char *type;
MPI_Comm comm;
PetscObjectGetComm((PetscObject)A,&comm);
PetscObjectGetType((PetscObject)A,&type);
...
...
once again: method names must be prefixed by the class name: Vec,Mat,KSP, etc.
all PETSc built-in classes support the following methods
Create() - create the object
Common methods (1)
SetFromOptions() - set all options of the object from the options database
Get/SetOptionsPrefix() - set a specific option prefix for the given object
SetUp() - prepare the object inner state for computation
View() - print object info to specified output
Destroy() - deallocate the memory used by the object
Common methods (2)
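The common life cycle, sketched here for a Vec (the same pattern applies to Mat, KSP, etc.; the name and size are arbitrary):

Vec x;
VecCreate(PETSC_COMM_WORLD, &x);                 /* Create()         */
PetscObjectSetName((PetscObject)x, "solution");  /* Get/SetName()    */
VecSetSizes(x, PETSC_DECIDE, 100);
VecSetFromOptions(x);                            /* SetFromOptions() */
VecSet(x, 0.0);
VecView(x, PETSC_VIEWER_STDOUT_WORLD);           /* View()           */
VecDestroy(&x);                                  /* Destroy()        */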
Destroy method uses simple reference counting.
If counter > 0, then only nullify the pointer and decrement the counter.
If reference count equals 0
call type-specific private destroy routine
deallocate the whole object
So PETSc uses the „destroy always“ paradigm.
Unlike smart pointers in the new C++ standard, Boost, or Trilinos RCP, which use the „destroy never“ paradigm.
Destroy
PETSc contains special PetscViewer class for printing to stdout, files (several text and binary formats), strings or even socket connection
basic usage: PetscViewer viewer;
Viewers (1)
PetscViewer viewer;
int i;
PetscViewerCreate(PETSC_COMM_WORLD, &viewer);
PetscViewerSetType(viewer, PETSCVIEWERASCII);
PetscViewerFileSetMode(viewer, FILE_MODE_APPEND);
PetscViewerFileSetName(viewer, "test.txt");
for (i = 0; i < 6; i++) {
  PetscViewerASCIIPrintf(viewer, "test line %d\n", i);
}
PetscViewerDestroy(&viewer);
PetscViewer Example (1)
This program will append the following text to the file test.txt:
test line 0
test line 1
test line 2
test line 3
test line 4
test line 5
PetscViewer Example (2)
David Horák
Vec v;
a vector is an array of PetscScalars
the vector object is not completely created in one call, you must at least set sizes: VecSetSizes(Vec v, int m, int M);
Create another vector with the same type and layout: VecDuplicate(Vec v,Vec *w);
Vec: Vectors
Create vector from user provided array:
VecCreateSeqWithArray(MPI_Comm comm, PetscInt n, const PetscScalar array[], Vec *v)
VecCreateMPIWithArray(MPI_Comm comm, PetscInt n, PetscInt N, const PetscScalar array[], Vec *v)
Global size can be specified as PETSC_DECIDE.
Local size can be specified as PETSC_DECIDE.
Vector parallel layout
Query vector layout: VecGetSize(x,&N); VecGetLocalSize(x,&n); VecGetOwnershipRange(x,&low,&high);
VecSet(x,1.0);
VecSetValues(Vec x, PetscInt n, PetscInt *rows, PetscScalar *values, InsertMode mode);
Setting vector values (1)
Setting vector values (2)
Set more entries at once: ii[0]=1; ii[1]=2; vv[0]=2.7; vv[1]=3.1;
VecSetValues(x,2,ii,vv,INSERT_VALUES);
ADD_VALUES - add to original value
VecSetValues is not collective, values are cached
after setting all values, you must call assembly routine to exchange values between processors: VecAssemblyBegin(Vec x);
VecAssemblyEnd(Vec x);
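A minimal sketch combining creation, value setting and assembly of a parallel vector (the global size 100 is arbitrary):

Vec         x;
PetscInt    ii[2] = {1, 2};
PetscScalar vv[2] = {2.7, 3.1};
VecCreate(PETSC_COMM_WORLD, &x);
VecSetSizes(x, PETSC_DECIDE, 100);
VecSetFromOptions(x);
VecSetValues(x, 2, ii, vv, INSERT_VALUES);  /* values are only cached here */
VecAssemblyBegin(x);                        /* communication happens here  */
VecAssemblyEnd(x);
VecDestroy(&x);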
get a copy of entries of x with indices ix to an array y:
VecGetValues(Vec x, PetscInt ni, const PetscInt ix[], PetscScalar y[])
Access the local array directly:
Vec x; PetscScalar *a;
VecGetArray(Vec x, PetscScalar *a[]);
VecRestoreArray(Vec x, PetscScalar *a[]);
Getting values
int localsize, first, i;
PetscScalar *a;
VecGetLocalSize(x, &localsize);
VecGetOwnershipRange(x, &first, PETSC_NULL);
VecGetArray(x, &a);
for (i = 0; i < localsize; i++)
  printf("Vector element %d : %e\n", first + i, a[i]);
VecRestoreArray(x, &a);
VecScale(Vec x, PetscScalar a);
VecMDot(Vec x,int n,Vec y[],PetscScalar *r);
VecNorm(Vec x,NormType type, double *r);
VecSum(Vec x, PetscScalar *r);
VecCopy(Vec x, Vec y);
VecSwap(Vec x, Vec y);
VecMax(Vec x, int *idx, double *r);
VecMin(Vec x, int *idx, double *r);
VecAbs(Vec x);
VecReciprocal(Vec x);
generalization of an integer array
can be distributed (if comm has more than one process)
general IS: IS is; PetscInt indices[]={1,3,7}; PetscInt n=3;
ISCreateGeneral(comm,n,indices,PETSC_COPY_VALUES,&is);
ISCreateGeneral(comm,n,indices,PETSC_OWN_POINTER,&is);
/* with PETSC_OWN_POINTER, the indices array is freed when ISDestroy(&is) is called */
IS: Index Sets (1)
IS: Index Sets (2)
Various manipulations: ISSum, ISDifference, ISInvertPermutation
To get the values given by isx from x and put them at positions
determined by isy into y:
VecScatterCreate(Vec x,IS isx,Vec y,IS isy,VecScatter*)
VecScatterBegin(VecScatter,Vec x,Vec y,InsertMode,ScatterMode)
VecScatterEnd(VecScatter,Vec x,Vec y,InsertMode,ScatterMode)
IS & VecScatters
Creating a vector and a scatter context that copies all values of the MPI vector vin to each processor into a sequential vector vout:
VecScatterCreateToAll(Vec vin,VecScatter *ctx,Vec *vout)
Creating an output vector and a scatter context used to copy all values of the MPI vector vin into the sequential vector vout on the zeroth core:
VecScatterCreateToZero(Vec vin,VecScatter *ctx,Vec *vout)
VecScatterDestroy()
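A sketch of gathering a parallel vector onto rank 0 with VecScatterCreateToZero (vin is assumed to be an existing MPI vector):

Vec        vout;   /* sequential vector created by the call below, lives on rank 0 */
VecScatter ctx;
VecScatterCreateToZero(vin, &ctx, &vout);
VecScatterBegin(ctx, vin, vout, INSERT_VALUES, SCATTER_FORWARD);
VecScatterEnd(ctx, vin, vout, INSERT_VALUES, SCATTER_FORWARD);
/* ... rank 0 can now read all entries of vout ... */
VecScatterDestroy(&ctx);
VecDestroy(&vout);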
MatSetType(A,MATSEQAIJ); /*or MATMPIAIJ,MATAIJ */
MatSetSizes(Mat A, PetscInt m, PetscInt n, PetscInt M, PetscInt N);
Mat: Matrices
MatCreateSeqAIJ(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt nz, const PetscInt nnz[], Mat *A);
nz - expected number of nonzeros per row (or slight overestimate)
nnz - array of expected row lengths (or slight overestimates)
considerable savings over dynamic allocation!
Matrix creation all in one
MatCreateMPIAIJ(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt M, PetscInt N, PetscInt d_nz, const PetscInt d_nnz[], PetscInt o_nz, const PetscInt o_nnz[], Mat *A);
d_nnz - array of # of nonzeros per row in diagonal part
o_nnz - array of # of nonzeros per row in off-diagonal part
Matrix creation all in one
Basic matrix types
MATAIJ, MATSEQAIJ, MATMPIAIJ
basic sparse format, known as compressed row format, CRS, Yale
MATAIJ is identical to MATSEQAIJ when constructed with a single process communicator, and MATMPIAIJ otherwise.
MATBAIJ, MATSEQBAIJ, MATMPIBAIJ
multiple DOFs per mesh node
MATDENSE, MATSEQDENSE, MATMPIDENSE
MatGetOwnershipRange(Mat A, PetscInt *first_row, PetscInt *last_row);
Querying parallel structure
MatGetVecs(Mat mat, Vec *right, Vec *left)
right - vector that the matrix can be multiplied against
left - vector that the matrix vector product can be stored in
both can be PETSC_IGNORE
No sparsity pattern
any processor can set any element => potential for lots of malloc calls
malloc is very expensive
tell PETSc the matrix's sparsity structure (do the construction loop twice: once counting, once inserting)
MatSeqAIJSetPreallocation(Mat B, PetscInt nz, const PetscInt nnz[]);
Matrix Preallocation
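A short preallocation sketch for a tridiagonal matrix (at most 3 nonzeros per row; A is assumed to have been created with MatCreate/MatSetSizes/MatSetType):

MatSeqAIJSetPreallocation(A, 3, PETSC_NULL);   /* used if A is MATSEQAIJ                */
MatMPIAIJSetPreallocation(A, 3, PETSC_NULL,    /* diagonal block: at most 3 per row     */
                             2, PETSC_NULL);   /* off-diagonal block: at most 2 per row */

Calling both is a common portable pattern: only the routine matching the actual matrix type takes effect.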
MatSetValue(Mat A, PetscInt row, PetscInt col, PetscScalar va, InsertMode mode);
MatSetValues(Mat A, PetscInt m, const PetscInt idxm[], PetscInt n, const PetscInt idxn[], const PetscScalar v[], InsertMode mode);
Setting values
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
Assembling the matrix
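A sketch assembling a 1D Laplacian row by row (N is the assumed global size; each process sets only the rows it owns):

PetscInt    i, first, last, cols[3];
PetscScalar vals[3] = {-1.0, 2.0, -1.0};
MatGetOwnershipRange(A, &first, &last);
for (i = first; i < last; i++) {
  if (i == 0) {
    cols[0] = 0; cols[1] = 1;
    MatSetValues(A, 1, &i, 2, cols, &vals[1], INSERT_VALUES);   /* [ 2 -1 ]    */
  } else if (i == N-1) {
    cols[0] = N-2; cols[1] = N-1;
    MatSetValues(A, 1, &i, 2, cols, vals, INSERT_VALUES);       /* [ -1 2 ]    */
  } else {
    cols[0] = i-1; cols[1] = i; cols[2] = i+1;
    MatSetValues(A, 1, &i, 3, cols, vals, INSERT_VALUES);       /* [ -1 2 -1 ] */
  }
}
MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);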
MatGetValues(Mat mat, PetscInt m, const PetscInt idxm[], PetscInt n, const PetscInt idxn[], PetscScalar v[])
Gets a block of values given by idxm and idxn from a matrix, only returns a local block
mat - the matrix
m, idxm - the number of rows and their global indices
n, idxn - the number of columns and their global indices
The user must allocate space (m*n PetscScalars) for the values v which are then returned in a row-oriented format, analogous to that used by default in MatSetValues()
Getting Values
Matrix elements can only be obtained locally
PetscErrorCode MatGetRow(Mat mat, PetscInt row, PetscInt *ncols, const PetscInt *cols[], const PetscScalar *vals[])
Extract one (parallel) submatrix:
MatGetSubMatrix(Mat mat, IS isrow, IS iscol, MatReuse cll, Mat *newmat)
Extract multiple single-processor matrices:
MatGetSubMatrices(Mat mat, PetscInt n, const IS irow[], const IS icol[], MatReuse scall, Mat *submat[])
Submatrices
MatTranspose(Mat A, MatReuse reuse, Mat *B)
computes an out-of-place transpose B of a matrix A if reuse=MAT_INITIAL_MATRIX or
an in-place transpose of a matrix A if reuse=MAT_REUSE_MATRIX and B=A
MatMultTranspose()
MatMultTransposeAdd()
MatIsTranspose()
Matrix operations
Implicit matrices
some of the matrix types in PETSc are not stored by elements but they behave like normal matrices in some operations
nomenclature: matrix-free, implicit, not assembled, not formed, not stored ...
the most important operation is a matrix-vector product (MatMult) which can be considered an application of a linear operator
when using an iterative solver, this operation suffices to solve a linear system
matrix type MATTRANSPOSE
maintains pointer to the original matrix
its MatMult just calls MatMultTranspose of an underlying matrix and vice versa
MatTranspose (1)
MatCreateTranspose(A, &Ati);
MatTranspose (2)
// F = A*B*C (implicitly)
MatCreateComposite(comm, 3, arr, &F);
MatCreateComposite(comm, 3, arr, &G);
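A hedged sketch of implicit sums and products with MATCOMPOSITE (A, B, C, x, y are assumed to be existing, compatible objects; see the MatCreateComposite man page for the exact application order of the multiplicative variant):

Mat arr[3] = {A, B, C};
Mat F, G;
MatCreateComposite(comm, 3, arr, &F);                  /* F = A + B + C (additive by default) */
MatCreateComposite(comm, 3, arr, &G);
MatCompositeSetType(G, MAT_COMPOSITE_MULTIPLICATIVE);  /* G acts as the product of A, B, C    */
MatMult(F, x, y);                                      /* y = (A+B+C) x, sum never formed     */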
matrix type MATSHELL
no predefined operation
arbitrary size
any operations can be defined by the user (C function pointers) using MatShellSetOperation function
can have a context with additional data
MatShellSetContext(Mat mat,void *ctx);
MatShellGetContext(Mat mat,void **ctx);
MyType *matData;
PetscFunctionReturn(0);
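A minimal MATSHELL sketch with a user-defined MatMult that applies y = scale*x (MyCtx, MyMatMult and the sizes n, N are made up for this example):

typedef struct { PetscScalar scale; } MyCtx;  /* arbitrary user data attached to the shell */

PetscErrorCode MyMatMult(Mat A, Vec x, Vec y)
{
  MyCtx          *ctx;
  PetscErrorCode ierr;
  PetscFunctionBegin;
  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecCopy(x, y);CHKERRQ(ierr);
  ierr = VecScale(y, ctx->scale);CHKERRQ(ierr);  /* y = scale * x */
  PetscFunctionReturn(0);
}

/* setup */
MyCtx ctx = {2.0};
Mat   S;
MatCreateShell(PETSC_COMM_WORLD, n, n, N, N, &ctx, &S);
MatShellSetOperation(S, MATOP_MULT, (void(*)(void))MyMatMult);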
Linear solvers David Horák
Solving a linear system Ax = b with Gaussian elimination can take a lot of time and memory.
alternative: iterative solvers use successive approx. of the solution:
convergence not always guaranteed
basic operation: y = Ax executed once per iteration
convergence can be accelerated by a preconditioner B ≈ A^-1
KSP & PC: Iterative solvers
direct solvers - one iteration with perfect preconditioning (LU, Cholesky)
Object oriented: solvers only need matrix action, so can handle shell matrices
Preconditioners
Tolerances
Basic concepts
KSPCreate(comm,&solver);
KSPSetFromOptions(solver);
then options -ksp_... are parsed, e.g.:
-ksp_type gmres
-ksp_gmres_restart 20
if the solver diverged, the solution may be completely wrong, so always check:
KSPGetConvergedReason(solver,&reason)
KSPGetIterationNumber(solver,&nits) after how many iterations did the method stop?
Convergence
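A sketch of the whole KSP cycle including the convergence check (A, b, x are assumed to be an assembled matrix and vectors; the MatStructure argument follows the petsc-3.2 calling sequence):

KSP                solver;
KSPConvergedReason reason;
PetscInt           nits;
KSPCreate(PETSC_COMM_WORLD, &solver);
KSPSetOperators(solver, A, A, DIFFERENT_NONZERO_PATTERN);
KSPSetFromOptions(solver);          /* -ksp_type, -pc_type, ... are parsed here */
KSPSolve(solver, b, x);
KSPGetConvergedReason(solver, &reason);
KSPGetIterationNumber(solver, &nits);
if (reason < 0) PetscPrintf(PETSC_COMM_WORLD, "Diverged: reason %d\n", (int)reason);
else            PetscPrintf(PETSC_COMM_WORLD, "Converged in %d iterations\n", (int)nits);
KSPDestroy(&solver);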
KSPSetTolerances(solver,rtol,atol,dtol,maxit);
-ksp_monitor
-ksp_monitor_true_residual
Monitors and convergence tests
Many options for the (mathematically) sophisticated user, some specific to one method
KSPSetInitialGuessNonzero
KSPGMRESSetRestart
KSPSetPreconditionerSide
KSPSetNormType
KSPSetNullSpace(ksp,sp);
The solver will now properly remove the null space at each iteration.
Null spaces
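A minimal sketch attaching the constant null space (e.g. for a pure Neumann problem) to an existing KSP; this assumes ksp already has its operators set:

MatNullSpace sp;
MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, PETSC_NULL, &sp);
KSPSetNullSpace(ksp, sp);   /* the solver removes the null space at each iteration */
MatNullSpaceDestroy(&sp);   /* the KSP keeps its own reference */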
PC usually created as part of KSP: separate create and destroy calls exist, but are (almost) never needed
KSP solver; PC precon;
Controllable through commandline options:
-pc_type ilu -pc_factor_levels 3
PC basics
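A sketch of accessing and configuring the PC inside a KSP from code, mirroring the command-line options above:

KSP solver;
PC  precon;
KSPCreate(PETSC_COMM_WORLD, &solver);
KSPGetPC(solver, &precon);       /* the PC is created as part of the KSP */
PCSetType(precon, PCILU);        /* same effect as -pc_type ilu          */
PCFactorSetLevels(precon, 3);    /* same effect as -pc_factor_levels 3   */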
Iterative method with direct solver as preconditioner would converge in one step
Direct methods are implemented in PETSc as a special iterative method, KSPPREONLY: only apply the preconditioner (skips stopping criteria etc.)
All direct methods are preconditioner type PCLU:
myprog -pc_type lu -ksp_type preonly \
-pc_factor_mat_solver_package mumps
MatGetOrdering(A,MATORDERING_NATURAL,&isr,&isc);
// Solves A x = b, given a factored matrix, for a collection of vectors
Low-level direct methods
Krylov Subspace Methods
KSPSetOperators(KSP ksp, Mat Amat, Mat Pmat, MatStructure flag);
Can access subobjects
Can change solver dynamically from the command line
-ksp_type bicgstab
SNESSetFunction(SNES snes, Vec r, residualFunc, void *ctx);
SNESSetJacobian(SNES snes, Mat J, Mat Jpre, jacFunc, void *ctx);
Can access subobjects
Set the subdomain preconditioner to ILU with -sub_pc_type ilu
Nonlinear solvers - summary
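A hedged minimal SNES sketch for the single scalar equation x^2 - 4 = 0 (FormFunction is a made-up name; r and x are assumed to be sequential vectors of length 1; no Jacobian is set here, so one would run with e.g. -snes_mf):

PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  PetscScalar    *xx, *ff;
  PetscErrorCode ierr;
  PetscFunctionBegin;
  ierr = VecGetArray(x, &xx);CHKERRQ(ierr);
  ierr = VecGetArray(f, &ff);CHKERRQ(ierr);
  ff[0] = xx[0]*xx[0] - 4.0;          /* residual of x^2 - 4 = 0 */
  ierr = VecRestoreArray(x, &xx);CHKERRQ(ierr);
  ierr = VecRestoreArray(f, &ff);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* driver */
SNES snes;
SNESCreate(PETSC_COMM_WORLD, &snes);
SNESSetFunction(snes, r, FormFunction, PETSC_NULL);
SNESSetFromOptions(snes);
SNESSolve(snes, PETSC_NULL, x);
SNESDestroy(&snes);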
1 Sequential LU
EUCLID & PILUT (Hypre, David Hysom, LLNL)
ESSL (IBM)
Matlab
LUSOL (MINOS, Michael Saunders, Stanford)
2 Parallel LU
3 Parallel Cholesky
MUMPS (Patrick Amestoy, Toulouse)
CHOLMOD (Tim Davis, Florida)
1 Parallel ICC
2 Parallel ILU
SPAI 3.0 (Marcus Grote and Barnard, NYU)
4 Sequential Algebraic Multigrid
SAMG (Klaus Steuben, GMD)
5 Parallel Algebraic Multigrid
Prometheus (Mark Adams, PPPL)
3rd party preconditioners in PETSc
DM: Data management and grid manipulation
SNES: Nonlinear solvers
TS: Time stepping
Debugging & profiling
Attach debugger only to some parallel processes: -debugger_nodes 0,1
Put a breakpoint in PetscError() to catch errors as they occur
Debugging - stepping
the CHKMEMQ macro causes a check of all allocated memory
track memory overwrites by bracketing them with CHKMEMQ
PETSc checks for leaked memory
use PetscMalloc() and PetscFree() for all allocation
print unfreed memory on PetscFinalize() with -malloc_dump
Simply the best tool today is valgrind (http://www.valgrind.org)
it checks memory access, cache performance, memory usage...
needs --trace-children=yes when running under MPI
Debugging - memory checking
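A small sketch of PETSc's memory routines together with CHKMEMQ (the array size is arbitrary):

PetscScalar    *work;
PetscErrorCode ierr;
ierr = PetscMalloc(100*sizeof(PetscScalar), &work);CHKERRQ(ierr);
CHKMEMQ;                                /* verify all allocated memory is intact */
/* ... suspect code that might overwrite memory ... */
CHKMEMQ;
ierr = PetscFree(work);CHKERRQ(ierr);   /* memory left unfreed shows up with -malloc_dump */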
PETSc has integrated profiling (timing, flops, memory usage, MPI messages)
Option -log_summary prints a report on PetscFinalize()
PETSc allows user-defined events
PetscLogEventRegister(), PetscLogEventBegin/End()
to create and to manage events reporting time, calls, flops, communication, etc.
Memory usage is tracked by object
Events may also be nested and will aggregate in a nested fashion
Profiling is separated into stages
PetscLogStageRegister(), PetscLogStagePush/Pop()
to create and to manage stages identified by an integer handle
Stages may be nested, but will not aggregate in a nested fashion
Profiling
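A hedged sketch of user-defined events and stages (the names are arbitrary; VEC_CLASSID is used only to associate the event with a class):

PetscLogEvent myEvent;
PetscLogStage myStage;
PetscLogEventRegister("MyAssembly", VEC_CLASSID, &myEvent);
PetscLogStageRegister("My stage", &myStage);

PetscLogStagePush(myStage);
PetscLogEventBegin(myEvent, 0, 0, 0, 0);
/* ... code to be timed ... */
PetscLogEventEnd(myEvent, 0, 0, 0, 0);
PetscLogStagePop();
/* run with -log_summary to see the per-event, per-stage report */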
Introduction to PETSc, TACC, Jan 17, 2012 (Victor Eijkhout). Slides
Short Course at the Graduate University, Chinese Academy of Sciences, Beijing, China, July 2010 (Matthew Knepley). Slides
Tutorial at ICES, UT Austin, TX September 2011 (Matthew Knepley). Slides
PETSc homepage, http://www.mcs.anl.gov/petsc/