Towards Scalable Cross-Platform Application Performance Analysis -- Tool Goals and Progress
October 18, 2001, LACSI Symposium, Santa Fe, NM
Shirley [email protected]
Scalability Issues
• Code instrumentation
  – Hand instrumentation too tedious for large codes
• Runtime control of data collection
• Batch queueing systems
  – Cause problems for interactive tools
• Tracefile size and complexity
• Data analysis
Cross-platform Issues
• Goal: similar user interfaces across different platforms
• Tools necessarily rely on platform-dependent substrates – e.g., for accessing hardware counters.
• Standardization of interfaces and data formats promotes interoperability and allows design of portable tools.
Where is Standardization Needed?
• Performance data
  – Trace records vs. summary statistics
  – Data format
  – Data semantics
• Library interfaces
  – Access to hardware counters
  – Statistical profiling
  – Dynamic instrumentation
Standardization? (cont.)
• User interfaces
  – Common set of commands
  – Common functionality
• Timing routines
• Memory utilization information
Parallel Tools Consortium
• http://www.ptools.org/
• Interaction between vendors, researchers, and users
• Venue for standardization
• Current projects
  – PAPI
  – DPCL
Hardware Counters
• Small set of registers that count events, which are occurrences of specific signals related to the processor’s function
• Monitoring these events facilitates correlation between the structure of the source/object code and the efficiency of the mapping of that code to the underlying architecture.
Goals of PAPI
• Solid foundation for cross platform performance analysis tools
• Free tool developers from re-implementing counter access
• Standardization between vendors, academics and users
• Encourage vendors to provide hardware and OS support for counter access
• Reference implementations for a number of HPC architectures
• Well documented and easy to use
PAPI Implementation
[Layer diagram: Tools sit atop the PAPI High Level and PAPI Low Level APIs (portable layer); beneath them, the PAPI Machine Dependent Substrate (machine-specific layer) talks to the kernel extension, the operating system, and the hardware performance counters.]
PAPI Preset Events
• Proposed standard set of events deemed most relevant for application performance tuning
• Defined in papiStdEventDefs.h
• Mapped to native events on a given platform
  – Run tests/avail to see the list of PAPI preset events available on a platform
Statistical Profiling
• PAPI provides support for execution profiling based on any counter event.
• PAPI_profil() creates a histogram by text address of overflow counts for a specified region of the application code.
• Used in the vprof tool from Sandia National Laboratories
PAPI Reference Implementations
• Linux/x86, Windows 2000
  – Requires a patch to the Linux kernel; a driver for Windows
• Linux/IA-64
• Sun Solaris 2.8/Ultra I/II
• IBM AIX 4.3+/Power
  – Contact IBM for pmtoolkit
• SGI IRIX/MIPS
• Compaq Tru64/Alpha Ev6 & Ev67
  – Requires an OS device driver patch from Compaq
  – Per-thread and per-process counts not possible
  – Extremely limited number of events
• Cray T3E/Unicos
PAPI Future Work
• Improve accuracy of hardware counter and statistical profiling data
  – Microbenchmarks to measure accuracy (Pat Teller, UTEP)
  – Use hardware support for overflow interrupts
  – Use Event Address Registers (EARs) where available
• Data structure based performance counters (collaboration with UMd)
  – Qualify event counting by address range
  – Page-level counters in cache coherence hardware
PAPI Future (cont.)
• Memory utilization extensions (list suggested by Jack Horner, LANL)
  – Memory available on a node
  – Total memory available/used
  – High-water-mark memory used by process/thread
  – Disk swapping by process
  – Process-memory locality
  – Location of memory used by an object
• Dynamic instrumentation – e.g., PAPI probe modules
For More Information
• http://icl.cs.utk.edu/projects/papi/
  – Software and documentation
  – Reference materials
  – Papers and presentations
  – Third-party tools
  – Mailing lists
DPCL
• Dynamic Probe Class Library
• Built on top of IBM's version of the University of Maryland's dyninst
• Current platforms
  – IBM AIX
  – Linux/x86 (limited functionality)
• Dyninst has been ported to more platforms, but by itself it lacks functionality for easily instrumenting parallel applications.
Infrastructure Components?
• Parsers for common languages
• Access to hardware counter data
• Communication behavior instrumentation and analysis
• Dynamic instrumentation capability
• Runtime control of data collection and analysis
• Performance data management
Case Studies
• Test tools on large-scale applications in production environment
• Reveal limitations of tools and point out areas where improvements are needed
• Develop performance tuning methodologies for large-scale codes
PERC: Performance Evaluation Research Center
• Developing a science for understanding performance of scientific applications on high-end computer systems
• Developing engineering strategies for improving performance on these systems
• DOE Labs: ANL, LBNL, LLNL, ORNL
• Universities: UCSD, UIUC, UMD, UTK
• Funded by SciDAC: Scientific Discovery through Advanced Computing
PERC: Real-World Applications
• High Energy and Nuclear Physics
  – Shedding New Light on Exploding Stars: Terascale Simulations of Neutrino-Driven Supernovae and Their Nucleosynthesis
  – Advanced Computing for 21st Century Accelerator Science and Technology
• Biology and Environmental Research
  – Collaborative Design and Development of the Community Climate System Model for Terascale Computers
• Fusion Energy Sciences
  – Numerical Computation of Wave-Plasma Interactions in Multi-dimensional Systems
• Advanced Scientific Computing
  – Terascale Optimal PDE Solvers (TOPS)
  – Applied Partial Differential Equations Center (APDEC)
  – Scientific Data Management (SDM)
• Chemical Sciences
  – Accurate Properties for Open-Shell States of Large Molecules
• …and more
Parallel Climate Transition Model
• Components for Ocean, Atmosphere, Sea Ice, Land Surface and River Transport
• Developed by Warren Washington’s group at NCAR
• POP: Parallel Ocean Program from LANL
• CCM3: Community Climate Model 3.2 from NCAR including LSM: Land Surface Model
• ICE: CICE from LANL and CCSM from NCAR
• RTM: River Transport Module from UT Austin
• Fortran 90 with MPI
PCTM: Parallel Climate Transition Model
[Diagram: a Flux Coupler connects the Land Surface, Ocean, Atmosphere, Sea Ice, and River models; the parallelized modules execute sequentially.]
PCTM Instrumentation
• Vampir tracefile in the tens-of-gigabytes range even for a toy problem
• Hand instrumentation with PAPI tedious
• UIUC working on SvPablo instrumentation
• Must work in a batch queueing environment
• Plan to try other tools
  – MPE logging and jumpshot
  – TAU
  – VGV?
In Progress
• Standardization and reference implementations for memory utilization information (funded by DoD HPCMP PET, Ptools-sponsored project)
• Repositories of application performance evaluation case studies (e.g., SciDAC PERC)
• Portable dynamic instrumentation for parallel applications (DOE MICS project – UTK, UMd, UWisc)
• Increased functionality and accuracy of hardware counter data collection (DoD HPCMP, DOE MICS)
Next Steps
• Additional areas for standardization?
  – Scalable trace file format
  – Metadata standards for performance data
  – New hardware counter metrics (e.g., SMP and DMP events, data-centric counters)
  – Others?
Next Steps (cont.)
• Sharing of tools and data
  – Open source software
  – Machine and software profiles
  – Runtime performance data
  – Benchmark results
  – Application examples and case studies
• Long-term goal: common performance tool infrastructure across HPC systems