Scalable Scientific Applications Characteristics & Future Directions


Transcript of Scalable Scientific Applications Characteristics & Future Directions

Page 1: Scalable Scientific Applications Characteristics & Future Directions

Managed by UT-Battelle for the Department of Energy
CSRI Scalable Apps Workshop: SF, NM, June 3-5, 2008

Scalable Scientific Applications Characteristics & Future Directions

Douglas B. Kothe

and Richard Barrett, Ricky Kendall, Bronson Messer, and Trey White
Leadership Computing Facility

National Center for Computational Sciences Oak Ridge National Laboratory

Page 2: Scalable Scientific Applications Characteristics & Future Directions


Science Teams Have Specific PF Objectives

Combustion (S3D)
– Science driver: Predictive engineering simulation tool for new engine design
– Science objective: Understand flame stabilization in lifted, autoigniting diesel fuel jets relevant to low-temperature combustion for engine design at realistic operating conditions
– Impact: Potential for a 50% increase in efficiency and 20% savings in petroleum consumption with lower-emission, leaner-burning engines

Fusion (GTC)
– Science driver: Understand and quantify the physics and properties of ITER scaling and H-mode confinement
– Science objective: Strongly coupled and consistent wall-to-edge-to-core modeling of ITER plasmas; attain a realistic assessment of ignition margins
– Impact: ITER design and operation

Chemistry (MADNESS)
– Science driver: Computational catalysis
– Science objective: Describe large systems accurately with modern hybrid and meta density functional theory functionals
– Impact: Generate quantitative catalytic reaction rates and guide small-system calibration

Nanoscale Science (DCA++)
– Science driver: Material-specific understanding of high-temperature superconductivity theory
– Science objective: Understand the quantitative differences in the transition temperatures of high-temperature superconductors
– Impact: Macroscopic quantum effects at elevated temperatures (>150 K); new materials for power transmission and oxide electronics

Climate (POP)
– Science driver: Accurate representation of ocean circulation
– Science objective: Fully coupled, eddy-resolving ocean and sea-ice model to reduce coupled-model biases where ice and deep-water parameters are governed by the accurate representation of current systems
– Impact: Reduce current uncertainties in the coupled ocean-sea ice system model

Geoscience (PFLOTRAN)
– Science driver: Multiscale, multiphase, multicomponent modeling of a 3-D field CO2 injection scenario
– Science objective: Include an oil phase and a four-phase liquid-gas-aqueous-oil system to describe dissipation of the supercritical CO2 phase and escape of CO2 to the surface
– Impact: Demonstrate the viability of and potential for sequestration of anthropogenic CO2 in deep geologic formations

Astrophysics (CHIMERA)
– Science driver: Understand the core-collapse supernova mechanism for a range of progenitor star masses
– Science objective: Perform core-collapse simulations with sophisticated spectral neutrino transport, detailed nuclear burning, and general relativistic gravity
– Impact: Understand the origin of many elements in the Periodic Table and the creation of neutron stars and black holes

Page 3: Scalable Scientific Applications Characteristics & Future Directions


Application Requirements at the PF

Application categories analyzed
– Science motivation and impact
– Science quality and productivity
– Application models, algorithms, software
– Application footprint on platform
– Data management and analysis
– Early access science-at-scale scenarios

Results
– 100+ page Application Requirements Document published in July 2007
– New methods for categorizing platform and application attributes devised and utilized in the analysis: guiding tactical infrastructure purchase and deployment
– But still too qualitative! More work to do.

Page 4: Scalable Scientific Applications Characteristics & Future Directions


Application Codes in 2008: An Incomplete List

– Astrophysics: CHIMERA, GenASiS, 3DHFEOS, Hahndol, SNe, MPA-FT, SEDONA, MAESTRO, AstroGK
– Biology: NAMD, LAMMPS
– Chemistry: CPMD, CP2K, MADNESS, NWChem, Parsec, Quantum ESPRESSO, RMG, GAMESS
– Nuclear Physics: ANGFMC, MFDn, NUCCOR, HFODD
– Engineering: Fasel, S3D, Raptor, MFIX, Truchas, BCFD, CFL3D, OVERFLOW, MDOPT
– High Energy Physics: CPS, Chroma, MILC
– Fusion: AORSA, GYRO, GTC, XGC
– Materials Science: VASP, LS3DF, DCA++, QMCPACK, RMG, WL-LSMS, WL-AMBER, QMC
– Accelerator Physics: Omega3P, T3P
– Atomic Physics: TDCC, RMPS, TDL
– Space Physics: Pogorelov
– Climate & Geosciences: MITgcm, PFLOTRAN, POP, CCSM (CAM, CICE, CLM, POP)
– Computer Science (Tools): Active Harmony, IPM, KOJAK, mpiP, PAPI, PMaC, Sca/LAPACK, SvPablo, TAU

Page 5: Scalable Scientific Applications Characteristics & Future Directions


Apps Teams Are Reasonably Adept at Using our Current Systems*

*Is the “field of dreams” approach inadequate (too little, too late)?
What is “effective utilization”? Scaling? Percent of peak (Jacobi vs. MG)?
Current SC apps range from 2-70% of peak: what’s the goal?
Remember, we improve what we measure, so let’s have the right metrics and measures.
My $0.02: science and engineering achievements on these systems are the legacy.

Page 6: Scalable Scientific Applications Characteristics & Future Directions


Science Workload: Job Sizes and Resource Usage of Key Applications

Code | 2007 resource utilization (M core-hours) | Projected 2008 resource utilization (M core-hours) | Typical job size in 2006-2007 (K cores) | Anticipated job size in 2008 (K cores)
CHIMERA | 2 (under development) | 16 | 0.25 (under development) | >10
GTC | 8 | 7 | 8 | 12
S3D | 6.5 | 18 | 8-12 | >15
POP | 4.8 | 4.7 | 4 | 8
MADNESS | 1 (under development) | 4 | 0.25 (under development) | >8
DCA++ | N/A (under development) | 3-8 | N/A (under development) | 4-16 (w/o disorder), >40 (with disorder)
PFLOTRAN | 0.37 (under development) | >2 | 1-2 (under development) | >10
AORSA | 0.61 | 1 | 15-20 | >20

Page 7: Scalable Scientific Applications Characteristics & Future Directions


Preparing for the Exascale: Long-Term Science Drivers and Requirements

We have recently surveyed, analyzed, and documented the science drivers and application requirements envisioned for exascale leadership systems in the 2020 timeframe

These studies help to
– Provide a roadmap for the ORNL Leadership Computing Facility
– Uncover application needs and requirements
– Focus our efforts on those disruptive technologies and research areas in need of our and the HPC community’s attention

Page 8: Scalable Scientific Applications Characteristics & Future Directions


What Will an EF System Look Like?

All projections are daunting
– Based on projections of existing technology, both with and without “disruptive technologies”
– Assumed to arrive in the 2016-2020 timeframe

Example 1
– 115K nodes @ 10 TF per node, 50-100 PB, optical interconnect, 150-200 GB/s injection B/W per node, 50 MW

Examples 2-4: see the DOE “Townhall” report*

*www.er.doe.gov/ASCR/ProgramDocuments/TownHall.pdf
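As a quick sanity check (my arithmetic, not the slide’s), the node count and per-node rate quoted in Example 1 imply an aggregate peak of just over an exaflop:

```latex
1.15 \times 10^{5}\ \text{nodes} \times 10\ \tfrac{\text{TF}}{\text{node}}
  = 1.15 \times 10^{6}\ \text{TF} \approx 1.15\ \text{EF (peak)}.
```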

Page 9: Scalable Scientific Applications Characteristics & Future Directions


Science Prospects and Benefits with High End Computing (EF?) in the Next Decade

Materials science
– Key application areas: Nanoscale science, manufacturing, and material lifecycles, response, and failure
– Goal and benefit: Design, characterize, and manufacture materials, down to the nanoscale, tailored and optimized for specific applications

Earth science
– Key application areas: Weather, carbon management, climate change mitigation and adaptation, environment
– Goal and benefit: Understand the complex biogeochemical cycles that underpin global ecosystems and control the sustainability of life on Earth

Energy assurance
– Key application areas: Fossil, fusion, combustion, nuclear fuel cycle, chemical catalysis, renewables (wind, solar, hydro), bioenergy, energy efficiency, power grid, transportation, buildings
– Goal and benefit: Attain, without costly disruption, the energy required by the United States in guaranteed and economically viable ways to satisfy residential, commercial, and transportation requirements

Fundamental science
– Key application areas: High energy physics, nuclear physics, astrophysics, accelerator physics
– Goal and benefit: Decipher and comprehend the core laws governing the Universe and unravel its origins

Biology and medicine
– Key application areas: Proteomics, drug design, systems biology
– Goal and benefit: Understand connections from individual proteins through whole cells into ecosystems and environments

National security
– Key application areas: Disaster management, homeland security, defense systems, public policy
– Goal and benefit: Analyze, design, stress-test, and optimize critical systems such as communications, homeland security, and defense systems; understand and uncover human behavioral systems underlying asymmetric operational environments

Engineering design
– Key application areas: Industrial and manufacturing processes
– Goal and benefit: Design, deploy, and operate safe and economical structures, machines, processes, and systems with reduced concept-to-deployment time

Page 10: Scalable Scientific Applications Characteristics & Future Directions


Science Case: Climate

250 TF
– Mitigation: Initial simulations with dynamic carbon cycle and limited chemistry
– Adaptation: Decadal simulations with high-resolution (1/10°) ocean

1 PF
– Mitigation: Full chemistry, carbon/nitrogen/sulfur cycles, ice-sheet model, multiple ensembles
– Adaptation: High-resolution (1/4°) atmosphere, land, and sea ice, as well as ocean

Sustained PF
– Mitigation: Increased resolution, longer simulations, and more ensembles for reliable projections; coupling with socio-economic and biodiversity models
– Adaptation: Limited cloud-resolving simulations, large-scale data assimilation

1 EF
– Mitigation: Multi-century ensemble projections for detailed comparisons of mitigation strategies
– Adaptation: Full cloud-resolving simulations, decadal forecasts of regional impacts and extreme-event statistics

Overall goals: Resolve clouds, forecast weather and extreme events, and provide quantitative mitigation strategies.
– Mitigation: Evaluate strategies and inform policy decisions for climate stabilization (100-1000 year simulations)
– Adaptation: Decadal forecasts and regional impacts; prepare for committed climate change (10-100 year simulations)

Page 11: Scalable Scientific Applications Characteristics & Future Directions


Barriers in Ultrascale Climate Simulation
Attacking the Fourth Dimension: Parallel in Time

Problem
– Climate models use explicit time stepping
– Time step must go down as resolution goes up
– Time stepping is serial
– Single-process performance is stagnating
– More parallel processes do not help!

Possible complementary solutions
– Implicit time stepping
– High-order in time
– “Fast” bases: curvelets and multi-wavelets
– “Parareal” parallel in time (see the sketch after this slide)

Progress
– Implicit version of HOMME for the global shallow-water equations: 10x speedup for a steady-state test case
– High-order single-step time integration
– Single-cycle multigrid linear solver for 1D
– Pure advection with curvelets and multi-wavelets

Near-term plans
– Scale, tune, and precondition implicit HOMME
– Single-cycle multigrid linear solver for 2D
– “Parareal” for Burgers’ equation (1D, nonlinear)

Ref: Trey White (ORNL)
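The “parareal” approach mentioned above combines a cheap, serial coarse propagator with accurate fine propagators that can run concurrently across time slices, iterating a correction until the fine solution is recovered. Below is a minimal serial sketch of that iteration; the forward-Euler propagators, the du/dt = -u test problem, and all parameter values are illustrative assumptions, not taken from the HOMME or Burgers’ work cited above.

```python
import numpy as np

def parareal(u0, t0, t1, n_slices, coarse, fine, n_iter):
    """Parareal correction iteration (serial sketch; in a real code the
    fine solves within each iteration run concurrently across slices)."""
    ts = np.linspace(t0, t1, n_slices + 1)
    u = [u0]                                    # initial guess: one coarse sweep
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))
    for _ in range(n_iter):
        # Independent fine solves: this is the parallel-in-time dimension.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        u_new = [u0]
        for n in range(n_slices):
            g_new = coarse(u_new[n], ts[n], ts[n + 1])
            u_new.append(g_new + f_old[n] - g_old[n])   # parareal update
        u = u_new
    return np.array(u)

def make_stepper(dt_target):
    """Forward-Euler integrator for du/dt = -u (illustrative test problem)."""
    def step(u, ta, tb):
        nsteps = max(1, int(np.ceil((tb - ta) / dt_target)))
        dt = (tb - ta) / nsteps
        for _ in range(nsteps):
            u = u + dt * (-u)
        return u
    return step

if __name__ == "__main__":
    coarse = make_stepper(0.1)     # cheap, low accuracy
    fine = make_stepper(0.001)     # expensive, high accuracy
    u = parareal(np.array([1.0]), 0.0, 2.0, n_slices=8,
                 coarse=coarse, fine=fine, n_iter=3)
    print(u[-1], np.exp(-2.0))     # approaches the exact decay exp(-2)
```

In a parallel implementation, the fine solves inside each iteration are the work distributed across processors, which is where the extra “time” dimension of parallelism comes from.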

Page 12: Scalable Scientific Applications Characteristics & Future Directions


Science Case: Astrophysics

250 TF
– The interplay of several important phenomena: hydrodynamic instabilities, the role of nuclear burning, and neutrino transport

1 PF
– Determine the nature of the core-collapse supernova explosion mechanism
– Fully integrated, 3D neutrino radiation hydrodynamics simulations with nuclear burning

Sustained PF
– Detailed nucleosynthesis (element production) from core-collapse SNe
– Large nuclear network capable of isotopic prediction (along with energy production)

1 EF
– Precision prediction of the complete observable set from core-collapse SNe: nucleosynthesis, gravitational waves, neutrino signatures, light output
– Tests general relativity and yields information about the dense-matter equation of state, along with detailed knowledge of stellar evolution
– Full 3D Boltzmann neutrino transport, 3D MHD/RHD, nuclear burning (CHIMERA)

Overall goal: Explanation and prediction of core-collapse SNe; put general relativity, the dense-matter EOS, and stellar evolution theories to the test.

Page 13: Scalable Scientific Applications Characteristics & Future Directions


Requirements Gathering

Consult literature and existing documentation

Construct a survey eliciting speculative requirements for scientific applications on HPC platforms in 2010–2020

Pass the survey to leading computational scientists in a broad range of scientific domains

Analyze and validate the survey results (hard)

Make informed decisions and take action

Page 14: Scalable Scientific Applications Characteristics & Future Directions


Survey Questions

What are some possible science drivers and urgent problems that would require Leadership Computing in 2010–2020?

What are some looming computational challenges that will need resolution in 2010–2020?

What are some science objectives and outcomes that Leadership Computing could enable in 2010–2020?

What are some improvement goals for science-simulation fidelity that Leadership Computing could enable in 2010–2020?

What are some possible changes in physical model attributes for Leadership-Computing applications in 2010–2020?

What major software-development projects could occur in your application area in 2010–2020?

What major algorithm changes could occur for your applications in 2010–2020?

What libraries and development tools may need to be developed or significantly improved for Leadership Computing in 2010–2020?

How might system-attribute priorities change for Leadership Computing for your application?

In what ways might or should your workflow in 2010–2020 be different from today?

Are there any disruptive technologies that might affect your applications?

Page 15: Scalable Scientific Applications Characteristics & Future Directions

Page 16: Scalable Scientific Applications Characteristics & Future Directions

Page 17: Scalable Scientific Applications Characteristics & Future Directions

Page 18: Scalable Scientific Applications Characteristics & Future Directions

Page 19: Scalable Scientific Applications Characteristics & Future Directions

Page 20: Scalable Scientific Applications Characteristics & Future Directions


Findings in Models and Algorithms

The seven algorithm types are scattered broadly among science domains, with no one particular algorithm being ubiquitous and no one algorithm going unused.
– Structured grids and dense linear algebra continue to dominate, but other algorithm categories will become more common.

Compared to the Seven Dwarfs for current applications, we project a significant increase in Monte Carlo and increases in unstructured grids, sparse linear algebra, and particle methods, as well as a relative decrease in FFTs.
– These projections reflect the expectation of much-greater parallelism in architectures and the resulting need for very high scalability.
– Load balancing, scalable sparse solver, and random number generator algorithms will be more important (see the sketch after this slide).

Some important algorithms are not captured in the Seven Dwarfs.
– Categories expected by application scientists to be of growing importance in 2010–2020 include adaptive mesh refinement, implicit nonlinear systems, data assimilation, agent-based methods, parameter continuation, and optimization.
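The point above about random number generators comes down to giving every parallel worker its own independent, non-overlapping stream. A minimal sketch using NumPy’s SeedSequence spawning is shown below; the root seed, rank count, and sample size are arbitrary assumptions, and in an MPI code each rank would simply use the child stream matching its rank index.

```python
import numpy as np

def make_rank_rngs(root_seed, n_ranks):
    """Spawn one statistically independent random stream per rank so that
    Monte Carlo workers never reuse or overlap each other's sequences."""
    root = np.random.SeedSequence(root_seed)
    children = root.spawn(n_ranks)              # independent child seeds
    return [np.random.default_rng(child) for child in children]

if __name__ == "__main__":
    rngs = make_rank_rngs(root_seed=2008, n_ranks=4)   # assumed values
    # Each "rank" draws from its own stream; results are reproducible
    # given only the root seed and the rank index.
    for rank, rng in enumerate(rngs):
        print(rank, rng.standard_normal(3))
```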

Page 21: Scalable Scientific Applications Characteristics & Future Directions


Findings in Software

The “hero developer” mode is fatalistic
– It does not scale, and no single person can adequately understand the breadth and depth of the issues
– The work can only be accomplished by computer scientists, algorithm developers, application developers, and end-user scientists working together in a tightly integrated manner
– We must develop a means of interfacing between the heterogeneous computer, the developer, and the end-user scientist

We must raise the level of abstraction
– The current approach, based on low-level constructs, places constraints on performance: it over-constrains the compiler and runtime system
– Raising the abstraction level allows for increased algorithm experimentation, incorporation of intent in data structures, flexible memory organization, and inclusion of fault-tolerance constructs
– It enables exploration of power-aware algorithms
– It provides freedom from heroic software efforts having to be the norm

Page 22: Scalable Scientific Applications Characteristics & Future Directions


Findings in Software

Application development and maintenance tools and practices need to fundamentally change

Productivity improvement is an important metric and guide for tool and software choices

Fault tolerance and V&V software components must be used to improve reliability and robustness of application software

Knowledge discovery techniques and tools should be explored to help with bug detection, simulation steering, and data feature extraction and correlation

A holistic view of application data (from input to archival) is needed to most effectively deliver tools for the end-to-end workflow performed by scientists

Page 23: Scalable Scientific Applications Characteristics & Future Directions

Page 24: Scalable Scientific Applications Characteristics & Future Directions

Page 25: Scalable Scientific Applications Characteristics & Future Directions

Page 26: Scalable Scientific Applications Characteristics & Future Directions

Page 27: Scalable Scientific Applications Characteristics & Future Directions

Page 28: Scalable Scientific Applications Characteristics & Future Directions


Applications Analyzed

– CHIMERA (Astrophysics): core-collapse supernova explosion mechanism
– S3D (Turbulent combustion): lifted flame stabilization in diesel and gas turbine engines
– GTC (Fusion): analyze and validate CTEM and ETG core turbulence
– POP (Global ocean circulation): eddy-resolved flow with biogeochemistry
– DCA++ (High-temperature superconductivity): effect of charge and spin inhomogeneities in the Hubbard model superconducting state
– MADNESS (Chemistry): neutron and x-ray spectra of cuprates; dynamics of few-electron systems; metal oxide surfaces in catalytic processes
– PFLOTRAN (Reactive flows in porous media): uranium migration and CO2 sequestration in subsurface geologic formations

Page 29: Scalable Scientific Applications Characteristics & Future Directions


Application Requirements and Workload Reinforce a Balanced-System Assertion

Applications analyzed represent almost one half of our 2008 allocation

A broad range of compute/communicate workloads must be supported
– The balance depends upon the science, the application within that science, and the problem tackled by the application

Application requirements call for breadth in models, algorithms, software, and scaling type
– Physical models: coupled continuum conservation laws, radiation transport, many-body Schrödinger, plasma physics, Maxwell’s equations, turbulence
– Numerical algorithms: each of the “7 dwarfs” is required
– Software implementation: all popular languages are required
– Science drivers: strong scaling (time to solution) and weak scaling (bigger problem)

Application readiness action plans are in place and being followed

2008 allocation by science domain (pie chart): Materials Science 16.0%, Combustion 14.4%, Astrophysics 14.1%, Climate 13.6%, Chemistry 7.4%, Fusion 7.2%, Nuclear Physics 5.2%, QCD 4.9%, Biology 4.8%, Solar Physics 3.3%, Accelerator Physics 3.1%, Computer Science 2.8%, Atomic Physics 1.4%, Geosciences 1.2%, Engineering 0.56%

[Chart: computation vs. communication fraction (0-100% on each axis) for GTC, S3D, POP, CHIMERA, DCA++, MADNESS, and PFLOTRAN.] Distribution in this space depends upon the applications and the problem being simulated for a given application.

Page 30: Scalable Scientific Applications Characteristics & Future Directions


Resource Utilization by Science Applications: Science Dictates the Requirements

Page 31: Scalable Scientific Applications Characteristics & Future Directions


Example: PF Performance Observations and Readiness Plan for Some of our Key Apps

– S3D (scaling need: larger problem): Compute-bound with minimal communication overhead. Reduce memory contention with hybrid parallelism; increase cache reuse.
– GTC (scaling need: larger problem): Compute-bound with minimal communication overhead. Use radial domain decomposition to eliminate cross-core collective calls; reduce the problem size per core for better cache reuse; increase the SSE factor.
– DCA++ (scaling need: solution time): Heavily compute-bound, benefiting from Level-3 BLAS routines (DGEMM, ZGEMM). Very good use of SSE (50%) with no changes. Including the disorder model adds another level of parallelism (a 10x need for more processors). Multithreaded linear algebra will allow additional parallelism at a lower level.
– MADNESS (scaling need: solution time): Fully asynchronous algorithm with communication hidden by the model. Nicely positioned to exploit Gemini. Good SSE factor, but still room for improvement.
– POP (scaling need: solution time): Sizeable communication component. Reduce memory contention time and increase the SSE factor; minimize synchronous behavior; better cache blocking. New physics (biogeochemistry) increases the compute fraction.
– CHIMERA (scaling need: larger problem): Communication dominated by collectives. Production-level physics increases the compute fraction. Reasonable SSE factor, but room for improvement. 20% raw speedup from Gemini without enhancements.
– PFLOTRAN (scaling need: solution time): Communication dominated by collectives. Poor SSE factor, with some room for improvement. Additional phases and chemical species will reduce memory contention (the natural block structure of the Jacobian enables more efficient use of the memory hierarchy).

Page 32: Scalable Scientific Applications Characteristics & Future Directions


Accelerating Development & Readiness

Automated diagnostics
– Drivers: performance analysis, application verification, S/W debugging, H/W-fault detection and correction, failure prediction and avoidance, system tuning, and requirements analysis

Hardware latency
– Latency won’t improve nearly as much as flop rate, parallelism, or bandwidth in the coming years
– Can S/W strategies mitigate high H/W latencies?

Hierarchical algorithms
– Applications will require algorithms aware of the system hierarchy (compute/memory)
– In addition to hybrid data parallelism and file-based checkpointing, algorithms may need to include dynamic decisions between recomputing and storing, fine-scale task-data hybrid parallelism, and in-memory checkpointing (see the sketch after this list)

Parallel programming models
– Improved programming models are needed to allow developers to identify an arbitrary number of levels of parallelism and map them onto hardware hierarchies at runtime
– Models continue to be coupled into larger models, driving the need for arbitrary hierarchies of task and data parallelism
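One concrete form of the in-memory checkpointing mentioned in the list above is “buddy” checkpointing: each rank keeps a copy of a partner rank’s state in RAM so that a single-process failure can be recovered without touching the file system. The mpi4py sketch below is a minimal illustration under assumed conditions (a ring partner scheme and a small dummy state array); it is not the scheme of any specific application discussed here.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Local simulation state (contents and size are illustrative).
state = np.full(1024, float(rank))

def buddy_checkpoint(state):
    """Send my state to the next rank in a ring and receive the previous
    rank's state, so each rank holds an in-memory copy of its buddy's
    data without any file-system I/O."""
    buddy = (rank + 1) % size
    prev = (rank - 1) % size
    copy_of_prev = np.empty_like(state)
    comm.Sendrecv(sendbuf=state, dest=buddy,
                  recvbuf=copy_of_prev, source=prev)
    return copy_of_prev

backup_of_prev_rank = buddy_checkpoint(state)
# If rank r fails, rank (r + 1) % size can resend its in-memory backup to a
# replacement process instead of forcing a restart from a disk checkpoint.
```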

Page 33: Scalable Scientific Applications Characteristics & Future Directions


Accelerating Development & Readiness

Solver technology and innovative solution techniques
– Global communication operations across 10^6-10^8 processors will be prohibitively expensive; solvers will have to eliminate global communication where feasible and mitigate its effects where it cannot be avoided (see the sketch after this list). Research on more effective local preconditioners will become a very high priority.
– If increases in memory B/W continue to lag the number of cores added to each socket, further research is needed into ways to effectively trade flops for memory loads/stores.

Accelerated time integration
– Are we ignoring the time dimension along which to exploit parallelism? (Example: climate)

Model coupling
– Coupled models require effective methods to implement, verify, and validate the couplings, which can occur across wide spatial and temporal scales. The coupling requirements drive the need for robust methods for downscaling, upscaling, and coupled nonlinear solving.
– Evaluation of the accuracy and importance of couplings drives the need for methods for validation, uncertainty analysis, and sensitivity analysis of these complex models.

Maintaining current libraries
– The reliance of current HPC applications on libraries will grow.
– Libraries must continue to perform as HPC systems grow in parallelism and complexity.
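Where a global reduction truly cannot be avoided, one standard mitigation is to overlap it with communication-free local work using MPI-3 nonblocking collectives. The mpi4py sketch below (which assumes an MPI-3 capable library) illustrates the pattern; the quantity being reduced and the “local work” are placeholders, not drawn from any of the solvers discussed above.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# e.g., a local partial dot product feeding a Krylov solver (placeholder value)
local_dot = np.array([float(comm.Get_rank() + 1)])
global_dot = np.empty(1)

# Start the global reduction, but do not wait for it yet.
req = comm.Iallreduce(local_dot, global_dot, op=MPI.SUM)

# Overlap the in-flight reduction with purely local work (placeholder).
local_work = np.sin(np.arange(100_000)).sum()

# Block only when the reduced value is actually needed.
req.Wait()
if comm.Get_rank() == 0:
    print("global dot:", global_dot[0], "local work:", local_work)
```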

Page 34: Scalable Scientific Applications Characteristics & Future Directions


PF Survey Findings (with some opinion)

A rigorous and evolving application-requirements process pays dividends
– It needs to be quantitative: apps cannot “lie” with performance analysis

Algorithm development is evolutionary
– Can we break this mold?
– Example: explore new parallel dimensions (time, energy)

Hybrid/multi-level programming models are virtually nonexistent

No algorithm “sweet spots” (no one-size-fits-all)
– But algorithm footprints share characteristics

V&V and SQA are not in good standing
– This has ramifications for the compute systems as well as the application results generated

No one is really clamoring for new languages

MPI until the water gets too hot (the boiling-frog analogy)

Application lifetimes are >3-5x machine lifetimes
– Refactoring is a way of life

Fault tolerance via defensive checkpointing is the de facto standard
– Won’t this eventually bite us? It artificially drives I/O demands

Weak or strong scaling or both (no winner)

The data analytics paradigm must change

The middleware layer is surprisingly stable and agnostic across apps (and should expand!)

Page 35: Scalable Scientific Applications Characteristics & Future Directions


Summary & Recommendations: EF Survey

We are in danger of failing because of a software crisis unless concerted investments are undertaken to close the H/W-S/W gap
– H/W has gotten way ahead of the S/W (same old story?)

Structured grids and dense linear algebra continue to dominate, but...
– Increases are projected for Monte Carlo algorithms, unstructured grids, sparse linear algebra, and particle methods (with a relative decrease in FFTs)
– Growing importance for AMR, implicit nonlinear systems, data assimilation, agent-based methods, parameter continuation, and optimization

Priorities among computing system attributes are shifting
– Increasing: interconnect bandwidth, memory bandwidth, mean time to interrupt, memory latency, and interconnect latency; this reflects the desire to increase computational efficiency so that peak flops can actually be used
– Decreasing: disk latency, archival storage capacity, disk bandwidth, wide-area network bandwidth, and local storage capacity; this reflects the expectation that computational efficiency will not increase
– Per-core requirements are relatively static, while aggregate requirements will grow with the system

Page 36: Scalable Scientific Applications Characteristics & Future Directions


Summary & Recommendations: EF Survey

System software must possess more stability, reliability, and fault tolerance during application execution
– New fault-tolerance paradigms must be developed and integrated into applications

Job management and efficient scheduling of those resources will be a major obstacle faced by computing centers

Systems must be much better “science producers”
– Strong software engineering practices must be applied to systems to ensure good end-to-end productivity
– Data analytics must empower scientists to ask “what-if” questions, providing S/W and H/W infrastructure capable of answering these questions in a timely fashion (Google desktop)
– Strong data management will become an absolute requirement at the exascale

Just as H/W requires disruptive technologies to accelerate its natural evolutionary path, so too will algorithm, software, and physical-model development efforts need disruptive technologies (invest now!)

Page 37: Scalable Scientific Applications Characteristics & Future Directions


Fusion Simulation Project: Where to find 12 orders in 10 years?
(From David Keyes, FSP Review, ASCAC, 30 April 2008)

Hardware: 3 orders
– 1.5 orders: increased processor speed and efficiency
– 1.5 orders: increased concurrency

Software: 9 orders
– 1 order: higher-order discretizations (the same accuracy can be achieved with many fewer elements)
– 1 order: flux-surface-following gridding (less resolution required along than across field lines)
– 4 orders: adaptive gridding (zones requiring refinement are <1% of the ITER volume, and resolution requirements away from them are ~10^2 less severe)
– 3 orders: implicit solvers (mode growth time is 9 orders longer than the Alfvén-limited CFL)
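Summing the contributions as grouped above reproduces the 12 orders of magnitude in the slide title:

```latex
\underbrace{1.5 + 1.5}_{\text{hardware}} = 3, \qquad
\underbrace{1 + 1 + 4 + 3}_{\text{software}} = 9, \qquad
3 + 9 = 12 \ \text{orders of magnitude}.
```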

Page 38: Scalable Scientific Applications Characteristics & Future Directions


A View from Berkeley (John Shalf)*

Need better benchmarks and better performance models
– For reliable extrapolation of code requirements

Power is driving daunting concurrency

Scalable programming models
– Need to exploit hierarchical machine architecture

Hybrid processors
– More concurrency; need a more generalized approach

Apps must deal with platform reliability

Don’t forget autotuning
– Shows the value of good compilers and associated R&D

Fast, robust I/O is hard

Scaling and concurrency are outstripping our ability to do rigorous V&V

Application code complexity has outgrown available tools

Frameworks and community codes can work, but with certain “rules of engagement”

*ASCAC Fusion Simulation Project Review panel presentation (4/30/08)

Page 39: Scalable Scientific Applications Characteristics & Future Directions


Questions?

Doug Kothe ([email protected])

[Chart: total number of processors in the top 15 systems on each list, June 1993 through June 2006; y-axis from 0 to 350,000 processors.]