
Supercomputing • Communications • Data

NCAR Scientific Computing Division

The Impact of Global Petascale Plans on Geoscience Modeling

Richard Loft

SCD Deputy Director for R&D


Outline

Good news/bad news about the petascale architectures.
NSF Petascale “Track-1” Solicitation.
A sample problem description from NSF Track-1.
NCAR’s petascale science response.
CCSM as a petascale application.


Good news: the race to petascale is on and is international…

Earth Simulator-2 (MEXT) in Japan is committed to regaining the ES leadership position by 2011.
DOE is deploying a 1 PFLOPS peak system by ~2008.
Europe has the ENES EVE:lab program.
NSF Track-1 solicitation targets a 2010 system.
Lots of opportunities for ES modeling on petascale systems - worldwide!
But how does this impact ES research/application development plans?


Bad news: computer architecture is facing significant challenges

Memory wall: memory speeds are not keeping up with the CPU.
– Memory is optimized for density, not speed;
– which causes CPU latency to memory, in terms of the number of CPU clocks per load, to increase;
– which causes more and more on-chip real estate to be used for cache;
– which causes cache lines to get longer;
– which causes microprocessors to become less forgiving of irregular memory access patterns.

Microprocessor performance improvement has slowed since 2002, and is already 3x off the projected level of performance for 2006, based on the pre-2002 historical rate of improvement.
– Key driver is power consumption.
– Future feature-size shrinks will be devoted to more CPUs per chip.
– Rumors of Japanese chips with 1024 CPUs per chip at ISCA-33.

Design verification of multi-billion-gate chips, fab-ability, reliability (MTBF), and fault tolerance are becoming serious issues.

Can we program these things?


Best Guess about Architectures: 2010

5 GHz is looking like an aggressive clock speed for 2010.
For 8 CPUs per socket (chip), this is about 80 GFLOPS peak per socket.
2 PFLOPS is 25,000 sockets with ~200,000 CPUs.
Key unknown is which cluster-on-a-chip architecture will be most effective (there are many ways to organize a CMP).
Vector systems will be around, but at what price?
Wildcards
– Impact of DARPA HPCS program architectures.
– Exotics in the wings: MTAs, FPGAs, PIMs, GPUs, etc.
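As a quick check on these numbers, here is a minimal sketch (in Python) of the peak-rate arithmetic, assuming 2 floating-point operations per clock per CPU (a typical fused multiply-add rate; the per-clock rate is not stated on the slide):

```python
# Back-of-the-envelope peak performance for a hypothetical 2010 system.
# Assumption: 2 flops/clock/CPU (e.g., one fused multiply-add per cycle).
clock_hz = 5.0e9          # aggressive 2010 clock speed
cpus_per_socket = 8
flops_per_clock = 2       # assumed, not stated on the slide

peak_per_socket = clock_hz * cpus_per_socket * flops_per_clock
print(f"Peak per socket: {peak_per_socket / 1e9:.0f} GFLOPS")   # ~80 GFLOPS

target_peak = 2.0e15      # 2 PFLOPS system
sockets = target_peak / peak_per_socket
print(f"Sockets needed:  {sockets:,.0f}")                        # ~25,000
print(f"Total CPUs:      {sockets * cpus_per_socket:,.0f}")      # ~200,000
```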


NSF Track-1 System Background

Source of funds: Presidential Innovation Initiative announced in the SOTU.
Performance goal: 1 PFLOPS sustained on “interesting problems”.
Science goal: breakthroughs.
Use model: 12 research teams per year using the whole system for days or weeks at a time.
Capability system - large everything & fault tolerant.
Single system in one location.
Not a requirement that the machine be upgradable.


The “NSF Track-1” petascale system proposal is out: NSF06-573


Track-1 Project Parameters

Funds: $200M over 4 years, starting FY07
– Single award
– Money is for end-to-end system (as in 625)
– Not intended to fund facility.
– Release of funds tied to meeting hw and sw milestones.

Deployment Stages:
– Simulator
– Prototype
– Petascale system operates: FY10-FY15

Operations funds FY10-15 funded separately.
Facility costs not included.


Two Stage Award Process Timeline

Solicitation out: June 6, 2006
[ HPCS down-select: July, 2006 ]
Preliminary Proposal due: September 8, 2006
– Down selection (invitation to 3-4 to write Full Proposal)
Full Proposal due: February 2, 2007
Site visits: Spring, 2007
Award: Sep, 2007


NSF’s view of the problem

NSF recognizes the facility (power, cooling, space) challenge of this system.
NSF recognizes the need for fault-tolerance features.
NSF recognizes that applications will need significant modification to run on this system.
– NSF expects the Track-1 proposer to discuss needs with application experts (many are in this room).
– NSF application support funds - expect a solicitation out in September, 2006.


Sample benchmark problem (from the Solicitation)

A 12,288-cubed simulation of fully developed homogeneous turbulence in a periodic domain for one eddy turnover time at a value of Rlambda of O(2000). The model problem should be solved using a dealiased, pseudospectral algorithm, a fourth-order explicit Runge-Kutta time-stepping scheme, 64-bit floating point (or similar) arithmetic, and a time-step of 0.0001 eddy turnaround times. Full resolution snapshots of the three-dimensional vorticity, velocity and pressure fields should be saved to disk every 0.02 eddy turnaround times. The target wall-clock time for completion is 40 hours.
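For orientation, here is a minimal 1D sketch (in Python, at toy resolution) of the algorithmic ingredients the benchmark names - a dealiased pseudospectral discretization with a 2/3-rule filter and explicit fourth-order Runge-Kutta time stepping - applied to the viscous Burgers equation rather than the full 3D Navier-Stokes problem; the resolution, viscosity, and time step below are illustrative choices, not the benchmark's:

```python
import numpy as np

# Toy-scale sketch: dealiased pseudospectral + RK4, here for 1D viscous Burgers
# (u_t + u u_x = nu u_xx), not the 3D Navier-Stokes benchmark itself.
N, nu, dt, nsteps = 256, 1e-3, 1e-4, 1000
x = 2 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N) * 1j            # spectral derivative operator (ik)
dealias = np.abs(k.imag) < N / 3.0               # 2/3-rule dealiasing mask

def rhs(uhat):
    """Spectral right-hand side of Burgers: -u u_x + nu u_xx."""
    u = np.fft.ifft(uhat).real
    ux = np.fft.ifft(k * uhat).real
    nonlinear = np.fft.fft(u * ux) * dealias      # dealias the quadratic term
    return -nonlinear + nu * (k ** 2) * uhat      # k**2 = -(wavenumber)^2

uhat = np.fft.fft(np.sin(x))                      # initial condition
for _ in range(nsteps):                           # classical RK4 time stepping
    k1 = rhs(uhat)
    k2 = rhs(uhat + 0.5 * dt * k1)
    k3 = rhs(uhat + 0.5 * dt * k2)
    k4 = rhs(uhat + dt * k3)
    uhat = uhat + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

u = np.fft.ifft(uhat).real                        # final physical-space field
```

The benchmark problem applies the same structure in three dimensions, with the FFT work per step dominating the flop count estimated on the next slide.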


Back-of-the-envelope calculations for the turbulence example

N = 12288
One 64-bit variable = 14.8 TB of memory
– 3 time levels x 5 variables (u,v,w,P,vor) ~222 TB of memory
File output every 0.02 eddy turn-over times = 74 TB/snapshot
Total output in 40-hour run: 3.7 PB
– I/O BW >> 256 GB/sec
~3*(N*N*N*65) = 361.8 TFLOP/field/step
Assume 4 variables (u,v,w,P) = 1.447 PFLOP/step
Must average 14.4 seconds/step
– Mean FLOPS rate ~100 TFLOPS (what a coincidence)
– Real FLOPS rate >> 100 TFLOPS because of I/O + comms overhead
Must communicate ~ 3*8*4*N*N*N = 178 TB per timestep - aggregate network BW >> 12.4 TB/sec
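A minimal sketch (in Python) reproducing the arithmetic above; the 3*N^3*65 flops-per-field-per-step formula, the communication volume formula, and the snapshot schedule are taken directly from the slide and the benchmark description:

```python
# Back-of-the-envelope resource estimates for the 12288^3 turbulence benchmark.
N = 12288
bytes_per_word = 8                       # 64-bit arithmetic
eddy_times = 1.0                         # simulated duration (eddy turnover times)
dt = 1e-4                                # eddy times per step -> 10,000 steps
snapshot_interval = 0.02                 # eddy times between outputs
wallclock_hours = 40.0

field_tb = N**3 * bytes_per_word / 1e12
print(f"One field:        {field_tb:6.1f} TB")                  # ~14.8 TB
print(f"Resident memory:  {3 * 5 * field_tb:6.1f} TB")          # 3 levels x 5 vars

snapshot_tb = 5 * field_tb                                       # u,v,w,P,vorticity
n_snapshots = eddy_times / snapshot_interval
print(f"One snapshot:     {snapshot_tb:6.1f} TB")                # ~74 TB
print(f"Total output:     {snapshot_tb * n_snapshots / 1e3:4.1f} PB")  # ~3.7 PB

steps = eddy_times / dt
sec_per_step = wallclock_hours * 3600 / steps                    # ~14.4 s
flop_per_step = 4 * 3 * N**3 * 65                                # 4 fields, slide formula
print(f"Sustained rate:   {flop_per_step / sec_per_step / 1e12:5.0f} TFLOPS")

comm_per_step_tb = 3 * 8 * 4 * N**3 / 1e12                       # ~178 TB per step
print(f"Network BW:       {comm_per_step_tb / sec_per_step:5.1f} TB/s")
```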


Turbulence problem system resource estimates

>> 5 GFLOPS sustained per socket
– Computations must scale well on chip.
– 10 GFLOPS sustained is probably more realistic.
– Probably doable with optimized RFFT calls (FFTW).

> 8 GB memory/socket (1-2 GB/CPU)

>> 0.5 GB/sec/socket sustained system bisection BW
– For a ~100,000-byte message
– Realistically ~2 GB/sec/socket

>> 670 disk spindles saturated during I/O
– Realistically ~2k-4k disks
– Multi-PByte RAID
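For reference, a minimal sketch (in Python) of how per-socket figures of this order follow from the aggregate estimates, assuming the ~25,000-socket configuration from the 2010 "best guess" slide and a nominal ~0.4 GB/sec of sustained streaming bandwidth per disk spindle (the spindle rate is an assumption, not something stated in the slides):

```python
# Deriving per-socket requirements from the aggregate turbulence estimates.
sockets = 25_000                  # from the 2010 "best guess" configuration
sustained_tflops = 100.0          # mean rate needed for the 40-hour run
network_tb_per_s = 12.4           # aggregate communication bandwidth
io_gb_per_s = 256.0               # snapshot I/O bandwidth floor
spindle_gb_per_s = 0.4            # assumed per-disk sustained streaming rate

print(f"Sustained/socket: {sustained_tflops * 1e3 / sockets:4.1f} GFLOPS")  # ~4
print(f"Bisection/socket: {network_tb_per_s * 1e3 / sockets:5.2f} GB/s")    # ~0.5
print(f"Disk spindles:    {io_gb_per_s / spindle_gb_per_s:4.0f}")           # ~640
```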


UCAR Peta-Process

Define UCAR petascale science goals.
Develop system requirements.
– Make these available to Track-1 proposal writers.
Define application development resource requirements.
– Fold these into proposals for additional resources.


Peta-process details…

A few strategic, specific, and realistic science goals for exploiting petascale systems. What are the killer apps?

The CI requirements (system attributes, data archive footprint, etc.) of the entire toolchain for the science workflow for each.

A mechanism to provide part of this information to all the consortia competing for the $200M.

The project retasking required to ultimately write viable proposals for time on petascale systems over the next 4 years.

Resource requirements for
– staff augmentations
– local equipment infrastructure enhancements

Build University collaborations to support this effort.


Relevant Science Areas (from the NSF Track-1 Solicitation)

The detailed structure of, and the nature of intermittency in, stratified and unstratified, rotating and non-rotating turbulence in classical and magnetic fluids, and in chemically reacting mixtures;

The nonlinear interactions between cloud systems, weather systems and the Earth’s climate;

The dynamics of the Earth’s coupled, carbon, nitrogen and hydrologic cycles;

The decadal dynamics of the hydrology of large river basins;

The onset of coronal mass ejections and their interaction with the Earth’s magnetic field, including modeling magnetic reconnection and geo-magnetic sub-storms;

The coupled dynamics of marine and terrestrial ecosystems and oceanic and atmospheric physics;

The interaction between chemical reactivity and fluid dynamics in complex systems such as combustion, atmospheric chemistry, and chemical processing.


Peta-science Ideas (slide 1 of 2)

Topic 1. Across-scale modeling: simulation of the 21st-century climate with a coupled atmosphere-ocean model at 0.1 degree resolution (eddy resolving in the ocean). For specific time periods of the integration, shorter-time simulations with higher spatial resolution: 1 km with a nonhydrostatic global atmospheric model and 100 m resolution in a nested regional model. Emphasis will be put on the explicit representation of moist turbulence, convection, and the hydrological cycle.

Topic 2. Interactions between atmospheric layers and the response of the atmosphere to solar variability. Simulations of the atmospheric response to 10-15 solar cycles, derived with a high-resolution version of WACCM (with explicit simulation of the QBO) coupled to an ocean model.


Peta-science Ideas (slide 2 of 2)

Topic 3. Simulation of Chemical Weather: a high-resolution chemical/dynamical model with a rather detailed chemical scheme. Study of global air pollution and the impact of mega-cities, wildfires, and other pollution sources.

Topic 4. Solar MHD: a high-resolution model of turbulence in the solar photosphere.


Will CCSM qualify on the Track-1 system? Maybe!

This system is designed to be 10x bigger than the Track-2 systems being funded by OCI money at $30M apiece over the next 5 years.

Therefore, a case can be made for a qualifying application that could run on at least 10-25% of the system, even as an ensemble, in order to justify needing the resource.

This means a good target for one instance of CCSM is 10,000 to 50,000 processors.

John Dennis (with Bryan and Jones) has done some work on POP 2.1 @ 0.1 degree that looks encouraging at 30,000 CPUs…


Some petascale application do’s and don’ts

Don’t assume the node is a rack-mounted server.
– No local disk drive.
– No full kernel - beware of relying on system calls.

No serialization of memory or I/O.
– Global arrays read in and distributed.
– Serialized I/O (i.e., through node 0) of any kind will be a show stopper (see the I/O sketch after this list).
– Multiple-executable applications may be problematic.
– Eschew giant look-up tables.

No unaddressed load imbalances.
– Dynamic load balancing schemes.
– Communication overlapping.

Algorithms with irregular memory accesses will perform poorly.
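To make the serialized-I/O point concrete, here is a minimal sketch (in Python with mpi4py, assuming it is available on the target system) of collective MPI-IO in which every rank writes its own slab of a distributed global array, rather than funnelling all data through rank 0:

```python
from mpi4py import MPI
import numpy as np

# Each rank owns a contiguous slab of a (hypothetical) distributed global field
# and writes it with collective MPI-IO -- no gather to rank 0, no serialization.
comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n_global = 1 << 20                                  # toy global array length
n_local = n_global // nprocs                        # assume it divides evenly
local = np.full(n_local, rank, dtype=np.float64)    # this rank's slab of data

fh = MPI.File.Open(comm, "snapshot.bin",
                   MPI.MODE_WRONLY | MPI.MODE_CREATE)
offset = rank * n_local * local.itemsize            # byte offset of this slab
fh.Write_at_all(offset, local)                      # collective, parallel write
fh.Close()
```

This is the pattern that higher-level parallel I/O libraries such as pNetCDF build on.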


Some design questions for discussion

How can CCSM use different CMP and/or MTA architectures effectively? Consider:
– Power5 (2 CPUs per chip, 2 threads per CPU)
– Niagara (8 CPUs per chip, 4 threads per CPU)
– Cell (asymmetrical - 1 scalar + 8 Synergistic processors per chip)

New coupling strategies?
– Time to scrap multiple executables?

New ensemble strategies?
– Stacking instances as multiple threads on a CPU?

Will the software we rely on scale?
– MPI
– ESMF
– pNetCDF
– NCL