
Supercomputing Programme
A seven-year programme to enhance the computational and numerical prediction capabilities of the Bureau's forecast and warning services.

Tim Pugh

Supercomputer Programme Director

Australian Bureau of Meteorology Tuesday, December 13, 2016

• National/Global observing system: atmosphere, marine, water, land, space

• 24/7 Operational forecasting systems for weather, climate, oceans and flooding

• Supercomputing and massive data storage

• High uptime internet communications and disaster recovery

• Professional forecasting capability across multiple disciplines

• Experts outposted to the Australian Defence Force, State Emergency Centres and Aviation Operation Centres

Reliable, resilient, national capability

New funding announced by the Australian Government in May 2014

Seven-Year Programme from July 2014 to July 2021

• Funding for the Supercomputer system, Supporting Data Processing and Storage systems, Data Centre and Networks, and the Numerical Prediction Project (Transitions to Operations)

Programme Investment Areas across People, Processes, Science and Technology

» Benefit Planning and Realisation (Supercomputer and Services Board)
  – Investments, Priorities, Delivery and Schedules, Social Economic Value, Return on Investment
» Infrastructure (Information Systems and Services)
  – Data centre, networks, HPC and Data Intensive Computing, Software Services, Suite and Job Scheduling, UM Modelling Infrastructure, System and Application Monitoring
» Delivery (Science to Services)
  – Scientific Computing Service, Model Build Team, Numerical Prediction, Guidance Post Processing, Model Data Services, Software Lifecycles, Verification Frameworks, Software Services
» Scalability (Research and Development)
  – Future architectures, Growth in Compute and Data, Software Engineering, Skills

Forecast Production Value Chain

[Figure: forecast production value chain, with continuous improvement through research and verification.]

• More accurate - particularly for the location, timing and direction of rainfall, storms and wind changes
• More up-to-date - more frequent forecasts available
• More valuable - for decision makers, by quantifying forecast outcome probabilities using ensembles
• More responsive - through the capability to produce additional, on-demand, detailed forecasts for multiple extreme weather and hazard events across Australia.

Investments and Outcomes

[Figure: services across timescales, from weather (minutes, hours, days, weeks) through climate variability (months, seasons, years) to climate change (decades, centuries). Products range from alerts, watches and warnings through forecasts, outlooks and predictions to guidance and scenarios, supporting emergency response, sectoral preparedness planning, strategic planning and international policy negotiation. Forecast uncertainty grows with lead time.]

Environmental Modelling in the Bureau

Australis HPC system: Numerical Prediction for weather, climate, marine, hydrology and space weather

Supercomputer details: Cray Inc. will supply the new supercomputer. US$59 million has been allocated for the project.

Numerical Weather Prediction Roadmap

Projection of Nominal Modelling Resolutions for Future Computing Systems

[Figure: model topography of Sydney, NSW, comparing operational grid resolutions with a 1.5 km research topography.]

2013 suite:
• 40 km Global Model: 2 x daily 10-day & 3-day forecast
• 12 km Regional Model: 4 x daily 3-day forecast
• 4 km City/State Model: 4 x daily 36-hour forecast

2020 suite:
• 12 km Global Model: 2 x daily 10-day & 3-day forecast
• 4.5 km Regional Model: 8 x daily 3-day forecast
• 1.5 km City/State Model: 24 x daily 18-hour or 36-hour forecast

Increasing model resolution for improved local information, and future model ensembles for the likelihood of significant weather.

Modelling Outcomes to Achieve

Capability: Model grid resolution (horizontal only)
• 2014 HPC system: ACCESS-G (global) 40 km; ACCESS-R (regional) 12 km; ACCESS-C (city) 4 km
• New HPC systems (2016 to 2021): ACCESS-G 25 km > 10 km; ACCESS-R 12 km > 4.5 km; ACCESS-C 1.5 km

Capability: Regular forecast updates (times per day)
• 2014 HPC system: Global 4 times; Regional 4 times; City 4 times
• New HPC systems: Global 4 times; Regional 8 times; City and on-demand up to 24 times

Capability: Tropical cyclone forecasts (horizontal grid resolution, forecast length)
• 2014 HPC system: 12 km, out to 3 days, up to 3 concurrent events
• New HPC systems: 12 km > 4.5 km, out to 5 days

Capability: Ensemble forecasts (certainty for decision makers)
• 2014 HPC system: None
• New HPC systems: Yes (Global, City, TC, Relocatable)

Capability: Additional, on-demand, high-resolution forecasts for extreme weather
• 2014 HPC system: None
• New HPC systems: 1.5 km, up to 4 concurrent events, up to 24 times per day

What is the Decoupler Strategy?

Products Gen:
• Best gridded data
• Standard methods
• Common data services
• API management

HPC Apps:
• 1-2 updates per annum
• Grid enhancements
• Modelling enhancements
• Initial state enhancements

Service Apps:
• Agile application development
• Product consistency (5-10 yrs)
• Data access consistency
• Fit-for-purpose quality improvements over time

A key aim is to break the coupling between numerical prediction models and customer-specific forecast products. The decoupler acts as an interface between them, absorbing requirements from both sides to ensure that a change to one does not affect the other.
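As a rough illustration of the decoupling idea (a minimal sketch, not the Bureau's implementation; every class, function and dataset name below is hypothetical), product applications can depend on a thin data-access layer that resolves "best gridded data" requests against whichever model version currently supplies them:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of a "decoupler" data-access layer. Product apps depend
# only on this interface, not on the HPC model suites, so grid or model
# upgrades on the HPC side do not break downstream products.

@dataclass
class GridRequest:
    variable: str          # e.g. "air_temperature"
    valid_time: datetime   # forecast validity time
    level: str = "best"    # default to the blended "best" gridded data

class GriddedDataService:
    def __init__(self):
        # Registry mapping variables to the model source that currently
        # provides them; updated when a new model version is promoted.
        self._sources = {"air_temperature": "access-c_v3"}  # hypothetical IDs

    def register_source(self, variable: str, source_id: str) -> None:
        """Swap the upstream model without touching product applications."""
        self._sources[variable] = source_id

    def fetch(self, request: GridRequest) -> str:
        source = self._sources[request.variable]
        # A real service would read from the common data store / managed API;
        # here we just describe what would be fetched.
        return (f"{request.variable} @ {request.valid_time:%Y-%m-%dT%H} "
                f"from {source} ({request.level})")

# Usage: the product side is unaffected when the source model is upgraded.
service = GriddedDataService()
req = GridRequest("air_temperature", datetime(2016, 12, 13, 0))
print(service.fetch(req))
service.register_source("air_temperature", "access-c_v4")  # model upgrade
print(service.fetch(req))
```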


What is the Best Gridded Data?

Data processing levels define the level of processing applied to the data. Use Level 3 (Best Data) by default (a minimal sketch follows below).

[Diagram: data quality increases across the processing levels, moving from strong coupling to model-specific output towards weak or no coupling at Level 3 (Best Data).]
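Here is a minimal sketch of the level-selection rule. Only "Level 3 = Best Data" comes from the slides; the meanings given to the lower levels are assumptions for illustration.

```python
from enum import IntEnum
from typing import Optional

# Minimal sketch of data processing levels. Only "Level 3 = Best Data" is
# taken from the material above; the lower-level descriptions are assumed.
class ProcessingLevel(IntEnum):
    LEVEL_0 = 0  # assumed: raw model output
    LEVEL_1 = 1  # assumed: quality-controlled fields
    LEVEL_2 = 2  # assumed: derived / post-processed guidance
    LEVEL_3 = 3  # Best Data: the default for downstream products

def select_level(requested: Optional[ProcessingLevel] = None) -> ProcessingLevel:
    """Use Level 3 (Best Data) by default; callers may explicitly request a
    lower level when they need less-processed, model-specific output."""
    return ProcessingLevel.LEVEL_3 if requested is None else requested

print(select_level())                        # ProcessingLevel.LEVEL_3
print(select_level(ProcessingLevel.LEVEL_1)) # ProcessingLevel.LEVEL_1
```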

HPC Production Workflow

[Diagram: compute-intensive work runs on the Australis Cray XC40 East and XC40 West systems over Lustre storage, with a separate XC40 development system (Terra) on its own Lustre filesystem under a PBSpro development scheduler. Data-intensive work runs on the Aurora CS400 North and CS400 South clusters over GPFS storage under a PBSpro production scheduler, with a PBSpro staging scheduler, a VM development cloud, and an IT Ops dashboard spanning the development, staging and production environments.]
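All three environments are driven by PBSpro schedulers. As a hedged illustration only (the job name, queue, resource request and wrapper script below are invented, not the Bureau's suite configuration), handing a task to such a scheduler looks like this:

```python
import subprocess

# Illustrative PBS Pro submission. Only the qsub mechanics are standard;
# the queue name, resources and wrapper script are hypothetical.
PBS_SCRIPT = """#!/bin/bash
#PBS -N nwp_postproc
#PBS -q prod
#PBS -l select=1:ncpus=24:mem=64gb
#PBS -l walltime=00:30:00
cd "$PBS_O_WORKDIR"
./run_postproc.sh
"""

def submit(script: str) -> str:
    """Pipe a job script to qsub on stdin and return the PBS job id."""
    result = subprocess.run(
        ["qsub"], input=script, text=True, capture_output=True, check=True
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted:", submit(PBS_SCRIPT))
```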

Achieving Automation in Modelling
New approaches and improving standards in software development

[Diagram: SCS-Workflow Dev on Terra (Dev), SCS-Workflow Stage on Australis (Stage) and SCS-Workflow Prod on Australis (Prod), fed from a Git repository (scs-repos-dev) and an Artifactory binary store. Feature branches flow into a dev branch and then into the master branch, from which production deployments are made.]

• User-space development, some automated testing, automated deployments, service account model
• Automated testing, versioned deployments, "one-step" installation, service account model

Covering suite schedulers, computational platforms and software services.
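As a rough sketch of the "automated testing, versioned deployments, one-step installation" pattern (the repository layout, paths and commands below are hypothetical, not the Bureau's actual tooling), a single deployment step might look like:

```python
import subprocess
from pathlib import Path

# Hypothetical "one-step" versioned deployment: run the test suite, install
# the suite into a version-stamped directory, then flip a "current" symlink.
REPO = Path("/opt/scs/checkout")          # hypothetical working copy
INSTALL_ROOT = Path("/opt/scs/releases")  # hypothetical install area

def run(cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

def deploy(version: str) -> None:
    run(["python", "-m", "pytest", "tests"], cwd=REPO)          # automated testing
    target = INSTALL_ROOT / version
    target.mkdir(parents=True, exist_ok=True)
    run(["rsync", "-a", "--delete", f"{REPO}/", f"{target}/"])  # versioned copy
    current = INSTALL_ROOT / "current"
    if current.is_symlink() or current.exists():
        current.unlink()
    current.symlink_to(target)   # switch the live release in one step

if __name__ == "__main__":
    deploy("v2.3.0")  # hypothetical version tag
```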

DevOps to Production: Simulation to Services

[Diagram: the same development (Terra XC40), staging and production platforms (Aurora CS400 North/South over GPFS, Australis XC40 East/West over Lustre) as above, with simulation output copied out to products, and products copied out again to downstream service clouds such as an Emergency Services Cloud, an Aviation Services Cloud and other service clouds.]

BoM Production & Staging Platforms (Australis): 38x performance, 8x electrical power

Parameter | 2015 Australis (delivered) | 2018 Addition (projected) | 2018 Australis (projected) | Ngamai HPC System (retired Oct '16) | Increase relative to Ngamai
Processor | Intel Xeon Haswell, 12-core, 2.6 GHz | Intel Xeon Skylake | Intel Xeon Haswell + Skylake | Intel Xeon Sandy Bridge, 6-core, 2.5 GHz | -
Nodes | 2,160 | 1,952 | 4,112 | 576 | 2015: 3.8x; 2018: 7.1x
Cores | 51,840 | 78,080 | 129,920 | 6,912 | 2015: 7.5x; 2018: 18.8x
Aggregate Memory | 276 TB | 375 TB | 651 TB | 36.9 TB | 2015: 7.5x; 2018: 17.7x
Usable Storage | 4,320 TB | 4,320 TB | 8,640 TB | 214 TB | 2015: 20.2x; 2018: 40.4x
Storage Bandwidth | 135 GB/s | 171 GB/s | 306 GB/s | 16 GB/s | 2015: 8.4x; 2018: 19.1x
Sustained System Performance (SSP) | 253 | 365 | 618 | 16 | 2015: 15.6x; 2018: 38.1x
Typical Power Use | 865 kW | 783 kW | 1,648 kW | 200 kW | 2015: 4.3x; 2018: 8.2x

(2015 Australis + 2018 Addition = 2018 Australis)

Computational capacity and performance of Aurora

Specification: 2 Clusters

Number of Nodes: 20 (16 Compute, 4 GPU Compute)

Service Nodes: 7 (1 NFS, 2 General Purpose, 1 Jump Server, 1 LNET Router, 2 Management Nodes)

Processors per Node: 2

Processor Type: Intel Xeon Broadwell E5-2695v4 18-core 2.1GHz 120W

Memory: 256 GiB (8x 32GB DDR4 2400MHz)

Internal Storage: 1x Intel DC P3608 Series 1.6TB

Network Port:

1x onboard Mellanox Connect-IB with Virtual Protocol Interconnect (VPI), providing 10/20/40/56 Gb/s InfiniBand through a single-port QSFP+

Accelerator Card: 1x NVIDIA K80 GPU with 24GB RAM (4 nodes)

BoM Midrange Production & Staging Platforms (Aurora)

Integrated CS-400 data processing system with two clusters of nodes
• Dual-socket nodes with Intel 18-core Broadwell, 256 GB DDR4, FDR InfiniBand interconnects (2 x 720 cores)
• Addition of 1.6 TB Intel NVMe flash on all compute nodes and a handful of NVIDIA K80 GPUs for visualisation and data processing
• GPFS data storage system based on a DDN GS14K system with 150 TB SSD and 2 PB HDD storage pools.

CS-400 (Aurora) running data-intensive workloads
• Single-node workloads – pre/post-processing
• Small file workloads (GPP/OCF)
• Product generation – data quality verification
• Replacing components previously running on Ngamai and RTDS4 (midrange system)
• Will host new capability, for example:
  – Master Data Management
  – Operational Data Store
  – Data Management/Portal Services
  – GPGPU data processing and visualisation
  – Capacity to cope with the ACCESS NWP v3 modelling system

“TERRA”, a new Cray XC40 System

TERRA is one sixth the size of the AUSTRALIS production system.

Two-phase delivery:

2016 system: 117 teraflops, 144 nodes, 3,456 cores, 18.4 TB memory, 1,440 TB usable data storage, 45 GB/s I/O bandwidth
• includes I/O accelerators – 48 TB of NVMe flash (DataWarp I/O) for computational research and workflow optimisation (reduced elapsed times)
• includes compute accelerators – NVIDIA GPUs and Intel Xeon Phi for computational research and application optimisation

2018 upgrade: 473 teraflops, 321 nodes, 10,536 cores, 52.4 TB memory, 2,880 TB usable data storage, 90+ GB/s I/O bandwidth

Delivered in May 2016, accepted on 30 June and commissioned on 1 September 2016:
• to support our Development to Operations (DevOps) methodology and the pathway from the NCI computing facility to the AUSTRALIS production system
• to facilitate porting, testing and preparation of scientific code in development for operations on AUSTRALIS.

BoM Development System (Terra)
For application porting and scientific computing development

Parameter | 2015 Dev System | 2018 Addition | 2018 Dev System | 2013 HPC System
Processor | Intel Xeon Haswell, 12-core, 2.6 GHz | Intel Xeon Skylake, 20-core | Intel Xeon Haswell + Skylake | Intel Xeon Sandy Bridge, 6-core, 2.5 GHz
Nodes | 144 | 177 | 321 | 576
Cores | 3,456 | 7,080 | 10,536 | 6,912
Aggregate Memory | 18.4 TB | 34.0 TB | 52.4 TB | 36.9 TB
Global Filesystem Technology | Cray/Seagate Sonexion Lustre 2.5.1+ | Cray/Seagate Sonexion Lustre | Cray/Seagate Sonexion Lustre | Oracle Lustre 1.8.8
Usable Storage | 1,440 TB | 1,440+ TB | 2,880+ TB | 214 TB
Storage Bandwidth | 45 GB/s | 45+ GB/s | 90+ GB/s | 16 GB/s
Data Storage Acceleration | DataWarp I/O, 48 TB SSD, 33.6 GB/s bandwidth | NVDIMMs | DataWarp I/O, 48 TB SSD, 33.6 GB/s bandwidth | N/A
Compute Interconnect | Cray Aries, 93–157 Gb/s | Cray Aries, 93–157 Gb/s | Cray Aries, 93–157 Gb/s | InfiniBand QDR, 40 Gb/s
Typical Power Use | 71 kW | 91 kW | 162 kW | 200 kW
Top 500 Rmax Linpack | 110 TF | 362 TF | 473 TF | 104 TF

(2015 Dev System + 2018 Addition = 2018 Dev System)

Computing Memory and Storage Trends (2016)

Current model: CPU with on-node memory (DRAM); off-node (external) parallel storage (HDD) and archive storage (HDD & tape).

Future model: CPU with on-node near memory (HBM/HMC) and far memory (DRAM/NVDIMM); off-node (internal/HSN) near storage (flash); off-node (external) parallel storage (HDD) and archive storage (HDD & tape).

Sourced from Cray (2015).

HBM = High Bandwidth Memory
HMC = Hybrid Memory Cube or 3D-stacked DRAM
MCDRAM = Multi-Channel DRAM (3D-stacked DRAM)
Flash = (solid-state) non-volatile data storage chip
SSD = Solid State Disk (flash-based storage device)
HDD = Hard Disk Drive (spinning disk)

NWP Model Data Production

Annual data volume (PB):

APS version | Deterministic | Ensemble | Total (per year)
APS1 | 1.6 | – | 1.6
APS2 | 2.8 | 2.8 | 5.6
APS3 | 23.0 | 52.8 | 75.8
APS4 | 41.2 | 87.6 | 128.8

Daily production: APS1 = 4 TB, APS2 = 15 TB, APS3 = 208 TB, APS4 = 353 TB

Australis data production (not storage).
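The annual volumes are consistent with the stated daily production rates; a quick check (assuming a simple 365-day year and 1 PB = 1,000 TB):

```python
# Consistency check: scale daily production (TB) to annual volume (PB),
# assuming 365 days per year and 1 PB = 1,000 TB.
daily_tb = {"APS1": 4, "APS2": 15, "APS3": 208, "APS4": 353}
for aps, tb in daily_tb.items():
    print(f"{aps}: {tb * 365 / 1000:.1f} PB/year")
# Prints roughly 1.5, 5.5, 75.9 and 128.8 PB/year, matching the charted
# totals of 1.6, 5.6, 75.8 and 128.8 PB to within rounding.
```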

Application Computational Efficiency and Scalability

Objective:
A proposed collaboration to establish improved performance of ACCESS climate and weather modelling on the next generation of HPC systems.

Proposed Goals:
• To computationally meet the operational time windows and throughput needs of weather, climate, and earth system modelling.
• To utilise new computational architectures, programming models, and algorithms to improve model performance and scalability from hundreds to thousands of processor cores.
• To achieve the capacity computing and I/O storage throughput needed to support ensemble modelling systems for high-resolution weather and coupled climate modelling.
• To elevate the collaboration and contributions in the development of the Australian Community Climate and Earth-System Simulator (ACCESS) and weather prediction.

Proposed collaboration partner: Met Office.

Tsunami Events – Modelling Realtime Events with GPUs

• Currently based on pre-computed scenarios using the NOAA MOST tsunami model
• A runtime of more than 60 minutes made it impossible to run a real-time simulation during an event
• Performance improvement achieved in 6 weeks by two HPC programmers
• Initial results for a 24-hour simulation of tsunami wave propagation:

Serial code, Intel Xeon Haswell: > 3,600 sec (> 60 min)
OpenMP, 24 cores, Intel Xeon Haswell: 262 sec (~4.4 min)
CUDA, 1 GPU, NVIDIA K80 Tesla: 134 sec (~2.3 min)
CUDA, 8 GPUs, NVIDIA K80 Tesla: 22 sec (~0.3 min)

The parallel and GPU versions allow on-demand simulation of a tsunami event:
– More accurate forecasts of effects
– Ensemble modelling
– Better uncertainty estimation and improved risk maps
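The speedups above come from exposing the per-grid-point updates as data-parallel loops that OpenMP threads or CUDA blocks can divide among themselves. As a hedged illustration only (this is not the MOST code; it is a generic explicit stencil update on an invented grid), the loop structure looks like this:

```python
import numpy as np

# Illustrative data-parallel stencil update, NOT the NOAA MOST tsunami model.
# Every interior grid point is updated independently from its neighbours'
# previous values, which is the structure OpenMP threads or CUDA thread
# blocks split between themselves in the benchmarked versions.
def step(h, dt=0.1, dx=1.0, c=1.0):
    """One explicit, diffusion-like relaxation step on a 2-D field h."""
    new = h.copy()
    # Vectorised over all interior points; in the OpenMP/CUDA versions this
    # region is partitioned across threads or GPU blocks.
    new[1:-1, 1:-1] = h[1:-1, 1:-1] + dt * (c / dx) ** 2 * (
        h[2:, 1:-1] + h[:-2, 1:-1] + h[1:-1, 2:] + h[1:-1, :-2]
        - 4.0 * h[1:-1, 1:-1]
    )
    return new

# Tiny demo on an invented 256 x 256 grid with a single initial disturbance.
h = np.zeros((256, 256))
h[128, 128] = 1.0
for _ in range(100):
    h = step(h)
print("max amplitude after 100 steps:", float(h.max()))
```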

Topics of Interest

• Improved data analytics and workflows

– New data storage technology

– Architectural configuration

– New software tools and pipelines

– Containers in HPC & data processing

• Meteorological Archival and Retrieval Systems

• Data services and data management

– Data access services

– Data integrity management

– Data and metadata management

– Data virtualisation and aggregation

• System Monitoring and Analytics to improve system robustness and resilience

• Machine Learning in observation and forecast sciences

– Radar image recognition and tracking

– Observation quality control

– Probabilistic forecasting and ensemble member design and assessment (quality)

– Nowcasting applications

– Predictive analytics for system faults

• Computational Sciences

– Software engineering for next generation applications

– New processor architectures (GPU, Xeon Phi, FPGA)

– Software tools and domain interpretations

– New algorithms in next generation numerical weather and climate prediction