ALCF at Argonne
Argonne Leadership Computing Facility
ALCF at Argonne
- Opened in 2006
- Operated by the Department of Energy's Office of Science
- Located at Argonne National Laboratory (30 miles southwest of Chicago)
IBM Blue Gene/P, Intrepid (2007)
- 163,840 processors
- 80 terabytes of memory
- 557 teraflops
- Energy-efficient system uses one-third the electricity of machines built with conventional parts
- #38 on Top500 (June 2012); #15 on Graph500 (June 2012)

The groundbreaking Blue Gene
- General-purpose architecture excels in virtually all areas of computational science
- Presents an essentially standard Linux/PowerPC programming environment
- Significant impact on HPC: Blue Gene systems are consistently found in the top ten list
- Delivers excellent performance per watt
- High reliability and availability
IBM Blue Gene/Q, Mira
- 768,000 processors
- 768 terabytes of memory
- 10 petaflops
- #3 on Top500 (June 2012); #1 on Graph500 (June 2012)
- Blue Gene/Q Prototype 2 ranked #1 (June 2011)
Programs for Obtaining System Allocations
For more information, visit http://www.alcf.anl.gov/collaborations/index.php
The U.S. Department of Energy’s INCITE Program
INCITE seeks out large, computationally intensive research projects and awards more than a billion processing hours to enable high-impact scientific advances.
- Open to researchers in academia, industry, and other organizations
- Proposed projects undergo scientific and computational readiness reviews
- More than a billion total hours are awarded to a small number of projects
- Sixty percent of the ALCF's processing hours go to INCITE projects
- Call for proposals issued once per year
2012 INCITE Allocations by Discipline
[Pie chart of 2012 INCITE allocation shares across nine disciplines: Astrophysics, Biological Sciences, Chemistry, Computer Science, Earth Science, Energy Technologies, Engineering, Materials Science, Physics]
Total Argonne hours = 732,000,000
World-Changing Science Underway at the ALCF
- Research that will lead to improved, emissions-reducing catalytic systems for industry (Greeley)
- Enhancing public safety through more accurate earthquake forecasting (Jordan)
- Designing more efficient nuclear reactors that are less susceptible to dangerous, costly failures (Fischer)
- Accelerating research that may improve diagnosis and treatment for patients with blood-flow complications (Karniadakis)
- Protein studies that apply to a broad range of problems, such as finding a cure for Alzheimer's disease, creating inhibitors of pandemic influenza, or engineering a step in the production of biofuels (Baker)
- Furthering research to bring green energy sources, like hydrogen fuel, safely into our everyday lives, reducing our dependence on foreign fuels (Khokhlov)
ALCF Service Offerings
- Scientific liaison ("Catalyst") for INCITE and ALCC projects, providing collaboration along with assistance with proposals and planning
- Startup assistance and technical support
- Performance engineering and application tuning
- Data analysis and visualization experts
- MPI and MPI-I/O experts
- Workshops and seminars
A single node
- Can be carved up into multiple MPI ranks, or run as a single MPI rank with threads: up to 4 MPI ranks/node on Intrepid, up to 64 MPI ranks/node on Mira
- SIMD is available on the cores and required to reach peak flop rate: 2-way FPU on Intrepid, 4-way FPU on Mira
- Runs a Compute Node Kernel, requiring cross-compiling from the front-end login nodes
- Forwards I/O operations to an I/O node, which aggregates requests from multiple compute nodes
- No virtual memory: 2 GB/node on Intrepid, 16 GB/node on Mira
- No fork()/system() calls
A partition
- Partitions come in pre-defined sizes and give you isolation from other users
- You also get the I/O nodes connecting you to the GPFS filesystems, so a scalable I/O strategy is required!
- Partitions can be as small as 512 nodes (16 on the development rack), up to the size of the full machine
- At the small end, the minimum size is governed by the ratio of I/O nodes to compute nodes
- At the large end, sizes are governed by the network links required to form a torus rather than a mesh
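To make the I/O-node ratio concrete, here is a small sketch (Python used only as a calculator; the 64:1 compute-to-I/O ratio is an assumption matching Intrepid's configuration and should be checked against your system's actual layout):

```python
# Illustrative only: minimum partition size is tied to how many compute
# nodes share one I/O node. 64:1 is assumed here (Intrepid-like).
def io_nodes_for_partition(compute_nodes, compute_per_io=64):
    """Number of I/O nodes serving a partition of the given size."""
    return max(1, compute_nodes // compute_per_io)

print(io_nodes_for_partition(512))    # smallest production partition -> 8
print(io_nodes_for_partition(40960))  # full 40-rack Intrepid -> 640
```

Since I/O nodes come in whole units attached to node cards, the ratio effectively quantizes the smallest partition you can be given.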
Blue Gene/P hierarchy
- Compute card: 1 chip, 20 DRAMs; 13.6 GF/s, 2.0 GB DDR; supports 4-way SMP
- Node card: 32 compute cards (32 chips, 4x4x2), 0-2 I/O cards; 435 GF/s, 64 GB
- Rack: 32 node cards (1,024 chips, 4,096 processors); 14 TF/s, 2 TB
- Intrepid system: 40 racks; 556 TF/s, 80 TB
- Front-end node / service node: System p servers running Linux SLES10
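The hierarchy numbers multiply out as expected; a quick sanity check (Python used just as a calculator on the figures above):

```python
# Per-level figures from the Blue Gene/P hierarchy slide.
gf_per_chip = 13.6            # GF/s per compute card (chip)
chips_per_node_card = 32
node_cards_per_rack = 32
racks = 40                    # Intrepid

chips_per_rack = chips_per_node_card * node_cards_per_rack  # 1024
tf_per_rack = gf_per_chip * chips_per_rack / 1000           # ~13.9 TF/s (~14)
system_tf = tf_per_rack * racks                             # ~557 TF/s
system_tb = 2.0 * chips_per_rack * racks / 1024             # 2 GB/chip -> 80 TB

print(round(system_tf), round(system_tb))  # 557 80
```

The small gap between the computed 557 TF/s and the quoted 556 TF/s is just rounding in the per-rack figure.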
Visualization and Data Analytics
- Both systems come with a Linux cluster attached to the same GPFS filesystems and network infrastructure
- The GPUs on these machines can be used for straight visualization or to perform data analysis
- Software includes VisIt, ParaView, and other visualization toolkits
Programming Models and Development Environment
- Basically, all of the lessons from this week apply: MPI, pthreads, OpenMP, using any of C, C++, Fortran
- Also have access to things like Global Arrays and other lower-level communication protocols if that's your thing
- Can use XL or GNU compilers, along with LLVM (beta)
- I/O using HDF, NetCDF, MPI-I/O, …
- Debugging with TAU, HPCToolkit, DDT, TotalView, …
- Many supported libraries, like BLAS, PETSc, ScaLAPACK, …
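Because compute nodes run a stripped-down kernel, you build on the front-end nodes with cross-compiler wrappers and submit through the scheduler. A hypothetical build-and-submit sketch (wrapper names, compiler flags, and queue options vary by system; treat these as placeholders and consult the ALCF documentation):

```shell
# Cross-compile on a front-end login node with an XL wrapper (BG/P-style;
# the wrapper name and -qarch value are assumptions for illustration).
mpixlc_r -O3 -qarch=450d -qtune=450 -o hello hello.c

# Submit through Cobalt: 512 nodes, 30-minute wall time, 4 ranks/node
# ("vn" mode on BG/P). Flags shown are illustrative.
qsub -n 512 -t 30 --mode vn ./hello
```

The binary produced by the wrapper will not run on the login nodes themselves; it targets the compute-node environment only.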
How do you get involved?
- Send email to [email protected] requesting access to the CScADS project
- Or go to https://accounts.alcf.anl.gov and request a project of your own