ARCHER: Advanced Research Computing High End Resource
Nick Brown, nick.brown@ed.ac.uk
Machine overview
ARCHER (a Cray XC30) is a Massively Parallel Processor (MPP) supercomputer built from many thousands of individual nodes.
There are two basic types of nodes in any Cray XC30:
• Compute nodes (4920)
  • These only do user computation and are always referred to as “Compute nodes”
  • 24 cores per node, therefore approx 120,000 cores
• Service/Login nodes (72/8)
  • Login nodes: allow users to log in and perform interactive tasks
  • Other misc service functions
• Serial/Post-Processing Nodes (2)
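As a quick check of the figures above, the total core count follows from the node count and the cores per node. A minimal sketch in POSIX shell; `total_cores` is a name made up for illustration, not an ARCHER command:

```shell
# Total cores = number of compute nodes x cores per node.
# total_cores is an illustrative helper, not an ARCHER command.
total_cores() {
    nodes=$1
    cores_per_node=$2
    echo $(( nodes * cores_per_node ))
}

total_cores 4920 24   # prints 118080
```

The exact figure is 118,080 cores, which the slide rounds to roughly 120,000.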
About ARCHER
Interacting with the system
Users do not log directly into the system. Instead they run commands via an esLogin server. This server relays commands and information via a service node referred to as a “Gateway node”.
[Figure: system diagram. Labels: Cray XC30 cabinets containing compute nodes, LNET nodes, a gateway node and the Cray Aries interconnect; esLogin node; serial node; Cray Sonexion filesystem with Lustre OSS; external network; Infiniband links; Ethernet.]
User guide
Job submission example
my_job.pbs:
    #!/bin/bash --login
    #PBS -l select=2
    #PBS -N test-job
    #PBS -A budget
    #PBS -l walltime=0:20:0

    # Make sure any symbolic links are resolved to absolute path
    export PBS_O_WORKDIR=$(readlink -f $PBS_O_WORKDIR)

    aprun -n 48 -N 24 ./hello_world
Submit the script with qsub, which returns the job ID:
    nbrown23@eslogin008:~> qsub my_job.pbs
    50818.sdb
    nbrown23@eslogin008:~>
Check the job with qstat (state Q is queued, R is running):
    nbrown23@eslogin008:~> qstat -u $USER
    50818.sdb nbrown23 standard test-job -- 2 48 -- 00:20 Q --
    nbrown23@eslogin008:~> qstat -u $USER
    50818.sdb nbrown23 standard test-job 29053 2 48 -- 00:20 R 00:00
[Figure: my_job.pbs enters the PBS queue and runs on compute nodes, producing the files Test-job.o50818 and Test-job.e50818.]
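In the script above, `select=2` requests two nodes, and `aprun -n 48 -N 24` starts 48 processes in total with 24 on each node, one per core. The sketch below shows how those numbers relate. `aprun_line` is a hypothetical helper that only prints the command line; it does not launch anything.

```shell
# Print the aprun command that places per_node processes on each of the
# requested nodes. aprun_line is an illustrative helper; it only echoes text.
aprun_line() {
    nodes=$1        # value given to "#PBS -l select="
    per_node=$2     # processes per node (an ARCHER node has 24 cores)
    exe=$3          # program to run
    echo "aprun -n $(( nodes * per_node )) -N $per_node $exe"
}

aprun_line 2 24 ./hello_world   # prints: aprun -n 48 -N 24 ./hello_world
```

Changing the node count in `select` means changing `-n` to match, which is why it can help to derive one from the other.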
Quick start guide
ARCHER Layout
Compute node architecture and topology
Cray XC30 node
The XC30 Compute node features:
• 2 x Intel® Xeon® Sockets/die
  • 12 core Ivy Bridge
• 64GB in normal nodes
  • 128GB in 376 “high memory” nodes
• 1 x Aries NIC
  • Connects to shared Aries router and wider network
[Figure: Cray XC30 Compute Node. Labels: NUMA Node 0 and NUMA Node 1, each an Intel® Xeon® 12 core die with 32GB of DDR3 memory; QPI; Aries NIC on PCIe 3.0; Aries router; Aries network.]
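A few derived numbers follow from the node features above. This is a rough sketch in integer shell arithmetic, assuming memory is shared evenly between cores and taking 1 GB as 1024 MB; `mem_per_core_mb` is an illustrative name.

```shell
# Derived node figures, using integer arithmetic only.
sockets=2
cores_per_socket=12
cores_per_node=$(( sockets * cores_per_socket ))    # 24

# Memory per core in MB (rounded down), taking 1 GB = 1024 MB.
mem_per_core_mb() {
    node_mem_gb=$1
    echo $(( node_mem_gb * 1024 / cores_per_node ))
}

echo "cores per node: $cores_per_node"
echo "standard node:  $(mem_per_core_mb 64) MB per core"     # 2730
echo "high memory:    $(mem_per_core_mb 128) MB per core"    # 5461
```

So a fully populated standard node offers a little under 2.7 GB per process, and a high memory node about twice that.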
XC30 Compute Blade
Cray XC30 Rank-1 Network
• Chassis with 16 compute blades
• 128 Sockets
• Inter-Aries communication over backplane
• Per-Packet adaptive Routing
Cray XC30 Rank-2 Copper Network
• 4 nodes connect to a single Aries
• 16 Aries connected by backplane
• 6 backplanes connected with copper cables in a 2-cabinet group
• Active optical cables interconnect groups
• 2 Cabinet Group: 768 Sockets
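The socket counts quoted for the rank-1 and rank-2 networks can be reproduced from the per-level figures. A small sketch, assuming the 2 sockets per node given on the compute node slide; the variable names are illustrative.

```shell
nodes_per_aries=4       # 4 nodes connect to a single Aries
sockets_per_node=2      # 2 Xeon sockets per compute node
aries_per_chassis=16    # 16 Aries connected by one backplane
chassis_per_group=6     # 6 backplanes in a 2-cabinet group

sockets_per_chassis=$(( nodes_per_aries * sockets_per_node * aries_per_chassis ))
sockets_per_group=$(( sockets_per_chassis * chassis_per_group ))
nodes_per_group=$(( nodes_per_aries * aries_per_chassis * chassis_per_group ))

echo "sockets per chassis (rank 1): $sockets_per_chassis"   # 128
echo "sockets per group (rank 2):   $sockets_per_group"     # 768
echo "nodes per group:              $nodes_per_group"       # 384
```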
Copper & Optical Cabling
[Figure: cabinet cabling, with the optical connections and copper connections labelled.]
ARCHER Filesystems
Brief Overview
Nodes and filesystems
[Figure: the RDF, /home and /work filesystems and their connections to the Login/PP nodes and the Compute nodes.]
ARCHER Filesystems
• /home (/home/n02/n02/<username>)
  • Small (200 TB) filesystem for critical data (e.g. source code)
  • Standard performance (NFS)
  • Fully backed up
• /work (/work/n02/n02/<username>)
  • Large (>4 PB) filesystem for use during computations
  • High-performance, parallel (Lustre) filesystem
  • No backup
• RDF (/nerc/n02/n02/<username>)
  • Research Data Facility
  • Very large (26 PB) filesystem for persistent data storage (e.g. results)
  • High-performance, parallel (GPFS) filesystem
  • Backed up via snapshots
User guide
Research Data Facility
• Mounted on machines such as:
  • ARCHER (service and PP nodes)
  • DiRAC BlueGene/Q (frontend nodes)
  • Data Transfer Nodes (DTN)
  • JASMIN
• Data Analytic Cluster (DAC)
  • Run compute, memory, or IO intensive analyses on data hosted on the service.
  • Nodes are specifically tailored for data intensive work with direct connections to the disks.
  • Separate from ARCHER but very similar architecture
RDF guide
ARCHER Software
Brief Overview
Cray’s Supported Programming Environment
• Programming Languages: Fortran, C, C++, Python
• Programming models
  • Distributed Memory (Cray MPT): MPI, SHMEM
  • Shared Memory: OpenMP 3.0, OpenACC
  • PGAS & Global View: UPC (CCE), CAF (CCE), Chapel
• Compilers
  • Cray Compiling Environment (CCE)
  • GNU
  • 3rd Party Compilers: Intel Composer
• Tools
  • Environment setup: Modules
  • Debuggers: Allinea (DDT), lgdb
  • Debugging Support Tools: Abnormal Termination Processing, STAT
  • Performance Analysis: CrayPat, Cray Apprentice2
  • Scoping Analysis: Reveal
• Optimized Scientific Libraries: LAPACK, ScaLAPACK, BLAS (libgoto), Iterative Refinement Toolkit, Cray Adaptive FFTs (CRAFFT), FFTW, Cray PETSc (with CASK), Cray Trilinos (with CASK)
• I/O Libraries: NetCDF, HDF5
• Legend: Cray developed; Licensed ISV SW; 3rd party packaging; Cray added value to 3rd party
Module environment
• Software is available via the module environment
  • Allows you to load in different packages and different versions of packages
  • Deals with potential library conflicts
• This is based around the module command
  • List currently loaded modules: module list
  • List all modules: module available
  • Load a module: module load x
  • Unload a module: module unload x
Best practice guide
ARCHER SAFE
Service Administration
https://www.archer.ac.uk/safe
SAFE
• SAFE is an online ARCHER management system on which all users have an account
  • Request machine accounts
  • Reset passwords
  • View resource usage
• Primary way in which PIs manage their ARCHER projects
  • Management of project users
  • Track users’ project usage
  • Email users of the project
SAFE user guide
Project resources
• Machine usage is charged in kAUs.
  • You are charged for the time your jobs run on each compute node, at 0.36 kAUs per node hour.
  • There is no usage charge for time spent working on the login nodes, post processing nodes or RDF DAC
  • You can track usage via the SAFE or the budgets command (calculated daily)
• Disk quotas
  • There is no specific charge made for disk usage, but all projects have quotas
  • If you need more disk space then contact the PI, or us if you manage the project
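The charging rule above is simple arithmetic: nodes times hours times 0.36 kAU. A minimal sketch, working in AU (taking 1 kAU as 1000 AU, so a node hour is 360 AU) to stay within integer shell arithmetic. `job_cost_au` is a hypothetical helper and is unrelated to the `budgets` command.

```shell
# Cost of a job in AU: nodes x hours x 360 AU per node hour.
# With the run time in seconds: nodes x seconds x 360 / 3600 = nodes x seconds / 10.
# Integer division rounds down, so this is an estimate only.
job_cost_au() {
    nodes=$1
    runtime_s=$2
    echo $(( nodes * runtime_s / 10 ))
}

# A 2 node job that runs for 20 minutes (1200 seconds).
job_cost_au 2 1200      # prints 240, i.e. 0.24 kAU
# One node for a full hour.
job_cost_au 1 3600      # prints 360, i.e. 0.36 kAU
```

For example, a two-node job with a 20 minute walltime limit would cost at most 0.24 kAU if it ran for the whole limit.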
User guide
To conclude...
• You will be using ARCHER during this course
  • If you have any questions then let us know
• The documentation on the ARCHER website is a good reference tool
  • Especially the quick start guide
• In normal use, if you have any questions or cannot find something then contact the helpdesk
  • [email protected]