HPC Workshop: Working on Soroban
Dr. L. Bennett
ZEDAT, FU Berlin
SS 2015
Dr. L. Bennett (ZEDAT, FU Berlin) HPC Workshop: Working on Soroban SS 2015 1 / 34
Outline
1 Introduction: Goals, Resources
2 Before Running Jobs: Preparation, Batch System
3 While Running Jobs: Queue, Processes
4 After Running Jobs: Resources Used, Tidying Up
Introduction Goals
Overview
Your goals
Complete a certain task in a certain time
That's (probably) it!
Our goals
Manage resources such that the largest number of people can achieve their goals
Provide resources
Help users make good use of resources
Introduction Resources
Soroban
Introduction Resources
Overview
Limited resources
cores
memory
disk-space
licences
Intel compiler
MATLAB / MATLAB toolboxes
graphics processing units (GPUs)
Unlimited resources
software
Introduction Resources
Main Limited & Consumable Resources
Cores
1344 cores in total
12 cores per node
Memory
5.25 TB in total
24, 48 or 96 GB per node
18, 42 or 90 GB available for users (6 GB reserved for OS)
Introduction Resources
Main Limited & Nonconsumable Resource
Disk space
local disks
none
distributed file systems
16 TB /home
174 TB /scratch
Limits
total size of the file-systems
how well the admins monitor usage
Introduction Resources
Bottlenecks
Comparison of resources
Resource      Bottleneck?    Comment
cores         sometimes      many cores often unused, but few on individual nodes
memory        often          users often overestimate
disk space    not usually    we keep an eye on disk usage
disk access   occasionally   IO may cross critical threshold
Introduction Resources
File Systems
/home
16 TB
NFS
backup (except tmp, temp)
scripts, results
high metadata performance
good for reading/writing many files
/scratch
174 TB
FhGFS
no backup
temporary data, copies of input data
high I/O performance
good for reading/writing large files
Introduction Resources
Useful commands
$ sinfo -eo "%30N %.5D %9P %11T %.6m %20E" or $ sinfo -Nl
NODELIST NODES PARTITION STATE CPUS MEMORY REASON
gpu01 1 gpu allocated 12 18000 none
node[001-002] 2 test allocated 12 42000 none
node003 1 main* idle 12 18000 none
node[004-024] 19 main* allocated 12 18000 none
node[025-034,036-100] 70 main* allocated 12 42000 none
node035 1 main* draining 12 42000 large uptime
node[101-111] 11 main* allocated 12 90000 none
node112 1 main* drained 12 90000 large uptime
$ df -h /home /scratch
Filesystem Size Used Avail Use% Mounted on
master.ib.cluster:/home 17T 13T 3.7T 78% /home
fhgfs_nodev 164T 111T 54T 68% /scratch
Introduction Resources
Software
Software can be . . .
directly in the operating system, e.g.
Perl
Python
provided centrally via module (see next slides), e.g.
GROMACS
NAMD
in your /home directory
before you install here, check whether the software is already available
all three of the above, e.g.
R
not yet installed
we can install software centrally for you (others may also need it)
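To see which of these cases applies before installing anything yourself, each location can be probed in turn. A minimal sketch, assuming a bash-like shell; perl and gromacs are just example names, and the module command only exists on the cluster:

```shell
# Probe the places software can come from (package names illustrative)
if command -v perl >/dev/null 2>&1; then
  echo "perl is provided by the operating system"
fi
if command -v module >/dev/null 2>&1; then
  module avail gromacs 2>&1 | head    # centrally provided via modules?
else
  echo "no module command on this host (run the check on Soroban)"
fi
ls "$HOME/bin" 2>/dev/null || echo "nothing self-installed in ~/bin"
```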
Introduction Resources
Modules I
$ module av
...
gromacs/openmpi/gcc/64/4.5.4 stampy/1.0.23
gromacs/openmpi/gcc/64/4.5.5 symmetree/1.1
hpl/2.0 tophat/2.0.13
iozone/3_373 trinity/20140413p1
java/1.8.0 turbomole/6.5(default)
lammps/11Jan12-openmpi turbomole/6.5mpi
mafft/7.205 turbomole/6.5smp
matlab/R2011b vasp/5.2.12(default)
matlab/R2012b vasp/5.3.3
matlab/R2014a vesta/3
matlab/R2014b vmd/1.9.1
migrate-n/3.6 wien2k/11_32bit
mira/3.4.0 wien2k/11_64bit
$ module av fsl
------------------------ /cm/shared/modulefiles/production ---------------------
fsl/4.1.9 fsl/5.0.0 fsl/5.0.1 fsl/5.0.7
Introduction Resources
Modules II
$ module help fsl
----------- Module Specific Help for 'fsl/5.0.7' ------------------
PROGRAM
FSL 5.0.7 - FMRIB Software Library
EXECUTABLES
This package encompasses a large number of executables. Please refer to the
documentation:
http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FslOverview
NOTES
Some executables may support multithreading.
Introduction Resources
Modules III
$ module show fsl
-------------------------------------------------------------------
/cm/shared/modulefiles/production/fsl/5.0.7:
module-whatis FSL 5.0.7 - FMRIB Software Library
append-path PATH /cm/shared/apps/fsl/5.0.7/bin
setenv FSLDIR /cm/shared/apps/fsl/5.0.7
setenv FSLOUTPUTTYPE NIFTI_GZ
setenv FSLMULTIFILEQUIT TRUE
setenv FSLTCLSH /cm/shared/apps/fsl/5.0.7/bin/fsltclsh
setenv FSLWISH /cm/shared/apps/fsl/5.0.7/bin/fslwish
setenv FSLCONFDIR /cm/shared/apps/fsl/5.0.7/config
setenv FSLMACHTYPE `/cm/shared/apps/fsl/5.0.7/etc/fslconf/fslmachtype.sh`
-------------------------------------------------------------------
$ module add fsl
$ module rm fsl
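Putting these commands together, a typical module session looks like this (a sketch; it degrades gracefully on machines without the module command):

```shell
# A typical module session: load, inspect, unload
if command -v module >/dev/null 2>&1; then
  module add fsl       # prepend .../fsl/.../bin to PATH, set FSLDIR etc.
  module list          # confirm which modules are loaded
  echo "FSLDIR=$FSLDIR"
  module rm fsl        # undo the PATH and environment changes
else
  echo "module command not available on this host"
fi
```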
Before Running Jobs Preparation
What to think about
Scope
What do I want to do?
How much time have I got?
Are sufficient resources available, such as software, CPU time, memory and disk space?
Skills
Do I have general Unix skills?
Do I have program-specific skills?
Do I know where to get help?
Before Running Jobs Preparation
Getting Help
Our website http://www.zedat.fu-berlin.de/HPC/Home
is in English and German
has general information and some information about specific programs
People in your group
may already be familiar with Soroban
may already have done similar calculations
may know other people who can help
People here in the HPC group
can be reached via email ([email protected]) or telephone
are happy to talk to you about your project face-to-face
Before Running Jobs Batch System
Overview
Slurm
Simple Linux Utility for Resource Management
allocates resources for jobs
provides framework for starting and monitoring jobs
works out job priorities
Basic workflow
user submits job to Slurm via sbatch
Slurm calculates priorities for each job
Slurm starts jobs according to priority and available resources
(Slurm notifies user on job completion)
Before Running Jobs Batch System
Fairshare
Fairshare Factor
F = 2^(-U/S)
where
F fairshare factor
U normalised usage
S total normalised shares
Thus
U > S implies F < 0.5
U = S implies F = 0.5
U < S implies F > 0.5
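The formula is easy to tabulate; in this sketch awk stands in for Slurm's internal arithmetic, and the usage and share values are invented:

```shell
# F = 2^(-U/S): fairshare factor for a few usage-to-share ratios
fairshare() {
  awk -v u="$1" -v s="$2" 'BEGIN { printf "%.3f\n", 2 ^ (-u / s) }'
}
fairshare 0.50 0.50   # U = S -> 0.500 (using exactly your share)
fairshare 1.00 0.50   # U > S -> 0.250 (overusing: factor drops)
fairshare 0.25 0.50   # U < S -> 0.707 (underusing: factor rises)
```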
Before Running Jobs Batch System
Priority
Contributing factors
P = wf·F + wn·NCPUs + wa·A
where
F fairshare
NCPUs percentage of CPUs requested
A age (time in queue)
wi weighting factors
wf = 1000000
wn = 10000
wa = 1000
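Plugging the weights into the formula shows why fairshare dominates the ranking. The factor values below are invented; in Slurm each factor is normalised to the range 0 to 1:

```shell
# P = wf*F + wn*NCPUs + wa*A with the weights from the slide
awk 'BEGIN {
  wf = 1000000; wn = 10000; wa = 1000    # weighting factors
  F  = 0.5                               # fairshare factor
  N  = 0.10                              # fraction of CPUs requested
  A  = 0.25                              # normalised time in the queue
  printf "%d\n", wf * F + wn * N + wa * A
}'
```

Even at F = 0.5 the fairshare term contributes 500000 points, dwarfing the CPU and age terms.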
Before Running Jobs Batch System
Main Parameters
Specification
Cores
--nodes=1-1          exactly one node
--nodes=2            at least 2 nodes
--nodes=2-12         at least 2 nodes, at most 12 nodes
Memory
--mem=10240          memory per node in MB
--mem-per-cpu=1024   memory per CPU in MB
Time
--time=00:30:00      maximum run-time (hr:min:sec)
--time=2-12:00:00    maximum run-time (days-hr:min:sec)
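The same options can be passed directly to sbatch on the command line, where they take precedence over #SBATCH lines in the script (my_job.sh is a placeholder; the guard lets the snippet run on machines without Slurm):

```shell
if command -v sbatch >/dev/null 2>&1; then
  sbatch --nodes=1-1 --mem-per-cpu=1024 --time=00:30:00 my_job.sh
else
  echo "sbatch not found -- submit jobs from a Soroban login node"
fi
```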
Before Running Jobs Batch System
Backfill
Node requirements
Mechanism
Job A is running
Job B can only run when Job A ends
Job C can start before Job B if it ends before Job A
Before Running Jobs Batch System
Slurm Batch Script
Serial job
#!/bin/bash
#SBATCH [email protected]
#SBATCH --mail-type=end
#SBATCH --job-name=my_test_job_serial
#SBATCH --mem-per-cpu=2048
#SBATCH --time=08:00:00
cd /scratch/fakeuser/test
cp ~/input/test.input .
module add fakeprog
fakeprog -i test.input > test.out
cp test.out ~/results
Before Running Jobs Batch System
Slurm Batch Script II
Multithreaded job
#!/bin/bash
#SBATCH [email protected]
#SBATCH --mail-type=end
#SBATCH --job-name=my_test_job_multithreaded
#SBATCH --mem-per-cpu=2048
#SBATCH --time=04:00:00
#SBATCH --ntasks=6
#SBATCH --nodes=1-1
cd /scratch/fakeuser/test
cp ~/input/test.input .
module add fakeprog_mt
fakeprog_mt -n $SLURM_NTASKS -i test.input > test.out
cp test.out ~/results
Before Running Jobs Batch System
Slurm Batch Script III
MPI parallel job
#!/bin/bash
#SBATCH [email protected]
#SBATCH --mail-type=end
#SBATCH --job-name=my_test_job_mpi
#SBATCH --mem-per-cpu=2048
#SBATCH --time=04:00:00
#SBATCH --ntasks=24
cd /scratch/fakeuser/test
cp ~/input/test.input .
module add fakeprog_mpi
fakeprog_mpi -n $SLURM_NTASKS -i test.input > test.out
cp test.out ~/results
Before Running Jobs Batch System
Useful commands
$ sbatch my_job.sh
Submitted batch job 25618
$ sprio -o "%.7i %8u %.10Y %.10A %.10F %.10J" | sort -nk3
JOBID USER PRIORITY AGE FAIRSHARE JOBSIZE
123496 alice 41861 24 31735 102
123468 bob 59108 21 49042 45
123467 carol 232189 1 222116 72
or $ sprio -l | sort -nk3
While Running Jobs Queue
Information on jobs
$ squeue -u fakeuser
JOBID PARTITION    NAME     USER ST       TIME NODES NODELIST(REASON)
123456      main test_01 fakeuser  R 7-02:49:44     1 node019
123457      main test_02 fakeuser  R 2-01:46:13     3 node[008-009,011]
123458      main test_03 fakeuser PD      00:00     1 (Priority)
$ scontrol show job 123456
JobId=123456 Name=test_01
   UserId=fakeuser(111111) GroupId=agfake(999999)
   Priority=823131 Account=agfake QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=1 BatchFlag=1 ExitCode=0:0
   RunTime=01:23:35 TimeLimit=2-12:00:00 TimeMin=N/A
   SubmitTime=2015-03-18T06:40:35 EligibleTime=2015-03-17T16:03:28
   StartTime=2015-03-18T06:40:37 EndTime=2015-03-21T05:40:37
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=main AllocNode:Sid=node005:98510
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node019
   BatchHost=node019
   NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=2750M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/work/fakeuser/scaling_test/batch.sh
   WorkDir=/scratch/fakeuser/scaling_test
While Running Jobs Processes
Tools
$ ps -flHu loris
F S UID   PID   PPID  C PRI NI ADDR SZ    WCHAN  STIME TTY    TIME     CMD
5 S loris 23315 23309 0 80  0  -    31612 poll_s 10:28 ?      00:00:00 sshd: loris@pts/44
0 S loris 23316 23315 0 80  0  -    30621 wait   10:28 pts/44 00:00:00 -bash
0 R loris 15717 23316 0 80  0  -    29655 -      11:12 pts/44 00:00:00
$ htop -u loris
After Running Jobs Resources Used
Memory
$ cat slurm-123456.out
Fri Dec 20 11:02:11 CET 2013
-- Slurm Task Epilog ----------------------------------------------------------
node[040,042]:
JobID        Memory Used Memory Requested STATUS, Comment on Slurm Memory
------------ ----------- ---------------- -----------------------------------
123456       1084        6000             COMPLETED, Memory too high

JobID        MaxRSS   TotalCPU  Elapsed  NTasks NCPUS MaxRSSNode State     End
------------ -------- --------- -------- ------ ----- ---------- --------- -------------------
123456                41:22.170 00:10:25        4                COMPLETED 2013-12-20T11:01:40
123456.batch 6620K    00:00.184 00:10:25 1      1     node040    COMPLETED 2013-12-20T11:01:40
123456.0     1116196K 41:21.985 00:10:23 4      4     node040    COMPLETED 2013-12-20T11:01:40

SLURM_JOB_DERIVED_EC=0
SLURM_JOB_EXIT_CODE=0
COMPLETED
Why bother?
Job does not have to wait for memory it doesn't need
More memory available to other jobs (also yours!)
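The same usage numbers can be queried after the fact with sacct (the job ID is a placeholder; MaxRSS, ReqMem, Elapsed and State are standard sacct fields):

```shell
if command -v sacct >/dev/null 2>&1; then
  sacct -j 123456 --format=JobID,MaxRSS,ReqMem,Elapsed,State
else
  echo "sacct not found -- run this on Soroban"
fi
```

Comparing MaxRSS with ReqMem shows how much to trim the memory request of the next, similar job.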
After Running Jobs Resources Used
Time
Email about time limit
JobID   Limit Requested RunTime Used STATUS, Comment on Slurm TimeLimit
------  --------------- ------------ ----------------------------------
613062  23:59:00        07:17        COMPLETED, Limit too high
613087  23:59:00        23:03        COMPLETED, Ok
613128  23:59:00        06:22        COMPLETED, Limit too high
613936  23:59:00        1-00:01:16   TIMEOUT(TimeLimit), Limit too low!
614003  05:59:00        04:45        CANCELLED(COMPLETED), Limit too high
614077  23:59:00        06:16        CANCELLED(COMPLETED), Limit too high
614096  23:59:00        15:22        COMPLETED, Limit too high
Why bother?
Job can take advantage of backfill
Before maintenance, maximum run-time is sometimes shortened
After Running Jobs Tidying Up
Data Management
Do
delete files you no longer need
archive files that you don't currently need
tar -czf old_simulations.tgz ./old_simulations
rm -rf ./old_simulations
move data off Soroban
Don't
write a large volume of data to /home
write a large number of files to /scratch
duplicate data
ignore mails about your data usage
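The archive-then-delete pattern from the Do list above, rehearsed on a throwaway directory so it can be tried safely anywhere:

```shell
# Create dummy data, archive it, verify the archive, then delete the original
mkdir -p old_simulations
echo "result 42" > old_simulations/run1.dat
tar -czf old_simulations.tgz old_simulations
tar -tzf old_simulations.tgz        # always verify before deleting!
rm -rf old_simulations
rm -f old_simulations.tgz           # clean up the demo archive as well
```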
After Running Jobs Tidying Up
Useful Commands
$ du -sh ~ /scratch/loris (this can take a while)
16G /home/loris
141G /scratch/loris
After Running Jobs Tidying Up
Comparison of Data Usage
[Figure: disk usage in GB over time, September 2013 to March 2015; /home grows towards roughly 10000 GB, /scratch towards roughly 100000 GB]
After Running Jobs Tidying Up
Publications
Writing a paper?
Please acknowledge us in the paper if time on Soroban has contributed to your results
Had something published?
Please send us:
the bibliographical reference
a nice graphic
Why let us put it on our website?
We get a warm fuzzy feeling: our time and the CPU-time wasn't wasted
The university management gets a warm fuzzy feeling: money for HPC is well-spent
After Running Jobs Tidying Up
Leaving the group / university
Account expiry
If your FU account expires we shall
1 warn you and your group leader
2 wait for your FU account to be deleted (around 4 weeks)
3 delete your HPC account and all your data
4 inform your group leader about the deletion