Post on 16-Dec-2015
Workshop: Using the VIC3 Cluster for Statistical Analyses
Support perspective
G.J. Bex
Overview
• Cluster VIC3: hardware & software
• Statistics research scenario
• Worker framework
• MapReduce with Worker
• Q&A
Bird's-eye view of VIC3
[Diagram: login nodes login1 and login2; service nodes svcs1 and svcs2; compute nodes r1i0n0, r1i0n1, … r1i3n15 and r2i0n0, r2i0n1, … r2i3n15; netapp storage, labelled ~vsc30034 and /bin]
VIC3 nodes
• Compute nodes
  – 112 nodes with 2 quad-core 'harpertown', 8 GB RAM
  – 80 nodes with 2 quad-core 'nehalem', 24 GB RAM
  – 6 nodes with 2 quad-core 'nehalem', 72 GB RAM and local hard disk
• Storage
  – 20 TB disk space shared between home directories and scratch space, access via NFS
  – 4 nodes with disks for a parallel file system (needed for MPI I/O jobs)
• Service nodes include 2 login nodes
1584 cores, for 16.6 TFlop (theoretical peak)
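The core count follows directly from the node list. A minimal sketch of that arithmetic, assuming every compute node contributes 2 × 4 cores as stated (the variable names are made up for illustration):

```shell
#!/bin/bash
# Node counts as listed on the slide; every node has 2 quad-core CPUs.
harpertown_nodes=112
nehalem_nodes=80
nehalem_bigmem_nodes=6
cores_per_node=$((2 * 4))

total_nodes=$((harpertown_nodes + nehalem_nodes + nehalem_bigmem_nodes))
total_cores=$((total_nodes * cores_per_node))

echo "nodes: $total_nodes, cores: $total_cores"
```

This prints 198 nodes and 1584 cores, matching the figure above.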
What can you run?
• All open source Linux software
• All Linux software for which K.U.Leuven has a license that covers the cluster, provided you are a K.U.Leuven staff member
• All Linux software for which you have a license that covers the cluster
• No Windows software
R, SAS, MATLAB are ok for K.U.Leuven & UHasselt users
Running example: SAS code
• Your SAS program, e.g., 'clmk.sas'
  – is usually interactive
  – depends on parameters, e.g.,
    • type of distribution
    • alpha, beta
  – has to be run for several types and values of alpha and beta
Running example: batch mode
• 1st step: convert it for batch mode
  – capture command line variables:
…
%LET type = "%scan(&sysparm, 1, %str(:))";
%LET alpha = %scan(&sysparm, 2, %str(:));
%LET beta = %scan(&sysparm, 3, %str(:));
…
  – run it from the command line:
$ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
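To see what the three %scan calls extract, the same colon-separated string can be split on the shell side. This is only an illustration of the splitting, not part of the SAS workflow; the function name split_sysparm is hypothetical:

```shell
#!/bin/bash
# Split a colon-separated parameter string into its three fields,
# mirroring what the %scan calls do with &sysparm inside SAS.
split_sysparm() {
    local sysparm="$1"
    local type alpha beta
    # IFS=: applies to this read only; fields are assigned in order.
    IFS=: read -r type alpha beta <<< "$sysparm"
    echo "type=$type alpha=$alpha beta=$beta"
}

split_sysparm "discr:1.3:15.0"
```

The first field becomes the type, the second alpha, the third beta, which is the order the SAS program expects.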
I've got a job to do: PBS files
[Diagram: login node; queue system/scheduler: Torque/Moab; compute nodes]
clmk.pbs:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas
$ msub clmk.pbs
No more modifying!
Before, with the parameter values hard-coded in clmk.pbs:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm discr:1.3:15.0 clmk.sas
$ msub clmk.pbs
After, with the values passed in as variables:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
$ msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
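With the variables passed via -v, submitting several parameter sets becomes a loop. The sketch below is a dry run that only prints the msub command lines; the helper name submit_cmd and the value lists are assumptions for illustration:

```shell
#!/bin/bash
# Build (but do not run) the msub command line for one parameter set.
submit_cmd() {
    local type="$1" alpha="$2" beta="$3"
    echo "msub clmk.pbs -v type=${type},alpha=${alpha},beta=${beta}"
}

# Dry run: print one command per combination of the example values.
for alpha in 1.3 1.8; do
    for beta in 15.0 30.0; do
        submit_cmd discr "$alpha" "$beta"
    done
done
```

Each printed line corresponds to one separate job in the queue, so the number of jobs grows with every added parameter value.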
Going parallel… or nuts?
• Parameter sets…
  – are independent, so computations can be done in parallel!
  – but all combinations of type, alpha, beta: large number of jobs
Worker framework
Conceptually
type   alpha  beta
discr  1.3    15.0
discr  1.3    30.0
discr  1.8    15.0
discr  1.8    30.0
…      …      …
cont   1.3    15.0
…      …      …
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
Concrete
clmk.csv (N rows):
type   alpha  beta
discr  1.3    15.0
discr  1.3    30.0
discr  1.8    15.0
discr  1.8    30.0
…      …      …
cont   1.3    15.0
…      …      …
clmk.pbs:
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
sas -batch -noterminal \
    -sysparm $type:$alpha:$beta clmk.sas
$ module load worker/1.0
$ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8
N rows will be computed in parallel by 2 × 8 - 1 = 15 cores
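A data file like clmk.csv need not be typed by hand. The sketch below generates one row per combination, assuming comma-separated fields and a header line naming the variables used in clmk.pbs. The value lists contain only the values visible in the table; the full lists are elided there, and the function name generate_rows is hypothetical:

```shell
#!/bin/bash
# Write a header line plus one line per (type, alpha, beta) combination.
# Assumption: comma-separated fields, header names match the PBS variables.
generate_rows() {
    local type alpha beta
    echo "type,alpha,beta"
    for type in discr cont; do
        for alpha in 1.3 1.8; do
            for beta in 15.0 30.0; do
                echo "${type},${alpha},${beta}"
            done
        done
    done
}

# Prints to the terminal; redirect to a file to keep it:
#   generate_rows > clmk.csv
generate_rows
```

With 2 types, 2 alpha values and 2 beta values this gives N = 8 data rows after the header.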
Caveat 1: time is of the essence…
• How long does your job need? (= walltime)
  – roughly the time to compute all N rows, divided by the number of requested cores
• walltime limitations
  – more than 5 minutes
  – less than 2 days
  – no hard limits, but guidelines to reduce queue time
• hence, if walltime exceeds 2 days, split data and submit multiple jobs
• explicitly request sufficient walltime:
$ wsub -data clmk.csv -batch clmk.pbs \
       -l nodes=2:ppn=8,walltime=36:00:00
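A rough walltime estimate can be computed before submitting. The sketch below assumes every row takes about the same time and divides the work over the cores minus one, since one core does not compute rows. The function name estimate_walltime and the example numbers are hypothetical:

```shell
#!/bin/bash
# Estimate walltime as HH:MM:SS for n_rows computations of secs_per_row
# seconds each, spread over (cores - 1) computing cores.
estimate_walltime() {
    local n_rows="$1" secs_per_row="$2" cores="$3"
    local workers=$((cores - 1))
    # Rows handled by the busiest core (integer ceiling division).
    local rows_per_worker=$(( (n_rows + workers - 1) / workers ))
    local total=$((rows_per_worker * secs_per_row))
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Hypothetical job: 1000 rows of about 30 minutes each on 2 x 8 cores.
estimate_walltime 1000 1800 16
```

For these numbers the estimate is 33:30:00, so a request of walltime=36:00:00 leaves some margin. An estimate above 48:00:00 would be the signal to split the data file over several jobs.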
Caveat 2: slave labour
• P cores, how to choose P?
  – functions
    • 1 master
    • P – 1 slaves
  – each compute node has 8 cores, so P mod 8 = 0
  – N >> P: better load balancing, efficiency
  – larger P
    • shorter walltime
    • (potentially) longer time in queue
shortest turn-around: hard to predict
turn-around = queue time + walltime
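To get a feel for the trade-off, the effect of the node count on P and on the load per slave can be tabulated. A minimal sketch, assuming 8 cores per node as stated above; describe_request and the row count of 1000 are made up for illustration:

```shell
#!/bin/bash
# For a number of 8-core nodes, report the total cores P, the number of
# slaves (P - 1, one core acts as master) and the maximum rows per slave.
describe_request() {
    local nodes="$1" n_rows="$2"
    local cores=$((nodes * 8))
    local workers=$((cores - 1))
    local rows_per_worker=$(( (n_rows + workers - 1) / workers ))
    echo "nodes=$nodes P=$cores workers=$workers rows_per_worker=$rows_per_worker"
}

for nodes in 1 2 4; do
    describe_request "$nodes" 1000
done
```

Doubling the nodes roughly halves the rows per slave, and with it the walltime, but says nothing about queue time, which is why the shortest turn-around remains hard to predict.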
Caveat 3: independence
#!/bin/bash -l
module load SAS/9.2
cd $PBS_O_WORKDIR
log_name="clmk-$type-$alpha-$beta.log"
print_name="clmk-$type-$alpha-$beta.lst"
sas -batch -noterminal \
    -log $log_name \
    -print $print_name \
    -sysparm $type:$alpha:$beta clmk.sas
SAS locks log and output files!
Make sure each computation writes to its own files!
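Since the file names are derived from the parameters, two identical rows in the data file would write to the same log and output files. The sketch below checks for that before submitting, assuming a comma-separated file with a header line; the function name colliding_log_names is hypothetical:

```shell
#!/bin/bash
# Print the log-file name each data row would produce (header skipped)
# and keep only names that occur more than once. No output: no collision.
colliding_log_names() {
    local csv="$1"
    tail -n +2 "$csv" \
        | awk -F, '{ printf "clmk-%s-%s-%s.log\n", $1, $2, $3 }' \
        | sort | uniq -d
}

# Demo on a throwaway file containing one duplicated row.
demo=$(mktemp)
printf 'type,alpha,beta\ndiscr,1.3,15.0\ndiscr,1.3,15.0\ncont,1.3,15.0\n' > "$demo"
colliding_log_names "$demo"
rm -f "$demo"
```

The demo reports clmk-discr-1.3-15.0.log, the name that the two identical rows would share.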
Conceptually: MapReduce
[Diagram: data.txt → data.txt.1, data.txt.2, …, data.txt.7 → result.txt.1, result.txt.2, …, result.txt.7 → result.txt; stages labelled map and reduce]
Concrete: -prolog & -epilog
[Diagram: data.txt → data.txt.1, data.txt.2, …, data.txt.7 → result.txt.1, result.txt.2, …, result.txt.7 → result.txt; labelled with prolog.sh, batch.sh (several instances) and epilog.sh]
$ wsub -prolog prolog.sh -batch batch.sh \
       -epilog epilog.sh -l nodes=3:ppn=8
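The slides do not show the contents of prolog.sh, batch.sh or epilog.sh. As a sketch of the kind of work involved, the two functions below split a file into numbered parts and concatenate numbered results again; which script would call them, and the function names split_data and merge_results, are assumptions:

```shell
#!/bin/bash
# Split a file into n_parts contiguous chunks named <file>.1 ... <file>.n.
split_data() {
    local file="$1" n_parts="$2"
    local total per_part
    total=$(wc -l < "$file")
    per_part=$(( (total + n_parts - 1) / n_parts ))   # ceiling division
    awk -v per="$per_part" -v base="$file" \
        '{ idx = int((NR - 1) / per) + 1; print > (base "." idx) }' "$file"
}

# Concatenate <base>.1 ... <base>.n, in order, into <base>.
merge_results() {
    local base="$1" n_parts="$2" i
    : > "$base"                                       # start with an empty file
    for ((i = 1; i <= n_parts; i++)); do
        cat "$base.$i" >> "$base"
    done
}

# Demo in a throwaway directory: 14 lines, 7 parts of 2 lines each.
workdir=$(mktemp -d)
seq 1 14 > "$workdir/data.txt"
split_data "$workdir/data.txt" 7
# Stand-in for the real per-part computation: copy each part to a result.
for i in 1 2 3 4 5 6 7; do
    cp "$workdir/data.txt.$i" "$workdir/result.txt.$i"
done
merge_results "$workdir/result.txt" 7
wc -l < "$workdir/result.txt"
rm -rf "$workdir"
```

Note that a file with fewer lines than parts yields fewer part files, so merge_results would need the actual part count in that case.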
Where to find help?
• http://www.vscentrum.be/vsc-help-center
• hpcinfo@icts.kuleuven.be
• http://status.kuleuven.be/hpc
• UHasselt staff: geertjan.bex@uhasselt.be