Workshop: Using the VIC3 Cluster for Statistical Analyses Support perspective G.J. Bex.

Workshop: Using the VIC3 Cluster for Statistical Analyses

Support perspective

G.J. Bex

Overview

• Cluster VIC3: hardware & software
• Statistics research scenario
• Worker framework
• MapReduce with Worker
• Q&A

Bird's-eye view of VIC3

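[Diagram: login nodes login1 and login2; service nodes svcs1 and svcs2; compute nodes r1i0n0, r1i0n1, …, r1i3n15 and r2i0n0, r2i0n1, …, r2i3n15; NetApp storage (netapp) with home directories such as ~vsc30034]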

VIC3 nodes

• Compute nodes
  – 112 nodes with 2 quad-core 'harpertown', 8 GB RAM
  – 80 nodes with 2 quad-core 'nehalem', 24 GB RAM
  – 6 nodes with 2 quad-core 'nehalem', 72 GB RAM and local hard disk
• Storage
  – 20 TB disk space shared between home directories and scratch space, access via NFS
  – 4 nodes with disks for a parallel file system (needed for MPI I/O)
• Service nodes include 2 login nodes

1584 cores, for 16.6 TFlop (theoretical peak)
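The core count follows directly from the compute node list. A minimal sketch of the arithmetic, assuming every compute node contributes 2 × 4 = 8 cores (the variable names are purely illustrative):

```shell
# Compute nodes as listed on the slide: 112 + 80 + 6, each with 2 quad-core CPUs.
cores_per_node=$((2 * 4))
compute_nodes=$((112 + 80 + 6))
total_cores=$((compute_nodes * cores_per_node))
echo "$compute_nodes nodes x $cores_per_node cores = $total_cores cores"
```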

What can you run?

• All open-source Linux software
• All Linux software for which K.U.Leuven has a license that covers the cluster, provided you are a K.U.Leuven staff member
• All Linux software for which you have a license that covers the cluster

• No Windows software

R, SAS, MATLAB are ok for K.U.Leuven & UHasselt users


Running example: SAS code

• Your SAS program, e.g., 'clmk.sas'
  – is usually interactive
  – depends on parameters, e.g.,
    • type of distribution
    • alpha, beta
  – has to be run for several types and values of alpha and beta

Running example: batch mode

• 1st step: convert it for batch mode
  – capture command line variables:

    …
    %LET type = "%scan(&sysparm, 1, %str(:))";
    %LET alpha = %scan(&sysparm, 2, %str(:));
    %LET beta = %scan(&sysparm, 3, %str(:));
    …

  – run it from the command line:

    $ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
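The -sysparm value is a single colon-separated string that the SAS code takes apart again by position. The sketch below mimics that round trip in the shell, to make the field order explicit. The helper names make_sysparm and scan_field are hypothetical, not part of SAS or the cluster software.

```shell
# Join all arguments with ':' to form the string passed via -sysparm.
make_sysparm() (
    IFS=':'
    printf '%s\n' "$*"
)

# Return field number $2 of the colon-separated string $1,
# analogous to %scan(&sysparm, n, %str(:)) in the SAS program.
scan_field() {
    printf '%s\n' "$1" | cut -d ':' -f "$2"
}

sysparm=$(make_sysparm discr 1.3 15.0)
echo "sysparm: $sysparm"
echo "type:    $(scan_field "$sysparm" 1)"
echo "alpha:   $(scan_field "$sysparm" 2)"
echo "beta:    $(scan_field "$sysparm" 3)"
```

Because the fields are positional, the order type:alpha:beta on the command line has to match the order in which the SAS program scans them.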

I've got a job to do: PBS files

[Diagram: jobs pass through the queue system/scheduler (Torque/Moab) to the compute nodes]

clmk.pbs:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm discr:1.3:15.0 clmk.sas

    $ msub clmk.pbs

No more modifying!

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

    $ msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
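Once the script depends on variables passed with -v, forgetting one of them gives a malformed -sysparm string. A possible guard is sketched below. The function build_sas_command is a hypothetical helper that only prints the command it would run, and the three assignments stand in for what msub -v would place in the job's environment.

```shell
# Print the sas command line, refusing to continue if a parameter is missing.
build_sas_command() {
    : "${type:?type is not set}" "${alpha:?alpha is not set}" "${beta:?beta is not set}"
    echo "sas -batch -noterminal -sysparm $type:$alpha:$beta clmk.sas"
}

# Stand-ins for: msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
type=discr
alpha=1.3
beta=15.0

build_sas_command
```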

Going parallel… or nuts?

• Parameter sets…
  – are independent, so computations can be done in parallel!
  – but all combinations of type, alpha, beta: large number of jobs

Worker framework


Conceptually

type    alpha   beta
discr   1.3     15.0
discr   1.3     30.0
discr   1.8     15.0
discr   1.8     30.0
…       …       …
cont    1.3     15.0
…       …       …

Concrete

The same parameter table (type, alpha, beta; one row per computation) is stored in clmk.csv and combined with the job script clmk.pbs:

    $ module load worker/1.0
    $ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8

N rows will be computed in parallel by 2 × 8 − 1 = 15 cores
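The data file can be generated rather than typed. The sketch below writes one row per parameter combination. It assumes a comma-separated file whose first row names the variables, and it uses only the values visible in the table (the real study has more, and the full grid is an assumption). It writes to a temporary file rather than to clmk.csv.

```shell
# Write a header row followed by every combination of the example values.
csv_file=$(mktemp)
{
    echo "type,alpha,beta"
    for type in discr cont; do
        for alpha in 1.3 1.8; do
            for beta in 15.0 30.0; do
                echo "$type,$alpha,$beta"
            done
        done
    done
} > "$csv_file"

# N = number of data rows (all lines except the header).
n_rows=$(( $(wc -l < "$csv_file") - 1 ))
echo "$n_rows parameter sets written to $csv_file"
head -n 3 "$csv_file"
```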

Caveat 1: time is of the essence…

• How long does your job need? (= walltime)
  – time to compute N rows / number of requested cores
• walltime limitations
  – more than 5 minutes
  – less than 2 days
  (no hard limits, but guidelines to reduce queue time)
• hence, if walltime exceeds 2 days, split data and submit multiple jobs
• explicitly request sufficient walltime:

    $ wsub -data clmk.csv -batch clmk.pbs \
           -l nodes=2:ppn=8,walltime=36:00:00
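A rough walltime estimate can be computed before submitting. The sketch below assumes that every row takes about the same time, that one of the requested cores does no computing (so cores − 1 do the work), and that overhead is negligible. The function name and the example numbers (1000 rows of 600 s each on 16 cores) are illustrative only.

```shell
# Arguments: number of rows, seconds per row, requested cores.
# Prints an estimate in the HH:MM:SS form used by -l walltime=...
estimate_walltime() {
    n_rows=$1
    secs_per_row=$2
    cores=$3
    workers=$((cores - 1))                          # one core is not computing
    rounds=$(( (n_rows + workers - 1) / workers ))  # ceiling of n_rows / workers
    total=$((rounds * secs_per_row))
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

estimate_walltime 1000 600 16   # prints 11:10:00
```

In practice it seems sensible to request somewhat more than the estimate, since a job that hits its walltime is stopped.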

Caveat 2: slave labour

• P cores, how to choose P?
  – functions
    • 1 master
    • P – 1 slaves
  – each compute node has 8 cores, so P mod 8 = 0
  – N >> P: better load balancing, efficiency
  – larger P
    • shorter walltime
    • (potentially) longer time in queue

shortest turn-around: hard to predict

turn-around = queue time + walltime
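The two constraints on P (one core reserved for the master, whole nodes of 8 cores) can be turned into a resource request mechanically. The helper resource_request below is hypothetical. It takes the number of computing cores you would like and rounds up to full nodes.

```shell
# Argument: desired number of computing (non-master) cores.
# Prints the matching -l value, using whole 8-core nodes.
resource_request() {
    wanted=$1
    cores=$((wanted + 1))          # add the master
    nodes=$(( (cores + 7) / 8 ))   # round up to full nodes, so P mod 8 = 0
    echo "nodes=$nodes:ppn=8"
}

resource_request 15   # prints nodes=2:ppn=8
resource_request 20   # prints nodes=3:ppn=8
```

Rounding up means the second request actually provides 23 computing cores, which shortens walltime slightly but may lengthen the time in the queue.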

Caveat 3: independence

    log_name="clmk-$type-$alpha-$beta.log"
    print_name="clmk-$type-$alpha-$beta.lst"

    sas -batch -noterminal \
        -log $log_name \
        -print $print_name \
        -sysparm $type:$alpha:$beta clmk.sas

SAS locks log and output files!

Make sure each computation writes to its own files!
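Whether the naming scheme really gives each computation its own files can be checked before submitting. The sketch below derives the log name for every row of a small inline sample (three rows taken from the parameter table, in an assumed comma-separated layout with a header row) and reports names that occur more than once. The function log_names is a hypothetical helper.

```shell
# Read CSV rows (with header) from stdin and print one log file name per row.
log_names() {
    tail -n +2 | while IFS=',' read -r type alpha beta; do
        echo "clmk-$type-$alpha-$beta.log"
    done
}

rows='type,alpha,beta
discr,1.3,15.0
discr,1.3,30.0
discr,1.8,15.0'

# uniq -d prints only names that appear more than once.
duplicates=$(printf '%s\n' "$rows" | log_names | sort | uniq -d)
if [ -z "$duplicates" ]; then
    echo "all log names are unique"
else
    echo "clashing log names: $duplicates"
fi
```

A clash would show up if the data file contained the same parameter combination twice.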


Conceptually: MapReduce

[Diagram: data.txt → data.txt.1, data.txt.2, …, data.txt.7 → result.txt.1, result.txt.2, …, result.txt.7 → result.txt; stages labelled map and reduce]

Concrete: -prolog & -epilog

[Diagram: data.txt → data.txt.1, data.txt.2, …, data.txt.7 → result.txt.1, result.txt.2, …, result.txt.7 → result.txt; the successive stages are handled by prolog.sh, batch.sh and epilog.sh]

    $ wsub -prolog prolog.sh -batch batch.sh \
           -epilog epilog.sh -l nodes=3:ppn=8
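The contents of prolog.sh, batch.sh and epilog.sh are not shown on the slides. The sketch below only illustrates the division of labour, run sequentially in a temporary directory: a prolog step splits data.txt into 7 parts, a batch step processes each part independently, and an epilog step combines the partial results. The data (the numbers 1 to 14) and the per-part computation (a sum) are stand-ins, and how worker tells batch.sh which part to process is not covered here.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 14 > data.txt
n_parts=7

# prolog: distribute the lines of data.txt round-robin over data.txt.1 .. data.txt.7
awk -v n="$n_parts" '{ print > ("data.txt." ((NR - 1) % n + 1)) }' data.txt

# batch: one independent computation per part (here: sum the numbers in it)
i=1
while [ "$i" -le "$n_parts" ]; do
    awk '{ s += $1 } END { print s }' "data.txt.$i" > "result.txt.$i"
    i=$((i + 1))
done

# epilog: combine the partial results into result.txt
cat result.txt.[0-9]* | awk '{ s += $1 } END { print s }' > result.txt
cat result.txt   # prints 105, the sum of 1..14
```

On the cluster the batch step would run in parallel on the requested cores; the loop here only stands in for that.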


Where to find help?

• http://www.vscentrum.be/vsc-help-center
• hpcinfo@icts.kuleuven.be
• http://status.kuleuven.be/hpc
• UHasselt staff: geertjan.bex@uhasselt.be
