
HPC wiki Documentation, Release 2.0

Hurng-Chun Lee, Daniel Sharoh, Edward Gerrits, Marek Tyc, Mike van Engelenburg, Mariam Zabihi

Feb 14, 2020


Contents

1 About the wiki

2 Table of Contents
  2.1 High Performance Computing for Neuroimaging Research
  2.2 Linux tutorial
  2.3 Introduction to the Linux BASH shell
  2.4 The HPC cluster
  2.5 The project storage
  2.6 Linux & HPC tutorials


CHAPTER 1

About the wiki

This wiki contains materials used by the Linux and HPC workshop held regularly at the Donders Centre for Cognitive Neuroimaging (DCCN). The aim of this workshop is to provide researchers with the basic knowledge to use the High-Performance Computing (HPC) cluster for data analysis. During the workshop, the wiki is used in combination with lectures; nevertheless, the contents of the wiki are written such that they can also be used for self-learning and reference.

There are two major sessions in this wiki. The Linux basics session covers the usage of the Linux operating system and an introduction to the Bash scripting language. After following this session, you should be able to create text-based data files in a Linux system, and write a bash script to perform simple data analysis on the files. The cluster usage session focuses on the general approach to running computations on the Torque/Maui cluster. After this session, you should know how to distribute data analysis computations to the Torque/Maui cluster at DCCN.


CHAPTER 2

Table of Contents

2.1 High Performance Computing for Neuroimaging Research

2.1.1 Computing Cluster

The HPC cluster at DCCN consists of three groups of compute nodes:

• access nodes: mentat001 ~ mentat005, used as login nodes.

• mentat compute nodes: mentat203 ~ mentat208, used for long-running (i.e. > 72 hours) computations.

• Torque/Maui cluster: a pool of powerful computers with more than 800 CPU cores, managed by the Torque job manager and the Moab job scheduler.

2.1.2 Central Storage

The central storage provides a shared file system amongst the Windows desktops within DCCN and the computers in the HPC cluster.

On the central storage, every user has a personal folder with a so-called office quota (20 gigabytes by default). This personal folder is referred to as the M:\ drive on the Windows desktops.

Storage spaces granted to research projects (following the project proposal meeting (PPM)) are also provided by the central storage. The project folders are organised under the directory /project, which is referred to as the P:\ drive on the Windows desktops.

The central storage also hosts a set of commonly used software/tools for neuroimaging data processing and analysis. This area of the storage is only accessible to computers in the HPC cluster, as the software/tools stored there require the Linux operating system.


Fig. 1: The HPC environment at DCCN.


2.1.3 Identity Manager

The identity manager maintains information for authenticating users accessing the HPC cluster. It is also used to check users' identity when they log in to the Windows desktops at DCCN. In fact, the user account received from the DCCN check-in procedure is managed, behind the scenes, by this identity manager.

Note: The user account concerned here (and throughout the entire wiki) is the one received via the DCCN check-in procedure. It is, in most cases, a combination of the first three letters of your first name and the first three letters of your last name (e.g. John Smith → johsmi). It is NOT the account (i.e. U/Z/S-number) from Radboud University.

2.1.4 Supported Software

A list of supported software can be found here.

2.2 Linux tutorial

2.2.1 Very short introduction of Linux

Linux is an operating system originally developed by Linus Torvalds in the 1990s as a clone of the Unix operating system for personal computers (PCs). It is now one of the world-renowned software projects developed and managed by the open-source community.

With its open development model, free (re-)distribution, and many features inherited directly from Unix, the Linux system provides an ideal and affordable environment for software development and scientific computation. This is why Linux is widely used in most scientific computing systems nowadays.


Architecture

The figure above illustrates a simplified view of the Linux architecture. From the inside out, the core of the system is called the kernel. It interacts with hardware devices, and provides the upper-layer components with low-level functions that hide the complexity of, for example, arranging concurrent accesses to hardware. The shell is an interface to the kernel. It takes commands from the user (or an application) and executes the kernel's functions accordingly. Applications generally refer to system utilities providing advanced functionalities of the operating system, such as the tool cp for copying files.

File and process

Everything in Linux is either a file or a process.

A process in Linux refers to an executing program identified by a unique process identifier (PID). Processes are internally managed by the Linux kernel for access to hardware resources (e.g. CPU, memory, etc.).

In most cases, a file in Linux is a collection of data. Files are created by users using text editors, running compilers, etc. Hardware devices are also represented as files in Linux.

Linux distributions

Nowadays Linux is made available as a collection of selected software packages based around the Linux kernel: the so-called Linux distribution. As of today, different Linux distributions are available on the market, each addressing the needs of a certain user community.

In the HPC cluster at DCCN, we use the CentOS Linux distribution. It is a well-maintained distribution developed closely with Red Hat, a company providing commercial Linux distributions and support. It is also widely used in many scientific computing systems around the world.


2.2.2 Getting started with Linux

By following this wiki, you will log in to one of the access nodes of the HPC cluster, learn about the Linux shell, and issue a very simple Linux command in the virtual terminal.

Obtain a user account

Please refer to this guide.

SSH login with Putty

Please refer to this guide.

The prompt of the shell

After you log in to the access node, the first thing you see is a welcome message together with a couple of news messages. Following the messages are a few lines of text that look similar to the example below:

honlee@mentat001:~999 $

Every logged-in user is given a shell to interact with the system. The example above is technically called the prompt of the Linux shell. It waits for your commands to the system.

Following the prompt, you will type in commands to run programs.

Note: For simplicity, we will use the symbol $ to denote the prompt of the shell.

Environment variables

Every Linux shell comes with a set of variables that can affect the way running processes behave. Those variables are called environment variables. The command to list all environment variables in the current shell is

$ env

Tip: The practical action of running the above command is to type env after the shell prompt, and press the Enter key.

Generally speaking, a user needs to set or modify some default environment variables to get a particular program running properly. A very common case is adjusting the PATH variable so the system can find the location of the program's executable when the program is launched by the user. Another example is extending LD_LIBRARY_PATH to include the directory where the dynamic libraries needed for running a program can be found.

In the HPC cluster, a set of environment variables has been prepared for the data analysis software supported in the cluster. Loading (or unloading) these variables in a shell is also made easy using the Environment Modules. For average users, it is not even necessary to load the variables explicitly, as a default set of variables corresponding to commonly used neuroimaging software is loaded automatically upon login. More details about using software in the HPC cluster can be found here.
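For illustration, here is a minimal sketch of adjusting these variables by hand; the tool directory $HOME/mytool is hypothetical:

$ export PATH=$HOME/mytool/bin:$PATH
$ export LD_LIBRARY_PATH=$HOME/mytool/lib:$LD_LIBRARY_PATH
$ echo $PATH    # verify that the new directory is listed first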


Knowing who you are in the system

The Linux system is designed to support multiple concurrent users. Every user has an account (i.e. user id), which is the one you used to log in to the access node. Every user account is associated with at least one group in the system. In the HPC cluster at DCCN, the system groups are created in correspondence with the research (i.e. PI) groups. User accounts are associated with groups according to the registration made during the check-in procedure.

To find out your user id and the system group you are associated with, simply type id followed by the Enter key at the prompt. For example:

$ id
uid=10343(honlee) gid=601(tg) groups=601(tg)

Using online manuals

A Linux command comes with options for additional functionality; the online manual provides a handy way to find the supported options of a command. To access the online manual of a command, one uses the command man followed by the command in question. For example, to get all possible options of the id command, one does

$ man id

2.2.3 Understanding the Linux file system

Data and software programs in the Linux system are stored in files organised in directories (i.e. folders). The file system is responsible for managing the files and directories.

In this wiki, you will learn about the tree structure of the file system and understand the syntax used to represent the file type and access permission. You will also learn the commands for creating, moving/copying, and deleting files and directories in the file system.

Present working directory

Right after you log in to a Linux system, you are in a certain working directory in the file system: the so-called present working directory. Finding out which working directory you are currently in can be done with the command pwd. For example,

$ pwd
/home/tg/honlee

The system responds to this command with a string representing the present working directory in a special notation. This string is referred to as the path to the present working directory. The string /home/tg/honlee from the above example is interpreted as follows:

In the Linux file system, directories and files are organised in a tree structure. The root of the tree is denoted by the / symbol, as shown at the beginning of the string. Following it is the first-level child directory called home. It is then separated from the second-level child tg by an additional / symbol. This notation convention repeats while moving down the child-directory levels, until the present working directory is reached. For instance, the present working directory in this example is the third-level child from the root, and it's called honlee. The hierarchy is also illustrated in the diagram below:

/                       <-- the root directory
|-- home                <-- first-level child
|   |-- tg              <-- second-level child
|   |   |-- honlee      <-- the present working directory

Changing the present working directory

With the file path notation, one changes the present working directory in the system using the cd command. Continuing with the above example, if we want to move to the tg directory, we do:

$ cd /home/tg

Since the directory tg is one level up with respect to the present working directory, it can also be referred to by the .. symbol. Therefore, an alternative to the previous command is:

$ cd ..

The difference between the two is that in the first command the directory tg is referred to from the root directory using the so-called absolute path, while in the second it is referred to relative to the present working directory with a relative path.

Tip: The relative path to the present working directory is denoted by the symbol .

The personal directory

Every user has a personal directory in which the user has full access permission to manage the data stored in it. The absolute path of this directory is referred to by an environment variable called $HOME.

Thus, one can always use the following command to change the present working directory to the personal directory.

$ cd $HOME

Tip: One can also leave out $HOME in the above cd command to move to the personal directory.

Listing files in a directory

For listing files and sub-directories in the present working directory, one uses the ls command. For example,

$ ls

The option -l is frequently used to get more information about the files/directories. For example,

$ ls -l
total 68
drwxr-xr-x 2 honlee tg 4096 Aug 12 13:09 Desktop
drwxr-xr-x 2 honlee tg 4096 Aug 21 16:15 matlab
drwx------ 5 honlee tg 4096 Mar  7 14:37 opt
-rw-r--r-- 1 honlee tg   84 Mar  5 10:47 startup.m
-rwxr-xr-x 1 honlee tg  737 Aug 19 12:56 test.sh


File information is provided in columns. The columns are summarised in the following table:

Column  Example        Information
1       drwxr-xr-x     indicator for file type and access permission
2       2              number of links to the file
3       honlee         user ownership
4       tg             group ownership
5       4096           size of the file in bytes
6-8     Aug 12 13:09   time of the last modification
9       Desktop        name of the file

File type and permission

The indicator for file type and access permission requires some interpretation, shown graphically in the picture below.

The first character represents the type of the file. In most cases, you will see the character d, -, or l, corresponding to a directory, a regular file, or a link, respectively.

The file-type character is followed by 9 additional characters organised in three sets, each consisting of three characters representing the read (r), write (w) and execute (x) permissions of the file. If a certain permission is disabled, a - is shown instead. The three sets, from left to right, indicate permissions for the user, the group (i.e. all users in the group), and others (i.e. all other users in the system). The user and group considered here are the user and group ownership (see the third and fourth columns of the table).

Changing file permission

When you are the owner of a file (or you have the write permission on it), you can change the file permission. To change the permission, we use the chmod command.


For example, to make a file called test readable for all users in the system, one does

$ chmod o+r test

The syntax o+r stands for adding read permission for others. By replacing the character o with u or g, one adds read permission for the user or the group. Replacing r with w or x will set the write or execute permission instead of read. Using - instead of + removes permissions accordingly.
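Multiple permission changes can also be combined in a single chmod call by separating them with commas. A small sketch, reusing the file test from the example above:

$ chmod g+w,o-r test    # add write permission for the group, remove read permission for others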

Copying and (re-)moving files

For copying a file, one uses the cp command. Assuming there is a file at path /home/tg/test, to make a copy of it and place the copy at path /home/tg/test.copy, one does

$ cp /home/tg/test /home/tg/test.copy

Copying a directory requires the -R option. For example, to copy a directory at path /home/tg/test_dir to /home/tg/test_dir.copy, one does

$ cp -R /home/tg/test_dir /home/tg/test_dir.copy

For moving a file/directory from one path to another, one uses the mv command:

$ mv /home/tg/test /home/tg/test.move
$ mv /home/tg/test_dir /home/tg/test_dir.move

To delete (remove) a file from the file system, one uses the rm command:

$ rm /home/tg/test

When deleting a directory from the file system, the directory should be emptied first, i.e. it should not contain any files or sub-directories. The -r option simplifies the deletion of a directory by removing files and sub-directories recursively.
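For example, a sketch that removes the directory copied earlier together with everything inside it:

$ rm -r /home/tg/test_dir.copy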

Creating new directory

Creating a directory is done using the mkdir command. The following command creates a new directory at path /home/tg/new_dir.

$ mkdir /home/tg/new_dir

The system assumes that the parent paths (/home and /home/tg) exist prior to the creation of /home/tg/new_dir. The option -p is used to create any necessary parent directories.
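For example, the following sketch creates a nested directory in one go; the sub-directory names are illustrative:

$ mkdir -p /home/tg/new_dir/data/raw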

Using wildcards

Wildcards are a special syntax for specifying a group of files with some part of their names in common. Linux commands can use wildcards to perform actions on more than one file at a time. The most used wildcard syntax is the asterisk *, representing any number of characters.

In the example below, the wildcard is used to remove files with the prefix subject_ and the suffix .out in the present working directory:


$ ls
subject_1.dat subject_2.dat subject_3.dat subject_4.dat subject_5.dat
subject_1.out subject_2.out subject_3.out subject_4.out subject_5.out

$ rm subject_*.out

$ ls
subject_1.dat subject_2.dat subject_3.dat subject_4.dat subject_5.dat

Tip: More wildcard syntax can be found here.

2.2.4 Working with text files

Given their simplicity and readability, text files are widely used in computing systems for various purposes. In this practice, we will use text files to store numerical data. A benefit of storing data in a text file is that many tools coming along with the Linux system can be used directly to process the data.

In the examples below, we will create two text files to store the final-exam scores of four students in the mathematics and language courses. We will then introduce a few useful Linux commands to browse and analyse the data.

Before we start, make sure the directory $HOME/tutorial/labs is already available; otherwise create it with

$ mkdir -p $HOME/tutorial/labs

and change the present working directory to it:

$ cd $HOME/tutorial/labs

Creating and editing text file

There are many text editors in Linux. Here we use the editor called nano, which is relatively easy to adopt. Let's first create a text file called score_math.dat using the following command:

Note: In Linux, the suffix of the filename is irrelevant to the file type. Use the file command to examine the file type.

$ nano score_math.dat

You will be entering an empty editing area provided by nano. Copy or type the following text into the area:

Thomas 81
Percy 65
Emily 75
James 55

Press Control+o followed by the Enter key to save the file. Press Control+x to quit the editing environment and return to the prompt.
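At this point you can verify the type of the newly created file with the file command mentioned in the note above; the exact output may vary slightly between systems:

$ file score_math.dat
score_math.dat: ASCII text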

Now repeat the steps above to create another file called score_lang.dat, and paste the data below into it.


Thomas 53
Percy 85
Emily 70
James 65

When you list the content of the present working directory, you should see the two data files.

$ ls -l
total 0
-rw-r--r-- 1 honlee tg 40 Sep 30 15:06 score_lang.dat
-rw-r--r-- 1 honlee tg 37 Sep 30 15:06 score_math.dat

Browsing text file

Several commands can be used to browse a text file. First of all, the command cat can be used to print the entire content to the terminal. For example:

$ cat score_math.dat

When the content is too large to fit in the terminal, one uses either the more or the less command to print the content in pages. For example,

$ more score_math.dat
$ less score_math.dat

Tip: The command less provides more functionality than the more command, such as up/down scrolling and text search.

When only the top or bottom of the content is of concern, one can use the commands head and tail. To print the first 2 lines, one does

$ head -n 2 score_math.dat

To print the last 2 lines, one does

$ tail -n 2 score_math.dat

Searching in text file

To search for a string in a text file, one uses the command grep. For example, if we would like to search for the name Thomas in the file score_math.dat, we do

$ grep 'Thomas' score_math.dat

Tip: grep supports advanced pattern searching using regular expressions.
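As a small sketch of a regular expression, the following pattern matches lines ending with a score in the fifties; the $ character anchors the pattern to the end of a line:

$ grep '5[0-9]$' score_math.dat
James 55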

2.2.5 Extracting information from data

This practice continues the work on the two data files created in Working with text files. The aim is to show how to extract interesting information from the data, using some simple but powerful command-line tools of Linux.


You should have the following two files in the present working directory:

$ cat score_math.dat
Thomas 81
Percy 65
Emily 75
James 55

$ cat score_lang.dat
Thomas 53
Percy 85
Emily 70
James 65

Data sorting

If we wonder who has the highest score in the language course, one way to get the answer is to apply the sort command to the text file. For example,

$ sort -k 2 -n -r score_lang.dat
Percy 85
Emily 70
James 65
Thomas 53

Here we use the option -k 2 to sort the data on the second column, the option -n to treat the data as numerical values (instead of text characters, the default), and the option -r to make the sorting descending. Voilà, Percy has the highest score in the language class.

Data processing

Using awk, a pattern scanning and processing language, one can already perform some statistical calculations on the data without the need for advanced tools such as R. The example below shows a way to calculate the arithmetic mean of the scores in the language class.

$ awk 'BEGIN {cnt=0; sum=0;} {cnt += 1; sum += $2;} END {print "mean:", sum/cnt}' score_lang.dat
mean: 68.25

The example above shows the basic structure of the awk language. It consists of three parts; for the explanation here, we call them the pre-processor, the processor and the post-processor. They are explained below, followed by a further worked example.

• The pre-processor starts with the keyword BEGIN followed by a piece of code enclosed in curly braces (i.e. BEGIN { ... }). It defines what to do before awk starts processing the data file. In the example above, we initialise two variables called cnt and sum for storing the number of students and the sum of the scores, respectively.

• The processor is merely enclosed in curly braces (i.e. { ... }), and it follows right after the pre-processor. The processor defines what to do for each line in the data file. It uses index variables to refer to the data in a specific column of a line. The variable $0 refers to the whole line, and the variables $n to the data in the n-th column. In the example, we simply add 1 to the counter cnt, and increase the sum by the score taken from the 2nd column.

• The post-processor starts with the keyword END, with its context enclosed again in curly braces (i.e. END { ... }). Here in the example, we simply calculate the arithmetic mean and print it.
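Following the same structure, here is a sketch that finds the highest score and who obtained it; the pre-processor is omitted, since unset awk variables default to zero (this reproduces the result of the sort example above):

$ awk '{if ($2 > max) {max = $2; who = $1}} END {print "highest:", who, max}' score_lang.dat
highest: Percy 85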


Data filtering

One can also use awk to create filters on the data. The example below selects only those students with a score lower than 70.

$ awk '{ if ( $2 < 70 ) print $0}' score_math.dat
Percy 65
James 55

Data processing pipeline

Every running command is treated as a process in the Linux system. Every process has three attached data streams: one for receiving data from an input device (e.g. a keyboard), and two for printing outputs and errors to an output device (e.g. a screen). These data streams are technically called STDIN, STDOUT and STDERR, standing for standard input, standard output and standard error, respectively.

An important feature of these data streams is that the output stream (e.g. STDOUT) of a process can be connected to the input stream (STDIN) of another process to form a data processing pipeline. The symbol for constructing the pipeline is |, the pipe.

In the following example, we assume that we want to make a nice-looking table out of the two score files. The table will list the name of the student, the score for each class, and the total score of the student.

Firstly we have to put the data from the two text files together, using the paste command:

$ paste score_lang.dat score_math.dat
Thomas 53 Thomas 81
Percy 85 Percy 65
Emily 70 Emily 75
James 65 James 55

But the output looks ugly! Furthermore, it is only half way to what we want to have. This is where the process pipeline plays a role. We now revise our command as follows:

$ paste score_lang.dat score_math.dat | awk 'BEGIN{print "name\tlang\tmath\ttotal"; print "---"} {print $1"\t"$2"\t"$4"\t"$2+$4}'
name    lang    math    total
---
Thomas  53      81      134
Percy   85      65      150
Emily   70      75      145
James   65      55      120

Note: In the Linux shell, the string "\t" represents the Tab key. It is a way to align data in columns.

Here the pipeline is constructed such that we first put together the data in the two files using the paste command, and connect its output stream to the input stream of the awk command to create the nice-looking table.

Saving output to file

When you have processed the data and produced a nice-looking table, it would be a good idea to save the output to a file rather than printing it to the screen. Here we will discuss another important feature of the STDOUT and STDERR data streams: output redirection.

The following command produces the nice-looking table again; but instead of the table being printed to the terminal, it is saved to a file called score_table.txt by redirecting the output.


$ paste score_lang.dat score_math.dat | awk 'BEGIN{print "name\tlang\tmath\ttotal"; print "---"} {print $1"\t"$2"\t"$4"\t"$2+$4}' > score_table.txt

Tip: Output redirection with the > symbol will overwrite the content of an existing file. One can use the >> symbol to append new data to an existing file.

Note that the above command only redirects the STDOUT stream to a file; data sent to the STDERR stream will still be printed to the terminal.

There are two approaches to save the STDERR stream to a file:

1. Merge STDERR to STDOUT

$ paste score_lang.dat score_math.dat | awk 'BEGIN{print "name\tlang\tmath\ttotal"; print "---"} {print $1"\t"$2"\t"$4"\t"$2+$4}' > score_table.txt 2>&1

2. Save STDERR to separate file

$ paste score_lang.dat score_math.dat | awk 'BEGIN{print "name\tlang\tmath\ttotal"; print "---"} {print $1"\t"$2"\t"$4"\t"$2+$4}' 1>score_table.txt 2>score_table.err

2.2.6 Exercise: file system operations

Note: Please try not to just copy-and-paste the commands provided in the hands-on exercises!! Typing (and eventually making typos) is an essential part of the learning process.

In this exercise, we will get you familiar with the Linux file system. Following the steps below, you will use certain frequently used commands to perform operations on the file system, including

• browsing files and sub-directories within a directory,

• creating and removing directories,

• moving the present working directory between directories,

• changing the access permission of a directory,

• creating and deleting files.

You will also learn a few useful wildcard syntaxes to get things done quicker and easier.

Tasks

1. Change the present working directory to your personal directory

$ cd $HOME

2. Create a new directory called tutorial

$ mkdir tutorial

3. Change the present working directory to the tutorial directory


$ cd tutorial

4. Create two new directories called labs and exercises

$ mkdir labs
$ mkdir exercises

5. Remove all access permissions of others from the exercises directory

$ chmod o-rwx exercises

6. Set groups to have read and execute permissions on the exercises directory

$ chmod g=rx exercises

7. Change the present working directory to $HOME/tutorial/labs

$ cd $HOME/tutorial/labs

8. Create multiple empty files (and list them) using wildcards. Note the syntax {1..5} in the first command below. It is taken by the Linux shell as a series of sequential integers from 1 to 5.

$ touch subject_{1..5}.dat

$ ls -l subject_*
-rw-r--r-- 1 honlee tg 0 Sep 30 16:24 subject_1.dat
-rw-r--r-- 1 honlee tg 0 Sep 30 16:24 subject_2.dat
-rw-r--r-- 1 honlee tg 0 Sep 30 16:24 subject_3.dat
-rw-r--r-- 1 honlee tg 0 Sep 30 16:24 subject_4.dat
-rw-r--r-- 1 honlee tg 0 Sep 30 16:24 subject_5.dat

Tip: The touch command is used for creating empty files.

9. Remove multiple files using wildcards. Note the syntax *. It is taken as “any characters” by the Linux shell.

$ rm subject_*.dat

2.2.7 Exercise: Familiarize Yourselves with Redirects

The typical shells used in Linux environments allow for redirecting input and output to additional commands. The basic redirects you will use today are >, >> and |. You can generally use these redirects with any standard command-line utility.

Your Task

1. Either make a new directory or go to an existing directory that you made in the previous exercise. Take a few minutes to try each of these three redirects with arbitrary commands to improve your understanding of their functionality.

Hint: Try some commands like those shown below. Experiment with other commands you learned about in the slides this morning, or some of the commands on your cheat sheet. Notice that you can stack redirects multiple times, as in the first example.

$ ls /home | sort > file.txt
$ echo hello > file.txt
$ echo hello >> file.txt

2.2.8 Exercise: Using Wildcards

Note: Please try not to just copy-and-paste the commands provided in the hands-on exercises!! Typing (and eventually making typos) is an essential part of the learning process.

Preparation

Move into a directory you’d like to work in (make a new directory if you like), and run the command

$ touch gcutError_recon-all.log s10_recon-all.log s1_recon-all.log s6_recon-all.log s8_recon-all.log

This will create empty files for the purpose of this exercise.

Background

A handy way to refer to many items with a similar pattern is with wildcards. These have been described in the lectures, and mainly consist of the characters:

• * matches everything

• ? matches any single character

• [] matches any of the letters or numbers, or a range of letters or numbers inside the brackets

With BASH, the shell itself expands the wildcards. This means that commands usually don't see these special characters, because BASH has already expanded them before the command is run. Try to get a feel for wildcards with the following examples.

$ ls *recon-all.log
gcutError_recon-all.log s10_recon-all.log s1_recon-all.log s6_recon-all.log s8_recon-all.log

$ ls gcut*
gcutError_recon-all.log

$ ls s[0-9]*
s10_recon-all.log s1_recon-all.log s6_recon-all.log s8_recon-all.log

$ ls s[0-9]_*
s1_recon-all.log s6_recon-all.log s8_recon-all.log

$ ls s[0-9][0-9]_*
s10_recon-all.log


$ ls [a-z][0-9][0-9]???con-all.log
s10_recon-all.log

$ ls s?_recon-all.log
s1_recon-all.log s6_recon-all.log s8_recon-all.log

Do you understand all of the patterns and how they returned what they did?

The [ ] wildcard has the most complex syntax because it is more flexible. When BASH sees the [ ] characters, it will try to match any of the characters, or a range of characters, it sees inside them. A range of characters is specified by separating two search characters with the - character. Some legal patterns would be [0-9], [5-8], [a-Z], or [ady1-3]. Another handy trick is to use the ! character to negate a search pattern inside []. For instance, [!0-9] means don't return anything with a value between 0 and 9. Take a look at the next examples to get a feel for this very useful globbing character.

• matching all strings starting with s1, followed by any of the numbers from 0 to 9, followed then by anything:

$ ls s1[0-9]*
s10_recon-all.log

• matching all strings starting with any of a range of letters from a to Z

$ ls [a-Z]*
gcutError_recon-all.log s10_recon-all.log s1_recon-all.log s6_recon-all.log s8_recon-all.log

• matching all strings starting with s, g, or 0.

$ ls [sg0]*

• matching all strings that do not start with s

$ ls [!s]*
gcutError_recon-all.log

Your Task

1. Find a search pattern that will return all files ending in .txt

2. Find a search pattern that will return all files starting with s and ending in .log

3. Find a search pattern that will return all files starting with s followed by two numbers

4. Find a search pattern that will return all files starting with s followed by only one number

Solution

1. ls *.txt

2. ls s*.log

3. ls s[0-9][0-9]*

4. ls s[0-9][!0-9]*


Clean up

When you're finished and have checked the solution, run the command below to remove the files we were working with. If you don't do this, the next exercise will give you trouble.

$ rm gcutError_recon-all.log s10_recon-all.log s1_recon-all.log s6_recon-all.log s8_recon-all.log

2.2.9 Exercise: play with text-based data file

Note: Please try not to just copy-and-paste the commands provided in the hands-on exercises!! Typing (and eventually making typos) is an essential part of the learning process.

Preparation

Download this data file using the following command:

$ wget https://raw.githubusercontent.com/Donders-Institute/hpc-wiki-v2/master/docs/linux/exercise/gcutError_recon-all.log

This data file is an example output file from a freesurfer command submitted to the cluster using qsub. In this simple task we are going to try to extract some information from it using a few commands.

Your Task

1. Construct a Linux command pipeline to get the subject ID associated with the log file. The subject ID is of the form Subject##, i.e. Subject01, Subject02, Subject03, etc. Use one command to send input to grep, and then use grep to search for a pattern. If you're a bit confused, take a look at the hints and the example grep command below. You'll have to modify it to get the result you want.

Hint:

• Commands separated with a pipe, the | character, send the output of the command on the left of the pipe as input to the command on the right of the pipe.

• Think back to the exercise about wildcards. grep uses something called regular expressions, which are similar to wildcards but much more extensive. For grep regexps, * and [] work the same way as they do in wildcards. For a fuller treatment of regexps, click here. For a quick example, see below. You can grep for a search term in a file with something like the following:

# example grep command
$ cat file.txt | grep SEARCHTERM
# where SEARCHTERM can be something like
$ cat file.txt | grep "[0-9][0-9].*"
# this search term would find matches in strings that start with two numbers followed by anything

2. If you completed Task 1, you were able to find the output you wanted, but much more output was sent to the screen than you needed. Construct another pipeline to limit the output of grep to only the first line.


Hint: Think of a command that prints the first n lines of a file. You can always google the task if you can't think of the right tool for the job.

Solution

Solution to Task 1

$ cat gcutError_recon-all.log | grep "Subject[0-9][0-9]"
/home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer
-subjid FreeSurfer -i /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/Scans/Anatomical/MP2RAGE/MP2RAGE.nii -all
setenv SUBJECTS_DIR /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05
/home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer
mri_convert /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/Scans/Anatomical/MP2RAGE/MP2RAGE.nii /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/orig/001.mgz
mri_convert /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/Scans/Anatomical/MP2RAGE/MP2RAGE.nii /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/orig/001.mgz
reading from /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/Scans/Anatomical/MP2RAGE/MP2RAGE.nii...
writing to /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/orig/001.mgz...
/home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/orig/001.mgz
cp /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/orig/001.mgz /home/language/dansha/Studies/LaminarWord/SubjectData/Subject05/FreeSurfer/mri/rawavg.mgz

Hint: Note that you could also have run the command

$ grep "Subject[0-9][0-9]" gcutError_recon-all.log

to get the same results. The traditional Unix command-line tools typically provide many ways of doing the same thing. It's up to the user to find the best way to accomplish each task. grep is an excellent tool. To learn more about what you can search, try man grep. You can also google for something like "cool stuff I can do with grep".

Solution to Task 2

$ grep "Subject[0-9][0-9]" gcutError_recon-all.log | head -1

You could have also done

$ grep -m 1 "Subject[0-9][0-9]" gcutError_recon-all.log
$ cat gcutError_recon-all.log | grep "Subject[0-9][0-9]" | head -1
$ cat gcutError_recon-all.log | grep -m 1 "Subject[0-9][0-9]"

There are usually many ways to do the same thing. Look up the -m option in the grep man page if you’re curious!


Closing Remarks

These are just simple examples. You see the real power of the Unix command-line tools when you add a little, soon-to-come, scripting know-how. A simple example of a more powerful way to use grep is a case where you have 543 subject logs (not impossible!), and you need to search through all of them for subjects who participated in a version of your experiment with a bad stimuli list. grep is an excellent tool for this!
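As a sketch of that use case, grep's -l option prints only the names of the files that contain a match; the log file names and the marker string here are hypothetical:

$ grep -l 'stimuli_list_v1' subject_*.log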

2.3 Introduction to the Linux BASH shell

2.3.1 Get started with bash script

A great feature of the Linux shell is its programming capability. This feature makes it feasible to manage complex computations. This session focuses on the basics of the bash script. You will learn how to compose a simple bash script, make the script executable, and run it as a shell command.

The first script in action

Follow the steps below to write our first bash script, and put it in action.

• Change the present working directory to $HOME/tutorial/labs

$ cd $HOME/tutorial/labs

• Create a new text file called hello_me.sh

$ nano hello_me.sh

• Save the following text into the file

 1  #!/bin/bash
 2
 3  # The -n option of the echo command does not print the newline character at the end,
 4  # making the output from the next command show on the same line.
 5  echo -n "Hello! "
 6
 7  # Just run a system command and let the output be printed to the screen
 8  whoami
 9
10  # Here we capture the output of the command "/bin/hostname",
11  # assigning it to a new variable called "server".
12  server=$(/bin/hostname)
13
14  # Here we compose a text message and assign it to another variable called "msg".
15  msg="Welcome to $server"
16
17  # Print the value of the variable "msg" to the terminal.
18  echo $msg

• Change the file permission to executable

$ chmod a+x hello_me.sh

• Run the script as a command-line tool


$ ./hello_me.sh

Note: In addition to just typing the script name in the terminal, we add ./ in front. This forces the system to load the executable (i.e. the script) from the present working directory.

Interpreter directive

Generally speaking, a shell script is essentially a text file starting with an interpreter directive. The interpreter directive specifies which interpreter program should be used to translate the file contents into instructions executed by the system.

The directive always starts with #! (a number sign and an exclamation mark) followed by the path to the executable of the interpreter. Since we are going to use the interpreter of the bash shell, the executable is /bin/bash.

Comments

Except for the first line, which is reserved for the interpreter directive, text following a # (number sign) on a line is treated as a comment. Comments are ignored by the interpreter while executing the script. In BASH, there is no special syntax for block comments.
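A minimal sketch of the comment syntax:

#!/bin/bash
# this whole line is a comment
echo "hello"   # a trailing comment after a command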

Shell commands

Running shell commands via a script is as simple as typing the commands into the text file, just as they are typed in the terminal. A trivial example is shown on line 8, where the command whoami is called to get the user id.

Variables

Variables are used to store data in the script. This is done by assigning a value to a variable. Two different ways are shown in the example script:

1. The first way is shown on line 12, where the variable server is given a value captured from the output of the /bin/hostname command. For capturing the command output, the command is enclosed in parentheses () following the dollar sign $.

2. The second way, shown on line 15, is simply assigning a string to the variable msg.

Note: When assigning a value to a variable, there SHOULD NOT be any space characters around the equal sign =.

Tip: Environment variables are also accessible in the script. For example, one can use $HOME in the script to get the path to the personal directory on the file system.

Note: BASH variables are type-free, meaning that you can store any type of data, such as a string, a number or an array of data, in a variable without declaring its type in advance.

This feature results in speedy coding and enables flexibility in recycling variable names; but it can also lead to conflict/confusion at some point. Keep this feature in mind when writing complex code.
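A small sketch of this type-free behaviour; the variable name x is arbitrary:

x=42            # x holds a number
x="forty-two"   # the same variable now holds a string
x=(4 2)         # and now an array
echo ${x[0]}    # prints the first array element: 4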


2.3.2 Useful information

cheat sheet

A PDF version of the cheat sheet can be downloaded from here.

key-combinations in terminal

Note: These key combinations will not work in all terminal applications (i.e. nano, etc.), because specific programs may have the key combinations already assigned to another purpose. In other cases, the terminal program itself may not interpret these characters in the typical way.

The ^ character indicates the Control button. When you see it next to another character, it means to hold down the Ctrl button while you push that character. For example, ^c means to hold down Ctrl and then press the c button. In the case of ^shift+c, it means to hold down the Control AND Shift buttons while pushing the c button.

key-combination  function

^shift+c         copy highlighted text in the terminal. Highlight text by clicking and dragging, just like in any application.
^shift+v         paste text into the terminal. Text copied from the terminal will be available in other applications using the typical ^v key combination.
^c               send the SIGINT signal to a program. Will usually quit any process currently running in the terminal. It will not quit certain programs, like nano, but it will by default terminate a running script.
^a               move the cursor to the beginning of the line in the terminal
^e               move the cursor to the end of the line in the terminal
^k               delete everything after the cursor on one line

The rest of these aren’t as important, but may still be useful to you:

key-combination  function

^w               delete one word backward from the cursor
^b               move the cursor one character backward
^f               move the cursor one character forward
Alt-f            (hold down the Alt button and then press f) move the cursor one word forward
Alt-b            move the cursor one word backward

Handy commands

The following cd commands help you to move around in the Linux filesystem:

command       function

cd -          change dir to the previous directory you were just in
cd ../        change dir to one directory back; you can move as many directories back with this syntax as you like
cd ../../Dir  change dir to two directories back and one directory forward, into the directory Dir
cd ~          change dir to the home directory


Changing the PATH variable

At a BASH prompt, type:

$ PATH=$PATH:/path/to/new/directory/

You can add as many directories as you like. If you want to add more, the syntax would be

$ PATH=$PATH:/path/to/first/directory/:/path/to/second/directory/:/and/so/on/

Note: If you find that none of your commands are found after you tried to change PATH, then you have accidentally deleted your PATH variable. Restart bash (reopen the terminal application) and it will go back to normal.

Changing the $HOME/.bashrc

First, it is a good idea to back up the file if you plan to make changes.

$ cp ~/.bashrc ~/.bashrc.bak

Then you can open the bashrc file to modify with the command:

$ nano ~/.bashrc

You will then see a minimal bashrc file that the TG has configured for every user.

Add whatever commands you would like to this file. A common thing to do is to alter the PATH variable to contain a directory with your personal scripts.

To do this, you just add something like the following at the bottom. Note that you could enter the commands wherever you want in the bashrc; just keep in mind that they will be executed sequentially.

$ PATH=$PATH:/usr/local/abin/:/usr/local/bin/mricron_lx/:/sbin/:/usr/local/bin/:/usr/local/Scripts/

Of course, you'll have to enter your own directories for the PATH to make sense for you. There is no sense in copying and pasting these example PATHs.

Like on the command line, you can add as many directories as you want; just remember to separate them with the : character.

When you are finished modifying the file, press ^x to exit, and nano will ask you if you want to save. Say yes. To have the current bash environment use the new bashrc, you can either start a new instance of bash, or run the command

$ source ~/.bashrc

The source command just means to run the file as though you were typing in each command yourself, and not in a new bash instance (the behaviour for scripts).

If we were to run the bashrc like a script, any variables we set in bashrc would not affect the parent environment.
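A quick sketch that makes the difference visible; the variable and file names are illustrative:

$ echo 'MYVAR=hello' > set_myvar.sh
$ bash set_myvar.sh        # runs in a child shell
$ echo $MYVAR              # prints an empty line: the parent shell is unaffected

$ source set_myvar.sh      # runs in the current shell
$ echo $MYVAR
hello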

Note: bashrc is a hidden file; it has a . character in front of its name. This means that it will not be visible normally. You would need to run the command ls -a to see it in the output.


When to Use Quotes and Which Quotes to Use

Quoting in bash is used to force bash to interpret characters inside the quotes literally. Often, quotes are used to prevent bash from treating spaces as delimiting characters. There are two types of quotes in bash. Double quotes escape spaces, globbing characters and single quotes, and block the expansion of the tilde and {}. Double quotes do not escape the $ character, so variable names are expanded normally.

For example, if you need to escape spaces but still want bash to expand variable names, you should use double quotes:

$ file="a file with spaces.txt"; cp "$file" aFileWithoutSpaces.txt

Single quotes escape everything. Use these if you want bash to ignore all special characters. In single quotes, variables won't be expanded. Single quotes are commonly used when quoting search patterns used for grep or awk. This can be because some bash special characters overlap with the grep regular expression characters and cause problems, or because you want to grep for a pattern that double quotes would expand. Consider the following:

$ echo 'Users should set their $PATH variable' >> README; cat README | grep '$PATH'

If we want to grep for the string $PATH, we are forced to use single quotes to stop the shell from treating the $ character as special. There are many other use cases for both single and double quotes.

You can escape individual characters with the \ character. This works within double quotes as well. If, for example, you wanted to have a string with two $ characters where one $ is escaped and one $ is interpreted normally, then you can use double quotes with a \ preceding the $ you would like to escape.

echo "$PATH \$PATH" > file.txt

This code will echo both the expanded $PATH variable and the literal string $PATH to a file called file.txt.

Process control (killing hung jobs)

If a process you are running, whether in the GUI or on the command line, becomes unresponsive and you cannot kill it by conventional means, you can use the kill command.

First find the process ID of the process you want to stop. The following command will list all the processes being run by your username.

$ ps ux

For example,

$ ps ux
USER     PID %CPU %MEM     VSZ    RSS TTY    STAT START   TIME COMMAND
dansha  4244  0.0  0.0  162256   3604 ?      Ss   Oct11   0:00 xterm
dansha  4246  0.0  0.0  131076   3372 pts/0  Ss   Oct11   0:00 bash
dansha  4342  4.6  0.1  578252  27800 ?      Rl   11:54   0:00 konsole
dansha  4346  1.0  0.0  131076   3320 pts/12 Ss   11:54   0:00 /bin/bash
dansha  4369  0.0  0.0  578492  16148 pts/0  Sl+  Oct11   0:01 xfce4-terminal
dansha  4375  0.0  0.0   22980    896 pts/0  S+   Oct11   0:00 gnome-pty-helper
dansha  4376  0.0  0.0  131084   3332 pts/3  Ss+  Oct11   0:00 bash
dansha  4474  0.0  0.0  133648   1388 pts/12 R+   11:54   0:00 ps ux
dansha  4729  0.0  0.0  131084   3336 pts/7  Ss+  Oct11   0:00 bash
dansha  4920  0.0  0.0  131084   3392 pts/8  Ss+  Oct11   0:00 bash
dansha  5104  0.0  0.0  162256   3604 ?      Ss   Oct11   0:00 xterm
dansha  5106  0.0  0.0  131076   3256 pts/11 Ss+  Oct11   0:00 bash
dansha  5617  0.0  0.0  162256   3804 ?      Ss   Oct06   0:00 xterm
dansha  5619  0.0  0.0  131176   3568 pts/17 Ss+  Oct06   0:00 bash
dansha  5711  0.0  0.0  376040    404 ?      Ss   Aug31   0:00 emacs -daemon
dansha  7505  0.0  0.0   36732      4 ?      Ss   May20   0:00 /bin/dbus-daemon --fork --print-pid 6 --print-address 8 --session
dansha  9568  0.0  0.0  433608   8796 ?      Sl   Oct09   0:00 /usr/libexec/tracker-store
dansha  9572  0.0  0.0  304444   3132 ?      Sl   Oct09   0:00 /usr/libexec/gvfsd
dansha  9576  0.0  0.0  286896   5344 ?      Sl   Oct09   0:00 /usr/libexec//gvfsd-fuse /run/user/10441/gvfs -f -o big_writes
dansha 12361  0.0  0.0  143436   2244 ?      S    Oct07   0:00 sshd: dansha@notty
dansha 12362  0.0  0.0   62932   1912 ?      Ss   Oct07   0:00 /usr/libexec/openssh/sftp-server
dansha 12472  0.0  0.0  143568   2244 ?      S    Oct07   0:00 sshd: dansha@notty
dansha 12473  0.0  0.0   69328   2148 ?      Ss   Oct07   0:00 /usr/libexec/openssh/sftp-server
dansha 15633  0.0  0.0  143568   2436 ?      S    Oct07   0:00 sshd: dansha@pts/10,pts/15
dansha 15634  0.0  0.0  129872   2116 pts/10 Ss+  Oct07   0:00 /bin/sh
dansha 16263  0.0  0.0  128944   3076 pts/15 Ss+  Oct07   0:00 /bin/bash --noediting -i
dansha 18069  0.0  0.6  275020 101536 ?      Sl   Oct04   5:24 /usr/bin/Xvnc :2 -desktop mentat208.dccn.nl:2 (dansha) -auth /home/language/dansha/.Xauthority -geometry 1910x10
dansha 18078  0.0  0.0  115184   1540 ?      S    Oct04   0:00 /bin/bash /home/language/dansha/.vnc/xstartup
dansha 18142  0.0  0.0   96760   4120 ?      S    Oct04   0:00 vncconfig -iconic -sendprimary=0 -nowin
dansha 18143  0.0  0.0  159188   6988 ?      S    Oct04   0:06 fluxbox
dansha 18284  1.0  1.9 1461168 318744 ?      Ssl  Oct04 112:48 /usr/lib64/firefox/firefox
dansha 18313  0.0  0.0   28504    768 ?      S    Oct04   0:00 dbus-launch --autolaunch=d172390f877044d1a0919ebec6673565 --binary-syntax --close-stderr
dansha 18314  0.0  0.0   37012    896 ?      Ss   Oct04   0:00 /bin/dbus-daemon --fork --print-pid 6 --print-address 8 --session
dansha 18341  0.0  0.0  160184   2560 ?      S    Oct04   0:01 /usr/libexec/gconfd-2
dansha 30537  0.0  0.0  406336   2536 ?      Sl   Sep22   0:15 /usr/bin/pulseaudio --start --log-target=syslog

The idea is to match the process ID (PID) with the command name. Any command you run (clicking on an icon is also a command) will have an entry in this table if the command created a process that is still running.

For example, to kill the firefox process with PID 18284, one uses the command:

$ kill 18284

If firefox still doesn’t close, one could try

$ kill -9 18284

Note: kill -9 is kind of a nuclear option. Don’t use it unless the program won’t close normally with kill.

One could also combine the ps command with grep to find a running process. For example, to find firefox processes, one does:

$ ps ux | grep firefox
dansha   4638  0.0  0.0  114708    984 pts/12  S+   11:56   0:00 grep --color=auto firefox
dansha  18284  1.0  1.9 1461168 318744 ?       Ssl  Oct04 112:48 /usr/lib64/firefox/firefox

Be careful to enter the right PID. If you enter the wrong PID, it will kill that program instead. Think of this like ending the wrong process in the Windows task manager.

Tip:

1. If you want to save your work in nano without closing the program, press ^o.

2. To read text files without editing them, use the program less. You can search through documents by typing / and then entering the search term you want to look up. Don't include spaces. You can use this same method to navigate man pages.

3. To see if a program is on your path and where that program is on your path, use the command which.

Odd things to be aware of

These are some little things that have come up with users in the past. I may add more items to this in the future, but these topics are already pretty well addressed on forums.

1. In some terminal programs, accidentally pushing ^s will cause the terminal to lock up. If you notice your terminal is locked up and you're not sure why, try pushing ^q.

2. Sometimes terminal formatting can get messed up. You may notice that when you type long lines, new characters overwrite characters at the beginning of the line. Also, if you accidentally run cat on a binary file, you may notice your terminal starts displaying nonsense characters when you type. In both of these cases, you can try to run the command:

$ reset

Tip: You may not be able to see what you type, but if you hit enter, type the command, and then hit enter again, you might get your terminal back to normal. If that doesn't work, restart the terminal application.

2.3.3 Exercise: Putting Commands into a Script, and Setting the Script as Executable

Note: DO NOT just copy-and-paste the commands for the hands-on exercises!! Typing (and eventually making typos) is an essential part of the learning process.

Preparation

Download a text file using the following command:

$ rm -f gcutError_recon-all.log
$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/linux/exercise/gcutError_recon-all.log


This data file is an example output file from a freesurfer command submitted to the cluster using qsub. In this simple task we are going to try to extract some information from it using a few commands.

Task

In this task, we're going to create a script, set it as executable (make it so we can run it), and put it on the path.

1. Make a directory called ~/Scripts. If you can’t remember the command to do this, google for it.

Hint: Remember that ~ refers to your home directory.

2. We’re going to start making a scrpt that you will build on in the next exercise.

Since a script is really just a text file, open a text editor and then enter the following lines.

#!/bin/bash

# Lines beginning with # are comments. These are not processed by BASH, except in one special case.
# At the beginning of a script, the first line is special.
# It tells Linux what interpreter to use, and is called the interpreter directive.
# If someone tries to execute a BASH script that does not have the #!/bin/bash line,
# and they are using a different shell (tcsh, ksh, etc), then the script
# will probably not work correctly.
# This is because different shells use different syntax.
# The syntax of the interpreter directive is a #! followed immediately by the
# absolute path of the interpreter you'd like to use.
# In most GNU/Linux systems, BASH is expected to live in the /bin folder,
# so its full path is normally /bin/bash.

This is the beginning of every BASH script, with some useful commentary added. Comments in BASH are marked with the pound (#) sign.

3. So far this script will do nothing if run, because it only contains an interpreter directive and commentary. We are going to add some commands to the script to make it do something.

Recall the previous exercise where you grep'd over the log file. If we want to save those commands to use again, a script is a very good way to do that.

Add the following command to your script, following the commentary:

cat gcutError_recon-all.log | grep "Subject[0-9][0-9]" | head -1

4. Save this file as ~/Scripts/logSearch.sh

5. Set the script as executable with the following command

$ chmod +x ~/Scripts/logSearch.sh

Note: This step is extremely important. Your script will not run unless you tell Linux that it can be run. This is a security thing. In the chmod (change mode) command, +x is an option meaning "plus executable," i.e. set this file to have permission to execute for all users. For more (and potentially confusing) information, run the command


$ man chmod
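If you are curious what chmod +x actually changes, you can compare the permission bits before and after. The output below is illustrative only; the owner, size and date will differ on your system:

$ ls -l ~/Scripts/logSearch.sh
-rw-r--r-- 1 user user 214 Oct 12 11:54 /home/user/Scripts/logSearch.sh
$ chmod +x ~/Scripts/logSearch.sh
$ ls -l ~/Scripts/logSearch.sh
-rwxr-xr-x 1 user user 214 Oct 12 11:54 /home/user/Scripts/logSearch.sh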

6. Next we will show how you can run your script.

In Linux, executable files are treated fairly similarly whether they are scripts or binary programs. To run an executable, you generally need to type its name in, and it will execute.

You only need to make sure BASH knows where to look for the executable you want to run. There are different ways to do so:

• You can run the executable by typing in the full (absolute) path of the script.

• You can use the path relative to your current working directory.

• You can add the location of the executable to your $PATH environment variable.

Try to run your script by first using the relative path, then the absolute path. Raise your hand if you don't understand this instruction.

Hint: The character . refers to your current directory. In BASH, you need to indicate that you want to run an executable in your current directory by prefacing the command with ./ For example, if you want to execute a script myscript.sh in your current directory, you would type ./myscript.sh.

7. Now that you've run your script using the absolute and relative paths, try to add ~/Scripts to your $PATH environment variable (a sketch is given after the hints below).

Hint:

• Check out this useful information

• Remember that you need to add directories to your path, not files. When you type a command and hit enter, BASH will search all the directories on your path for a file matching what you typed. Do not add files directly to your path; BASH will not be able to find them.
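One possible way to do it is sketched below. Note that exporting in your current shell only lasts for that session; appending the same line to your ~/.bashrc (optional) makes it persistent:

$ export PATH="${HOME}/Scripts:${PATH}"
$ echo 'export PATH="${HOME}/Scripts:${PATH}"' >> ~/.bashrc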

8. See that you can run the script just by typing the name of it now! WOW!!

When an executable file is on your path, you can just type its name without giving any information about its location in the file system. If you specify the path of a file in the command, i.e. by prepending a ./ or /the/path/to/file to the file name, BASH will ignore your path variable and look in the location you specify.

The takeaway from all this is that instead of typing

$ cat gcutError_recon-all.log | grep "Subject[0-9][0-9]" | head -1

every time you want to run this command, you can just run the script you made in this exercise.

As you might be thinking already, you can add as many lines as you want to a script. If you open the script back up with your favorite text editor, you can add anything you want to extend its functionality.

2.3.4 Exercise: the if-statement and for-loop

Introduction

In this exercise we will be extending script.sh by adding some BASH flow control constructions. We will be working with if and for. The if-statement and the for-loop are probably the most common flow control tools in the bash hacker's toolbox.

The overall goal of this lengthy exercise is to introduce for-loops and if-statements and show you a common use case in which they are put together with what you have learned in previous sessions for actual work. As an example, we will show you how to search for a specific pattern in a collection of log files and print out certain information from the log files given a condition.

The exercise consists of two main sections broken into subtasks. The two main sections focus respectively on if and for, and the subtasks are designed to introduce these tools and illustrate their utility.

Task 1: simple for loop

Background

We will construct a simple for-loop to demonstrate how it works.

The for-loop works by iterating over a list of items and executing all the commands in the body once for each item in the list. The general form is:

for variable-name in list-of-stuff; do
    commands
    more commands
done

You can add as many commands as you like. BASH will loop through the commands in the body of the loop as many times as there are items in your list. You can see the wiki for more information.

Your Task

1. Add a list of items to this for-loop and see what happens. A list can be a list of files, strings, numbers, anything.

for i in INSERT-LIST-HERE; do
    echo $i
done

Replace the INSERT-LIST-HERE with $(ls ${HOME}) and see how it changes i to the next item on the list each time it iterates.

2. In this next one, try to add any command you want to the body of the for-loop

for i in {01..10}; do
    INSERT-COMMANDS-HERE
    INSERT-MORE-COMMANDS-HERE-IF-YOU-LIKE
done

Tip: Bash takes a range of items within {} and expands it before running any commands. For example, {01..05} will expand to 01 02 03 04 05. You can use letters or numbers. See this link for more information.
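You can see such an expansion directly with echo, for example:

$ echo {01..05}
01 02 03 04 05
$ echo {a..e}
a b c d e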

The main things to remember are that the variable name, the list and the commands are totally arbitrary and can be whatever you like, as long as you keep the correct syntax. You can have as many items in the list as you want, you can set the variable name to whatever you want, and you can use any commands you want. You don't even need to reference the variable in the body. For example, try running


for i in {01..05}; do
    echo 'hello world!'
done

Hint: Notice the syntax. The first line ends in do, the next commands are indented, and done, the keyword which ends the loop, is at the same indentation level as the keyword for, which begins the loop. This is how all your for loops should look.

Task 2: Use the for loop in a BASH script

Background

We will extend the functionality of our current script with the for-loop. For this exercise, we deal with the common scenario of needing to search through a collection of log files for specific information.

Preparation

Start by downloading the log files we'll be using. Move into a directory you'd like to work in and run this command to download and untar the logfiles.

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/bash/logs.tgz
$ tar xvf logs.tgz

Now open script.sh and change your grep command to the one you see below. The -o option tells grep to print ONLY the matching pattern, and not the rest of the line around it. This will be useful later in the task and in general.

#!/bin/bash

# Lines beginning with # are comments. These are not processed by BASH, except in one special case.
# At the beginning of a script, the first line is special. It tells Linux what interpreter
# to use, and is called, accordingly, the interpreter directive.

grep -o "Subject[0-9][0-9]" gcutError_recon-all.log | head -1

Your task

Using this command as a starting point, create a for-loop to grep the Subject ID of every log file we’ve downloaded.

To accomplish this goal you will need to do the following:

1. Create a for loop which iterates over a list consisting of the log files.

2. Modify the grep command to search through the current log file and not “gcutError_recon-all.log”.

3. Run your script.

The structure will be something like this:

for var in list-of-logs; do
    grep -o search-term file-to-search | head -1
done


Note: Always remember to include all the special keywords: for, in, ;, do, and done. If you don't remember these, you might not get an error, but your loop definitely won't run.

Task 3: simple if statement

Background

Often in programming, you want your program or script to do something if certain conditions are met, and other things if the conditions are not met. In BASH, as well as many other languages, a very common way of exerting this type of control over your program is an if-statement.

The purpose of if is to test whether a command returns an exit status of 0 (zero) or not 0, and then run some commands if the exit status is 0. You can also say to run commands if the exit status is not 0. This is what the keyword else means.

Recall that, in BASH, the if-statement syntax is

if command-returns-true; then
    run these commands
else
    run-these-commands-instead
fi

true means exit status 0 (BASH tracks every process's exit status), and the else portion is optional. Any non-zero exit status would be not true, i.e. false.

Note: For the gory details, refer back to the slides, the wiki, or suffer the agony of this fairly exhaustive treatment.

Your task

1. Modify the following if-statement code using the command true.

if INSERT-COMMAND-TO-EVALUATE; then
    INSERT-COMMANDS-TO-RUN-IF-TRUE
    INSERT-MORE-COMMANDS-TO-RUN-IF-TRUE
else
    INSERT-COMMANDS-TO-RUN-IF-FALSE
    INSERT-MORE-COMMANDS-TO-RUN-IF-FALSE
fi

Tip: true is a command which does nothing except return exit status 0, thus it always evaluates to true! The description in the man page is good for a chuckle. You'll want to make sure you put true as the command to evaluate. Remember to fill in the other commands too. The other commands can be whatever you like.

2. Now try using the command false instead of true.

Note: Now the else portion of the code will be evaluated, while the part before the else keyword will not be. Use the same template if-statement as you did in subtask 1.


Task 4: Comparative statements

Background

In this task, you will extend the power of if by using it with comparison operators of BASH.

Task 3 demonstrated how if-statements work, but their main use in scripting is testing whether a comparison evaluates to true or false. This complicates the syntax.

For comparisons, you need to use a separate command called test. In BASH, the most commonly seen form of test is [[ things-to-compare ]].

Tip: You will also see the form [ things-to-compare ], which is simply a less featured version of [[ ]]. They are both versions of the command test. In general, you should always use the [[ ]] form. You can look to this guide for a good explanation of test, [ ] and [[ ]].

Your Task

1. Modify the following if-statement structure to test if the number on the left is less-than the number on the right.

if [[ 3 INSERT-OPERATOR 4 ]]; then
    echo "3 is less than 4"
else
    echo "3 is not less than 4"
fi

Tip: Numerical comparison operators to use with [[ ]] are -lt, -gt, -ge, -le, -eq, and -ne. They mean less-than, greater-than, greater-or-equal, less-or-equal, equal, and not-equal, respectively.

Now test if 3 is greater than 4 by using a different comparison operator.

2. Try the same command, but with variables now instead of numbers. Modify this code, remembering to set values for the variables num1 and num2.

num1=
num2=

if [[ $num1 INSERT-OPERATOR $num2 ]]; then
    INSERT-COMMANDS
else
    INSERT-COMMANDS
fi

Note: BASH only understands integers. Floating point arithmetic requires external programs (like bc).
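For example, bc can evaluate a floating-point comparison and print 1 (true) or 0 (false):

$ echo "5.525 < 9" | bc
1
$ echo "10.225 < 9" | bc
0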

3. Now we will perform string comparisons.

The main purpose of this is to see if some variable is set to a certain value. Strings use different comparison operators than integers. For strings we use ==, >, <, and !=. By far the most common operators are == and !=, meaning respectively equal and not equal.


string=

if [[ $string == "A String" ]]; then
    echo "strings are the same"
else
    echo "strings are not the same"
fi

Note: This is one place where the difference between [[ ]] and [ ] becomes evident. With [ ] you will have to escape the < and > characters because they are special characters to the shell. With [[ ]] you don't have to worry about escaping anything. Recall that in BASH we use \ to tell BASH to process the next character literally.

Note: If a string has a space in it, the space has to be escaped somehow. One way of doing this is by using either single or double quotes.

Task 5: Put if and for together

Background

We will now return to our script with the for-loop and extend the functionality by adding an if-statement inside of the for-loop.

In this task, we will find the amount of time the script which generated each logfile ran. We will print the run time and the logfile name to the screen if the runtime is below 9 hours. I've broken this rather large task into small steps. Raise your hand if you get lost! This one's hard.

Your Task

1. In each logfile the "run-time" is recorded. It is the amount of time the freesurfer script which generated the logfile ran.

Open your script and modify the grep command to search for the "run-time" instead of the subject ID. You'll need to remove the -o flag now, because we'll need the full line.

# an example
for file in list; do
    grep SEARCH-PATTERN $file
done

After correctly modifying grep and running the script, you should have a bunch of lines output to the screen. They'll all be of the form:

#@#%# recon-all-run-time-hours 5.525
#@#%# recon-all-run-time-hours 10.225
...

If you get output like this, move on to 2.


2. Restrict this output to ONLY numbers less than 10. In other words, find a search pattern that is only sensitive to one digit followed by a decimal. Then find a way to restrict the output further so that only the whole number remains, i.e. 8.45 becomes simply 8.

If you spend more than 10 minutes on this, look to the solution and move on to 3! This is a hard one, so I provide lots of hints.

Tip:

1. You only need grep for this, not if. Think about piping multiple grep commands together and of using regexes.

2. The key to this question is getting the right regexp. There are a few ways you could do this.

3. Remember that “space” is a character.

4. If you want to search for a literal . character, you'll have to escape it in grep, i.e. use \. and not ..

5. Be careful not to accidentally return only the second digit of a two digit number.

6. In grep you don't negate the items inside [] with ! as you do with wildcards; instead you use ^, i.e. [^0-9], to mean NOT a number from 0 to 9, instead of [!0-9].

7. Finally, it’s good practice in grep to put your search term in single or double quotes.

3. grep should be returning one-digit numbers or nothing at all. This is what we want!

In step 3, we will capture the output and save it to a variable. We will use this variable later for a numerical comparison involving if. Recall command substitution. If you want to save the output of a command as a variable, use the syntax:

var=$(MY-COMMANDS-HERE)

Insert your command into the parentheses and then insert that line in place of your current grep pipeline.
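As an illustration only (your own pipeline from step 2 may differ), such a command substitution could look like the sketch below, placed inside the loop body where $file is the current log file. It keeps a space-digit-dot match, so two-digit run times produce no output:

var=$(grep "recon-all-run-time-hours" "$file" | grep -o " [0-9]\." | grep -o "[0-9]")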

4. Now add an if-statement to the body of the for-loop and create a comparison, testing if the value grep returned is less than 9. If the value is less than 9, we want to print the name of the logfile and the variable value to the screen.

for file in list; do
    var=$(MY-GREP-PIPELINE)
    if [[ $var INSERT-OPERATOR INSERT-VALUE ]]; then
        DO SOMETHING
    fi
done

If you’ve done this correctly, you may notice an odd result. Even if $var is empty, your comparison will alwaysevaluate to less than 9?! If this odd outcome is the same as yours, check the solution and then move onto subtask5!

Tip: An excellent trick is to echo the commands you will run before you run them. If, for example, you are (as you should be) worried that your search patterns are a bit too liberal, you can see what the loop will actually do by putting it in double-quotes and adding echo before it. Observe:

for file in list; do
    var=$(MY-GREP-PIPELINE)
    echo "if [[ $var INSERT-OPERATOR INSERT-VALUE ]]; then
        DO SOMETHING
    fi"
done

Instead of running the commands, you've now told the for-loop to echo what will actually be run to the screen. This is an important step in checking your own code for errors before you run it.

5. The reason $var always evaluates to less than 9, even when nothing is assigned to it, is that empty strings evaluate to 0! To get around this you can add extra conditions to your if-statement. Add an extra comparison that will test if $var is greater than zero. The syntax is like so:

for file in list; do
    var=$(MY-GREP-PIPELINE)
    if [[ $var INSERT-OPERATOR INSERT-VALUE && $var INSERT-OPERATOR INSERT-VALUE ]]; then
        DO SOMETHING
    fi
done

This will test whether both conditions evaluate to true, and then run the command if both are true. You could also create a comparison using logical or with ||.

As a result, if the run time is less than 9 hours and greater than 0 hours, we will print the log and the run time to the screen. Good work!

Note: For an even better solution, you can use what are called unary operators. These are detailed among the agonies of this fairly exhaustive treatment. They test if variables are empty strings, if files exist, etc. Note that this guide uses the [ ] form of test, but you can use everything described there with the [[ ]] form as well.
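For instance, the emptiness problem from subtask 5 could also be handled with the -n (string is non-empty) unary operator; a minimal sketch:

if [[ -n $var && $var -lt 9 ]]; then
    echo "$file: $var hours"
fi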

Solutions

2.4 The HPC cluster

2.4.1 Obtaining a user account

You should receive a username/password pair after completing the ICT check-in at DCCN. If you do not have an account, ask the TG helpdesk.

Note: The user account here is NOT the account (e.g. u-number) given by the Radboud University.

2.4.2 Accessing the cluster

Getting access to the HPC cluster

SSH login with Putty

Follow the steps below to connect to one of the cluster's access nodes using SSH.


Screenshots of the four steps are shown below:

1. start putty on the Windows desktop

2. configure putty for connecting to, e.g., mentat001.dccn.nl

3. login with your username and password

4. get a text-based virtual terminal with a shell prompt

SSH logout

You can log out of the system by either closing the Putty window or typing the command exit in the virtual terminal.

VNC for graphic desktop

Note: For the first-time user, type

$ vncpasswd

in the putty terminal to protect your VNC server from anonymous access before following the instructions below.

Firstly, start the VNC server by typing the following command in the putty terminal.

$ vncmanager

Follow the step-by-step instructions on the screen to initiate a VNC server. See the screenshots below as an example.

1. start a new VNC server

2. select a host

3. choose resolution

4. make fullscreen

5. select windows manager

6. VNC server started with a display endpoint

In the screenshots above, we have started a VNC server associated with a display endpoint mentat002.dccn.nl:56. To connect to it, we use a VNC client called TigerVNC Viewer. Follow the steps below to make the connection:

Note: The display endpoint mentat002.dccn.nl:56 is just an example. In reality, you should replace it with a different endpoint given by the vncmanager.

1. open the TigerVNC Viewer (double-click the icon on the desktop)

2. enter the display endpoint (mentat002.dccn.nl:56) as the VNC server

3. enter the authentication password you set via the vncpasswd command

4. get the graphical desktop of the access node


Disconnect VNC server

To disconnect the VNC server, simply close the TigerVNC-viewer window in which the graphical desktop is displayed. The VNC server will remain available, and can be reused (re-connected) when you need to use the graphical desktop again in the future.

Warning: DO NOT log out of the graphical desktop, as it causes the VNC server to become inaccessible afterwards.

Terminate VNC server

Since the graphical windows manager takes a significant amount of resources from the system, it is strongly recommended to terminate the VNC server if you are not actively using it. Terminating a VNC server can be done via the vncmanager command. The steps are shown in the screenshots below:

1. stop a VNC server

2. choose a server to be stopped

3. confirm and stop the server

Access from outside of DCCN

If you are at home or travelling, or connecting your personal laptop to the eduroam network, you are not allowed to connect to the access nodes directly, as they are in the DCCN network protected by a firewall.

In this case, you need to make the connection indirectly via one of the following two ways:


Using eduVPN

EduVPN is a virtual private network service provided by SURF, allowing secure access to the institute's protected network, services and systems.

It is the most straightforward way of accessing the HPC cluster from outside of the DCCN network; but it requires a valid RU/RUMC account prefixed with u or e (a.k.a. the u/e-number). If you have such a RU/RUMC account, you can follow the instruction to set up eduVPN.

After you start the eduVPN connection, your computer is "virtually" part of the DCCN network. With that, you can connect directly to the HPC cluster as if accessing it from inside DCCN.

Using SSH tunnel

An SSH gateway named ssh.dccn.nl is provided for setting up the SSH tunnels. When setting up a tunnel for connecting to a target service behind the firewall, one needs to choose a local network port that is still free for use on your desktop/laptop (i.e. the Source port) and provide the network endpoint (i.e. the Destination) referring to the target service.

Tip: This technique can also be applied for accessing different services protected by the DCCN firewall.

Contents

• Instructions in video

• Putty login via SSH tunnel

• VNC via SSH tunnel (Windows)

• VNC via SSH tunnel (Linux/Mac OSX)

Instructions in video

The following screencast will guide you through the steps of accessing the cluster via the SSH tunnel.


Putty login via SSH tunnel

In this example, we choose the Source port to be 8022. The Destination referring to the SSH server on mentat001 should be mentat001:22.

Follow the steps below to establish the tunnel for SSH connection:

1. start putty on the Windows desktop

2. configure putty for connecting to the SSH gateway ssh.dccn.nl

3. configure putty to initiate a local port 8022 for forwarding connections to mentat001:22

4. login the gateway with your username and password to establish the tunnel

Once you have logged in to the gateway, you should keep the login window open, and make another SSH connection to the local port as follows:

1. start another putty on the Windows desktop

2. configure putty for connecting to localhost on port 8022. This is the port we initiated when establishing the tunnel.

3. login with your username and password

4. get the virtual terminal with a shell prompt. You should see the hostname mentat001 showing on the prompt.
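If you prefer a command line over putty (e.g. on Linux/Mac, or Windows with an OpenSSH client installed), the same two steps can be done with the ssh command; xxxyyy stands for your DCCN username:

$ ssh -L 8022:mentat001:22 xxxyyy@ssh.dccn.nl

and then, keeping that session open, in a second terminal:

$ ssh -p 8022 xxxyyy@localhost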

VNC via SSH tunnel (Windows)

In this example, we choose the Source port to be 5956. We also assume that a VNC server has been started on mentat002 with the display number 56. The Destination referring to the VNC server should be mentat002:5956.

Note: The display number 56 is just an example. In reality, you should replace it with a different number assigned by the vncmanager. Nevertheless, the network port number is always the display number plus 5900.

Follow the steps below to establish the tunnel for VNC connection:

1. start putty on the Windows desktop

2. configure putty for connecting to the SSH gateway ssh.dccn.nl

3. configure putty to initiate a local port 5956 for forwarding connections to mentat002:5956

4. login the gateway with your username and password to establish the tunnel

Once you have logged in to the gateway, you should keep the login window open, and make a VNC client connection to the local port as follows:

1. open the TigerVNC application

2. enter the display endpoint (localhost:5956) as the VNC server

3. enter the authentication password you set via the vncpasswd command

4. get the graphical desktop of the access node


VNC via SSH tunnel (Linux/Mac OSX)

In this example, we choose the Source port to be 5956. We also assume that a VNC server has been started on mentat002 with the display number 56. The Destination referring to the VNC server should be mentat002:5956.

Note: The display number 56 is just an example. In reality, you should replace it with a different number assigned by the vncmanager. Nevertheless, the network port number is always the display number plus 5900.

Follow the steps below to establish the tunnel for VNC connection:

1. open a terminal application

On Linux, this can be gnome-terminal on the GNOME desktop environment, xfce4-terminal on XFCE4, or konsole on KDE. On Mac, the Terminal app can be found in the Other group in the app Launchpad.

2. set up the SSH tunnel

Use the following command to create the SSH tunnel. Note that the $ sign is just an indication of your terminal prompt; it is not part of the command. The username xxxyyy should be replaced with your actual DCCN account name.

$ ssh -L 5956:mentat002:5956 xxxyyy@ssh.dccn.nl

A screenshot below shows an example:

Once the connection is set up, you should leave the terminal open. If you close the terminal, the tunnel is also closed. You can now make a connection to your VNC session through this SSH tunnel.

3. open the TigerVNC application


4. enter the display endpoint (localhost:5956) as the VNC server

5. enter the authentication password you set via the vncpasswd command

6. get the graphical desktop of the access node

2.4.3 Using the cluster

Running computations on the Mentat compute nodes

From one of the cluster's access nodes, you can log in to one of the mentat compute nodes via SSH and run computations. The mentat compute nodes are named mentat203 to mentat208. Each of them is equipped with 4 CPU cores and 16 gigabytes of memory.

The mentat compute nodes are designed to support the following use cases:

• developing and testing algorithms

• computations requiring more than 72 hours of walltime

For other types of computations, it is encouraged to use the Torque cluster.

The examples below assume that you are connected to one of the cluster's access nodes via VNC, following this instruction.

Choosing a node

Choosing a mentat compute node is a manual step. The "Headnode Status" tab on this page helps you to find a less-loaded node at the time you start the computation. However, since users are free to start new computations at any time, the load on the chosen node can change considerably during your computation.

Computation in text mode

1. log in to the chosen mentat compute node (e.g. mentat203) with SSH

$ ssh mentat203

2. run a program interactively, e.g.

$ matlab

Computation in graphic mode

1. allow X-window applications from any host to display their graphic interface on the access node

$ xhost +

2. use SSH X11 forwarding to launch an X-window application on the mentat compute node. The following command launches Matlab with the graphic desktop on mentat203:

$ ssh -Y mentat203 'source /etc/bashrc; matlab -desktop'

Running computations on the Torque cluster

What is the Torque cluster?

The Torque cluster is a pool of high-end computers (also referred to as compute nodes) managed by a resource manager called Torque and a job scheduler called Moab. Instead of allowing users to log in to one computer and run computations freely, users submit their computations in the form of jobs to the Torque cluster. A sketch in the picture below summarises how jobs are being managed by the Torque server and scheduled by its companion, the Moab server, to perform computations on the compute nodes in the cluster.

Fig. 2: Figure: a simplified view of the Torque cluster architecture.

Every job is submitted to the Torque cluster with a set of resource requirements (e.g. duration of the computation, number of CPU cores, amount of RAM, etc.). Based on the requirements, jobs are arranged internally in job queues. The Moab scheduler is responsible for prioritising jobs and assigning them to compute nodes on which the jobs' requirements are fulfilled. The system also guarantees dedicated resources for the computation. Thus, interference between different computations is minimised, resulting in more predictable job completion times.

Resource sharing and job prioritisation

For optimising the utilisation of the resources of the Torque cluster, certain resource-sharing and job prioritisation policies are applied to jobs submitted to the cluster. The implications for users can be seen from three aspects: job queues, throttling policies for resource usage, and job prioritisation.

Job queues

In the cluster, several job queues are made available in order to arrange jobs by resource requirements. Those queues are summarised in the table below. Queues are mainly distinguished by their wall time and memory limitations. Some queues, such as matlab, vgl and interactive, have their own special purpose for jobs with additional resource requirements.

queue name    routing queue   max. walltime per job   max. memory per job   special feature         job priority
matlab        N/A             48 hours                256 GB                matlab license          normal
vgl           N/A             8 hours                 10 GB                 VirtualGL capability    normal
bigscratch    N/A             72 hours                256 GB                local disk space        normal
short         N/A             2 hours                 8 GB                                          normal
veryshort     N/A             20 minutes              8 GB                                          normal
long          automatic       24 hours                8 GB                                          normal
batch         automatic       48 hours                256 GB                                        normal
verylong      automatic       72 hours                64 GB                                         normal
interactive   automatic       72 hours                64 GB                 user interaction        high
lcmgui        N/A             72 hours                64 GB                 interactive LCModel     high

At job submission time, the user can specify in which queue the job should be placed. Alternatively, one can simply specify the wall time and memory required by the job and let the system pick the most appropriate queue automatically. The second approach is implemented by the automatic queue behaving as a router to a destination queue.

Throttling policies for resource usage

In the Torque cluster at DCCN, throttling policies are applied to limit the amount of resources a user can allocate at the same time. This is to avoid the resources of the entire cluster being occupied by a single user. The policies are defined in two scopes:

Queue-wise policies

For every job queue, the total number of runnable and queue-able jobs per user is throttled. In the table below, the max. runnable jobs specifies the maximum number of running jobs a user is allowed to have in a queue at a given time, while the max. queue-able jobs restricts the total number of jobs (including idle, running and blocked jobs) a user is allowed to have.

queue name    max. runnable jobs   max. queue-able jobs
matlab        400                  2000
bigscratch    400                  2000
short         400                  2000
veryshort     400                  2000
long          400                  2000
batch         400                  2000
verylong      400                  2000
vgl           2                    5
interactive   2                    4
lcmgui        2                    4

For most queues, the numbers of runnable and queue-able jobs are set to 400 and 2000, respectively. However, more restrictive policies are applied to jobs in the vgl, interactive and lcmgui queues. For jobs in the vgl queue, the maximum numbers of runnable and queue-able jobs are set to 2 and 5, respectively; they are 2 and 4 for jobs in the interactive and lcmgui queues. This is to compensate for the facts that vgl jobs consume lots of network bandwidth, and that interactive and lcmgui jobs always have the highest priority to start. Furthermore, lcmgui jobs are always assigned to the node on which the LCModel license is installed.

Cluster-wise policies

The cluster-wise throttling limits the total amount of resources a single user can occupy in the cluster at the same time. The three upper-bound (cluster-wise) limitations are:

• 400 jobs

• 660 days processing (wall)time

• 1 TB memory

The cluster-wise policies overrule the queue-wise policies. This implies that if the resource utilisation of your currently running jobs reaches one of the cluster-wise limitations, your additional jobs have to wait in the queue, even if there are still available resources in the cluster and you have not reached the queue-wise limitations.

Job prioritisation

Job priority determines the order in which waiting jobs start in the cluster. Job priority is calculated by the Moab scheduler taking into account various factors. In the cluster at DCCN, mainly the following factors are considered.

1. The waiting time a job has spent in the queue: this factor adds one additional priority point for every additional minute a job waits in the queue.

2. Queue priority: this factor is mainly used for boosting jobs in the interactive queue with an outstanding priority offset, so that they will be started sooner than other types of jobs.

The final job priority, combining these factors, is used by the scheduler to order the waiting jobs accordingly. The first job in the order is the next to start in the cluster.

Note: Job priority calculation is dynamic and not completely transparent to users. One should keep in mind that the cluster does not treat the jobs as "first-come, first-served".

Job management workflow

The Torque system comes with a set of command-line tools for users to manage jobs in the cluster. These tools are generic and can be utilised for running various types of analysis jobs. The figure below shows a general job management lifecycle when running your computations in the cluster. The three most-used tools during the job management lifecycle are: qsub for submitting jobs to the cluster, qstat for checking jobs' status in the cluster, and qdel for cancelling jobs. Their usage is given below.

Fig. 3: Figure: the generic job management workflow.

Batch job submission

The qsub command is used to submit jobs to the Torque job manager. The first and simplest way of using qsub is piping a command-line string to it. Assuming that we want to display the hostname of the compute node on which the job will run, we issue the following command:


$ echo '/bin/hostname -f' | qsub -l 'procs=1,mem=128mb,walltime=00:10:00'

Here we echo the command we want to run (i.e. /bin/hostname -f) as a string, and pass it to qsub as the content of our job. In addition, we also request resources of 1 processor with 128 megabytes of RAM for a walltime of 10 minutes, using the -l option.

In return, you will receive a unique job identifier similar to the one below.

6278224.dccn-l029.dccn.nl

It is "the" identifier to be used for tracing the job's progress and status in the cluster. We will show it later; for the moment, we continue with a different way of using the qsub command.

It is more realistic that our computation involves a set of commands to be executed sequentially. A handier way is to compose those commands into a BASH script and hand the script over to the qsub command. Assuming we have made a script called my_analysis.sh right in the present working directory (i.e. PWD), we can then submit this script as a job via the following command:

$ qsub -l 'procs=1,mem=128mb,walltime=00:10:00' ${PWD}/my_analysis.sh

Very often the same analysis needs to be repeated on many datasets, each corresponding to, for example, a subject. It would be smart to implement the bash script with additional arguments to switch between datasets. Assuming that my_analysis.sh is now implemented to take one argument as the subject index, submitting the script to run on the dataset of subject 001 would look like the example below:

$ echo "${PWD}/my_analysis.sh 001" | qsub -N 's001' -l 'procs=1,mem=128mb,→˓walltime=00:10:00'

Note: The command above for passing arguments to the script is actually a workaround, as qsub (of the currently installed version) does not provide options to deal with command arguments.
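Combining this with a BASH for-loop (see the Linux tutorial above) gives a convenient pattern for submitting one job per subject. A sketch, assuming subject indices 001 to 010:

$ for subj in {001..010}; do
      echo "${PWD}/my_analysis.sh ${subj}" | qsub -N "s${subj}" -l 'procs=1,mem=128mb,walltime=00:10:00'
  done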

Interactive computation in text mode

It is possible to acquire a Linux shell on a compute node for running computations interactively. This is done by submitting so-called interactive jobs. To submit an interactive job, one adds the -I option to the qsub command:

$ qsub -I -l 'procs=1,mem=128mb,walltime=00:10:00'

In a few seconds, a message similar to the one below will show up in the terminal.

 1  qsub: waiting for job 6318221.dccn-l029.dccn.nl to start
 2  qsub: job 6318221.dccn-l029.dccn.nl ready
 3
 4  ----------------------------------------
 5  Begin PBS Prologue Tue Aug 5 13:31:05 CEST 2014 1407238265
 6  Job ID:           6318221.dccn-l029.dccn.nl
 7  Username:         honlee
 8  Group:            tg
 9  Asked resources:  mem=128mb,procs=1,walltime=00:10:00
10  Queue:            interactive
11  Nodes:            dccn-c351
12  End PBS Prologue Tue Aug 5 13:31:05 CEST 2014 1407238265
13  ----------------------------------------
14  honlee@dccn-c351:~

The shell prompt on line 14 shows that you are now logged into a compute node (i.e. dccn-c351). You can now run the computation interactively by typing a command after the prompt.

Note: the resource usage of an interactive job is also monitored by the Torque system. The job will be killed (i.e. you will be kicked out of the shell) when the computation runs over the amount of resources requested at job submission time.

Interactive computation in graphic mode

Interactive computation in graphic mode is actually achieved by submitting a batch job to run the graphical application on the execute node; but when the application runs, it shows the graphic interface remotely on the cluster's access node. Therefore, it requires you to connect to the cluster's access node via VNC.

Assuming we want to run FSL interactively through its graphical menu, we use the following commands:

$ xhost +
$ echo "export DISPLAY=${HOSTNAME}${DISPLAY}; fsl" | qsub -q interactive -l 'procs=1,mem=128mb,walltime=00:10:00'

The first command allows graphic interfaces on any remote host to be displayed on the access node. The second command submits a job that first sets the compute node to forward graphic interfaces to the access node before launching the FSL executable.

Checking job status

Every submitted job in the cluster is referred to by a unique identifier (i.e. the job id). It is "the" reference allowing the system and users to trace the progress of a particular job in the cluster. The system also maintains a set of historical jobs (i.e. jobs finished in the last 12 hours) that can also be queried by users using the qstat command.

To get a list of jobs submitted by you, simply run

$ qstat

If you have jobs in the system, you will get a table similar to the one below:

job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6318626.dccn-l029         matlab           honlee          00:00:00 C matlab
6318627.dccn-l029         matlab           honlee          00:00:00 C matlab
6318630.dccn-l029         STDIN            honlee          00:00:01 C matlab
6318631.dccn-l029         STDIN            honlee          00:00:01 C interactive

In the table, the column Time Use indicates the CPU time utilisation of the job, while the job status is presented in the column S with a capital-letter flag. Possible job-status flags are summarised below:

• H: job is held (by the system or the user)

• Q: job is queued and eligible to run

• R: job is running

• E: job is exiting after having run


• C: job is completed after having run

Tip: There are many options supported by qstat. For example, one can use -i to list only jobs waiting in the queue. More options can be found via the online documentation using man qstat.

Cancelling jobs

Cancelling jobs in the cluster is done with the qdel command. For example, to cancel a job with id 6318635, one does

$ qdel 6318635

Note: You cannot cancel jobs in status exiting (E) or completed (C).

Output streams of the job

On the compute node, the job itself is executed as a process in the system. The default STDOUT and STDERR streams of the process are redirected to files named <job_name>.o<job_id_digits> and <job_name>.e<job_id_digits>, respectively. After the job reaches the complete state, these two files will be produced on the file system.

Tip: The STDOUT and STDERR files produced by a job usually provide useful information for debugging issues with the job. Always check them first when your job fails or terminates unexpectedly.
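For example, assuming the my_analysis.sh submission returned a job id like 6278224.dccn-l029.dccn.nl (the id shown earlier; yours will differ), one could inspect the two files with:

$ cat my_analysis.sh.o6278224
$ cat my_analysis.sh.e6278224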

Specifying resource requirement

Each job submitted to the cluster comes with a resource requirement. The job scheduler and resource manager of the cluster make sure that the needed resources are allocated for the job. To allow the job to complete successfully, it is important that a correct and sufficient amount of resources is specified at job submission time.

When submitting jobs with the qsub command, one uses the -l option to specify the required resources. The value of the -l option follows a certain syntax; details of the syntax can be found in the Torque documentation. Hereafter are a few useful and commonly used examples, for jobs requiring:

1 CPU core, 4 gigabytes memory and 12 hours wallclock time

$ qsub -l 'walltime=12:00:00,mem=4gb' job.sh

The requirement of 1 CPU core is omitted, as the default is 1.

4 CPU cores on a single node, 12 hours wallclock time, and 4 gb memory

$ qsub -l 'nodes=1:ppn=4,walltime=12:00:00,mem=4gb' job.sh


Here we explicitly ask for the 4 CPU cores to be on the same compute node. This is usually the case when the application (such as MATLAB multithreading) can benefit from multiple cores on a (SMP) node to speed up the computation.

1 CPU core, 500gb of free local "scratch" diskspace in /data, 12 hours wallclock time, and 4 gb memory

$ qsub -l 'file=500gb,walltime=12:00:00,mem=4gb' job.sh

Here we explicitly ask for 500gb of free local diskspace located in /data on the compute node. This could, for instance, be asked for when submitting an fmriprep job that requires lots of local diskspace for computation. The more jobs are running, the longer it can take for Torque to find a node with enough free diskspace to run the job. The maximum one can request is 3600gb.

Note: In case you use more than the requested 500gb there will be no penalty. Diskspace is monitored, but your job won't fail if the requested diskspace is "overused", as long as diskspace is still available. Of course, if no more diskspace is available, your job will fail.

1 Intel CPU core, 4 gigabytes memory and 12 hours wallclock time, on a node with 10 Gb network connectivity

$ qsub -l 'nodes=1:intel:network10GigE,walltime=12:00:00,mem=4gb' job.sh

Here we ask the allocated CPU core to be on a node with properties intel and network10GigE.

4 CPU cores, 12 hours wallclock time, and 4 gb memory. The 4 CPU cores may come from different nodes

$ qsub -l 'procs=4,walltime=12:00:00,mem=4gb' job.sh

Here we use procs to specify the number of CPU cores we need, without restricting them to a single node. In this scenario, the job (or the application the job runs) should take care of the communication between the processes distributed over many nodes. This is typical for MPI-like applications.

1 GPU with minimal cuda capability 5.0, 12 hours wallclock time, and 4 gb memory

$ qsub -l 'nodes=1:gpus=1,feature=cuda,walltime=1:00:00,mem=4gb,reqattr=cudacap>=5.0'

Here we ask for 1 GPU on a node with the (dynamic) attribute cudacap set to larger than or equal to 5.0. The feature=cuda requirement allows the system to make use of a standing reservation if there is still space available in the reservation.

Note: The GPU support in the cluster is still in the pilot phase. Currently there is only 1 GPU available in the entire cluster. More GPUs will be added to the cluster in the future.


Estimating resource requirement

As we have mentioned, every job has attributes specifying the resources required for its computation. Based on those attributes, the job scheduler allocates resources to jobs. The more precisely these requirement attributes are given, the more efficiently the resources are used. Therefore, we encourage all users to estimate the resource requirements before submitting massive numbers of jobs to the cluster.

The walltime and memory requirements are the most essential ones. Hereafter are three different ways to estimate those two requirements.

Note: Computing resources in the cluster are reserved for jobs in terms of size (e.g. amount of requested memory and CPU cores) and duration (e.g. the requested walltime). Under-estimating the requirement causes the job to be killed before completion, so the resources already consumed by the job are wasted; over-estimating blocks resources from being used efficiently.

1. Consult your colleagues

If your analysis tool (or script) is commonly used in your research field, consulting your colleagues might just be an efficient way to get a general idea about the resource requirements of the tool.

2. Monitor the resource consumption (with an interactive test job)

A good way of estimating the walltime and memory requirements is to monitor their usage at run time. This approach is only feasible if you run the job interactively through a graphical interface. Nevertheless, it is encouraged to test your data-analysis computation interactively once before submitting it to the cluster as a large number of batch jobs. Through the interactive test, one can easily debug issues and measure the resource usage.

Upon the start of an interactive job, a resource-consumption monitor (the JOBinfo window) is shown at the top-right corner of your VNC desktop.

The resource monitor consists of three bars. From top to bottom, they are:

• Elapsed walltime: the bar indicates the elapsed walltime consumed by the job. It also shows the remaining walltime. The walltime is adjusted according to the CPU speed.

• Memory usage: the bar indicates the current memory usage of the job.

• Max memory usage: the bar indicates the peak memory usage of the job.

3. Use the job’s epilogue message (a trial-and-error approach)

The walltime and memory requirements can also be determined with a trial procedure in which the user submits a test job to the cluster with a rough requirement. In the job's STDOUT file (i.e. <job_name>.o<job_id_digits>), you will see an Epilogue message stating the amount of resources used by the job. In the snippet below, this is shown on line 10. Please also note the job exit code 137 on line 4. It indicates that the job was killed by the system, very likely due to memory over-usage, if the memory usage reported on line 10 is close to the memory requirement on line 9.

 1 ----------------------------------------
 2 Begin PBS Epilogue Wed Oct 17 10:18:53 CEST 2018 1539764333
 3 Job ID: 17635280.dccn-l029.dccn.nl
 4 Job Exit Code: 137
 5 Username: honlee
 6 Group: tg
 7 Job Name: fake_app_2
 8 Session: 15668
 9 Asked resources: walltime=00:10:00,mem=128mb
10 Used resources: cput=00:00:04,walltime=00:00:19,mem=134217728b
11 Queue: veryshort
12 Nodes: dccn-c365.dccn.nl
13 End PBS Epilogue Wed Oct 17 10:18:53 CEST 2018 1539764333
14 ----------------------------------------

Note: In addition to checking the job's epilogue message, you will also receive an email notification when the job exceeds the requested walltime.

Adjust the rough requirement gradually based on the usage information and resubmit the test job with the new requirement. In a few iterations, you will be able to determine the actual usage of your analysis job. A rule of thumb for specifying the resource requirement of the production jobs is to add a 10~20% buffer on top of the actual usage as a safety margin.
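For instance, if the epilogue of a test job reported a usage of walltime=00:50:00 and mem=1.7gb (hypothetical numbers), the production jobs could be submitted with roughly such a margin:

$ qsub -l 'walltime=01:00:00,mem=2gb' job.sh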

Cluster tools

A set of auxiliary scripts has been developed to ease job management on the cluster. Those tools are listed below with a brief description of their functionality. To use them, simply type the command in the terminal. You can apply the -h or --help option to check whether more options are available.


command    function
checkjob   shows job status from the scheduler's perspective. It is useful for knowing why a job is not started.
pbsnodes   lists the compute nodes in the cluster. It is one of the Torque client tools.
hpcutil    retrieves various information about the cluster and jobs. See hpcutil-usage for more detail about the usage.
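For example, to find out why a (hypothetical) job 6928946 has not started yet, one could run:

$ checkjob 6928946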

Running supported software in the cluster

Commonly used data analysis/processing software is centrally managed and supported in the cluster. A list of the supported software can be found here. The repository where the software is organised and installed is mounted on the /opt directory of every cluster node.

Using the supported software via modules

Running software or an application in Linux requires certain changes to the environment variables. Some variables are common (such as $PATH and $LD_LIBRARY_PATH); some are application-specific (such as $R_LIBS for R, or $SUBJECTS_DIR for FreeSurfer).

To help configure the shell environment for running the supported software, a tool called Environment Modules is used in the cluster. Hereafter, we introduce a few of the most used module commands for using the supported software in the cluster.

Note: You should have the module command available if you log in to one of the mentat access nodes using an SSH client (e.g. PuTTY). In the virtual terminal (i.e. GNOME Terminal or Konsole) of a VNC session, the module command may not be available immediately. If this happens to you, make sure the following lines are present in the ~/.bashrc file.

if [ -f /etc/bashrc ]; then
    source /etc/bashrc
fi

For example, run the following command in a terminal:

$ echo 'if [ -f /etc/bashrc ]; then source /etc/bashrc; fi' >> ~/.bashrc

Please note that you should close all existing terminals in the VNC session and start a new terminal. In the new terminal, you should have the module command available.

Showing available software

Firstly, one uses the module command to list the supported software in the cluster. This is done with the following command:

$ module avail
----------------------------- /opt/_modules ------------------------------------------
32bit/brainvoyagerqx/1.10.4  cluster/1.0(default)  matlab/7.0.4   mcr/R2011b
32bit/brainvoyagerqx/1.3.8   cuda/5.0              matlab/7.1.0   mcr/R2012a
32bit/brainvoyagerqx/1.8.6   cuda/5.5(default)     matlab/R2006a  mcr/R2012b(default)
32bit/ctf/4.16               dcmtk/3.6.0(default)  matlab/R2006b  mcr/R2013a
32bit/mricro/1.38_6          fsl/5.0.6             matlab/R2014a  python/2.6.5

## ... skip ...

As shown above, the software is represented as modules organised by name and version. From the list, one selects a software (and version) by picking the corresponding module. Assuming that we are going to run FSL version 5.0.6, the module to choose is named fsl/5.0.6.

Tip: Software is installed in a directory following the hierarchy of the module names. For instance, the FSL software corresponding to the module fsl/5.0.6 is installed under the directory /opt/fsl/5.0.6.

Loading software

After choosing a module, the next step is to load it, which configures the shell environment for running the software. This is done via the load command. For example, to configure the environment for fsl/5.0.6 one does

$ module load fsl/5.0.6

After that, one can check whether the right version of the FSL executable is available. For example,

$ which fsl
/opt/fsl/5.0.6/bin/fsl

Tip: You can load more than one module at the same time.
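For example (using module names from the listing above), FSL and MATLAB can be loaded in one go:

$ module load fsl/5.0.6 matlab/R2014a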

Unloading software

When loaded software is no longer needed, one can easily roll back the shell environment configuration by unloading the specific module. For instance,

$ module unload fsl/5.0.6

As the configuration for running FSL version 5.0.6 is removed, the FSL executable becomes unavailable. This makes sure that the environment is clean for running other software.

Listing loaded software

In most cases, you will load several software modules in one shell environment. To get an overview of the software loaded in the current shell, one can use the list option. For example,

% module list
Currently Loaded Modulefiles:
  1) fsl/5.0.6   2) R/3.1.2   3) cluster/1.0   4) matlab/R2012b

Pre-loaded software

Right after logging into the cluster, you will find several pre-loaded software modules. You can list them via the module list command. Although you are free to unload them using the module unload command, you should always keep the module cluster/1.0 loaded, as it includes essential configurations for running computations in the cluster.

Tip: You should always keep the cluster/1.0 module loaded.

Using the supported software via utility scripts

For the most used applications in the cluster (e.g. MATLAB, R), utility scripts are provided that integrate with job submission to the Torque cluster. Those scripts are built on top of the software modules.

Available software

• Matlab

• RStudio

• Jupyter Notebook

Matlab

For running MATLAB in the cluster, a set of wrapper scripts is available. They are part of the cluster module. With these wrapper scripts, one does not even need to load any corresponding modules in advance.

To start, for example, MATLAB version R2014b, simply run the following command.

% matlab2014b

The wrapper script internally uses the environment modules to configure the shell environment. It also decides how to launch the MATLAB program based on the function of the node on which the command is executed. For instance, if the command is executed on an access node, an interactive Torque job is submitted to the cluster to start the MATLAB program on one of the compute nodes.

RStudio

Also for running a graphical version of RStudio for your R analysis, another set of wrapper scripts will submit the job to the HPC cluster. In this case no preparatory steps have to be taken, as the wrapper scripts take care of them for you.

To start RStudio, just run the following command on the command line of a terminal in your VNC session.

% rstudio

The wrapper script starts a menu in which you can select your R/RStudio version combination. The latest versions are shown by default. Select your desired versions and click the OK button.

Next you will be asked for the walltime and memory requirements for submitting RStudio as a graphical job to the HPC cluster (just like starting an interactive graphical MATLAB session). Define your requirements and hit the OK button.

The menu will close and return you to your terminal, which shows that the job has been submitted along with the job ID.

You can check the status of your job with:

% qstat [jobID]


The selected combination of R/RStudio then starts, along with the graphical walltime/memory indicator.

Jupyter Notebook

Jupyter Notebook provides a web-based Python environment for data analysis. To start it on the cluster, simply run the following command in a terminal within a VNC session.

% jupyter-notebook

For the moment, only the Jupyter Notebook from Anaconda 3 is supported, as it provides token-based protection of the notebook.

Note: When using jupyter-notebook with a conda environment, one should also install the jupyter package when creating the environment, so that your conda environment will be used within the notebook. For example,

% conda create --name env jupyter

Best practices of running jobs on the HPC cluster

In this section, we collect various best practices that may be helpful for speeding up your data analysis. Please note that they were developed with certain use cases in mind. Therefore, unless a practice is explicitly said to be general, apply it carefully and always think twice whether it is applicable to your data analysis.

If you have questions about the best practices below or suggestions for new ones, please don't hesitate to contact the TG helpdesk.

Avoid massive short jobs

The scheduler in the HPC cluster favours fewer, longer jobs over massive numbers of short jobs. The reason is that there is extra overhead for each job in terms of resource provisioning and job output staging. Therefore, if feasible, stacking many short jobs into one single longer job is encouraged.

With the longer job, your whole computational task will also be done faster, given that whenever a resource is allocated to you, you can utilise it longer to make more computations.

A trade-off of this approach is that if a job fails, more computing time is wasted. This can be overcome with good bookkeeping, such that results from the finished computations in a job are preserved and the finished computations do not need to be re-run; see the sketch below.
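A minimal sketch of this practice is given below. The script name stack_jobs.sh, the helper script analyze_subject.sh and the result-file naming are all hypothetical; the point is that many short computations run inside one longer job, and a simple existence check on the result file provides the bookkeeping:

#!/bin/bash
# stack_jobs.sh: run many short computations within one single, longer job
for id in {1..10}; do
    # bookkeeping: skip subjects whose result file already exists, so a
    # resubmitted job does not redo finished computations
    if [ ! -f result_${id}.txt ]; then
        ./analyze_subject.sh ${id} > result_${id}.txt
    fi
done

The stacked script can then be submitted as one job, e.g. with echo "$PWD/stack_jobs.sh" | qsub -l 'walltime=10:00:00,mem=1gb'.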


Utilise the scratch drive on the compute node

If your compute jobs on the cluster produce intermediate data during processing, using the scratch drive locally on the compute node has two benefits:

• Data I/O on the local drive is faster than on the home and project directories provided by network-attached storage.

• It saves storage space in your home or project directory.

The scratch drive on the compute node is mounted on the path /data. A general approach to storing data on it is to create a subdirectory under the /data path with a name specific to your job. For example, you could introduce a new environment variable in the BASH shell called LOCAL_SCRATCH_DIR in the following way:

export LOCAL_SCRATCH_DIR=/data/${USER}/${PBS_JOBID}/$$
mkdir -p ${LOCAL_SCRATCH_DIR}

Whenever you want to store intermediate data in the directory, use the absolute path with the prefix ${LOCAL_SCRATCH_DIR}. For example,

cp /home/tg/honlee/mydataset.txt ${LOCAL_SCRATCH_DIR}/mydataset.txt

It would be nice if your job also takes care of cleaning up the data in the /data directory. For example,

rm -rf ${LOCAL_SCRATCH_DIR}

Generally speaking, this is not strictly necessary, as data in this directory will be automatically removed after 14 days. However, it may help other users (and yourself) to utilise the local scratch for large datasets if space is not occupied by finished jobs.
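Putting the pieces together, a job script using the local scratch might look like the sketch below, where my_analysis and the file names are placeholders:

#!/bin/bash

# create a job-specific scratch directory on the compute node
export LOCAL_SCRATCH_DIR=/data/${USER}/${PBS_JOBID}/$$
mkdir -p ${LOCAL_SCRATCH_DIR}

# stage the input data onto the fast local scratch
cp ${HOME}/mydataset.txt ${LOCAL_SCRATCH_DIR}/mydataset.txt

# run the analysis, keeping intermediate files on the scratch and
# writing only the final result back to the home directory
my_analysis --input ${LOCAL_SCRATCH_DIR}/mydataset.txt \
            --workdir ${LOCAL_SCRATCH_DIR} \
            --output ${HOME}/result.txt

# clean up the scratch directory
rm -rf ${LOCAL_SCRATCH_DIR}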

Avoid massive output to STDOUT

It may be handy (and quick) to just print analysis results to the screen (in other words, to the standard output). However, if the output is lengthy, it can result in very large STDOUT files produced by your compute jobs. Multiplied by the number of parallel jobs you submit to the system, this can end up filling your home directory. Things can easily go wrong when your home directory is full (i.e. out of quota), such as data loss.

Good advice is to write your analysis output to a file with a good data structure. Most analysis tools provide their own data structures, e.g. the .mat file of MATLAB or the .RData file of R.
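If a tool insists on printing verbose progress messages, a simple workaround (a sketch; my_analysis is a placeholder) is to redirect them away from the job's STDOUT file, for instance onto the local scratch described above:

my_analysis --input mydataset.txt > ${LOCAL_SCRATCH_DIR}/verbose.log 2>&1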

2.4.4 Exercises: cluster

Exercise: interactive job

In this exercise, you will start an interactive job in the Torque cluster. When the interactive job starts, check the hostname of the compute node on which your interactive job runs.

Tasks

Note: DO NOT just copy-n-paste the commands for the hands-on exercises!! Typing (and occasionally making typos) is an essential part of the learning process.

1. submit an interactive job with the following command and wait for the job to start.


$ qsub -I -N 'MyFirstJob' -l 'walltime=00:30:00,mem=128mb'

2. note the prologue message when the job starts.

3. check the hostname of the compute node with the command below:

$ hostname -f
dccn-c012.dccn.nl

4. try a few Linux commands in this shell, e.g. ls, cd, etc.

Tip: In the interactive session, it is just like working in a Linux shell.

5. terminate the job with the exit command

$ exit

After that, you should get back to the Linux shell on the access node where your job was submitted.

Exercise: simple batch job

The aim of this exercise is to get you familiar with the Torque client tools for submitting and managing cluster jobs. We will first create a script that calls the sleep command for a given period of time. After that, we are going to submit the script as jobs to the cluster.

Tasks

Note: DO NOT just copy-n-paste the commands for the hands-on exercises!! Typing (and occasionally making typos) is an essential part of the learning process.

1. make a script called run_sleep.sh with the following content:

#!/bin/bash

my_host=$( /bin/hostname )

time=$( date )
echo "$time: $my_host falls asleep ..."

sleep $1

time=$( date )
echo "$time: $my_host wakes up."

Note: Input arguments of a bash script are accessible via variables $n, where n is an integer referring to the n-th argument given to the script. In the script above, the value $1 on the line sleep $1 refers to the first argument given to the script. For instance, if you run the script as run_sleep.sh 10, the value of $1 is 10.

2. make sure the script runs locally


$ chmod +x run_sleep.sh
$ ./run_sleep.sh 1
Mon Sep 28 16:36:28 CEST 2015: dccn-c007.dccn.nl falls asleep ...
Mon Sep 28 16:36:29 CEST 2015: dccn-c007.dccn.nl wakes up.

3. submit a job to run the script

$ echo "$PWD/run_sleep.sh 60" | qsub -N 'sleep_1m' -l 'nodes=1:ppn=1,mem=10mb,→˓walltime=00:01:30'6928945.dccn-l029.dccn.nl

4. check the job status. For example,

$ qstat 6928945

Note: The torque job id given here should be replaced accordingly.

5. or monitor it until it is complete

$ watch qstat 6928945

Tip: The watch command is used here to repeat the qstat command every 2 seconds. Press Control-c to quit the watch program when the job is finished.

6. examine the output file, e.g. sleep_1m.o6928945, and find out the resource consumption of this job

$ cat sleep_1m.o6928945 | grep 'Used resources'
Used resources: cput=00:00:00,mem=4288kb,vmem=433992kb,walltime=00:01:00

7. submit another job to run the script, with a longer duration of sleep. For example,

$ echo "$PWD/run_sleep.sh 3600" | qsub -N 'sleep_1h' -l 'nodes=1:ppn=1,mem=10mb,→˓walltime=01:10:00'6928946.dccn-l029.dccn.nl

Note: Compare this with the command in step 3. As we expect the job to run longer, the requirement on the job walltime is also extended to 1 hour 10 minutes.

8. Ok, we don’t want to wait for the 1-hour job to finish. Let’s cancel the job. For example,

$ qdel 6928946

Exercise: finding resource requirement

In this exercise, you will use two different ways to estimate the resource requirement of running a “fake” application.

We will focus on estimating the memory requirement, as it has a significant impact on the resource-utilisation efficiency of the cluster.


Preparation

Download the "fake" application which performs memory allocation and random-number generation. At the end of the computation, the fake application also produces the cube of a given integer (i.e. n^3).

Follow the commands below to download the fake application and run it locally:

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_resource/fake_app
$ chmod +x fake_app
$ ./fake_app 3 1
compute for 1 seconds
result: 27

The first argument (i.e. 3) is the base of the cube. The second argument (i.e. 1) specifies the duration of the computation in seconds.

Although the result looks trivial, the program internally generates CPU-time and memory usage. The CPU time is clearly specified by the second input argument. The question here is the amount of memory needed for running this program.

Task 1: with the JOBinfo monitor

In the first task, you will estimate the amount of memory required by the fake application, using a resource-utilisation monitor.

1. Start a VNC session (skip this step if you are already in a VNC session)

2. Submit an interactive job with the following command

$ qsub -I -l walltime=00:30:00,mem=1gb

When the job starts, a small JOBinfo window pops up at the top-right corner.

3. Run the fake application under the shell prompt initiated by the interactive job

$ ./fake_app 3 60

Keep your eyes on the JOBinfo window and see how the memory usage evolves. The Max memory usage indicates the amount of memory needed for the fake application.

4. Terminate the interactive job

Task 2: with job’s STDOUT/ERR file

In this task, you will be confronted with an issue where the computing resource (in this case, the memory) allocated to your job is not sufficient to complete the computation. With a few trials, you will find a sufficient (but not overestimated) memory requirement to finish the job.

1. Download another fake application

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_resource/fake_app_2
$ chmod +x fake_app_2

2. Try to submit a job to the cluster using the following command.


$ echo "$PWD/fake_app_2 3 300" | qsub -N fake_app_2 -l walltime=600,mem=128mb

3. Wait for the job to finish, and check the STDOUT and STDERR files of the job. Do you get the expected result in the STDOUT file?

4. In the STDOUT file, find the relevant information concerning the job running out of memory in the Epilogue section. In the example below, the information is presented on lines 4, 9 and 10.

On line 4, it shows that the job's exit code is 137. This is the first hint that the job might have been killed by the system kernel due to memory over-usage. On line 9, you see the memory requirement specified at job submission time; on line 10, it shows that the maximum memory used by the job was 134217728 bytes, which is very close to the 128mb in the requirement (i.e. the "asked resources").

Putting this information together, what happened behind the scenes is that the job got killed by the kernel when the computational process (fake_app_2 in this case) tried to allocate more memory than what was requested for the job. The kill caused the process to return an exit code corresponding to signal 9; the Torque scheduler translated this into the job's exit code by adding an extra 128 to it.

 1 ----------------------------------------
 2 Begin PBS Epilogue Wed Oct 17 10:18:53 CEST 2018 1539764333
 3 Job ID: 17635280.dccn-l029.dccn.nl
 4 Job Exit Code: 137
 5 Username: honlee
 6 Group: tg
 7 Job Name: fake_app_2
 8 Session: 15668
 9 Asked resources: walltime=00:10:00,mem=128mb
10 Used resources: cput=00:00:04,walltime=00:00:19,mem=134217728b
11 Queue: veryshort
12 Nodes: dccn-c365.dccn.nl
13 End PBS Epilogue Wed Oct 17 10:18:53 CEST 2018 1539764333
14 ----------------------------------------
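The 128-plus-signal convention can be reproduced in any bash shell, independently of Torque (a small illustration; signal 9 is SIGKILL):

$ sleep 60 &
$ kill -9 $!
$ wait $!
$ echo $?
137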

5. Try to submit the job again with the memory requirement increased sufficiently for the actual usage.

Tip: Specify the requirement higher, but as close as possible to the actual usage.

An unnecessarily high requirement results in inefficient usage of resources, and consequently blocks other jobs (including yours) from having sufficient resources to start.

Exercise: distribute data analysis in the Torque cluster

This exercise mimics a distributed data analysis, assuming that we have to apply the same data-analysis algorithm independently to the datasets collected from 6 subjects. We will use the Torque cluster to run the analysis in parallel.

Preparation

Use the commands below to download the exercise package and check its content.

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_da/torque_exercise.tgz
$ tar xvzf torque_exercise.tgz
$ cd torque_exercise
$ ls
run_analysis.sh subject_0 subject_1 subject_2 subject_3 subject_4 subject_5

In the package, there are folders for subject data (i.e. subject_{0..5}). In each subject folder, there is a data file containing an encrypted string (URL) pointing to the subject's photo.

In this fake analysis, we are going to find out who our subjects are, using a trivial "analysis algorithm" that performs the following two steps in each subject folder:

1. decrypting the URL string, and

2. downloading the subject’s photo.

The analysis algorithm has been provided as a function in the BASH script run_analysis.sh.

Tasks

1. (optional) read the script run_analysis.sh and try to get an idea of how to use it. Don't spend too much time understanding every detail.

Tip: The script consists of a BASH function (analyze_subject_data) encapsulating the data-analysis algorithm. The function takes one input argument, the subject id. In the main program (the last line), the function is called with input $1. In BASH, the variable $1 refers to the first argument of a shell command.

2. run the analysis interactively on the dataset of subject_0

$ ./run_analysis.sh 0

The command doesn't return any output to the terminal. If it is executed successfully, you should see a photo in the folder subject_0.

Tip: The script run_analysis.sh is written to take one argument, the subject id. Thus the command above performs the data-analysis algorithm on the dataset of subject_0 interactively.

3. run the analysis by submitting 5 parallel jobs; each runs on a dataset.

Tip: The command seq 1 N is useful for generating a list of integers between 1 and N. You could also use {1..N} as an alternative.

4. wait until the jobs finish and check out who our subjects are. You should see a file photo.* in each subject's folder.

Solution

2.4.5 Exercises: application software

Exercise: Using the environment modules to setup data-analysis software

In this exercise we will learn a few useful commands for setting up data-analysis software in the cluster using the environment modules. Environment modules are helpful in organising software and in managing the environment variables required for running the software.

The tasks below use the software R to illustrate the general idea, which is applicable to setting up other data-analysis software installed in the cluster.

Note: DO NOT just copy-n-paste the commands for the hands-on exercises!! Typing (and occasionally making typos) is an essential part of the learning process.

Tasks

1. List the configured software

The following command is used to check which software is currently configured in your shell environment:

$ module list
Currently Loaded Modulefiles:
  1) cluster/1.0    3) matlab/R2018b   5) freesurfer/6.0
  2) project/1.0    4) R/3.5.1         6) fsl/6.0.0

Configured software is listed in terms of the loaded modules.

You probably noticed a message similar to the one above in the terminal after you logged in to the cluster's access node. This message informs you about the pre-loaded environment modules. It implies that your bash shell has been configured with the proper environment variables (e.g. PATH) for running those software versions right away after login.

2. List available software

$ module avail

Environment modules for the software are organised in software names and versions.

3. List available versions of R

$ module avail R

You may replace R with matlab, freesurfer or fsl to see versions of different software.

4. Show the changes in environment variables w.r.t. the setup for R version 3.2.2

$ module show R/3.2.2

5. Check current value of the $R_HOME environment variable

$ echo $R_HOME
/opt/R/3.1.2

As version 3.1.2 is the default R version, the $R_HOME variable points to its installation directory.

6. Setup the environment for R version 3.2.2

Firstly, unload the default R with

$ module unload R

, and load the specific R version with


$ module load R/3.2.2

Following that, check the $R_HOME variable again; it should now point to the directory where version 3.2.2 is installed. You should then be ready to use R version 3.2.2 in the cluster.

$ echo $R_HOME

7. Don't like 3.2.2 and want to switch to 3.3.1... Do you know how to do it?

Exercise: distributed data analysis with MATLAB

In this exercise, you will learn how to submit MATLAB jobs to the cluster using two approaches that are commonly used at DCCN.

The first approach is to use a wrapper script called matlab_sub; the second is to submit batch jobs directly within the graphical interface of MATLAB.

Note: In this exercise, we will use commands in MATLAB and in the Linux shell. When you see commands starting with a prompt $, they are Linux shell commands. If you see >>, it implies a command to be typed in a MATLAB console.

Preparation

Follow the steps below to download the prepared MATLAB scripts.

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_matlab/matlab_exercise.tgz
$ tar xvzf matlab_exercise.tgz
$ ls
matlab_sub qsub_toolbox

Task 1: matlab_sub

When you have a MATLAB script file (i.e. an M-file) which takes no input arguments, you can simply submit a job to run the script using the matlab_sub command.

In this task, you are given an M-file which generates an 8x8 magic matrix, sums the diagonal elements, and finally saves the sum to a file. Follow the steps below for the exercise:

1. Switch to the working directory in which the M-file is provided

$ cd matlab_sub
$ ls
magic_cal.m

2. Read and understand the magic_cal.m script

3. (Optional) Choose a desired MATLAB version, e.g. R2014b

$ module unload matlab
$ module load matlab/R2014b


As long as you are fine with the default version of MATLAB, you can leave this step out. The default version of MATLAB can be checked with:

$ module avail matlab

4. Submit a job to run the script

$ matlab_sub magic_cal.m

You will be asked to provide the walltime and memory requirements of the job.

Tip: You can bypass the interaction of providing memory and walltime requirements by using the --mem and--walltime options of the matlab_sub script.

The example below submits a job requesting resources of 4 GB memory and 1 hour walltime.

$ matlab_sub --walltime 01:00:00 --mem 4gb magic_cal.m

5. Monitor the job until it is finished. You will see the output file magic_cal_output.mat containing the result.

Task 2: qsubcellfun

1. Start a MATLAB interactive session with the command

$ matlab2014a

2. In the MATLAB graphical interface, type the following commands to load the MATLAB functions for submitting jobs to the cluster. Those functions are part of the FieldTrip toolbox.

>> addpath '/home/common/matlab/fieldtrip/qsub'

3. Switch to the working directory in which the prepared MATLAB functions are located. For example,

>> cd qsub_toolbox
>> ls
qsubcellfun_demo.m    qsubfeval_demo.m    qsubget_demo.m    randn_aft_t.m

4. Open the file randn_aft_t.m. This MATLAB function keeps refreshing an n-dimensional array for a given duration. It takes two arguments: n for the array dimension, and t for the duration. You could try to run it interactively using the MATLAB commands below:

>> n_array = {10,10,10,10,10};
>> t_array = {30,30,30,30,30};
>> out = cellfun(@randn_aft_t, n_array, t_array, 'UniformOutput', false);
>> out

out =

  Columns 1 through 4

    [10x10 double]    [10x10 double]    [10x10 double]    [10x10 double]

  Column 5

    [10x10 double]

5. The cellfun function above iterates five times sequentially over the randn_aft_t function. In every iteration, it calls the function with n=10 and t=30. Using the cluster, the iterations can be made in parallel via the qsubcellfun function. For example,

>> out = qsubcellfun(@randn_aft_t, n_array, t_array, 'memreq', 10*10*8, 'timreq', 30, 'stack', 1);

Note: The qsubcellfun will block the MATLAB console until all submitted jobs are finished.

Task 3: qsubfeval

An alternative way of running MATLAB functions in batch is to use the qsubfeval function. In fact, qsubfeval is the underlying function called by qsubcellfun for creating and submitting each individual job.

Follow the steps below to run the same randn_aft_t function using qsubfeval.

1. Start a MATLAB interactive session with the command

$ matlab2014a

2. In MATLAB, load the qsub toolbox from FieldTrip.

>> addpath '/home/common/matlab/fieldtrip/qsub'

3. Switch to the working directory in which the prepared MATLAB functions are located. For example,

>> cd qsub_toolbox
>> ls
jobmon_demo.m    qsubcellfun_demo.m    qsubfeval_demo.m    qsubget_demo.m    randn_aft_t.m

4. Submit batch jobs to run the randn_aft_t function, using qsubfeval.

>> n_array = {2, 4, 6, 8, 10};
>> t_array = {20, 40, 60, 80, 100};
>> jobs = {};
>>
>> for i = 1:5
       req_mem   = n_array{i} * n_array{i} * 8;
       req_etime = t_array{i};
       jobs{i}   = qsubfeval(@randn_aft_t, n_array{i}, t_array{i}, 'memreq', req_mem, 'timreq', req_etime);
   end
>>
>> save 'jobs.mat' jobs

Each call of qsubfeval submits a job to run on a pair of n (array dimension) and t (duration). For this reason, we have to make the iteration ourselves using the for loop. This is different from using qsubcellfun.

Another difference is that the MATLAB prompt is not blocked after job submission. One benefit is that we can continue with other MATLAB commands without having to wait for the jobs to finish. However, we need to save references to the submitted jobs in order to retrieve the results later. In the example above, the references are stored in the array jobs. You may also save the references to a file and leave MATLAB completely.

5. You probably noticed that the job reference returned by qsubfeval is not the Torque job id. The qsublist function is provided to map the job reference to the Torque job id. We can combine this function with a system call to the qstat command to query the job status. For example:

>> load 'jobs.mat'
>>
>> for j = jobs
       jid = qsublist('getpbsid', j);
       cmd = sprintf('qstat %s', jid);
       unix(cmd);
   end

6. When all jobs are finished, one can retrieve the output using qsubget. For example,

>> load 'jobs.mat'
>>
>> out = {};
>>
>> for j = jobs
       out = [out, qsubget(j{:})];
   end
>>
>> out

Note: After the output is loaded into MATLAB with the qsubget function, the output file is removed from the file system. If you need to reuse the output data in the future, better save it to a .mat file before you close MATLAB.

Exercise: Running FreeSurfer jobs on the cluster

In this exercise we will construct a small script to run FreeSurfer's recon-all, and use qsub to submit this script to the cluster for execution.

Preparation

Move into the directory you'd like to work in and download the files prepared for the exercise using this command:

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_freesurfer/FSdata.tgz
$ tar -xvf FSdata.tgz
$ cd FSdata

Task 1: create the script

1. Open a text editor and create the script called runFreesurfer.sh


#!/bin/bash
export SUBJECTS_DIR=$(pwd)
recon-all -subjid FreeSurfer -i MP2RAGE.nii -all

2. Set the script to be executable

3. Load the freesurfer module (using version 5.3 as an example)

$ module unload freesurfer
$ module load freesurfer/5.3

4. Submit the script to the cluster

$ echo "cd $PWD; ./runFreesurfer.sh" | qsub -l walltime=00:10:00,mem=1GB

5. Verify the job is running with qstat. You should see something like:

$ qstat 11173851
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
11173851.dccn-l029        STDIN            dansha                 0 Q long

6. Because we don’t really want to run the analysis but rather test a script, kill the job with qdel. For example:

$ qdel 11173851

Exercise: running python in the cluster

In this exercise, you will learn how to run a Python script in the cluster, using Anaconda and conda environments.

Preparation

Follow the steps below to download the prepared Python scripts.

$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_python/python_exercise.tgz
$ tar xvzf python_exercise.tgz
$ ls
example4d.nii.gz nibabel_example.py

Let's run the Python script; you should expect some errors, as this script requires a Python module called nibabel.

$ python nibabel_example.py
Traceback (most recent call last):
  File "nibabel_example.py", line 3, in <module>
    import nibabel as nib
ImportError: No module named nibabel

Task 1: Conda environment

Load the anaconda module using the command below:


$ module load anaconda2/4.3.0

, and check which python executable is used, e.g.

$ which python
/opt/anaconda2/4.3.0/bin/python

While Anaconda provides a bundle of ready-to-use Python packages for data analysis, the conda environment is useful in two respects:

1. It creates isolation between Python projects, so that requirements and package dependencies in one environment do not spoil other environments.

2. It allows users to install packages without administrative permission.

After the anaconda module is loaded, use the command below to create a conda environment called demo, with the pip, jupyter and numpy packages installed right away.

$ conda create --name demo pip jupyter numpy

At the end of the creation, example commands for activating and deactivating the environment are shown in the terminal. To activate the environment we just created, do:

$ source activate demo

After that you will see changes to the shell prompt; for example, the name demo is shown in the terminal prompt.

Now check which python or pip program you will be using:

$ which python
~/.conda/envs/demo/bin/python

$ which pip
~/.conda/envs/demo/bin/pip

You see that the python and pip programs are now located in your home directory, under the conda environment directory we just created.

The conda-environment settings in the shell are transferred with the job you submit to the cluster. You can check this by starting an interactive job and checking the locations of the python and pip programs. They should still point to your home directory, under the conda environment.

$ qsub -I -l 'walltime=00:20:00,mem=1gb'

$ which python
~/.conda/envs/demo/bin/python

$ which pip
~/.conda/envs/demo/bin/pip

Tip: You may also first submit a job and then enter the conda environment after the job starts. This may be handy when the conda environment is only needed within the scope of the job, or when you want to switch between conda environments for different jobs.

To deactivate the environment, do:


$ source deactivate demo

Tip: To deactivate the conda environment, you may also close the terminal in which the conda environment is loaded.

Task 2: Python packages

Let’s activate the conda environment we just created in Task 1.

$ source activate demo

When you are in a conda environment, you may install your own packages in the environment if the pip package is available there. Use the following command to check whether pip is available in the environment:

$ which pip
~/.conda/envs/demo/bin/pip

The output of the command above should be a path starting with ~/.conda.

Try to install a package called nibabel in your conda environment, using the command below:

$ pip install nibabel

Note: The conda environments are created and installed in your home directory under the path $HOME/.conda/envs. Environments are organised in different subfolders. When you install new packages in an environment, the relevant files are also created in its own subfolder. Be aware of the fact that conda environments do take space from the quota of your home directory.
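To see how much space your conda environments currently occupy, you could, for example, use the standard du command:

$ du -sh ~/.conda/envs/*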

Once the installation is done, let's run the Python script in the downloaded tarball again; it should now work.

$ python nibabel_example.py
(128, 96, 24, 2)

Task 3: Jupyter notebook

Make sure you are in the conda environment we created in task 1; otherwise, run the following command:

$ source activate demo

Jupyter Notebook is a web application for creating and sharing documents containing live (Python) code.

In order to run the live Python code within a conda environment (so that you have access to all Python libraries installed in your conda environment), the package jupyter should also be installed in the conda environment. Use the following method to check it.

$ conda list | grep jupyter
jupyter                   1.0.0                    py27_3
jupyter_client            5.1.0                    py27_0
jupyter_console           5.2.0                    py27_0
jupyter_core              4.3.0                    py27_0

If you don't see jupyter-related packages in your conda environment, run the following command to install them:


$ conda install jupyter

Within the conda environment, simply run the command jupyter-notebook to start the Jupyter notebook.

Try to run the python script nibabel_example.py again in the notebook. It should just work.

Task 4: Spyder

Spyder is an integrated development environment (IDE) for Python projects. It is part of the Anaconda package.

If you just want to use the Spyder IDE without loading specific Python modules from your own conda environment, you can simply run the following command on a cluster access node within a VNC session:

$ spyder

You will encounter a graphical dialog through which you can select Spyder from a specific Anaconda version. The wrapper then submits a job to the cluster to launch the specific Spyder version on a compute node.

If you want to use specific modules installed in a conda environment, you have to install your own Spyder in the same conda environment. Using the demo conda environment as an example, here are the steps to follow:

Make sure you are in the conda environment we created in task 1; otherwise, run the following command:

$ source activate demo

Install the Spyder package, using the conda install command:

Important: DO NOT install spyder with pip install. The spyder installed via pip doesn't take care of library dependencies and is therefore very likely to be broken.

$ conda install spyder

Submit an interactive job with your required resources, e.g.:

$ qsub -I -l walltime=1:00:00,mem=4gb

Under the shell prompt of the interactive job, run the following commands to start Spyder:

$ source activate demo
$ spyder

You can now check within the Spyder IDE whether the nibabel Python module we installed earlier is still available. For instance, open the file nibabel_example.py in Spyder and press the F5 key on the keyboard (or select Run in the menu). This should give the result in the IPython console (at the bottom-right of the Spyder IDE).

Exercise: distributed data analysis with R

In this exercise, you will learn how to submit R jobs in the cluster using Rscript, the scripting front-end of R.

This exercise is divided into two tasks. The first task is to get you familiar with the flow of running R scripts as batch jobs in the HPC cluster. The second is more about bookkeeping the outputs (R data files) produced by R jobs running concurrently in the cluster.


Note: In this exercise, we will use commands in R and in the Linux shell. When you see commands starting with a prompt $, they are Linux shell commands. If you see >, it implies a command to be typed in an R console.

Preparation

Follow the steps below to download the prepared R scripts.

$ mkdir R_exercise
$ cd R_exercise
$ wget https://github.com/Donders-Institute/hpc-wiki-v2/raw/master/docs/cluster_howto/exercise_R/R_exercise.tgz
$ tar xvzf R_exercise.tgz
$ ls
magic_cal_2.R magic_cal_3.R magic_cal.R

Load environment for R version 3.2.2.

$ module unload R
$ module load R/3.2.2
$ which R
/opt/R/3.2.2/bin/R

Task 1: simple job

In this task, we use the script magic_cal.R. This script uses the magic library to generate a magic matrix of a given dimension, and calculates the sum of its diagonal elements. The matrix and the sum are both printed to the standard output.

1. run the script interactively, for a matrix of dimension 5

$ export R_LIBS=/opt/R/packages
$ Rscript magic_cal.R 5
WARNING: ignoring environment value of R_HOME
Loading required package: abind
     [,1] [,2] [,3] [,4] [,5]
[1,]    9    2   25   18   11
[2,]    3   21   19   12   10
[3,]   22   20   13    6    4
[4,]   16   14    7    5   23
[5,]   15    8    1   24   17
[1] 65

2. read and understand the magic_cal.R script

3. submit the script to the cluster as a batch job

$ echo "Rscript $PWD/magic_cal.R 5" | qsub -N "magic_cal" -l walltime=00:10:00,→˓mem=256mb11082769.dccn-l029.dccn.nl

4. wait for the job to finish, and check the output of the job. Do you get the same results as when running interactively?

5. run five batch jobs in parallel to run magic_cal.R with matrices of dimensions 5, 6, 7, 8 and 9.


$ for d in {5..9}; do
      echo "Rscript $PWD/magic_cal.R $d" | qsub -N "magic_cal_$d" -l walltime=00:10:00,mem=256mb
  done

Task 2: job bookkeeping and saving output objects

In the previous task, data objects were just printed to the standard output, and consequently captured as text in the output files of the jobs. Data stored in this way can hardly be reused in subsequent analyses. A better approach is to store the objects in an R data file (i.e. an RData file), using the save function of R.

Given that batch jobs in the cluster are executed at the same time, writing objects from different jobs into the same file is not recommended, as concurrency issues may result in corrupted outputs. A better approach is to write the outputs of each job to a separate file. This implies that running batch jobs in parallel requires an additional bookkeeping strategy for the jobs as well as for the output files they produce.

In this exercise, we are going to use the script magic_cal_2.R, in which functions are provided to

• save objects into a data file, and

• get job/process information that can be used for bookkeeping purposes.

Follow the steps below:

1. run the script interactively

$ Rscript magic_cal_2.R 5
WARNING: ignoring environment value of R_HOME
Loading required package: abind
saving objects magic_matrix,sum_diagonal to magic_cal_2.out.RData ... done

From the terminal output, you see that two objects are saved into an RData file called magic_cal_2.out.RData. Later on, you can load the objects from this file into R or an R script. For example,

> load("magic_cal_2.out.RData")> ls()[1] "magic_matrix" "sum_diagonal"> magic_matrix

[,1] [,2] [,3] [,4] [,5][1,] 9 2 25 18 11[2,] 3 21 19 12 10[3,] 22 20 13 6 4[4,] 16 14 7 5 23[5,] 15 8 1 24 17> q(save="no")

2. read and understand the magic_cal_2.R script, especially the functions at the top of the script.

3. try to run magic_cal_2.R as batch jobs, as we did in the previous task.

Tip: You probably noticed that the functions defined in magic_cal_2.R are so generic that they can be reused in different scripts.

That is right! In fact, we have factored out those functions into /opt/cluster/share/R so that you can easily make use of them in the future.


The script magic_cal_3.R shows you how to load those functions in your R scripts. It also shows how to construct the name of the RData file using the job information.

2.5 The project storage

Research at DCCN is organised in projects. Research data associated with projects is centrally organised on the project storage.

Each project receives a specific directory on the project storage, accessible on the HPC cluster via the path /project/<the_project_id>. For example, the storage for project 3010000.01 is under the path /project/3010000.01.

2.5.1 Managing access permission of project data

Data sharing within the project directory is controlled by a role-based mechanism implemented around the NFSv4 Access Control List (ACL) technology.

User roles

In the project storage, the access permission of a user is governed by the user's role in the project. There are four roles defined for access control. They are listed below:

role         permissions
Viewer       User in this role has read-only permission.
Contributor  User in this role has read and write permission.
Manager      User in this role has read and write permission, and the right to grant/revoke roles of other users.
Traverse     User in this role has permission to "pass through" a directory. This role is only relevant to a directory. It is similar to the x-bit of the Linux filesystem permission. See the usage of the traverse role.

Any user who wants to access data in a project directory must acquire one of the roles in the project. Users in theManager role can grant/revoke user roles.

Tool for viewing access permission

For general end-users, a tool called prj_getacl (as Project Get ACL) is used to show the user roles of a given project. For example, to list the user roles of project 3010000.01, one does

$ prj_getacl 3010000.01
/project/3010000.01/:
    manager: honlee
    contributor: martyc
    viewer: edwger
    traverse: mikveng

One could also apply the prj_getacl program to a path (file or directory) in the project storage. For example,


$ prj_getacl /project/3010000.01/rdm-test
/project/3010000.01/rdm-test/:
    manager: honlee
    contributor: martyc
    viewer: mikveng,edwger

Note:

• The name prj_getacl should be taken as “Project Get ACL”; thus its last character is the lower-case letter L.

• Use the -h option to see additional options supported by prj_getacl.

Tool for managing access permission

For the project manager, the tool prj_setacl (as Project Set ACL) is used for altering user roles of a project. For example, to change the role of user rendbru from Contributor to Viewer on project 3010000.01, one does

$ prj_setacl -u rendbru 3010000.01

Note: The name prj_setacl should be taken as “Project Set ACL”; thus its last character is the lower-case letter L.

Similarly, to set rendbru back to the Contributor role, one runs the following command:

$ prj_setacl -c rendbru 3010000.01

To promote rendbru to the Manager role, one uses the -m option, e.g.

$ prj_setacl -m rendbru 3010000.01

To remove a user from accessing a project, another tool called prj_delacl (as Project Delete ACL) is used. For example, to remove the access rights of rendbru from project 3010000.01, one does

$ prj_delacl rendbru 3010000.01

Note: The name prj_delacl should be taken as “Project Delete ACL”; thus its last character is the lower-case letter L.

Changing access permission for multiple users

When changing or removing roles for multiple users, it is more efficient to combine the changes into one single prj_setacl or prj_delacl command, as this requires only one loop over all existing files in the project directory. The options -m (for manager), -c (for contributor) and -u (for viewer) can be used at the same time in one prj_setacl call. Furthermore, multiple users to be set to (or removed from) the same role can be specified as a comma-separated list with the prj_setacl and prj_delacl tools.

For example, the following single command will set both honlee and rendbru as contributor, and edwger as viewer of project 3010000.01:


$ prj_setacl -c honlee,rendbru -u edwger 3010000.01

The following single command will remove both honlee and edwger from project 3010000.01:

$ prj_delacl honlee,edwger 3010000.01

Controlling access permission on sub-directories

It is possible to set or delete a user role on a sub-directory within a project directory, either by using the -p option or by directly specifying the absolute path of the directory. Both the prj_setacl and prj_delacl programs support this.

When doing so, the user is automatically granted (or revoked) the traverse role on the parent directories if the user does not already have a role on them.

For example, granting user edwger the contributor role on the sub-directory subject_001 in project 3010000.01 can be done as below:

$ prj_setacl -p subject_001 -c edwger 3010000.01

Alternatively, one could also do:

$ prj_setacl -c edwger /project/3010000.01/subject_001

If the user edwger does not yet have any role in the directory /project/3010000.01, edwger is also automatically granted the traverse role on /project/3010000.01. This is necessary for edwger to “traverse through” it when accessing the subject_001 sub-directory.

Note: In this situation, user edwger has to specify the directory /project/3010000.01/subject_001 or P:\3010000.01\subject_001 manually in the file explorer to access the sub-directory. This is due to the fact that a user with only the traverse role cannot see any content (files or directories, including those the user has access permission to) of the directory.

The Traverse role

When granting a user a role in a sub-directory, a minimal permission on the upper-level directories must also be given to the user to “pass through” the directory tree. This minimal permission is referred to as the Traverse role.

The traverse role is automatically managed by the prj_setacl and prj_delacl programs when managing access to a sub-directory or a file within a project directory. See Controlling access permission on sub-directories.

2.6 Linux & HPC tutorials

A regular tutorial is held by the TG. Below are the agendas and presentations of the past tutorials.

Note: Tutorial agenda and slides are only accessible within the DCCN network.
