Using the IBM Opteron 1350 at OSC—Batch Processing

October 19-20, 2010

Online Information

• Technical information:

– http://www.osc.edu/supercomputing/

• Hardware

• Software

• Environment

• Training

• Notices

– http://www.osc.edu/supercomputing/computing/#batch

• Contact information

– [email protected]

– 1-800-686-6472, 1-614-292-1800

Table of Contents

• Batch Processing: Step by Step

• Minimum Batch Files for Glenn Cluster

• Advantages of Batch Processing

• Useful Batch File Header Lines

• Useful Batch Environment Variables

• More PBS commands

All files used in the examples can be retrieved from the Batch subdirectory of the svn repository:

svn checkout http://svn.osc.edu/repos/softdevtools/trunk/Batch Batch

Interactive Processing

• The way you are used to working on a workstation or laptop!

• Enter a command, output returned to monitor. Based on output, enter a command, output returned to monitor, repeat. User is interacting in real-time with the computer.

• Interactive use is easiest (and almost required) for tasks that involve user’s analysis of previous command’s output to determine the next command.

• Common interactive tasks: file/directory searching, directory management (clean-up, reorganization, etc.), file editing, code debugging (in any manner), use of window-based software, use of performance tools (although most create a raw data file), and the list goes on …

Example Program

Throughout this workshop the same tasks will be carried out in different ways. The tasks are:

1) Compiling a bug-free source code file

2) Running the executable produced

3) Examining the output

4) Removing unnecessary files

5) Changing from home directory to “work” directory

Example Serial Program

This program (nest.c) simply adds together numbers whose values are created in a nested loop

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    float x, rock = 0;
    int i, j, N, M;
    M = 8; N = 4;
    if (argc == 3) {
        N = atoi(argv[1]);  // command arguments
        M = atoi(argv[2]);
    }
    else {  // Exit if incorrect number of arguments
        printf("Usage : ./a.out int int\n");
        exit(1);
    }
    // Set Loop Counts
    for (i=1; i<=N; i=i+1) {
        for (j=1; j<=M; j=j+1) {
            x = log((float) ((10*i)+j));  // natural log
            rock = rock + x;              // accumulate sum
        }
    }
    /* Output */
    printf("For loop counts N=%d M=%d\n", N, M);
    printf("Sum=%f\n", rock);
    return 0;
}

svn checkout http://svn.osc.edu/repos/softdevtools/trunk/Batch Batch

Example Parallel Program

This MPI program (search.c) divides an array into four parts, searches for a value, and reports when that value is found

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define N 16000

int main(int argc, char *argv[]) {
    int rank,size;
    int i,j;
    int eleven[N],sub[N/4];
    MPI_Status info;
    MPI_Request ask,found;
    int done=0;
    int index;
    const char *tmpdir;

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);

    /* getenv returns the value of TMPDIR, or NULL if it is not set */
    tmpdir = getenv("TMPDIR");
    printf("In node[%d], tmpdir is %s\n", rank, tmpdir ? tmpdir : "(unset)");

    if (rank==0) {
        /* Rank 0 reads the whole array from standard input */
        for (i=0;i<N;++i) {
            scanf("%d",&eleven[i]);
        }
        /* Send one quarter of the array to each of ranks 1-3 */
        MPI_Send(&eleven[1*(N/4)],N/4,MPI_INT,1,19,MPI_COMM_WORLD);
        MPI_Send(&eleven[2*(N/4)],N/4,MPI_INT,2,29,MPI_COMM_WORLD);
        MPI_Send(&eleven[3*(N/4)],N/4,MPI_INT,3,39,MPI_COMM_WORLD);
        /* Rank 0 keeps the first quarter for itself */
        for (i=0;i<N/4;++i) {
            sub[i]=eleven[i];
        }
    }

    if (rank!=0) {
        MPI_Recv(sub,N/4,MPI_INT,0,MPI_ANY_TAG,MPI_COMM_WORLD,&info);
    }

    MPI_Barrier(MPI_COMM_WORLD);

    /* Post a non-blocking receive for the "found" signal from any rank */
    MPI_Irecv(&index,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&ask);

[cont'd on next slide]

Example Parallel Program [cont'd from previous slide]

    i=0;
    MPI_Test(&ask,&done,&info);

    while (i<N/4 && !done) {
        if (sub[i]==11) {
            printf("P:%d 11 found at index=%d\n",rank,i);
            for (j=0; j<size; ++j) {
                MPI_Isend(&i,1,MPI_INT,j,55,MPI_COMM_WORLD,&found);
            }
        }
        ++i;
        MPI_Test(&ask,&done,&info);
    }

    MPI_Wait(&ask,&info);
    printf("P:%d I searched up to index %d\n",rank,(i-1));

    MPI_Finalize();
}

Interactive Session

opt-login1:$ cd Batch
opt-login1:$ gcc nest.c -lm
opt-login1:$ ./a.out 459 121
For loop counts N=459 M=121
Sum=306505.406250
opt-login1:$ rm a.out

This set of commands assumes you have downloaded nest.c and nest.pbs from the svn repository:

svn checkout http://svn.osc.edu/repos/softdevtools/trunk/Batch Batch

Batch Processing of SAME Tasks

• Key fact: In the previous interactive session the user already knew before logging in what commands they were going to type. The code had been debugged, and everything was ready for a “production” run.

• NO REAL-TIME INTERACTION IS REQUIRED → Therefore, you can put the commands in a file (“batch file”)

• A batch file is just a script (a sequence of UNIX commands put into a file) that contains:

1) The EXACT SAME commands you typed on the keyboard during the interactive session.

2) Some lines at the beginning of the file that give the batch system software some parameters it needs to know. These opening lines are called the header of the batch file.

Structure of a Minimum Batch File

Header:

#PBS -N nest
#PBS -l walltime=00:05:00
#PBS -j oe
#PBS -S /bin/ksh

Same Unix Commands:

set -x
cd $PBS_O_WORKDIR
gcc nest.c -lm
./a.out 459 121
/bin/rm a.out

Notes on the commands:

• set -x
– in /bin/ksh, this prints (echoes) the executed commands in your output file
– in /bin/csh, the equivalent command is set echo

• $PBS_O_WORKDIR
– environment variable which is automatically set when submitting a batch request
– it is set to the directory path from which the batch request was submitted
– more on other environment variables later

• /bin/rm a.out
– this will remove the named file(s) immediately

Batch Processing Session

-bash-3.2$ qsub nest.pbs
2289978.opt-batch.osc.edu

-bash-3.2$ qstat -u yzhang

opt-batch.osc.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
2289978.opt-bat yzhang   serial   nest           --   1  --     -- 00:05 Q    --

-bash-3.2$ qstat -u yzhang
2289978.opt-bat yzhang   serial   nest           --   1  --     -- 00:05 Q    --

-bash-3.2$ qstat -a | grep nest
2289978.opt-bat yzhang   serial   nest          402   1  --     -- 00:05 R    --

-bash-3.2$ qstat -u yzhang
2289978.opt-bat yzhang   serial   nest          402   1  --     -- 00:05 R 00:01

Batch Processing Session Results:

-bash-3.2$ ls -ltr
total 108
-rw-r--r-- 1 yzhang G-3040 276 Oct 28 14:30 run-nest.job
-rw-r--r-- 1 yzhang G-3040 295 Oct 28 14:30 run-nest3.job
-rw-r--r-- 1 yzhang G-3040 140 Oct 28 14:30 nest.pbs
-rw-r--r-- 1 yzhang G-3040 725 Oct 28 14:30 nest.c
-rw------- 1 yzhang G-3040 139 Nov  1 22:17 nest.o2289978

-bash-3.2$ cat nest.o2289978
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ gcc nest.c -lm
+ ./a.out 459 121
For loop counts N=459 M=121
Sum=306505.406250
+ rm -f a.out

Description of Batch File Header Lines

• Batch processing on OSC systems is implemented by the Portable Batch System.

• Header lines begin with #PBS. The # symbol indicates that this line is just a comment from the shell's point of view.

• #PBS -l walltime=00:01:00

– The character before the word walltime is a lower-case “el” not the numeric character “1”. The “el” stands for limit.

– This header line tells PBS how much real time (the time on the clock on the wall) you expect the execution of the batch file commands to take.

– Since the time format is hh:mm:ss, the limit on the wall clock time is 1 minute in this example

– You can estimate wall clock time by running your program once and using external or internal timing commands (see later)
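One way to turn a measured run time into a walltime request is sketched below. The helper names (to_seconds, to_walltime, pad_walltime) and the 50% safety margin are illustrative assumptions, not OSC recommendations.

```shell
#!/bin/sh
# Hypothetical helpers: convert between hh:mm:ss and seconds, and pad a
# measured run time by 50% to get a walltime value to request.

# hh:mm:ss -> seconds
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# seconds -> hh:mm:ss
to_walltime() {
    awk -v s="$1" 'BEGIN { printf "%02d:%02d:%02d\n", s / 3600, (s % 3600) / 60, s % 60 }'
}

# measured hh:mm:ss -> padded hh:mm:ss (margin of 50% is an assumption)
pad_walltime() {
    secs=$(to_seconds "$1")
    to_walltime $(( secs * 3 / 2 ))
}
```

For example, pad_walltime 00:02:20 yields a value that could be placed after walltime= in the header line.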

PBS at OSC

• PBS is installed on Glenn cluster and Bale cluster

• Glenn cluster has four partitions

• Options may vary across systems/partitions, e.g., to take advantage of system-specific hardware

• Review OSC web pages on the different computing systems/partitions.

http://www.osc.edu/supercomputing/hardware/

Description of Batch File Header Lines

• #PBS -N nest
– provides a name to your batch job (important!); here the name is nest
– the name is used by PBS in several ways:
  • appears in status (qstat command) output
  • prefix for the output file name returned by PBS, e.g. nest.o2289978
– A PBS output log file can be thought of as a screen dump. The log file contains everything that would have appeared on your monitor if the commands had been run interactively.

• #PBS -j oe
– By default, PBS returns two log files: one for the standard output data stream, the other for the standard error data stream
– This option joins these two into a single log file

Description of Batch File Header Lines

• #PBS -S /bin/ksh
– Sets the Linux shell which will interpret the shell commands in your batch script
– In this example, the Korn shell is used

• Can use any available shell for your command interpreter

• The command

cat /etc/shells

shows the shells available on a system

Procedure to run a Batch job

1. Create a batch job-file (standard text file)

2. Use the qsub command to submit the job-file to PBS
• When PBS begins executing batch file commands, the shell is invoked in your login directory (your home directory, $HOME).
• Retain the job number identified in the return line!

3. Use the qstat command to check on your job status (tip: use qstat -u my_user_id to see the status of your jobs only)

4. Your batch session is complete when the job no longer appears in the qstat output table
• Your job is started when requested resources become available
• Your job ends when the batch-file commands finish or the time limit is reached

5. Examine the file returned by PBS to see your job output
• File appears in your home directory, or in the directory from which you submitted the job if using $PBS_O_WORKDIR
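Steps 2 through 4 can be sketched as a small shell function. This is only an illustration: submit_and_wait and POLL_SECONDS are hypothetical names, and the function assumes that qsub prints the job ID and that qstat -u lists the job with its numeric ID at the start of a line, as in the sessions shown in this workshop.

```shell
#!/bin/sh
# Sketch: submit a batch file, then poll qstat until the job is gone.
# POLL_SECONDS is an assumed tuning knob (default 30 seconds).
submit_and_wait() {
    batch_file=$1

    # Step 2: submit and retain the job ID returned by qsub
    job_id=$(qsub "$batch_file") || return 1
    job_num=${job_id%%.*}          # e.g. 2289978 from 2289978.opt-batch.osc.edu
    echo "Submitted $job_id"

    # Steps 3-4: the job is complete once qstat no longer lists it
    while qstat -u "$USER" | grep -q "^${job_num}\."; do
        sleep "${POLL_SECONDS:-30}"
    done
    echo "Job $job_num is no longer in the queue"
}
```

A typical call would be submit_and_wait nest.pbs, after which the log file can be examined as in step 5.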

qsub and qstat

• qsub batch_file_name
– Submits your batch job to PBS
– Based on the resources requested in the job header, PBS places your job in a queue into which your job “fits”
– Returns your Job Identification Number (“Job ID” in qstat output, 2289978.opt-batch.osc.edu in our example)

• qstat -a
– Status information on all batch jobs in PBS at that time
– Output is a multi-columned, lengthy table containing many pieces of information
– The first four columns identify the job:
  • JID that matches the value returned from qsub, e.g., 2289978.opt-batch.osc.edu
  • Your OSC login ID, e.g., ucn1234

qstat -a (continued)

  • The name of the queue your job was placed in
  • The “internal” name for your job as set in the -N header line of your batch file

– The Req'd Time column shows the time limit you set in the batch file header (HH:MM:SS)

– THE MOST IMPORTANT COLUMN is the S(tatus) column, which shows a letter code for what your job is doing now

Code  Meaning
R     Running
Q     Queued
H     Held
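The status column can also be read by a script. The sketch below assumes the eleven-column layout of the qstat -a and qstat -u job lines shown earlier, where S is the tenth field; job_state and describe_state are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: pull the S(tatus) letter out of one qstat job line and
# translate it using the code table above.

# Print the tenth whitespace-separated field (the S column)
job_state() {
    echo "$1" | awk '{ print $10 }'
}

# Map a status letter to its meaning; anything else is passed through
describe_state() {
    case $1 in
        R) echo "Running" ;;
        Q) echo "Queued" ;;
        H) echo "Held" ;;
        *) echo "Other ($1)" ;;
    esac
}
```

For the line "2289978.opt-bat yzhang serial nest -- 1 -- -- 00:05 Q --", job_state gives Q and describe_state Q gives Queued.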

The PBS log file

• The log file name may seem cryptic but there is a pattern to it:
– internal_name.oJID
– internal_name is the name given in the “#PBS -N” header line
– The o after the period stands for “output” (e stands for “error”)

• Log file contains the UNIX command(s) run and their output (if any)

• In our sample log file, the UNIX commands are preceded by a “+”. This is due to the set -x command (ksh/bash) executed first in the batch file. This command causes each batch file command to be echoed with a + in front. Without the set -x command (or its equivalent) only the actual screen output would appear in the log file.

• For those who use tcsh or csh, echoing of shell commands is achieved by using the command set echo.
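The naming pattern can be written as a one-line helper. This is a sketch: log_file_name is a hypothetical function, and it assumes the JID in the file name is the numeric part of the job ID before the first period, as in nest.o2289978.

```shell
#!/bin/sh
# Sketch: build the expected log file name from the job name (-N value)
# and the job ID returned by qsub.
log_file_name() {
    job_name=$1
    job_id=$2
    # ${job_id%%.*} keeps only the text before the first period
    echo "${job_name}.o${job_id%%.*}"
}
```

For example, log_file_name nest 2289978.opt-batch.osc.edu prints the name of the log file used in the earlier session.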

Deleting a Batch Job

• Situations may arise in which you may want to delete one of your jobs from the PBS queue:

– resource limits set incorrectly
– missing input file(s)
– incorrect or missing commands in the batch file
– program is taking too long to run (“infinite-loop”)

• The PBS command to delete a batch job is qdel:

$ qdel Job_ID

Advantages of Batch Processing

• Interactive resource limits too small
– The limit UNIX command will show interactive limits on CPU time (session & process), memory size, disk size, etc.
– Current limits:
  – 2 hours of CPU time
  – 1 GB of memory

• Improves overall system efficiency by weighing user requirements against system load

• Makes sure all users can get equal access to resources by enforcing a scheduling policy

• Automatically keeps a log of your Unix commands and their output
• Only way to access more than one node for parallel processing
• Batch processing concepts are the same for all batch software
• Learn PBS on one OSC machine, know it for all

Useful Batch File Header Lines

• Header Line = qsub option

• Optional resource request

• Mailing options

• Rename the Log File

• Use a Different Shell

• Parallel Processing

• Starting Date & Time

• Special Queues

qsub Options

• The header lines of a PBS batch file are actually options to the qsub command. For example, in our batch files we have put the header line

#PBS -j oe

We could have left out that header line and used that option when the batch file is submitted:

qsub -j oe batch_file

• It is recommended to put the options in the header section of the batch file so that the user has a record of values used.

• In this chapter, other options (besides the minimum suggested) will be discussed with emphasis on the most useful.

• All qsub options can be found with

$ man qsub
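Because header lines are qsub options, a batch file's header can be turned into the equivalent command line. The sketch below uses two hypothetical helpers, pbs_options and qsub_command, and assumes each option sits on its own line beginning with #PBS.

```shell
#!/bin/sh
# Sketch: list the qsub options recorded in a batch file header and
# show the command line that would pass the same options explicitly.

# Print each header line with the leading "#PBS" removed
pbs_options() {
    sed -n 's/^#PBS[[:space:]]*//p' "$1"
}

# Print the equivalent qsub command (it is only printed, not run)
qsub_command() {
    echo "qsub $(pbs_options "$1" | tr '\n' ' ')$1"
}
```

This also illustrates why keeping options in the header is convenient: the file itself is the record of the values used.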

Optional Resource Request

-l mem=amount
(OPTIONAL) Request use of amount of memory per node. Default units are bytes; can also be expressed in megabytes (e.g., mem=1000MB) or gigabytes (e.g., mem=2GB)

-l file=amount
(OPTIONAL) Request use of amount of local scratch disk space per node. Default units are bytes; can also be expressed in megabytes (e.g., file=10000MB) or gigabytes (e.g., file=10GB). Only required for jobs using > 10GB of local scratch space per node

-l software=package[+N]
(OPTIONAL) Request use of N licenses for package. If omitted, N=1. Only required for jobs using specific software packages with limited numbers of licenses; see software documentation for details.

Email from PBS

• Do I really have to keep checking on the status of my job with qstat -a to find out its progress?
– Answer: no. By using the following batch file header lines, PBS will email you a message that your job has begun and that your job has ended, respectively

#PBS -m b
#PBS -m e

• In the “job ending” email message the return status of the job is reported. A return status of 0 indicates success. Total time and memory consumed are also reported.

• There is also a -m a option that sends email if your job aborts

• Edit the contents of your ${HOME}/.forward file to specify the destination of the email, e.g.,

kilroywashere@your_local_supernet.net

Sample End Email

Date: Sunday - November 1, 2009 9:33 PM

From: root <[email protected]>

To: <[email protected]>

Subject: PBS JOB 2289554.opt-batch.osc.edu

PBS Job Id: 2289554.opt-batch.osc.edu
Job Name: nest
Execution terminated
Exit_status=0
resources_used.cput=00:02:18
resources_used.mem=676kb
resources_used.vmem=9824kb
resources_used.walltime=00:02:20

Log File Name

• Do I have to live with that awkward name for the log file returned by PBS?

• Answer: no. Add the following header line and choose the name you want:

#PBS -o file_name

• This option stands for (o)utput. When the batch job is finished everything that would have been displayed on your monitor is contained in file_name.

Changing Shells

• If not otherwise specified, the UNIX shell used to execute your batch file commands is your login shell

• The user can choose to run a batch job in a different shell if they desire. The header line is:

#PBS -S /bin/[csh|ksh|bash]

• Notice the full path name of the shell command must be used

• NOTE: Echoing of commands in the C shell is enabled by the command

set echo

The “echoed” commands are not preceded by a ‘+’

Parallel Processing for Clusters

• One batch file header line performs the most critical step needed in parallel processing: setting the number of processors your code will run on.

• The syntax for this important header line is:

#PBS -l nodes=N:ppn=1 (1<=ppn<=8)

• The first part of this option specifies the number (N) of nodes you need. The maximum value for N depends on the nodes available on a given machine.

• Glenn cluster has 4/8/16 processors per node. The second part of the option indicates how many processors per node are used. If the ppn section is omitted, PBS will default to 1 processor per node.

General form: #PBS -l nodes=N:ppn=C:type

Dual Socket, Dual Core (Type=olddual)
– Number of Machines: 877
– Number of Cores: 4
– Memory: 8 GB
– To request, specify: 1 ≤ N ≤ 512, 1 ≤ C ≤ 4
– Example: #PBS -l nodes=10:ppn=4:olddual

Quad Socket, Dual Core (Type=oldquad)
– Number of Machines: 88
– Number of Cores: 8
– Memory: (70) 16 GB, (16) 32 GB, (2) 64 GB
– To request, specify: N = 1, 1 ≤ C ≤ 8
– To request memory: #PBS -l mem=16GB
– Example: #PBS -l nodes=1:ppn=8:oldquad

Dual Socket, Quad Core (Type=newdual)
– Number of Machines: 650
– Number of Cores: 8
– Memory: 24 GB
– To request, specify: 1 ≤ N ≤ 256, 5 ≤ C ≤ 8
– Example: #PBS -l nodes=5:ppn=8:newdual

Quad Socket, Quad Core (Type=newquad)
– Number of Machines: 8
– Number of Cores: 16
– Memory: 64 GB
– To request, specify: N = 1, 9 ≤ C ≤ 16
– To request memory: #PBS -l mem=32GB
– Example: #PBS -l nodes=1:ppn=16:newquad
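The per-type ppn ranges above can be checked before a job is submitted. The following is a sketch only: ppn_range and nodes_request are hypothetical helpers, the ranges are copied from the node-type listing on this slide, and the limits on N are not checked.

```shell
#!/bin/sh
# Sketch: validate ppn against the node type and print the header line.

# Print "min max" for the allowed ppn (C) of a node type
ppn_range() {
    case $1 in
        olddual) echo "1 4" ;;
        oldquad) echo "1 8" ;;
        newdual) echo "5 8" ;;
        newquad) echo "9 16" ;;
        *) return 1 ;;
    esac
}

# usage: nodes_request N C type
nodes_request() {
    n=$1; c=$2; type=$3
    range=$(ppn_range "$type") || { echo "unknown node type: $type" >&2; return 1; }
    min=${range% *}     # text before the space
    max=${range#* }     # text after the space
    if [ "$c" -lt "$min" ] || [ "$c" -gt "$max" ]; then
        echo "ppn=$c is outside $min..$max for $type" >&2
        return 1
    fi
    echo "#PBS -l nodes=$n:ppn=$c:$type"
}
```

For example, nodes_request 10 4 olddual prints the olddual example line, while nodes_request 1 8 olddual is rejected because olddual nodes have only 4 cores.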

Parallel Processing Batch File

• The search program uses 4 processes, each of which searches a quarter of an integer array for the value 11. When one processor has found it, that processor signals the others to stop searching.

• The -n option for qstat has been used to show what physical nodes the code is actually being run on.

#PBS -l walltime=00:04:00
#PBS -N search
#PBS -j oe
#PBS -S /bin/ksh
#PBS -l nodes=4:ppn=1:olddual
set -x
cd $PBS_O_WORKDIR
mpicc search.c
qstat -u $USER -rn
mpiexec a.out < data3.txt
rm a.out
rm search.o

Log File (nodes=4:ppn=1)

-bash-3.2$ less search.o2289997
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ mpicc search.c
+ qstat -u yzhang -rn

opt-batch.osc.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2289997.opt-batch.os yzhang   parallel search         --     4  --     -- 00:04 R    --
   opt0345/0+opt0344/0+opt0343/0+opt0342/0

+ mpiexec a.out
+ 0< data3.txt

P:0 11 found at index=1499

P:1 I searched up to index 1789




+ rm -f a.out search.o

Log File (nodes=2:ppn=2)

-bash-3.2$ less search.o2289998
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ mpicc search.c
+ qstat -u yzhang -rn

opt-batch.osc.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2289998.opt-batch.os yzhang   parallel search         --     2  --     -- 00:04 R    --
   opt0341/1+opt0341/0+opt0340/1+opt0340/0

+ mpiexec a.out
+ 0< data3.txt

P:0 11 found at index=1499





+ rm -f a.out search.o

Parallel Processing for SMPs

• Specifying a processor count for an SMP parallel job:

#PBS -l nodes=1:ppn=8:oldquad
#PBS -l walltime=10:00:00
#PBS -j oe
#PBS -N openmp
#PBS -S /bin/ksh
cd $TMPDIR
cp $HOME/openmp/a.out .
export OMP_NUM_THREADS=8
./a.out

Setting Execution Day & Time

• Instruct PBS to not begin executing my job until a certain date and time
• Use the header line:

#PBS -a [YYYY][MM][DD]hhmm

• This is the standard “(a)t” option. Only the hours and minutes must be set. If the time set has already passed, PBS will assume the date is tomorrow

• Let's say I wanted to pick a time in which a machine was not very busy. Say, this Saturday at 5 am. The header line would look like this:

#PBS -a 200911070500

• You can submit the job today, and it will be put in the (W)ait state until the date & time indicated. The output of qstat -a for a timed job looks as follows:

17450.nfs4.osc. osu2917 serial hpmulti_de 16440 1 -- -- 240:0 R 00:13

17453.nfs4.osc. yzhang batch dt -- ---- -- 00:04 W --

17454.nfs4.osc. utl0192 parallel fractaltda 31330 8 -- -- 01:05 R --
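The timestamp for the -a option can be assembled with printf. This is a minimal sketch: pbs_start_time is a hypothetical helper, and it assumes the arguments are plain decimal integers without leading zeros (printf would read 08 as an invalid octal number).

```shell
#!/bin/sh
# Sketch: build a "#PBS -a YYYYMMDDhhmm" header line.
# usage: pbs_start_time year month day hour minute
pbs_start_time() {
    # %02d pads each field to two digits, e.g. 7 -> 07
    printf '#PBS -a %04d%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4" "$5"
}
```

For the Saturday 5 am example above, the call would be pbs_start_time 2009 11 7 5 0.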

Queue Specification

• On many HPC systems special queues are set up and can be used only by permitted users. These queues might allow a huge amount of memory or time, a large number of processors, or access to a third-party software package.

• For example, on the Glenn cluster there is a special queue called “longserial”, dedicated to serial jobs that need to run more than 168 hours and less than 336 hours.

• If you have permission, you can specify the queue for your job with the header line

#PBS -q queue_name

Otherwise, just let PBS put your job in the appropriate queue.

• For most applications, users will be put into either the serial or parallel queues.

Useful Batch Environmental Variables

• Using the /tmp directory

• Changing to your work directory

• Informative environment variables

TMPDIR

• For many users the sizes of data files or executable files are so large they cannot be placed in their home directories.

• The /tmp directory offers a huge amount of temporary disk space (315 TB in total) to all users of an OSC system. In addition, it is much faster to access than $HOME disk since it is on local disk (not NFS-mounted).

• For each batch job, there is a subdirectory of /tmp uniquely associated with that job. It comes into existence when the job begins and is deleted when the job is finished. The name of the /tmp subdirectory is stored in the environment variable TMPDIR

• In the batch file the user should copy all files needed to $TMPDIR, cd to $TMPDIR, run the code, and finally bring needed output files back to the $HOME area.

• Note that “clean-up” at the end of the batch file is not needed since the $TMPDIR directory and all its files are deleted when the job ends.
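The copy-in, run, copy-out pattern can be sketched as a reusable function. run_in_scratch is a hypothetical helper, not an OSC tool; it assumes a single input file and writes the command's standard output to a file named output, as in the batch file example in this section.

```shell
#!/bin/sh
# Sketch: stage one input file into $TMPDIR, run a command there,
# and copy the captured output back to the source directory.
# usage: run_in_scratch source_dir input_file command [args...]
run_in_scratch() {
    src_dir=$1
    input=$2
    shift 2
    scratch=${TMPDIR:-/tmp}        # fall back to /tmp outside a batch job

    cp "$src_dir/$input" "$scratch"/ || return 1
    # Run the command inside the scratch directory, capturing stdout
    ( cd "$scratch" && "$@" > output ) || return 1
    cp "$scratch/output" "$src_dir"/
}
```

Inside a batch job, $TMPDIR is set by PBS, so no explicit clean-up step is included.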

Batch File using TMPDIR

#PBS -l walltime=00:04:00
#PBS -N nest
#PBS -j oe
set -x
cd $HOME/workshops/softw09/Batch
cp nest.c $TMPDIR
cd $TMPDIR
gcc nest.c -lm
./a.out 3 4 > output
cp output $HOME/workshops/softw09/Batch
cd $HOME/workshops/softw09/Batch
cat output

Returned Log File

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ cp nest.c /tmp/pbstmp.2290002
+ cd /tmp/pbstmp.2290002
+ gcc nest.c -lm
+ ./a.out 3 4
+ cp output /nfs/06/yzhang/workshops/softw09/Batch
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ cat output
For loop counts N=3 M=4
Sum=17.406998

pbsdcp: Distributed Copy for Parallel Jobs

• $TMPDIR directory is not shared across nodes!

• When a parallel job starts running on multiple nodes, each node has its own $TMPDIR.

• Use pbsdcp when copying files to directories not shared between nodes (e.g. /tmp or $TMPDIR)
– Distributed copy command
– Two modes:
  • -s scatter mode (default)
  • -g gather mode


Batch File Using pbsdcp

#PBS -N search
#PBS -l walltime=00:04:00
#PBS -j oe
#PBS -l nodes=2:ppn=2:olddual
#PBS -S /bin/ksh

set -x
qstat $PBS_JOBID -n
cd $HOME/workshops/opteron09/batch
mpicc -O3 search.c -o search

pbsdcp search data3.txt $TMPDIR
cd $TMPDIR

/usr/bin/time mpiexec search < data3.txt


Log File Returned

+ cd /nfs/06/yzhang/workshops/opteron09/batch
+ qstat 2247114.opt-batch.osc.edu -rn

opt-batch.osc.edu:
                                                             Req'd  Req'd   Elap
Job ID               Username Queue    Jobname SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ------- ------ --- --- ------ ----- - -----
2247114.opt-batch.os yzhang   parallel search      --   2  --     -- 00:04 R    --
   opt0871/1+opt0871/0+opt0866/1+opt0866/0

+ mpicc -O3 search.c -o search
+ pbsdcp search data3.txt /tmp/pbstmp.2247114
+ cd /tmp/pbstmp.2247114
+ /usr/bin/time mpiexec search
+ 0< data.txt
P:0 11 found at index=1499
P:0 I searched up to index 1499
P:1 I searched up to index 2236
P:2 I searched up to index 564
P:3 I searched up to index 439
0.00user 0.00system 0:01.06elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1855minor)pagefaults 0swaps


PBS_O_WORKDIR

• Is there a way that PBS can automatically cd to my working directory since I always start out in my home directory?

• Answer: mostly. Once a user has used qsub to submit a batch job, the environment variable PBS_O_WORKDIR is filled with the absolute path of the directory from which qsub was executed.

• Usually, the directory that holds the files the batch job needs is also the directory the user submits from. Thus, the first line of their batch file can be made general purpose:

cd $PBS_O_WORKDIR

• The user doesn’t even have to remember the path to the directory they are working in.
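A small variation lets the same script be tried interactively, where PBS_O_WORKDIR is not set. This is a sketch under that assumption; work_dir is a hypothetical helper that falls back to the current directory.

```shell
#!/bin/sh
# Sketch: use the submission directory inside a batch job, and the
# current directory when the script is run by hand.
work_dir() {
    # ${VAR:-default} expands to default when VAR is unset or empty
    echo "${PBS_O_WORKDIR:-$PWD}"
}
```

In a batch file this would be used as cd "$(work_dir)" in place of cd $PBS_O_WORKDIR.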

Batch File Using $PBS_O_WORKDIR

#PBS -l walltime=00:04:00
#PBS -N nest
#PBS -j oe
set -x
cd $PBS_O_WORKDIR
gcc nest.c -lm
./a.out 3 4
rm a.out

Log File Returned

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ gcc nest.c -lm
+ ./a.out 3 4
Enter Loop Counts
For loop counts N=3 M=4
Sum=17.406998
+ rm a.out

Information Variables

• PBS has a number of built-in environment variables that preserve job information:
– PBS_O_HOST = hostname of the machine from which the job was submitted
– PBS_O_QUEUE = starting queue your job was put in
– PBS_QUEUE = queue your job was executed in
– PBS_JOBID = JID of your job
– PBS_JOBNAME = “internal” name you gave the job
– PBS_NODEFILE = name of the file containing the list of nodes your job used

• On the next two slides are a batch file and its returned log file, which show that these variables are filled with the correct values

Batch File Reporting Environment Information

#PBS -l walltime=5:00
#PBS -N print-env-var
#PBS -j oe

set -x
cd $PBS_O_WORKDIR
qstat -u $USER -rn
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_QUEUE
echo $PBS_JOBID
echo $PBS_JOBNAME
cat $PBS_NODEFILE

Returned Log File

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ qstat -u yzhang -rn

opt-batch.osc.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
2290013.opt-batch.os yzhang   serial   print-env-  24320     1  --     -- 00:05 R    --
   opt0372/2

+ echo opt-login02.osc.edu          (PBS_O_HOST)
opt-login02.osc.edu
+ echo batch                        (PBS_O_QUEUE)
batch
+ echo serial                       (PBS_QUEUE)
serial
+ echo 2290013.opt-batch.osc.edu    (PBS_JOBID)
2290013.opt-batch.osc.edu
+ echo print-env-var                (PBS_JOBNAME)
print-env-var
+ cat /var/spool/batch/torque/aux//2290013.opt-batch.osc.edu    (PBS_NODEFILE)
opt0372
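The node file can be summarized inside a batch job. The sketch below assumes the file named by PBS_NODEFILE holds one hostname per line, with a hostname repeated once for each processor assigned on that node; node_summary and node_count are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: summarize a PBS node file (one hostname per allocated processor).

# Print "hostname: K processor(s)" for each distinct node
node_summary() {
    sort "$1" | uniq -c | awk '{ printf "%s: %d processor(s)\n", $2, $1 }'
}

# Print the number of distinct nodes
node_count() {
    sort -u "$1" | wc -l | tr -d ' '
}
```

In a batch file these would be called as node_summary "$PBS_NODEFILE" and node_count "$PBS_NODEFILE".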

More PBS Commands

•qpeek

•qstat (more options)

• Moab Scheduler Commands

qpeek

• qpeek is a well-named command. It allows the user to “peek” into the partially-completed log file of a running job. Thus, the user can see the progress of the job.

• On the next slide, the qpeek command is used to see which batch file command the following job is currently executing:

#PBS -l walltime=00:20:00

#PBS -N liver

#PBS -j oe

set -x

cd liver_ia64

./liver

rm liver

qpeek Demonstration

ipf-login1:$ qstat -a

nfs1.osc.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
17547.nfs1.osc. abe      serial   liver       18385   1  --     -- 00:20 Q    --

ipf-login1:$ qpeek 17547
Job 17551 is not running!
ipf-login1:$ qpeek 17547
+ cd liver_ia64
ipf-login1:$ qpeek 17547
+ cd liver_ia64
+ ./liver
ipf-login1:$ qpeek 17547
+ cd liver_ia64
+ ./liver
ipf-login1:$ qpeek 17547
qstat: Unknown Job Id 17547.nfs1.osc.edu
Job 17547 is not running!

qstat

• By using the qstat command the user can get a vast amount of information about jobs (as we have already seen) and the batch system queues themselves.

• We have already used these qstat options:
-a    show status info on all jobs
-r    show status info on running jobs only
-n    show the nodes jobs are running on

• The new options are used to check how busy the queues are and what the queue limits/properties are
-Q            summary of load on each of the queues
-q            summary of limits on each of the queues
-Qf | more    detailed description of all queue properties

Glenn Cluster Queues

-bash-3.2$ qstat -Q
Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext Type
---------------- --- --- --- --- --- --- --- --- --- --- ----
longserial         0   7 yes yes   0   7   0   0   0   0 E
parallel           0  65 yes yes   0  65   0   0   0   0 E
serial             0 407 yes yes  18 388   1   0   0   0 E
dedicated          0   0 yes yes   0   0   0   0   0   0 E
batch              0   0 yes yes   0   0   0   0   0   0 R

-bash-3.2$ qstat -q

server: opt-batch.osc.edu

Queue            Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
longserial         --      --    336:00:0    1   7   0 --  E R
parallel           --      --    96:00:00  256  65   0 --  E R
serial             --      --    168:00:0    1 386  19 --  E R
dedicated          --      --    48:00:00  965   0   0 --  E R
batch              --      --       --      --   0   0 --  E R
                                               ----- -----
                                                 458    19

Full Queue Description

-bash-3.2$ qstat -Qf parallel
Queue: parallel
    queue_type = Execution
    max_user_queuable = 100
    total_jobs = 65
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:65 Exiting:0
    resources_max.nodect = 256
    resources_max.nodes = 256:ppn=4
    resources_max.walltime = 96:00:00
    resources_min.nodect = 2
    resources_default.nodes = 2:ppn=1
    resources_default.walltime = 01:00:00
    mtime = 1237216627
    resources_assigned.mem = 0b
    resources_assigned.nodect = 823
    resources_assigned.vmem = 0b
    enabled = True
    started = True

Batch Request Limits For Users

• For each user
– 128 concurrently running jobs
– 2048 processor cores in concurrent use

• Serial jobs
– Request only one node and up to 8 processor cores
– 168 hour limit

• Parallel jobs
– Request multiple nodes and up to 2048 processor cores
– 96 hour limit

• Exceptions possible
– Longer time limits
– Larger processor counts
– Contact [email protected]


Moab Scheduler Commands

• OSC PBS software has been enhanced with the use of the Moab Scheduler to improve job flow.
– Advance reservations
– Backfill scheduling
– Fairshare and quality-of-service (QOS) levels

• Moab also comes with its own set of useful commands:
– showq (lists currently running and queued jobs)
– showstart (estimates the start time for a queued job)
– showbf (tells what processors are available to “back-fill” the system)

Moab Scheduling Algorithm

• Compute priorities for all jobs not currently running.

• Sort idle jobs in priority order from highest to lowest, removing any jobs which have had holds placed on them or which exceed policy limits.

• Starting with the highest priority job, attempt to run each job until there are not enough resources available to run the highest priority job remaining.

• Given current system conditions, compute the soonest time the highest priority job could run, and create a reservation for it at that time.

• Backfill any other idle jobs which will not cause the start time for the highest priority job to slip further into the future.

Factors Involved in Job Priority

• Recent usage
– How much other computing has been done by the user over the last several days
– How much other computing has been done by the user's group over the last several days

• Processor count requested
• How long the job has been queued
• Expansion factor (ratio of job length to queue time)

These factors tend to favor large processor-count, long-running jobs, as those are the most difficult to schedule. Smaller processor-count and/or shorter-running jobs are filled in using backfill scheduling.

NOTE: Highest priority does not mean a job will run immediately; the system must free up enough resources (processors and memory) to run it

showq Output

ACTIVE JOBS--------------------

JOBNAME USERNAME STATE PROC REMAINING STARTTIME

2291837 osu3127 Running 1 00:28:41 Mon Nov 2 15:47:24

2293402 osu4970 Running 4 00:38:21 Tue Nov 3 10:57:04






2289284 utl0253 Running 4 1:36:35 Sun Nov 1 20:55:18

2293408 kazantzi Running 8 1:45:01 Tue Nov 3 11:03:44





2292067 osu5208 Running 16 2:37:07 Mon Nov 2 17:25:50

...

519 active jobs 5856 of 9256 processors in use by local jobs (63.27%)

1058 of 1584 nodes active (66.79%)

showq Output (cont’d)

IDLE JOBS----------------------

JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

2258855 ysu0077 Idle 4 5:05:00:00 Sun Nov 1 21:59:55

2264762 ysu0077 Idle 4 6:06:00:00 Sun Nov 1 21:59:55

2271320 osu5455 Idle 1 1:16:00:00 Sun Nov 1 21:59:55

2271321 osu5455 Idle 1 1:16:00:00 Sun Nov 1 21:59:55

2271322 osu5455 Idle 1 1:16:00:00 Sun Nov 1 21:59:55

5 Idle Jobs

BLOCKED JOBS----------------

JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

2275323 wsu0167 UserHold 4 2:22:00:00 Wed Dec 31 19:00:00

2288910 osu4410 Idle 2 15:00:00 Wed Dec 31 19:00:00

2288911 osu4410 Idle 2 15:00:00 Wed Dec 31 19:00:00

...

14 blocked jobs

Total jobs: 538

showstart Usage

-bash-3.2$ showstart 2290021
job 2290021 requires 4 procs for 00:04:00

Estimated Rsv based start in 00:00:04 on Sun Nov 1 22:52:39

Estimated Rsv based completion in 00:04:04 on Sun Nov 1 22:56:39

Best Partition: olddual

showbf Usage

-bash-3.2$ showbf

Partition Tasks Nodes Duration StartOffset StartDate

--------- ----- ----- ------------ ------------ --------------

ALL 20914 237 2:02:01 00:00:00 21:57:59_11/01

ALL 20795 204 10:02:01 00:00:00 21:57:59_11/01

ALL 20777 199 1:10:02:01 00:00:00 21:57:59_11/01

ALL 20745 195 10:10:02:01 00:00:00 21:57:59_11/01

ALL 20741 194 11:10:02:01 00:00:00 21:57:59_11/01

ALL 20731 192 INFINITY 00:00:00 21:57:59_11/01

olddual 20282 148 2:02:01 00:00:00 21:57:59_11/01

olddual 20166 116 10:02:01 00:00:00 21:57:59_11/01

olddual 20148 111 INFINITY 00:00:00 21:57:59_11/01

newdual 19762 2 2:02:01 00:00:00 21:57:59_11/01

newdual 19759 1 INFINITY 00:00:00 21:57:59_11/01

oldquad 20380 85 INFINITY 00:00:00 21:57:59_11/01

newquad 19759 1 INFINITY 00:00:00 21:57:59_11/01

torque 19767 5 INFINITY 00:00:00 21:57:59_11/01

Why Won't My Job Run?

There are a number of reasons why your job may not run immediately, even if there appear to be sufficient resources for it to run:

• Other users' jobs may have been assigned higher priority than your job, depending on what you're asking for. Especially in cases with small processor-count, long-running jobs, the scheduler may not be able to backfill the smaller job without interfering with a higher priority job's start time.

• There may be downtime or other system reservations in place. These will often be noted in the system's message of the day (/etc/motd) and/or the OSC “Notices” web page (http://www.osc.edu/supercomputing/notices).

• You or your group may be at the maximum CPU count or running job count for a user or group. These are generally set up such that a single user can run 128 jobs and/or use 2048 processors at a time.

Reporting Batch Problems to OSC Help

If you are having a problem with the batch system on any of OSC's machines, you should send email describing your problem to [email protected]. Including the following information will aid OSC's Science and Technology Support (STS) staff in diagnosing your problem quickly:

• Name and telephone number
• User ID (username)
• Home institution
• Name of the system you are using (BALE cluster, Opteron cluster)
• Job ID
• Job script
• Job output and/or error messages (preferably in context).