
Introduction to CCR

• where to find information
• required software
• login and data transfer
• basic UNIX commands
• data storage
• using modules
• compiling codes
• running jobs

Information and Getting Help

Getting help: CCR uses an email problem ticket system.

Users send their questions and descriptions of problems to [email protected]

The technical staff receives the email and responds to the user.
• Usually within one business day.

This system allows staff to monitor and contribute their expertise to the problem.

CCR website: http://www.ccr.buffalo.edu

Cluster Computing

Traditional UNIX-style command line interface; a few basic commands are necessary.
Accessible from the UB domain. Requires a CCR account.
The cluster and login (front-end) machines run the Linux operating system.
Compute nodes are not accessible from outside the cluster.
The U2 cluster has over 1000 dual-processor compute nodes.

Required Software

Linux/UNIX workstation: Secure Shell (ssh) is used to log in to the u2 cluster front-end machine. X-Display can be enabled for ssh.
• This is useful for graphical applications.
• Tunnels GUI displays from u2 to the workstation.
Secure File Transfer (sftp) is used to:
• Upload data files to the u2 cluster.
• Download data files from the u2 cluster.
ssh, sftp and X are usually installed on Linux/UNIX workstations by default.

Required Software

Windows workstations: A Secure Shell client must be used to log in to the u2 cluster front-end machine. PuTTY and X-Win32 provide secure shell applications for Windows. X-Win32 is necessary for displaying graphical applications. X-Display can be enabled in the ssh client.
• This is useful for graphical applications.
• Securely tunnels GUI displays from u2 to the workstation.

Required Software

Windows workstations: UB faculty and students can obtain PuTTY and X-Win32 from the UBIT website.
• http://ubit.buffalo.edu/software/win/index.php
Secure File Transfer must be used for the transfer of data files.
PuTTY provides secure command line transfer and copy.
The WinSCP client provides a drag-and-drop interface.
• http://winscp.net/eng/index.php

Accessing the U2 Cluster

The u2 cluster front-end is accessible from the UB domain (.buffalo.edu). Use VPN for access from outside the University.
The UBIT website provides a VPN client for both Linux and Windows.
• http://ubit.buffalo.edu/vpn/index.php

The VPN client connects the machine to the UB domain, from which u2 can be accessed.

Telnet access is not permitted.

Login and X-Display

Linux/UNIX workstation: ssh u2.ccr.buffalo.edu
• ssh [email protected]
The -X or -Y flags will enable an X-Display from u2 to the workstation.
• ssh -X u2.ccr.buffalo.edu
sftp u2.ccr.buffalo.edu
• This is a command line interface.
• put, get, mput and mget are used to upload and download data files.
• The wildcard "*" can be used with mput and mget.
scp filename u2.ccr.buffalo.edu:filename
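For example, a minimal file transfer session from a Linux workstation might look like the following (the file names are placeholders):

sftp u2.ccr.buffalo.edu
• put input.dat (upload a single file)
• mput *.dat (upload all files matching the wildcard)
• get results.out (download a single file)
• quit (end the sftp session)
scp input.dat u2.ccr.buffalo.edu:input.dat (copy a single file without starting an interactive session)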

Login and X-Display

Windows workstations: Launch the PuTTY client to login.

• Enter: u2.ccr.buffalo.edu
– [email protected]

To log in with X-Display:
• Start the X-Win32 client.
• Launch the PuTTY client, specifying u2 as above.
• Under X11, check Enable X11 forwarding.

Launch WinSCP to transfer files to and from your workstation.
• Drag and drop files and folders.

Logout: logout or exit in the login window.

Unix Environment and Shell

After login the user is presented with a command line interface and a prompt.
This is the user's login shell, which on the U2 cluster is bash. A login shell is a system process that waits for and accepts command line input from the user. A shell also defines a type of UNIX scripting language.
There are several types of shells; bash and csh are generally the most popular.
Users do not need to know shell scripting, but it can be helpful.
Commands are typed on the command line. The .bashrc file is a script that runs when a user logs in. This script can be modified to set variables and paths.
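As an illustration, a few lines like these could be added to the .bashrc file (the directory and alias shown are hypothetical examples, not CCR defaults):

export PATH=$PATH:$HOME/bin    # add a personal bin directory to the command search path
alias ll='ls -la'              # define a shortcut for a long listing of all files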

Basic Unix Commands

Using the U2 cluster requires knowledge of some basic UNIX commands.

The UNIX Reference Card provides a short list of the basic commands.
• http://wings.buffalo.edu/computing/Documentation/unix/ref/unixref.html

These will get you started; you can learn more commands as you go.
List files:
• ls
• ls -la (long listing that shows all files)

Basic Unix Commands

View files:
• cat filename (displays file to screen)
• more filename (displays file with page breaks)
Change directory:
• cd directory-pathname
• cd (go to home directory)
• cd .. (go back one level)
Show directory pathname:
• pwd (shows current directory pathname)
Copy files and directories:
• cp old-file new-file
• cp -R old-directory new-directory

Basic Unix Commands

Move files and directories:
• mv old-file new-file
• mv old-directory new-directory
• NOTE: a move is a copy and remove.
Create a directory:
• mkdir new-directory
Remove files and directories:
• rm filename
• rm -R directory (removes directory and contents)
• rmdir directory (directory must be empty)
• Note: be careful when using the wildcard "*"
More about a command: man command

Basic Unix Commands

View file and directory permissions using the ls command.
• ls -l
Permissions have the following format:
• -rwxrwxrwx ... filename
– user group other
Change permissions of files and directories using the chmod command.
• Arguments for chmod are ugo+-rwx
– user group other, read write execute
• chmod g+r filename
– adds read privilege for the group
• chmod -R o-rwx directory-name
– removes read, write and execute privileges from the directory and its contents
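As an illustration, a long listing and a follow-up chmod might look like this (the file name, owner, group and date are made-up examples):

ls -l myscript.sh
-rwxr-x--- 1 jsmith mygroup 512 Jan 10 10:00 myscript.sh
• The user jsmith can read, write and execute; the group mygroup can read and execute; others have no access.
chmod o+r myscript.sh (adds read privilege for others)
chmod u-w myscript.sh (removes write privilege for the owner)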

Basic Unix Commands

There are a number of editors available: emacs, vi, nano, pico

• Emacs will default to a GUI if logged in with X-DISPLAY enabled.

The UBIT web pages provide reference cards.

Files edited on Windows PCs may have embedded characters that can create runtime problems. Check the type of the file:

• file filename

Convert a DOS file to Unix. This will remove the Windows/DOS characters.
• dos2unix -n old-file new-file
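A typical check-and-convert sequence might look like the following (input.dat is a hypothetical file name, and the file output shown assumes a DOS-formatted text file):

file input.dat
input.dat: ASCII text, with CRLF line terminators
• CRLF indicates Windows/DOS line endings.
dos2unix -n input.dat input-unix.dat (writes a converted copy to a new file)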

Data Storage

Home directory: /san/user/UBITusername/u2
The default user quota for a home directory is 2GB.
• Users requiring more space should contact the CCR staff.
Data in home directories is backed up.
• CCR retains data backups for one month.
Projects directories: /san/projectsx/research-group-name
UB faculty can request additional disk space for use by the members of their research group.
The default group quota for a projects directory is 100GB.
Data in projects directories is NOT backed up by default.
• Faculty wanting to back up this data should contact CCR staff for fee and schedule information.

Data Storage

Scratch spaces are available for TEMPORARY use by jobs running on the cluster.
/san/scratch provides 2TB of space.
• Accessible from the front-end and all compute nodes.
/ibrix/scratch provides 25TB of high-performance storage.
• Applications with high I/O that share data files benefit the most from using IBRIX.
• Accessible from the front-end and all compute nodes.
/scratch provides a minimum of 60GB of storage.
• The front-end and each compute node has its own local scratch space. This space is accessible from that machine only.
• Applications with high I/O that do not share data files benefit the most from using local scratch.
• Jobs must copy files to and from local scratch.

Software

CCR provides a wide variety of scientific and visualization software. Some examples: BLAST, MrBayes, iNquiry, WebMO, ADF, GAMESS, TurboMole, CFX, Star-CD, Espresso, IDL, TecPlot, and TotalView.

The CCR website provides a complete listing of application software, as well as compilers and numerical libraries.

The GNU, INTEL, and PGI compilers are available on the U2 cluster.

A version of MPI (MPICH) is available for each compiler and network.

Note: U2 has two networks: gigabit ethernet and Myrinet. Myrinet performs at twice the speed of gigabit ethernet.

Modules

Modules are available to set variables and paths for application software, communication protocols, compilers and numerical libraries.
module avail (lists all available modules)
module load module-name (loads a module)
• Updates the PATH variable with the path of the application.
module unload module-name (unloads a module)
• Removes the path of the application from the PATH variable.
module list (lists loaded modules)
module show module-name
• Shows what the module sets.

Modules can be loaded in the user’s .bashrc file.
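For example, a typical session loads a compiler module before building code (the intel module name is taken from the compiling example below):

module avail (see what is installed)
module load intel (put the Intel compilers in the PATH)
module list (confirm which modules are loaded)
module show intel (inspect what the module sets)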

Compiling Codes

The GNU compilers are in the default path: gcc, g77, gfortran.
Modules must be loaded to set the paths for the INTEL and PGI compilers: icc, ifort; pgcc, pgf77, pgf90.
Compiling with the INTEL Fortran compiler:
module load intel
ifort -o hello-intel helloworld.f
• hello-intel is the executable.
• ./hello-intel (runs the code)
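To build an MPI code, the matching MPI module must also be loaded so that the wrapper compilers are found. A minimal sketch follows; the module and file names below are placeholders, since the actual MPICH module name depends on the compiler and network (check module avail):

module load intel
module load mpich-module-name (placeholder: choose the MPICH module for your compiler and network)
mpif77 -o cpi-intel cpi.f (MPICH wrapper around the loaded Fortran compiler)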

Running on the U2 Cluster

The U2 cluster has over 1000 compute machines. These machines are not accessible from outside the cluster.
Users log in to the cluster front-end machine, u2.ccr.buffalo.edu.
The front-end is used for editing and compiling code, as well as submitting jobs to the scheduler.

The compute machines are assigned to user jobs by the PBS (Portable Batch System) scheduler.

PBS Execution Model

PBS executes a login as the user on the master host, and then proceeds according to one of two modes, depending on how the user requested that the job be run.
Script - the user executes the command:
qsub [options] job-script
• where job-script is a standard UNIX shell script containing some PBS directives along with the commands that the user wishes to run (examples later).
Interactive - the user executes the command:
qsub [options] -I
• the job is run "interactively," in the sense that standard output and standard error are connected to the terminal session of the initiating qsub command. Note that the job is still scheduled and run as any other batch job (so you can end up waiting a while for your prompt to come back "inside" your batch job).

Execution Model Schematic

[Schematic: qsub myscript submits the job to the pbs_server and scheduler, which decides whether the job can run. Once it runs, $PBS_NODEFILE lists the assigned nodes (node1, node2, ... nodeN), and the job proceeds through the prologue, a $USER login, myscript, and the epilogue.]

PBS Queues

The PBS queues defined for the U2 cluster are CCR and debug.

The CCR queue is the default. The debug queue can be requested by the user.
• Used to test applications.
qstat -q
• Shows the queues defined for the scheduler and their availability.
qmgr
• Shows details of the queues and scheduler.

PBS Queues

Do you even need to specify a queue?

You probably don't need (and may not even be able) to specify a specific queue destination.
Most of our PBS servers use a routing queue. The exception is the debug queue on u2, which requires a direct submission.
The debug queue has a certain number of compute nodes set aside for its use during peak times, usually 32 compute nodes. The queue is always available; however, it has dedicated nodes Monday through Friday, from 9:00am to 5:00pm.
Use -q debug to specify the debug queue on the u2 cluster.

Batch Scripts - qsub

The qsub command is used to submit jobs to the PBS scheduler.

Syntax of the qsub command:

qsub [-a date_time] [-A account_string]

[-c interval] [-C directive_prefix] [-e path]

[-h] [-I] [-j join] [-k keep] [-l resource_list]

[-m mail_options] [-M user_list] [-N name]

[-o path] [-p priority] [-q destination] [-r c]

[-S path_list] [-u user_list] [-v variable_list]

[-V] [-W additional_attributes] [-z] [script]

Batch Scripts – qsub

We will discuss the most commonly used of these flags. A full description can be obtained from the qsub man page (man qsub). All of the options (except -I) can be specified as directives inside the job-script file.

Batch Scripts - Resources

The “-l” options are used to request resources for a job. Used in batch scripts and interactive jobs.

-l walltime=01:00:00
• Wall-clock limit of the batch job. Requests a 1-hour wall-clock time limit. If the job does not complete before this time limit, it will be terminated by the scheduler and all tasks will be removed from the nodes.
-l nodes=8:ppn=2
• Number of cluster nodes, with optional processors per node. Requests 8 nodes with 2 processors per node. All the compute nodes in the u2 cluster have 2 processors per node. If you request 1 processor per node, you may share that node with another job.

Batch Scripts - Resources

-l nodes=32:GM:ppn=2
• Requests nodes that have a Myrinet network connection. This is necessary when using the Myrinet network interface.
-l nodes=8:MEM4GB:ppn=2
• Requests nodes that have at least 4 GB of memory each. Useful on u2, where there are 32 nodes with 8 GB of memory and 64 nodes with 4 GB of memory.
-l mem=1024mb
• Maximum memory requested (also understands units of kb and gb). Most useful on time-shared hosts.
At the moment, CCR machines are primarily cluster hosts, including the SGI Altix. For more resource possibilities, see the man pages (man pbs_resources).

Environmental Variables

$PBS_O_WORKDIR - directory from which the job was submitted.

By default, a PBS job starts from the user’s $HOME directory.

Note that you can change this default in your .cshrc or .bashrc file.

Add the following to your .cshrc file:
if ( $?PBS_ENVIRONMENT ) then
    cd $PBS_O_WORKDIR
endif
or this to your .bashrc file:
if [ -n "$PBS_ENVIRONMENT" ]; then
    cd $PBS_O_WORKDIR
fi

In practice, many users change directory to the $PBS_O_WORKDIR directory in their scripts.

Environmental Variables

$PBSTMPDIR - reserved scratch space, local to each host (this is a CCR definition, not part of the PBS package).

This scratch directory is created in /scratch and is unique to the job.

The $PBSTMPDIR is created on every compute node running a particular job.

$PBS_NODEFILE - name of the file containing a list of nodes assigned to the current batch job.

Used to allocate parallel tasks in a cluster environment.
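Inside a job script these variables are typically used along the following lines (a sketch, not an exact CCR recipe; the file names are placeholders):

cd $PBS_O_WORKDIR                      # work from the submission directory
NPROCS=`wc -l < $PBS_NODEFILE`         # count the processors assigned to the job
cat $PBS_NODEFILE                      # list the assigned nodes
cp input.dat $PBSTMPDIR                # stage input into local scratch
cd $PBSTMPDIR
# ... run the application here ...
cp output.dat $PBS_O_WORKDIR           # copy results back before the job ends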

Sample Interactive Job

Example: qsub -I -q debug -lnodes=2:ppn=2 -lwalltime=01:00:00

Sample Script – Cluster

Running jobs in batch on the cluster: Care must be taken to ensure that the distributed hosts all get a copy of the input file and executable.
• This can be accomplished by using a directory that is NFS-mounted throughout the cluster, such as your home directory, the san projects directories, and the san scratch space.
• Otherwise, an extensive amount of explicit file staging is needed to use the local scratch space.
Note that the amount of file staging necessary depends strongly on the nature of the code (for example, an MPI application might have a single master process handle most of the I/O and pass any necessary information to the slave processes).

Most of the PBS directives are self-explanatory, with the possible exception of:
-m e (send email when the job ends)
-M user@domain (address for email notification)
-j oe (join the standard output and error streams; otherwise you get them separately)

Sample Script – Cluster

Example of a PBS script for the cluster: /util/pbs-scripts/pbsCPI-sample-u2-mpirun
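The contents of that sample are not reproduced here; the sketch below only shows the general shape such a script can take, combining the directives and variables discussed above (the module, executable and email names are placeholders, not the actual contents of pbsCPI-sample-u2-mpirun):

#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l nodes=4:ppn=2
#PBS -m e
#PBS -M user@domain
#PBS -j oe
#PBS -N cpi-test

cd $PBS_O_WORKDIR
NPROCS=`wc -l < $PBS_NODEFILE`

# load the same modules used to compile the code (names are placeholders)
module load intel

mpirun -np $NPROCS -machinefile $PBS_NODEFILE ./cpi-intel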

Sample Script – Cluster

Example of a PBS script for the cluster: Submit pbsCPI-sample-u2-mpirun

Sample Script – Cluster

Example of a PBS script for the cluster: Submit pbsCPI-sample-u2-mpiexec

Monitoring Jobs

For text-based job inquiry, use the qstat command:
qstat [-a|-i|-r] [-n] [-s] [-G|-M] [-R] [-u user_list]
      [job_identifier... | destination...]
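Common invocations, using flags from the syntax above (the user name and job identifier are placeholders):

qstat -a (all jobs, one line each)
qstat -u jsmith (jobs belonging to a particular user)
qstat -an 123456 (a specific job, including its assigned nodes)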

Monitoring Jobs

jobvis - a GUI for displaying and monitoring the nodes in a PBS job.

jobvis jobid

Monitoring Jobs

More views of the job are available with jobvis.


Manipulating PBS Jobs

qsub - job submission: qsub myscript
qdel - job deletion: qdel jobid
qhold - hold a job: qhold jobid
qrls - release a hold on a job: qrls jobid
qmove - move a job (between servers/queues).
qalter - alter a job (usually resources).
More information: use the man pages.
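A few illustrative uses (the job identifier is a placeholder; qalter generally applies only to jobs that have not yet started):

qhold 123456 (keep a queued job from starting)
qrls 123456 (allow it to run again)
qalter -l walltime=02:00:00 123456 (request a longer wall-clock limit for the queued job)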

PBS FAQ

When will my job run? That depends: the more resources you ask for, the longer you are likely to wait.
On platforms that run the Maui scheduler (for CCR, that is all of the current production systems), use the showbf command to see what resources are available now, and for how long.
Use showq to view a list of running and queued jobs. This also displays the number of active processors.

FAQ

Example of showbf:
• List available nodes: showbf -S
• List available Myrinet nodes: showbf -f GM

FAQ

Example of showstart: shows the estimated start time of a job.

FAQ

Example of showq: shows the job queue.

FAQ

Example of showq: also shows the percentage of active processors and nodes.