Transcript of Viglen Rocks Cluster Training, University of Liverpool. David Power, HPC Systems Engineer.

Page 1

Viglen Rocks Cluster Training

University of Liverpool

David Power
HPC Systems Engineer

Page 2

Viglen Rocks Cluster Training

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk Quotas

• Using Rocks

Page 3

Rocks Overview

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk Quotas

• Using Rocks

Page 4

Rocks Overview

• Rocks is an open-source Linux cluster distribution that enables end users to easily build computational clusters, grid endpoints and visualization tiled-display walls

• It was started by the National Partnership for Advanced Computational Infrastructure (NPACI) and SDSC (the San Diego Supercomputer Center) in 2000

• It is based on CentOS with a modified anaconda installer that simplifies mass installation onto many computers.

• Installations are customised with additional software packages called rolls

Page 5

Adding Software to Compute Nodes

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk Quotas

• Using Rocks

Page 6

Building RPMs in Rocks

• Use rpmbuild if a .spec file is provided

• Use the Rocks built-in mechanism to create RPMs from directories.

• Build your software from source and install on the frontend:
– configure
– make
– make install

• Or, just untar a binary bundle

Page 7

Building RPMs in Rocks

• Creating RPM packages in Rocks:

rocks create package <path> <package-name>

rocks create package /opt/myapp myapp
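• A minimal end-to-end sketch, assuming a hypothetical myapp-1.0 source tree installed under /opt/myapp (paths are examples; the exact RPM file name depends on the version and release Rocks assigns):

# build and install the application on the frontend
cd /share/apps/src/myapp-1.0
./configure --prefix=/opt/myapp
make
make install

# package the installed tree into an RPM using the Rocks mechanism
cd /tmp
rocks create package /opt/myapp myapp
ls myapp-*.rpm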

Page 8

Contribute the RPM

• Your distribution looks for packages from Rolls and in the contrib area.

• Copy your RPMs into the contrib directory.

cp myapp-1.0-1.x86_64.rpm /export/rocks/install/contrib/5.X/x86_64/RPMS

Page 9

Extend XML

• The compute node installation profile can be extended with additional applications by creating an extend-compute.xml file and editing it:

cd /export/rocks/install/site-profiles/5.1/nodes/

cp skeleton.xml extend-compute.xml

vi extend-compute.xml

Page 10

Add a package to the Compute Profile
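• A minimal sketch of what extend-compute.xml could contain; the package names below are hypothetical examples, and each extra RPM gets its own <package> tag:

<?xml version="1.0" standalone="no"?>
<kickstart>
<description>
Extra packages for compute nodes
</description>
<!-- one <package> tag per RPM available from a roll or the contrib area -->
<package>myapp</package>
<package>environment-modules</package>
<post>
<!-- post-installation commands go here (see the later slides on the post section) -->
</post>
</kickstart>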

Page 11

Rebuild Distribution

• RPM package is present in the contrib directory

• XML node file is extended (and updated)

• Now we need to rebuild the distribution (apply changes above)

• This must be done in /export/rocks/install

$ cd /export/rocks/install/

$ rocks create distro

Page 12

Reinstall the nodes

• PXE Boot
– Network boot is first in the BIOS boot order
– Set the Rocks boot action to install
– Reboot the host

$ rocks set host boot <host> action=<boot-action>

$ rocks set host boot compute-0-0 action=install

(or action=os to cancel reinstall)

Page 13

Reinstall the nodes

• Files updated in /tftpboot

/tftpboot/pxelinux/pxelinux.cfg/XXYYZZAABBCC

default rocks

prompt 0

label rocks

localboot 0

Page 14

Reinstall the nodes

• Initiate a reboot on the nodes to re-install.

$ rocks run host <host> "<command>"

$ rocks run host compute-0-0 "/sbin/init 6"

Page 15

Check the PXE boot action

• List the current PXE boot action on all the nodes.

$ rocks list host boot

HOST ACTION

frontend: os

compute-0-0: os

compute-0-1: os

compute-1-0: os

compute-1-1: install

compute-2-0: os

compute-2-1: os

Page 16

<post> section

• The <post> section of extend-compute.xml can be used to do any post-installation configuration

• The file is XML-based, so certain special characters must be escaped:

Special character   XML syntax
>                   &gt;
<                   &lt;
"                   &quot;
&                   &amp;

<post>.../path/to/daemon &gt; /path/to/outputfile</post>

Page 17

<post> section

• You can put standard shell commands into the <post> section

<post>...chkconfig --level 2345 rocks-grub off</post>

• You can also put in python commands

<post><eval shell="python">

python shell code </eval>

</post>

Page 18

Append files in the <post> section

<post>

...

<file name="/etc/motd" mode="append">

Updated MOTD message

</file>

</post>

Page 19

Create files in the <post> section

<post>

...

<file name="/etc/rc.d/rocksconfig.d/post-99-dofinalconf" mode="create" perms="0755">

#!/bin/bash

cd /share/apps/myapp

cp file /opt/myapp

rm -f /etc/rc.d/rocksconfig.d/post-99-dofinalconf

</file>

</post>

Page 20

Problems with adding new packages

• Verify the syntax of the extend-compute.xml file

$ xmllint --noout /path/to/extend-compute.xml

• Are kickstart files being generated correctly?

$ /export/rocks/sbin/kickstart.cfg --client=compute-0-0

Page 21

Sync Files across Cluster (411)

• 411 is used to keep files consistent across the cluster

• Used for passwd/shadow/group etc.
• Can be used for other files also

/var/411/Files.mk

FILES_NOCOMMENT = /etc/passwd \

/etc/group \

/etc/shadow

$ rocks sync users
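• A hedged sketch of distributing an extra file (a hypothetical /etc/myapp.conf) with 411: append it to the file list in /var/411/Files.mk, then rebuild and push the 411 repository:

# /var/411/Files.mk (excerpt)
FILES += /etc/myapp.conf

# rebuild the 411 repository and push the managed files
$ make -C /var/411
$ rocks sync users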

Page 22

Torque/Maui

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk Quotas

• Using Rocks

Page 23

Software Included

• Torque http://www.clusterresources.com/products/torque

• Maui http://www.clusterresources.com/products/maui

• mpiexec http://www.osc.edu/~pw/mpiexec

• pbstools http://www.osc.edu/~troy/pbs

• pbspython ftp://ftp.sara.nl/pub/outgoing/

Page 24

Installed daemons

• Frontend
– maui
– pbs_server
– pbs_mom (not running)
– mpiexec (mostly for the man page)

• Compute
– pbs_mom
– mpiexec

Page 25

Scheduling Features

• Maui provides a rich set of scheduling features

• Maui can schedule on
– CPUs
– Walltime
– Memory
– Disk size
– Network topology

Page 26

Needed Job Info

• For scheduling to be useful, one needs information about the jobs

– At least the number of CPUs and the walltime
– A memory requirement is also useful

#PBS -l walltime=HH:MM:SS

#PBS -l nodes=10:ppn=8

#PBS -l pmem=1gb

Page 27

Memory handling on Linux

• Torque/Maui supports two memory specification types on Linux, (p)mem and (p)vmem.

• pmem is not enforced
– Used only as information for the scheduler

• pvmem is enforced
– Processes that cross the limit are terminated
– The vmem size is limited by setting ulimit -v on the processes

Page 28

Torque Hacking

• Torque is installed in /opt/torque

• qmgr is the torque mgt. command

• Note: backup your working config

$ qmgr -c "print server" > /tmp/pbsconfig.txt

Page 29

Torque Hacking

• Roll back to escape from a messed up system:

$ qterm

$ pbs_server -t create

$ qmgr < /tmp/pbsconfig.txt

• This will bring you back to where you started

Note: this will wipe the whole queue setup and all current queued and running jobs will be lost!

Page 30

Maui Hacking

• Most things can be achieved by modifying:

/etc/maui/maui.cfg

• Maui needs a restart after changing the config file:

$ service maui restart

Page 31

Should I edit Torque or Maui?

• If you can achieve the same thing by changing either torque or maui, use maui.

• Restarting maui is a rather lightweight operation, and seldom causes problems for live systems.

• Restarting pbs_server can make the system unstable for a few minutes
– pbs_server needs to contact all pbs_moms to get back in state

Page 32

Prioritising Short Jobs

• Often it is useful to give shorter jobs higher priority (maui configuration).

• Use the XFACTOR feature in maui rather than torque queues with different priorities

XFACTORWEIGHT 1000

Page 33

Prioritising Short Jobs

• XFACTOR is defined as XFACTOR = (walltime + queuetime) / walltime

• XFACTOR will increase faster for shorter walltimes thus giving higher priorities for short jobs.

• Depends on users giving reasonable walltime limits.
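• Worked example: after two hours in the queue, a job with a one-hour walltime has XFACTOR = (1 + 2) / 1 = 3, while a ten-hour job queued for the same two hours only reaches (10 + 2) / 10 = 1.2, so the short job climbs in priority faster.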

Page 34

Prioritising large jobs (maui)

• In a cluster with a diverse mix of jobs it is useful to prioritize the large jobs and make the smaller ones fill the gaps.

CPUWEIGHT 1000

MEMWEIGHT 100

• This should be combined with fairshare to avoid starving users falling outside this prioritisation.

Page 35

Fairshare (maui)

• Also known as "keeping all users equally unhappy"

• Can be done on several levels – users, groups, ...

• Set a threshold

USERCFG[DEFAULT] FSTARGET=10
FSWEIGHT 100

• Users having more than 10% will get reduced priority and vice versa.

Page 36

Adjusting your policy

• You can play with the weights to fine tune your scheduling policies

XFACTORWEIGHT 100
FSWEIGHT 1000
RESWEIGHT 10
CPUWEIGHT 1000
MEMWEIGHT 100

• Analyse the prioritisation with

$ diagnose -p

Page 37

Job Node Distribution

• Default is MINRESOURCE
– Run the job on the nodes with the fewest unused resources that still satisfy it

• Spread or pack?

NODEALLOCATIONPOLICY PRIORITY

– Select the most busy nodes

NODECFG[DEFAULT] PRIORITYF=JOBCOUNT

– Select the least busy nodes

NODECFG[DEFAULT] PRIORITYF=-1.0*JOBCOUNT

Page 38

Node Access Policy

• Default access policy is SHARED

• Can choose to limit this to SINGLEJOB or SINGLEUSER, i.e. NODEACCESSPOLICY SINGLEUSER

• Single-user access prevents users from stepping on each other's toes while allowing good utilisation for serial jobs.

Page 39

Throttling Policies

• Sometimes one needs to stop a user from taking over the system...
– MAXPROC
– MAXNODE
– MAXPS
– MAXJOB
– MAXIJOB (I indicating idle: scheduled but not running)

• All can be set for individual users and groups
– USERCFG[DEFAULT]
– USERCFG[USERA] MAXPROC=16

http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml
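• A small maui.cfg sketch combining such limits (the user/group names and values are made-up examples):

# default ceiling applied to every user
USERCFG[DEFAULT] MAXJOB=50 MAXPROC=128
# tighter limits for a specific user and group
USERCFG[usera]   MAXPROC=16
GROUPCFG[chem]   MAXPS=360000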

Page 40

Debugging and Analysing

• Lots of tools:
– pbsnodes     -- node status
– qstat -f     -- all details of a job
– diagnose -n  -- node status from maui
– diagnose -p  -- job priority calculation
– showres -n   -- job reservations per node
– showstart    -- estimated job start time
– checkjob     -- check job status
– checknode    -- check node status

Page 41

Example: Express Queue

• Goal: support development and job script testing, while preventing misuse

• Basic philosophy:

– Create a separate queue
– Give it the highest priority
– Throttle it so it is barely usable

Page 42

Example: Express Queue

• Create the queue with qmgr

create queue express

set queue express queue_type = Execution

set queue express resources_max.walltime = 08:00:00

set queue express resources_default.nodes = 1:ppn=8

set queue express resources_default.walltime = 08:00:00

set queue express enabled = True

set queue express started = True

Page 43

Example: Express Queue

• Increase the priority and limit the usage (maui.cfg)

CLASSWEIGHT 1000

CLASSCFG[express] PRIORITY=1000 MAXIJOB=1 \
MAXJOBPERUSER=1 QLIST=express QDEF=express

QOSCFG[express] FLAGS=IGNUSER

This will allow users to test job scripts and run interactive jobs with a good turnaround.
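• For example, a user could then test a script or work interactively against this queue with something like:

$ qsub -q express -I -l nodes=1:ppn=1,walltime=00:30:00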

Page 44

Check jobs running on the nodes

• Check for jobs running on the cluster (showq):

[root@fotcluster2 nodes]# showq

ACTIVE JOBS--------------------

JOBNAME USERNAME STATE PROC REMAINING STARTTIME

1472 asmith Running 64 99:13:35:10

1473 klangfeld Running 120 99:22:01:12

2 Active Jobs 184 of 948 Processors Active (19.41%)

16 of 79 Nodes Active (20.25%)

Page 45

Check which hosts jobs are running on

• What nodes are the jobs running on? (checkjob JOBID):

[root@fotcluster2 nodes]# checkjob 1473
checking job 1473...
Req[0]  TaskCount: 120  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Allocated Nodes:
[compute-2-31:8][compute-2-30:8][compute-2-29:8][compute-2-28:8]
[compute-2-27:8][compute-2-26:8][compute-2-25:8][compute-2-24:8]
[compute-2-23:8][compute-2-35:8][compute-2-34:8][compute-2-33:8]
[compute-2-32:8][compute-2-21:8][compute-2-20:8]

Page 46

Take nodes OFFLINE

• Take nodes offline (no jobs will be allocated):

[root@fotcluster2 nodes]# pbsnodes -o compute-2-0
[root@fotcluster2 nodes]# pbsnodes compute-2-0
compute-2-0
  state = offline
  np = 12
  ntype = cluster
  status = opsys=linux,uname=Linux compute-2-0.local 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=248891,totmem=25696960kb,availmem=25431280kb,physmem=24676844kb,ncpus=12,loadave=0.00,netload=1104506203,state=free,jobs=,varattr=,rectime=1288610352

Page 47

Bring nodes ONLINE

• Bring nodes back online (jobs can be allocated again):

[root@fotcluster2 nodes]# pbsnodes -c compute-2-0
[root@fotcluster2 nodes]# pbsnodes compute-2-0
compute-2-0
  state = free
  np = 12
  ntype = cluster
  status = opsys=linux,uname=Linux compute-2-0.local 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=248981,totmem=25696960kb,availmem=25431356kb,physmem=24676844kb,ncpus=12,loadave=0.00,netload=1104881434,state=free,jobs=,varattr=,rectime=1288610442

Page 48

Create Queues

• Create additional queues using qmgr

qmgr -c "create queue infiniband"

qmgr -c "set queue infiniband queue_type = Execution"

qmgr -c "set queue infiniband enabled = True"

qmgr -c "set queue infiniband started = True"

qmgr -c "set queue infiniband resources_default.neednodes = ib"

qmgr -c "create queue primare"

qmgr -c "set queue primare queue_type = Execution"

qmgr -c "set queue primare enabled = True"

qmgr -c "set queue primare started = True"

qmgr -c "set queue default resources_default.neednodes = ethernet"

Page 49

Create Queues

• Assign resources to nodes:

In file: /opt/torque/server_priv/nodes

...

compute-1-7 np=8 ethernet

compute-1-8 np=8 ethernet

compute-1-9 np=8 ethernet

compute-2-0 np=12 ib

compute-2-10 np=12 ib

compute-2-11 np=12 ib

compute-2-12 np=12 ib

...

Page 50

Create Queues

• Verify nodes report resources correctly:

[root@fotcluster2 ~]# pbsnodes compute-1-0

compute-1-0

state = free

np = 8

properties = ethernet

[root@fotcluster2 ~]# pbsnodes compute-2-0

compute-2-0

state = free

np = 12

properties = ib

Page 51

Queue Priority

• Set the priority levels for the queues:

In File: /opt/maui/maui.cfg

...

CLASSWEIGHT 1

CLASSCFG[default] PRIORITY=1

CLASSCFG[infiniband] PRIORITY=1

CLASSCFG[primare] PRIORITY=10000

...

• Save and restart maui:

service maui restart

Page 52

Check Job Priorities

• (As root) use the diagnose command:

[root@fotcluster2 ~]# diagnose -p

diagnosing job priority information (partition: ALL)

Job PRIORITY* Cred(Class) Serv(QTime)

Weights -------- 1( 1) 1( 1)

1548 10000 100.0(10000) 0.0( 0.0)

1545 1 84.5( 1.0) 15.5( 0.2)

1547 1 92.3( 1.0) 7.7( 0.1)

Percent Contribution -------- 100.0(100.0) 0.0( 0.0)

*indicates system priority set on job

In the output above, job 1548 (shown in red on the slide) is the one on the high-priority queue (CLASSCFG)

Page 53

Submitting to the queues

• Submit to the Infiniband queue

#PBS -q infiniband

• Submit to the Primare queue

#PBS -q primare

• Submit to the Primare queue (on IB nodes)

#PBS -q primare

#PBS -l nodes=NN:ppn=PP:ib

• Submit to the Primare queue (on Ethernet nodes)

#PBS -q primare

#PBS -l nodes=NN:ppn=PP:ethernet

Page 54

Restricting User Access to Queue

• Allow only users from the primare group to use the primare queue

qmgr -c "set queue primare acl_group_enable=true"

qmgr -c "set queue primare acl_groups=primare"

Page 55

Summary

• Limit the number of queues.

• You need good information about walltime.

Page 56

IPMI Management

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk Quotas

• Using Rocks

Page 57

IPMI Overview

• IPMI is a standard which defines a set of common interfaces to monitor system health and manage the system.

• IPMI operates independently of the OS

• IPMI operates even if the monitored system is powered off, but connected to a power source

• IPMI BMCs can be reached remotely via IP addressing or queried locally.

Page 58

Configuring the IPMI Module

• RPMs installed:
– OpenIPMI
– OpenIPMI-tools (includes ipmitool)
– OpenIPMI-libs

• Kernel modules required:
– ipmi_msghandler
– ipmi_devintf
– ipmi_si

Page 59

Configuring the IPMI Module

• Set the IP address (static):

ipmitool lan set 1 ipsrc static

ipmitool lan set 1 ipaddr 10.1.1.100

ipmitool lan set 1 netmask 255.255.0.0

ipmitool lan set 1 defgw ipaddr 10.1.1.1

• Set the IP address (dhcp):

ipmitool lan set 1 ipsrc dhcp
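• The settings can then be verified with ipmitool's lan print command (channel 1, as above):

ipmitool lan print 1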

Page 60

Configuring the IPMI Network

• IPMI modules are part of the Rocks network
• Edit the /etc/hosts.local file

10.1.1.100 compute-ipmi-0-0

10.1.1.101 compute-ipmi-0-1

• Run dbreport hosts or rocks report hosts
• Sync the /etc/hosts file using 411

(/var/411/Files.mk)

$ rocks sync users

Page 61

Configuring the IPMI Network

• IPMI modules on a separate subnet

• Step 1: Add a network

$ rocks add network <network-name> <subnet> <netmask>

$ rocks add network ipmi 192.168.1.0 255.255.255.0

Page 62

Configuring the IPMI Network

• List the networks on your Rocks cluster

$ rocks list network

NETWORK SUBNET NETMASK

private: 10.12.0.0 255.255.0.0

public: 169.228.3.0 255.255.255.240

ipmi: 192.168.1.0 255.255.255.0

Page 63

Configuring the IPMI Network

• Step 2: Add the interfaces through Rocks

– The host must first be installed
– Then secondary NICs can be added
– After all hosts are configured, just re-install

Page 64

Configuring the IPMI Network

• Add host interface:

$ rocks add host interface <host> <iface>

ip=<address> subnet=<name> gateway=<address>

name=<hostname>

$ rocks add host interface compute-0-0 ipmi

ip=192.168.1.1 subnet=ipmi gateway=ipmi

gateway=1 name=compute-ipmi-0-0

Page 65

Configuring the IPMI Network

• List host interface:

$ rocks list host interface compute-0-0

SUBNET   IFACE  MAC                IP           NETMASK        GATEWAY  MODULE  NAME
private  eth0   00:15:17:79:d3:c0  10.12.0.12   255.255.0.0    -------  e1000e  compute-0-0
ipmi     ipmi   -----------------  192.168.1.2  255.255.255.0  2        ------  ipmi-0-0

Page 66

(Alternative) Configuring the IPMI Network

• Add hosts to /etc/hosts.local:

10.1.10.10 compute-1-0-ipmi
10.1.10.11 compute-1-1-ipmi
10.1.10.12 compute-1-2-ipmi
10.1.10.13 compute-1-3-ipmi

• Regenerate the /etc/hosts file (this appends all entries from /etc/hosts.local):

$ dbreport hosts > /etc/hosts
...
# import from /etc/hosts.local
10.1.10.10 compute-1-0-ipmi
10.1.10.11 compute-1-1-ipmi
10.1.10.12 compute-1-2-ipmi
10.1.10.13 compute-1-3-ipmi

Page 67

Check the IPMI service is running

• Use the tentakel command:

tentakel -g rack2 chkconfig --list '| grep ipmi'

### compute-2-35(stat: 0, dur(s): 0.16):

ipmi 0:off 1:off 2:on 3:on 4:on 5:on 6:off

### compute-2-23(stat: 0, dur(s): 0.17):

ipmi 0:off 1:off 2:on 3:on 4:on 5:on 6:off

• If any nodes have the service set to off, turn it on using:

tentakel -g rack2 chkconfig --level 2345 ipmi on

Page 68

Using IPMItool

• Chassis Status

ipmitool chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     : ac-failed
Chassis Intrusion    : active
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false

Page 69

Using IPMItool

• Chassis Status over the network

ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     : ac-failed
Chassis Intrusion    : active
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false

Page 70

Using IPMItool

• Reset the IPMI Module (sometimes causes network issues)

ipmitool mc reset cold

ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 mc reset cold

• Then reload the network modules on the nodes:

rmmod e1000e ; modprobe e1000e

Page 71

Using IPMItool

• Controlling the Power

Initiate a clean shutdown:
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 power soft

Power off:
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 power off

Power on:
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 power on

Power cycle:
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 power cycle
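• IPMI is also handy for health monitoring; for example, sensor readings and the hardware event log can be pulled over the network in the same way (same credentials and hostname as in the examples above):

# list the sensor data repository (temperatures, fan speeds, voltages)
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 sdr list

# list the System Event Log
ipmitool -U ADMIN -P ADMIN -H compute-ipmi-0-0 sel list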

Page 72

Implementing Disk/User Quotas

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk/User Quotas

• Using Rocks

Page 73

Disk Quotas (general)

• In order to use disk quotas, they must first be enabled:

– Modify /etc/fstab
– Remount the filesystem(s)
– Run quotacheck
– Assign quotas

Page 74

Disk Quotas (general)

• Modify /etc/fstab

• Edit the line with your home directory:

LABEL=/state/partition

• On this line, change the mount options to include grpquota and usrquota

defaults -> grpquota,usrquota,defaults
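• For example, the finished line might look like this (a hedged sketch; the label, filesystem type, and mount point depend on your installation):

LABEL=/state/partition1 /state/partition1 ext3 defaults,usrquota,grpquota 1 2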

• Reboot the machine

Page 75

Disk Quotas (ext3)

• When the system has come back up, run quotacheck
• This examines quota-enabled file systems, building a table of the current disk usage for each one.

quotacheck -guvma

-g -- group quotas
-u -- user quotas
-v -- report the operation as it progresses
-m -- do not remount filesystems read-only
-a -- check all mounted non-NFS filesystems

Page 76

Disk Quotas (ext3)

• Turn file system quotas on

quotaon -guva

• Set up a user quota

edquota -u <username>

• Modify the grace period (in days) allowed once the soft limits are exceeded

edquota -t

• Propagate these settings to all other users

edquota -p <user_with_updated_quota> -u users..

Page 77

Disk Quotas (ext3)

• Editing the quota limits (edquota)

Disk quotas for user viglen (uid 501):
Filesystem  blocks  soft  hard  inodes  soft  hard
/dev/sda5       24  5000     0       7     0     0

• Blocks: The amount of space in 1K blocks the user is currently using.

• Inodes: The number of files the user is currently using.

• Soft Limit: The maximum blocks/inodes a quota user may have on a partition. The role of a soft limit changes if grace periods are used. When this occurs, the user is only warned that their soft limit has been exceeded. When the grace period expires, the user is barred from using additional disk space or files. When set to zero, limits are disabled.

• Hard Limit: The maximum blocks/inodes a quota user may have on a partition when a grace period is set. Users may exceed a soft limit, but they can never exceed their hard limit.

Page 78

Disk Quotas (ext3)

• Reporting on quota usage:

repquota /state/partition1

• Useful TIP: Display user quotas at their login (/etc/profile)

quota -u $USER
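• A minimal sketch of such an addition at the end of /etc/profile (guarded so it only runs for interactive shells):

# show the user's disk quota at login
if [ -n "$PS1" ]; then
    quota -u $USER
fi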

Page 79

Disk Quotas (XFS)

*[Steps below assume /etc/fstab has been updated and the system has been rebooted]*

• Confirm quotas are enabled

xfs_quota -x -c state

• Check the current space usage

xfs_quota -x -c quot

Page 80

Disk Quotas (XFS)

• Set user limits (where username is the user account name)
• The limits below implement a 6GB hard limit (can't write more than 6GB) and a 5GB soft limit (warnings issued after 5GB)

xfs_quota -x /state/partition1

xfs_quota> limit bsoft=5g bhard=6g username

xfs_quota> quota -h username

Disk quotas for User username(id)

Filesystem  Blocks  Quota  Limit  Warn/Time   Mounted on
/dev/sdb1     2.4G     5G     6G   00 [------] /state/partition1

Page 81

Disk Quotas (XFS)

• To disable a user's quota, set the limits back to 0

xfs_quota -x /state/partition1

xfs_quota> limit bsoft=0 bhard=0 username

xfs_quota> quota -h username

Disk quotas for User username(id)

Filesystem  Blocks  Quota  Limit  Warn/Time   Mounted on
/dev/sdb1     2.4G      0      0   00 [------] /state/partition1

Page 82

Disk Quotas (XFS)

• Report on XFS quota usage

[root@fotcluster2 scripts]# xfs_quota -x
xfs_quota> report
User quota on /state/partition1 (/dev/sdb1)
                          Blocks
User ID        Used       Soft        Hard  Warn/Grace
---------- --------------------------------------------
root       18050992          0           0  00 [0 days]
maui        2447696          0           0  00 [--------]
icr          105732          0           0  00 [--------]
biouser           0          0           0  00 [--------]
viglen      8343092          0           0  00 [--------]
klangfeld   1590916   99614720   104857600  00 [--------]
pmills     15474924   99614720   104857600  00 [--------]
troc         436572   99614720   104857600  00 [--------]
ngardiner   4826800   99614720   104857600  00 [--------]

Page 83

Using Rocks

• Rocks Overview

• Managing Rocks

• Under the hood with Torque/Maui

• IPMI Management

• Implementing Disk/User Quotas

• Using Rocks

Page 84

System Access

• Access the Rocks cluster via ssh (port 22)
• From Linux systems this is already built in (ssh from the CLI)
• For graphical forwarding use ssh with X-forwarding

ssh -X headnode

• To copy files use scp

scp file username@headnode:path/to/file
scp -r directory username@headnode:/path/to/dir

• From Windows systems use PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

• For graphical forwarding in Windows use Xming: http://www.straightrunning.com/XmingNotes/

Page 85

System Access

• From Windows systems use PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

• For graphical forwarding in Windows use Xming: http://www.straightrunning.com/XmingNotes/

• To copy files use WinSCP: http://winscp.net/

Page 86

System Environment

• To set up your environment to compile/run applications, use the modules environment

[viglen@ulgqcd ~]$ which pgcc

/usr/bin/which: no pgcc in $PATH

[viglen@ulgqcd ~]$ pgcc -V

-bash: pgcc: command not found

[viglen@ulgqcd ~]$ module available

--------------- /usr/share/Modules/modulefiles --------------------------------------

pgi/32/10.9 pgi/64/10.9 rocks-mpich2/gnu/1.1.1-p1

pgi/32/10.9-mpich pgi/64/10.9-mpich

[viglen@ulgqcd ~]$ module load pgi/64/10.9

[viglen@ulgqcd ~]$ which pgcc

/share/apps/pgi/10.9/linux86-64/10.9/bin/pgcc

[viglen@ulgqcd ~]$ pgcc -V

pgcc 10.9-0 64-bit target on x86-64 Linux -tp istanbul-64

Page 87

Job Control

• Check jobs running on the queue

[viglen@ulgqcd ~]$ showq

ACTIVE JOBS--------------------

JOBNAME USERNAME STATE PROC REMAINING STARTTIME

0 Active Jobs 0 of 496 Processors Active (0.00%)

0 of 31 Nodes Active (0.00%)

IDLE JOBS----------------------

JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

0 Idle Jobs

BLOCKED JOBS----------------

JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

Page 88

Sample Submission Script

#!/bin/bash
#PBS -l nodes=4
#PBS -l walltime=2:00
#PBS -o mytest.out
#PBS -e mytest.err

cd $PBS_O_WORKDIR

NPROCS=`wc -l $PBS_NODEFILE | awk '{ print $1 }'`

/opt/mpich/bin/mpirun -np $NPROCS -machinefile $PBS_NODEFILE hello

Page 89

Job Control

• Submit a job to the queue

[viglen@ulgqcd qsub]$ qsub q_sleep.sh

110.ulgqcd.ph.liv.ac.uk

[viglen@ulgqcd qsub]$ showq

ACTIVE JOBS--------------------

JOBNAME USERNAME STATE PROC REMAINING STARTTIME

110 viglen Running 1 00:00:00 Wed Nov 3 23:20:45

1 Active Job 1 of 496 Processors Active (0.20%)

1 of 31 Nodes Active (3.23%)

Page 90

Job Control

• Delete a job from the queue

[viglen@ulgqcd qsub]$ qsub q_sleep.sh

110.ulgqcd.ph.liv.ac.uk

[viglen@ulgqcd qsub]$ qdel 110

Page 91

Job Control

• Take a look at the output before the job finishes

[viglen@ulgqcd qsub]$ qpeek 190
memtester version 4.0.8 (64-bit)
Copyright (C) 2007 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 90000MB (94371840000 bytes)
got 90000MB (94371840000 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address        : ok
  Random Value         : ok
  Compare XOR          :
  Compare SUB          :
  Compare MUL          :
  Compare DIV          :
  Compare OR           :
  Compare AND          :
  Sequential Increment : ok
  Solid Bits           : testing  49
  Block Sequential     : setting  11

Page 92

Torque/PBS Submit Options

#PBS -I                     # Submit an interactive job

#PBS -q queue_name          # Submit to a certain queue

#PBS -l nodes=compute-0-0   # Submit to a specific host

#PBS -l nodes=4:ppn=4       # 4 hosts with 4 processes per node

#PBS -j oe                  # Combine stdout and stderr

#PBS -o stdout_file         # Specify the output file

#PBS -e stderr_file         # Specify the error file

#PBS -V                     # Import your environment vars
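• These options can also be passed directly on the qsub command line; for example, an interactive session might be requested with something like (the queue and resource values are illustrative):

qsub -I -q infiniband -l nodes=1:ppn=4,walltime=01:00:00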

Page 93

Torque/PBS Variables

$PBS_O_HOST     # Hostname where the job was submitted

$PBS_O_QUEUE    # Originating queue

$PBS_O_WORKDIR  # Directory the job was submitted from

$PBS_JOBID      # Job identifier

$PBS_JOBNAME    # Job name

$PBS_NODEFILE   # Node file (for parallel jobs)

$PBS_O_HOME     # Home directory of the submitting user

$PBS_O_PATH     # PATH of the submitting environment
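• A short sketch of a job script that simply echoes these variables is a handy way to see what the batch environment provides (the output file name is arbitrary):

#!/bin/bash
#PBS -l nodes=1
#PBS -j oe
#PBS -o pbsvars.out
echo "Submitted from host:  $PBS_O_HOST"
echo "Originating queue:    $PBS_O_QUEUE"
echo "Submission directory: $PBS_O_WORKDIR"
echo "Job ID / name:        $PBS_JOBID / $PBS_JOBNAME"
echo "Allocated nodes:"
cat $PBS_NODEFILE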

Page 94

Thank You!

Web: http://www.viglen.co.uk/hpc
Email: [email protected]