Linux Cluster Job Management Systems (SGE)
Job Management Systems: SGE (v1.3)
Author: Anand Vaidya, [email protected]
Why use SGE?
Maintain order in a shared resource, like queuing up at a movie ticket counter rather than mobbing the counter
Apply different usage policies: e.g., PhDs and professors get better treatment than first-year grads
Everyone gets a fair share of the computing resource.
What is SGE?
SGE (Sun Grid Engine) is distributed resource management software.
It provides users with the means to submit computationally demanding tasks to the SGE system, which distributes the associated workload transparently.
How does SGE work?
Users submit jobs to the Grid Engine.
Unless resources are immediately available, non-interactive jobs are kept in queues until resources to execute them become available.
Jobs are passed on to the available execution hosts.
Records of each job's progress through the system are kept and reported when requested.
SGE Components
Hosts
Master (coordinate activities, hold queues)
Execution (workers)
Administration (sets up the system, queues, etc.)
Submit (users can submit jobs from these)
Usually the master host and the administration host are the same machine
Queues (defined by the administrator)
User and Administrator Commands
Daemons: sge_qmaster (Master Daemon), sge_schedd (Scheduler Daemon), sge_execd (Execution Daemon) and sge_commd (Communication Daemon)
SGE Commands - qhost
What is the state of the cluster? How many nodes, type, load? What is my chance of getting a node?
[root@shark ~]# qhost
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
shark-c00 lx24-amd64 2 2.02 3.9G 240.8M 4.0G 0.0
shark-c02 lx24-amd64 2 2.00 3.9G 214.9M 4.0G 0.0
shark-c03 lx24-amd64 2 1.76 3.9G 215.9M 4.0G 0.0
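To answer "which node is least busy right now?" from output like the above, the qhost listing can be filtered with awk. This is a minimal sketch, assuming the default column layout shown (LOAD is the fourth column, two header lines, a "global" pseudo-host, and "-" for hosts that report no load). It is informational only: the scheduler, not the user, decides where a job actually runs.

```shell
# Sketch: print the execution host with the lowest LOAD from `qhost` output.
# Assumes the column layout shown above: HOSTNAME ARCH NCPU LOAD ...
least_loaded_host() {
    awk '
        NR <= 2 { next }                      # skip header and dashed separator
        $1 == "global" || $4 == "-" { next }  # skip pseudo-host and hosts without load
        best == "" || $4 + 0 < min { min = $4 + 0; best = $1 }
        END { if (best != "") print best }
    '
}

# Typical use on a submit host:
#   qhost | least_loaded_host
```

With the three hosts listed above, this would pick shark-c03 (load 1.76).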
SGE Commands - qsub
Create a job script (myjob.sh)
Submit for execution
$ qsub myjob.sh
Your job 742 ("myjob.sh") has been submitted.
Simplest Job:
[vaidya@shark ~]$ cat myjob.sh
#!/bin/sh
sleep 10
date > /tmp/test1.out.txt
Variations: qsub -cwd myjob.sh
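When scripting around qsub, it is handy to capture the job ID from the "Your job 742 ("myjob.sh") has been submitted." message so it can be passed to qstat -j or qdel later. A small sketch, assuming the message wording shown above; array jobs print a different message and are not handled here. The function name is hypothetical.

```shell
# Sketch: extract the numeric job ID from qsub's confirmation message.
# Reads the message on stdin; prints nothing if the format does not match.
job_id_from_qsub_msg() {
    sed -n 's/^Your job \([0-9][0-9]*\).*/\1/p'
}

# Typical use (requires a working SGE installation):
#   jobid=$(qsub -cwd myjob.sh | job_id_from_qsub_msg)
#   qstat -j "$jobid"
```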
(C) Anand Vaidya [email protected]
SGE Commands - qstat
Check the status of your job:
qstat ; qstat -f ;
qstat -u username ; qstat -j job_id
[root@shark ~]# qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
639 0.55500 HCPDIV7 test1 r 05/17/2006 10:16:31 all.q@shark-c00 1
658 0.55500 HCPDIV1 test1 r 05/17/2006 13:37:35 all.q@shark-c00 1
694 0.55500 FCCDVI test1 r 05/17/2006 23:52:19 all.q@shark-c02 1
695 0.55500 FCCDVI1 test1 r 05/17/2006 23:52:19 all.q@shark-c02 1
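The plain qstat listing above can be summarised per host, for example to see how many slots running jobs occupy on each node. This sketch assumes the default column order shown: field 5 is the state, fields 6 and 7 the date and time, field 8 the queue instance (queue@host) and field 9 the slot count. Pending jobs have no queue column yet, so only running ("r") jobs are counted.

```shell
# Sketch: sum the slots used by running jobs on each host from `qstat` output.
slots_per_host() {
    awk '
        NR <= 2 { next }               # skip header and separator
        $5 == "r" {
            split($8, q, "@")          # "all.q@shark-c00" -> q[2] = "shark-c00"
            used[q[2]] += $9
        }
        END { for (h in used) print h, used[h] }
    ' | sort
}

# Typical use:
#   qstat | slots_per_host
```

For the four jobs listed above this would report 2 slots on shark-c00 and 2 on shark-c02.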
SGE Commands - qstat
Status of the job is indicated by letters as:
qw - waiting
t - transferring
r - running
s, S - suspended
R - restarted
T - threshold
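Because states are reported as short letter codes that can combine (e.g. "qw"), a lookup helper can turn them into words. This is a sketch that covers only the letters listed above, decoding a combined state letter by letter; reading "q" as "queued" is an assumption, and any other letter is reported as unknown.

```shell
# Sketch: translate qstat state letters into words, letter by letter.
describe_state() {
    local state=$1 out="" word c i
    for (( i = 0; i < ${#state}; i++ )); do
        c=${state:i:1}
        case $c in
            q)   word="queued" ;;
            w)   word="waiting" ;;
            t)   word="transferring" ;;
            r)   word="running" ;;
            s|S) word="suspended" ;;
            R)   word="restarted" ;;
            T)   word="threshold" ;;
            *)   word="unknown($c)" ;;
        esac
        out=${out:+$out, }$word   # comma-separate after the first word
    done
    echo "$out"
}

# Example: describe_state qw   prints "queued, waiting"
```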
SGE Commands - qdel
Delete your job, if you wish
qdel 743
vaidya has deleted job 743
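qdel takes job IDs, so deleting several jobs at once is a matter of selecting the right IDs from qstat. The sketch below picks out jobs in the "qw" (waiting) state, assuming the qstat column layout shown earlier; the helper name is hypothetical, and the usage line relies on GNU xargs for -r (skip qdel when there is nothing to delete).

```shell
# Sketch: print the IDs of waiting ("qw") jobs from `qstat` output.
pending_job_ids() {
    awk 'NR > 2 && $5 == "qw" { print $1 }'   # skip header and separator
}

# Possible use: delete all of your own waiting jobs
#   qstat -u "$USER" | pending_job_ids | xargs -r qdel
```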
SGE Commands - qmon
qmon is an X Window System GUI tool to submit, delete and view jobs and to configure the SGE system
Example: Submit a job using qmon
Click the Job Submission icon.
Click the Job Script file selection icon to open a file selection box and select your script file. Then, click OK.
Click the Submit button at the bottom of the Job Submission dialog.
After a couple of seconds, you should be able to monitor your job in the Job Control dialog. Click the Job Control icon in the QMON control panel.
You first see it under Pending Jobs, and it quickly moves to Running Jobs after it gets started.
SGE Commands - qlogin, qrsh, qsh, qtcsh
Submit an interactive session request:
qlogin
qrsh
Ensure you have a valid X server running on your desktop, and allow remote X clients to display on it.
Submit an interactive session request:
qsh
qtcsh
Note: this feature needs additional configuration and may not work otherwise.
SGE Commands - job script
Sample job script:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
date
sleep 10
env
date
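The "#$" lines in the script above are read by qsub as embedded options, while bash treats them as ordinary comments. To check which options a script carries before submitting it, they can be listed with sed. A small sketch, assuming the default "#$" prefix; the helper name is hypothetical.

```shell
# Sketch: list the embedded qsub options ("#$" lines) of a job script.
sge_directives() {
    # Print each "#$" line with the prefix and following blanks removed.
    sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Example: sge_directives myjob.sh
# For the script above this prints -cwd, -j y, -S /bin/bash and -V, one per line.
```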
SGE Commands - job script
Sample job script:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
# Example only: request 4 slots in the "mpich" PE; SGE sets $NSLOTS from this
#$ -pe mpich 4
#
$MPI_DIR/mpirun -np $NSLOTS -machinefile $TMPDIR/machines myparallelprog.exe {infile.txt outfile.txt}
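The machine file passed to mpirun lists one host per slot. For parallel jobs SGE also provides $PE_HOSTFILE, where each line is assumed to read "host slots queue processor-range". If a PE setup does not already create $TMPDIR/machines, a sketch like the following could build it; the function name is hypothetical, and many mpich PE configurations generate this file for you.

```shell
# Sketch: expand a PE_HOSTFILE into an MPICH-style machine file
# (one line per slot), e.g. "shark-c00 2 ..." becomes two "shark-c00" lines.
pe_hostfile_to_machines() {
    local hostfile=$1 host slots rest i
    while read -r host slots rest; do
        for (( i = 0; i < slots; i++ )); do
            echo "$host"
        done
    done < "$hostfile"
}

# Possible use inside a job script:
#   pe_hostfile_to_machines "$PE_HOSTFILE" > "$TMPDIR/machines"
```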
SGE Commands - job script
-cwd = change to the current directory before running the job
-j y = merge stderr into stdout
-r y = job is re-runnable
-N jname = set the job name to jname
-l h_rt=00:30:00 = hard run-time limit of 30 minutes
-pe mpich N = run in the mpich parallel environment with N slots
-pe mpich-ib N = use the InfiniBand parallel environment
-pe mpich-eth N = use the Ethernet parallel environment
-S /bin/bash = interpret the job script with the given shell
-V = export all environment variables of the submitting shell to the job
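Run-time limits such as -l h_rt=00:30:00 are given in hours:minutes:seconds. When comparing a limit with a measured run time in a script, converting it to seconds is convenient. A minimal sketch that handles only the full HH:MM:SS form; the helper name is hypothetical.

```shell
# Sketch: convert an HH:MM:SS h_rt value to seconds.
hrt_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Example: hrt_to_seconds 00:30:00   prints 1800
```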
Admin Commands
The next few slides show commands useful for SGE administrators (not users or researchers)
SGE Commands - qconf
Show:
complexes: qconf -sc
queues: qconf -sql
parallel environments: qconf -spl
exec hosts: qconf -sel ; qconf -se c35 (details of one host)
submit hosts: qconf -ss
admin hosts: qconf -sh
calendars: qconf -scall
configuration: qconf -sconf
user lists: qconf -suserl
scheduler configuration: qconf -ssconf
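Since each of the "show" options above prints a piece of the configuration, an admin can save them all to files before making changes. This sketch loops over exactly the options listed; the directory name and file naming are assumptions, and it must run on an admin host where qconf works.

```shell
# Sketch: save the output of each qconf "show" option to its own file.
backup_sge_config() {
    local dir=${1:-sge-config-backup} opt
    mkdir -p "$dir" || return 1
    for opt in -sc -sql -spl -sel -ss -sh -scall -sconf -suserl -ssconf; do
        # e.g. the output of "qconf -sql" goes to "$dir/sql.txt"
        qconf "$opt" > "$dir/${opt#-}.txt" 2>&1
    done
}

# Example: backup_sge_config /root/sge-backup-2006-05-24
```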
SGE Commands - qping
[anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
05/24/2006 21:57:34:
SIRM version: 0.1
SIRM message id: 1
start time: 05/24/2006 21:31:37 (1148477497)
run time [s]: 1768
messages in read buffer: 0
messages in write buffer: 0
nr. of connected clients: 2
status: 0
info: dispatcher: R (0.04) | OK
Monitor: disabled
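For a quick health check in a script, the "status:" line of qping -info output can be tested. This sketch treats "status: 0", as in the sample above, as healthy; that interpretation is an assumption, and the port (537) and daemon name in the usage line come from the sample and are site-specific.

```shell
# Sketch: succeed only if qping -info output contains "status: 0".
qping_ok() {
    awk -F': *' '
        $1 == "status" { found = 1; ok = ($2 ~ /^0[ \t]*$/) }
        END { exit !(found && ok) }
    '
}

# Possible use:
#   qping -info shark-c01 537 execd 1 | qping_ok && echo "execd on shark-c01 answered"
```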
LSF Commands
bsub - submit a job
bstop - suspend a job
bresume - resume a suspended job
btop - move a pending job to the top of the queue
bswitch - move jobs between queues
lsgrun - run a task on a set of hosts
bkill - kill a job
LSF Commands
lsmon - monitor load, resource availability, etc.
lsid - show LSF details (version, etc.)
lshosts - show hosts and static information
lsload - show load information for hosts
lsinfo - show LSF configuration information
busers - show user information
bacct - show accounting information on finished jobs
bjobs - show information on jobs
bpeek - show stdout/stderr of unfinished jobs
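For users moving between the two systems, a rough lookup from the SGE user commands covered earlier to the LSF commands above can help. This is a sketch of approximate counterparts only (options and output differ), limited to commands in this presentation; the helper name is hypothetical.

```shell
# Sketch: approximate LSF counterpart of an SGE user command.
lsf_equivalent() {
    case $1 in
        qsub)  echo "bsub" ;;            # submit a job
        qdel)  echo "bkill" ;;           # delete/kill a job
        qstat) echo "bjobs" ;;           # job status
        qhost) echo "lshosts lsload" ;;  # static host info, then load info
        *)     echo "no mapping for $1" >&2; return 1 ;;
    esac
}
```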
Acknowledgements & Copying
This material is based on my experience as well as material collected from SGE documentation.
This presentation can be redistributed as follows:
No commercial redistribution (e.g., as part of a for-profit CD-ROM or as part of your sales pitch). Seek my permission first.
You must attribute the document's creator.
Share alike: if you enhance or modify this document, share the modifications or the modified document.
In other words, this document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License: http://creativecommons.org/licenses/by-nc-sa/2.5/
The End
Thanks for your time. If you have any feedback, corrections or questions please contact me: Anand Vaidya, [email protected]
This document was created with OpenOffice on Linux. Email me if you want the .odp file instead of the PDF.