Using the IAC Chimera Cluster

Ángel de Vicente (Tel.: x5387)

SIE de Investigación y Enseñanza

Chimera overview

● Beowulf-type cluster
● Chimera: a monstrous creature made of the parts of multiple animals.
● Mailing list: [email protected]
● Web page: http://chimera
● Course on Adv. Prog. and Parallel Comp. (June 11-25)

Schematic View

Hardware Details

● Nodes:
  – 1 master node (EM64T)
  – 16 old i686 nodes: 32 Xeon 2.80 GHz (chi32)
  – 16 new EM64T nodes: 32 Xeon 3.20 GHz (chi64)
● RAM: 98 GB (master: 2 + chi32: 32 + chi64: 64)
● Disk: ~5 TB (master: 280 GB + chi32: 480 GB + chi64: 4.5 TB)
● Network: two independent Gigabit networks (one for user applications, one for admin, NFS, etc.)

Disk space

● User-available space:
  – (all)   /home          (NFS master):  50 GB
  – (all)   /scratch       (NFS master):  195 GB
  – (chi32) /local_scratch (local):       20 GB per node
  – (chi64) /mnt/pvfs2     (PVFS2 chi64): 3.9 TB
● /home quotas are to be implemented.
● Automatic deletion in the other partitions is to be implemented as well.

PVFS2 Introduction

● Stripes data across disks (those of the chi64 nodes in Chimera).
● Larger files can be created, and potential bandwidth is increased.
● Multiple user interfaces:
  – MPI-IO support
  – Traditional Linux file system

PVFS2 Example

● With MPI-IO (60 processors):

                     /scratch (NFS)   /mnt/pvfs2 (PVFS2)
  Write bandwidth    24 MB/s          892 MB/s
  Read bandwidth     116 MB/s         482 MB/s

● Traditional Linux file system (1 processor):

                     local disk   /scratch (NFS)   /mnt/pvfs2 (PVFS2)
  Write 900 MB       14.77 s      43.942 s         11.779 s
  Read 900 MB (wc)   6.401 s      10.007 s         45.942 s
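The single-process timings are easier to compare with the MPI-IO bandwidths once they are converted to throughput. The following is only a sketch of that arithmetic, assuming awk is available; the helper name mb_per_s is illustrative and does not come from the slides.

```shell
#!/bin/sh
# mb_per_s SIZE_MB SECONDS: print throughput in MB/s with one decimal place.
mb_per_s() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

# Timings taken from the slide (900 MB file, 1 processor).
echo "write, local disk:     $(mb_per_s 900 14.77) MB/s"
echo "write, /scratch (NFS): $(mb_per_s 900 43.942) MB/s"
echo "write, /mnt/pvfs2:     $(mb_per_s 900 11.779) MB/s"
echo "read,  local disk:     $(mb_per_s 900 6.401) MB/s"
echo "read,  /scratch (NFS): $(mb_per_s 900 10.007) MB/s"
echo "read,  /mnt/pvfs2:     $(mb_per_s 900 45.942) MB/s"
```

With these numbers a single process writes to /mnt/pvfs2 at roughly 76 MB/s and reads at about 20 MB/s, far below the aggregate MPI-IO figures obtained with 60 processors.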

Modules package

● Dynamic modification of a user's environment:
  – PATH, MANPATH, etc.
● Shared and/or private modulefiles.
● Useful in managing different versions of applications.
● Very simple to use:
  – module help | avail | list | load | unload
● Use module commands in .bashrc for a common environment.
● Useful for dealing with chi32 vs. chi64.

Compiling code

● Code compiled in 64 bits can only run in chi64.
● Code compiled in 32 bits can run in chi32, chi64 or chimera (chi32 + chi64).
● By default you log in to a 64-bit environment.
  – (see this by running uname -a)
● Modules are 64-bit by default; 32-bit versions end with _32.
● The bitness of the environment and of the loaded modules should match.
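One way to keep the environment and the modules consistent is to pick the module name from the machine type in .bashrc. This is a sketch only: the module name "mpich" and the helper module_suffix are hypothetical, and it assumes that uname -m prints x86_64 in the 64-bit environment and something else (such as i686) in the 32-bit one.

```shell
# Sketch for ~/.bashrc: choose module names that match the environment's bitness.

# module_suffix MACHINE: print the suffix used by module names for that
# machine type ("" for 64-bit, "_32" otherwise).
module_suffix() {
    case "$1" in
        x86_64) echo "" ;;
        *)      echo "_32" ;;
    esac
}

suffix=$(module_suffix "$(uname -m)")

# "mpich" is a hypothetical module name; run `module avail` to see the real ones.
if command -v module >/dev/null 2>&1; then
    module load "mpich${suffix}" || echo "module mpich${suffix} not available" >&2
fi
```

Running module list afterwards shows whether the loaded versions carry the expected suffix.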

Compiling code (2)

● Compiling example for 64 bits:
  – [angelv@chimera sieminar]$ mpicc -o cpi_64 cpi.c
  – [angelv@chimera sieminar]$ file cpi_64
    cpi_64: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
● Compiling example for 32 bits:
  – env32 puts us into a 32-bit environment
  – [angelv@chimera sieminar]$ module list  (verify 32-bit versions)
  – [angelv@chimera sieminar]$ mpicc -o cpi_32 cpi.c
  – [angelv@chimera sieminar]$ file cpi_32
    cpi_32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped
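The output of file can also be checked from a script before a job is submitted. The sketch below classifies a line of file output and lists where such a binary can run according to the rules on the previous slide; the helper names elf_bitness and targets_for are illustrative, not part of the cluster's tooling.

```shell
#!/bin/sh
# elf_bitness DESCRIPTION: classify a line of `file` output as 64, 32 or unknown.
elf_bitness() {
    case "$1" in
        *"ELF 64-bit"*) echo 64 ;;
        *"ELF 32-bit"*) echo 32 ;;
        *)              echo unknown ;;
    esac
}

# targets_for BITNESS: node sets able to run a binary of that bitness
# (64-bit code only on chi64; 32-bit code on chi32, chi64 or chimera).
targets_for() {
    case "$1" in
        64) echo "chi64" ;;
        32) echo "chi32 chi64 chimera" ;;
        *)  echo "" ;;
    esac
}

# Demonstration with the description printed for cpi_32 on the slide.
desc="cpi_32: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)"
bits=$(elf_bitness "$desc")
echo "cpi_32 is ${bits}-bit and can run on: $(targets_for "$bits")"
```

On the cluster itself the description would come from the real command, for example elf_bitness "$(file cpi_64)".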

Submitting jobs to the cluster

● Chimera's queueing system:
  – Torque: Resource Manager
  – Maui: Scheduler
● Maui/Torque basic commands:
  – showq, qsub, checkjob, canceljob
● qsub needs a submit file:
  – [angelv@chimera sieminar]$ cat submit-cpi
    #!/bin/sh
    # Number of processes = number of lines in the node file provided by Torque
    NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
    # Run from the directory where qsub was invoked
    cd $PBS_O_WORKDIR

    mpirun -np $NP -machinefile $PBS_NODEFILE ./cpi
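The NP line in the submit file works because Torque lists one line per allocated processor slot in $PBS_NODEFILE, so a request for nodes=4:ppn=2 yields eight lines. The sketch below reproduces that count outside the queueing system; the node names and the temporary file are stand-ins invented for the demonstration, and count_slots is an illustrative helper.

```shell
#!/bin/sh
# count_slots FILE: number of processor slots in a Torque-style node file
# (one line per slot). Reading from stdin makes wc print only the number.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Hypothetical stand-in for $PBS_NODEFILE with nodes=4:ppn=2:
# each node name appears once per processor requested on it.
demo_nodefile=$(mktemp)
for node in node01 node02 node03 node04; do
    echo "$node"
    echo "$node"
done > "$demo_nodefile"

NP=$(count_slots "$demo_nodefile")

# Print the command the submit file would run, without starting MPI.
echo "mpirun -np $NP -machinefile $demo_nodefile ./cpi"

rm -f "$demo_nodefile"
```

Inside a real job the same helper would be called as count_slots "$PBS_NODEFILE".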

Submitting jobs to the cluster (2)

● With qsub you specify:
  – the number of nodes required, the time required, the bitness of nodes required, etc.
● Example submissions:
  – To chi64 (default):
    qsub -l nodes=4:ppn=2,walltime=03:00:00 submit-cpi
  – To chi32:
    qsub -l nodes=4:ppn=2 -q chi32 submit-cpi
  – To chimera:
    qsub -l nodes=4:ppn=2 -q chimera submit-cpi
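The three submissions differ only in the walltime and queue arguments, which a small wrapper can make explicit. This is a dry-run sketch: build_qsub is a hypothetical helper that only prints the command line and never calls qsub, and its argument order is an arbitrary choice.

```shell
#!/bin/sh
# build_qsub NODES PPN WALLTIME QUEUE SCRIPT
# Prints a qsub command line; an empty WALLTIME or QUEUE argument is left out,
# so an empty QUEUE means the default (chi64).
build_qsub() {
    cmd="qsub -l nodes=${1}:ppn=${2}"
    if [ -n "$3" ]; then
        cmd="${cmd},walltime=${3}"
    fi
    if [ -n "$4" ]; then
        cmd="${cmd} -q ${4}"
    fi
    echo "${cmd} ${5}"
}

# The three example submissions from the slide.
build_qsub 4 2 03:00:00 "" submit-cpi
build_qsub 4 2 "" chi32 submit-cpi
build_qsub 4 2 "" chimera submit-cpi
```

Printing the command first makes it easy to check the resource request before actually submitting the job.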

Scheduling policies

● Current policies are NOT FIFO (see /usr/local/maui/maui.cfg):
  – Time in queue
  – Expansion factor
  – Backfilling
  – Number of requested processors
  – Fairshare
● Max time for a job: 3.5 days for 128 processors.
● Usage of Beoiac (old cluster): 54.18% (last 2 years)
● “The early bird catches the worm!”

Monitoring

● Graphical view of scheduling status (same output as showq, but perhaps easier to interpret): http://chimera/cgi-bin/mauistatus.pl
● Graphical view of different metrics of the cluster (are your allocated nodes really doing something?): http://chimera/ganglia/

Other resources at the IAC

● Condor system (~180 machines, ideal for parameter studies).

● Future CALP node (512 nodes, 20% exclusive to IAC)

References

● Beowulf.org (http://www.beowulf.org)
● Chimera@wikipedia (http://en.wikipedia.org/wiki/Chimera_%28mythology%29)
● IAC mailing list (http://listas.iac.es/mailman/listinfo/beowulf)
● Chimera IAC web page (http://chimera/)
● IAC Course on Parallel Comp. (http://goya/SIE/forum/viewtopic.php?t=141)
● PVFS2 (http://www.pvfs.org)
● Modules package (http://modules.sourceforge.net)
● Maui (http://www.clusterresources.com/pages/products/maui-cluster-scheduler.php)
● Torque (http://www.clusterresources.com/pages/products/torque-resource-manager.php)
● Condor IAC web page (http://www.iac.es/sieinvens/SINFIN/Condor/)