Managing Linux Clusters with Rocks, Tim Carlson, PNNL ([email protected])
Introduction
Cluster Design: The ins and outs of designing compute solutions for scientists
Rocks Cluster Software: What it is and some basic philosophies of Rocks
Midrange computing with Rocks at PNNL: How PNNL uses Rocks to manage 25 clusters ranging from 32 to 1500 compute cores
I Need a Cluster!
Can you make use of existing resources?
chinook: 2310 Barcelona CPUs with DDR InfiniBand
Requires EMSL proposal
superdome: 256-core Itanium 2 SMP machine
Short proposal required
Department clusters: HPCaNS manages 25 clusters. Does your department have one of them?
Limited amount of PNNL “general purpose” compute cycles
I Really Need a Cluster!
Why? Run bigger models?
Maybe you need a large-memory desk-side machine. 72 GB in a desk-side is doable (dual Nehalem with 18 x 4 GB DIMMs)
Do you need/want to run parallel code? Again, maybe a desk-side machine is appropriate: 8 cores in a single machine
You Need a Cluster
What software do you plan to run? WRF/MM5 (atmospheric/climate)
May benefit from a low-latency network. Quad-core scaling?
NWChem (molecular chemistry): Usually requires a low-latency network. Needs an interconnect that is fully supported by ARMCI/GA. Fast local scratch required; fast global scratch is a good idea
Home-grown: Any idea of the profile of your code? Can we have a test case to run on our test cluster?
Processor choices
Intel: Harpertown or Nehalem
Do you need the Nehalem memory bandwidth?
AMD: Barcelona or Shanghai
Shanghai is a better Barcelona
Disclaimer: This talk was due 4 weeks early. All of the above could have changed in that time
More Hardware Choices
Memory per core: Be careful configuring Nehalem
Interconnect: GigE, DDR, or QDR InfiniBand
Local disk I/O: Do you even use this?
Global file system: At any reasonable scale you probably aren't using NFS
Lustre/PVFS2/Panasas
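The "be careful configuring Nehalem" warning comes down to channel balance: each Nehalem socket has three DDR3 memory channels, so per-socket DIMM counts that are not a multiple of three leave channels unevenly populated and cost memory bandwidth. A minimal sketch of the arithmetic (the helper function and its defaults are illustrative, not a vendor sizing tool):

```python
# Hedged sketch of Nehalem memory sizing. Each Nehalem socket has
# three DDR3 memory channels; DIMM counts that are not a multiple of
# three per socket unbalance the channels and reduce bandwidth.
CHANNELS_PER_SOCKET = 3  # DDR3 channels per Nehalem socket

def memory_config(sockets, dimms_per_socket, dimm_gb, cores_per_socket=4):
    """Return (total GB, GB per core, channels balanced?)."""
    total_gb = sockets * dimms_per_socket * dimm_gb
    gb_per_core = total_gb / (sockets * cores_per_socket)
    balanced = dimms_per_socket % CHANNELS_PER_SOCKET == 0
    return total_gb, gb_per_core, balanced

# The 72 GB desk-side from earlier: dual socket, 18 x 4 GB DIMMs means
# 9 DIMMs per socket -- a multiple of 3, so the channels stay balanced.
print(memory_config(2, 9, 4))   # (72, 9.0, True)

# 2 sockets x 4 x 4 GB gives 32 GB, but 4 DIMMs per socket leaves the
# three channels unevenly populated.
print(memory_config(2, 4, 4))   # (32, 4.0, False)
```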
Rocks Software Stack
Red Hat based: PNNL is mostly Red Hat, so the environment is familiar
NSF funded since 2000. Several HPCwire awards. Our choice since 2001. Originally based on Red Hat 6.2, now based on RHEL 5.3
Rocks is a Cluster Framework
Customizable: Not locked into a vendor solution
Modify default disk partitioning
Use your own custom kernel
Add software via RPMs or “Rolls”
Need to make more changes?
Update an XML file, rebuild the distribution, reinstall all the nodes
Rocks is not "system imager" based: all nodes are "installed," not "imaged"
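The "update an XML file, rebuild the distribution, reinstall the nodes" loop looks roughly like this on a Rocks 5.x front end. This is a hedged sketch: the site-profiles path, version directory, and example node name are illustrative and vary by release.

```shell
# Illustrative Rocks 5.x customization loop (paths and node names vary
# by site and release; extend-compute.xml is the conventional hook file
# for site-specific compute node changes).

# 1. Describe the change in the XML node file.
cd /export/rocks/install/site-profiles/5.3/nodes
vi extend-compute.xml

# 2. Rebuild the distribution so freshly generated kickstart files
#    pick up the change.
cd /export/rocks/install
rocks create distro

# 3. Reinstall the affected node; it PXE boots and rebuilds itself
#    into the new, 100% configured state.
rocks set host boot compute-0-0 action=install
rocks run host compute-0-0 reboot
```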
Rocks Philosophies
Quick to install: It should not take a month (or even more than a day) to install a thousand-node cluster
Nodes are 100% configured: no "after the fact" tweaking
If a node is out of configuration, just reinstall
Don't spend time on configuration management of nodes: just reinstall
What is a Roll
A Roll is a collection of software packages and configuration information. Rolls provide more specific tools:
Commercial compiler Rolls (Intel, Absoft, Portland Group)
Your choice of scheduler (Sun Grid Engine, Torque)
Science specific (Bio Roll)
Many others (Java, Xen, PVFS2, TotalView, etc)
Users can build their own Rolls – https://wiki.rocksclusters.org/wiki/index.php/Main_Page
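Adding one of these Rolls to an existing front end follows the same rebuild-and-reinstall pattern. A hedged sketch using the SGE Roll as an example (command names are from the Rocks 5.x command set; the ISO filename is a placeholder):

```shell
# Illustrative: register and enable a Roll on a running front end
# (Rocks 5.x commands; the ISO name here is a placeholder).
rocks add roll sge-5.3-0.x86_64.disk1.iso
rocks enable roll sge

# Rebuild the distribution so the Roll's packages enter the
# kickstart-generated installs.
cd /export/rocks/install
rocks create distro

# Compute nodes pick up the Roll's packages on their next reinstall.
```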
Scalable
Not "system imager" based: non-homogeneous hardware makes "system imager"-style installation problematic
Nodes install from kickstart files generated from a database. Several clusters are registered with over 500 nodes
Avalanche installer removes pressure from any single installation server
Introduced in Rocks 4.1. Torrent based: nodes share packages during installation
Community and Commercial Support
Active mailing list averaging over 700 posts per month. Annual "Rocks-A-Palooza" meeting for community members
Talks, tutorials, working groups
The Rocks cluster register lists over 1100 clusters representing more than 720 teraflops of computational power. ClusterCorp sells Rocks+ support based on open-source Rocks
PNNL Midrange Clusters
Started in 2001: 8-node VA Linux cluster, dual PIII 500 MHz with 10/100 Ethernet. Chose Rocks as the software stack
Built our first "big" cluster that same year: 64 dual Pentium IIIs at 1 GHz. Rebuilt all the nodes with Rocks in under 30 minutes. Parts of this system are still in production
Currently manage 25 clusters: range in size from 16 to 1536 cores. InfiniBand is the primary interconnect. Attached storage ranges from 1 to 100 terabytes
HPCaNS Management Philosophy
Create a service center to handle money. Charge customers between $300 and $800/month based on size and complexity
Covers account management, patching, minimal backups (100 GB), compiler licenses, BigBrother monitoring, general sysadmin
Use 0.75 FTE to manage all the clusters
"Non-standard" needs are charged by time and materials:
Adding new nodes
Rebuilding to a new OS
Software porting or debugging
Complex queue configurations
Support Methods
BigBrother alerts: Hooks into Ganglia, checking for:
Node outages
Disk usage
Email problems to cluster sysadmins
See next slide after a bad power outage!
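A minimal sketch of what such a hook can look like: gmond publishes cluster state as an XML document (on port 8649 by default), and a small script can scan it for hosts low on disk. The embedded sample document, the `disk_free` metric name, and the threshold below are illustrative stand-ins for a live feed:

```python
# Hedged sketch of a BigBrother-style check against Ganglia's XML feed.
# In production the XML would be read from gmond (port 8649 by default);
# here an embedded sample document stands in, and the metric name and
# threshold are illustrative.
import xml.etree.ElementTree as ET

SAMPLE = """<GANGLIA_XML>
 <CLUSTER NAME="demo">
  <HOST NAME="compute-0-0">
   <METRIC NAME="disk_free" VAL="5.0" UNITS="GB"/>
  </HOST>
  <HOST NAME="compute-0-1">
   <METRIC NAME="disk_free" VAL="120.0" UNITS="GB"/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""

def low_disk_hosts(xml_text, threshold_gb=10.0):
    """Return names of hosts whose disk_free metric is below threshold_gb."""
    root = ET.fromstring(xml_text)
    flagged = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "disk_free":
                if float(metric.get("VAL")) < threshold_gb:
                    flagged.append(host.get("NAME"))
    return flagged

# Only compute-0-0 is below the 10 GB threshold in the sample.
print(low_disk_hosts(SAMPLE))  # ['compute-0-0']
```

A real alerter would loop over every cluster's gmond endpoint and mail the flagged list to the cluster sysadmins.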
Support queue: Users pointed to a central support queue
5 UNIX admins watching the queue for cluster items
Try to teach users to use the support queue
Typical Daily Questions
Can you add application X, Y, Z?
My job doesn't seem to be running in the queue?
The compiler gives me this strange error!
Do you have space/power/cooling for this new cluster I want to buy?
This code runs on cluster X, but doesn't run on cluster Y. Why is that? Aren't they the same?
Can I add another 10 TB of disk storage?
The cluster is broken!
Always Room for Improvement
Clusters live in 4 different computer rooms. Can we consolidate?
Never enough user documentation
Standardize on resource managers
Currently have various versions of Torque and SLURM
Should we be upgrading older OSes? Still have RHEL 3-based clusters
Do we need to be doing "shared/grid/cloud" computing? Why in the world do you have 25 clusters?