Streamlining Research Computing Infrastructure
A small school’s experience
Gowtham
HPC Research Scientist, ITS
Adj. Asst. Professor, Physics/ECE
g@mtu.edu
(906) 487-3593 http://www.mtu.edu
[Map: distances from Houghton, MI to Isle Royale National Park, MI (56 miles), Green Bay, WI (215 miles), Duluth, MN (215 miles), Detroit, MI (550 miles), Twin Cities, MN (375 miles), and Sault Ste. Marie, MI/Canada (265 miles)]
Michigan Tech Fall 2013
[Campus timeline: 1885, 1897, 1927, 1964]
- Population (Houghton/Hancock): 15,000 (22,000)
- Students: 7,000 (5,600 +1,400)
- Faculty: 500
- Staff: 1,000
- General budget: $170 million
- Sponsored programs awards: $48 million
- Endowment value: $83 million
- 8 mini- to medium-sized clusters, spread around campus
- Varying versions of Rocks
- Different software configurations
- Single power supply for most components
- Manual systems administration and maintenance
- Minimal end user training and documentation
An as-is snapshot January 2011
These 8 clusters — purchased mostly with start-up funds — had 1,000 CPU cores spanning several hardware generations and a few low-end GPUs. Only one of them had InfiniBand (40 Gb/s).
- Move all clusters to one of two data centers
- Merge clusters when possible
- Consistent racking, cabling and labeling scheme
- Upgrade to Rocks 5.4.2
- Identical software configuration
- End user training
- Complete documentation
Initial consolidation January 2011 — March 2011
Compute nodes deemed not up to the mark were set aside to build a test cluster: wigner.research.mtu.edu
Labeling scheme examples:
- R107B36 OB1 = Rack 107, Back side, 36th slot, On Board NIC 1 (of a node)
- R107B41 P01 = Rack 107, Back side, 41st slot, Port 01 (of the switch)
- hpcmonitor.it.mtu.edu
- Ganglia monitoring system
Capture usage pattern April 2011 — December 2011
Monitoring multiple clusters with Ganglia: http://central6.rocksclusters.org/roll-documentation/ganglia/6.1/x111.html
- Low usage: 20% on most days; 45-50% on the luckiest of days
- Inability and/or unwillingness to share resources
- Lack of resources for researchers in need
- More systems administrative work
- Space, power and cooling costs
- Less time for research, teaching and collaborations
Analysis of usage pattern January 2012
- VPR, Provost, CIO, CTO, Chair of HPC Committee and yours truly
- Strongly encourage sharing of under-utilized clusters
- End of life for existing individual clusters
- Stop funding new individual clusters
- Acquire one big centrally managed cluster
- Central administration will fully support the new policies
- One-person committees
- No exceptions for anyone
The meeting January 2012
The philosophy January 2012
Greatest good for the greatest number - Warren Perger and Gifford Pinchot
Much is said of the questions of this kind, about greatest good for the greatest number. But the greatest number too often is found to be one. It is never the greatest number in the common meaning of the term that makes the greatest noise and stir on questions mixed with money … - John Muir
It’s not just a keel and a hull and a deck and sails. That’s what a ship needs but not what a ship is. But what a ship is … what the Black Pearl Superior really is … is freedom. - Captain Jack Sparrow, Pirates of the Caribbean
Adapted shamelessly from Henry Neeman’s SC11 presentation: Supercomputing in Plain English
The philosophy January 2012
- $750k for everything
- $675k for hardware + 10% for unexpected expenses
- 5 rounds with 4 vendors (2 local; 2 brand names)
- Local vendor won the bid February 2013
- Staggered delivery of components April — May 2013
- Fly-wheel installation April — May 2013
- Load test with building and campus generators
Bidding/Acquiring process February 2012 — May 2013
- Built with retired nodes from other clusters
- 1 front end
- 2 login nodes
- 1 NAS node (2 TB RAID1 storage)
- 32 compute nodes
- 50+ software suites
- 150+ users
First version of wigner had just two nodes: 1 front end and 1 compute node, built with retired lab PCs and no switch
wigner.research January 2011 — December 2013
As of Spring 2014, wigner has been retired. The nodes are being used as a testing platform for the upcoming Data Science program at Michigan Tech and to teach building and managing a research computing cluster as part of PH4395: Computer Simulations.
- HPC Proving Grounds
- OS installation and customization
- Software compilation and integration with queueing system
- Extensive testing of policies, procedures and user experience
- PH4390, PH4395 and MA5903 students
- Small to medium sized research groups
- Automating systems administration
- Integrating configuration files, logs, etc. with a revision control system
wigner.research March 2011 — December 2013
- Central Rocks server (x86_64): serves 6.1, 6.0, 5.5, 5.4.3 and 5.4.2
- Saves time during installation
- Facilitates inclusion of cluster-specific rolls
rocks.it.mtu.edu April 2012 — present
Scripts and procedures were provided by Philip Papadopoulos
- 1 front end
- 2 login nodes
- 1 NAS node: 33 TB usable RAID60 storage space
- 72 CPU compute nodes
- 5 GPU compute nodes: 4 NVIDIA Tesla M2090 GPUs (448 CUDA cores)
Superior June 2013
Compute nodes (CPU and GPU): Intel Sandy Bridge E5-2670 (2.60 GHz), 16 CPU cores and 64 GB RAM per node
Housed in the newly built Great Lakes Research Center: http://www.mtu.edu/greatlakes/
- 56 Gbps InfiniBand (copper cables): primary research network
- Gigabit Ethernet: administrative and secondary research network
- Redundant power supply for every component
Superior June 2013
With 81 total nodes, there was 33% room for growth before needing to re-design the InfiniBand switch system. The final cost was $680k; the remaining $70k was used to build a test cluster: portage.research.mtu.edu
- Physical assembly (7 days): racking, cabling and labeling
- Rocks Cluster Distribution (5 days): OS installation, customization, compliance; software compilation, user accounts
- 3 pilot research groups (14 days): reward for being good and productive users; help fix bugs, etc.
Superior June 2013
Superior June 2013
[Cluster diagram: front end, login nodes, storage node, CPU compute nodes, GPU compute nodes, Ethernet switch system, InfiniBand switch system]
- short.q (compute-0-0 through compute-0-7): 24 hour limit on run time
- long.q (compute-0-8 through compute-0-81): no limit on run time
- gpu.q (compute-0-82 through compute-0-86): no limit on run time
Superior June 2013
http://superior.research.mtu.edu/available-resources
Benchmarks: HPL June 2013
#  Performance     (TFLOPS)   Notes
1  Theoretical     23.96      --
2  Practical       21.57      ~90% of #1
3  Measured        21.38      89.23% of #1
http://netlib.org/benchmark/hpl
Theoretical performance = # of nodes x # of cores per node x Clock frequency (cycles/second) x # of floating point operations per cycle
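The 23.96 TFLOPS figure can be reproduced from this formula. Below is a minimal sketch, assuming only the 72 CPU compute nodes (16 cores each at 2.60 GHz) count toward the HPL peak and that Sandy Bridge performs 8 double-precision floating point operations per cycle (AVX); both the node count and the FLOPs-per-cycle value are inferred, not stated on this slide.

```python
# Sanity check of the theoretical HPL peak quoted above.
# Assumptions (not stated on the slide): 72 CPU compute nodes, and 8
# double-precision floating point operations per cycle (Sandy Bridge AVX).
nodes = 72
cores_per_node = 16
clock_ghz = 2.60
flops_per_cycle = 8

peak_tflops = nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0
print(f"Theoretical peak: {peak_tflops:.2f} TFLOPS")  # prints 23.96
```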
Benchmarks: LAMMPS June 2013
Benjamin Jensen (advisor: Dr. Gregory Odegard), Computational Mechanics and Materials Research Laboratory, Mechanical Engineering-Engineering Mechanics. Results from a simulation involving 1,440 atoms and 500,000 time steps.
[Chart: total run time (hours) vs. number of nodes (CPU cores): 2 (32), 4 (64), 6 (96), 10 (160); Michigan Tech's Superior compared with NASA's Pleiades]
Submit completed proposal to: Dr. Warren Perger, Chair, HPC Committee, wfp@mtu.edu
Account request
LaTeX/MS Word template available at http://superior.research.mtu.edu/account-request
- List of software/compilers
- Scalability
- Source of funding
- Résumé
- Proposal
- Title and abstract
- User population
- Preliminary results
- Nature of data sets
- Required resources
- A metric for merit
- An easily accessible list of projects
- Know what the facility is being used for
- Intellectual scholarship and computational requirements
- For VPR, CIO, deans, dept. chairs and institutional directors
- A fail-safe opportunity to practice writing proposals seeking allocations in NSF’s XSEDE, etc.
Why a proposal?
http://nsf.gov http://xsede.org http://superior.research.mtu.edu/list-of-projects
- Tier A: new faculty; established faculty with funding
- Tier B: established faculty with no (immediate) funding
User population
Group members and external collaborators inherit their PI’s tier. New faculty status is valid for 2 years from the first day of work.
Job submission: qgenscript
One stop shop for:
- Array jobs
- Exclusive node access
- Wait on pending jobs
- Email/SMS notifications
- Wait time statistics
- Command to submit the script
- Job information file
http://superior.research.mtu.edu/job-submission/#batch-submission-scripts
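The actual qgenscript lives on Superior and is not reproduced here; the following is a hypothetical sketch of the kind of Grid Engine batch script such a generator emits. The function name and defaults are made up, but the #$ directives (-N, -q, -cwd, -t, -hold_jid, -M/-m) are standard Grid Engine options, and the generated file would be submitted with qsub.

```python
# Hypothetical sketch of a Grid Engine batch-script generator in the spirit
# of qgenscript; not the actual tool used on Superior.
def generate_script(name, command, queue="long.q", array=None,
                    email=None, hold_jid=None):
    lines = [
        "#!/bin/bash",
        f"#$ -N {name}",       # job name
        f"#$ -q {queue}",      # short.q, long.q or gpu.q
        "#$ -cwd",             # run from the submission directory
    ]
    if array:                  # e.g. "1-100" for an array job
        lines.append(f"#$ -t {array}")
    if hold_jid:               # wait on pending jobs before starting
        lines.append(f"#$ -hold_jid {hold_jid}")
    if email:                  # notify on begin, end and abort
        lines += [f"#$ -M {email}", "#$ -m abe"]
    lines.append(command)
    return "\n".join(lines) + "\n"


print(generate_script("lammps_run", "mpirun -np 32 lmp < in.lammps",
                      queue="short.q", email="user@mtu.edu"))
```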
- Users’ priorities are computed periodically
- A weighted function of CPU time and production (an illustrative sketch follows below)
- In effect only when Superior is running at near 100% capacity
- Pre-emption and advanced reservation are disabled
- Any job that will start will run to completion
Job scheduling policy
http://superior.research.mtu.edu/job-submission/#scheduling-policy
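The slides describe the priority only as a weighted function of CPU time and production. The weights, scaling and functional form below are purely illustrative, a sketch of one way such inputs could be combined, not the in-house algorithm itself.

```python
# Illustrative priority function: recent CPU time lowers priority, reported
# production (publications) raises it. Weights and scaling are hypothetical.
def user_priority(cpu_hours_used, publications,
                  w_usage=0.7, w_production=0.3):
    usage_penalty = cpu_hours_used / (cpu_hours_used + 1000.0)   # maps to 0..1
    production_bonus = publications / (publications + 5.0)       # maps to 0..1
    return w_production * production_bonus - w_usage * usage_penalty

# Higher values would be scheduled earlier, and the ordering matters only
# when Superior is running at near 100% capacity.
print(user_priority(cpu_hours_used=2500, publications=3))
```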
Email/SMS notifications
http://superior.research.mtu.edu/job-submission/#sms-notifications
- Reduces performance for all users
- First offense: terminates the program; an email notification [cc: user’s advisor]
- Subsequent offenses: same as first offense; logs the user out and locks down the account
Running programs on login nodes
http://superior.research.mtu.edu/job-submission/#running-programs-on-login-nodes
A continued trend will be grounds for removal of the user’s account.
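How such enforcement might be wired up, as a hedged sketch only: a periodic sweep of the login nodes using the third-party psutil library, terminating long-running compute processes. The threshold, exempt-process list and notification step are placeholders, not the mechanism actually used on Superior.

```python
# Sketch of a login-node watchdog (illustrative only): terminate processes
# that have accumulated too much CPU time, then report the offenders so the
# caller can e-mail each user (cc: advisor).
import psutil

CPU_SECONDS_LIMIT = 600          # hypothetical threshold for "compute" work
EXEMPT = {"bash", "ssh", "scp", "rsync", "vim", "emacs"}

def sweep_login_node():
    offenders = []
    for proc in psutil.process_iter(["name", "username", "cpu_times"]):
        try:
            if proc.info["name"] in EXEMPT:
                continue
            times = proc.info["cpu_times"]
            if times and (times.user + times.system) > CPU_SECONDS_LIMIT:
                proc.terminate()          # first offense: stop the program
                offenders.append((proc.info["username"], proc.info["name"]))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return offenders

if __name__ == "__main__":
    print(sweep_login_node())
```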
- Data is not backed up
- Limits per user
- /home/john: 25 MB
- /research/john: decided on a per-proposal basis
- When a user exceeds the limit
- 12 reminders at 6-hour intervals [cc: user’s advisor]
- 13th reminder: logs out the user and locks down the account (a sketch of this escalation follows below)
Disk usage
http://superior.research.mtu.edu/job-submission/#disk-usage
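A minimal sketch of the escalation described above, assuming usage is measured with du and reminder counts are tracked per sweep; the paths, the mail step and the lockout command (usermod -L) are illustrative, not the actual implementation behind the policy.

```python
# Illustrative 6-hourly quota check: up to 12 reminders, then lock the account.
# In practice the strike counts would need to persist between runs.
import subprocess

HOME_LIMIT_BYTES = 25 * 1024 * 1024      # 25 MB /home quota
strikes = {}                             # user -> consecutive reminders sent

def send_reminder(user, used_bytes):
    # Placeholder for the real e-mail notification [cc: user's advisor].
    print(f"[reminder] {user}: {used_bytes} bytes in /home, limit is 25 MB")

def check_user(user):
    out = subprocess.check_output(["du", "-sb", f"/home/{user}"], text=True)
    used = int(out.split()[0])
    if used <= HOME_LIMIT_BYTES:
        strikes[user] = 0
        return
    strikes[user] = strikes.get(user, 0) + 1
    if strikes[user] <= 12:
        send_reminder(user, used)
    else:                                # 13th time: log out and lock account
        subprocess.run(["usermod", "-L", user], check=False)
```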
Useful commands
Developed at Michigan Tech http://superior.research.mtu.edu/job-submission/#useful-commands
- qgenscript
- qresources
- qlist
- qnodes-map
- qnodes-active | qnodes-idle
- qwaittime
- qstatus | quser | qgroup
- qnodes-in-job
- qjobs-in-node
- qjobs-in-active-nodes
- qjobinfo | qjobcount
- qusage
Usage reports
All PIs and the Chair of the HPC Committee receive a weekly report. VPR, CIO, deans, department chairs and institutional directors receive quarterly and annual reports (or when necessary).
- 21 projects: 10 Tier A + 11 Tier B
- 100 users
- 9 publications
- 75+% busy on most days
- $325k worth of usage
- ~50% of the initial investment
- Cost recovery model: $0.10 per CPU-core per hour (so $325k corresponds to roughly 3.25 million CPU-core hours)
Usage reports July 2013 — December 2013
Metrics
Cannot manage what cannot be measured
Not everything that’s (easily) measurable is (really) meaningful.
Not everything that’s (really) meaningful is (easily) measurable.
- Move towards a merit-based system
- Easily measurable quantities
- Who users are
- # of CPUs and total CPU time
- Really meaningful entities
- Publications: type (poster, conference proceeding, journal) and impact factor
- Citations
Metrics
Publications reported to: Dr. Warren Perger, Chair, HPC Committee, wfp@mtu.edu
Metrics: job priority
An in-house algorithm computes users’ priorities; the system already knows who the users are.
Metrics
http://superior.research.mtu.edu/usage-reports
Interactive visualizations are built using the Highcharts framework.
Metrics: global impact
http://superior.research.mtu.edu/list-of-publications
[Publication categories: Michigan Tech original, Journal Article, Book Chapter, Conference Proceeding, MS Thesis, PhD Dissertation]
- Move all clusters to Great Lakes Research Center
- Upgrade to Rocks 6.1 and add a login node
- Retire individual clusters when possible
- 16 compute nodes and 1 NAS node added to Superior
- portage.research.mtu.edu: segue to Superior
- 1 front end, 1 login node, 1 NAS node and 6 compute nodes
- Testing, course work projects and beginner research groups
Further consolidation August 2013 — December 2013
- 1 big, 1 mini (central) and 3 individual clusters
- 1 data center with .research.mtu.edu network
- Rocks 6.1
- Identical software configurations
- Automated systems administration and maintenance
- Extensive end user training
- Complete documentation
An as-is snapshot January 2014
Immersive Visualization Studio (IVS) is powered by a Rocks 5.4.2 cluster and has 24 HD screens (46” 240 Hz LED) working in unison to create a 160 sq. feet display wall. @MTUHPCStatus
- More tools to enhance user experience
- Videos for self-paced learning of command line Linux
- Encourage GPU computing
- Expand storage
- Provide backup
- Re-design InfiniBand switch system (216 nodes)
- Plan for expanded (or new) Superior
Immediate future February 2014 and beyond
Thanks be to
- Philip Papadopoulos and Luca Clementi (UCSD and SDSC)
- Timothy Carlson (PNL)
- Thomas Reuti Reuter (Philipps-Universität Marburg)
- Alexander Chekholko (Stanford University)
- Rocks, Grid Engine and Ganglia mailing lists
- Henry Neeman (University of Oklahoma)
- Steven Gordon (The Ohio State University)
- Gergana Slavova, Walter Shands and Michael Tucker (Intel)
- Gaurav Sharma and Scott Benway (MathWorks)
- Adam DeConinck (NVIDIA)