Making sense out of the many changes to our computing environment
(with a healthy dose of Odyssey)
Bob Yantosca, Senior Software Engineer
with the GCST: Melissa, Matt, Lizzie, Mike

Jacob Group Meeting, Friday, 20 Nov 2015
Introduction
● Two major sources of significant disruptive change to our computing environments:
  – Consolidation of IT assets formerly managed by SEAS
    ● e.g. email, web servers, computational clusters
    ● Many of these assets have now been taken over by Harvard University IT (HUIT) and FAS Research Computing (RC)
  – Jacob-group migration from the AS cluster to Odyssey
    ● Precipitated by Jack's retirement
    ● But which has resulted in growing pains
Why centralization of IT assets?
● Historically, each school made its own IT decisions
  – Which led to much repetition and confusion
    ● i.e. 20+ email systems, 1000's of web servers & clusters
    ● Single point of failure – not good
● Need to provide IT as economically as possible
  – More IT assets are being managed by HUIT and FAS RC and less by individual schools
  – Centralization = Economy of scale = Saving $$
  – SEAS is doing this already (under Gabriele Fariello)
Letter from Gabriele Fariello to the SEAS community (30 Sep 2015)
Dear Friends and Colleagues,
I am writing to update you on what we in computing have been doing and why, what is left to be done, and to invite you to share your thoughts on the future of computing at SEAS. You can find more details below, but in summary:
Over the past two years, we have worked to refocus computing at SEAS to be able to provide the exceptional service that a world-class engineering and applied sciences school needs. We have:
● significantly increased the compute power available to all faculty while reducing costs and managing resources
● reduced our operational footprint to one-quarter of what it was two years ago
The many recent changes coming to SEAS and Harvard have inevitably added to the burden of changes the community has experienced. I understand that this has not been an easy period of transition, but the period of significant changes is almost over and should be finished by the end of the Fall Semester.
… -Gabriele
Assistant Dean for Computing & Chief Information Officer
Harvard John A. Paulson School of Engineering and Applied Sciences
Local IT assets that were changed
● seas.harvard.edu emails
● seas.harvard.edu web sites
● GEOS-Chem and Jacob-group email lists
● GEOS-Chem and Jacob-group websites
● GEOS-Chem wiki
● GEOS-Chem Git repositories
seas.harvard.edu emails
Old System SEAS-hosted Microsoft Exchange Server
New System Cloud-based Microsoft Office 365 Server
Switchover May 2015 thru September 2015
Affected Everyone with a @seas.harvard.edu email address
Issues ● People were migrated to a temporary email server in May and then back again in August and September.
● The transition was not as smooth as hoped for some.
● Everyone was receiving an excessive amount of spam for about 2-3 months until HUIT finalized the spam filters.
seas.harvard.edu web sites
Old System people.seas.harvard.edu/~username
New System Drupal OpenScholar
Switchover Summer 2015
Affected SEAS faculty, staff, students
Issues ● You used to be able to upload HTML files (created with Dreamweaver etc.) to the people.seas.harvard.edu site.
● The people.seas.harvard.edu sites were discontinued (but existing users were grandfathered in).
● You cannot edit HTML code directly with OpenScholar.
● OpenScholar's “look-and-feel” is much more limited than hand-written HTML + CSS.
● OpenScholar has a 2 GB limit (you can't upload a lot of documents).
GC and Jacob-group email lists
Old System SEAS-hosted “Mailman” list servers
New System Google Groups (hosted on g.harvard.edu)
Switchover August 2015
Affected All users of the GEOS-Chem and Jacob-group email lists
Issues ● GROUP@seas.harvard.edu → GROUP@g.harvard.edu
● We were promised a smooth transition, BUT the initial migration was done incorrectly.
● Many addresses were omitted; people complained.
● GCST tried to manually add addresses but found that a 100-address/day limit was put in place. (UGH!!!)
● The migration had to be done a second time, which was successful.
GC and Jacob-group web sites
Old System Local Atmospheric Sciences web server
New System Amazon Web Services (cloud-based) web server
Switchover Summer 2015
Affected GC and Jacob-group website users
Issues ● None really to speak of; the transition was very smooth!
● Judit is currently looking into replacing the website Git servers with another type of content management system (stay tuned).
GEOS-Chem wiki
Old System SEAS-hosted MediaWiki
New System Amazon Web Services (cloud-hosted) MediaWiki
Switchover June 2015 (after IGC7)
Affected GC users who rely on the wiki (i.e. everyone)
Issues ● The machine where the GC wiki lived @ SEAS was retired.
● Migration to AWS was very smooth. (Thanks Judit!)
● The MediaWiki version was updated from v1.19 to v1.24.2.
● All pages were preserved during the transition.
● Very few changes are apparent to users, except for the look & feel.
GEOS-Chem Git repositories
Old System git.as.harvard.edu
New System bitbucket.org/gcst
Switchover Summer 2015
Affected All GEOS-Chem users
Issues ● git.as.harvard.edu can only be updated from the AS server. It is read-only from Odyssey.
● The git.as.harvard.edu server has a 15-minute delay before updates are made visible to the outside world.
● The need to sync code between AS and Odyssey prompted us to migrate the Git repos to bitbucket.org.
● We also obtained an academic license for bitbucket.org, so that we can have an unlimited number of developers for free.
Migration to Odyssey
● Homer's Odyssey is a story of a warrior who took 10 years to get home.
● Sometimes it felt like it would take that long to get up and running on Odyssey.
● We faced a few issues along the way (which we'll hear about shortly).
● But first, I'll give a brief introduction to Odyssey.
Slide from Introduction to Odyssey & RC Services by Robert Freeman and Plamen Krastev, FAS RC
[Diagram: Odyssey layout — login nodes, home & lab disks, network scratch, and compute nodes, including the holyseas01-04 “jacob” nodes (4 x 64 CPUs).]
Storage locations on Odyssey:
● /n/seasasfs02/YOUR_USER_NAME (long-term storage)
● /n/regal/jacob_lab/YOUR_USER_NAME (temp. storage)
● /n/home*/YOUR_USER_NAME

The jacob partition (ACMG only) accepts batch (B) and interactive (I) jobs: 18-hour default / 36-hour maximum run time; 256 CPUs (4 nodes x 64 CPUs/node); 288 GB/node (that's 4.5 GB/CPU).
You can request interactive (I) or batch (B) jobs to run in these queues via the SLURM resource manager.
NOTE: While there are issues with the jacob partition, you can try submitting to serial_requeue, especially if you are using 8 or fewer CPUs and a moderate amount of memory.
The general queue prioritizes high-memory jobs; low-memory jobs will pend for days or weeks (until RC fixes this!)
Simple Linux Utility for Resource Management (SLURM)
● You ask SLURM for the following resources:
  – Amount of memory for the job
  – Amount of time for the job
  – Number of CPUs and nodes for the job
  – Interactive session (srun) or queued session (sbatch)
  – Run queue (aka “partition”)
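For example, a minimal sbatch script requesting those resources might look like the sketch below. The partition limits come from the jacob slide earlier; the executable name, memory, and time values are illustrative, not a prescribed setup:

```shell
#!/bin/bash
#SBATCH -p jacob              # run queue (partition)
#SBATCH -N 1                  # number of nodes
#SBATCH -n 8                  # number of CPUs
#SBATCH --mem=36000           # memory in MB (~4.5 GB per CPU)
#SBATCH -t 18:00:00           # wall-time limit (jacob default)
#SBATCH -o geos.%j.log        # log file (%j = SLURM job ID)

# Launch the (hypothetical) GEOS-Chem executable with OpenMP threads
export OMP_NUM_THREADS=8
./geos
```

You would submit this with `sbatch`; an interactive session is requested analogously, e.g. `srun --pty -p jacob -n 8 -t 4:00:00 bash`.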
Who came up with a name like SLURM?
SLURM was also the cola featured on the animated series “Futurama”: it was what people drank in the year 3000 (a parody of Coke and Pepsi). “Slurms Mackenzie”, the party slug, was the mascot for Slurm on the show (pictured at left). The gag was that Slurm was highly addictive, so people couldn't stop drinking it!
GCST has also created local documentation on the AS wiki pages about how to log into Odyssey and run jobs!
Cc: help@as.harvard.edu (i.e. Judit) and geos-chem-support@as.harvard.edu
To get to the AS wiki, click on this link!
Direct Link: http://wiki.as.harvard.edu/wiki/doku.php
Direct Link: http://wiki.as.harvard.edu/wiki/doku.php/wiki:basics
Direct Link: http://wiki.as.harvard.edu/wiki/doku.php/wiki:as:startup_scripts
Direct Link: http://wiki.as.harvard.edu/wiki/doku.php/wiki:as:slurm
Bumps in the road ...
● During the migration process, several unforeseen issues came up that demanded our attention.
● Case in point, GCHP:
  – Mike, Matt, and Jiawei (Jintai Lin's student) were early users of GCHP on Odyssey.
  – But they immediately ran into a couple of serious roadblocks...
Mike Long wrote to FAS RC (June 2015):
I failed to notice that you only have TWO Intel licenses. This WILL BE A HUGE PROBLEM for us. Please let me know what we have to do to remedy this soon. Thanks.
ML
And the response that Mike got from RC was:
1. Use a free compiler like gcc/gfortran.
2. Don't compile as often, in reality the contention for the Intel compiler isn't that high. When I build software I only stall out once a month maybe, and we don't get many complaints. So unless you are building constantly or need to recompile on the fly it may not be a frequent issue.
3. Buy more licenses.
As it stands we do not plan to purchase more licenses. However if you want to make a case for it I can pass you along to … our operations manager. However, he may ask you to foot the bill for the additional licenses.
Bumps in the road ...
● This situation was made worse by users abusing the system:
Mike Long wrote again to FAS RC (1st week of June 2015):
A quick follow-up. Unfortunately the situation is disruptive. A user...appears to have an automatically running script that loads an Intel-based OpenMPI, compiles and runs a program. It is completely monopolizing the Intel licenses leaving me absolutely dead in the water. Unfortunately, we rely upon the Intel system for our work.
Is there a way to ask Mr. ____ to amend his procedure?
● We had to address this ASAP.
Bumps in the road ...
● The way that we solved this was that we ported our existing Intel Fortran Compiler licenses (v11) to Odyssey.
● We also brought over our existing netCDF/HDF5 libraries that were compiled with Intel Fortran Compiler v11.
● These are sufficient for most Jacob-group users, who are working with GEOS-Chem “classic”.
Jobs running slower on Odyssey

Lu Hu wrote (11/5/2015 1:58 PM)
Here are some examples (see below). I got a set of runs, same settings but for different years. The runtime to finish them varies from normally ~20 hours, to 32 hours, and many times >36 hours.
All of these jobs were running on /n/regal
~29.5h
===> SIMULATION START TIME: 2015/08/25 20:19 <===
===> SIMULATION END TIME:   2015/08/27 02:53 <===

~20h
===> SIMULATION START TIME: 2015/08/26 22:22 <===
===> SIMULATION END TIME:   2015/08/27 19:22 <===

~19.5h
===> SIMULATION START TIME: 2015/09/12 06:23 <===
===> SIMULATION END TIME:   2015/09/13 01:07 <===

~32.5h
===> SIMULATION START TIME: 2015/09/17 18:05 <===
===> SIMULATION END TIME:   2015/09/19 02:44 <===
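The wall-clock span of each run can be checked directly from those log lines. A quick sketch with GNU `date`, using the timestamps from the ~20h example (the bracketing system times come out a bit above the quoted approximate labels):

```shell
# Timestamps from the ~20h run above (slashes changed to dashes for GNU date)
start="2015-08-26 22:22"
end="2015-08-27 19:22"

# Convert each timestamp to epoch seconds, then take the difference in hours
s=$(date -d "$start" +%s)
e=$(date -d "$end" +%s)
echo "elapsed: $(( (e - s) / 3600 )) hours"   # → elapsed: 21 hours
```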
Katie Travis wrote (11/3/2015 5:16 PM)
Judit, my job 50950707 has run 10 days in 6 hours, which is about 2/3 slower than it should be.
Thanks,
Katie
Rachel Silvern wrote (11/3/2015 5:22 PM)
I have been running into issues trying to run a one-month 4x5 v9-02 (SEAC4RS) simulation this afternoon. My run keeps dying at stratospheric chemistry during the first timestep [with a segfault].

I have a couple of job numbers: 50974904, 50972514, 50967636. I'm not sure if this is related to issues on the partition or I have my own mysterious issues.
Jobs running slower on Odyssey
Melissa Sulprizio wrote (Thu 11/5/2015 4:05 PM)
Here are the job stats for our recent 1-month benchmark simulations. These ran on AS (v11-01a, v11-01b), the previous holy2a110xx jacob nodes on Odyssey (v11-01c), and the current holyseas[01-04] jacob nodes on Odyssey (v11-01d).
I highlighted the elapsed time for the last two jobs because those runs hit the time limit that I had set. Those jobs should not have timed out because all of these jobs are for 1-month GEOS-Chem simulations and the previous run times were all well below 12 hours. In fact, the last run (50518629) only got to day 16 out of 31 in the allotted 24 hours.
It’s important to note that some of the differences in run time and memory may have been affected by updates that we added to the model.
Jobs running slower on Odyssey
Melissa Sulprizio wrote again (11/6/2015 8:36 AM)
Finally, it's interesting to note that run times on AS were *MUCH* faster. We see typical run times of ~1 hour on AS for these 7-day GEOS-Chem simulations, as compared to 2-4 hours on Odyssey. I think this confirms our suspicion that the Odyssey nodes are not running faster than AS, like we were promised.
Jobs running slower on Odyssey
Matt writes: Over on the left (labeled “Initialization”), we have all of the I/O that’s done, and as you can see, there’s not much variation in it. There’s some small spikes for the blue bit, but the red is almost all the same.
The big kicker we have here is the “Timesteps” bars, which is effectively all computation the code is doing. Notice how on the red, there’s a variance of over 60 minutes between some runs, and for the blue, a variation of almost 50, and the I/O part has nowhere near that level of variation. The other three columns are a breakdown of the different types of science the code does, and the runtimes of each.
[Bar chart (courtesy of Matt Yannetti): run times in minutes (0-300) for GC w/ GEOS-FP and GC w/ GCAP2, broken down into Initialization, Timesteps, Chemistry, Transport, and Convection.]
Jobs running slower on Odyssey
● GCST felt that RC had not given us a satisfactory explanation of why GC was so much slower.
● DJJ and GCST brought these issues to FAS RC (via high-level meetings). RC was responsive.
  – Scott Yockel (RC) is working with us to diagnose the problem.
  – Scott is helping us run tests.
RC has been very responsive

GEOS-CHEM Group,
Thank you all for meeting today. Sometimes it helps a lot on both ends to meet face-to-face. Please continue to send in tickets anytime you have issues with the cluster. Please address those tickets to Dan Caunt and myself as we will make sure to fully answer your questions/concerns in a timely manner. Also, please spread this to the rest of the Jacobs group that were not at the meeting today. At any point that you are dealing with someone else in RC via our ticketing system, chat, or Office Hours and you are having issues dealing with that individual, please email me directly. And, if you have issues with how I’m handling something, please email James Cuff. I’m here to make sure that your research throughput is optimal and that RC resources are attractable.
I’ve already closed off serial_requeue jobs from the jacobs nodes. After 51173945 job is finished, I’ll reboot holyseas01 with the lower processor count (essentially hyper threading disabled). Then I’ll run the test provided below on that host to start getting some baseline of timings. Lizzie, if you can also provide me with the runtimes for that same job on the AS cluster too that would be helpful for comparison.
Working hypothesis
● Differences in performance may be due to the different architecture of our holyseas01-04 nodes
  – Holyseas nodes use AMD CPUs (version: “Opteron”)
  – AS cluster uses Intel CPUs (version: “Nehalem”)
● “Hyperthreading” (each CPU acting as 2)
  – May also degrade performance when nodes are busy
  – Recall other users' jobs are “backfilling” onto holyseas01-04 CPUs to maximize usage
We did some testing ...
● Scott was able to (temporarily) eliminate sources of variability on the holyseas nodes– Turned off “hyperthreading”– Turned off “backfilling”
● Simulations done– 1-year Rn-Pb-Be simulations (Melissa)– 12-hour “benchmark” simulation (Lizzie, Scott)
Results of our tests

● Melissa ran a set of 1-yr Rn-Pb-Be simulations
  – On both AS and Odyssey
● But run times are still slower than on AS
  – By up to a factor of 2
Results of our tests

● Scott ran the 12-hr “benchmark” test on Odyssey
  – Using varying options, runs took ~10-12 minutes
Results of our tests

● Lizzie ran the 12-hr “benchmark” tests on Odyssey
  – Consistent run times obtained on Odyssey
  – But runs take 2X as long on Odyssey as on AS
Scott Yockel wrote (Fri 11/13, 10:23 AM)
In all of this testing, I see only about 1-2 min variability in the run times (on Odyssey), even when I run 4 jobs concurrently. I even tried running on a different host (holyhbs03) that doesn't have the memory cap on it to see if more memory would make it run faster and do less swapping. I didn't find that to be the case either.
After watching many of these runs, there is a substantial amount of time when the CPUs are not at 100%. This means that the CPUs are waiting for instructions; this may be due to times when the code is fetching data from files, or something else. I'm guessing that some compilation options may make some difference here. The Intel 11 compiler came out in 2008 with an update in 2009 (11.1). These AMD Opteron 6376 chips were launched in late 2012, so there are some chipset features that can be used that Intel 11.1 doesn't even know about. For example, the floating point unit on these chips is 256 bits wide, which means it could do 4 x 64-bit operations simultaneously. However, if the compiled code isn't aware of that feature, then it will not feed it 4 instructions per clock cycle. So effectively, instead of 2.3 GHz you are cutting that down by half or even by a quarter.
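Scott's point about instruction-set targeting can be illustrated with the Intel Fortran compiler's code-generation flags. A sketch only: the flag behavior is as documented for newer ifort versions, and the source/output file names are hypothetical:

```shell
# -x flags (e.g. -xAVX) generate CPU-dispatched code whose optimized
# path runs only on genuine Intel CPUs; on an AMD Opteron the binary
# falls back to a slower baseline path.
ifort -O2 -xAVX -o geos_intel geos.f90

# -m flags (e.g. -mavx) request the instruction set unconditionally,
# so the 256-bit FPU on the Opteron 6376 can actually be exercised.
ifort -O2 -mavx -o geos_amd geos.f90
```

Our Intel v11 licenses predate AVX entirely, so neither option is available to us without moving to a newer compiler.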
Results of our tests
But ... switching to a newer compiler version doesn't make that much difference.
Summary of our findings
● Testing is still continuing
  – Scott was away this week (back next week).
  – We may need to try to find the right combination of SLURM options.
  – But Intel Fortran may not work as well on AMD chips.
    ● This may be an Intel marketing decision designed to crush AMD. This is a longstanding AMD complaint.
  – Worst case: RC has offered to replace the AMD nodes w/ Intel nodes, if our testing suggests that is necessary.
http://techreport.com/news/8547/does-intel-compiler-cripple-amd-performance
Article from 2005.
Not sure if situation has improved since then.
GCHP test by Seb Eastham shows that input from /n/regal is the fastest

● GCHP @ 4x5, 1-day runs w/ 6 CPUs (average of 10 runs)
● Disk I/O is the most expensive operation
Questions and Discussion