Practical Guidelines for Moab Stacks
![Page 1: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/1.jpg)
© 2013 ADAPTIVE COMPUTING, INC. 1
Practical Guidelines for Highly Available Moab Stacks
Daniel Hardman, Chief Solutions Architect
@dhh1128 ~ http://codecraft.co ~ http://gplus.to/danielhardman ~ http://lnkd.in/z7PTAR
April 2013
![Page 2: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/2.jpg)
The Goal of HA
…NOT! :-)
![Page 3: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/3.jpg)
The real goals of HA
▪ Eliminate or reduce “downtime” for running jobs
▪ Eliminate or reduce “downtime” for new submissions
▪ Make failovers visible and manageable
▪ Satisfy regulatory requirements
▪ Preserve audit trail
![Page 4: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/4.jpg)
HA is constrained by time, money
How much are you willing to spend to tolerate:
▪ A power outage?
▪ A software crash?
▪ A hacker from Unit 61398 in Shanghai?
▪ The Chelyabinsk meteor?
▪ The Chicxulub meteor that wiped out the dinosaurs?
![Page 5: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/5.jpg)
What is “downtime”?
0 – hardware failure
+3 min – usable, but very slow
-30 min – last checkpoint
+10 min – full restore
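The four timestamps above give different answers depending on which interval is counted as "downtime". A minimal Python sketch of that arithmetic follows; the class, function, and key names are invented for illustration, and only the minute values come from the slide.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OutageTimeline:
    """Minutes relative to the hardware failure at t = 0 (values from the slide)."""
    last_checkpoint: float = -30.0   # most recent saved state before the failure
    degraded_service: float = 3.0    # usable again, but very slow
    full_restore: float = 10.0       # back to normal service


def downtime_readings(t: OutageTimeline) -> Dict[str, float]:
    """Return several defensible answers to 'how long were we down?'."""
    return {
        # No service at all: from the failure until the system is usable again.
        "hard_outage": t.degraded_service,
        # Impaired service: from the failure until full restore.
        "until_full_restore": t.full_restore,
        # Work done since the last checkpoint that may have to be redone.
        "work_at_risk": 0.0 - t.last_checkpoint,
        # Widest view: from the last checkpoint until full restore.
        "checkpoint_to_full_restore": t.full_restore - t.last_checkpoint,
    }


if __name__ == "__main__":
    for name, minutes in downtime_readings(OutageTimeline()).items():
        print(f"{name}: {minutes:g} min")
```

Depending on the definition chosen, the same incident reads as 3, 10, 30, or 40 minutes, so an HA target only means something once the interval being measured is agreed on.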
![Page 6: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/6.jpg)
4 Basic Recipes
▪ Simple built-in HA
▪ Standard pairwise HA
▪ Shared pairwise HA
▪ Advanced HA
![Page 7: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/7.jpg)
Recipe 1: simple, built-in HA
![Page 8: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/8.jpg)
Simple, built-in HA
▪ hot ~ warm (daemons idle on fallback svr)
▪ Moab, TORQUE
  ▪ shared file system, synced clocks, two daemons, last mod date on semaphore
▪ MAM
  ▪ DB replication, primary and fallback server
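The "last mod date on semaphore" idea can be sketched in Python as follows. This is an illustration of the general pattern only, not Moab's or TORQUE's actual implementation; the function names, the path argument, and the `max_age_s` threshold are all assumptions.

```python
import os
import time
from typing import Optional


def touch_semaphore(path: str) -> None:
    """Primary side: refresh the semaphore file's last-modified time."""
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)  # set access and modification times to "now"


def primary_looks_dead(path: str, max_age_s: float,
                       now: Optional[float] = None) -> bool:
    """Fallback side: True if the semaphore is missing or stale.

    'now' is read from the fallback server's own clock unless supplied,
    while the file's mtime was stamped elsewhere.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return age > max_age_s
```

In this sketch the primary would call `touch_semaphore` on a timer and the fallback would poll `primary_looks_dead`. Because the comparison mixes a timestamp written on the shared file system with the fallback's local clock, clock skew or delayed metadata propagation larger than `max_age_s` would look exactly like a dead primary.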
![Page 9: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/9.jpg)
Sample deployment (simple, built-in HA)
![Page 10: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/10.jpg)
Pros and cons (simple, built-in HA)
▪ Pros
  ▪ Fast and easy to set up
  ▪ Minimal learning curve
▪ Cons
  ▪ Doesn’t protect the solution DB, MWS, Viewpoint
  ▪ Depends on synchronized clocks, reliable propagation of file metadata in shared fs
  ▪ Risk of false triggers
  ▪ Shared FS may be single point of failure, depending on how it’s implemented
![Page 11: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/11.jpg)
Recipe 2: standard, pairwise HA
![Page 12: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/12.jpg)
Standard, pairwise HA
▪ Twin headnodes (all daemons)
▪ hot ~ cold (daemons inert on fallback svr)
▪ Heartbeat, Red Hat clustering
▪ Replicated FS (DRBD)
![Page 13: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/13.jpg)
Sample deployment (standard, pairwise HA)
![Page 14: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/14.jpg)
Pros and cons (standard, pairwise HA)
▪ Pros
  ▪ All services fail over the same way
  ▪ Heartbeat is robust, well understood
  ▪ FS can’t be a single point of failure
▪ Cons
  ▪ Some vulnerability to “split brain” scenario
  ▪ More learning curve
  ▪ More complexity than simple, built-in HA
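The "split brain" risk can be illustrated with a toy Python model of two head nodes that rely only on a peer heartbeat. It is a deliberately naive sketch and does not model the Heartbeat package or Red Hat clustering; the class, the function, and the host names are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeadNode:
    name: str
    active: bool          # True if this node currently runs the daemons
    alive: bool = True    # False if the node itself has failed

    def on_heartbeat_check(self, heard_peer: bool) -> None:
        """Naive rule: if the peer is silent, assume it died and take over."""
        if self.alive and not heard_peer:
            self.active = True


def run_check(a: HeadNode, b: HeadNode, link_up: bool) -> List[str]:
    """Deliver one round of heartbeats and return the names of active nodes."""
    a_hears_b = link_up and b.alive
    b_hears_a = link_up and a.alive
    a.on_heartbeat_check(a_hears_b)
    b.on_heartbeat_check(b_hears_a)
    return [n.name for n in (a, b) if n.alive and n.active]


if __name__ == "__main__":
    # Primary really dies: the fallback takes over, as intended.
    print(run_check(HeadNode("head1", True, alive=False),
                    HeadNode("head2", False), link_up=True))
    # Only the heartbeat link fails: both nodes end up active.
    print(run_check(HeadNode("head1", True),
                    HeadNode("head2", False), link_up=False))
```

From the fallback's point of view a dead peer and a dead link are indistinguishable, so in the second scenario both nodes believe they are primary and may write to their replicated copies independently.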
![Page 15: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/15.jpg)
Recipe 3: shared, pairwise HA
![Page 16: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/16.jpg)
Shared, pairwise HA
▪ Twin headnodes (all daemons)
▪ hot ~ warm (some daemons inert, some idle on fallback svr)
▪ Heartbeat, Red Hat clustering
▪ DB failover
▪ Shared FS (e.g., GFS2)
![Page 17: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/17.jpg)
Sample deployment (shared, pairwise HA 1)
![Page 18: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/18.jpg)
Sample deployment (shared, pairwise HA 2)
![Page 19: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/19.jpg)
Pros and cons (shared, pairwise HA)
▪ Pros
  ▪ Solves “split brain” scenario
  ▪ May have slightly lower latency
▪ Cons
  ▪ Greater learning curve
  ▪ More complexity
![Page 20: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/20.jpg)
Recipe 4: advanced HA
![Page 21: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/21.jpg)
Advanced HA
▪ Each service (potentially) split onto dedicated box
▪ Daemons are paired and fail over with heartbeat, Red Hat clustering
▪ DB failover
▪ Replicated or shared FS
![Page 22: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/22.jpg)
Advanced HA
This is less of a recipe, and more of a general pattern. Each unique server role has to have N-way redundancy. Complexity of config is high; we recommend involvement of professional services.
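The requirement that every server role has N-way redundancy can be checked mechanically against a deployment inventory. A minimal Python sketch, assuming a plain role-to-hosts mapping; the function name, the layout, and all host names are hypothetical.

```python
from typing import Dict, List


def under_provisioned(roles: Dict[str, List[str]], n: int = 2) -> Dict[str, int]:
    """Map each role with fewer than n distinct hosts to the number it is missing."""
    return {
        role: n - len(set(hosts))
        for role, hosts in roles.items()
        if len(set(hosts)) < n
    }


# Hypothetical layout; host names are invented for illustration.
layout = {
    "moab": ["sched1", "sched2"],
    "torque": ["rm1", "rm2"],
    "mam": ["acct1"],
    "mws": ["web1", "web1"],        # the same box listed twice is not redundancy
    "viewpoint": ["web1", "web2"],
    "database": [],
}

if __name__ == "__main__":
    for role, missing in under_provisioned(layout).items():
        print(f"{role}: needs {missing} more host(s)")
```

A check like this only confirms that spare hosts exist for each role. It says nothing about whether failover between them actually works, which is where most of the configuration complexity lies.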
![Page 23: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/23.jpg)
Pros and cons (advanced HA)
▪ Pros
  ▪ Can meet very aggressive SLAs
  ▪ Can be tailored and fine-tuned
▪ Cons
  ▪ Major implementation effort
  ▪ Requires sophisticated learning and monitoring
![Page 24: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/24.jpg)
General Observations
▪ Important to audit
▪ Super-fast failover not a goal in our recipes
▪ Security implications
▪ Not perf enhancer
▪ Not scalability enhancer
▪ Not DR
![Page 25: Practical Guidelines for Moab Stacks](https://reader030.fdocuments.net/reader030/viewer/2022032714/55abceeb1a28ab90228b46b7/html5/thumbnails/25.jpg)
More Info
Whitepaper now available. Email me ([email protected]) for a copy, or download from /documents/ha-moab-cloud-hpc.pdf. Documentation for Hopper release includes a new HA task guide for simple, built-in HA configuration.