Rev Up Your HPC Engine
![Page 1: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/1.jpg)
Rev Up Your HPC Engine
Fritz Ferstl, CTO, Univa Corp
![Page 2: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/2.jpg)
Who is Univa?
Copyright © 2014 Univa Corporation. All Rights Reserved.
• Profile
  • Based in Chicago, global reach
  • >500 customers in 3 yrs (mostly Fortune 500)
• Products / Technologies:
  • Univa Grid Engine
  • UniSight
  • Univa License Orchestrator
  • UniCloud
Data Center Automation Experts
Do more with less in Big Compute and Big Data
Help organizations play a better game of Tetris
![Page 3: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/3.jpg)
Challenges for Workload and Resource Management Systems
![Page 4: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/4.jpg)
Scalability
• Node counts stay flat or go down, sockets stay flat, cores explode
  • With the core explosion, the number of jobs also explodes
  • Ever shorter run-times, more applications, more use cases
• Large commercial sites approach or go beyond 100K
• Throughput clusters process >150 million jobs / month
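To put the throughput figure in perspective, the monthly job count can be converted into a sustained rate. The sketch below assumes a 30-day month and perfectly even load; neither assumption comes from the slide, and real clusters see bursts well above the average.

```python
# Back-of-the-envelope conversion of a monthly job count to a sustained rate.
# Assumption (not from the slide): a 30-day month with evenly spread load.
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_month: float, days_per_month: int = 30) -> float:
    """Average number of jobs that must complete each second."""
    return jobs_per_month / (days_per_month * SECONDS_PER_DAY)


rate = jobs_per_second(150_000_000)
print(f"{rate:.1f} jobs/s sustained")
```

Under these assumptions, 150 million jobs per month works out to roughly 58 jobs per second, around the clock.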
![Page 5: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/5.jpg)
Heterogeneity
• Hardware
  • Multi-sockets, multi-cores
  • Partial cluster upgrades
  • Evolving memory, network and storage architectures
  • Accelerators: GPUs, Phi
• Job Profiles
  • Throughput
  • Array Jobs
  • Large Parallel
  • Interactive
  • Sessions
  • Reservations
  • Transactional
  • Hybrid
  • Dependencies, Workflows
![Page 6: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/6.jpg)
Policy Variety
• Automated Transparency?
• Manual overrides
• Preferential access
• Priorities
• Reservations
• Resource Urgencies
• Quotas
• Deadlines
• Conflict Resolution
  • E.g. don't starve large parallel jobs while maintaining high utilization
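One common way to reconcile these two goals is backfilling around a reservation: the large parallel job gets a guaranteed start time, and smaller jobs may use the idle cores only if they finish before it. The following is a minimal sketch of that idea, not the algorithm of any particular product; the `Job` class, the job names and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    runtime: int  # requested run-time limit, in minutes


def pick_backfill(queue, free_cores, now, reservation_start):
    """Return the jobs that may start now without delaying the reservation.

    A job qualifies if it fits into the currently free cores and its
    requested run-time ends no later than the reserved start time of
    the large parallel job.
    """
    started = []
    for job in queue:
        fits = job.cores <= free_cores
        done_in_time = now + job.runtime <= reservation_start
        if fits and done_in_time:
            started.append(job)
            free_cores -= job.cores
    return started


# Hypothetical situation: 16 cores are idle now, and a large parallel job
# holds a reservation that starts in 60 minutes.
queue = [
    Job("short-a", 8, 30),
    Job("long-b", 4, 120),
    Job("short-c", 8, 45),
    Job("short-d", 4, 10),
]
chosen = pick_backfill(queue, free_cores=16, now=0, reservation_start=60)
print([job.name for job in chosen])
```

Here `long-b` is held back because it would run past the reservation, and `short-d` no longer fits once the first two short jobs have taken the idle cores. The scheme only works if users request realistic run-time limits.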
![Page 7: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/7.jpg)
Use Case Variety
• Classical HPC (simulation): large parallel / many mid-size parallel
• Verification / Test: throughput
• From single simulation to parameter study: array jobs
• Ultra-short jobs
• Big Data / Data Mining
• Exclusive usage of nodes vs shared usage
![Page 8: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/8.jpg)
Geographical Distribution / Clouds
• Resource sharing: servers, licenses, data, other
• Data access latencies
• Security
• File system dependencies
  • Pre-/Post-Staging
• Data locality:
  • Bring the job to the data
  • Or bring the data to the job
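The choice between the two data-locality options can be framed as a comparison of estimated start times. The sketch below is a deliberately simple model, assuming that pre-staging overlaps with queue wait at the remote site and that bandwidth is constant; the function names and all numbers are hypothetical.

```python
def transfer_seconds(data_gb: float, bandwidth_gbit_s: float) -> float:
    """Time to move data_gb gigabytes over a link of bandwidth_gbit_s Gbit/s."""
    return data_gb * 8 / bandwidth_gbit_s


def choose_placement(data_gb, bandwidth_gbit_s, wait_at_data_site_s, wait_at_remote_site_s):
    """Pick the option with the earlier estimated job start.

    Assumption: at the remote site the data is pre-staged while the job
    waits in the queue, so the job starts when both are done.
    """
    start_at_data_site = wait_at_data_site_s
    start_at_remote_site = max(
        wait_at_remote_site_s, transfer_seconds(data_gb, bandwidth_gbit_s)
    )
    if start_at_data_site <= start_at_remote_site:
        return "job-to-data"
    return "data-to-job"


# 500 GB over 1 Gbit/s takes 4000 s, longer than the 1800 s local queue wait.
print(choose_placement(500, 1.0, wait_at_data_site_s=1800, wait_at_remote_site_s=0))
# 10 GB takes only 80 s to stage, so the idle remote site wins.
print(choose_placement(10, 1.0, wait_at_data_site_s=1800, wait_at_remote_site_s=60))
```

A real decision would also weigh security constraints and file system dependencies, which this model ignores.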
![Page 9: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/9.jpg)
Solutions, Approaches, Best Practices
![Page 10: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/10.jpg)
Evolve
• Architecture Evolution
  • More cores / nodes / jobs: make it faster
  • Integration with GPUs, Phi, etc.
• New Scheduling Algorithms
  • Efficient handling of job mixes: parallel / array / sequential jobs
  • Scheduling of ultra-short jobs
• More Monitoring, Better Error Tracking
• Reporting, Accounting & Analytics
![Page 11: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/11.jpg)
Be Street-Smart
• Simplify where possible!
• Be-all solution can be the most expensive
  • Effort
  • Poor utilization, slow ROI
• Focus on most important goals
![Page 12: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/12.jpg)
Think Different
• Examples:
  • Less HA @ more throughput via fast SSD-RAID with regular back-up
  • Use array jobs wherever possible
  • More smaller jobs vs fewer bigger jobs
  • All considered, preemption may be a good option
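To illustrate the array-job recommendation: in Grid Engine an array job is submitted once with a task range (`qsub -t`), and each task reads its index from the `SGE_TASK_ID` environment variable. The sketch below shows a task script for a parameter study; the parameter list is hypothetical.

```python
import os

# Hypothetical parameter study: one array task per parameter value.
PARAMETERS = [0.1, 0.2, 0.5, 1.0, 2.0]


def parameter_for_task(task_id: int, parameters=PARAMETERS) -> float:
    """Map a 1-based array task index to its parameter value."""
    if not 1 <= task_id <= len(parameters):
        raise ValueError(f"task id {task_id} outside 1..{len(parameters)}")
    return parameters[task_id - 1]


# Grid Engine sets SGE_TASK_ID for each task of an array job. Fall back to
# task 1 when the variable is unset or not numeric, so the script also runs
# outside the scheduler or as a plain, non-array job.
raw_id = os.environ.get("SGE_TASK_ID", "1")
task_id = int(raw_id) if raw_id.isdigit() else 1
print(f"task {task_id}: simulating with parameter {parameter_for_task(task_id)}")
```

Submitted with a task range of 1-5, this yields one job with five tasks instead of five separate submissions, which keeps the load on the scheduler lower.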
![Page 13: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/13.jpg)
Accept Difference
• Simple: temporarily designate parts of cluster
• Advanced: Cloud-share
  • Share resources across separate workload management system instances
  • Dynamically re-assign resources (servers) based on demand
  • Provides autonomy while maintaining high utilization
• But avoid meta-scheduling where you can!
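The demand-based re-assignment of servers can be pictured as a simple rebalancing step between instances. This is a toy sketch under strong assumptions (servers are interchangeable, demand is known as a server count, moves are instantaneous); it does not describe how any product implements resource sharing, and the instance names are made up.

```python
def rebalance(instances):
    """Move spare servers from instances with idle capacity to those with backlog.

    `instances` maps an instance name to {"servers": int, "demand": int},
    where demand is the number of servers the current workload could use.
    Updates the mapping in place and returns a list of
    (donor, receiver, count) moves.
    """
    moves = []
    spare = {
        name: state["servers"] - state["demand"]
        for name, state in instances.items()
        if state["servers"] > state["demand"]
    }
    for name, state in instances.items():
        shortfall = state["demand"] - state["servers"]
        for donor in spare:
            if shortfall <= 0:
                break
            count = min(shortfall, spare[donor])
            if count == 0:
                continue
            instances[donor]["servers"] -= count
            state["servers"] += count
            spare[donor] -= count
            shortfall -= count
            moves.append((donor, name, count))
    return moves


# Hypothetical instances: one under-used, two with more demand than servers.
instances = {
    "eda": {"servers": 10, "demand": 4},
    "cfd": {"servers": 6, "demand": 9},
    "bio": {"servers": 4, "demand": 8},
}
moves = rebalance(instances)
print(moves)
print({name: state["servers"] for name, state in instances.items()})
```

Each instance keeps scheduling its own jobs; only the pool of servers shifts, which is what distinguishes this approach from meta-scheduling.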
![Page 14: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/14.jpg)
Tailored Solutions
• Tailoring & add-ons can make all the difference
• Tailoring such as
  • Job Classes
  • Customized reports
• Add-ons such as
  • Submission portals and wrappers
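A submission wrapper can be as small as a function that applies site defaults before handing off to `qsub`. The sketch below only builds the command line and does not execute it. The options used (`-cwd`, `-N`, `-l h_rt=`, `-t`) are standard Grid Engine `qsub` options, while the function name, defaults and script name are hypothetical.

```python
def build_qsub_command(script, name, runtime="01:00:00", tasks=None):
    """Build a qsub argument list with site defaults applied.

    runtime is a hard run-time limit (h_rt) in HH:MM:SS;
    tasks is an optional (first, last) range that makes this an array job.
    """
    cmd = ["qsub", "-cwd", "-N", name, "-l", f"h_rt={runtime}"]
    if tasks is not None:
        first, last = tasks
        cmd += ["-t", f"{first}-{last}"]
    cmd.append(script)
    return cmd


# Hypothetical usage: a 100-task parameter study with a 10-minute limit.
cmd = build_qsub_command("study.sh", "param-study", runtime="00:10:00", tasks=(1, 100))
print(" ".join(cmd))
```

Enforcing a run-time limit on every submission in a wrapper like this gives the scheduler the information it needs for backfilling and reservations.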
![Page 15: Rev Up Your HPC Engine](https://reader034.fdocuments.net/reader034/viewer/2022050920/54ba5fcf4a7959a10f8b456b/html5/thumbnails/15.jpg)
Conclusions
• Workload & Resource Management Systems are more necessary than ever
• Specifically in the “new” era of Cloud and Big Data
• Allows you to benefit from 20+ years of experience in HPC workload orchestration and to move beyond
• Clear-cut set of challenges, non-trivial solutions
• Build on best-in-class products, architectures and development teams
• Being “street-smart” about architecting and configuring a cluster has a big impact