Leveraging Hierarchy: Is this our Undiscovered Country?
John T. Daly
Undiscovered Country: Cost vs. Risk?
[Figure: log(performance) vs. time across successive ~15-year technology generations (Vector, Parallel(IN), Parallel(OUT), Exascale?), annotated with the challenges of each transition: data movement, concurrency, latency hiding]
Advanced Computing Systems (ACS)
• HPC capability doubles every 14 months, but data doubles every 9 months (see the arithmetic after this list)
• Innovative solutions required to bridge the gap
• Partner with industry, academia and national labs to develop technology enablers for next generation computing
• Generate a steady stream of capability; no “end goal” for scaling
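A quick check of what those two doubling times imply, treating both trends as smooth exponentials in months t (my arithmetic, not a claim from the slides):

\[
\frac{\mathrm{data}}{\mathrm{capability}} \propto \frac{2^{t/9}}{2^{t/14}} = 2^{t\left(\frac{1}{9}-\frac{1}{14}\right)} = 2^{5t/126},
\qquad \frac{126}{5} \approx 25.2\ \text{months}.
\]

So the data-to-capability gap itself doubles roughly every 25 months; absent new technology, the shortfall compounds rather than levels off.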
ACS: Bridge to research community
[Diagram: participatory research model in which the research community mirrors the agency’s compute and mission organization. Mission problems flow out to universities, national labs, government and industry (the CEC) as technical challenges; technical solutions flow back as mission capability]
ACS: technical thrusts + end-to-end
• Our HPC stakeholders
– System integrator optimizes power, performance and reliability for a set number of dollars
– System user optimizes usability, dependability and time-to-solution for a set number of deliverables
• Point solutions in six technical thrusts: power efficiency, chip I/O, interconnects, productivity, file I/O and resilience
• Innovative end-to-end solutions
– AMOEBA: chip level data movement and packaging
– MYRIAD(?): system level modeling and simulation
Extreme is not necessarily “balanced”
• Traditional HPC is an important part of ACS, but not the only part
• Dynamic design space drives the need for simulation and abstract machine model
• Goal: Scientific understanding in HPC
[Diagram: the six technical thrusts (chip I/O, interconnect, power efficiency, resilience, productivity, file I/O & storage), partitioned between “Traditional HPC and ACS too” and “Also ACS, but maybe not traditional HPC”]
Future “convergence”?
• Today
– Predictive science starts with an initial model and runs a numerical experiment to generate lots of data
– Data analytics starts with lots of data and extracts features or information that characterize the data
• Tomorrow
– Predictive science uses in situ data analytics to reduce the data storage and post-processing requirements (sketched below)
– Data analytics uses in situ predictive science to ask the question “what ought this data to look like?”
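A minimal sketch of the “tomorrow” direction for predictive science: instead of writing every timestep to storage for post-processing, the simulation reduces its own output in situ. The field, kernel and statistics here are hypothetical, purely for illustration.

```c
#include <stdio.h>
#include <math.h>

#define N 1000000
#define STEPS 100

/* Hypothetical simulation kernel: advance one field value. */
static double advance(double u, int step) {
    return u + 0.01 * sin(u + step);
}

int main(void) {
    static double field[N];
    for (int i = 0; i < N; i++) field[i] = (double)i / N;

    for (int step = 0; step < STEPS; step++) {
        /* In situ reduction: track summary statistics instead of
           writing the full N-element field to disk each step. */
        double min = field[0], max = field[0], sum = 0.0;
        for (int i = 0; i < N; i++) {
            field[i] = advance(field[i], step);
            if (field[i] < min) min = field[i];
            if (field[i] > max) max = field[i];
            sum += field[i];
        }
        /* Only a few bytes of summary leave the node per step. */
        printf("step %3d: min=%g max=%g mean=%g\n",
               step, min, max, sum / N);
    }
    return 0;
}
```

The design choice is the slide’s point: the analytics runs where the data already is, so storage and post-processing costs scale with the summary, not the raw data.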
Energy is the next shared resource
• Off node communication is over budget
• Off chip communication is over budget
(Source: DOE Architectures and Technology for Extreme Scale Computing workshop, San Diego, CA)
[Figure: the six technical thrusts: power efficiency, resilience, productivity, chip I/O, interconnect, file I/O]
Data is the challenge of scale
• Energy, performance and data integrity tapers are a function of the distance between the data and the processor
• Data locality is key to computing at scale for optimizing right answers per Joule per second (see the sketch after this list)
– Spatial locality allows me to grab more data in a single memory transaction
– Temporal locality allows me to use the same data multiple times before I have to move it
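A minimal illustration of both forms of locality, using loop order and blocking on matrix kernels (a standard textbook example, not from the slides):

```c
#include <stddef.h>

#define N 1024
#define B 64  /* block size chosen to fit in cache; a tuning assumption */

/* Spatial locality: walk row-major data in row-major order so each
   cache line fetched is fully used before it is evicted. */
void scale_rows(double a[N][N], double s) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)   /* contiguous inner loop */
            a[i][j] *= s;
}

/* Temporal locality: blocked matrix multiply reuses each loaded
   B x B tile many times before moving on. Assumes c is zeroed. */
void matmul_blocked(const double a[N][N], const double b[N][N],
                    double c[N][N]) {
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t jj = 0; jj < N; jj += B)
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t k = kk; k < kk + B; k++)
                        for (size_t j = jj; j < jj + B; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```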
A role for NV in the hierarchy
[Figure: memory hierarchy. Image source: http://www.bit-tech.net/hardware/memory/2007/11/15/the_secrets_of_pc_memory_part_1/3]
Node architecture = “shops” of data
• Byte/Word addressable memory up and down the stack, block synchronous between stacks
• Control is data aggregator (e.g., gather/scatter; sketched below)
[Diagram: two memory stacks, each with a processor/control layer above per-stack control logic, over layers of RAM/NVRAM and RAM]
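A minimal sketch of the control-as-aggregator idea: gather scattered elements into a dense buffer near the consumer, operate on it with full spatial locality, then scatter results back. Function names are illustrative only.

```c
#include <stddef.h>

/* Gather: pull scattered elements into a dense buffer so subsequent
   accesses enjoy spatial locality. */
void gather(double *dst, const double *src,
            const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Scatter: push results back to their home locations. */
void scatter(double *dst, const double *src,
             const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```

Done in memory-side control logic rather than on the processor, this is one way the random-access cost stays close to the data instead of crossing the whole hierarchy.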
Exploiting Spatial Locality
• Fractal Memory (a Morton-order sketch follows this list)
– Create a virtual mapping of data lines to space filling curves (e.g., Jin and Mellor-Crummey, “Using Space-filling Curves for Computation Reordering”)
– Use memory control logic to resolve mappings
– Dynamic mapping by user via PM interface
• Move work to data
– Adaptive mesh refinement is a refine operation spawned at another memory component
– Map memory references back to processor
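A minimal sketch of one such mapping: the Z-order (Morton) curve, which interleaves coordinate bits so that cells adjacent in 2D tend to land near each other in linear memory. This is a generic illustration, not the memory-controller logic the slide proposes.

```c
#include <stdint.h>

/* Spread the low 16 bits of x so there is a zero bit between each
   pair of original bits (classic bit-interleave trick). */
static uint32_t part1by1(uint32_t x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

/* Morton index of cell (x, y): neighboring cells in 2D map to
   nearby linear addresses, improving spatial locality of sweeps. */
uint32_t morton2d(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```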
Exploiting Temporal Locality
• Global one-sided memory model
– Different processors updating same values in PDE solver creates race conditions
– You’re going to get the wrong answer anyway, so checkpoint asynchronously and use QMU
– Inherently resilient algorithms that avoid global synchronization
• Reconfigurable hierarchy: “cache” vs. “scratch pad” (see the sketch after this list)
– “Cache” is seamless and easy to use, but sometimes I’d like to be able to bypass it
– “Scratch pad” avoids duplicating memory and can be higher performing, but it is harder to use
– Is SSD going to work like “cache” or “scratch pad”?
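A minimal sketch of the scratch-pad style of temporal locality: explicitly stage a tile into a small local buffer, reuse it many times, then write it back, rather than trusting the cache to keep it resident. An ordinary stack array stands in for a real scratch pad here.

```c
#include <stddef.h>

#define TILE 256

/* Apply 'sweeps' relaxation passes to each tile of a 1-D field.
   Staging into 'pad' makes the reuse explicit: each tile is moved
   once, used 'sweeps' times, then moved back. (Tile boundaries are
   ignored for brevity.) */
void relax_tiled(double *field, size_t n, int sweeps) {
    double pad[TILE];                       /* stand-in scratch pad */
    for (size_t t = 0; t + TILE <= n; t += TILE) {
        for (size_t i = 0; i < TILE; i++)   /* stage in (one move) */
            pad[i] = field[t + i];
        for (int s = 0; s < sweeps; s++)    /* reuse many times */
            for (size_t i = 1; i + 1 < TILE; i++)
                pad[i] = 0.5 * pad[i] + 0.25 * (pad[i-1] + pad[i+1]);
        for (size_t i = 0; i < TILE; i++)   /* stage out (one move) */
            field[t + i] = pad[i];
    }
}
```

This is exactly the “harder to use” trade the slide names: the programmer owns the data movement, but each word crosses the hierarchy twice instead of once per sweep.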
Motivating example: Exa-sorting
• Many linear solution methods are already robust against errors and data race conditions (e.g. multigrid methods)
• What about an application like sorting?
– Gradient descent approach is robust under errors* and can be parallelized asynchronously (toy sketch below)
– Suggests possibility for research in asynchronous parallel minimization approach for other classes of problems
• How about non-linear solvers?
– Analogy in minimization of the objective function via solution of the adjoint problem?
– What about chaotic systems?
* Joseph Sloan, David Kesler, Rakesh Kumar, and Ali Rahimi. “A Numerical Optimization-based Methodology for Application Robustification: Transforming Applications for Error Tolerance”. DSN 2010, Chicago, July 2010.
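A toy sketch of the error-robustness claim (not the formulation from Sloan et al.): gradient descent on a convex objective self-corrects, so an occasional corrupted update only delays convergence rather than breaking it. The fault injection is simulated.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimize f(x) = (x - 5)^2 by gradient descent while occasionally
   corrupting the iterate to mimic a transient soft error. Every
   later step descends from wherever the error left us, so the
   iteration still converges. */
int main(void) {
    double x = 0.0, lr = 0.1;
    srand(42);
    for (int k = 0; k < 200; k++) {
        double grad = 2.0 * (x - 5.0);
        x -= lr * grad;
        /* Inject faults only in early iterations so the recovery is
           visible by the end; in practice one iterates until a
           convergence test passes. ~2% fault rate. */
        if (k < 150 && rand() % 50 == 0)
            x += (rand() % 100) - 50;    /* corrupt the iterate */
    }
    printf("x = %f (target 5.0)\n", x);
    return 0;
}
```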
From the user/developer perspective
• Domain specific language to serve as portable wrapper for domain user and SME
• Support for globally addressable memory space
• Easy one-sided and two-sided, synchronous and asynchronous access to remote data (see the MPI sketch below)
• Intuitive mechanism for lightweight thread creation and remote task invocation
• Application control over dynamically reconfigurable memory (hardware cache, software cache and software scratch) at each level of the memory hierarchy (chip, node and storage)
• Tools for monitoring memory and energy utilization, so I know when I’m swapping to DIMM!
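One-sided remote access already exists in today’s programming models; a minimal MPI-3 sketch of the style this wish list asks for (rank 0 writes into rank 1’s window without rank 1 posting a receive):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* run with >= 2 ranks */

    /* Each rank exposes one double through an RMA window. */
    double local = (double)rank;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        double payload = 42.0;
        /* One-sided: rank 1 takes no part in this transfer. */
        MPI_Put(&payload, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);   /* completes the access epoch */

    if (rank == 1)
        printf("rank 1 now holds %g\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```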
Conclusions
• Exascale arrives at the end of the technology generation bridging concurrency to data: risk or opportunity?
• Traditional algorithms + architectures too expensive in power, performance and reliability if data leaves cache
• Rethinking computation may yield large ROI
– models of computation
– “balanced architecture”
– predictive science vs. data analytics
• Required to facilitate new approaches
– programming models and tools
– simulation and modeling framework
– vendor partnerships and technology investment