Leveraging Hierarchy: Is this our Undiscovered Country?
John T. Daly
Undiscovered Country: Cost vs. Risk?
[Figure: log(performance) vs. time across successive ~15-year technology generations (Vector, Parallel(IN), Parallel(OUT), Exascale?), annotated with the challenges of each transition: data movement, concurrency, latency hiding]
Advanced Computing Systems (ACS)
• HPC capability doubles every 14 months, but data doubles every 9 months (see the arithmetic after this list)
• Innovative solutions required to bridge the gap
• Partner with industry, academia and national labs to develop technology enablers for next generation computing
• Generate a steady stream of capability; no “end goal” for scaling
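A quick check of what those two doubling times imply, treating both trends as smooth exponentials in months t (my arithmetic, not a claim from the slides):

\[
\frac{\mathrm{data}}{\mathrm{capability}} \propto \frac{2^{t/9}}{2^{t/14}} = 2^{t\left(\frac{1}{9}-\frac{1}{14}\right)} = 2^{5t/126},
\qquad \frac{126}{5} \approx 25.2\ \text{months}.
\]

So the data-to-capability gap itself doubles roughly every 25 months; absent new technology, the shortfall compounds rather than levels off.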
ACS: Bridge to research community
[Diagram: participatory research model in which the research community mirrors the agency’s compute and mission organization. Mission problems flow out to universities, national labs, government and industry (the CEC) as technical challenges; technical solutions flow back as mission capability]
ACS: technical thrusts + end-to-end
• Our HPC stakeholders
– System integrator optimizes power, performance and reliability for a set number of dollars
– System user optimizes usability, dependability and time-to-solution for a set number of deliverables
• Point solutions in six technical thrusts: power efficiency, chip I/O, interconnects, productivity, file I/O and resilience
• Innovative end-to-end solutions
– AMOEBA: chip level data movement and packaging
– MYRIAD(?): system level modeling and simulation
Extreme is not necessarily “balanced”
• Traditional HPC is an important part of ACS, but not the only part
• Dynamic design space drives the need for simulation and abstract machine model
• Goal: Scientific understanding in HPC
[Diagram: the six technical thrusts (chip I/O, interconnect, power efficiency, resilience, productivity, file I/O & storage), partitioned between “Traditional HPC and ACS too” and “Also ACS, but maybe not traditional HPC”]
Future “convergence”?
• Today
– Predictive science starts with an initial model and runs a numerical experiment to generate lots of data
– Data analytics starts with lots of data and extracts features or information that characterize the data
• Tomorrow
– Predictive science uses in situ data analytics to reduce the data storage and post-processing requirements (sketched below)
– Data analytics uses in situ predictive science to ask the question “what ought this data to look like?”
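A minimal sketch of the “tomorrow” direction for predictive science: instead of writing every timestep to storage for post-processing, the simulation reduces its own output in situ. The field, kernel and statistics here are hypothetical, purely for illustration.

```c
#include <stdio.h>
#include <math.h>

#define N 1000000
#define STEPS 100

/* Hypothetical simulation kernel: advance one field value. */
static double advance(double u, int step) {
    return u + 0.01 * sin(u + step);
}

int main(void) {
    static double field[N];
    for (int i = 0; i < N; i++) field[i] = (double)i / N;

    for (int step = 0; step < STEPS; step++) {
        /* In situ reduction: track summary statistics instead of
           writing the full N-element field to disk each step. */
        double min = field[0], max = field[0], sum = 0.0;
        for (int i = 0; i < N; i++) {
            field[i] = advance(field[i], step);
            if (field[i] < min) min = field[i];
            if (field[i] > max) max = field[i];
            sum += field[i];
        }
        /* Only a few bytes of summary leave the node per step. */
        printf("step %3d: min=%g max=%g mean=%g\n",
               step, min, max, sum / N);
    }
    return 0;
}
```

The design choice is the slide’s point: the analytics runs where the data already is, so storage and post-processing costs scale with the summary, not the raw data.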
Energy is the next shared resource
• Off node communication is over budget
• Off chip communication is over budget
(Source: DOE Architectures and Technology for Extreme Scale Computing workshop, San Diego, CA)
[Figure: the six technical thrusts: power efficiency, resilience, productivity, chip I/O, interconnect, file I/O]
Data is the challenge of scale
• Energy, performance and data integrity tapers are a function of the distance between the data and the processor
• Data locality is key to computing at scale for optimizing right answers per Joule per second (see the sketch after this list)
– Spatial locality allows me to grab more data in a single memory transaction
– Temporal locality allows me to use the same data multiple times before I have to move it
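A minimal illustration of both forms of locality, using loop order and blocking on matrix kernels (a standard textbook example, not from the slides):

```c
#include <stddef.h>

#define N 1024
#define B 64  /* block size chosen to fit in cache; a tuning assumption */

/* Spatial locality: walk row-major data in row-major order so each
   cache line fetched is fully used before it is evicted. */
void scale_rows(double a[N][N], double s) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)   /* contiguous inner loop */
            a[i][j] *= s;
}

/* Temporal locality: blocked matrix multiply reuses each loaded
   B x B tile many times before moving on. Assumes c is zeroed. */
void matmul_blocked(const double a[N][N], const double b[N][N],
                    double c[N][N]) {
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t kk = 0; kk < N; kk += B)
            for (size_t jj = 0; jj < N; jj += B)
                for (size_t i = ii; i < ii + B; i++)
                    for (size_t k = kk; k < kk + B; k++)
                        for (size_t j = jj; j < jj + B; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```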
A role for NV in the hierarchy
[Figure: memory hierarchy. Image source: http://www.bit-tech.net/hardware/memory/2007/11/15/the_secrets_of_pc_memory_part_1/3]
Node architecture = “shops” of data
• Byte/Word addressable memory up and down the stack, block synchronous between stacks
• Control is data aggregator (e.g., gather/scatter; sketched below)
[Diagram: two memory stacks, each with a processor/control layer above per-stack control logic, over layers of RAM/NVRAM and RAM]
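A minimal sketch of the control-as-aggregator idea: gather scattered elements into a dense buffer near the consumer, operate on it with full spatial locality, then scatter results back. Function names are illustrative only.

```c
#include <stddef.h>

/* Gather: pull scattered elements into a dense buffer so subsequent
   accesses enjoy spatial locality. */
void gather(double *dst, const double *src,
            const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Scatter: push results back to their home locations. */
void scatter(double *dst, const double *src,
             const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```

Done in memory-side control logic rather than on the processor, this is one way the random-access cost stays close to the data instead of crossing the whole hierarchy.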
Exploiting Spatial Locality
• Fractal Memory (a Morton-order sketch follows this list)
– Create a virtual mapping of data lines to space filling curves (e.g., Jin and Mellor-Crummey, “Using Space-filling Curves for Computation Reordering”)
– Use memory control logic to resolve mappings
– Dynamic mapping by user via PM interface
• Move work to data
– Adaptive mesh refinement is a refine operation spawned at another memory component
– Map memory references back to processor
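A minimal sketch of one such mapping: the Z-order (Morton) curve, which interleaves coordinate bits so that cells adjacent in 2D tend to land near each other in linear memory. This is a generic illustration, not the memory-controller logic the slide proposes.

```c
#include <stdint.h>

/* Spread the low 16 bits of x so there is a zero bit between each
   pair of original bits (classic bit-interleave trick). */
static uint32_t part1by1(uint32_t x) {
    x &= 0x0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

/* Morton index of cell (x, y): neighboring cells in 2D map to
   nearby linear addresses, improving spatial locality of sweeps. */
uint32_t morton2d(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```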
Exploiting Temporal Locality
• Global one-sided memory model
– Different processors updating same values in PDE solver creates race conditions
– You’re going to get the wrong answer anyway, so checkpoint asynchronously and use QMU
– Inherently resilient algorithms that avoid global synchronization
• Reconfigurable hierarchy: “cache” vs. “scratch pad” (see the sketch after this list)
– “Cache” is seamless and easy to use, but sometimes I’d like to be able to bypass it
– “Scratch pad” avoids duplicating memory and can be higher performing, but it is harder to use
– Is SSD going to work like “cache” or “scratch pad”?
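A minimal sketch of the scratch-pad style of temporal locality: explicitly stage a tile into a small local buffer, reuse it many times, then write it back, rather than trusting the cache to keep it resident. An ordinary stack array stands in for a real scratch pad here.

```c
#include <stddef.h>

#define TILE 256

/* Apply 'sweeps' relaxation passes to each tile of a 1-D field.
   Staging into 'pad' makes the reuse explicit: each tile is moved
   once, used 'sweeps' times, then moved back. (Tile boundaries are
   ignored for brevity.) */
void relax_tiled(double *field, size_t n, int sweeps) {
    double pad[TILE];                       /* stand-in scratch pad */
    for (size_t t = 0; t + TILE <= n; t += TILE) {
        for (size_t i = 0; i < TILE; i++)   /* stage in (one move) */
            pad[i] = field[t + i];
        for (int s = 0; s < sweeps; s++)    /* reuse many times */
            for (size_t i = 1; i + 1 < TILE; i++)
                pad[i] = 0.5 * pad[i] + 0.25 * (pad[i-1] + pad[i+1]);
        for (size_t i = 0; i < TILE; i++)   /* stage out (one move) */
            field[t + i] = pad[i];
    }
}
```

This is exactly the “harder to use” trade the slide names: the programmer owns the data movement, but each word crosses the hierarchy twice instead of once per sweep.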
Motivating example: Exa-sorting
• Many linear solution methods are already robust against errors and data race conditions (e.g. multigrid methods)
• What about an application like sorting?
– Gradient descent approach is robust under errors* and can be parallelized asynchronously (toy sketch below)
– Suggests possibility for research in asynchronous parallel minimization approach for other classes of problems
• How about non-linear solvers?
– Analogy in minimization of the objective function via solution of the adjoint problem?
– What about chaotic systems?
* Joseph Sloan, David Kesler, Rakesh Kumar, and Ali Rahimi. “A Numerical Optimization-based Methodology for Application Robustification: Transforming Applications for Error Tolerance”. DSN 2010, Chicago, July 2010.
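A toy sketch of the error-robustness claim (not the formulation from Sloan et al.): gradient descent on a convex objective self-corrects, so an occasional corrupted update only delays convergence rather than breaking it. The fault injection is simulated.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimize f(x) = (x - 5)^2 by gradient descent while occasionally
   corrupting the iterate to mimic a transient soft error. Every
   later step descends from wherever the error left us, so the
   iteration still converges. */
int main(void) {
    double x = 0.0, lr = 0.1;
    srand(42);
    for (int k = 0; k < 200; k++) {
        double grad = 2.0 * (x - 5.0);
        x -= lr * grad;
        /* Inject faults only in early iterations so the recovery is
           visible by the end; in practice one iterates until a
           convergence test passes. ~2% fault rate. */
        if (k < 150 && rand() % 50 == 0)
            x += (rand() % 100) - 50;    /* corrupt the iterate */
    }
    printf("x = %f (target 5.0)\n", x);
    return 0;
}
```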
From the user/developer perspective
• Domain specific language to serve as portable wrapper for domain user and SME
• Support for globally addressable memory space
• Easy one-sided and two-sided, synchronous and asynchronous access to remote data (see the MPI sketch below)
• Intuitive mechanism for lightweight thread creation and remote task invocation
• Application control over dynamically reconfigurable memory (hardware cache, software cache and software scratch) at each level of the memory hierarchy (chip, node and storage)
• Tools for monitoring memory and energy utilization, so I know when I’m swapping to DIMM!
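One-sided remote access already exists in today’s programming models; a minimal MPI-3 sketch of the style this wish list asks for (rank 0 writes into rank 1’s window without rank 1 posting a receive):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* run with >= 2 ranks */

    /* Each rank exposes one double through an RMA window. */
    double local = (double)rank;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        double payload = 42.0;
        /* One-sided: rank 1 takes no part in this transfer. */
        MPI_Put(&payload, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);   /* completes the access epoch */

    if (rank == 1)
        printf("rank 1 now holds %g\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```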
Conclusions
• Exascale arrives at the end of the technology generation bridging concurrency to data: risk or opportunity?
• Traditional algorithms + architectures too expensive in power, performance and reliability if data leaves cache
• Rethinking computation may yield large ROI
– models of computation
– “balanced architecture”
– predictive science vs. data analytics
• Required to facilitate new approaches
– programming models and tools
– simulation and modeling framework
– vendor partnerships and technology investment