0 Arun Rodrigues, Scott Hemmert, Dave Resnick: Sandia National Lab (ABQ) Keren Bergman: Columbia...

Arun Rodrigues, Scott Hemmert, Dave Resnick:

Sandia National Lab (ABQ)

Keren Bergman: Columbia University

Bruce Jacob: U. Maryland

John Shalf, Paul Hargrove: Lawrence Berkeley National Laboratory

Gilbert Hendry: Sandia National Laboratory

Dan Quinlan, Chunhua Liao: Lawrence Livermore National Lab

Sudhakar Yalamanchili: Georgia Tech

Data Movement Dominates (DMD) and

CoDEx: CoDesign for Exascale

Codesign Tools Recap: Architectural Simulation to Accelerate CoDesign

• System level models

• Node level emulation

• Application analysis (ROSE)

ROSE Compiler: Enables deep analysis of application requirements, semi-automatic generation of skeleton applications, and code generation for ACE and SST.

ACE Node Emulation: Rapid design synthesis and FPGA-accelerated emulation for rapid prototyping of cycle-accurate models of manycore node designs.

SST Macro System Simulation: Enables system-scale simulation through capture of application communication traces and simulation of large-scale interconnects.

SST Micro Software Simulators: Software-based simulation at the node level.

ROSE: Application Analysis

CoDEx: CoDesign For Exascale (ASCR-funded Simulation Infrastructure Project)

SST: Structural Simulation Toolkit (NNSA-funded Simulation Tools, ASC Program)

CAL: Computer Architecture Laboratory (Sandia/LBL)

Fidelity vs. Scope for Architectural Simulation Methods

ROSE Compiler: Full Program Understanding through Deep Source-Code Analysis

• Can automatically predict performance for many input codes and software optimizations

• Predict performance under different architectural scenarios

• Much faster than hardware simulation and manual modeling

ExaSAT: Exascale Static Analysis Tool (Compiler-Automated Performance Model Extraction)

[Figure: ExaSAT workflow. Labels: Combustion Codes; Compiler Analysis; Dependency Graph Optimization; User Parameter; Machine Parameter; Performance Model; Performance Prediction Spreadsheet.]

SST/macro: Coarse-Grained Simulation

An application code with minor modifications

SST/macro implementations of interfaces (e.g., MPI), which simulate execution and communication

SST/micro: Cycle-Accurate Framework

• Has a general simulation framework for integrating models

• Simulation backend is parallel

• Plenty of people involved

Some Models Currently Integrated

Gem5 is a well-known architectural simulator with models for processors, caches, busses, and network components.

MacSim provides a model of GPU/CPU cores or heterogeneous computing nodes, which can be driven from x86 or PTX (CUDA) traces.

IRIS provides a pipelined, cycle-accurate router model capable of modeling a variety of Network-on-Chip (NoC) and inter-node interconnection architectures.

PhoenixSim models photonic networks.

Leveraging Embedded Design Automation For Design Space Exploration

This stuff is essential!

Embedded Design Automation (Using FPGA emulation to do rapid prototyping)

RAMP FPGA-accelerated emulation of ASIC, or “tape out” to FPGA

Data Movement Dominates (Sandia, Micron, Columbia, LBL): Understand the Potential of Intelligent, Stacked DRAM Technology

• Data movement is projected to account for over 75% of the power budget for an exascale platform

• Work to reduce that via
– Optical interconnect(s)
– 3D stacking (logic + memory + optics)
– New memory protocols

Research Questions
– What is the performance potential of stacked memory (power & speed)?
– How much intelligence to put into the logic layer?
  • Atomics, gather/scatter, checksums, full processor-in-memory
– What is the memory consistency model for intelligent DRAM?
– How to program it if we embed more intelligence into DRAM?

The Cost of Moving Data

Locality Management is Key

What is the best combination of software and hardware mechanisms to maximize data movement efficiency?

[Figure: panels "Vertical Locality Management" and "Horizontal Locality Management", with labels "Temporal" and "Topological"; source label: Sun Microsystems.]

Why Study Chip Stacking (TSVs)?

Energy = (V² ∗ C) ∗ Overhead + E_comm

DRAM Cells Efficient

• DRAM cells require < 1 pJ to access
• Current DRAM architectures are not power efficient
• Long distances ➔ high power
• We pay for more than we get at every level
  – Cache: throw away 75-80%
  – DRAM Row: Charge 1024B for each 64B access
  – DIMM: Charge 8-9 chips/access
  – ~800 pJ/byte total
• DRAM design driven by packaging constraints
  – ~50% of DRAM chip cost is packaging, mainly in pins
  – DIMMs use multiple chips with a few data pins to achieve high BW
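The overfetch figures above reduce to simple ratios. A minimal Python sketch, using only the numbers quoted on the slide; the function and variable names are illustrative, and the cache figure is read here as "75-80% of fetched data is thrown away", which is an interpretation of the slide rather than something it states in full:

```python
# Overfetch at each level of a conventional memory hierarchy,
# using the figures quoted on the slide. Names are illustrative.

def useful_fraction(bytes_used: float, bytes_charged: float) -> float:
    """Fraction of the activated/fetched data that is actually used."""
    return bytes_used / bytes_charged

# DRAM row: a 1024 B row is charged to serve one 64 B access.
row_overfetch = 1024 / 64                 # 16x more data activated than used
row_useful = useful_fraction(64, 1024)    # 0.0625

# Cache: 75-80% of what is fetched is thrown away, so 20-25% is useful.
cache_useful_low, cache_useful_high = 1 - 0.80, 1 - 0.75

print(f"DRAM row overfetch: {row_overfetch:.0f}x (useful: {row_useful:.2%})")
print(f"Cache useful fraction: {cache_useful_low:.0%} to {cache_useful_high:.0%}")
```

The sketch only restates the slide's ratios; it does not model how the per-level losses combine into the ~800 pJ/byte total.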

TSVs Reduce Costs

• TSVs orders of magnitude less energy
  – 250 fJ/bit for reading DRAM
  – 5 fJ/bit for TSV
  – 250 fJ/bit for mem. controller
  – ~0.5 pJ/bit (compared to 30pJ for conventional DIMM)
  – Don’t have to access more data than needed
• Enables....
  – Lower Capacitance: Narrower
  – Lower Overhead: Smarter
  – In-Memory computation
• Requires
  – ...changes to how we view the machine & the memory
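The ~0.5 pJ/bit figure is the sum of the three per-bit components listed above. A minimal Python sketch of that arithmetic; the dictionary keys and variable names are illustrative, and the 30 pJ comparison figure is assumed here to be per bit, which the slide implies but does not state:

```python
# Per-bit energy budget for a TSV-stacked memory access, from the slide.
# Component values are in femtojoules per bit; names are illustrative.
FJ_PER_PJ = 1000.0

stacked_fj_per_bit = {
    "dram_read": 250.0,
    "tsv": 5.0,
    "memory_controller": 250.0,
}
# Comparison figure from the slide, assumed to be per bit.
conventional_dimm_pj_per_bit = 30.0

total_fj = sum(stacked_fj_per_bit.values())   # 505 fJ/bit
total_pj = total_fj / FJ_PER_PJ               # 0.505 pJ/bit, i.e. ~0.5 pJ/bit
savings = conventional_dimm_pj_per_bit / total_pj

print(f"Stacked total: {total_pj:.3f} pJ/bit")
print(f"Conventional DIMM uses about {savings:.0f}x more energy per bit")
```

Under that per-bit assumption the ratio comes out a little under 60x, consistent with the slide's "orders of magnitude" framing only when the TSV itself (5 fJ/bit) is compared against off-chip signaling.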

Why Photonics?

Photonics changes the rules for Bandwidth-per-Watt.

ELECTRONICS:
• Buffer, receive, and re-transmit at every router.
• Space Parallelism: Each bus lane routed independently (P ∝ N_LANES).
• Off-chip BW requires much more power than on-chip BW.

PHOTONICS:
• Modulate/receive data stream once per communication event.
• Wavelength Parallelism: Broadband switch routes entire multi-wavelength stream.
• Off-chip BW ≈ on-chip BW for nearly same power.
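The contrast between paying at every router and paying once per communication event can be illustrated with a toy scaling model. This is a rough sketch only: the function names and the per-bit energies are hypothetical placeholders, not measurements from the slides, and the model ignores effects such as optical insertion loss and laser power.

```python
# Toy scaling model: electronic links pay at every router hop,
# photonic links pay once per communication event.
# The per-bit energies below are hypothetical placeholders, not measurements.

def electronic_energy_pj(bits: int, hops: int, pj_per_bit_per_hop: float) -> float:
    """Buffer, receive, and re-transmit at every router: cost grows with hops."""
    return bits * hops * pj_per_bit_per_hop

def photonic_energy_pj(bits: int, hops: int, pj_per_bit_modulate_receive: float) -> float:
    """Modulate and receive once per event: cost is independent of hops."""
    return bits * pj_per_bit_modulate_receive

BITS = 64 * 8  # one 64-byte message
for hops in (1, 4, 16):
    e = electronic_energy_pj(BITS, hops, pj_per_bit_per_hop=1.0)
    p = photonic_energy_pj(BITS, hops, pj_per_bit_modulate_receive=1.0)
    print(f"hops={hops:2d}  electronic={e:8.1f} pJ  photonic={p:8.1f} pJ")
```

With equal placeholder costs per bit, the electronic total grows linearly with hop count while the photonic total stays flat, which is the qualitative point of the slide.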

Why Optically-Connected Memory?

Traditional Memory (electronic bus): Will not scale to meet power and bandwidth requirements of future high-performance computing systems

• Large pin-out
• Complex wiring
• Low bandwidth density
• Distance constrained by electrical limitations
• High power dissipation

Optically-Connected Memory (optical link): Enables scaling of high-performance computing through increased memory capacity and bandwidth

• All-optical link, no electronic bus to drive
• Bit-rate transparent link
• High bandwidth density, fewer pins
• Distance immunity at computer scale
• Low power dissipation

[Figure labels: HBDRAM; Electronic Bus; Optical Link.]

Mixed Model Simulation: cycle-accurate and energy-accurate models

[Figure: mixed-model simulation stack. Labels: SST/macro; skeleton app (C, C++, Fortran); MPI Traces (DUMPI); Workload Translation; kernels; Processor Model (SST/micro & Tensilica); Address Translation; Memory Model (DRAMSim2, FLASHsim, NVRAM); NoC Model (PhoenixSim); SystemC; (C++); Checkpoint/restart.]

Simulator Infrastructure: Interconnects (cycle-accurate and energy-accurate models)

• Developed by Sandia collaborators
• CoDEx project

Simulator Infrastructure: Memory (cycle-accurate and energy-accurate models)

• Validated against Micron DRAM
• HMC model coming this summer

Simulator Infrastructure (cycle-accurate and energy-accurate models)

• Rewrote Columbia PhoenixSim, summer 2011
• Orion-2 energy model
• Validated against Cornell test parts

Simulator Infrastructure (cycle-accurate and energy-accurate models)

• Full gate-level RTL model of processor
• Well-characterized energy model
