Combining Memory and a Controller with Photonics through 3D-Stacking to Enable Scalable and Energy-Efficient Systems

Aniruddha N. Udipi, Naveen Muralimanohar*, Rajeev Balasubramonian, Al Davis, Norm Jouppi*

University of Utah and *HP Labs
Date posted: 22-Dec-2015


Memory Trends - I

• Multi-socket, multi-core, multi-thread
  – High bandwidth requirement
  – 1 TB/s by 2017
• Edge-bandwidth bottleneck
  – Pin count, per-pin bandwidth
  – Signal integrity and off-chip power
    • Limited number of DIMMs
• Without melting the system
  – Or setting up in the Tundra!

Image sources: ZDNet, Tom’s Hardware

Memory Trends - II

• The job of the memory controller is hard
  – 18+ timing parameters for DRAM!
  – Maintenance operations
    • Refresh, scrub, power down, etc.
• Several DIMM and controller variants
  – Hard to provide interoperability
  – Need processor-side support for new memory features
• Now throw in heterogeneity
  – Memristors, PCM, STT-RAM, etc.

Improving the interface

[Figure: CPU with memory controller (MC) connected to DIMMs. Callout: memory interface under severe pressure]

1. Memory Interconnect: efficient application of silicon photonics, without modifying DRAM dies
2. Communication protocol: streamlined slot-based interface

PART 1 – Memory Interconnect


Silicon Photonic Interconnects

• We need something that can break the edge-bandwidth bottleneck
• Ring-modulator-based photonics
  – Off-chip light source
  – Indirect modulation using resonant rings
  – Relatively cheap coupling on- and off-chip
• DWDM for high bandwidth density
  – As many as 67 wavelengths possible
  – Limited by Free Spectral Range, and coupling losses between rings

Source: Xu et al., Optics Express 16(6), 2008

DWDM: 64 λ × 10 Gbps/λ = 80 GB/s per waveguide
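The per-waveguide figure follows from the wavelength count and the per-wavelength data rate once bits are converted to bytes. A minimal sketch of that arithmetic; the function name is hypothetical and chosen only for illustration:

```python
def waveguide_bandwidth_gbytes(wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate bandwidth of one DWDM waveguide in GB/s (8 bits per byte)."""
    return wavelengths * gbps_per_wavelength / 8


# 64 wavelengths at 10 Gbps each, as on the slide
print(waveguide_bandwidth_gbytes(64, 10))  # 80.0
```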

Static Photonic Energy

• Photonic interconnects
  – Large static power dissipation: ring tuning
  – Much lower dynamic energy consumption, relatively independent of distance
• Electrical interconnects
  – Relatively small static power dissipation
  – Large dynamic energy consumption
• Should not over-provision photonic bandwidth; use only where necessary
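One way to see why over-provisioning hurts is to amortize static power over the bits actually delivered. The sketch below is a simple model, not the authors' methodology; the function name and every number in the usage lines are illustrative assumptions, not values from the slides.

```python
def energy_per_bit_pj(static_mw: float, dynamic_pj_per_bit: float,
                      peak_gbps: float, utilization: float) -> float:
    """Static power amortized over delivered bits, plus dynamic energy per bit."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    delivered_gbps = peak_gbps * utilization
    # mW / Gbps = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = pJ/bit
    static_pj_per_bit = static_mw / delivered_gbps
    return static_pj_per_bit + dynamic_pj_per_bit


# Illustrative (assumed) link parameters: high static / low dynamic for the
# photonic link, low static / high dynamic for the electrical link.
for utilization in (0.05, 1.0):
    photonic = energy_per_bit_pj(100.0, 0.1, 640.0, utilization)
    electrical = energy_per_bit_pj(5.0, 2.0, 640.0, utilization)
    print(utilization, photonic, electrical)
```

With these assumed numbers the photonic link costs more per bit when lightly used and less when busy, which is the qualitative point of the slide.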

The Questions We’re Trying to Answer

Should we replace all interconnects with photonics? On-chip too?

Should we be designing photonic DRAM dies? Stacks? Channels?

How do we make photonics less invasive to memory die design?

What should the role of 3D be in an optically connected memory?

What should the role of electrical signaling be?

Contributions Beyond Prior Work

• Beamer et al. (ISCA 2010)
  – First paper on fully integrated optical memory
  – Studied electrical-optical balance point
  – Focus on losses, proposed photonic power guiding
• We build upon this
  – Focus on tuning power constraints
  – Effect of low-swing wires
  – Effect of 3D and daisy-chaining

Energy Balance Within a DRAM Chip

[Figure: Electrical Energy and Photonic Energy within a DRAM chip]

Single Die Design

[Figure: 1 photonic DRAM die. Panels: Full-swing on-chip wires; Low-swing on-chip wires]

• Similar to state-of-the-art design, based on prior work. Argues for a specially designed photonic DRAM.
• More efficient on-chip electrical communication provides the added benefit of allowing fewer photonic resources.
• 46% energy reduction going from the best full-swing configuration (4 stops) to the best low-swing configuration (1 stop).

3D Stacking Imminent for Capacity

• Simply stack photonic dies?
  – Vertical coupling and hierarchical power guiding suggested by prior work
  – This is our baseline design
• But, more photonic rings in the channel
  – Exactly the same number active as before
• Energy optimal point shifts towards fewer “stops”
  – Single set of rings becomes optimal
• 2.4x energy consumption, for 8x capacity

[Figure: 8 optimally designed photonic DRAM dies]

Key Idea – Exploiting TSVs

• Move all photonic components to a separate interface die, shared by several memory dies
• Photonics off-chip only
• TSVs for inter-die communication
  – Best of both worlds: high BW and low static energy
• Efficient low-swing wires on-die

[Figure: single photonic interface die with 8 commodity DRAM dies]

Proposed Design

[Figure labels: Processor, Memory controller, Waveguide, DIMM, DRAM chips, Photonic Interface die]

• ADVANTAGE 1: Increased activity factor, more efficient use of photonics
• ADVANTAGE 2: Rings are co-located; easier to isolate or tune thermally
• ADVANTAGE 3: Not disruptive to the design of commodity memory dies

Energy Characteristics

[Figure panels: Single die on the channel; Four 8-die stacks on the channel]

Static energy trumps distance-independent dynamic energy

Final System

• 23% reduced energy consumption
• 4X capacity per channel
• Potential for performance improvements due to increased bank count
• Less disruptive to memory die design

[Figure labels: Processor, Memory controller, Waveguide, DIMM, DRAM chips, Photonic Interface die. Callout: makes the job of the memory controller difficult!]

PART 2 – Communication Protocol

The Scalability Problem

• Large capacity, high bandwidth, and evolving technology trends will increase pressure on the memory interface

• Processor-side support required for every memory innovation

• Current micro-management requires several signals
  – Heavy pressure on address/command bus
  – Worse with several independent banks, large amounts of state

Proposed Solution

• Release MC’s tight control, make memory stack more autonomous

• Move mundane tasks to the interface die
  – Maintenance operations (refresh, scrub, etc.)
  – Routine operations (DRAM precharge, NVM wear leveling)
  – Timing control (18+ constraints for DRAM alone)
  – Coding and any other special requirements
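To make the timing-control item concrete, here is a minimal sketch of how logic on the interface die could enforce one per-bank spacing constraint locally, so the processor-side controller never sees it. The class name, the single `min_activate_gap` parameter, and the cycle counts are hypothetical stand-ins; a real device would track many more constraints.

```python
class BankTimingGuard:
    """Tracks, per bank, the earliest cycle at which the next activate may issue."""

    def __init__(self, num_banks: int, min_activate_gap: int):
        # Hypothetical stand-in for one of the 18+ DRAM timing constraints.
        self.min_activate_gap = min_activate_gap
        self.next_allowed = [0] * num_banks

    def issue_activate(self, bank: int, now: int) -> int:
        """Return the cycle at which the activate actually issues (>= now)."""
        issue_cycle = max(now, self.next_allowed[bank])
        self.next_allowed[bank] = issue_cycle + self.min_activate_gap
        return issue_cycle


guard = BankTimingGuard(num_banks=2, min_activate_gap=40)
print(guard.issue_activate(0, now=0))   # 0
print(guard.issue_activate(0, now=10))  # 40, delayed by the same-bank gap
print(guard.issue_activate(1, now=10))  # 10, a different bank is unaffected
```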

What would it take to do this?

• “Back-pressure” from the memory

• But, “Free-for-all” would be inefficient
  – Needs explicit arbitration
• Novel slot-based interface
  – Memory controller retains control over data bus
  – Memory module only needs address, returns data

Memory Access Operation

[Figure: request timeline along a Time axis. Labels: Arrival, Issue, Start looking, First free slot (S1), Backup slot (S2), reserved slots marked X, spans labeled ML and > ML]

• Slot: cache line data bus occupancy
• X: reserved slot
• ML: memory latency = address latency + bank access + data bus latency
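The reservation step in the timeline can be sketched as below. This is one reading of the figure labels, not the authors' implementation: it assumes time is measured in whole slots, that the controller starts looking ML after issue, and that the backup slot is searched for starting ML after S1. The function and parameter names are hypothetical.

```python
def reserve_slots(reserved: set[int], issue_slot: int, ml_slots: int) -> tuple[int, int]:
    """Reserve a primary (S1) and a backup (S2) data-bus slot for one read.

    reserved   -- slot indices already taken (the X marks in the figure)
    issue_slot -- slot index at which the request is issued
    ml_slots   -- memory latency ML, expressed in slot times
    """
    def first_free(start: int) -> int:
        slot = start
        while slot in reserved:
            slot += 1
        return slot

    primary = first_free(issue_slot + ml_slots)  # start looking ML after issue
    reserved.add(primary)
    backup = first_free(primary + ml_slots)      # assumption: at least ML after S1
    reserved.add(backup)
    return primary, backup


taken = {10, 11}
print(reserve_slots(taken, issue_slot=2, ml_slots=8))  # (12, 20)
```

Because the controller picks both slots itself, it keeps control of the data bus while the memory module only needs the address.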

Advantages

• Plug and play
  – Everything is interchangeable and interoperable
  – Only interface-die support required (communicate ML)
• Better support for heterogeneous systems
  – Easier DRAM-NVM data movement on the same channel
• More innovation in the memory system
  – Without processor-side support constraints
• Fewer commands between processor and memory
  – Energy, performance advantages

Target System and Methodology

• Terascale memory node in an Exascale system
  – 1 TB of memory, 1 TB/s of bandwidth
• Assuming 80 GB/s per channel, we need 16 channels, with 64 GB per channel
  – 2 GB dies x 8 dies per stack x 4 stacks per channel
• Focus on the design of a single channel
• In-house DRAM simulator + SIMICS
  – PARSEC, STREAM, synthetic random traffic
  – Max. traffic load used, just below channel saturation
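The channel and capacity figures can be checked with a few lines of arithmetic. The variable names are hypothetical, and treating 1 TB/s as 1000 GB/s is an assumption. The slides do not say why 16 channels are used rather than the minimum; the sketch only shows that 16 satisfies both targets.

```python
import math

target_bw_gbs = 1000       # 1 TB/s, assuming decimal units
per_channel_bw_gbs = 80    # per-channel bandwidth from the slide

min_channels = math.ceil(target_bw_gbs / per_channel_bw_gbs)  # 13
channels = 16              # figure used on the slide

per_channel_gb = 2 * 8 * 4             # 2 GB dies x 8 dies/stack x 4 stacks = 64
total_gb = channels * per_channel_gb   # 1024 GB, i.e. 1 TB

print(min_channels, per_channel_gb, total_gb)
```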

Performance Impact – Synthetic Traffic

• < 9% latency impact, even at maximum load
• Virtually no impact on achieved bandwidth

Performance Impact – PARSEC/STREAM

• Apps have very low BW requirements
• Scaled-down system, similar trends

Tying it together – The Interface Die

Summary of Design

• Proposed 3D-stacked interface die with 2 major functions
  – Holds photonic devices for Electrical-Optical-Electrical conversion
    • Photonics only on the busy shared bus between this die and the processor
    • Intra-memory communication all-electrical, exploiting TSVs and low-swing wires
  – Holds device controller logic
    • Handles all mundane/routine tasks for the memory devices: refresh, scrub, coding, timing constraints, sleep modes, etc.
    • Processor-side controller deals with more important functions such as scheduling, channel arbitration, etc.
    • Simple speculative slot-based interface

Key Contributions

• Efficient application of photonics
  – 23% lower energy
  – 4X capacity, potential for performance improvements
• Minimally disruptive to memory die design
  – Single memory die design for photonics and electronics
• Streamlined memory interface
  – More interoperability and flexibility
  – Innovation without processor-side changes
  – Support for heterogeneous memory