Server Memory Forum 2011 - Home | JEDEC

DDR4 in an Enterprise Server

Server Memory Forum 2011

Art Kilmer, Memory Development, IBM Server & Technology Group


Agenda

• Enterprise Server Memory Application Requirements

• DDR4 Report Card & Beneficial Changes
– Capacity
– Power
– Performance

• Summary

Higher Performance Means Higher Memory Capacity

• Processor: More Cores, More Threads -> More Memory
– Multi-threaded chips are now mainstream.
– Higher clock speeds are giving way to higher core counts with more threads per core.

• Data Center: Virtualization & Cloud Computing
– Increased efficiencies of larger memory pages.

• Result: higher capacity memory and caches:

– Memory per thread.
– Memory per core.
– Memory per processor socket.
– Memory per system.

[Figure: DRAM Trends. Series: BW, Power, VDD; x-axis: Year (2002, 2005, 2009, 2014, 2020, 2025, 2030); log-scale y-axes. Data labels: 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6 and 2.5, 1.8, 1.5, 1.2, 1.0, 0.8, 0.7.]

[Figure: DRAM Capacity vs. Capacity/Thread. Series: DRAM capacity, Capacity / Thread; x-axis: Year (2005 to 2011).]

Enterprise Memory System Challenges

• More cores per system increases pressure for:
– Higher memory capacity.
– Higher memory bandwidth.

• Implications of more memory and more bandwidth:
– More active power.
– More standby power.
– Higher part count, so more parts to fail.

• Each challenge exacerbates the others

Memory is now ~20-40% of system power.

High BW drives High Power, Despite Technology Improvements (lower voltage)

[Figure: Midrange System Power Trend. Series: Other, Memory, Processor.]

Bandwidth Requires Improved Bus Efficiency

• In high capacity systems, memory bandwidth is strongly related to availability and bus turn around.

– Read/Write and Read/Read for bank to bank, rank to rank & hidden rank to hidden rank (Load Reduced DIMMs).

– Limiting parameters are tWTR, tFAW, tRC, tRRD.
– Refresh (tRFC) is limiting latency in high density devices.

• Bus turn around efficiencies required for effective bandwidth improvements.

– The bandwidth of DDR4-2666 has minimal benefit for server workloads over DDR3-1333 unless bus turnaround time (TAT) is addressed.
– Bus TAT includes both external & SDRAM internal busses.
    • R/W & R/R for bank to bank, rank to rank & hidden rank to hidden rank (Load Reduced DIMMs).
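As a rough illustration of why turnaround matters, the sketch below models a data bus that alternates bursts with idle turnaround gaps. The function name, the 64-bit bus, the burst length of 8, and the gap lengths are assumptions chosen for illustration; none of them are figures from this presentation.

```python
def effective_bandwidth_gbps(data_rate_mtps, bus_bytes=8, burst_len=8,
                             bursts_per_turnaround=1, turnaround_clocks=0):
    """Effective bandwidth (GB/s) when every group of bursts is followed
    by an idle turnaround gap measured in memory clocks (hypothetical model)."""
    peak = data_rate_mtps * 1e6 * bus_bytes / 1e9  # GB/s with no gaps
    # DDR moves two transfers per clock, so a burst of N takes N/2 clocks.
    busy_clocks = bursts_per_turnaround * burst_len / 2
    efficiency = busy_clocks / (busy_clocks + turnaround_clocks)
    return peak * efficiency

# Assumed example: the same absolute turnaround time costs twice as many
# clocks once the data rate doubles.
ddr3 = effective_bandwidth_gbps(1333, turnaround_clocks=4)
ddr4 = effective_bandwidth_gbps(2666, turnaround_clocks=8)
print(f"DDR3-1333: {ddr3:.2f} GB/s, DDR4-2666: {ddr4:.2f} GB/s")
```

Under these assumed gaps, doubling the data rate yields well under twice the delivered bandwidth, which is the effect the slide attributes to unaddressed bus TAT.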

DDR4 – Increased Capacity

• 8Gb / 16Gb
– Continued standard scaling efficiencies provide business-as-usual growth, but are insufficient by themselves.

• LRDIMM
– Building on DDR3 chip ID decoding and distributed buffer LRDIMM architecture Load Reduced Modules.
– DDR4 RCD supports up to 4 package ranks of DDP or 2 package ranks of 8 high.

• Register support is limited to 4 CS & 1 CID, or 2 CS & 3 CID
– Supports 4 package ranks of max 2H stack, or
– Supports 2 package ranks of max 8H stack

• Dual Register Support
– Required for 4 package ranks of 4H & 8H stacks.
– Means to distinguish between two registers without additional pins is required.
– Some minimal added latency is acceptable.

• 3DS / MS
– Master / slave memory stacking provides the greatest benefit addressing capacity, power and performance.
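The register limits above follow from simple addressing arithmetic, sketched here. It assumes one chip select (CS) per package rank and binary-decoded chip ID (CID) bits, so each CID bit doubles the supported stack height; `logical_ranks` is a hypothetical helper name.

```python
def logical_ranks(chip_selects, cid_bits):
    """Return (logical ranks, max stack height) addressable with a given
    number of chip selects and chip ID bits (assumed binary decoding)."""
    max_stack_height = 2 ** cid_bits
    return chip_selects * max_stack_height, max_stack_height

for cs, cid in [(4, 1), (2, 3)]:
    ranks, height = logical_ranks(cs, cid)
    print(f"{cs} CS, {cid} CID: {cs} package ranks x {height}H = {ranks} logical ranks")
```

Under the same assumptions, 4 package ranks of 4H or 8H stacks would need 4 CS together with 2 or 3 CID bits, which exceeds either single-register combination and is consistent with the dual register requirement.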

3D-stack

• Additional ranks (capacity)
• Load reduction (higher frequency)
• Bus TAT hiding (bus efficiency)
• Standby power reduction
• Active termination power reduction

Thru-silicon via technology is a key opportunity for extensive master-slave implementation.

• Higher stacking feasible within attractive package height (die thinning).

• More bits at same loading (no additional termination required).

• More layers in power-down like state (DLLs, IO and internal clocks) without access penalty. More power savings.

[Figure: four DDR4 DRAM dies with shared I/O in a single package.]

Advanced Load Reduction technique : Master-slave

Net: greater Watt-per-bit & bit-per-mm3 efficiency for high-capacity memory systems

3DS DRAM Master-Slave Power Savings

• Master-slave DRAM architecture may offer as much as 40% power reduction.

• For large-capacity systems, both passive and active power are critical.

• Master-slave standby power opportunity:
– Removal or gating of content on slave layers.
– DLL power particularly important:
    • Achieve power-down-like properties without control overhead.

• Master-slave active power opportunity:
– Termination power is a sizeable component of today’s large server-class DIMMs.

• Master-slave network power opportunity:
– Higher capacity per memory channel eliminates need for additional channels.

3DS vs DDP

• Power savings from 3DS come from 3 areas:
– Lower power on the Address, Control, and Clock nets due to lighter loading.
– Lower idle power due to DRAM optimizations with 3D Stacking.
– Lower Data bus termination power due to lighter loading.

• Majority of the power savings comes from the data bus termination.
– Due to the large number of data and strobe nets on the DIMM.

[Figure: 4 Rank DIMM Power Comparison, DDP vs 3DS. y-axis: Power; bars: 3DS package, DDP package; components: addr/clk, data term, Idle, Refresh, DRAM R/W, Activate.]

Voltage Reduction Roadmap

1.2v Vdd provides a significant power reduction but:

• DDR voltage hasn’t scaled linearly generation to generation with DDR architecture

• Increasing frequency without scaling voltage creates potential power wall
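The power wall can be illustrated with the first-order CMOS switching-power model, P ~ C * V^2 * f. This is a generic textbook approximation, not a model taken from the presentation; it assumes switched capacitance stays constant, and `relative_dynamic_power` is a hypothetical helper.

```python
def relative_dynamic_power(freq_ratio, vdd_old, vdd_new):
    """Relative switching power under the first-order model
    P ~ C * V^2 * f, with capacitance assumed unchanged."""
    return freq_ratio * (vdd_new / vdd_old) ** 2

# Doubling the data rate while moving from 1.5 V to 1.2 V.
scaled = relative_dynamic_power(2.0, 1.5, 1.2)
# Doubling the data rate again with no further voltage reduction.
unscaled = relative_dynamic_power(2.0, 1.2, 1.2)
# Voltage that would hold power flat across a frequency doubling from 1.2 V.
flat_vdd = 1.2 / 2.0 ** 0.5
print(f"{scaled:.2f}x, {unscaled:.2f}x, {flat_vdd:.2f} V")
```

In this simplified model even the step to 1.2 V does not fully offset a doubled data rate, and holding power flat through the next doubling would require a supply well below 1.2 V.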

[Figure: Max DDRx frequency (Mbps, log scale) and voltage level (V, log scale) versus year of DDR DRAM demonstration (2002, 2005, 2009, 2014, 2020). Series: Max DDRx Freq. (Mbps), VDD, VDD-linear. Data labels: 400, 800, 1600, 3200, 6400 Mbps; 2.5, 1.8, 1.5, 1.2, 1.0, 1.3, 0.9, 0.6 V.]

Vdd Roadmap Required

• Continued voltage reduction is a natural means to power reduction.
– Technology scaling provides a means to reduce power while maintaining performance by lowering operating voltage.
– However, scaling will become exceedingly difficult beyond 2x nm processes.
    • Introduction is driven by the economics of productivity returns from large development and capital investments.
    • Difficult to time to a specific time line.
    • Even more difficult to plan across multiple manufacturers.

• Planned Vdd voltage is required to enable smooth industry transition and acceptance.
– Infrastructure support
    • Registers, Modules, models, tools, etc.
– Platform readiness
    • Robust interface specifications.
    • System voltage supply capabilities.
    • Coexistence and technology tolerance.

• A roadmap of power supply and interface specifications is required even before the technology is available to support it.
– Introduction timing may move, but having the steps previously defined will enable swift market migration once the technology is available.

512B Page Size

512B page size (x4) benefits power, performance and reliability.

• Lower activation power avoids the need for x8 devices.
• Potential for improving bandwidth-limiting timings:
– tFAW & tRRD are critical performance parameters for server workloads.
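To see why tFAW limits bandwidth, recall that it is the window in which at most four activates may be issued to a rank. The sketch below bounds per-rank bandwidth for traffic where every activate serves a single burst. The 64 bytes per activate (burst of 8 on a 64-bit bus), the closed-page access pattern, and the tFAW values are assumptions for illustration, not measurements from this presentation.

```python
def act_limited_bandwidth_gbps(tfaw_ns, bytes_per_activate=64, acts_per_window=4):
    """Upper bound on per-rank bandwidth (GB/s) when each activate serves
    one burst and at most four activates fit in one tFAW window."""
    return acts_per_window * bytes_per_activate / tfaw_ns  # bytes/ns == GB/s

# Hypothetical tFAW values in nanoseconds.
for tfaw in (40.0, 30.0, 20.0):
    print(f"tFAW = {tfaw:.0f} ns -> {act_limited_bandwidth_gbps(tfaw):.1f} GB/s per rank")
```

Because this bound depends only on tFAW and not on the data rate, a faster bus does not help such workloads unless tFAW shrinks as well.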

[Figure: tRRD [nCK] by speed grade (DDR2-400, DDR2-533, DDR2-667, DDR2-800, DDR3-1066, DDR3-1333, DDR3-1600). Series: tRRD(2KB), tRRD(1KB), tRRD(512B).]

[Figure: tFAW [nCK] by speed grade (DDR2-400, DDR2-533, DDR2-667, DDR2-800, DDR3-1066, DDR3-1333, DDR3-1600). Series: tFAW(2KB), tFAW(1KB), tFAW(512B).]

[Figure: Latency vs tREFI. y-axis: Latency (0% to 250%); x-axis: tREFI (0 to 9); series: tRFC 110, tRFC 160, tRFC 300, tRFC 560.]

[Figure: Bandwidth vs tREFI. y-axis: Bandwidth (0% to 120%); x-axis: tREFI (0 to 9); series: tRFC 110, tRFC 160, tRFC 300, tRFC 560.]

Performance Impacts of Refresh

Refresh time of dense DRAMs is becoming a significant performance penalty.

• Long refresh cycle times (tRFC) drive longer latency.

• Short refresh intervals drive lower bandwidth.
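A first-order way to quantify the penalty is the fraction of time a rank is busy refreshing, tRFC / tREFI. The sketch below applies it to the tRFC values 110, 160, 300 and 560 named in the chart legends, assuming they are in nanoseconds and assuming a nominal 7.8 us refresh interval; both units are assumptions, since the slides do not state them.

```python
def refresh_busy_fraction(trfc_ns, trefi_ns):
    """Fraction of time a rank is unavailable because it is refreshing."""
    return trfc_ns / trefi_ns

TREFI_NS = 7800.0  # assumed nominal refresh interval (7.8 us)
for trfc in (110, 160, 300, 560):  # chart legend values, assumed to be ns
    busy = refresh_busy_fraction(trfc, TREFI_NS)
    print(f"tRFC {trfc} ns: busy {busy:.1%}, available {1 - busy:.1%}")
```

The model ignores queuing effects, so it understates the latency seen by requests that arrive while a refresh is in progress.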

DDR4 Improvement – Programmable Refresh

• 1x, 1/2x, 1/4x, 1/8x tREFI options.

• Enables the controller to optimize refresh time vs. availability.

• Ability to hide refresh during idle time.

Additional DDR4 Features

Power
• External VPP supply.
– Efficiency of the VPP pump in the DRAM is low at 1.2v, and would become worse as Vdd is further reduced.
• Temperature controlled auto & self refresh.
– Saves IDD5 and IDD6.
• CAL
– Chip Select to address latency saves 40-50% of standby power.
• Vddq Termination
– Active and Standby power reduction (up to 40 mA per DRAM).
• Max Power Saving Mode
– Deep power down.

Performance
• Fine Granularity Refresh.
– Enables the controller to more efficiently schedule refresh.

Reliability
• CRC
– Protects against I/O errors at high data rates.
– Protects against write errors not covered by ECC.
• CMD/ADD Parity
– Protects against CMD/ADD errors between register and DRAM.
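The two reliability features can be illustrated with a generic sketch of how a CRC and a parity bit detect a single flipped bit. The slides do not specify a polynomial or bit mapping: the polynomial 0x07 (x^8 + x^2 + x + 1), the byte-wise processing, and the sample payloads are assumptions, and the mapping of bits within an actual DDR4 write burst is not modeled here.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, most significant bit first, over a byte string."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def even_parity_bit(bits: int) -> int:
    """Parity bit that makes the total number of 1s even."""
    return bin(bits).count("1") & 1

payload = bytes(range(8))                              # hypothetical write data
sent_crc = crc8(payload)
corrupted = bytes([payload[0] ^ 0x10]) + payload[1:]   # one bit flipped in transit
print("CRC detects flip:", crc8(corrupted) != sent_crc)

cmd = 0b1011_0010                                      # hypothetical CMD/ADD word
flipped = cmd ^ 0b0000_0100
print("Parity detects flip:", even_parity_bit(flipped) != even_parity_bit(cmd))
```

A single parity bit catches any odd number of flipped bits but misses even counts, while the CRC covers a wider class of multi-bit errors on the data path.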

DDR4 and Beyond

DDR4 capacity, power and bandwidth improvements over DDR3 are well aligned to server needs.

DDR4 attractiveness and more rapid transition into the server space could be accomplished via:
– Cost effective availability of 3DS devices.
    • TSV process technology maturation.
– Addition of a Vdd roadmap.
    • Continued maniacal focus on power reduction.
– Elimination of tFAW restrictions.
– Enhanced refresh to increase device availability.

Final Question

Given the ever-widening gap between server and client/application device memory requirements, for how many more generations will a single DRAM standard satisfy the compute space?

Thank You

Questions?