Server Memory Forum 2011 - Home | JEDEC
DDR4 in an Enterprise Server
Server Memory Forum 2011
Art Kilmer, Memory Development, IBM Server & Technology Group
Agenda
• Enterprise Server Memory Application Requirements
• DDR4 Report Card & Beneficial Changes
– Capacity
– Power
– Performance
• Summary
Higher Performance Means Higher Memory Capacity
• Processor: More Cores, More Threads -> More Memory
– Multi-threaded chips now mainstream.
– Higher clock speeds giving way to higher core counts w/ more threads/core.
• Data Center: Virtualization & Cloud Computing
– Increased efficiencies of larger memory pages.
• Result in higher capacity memory and caches:
– Memory per thread.
– Memory per core.
– Memory per processor socket.
– Memory per system.
[Figure: DRAM Trends. BW, Power and VDD vs. Year (2002-2030), log scale.]
[Figure: DRAM Capacity vs. Capacity/Thread. DRAM capacity and Capacity / Thread vs. Year (2005-2011).]
Enterprise Memory System Challenges
• More cores / system increases pressures:
– Higher memory capacity.
– Higher memory bandwidth.
• More memory, more bandwidth implications:
– More active power.
– More standby power.
– Higher part count, more parts to fail.
• Each challenge exacerbates the others.
Memory now ~20 - 40% system power
High BW drives High Power, Despite Technology Improvements (lower voltage)
[Figure: Midrange System Power Trend. Series: Processor, Memory, Other.]
Bandwidth Requires Improved Bus Efficiency
• In high capacity systems, memory bandwidth is strongly related to availability and bus turnaround time (TAT).
– Read/Write and Read/Read turnarounds occur bank to bank, rank to rank & hidden rank to hidden rank (Load Reduced DIMMs).
– Limiting parameters are tWTR, tFAW, tRC, tRRD.
– Refresh (tRFC) is limiting latency in high density devices.
• Bus turnaround efficiencies are required for effective bandwidth improvements.
– Bandwidth of DDR4-2666 has minimal benefit for server workloads over DDR3-1333 unless bus TAT is addressed.
– Bus TAT includes both external & SDRAM internal busses.
  • R/W & R/R for bank to bank, rank to rank & hidden rank to hidden rank (Load Reduced DIMMs).
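The turnaround point above can be illustrated with a rough model. This is a minimal sketch, not a measurement: it assumes a 64-bit data bus, burst length 8, and a fixed turnaround gap in nanoseconds after every few bursts. The gap size and the burst count between turnarounds are hypothetical values chosen for illustration only.

```python
def bus_efficiency(data_rate_mtps, bursts_between_turnarounds=4,
                   turnaround_ns=7.5, burst_length=8):
    """Fraction of time the data bus carries data.

    Assumption: a fixed idle gap of `turnaround_ns` follows every
    `bursts_between_turnarounds` bursts (both values are hypothetical).
    """
    # Time to transfer one burst, in ns (data rate is in MT/s).
    burst_ns = burst_length / data_rate_mtps * 1000.0
    busy_ns = bursts_between_turnarounds * burst_ns
    return busy_ns / (busy_ns + turnaround_ns)


def effective_gbps(data_rate_mtps, bus_bytes=8, **kwargs):
    """Peak bandwidth (GB/s) scaled by the bus efficiency above."""
    peak = data_rate_mtps * bus_bytes / 1000.0
    return peak * bus_efficiency(data_rate_mtps, **kwargs)


if __name__ == "__main__":
    for rate in (1333, 2666):
        print(rate, round(bus_efficiency(rate), 3),
              round(effective_gbps(rate), 2))
```

Because the gap is fixed in nanoseconds while the burst time halves, the faster bus spends a larger share of its time idle, so doubling the data rate yields less than double the delivered bandwidth in this model.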
DDR4 – Increased Capacity
• 8Gb / 16Gb
– Continued standard scaling efficiencies provide business as usual growth, but are insufficient by themselves.
• LRDIMM
– Builds on DDR3 chip ID decoding and the distributed buffer LRDIMM architecture for Load Reduced Modules.
– DDR4 RCD supports up to 4 package ranks of DDP or 2 package ranks of 8 high.
• Register support is limited to 4 CS & 1 CID, or 2 CS & 3 CID
– Supports 4 package ranks of max 2H stack, or
– Supports 2 package ranks of max 8H stack.
• Dual Register Support
– Required for 4 package ranks of 4H & 8H stacks.
– A means to distinguish between two registers without additional pins is required.
– Some minimal added latency is acceptable.
• 3DS / MS
– Master / slave memory stacking provides the greatest benefit addressing capacity, power and performance.
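The register limits above reduce to simple arithmetic. The sketch below assumes each chip ID (CID) bit doubles the addressable stack height, which is consistent with the 1 CID / 2H and 3 CID / 8H pairings on the slide. The function names are hypothetical.

```python
def stack_height(chip_id_bits):
    """Dies addressable per package, assuming one CID bit doubles the height."""
    return 2 ** chip_id_bits


def logical_ranks(chip_selects, chip_id_bits):
    """Package ranks (one per chip select) times dies per stack."""
    return chip_selects * stack_height(chip_id_bits)


if __name__ == "__main__":
    # The two single-register configurations named on the slide.
    print(logical_ranks(4, 1))  # 4 package ranks of 2H stacks
    print(logical_ranks(2, 3))  # 2 package ranks of 8H stacks
```

Under the same assumption, 4 package ranks of 8H stacks would need 4 CS and 3 CID together, which exceeds either single-register option and matches the slide's call for dual register support.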
3D-stack
• Additional ranks (capacity)
• Load reduction (higher frequency)
• Bus TAT hiding (bus efficiency)
• Standby power reduction
• Active termination power reduction
Thru-silicon via technology is a key opportunity for extensive master-slave implementation.
• Higher stacking feasible within attractive package height (die thinning).
• More bits at same loading (no additional termination required).
• More layers in power-down like state (DLLs, IO and internal clocks) without access penalty. More power savings.
[Figure: four DDR4 DRAM dies in a single package with shared I/O.]
Advanced Load Reduction technique : Master-slave
Net: greater Watt-per-bit & bit-per-mm³ efficiency for high-capacity memory systems
3DS DRAM Master-Slave Power Savings
• Master-slave DRAM architecture may offer as much as 40% power reduction.
• For large-capacity systems, both passive and active power are critical.
• Master-slave standby power opportunity:
– Removal or gating of content on slave layers.
– DLL power particularly important:
  • Achieve power-down-like properties without control overhead.
• Master-slave active power opportunity:
– Termination power is a sizeable component of today’s large server-class DIMMs.
• Master-slave network power opportunity:
– Higher capacity per memory channel eliminates need for additional channels.
3DS vs DDP
• Power savings from 3DS come from 3 areas:
– Lower power on the Address, Control, and Clock nets due to lighter loading.
– Lower idle power due to DRAM optimizations with 3D Stacking.
– Lower Data bus termination power due to lighter loading.
• Majority of the power savings comes from the data bus termination.
– Due to the large number of data and strobe nets on the DIMM.
[Figure: 4 Rank DIMM Power Comparison, DDP vs 3DS. Power for the DDP package and the 3DS package, broken down into addr/clk, data term, Idle, Refresh, DRAM R/W and Activate.]
Voltage Reduction Roadmap
1.2 V Vdd provides a significant power reduction, but:
• DDR voltage hasn’t scaled linearly generation to generation with DDR architecture.
• Increasing frequency without scaling voltage creates a potential power wall.
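The power wall concern can be made concrete with the usual first-order CMOS relation, dynamic power proportional to C·V²·f. This is only a sketch: it assumes constant switched capacitance and ignores termination and leakage, so the numbers are illustrative rather than a DRAM power estimate. The 1.5 V to 1.2 V step and the DDR3-1333 / DDR4-2666 pairing are used as example inputs.

```python
def relative_dynamic_power(v_new, v_old, f_new, f_old):
    """Ratio of new to old dynamic power under P ~ C * V^2 * f.

    Assumption: switched capacitance C is unchanged between the two cases.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Voltage drop alone, same data rate.
    print(relative_dynamic_power(1.2, 1.5, 1333, 1333))
    # Same voltage drop, but with the data rate doubled.
    print(relative_dynamic_power(1.2, 1.5, 2666, 1333))
```

In this model the voltage step alone cuts dynamic power to about 64% of the original, but doubling the data rate at the same time leaves it about 28% higher than before, which is the gap a continued Vdd roadmap would need to close.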
[Figure: Max DDRx Freq. (Mbps, log scale) and voltage level (V, log scale) vs. year of DDR DRAM demonstration, 2002-2020. Series: Max DDRx Freq. (Mbps), VDD, VDD-linear.]
Vdd Roadmap Required
• Continued voltage reduction is a natural means to power reduction.
– Technology scaling provides a means to reduce power while maintaining performance by lowering operating voltage.
– However, scaling will become exceedingly difficult beyond 2x nm processes.
  • Introduction is driven by the economics of productivity returns from large development and capital investments.
  • Difficult to time to a specific time line.
  • Even more difficult to plan across multiple manufacturers.
• Planned Vdd voltage is required to enable smooth industry transition and acceptance.
– Infrastructure support
  • Registers, Modules, models, tools, etc.
– Platform readiness
  • Robust interface specifications.
  • System voltage supply capabilities.
  • Coexistence and technology tolerance.
• A roadmap of power supply and interface specifications is required even before the technology is available to support it.
– Introduction timing may move, but having the steps previously defined will enable swift market migration once the technology is available.
512B Page Size
512B Page Size (x4) benefits power, performance and reliability.
• Lower activation power avoids need of x8 devices.
• Potential for improving bandwidth limiting timings:
– tFAW & tRRD are critical performance parameters for server workloads.
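Why tFAW matters for server workloads can be sketched with a worst-case bound. tFAW allows at most four row activations per window per rank, so if every access needs its own activation (no row-buffer hits), delivered bandwidth is capped regardless of bus speed. The 64-byte access size and the tFAW values below are illustrative assumptions, not figures from the slides.

```python
def tfaw_limited_gbps(tfaw_ns, bytes_per_activate=64):
    """Upper bound on single-rank bandwidth when each access needs an ACT.

    At most four activates fit in one tFAW window; bytes/ns equals GB/s.
    """
    return 4 * bytes_per_activate / tfaw_ns


def peak_gbps(data_rate_mtps, bus_bytes=8):
    """Peak data-bus bandwidth in GB/s for a 64-bit channel."""
    return data_rate_mtps * bus_bytes / 1000.0


if __name__ == "__main__":
    for tfaw_ns in (45.0, 30.0, 20.0):  # hypothetical window lengths
        cap = tfaw_limited_gbps(tfaw_ns)
        print(tfaw_ns, round(cap, 2), round(min(cap, peak_gbps(1333)), 2))
```

With a 30 ns window the cap is about 8.5 GB/s, below the roughly 10.7 GB/s peak of a 1333 MT/s channel in this model, so a smaller page size that permits a shorter tFAW translates directly into usable bandwidth for random-access traffic.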
[Figure: tRRD [nCK] by speed grade, DDR2-400 through DDR3-1600, for 2KB, 1KB and 512B page sizes.]
[Figure: tFAW [nCK] by speed grade, DDR2-400 through DDR3-1600, for 2KB, 1KB and 512B page sizes.]
[Figure: Latency vs tREFI. Relative latency (0-250%) vs. tREFI (0-9) for tRFC 110, 160, 300 and 560.]
[Figure: Bandwidth vs tREFI. Relative bandwidth (0-120%) vs. tREFI (0-9) for tRFC 110, 160, 300 and 560.]
Performance Impacts of Refresh
Refresh time of dense DRAMs is becoming a significant performance penalty.
• Long refresh cycle times (tRFC) drive longer latency.
• Short refresh intervals drive lower bandwidth.
DDR4 Improvement – Programmable Refresh
• 1x, 1/2x, 1/4x, 1/8x tREFI options.
• Enables controller to optimize refresh time vs. availability.
• Ability to hide refresh during idle time.
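The refresh trade-off can be sketched with a simple availability model. It assumes the device is unavailable for tRFC once every tREFI and that requests arrive at random times. The tRFC values are the ones named in the chart legends, assumed here to be in nanoseconds, and the 7.8 µs interval is the nominal DDR3 value. How tRFC changes under the shorter-interval options is not modeled.

```python
def refresh_busy_fraction(trfc_ns, trefi_ns):
    """Fraction of time the device is blocked by refresh."""
    return trfc_ns / trefi_ns


def mean_added_latency_ns(trfc_ns, trefi_ns):
    """Average extra wait for a request arriving at a random time.

    A request lands inside a refresh with probability tRFC/tREFI and
    then waits tRFC/2 on average.
    """
    return refresh_busy_fraction(trfc_ns, trefi_ns) * trfc_ns / 2.0


if __name__ == "__main__":
    trefi_ns = 7800.0  # nominal 1x refresh interval (assumed)
    for trfc_ns in (110.0, 160.0, 300.0, 560.0):
        print(trfc_ns,
              round(100 * refresh_busy_fraction(trfc_ns, trefi_ns), 2),
              round(mean_added_latency_ns(trfc_ns, trefi_ns), 2))
```

The blocked fraction grows linearly with tRFC while the average added latency grows with its square, which is why long refresh cycles on dense devices hurt latency more than bandwidth in this model.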
Additional DDR4 Features
Power
• External VPP supply.
– Efficiency of the VPP pump in DRAM is low at 1.2 V, and would become worse as Vdd is further reduced.
• Temperature controlled auto & self refresh.
– Saves IDD5 and IDD6.
• CAL
– Chip Select to address latency saves 40-50% of standby power.
• Vddq Termination
– Active and Standby power reduction (up to 40 mA per DRAM).
• Max Power Saving Mode
– Deep power down.
Performance
• Fine Granularity Refresh.
– Enables the controller to more efficiently schedule refresh.
Reliability
• CRC
– Protects against I/O errors at high data rates.
– Protects against write errors not covered by ECC.
• CMD/ADD Parity
– Protects against CMD/ADD errors between register and DRAM.
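The two reliability mechanisms above can be sketched generically. This is an illustration of the techniques, not the DDR4 bit mapping: it assumes an 8-bit CRC with polynomial x^8 + x^2 + x + 1 (0x07) computed over whole bytes, and even parity over an arbitrary command/address word. Both function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, zero initial value (assumed parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") & 1


if __name__ == "__main__":
    burst = bytes(range(8))            # stand-in for one write burst
    sent_crc = crc8(burst)
    corrupted = bytes([burst[0] ^ 0x10]) + burst[1:]  # one flipped bit
    print(sent_crc != crc8(corrupted))  # mismatch flags the I/O error
    print(even_parity_bit(0b1011))
```

The receiver recomputes the check value and compares it with the transmitted one; any single flipped bit changes the CRC, and any odd number of flipped command/address bits changes the parity.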
DDR4 and Beyond
DDR4 capacity, power and bandwidth improvements over DDR3 are well aligned to server needs.
DDR4 attractiveness and more rapid transition into the server space could be accomplished via:
– Cost effective availability of 3DS devices.
  • TSV process technology maturation.
– Addition of a Vdd roadmap.
  • Continued maniacal focus on power reduction.
– Elimination of tFAW restrictions.
– Enhanced refresh to increase device availability.
Final Question
Given the ever-widening gap between server and client/application device memory requirements, for how many more generations will a single DRAM standard satisfy the compute space?