DDR4 Memory Compliance Testing Barbara Aichinger FuturePlus Systems

Post on 22-Feb-2017

FuturePlus Systems Corporation

15 Constitution Drive

Bedford NH 03110 USA

Barbara P. Aichinger Vice President New Business Development

DDR4 Memory Compliance Testing

Agenda

• DDR Memory Standards for Compliance Testing

• Memory problems continue to plague the industry

– Recent Published Papers

– Row Hammer Failures

– Security Issues

• The concept of an Audit for Compliance Testing

– Electrical

– Protocol

– Row Hammer

– SPD/MRS

– Performance/Margin

• Summary

Compliance Testing Documents

• Not yet…getting closer…

• FuturePlus Systems is sponsoring a Protocol Checks Document

– Task Group has several industry members and several T&M vendors

– Several ballots have passed and a document is expected in 2017

Memory Errors continue to plague the industry

Memory Errors in Modern Systems

This is called Thresholding

Average ~2%

Errors in Facebook’s Fleet of Servers

If FB has 100K Servers

• ~2% have a memory failure every month

• 46% of those failures result in a DIMM swap

• Doing the math: 2% of 100K is 2000

• 46% of 2000 = 920 DIMM swaps a month!

• 30 days a month, 24 hours a day = 720 hours in a month

At more than one swap per hour on average, Facebook is swapping out DIMMs every hour of every day of every month all year long!
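The arithmetic on this slide can be reproduced in a few lines. This is only a back-of-the-envelope sketch: the 100K fleet size is the slide's own "if", and the variable names are chosen here for illustration.

```python
# Back-of-the-envelope DIMM swap rate, using the figures quoted on the slide.
servers = 100_000             # assumed fleet size ("If FB has 100K Servers")
monthly_failure_rate = 0.02   # ~2% of servers see a memory failure each month
swap_fraction = 0.46          # 46% of those failures end in a DIMM swap
hours_per_month = 30 * 24     # 720 hours

failing_servers = servers * monthly_failure_rate
swaps_per_month = failing_servers * swap_fraction
swaps_per_hour = swaps_per_month / hours_per_month

print(f"{failing_servers:.0f} servers with a memory failure per month")
print(f"{swaps_per_month:.0f} DIMM swaps per month")
print(f"{swaps_per_hour:.2f} DIMM swaps per hour on average")
```

The per-hour figure comes out slightly above one, which is what supports the "every hour of every day" statement.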

An Update on Row Hammer Failures

• Seen on DDR4

– Passmark Blog

• Several reports of DDR4 failing the Row Hammer test

– ThirdIO paper

• http://www.thirdio.com/rowhammer.pdf

– Usenix

– Blackhat

– SGI seeing DDR4 RH failures in HPC

Row Hammer

A quick review!

[Figure: DRAM array diagram. Labels: Activate Command, Columns, Rows (pages), Victim Row]
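As a rough illustration of how a captured command trace could be screened for hammering, the sketch below counts ACTIVATE commands per row inside a sliding time window. The trace format, the function name `find_hammered_rows`, the window length and the threshold are all hypothetical; real limits depend on the device and are not given in these slides.

```python
from collections import defaultdict, deque

def find_hammered_rows(activates, window_ns, threshold):
    """Return the (bank, row) pairs activated more than `threshold` times
    within any `window_ns` span.

    activates: list of (time_ns, bank, row) tuples sorted by time.
    window_ns and threshold are illustrative values, not JEDEC numbers.
    """
    recent = defaultdict(deque)   # (bank, row) -> ACT timestamps still in the window
    hammered = set()
    for time_ns, bank, row in activates:
        stamps = recent[(bank, row)]
        stamps.append(time_ns)
        # Drop activations that have fallen out of the sliding window.
        while time_ns - stamps[0] > window_ns:
            stamps.popleft()
        if len(stamps) > threshold:
            hammered.add((bank, row))
    return hammered

# Hypothetical trace: row 7 is activated far more often than row 9.
trace = sorted([(i * 100, 0, 7) for i in range(10)] +
               [(i * 100 + 50, 0, 9) for i in range(3)])
print(find_hammered_rows(trace, window_ns=1000, threshold=5))
```

The rows reported are the aggressors; the victim rows would be their physical neighbours, which this sketch does not attempt to map.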

USENIX Security Symposium

August 2016

ECC will not save you!

Row Hammer Failures on DDR4

https://www.sgi.com/pdfs/4567.pdf

Introducing: The concept of an AUDIT for JEDEC Compliance Testing

• Not a repeat of a Design Verification

• A check to make sure the JEDEC specification is being met

For the System and DIMM

• Audit the signal integrity of the memory channel

• Monitor the system for Protocol Violations

– BIOS programming errors

– SPD programmed incorrectly

– Memory Controller Issues

• SPD Check

• Row Hammer Testing

• Performance/Margin Testing

Using a Scan from a Logic Analyzer instead of a Scope

• Allows for an easy and quick check of:

– Signal Alignment

– Relative Data Valid Eye

– Signal Swing

To see all signals at once, a slot interposer is used

DIMM Slot Interposer allows the system to operate up to 4200MT/s and run any application

Audit: Signal Swing

Slide Courtesy of

Overdriving DDR4 DRAM to 1.4V could cause damage.

Potential ODT setting issue. Threshold of first bit in burst has less swing than remainder of burst. Could also be ISI (inter-symbol interference).

Audit: Signal Alignment

For READS the Strobe is edge aligned to the Data

For WRITES the Strobe is center aligned to the Data

Signal Alignment

All the Data signals in a Byte should be aligned

Relative Data Eye

DQ Write Eye overlay on Byte 5

5000 cycles (2400MT/s)

Eye threshold centered at 790 mV to 838 mV

Eye size: avg. of 272 mV x 205 ps

Observations

All eyes are consistent in size and alignment.
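To put the measured eye width in context, it can be compared with the unit interval (the time per transfer) at the quoted data rate. The sketch below uses only the numbers on this slide; the variable names are illustrative.

```python
# Relate the measured eye width to the unit interval at 2400 MT/s.
transfer_rate_mts = 2400   # mega-transfers per second, from the slide
eye_width_ps = 205         # average eye width, from the slide
eye_height_mv = 272        # average eye height, from the slide

# One transfer takes 1 / (2400e6 per second); expressed in picoseconds.
unit_interval_ps = 1e6 / transfer_rate_mts
eye_width_fraction = eye_width_ps / unit_interval_ps

print(f"Unit interval: {unit_interval_ps:.1f} ps")
print(f"Eye width is {eye_width_fraction:.0%} of the unit interval")
```

A relative eye measured with a logic analyzer scan is useful for comparing lanes against each other; it is not a substitute for an absolute oscilloscope measurement.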

Address Signals

Easy to check even at higher speeds

3200MT/s

Read data with Strobe

Write data with Strobe

Next Check for JEDEC Protocol Violations by the memory controller

• The DDR4 JEDEC spec contains rules on event ordering

• Examples

– Do not ACTIVATE a bank that is already open

– Do not PRECHARGE a bank that is already closed

– Do not RD/WR to a page that is not open
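The three example rules above amount to tracking the open or closed state of each bank. The following is a minimal sketch of such a check over a simplified command trace; the tuple format and the function name `check_bank_protocol` are assumptions made for illustration and do not describe any particular analyzer's output.

```python
def check_bank_protocol(commands):
    """Flag violations of the three example ordering rules.

    commands: list of (cmd, bank, row) tuples, cmd in {"ACT", "PRE", "RD", "WR"}.
    `row` is only meaningful for ACT; pass None otherwise (hypothetical format).
    Returns a list of (index, message) tuples.
    """
    open_row = {}      # bank -> row currently open in that bank
    violations = []
    for index, (cmd, bank, row) in enumerate(commands):
        if cmd == "ACT":
            if bank in open_row:
                violations.append((index, "ACT to a bank that is already open"))
            open_row[bank] = row
        elif cmd == "PRE":
            if bank not in open_row:
                violations.append((index, "PRE to a bank that is already closed"))
            open_row.pop(bank, None)
        elif cmd in ("RD", "WR"):
            if bank not in open_row:
                violations.append((index, f"{cmd} to a bank with no open page"))
    return violations

# Hypothetical trace with one violation of each rule.
example = [
    ("ACT", 0, 5), ("RD", 0, None), ("ACT", 0, 6),
    ("PRE", 0, None), ("PRE", 0, None), ("WR", 1, None),
]
for index, message in check_bank_protocol(example):
    print(index, message)
```

A real checker would also track bank groups, ranks, auto-precharge and refresh; this sketch covers only the rules listed on the slide.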

Memory Controller Timing Violations

• Clock edge boundary

– Commands cannot be too close together or too far apart

– Examples

• tREFI - Average refresh interval

• tRC - ACT to ACT or REF

• tMOD - MRS to PDE

• tCCD_L - RD to RD to Same Bank Group
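A minimum-spacing check of this kind can be sketched as a table of rules applied to a time-ordered trace. Everything specific below is hypothetical: the trace format, the function name `check_min_spacing`, and especially the cycle counts, which are placeholders rather than values from the JEDEC specification. Only "too close together" is covered; a "too far apart" limit such as tREFI would need a separate maximum check, and tRC is checked here for ACT to ACT only.

```python
def check_min_spacing(trace, rules):
    """Report commands issued sooner than a rule's minimum spacing.

    trace: list of (cycle, cmd, bank_group, bank) tuples sorted by cycle.
    rules: list of (name, cmd, scope, min_cycles); scope is "bank" or
           "bank_group" and selects which earlier commands are compared.
    Returns a list of (cycle, rule_name, observed_spacing) tuples.
    """
    last_seen = {}     # (rule name, scope key...) -> cycle of previous matching command
    violations = []
    for cycle, cmd, bank_group, bank in trace:
        for name, rule_cmd, scope, min_cycles in rules:
            if cmd != rule_cmd:
                continue
            if scope == "bank":
                key = (name, bank_group, bank)
            else:
                key = (name, bank_group)
            previous = last_seen.get(key)
            if previous is not None and cycle - previous < min_cycles:
                violations.append((cycle, name, cycle - previous))
            last_seen[key] = cycle
    return violations

# Placeholder limits in clock cycles; real values depend on the speed bin.
rules = [("tRC", "ACT", "bank", 55), ("tCCD_L", "RD", "bank_group", 6)]
timing_trace = [
    (0, "ACT", 0, 0), (20, "RD", 0, 0), (24, "RD", 0, 1),
    (40, "ACT", 0, 0), (100, "ACT", 0, 0),
]
print(check_min_spacing(timing_trace, rules))
```

Keeping the rules in a table makes it straightforward to run many checks over the same trace in a single pass.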

65 violations identified with more than 1000 simultaneous checks

Protocol and Timing Compliance ‘in the wild’

JEDEC Specification Violation

The SPD has to be checked!

Serial Presence Detect Device

Mistakes in the SPD can lead to the BIOS not programming the Memory Controller correctly

Mode Register Settings

Performance Metrics

Not necessary for JEDEC compliance, but nice to know!

• Which power management features are implemented

– Is Self Refresh ever being used?

– Is Max Power Down implemented?

• Can we look to see if any timing parameters can be improved?

Increasing Performance by looking at timing margins

RD to WR same Rank: Spec says 7, system operating at 10
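The slack in this RD to WR example can be expressed in clocks and in time. The sketch below assumes the 7 and 10 are clock cycles and borrows the 2400 MT/s data rate from the earlier eye measurement; the slide does not state the data rate for this capture, so treat both as assumptions, and the function name `timing_margin` as illustrative.

```python
def timing_margin(spec_min_clocks, observed_clocks, transfer_rate_mts):
    """Return (margin in clocks, margin in ns) for a minimum-spacing parameter.

    transfer_rate_mts is the data rate in MT/s; DDR moves two transfers per
    clock, so the clock period in ns is 2000 / transfer_rate_mts.
    """
    clock_period_ns = 2000.0 / transfer_rate_mts
    margin_clocks = observed_clocks - spec_min_clocks
    return margin_clocks, margin_clocks * clock_period_ns

# Slide figures: spec minimum 7, system observed at 10. Data rate is assumed.
clocks, nanoseconds = timing_margin(7, 10, transfer_rate_mts=2400)
print(f"Margin: {clocks} clocks, about {nanoseconds:.2f} ns per RD to WR turnaround")
```

A positive margin means the controller is more conservative than the specification requires, which is where a performance improvement may be available.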

Operating right at Specification

Not happening! No Power Management

Making the Measurement

Photos Courtesy of Keysight Technologies

Photos Courtesy of FuturePlus Systems

Summary

• Memory Errors in the Field are pervasive!

• DDR Memory Compliance Testing can be achieved using the method outlined

• Tools are available

– Purchase or Rent

• Companies needing help can hire industry experts to perform the testing for them

Contact Information

Barbara P. Aichinger

FuturePlus Systems

Barb.Aichinger@FuturePlus.com

603-472-5905

www.FuturePlus.com

www.DDRDetective.com