ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design


ELEC 7770 Advanced VLSI Design, Spring 2007: Soft Errors and Fault-Tolerant Design. Vishwani D. Agrawal, James J. Danaher Professor, ECE Department, Auburn University, Auburn, AL 36849. [email protected] http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Transcript of ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Page 1: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Spring 07, Apr 17, 19    ELEC 7770: Advanced VLSI Design (Agrawal)

ELEC 7770
Advanced VLSI Design
Spring 2007
Soft Errors and Fault-Tolerant Design

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849

[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Page 2: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Soft Errors

Soft errors are errors caused by the operating environment.
They are not due to a permanent hardware fault.
Soft errors are intermittent or random, which makes testing for them unreliable.
One way to deal with soft errors is to make hardware robust:
    Capable of detecting soft errors
    Capable of correcting soft errors
    Both measures are probabilistic

Page 3: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Some Early References

J. von Neumann, “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” pp. 329-378, 1959, in A. H. Taub, editor, John von Neumann: Collected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis, Oxford University Press, 1963.

M. A. Breuer, “Testing for Intermittent Faults in Digital Circuits,” IEEE Trans. Computers, vol. C-22, no. 3, pp. 241-246, March 1973.

T. C. May and M. H. Woods, “Alpha-Particle-Induced Soft Errors in Dynamic Memories,” IEEE Trans. Electron Devices, vol. ED-26, no. 1, pp. 2-9, 1979.

Page 4: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Causes of Soft Errors

Interconnect coupling (crosstalk).
Power supply noise: IR-drop, delta-I.
Effects generally attributed to alpha-particles:
    Charged particles: electrons, protons, ions.
    Radiation (photons): X-rays, gamma-rays, ultra-violet light.

Page 5: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Sources of Alpha-Particles

Radioactive contamination in VLSI packaging material.
Ionosphere, magnetosphere and solar radiation.
Other electromagnetic radiation.

Page 6: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Alpha-Particle

Helium nucleus: two protons and two neutrons, mass = 6.65 × 10^-27 kg, charge = +2e (e = 1.6 × 10^-19 C).
Rest-mass energy (mc^2) = 3.73 GeV
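The 3.73 GeV figure agrees with the rest-mass energy E = mc^2 of the particle rather than with a typical kinetic energy. A minimal check, assuming rounded values for the speed of light and the electron-volt (neither constant appears on the slide):

```python
# Check that 3.73 GeV matches the alpha particle's rest-mass energy, E = m * c^2.
ALPHA_MASS_KG = 6.65e-27         # mass quoted on the slide
SPEED_OF_LIGHT_M_S = 2.998e8     # assumed rounded constant
JOULES_PER_EV = 1.602e-19        # assumed; the slide rounds e to 1.6e-19 C


def rest_energy_gev(mass_kg: float) -> float:
    """Return the rest-mass energy of a particle in GeV."""
    joules = mass_kg * SPEED_OF_LIGHT_M_S ** 2
    return joules / JOULES_PER_EV / 1e9


print(f"{rest_energy_gev(ALPHA_MASS_KG):.2f} GeV")
```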

Page 7: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Soft Error Rate (SER)

Failures in time (FIT): One FIT is 1 error per billion hours of operation.
Alternative unit is mean time between failures (MTBF).

1 year MTBF = 10^9/(365×24) = 114,155 FIT
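The conversion between the two units can be sketched as a pair of helper functions. The function names are illustrative, and a 365-day year is assumed, as on the slide.

```python
# Convert between FIT (failures per 10^9 device-hours) and MTBF in years.
HOURS_PER_YEAR = 365 * 24   # 8760 hours, matching the slide's arithmetic


def mtbf_years_to_fit(mtbf_years: float) -> float:
    """FIT rate that corresponds to a given MTBF in years."""
    return 1e9 / (mtbf_years * HOURS_PER_YEAR)


def fit_to_mtbf_years(fit: float) -> float:
    """MTBF in years that corresponds to a given FIT rate."""
    return 1e9 / (fit * HOURS_PER_YEAR)


print(round(mtbf_years_to_fit(1)))          # FIT rate for a 1-year MTBF
print(f"{fit_to_mtbf_years(1000):.1f}")     # MTBF in years for 1000 FIT
```

Because the two quantities are reciprocal, a tenfold increase in FIT shortens the MTBF by a factor of ten.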

Page 8: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Particle Strike

[Figure: an ion or charged particle passes through an n region in a p-substrate, leaving positive and negative charges along its track.]

Page 9: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Induced Current

[Figure: plot of induced current versus time.]

I(t) = I_0 (e^{-t/a} - e^{-t/b}), a >> b
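Setting dI/dt = 0 in the expression above gives the time of the current peak, t_peak = (a b / (a - b)) ln(a / b). The sketch below evaluates the pulse and its peak time. The numeric values of I0, a and b are hypothetical and chosen only for illustration.

```python
import math


def induced_current(t: float, i0: float, a: float, b: float) -> float:
    """Double-exponential pulse from the slide: I0 * (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def peak_time(a: float, b: float) -> float:
    """Time at which I(t) is maximal, obtained from dI/dt = 0."""
    return (a * b / (a - b)) * math.log(a / b)


# Hypothetical values: 100 uA scale factor, 200 ps decay and 50 ps rise constants.
I0, A, B = 100e-6, 200e-12, 50e-12
t_pk = peak_time(A, B)
print(f"peak at {t_pk * 1e12:.1f} ps, I = {induced_current(t_pk, I0, A, B) * 1e6:.1f} uA")
```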

Page 10: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Voltage Induced at a Node

V = Q/C

Where Q = ∫ I(t) dt

C = node capacitance

Smaller node capacitance will result in larger voltage swing.
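For the double-exponential pulse I(t) = I_0 (e^{-t/a} - e^{-t/b}), integrating from 0 to infinity gives Q = I_0 (a - b). The sketch below checks that closed form numerically and shows how the swing V = Q/C grows as C shrinks. All component values are hypothetical, and the model ignores clamping by the supply rails and restoring current from the driving gate.

```python
import math


def induced_current(t, i0, a, b):
    """Double-exponential current pulse: I0 * (exp(-t/a) - exp(-t/b))."""
    return i0 * (math.exp(-t / a) - math.exp(-t / b))


def collected_charge(i0, a, b):
    """Closed-form integral of I(t) from 0 to infinity: Q = I0 * (a - b)."""
    return i0 * (a - b)


def numeric_charge(i0, a, b, steps=20000):
    """Trapezoidal estimate of the same integral, truncated at t = 20 * a."""
    dt = 20 * a / steps
    total = 0.0
    for k in range(steps):
        left = induced_current(k * dt, i0, a, b)
        right = induced_current((k + 1) * dt, i0, a, b)
        total += 0.5 * (left + right) * dt
    return total


def voltage_swing(i0, a, b, c):
    """Voltage disturbance at a node of capacitance c: V = Q / C."""
    return collected_charge(i0, a, b) / c


# Hypothetical pulse parameters and node capacitances.
I0, A, B = 100e-6, 200e-12, 50e-12
for cap in (50e-15, 10e-15):
    print(f"C = {cap * 1e15:.0f} fF: V = {voltage_swing(I0, A, B, cap):.2f} V")
```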

Page 11: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Effect on Digital Circuit

[Figure: combinational logic between IN and OUT with a clocked (CK) storage element; charged particles are shown striking the circuit at two locations.]

Page 12: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

An SRAM Cell

[Figure: SRAM cell with supply VDD, word line WL, two bit lines (BL) and two storage nodes (bit) holding 0 and 1.]

Page 13: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SRAM Cell Struck by Alpha-Particle: Single-Event Upset (SEU)

[Figure: the same SRAM cell (VDD, WL, BL, bit) struck by charged particles; the stored values flip, 0→1 and 1→0.]

Page 14: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

D-Latch

[Figure: D-latch with input D and clock CK = 0; output Q holds 1 and the complementary internal node holds 0.]

Page 15: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SEU in D-Latch

[Figure: the same D-latch with CK = 0 struck by charged particles; Q changes 1→0 and the complementary node changes 0→1.]

Page 16: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Single Event Transients in Combinational Logic

[Figure: combinational logic between clocked (CK) storage elements with logic values annotated on the signal lines; charged particles strike the logic and produce a transient.]

Page 17: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Effects of Transients

Error correcting effects
    Transient pulse is filtered by gate inertia
    Transient is blocked by an unsensitized path
    Transient is blocked by an inactive clock
Error enhancing effects
    Large number of gates can produce multiple pulses
    Fanouts can multiply error pulses
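Blocking by an unsensitized path can be illustrated with a single gate: a transient on one input reaches the output only if the other input holds a non-controlling value. The sketch below is a simplified logic-level model; the gate and function names are illustrative.

```python
def and_gate(x: int, y: int) -> int:
    """Two-input AND gate."""
    return x & y


def or_gate(x: int, y: int) -> int:
    """Two-input OR gate."""
    return x | y


def transient_propagates(gate, side_input: int) -> bool:
    """True if flipping the struck input changes the gate output.

    The struck input is toggled between 0 and 1 while the side input is held.
    """
    return gate(0, side_input) != gate(1, side_input)


# An AND gate with side input 0 (controlling value) blocks the transient.
print(transient_propagates(and_gate, 0))   # path not sensitized
print(transient_propagates(and_gate, 1))   # path sensitized
```

The same test applied along every gate of a path tells whether a transient at an internal node can reach a flip-flop input at all.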

Page 18: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SEUs in FPGA

Parts that can be affected
    Look-up table (LUT)
    Configuration memory cell
    Flip-flop
    Block RAM

Page 19: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

LUT

[Figure: look-up table in which inputs F1, F2, F3 and F4 select one of the memory cells to drive out; the stored bits are shown beside the cells.]

Page 20: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SEU in LUT

[Figure: the same look-up table (inputs F1, F2, F3, F4, output out) struck by a charged particle; one memory cell is changed from 1 to 0.]
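A LUT can be modelled as a small memory addressed by its inputs, so an upset in one cell changes the implemented function for exactly one input combination. The sketch below assumes an address order (F1 as most significant bit) and illustrative cell contents; neither is recoverable with certainty from the figure.

```python
def lut_output(cells, f1, f2, f3, f4):
    """Read the memory cell selected by the four LUT inputs.

    Assumed address order: F1 is the most significant bit.
    """
    address = (f1 << 3) | (f2 << 2) | (f3 << 1) | f4
    return cells[address]


def flip_cell(cells, index):
    """Return a copy of the cell contents with one bit upset (SEU)."""
    upset = list(cells)
    upset[index] ^= 1
    return upset


# Illustrative 16-bit contents; cell 3 holds a 1 that the upset turns into 0.
golden = [int(bit) for bit in "1011011000001110"]
upset = flip_cell(golden, 3)

affected = [addr for addr in range(16) if golden[addr] != upset[addr]]
print(affected)   # input combinations whose output is now wrong
```

Unlike a transient in logic, this error persists until the configuration memory is rewritten.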

Page 21: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Four Types of SEU in FPGA

[Figure: FPGA logic block with a LUT (inputs F1, F2, F3, F4), a flip-flop (FF), configuration memory cells (M) and a block RAM; four upset locations are marked Type 1, Type 2, Type 3 and Type 4.]

Page 22: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SEU Detection Methods

Hardware redundancy
Time redundancy
Error detection codes (EDC)
Self-checker techniques

Page 23: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

SEU Mitigation Techniques

Triple modular redundancy (TMR)
Multiple redundancy with voting
Error detection and correction codes (EDAC)
Hardened memory cells
FPGA-specific methods
    Reconfiguration
    Partial configuration
    Rerouting design

Page 24: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Hardware Redundancy for Detection

[Figure: the inputs drive the combinational logic and a duplicated copy of it; the two results are compared, and logic 1 at the comparison output indicates an error.]

Hardware overhead is high, ~100%.
Performance penalty is negligible.
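Duplication with comparison can be sketched at the logic level as two evaluations of the same function followed by an XOR. The function `logic` below is a hypothetical stand-in for the protected block, and the `upset_copy` argument is an illustrative fault-injection hook.

```python
def logic(a: int, b: int, c: int) -> int:
    """Hypothetical combinational function standing in for the protected logic."""
    return (a & b) | c


def duplicate_and_compare(inputs, upset_copy=None):
    """Evaluate two copies of the logic and flag a mismatch.

    upset_copy selects which copy (1 or 2) has its output flipped by a soft
    error; None means no error. Returns (output, error_flag).
    """
    out1 = logic(*inputs)
    out2 = logic(*inputs)
    if upset_copy == 1:
        out1 ^= 1
    elif upset_copy == 2:
        out2 ^= 1
    error = out1 ^ out2          # logic 1 indicates error
    return out1, error


print(duplicate_and_compare((1, 1, 0)))
print(duplicate_and_compare((1, 1, 0), upset_copy=2))
```

The scheme detects a mismatch but cannot tell which copy is wrong, which is why correction needs a third copy and a voter.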

Page 25: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Time Redundancy for Detection

[Figure: the combinational logic output is captured by two D flip-flops, one clocked by CK and the other by CK+d; the two captured values are compared, and logic 1 indicates an error.]

Hardware overhead is low.
Performance penalty is about d, which is also the maximum detectable pulse width.
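The role of the delay d can be seen in a simple timing model: the output is sampled at CK and again at CK+d, and a transient is flagged only if it corrupts exactly one of the two samples. The sketch below uses arbitrary time units and ignores setup and hold times; all names are illustrative.

```python
def output_at(t, correct, pulse_start, pulse_width):
    """Logic output at time t when a transient inverts it during the pulse."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return correct ^ 1 if in_pulse else correct


def time_redundant_check(t_ck, d, correct, pulse_start, pulse_width):
    """Sample at CK and CK+d and return the error flag (1 = mismatch)."""
    first = output_at(t_ck, correct, pulse_start, pulse_width)
    second = output_at(t_ck + d, correct, pulse_start, pulse_width)
    return first ^ second


# Pulse narrower than d: only the first sample is corrupted, so it is detected.
print(time_redundant_check(10.0, 2.0, 1, 9.5, 1.0))
# Pulse wider than d: both samples are corrupted and the error escapes.
print(time_redundant_check(10.0, 2.0, 1, 9.5, 3.0))
```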

Page 26: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Repeat on Error Detection

[Figure: the combinational logic output is captured by two D flip-flops clocked by CK and CK+d; the two captured values are compared (logic 1 indicates error) and also drive a C-element (C) that produces the output.]

Operation: If an error is detected, the output retains its previous value. Repeating the computation can then produce the correct result.
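The hold-and-repeat behaviour can be sketched as a loop over attempts, where each attempt supplies the two samples taken at CK and CK+d. The function names and the retry policy are assumptions made for illustration; the slide does not specify how the repetition is controlled.

```python
def c_element(a: int, b: int, previous: int) -> int:
    """Output follows the inputs when they agree, otherwise holds its old value."""
    return a if a == b else previous


def run_with_retry(samples_per_attempt, initial_output):
    """Apply the two samples of each attempt until they agree.

    Returns (output, attempt_number); attempt_number is None if no attempt
    produced matching samples.
    """
    output = initial_output
    for attempt, (first, second) in enumerate(samples_per_attempt, start=1):
        output = c_element(first, second, output)
        if first == second:
            return output, attempt
    return output, None


# First attempt is hit by a transient (samples differ); the repeat succeeds.
print(run_with_retry([(1, 0), (1, 1)], initial_output=0))
```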

Page 27: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Muller C-Element

[Figure: C-element symbol (C) with inputs A and B and one output.]

A    B    output
0    0    0
0    1    Old output
1    0    Old output
1    1    1

[Figure: implementation using a set-reset element (S, R, Q) driven from A and B, with Q as the output.]
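The truth table translates directly into a function with one state input. The second function is one possible set-reset formulation, assuming S = A AND B and R = (NOT A) AND (NOT B); the gate-level details of the slide's figure are not recoverable, so this mapping is an assumption.

```python
def c_element(a: int, b: int, previous: int) -> int:
    """Muller C-element: output changes only when both inputs agree."""
    return a if a == b else previous


def c_element_sr(a: int, b: int, previous: int) -> int:
    """Same behaviour expressed as a set-reset latch (assumed S and R logic)."""
    s = a & b                  # set when both inputs are 1
    r = (1 - a) & (1 - b)      # reset when both inputs are 0
    if s:
        return 1
    if r:
        return 0
    return previous            # S = R = 0: latch holds its state


for a in (0, 1):
    for b in (0, 1):
        print(a, b, [c_element(a, b, old) for old in (0, 1)])
```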

Page 28: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Triple Modular Redundancy (TMR)

[Figure: the inputs drive three copies of the combinational logic (copy 1, copy 2, copy 3); a majority voter combines the three results into the output.]

Page 29: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Majority Voter Circuit

[Figure: majority voter block with inputs A, B, C and one output, together with its gate-level circuit.]

A    B    C    output
0    0    0    0
0    0    1    0
0    1    0    0
0    1    1    1
1    0    0    0
1    0    1    1
1    1    0    1
1    1    1    1
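The truth table corresponds to the Boolean function AB + BC + AC. The sketch below implements the voter and uses it to mask a single faulty copy in a TMR arrangement; the protected function and the `faulty_copy` fault-injection argument are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority of three bits: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


def tmr(logic, inputs, faulty_copy=None):
    """Evaluate three copies of logic and vote on the result.

    faulty_copy (0, 1 or 2) flips the output of one copy to model a soft error.
    """
    outputs = [logic(*inputs) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] ^= 1
    return majority(*outputs)


def xor2(x: int, y: int) -> int:
    """Hypothetical protected function."""
    return x ^ y


print([tmr(xor2, (1, 0), faulty_copy=k) for k in (None, 0, 1, 2)])
```

A single erroneous copy never changes the voted output; two simultaneous errors do.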

Page 30: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Alternative Implementations of Voter

[Figure: two voter implementations with inputs A, B, C and one output each. The first is a LUT holding the contents 00010111; the second is a circuit connected to VDD.]
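The LUT contents 00010111 are the output column of the majority truth table read from ABC = 000 to ABC = 111. A short sketch, assuming A is the most significant address bit:

```python
VOTER_LUT = "00010111"   # LUT contents from the slide


def lut_voter(a: int, b: int, c: int) -> int:
    """Majority voter realized as a table lookup (A assumed to be the MSB)."""
    address = (a << 2) | (b << 1) | c
    return int(VOTER_LUT[address])


for address in range(8):
    a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
    print(a, b, c, lut_voter(a, b, c))
```

Because the majority function is symmetric in its inputs, the table is the same for any assignment of A, B and C to the address bits.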

Page 31: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Triple Modular Redundancy (TMR)

[Figure: TMR using time redundancy. Labels: inputs, Combinational Logic, four D flip-flops clocked by CK, CK+d, CK+2d and CK+3d, Majority Voter, output.]

Page 32: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

TMR for Memory Cells

[Figure: Labels: inputs, Combinational Logic, three D flip-flops clocked by CK, Majority Voter, output.]

Problems:
1. Accumulation of errors in flip-flops.
2. Voter is not protected.

Page 33: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

FF Refresh and TMR for Memory Cells

[Figure: three D flip-flops clocked by CK and four majority voters; signals r1, r2 and r3 and the output are labelled.]
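The benefit of refreshing can be sketched with a cycle-level model: if each flip-flop reloads the voted value on every clock, a single upset is erased before a second one can accumulate. This is a simplified reading of the scheme in which the voters themselves are assumed fault-free; the function names and the upset schedule are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)


def simulate(stored: int, upsets, refresh_enabled: bool) -> int:
    """Run one clock cycle per entry in upsets and return the voted output.

    Each entry is the index (0, 1 or 2) of the flip-flop upset in that cycle,
    or None for a cycle without an upset.
    """
    ffs = [stored, stored, stored]
    for upset in upsets:
        if upset is not None:
            ffs[upset] ^= 1
        if refresh_enabled:
            voted = majority(*ffs)      # one voter per flip-flop, same inputs
            ffs = [voted, voted, voted]
    return majority(*ffs)


schedule = [0, None, 1]   # two upsets in different flip-flops, cycles apart
print(simulate(1, schedule, refresh_enabled=True))
print(simulate(1, schedule, refresh_enabled=False))
```

Without refresh the second upset meets a flip-flop that is still wrong, so two of the three copies disagree with the stored value and the voter output is corrupted.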

Page 34: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

A Resistor Hardened SRAM Cell

[Figure: resistor-hardened SRAM cell with supply VDD, word line WL, two bit lines (BL) and two storage nodes (bit) holding 0 and 1.]

Page 35: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

References

F. L. Kastensmidt, L. Carro and R. Reis, Fault-Tolerant Techniques for SRAM-Based FPGAs, Springer, 2006.

S. Mitra, N. Seifert, M. Zhang, Q. Shi, and K. S. Kim, “Robust System Design with Built-In Soft-Error Resilience,” Computer, vol. 38, no. 2, pp. 43-52, February 2005.

Page 36: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Summary of Topics Covered (1)

Nanotechnology devices
Moore’s law
System level design for testability and test scheduling problem
Verification
    Logic equivalence
    Binary decision diagrams
Power consumption and low-power concepts
Multi-core parallelism
Microprocessors
Memories

Page 37: ELEC 7770 Advanced VLSI Design Spring 2007 Soft Errors and Fault-Tolerant Design

Summary of Topics Covered (2)

Timing
    Timing verification
        Timing simulation
        Static timing analysis
    Timing optimization
        Linear programming and clock constraints
        Clock skew problem
        Zero skew design
    Retiming, constraint graph and performance optimization
Soft errors and fault-tolerant design