Reliability in Nanometer Technologies … · 2017. 10. 19. · PPGEE ’08...

PPGEE ’08
Reliability in Nanometer Technologies – Problems and Solutions
Dr.-Ing. Frank Sill
Department of Electrical Engineering, Federal University of Minas Gerais, Av. Antônio Carlos 6627, CEP: 31270-010, Belo Horizonte (MG), Brazil
[email protected]
http://www.cpdee.ufmg.br/~frank/



  • Agenda

    Motivation
    Failures in Nanometer Technologies
    Techniques to Increase Reliability
    Shadow Transistors

    Copyright Sill, 2008 PPGEE‘08, Reliability 2

  • Motivation

    Reliability is important for:

    Normal user

    Companies

    Medical applications

    Cars

    Air / Space Environment


  • Motivation

    [Figure: transistor count (millions) of Intel processors vs. year, 2002 to 2008: Northwood 55 M, Prescott 125 M, Yonah 151 M, Wolfdale 410 M]

    Probability for failures increases due to:
    – Increasing transistor count
    – Shrinking technology

  • Motivation

    [Figure: transistor count (millions) and technology node of Intel processors vs. year, 2002 to 2008: Northwood 130 nm, Prescott 90 nm, Yonah 65 nm, Wolfdale 45 nm]

    Probability for failures increases due to:
    – Increasing transistor count
    – Shrinking technology

  • Dimensions

    [Figure: size scale from 1 m down to 100 nm (1 m, 10 cm, 1 cm, 1 mm, 100 µm, 10 µm, 100 nm), ending at a "65 nm" transistor]

    Sources: "Spektrum der Wissenschaften"; Intel

  • Failures in Nanometer Technologies

  • Process Failures

    Occur during the production phase
    Based on:
    – Process variations
    – Particles …

    Source: Mak

  • Sub-wavelength Lithography

    [Figure: lithography wavelength (365 nm, 248 nm, 193 nm, 13 nm EUV) and technology generation (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) vs. year, 1980 to 2020; the gap between wavelength and feature size is marked]

    Source: Mark Bohr, Intel

  • Field-dependent Aberrations

    $\mathrm{A\_CELL}(X_0, Y_0) \neq \mathrm{A\_CELL}(X_1, Y_1) \neq \mathrm{A\_CELL}(X_2, Y_2)$
    The same cell prints differently depending on its position (X, Y) in the exposure field.

    [Figure: rays through the lens towards the wafer plane; center: minimal aberrations, edge: high aberrations]

    Source: R. Pack, Cadence

  • Varying Line Width

    [Figure: measured line width [nm] across the wafer, plotted over wafer X and wafer Y]

    Source: Zhou, 2001

  • Random Dopant Fluctuations

    Cause Vth variations

    [Figure: mean number of dopant atoms vs. technology node (1000 nm down to 32 nm), log scale from 10 to 10,000; uniform vs. non-uniform dopant distribution]

    Source: Borkar, Intel

  • Power Density

    [Figure: power density (W/cm²) of Intel processors vs. year, 1970 to 2010, log scale (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, Prescott), compared with a hot plate, a nuclear reactor, a rocket nozzle and the sun’s surface]

    Source: Moore, ISSCC 2003

  • Temperature Variation

    [Figure: power map and on-die temperature]

    Power density is not uniformly distributed across the chip
    Silicon is not a good heat conductor
    Maximum junction temperature is determined by hot spots
    Impact on packaging and cooling

    Source: Borkar, Intel

  • Temperature Variation cont’d

    [Figure: Power4 server chip]

    Source: Devgan, ICCAD’03

  • Temperature Variation cont’d

    [Figure: drain current I_DS and delay vs. temperature [°C]]

    Threshold voltage Vth changes with temperature → drain-source current changes → delay changes

    Source: Burleson, UMASS, 2007

  • Supply Voltage Drop

    Source: Trester, 2005

  • Failures Through Increasing Delay

    Normal case: data are processed before the clock phase is over
    [Timing diagram: clock (Clk)]

    Logic too slow!
    → Data processing takes longer than the clock phase
    → Wrong data in the next clock phase!
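A minimal sketch of the timing condition on this slide, in Python: a path captures correct data only if its delay (plus setup time) fits into the clock period. The 1 ns period, the 0.9 ns nominal delay, the slowdown factors and the helper name `slowed_delay` are illustrative assumptions; the factor merely stands in for effects such as temperature or supply-voltage drop discussed on the previous slides.

```python
# Minimal sketch of a setup-timing check; all delay numbers are illustrative
# assumptions, not values from the slides.

def captures_correct_data(path_delay_ns: float, clock_period_ns: float,
                          setup_time_ns: float = 0.0) -> bool:
    """True if the logic output settles (plus setup time) before the next clock edge."""
    return path_delay_ns + setup_time_ns <= clock_period_ns


def slowed_delay(nominal_delay_ns: float, slowdown: float) -> float:
    """Scale the nominal path delay by a hypothetical slowdown factor
    (standing in for, e.g., higher temperature or a supply-voltage drop)."""
    return nominal_delay_ns * slowdown


if __name__ == "__main__":
    period_ns = 1.0    # 1 GHz clock
    nominal_ns = 0.9   # path meets timing under nominal conditions
    for factor in (1.0, 1.05, 1.15):
        delay = slowed_delay(nominal_ns, factor)
        ok = captures_correct_data(delay, period_ns)
        print(f"slowdown {factor:.2f}: delay {delay:.3f} ns -> "
              f"{'correct data' if ok else 'wrong data in next clock phase'}")
```

With these numbers the path still works at a 5 % slowdown, but a 15 % slowdown (0.9 ns × 1.15 = 1.035 ns) exceeds the 1 ns period.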

  • Soft Errors

    Observed in the 1970s: DRAMs occasionally flip bits for no apparent reason
    – Ultimately linked to alpha particles and cosmic rays
    – Collisions with particles create electron-hole pairs in the substrate
    – These carriers are collected on dynamic nodes, disturbing the voltage

    Source: Automotive 7-8, 2004

  • Soft Errors cont’d

    The internal state of a node flips briefly. If the error isn’t masked by
    – Logic: the wrong input doesn’t lead to a wrong output
    – Electrical: the pulse is attenuated by the following gates
    – Timing: data based on the pulse reach the flip-flop after the clock transition
    → wrong data
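To make logical and timing masking concrete, here is a small Python sketch on a hypothetical circuit, out = (a AND b) OR c, with a particle strike modeled as a flip of the internal node a AND b. The circuit, the function names and the latching-window parameters are assumptions for illustration; electrical masking (pulse attenuation) is not modeled.

```python
from itertools import product

# Hypothetical example circuit (not from the slides): out = (a AND b) OR c.
# n1 = a AND b is the internal node that a particle strike may flip.

def circuit(a: int, b: int, c: int, flip_n1: bool = False) -> int:
    n1 = a & b
    if flip_n1:
        n1 ^= 1          # transient bit flip on the internal node
    return n1 | c


def logically_masked(a: int, b: int, c: int) -> bool:
    """The flip is masked by logic if the circuit output does not change."""
    return circuit(a, b, c) == circuit(a, b, c, flip_n1=True)


def timing_masked(pulse_start: float, pulse_width: float,
                  clock_edge: float, setup: float, hold: float) -> bool:
    """The glitch is masked by timing if it does not overlap the
    flip-flop's latching window [clock_edge - setup, clock_edge + hold]."""
    pulse_end = pulse_start + pulse_width
    return pulse_end < clock_edge - setup or pulse_start > clock_edge + hold


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "masked" if logically_masked(a, b, c) else "propagates")
```

For this circuit the flip is logically masked in 4 of the 8 input combinations, namely whenever c = 1.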

  • Electromigration

    Electromigration: transport of material caused by the gradual movement of ions in a conductor
    One of the major failure mechanisms in interconnects
    Lifetime is proportional to the width and thickness of the metal lines and inversely proportional to the current density

    [Figure: top view of a Metal 1 line with a void and a whisker/hillock; cross-section view of Metal 1 and Metal 2]

    Source: Plusquellic, UMBC

  • Electromigration cont’d

    [Figure: void in a 0.45mm Al-0.5%Cu line] Source: IMM-Bologna
    [Figure: whiskers in Sn] Source: EPA Centre
    [Figure: hillocks in ZnSn] Source: Ku & Lin, 2007

  • Time-Dependent Dielectric Breakdown (TDDB)

    Tunneling currents
    → Wear-out of the gate oxide
    → Creation of a conducting path between gate and substrate, drain, source
    Depends on the electric field across the gate oxide, the temperature (exponentially) and the gate oxide thickness (exponentially)
    Also: abrupt damage due to extreme overvoltage (e.g. electrostatic discharge)

    Source: Pey & Tung

  • Variability Trends

    [Figure: variability [%] of Vdd, Vth, performance, power and Lgate vs. technology node, 90 nm down to 28 nm (scale 0 % to 70 %)]

    Source: Burleson, UMASS, 2007

  • Variability Trends cont’d

    [Figure: relative soft error rate (SER) per chip (logic and memory) vs. technology, 180 nm down to 16 nm]

    Source: Borkar, Intel

  • Variability Trends cont’d

    Frequency and sub-threshold leakage variations

    [Figure: normalized frequency vs. normalized sub-threshold leakage (Isub) for ~1000 samples in 130 nm: ~30 % spread in frequency, ~5-10X spread in leakage power]

    Source: Borkar, Intel

  • Variability Trends cont’d

    Increasing probability for gate oxide breakdown

    [Figure, left: gate oxide current density Jox vs. technology (180 nm, 90 nm, 45 nm, 22 nm), log scale, annotated "high-k?"]
    [Figure, right: reliability (Weibull slope β) vs. gate oxide thickness [nm]]

    Sources: Borkar, Intel; Kauerauf, EDL, 2002

  • Future Designs

    100 BT (billion transistors) integration capacity
    – Billions unusable (variations)
    – Some will fail over time
    – Intermittent failures

    Source: Borkar, Intel

  • Approaches to Increase Reliability

  • Failure Measurement

    Reliability R(t):
    – Probability that a system performs as desired until time t
    – Example: R(tx) = 0.8 → 80 % chance that the system is still running at time tx
    Mean Time To Failure MTTF:
    – Average time that a system runs until it fails
    Failure rate λ:
    – Probability per unit time that the system fails in a given time interval

    For a constant failure rate λ:
    $$R(t) = e^{-\lambda t}, \qquad \mathrm{MTTF} = \int_0^{\infty} R(t)\,dt = \frac{1}{\lambda}$$
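The formulas above can be checked numerically with a short Python sketch, assuming a constant failure rate; the value λ = 10⁻⁵ per hour is a hypothetical example. It integrates R(t) with the trapezoidal rule to confirm MTTF = 1/λ and inverts R(t) to find the time tx at which R(tx) = 0.8.

```python
import math

# Sketch for a constant failure rate lambda (failures per hour); the value of
# lambda used below is a hypothetical example.

def reliability(t: float, lam: float) -> float:
    """R(t) = exp(-lambda * t)."""
    return math.exp(-lam * t)


def mttf_numeric(lam: float, horizon: float = 50.0, steps: int = 200_000) -> float:
    """Approximate MTTF = integral of R(t) from 0 to infinity with the
    trapezoidal rule, truncated at t = horizon / lambda (R is ~e^-50 there)."""
    t_max = horizon / lam
    h = t_max / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(t_max, lam))
    for i in range(1, steps):
        total += reliability(i * h, lam)
    return total * h


def time_for_reliability(r: float, lam: float) -> float:
    """Invert R(t) = r, i.e. t = -ln(r) / lambda."""
    return -math.log(r) / lam


if __name__ == "__main__":
    lam = 1e-5                      # hypothetical: 1e-5 failures per hour
    print("MTTF (numeric):", mttf_numeric(lam), "h; 1/lambda =", 1 / lam, "h")
    print("R(t_x) = 0.8 at t_x =", time_for_reliability(0.8, lam), "h")
```

For λ = 10⁻⁵ per hour, R(tx) = 0.8 is reached at tx = −ln(0.8)/λ ≈ 22,314 hours, well before the MTTF of 100,000 hours.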

  • Bathtub Failure Model

    Infant mortality: declining failure rate, based on latent reliability defects
    Normal lifetime: constant failure rate
    Wear-out period: increasing failure rate, based on TDDB, EM, hot electrons …

    [Figure: failure rate vs. time (bathtub curve); time axis marked at 1-40 weeks and 7-15 years]
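One common way to model the bathtub curve, assumed here rather than taken from the slides, is to add three Weibull hazard rates: a shape parameter β < 1 gives the declining infant-mortality rate, β = 1 the constant rate of the normal lifetime, and β > 1 the increasing wear-out rate. The β and η values below are hypothetical and only chosen to reproduce the qualitative shape, with time in weeks.

```python
# Sketch of a bathtub curve as the sum of three Weibull hazard rates.
# All shape (beta) and scale (eta) parameters are hypothetical, chosen only
# to reproduce the qualitative shape; time is in weeks and must be > 0.

def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = (beta/eta) * (t/eta)**(beta - 1); beta < 1 decreasing,
    beta = 1 constant, beta > 1 increasing failure rate."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t_weeks: float) -> float:
    infant = weibull_hazard(t_weeks, beta=0.5, eta=20.0)     # latent defects
    useful = weibull_hazard(t_weeks, beta=1.0, eta=5000.0)   # constant rate
    wearout = weibull_hazard(t_weeks, beta=5.0, eta=600.0)   # TDDB, EM, ...
    return infant + useful + wearout


if __name__ == "__main__":
    for t in (1, 10, 40, 100, 400, 800):
        print(f"t = {t:4d} weeks: failure rate = {bathtub_hazard(t):.5f} per week")
```

With these parameters the rate falls from about 0.11 per week at week 1 to about 0.011 at week 100 and rises again to about 0.068 by week 1000.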

  • Classification

    Failure
    – Permanent: defects, wear-out, out-of-range parameters, EM, TDDB …
    – Temporary
      – Intermittent: process variations, infant mortality, random dopant fluctuation, …
      – Transient
        – Radiation: soft errors
        – Non-radiation: power supply, coupling, operation peaks

    Source: Mitra, 2007

  • The Whole System Counts!

  • Triple Module Redundancy (TMR)

    [Figure: the input feeds logic L (output A) and two copies of logic L (outputs B and C); a voter combines A, B and C into the output]

  • Triple Module Redundancy: Voter

    Hardware realization of a 1-bit majority voter

    OUT = AB + AC + BC

    A B C | OUT
    0 0 0 |  0
    0 0 1 |  0
    0 1 0 |  0
    0 1 1 |  1
    1 0 0 |  0
    1 0 1 |  1
    1 1 0 |  1
    1 1 1 |  1

    Requires 2 gate delays
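The voter equation can be sketched in Python as follows; the function name `majority` is illustrative. Since Python's `&` and `|` are bitwise, the same expression votes every bit of a multi-bit word independently, which mirrors placing one 1-bit voter per bit.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """1-bit majority voter OUT = AB + AC + BC (two-level AND-OR logic).
    Because Python's & and | are bitwise, the same expression also votes
    every bit position of equal-width integer words independently."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    print("A B C | OUT")
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "|", majority(a, b, c))
    # Word-level voting: module C delivers a corrupted word, the voter masks it.
    good = 0b1011
    print(bin(majority(good, good, 0b0010)))
```

The last line shows module C delivering a corrupted word (0b0010) while A and B agree on 0b1011; the voter outputs 0b1011.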

  • Triple Module Redundancy cont’d

    Note: for a constant module failure rate λ
    [Figure: reliability R vs. time for TMR and for a simplex system (only 1 module); the TMR curve starts higher but falls below the simplex curve, crossing at R = 0.5]

    After a certain time, the reliability of the TMR system is lower than that of the simplex system
    Why: after some time, the probability that 2 modules are wrong is higher than the probability that 2 modules are working!
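The crossover on this slide follows from the standard 2-out-of-3 reliability formula R_TMR = 3R² − 2R³ with R = e^(−λt). The Python sketch below assumes independent module failures and a perfect voter (neither is stated on the slide); under these assumptions TMR is better than simplex exactly while R > 0.5, i.e. for t < ln 2/λ.

```python
import math

# Assumptions: three identical modules with constant failure rate lambda,
# independent failures, and a perfect (never failing) voter.

def r_simplex(t: float, lam: float) -> float:
    return math.exp(-lam * t)


def r_tmr(t: float, lam: float) -> float:
    """TMR works while at least 2 of 3 modules work:
    R_TMR = R^3 + 3 R^2 (1 - R) = 3 R^2 - 2 R^3."""
    r = r_simplex(t, lam)
    return 3 * r**2 - 2 * r**3


def crossover_time(lam: float) -> float:
    """R_TMR = R exactly where R = 0.5, i.e. t = ln(2) / lambda."""
    return math.log(2) / lam


if __name__ == "__main__":
    lam = 1.0   # time measured in units of 1/lambda (hypothetical scale)
    for t in (0.1, 0.5, crossover_time(lam), 1.0, 2.0):
        print(f"t = {t:.3f}: simplex {r_simplex(t, lam):.3f}, TMR {r_tmr(t, lam):.3f}")
```

Integrating both curves gives MTTF_TMR = 3/(2λ) − 2/(3λ) = 5/(6λ), which is lower than the simplex MTTF of 1/λ, so under these assumptions TMR raises reliability for short mission times but not the mean lifetime.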

  • Self-Adaptive Design

    Extend the idea of clock domains to adaptive power domains
    Tackle static process variations and slowly varying timing variations
    Control VDD, Vth (indirectly by body bias) and fclk by calibration at power-on

    [Figure: module with test inputs and responses, controlled by VDD, VBB and fclk]

  • Self-Adaptive Design: Example

    21 submodules per die
    Applying 0.5 V forward/reverse body biasing (FBB/RBB) in steps of 32 mV

    [Figure: accepted dies (0 % to 100 %) vs. frequency for no body bias (noBB), ABB and within-die ABB]

    For a given frequency and power density:
    – 100 % yield with ABB
    – 97 % in the highest frequency bin with ABB for within-die variability

    Source: Borkar, Intel
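A power-on calibration of the kind described on the two Self-Adaptive Design slides could look like the following Python sketch for body-bias control of one submodule. Only the ±0.5 V range and the 32 mV step are taken from the example; the linear frequency model, the exponential leakage model and all function names are hypothetical. The loop picks the least-leaky bias setting that still meets the target frequency.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical power-on calibration loop for one submodule. The linear
# frequency model and exponential leakage model are illustrative assumptions;
# only the 0.5 V range and 32 mV step come from the example slide.

STEP_V = 0.032      # body-bias step (32 mV)
LIMIT_V = 0.5       # maximum forward (+) / reverse (-) body bias


def bias_settings(step: float = STEP_V, limit: float = LIMIT_V) -> List[float]:
    """All bias values k*step with |k*step| <= limit, from full RBB to full FBB."""
    n = int(limit // step)                 # 15 steps of 32 mV fit into 0.5 V
    return [k * step for k in range(-n, n + 1)]


def frequency_ghz(nominal_ghz: float, vbb: float) -> float:
    """Assumed: forward bias (vbb > 0) speeds the module up by 20 % per volt."""
    return nominal_ghz * (1.0 + 0.2 * vbb)


def leakage(nominal_leak: float, vbb: float) -> float:
    """Assumed: leakage grows exponentially with forward bias."""
    return nominal_leak * math.exp(4.0 * vbb)


def calibrate(nominal_ghz: float, target_ghz: float,
              nominal_leak: float = 1.0) -> Optional[Tuple[float, float]]:
    """Return (vbb, leakage) of the least-leaky setting that meets the target
    frequency, or None if no setting within range does."""
    best = None
    for vbb in bias_settings():
        if frequency_ghz(nominal_ghz, vbb) >= target_ghz:
            leak = leakage(nominal_leak, vbb)
            if best is None or leak < best[1]:
                best = (vbb, leak)
    return best


if __name__ == "__main__":
    # A slow, a typical and a fast submodule against a 1 GHz target.
    for f_nom in (0.95, 1.0, 1.1):
        print(f_nom, calibrate(f_nom, target_ghz=1.0))
```

Running `calibrate` once per submodule (21 per die in the example) would correspond to within-die ABB, one setting per die to plain ABB. With the assumed models, a submodule at 0.95 GHz needs +0.288 V forward bias to reach 1 GHz, while one at 1.1 GHz can be reverse-biased to −0.448 V to save leakage.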

  • Razor Flip-Flop

    For uncertainty- and variation-tolerant design
    Razor methodology:
    – Voltage-scaling methodology based on real-time detection and correction of circuit timing errors
    – Use the actual hardware to check for errors
    – Latch the input data twice: once on the clock edge, and then a little later
    – If the data are not the same, you are going too fast

    Source: Austin, Computer Magazine, 2004

  • Razor Flip-Flop cont’d

    [Figure: between logic stage n and logic stage n+1, the main flip-flop samples on CLK and a shadow latch samples on CLK_delayed; a comparator raises Error when the two values differ, and a MUX restores the shadow-latch value into the main flip-flop]

    Source: Austin, 2004
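The following Python sketch models the behavior, not the circuit, of a single Razor capture: the main flip-flop samples at CLK, the shadow latch at CLK_delayed, a comparator flags a mismatch and the MUX restores the shadow value. It assumes, as the Razor scheme requires, that the delayed clock is late enough for the shadow latch to always capture the correct value; the timing numbers and names are illustrative.

```python
from dataclasses import dataclass

# Behavioral sketch (not a circuit model) of one Razor flip-flop capture.
# Assumption: the delayed clock is late enough that the logic output has
# always settled by then, so the shadow latch holds the correct value.

@dataclass
class RazorCapture:
    main: int        # value sampled by the main flip-flop at CLK
    shadow: int      # value sampled by the shadow latch at CLK_delayed
    error: bool      # comparator output
    restored: int    # value after the MUX restores the shadow value on error


def sample(old: int, new: int, arrival: float, sample_time: float) -> int:
    """A storage element sees the new value only if it has arrived by sample_time."""
    return new if arrival <= sample_time else old


def razor_capture(old: int, new: int, arrival: float,
                  clk_edge: float, clk_delay: float) -> RazorCapture:
    main = sample(old, new, arrival, clk_edge)
    shadow = sample(old, new, arrival, clk_edge + clk_delay)
    error = main != shadow                   # comparator
    restored = shadow if error else main     # MUX selects the shadow value on error
    return RazorCapture(main, shadow, error, restored)


if __name__ == "__main__":
    # Logic output switches 0 -> 1; clock edge at 1.0 ns, shadow clock 0.3 ns later.
    print(razor_capture(0, 1, arrival=0.8, clk_edge=1.0, clk_delay=0.3))  # in time
    print(razor_capture(0, 1, arrival=1.1, clk_edge=1.0, clk_delay=0.3))  # too slow
```

In a full Razor system the error signal would also drive the voltage controller, lowering VDD while errors are rare and raising it when they become frequent; that control loop is not modeled here.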

  • Shadow Transistor Approach

  • TDDB Model

    TDDB between gate and channel
    Model: the transistor is split into two devices of widths W1 and W2 (W = W1 + W2), connected at the breakdown spot by the resistance RGC between gate and channel

    [Figure: Vout/VDD (0 % to 100 %) and relative delay (0 to 20) vs. RGC [kΩ], decreasing from left to right; inverter, 65nm-BPTM]

    Based on: Segura et al., "A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level", 1995

  • TDDB Model cont’d

    TDDB between gate and source/drain

    [Figure: Vout/VDD (0 % to 100 %) vs. RGC [kΩ], decreasing from left to right; inverter, 65nm-BPTM]

    Based on: Segura et al., "A Detailed Analysis of GOS Defects in MOS Transistors: Testing Implications at Circuit Level", 1995

  • Shadow Transistors

    1. Insertion of additional transistors in parallel to vulnerable transistors
    → Shadow transistors (ST)

    [Figure: relative delay (0 to 10) and Vout/VDD (0 % to 100 %) vs. RGC [kΩ], with (w/ ST) and without (wo/ ST) shadow transistors; inverter, 65nm-BPTM]

  • Shadow Transistors cont’d

    2. Application of H-Vt/To transistors with:
    – Higher threshold voltage
    – Thicker gate oxide
    → Less vulnerable to TDDB

    $$\frac{\mathrm{MTTF}_{\text{H-Vt/To}}}{\mathrm{MTTF}_{\text{L-Vt/To}}} = 10^{\Delta t_{ox}/0.22\,\mathrm{nm}} = 10^{0.15/0.22} \approx 4.81 \quad (\Delta t_{ox} = 0.15\,\mathrm{nm})$$

    MTTF: Mean Time To Failure

    Sources: Srinivasan, "RAMP: A Model for Reliability Aware Microprocessor Design"; Stathis, J., "Reliability Limits for the Gate Insulator in CMOS Technology"
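The MTTF relation on the H-Vt/To slide can be evaluated and inverted with a few lines of Python. The 0.22 nm-per-decade constant is the one on the slide; the function names are illustrative.

```python
import math


def mttf_ratio(delta_tox_nm: float, nm_per_decade: float = 0.22) -> float:
    """MTTF_H-Vt/To / MTTF_L-Vt/To = 10 ** (delta_tox / 0.22 nm),
    i.e. every 0.22 nm of extra oxide thickness buys one decade of MTTF."""
    return 10 ** (delta_tox_nm / nm_per_decade)


def delta_tox_for_ratio(ratio: float, nm_per_decade: float = 0.22) -> float:
    """Inverse: oxide thickening (nm) needed for a desired MTTF improvement."""
    return nm_per_decade * math.log10(ratio)


if __name__ == "__main__":
    print(f"{mttf_ratio(0.15):.2f}")              # the slide's 0.15 nm example
    print(f"{delta_tox_for_ratio(10.0):.2f} nm")  # thickening for a 10x MTTF gain
```

`mttf_ratio(0.15)` evaluates to about 4.81, matching the slide; conversely, a tenfold MTTF gain would require about 0.22 nm of additional oxide under the same relation.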

  • Shadow Transistors cont’d

    3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
    – Component reliability depends on activity, state, temperature, size, fabrication …
    – The most vulnerable components can be identified
    → Netlist modification: shadow transistors are only added in parallel to the most vulnerable devices

  • Shadow Transistors cont’d

    3. Selective insertion of shadow transistors in parallel to vulnerable transistors:
    – Component reliability depends on activity, state, temperature, size, fabrication …
    – The most vulnerable components can be identified
    New approach:
    – Estimation of stress factors
    – Determination of component reliability
    – Adding redundancy only at the most vulnerable components
    → Netlist modification: shadow transistors are only added in parallel to the most vulnerable devices
    Advantage: lower area, power and delay penalty compared to complete redundancy or random insertion [Sri04]

    Source: [Sri04] Sirisantana, D&T, 2004
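The three steps of the new approach (estimate stress factors, determine component reliability, add redundancy only at the most vulnerable components) can be sketched in Python as below. The stress score, which combines the fraction of time under stress, a temperature factor that doubles every 10 °C and the transistor width, is a hypothetical stand-in and not the reliability model used in this work; the budget fraction that limits the number of shadow transistors is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of the three steps on the slide: estimate stress
# factors, rank devices by estimated vulnerability, and shadow only the
# top fraction. The stress formula is an illustrative assumption, not the
# author's reliability model.

@dataclass
class Transistor:
    name: str
    on_probability: float   # fraction of time the gate oxide is under stress (state/activity)
    temperature_c: float    # local temperature estimate
    width_nm: float         # larger gate area -> more oxide that can break down


def stress_score(t: Transistor, t_ref_c: float = 25.0, w_ref_nm: float = 100.0) -> float:
    # Assumed: stress doubles every 10 degrees C above t_ref_c and scales with width.
    temp_factor = 2 ** ((t.temperature_c - t_ref_c) / 10.0)
    return t.on_probability * temp_factor * (t.width_nm / w_ref_nm)


def select_for_shadowing(devices: List[Transistor], budget_fraction: float) -> List[str]:
    """Names of the ceil(budget_fraction * n) devices with the highest stress score."""
    ranked = sorted(devices, key=stress_score, reverse=True)
    k = math.ceil(budget_fraction * len(ranked))
    return [d.name for d in ranked[:k]]


if __name__ == "__main__":
    netlist = [
        Transistor("M1", 0.9, 85.0, 100.0),
        Transistor("M2", 0.1, 85.0, 100.0),
        Transistor("M3", 0.5, 45.0, 200.0),
        Transistor("M4", 0.5, 25.0, 100.0),
    ]
    print(select_for_shadowing(netlist, budget_fraction=0.25))
```

With a 25 % budget only M1 (long on-time at 85 °C) is shadowed; the ranking function is where a real flow would plug in a proper TDDB reliability estimate.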

  • Shadow Transistors cont’d

    Advantages
    – Increased reliability with respect to TDDB
    – H-Vt/To: reliability increases by ~5x (for ∆tox = 0.15 nm)
    – Remarkable increase of system lifetime
    Drawbacks
    – Higher input capacitance → higher delay and dynamic power dissipation
    – Area increase
    Remarks
    – Only slight improvements for gate-drain/source breakdown
    – H-Vt/To has to be supported by the technology

  • ST – Improvement MTTF

    ≈ 23 % additional transistors

    [Figure: improvement of MTTF with regard to TDDB (0 % to 20 %) for the benchmark circuits c17, c432, c499, c880, c1355, c1908, c2670, c3540, c5315, c6288, c7552; insertion of L-Vt/To shadow transistors, our algorithm vs. random insertion]

  • ST – Improvement MTTF (H-Vt/To)

    [Figure: improvement of MTTF with regard to TDDB (0 % to 250 %) for the benchmark circuits c432 to c7552; insertion of H-Vt/To shadow transistors for SPth = 30 and SPth = 55]

  • Take Home Messages

    Integrated circuits face several kinds of failures
    Decreasing structure sizes create more failure sources
    Future designs should (have to) be failure tolerant
    Possible approaches:
    – Triple Module Redundancy (TMR)
    – Self-Adapting Designs
    – Razor Flip-Flops
    – Shadow Transistors
    There’s still a lot to do!

  • Thank you!

    [email protected]