Spring 2008 CSE 591 Compilers for Embedded Systems
![Page 1: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/1.jpg)
Spring 2008 CSE 591 Compilers for Embedded Systems
Aviral Shrivastava
Department of Computer Science and Engineering
Arizona State University
![Page 2: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/2.jpg)
Lecture 4: Soft Errors
Software Techniques
![Page 3: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/3.jpg)
Outline
□ Soft Errors Recap
□ Process Technology and Packaging Solutions
□ Gate-level and Circuit-level Solutions
□ Microarchitectural Solutions
  □ Single-core
  □ Multi-threaded
□ Software Solutions
□ Multi Bit Upsets (MBUs)
□ Single Event Latchup
![Page 4: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/4.jpg)
Razor
□ Originally proposed to tolerate process variations and achieve power reduction
□ Shadow latch clocked with a delayed clock
□ If the two latched values differ, raise an error
□ How to use it to detect soft errors?
![Page 5: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/5.jpg)
Multi-issue Processors
□ Superscalar
  □ Execute instructions from the same thread
□ Multi-threading
  □ Execute instructions from the same thread in one cycle, but can switch between threads (applications) across cycles
□ Simultaneous Multithreading
  □ Issue instructions from different threads in the same cycle
[Figure: issue-slot diagrams for Superscalar, Multithreading, and Simultaneous Multithreading]
![Page 6: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/6.jpg)
SMT Solutions
□ SRT: Simultaneous Redundant Threading
  □ Duplicate a thread and run both copies on the same core as a leading thread and a trailing thread
  □ Threads maintain their own contexts, including the register file
  □ Threads should not diverge when there are no faults
  □ Memory interface
    □ Only the leading thread can read from memory
    □ It puts a copy of each loaded value in the LVQ (load value queue); the trailing thread reads from there
    □ The leading thread writes its store values to the STB (store buffer)
    □ Only the trailing thread can write to memory, after checking its value against the one in the STB
  □ Branch interface
    □ The leading thread writes branch outcomes to the BOQ (branch outcome queue)
    □ The trailing thread therefore has perfect branch prediction
![Page 7: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/7.jpg)
SMT Solutions: PER
□ Trailing thread competes for resources during high-ILP phases
□ STB fills up, causing the leading thread to stall
□ PER: Partial Explicit Redundancy
  □ SEM (Single Execution Mode): the leading thread uses all resources during high-ILP phases
  □ REM (Redundant Execution Mode): the trailing thread executes during low-ILP phases
    □ In REM state, check all instructions
    □ Need a resume point for the trailing thread
      □ Maintain state (LVQ, STB, RF, etc.)
      □ Proportional to slack size
![Page 8: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/8.jpg)
SMT Solutions: IRTR
□ IR: Instruction Reuse
  □ Do not execute an instruction if it has already executed with the same inputs
  □ Keep a reuse buffer
□ IRTR: Implicit Redundancy Through Reuse
  □ Check the result against the previous value in the buffer to detect soft errors
  □ If it matches, continue and overwrite the value in the buffer
  □ If it mismatches, raise a flag
  □ Used during high-ILP regions
![Page 9: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/9.jpg)
![Page 10: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/10.jpg)
Watchdog Processor & Control Flow Checking
□ Watchdog processor
  □ Simple processor that receives signals from the main processor
  □ Checks that the signals arrive in the correct order
    □ S3 should not come after S1
  □ Watchdog program can be automatically generated
  □ Formal techniques for correctness
  □ Asynchronous communication between the main processor and the watchdog processor
[Figure: Processor, Memory, and Watchdog Processor; basic blocks BB1, BB2, and BB3 send signals S1, S2, and S3, respectively]
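The order check performed by the watchdog can be sketched in a few lines of Python. This is only an illustration: the signal names S1 to S3 come from the slide, but the legal-successor table (a straight-line flow BB1, BB2, BB3) and the function name `first_violation` are assumptions made for the example.

```python
# Hypothetical legal-successor table for a straight-line flow BB1 -> BB2 -> BB3.
# The signal names come from the slide; the table itself is an assumption.
LEGAL_NEXT = {
    None: {"S1"},   # program entry: only BB1 may run first
    "S1": {"S2"},
    "S2": {"S3"},
    "S3": set(),    # BB3 is the exit block in this sketch
}

def first_violation(signals):
    """Return the index of the first out-of-order signal, or None if the trace is legal."""
    previous = None
    for index, signal in enumerate(signals):
        if signal not in LEGAL_NEXT.get(previous, set()):
            return index
        previous = signal
    return None

print(first_violation(["S1", "S2", "S3"]))  # None: legal trace
print(first_violation(["S1", "S3"]))        # 1: S3 arrived right after S1
```

In a generated watchdog program the table would be derived from the control flow graph rather than written by hand.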
![Page 11: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/11.jpg)
EDDI (Error Detection by Duplicated Instructions)
□ Duplicate instructions
□ Validation instructions
  □ Stores and branches are synchronization points
  □ Check store and branch operands
□ Memory penalty
  □ Load/store from duplicated locations
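A source-level imitation of the idea, as a rough sketch: EDDI itself works on machine instructions and registers, which Python cannot express, so the master/shadow variables, the `inject_fault` switch, and the exception name below are all hypothetical stand-ins.

```python
class SoftErrorDetected(Exception):
    """Raised when the master and shadow copies disagree at a sync point."""

def checked_sum(values, inject_fault=False):
    # Master and shadow copies of the accumulator (names are hypothetical).
    total, total_shadow = 0, 0
    for index, value in enumerate(values):
        total += value            # master instruction
        total_shadow += value     # duplicated (shadow) instruction
        if inject_fault and index == 0:
            total ^= 1 << 3       # simulate a single bit flip in the master copy
    # Sync point: validate the operand before it is "stored" (returned here).
    if total != total_shadow:
        raise SoftErrorDetected(f"{total} != {total_shadow}")
    return total

print(checked_sum([1, 2, 3]))     # 6
```

Calling `checked_sum([1, 2, 3], inject_fault=True)` raises `SoftErrorDetected`, because the flipped bit makes the two copies differ at the check.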
![Page 12: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/12.jpg)
EDDI+CFCSS (Control Flow Checking by Software Signatures)
□ At the beginning of each node, perform G = G xor d
  □ For B1 -> B2: d2 = s1 xor s2, so G = s1 xor (s1 xor s2) = s2
□ If two source nodes jump to the same destination node, then the two source nodes should have the same signature
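The signature update can be tried out on a tiny example. The sketch below assumes a three-block chain B1, B2, B3 in which every block has exactly one predecessor; the signature values and the helper name `run` are made up for illustration.

```python
# Hypothetical signatures for a three-block chain B1 -> B2 -> B3.
SIG = {"B1": 0b0001, "B2": 0b0010, "B3": 0b0100}
PRED = {"B2": "B1", "B3": "B2"}                     # single predecessor of each block
D = {b: SIG[p] ^ SIG[b] for b, p in PRED.items()}   # d = s_pred xor s_dest

def run(path):
    """Follow `path`, updating G on entry to each block; return False on a mismatch."""
    g = SIG[path[0]]              # G starts as the entry block's signature
    for block in path[1:]:
        g ^= D[block]             # G = G xor d
        if g != SIG[block]:       # illegal jump: G no longer matches the signature
            return False
    return True

print(run(["B1", "B2", "B3"]))    # True: legal path
print(run(["B1", "B3"]))          # False: B2 was skipped
```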
![Page 13: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/13.jpg)
CFCSS + SWIFT (Software Implemented Fault Tolerance)
□ With G = G xor d alone, two source nodes that jump to the same destination node would need the same signature
□ Need another, path-dependent value D
  □ B1 -> B5: D = 0, so G = s1 xor d5 xor 0 = s5, where d5 = s1 xor s5
  □ B3 -> B5: D = s1 xor s3, so G = s3 xor (s1 xor s5) xor (s1 xor s3) = s5
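The two cases above can be checked numerically. In this sketch the signature values and the names `ADJUST` and `enter_b5` are assumptions; only the formulas for d5 and D are taken from the slide.

```python
# Hypothetical signatures; B5 has two predecessors, B1 and B3.
s1, s3, s5 = 0b0001, 0b0011, 0b0101
d5 = s1 ^ s5                        # computed against one chosen predecessor (B1)
ADJUST = {"B1": 0, "B3": s1 ^ s3}   # D, set in each predecessor before the jump

def enter_b5(source, g):
    """Return the updated G after entering B5 from `source` with signature register g."""
    return g ^ d5 ^ ADJUST[source]

print(enter_b5("B1", s1) == s5)     # True
print(enter_b5("B3", s3) == s5)     # True
print(enter_b5("B1", 0b0010) == s5) # False: G held a signature of some other block
```

With D in place, B1 and B3 can keep different signatures and still both arrive at B5 with G = s5.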
![Page 14: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/14.jpg)
ED4I: Error Detection by Diverse Data and Duplicated Instructions
• The simplest way to detect Byzantine faults is to run the same program on multiple processors and compare the results.
• ED4I is Byzantine fault detection for uniprocessors.
• Must take into account both temporary and permanent faults.
  • Re-executing with the same inputs does not guard against permanent faults.
• Overhead = 100%
![Page 15: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/15.jpg)
Key Idea
• Let's feed two different sets of data into the program and then compare the results.
• Key insight:
  • If the program uses only arithmetic operations, we can alter the input by multiplying all input numbers by a constant.
  • Then the modified output will be (real output) * (the constant).
  • Thus, the two results can be checked against each other, AND a fault will affect the two computations differently.
![Page 16: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/16.jpg)
New Program
• If we alter the input to the program, we must alter the program to work with this modified input.
• The transformation is given the constant k (called the “diversity factor”) and it creates the “k-factor diverse program”.
• The new program will have the same control flow graph as the old program, but all the variables will be k-multiples of the original ones.
![Page 17: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/17.jpg)
Transformations
• If k < 0, branches flip direction (> ↔ <, ≥ ↔ ≤)
• All constants in the code get multiplied by k.
• Addition and subtraction of variables are unchanged.
• Multiplication: v1*v2*...*vn → (v1*v2*...*vn)/k^(n-1)
• Division: v1/v2 → (v1/v2)*k
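To see the rules applied together, here is a hand-transformed toy program with k = -2. The program itself, the function names, and the choice of k are assumptions for illustration; a real ED4I tool would perform the transformation automatically.

```python
def original(a, b, c):
    """Toy arithmetic program: returns a*b + c if that exceeds 10, else a - c."""
    t = a * b + c
    if t > 10:
        return t
    return a - c

def diverse(ka, kb, kc, k=-2):
    """k-factor diverse version: every variable holds k times its original value."""
    assert k < 0                  # the flipped branch below is only valid for k < 0
    t = (ka * kb) // k + kc       # multiplication: divide by k^(n-1), here n = 2
    if t < 10 * k:                # constant scaled by k; '>' flipped because k < 0
        return t
    return ka - kc                # addition and subtraction are unchanged

def ed4i_check(a, b, c, k=-2):
    """Run both versions and compare: the diverse result must equal k * original."""
    return diverse(k * a, k * b, k * c, k) == k * original(a, b, c)

print(ed4i_check(3, 4, 5))        # True (takes the t > 10 branch)
print(ed4i_check(1, 2, 3))        # True (takes the other branch)
```

The integer division is exact here because the product of two k-scaled values is always a multiple of k.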
![Page 18: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/18.jpg)
Fault Detection & Data Integrity
• For functional unit h_i (such as the adder), fault f, and diversity factor k:
  • X_i = the set of inputs to h_i
  • E_i = the subset of X_i containing the inputs that will result in erroneous output due to the fault
  • E'_i = the subset of E_i that will escape detection
  • C_i(k) = probability of catching an error in h_i
  • D_i(k) = probability of missing no errors in h_i

$$C_i(k) = \sum_f P(f)\,\frac{|E_i| - |E'_i|}{|X_i|}$$

$$D_i(k) = \sum_f P(f)\left(1 - \frac{|E'_i|}{|X_i|}\right)$$
![Page 19: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/19.jpg)
Choosing the value of k
• For some functional units we can derive C_i(k) and D_i(k) analytically for each k.
• This is too hard in general, so try out a range of k's empirically to determine C_i(k) and D_i(k).
  • Bus signal (12-bit)
  • 12-bit carry look-ahead adder
  • 12-bit multipliers and dividers
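An empirical sweep of this kind might look like the sketch below. It is a simplified model under several assumptions: a 12-bit adder whose output bit 2 is stuck at 0, two's complement wrap-around, operands restricted to 0..15, and a single fault (so P(f) = 1). C and D are taken as (|E| - |E'|)/|X| and 1 - |E'|/|X|, following the definitions of X_i, E_i, and E'_i given earlier.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1

def faulty_add(x, y, stuck_bit):
    """WIDTH-bit adder whose output bit `stuck_bit` is stuck at 0 (hypothetical fault)."""
    return ((x + y) & MASK) & ~(1 << stuck_bit)

def coverage(k, stuck_bit=2, operand_range=range(16)):
    """Return (C, D) for one fault, so P(f) = 1 in the sums."""
    inputs = [(x, y) for x in operand_range for y in operand_range]
    erroneous = escaped = 0
    for x, y in inputs:
        r1 = faulty_add(x, y, stuck_bit)
        if r1 == (x + y) & MASK:
            continue                   # fault not activated: input is not in E
        erroneous += 1
        r2 = faulty_add((k * x) & MASK, (k * y) & MASK, stuck_bit)
        if (k * r1) & MASK == r2:      # comparison passes: the error escapes (E')
            escaped += 1
    total = len(inputs)
    return (erroneous - escaped) / total, 1 - escaped / total

for k in (1, -1, 2, -2, 3):
    print(k, coverage(k))
```

With k = 1 the two runs are identical, so every erroneous input escapes and C is 0; that is the baseline against which other values of k can be compared.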
![Page 20: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/20.jpg)
Analytical Computation of AVF
□ Iteration Space (IS)
  □ L-dimensional integer vector space
    □ L: number of loop levels
  □ Each point in IS represents an iteration
  □ Data dependences exist
  □ Fully ordered in time
□ Array Space (AS)
  □ M-dimensional integer vector space
    □ M: array dimension
  □ Every point represents an element of the array

$$IS = \{(x_1, x_2, \ldots, x_L) \in \mathbb{Z}^L \mid \forall i,\ 1 \le i \le L,\ 0 \le x_i < N_i\}$$

$$AS = \{(y_1, y_2, \ldots, y_M) \in \mathbb{Z}^M \mid \forall i,\ 1 \le i \le M,\ 0 \le y_i < D_i\}$$

Example loop:

    for (i=0; i<N1; i++)
      for (j=0; j<N2; j++)
        a[i][j] = a[i][j-1] + a[i-1][j] + a[i][j+1];
![Page 21: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/21.jpg)
Analytical Computation of AVF
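□ Access Function (AF) of a reference
  □ Mapping from IS to AS
  □ Tells when the elements of the array are accessed by a reference
□ References will access different parts of the Array Space
  □ Divide the Array Space into regions in which every element is accessed by the same subset of references
□ Array Interval (AI): subset of AS that the reference accesses
  □ Every element is accessed by the same set of references

$$AF_{a[i][j]} = \{(y_1, y_2) \in \mathbb{Z}^2 \mid y_1 = x_1,\ y_2 = x_2,\ 0 \le x_1 < N_1,\ 0 \le x_2 < N_2\}$$

Two further examples follow the same pattern, with the same bounds on x_1 and x_2 (the signs of the constant offsets are not legible in the source):
□ a reference offset by 10 in the first index: y_1 = x_1 ± 10, y_2 = x_2
□ a reference with first index 2*i and a unit offset in the second index: y_1 = 2*x_1, y_2 = x_2 ± 1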
![Page 22: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/22.jpg)
Analytical Computation of AVF
Iteration Intervals for an Array Interval
□ Each reference accesses the elements of an array interval at the iterations given by its AF (Access Function)
□ Iteration Interval (II) is the AF restricted to an Array Interval
  □ Gives a formula for the access time of each element in II
□ Vulnerability can be computed as a formula on II
  □ Time from a read or write (r/w) to the next read (r)
  □ A reference either reads or writes (not both)
□ Need to time-order the points in II
  □ Break into Iteration Segments, which can be ordered
    □ Strict order, or point-wise ordered
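The analytical formulas are not reproduced here, but the quantity they compute can be cross-checked by brute force on a small instance of the stencil loop a[i][j] = a[i][j-1] + a[i-1][j] + a[i][j+1]. The sketch below makes several simplifying assumptions: one time step per iteration, out-of-range accesses are skipped, and time before an element's first access inside the loop nest is ignored. The function name is hypothetical.

```python
def vulnerable_time(n1, n2):
    """Brute-force per-element vulnerable time for the example stencil loop."""
    last_access = {}     # element -> time of its most recent read or write
    vulnerable = {}      # element -> accumulated (r/w -> next read) time
    t = 0
    for i in range(n1):
        for j in range(n2):
            reads = [(i, j - 1), (i - 1, j), (i, j + 1)]
            for elem in reads:
                if 0 <= elem[0] < n1 and 0 <= elem[1] < n2:
                    if elem in last_access:
                        # Interval from the previous access to this read is vulnerable.
                        vulnerable[elem] = vulnerable.get(elem, 0) + t - last_access[elem]
                    last_access[elem] = t
            last_access[(i, j)] = t      # a write closes the interval without adding to it
            t += 1
    return vulnerable

print(vulnerable_time(2, 3))
```

For the 2x3 case, each element of row 0 accumulates 3 time steps and elements (1,0) and (1,1) accumulate 1 each; the last element written is never read again, so it does not appear in the result.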
![Page 23: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/23.jpg)
![Page 24: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/24.jpg)
Multiple-bit Upsets (MBUs)
□ Error rate ~ 1/100th of the SEU rate
□ Hamming Code
  □ 1-bit error correction, 2-bit error detection
□ Reed-Solomon Codes
  □ RS(n,k) with s-bit symbols
    □ s: each symbol is s bits
    □ n: total number of symbols per codeword, n = 2^s - 1
    □ k: number of data symbols
    □ Number of parity symbols = 2t = n - k
  □ Can correct errors in t symbols, where t = (n-k)/2
  □ RS(255, 223) with 8-bit symbols
    □ Can correct 16 symbol errors in each codeword (255 symbols)
□ Other multi-bit error detection and correction schemes
  □ LDPC
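The parameter arithmetic for RS(n, k) is easy to check mechanically. The helper names below are hypothetical, and the burst bound is a standard consequence of symbol-level correction rather than something stated on the slide: a contiguous burst of (t-1)*s + 1 bits can touch at most t symbols.

```python
def rs_parameters(s, k):
    """Return (n, parity_symbols, t) for RS(n, k) over s-bit symbols with n = 2^s - 1."""
    n = (1 << s) - 1
    if not 0 < k < n or (n - k) % 2:
        raise ValueError("k must leave an even, positive number of parity symbols")
    parity = n - k
    return n, parity, parity // 2

def max_correctable_burst_bits(s, t):
    """Longest contiguous bit burst guaranteed to span at most t symbols."""
    return (t - 1) * s + 1

n, parity, t = rs_parameters(s=8, k=223)
print(n, parity, t)                         # 255 32 16
print(max_correctable_burst_bits(8, t))     # 121
```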
![Page 25: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/25.jpg)
[Figure: outcome of a particle strike on a state bit (e.g., in the register file). Copyright 2005, M. Tahoori]
□ Bit read?
  □ no: benign fault, no error
  □ yes: Bit has error protection?
    □ Error can be corrected (e.g., ECC): no error
    □ Error is only detected (e.g., parity + no recovery): Detected, but Unrecoverable Error (DUE)
    □ no: Does bit matter?
      □ yes: Silent Data Corruption (SDC)
      □ no: benign fault, no error
![Page 26: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/26.jpg)
Interleaving bits
□ Interleaving converts
  □ a spatial multi-bit error into multiple single-bit errors
[Figure: a row of adjacent bits; bits marked X are covered by a single ECC code, bits marked + are covered by a different ECC code]
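The effect of interleaving can be shown with a small model. This sketch assumes the simplest possible layout, in which physical bit position p belongs to codeword p mod ways; the function names are hypothetical.

```python
def codeword_of(bit_position, ways):
    """Map a physical bit position to its ECC codeword under `ways`-way interleaving."""
    return bit_position % ways

def max_hits_per_codeword(burst_start, burst_length, ways):
    """Largest number of flipped bits that any single codeword sees from one burst."""
    hits = {}
    for position in range(burst_start, burst_start + burst_length):
        word = codeword_of(position, ways)
        hits[word] = hits.get(word, 0) + 1
    return max(hits.values())

print(max_hits_per_codeword(burst_start=5, burst_length=3, ways=4))  # 1
print(max_hits_per_codeword(burst_start=5, burst_length=3, ways=1))  # 3
```

As long as the burst is no longer than the interleaving degree, each codeword sees at most one flipped bit, which a single-error-correcting code can repair.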
![Page 27: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/27.jpg)
Two Separate Strikes on Different Bits
Temporal Double Bit Errors
□ SECDED ECC (single error correction, double error detection)
  □ can detect a double-bit error, but cannot correct it
  □ if errors accumulate
    □ a single-bit correctable error becomes a double-bit detectable error
[Figure: first strike at cycle 100, second strike on a different bit at cycle 1,000,000]
![Page 28: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/28.jpg)
Solutions for Temporal Double Bit Errors
□ Natural Effects
  □ whenever the processor reads a cache block, we can correct the single-bit error
  □ check for errors when cache blocks are evicted from the cache
□ More Powerful ECC
  □ SECDED ECC requires 8 bits per 64 bits
    □ 7 bits for single-bit correction
    □ 8th bit for double-bit detection
    □ Overhead = 13%
  □ ECC with two-bit correction requires 12 bits per 64 bits
    □ Overhead = 19%
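The 7 + 1 check bits for SECDED follow from the Hamming bound for single-error correction. A short sketch (function names are hypothetical) reproduces the count for 64 data bits:

```python
def sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (Hamming single-error correction)."""
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r

def secded_check_bits(data_bits):
    """One extra overall-parity bit upgrades SEC to SECDED."""
    return sec_check_bits(data_bits) + 1

bits = secded_check_bits(64)
print(bits, f"{bits / 64:.1%}")   # 8 12.5%
```

The 12.5% result is the figure the slide rounds to 13%.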
![Page 29: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/29.jpg)
Scrubbing
□ Periodically read memory and correct all single-bit errors
□ Prevents accumulation of temporal double-bit errors
□ Standard technique in main memories (DRAMs)
![Page 30: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/30.jpg)
![Page 31: Spring 2008 CSE 591 Compilers for Embedded Systems](https://reader035.fdocuments.net/reader035/viewer/2022081515/56814a4a550346895db76819/html5/thumbnails/31.jpg)
Single Event Latchup
□ SEL: Single Event Latchup
  □ Parasitic circuit elements form a silicon controlled rectifier (SCR)
  □ Potentially destructive
    □ the device current may destroy the device if it is not current limited and removed "in time"
  □ Removal of power to the device is required in all non-catastrophic SEL conditions in order to recover device operation
  □ SEL probability increases with temperature!