Computer Engineering Group
Brandenburg University of Technology at Cottbus
Resource Reduced Triple Modular Redundancy for Built-In Self-Repair in
VLIW Processors
Mario Schölzel
Motivation
VLIW
Architecture
RR-TMR Idea
SW
Modifications
HW
Modifications
Conclusion
Mario Schölzel, SPA 2007
Outline
• Why Built-In Self-Repair?
• Base Architecture
• Resource Reduced TMR
• Program Modifications
• Architecture Modifications
• Conclusions and Limitations
Why Built-In Self-Repair?
• Hardware becomes unreliable (permanent faults due to small feature size)
• The ITRS Roadmap 2005 for Design predicts a requirement for reliable systems due to:
  – Infeasibility of full functional test at manufacturing exit
  – Relaxing the 100% correctness requirement (reduces functional test complexity and cost)
• Consequence: Redundancy in the system is required for robustness!
Simple TMR Approach
[Figure: Input feeds Processor 1, Processor 2 and Processor 3; a Voter combines their results into Output.]
We consider the following application domain:
• High-performance signal processing applications (e.g. image and audio processing)
• Real-time demands
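The simple TMR scheme with three processors and a voter can be illustrated with a minimal Python sketch. It is a software analogy only, not the hardware from the slides; the names `tmr_vote`, `run_tmr`, `good` and `stuck` are hypothetical, and the "processors" are modeled as plain functions.

```python
from collections import Counter
from typing import Callable, Sequence


def tmr_vote(results: Sequence[int]) -> int:
    """Return the value that at least two of the three replicas agree on."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three replicas disagree: a single voter cannot decide.
        raise RuntimeError("no majority among the three replicas")
    return value


def run_tmr(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Feed the same input to all replicas and vote on their outputs."""
    return tmr_vote([replica(x) for replica in replicas])


def good(x: int) -> int:
    """A fault-free replica (illustrative computation)."""
    return x * x + 1


def stuck(x: int) -> int:
    """A hypothetical permanently faulty replica that always outputs 0."""
    return 0
```

With one faulty replica, `run_tmr([good, stuck, good], 3)` still yields the result of the two fault-free replicas, at the cost of running every computation three times.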
Basic Processor Architecture
[Figure: VLIW processor. Instruction word: Branch | opcode1 src1.1 src1.2 dst1 | opcode2 ... | opcoden srcn.1 srcn.2 dstn. Data path: Register File, FU 1 ... FU n, Branch, Extern. Control path: Control Logic, Instruction Pointer. Program Memory and Data Memory.]
Idea of Resource Reduced TMR
• Redundant operators are naturally available in a VLIW data path
• In TMR, the third result is only needed when the first two results mismatch
• Idea of RR-TMR: execute every operation on only two operators; in the fault-free case, the third operator is free to execute regular operations
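The difference to plain TMR can be sketched in a few lines of Python. This is an illustrative model under simplifying assumptions (operators are plain functions, and the third execution happens immediately on a mismatch); `rr_tmr_execute`, `add` and `faulty_add` are hypothetical names, not part of the presented architecture.

```python
from typing import Callable, List, Tuple

Operator = Callable[[int, int], int]


def rr_tmr_execute(ops: List[Operator], a: int, b: int) -> Tuple[int, int]:
    """Execute on ops[0] and ops[1]; consult ops[2] only on a mismatch.

    Returns (result, number of operator executions used).
    """
    r0 = ops[0](a, b)
    r1 = ops[1](a, b)
    if r0 == r1:
        # Fault-free case: two executions suffice, the third operator stays free.
        return r0, 2
    r2 = ops[2](a, b)
    if r2 == r0 or r2 == r1:
        return r2, 3
    raise RuntimeError("no two results agree")


def add(a: int, b: int) -> int:
    return a + b


def faulty_add(a: int, b: int) -> int:
    """Hypothetical faulty adder that is always off by one."""
    return a + b + 1
```

In the fault-free case `rr_tmr_execute([add, add, add], 2, 3)` uses two executions; with `faulty_add` as one of the first two operators, the third execution breaks the tie.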
Modified VLIW Data Path
[Figure: modified VLIW processor. Data path: Regular Register File, Temporary Register File, FU 1 ... FU n, FD & C Logic blocks, Branch, Extern. Control path: Control Logic, Voting Control Logic, Instruction Pointer. Program Memory and Data Memory.]
Limitation: Every operator must be available at least three times.
Program Transformation
[Figure: data-flow graph with nodes 1 to 28 (additions at nodes 3, 6, 9, 10, 13, 14, 19, 20, 25, 26, 27 and 28). Legend: duplicated operations; pair of reference operations.]
Modified Part of Instruction Word
Fields per operation: opcode | src1 | src2 | dst | mod | RefReg | RefFU
• RefFU: number of the FU that executes the reference operation
• Mod=0: RefReg is the target register in the TRF (temporary register file)
• Mod=1: RefReg delivers the reference value from the TRF
• These fields must be set correctly for every operation and its duplicate after all operations have been scheduled (the original and the duplicate operation may be scheduled at different times)
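How the extra fields could be filled in for an operation and its duplicate is sketched below. The `Slot` class and the `make_pair` helper are hypothetical, and the sketch assumes that the operation writing the reference value (mod=0) is the one scheduled first; the slides do not state this rule explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Slot:
    """One FU's part of the instruction word, extended by mod, RefReg, RefFU."""
    opcode: str
    src1: str
    src2: str
    dst: str
    mod: int       # 0: RefReg is the target register in the TRF
                   # 1: RefReg delivers the reference value from the TRF
    ref_reg: str   # register in the temporary register file (TRF)
    ref_fu: int    # number of the FU executing the reference operation


def make_pair(opcode: str, src1: str, src2: str, dst: str,
              first_fu: int, second_fu: int, trf_reg: str) -> Tuple[Slot, Slot]:
    """Build the slots for an operation and its duplicate.

    Assumption: the slot on first_fu is scheduled first and therefore gets mod=0.
    Each slot names the other FU as its reference FU.
    """
    first = Slot(opcode, src1, src2, dst, 0, trf_reg, second_fu)
    second = Slot(opcode, src1, src2, dst, 1, trf_reg, first_fu)
    return first, second


# Illustrative values: an addition R0 = R3 + R6 duplicated on FU 2 and FU 3.
slot_fu2, slot_fu3 = make_pair("+", "R3", "R6", "R0", 2, 3, "R6")
```

Both slots carry the same opcode and operands; only the mod bit and the reference FU differ.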
Example: Instruction Word
[Figure: result of scheduling, showing the addition and its duplicate placed in FU 2 and FU 3 within time steps 8 to 10.]
Corresponding instruction words (Instr. 8 to 10):
Part of   OpC  Src1  Src2  Dst  mod  RReg  RFU
FU 2      +    R3    R6    R0   0    R6    3
FU 3      +    R3    R6    R0   1    R6    2
FD&C Logic Details
[Figure: FD&C logic of FU i. Labels: RefReg, RefFU, mod, opcode, Result of FU i, Write Port TRF, Read Port TRF, Control of TRF Read Ports, Cmp, Fault Vector, Fault Remember, errOpc, errDet, i_Fault, error, Write Port of FU i in RF, signals from and to the voting logic.]
Annotations:
• Every bit represents the fault status of the corresponding operator
• Opcode of the operation currently executed in the corresponding FU
• Compares the current result with the reference value from register RefReg in the TRF
• Decides whether an error occurs for the first time or not and gives a signal to the voting logic
• Detects whether the current result is faulty
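The comparison and the "first time or not" decision described above can be approximated by a small behavioural sketch. It is a strong simplification: the fault vector is modeled as a set of opcodes known to be faulty in this FU, and the function name `fdc_check` and its return convention are assumptions, not the signal interface of the actual logic.

```python
from typing import Set, Tuple


def fdc_check(result: int, reference: int, opcode: str,
              fault_vector: Set[str]) -> Tuple[bool, bool]:
    """Return (mismatch, first_time) for one compared operation.

    mismatch:   the current result differs from the reference value read
                from the TRF.
    first_time: the mismatch concerns an operator not yet marked as faulty,
                so the voting logic would have to be notified (assumed rule).
    """
    mismatch = result != reference
    known_fault = opcode in fault_vector
    first_time = mismatch and not known_fault
    return mismatch, first_time
```

A mismatch on an operator that is already marked faulty is reported as known, which matches the statement in the conclusions that known faults cause no delay.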
Example: Correct Execution
[Figure: FD&C logic of FU 2 and FU 3 processing the instruction word parts + R3 R6 R0 with mod=0, RReg=R6, RFU=3 (FU 2) and mod=1, RReg=R6, RFU=2 (FU 3) from Instr. 8 to 10. Annotated signal values: 0; 0 0; 0.]
Example: FU 2 is Faulty
[Figure: FD&C logic of FU 2 and FU 3 processing the instruction word parts + R3 R6 R0 with mod=0, RReg=R6, RFU=3 (FU 2) and mod=1, RReg=R6, RFU=2 (FU 3) from Instr. 8 to 10. Annotated signal values: 1; 1 1; 0.]
Example: FU 3 is Faulty
[Figure: FD&C logic of FU 2 and FU 3 processing the instruction word parts + R3 R6 R0 with mod=0, RReg=R6, RFU=3 (FU 2) and mod=1, RReg=R6, RFU=2 (FU 3) from Instr. 8 to 10. Annotated signal values: 0; 0 0; 1.]
Example: Fault Detection (1)
[Figure: FD&C logic of FU 2 and FU 3 processing the instruction word parts + R3 R6 R0 with mod=0, RReg=R6, RFU=3 (FU 2) and mod=1, RReg=R6, RFU=2 (FU 3) from Instr. 8 to 10. Annotated signal values: 0; 0 0; 0.]
Example: Fault Detection (2)
The operation of FU 3 that caused the mismatch (+ R3 R6 R0 with mod=1, RReg=R6, RFU=2) is executed again in another FU. One of the following two cases applies:
• No mismatch is discovered: FU 2 and FU 4 computed the correct result. The write-back of FU 3 is suppressed.
• A mismatch is discovered again: it is assumed that FU 3 computed the correct result. This result is written to the register file.
[Figure: FD&C logic of the re-executing FU (labeled FU 1 in the figure) for both cases, with annotated signal value 0 in the first case and 1 in the second.]
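The two cases can be written down as a short decision function. This is a sketch of the reasoning only: `resolve_mismatch` is a hypothetical name, the spare FU is modeled as a plain function, and the sketch assumes that in the first case the agreed reference value is the one that ends up in the register file.

```python
from typing import Callable, Tuple


def resolve_mismatch(reference: int, checked: int,
                     spare_op: Callable[[int, int], int],
                     a: int, b: int,
                     ref_fu: int, checked_fu: int) -> Tuple[int, int]:
    """Re-execute the mismatching operation on a spare FU.

    reference: value produced by the reference FU (held in the TRF)
    checked:   value produced by the FU whose comparison failed
    Returns (value to write back, FU assumed to be faulty).
    """
    retry = spare_op(a, b)
    if retry == reference:
        # No mismatch this time: reference FU and spare FU agree,
        # so the checked FU is faulty and its write-back is suppressed.
        return reference, checked_fu
    # Mismatch again: the checked FU is assumed to be correct.
    return checked, ref_fu


def add(a: int, b: int) -> int:
    return a + b
```

For example, with a reference value of 9 from FU 2 and a deviating 8 from FU 3 for the inputs 4 and 5, the retry confirms the reference and FU 3 is marked faulty; with the two values swapped, FU 2 is blamed.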
Details FD&C-Logic
[Figure: voting logic. Labels: Decoder (fed by the fetched instruction), Control Signal Queue (cs1, cs2, ..., csk), Voting Instruction, cs_sel, Fault_1 ... Fault_m, write_disable (to FD&C 1 ... to FD&C m), op_sel, fu_sel, Operation mode, Fault Memory, opMode (to data path).]
Annotations:
• Select a certain control word (normal: cs1)
• Current operation mode (normal, voting, resume)
• Select the control signals of the fault-causing operation
• Redirect the selected signals to a working FU
• Remember faulty operators
• Control of (de-)multiplexers
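The three operation modes named above (normal, voting, resume) suggest a small state machine. The sketch below is one possible reading: the mode names come from the slide, while the transition conditions `fault_reported`, `voting_done` and `queue_drained` are assumptions introduced for illustration.

```python
from enum import Enum


class OpMode(Enum):
    NORMAL = "normal"
    VOTING = "voting"
    RESUME = "resume"


def next_mode(mode: OpMode, fault_reported: bool = False,
              voting_done: bool = False,
              queue_drained: bool = False) -> OpMode:
    """Return the operation mode for the next cycle (assumed transitions)."""
    if mode is OpMode.NORMAL:
        # A newly reported fault interrupts normal execution.
        return OpMode.VOTING if fault_reported else OpMode.NORMAL
    if mode is OpMode.VOTING:
        # Stay in voting until the faulty operator has been identified.
        return OpMode.RESUME if voting_done else OpMode.VOTING
    # RESUME: replay the stopped control words, then return to normal.
    return OpMode.NORMAL if queue_drained else OpMode.RESUME
```

Keeping the transition function free of side effects makes each step easy to check in isolation, for example `next_mode(OpMode.NORMAL, fault_reported=True)` moving to the voting mode.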
Example: FD&C-Logic
[Figure: voting logic in operation mode "normal". The control signal queue holds the control words of the example schedule (operations +, -, *, & and nops). Situation: Instruction 1 (EX), Instruction 2 (Fetch), Instruction 3; a fault is reported.]
Example: FD&C-Logic
[Figure: voting logic in operation mode "voting". The control signal queue holds the control words of the example schedule (operations +, -, *, & and nops). Situation: Instruction 1 (WB), Instruction 2 (EX, stopped), Instruction 3 (fetched, stopped).]
Example: FD&C-Logic
[Figure: voting logic in operation mode "resume", with the control signal queue contents and the marker "Resume starts here".]
Limitations in Error Detection
[Figure: pairs of + operations placed on Fu1, Fu2, Fu3, …]
• Assumption: Operator + in FU 1 is faulty.
• Problem: The correctness of operator + in FU 2 can no longer be checked!
• Solution: Check the correctness of FU 2 with a reference operation in FU 3.
Preliminary Results
[Figure: data-flow graph with nodes 1 to 28 (additions at nodes 3, 6, 9, 10, 13, 14, 19, 20, 25, 26, 27 and 28).]
Preliminary Results
      Non-fault-tolerant     Fault-tolerant
L     FUs   Add   Mul        FUs   Add   Mul
8     4     4     4          8     8     8
9     4     3     4          8     7     8
10    3     3     3          6     6     6
11    3     3     3          6     6     6
12    3     3     2          5     5     5
13    3     3     2          5     5     5
14    2     2     2          5     5     4
15    2     2     2          4     4     4
16    2     2     2          4     4     3
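The relative overhead in functional units implied by this table can be computed directly. The sketch below copies the rows from the table; the names `ROWS`, `fu_overhead`, `overheads` and `mean_overhead` are hypothetical, and "overhead" is taken here to mean the ratio of fault-tolerant to non-fault-tolerant FU counts minus one, which is an assumption about the intended metric.

```python
# Rows: (L, FUs, Add, Mul, FT FUs, FT Add, FT Mul), copied from the table.
ROWS = [
    (8, 4, 4, 4, 8, 8, 8),
    (9, 4, 3, 4, 8, 7, 8),
    (10, 3, 3, 3, 6, 6, 6),
    (11, 3, 3, 3, 6, 6, 6),
    (12, 3, 3, 2, 5, 5, 5),
    (13, 3, 3, 2, 5, 5, 5),
    (14, 2, 2, 2, 5, 5, 4),
    (15, 2, 2, 2, 4, 4, 4),
    (16, 2, 2, 2, 4, 4, 3),
]


def fu_overhead(row: tuple) -> float:
    """Relative increase in FU count of the fault-tolerant configuration."""
    _, fus, _, _, ft_fus, _, _ = row
    return ft_fus / fus - 1.0


# Overhead per value of L, and the average over all rows.
overheads = {row[0]: fu_overhead(row) for row in ROWS}
mean_overhead = sum(overheads.values()) / len(overheads)
```

Under this definition the per-row overhead ranges from about 67% (L = 12, 13) to 150% (L = 14), and the mean over the nine rows is roughly 98%.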
Conclusion
• The method can detect and repair permanent and transient faults
• Known faults do not cause a delay; new faults cause a delay of at most 2·maxLat + 1
• Multiple known faults can be repaired (as long as at least one operation of every pair is executed by a non-faulty FU)
• Overhead in operators and register file ports is approximately 100%
• Overhead of the control logic is unknown so far (a VHDL model is missing)
Open Problems
• Handling of multiple faults that first occur at the same time is possible but difficult
• Faults in wires, registers, control path and FD & C logic
• Hardware implementation for better area and performance estimation
Thank You!