Exploiting HW+SW Partitioning
for Reliable Embedded Systems
Part 2
Summary
1. Introduction: targeting the problem
2. The Possible Solution
2.1. SW-Based Fault Detection Mechanisms
2.2. Migrating SW-Based Fault Detection Mechanisms into
HW
3. Experimental Evaluation
4. Final Considerations
1. Introduction: targeting the problem
The increasing number of computer-based
critical applications raises questions about
techniques that guarantee a sufficient
degree of reliability while keeping design and
manufacturing costs reasonable.
Techniques commonly used (on-chip and
system level): stand-alone solutions
Fault-Tolerance Techniques
(HW, SW, Time or Info domains):
• Duplication/Voter, TMR
• Layout-Driven Fault Avoidance
• Watch-Dog Timer
• Consistency Checks
• Capability Checks
• Re-computation
• EDAC
Impacts design:
performance, weight, size/volume,
power consumption, reliability.
HW Techniques:
Disadvantages:
• High area overhead
• High development/fab cost
SW Techniques:
Disadvantages:
• Significant performance degradation
• Memory overhead
Development of a hybrid
methodology (HW+SW redundancies)
able to perform runtime detection of
errors in μprocessor-based SoCs may
yield a very good cost-benefit
return.
2. The Possible Solution
Returns:
• Minimization of area overhead and fab/development costs
(benefits of SW-based redundancy techniques)
• Improvement of performance and minimization of memory
overhead (benefits of HW-based redundancy techniques)
In summary: Minimize fab cost and performance degradation, while
improving reliability
Target faults:
• Control flow errors
• Data handling errors
Hybrid methodology (HW+SW
redundancies) explores:
• I-IP Core Architecture
• Software-Based Techniques
HW+SW SoC FT Architecture:
[Figure: SoC block diagram. A µP IP, a Memory IP, a Custom IP, an I/O port and the WDT / I-IP are connected to the bus. The Memory IP stores a hardened program. The WDT / I-IP observes the information flow traveling on the bus, computes at run time and stores control-flow signatures and data read from memory, and drives the Mismatch signal.]
Faults Affecting Data: Cerberus (Matteo et al.)
Faults Affecting Control: ECCA (Matteo et al.), CFCSS (McCluskey et al.), ECI (Miremadi et al.)
2.1. SW-Based Fault Detection Mechanisms
Faults Affecting Data: Cerberus (Matteo et al.)
Original Code:
    a = b;

    a = b + c;

Modified Code:
    a0 = b0;
    a1 = b1;
    if (b0 != b1)
        error();

    a0 = b0 + c0;
    a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1))
        error();

Code modification for errors affecting data.
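The duplication rule above can be sketched in C as follows. This is only an illustration: the helper hardened_add() and the flag data_error are hypothetical names introduced here, not part of the original technique's code.

```c
#include <stdbool.h>

/* Hypothetical error flag: set when the two copies of a variable disagree. */
bool data_error = false;

/* Stand-in for the error() routine called by the hardened code. */
void error(void) { data_error = true; }

/* Hardened version of "a = b + c": every variable exists twice
 * (b0/b1, c0/c1, a0/a1). The operation is performed on both copies
 * and the copies of each operand that was read are compared. */
void hardened_add(int b0, int b1, int c0, int c1, int *a0, int *a1)
{
    *a0 = b0 + c0;
    *a1 = b1 + c1;
    if ((b0 != b1) || (c0 != c1))
        error();
}
```

A bit-flip in only one copy of an operand makes the comparison fail, so error() is reached before the corrupted result is used.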
Original Code:
    res = search(a);
    …
    int search(int p)
    {
        int q;
        …
        q = p + 1;
        …
        return(1);
    }

Modified Code:
    search(a0, a1, &res0, &res1);
    …
    void search(int p0, int p1, int *r0, int *r1)
    {
        int q0, q1;
        …
        q0 = p0 + 1;
        q1 = p1 + 1;
        if (p0 != p1)
            error();
        …
        *r0 = 1;
        *r1 = 1;
        return;
    }

Code transformation for errors affecting procedure parameters.
Faults Affecting Control: ECCA (Error Control-Flow Checking using Assertions) (Matteo et al.)
Original Code:
    /* Basic Block beginning */
    …
    /* Basic Block end */

Modified Code:
    /* Basic Block beginning #371 */
    ecf = 371;
    …
    if (ecf != 371)
        error();
    /* Basic Block end */

Example of detection of illegal branches into a basic block.
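A minimal C sketch of the block-identifier check is given below. Splitting the block into block_371() and block_371_tail() is an assumption made here only so that an illegal jump into the middle of the block can be simulated by calling the tail directly; the flag cf_error is likewise hypothetical.

```c
#include <stdbool.h>

int  ecf = 0;            /* control-flow identifier variable */
bool cf_error = false;   /* hypothetical flag set by error() */

void error(void) { cf_error = true; }

/* Second half of basic block #371, ending with the exit check. */
int block_371_tail(int x)
{
    x = x * 2;
    if (ecf != 371)      /* the block was not entered from its beginning */
        error();
    return x;
}

/* Basic block #371: the identifier is assigned on entry and
 * verified before leaving the block. */
int block_371(int x)
{
    ecf = 371;
    x = x + 1;
    return block_371_tail(x);
}
```

When execution reaches the tail without passing through the entry assignment, ecf still holds the identifier of another block and the check fails.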
Original Code:
    if (condition)
    { /* Block A */
        …
    }
    else
    { /* Block B */
        …
    }

Modified Code:
    if (condition)
    { /* Block A */
        if (!condition)
            error();
        …
    }
    else
    { /* Block B */
        if (condition)
            error();
        …
    }

Code transformation for a test statement
In summary
To harden a given program, this approach introduces the
following assertions into each basic block vj:
• Test Assertion: checks the signature on entry to basic block vj,
verifying that the previous block vi belongs to pred(vj).
• Set Assertion: updates the signature, setting it to the value Bj
associated with vj.
Bj = (Bi XOR M1) XOR M2
Principle: Modification of a Basic Block
01: while(k1<DIM)
02: {
03:   if( Bi != M1 && Bi != M2 )
04:     //Error detected
05:   A1 = matrixA1[i1][k1];
06:   B1 = matrixB1[k1][j1];
07:   C1 += A1*B1;
08:   matrixC1[i1][j1] = C1;
09:   k1++;
10:   Bj = (Bi ^ M1) ^ M2;
11: }
Faults Affecting Control: CFCSS (McCluskey et al.)
Basically, the approach consists of six steps:
1) Divide the program into basic blocks. A basic block is a maximal set of ordered instructions whose execution begins at the first instruction and terminates at the last instruction. There is no branching instruction in a basic block except possibly for the last one. A basic block terminates at either an instruction branching to another basic block or an instruction receiving transfer of control flow (CF) from two or more places in the program. Notations: (a) V = {vi: i = 1, 2,…, n}: set of vertices denoting basic blocks. (b) E: set of edges denoting possible CF between basic blocks.
2) Construct a graph for the program according to the instruction flow (each node represents a basic block). Note that a program can be represented by a program graph, P, where the edges br_i,j are not necessarily explicit branch instructions; they also represent fall-through execution paths, jumps, subroutine calls, and returns. Fig. 2.5 is an example. Notation: P: Program Graph {V, E}.
3) Arbitrarily assign a signature to each node (compilation time).
4) Compute the signature difference between the source and the destination blocks.
5) Compute the new signature for each node (execution time).
6) Compare both signatures.
Sequence of instructions and its graph. Detection of illegal branch.
General Form f = f(G, di) = G XOR di
G2 = f(G1, d2) = G1 XOR d2 = s1 XOR (s1 XOR s2) = s2
G4 = f(G1, d4) = G1 XOR d4 = G1 XOR (s3 XOR s4) = s1 XOR s3 XOR s4 ≠ s4
Detection of an illegal branch: a numerical example
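The signature function G = G XOR d can be exercised with a small C sketch. The signature values S1 to S4 below are arbitrary, hypothetical choices (CFCSS only requires that they be assigned at compilation time), and the function names are introduced here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compile-time signatures, chosen arbitrarily for this sketch. */
enum { S1 = 0x1, S2 = 0x2, S3 = 0x3, S4 = 0x4 };

/* Signature differences d_j = s_pred(j) XOR s_j:
 * v2 is reached from v1, v4 is reached from v3. */
enum { D2 = S1 ^ S2, D4 = S3 ^ S4 };

/* Run-time signature update at block entry: G = f(G, d) = G XOR d. */
uint8_t cfcss_update(uint8_t g, uint8_t d)
{
    return (uint8_t)(g ^ d);
}

/* Comparison of the run-time signature with the compile-time one. */
bool cfcss_check(uint8_t g, uint8_t s)
{
    return g == s;
}
```

Starting from G = S1, the legal branch v1 to v2 yields S2, whereas the illegal branch v1 to v4 yields S1 XOR S3 XOR S4, which differs from S4 and is therefore detected.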
Node v1 and node v3 have the same signatures: Branch Fan-in Nodes
Node v1 and node v3 have different signatures: Adjusting Signature D
G5 = f(G1, d5, D1) = G1 XOR d5 XOR D1 = s1 XOR (s1 XOR s5) XOR “000” = s5
G5 = f(G3, d5, D3) = G3 XOR d5 XOR D3 = s3 XOR (s1 XOR s5) XOR (s1 XOR s3) = s5
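The two equations for the fan-in node v5 can be checked with the following C sketch. The signature values and the names ADJ1 and ADJ3 (standing for D1 and D3) are hypothetical; only the XOR relations come from the formulas above.

```c
#include <stdint.h>

/* Arbitrary compile-time signatures for v1, v3 and the fan-in node v5. */
enum { S1 = 0x1, S3 = 0x3, S5 = 0x5 };

/* d5 is computed with respect to one predecessor only (v1). */
enum { D5 = S1 ^ S5 };

/* Adjusting signatures D, set in each predecessor before the branch:
 * zero for v1, s1 XOR s3 for v3. */
enum { ADJ1 = 0x0, ADJ3 = S1 ^ S3 };

/* Signature update for a branch fan-in node: G = G XOR d XOR D. */
uint8_t cfcss_update_fanin(uint8_t g, uint8_t d, uint8_t adj)
{
    return (uint8_t)(g ^ d ^ adj);
}
```

Both legal predecessors produce S5 at v5, while arriving from v3 with the wrong adjusting signature does not.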
Faults Affecting Control: ECI (Miremadi et al.)
Trap instructions (ECIs) are inserted in the program area, in the data area, and in the unused area of the memory.
The ECIs are placed in main memory locations that are not used by the CPU during normal execution. Thus, the execution of an ECI is an indication that a control flow error has occurred.
The task of an ECI is to initiate a recovery process.
2.2. Migrating SW-Based Fault Detection Mechanisms into HW
The WDT / I-IP works in symbiosis with the
processor, which is not modified.
The WDT / I-IP continuously observes the
execution-flow information traveling on the bus
and uses it to test and update signatures.
If a mismatch is detected, the WDT / I-IP outputs a
mismatch signal.
Piece of code for control-flow fault detection (ECCA partitioning):
01: while(k1<DIM)
02: {
03:   IIPtest( BB1 );
04:   IIPtest( BB2 );
05:   A1 = matrixA1[i1][k1];
06:   B1 = matrixB1[k1][j1];
07:   C1 += A1*B1;
08:   matrixC1[i1][j1] = C1;
09:   k1++;
10:   IIPset( BB2 );
11: }
Corresponding lines in the SW-only ECCA version:
03:   if( Bi != M1 && Bi != M2 )
04:     //Error detected
10:   Bj = (Bi ^ M1) ^ M2;
WDT / I-IP Architecture:
• Three modules:
 - bus interface logic: detects signatures passing on the bus (adx, data)
 - consistency check logic: compares flow signatures and drives the Mismatch Signal
 - CAM memory: stores flow signatures
Signals:
• WDT / I-IP: Clk, Reset, Instruction_in, Ram_data_in, Ram_address_in
• Module 1, Bus Interface Logic: Clk, Reset, Instruction_in, Ram_data_in, Ram_address_in, Data_memory_in, Data_memory_out, Adr_memory_out, Ctrl_rw_out, En_compare_out, Data_1_out, Data_2_out
• Module 2, CAM Memory: Clk, Reset, Data_memory_out, Data_memory_in, Adr_memory_in, Ctrl_rw_in
• Module 3, Consistency Check Logic: Clk, Reset, En_compare_out, Data_1_out, Data_2_out, Mismatch Signal
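The cooperation of the three modules can be illustrated with a small behavioral model in C. This is a sketch under stated assumptions, not the VHDL of the actual I-IP: it assumes that consecutive IIPtest() calls list the legal predecessor signatures, that the following IIPset() call closes the group and stores the new signature, and that the mismatch signal is sticky. The type iip_model_t, the function names and the signature values BB1 and BB2 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical signatures of two basic blocks. */
enum { BB1 = 0x11, BB2 = 0x22 };

typedef struct {
    uint16_t stored;    /* CAM memory: current flow signature          */
    bool     testing;   /* a group of test requests is in progress     */
    bool     matched;   /* at least one test of the group matched      */
    bool     mismatch;  /* consistency check output (mismatch signal)  */
} iip_model_t;

void iip_reset(iip_model_t *ip, uint16_t initial_signature)
{
    ip->stored   = initial_signature;
    ip->testing  = false;
    ip->matched  = false;
    ip->mismatch = false;
}

/* Models IIPtest(sig) seen on the bus: sig is one legal predecessor. */
void iip_test(iip_model_t *ip, uint16_t sig)
{
    ip->testing = true;
    if (sig == ip->stored)
        ip->matched = true;
}

/* Models IIPset(sig): close the test group, raise the mismatch signal
 * if no legal predecessor matched, then store the new signature. */
void iip_set(iip_model_t *ip, uint16_t sig)
{
    if (ip->testing && !ip->matched)
        ip->mismatch = true;
    ip->testing = false;
    ip->matched = false;
    ip->stored  = sig;
}
```

With this model, the loop body of the partitioned code is legal when entered from BB1 or from BB2 (the loop back edge), and any other stored signature raises the mismatch signal at the next IIPset().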
Consider now that the µprocessor-based SoC runs under an Operating System …
The application code accounts for only a fraction of the total execution time during system operation!
• Critical applications need operating systems (OS) that guarantee correct and safe behavior despite the occurrence of errors.
• Faults can affect OS calls as well as the OS kernel: how does the system react to invalid or corrupted values handled by the kernel?
HW-SW Partitioning for Fault-Detection in Complex Systems
[Figure: SoC block diagram. SW part: the µProcessor (DragonBall, ARM, Pentium, 8086, 68K) runs the Application on top of an Operating System (µCLinux, µC/OS-II) that includes a Driver; the memories hold the Operating System and the Application Code + Data. HW part: the WDT / I-IP, implemented in programmable logic, is attached to the Address + Data Bus and to the processor Status Register and outputs the Error Indication. A communication channel (Com Channel) links the SW part and the HW part.]
Status Information
[Figure: MC68VZ328 block diagram. FLX68000 static CPU, CGM & power control, real-time clock, in-circuit emulation, interrupt controller, memory controller, bootstrap mode, 8/16-bit 68000 bus interface, two 16-bit timers, 8-bit PWM 1, 16-bit PWM 2, SPI 1, SPI 2, UART 1 and UART 2 with IrDA 1.0, LCD controller and GPIO ports, all connected through the 68000 internal bus. The Special Function Pins (CPU Space) are highlighted as the source of status information.]
Special Function Pins (CPU Space): FC2, FC1, FC0
Function Code Output Processor Cycle Type
FC2 FC1 FC0
0 0 0 Undefined, reserved
0 0 1 User Data
0 1 0 User Program
0 1 1 Undefined, reserved
1 0 0 Undefined, reserved
1 0 1 Supervisor Data
1 1 0 Supervisor Program
1 1 1 CPU space (interrupt acknowledge)
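A monitor such as the I-IP could decode these three pins as sketched below in C. The enum and function names are hypothetical; only the mapping between FC2, FC1, FC0 and the cycle types is taken from the table above.

```c
#include <stdbool.h>

typedef enum {
    CYCLE_RESERVED,
    CYCLE_USER_DATA,
    CYCLE_USER_PROGRAM,
    CYCLE_SUPERVISOR_DATA,
    CYCLE_SUPERVISOR_PROGRAM,
    CYCLE_CPU_SPACE
} cycle_type_t;

/* Decode the function code outputs (each argument is a pin level, 0 or 1). */
cycle_type_t decode_fc(unsigned fc2, unsigned fc1, unsigned fc0)
{
    unsigned code = ((fc2 & 1u) << 2) | ((fc1 & 1u) << 1) | (fc0 & 1u);

    switch (code) {
    case 1:  return CYCLE_USER_DATA;
    case 2:  return CYCLE_USER_PROGRAM;
    case 5:  return CYCLE_SUPERVISOR_DATA;
    case 6:  return CYCLE_SUPERVISOR_PROGRAM;
    case 7:  return CYCLE_CPU_SPACE;
    default: return CYCLE_RESERVED;   /* codes 0, 3 and 4 */
    }
}

/* True when the bus cycle belongs to supervisor (OS) code or data. */
bool is_supervisor_cycle(cycle_type_t t)
{
    return t == CYCLE_SUPERVISOR_DATA || t == CYCLE_SUPERVISOR_PROGRAM;
}
```

This kind of decoding lets the monitor tell OS activity (supervisor cycles) apart from application activity (user cycles) without modifying the processor.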
(68000 Die)
68010 – 68030 Dies: A16 - A19 Pins
On these dies, FC2 = FC1 = FC0 = 1 can also indicate CPU operations other than interrupt acknowledge cycles (e.g. co-processor communications).
The different CPU spaces are then indicated on the A16 - A19 pins, if properly decoded.
Interrupt Control Pins: IPL2, IPL1, IPL0 (68000 Die)
Interrupt Priority Level   Priority
IPL2 IPL1 IPL0
0 0 0 Lowest priority
0 0 1 |
0 1 0 |
0 1 1 |
1 0 0 |
1 0 1 |
1 1 0 |
1 1 1 Highest priority
Event-Ticking Pins – ETPs: PM0, PM1 (Pentium Die)
The Event-Ticking Pins (ETPs) are associated with Model Specific Registers (MSRs) to monitor:
# cache memory misses, # committed instructions, # interrupts executed, # taken branches, ...
Model Specific Registers – MSRs: counters CTR0 and CTR1, programmed through the Control and Event Select Register (CESR).
Instructions used to program counters CTR0 and CTR1 through the CESR:
WRMSR
RDMSR
Both RDMSR and WRMSR are privileged instructions: they may only be executed at CPL0 (Current Privilege Level 0).
Event-Ticking Pins – ETPs: d_i, s_u (DragonBall Core)
d_i: if “0”: data; if “1”: instruction; if “z”: undefined.
s_u: if “0”: supervisor mode; if “1”: user mode; if “z”: undefined.
These pins were added to the processor core to serve as the interface with the I-IP (watch-dog).
• OS error detection coverage has been measured, and observations about the critical OS data structures that need improvement have been reported, in order to improve the final robustness of the µC/OS-II operating system.
Juan Pardo, 2004
Fault Tolerant Systems Group
Polytechnic University of Valencia, Spain
µC/OS-II Operating System
• µC/OS-II was selected because it has been widely used for several years, in particular in embedded applications.
First Version µC/OS 1992
• Industrial robots, motor control, medical instruments, etc.
• It is 99% compliant with the Motor Industry Software Reliability Association (MISRA) C Coding Standards.
• All Modified Condition Decision Coverage (MCDC) code in µC/OS-II has been removed, improving code quality for RTCA / EUROCAE DO-178B Level A-certified environments for avionics applications.
µC/OS-II: Characteristics
• Portable: uC/OS-II is written in highly portable ANSI C, with target microprocessor-specific code written in assembly language.
• ROMable: was designed for embedded applications. This means that if you have the proper tool chain (i.e., C compiler, assembler, and linker/locator), you can embed uC/OS-II as part of a product.
• Scalable: only the services needed by the application have to be included, which reduces the amount of memory (both RAM and ROM) needed. Scalability is accomplished through conditional compilation (full version: 8KB).
• Preemptive: uC/OS-II is a fully preemptive real-time kernel. This means that uC/OS-II always runs the highest priority task that is ready.
• Multitasking: uC/OS-II can manage up to 64 tasks. The current version of the software reserves 8 of these tasks for system use, leaving up to 56 tasks for the application. Each task has a unique priority assigned to it, which means that uC/OS-II cannot do round-robin scheduling.
• Deterministic: The execution time of all uC/OS-II functions and services is deterministic. You can always know how much time uC/OS-II will take to execute a function or a service. Furthermore, the execution time of all uC/OS-II services does not depend on the number of tasks running in your application.
• Task Stacks: Each task requires its own stack. uC/OS-II allows each task to have a different stack size, which reduces the amount of RAM needed by the application.
• Services: system services such as mailboxes, queues, semaphores, fixed-sized memory partitions, time-related functions, etc.
• Interrupt Management: Interrupts can suspend the execution of a task. If a higher priority task is awakened as a result of the interrupt, the highest priority task will run as soon as all nested interrupts complete. Interrupts can be nested up to 255 levels deep.
• Robust and Reliable: uC/OS-II is based on uC/OS, which has been used in hundreds of commercial applications since 1992.
Workload Design
Characteristics:
Worst-case application: maximum consumption of system calls.
System calls: Synchronization, Semaphores, Memory, Queues, Messages, Task Handling, Timing Management, etc.
The system workload is
continuously running and consists
of a series of tasks executing the
application.
Consistency checks are added
to the application code and kernel
to detect faults and invalid values
at the kernel calls in order to
improve system robustness.
The WDT / I-IP is the monitor.
Addition of Consistency Checks

    void RandomNumberTask(void *pdata)
    {
        // Declare as auto to ensure reentrancy.
        auto OS_TCB data;
        auto INT8U err;
        auto INT16U RNum;

        OSTaskQuery(OS_PRIO_SELF, &data);
        while(1)
        {
            // Rand is not reentrant, so access must be controlled
            // via a semaphore.
            OSSemPend(RandomSem, 0, &err);   // OS call (task waits for signal)
            RNum = (int)(rand() * 100);
            OSSemPost(RandomSem);            // OS call (task sends a signal)
            printf("Task%02d's random #: %d\n", data.OSTCBPrio, RNum);
            // Wait 3 seconds in order to view output from each task.
            OSTimeDlySec(3);
        }
    }

    // 1. Define necessary configuration constants for uC/OS-II
    #define OS_MAX_EVENTS 2
    #define OS_MAX_TASKS 20
    #define OS_MAX_QS 0
    #define OS_Q_EN 0
    #define OS_MBOX_EN 0
    #define OS_TICKS_PER_SEC 32

    // 2. Define necessary stack configuration constants
    #define STACK_CNT_512 1            // initial program stack
    #define STACK_CNT_1K OS_MAX_TASKS  // task stacks

    // 3. This ensures that the above definitions are used
    #use "ucos2.lib"

    void RandomNumberTask(void *pdata);

    // Declare semaphore global so all tasks have access
    OS_EVENT* RandomSem;

    void main(){
        int i;
        // Initialize OS internals
        OSInit();
        // Initializing tasks
        for(i = 0; i < OS_MAX_TASKS; i++){
            // Create each of the system tasks
            OSTaskCreate(RandomNumberTask, NULL, 1024, i);
        }
        // semaphore to control access to random number generator
        RandomSem = OSSemCreate(1);
        // 4. Set number of system ticks per second
        OSSetTicksPerSec(OS_TICKS_PER_SEC);
        // Starting tasks: begin multi-tasking
        OSStart();
    }
Workload Design

    OS_ENTER_CRITICAL
    /* Code implemented for GNU-GAS */
    asm ("movea.l #0x0100, %a0 | Load the hexadecimal address 0x0100 into a0\n\t"
         "move.b #11, (%a0)    | Write the byte 11 to the address held in a0\n");
    …
    asm ("movea.l #0x0100, %a0 | Load the hexadecimal address 0x0100 into a0\n\t"
         "move.b #00, (%a0)    | Write the byte 00 to the address held in a0\n");
    OS_EXIT_CRITICAL

An indication is set at the instant when the processor enters supervisor mode (“OS_ENTER_CRITICAL”) and when it leaves this mode (“OS_EXIT_CRITICAL”). The signaling is done by writing to a specific memory address.
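The same signaling can be expressed portably in C with a volatile pointer, as in the sketch below. The marker values 11 and 0 and the address 0x0100 come from the assembly fragment; the names signal_enter_critical(), signal_exit_critical() and fake_marker_cell are hypothetical, and the marker is mapped onto an ordinary variable here so that the sketch can run on a host machine.

```c
#include <stdint.h>

#define MARK_ENTER 11u   /* byte written when entering the critical section */
#define MARK_EXIT   0u   /* byte written when leaving it                    */

/* On the target, the marker would be the monitored location, e.g.
 *   volatile uint8_t *marker = (volatile uint8_t *)0x0100;
 * For a host-side run it points to a plain variable instead. */
volatile uint8_t fake_marker_cell = 0;
volatile uint8_t *marker = &fake_marker_cell;

/* The volatile write cannot be optimized away, so it always produces
 * a bus cycle that a bus monitor can observe. */
void signal_enter_critical(void) { *marker = MARK_ENTER; }
void signal_exit_critical(void)  { *marker = MARK_EXIT;  }
```

Each kernel service could use a different marker byte on entry, which would let the monitor identify which system call is executing.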
System calls performed by Pend and Post through Semaphore, Mailbox and Queue

    /**************************************************************
    * PEND ON SEMAPHORE
    **************************************************************/
    UBYTE OSSemPend(OS_SEM *psem, UWORD timeout)
    {
        UBYTE x, y, bitx, bity;

        OS_ENTER_CRITICAL();
        /* Code implemented for GNU-GAS */
        asm ("movea.l #0x0100, %a0 | Load the hexadecimal address 0x0100 into a0\n\t"
             "move.b #4, (%a0)     | Write the byte 4 to the address held in a0\n");
        /* End */
        if (psem->OSSemCnt-- > 0) {
            OS_EXIT_CRITICAL();
            return (OS_NO_ERR);
        } else {
            OSTCBCur->OSTCBStat |= OS_STAT_SEM;
            OSTCBCur->OSTCBDly = timeout;
            y = OSTCBCur->OSTCBPrio >> 3;
            x = OSTCBCur->OSTCBPrio & 0x07;
            bity = OSMapTbl[y];
            bitx = OSMapTbl[x];
            if ((OSRdyTbl[y] &= ~bitx) == 0)
                OSRdyGrp &= ~bity;
            psem->OSSemTbl[y] |= bitx;
            psem->OSSemGrp |= bity;
            OS_EXIT_CRITICAL();
            OSSched();
            OS_ENTER_CRITICAL();
            if (OSTCBCur->OSTCBStat & OS_STAT_SEM) {
                if ((psem->OSSemTbl[y] &= ~bitx) == 0) {
                    psem->OSSemGrp &= ~bity;
                }
                OSTCBCur->OSTCBStat = OS_STAT_RDY;
                OS_EXIT_CRITICAL();
                return (OS_TIMEOUT);
            } else {
                OS_EXIT_CRITICAL();
                return (OS_NO_ERR);
            }
        }
    }

[Slide callouts: three “Consistency Check” labels point into this routine.]
Matteo Sonza Reorda, 2002-05
Fault Tolerant Systems Group
Politecnico di Torino
3. Experimental Evaluation
• An Intel 8051-based SoC was inspected.
• PANDORA I-IP: VHDL (~1500 lines).
• Fault detection capabilities evaluated via HW-based
fault injection experiments (FPGA environment).
• Four benchmarks considered:
– Matrix multiplication, Elliptical Filter,
FIR Filter and Viterbi Algorithm.
Detection capabilities:
• Transient faults (30,000 bit-flips)
• Number of wrong answers evaluated (escape detection).

Program   Plain [%]   Pandora [%]   ECCA [%]   CFCSS [%]
          Orig. SW    IP (HW+SW)    SW Sol.    SW Sol.
Matrix     9.78        0.18          0.99       4.88
Ellipf    20.83        0             2.38      14.29
FIR        5.64        0             2.12       4.49
Viterbi   21.06        4.89          6.33      17.48
Memory overhead:
• Additional code lines required to implement the
hybrid technique.

Program   Plain [byte]   Pandora [byte]   ECCA [byte]   CFCSS [byte]
          Orig. SW       IP (HW+SW)       SW Sol.       SW Sol.
Matrix    223            385                902         456
Ellipf    303            361                640         347
FIR       194            364                701         320
Viterbi   436            707              1,115         725
Execution time overhead:

Program   Plain [cycle]   Pandora [cycle]   ECCA [cycle]   CFCSS [cycle]
          Orig. SW        IP (HW+SW)        SW Sol.        SW Sol.
Matrix     31,211          41,462           102,356         43,791
Ellipf     16,268          17,815            25,635         17,611
FIR        43,434          71,994           153,458         57,357
Viterbi   286,364         328,150           349,111        314,244
Area overhead:
PANDORA size 992 gates
8051 size 30480 gates
PANDORA introduces about
3.2% of area overhead
Area overhead is expected to decrease when processor size increases.
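The overhead figures can be recomputed from the raw numbers with two small helpers, sketched below in C. The function names are hypothetical; the inputs used in the usage note are the gate counts and cycle counts reported in this section.

```c
/* Relative overhead of a hardened version with respect to the plain
 * program, in percent (applies to bytes or to cycles). */
double overhead_percent(double plain, double hardened)
{
    return 100.0 * (hardened - plain) / plain;
}

/* Area overhead of an added IP relative to the processor, in percent. */
double area_overhead_percent(double ip_gates, double cpu_gates)
{
    return 100.0 * ip_gates / cpu_gates;
}
```

For example, area_overhead_percent(992, 30480) gives the PANDORA area overhead on the 8051, and overhead_percent(31211, 41462) gives the execution time overhead of the hybrid solution on the Matrix benchmark. Since the I-IP gate count does not depend on the processor, the first ratio shrinks as cpu_gates grows.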
4. Final Considerations
Development of a hybrid
methodology (HW+SW redundancies)
able to perform runtime detection of
errors in μprocessor-based SoCs may
yield a very good cost-benefit
return.
Returns:
• Minimization of area overhead and fab/development costs
(benefits of SW-based redundancy techniques)
• Improvement of performance and minimization of memory
overhead (benefits of HW-based redundancy techniques)
In summary: Minimize fab cost and performance degradation, while
improving reliability
Target faults:
• Control flow errors
• Data handling errors
A hybrid methodology (HW+SW
redundancies) explores:
• I-IP Core Architecture
• Software-Based Techniques
System architecture co-implemented in HW+SW to detect faults in
control flow and application data. The main characteristics of this
architecture:
• SW-embedded structures at the application code level.
• Partial migration of the SW-embedded structures into HW: a
specific I-IP monitors the application processor, acting as a “watch-dog”.
• Communication channel between the HW and SW entities: a driver
embedded in the OS kernel and specific signals used for
communication between the I-IP and the application processor.