Implementing Fault Tolerant Systems with Windows CE .NET

Reliable System Design 2010
by: Amir M. Rahmani

CONTENTS
- Embedded System
- Real-Time System
- Windows CE .NET
- Definitions and Concepts
- Software Fault Tolerance Techniques
- Key Operating System Properties
- Applying Fault Tolerant Techniques
- Conclusions

What is an Embedded System?

Embedded systems are electronic devices that incorporate a computer (usually a microprocessor) within their implementation.

A computer is used in such devices primarily as a means to simplify the system design and to provide flexibility.

Often the user of the device is not even aware that a computer is present.

Embedded Devices
- Aerospace: space exploration (e.g., the Mars Pathfinder)
- Automotive: air bag controls, GPS mapping
- Communications: satellites, network routers
- Computer peripherals: printers, scanners
- Home: dishwashers, microwave ovens
- Industrial: elevator controls, robots
- Medical: imaging systems (e.g., X-ray, MRI)

What is a Real-Time System?

Real-time systems process events. Events occurring on external inputs cause other events to occur as outputs.

Minimizing response time is usually a primary objective; if responses arrive too late, the entire system may fail to operate properly.

Hard/Soft Real-Time Systems

Soft Real-Time System: the output response is computed as fast as possible, but there are no specific deadlines that must be met (a delay does not cause the application to fail). Example: ATM.

Hard Real-Time System: the output response must be computed by a specified deadline (a delay causes the application to fail). Example: missile tracker.

Fault Tolerant Real-Time Scheduling

What is Windows CE .NET?

Windows CE .NET combines an advanced real-time embedded operating system with powerful tools for rapidly creating the next generation of smart devices.

Windows CE is optimized for devices that have minimal storage.

Devices are often configured without disk storage, and may be configured as a "closed" system that does not allow for end-user extension.

Why Windows CE .NET?
- Very popular embedded OS for PDAs and mobiles
- Microsoft's latest 32-bit embedded OS platform
- Easy to use, familiar development tools
- Real-time communication
- Broad CPU support
- Full feature functionality and flexibility
- Hard real-time preemptive multitasking kernel
- Easy platform development with Platform Builder

Wide range of devices
- Thin Clients
- Digital Audio Receivers and Players
- Smart Displays
- Voice-over-IP Devices
- Medical Devices
- Industrial Automation
- Mobile Handhelds
- Set-Top Boxes
- Gateways

[Figure: Windows Mobile 2005 and Windows CE .NET]

Windows CE: Different Hardware

Desktop/Laptop PC
- 2 GHz Pentium IV
- 256 K to 512 K cache
- 512 M to 1 G DRAM
- 100 G hard drive
- 1280x1024 display
- 50 Watts
- Keyboard & mouse
- Extensible through PCI, AGP, USB, EISA, 1394, PC-Card, CF, enet, …

Device
- 400 MHz RISC
- 4 K to 8 K cache
- 4 M to 32 M DRAM
- 4 M to 32 M Flash / ROM
- 170x170 to 640x480 display
- <1 to 2 Watts
- Stylus or thumb
- Not very extensible: PC-Card, SD, CF

Definitions and Concepts

A system is a collection of interacting components that deliver a service through a service interface to a user.

Dependability of a computing system is the ability to deliver service that can justifiably be trusted.

The system state is the set of component states.

An error is a system state that may lead to failure.

Definitions and Concepts

- Availability: the readiness for correct service.
- Reliability: the continuity of that service.
- Safety: the avoidance of catastrophic consequences on the environment.
- Security: the prevention of unauthorized access.
- Fault tolerance: the ability of a system to deliver correct service in the presence of faults.

Specification

A key part of the specification is the identification and classification of faults that must be tolerated or otherwise remediated.

The specification should define fault and error scenarios, consider system response when faults occur in coincidence, and consider the likelihood and handling of latent errors.

Implementation

Implementation begins with the partitioning of the system into subsystems.

Subsystems implement local error detection and recovery, and provide support for higher-level detection and recovery.

Error detection can be performed concurrently with the delivery of the service, or by preemptively suspending the service to execute tests.

Software Fault Tolerance Techniques

Two common software fault tolerance schemes, both based on redundancy and on the assumption that correlated faults are unlikely, are:

Recovery Block

N-Version Programming

Recovery Block

The Recovery Block (RB) design makes use of a primary and one or more alternate program blocks, each of which performs the specified operations, but in a different way.

An acceptance test on the results determines whether they are acceptable.

Recovery Block

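CurrentMethod = PrimaryMethod;              // start with the primary block
do {
    result = CurrentMethod();
    if (AcceptanceTest(result) == OK)
        return result;                      // first acceptable result wins
} while (CurrentMethod = NextAlternateMethod());   // assignment: loop until no alternate is left
fail();                                     // every block was rejected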
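
The pseudocode above can be turned into a small runnable sketch. This version uses portable standard C++ rather than any Windows CE .NET API, and the names Method, AcceptanceTest and RunRecoveryBlock are hypothetical, chosen only for illustration. It assumes that a block which throws should be treated the same way as a block whose result fails the acceptance test.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical types for illustration; these names do not come from the slides.
typedef std::function<double(double)> Method;
typedef std::function<bool(double input, double result)> AcceptanceTest;

// Tries the primary method (methods[0]) first, then each alternate in order.
// Returns true and stores the first result that passes the acceptance test.
// Returns false if every method is rejected (the fail() case above).
bool RunRecoveryBlock(const std::vector<Method>& methods,
                      const AcceptanceTest& accept,
                      double input, double& accepted)
{
    assert(!methods.empty());  // at least a primary method is required
    for (const Method& method : methods) {
        try {
            const double result = method(input);
            if (accept(input, result)) {
                accepted = result;
                return true;
            }
        } catch (...) {
            // Assumption: a throwing block counts as a failed acceptance test.
        }
    }
    return false;
}
```

The quality of the acceptance test bounds what the scheme can catch: a test that accepts a wrong result from the primary block never gives the alternates a chance to run.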

N-Version Programming

N-Version Programming (NVP) uses parallel execution of N independently developed, functionally equivalent software versions.

The outputs of all versions are examined in a voter block to determine the correct output, if one exists.

NVP is commonly called voter-based fault tolerance.
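
A voter block might look roughly like the sketch below. It is a simplification under stated assumptions: the versions run one after another instead of in parallel, outputs are plain integers compared for exact equality, and the voter requires a strict majority. Version and MajorityVote are hypothetical names, not part of the original material.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical type: one independently developed version of the computation.
typedef std::function<int(int)> Version;

// Runs every version on the same input and looks for an output that a strict
// majority of versions agree on. Returns true and stores that output, or
// returns false when no correct output can be determined.
bool MajorityVote(const std::vector<Version>& versions, int input, int& agreed)
{
    assert(!versions.empty());
    std::map<int, std::size_t> tally;  // output value -> number of versions producing it
    for (const Version& version : versions)
        ++tally[version(input)];
    for (const auto& entry : tally) {
        if (entry.second * 2 > versions.size()) {
            agreed = entry.first;
            return true;
        }
    }
    return false;
}
```

With three versions, this voter masks a single faulty version; if all three disagree, it reports that no output exists instead of guessing.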

Key Operating System Properties

Understanding Windows CE .NET's implementation of each of these areas is critical to developing fault tolerant applications:

- proper use and partitioning of application code into processes and threads
- correct setting of priorities
- use of exception handling
- implementation of watchdog behaviors

Processes and Threads
- Each application that runs under Windows CE .NET is called a process.
- The basic unit of execution is the thread.
- Threads execute any part of the code in the process.
- Threads allow the application to perform multiple tasks at the same time.
- Support of preemptive multitasking allows Windows CE .NET to create the effect of executing multiple threads simultaneously.

Processes and Threads

Windows CE .NET supports up to 32 processes and as many threads as memory permits.

Each process has a designated primary thread, which runs the WinMain function.

The key differences between threads within the same process and threads in two different processes are memory protection and context switch times.

Synchronization Objects

Partitioning a monolithic application into several threads and processes requires coordination and synchronization.

Some of the synchronization objects that enable this coordination include:

critical sections

mutexes

semaphores

events
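
To illustrate why such objects are needed once logic is split across threads, here is a minimal sketch in portable standard C++. It uses std::mutex as a stand-in for a Windows CE .NET critical section or mutex; the class SharedCounter and the function CountInParallel are hypothetical names introduced for this example only.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A counter shared between threads. The std::mutex here plays the role that a
// critical section or mutex object would play in a Windows CE .NET application.
class SharedCounter {
public:
    SharedCounter() : value_(0) {}
    void Increment()
    {
        std::lock_guard<std::mutex> lock(mutex_);  // only one thread at a time
        ++value_;
    }
    int Value() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;
    int value_;
};

// Starts `threads` workers that each increment the counter `perThread` times,
// waits for all of them, and returns the final count.
int CountInParallel(int threads, int perThread)
{
    assert(threads > 0 && perThread >= 0);
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.Increment();
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return counter.Value();
}
```

Without the lock, concurrent increments could be lost and the final count would be unpredictable; with it, the result is always threads times perThread.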

Exception Handling

Windows CE .NET supports the try-except and try-finally Microsoft extensions to the C language (the __try, __except and __finally keywords) to pass execution control to exception handling code.

The compilers included with Windows CE .NET also support C++ exception handling with try, catch and throw.

Priorities
- Windows CE .NET provides 256 priority levels that you can set on a per-thread basis.
- The choice of which thread priorities to use is critical.
- Thread priorities 0-247, with 0 being the highest priority, are referred to as the real-time priorities and require a call to CeSetThreadPriority() to access them.
- The normal thread priorities 248-255 are accessed by using SetThreadPriority().
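
The split between the two ranges can be captured in a small helper. This is only a sketch of the rule as stated above, in portable C++ that does not call the Windows CE .NET API; PriorityApi, ApiForPriority and IsHigherPriority are hypothetical names.

```cpp
#include <cassert>

// Which call the text above associates with a given priority level.
enum class PriorityApi { CeSetThreadPriority, SetThreadPriority };

const int kHighestPriority = 0;         // 0 is the highest priority
const int kLastRealTimePriority = 247;  // 0-247: real-time priorities
const int kLowestPriority = 255;        // 248-255: normal priorities

bool IsValidPriority(int level)
{
    return level >= kHighestPriority && level <= kLowestPriority;
}

// Real-time levels go through CeSetThreadPriority(), normal levels through
// SetThreadPriority(), following the ranges given in the text.
PriorityApi ApiForPriority(int level)
{
    assert(IsValidPriority(level));
    return level <= kLastRealTimePriority ? PriorityApi::CeSetThreadPriority
                                          : PriorityApi::SetThreadPriority;
}

// A numerically lower level is a higher priority.
bool IsHigherPriority(int a, int b)
{
    return a < b;
}
```

For example, a thread at level 50 is a real-time thread and outranks one at level 60, which matters when reading the priority values used in the examples later in this deck.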

Real-Time Thread Priorities

0-98(-20) : Open – Real Time Above Drivers
99 : Power Management Resume Thread
100-108 : USB OHCI UHCI, Serial
109-129 : Irsir1, NDIS, Touch
133-144 : Open – Device Drivers
145 : PS2 Keyboard
146-147 : Open – Device Drivers
153-247 : Open – Real Time Below Drivers

Normal Thread Priorities

248 : Power Management
249 : WaveDev, Mouse, PnP, Power
250 : WaveAPI
251 : Power Manager Battery Thread
252-255 : Open

Applying Fault Tolerant Techniques

1- Partition into Threads and Processes

2- Implement Watchdog Threads

3- Implement Exception Handling

4- Implement Interrupt Level Fault Detection

Partition into Threads and Processes

Threads and processes are the building blocks of Windows CE .NET applications.

Isolation of logic into separate threads within the same process is the first level of protection.

Moving threads to other processes ensures additional protection.

A simple example of a monolithic application is:

#include "stdafx.h"

int g_nCount1 = 0;   // cycles since Algorithm2 last ran
int g_nCount2 = 0;   // cycles since Algorithm3 last ran

int WINAPI WinMain(…)
{
    while( 1 )
    {
        Algorithm1();                // every cycle

        if( ++g_nCount1 == 5 )       // every 5th cycle
        {
            Algorithm2();
            g_nCount1 = 0;
        }
        if( ++g_nCount2 == 10 )      // every 10th cycle
        {
            Algorithm3();
            g_nCount2 = 0;
        }
        Sleep(5);
    }
    return 0;
}

Breaking up the monolithic application (Process 1):

#include "stdafx.h"

HANDLE g_htAlgorithm1;
HANDLE g_htAlgorithm2;
BOOL   g_fFinished = FALSE;

DWORD WINAPI ThreadAlgorithm1 (...);
DWORD WINAPI ThreadAlgorithm2 (...);

int WINAPI WinMain(…)
{
    g_htAlgorithm1 = CreateThread(..,ThreadAlgorithm1,...);
    g_htAlgorithm2 = CreateThread(..,ThreadAlgorithm2,...);
    if( !g_htAlgorithm1 || !g_htAlgorithm2 )   // check both handles
        return 20;

    // Lower number = higher priority
    if( !CeSetThreadPriority(g_htAlgorithm1,50) ) return 30;
    if( !CeSetThreadPriority(g_htAlgorithm2,60) ) return 30;

    while( !g_fFinished )
    {
        Sleep( 500 );
    }
    return 0;
}

DWORD WINAPI ThreadAlgorithm1( LPVOID lpvParam )
{
    while( !g_fFinished )
    {
        Sleep( 5 );
        Algorithm1();
    }
    return 0;
}

DWORD WINAPI ThreadAlgorithm2( LPVOID lpvParam )
{
    while( !g_fFinished )
    {
        Sleep( 25 );
        Algorithm2();
    }
    return 0;
}

The key elements of Process 1 are:
- Create the two threads, ThreadAlgorithm1 and ThreadAlgorithm2.
- Set the thread priorities: ThreadAlgorithm1 to 50 (higher than ThreadAlgorithm2) and ThreadAlgorithm2 to 60.
- Loop every 500 ms, waiting for the g_fFinished flag to be set.
- ThreadAlgorithm1 and ThreadAlgorithm2 sleep the appropriate amount of time and call their respective algorithms.

#include "stdafx.h"

HANDLE g_htAlgorithm3;
BOOL   g_fFinished = FALSE;

DWORD WINAPI ThreadAlgorithm3 (...);

int WINAPI WinMain(...)
{
    g_htAlgorithm3 = CreateThread(…,ThreadAlgorithm3,…);
    if( !g_htAlgorithm3 ) return 20;
    if( !CeSetThreadPriority(g_htAlgorithm3, 40 ) ) return 30;

    while( !g_fFinished )
    {
        Sleep( 500 );
    }
    return 0;
}

DWORD WINAPI ThreadAlgorithm3( LPVOID lpvParam )
{
    while( !g_fFinished )
    {
        Sleep( 50 );
        Algorithm3();
    }
    return 0;
}

The key elements of Process 2 are:
- Create the one thread, ThreadAlgorithm3.
- Set ThreadAlgorithm3 to priority 40 (higher than ThreadAlgorithm1 and ThreadAlgorithm2).
- Loop every 500 ms, waiting for the g_fFinished flag to be set.
- ThreadAlgorithm3 sleeps the appropriate amount of time and calls Algorithm3.

The two processes have allowed for partitioning of the algorithms and have increased each algorithm's ability to keep running in the event of logic faults in the others.

Partition into Threads and Processes

The developer now has a tremendous amount of flexibility.

The ability to set the thread priorities allows for the prioritization of the algorithms, which was not available in the monolithic case.

The advantages of such a partitioning are just the beginning of the fault tolerant techniques available to the Windows CE .NET developer.

Implement Watchdog Threads

Watchdog threads can be implemented to verify that a particular set of logic is continuously executing.

A typical design pairs the watchdog thread with a control thread that has the responsibility to set a particular watchdog event on each control cycle.

If the watchdog thread ever wakes up without the control thread having run, the watchdog thread will execute a shutdown behavior.

Simple watchdog example

#include "stdafx.h"

HANDLE g_hevWatchdog;
HANDLE g_htWatchdog;
HANDLE g_htControl;
BOOL   g_fFinished;

DWORD WINAPI ThreadWatchdog (…);
DWORD WINAPI ThreadControl (...);

int WINAPI WinMain( ...)
{
    DWORD dwThreadID;   // receives the thread identifier

    // Auto-reset event, initially non-signaled
    g_hevWatchdog = CreateEvent(NULL, FALSE, FALSE, NULL);
    if( !g_hevWatchdog ) return 10;

    // Create both threads suspended so priorities can be set before they run
    g_htWatchdog = CreateThread(NULL, 0, ThreadWatchdog, NULL, CREATE_SUSPENDED, &dwThreadID);
    if( !g_htWatchdog ) return 20;

    g_htControl = CreateThread(NULL, 0, ThreadControl, NULL, CREATE_SUSPENDED, &dwThreadID);
    if( !g_htControl ) return 20;

    if( !CeSetThreadPriority(g_htWatchdog, 5 )) return 30;
    if( !CeSetThreadPriority(g_htControl, 10 )) return 40;

    ResumeThread( g_htWatchdog );
    ResumeThread( g_htControl );

    while( !g_fFinished )
    {
        Sleep( 500 );
    }
    return 0;
}

DWORD WINAPI ThreadControl( LPVOID lpvParam )
{
    while( !g_fFinished )
    {
        SetEvent( g_hevWatchdog );   // signal the watchdog once per control cycle
        // Do application processing here...
        Sleep( 5 );
    }
    return 0;
}

DWORD WINAPI ThreadWatchdog( LPVOID lpvParam )
{
    DWORD result;

    while( !g_fFinished )
    {
        result = WaitForSingleObject( g_hevWatchdog, 500 );
        if( result == WAIT_TIMEOUT )
        {
            // We have a failure, application thread is dead!!!
            // PowerAddress and POWER_SHUTDOWN are platform-specific and not defined here.
            WRITE_PORT_UCHAR( (PUCHAR)PowerAddress, POWER_SHUTDOWN );
        }
    }
    return 0;
}

Key steps in initializing the controller
- Create the watchdog event.
- Create the watchdog and control threads suspended.
- Set the watchdog thread priority to 5 (higher than the control thread).
- Set the control thread priority to 10.
- Resume both the watchdog and control threads.
- Wait for the g_fFinished flag to be set.
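
The same watchdog pattern can be sketched in portable standard C++, which is convenient for trying the logic on a desktop machine. This is not the Windows CE .NET implementation: std::condition_variable stands in for the auto-reset event, thread priorities are not modelled, and WatchdogEvent and KicksSeenBeforeTimeout are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Auto-reset event, loosely modelled on CreateEvent(NULL, FALSE, FALSE, NULL).
class WatchdogEvent {
public:
    WatchdogEvent() : signaled_(false) {}

    // Called by the control thread once per control cycle.
    void Set()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    // Called by the watchdog. Returns true if the event was set within the
    // timeout (and clears it); false corresponds to WAIT_TIMEOUT.
    bool WaitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool ok = cv_.wait_for(lock, timeout, [this] { return signaled_; });
        signaled_ = false;
        return ok;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_;
};

// Simulates a control thread that signals `kicks` times, `period` apart, and
// then stalls. Returns how many signals the watchdog saw before timing out.
int KicksSeenBeforeTimeout(int kicks, std::chrono::milliseconds period,
                           std::chrono::milliseconds timeout)
{
    assert(kicks >= 0);
    WatchdogEvent event;
    std::thread control([&] {
        for (int i = 0; i < kicks; ++i) {
            event.Set();
            std::this_thread::sleep_for(period);
        }
        // Simulated fault: the control loop stops signalling here.
    });
    int seen = 0;
    while (event.WaitFor(timeout))
        ++seen;  // control thread is alive
    // Timeout reached: a real system would run its shutdown behavior here.
    control.join();
    return seen;
}
```

The timeout should be comfortably longer than the control period, as in the example above (500 ms against a 5 ms cycle), so that ordinary scheduling jitter is not mistaken for a dead control thread.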

Exception Handling

Exceptions are anomalous situations that may occur during program execution, including common software faults such as arithmetic value domain errors and access to invalid memory addresses.

With C++ exception handling you can directly handle exceptions or pass exception control up to a higher level where handling the error is more appropriate.

#include <iostream>

int WINAPI WinMain(…)
{
    char *buf;
    try
    {
        while( 1 )
        {
            buf = GetBuffer();
            if( buf == 0 )
                throw "Memory Failure from GetBuffer!";
            Sleep(4);
        }
    }
    catch( const char* str )   // a thrown string literal is caught as const char*
    {
        std::cout << "Exception raised! " << str << '\n';
    }
    catch( ... )
    {
        throw;   // Pass control to the next handler
    }
    return 0;
}

Conclusions

The number of Internet-connected and handheld devices is growing quickly, so threats to PDAs and mobile devices are becoming more and more serious.

The entrance of Windows CE .NET into the embedded community has enabled a new class of complex and robust embedded software products.

Partitioning monolithic, complex algorithms into separate threads and processes, proper usage of thread priorities, the use of exception handling, the implementation of watchdog behaviors and algorithmic redundancy are readily available techniques that may be used in a Windows CE .NET implementation to deliver more dependable services.
