
Page 1

Using Software Rules To Enhance FPGA Reliability

Chandru Mirchandani

Lockheed-Martin Transportation & Security Solutions

September 7-9, 2005

P226/MAPLD2005

Page 2

Introduction

To meet…
• System Objectives

Develop a Process to…
• Verify FPGA Capability
• Validate FPGA Reliability
• Enhance FPGA Quality

By developing an Adaptive Model…

…using Software Rules…

Page 3

Problem Statement

Requirement: Display sensor data in near-real time

Constraints: No loss of data; data quality and integrity; timeliness

Information: The information available for choosing the design with the lowest risk of failure is uncertain

Solution: Adaptive Model

Page 4

Software Reliability

Develop criteria for design objective acceptance

Prioritize tasks or functions in order of criticality

Develop metrics to measure the performance of tasks with respect to the constraints

Evaluate design options based on the measured reliability metrics

Page 5

Typical Software Options

Critical software functions are distributed as redundant instances on multiple processors, thus minimizing the loss of service due to a processor failure…

[Figure: Processor 1 runs Application A1 (I-ary); Processor 2 runs Application A1 (II-ary)]
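A minimal sketch of why redundant instances reduce loss of service, assuming each processor-plus-instance survives the mission independently with the same reliability $R$. Service is lost only if every instance fails, so $n$ instances give $1-(1-R)^n$. Independence is an assumption here (it is exactly what the common-cause fractions later in the deck relax), and the value $R = 0.9$ is hypothetical.

```python
def redundant_reliability(r_instance: float, n_instances: int) -> float:
    """Reliability of n redundant instances, assuming independent failures.

    Service is lost only if every instance fails, so
    R_redundant = 1 - (1 - r_instance) ** n_instances.
    """
    if not 0.0 <= r_instance <= 1.0:
        raise ValueError("r_instance must be a probability in [0, 1]")
    if n_instances < 1:
        raise ValueError("n_instances must be at least 1")
    return 1.0 - (1.0 - r_instance) ** n_instances


if __name__ == "__main__":
    r = 0.9  # hypothetical per-instance reliability over the mission
    for n in (1, 2, 3):
        print(n, round(redundant_reliability(r, n), 4))
```

Each added instance multiplies the probability of losing service by $(1-R)$, which is why the second instance gives the largest gain.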

Page 6

Typical Software Options (contd.)

Distributing system-level functions so that multiple users can independently use the function…

[Figure: Processor 1 and Processor 2 each run an instance of Application B1]

Page 7

Typical Software Options (contd.)

Data replication to minimize the loss of critical data in the event of a processor failure or software system failure…

[Figure: Processor 1 and Processor 2 each run Application C1; Storage 1 and Storage 2 hold the replicated data]
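A minimal sketch of the replication idea, assuming two in-memory stores stand in for Storage 1 and Storage 2: every write goes to all healthy stores, and a read falls back to the next store when one has failed. The `Store` class, its `failed` flag and the helper functions are hypothetical illustrations, not part of the original design.

```python
from typing import Dict, List, Optional


class Store:
    """Hypothetical in-memory stand-in for one storage unit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False  # set True to simulate a storage or processor failure
        self._data: Dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> None:
        if self.failed:
            raise IOError(f"{self.name} unavailable")
        self._data[key] = value

    def read(self, key: str) -> bytes:
        if self.failed:
            raise IOError(f"{self.name} unavailable")
        return self._data[key]


def replicated_write(stores: List[Store], key: str, value: bytes) -> int:
    """Write to every healthy store and return how many copies were made."""
    copies = 0
    for store in stores:
        try:
            store.write(key, value)
            copies += 1
        except IOError:
            pass  # skip the failed store; the other replicas still hold the data
    if copies == 0:
        raise IOError("no store accepted the write: data would be lost")
    return copies


def replicated_read(stores: List[Store], key: str) -> Optional[bytes]:
    """Return the first copy found on a healthy store, or None if none has it."""
    for store in stores:
        try:
            return store.read(key)
        except (IOError, KeyError):
            continue
    return None


if __name__ == "__main__":
    s1, s2 = Store("Storage 1"), Store("Storage 2")
    replicated_write([s1, s2], "frame-0001", b"sensor data")
    s1.failed = True  # Storage 1 fails after the write
    print(replicated_read([s1, s2], "frame-0001"))
```

Data is lost only when every replica is unavailable, so under an independence assumption the loss probability is the product of the individual store failure probabilities.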

Page 8

Redundant Instances of Software

First, detect, contain and recover from faults as soon as possible; if this is not possible,

Pass control to the redundant instance, within the reliability and availability requirements levied on the system

Finally, include language-defined mechanisms to detect and prevent the propagation of errors
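The three steps above can be sketched in Python, using exceptions as the language-defined mechanism that detects a fault and stops it from propagating. `TaskFault`, `run_with_failover` and the default of one local retry are hypothetical choices for illustration; the slides do not specify the detection or switchover mechanism.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TaskFault(Exception):
    """Hypothetical exception raised by a task instance when it detects a fault."""


def run_with_failover(primary: Callable[[], T],
                      redundant: Callable[[], T],
                      local_retries: int = 1) -> T:
    """Run the primary instance; if local recovery fails, pass control to the redundant one.

    1. Detect: a fault surfaces as a TaskFault exception.
    2. Contain and recover: re-run the primary up to `local_retries` extra times.
    3. Fail over: if recovery is not possible, run the redundant instance.
    Only TaskFault is caught, so unexpected errors still propagate instead of being masked.
    """
    for _ in range(1 + local_retries):
        try:
            return primary()
        except TaskFault:
            continue  # fault contained here; attempt recovery by re-running the primary
    return redundant()  # a fault in the redundant instance propagates to the caller


if __name__ == "__main__":
    def failing_primary() -> str:
        raise TaskFault("parity error in primary")

    def healthy_secondary() -> str:
        return "frame processed by redundant instance"

    print(run_with_failover(failing_primary, healthy_secondary))
```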

Page 9

Methodology

Estimate the reliability based on the instruction set and operational usage

Re-design critical elements to decrease risk

Re-evaluate the risk of failure after changes to the critical-task design, based on performance and requirements

Re-evaluate the reliability based on the failure rate

Factor in the uncertainty in the evaluation

Page 10

Task Times

| Task Class | Steps | Step Time ($s_{task}$) | Task Time | Total Tasks Time ($t_{task}$) |
|---|---|---|---|---|
| Reading ($r$) | $x_{ri}$ | $s_r$ | $s_r \cdot x_{ri}$ | $(s_r \cdot x_{ri}) \cdot n_r = t_r$ |
| Parsing ($p$) | $x_{pi}$ | $s_p$ | $s_p \cdot x_{pi}$ | $(s_p \cdot x_{pi}) \cdot n_p = t_p$ |
| Pre-processing ($p_1$) | $x_{p_1 i}$ | $s_{p_1}$ | $s_{p_1} \cdot x_{p_1 i}$ | $(s_{p_1} \cdot x_{p_1 i}) \cdot n_{p_1} = t_{p_1}$ |
| Monitoring ($M$) | $x_{Mi}$ | $s_M$ | $s_M \cdot x_{Mi}$ | $(s_M \cdot x_{Mi}) \cdot n_M = t_M$ |
| Sorting ($s$) | $x_{si}$ | $s_s$ | $s_s \cdot x_{si}$ | $(s_s \cdot x_{si}) \cdot n_s = t_s$ |
| Processing ($P$) | $x_{Pi}$ | $s_P$ | $s_P \cdot x_{Pi}$ | $(s_P \cdot x_{Pi}) \cdot n_P = t_P$ |
| Post-processing ($p_2$) | $x_{p_2 i}$ | $s_{p_2}$ | $s_{p_2} \cdot x_{p_2 i}$ | $(s_{p_2} \cdot x_{p_2 i}) \cdot n_{p_2} = t_{p_2}$ |
| Status-gathering ($S$) | $x_{Si}$ | $s_S$ | $s_S \cdot x_{Si}$ | $(s_S \cdot x_{Si}) \cdot n_S = t_S$ |
| Writing ($w$) | $x_{wi}$ | $s_w$ | $s_w \cdot x_{wi}$ | $(s_w \cdot x_{wi}) \cdot n_w = t_w$ |
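Reading the table row by row, a task of class $k$ has $x_{ki}$ steps of duration $s_k$, so one task takes $s_k \cdot x_{ki}$ and the class total is $t_k = (s_k \cdot x_{ki}) \cdot n_k$. The sketch below assumes $n_k$ is the number of times the task is executed over the period of interest and that times are in seconds; the step counts, step times and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TaskClass:
    steps: int        # x_ki: number of steps in one task of this class
    step_time: float  # s_k: time per step (seconds, assumed unit)
    count: int        # n_k: number of task executions (assumed meaning)

    def task_time(self) -> float:
        """Time for one task: s_k * x_ki."""
        return self.step_time * self.steps

    def total_time(self) -> float:
        """Total time for all executions: t_k = (s_k * x_ki) * n_k."""
        return self.task_time() * self.count


def total_tasks_time(classes: Dict[str, TaskClass]) -> float:
    """Sum t_k over all task classes."""
    return sum(tc.total_time() for tc in classes.values())


if __name__ == "__main__":
    # Hypothetical numbers for three of the task classes in the table.
    classes = {
        "Reading": TaskClass(steps=4, step_time=0.5e-3, count=100),
        "Parsing": TaskClass(steps=10, step_time=0.2e-3, count=100),
        "Writing": TaskClass(steps=2, step_time=1.0e-3, count=100),
    }
    for name, tc in classes.items():
        print(f"{name}: t = {tc.total_time():.3f} s")
    print(f"All tasks: {total_tasks_time(classes):.3f} s")
```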

Page 11

FPGA System - Conceptual

[Figure: conceptual FPGA system from Input to Output through the Reading (SR), Parsing (SP) and Pre-Processing (SPP) subsystems, two of each shown]

Consider an FPGA-based system comprising the Reading, Parsing and Pre-Processing tasks…

…each task is a subsystem

Page 12

Task Reliability Block Diagram

[Figure: reliability block diagram for the Reading task: two redundant Reading chains (HW in series with SW) in parallel, in series with the Reading common-cause-failure (CCF) block; the combinations are labelled AND and OR]

$$R_{SS_i}(t) = \left[1-\left\{1-e^{-(1-\gamma_h)\lambda_{shw_i} t}\, e^{-(1-\gamma_s)\lambda_{ssw_i} t}\right\}^2\right] e^{-\gamma_h u_h \lambda_{hw_i} t}\, e^{-\gamma_s u_s \lambda_{sw_i} t}$$
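A minimal Python sketch of the block-diagram formula for one task, with all names hypothetical. It takes the independent-failure intensities as $\lambda_{hw_i} u_h (1-\gamma_h)$ and $\lambda_{sw_i} u_s (1-\gamma_s)$, following the Parameters & Derivations slide, so the factor $(1-\gamma)$ is applied once. If $\lambda_{shw_i}$ in the formula above already contains that factor, a literal reading would apply it twice; the sketch assumes that is not intended.

```python
import math


def subsystem_reliability(t: float,
                          lam_hw: float, lam_sw: float,
                          u_h: float, u_s: float,
                          gamma_h: float, gamma_s: float) -> float:
    """Reliability R_SSi(t) of one task (subsystem).

    Two redundant chains (HW in series with SW) are placed in parallel, and the
    pair is in series with a common-cause-failure (CCF) block.
    """
    # Independent (non-CCF) intensities; (1 - gamma) is applied once here (assumption).
    lam_shw = lam_hw * u_h * (1.0 - gamma_h)
    lam_ssw = lam_sw * u_s * (1.0 - gamma_s)
    # One chain survives only if both its HW and its SW survive.
    r_chain = math.exp(-lam_shw * t) * math.exp(-lam_ssw * t)
    # The redundant pair fails only if both chains fail.
    r_pair = 1.0 - (1.0 - r_chain) ** 2
    # A common-cause failure defeats both chains at once.
    r_ccf = math.exp(-gamma_h * u_h * lam_hw * t) * math.exp(-gamma_s * u_s * lam_sw * t)
    return r_pair * r_ccf


if __name__ == "__main__":
    tau = 1.0      # hypothetical mission time; not given in the slides
    t = 0.2 * tau  # Reading: e_i = 0.2
    print(subsystem_reliability(t, lam_hw=0.3, lam_sw=0.3, u_h=0.3, u_s=0.3,
                                gamma_h=0.0025, gamma_s=0.1))
```

The two limiting cases show the role of $\gamma$: with $\gamma = 1$ every failure is common-cause and redundancy adds nothing, while with $\gamma = 0$ the CCF block drops out and only the parallel pair remains.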

Page 13

Definitions

Calendar Time ($\tau$): Mission time used to calculate the reliability

Execution ($e_i$): Percentage of the mission time used by task (subsystem) $i$

Execution Time ($t$): $t = e_i \cdot \tau$

Usage for SW ($u_s$): Percentage of the total software used by the task

Usage for HW ($u_h$): Percentage of the area of the active portion of the device used by the task

$\lambda_{shw_i}$: Failure intensity of task $i$ hardware with respect to execution time

$\lambda_{ssw_i}$: Failure intensity of task $i$ software with respect to execution time

$\gamma_{h_i}$: Fraction of task $i$ hardware failures that are common-cause failures

$\gamma_{s_i}$: Fraction of task $i$ software failures that are common-cause failures

Page 14

Parameters & Derivations

Failure intensity (HW): $\lambda_{shw_i} = \lambda_{hw_i}\, u_h\, (1-\gamma_h)$

Failure intensity (SW): $\lambda_{ssw_i} = \lambda_{sw_i}\, u_s\, (1-\gamma_s)$

Common cause: $\lambda_{hw_i}\, u_h\, \gamma_h$ and $\lambda_{sw_i}\, u_s\, \gamma_s$

Execution time $t$: $e_i \cdot \tau$

$R_{SS_i}$: Subsystem reliability

System reliability $R_S$: $R_S = R_{SS_1} \cdot R_{SS_2} \cdot R_{SS_3}$


| Parameter | Reading | Parsing | Pre-Processing |
|---|---|---|---|
| Usage SW, $u_s$ | 0.3 | 0.3 | 0.4 |
| Usage HW, $u_h$ | 0.3 | 0.4 | 0.3 |
| $\lambda_{hw_i}$ | 0.3 | 0.4 | 0.3 |
| $\lambda_{sw_i}$ | 0.3 | 0.4 | 0.3 |
| Execution, $e_i$ | 0.2 | 0.1 | 0.7 |

Page 15

System Configuration Options

| Configuration | HW Common Cause Fraction, $\gamma_h$ | SW Common Cause Fraction, $\gamma_s$ |
|---|---|---|
| Same Code & Device | 0.01 | 1 |
| Same Code & Diff Devices | 0.0025 | 0.9975 |
| Diff Code & Same Device | 0.01 | 0.5 |
| Diff Code & Devices | 0.0025 | 0.1 |
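Putting the pieces together, the sketch below evaluates $R_S = R_{SS_1} R_{SS_2} R_{SS_3}$ for each configuration above, using the parameter table from the Parameters & Derivations slide and $t_i = e_i \tau$. The slides give neither $\tau$ nor any intermediate values, so the sketch assumes a hypothetical $\tau = 1$ and applies $(1-\gamma)$ once, as in the derivation slide; its absolute numbers are therefore not expected to match the Results slide.

```python
import math
from typing import Dict, Tuple

# Parameter table (Reading, Parsing, Pre-Processing), copied from the slides.
TASKS: Dict[str, Dict[str, float]] = {
    "Reading":        {"u_s": 0.3, "u_h": 0.3, "lam_hw": 0.3, "lam_sw": 0.3, "e": 0.2},
    "Parsing":        {"u_s": 0.3, "u_h": 0.4, "lam_hw": 0.4, "lam_sw": 0.4, "e": 0.1},
    "Pre-Processing": {"u_s": 0.4, "u_h": 0.3, "lam_hw": 0.3, "lam_sw": 0.3, "e": 0.7},
}

# (gamma_h, gamma_s) for each configuration, copied from the slides.
CONFIGS: Dict[str, Tuple[float, float]] = {
    "Same Code & Device":       (0.01, 1.0),
    "Same Code & Diff Devices": (0.0025, 0.9975),
    "Diff Code & Same Device":  (0.01, 0.5),
    "Diff Code & Devices":      (0.0025, 0.1),
}


def subsystem_reliability(p: Dict[str, float], gamma_h: float, gamma_s: float, tau: float) -> float:
    t = p["e"] * tau                                    # execution time t = e_i * tau
    lam_shw = p["lam_hw"] * p["u_h"] * (1.0 - gamma_h)  # independent HW intensity
    lam_ssw = p["lam_sw"] * p["u_s"] * (1.0 - gamma_s)  # independent SW intensity
    r_chain = math.exp(-(lam_shw + lam_ssw) * t)        # HW and SW in series
    r_pair = 1.0 - (1.0 - r_chain) ** 2                 # two redundant chains in parallel
    lam_ccf = gamma_h * p["u_h"] * p["lam_hw"] + gamma_s * p["u_s"] * p["lam_sw"]
    return r_pair * math.exp(-lam_ccf * t)              # in series with the CCF block


def system_reliability(gamma_h: float, gamma_s: float, tau: float = 1.0) -> float:
    """R_S = R_SS1 * R_SS2 * R_SS3 (tau = 1.0 is a hypothetical default)."""
    return math.prod(subsystem_reliability(p, gamma_h, gamma_s, tau) for p in TASKS.values())


if __name__ == "__main__":
    ranked = sorted(CONFIGS.items(), key=lambda kv: system_reliability(*kv[1]), reverse=True)
    for name, (g_h, g_s) in ranked:
        print(f"{name:26s} R_S = {system_reliability(g_h, g_s):.6f}")
```

With $\tau = 1$ the sketch ranks the configurations in the same order as the Results slide (Diff Code & Devices first, Same Code & Device last). In the same-code configurations, the common-cause terms account for most of the unreliability, so diversifying the code matters more than diversifying the device.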

Page 16

Results

| Option | Configuration | FPGA-based System Reliability |
|---|---|---|
| 1 | Same Code, Same Devices | 0.895726564 |
| 2 | Same Code, Diff Devices | 0.895973815 |
| 3 | Diff Code, Same Devices | 0.944752579 |
| 4 | Diff Code, Diff Devices | 0.98356125 |
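Following the "evaluate design options based on measured reliability metrics" step from the Software Reliability slide, the reported values can be screened against an acceptance criterion. The threshold of 0.95 below is a hypothetical requirement, not one stated in the slides; the reliability values are copied from the table above.

```python
from typing import Dict, List, Tuple

# Reported FPGA-based system reliability for each option (from the Results table).
RESULTS: Dict[str, float] = {
    "Same Code, Same Devices": 0.895726564,
    "Same Code, Diff Devices": 0.895973815,
    "Diff Code, Same Devices": 0.944752579,
    "Diff Code, Diff Devices": 0.98356125,
}


def acceptable_options(results: Dict[str, float], required: float) -> List[Tuple[str, float]]:
    """Return the options that meet the required reliability, best first."""
    passing = [(name, r) for name, r in results.items() if r >= required]
    return sorted(passing, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    REQUIRED = 0.95  # hypothetical acceptance criterion
    for name, r in acceptable_options(RESULTS, REQUIRED):
        print(f"{name}: {r:.4f}")
```

In terms of unreliability ($1-R$), option 4 (about 0.016) is roughly one-sixth of option 1 (about 0.104).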

Page 17

Conclusions

Cost and Schedule Slips

Development Delays and Costs

Adaptive Model

Optimization and Design Constraints

Contact Address: [email protected]