
Practical Statistical Evaluation of Critical Software

Peter Bernard Ladkin

Causalis IngenieurGmbH

22 October 2017

Motivation: IEC 61508-7 Annex D

IEC 61508-7:2010 Annex D, first paragraph: "This annex provides initial guidelines on the use of a probabilistic approach to determining software safety integrity for pre-developed software based on operational experience. This approach is considered particularly appropriate as part of the qualification of operating systems, library modules, compilers and other system software. The annex provides an indication of what is possible, but the techniques should be used only by those who are competent in statistical analysis."

The first and last sentences are accurate. The second is woefully misleading.

There are also technical errors: it mixes up Bernoulli and Poisson process models

It has been that way for nearly twenty years (unchanged since the 1997 edition)

Options

Make minimal corrections to Annex D, then stop.

Convene a new IEC committee to produce a Technical Report/Specification with more detail.

AK 914.0.9 decided early on to promote this way of proceeding

it remains to be seen if AK 914.0.9 sticks with it

Options and Progress

Make minimal corrections to Annex D, then stop.

Make corrections and reduce scope; convene a new IEC committee to produce a Technical Report/Specification with more detail.

The German National Committee formed a Working Group, AK 914.0.9, in November 2015, with international guests (Prof. Bev Littlewood, Bertrand Ricque).

AK 914.0.9 produced a putative Annex D replacement, last version October 2016.

This document was largely written by Bev Littlewood, with contributions from Jens Braband.

The French National Committee voted to present it to IEC MT 61508-3 for discussion

Bertrand Ricque is leading further work in IEC MT 61508-3 on StatEval

Statistics in General

There are two ways to approach statistics.

(Many math and engineering undergrads; all social scientists): formulas, manipulate; numbers, plug in; significance tests and confidence levels: plug in.

(Tukey and sensible people): I’ve got some numbers. What are they telling me? How can I get them to talk? Is what they say what I want to hear?

Stochastic Processes under Consideration for SW

Bernoulli Processes. Well understood for 300 years. Applies to a sequence of discrete demands, in which each demand may result in failure with a constant probability.

Poisson Processes. Well understood for 200 years. Applies to a continuous process over time, which may succeed (continuously) or fail (discretely), and in which the chance of failure is independent of the time since the last failure, among other properties. (A numerical sketch of both models follows after this list.)

So-called “limit theorems” (Palm-Khintchine, Grigelionis) allow you to aggregate a number of stochastic processes into a more tractable process (for us, a Poisson distribution)

(Many people know the Central Limit Theorem, which aggregates processes into a Normal distribution. That has little relevance for us.)
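A minimal numerical sketch (mine, not from the slides) of how the two models turn failure-free evidence into confidence claims: the standard zero-failure bounds for a Bernoulli sequence of demands and for a Poisson process in time. The numbers and the assumptions of independence and constant failure probability/rate are illustrative only.

```python
# Minimal sketch (not from the talk): zero-failure confidence bounds for the
# two process models above, assuming independent demands with constant failure
# probability (Bernoulli) and a constant failure rate in time (Poisson).
import math

def bernoulli_confidence(pfd_bound: float, n_demands: int) -> float:
    """Confidence that the probability of failure on demand is <= pfd_bound,
    given n_demands independent demands with zero observed failures."""
    return 1.0 - (1.0 - pfd_bound) ** n_demands

def poisson_confidence(rate_bound: float, hours: float) -> float:
    """Confidence that the failure rate (per hour) is <= rate_bound,
    given 'hours' of continuous, failure-free operation."""
    return 1.0 - math.exp(-rate_bound * hours)

if __name__ == "__main__":
    print(bernoulli_confidence(1e-3, 3_000))   # ~0.95: 3,000 failure-free demands
    print(poisson_confidence(1e-4, 30_000.0))  # ~0.95: 30,000 failure-free hours
```

The later "What Works / What Does Not Work" slides are essentially about whether such amounts of evidence can realistically be collected.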

Statistics of Critical Systems

Is this history/are these numbers telling me that this system is adequately safe to use?

(Weaker) Do the numbers substantially support such an argument?

There are two answers:

The aerospace answer

The IEC 61508 answer

The Aerospace Answer

Statistical reasoning allows you to draw conclusions to high confidence but not certainty.

The regulatory performance requirement is absolute.

That would be equivalent to 100% confidence.

Statistical reasoning generally cannot give you that.

However, the AMC (Acceptable Means of Compliance) allows you to show adequate safety-reliability short of certainty, and have this accepted as demonstration of compliance.

An anecdote: the “highly reliable” autopilot. Punchline: PBL gets thrown off a mailing list.

Another Anecdote: Ariane Flight 501

And Then

And Then

Ariane 501 Lessons

Lots of lessons. We’ll just learn one.

A SW routine threw up a run-time exception, on all channels.

These run-time exceptions were neither anticipated nor handled.

The kit was thereby lost.

It happened because one input parameter had different values from previous experience.

To reason from previous to future experience, one (just one!) strict requirement is that all inputs must have values seen in the history (a sketch of such an input-envelope check follows below)

Another requirement for Bernoulli/Poisson reasoning: with the same frequency of occurrence

Lesson: you can’t just muck about with “close enough”
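A sketch (hypothetical, not Ariane code) of the kind of input-envelope check this lesson implies: before extrapolating from operational history, confirm that every input value of the new mission lies inside the range actually observed in that history. The variable name and numbers are purely illustrative.

```python
# Hypothetical sketch: refuse to extrapolate from history when any input value
# falls outside the envelope of values actually observed in that history.

def outside_history(history, mission_inputs):
    """history: {input_name: (min_seen, max_seen)};
    mission_inputs: {input_name: value}.
    Returns the names of inputs whose values were never seen in the history."""
    violations = []
    for name, value in mission_inputs.items():
        lo, hi = history.get(name, (float("inf"), float("-inf")))
        if not (lo <= value <= hi):
            violations.append(name)
    return violations

# Illustrative only: an Ariane-4-era history would not contain the larger
# horizontal-velocity-related values produced by the Ariane 5 trajectory.
history = {"horizontal_bias": (0.0, 32_767.0)}
print(outside_history(history, {"horizontal_bias": 64_000.0}))  # ['horizontal_bias']
```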

Aerospace and Statistics

Many aerospace engineers say: don’t touch statistics, for three broad reasons

1. The precise record-keeping needed to render the statistical reasoning valid often (mostly) does not exist;

2. The reasoning rests on impractically precise assumptions, which (see above) must be rigorously fulfilled;

3. Such reasoning is particularly vulnerable to misuse.

All these are true!

And not just in aerospace.

A Non-Aerospace Need

Lots of people have legacy kit which historically is very reliable

Selling that working kit to new customers often requires satisfying the conditions of IEC 61508

But when the kit was built, the documentation requirements were not as stringent as now, and the needed documentation does not exist

Question: are the historical in-use data good enough for StatEval?

A Reminder of the Concepts Used in IEC 61508

A Decision

Suppose you are riding in a lift in a mine shaft. The lift safety systems are software-controlled. Two scenarios.

The engineer says “We have used this SW-based safety kit for fifteen years and it’s never failed”

The engineer says “We used some SW for fifteen years. It never failed, but we didn’t have the documentation to qualify this software according to IEC 61508:2010.

So we reimplemented all the SW and this is its first trip!

I can assure you, THE DOCUMENTATION IS PERFECT! ”

Summary: What Works I

On-demand safety functions, such as a nuclear-reactor SCRAM, which are to be reliable, say, 999 times out of 1,000 (expected rate of SCRAMs is less than one per year)
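A sketch (assumptions mine) of why such a claim is statistically tractable: a 1-in-1,000 probability of failure on demand can be demonstrated with a few thousand statistically representative, failure-free demands, which is a feasible amount of test evidence even though real demands are rare.

```python
# Minimal sketch (assumptions mine): failure-free demands needed for a
# zero-failure Bernoulli demonstration of PFD <= 1e-3 at 95% confidence.
import math

pfd_target = 1e-3      # "reliable 999 times out of 1,000"
confidence = 0.95

# Smallest n with (1 - pfd_target)**n <= 1 - confidence
n_demands = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pfd_target))
print(n_demands)  # 2995: achievable with representative testing, since real
                  # SCRAM demands arrive at less than one per year
```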

Summary: What Does Not Work I

Commercial-aircraft flight control systems, for which you need a fleet-operational-lifetime’s worth of data before you can satisfy the AMC (Littlewood-Strigini, Butler-Finelli, both 1993)
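For contrast, a sketch (assumptions mine) of the same zero-failure reasoning applied to the usual 1e-9-per-flight-hour target for catastrophic failure conditions; the fleet figures are rough orders of magnitude, not data from the cited papers.

```python
# Minimal sketch (assumptions mine): failure-free flight hours needed for a
# zero-failure Poisson demonstration of lambda <= 1e-9 per hour at 95% confidence.
import math

rate_target = 1e-9     # failures per flight hour
confidence = 0.95

# Smallest T with exp(-rate_target * T) <= 1 - confidence
hours_needed = -math.log(1.0 - confidence) / rate_target
print(f"{hours_needed:.1e} failure-free flight hours")  # ~3.0e9 hours

# Rough order of magnitude: 1,000 aircraft * 3,000 flight hours/year
# = 3e6 hours/year, so the evidence would take ~1,000 fleet-years to gather.
```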

Summary: What Works II

Detailed operation logs with all relevant parameters

With no failures

And no masked failures!

Bernoulli- resp. Poisson-process reasoning; some limit theorems (a pooled-logs sketch follows below)
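A sketch (assumptions mine, hypothetical log format) of what this looks like in practice: pool the failure-free operating time recorded in detailed logs from several installations running statistically similar operational profiles, check there were no failures, masked or otherwise, and derive a Poisson zero-failure rate bound. The limit theorems are what make treating the superposition of many independent, individually sparse streams as approximately Poisson defensible.

```python
# Sketch (assumptions mine, hypothetical log format): pool failure-free operating
# time from several installations and derive a zero-failure Poisson rate bound.
import math

# (installation, hours operated, failures observed, any failures possibly masked?)
logs = [
    ("plant-A", 120_000.0, 0, False),
    ("plant-B",  85_000.0, 0, False),
    ("plant-C",  40_000.0, 0, False),
]

# The bound below is only meaningful for complete records with no failures,
# masked or otherwise -- exactly the conditions listed on this slide.
assert all(failures == 0 and not masked for _, _, failures, masked in logs)

total_hours = sum(hours for _, hours, _, _ in logs)
confidence = 0.95

# Rate bound lambda satisfying exp(-lambda * total_hours) = 1 - confidence;
# aggregating the installations' histories is what the limit theorems
# (Palm-Khintchine, Grigelionis) are invoked to justify.
rate_bound = -math.log(1.0 - confidence) / total_hours
print(f"{total_hours:.0f} h pooled -> failure rate <= {rate_bound:.2e}/h at 95% confidence")
```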

Summary: What Does Not Work II

Plugging some numbers into some formula, such as in Table D.1 of IEC 61508-7:2010 Annex D

Mixing up Bernoulli and Poisson processes in mathematical formulas, as in Annex D
