Confidence in Metrology: At the National Lab & On the Shop Floor Alan Steele, Barry Wood & Rob...

Confidence in Metrology: At the National Lab & On the Shop Floor

Alan Steele, Barry Wood & Rob Douglas

National Research Council

Ottawa, CANADA

e-mail: [email protected]

National Research Council Canada / Conseil national de recherches Canada

Steele, Wood and Douglas: Confidence (NCSL Canada, September 2001)

Outline

• Measurements
– Communications, Comparisons
– Fluctuations, Predictions

• Confidence
– Comparisons, Proficiency Tests, on the Shop Floor

• Probability Calculus
– confidence intervals
– confidence levels

• A Toolkit for Excel
– some Visual Basic Code

• A Worked Example
– with real comparison data

• Conclusions


Measurement means Communication

• The sole purpose of measurement is to communicate an aspect of physical reality from one person*, place and time to another person*, place and time.

* or autonomous system for which a person is responsible

• The two people must have in common:
– an understanding of the measurand
– a system of numbers and units of measurement
– a means for describing measurement accuracy

“Alas, my work is all in vain
If it doesn’t get to Roundhead’s brain”


Measurement means Comparison

• Any useful measurement is a comparison

• The world uses the SI to provide a network that can inter-relate most of these comparisons

• The implied inter-relationships are checked by special Comparisons for Quality Assurance

– Shop floor
– Calibrations (with NMIs)
– Proficiency Demonstrations (with NMIs)
– Bilateral Comparisons (between NMIs)
– Regional Comparisons (among NMIs)
– CIPM Key Comparisons (among NMIs)

• At NMIs, special definition-based comparisons are also required for the kelvin, second, kilogram … etc.


Measurement means Fluctuations

• Usually, fluctuations can be observed in a measurement even when we try to keep everything as constant as possible - WE INCLUDE THIS

• Usually, larger fluctuations are observed as temperature, pressure, humidity… are allowed to vary - WE INCLUDE THIS

• Usually, we anticipate an even larger range of fluctuations if the measurement were to be made by other reasonable means - WE INCLUDE THIS IN STANDARD UNCERTAINTY

[Figure: Measurement Deviation (ppm), axis from -5 to 5, plotted against a horizontal axis from 0 to 50]


Measurement means Prediction

• The most useful aspect of a measurement is its predictive ability, either explicit or implicit

• The results of past comparisons are used to infer results for future comparisons

• There is a challenge to relate environmental conditions, history and aging to the accuracy of a future comparison

[Figure: Measurement Deviation (ppm), axis from -5 to 5, versus Year, axis from 0 to 50, annotated with a “?”]


Confidence as a Commodity

• Measurement Confidence starts with CIPM, BIPM, CC’s, definition-based standards and realizations

• MRA, JCRB, Key Comparisons and Regional and Bilateral Comparisons demonstrate confidence

• Shared research and visits help develop Confidence in equivalence to the SI

• This system builds confidence at the National Lab level


Confidence as a Commodity

• You can buy Confidence from your NMI (NRC, NIST…) as calibration reports and Round-robin proficiency tests

• You can multiply Confidence in a well-run lab (CLAS) or on your factory floor

• You can sell Confidence as a commodity within your organization, as well as to your organization’s clients

• To market Confidence, it should be technically rigorous and accessible to non-statisticians


False Confidence

• Any technically unjustified confidence claim is potentially very harmful to any calibration or testing laboratory’s reputation

• Overly strong or technically wrong confidence claims are potentially lethal or actionable

• Sometimes clients need protection from themselves: “Why do you have to measure it? I just want a calibration certificate for it!!”

• Rigour and careful wording can avoid false confidence


Overly Complicated Confidence

• “The equivalence study of eleven 10 Volt zeners showed a difference of Lab A - Lab B = +2.73 ± 1.91 ppm with 230 degrees of freedom, where the 1.91 is the expanded uncertainty corresponding to approximately 95% confidence for a Student-t distribution with 230 degrees of freedom, k=1.97 times the pair standard uncertainty, 0.97 ppm, of the pair difference determined from the internal standard uncertainty statements of the measurements from the two laboratories (1.06 ppm for Lab A and 1.49 ppm for Lab B), with a correlation coefficient of +0.76 accounting for a covariance of +1.2×10^-12. The external standard deviation was also evaluated with 21 degrees of freedom and gave a Birge ratio of 1.2.”

• There is a very limited market for this type of Confidence Statement, which still requires the user to deal with the 2.7 ppm bias it reveals...
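The numbers quoted in the complicated statement above can be checked with a few lines of arithmetic. The sketch below assumes the usual rule for the variance of a difference of two correlated quantities; the function name `pair_standard_uncertainty` is hypothetical and is not taken from the NRC Toolkit.

```python
import math

def pair_standard_uncertainty(u_a, u_b, correlation=0.0):
    """Standard uncertainty of the difference A - B.

    Assumes var(A - B) = u_a^2 + u_b^2 - 2 * r * u_a * u_b,
    where r is the correlation coefficient between A and B.
    """
    covariance = correlation * u_a * u_b
    return math.sqrt(u_a**2 + u_b**2 - 2.0 * covariance)

# Values quoted in the statement (ppm): 1.06 for Lab A, 1.49 for Lab B, r = +0.76
u_pair = pair_standard_uncertainty(1.06, 1.49, correlation=0.76)

# Coverage factor k = 1.97 is taken directly from the statement
expanded = 1.97 * u_pair

print(f"pair standard uncertainty: {u_pair:.2f} ppm")
print(f"expanded uncertainty (k=1.97): {expanded:.2f} ppm")
```

With the quoted inputs this gives about 0.97 ppm and 1.91 ppm, matching the figures in the statement.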


Simple Confidence Statements

• “Lab A and Lab B are equivalent.” Not Rigorous

• “... 10 V measurements from Lab A and Lab B can be expected to agree with each other within 4.3 ppm, 19 times out of 20.” Has potential


Improved Confidence Statements

• The Mutual Recognition Arrangement formalizes the Key Comparison differences as the preferred means for generating confidence about equivalence

• New methods are being used to transform comparisons into statements of confidence like “On the basis of this Comparison, similar measurements made by Lab A and Lab B can be expected to agree with each other to within 4.3 ppm, with 95% confidence [or 19 times out of 20].”

• This clearer Confidence Statement has a wider market


Communicating with your Clients

Clarity is important to:

• Users of your measurements

• Your users’ management and QA managers

• Your users’ clients

• Your management

Your NMI can help you to communicate confidence clearly


Confidence from NMIs

• The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by your NMI

• Some calibration reports can also be used to generate this type of confidence statement, provided that the travel uncertainty of the artefact is under proper statistical control.


Confidence for your Clients

• The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by you on your factory floor

• The statements are the simplest quantitative expressions about the equivalence of two measurement stations


Proficiency Testing

• Accreditation bodies routinely specify that “proficiency testing” on a regularly scheduled basis is a requirement for maintaining accreditation

• Usually the Pilot Laboratory for the comparison is the National Metrology Institute

• Usually the Pilot Laboratory result is taken as the comparison reference value, and the participants’ results are initially evaluated against this “truth”

• This is a time-consuming and expensive exercise!


Proficiency Demonstrations

• A pilot lab measures and sends one or more artefacts around to be measured at other Labs

• Pilot re-measures artefact

• Pilot receives other Labs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.

[Figure: diagram labelled “Pilot Lab” with numbered participants 1 to 16]


Proficiency Demonstrations

• CIPM organizes them for NMIs

• NMIs (NRC) organize them for you

• Do you organize them for yourself?

• Do you organize them for your clients?


Proficiency Demonstrations: NMIs

• A pilot NMI measures and sends one or more artefacts around to be measured by NMIs

• Pilot NMI re-measures the artefact

• Pilot NMI receives other NMIs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report; CC and CIPM approve the report, and results are posted on the internet.

[Figure: diagram labelled “Pilot NMI” with numbered participants 1 to 16 and an “NRC” label]


Proficiency Demonstrations: CLAS labs

• NRC measures and sends one or more artefacts around to be measured by CLAS labs

• NRC re-measures the artefact

• NRC receives CLAS labs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.

[Figure: diagram labelled “NRC” with numbered participants 1 to 16 and a “Your Lab” label]


Proficiency Demonstrations: Shop-Floor

• You measure and send one or more artefacts around to be measured by instruments you normally calibrate

• You re-measure the artefact

• You receive other workstations’ measurements, analyze them in escrow as comparisons, assign travel uncertainty and prepare a report.

[Figure: diagram labelled “Your Lab” with numbered participants 1 to 16]


Proficiency Demonstrations vs Calibrations

• Proficiency Demonstrations evaluate travel uncertainty better

• Proficiency Demonstrations evaluate everything affecting the best capabilities, including environment and operator...

• Proficiency Demonstrations can establish tighter equivalence

• Proficiency Demonstrations require more artefacts and more organization

• Proficiency Demonstrations have new statistical tools and toolkits available for evaluating comparisons


Comparisons

• Measurement comparisons provide the main experimental evidence for “equivalence”

• In general, all participants measure a common artifact and their various results are analyzed from a single common perspective

• The participants may be different laboratories, or different measurement stations on your shop floor

[Figure: Deviation from Pilot (axis from -3 to 5) for Labs A to M]


Key Comparisons and NMIs

• National Metrology Institutes have recently signed a “Mutual Recognition Arrangement” in which the validity of their Calibration and Measurement Capabilities is expressed

• The scientific underpinning for this arrangement is a series of “Key Comparisons” which are conducted at the very highest levels of metrology

• In practice, they are not much different from the proficiency tests already in general use among accredited laboratories around the world



[Figure: Deviation from Pilot (axis from -2 to 5) by Laboratory]

Reporting Results

• A metrologist reports a result in two parts

– the mean value: m_L

– the uncertainty: u_L

• The results are plotted as data points with error bars


Uncertainty Budgets

• The ISO Guide to the Expression of Uncertainty in Measurement is widely used as the basis for formulating and publishing laboratory uncertainty statements regarding measurement capabilities

• “Error bars” are an intrinsically probabilistic description of our belief in “what will happen next time” based on what we have done in the past

[Figure: Measurement Deviation (ppm), axis from -5 to 5, plotted against a horizontal axis from 0 to 50]


Uncertainty Budgets

• “Error bars” are intrinsically probabilistic

• The standard uncertainty interval contains ~68% of the events, or 68% of the histogrammed events, or 68% of the “probability density function”, in physical sciences often referred to as the probability distribution

Flip x and y axes


Probability Distributions

• An ISO Guide-compliant uncertainty statement means that the error bars represent the most expert opinion about the underlying normal (Gaussian) probability distribution

• The fancy name for working with these distributions is Probability Calculus

• In general, we are interested in integrals of the probability distribution

• Integration is only “fancy addition”


Confidence Levels

• A confidence level is what we get upon integrating a probability distribution over a given range [a,b]

• The fractional probability of observing a value between a & b is the normalized integration of the probability distribution function in the range [a, b]

• This is just addition of all the ‘bits’ of the function between a & b

1σ: 68%

2σ: 95%
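As a minimal sketch of this integration for a normal distribution, the Python below computes the confidence level over [a, b] twice: once from the closed-form cumulative distribution, and once by literally adding up small ‘bits’ of the function. The function names are illustrative and are not part of the NRC Toolkit.

```python
import math

def normal_cdf(x, mean=0.0, sigma=1.0):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def confidence_level(a, b, mean=0.0, sigma=1.0):
    """Probability of observing a value between a and b."""
    return normal_cdf(b, mean, sigma) - normal_cdf(a, mean, sigma)

def confidence_level_by_addition(a, b, mean=0.0, sigma=1.0, steps=10_000):
    """The same integral as 'fancy addition' (midpoint rule)."""
    width = (b - a) / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width          # centre of the i-th slice
        z = (x - mean) / sigma
        total += math.exp(-0.5 * z * z) / norm * width
    return total

print(f"[-1, +1] sigma: {confidence_level(-1, 1):.4f}")
print(f"[-2, +2] sigma: {confidence_level(-2, 2):.4f}")
print(f"[-1, +1] sigma by addition: {confidence_level_by_addition(-1, 1):.4f}")
```

The one-sigma range comes out near 68% and the two-sigma range near 95%, as on the slide.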


Confidence Intervals

• Remember: a confidence level is what we get by integrating the distribution over a given range [a,b]

• The confidence interval is the fancy name for the range associated with the confidence level

• The range [-1σ, +1σ] is the 68% confidence interval

• The range [-2σ, +2σ] is the 95% confidence interval

1σ: 68%

2σ: 95%
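Going the other way, from a chosen confidence level to its interval, needs the inverse of the normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `coverage_factor` and `confidence_interval` are illustrative only.

```python
from statistics import NormalDist

def coverage_factor(level):
    """Half-width, in units of sigma, of the symmetric interval that
    contains the fraction `level` of a normal distribution."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # A symmetric interval leaves (1 - level) / 2 in each tail
    return NormalDist().inv_cdf(0.5 + level / 2.0)

def confidence_interval(mean, sigma, level):
    """Symmetric confidence interval [low, high] about the mean."""
    k = coverage_factor(level)
    return mean - k * sigma, mean + k * sigma

print(f"k for 68.27%: {coverage_factor(0.6827):.2f}")
print(f"k for 95%:    {coverage_factor(0.95):.2f}")
print(confidence_interval(0.0, 0.5, 0.95))
```

The exact 95% half-width is about 1.96 sigma, which the slide rounds to 2.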


Why would you want to do this?

• Lots of time and energy (and expense!) is invested in creating a laboratory result in a comparison

• Getting the maximum amount of information from a measurement comparison is desirable

• You’d like to show off your “confidence” to colleagues (and auditors!)

• Quantifying things is what we do as metrologists

• Your clients may want specific quantified answers to questions of Demonstrated Equivalence based on your Proficiency Testing results


How hard is it to do this?

• With normal distributions, the arithmetic is pretty easy

• You can try this for yourself and really see how it works…

…or you can let us do it for you!

• We have generated simple expressions to help evaluate normal confidence levels and normal confidence intervals, using well known statistical methods developed over the last hundred years or so

• We have put these expressions into a Toolkit for Excel


A Toolkit for Excel

• At NRC, we have written a Quantified Demonstrated Equivalence Toolkit for Microsoft Excel®

• The Toolkit is freely available by contacting us at

[email protected]

• We’ll add you to our mailing list and send you a copy of the sample spreadsheet with the Toolkit, plus a “User’s Guide” in .pdf format


Toolkit Functions and Macros

• The Toolkit contains Functions to:
– calculate pair uncertainties (including correlations)
– calculate weighted averages
– calculate confidence levels
– calculate confidence intervals

• The Toolkit contains Macros to:
– generate bilateral “tables of equivalence”
– generate bilateral “tables of confidence intervals”
– generate bilateral “tables of confidence levels”
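To give a feel for one of these calculations, here is a rough Python sketch of a weighted average. It assumes the common inverse-variance weighting for uncorrelated results; the slides do not say which weighting the Toolkit uses, and the function name and the sample numbers are purely illustrative.

```python
import math

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumption: the results are uncorrelated and each weight is 1 / u^2.
    """
    weights = [1.0 / u**2 for u in uncertainties]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)

# Illustrative deviations and standard uncertainties (ppm)
values = [0.00, -0.18, 0.04]
uncertainties = [0.08, 0.10, 0.25]

mean, u_mean = weighted_mean(values, uncertainties)
print(f"weighted mean: {mean:+.3f} ppm, u = {u_mean:.3f} ppm")
```

Results with small uncertainties dominate the average, and the combined uncertainty is smaller than that of any single input.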


Toolkit Philosophy and Operation

• Functions and Macros are built right in to the Spreadsheet, and work just like “regular” Excel components


Toolkit Philosophy and Operation

• The code is written in Visual Basic

• You can examine the code to see how it works

• Long variableNames help to “self document” the programs

• You don’t have to look at the code or write your own functions to use the QDE Toolkit from NRC



[Figure: Deviation from Pilot, axis from -2 to 5]

A Worked Example

• 13 Laboratories participated in a Proficiency Test at 10 k

Lab Name   V - Pilot (ppm)   u(k=1) (ppm)   U(k=2) (ppm)
Lab A                 0.10           0.59            1.2
Lab B                 0.20           0.50            1.0
Lab C                 0.00           0.08            0.2
Lab D                -0.18           0.10            0.2
Lab E                 1.80           0.75            1.5
Lab F                 1.80           6.00           12.0
Lab G                 0.01           0.60            1.2
Lab H                -2.80           0.80            1.6
Lab I                 0.38           0.41            0.8
Lab J                -0.18           0.80            1.6
Lab K                 4.20           6.50           13.0
Lab L                 0.04           0.25            0.5
Lab M                 0.80           0.60            1.2


Comparison to the NMI: En

• One common measure of success in Proficiency Tests is the “Normalized Error”

• This is the ratio of the laboratory deviation to the expanded uncertainty:

$$E_n(k=2) = \frac{|m_{Lab} - m_{Ref}|}{\sqrt{U_{Lab}^2 + U_{Ref}^2}}$$

• Generally, the Laboratory “passes” when En < 1

• En is a dimensionless quantity

Lab Name            Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
V - Pilot (ppm)      0.10   0.20   0.00  -0.18   1.80   1.80   0.01  -2.80   0.38  -0.18   4.20   0.04   0.80
U(k=2) (ppm)         1.18   1.00   0.15   0.20   1.50  12.00   1.20   1.60   0.82   1.60  13.00   0.50   1.20
En(k=2)              0.09   0.20   0.02   0.88   1.20   0.15   0.01   1.75   0.46   0.11   0.32   0.08   0.67
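The Normalized Error is simple enough to compute directly. The sketch below applies the formula to three labs from the worked example. It assumes the pilot's expanded uncertainty is negligible (`U_ref = 0`), since the slides do not list it; the function name is illustrative.

```python
import math

def normalized_error(m_lab, m_ref, U_lab, U_ref=0.0):
    """En = |m_lab - m_ref| / sqrt(U_lab^2 + U_ref^2), using k=2 uncertainties."""
    return abs(m_lab - m_ref) / math.sqrt(U_lab**2 + U_ref**2)

# (deviation from pilot, expanded uncertainty U(k=2)), both in ppm
labs = {
    "Lab E": (1.80, 1.50),
    "Lab H": (-2.80, 1.60),
    "Lab M": (0.80, 1.20),
}

results = {}
for name, (deviation, U_lab) in labs.items():
    # Deviations are already relative to the pilot, so m_ref = 0
    en = normalized_error(deviation, 0.0, U_lab)
    results[name] = en
    verdict = "pass" if en < 1.0 else "fail"
    print(f"{name}: En = {en:.2f} ({verdict})")
```

Under that assumption the three labs give 1.20, 1.75 and 0.67, in line with the En row of the table.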


Comparison to the NMI: QDC

• A quantified approach to Proficiency Tests is to ask the following question:

What is the probability that a repeat comparison would yield results such that Lab 1’s 95% uncertainty interval encompasses the Pilot Lab value?

• We call this “Quantified Demonstrated Confidence”

• QDC is a dimensionless quantity expressed in %

Lab Name            Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
QDC                   95%    94%    95%    59%    35%    94%    95%     7%    86%    95%    91%    95%    75%
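One way to read the QDC question numerically is sketched below. It assumes that the difference seen in a repeat comparison is normally distributed about the observed difference, with the pair standard uncertainty as its width, and asks how much of that distribution lies within the lab's expanded uncertainty. This is an interpretation, not the Toolkit's own code; the pilot uncertainty is taken as negligible and the function name `qdc` is illustrative.

```python
from statistics import NormalDist

def qdc(difference, u_pair, U_lab):
    """Probability that a repeat difference falls inside [-U_lab, +U_lab].

    Assumption: repeat differences follow a normal distribution centred on
    the observed difference with standard deviation u_pair.
    """
    repeat = NormalDist(mu=difference, sigma=u_pair)
    return repeat.cdf(U_lab) - repeat.cdf(-U_lab)

# (deviation from pilot, u(k=1), U(k=2)) in ppm, from the worked example
lab_a = qdc(0.10, 0.59, 1.18)
lab_h = qdc(-2.80, 0.80, 1.60)

print(f"Lab A: QDC = {lab_a:.0%}")
print(f"Lab H: QDC = {lab_h:.0%}")
```

For Lab A and Lab H this reading gives roughly 95% and 7%, in line with the QDC values listed.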


Comparison to the NMI: En vs QDC

• En and QDC are both dimensionless quantities

• En and its interpretation as an acceptance criterion are difficult to explain to non-metrologists

• QDC and its numerical value are easily explained to non-metrologists

• Note that when En = 1 (and U_Ref << U_Lab), QDC = 50%

Normalized Error (En) and Quantified Demonstrated Confidence (QDC)

Lab Name            Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
En(k=2)              0.09   0.20   0.02   0.88   1.20   0.15   0.01   1.75   0.46   0.11   0.32   0.08   0.67
QDC                   95%    94%    95%    59%    35%    94%    95%     7%    86%    95%    91%    95%    75%


Comparison to the NMI: QDE0.95

• A different quantified approach to Proficiency Tests is to ask the following question:

Within what confidence interval can I expect the Lab 1 value and the Pilot Lab value to agree, with a 95% confidence level?

• We call this “Quantified Demonstrated Equivalence”

• QDE0.95 is a dimensioned quantity, same units as V

Lab Name            Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
V - Pilot (ppm)      0.10   0.20   0.00  -0.18   1.80   1.80   0.01  -2.80   0.38  -0.18   4.20   0.04   0.80
U(k=2) (ppm)         1.18   1.00   0.15   0.20   1.50  12.00   1.20   1.60   0.82   1.60  13.00   0.50   1.20
QDE0.95 (ppm)        2.32   1.99   0.30   0.51   4.27  23.69   2.37   5.43   1.77   3.15  26.74   0.98   2.80


Comparison between Labs: Agreement

• We can ask similar questions about agreement between any two participants in the experiment:

Within what confidence interval (in ppm) can I expect the Lab 1 value and the Lab 2 value to agree, with a 95% confidence level?

         Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
Lab A        -   1.52   1.18   1.29   3.26  12.25   1.65   4.54   1.50   2.02  15.00   1.26   2.09
Lab B     1.52      -   1.06   1.22   3.08  12.18   1.57   4.55   1.31   1.99  14.90   1.14   1.90
Lab C     1.18   1.06      -   0.39   3.04  12.26   1.19   4.13   1.06   1.61  15.05   0.52   1.79
Lab D     1.29   1.22   0.39      -   3.22  12.37   1.25   3.95   1.25   1.59  15.21   0.66   1.98
Lab E     3.26   3.08   3.04   3.22      -  11.94   3.37   6.40   2.83   3.78  13.65   3.06   2.58
Lab F    12.25  12.18  12.26  12.37  11.94      -  12.31  14.65  12.08  12.47  17.92  12.24  11.93
Lab G     1.65   1.57   1.19   1.25   3.37  12.31      -   4.46   1.59   1.99  15.09   1.28   2.19
Lab H     4.54   4.55   4.13   3.95   6.40  14.65   4.46      -   4.66   4.48  17.80   4.22   5.25
Lab I     1.50   1.31   1.06   1.25   2.83  12.08   1.59   4.66      -   2.06  14.74   1.14   1.64
Lab J     2.02   1.99   1.61   1.59   3.78  12.47   1.99   4.48   2.06      -  15.29   1.69   2.63
Lab K    15.00  14.90  15.05  15.21  13.65  17.92  15.09  17.80  14.74  15.29      -  15.02  14.40
Lab L     1.26   1.14   0.52   0.66   3.06  12.24   1.28   4.22   1.14   1.69  15.02      -   1.83
Lab M     2.09   1.90   1.79   1.98   2.58  11.93   2.19   5.25   1.64   2.63  14.40   1.83      -
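The bilateral question can be turned into a small root-finding problem. The sketch below assumes the repeat difference between two labs is normally distributed about the observed difference, with the two standard uncertainties combined in quadrature (no correlation). It then searches by bisection for the half-width d that holds 95% of that distribution. This is one possible numerical approach, not necessarily the expression used in the Toolkit, and the function name `qde` is illustrative.

```python
import math
from statistics import NormalDist

def qde(difference, u_pair, level=0.95):
    """Half-width d such that P(|repeat difference| <= d) = level.

    Assumption: repeat differences are normal, centred on the observed
    difference, with standard deviation u_pair (which must be positive).
    """
    repeat = NormalDist(mu=abs(difference), sigma=u_pair)

    def coverage(d):
        return repeat.cdf(d) - repeat.cdf(-d)

    low, high = 0.0, abs(difference) + 10.0 * u_pair   # bracket the root
    for _ in range(100):                               # bisection
        mid = 0.5 * (low + high)
        if coverage(mid) < level:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

# Lab C vs Lab D from the worked example (ppm): values 0.00 and -0.18,
# standard uncertainties u(k=1) of 0.08 and 0.10, assumed uncorrelated
u_cd = math.hypot(0.08, 0.10)
qde_cd = qde(0.00 - (-0.18), u_cd)
print(f"Lab C vs Lab D: QDE0.95 = {qde_cd:.2f} ppm")
```

For this pair the search lands at about 0.39 ppm. With no bias at all the same function returns the familiar 1.96 standard uncertainties.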


Comparison between Labs: Confidence

What if we ask:

• What is the probability that a repeat comparison would yield results such that Lab 1’s 95% uncertainty interval encompasses Lab 2’s value?

Or how about:

• What is the probability that a repeat comparison would yield results such that Lab 2’s 95% uncertainty interval encompasses Lab 1’s value?


Comparison between Labs: Confidence

• The answers to these questions of Quantified Demonstrated Confidence are shown here

         Lab A  Lab B  Lab C  Lab D  Lab E  Lab F  Lab G  Lab H  Lab I  Lab J  Lab K  Lab L  Lab M
Lab A        -    87%    95%    93%    29%    15%    84%     4%    88%    75%    12%    93%    70%
Lab B      80%      -    93%    89%    25%    13%    79%     2%    86%    67%    10%    91%    68%
Lab C      20%    22%      -    41%     1%     2%    20%     0%    19%    15%     2%    44%     8%
Lab D      24%    24%    57%      -     1%     3%    25%     0%    16%    20%     2%    42%     7%
Lab E      42%    46%    35%    27%      -    20%    38%     0%    54%    33%    17%    37%    69%
Lab F      94%    95%    94%    94%    95%      -    94%    89%    95%    94%    81%    94%    95%
Lab G      84%    86%    95%    94%    27%    15%      -     5%    86%    76%    12%    93%    68%
Lab H       9%     7%     7%    10%     0%    16%    11%      -     4%    18%    11%     7%     2%
Lab I      71%    78%    86%    74%    24%    11%    68%     0%      -    55%     8%    83%    66%
Lab J      88%    88%    95%    95%    36%    20%    88%    18%    87%      -    16%    94%    73%
Lab K      91%    91%    91%    90%    94%    84%    91%    82%    92%    90%      -    91%    92%
Lab L      56%    61%    94%    85%     5%     6%    56%     0%    59%    44%     5%      -    32%
Lab M      71%    77%    75%    64%    57%    16%    68%     1%    84%    57%    13%    75%      -
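The table is not symmetric, and a short calculation shows why. The sketch below assumes uncorrelated labs and a normally distributed repeat difference, and evaluates the bilateral question both ways round for Lab A and Lab B: the same difference is tested against two different 95% intervals. The function name is illustrative and the model is an interpretation rather than the Toolkit's own code.

```python
import math
from statistics import NormalDist

# (deviation from pilot, u(k=1), U(k=2)) in ppm, from the worked example
LABS = {
    "Lab A": (0.10, 0.59, 1.18),
    "Lab B": (0.20, 0.50, 1.00),
}

def pairwise_qdc(lab_1, lab_2):
    """Probability that Lab 1's 95% interval encompasses Lab 2's value
    in a repeat comparison (uncorrelated, normal model assumed)."""
    v1, u1, U1 = LABS[lab_1]
    v2, u2, _ = LABS[lab_2]
    repeat = NormalDist(mu=v2 - v1, sigma=math.hypot(u1, u2))
    return repeat.cdf(U1) - repeat.cdf(-U1)

a_covers_b = pairwise_qdc("Lab A", "Lab B")
b_covers_a = pairwise_qdc("Lab B", "Lab A")
print(f"Lab A's interval encompasses Lab B: {a_covers_b:.0%}")
print(f"Lab B's interval encompasses Lab A: {b_covers_a:.0%}")
```

The two directions give about 87% and 80%, matching the Lab A / Lab B entries above. The wider interval is the more forgiving one.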


Quantifying Equivalence

• What is the probability that a repeat comparison would have a Lab 2 value within Lab 1’s 95% uncertainty interval?

Probability Calculus tells us the answer:

QDC = 47%

• This is exactly the type of “awkward question” that a Client might ask!

95% interval


Quantifying Equivalence

• What is the probability that a repeat comparison would have a Lab 1 value within Lab 2’s 95% uncertainty interval?

Probability Calculus tells us the answer:

QDC = 22%

• These subtly different “awkward” questions have very different “straightforward” answers!

95% interval


Tricky things about Equivalence

• Equivalence is not transitive
– Lab 1 and Lab 2 may both be “equivalent” to the Pilot, but not to each other!

• Equivalence is not commutative
– we are asking two very different questions here!

[Figure: two panels, each showing a 95% interval, one with QDC = 47% and the other with QDC = 22%]


Conclusions

• You are already doing quite a bit of Probability Calculus when you present your results

• The arithmetic for quantified calculations is very straightforward when we have Normal Distributions

• Adding Statistical Confidence explicitly into your Lab’s results helps you to explain them to non-metrologists, and to present precisely what Proficiency Testing has demonstrated for:
– equivalence from different National Laboratories
– accreditation assessment
– your clients
– your factory floor


