Error analysis statistics
Uploaded by tarun-gehlot. Category: Education.
Slide 1
Error Analysis - Statistics
• Accuracy and Precision
• Individual Measurement Uncertainty
  – Distribution of Data
  – Mean, Variance and Standard Deviation
  – Confidence Interval
• Uncertainty of a Quantity Calculated from Several Measurements
  – Error Propagation
• Least Squares Fitting of Data
Slide 2
Accuracy and Precision
• Accuracy: Closeness of the data (sample) to the “true value.”
• Precision: Closeness of the grouping of the data (sample) around some central value.
Slide 3
Accuracy and Precision
• Inaccurate & Imprecise
• Precise but Inaccurate
[Figure: two panels, one per case, each plotting Relative Frequency against X Value with the True Value marked.]
Slide 4
Accuracy and Precision
• Accurate but Imprecise
• Precise and Accurate
[Figure: two panels, one per case, each plotting Relative Frequency against X Value with the True Value marked.]
Slide 5
Accuracy and Precision
Q: How do we quantify the concepts of accuracy and precision? That is, how do we characterize the error that occurred in our measurement?
Slide 6
Individual Measurement Statistics
• Take N measurements: $X_1, \ldots, X_N$
• Calculate mean and standard deviation:
  $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$
  $S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{x})^2$
• What to use as the “best value” and uncertainty so we can say we are Q% confident that the true value lies in the interval $x_{\text{best}} \pm \delta x$?
• Need to know how the data is distributed.
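The mean and standard deviation of N measurements can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean_and_std` and the five readings are made up, and the N − 1 divisor is assumed for the sample standard deviation.

```python
import math


def sample_mean_and_std(measurements):
    """Return the sample mean and the sample standard deviation (N - 1 divisor)."""
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = sum(measurements) / n
    # Sum of squared deviations from the sample mean, divided by N - 1.
    variance = sum((x - mean) ** 2 for x in measurements) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical readings, chosen only to illustrate the calculation.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
x_bar, s_x = sample_mean_and_std(readings)
print(f"mean = {x_bar:.3f}, S_x = {s_x:.3f}")
```

The standard library functions `statistics.mean` and `statistics.stdev` compute the same two quantities.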
Slide 7
Population and Sample
• Parent Population: The set of all possible measurements.
• Sample: A subset of the population; the measurements actually made.
[Figure: the population is a bag of marbles; samples are handfuls of marbles taken from the bag.]
Slide 8
Histogram (Sample Based)
• Histogram
  – A plot of the number of times a given value occurred.
• Relative Frequency
  – A plot of the relative number of times a given value occurred.
[Figure: Histogram. Number of Measurements (0 to 25) versus X Value (Bin), bins from 30 to 80 in steps of 5.]
[Figure: Relative Frequency Plot. Relative Frequency (0 to 0.3) versus X Value (Bin), bins from 30 to 80 in steps of 5.]
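As a rough sketch of how the two plots relate, the counts in each bin can be divided by the total number of measurements to get the relative frequency. The function name `relative_frequency`, the bin width of 5 and the ten sample values below are assumptions made for illustration only.

```python
import math
from collections import Counter


def relative_frequency(values, bin_width):
    """Return (histogram, relative frequency), both keyed by the lower bin edge."""
    # Assign each value to the bin whose lower edge is the nearest multiple below it.
    counts = Counter(bin_width * math.floor(v / bin_width) for v in values)
    histogram = dict(sorted(counts.items()))
    total = len(values)
    rel_freq = {edge: count / total for edge, count in histogram.items()}
    return histogram, rel_freq


# Hypothetical measurements.
data = [42, 47, 51, 53, 54, 56, 58, 61, 63, 68]
histogram, rel_freq = relative_frequency(data, bin_width=5)
for edge in histogram:
    print(f"[{edge}, {edge + 5}): count = {histogram[edge]}, relative = {rel_freq[edge]:.2f}")
```

The relative frequencies always sum to 1, which is what lets them approach a probability density as the number of measurements grows.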
Slide 9
Probability Distribution (Population Based)
• Probability Density Function (pdf), p(x)
  – Describes the probability distribution of all possible measures of x.
  – Limiting case of the relative frequency.
• Probability Distribution Function, P(x)
  – The probability distribution function is the integral of the pdf, i.e.
    $P(x) = P[X \le x] = \int_{-\infty}^{x} p(x')\,dx'$, the probability that $X \le x$.
Q: Plot the probability distribution function vs x.
Q: What is the maximum value of P(x)?
[Figure: Probability Density Function. Probability per unit change in x (0 to 0.3) versus x Value (Bin), from 30 to 80 in steps of 5.]
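The integral relationship between the pdf and the probability distribution function can be checked numerically. The sketch below is an illustration under stated assumptions: it uses trapezoidal integration, and the example pdf (a Gaussian with mean 55 and standard deviation 5) is hypothetical, as are the function names.

```python
import math


def cumulative_distribution(pdf, x_min, x_max, steps=2000):
    """Integrate pdf from x_min upward with the trapezoidal rule.

    Returns the grid xs and the running integral P, so P[i] approximates
    the probability that X <= xs[i], provided almost no probability lies
    below x_min.
    """
    dx = (x_max - x_min) / steps
    xs = [x_min + i * dx for i in range(steps + 1)]
    P = [0.0]
    for i in range(1, steps + 1):
        P.append(P[-1] + 0.5 * (pdf(xs[i - 1]) + pdf(xs[i])) * dx)
    return xs, P


def example_pdf(x, mu=55.0, sigma=5.0):
    """Hypothetical Gaussian pdf used only to exercise the integration."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


xs, P = cumulative_distribution(example_pdf, 25.0, 85.0)
print(f"P at x = {xs[1000]:.1f}: {P[1000]:.4f}")
print(f"P at x = {xs[-1]:.1f}: {P[-1]:.4f}")
```

Because the integration range covers six standard deviations on each side of the mean, the running integral rises monotonically from 0 toward 1.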
Slide 10
Probability Density Function
– The probability that a measurement X takes a value in $(-\infty, \infty)$ is 1:
  $\int_{-\infty}^{\infty} p(x)\,dx = 1$
– Every pdf satisfies the above property.
Q: Given a pdf, how would one find the probability that a measurement is between A and B?
Ex: p(x) = [expression garbled in the transcript; it is an exponential in x with constants A and B] is a probability density function. Find the relationship between A and B.
Hint: $\int_{0}^{\infty} e^{-a x^2}\,dx = \frac{1}{2}\sqrt{\pi / a}$
Slide 11
Common Statistical Distributions
• Gaussian (Normal) Distribution
  $p(x) = \frac{1}{\sigma_x \sqrt{2\pi}}\, e^{-\frac{(x - \mu_x)^2}{2\sigma_x^2}}$
  where:
  x = measured value
  $\mu_x$ = true (mean) value
  $\sigma_x$ = standard deviation
  $\sigma_x^2$ = variance
Q: What are the two parameters that define a Gaussian distribution?
Q: How would one calculate the probability of a Gaussian distribution between $x_1$ and $x_2$? (See Chapter 4, Appendix A)
[Figure: Gaussian pdf p(x) versus x Value.]
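One way to get the probability between two values without a lookup table is the error function, which gives the Gaussian cumulative probability in closed form. This is a minimal sketch; the function names are illustrative, and `math.erf` stands in for the table in Appendix A.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Probability that a Gaussian variable with mean mu and std sigma is <= x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_probability(x1, x2, mu, sigma):
    """Probability that the variable falls between x1 and x2."""
    return gaussian_cdf(x2, mu, sigma) - gaussian_cdf(x1, mu, sigma)


# Probability of falling within one and within three standard deviations of the mean.
p_one_sigma = gaussian_probability(-1.0, 1.0, mu=0.0, sigma=1.0)
p_three_sigma = gaussian_probability(-3.0, 3.0, mu=0.0, sigma=1.0)
print(f"within 1 sigma: {p_one_sigma:.4f}")
print(f"within 3 sigma: {p_three_sigma:.4f}")
```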
Slide 12
Common Statistical Distributions
• Uniform Distribution
  $p(x) = \frac{1}{x_2 - x_1}$ for $x_1 \le x \le x_2$, and $p(x) = 0$ otherwise
  where:
  x = measured value
  $x_1$ = lower limit
  $x_2$ = upper limit
Q: Why do $x_1$ and $x_2$ also define the magnitude of the uniform distribution pdf?
[Figure: uniform pdf p(x) versus x Value.]
Slide 13
Common Statistical Distributions
Ex: A voltage measurement has a Gaussian distribution with mean 3.4 [V] and a standard deviation of 0.4 [V]. Using Chapter 4, Appendix A, calculate the probability that a measurement is between:
(a) [2.98, 3.82] [V]
(b) [2.4, 4.02] [V]
Ex: The quantization error of an ADC has a uniform distribution in the quantization interval Q. What is the probability that the actual input voltage is within Q/8 of the estimated input voltage?
Slide 14
Statistical Analysis
• Standard Deviation ($\sigma_x$ and $S_x$)
  – Characterize the typical deviation of measurements from the mean and the width of the Gaussian distribution (bell curve).
  – Smaller $\sigma_x$ implies better ______________.
  – Population Based:
    $\sigma_x = \left[\int_{-\infty}^{\infty} (x - \mu_x)^2\, p(x)\,dx\right]^{1/2}$
  – Sample Based (N samples):
    $S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)^2}$
Q: Often we do not know $\mu_x$; how should we calculate $S_x$?
Slide 15
Statistical Analysis
• Standard Deviation ($\sigma_x$ and $S_x$) (cont.)
  Interval considered: $\mu_x - Z\sigma_x \le x \le \mu_x + Z\sigma_x$

Common Name for "Error" Level | Error Level in Terms of $\sigma_x$ | % That the Deviation from the Mean is Smaller | Odds That the Deviation is Greater
Standard Deviation | ±1 $\sigma_x$ | 68.3 | about 1 in 3
"Two-Sigma Error" | ±2 $\sigma_x$ | 95 | 1 in 20
"Three-Sigma Error" | ±3 $\sigma_x$ | 99.7 | 1 in 370
"Four-Sigma Error" | ±4 $\sigma_x$ | 99.994 | 1 in 16,000
Slide 16
Statistical Analysis
• Sampled Mean $\bar{x}$ is the best estimate of $\mu_x$.
  – True mean: $\mu_x = E[X] = \int_{-\infty}^{\infty} x\, p(x)\,dx$
  – Best Estimate: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} X_i$
• Sampled Standard Deviation ($S_x$)
  – Use when $\sigma_x$ is not available.
  – When $\mu_x$ is known: $S_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)^2}$
  – When $\mu_x$ is not known, reduce by one Degree of Freedom: $S_x = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{x})^2}$
Q: If the sampled mean is only an estimate of the “true mean” $\mu_x$, how do we characterize its error?
Q: If we take another set of samples, will we get a different sampled mean?
Q: If we take many more sample sets, what will be the statistics of the set of sampled means?
Slide 17
Statistical Analysis
Ex: The inlet pressure of a steam generator was measured 100 times during a 12 hour period. The specified inlet pressure is 4.00 MPa, with 0.7% allowable fluctuation. The measured data is summarized in the following table:

Pressure (P) (MPa) | Number of Results (m)
3.970 | 1
3.980 | 3
3.990 | 12
4.000 | 25
4.010 | 33
4.020 | 17
4.030 | 6
4.040 | 2
4.050 | 1

(1) Calculate the mean, variance and standard deviation.
(2) Given the data, what pressure range will contain 95% of the data?
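Part (1) can be sketched as a frequency-weighted calculation over the pressure table. The snippet below is one possible approach, not the course's worked solution: the function name `grouped_statistics` is illustrative, and it assumes the sample variance with the N − 1 divisor.

```python
import math

# Pressure values (MPa) and the number of times each was observed.
pressures_mpa = [3.970, 3.980, 3.990, 4.000, 4.010, 4.020, 4.030, 4.040, 4.050]
counts = [1, 3, 12, 25, 33, 17, 6, 2, 1]


def grouped_statistics(values, counts):
    """Mean, sample variance and sample standard deviation of grouped data."""
    n = sum(counts)
    mean = sum(v * m for v, m in zip(values, counts)) / n
    # Each squared deviation is weighted by how often the value occurred.
    variance = sum(m * (v - mean) ** 2 for v, m in zip(values, counts)) / (n - 1)
    return n, mean, variance, math.sqrt(variance)


n, mean_p, var_p, std_p = grouped_statistics(pressures_mpa, counts)
print(f"N = {n}")
print(f"mean = {mean_p:.4f} MPa")
print(f"variance = {var_p:.3e} MPa^2")
print(f"standard deviation = {std_p:.4f} MPa")
```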
Slide 18
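Confidence Interval
• Sampled Mean Statistics
  – If N is large, $\bar{x}$ will also have a Gaussian distribution. (Central Limit Theorem)
  – Mean of $\bar{x}$: $\mu_{\bar{x}} = E[\bar{x}] = \mu_x$, so $\bar{x}$ is an unbiased estimate.
  – Standard Deviation of $\bar{x}$: $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$, so $\sigma_{\bar{x}}$ is the best estimate of the error in estimating $\mu_x$.
Q: Since we don’t know $\sigma_x$, how would we calculate $\sigma_{\bar{x}}$?
[Figure: sketches of the pdfs p(x) and p($\bar{x}$).]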
Slide 19
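Confidence Interval
• For Large Samples (N > 60), Q% of all the sampled means $\bar{x}$ will lie in the interval
  $\mu_x - z_Q \sigma_{\bar{x}} \le \bar{x} \le \mu_x + z_Q \sigma_{\bar{x}}$, with $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}$
  Equivalently,
  $\bar{x} - z_Q \frac{\sigma_x}{\sqrt{N}} \le \mu_x \le \bar{x} + z_Q \frac{\sigma_x}{\sqrt{N}}$
  is the Q% Confidence Interval.
  When $\sigma_x$ is unknown, $S_x$ will be a reasonable approximation.
[Figure: pdf p($\bar{x}$) with the interval from $-z_Q \sigma_{\bar{x}}$ to $+z_Q \sigma_{\bar{x}}$ about the mean marked.]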
Slide 20
Confidence Interval
Ex: 64 acceleration measurements were taken during an experiment. The estimated mean and standard deviation of the measurements were 3.15 m/s² and 0.4 m/s².
(1) Find the 98% confidence interval for the true mean.
(2) How confident are you that the true mean will be in the range from 2.85 to 3.45 m/s²?
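For part (1), the large-sample interval can be sketched with the standard library's `statistics.NormalDist`, which supplies the z value in place of a table. The function name `large_sample_ci` is an assumption for illustration, and the sampled standard deviation is used in place of the unknown population value.

```python
import math
from statistics import NormalDist


def large_sample_ci(mean, std, n, confidence):
    """Two-sided confidence interval for the true mean, large-sample (Gaussian) case."""
    # z such that the central `confidence` fraction of a standard normal lies in [-z, z].
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = large_sample_ci(mean=3.15, std=0.4, n=64, confidence=0.98)
print(f"98% confidence interval: [{low:.3f}, {high:.3f}] m/s^2")
```

Part (2) runs the same relation in reverse: the half-width 0.30 m/s² divided by the standard deviation of the mean gives z, and the normal distribution gives the confidence for that z.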
Slide 21
Confidence Interval
• For Small Samples (N < 60), the Q% Confidence Interval can be calculated using the Student's t distribution, which is similar to the normal distribution but depends on N.
  – With Q% confidence, the true mean $\mu_x$ will lie in the following interval about any sampled mean:
    $\bar{x} - t_{\nu,Q}\, S_{\bar{x}} \le \mu_x \le \bar{x} + t_{\nu,Q}\, S_{\bar{x}}$ (Q% confidence interval)
    where $S_{\bar{x}} = \frac{S_x}{\sqrt{N}}$ and $\nu = N - 1$.
  $t_{\nu,Q}$ is defined in class notes Chapter 4, Appendix B.
Slide 22
Confidence Interval
Ex: A simple postal scale is supplied with ½, 1, 2, and 4 oz brass weights. For quality check, 14 of the 1 oz weights were measured on a precision scale. The results, in oz, are as follows:
1.08 1.03 0.96 0.95 1.04
1.01 0.98 0.99 1.05 1.08
0.97 1.00 0.98 1.01
Based on this sample, and assuming that the parent population of the weights is normally distributed, what is the 95% confidence interval for the “true” weight of the 1 oz brass weights?
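The small-sample interval for this data set can be sketched as follows. The t value is not computed here: 2.160 is the commonly tabulated two-sided 95% value for 13 degrees of freedom, entered by hand as an assumption, and the variable names are illustrative.

```python
import math
import statistics

weights_oz = [1.08, 1.03, 0.96, 0.95, 1.04,
              1.01, 0.98, 0.99, 1.05, 1.08,
              0.97, 1.00, 0.98, 1.01]

n = len(weights_oz)
x_bar = statistics.mean(weights_oz)
s_x = statistics.stdev(weights_oz)   # sample standard deviation, N - 1 divisor
s_mean = s_x / math.sqrt(n)          # standard deviation of the sampled mean

# Assumed table value: Student's t for nu = N - 1 = 13, 95% two-sided.
T_13_95 = 2.160

low = x_bar - T_13_95 * s_mean
high = x_bar + T_13_95 * s_mean
print(f"mean = {x_bar:.4f} oz, S_x = {s_x:.4f} oz")
print(f"95% confidence interval: [{low:.3f}, {high:.3f}] oz")
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 13)` should give the same t value without a table.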
Slide 23
Propagation of Error
Q: If you measured the diameter (D) and height (h) of a cylindrical container, how would the measurement error affect your estimation of the volume ( V = πD²h/4 )?
Q: What is the uncertainty in calculating the kinetic energy ( mv²/2 ) given the uncertainties in the measurements of mass (m) and velocity (v)?
How do errors propagate through calculations?
Slide 24
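Propagation of Error
• A Simple Example
  Suppose that y is related to two independent quantities $X_1$ and $X_2$ through
    $y = C_1 X_1 + C_2 X_2 = f(X_1, X_2)$
  To relate the changes in y to the uncertainties in $X_1$ and $X_2$, we need to find $dy = g(dX_1, dX_2)$:
    $dy = \frac{\partial f}{\partial X_1}\,dX_1 + \frac{\partial f}{\partial X_2}\,dX_2 = C_1\,dX_1 + C_2\,dX_2$
  The magnitude of dy is the expected change in y due to the uncertainties in $x_1$ and $x_2$:
    $\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\sigma_{x_2}\right)^2} = \sqrt{C_1^2 \sigma_{x_1}^2 + C_2^2 \sigma_{x_2}^2}$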
Slide 25
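Propagation of Error
• General Formula
  Suppose that y is related to n independent measured variables $\{X_1, X_2, \ldots, X_n\}$ by a functional representation:
    $y = f(X_1, X_2, \ldots, X_n)$
  Given the uncertainties of the X's around some operating points:
    $\bar{x}_1 \pm \sigma_{x_1},\ \bar{x}_2 \pm \sigma_{x_2},\ \ldots,\ \bar{x}_n \pm \sigma_{x_n}$
  The expected value $\bar{y}$ and its uncertainty $\sigma_y$ are:
    $\bar{y} = f(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$
    $\sigma_y = \sqrt{\left(\frac{\partial f}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial f}{\partial X_2}\sigma_{x_2}\right)^2 + \cdots + \left(\frac{\partial f}{\partial X_n}\sigma_{x_n}\right)^2}$
  with each partial derivative evaluated at $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$.
Slide 26
Propagation of Error
• Proof:
  Assume that the variability in measurement y is caused by k independent zero-mean error sources: $e_1, e_2, \ldots, e_k$. Then,
    $(y - y_{true})^2 = (e_1 + e_2 + \cdots + e_k)^2 = e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots$
    $E[(y - y_{true})^2] = E[e_1^2 + e_2^2 + \cdots + e_k^2 + 2e_1e_2 + 2e_1e_3 + \cdots] = E[e_1^2 + e_2^2 + \cdots + e_k^2]$
  The cross terms drop out because the error sources are independent and zero-mean, so $E[e_i e_j] = E[e_i]E[e_j] = 0$ for $i \ne j$. Therefore,
    $\sigma_y^2 = E[e_1^2] + E[e_2^2] + \cdots + E[e_k^2] = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$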
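The general propagation formula can be sketched numerically by estimating each partial derivative with a central finite difference. Everything specific below is an assumption for illustration: the helper name `propagate_uncertainty`, the step size, and the cylinder dimensions and uncertainties.

```python
import math


def propagate_uncertainty(f, values, sigmas, rel_step=1e-6):
    """Return f(values) and its uncertainty for independent inputs.

    Each partial derivative is approximated by a central difference
    around the operating point, then combined in quadrature.
    """
    variance = 0.0
    for i, (x, sigma) in enumerate(zip(values, sigmas)):
        step = rel_step * max(abs(x), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = x + step
        lower[i] = x - step
        dfdx = (f(*upper) - f(*lower)) / (2.0 * step)
        variance += (dfdx * sigma) ** 2
    return f(*values), math.sqrt(variance)


def cylinder_volume(diameter, height):
    return math.pi * diameter ** 2 * height / 4.0


# Hypothetical measurements: D = 0.10 m +/- 0.001 m, h = 0.20 m +/- 0.002 m.
volume, sigma_volume = propagate_uncertainty(cylinder_volume, [0.10, 0.20], [0.001, 0.002])
print(f"V = {volume:.6e} m^3 +/- {sigma_volume:.2e} m^3")
```

For this volume formula the analytic result is a relative uncertainty of sqrt((2σ_D/D)² + (σ_h/h)²), which the numerical estimate should reproduce closely.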
Slide 27
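Propagation of Error
• Example (Standard Deviation of Sampled Mean)
  Given
    $\bar{x} = \frac{1}{N}(X_1 + X_2 + X_3 + \cdots + X_N)$
  Use the general formula for error propagation, with $\frac{\partial \bar{x}}{\partial X_i} = \frac{1}{N}$ and $\sigma_{x_i} = \sigma_x$ for every measurement:
    $\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial X_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_2}\sigma_{x_2}\right)^2 + \left(\frac{\partial \bar{x}}{\partial X_3}\sigma_{x_3}\right)^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial X_N}\sigma_{x_N}\right)^2} = \sqrt{N\,\frac{\sigma_x^2}{N^2}} = \frac{\sigma_x}{\sqrt{N}}$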
Slide 28
Propagation of Error
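Ex: What is the uncertainty in calculating the kinetic energy ( mv²/2 ) given the uncertainties in the measurements of mass (m) and velocity (v)?
  $\sigma_{KE} = \sqrt{\left(\frac{\partial KE}{\partial m}\sigma_m\right)^2 + \left(\frac{\partial KE}{\partial v}\sigma_v\right)^2}$
  $\quad = \sqrt{\left(\frac{1}{2}mv^2\,\frac{\sigma_m}{m}\right)^2 + \left(\frac{1}{2}mv^2\,\frac{2\sigma_v}{v}\right)^2}$
  $\quad = \frac{1}{2}mv^2\sqrt{\left(\frac{\sigma_m}{m}\right)^2 + \left(\frac{2\sigma_v}{v}\right)^2}$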
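A short numerical check of the kinetic energy result is sketched below. The function name and the mass, velocity and uncertainty values are hypothetical; the formula is the quadrature sum of the relative mass uncertainty and twice the relative velocity uncertainty.

```python
import math


def kinetic_energy_uncertainty(m, v, sigma_m, sigma_v):
    """Return KE = m v^2 / 2 and its propagated uncertainty."""
    ke = 0.5 * m * v ** 2
    # Velocity enters squared, so its relative uncertainty counts twice.
    relative = math.sqrt((sigma_m / m) ** 2 + (2.0 * sigma_v / v) ** 2)
    return ke, ke * relative


# Hypothetical values: m = 2.0 kg +/- 0.02 kg, v = 3.0 m/s +/- 0.03 m/s.
ke, sigma_ke = kinetic_energy_uncertainty(2.0, 3.0, 0.02, 0.03)
print(f"KE = {ke:.2f} J +/- {sigma_ke:.2f} J")
```

With both inputs known to 1%, the velocity term contributes twice the relative uncertainty of the mass term.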
Slide 29
Least Squares Fitting of Data
• Best Linear Fit
  – How do we characterize “BEST”?
  Fit a linear model (relation)
    $\hat{y}_i = a_o + a_1 x_i$
  to N pairs of $[x_i, y_i]$ measurements.
  Given $x_i$, the error between the estimated output $\hat{y}_i$ and the measured output $y_i$ is:
    $n_i = y_i - \hat{y}_i$
  The “BEST” fit is the model that minimizes the sum of the ___________ of the error:
    $\min \sum_{i=1}^{N} n_i^2 = \min \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ (Least Square Error)
[Figure: Output Y versus Input X, showing the measured outputs $y_i$ and the best linear fit $y_{est}$.]
Slide 30
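Least Squares Fitting of Data
Let
  $J = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} (y_i - a_o - a_1 x_i)^2$
The two independent variables are?
Q: What are we trying to solve?
Minimize J: Find $a_o$ and $a_1$ such that $dJ = 0$:
  $\frac{\partial J}{\partial a_o} = -2\sum_{i=1}^{N} (y_i - a_o - a_1 x_i) = 0$
  $\frac{\partial J}{\partial a_1} = -2\sum_{i=1}^{N} x_i\,(y_i - a_o - a_1 x_i) = 0$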
Slide 31
Least Squares Fitting of Data
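Rewrite the last two equations as two simultaneous equations for $a_o$ and $a_1$:
  $a_o N + a_1 \sum x_i = \sum y_i$
  $a_o \sum x_i + a_1 \sum x_i^2 = \sum x_i y_i$
or, in matrix form,
  $\begin{bmatrix} N & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{bmatrix} a_o \\ a_1 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$
Solving gives
  $a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{\Delta}$
  $a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{\Delta}$
where
  $\Delta = N \sum x_i^2 - \left(\sum x_i\right)^2$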
Slide 32
Least Squares Fitting of Data
• Summary: Given N pairs of input/output measurements $[x_i, y_i]$, the best linear Least Squares model from input $x_i$ to output $y_i$ is:
    $\hat{y}_i = a_o + a_1 x_i$
  where
    $a_o = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2}$ and $a_1 = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2}$
• The process of minimizing squared error can be used for fitting nonlinear models and in many engineering applications.
• The same result can also be derived from a probability distribution point of view (see Course Notes, Ch. 4 - Maximum Likelihood Estimation).
Q: Given a theoretical model $y = a_o + a_2 x^2$, what are the Least Squares estimates for $a_o$ and $a_2$?
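The closed-form estimates for the intercept and slope can be sketched directly from the sums. The function name `linear_least_squares` and the four data points are made up for illustration; the points lie exactly on y = 1 + 2x so the result is easy to check by hand.

```python
def linear_least_squares(xs, ys):
    """Return (a0, a1) minimizing the sum of squared errors of y = a0 + a1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Common denominator of both estimates; zero when all x values are equal.
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("x values must not all be identical")
    a0 = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a1 = (n * sum_xy - sum_x * sum_y) / delta
    return a0, a1


# Hypothetical data lying exactly on y = 1 + 2x.
a0, a1 = linear_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(f"a0 = {a0}, a1 = {a1}")
```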
Slide 33
Least Squares Fitting of Data
• Variance of the fit:
    $\sigma_n^2 = \frac{1}{N-2}\sum_{i=1}^{N} (y_i - a_o - a_1 x_i)^2$
• Variance of the measurements in y: $\sigma_y^2$
• Assume measurements in x are precise.
• Correlation coefficient:
    $R^2 = 1 - \frac{\sigma_n^2}{\sigma_y^2}$
  is a measure of how well the model explains the data. $R^2 = 1$ implies that the linear model fits the data perfectly.
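The fit quality measures can be sketched as below. Two things are assumptions rather than statements from the slides: the variance of the y measurements is taken as the sample variance with the N − 1 divisor, and the function name `fit_quality` and the data points are invented for illustration.

```python
def fit_quality(xs, ys):
    """Fit y = a0 + a1 * x and return (a0, a1, variance of fit, variance of y, R^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sum_xx - sum_x ** 2
    a0 = (sum_xx * sum_y - sum_x * sum_xy) / delta
    a1 = (n * sum_xy - sum_x * sum_y) / delta

    # Variance of the fit: squared residuals divided by N - 2
    # (two parameters were estimated from the data).
    var_fit = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)

    # Assumed definition: sample variance of the measured y values.
    y_mean = sum_y / n
    var_y = sum((y - y_mean) ** 2 for y in ys) / (n - 1)

    return a0, a1, var_fit, var_y, 1.0 - var_fit / var_y


# Hypothetical noisy data close to y = 2x.
a0, a1, var_fit, var_y, r_squared = fit_quality([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, R^2 = {r_squared:.4f}")
```

Data that fall exactly on a line give a fit variance of zero and therefore R² = 1; scatter around the line pulls R² below 1.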