7/30/2019 INTRODUCCION ESTADISTICO
Introductory Statistics
By Peter Woolf ([email protected])
University of Michigan
Michigan Chemical Process
Dynamics and Controls
Open Textbook
version 1.0
Creative commons
A foolish consistency is the hobgoblin of little
minds
--R. W. Emerson
But is this always true?
Consistency
Why might we want consistency?
Integration of products within a larger
system
Examples: want parts to fit together, want
consistent chemical feeds, want consistent
material properties, want consistent energy
content, want consistent flavor
Consistency
What can be the downsides of consistency?
You can make something that is consistent, but consistently bad.
Sometimes people trade consistency for quality; this is not the goal.
Examples: fast food vs. homemade food
(depends on the cook)
Measures of Quality (or lack thereof):
Six Sigma: Number of defects per million
opportunities
Genichi Taguchi: Uniformity around a target value,
or the loss a product imposes on society after it
is shipped
Process control is a central tool for reducing
variability by adjusting and correcting for
variations.
Key Questions: How can we know if our
control system is working well enough?
How can we measure variability?
Process Specific Questions
1) Do recent data indicate that the
process is broken or changed?
2) Is the process out of control?
3) What are the odds that two samples
come from the same distribution?
4) What factors influence this outcome?
Detecting if a process has
changed
Scenario: You are a small Acai juice
vendor trying to expand to a world
market with a consistent product.
Acai juice production
[Diagram: acai berries in the market -> berry crusher -> juice]
Acai juice production
A key selling point of your acai
juice is that it contains a large
concentration of antioxidants.
With your berry crusher you get a
good quality product most of the time,
but not always. You don't want to
waste berries if your crusher is
hurting your product, but how can
you know if it is not working right?
How can you test this?
Acai juice production
How can you test this?
1) Gather many samples from your
current process and measure the
antioxidant concentration
N sample values: 40.1, 41.3, 44.3, 39.3,
38.6, ...
How do we summarize this?
Average:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Deviation from the average (std deviation):
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}$$
Interpretation: The average distance from the mean
OR the width of the dispersion around the mean
Problem: What if I have only one sample (i.e., N=1)?
$\sigma = 0$!!
Does this mean that the underlying process has no
variation or that I have not sampled it sufficiently?
Result: When N is small, the standard deviation will
underestimate the true variation
Solution: sample standard deviation
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}$$
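A minimal sketch of the two standard deviation formulas, using only the five sample values listed earlier (the rest of the data set is not shown, so the numbers are illustrative). The variable names are assumptions made for this example.

```python
import math

# The five antioxidant values shown on the slide (the remaining samples are elided in the source).
samples = [40.1, 41.3, 44.3, 39.3, 38.6]

n = len(samples)
mean = sum(samples) / n

# Sum of squared deviations from the average.
ss = sum((x - mean) ** 2 for x in samples)

pop_std = math.sqrt(ss / n)           # population form: divide by N
sample_std = math.sqrt(ss / (n - 1))  # sample form: divide by N - 1

print(f"mean = {mean:.2f}")
print(f"population std = {pop_std:.3f}")
print(f"sample std     = {sample_std:.3f}")
```

With only five values the sample form is noticeably larger than the population form; the gap shrinks as N grows.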
Population standard deviation (real deviation):
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}$$
Sample standard deviation (observed deviation):
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\mu\right)^2}$$
With a measure of the mean and standard deviation, you
have enough information to define a Gaussian distribution:
a bell curve shape based on a model of a large number of random,
uncorrelated changes
Gaussian or Normal Distribution:
From previous lecture on Noise:
Approximate a Gaussian distribution in Excel by:
=RAND()+RAND()+RAND()-RAND()-RAND()-RAND()
The approximation gets better and better for larger numbers
of pairs of add and subtract
The Gaussian distribution is the basis of much of statistical
quality control, six sigma, and quality engineering in general.
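The Excel recipe can be mimicked in Python as a rough check. This sketch assumes three add/subtract pairs, as in the formula, and compares the spread of the result with the value expected from combining six uniform random numbers; the function name and seed are illustrative.

```python
import random
import statistics

def approx_gaussian(pairs=3, rng=random):
    """Add `pairs` uniform draws and subtract `pairs` more, like the Excel formula."""
    added = sum(rng.random() for _ in range(pairs))
    subtracted = sum(rng.random() for _ in range(pairs))
    return added - subtracted

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [approx_gaussian(3, rng) for _ in range(100_000)]

approx_mean = statistics.fmean(draws)
approx_std = statistics.pstdev(draws)

# Each uniform draw has variance 1/12, so six of them give variance 6/12 = 0.5.
expected_std = (6 / 12) ** 0.5

print(f"mean = {approx_mean:.3f}, std = {approx_std:.3f}, expected std = {expected_std:.3f}")
```

Plotting a histogram of `draws` should show the familiar bell shape centered on zero.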
[Figure: bell curve with -3σ, -2σ, +2σ, and +3σ marked; the window from -3σ to +3σ is 6σ wide]
How do we mathematically
define a normal distribution?
The mean and standard deviation are sufficient statistics, meaning that they are sufficient to
describe a normal distribution.
Mathematically, we can describe a normal distribution by the following probability
density function (PDF):
$$\mathrm{PDF}(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
If we want to find the probability up to some point, say z or less, we can just integrate:
$$\int_{-\infty}^{z} \mathrm{PDF}(x \mid \mu, \sigma)\,dx = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{z-\mu}{\sigma\sqrt{2}}\right)\right]$$
(Note: this just turns one hard problem into another, in that now we have to calculate the
error function)
The error function is defined as:
$$\operatorname{erf}(x) \equiv \frac{2}{\sqrt{\pi}}\int_{0}^{x}\exp\left(-t^2\right)\,dt$$
How can we calculate this?
Excel:
The error function is Erf(), thus the solution above could be
expressed as
=1/2*(1+erf((z-m)/(s*sqrt(2))))
Mathematica:
NIntegrate[f[x], {x, start, end}]   (general numerical integration)
Or
N[1/2*(1+Erf[(z-m)/(s*Sqrt[2])])]   (using the analytical solution with the error function)
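For readers without Excel or Mathematica, here is a hedged Python sketch of the same two routes: a general numerical integration of the PDF (a simple trapezoid rule, standing in for NIntegrate) and the analytical solution with the error function. The function names, the step count, and the 10 sigma lower cutoff are assumptions made for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_erf(z, mu, sigma):
    """Probability of a value of z or less, from the error-function formula."""
    return 0.5 * (1 + math.erf((z - mu) / (sigma * math.sqrt(2))))

def normal_cdf_numeric(z, mu, sigma, steps=20_000, lower_sigmas=10):
    """Trapezoid-rule integral of the PDF from mu - lower_sigmas*sigma up to z.

    The true lower limit is minus infinity; truncating at 10 sigma is an
    assumption that drops a negligible amount of area.
    """
    a = mu - lower_sigmas * sigma
    if z <= a:
        return 0.0
    h = (z - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(z, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

analytic = normal_cdf_erf(1.0, 0.0, 1.0)
numeric = normal_cdf_numeric(1.0, 0.0, 1.0)
print(f"erf formula: {analytic:.6f}, numerical integration: {numeric:.6f}")
```

The two results should agree to several decimal places; the error-function route is far cheaper, which is why it is preferred when an analytical form exists.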
Acai juice problem revisited
From 100 samples of the
current process we calculate
the following:
Mean = 40 units
Standard deviation = 2 units
From these data, what are the
odds that the next batch will
have an antioxidant value of
37.5 or less?
$$\int_{-\infty}^{37.5} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{37.5-\mu}{\sigma\sqrt{2}}\right)\right]$$
In Mathematica:
shorthand notation
Answer: we expect this situation ~10% of the time (about 0.106)
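As a Python analogue (an assumption; the original slide shows Mathematica), the standard library's statistics.NormalDist gives the same cumulative probability directly:

```python
from statistics import NormalDist

# Values from the slide: mean 40 units, standard deviation 2 units.
process = NormalDist(mu=40, sigma=2)

# Probability that the next batch comes in at 37.5 or less.
p_low = process.cdf(37.5)
print(f"P(x <= 37.5) = {p_low:.4f}")
```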
Example 1:
Say that we have a reactor with a temperature mean of 100 and a standard deviation of 5
degrees. Calculate the probability of measuring a temperature of 92 or less.
$$\int_{-\infty}^{92} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{92-100}{5\sqrt{2}}\right)\right] = \frac{1}{2}\left[1+\operatorname{erf}(-1.13)\right] = 0.055$$
What about 100 or less? -> 0.5
Example 2:
Given this same system, what is the probability that the reactor is within a 4 sigma window around the
mean (i.e., +/- 2 sigma, or +/- 10 degrees)?
$$\int_{-\infty}^{110} \mathrm{PDF}(x \mid 100, 5)\,dx - \int_{-\infty}^{90} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{110-100}{5\sqrt{2}}\right)\right] - \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{90-100}{5\sqrt{2}}\right)\right] = 0.9545$$
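Both examples can be checked with a short sketch built on the error-function formula. The helper names normal_cdf and prob_between are hypothetical, introduced only for this illustration.

```python
import math

def normal_cdf(z, mu, sigma):
    """Probability of observing z or less for a normal distribution."""
    return 0.5 * (1 + math.erf((z - mu) / (sigma * math.sqrt(2))))

def prob_between(low, high, mu, sigma):
    """Probability of landing between low and high: difference of two cumulative values."""
    return normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma)

mu, sigma = 100.0, 5.0  # reactor temperature mean and standard deviation

p_92 = normal_cdf(92, mu, sigma)             # Example 1
p_100 = normal_cdf(100, mu, sigma)           # "100 or less"
p_window = prob_between(90, 110, mu, sigma)  # Example 2

print(f"P(T <= 92)        = {p_92:.4f}")
print(f"P(T <= 100)       = {p_100:.4f}")
print(f"P(90 <= T <= 110) = {p_window:.4f}")
```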
[Figure: acai juice production as a function of time; antioxidant value vs. time]
Is this process out of control?
Yes: It is unusual to see so many
batches with such a high value.
This is strange and suggests
something has changed.
No: This is just normal variation.
Nothing is fundamentally different.
Key question:
How do we define "unusual"?
One definition: Variation
outside of the six sigma
window is unusual
[Figure: normal distribution curve with -3σ, -2σ, +2σ, and +3σ marked; the window from -3σ to +3σ is 6σ wide]
What are the odds of finding
something that falls out of this
bound by chance?
Find by integration!
For both tails the probability is ~0.0027
or 1 in 370
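Common confusion:
The Six Sigma process
defines unusual as 3.4 defects
out of 1 million, which is not the same as falling outside a window 6
standard deviations wide (counting both tails of a normal distribution,
3.4 per million corresponds to a window about 9.3 standard deviations wide)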
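The two-tail figure of ~0.0027 can be reproduced with a few lines of Python. This is a sketch assuming a perfectly normal process; two_tail_probability is a name made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_tail_probability(k):
    """Chance that a normal value lands more than k standard deviations from the mean."""
    return 2 * std_normal.cdf(-k)

p_outside = two_tail_probability(3)
print(f"P(outside +/-3 sigma) = {p_outside:.4f}, about 1 in {1 / p_outside:.0f}")
```

The same helper can be evaluated at other values of k to see how quickly the tails shrink as the window widens.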
Is this process out of control?
Translation: if we assume that variation outside of the 6 sigma window is
unusual, is this pattern expected to happen in fewer than 1 in
370 of our samples?
Solution: Control charts!
Image from wikipedia western_electric_rules
Control charts determine if a process is behaving in an unusual
way.
What are the odds?
UCL = upper control limit
X-bar = average
LCL = lower control limit
Rule 1: If each dot is a single measurement and the UCL is +3 sigma, then
for both tails the probability is ~0.0027,
or 1 in 370
Rule 2: Can do using probability theory.
Assuming each sample is independent, we can find the
total probability of:
2*[P1(out+)P2(out+)P3(out+) + P1(out+)P2(out+)P3(in) + P1(out+)P2(in)P3(out+) + P1(in)P2(out+)P3(out+)]
where each Pi(out+) = P(out+) and P(in) = 1 - P(out+)
= 0.00305
or 1 in 326
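A small sketch of this calculation, assuming that "out+" means a point beyond +2 sigma and that the leading factor of 2 accounts for the mirror-image case below the mean (both are readings of the slide, not stated in it):

```python
from itertools import product
from statistics import NormalDist

# Assumption: "out+" means a point beyond +2 sigma on the upper side.
p_out = 1 - NormalDist().cdf(2)
p_in = 1 - p_out

# Enumerate all in/out patterns for three independent samples and keep
# those with at least two points out on the upper side.
p_one_side = 0.0
for pattern in product([True, False], repeat=3):
    if sum(pattern) >= 2:
        p = 1.0
        for is_out in pattern:
            p *= p_out if is_out else p_in
        p_one_side += p

# The factor of 2 covers the mirror-image case on the lower side.
p_rule2 = 2 * p_one_side
print(f"P(rule 2) = {p_rule2:.5f}, about 1 in {1 / p_rule2:.0f}")
```

Enumerating the patterns explicitly mirrors the four terms in the bracket; a closed form with binomial coefficients would give the same number.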
What are the odds? Alternative solution by sampling
Approach: Generate
thousands of samples
and test to see how
many satisfy the rule
Rule 1: the sampled result is similar to 1 in 370
Rule 2: the sampled result is similar to 1 in 326
Message: Many complex decision
processes can be evaluated
numerically with good accuracy
(see Mathematica code on
website under Lecture 21.nb)
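One way the sampling approach might look in Python (the lecture's own code is in Mathematica and is not reproduced here). The sketch assumes Rule 1 means a single point beyond 3 sigma and Rule 2 means at least two of three consecutive points beyond 2 sigma on the same side; the trial count and seed are arbitrary.

```python
import random

def estimate_rule_odds(trials=200_000, seed=1):
    """Estimate how often Rule 1 and Rule 2 fire by chance on standard-normal data."""
    rng = random.Random(seed)
    rule1_hits = 0
    rule2_hits = 0
    for _ in range(trials):
        a, b, c = (rng.gauss(0, 1) for _ in range(3))
        # Rule 1 (assumed): a single point beyond 3 sigma on either side.
        if abs(a) > 3:
            rule1_hits += 1
        # Rule 2 (assumed): at least two of three consecutive points beyond
        # 2 sigma on the same side.
        above = sum(x > 2 for x in (a, b, c))
        below = sum(x < -2 for x in (a, b, c))
        if above >= 2 or below >= 2:
            rule2_hits += 1
    return rule1_hits / trials, rule2_hits / trials

p_rule1, p_rule2 = estimate_rule_odds()
print(f"Rule 1: about 1 in {1 / p_rule1:.0f}")
print(f"Rule 2: about 1 in {1 / p_rule2:.0f}")
```

Because these events are rare, the estimates wobble by a few percent from run to run; more trials tighten them.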
Odds for the four rules shown: 1 in 370, 1 in 326, 1 in 256, and 1 in 180
In all cases these
represent somewhat
rare cases in a
statistical sense, but
they are not all
equally rare.
Unusual patterns are not
limited to these rules,
though.
e.g. What are the odds
of finding 15 consecutive
samples in zone C?
= Odds 1 in 306
Thus is this system out of control?
Yes, but in a good way.
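The zone C figure follows from raising the chance of landing within 1 sigma of the mean to the 15th power. The sketch below checks that, and also evaluates two further patterns under assumed definitions taken from the usual Western Electric rules (at least four of five points beyond 1 sigma on one side; nine points in a row on one side). Those definitions are not spelled out in the slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

# Fifteen consecutive samples within 1 sigma of the mean (zone C on either side).
p_zone_c = z.cdf(1) - z.cdf(-1)
p_15_in_c = p_zone_c ** 15

# Assumed Rule 3: at least four of five consecutive points beyond 1 sigma, same side.
p1 = 1 - z.cdf(1)
p_rule3 = 2 * (5 * p1**4 * (1 - p1) + p1**5)

# Assumed Rule 4: nine consecutive points on the same side of the center line.
p_rule4 = 2 * 0.5**9

for name, p in [("15 in zone C", p_15_in_c), ("Rule 3", p_rule3), ("Rule 4", p_rule4)]:
    print(f"{name}: about 1 in {1 / p:.0f}")
```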
Acai juice problem revisited
What if you know that each batch of
berries has some variation, but you
are unsure if the machine is
behaving strangely? Can you still
use your control charts?
Solution: Take samples from each
batch, average them, and plot
these average values and
statistics on a control chart.
Problem: The process of
averaging different samples
will change your odds, because averaging
reduces variation.
Day 1: 40.36, 39.36, 38.43, 39.67
Day 2: 39.96, 40.32, 39.88, 39.75
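A quick simulation illustrates how much averaging narrows the spread. It assumes a normal process with mean 40 and standard deviation 2 (the values from the earlier acai example) and four samples per batch, matching the number of values listed per day; for independent samples the spread of the average should shrink by about the square root of the number averaged.

```python
import random
import statistics

rng = random.Random(42)
mu, sigma, n = 40.0, 2.0, 4  # illustrative process values; n = samples averaged per batch

# Spread of individual measurements.
singles = [rng.gauss(mu, sigma) for _ in range(20_000)]

# Spread of batch averages, each built from n fresh measurements.
batch_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(20_000)
]

std_single = statistics.stdev(singles)
std_means = statistics.stdev(batch_means)

print(f"std of single samples: {std_single:.3f}")
print(f"std of {n}-sample averages: {std_means:.3f}")
print(f"ratio: {std_single / std_means:.2f} (theory: sqrt({n}) = {n ** 0.5:.2f})")
```

This is why control limits for averaged data must be tighter than limits for individual measurements.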
Acai process
control using
X-bar charts
Raw Data:
Plotting the raw data, it
is hard to say if
anything is going on..
To get something
like this, we need the UCL
and LCL
Data in Excel example online:
Lecture.21.xls
UCL = grand avg + A3*(avg stdev)
= 39.86 + 1.628*0.55
= 40.76
Note: If you use A2, you
use the average R. The
result is 40.77, nearly the
same.
LCL = grand avg - A3*(avg stdev)
= 38.96
UCL represents 3 standard
deviations away from the mean, so
the line between zones A/B is 2
standard deviations away:
A/B line = grand avg + A3*(avg stdev)*(2/3) = 40.46
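The limit calculation can be sketched as follows. Only the two days of data listed earlier are used, so the limits computed from them will not match the slide's values, which come from the full spreadsheet; the last lines simply redo the slide's arithmetic from its summary numbers. The function name xbar_limits is hypothetical.

```python
import statistics

# Two subgroups shown on an earlier slide; the full data set lives in the
# spreadsheet and is not reproduced here.
subgroups = {
    "Day 1": [40.36, 39.36, 38.43, 39.67],
    "Day 2": [39.96, 40.32, 39.88, 39.75],
}

A3 = 1.628  # control chart constant used on the slide

def xbar_limits(groups, a3):
    """Return (LCL, A/B line, grand average, UCL) for an X-bar chart."""
    groups = list(groups)
    means = [statistics.fmean(g) for g in groups]
    stdevs = [statistics.stdev(g) for g in groups]
    grand_avg = statistics.fmean(means)
    half_width = a3 * statistics.fmean(stdevs)  # distance to the 3-sigma limits
    return (grand_avg - half_width,
            grand_avg + half_width * 2 / 3,
            grand_avg,
            grand_avg + half_width)

lcl, ab_line, center, ucl = xbar_limits(subgroups.values(), A3)
print(f"LCL = {lcl:.2f}, center = {center:.2f}, A/B line = {ab_line:.2f}, UCL = {ucl:.2f}")

# Redoing the slide's arithmetic from its summary numbers.
slide_ucl = 39.86 + A3 * 0.55
slide_lcl = 39.86 - A3 * 0.55
slide_ab = 39.86 + A3 * 0.55 * 2 / 3
print(f"slide: LCL = {slide_lcl:.2f}, A/B line = {slide_ab:.2f}, UCL = {slide_ucl:.2f}")
```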
X-bar chart
Is it in control?
Rule 2: fail: points 9 and 10 are in zone A
Rule 1: okay, no points outside of zone A
Rules 3 and 4: okay
Conclusion:
Not in statistical control.
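A hedged sketch of how Rules 1 and 2 might be checked programmatically. The data points are hypothetical (the lecture data are not reproduced here), chosen so that points 9 and 10 fall in zone A. The rule definitions are assumed readings: Rule 1 is any point beyond 3 sigma, and Rule 2 is at least two of three consecutive points beyond 2 sigma on the same side.

```python
def rule1_violations(points, center, sigma):
    """1-based indices of points beyond 3 sigma from the center line."""
    return [i for i, x in enumerate(points, start=1) if abs(x - center) > 3 * sigma]

def rule2_violations(points, center, sigma):
    """1-based index of the last point of each three-point window in which at
    least two points lie beyond 2 sigma on the same side of the center line."""
    hits = []
    for end in range(3, len(points) + 1):
        window = points[end - 3:end]
        above = sum(x - center > 2 * sigma for x in window)
        below = sum(center - x > 2 * sigma for x in window)
        if above >= 2 or below >= 2:
            hits.append(end)
    return hits

# Hypothetical subgroup averages, NOT the lecture data. The center line and
# UCL are the values computed on the slides (grand avg 39.86, UCL 40.76).
center = 39.86
sigma = (40.76 - center) / 3  # the UCL sits 3 sigma above the center line
points = [39.9, 39.7, 40.1, 39.8, 39.6, 40.0, 39.9, 40.2, 40.6, 40.5, 39.9]

rule1 = rule1_violations(points, center, sigma)
rule2 = rule2_violations(points, center, sigma)
print("Rule 1 violations at points:", rule1)
print("Rule 2 violations in windows ending at points:", rule2)
```

An empty Rule 1 list together with a non-empty Rule 2 list corresponds to the situation described on the slide: no point is outside zone A, yet the chart is still flagged.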
Take Home Messages
Statistical process control is a method
for systematically identifying inconsistencies.
Probabilities are often based on a Gaussian process.
Control charts provide a systematic
method for evaluating if a process is under control.
A foolish consistency is the hobgoblin of
little minds
--R. W. Emerson
An intelligent consistency is a virtue in
an integrated global economy
--Anonymous