Topics: Statistics & Experimental Design The Human Visual System Color Science Light Sources:...
sherilyn-lyons -
Topics:
• Statistics & Experimental Design
• The Human Visual System
• Color Science
• Light Sources: Radiometry/Photometry
• Geometric Optics
• Tone-transfer Function
• Image Sensors
• Image Processing
• Displays & Output
• Colorimetry & Color Measurement
• Image Evaluation
• Psychophysics
Design of experiments
Why is it important?
• We wish to draw meaningful conclusions from data collected
• Statistical methodology is the only objective approach to analysis
Design of experiments
• Recognize the problem
• Select the factors to be varied, and the levels and ranges over which they will be varied
• Select the response variable
• Choose the experimental design:
    • Sample size?
    • Blocking?
    • Randomization?
• Perform the experiment
• Statistical analysis
• Conclusions and recommendations
Let’s start easy
• We would like to compare the output of two systems.
• Design a testing protocol and run it several times
Run SystemA SystemB
1 y1A y1B
2 y2A y2B
3 y3A y3B
… … …
Visualize data
[Figure: scatter plot of Output versus Run # (runs 0 to 10, outputs between 16 and 18.5) for system_A and system_B]
For small data sets: Scatter plot
Visualize data
For larger data sets: Histogram
• Divide the horizontal axis into intervals (bins)
• Construct a rectangle over each interval with area proportional to the number (frequency) of observations
[Figure: histogram; vertical axis: frequency, n_i]
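The binning step can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the bin range and the data values are assumptions made for the example, not part of the original slides.

```python
def histogram_counts(values, low, high, n_bins):
    """Count observations in n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # A value equal to the top edge is folded into the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Hypothetical outputs, not taken from the slides.
outputs = [16.2, 16.9, 17.1, 17.3, 17.4, 17.6, 17.8, 18.1, 18.4]
counts = histogram_counts(outputs, low=16.0, high=18.5, n_bins=5)

# Text rendering: one row per bin, one '#' per observation.
for i, n in enumerate(counts):
    left = 16.0 + i * 0.5
    print(f"{left:.1f}-{left + 0.5:.1f}: {'#' * n}")
```

With equal-width bins, bar height and bar area are both proportional to the frequency, so counting per bin is enough.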
Statistical inference
Draw conclusions about a population using a sample from that population.
• Imagine a hypothetical population containing a large number N of observations.
• Denote the measure of location of the population as the population mean:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i$$
Statistical inference
• Denote the spread of the population as the variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$$
Statistical inference
A small group of observations is known as a sample.
• A statistic such as the average is calculated from a set of data considered to be a sample from a population:
$$\text{Sample average: } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Run SystemA SystemB
1 y1A y1B
2 y2A y2B
3 y3A y3B
… … …
Average ȳA ȳB
Statistical inference
• The sample variance supplies a measure of the spread of the sample:
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
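As a small illustration of the sample average and sample variance, the sketch below computes both by hand and compares the divisor n - 1 with the divisor N used for a full population. The list `runs` is hypothetical data invented for the example.

```python
import statistics

# Hypothetical outputs from five runs of one system (not from the slides).
runs = [16.4, 16.9, 17.0, 16.6, 17.1]
n = len(runs)

# Sample average: sum of the observations divided by n.
y_bar = sum(runs) / n

# Sum of squared deviations from the sample average.
ss = sum((y - y_bar) ** 2 for y in runs)

s2 = ss / (n - 1)          # sample variance, divisor n - 1
variance_div_n = ss / n    # same data treated as a whole population, divisor N

print(f"average = {y_bar:.3f}, s^2 = {s2:.4f}, divisor-N variance = {variance_div_n:.4f}")
```

The standard library functions `statistics.mean`, `statistics.variance` and `statistics.pvariance` give the same three quantities.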
Probability distribution functions
[Figure: density f(x) plotted against x; the area under the curve between a and b is P(a ≤ x ≤ b)]
Probability distribution functions
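[Figure: discrete distribution, P(x_i) plotted against x_i, with P(x = x_i) = p(x_i)]
$$0 \le p(x_i) \le 1 \quad \text{for all values of } x_i$$
$$P(x = x_i) = p(x_i) \quad \text{for all values of } x_i$$
$$\sum_{\text{all } x_i} p(x_i) = 1$$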
Mean, variance of pdf
• Mean is a measure of central tendency or location
• Variance measures the spread or dispersion
$$\mu = \sum_{\text{all } x} x\, p(x)$$
$$\sigma^2 = \sum_{\text{all } x} (x - \mu)^2\, p(x)$$
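A quick way to check these two sums is to apply them to a simple discrete distribution. The sketch below uses a fair six-sided die, which is an example chosen here and not one from the slides; exact fractions avoid rounding noise.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6 (illustrative example).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# The probabilities must sum to one.
total = sum(pmf.values())

# Mean: sum of x * p(x) over all x.
mu = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mu)^2 * p(x) over all x.
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(total, mu, sigma2)  # 1 7/2 35/12
```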
Normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
σ = standard deviation = √(σ²), μ = mean
[Figure: three normal density curves, y versus x for x from -10 to 10, with (μ = −3, σ = 1), (μ = 0, σ = 2) and (μ = 3, σ = 3)]
Normal distribution, N(μ, σ²)
• From the previous examples we can see that the mean μ and the variance σ² completely characterize the distribution.
• Knowing the pdf of the population from which a sample is drawn determines the pdf of a particular statistic.
Normal distribution
• The probability that a positive deviation from the mean exceeds one standard deviation is 0.1587 ≈ 1/6, the fraction of the total area under the curve beyond μ + σ. (The same holds for a negative deviation.)
• The probability that a deviation in either direction exceeds one standard deviation is 2 × 0.1587 = 0.3174.
• The chance that a positive deviation from the mean exceeds two standard deviations is 0.02275 ≈ 1/40.
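These tail areas can be reproduced without a table. The sketch below uses the identity P(Z > z) = ½·erfc(z/√2) for a standard normal variable; the helper name `upper_tail` is an assumption made for this example.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


one_sigma = upper_tail(1.0)     # positive deviation beyond one sigma
either_side = 2 * one_sigma     # deviation beyond one sigma in either direction
two_sigma = upper_tail(2.0)     # positive deviation beyond two sigma

print(f"{one_sigma:.4f} {either_side:.4f} {two_sigma:.5f}")
```

The two-sided value computed this way differs from 2 × 0.1587 only in the fourth decimal, because the slide doubles an already rounded number.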
Normal distribution
• Sample runs differ as a result of experimental error
• Often can be described by normal distribution
Standard Normal distribution, N(0,1)
[Figure: standard normal density, y versus x for x from -10 to 10]
$$z = \frac{y - \mu}{\sigma}$$
Values for N(0,1) are found in tables.
Standard Normal distribution, N(0,1)
Example:
Suppose the outcome of a given experiment is approximately normally distributed with mean μ = 4.0 and standard deviation σ = 0.3. What is the probability that the outcome is 4.4 or more?
$$z = \frac{y - \mu}{\sigma} = \frac{4.4 - 4}{0.3} = 1.33$$
Look up z = 1.33 in the table on the previous page to find that the probability is about 9%.
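The table lookup in this example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; the variable names are chosen here.

```python
from statistics import NormalDist

mu, sigma, y = 4.0, 0.3, 4.4

# Standardize, then round to two decimals as one would for a printed table.
z = (y - mu) / sigma
z_table = round(z, 2)

# Upper-tail probability from the rounded z, as read from a table.
p_table = 1 - NormalDist().cdf(z_table)

# Same probability computed directly from N(mu, sigma) without rounding z.
p_direct = 1 - NormalDist(mu=mu, sigma=sigma).cdf(y)

print(f"z = {z_table}, table-style p = {p_table:.4f}, direct p = {p_direct:.4f}")
```

Both routes give a probability of roughly 9%; they differ slightly only because the table uses z rounded to 1.33.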
χ² distribution
Another sampling distribution that can be defined in terms of normal random variables.
• Suppose z1, z2, …, zk are normally and independently distributed random variables with mean μ = 0 and variance σ² = 1 (NID(0,1)). Then define
$$\chi^2 = z_1^2 + z_2^2 + \cdots + z_k^2$$
where χ² follows the chi-square distribution with k degrees of freedom.
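The definition can be tried out by simulation: draw k standard normal values, square and add them, and repeat many times. The sketch below does this for k = 5 and compares the simulated mean and variance with the known values k and 2k for a chi-square variable. The seed, the sample count and the function name `chi_square_draw` are arbitrary choices for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable


def chi_square_draw(k):
    """One chi-square draw: the sum of k squared NID(0,1) values."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 5
draws = [chi_square_draw(k) for _ in range(20_000)]

sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

print(f"mean = {sample_mean:.2f} (theory {k}), variance = {sample_var:.2f} (theory {2 * k})")
```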
χ² distribution
[Figure: chi-square density curves for k = 1, 5, 10 and 15 degrees of freedom, x from 0 to 25]
Student’s t Distribution
• In practice we don’t know the theoretical parameter σ
• This means we can’t really compute
$$z = \frac{y - \mu}{\sigma}$$
and refer the result to the table of the standard normal distribution
• Assume that the experimental standard deviation s can be used as an estimate of σ
Student’s t Distribution
Define a new variable
$$t = \frac{y - \mu}{s}$$
It turns out that t has a known distribution.
It was deduced by Gosset in 1908.
Student’s t Distribution
[Figure: t density curves for k = 1, 10 and 100 degrees of freedom, x from -5 to 5]
Probability points are given in tables.
The form depends on the degree of uncertainty in s2, measured by the number of degrees of freedom, k.
Inferences about differences in means
• Statistical hypothesis: Statement about the parameters of a probability distribution.
Let’s go back to the example we started with, i.e., comparison of two imaging systems.
We may hypothesize that the mean performance measurements of the two systems are equal.
Hypothesis testing
$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \ne \mu_2$$
First statement is the Null hypothesis, second statement is the Alternative hypothesis. In this case it is a two-sided alternative hypothesis.
How to test hypothesis? Take a random sample, compute an appropriate test statistic and reject, or fail to reject the null hypothesis H0.
We need to specify a set of values for the test statistic that leads to rejection of H0. This is the critical region.
Hypothesis testing
Two errors can be made:
• Type I error: the null hypothesis is rejected when it is true
• Type II error: the null hypothesis is not rejected when it is false
• In terms of probabilities:
$$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
$$\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$
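One way to get a feel for α is to simulate many experiments in which H0 really is true and count how often it is rejected anyway. The sketch below is a simplification and not the procedure from the slides: it assumes the standard deviation is known, so the test statistic is a standard normal z rather than a t, and it uses the two-sided 5% point 1.96. All constants and names are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

SIGMA = 0.3      # assumed known standard deviation (hypothetical)
K = 10           # runs per system
Z_CRIT = 1.96    # two-sided 5% point of N(0,1)
TRIALS = 10_000


def one_trial():
    """Simulate one experiment with equal true means; True means H0 was rejected."""
    a = [random.gauss(17.0, SIGMA) for _ in range(K)]
    b = [random.gauss(17.0, SIGMA) for _ in range(K)]  # same mean, so H0 is true
    z = (sum(a) / K - sum(b) / K) / (SIGMA * math.sqrt(2 / K))
    return abs(z) > Z_CRIT


# Fraction of trials in which a true H0 was rejected: an estimate of alpha.
type_i_rate = sum(one_trial() for _ in range(TRIALS)) / TRIALS
print(f"estimated alpha: {type_i_rate:.3f}")
```

The estimated rate should land close to the chosen significance level of 0.05, since every rejection here is a type I error by construction.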
Hypothesis testing
• We need to specify a value for the probability of type I error, α. This is known as the significance level of the test.
• The test statistic for comparing the two systems is:
$$t_0 = \frac{\bar{y}_A - \bar{y}_B}{s_p\sqrt{\frac{1}{k_A} + \frac{1}{k_B}}}$$
where
$$s_p^2 = \frac{(k_A - 1)\,s_A^2 + (k_B - 1)\,s_B^2}{k_A + k_B - 2}$$
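Starting from raw run data, the pooled variance and the test statistic could be computed along the following lines. This is a minimal sketch: the function name `pooled_t_statistic` and the two short data lists are invented for illustration.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t0, degrees of freedom) for the pooled two-sample t statistic."""
    k_a, k_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, divisor k_a - 1
    var_b = statistics.variance(sample_b)  # sample variance, divisor k_b - 1
    dof = k_a + k_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((k_a - 1) * var_a + (k_b - 1) * var_b) / dof
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t0 = diff / (math.sqrt(sp2) * math.sqrt(1 / k_a + 1 / k_b))
    return t0, dof


# Hypothetical runs, not the data behind the slides.
system_a = [16.5, 16.9, 17.0, 16.6]
system_b = [17.8, 18.1, 17.9, 18.2]

t0, dof = pooled_t_statistic(system_a, system_b)
print(f"t0 = {t0:.2f} with {dof} degrees of freedom")
```

Swapping the two samples only flips the sign of t0, which is why a two-sided test looks at its absolute value.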
Hypothesis testing
• To determine whether to reject H0, we compare t0 to the t distribution with kA + kB − 2 degrees of freedom.
• If
$$|t_0| > t_{\alpha/2,\,k_A+k_B-2}$$
we reject H0 and conclude that the means are different.
We have:
System A        System B
ȳA = 16.76      ȳB = 17.92
sA² = 0.10      sB² = 0.061
sA = 0.316      sB = 0.247
kA = 10         kB = 10
Hypothesis testing
$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \ne \mu_2$$
• We have kA + kB − 2 = 18
• Choose α = 0.05
• We would reject H0 if
$$|t_0| > t_{0.05/2,\,18} = t_{0.025,\,18} = 2.101$$
Hypothesis testing
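$$s_p^2 = \frac{(k_A - 1)\,s_A^2 + (k_B - 1)\,s_B^2}{k_A + k_B - 2} = \frac{9(0.10) + 9(0.061)}{10 + 10 - 2} = 0.081$$
$$s_p = 0.284$$
$$t_0 = \frac{\bar{y}_A - \bar{y}_B}{s_p\sqrt{\frac{1}{k_A} + \frac{1}{k_B}}} = \frac{16.76 - 17.92}{0.284\sqrt{\frac{1}{10} + \frac{1}{10}}} = -9.13$$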
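The arithmetic above can be re-run from the summary statistics for the two systems. The sketch below takes the critical value 2.101 from the t table, as the slides do, rather than computing it; variable names are chosen here.

```python
import math

# Summary statistics for the two systems, as given in the example.
y_a, y_b = 16.76, 17.92
s2_a, s2_b = 0.10, 0.061
k_a, k_b = 10, 10

dof = k_a + k_b - 2
sp2 = ((k_a - 1) * s2_a + (k_b - 1) * s2_b) / dof   # pooled variance
sp = math.sqrt(sp2)                                  # pooled standard deviation
t0 = (y_a - y_b) / (sp * math.sqrt(1 / k_a + 1 / k_b))

T_CRIT = 2.101  # t_{0.025, 18}, taken from a t table
reject_h0 = abs(t0) > T_CRIT

print(f"sp = {sp:.3f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```

Because s_p is not rounded to 0.284 before dividing, t0 computed this way can differ from -9.13 in the second decimal; the decision to reject H0 is unaffected.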
Hypothesis testing
Since t0 = -9.13 < -t0.025,18 = -2.101 then we reject H0 and conclude that the means are different.
Hypothesis testing doesn’t always tell the whole story. It’s better to provide an interval within which the value of the parameter is expected to lie: a confidence interval.
In other words, it’s better to find a confidence interval on the difference μA − μB.
Confidence interval
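$$\bar{y}_A - \bar{y}_B - t_{\alpha/2,\,k_A+k_B-2}\; s_p\sqrt{\frac{1}{k_A} + \frac{1}{k_B}} \;\le\; \mu_A - \mu_B \;\le\; \bar{y}_A - \bar{y}_B + t_{\alpha/2,\,k_A+k_B-2}\; s_p\sqrt{\frac{1}{k_A} + \frac{1}{k_B}}$$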
Using data from the previous example:
$$16.76 - 17.92 - (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}} \;\le\; \mu_A - \mu_B \;\le\; 16.76 - 17.92 + (2.101)(0.284)\sqrt{\frac{1}{10} + \frac{1}{10}}$$
$$-1.16 - 0.27 \;\le\; \mu_A - \mu_B \;\le\; -1.16 + 0.27$$
$$-1.43 \;\le\; \mu_A - \mu_B \;\le\; -0.89$$
So the 95 percent confidence interval estimate on the difference in means extends from -1.43 to -0.89.
Note that since μA − μB = 0 is not included in this interval, the data do not support the hypothesis that μA = μB at the 5% level of significance.
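The interval can be reproduced from the same summary statistics. As before, this is a sketch: the critical value 2.101 is read from a t table, and the variable names are chosen for the example.

```python
import math

# Summary statistics for the two systems, as given in the example.
y_a, y_b = 16.76, 17.92
s2_a, s2_b = 0.10, 0.061
k_a, k_b = 10, 10
T_CRIT = 2.101  # t_{0.025, 18}, taken from a t table

# Pooled standard deviation.
sp = math.sqrt(((k_a - 1) * s2_a + (k_b - 1) * s2_b) / (k_a + k_b - 2))

# Half-width of the 95% interval on the difference in means.
half_width = T_CRIT * sp * math.sqrt(1 / k_a + 1 / k_b)

diff = y_a - y_b
lower, upper = diff - half_width, diff + half_width

# The hypothesis of equal means is compatible with the data only if 0 lies inside.
zero_inside = lower <= 0.0 <= upper

print(f"{lower:.2f} <= mu_A - mu_B <= {upper:.2f}; contains 0: {zero_inside}")
```

Checking whether zero falls inside the interval gives the same decision as the two-sided t test at the same significance level.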