Module 7: Comparing Datasets and Comparing a Dataset with a Standard How different is enough?
Module 7: Comparing Datasets and Comparing a Dataset
with a Standard
How different is enough?
Concepts
Independence of each data point
Test statistics
Central Limit Theorem
Standard error of the mean
Confidence interval for a mean
Significance levels
How to apply in Excel
Independent Measurements
Each measurement must be independent (shake up the basket of tickets)
Examples of non-independent measurements
– Public responses to questions (one result affects the next person’s answer)
– Samplers too close together, so air flows are affected
Test Statistics
A number calculated from the data
In Student’s t test, for example, t
If t is >= 1.96 (for a large sample) and
– the population is normally distributed,
– you’re in the right tail of the curve,
– where 95% of the data is in the inner portion, symmetrically between right and left (t = 1.96 on right, -1.96 on left)
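As a quick check of the 1.96 cutoff, here is a minimal Python sketch. Python's standard-library `statistics.NormalDist` is our choice of tool here, not something the module itself uses. It computes how much of a standard normal curve lies between -1.96 and 1.96, and how much lies in the right tail.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve between -1.96 and +1.96 (the "inner portion")
inner = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
# Area in the right tail, beyond +1.96
right_tail = 1 - std_normal.cdf(1.96)

print(f"inner portion: {inner:.3f}")
print(f"right tail:    {right_tail:.3f}")
```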
Test statistics correspond to significance levels
“P” stands for percentile
The Pth percentile is where a fraction p of the data falls below, and 1-p falls above
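The link between a percentile and a test statistic can be sketched in Python, assuming a standard normal curve. The use of `statistics.NormalDist` is our assumption for illustration. `inv_cdf` goes from a percentile to a z value, and `cdf` goes back the other way.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z value below which 97.5% of a standard normal population falls
z_975 = std_normal.inv_cdf(0.975)
# Going the other way: the fraction of the population below z = 1.96
p_below = std_normal.cdf(1.96)

print(f"97.5th percentile is z = {z_975:.2f}")
print(f"fraction below z = 1.96: {p_below:.3f}; fraction above: {1 - p_below:.3f}")
```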
Two Major Types of Questions
Comparing mean against a standard
– Does air quality here meet NAAQS?
Comparing two datasets
– Is air quality different in 2006 than 2005?
– Better?
– Worse?
Comparing Mean to a Standard
Did air quality meet CARB annual standard of 12 µg/m³?
Year   Ft Smith avg   Ft Smith min   Ft Smith max   N (Ft Smith)
’05    14.78          0.1            37.9           77
Central Limit Theorem (magic!)
Even if the underlying population is not normally distributed
If we repeatedly take datasets
These different datasets have means that cluster around the true mean
The distribution of these means is approximately normal (for reasonably large datasets)!
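A small simulation can make this concrete. The sketch below is illustrative only: the exponential population, the dataset size of 50, and the 2000 repetitions are all hypothetical choices, not values from the module. It draws many datasets from a strongly skewed population and checks that their means cluster around the true mean, with about 95% of them within 1.96 standard deviations of the center, as a normal curve would predict.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50      # measurements per dataset (hypothetical)
NUM_DATASETS = 2000   # how many datasets we draw (hypothetical)

def draw_dataset(n):
    """Draw n values from an exponential population (true mean 1.0, skewed)."""
    return [random.expovariate(1.0) for _ in range(n)]

dataset_means = [mean(draw_dataset(SAMPLE_SIZE)) for _ in range(NUM_DATASETS)]

center = mean(dataset_means)
spread = stdev(dataset_means)
# Share of dataset means within 1.96 standard deviations of their center;
# for a normal distribution this share is about 0.95
share_within = sum(abs(m - center) <= 1.96 * spread for m in dataset_means) / NUM_DATASETS

print(f"average of dataset means: {center:.3f}")
print(f"share within 1.96 sd:     {share_within:.3f}")
```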
Magic Concept #2: Standard Error of the Mean
Represents uncertainty around mean
As sample size N gets bigger, error gets smaller!
The bigger the N, the more tightly you can estimate mean
LIKE standard deviation for a population, but this is for YOUR sample
$$\mathrm{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$
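To see how the error shrinks as N grows, here is a minimal Python sketch of the standard error (sigma divided by the square root of N). The function name `standard_error` and the value of sigma are hypothetical and chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

SIGMA = 8.7  # hypothetical standard deviation, for illustration only

for n in (10, 40, 160, 640):
    print(f"N = {n:4d}  standard error = {standard_error(SIGMA, n):.2f}")
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.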
For a “large” sample (N > 60), or when very close to a normal distribution…
Confidence interval for population mean is:
$$\bar{x} \pm Z\left(\frac{s}{\sqrt{n}}\right)$$
Choice of z determines 90%, 95%, etc.
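The z value for a given confidence level can be looked up in a table or computed. As a sketch, assuming a two-sided interval with equal tails, Python's `statistics.NormalDist` gives the multipliers directly. The helper name `z_for_confidence` is ours.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z multiplier: leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```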
For a “Small” Sample
Replace Z value with a t value to get…
$$\bar{x} \pm t\left(\frac{s}{\sqrt{n}}\right)$$
…where “t” comes from Student’s t distribution, and depends on sample size
Student’s t Distribution vs. Normal Z Distribution
[Figure: T-distribution and Standard Normal Z distribution. Density curves for T with 5 d.f. and for the Z distribution; x-axis: Value (-5 to 5), y-axis: density (0.0 to 0.4)]
Compare t and Z Values
Confidence level   t value with 5 d.f.   Z value
90%                2.015                 1.65
95%                2.571                 1.96
99%                4.032                 2.58
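One way to read this table: for the same s and n, the confidence interval half-width is proportional to the multiplier, so the ratio t/Z says how much wider the small-sample interval is. A minimal Python sketch, using only the values in the table:

```python
# Multipliers copied from the table: (t with 5 d.f., Z)
multipliers = {
    "90%": (2.015, 1.65),
    "95%": (2.571, 1.96),
    "99%": (4.032, 2.58),
}

# Ratio of interval widths for the same s and n
ratios = {level: t_value / z_value for level, (t_value, z_value) in multipliers.items()}

for level, ratio in ratios.items():
    print(f"{level}: t-based interval is {ratio:.2f} times as wide as the Z-based one")
```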
What happens as sample gets larger?
[Figure: T-distribution and Standard Normal Z distribution. Density curves for the Z distribution and for T with 60 d.f.; x-axis: Value (-5 to 5), y-axis: density (0.0 to 0.4)]
What happens to CI as sample gets larger?
$$\bar{x} \pm Z\left(\frac{s}{\sqrt{n}}\right)$$
$$\bar{x} \pm t\left(\frac{s}{\sqrt{n}}\right)$$
For large samples
Z and t values become almost identical, so CIs are almost identical
First, graph and review data
Use box plot add-in
Evaluate spread
Evaluate how far apart mean and median are
(assume sampling design and QC are good)
Excel Summary Stats
[Figure: box plot of Ft Smith data, N = 77; y-axis from 0 to 40]
Ft Smith summary statistics
Min      0.1
25th     7.5
Median   13.7
75th     18.1
Max      37.9
Mean     14.8
SD       8.7
1. Use the box-plot add-in
2. Calculate summary stats
Our Question
How confident can we be (90%? 95%?) that this mean of 14.78 is really greater than the standard of 12?
We saw that N = 77, and the mean and median are not too different
Use z (normal) rather than t
The mean is 14.8 ± what?
We know the equation for the CI is
$$\bar{x} \pm Z\left(\frac{s}{\sqrt{n}}\right)$$
The width of the confidence interval reflects how sure we want to be that this CI includes the true mean
Now, decide how confident we want to be
CI Calculation
For 95%, z = 1.96 (often rounded to 2)
Standard error (sigma / square root of N) = (8.66 / square root of 77) ≈ 0.99
CI half-width around the mean = 2 x 0.99 ≈ 2
We can be 95% sure that the true mean is included in (mean ± 2), or 14.8 - 2 at the low end, to 14.8 + 2 at the high end
This does NOT include 12!
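The same arithmetic can be sketched in Python with the unrounded z of 1.96, using the Ft Smith numbers from the slides (mean 14.78, SD 8.66, N = 77). The variable names are ours.

```python
import math

sample_mean = 14.78  # Ft Smith 2005 average
sd = 8.66            # standard deviation of the measurements
n = 77               # number of measurements
z = 1.96             # multiplier for 95% confidence
standard = 12        # annual standard being compared against

standard_error = sd / math.sqrt(n)
half_width = z * standard_error
low, high = sample_mean - half_width, sample_mean + half_width

print(f"standard error: {standard_error:.2f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
print(f"CI includes the standard of {standard}: {low <= standard <= high}")
```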
Excel can also calculate a confidence interval around the mean
Mean, plus and minus 1.93, is a 95% confidence interval that does NOT include 12!
We know we are more than 95% confident, but how confident can we be that Ft Smith mean > 12?
Calculate where on the curve our mean of 14.8 is, in terms of z (normal) score…
…or if N is small, use t score
To find where we are on the curve, calc the test statistic…
Ft Smith mean = 14.8, sigma =8.66, N =77
Calculate test statistic, in this case the z factor (we decided we can use the z rather than the t distribution)
If N was < 60, test stat is t, but calculated the same way
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$$
where $\bar{x}$ is the data’s mean and $\mu$ is the standard of 12
Calculate z Easily
The numerator is our mean of 14.8 minus the standard of 12 (treat the standard as the true mean μ (mu)), which is 2.8
The standard error is sigma / square root of N = 8.66 / 8.77 ≈ 0.987 (same as for the CI)
So z = 2.8 / 0.987 ≈ 2.84
So where is this z on the curve?
Remember, at z = 3 we are to the right of more than 99% of the curve
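As a check on the arithmetic, the test statistic can be computed in a few lines of Python from the values on the slides. The variable names are ours.

```python
import math

sample_mean = 14.8  # Ft Smith mean
standard = 12       # standard, treated as the true mean mu
sigma = 8.66        # standard deviation
n = 77              # number of measurements

# z = (sample mean - standard) / (sigma / sqrt(N))
z = (sample_mean - standard) / (sigma / math.sqrt(n))
print(f"z = {z:.2f}")
```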
Where on the curve?
[Figure: normal curve with Z = 2 and Z = 3 marked]
So we are between 95% and 99% confident that the true mean is above 12
You can calculate exactly where on the curve, using Excel
Use the NORMSDIST function, with z
If z (or t) = 2.84, in Excel
Yields 99.8% confidence that the true mean is greater than 12
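For readers without Excel, the same lookup can be sketched in Python. `statistics.NormalDist().cdf(z)` returns the standard normal cumulative probability, which is the quantity Excel's NORMSDIST(z) returns.

```python
from statistics import NormalDist

z = 2.84
# Fraction of the standard normal curve lying below z
share_below = NormalDist().cdf(z)
print(f"{share_below:.1%}")
```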