For 95 out of 100 (large) samples, the interval will contain the true population mean. But we...
![Page 1: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/1.jpg)
![Page 2: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/2.jpg)
For 95 out of 100 (large) samples, the interval

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

will contain the true population mean.
But we don’t know $\sigma$?!
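The "95 out of 100 samples" claim can be checked with a small simulation. The sketch below is only an illustration: the population mean, standard deviation, sample size, and seed are made-up values, and the population is assumed to be Normal with a known standard deviation.

```python
import math
import random


def coverage(n_samples=2000, n=50, mu=10.0, sigma=2.0, seed=0):
    """Fraction of intervals xbar +/- 1.96*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_samples


# The printed fraction should land close to 0.95.
print(coverage())
```

Each simulated sample gives a different interval; it is the procedure, not any single interval, that succeeds about 95% of the time.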
![Page 3: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/3.jpg)
Inference for the Mean of a Population
To estimate $\mu$, we use a confidence interval around $\bar{x}$.
The confidence interval is built with $\sigma$, which we replace with $s$ (the sample std. dev.) if $\sigma$ is not known.

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
![Page 4: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/4.jpg)
t-distributions
$\dfrac{s}{\sqrt{n}}$ is the “standard error” of $\bar{x}$.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

For a simple random sample (SRS), the one-sample t-statistic has the t-distribution with n-1 degrees of freedom.
(see Table D)
![Page 5: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/5.jpg)
t-distributions
t-distributions with k (=n-1) degrees of freedom
– are labeled t(k),
– are symmetric around 0,
– and are bell-shaped
– … but have more variability than Normal distributions, due to the substitution of $s$ in the place of $\sigma$.
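One way to see the extra variability is to use a standard fact not stated on the slide: for k > 2, the variance of t(k) is k/(k-2), compared with 1 for the standard Normal. The helper name below is hypothetical and this is only a minimal sketch.

```python
def t_variance(k):
    """Variance of the t(k) distribution; finite only for k > 2."""
    if k <= 2:
        raise ValueError("variance of t(k) is finite only for k > 2")
    return k / (k - 2)


# The variance is always above 1 and shrinks toward 1 as k grows.
for k in (3, 5, 10, 30, 100):
    print(k, round(t_variance(k), 3))
```

With few degrees of freedom the spread is much larger than the Normal's; with many, the two distributions are nearly the same.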
![Page 6: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/6.jpg)
![Page 7: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/7.jpg)
![Page 8: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/8.jpg)
![Page 9: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/9.jpg)
Example: Estimating the level of vitamin C
Data: 26 31 23 22 11 22 14 31
Find a 95% confidence interval for $\mu$.
A: ( , )
Write it as “estimate plus margin of error”
STATA Exercise 1
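As a cross-check outside STATA, the interval for the eight vitamin C values can be sketched in Python. The critical value 2.365 is hard-coded from a standard t table for 7 degrees of freedom at 95% confidence, because the Python standard library has no t-distribution; the variable names are illustrative.

```python
import math
import statistics

data = [26, 31, 23, 22, 11, 22, 14, 31]

n = len(data)
xbar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)          # standard error of the mean

T_STAR = 2.365                 # t(7) critical value for 95%, from a t table
margin = T_STAR * se
low, high = xbar - margin, xbar + margin

print(f"{xbar:.1f} +/- {margin:.1f} -> ({low:.1f}, {high:.1f})")
```

Written as "estimate plus margin of error", this comes out to roughly 22.5 ± 6.0.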
![Page 10: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/10.jpg)
![Page 11: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/11.jpg)
![Page 12: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/12.jpg)
STATA Exercise 2
![Page 13: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/13.jpg)
STATA Exercise 2
![Page 14: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/14.jpg)
STATA Exercises 3 and 4
![Page 15: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/15.jpg)
Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
$H_0$: mean(pretest - posttest) = mean(diff) = 0
STATA Exercise 5
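A paired test is a one-sample t-test on the per-individual differences. The sketch below uses made-up pretest and posttest scores (not the exercise data), and the function name `paired_t` is hypothetical.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t statistic, df) for H0: mean(first - second) = 0."""
    if len(first) != len(second):
        raise ValueError("paired data need one value per individual in each list")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    dbar = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return dbar, dbar / se, n - 1


# Hypothetical scores for five individuals.
pretest = [30, 28, 31, 26, 20]
posttest = [32, 30, 31, 29, 24]

dbar, t, df = paired_t(pretest, posttest)
print(dbar, round(t, 3), df)
```

The t statistic would then be compared against the t(n-1) distribution, here t(4).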
![Page 16: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/16.jpg)
![Page 17: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/17.jpg)
![Page 18: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/18.jpg)
STATA Exercise 6
![Page 19: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/19.jpg)
Robustness of t procedures
t-tests are only appropriate for testing a hypothesis on a single mean in these cases:
– If n<15: only if the data is Normally distributed (with no outliers or strong skewness)
– If n≥15: only if there are no outliers or strong skewness
– If n≥40: even if clearly skewed (because of the Central Limit Theorem)
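These rules of thumb can be written as a small decision helper. This is only a sketch: the function and argument names are hypothetical, and because the n≥40 rule mentions only skewness, treating outliers as still disqualifying in that case is an assumption.

```python
def t_test_appropriate(n, looks_normal, has_outliers, strongly_skewed):
    """Apply the sample-size rules of thumb for one-sample t procedures."""
    if n < 15:
        # Small samples: the data must look Normal, with no outliers or strong skewness.
        return looks_normal and not has_outliers and not strongly_skewed
    if n < 40:
        # Medium samples: only outliers or strong skewness rule the test out.
        return not has_outliers and not strongly_skewed
    # Large samples: skewness is tolerated (Central Limit Theorem).
    # Assumption: outliers are still treated as a problem.
    return not has_outliers


print(t_test_appropriate(10, looks_normal=True, has_outliers=False, strongly_skewed=False))
print(t_test_appropriate(20, looks_normal=False, has_outliers=False, strongly_skewed=True))
print(t_test_appropriate(50, looks_normal=False, has_outliers=False, strongly_skewed=True))
```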
![Page 20: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/20.jpg)
Comparing Two Means
![Page 21: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/21.jpg)
Comparing Two Means
Suppose we make a change to the registration procedure. Does this reduce the number of mistakes?
Basically, we’re looking at two populations:
– the before-change population (population 1)
– the after-change population (population 2)
Is the mean number of mistakes (per student) different? Is $\mu_1 - \mu_2 = 0$ or $\neq 0$?
![Page 22: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/22.jpg)
Comparing Two Means
Notice that we are not matching pairs. We compare two groups.
![Page 23: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/23.jpg)
Comparing Two Means
| Population | Variable | Mean | Standard Deviation |
|---|---|---|---|
| 1 | $x_1$ | $\mu_1$ | $\sigma_1$ |
| 2 | $x_2$ | $\mu_2$ | $\sigma_2$ |
![Page 24: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/24.jpg)
Comparing Two Means
| Population | Sample Size | Sample Mean | Sample Standard Deviation |
|---|---|---|---|
| 1 | $n_1$ | $\bar{x}_1$ | $s_1$ |
| 2 | $n_2$ | $\bar{x}_2$ | $s_2$ |
![Page 25: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/25.jpg)
Comparing Two Means
The population, really, is every single student using each registration procedure, an infinite number of times.
– Suppose we get a “good” result today: how do we know it will be repeated tomorrow?
We can’t repeat the procedure an infinite number of times; we only have a “sample”: numbers from one year.
We estimate $(\mu_1 - \mu_2)$ with $(\bar{x}_1 - \bar{x}_2)$.
![Page 26: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/26.jpg)
Comparing Two Means
Remember $\bar{x}$ is a random variable. To estimate $\mu$ we need both $\bar{x}$ and the margin of error around $\bar{x}$, which is $t^*\,\dfrac{\sigma}{\sqrt{n}}$:

$$\bar{x} \pm t^*\,\frac{\sigma}{\sqrt{n}}$$

So we need to know $\dfrac{\sigma}{\sqrt{n}}$, or rather, the appropriate standard error for this estimation.
Because we are estimating a difference, we need the standard error of a difference.
![Page 27: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/27.jpg)
=0
Comparing Two Means
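If the standard error for $\bar{x}_1$ is

$$\sigma_{\bar{x}_1} = \frac{\sigma_1}{\sqrt{n_1}}$$

then the standard error for $(\bar{x}_1 - \bar{x}_2)$ is

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$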
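The standard error of a difference of two independent sample means comes from adding variances, not standard errors. A minimal sketch with made-up population values (the function name is hypothetical):

```python
import math


def se_difference(sigma1, n1, sigma2, n2):
    """Standard error of (xbar1 - xbar2) for two independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical values: each sample mean has a standard error of exactly 1.
se1 = 3 / math.sqrt(9)
se2 = 4 / math.sqrt(16)
se_diff = se_difference(3, 9, 4, 16)

print(se1, se2, se_diff)  # se_diff is sqrt(se1**2 + se2**2), not se1 + se2
```

The difference is less precise than either mean alone, but its standard error is smaller than the sum of the two individual standard errors.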
![Page 28: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/28.jpg)
![Page 29: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/29.jpg)
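Two-sample significance test

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

This t statistic does not have an exact t-distribution, because we are replacing two standard deviations by their sample equivalents. With the unequal option, STATA uses the Satterthwaite approximation for the degrees of freedom by default.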
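The unequal-variance statistic and the Satterthwaite degrees of freedom can be sketched from summary statistics. The numbers below are made up, the function name is hypothetical, and the statistic is computed under the null hypothesis that the two population means are equal.

```python
import math


def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic (unequal variances) and Satterthwaite df."""
    v1 = s1**2 / n1  # estimated variance of xbar1
    v2 = s2**2 / n2  # estimated variance of xbar2
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Equal spreads and sizes: the df equals n1 + n2 - 2.
t_eq, df_eq = welch_t(5.0, 2.0, 10, 3.0, 2.0, 10)
# Unequal spreads and sizes: the df falls between min(n1, n2) - 1 and n1 + n2 - 2.
t_uneq, df_uneq = welch_t(5.0, 1.0, 5, 3.0, 4.0, 20)

print(round(t_eq, 3), round(df_eq, 2))
print(round(t_uneq, 3), round(df_uneq, 2))
```

The approximate df is usually not a whole number, which is why software rather than a printed table is normally used for the P-value.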
![Page 30: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/30.jpg)
![Page 31: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/31.jpg)
STATA Exercise 7
![Page 32: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/32.jpg)
Paired, unpaired tests
“Paired” tests compare each individual between two variables and ask whether the mean difference (“gain” in this example) is zero.
$H_0$: mean(pretest - posttest) = mean(diff) = 0
“Unpaired” tests take the mean of each variable and test whether the difference of the means is zero.
$H_0$: mean(pretest) - mean(posttest) = diff = 0
STATA Exercise 5
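Both tests use the same point estimate, since the mean of the differences equals the difference of the means. What changes is the standard error. The sketch below uses made-up scores, not the exercise data, and uses the unequal-variance form for the unpaired standard error.

```python
import math
import statistics

# Hypothetical scores for the same five individuals.
pretest = [30, 28, 31, 26, 20]
posttest = [32, 30, 31, 29, 24]
n = len(pretest)

# Paired: one-sample t on the individual differences.
diffs = [a - b for a, b in zip(pretest, posttest)]
mean_of_diffs = statistics.mean(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_of_diffs / se_paired

# Unpaired: treat the two columns as independent samples.
diff_of_means = statistics.mean(pretest) - statistics.mean(posttest)
se_unpaired = math.sqrt(
    statistics.variance(pretest) / n + statistics.variance(posttest) / n
)
t_unpaired = diff_of_means / se_unpaired

print(round(t_paired, 3), round(t_unpaired, 3))
```

In this made-up data the individuals differ a lot from one another but change consistently, so the paired statistic is much larger in magnitude than the unpaired one.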
![Page 33: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/33.jpg)
STATA Exercise 8
ttest ego, by(group) unequal
![Page 34: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/34.jpg)
Robustness and Small Samples
Two-sample methods are more robust than one-sample methods.
– More so if the two samples have similar shapes and sample sizes.
STATA assumes that the variances are the same (what the book calls “pooled t procedures”), unless you tell it the opposite, using the unequal option.
Small samples, as always, make the test less robust.
![Page 35: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/35.jpg)
Pooled two-sample t procedures
![Page 36: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/36.jpg)
Pooled two-sample t procedures
Suppose the two Normal population distributions have the same standard deviation.
Then the t-statistic that compares the means of samples from those two populations has exactly a t-distribution.
![Page 37: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/37.jpg)
Pooled two-sample t procedures
The common, but unknown, standard deviation of both populations is $\sigma$. The sample standard deviations $s_1$ and $s_2$ both estimate $\sigma$.
The best way to combine these estimates is to take a “weighted average” of the two sample variances, using the dfs as the weights:

$$s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}$$
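The pooled variance is a df-weighted average of the two sample variances. A minimal sketch with made-up summary statistics (the function name is hypothetical):

```python
def pooled_variance(s1, n1, s2, n2):
    """Average of s1**2 and s2**2, weighted by degrees of freedom."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Unequal sample sizes: the larger sample pulls the result toward its variance.
sp2_unequal = pooled_variance(2, 5, 4, 13)
# Equal sample sizes: the result is the plain average of the two variances.
sp2_equal = pooled_variance(3, 10, 5, 10)

print(sp2_unequal, sp2_equal)
```

The pooled value always lies between the two sample variances.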
![Page 38: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/38.jpg)
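THE POOLED TWO-SAMPLE T PROCEDURES

The standard error of $(\bar{x}_1 - \bar{x}_2)$ is

$$s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

(assuming $\sigma$ is the same for both populations), so a level C confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t^*\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Here, $t^*$ is the value for the $t(n_1 + n_2 - 2)$ density curve with area C between $-t^*$ and $t^*$.
To test the hypothesis $H_0$: $\mu_1 = \mu_2$, compute the pooled two-sample t statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

and use P-values from the $t(n_1 + n_2 - 2)$ distribution.

ttest ego, by(group)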
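The pooled procedure can also be sketched from summary statistics. The numbers are made up and the function names are hypothetical. The critical value 2.101 is taken from a standard t table for 18 degrees of freedom at 95% confidence, because the Python standard library has no t-distribution.

```python
import math


def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled two-sample t statistic, its df, and the standard error used."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled std. dev.
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (xbar1 - xbar2) / se, df, se


def pooled_ci(xbar1, xbar2, se, t_star):
    """Confidence interval for mu1 - mu2 given a critical value t_star."""
    diff = xbar1 - xbar2
    return diff - t_star * se, diff + t_star * se


# Hypothetical summary statistics for two groups of ten.
t, df, se = pooled_t(5.0, 2.0, 10, 3.0, 2.0, 10)
T_STAR = 2.101  # t(18) critical value for 95%, from a t table
low, high = pooled_ci(5.0, 3.0, se, T_STAR)

print(round(t, 3), df, (round(low, 3), round(high, 3)))
```

If the resulting interval excludes 0, the two-sided test at the matching significance level rejects the hypothesis of equal means.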
![Page 39: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/39.jpg)
![Page 40: For 95 out of 100 (large) samples, the interval will contain the true population mean. But we don’t know ?!](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649ea95503460f94bad53f/html5/thumbnails/40.jpg)