S245 12 Sampling Theory
-
Upload
samuel-thompson -
-
8/12/2019 S245 12 Sampling Theory
1/104
Sampling Theory
Determining the distribution of Sample
statistics
-
Sampling Theory: sampling distributions
A sample is a subset of the population. In many instances it is too costly to collect data from the entire population.
Note: It is important to recognize the dissimilarity (variability) we should expect to see in various samples from the same population.
It is important that we model this variability and use it to assess the accuracy of decisions made from samples.
-
Statistics and Parameters
A statistic is a numerical value computed from a sample. Its value may differ for different samples, e.g. sample mean \(\bar{x}\), sample standard deviation \(s\), and sample proportion \(\hat{p}\).
A parameter is a numerical value associated with a population. It is considered fixed and unchanging, e.g. population mean \(\mu\), population standard deviation \(\sigma\), and population proportion \(p\).
-
Observations on a measurement \(X\), \(x_1, x_2, x_3, \ldots, x_n\), taken on individuals (cases) selected at random from a population are random variables prior to their observation.
The observations are numerical quantities whose values are determined by the outcome of a random experiment (the choosing of a random sample from the population).
-
The probability distribution of the observations \(x_1, x_2, x_3, \ldots, x_n\) is sometimes called the population.
This distribution is the smooth histogram of the variable \(X\) for the entire population.
[Figure: smooth histogram of \(X\) for the entire population; horizontal axis 0 to 60]
-
The population is unobserved (unless all observations in the population have been observed).
[Figure: smooth histogram of the population; horizontal axis 0 to 60]
-
A histogram computed from the observations \(x_1, x_2, x_3, \ldots, x_n\) gives an estimate of the population.
[Figure: histogram of the sample observations; horizontal axis 0 to 60]
-
A statistic computed from the observations \(x_1, x_2, x_3, \ldots, x_n\) is also a random variable prior to observation of the sample.
A statistic is also a numerical quantity whose value is determined by the outcome of a random experiment (the choosing of a random sample from the population).
-
The probability distribution of a statistic computed from the observations \(x_1, x_2, x_3, \ldots, x_n\) is sometimes called its sampling distribution.
This distribution describes the random behaviour of the statistic.
[Figure: sampling distribution of a statistic; horizontal axis 0 to 60]
-
It is important to determine the sampling distribution of a statistic.
It will describe its sampling behaviour.
The sampling distribution will be used to assess the accuracy of the statistic when used for the purpose of estimation.
Sampling theory is the area of Mathematical Statistics that is interested in determining the sampling distribution of various statistics.
-
Many statistics have a normal distribution.
This quite often is true if the population is Normal.
It is also sometimes true if the sample size is reasonably large (reason: the Central Limit Theorem, to be mentioned later).
-
Combining Random Variables
Quite often we have two or more random variables X, Y, Z, etc.
We combine these random variables using a mathematical expression.
Important question:
What is the distribution of the new random variable?
-
Example 1: Suppose that one performs two independent tasks (A and B):
X = time to perform task A (normal with mean 25 minutes and standard deviation 3 minutes).
Y = time to perform task B (normal with mean 15 minutes and standard deviation 2 minutes).
Let T = X + Y = total time to perform the two tasks.
What is the distribution of T?
What is the probability that the two tasks take more than 45 minutes to perform?
-
Example 2:
Suppose that a student will take three tests in the next three days:
1. Mathematics (X is the score he will receive on this test.)
2. English Literature (Y is the score he will receive on this test.)
3. Social Studies (Z is the score he will receive on this test.)
-
Assume that
1. X (Mathematics) has a Normal distribution with mean \(\mu = 90\) and standard deviation \(\sigma = 3\).
2. Y (English Literature) has a Normal distribution with mean \(\mu = 60\) and standard deviation \(\sigma = 10\).
3. Z (Social Studies) has a Normal distribution with mean \(\mu = 70\) and standard deviation \(\sigma = 7\).
-
Graphs
[Figure: Normal density curves on a horizontal axis from 0 to 100 for X (Mathematics), \(\mu = 90\), \(\sigma = 3\); Y (English Literature), \(\mu = 60\), \(\sigma = 10\); Z (Social Studies), \(\mu = 70\), \(\sigma = 7\)]
-
Suppose that after the tests have been written an overall score, S, will be computed as follows:
S (Overall score) = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)
What is the distribution of the overall score, S?
-
Sums, Differences, Linear Combinations of R.V.s
A linear combination of random variables X, Y, ... is a combination of the form:
\[ L = aX + bY + \cdots + c \quad (c \text{ a constant}) \]
where a, b, etc. are numbers, positive or negative.
Most common:
Sum = X + Y
Difference = X - Y
Others:
Averages = 1/3 X + 1/3 Y + 1/3 Z
Weighted averages = 0.40 X + 0.25 Y + 0.35 Z
-
Means of Linear Combinations
If \(L = aX + bY + \cdots + c\), the mean of L is:
Mean(L) = a Mean(X) + b Mean(Y) + ... + c
\[ \mu_L = a\mu_X + b\mu_Y + \cdots + c \]
Most common:
Mean(X + Y) = Mean(X) + Mean(Y)
Mean(X - Y) = Mean(X) - Mean(Y)
-
Variances of Linear Combinations
If X, Y, ... are independent random variables and \(L = aX + bY + \cdots + c\), then
Variance(L) = a^2 Variance(X) + b^2 Variance(Y) + ...
\[ \sigma_L^2 = a^2\sigma_X^2 + b^2\sigma_Y^2 + \cdots \]
Most common:
Variance(X + Y) = Variance(X) + Variance(Y)
Variance(X - Y) = Variance(X) + Variance(Y)
The constant c has no effect on the variance.
-
Example: Suppose that one performs two independent tasks (A and B):
X = time to perform task A (normal with mean 25 minutes and standard deviation 3 minutes).
Y = time to perform task B (normal with mean 15 minutes and standard deviation 2 minutes).
X and Y are independent, so T = X + Y = total time is normal with
\[ \mu_T = 25 + 15 = 40 \quad\text{and}\quad \sigma_T = \sqrt{3^2 + 2^2} = \sqrt{13} \approx 3.6 \]
What is the probability that the two tasks take more than 45 minutes to perform?
\[ P(T > 45) = P\left(Z > \frac{45 - 40}{3.6}\right) = P(Z > 1.39) = 0.0823 \]
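The two-task calculation can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `normal_cdf` is illustrative and does not come from the slides. It keeps the unrounded standard deviation \(\sqrt{13}\), so the result differs slightly in the fourth decimal place from the table-based value 0.0823.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# T = X + Y with X ~ Normal(25, 3) and Y ~ Normal(15, 2), independent.
mean_t = 25 + 15                  # means add
sd_t = math.sqrt(3**2 + 2**2)     # variances add, then take the square root

z = (45 - mean_t) / sd_t
p_more_than_45 = 1.0 - normal_cdf(z)
print(f"mean = {mean_t}, sd = {sd_t:.4f}, z = {z:.3f}, P(T > 45) = {p_more_than_45:.4f}")
```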
-
Example 2:
A student will take three tests in the next three days:
1. X (Mathematics) has a Normal distribution with mean \(\mu = 90\) and standard deviation \(\sigma = 3\).
2. Y (English Literature) has a Normal distribution with mean \(\mu = 60\) and standard deviation \(\sigma = 10\).
3. Z (Social Studies) has a Normal distribution with mean \(\mu = 70\) and standard deviation \(\sigma = 7\).
Overall score, S = 0.50 X (Mathematics) + 0.30 Y (English Literature) + 0.20 Z (Social Studies) + 10 (Bonus marks)
-
Determine the distribution of
S = 0.50 X + 0.30 Y + 0.20 Z + 10
S has a normal distribution with mean
\[ \mu_S = 0.50\mu_X + 0.30\mu_Y + 0.20\mu_Z + 10 = 0.50(90) + 0.30(60) + 0.20(70) + 10 = 45 + 18 + 14 + 10 = 87 \]
and standard deviation
\[ \sigma_S = \sqrt{(0.5)^2\sigma_X^2 + (0.3)^2\sigma_Y^2 + (0.2)^2\sigma_Z^2} = \sqrt{(0.5)^2 3^2 + (0.3)^2 10^2 + (0.2)^2 7^2} = \sqrt{2.25 + 9 + 1.96} = \sqrt{13.21} = 3.635 \]
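The same rules for means and variances apply to any linear combination of independent random variables. The sketch below wraps them in a small helper and applies it to the overall score S; the function name `linear_combination` and its argument layout are illustrative assumptions, not part of the slides.

```python
import math

def linear_combination(coeffs, means, sds, constant=0.0):
    """Mean and standard deviation of a*X + b*Y + ... + c.

    Assumes X, Y, ... are independent, so the variances add.
    The constant shifts the mean but has no effect on the variance.
    """
    mean = sum(a * m for a, m in zip(coeffs, means)) + constant
    variance = sum((a * s) ** 2 for a, s in zip(coeffs, sds))
    return mean, math.sqrt(variance)

# S = 0.50 X + 0.30 Y + 0.20 Z + 10 with the test-score distributions above.
mean_s, sd_s = linear_combination(
    coeffs=[0.50, 0.30, 0.20],
    means=[90, 60, 70],
    sds=[3, 10, 7],
    constant=10,
)
print(f"mean of S = {mean_s:.2f}, sd of S = {sd_s:.3f}")
```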
-
Graph
[Figure: distribution of S = 0.50 X + 0.30 Y + 0.20 Z + 10; horizontal axis 0 to 100]
-
Normality of Linear Combinations
If X, Y, ... are independent Normal random variables and
\[ L = aX + bY + \cdots + c \]
then L is Normal with mean
\[ \mu_L = a\mu_X + b\mu_Y + \cdots + c \]
and standard deviation
\[ \sigma_L = \sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2 + \cdots} \]
-
In particular:
X + Y is normal with
\[ \text{mean } \mu_X + \mu_Y \quad\text{and standard deviation } \sqrt{\sigma_X^2 + \sigma_Y^2} \]
X - Y is normal with
\[ \text{mean } \mu_X - \mu_Y \quad\text{and standard deviation } \sqrt{\sigma_X^2 + \sigma_Y^2} \]
-
The distribution of the sample mean
-
The distribution of averages (the mean)
Let \(x_1, x_2, \ldots, x_n\) denote n independent random variables each coming from the same Normal distribution with mean \(\mu\) and standard deviation \(\sigma\).
Let
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \cdots + \frac{1}{n}x_n \]
What is the distribution of \(\bar{x}\)?
-
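The distribution of averages (the mean)
Because the mean is a linear combination,
\[ \mu_{\bar{x}} = \frac{1}{n}\mu_{x_1} + \frac{1}{n}\mu_{x_2} + \cdots + \frac{1}{n}\mu_{x_n} = \frac{1}{n}\mu + \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = n\left(\frac{1}{n}\mu\right) = \mu \]
and
\[ \sigma_{\bar{x}}^2 = \left(\frac{1}{n}\right)^2\sigma_{x_1}^2 + \left(\frac{1}{n}\right)^2\sigma_{x_2}^2 + \cdots + \left(\frac{1}{n}\right)^2\sigma_{x_n}^2 = \left(\frac{1}{n}\right)^2\sigma^2 + \cdots + \left(\frac{1}{n}\right)^2\sigma^2 = n\left(\frac{\sigma^2}{n^2}\right) = \frac{\sigma^2}{n} \]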
-
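Thus if \(x_1, x_2, \ldots, x_n\) denote n independent random variables each coming from the same Normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \cdots + \frac{1}{n}x_n \]
has a Normal distribution with
\[ \text{mean } \mu_{\bar{x}} = \mu, \quad \text{variance } \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}, \quad \text{standard deviation } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]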
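A quick simulation can make this result concrete. The sketch below draws many samples of size n from a Normal population and compares the standard deviation of the resulting sample means with \(\sigma/\sqrt{n}\). The population values, the number of repetitions, and the seed are illustrative assumptions chosen for the demonstration.

```python
import math
import random
import statistics

random.seed(245)  # arbitrary seed so the run is repeatable

mu, sigma, n = 220.0, 17.0, 10   # illustrative population and sample size
reps = 20000                     # number of simulated samples

# Each entry is the mean of one random sample of size n.
sample_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theory_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (theory {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {theory_sd:.3f})")
```

The simulated values should land close to the theoretical ones, with the agreement improving as the number of repetitions grows.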
-
Graphs
[Figure: the probability distribution of individual observations (mean \(\mu\), standard deviation \(\sigma\)) and the narrower probability distribution of the mean (standard deviation \(\sigma/\sqrt{n}\)); horizontal axis 150 to 310]
-
Summary
The distribution of the sample mean is Normal.
The distribution of the sample mean has exactly the same mean as the population (\(\mu\)).
The distribution of the sample mean has a smaller standard deviation than the population: \(\sigma_{\bar{x}} = \sigma/\sqrt{n}\) compared to \(\sigma\).
Averaging tends to decrease variability.
An Excel file (mean.XLS) illustrating the distribution of the sample mean
-
Example
Suppose we are measuring the cholesterol level of men age 60-65.
This measurement has a Normal distribution with mean \(\mu = 220\) and standard deviation \(\sigma = 17\).
A sample of n = 10 males age 60-65 is selected and the cholesterol level is measured for those 10 males.
\(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9, x_{10}\) are those 10 measurements.
Find the probability distribution of \(\bar{x}\).
Compute the probability that \(\bar{x}\) is between 215 and 225.
-
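Solution
Find the probability distribution of \(\bar{x}\): Normal with
\[ \mu_{\bar{x}} = \mu = 220 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{17}{\sqrt{10}} = 5.376 \]
\[ P(215 \le \bar{x} \le 225) = P\left(\frac{215 - 220}{5.376} \le \frac{\bar{x} - 220}{5.376} \le \frac{225 - 220}{5.376}\right) = P(-0.930 \le z \le 0.930) = 0.648 \]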
8/12/2019 S245 12 Sampling Theory
43/104
The Central Limit Theorem
The Central Limit Theorem (C.L.T.) states that if nissufficiently large, the sample meansof randomsamples from anypopulation with mean mand finite
standard deviation sare approximately normallydistributedwith mean mand standard deviation .
Technical Note:
The mean and standard deviation given in the CLThold for any sample size; it is only the approximatelynormal shape that requires n to be sufficiently large.
n
s
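The theorem can be illustrated by simulation. The sketch below samples from a skewed, non-normal population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely as an assumption for the demonstration) and looks at the sample means for n = 2, 10, and 30. For each n it reports the mean and standard deviation of the sample means, and the fraction falling within 1.96 theoretical standard deviations of the population mean, which would be about 0.95 for an exactly normal shape.

```python
import math
import random
import statistics

random.seed(12)  # arbitrary seed so the run is repeatable

POP_MEAN, POP_SD = 1.0, 1.0   # Exponential(1) population: skewed, non-normal
REPS = 20000                  # simulated samples per sample size

def sample_mean(n):
    """Mean of n independent draws from the Exponential(1) population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

results = {}
for n in (2, 10, 30):
    means = [sample_mean(n) for _ in range(REPS)]
    theory_sd = POP_SD / math.sqrt(n)
    # Fraction of sample means within 1.96 theoretical sd of the population mean.
    inside = sum(abs(m - POP_MEAN) <= 1.96 * theory_sd for m in means) / REPS
    results[n] = (statistics.fmean(means), statistics.stdev(means), theory_sd, inside)
    print(f"n={n:2d}: mean={results[n][0]:.3f}, sd={results[n][1]:.3f}, "
          f"theory sd={theory_sd:.3f}, within 1.96 sd={inside:.3f}")
```

The mean and standard deviation match the theoretical values at every n, in line with the technical note; only the shape needs a larger n to look normal.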
-
Graphical Illustration of the Central Limit Theorem
[Figure: four panels showing the original population and the distribution of \(\bar{x}\) for n = 2, n = 10, and n = 30]
-
Implications of the Central Limit Theorem
The conclusion that the sampling distribution of the sample mean is approximately Normal will be true if the sample size is large (> 30), even though the population may be non-normal.
When the population can be assumed to be normal, the conclusion that the sampling distribution of the sample mean is Normal will be true for any sample size.
Knowing the sampling distribution of the sample mean allows us to answer probability questions related to the sample mean.
-
Example
Example: Consider a normal population with \(\mu = 50\) and \(\sigma = 15\). Suppose a sample of size 9 is selected at random. Find:
1) \(P(45 \le \bar{x} \le 60)\)
2) \(P(\bar{x} \le 47.5)\)
Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal, with
\[ \mu_{\bar{x}} = \mu = 50 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5 \]
-
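Example
1) Using \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\):
\[ P(45 \le \bar{x} \le 60) = P\left(\frac{45 - 50}{5} \le z \le \frac{60 - 50}{5}\right) = P(-1.00 \le z \le 2.00) = 0.8413 - 0.0228 = 0.8185 \]
[Figure: normal curve with the area between \(\bar{x} = 45\) and \(\bar{x} = 60\) (z = -1.00 to z = 2.00) marked]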
-
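Example
2) Using \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\):
\[ P(\bar{x} \le 47.5) = P\left(\frac{\bar{x} - 50}{5} \le \frac{47.5 - 50}{5}\right) = P(z \le -0.50) = 0.5000 - 0.1915 = 0.3085 \]
[Figure: normal curve with the area 0.3085 below \(\bar{x} = 47.5\) (z = -0.50) marked]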
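Both parts of this example follow the same recipe: standardize with \(\sigma/\sqrt{n}\), then look up normal probabilities. A minimal sketch, assuming the helper names `normal_cdf` and `prob_mean_between` (neither appears in the slides):

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_between(low, high, mu, sigma, n):
    """P(low <= xbar <= high) for the mean of n draws from Normal(mu, sigma)."""
    se = sigma / math.sqrt(n)   # standard error of the mean
    return normal_cdf((high - mu) / se) - normal_cdf((low - mu) / se)

mu, sigma, n = 50, 15, 9

p1 = prob_mean_between(45, 60, mu, sigma, n)
p2 = prob_mean_between(-math.inf, 47.5, mu, sigma, n)

print(f"1) P(45 <= xbar <= 60) = {p1:.4f}")
print(f"2) P(xbar <= 47.5)     = {p2:.4f}")
```

Passing `-math.inf` as the lower limit turns the interval probability into a one-sided one, since `math.erf` returns -1 at negative infinity.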
-
Example
1) Using \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\):
\[ P(\bar{x} < 105) = P\left(z < \frac{105 - 109}{2.83}\right) = P(z < -1.41) = 0.0793 \]
[Figure: normal curve centred at 109 with the area 0.0793 below \(\bar{x} = 105\) (z = -1.41) marked]
-
2) To investigate the claim, we need to examine how likely an observation is the sample mean of $120.
Consider how far out in the tail of the distribution of the sample mean $120 is. Using \(z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}\):
\[ P(\bar{x} \ge 120) = P\left(z \ge \frac{120 - 109}{2.83}\right) = P(z \ge 3.89) = 1.0000 - 0.9999 = 0.0001 \]
Since the probability is so small, this suggests the observation of $120 is very rare (if the mean cost is really $109).
There is evidence (the sample) to suggest the claim of \(\mu\) = $109 is likely wrong.
-
Summary
The distribution of \(\bar{x}\) is (exactly) normal when the original population is normal.
The CLT says: the distribution of \(\bar{x}\) is approximately normal regardless of the shape of the original distribution, when the sample size is large enough!
The mean of the sampling distribution of \(\bar{x}\) is equal to the mean of the original population:
\[ \mu_{\bar{x}} = \mu \]
The standard deviation of the sampling distribution of \(\bar{x}\) (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size:
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
-
Sampling Distribution of a Sample Proportion
-
Sampling Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success.
Let
\[ \hat{p} = \frac{X}{n} = \frac{\text{no. of successes}}{\text{no. of binomial trials}} \]
= sample proportion or proportion of successes.
Then the sampling distribution of \(\hat{p}\) is approximately a normal distribution with
\[ \text{mean } \mu_{\hat{p}} = p \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
-
Logic
Recall X = the number of successes in n trials has a Binomial distribution with parameters n and p (the probability of success).
Also X has approximately a Normal distribution with
\[ \text{mean } \mu_X = np \quad\text{and standard deviation } \sigma_X = \sqrt{npq} = \sqrt{np(1-p)} \]
Then the sampling distribution of
\[ \hat{p} = \frac{X}{n} = \frac{1}{n}X \]
is approximately a normal distribution with
\[ \text{mean } \mu_{\hat{p}} = \frac{1}{n}\mu_X = \frac{1}{n}np = p \quad\text{and}\quad \sigma_{\hat{p}} = \frac{1}{n}\sigma_X = \frac{1}{n}\sqrt{np(1-p)} = \sqrt{\frac{p(1-p)}{n}} \]
-
[Figure: sampling distribution of \(\hat{p}\), centred at \(\mu_{\hat{p}} = p\) with \(\sigma_{\hat{p}} = \sqrt{p(1-p)/n}\); horizontal axis 0 to 1]
-
Example: Sample Proportion Favoring a Candidate
Suppose 20% of all voters favor Candidate A. Pollsters take a sample of n = 600 voters. Then the sample proportion who favor A will have approximately a normal distribution with
\[ \text{mean } \mu_{\hat{p}} = p = 0.20 \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20(0.80)}{600}} = 0.01633 \]
-
[Figure: sampling distribution of \(\hat{p}\); horizontal axis 0 to 1]
-
Using the sampling distribution:
Suppose 20% of all voters favor Candidate A. Pollsters take a sample of n = 600 voters.
Determine the probability that the sample proportion will be between 0.18 and 0.22, i.e. the probability \(P(0.18 \le \hat{p} \le 0.22)\).
-
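Solution:
Recall
\[ \mu_{\hat{p}} = p = 0.20 \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.20(0.80)}{600}} = 0.01633 \]
\[ P(0.18 \le \hat{p} \le 0.22) = P\left(\frac{0.18 - 0.20}{0.01633} \le \frac{\hat{p} - 0.20}{0.01633} \le \frac{0.22 - 0.20}{0.01633}\right) = P(-1.225 \le z \le 1.225) = 0.8897 - 0.1103 = 0.7794 \]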
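The polling calculation can be verified with a short script. This is a sketch under the normal approximation used in the slides; `normal_cdf` is an illustrative helper name.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, n = 0.20, 600

sd_phat = math.sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
z_low = (0.18 - p) / sd_phat
z_high = (0.22 - p) / sd_phat
prob = normal_cdf(z_high) - normal_cdf(z_low)

print(f"sd of sample proportion = {sd_phat:.5f}")
print(f"P(0.18 <= phat <= 0.22) = P({z_low:.3f} <= z <= {z_high:.3f}) = {prob:.4f}")
```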
-
Distribution for Sample Mean
If data is collected from a Normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then the sampling distribution of \(\bar{x}\) is a normal distribution with
\[ \text{mean } \mu_{\bar{x}} = \mu \quad\text{and standard deviation } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
-
The Central Limit Theorem
If data is collected from a distribution (possibly non-Normal) with mean \(\mu\) and standard deviation \(\sigma\), then the sampling distribution of \(\bar{x}\) is approximately normal (for n > 30) with
\[ \text{mean } \mu_{\bar{x}} = \mu \quad\text{and standard deviation } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
-
Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success.
Let
\[ \hat{p} = \frac{X}{n} = \frac{\text{no. of successes}}{\text{no. of binomial trials}} \]
= sample proportion or proportion of successes.
Then the sampling distribution of \(\hat{p}\) is approximately a normal distribution with
\[ \text{mean } \mu_{\hat{p}} = p \quad\text{and}\quad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
-
Sampling distribution of differences
-
Sampling distribution of a difference in two Sample means
-
Recall
If X, Y are independent normal random variables, then X - Y is normal with
\[ \text{mean } \mu_X - \mu_Y \quad\text{and standard deviation } \sqrt{\sigma_X^2 + \sigma_Y^2} \]
-
Comparing Means
Situation
We have two normal populations (1 and 2).
Let \(\mu_1\) and \(\sigma_1\) denote the mean and standard deviation of population 1.
Let \(\mu_2\) and \(\sigma_2\) denote the mean and standard deviation of population 2.
Let \(x_1, x_2, x_3, \ldots, x_n\) denote a sample from normal population 1.
Let \(y_1, y_2, y_3, \ldots, y_m\) denote a sample from normal population 2.
The objective is to compare the two population means.
-
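We know that:
\(\bar{x}\) is Normal with mean \(\mu_{\bar{x}} = \mu_1\) and \(\sigma_{\bar{x}} = \dfrac{\sigma_1}{\sqrt{n}}\)
and
\(\bar{y}\) is Normal with mean \(\mu_{\bar{y}} = \mu_2\) and \(\sigma_{\bar{y}} = \dfrac{\sigma_2}{\sqrt{m}}\)
Thus \(D = \bar{x} - \bar{y}\) is Normal with mean
\[ \mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2 \]
and standard deviation
\[ \sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} \]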
-
Example
Consider measuring heart rate two minutes after a twenty minute exercise program.
There are two groups of individuals:
1. Those who performed exercise program A (considered to be heavy).
2. Those who performed exercise program B (considered to be light).
The average heart rate for those who performed exercise program A was \(\mu_1 = 110\) with standard deviation \(\sigma_1 = 7.3\), while the average heart rate for those who performed exercise program B was \(\mu_2 = 95\) with standard deviation \(\sigma_2 = 4.5\).
-
[Figure: distributions of heart rate for program B and for program A; horizontal axis 80 to 130]
-
Situation
Suppose we observe the heart rate of n = 15 subjects on program A.
Let \(x_1, x_2, x_3, \ldots, x_{15}\) denote these observations.
We also observe the heart rate of m = 20 subjects on program B.
Let \(y_1, y_2, y_3, \ldots, y_{20}\) denote these observations.
What is the probability that the sample mean heart rate for Program A is at least 8 units higher than the sample mean heart rate for Program B?
-
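We know that:
\(\bar{x}\) is Normal with mean \(\mu_{\bar{x}} = 110\) and \(\sigma_{\bar{x}} = \dfrac{7.3}{\sqrt{15}}\)
and
\(\bar{y}\) is Normal with mean \(\mu_{\bar{y}} = 95\) and \(\sigma_{\bar{y}} = \dfrac{4.5}{\sqrt{20}}\)
and \(D = \bar{x} - \bar{y}\) is Normal with mean
\[ \mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = 110 - 95 = 15 \]
and standard deviation
\[ \sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} = \sqrt{\frac{7.3^2}{15} + \frac{4.5^2}{20}} = 2.1366 \]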
-
[Figure: distribution of the sample mean for program B and distribution of the sample mean for program A; horizontal axis 80 to 130]
-
[Figure: distribution of the difference in sample means, D; horizontal axis 0 to 30]
-
What is the probability that the sample mean heart rate for Program A is at least 8 units higher than the sample mean heart rate for Program B?
Solution
We want
\[ P(\bar{x} \ge \bar{y} + 8) = P(\bar{x} - \bar{y} \ge 8) = P(D \ge 8) \]
\[ = P\left(\frac{D - 15}{2.1366} \ge \frac{8 - 15}{2.1366}\right) = P(z \ge -3.28) = 1 - 0.0005 = 0.9995 \]
8/12/2019 S245 12 Sampling Theory
77/104
Sampling distribution of a differencein two
Sample proportions
C i P ti
-
Comparing Proportions
Situation
Suppose we have two Success-Failure experiments.
Let \(p_1\) = the probability of success for experiment 1.
Let \(p_2\) = the probability of success for experiment 2.
Suppose that experiment 1 is repeated \(n_1\) times and experiment 2 is repeated \(n_2\) times.
Let \(x_1\) = the no. of successes in the \(n_1\) repetitions of experiment 1, and \(x_2\) = the no. of successes in the \(n_2\) repetitions of experiment 2.
\[ \hat{p}_1 = \frac{x_1}{n_1} \quad\text{and}\quad \hat{p}_2 = \frac{x_2}{n_2} \]
What is the distribution of \(D = \hat{p}_1 - \hat{p}_2 = \dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\)?
We know that:
-
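\(\hat{p}_1 = \dfrac{x_1}{n_1}\) is Normal with mean \(\mu_{\hat{p}_1} = p_1\) and \(\sigma_{\hat{p}_1} = \sqrt{\dfrac{p_1(1 - p_1)}{n_1}}\)
Also \(\hat{p}_2 = \dfrac{x_2}{n_2}\) is Normal with mean \(\mu_{\hat{p}_2} = p_2\) and \(\sigma_{\hat{p}_2} = \sqrt{\dfrac{p_2(1 - p_2)}{n_2}}\)
Thus \(D = \hat{p}_1 - \hat{p}_2\) is Normal with mean
\[ \mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 \]
and standard deviation
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]
Example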
-
The Globe and Mail carried out a survey to investigate the State of the Baby Boomers. (June 2006)
Two populations in the study:
1. Baby Boomers (age 40-59) (\(n_1 = 664\))
2. Generation X (age 30-39) (\(n_2 = 342\))
-
One of the questions:
Are you close to your parents? Yes or No
Suppose that the proportions in the two populations were:
Baby Boomers: 40% yes (\(p_1 = 0.40\))
Generation X: 20% yes (\(p_2 = 0.20\))
What is the probability that this would be observed in the samples to a certain degree?
What is \(P[\hat{p}_1 - \hat{p}_2 \ge 0.15]\)?
Solution:
-
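\(\hat{p}_1 = \dfrac{x_1}{n_1}\) is Normal with mean \(\mu_{\hat{p}_1} = p_1 = 0.40\) and
\[ \sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}} = \sqrt{\frac{0.40(1 - 0.40)}{664}} = 0.019012 \]
Also \(\hat{p}_2 = \dfrac{x_2}{n_2}\) is Normal with mean \(\mu_{\hat{p}_2} = p_2 = 0.20\) and
\[ \sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.20(1 - 0.20)}{342}} = 0.02163 \]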
-
[Figure: distribution of the sample proportion for Gen X and distribution of the sample proportion for Baby Boomers; horizontal axis 0 to 1]
-
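Now \(D = \hat{p}_1 - \hat{p}_2\) is Normal with mean
\[ \mu_D = \mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 = 0.4 - 0.2 = 0.2 \]
and standard deviation
\[ \sigma_D = \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.4(1 - 0.4)}{664} + \frac{0.2(1 - 0.2)}{342}} = 0.028797 \]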
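The mean and standard deviation of D can be reproduced as follows. The last step of the sketch also evaluates a tail probability for the threshold 0.15, under the assumption that the question asks for a sample difference of at least 0.15; that reading of the inequality, and the helper names `normal_cdf` and `diff_of_proportions`, are assumptions rather than something stated in the slides.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def diff_of_proportions(p1, n1, p2, n2):
    """Mean and sd of D = phat1 - phat2 under the normal approximation."""
    mean_d = p1 - p2
    sd_d = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return mean_d, sd_d

# Baby Boomers: p1 = 0.40, n1 = 664; Generation X: p2 = 0.20, n2 = 342.
mean_d, sd_d = diff_of_proportions(0.40, 664, 0.20, 342)
print(f"mean of D = {mean_d:.2f}, sd of D = {sd_d:.6f}")

# Assumption: the event of interest is D >= 0.15.
z = (0.15 - mean_d) / sd_d
p_at_least = 1.0 - normal_cdf(z)
print(f"P(D >= 0.15) = P(z >= {z:.3f}) = {p_at_least:.4f}")
```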
-
[Figure: distribution of \(D = \hat{p}_1 - \hat{p}_2\); horizontal axis 0 to 1]
-
Sampling distributions: Summary
-
Distribution for Sample Mean
If data is collected from a Normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then the sampling distribution of \(\bar{x}\) is a normal distribution with
\[ \text{mean } \mu_{\bar{x}} = \mu \quad\text{and standard deviation } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
-
The Central Limit Theorem
If data is collected from a distribution (possibly non-Normal) with mean \(\mu\) and standard deviation \(\sigma\), then the sampling distribution of \(\bar{x}\) is approximately normal (for n > 30) with
\[ \text{mean } \mu_{\bar{x}} = \mu \quad\text{and standard deviation } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
-
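Distribution of a difference in two sample Means
\(D = \bar{x} - \bar{y}\) is Normal with mean
\[ \mu_{\bar{x} - \bar{y}} = \mu_{\bar{x}} - \mu_{\bar{y}} = \mu_1 - \mu_2 \]
and standard deviation
\[ \sigma_{\bar{x} - \bar{y}} = \sqrt{\sigma_{\bar{x}}^2 + \sigma_{\bar{y}}^2} = \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}} \]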
Distribution of a difference in two sample proportions
-
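\(D = \hat{p}_1 - \hat{p}_2\) is Normal with mean
\[ \mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 \]
and standard deviation
\[ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \]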
-
The Chi-square (\(\chi^2\)) distribution
-
The Chi-squared distribution with n degrees of freedom
Comment: If \(z_1, z_2, \ldots, z_n\) are independent random variables each having a standard normal distribution, then
\[ U = z_1^2 + z_2^2 + \cdots + z_n^2 \]
has a chi-squared distribution with n degrees of freedom.
-
The Chi-squared distribution with n degrees of freedom
[Figure: chi-squared density with n degrees of freedom; horizontal axis 0 to 20]
-
[Figure: chi-squared densities for 2 d.f., 3 d.f., and 4 d.f.; horizontal axis 2 to 14, vertical axis 0.1 to 0.5]
-
Statistics that have the Chi-squared distribution:
1.
\[ \chi^2 = \sum_{j=1}^{c}\sum_{i=1}^{r} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} \]
The statistic used to detect independence between two categorical variables.
d.f. = (r - 1)(c - 1)
-
2. Let \(x_1, x_2, \ldots, x_n\) denote a sample from the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then
\[ U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} \]
has a chi-square distribution with d.f. = n - 1.
Example
-
Suppose that \(x_1, x_2, \ldots, x_{10}\) is a sample of size n = 10 from the normal distribution with mean \(\mu = 100\) and standard deviation \(\sigma = 15\).
Suppose that
\[ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} \]
is the sample standard deviation.
Find \(P(10 \le s \le 20)\).
-
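Note
\[ U = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} = \frac{9 s^2}{15^2} \]
has a chi-square distribution with d.f. = n - 1 = 9.
\[ P(10 \le s \le 20) = P(100 \le s^2 \le 400) = P\left(\frac{9(100)}{15^2} \le \frac{9 s^2}{15^2} \le \frac{9(400)}{15^2}\right) = P(4 \le U \le 16) \]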
-
The Excel function
CHIDIST(x, df) computes \(P(x \le U)\), which is what is needed to evaluate \(P(4 \le U \le 16)\).
[Figure: chi-square density with the point x and the area \(P(x \le U)\) marked]
8/12/2019 S245 12 Sampling Theory
103/104
4 16 CHIDIST(4,9)-CHIDIST(16,9)P U
= 0.91141 - 0.06688 = 0.84453
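For readers without Excel, the same upper-tail probabilities can be computed in plain Python. The sketch below is one possible approach, not the method Excel uses: for integer degrees of freedom it starts from a known closed form (\(e^{-x/2}\) for 2 d.f., or \(\operatorname{erfc}(\sqrt{x/2})\) for 1 d.f.) and applies the standard recurrence for the upper regularized incomplete gamma function, \(Q(a + 1, h) = Q(a, h) + h^a e^{-h}/\Gamma(a + 1)\). The function name `chi2_upper_tail` is illustrative.

```python
import math

def chi2_upper_tail(x, df):
    """P(U >= x) for a chi-square variable U with integer df >= 1.

    Plays the same role as Excel's CHIDIST(x, df).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # closed form for 2 d.f.
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # closed form for 1 d.f.
    # Step up two degrees of freedom at a time until a reaches df / 2.
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

upper_4 = chi2_upper_tail(4, 9)
upper_16 = chi2_upper_tail(16, 9)
prob = upper_4 - upper_16

print(f"P(U >= 4)  = {upper_4:.5f}")
print(f"P(U >= 16) = {upper_16:.5f}")
print(f"P(4 <= U <= 16) = P(10 <= s <= 20) = {prob:.5f}")
```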
-
Statistical Inference