Statistics for Business and Economics: Chapter 21
1 Slide
Slides Prepared by JOHN S. LOUCKS
St. Edward’s University
© 2002 South-Western College Publishing/Thomson Learning
2 Slide
Chapter 21 Sample Survey
Terminology Used in Sample Surveys
Types of Surveys and Sampling Methods
Survey Errors
Simple Random Sampling
Stratified Simple Random Sampling
Cluster Sampling
Systematic Sampling
3 Slide
Terminology Used in Sample Surveys
An element is the entity on which data are collected.
A population is the collection of all elements of interest.
A sample is a subset of the population.
4 Slide
Terminology Used in Sample Surveys
The target population is the population we want to make inferences about.
The sampled population is the population from which the sample is actually selected.
These two populations are not always the same.
If inferences from a sample are to be valid, the sampled population must be representative of the target population.
5 Slide
Terminology Used in Sample Surveys
The population is divided into sampling units which are groups of elements or the elements themselves.
A list of the sampling units for a particular study is called a frame.
The choice of a particular frame is often determined by the availability and reliability of a list.
The development of a frame can be one of the most difficult and important steps in conducting a sample survey.
6 Slide
Types of Surveys
Surveys Involving Questionnaires
• Three common types are mail surveys, telephone surveys, and personal interview surveys.
• Survey costs are lower for mail and telephone surveys.
• With well-trained interviewers, higher response rates and longer questionnaires are possible with personal interviews.
• The design of the questionnaire is critical.
7 Slide
Types of Surveys
Surveys Not Involving Questionnaires
• Often, someone simply counts or measures the sampled items and records the results.
• An example is sampling a company’s inventory of parts to estimate the total inventory value.
8 Slide
Sampling Methods
Sample surveys can also be classified in terms of the sampling method used.
The two categories of sampling methods are:
• Probabilistic sampling
• Nonprobabilistic sampling
9 Slide
Nonprobabilistic Sampling Methods
The probability of obtaining each possible sample cannot be computed.
Statistically valid statements cannot be made about the precision of the estimates.
Sampling cost is lower and implementation is easier.
Methods include convenience and judgment sampling.
10 Slide
Nonprobabilistic Sampling Methods
Convenience Sampling
• The units included in the sample are chosen because of accessibility.
• In some cases, convenience sampling is the only practical approach.
11 Slide
Nonprobabilistic Sampling Methods
Judgment Sampling
• A knowledgeable person selects sampling units that he/she feels are most representative of the population.
• The quality of the result is dependent on the judgment of the person selecting the sample.
• Generally, no statistical statement should be made about the precision of the result.
12 Slide
Probabilistic Sampling Methods
The probability of obtaining each possible sample can be computed.
Confidence intervals can be developed which provide bounds on the sampling error.
Methods include simple random, stratified simple random, cluster, and systematic sampling.
13 Slide
Survey Errors
Two types of errors can occur in conducting a survey:
• Sampling error
• Nonsampling error
14 Slide
Survey Errors
Sampling Error
• It is defined as the magnitude of the difference between the point estimate, developed from the sample, and the population parameter.
• It occurs because not every element in the population is surveyed.
• It cannot occur in a census.
• It cannot be avoided, but it can be controlled.
15 Slide
Survey Errors
Nonsampling Error
• It can occur in both a census and a sample survey.
• Examples include:
  • Measurement error
  • Errors due to nonresponse
  • Errors due to lack of respondent knowledge
  • Selection error
  • Processing error
16 Slide
Survey Errors
Nonsampling Error
• Measurement Error
  • Measuring instruments are not properly calibrated.
  • People taking the measurements are not properly trained.
17 Slide
Survey Errors
Nonsampling Error
• Errors Due to Nonresponse
  • They occur when no data can be obtained, or only partial data are obtained, for some of the units surveyed.
  • The problem is most serious when a bias is created.
18 Slide
Survey Errors
Nonsampling Error
• Errors Due to Lack of Respondent Knowledge
  • These errors are common in technical surveys.
  • Some respondents might be more capable than others of answering technical questions.
19 Slide
Survey Errors
Nonsampling Error
• Selection Error
  • An inappropriate item is included in the survey.
  • For example, in a survey of “small truck owners” some interviewers include SUV owners while other interviewers do not.
20 Slide
Survey Errors
Nonsampling Error
• Processing Error
  • Data are incorrectly recorded.
  • Data are incorrectly transferred from recording forms to computer files.
21 Slide
Simple Random Sampling
A simple random sample of size n from a finite population of size N is a sample selected such that every possible sample of size n has the same probability of being selected.
We begin by developing a frame or list of all elements in the population.
Then a selection procedure, based on the use of random numbers, is used to ensure that each element in the sampled population has the same probability of being selected.
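The selection step can be sketched in a few lines of Python. This is only an illustration, assuming the frame is available as a list; the client identifiers, the helper name `simple_random_sample`, and the seed are hypothetical and not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each subset of size n equally likely."""
    rng = random.Random(seed)      # a seeded generator makes the draw repeatable
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical frame: 200 client identifiers
frame = [f"client_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 40, seed=1)
print(len(sample), sample[:3])
```

Because `random.sample` draws without replacement, no element can appear twice in the sample.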
22 Slide
Simple Random Sampling
We will see in the upcoming slides how to:
Estimate the following population parameters:
• Population mean
• Population total
• Population proportion
Determine the appropriate sample size
23 Slide
Simple Random Sampling
In a sample survey it is common practice to provide an approximate 95% confidence interval estimate of the population parameter.
Assuming the sampling distribution of the point estimator can be approximated by a normal probability distribution, we use a value of z = 2 for a 95% confidence interval.
The interval estimate is:
Point Estimator ± 2(Estimate of the Standard Error of the Point Estimator)
The bound on the sampling error is:
2(Estimate of the Standard Error of the Point Estimator)
24 Slide
Simple Random Sampling
Population Mean
• Point Estimator
$$\bar{x}$$
• Estimate of the Standard Error of the Mean
$$s_{\bar{x}} = \sqrt{\frac{N-n}{N}} \left( \frac{s}{\sqrt{n}} \right)$$
25 Slide
Simple Random Sampling
Population Mean
• Interval Estimate
$$\bar{x} \pm z_{\alpha/2} \, s_{\bar{x}}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{x} \pm 2 s_{\bar{x}}$$
26 Slide
Simple Random Sampling
Population Total
• Point Estimator
$$\hat{X} = N\bar{x}$$
• Estimate of the Standard Error of the Total
$$s_{\hat{X}} = N s_{\bar{x}}$$
27 Slide
Simple Random Sampling
Population Total
• Interval Estimate
$$N\bar{x} \pm z_{\alpha/2} \, s_{\hat{X}}$$
• Approximate 95% Confidence Interval Estimate
$$N\bar{x} \pm 2 s_{\hat{X}}$$
28 Slide
Simple Random Sampling
Population Proportion
• Point Estimator
$$\bar{p}$$
• Estimate of the Standard Error of the Proportion
$$s_{\bar{p}} = \sqrt{\left( \frac{N-n}{N} \right) \left( \frac{\bar{p}(1-\bar{p})}{n-1} \right)}$$
29 Slide
Simple Random Sampling
Population Proportion
• Interval Estimate
$$\bar{p} \pm z_{\alpha/2} \, s_{\bar{p}}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{p} \pm 2 s_{\bar{p}}$$
30 Slide
Determining the Sample Size
An important consideration in sample design is the choice of sample size.
The best choice usually involves a tradeoff between cost and precision (size of the confidence interval).
Larger samples provide greater precision, but are more costly.
A budget might dictate how large the sample can be.
A specified level of precision might dictate how small a sample can be.
31 Slide
Determining the Sample Size
Smaller confidence intervals provide more precision.
The size of the approximate confidence interval depends on the bound B on the sampling error.
Choosing a level of precision amounts to choosing a value for B.
Given a desired level of precision, we can solve for the value of n.
32 Slide
Simple Random Sampling
Necessary Sample Size for Estimating the Population Mean
$$B = 2 \sqrt{\frac{N-n}{N}} \left( \frac{s}{\sqrt{n}} \right)$$
Hence,
$$n = \frac{N s^2}{N \left( \frac{B^2}{4} \right) + s^2}$$
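The step from the bound B to the sample size n is compressed on the slide. One way to fill it in, using only the symbols already defined (B, N, n, s), is to square both sides and collect the terms in n:

```latex
\begin{align*}
B &= 2\sqrt{\frac{N-n}{N}}\,\frac{s}{\sqrt{n}} \\
\frac{B^2}{4} &= \frac{(N-n)\,s^2}{N n} \\
N n \,\frac{B^2}{4} &= N s^2 - n s^2 \\
n\left(N\,\frac{B^2}{4} + s^2\right) &= N s^2 \\
n &= \frac{N s^2}{N\left(\frac{B^2}{4}\right) + s^2}
\end{align*}
```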
33 Slide
Example: Innis Investments
Simple Random Sampling
Innis is a financial advisor for 200 clients. A sample of 40 clients has been taken to obtain various demographic data and information about the clients’ investment objectives. Statistics of particular interest are the clients’ age, clients’ total net worth, and the proportion favoring fixed-income investments.
34 Slide
Example: Innis Investments
Simple Random Sampling
For the sample, the mean age was 52 (with a standard deviation of 10), the mean net worth was $480,000 (with a standard deviation of $120,000), and the proportion favoring fixed-income investments was .30.
35 Slide
Example: Innis Investments
Estimate of Standard Error of Mean Age
$$s_{\bar{x}} = \sqrt{\frac{N-n}{N}} \left( \frac{s}{\sqrt{n}} \right) = \sqrt{\frac{200-40}{200}} \left( \frac{10}{\sqrt{40}} \right) = 1.41$$
Approximate 95% Confidence Interval for Mean Age
$$\bar{x} \pm 2 s_{\bar{x}} = 52 \pm 2(1.41) = 52 \pm 2.82 = 49.18 \text{ to } 54.82$$
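The mean-age calculation can be checked with a short Python sketch. The function name `srs_mean_interval` is an illustrative choice, not something from the slides; the inputs are the Innis figures given above (N = 200, n = 40, mean 52, standard deviation 10).

```python
import math

def srs_mean_interval(x_bar, s, n, N, z=2.0):
    """Standard error and approximate interval for a mean under simple random sampling."""
    fpc = math.sqrt((N - n) / N)       # finite population correction factor
    se = fpc * s / math.sqrt(n)        # estimated standard error of the mean
    return se, x_bar - z * se, x_bar + z * se

se, low, high = srs_mean_interval(x_bar=52, s=10, n=40, N=200)
print(f"se = {se:.4f}, interval = {low:.2f} to {high:.2f}")
```

Carrying the unrounded standard error (about 1.4142) gives limits of roughly 49.17 and 54.83, a hundredth away from the hand calculation that rounds the standard error to 1.41 first.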
36 Slide
Example: Innis Investments
Point Estimate of Total Net Worth (TNW) of Clients
$$\hat{X} = N\bar{x} = 200(480) = 96{,}000 \text{ thousand} = \$96{,}000{,}000$$
Estimate of Standard Error of TNW
$$s_{\hat{X}} = N s_{\bar{x}} = N \sqrt{\frac{N-n}{N}} \left( \frac{s}{\sqrt{n}} \right) = 200 \sqrt{\frac{200-40}{200}} \left( \frac{120}{\sqrt{40}} \right) = 3{,}394.1 \text{ thousand} = \$3{,}394{,}100$$
Approximate 95% Confidence Interval for TNW
$$N\bar{x} \pm 2 s_{\hat{X}} = 96{,}000 \pm 2(3{,}394.1) = 89{,}211.8 \text{ to } 102{,}788.2 \text{ thousand}$$
= $89,211,800 to $102,788,200
37 Slide
Example: Innis Investments
Point Estimate of Population Proportion Favoring Fixed-Income Investments
$$\bar{p} = .30$$
Estimate of Standard Error of Proportion
$$s_{\bar{p}} = \sqrt{\left( \frac{N-n}{N} \right) \left( \frac{\bar{p}(1-\bar{p})}{n-1} \right)} = \sqrt{\left( \frac{200-40}{200} \right) \left( \frac{.3(1-.3)}{40-1} \right)} = .0656$$
Approximate 95% Confidence Interval
$$\bar{p} \pm 2 s_{\bar{p}} = .30 \pm 2(.0656) = .1688 \text{ to } .4312$$
38 Slide
Example: Innis Investments
One year later Innis wants to again survey his clients. He now has 250 clients and wants to set a bound of $30,000 on the error of the estimate of their mean net worth.
Necessary Sample Size (dollar amounts in thousands)
$$n = \frac{N s^2}{N \left( \frac{B^2}{4} \right) + s^2} = \frac{250(120)^2}{250 \left( \frac{(30)^2}{4} \right) + (120)^2} = 50.96$$
He will need a sample size of 51.
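As a check on the arithmetic, the sample-size formula can be evaluated directly. This is a minimal sketch; the function name is hypothetical, the inputs are in thousands of dollars as in the example, and s = 120 is carried over from the earlier sample as the planning value.

```python
import math

def srs_sample_size_for_mean(N, s, B):
    """Sample size needed so the bound on the sampling error of the mean is B."""
    return (N * s**2) / (N * B**2 / 4 + s**2)

n_exact = srs_sample_size_for_mean(N=250, s=120, B=30)
n_needed = math.ceil(n_exact)   # round up so the bound is not exceeded
print(f"n = {n_exact:.2f}, use n = {n_needed}")
```

Rounding up rather than to the nearest integer is the conservative choice, since a smaller sample would give a bound slightly larger than B.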
39 Slide
Stratified Simple Random Sampling
The population is first divided into H groups, called strata.
Then for stratum h, a simple random sample of size n_h is selected.
The data from the H simple random samples are combined to develop an estimate of a population parameter.
If the variability within each stratum is smaller than the variability across the strata, a stratified simple random sample can lead to greater precision.
The basis for forming the various strata depends on the judgment of the designer of the sample.
40 Slide
Stratified Simple Random Sampling
Population Mean
• Point Estimator
$$\bar{x}_{st} = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right) \bar{x}_h$$
where:
H = number of strata
$\bar{x}_h$ = sample mean for stratum h
$N_h$ = number of elements in the population in stratum h
N = total number of elements in the population (all strata)
Stratified Simple Random Sampling
Population Mean
• Estimate of the Standard Error of the Mean
$$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2} \sum_{h=1}^{H} N_h (N_h - n_h) \frac{s_h^2}{n_h}}$$
42 Slide
Stratified Simple Random Sampling
Population Mean
• Interval Estimate
$$\bar{x}_{st} \pm z_{\alpha/2} \, s_{\bar{x}_{st}}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{x}_{st} \pm 2 s_{\bar{x}_{st}}$$
43 Slide
Stratified Simple Random Sampling
Population Total
• Point Estimator
$$\hat{X} = N \bar{x}_{st}$$
• Estimate of the Standard Error of the Total
$$s_{\hat{X}} = N s_{\bar{x}_{st}}$$
44 Slide
Stratified Simple Random Sampling
Population Total
• Interval Estimate
$$\hat{X} \pm z_{\alpha/2} \, s_{\hat{X}}$$
• Approximate 95% Confidence Interval Estimate
$$\hat{X} \pm 2 s_{\hat{X}}$$
45 Slide
Stratified Simple Random Sampling
Population Proportion
• Point Estimator
$$\bar{p}_{st} = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right) \bar{p}_h$$
where:
H = number of strata
$\bar{p}_h$ = sample proportion for stratum h
$N_h$ = number of elements in the population in stratum h
N = total number of elements in the population (all strata)
46 Slide
Stratified Simple Random Sampling
Population Proportion
• Estimate of Standard Error of the Proportion
$$s_{\bar{p}_{st}} = \sqrt{\frac{1}{N^2} \sum_{h=1}^{H} N_h (N_h - n_h) \left( \frac{\bar{p}_h (1 - \bar{p}_h)}{n_h - 1} \right)}$$
47 Slide
Stratified Simple Random Sampling
Population Proportion
• Interval Estimate
$$\bar{p}_{st} \pm z_{\alpha/2} \, s_{\bar{p}_{st}}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{p}_{st} \pm 2 s_{\bar{p}_{st}}$$
48 Slide
Stratified Simple Random Sampling
Total Sample Size When Estimating Population Mean
$$n = \frac{\left( \sum_{h=1}^{H} N_h s_h \right)^2}{N^2 \left( \frac{B^2}{4} \right) + \sum_{h=1}^{H} N_h s_h^2}$$
49 Slide
Stratified Simple Random Sampling
Total Sample Size When Estimating Population Total
$$n = \frac{\left( \sum_{h=1}^{H} N_h s_h \right)^2}{\frac{B^2}{4} + \sum_{h=1}^{H} N_h s_h^2}$$
50 Slide
Stratified Simple Random Sampling
Allocating Total Sample Size When Estimating Population Mean or Total
$$n_h = n \left( \frac{N_h s_h}{\sum_{h=1}^{H} N_h s_h} \right)$$
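The allocation rule assigns each stratum a share of the total sample proportional to N_h s_h. A small Python sketch follows; it borrows the stratum sizes and standard deviations from the Mill Creek example that appears later in this chapter, and the total of n = 105 is used purely for illustration. The function name `allocate` is hypothetical.

```python
def allocate(n, N_h, s_h):
    """Split total sample size n across strata in proportion to N_h * s_h."""
    weights = [N * s for N, s in zip(N_h, s_h)]   # N_h * s_h for each stratum
    total = sum(weights)
    return [n * w / total for w in weights]

# Stratum sizes and standard deviations taken from the Mill Creek example
alloc = allocate(105, N_h=[100, 250, 125], s_h=[75, 100, 130])
print([round(a, 2) for a in alloc])
```

The results are generally fractional, so the designer still has to round them to whole numbers while keeping the total close to n.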
51 Slide
Stratified Simple Random Sampling
Total Sample Size When Estimating Population Proportion
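$$n = \frac{\left( \sum_{h=1}^{H} N_h \sqrt{\bar{p}_h (1 - \bar{p}_h)} \right)^2}{N^2 \left( \frac{B^2}{4} \right) + \sum_{h=1}^{H} N_h \bar{p}_h (1 - \bar{p}_h)}$$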
52 Slide
Stratified Simple Random Sampling
Allocating Total Sample Size When Estimating Population Proportion
$$n_h = n \left( \frac{N_h \sqrt{\bar{p}_h (1 - \bar{p}_h)}}{\sum_{h=1}^{H} N_h \sqrt{\bar{p}_h (1 - \bar{p}_h)}} \right)$$
53 Slide
Example: Mill Creek Co.
Stratified Simple Random Sampling
Mill Creek Co. has used stratified simple random sampling to obtain demographic information and preferences regarding health care coverage for its employees and their families.
The population of employees has been divided into 3 strata on the basis of age: under 30, 30-49, and 50 or over. Some of the sample data are shown on the next slide.
54 Slide
Example: Mill Creek Co.
Data
                            Annual Family Dental Expense   Proportion
Stratum       N_h    n_h    Mean       St.Dev.             Married
Under 30      100    30     $250       $75                 .60
30-49         250    45     400        100                 .70
50 or Over    125    30     425        130                 .68
Total         475    105
55 Slide
Example: Mill Creek Co.
Point Estimate of Mean Annual Dental Expense
$$\bar{x}_{st} = \sum_{h=1}^{3} \left( \frac{N_h}{N} \right) \bar{x}_h = \frac{100}{475}(250) + \frac{250}{475}(400) + \frac{125}{475}(425) = \$375$$
Estimate of Standard Error of Mean
$$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2} \sum_{h=1}^{3} N_h (N_h - n_h) \frac{s_h^2}{n_h}} = \sqrt{\frac{1}{(475)^2} (19{,}390{,}972)} = 9.27$$
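The stratified mean and its standard error can be reproduced with a brief Python sketch. The function name `stratified_mean` is an illustrative choice; the inputs are the Mill Creek stratum sizes, sample sizes, means, and standard deviations from the data slide.

```python
import math

def stratified_mean(N_h, n_h, xbar_h, s_h):
    """Point estimate and standard error of the mean for a stratified simple random sample."""
    N = sum(N_h)
    x_st = sum(Nh * xb for Nh, xb in zip(N_h, xbar_h)) / N
    # Sum over strata of N_h (N_h - n_h) s_h^2 / n_h
    var_sum = sum(Nh * (Nh - nh) * sh**2 / nh
                  for Nh, nh, sh in zip(N_h, n_h, s_h))
    return x_st, math.sqrt(var_sum) / N

x_st, se = stratified_mean(N_h=[100, 250, 125], n_h=[30, 45, 30],
                           xbar_h=[250, 400, 425], s_h=[75, 100, 130])
print(f"x_st = {x_st:.2f}, se = {se:.2f}")
```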
56 Slide
Example: Mill Creek Co.
Approximate 95% Confidence Interval for Mean Annual Dental Expense
$$\bar{x}_{st} \pm 2 s_{\bar{x}_{st}} = 375 \pm 2(9.27) = 356.46 \text{ to } 393.54$$
An approximate 95% confidence interval for mean annual family dental expense is $356.46 to $393.54.
57 Slide
Example: Mill Creek Co.
Point Estimate of Total Family Expense for All Employees
$$\hat{X} = N \bar{x}_{st} = 475(375) = 178{,}125 = \$178{,}125$$
Approximate 95% Confidence Interval
$$\hat{X} \pm 2 N s_{\bar{x}_{st}} = 178{,}125 \pm 2(475)(9.27) = 178{,}125 \pm 8{,}807$$
= $169,318 to $186,932
58 Slide
Example: Mill Creek Co.
Point Estimate of Proportion Married
$$\bar{p}_{st} = \sum_{h=1}^{3} \left( \frac{N_h}{N} \right) \bar{p}_h = \frac{100}{475}(.6) + \frac{250}{475}(.7) + \frac{125}{475}(.68) = .6737$$
Estimate of Standard Error of Proportion
$$s_{\bar{p}_{st}} = \sqrt{\frac{1}{N^2} \sum_{h=1}^{3} N_h (N_h - n_h) \left( \frac{\bar{p}_h (1 - \bar{p}_h)}{n_h - 1} \right)} = \sqrt{\frac{1}{(475)^2} (391.6372)} = .0417$$
Approximate 95% Confidence Interval for Proportion
$$\bar{p}_{st} \pm 2 s_{\bar{p}_{st}} = .6737 \pm 2(.0417) = .5903 \text{ to } .7571$$
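The proportion calculation follows the same pattern as the mean, with p(1 - p)/(n_h - 1) taking the place of s_h²/n_h. The sketch below uses the Mill Creek figures; the function name `stratified_proportion` is hypothetical.

```python
import math

def stratified_proportion(N_h, n_h, p_h):
    """Point estimate and standard error of a proportion for a stratified simple random sample."""
    N = sum(N_h)
    p_st = sum(Nh * p for Nh, p in zip(N_h, p_h)) / N
    # Sum over strata of N_h (N_h - n_h) p_h (1 - p_h) / (n_h - 1)
    var_sum = sum(Nh * (Nh - nh) * p * (1 - p) / (nh - 1)
                  for Nh, nh, p in zip(N_h, n_h, p_h))
    return p_st, math.sqrt(var_sum) / N

p_st, se_p = stratified_proportion(N_h=[100, 250, 125], n_h=[30, 45, 30],
                                   p_h=[0.60, 0.70, 0.68])
print(f"p_st = {p_st:.4f}, se = {se_p:.4f}")
print(f"interval = {p_st - 2 * se_p:.4f} to {p_st + 2 * se_p:.4f}")
```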
59 Slide
Cluster Sampling
Cluster sampling requires that the population be divided into N groups of elements called clusters.
We would define the frame as the list of N clusters.
We then select a simple random sample of n clusters.
We would then collect data for all elements in each of the n clusters.
60 Slide
Cluster Sampling
Cluster sampling tends to provide better results than stratified sampling when the elements within the clusters are heterogeneous.
A primary application of cluster sampling involves area sampling, where the clusters are counties, city blocks, or other well-defined geographic sections.
61 Slide
Cluster Sampling
Notation
N = number of clusters in the population
n = number of clusters selected in the sample
$M_i$ = number of elements in cluster i
M = number of elements in the population
$\bar{M}$ = M/N = average number of elements in a cluster
$x_i$ = total of all observations in cluster i
$a_i$ = number of observations in cluster i with a certain characteristic
62 Slide
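Cluster Sampling
Population Mean
• Point Estimator
$$\bar{x}_c = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} M_i}$$
• Estimate of Standard Error of the Mean
$$s_{\bar{x}_c} = \sqrt{\left( \frac{N-n}{N n \bar{M}^2} \right) \frac{\sum_{i=1}^{n} (x_i - \bar{x}_c M_i)^2}{n-1}}$$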
63 Slide
Cluster Sampling
Population Mean
• Interval Estimate
$$\bar{x}_c \pm z_{\alpha/2} \, s_{\bar{x}_c}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{x}_c \pm 2 s_{\bar{x}_c}$$
64 Slide
Cluster Sampling
Population Total
• Point Estimator
$$\hat{X} = M \bar{x}_c$$
• Estimate of Standard Error of the Total
$$s_{\hat{X}} = M s_{\bar{x}_c}$$
65 Slide
Cluster Sampling
Population Total
• Interval Estimate
$$M \bar{x}_c \pm z_{\alpha/2} \, s_{\hat{X}}$$
• Approximate 95% Confidence Interval Estimate
$$M \bar{x}_c \pm 2 s_{\hat{X}}$$
66 Slide
Cluster Sampling
Population Proportion
• Point Estimator
$$\bar{p}_c = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} M_i}$$
67 Slide
Cluster Sampling
Population Proportion
• Estimate of Standard Error of the Proportion
$$s_{\bar{p}_c} = \sqrt{\left( \frac{N-n}{N n \bar{M}^2} \right) \frac{\sum_{i=1}^{n} (a_i - \bar{p}_c M_i)^2}{n-1}}$$
68 Slide
Cluster Sampling
Population Proportion
• Interval Estimate
$$\bar{p}_c \pm z_{\alpha/2} \, s_{\bar{p}_c}$$
• Approximate 95% Confidence Interval Estimate
$$\bar{p}_c \pm 2 s_{\bar{p}_c}$$
69 Slide
Example: Cooper County Schools
Cluster Sampling
There are 40 high schools in Cooper County. School officials are interested in the effect of participation in athletics on academic preparation for college. A cluster sample of 5 schools has been taken and a questionnaire administered to all the seniors on the football team at those schools. There are a total of 1200 high school seniors in the county playing football.
Data obtained from the questionnaire are shown on the next slide.
70 Slide
Example: Cooper County Schools
Data
          Number       Average      Number Planning
School    of Players   SAT Score    to Attend College
1         45           840          15
2         20           980          16
3         30           905          12
4         38           880          18
5         40           970          23
Total     173                       84
71 Slide
Example: Cooper County Schools
Point Estimate of Mean SAT Score
$$\bar{x}_c = \frac{\sum_{i=1}^{5} x_i}{\sum_{i=1}^{5} M_i} = \frac{45(840) + 20(980) + \cdots + 40(970)}{45 + 20 + 30 + 38 + 40} = 906$$
72 Slide
Example: Cooper County Schools
Estimate of Standard Error of the Mean
With N = 40 schools in the county, n = 5 schools sampled, and $\bar{M}$ = 1200/40 = 30 players per school:
$$s_{\bar{x}_c} = \sqrt{\left( \frac{N-n}{N n \bar{M}^2} \right) \frac{\sum_{i=1}^{5} (x_i - \bar{x}_c M_i)^2}{n-1}} = \sqrt{\left( \frac{40-5}{(40)(5)(30)^2} \right) \frac{18{,}541{,}944}{5-1}} = 30.02$$
73 Slide
Example: Cooper County Schools
Approximate 95% Confidence Interval
$$\bar{x}_c \pm 2 s_{\bar{x}_c} = 906 \pm 2(30.02) = 846 \text{ to } 966$$
Point Estimate of Proportion Planning to Attend College
$$\bar{p}_c = \frac{\sum_{i=1}^{5} a_i}{\sum_{i=1}^{5} M_i} = \frac{84}{173} = .49$$
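The cluster estimates can be sketched in Python from the Cooper County data. This is an illustration under the notation defined earlier: N = 40 is the number of clusters (schools) in the population, n = 5 the number of clusters sampled, and M = 1200 the number of elements in the population. The function name `cluster_mean` is hypothetical, and each cluster total x_i is rebuilt as players times average score.

```python
import math

def cluster_mean(N, M, M_i, x_i):
    """Point estimate and standard error of the mean for a one-stage cluster sample."""
    n = len(M_i)                       # number of clusters sampled
    M_bar = M / N                      # average number of elements per cluster
    x_c = sum(x_i) / sum(M_i)
    ss = sum((x - x_c * m) ** 2 for x, m in zip(x_i, M_i))
    se = math.sqrt((N - n) / (N * n * M_bar**2) * ss / (n - 1))
    return x_c, se

players = [45, 20, 30, 38, 40]
avg_sat = [840, 980, 905, 880, 970]
planning = [15, 16, 12, 18, 23]
totals = [m * a for m, a in zip(players, avg_sat)]   # x_i = cluster total

x_c, se_c = cluster_mean(N=40, M=1200, M_i=players, x_i=totals)
p_c = sum(planning) / sum(players)
print(f"x_c = {x_c:.1f}, se = {se_c:.2f}, p_c = {p_c:.2f}")
```

The code carries the unrounded point estimate through the sum of squares, so its standard error can differ in the second decimal place from hand arithmetic that rounds the mean to 906 first.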
74 Slide
Systematic Sampling
Systematic sampling is often used as an alternative to simple random sampling, which can be time-consuming if a large population is involved.
If a sample size of n from a population of size N is desired, we might sample one element for every N/n elements in the population.
We would randomly select one of the first N/n elements and then select every (N/n)th element thereafter.
Since the first element selected is a random choice, a systematic sample is often assumed to have the properties of a simple random sample.
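The selection procedure can be sketched as follows. This is a minimal Python illustration, assuming the frame is an ordered list and taking the interval as the integer part of N/n; the function name, the numeric frame, and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k elements, then take every k-th element."""
    k = len(frame) // n                # sampling interval, integer part of N/n
    rng = random.Random(seed)
    start = rng.randrange(k)           # random position within the first k elements
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 201))            # hypothetical frame of 200 numbered elements
sys_sample = systematic_sample(frame, 40, seed=7)
print(sys_sample[:5])
```

When N is not a multiple of n, truncating N/n leaves a few elements at the end of the frame that can never be selected, which is one reason the simple-random-sample assumption is only approximate.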
75 Slide
End of Chapter 21