Sampling and Statistical Analysis for Decision Making
A. A. Elimam
College of Business
San Francisco State University
Chapter Topics
• Sampling: Design and Methods
• Estimation:
  • Confidence Interval Estimation for the Mean ($\sigma$ Known)
  • Confidence Interval Estimation for the Mean ($\sigma$ Unknown)
  • Confidence Interval Estimation for the Proportion
Chapter Topics
• The Situation of Finite Populations
• Student’s t distribution
• Sample Size Estimation
• Hypothesis Testing
• Significance Levels
• ANOVA
Statistical Sampling
• Sampling: Valuable tool
• Population:
  • Too large to deal with effectively or practically
  • Impossible or too expensive to obtain all data
• Collect sample data to draw conclusions about unknown population
Sample design
• Representative samples of the population
• Sampling Plan: Approach to obtain samples
• Sampling Plan: States
  • Objectives
  • Target population
  • Population frame
  • Method of sampling
  • Data collection procedure
  • Statistical analysis tools
Objectives
• Estimate population parameters such as a mean, proportion or standard deviation
• Identify if a significant difference exists between two populations
Population Frame
• List of all members of the target population
Sampling Methods
• Subjective Sampling:
  • Judgment: select the sample (best customers)
  • Convenience: ease of sampling
• Probabilistic Sampling:
  • Simple Random Sampling
    • With Replacement
    • Without Replacement
Sampling Methods
• Systematic Sampling:
  • Selects items periodically from the population
  • First item randomly selected; may produce bias
  • Example: pick one sample every 7 days
• Stratified Sampling:
  • Population divided into natural strata
  • Allocates proper proportion of samples to each stratum
  • Each stratum weighted by its size; cost or significance of certain strata might suggest a different allocation
  • Example: sampling of political districts (wards)
Sampling Methods
• Cluster Sampling:
  • Population divided into clusters, then a random sample of clusters is selected
  • Items within each selected cluster become members of the sample
  • Example: segment customers for each geographical location
• Sampling Using Excel:
  • Population listed in spreadsheet
  • Periodic
  • Random
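The random and periodic selections listed above can also be sketched outside a spreadsheet. The following is a minimal illustration using only Python's standard library; the function names and the `customers` list are hypothetical and not part of the original material.

```python
import random

def simple_random_sample(population, n, replace=False, seed=None):
    """Draw n items at random, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: every drawn item is distinct.
    return rng.sample(population, n)

def systematic_sample(population, period, seed=None):
    """Pick a random start within the first period, then every period-th item."""
    rng = random.Random(seed)
    start = rng.randrange(period)
    return population[start::period]

# Hypothetical population frame: 100 customer IDs.
customers = list(range(1, 101))
print(simple_random_sample(customers, 5, seed=1))
print(systematic_sample(customers, 7, seed=1))
```

Passing a `seed` makes a draw reproducible, which helps when a sample has to be audited later.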
Sampling Methods: Selection
• Systematic Sampling:
  • Population is large: considerable effort to randomly select
• Stratified Sampling:
  • Items in each stratum are homogeneous (low variances)
  • Relatively smaller sample size than simple random sampling
• Cluster Sampling:
  • Items in each cluster are heterogeneous
  • Clusters are representative of the entire population
  • Requires larger sample
Sampling Errors
• Sample does not represent the target population (e.g., selecting an inappropriate sampling method)
• Inherent error: a sample is only a subset of the population
  • Depends on size of sample relative to population
  • Accuracy of estimates
  • Trade-off: cost/time versus accuracy
Sampling From Finite Populations
• Finite populations: sampling without replacement (R)
• Statistical theory assumes samples are selected with R
• When n < .05 N, the difference is insignificant
• Otherwise, a correction factor is needed
• Standard error of the mean:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
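As a rough sketch of the rule above, the function below applies the correction factor only when the sample is at least 5% of the population. The function name and the numbers in the usage lines are illustrative assumptions.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean, with the finite population correction
    applied when the sample is at least 5% of the population."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        # Finite population correction factor: sqrt((N - n) / (N - 1)).
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical values: sigma = 10, n = 100.
print(standard_error(10, 100))          # no population size given
print(standard_error(10, 100, N=1000))  # n/N = 0.10, correction applied
```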
Statistical Analysis of Sample Data
• Estimation of population parameters (PP)
• Development of confidence intervals for PP
• Probability that the interval correctly estimates the true population parameter
• Means to compare alternative decisions/processes (e.g., comparing transmission production processes)
• Hypothesis testing: validate differences among PP
Estimation Process
[Figure: a random sample is drawn from a population whose mean, $\mu$, is unknown. The sample mean is $\bar{X}$ = 50, leading to the statement "I am 95% confident that $\mu$ is between 40 and 60."]
Population Parameters Estimated
Population Parameter      Point Estimate (Sample)
Mean $\mu$                $\bar{X}$
Proportion $p$            $p_s$
Variance $\sigma^2$       $s^2$
Std. Dev. $\sigma$        $s$
Confidence Interval Estimation
• Provides a range of values based on observations from the sample
• Gives information about closeness to the unknown population parameter
• Stated in terms of probability: never 100% sure
Elements of Confidence Interval Estimation
[Figure: a confidence interval centered on the sample statistic, running from the lower confidence limit to the upper confidence limit]
• A probability that the population parameter falls somewhere within the interval
Example of Confidence Interval Estimation
Example: 90% CI for the mean is 10 ± 2.
• Point Estimate = 10
• Margin of Error = 2
• CI = [8, 12]
• Level of Confidence = $1 - \alpha$ = 0.9
• Probability that the true PP is not in this CI = $\alpha$ = 0.1
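Confidence Limits for Population Mean
Parameter = Statistic ± Its Error
$\mu = \bar{X} \pm \text{Error}$
$\text{Error} = \bar{X} - \mu$
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\text{Error}}{\sigma_{\bar{X}}}$
$\text{Error} = Z\,\sigma_{\bar{X}}$
$\mu = \bar{X} \pm Z\,\sigma_{\bar{X}}$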
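Confidence Intervals
$\bar{X} = \mu \pm Z\,\sigma_{\bar{X}} = \mu \pm Z\,\frac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution of $\bar{X}$, centered at $\mu_{\bar{X}} = \mu$]
• $\mu - 1.645\,\sigma_{\bar{X}}$ to $\mu + 1.645\,\sigma_{\bar{X}}$: 90% of samples
• $\mu - 1.96\,\sigma_{\bar{X}}$ to $\mu + 1.96\,\sigma_{\bar{X}}$: 95% of samples
• $\mu - 2.58\,\sigma_{\bar{X}}$ to $\mu + 2.58\,\sigma_{\bar{X}}$: 99% of samples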
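The Z multipliers for the 90%, 95% and 99% levels can be recovered from the standard normal distribution. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an assumption for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: leaves alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```

The 99% value is 2.576 to three decimals; it is commonly rounded to 2.58.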
Level of Confidence
• Probability that the unknown population parameter falls within the interval
• Denoted $100(1 - \alpha)\%$ = level of confidence, e.g. 90%, 95%, 99%
• $\alpha$ is the probability that the parameter is not within the interval
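Intervals & Level of Confidence
[Figure: sampling distribution of the mean, centered at $\mu_{\bar{X}} = \mu$, with central area $1 - \alpha$ and area $\alpha/2$ in each tail; confidence intervals from repeated samples are drawn beneath it]
• Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$
• $100(1 - \alpha)\%$ of intervals contain $\mu$; $100\alpha\%$ do not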
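The claim that a fraction $1 - \alpha$ of such intervals contains $\mu$ can be checked by simulation. The sketch below assumes a normal population with known $\sigma$; the population values, the number of trials and the function name are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated intervals X-bar +/- Z*sigma/sqrt(n) containing mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 50, sigma = 10; samples of n = 25.
print(coverage(50, 10, 25, 0.95, trials=2000))
```

The printed fraction should land close to 0.95, and it varies slightly with the seed and the number of trials.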
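Factors Affecting Interval Width
• Data variation, measured by $\sigma$
• Sample size: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
• Level of confidence $(1 - \alpha)$
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$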
Confidence Interval Estimates
[Diagram: Confidence Intervals branch into Mean ($\sigma$ Known, $\sigma$ Unknown), Proportion, and Finite Population]
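Confidence Intervals ($\sigma$ Known)
• Assumptions
  • Population standard deviation is known
  • Population is normally distributed
  • If not normal, use large samples
• Confidence Interval Estimate
$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$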
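A short sketch of the known-$\sigma$ interval, using only Python's standard library. The function name and the sample values in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample: X-bar = 50, known sigma = 10, n = 25.
lo, hi = z_interval(50, 10, 25, confidence=0.95)
print(f"{lo:.2f} <= mu <= {hi:.2f}")  # 46.08 <= mu <= 53.92
```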
Confidence Intervals ($\sigma$ Unknown)
• Assumptions
  • Population standard deviation is unknown
  • Population must be normally distributed
• Use Student’s t distribution
• Confidence Interval Estimate
$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
Student’s t Distribution
• Shape similar to the Normal distribution
• Different t distributions based on df
• Has a larger variance than the Normal
• Larger sample size: t approaches the Normal
• At n = 120, virtually the same
• For any sample size, the standardized sample mean (using S in place of $\sigma$) follows Student’s t
• For unknown $\sigma$, and when in doubt, use t
Student’s t Distribution
[Figure: standard normal (Z) curve compared with t curves for df = 5 and df = 13, all centered at 0; the t curves are bell-shaped and symmetric, with 'fatter' tails]
Degrees of Freedom (df)
• Number of observations that are free to vary after the sample mean has been calculated
• Example: mean of 3 numbers is 2
  • $X_1$ = 1 (or any number)
  • $X_2$ = 2 (or any number)
  • $X_3$ = 3 (cannot vary)
  • Mean = 2
• Degrees of freedom = n - 1 = 3 - 1 = 2
Student’s t Table
Upper Tail Area
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.816   1.886   2.920
3     0.765   1.638   2.353
Assume: n = 3, df = n - 1 = 2, $\alpha$ = .10, $\alpha/2$ = .05
t value: $t_{.05,\,2}$ = 2.920 (upper tail area .05)
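Example: Interval Estimation ($\sigma$ Unknown)
A random sample of n = 25 has $\bar{X}$ = 50 and s = 8. Set up a 95% confidence interval estimate for $\mu$.
$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
$50 - 2.0639\,\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.0639\,\frac{8}{\sqrt{25}}$
$46.70 \le \mu \le 53.30$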
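Example: Tracway Transmission
Sample of n = 30, $\bar{X}$ = 289.6, S = 45.4. Find a 99% CI for $\mu$, the mean of each transmission system process. Therefore $\alpha$ = .01 and $\alpha/2$ = .005.
$t_{\alpha/2,\,n-1} = t_{.005,\,29} = 2.7564$
$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
$289.6 - 2.7564\,\frac{45.4}{\sqrt{30}} \le \mu \le 289.6 + 2.7564\,\frac{45.4}{\sqrt{30}}$
$266.75 \le \mu \le 312.45$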
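The two t-interval examples above can be reproduced with a few lines of code. This sketch takes the critical value from a t table as an input (2.0639 for df = 24 and 2.7564 for df = 29, as in the examples) rather than computing it; the function name is an assumption.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.
    t_crit is the table value t(alpha/2, n - 1)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# n = 25, X-bar = 50, s = 8, 95% confidence (t = 2.0639).
lo, hi = t_interval(50, 8, 25, 2.0639)
print(f"{lo:.2f} <= mu <= {hi:.2f}")

# Tracway: n = 30, X-bar = 289.6, S = 45.4, 99% confidence (t = 2.7564).
lo, hi = t_interval(289.6, 45.4, 30, 2.7564)
print(f"{lo:.2f} <= mu <= {hi:.2f}")
```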
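Estimation for Finite Populations
• Assumptions
  • Sample is large relative to population: n / N > .05
• Use finite population correction factor
• Confidence Interval (Mean, $\sigma_X$ Unknown)
$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu_X \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$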
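To see how much the correction factor narrows an interval, the sketch below applies it to a t interval. The population size N = 200 and the sample values are hypothetical, chosen only so that n / N exceeds .05; the function name is likewise an assumption.

```python
import math

def t_interval_finite(xbar, s, n, N, t_crit):
    """t interval for the mean with the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    margin = t_crit * s / math.sqrt(n) * fpc
    return xbar - margin, xbar + margin

# Hypothetical: n = 25 drawn from N = 200 (n/N = 0.125), X-bar = 50, s = 8,
# with the table value t(.025, 24) = 2.0639.
lo, hi = t_interval_finite(50, 8, 25, 200, 2.0639)
print(f"{lo:.2f} <= mu <= {hi:.2f}")
```

Because the correction factor is below 1 whenever n > 1, the corrected interval is always narrower than the uncorrected one.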
Confidence Interval Estimate: Proportion
• Assumptions
  • Two categorical outcomes
  • Population follows binomial distribution
  • Normal approximation can be used if $n \cdot p \ge 5$ and $n \cdot (1 - p) \ge 5$
• Confidence Interval Estimate
$p_s - Z_{\alpha/2}\sqrt{\frac{p_s(1 - p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\frac{p_s(1 - p_s)}{n}}$
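Example: Estimating Proportion
A random sample of 1000 voters showed 51% voted for Candidate A. Set up a 90% confidence interval estimate for p.
$p_s - Z_{\alpha/2}\sqrt{\frac{p_s(1 - p_s)}{n}} \le p \le p_s + Z_{\alpha/2}\sqrt{\frac{p_s(1 - p_s)}{n}}$
$.51 - 1.645\sqrt{\frac{.51(1 - .51)}{1000}} \le p \le .51 + 1.645\sqrt{\frac{.51(1 - .51)}{1000}}$
$.484 \le p \le .536$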
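The voter example can be verified with a short function. This is a sketch using the normal approximation; the function name is an assumption, and the check on $n \cdot p$ uses the sample proportion as a stand-in for the unknown p.

```python
import math
from statistics import NormalDist

def proportion_interval(p_s, n, confidence=0.90):
    """Normal-approximation confidence interval for a proportion."""
    if n * p_s < 5 or n * (1 - p_s) < 5:
        raise ValueError("normal approximation not appropriate")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_s * (1 - p_s) / n)
    return p_s - margin, p_s + margin

# 1000 voters, 51% for Candidate A, 90% confidence.
lo, hi = proportion_interval(0.51, 1000, confidence=0.90)
print(f"{lo:.3f} <= p <= {hi:.3f}")  # 0.484 <= p <= 0.536
```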
Sample Size
• Too big: requires too many resources
• Too small: won’t do the job
Example: Sample Size for Mean
What sample size is needed to be 90% confident of being correct within ± 5? A pilot study suggested that the standard deviation is 45.
$n = \frac{Z^2 \sigma^2}{\text{Error}^2} = \frac{1.645^2 \cdot 45^2}{5^2} = 219.2 \approx 220$ (round up)
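The sample size calculation for the mean, including the round-up step, can be sketched as follows. The function name is an assumption for illustration.

```python
import math

def sample_size_mean(z, sigma, error):
    """Smallest n with margin of error z * sigma / sqrt(n) <= error."""
    return math.ceil((z * sigma / error) ** 2)

# 90% confidence (Z = 1.645), sigma = 45 from the pilot study, error = 5.
print(sample_size_mean(1.645, 45, 5))  # 220
```

Rounding up rather than to the nearest integer guarantees the margin of error does not exceed the target.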
Example: Sample Size for Proportion
What sample size is needed to be within ± 5% with 90% confidence? Out of a population of 1,000, we randomly selected 100, of which 30 were defective, so p = .30.
$n = \frac{Z^2\, p(1 - p)}{\text{error}^2} = \frac{1.645^2 (.30)(.70)}{.05^2} = 227.3 \approx 228$ (round up)
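The same round-up logic applies to the sample size for a proportion. A minimal sketch, with a hypothetical function name:

```python
import math

def sample_size_proportion(z, p, error):
    """Smallest n with margin of error z * sqrt(p(1-p)/n) <= error."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

# 90% confidence (Z = 1.645), pilot estimate p = 30/100 = 0.30, error = 0.05.
print(sample_size_proportion(1.645, 0.30, 0.05))  # 228
```

When no pilot estimate is available, p = 0.5 gives the largest (most conservative) sample size, since p(1 - p) peaks there.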
Hypothesis Testing
• Draw inferences about two contrasting propositions (hypotheses)
• Determine whether two means are equal:
  1. Formulate the hypothesis to test
  2. Select a level of significance
  3. Determine a decision rule as a basis for the conclusion
  4. Collect data and calculate a test statistic
  5. Apply the decision rule to draw a conclusion
Hypothesis Formulation
• Null hypothesis: H0, representing the status quo
• Alternative hypothesis: H1
• Assumes that H0 is true
• Sample evidence is obtained to determine whether H1 is more likely to be true
Significance Level
Test          H0 True             H0 False
Accept H0     Correct decision    Type II Error
Reject H0     Type I Error        Correct decision
• Probability of making Type I error = $\alpha$ = level of significance
• Confidence coefficient = $1 - \alpha$
• Probability of making Type II error = $\beta$
• Power of the test = $1 - \beta$
Decision Rules
• Sampling Distribution: Normal or t distribution
• Rejection Region
• Non-Rejection Region
• Two-tailed test: $\alpha/2$ in each tail
• One-tailed test: $\alpha$ in one tail
• P-Values
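As one simple illustration of a decision rule, the sketch below runs a two-tailed Z test of a single mean with known $\sigma$ and rejects H0 when the p-value falls below $\alpha$. This one-sample case is an assumed example, not one of the cases in these slides, and the function name and data values are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0 against H1: mu != mu0.
    Returns the test statistic, the p-value and the reject decision."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Two-tailed p-value: probability in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical data: X-bar = 52, H0 mean = 50, sigma = 10, n = 100.
z, p, reject = z_test_two_tailed(52, 50, 10, 100, alpha=0.05)
print(f"z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Comparing the p-value with $\alpha$ is equivalent to checking whether |z| falls in the rejection region beyond the $\alpha/2$ critical value.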
Hypothesis Testing: Cases
• Two-Sample Means
• F-Test for Variances
• Proportions
• ANOVA: Differences of several means
• Chi-square for independence
Chapter Summary
• Sampling: Design and Methods
• Estimation:
  • Confidence Interval Estimation for Mean ($\sigma$ Known)
  • Confidence Interval Estimation for Mean ($\sigma$ Unknown)
  • Confidence Interval Estimation for Proportion
Chapter Summary
• Finite Populations
• Student’s t distribution
• Sample Size Estimation
• Hypothesis Testing
• Significance Levels: Type I/II errors
• ANOVA