8/3/2019 Analysis of Sample Mean
Analysis of Means
Farrokh Alemi, Ph.D.
Kashif Haqqi M.D.
Table of Contents
Review
Objectives
Definitions
Expected Value
Normal Distribution
Distribution of Mean
Central Limit Theorem
Standard Normal Distribution
Use of Z Values
Confidence Interval
Hypothesis
Two Types of Error
One-tailed Tests
Steps in Testing a Hypothesis
When to Assume Normal Distribution for Means
Use t-distribution
Review
Frequency distribution
Mean, median, and mode
Standard deviation and range
Statistics is the art of making sense of distributions.
Objectives
Describe different distributions, including normal and t-distributions.
Calculate and interpret confidence intervals using normal distributions.
Understand the types of errors that occur with hypothesis testing.
Test hypotheses using the t-distribution.
Example You Should Be Able to
Answer at the End
Is it important to ask these
types of questions?
The cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. Assume that the average cost in our hospital is $30,000.
With 95% confidence, would you say that our cost is different from the industry's?
Definitions
A random variable is a variable whose values are determined by chance.
A probability distribution is the probability with which values of a random variable can be or are observed.
Probability of a value is the frequency of occurrence of that value divided by the frequency of occurrences of all values.
Example of Probability Estimates
We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 5.001 to 10 minutes, 13 people waited 10.001 to 15 minutes and 7 people waited 15.001 to 20 minutes.
What is the probability of waiting up to 5 minutes?
What is the probability of waiting up to 10 minutes?
Distributions help us make probability estimates about observed values.
Example of Probability Estimates (Continued)
The probability of waiting up to 5 minutes is the number of people who waited up to 5 minutes divided by the total number of people: 10/50 = 0.20.
The probability of waiting up to 10 minutes is the number of people who waited up to 10 minutes divided by the total number of people: (10+20)/50 = 0.60.
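The two estimates above can be reproduced with a short Python sketch. The variable and function names are illustrative and not part of the original slides; the counts are the ones quoted in the example.

```python
# Waiting-time counts from the emergency room example (50 people in total).
# Each key is the upper bound of a waiting-time interval, in minutes.
counts = {5: 10, 10: 20, 15: 13, 20: 7}
total = sum(counts.values())

# Probability of an interval = its frequency / frequency of all values.
probabilities = {upper: n / total for upper, n in counts.items()}


def prob_up_to(minutes):
    """Probability of waiting at most `minutes` (sums the intervals up to it)."""
    return sum(p for upper, p in probabilities.items() if upper <= minutes)


print(round(prob_up_to(5), 2))   # 0.2
print(round(prob_up_to(10), 2))  # 0.6
```

Summing interval probabilities this way gives a cumulative probability, which is what "waiting up to 10 minutes" asks for.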
Expected Value
Expected value of a distribution is the mean of the distribution.
It represents our long-run expectations about the distribution.
The expected value of X is given by summing the product of each value of X, referred to as i, and its probability of occurring, referred to as p(X=i).
Expected value = mean = Σ p(X=i) * i, summed over all values i.
Example Calculation of Expected Value or Mean
We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 6 to 10 minutes, 13 people waited 11 to 15 minutes and 7 people waited 16 to 20 minutes.
What is the mean waiting time at our emergency room?
Example Calculation of Expected Value or Mean (Continued)
Do this in Excel

Observed waiting time   Frequency   Probability   Probability times waiting time
2.5                     10          0.20          0.50
7.5                     20          0.40          3.00
12.5                    13          0.26          3.25
17.5                    7           0.14          2.45
Total                   50          1.00          9.20

The expected value or mean is 9.2 minutes.
http://biostatistics.gmu.edu/means.xls
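As an alternative to the spreadsheet, the expected value calculation can be sketched in Python. This is only an illustration: the variable names are assumptions, and each interval is represented by its midpoint (2.5, 7.5, 12.5, 17.5), as in the worked example.

```python
# Interval midpoints (minutes) and observed frequencies from the example.
midpoints = [2.5, 7.5, 12.5, 17.5]
frequencies = [10, 20, 13, 7]

total = sum(frequencies)
probabilities = [f / total for f in frequencies]

# Expected value = sum over all values of (probability * value).
expected_value = sum(p * x for p, x in zip(probabilities, midpoints))

print(round(expected_value, 2))  # 9.2
```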
Normal Distribution
A symmetric distribution, meaning that data are evenly distributed about the mean.
Mean, median and mode are the same value.
It has one mode and looks like a bell-shaped curve.
Normal Distribution Continued
The curve is continuous; there are no gaps or holes.
The curve never touches the X-axis, as any value is possible but with infinitely small probabilities.
99.7% of values are within 3 standard deviations of the mean.
Distribution of Mean
If you take repeated samples of some observations and average each of them, then you have a distribution for the mean.
The distribution of the mean has the same mean as the distribution of the observations.
Standard deviation of the mean = Standard error = Standard deviation of the observations / Square root of the sample size.
Example
What is the mean, standard deviation and standard error for the following data: 4, 5, 6?
Mean = 5
Standard deviation = 1
Standard error = 1 / 3^0.5 = 1 / 1.73 = 0.58
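The same three quantities can be checked with Python's standard library. This is a minimal sketch, assuming the sample standard deviation (dividing by n - 1), which is the convention that gives the value of 1 used in this example.

```python
import math
import statistics

data = [4, 5, 6]

mean = statistics.mean(data)
# statistics.stdev is the sample standard deviation (n - 1 in the denominator).
standard_deviation = statistics.stdev(data)
# Standard error = standard deviation of observations / sqrt(sample size).
standard_error = standard_deviation / math.sqrt(len(data))

print(mean)                      # 5
print(standard_deviation)        # 1.0
print(round(standard_error, 2))  # 0.58
```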
Central Limit Theorem
For any distribution of n observations with mean μ and standard deviation σ:
As n increases, the sample means will have a Normal distribution with mean μ and standard deviation σ / square root(n).
The theorem is important because it helps us ignore questions about the shape of the distribution and focus on its mean and standard deviation.
Do this in Excel
http://biostatistics.gmu.edu/avgisnormal.xls
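The Central Limit Theorem can also be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the content of the spreadsheet: it draws samples of 30 values from a uniform distribution on 0 to 1 (clearly not Normal), averages each sample, and compares the spread of the averages with the standard deviation divided by the square root of n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30
REPETITIONS = 5000

# A uniform(0, 1) variable has mean 0.5 and standard deviation sqrt(1/12).
population_mean = 0.5
population_sd = (1 / 12) ** 0.5


def sample_mean(n):
    """Average of n independent draws from uniform(0, 1)."""
    return statistics.mean([random.random() for _ in range(n)])


means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

observed_mean = statistics.mean(means)
observed_sd = statistics.stdev(means)
predicted_sd = population_sd / SAMPLE_SIZE ** 0.5

print(observed_mean)  # close to 0.5
print(observed_sd)    # close to predicted_sd
print(predicted_sd)   # about 0.0527
```

A histogram of `means` would look bell shaped even though the individual observations are uniformly distributed.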
Standard Normal Distribution
A Normal distribution.
Mean of zero.
Standard deviation of 1.
Z = (Observed value - mean) / standard deviation of the mean.
Where standard deviation of the mean = standard error = standard deviation of observations divided by square root of sample size.
Example Calculation of Z
What is the Z value for the observed mean of 16, if the average mean is 10 and the standard error is 2?
Z = (16-10) / 2 = 3.
Another Example
What is the Z value for the mean 16 of 4 observations, if the average of repeated sample means is 10 and the standard deviation of the observations is 2?
Standard deviation of mean = 2 / 4^0.5 = 2/2 = 1
Z value for 16 = (16-10)/1 = 6
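Both Z calculations follow the same pattern, which can be sketched as two small helper functions. The function names are illustrative assumptions, not taken from the slides.

```python
import math


def standard_error(sd_of_observations, sample_size):
    """Standard deviation of the mean = SD of observations / sqrt(n)."""
    return sd_of_observations / math.sqrt(sample_size)


def z_value(observed_mean, expected_mean, se):
    """Number of standard errors the observed mean is away from the expected mean."""
    return (observed_mean - expected_mean) / se


# First example: the standard error is given directly as 2.
print(z_value(16, 10, 2))  # 3.0

# Second example: 4 observations with a standard deviation of 2.
print(z_value(16, 10, standard_error(2, 4)))  # 6.0
```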
Use of Z Values
99.7% of data are between z=3 and z=-3.
Z is the number of standard deviations that
X is away from the mean.
0.15% of data are below z=-3.
0.15% of data are above z=3.
Use of Z Value (Continued)
95% of data are within z=1.96 and z=-1.96
5% are outside z=1.96 and z=-1.96
2.5% of data are below z=-1.96
2.5% of data are above z=1.96
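These percentages can be checked against the standard Normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def share_within(z):
    """Fraction of data between -z and +z under the standard Normal curve."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


print(round(share_within(3), 4))         # 0.9973
print(round(share_within(1.96), 4))      # 0.95
print(round(std_normal.cdf(-1.96), 4))   # 0.025, the share below z=-1.96
print(round(1 - std_normal.cdf(1.96), 4))  # 0.025, the share above z=1.96
```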
Confidence Interval
For Normal distributions, the 95% two-tailed confidence interval spans the observations between z=-1.96 and z=1.96.
Example
What is the 95% confidence interval for
mean of 10 and standard deviation of 2?
Lower limit = 10-1.96*2 = 6.08.
Upper limit = 10+1.96*2 = 13.92.
At 13.92, Z value is (13.92-10)/2 = 1.96.
At 6.08, Z value is (6.08-10)/2 = -1.96.
95% of data fall within these limits.
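The limits in this example can be computed with a short function. This is a sketch under the example's own assumptions (Normal data, critical Z of 1.96); the function name is hypothetical.

```python
def confidence_interval(mean, standard_deviation, z=1.96):
    """Two-tailed limits: mean minus and plus z standard deviations."""
    half_width = z * standard_deviation
    return mean - half_width, mean + half_width


lower, upper = confidence_interval(10, 2)
print(round(lower, 2), round(upper, 2))  # 6.08 13.92

# Converting the limits back to Z values recovers -1.96 and +1.96.
print(round((lower - 10) / 2, 2), round((upper - 10) / 2, 2))  # -1.96 1.96
```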
Two Tailed Confidence Interval
What percentage of data are between z=1.96 and z=-1.96? Answer: 95%. Often referred to as a two-tailed confidence interval.
What percentage of data are below z=1.96? Answer: 97.5%. Often referred to as a one-tailed confidence interval.
What percentage of data are above z=-1.96? Answer: 97.5%. Often referred to as a one-tailed confidence interval.
Hypothesis
A statistical hypothesis is a conjecture about a population parameter.
The null hypothesis is that there is no difference between the parameter and a value.
The alternative hypothesis states there is a specific difference.
Experimental data can only reject a hypothesis, not accept it.
Possible Outcomes of Hypothesis Test
There are four possible outcomes:
1. We reject a hypothesis that is true.
2. We reject a hypothesis that is false.
3. We do not reject a hypothesis that is true.
4. We do not reject a hypothesis that is false.
Two Types of Error
                              Hypothesis is true   Hypothesis is false
We reject hypothesis          Type one error       Correct
We do not reject hypothesis   Correct              Type two error
Type 1 Error
The level of significance is the maximum probability of a type 1 error, symbolized by alpha, α.
When we base our decision on 95% confidence intervals, 5% of the data are ignored at the two tails of the distribution. Therefore, there is a 5% chance that we will reject a hypothesis that is true.
Type one error = 5%, α = 0.05.
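The 5% figure can be illustrated by simulation. In the sketch below, every number (true mean of 10, standard deviation of 2, samples of 25, 10,000 trials) is an assumption chosen only for illustration. The hypothesis being tested is true in every trial, so each rejection is a type 1 error.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN = 10    # hypothetical population mean; the tested hypothesis is true
SD = 2            # hypothetical, known standard deviation of observations
N = 25            # observations per sample
TRIALS = 10000
CRITICAL_Z = 1.96

standard_error = SD / N ** 0.5

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - TRUE_MEAN) / standard_error
    # Two-tailed rule: reject when Z falls outside +/- 1.96.
    if abs(z) > CRITICAL_Z:
        rejections += 1

type_one_rate = rejections / TRIALS
print(type_one_rate)  # close to 0.05
```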
One-tailed Tests
In a two-tailed test, the hypothesis is rejected when the value is above the higher limit or below the lower limit.
In a one-tailed test that a parameter is larger than a particular value, the hypothesis is rejected when the value is above the higher limit.
One-tailed Tests (Continued)
When we base our decision on 95% confidence intervals, 2.5% of the data are ignored at one tail of the distribution. Therefore, there is a 2.5% chance that we will reject a hypothesis that is true.
α = 0.025.
Steps in Testing a Hypothesis
1. State the null hypothesis.
2. Identify the alternative hypothesis.
3. Is this a one-tailed or two-tailed test?
4. Decide the critical Z value above or below which the hypothesis is rejected, usually 1.96.
5. Calculate the Z value corresponding to the observation.
6. Reject or do not reject the hypothesis by comparing the calculated Z to the critical values.
Example
The cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. In our hospital, the average cost is $30,000.
With 95% confidence, would you say that our cost is different from the industry's?
Do this in Excel
http://biostatistics.gmu.edu/stand.xls
Steps in Testing Example Hypothesis
1. The null hypothesis: Our costs are the same as the industry.
2. Alternative hypothesis: Our cost is higher or lower than the industry average.
3. This is a two-tailed test.
4. The critical Z is +1.96 or -1.96.
5. Observed Z = (30000-25000)/3000 = 1.67.
6. Do not reject the hypothesis.
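The comparison in steps 4 to 6 can be sketched as a small function. The function name is an assumption for illustration. Following the slides, the standard deviation of 3000 is used directly as the denominator, and 5000 / 3000 is about 1.67.

```python
def two_tailed_z_test(observed, expected, standard_error, critical_z=1.96):
    """Return the observed Z and whether it falls outside +/- critical_z."""
    z = (observed - expected) / standard_error
    reject = abs(z) > critical_z
    return z, reject


z, reject = two_tailed_z_test(observed=30000, expected=25000, standard_error=3000)
print(round(z, 2), reject)  # 1.67 False
```

Because the observed Z lies between -1.96 and +1.96, the test does not reject at the 95% level.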
When to Assume Normal Distribution for Means
When the population variance is known and observations have a Normal distribution.
When the population variance is unknown and there are more than 30 observations.
Otherwise, use the t-distribution as an approximation for the Normal distribution.
Use t-distribution
If the values in the population are Normal.
If we have fewer than 30 observations.
If we have to estimate the standard deviation from the sample and the variance of the population is not known.
The t-distribution is used as an approximation for near-Normal data.
Calculating t Statistic
t = (observed average - mean) / standard deviation of the average.
The critical value of t depends on sample size.
For a one-tailed test with alpha = 0.025 or a two-tailed test with alpha = 0.05, the critical t value for a sample size of 10 is 2.22 and for a sample size of 20 is 2.08.
Calculating t Statistic (Continued)
If we are examining a sample size of 10, 95% of data are within t=2.22 and t=-2.22.
If we are examining a sample size of 10, 97.5% of data are below t=2.22.
Testing With t-distribution
1. State the null hypothesis.
2. Identify the alternative hypothesis.
3. Is this a one-tailed or two-tailed test?
4. Decide the critical t value above or below which the hypothesis is rejected; the value depends on sample size.
5. Calculate the t value corresponding to the
observation.
6. Reject or do not reject the hypothesis by comparing
the calculated t to the critical values.
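The steps above can be sketched in Python for a two-tailed test. The data below are hypothetical, invented only to exercise the function, and the critical value of 2.22 is the one quoted in these slides for a sample size of 10; neither comes from a real study.

```python
import math
import statistics


def t_statistic(sample, hypothesized_mean):
    """t = (observed average - mean) / standard deviation of the average."""
    n = len(sample)
    # The standard deviation is estimated from the sample itself.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - hypothesized_mean) / standard_error


# Hypothetical sample of 10 observations (illustration only).
sample = [12, 9, 11, 10, 13, 8, 12, 11, 10, 14]
CRITICAL_T = 2.22  # value quoted in the slides for a sample size of 10

t = t_statistic(sample, hypothesized_mean=10)
reject = abs(t) > CRITICAL_T

print(round(t, 2), reject)  # 1.73 False
```

With a different sample size, the critical value would need to be looked up again, since it depends on the number of observations.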
Example Data
Selecting Data Analysis
Select Descriptive Statistics
Enter Data Range
Result
Confidence interval is the mean plus or minus the confidence level. If it does not include $30,000, then our hospital has a different cost structure than other hospitals in our database.