
8/3/2019 Analysis of Sample Mean


Analysis of Means

Farrokh Alemi, Ph.D.

Kashif Haqqi M.D.



Table of Contents

Review

Objectives

Definitions

Expected Value

Normal Distribution

Distribution of Mean

Central Limit Theorem

Standard Normal Distribution

Use of Z Values

Confidence Interval

Hypothesis

Two Types of Error

One-tailed Tests

Steps in Testing a Hypothesis

When to Assume Normal Distribution for Means

Use t-distribution



Review

Frequency distribution

Mean, median, and mode

Standard deviation and range

Statistics is the art of making sense of distributions.



Objectives

Describe different distributions, including Normal and t-distributions.

Calculate and interpret confidence intervals using Normal distributions.

Understand the types of error that occur with hypothesis testing.

Test hypotheses using the t-distribution.



Example You Should Be Able to Answer at the End

Is it important to ask these types of questions?

The cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. Assume that the average cost in our hospital is $30,000.

With 95% confidence, would you say that our cost is different from the industry's?



Definitions

A random variable is a variable whose values are determined by chance.

A probability distribution gives the probability with which each value of a random variable is observed.

The probability of a value is the frequency of occurrence of that value divided by the frequency of occurrence of all values.



Example of Probability Estimates

We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 5.001 to 10 minutes, 13 people waited 10.001 to 15 minutes, and 7 people waited 15.001 to 20 minutes.

What is the probability of waiting up to 5 minutes?

What is the probability of waiting up to 10 minutes?

Distributions help us make probability estimates about observed values.



Example of Probability Estimates (Continued)

The probability of waiting up to 5 minutes is the number of people who waited up to 5 minutes divided by the total number of people: 10/50 = 0.20.

The probability of waiting up to 10 minutes is the number of people who waited up to 10 minutes divided by the total number of people: (10+20)/50 = 0.60.
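
The same relative-frequency calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the interval labels and variable names are assumptions chosen for readability, not part of the original slides.

```python
# Waiting-time counts from the emergency room example (50 people in total).
counts = {
    "up to 5": 10,
    "5 to 10": 20,
    "10 to 15": 13,
    "15 to 20": 7,
}

total = sum(counts.values())

# Probability of an interval = its frequency / frequency of all values.
p_up_to_5 = counts["up to 5"] / total
p_up_to_10 = (counts["up to 5"] + counts["5 to 10"]) / total

print(f"P(wait up to 5 minutes)  = {p_up_to_5:.2f}")
print(f"P(wait up to 10 minutes) = {p_up_to_10:.2f}")
```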



Expected Value

The expected value of a distribution is the mean of the distribution.

It represents our long-run expectations about the distribution.

The expected value of X is found by multiplying each value of X, referred to as i, by its probability of occurring, referred to as p(X=i), and summing the products.

Expected value = mean = Σ p(X=i) * i, summed over all values i.



Example Calculation of Expected Value or Mean

We examined the waiting time of 50 people at our emergency room and found that 10 people waited up to 5 minutes, 20 people waited 6 to 10 minutes, 13 people waited 11 to 15 minutes, and 7 people waited 16 to 20 minutes.

What is the mean waiting time at our emergency room?



Example Calculation of Expected Value or Mean (Continued)

Observed waiting time (minutes)   Frequency   Probability   Probability times waiting time
2.5                               10          0.20          0.50
7.5                               20          0.40          3.00
12.5                              13          0.26          3.25
17.5                              7           0.14          2.45
Total                             50          1.00          9.20

The expected value or mean is 9.2 minutes.

Do this in Excel: http://biostatistics.gmu.edu/means.xls
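
As an alternative to the spreadsheet, the table can be reproduced with a short Python sketch. It assumes, as the table does, that each interval is represented by its midpoint; the variable names are illustrative.

```python
# Interval midpoints (minutes) and observed frequencies from the table.
midpoints = [2.5, 7.5, 12.5, 17.5]
frequencies = [10, 20, 13, 7]

total = sum(frequencies)

# Probability of each interval = frequency / total number of people.
probabilities = [f / total for f in frequencies]

# Expected value = sum of (probability * value) over all values.
expected_value = sum(p * x for p, x in zip(probabilities, midpoints))

print(f"Expected waiting time: {expected_value:.1f} minutes")
```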



Normal Distribution

A symmetric distribution, meaning that data are evenly distributed about the mean.

Mean, median, and mode are the same value.

It has one mode and looks like a bell-shaped curve.



Normal Distribution Continued

The curve is continuous; there are no gaps or holes.

The curve never touches the X-axis, because any value is possible, though extreme values have vanishingly small probabilities.

About 99.7% of values are within 3 standard deviations of the mean.



Distribution of Mean

If you repeatedly take samples of observations and average each sample, the averages form a distribution of the mean.

The distribution of the mean has the same mean as the distribution of the observations.

Standard deviation of the mean = Standard error = Standard deviation of the observations / Square root of the sample size.



Example

What are the mean, standard deviation, and standard error for the following data: 4, 5, 6?

Mean = 5

Standard deviation = 1

Standard error = 1 / 3^0.5 = 1 / 1.73 = 0.58
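
A minimal Python check of this example, using only the standard library. It assumes the standard deviation is the sample standard deviation (dividing by n - 1), which is what gives the value of 1 quoted above.

```python
import math
import statistics

data = [4, 5, 6]

mean = statistics.mean(data)
# statistics.stdev computes the sample standard deviation (n - 1 denominator).
standard_deviation = statistics.stdev(data)
# Standard error = standard deviation / square root of the sample size.
standard_error = standard_deviation / math.sqrt(len(data))

print(f"Mean = {mean}")
print(f"Standard deviation = {standard_deviation:.2f}")
print(f"Standard error = {standard_error:.2f}")
```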



Central Limit Theorem

Consider any distribution of observations with mean μ and standard deviation σ.

As the sample size n increases, the means of samples of n observations approach a Normal distribution with mean μ and standard deviation σ / square root (n).

The theorem is important because it helps us ignore questions about the shape of the distribution and focus on its mean and standard deviation.

Do this in Excel: http://biostatistics.gmu.edu/avgisnormal.xls
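
The theorem can also be illustrated with a small simulation. The sketch below is not taken from the spreadsheet; it assumes an exponential distribution with mean 1 and standard deviation 1 as the (clearly non-Normal) source of observations, a sample size of 30, and a fixed random seed so the run is repeatable.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the simulation is repeatable
n = 30                   # observations per sample (assumed for illustration)
replicates = 5000        # number of repeated samples

# Draw repeated samples from a skewed distribution and average each one.
sample_means = []
for _ in range(replicates):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / replicates
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1.0 / math.sqrt(n)  # sigma / square root (n)

print(f"Mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
```

Plotting a histogram of `sample_means` would show the bell shape, even though the individual observations are strongly skewed.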



Standard Normal Distribution

A Normal distribution.

Mean of zero.

Standard deviation of 1.

Z = (Observed value - mean) / standard deviation of the average.

Here, standard deviation of the average = standard error = standard deviation of the observations divided by the square root of the sample size.



Example Calculation of Z

What is the Z value for the observed mean of 16, if the average of the sample means is 10 and the standard error is 2?

Z = (16-10) / 2 = 3.



Another Example

What is the Z value for the mean 16 of 4 observations, if the average of repeated sample means is 10 and the standard deviation of the observations is 2?

Standard deviation of mean = 2 / 4^0.5 = 2/2 = 1

Z value for 16 = (16-10)/1 = 6
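
Both Z examples can be reproduced with two small helper functions. The function names are hypothetical and chosen only for this sketch.

```python
import math


def standard_error(sd_observations, sample_size):
    """Standard deviation of the mean: SD of observations / sqrt(sample size)."""
    return sd_observations / math.sqrt(sample_size)


def z_value(observed_mean, expected_mean, se):
    """Number of standard errors the observed mean lies from the expected mean."""
    return (observed_mean - expected_mean) / se


# First example: the standard error is given directly as 2.
z_first = z_value(16, 10, 2)

# Second example: 4 observations with a standard deviation of 2.
z_second = z_value(16, 10, standard_error(2, 4))

print(z_first, z_second)
```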



Use of Z Values

About 99.7% of data are between z=-3 and z=3.

Z is the number of standard deviations that X is away from the mean.

About 0.15% of data are below z=-3.

About 0.15% of data are above z=3.



Use of Z Value (Continued)

95% of data are within z=1.96 and z=-1.96

5% are outside z=1.96 and z=-1.96

2.5% of data are below z=-1.96

2.5% of data are above z=1.96
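
These percentages can be checked against the standard Normal distribution. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of data between -z and +z is the difference of the cumulative probabilities.
within_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
within_3 = std_normal.cdf(3) - std_normal.cdf(-3)

# Share of data in the lower tail below z = -1.96.
below_minus_196 = std_normal.cdf(-1.96)

# Going the other way: the z value that leaves 2.5% in the upper tail.
critical_z = std_normal.inv_cdf(0.975)

print(f"Within z = +/-1.96: {within_196:.3f}")
print(f"Within z = +/-3:    {within_3:.4f}")
print(f"Below z = -1.96:    {below_minus_196:.3f}")
print(f"Critical z for 95%: {critical_z:.2f}")
```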



Confidence Interval

For Normal distributions, the 95% two-tailed confidence interval corresponds to the observations between z=-1.96 and z=1.96.



Example

What is the 95% confidence interval for a mean of 10 and a standard deviation of 2?

Lower limit = 10 - 1.96*2 = 6.08.

Upper limit = 10 + 1.96*2 = 13.92.

At 13.92, Z value is (13.92-10)/2 = 1.96.

At 6.08, Z value is (6.08-10)/2 = -1.96.

95% of data fall within these limits.
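
The limits can be computed with a short function. This is a sketch of the arithmetic above; the function name and the default critical value of 1.96 are choices made for this illustration.

```python
def confidence_interval(mean, standard_deviation, z=1.96):
    """Return (lower, upper) limits: mean -/+ z * standard deviation."""
    margin = z * standard_deviation
    return mean - margin, mean + margin


lower, upper = confidence_interval(10, 2)

print(f"Lower limit = {lower:.2f}")
print(f"Upper limit = {upper:.2f}")
```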



Two-Tailed Confidence Interval

What percentage of data are between z=-1.96 and z=1.96? Answer: 95%. This is often referred to as a two-tailed confidence interval.

What percentage of data are below z=1.96? Answer: 97.5%. This is often referred to as a one-tailed confidence interval.

What percentage of data are above z=-1.96? Answer: 97.5%. This is often referred to as a one-tailed confidence interval.



Hypothesis

A statistical hypothesis is a conjecture about a population parameter.

The null hypothesis is that there is no difference between the parameter and a value.

The alternative hypothesis states that there is a specific difference.

Experimental data can only reject a hypothesis, not accept it.



Possible Outcomes of Hypothesis Test

There are four possible outcomes:

1. We reject a hypothesis that is true.

2. We reject a hypothesis that is false.

3. We do not reject a hypothesis that is true.

4. We do not reject a hypothesis that is false.



Two Types of Error

                              Hypothesis is true   Hypothesis is false
We reject hypothesis          Type one error       Correct
We do not reject hypothesis   Correct              Type two error



Type 1 Error

The level of significance is the maximum probability of a type 1 error, symbolized by alpha, α.

When we base our decision on 95% confidence intervals, 5% of the data are ignored at the two tails of the distribution. Therefore, there is a 5% chance that we will reject a hypothesis that is true.

Type one error = 5%, α = 0.05.



One-tailed Tests

In a two-tailed test, the hypothesis is rejected when the value is above the higher limit or below the lower limit.

In a one-tailed test that a parameter is larger than a particular value, the hypothesis is rejected when the value is above the higher limit.



One-tailed Tests (Continued)

When we base a one-tailed decision on the limits of a 95% confidence interval, 2.5% of the data are ignored at one tail of the distribution. Therefore, there is a 2.5% chance that we will reject a hypothesis that is true.

α = 0.025.



Steps in Testing a Hypothesis

1. State the null hypothesis.

2. Identify the alternative hypothesis.

3. Is this a one-tailed or two-tailed test?

4. Decide the critical Z value above or below which the hypothesis is rejected, usually 1.96.

5. Calculate the Z value corresponding to the observation.

6. Reject or do not reject the hypothesis by comparing the calculated Z to the critical values.



Example

The cost of rehabilitation in the industry is $25,000, with a standard deviation of $3,000. In our hospital, the average cost is $30,000.

With 95% confidence, would you say that our cost is different from the industry's?

Do this in Excel: http://biostatistics.gmu.edu/stand.xls



Steps in Testing Example Hypothesis

1. The null hypothesis: Our cost is the same as the industry average.

2. Alternative hypothesis: Our cost is higher or lower than the industry average.

3. This is a two-tailed test.

4. The critical Z is +1.96 or -1.96.

5. Observed Z = (30000-25000)/3000 = 1.67.

6. Do not reject the null hypothesis, because 1.67 lies between -1.96 and +1.96.
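
The two-tailed comparison in this example can be sketched in Python. As in the slide, the industry standard deviation of 3000 is used directly as the denominator of Z; the variable names are illustrative.

```python
industry_mean = 25_000
industry_sd = 3_000
our_cost = 30_000
critical_z = 1.96  # two-tailed critical value for 95% confidence

# Z = (observed value - mean) / standard deviation
observed_z = (our_cost - industry_mean) / industry_sd

# Two-tailed test: reject only if Z falls beyond either critical value.
reject = abs(observed_z) > critical_z

print(f"Observed Z = {observed_z:.2f}")
print("Reject the hypothesis" if reject else "Do not reject the hypothesis")
```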



When to Assume Normal Distribution for Means

When the population variance is known and observations have a Normal distribution.

When the population variance is unknown and there are more than 30 observations.

Otherwise, use the t-distribution as an approximation for the Normal distribution.



Use t-distribution

If the values in the population are Normal.

If we have fewer than 30 observations.

If we have to estimate the standard deviation from the sample and the variance of the population is not known.

The t-distribution is used as an approximation for near-Normal data.



Calculating t Statistic

t = (observed average - mean) / standard deviation of the average.

The critical value of t depends on the sample size.

For a one-tailed test with alpha = 0.025 or a two-tailed test with alpha = 0.05, the critical t value for a sample size of 10 (9 degrees of freedom) is 2.26 and for a sample size of 20 (19 degrees of freedom) is 2.09.



Calculating t Statistic (Continued)

If we are examining a sample size of 10 (9 degrees of freedom), 95% of data are within t=-2.26 and t=2.26.

If we are examining a sample size of 10, 97.5% of data are below t=2.26.



Testing With t-distribution

1. State the null hypothesis.

2. Identify the alternative hypothesis.

3. Is this a one-tailed or two-tailed test?

4. Decide the critical t value above or below which the hypothesis is rejected; the value depends on sample size.

5. Calculate the t value corresponding to the

observation.

6. Reject or do not reject the hypothesis by comparing

the calculated t to the critical values.
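
The steps above can be sketched in Python for a one-sample, two-tailed test. The cost data below are hypothetical and invented purely for illustration. The critical values are the standard two-tailed t-table entries at alpha = 0.05, keyed by degrees of freedom (sample size minus 1); the Python standard library has no t-distribution, so a small lookup table is assumed here.

```python
import math
import statistics

# Hypothetical rehabilitation costs for 10 patients (illustrative data only).
costs = [27000, 31000, 29000, 33000, 30000, 28000, 32000, 30000, 31000, 29000]
industry_mean = 25000  # value stated in the null hypothesis

# Two-tailed critical t values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_T = {9: 2.262, 19: 2.093}

n = len(costs)
sample_mean = statistics.mean(costs)
# The standard deviation is estimated from the sample (n - 1 denominator).
standard_error = statistics.stdev(costs) / math.sqrt(n)

# t = (observed average - mean) / standard deviation of the average
t_value = (sample_mean - industry_mean) / standard_error
critical_t = CRITICAL_T[n - 1]

reject = abs(t_value) > critical_t

print(f"t = {t_value:.2f}, critical t = {critical_t}")
print("Reject the hypothesis" if reject else "Do not reject the hypothesis")
```

For sample sizes not in the table, a statistics package or a printed t-table would supply the critical value.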



Example Data



Selecting Data Analysis



Select Descriptive Statistics



Enter Data Range



Result

The confidence interval is the mean plus or minus the value reported as the confidence level.

If it does not include $30,000, then our hospital has a different cost structure than the other hospitals in our database.