Hypothesis Testing


An Introduction Using the Normal Density for Testing Claims Against the Population Mean

Objectives

• To introduce hypothesis testing components when determining the accuracy of a claim against a population mean.

• To demonstrate the four-step process for answering questions using hypothesis testing methods.

Materials To Review

• Normal density curves

• z-scores

• Sampling distribution of means

• Central Limit Theorem

Terminology

• A hypothesis is a claim or statement about a property of a population.

• A hypothesis test (or a test of significance) is a procedure for testing a claim about a property of a population.

• The null hypothesis, denoted as H0, is a statement that the value of a population parameter is equal to some claimed value.

Terminology

• The alternate hypothesis, denoted as Ha, is the statement that the parameter has a value that differs from the null hypothesis.

• The test statistic is a measure of the distance the sample statistic is from the claim based on the distribution.

Terminology

• The P-value (or probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming the null hypothesis is accurate.

Terminology

• The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.

• The significance level (denoted by α) is the probability that the test statistic will fall in the critical region when the null hypothesis is accurate.

Terminology

• A critical value is any value that separates the critical region from the values of the test statistic that do not lead to rejection of the null hypothesis.

• The critical value depends on:

1. The nature of the null hypothesis,
2. The sampling distribution that applies, and
3. The significance level, α.

Forming Hypotheses

• If you are conducting a study and want to use a hypothesis test to support a claim, the claim must be worded so that it becomes the alternate hypothesis.

• In other words, we always test AGAINST the null hypothesis. So if you wish to find evidence in support of a claim, that claim must be the ALTERNATE hypothesis.

Cautions

• When conducting hypothesis tests as described in this course, be sure to consider:

1. The context of the data,
2. The source of the data,
3. The sampling method used to collect the data.

• All of these can have an effect on the decisions made for hypothesis tests.

The Rare Event Rule

• If, under a given assumption, the probability of a particular observed event is smaller than a significance level, we conclude that the assumption is probably not accurate.

• Note that this decision criterion is based on the significance level.

• The investigator usually chooses the significance level. To be meaningful, it is usually set at 10% or less.

Conditions for Using Normal Density

• To conduct a hypothesis test about a population mean, the normal density may be used under the following conditions:

1. The sample is a simple random sample.
2. The value of the population standard deviation is known.

Conditions for Using Normal Density

• In addition to conditions 1 and 2, we need:

3. Either or both of these conditions are satisfied:
   a) The population is known to be normally distributed, or
   b) The sample size is greater than 30.

Components for 1-Sample Means Test

• Distribution: Normal if conditions are met

• Significance Level: α (usually given), 0.05 if not given.

• Null hypothesis: H0: μ = μ0

• Alternate hypothesis:

Type of Test        Mathematical Translation
Left-Sided Test     Ha: μ < μ0
Right-Sided Test    Ha: μ > μ0
Two-Sided Test      Ha: μ ≠ μ0

Components for 1-Sample Means Test

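• Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu_0$ is the claimed mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.

• P-value: Area under the distribution from the test statistic that describes the probability of obtaining such a sample or a sample with a more extreme value.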
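The two components above can be sketched in a few lines of Python. This is only an illustration under the stated conditions (simple random sample, known population standard deviation, normal sampling distribution). The function name `z_test` and the tail labels "left", "right", and "two" are assumptions made for this sketch, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, claimed_mean, sigma, n, tail):
    """Return (z, p_value) for a one-sample z-test with known sigma.

    tail is "left", "right", or "two" (hypothetical labels for Ha).
    """
    # Standard deviation of the sampling distribution of the mean.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        p_value = std_normal.cdf(z)               # area to the left of z
    elif tail == "right":
        p_value = 1.0 - std_normal.cdf(z)         # area to the right of z
    elif tail == "two":
        p_value = 2.0 * std_normal.cdf(-abs(z))   # both tails beyond |z|
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Illustrative numbers only (not taken from the slides).
z, p = z_test(sample_mean=52.0, claimed_mean=50.0, sigma=10.0, n=25, tail="right")
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

The P-value is shaded in the direction of the alternate hypothesis, which is why each tail option uses a different area.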

Components: Density with Null Hypothesis

[Figure: normal density curve centered at the claim (the value stated in the null hypothesis).]

Components: Significance Level and Alternate Hypothesis

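• This is the type of diagram that would be seen for a left-sided test. The alternate hypothesis suggests the true value is less than what is claimed.

[Figure: normal density centered at the claim. The region to the left of the critical value z* is shaded, with area equal to the significance level α.]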

Components: Significance Levels and Alternate Hypothesis

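• This is the type of diagram that would be seen for a right-sided test. The alternate hypothesis suggests the true value is greater than what is claimed.

[Figure: normal density centered at the claim. The region to the right of the critical value z* is shaded, with area equal to the significance level α.]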

Components: Significance Levels and Alternate Hypothesis

• This is the type of diagram that would be seen for a two-sided test. The alternate hypothesis suggests the true value is different from what is claimed.

[Figure: normal density centered at the claim. A region of area α/2 is shaded in each tail, beginning at the two critical values z*.]

Components: Significance Levels and Alternate Hypotheses

• In each of the last three diagrams, the area in the red region corresponds to the significance level.

• The region colored in red is referred to as the region of rejection.

Components: Significance Levels and Alternate Hypotheses

• So the significance level tells us how much area under the curve should be shaded.

• The alternate hypothesis tells us the direction of covering this area.

• The critical values, z*, tell us where on the curve these regions begin so the proper amount of area is covered.
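As a rough sketch of how the significance level and the alternate hypothesis together fix the critical values, the snippet below looks up z* on the standard normal curve. The helper name `critical_values` and the tail labels are assumptions for illustration.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the critical value(s) z* on the standard normal curve.

    tail is "left", "right", or "two" (hypothetical labels for Ha).
    """
    std_normal = NormalDist()
    if tail == "left":
        # All of alpha sits in the left tail.
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha sits in the right tail.
        return (std_normal.inv_cdf(1.0 - alpha),)
    if tail == "two":
        # alpha is split in half, one half per tail.
        z_star = std_normal.inv_cdf(1.0 - alpha / 2.0)
        return (-z_star, z_star)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, [round(z, 3) for z in critical_values(0.05, tail)])
```

For a two-sided test at the 5% level this gives the familiar values of about -1.96 and 1.96.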

Components: Test Statistic

• The test statistic is a measure in terms of standard deviations of how far our sample statistic is from the claimed value.

• Since we are using information from a sample, we must consider the standard deviation from the sampling distribution, not the original population.

• The Central Limit Theorem provides the mathematical support for the methods used here.

Components: Critical Values

• The critical value is the value that begins the region of rejection. It is a measure of how far away from the claim a sample statistic needs to be in order to cast doubt on the accuracy of the claim.

• Once a test statistic is beyond the critical value, our sample is in the region of rejection and we would have sufficient evidence to reject the claim.

More on Critical Values

• Before computers, critical values were used to make the decisions for these tests. This is sometimes referred to as the traditional method.

• With computers available, the P-value is computed so as to give a single criterion for all testing schemes, as well as to provide information about the chances of obtaining such samples. This is referred to as the P-value method, and it is the method we shall use in our class.

Components: P-value (Rejection of Null Hypothesis)

• The P-value is the area under the curve that begins from the test statistic and is shaded in the direction of the alternate hypothesis.

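[Figure: normal density centered at the claim, marking the critical value z*, the test statistic z, the significance level α, and the P-value as the shaded area beyond z. Here z lies beyond z*.]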

Components: P-value (Fail to Reject Null Hypothesis)

• The P-value is the area under the curve that begins from the test statistic and is shaded in the direction of the alternate hypothesis.

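[Figure: normal density centered at the claim, marking the critical value z*, the test statistic z, the significance level α, and the P-value as the shaded area beyond z. Here z does not lie beyond z*.]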

Four-Step Process

• State: What is the practical question that requires a statistical test?

• Plan: Before you devise your plan, do the following:

Test to see if the conditions are satisfied. If the conditions are satisfied, you may then write out the hypotheses. Also identify the parameter and choose the type of test you are conducting.

Four-Step Process

• Solve: Compute the test statistic and determine the p-value.

• Conclude: In this step we do the following:

1. Compare the p-value with the significance level.
2. Write the conclusion in the context of the problem.

EXAMPLES OF HYPOTHESIS TESTING

In this section we demonstrate how to perform a hypothesis test for a population mean. Each type of alternate hypothesis is demonstrated.

Example 1: Left-Sided Test

• Management at Ford Motor Company is trying to determine if the average age of a car is now less than what it was in 1995. The average age of cars in 1995 was 8.33 years. Based on a random sample of 18 automobile owners, you obtain the ages of their cars:

{8, 12, 1, 2, 13, 3, 5, 9, 12, 6, 5, 6, 10, 7, 10, 11, 6, 10}.

• Assuming that σ = 3.8 years, test this claim at the 10% significance level.

Example 1: Left-Sided Tests

• Because of the small sample size, we need to test to see if data are normally distributed with no outliers. This is done with a normal probability plot and a boxplot.

• The graphs are given on the next slide.

Example 1: Left-Sided Test

• The boxplot shows a roughly symmetric shape with no outliers.

• The normal probability plot shows the data are roughly linear with no outliers. This gives evidence to suggest the data are normally distributed.
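The "no outliers" reading of the boxplot can be checked numerically. The sketch below assumes the common 1.5 × IQR boxplot convention for flagging outliers. The slides do not say which rule their software used, and quartile conventions vary slightly between packages.

```python
from statistics import quantiles

# Car ages from Example 1.
car_ages = [8, 12, 1, 2, 13, 3, 5, 9, 12, 6, 5, 6, 10, 7, 10, 11, 6, 10]

# Quartiles (Python's default "exclusive" method; other software may differ).
q1, median, q3 = quantiles(car_ages, n=4)
iqr = q3 - q1

# Assumed boxplot rule: points beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in car_ages if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Under this rule no car age falls outside the fences, which agrees with the boxplot described above.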

Example 1: Left-Sided Test

• First we identify the components:

1. Distribution: Since the boxplot and normal probability plot support the use of a normal density, we may use a normal density for our test.

2. Significance level: We have a significance level of 10%, so α = 0.10.

Example 1: Left-Sided Test

3. Null hypothesis: The null hypothesis would be there is no difference in the age of cars today compared to the age of cars in 1995, which was 8.33 years. Thus H0: μ = 8.33.

4. Alternate hypothesis: Since Ford Motor Company’s management believes the age is less than the age in 1995, we have Ha: μ < 8.33.

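Example 1: Distribution

[Figure: normal density centered at the claim of 8.33 years. The region of rejection, with area equal to the significance level α = 10%, lies to the left of the critical value. The critical value is a sample mean of 7.18 years, which corresponds to z* ≈ -1.28.]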
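The critical value of 7.18 reported for Example 1 is on the sample-mean scale rather than the z scale. As a minimal sketch, assuming the usual conversion (claimed mean plus z* standard errors), the two scales can be related as follows. The helper name `critical_sample_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_sample_mean(claimed_mean, sigma, n, alpha, tail):
    """Return (z_star, critical sample mean) for a one-sided z-test."""
    standard_error = sigma / sqrt(n)
    if tail == "left":
        z_star = NormalDist().inv_cdf(alpha)
    elif tail == "right":
        z_star = NormalDist().inv_cdf(1.0 - alpha)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    # Move z_star standard errors away from the claimed mean.
    return z_star, claimed_mean + z_star * standard_error


# Example 1: claim 8.33 years, sigma = 3.8, n = 18, alpha = 0.10, left-sided.
z_star, xbar_star = critical_sample_mean(8.33, 3.8, 18, 0.10, "left")
print(f"z* = {z_star:.2f}, critical sample mean = {xbar_star:.2f}")
```

A sample mean below about 7.18 years would fall in the region of rejection.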

Example 1: Left-Sided Test

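• Test statistic: The sample mean is x̄ = 136/18 ≈ 7.56 years, so the formula gives:

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{7.56 - 8.33}{3.8 / \sqrt{18}} \approx -0.86$

• So our particular sample’s average is 0.86 standard deviations (on the sampling distribution of means curve) to the left of the mean.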

Example 1: Left-Sided Test

• P-value: 0.1936

• So the probability of obtaining such a sample average or one more extreme than this is approximately 19.4%.

Example 1: Test Statistic and P-Value

[Figure: the Example 1 density showing the critical value z*, the test statistic, the significance level α = 0.10 or 10%, and the P-value = 0.1936 or 19.4% as hatched shading.]

Example 1: Analyzing the Diagram

• We can see that the test statistic is not further away from the claim than the critical value.

• We can see the amount of area in red is less than the amount of area that is brown.

• The test statistic is not less than the critical value z*.

Example 1: Left-Sided Test

• State: Are cars younger today than they were in 1995?

• Plan: We shall use a Left-Sided Z-Test for this. Because the sample size is small, we checked the conditions and concluded that they are satisfied. Our hypotheses are:

H0: μ = 8.33

Ha: μ < 8.33

Example 1: Left-Sided Test

• Solve: Our test statistic is -0.86. Our P-value is 0.1936.

• Conclude: The P-value > significance level. This implies we fail to reject the null hypothesis. We do not have sufficient evidence to suggest that cars are in fact younger than they were in 1995.
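The Solve and Conclude steps for Example 1 can be reproduced from the raw data. This is a sketch using Python's standard library, not the software used for the slides. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Example 1 data and givens.
car_ages = [8, 12, 1, 2, 13, 3, 5, 9, 12, 6, 5, 6, 10, 7, 10, 11, 6, 10]
mu0 = 8.33     # claimed mean under H0 (years)
sigma = 3.8    # assumed population standard deviation (years)
alpha = 0.10   # significance level

# Solve: test statistic and left-tail P-value.
xbar = mean(car_ages)
z = (xbar - mu0) / (sigma / sqrt(len(car_ages)))
p_value = NormalDist().cdf(z)  # Ha: mu < mu0, so shade to the left of z

# Conclude: compare the P-value with the significance level.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"x-bar = {xbar:.2f}, z = {z:.2f}, P-value = {p_value:.4f} -> {decision}")
```

The result agrees with the slide values of z ≈ -0.86 and P-value ≈ 0.1936.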

Example 2: Right-Sided Test

• A researcher claims that the average age of a woman before she has her first child is greater than the 1990 mean age of 26.4 years, on the basis of data obtained from the National Vital Statistics Report, Vol. 48, No. 14. She obtains a simple random sample of 40 women who gave birth to their first child in 1999 and finds the sample mean age to be 27.1 years. Assuming the population’s standard deviation is σ = 6.4 years, test the researcher’s claim using a significance level of 5%.

Example 2: Right-Sided Test

• First we identify the components:

1. Distribution: Normal density. We have a large enough sample of 40 individuals, and the population standard deviation is given as σ = 6.4 years.

2. Significance level: Given at 0.05 or 5%.

Example 2: Right-Sided Test

3. Null hypothesis: The claim we wish to find evidence against is that there is no difference between the ages of women in 1990 and 1999. According to the National Vital Statistics Report, the average age was 26.4 years. So H0: μ = 26.4.

4. Alternate hypothesis: The researcher believes that the age is now older, so we have: Ha: μ > 26.4.

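Example 2: Distribution

[Figure: normal density centered at the claim of 26.4 years. The region of rejection, with area equal to the significance level α = 5%, lies to the right of the critical value. The critical value is a sample mean of 28.06 years, which corresponds to z* ≈ 1.645.]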

Example 2: Right-Sided Test

• Test statistic:

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{27.1 - 26.4}{6.4 / \sqrt{40}} \approx 0.69$

• This means that the mean from our sample is about 0.69 standard deviations above the claim under the sampling distribution of the means.

Example 2: Right-Sided Test

• P-value: 0.2445.

• This means that we have approximately a 24.45% chance of seeing such a sample average or a sample average that is greater.

Example 2: Test Statistic and P-value

[Figure: the Example 2 density showing the critical value z*, the test statistic, the significance level α = 0.05 or 5%, and the P-value = 0.2445 or 24.45% as hatched shading.]

Example 2: Analyzing the Diagram

• We can see that the test statistic is not further away from the claim than the critical value is.

• We can see the amount of area hatched is greater than the amount of area shaded in red.

• Both of these are indicators that we fail to reject the null hypothesis. We did not find sufficient evidence to suggest that women are in fact having children later in life.

Example 2: Right-Sided Test

• State: Are women waiting longer before having their first child?

• Plan: I will conduct a Z-Test using a sample of 40 mothers who had their first child in 1999. I am comparing this new data with the 1990 mean age reported in National Vital Statistics Report, Vol. 48, No. 14. The condition for using the normal approximation is satisfied.

Example 2: Right-Sided Test

• Solve: The test statistic is approximately 0.6917 and the P-value is approximately 0.2445.

• Conclude: Since the P-value > 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to suggest that women are waiting longer to have their first child.
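Example 2 supplies only summary statistics, so the same check starts from the sample mean instead of raw data. The sketch below is one way to reproduce the Solve and Conclude steps. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 2 givens (summary statistics only).
xbar, n = 27.1, 40    # sample mean age (years) and sample size
mu0 = 26.4            # claimed mean under H0 (years)
sigma = 6.4           # assumed population standard deviation (years)
alpha = 0.05          # significance level

# Solve: test statistic and right-tail P-value.
z = (xbar - mu0) / (sigma / sqrt(n))
p_value = 1.0 - NormalDist().cdf(z)  # Ha: mu > mu0, so shade to the right of z

# Conclude: compare the P-value with the significance level.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.4f}, P-value = {p_value:.4f} -> {decision}")
```

This matches the slide values of about 0.6917 for the test statistic and 0.2445 for the P-value.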

Example 3: Two-Sided Test

• Suppose I am in the market to purchase a 2010 Ford Mustang. Before shopping around, I want to determine what I should expect to pay.

Example 3: Two-Sided Test

• According to Kelley Blue Book, using the following criteria:

1. Very Good Condition
2. 2-Door Coupe
3. No Added Options
4. For Private Party
5. 43,500 miles

• We have a price of $12,972.

Example 3: Two-Sided Test

• After doing some shopping on carsforsale.com, I found the following prices for fourteen 2010 Ford Mustangs:

{15950, 17995, 16999, 16995, 17995, 15945, 17985, 18977, 14259, 18689, 14500, 21995, 15900, 18995}

• After seeing these values, I think I’ll be paying something different from what Kelley Blue Book published. Run a test to determine if the Kelley Blue Book value is an accurate estimate of what I would be paying. Assume σ = $2100.

Example 3: Two-Sided Test

• Since our sample is small, we need to determine if the conditions are upheld to conduct such a test.

• We use a boxplot and a normal probability plot to determine if the assumption of normality is upheld.

• These pictures are on the next slide.

Example 3: Two-Sided Test

• Boxplot is roughly symmetric with no outliers.

• Normal Probability Plot shows the data is roughly linear giving evidence that a normal density may be used.

Example 3: Two-Sided Test

• First we identify the components:

1. Distribution: Using the normal probability plot and the boxplot we conclude that the normality condition is upheld. Hence we may use a normal density.

2. Significance level: Not given so we shall set it to be 0.05 or 5%.

Example 3: Two-Sided Test

3. Null hypothesis: The claim is there is no statistically significant difference in the published value of the Mustang in Kelley Blue Book and the sample average I found, so H0: μ = 12,972.

4. Alternate hypothesis: Since we are testing to see if there is any difference in the averages, we have: Ha: μ ≠ 12,972.

Example 3: Distribution

1. Shading starts at the critical values z* after we divide the significance level in half.

2. Direction of shading is based on the alternate hypothesis.

[Figure: normal density with a region of area α/2 shaded in each tail, beginning at the two critical values.]

Example 3: Two-Sided Test

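• Test statistic: The sample mean of the fourteen prices is x̄ ≈ $17,369.93, so:

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{17369.93 - 12972}{2100 / \sqrt{14}} \approx 7.836$

• This means that the mean from our sample is about 7.836 standard deviations from the claim using the sampling distribution of means.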

Example 3: Two-Sided Test

• P-value: 4.697 × 10^-15.

• This means that we have approximately a 0.0000000000004697% chance of seeing a sample mean at least this far from the claim in either direction.

Example 3: Test Statistic and P-value

[Figure: normal density with both tails marked. Each tail has a critical value, a significance level of α/2, and half of the P-value.]

Example 3: Test Statistic and P-value Right Side of Claim

[Figure: right side of the claim, showing the critical value, the test statistic, a significance level of 0.025 in this tail, and the P-value as the hatched area.]

Example 3: Test Statistic and P-value Left Side of Claim

[Figure: left side of the claim, showing the critical value, the test statistic, a significance level of 0.025 in this tail, and the P-value as the hatched area.]

Example 3: Analyzing the Diagrams

• We can see that the test statistic is further away from the claim than the critical value is.

• We can see the amount of area hatched is less than the amount of area shaded in red.

• Both of these are indicators that we reject the null hypothesis. We did find sufficient evidence to suggest the Kelley Blue Book value is not an accurate estimate of what I would be paying for a 2010 Ford Mustang.

Example 3: Analyzing The Diagrams

• We have two detailed diagrams because we have two sides to our test.

• Since we are testing the claim of no difference we must account for the possibility that the test statistic could be on either side of the claim.

• This also accounts for the significance level being halved as well. The significance level corresponds to the total area under the curve that represents what we describe as “a rare event”.

Example 3: Two-Sided Test

• State: Is Kelley Blue Book’s average value of a 2010 Ford Mustang an accurate estimate of the price at which people are selling them?

• Plan: I will conduct a Z-Test using a sample of 14 prices. The condition for using the normal approximation is satisfied. Therefore our hypotheses are:

H0: μ = 12,972

Ha: μ ≠ 12,972

Example 3: Two-Sided Test

• Solve: The test statistic is approximately 7.836 and the P-value is approximately 0.

• Conclude: Since the P-value < 0.05, we reject the null hypothesis. We have sufficient evidence to suggest the Kelley Blue Book value and the price I’ll be paying for a 2010 Mustang are different.
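For Example 3 the P-value is so small that computing it as 1 minus a cumulative probability can lose precision. The sketch below instead uses the identity that the two-sided P-value equals erfc(|z| / √2) for a standard normal test statistic. This is a numerical choice made here, not necessarily how the slide value was produced.

```python
from math import erfc, sqrt
from statistics import mean

# Example 3 data and givens.
prices = [15950, 17995, 16999, 16995, 17995, 15945, 17985,
          18977, 14259, 18689, 14500, 21995, 15900, 18995]
mu0 = 12972     # Kelley Blue Book value under H0 (dollars)
sigma = 2100    # assumed population standard deviation (dollars)
alpha = 0.05    # significance level (not given, so 5% is used)

# Solve: test statistic and two-sided P-value.
xbar = mean(prices)
z = (xbar - mu0) / (sigma / sqrt(len(prices)))
# Two-sided area beyond |z|: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_value = erfc(abs(z) / sqrt(2))

# Conclude: compare the P-value with the significance level.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"x-bar = {xbar:.2f}, z = {z:.3f}, P-value = {p_value:.3e} -> {decision}")
```

The test statistic comes out at about 7.836 and the P-value is on the order of 10^-15, so the decision to reject the null hypothesis does not depend on the exact digits.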

Lurking Variables

• What are some lurking variables that could have an effect on my conclusion?

• Individuals will try to make a profit and therefore may set the price of their cars higher.

• Some cars may have additional features, like being convertible. This influences the price.

• Any others?

The End