Ma

Week 5: Hypothesis Testing II: One Sample Hypothesis Tests - Lecture

Hypothesis Testing

The Basic Idea of Hypothesis Testing | An Example and the Five Steps of Hypothesis Testing | The Essential Variations | Problem Situation: z Tests about a Population Mean (σ Known): Two-Sided Alternatives | Problem Situation: t Tests about a Population Mean (σ Unknown) | Problem Situation: z Tests about a Population Proportion | Summary and Conclusions

In Weeks 3 and 4, we looked first at sampling distributions and then at confidence intervals, culminating in honing our ability to make inferences from a single sample to a population mean or proportion. This week, we will introduce "hypothesis testing," which is the alternative way of making inferences from a single sample to a population mean or proportion.

Hypothesis testing uses the same concepts and tools that we used with sampling distributions and confidence intervals, but it puts them into a special hypothesis testing framework. The reason for having a special hypothesis testing framework is that hypothesis testing can be characterized as a battle between the "new" and the "old." The old is conservative; it believes in the status quo and it believes that the new is not really new, but is simply an extension or a variation of the old. The new approach is revolutionary; it believes that things have changed fundamentally and that the old ways must be abandoned. The old or conservative approach assumes that business circumstances are relatively stable, and that outlying data is simply expected variance. The new approach assumes that the data variation is an indication of fundamentally new business circumstances.

Now, both perspectives are entirely reasonable, and neither is inherently right or wrong. In our view, it depends on the particular situation and the facts of the case. But, in any given situation, how does one decide between them? The special hypothesis testing framework is a procedure for evaluating these competing claims to "the truth." We will turn to the details of this procedure shortly. Before doing so, however, we will briefly introduce an analogy between hypothesis testing and our criminal justice system. This will provide an intuitive way of conveying the basic idea of hypothesis testing.

A cornerstone of our criminal justice system, and our democracy, is that a person is considered innocent until proven guilty. Moreover, the proof of guilt must be "beyond a reasonable doubt" as decided unanimously by a jury of one's peers. While we all know that juries do sometimes convict innocent persons, who may be incarcerated for years or even put to death, the criminal justice system is designed to do everything possible to prevent wrongful convictions. To paraphrase Winston Churchill on democracy, "it's not a good system of government, it's just the best ever devised."

Hypothesis testing arose in the context of scientific research. Scientists are driven to discover the new and the better; say, for example, the claim to have discovered a new "wonder drug" or "alternative health" treatment. This drive runs directly into the skepticism of those who argue that the new is not really new, nor is it necessarily better.


The Basic Idea of Hypothesis Testing


3/26/2012http://vizedhtmlcontent.next.ecollege.com/(NEXT(1b553fd373))/Main/CourseMode/Vized...

In the language of hypothesis testing, the new is called the alternative hypothesis, symbolized as $H_a$, and the old or skeptical view is called the null hypothesis, symbolized as $H_0$.

The hypothesis testing procedure basically works like this:

The null hypothesis assumes an initial population mean or proportion. Based on this assumption, a probability distribution is laid out. Next, we establish -- before gathering any data -- what it will take for us to reject (or not reject) our original assumption. Then we gather sample data; we compare the sample data to the pre-established basis for rejecting, or not rejecting, our original assumption; and we make a judgment about whether or not to reject our original assumption. Finally, we interpret our findings in plain English (which means that we explain our results using words, numbers, and pictures that can be understood by someone who has never studied statistics), in the terms of the problem. This process is critical in business decision making by managers. Hypothesis testing is sometimes considered a "reality check" or "insurance policy" regarding whether or not a line of business reasoning is reasonable.

In the following section of this lecture, we will present a problem situation that details a five-step hypothesis testing procedure, both manually and using Minitab. Don't be worried by the unfamiliar terms in the Five Steps of Hypothesis Testing: we will define and illustrate each of them within the problem.

After our demonstration of the problem situation and the use of the five steps of hypothesis testing, we will lay out the essential variations on the theme, as we have done in previous lectures. And we will conclude by demonstrating the variations with problem situations, again manually and with Minitab.

An Example and the Five Steps of Hypothesis Testing

Consider the following problem situation:

A bank manager has developed a new system meant to reduce the time customers spend waiting to be served by tellers during peak business hours. Typical waiting times during peak business hours under the current system are roughly nine to ten minutes. The bank manager hopes that the new system will lower typical waiting times to less than six minutes. A random sample of 100 customers is observed, and the wait times are available in the data file "WaitTime" at the publisher's web site (which has a link in the webliography), and via the Minitab link under Course Home (where you will find a tutorial on how to access all of the text book data files). The random sample of n = 100 waiting times yields a sample mean of $\bar{x}$ = 5.46 minutes. Assume $\sigma$ = 2.47 (we'll use the sample standard deviation, s, in a later example). For the given problem situation and data, complete the Five Steps of Hypothesis Testing (which are taken from our text book), as follows (keep in mind that steps four and five can be done using a critical value rule OR using a p-value):

1. State the null hypothesis $H_0$ and the alternative hypothesis $H_a$.

2. Specify the level of significance $\alpha$: for this problem situation, use each of $\alpha$ = .10, .05, .01, and .001.

3. Select the test statistic.

Using a critical value rule:

4. Determine the critical value rule for deciding whether to reject $H_0$ (and accept $H_a$).

5. Collect the sample data, compute the value of the test statistic, and decide whether to reject $H_0$ (and accept $H_a$). Interpret the statistical results.

If we are not given a value for α, we can use the Weight of Evidence Table, as follows:

Interpreting the Weight of Evidence Against the Null Hypothesis:

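If the p-value for testing $H_0$ is less than

.10, we say we have some evidence that $H_0$ is false and $H_a$ is true.

.05, we say we have strong evidence that $H_0$ is false and $H_a$ is true.

.01, we say we have very strong evidence that $H_0$ is false and $H_a$ is true.

.001, we say we have extremely strong evidence that $H_0$ is false and $H_a$ is true.

Note: if the p-value is equal to or greater than .10, we say we have little evidence that $H_0$ is false and $H_a$ is true.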
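The Weight of Evidence Table above can be sketched as a small lookup function. This is only an illustration: the function name `weight_of_evidence` and the returned labels are assumptions made for this example, not part of the text book or of Minitab.

```python
def weight_of_evidence(p_value: float) -> str:
    """Return the lecture's verbal label for the evidence against H0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Check the strictest threshold first so the strongest label wins.
    if p_value < 0.001:
        return "extremely strong"
    if p_value < 0.01:
        return "very strong"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "some"
    return "little"


# Example p-values, including the two that appear later in this lecture.
for p in (0.20, 0.07, 0.0143, 0.002, 0.0004):
    print(f"p-value = {p}: {weight_of_evidence(p)} evidence that H0 is false")
```

Because the comparisons are strict, a p-value of exactly .10 falls in the "little evidence" row, matching the note under the table.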

Here is a summary box, which summarizes the critical value and the p-value approaches to hypothesis testing about a population mean, when σ is known.

Important Steps Key Points (A "think aloud" about the Important Steps)

1. State the null hypothesis $H_0$ and the alternative hypothesis $H_a$: $H_0: \mu \ge 6$, $H_a: \mu < 6$.

Often, we start with the alternative hypothesis, by carefully reading the problem, looking for the new or desired state. In this problem, the bank manager hopes that the new system will lower the mean waiting time to less than 6 minutes. Thus, we make our alternative hypothesis $H_a: \mu < 6$. Then we can construct the null hypothesis as the opposite, or complement. The null hypothesis always includes the equals sign. In this problem, the null hypothesis is $H_0: \mu \ge 6$.


2. Specify the level of significance $\alpha$: for this problem situation, use each of $\alpha$ = .10, .05, .01, and .001.

The level of significance $\alpha$ is the probability of a Type I error. A Type I error is wrongly rejecting the null hypothesis and accepting the alternative hypothesis when, in fact, the null hypothesis is true. In our criminal justice example, a Type I error is wrongly convicting an innocent person. In this problem situation, we are going to try out several different values of $\alpha$.

3. Select the test statistic. The test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

The test statistic comes from the data. In this problem, it is a z-score. It captures how far our sample mean ($\bar{x}$) is from the population mean assumed by the null hypothesis ($\mu_0$), divided by the standard error of the mean ($\sigma/\sqrt{n}$). Notice that this is the same way we have previously created z-scores, except that, as with confidence intervals, we divide by the special "standard error" rather than by the standard deviation itself.

A. Using a critical value rule: There are two ways to test hypotheses, which always come to the same conclusion. One way is called the critical value rule, and the other way is called the p-value approach. In much of our work going forward, we will end up using the p-value approach because this is what Minitab gives us. But it is important to also understand the "critical value rule" because it very clearly lays out the logic of hypothesis testing.

4. Determine the critical value rule for deciding whether to reject $H_0$ (and accept $H_a$). Use the specified values of $\alpha$ to find the critical value given in the critical value rule: $z_{.10} = -1.28$, $z_{.05} = -1.65$, $z_{.01} = -2.33$, $z_{.001} = -3.09$. We reject $H_0$ if the test statistic is less than the critical value.

The critical value rule starts with a given $\alpha$ (in this problem we have chosen to work with four $\alpha$'s, but we will take them one at a time). Take $\alpha = .10$ as the first example. Since our alternative hypothesis is $H_a: \mu < 6$, we will be focusing on the left (or lower) tail of the normal probability curve. If $\alpha = .10$, we need to find the z-score, which we label $z_{.10}$, and which has .10 in the left tail. We figured out this kind of problem numerous times in our previous work with the normal probability distributions. As a quick review: if we start with .10 in the left tail, we look up .10 in the body of the table of cumulative areas under the standard normal curve. Then we get the z-score by following the row across to the left column (which gives the z-score through its first decimal) and following the column up to the top (which gives the second decimal). In our example, $z_{.10} = -1.28$. Notice that it is a negative number because we are on the left side of the mean in the normal curve. The same process is followed for the other $\alpha$'s to find the $z_\alpha$-scores.

5. Collect the sample data, compute the value of the test statistic, and decide whether to reject $H_0$ (and accept $H_a$). Interpret the statistical results.

The test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5.46 - 6}{2.47/\sqrt{100}} = -2.19$$

Our conclusion: at $\alpha$ = .10 and .05, we reject $H_0$ and accept $H_a$, because the test statistic from the actual data, z = -2.19, is less than $z_{.10} = -1.28$ and $z_{.05} = -1.65$. At $\alpha$ = .01 and .001, we do not reject $H_0$ and do not accept $H_a$, because the test statistic z = -2.19 is not less than $z_{.01} = -2.33$ or $z_{.001} = -3.09$.

Interpretation: We conclude that there is "strong evidence," at $\alpha$'s of .10 and .05, that the mean waiting time is less than 6 minutes. But we cannot conclude at $\alpha$'s of .01 or .001 that the mean waiting time is less than 6 minutes.

Here we are filling in values in the test statistic formula (see #3 above). Here we can see clearly the fundamental logic of hypothesis testing. We started with an assumption about the population mean, $\mu_0 = 6$ (using $H_0: \mu \ge 6$). Then we established the critical value $z_{.10} = -1.28$, which is the z-score with $\alpha = .10$ as the left tail. Finally, we compare our test statistic z = -2.19, which comes from our actual sample data. Because z = -2.19 is less than -1.28, it is further out in the left tail. This tells us that we can reject $H_0: \mu \ge 6$ and accept $H_a: \mu < 6$, while running at most an $\alpha = .10$ risk of a Type I error. So far, we have been immersed in the mechanics of hypothesis testing. Now, we come back to the terms of the problem situation and translate our findings into plain English.
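The critical value rule for the bank example can be checked with a few lines of Python. This is a minimal sketch that uses the standard library's `statistics.NormalDist` in place of the printed z table; the variable names are assumptions made for illustration, and the computed critical values carry more decimals than the rounded table values quoted above.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the bank waiting-time problem in this lecture.
x_bar, mu_0, sigma, n = 5.46, 6.0, 2.47, 100

# Test statistic: distance of the sample mean from mu_0 in standard errors.
z = (x_bar - mu_0) / (sigma / sqrt(n))

standard_normal = NormalDist()  # mean 0, standard deviation 1
decisions = {}
for alpha in (0.10, 0.05, 0.01, 0.001):
    # Left-tailed test: the critical value has area alpha to its left.
    critical_value = standard_normal.inv_cdf(alpha)
    decisions[alpha] = z < critical_value
    verdict = "reject H0" if decisions[alpha] else "do not reject H0"
    print(f"alpha = {alpha}: critical value = {critical_value:.3f}, "
          f"z = {z:.2f}, {verdict}")
```

The decisions agree with the hand calculation: $H_0$ is rejected at $\alpha$ = .10 and .05, but not at .01 or .001.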

B. Using a p-value: (Remember, the first three steps are the same as in the "critical value rule" approach above.)

Remember, using p-values is the second way of testing hypotheses.

4. Collect the sample data, compute the value of the test statistic, and compute the p-value.

The test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5.46 - 6}{2.47/\sqrt{100}} = -2.19$$

The test statistic from the actual data, z = -2.19, has a p-value = .0143.

As we did when using the critical value rule above, we calculate the test statistic, which uses our actual data to calculate a z-score. Our test statistic z = -2.19. Now we find the corresponding area under the left tail of the normal curve. We did this numerous times in our work with the normal distribution. As a brief review: start with z = -2.19. Go to the table of cumulative areas under the standard normal curve; find the area under the normal curve corresponding to -2.19, namely .0143. This is the p-value = .0143.

5. Reject $H_0$ at level of significance $\alpha$ if the p-value is less than $\alpha$. Interpret the statistical results. Therefore, reject $H_0$, and accept $H_a$, at $\alpha$ = .10 and $\alpha$ = .05, because the p-value = .0143 is less than these $\alpha$'s. But do not reject $H_0$, and do not accept $H_a$, at $\alpha$ = .01 and $\alpha$ = .001, because the p-value = .0143 is not less than these $\alpha$'s. We conclude that there is strong evidence, at $\alpha$'s of .10 and .05, that the mean waiting time is less than 6 minutes. But we cannot conclude at $\alpha$'s of .01 or .001 that the mean waiting time is less than 6 minutes.

We compare the p-value = .0143 to each of our $\alpha$'s. If our p-value is less than an $\alpha$, it means we can reject $H_0$ and accept $H_a$ at that particular value of $\alpha$. Even more concretely, the p-value = .0143 is the probability, if the mean waiting time really were 6 minutes, of getting a sample mean at least as far below 6 as the one we observed. Finally, it is very important to note, and remember, that the p-value approach and the critical value rule approach always yield the same results, and either result needs to be interpreted in the terms of the actual problem situation, in plain English.

Now, here's the link to learn how to use Minitab to do hypothesis testing.

Tutorial: Z Mean - Less Than

And, here is the Minitab output you should get.


Notice that the Minitab output gives the p-value = .0144 (our hand calculation got .0143, which was a little less precise, at the fourth decimal, because we rounded z to -2.19 before using the table) and the z value (the test statistic) = -2.19. If we use p-values, we can compare them immediately to the $\alpha$'s, and reject $H_0$ (and accept $H_a$) for any $\alpha$'s for which the p-value is less. If we use the critical value rule, we have to look up the $z_\alpha$ for each $\alpha$, as we did in our hand calculations.
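The small difference between .0143 and .0144 can be reproduced with the following sketch, which again assumes Python's standard-library `statistics.NormalDist` as a stand-in for the z table and for Minitab. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the bank waiting-time problem in this lecture.
x_bar, mu_0, sigma, n = 5.46, 6.0, 2.47, 100
z = (x_bar - mu_0) / (sigma / sqrt(n))

standard_normal = NormalDist()

# Left-tailed p-value: area under the standard normal curve to the left of z.
p_value = standard_normal.cdf(z)

# A printed z table forces us to round z to two decimals first.
p_value_from_table = standard_normal.cdf(round(z, 2))

print(f"unrounded z = {z:.4f}, p-value = {p_value:.4f}")
print(f"rounded z = {round(z, 2)}, p-value = {p_value_from_table:.4f}")

for alpha in (0.10, 0.05, 0.01, 0.001):
    verdict = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {verdict}")
```

Either p-value leads to the same decisions at all four $\alpha$'s, which is the point made above about the two approaches agreeing.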

1. Our example above was a one-tailed hypothesis test. Furthermore, it was a left or lower-tailed test. One-tailed hypothesis tests also can be right or upper-tailed tests, in which case the z-scores are positive rather than negative. In addition to one-tailed tests, hypothesis tests can be two-tailed tests, in which $H_0$ sets the parameter equal to a particular value and $H_a$ says it is not equal to that same value. The only real complication here is figuring out the splitting of $\alpha$ between the two tails and determining the p-value.

2. In our example above, we assumed that we knew the population standard deviation, $\sigma$, but most often we don't know $\sigma$, and, therefore, as we saw with confidence intervals, we use the sample standard deviation, s, and the t distribution instead of the standard normal distribution.

3. Also, as with confidence intervals, we want to be able to make inferences about the population proportion in addition to the population mean. Again, the concepts are the same, but the formulas change to reflect proportions rather than means.
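The first variation (left-tailed, right-tailed, and two-tailed tests) changes only how the p-value is read off the normal curve. The sketch below illustrates this for the σ-known case; the function name `z_test_mean` and the labels "less", "greater", and "two-sided" are assumptions made for this example, not Minitab or text book terminology.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu_0, sigma, n, alternative):
    """Return (z, p_value) for a z test about a mean when sigma is known.

    alternative: "less"      for Ha: mu < mu_0  (left-tailed),
                 "greater"   for Ha: mu > mu_0  (right-tailed),
                 "two-sided" for Ha: mu != mu_0 (two-tailed).
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    left_area = NormalDist().cdf(z)  # area to the left of z
    if alternative == "less":
        p_value = left_area
    elif alternative == "greater":
        p_value = 1.0 - left_area
    elif alternative == "two-sided":
        # Double the smaller tail area, because both tails count.
        p_value = 2.0 * min(left_area, 1.0 - left_area)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# The bank waiting-time data under each of the three kinds of alternative.
for alt in ("less", "greater", "two-sided"):
    z, p = z_test_mean(5.46, 6.0, 2.47, 100, alt)
    print(f"{alt:>9}: z = {z:.2f}, p-value = {p:.4f}")
```

Only the "less" row corresponds to the bank manager's actual question; the other two rows are shown purely to contrast how the same z-score is converted into a p-value.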

Let's now consider a concrete example of each of the essential variations. Again, we will do them both "by hand" and using Minitab.

Consider the following problem situation:

An automobile parts supplier owns a machine that produces a cylindrical engine part. This part is supposed to have an outside diameter of three inches. Parts with diameters that are too small or too large do not meet customer requirements and must be rejected. Lately, the company has experienced problems meeting customer requirements. The technical staff feels that the mean diameter produced by the machine is off target. In order to verify this, a special study will randomly sample 40 parts produced by the machine. The 40 sampled parts will be measured, and if the results obtained cast a substantial amount of doubt on the hypothesis that the mean diameter equals the target value of three inches, the company will assign a problem-solving team to intensively search for the causes of the problem.

a. The parts supplier wishes to set up a hypothesis test so that the problem-solving team will be assigned when the null hypothesis is rejected and the alternative hypothesis is accepted. Set up the null and alternative hypotheses for this situation.

b. In the context of this situation, interpret making a Type I error; interpret making a Type II error.

c. Suppose it costs the company $3,000 a day to assign the problem-solving team to a project. Is this $3,000 figure the daily cost of a Type I or a Type II error? Explain.

d. A sample of 40 parts yields a sample mean diameter of $\bar{x}$ = 3.006 inches. Assuming $\sigma$ equals .016, use critical values and a p-value to test $H_0$ versus $H_a$ by setting $\alpha$ = .05. Should the problem-solving team be assigned?

e. Suppose that product specifications state that each and every part must have a diameter between 2.95 and 3.05 inches; that is, the specifications are 3 inches ± .05 inches. Use the sample information given in part d to estimate an interval that contains almost all (99.73%) of the diameters. Compare this estimated interval with the specification limits. Are the specification limits being met, or are some diameters outside the specification limits? Explain.

Here is the "summary box" we introduced above, which summarizes the critical value and the p-value approaches to hypothesis testing about a population mean when σ is known. In this example, we will be using a two-sided alternative hypothesis.

The Essential Variations

Problem Situation: z Tests about a Population Mean (σ Known): Two-Sided Alternatives



a. $H_0: \mu = 3$ inches; $H_a: \mu \ne 3$ inches. Careful reading of the problem situation tells us that "parts with diameters that are too small or too large do not meet customer requirements and must be rejected." (Emphasis added.) Notice that the alternative hypothesis refers to an undesirable condition, $H_a: \mu \ne 3$.

b. Type I error = conclude the problem-solving team should be formed, that is, concluding that $H_a: \mu \ne 3$ is true, when the team is not needed. Type II error = conclude the problem-solving team should not be formed when it is needed.

Remember our criminal justice example: the Type I error would be wrongly convicting an innocent person, and the Type II error would be wrongly not convicting a guilty person.

c. A Type I error would result in a $3,000 daily expenditure that was not needed.

If we mistakenly conclude that the population mean $\mu \ne 3$, and we form the problem-solving team mistakenly, then we will be wasting $3,000/day.

d. Use the summary box to find the critical value rule corresponding to the alternative hypothesis, and use the specified value of $\alpha$ to find the critical value given in the rule. For the two-sided alternative with $\alpha$ = .05, the critical value is $z_{\alpha/2} = z_{.05/2} = z_{.025} = 1.96$.

1. Applying the Critical Value Rule: The test statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.006 - 3}{.016/\sqrt{40}} = 2.37$$

Since $|z| = 2.37 > z_{\alpha/2} = 1.96$, we reject $H_0$ and we accept $H_a$ at $\alpha$ = .05. Therefore, "yes," the problem-solving team should be assigned.

These are essentially the same mechanics we employed before. The critical difference here is that, because we now have a two-tailed test, we divide the $\alpha$ into the two tails. As before, we compare the z-score, the test statistic (which includes our sample data, z = 2.37), to the critical value, preset based on $\alpha$, $z_{.025} = 1.96$. Because the test statistic is greater than the critical value (that is, further out in the tail), we reject the null hypothesis and accept the alternative hypothesis.

2. Applying the p-Value: z (the test statistic) = 2.37. The corresponding p-value = 2*.0089 = .0178. Since .0178 < $\alpha$ = .05, we reject $H_0$ and we accept $H_a$ at $\alpha$ = .05. Therefore, "yes," the problem-solving team should be assigned.

As before, the p-value approach starts with the test statistic, and we find the area in the tail corresponding to the test statistic. The critical difference for the two-tailed case is that the p-value includes both tails and, therefore, the area under one tail that we find has to be doubled before comparing it to $\alpha$, which includes both tails.

e. Product specifications = 2.95 to 3.05 inches. 99.73% interval = 3.006 ± 3*.016 = 2.958 to 3.054. Therefore, "no," the specification limits are not being met; some diameters are outside the specification limits, because 3.054 is larger than 3.05.

The product specifications represent the range of values we require. The 99.73% interval is the actual output that we get. By comparing the two ranges, we can see that some of the actual output will be outside the specification range. The needed response by management is to find a way to narrow the range of the actual output. Sometimes, though, management goes back and reviews the product specifications, and may conclude that they were unrealistically tight. If loosened up without affecting the needed quality, the actual production might then be acceptable.

To learn how to use Minitab to find the test statistic and the p-value, use the following link.

Tutorial: Z Mean - Not Equal
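Parts d and e of the engine-part problem can be checked with the sketch below. It assumes Python's standard-library `statistics.NormalDist` in place of the z table, and the variable names are illustrative. Because the script does not round z to 2.37 before finding the tail area, its p-value comes out very slightly below the table-based .0178.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the engine-part problem in this lecture.
x_bar, mu_0, sigma, n, alpha = 3.006, 3.0, 0.016, 40, 0.05
standard_normal = NormalDist()

# Part d: two-sided z test of H0: mu = 3 versus Ha: mu != 3.
z = (x_bar - mu_0) / (sigma / sqrt(n))
critical_value = standard_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
p_value = 2 * (1 - standard_normal.cdf(abs(z)))          # both tails
reject_h0 = abs(z) > critical_value

print(f"z = {z:.2f}, critical value = {critical_value:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject_h0}")

# Part e: interval expected to hold about 99.73% of individual diameters,
# estimated as x_bar plus or minus 3 sigma, compared with the spec limits.
lower, upper = x_bar - 3 * sigma, x_bar + 3 * sigma
spec_low, spec_high = 2.95, 3.05
within_spec = spec_low <= lower and upper <= spec_high

print(f"99.73% interval = {lower:.3f} to {upper:.3f}, "
      f"within specifications: {within_spec}")
```

Both conclusions match the hand calculation: the team should be assigned, and the upper end of the interval exceeds the 3.05 inch specification limit.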


Here is the Minitab output you should get.

Since we generally do not know the population standard deviation $\sigma$, as we saw with Confidence Intervals, we can use the sample standard deviation s, and the t distribution. To illustrate this variation of hypothesis testing, we will return to the Bank Customer Waiting Time Case, which we presented earlier in this lecture. This time, though, we will assume we do not know $\sigma$, so we will use the sample standard deviation s. Here is a brief description of the problem situation:

A bank manager has developed a new system meant to reduce the time customers spend waiting to be served by tellers during peak business hours. Typical waiting times during peak business hours under the current system are roughly nine to ten minutes. The bank manager hopes that the new system will lower typical waiting times to less than six minutes. A random sample of 100 customers is observed, and the wait times are available in the Excel data file "WaitTime," at the publisher's website, which has a link in the Webliography. The random sample of 100 waiting times yields a sample mean of $\bar{x}$ = 5.46 minutes and a sample standard deviation s = 2.475. Let $\mu$ denote the mean of all possible bank customer waiting times using the new system and consider testing $H_0: \mu = 6$ versus $H_a: \mu < 6$.

a) Perform a t test of these hypotheses by setting $\alpha$ = .05, and using a critical value.

b) Find and interpret the p-value for the hypothesis test.

Problem Situation: t Tests about a Population Mean (σ Unknown)


Here is a summary box, which summarizes the critical value and the p-value approaches to hypothesis testing about a population mean, when σ is unknown.

a. Applying the Critical Value Rule: The critical value is $-t_{\alpha} = -t_{.05} = -1.660$. The test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5.46 - 6.00}{2.475/\sqrt{100}} = -2.18$$

And, since -2.18 is less than -1.660, reject $H_0$ and accept $H_a$ at $\alpha$ = .05.

This problem type is just like the one-tail hypothesis test we did above, except for three things.

1) We are using s, the sample standard deviation, instead of $\sigma$, the population standard deviation.

2) We are using the t distribution (with n - 1 = 99 degrees of freedom) to find the critical value.

b. p-Value: t = -2.18 → .0158. Note: Our t table won't provide the p-value; we have to use Minitab, as we show below. Because the p-value is below $\alpha$ = .05, we reject $H_0$ and accept $H_a$ at $\alpha$ = .05.

And, 3) for the p-value we have to rely on Minitab, because the t table doesn't have enough detail. Except for these three differences, however, the critical value and p-value approaches work the same as in the example using z values.

To learn how to use Minitab to find the test statistic and the p-value, use the following link.

Tutorial: t Mean - Less Than
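For readers without Minitab, the t test above can be approximated with the standard library alone. The sketch below is an illustration under stated assumptions: the critical value -1.660 is simply copied from the t table as in the lecture, and the p-value is obtained by numerically integrating the t density with Simpson's rule, a swapped-in technique that is not how Minitab computes it. The helper names `t_pdf` and `t_left_tail` are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + t * t / df))


def t_left_tail(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule and the curve's symmetry."""
    if t > 0:
        return 1.0 - t_left_tail(-t, df, steps)
    # Integrate the density from t up to 0; steps must be even for Simpson.
    h = (0.0 - t) / steps
    total = t_pdf(t, df) + t_pdf(0.0, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    area_from_t_to_zero = total * h / 3
    # Half of the total area lies to the left of 0.
    return 0.5 - area_from_t_to_zero


# Values taken from the bank waiting-time problem with sigma unknown.
x_bar, mu_0, s, n, alpha = 5.46, 6.0, 2.475, 100, 0.05
df = n - 1
t_stat = (x_bar - mu_0) / (s / sqrt(n))

t_critical = -1.660  # from the t table, as quoted in the lecture
p_value = t_left_tail(t_stat, df)

print(f"t = {t_stat:.2f}, critical value = {t_critical}, "
      f"reject H0: {t_stat < t_critical}")
print(f"approximate p-value = {p_value:.4f}, "
      f"reject H0: {p_value < alpha}")
```

The approximate p-value should land close to the .0158 reported by Minitab, and both rules lead to rejecting $H_0$ at $\alpha$ = .05.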


Here is the Minitab output you should get.

We will conclude our work on hypothesis testing with a single variable by considering the case of a large sample test about a population proportion.

As an example, we will revisit the following problem situation, involving a customer satisfaction survey at Bank of America, which we first encountered in our Week 3 Lecture.

Quality Progress, February 2005, reports on the results achieved by Bank of America in improving customer satisfaction and customer loyalty by listening to the "voice of the customer." A key measure of customer satisfaction is the response on a scale from 1 to 10 to the question, "Considering all the business you do with Bank of America, what is your overall satisfaction with Bank of America?" Suppose that a random sample of 350 current customers results in 195 customers with a response of 9 or 10 representing "customer delight."

1. Let p denote the true proportion of all current Bank of America customers who would respond with a 9 or 10, and note that the historical proportion of customer delight for Bank of America has been .48. Calculate the p-value for testing $H_0: p = .48$ versus $H_a: p > .48$. How much evidence is there that p exceeds .48?

2. Bank of America has a base of nearly 30 million customers. Do you think that the sample results have practical importance? Explain your opinion.

Problem Situation: z Tests about a Population Proportion

Here is a "summary box," which summarizes the critical value and the p-value approaches to hypothesis testing about a population proportion.

Important Steps and Key Points (A "think aloud" about the Important Steps)

a. The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{.557 - .480}{\sqrt{\dfrac{.480(1 - .480)}{350}}} = 2.88$$

The most important point about this problem type is that the hypothesis testing framework is the same as it is for the mean, except for the change in the test statistic formula. Just be sure to note that $\hat{p}$ is the sample proportion (here 195/350 = .557) and $p_0$ is the population proportion under the null hypothesis.

b. From the test statistic z = 2.88 we find the p-value = .002. Because the p-value is less than .01, we can use the Weight of Evidence table to conclude that we have "very strong evidence" that $H_0$ is false and that $H_a$ is true, namely that p exceeds .48.

Notice that we are not given a value for $\alpha$ in this problem. We can still use the p-value approach by using the earlier table for "Interpreting the Weight of Evidence against the Null Hypothesis" provided above. We can also use the p-value by itself to say, "if p really were .48, there would be a probability of only .002 of getting a sample proportion at least as large as the one we observed."

With 30 million customers, there is great practical significance in the results, because the more delighted customers there are, the more repeat business, and the more referrals there will be for Bank of America.

As always, at the conclusion of an analysis, we want to return to the language of the problem situation and reflect on the real-world meaning of our results.

To learn how to use Minitab to do a hypothesis test with the proportion, use the following link.
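The proportion test can be sketched the same way as the tests about a mean. Two assumptions are made in this illustration: the sample proportion is rounded to .557 so that the result matches the hand calculation, and the large-sample check (both $n p_0$ and $n(1 - p_0)$ at least 5) is a common rule of thumb that this lecture does not itself state. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the Bank of America customer satisfaction problem.
n, delighted, p_0 = 350, 195, 0.48

# Sample proportion, rounded to three decimals as in the hand calculation.
p_hat = round(delighted / n, 3)

# Rule-of-thumb check (an assumption here) that the sample is large enough
# for the normal approximation to be reasonable.
large_sample_ok = n * p_0 >= 5 and n * (1 - p_0) >= 5

# Test statistic for H0: p = .48 versus Ha: p > .48.
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Right-tailed p-value: area to the right of z under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(f"p_hat = {p_hat}, large sample: {large_sample_ok}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Using the unrounded proportion 195/350 would shift z slightly in the second decimal, but the p-value would still fall between .001 and .01, so the "very strong evidence" conclusion does not change.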


Tutorial: Z Proportion - Greater Than

And, here is the Minitab output you should get.

Summary and Conclusions

This concludes our work on hypothesis testing with a single variable, including means and proportions, one-tail and two-tail tests, and known and unknown population standard deviation. In the next two lectures, we turn to the consideration of two or more variables, a procedure known as Regression. All of our hard work will be tremendously useful as we employ confidence intervals and hypothesis testing to make inferences and predictions about one variable (which we'll call the dependent variable), using one or more additional variables (which we'll call independent variables) as the basis for our predictions.

Finally, we should note that in our use of Minitab to support us in doing hypothesis testing, we have used summary data. However, often we will want to start with raw data, which Minitab can also handle very nicely. Here is a link to show you how to do hypothesis testing with raw data.

And, here is the final Minitab output you should get.

Tutorial

Hypothesis Test with Raw Data



Cases and the associated data sets contained in this lecture were drawn from the publisher's web site for our text book: Bowerman, O'Connell, Murphree, and Orris, Essentials of Business Statistics, 4th Edition, McGraw-Hill, New York.
