Non-parametric Methods: Analysis of Ranked Data Chapter 18 McGraw-Hill/Irwin Copyright © 2012 by...

Non-parametric Methods:

Analysis of Ranked Data

Chapter 18

McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.

Learning Objectives

LO1 Define a nonparametric test and know when it is applied.

LO2 Conduct the sign test for dependent samples using the binomial and standard normal distributions as the test statistics.

LO3 Conduct a test of hypothesis for dependent samples using the Wilcoxon signed-rank test.

LO4 Conduct and interpret the Wilcoxon rank-sum test for independent samples.

LO5 Conduct and interpret the Kruskal-Wallis test for several independent samples.

LO6 Compute and interpret Spearman’s coefficient of rank correlation.

LO7 Conduct a test of hypothesis to determine whether the correlation among the ranks in the population is different from zero.

18-2

Nonparametric Tests

No assumption is required about the shape of the population distribution

Sometimes referred to as distribution free tests

Require that responses be ranked or ordered

Responses must be ordinal, interval or ratio scale

LO1 Define a nonparametric test and know when it is applied.

18-3

The Sign Test

Procedure to conduct the test:

1. Determine the sign (+ or -) of the difference between pairs.

2. Determine the number of usable pairs.

3. Compare the number of positive (or negative) differences to the critical value.

n is the number of usable pairs (without ties), X is the number of pluses or minuses, and the binomial probability π = .5.

LO2 Conduct the sign test for dependent samples using the binomial and standard normal distributions as the test statistics.

18-4

The Sign Test - Example

The director of information systems at Samuelson Chemicals recommended that an in-plant training program be instituted for managers. The objective is to improve the knowledge of database usage in accounting, procurement, production, and so on. A sample of 15 managers was selected at random. A panel of database experts determined the general level of competence of each manager with respect to using the database. Their competence and understanding were rated as being either outstanding, excellent, good, fair, or poor. After the three-month training program, the same panel of information systems experts rated each manager again. The two ratings (before and after) are shown along with the sign of the difference. A “+” sign indicates improvement, and a “-” sign indicates that the manager’s competence using databases had declined after the training program. Did the in-plant training program effectively increase the competence of the managers using the company’s database?

LO2

18-5

The Sign Test - Example

Step 1: State the Null and Alternative Hypotheses

H0: π ≤.5 (There is no increase in competence as a result of the in-plant training program.)

H1: π >.5 (There is an increase in competence as a result of the in-plant training program.)

Step 2: Select a level of significance. We chose the .10 level.

Step 3: Decide on the test statistic. It is the number of plus signs resulting from the experiment.

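Step 4: Formulate a decision rule. With n = 14 usable pairs, π = .5, and the .10 significance level, the rejection region for this one-tailed test starts at 10 plus signs: reject H0 if the number of plus signs is 10 or more.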

LO2

18-6



Step 5: Make a decision regarding the null hypothesis.

Eleven out of the 14 managers in the training course increased their database competency. The number 11 is in the rejection region, which starts at 10, so the null hypothesis is rejected.

We conclude that the three-month training course was effective. It increased the database competency of the managers.
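The rejection region in this example can be checked directly from the binomial distribution. The sketch below is illustrative only: the function names are hypothetical, and the inputs (14 usable pairs, 11 plus signs, the .10 level, π = .5) are taken from the example.

```python
from math import comb

def upper_tail(n, x, p=0.5):
    """P(X >= x) for a binomial(n, p) random variable."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def critical_value(n, alpha, p=0.5):
    """Smallest count x whose upper-tail probability is at most alpha."""
    for x in range(n + 1):
        if upper_tail(n, x, p) <= alpha:
            return x
    return None  # no rejection region exists at this alpha

n_usable = 14   # usable pairs (ties dropped), from the example
pluses = 11     # managers who improved
alpha = 0.10

crit = critical_value(n_usable, alpha)
p_value = upper_tail(n_usable, pluses)

print("rejection region starts at:", crit)
print("P(X >= 11):", round(p_value, 4))
print("reject H0:", pluses >= crit)
```

Because the binomial distribution is discrete, the actual probability in the rejection region is somewhat below the nominal .10 level.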

18-8

Normal Approximation to the Binomial

If the number of observations in the sample is larger than 10, the normal distribution can be used to approximate the binomial.

LO2

18-9

Normal Approximation - Example

The market research department of Cola, Inc., has been given the assignment of testing a new soft drink. Two versions of the drink are considered: a rather sweet drink and a somewhat bitter one. A preference test is to be conducted consisting of a sample of 64 consumers. Each consumer will taste both the sweet cola (labeled A) and the bitter one (labeled B) and indicate a preference.

Conduct a test of hypothesis to determine if there is a difference in the preference for the sweet and bitter tastes. Use the .05 significance level.

LO2

18-10

Step 1: State the null hypothesis and the alternate hypothesis.

H0: π = .50 There is no preference

H1: π ≠ .50 There is a preference

Step 2: Select the level of significance.

α = 0.05 as stated in the problem


Step 3: Select the test statistic.

Use the z distribution, where µ = .50n and σ = .50√n

LO2

18-11

Step 4: Formulate the decision rule.

Referring to Appendix B.1, Areas under the Normal Curve, for a two-tailed test (because H1 states that π ≠ .50) and the .05 significance level, the critical values are -1.96 and +1.96.

Step 5: Compute z, compare the computed value with the critical value, and make a decision regarding H0

Note: There are 42 pluses. Since 42 is more than n/2 =64/2=32, we use formula (18–2) for z:


$$z = \frac{(X - .50) - .50n}{.50\sqrt{n}} = \frac{(42 - .50) - .50(64)}{.50\sqrt{64}} = 2.38$$

The computed z of 2.38 is beyond the critical value of 1.96.

Conclusion: The null hypothesis of no difference is rejected at the .05 significance level. There is evidence of a difference in consumer preference. That is, we conclude consumers prefer one cola over another.
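The z computation can be reproduced in a few lines. This is a sketch under the example's assumptions (µ = .50n, σ = .50√n, and a continuity correction of .50 applied toward the mean); the function name is hypothetical.

```python
from math import sqrt

def sign_test_z(x, n):
    """Normal approximation to the sign test with continuity correction.

    x is the number of plus signs, n the number of usable pairs.
    The correction moves x half a unit toward the mean .50n.
    """
    mu = 0.50 * n
    sigma = 0.50 * sqrt(n)
    if x > mu:
        corrected = x - 0.50
    elif x < mu:
        corrected = x + 0.50
    else:
        corrected = x  # x equals the mean, so z is zero
    return (corrected - mu) / sigma

z = sign_test_z(42, 64)
critical = 1.96  # two-tailed critical value at the .05 level

print("z =", round(z, 3))
print("reject H0:", abs(z) > critical)
```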

LO2

18-12

Wilcoxon Signed-Rank Test for Dependent Samples

Use if the assumption of normality is violated for the paired t test.

Requires the ordinal scale of measurement.

Observations must be related or dependent.

LO3 Conduct a test of hypothesis for dependent samples using the Wilcoxon signed-rank test.

18-13

Wilcoxon Signed-Rank Test for Dependent Samples – Example

The steps for the test are:

1. Compute the differences between related observations, and drop observations with a difference of 0 from the sample.

2. Rank the absolute differences from low to high.

3. Return the signs to the ranks and sum the positive and negative ranks.

4. Compare the smaller of the two rank sums with the critical T value, obtained from Appendix B.7.
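The four steps can be sketched in code as follows. The ratings are invented for illustration (they are not the data from the example), the function names are hypothetical, and tied absolute differences are assumed to share the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from low to high; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (sum of positive ranks, sum of negative ranks, T)."""
    # Step 1: differences, dropping zero differences.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Step 2: rank the absolute differences.
    ranks = average_ranks([abs(d) for d in diffs])
    # Step 3: return the signs to the ranks and sum each group.
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    # Step 4: T is the smaller of the two rank sums.
    return r_plus, r_minus, min(r_plus, r_minus)

# Hypothetical paired ratings, for illustration only.
before = [10, 12, 8, 15, 9, 11]
after = [13, 11, 8, 19, 14, 9]

r_plus, r_minus, T = wilcoxon_signed_rank(before, after)
print(r_plus, r_minus, T)
```

The resulting T would then be compared with the tabled critical value for the number of nonzero differences, which is 5 for this invented data set.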

LO3

18-14

Wilcoxon Signed-Rank Test for Dependent Samples - Example

Fricker’s, a family restaurant chain located primarily in the southeastern part of the United States, offers a full dinner menu. Its specialty is chicken. Recently, the owner and founder developed a new spicy flavor for the batter in which the chicken is cooked. Before replacing the current flavor, he wants to conduct some tests to be sure that patrons will like the spicy flavor better. A random sample of 15 customers is chosen. Each customer is given a small piece of the current chicken and asked to rate its overall taste on a scale of 1 to 20. A value near 20 indicates the participant liked the flavor, whereas a score near 1 indicates they did not like the flavor. Next, the same 15 participants are given a sample of the new chicken with the spicier flavor and again asked to rate its taste on a scale of 1 to 20. The results are reported in the table on the right. Is it reasonable to conclude that the spicy flavor is preferred? Use the .05 significance level.

LO3

18-15


Step 1: State the Hypotheses

H0: There is no difference in the ratings of the two flavors.

H1: The spicy ratings are higher.

Step 2: Select the Level of Significance

It is .05.

Step 3: Select the Test Statistic

It is T.

Step 4: State the Decision Rule

Reject H0 if computed T ≤ critical T

where: computed T is the smaller of the two rank sums

LO3

18-16


Step 4: Formulate the Decision Rule – Finding the critical T

The critical values for the Wilcoxon signed-rank test are located in Appendix B.7. A portion of that table is shown below.

LO3

18-17


Step 5: Compute the Test Statistic and Make a Decision

The smaller of the two rank sums is used as the test statistic and referred to as T.

LO3

18-18


Recall the decision rule: Reject H0 if computed T ≤ critical T.

The computed T of 30 is greater than the critical T of 25.

Decision: Do not reject the null hypothesis.

Conclusion: We cannot conclude there is a difference in the flavor ratings between the current and the spicy flavors. Stay with the current flavor.

LO4 Conduct and interpret the Wilcoxon rank-sum test for independent samples.

18-19

Wilcoxon Rank-Sum Test

Used to determine if two independent samples came from the same or equal populations.

No assumption about the shape of the population is required.

The data must be at least ordinal scale.

Each sample must contain at least eight observations.

LO 4

18-20

Wilcoxon Rank-Sum Test for Independent Samples

The Wilcoxon rank-sum test is based on the sum of ranks.

The data are ranked as if the observations were part of a single sample.

The sum of ranks for each of the two samples is determined.

If the null hypothesis is true, then the ranks will be about evenly distributed between the two samples, and the sum of the ranks for the two samples will be about the same.

LO4

18-21

Wilcoxon Rank-Sum Test for Independent Samples - Example

Dan Thompson, the president of CEO Airlines, recently noted an increase in the number of no-shows for flights out of Atlanta. He is particularly interested in determining whether there are more no-shows for flights that originate from Atlanta compared with flights leaving Chicago. The numbers of no-shows for a sample of nine flights from Atlanta and eight from Chicago are reported in the table.

At the .05 significance level, can we conclude that there are more no-shows for the flights originating in Atlanta?

LO4

18-22


Mr. Thompson believes there are more no-shows for Atlanta flights. Thus, a one-tailed test is appropriate, with the rejection region located in the upper tail.

Hypotheses:

H0: The population distribution of no-shows is the same or less for Atlanta than for Chicago.

H1: The population distribution of no-shows is larger for Atlanta than for Chicago.

Decision Rule:

Reject H0 if computed z > critical z. At the .05 level of significance, the critical z is 1.65.

LO4

18-23


Rank the observations from both samples as if they were a single group.

The Chicago flight with only 8 no-shows had the fewest, so it is assigned a rank of 1. The Chicago flight with 9 no-shows is ranked 2, and so on.

LO4

18-24


The value of W is calculated for the Atlanta group and is found to be 96.5, which is the sum of the ranks for the no-shows for the Atlanta flights.

Because the computed z value (1.49) is less than 1.65, the null hypothesis is not rejected. It appears that the number of no-shows is the same in Atlanta as in Chicago.
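The z value of 1.49 can be reproduced from W = 96.5 with nine Atlanta and eight Chicago flights. The sketch below assumes the usual large-sample form of the rank-sum statistic, in which W has mean n1(n1 + n2 + 1)/2 and variance n1·n2(n1 + n2 + 1)/12 under the null hypothesis; the function name is hypothetical.

```python
from math import sqrt

def rank_sum_z(w, n1, n2):
    """Normal approximation for the Wilcoxon rank-sum statistic.

    w is the sum of ranks for the first sample (size n1);
    n2 is the size of the second sample.
    """
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

z = rank_sum_z(96.5, n1=9, n2=8)  # W for Atlanta, from the example
critical = 1.65                   # one-tailed critical value at the .05 level

print("z =", round(z, 2))
print("reject H0:", z > critical)
```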

LO4

18-25

Kruskal-Wallis Test: Analysis of Variance by Ranks

Used to compare three or more samples to determine if they came from equal populations.

Ordinal scale of measurement is required.

It is an alternative to the one-way ANOVA.

The test statistic follows the chi-square distribution.

Each sample should have at least five observations.

Data are ranked from low to high as if they were a single group.

LO5 Conduct and interpret the Kruskal-Wallis test for several independent samples.

18-26

Kruskal-Wallis Test: Analysis of Variance by Ranks - Example

The Hospital System of the Carolinas operates three hospitals in the Greater Charlotte area: St. Luke’s Memorial on the west side of the city, The Swedish Medical Center to the south, and Piedmont Hospital on the east side of town. The Director of Administration is concerned about the waiting time of patients with non-life-threatening athletic-type injuries that arrive during weekday evenings at the three hospitals. Specifically, is there a difference in the waiting times at the three hospitals?

The Director selected a random sample of patients at the three locations and determined the time, in minutes, between entering the particular facility and when treatment was completed. The times in minutes are reported below.

LO5

18-27


Step 1: Set up the Null and Alternate Hypotheses

H0: The population distributions of waiting times are the same for the three hospitals.

H1: The population distributions of waiting times are NOT all the same for the three hospitals.

Step 2: State the Decision Rule

H0 is rejected if the computed H statistic is greater than critical χ2 value of 5.991 (There are 2 degrees of freedom at the .05 significance level. )

LO5

18-28


Step 3: Collect Data and Compute the H Statistic

Considering the waiting times as a single population, the Piedmont patient with a waiting time of 35 minutes waited the shortest time and hence is given the lowest rank of 1. There are two patients that waited 38 minutes, one at St. Luke’s and one at Piedmont. To resolve this tie, each patient is given a rank of 2.5, found by (2 + 3)/2. This process is continued for all waiting times. The longest waiting time is 107 minutes, and that Swedish Medical Center patient is given a rank of 21. The scores, the ranks, and the sum of the ranks for each of the three hospitals are given in the table below.

LO5

18-29


Because the computed value of H (5.736) is less than the critical value of 5.991, the null hypothesis is not rejected.

There is not enough evidence to conclude there is a difference in waiting times among the three hospitals.
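The ranking procedure and the H statistic can be sketched as follows. The waiting times are invented for illustration and are not the hospital data from the example. The code assumes the standard form H = 12/(n(n + 1)) · Σ(R_k²/n_k) − 3(n + 1) with no correction for ties, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from low to high; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # e.g. positions 2 and 3 share rank 2.5
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, list of rank sums), ranking all groups as a single sample."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums = []
    start = 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    total = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, rank_sums

# Hypothetical waiting times in minutes, for illustration only.
groups = [[35, 38, 50], [38, 60, 70], [45, 80, 90]]
H, rank_sums = kruskal_wallis_h(groups)

print("rank sums:", rank_sums)
print("H =", round(H, 3))
print("reject H0 at .05 (critical value 5.991, 2 df):", H > 5.991)
```

These invented samples have only three observations each, fewer than the five recommended for the chi-square approximation, so the final comparison only illustrates the mechanics.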

LO5

18-30



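Rank-Order Correlation

Spearman’s coefficient of rank correlation reports the association between two sets of ranked observations. The features are:

It can range from –1.00 up to 1.00.

It is similar to Pearson’s coefficient of correlation, but is based on ranked data.

It is computed using the formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d is the difference between the ranks for each pair and n is the number of paired observations.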
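A minimal sketch of the computation, using the standard rank-difference form of Spearman’s coefficient and assuming there are no tied values. The scores are invented for illustration, and the function names are hypothetical.

```python
def simple_ranks(values):
    """Rank values from low to high (1 = smallest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(x)
    rx, ry = simple_ranks(x), simple_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical paired scores, for illustration only.
x_scores = [10, 20, 30, 40, 50]
y_scores = [12, 25, 20, 48, 40]

rs = spearman_rs(x_scores, y_scores)
print("r_s =", round(rs, 3))
```

When the two rankings agree exactly, every d is 0 and the coefficient is 1.00; when one ranking is the exact reverse of the other, it is –1.00.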

LO6 Compute and interpret Spearman’s coefficient of rank correlation.

18-32

Rank-Order Correlation - Example

Lorrenger Plastics, Inc., recruits management trainees at colleges and universities throughout the United States. Each trainee is given a score, an expression of future potential ranging from 0 to 200, by the recruiter during the on-campus interview. A higher score indicates more potential. An applicant hired by Lorrenger then enters an in-plant training program. At the completion of this program the recruit is given another composite score (0 to 100), which is based on tests, opinions of group leaders, and in-plant training officers. A higher score indicates more potential. Is there an association between the on-campus and in-plant scores? The on-campus scores and the in-plant training scores are on the table below:

LO6

18-33

Rank-Order Correlation - Example

LO6

18-34

Rank-Order Correlation - Example

Conclusion:

The value of .785 indicates a strong positive association between the ratings of the on-campus recruiter and the scores of the in-plant training staff.

The graduates that received high ratings from the on-campus recruiter also tended to be the ones that received high ratings from the training staff.

LO6

18-35

Testing the Significance of rs

State the Null and Alternate Hypotheses:

H0: The rank correlation in the population is 0.

H1: There is a positive association among the ranks.

Determine the Significance Level and Test Statistic

For a sample of 10 or more, the significance of rs is determined by computing t using the following formula. The sampling distribution of rs follows the t distribution with n - 2 degrees of freedom.

$$t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}$$

LO7 Conduct a test of hypothesis to determine whether the correlation among the ranks in the population is different from zero.

18-36

Testing the Significance of rs - Example

H0: The rank correlation in the population is 0.

H1: There is a positive association among the ranks.

Reject H0 if computed t > critical t, where critical t = t(.05, n - 2) = t(.05, 12 - 2) = 1.812.
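The decision can be checked numerically. The sketch below assumes the usual test statistic t = rs·√((n − 2)/(1 − rs²)) and uses rs = .785 and n = 12 from the example; the computed t is derived here and is not quoted from the slides. The function name is hypothetical.

```python
from math import sqrt

def spearman_t(rs, n):
    """t statistic for testing whether the population rank correlation is 0."""
    return rs * sqrt((n - 2) / (1 - rs ** 2))

rs = 0.785          # rank correlation from the example
n = 12              # sample size implied by the 12 - 2 degrees of freedom
critical_t = 1.812  # one-tailed, .05 level, 10 degrees of freedom

t = spearman_t(rs, n)
print("t =", round(t, 3))
print("reject H0:", t > critical_t)
```

Under these assumptions the computed t falls well beyond 1.812, which would lead to rejecting H0 in favor of a positive association among the ranks.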

LO7

18-37
