Nonparametric Statistics

Lecture 9

Small Sample, Non-normal Population

If the sample were large, the Central Limit Theorem would be applicable for testing hypotheses about the mean.

If the population were normal, the sampling distribution of the mean would be exactly normal to start with.

If the sample is small and the population non-normal, what do we do?

Nonparametric statistics is a sub-field of statistics that draws inferences about populations that cannot be assumed to follow any particular distribution.

One-Sample Example

Suppose that a nurse has been instructed to perform a procedure in a new way. Researchers recorded the change in the number of minutes it took the nurse to perform the procedure.

The data are:

0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2

We would be hard pressed to say that these data even approximately follow a normal distribution.

Assumption of normality for small sample example

There are only 11 observations and we might be uncomfortable claiming that this distribution looks normal. Instead, it looks more uniform.

The Sign Test – 5 Steps

Assumptions: Random, independent sample

Hypotheses:

Null hypothesis: Median equals zero

Alternative hypothesis: Median does not equal zero

Test statistic: p = 7/11, the sample proportion of observations greater than zero, which we compare with one-half.

The Sign Test – 5 Steps, cont.

P-value: Need exact calculation since CLT doesn’t apply with small samples. 95% CI for p with small samples: (0.308, 0.891)

Conclusion: Since 0.5 is included in the 95% confidence interval, we can’t say that the median is significantly different from zero at the 0.05 level. (We fail to reject the null hypothesis.)
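The sign test numbers above can be checked with a short sketch. It assumes that the quoted interval is the exact (Clopper-Pearson) binomial interval and that the two-sided p-value is twice the smaller binomial tail; the helper names (`upper_tail`, `lower_tail`, `bisect`) are illustrative only.

```python
from math import comb

# Change in procedure time (minutes), copied from the one-sample example.
data = [0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2]

n = len(data)                        # 11 observations, none equal to zero
k = sum(1 for x in data if x > 0)    # number of observations above zero


def binom_pmf(j, n, p):
    """Binomial probability of exactly j successes in n trials."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))


# Exact two-sided p-value under H0: p = 0.5 (twice the smaller tail, capped at 1).
p_value = min(1.0, 2 * min(upper_tail(k, n, 0.5), lower_tail(k, n, 0.5)))


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05
# Assumed Clopper-Pearson limits: each tail probability equals alpha / 2.
ci_low = bisect(lambda p: upper_tail(k, n, p) - alpha / 2, 1e-9, k / n)
ci_high = bisect(lambda p: lower_tail(k, n, p) - alpha / 2, k / n, 1 - 1e-9)

print(f"positives: {k}/{n}")
print(f"exact two-sided p-value: {p_value:.4f}")
print(f"95% CI for p: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the p-value is far above 0.05 and the interval covers 0.5, both routes lead to the same conclusion.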

The Signed Rank Test – 5 steps

Assumptions:

• The measurement is continuous

• Independent, random sample from the population

• Distribution is symmetric

Hypotheses: H0: Median of the distribution is 0

HA: Median of distribution is non-zero

Test Statistic: Minimum of the rank sums

P-value: from the computer!

For this example, p=0.0439

Conclusion: As usual: since p = 0.0439 < 0.05, we reject the null hypothesis at the 0.05 level.

Calculation of Signed Rank Test Statistic

Order observations from smallest to largest in absolute value

|Y|(1) ≤ |Y|(2) ≤ … ≤ |Y|(n)

So from the example,

|-0.1| < |-0.2| < |-0.5| < |-0.6| = |0.6| < 1.0 < 1.1 < 2.0 < 2.1 < 2.4 < 3.5

Assign ranks to these absolute values:

1, 2, … , n

In the example, 1, 2, … , 11, with the tied values |-0.6| and |0.6| each receiving the average rank 4.5.

Signed Rank Test Statistic, cont…

Arrange the ranks into two groups: those whose observed values are smaller than zero and those whose observed values are larger than zero. Sum the ranks for the negative and positive observations separately.

Here, for negative values, sum of ranks = 1 + 2 + 3 + 4.5 = 10.5

For positive values, sum of ranks = 4.5 + 6 + 7 + 8 + 9 + 10 + 11 = 55.5

Test Statistic = smallest rank sum = 10.5
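The ranking and summing steps can be sketched as follows. The helper `average_ranks` is a hypothetical name introduced here for illustration; it gives tied values their average rank, as the slides describe.

```python
# Change in procedure time (minutes), copied from the one-sample example.
data = [0.6, -0.5, 1.1, 2.4, 3.5, 2.0, -0.1, 1.0, 2.1, -0.6, -0.2]


def average_ranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2    # average of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


# Rank the absolute values, then split the ranks by the sign of the observation.
ranks = average_ranks([abs(x) for x in data])
neg_sum = sum(r for x, r in zip(data, ranks) if x < 0)
pos_sum = sum(r for x, r in zip(data, ranks) if x > 0)
statistic = min(neg_sum, pos_sum)

print(f"negative rank sum: {neg_sum}")
print(f"positive rank sum: {pos_sum}")
print(f"test statistic:    {statistic}")
```

The two sums always add up to n(n + 1)/2 (here 66), which is a quick check on the arithmetic.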

P-values for signed rank test

For critical values and p-values, look at tables or computer-generated p-values.

This procedure is unavailable in the Student version of SPSS. It is available in SAS and the regular version of SPSS.
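For a sample this small, the p-value can also be obtained by brute force. The sketch below assumes the reported p = 0.0439 is the exact two-sided p-value: under the null hypothesis each rank is equally likely to belong to a positive or a negative observation, so all 2^11 sign assignments are enumerated.

```python
from itertools import product

# Ranks of the absolute values from the worked example (4.5 is the tied average rank).
ranks = [1, 2, 3, 4.5, 4.5, 6, 7, 8, 9, 10, 11]
observed = 10.5            # smaller of the two observed rank sums
total = sum(ranks)         # 66 = 11 * 12 / 2

extreme = 0                # assignments at least as extreme as the observed one
count = 0                  # all assignments (2 ** 11)
for signs in product((0, 1), repeat=len(ranks)):
    pos_sum = sum(r for r, s in zip(ranks, signs) if s)
    # The test statistic is the smaller of the positive and negative rank sums.
    if min(pos_sum, total - pos_sum) <= observed:
        extreme += 1
    count += 1

p_value = extreme / count
print(f"{extreme} of {count} sign assignments are at least as extreme")
print(f"exact two-sided p-value: {p_value:.4f}")
```

Enumeration is only practical for small n; tables or software use recursions or a normal approximation instead.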

Comments on Signed Rank Test

• More “powerful” than the Sign Test, but requires more assumptions

• One-sided tests are possible

• Robust to outliers

• Some books/programs use the sum of the ranks of the positive values as the test statistic – p-values are always the same

• Nonparametric confidence intervals are also available from some software programs.

• For tied observations, use average rank for each tied observation.

Nonparametric statistics for small, non-normal samples

Paired Data

The same as for univariate data, except perform the test using the differences rather than the raw data.

Two Independent Groups

Mann-Whitney Rank Sum Test (Ch. 24)

• Procedure is similar to the Signed Rank test, except that instead of dividing observations according to whether they are positive or negative, we divide observations according to group membership.

• Assumptions include (1) independent, random samples, (2) independently selected groups, and (3) the shape and spread of the two distributions are the same

Paired Differences Example

Wife         0.4   0.5   1.0   0.2   0.9   1.0   1.2   0.1   0.6   0.4   0.2
Husband      0.5   0.4   0.7   0.0   0.6   1.2   0.7   0.1   0.5   0.1   0.1
Difference  -0.1   0.1   0.3   0.2   0.3  -0.2   0.5   0.0   0.1   0.3   0.1

(Difference = Wife - Husband)

Study Hypothesis: Men and women spend different amounts of time reading/watching the news.

The Signed Rank Test – 5 steps

Assumptions:

• The measurement (difference) is continuous

• Independent, random sample from the population

• Distribution of difference is symmetric

Hypotheses: H0: Median of the difference is 0

HA: Median of difference is non-zero

Test Statistic: Minimum of the rank sums

P-value: from the computer!

For this example, the SPSS output reports an asymptotic two-sided p-value of 0.045.


Computer Outputs - Paired

Data for wives and husbands are in two separate columns, with matched observations in the same row.

Analyze > Nonparametric Tests > 2 Related Samples…

Wilcoxon Signed Ranks Test

Ranks

HUSBAND - WIFE     N       Mean Rank   Sum of Ranks
Negative Ranks     8(a)    5.88        47.00
Positive Ranks     2(b)    4.00        8.00
Ties               1(c)
Total              11

a. HUSBAND < WIFE
b. HUSBAND > WIFE
c. WIFE = HUSBAND

Test Statistics(b)

                          HUSBAND - WIFE
Z                         -2.007(a)
Asymp. Sig. (2-tailed)    .045

a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test
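The rank sums and Z value in this output can be reproduced by hand. The sketch below assumes that the tied pair (difference 0.0) is dropped before ranking and that Z comes from the normal approximation with the usual tie correction to the variance and no continuity correction; those assumptions reproduce the printed figures, but they are inferred rather than taken from the slides.

```python
from math import erfc, sqrt

# Differences (Wife - Husband) as printed in the paired example.
diffs = [-0.1, 0.1, 0.3, 0.2, 0.3, -0.2, 0.5, 0.0, 0.1, 0.3, 0.1]

# Zero differences (wife = husband) are dropped before ranking.
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)                         # 10 untied pairs

# Average rank for each distinct absolute difference.
positions = {}
for pos, v in enumerate(sorted(abs(d) for d in nonzero), start=1):
    positions.setdefault(v, []).append(pos)
avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

wife_greater = sum(avg_rank[abs(d)] for d in nonzero if d > 0)     # HUSBAND < WIFE
husband_greater = sum(avg_rank[abs(d)] for d in nonzero if d < 0)  # HUSBAND > WIFE

# Normal approximation: mean and tie-corrected variance of a rank sum under H0.
mean = n * (n + 1) / 4
tie_term = sum(len(p) ** 3 - len(p) for p in positions.values()) / 48
variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term

z = (min(wife_greater, husband_greater) - mean) / sqrt(variance)
p_value = erfc(abs(z) / sqrt(2))         # two-sided normal tail probability

print(f"rank sums: {wife_greater} and {husband_greater}")
print(f"Z = {z:.3f}, asymptotic two-sided p = {p_value:.3f}")
```

With only 10 untied pairs, the asymptotic p-value is itself an approximation.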

Computer Outputs - Paired

Data for wives and husbands are in two separate columns, with matched observations in the same row.

Analyze > Nonparametric Tests > 2 Related Samples…

Sign Test

Frequencies

HUSBAND - WIFE             N
Negative Differences(a)    8
Positive Differences(b)    2
Ties(c)                    1
Total                      11

a. HUSBAND < WIFE
b. HUSBAND > WIFE
c. WIFE = HUSBAND

Test Statistics(b)

                         HUSBAND - WIFE
Exact Sig. (2-tailed)    .109(a)

a. Binomial distribution used.
b. Sign Test
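The exact significance here is a plain binomial calculation. A minimal sketch, assuming the tied pair is excluded and the two-sided p-value is twice the smaller tail of a Binomial(10, 0.5) distribution:

```python
from math import comb

# Differences (Wife - Husband) as printed in the paired example.
diffs = [-0.1, 0.1, 0.3, 0.2, 0.3, -0.2, 0.5, 0.0, 0.1, 0.3, 0.1]

wife_greater = sum(1 for d in diffs if d > 0)      # HUSBAND < WIFE
husband_greater = sum(1 for d in diffs if d < 0)   # HUSBAND > WIFE
n = wife_greater + husband_greater                 # the tied pair is excluded

# Under H0 each untied pair is equally likely to go either way.
smaller = min(wife_greater, husband_greater)
tail = sum(comb(n, j) for j in range(smaller + 1)) / 2**n
p_value = min(1.0, 2 * tail)

print(f"{wife_greater} vs {husband_greater} out of {n} untied pairs")
print(f"exact two-sided p-value: {p_value:.3f}")
```

The sign test uses only the direction of each difference, which is why its p-value is larger than the signed rank p-value for the same data.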

Two Independent Groups Example

Wife 0.4 0.5 1.0 0.2 0.9 1.0 1.2 0.1 0.6 0.4 0.2

Husband 0.5 0.4 0.7 0.0 0.6 1.2 0.7 0.1 0.5 0.1 0.1

Study Hypothesis: Men and women spend different amounts of time reading/watching the news.

The Mann-Whitney Test – 5 steps

Assumptions:

• Independent, random samples

• Independently selected groups

• The shape and spread of the two distributions are the same

Hypotheses: H0: Group medians are the same

HA: Group medians are different

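Test Statistic: rank sums

P-value: from the table or computer!

For this example, the SPSS output reports two-sided p-values of 0.390 (asymptotic) and 0.401 (exact, not corrected for ties).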


Computer Outputs - Independent

Data for wives & husbands are in the same column; a second column indicates whether each observation is for the wife or husband*.

Analyze > Nonparametric Tests > 2 Independent Samples…

Mann-Whitney Test

Ranks

TIME    GROUP      N     Mean Rank   Sum of Ranks
        Husband    11    10.32       113.50
        Wife       11    12.68       139.50
        Total      22

Test Statistics(b)

                                  TIME
Mann-Whitney U                    47.500
Wilcoxon W                        113.500
Z                                 -.859
Asymp. Sig. (2-tailed)            .390
Exact Sig. [2*(1-tailed Sig.)]    .401(a)

a. Not corrected for ties.
b. Grouping Variable: GROUP

*: Type of this variable must be Numeric in SPSS.
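The rank sums, U and Z can be reproduced from the raw data. This sketch assumes the normal approximation with a tie-corrected variance and no continuity correction, which matches the printed Z; the variable names are illustrative.

```python
from math import erfc, sqrt

wife = [0.4, 0.5, 1.0, 0.2, 0.9, 1.0, 1.2, 0.1, 0.6, 0.4, 0.2]
husband = [0.5, 0.4, 0.7, 0.0, 0.6, 1.2, 0.7, 0.1, 0.5, 0.1, 0.1]

# Rank the pooled sample; tied values share their average rank.
positions = {}
for pos, v in enumerate(sorted(wife + husband), start=1):
    positions.setdefault(v, []).append(pos)
avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

w_wife = sum(avg_rank[v] for v in wife)          # rank sum for wives
w_husband = sum(avg_rank[v] for v in husband)    # rank sum for husbands

n1, n2 = len(wife), len(husband)
big_n = n1 + n2

# U for each group is its rank sum minus the smallest possible rank sum.
u_wife = w_wife - n1 * (n1 + 1) / 2
u_husband = w_husband - n2 * (n2 + 1) / 2
u = min(u_wife, u_husband)

# Normal approximation with tie-corrected variance.
tie_sum = sum(len(p) ** 3 - len(p) for p in positions.values())
variance = n1 * n2 / 12 * ((big_n + 1) - tie_sum / (big_n * (big_n - 1)))
z = (u - n1 * n2 / 2) / sqrt(variance)
p_value = erfc(abs(z) / sqrt(2))                 # two-sided normal tail probability

print(f"rank sums: wife {w_wife}, husband {w_husband}")
print(f"U = {u}, Z = {z:.3f}, asymptotic two-sided p = {p_value:.3f}")
```

Treating the same data as paired gave p = 0.045, while treating the groups as independent gives p = 0.390: ignoring the pairing discards most of the information here.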

Comments on Nonparametric Test for 2 Independent Samples

• Robust to outliers

• One-sided tests are possible

• Nonparametric confidence intervals are also available from some software programs

• For tied observations, use average rank for each tied observation.

Possible Names

• Mann-Whitney Rank Sum Test

• Mann-Whitney Test

• Mann-Whitney U Test

• Wilcoxon Rank Sum Test

Testing for a Relationship between Categorical Variables

Large Sample Size

• Chi-square test

Small Sample Size

• Chi-square test with Yates’ continuity correction

• Fisher’s exact test

Urgent Colonoscopy for the Diagnosis and Treatment of Severe Diverticular Hemorrhage. New England Journal of Medicine 2000;342:78-82

Research Hypothesis

Severe Bleeding   Medical and Surgical   Medical and Colonoscopic   Total
                  Treatment              Treatment
No                11                     10                         21
Yes               6                      0                          6
Total             17                     10                         27

Fisher’s Exact Test – 5 steps

Assumptions:

• Independent, random sample from the population

• Two variables are categorical

Hypotheses: H0: Response and Predictor are Independent

HA: Response and Predictor are Associated

Test Statistic: (p-value)

P-value: from the computer!

For this example, p=0.057
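Fisher’s exact p-value comes from the hypergeometric distribution with the table margins held fixed. The sketch below assumes the common two-sided rule of summing the probabilities of all tables that are no more likely than the observed one; the function name `weight` is illustrative.

```python
from math import comb

# 2x2 table from the example: rows are severe bleeding (No, Yes);
# columns are (medical and surgical, medical and colonoscopic) treatment.
a, b = 11, 10    # No
c, d = 6, 0      # Yes

row_yes = c + d                  # 6 patients with severe bleeding
col1, col2 = a + c, b + d        # 17 and 10 patients per treatment
n = col1 + col2                  # 27 patients in total


def weight(k):
    """Number of ways to place k of the 'Yes' patients in the first column."""
    return comb(col1, k) * comb(col2, row_yes - k)


k_min = max(0, row_yes - col2)
k_max = min(row_yes, col1)
total = comb(n, row_yes)         # all tables with the observed margins
observed = weight(c)

# One-sided: tables with at least as many 'Yes' patients in the first column.
one_sided = sum(weight(k) for k in range(c, k_max + 1)) / total

# Two-sided: every table that is no more probable than the observed table.
two_sided = sum(
    weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
) / total

print(f"one-sided p = {one_sided:.3f}")
print(f"two-sided p = {two_sided:.3f}")
```

Comparing integer counts rather than floating-point probabilities avoids rounding problems when deciding which tables are "as extreme".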


Data Entry

Weight the variable: count.

Data > Weight Cases…

Computer Outputs - FET

Perform FET (or Chi-square test if sample size is large)

Analyze > Descriptive Statistics > Crosstabs…

Assign “bleeding” for “Row(s)” and “treat” for “Column(s)”

Click “Statistics” to check “Chi-square”

Crosstabs

BLEEDING * TREAT Crosstabulation

Count

                     TREAT
                     Medical and Surgical   Medical and Colonoscopic   Total
                     Treatment              Treatment
BLEEDING   No        11                     10                         21
           Yes       6                      0                          6
Total                17                     10                         27

Chi-Square Tests

                                Value      df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              4.538(b)   1    .033
Continuity Correction(a)        2.726      1    .099
Likelihood Ratio                6.530      1    .011
Fisher's Exact Test                                           .057         .042
Linear-by-Linear Association    4.370      1    .037
N of Valid Cases                27

a. Computed only for a 2x2 table
b. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.22.
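The Pearson and continuity-corrected statistics in this table follow from the expected counts. A minimal sketch, assuming the standard formulas (Yates' correction subtracts 0.5 from each absolute deviation) and using the fact that a chi-square tail probability with 1 degree of freedom can be written with the complementary error function:

```python
from math import erfc, sqrt

# Observed counts: rows are bleeding (No, Yes), columns are the two treatments.
table = [[11, 10], [6, 0]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

cells = [(table[i][j], expected[i][j]) for i in range(2) for j in range(2)]
pearson = sum((o - e) ** 2 / e for o, e in cells)
yates = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)


def p_value_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


print(f"minimum expected count: {min_expected:.2f}")
print(f"Pearson chi-square:     {pearson:.3f}, p = {p_value_df1(pearson):.3f}")
print(f"with Yates correction:  {yates:.3f}, p = {p_value_df1(yates):.3f}")
```

With two expected counts below 5, the three p-values (.033, .099 and the exact .057) disagree about significance at the 0.05 level, which is why the exact test is preferred for this table.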

The Inexact Use of Fisher’s Exact Test in Six Major Medical Journals

JAMA 1989;261:3430-3433

Table 1. Specification of Use of Fisher’s Exact Test by Journal

Journal                                            No. of Articles That Specified /
                                                   No. of Articles Reviewed
------------------------------------------------------------------------------------
New England Journal of Medicine                    8 / 9
Annals of Internal Medicine                        2 / 4
British Medical Journal                            3 / 6
The Journal of the American Medical Association    6 / 16
Lancet                                             4 / 14
American Journal of Medicine                       0 / 7

Homework

To be posted, not graded

Solutions will be posted on Monday

Read Chapters 24, 25, 27
