
Who was Right about ANOVA for Latin Squares: Neyman or Fisher?

Arman Sabbaghi¹ & Donald B. Rubin¹

¹Harvard University Department of Statistics

Background

In a presentation to the Royal Statistical Society in 1935, Jerzy Neyman declared randomized complete blocks (RCB) a more valid design than Latin squares (LS), in terms of testing the null hypothesis of zero average treatment effects (Neyman's null) (5). His conclusion ignited R.A. Fisher's legendary temper, and their relationship became acrimonious, with no reconciliation ever being reached (1), (4), (7).

Recall the standard ANOVA F-test statistic: F = MSTreatment / MSResidual.
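As a concrete illustration, here is a minimal sketch of how this statistic is computed for a T × T Latin square. The code and the array convention (y[i, j] for the observed outcome in row i, column j, and square[i, j] for the treatment assigned to that cell) are ours, not the authors'.

```python
# Sketch: ANOVA F statistic for a T x T Latin square (illustrative code,
# not from the paper). y[i, j] is the observed outcome in row i, column j;
# square[i, j] in {0, ..., T-1} is the treatment assigned to that cell.
import numpy as np

def latin_square_F(y: np.ndarray, square: np.ndarray) -> float:
    T = y.shape[0]
    grand = y.mean()
    ss_row = T * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_col = T * ((y.mean(axis=0) - grand) ** 2).sum()
    treat_means = np.array([y[square == t].mean() for t in range(T)])
    ss_tr = T * ((treat_means - grand) ** 2).sum()
    ss_res = ((y - grand) ** 2).sum() - ss_row - ss_col - ss_tr
    ms_tr = ss_tr / (T - 1)
    ms_res = ss_res / ((T - 1) * (T - 2))
    if ms_res <= 1e-12:                # degenerate layout: no residual variation
        return np.inf if ms_tr > 1e-12 else 0.0
    return ms_tr / ms_res
```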

Neyman claimed in (5) that, under the null hypothesis of zero average treatment effects:

• For RCB: E(MSTr) = E(MSRe).

• For LS: E(MSTr) ≥ E(MSRe).

Based on this comparison of expectations, Neyman concluded that:

• ANOVA F-test for RCB has correct Type I error.

• Test for LS has higher Type I error than nominal.

This type of reasoning persists in experimental design, e.g. (3), (10).

Fisher argued that LS is valid when testing the null of absolutely no treatment effects (Fisher's sharp null) using a randomization test (5).
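A randomization test re-randomizes the treatment layout while holding the outcomes fixed, which is exactly what the sharp null licenses. Below is a minimal sketch under our own conventions; it reuses the latin_square_F helper from the sketch above, and random_latin_square (permuting a cyclic square) is our simplification, producing a valid but not uniformly distributed Latin square.

```python
# Sketch of a randomization test of Fisher's sharp null in a Latin square
# (illustrative code; reuses latin_square_F from the sketch above).
import numpy as np

def random_latin_square(T: int, rng: np.random.Generator) -> np.ndarray:
    # Permute rows, columns, and symbols of a cyclic square; always a valid
    # Latin square, though not uniform over all Latin squares.
    base = (np.arange(T)[:, None] + np.arange(T)[None, :]) % T
    r, c, s = rng.permutation(T), rng.permutation(T), rng.permutation(T)
    return s[base[r][:, c]]

def randomization_p_value(y, square, draws=10_000, seed=0):
    # Under the sharp null, y is unchanged by treatment, so recomputing F over
    # re-drawn assignments approximates its randomization distribution.
    rng = np.random.default_rng(seed)
    f_obs = latin_square_F(y, square)
    f_null = np.array([latin_square_F(y, random_latin_square(y.shape[0], rng))
                       for _ in range(draws)])
    return float(np.mean(f_null >= f_obs))
```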

In our paper, we prove that Neyman's conclusions in (5) are incorrect. To do so, we make no assumptions on the potential outcomes, and

• calculate E(MSTr) and E(MSRe), and

• evaluate the relationship between expectations of sums of squares and Type I error of the ANOVA F-test, for both RCB and LS designs.

Notation

Randomized Complete Block (RCB)

Setup: N blocks, T units in each block, T treatments.
Potential outcome of unit j in block i, under treatment t: Y_ij(t)
Additivity assumption: Y_ij(t) = B_i + τ(t)

Latin Square (LS)

Setup: T rows, T columns, T treatments.
Potential outcome of unit in row i, column j, under treatment t: Y_ij(t)
Additivity assumption: Y_ij(t) = R_i + C_j + τ(t)
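For concreteness, the additive LS model can be encoded as a three-way array with Y[i, j, t] = Y_ij(t); the sketch below (our convention, with arbitrary illustrative values for R, C, and τ) builds such an array by broadcasting.

```python
# Sketch: additive potential outcomes Y[i, j, t] = R_i + C_j + tau(t) for a
# T x T Latin square (illustrative values; the array convention is ours).
import numpy as np

T = 4
rng = np.random.default_rng(1)
R, C, tau = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)
Y_additive = R[:, None, None] + C[None, :, None] + tau[None, None, :]  # shape (T, T, T)
```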

Fisher’s Sharp Null Hypothesis

H_0^#: Y_ij(1) = ... = Y_ij(T) for every unit (i, j)

Neyman’s Null Hypothesis

H_0: Ȳ··(1) = ... = Ȳ··(T), where Ȳ··(t) denotes the average of Y_ij(t) over all units

The combination of Neyman's null and additivity yields Fisher's sharp null (a one-line check is given below). Neyman made no assumptions on the potential outcomes.
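The check, sketched for the RCB additivity assumption (the LS case is identical with R_i + C_j in place of B_i):

```latex
% Averaging the additivity assumption over all units:
Y_{ij}(t) = B_i + \tau(t)
  \;\Longrightarrow\; \bar{Y}_{\cdot\cdot}(t) = \bar{B} + \tau(t),
\qquad\text{so}\qquad
\bar{Y}_{\cdot\cdot}(1) = \dots = \bar{Y}_{\cdot\cdot}(T)
  \;\Longrightarrow\; \tau(1) = \dots = \tau(T)
  \;\Longrightarrow\; Y_{ij}(1) = \dots = Y_{ij}(T)\ \text{for every unit } (i,j),
% which is Fisher's sharp null.
```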

Results

Calculation of Expectations

Neyman incorrectly calculated E(MSRe), for both RCB and LS, by omitting interactions between the blocking factor(s) and treatment factor.

RCB interaction = \frac{1}{(N-1)(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} \left\{ \bar{Y}_{i\cdot}(t) - \bar{Y}_{i\cdot}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2

LS interaction = \frac{1}{(T-1)^2} \sum_{i=1}^{T} \sum_{t=1}^{T} \left\{ \bar{Y}_{i\cdot}(t) - \bar{Y}_{i\cdot}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2 + \frac{1}{(T-1)^2} \sum_{j=1}^{T} \sum_{t=1}^{T} \left\{ \bar{Y}_{\cdot j}(t) - \bar{Y}_{\cdot j}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2
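The sketch below (our code, assuming the Y[i, j, t] potential-outcome arrays introduced earlier: shape (N, T, T) for an RCB with blocks i and units j, and (T, T, T) for an LS) computes these two interaction terms directly from their definitions.

```python
# Sketch: the interaction terms above, computed from full potential-outcome
# arrays (illustrative code, not from the paper).
import numpy as np

def rcb_interaction(Y_rcb: np.ndarray) -> float:
    N, T = Y_rcb.shape[0], Y_rcb.shape[2]
    block_t = Y_rcb.mean(axis=1)                   # \bar{Y}_{i.}(t), shape (N, T)
    dev = (block_t
           - block_t.mean(axis=1, keepdims=True)   # \bar{Y}_{i.}(.)
           - block_t.mean(axis=0, keepdims=True)   # \bar{Y}_{..}(t)
           + block_t.mean())                       # \bar{Y}_{..}(.)
    return float((dev ** 2).sum() / ((N - 1) * (T - 1)))

def ls_interaction(Y_ls: np.ndarray) -> float:
    T = Y_ls.shape[0]
    total = 0.0
    for axis in (1, 0):                            # row margins, then column margins
        margin_t = Y_ls.mean(axis=axis)            # \bar{Y}_{i.}(t) or \bar{Y}_{.j}(t)
        dev = (margin_t
               - margin_t.mean(axis=1, keepdims=True)
               - margin_t.mean(axis=0, keepdims=True)
               + margin_t.mean())
        total += (dev ** 2).sum() / ((T - 1) ** 2)
    return float(total)
```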

In actuality, under Neyman’s null:

• For RCB: E(MSTr) ≤ E(MSRe).

• For LS: Inequality could go either way, depending on interactions.

Connection with Type I Error

Without making assumptions on the potential outcomes, the actual Type I error of the standard ANOVA F-test cannot be determined simply by comparing E(MSTr) and E(MSRe) under Neyman's null.

For LS, under Neyman’s null:

• E(MSTr) > E(MSRe) does not imply Type I error > nominal

• E(MSTr) < E(MSRe) does not imply Type I error < nominal

For RCB, E(MSTr) ≤ E(MSRe), but Type I error can go in either direction.

Simple Counterexample: 4 × 4 LS

In the table below, the vector in cell (i, j) represents (Y_ij(1), Y_ij(2), Y_ij(3), Y_ij(4)).

While Neyman's null hypothesis holds true for this example, the actual Type I error is not equal to the desired rate of 0.05.

Potential outcomes in this table yield E(MSTr) > E(MSRe), and a Type I error nearly equal to zero, contradicting Neyman's claim that the Type I error should be larger than 0.05.

Interactions between blocking factors and treatment drive this example.

        Column 1      Column 2      Column 3       Column 4
Row 1   (1, 0, 0, 0)  (0, 0, 1, 0)  (0, 0, 0, 10)  (0, 1, 0, 0)
Row 2   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, -9)  (0, 0, 0, 0)
Row 3   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, 0)   (0, 0, 0, 0)
Row 4   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, 0)   (0, 0, 0, 0)

Similar counterexamples to Neyman's claim that a comparison of E(MSTr) and E(MSRe) yields information on Type I errors can be constructed with E(MSTr) < E(MSRe), and Type I error nearly 0.1.

We can even generate counterexamples with no interactions between the blocking factors and treatment.
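As a numerical check of the counterexample above, the sketch below (our simulation, not the paper's calculation) takes the randomization set to be all 576 Latin squares of order 4 and computes the design-based rejection rate of the nominal 0.05 F-test, reusing latin_square_F from the earlier sketch. Treating uniform choice among all 4 × 4 Latin squares as the randomization scheme is our assumption.

```python
# Sketch: design-based Type I error of the nominal 0.05 ANOVA F-test for the
# 4 x 4 counterexample, enumerating all 4 x 4 Latin squares as the
# randomization set (illustrative code; reuses latin_square_F defined above).
from itertools import permutations
import numpy as np
from scipy.stats import f as f_dist

# Potential outcomes Y[i, j, t] = Y_ij(t+1), transcribed from the table above.
Y = np.zeros((4, 4, 4))
Y[0, 0, 0], Y[0, 1, 2], Y[0, 2, 3], Y[0, 3, 1] = 1, 1, 10, 1
Y[1, 2, 3] = -9

# All 576 Latin squares of order 4: stack four row permutations whose columns
# never repeat a symbol.
perms = list(permutations(range(4)))
squares = []
for rows in ((r0, r1, r2, r3) for r0 in perms for r1 in perms
             for r2 in perms for r3 in perms):
    if all(len({row[k] for row in rows}) == 4 for k in range(4)):
        squares.append(np.array(rows))
assert len(squares) == 576

crit = f_dist.ppf(0.95, dfn=3, dfd=6)     # nominal 0.05 cutoff, F(3, 6)
rejections = sum(
    latin_square_F(np.take_along_axis(Y, sq[:, :, None], axis=2)[:, :, 0], sq) > crit
    for sq in squares)
# Estimated Type I error; the poster reports it is nearly zero.
print(rejections / len(squares))
```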

Conclusion

I am considering problems which are important from the point of view of agriculture. And from this viewpoint it is immaterial whether any two varieties react a little differently to the local differences in the soil. What is important is whether on a larger field they are able to give equal or different yields. (Neyman, 1935)

Dr. Neyman thinks that another test would be more important. I am not going to argue that point. It may be that the question which Dr. Neyman thinks should be answered is more important than the one I have proposed and attempted to answer. I suggest that before criticizing previous work it is always wise to give enough study to the subject to understand its purpose. Failing that it is surely quite unusual to claim to understand the purpose of previous work better than its author. (Fisher, 1935)

Neyman’s expressions for E(MSRe), for both RCB and LS, are incorrect.

Type I error cannot be gauged by comparing E(MSTr) and E(MSRe) under Neyman's null without further assumptions on the potential outcomes, e.g. regarding the interaction between blocking factor(s) and treatment.

Both RCB and LS yield misleading inferences in the presence of such interaction, and replication would be required to enable valid inference.

References

[1] Box J.F. (1978). R.A. Fisher, the Life of a Scientist. Wiley Series in Probability and Mathematical Statistics, New York.

[2] Fisher R.A. (1950). The Design of Experiments (Sixth ed.). Hafner Publishing Company, New York.

[3] Hinkelmann K., Kempthorne O. (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley.

[4] Lehmann E.L. (2011). Fisher, Neyman, and the Creation of Classical Statistics. Springer, New York.

[5] Neyman J., with cooperation of K. Iwaszkiewicz and St. Kolodziejczyk (1935). Statistical problems in agricultural experimentation (with discussion). Suppl. J. Roy. Statist. Soc. Ser. B 2, 107-180.

[6] Pitman E.J.G. (1938). Significance tests which may be applied to samples from any populations: III. The Analysis of Variance Test. Biometrika Vol. 29, No. 3/4, 322-335.

[7] Reid C. (1982). Neyman: From Life. Springer, New York.

[8] Welch B.L. (1937). On the z-test in randomized blocks and Latin squares. Biometrika Vol. 29, No. 1/2, 21-52.

[9] Wilk M.B. (1955). The randomization analysis of a generalized randomized block design. Biometrika 42, 70-79.

[10] Wilk M.B., Kempthorne O. (1957). Non-additivities in a Latin square design. J. Amer. Statist. Assoc. 52, 218-236.

This material is based upon research supported by the United States National Science Foundation Graduate Research Fellowship under Grant No. DGE-1144152.