
Who was Right about ANOVA for Latin Squares: Neyman or Fisher?

Arman Sabbaghi¹ & Donald B. Rubin¹

¹Harvard University Department of Statistics

Background

In a presentation to the Royal Statistical Society in 1935, Jerzy Neyman declared randomized complete blocks (RCB) a more valid design than Latin squares (LS), in terms of testing the null hypothesis of zero average treatment effects (Neyman's null) (5). His conclusion ignited R.A. Fisher's legendary temper, and their relationship became acrimonious, with no reconciliation ever being reached (1), (4), (7).

Recall the standard ANOVA F-test statistic: F = MSTreatment / MSResidual.
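As a concrete illustration, here is a minimal sketch of how this statistic is computed for a T × T Latin square. The code and the array convention (y[i, j] for the observed outcome in row i, column j, and square[i, j] for the treatment assigned to that cell) are ours, not the authors'.

```python
# Sketch: ANOVA F statistic for a T x T Latin square (illustrative code,
# not from the paper). y[i, j] is the observed outcome in row i, column j;
# square[i, j] in {0, ..., T-1} is the treatment assigned to that cell.
import numpy as np

def latin_square_F(y: np.ndarray, square: np.ndarray) -> float:
    T = y.shape[0]
    grand = y.mean()
    ss_row = T * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_col = T * ((y.mean(axis=0) - grand) ** 2).sum()
    treat_means = np.array([y[square == t].mean() for t in range(T)])
    ss_tr = T * ((treat_means - grand) ** 2).sum()
    ss_res = ((y - grand) ** 2).sum() - ss_row - ss_col - ss_tr
    ms_tr = ss_tr / (T - 1)
    ms_res = ss_res / ((T - 1) * (T - 2))
    if ms_res <= 1e-12:                # degenerate layout: no residual variation
        return np.inf if ms_tr > 1e-12 else 0.0
    return ms_tr / ms_res
```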

Neyman claimed in (5) that, under the null hypothesis of zero average treatment effects:

• For RCB: E(MSTr) = E(MSRe).

• For LS: E(MSTr) ≥ E(MSRe).

Based on this comparison of expectations, Neyman concluded that:

• ANOVA F-test for RCB has correct Type I error.

• Test for LS has higher Type I error than nominal.

This type of reasoning persists in experimental design, e.g. (3), (10).

Fisher argued that LS is valid when testing the null of absolutely no treatment effects (Fisher's sharp null) using a randomization test (5).
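A randomization test re-randomizes the treatment layout while holding the outcomes fixed, which is exactly what the sharp null licenses. Below is a minimal sketch under our own conventions; it reuses the latin_square_F helper from the sketch above, and random_latin_square (permuting a cyclic square) is our simplification, producing a valid but not uniformly distributed Latin square.

```python
# Sketch of a randomization test of Fisher's sharp null in a Latin square
# (illustrative code; reuses latin_square_F from the sketch above).
import numpy as np

def random_latin_square(T: int, rng: np.random.Generator) -> np.ndarray:
    # Permute rows, columns, and symbols of a cyclic square; always a valid
    # Latin square, though not uniform over all Latin squares.
    base = (np.arange(T)[:, None] + np.arange(T)[None, :]) % T
    r, c, s = rng.permutation(T), rng.permutation(T), rng.permutation(T)
    return s[base[r][:, c]]

def randomization_p_value(y, square, draws=10_000, seed=0):
    # Under the sharp null, y is unchanged by treatment, so recomputing F over
    # re-drawn assignments approximates its randomization distribution.
    rng = np.random.default_rng(seed)
    f_obs = latin_square_F(y, square)
    f_null = np.array([latin_square_F(y, random_latin_square(y.shape[0], rng))
                       for _ in range(draws)])
    return float(np.mean(f_null >= f_obs))
```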

In our paper, we prove that Neyman's conclusions in (5) are incorrect. To do so, we make no assumptions on the potential outcomes, and

• calculate E(MSTr) and E(MSRe), and

• evaluate the relationship between expectations of sums of squares and Type I error of the ANOVA F-test, for both RCB and LS designs.

Notation

Randomized Complete Block (RCB)

Setup: N blocks, T units in each block, T treatments.
Potential outcome of unit j in block i, under treatment t: Y_ij(t)
Additivity assumption: Y_ij(t) = B_i + τ(t)

Latin Square (LS)

Setup: T rows, T columns, T treatments.
Potential outcome of unit in row i, column j, under treatment t: Y_ij(t)
Additivity assumption: Y_ij(t) = R_i + C_j + τ(t)
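For concreteness, the additive LS model can be encoded as a three-way array with Y[i, j, t] = Y_ij(t); the sketch below (our convention, with arbitrary illustrative values for R, C, and τ) builds such an array by broadcasting.

```python
# Sketch: additive potential outcomes Y[i, j, t] = R_i + C_j + tau(t) for a
# T x T Latin square (illustrative values; the array convention is ours).
import numpy as np

T = 4
rng = np.random.default_rng(1)
R, C, tau = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)
Y_additive = R[:, None, None] + C[None, :, None] + tau[None, None, :]  # shape (T, T, T)
```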

Fisher’s Sharp Null Hypothesis

H_0^#: Y_ij(1) = ... = Y_ij(T) for every unit (i, j)

Neyman’s Null Hypothesis

H_0: Ȳ··(1) = ... = Ȳ··(T), where Ȳ··(t) denotes the average of Y_ij(t) over all units

The combination of Neyman's null and additivity yields Fisher's sharp null (a one-line check is given below). Neyman made no assumptions on the potential outcomes.
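The check, sketched for the RCB additivity assumption (the LS case is identical with R_i + C_j in place of B_i):

```latex
% Averaging the additivity assumption over all units:
Y_{ij}(t) = B_i + \tau(t)
  \;\Longrightarrow\; \bar{Y}_{\cdot\cdot}(t) = \bar{B} + \tau(t),
\qquad\text{so}\qquad
\bar{Y}_{\cdot\cdot}(1) = \dots = \bar{Y}_{\cdot\cdot}(T)
  \;\Longrightarrow\; \tau(1) = \dots = \tau(T)
  \;\Longrightarrow\; Y_{ij}(1) = \dots = Y_{ij}(T)\ \text{for every unit } (i,j),
% which is Fisher's sharp null.
```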

Results

Calculation of Expectations

Neyman incorrectly calculated E(MSRe), for both RCB and LS, by omitting interactions between the blocking factor(s) and treatment factor.

RCB interaction = \frac{1}{(N-1)(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} \left\{ \bar{Y}_{i\cdot}(t) - \bar{Y}_{i\cdot}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2

LS interaction = \frac{1}{(T-1)^2} \sum_{i=1}^{T} \sum_{t=1}^{T} \left\{ \bar{Y}_{i\cdot}(t) - \bar{Y}_{i\cdot}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2 + \frac{1}{(T-1)^2} \sum_{j=1}^{T} \sum_{t=1}^{T} \left\{ \bar{Y}_{\cdot j}(t) - \bar{Y}_{\cdot j}(\cdot) - \bar{Y}_{\cdot\cdot}(t) + \bar{Y}_{\cdot\cdot}(\cdot) \right\}^2
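The sketch below (our code, assuming the Y[i, j, t] potential-outcome arrays introduced earlier: shape (N, T, T) for an RCB with blocks i and units j, and (T, T, T) for an LS) computes these two interaction terms directly from their definitions.

```python
# Sketch: the interaction terms above, computed from full potential-outcome
# arrays (illustrative code, not from the paper).
import numpy as np

def rcb_interaction(Y_rcb: np.ndarray) -> float:
    N, T = Y_rcb.shape[0], Y_rcb.shape[2]
    block_t = Y_rcb.mean(axis=1)                   # \bar{Y}_{i.}(t), shape (N, T)
    dev = (block_t
           - block_t.mean(axis=1, keepdims=True)   # \bar{Y}_{i.}(.)
           - block_t.mean(axis=0, keepdims=True)   # \bar{Y}_{..}(t)
           + block_t.mean())                       # \bar{Y}_{..}(.)
    return float((dev ** 2).sum() / ((N - 1) * (T - 1)))

def ls_interaction(Y_ls: np.ndarray) -> float:
    T = Y_ls.shape[0]
    total = 0.0
    for axis in (1, 0):                            # row margins, then column margins
        margin_t = Y_ls.mean(axis=axis)            # \bar{Y}_{i.}(t) or \bar{Y}_{.j}(t)
        dev = (margin_t
               - margin_t.mean(axis=1, keepdims=True)
               - margin_t.mean(axis=0, keepdims=True)
               + margin_t.mean())
        total += (dev ** 2).sum() / ((T - 1) ** 2)
    return float(total)
```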

In actuality, under Neyman’s null:

• For RCB: E(MSTr) ≤ E(MSRe).

• For LS: Inequality could go either way, depending on interactions.

Connection with Type I Error

Without making assumptions on the potential outcomes, the actual Type I error of the standard ANOVA F-test cannot be determined simply by comparing E(MSTr) and E(MSRe) under Neyman's null.

For LS, under Neyman’s null:

• E(MSTr) > E(MSRe) does not imply Type I error > nominal

• E(MSTr) < E(MSRe) does not imply Type I error < nominal

For RCB, E(MSTr) ≤ E(MSRe), but Type I error can go in either direction.

Simple Counterexample: 4 × 4 LS

In the table below, the vector in cell (i, j) represents (Y_ij(1), Y_ij(2), Y_ij(3), Y_ij(4)).

While Neyman's null hypothesis holds true for this example, the actual Type I error is not equal to the desired rate of 0.05.

Potential outcomes in this table yield E(MSTr) > E(MSRe), and a Type I error nearly equal to zero, contradicting Neyman's claim that the Type I error should be larger than 0.05.

Interactions between blocking factors and treatment drive this example.

        Column 1      Column 2      Column 3       Column 4
Row 1   (1, 0, 0, 0)  (0, 0, 1, 0)  (0, 0, 0, 10)  (0, 1, 0, 0)
Row 2   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, -9)  (0, 0, 0, 0)
Row 3   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, 0)   (0, 0, 0, 0)
Row 4   (0, 0, 0, 0)  (0, 0, 0, 0)  (0, 0, 0, 0)   (0, 0, 0, 0)

Similar counterexamples to Neyman's claim that a comparison of E(MSTr) and E(MSRe) yields information on Type I errors can be constructed with E(MSTr) < E(MSRe), and Type I error nearly 0.1.

We can even generate counterexamples with no interactions between the blocking factors and treatment.
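As a numerical check of the counterexample above, the sketch below (our simulation, not the paper's calculation) takes the randomization set to be all 576 Latin squares of order 4 and computes the design-based rejection rate of the nominal 0.05 F-test, reusing latin_square_F from the earlier sketch. Treating uniform choice among all 4 × 4 Latin squares as the randomization scheme is our assumption.

```python
# Sketch: design-based Type I error of the nominal 0.05 ANOVA F-test for the
# 4 x 4 counterexample, enumerating all 4 x 4 Latin squares as the
# randomization set (illustrative code; reuses latin_square_F defined above).
from itertools import permutations
import numpy as np
from scipy.stats import f as f_dist

# Potential outcomes Y[i, j, t] = Y_ij(t+1), transcribed from the table above.
Y = np.zeros((4, 4, 4))
Y[0, 0, 0], Y[0, 1, 2], Y[0, 2, 3], Y[0, 3, 1] = 1, 1, 10, 1
Y[1, 2, 3] = -9

# All 576 Latin squares of order 4: stack four row permutations whose columns
# never repeat a symbol.
perms = list(permutations(range(4)))
squares = []
for rows in ((r0, r1, r2, r3) for r0 in perms for r1 in perms
             for r2 in perms for r3 in perms):
    if all(len({row[k] for row in rows}) == 4 for k in range(4)):
        squares.append(np.array(rows))
assert len(squares) == 576

crit = f_dist.ppf(0.95, dfn=3, dfd=6)     # nominal 0.05 cutoff, F(3, 6)
rejections = sum(
    latin_square_F(np.take_along_axis(Y, sq[:, :, None], axis=2)[:, :, 0], sq) > crit
    for sq in squares)
# Estimated Type I error; the poster reports it is nearly zero.
print(rejections / len(squares))
```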

Conclusion

I am considering problems which are important from the point of view of agriculture. And from this viewpoint it is immaterial whether any two varieties react a little differently to the local differences in the soil. What is important is whether on a larger field they are able to give equal or different yields. (Neyman, 1935)

Dr. Neyman thinks that another test would be more important. I am not going to argue that point. It may be that the question which Dr. Neyman thinks should be answered is more important than the one I have proposed and attempted to answer. I suggest that before criticizing previous work it is always wise to give enough study to the subject to understand its purpose. Failing that it is surely quite unusual to claim to understand the purpose of previous work better than its author. (Fisher, 1935)

Neyman’s expressions for E(MSRe), for both RCB and LS, are incorrect.

Type I error cannot be gauged by comparing E(MSTr) and E(MSRe) under Neyman's null without further assumptions on the potential outcomes, e.g. regarding the interaction between blocking factor(s) and treatment.

Both RCB and LS yield misleading inferences in the presence of such interaction, and replication would be required to enable valid inference.

References

[1] Box J.F. (1978). R.A. Fisher, the Life of a Scientist. Wiley Series in Probability and Mathematical Statistics, New York.

[2] Fisher R.A. (1950). The Design of Experiments (Sixth ed.). Hafner Publishing Company, New York.

[3] Hinkelmann K., Kempthorne O. (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (Second ed.). Wiley.

[4] Lehmann E.L. (2011). Fisher, Neyman, and the Creation of Classical Statistics. Springer, New York.

[5] Neyman J., with cooperation of K. Iwaszkiewicz and St. Kolodziejczyk (1935). Statistical problems in agricultural experimentation (with discussion). Suppl. J. Roy. Statist. Soc. Ser. B 2, 107-180.

[6] Pitman E.J.G. (1938). Significance tests which may be applied to samples from any populations: III. The Analysis of Variance Test. Biometrika Vol. 29, No. 3/4, 322-335.

[7] Reid C. (1982). Neyman: From Life. Springer, New York.

[8] Welch B.L. (1937). On the z-test in randomized blocks and Latin squares. Biometrika Vol. 29, No. 1/2, 21-52.

[9] Wilk M.B. (1955). The randomization analysis of a generalized randomized block design. Biometrika 42, 70-79.

[10] Wilk M.B., Kempthorne O. (1957). Non-additivities in a Latin square design. J. Amer. Statist. Assoc. 52, 218-236.

This material is based upon research supported by the United States National Science Foundation Graduate Research Fellowship under Grant No. DGE-1144152.