
F TEST OF GOODNESS OF FIT FOR THE WHOLE EQUATION


This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.

$$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + u$$

$$H_0\colon \beta_2 = \dots = \beta_k = 0$$

$$H_1\colon \text{at least one } \beta \neq 0$$


We will consider the general case where there are k – 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.


Of course we hope to reject it and conclude that the model does have some explanatory power.


The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the coefficients $\beta_2, \ldots, \beta_k$ are zero.


The alternative hypothesis is that at least one of these $\beta$ coefficients is different from zero.


In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.


In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the ‘group’ consisted of just one variable.
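A small numerical check may make this equivalence concrete. The sketch below fits a simple regression by hand on a made-up six-point data set (the numbers are hypothetical, chosen only for illustration) and compares the F statistic with the square of the t statistic on the slope coefficient. The variable names are illustrative, not taken from the slides.

```python
import math

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n, k = len(x), 2  # k counts the intercept and the single slope coefficient

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates for the simple regression model.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Explained and residual sums of squares.
ess = sum((fi - y_bar) ** 2 for fi in fitted)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# F statistic for the equation as a whole, with (k - 1, n - k) degrees of freedom.
f_stat = (ess / (k - 1)) / (rss / (n - k))

# t statistic for the slope coefficient (null hypothesis: slope equals zero).
se_slope = math.sqrt((rss / (n - k)) / s_xx)
t_stat = slope / se_slope

print(f"F = {f_stat:.4f}, t squared = {t_stat ** 2:.4f}")
```

With only one explanatory variable the two printed values should agree up to floating-point rounding, which is why the two tests lead to the same conclusion in the simple regression model.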


The F statistic for the test was defined in the last sequence in Chapter 2. ESS is the explained sum of squares and RSS is the residual sum of squares.

$$F(k-1,\,n-k) = \frac{ESS/(k-1)}{RSS/(n-k)}$$

It can be expressed in terms of $R^2$ by dividing the numerator and denominator by TSS, the total sum of squares.

$$F(k-1,\,n-k) = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)}$$

ESS / TSS is the definition of $R^2$. RSS / TSS is equal to $(1 - R^2)$. (See the last sequence in Chapter 2.)

$$F(k-1,\,n-k) = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$


The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grades completed by the respondent's mother and father, respectively.

$$S = \beta_1 + \beta_2\,ASVABC + \beta_3\,SM + \beta_4\,SF + u$$

The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero. The alternative hypothesis is that at least one of them is non-zero.

$$H_0\colon \beta_2 = \beta_3 = \beta_4 = 0, \qquad H_1\colon \text{at least one } \beta \neq 0$$

Here is the regression output using Data Set 21.

    . reg S ASVABC SM SF

          Source |       SS       df       MS              Number of obs =     540
    -------------+------------------------------           F(  3,   536) =  104.30
           Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
        Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
    -------------+------------------------------           Adj R-squared =  0.3651
           Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

    ------------------------------------------------------------------------------
               S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
              SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
              SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
           _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
    ------------------------------------------------------------------------------


In this example, k – 1, the number of explanatory variables, is equal to 3 and n – k, the number of degrees of freedom, is equal to 536.

$$F(k-1,\,n-k) = \frac{ESS/(k-1)}{RSS/(n-k)}$$

$$F(3,\,536) = \frac{1181/3}{2024/536} = 104.3$$


The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.


The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.


Hence the F statistic is 104.3. All serious regression packages compute it for you as part of the diagnostics in the regression output.
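As a cross-check on the arithmetic, the short sketch below recomputes the F statistic in two ways: from the sums of squares, and from $R^2$. The figures are copied from the Model, Residual and Total rows of the Stata output for this regression; the variable names are illustrative.

```python
# Figures copied from the Stata output (Model, Residual and Total rows).
ess = 1181.36981   # explained sum of squares
rss = 2023.61353   # residual sum of squares
tss = 3204.98333   # total sum of squares
n = 540            # number of observations
k = 4              # number of parameters: intercept plus three slope coefficients

# F statistic from the sums of squares.
f_from_ss = (ess / (k - 1)) / (rss / (n - k))

# The same statistic expressed in terms of R-squared.
r2 = ess / tss
f_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(round(f_from_ss, 2), round(r2, 4), round(f_from_r2, 2))
```

Both routes should reproduce the F statistic and R-squared reported in the output, up to the rounding of the printed sums of squares.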


The critical value for F(3,536) is not given in the F tables, but we know it must be lower than the critical value for F(3,500), which is given. At the 0.1% level, this is 5.51. Since 104.3 far exceeds 5.51, we easily reject $H_0$ at the 0.1% level.

$$F(3,\,536) = 104.3 \qquad F_{\text{crit},\,0.1\%}(3,\,500) = 5.51$$
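Where software is available, the table lookup can be cross-checked by computing the tail probability directly. The sketch below is one possible way to do this using only the Python standard library. It relies on the standard result that, if F has an F distribution with (d1, d2) degrees of freedom, then P(F > f) equals the regularized incomplete beta function $I_x(d_2/2,\, d_1/2)$ with $x = d_2/(d_2 + d_1 f)$, and it evaluates that integral numerically with Simpson's rule. The function name `f_tail_probability` and the step count are illustrative choices, and the implementation assumes both degrees of freedom exceed 2.

```python
import math


def f_tail_probability(f_value, df1, df2, steps=20000):
    """Approximate P(F > f_value) for an F(df1, df2) variable.

    Integrates the Beta(df2/2, df1/2) density from 0 to df2 / (df2 + df1 * f_value)
    with Simpson's rule. Assumes df1 > 2 and df2 > 2, and an even number of steps.
    """
    a, b = df2 / 2.0, df1 / 2.0
    upper = df2 / (df2 + df1 * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta density, evaluated in logs to avoid overflow; zero at the end points.
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


# Tail probability at the tabulated 0.1% critical value for F(3, 500).
p_at_critical = f_tail_probability(5.51, 3, 500)

# Tail probability at the observed F statistic with (3, 536) degrees of freedom.
p_observed = f_tail_probability(104.3, 3, 536)

print(p_at_critical, p_observed)
```

The first value should come out close to 0.001, consistent with the tabulated critical value, while the second should be vanishingly small, consistent with the Prob > F figure of 0.0000 in the regression output.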


This result could have been anticipated because both ASVABC and SF have highly significant t statistics. So we knew in advance that both $\beta_2$ and $\beta_4$ were non-zero.


It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.


Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have ‘significant’ coefficients.
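The arithmetic behind the figure of 2 can be sketched as follows. The expected count is simply the number of variables multiplied by the significance level. The second calculation, the probability of at least one spurious 'significant' coefficient, goes a step beyond the slide and assumes the 40 t tests are independent, which is a simplification adopted here for illustration.

```python
n_vars = 40    # explanatory variables, none a true determinant of Y
alpha = 0.05   # significance level of each t test

# Expected number of coefficients that appear 'significant' purely by chance.
expected_false_positives = n_vars * alpha

# Probability that at least one t test rejects, assuming independent tests.
prob_at_least_one = 1 - (1 - alpha) ** n_vars

print(expected_false_positives, round(prob_at_least_one, 3))
```

Under that independence assumption, a spurious rejection somewhere among the 40 t tests is far more likely than not, which is why a few significant t statistics alongside an insignificant F statistic should not be read as evidence of explanatory power.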


The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the $R^2$ is high. You would expect to have a highly significant F statistic.


However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.


In this situation you would know that your model is a good one, but you are not in a position to pinpoint the contributions made by the explanatory variables individually.


Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 3.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx or the University of London International Programmes distance learning course EC2020 Elements of Econometrics www.londoninternational.ac.uk/lse.

2012.10.28