F TEST OF GOODNESS OF FIT FOR THE WHOLE EQUATION
This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.
$$Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + u$$
$$H_0: \beta_2 = \dots = \beta_k = 0$$
$$H_1: \text{at least one } \beta \neq 0$$
We will consider the general case where there are k – 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.
Of course we hope to reject it and conclude that the model does have some explanatory power.
The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the coefficients β2, ..., βk are zero.
The alternative hypothesis is that at least one of these β coefficients is different from zero.
In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.
In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the ‘group’ consisted of just one variable.
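The equivalence can be checked numerically. The sketch below fits a simple regression by hand on a small made-up data set (the numbers are hypothetical, not from the slides) and compares the F statistic with the square of the t statistic on the slope.

```python
import math

# Hypothetical illustrative data (not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.8, 5.1, 6.9, 7.2]

n = len(x)
k = 2  # number of parameters: intercept and one slope

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS estimates of the slope and intercept.
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Sums of squares.
fitted = [b1 + b2 * xi for xi in x]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - y_bar) ** 2 for yi in y)
ess = tss - rss

# t statistic on the slope and F statistic for the equation.
s2 = rss / (n - k)
se_b2 = math.sqrt(s2 / s_xx)
t_stat = b2 / se_b2
f_stat = (ess / (k - 1)) / (rss / (n - k))

print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

With one explanatory variable, ESS equals the squared slope estimate times the sum of squared deviations of X, which is why F and t squared coincide.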
$$F(k-1,\, n-k) = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$
The F statistic for the test was defined in the last sequence in Chapter 2. ESS is the explained sum of squares and RSS is the residual sum of squares.
It can be expressed in terms of R² by dividing the numerator and denominator by TSS, the total sum of squares.
ESS / TSS is the definition of R². RSS / TSS is equal to (1 – R²). (See the last sequence in Chapter 2.)
The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grades completed by the respondent's mother and father, respectively.
$$S = \beta_1 + \beta_2\,ASVABC + \beta_3\,SM + \beta_4\,SF + u$$
The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero. The alternative hypothesis is that at least one of them is non-zero.
$$H_0: \beta_2 = \beta_3 = \beta_4 = 0, \qquad H_1: \text{at least one } \beta \neq 0$$
Here is the regression output using Data Set 21.
. reg S ASVABC SM SF

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  3,   536) =  104.30
       Model |  1181.36981     3  393.789935           Prob > F      =  0.0000
    Residual |  2023.61353   536  3.77539837           R-squared     =  0.3686
-------------+------------------------------           Adj R-squared =  0.3651
       Total |  3204.98333   539  5.94616574           Root MSE      =   1.943

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ASVABC |   .1257087   .0098533    12.76   0.000     .1063528    .1450646
          SM |   .0492424   .0390901     1.26   0.208     -.027546    .1260309
          SF |   .1076825   .0309522     3.48   0.001       .04688    .1684851
       _cons |   5.370631   .4882155    11.00   0.000      4.41158    6.329681
------------------------------------------------------------------------------
In this example, k – 1, the number of explanatory variables, is equal to 3 and n – k, the number of degrees of freedom, is equal to 536.
$$F(k-1,\, n-k) = \frac{ESS/(k-1)}{RSS/(n-k)}, \qquad F(3,\, 536) = \frac{1181/3}{2024/536} = 104.3$$
The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.
The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.
Hence the F statistic is 104.3. All serious regression packages compute it for you as part of the diagnostics in the regression output.
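As a check on the arithmetic, the short sketch below recomputes the F statistic from the sums of squares and degrees of freedom printed in the Stata output, and then again from R² using the alternative expression. The variable names are chosen here for illustration only.

```python
# Figures copied from the Model, Residual and Total rows of the Stata output.
ess, df_model = 1181.36981, 3    # explained sum of squares, k - 1
rss, df_resid = 2023.61353, 536  # residual sum of squares, n - k
tss = 3204.98333                 # total sum of squares

# Route 1: ratio of the two mean squares (the MS column).
ms_model = ess / df_model
ms_resid = rss / df_resid
f_from_ss = ms_model / ms_resid

# Route 2: the same statistic written in terms of R-squared.
r2 = ess / tss
f_from_r2 = (r2 / df_model) / ((1 - r2) / df_resid)

print(f"MS model    = {ms_model:.4f}")
print(f"MS residual = {ms_resid:.4f}")
print(f"R-squared   = {r2:.4f}")
print(f"F from sums of squares = {f_from_ss:.2f}")
print(f"F from R-squared       = {f_from_r2:.2f}")
```

Both routes should agree with the F(3, 536) value reported in the header of the output, up to rounding in the printed sums of squares.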
The critical value for F(3,536) is not given in the F tables, but we know it must be lower than the critical value for F(3,500), which is given. At the 0.1% level, the latter is 5.51. Since 104.3 far exceeds it, we easily reject H0 at the 0.1% level.
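If tables are not to hand, the critical values can be approximated numerically. The sketch below is one possible standard-library approach, not the method used in the slides: it evaluates the upper tail of the F distribution through the regularized incomplete beta function (continued-fraction evaluation) and finds the critical value by bisection. All function names are illustrative.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-13):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df1, df2):
    """P(F > f_value) for an F(df1, df2) distribution."""
    x = df2 / (df2 + df1 * f_value)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


def f_critical(alpha, df1, df2):
    """Value exceeded with probability alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


crit_500 = f_critical(0.001, 3, 500)
crit_536 = f_critical(0.001, 3, 536)
print(f"0.1% critical value, F(3,500): {crit_500:.2f}")
print(f"0.1% critical value, F(3,536): {crit_536:.2f}")
print("Reject H0:", 104.30 > crit_536)
```

Where SciPy is available, `scipy.stats.f.ppf(0.999, 3, 536)` should give the same critical value with far less code.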
$$F_{\text{crit},\,0.1\%}(3,\, 500) = 5.51$$
This result could have been anticipated because both ASVABC and SF have highly significant t statistics. So we knew in advance that both β2 and β4 were non-zero.
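The t statistics referred to here are simply the coefficients divided by their standard errors. The sketch below recomputes them from the Stata output and flags those exceeding 1.96 in absolute value. That threshold is the large-sample two-sided 5% critical value and is an assumption made here for illustration; the exact critical value for 536 degrees of freedom is marginally larger.

```python
# (coefficient, standard error) pairs copied from the Stata output.
results = {
    "ASVABC": (0.1257087, 0.0098533),
    "SM": (0.0492424, 0.0390901),
    "SF": (0.1076825, 0.0309522),
}

# Assumed large-sample two-sided 5% critical value (normal approximation).
T_CRIT_5PCT = 1.96

t_stats = {name: coef / se for name, (coef, se) in results.items()}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRIT_5PCT]

for name, t in t_stats.items():
    print(f"{name:>6}: t = {t:6.2f}")
print("Individually significant at 5%:", significant)
```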
It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.
Then the F statistic should be low enough for H0 not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have ‘significant’ coefficients.
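The figure of 2 is just the number of tests multiplied by the significance level. The sketch below reproduces it and, under the added assumption that the 40 t tests are independent (which need not hold in a real regression), also gives the chance that at least one coefficient appears 'significant' purely by chance.

```python
n_vars = 40   # explanatory variables, none a true determinant of Y
alpha = 0.05  # significance level of each t test

# Expected number of Type I errors across the 40 tests.
expected_false_positives = n_vars * alpha

# Probability of at least one Type I error, ASSUMING independent tests.
p_at_least_one = 1 - (1 - alpha) ** n_vars

print(f"Expected 'significant' coefficients: {expected_false_positives:.1f}")
print(f"P(at least one 'significant'):       {p_at_least_one:.3f}")
```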
The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the R² is high. You would expect to have a highly significant F statistic.
However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.
In this situation you would know that your model is a good one, but you are not in a position to pinpoint the contributions made by the explanatory variables individually.
Copyright Christopher Dougherty 2012.
These slideshows may be downloaded by anyone, anywhere for personal use.
Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 3.5 of C. Dougherty,
Introduction to Econometrics, fourth edition 2011, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oup.com/uk/orc/bin/9780199567089/.
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2012.10.28