Interpreting Bi-variate OLS Regression

Week 4, 2007 Lecture 4 Slide #1

• Stata Regression Output
• Regression plots and RSS
• R² -- Coefficient of Determination
– Adjusted R²
• Sample Covariance/Correlation
• Hypothesis Testing
– Standard Errors
– T-tests and P-values


Data

• Use the “caschool.dta” file

• Data description:
– CaliforniaTestScores.pdf

• Build a Stata do-file as you go

• Model:
– Test score = f(student/teacher ratio)


Stata Regression Model: Regressing Test Score on Student Teacher Ratio

[Figure: two histograms with normal curves overlaid. Left panel: Student Teacher Ratio (x-axis, 15 to 25) against Percent. Right panel: Test Score (x-axis, 600 to 700) against Percent.]

histogram str, percent normal
histogram testscr, percent normal


Regression Output

regress testscr str, beta

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =   22.58
       Model |  7794.11004     1  7794.11004           Prob > F      =  0.0000
    Residual |  144315.484   418  345.252353           R-squared     =  0.0512
-------------+------------------------------           Adj R-squared =  0.0490
       Total |  152109.594   419  363.030056           Root MSE      =  18.581

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         str |  -2.279808   .4798256    -4.75   0.000                -.2263628
       _cons |    698.933   9.467491    73.82   0.000                        .
------------------------------------------------------------------------------
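As a rough illustration of how to read the two coefficients, the fitted line can be evaluated by hand. The sketch below uses Python rather than Stata (a choice made here only so the arithmetic is easy to rerun). The helper name `predict_testscr` is hypothetical, and the numbers are copied from the regression output and the reported mean of str.

```python
# Coefficients copied from the Stata regression output.
B0 = 698.933      # _cons: predicted test score when str = 0
B1 = -2.279808    # str: change in predicted score per extra student per teacher


def predict_testscr(str_ratio):
    """Predicted test score for a given student-teacher ratio (hypothetical helper)."""
    return B0 + B1 * str_ratio


# Predicted score for a district with 20 students per teacher.
at_20 = predict_testscr(20)

# Moving from 20 to 21 students per teacher changes the prediction by b1.
one_more = predict_testscr(21) - predict_testscr(20)

# An OLS line passes through the point of means (mean str = 19.64043).
at_mean = predict_testscr(19.64043)

print(round(at_20, 2), round(one_more, 2), round(at_mean, 2))
```

The last value should land close to the reported mean of testscr (654.1565), which is a quick sanity check on the copied coefficients.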


Regression Descriptive Statistics

cor testscr str, means

    Variable |       Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
     testscr |   654.1565   19.05335     605.55     706.75
         str |   19.64043   1.891812         14       25.8

             |  testscr      str
-------------+------------------
     testscr |   1.0000
         str |  -0.2264   1.0000


Regression Plot

[Figure: scatter plot of testscr (y-axis, 600 to 700) against str (x-axis, 15 to 25), with the fitted values and a 95% CI band.]

twoway (scatter testscr str) (lfitci testscr str)


Measuring “Goodness of Fit”

• Root of Mean Squared Error (“Root MSE”)

$s_e = \sqrt{\frac{RSS}{n-K}}$, where $RSS = \sum e_i^2$, K = parameters

– Measures spread around the regression line

• Coefficient of Determination (R²)

$ESS = \sum (\hat{Y}_i - \bar{Y})^2$ and $TSS = \sum (Y_i - \bar{Y})^2$

$R^2 = \frac{ESS}{TSS}$ and $(1 - R^2) = \frac{RSS}{TSS} = \frac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$

ESS is the “model” or explained sum of squares; TSS is the “total” sum of squares
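The fit statistics in the Stata output can be recovered from the three sums of squares alone. The following is a minimal Python sketch of that arithmetic, with values copied from the ANOVA table. The adjusted R² line uses the usual degrees-of-freedom correction, which these slides list but do not spell out, so treat that line as an assumption about what Stata reports.

```python
import math

# Values copied from the Stata ANOVA table.
ESS = 7794.11004    # "Model" sum of squares
RSS = 144315.484    # "Residual" sum of squares
TSS = 152109.594    # "Total" sum of squares
n = 420             # number of observations
K = 2               # estimated parameters: intercept and slope

# Root MSE: spread of the residuals around the regression line.
root_mse = math.sqrt(RSS / (n - K))

# Two equivalent routes to R-squared.
r2_from_ess = ESS / TSS
r2_from_rss = 1 - RSS / TSS

# Adjusted R-squared (assumed formula): compares mean squares instead of sums.
adj_r2 = 1 - (RSS / (n - K)) / (TSS / (n - 1))

print(round(root_mse, 3), round(r2_from_ess, 4), round(adj_r2, 4))
```

These should agree with the Root MSE, R-squared, and Adj R-squared entries in the output, up to rounding.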


Explaining R²

For each observation Yi, variation around the mean can be decomposed into that which is “explained” by the regression and that which is not:

[Figure: the deviation of one observation Yi from the mean Ȳ, split at the fitted value Ŷi into an unexplained deviation (Yi − Ŷi) and an explained deviation (Ŷi − Ȳ).]

Book terminology:
TSS = Σ(all)²
RSS = Σ(unexplained)²
ESS = Σ(explained)²

Stata terminology:
Residual = Σ(unexplained)²
Model = Σ(explained)²
Total = Σ(all)²
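The decomposition can be checked numerically. The sketch below fits a bivariate OLS line to a small hypothetical dataset (five made-up points, not the California data), using the standard bivariate slope and intercept formulas. It then confirms that the explained and unexplained sums of squares add up to the total.

```python
# Hypothetical toy data, chosen only to keep the arithmetic small.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard bivariate OLS slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # sum of (all)^2
ess = sum((yh - y_bar) ** 2 for yh in y_hat)             # sum of (explained)^2
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # sum of (unexplained)^2

r2 = ess / tss

print(tss, ess + rss, r2)
```

The same identity holds in the Stata table: the Model and Residual rows sum to the Total row.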


Sample Covariance & Correlation

• Sample covariance for a bivariate model is defined as:

$s_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$

• Sample correlations (r) “standardize” covariance by dividing by the product of the X and Y standard deviations:

$r = \frac{s_{XY}}{s_X s_Y}$

Sample correlations range from -1 (perfect negative relationship) to +1 (perfect positive relationship)
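A minimal sketch of both definitions in Python, applied to a small hypothetical dataset (not the California data). The function names are illustrative only.

```python
import math

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def sample_sd(a):
    """Sample standard deviation: square root of a variable's covariance with itself."""
    return math.sqrt(sample_cov(a, a))


def sample_corr(a, b):
    """Covariance standardized by the product of the two standard deviations."""
    return sample_cov(a, b) / (sample_sd(a) * sample_sd(b))


s_xy = sample_cov(x, y)
r = sample_corr(x, y)

# A variable and its negation are perfectly negatively related.
r_perfect_negative = sample_corr(x, [-xi for xi in x])

print(s_xy, round(r, 4), round(r_perfect_negative, 4))
```

In the bivariate case, squaring r gives the R² of the regression of y on x.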


Standardized Regression Coefficients (aka “Beta Weights” or “Betas”)

• Formula:

$b_1^* = b_1 \left( \frac{s_X}{s_Y} \right)$

• In our example:

$-2.28 \times \left( \frac{1.892}{19.053} \right) = -0.226$

• Interpretation: the number of std. deviations change in Y one should expect from a one-std. deviation change in X.
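The same calculation with the unrounded values from the Stata output, as a short Python sketch:

```python
# Values copied from the regression and summary output.
b1 = -2.279808     # slope on str
s_x = 1.891812     # std. dev. of str
s_y = 19.05335     # std. dev. of testscr

# Standardized coefficient: slope rescaled into standard-deviation units.
beta1 = b1 * (s_x / s_y)

print(round(beta1, 4))
```

The result should match the Beta column in the regression output. In a bivariate model it also matches the sample correlation between testscr and str (-0.2264).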


Hypothesis Tests for Regression Coefficients

• For our model: Yi = 698.933 - 2.279808*Xi + ei

• Another sample of 420 observations would lead to different estimates for b0 and b1. If we drew many such samples, we’d get the sampling distribution of the estimates

• We need to estimate the sampling distribution (because we usually can’t see it), based on our sample size and variance
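The idea of "many such samples" can be imitated with a small simulation. Everything about the population below is an assumption made for illustration: a linear model with normally distributed X and errors, with parameters loosely set to the estimates from our one real sample. The point is only that the slope varies from sample to sample, and that its spread is what a standard error tries to estimate.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so reruns give the same draws

# Assumed "population", loosely modelled on the estimates from our sample.
TRUE_B0, TRUE_B1 = 698.9, -2.28
X_MEAN, X_SD = 19.64, 1.89
ERROR_SD = 18.58
N = 420            # observations per sample
N_SAMPLES = 500    # number of repeated samples


def draw_sample_and_fit_slope():
    """Draw one sample of size N and return its bivariate OLS slope."""
    xs = [rng.gauss(X_MEAN, X_SD) for _ in range(N)]
    ys = [TRUE_B0 + TRUE_B1 * x + rng.gauss(0, ERROR_SD) for x in xs]
    x_bar = sum(xs) / N
    y_bar = sum(ys) / N
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


slopes = [draw_sample_and_fit_slope() for _ in range(N_SAMPLES)]

mean_slope = statistics.mean(slopes)   # centre of the simulated sampling distribution
sd_slope = statistics.stdev(slopes)    # its spread

print(round(mean_slope, 2), round(sd_slope, 2))
```

Under these assumptions the spread of the simulated slopes should come out near the standard error that Stata reports for str, even though Stata had only one sample to work with.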


To do that we calculate SEbs (Bivariate case only)

$SE_{b_1} = \frac{s_e}{\sqrt{TSS_X}}$, where $TSS_X = \sum (X_i - \bar{X})^2$

$SE_{b_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{TSS_X}}$
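Both standard errors can be reproduced from numbers already on the earlier slides. The sketch below assumes TSS_X can be recovered from the reported standard deviation of str as (n − 1) times its square, which follows from the definition of the sample variance.

```python
import math

# Values copied from the regression and summary output.
n = 420
s_e = 18.581        # Root MSE
x_bar = 19.64043    # mean of str
s_x = 1.891812      # std. dev. of str

# Total sum of squares of X, recovered from the sample variance.
tss_x = (n - 1) * s_x ** 2

se_b1 = s_e / math.sqrt(tss_x)
se_b0 = s_e * math.sqrt(1 / n + x_bar ** 2 / tss_x)

print(round(se_b1, 4), round(se_b0, 3))
```

These should agree with the Std. Err. column of the regression output, up to the rounding in the copied inputs.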


Interpreting Standard Errors

• For our model:
– b0 = 698.933, and SEb0 = 9.467
– b1 = -2.28, and SEb1 = .4798

Assuming that we estimated the sample standard error correctly, we can identify how many standard errors our estimate is away from zero.

[Figure: Estimated Sampling Distribution for b1, centered at b1 = -2.28, marking b1 - SEb1 = -2.76, b1 + SEb1 = -1.80, and 0 (which is 4.75 SEb1 “units” away from b1).]

The T-test reports the number of standard errors our estimate falls away from zero. Thus, the “T” for b1 is 4.75 in absolute value for our model; Stata reports it as -4.75 because b1 is negative. (rounding!)
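The t-statistics and the one-standard-error marks on the plot are simple arithmetic on the coefficient table, sketched here in Python with values copied from the Stata output:

```python
# Values copied from the regression output.
b0, se_b0 = 698.933, 9.467491
b1, se_b1 = -2.279808, 0.4798256

# t = estimate divided by its standard error (distance from zero in SE units).
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1

# One standard error either side of the slope estimate.
lower = b1 - se_b1
upper = b1 + se_b1

print(round(t_b1, 2), round(t_b0, 2), round(lower, 2), round(upper, 2))
```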


Classical Hypothesis Testing

Estimated b1 = -2.28 (working hypothesis)

Assume that b1 = 0.0 (null hypothesis)

Assume that b1 is zero. What is the probability that your sample would have resulted in an estimate for b1 that is 4.75 SEb1’s away from zero?

To find out, determine the cumulative density of the estimated sampling distribution that falls more than 4.75 SEb1’s away from zero.

See Table 2, page 757, in Stock & Watson. It reports discrete “p-values”, given the sample size and t-values. Note the distinction between 1 and 2 sided tests.

In general, if the absolute value of the t-stat is above 2, the two-sided p-value will be <0.05, which is the conventional upper limit in a classical hypothesis test.

Note: in Stata-speak, a p-value is a “P>|t|”
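A rough way to see where these p-values come from, sketched in Python. The function below uses a standard normal approximation to the t distribution, which is an assumption that is reasonable here only because the model has 418 degrees of freedom. Stata and the Stock & Watson table use the t distribution itself, so the values will differ slightly. The function name is hypothetical.

```python
import math


def two_sided_p_normal(t):
    """Two-sided p-value for a t-stat, using a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


# Our slope: 4.75 standard errors from zero.
p_b1 = two_sided_p_normal(-4.75)

# The rule of thumb: a t-stat of 2 sits just under the 0.05 cutoff.
p_at_2 = two_sided_p_normal(2.0)

# A one-sided test counts only one tail, so it halves the two-sided value.
p_b1_one_sided = p_b1 / 2

print(p_b1 < 0.001, round(p_at_2, 3))
```

A p-value this small is why Stata prints 0.000 in the P>|t| column for str.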


Coming up...

• For Next Week

– Use the caschool.dta dataset

– Run a model in Stata using Average Income (avginc) to predict Average Test Scores (testscr)

– Examine the univariate distributions of both variables and the residuals

• Walk through the entire interpretation

• Build a Stata do-file as you go

• For Next Week:
– Read Chapter 8 of Stock & Watson
