Lecture4-5

University of Hong Kong Introductory Econometrics (ECON0701), Spring 2014

27 January 2014

The Simple Regression Model

• Last time, we introduced the simple regression model.
• The simple regression model takes the form

$$y = \beta_0 + \beta_1 x + u$$

• where $\beta_0$ and $\beta_1$ are chosen so that the fitted values of the model are as close to the data as possible (minimizing the sum of the squared residuals over all the data).

• Now we will do an empirical example of a simple regression analysis.

• We are interested in exploring the relationship between CEO salaries, the dependent variable, and return on equity, the independent variable.

• The idea is to find out whether CEOs in more profitable companies (with a higher return on equity) get paid more.

    Stata 7.0, Statistics/Data Analysis, Special Edition
    6-user Stata for Windows (network) perpetual license
    Licensed to: School of Economics & Finance, University of Hong Kong

    Notes:
      1. (/m# option or -set memory-) 0.98 MB allocated to data
      2. (/v# option or -set maxvar-) 5000 maximum variables

    . use "D:\Econometrics\Statafiles\CEOSAL1.DTA"

    . d

    Contains data from D:\Econometrics\Statafiles\CEOSAL1.DTA
      obs:           209
     vars:            12                          16 Sep 1996 15:53
     size:         7,106 (99.1% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    salary          int    %9.0g                  1990 salary, thousands $
    pcsalary        int    %9.0g                  % change salary, 89-90
    sales           float  %9.0g                  1990 firm sales, millions $
    roe             float  %9.0g                  return on equity, 88-90 avg
    pcroe           float  %9.0g                  % change roe, 88-90
    ros             int    %9.0g                  return on firm's stock, 88-90
    indus           byte   %9.0g                  =1 if industrial firm
    finance         byte   %9.0g                  =1 if financial firm
    consprod        byte   %9.0g                  =1 if consumer product firm
    utility         byte   %9.0g                  =1 if transport. or utilties
    lsalary         float  %9.0g                  natural log of salary
    lsales          float  %9.0g                  natural log of sales
    -------------------------------------------------------------------------------
    Sorted by:

    . summarize salary, detail

                      1990 salary, thousands $
    -------------------------------------------------------------
          Percentiles      Smallest
     1%          333            223
     5%          448            256
    10%          525            333       Obs                 209
    25%          736            360       Sum of Wgt.         209

    50%         1039                      Mean            1281.12
                            Largest       Std. Dev.      1372.345
    75%         1407           4143
    90%         1900           6640       Variance        1883332
    95%         2327          11233       Skewness       6.854923
    99%         6640          14822       Kurtosis       60.54128

    . summarize roe, detail

                     return on equity, 88-90 avg
    -------------------------------------------------------------
          Percentiles      Smallest
     1%          2.1             .5
     5%          6.8            1.9
    10%          8.9            2.1       Obs                 209
    25%         12.4            2.9       Sum of Wgt.         209

    50%         15.5                      Mean           17.18421
                            Largest       Std. Dev.      8.518509
    75%           20           44.4
    90%         26.8           44.5       Variance       72.56499
    95%         35.1           48.1       Skewness        1.56082
    99%         44.5           56.3       Kurtosis       6.678555

• Are CEO salaries and return on equity correlated?

    . correlate salary roe
    (obs=209)

                 |   salary      roe
    -------------+------------------
          salary |   1.0000
             roe |   0.1148   1.0000

• What is the covariance?

    . correlate salary roe, covariance
    (obs=209)

                 |   salary      roe
    -------------+------------------
          salary |  1.9e+06
             roe |  1342.54   72.565

• The equation to be estimated is

$$salary = \beta_0 + \beta_1 \, roe + u$$

• We can use the variance-covariance table to estimate $\beta_1$:

$$\hat{\beta}_1 = \frac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} = \frac{1342.54}{72.565} \approx 18.5$$

    . regress salary roe

          Source |       SS       df       MS              Number of obs =     209
    -------------+------------------------------           F(  1,   207) =    2.77
           Model |  5166419.04     1  5166419.04           Prob > F      =  0.0978
        Residual |   386566563   207  1867471.32           R-squared     =  0.0132
    -------------+------------------------------           Adj R-squared =  0.0084
           Total |   391732982   208  1883331.64           Root MSE      =  1366.6

    ------------------------------------------------------------------------------
          salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             roe |   18.50119   11.12325     1.66   0.098    -3.428195    40.43057
           _cons |   963.1913   213.2403     4.52   0.000     542.7902    1383.592
    ------------------------------------------------------------------------------

• The coefficient on "roe" is the estimate of $\beta_1$. The coefficient labelled "_cons" is the estimate of $\beta_0$ (the constant term).
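The covariance-over-variance rule can be sketched in a few lines of Python. This is only an illustration: the helper name `ols_simple` and the toy data are assumptions, not part of the lecture, and the last two lines simply redo the arithmetic with the figures printed in the Stata output above.

```python
# Minimal sketch: OLS intercept and slope from sample moments.
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x              # beta1_hat = Cov(x, y) / Var(x)
    intercept = y_bar - slope * x_bar   # beta0_hat = y_bar - beta1_hat * x_bar
    return intercept, slope

# Hypothetical toy data (NOT taken from CEOSAL1.DTA).
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_toy, b1_toy = ols_simple(x_toy, y_toy)
print("toy intercept and slope:", b0_toy, b1_toy)

# Arithmetic check with the numbers printed in the Stata tables.
slope_from_table = 1342.54 / 72.565                 # covariance / variance of roe
corr_from_table = 1342.54 / (1372.345 * 8.518509)   # covariance / (sd salary * sd roe)
print("slope from table:", slope_from_table)
print("correlation from table:", corr_from_table)
```

The slope computed from the table agrees with the coefficient on roe reported by `regress` up to rounding of the printed covariance and variance.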


• These commands will make a graph of the fitted values and the actual values:

    . predict salaryhat
    (option xb assumed; fitted values)
    . twoway (line salaryhat roe) (scatter salary roe), scheme(s2mono)

[Figure: fitted values (line) and 1990 salary, thousands $ (scatter), plotted against return on equity, 88-90 avg.]


• Let's do a second example, this time showing how to read in data. We are looking at the relationship between wages and years of education.
• The data are from the dataset WAGE1.RAW on the Wooldridge site; first, look at the description file WAGE1.DES:

    WAGE1.DES

    wage      educ      exper     tenure    nonwhite  female    married   numdep
    smsa      northcen  south     west      construc  ndurman   trcommpu  trade
    services  profserv  profocc   clerocc   servocc   lwage     expersq   tenursq

      Obs:   526

      1. wage                     average hourly earnings
      2. educ                     years of education
      ...

• To read in the data, type:

    . clear
    . infile wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq using "D:\Econometrics\Textfiles\WAGE1.RAW"

• The general syntax is “infile var1 var2 ... using filename”.
• The Stata tutorial goes into more detail on infiling.

    . regress wage educ

          Source |       SS       df       MS              Number of obs =     526
    -------------+------------------------------           F(  1,   524) =  103.36
           Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
        Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
    -------------+------------------------------           Adj R-squared =  0.1632
           Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
           _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
    ------------------------------------------------------------------------------


• Now let’s explore some properties of residuals from the regression.

    . predict uhat, residuals
    . summarize uhat

        Variable |     Obs        Mean   Std. Dev.       Min        Max
    -------------+-----------------------------------------------------
            uhat |     526    4.43e-09    3.37517  -5.339615   16.60854

    . correlate uhat educ
    (obs=526)

                 |     uhat     educ
    -------------+------------------
            uhat |   1.0000
            educ |   0.0000   1.0000

• Mathematically, these observations translate into two facts:

$$\sum_{i=1}^{n} \hat{u}_i = 0$$

$$\sum_{i=1}^{n} x_i \hat{u}_i = 0$$
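A quick numerical check of these two facts, as a sketch only: the data below are made-up education and wage values, not WAGE1.RAW, and the variable names are illustrative.

```python
# Hypothetical toy data (NOT from WAGE1.RAW).
educ = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0]
wage = [4.5, 5.0, 6.2, 7.1, 7.9, 9.4, 11.0, 12.3]

n = len(educ)
x_bar = sum(educ) / n
y_bar = sum(wage) / n

# OLS slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(educ, wage))
         / sum((x - x_bar) ** 2 for x in educ))
intercept = y_bar - slope * x_bar

# Residuals: actual value minus fitted value.
uhat = [y - (intercept + slope * x) for x, y in zip(educ, wage)]

sum_resid = sum(uhat)                                   # first fact
sum_x_resid = sum(x * u for x, u in zip(educ, uhat))    # second fact
print("sum of residuals:", sum_resid)
print("sum of x times residuals:", sum_x_resid)
```

Both sums come out as zero up to floating-point rounding, which is the same pattern as the mean of 4.43e-09 and the correlation of 0.0000 in the Stata output.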


• Now we will explore some measures of model fit.

• To measure the systematic variance of y, we will use the explained sum of squares measure.

• To measure the unsystematic variance of y, we will use the residual sum of squares measure.


• The definitions of the explained and residual sum of squares are:

$$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2$$

• The sum of these two is the total sum of squares.

$$SST = SSE + SSR$$

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• The R-squared of the regression is the ratio of the explained sum of squares to the total sum of squares.

$$R^2 = \frac{SSE}{SST} = \frac{SST - SSR}{SST}$$
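The definitions above can be sketched in Python as follows. The function name `fit_measures` and the toy data are assumptions for illustration. In Stata's regression table the "Model" row is SSE, the "Residual" row is SSR and the "Total" row is SST, so the last lines redo the R-squared arithmetic from the wage regression shown earlier.

```python
def fit_measures(x, y):
    """Return SSE, SSR, SST and R-squared for a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    sse = sum((fi - y_bar) ** 2 for fi in fitted)          # explained sum of squares
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    return {"SSE": sse, "SSR": ssr, "SST": sst, "R2": sse / sst}

# Hypothetical toy data (NOT from WAGE1.RAW).
m = fit_measures([8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0],
                 [4.5, 5.0, 6.2, 7.1, 7.9, 9.4, 11.0, 12.3])
print(m)

# R-squared recomputed from the Model and Total sums of squares in the Stata table.
r2_from_output = 1179.73204 / 7160.41429
print("R-squared from table:", round(r2_from_output, 4))
```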

• The model fit measures are given in the regression output above the regression coefficients:

    . regress wage educ

          Source |       SS       df       MS              Number of obs =     526
    -------------+------------------------------           F(  1,   524) =  103.36
           Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
        Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
    -------------+------------------------------           Adj R-squared =  0.1632
           Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
           _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
    ------------------------------------------------------------------------------

$$R^2 = \frac{1179.73}{7160.41} = 0.1648$$

• It is important to note that OLS statistics depend on the units of measurement.

• For example, suppose we wanted to analyze the relationship between wages and education, but use Hong Kong dollars instead of U.S. dollars.

    . replace wage=wage*7.8

• This is the result:

    . regress wage educ

          Source |       SS       df       MS              Number of obs =     526
    -------------+------------------------------           F(  1,   524) =  103.36
           Model |  71774.8968     1  71774.8968           Prob > F      =  0.0000
        Residual |  363864.708   524  694.398297           R-squared     =  0.1648
    -------------+------------------------------           Adj R-squared =  0.1632
           Total |  435639.605   525  829.789723           Root MSE      =  26.351

    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            educ |   4.222602   .4153347    10.17   0.000     3.406677    5.038528
           _cons |  -7.057842   5.342749    -1.32   0.187    -17.55368    3.437996
    ------------------------------------------------------------------------------

• Notice that the coefficients are all multiplied by 7.8, because now they denote HK$, not US$. The standard errors and confidence intervals are scaled by 7.8 as well, while the t-statistics and the R-squared are unchanged.
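The effect of changing the units of the dependent variable can be reproduced on a small scale. The sketch below uses made-up wage and education values and an illustrative helper `ols_with_r2`; only the factor 7.8 and the two printed coefficients come from the lecture.

```python
def ols_with_r2(x, y):
    """Return (intercept, slope, R-squared) for a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ssr / sst

# Hypothetical toy data (NOT from WAGE1.RAW).
educ = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0]
wage_usd = [4.5, 5.0, 6.2, 7.1, 7.9, 9.4, 11.0, 12.3]
wage_hkd = [7.8 * w for w in wage_usd]   # same conversion as "replace wage=wage*7.8"

b0_usd, b1_usd, r2_usd = ols_with_r2(educ, wage_usd)
b0_hkd, b1_hkd, r2_hkd = ols_with_r2(educ, wage_hkd)

print("slope ratio:", b1_hkd / b1_usd)
print("intercept ratio:", b0_hkd / b0_usd)
print("R-squared (USD, HKD):", r2_usd, r2_hkd)

# Same check with the coefficients printed in the two Stata tables.
educ_coef_check = 0.5413593 * 7.8
print("0.5413593 * 7.8 =", educ_coef_check)
```

Rescaling y multiplies both coefficients by the same factor and leaves R-squared alone, because SSE and SST are both multiplied by the square of that factor.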

Break

• Last time, we continued our discussion of the simple regression model.
• Even though the simple regression model is a linear model, it can encompass some nonlinear relationships as well.
• For example, one might wish to regress the natural logarithm of a dependent variable on the independent variable.
• This permits analyzing percentage changes, as opposed to absolute changes.

• Why does taking the log of the dependent variable allow interpreting changes as percentage changes?
• This is true because (when n is small)

$$\exp(\ln C + n) \approx C(1 + n)$$

• Suppose n=0.01 (for example). Then, adding 0.01 to the natural log of C is like multiplying C by 1.01, or increasing C by 1%.

• Let’s re-visit our analysis of CEO salaries, but using log salaries this time.

    . use "D:\Econometrics\Statafiles\CEOSAL1.DTA"
    . d

    Contains data from D:\Econometrics\Statafiles\CEOSAL1.DTA
      obs:           209
     vars:            12                          16 Sep 1996 15:53
     size:         7,106 (99.1% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    salary          int    %9.0g                  1990 salary, thousands $
    pcsalary        int    %9.0g                  % change salary, 89-90
    sales           float  %9.0g                  1990 firm sales, millions $
    ...
    lsalary         float  %9.0g                  natural log of salary
    lsales          float  %9.0g                  natural log of sales
    -------------------------------------------------------------------------------

    . regress lsalary roe

          Source |       SS       df       MS              Number of obs =     209
    -------------+------------------------------           F(  1,   207) =    9.41
           Model |  2.90054039     1  2.90054039           Prob > F      =  0.0024
        Residual |  63.8216228   207  .308317018           R-squared     =  0.0435
    -------------+------------------------------           Adj R-squared =  0.0389
           Total |  66.7221632   208  .320779631           Root MSE      =  .55526

    ------------------------------------------------------------------------------
         lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             roe |   .0138626   .0045196     3.07   0.002     .0049522     .022773
           _cons |   6.712169   .0866445    77.47   0.000      6.54135    6.882987
    ------------------------------------------------------------------------------

• The coefficient on “roe” is interpreted this way: for every one-percentage-point increase in the return on equity (for example, 5% to 6%, NOT 5% to 5.05%), the CEO’s salary is expected to be about 1.39% higher (100 × 0.0138626).
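How good is the percentage-change approximation for a coefficient of this size? The sketch below compares the approximate and exact percentage changes. The values C = 1000 and n = 0.01 are illustrative assumptions; 0.0138626 is the coefficient from the output above.

```python
import math

# Approximation behind the log interpretation: exp(ln C + n) is close to C * (1 + n).
C = 1000.0   # hypothetical salary level
n = 0.01     # small change in the log
exact = math.exp(math.log(C) + n)
approx = C * (1 + n)
print("exact:", exact, "approximate:", approx)

# Applied to the estimated coefficient on roe.
beta_roe = 0.0138626
approx_pct = 100 * beta_roe                    # approximate % change in salary
exact_pct = 100 * (math.exp(beta_roe) - 1)     # exact % change implied by the log model
print("approximate % change:", approx_pct)
print("exact % change:", exact_pct)
```

For small coefficients the two numbers are very close. The gap widens as the coefficient grows, which is when the exact formula 100·(exp(β) − 1) is worth using.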


• Another property of using log salary is that the slope estimate is invariant to the scale of salary. For example, suppose we want to analyze CEO salaries in Hong Kong dollars.

    . replace salary=salary*7.8
    . replace lsalary=ln(salary)
    . regress lsalary roe

          Source |       SS       df       MS              Number of obs =     209
    -------------+------------------------------           F(  1,   207) =    9.41
           Model |  2.90053936     1  2.90053936           Prob > F      =  0.0024
        Residual |  63.8216209   207  .308317009           R-squared     =  0.0435
    -------------+------------------------------           Adj R-squared =  0.0389
           Total |  66.7221603   208  .320779617           Root MSE      =  .55526

    ------------------------------------------------------------------------------
         lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             roe |   .0138626   .0045196     3.07   0.002     .0049522     .022773
           _cons |   8.766292   .0866445   101.18   0.000     8.595473    8.937111
    ------------------------------------------------------------------------------

• The coefficient on “roe” and its standard error are unchanged; only the constant shifts, by ln(7.8) ≈ 2.054.
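The shift in the constant follows from ln(7.8·salary) = ln(7.8) + ln(salary). Here is a sketch with made-up salary and roe values and an illustrative helper `ols_simple`, followed by the same check on the two constants printed by Stata.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Hypothetical toy data (NOT from CEOSAL1.DTA).
roe = [5.0, 10.0, 15.0, 20.0, 25.0]
salary_usd = [700.0, 800.0, 1000.0, 1100.0, 1500.0]
salary_hkd = [7.8 * s for s in salary_usd]

b0_usd, b1_usd = ols_simple(roe, [math.log(s) for s in salary_usd])
b0_hkd, b1_hkd = ols_simple(roe, [math.log(s) for s in salary_hkd])

print("slopes:", b1_usd, b1_hkd)
print("intercept shift:", b0_hkd - b0_usd, "ln(7.8):", math.log(7.8))

# Same check with the two constants printed in the Stata tables.
cons_shift_from_output = 8.766292 - 6.712169
print("shift in _cons from the output:", cons_shift_from_output)
```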

• The simple regression model can also be used to estimate constant-elasticity models.

    . regress lsalary lsales

          Source |       SS       df       MS              Number of obs =     209
    -------------+------------------------------           F(  1,   207) =   55.30
           Model |  14.0661668     1  14.0661668           Prob > F      =  0.0000
        Residual |  52.6559935   207   .25437678           R-squared     =  0.2108
    -------------+------------------------------           Adj R-squared =  0.2070
           Total |  66.7221603   208  .320779617           Root MSE      =  .50436

    ------------------------------------------------------------------------------
         lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          lsales |   .2566717   .0345167     7.44   0.000     .1886224    .3247209
           _cons |    6.87612   .2883396    23.85   0.000     6.307662    7.444579
    ------------------------------------------------------------------------------

[Figure: fitted values (line) and natural log of salary (scatter), plotted against natural log of sales.]
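In a log-log regression the slope is the elasticity of y with respect to x. The sketch below builds data that follow an exact constant-elasticity relationship y = A·x^b and checks that regressing ln(y) on ln(x) recovers b. The values A = 3 and b = 0.25 and the sales figures are hypothetical.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Hypothetical constant-elasticity relationship, with no error term.
A, b = 3.0, 0.25
sales = [10.0, 50.0, 200.0, 1000.0, 5000.0]
salary = [A * s ** b for s in sales]

intercept, elasticity = ols_simple([math.log(s) for s in sales],
                                   [math.log(y) for y in salary])
print("estimated elasticity:", elasticity)
print("estimated scale factor A:", math.exp(intercept))

# Exact % change in y when x rises by 10%, compared with the rule of thumb b * 10%.
pct_change_y = 100 * (1.10 ** b - 1)
print("exact % change in y for a 10% rise in x:", pct_change_y)
```

Applied to the output above, the coefficient of about 0.257 on lsales means that 1% higher sales go with roughly 0.257% higher salary.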


• What kinds of nonlinearities are NOT possible to model with the regression model?

• The regression must be linear in the parameters, not necessarily in the variables.

• Nonlinear regression is possible, but beyond the scope of this course.

• These are OK …

$$y = \beta_0 + \beta_1 x^2 + u$$

$$\frac{1}{y} = \beta_0 + \beta_1 x + u$$

• These are NOT ok.

$$y = \beta_0 + \beta_1^2 x + u$$

$$y = \frac{1}{\beta_0 + \beta_1 x} + u$$

• The OLS estimators will be unbiased estimators (that is, their expected value is equal to the true parameter value) of $\beta_0$ and $\beta_1$ when four conditions are true.
• The model must be linear in parameters, i.e. have the form

$$y = \beta_0 + \beta_1 x + u$$

• The data set must represent a random sample from the population.

$$\{(x_i, y_i) : i = 1, 2, \ldots, n\}$$

• The conditional mean of the error term must be zero.

$$E(u \mid x) = 0$$

• There must be some variation in the independent variable (the x’s can’t all be equal to the same constant).

• Intuitively, the model tells you the effects of changes in x on y. If there are no changes in x to observe, there is no way to find this out.

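• To show that the OLS estimator of $\beta_1$ is unbiased, write:

$$\hat{\beta}_1 = \frac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x}) x_i + \sum_{i=1}^{n} (x_i - \bar{x}) u_i$$

• This reduces to (using $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2$)

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$E(\hat{\beta}_1 \mid x) = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) E(u_i \mid x)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1$$

• For $\beta_0$, averaging the sample form of the model

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

over the n observations
• yields

$$\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x} = \beta_0 + (\beta_1 - \hat{\beta}_1) \bar{x} + \bar{u}$$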

• Finally, taking the expectation of this expression yields

$$E(\hat{\beta}_0 \mid x) = \beta_0 + E(\beta_1 - \hat{\beta}_1 \mid x)\, \bar{x} + E(\bar{u} \mid x)$$

• By what we have proven before, the expectation in the second term is zero; by the zero conditional mean assumption the third term is zero. These then imply the estimator is unbiased.

• Since the estimators of $\beta_0$ and $\beta_1$ are random variables, we can calculate their variance as well.
• To simplify the calculation of their variance, we will assume conditional homoskedasticity:

$$Var(u \mid x) = \sigma^2$$

• When there is not conditional homoskedasticity, the error term is said to exhibit heteroskedasticity.

• That is to say, the variance of the error term is dependent on x.

[Figures: two scatter plots of y against x, each with fitted values.]
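Unbiasedness is a statement about the average of the estimates over repeated samples, which a small simulation can illustrate. Everything in the sketch below is an assumption made for illustration: the true parameters, the fixed x values, normal homoskedastic errors, the seed and the number of replications.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

random.seed(0)                                    # fixed seed so the run is repeatable
beta0_true, beta1_true, sigma = 1.0, 0.5, 2.0     # hypothetical true values
x = [float(i) for i in range(1, 21)]              # x held fixed across replications
reps = 2000

b0_draws, b1_draws = [], []
for _ in range(reps):
    # Draw a new sample of errors with E(u|x) = 0 and constant variance.
    y = [beta0_true + beta1_true * xi + random.gauss(0.0, sigma) for xi in x]
    b0, b1 = ols_simple(x, y)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / reps
mean_b1 = sum(b1_draws) / reps
print("average intercept estimate:", mean_b0, "(true value", beta0_true, ")")
print("average slope estimate:", mean_b1, "(true value", beta1_true, ")")
```

Individual estimates vary from sample to sample, but their averages settle close to the true values.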

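• How can we find the variance of the estimator of $\beta_1$?

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$Var(\hat{\beta}_1 \mid x) = Var(\beta_1 \mid x) + \frac{Var\left(\sum_{i=1}^{n} (x_i - \bar{x}) u_i \mid x\right)}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$

• This can be simplified ($\beta_1$ is a constant, so its variance is zero, and under random sampling the errors are uncorrelated across observations):

$$Var(\hat{\beta}_1 \mid x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, Var(u_i \mid x)}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$

$$Var(\hat{\beta}_1 \mid x) = \frac{\sigma^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$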
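As a rough numerical check of the variance expression just derived, the sketch below simulates many samples with homoskedastic normal errors and compares the variance of the slope estimates with the formula. The parameter values, the x values, the seed and the number of replications are all illustrative assumptions.

```python
import random
import statistics

random.seed(1)                                    # fixed seed so the run is repeatable
beta0_true, beta1_true, sigma = 1.0, 0.5, 2.0     # hypothetical true values
x = [float(i) for i in range(1, 21)]              # x held fixed across replications
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)          # sum of squared deviations of x

reps = 5000
slopes = []
for _ in range(reps):
    u = [random.gauss(0.0, sigma) for _ in x]
    y = [beta0_true + beta1_true * xi + ui for xi, ui in zip(x, u)]
    y_bar = sum(y) / n
    slopes.append(sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx)

sim_var = statistics.variance(slopes)             # variance across replications
theory_var = sigma ** 2 * sxx / sxx ** 2          # the last expression in the derivation
print("simulated variance of the slope:", sim_var)
print("variance from the formula:", theory_var)
```

The two numbers agree up to simulation noise, and the gap should shrink as the number of replications grows.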

• Finally,

$$Var(\hat{\beta}_1 \mid x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

• It is sometimes desirable to restrict the linear regression to run through the origin (x=0, y=0). This is equivalent to saying that $\beta_0 = 0$:

$$y = \beta x + u$$

• An example might be modelling income and savings. If your income is zero, your savings must be zero as well.

[Figure: plot of y against x.]

• The problem can be solved by minimizing squared residuals:

$$\frac{d}{d\hat{\beta}} \sum_{i=1}^{n} (y_i - \hat{\beta} x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \hat{\beta} x_i) x_i = 0$$

$$\sum_{i=1}^{n} (y_i - \hat{\beta} x_i) x_i = 0$$

• Now for an example of origin regression.

    . use "D:\Econometrics\Statafiles\SAVING.DTA"
    . d

    Contains data from D:\Econometrics\Statafiles\SAVING.DTA
      obs:           100
     vars:             7                          30 Oct 1996 10:23
     size:         1,400 (99.7% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    sav             int    %9.0g                  annual savings, $
    inc             int    %9.0g                  annual income, $
    size            byte   %9.0g                  family size
    educ            byte   %9.0g                  years educ, household head
    age             byte   %9.0g                  age of household head
    black           byte   %9.0g                  =1 if household head is black
    cons            int    %9.0g                  annual consumption, $
    -------------------------------------------------------------------------------
    Sorted by:

• We start with a normal regression (with a constant term).

    . regress sav inc

          Source |       SS       df       MS              Number of obs =     100
    -------------+------------------------------           F(  1,    98) =    6.49
           Model |  66368437.0     1  66368437.0           Prob > F      =  0.0124
        Residual |  1.0019e+09    98  10223460.8           R-squared     =  0.0621
    -------------+------------------------------           Adj R-squared =  0.0526
           Total |  1.0683e+09    99  10790581.8           Root MSE      =  3197.4

    ------------------------------------------------------------------------------
             sav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             inc |   .1466283   .0575488     2.55   0.012     .0324247     .260832
           _cons |   124.8424   655.3931     0.19   0.849    -1175.764    1425.449
    ------------------------------------------------------------------------------

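• For regression through the origin, the slope estimator is

$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}$$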
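The origin-regression slope is simple enough to compute by hand. The sketch below uses four made-up observations, not SAVING.DTA, with an illustrative helper name, and also confirms that the estimate satisfies the first-order condition Σ(yᵢ − β̂xᵢ)xᵢ = 0.

```python
def ols_through_origin(x, y):
    """Slope of the least-squares line through the origin: sum(x*y) / sum(x^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Hypothetical toy data (NOT from SAVING.DTA).
x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [0.9, 2.1, 2.9, 4.2]

beta_hat = ols_through_origin(x_toy, y_toy)

# First-order condition of the minimization problem, evaluated at beta_hat.
foc = sum((yi - beta_hat * xi) * xi for xi, yi in zip(x_toy, y_toy))

print("slope through the origin:", beta_hat)
print("first-order condition at the estimate:", foc)
```

Unlike the regression with a constant, the residuals from an origin regression need not sum to zero. Only the condition involving x is imposed.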

• Now try an origin regression.

    . regress sav inc, noconstant

          Source |       SS       df       MS              Number of obs =     100
    -------------+------------------------------           F(  1,    99) =   31.26
           Model |   316431274     1   316431274           Prob > F      =  0.0000
        Residual |  1.0023e+09    99  10123940.5           R-squared     =  0.2400
    -------------+------------------------------           Adj R-squared =  0.2323
           Total |  1.3187e+09   100  13187013.9           Root MSE      =  3181.8

    ------------------------------------------------------------------------------
             sav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             inc |   .1561974   .0279389     5.59   0.000     .1007606    .2116343
    ------------------------------------------------------------------------------

[Figure: annual savings, $ (scatter) and fitted values, plotted against annual income, $.]
