Lecture4-5


Page 1: Lecture4-5

University of Hong Kong Introductory Econometrics (ECON0701), Spring 2014

27 January 2014

The Simple Regression Model

• Last time, we introduced the simple regression model.

• The simple regression model takes the form

$$y = \beta_0 + \beta_1 x + u$$

• where the estimates of $\beta_0$ and $\beta_1$ are chosen so that the fitted values of the model are as close to the data as possible (minimizing the sum of the squared residuals over all the data).

The Simple Regression Model

• Now we will do an empirical example of a simple regression analysis.

• We are interested in exploring the relationship between CEO salaries, the dependent variable, and return on equity, the independent variable.

• The idea is to find out whether CEOs in more profitable companies (with a higher return on equity) get paid more.

The Simple Regression Model

Stata 7.0, Statistics/Data Analysis, Special Edition
Copyright 1984-2002 Stata Corporation
4905 Lakeway Drive, College Station, Texas
979-696-4600
979-696-4601 (fax)

6-user Stata for Windows (network) perpetual license:
       Serial number:  81970521252
         Licensed to:  School of Economics & Finance
                       University of Hong Kong

Notes:
      1.  (/m# option or -set memory-) 0.98 MB allocated to data
      2.  (/v# option or -set maxvar-) 5000 maximum variables

. use "D:\Econometrics\Statafiles\CEOSAL1.DTA"

Page 2: Lecture4-5

The Simple Regression Model

. d

Contains data from D:\Econometrics\Statafiles\CEOSAL1.DTA
  obs:           209
 vars:            12                          16 Sep 1996 15:53
 size:         7,106 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
salary          int    %9.0g                  1990 salary, thousands $
pcsalary        int    %9.0g                  % change salary, 89-90
sales           float  %9.0g                  1990 firm sales, millions $
roe             float  %9.0g                  return on equity, 88-90 avg
pcroe           float  %9.0g                  % change roe, 88-90
ros             int    %9.0g                  return on firm's stock, 88-90
indus           byte   %9.0g                  =1 if industrial firm
finance         byte   %9.0g                  =1 if financial firm
consprod        byte   %9.0g                  =1 if consumer product firm
utility         byte   %9.0g                  =1 if transport. or utilties
lsalary         float  %9.0g                  natural log of salary
lsales          float  %9.0g                  natural log of sales
-------------------------------------------------------------------------------
Sorted by:

The Simple Regression Model

. summarize salary, detail

                  1990 salary, thousands $
-------------------------------------------------------------
      Percentiles      Smallest
 1%          333            223
 5%          448            256
10%          525            333       Obs                 209
25%          736            360       Sum of Wgt.         209

50%         1039                      Mean            1281.12
                        Largest       Std. Dev.      1372.345
75%         1407           4143
90%         1900           6640       Variance        1883332
95%         2327          11233       Skewness       6.854923
99%         6640          14822       Kurtosis       60.54128

The Simple Regression Model

. summarize roe, detail

                 return on equity, 88-90 avg
-------------------------------------------------------------
      Percentiles      Smallest
 1%          2.1             .5
 5%          6.8            1.9
10%          8.9            2.1       Obs                 209
25%         12.4            2.9       Sum of Wgt.         209

50%         15.5                      Mean           17.18421
                        Largest       Std. Dev.      8.518509
75%           20           44.4
90%         26.8           44.5       Variance       72.56499
95%         35.1           48.1       Skewness        1.56082
99%         44.5           56.3       Kurtosis       6.678555

Page 3: Lecture4-5

The Simple Regression Model

• Are CEO salaries and return on equity correlated?

. correlate salary roe
(obs=209)

             |   salary      roe
-------------+------------------
      salary |   1.0000
         roe |   0.1148   1.0000

• What is the covariance?

. correlate salary roe, covariance
(obs=209)

             |   salary      roe
-------------+------------------
      salary |  1.9e+06
         roe |  1342.54   72.565

The Simple Regression Model

• We can use the variance-covariance table to estimate $\beta_1$:

$$\hat{\beta}_1 = \frac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} = \frac{1342.54}{72.565} \approx 18.5$$

• The equation to be estimated is

$$\text{salary} = \beta_0 + \beta_1 \, \text{roe} + u$$

The Simple Regression Model

. regress salary roe

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(  1,   207) =    2.77
       Model |  5166419.04     1  5166419.04           Prob > F      =  0.0978
    Residual |   386566563   207  1867471.32           R-squared     =  0.0132
-------------+------------------------------           Adj R-squared =  0.0084
       Total |   391732982   208  1883331.64           Root MSE      =  1366.6

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         roe |   18.50119   11.12325     1.66   0.098    -3.428195    40.43057
       _cons |   963.1913   213.2403     4.52   0.000     542.7902    1383.592
------------------------------------------------------------------------------

• The coefficient on "roe" is the estimate of $\beta_1$. The coefficient labelled "_cons" is the estimate of $\beta_0$ (the constant term).
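The link between the summary statistics and the regression coefficients can be checked by hand. The sketch below is not part of the original Stata session; it simply re-enters the figures printed in the `summarize` and `correlate` output above (the variable names are chosen here for illustration) and recomputes the slope, the intercept and the correlation.

```python
# Summary statistics copied from the Stata output for CEOSAL1 above.
cov_salary_roe = 1342.54      # sample covariance of salary and roe
var_roe = 72.565              # sample variance of roe
mean_salary = 1281.12         # sample mean of salary
mean_roe = 17.18421           # sample mean of roe
sd_salary = 1372.345          # sample standard deviation of salary
sd_roe = 8.518509             # sample standard deviation of roe

# Slope: sample covariance divided by the sample variance of the regressor.
slope = cov_salary_roe / var_roe

# Intercept: the fitted line passes through the point of sample means.
intercept = mean_salary - slope * mean_roe

# Correlation: covariance scaled by both standard deviations.
correlation = cov_salary_roe / (sd_salary * sd_roe)

print(f"slope       = {slope:.4f}")
print(f"intercept   = {intercept:.3f}")
print(f"correlation = {correlation:.4f}")
print(f"R-squared   = {correlation ** 2:.4f}")
```

Up to rounding in the inputs, the results should agree with the `roe` and `_cons` coefficients and with the R-squared reported by `regress salary roe`; in a simple regression the R-squared is the squared correlation between the two variables.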

Page 4: Lecture4-5

The Simple Regression Model

• These commands will make a graph of the fitted values and the actual values:

. predict salaryhat
(option xb assumed; fitted values)

. twoway (line salaryhat roe) (scatter salary roe), scheme(s2mono)

[Figure: fitted values (line) and 1990 salary, thousands $ (scatter), plotted against return on equity, 88-90 avg.]

The Simple Regression Model

• Let's do a second example, this time showing how to read in data. We are looking at the relationship between wages and years of education.

• The data are from the dataset WAGE1.RAW on the Wooldridge site; first, look at the description file WAGE1.DES:

WAGE1.DES

wage      educ      exper     tenure    nonwhite  female    married   numdep
smsa      northcen  south     west      construc  ndurman   trcommpu  trade
services  profserv  profocc   clerocc   servocc   lwage     expersq   tenursq

Obs: 526

1. wage    average hourly earnings
2. educ    years of education
...

Page 5: Lecture4-5

The Simple Regression Model

• To read in the data, type:

. clear
. infile wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq using "D:\Econometrics\Textfiles\WAGE1.RAW"

The Simple Regression Model

• The general syntax is “infile var1 var2 ... using filename”.

• The Stata tutorial goes into more detail on infiling.

. regress wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------

The Simple Regression Model

• Now let’s explore some properties of residuals from the regression.

. predict uhat, residuals

. summarize uhat

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        uhat |     526    4.43e-09    3.37517  -5.339615   16.60854

. correlate uhat educ
(obs=526)

             |     uhat     educ
-------------+------------------
        uhat |   1.0000
        educ |   0.0000   1.0000

The Simple Regression Model

• Mathematically, these observations translate into two facts:

$$\sum_{i=1}^{n} \hat{u}_i = 0$$

$$\sum_{i=1}^{n} x_i \hat{u}_i = 0$$
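These two facts hold for any data set once the coefficients are computed by OLS with a constant term. A minimal sketch, using a small hypothetical data set invented for illustration (it is not the WAGE1 data) and the closed-form OLS expressions:

```python
# Hypothetical toy data, chosen only for illustration.
x = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0]
y = [3.1, 4.0, 5.9, 5.2, 7.5, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept from the usual closed-form expressions.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals: actual values minus fitted values.
u_hat = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Both sums should be zero up to floating-point rounding error.
sum_u = sum(u_hat)
sum_xu = sum(xi * ui for xi, ui in zip(x, u_hat))

print(f"sum of residuals         = {sum_u:.2e}")
print(f"sum of x times residuals = {sum_xu:.2e}")
```

The tiny nonzero values that a computer reports (such as the mean of 4.43e-09 for `uhat` in the Stata output) are rounding error, not a failure of the property.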

Page 6: Lecture4-5

The Simple Regression Model

• Now we will explore some measures of model fit.

• To measure the systematic variance of y, we will use the explained sum of squares measure.

• To measure the unsystematic variance of y, we will use the residual sum of squares measure.

The Simple Regression Model

• The definitions of the explained and residual sum of squares are:

$$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2$$

The Simple Regression Model

• The sum of these two is the total sum of squares.

$$SST = SSE + SSR$$

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

• The R-squared of the regression is the ratio of the explained sum of squares to the total sum of squares.

$$R^2 = \frac{SSE}{SST} = \frac{SST - SSR}{SST}$$
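As a rough check of these definitions, the sketch below re-enters the sums of squares from the `regress wage educ` output shown earlier in the lecture. The adjusted R-squared and root MSE formulas are the standard degrees-of-freedom versions; they are not derived in these slides and are included only to show where the remaining numbers in the output header come from.

```python
import math

# Sums of squares copied from the "regress wage educ" output earlier in the lecture.
sse = 1179.73204   # "Model" row: explained sum of squares
ssr = 5980.68225   # "Residual" row: residual sum of squares
sst = 7160.41429   # "Total" row: total sum of squares
n = 526            # number of observations

# The decomposition SST = SSE + SSR should hold up to rounding.
decomposition_gap = sst - (sse + ssr)

# R-squared as the explained share of total variation.
r_squared = sse / sst

# Degrees-of-freedom corrections: n - 2 for the residual, n - 1 for the total.
adj_r_squared = 1 - (ssr / (n - 2)) / (sst / (n - 1))
root_mse = math.sqrt(ssr / (n - 2))

print(f"SST - (SSE + SSR)  = {decomposition_gap:.6f}")
print(f"R-squared          = {r_squared:.4f}")
print(f"Adjusted R-squared = {adj_r_squared:.4f}")
print(f"Root MSE           = {root_mse:.4f}")
```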

Page 7: Lecture4-5

The Simple Regression Model

• The model fit measures are given in the regression output above the regression coefficients:

. regress wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------

$$R^2 = \frac{1179.73}{7160.41} = 0.1648$$

The Simple Regression Model

• It is important to note that OLS statistics depend on the units of measurement.

• For example, suppose we wanted to analyze the relationship between wages and education, but use Hong Kong dollars instead of U.S. dollars.

. replace wage=wage*7.8

The Simple Regression Model

• This is the result:

. regress wage educ

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  71774.8968     1  71774.8968           Prob > F      =  0.0000
    Residual |  363864.708   524  694.398297           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  435639.605   525  829.789723           Root MSE      =  26.351

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   4.222602   .4153347    10.17   0.000     3.406677    5.038528
       _cons |  -7.057842   5.342749    -1.32   0.187    -17.55368    3.437996
------------------------------------------------------------------------------

• Notice that the coefficients and their standard errors are all multiplied by 7.8, because now they denote HK$, not US$. The t statistics and the R-squared are unchanged.
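Why the coefficients scale this way can be sketched directly from the OLS formulas. Here $c$ stands for the conversion factor ($c = 7.8$ in the example) and a star marks quantities from the rescaled regression; both pieces of notation are introduced only for this illustration.

$$y_i^* = c\,y_i \;\Rightarrow\; \hat{\beta}_1^* = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, c\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = c\,\hat{\beta}_1, \qquad \hat{\beta}_0^* = c\,\bar{y} - c\,\hat{\beta}_1 \bar{x} = c\,\hat{\beta}_0$$

$$\hat{u}_i^* = c\,\hat{u}_i \;\Rightarrow\; SSR^* = c^2\, SSR, \quad SSE^* = c^2\, SSE, \quad SST^* = c^2\, SST, \quad R^{2*} = \frac{c^2\, SSE}{c^2\, SST} = R^2$$

With the numbers from the two outputs: $7.8 \times 0.5413593 \approx 4.2226$ and $7.8 \times (-0.9048516) \approx -7.0578$, while the sums of squares grow by $7.8^2 = 60.84$, for example $60.84 \times 1179.73204 \approx 71774.9$.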

Page 8: Lecture4-5

Break

The Simple Regression Model

• Last time, we continued our discussion of the simple regression model.

• Even though the simple regression model is a linear model, it can encompass some nonlinear relationships as well.

• For example, one might wish to regress the natural logarithm of a dependent variable on the independent variable.

• This permits analyzing percentage changes, as opposed to absolute changes.

The Simple Regression Model

• Why does taking the log of the dependent variable allow interpreting changes as percentage changes?

• This is true because (when n is small)

$$\exp(\ln C + n) \approx C(1 + n)$$

• Suppose n=0.01 (for example). Then, adding 0.01 to the natural log of C is like multiplying C by 1.01, or increasing C by 1%.

The Simple Regression Model

• Let’s re-visit our analysis of CEO salaries, but using log salaries this time.

. use "D:\Econometrics\Statafiles\CEOSAL1.DTA"

. d

Contains data from D:\Econometrics\Statafiles\CEOSAL1.DTA
  obs:           209
 vars:            12                          16 Sep 1996 15:53
 size:         7,106 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
salary          int    %9.0g                  1990 salary, thousands $
pcsalary        int    %9.0g                  % change salary, 89-90
sales           float  %9.0g                  1990 firm sales, millions $
...
lsalary         float  %9.0g                  natural log of salary
lsales          float  %9.0g                  natural log of sales
-------------------------------------------------------------------------------
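How good the log approximation is, and how it deteriorates as n grows, can be seen numerically. The sketch below is an illustration only; the starting level C = 1000 and the particular values of n are arbitrary choices made here.

```python
import math

C = 1000.0  # hypothetical starting level, e.g. a salary of 1000

for n in (0.01, 0.05, 0.10, 0.30):
    exact = math.exp(math.log(C) + n)    # level implied by adding n to ln(C)
    approx = C * (1 + n)                 # first-order approximation C(1 + n)
    exact_pct = 100 * (exact / C - 1)    # exact percentage change
    approx_pct = 100 * (approx / C - 1)  # approximate percentage change
    print(f"n = {n:.2f}: exact = {exact_pct:6.2f}%, approximation = {approx_pct:6.2f}%")
```

For small n the two columns are nearly identical; for larger log changes the exact percentage change, 100 × (exp(n) − 1), is noticeably bigger than 100 × n.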

Page 9: Lecture4-5

The Simple Regression Model

. regress lsalary roe

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(  1,   207) =    9.41
       Model |  2.90054039     1  2.90054039           Prob > F      =  0.0024
    Residual |  63.8216228   207  .308317018           R-squared     =  0.0435
-------------+------------------------------           Adj R-squared =  0.0389
       Total |  66.7221632   208  .320779631           Root MSE      =  .55526

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         roe |   .0138626   .0045196     3.07   0.002     .0049522     .022773
       _cons |   6.712169   .0866445    77.47   0.000      6.54135    6.882987
------------------------------------------------------------------------------

• The coefficient on “roe” is interpreted this way: for every one-percentage-point increase in the return on equity (for example, 5% to 6%, NOT 5% to 5.05%), the CEO’s salary is expected to be about 1.39% higher (100 × 0.0138626).

The Simple Regression Model

• Another property of using log salary is that the slope coefficient is invariant to the scale of salary; only the constant term changes. For example, suppose we want to analyze CEO salaries in Hong Kong dollars.

. replace salary=salary*7.8
. replace lsalary=ln(salary)
. regress lsalary roe

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(  1,   207) =    9.41
       Model |  2.90053936     1  2.90053936           Prob > F      =  0.0024
    Residual |  63.8216209   207  .308317009           R-squared     =  0.0435
-------------+------------------------------           Adj R-squared =  0.0389
       Total |  66.7221603   208  .320779617           Root MSE      =  .55526

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         roe |   .0138626   .0045196     3.07   0.002     .0049522     .022773
       _cons |   8.766292   .0866445   101.18   0.000     8.595473    8.937111
------------------------------------------------------------------------------
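The reason only the constant moves can be written out in one line, using the model for log salary from the previous regression:

$$\ln(7.8 \cdot \text{salary}) = \ln 7.8 + \ln(\text{salary}) = (\beta_0 + \ln 7.8) + \beta_1\, \text{roe} + u$$

Since $\ln 7.8 \approx 2.0541$, the constant should rise from 6.712169 to about $6.712169 + 2.0541 \approx 8.7663$, which agrees with the reported 8.766292 up to rounding, while the slope and its standard error are untouched.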

Page 10: Lecture4-5

The Simple Regression Model

• The simple regression model can also be used to estimate constant-elasticity models.

. regress lsalary lsales

      Source |       SS       df       MS              Number of obs =     209
-------------+------------------------------           F(  1,   207) =   55.30
       Model |  14.0661668     1  14.0661668           Prob > F      =  0.0000
    Residual |  52.6559935   207   .25437678           R-squared     =  0.2108
-------------+------------------------------           Adj R-squared =  0.2070
       Total |  66.7221603   208  .320779617           Root MSE      =  .50436

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .2566717   .0345167     7.44   0.000     .1886224    .3247209
       _cons |    6.87612   .2883396    23.85   0.000     6.307662    7.444579
------------------------------------------------------------------------------

[Figure: fitted values and natural log of salary plotted against natural log of sales.]
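The phrase "constant elasticity" refers to the slope of a log-log regression. A short sketch of why, for the regression of lsalary on lsales above:

$$\ln(\text{salary}) = \beta_0 + \beta_1 \ln(\text{sales}) + u \;\Rightarrow\; \beta_1 = \frac{d \ln(\text{salary})}{d \ln(\text{sales})} = \frac{d\,\text{salary} / \text{salary}}{d\,\text{sales} / \text{sales}}$$

The slope is therefore the percentage change in salary per one percent change in sales, and it is the same at every level of sales. With the estimate $\hat{\beta}_1 \approx 0.257$, a 1% increase in firm sales is associated with roughly a 0.257% higher CEO salary.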

The Simple Regression Model

• What kinds of nonlinearities are NOT possible to model with the regression model?

• The regression must be linear in the parameters, not necessarily in the variables.

• Nonlinear regression is possible, but beyond the scope of this course.

Page 11: Lecture4-5

The Simple Regression Model

• These are OK …

$$y = \beta_0 + \beta_1 x^2 + u$$

$$\frac{1}{y} = \beta_0 + \beta_1 x + u$$

• These are NOT ok.

$$y = \beta_0 + \beta_1^2 x + u$$

$$y = \frac{1}{\beta_0 + \beta_1 x} + u$$

The Simple Regression Model

• The OLS estimators will be unbiased estimators (that is, their expected value is equal to the true parameter value) of $\beta_0$ and $\beta_1$ when four conditions are true.

• The model must be linear in parameters, i.e. have the form

$$y = \beta_0 + \beta_1 x + u$$

The Simple Regression Model

• The data set must represent a random sample from the population.

$$\{(x_i, y_i) : i = 1, 2, \ldots, n\}$$

• The conditional mean of the error term must be zero.

$$E(u \mid x) = 0$$

The Simple Regression Model

• There must be some variation in the independent variable (the x’s can’t all be equal to the same constant).

• Intuitively, the model tells you the effects of changes in x on y. If there are no changes in x to observe, there is no way to find this out.

Page 12: Lecture4-5

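The Simple Regression Model

• To show that the OLS estimator of $\beta_1$ is unbiased, write:

$$\hat{\beta}_1 = \frac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum_{i=1}^{n} (x_i - \bar{x})\, x_i + \sum_{i=1}^{n} (x_i - \bar{x})\, u_i$$

The Simple Regression Model

• This reduces to

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

• because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x})\, x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2$. Taking the expectation conditional on x, and using the zero conditional mean assumption,

$$E(\hat{\beta}_1 \mid x) = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, E(u_i \mid x)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta_1$$

The Simple Regression Model

• For $\beta_0$, averaging the sample form of the model over the observations

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

• yields

$$\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x} = \beta_0 + (\beta_1 - \hat{\beta}_1)\, \bar{x} + \bar{u}$$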
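Unbiasedness is a statement about the average of the estimates over many random samples, which a small simulation can illustrate. The sketch below is only an illustration under assumptions made here: the "true" parameters, the sample size, the uniform distribution for x and the standard normal errors are all invented for the example and do not come from the lecture data.

```python
import random

random.seed(12345)            # fixed seed so the run is reproducible

TRUE_B0, TRUE_B1 = 1.0, 2.0   # hypothetical "true" population parameters
N_OBS, N_REPS = 50, 2000      # sample size and number of simulated samples


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


intercepts, slopes = [], []
for _ in range(N_REPS):
    # Draw a fresh random sample; u is drawn independently of x, so E(u|x) = 0.
    x = [random.uniform(0, 10) for _ in range(N_OBS)]
    u = [random.gauss(0, 1) for _ in range(N_OBS)]
    y = [TRUE_B0 + TRUE_B1 * xi + ui for xi, ui in zip(x, u)]
    b0, b1 = ols(x, y)
    intercepts.append(b0)
    slopes.append(b1)

mean_b0 = sum(intercepts) / N_REPS
mean_b1 = sum(slopes) / N_REPS
print(f"average intercept estimate: {mean_b0:.3f} (true value {TRUE_B0})")
print(f"average slope estimate:     {mean_b1:.3f} (true value {TRUE_B1})")
```

Any single sample gives estimates that miss the true values, but the averages across the simulated samples should sit very close to them, which is what unbiasedness claims.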

Page 13: Lecture4-5

The Simple Regression Model

• Finally, taking the expectation of this expression yields

$$E(\hat{\beta}_0 \mid x) = \beta_0 + E(\beta_1 - \hat{\beta}_1 \mid x)\, \bar{x} + E(\bar{u} \mid x)$$

• By what we have proven before, the expectation in the second term is zero; by the zero conditional mean assumption the third term is zero. These then imply the estimator is unbiased.

The Simple Regression Model

• Since the estimators of $\beta_0$ and $\beta_1$ are random variables, we can calculate their variance as well.

• To simplify the calculation of their variance, we will assume conditional homoskedasticity:

$$Var(u \mid x) = \sigma^2$$

The Simple Regression Model

• When there is not conditional homoskedasticity, the error term is said to exhibit heteroskedasticity.

• That is to say, the variance of the error term is dependent on x.

[Figure: y and fitted values plotted against x.]

Page 14: Lecture4-5

[Figure: y and fitted values plotted against x.]

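The Simple Regression Model

• How can we find the variance of the estimator of $\beta_1$?

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$Var(\hat{\beta}_1 \mid x) = Var(\beta_1 \mid x) + \frac{Var\left(\sum_{i=1}^{n} (x_i - \bar{x})\, u_i \mid x\right)}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$

The Simple Regression Model

• This can be simplified. $\beta_1$ is a constant, so $Var(\beta_1 \mid x) = 0$; under random sampling the errors are independent across observations, so the variance of the sum is the sum of the variances:

$$Var(\hat{\beta}_1 \mid x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2\, Var(u_i \mid x)}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$

$$Var(\hat{\beta}_1 \mid x) = \frac{\sigma^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}$$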
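The variance expression connects directly to the "Std. Err." column in the regression output. As a rough check, the sketch below re-enters figures from the CEOSAL1 output earlier in the lecture and replaces the unknown error variance with the residual mean square; that substitution is the standard estimate, although these slides do not derive it, and the variable names are chosen here for illustration.

```python
import math

# Figures copied from the CEOSAL1 output earlier in the lecture.
n = 209                   # observations
var_roe = 72.56499        # sample variance of roe (summarize roe, detail)
ms_residual = 1867471.32  # residual mean square (regress salary roe)

# Stata's sample variance divides by n - 1, so the sum of squared
# deviations of roe from its mean is (n - 1) times the variance.
sst_x = (n - 1) * var_roe

# Estimated variance of the slope: error-variance estimate divided by sst_x.
var_slope = ms_residual / sst_x
se_slope = math.sqrt(var_slope)

print(f"sum of squared deviations of roe = {sst_x:.2f}")
print(f"standard error of the slope      = {se_slope:.4f}")
```

The result should match the standard error of 11.12325 reported for `roe`, up to rounding in the inputs.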

Page 15: Lecture4-5

The Simple Regression Model

• Finally,

$$Var(\hat{\beta}_1 \mid x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The Simple Regression Model

• It is sometimes desirable to restrict the linear regression to run through the origin (x=0, y=0). This is equivalent to saying that $\beta_0 = 0$:

$$y = \beta x + u$$

• An example might be modelling income and savings. If your income is zero, your savings must be zero as well.

The Simple Regression Model

• The problem can be solved by minimizing squared residuals:

$$\frac{d}{d\hat{\beta}} \sum_{i=1}^{n} (y_i - \hat{\beta} x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \hat{\beta} x_i)\, x_i = 0$$

$$\sum_{i=1}^{n} (y_i x_i - \hat{\beta} x_i^2) = 0$$

[Figure: y plotted against x.]

Page 16: Lecture4-5

The Simple Regression Model

• Now for an example of origin regression.

. use "D:\Econometrics\Statafiles\SAVING.DTA"

. d

Contains data from D:\Econometrics\Statafiles\SAVING.DTA
  obs:           100
 vars:             7                          30 Oct 1996 10:23
 size:         1,400 (99.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sav             int    %9.0g                  annual savings, $
inc             int    %9.0g                  annual income, $
size            byte   %9.0g                  family size
educ            byte   %9.0g                  years educ, household head
age             byte   %9.0g                  age of household head
black           byte   %9.0g                  =1 if household head is black
cons            int    %9.0g                  annual consumption, $
-------------------------------------------------------------------------------
Sorted by:

The Simple Regression Model

• We start with a normal regression (with a constant term).

. regress sav inc

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =    6.49
       Model |  66368437.0     1  66368437.0           Prob > F      =  0.0124
    Residual |  1.0019e+09    98  10223460.8           R-squared     =  0.0621
-------------+------------------------------           Adj R-squared =  0.0526
       Total |  1.0683e+09    99  10790581.8           Root MSE      =  3197.4

------------------------------------------------------------------------------
         sav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .1466283   .0575488     2.55   0.012     .0324247     .260832
       _cons |   124.8424   655.3931     0.19   0.849    -1175.764    1425.449
------------------------------------------------------------------------------

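• For a regression through the origin, minimizing the sum of squared residuals gives the slope estimator

$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}$$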
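The origin-regression formula is simple enough to compute by hand. The sketch below uses a tiny hypothetical data set (invented here for illustration, not the SAVING data) to compute the slope and to show one consequence of dropping the constant: the residuals are still orthogonal to x, but they no longer have to sum to zero.

```python
# Hypothetical toy data: income (x) and savings (y), for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.3, 0.5, 0.4, 0.9]

# Slope of the regression through the origin: sum(y * x) / sum(x^2).
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi ** 2 for xi in x)
slope_origin = sum_xy / sum_xx

# Residuals from the origin regression (there is no intercept to subtract).
resid = [yi - slope_origin * xi for xi, yi in zip(x, y)]

# The first-order condition forces sum(x * residual) = 0, but without a
# constant term the residuals themselves need not sum to zero.
sum_x_resid = sum(xi * ri for xi, ri in zip(x, resid))
sum_resid = sum(resid)

print(f"slope through the origin = {slope_origin:.4f}")
print(f"sum of x * residual      = {sum_x_resid:.2e}")
print(f"sum of residuals         = {sum_resid:.4f}")
```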

Page 17: Lecture4-5

The Simple Regression Model

• Now try an origin regression.

. regress sav inc, noconstant

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    99) =   31.26
       Model |   316431274     1   316431274           Prob > F      =  0.0000
    Residual |  1.0023e+09    99  10123940.5           R-squared     =  0.2400
-------------+------------------------------           Adj R-squared =  0.2323
       Total |  1.3187e+09   100  13187013.9           Root MSE      =  3181.8

------------------------------------------------------------------------------
         sav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         inc |   .1561974   .0279389     5.59   0.000     .1007606    .2116343
------------------------------------------------------------------------------

[Figure: annual savings, $ and fitted values plotted against annual income, $.]
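One caution when comparing the two savings regressions: with the noconstant option, the "Total" row is no longer centered at the sample mean of the dependent variable, which is why it reports 1.3187e+09 with 100 degrees of freedom rather than 1.0683e+09 with 99. A sketch of the two definitions, where the label $SST_0$ is introduced here only for illustration:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad \text{versus} \qquad SST_0 = \sum_{i=1}^{n} y_i^2$$

$$R^2_{\text{noconstant}} = 1 - \frac{SSR}{SST_0} = \frac{316431274}{1.3187 \times 10^{9}} \approx 0.2400$$

Because the two R-squared values are measured against different totals, the jump from 0.0621 to 0.2400 does not by itself mean that the origin regression fits the data better; the residual sums of squares (1.0019e+09 and 1.0023e+09) are almost identical.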