1/59: Topic 1.2 – Extensions of the Linear Regression Model

Microeconometric Modeling

William Greene
Stern School of Business
New York University
New York NY USA

1.2 Extensions of the Linear Regression Model

2/59: Topic 1.2 – Extensions of the Linear Regression Model

Concepts

• Multiple Imputation
• Robust Covariance Matrices
• Bootstrap
• Maximum Likelihood
• Method of Moments
• Estimating Individual Outcomes

Models

• Linear Regression Model
• Quantile Regression
• Stochastic Frontier

3/59: Topic 1.2 – Extensions of the Linear Regression Model

Multiple Imputation for Missing Data

4/59: Topic 1.2 – Extensions of the Linear Regression Model

Imputed Covariance Matrix

5/59: Topic 1.2 – Extensions of the Linear Regression Model

Implementation

SAS, Stata:
• Create full data sets with imputed values inserted. M = 5 is the familiar standard number of imputed data sets.
• Data are replicated and redistributed.
• SAS: Standard procedure and code distributed.
• Stata: Elaborate imputation equations, M = 5.

NLOGIT:
• Create an internal map of the missing values and a set of engines for filling missing values.
• Loop through imputed data sets during estimation.
• M may be arbitrary; memory usage and data storage are independent of M.
• Data may be replicated.
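Looping over M imputed data sets leaves M sets of estimates that have to be combined. The sketch below assumes Rubin's combining rules for a single coefficient (the slides do not state which rule the packages apply): the pooled estimate is the average over imputations, and the pooled variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. The function name and the input numbers are hypothetical.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Pool one coefficient across M imputed data sets (Rubin's rules, assumed).

    estimates: the M point estimates, one per imputed data set
    variances: the M estimated sampling variances of those estimates
    """
    M = len(estimates)
    pooled = mean(estimates)        # average of the M estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor M - 1)
    total = within + (1.0 + 1.0 / M) * between
    return pooled, total


# Hypothetical results from M = 5 imputed data sets.
b_m = [1.0, 1.2, 0.8, 1.1, 0.9]
v_m = [0.04, 0.04, 0.04, 0.04, 0.04]
pooled_b, pooled_var = combine_imputations(b_m, v_m)
```

The between-imputation term is what makes the pooled standard error larger than the one from any single filled-in data set.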

6/59: Topic 1.2 – Extensions of the Linear Regression Model

Regression with Conventional Standard Errors

7/59: Topic 1.2 – Extensions of the Linear Regression Model

Robust Covariance Matrices

Robust standard errors, not estimates
Robust to: Heteroscedasticity
Not robust to (all considered later):
• Correlation across observations
• Individual unobserved heterogeneity
• Incorrect model specification

‘Robust inference’ means hypothesis tests and confidence intervals using robust covariance matrices

The White Estimator

$$\text{Est.Var}[b] = (X'X)^{-1}\left[\sum_i e_i^2\, x_i x_i'\right](X'X)^{-1}$$
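As a minimal sketch of the White estimator, the code below works out the slope element of the sandwich formula for a regression with a constant and one regressor, where it reduces to the sum of (x_i - x̄)² e_i² divided by the squared sum of (x_i - x̄)². It uses only the standard library; the function name and the data are hypothetical, and a matrix library would be the usual choice with several regressors.

```python
def white_slope_variance(x, y):
    """OLS slope with White (robust) and conventional variance estimates.

    Covers y = b0 + b1*x + e only. The robust value is the slope element of
    (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1 for X = [1, x].
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Robust: each squared residual is weighted by its own (x_i - x_bar)^2.
    robust = sum(((xi - x_bar) ** 2) * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2
    # Conventional: one common s^2 for every observation.
    conventional = (sum(ei ** 2 for ei in e) / (n - 2)) / sxx
    return b1, robust, conventional


# Hypothetical data.
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.3]
slope, var_robust, var_conventional = white_slope_variance(x_demo, y_demo)
```

The slope estimate is the same under both calculations; only the variance attached to it changes, which is the sense of "robust standard errors, not estimates".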

8/59: Topic 1.2 – Extensions of the Linear Regression Model

A Robust Covariance Matrix

Uncorrected

9/59: Topic 1.2 – Extensions of the Linear Regression Model

Bootstrap Estimation of the Asymptotic Variance of an Estimator

Known form of asymptotic variance: Compute from known results

Unknown form, known generalities about properties: Use bootstrapping
• Root N consistency
• Sampling conditions amenable to central limit theorems
• Compute by resampling mechanism within the sample.

10/59: Topic 1.2 – Extensions of the Linear Regression Model

Bootstrapping Algorithm

1. Estimate parameters using full sample: b
2. Repeat R times:
   Draw n observations from the n, with replacement.
   Estimate β with b(r).
3. Estimate variance with

$$V = \frac{1}{R}\sum_{r=1}^{R}[b(r) - b][b(r) - b]'$$

(Some use mean of replications instead of b. Advocated (without motivation) by original designers of the method.)
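The three steps can be sketched for a scalar estimator as follows. The code centers the replications on the full-sample estimate b, as in step 3; the function names, the default R, and the data are hypothetical, and any function of the sample (a correlation, a regression coefficient) could be passed in place of the mean.

```python
import random


def bootstrap_variance(data, estimator, R=1000, seed=0):
    """Bootstrap variance of a scalar estimator, centered on the full-sample b."""
    rng = random.Random(seed)
    n = len(data)
    b = estimator(data)                     # step 1: full-sample estimate
    total = 0.0
    for _ in range(R):                      # step 2: R replications
        resample = [data[rng.randrange(n)] for _ in range(n)]  # n draws with replacement
        b_r = estimator(resample)
        total += (b_r - b) ** 2
    return b, total / R                     # step 3: V = (1/R) sum_r [b(r) - b]^2


def sample_mean(values):
    return sum(values) / len(values)


# Hypothetical sample: the integers 1..50.
sample = [float(i) for i in range(1, 51)]
b_full, v_boot = bootstrap_variance(sample, sample_mean, R=2000, seed=1)
```

For the sample mean the answer is known analytically (the sample variance with divisor n, divided by n), which makes it a convenient check on the resampling loop before applying it to an estimator whose variance has no known form.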

11/59: Topic 1.2 – Extensions of the Linear Regression Model

Application: Correlation between Age and Education

12/59: Topic 1.2 – Extensions of the Linear Regression Model

Bootstrapped Regression

13/59: Topic 1.2 – Extensions of the Linear Regression Model

Bootstrap Replications

14/59: Topic 1.2 – Extensions of the Linear Regression Model

Bootstrapped Confidence Intervals

Estimate $\text{Norm}(\beta) = (\beta_1^2 + \beta_2^2 + \beta_3^2 + \beta_4^2)^{1/2}$
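One way to turn bootstrap replications into a confidence interval for the norm is the percentile method, sketched below under that assumption (the slide does not say which interval construction is used). The replications here are simulated stand-ins for the b(r) vectors a real bootstrap loop would produce; all names and numbers are hypothetical.

```python
import math
import random


def norm(b):
    """Euclidean norm (b1^2 + ... + bK^2)^(1/2) of a coefficient vector."""
    return math.sqrt(sum(bk * bk for bk in b))


def percentile_interval(values, level=0.95):
    """Percentile interval: cut off (1 - level)/2 of the replications in each tail."""
    ordered = sorted(values)
    R = len(ordered)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(math.floor(tail * (R - 1)))]
    hi = ordered[int(math.ceil((1.0 - tail) * (R - 1)))]
    return lo, hi


# Hypothetical replications: stand-ins for b(r) from an actual bootstrap.
rng = random.Random(0)
b_hat = [1.0, 2.0, 3.0, 4.0]
replications = [[bk + rng.gauss(0.0, 0.1) for bk in b_hat] for _ in range(1000)]
norms = [norm(b_r) for b_r in replications]
lower, upper = percentile_interval(norms)
```

Because the norm is a nonlinear function of the coefficients, computing it replication by replication avoids deriving its asymptotic variance by hand.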

15/59: Topic 1.2 – Extensions of the Linear Regression Model

16/59: Topic 1.2 – Extensions of the Linear Regression Model

Quantile Regression

$Q(y|x,\tau) = \beta_\tau'x$, $\tau$ = quantile
Estimated by linear programming
$Q(y|x,.50) = \beta_{.50}'x$, .50 = median regression
Median regression estimated by LAD (estimates same parameters as mean regression if symmetric conditional distribution)

Why use quantile (median) regression?
• Semiparametric
• Robust to some extensions (heteroscedasticity?)
• Complete characterization of conditional distribution
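The linear program minimizes a sum of asymmetrically weighted absolute residuals, often called the check function. The sketch below illustrates that objective in the simplest case, a model with only a constant, where the minimizer is a sample quantile; it searches over the data points instead of solving a linear program, and the function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check function: weight tau on positive residuals, (1 - tau) on negative ones."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_only_quantile(y, tau):
    """Minimize sum_i check_loss(y_i - q, tau) over q, searching the data points."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


# Hypothetical data with one large outlier.
y_demo = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = constant_only_quantile(y_demo, 0.5)     # LAD: the sample median
quartile_fit = constant_only_quantile(y_demo, 0.25)
mean_fit = sum(y_demo) / len(y_demo)                 # least squares counterpart
```

With tau = .5 the loss is half the absolute deviation, so the fit is the median and is unaffected by the outlier that pulls the least squares fit (the mean) upward.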

17/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimated Variance for Quantile Regression

Asymptotic Theory Based Estimator of Variance of Q-REG

Model: $y_i = \beta'x_i + u_i$, $Q(y_i|x_i,\tau) = \beta'x_i$, $Q[u_i|x_i,\tau] = 0$

Residuals: $\hat u_i = y_i - \hat\beta'x_i$

Asymptotic Variance: $\frac{1}{N} A^{-1} C A^{-1}$

$A = E[f_u(0|x)\,xx']$, estimated by $\frac{1}{N}\sum_{i=1}^{N}\frac{1}{B}\,\frac{1}{2}\,\mathbf{1}[|\hat u_i| \le B]\,x_i x_i'$

Bandwidth B can be Silverman's Rule of Thumb: $B = \frac{1.06}{N^{.2}}\,\text{Min}\left[s_u,\ \frac{Q(\hat u_i|.75) - Q(\hat u_i|.25)}{1.349}\right]$

$C = \tau(1-\tau)\,E[xx']$, estimated by $\frac{\tau(1-\tau)}{N} X'X$

For $\tau = .5$ and normally distributed u, this all simplifies to $\frac{\pi}{2}\, s_u^2\, (X'X)^{-1}$.

But, this is an ideal application for bootstrapping.
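The bandwidth and the density term at zero can be sketched in a few lines. The code assumes the density of u at zero is estimated with a uniform kernel of half-width B, and it uses simulated standard normal draws in place of quantile regression residuals; the function names and the simulated data are hypothetical.

```python
import random
import statistics


def silverman_bandwidth(residuals):
    """B = 1.06 / N^.2 * min(s_u, interquartile range / 1.349)."""
    n = len(residuals)
    s_u = statistics.stdev(residuals)
    q25, _, q75 = statistics.quantiles(residuals, n=4)
    return 1.06 / n ** 0.2 * min(s_u, (q75 - q25) / 1.349)


def density_at_zero(residuals, B):
    """Uniform-kernel estimate of f_u(0): share of |u_i| <= B, divided by 2B."""
    n = len(residuals)
    inside = sum(1 for u in residuals if abs(u) <= B)
    return inside / (2.0 * B * n)


# Simulated stand-in for residuals (standard normal, so f_u(0) is about 0.399).
rng = random.Random(0)
u_hat = [rng.gauss(0.0, 1.0) for _ in range(5000)]
B = silverman_bandwidth(u_hat)
f0 = density_at_zero(u_hat, B)
```

In the full estimator the same indicator weights each x_i x_i' term, so observations with residuals far from zero drop out of the estimate of A.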

18/59: Topic 1.2 – Extensions of the Linear Regression Model

Quantile Regressions (τ = .25, τ = .50, τ = .75)

19/59: Topic 1.2 – Extensions of the Linear Regression Model

OLS vs. Least Absolute Deviations

20/59: Topic 1.2 – Extensions of the Linear Regression Model

21/59: Topic 1.2 – Extensions of the Linear Regression Model

Coefficient on MALE dummy variable in quantile regressions

22/59: Topic 1.2 – Extensions of the Linear Regression Model

A Production Function Model with Inefficiency: The Stochastic Frontier Model

23/59: Topic 1.2 – Extensions of the Linear Regression Model

Inefficiency in Production

24/59: Topic 1.2 – Extensions of the Linear Regression Model

Cost Inefficiency

y* = f(x)
C* = g(y*,w)

(Samuelson – Shephard duality results)

Cost inefficiency: If y < f(x), then C must be greater than g(y,w). Implies the idea of a cost frontier.

lnC = lng(y,w) + u, u > 0.

25/59: Topic 1.2 – Extensions of the Linear Regression Model

Corrected Ordinary Least Squares

26/59: Topic 1.2 – Extensions of the Linear Regression Model

COLS Cost Frontier

27/59: Topic 1.2 – Extensions of the Linear Regression Model

Stochastic Frontier Models

Motivation:
• Factors not under control of the firm
• Measurement error
• Differential rates of adoption of technology

Frontier is randomly placed by the whole collection of stochastic elements which might enter the model outside the control of the firm.

Aigner, Lovell, Schmidt (1977),

Meeusen, van den Broeck (1977),

Battese, Corra (1977)

28/59: Topic 1.2 – Extensions of the Linear Regression Model

The Stochastic Frontier Model

$$y_i = f(x_i)\,TE_i\,e^{v_i}$$

$$\ln y_i = \alpha + \beta'x_i + v_i - u_i = \alpha + \beta'x_i + \varepsilon_i.$$

$u_i > 0$, but $v_i$ may take any value. A symmetric distribution, such as the normal distribution, is usually assumed for $v_i$. Thus, the stochastic frontier is

$$\alpha + \beta'x_i + v_i$$

and, as before, $u_i$ represents the inefficiency.

29/59: Topic 1.2 – Extensions of the Linear Regression Model

Least Squares Estimation

Average inefficiency is embodied in the third moment of the disturbance εi = vi - ui.

So long as E[vi - ui] is constant, the OLS estimates of the slope parameters of the frontier function are unbiased and consistent. (The constant term estimates α - E[ui].) The average inefficiency present in the distribution is reflected in the asymmetry of the distribution, which can be estimated using the OLS residuals:

$$m_3 = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\varepsilon_i - \hat E[\varepsilon_i]\right)^3$$
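A small simulation illustrates why the third moment carries the information about inefficiency. The sketch below draws εi = vi - ui with a normal vi and, as an assumption for the illustration, a half-normal ui, then computes the third central moment; the function names and parameter values are hypothetical, and simulated disturbances stand in for OLS residuals.

```python
import random


def simulate_composed_errors(n, sigma_v, sigma_u, seed=0):
    """Draw eps_i = v_i - u_i with v normal and u half normal (assumed here)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_v) - abs(rng.gauss(0.0, sigma_u)) for _ in range(n)]


def third_central_moment(values):
    """m3 = (1/N) * sum_i (e_i - mean(e))^3."""
    n = len(values)
    center = sum(values) / n
    return sum((e - center) ** 3 for e in values) / n


eps = simulate_composed_errors(20000, sigma_v=0.5, sigma_u=1.0, seed=1)
m3_with_inefficiency = third_central_moment(eps)

# Without the one-sided term the disturbance is symmetric and m3 is near zero.
rng = random.Random(2)
symmetric = [rng.gauss(0.0, 0.5) for _ in range(20000)]
m3_symmetric = third_central_moment(symmetric)
```

The one-sided term is subtracted, so it stretches the left tail of the disturbance and makes the third moment negative; the symmetric component contributes nothing to it.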

30/59: Topic 1.2 – Extensions of the Linear Regression Model

Application to Spanish Dairy Farms

Input   Units                                                  Mean      Std. Dev.   Minimum     Maximum
Milk    Milk production (liters)                               131,108   92,539      14,110      727,281
Cows    # of milking cows                                      2.12      11.27       4.5         82.3
Labor   # man-equivalent units                                 1.67      0.55        1.0         4.0
Land    Hectares of land devoted to pasture and crops          12.99     6.17        2.0         45.1
Feed    Total amount of feedstuffs fed to dairy cows (tons)    57,941    47,981      3,924.14    376,732

N = 247 farms, T = 6 years (1993-1998)

31/59: Topic 1.2 – Extensions of the Linear Regression Model

Example: Dairy Farms

32/59: Topic 1.2 – Extensions of the Linear Regression Model

The Normal-Half Normal Model

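$$\ln y_i = \alpha + \beta'x_i + v_i - u_i$$

Normal component: $v_i \sim N[0, \sigma_v^2]$; $f(v_i) = \frac{1}{\sigma_v}\,\phi\!\left(\frac{v_i}{\sigma_v}\right)$, $-\infty < v_i < \infty$.

Half normal component: $u_i = |U_i|$, $U_i \sim N[0, \sigma_u^2]$

Underlying normal: $f(U_i) = \frac{1}{\sigma_u}\,\phi\!\left(\frac{U_i}{\sigma_u}\right)$, $-\infty < U_i < \infty$

Half normal: $f(u_i) = \frac{1}{\Phi(0)}\,\frac{1}{\sigma_u}\,\phi\!\left(\frac{u_i}{\sigma_u}\right)$, $0 \le u_i < \infty$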

33/59: Topic 1.2 – Extensions of the Linear Regression Model

Skew Normal Variable

34/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimation: Least Squares/MoM

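OLS estimator of β is consistent.
$E[u_i] = (2/\pi)^{1/2}\sigma_u$, so OLS constant estimates $\alpha - (2/\pi)^{1/2}\sigma_u$.

Second and third moments of OLS residuals estimate

$$m_2 = \frac{\pi - 2}{\pi}\,\sigma_u^2 + \sigma_v^2 \quad\text{and}\quad m_3 = \sqrt{\frac{2}{\pi}}\left(1 - \frac{4}{\pi}\right)\sigma_u^3$$

Method of Moments: Use $[a, b, m_2, m_3]$ to estimate $[\alpha, \beta, \sigma_u, \sigma_v]$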
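The moment equations can be inverted in closed form. The sketch below assumes the normal-half normal model and that the OLS constant a estimates α - E[ui], as stated on the Least Squares Estimation slide; the function name and the numbers in the example are hypothetical.

```python
import math


def mom_normal_half_normal(a, m2, m3):
    """Solve the second- and third-moment equations for sigma_u and sigma_v^2.

    a  : OLS constant term (estimates alpha - E[u_i])
    m2 : second central moment of the OLS residuals
    m3 : third central moment of the OLS residuals (negative if inefficiency is present)
    """
    if m3 >= 0.0:
        raise ValueError("residuals are not negatively skewed; sigma_u is not identified")
    c = math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi)   # negative constant
    sigma_u = (m3 / c) ** (1.0 / 3.0)
    sigma_v2 = m2 - (1.0 - 2.0 / math.pi) * sigma_u ** 2    # can be negative in small samples
    alpha = a + math.sqrt(2.0 / math.pi) * sigma_u          # shift the constant back up
    return alpha, sigma_u, sigma_v2


# Hypothetical moments, generated from alpha = 2, sigma_u = 1, sigma_v = 0.5.
a_ols = 2.0 - math.sqrt(2.0 / math.pi)
m2_demo = 0.25 + (1.0 - 2.0 / math.pi)
m3_demo = math.sqrt(2.0 / math.pi) * (1.0 - 4.0 / math.pi)
alpha_hat, sigma_u_hat, sigma_v2_hat = mom_normal_half_normal(a_ols, m2_demo, m3_demo)
```

The guard on the sign of m3 reflects the case in which the residuals are skewed the wrong way, where no positive σu solves the third-moment equation.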

35/59: Topic 1.2 – Extensions of the Linear Regression Model

Standard Form: The Skew Normal Distribution

36/59: Topic 1.2 – Extensions of the Linear Regression Model

Log Likelihood Function

Waldman (1982) result on skewness of OLS residuals: If the OLS residuals are positively skewed, rather than negatively skewed, then OLS maximizes the log likelihood, and there is no evidence of inefficiency in the data.

37/59: Topic 1.2 – Extensions of the Linear Regression Model

Airlines Data – 256 Observations

38/59: Topic 1.2 – Extensions of the Linear Regression Model

Least Squares Regression

39/59: Topic 1.2 – Extensions of the Linear Regression Model

40/59: Topic 1.2 – Extensions of the Linear Regression Model

Alternative Models: Half Normal and Exponential

41/59: Topic 1.2 – Extensions of the Linear Regression Model

Other Models

• Many other parametric models
• Semiparametric and nonparametric: the recent outer reaches of the theoretical literature
• Other variations including heterogeneity in the frontier function and in the distribution of inefficiency

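Normal-Exponential Likelihood

$$\ln L(\text{data};\ \beta, \sigma_v, \sigma_u) = \sum_{i=1}^{n}\left[-\ln\sigma_u + \frac{1}{2}\left(\frac{\sigma_v}{\sigma_u}\right)^2 + \ln\Phi\!\left(\frac{-\left((v_i - u_i) + \sigma_v^2/\sigma_u\right)}{\sigma_v}\right) + \frac{v_i - u_i}{\sigma_u}\right]$$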
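A sketch of the normal-exponential log likelihood, written in terms of the composed error εi = vi - ui and assuming the exponential inefficiency term has mean σu. The function names are hypothetical; the final lines check numerically that the implied density of εi integrates to one, which is a quick way to catch a sign error in the formula.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, computed with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_exponential_logpdf(eps, sigma_v, sigma_u):
    """ln f(eps) for eps = v - u, v ~ N(0, sigma_v^2), u exponential with mean sigma_u."""
    z = -(eps + sigma_v ** 2 / sigma_u) / sigma_v
    return (-math.log(sigma_u)
            + 0.5 * (sigma_v / sigma_u) ** 2
            + math.log(norm_cdf(z))
            + eps / sigma_u)


def log_likelihood(residuals, sigma_v, sigma_u):
    """Sum of the log densities over the observations."""
    return sum(normal_exponential_logpdf(e, sigma_v, sigma_u) for e in residuals)


# Numerical check: trapezoid rule over a wide grid; the area should be close to 1.
step = 0.01
grid = [-15.0 + step * k for k in range(2101)]     # covers -15 to 6
dens = [math.exp(normal_exponential_logpdf(e, 0.5, 1.0)) for e in grid]
area = step * (sum(dens) - 0.5 * (dens[0] + dens[-1]))
```

In estimation, εi is replaced by yi - α - β'xi and the function is maximized over all the parameters; only the density part is shown here.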

42/59: Topic 1.2 – Extensions of the Linear Regression Model

A Test for Inefficiency?

Base test on σu = 0 <=> λ = 0

Standard test procedures:
• Likelihood ratio
• Wald
• Lagrange Multiplier

Nonstandard testing situation:
• Variance = 0 on the boundary of the parameter space
• Standard chi squared distribution does not apply.

43/59: Topic 1.2 – Extensions of the Linear Regression Model

44/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimating ui

No direct estimate of ui

Data permit estimation of yi - β’xi. Can this be used?
• εi = yi - β’xi = vi - ui
• Indirect estimate of ui, using E[ui|vi - ui]
• This is E[ui|yi, xi]

vi - ui is estimable with ei = yi - b’xi.

45/59: Topic 1.2 – Extensions of the Linear Regression Model

Fundamental Tool - JLMS

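$$E[u_i|\varepsilon_i] = \frac{\sigma\lambda}{1+\lambda^2}\left[\mu_i + \frac{\phi(\mu_i)}{\Phi(\mu_i)}\right],\quad \mu_i = \frac{-\lambda\varepsilon_i}{\sigma}$$

We can insert our maximum likelihood estimates of all parameters.

Note: This estimates E[u|vi - ui], not ui.

$$\hat E[u_i|\hat\varepsilon_i] = \frac{\hat\sigma\hat\lambda}{1+\hat\lambda^2}\left[\hat\mu_i + \frac{\phi(\hat\mu_i)}{\Phi(\hat\mu_i)}\right],\quad \hat\mu_i = \frac{-\hat\lambda\,(y_i - \hat\beta'x_i)}{\hat\sigma}$$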
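The JLMS calculation is a one-line formula once the parameters are in hand. The sketch below assumes the normal-half normal model with the usual definitions σ² = σu² + σv² and λ = σu/σv (these definitions are not spelled out on the slide); the function names and the parameter values are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def jlms(eps, sigma_u, sigma_v):
    """E[u | eps] for eps = v - u in the normal-half normal model."""
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    mu = -lam * eps / sigma
    return sigma * lam / (1.0 + lam ** 2) * (mu + norm_pdf(mu) / norm_cdf(mu))


# Hypothetical parameter values and residuals.
estimates = [jlms(e, sigma_u=1.0, sigma_v=1.0) for e in (-1.0, 0.0, 1.0)]
```

A more negative residual produces a larger estimate, since output further below the fitted frontier is attributed in part to inefficiency; even a positive residual yields a positive estimate, because the result is a conditional mean of a nonnegative variable and not ui itself.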

46/59: Topic 1.2 – Extensions of the Linear Regression Model

Application: Electricity Generation

47/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimated Translog Production Frontiers

48/59: Topic 1.2 – Extensions of the Linear Regression Model

Inefficiency Estimates

49/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimated Inefficiency Distribution

50/59: Topic 1.2 – Extensions of the Linear Regression Model

Estimated Efficiency

51/59: Topic 1.2 – Extensions of the Linear Regression Model

A Semiparametric Approach

Y = g(x,z) + v - u [Normal-Half Normal]
(1) Locally linear nonparametric regression estimates g(x,z)
(2) Use residuals from nonparametric regression to estimate variance parameters using MLE
(3) Use estimated variance parameters and residuals to estimate technical efficiency.

52/59: Topic 1.2 – Extensions of the Linear Regression Model

Airlines Application

53/59: Topic 1.2 – Extensions of the Linear Regression Model

Efficiency Distributions

54/59: Topic 1.2 – Extensions of the Linear Regression Model

Nonparametric Methods - DEA

55/59: Topic 1.2 – Extensions of the Linear Regression Model

DEA is done using linear programming

56/59: Topic 1.2 – Extensions of the Linear Regression Model

57/59: Topic 1.2 – Extensions of the Linear Regression Model

Methodological Problems with DEA

• Measurement error
• Outliers
• Specification errors
• The overall problem with the deterministic frontier approach

58/59: Topic 1.2 – Extensions of the Linear Regression Model

DEA and SFA: Same Answer?

Christensen and Greene data
• N = 123 minus 6 tiny firms
• X = capital, labor, fuel
• Y = millions of KWH

Cobb-Douglas Production Function vs. DEA

59/59: Topic 1.2 – Extensions of the Linear Regression Model

60/59: Topic 1.2 – Extensions of the Linear Regression Model

Comparing the Two Methods.