
7/30/2019 modelos escolha discreta


Binary Response Models Multinomial Response Models Truncated and Censored Models

Discrete Choice Models

Cristine Campos de Xavier Pinto

University of Michigan

Winter 2010

Cristine Campos de Xavier Pinto Institute



There are some economic behaviors for which a continuous approximation of the dependent variable is not a good one.

Examples: modeling an individual's decisions, such as whether to go to college, the number of children, or which brand of automobile to purchase.

In qualitative response models, $y$ takes a finite number of outcomes.

In the simplest case, $y$ is a binary variable: $y = 1$ (success); $y = 0$ (failure).


In binary response models, we are interested in the response probability:

$$p(x) \equiv \Pr[y = 1 \mid x] = \Pr[y = 1 \mid x_1, \dots, x_K]$$

For a continuous variable $x_j$, the partial effect of $x_j$ on the response probability is

$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j}$$

For a binary variable, say $x_K$, we calculate the difference in response probabilities

$$\Pr[y = 1 \mid x_1, x_2, \dots, x_{K-1}, 1] - \Pr[y = 1 \mid x_1, x_2, \dots, x_{K-1}, 0]$$


Univariate binary response model:

$$\Pr[y_i = 1 \mid x_i] = F(x_i'\beta_0), \quad i = 1, \dots, N$$

$\{y_i\}$ is a sequence of independent binary random variables taking values 0 or 1
$x_i$ is a $K \times 1$ vector of explanatory variables
$\beta_0$ is a $K \times 1$ vector of parameters
$F$ is a known function


The functional forms of $F$ most used are:

Linear Probability Model

$$F(x) = x$$

Probit Model

$$F(x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$

Logit Model

$$F(x) = \Lambda(x) = \frac{e^x}{1 + e^x}$$
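As a quick illustration of the three choices of $F$, the sketch below evaluates each one at a few index values. It is only a sketch: the function names are made up for this example, and the probit cdf is written with the error function from Python's standard library.

```python
import math

def linear_cdf(z):
    """Linear probability model: F(z) = z (not restricted to [0, 1])."""
    return z

def probit_cdf(z):
    """Standard normal cdf Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic cdf Lambda(z) = e^z / (1 + e^z), in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Compare the three functional forms at a few values of the index.
for z in (-2.0, 0.0, 0.5, 2.0):
    print(z, linear_cdf(z), round(probit_cdf(z), 4), round(logit_cdf(z), 4))
```

At $z = 2$ the linear form returns 2, which is not a valid probability, while the probit and logit values stay inside $(0, 1)$.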


Linear Probability Model:

1. $F$ is not constrained to lie between 0 and 1.
2. In this model,
$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j} = \beta_j$$
3. In this model, heteroskedasticity is present since
$$Var[y \mid x] = x'\beta \left(1 - x'\beta\right)$$
4. In this model, a ceteris paribus unit increase in $x_j$ always changes $\Pr[y = 1 \mid x]$ by the same amount, regardless of the value of $x_j$. If we keep increasing $x_j$, eventually $\Pr[y = 1 \mid x]$ will be outside the interval $[0, 1]$.


Probit and Logit Models:

1. If $x_j$ is continuous,
$$\frac{\partial \Pr[y = 1 \mid x]}{\partial x_j} = f(x'\beta)\,\beta_j$$
where $f(z) = \frac{dF(z)}{dz}$.
2. $F(\cdot)$ is a strictly increasing function, so $f(z) > 0$ for all $z$. The sign of the effect is given by the sign of $\beta_j$.
3. We can calculate the relative effects
$$\frac{\partial \Pr[y = 1 \mid x] / \partial x_j}{\partial \Pr[y = 1 \mid x] / \partial x_h} = \frac{\beta_j}{\beta_h}$$
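The partial effect formula $f(x'\beta)\beta_j$ can be sketched numerically. The covariate and coefficient values below are hypothetical, chosen only to show that the ratio of two partial effects equals $\beta_j/\beta_h$ under both probit and logit.

```python
import math

def normal_pdf(z):
    """Standard normal density, the derivative of the probit F."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def logistic_pdf(z):
    """Logistic density Lambda(z) * (1 - Lambda(z)), the derivative of the logit F."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

def partial_effects(x, beta, pdf):
    """Return f(x'beta) * beta_j for every regressor j."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    scale = pdf(index)
    return [scale * bj for bj in beta]

x = [1.0, 0.5, -1.2]      # hypothetical covariates (first entry is the constant)
beta = [0.2, 0.8, -0.4]   # hypothetical coefficients

probit_pe = partial_effects(x, beta, normal_pdf)
logit_pe = partial_effects(x, beta, logistic_pdf)

# The scale factor f(x'beta) cancels in the ratio, leaving beta_j / beta_h.
print(probit_pe[1] / probit_pe[2], logit_pe[1] / logit_pe[2], beta[1] / beta[2])
```

The levels of the partial effects differ between the two models, but their ratios do not depend on $F$.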


4. The index model can be derived from a latent variable model:

$$y^* = x'\beta + e, \qquad y = 1\{y^* > 0\}$$

where $e$ is a continuously distributed variable independent of $x$ with cdf $F(\cdot)$, and the distribution is symmetric about zero. Since the distribution is symmetric about zero,

$$1 - F(-z) = F(z)$$

and

$$\Pr[y = 1 \mid x] = \Pr[y^* > 0 \mid x] = \Pr[e > -x'\beta \mid x] = 1 - F(-x'\beta) = F(x'\beta)$$


Let's model the decision of a person regarding whether she drives a car or takes a bus to work. We assume that the utility associated with each mode of transportation is a function of:

the mode characteristics $z$
the individual's socioeconomic characteristics $w$
an unobservable term $\varepsilon$

We define:

$U_{1i}$: person's indirect utility associated with driving a car
$U_{0i}$: person's indirect utility associated with taking a bus

$$U_{0i} = \alpha_0 + z_{0i}'\beta + w_i'\gamma_0 + \varepsilon_{0i}$$
$$U_{1i} = \alpha_1 + z_{1i}'\beta + w_i'\gamma_1 + \varepsilon_{1i}$$


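Basic assumption: the person will drive a car if $U_{1i} > U_{0i}$, and will take the bus if $U_{1i} \le U_{0i}$. Letting $y_i = 1$ if person $i$ drives a car,

$$\Pr[y_i = 1 \mid x] = \Pr[U_{1i} > U_{0i} \mid x]$$
$$= \Pr\left[\varepsilon_{0i} - \varepsilon_{1i} < (\alpha_1 - \alpha_0) + (z_{1i} - z_{0i})'\beta + w_i'(\gamma_1 - \gamma_0) \mid x\right]$$
$$= F\left((\alpha_1 - \alpha_0) + (z_{1i} - z_{0i})'\beta + w_i'(\gamma_1 - \gamma_0)\right)$$

where $F$ is the distribution function of $\varepsilon_{0i} - \varepsilon_{1i}$.

How can we estimate the parameters in this model?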





Assuming that we have a random sample of size $N$, the log-likelihood function is

$$\log L = \sum_{i=1}^{N} y_i \log F(x_i'\beta) + \sum_{i=1}^{N} (1 - y_i) \log\left[1 - F(x_i'\beta)\right]$$

The MLE is a solution (if it exists) of:

$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} \frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)} f(x_i'\beta)\, x_i = 0$$

We need to use an $F$ that is twice differentiable, and assume that the parameter space is compact, that $\{x_i\}$ is uniformly bounded in $i$, and that $E[x_i x_i']$ is a finite nonsingular matrix.

To show consistency, we need to verify that all the assumptions required for consistency of the MLE hold.
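The log-likelihood and its score can be sketched in a few lines. The code below uses the logit $F$ as an example and checks the analytic score against a finite-difference derivative; the data set and the trial value of $\beta$ are invented for illustration.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_pdf(z):
    p = logit_cdf(z)
    return p * (1.0 - p)

def log_likelihood(beta, X, y, F=logit_cdf):
    """sum_i y_i log F(x_i'beta) + (1 - y_i) log(1 - F(x_i'beta))."""
    total = 0.0
    for xi, yi in zip(X, y):
        Fi = F(sum(a * b for a, b in zip(xi, beta)))
        total += yi * math.log(Fi) + (1 - yi) * math.log(1.0 - Fi)
    return total

def score(beta, X, y, F=logit_cdf, f=logit_pdf):
    """sum_i (y_i - F_i) / (F_i (1 - F_i)) * f_i * x_i."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        z = sum(a * b for a, b in zip(xi, beta))
        Fi, fi = F(z), f(z)
        weight = (yi - Fi) / (Fi * (1.0 - Fi)) * fi
        for k, xik in enumerate(xi):
            grad[k] += weight * xik
    return grad

def numerical_score(beta, X, y, eps=1e-6):
    """Central finite differences of the log-likelihood, one coordinate at a time."""
    out = []
    for k in range(len(beta)):
        up, dn = list(beta), list(beta)
        up[k] += eps
        dn[k] -= eps
        out.append((log_likelihood(up, X, y) - log_likelihood(dn, X, y)) / (2 * eps))
    return out

# Hypothetical data: a constant and one regressor.
X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
y = [0, 0, 1, 0, 1, 1]
beta = [0.1, 0.3]

print(score(beta, X, y))
print(numerical_score(beta, X, y))
```

Passing a normal cdf and density as `F` and `f` would give the probit version of the same two functions.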


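The second derivative is

$$\frac{\partial^2 \log L}{\partial \beta \partial \beta'} = -\sum_{i=1}^{N} \left[\frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\right]^2 f(x_i'\beta)^2\, x_i x_i' + \sum_{i=1}^{N} \left[\frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)}\right] f'(x_i'\beta)\, x_i x_i'$$

In this case,

$$-E[H_i(\beta) \mid x_i] = \frac{f(x_i'\beta)^2\, x_i x_i'}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)} \equiv A(x_i, \beta)$$

which is a positive semidefinite matrix. In the case of the logit and probit, and assuming that $E[x_i x_i']$ is a finite nonsingular matrix, $E[A(x_i, \beta)]$ is positive definite.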





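Under the general conditions of MLE,

$$\sqrt{N}\left(\hat\beta - \beta\right) \to_d N\left(0, I^{-1}\right)$$

where $I = E[A(x_i, \beta)]$.

Since in the logit and probit cases we have global concavity, computing the MLE with iterative procedures is very simple. We can use the Newton-Raphson algorithm, and get

$$\hat\beta_2 = \hat\beta_1 - \left[\left.\frac{\partial^2 \log L}{\partial \beta \partial \beta'}\right|_{\hat\beta_1}\right]^{-1} \left.\frac{\partial \log L}{\partial \beta}\right|_{\hat\beta_1}$$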





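At the end, replacing the Hessian by its conditional expectation (method of scoring),

$$\hat\beta_2 = \left[\sum_{i=1}^{N} \frac{f(x_i'\hat\beta_1)^2\, x_i x_i'}{F(x_i'\hat\beta_1)\left[1 - F(x_i'\hat\beta_1)\right]}\right]^{-1} \left[\sum_{i=1}^{N} \frac{f(x_i'\hat\beta_1)\, x_i \left\{y_i - F(x_i'\hat\beta_1) + f(x_i'\hat\beta_1)\, x_i'\hat\beta_1\right\}}{F(x_i'\hat\beta_1)\left[1 - F(x_i'\hat\beta_1)\right]}\right]$$

Interpretation: Weighted Least Squares estimator with weights equal to $\dfrac{1}{F(x_i'\hat\beta_1)\left(1 - F(x_i'\hat\beta_1)\right)}$.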
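A minimal sketch of the Newton-Raphson iteration for a logit with a constant and one regressor is given below. For the logit the Hessian does not depend on $y_i$, so Newton-Raphson and the method of scoring coincide. The data and the function name `newton_raphson_logit` are assumptions made for this example, and the $2 \times 2$ system is solved by hand to keep the code free of external libraries.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_raphson_logit(X, y, tol=1e-10, max_iter=50):
    """Iterate beta <- beta - H^{-1} g for a logit with two coefficients."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for (x0, x1), yi in zip(X, y):
            p = logistic(b0 * x0 + b1 * x1)
            # Score: sum_i (y_i - F_i) x_i (for the logit, f = F (1 - F)).
            g0 += (yi - p) * x0
            g1 += (yi - p) * x1
            # Hessian: -sum_i F_i (1 - F_i) x_i x_i'.
            w = p * (1.0 - p)
            h00 -= w * x0 * x0
            h01 -= w * x0 * x1
            h11 -= w * x1 * x1
        det = h00 * h11 - h01 * h01
        # Newton step s = H^{-1} g, using the closed-form 2x2 inverse.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (-h01 * g0 + h00 * g1) / det
        b0 -= s0
        b1 -= s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical data; the outcomes overlap in x, so the MLE exists.
X = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0), (1.0, 1.5)]
y = [0, 0, 1, 0, 1, 1]

b0_hat, b1_hat = newton_raphson_logit(X, y)
print(b0_hat, b1_hat)
```

Because the log-likelihood is globally concave, the starting value $(0, 0)$ is enough here; with perfectly separated data the MLE would not exist and the iteration would not settle.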


Neglected Heterogeneity

Now we deal with endogenous variables and neglected heterogeneity in qualitative response models.

Suppose that the structural model of interest is

$$\Pr[y = 1 \mid x, c] = \Phi(x'\beta + \gamma c)$$

where $x$ is a $K \times 1$ vector with $x_1 = 1$ and $c$ is a scalar.

Object of interest: the partial effects of $x_j$ on the response probability, holding $c$ constant.

The latent model has the form

$$y^* = x'\beta + \gamma c + e$$
$$y = 1\{y^* > 0\}$$
$$e \mid x, c \sim N(0, 1)$$


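Suppose that $c$ is independent of $x$, and

$$c \sim N(0, \tau^2)$$

Under these two assumptions, $\gamma c + e$ is independent of $x$, and

$$\gamma c + e \sim N(0, \gamma^2\tau^2 + 1)$$

In this case,

$$\Pr[y = 1 \mid x] = \Pr[\gamma c + e > -x'\beta \mid x] = \Phi\left(\frac{x'\beta}{\sigma}\right)$$

where $\sigma^2 = \gamma^2\tau^2 + 1$.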


Even when the omitted heterogeneity is independent of $x$, the probit coefficients are inconsistent:

$$\operatorname{plim} \hat\beta_j = \frac{\beta_j}{\sigma}$$

However, if we are interested in the direction of the partial effects, $\hat\beta_j$ gives the right sign.

For continuous $x_j$,

$$\frac{\partial \Pr[y = 1 \mid x, c]}{\partial x_j} = \beta_j\, \phi(x'\beta + \gamma c)$$

for various values of $c$ and $x$.

Because $c$ is not observed, we cannot estimate $\gamma$.




If $c$ is normalized so that $E[c] = 0$, we may be interested in the partial effects evaluated at $c = 0$.

However, what is consistently estimated from the probit of $y$ on $x$ is

$$\frac{\beta_j}{\sigma}\, \phi\left(\frac{x'\beta}{\sigma}\right)$$

which is different from the object of interest.




Another parameter of interest: the Average Partial Effect (APE)

APE: For given $x$, we average the partial effect across the distribution of $c$ in the population. Let $x^0$ be a specific value of the vector of explanatory variables,

$$E\left[\beta_j\, \phi\left(x^{0\prime}\beta + \gamma c\right)\right] = \frac{\beta_j}{\sigma}\, \phi\left(\frac{x^{0\prime}\beta}{\sigma}\right)$$

The probit of $y$ on $x$ consistently estimates the average partial effects.
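The APE identity can be checked numerically. The sketch below averages $\beta_j\phi(x^{0\prime}\beta + \gamma c)$ over $c \sim N(0, \tau^2)$ with a simple trapezoid rule and compares the result with the closed form $(\beta_j/\sigma)\phi(x^{0\prime}\beta/\sigma)$. All parameter values are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def ape_by_integration(index, beta_j, gamma, tau, n=4001, width=10.0):
    """Average beta_j * phi(index + gamma * c) over c ~ N(0, tau^2) (trapezoid rule)."""
    lo, hi = -width * tau, width * tau
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        c = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * beta_j * phi(index + gamma * c) * phi(c / tau) / tau
    return total * h

def ape_closed_form(index, beta_j, gamma, tau):
    """(beta_j / sigma) * phi(index / sigma) with sigma^2 = gamma^2 tau^2 + 1."""
    sigma = math.sqrt(gamma ** 2 * tau ** 2 + 1.0)
    return (beta_j / sigma) * phi(index / sigma)

# Hypothetical values: index = x0'beta, coefficient beta_j, gamma and tau.
index, beta_j, gamma, tau = 0.3, 0.5, 1.0, 2.0

print(ape_by_integration(index, beta_j, gamma, tau))
print(ape_closed_form(index, beta_j, gamma, tau))
print(beta_j * phi(index))  # partial effect at c = 0, a different quantity
```

The first two numbers agree, while the partial effect evaluated at $c = 0$ is larger in this example, which illustrates why the two objects should not be confused.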


Endogenous Explanatory Variable

Let's assume that a continuous explanatory variable is correlated with the unobserved error.

Consider the following model:

$$y_1^* = z_1\delta_1 + \alpha_1 y_2 + u_1$$
$$y_2 = z_1\delta_{21} + z_2\delta_{22} + v_2 = z\delta_2 + v_2$$
$$y_1 = 1[y_1^* > 0]$$

where $(u_1, v_2)$ has a zero mean, bivariate normal distribution and is independent of $z = (z_1, z_2)$.

$y_2$ is endogenous if $u_1$ and $v_2$ are correlated.

In this example, $y_2$ is a continuous random variable. (Why?)


We need a normalization to interpret the parameters in this equation in terms of average partial effects:

$$Var[u_1] = 1$$

Let's try to understand why the normalization is necessary. Consider the outcome $y_1$ at two different values of $y_2$ ($y_2$ and $y_2 + 1$). Holding all the other factors constant, the difference in the response functions is:

$$1[z_1\delta_1 + \alpha_1 (y_2 + 1) + u_1 \ge 0] - 1[z_1\delta_1 + \alpha_1 y_2 + u_1 \ge 0]$$


Because $u_1$ is not observed, we cannot estimate the difference in response for a given population unit. However, $u_1 \sim N(0, 1)$ and we can average across the distribution of $u_1$:

$$\Phi(z_1\delta_1 + \alpha_1 (y_2 + 1)) - \Phi(z_1\delta_1 + \alpha_1 y_2)$$

In this case, the parameters in the APE are $\delta_1$ and $\alpha_1$. However, if we do not normalize, $Var[u_1] = \sigma_1^2$, and the APE will depend on $\delta_1/\sigma_1$ and $\alpha_1/\sigma_1$.


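Under the joint normality of $(u_1, v_2)$ with $Var[u_1] = 1$, we can write

$$u_1 = \theta_1 v_2 + e_1$$

where

$$\theta_1 = \frac{Cov(v_2, u_1)}{Var[v_2]} \equiv \frac{\eta_1}{\tau_2^2}$$

$e_1$ is independent of $z$ and $v_2$, and is normally distributed with mean 0 and variance $1 - \rho_1^2$, where $\rho_1 = Corr(v_2, u_1)$.

We can write the model as

$$y_1^* = z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 + e_1$$
$$e_1 \mid z, y_2, v_2 \sim N\left(0, 1 - \rho_1^2\right)$$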


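In this case,

$$\Pr[y_1 = 1 \mid z, y_2, v_2] = \Phi\left(\frac{z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2}{\sqrt{1 - \rho_1^2}}\right)$$

The probit of $y_1$ on $z_1$, $y_2$ and $v_2$ consistently estimates $\dfrac{\delta_1}{\sqrt{1 - \rho_1^2}}$, $\dfrac{\alpha_1}{\sqrt{1 - \rho_1^2}}$ and $\dfrac{\theta_1}{\sqrt{1 - \rho_1^2}}$.

However, we do not know $v_2$ and we need to estimate it in a first step.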


We can think about a two-step procedure:

STEP 1: Run the OLS regression of $y_2$ on $z$, and save the residuals $\hat v_2$.

STEP 2: Run the probit of $y_1$ on $z_1$, $y_2$ and $\hat v_2$, and get consistent estimators of $\dfrac{\delta_1}{\sqrt{1 - \rho_1^2}}$, $\dfrac{\alpha_1}{\sqrt{1 - \rho_1^2}}$ and $\dfrac{\theta_1}{\sqrt{1 - \rho_1^2}}$.

To derive the asymptotic variance of this two-step estimator, we need to use the derivation of the variance of a two-step procedure for an extremum estimator.


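Using this procedure, we can consistently estimate the APE. The APE is obtained by taking derivatives (or differences) of

$$E_{v_2}\left[\Phi\left(\frac{z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2}{\sqrt{1 - \rho_1^2}}\right)\right] = \Phi\left(z_1\delta_{\theta 1} + \alpha_{\theta 1} y_2\right)$$

where

$$\delta_{\theta 1} = \frac{\delta_1 / \sqrt{1 - \rho_1^2}}{\sqrt{\dfrac{\theta_1^2}{1 - \rho_1^2}\tau_2^2 + 1}}, \qquad \alpha_{\theta 1} = \frac{\alpha_1 / \sqrt{1 - \rho_1^2}}{\sqrt{\dfrac{\theta_1^2}{1 - \rho_1^2}\tau_2^2 + 1}}, \qquad \tau_2^2 = Var[v_2]$$

After the two-step procedure, we just divide each coefficient by

$$\sqrt{\frac{\hat\theta_1^2}{1 - \hat\rho_1^2}\hat\tau_2^2 + 1}$$

where $\hat\theta_1 / \sqrt{1 - \hat\rho_1^2}$ is the second-step coefficient on $\hat v_2$ and $\hat\tau_2^2$ is the residual variance from the first step.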
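The rescaling step can be illustrated with population quantities. The sketch below starts from hypothetical structural values ($\delta_1$, $\alpha_1$, $Cov(v_2, u_1)$, $Var[v_2]$, with $Var[u_1] = 1$), computes the scaled coefficients that the second-step probit estimates in the limit, and then applies the division described above. It is arithmetic on assumed parameter values, not an estimation routine.

```python
import math

# Hypothetical structural parameters, with Var[u1] normalized to 1.
delta1, alpha1 = -0.4, 0.7
tau2_sq = 2.0    # Var[v2]
eta1 = 0.9       # Cov(v2, u1)

rho1 = eta1 / math.sqrt(tau2_sq)    # Corr(v2, u1), since Var[u1] = 1
theta1 = eta1 / tau2_sq             # coefficient in u1 = theta1 * v2 + e1
s = math.sqrt(1.0 - rho1 ** 2)      # standard deviation of e1

# Probability limits of the second-step probit coefficients.
delta_rho = delta1 / s
alpha_rho = alpha1 / s
theta_rho = theta1 / s

# Dividing by sqrt(theta_rho^2 * tau2^2 + 1) gives the APE coefficients.
scale = math.sqrt(theta_rho ** 2 * tau2_sq + 1.0)
delta_ape = delta_rho / scale
alpha_ape = alpha_rho / scale

print(delta_ape, alpha_ape)
```

With $Var[u_1] = 1$ the rescaled coefficients coincide with $\delta_1$ and $\alpha_1$, because $\theta_1^2\tau_2^2 = \rho_1^2$.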


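Another way to estimate this latent model is to use conditional MLE. Note that

$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z)$$

Using the assumptions above, $y_2 \mid z \sim N\left(z\delta_2, \tau_2^2\right)$.

Since $v_2 = y_2 - z\delta_2$,

$$\Pr[y_1 = 1 \mid y_2, z] = \Phi\Bigg(\underbrace{\frac{z_1\delta_1 + \alpha_1 y_2 + \dfrac{\rho_1}{\tau_2}(y_2 - z\delta_2)}{\sqrt{1 - \rho_1^2}}}_{w}\Bigg)$$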


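Using the derivation above,

$$f(y_1, y_2 \mid z) = \{\Phi(w)\}^{y_1} \{1 - \Phi(w)\}^{1 - y_1}\, \frac{1}{\tau_2}\, \phi\left(\frac{y_2 - z\delta_2}{\tau_2}\right)$$

and the log-likelihood function (up to a constant) is

$$\sum_{i=1}^{N} \left\{ y_{1i} \log \Phi(w_i) + (1 - y_{1i}) \log\left(1 - \Phi(w_i)\right) - \frac{1}{2}\log \tau_2^2 - \frac{1}{2}\frac{(y_{2i} - z_i\delta_2)^2}{\tau_2^2} \right\}$$

MLE is more efficient than the two-step procedure. (Why?) We get estimates of $\delta_1$ and $\alpha_1$ directly.

However, the iterative algorithm does not work well when $\rho_1$ tends to 1 or $-1$.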


Let's assume that the dependent variable $y_i$ takes $m_i + 1$ values $0, 1, 2, \dots, m_i$. The multinomial response model is defined as

$$\Pr[y_i = j \mid x] = F_{ij}(x, \theta)$$

Note that $\Pr[y_i = 0 \mid x] = F_{i0}(x, \theta)$ does not need to be specified, since it is equal to one minus the sum of the $m_i$ other probabilities.

To define the MLE of $\theta$, we need to define $\sum_{i=1}^{N} (m_i + 1)$ binary random variables

$$y_{ij} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{if } y_i \ne j \end{cases}$$

for $i = 1, 2, \dots, N$ and $j = 0, 1, \dots, m_i$.

The log-likelihood is

$$\log L = \sum_{i=1}^{N} \sum_{j=0}^{m_i} y_{ij} \log F_{ij}$$




Multinomial Logit Model

In this case, the order of the responses does not matter.

Let's assume that $y_i$ is a random variable that can assume values $\{0, 1, \dots, J\}$ for $J$ a positive integer.

We have a random sample of $(x_i, y_i)$ from a certain population.

In the multinomial logit model (MNL), the response probabilities are

$$p_j(x, \beta) \equiv \Pr[y = j \mid x] = \frac{\exp(x'\beta_j)}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)}, \quad j = 1, \dots, J$$

Since the probabilities sum to one,

$$\Pr[y = 0 \mid x] = \frac{1}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)}$$
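The MNL response probabilities can be sketched as follows, with category 0 as the base category (its coefficient vector is implicitly zero). The covariates and coefficient vectors are hypothetical, and subtracting the largest index before exponentiating is only a numerical safeguard that leaves the probabilities unchanged.

```python
import math

def mnl_probs(x, betas):
    """Return [Pr(y=0|x), ..., Pr(y=J|x)] for coefficient vectors beta_1..beta_J.

    Category 0 is the base category, so its index is fixed at zero.
    """
    indexes = [0.0] + [sum(a * b for a, b in zip(x, bj)) for bj in betas]
    m = max(indexes)                       # guard against overflow in exp
    expv = [math.exp(v - m) for v in indexes]
    total = sum(expv)
    return [e / total for e in expv]

x = [1.0, 0.4]                             # hypothetical covariates (with constant)
betas = [[0.2, 1.0], [-0.5, 0.3]]          # hypothetical beta_1 and beta_2 (J = 2)

probs = mnl_probs(x, betas)
print(probs, sum(probs))
```

The ratio `probs[j] / probs[0]` equals $\exp(x'\beta_j)$, so each $\beta_j$ describes the log-odds of category $j$ relative to the base category.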


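The partial effects for a continuous $x_k$ are

$$\frac{\partial \Pr[y = j \mid x]}{\partial x_k} = \Pr[y = j \mid x] \left\{ \beta_{jk} - \frac{\sum_{h=1}^{J} \beta_{hk} \exp(x'\beta_h)}{1 + \sum_{h=1}^{J} \exp(x'\beta_h)} \right\}$$

With three alternatives, latent utilities $y_{ij}^* = x_{ij}'\beta + a_{ij}$, $j = 0, 1, 2$, and errors $a_{ij}$ that are i.i.d. with density $f(a) = \exp[-a]\exp[-\exp[-a]]$ (type I extreme value),

$$\Pr[y_i = 2 \mid x] = \Pr[y_{i2}^* > y_{i1}^*,\ y_{i2}^* > y_{i0}^* \mid x]$$
$$= \Pr[x_{i2}'\beta - x_{i1}'\beta + a_{i2} > a_{i1},\ x_{i2}'\beta - x_{i0}'\beta + a_{i2} > a_{i0}]$$
$$= \int_{-\infty}^{\infty} f(a_{i2}) \left[\int_{-\infty}^{x_{i2}'\beta - x_{i1}'\beta + a_{i2}} f(a_{i1})\, da_{i1}\right] \left[\int_{-\infty}^{x_{i2}'\beta - x_{i0}'\beta + a_{i2}} f(a_{i0})\, da_{i0}\right] da_{i2}$$
$$= \int_{-\infty}^{\infty} \exp[-a_{i2}] \exp[-\exp[-a_{i2}]]\, \exp[-\exp[-x_{i2}'\beta + x_{i1}'\beta - a_{i2}]]\, \exp[-\exp[-x_{i2}'\beta + x_{i0}'\beta - a_{i2}]]\, da_{i2}$$
$$= \frac{\exp[x_{i2}'\beta]}{\exp[x_{i2}'\beta] + \exp[x_{i1}'\beta] + \exp[x_{i0}'\beta]}$$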


The marginal effects are given by

$$\frac{\partial p_j(x)}{\partial x_{jk}} = p_j(x)\left[1 - p_j(x)\right]\beta_k, \quad j = 0, \dots, J,\ k = 1, \dots, K$$

$$\frac{\partial p_j(x)}{\partial x_{hk}} = -p_j(x)\, p_h(x)\, \beta_k, \quad j \ne h,\ k = 1, \dots, K$$

Conditional logit model: the explanatory variables can change from choice to choice, but the effect of each variable is the same for all the alternatives. The parameter $\beta$ is common to all the choices.




One important restriction is that

$$\frac{p_j(x_j)}{p_h(x_h)} = \frac{\exp(x_j'\beta)}{\exp(x_h'\beta)} = \exp\left((x_j - x_h)'\beta\right)$$

The relative probabilities only depend on the attributes of those two alternatives (Independence from Irrelevant Alternatives, IIA).

Many models relax this assumption:

1. Multinomial Probit Model: $a_i$ has a multivariate normal distribution with arbitrary correlations between $a_{ij}$ and $a_{ih}$, for $j \ne h$.
Disadvantage: the response probability involves a $(J + 1)$-dimensional integral, and computation is a problem.
2. Hierarchical model (Nested logit model): aggregate the alternatives into $S$ groups of similar alternatives. In the first level, we model the probability of $y$ being in a group. In the second level, we pick the actual alternative within each group.
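The IIA restriction of the conditional logit can be seen in a small sketch: the ratio of two choice probabilities is the same whether or not a third alternative is available. The attribute vectors, the common coefficient vector and the alternative names are all hypothetical.

```python
import math

def conditional_logit_probs(X, beta):
    """Choice probabilities exp(x_j'beta) / sum_h exp(x_h'beta) over the rows of X."""
    utils = [sum(a * b for a, b in zip(xj, beta)) for xj in X]
    m = max(utils)                         # guard against overflow in exp
    expu = [math.exp(u - m) for u in utils]
    total = sum(expu)
    return [e / total for e in expu]

beta = [0.5, -1.0]                         # hypothetical common coefficients
car, bus, train = [1.0, 0.2], [0.3, 0.5], [0.8, 0.1]   # hypothetical attributes

p_two = conditional_logit_probs([car, bus], beta)
p_three = conditional_logit_probs([car, bus, train], beta)

ratio_two = p_two[0] / p_two[1]
ratio_three = p_three[0] / p_three[1]
implied = math.exp(sum((c - b) * k for c, b, k in zip(car, bus, beta)))
print(ratio_two, ratio_three, implied)
```

Adding the third alternative lowers both probabilities but leaves their ratio at $\exp((x_j - x_h)'\beta)$, which is the restriction that multinomial probit and nested logit are designed to relax.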




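Ordered Response Models

The values that $y$ takes correspond to a partition of the real line.

Suppose we have a latent variable $y^*$. In this case,

$$y = j \text{ if and only if } \alpha_j < y^* \le \alpha_{j+1}, \quad j = 0, 1, \dots, J$$

with $\alpha_0 = -\infty$ and $\alpha_{J+1} = \infty$, and

$$\Pr[y = j \mid x] = F(\alpha_{j+1} - x'\beta) - F(\alpha_j - x'\beta)$$

In the ordered probit,

$$y^* = x'\beta + e, \qquad e \mid x \sim N(0, 1)$$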


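In this case,

$$\Pr[y = 0 \mid x] = \Pr[y^* \le \alpha_1 \mid x] = \Phi(\alpha_1 - x'\beta)$$
$$\Pr[y = 1 \mid x] = \Pr[\alpha_1 < y^* \le \alpha_2 \mid x] = \Phi(\alpha_2 - x'\beta) - \Phi(\alpha_1 - x'\beta)$$

until we get

$$\Pr[y = J \mid x] = \Pr[y^* > \alpha_J \mid x] = 1 - \Phi(\alpha_J - x'\beta)$$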


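The parameters $\alpha$ and $\beta$ can be estimated by MLE. The log-likelihood function is

$$\log L = \sum_{i=1}^{N} \Big\{ 1[y_i = 0] \log \Phi(\alpha_1 - x_i'\beta) + 1[y_i = 1] \log\left[\Phi(\alpha_2 - x_i'\beta) - \Phi(\alpha_1 - x_i'\beta)\right] + \dots + 1[y_i = J] \log\left[1 - \Phi(\alpha_J - x_i'\beta)\right] \Big\}$$

We can use a logistic distribution for $e$, and we have the ordered logit model.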




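For the ordered probit model, the marginal effects are

$$\frac{\partial p_0(x)}{\partial x_k} = -\beta_k\, \phi(\alpha_1 - x'\beta)$$
$$\frac{\partial p_J(x)}{\partial x_k} = \beta_k\, \phi(\alpha_J - x'\beta)$$
$$\frac{\partial p_j(x)}{\partial x_k} = \beta_k \left[\phi(\alpha_j - x'\beta) - \phi(\alpha_{j+1} - x'\beta)\right], \quad 0 < j < J$$

The sign of $\beta_k$ does not always determine the direction of the effect, only at the extremes.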
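A short sketch of the ordered probit probabilities and marginal effects follows. The cut points, the index value and the coefficient are hypothetical; the helper names are made up for this example. Since the probabilities sum to one, the marginal effects across categories must sum to zero.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(index, cuts):
    """Pr[y = j | x], j = 0..J, given x'beta and cut points alpha_1 < ... < alpha_J."""
    cdf = [0.0] + [Phi(a - index) for a in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]

def ordered_probit_effects(index, cuts, beta_k):
    """d Pr[y = j | x] / d x_k = beta_k * [phi(alpha_j - x'b) - phi(alpha_{j+1} - x'b)]."""
    dens = [0.0] + [phi(a - index) for a in cuts] + [0.0]
    return [beta_k * (dens[j] - dens[j + 1]) for j in range(len(cuts) + 1)]

cuts = [-0.5, 0.4, 1.3]    # hypothetical cut points (J = 3, four categories)
index = 0.25               # hypothetical value of x'beta
beta_k = 0.6               # hypothetical coefficient on x_k

probs = ordered_probit_probs(index, cuts)
effects = ordered_probit_effects(index, cuts, beta_k)
print(probs)
print(effects, sum(effects))
```

With $\beta_k > 0$ the effect on the lowest category is negative and the effect on the highest is positive; the signs for the middle categories depend on where $x'\beta$ sits relative to the cut points.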


Limited Dependent Variable Models: the dependent variable is constrained in some way.

Truncated models: observations outside a specific range are totally lost.

Censored models: we can observe at least the exogenous variables.

Examples: data censoring, corner solution outcomes (firm expenditures, insurance plans, etc.), and survival and duration models.


Example: A household is assumed to maximize utility subject to a budget constraint

$$y + z \le R$$

and the boundary constraint $y \ge y_0$ or $y = 0$.

Suppose that $y^*$ is the solution of the maximization subject to the budget constraint only, and we assume that

$$y^* = \beta_1 + \beta_2 x + u$$

The solution for this problem is

$$y = \begin{cases} y^* & \text{if } y^* > y_0 \\ 0 \text{ or } y_0 & \text{if } y^* \le y_0 \end{cases}$$


To solve this example, we assume that $u$ is a random variable and $y_0$ is known.

Given a random sample of size $N$, we obtain the likelihood

$$L = \prod_0 F_i(y_{0i}) \prod_1 f_i(y_i)$$

where $F_i$ and $f_i$ are the distribution function and density of $y_i^*$, and

$\prod_0$: product over those $i$ for which $y_i^* \le y_{0i}$
$\prod_1$: product over those $i$ for which $y_i^* > y_{0i}$


Standard Tobit Model (or Type I model):

$$y_i^* = x_i'\beta + u_i, \qquad u_i \mid x_i \sim N(0, \sigma^2)$$
$$y_i = \max(0, y_i^*)$$

where $x$ includes a column of ones.

Objects of interest:

Censored models: $E[y^* \mid x] = x'\beta$
Corner solutions: $E[y \mid x]$ or $E[y \mid x, y > 0]$

What do we know about bounds for $E[y \mid x]$?


Using Jensen's inequality,

$$E[y \mid x] \ge \max(0, E[y^* \mid x])$$

since $g(z) = \max(0, z)$ is a convex function.

In addition, we can write

$$E[y \mid x] = \Pr[y = 0 \mid x] \cdot 0 + \Pr[y > 0 \mid x]\, E[y \mid x, y > 0] = \Pr[y > 0 \mid x]\, E[y \mid x, y > 0]$$


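Let's define $w = 1$ if $y > 0$, and $w = 0$ if $y = 0$.

$$\Pr[y > 0 \mid x] = \Pr[w = 1 \mid x] = \Pr[y^* > 0 \mid x] = \Pr[u > -x'\beta \mid x] = \Pr\left[\frac{u}{\sigma} > -\frac{x'\beta}{\sigma} \,\Big|\, x\right] = \Phi\left(\frac{x'\beta}{\sigma}\right)$$

A probit of $w$ on $x$ consistently estimates $\beta/\sigma$.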


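Recall that if $z \sim N(0, 1)$, then for a constant $c$

$$E[z \mid z > c] = \frac{\phi(c)}{1 - \Phi(c)}$$

Note that

$$E[y \mid x, y > 0] = x'\beta + E[u \mid u > -x'\beta]$$
$$= x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{1 - \Phi(-x'\beta/\sigma)}\right]$$
$$= x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}\right]$$

Inverse Mills Ratio: $\lambda(c) = \dfrac{\phi(c)}{\Phi(c)}$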
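The truncated-mean formula and the inverse Mills ratio can be checked numerically. The sketch below compares $\phi(c)/(1 - \Phi(c))$ with a trapezoid-rule evaluation of $E[z \mid z > c]$; the truncation point and the integration settings are arbitrary choices for this illustration.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return phi(c) / Phi(c)

def truncated_mean_formula(c):
    """E[z | z > c] = phi(c) / (1 - Phi(c)) for z ~ N(0, 1)."""
    return phi(c) / (1.0 - Phi(c))

def truncated_mean_numeric(c, upper=12.0, n=20001):
    """Trapezoid approximation of int z phi(z) dz / int phi(z) dz over (c, upper)."""
    h = (upper - c) / (n - 1)
    num = den = 0.0
    for i in range(n):
        z = c + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        num += weight * z * phi(z)
        den += weight * phi(z)
    return num / den

c = 0.5
print(truncated_mean_formula(c), truncated_mean_numeric(c))
print(truncated_mean_formula(-c), inverse_mills(c))
```

The second line shows the step used in the Tobit derivation: truncating at $-c$ gives $\phi(c)/\Phi(c) = \lambda(c)$, by symmetry of the normal distribution.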





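If $x_j$ is a continuous explanatory variable,

$$\frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \beta_j + \beta_j \left[\frac{d\lambda}{dc}\left(\frac{x'\beta}{\sigma}\right)\right] = \beta_j \left\{1 - \lambda\left(\frac{x'\beta}{\sigma}\right)\left[\frac{x'\beta}{\sigma} + \lambda\left(\frac{x'\beta}{\sigma}\right)\right]\right\}$$

Using the properties of the normal distribution, we can show that

$$\left\{1 - \lambda\left(\frac{x'\beta}{\sigma}\right)\left[\frac{x'\beta}{\sigma} + \lambda\left(\frac{x'\beta}{\sigma}\right)\right]\right\} > 0$$

so the sign of $\beta_j$ gives the direction of the impact.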


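Using the above results,

$$E[y \mid x] = \Phi\left(\frac{x'\beta}{\sigma}\right)\left\{x'\beta + \sigma\lambda\left(\frac{x'\beta}{\sigma}\right)\right\} = \Phi\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\phi\left(\frac{x'\beta}{\sigma}\right)$$

and

$$\frac{\partial E[y \mid x]}{\partial x_j} = \frac{\partial \Pr[y > 0 \mid x]}{\partial x_j} E[y \mid x, y > 0] + \Pr[y > 0 \mid x] \frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \Phi\left(\frac{x'\beta}{\sigma}\right)\beta_j$$

What is the interpretation of the adjustment factor?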
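The Tobit conditional means and the partial effect $\Phi(x'\beta/\sigma)\beta_j$ can be sketched as below. The parameter values are hypothetical, and the partial effect is compared with a finite-difference derivative of $E[y \mid x]$ with respect to one regressor.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(c):
    return phi(c) / Phi(c)

def mean_given_positive(xb, sigma):
    """E[y | x, y > 0] = x'beta + sigma * lambda(x'beta / sigma)."""
    return xb + sigma * inverse_mills(xb / sigma)

def unconditional_mean(xb, sigma):
    """E[y | x] = Pr[y > 0 | x] * E[y | x, y > 0]."""
    return Phi(xb / sigma) * mean_given_positive(xb, sigma)

def partial_effect(xb, sigma, beta_j):
    """d E[y | x] / d x_j = Phi(x'beta / sigma) * beta_j."""
    return Phi(xb / sigma) * beta_j

beta = [0.5, 1.2]     # hypothetical coefficients (constant and one regressor)
x = [1.0, -0.3]       # hypothetical covariates
sigma = 1.5           # hypothetical error standard deviation
xb = sum(a * b for a, b in zip(x, beta))

# Finite-difference derivative with respect to x[1]: x'beta moves by beta[1] * h.
h = 1e-6
numeric = (unconditional_mean(xb + beta[1] * h, sigma)
           - unconditional_mean(xb - beta[1] * h, sigma)) / (2 * h)

print(unconditional_mean(xb, sigma), max(0.0, xb))
print(numeric, partial_effect(xb, sigma, beta[1]))
```

The first printed line also illustrates the Jensen bound $E[y \mid x] \ge \max(0, x'\beta)$, and the second shows that the adjustment factor is the probability of a positive outcome.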


Consider four estimators:

1. Probit Maximum Likelihood
2. Least Squares
3. Heckman two-step least squares
4. Tobit Maximum Likelihood

Random sample of $(y_i, x_i)$ of size $N$. However, $\{y_i^*\}$ is unobserved if $y_i^* \le 0$.

Assumptions: $\{x_i\}$ are uniformly bounded and $\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i x_i'$ is positive definite. The parameter space of $\beta$ and $\sigma^2$ is compact.


We need to derive the density of $y_i$ conditional on $x_i$.

From above, we know that

$$\Pr[y_i = 0 \mid x_i] = 1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)$$

For $c > 0$,

$$\Pr[y_i \le c \mid x_i] = \Pr[y_i^* \le c \mid x_i]$$

so

$$f(c \mid x_i) = f^*(c \mid x_i)$$


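By assumption $y^* \mid x \sim N(x'\beta, \sigma^2)$, and

$$f^*(c \mid x_i) = \frac{1}{\sigma}\, \phi\left(\frac{c - x_i'\beta}{\sigma}\right)$$

The density of $y_i$ conditional on $x_i$ is

$$f(y_i \mid x_i) = \left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]^{1\{y_i = 0\}} \left[\frac{1}{\sigma}\, \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{1\{y_i > 0\}}$$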


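Probit Maximum Likelihood

The likelihood for the censored model can be written, with $\alpha = \beta/\sigma$, as

$$L(\theta) = \prod_{i=1}^{N} \left[1 - \Phi(x_i'\alpha)\right]^{1\{y_i = 0\}} \Phi(x_i'\alpha)^{1\{y_i > 0\}} \times \prod_{i=1}^{N} \left[\Phi(x_i'\alpha)^{-1}\, \frac{1}{\sigma}\, \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]^{1\{y_i > 0\}}$$

The first part is the likelihood function of a probit model, and the last part is the likelihood of a truncated normal regression model.

The Probit MLE of $\alpha = \beta/\sigma$ is obtained by maximizing only the logarithm of the first part.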





This method cannot be efficient, since it uses only the sign of $y_i^*$ and not the value of $y_i$ when we observe it.

The estimator is consistent, but inefficient.

Using the same derivation as we did for MLE, we can show that, asymptotically,

$$\hat\alpha - \alpha \approx \left(X'D_1X\right)^{-1} X'D_1D_0^{-1}\left(w - E[w]\right)$$

where

$D_0$ is an $N \times N$ diagonal matrix with elements $\phi(x_i'\alpha)$
$D_1$ is an $N \times N$ diagonal matrix with elements $\Phi(x_i'\alpha)^{-1}\left[1 - \Phi(x_i'\alpha)\right]^{-1}\phi(x_i'\alpha)^2$
$w$ is the vector with elements $w_i$


Least Squares Estimator

We can use OLS in the entire sample (including the observations with zero) or in the sample for which $y_i > 0$.

Both estimators are inconsistent.

Using the results above,

$$E[y \mid x, y > 0] = x'\beta + \sigma\left[\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}\right] = x'\beta + \sigma\lambda(x'\alpha)$$


so we can write

$$y_i = x_i'\beta + \sigma\lambda(x_i'\alpha) + e_i, \qquad E[e_i \mid x_i, y_i > 0] = 0$$

Let's define $\lambda_i \equiv \lambda(x_i'\alpha)$.

If we run OLS of $y_i$ on $x_i$ using the sample with $y_i > 0$, we omit $\lambda_i$, which implies inconsistency of the OLS estimator.

If we run OLS of $y$ on $x$ using the full sample, OLS is also inconsistent: $E[y \mid x]$ is a NONLINEAR function of $x$, $\beta$ and $\sigma$.


Heckman's Two-Step Estimator

Let's go back to the model

$$y_i = x_i'\beta + \sigma\lambda(x_i'\alpha) + e_i$$

The variance of $e_i$ is

$$Var[e_i \mid x_i] = \sigma^2 - \sigma^2 x_i'\alpha\, \lambda(x_i'\alpha) - \sigma^2 \lambda(x_i'\alpha)^2$$

We have a nonlinear regression model.


The estimation proposed by Heckman has 2 steps:

STEP 1: Estimate $\alpha$ by the probit MLE.
STEP 2: Regress $y_i$ on $x_i$ and $\lambda(x_i'\hat\alpha)$ by least squares, using only the sample in which $y_i > 0$.

To derive the properties of Heckman's estimator, let's rewrite the model as

$$y_i = x_i'\beta + \sigma\lambda(x_i'\hat\alpha) + e_i + \eta_i$$

where

$$\eta_i = \sigma\left[\lambda(x_i'\alpha) - \lambda(x_i'\hat\alpha)\right]$$





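Let's define $\hat Z_i = (x_i', \lambda(x_i'\hat\alpha))$ and $\gamma = (\beta', \sigma)'$. In addition, let $N_1$ be the size of the sample in which $y_i > 0$.

In this case,

$$\hat\gamma = \left(\sum_{i=1}^{N_1} \hat Z_i'\hat Z_i\right)^{-1} \left(\sum_{i=1}^{N_1} \hat Z_i' y_i\right)$$

Is $\hat\gamma$ consistent?

Let's try to derive the asymptotic distribution of $\hat\gamma$. We can write

$$\sqrt{N_1}\left(\hat\gamma - \gamma\right) = \left(\frac{1}{N_1}\sum_{i=1}^{N_1} \hat Z_i'\hat Z_i\right)^{-1} \left(\frac{1}{\sqrt{N_1}}\sum_{i=1}^{N_1} \hat Z_i' e_i + \frac{1}{\sqrt{N_1}}\sum_{i=1}^{N_1} \hat Z_i' \eta_i\right)$$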


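From before, we know the probit $\hat\alpha$ is consistent, so

$$\operatorname{plim}_{N_1 \to \infty} \frac{1}{N_1}\sum_{i=1}^{N_1} \hat Z_i'\hat Z_i = \operatorname{plim}_{N_1 \to \infty} \frac{1}{N_1}\sum_{i=1}^{N_1} Z_i'Z_i = E\left[Z_i'Z_i\right]$$

where $Z_i = (x_i', \lambda(x_i'\alpha))$.

Under the assumption that $\{u_i\}$ is i.i.d. $N(0, \sigma^2)$, we can show that

$$\frac{1}{\sqrt{N_1}}\sum_{i=1}^{N_1} \hat Z_i' e_i \to_d N\left(0, \sigma^2 E\left[\Sigma_i Z_i'Z_i\right]\right)$$

where $E[e_i^2 \mid x_i] = \sigma^2\Sigma_i$, with $\Sigma_i = 1 - x_i'\alpha\, \lambda(x_i'\alpha) - \lambda(x_i'\alpha)^2$.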




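Doing a mean value expansion of $\lambda(x_i'\hat\alpha)$ around $\lambda(x_i'\alpha)$,

$$\lambda(x_i'\hat\alpha) = \lambda(x_i'\alpha) + \frac{\partial \lambda(x_i'\tilde\alpha)}{\partial \alpha'}\left(\hat\alpha - \alpha\right)$$

so

$$\eta_i = -\sigma\, \frac{\partial \lambda(x_i'\tilde\alpha)}{\partial \alpha'}\left(\hat\alpha - \alpha\right)$$

Under the assumptions above,

$$\frac{1}{\sqrt{N_1}}\sum_{i=1}^{N_1} \hat Z_i' \eta_i \to_d N(0, V)$$

where

$$V = \sigma^2 E\left[Z'(I - \Sigma)X\left(X'D_1X\right)^{-1}X'(I - \Sigma)Z\right]$$

and $\Sigma$ is the diagonal matrix with elements $\Sigma_i = 1 - x_i'\alpha\, \lambda(x_i'\alpha) - \lambda(x_i'\alpha)^2$.

At the end, $\sqrt{N_1}\left(\hat\gamma - \gamma\right)$ converges to a normal with mean zero and a finite variance.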




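Tobit Maximum Likelihood Estimator

The Tobit MLE maximizes the log-likelihood function, with $\theta = (\beta', \sigma^2)'$:

$$l(\theta) = \sum_{i=1}^{N} \left\{ 1\{y_i = 0\} \log\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + 1\{y_i > 0\} \left[\log \phi\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \frac{\log \sigma^2}{2}\right] \right\}$$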




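The first derivative of this problem is

$$\frac{\partial l(\theta)}{\partial \beta} = \sum_{i=1}^{N} \left\{ -1\{y_i = 0\} \frac{\phi(x_i'\beta/\sigma)\, x_i}{\sigma\left[1 - \Phi(x_i'\beta/\sigma)\right]} + 1\{y_i > 0\} \frac{(y_i - x_i'\beta)\, x_i}{\sigma^2} \right\}$$

$$\frac{\partial l(\theta)}{\partial \sigma^2} = \sum_{i=1}^{N} \left\{ 1\{y_i = 0\} \frac{\phi(x_i'\beta/\sigma)\, x_i'\beta}{2\sigma^3\left[1 - \Phi(x_i'\beta/\sigma)\right]} + 1\{y_i > 0\} \left[\frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right] \right\}$$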
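The Tobit log-likelihood and its first derivatives can be sketched and cross-checked numerically. The code below implements $l(\theta)$ together with the scores for $\beta$ and $\sigma^2$ of the standard Type I Tobit, and compares them with finite differences. The data and parameter values are invented for this example.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma2, X, y):
    """Sum of log[1 - Phi(x'b/s)] for y = 0 and log phi((y - x'b)/s) - log(s^2)/2 for y > 0."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi == 0:
            total += math.log(1.0 - Phi(xb / sigma))
        else:
            total += math.log(phi((yi - xb) / sigma)) - 0.5 * math.log(sigma2)
    return total

def tobit_score_beta(beta, sigma2, X, y):
    """Derivative of the log-likelihood with respect to beta."""
    sigma = math.sqrt(sigma2)
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi == 0:
            c = xb / sigma
            weight = -phi(c) / (sigma * (1.0 - Phi(c)))
        else:
            weight = (yi - xb) / sigma2
        for k, xik in enumerate(xi):
            grad[k] += weight * xik
    return grad

def tobit_score_sigma2(beta, sigma2, X, y):
    """Derivative of the log-likelihood with respect to sigma^2."""
    sigma = math.sqrt(sigma2)
    total = 0.0
    for xi, yi in zip(X, y):
        xb = sum(a * b for a, b in zip(xi, beta))
        if yi == 0:
            c = xb / sigma
            total += phi(c) * xb / (2.0 * sigma ** 3 * (1.0 - Phi(c)))
        else:
            total += (yi - xb) ** 2 / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma2)
    return total

# Hypothetical data with censoring at zero, and a trial parameter value.
X = [[1.0, -1.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0.0, 0.0, 0.4, 1.3, 2.1]
beta, sigma2 = [0.2, 0.8], 1.5

eps = 1e-6
num_b1 = (tobit_loglik([beta[0], beta[1] + eps], sigma2, X, y)
          - tobit_loglik([beta[0], beta[1] - eps], sigma2, X, y)) / (2 * eps)
num_s2 = (tobit_loglik(beta, sigma2 + eps, X, y)
          - tobit_loglik(beta, sigma2 - eps, X, y)) / (2 * eps)

print(tobit_score_beta(beta, sigma2, X, y)[1], num_b1)
print(tobit_score_sigma2(beta, sigma2, X, y), num_s2)
```

A check of this kind is a cheap safeguard before passing analytic derivatives to an iterative optimizer.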




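To derive the asymptotic distribution, we need the Hessian.

Let's get

$$A(x_i, \theta) \equiv -E[H_i(\theta) \mid x_i]$$

$$A(x_i, \theta) = \begin{pmatrix} a_i x_i x_i' & b_i x_i \\ b_i x_i' & c_i \end{pmatrix}$$

where, with $\alpha = \beta/\sigma$, $\phi_i = \phi(x_i'\alpha)$ and $\Phi_i = \Phi(x_i'\alpha)$,

$$a_i = -\sigma^{-2}\left\{x_i'\alpha\, \phi_i - \left[\frac{\phi_i^2}{1 - \Phi_i}\right] - \Phi_i\right\}$$

$$b_i = \frac{1}{2}\sigma^{-3}\left\{(x_i'\alpha)^2 \phi_i + \phi_i - \left[\frac{x_i'\alpha\, \phi_i^2}{1 - \Phi_i}\right]\right\}$$

$$c_i = -\frac{1}{4}\sigma^{-4}\left\{(x_i'\alpha)^3 \phi_i + x_i'\alpha\, \phi_i - \left[\frac{(x_i'\alpha)^2 \phi_i^2}{1 - \Phi_i}\right] - 2\Phi_i\right\}$$

Since it is an MLE, it is consistent and asymptotically normal, with asymptotic variance-covariance matrix equal to $E[A(x_i, \theta)]^{-1}$.

Computation: convergence is assured by global concavity; however, the choice of a good estimator for the inverse Mills ratio improves speed.

To compute the Tobit model, we can use the EM algorithm.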




References


Amemiya: chapters 9 and 10

Ruud: chapter 27

Wooldridge: chapters 15 and 16