Econ 423 – Lecture...
Transcript of Econ 423 – Lecture...
![Page 1: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/1.jpg)
11-1
Econ 423 – Lecture Notes
(These notes are slightly modified versions of lecture notes provided by
Stock and Watson, 2007. They are for instructional purposes only
and are not to be distributed outside of the classroom.)
![Page 2: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/2.jpg)
11-2
Regression with a Binary Dependent
Variable
So far the dependent variable (Y) has been continuous:
• district-wide average test score
• traffic fatality rate
What if Y is binary?
• Y = get into college, or not; X = years of education
• Y = person smokes, or not; X = income
• Y = mortgage application is accepted, or not; X = income,
house characteristics, marital status, race
![Page 3: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/3.jpg)
11-3
Example: Mortgage denial and race
The Boston Fed HMDA data set
• Individual applications for single-family mortgages made
in 1990 in the greater Boston area
• 2380 observations, collected under Home Mortgage
Disclosure Act (HMDA)
Variables
• Dependent variable:
o Is the mortgage denied or accepted?
• Independent variables:
o income, wealth, employment status
o other loan, property characteristics
o race of applicant
![Page 4: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/4.jpg)
11-4
The Linear Probability Model
A natural starting point is the linear regression model with a
single regressor:
Yi = β0 + β1Xi + ui
But:
• What does β1 mean when Y is binary? Is β1 = Y
X
∆∆
?
• What does the line β0 + β1X mean when Y is binary?
• What does the predicted value Y mean when Y is binary?
For example, what does Y = 0.26 mean?
![Page 5: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/5.jpg)
11-5
The linear probability model, ctd.
Yi = β0 + β1Xi + ui
Recall assumption #1: E(ui|Xi) = 0, so
E(Yi|Xi) = E(β0 + β1Xi + ui|Xi) = β0 + β1Xi
When Y is binary,
E(Y) = 1×Pr(Y=1) + 0×Pr(Y=0) = Pr(Y=1)
so
E(Y|X) = Pr(Y=1|X)
![Page 6: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/6.jpg)
11-6
The linear probability model, ctd.
When Y is binary, the linear regression model
Yi = β0 + β1Xi + ui
is called the linear probability model.
• The predicted value is a probability:
o E(Y|X=x) = Pr(Y=1|X=x) = prob. that Y = 1 given x
o Y = the predicted probability that Yi = 1, given X
• β1 = change in probability that Y = 1 for a given ∆x:
β1 = Pr( 1 | ) Pr( 1 | )Y X x x Y X x
x
= = + ∆ − = =∆
![Page 7: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/7.jpg)
11-7
Example: linear probability model, HMDA data
Mortgage denial v. ratio of debt payments to income
(P/I ratio) in the HMDA data set (subset)
![Page 8: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/8.jpg)
11-8
Linear probability model: HMDA data, ctd.
�deny = -.080 + .604P/I ratio (n = 2380)
(.032) (.098)
• What is the predicted value for P/I ratio = .3?
�Pr( 1 | / .3)deny P Iratio= = = -.080 + .604×.3 = .151
• Calculating “effects:” increase P/I ratio from .3 to .4:
�Pr( 1 | / .4)deny P Iratio= = = -.080 + .604×.4 = .212
The effect on the probability of denial of an increase in P/I
ratio from .3 to .4 is to increase the probability by .061, that is,
by 6.1 percentage points (what?).
![Page 9: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/9.jpg)
11-9
Linear probability model: HMDA data, ctd
Next include black as a regressor:
�deny = -.091 + .559P/I ratio + .177black
(.032) (.098) (.025)
Predicted probability of denial:
• for black applicant with P/I ratio = .3:
�Pr( 1)deny = = -.091 + .559×.3 + .177×1 = .254
• for white applicant, P/I ratio = .3:
�Pr( 1)deny = = -.091 + .559×.3 + .177×0 = .077
• difference = .177 = 17.7 percentage points
• Coefficient on black is significant at the 5% level
• Still plenty of room for omitted variable bias…
![Page 10: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/10.jpg)
11-10
The linear probability model: Summary
• Models Pr(Y=1|X) as a linear function of X
• Advantages:
o simple to estimate and to interpret
o inference is the same as for multiple regression (need
heteroskedasticity-robust standard errors)
• Disadvantages:
o Does it make sense that the probability should be linear
in X?
o Predicted probabilities can be <0 or >1!
• These disadvantages can be solved by using a nonlinear
probability model: probit and logit regression
![Page 11: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/11.jpg)
11-11
Probit and Logit Regression
The problem with the linear probability model is that it models
the probability of Y=1 as being linear:
Pr(Y = 1|X) = β0 + β1X
Instead, we want:
• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
• Pr(Y = 1|X) to be increasing in X (for β1>0)
This requires a nonlinear functional form for the probability.
How about an “S-curve”…
![Page 12: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/12.jpg)
11-12
The probit model satisfies these conditions:
• 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
• Pr(Y = 1|X) to be increasing in X (for β1>0)
![Page 13: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/13.jpg)
11-13
Probit regression models the probability that Y=1 using the
cumulative standard normal distribution function, evaluated at
z = β0 + β1X:
Pr(Y = 1|X) = Φ(β0 + β1X)
• Φ is the cumulative normal distribution function.
• z = β0 + β1X is the “z-value” or “z-index” of the probit
model.
Example: Suppose β0 = -2, β1= 3, X = .4, so
Pr(Y = 1|X=.4) = Φ(-2 + 3×.4) = Φ(-0.8)
Pr(Y = 1|X=.4) = area under the standard normal density to left
of z = -.8, which is…
![Page 14: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/14.jpg)
11-14
Pr(Z ≤ -0.8) = .2119
![Page 15: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/15.jpg)
11-15
Probit regression, ctd.
Why use the cumulative normal probability distribution?
• The “S-shape” gives us what we want:
o 0 ≤ Pr(Y = 1|X) ≤ 1 for all X
o Pr(Y = 1|X) to be increasing in X (for β1>0)
• Easy to use – the probabilities are tabulated in the
cumulative normal tables
• Relatively straightforward interpretation:
o z-value = β0 + β1X
o 0β + 1β X is the predicted z-value, given X
o β1 is the change in the z-value for a unit change in X
![Page 16: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/16.jpg)
11-16
STATA Example: HMDA data
. probit deny p_irat, r;
Iteration 0: log likelihood = -872.0853 We’ll discuss this later
Iteration 1: log likelihood = -835.6633
Iteration 2: log likelihood = -831.80534
Iteration 3: log likelihood = -831.79234
Probit estimates Number of obs = 2380
Wald chi2(1) = 40.68
Prob > chi2 = 0.0000
Log likelihood = -831.79234 Pseudo R2 = 0.0462
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.967908 .4653114 6.38 0.000 2.055914 3.879901
_cons | -2.194159 .1649721 -13.30 0.000 -2.517499 -1.87082
------------------------------------------------------------------------------
�Pr( 1 | / )deny P Iratio= = Φ(-2.19 + 2.97×P/I ratio)
(.16) (.47)
![Page 17: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/17.jpg)
11-17
STATA Example: HMDA data, ctd.
�Pr( 1 | / )deny P Iratio= = Φ(-2.19 + 2.97×P/I ratio)
(.16) (.47)
• Positive coefficient: does this make sense?
• Standard errors have the usual interpretation
• Predicted probabilities:
�Pr( 1 | / .3)deny P Iratio= = = Φ(-2.19+2.97×.3)
= Φ(-1.30) = .097
• Effect of change in P/I ratio from .3 to .4:
�Pr( 1 | / .4)deny P Iratio= = = Φ(-2.19+2.97×.4) = .159
Predicted probability of denial rises from .097 to .159
![Page 18: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/18.jpg)
11-18
Probit regression with multiple regressors
Pr(Y = 1|X1, X2) = Φ(β0 + β1X1 + β2X2)
• Φ is the cumulative normal distribution function.
• z = β0 + β1X1 + β2X2 is the “z-value” or “z-index” of the
probit model.
• β1 is the effect on the z-score of a unit change in X1, holding
constant X2
![Page 19: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/19.jpg)
11-19
STATA Example: HMDA data
. probit deny p_irat black, r;
Iteration 0: log likelihood = -872.0853
Iteration 1: log likelihood = -800.88504
Iteration 2: log likelihood = -797.1478
Iteration 3: log likelihood = -797.13604
Probit estimates Number of obs = 2380
Wald chi2(2) = 118.18
Prob > chi2 = 0.0000
Log likelihood = -797.13604 Pseudo R2 = 0.0859
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
We’ll go through the estimation details later…
![Page 20: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/20.jpg)
11-20
STATA Example, ctd.: predicted probit probabilities
. probit deny p_irat black, r;
Probit estimates Number of obs = 2380
Wald chi2(2) = 118.18
Prob > chi2 = 0.0000
Log likelihood = -797.13604 Pseudo R2 = 0.0859
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 2.741637 .4441633 6.17 0.000 1.871092 3.612181
black | .7081579 .0831877 8.51 0.000 .545113 .8712028
_cons | -2.258738 .1588168 -14.22 0.000 -2.570013 -1.947463
------------------------------------------------------------------------------
. sca z1 = _b[_cons]+_b[p_irat]*.3+_b[black]*0;
. display "Pred prob, p_irat=.3, white: " normprob(z1);
Pred prob, p_irat=.3, white: .07546603
NOTE
_b[_cons] is the estimated intercept (-2.258738)
_b[p_irat] is the coefficient on p_irat (2.741637)
sca creates a new scalar which is the result of a calculation
display prints the indicated information to the screen
![Page 21: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/21.jpg)
11-21
STATA Example, ctd.
�Pr( 1 | / , )deny P I black=
= Φ(-2.26 + 2.74×P/I ratio + .71×black)
(.16) (.44) (.08)
• Is the coefficient on black statistically significant?
• Estimated effect of race for P/I ratio = .3:
�Pr( 1 | .3,1)deny = = Φ(-2.26+2.74×.3+.71×1) = .233
�Pr( 1 | .3,0)deny = = Φ(-2.26+2.74×.3+.71×0) = .075
• Difference in rejection probabilities = .158 (15.8 percentage
points)
• Still plenty of room still for omitted variable bias…
![Page 22: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/22.jpg)
11-22
Logit Regression
Logit regression models the probability of Y=1 as the
cumulative standard logistic distribution function, evaluated at
z = β0 + β1X:
Pr(Y = 1|X) = F(β0 + β1X)
F is the cumulative logistic distribution function:
F(β0 + β1X) = 0 1( )
1
1X
eβ β− ++
![Page 23: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/23.jpg)
11-23
Logit regression, ctd.
Pr(Y = 1|X) = F(β0 + β1X)
where F(β0 + β1X) = 0 1( )
1
1X
eβ β− ++
.
Example: β0 = -3, β1= 2, X = .4,
so β0 + β1X = -3 + 2×.4 = -2.2 so
Pr(Y = 1|X=.4) = 1/(1+e–(–2.2)) = .0998
Why bother with logit if we have probit?
• Historically, logit is more convenient computationally
• In practice, logit and probit are very similar
![Page 24: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/24.jpg)
11-24
![Page 25: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/25.jpg)
11-25
![Page 26: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/26.jpg)
11-26
STATA Example: HMDA data . logit deny p_irat black, r;
Iteration 0: log likelihood = -872.0853 Later…
Iteration 1: log likelihood = -806.3571
Iteration 2: log likelihood = -795.74477
Iteration 3: log likelihood = -795.69521
Iteration 4: log likelihood = -795.69521
Logit estimates Number of obs = 2380
Wald chi2(2) = 117.75
Prob > chi2 = 0.0000
Log likelihood = -795.69521 Pseudo R2 = 0.0876
------------------------------------------------------------------------------
| Robust
deny | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p_irat | 5.370362 .9633435 5.57 0.000 3.482244 7.258481
black | 1.272782 .1460986 8.71 0.000 .9864339 1.55913
_cons | -4.125558 .345825 -11.93 0.000 -4.803362 -3.447753
------------------------------------------------------------------------------
. dis "Pred prob, p_irat=.3, white: "
> 1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
Pred prob, p_irat=.3, white: .07485143
NOTE: the probit predicted probability is .07546603
![Page 27: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/27.jpg)
11-27
Predicted probabilities from estimated probit and logit models
usually are (as usual) very close in this application.
![Page 28: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/28.jpg)
11-28
Example for class discussion:
Characterizing the Background of Hezbollah Militants
Source: Alan Krueger and Jitka Maleckova, “Education, Poverty and
Terrorism: Is There a Causal Connection?” Journal of Economic
Perspectives, Fall 2003, 119-144.
Logit regression: 1 = died in Hezbollah military event
Table of logit results:
![Page 29: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/29.jpg)
11-29
![Page 30: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/30.jpg)
11-30
![Page 31: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/31.jpg)
11-31
Hezbollah militants example, ctd.
Compute the effect of schooling by comparing predicted
probabilities using the logit regression in column (3):
Pr(Y=1|secondary = 1, poverty = 0, age = 20)
– Pr(Y=0|secondary = 0, poverty = 0, age = 20):
Pr(Y=1|secondary = 1, poverty = 0, age = 20)
= 1/[1+e–(–5.965+.281×1 – .335×0 – .083×20)]
= 1/[1 + e7.344] = .000646 does this make sense?
Pr(Y=1|secondary = 0, poverty = 0, age = 20)
= 1/[1+e–(–5.965+.281×0 – .335×0 – .083×20)]
= 1/[1 + e7.625] = .000488 does this make sense?
![Page 32: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/32.jpg)
11-32
Predicted change in probabilities:
Pr(Y=1|secondary = 1, poverty = 0, age = 20)
– Pr(Y=1|secondary = 1, poverty = 0, age = 20)
= .000646 – .000488 = .000158
Both these statements are true:
• The probability of being a Hezbollah militant increases by
0.0158 percentage points, if secondary school is attended.
• The probability of being a Hezbollah militant increases by
32%, if secondary school is attended (.000158/.000488 =
.32).
• These sound so different! what is going on?
![Page 33: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/33.jpg)
11-33
Estimation and Inference in Probit (and Logit) Models
Probit model:
Pr(Y = 1|X) = Φ(β0 + β1X)
• Estimation and inference
o How can we estimate β0 and β1?
o What is the sampling distribution of the estimators?
o Why can we use the usual methods of inference?
• First motivate via nonlinear least squares
• Then discuss maximum likelihood estimation (what is
actually done in practice)
![Page 34: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/34.jpg)
11-34
Probit estimation by nonlinear least squares
Recall OLS:
0 1
2
, 0 1
1
min [ ( )]n
b b i i
i
Y b b X=
− +∑
• The result is the OLS estimators 0β and 1β
• Nonlinear least squares estimator of probit coefficients:
0 1
2
, 0 1
1
min [ ( )]n
b b i i
i
Y b b X=
− Φ +∑
How to solve this minimization problem?
• Calculus doesn’t give and explicit solution.
• Solved numerically using the computer(specialized
minimization algorithms)
• In practice, nonlinear least squares isn’t used because it
isn’t efficient – an estimator with a smaller variance is…
![Page 35: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/35.jpg)
11-35
Probit estimation by maximum likelihood
The likelihood function is the conditional density of Y1,…,Yn
given X1,…,Xn, treated as a function of the unknown
parameters β0 and β1.
• The maximum likelihood estimator (MLE) is the value of
(β0, β1) that maximize the likelihood function.
• The MLE is the value of (β0, β1) that best describe the full
distribution of the data.
• In large samples, the MLE is:
o consistent
o normally distributed
o efficient (has the smallest variance of all estimators)
![Page 36: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/36.jpg)
11-36
Special case: the probit MLE with no X
Y = 1 with probability
0 with probability 1
p
p
−
(Bernoulli distribution)
Data: Y1,…,Yn, i.i.d.
Derivation of the likelihood starts with the density of Y1:
Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1–p
so
Pr(Y1 = y1) = 1 11(1 )y yp p
−− (verify this for y1=0, 1!)
![Page 37: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/37.jpg)
11-37
Joint density of (Y1,Y2):
Because Y1 and Y2 are independent,
Pr(Y1 = y1,Y2 = y2) = Pr(Y1 = y1)× Pr(Y2 = y2)
= [ 1 11(1 )y yp p
−− ]×[ 2 21(1 )y yp p
−− ]
= ( ) [ ]1 2 1 22 ( )
(1 )y y y y
p p+ − +−
Joint density of (Y1,..,Yn):
Pr(Y1 = y1,Y2 = y2,…,Yn = yn)
= [ 1 11(1 )y yp p
−− ]×[ 2 21(1 )y yp p
−− ]×…×[1
(1 )n ny yp p
−− ]
= ( )11 (1 )
nnii ii
n yyp p ==
−∑∑ −
![Page 38: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/38.jpg)
11-38
The likelihood is the joint density, treated as a function of the
unknown parameters, which here is p:
f(p;Y1,…,Yn) = ( )11 (1 )
nnii ii
n YYp p ==
−∑∑ −
The MLE maximizes the likelihood. Its easier to work with
the logarithm of the likelihood, ln[f(p;Y1,…,Yn)]:
ln[f(p;Y1,…,Yn)] = ( ) ( )1 1ln( ) ln(1 )
n n
i ii iY p n Y p
= =+ − −∑ ∑
Maximize the likelihood by setting the derivative = 0:
1ln ( ; ,..., )nd f p Y Y
dp = ( ) ( )1 1
1 1
1
n n
i ii iY n Y
p p= =
−+ − − ∑ ∑ = 0
Solving for p yields the MLE; that is, ˆ MLEp satisfies,
![Page 39: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/39.jpg)
11-39
( ) ( )1 1
1 1
ˆ ˆ1
n n
i iMLE MLEi iY n Y
p p= =
−+ − − ∑ ∑ = 0
or
( ) ( )1 1
1 1
ˆ ˆ1
n n
i iMLE MLEi iY n Y
p p= == −
−∑ ∑
or
ˆ
ˆ1 1
MLE
MLE
Y p
Y p=
− −
or
ˆ MLEp = Y = fraction of 1’s
whew… a lot of work to get back to the first thing you would
think of using…but the nice thing is that this whole approach
generalizes to more complicated models...
![Page 40: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/40.jpg)
11-40
The MLE in the “no-X” case (Bernoulli distribution), ctd.:
ˆ MLEp = Y = fraction of 1’s
• For Yi i.i.d. Bernoulli, the MLE is the “natural” estimator of
p, the fraction of 1’s, which is Y
• We already know the essentials of inference:
o In large n, the sampling distribution of ˆ MLEp = Y is
normally distributed
o Thus inference is “as usual:” hypothesis testing via t-
statistic, confidence interval as ± 1.96SE
![Page 41: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/41.jpg)
11-41
The MLE in the “no-X” case (Bernoulli distribution), ctd:
• The theory of maximum likelihood estimation says that ˆ MLEp
is the most efficient estimator of p – of all possible
estimators – at least for large n. (Much stronger than the
Gauss-Markov theorem). This is why people use the MLE.
• STATA note: to emphasize requirement of large-n, the
printout calls the t-statistic the z-statistic; instead of the F-
statistic, the chi-squared statistic (= q×F).
• Now we extend this to probit – in which the probability is
conditional on X – the MLE of the probit coefficients.
![Page 42: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/42.jpg)
11-42
The probit likelihood with one X
The derivation starts with the density of Y1, given X1:
Pr(Y1 = 1|X1) = Φ(β0 + β1X1)
Pr(Y1 = 0|X1) = 1–Φ(β0 + β1X1)
so
Pr(Y1 = y1|X1) = 1 11
0 1 1 0 1 1( ) [1 ( )]y y
X Xβ β β β −Φ + − Φ +
The probit likelihood function is the joint density of Y1,…,Yn
given X1,…,Xn, treated as a function of β0, β1:
f(β0,β1; Y1,…,Yn|X1,…,Xn)
= { 1 11
0 1 1 0 1 1( ) [1 ( )]Y Y
X Xβ β β β −Φ + − Φ + }×
…×{1
0 1 0 1( ) [1 ( )]n nY Y
n nX Xβ β β β −Φ + − Φ + }
![Page 43: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/43.jpg)
11-43
The probit likelihood function:
f(β0,β1; Y1,…,Yn|X1,…,Xn)
= { 1 11
0 1 1 0 1 1( ) [1 ( )]Y Y
X Xβ β β β −Φ + − Φ + }×
…×{1
0 1 0 1( ) [1 ( )]n nY Y
n nX Xβ β β β −Φ + − Φ + }
• Can’t solve for the maximum explicitly
• Must maximize using numerical methods
• As in the case of no X, in large samples:
o 0ˆ MLEβ , 1
ˆ MLEβ are consistent
o 0ˆ MLEβ , 1
ˆ MLEβ are normally distributed
o 0ˆ MLEβ , 1
ˆ MLEβ are asymptotically efficient – among all
estimators (assuming the probit model is the correct
model)
![Page 44: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/44.jpg)
11-44
The Probit MLE, ctd.
• Standard errors of 0ˆ MLEβ , 1
ˆ MLEβ are computed
automatically…
• Testing, confidence intervals proceeds as usual
• For multiple X’s, see SW App. 11.2
![Page 45: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/45.jpg)
11-45
The logit likelihood with one X
• The only difference between probit and logit is the
functional form used for the probability: Φ is replaced by
the cumulative logistic function.
• Otherwise, the likelihood is similar; for details see SW
App. 11.2
• As with probit,
o 0ˆ MLEβ , 1
ˆ MLEβ are consistent
o 0ˆ MLEβ , 1
ˆ MLEβ are normally distributed
o Their standard errors can be computed
o Testing, confidence intervals proceeds as usual
![Page 46: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/46.jpg)
11-46
Measures of fit for logit and probit
The R2 and 2R don’t make sense here (why?). So, two other
specialized measures are used:
1. The fraction correctly predicted = fraction of Y’s for which
predicted probability is >50% (if Yi=1) or is <50% (if
Yi=0).
2. The pseudo-R2 measure the fit using the likelihood
function: measures the improvement in the value of the log
likelihood, relative to having no X’s (see SW App. 11.2).
This simplifies to the R2 in the linear model with normally
distributed errors.
![Page 47: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/47.jpg)
11-47
Application to the Boston HMDA Data
(SW Section 11.4)
• Mortgages (home loans) are an essential part of buying a
home.
• Is there differential access to home loans by race?
• If two otherwise identical individuals, one white and one
black, applied for a home loan, is there a difference in the
probability of denial?
![Page 48: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/48.jpg)
11-48
The HMDA Data Set
• Data on individual characteristics, property characteristics,
and loan denial/acceptance
• The mortgage application process circa 1990-1991:
o Go to a bank or mortgage company
o Fill out an application (personal+financial info)
o Meet with the loan officer
• Then the loan officer decides – by law, in a race-blind way.
Presumably, the bank wants to make profitable loans, and
the loan officer doesn’t want to originate defaults.
![Page 49: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/49.jpg)
11-49
The loan officer’s decision
• Loan officer uses key financial variables:
o P/I ratio
o housing expense-to-income ratio
o loan-to-value ratio
o personal credit history
• The decision rule is nonlinear:
o loan-to-value ratio > 80%
o loan-to-value ratio > 95% (what happens in default?)
o credit score
![Page 50: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/50.jpg)
11-50
Regression specifications
Pr(deny=1|black, other X’s) = …
• linear probability model
• probit
Main problem with the regressions so far: potential omitted
variable bias. All these (i) enter the loan officer decision
function, all (ii) are or could be correlated with race:
• wealth, type of employment
• credit history
• family status
The HMDA data set is very rich…
![Page 51: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/51.jpg)
11-51
![Page 52: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/52.jpg)
11-52
![Page 53: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/53.jpg)
11-53
![Page 54: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/54.jpg)
11-54
Table 11.2, ctd.
![Page 55: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/55.jpg)
11-55
Table 11.2, ctd.
![Page 56: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/56.jpg)
11-56
Summary of Empirical Results
• Coefficients on the financial variables make sense.
• Black is statistically significant in all specifications
• Race-financial variable interactions aren’t significant.
• Including the covariates sharply reduces the effect of race on
denial probability.
• LPM, probit, logit: similar estimates of effect of race on the
probability of denial.
• Estimated effects are large in a “real world” sense.
![Page 57: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/57.jpg)
11-57
Remaining threats to internal, external validity
• Internal validity
1. omitted variable bias
• what else is learned in the in-person interviews?
2. functional form misspecification (no…)
3. measurement error (originally, yes; now, no…)
4. selection
• random sample of loan applications
• define population to be loan applicants
5. simultaneous causality (no)
• External validity
This is for Boston in 1990-91. What about today?
![Page 58: Econ 423 – Lecture Noteseconweb.umd.edu/~chao/Teaching/Econ423/Econ423_Binary_Choice_Models.pdf · Econ 423 – Lecture Notes (These notes are slightly modified versions of lecture](https://reader033.fdocuments.net/reader033/viewer/2022042211/5eb060dd32dd7e68764fb891/html5/thumbnails/58.jpg)
11-58
Summary
(SW Section 11.5)
• If Yi is binary, then E(Y| X) = Pr(Y=1|X)
• Three models:
o linear probability model (linear multiple regression)
o probit (cumulative standard normal distribution)
o logit (cumulative standard logistic distribution)
• LPM, probit, logit all produce predicted probabilities
• Effect of ∆X is change in conditional probability that Y=1.
For logit and probit, this depends on the initial X
• Probit and logit are estimated via maximum likelihood
o Coefficients are normally distributed for large n
o Large-n hypothesis testing, conf. intervals is as usual