The Probit Model
Transcript of The Probit Model
![Page 1: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/1.jpg)
The Probit Model
Alexander SpermannUniversity of FreiburgUniversity of Freiburg
SoSe 2009
![Page 2: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/2.jpg)
1. Notation and statistical foundations
2. Introduction to the Probit model
3. Application
4. Coefficients and marginal effects
Course outline
2
4. Coefficients and marginal effects
5. Goodness-of-fit
6. Hypothesis tests
![Page 3: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/3.jpg)
1 2 2
t 0 1 1
'
^ ^
1. y Gujarati
y W ooldridge
2. M atrix
i i k ik i
t k tk t
x x
x x u
Y X
Y x
Y X u
β β β εβ β β
β εβ ε
β
= + + + += + + + +
= += +
= +
K
K
Notation and statistical foundations
3
( ) ( )
'
' '
0 1
1 2 1 2 3 2
2 3
1 1
i i i
i i
Y X u
y x
x x
x x x x
ββ ε
β ββ ββ ββ β
= += +
![Page 4: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/4.jpg)
Notation and statistical foundations – Vectors
� Column vector:
� Transposed (row vector): [ ]
1
2
1
'
xn
n
a
aa
a
a a a a
=
=
M
K
4
� Transposed (row vector):
� Inner product:
[ ]
[ ]
'1 2
1
1
2'1 2
xn
n
n i i
n
a a a a
b
ba b a a a a b
b
=
= =
∑
K
KM
![Page 5: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/5.jpg)
� PDF: probability density function f(x)
� Example: Normal distribution:
( )( )2
2
121
x
x e
µσφ
− − =
Notation and statistical foundations – density function
5
� Example: Standard normal distribution: N(0,1), µ = 0, σ = 1
( )2
x eφσ π
=
( )2
21
2
x
x eφπ
−=
0=µ
![Page 6: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/6.jpg)
� Standard logistic distribution:
Exponential distribution:
( ) 3,0,
1)(
22
2
πσµ ==+
=x
x
e
exf
Notation and statistical foundations – distibutions
6
Exponential distribution:
� Poisson distribution:
220,
1
0,0 ,,0,)( θσθµθθ
θ ==>
=≥
≤
−xe
x
x
xf
θσθµθθ
===−
2,,!
)(x
exf
x
![Page 7: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/7.jpg)
� CDF: cumulative distribution function F(x)
� Example: Standard normal distribution:
( )2
21
2
z x
z e dxπ
−
−∞
Φ = ∫
Notation and statistical foundations – CDF
7
� The cdf is the integral of the pdf.
![Page 8: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/8.jpg)
� Rule I:
� Rule II:
log log log
n
y x z
y x z
y x
== +
=
Notation and statistical foundations – logarithms
8
� Rule II:
� Rule III:
log log
log log log
b
y x
y n x
y a x
y a b x
==
== +
![Page 9: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/9.jpg)
� Why not use OLS instead?
Introduction to the Probit model – binary variables
=0
1y
OLS
9
� Nonlinear estimation, for example by maximum likelihood.
1
0
OLS(linear)
x x x x x x x
x x x x
![Page 10: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/10.jpg)
� Latent variable: Unobservable variable y* which can take all values in (-∞, +∞).
� Example: y* = Utility(Labour income) - Utility(Non labour income)
Underlying latent model:
Introduction to the Probit model – latent variables
10
� Underlying latent model:
iii
i
ii
xy
y
yy
εβ +=
≤>
=
'*
0*,0
0*,1
![Page 11: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/11.jpg)
� Probit is based on a latent model:
Introduction to the Probit model – latent variables
( )εφ
)|(
)|0(
)|0()|1(
'
'
*
ββε
εβ
ii
ii
ii
xxP
xxP
xyPxyP
−−=
−>=
>+=
>==
11
Assumption: Error terms are independent and normally distributed:
'ix β−
)(1 'βixF −−=
ββ ''ii xx−
)(
1),(1)|1(
'
'
β
σσβ
i
ii
x
xxyP
Φ=
≡−Φ−==
because of symmetry
![Page 12: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/12.jpg)
� Example:
Introduction to the Probit model – CDF
( )zΦ=CDF ( )zΦ=CDF
0,8
1
12
1-0,2=0,81-0,2=0,8z− z0
0,2
0,5
0,8
β'ixz =
![Page 13: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/13.jpg)
� F(z) lies between zero and one
� CDF of Probit: CDF of Logit:
Introduction to the Probit model – CDF Probit vs. Logit
13
β'ixz = β'ixz =
![Page 14: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/14.jpg)
� PDF of Probit: PDF of Logit:
Introduction to the Probit model – PDF Probit vs. Logit
14
![Page 15: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/15.jpg)
� Joint density:
Introduction to the Probit model – The ML principle
[ ]ii
ii
yi
yi
y
iy
ii
FF
xFxFxyf
−
−
−=
−=
∏∏
1
)1(''
)1(
)(1)(),|( βββ
15
� Log likelihood function:
ii
i∏
)1ln()1(lnln iii
ii FyFyL −−+=∑
![Page 16: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/16.jpg)
� The principle of ML: Which value of β maximizes the probability of observing the given sample?
Introduction to the Probit model – The ML principle
1
))(1(ln
−−−+=
∂∂
∑ ii i
ii
i
ii xF
fy
F
fyL
β
16
0
)1(
1
=
−−=
−∂
∑ ii
iii
ii
i ii
xfFF
Fy
FFβ
![Page 17: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/17.jpg)
� Example taken from Greene, Econometric Analysis, 5. ed. 2003, ch. 17.3.
� 10 observations of a discrete distribution
� Random sample: 5, 0, 1, 1, 0, 3, 2, 3, 4, 1
Introduction to the Probit model – Example
17
� PDF:
� Joint density :
� Which value of θ makes occurance of the observed sample most probable?
( )!
,i
x
i x
exf
iθθθ−
=
( ) ( )36,207
!,|,,,
2010
10
1
1010
11021
θθθθθθ ⋅=
∑⋅==−
=
−
= ∏∏ e
x
exfxxxf
ii
x
ii
i i
K
![Page 18: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/18.jpg)
( )( )
ln 10 20ln 12,242
ln 2010 0
L
d L
d
θ θ θθ
θ θ
= − + −
= − + =
( )xL |θ
Introduction to the Probit model – Example
18
( )xL |θ
θ
( )xL |θ
( )xL |ln θ
2Maximumd
Ld
⇒
−=22
2 20)(ln
θθθ
![Page 19: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/19.jpg)
Application
� Analysis of the effect of a new teaching method in economic sciences
� Data: Beobachtung GPA TUCE PSI Grade Beobachtung GPA TUCE PSI Grade
1 2,66 20 0 0 17 2,75 25 0 02 2,89 22 0 0 18 2,83 19 0 0
19
Source: Spector, L. and M. Mazzeo, Probit Analysis and Economic Education. In: Journal of Economic Education, 11, 1980, pp.37-44
3 3,28 24 0 0 19 3,12 23 1 04 2,92 12 0 0 20 3,16 25 1 15 4 21 0 1 21 2,06 22 1 06 2,86 17 0 0 22 3,62 28 1 17 2,76 17 0 0 23 2,89 14 1 08 2,87 21 0 0 24 3,51 26 1 09 3,03 25 0 0 25 3,54 24 1 110 3,92 29 0 1 26 2,83 27 1 111 2,63 20 0 0 27 3,39 17 1 112 3,32 23 0 0 28 2,67 24 1 013 3,57 23 0 0 29 3,65 21 1 114 3,26 25 0 1 30 4 23 1 115 3,53 26 0 0 31 3,1 21 1 016 2,74 19 0 0 32 2,39 19 1 1
![Page 20: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/20.jpg)
Application – Variables
� GradeDependent variable. Indicates whether a student improved his grades after the new teaching method PSI had been introduced (0 = no, 1 = yes).
� PSI
20
� PSIIndicates if a student attended courses that used the new method (0 = no, 1 = yes).
� GPAAverage grade of the student
� TUCEScore of an intermediate test which shows previous knowledge of a topic.
![Page 21: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/21.jpg)
Application – Estimation
� Estimation results of the model (output from Stata):
21
![Page 22: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/22.jpg)
Application – Discussion
� ML estimator: Parameters were obtained by maximization of the log likelihood function.Here: 5 iterations were necessary to find the maximum of the log likelihood function (-12.818803)
� Interpretation of the estimated coefficients:
22
� Interpretation of the estimated coefficients:
� Estimated coefficients do not quantify the influence of the rhs variables on the probability that the lhs variable takes on the value one.
� Estimated coefficients are parameters of the latent model.
![Page 23: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/23.jpg)
Coefficients and marginal effects
� The marginal effect of a rhs variable is the effect of an unit change of this variable on the probability P(Y = 1|X = x), given that all other rhs variables are constant:
ββϕ )()|()|1( '
iiiii x
x
xyE
x
xyP =∂
∂=∂=∂
23
� Recap: The slope parameter of the linear regression model measures directly the marginal effect of the rhs variable on the lhs variable.
iii xx ∂∂
![Page 24: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/24.jpg)
Coefficients and marginal effects
� The marginal effect depends on the value of the rhs variable.
� Therefore, there exists an individual marginal effect for each person of the sample:
24
![Page 25: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/25.jpg)
Coefficients and marginal effects – Computation
� Two different types of marginal effects can be calculated:
� Average marginal effect Stata command: margin
25
� Marginal effect at the mean: Stata command: mfx compute
![Page 26: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/26.jpg)
Coefficients and marginal effects – Computation
� Principle of the computation of the average marginal effects:
26
� Average of individual marginal effects
![Page 27: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/27.jpg)
Coefficients and marginal effects – Computation
� Computation of average marginal effects depends on type of rhs variable:
� Continuous variables like TUCE and GPA:
ββϕ∑=
=n
iix
nAME
1
' )(1
27
� Dummy variable like PSI:
=in 1
[ ]∑=
=Φ−=Φ=n
i
kii
kii xxxx
nAME
1
'' )0()1(1 ββ
![Page 28: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/28.jpg)
Coefficients and marginal effects – Interpretation
� Interpretation of average marginal effects:
� Continuous variables like TUCE and GPA:An infinitesimal change of TUCE or GPA changes the probability that the lhs variable takes the value one by X%.
� Dummy variable like PSI:
28
� Dummy variable like PSI:A change of PSI from zero to one changes the probability that the lhs variable takes the value one by X percentage points.
![Page 29: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/29.jpg)
Coefficients and marginal effects – Interpretation
Variable Estimated marginal effect Interpretation
GPA 0.364 If the average grade of a student goes up by an infinitesimal amount, the probability for the variable grade taking the value one rises by
29
the value one rises by 36.4 %.
TUCE 0.011 Analog to GPA,with an increase of 1.1%.
PSI 0.374 If the dummy variable changes from zero to one, the probability for the variable grade taking the value one rises by 37.4 ppts.
![Page 30: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/30.jpg)
Coefficients and marginal effects – Significance
� Significance of a coefficient: test of the hypothesis whether a parameter is significantly different from zero.
� The decision problem is similar to the t-test, wheras the probit test statistic follows a standard normal distribution. The z-value is equal to the estimated parameter divided by
30
The z-value is equal to the estimated parameter divided by its standard error.
� Stata computes a p-value which shows directly the significance of a parameter:
z-value p-value Interpretation
GPA : 3.22 0.001 significant
TUCE: 0,62 0,533 insignificant
PSI: 2,67 0,008 significant
![Page 31: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/31.jpg)
Coefficients and marginal effects
� Only the average of the marginal effects is displayed.
� The individual marginal effects show large variation:
31
Stata command: margin, table
![Page 32: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/32.jpg)
Coefficients and marginal effects
� Variation of marginal effects may be quantified by the confidence intervals of the marginal effects.
� In which range one can expect a coefficient of the population?
In our example:
32
� In our example:
Estimated coefficient Confidence interval (95%)
GPA: 0,364 - 0,055 - 0,782
TUCE: 0,011 - 0,002 - 0,025
PSI: 0,374 0,121 - 0,626
![Page 33: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/33.jpg)
Coefficients and marginal effects
� What is calculated by mfx?
� Estimation of the marginal effect at the sample mean.
33
Sample mean
![Page 34: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/34.jpg)
Goodness of fit
� Goodness of fit may be judged by McFaddens Pseudo R².
� Measure for proximity of the model to the observed data.
� Comparison of the estimated model with a model which only contains a constant as rhs variable.
34
� : Likelihood of model of interest.
� : Likelihood with all coefficients except that of the intercept restricted to zero.
� It always holds that
)(ˆln FullML
)(ˆln InterceptML
)(ˆln FullML ≥ )(ˆln InterceptML
![Page 35: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/35.jpg)
Goodness of fit
� The Pseudo R² is defined as:
� Similar to the R² of the linear regression model, it holds
)(ˆln
)(ˆln122
Intercept
FullMcF
ML
MLRPseudoR −==
35
� Similar to the R² of the linear regression model, it holds that
� An increasing Pseudo R² may indicate a better fit of the model, whereas no simple interpretation like for the R² of the linear regression model is possible.
10 2 ≤≤ McFR
![Page 36: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/36.jpg)
Goodness of fit
� A high value of R²McF does not necessarily indicate a good fit, however, as R²McF = 1 if = 0.
� R²McF increases with additional rhs variables. Therefore, an adjusted measure may be appropriate:
)(ˆln FullML
)(ˆln KML −
36
� Further goodness of fit measures: R² of McKelvey and Zavoinas, Akaike Information Criterion (AIC), etc. See also the Stata command fitstat.
)(ˆln
)(ˆln122
Intercept
FullMcFadjusted
ML
KMLRPseudoR
−−==
![Page 37: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/37.jpg)
Hypothesis tests
� Likelihood ratio test: possibility for hypothesis testing, for example for variable relevance.
� Basic principle: Comparison of the log likelihood functions of the unrestricted model (ln LU) and that of the restricted model (ln LR)
37
model (ln LR)
� Test statistic:
� The test statistic follows a χ² distribution with degrees of freedom equal to the number of restrictions.
( )2R ULR 2ln 2(lnL lnL ) χ Kλ= − = − −
R
U
L0 1
Lλ λ= ≤ ≤
��
![Page 38: The Probit Model](https://reader030.fdocuments.net/reader030/viewer/2022020108/5892ea361a28ab90278ba587/html5/thumbnails/38.jpg)
Hypothesis tests
� Null hypothesis: All coefficients except that of the intercept are equal to zero.
� In the example:
� Prob > chi2 = 0.0014
2LR (3) 15,55χ =
38
� Interpretation: The hypothesis that all coefficients are equal to zero can be rejected at the 1 percent significance level.