Post on 05-Jan-2016
Issues in Estimation
Data Generating Process:
What behavior and what sampling process generated data that you have collected?
Estimation
• Are you gathering a random sample of all possible participants (e.g. telephone or mail survey of population)?
• Or, are you sampling on site?
1. Censored Samples
If you sample a population of potential participants, you will find that some took trips to the site of interest and some (many?) took no trips.
[Figure: plot of trip cost against number of trips for all observations in a hypothetical sample. Non-participants appear as a cluster of points at 0 trips.]
Here’s a hypothetical data set and the actual least squares regression lines.
[Figure: cost (axis ticks 10 to 40) plotted against trips (axis ticks 0 to 10), with two fitted lines: the least squares line including zeros and the least squares line excluding zeros.]
Which, if either, is right?
Answer: Neither
Censored Samples – empirical models to analyze them:
• Tobit model – Assumes an underlying latent variable that could be negative
• Count models – Recognize that trips are non-negative integers
• Sample selection models – Model the participation decision differently from the trips decision
Tobit Model
Underlying model
Latent variable:
$$z_i^* = \beta_{0i} + \beta_1 c_i + \varepsilon_i$$
But,
$z_i = z_i^*$, if $z_i^* > 0$
$z_i = 0$, if $z_i^* \le 0$
(To cut down on notation, $\beta_{0i}$ stands for the intercept and all other covariates that might be in the model, so it varies over individuals.)
Estimation by Maximum Likelihood
Every observation makes a contribution to the likelihood function.
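Contribution by non-trip takers:
$$\Pr(z_i^* \le 0) = \Pr\Big[\sum_k \beta_k x_{ik} + \varepsilon_i \le 0\Big] = \Pr\Big[\varepsilon_i \le -\sum_k \beta_k x_{ik}\Big] = F\Big(-\sum_k \beta_k x_{ik}\Big)$$
where F is the cumulative distribution function for $\varepsilon$; the x’s are the explanatory variables in the model, including cost of access.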
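Contribution by trip takers:
$$\Pr\Big[\varepsilon_i = z_i - \sum_k \beta_k x_{ik} \,\Big|\, z_i^* > 0\Big]\,\Pr(z_i^* > 0) = \frac{f\big(z_i - \sum_k \beta_k x_{ik}\big)}{\Pr\big[\sum_k \beta_k x_{ik} + \varepsilon_i > 0\big]}\,\Pr\Big[\sum_k \beta_k x_{ik} + \varepsilon_i > 0\Big] = f\Big(z_i - \sum_k \beta_k x_{ik}\Big)$$
where f is the density function for $\varepsilon$.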
Note: this is the same expression as for ordinary least squares.
Tobit – maximize the following likelihood function
Likelihood function equals:
$$L = \prod_{i \in T} f\Big(z_i - \sum_k \beta_k x_{ik}\Big) \prod_{i \in N} F\Big(-\sum_k \beta_k x_{ik}\Big)$$
where T is the set of trip takers and N is the set of non-trip takers
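The likelihood above can be sketched in a few lines of code. This is a minimal illustration, not what LIMDEP does internally: it assumes normal errors with standard deviation `sigma`, a single regressor (cost), and works with the log of the likelihood. The function names are made up for this example.

```python
import math

def normal_pdf(e, sigma):
    """Density f of a N(0, sigma^2) error evaluated at e (normality is an assumption)."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(e, sigma):
    """Cumulative distribution F of a N(0, sigma^2) error evaluated at e."""
    return 0.5 * (1.0 + math.erf(e / (sigma * math.sqrt(2.0))))

def tobit_loglik(beta0, beta1, sigma, costs, trips):
    """Log of the Tobit likelihood for a one-regressor (cost) model."""
    total = 0.0
    for c, z in zip(costs, trips):
        mean = beta0 + beta1 * c           # plays the role of sum_k beta_k x_ik
        if z > 0:                          # trip taker: contributes f(z_i - mean)
            total += math.log(normal_pdf(z - mean, sigma))
        else:                              # non-trip taker: contributes F(-mean)
            total += math.log(normal_cdf(-mean, sigma))
    return total
```

Maximizing `tobit_loglik` over `beta0`, `beta1` and `sigma` with any numerical optimizer would give the Tobit estimates.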
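For our simple example:
Ordinary Least Squares estimates:
$\beta_0 = 8.89$
$\beta_1 = -.28$
$\sigma^2 = 2.4$
Tobit estimates:
$\beta_0 = 13.81$
$\beta_1 = -.72$
$\sigma^2 = 2.3$
[Figure: cost (axis ticks 10 to 40) plotted against trips (axis ticks 0 to 10), showing the OLS and Tobit fitted lines.]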
How do we get welfare measures in the Tobit?
The Tobit is usually estimated in linear form.
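The area behind a linear demand function is given by:
$$CS_i = \int_{c_i}^{\tilde c_i} (\beta_{0i} + \beta_1 c)\,dc = -\frac{z_i^2}{2\beta_1}$$
where $\tilde c_i$ is the cost at which trips fall to zero.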
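For anyone who wants to check the formula, here is a sketch of the integration. It assumes the linear demand $z_i = \beta_{0i} + \beta_1 c_i$ from the Tobit slides, with $\beta_1 < 0$; the symbol $\tilde c_i$ is used here for the cost at which trips reach zero.

```latex
\begin{align*}
\tilde c_i &= -\frac{\beta_{0i}}{\beta_1}
  && \text{(solve } \beta_{0i} + \beta_1 \tilde c_i = 0\text{)} \\
CS_i &= \int_{c_i}^{\tilde c_i} (\beta_{0i} + \beta_1 c)\,dc
  = \left[ \frac{(\beta_{0i} + \beta_1 c)^2}{2\beta_1} \right]_{c_i}^{\tilde c_i} \\
  &= 0 - \frac{(\beta_{0i} + \beta_1 c_i)^2}{2\beta_1}
  = -\frac{z_i^2}{2\beta_1}
\end{align*}
```

Because $\beta_1$ is negative, the result is positive, and it grows with the square of trips.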
But how do you evaluate this expression?
What do you use for $z_i$?
Do you use the individual’s actual number of trips?
Or do you use the predicted number of trips using the model?
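[Figure: the estimated demand function drawn with trips $z_i$ on the horizontal axis and cost $c_i$ on the vertical axis; the line has slope $1/\beta_1$.]
Use $\hat\beta_1$ as the estimate for $\beta_1$.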
If you want to use the predicted number of trips...
You must calculate the expected value of trips in the Tobit framework – which is a somewhat complicated expression.
Fortunately, LIMDEP* will do this for you in a simple command.
*LIMDEP is a software package by William Greene, New York University
You should know that expected trips will always be positive in the Tobit.
The answers can be quite different…
but the choice is not obvious.
In our simple example, the difference isn’t great.
                         Using Actual z    Using Predicted z
Ave. trips               2.53              2.55
Ave. consumer surplus    $15.32            $13.93
Total CS for sample      $459.60           $417.90
Difference in average consumer surplus is due to nonlinearity of consumer surplus in trips.
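A small numerical sketch of that nonlinearity: the two trip vectors below are invented for illustration (they are not the sample behind the table) and share the same mean, yet the average consumer surplus differs because CS depends on trips squared. Only the cost coefficient -0.72 comes from the Tobit estimates.

```python
def linear_cs(z, beta1):
    """Consumer surplus behind a linear demand curve: -z^2 / (2 * beta1)."""
    return -(z ** 2) / (2.0 * beta1)

beta1 = -0.72                                # Tobit cost coefficient from the example
actual = [0, 0, 1, 2, 4, 8]                  # hypothetical actual trips, mean 2.5
predicted = [1.5, 2.0, 2.5, 2.5, 3.0, 3.5]   # hypothetical predicted trips, mean 2.5

avg_cs_actual = sum(linear_cs(z, beta1) for z in actual) / len(actual)
avg_cs_predicted = sum(linear_cs(z, beta1) for z in predicted) / len(predicted)

print(round(avg_cs_actual, 2), round(avg_cs_predicted, 2))
```

Actual trips are more spread out than predicted trips, so their squares average higher and so does the consumer surplus.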
Reasons for using one rather than another…
Use the expected value of trips, $\hat z_i$, if you think the dominant source of “error” is from measurement.
Use the actual number of trips, $z_i$, if you think the dominant source of “error” is from specification.
(Note: in the Tobit, the predicted number of trips is never zero.)
Getting an estimate for the population
If your sample is a random sample of the population:
average CS * population
Count Models
The Tobit assumes an underlying latent variable that can take on negative values.
Count models explicitly account for the fact that the dependent variable, trips, can only be an integer and can only be non-negative.
Count Models..
…specify that the quantity demanded of trips is a non-negative random variable whose mean is a function of the exogenous regressors in the model.
The Poisson Distribution is a common choice
Poisson distribution:
$$\Pr(z_i = n) = \frac{e^{-\lambda_i}\,\lambda_i^{\,n}}{n!}$$
where the mean is $\lambda_i$ and it is usually modeled as:
$$\lambda_i = \exp\Big(\sum_k \beta_k x_{ik}\Big)$$
Intuition?
The Poisson model implies that the number of trips a person decides to take is a random variable drawn from a distribution that only allows non-negative integers.
The distribution can be centered around different non-negative numbers, however, depending on the exogenous variables the individual faces.
E.g. A person with a relatively low access cost will face a distribution with a higher mean number of trips.
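To make the intuition concrete, here is a minimal sketch of the Poisson probabilities for two people facing different access costs. The coefficients are hypothetical, chosen only for illustration, and the function names are invented for this example.

```python
import math

def poisson_mean(beta0, beta1, cost):
    """Mean of the trip distribution: lambda_i = exp(beta0 + beta1 * cost)."""
    return math.exp(beta0 + beta1 * cost)

def poisson_pmf(n, lam):
    """Pr(z_i = n) = exp(-lam) * lam**n / n! for n = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

beta0, beta1 = 2.0, -0.1     # hypothetical coefficients, not estimates from the slides
lam_low_cost = poisson_mean(beta0, beta1, cost=5.0)
lam_high_cost = poisson_mean(beta0, beta1, cost=25.0)

# Probability of taking no trips at all under each cost
p_zero_low = poisson_pmf(0, lam_low_cost)
p_zero_high = poisson_pmf(0, lam_high_cost)
```

The low-cost person draws from a distribution with a higher mean and a much smaller chance of zero trips.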
An individual’s contribution to the likelihood function in the Poisson is this very complicated looking expression:
$$\frac{\exp\big(-\exp(\sum_k \beta_k x_{ik})\big)\,\big(\exp(\sum_k \beta_k x_{ik})\big)^{z_i}}{z_i!}$$
(Note: 0! is defined mathematically as 1)
Fortunately, LIMDEP will estimate this for you without any hard work on your part.
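The expression is less frightening in logs. The sketch below sums the log of each individual's contribution for a one-regressor (cost) model; it is an illustration with an invented function name, not LIMDEP's routine.

```python
import math

def poisson_loglik(beta0, beta1, costs, trips):
    """Sum over individuals of log[ exp(-lam_i) * lam_i**z_i / z_i! ]."""
    total = 0.0
    for c, z in zip(costs, trips):
        xb = beta0 + beta1 * c                 # log of the mean lam_i
        # log(exp(-exp(xb))) + log(exp(xb)**z) - log(z!), using lgamma(z + 1) = log(z!)
        total += -math.exp(xb) + z * xb - math.lgamma(z + 1)
    return total
```

An optimizer that maximizes this function over `beta0` and `beta1` gives the Poisson estimates.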
Getting Welfare Measures in the Poisson
The expected number of trips for an individual is the mean of the Poisson distribution for that individual.
The mean is $\lambda_i$ in the above expression and is usually specified as a semi-log function of the explanatory variables:
$$E(z_i) = \lambda_i = \exp(\beta_{0i} + \beta_1 c_i)$$
We saw earlier that…
the area under a semi-log demand function is given by:
$$CS_i = \int_{c_i}^{\infty} \exp(\beta_{0i} + \beta_1 c)\,dc = -\frac{z_i}{\beta_1}$$
Because CS is linear in trips for a semi-log function, it does not matter whether you use actual or expected trips. The answer is the same.
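The integral is quick to verify. This sketch assumes the semi-log demand $z_i = \exp(\beta_{0i} + \beta_1 c_i)$ with $\beta_1 < 0$, so that trips go to zero as cost goes to infinity.

```latex
\begin{align*}
CS_i &= \int_{c_i}^{\infty} \exp(\beta_{0i} + \beta_1 c)\,dc
  = \left[ \frac{\exp(\beta_{0i} + \beta_1 c)}{\beta_1} \right]_{c_i}^{\infty} \\
  &= 0 - \frac{\exp(\beta_{0i} + \beta_1 c_i)}{\beta_1}
  = -\frac{z_i}{\beta_1}
\end{align*}
```

Trips enter to the first power only, which is why averaging over actual trips or over expected trips gives the same consumer surplus whenever the two averages are equal.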
Welfare measures in our simple hypothetical case
                         Using Actual z    Using Predicted z
Ave. trips               2.53              2.53
Ave. consumer surplus    $14.90            $14.90
Total CS for sample      $447.00           $447.00
The Poisson has the property that the mean of expected trips = mean of actual trips.
The formula for consumer surplus in a semi-log function is linear in trips.
THEREFORE, it does not matter in this model whether you use expected or actual trips.
Another Popular Count Model
The negative binomial distribution is also used often. It is a more general distribution than the Poisson, in that it does not constrain the mean and the variance to be equal.
See LIMDEP if you wish to estimate this model.
Participation vs Demand for Trips
In the above models, the same process determines both whether someone is a user and how many trips he takes.
Suppose different factors affected
– whether he used the site
– how many times he used the site, if he did use the site
Two types of models (see LIMDEP):
• Combination of probit and truncated models (e.g. Cragg)
• Selection models (e.g. Heckman)
2. Truncated Samples
Now suppose you have only collected data from people who actually visit the site.
There will be no zeros in this dataset.
Do you still need to make econometric adjustments?
The answer is “YES”
Ordinary least squares assumes that every observation is drawn from a normal distribution with a given variance.
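Let’s look at data again…
Remember the model is:
$$z_i = \beta_{0i} + \beta_1 c_i + \varepsilon_i$$
OLS assumes that
$$\varepsilon_i \sim N(0, \sigma^2)$$
[Figure: trip cost against number of trips for visitors only, with no observations at 0 trips. Labels: “Distribution is truncated for obs. near the axis”, “Relationship you want”, “Result of running OLS regression”.]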
OLS applied to truncated data produces biased slope estimates if truncation is “relevant”.
The bias will generate a larger negative estimate for the slope of the line in the graph, which is really a smaller negative estimate for $\beta_1$.
Since $-\beta_1$ is in the denominator of the consumer surplus formula, the result will be an over-estimate of consumer surplus.
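A quick simulation illustrates the direction of the bias. Everything here is hypothetical: the "true" parameters, the cost range, and the sample size are chosen only for illustration, and trips are treated as continuous, as in the linear model. The slope computed below is from regressing trips on cost, so it is an estimate of the cost coefficient itself.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x (regression with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
beta0, beta1, sigma = 10.0, -0.5, 3.0       # hypothetical "true" parameters
costs = [random.uniform(5.0, 30.0) for _ in range(20000)]
latent = [beta0 + beta1 * c + random.gauss(0.0, sigma) for c in costs]

# Truncation: keep only observations with positive trips (people seen at the site)
kept = [(c, z) for c, z in zip(costs, latent) if z > 0]

slope_full = ols_slope(costs, latent)
slope_trunc = ols_slope([c for c, _ in kept], [z for _, z in kept])
print(round(slope_full, 3), round(slope_trunc, 3))
```

The truncated-sample slope is closer to zero than the full-sample slope, which inflates consumer surplus when it is plugged into the denominator.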
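Contribution to the Likelihood Function in the Truncated Model
$$\Pr(\text{trips} = z_i \mid \text{trips} > 0) = \frac{f\big(z_i - \sum_k \beta_k x_{ik}\big)}{\Pr\big[\sum_k \beta_k x_{ik} + \varepsilon_i > 0\big]}$$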
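In code, the truncated contribution is the OLS density divided by the probability of being a trip taker. The sketch below assumes normal errors with standard deviation `sigma` and a single cost regressor; the function name is invented for this example.

```python
import math

def truncated_loglik(beta0, beta1, sigma, costs, trips):
    """Log-likelihood for a sample that contains only positive trip counts."""
    total = 0.0
    for c, z in zip(costs, trips):
        mean = beta0 + beta1 * c               # plays the role of sum_k beta_k x_ik
        # log f(z_i - mean) for a N(0, sigma^2) error
        log_f = -0.5 * ((z - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
        # Pr[mean + eps_i > 0] = F(mean), by symmetry of the normal distribution
        prob_positive = 0.5 * (1.0 + math.erf(mean / (sigma * math.sqrt(2.0))))
        total += log_f - math.log(prob_positive)
    return total
```

When `mean` is far above zero the denominator is close to 1 and the expression collapses to the OLS one, which is the sense in which truncation may not be "relevant".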
[Figure: cost (axis ticks 5 to 20) plotted against trips (axis ticks 2 to 12), showing the OLS regression line and the truncated regression line. The difference between the OLS and truncated estimated relationships for our simple hypothetical data.]
Oh no, another problem!
The reason you have only non-zero observations for trips is probably because you sampled on site.
On-site sampling is often the only practical way to get enough information on users of a site.
But this, too, causes problems!
If you randomly sample on-site, you are actually randomly sampling trips instead of
trip-takers.
This is not a random sample of users of the site.
The problem is called “endogenous stratification”.
A simple example..
Suppose there are only two types of users:
25 users take 1 trip to site
75 users take 2 trips to site
Total number of trips taken = 175.
Average number of trips taken = 1.75.
Now, suppose you randomly sample trips (not users).
Prob. of encountering a 1-trip user = 25/175 = .14 (rather than .25)
Prob. of encountering a 2-trip user = 150/175 = .86 (rather than .75)
A solution to endogenous stratification is to weight each observation by 1/trips.
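The two-type example can be worked through exactly. This sketch uses fractions so that nothing is lost to rounding; the variable names are invented for illustration.

```python
from fractions import Fraction

# Population from the example: (number of users, trips per user)
population = [(25, 1), (75, 2)]
total_users = sum(n for n, _ in population)
total_trips = sum(n * z for n, z in population)          # 25*1 + 75*2 = 175

# Chance that a randomly intercepted trip belongs to each type of user
onsite_share = {z: Fraction(n * z, total_trips) for n, z in population}

# Unweighted mean trips in the on-site sample overstates the population mean
onsite_mean = sum(share * z for z, share in onsite_share.items())

# Weighting each observation by 1/trips undoes the oversampling of frequent users
weights = {z: share / z for z, share in onsite_share.items()}
weighted_mean = sum(w * z for z, w in weights.items()) / sum(weights.values())
population_mean = Fraction(total_trips, total_users)

print(onsite_share, onsite_mean, weighted_mean, population_mean)
```

The unweighted on-site mean is 13/7 (about 1.86), while the weighted mean recovers the population average of 1.75.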
Parameter estimates for our little sample:
                 β0       β1
OLS              12.85    -0.59
OLS weighted     9.99     -0.44
Truncated        14.05    -0.74
Trun weighted    13.76    -0.84
*Note: for many problems the truncated model does not converge in estimation.
A Better and Easier Alternative
Poisson Count Model:
– Easy to estimate with truncation.
– Easy to estimate with truncation and endogenous stratification
“It turns out that”…..
You can solve both the truncation and the endogenous stratification problem by:
estimating the regular Poisson with the value $z_i - 1$ substituted for $z_i$ in estimation
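One way to see why this works, sketched under the assumption that the chance of intercepting a user on site is proportional to the number of trips he takes. Here $h(z_i)$ is a symbol introduced for the distribution of trips among people sampled on site.

```latex
\begin{align*}
h(z_i) &= \frac{z_i \Pr(z_i)}{\sum_{n} n \Pr(n)}
  = \frac{z_i}{\lambda_i} \cdot \frac{e^{-\lambda_i} \lambda_i^{\,z_i}}{z_i!} \\
  &= \frac{e^{-\lambda_i} \lambda_i^{\,z_i - 1}}{(z_i - 1)!},
  \qquad z_i = 1, 2, 3, \ldots
\end{align*}
```

The last line is the ordinary Poisson probability evaluated at $z_i - 1$, and it puts no weight on zero trips, so truncation and endogenous stratification are handled in one step.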
Poisson Endogenous Stratification Results and Welfare Estimates
             Coeff.    Std.Err.    t-ratio    P-value
Constant     2.882     0.244       11.8       2.89E-15
COST         -0.131    0.027       -4.78      1.72E-06
*Note: Remember that this is basically a semi-log demand function, so the parameters are not directly comparable to the parameters in the previous models.
Welfare Calculation
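Average WTP estimate for elimination of site
$$\text{loss} = \frac{\bar z}{-\beta_c} = \frac{3.16}{.13} = \$24.30$$
where $\beta_c$ is the coefficient on COST.
Note: $\bar z$ must also be adjusted for endogenous stratification.
Mean number of trips =
$$\bar z = \frac{N}{\sum_{n=1}^{N} (1 / z_n)}$$
N = number of individuals sampled
$z_n$ = number of trips taken by individual n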
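The two calculations can be sketched as follows. The on-site sample below is made up for illustration; the figures 3.16 and 0.13 are the ones used in the welfare calculation above, and the function names are invented for this example.

```python
def adjusted_mean_trips(trips):
    """Mean trips corrected for on-site sampling: N / sum(1 / z_n)."""
    return len(trips) / sum(1.0 / z for z in trips)

def wtp_for_site_loss(mean_trips, cost_coeff):
    """Semi-log consumer surplus per user: mean trips / (-cost coefficient)."""
    return mean_trips / -cost_coeff

sample = [1, 2, 2, 2, 2, 2, 2]            # hypothetical on-site sample of 7 people
z_bar = adjusted_mean_trips(sample)       # 7 / (1 + 6 * 0.5) = 1.75
wtp = wtp_for_site_loss(3.16, -0.13)      # figures from the welfare calculation

print(z_bar, round(wtp, 2))
```

The simple average of the hypothetical sample is about 1.86, so the adjustment pulls mean trips down, and the welfare estimate with it.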