1 Demand Estimation
1. Motivation
2. Estimation
3. Consistent Estimation
4. Limitations of Logit
5. BLP
6. Microdata
7. Semiparametrics
2 Motivation
• We begin our study of differentiated product markets by describing the method of BLP (1995) for demand estimation in differentiated product markets.
• We will also discuss some limitations of this method and some possible extensions.
• BLP is a method for estimating demand in differentiated product markets using aggregate data.
• The method allows for endogenous prices and random coefficients.
• The method also allows for consistent estimation of the model parameters even if there is imperfect competition.
3 A simple example
• To motivate the framework, consider the following simple example based on Berry (RAND, 1994).
• There are $i = 1, \dots, I$ ($I = \infty$) agents in $t = 1, \dots, T$ markets.
• Each agent makes a choice between $j = 1, \dots, J$ mutually exclusive alternatives.
• $x_{j,t} = (x_{jt,1}, \dots, x_{jt,K})'$ is a $K \times 1$ vector of characteristics for product $j$.
• Let $p_{j,t}$ denote the price of $j$ at time $t$.
• Let $\xi_{j,t} = \xi_j + \xi_t + \Delta\xi_{j,t}$ denote an unobserved characteristic/demand shock/measurement error in price.
• $\xi_j$ is a permanent component for $j$, $\xi_t$ is a common shock and $\Delta\xi_{j,t}$ is a product/time specific shock for $j$.
• Specify the random utility as:
$$u_{ijt} = x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t} + \varepsilon_{ijt}$$
• Assume that $\varepsilon_{ijt}$ is iid type I extreme value, so that choice probabilities take the (conditional) logit form.
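• Then the market share for $j$ at time $t$ is:
$$s_{jt}(x, \beta, \alpha, \xi) = \frac{\exp(x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t})}{\sum_{j'=0}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})}$$
where the sum in the denominator also runs over the outside good, $j' = 0$.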
• Berry assumes that we are working with aggregate data and that, at the true parameter values, $s_{jt}(x, \beta, \alpha, \xi) = S_{jt}$, where $S_{jt}$ denotes the "true" market share.
• This differs from the standard logit model in two ways.
• First, we have an unobserved heterogeneity/demand shock, $\xi_{j,t}$.
• Why $\xi_{j,t}$?
1. The observed list of product attributes is incomplete. This goes back to hedonic regressions.
2. Measurement error in prices. Typically, price data is an average.
3. Without $\xi_{j,t}$, shares should not vary holding $x_{j,t}$ and $p_{j,t}$ fixed. This is likely to be violated in some data sets.
• Second, we are working with aggregate data instead of individual choices, as in the standard conditional logit.
• Thus, the data set needs to contain market shares.
• Many of the methods we are going to study are not valid if market shares are measured with error.
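The share formula can be checked numerically. Below is a minimal Python/NumPy sketch for a single market; it assumes an outside good whose mean utility is normalized to zero (the normalization used in the estimation step), and the function name `logit_shares` is illustrative rather than taken from any package.

```python
import numpy as np

def logit_shares(delta):
    """Logit market shares in one market.

    delta[j] is the mean utility x_j'beta - alpha*p_j + xi_j of inside good j.
    Assumption: an outside good with mean utility normalized to 0.
    Returns (inside shares, outside share).
    """
    delta = np.asarray(delta, dtype=float)
    m = max(delta.max(), 0.0)      # largest utility, including the outside good's 0
    e_in = np.exp(delta - m)       # shifting by m avoids overflow in exp
    e_out = np.exp(-m)
    denom = e_out + e_in.sum()
    return e_in / denom, e_out / denom

s, s0 = logit_shares([1.0, 0.0, -1.0])
print(s.sum() + s0)                # inside and outside shares add up to one
print(np.log(s) - np.log(s0))      # recovers delta
```

Because $\log(s_j) - \log(s_0)$ returns `delta` exactly, the sketch also previews Berry's transformation.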
3.1 Estimation.
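• Berry notes that the following transformation can be made:
$$\log(s_{jt}(x, \beta, \alpha, \xi)) = e_t + x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t}$$
$$e_t = -\log\left(\sum_{j'=0}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})\right)$$
where the sum includes the outside good, $j' = 0$.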
• Next we assume a "law of large numbers" so that $S_{jt} = s_{jt}(x, \beta, \alpha, \xi)$ at the true parameters.
• If we normalize the utility of the outside good ($j = 0$) to zero, this implies that:
$$s_{0t}(x, \beta, \alpha, \xi) = \frac{\exp(0)}{\sum_{j'=0}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})}$$
$$\log s_{0t}(x, \beta, \alpha, \xi) = 0 - e_t$$
• This implies that:
$$\log(S_{jt}) - \log(S_{0t}) = x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t}$$
• where $S_{0t}$ is the share of the outside good.
• Berry noted that an obvious way to estimate this model is by regression.
• The dependent variable is $\log(S_{jt}) - \log(S_{0t})$.
• The independent variables are $[x_{j,t}', p_{j,t}]$.
• The error term is $\xi_{j,t}$.
• However, in general we would expect $\operatorname{cov}(p_{j,t}, \xi_{j,t}) \neq 0$.
• In the presence of a positive demand shock, oligopoly models suggest that firms should raise prices.
• Thus, OLS estimates of $\beta$ and $\alpha$ will be biased.
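A small fake-data experiment makes the endogeneity problem concrete. In this sketch (all numbers hypothetical) prices are assumed to rise one-for-one with $\xi_{j,t}$, shares are generated from the logit with an outside good, and OLS is run on Berry's transformed equation. Under these assumptions the estimated price coefficient is biased toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 200, 5                    # hypothetical numbers of markets and products
beta, alpha = 1.0, 2.0           # hypothetical true parameters

x = rng.normal(size=(T, J))                    # observed characteristic
xi = rng.normal(scale=0.5, size=(T, J))        # unobserved demand shock
cost = rng.normal(size=(T, J))                 # cost shifter (not in utility)
# Assumed pricing rule: price rises with the demand shock, so cov(p, xi) > 0
p = 1.0 + 0.5 * cost + xi + rng.normal(scale=0.1, size=(T, J))

# Logit shares with an outside good whose utility is normalized to 0
delta = beta * x - alpha * p + xi
e = np.exp(delta)
S = e / (1.0 + e.sum(axis=1, keepdims=True))
S0 = 1.0 / (1.0 + e.sum(axis=1))

# Berry's transformation: log(S_jt) - log(S_0t) = x'beta - alpha*p + xi
y = (np.log(S) - np.log(S0)[:, None]).ravel()

# OLS of y on [1, x, p]; xi stays in the error term and is correlated with p
X = np.column_stack([np.ones(T * J), x.ravel(), p.ravel()])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_ols = -coef[2]
print(f"true alpha = {alpha}, OLS estimate = {alpha_ols:.3f}")
```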
3.2 Consistent Estimation.
3.2.1 Fixed Effects
• A first approach to consistent estimation would
be to estimate the following fixed effects model:
$$\log(S_{jt}) - \log(S_{0t}) = x_{j,t}'\beta - \alpha p_{j,t} + \xi_j + \xi_t + \Delta\xi_{jt}$$
• where $\xi_j$ is a brand fixed effect and $\xi_t$ is a (category) market/time shock.
• The identifying assumption is $E[\Delta\xi_{jt} \mid x_{j,t}, p_{j,t}] = 0$.
• This is clearly more appealing than $E[\xi_{jt} \mid x_{j,t}, p_{j,t}] = 0$.
• However, there are a couple of limitations.
• First, there may be collinearity between $\xi_j$ and $x_{j,t}$ if some characteristics of product $j$ are time invariant.
• Thus, a brand fixed effect does not allow us to learn about the valuation of individual product characteristics.
• Also, it presumes that $\operatorname{cov}(p_{jt}, \Delta\xi_{jt}) = 0$.
• This assumes that in a given time period, product-level price variation is exogenous.
• Remark: This type of assumption is commonly made in marketing.
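To illustrate the fixed effects approach, the sketch below (hypothetical parameters and pricing rule) lets prices depend on $\xi_j$ and $\xi_t$ but not on $\Delta\xi_{jt}$, so the identifying assumption holds by construction. It takes the dependent variable to be the log share ratio directly (for the logit, Berry's inversion makes this exact) and adds brand and market dummies by least squares. The characteristic is made time-varying so that it is not collinear with the brand dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
T, J = 100, 6                    # hypothetical markets (time periods) and brands
beta, alpha = 1.0, 2.0           # hypothetical true parameters

xi_j = rng.normal(size=J)                    # brand effects
xi_t = rng.normal(size=T)                    # market/time effects
dxi = rng.normal(scale=0.3, size=(T, J))     # product/time specific shock
x = rng.normal(size=(T, J))                  # time-varying characteristic
# Assumed pricing rule: prices respond to xi_j and xi_t but not to dxi
p = 1.0 + 0.8 * xi_j[None, :] + 0.8 * xi_t[:, None] + rng.normal(scale=0.5, size=(T, J))

# y plays the role of log(S_jt) - log(S_0t)
y = (beta * x - alpha * p + xi_j[None, :] + xi_t[:, None] + dxi).ravel()

brand = np.tile(np.arange(J), T)             # row t*J + j belongs to brand j
market = np.repeat(np.arange(T), J)          # ... and to market t
D_brand = (brand[:, None] == np.arange(J)).astype(float)      # J brand dummies
D_mkt = (market[:, None] == np.arange(1, T)).astype(float)    # T-1 market dummies

X_pool = np.column_stack([np.ones(T * J), x.ravel(), p.ravel()])
X_fe = np.column_stack([x.ravel(), p.ravel(), D_brand, D_mkt])
alpha_pool = -np.linalg.lstsq(X_pool, y, rcond=None)[0][2]
alpha_fe = -np.linalg.lstsq(X_fe, y, rcond=None)[0][1]
print(f"pooled OLS: {alpha_pool:.3f}, brand + market fixed effects: {alpha_fe:.3f}")
```

If `x` were constant over time within each brand, its column would be collinear with `D_brand`, which is the first limitation noted above.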
3.2.2 BLP Instruments
• A second approach to identification is to find a
set of instruments.
• That is, we need to find a variable $z_{jt}$ such that $E[\xi_{jt} \mid z_{jt}] = 0$ and $\operatorname{cov}(z_{jt}, [x_{jt}, p_{jt}]) \neq 0$ (i.e. it satisfies the standard rank conditions for IV).
• One obvious instrument is a supply shifter (e.g. a change in costs).
• One problem is that there may be too few such instruments, and they may be weak.
• With weak instruments, conventional standard errors are incorrect and the bias can be large.
• BLP and Berry (1994) suggest measures of isolation in product space.
• e.g. $z_{jtk} = \sum_{j' \neq j} x_{j'tk}$.
• This measures how much of characteristic $k$ is supplied by products other than $j$ (an unweighted sum over $j$'s rivals in market $t$).
• This instrument is usually available and it tends to be highly correlated with price.
• Models of oligopoly suggest that the more isolated you are in product space, the more likely you are to have a higher margin.
• Thus, prices will be correlated with $z_{jtk}$.
• Critiques of this instrument:
1. Little variation over time.
2. Assumes $\operatorname{cov}(\xi_{jt}, x_{jtk}) = 0$.
• This assumes that omitted product attributes are uncorrelated with observed attributes.
• This seems hard to believe, since the observed attributes are correlated with each other.
• This is a classic problem in demand estimation.
• In hedonic regressions, researchers have long worried about the consistency of:
$$p_{jt} = x_{jt}'\beta + \xi_{jt}$$
• For example, in a home price regression, the observed attributes are likely to be correlated with the unobserved attributes.
• Ackerberg, however, notes that if $\operatorname{cov}(z_{jtk}, x_{jtk}) = 0$ for all $k$, it is possible to consistently estimate price elasticities for this model (even if other parameter estimates are biased).
• This condition is testable!
• Many questions can be answered with price elasticities, e.g. measurement of market power.
• As with the fixed effects case above, it seems more appealing to assume:
$$E[\Delta\xi_{jt} \mid z_{jt}] = 0$$
• This is possible if we include brand/time fixed effects.
• Remark: Price endogeneity is being accounted for using only demand-side information.
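The sketch below computes the rival-characteristic sums $z_{jtk} = \sum_{j' \neq j} x_{j'tk}$ for data stacked one row per product-market, plus a generic two-stage least squares routine; the function names are hypothetical. Because the fake data here are generated without an oligopoly pricing model (so rival characteristics do not actually move prices), the 2SLS demonstration uses a cost shifter as the excluded instrument instead. With data generated from a Bertrand equilibrium, the output of `blp_instruments` would be added as columns of `Z`.

```python
import numpy as np

def blp_instruments(x, market):
    """z[n, k] = sum of characteristic k over the *other* products in row n's market.

    x: (N, K) characteristics, one row per product-market; market: (N,) market ids.
    """
    x = np.asarray(x, dtype=float)
    ids, inv = np.unique(market, return_inverse=True)
    totals = np.zeros((len(ids), x.shape[1]))
    np.add.at(totals, inv, x)        # within-market totals of each characteristic
    return totals[inv] - x           # drop product j's own contribution

def tsls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the fitted values."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Rival sums on a toy example: two markets with three and two products
z = blp_instruments([[1.0], [2.0], [3.0], [10.0], [20.0]], [0, 0, 0, 1, 1])

# Hypothetical fake data in which price responds to xi
rng = np.random.default_rng(0)
T, J = 200, 5
x = rng.normal(size=(T, J))
xi = rng.normal(scale=0.5, size=(T, J))
cost = rng.normal(size=(T, J))
p = 1.0 + 0.5 * cost + xi + rng.normal(scale=0.1, size=(T, J))
y = (1.0 * x - 2.0 * p + xi).ravel()          # log(S_jt) - log(S_0t), true alpha = 2
X = np.column_stack([np.ones(T * J), x.ravel(), p.ravel()])
Z = np.column_stack([np.ones(T * J), x.ravel(), cost.ravel()])
alpha_iv = -tsls(y, X, Z)[2]
print(z.ravel(), f"2SLS estimate of alpha: {alpha_iv:.3f}")
```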
3.3 Hausman Instruments
• Hausman proposes using prices in other markets as instruments.
• E.g. use prices in Iowa, Wisconsin and the Dakotas as instruments for price endogeneity in Minneapolis.
• The idea behind these instruments is that they pick up common cost shocks.
• However, if they pick up common demand shocks, they are invalid.
• In general, both the BLP and Hausman instruments have the advantage of at least being available!
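Hausman-style instruments amount to a leave-one-out average. A minimal sketch, assuming a balanced layout in which the same products are sold in every market in the same period (the function name and array layout are illustrative):

```python
import numpy as np

def hausman_instruments(p):
    """Average price of the same product in all other markets.

    p: (M, J) array, p[m, j] = price of product j in market m.
    """
    p = np.asarray(p, dtype=float)
    M = p.shape[0]
    return (p.sum(axis=0, keepdims=True) - p) / (M - 1)

prices = np.array([[1.0, 2.0],    # market 0
                   [3.0, 4.0],    # market 1
                   [5.0, 6.0]])   # market 2
z = hausman_instruments(prices)
print(z)
```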
4 Limitations of the Logit
• While the logit model is computationally convenient, it imposes some unpleasant restrictions on the data.
• It is still widely used since there are few other computationally convenient estimators.
1. Implausible substitution patterns.
• The logit model exhibits the independence of irrelevant alternatives (IIA).
• That is, the ratio of the probabilities of two choices does not change depending on the set of choices that are available:
$$\frac{\Pr(i \text{ chooses } j)}{\Pr(i \text{ chooses } j')} = \text{constant}$$
for all $j$ and $j'$, regardless of the set of alternatives that are available.
• A famous example is the red bus/blue bus problem.
• Suppose that we are studying the choice of mode of transportation.
• The choice set is to take the (red) bus to work or to drive.
• Suppose that these choices are equally likely.
• Now suppose that the bus company introduces blue buses in addition to red buses.
• Suppose that consumers are indifferent about the color of their bus, so that the probabilities of the red bus and the blue bus are equal.
• IIA implies that Pr(red bus) = Pr(blue bus) = Pr(drive) = 1/3.
• A more "intuitive" answer would be Pr(red bus) = Pr(blue bus) = 1/4 and Pr(drive) = 1/2.
• This example shows that IIA can give implausible substitution patterns.
• This can also show up in terms of price elasticities.
• Suppose that we are modeling consumer demand for a differentiated product.
• Suppose that the latent utilities are:
$$y_{nj} = x_{nj}'\beta - \alpha p_j + \varepsilon_{nj}$$
• where $p_j$ is price.
• Calculating the own- and cross-price elasticities:
$$\eta_{jk} = \frac{\partial \Pr(i \text{ chooses } j)}{\partial p_k}\,\frac{p_k}{\Pr(i \text{ chooses } j)} = \begin{cases} -\alpha p_j (1 - s_j) & \text{if } j = k \\ \alpha p_k s_k & \text{otherwise} \end{cases}$$
• Since in most cases there are many products, market shares are typically small, so $(1 - s_j)$ is approximately equal to one and the own-price elasticity is approximately $-\alpha p_j$, i.e. proportional to price.
• This implies that the lower the price, the lower the elasticity (in absolute value).
• This implies that markups should be higher in cheap products.
• This is clearly not appropriate in many industries.
• A second limitation is that cross-price elasticities are determined by $\alpha p_k s_k$.
• Suppose that Lucky Charms and Grape Nuts are similarly priced and have a similar market share.
• An implication of this formula is that both of these will have the same cross-price elasticity with CoCo Puffs.
• This is clearly implausible a priori, yet it is an assumption that we have imposed through the functional form.
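The closed-form elasticities can be verified against finite differences. This sketch assumes the plain logit with an outside good, and the prices and values of $x_j'\beta$ are hypothetical. It also illustrates the cross-elasticity restriction: within each column $k$, every off-diagonal entry equals $\alpha p_k s_k$.

```python
import numpy as np

def logit_shares_from_prices(p, xb, alpha):
    """Plain-logit shares with an outside good (utility normalized to 0)."""
    e = np.exp(xb - alpha * p)
    return e / (1.0 + e.sum())

def logit_elasticities(p, xb, alpha):
    """eta[j, k] = (d s_j / d p_k) * p_k / s_j from the closed-form formula."""
    s = logit_shares_from_prices(p, xb, alpha)
    eta = np.tile(alpha * p * s, (len(p), 1))      # every row j: alpha * p_k * s_k
    np.fill_diagonal(eta, -alpha * p * (1.0 - s))  # own-price terms
    return eta

def numerical_elasticities(p, xb, alpha, h=1e-6):
    """Same matrix by central finite differences, as a check on the formula."""
    s = logit_shares_from_prices(p, xb, alpha)
    eta = np.empty((len(p), len(p)))
    for k in range(len(p)):
        dp = np.zeros(len(p))
        dp[k] = h
        ds = (logit_shares_from_prices(p + dp, xb, alpha)
              - logit_shares_from_prices(p - dp, xb, alpha)) / (2 * h)
        eta[:, k] = ds * p[k] / s
    return eta

p = np.array([1.0, 1.0, 3.0])     # hypothetical prices
xb = np.array([0.5, 0.5, 2.0])    # hypothetical x_j'beta
eta = logit_elasticities(p, xb, alpha=1.5)
print(np.round(eta, 3))
```

In the printed matrix, the off-diagonal entries of each column are identical: a change in $p_k$ moves every other product's demand by the same percentage, which is the functional-form restriction described above.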
3. Treatment of Heterogeneity.
• In the logit model, consumers are heterogeneous only because of $\varepsilon_{ij}$.
• $\varepsilon_{ij}$ can be thought of as adding an additional product characteristic into the model for each $j$, together with an iid random preference shock for that characteristic.
• Caplin and Nalebuff argue that this generates too much "taste for variety".
• Applied studies of welfare, such as Petrin (JPE 2002, Quantifying the Benefits of New Products: The Case of the Minivan), argue that too much of the utility comes from implausibly large draws of the $\varepsilon_{ij}$.
• This leads to pathological implications (e.g. markups under Bertrand competition may not converge to zero as the market becomes thick).
• See Anderson, De Palma and Thisse.
5 BLP-Random Coefficients Logit.
• In Berry (1994) and BLP (1995), consumer preferences can be written as:
$$u(x_j, \xi_j, p_j, v_i; \theta_d)$$
where:
• $x_j = (x_{j,1}, \dots, x_{j,K})$ is a vector of $K$ characteristics of product $j$ that are observed by both the economist and the consumer.
• $\xi_j$ is a characteristic of product $j$ observed by the consumer but not by the economist.
• $p_j$ is the price of good $j$.
• $v_i$ is a vector of taste parameters for consumer $i$.
• $\theta_d$ is a vector of demand parameters.
• One commonly used specification is the logit model with random (normal) coefficients:
$$u_{ij} = x_j\beta_i - \alpha p_j + \xi_j + \varepsilon_{ij}$$
• The $K$ random coefficients are:
$$\beta_{i,k} = \beta_k + \sigma_k\eta_{i,k}, \qquad \eta_{i,k} \sim N(0, 1) \text{ iid}$$
• Consumer $i$ will purchase good $j$ if and only if it is utility maximizing, just as in the previous lecture.
• Question: How do we interpret the parameters of this model?
• It is useful to decompose utility into two parts: the first is a "mean" level of utility and the second is a heteroskedastic error term that captures the effect of the random taste parameters:
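$$\upsilon_{ij} = \left[\sum_k x_{jk}\sigma_k\eta_{i,k}\right] + \varepsilon_{ij}$$
$$\delta_j = x_j\beta - \alpha p_j + \xi_j$$
• We can now write the utility of person $i$ for product $j$ as:
$$u_{ij} = \delta_j + \upsilon_{ij}$$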
• Next, we will write the market shares for aggregate demand in a particularly convenient fashion. First define the set of "error terms" that make product $j$ utility maximizing given the $J$-dimensional vector $\delta = (\delta_j)$:
$$A_j(\delta) = \{\upsilon_i = (\upsilon_{ij}) \mid \delta_j + \upsilon_{ij} \geq \delta_{j'} + \upsilon_{ij'} \text{ for all } j' \neq j\}$$
• The market share of product $j$ can then be written as (assuming a law of large numbers):
$$s_j(\delta(x, p, \xi), x, \theta) = \int_{A_j(\delta)} f(\upsilon)\, d\upsilon$$
• In this case, the parameter $\theta$ is $\beta$, $\alpha$ and $\sigma$.
• Given $\theta$ and the demand for product $j$ actually observed in the data, $\tilde{s}_j$, it must be the case that:
$$\tilde{s}_j = s_j(\delta(x, p, \xi), x, \theta)$$
• Given $\theta$, this can be expressed as a system of $J$ equations in $J$ unknowns (the $\xi_j$).
• To estimate, we find a set of instruments for the $\xi_j$.
• We must find a set of instruments correlated with the endogenous variable $p_j$, but uncorrelated with the residual $\xi_j$.
Commonly used instruments:
1. The product characteristics.
2. Prices of products in other markets (interpret $\xi_j$ as a demand shifter).
3. Measures of isolation in product space ($\sum_{j' \neq j} x_{j',k}$).
4. Cost shifters.
• Question: Are these really valid instruments?
• Typically we think of product characteristics as a choice variable.
• Suppose that a firm chose product characteristics optimally.
• Then the characteristics of a product that are unobserved by the econometrician would be independent of the observed characteristics only under strong separability assumptions about cost and demand.
• The model written down probably violates these separability assumptions on demand.
• A number of empirical case studies have been done. They find that BLP-style estimators typically find more elastic demand curves.
6 Firm Behavior.
• In the model above, we abstracted from the behavior of the firm.
• Suppose that firms engage in Bertrand price competition.
• Let firm $f$ produce some set of products $P_f$.
• Then the profit maximization problem for firm $f$ is to choose prices $p_j$ for $j \in P_f$ that maximize expected profit, holding the prices of the other firms fixed:
$$\pi_f = \sum_{j \in P_f} (p_j - mc_j)\, M s_j(x, p, \xi, \theta)$$
where $M$ is the size of the market.
• Suppose that we know the function $s_j$. Then the first-order conditions for all of the products are a system of $J$ equations in $J$ unknowns, where the unknowns are the latent marginal costs $mc_j$.
• Note that if we recover the marginal costs by assuming Bertrand price competition and that the first-order conditions hold, we could do policy experiments.
• For instance, some have used this approach to simulate the effects of a merger.
• BLP (1995) proceeds in a similar fashion to Berry, except that it also models the supply side by assuming that firms are Bertrand price competitors.
• We then need to find instruments for a set of unobserved supply shifters.
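Given a demand model, the first-order conditions can be inverted for marginal costs. Writing them in matrix form, $s + \Omega(p - mc) = 0$ with $\Omega_{jk} = \partial s_k / \partial p_j$ when $j$ and $k$ belong to the same firm and zero otherwise, gives $mc = p + \Omega^{-1}s$; the market size $M$ cancels. The sketch assumes plain logit demand (for the random coefficients model the share derivatives would be simulated), and the prices, utilities and ownership structure are hypothetical.

```python
import numpy as np

def logit_shares_from_prices(p, xb, alpha):
    """Plain-logit shares with an outside good whose utility is normalized to 0."""
    e = np.exp(xb - alpha * p)
    return e / (1.0 + e.sum())

def recover_marginal_costs(p, xb, alpha, owner):
    """Invert the Bertrand first-order conditions s + Omega (p - mc) = 0 for mc.

    owner[j] is the (hypothetical) id of the firm selling product j.
    Omega[j, k] = d s_k / d p_j if j and k share an owner, else 0.
    """
    s = logit_shares_from_prices(p, xb, alpha)
    dsdp = alpha * np.outer(s, s)                    # d s_j / d p_k = alpha s_j s_k, j != k
    np.fill_diagonal(dsdp, -alpha * s * (1.0 - s))   # d s_j / d p_j
    same_firm = owner[:, None] == owner[None, :]
    Omega = dsdp.T * same_firm
    return p + np.linalg.solve(Omega, s)

p = np.array([2.0, 2.5, 3.0, 1.5])      # hypothetical prices
xb = np.array([1.0, 1.5, 2.0, 0.5])     # hypothetical x_j'beta + xi_j
owner = np.array([0, 0, 1, 2])          # firm 0 sells products 0 and 1
mc = recover_marginal_costs(p, xb, 1.2, owner)
print("marginal costs:", mc, "markups:", p - mc)
```

For a single-product firm the formula collapses to the familiar logit markup $p_j - mc_j = 1/(\alpha(1 - s_j))$. With the recovered costs in hand, a merger can be simulated by changing `owner` and solving the first-order conditions for new prices.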
7 Computation.
• In this section, I shall outline some of the key steps needed to actually compute Berry (1994).
• A key step in many programming projects is to do a fake data experiment.
• Simulate the model using fixed parameter values.
• Pretend you don't know the parameter values and estimate.
• This tests the code and sometimes shows you limitations of the models.
• One of the best ways to really learn the econometrics in a paper is to do a fake data experiment.
• We shall consider as an example the random coefficient logit model.
There are basically 4 things we need to do in order to compute the value of the objective function for GMM.
1. For a given value of $\sigma$ and $\delta$, compute the vector of market shares.
2. For a given value of $\sigma$, find the vector $\delta$ that equates the observed market shares and those predicted by the model, using the contraction mapping.
3. Given $\delta$ and $\beta$, $\alpha$, compute the value of $\xi$.
4. Search for the values of $(\beta, \alpha, \sigma)$ that minimize the objective function.
7.1 Computing Market Shares.
• In the random coefficient logit model, we can compute the market shares, given $\delta$, as follows:
$$s_j(\delta, \sigma) = \int \frac{\exp(\delta_j + \sum_k x_{j,k}\eta_{i,k}\sigma_k)}{1 + \sum_{j'} \exp(\delta_{j'} + \sum_k x_{j',k}\eta_{i,k}\sigma_k)}\, df(\eta_i)$$
• In practice, the integral above is computed using simulation.
• Make a set of $S$ simulation draws and keep them fixed for the whole problem.
• Sometimes importance sampling is useful in order to improve the speed/accuracy of the integration.
• See Judd for an overview of numerical integration.
• We can compute confidence intervals using standard methods to see whether the simulated market shares are well estimated.
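The simulation step can be sketched as follows. The draws $\eta_i$ are generated once and passed in, so the simulated shares are a smooth, deterministic function of $\delta$ and $\sigma$; the function also returns Monte Carlo standard errors for the confidence intervals mentioned above. Names and problem sizes are hypothetical, and plain Monte Carlo is used rather than importance sampling.

```python
import numpy as np

def rc_logit_shares(delta, X, sigma, eta):
    """Simulated random-coefficient logit shares and their simulation standard errors.

    delta: (J,) mean utilities; X: (J, K) characteristics; sigma: (K,) standard
    deviations of the random coefficients; eta: (S, K) standard-normal draws,
    drawn once and held fixed for the whole problem.
    """
    mu = (eta * sigma) @ X.T                           # (S, J): sum_k x_jk eta_ik sigma_k
    u = delta[None, :] + mu
    m = np.maximum(u.max(axis=1, keepdims=True), 0.0)  # outside good has utility 0
    e = np.exp(u - m)
    probs = e / (np.exp(-m) + e.sum(axis=1, keepdims=True))   # (S, J) per-draw probabilities
    share = probs.mean(axis=0)
    se = probs.std(axis=0, ddof=1) / np.sqrt(len(eta))        # Monte Carlo standard error
    return share, se

rng = np.random.default_rng(0)
J, K, S = 4, 2, 1000                       # hypothetical problem size
X = rng.normal(size=(J, K))
delta = np.array([-1.0, -0.5, 0.0, -1.5])
eta = rng.standard_normal((S, K))          # fixed simulation draws
share, se = rc_logit_shares(delta, X, np.array([0.5, 1.0]), eta)
print(share, share - 1.96 * se, share + 1.96 * se)   # approximate 95% intervals
```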
7.2 The contraction mapping.
• Next, we wish to find the $\delta$ that matches the observed market shares given $\sigma$.
• Berry and BLP demonstrate that the following is a contraction:
$$\delta_j^{(n+1)} = \delta_j^{(n)} + \ln(\tilde{s}_j) - \ln(s_j(\delta^{(n)}, \sigma))$$
• Therefore, given that we can compute market shares, we can use the formula above to find the value of $\delta$ by making an initial guess at $\delta$ and then evaluating the equation above until convergence is (approximately) achieved.
• A mapping $T: S \to S$ is a contraction with modulus $\beta \in [0, 1)$ if for all $x, y$, $d(Tx, Ty) \leq \beta\, d(x, y)$.
• A contraction mapping (on a complete metric space) has a unique fixed point.
• Let $v_0$ be an initial guess about the fixed point $v$. Let $T^n(v_0)$ denote applying the mapping $n$ times, as in the previous equation.
• $T^n(v_0)$ converges to the fixed point at an exponential rate.
• Point: Market shares can be inverted very quickly in a fairly simple manner!
• Contraction mappings are used all the time in economics, particularly in modern macro.
• See Stokey and Lucas, chapters 4 and 5, for proofs.
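A sketch of the contraction, run as a small fake data experiment: shares are generated from a known $\delta$ with fixed draws and then inverted. It re-implements the simulated share integral of Section 7.1 so that the block is self-contained, starts from the plain-logit inversion $\log \tilde{s}_j - \log \tilde{s}_0$, and uses an illustrative tolerance; all names and sizes are hypothetical.

```python
import numpy as np

def rc_logit_shares(delta, X, sigma, eta):
    """Simulated random-coefficient logit shares (outside good utility = 0)."""
    u = delta[None, :] + (eta * sigma) @ X.T        # (S, J) utilities net of epsilon
    m = np.maximum(u.max(axis=1, keepdims=True), 0.0)
    e = np.exp(u - m)
    return (e / (np.exp(-m) + e.sum(axis=1, keepdims=True))).mean(axis=0)

def blp_contraction(s_obs, X, sigma, eta, tol=1e-12, max_iter=10_000):
    """Find delta with rc_logit_shares(delta, ...) == s_obs by iterating
    delta <- delta + log(s_obs) - log(s(delta, sigma))."""
    log_s_obs = np.log(s_obs)
    delta = log_s_obs - np.log(1.0 - s_obs.sum())   # plain-logit inversion as starting value
    for _ in range(max_iter):
        delta_new = delta + log_s_obs - np.log(rc_logit_shares(delta, X, sigma, eta))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction did not converge")

# Fake data experiment: generate shares from a known delta, then invert them
rng = np.random.default_rng(0)
J, K, S = 4, 2, 500
X = rng.normal(size=(J, K))
eta = rng.standard_normal((S, K))
sigma = np.array([0.5, 1.0])
delta_true = np.array([-1.0, -0.5, 0.0, -1.5])
s_obs = rc_logit_shares(delta_true, X, sigma, eta)
delta_hat = blp_contraction(s_obs, X, sigma, eta)
print(np.max(np.abs(delta_hat - delta_true)))
```

The same draws `eta` must be used when generating and when inverting the shares; otherwise simulation noise enters the recovered $\delta$.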
8 Computing the value of ξ
• The next step is simple. Just let:
$$\xi_j = \delta_j - (x_j\beta - \alpha p_j)$$
where $\delta_j$ is computed using the contraction mapping.
8.1 Computing the value of the objective function.
• Let $Z$ be the set of instruments.
• The objective function is formulated as in all GMM problems, assuming $E(\xi \mid Z) = 0$; for example, the quadratic form $\xi(\theta)'ZWZ'\xi(\theta)$ for a positive definite weighting matrix $W$.
• The econometrician then chooses $\beta$, $\alpha$ and $\sigma$ in order to minimize the objective function.
• Standard mathematical programs (MATLAB, GAUSS, IMSL, NAG) contain software for optimization problems.
• One standard way to proceed is to do a rough global search first and then use a derivative-based method once you have a very rough sense of the overall shape of the objective function.
• Multiple starting points are commonly used in order to search for multiple local solutions to the minimization problem.
• See Judd for an overview of numerical minimization.
• Doing a "fake data experiment" is a good way to learn how well the estimator works.
• Fix true parameters, simulate the model. Then see if your computations allow you to get back the correct answer.
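For a given $\sigma$, the objective can be evaluated by concentrating out the parameters that enter $\delta$ linearly. This is a common implementation choice rather than something stated in these notes: with $X_1 = [1, x, p]$, the linear GMM estimator $\hat\theta_1 = (X_1'ZWZ'X_1)^{-1}X_1'ZWZ'\delta$ is plugged into $\xi = \delta - X_1\hat\theta_1$, leaving only $\sigma$ for the numerical search. The check below uses fake logit data ($\sigma = 0$, so $\delta$ is simply the log share ratio) and $W = (Z'Z)^{-1}$, which makes the estimator two-stage least squares; all names and numbers are hypothetical.

```python
import numpy as np

def gmm_objective(delta, X1, Z, W):
    """GMM objective at given mean utilities delta (e.g. from the contraction).

    X1: (N, L) regressors entering delta linearly; Z: (N, M) instruments, M >= L;
    W: (M, M) weighting matrix. Returns (objective, theta1, xi).
    """
    XZ = X1.T @ Z
    theta1 = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ delta))  # linear GMM step
    xi = delta - X1 @ theta1          # xi = delta - (x beta - alpha p)
    g = Z.T @ xi                      # sample moments Z'xi
    return g @ W @ g, theta1, xi

rng = np.random.default_rng(0)
N = 2000
x = rng.normal(size=N)
xi = rng.normal(scale=0.5, size=N)
cost = rng.normal(size=N)                       # hypothetical cost shifter
p = 1.0 + 0.5 * cost + xi + rng.normal(scale=0.1, size=N)
delta = 1.0 * x - 2.0 * p + xi                  # true (const, beta, -alpha) = (0, 1, -2)
X1 = np.column_stack([np.ones(N), x, p])
Z = np.column_stack([np.ones(N), x, cost, cost ** 2])
W = np.linalg.inv(Z.T @ Z)                      # 2SLS weighting matrix
obj, theta1, xi_hat = gmm_objective(delta, X1, Z, W)
print(obj, theta1)
```

In a full random coefficients estimator, `delta` would come from the contraction mapping at each trial value of $\sigma$, and the returned objective would be handed to the global-then-local search described above.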
9 Individual Level Data
• These models are discussed in some detail in Cameron and Trivedi, Chapter 15.
• In the notation of Cameron and Trivedi, $j = 1, \dots, J$ indexes choices and $i = 1, \dots, I$ indexes households.
• That is:
$$U_{ij} = x_{ij}'\beta_i + \varepsilon_{ij}, \qquad \beta_i \sim N(\beta, \Sigma_\beta)$$
• In the above, $\varepsilon_{ij}$ comes from the Weibull distribution as before.
• Each household $i$ is allowed to have a unique set of marginal utilities, which come from a normal distribution with unknown mean and variance.
• In this model, the probability that household $i$ chooses product $j$, $p_{ij}$, is therefore:
$$p_{ij} = \frac{\exp(x_{ij}'\beta_i)}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta_i)}$$
• The probability of choice $j$, $p_j$, is therefore:
$$p_j(\beta, \Sigma_\beta) = \int \frac{\exp(x_{ij}'\beta_i)}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta_i)}\, \phi(\beta_i \mid \beta, \Sigma_\beta)\, d\beta_i$$
• where $\phi(\beta_i \mid \beta, \Sigma_\beta)$ is the normal density.
• We could in principle estimate the model using MLE, since our model generates a likelihood for the choice probabilities.
• If $x_{ij}$ has a large dimension (e.g. there are many characteristics), then evaluation of the above integral is difficult.
• Therefore, we need to estimate these models using simulation.
• We will study the theory of simulation in detail next week; however, we will sketch how to form a simulated likelihood function.
• Suppose that the $\beta_i$ can be written as:
$$\beta_{i,k} = \beta_k + \eta_{i,k}\sigma_k, \quad k = 1, \dots, K, \qquad \eta_{i,k} \text{ standard normal}$$
• In this specification, we are assuming that the random coefficients are independently distributed across $k$ with a normal distribution of mean $\beta_k$ and standard deviation $\sigma_k$.
• In the simplest simulation-based estimator, we could make $s = 1, \dots, S$ Monte Carlo draws $\eta_i^{(s)}$ of the random coefficients for each household $i$.
• A Monte Carlo estimator of $p_j(\beta, \Sigma_\beta)$ is then:
$$\hat{p}_j(\beta, \Sigma_\beta) = \frac{1}{S}\sum_{s=1}^{S} \frac{\exp(x_{ij}'\beta + \sum_k \eta_{i,k}^{(s)}\sigma_k x_{ijk})}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta + \sum_k \eta_{i,k}^{(s)}\sigma_k x_{ij'k})}$$
• The "simulated" likelihood function would then be:
$$\ln \hat{L}(\beta, \Sigma_\beta) = \sum_{i=1}^{N}\sum_{j=1}^{J} y_{ij} \log\left(\hat{p}_j(\beta, \Sigma_\beta)\right)$$
• If we let the number of simulations become infinite (at an appropriate rate) as the sample size $N \to \infty$, this will yield a consistent estimator of our model parameters.
• It is also possible to derive the asymptotic variance matrix in a reasonably straightforward way.
• There are some limitations, however.
• A first limitation is that this estimator is in general biased, since $\log \hat{p}_j$ is a biased estimator of $\log p_j$ even though $\hat{p}_j$ is unbiased for $p_j$.
• An alternative, unbiased estimator is based on an NLLS approach:
$$\sum_{i=1}^{N} x_{i,r}\left(y_{ij} - \hat{p}_j(\beta, \Sigma_\beta)\right) = 0, \qquad r = 1, \dots, K \text{ and } j = 1, \dots, J$$
• Unfortunately, this estimator is not efficient in general and may not even be smooth without using some fairly sophisticated numerical approaches.
• A second limitation is that the variance of our estimates may be high if the distribution of random coefficients is flexibly specified.
• Hence, tightly parameterized models are required.
• The computational burden increases considerably in the number of choices.
• An alternative approach is to use Gibbs sampling.
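A minimal sketch of the simulated log-likelihood, assuming an outside good with utility zero, independent normal coefficients as in the specification above, and draws held fixed across parameter values. Everything here (function name, array layout, sample sizes, true parameters) is hypothetical; in practice the function would be passed to a numerical optimizer.

```python
import numpy as np

def simulated_loglik(beta, sigma, X, y, eta):
    """Simulated log-likelihood for the mixed logit with individual data.

    X: (N, J, K) characteristics of the J inside goods for each household;
    y: (N,) chosen alternative, 0 = outside good (utility 0), 1..J = inside goods;
    eta: (N, S, K) standard-normal draws, held fixed across parameter values.
    """
    coef = beta + sigma * eta                        # (N, S, K): beta_k + eta_ik sigma_k
    v = np.einsum('njk,nsk->nsj', X, coef)           # (N, S, J) systematic utilities
    v = np.concatenate([np.zeros(v.shape[:2] + (1,)), v], axis=2)   # prepend outside good
    v -= v.max(axis=2, keepdims=True)                # numerical stability
    prob = np.exp(v)
    prob /= prob.sum(axis=2, keepdims=True)          # logit probabilities for each draw
    p_hat = prob.mean(axis=1)                        # (N, J+1) average over draws
    return np.log(p_hat[np.arange(len(y)), y]).sum()

# Hypothetical fake data generated from the model itself
rng = np.random.default_rng(0)
N, J, K, S = 500, 3, 2, 200
X = rng.normal(size=(N, J, K))
beta_true, sigma_true = np.array([1.0, -0.5]), np.array([0.8, 0.3])
b_i = beta_true + sigma_true * rng.standard_normal((N, K))    # household coefficients
u = np.einsum('njk,nk->nj', X, b_i) + rng.gumbel(size=(N, J))
u0 = rng.gumbel(size=N)                                       # outside good shock
y = np.where(u.max(axis=1) > u0, u.argmax(axis=1) + 1, 0)
eta = rng.standard_normal((N, S, K))
print(simulated_loglik(beta_true, sigma_true, X, y, eta))
```

Setting `sigma` to zero reduces the function to the plain logit log-likelihood, which is a convenient check on the code.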
10 A semiparametric alternative
• We briefly discuss a computationally simple but flexible estimator due to Bajari, Fox and Ryan (2006).
• Let $(\beta_1^{(r)}, \dots, \beta_K^{(r)})$ for $r = 1, \dots, \infty$ be a sequence of real vectors that is dense in the domain of $\beta_i$.
• Assume that the random preference shock comes from the Weibull distribution as before.
• We will choose a large but finite number of points of support $r = 1, \dots, R$ for the distribution of random coefficients.
• Let $p(r)$ denote the probability that $\beta_i = (\beta_1^{(r)}, \dots, \beta_K^{(r)})$.
• Let $P(j \mid x_{ij})$ denote the probability that choice $j$ is made. Then
$$P(j \mid x_{ij}) = \sum_{r=1}^{R} p(r) \frac{\exp(\beta^{(r)}x_{ij})}{\sum_{j'=1}^{J}\exp(\beta^{(r)}x_{ij'})}$$
• Note that in the above we let regressors vary by both $j$ and $i$.
• Let $y_{ij} = 1$ if consumer $i$ chooses $j$ and zero otherwise.
• Straightforward algebra implies that:
$$y_{ij} = \sum_{r=1}^{R} p(r) \frac{\exp(\beta^{(r)}x_{ij})}{\sum_{j'=1}^{J}\exp(\beta^{(r)}x_{ij'})} + e_{ij}, \qquad j = 1, \dots, J, \quad i = 1, \dots, I$$
where $e_{ij} = y_{ij} - P(j \mid x_{ij})$.
• Since $e_{ij}$ is pure forecast error due to random sampling, it is orthogonal to all of our regressors and functions of our regressors.
• An attractive feature of this model is that it is linear in the parameters $p(r)$, and we do not require nonlinear maximization to find the estimator.
• If we let $R$ be sufficiently large, this can approximate any discrete choice model to an arbitrary degree of precision, due to a result by McFadden and Train (2000).
• A first (naive) estimator for this model would be to minimize:
$$\hat{p} = \arg\min_p \frac{1}{I}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij} - \sum_{r=1}^{R} p(r)\frac{\exp(\beta^{(r)}x_{ij})}{\sum_{j'=1}^{J}\exp(\beta^{(r)}x_{ij'})}\right)^2$$
• Note that this is just regression!
• We would then naively interpret our estimates as the probabilities $p(r)$.
• In Monte Carlo studies, this performed poorly. Many of the coefficients $p(r)$ were negative, for instance.
Instead, we propose using the following estimator:
$$\hat{p} = \arg\min_p \frac{1}{I}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij} - \sum_{r=1}^{R} p(r)\frac{\exp(\beta^{(r)}x_{ij})}{\sum_{j'=1}^{J}\exp(\beta^{(r)}x_{ij'})}\right)^2 \quad \text{s.t. } p(r) \geq 0 \text{ and } \sum_r p(r) = 1$$
• This is an inequality-constrained regression, as considered by Judge and Takayama (1966), Geweke (1986) and Wolak (1987).
• This is a straightforward quadratic programming problem.
• The algorithm for finding the solution is implemented in standard software packages, including Matlab.
• The standard errors are simple for this model and correspond to simple modifications of OLS formulas.
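The constrained regression can be solved with any quadratic programming routine. To stay within NumPy, the sketch below instead uses projected gradient descent with a Euclidean projection onto the probability simplex (a swapped-in technique, not the authors' algorithm). The support grid, the true two-point distribution, and the sample sizes are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def constrained_ls(A, y, n_obs, n_iter=5000):
    """Minimize (1/n_obs)*||y - A p||^2 s.t. p >= 0, sum(p) = 1 by projected gradient."""
    p = np.full(A.shape[1], 1.0 / A.shape[1])
    step = n_obs / (2.0 * np.linalg.eigvalsh(A.T @ A).max())   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = (2.0 / n_obs) * A.T @ (A @ p - y)
        p = project_to_simplex(p - step * grad)
    return p

rng = np.random.default_rng(0)
I, J, R = 2000, 3, 25
grid = np.linspace(-3.0, 3.0, R)            # support points beta^(r) (hypothetical)
x = rng.normal(size=(I, J))                 # regressor varying by i and j
# hypothetical true distribution: beta_i = -2 w.p. 0.4 and +1 w.p. 0.6
beta_i = np.where(rng.random(I) < 0.4, -2.0, 1.0)
choice = (beta_i[:, None] * x + rng.gumbel(size=(I, J))).argmax(axis=1)
Y = np.zeros((I, J))
Y[np.arange(I), choice] = 1.0

# Regressors: logit probability of each (i, j) under each support point r
v = grid[None, None, :] * x[:, :, None]     # (I, J, R)
prob = np.exp(v - v.max(axis=1, keepdims=True))
prob /= prob.sum(axis=1, keepdims=True)
A, y = prob.reshape(I * J, R), Y.reshape(I * J)

p_naive = np.linalg.lstsq(A, y, rcond=None)[0]    # unconstrained (naive) regression
p_hat = constrained_ls(A, y, n_obs=I)
print("naive min weight:", p_naive.min())
print("constrained mean of beta:", grid @ p_hat)
```

Comparing `p_naive` with `p_hat` on such data shows whether the naive regression produces negative weights; the constrained estimate is a proper probability distribution by construction.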
11 Identification.
• An important question to ask is whether our random coefficient discrete choice models are identified.
• That is, can the primitives (i.e. random utilities) be uniquely recovered from the data?
• The answer to this question in general is no.
• To see why, suppose that there was only a single consumer with a deterministic utility.
• This is a special case of our more general random utility framework.
• Utility functions cannot be identified from choice behavior.
• We can always make monotonic transformations of utility without changing choices.
• Therefore, in general, distributions over utility functions cannot be identified.
11.1 Quasi Linear Preferences.
• One case where we can identify our model is the case of quasi-linear preferences.
• Consider a simple example where $i = 1, 2$ consumers choose between two goods, $j = 1$ or $2$, and an outside option ($j = 0$).
• If utility is quasi-linear, WLOG we can write the utility function for consumer $i$ as:
$$u_i(j, c) = \beta_{i,1}1\{j = 1\} + \beta_{i,2}1\{j = 2\} + c$$
• For our simple example, suppose that $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ and $(\beta_{2,1}, \beta_{2,2}) = (5, 6)$.
• If $p_2$ is sufficiently high, the demand for good 1 will be two units if $p_1$ is less than 1, one unit if $p_1$ is between 1 and 5, and zero if $p_1$ exceeds 5.
• Thus one consumer has a marginal utility for good 1 equal to 1 and another has a marginal utility of 5.
• In a similar fashion, the economist can learn that the marginal utilities for good 2 are equal to 2 and 6.
• At this point, we cannot determine whether $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ or $(\beta_{1,1}, \beta_{1,2}) = (1, 6)$.
• However, note that when $p_1 = 1$ and $p_2 = 2$, the first consumer is exactly indifferent between consuming good 1 and good 2.
• Therefore, the demand changes discontinuously at this point.
• In fact, demand changes discontinuously along the line where $\beta_{i,1} - p_1 = \beta_{i,2} - p_2$ and $p_1 \leq 1$, $p_2 \leq 2$.
• Therefore, we can conclude that the preferences of consumers in this market can be represented by $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ and $(\beta_{2,1}, \beta_{2,2}) = (5, 6)$.
• More generally, using this type of logic, we can demonstrate that the distribution of random coefficients for the model below is identified:
$$u_i(j, c) = \sum_{j'=1}^{J} \beta_{i,j'}1\{j = j'\} + c$$
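The identification argument can be replayed numerically. The sketch below (function name hypothetical) computes unit demands for the two consumers in the example and compares them with the alternative pairing $(\beta_{1,1}, \beta_{1,2}) = (1, 6)$, $(\beta_{2,1}, \beta_{2,2}) = (5, 2)$, which has the same marginal valuations for each good taken separately.

```python
import numpy as np

def demand(p1, p2, betas):
    """Units of goods 1 and 2 bought when each consumer picks her best option.

    betas: list of (beta_i1, beta_i2); the outside option gives utility 0.
    np.argmax returns the first maximizer, so ties go to the outside good, then good 1.
    """
    q = [0, 0]
    for b1, b2 in betas:
        j = int(np.argmax([0.0, b1 - p1, b2 - p2]))
        if j > 0:
            q[j - 1] += 1
    return tuple(q)

true_prefs = [(1, 2), (5, 6)]
alt_prefs = [(1, 6), (5, 2)]     # same valuations for each good, paired differently
for p1 in (0.5, 3.0, 6.0):
    print(p1, demand(p1, 100.0, true_prefs), demand(p1, 100.0, alt_prefs))
print(demand(0.5, 1.0, true_prefs), demand(0.5, 1.0, alt_prefs))
```

With $p_2 = 100$ the two preference profiles produce identical demand for good 1 at every $p_1$ tried, so the demand curve for one good alone cannot separate them. At $p_1 = 0.5$, $p_2 = 1$ they differ, $(0, 2)$ versus $(1, 1)$, which is the extra information contained in how demand responds to both prices jointly.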