1 Demand Estimation

1. Motivation.

2. Estimation.

3. Consistent Estimation

4. Limitations of Logit.

5. BLP

6. Microdata

7. Semiparametrics

2 Motivation

• We begin our study of differentiated product markets by describing the method of BLP (1995) for demand estimation in these markets.

• We will also discuss some limitations of this method and some possible extensions.

• BLP is a method for estimating demand in differentiated product markets using aggregate data.

• The method allows for endogenous prices and random coefficients.

• The method also allows for consistent estimation of the model parameters even if there is imperfect competition.

3 A simple example

• To motivate the framework, consider the following simple example based on Berry (RAND, 1994).

• There are $i = 1, ..., I$ agents (with $I = \infty$) in $t = 1, ..., T$ markets.

• Each agent makes a choice between $j = 1, ..., J$ mutually exclusive alternatives.

• $x_{j,t} = (x_{jt,1}, ..., x_{jt,K})'$ is a $K \times 1$ vector of characteristics for product $j$.

• Let pj,t denote the price of j at time t.

• $\xi_{j,t} = \xi_j + \xi_t + \Delta\xi_{j,t}$ denotes an unobserved characteristic, demand shock, or measurement error in price.

• $\xi_j$ is a permanent component for $j$, $\xi_t$ is a common shock, and $\Delta\xi_{j,t}$ is a product/time specific shock for $j$.

• Specify the random utility as:

$$u_{ijt} = x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t} + \varepsilon_{ij}$$

• Assume that the error term corresponds to the

(conditional) logit model.

• Then the market share for j at time t is:

$$s_{jt}(x, \beta, \alpha, \xi) = \frac{\exp(x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t})}{\sum_{j'=1}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})}$$

• Berry assumes that we are working with aggregate data and that, at the true parameter values, $s_{jt}(x, \beta, \alpha, \xi) = S_{jt}$, where $S_{jt}$ denotes the "true" market share.
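The share formula above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original notes: the function name `logit_shares` and the example utilities are hypothetical, and the list `delta` is assumed to hold the mean utility $x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t}$ of every alternative in the choice set.

```python
import math

def logit_shares(delta):
    """Logit market shares for a list of mean utilities.

    delta[j] stands for x_j'beta - alpha*p_j + xi_j. Every alternative
    in the choice set must appear in the list (assumption of this sketch).
    """
    m = max(delta)  # subtract the maximum to avoid overflow in exp
    expd = [math.exp(d - m) for d in delta]
    total = sum(expd)
    return [e / total for e in expd]

# Hypothetical mean utilities for three alternatives.
shares = logit_shares([0.0, 1.0, 2.0])
```

Subtracting the maximum before exponentiating leaves the shares unchanged, because the common factor cancels in the ratio.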

• This differs from the standard logit model in two

ways.

• First, we have unobserved heterogeneity, the demand shock $\xi_{j,t}$.

• Why ξj,t?

1. The observed list of product attributes is incomplete. This goes back to hedonic regressions.

2. Measurement error in prices. Typically price data is an average.

3. Without $\xi_{j,t}$, shares should not vary holding $x_{j,t}$ and $p_{j,t}$ fixed. This is likely to be violated in some data sets.

• Second, we are working with aggregate data instead of individual choices, as in the standard conditional logit.

• Thus, the data set needs to contain market shares.

• Many of the methods we are going to study are not valid if market shares are measured with error.

3.1 Estimation.

• Berry notes that the following transformation can be made:

$$\log s_{jt}(x, \beta, \alpha, \xi) = e_t + x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t}$$

$$e_t = -\log\Big(\sum_{j'=1}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})\Big)$$

• Next we assume a "law of large numbers" so that $S_{jt} = s_{jt}(x, \beta, \alpha, \xi)$ at the true parameters.

• If we normalize the utility of the outside good to zero, this implies that:

$$s_{0t}(x, \beta, \alpha, \xi) = \frac{\exp(0)}{\sum_{j'=1}^{J} \exp(x_{j',t}'\beta - \alpha p_{j',t} + \xi_{j',t})}$$

$$\log s_{0t}(x, \beta, \alpha, \xi) = 0 + e_t$$

• This implies that:

$$\log(S_{jt}) - \log(S_{0t}) = x_{j,t}'\beta - \alpha p_{j,t} + \xi_{j,t}$$

• where $S_{0t}$ is the share of the outside good.

• Berry noted that an obvious way to estimate this model is by regression.

• The dependent variable is $\log(S_{jt}) - \log(S_{0t})$.

• The independent variables are $[x_{j,t}', p_{j,t}]$.

• The error term is $\xi_{j,t}$.

• However, in general we would expect $\mathrm{cov}(p_{j,t}, \xi_{j,t}) \neq 0$.

• In the presence of a positive demand shock, oligopoly models suggest that firms should raise prices.

• Thus, OLS estimates of $\beta$ and $\alpha$ will be biased.
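The inversion that produces the regression's dependent variable is easy to check numerically. The sketch below is an illustration only: the name `berry_inversion` and the fake mean utilities are hypothetical, and the outside good is assumed to have utility zero so that it contributes the 1 in the denominator.

```python
import math

def berry_inversion(inside_shares, outside_share):
    """Return log(S_j) - log(S_0) for each inside good j."""
    return [math.log(s) - math.log(outside_share) for s in inside_shares]

# Fake data: pick mean utilities, build the implied logit shares,
# then check that the inversion recovers the mean utilities.
delta_true = [0.5, -0.3, 1.2]
denom = 1.0 + sum(math.exp(d) for d in delta_true)  # 1.0 is the outside good
S = [math.exp(d) / denom for d in delta_true]
S0 = 1.0 / denom
delta_hat = berry_inversion(S, S0)
```

In this plain logit case the inversion is available in closed form, which is what makes the regression approach possible.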

3.2 Consistent Estimation.

3.2.1 Fixed Effects

• A first approach to consistent estimation would be to estimate the following fixed effects model:

$$\log(S_{jt}) - \log(S_{0t}) = x_{j,t}'\beta - \alpha p_{j,t} + \xi_j + \xi_t + \Delta\xi_{jt}$$

• where $\xi_j$ is a brand fixed effect and $\xi_t$ is a category market/time shock.

• The identifying assumption is $E[\Delta\xi_{jt} \mid x_{j,t}, p_{j,t}] = 0$.

• This is clearly more appealing than $E[\xi_{jt} \mid x_{j,t}, p_{j,t}] = 0$.

• However, there are a couple of limitations.

• First, there may be collinearity between $\xi_j$ and $x_{j,t}$ if some characteristics for product $j$ are time invariant.

• Thus, a brand fixed effect does not allow us to learn about the valuation of individual product characteristics.

• Also, it presumes that $\mathrm{cov}(p_{jt}, \Delta\xi_{jt}) = 0$.

• This assumes that, in a given time period, product level price variation is exogenous.

• Remark: This type of assumption is commonly made in marketing.
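One way to picture the fixed effects model is through the two-way within transformation, which removes $\xi_j$ and $\xi_t$ from a balanced panel. The sketch below is an illustration under that balanced-panel assumption; the function name `two_way_demean` and the toy numbers are hypothetical.

```python
def two_way_demean(y):
    """Two-way within transformation for a balanced panel.

    y[j][t] is a variable for product j in period t. Returns
    y_jt - mean_j - mean_t + grand_mean, which removes any additive
    product effect and any additive time effect.
    """
    J, T = len(y), len(y[0])
    row_means = [sum(y[j]) / T for j in range(J)]
    col_means = [sum(y[j][t] for j in range(J)) / J for t in range(T)]
    grand_mean = sum(row_means) / J
    return [[y[j][t] - row_means[j] - col_means[t] + grand_mean
             for t in range(T)] for j in range(J)]

# A variable made only of a brand effect plus a time effect is wiped out.
brand = [1.0, -2.0, 0.5]
period = [0.3, 0.0, -1.0, 2.0]
y = [[brand[j] + period[t] for t in range(4)] for j in range(3)]
resid = two_way_demean(y)
```

Applying the same transformation to the dependent variable and to each regressor and then running OLS gives the fixed effects estimates in a balanced panel. It also shows the collinearity problem directly: any time-invariant characteristic is transformed to zero.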

3.2.2 BLP Instruments

• A second approach to identification is to find a set of instruments.

• That is, we need to find a variable $z_{jt}$ such that $E[\xi_{jt} \mid z_{jt}] = 0$ and $\mathrm{cov}(z_{jt}, [x_{jt}, p_{jt}]) \neq 0$ (i.e. it satisfies the standard rank conditions for IV).

• One obvious instrument is a supply shifter (e.g. a change in costs).

• Problem: there are too few such instruments and they may be weak.

• With weak instruments, standard errors are incorrect and the bias can be large.

• BLP and Berry (1994) suggest measures of isolation in product space.

• E.g. $z_{jtk} = \sum_{j' \neq j} x_{j'tk}$.

• How much does product $j$ contribute to the (unweighted) average of characteristic $k$?

• This instrument is usually available and it tends to be highly correlated with price.

• Models of oligopoly suggest that the more isolated you are in product space, the more likely you are to have a higher margin.

• Thus, prices will be correlated with $z_{jtk}$.
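The instrument $z_{jtk} = \sum_{j' \neq j} x_{j'tk}$ can be computed for one market as follows. This is a minimal sketch; the function name `blp_instruments` and the example characteristics are hypothetical.

```python
def blp_instruments(x):
    """Sum of rivals' characteristics within one market.

    x[j][k] is characteristic k of product j. Returns z with
    z[j][k] = sum over all j' != j of x[j'][k].
    """
    K = len(x[0])
    totals = [sum(row[k] for row in x) for k in range(K)]
    # Subtracting the product's own value from the market total
    # leaves the sum over its rivals.
    return [[totals[k] - row[k] for k in range(K)] for row in x]

# Three products, two characteristics each.
x = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]
z = blp_instruments(x)
```

With several markets, the same function would be applied market by market so that sums never mix products from different markets.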

• Critiques of this instrument:

1. Little variation over time.

2. Assumes $\mathrm{cov}(\xi_{jt}, x_{jtk}) = 0$.

• This assumes that omitted product attributes are uncorrelated with observed attributes.

• This seems hard to believe since the observed attributes are correlated with each other.

• This is a classic problem in demand estimation.

• In hedonic regressions, researchers have long worried about the consistency of:

$$p_{jt} = x_{jt}'\beta + \xi_{jt}$$

• For example, in a home price regression, the observed attributes are likely to be correlated with the unobserved attributes.

• Ackerberg, however, notes that if $\mathrm{cov}(z_{jtk}, x_{jtk}) = 0$ for all $k$, it is possible to consistently estimate price elasticities for this model (even if other parameter estimates are biased).

• This condition is testable!

• Many questions can be answered with price elasticities, e.g. measurement of market power.

• As with the fixed effects case above, it seems more appealing to assume:

$$E[\Delta\xi_{jt} \mid z_{jt}] = 0$$

• This is possible if we include brand/time fixed effects.

• Remark: Price endogeneity is being accounted for using only demand side information.
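A small fake data experiment illustrates why OLS is biased when price responds to $\xi$ and how a valid instrument repairs this. Everything here is an assumption made for illustration: there are no product characteristics, the instrument is a hypothetical cost shifter (the supply shifter case, not the BLP instrument), and the pricing rule is simply linear in the cost shifter and the demand shock.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

random.seed(0)
alpha_true = 1.0
n = 20000
z = [random.gauss(0.0, 1.0) for _ in range(n)]   # cost shifter (instrument)
xi = [random.gauss(0.0, 1.0) for _ in range(n)]  # unobserved demand shock
# Assumed pricing rule: price rises with cost and with the demand shock.
p = [2.0 + zi + 0.5 * x for zi, x in zip(z, xi)]
# Dependent variable log(S_j) - log(S_0) = 1 - alpha * p + xi.
y = [1.0 - alpha_true * pi + x for pi, x in zip(p, xi)]

alpha_ols = -cov(p, y) / cov(p, p)  # biased: cov(p, xi) > 0
alpha_iv = -cov(z, y) / cov(z, p)   # simple IV estimator with one instrument
```

Under this design OLS understates the price coefficient, which makes demand look less elastic than it is, while the IV estimate should land close to the true value of 1.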

3.3 Hausman Instruments

• Hausman proposes using prices in other markets as instruments.

• E.g. use prices in Iowa, Wisconsin and the Dakotas as instruments for price endogeneity in Minneapolis.

• The idea behind these instruments is that they pick up common cost shocks.

• However, if they pick up common demand shocks, they are invalid.

• In general, both the BLP and Hausman instruments have the advantage of at least being available!

4 Limitations of the Logit

• While the logit model is computationally convenient, it imposes some unpleasant restrictions on the data.

• It is still widely used since there are few other computationally convenient estimators.

1. Implausible substitution patterns.

• The logit model exhibits the independence of irrelevant alternatives (IIA).

• That is, the ratio of the probabilities of two choices does not change depending on the set of choices that are available.

$$\frac{\Pr(i \text{ chooses } j)}{\Pr(i \text{ chooses } j')} = \text{constant}$$

for all $j$ and $j'$, regardless of the set of alternatives that are available.

• A famous example is the red bus/blue bus problem.

• Suppose that we are studying the choice of mode of transportation.

• The choice set is to take the (red) bus to work or to drive.

• Suppose that these choices are equal in probability.

• Now suppose that the bus company introduces blue buses in addition to red buses.

• Suppose that consumers are indifferent about the color of their bus and that the probability of the red bus and blue bus is equal.

• IIA implies that prob(red bus) = prob(blue bus) = prob(drive) = 1/3.

• A more "intuitive" answer would be prob(red bus) = prob(blue bus) = 1/4 and prob(drive) = 1/2.

• This example shows that IIA can give weird substitution patterns.
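The red bus/blue bus numbers can be reproduced with the logit formula. In this sketch all mean utilities are set to zero, which is an assumption chosen to match the equal-probability setup of the example; the function name `logit_probs` is hypothetical.

```python
import math

def logit_probs(v):
    """Logit choice probabilities for a list of mean utilities v."""
    expv = [math.exp(u) for u in v]
    total = sum(expv)
    return [e / total for e in expv]

# Before: drive versus red bus, equally attractive.
before = logit_probs([0.0, 0.0])
# After: a blue bus identical to the red bus is added to the choice set.
after = logit_probs([0.0, 0.0, 0.0])

# IIA: the ratio Pr(drive) / Pr(red bus) is the same in both choice sets.
ratio_before = before[0] / before[1]
ratio_after = after[0] / after[1]
```

The logit model takes probability from driving to make room for the blue bus, even though the blue bus is a perfect substitute for the red bus only.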

• This can also show up in terms of price elasticities.

• Suppose that we are modeling consumer demand for a differentiated product.

• Suppose that the latent utilities are:

$$y_{nj} = x_{nj}'\beta - \alpha p_j + \varepsilon_{nj}$$

• where $p_j$ is price.

• The own and cross price elasticities are:

$$\eta_{jk} = \frac{\partial \Pr(i \text{ chooses } j)}{\partial p_k}\,\frac{p_k}{\Pr(i \text{ chooses } j)} = \begin{cases} -\alpha p_j (1 - s_j) & \text{if } j = k \\ \alpha p_k s_k & \text{otherwise} \end{cases}$$

• Since in most cases there are many products, so that market shares are typically small, $(1 - s_j)$ is approximately one and the own price elasticity is approximately proportional to price.

• This implies that the lower the price, the lower the elasticity (in absolute value).

• This implies that markups should be higher on cheap products.

• This is clearly not appropriate in many industries.

• A second limitation is that cross price elasticities are determined by $\alpha p_k s_k$.

• Suppose that Lucky Charms and Grape Nuts are similarly priced and have a similar market share.

• An implication of this formula is that both of these will have the same cross price elasticity with Cocoa Puffs.

• This is clearly a priori implausible, yet it is an assumption that we have imposed through the functional form.
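The elasticity formulas can be verified against numerical derivatives of the logit shares. The sketch below assumes an outside good with utility zero and uses hypothetical prices, mean utilities and function names; the cross elasticity is taken to be $\alpha p_k s_k$, which is positive because a rival's price increase raises demand for $j$.

```python
import math

def shares(prices, base, alpha):
    """Logit shares with an outside good; base[j] stands for x_j'beta."""
    expu = [math.exp(b - alpha * p) for b, p in zip(base, prices)]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu]

def elasticity(j, k, prices, base, alpha):
    """Closed-form logit elasticity of the share of j with respect to p_k."""
    s = shares(prices, base, alpha)
    if j == k:
        return -alpha * prices[j] * (1.0 - s[j])
    return alpha * prices[k] * s[k]

def numeric_elasticity(j, k, prices, base, alpha, h=1e-6):
    """Central finite difference version, used only as a check."""
    up, down = list(prices), list(prices)
    up[k] += h
    down[k] -= h
    ds = (shares(up, base, alpha)[j] - shares(down, base, alpha)[j]) / (2 * h)
    return ds * prices[k] / shares(prices, base, alpha)[j]

prices, base, alpha = [1.0, 2.0, 1.5], [1.0, 2.0, 0.5], 0.8
max_gap = max(abs(elasticity(j, k, prices, base, alpha)
                  - numeric_elasticity(j, k, prices, base, alpha))
              for j in range(3) for k in range(3))
```

Note that `elasticity(0, 2, ...)` and `elasticity(1, 2, ...)` are identical by construction: the cross elasticity with respect to product 2 does not depend on which other product is considered, which is the cereal example in code.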

3. Treatment of Heterogeneity.

• In the logit model, consumers are only heterogeneous because of $\varepsilon_{ij}$.

• $\varepsilon_{ij}$ can be thought of as adding additional product characteristics into the model for each $j$ and an iid random preference shock for that characteristic.

• Caplin and Nalebuff argue that this generates too much "taste for variety".

• Applied studies of welfare, such as Petrin (JPE 2002, Quantifying the Benefits of New Products: The Case of the Minivan), argue that too much of the utility comes from implausibly large draws of the $\varepsilon_{ij}$.

• This leads to pathological implications (e.g. markups in Bertrand competition may not converge to zero as the market becomes thick).

• See Anderson, de Palma and Thisse.

5 BLP-Random Coefficients Logit.

• In Berry (1994) and BLP (1995), consumer preferences can be written as:

$$u(x_j, \xi_j, p_j, v_i; \theta_d)$$

where:

• $x_j = (x_{j,1}, ..., x_{j,K})$ is a vector of $K$ characteristics of product $j$ that are observed by both the economist and the consumer.

• $\xi_j$ is a characteristic of product $j$ observed by the consumer but not by the economist.

• $p_j$ is the price of good $j$.

• $v_i$ is a vector of taste parameters for consumer $i$.

• $\theta_d$ is a vector of demand parameters.

• One commonly used specification is the logit model with random (normal) coefficients:

$$u_{ij} = x_j\beta_i - \alpha p_j + \xi_j + \varepsilon_{ij}$$

• The $K$ random coefficients are:

$$\beta_{i,k} = \beta_k + \sigma_k\eta_{i,k}, \qquad \eta_{i,k} \sim N(0, 1) \text{ iid}$$

• Consumer $i$ will purchase good $j$ if and only if it is utility maximizing, just as in the previous lecture.

• Question: How do we interpret the parameters of this model?

• It is useful to decompose utility into two parts: the first is a "mean" level of utility and the second is a heteroskedastic error term that captures the effect of the random taste parameters:

$$\upsilon_{ij} = \Big[\sum_k x_{jk}\sigma_k\eta_{i,k}\Big] + \varepsilon_{ij}$$

$$\delta_j = x_j\beta - \alpha p_j + \xi_j$$

• We can now write utility of person i for product

j as:

uij = δj + υij

• Next, we will write the market shares for aggregate demand in a particularly convenient fashion. First define the set of "error terms" that make product $j$ utility maximizing given the $J$ dimensional vector $\delta = (\delta_j)$:

$$A_j(\delta) = \{\upsilon_i = (\upsilon_{ij}) \mid \delta_j + \upsilon_{ij} \geq \delta_{j'} + \upsilon_{ij'} \text{ for all } j' \neq j\}$$

• The market share of product $j$ can then be written as (assuming a law of large numbers):

$$s_j(\delta(x, p, \xi), x, \theta) = \int_{A_j(\delta)} f(\upsilon)\,d\upsilon$$

• In this case, the parameter $\theta$ collects $\beta$, $\alpha$ and $\sigma$.

• Given $\theta$ and the demand for product $j$ actually observed in the data, $\tilde{s}_j$, it must be the case that:

$$\tilde{s}_j = s_j(\delta(x, p, \xi), x, \theta)$$

• Given $\theta$, this can be expressed as a system of $J$ equations in $J$ unknowns (the $\xi_j$).

• To estimate, we find a set of instruments for the $\xi_j$.

• We must find a set of instruments correlated with the endogenous variable $p_j$, but uncorrelated with the residual $\xi_j$.

Commonly used instruments:

1. The product characteristics.

2. Prices of products in other markets (interpret $\xi_j$ as a demand shifter).

3. Measures of isolation in product space ($\sum_{j' \neq j} x_{j',k}$).

4. Cost shifters.

• Question: Are these really valid instruments?

• Typically we think of product characteristics as a choice variable.

• Suppose that a firm chose product characteristics optimally.

• Then the unobserved characteristics (to the econometrician) of a product would be independent of the observed characteristics only under strong separability assumptions about cost and demand.

• The model written down probably violates the separability assumptions on demand.

• A number of empirical case studies have been done. They find that BLP style estimators typically find more elastic demand curves.

6 Firm Behavior.

• In the model above, we abstracted from the be-

havior of the firm.

• Suppose that firms engage in Bertrand price competition.

• Let firm $f$ produce some set of products $P_f$.

• Then the profit maximization problem for firm $f$ is to choose prices $p_j$ for $j \in P_f$ that maximize expected profit, holding the prices of the other firms fixed:

$$\pi_f = \sum_{j \in P_f} (p_j - mc_j)\,M\,s_j(x, p, \xi, \theta)$$

• Here $M$ is the size of the market and $mc_j$ is the marginal cost of product $j$.

• Suppose that we know the function $s_j$; then the first order conditions for all of the products are a system of $J$ equations in $J$ unknowns, where the unknowns are the latent cost parameters $mc_j$.

• Note that if we recover the marginal cost parameters by assuming Bertrand price competition and that the first order conditions hold, we could do policy experiments.

• For instance, some have used this approach to simulate the effects of a merger.
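To see how the first order conditions pin down marginal costs, consider the simplest special case: plain logit demand and single-product firms. Both are assumptions of this sketch, not of the notes, which allow multi-product firms and random coefficients. In that case $\partial s_j / \partial p_j = -\alpha s_j (1 - s_j)$, so the first order condition $s_j + (p_j - mc_j)\,\partial s_j/\partial p_j = 0$ gives a markup of $1 / (\alpha (1 - s_j))$. The function name and the numbers are hypothetical.

```python
def implied_marginal_costs(prices, shares, alpha):
    """Marginal costs implied by Bertrand first order conditions.

    Assumes plain logit demand and that each firm sells one product,
    so the markup is 1 / (alpha * (1 - s_j)).
    """
    return [p - 1.0 / (alpha * (1.0 - s)) for p, s in zip(prices, shares)]

prices = [2.0, 3.0]
shares = [0.2, 0.1]
alpha = 1.5
mc = implied_marginal_costs(prices, shares, alpha)

# Check: the first order condition holds at the recovered costs.
foc = [s + (p - c) * (-alpha * s * (1.0 - s))
       for p, s, c in zip(prices, shares, mc)]
```

With multi-product firms the same idea applies, but the markups solve a linear system that involves the cross price derivatives of the products owned by the same firm.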

• BLP (1995) proceeds in a similar fashion to Berry, except that it models the supply side as well by assuming that firms are Bertrand price competitors.

• We then need to find instruments for a set of unobserved supply shifters.

• BLP propose the use of product characteristics.

7 Computation.

• In this section, I shall outline some of the key steps needed to actually compute the estimator of Berry (1994).

• A key step in many programming projects is to do a fake data experiment.

• Simulate the model using fixed parameter values.

• Pretend you don't know the parameter values and estimate.

• This tests the code and sometimes shows you limitations of the models.

• One of the best ways to really learn the econometrics in a paper is to do a fake data experiment.

• We shall consider as an example the random coefficient logit model.

There are basically 4 things we need to do in order to compute the value of the objective function in order to do GMM.

1. For a given value of $\sigma$ and $\delta$, compute the vector of market shares.

2. For a given value of $\sigma$, find the vector $\delta$ that equates the observed market shares and those predicted by the model, using the contraction mapping.

3. Given $\delta$, $\beta$ and $\alpha$, compute the value of $\xi$.

4. Search for the value of $\theta = (\beta, \alpha, \sigma)$ that minimizes the objective function.

• We shall consider these one at a time.

7.1 Computing Market Shares.

• In the random coefficient logit model, we can compute the market shares, given $\delta$, as follows:

$$s_j(\delta, \sigma) = \int \frac{\exp(\delta_j + \sum_k x_{j,k}\eta_{i,k}\sigma_k)}{1 + \sum_{j'} \exp(\delta_{j'} + \sum_k x_{j',k}\eta_{i,k}\sigma_k)}\,dF(\eta_i)$$

• In practice, the integral above is computed using simulation.

• Make a set of $S$ simulation draws and keep them fixed for the whole problem.

• Sometimes importance sampling is useful in order to improve the speed/accuracy of the integration.

• See Judd for an overview of numerical integration.

• We can compute confidence intervals using standard methods to see whether the simulated market shares are well estimated.
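The simulation step can be sketched as a plain Monte Carlo average over a fixed set of draws. This is an illustration only: the function name `simulated_shares`, the number of draws, and the example values of $\delta$, $x$ and $\sigma$ are all hypothetical, and no importance sampling is used.

```python
import math
import random

def simulated_shares(delta, x, sigma, draws):
    """Monte Carlo approximation of random coefficient logit shares.

    delta[j]: mean utility, x[j][k]: characteristics, sigma[k]: standard
    deviations of the random coefficients, draws[s][k]: standard normal
    draws that are generated once and held fixed.
    """
    totals = [0.0] * len(delta)
    for eta in draws:
        expu = [math.exp(d + sum(xk * sk * ek
                                 for xk, sk, ek in zip(xj, sigma, eta)))
                for d, xj in zip(delta, x)]
        denom = 1.0 + sum(expu)  # the 1.0 is the outside good
        totals = [t + e / denom for t, e in zip(totals, expu)]
    return [t / len(draws) for t in totals]

random.seed(1)
draws = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(500)]
delta = [0.5, -0.2, 1.0]
x = [[1.0, 0.5], [0.3, 1.5], [2.0, 0.1]]
s = simulated_shares(delta, x, [0.5, 1.0], draws)
s_logit = simulated_shares(delta, x, [0.0, 0.0], draws)  # sigma = 0: plain logit
```

Keeping `draws` fixed across evaluations matters: if new draws were taken at every call, the objective function would jump around as the optimizer moves the parameters.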

7.2 The contraction mapping.

• Next, we wish to find the δ that matches the

observed market shares given σ.

• Berry and BLP demonstrate that the following is a contraction:

$$\delta_j^{(n+1)} = \delta_j^{(n)} + \ln(\tilde{s}_j) - \ln(s_j(\delta^{(n)}, \sigma))$$

• Therefore, given that we can compute market shares, we can use the formula above to find the value of $\delta$ by making an initial guess at $\delta$ and then evaluating the equation above until convergence is (approximately) achieved.

• A mapping $T$ that maps $S \to S$ is a contraction with modulus $\beta \in (0, 1)$ if, for all $x, y$, $d(T(x), T(y)) \leq \beta\,d(x, y)$.

• A contraction mapping has a unique fixed point.

• Let $v_0$ be an initial guess about the fixed point $v$. Let $T^n(v_0)$ denote applying the mapping $n$ times, as in the previous equation.

• This converges to the fixed point at an exponential rate.

• Point: Market shares can be inverted very quickly in a fairly simple manner!

• Contraction mappings are used all the time in economics, particularly in modern macro.

• See Stokey and Lucas, chapters 4 and 5 for proofs.
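The contraction can be tried out in a small fake data experiment: simulate shares at a known $\delta$, then iterate the mapping and check that the same $\delta$ comes back. The sketch below is self-contained and illustrative only; the function names, tolerance, iteration cap, starting value and parameter values are all assumptions.

```python
import math
import random

def simulated_shares(delta, x, sigma, draws):
    """Random coefficient logit shares averaged over fixed draws."""
    totals = [0.0] * len(delta)
    for eta in draws:
        expu = [math.exp(d + sum(xk * sk * ek
                                 for xk, sk, ek in zip(xj, sigma, eta)))
                for d, xj in zip(delta, x)]
        denom = 1.0 + sum(expu)  # outside good has utility zero
        totals = [t + e / denom for t, e in zip(totals, expu)]
    return [t / len(draws) for t in totals]

def invert_shares(observed, x, sigma, draws, tol=1e-12, max_iter=5000):
    """Iterate delta <- delta + ln(observed) - ln(predicted) to a fixed point."""
    delta = [0.0] * len(observed)  # initial guess
    for _ in range(max_iter):
        predicted = simulated_shares(delta, x, sigma, draws)
        new_delta = [d + math.log(so) - math.log(sp)
                     for d, so, sp in zip(delta, observed, predicted)]
        if max(abs(a - b) for a, b in zip(new_delta, delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("contraction mapping did not converge")

random.seed(3)
draws = [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
x = [[1.0, 0.5], [0.3, 1.5], [2.0, 0.1]]
sigma = [0.5, 0.5]
delta_true = [-1.0, -0.5, -1.5]
observed = simulated_shares(delta_true, x, sigma, draws)
delta_hat = invert_shares(observed, x, sigma, draws)
```

The same draws are used to generate and to invert the shares, so the fixed point is exactly `delta_true` up to the tolerance. In estimation this inversion is repeated for every trial value of $\sigma$.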

8 Computing the value of ξ

• The next step is simple. Just let:

$$\xi_j = \delta_j - (x_j\beta - \alpha p_j)$$

where $\delta_j$ is computed using the contraction mapping.

8.1 Computing the value of the objective function.

• Let $Z$ be the set of instruments.

• The objective function is formulated as in all GMM problems, assuming $E(\xi \mid Z) = 0$.

• The econometrician then chooses $\beta$, $\alpha$, and $\sigma$ in order to minimize the objective function.
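Given $\delta$, the remaining computation is short. The sketch below forms $\xi_j = \delta_j - (x_j\beta - \alpha p_j)$, the sample moments $\frac{1}{J}\sum_j z_j \xi_j$, and a quadratic form in those moments. It is an illustration under stated assumptions: the weighting matrix is the identity (a common first step, not necessarily the efficient choice), $\delta$ is taken as given rather than recomputed from $\sigma$, and all names and numbers are hypothetical.

```python
def gmm_objective(delta, x, p, z, beta, alpha):
    """Identity-weighted GMM objective for the moments E[z_j * xi_j] = 0.

    delta[j]: mean utilities from the share inversion, x[j][k]:
    characteristics, p[j]: prices, z[j][m]: instruments.
    """
    J = len(delta)
    # Step 3 of the outline: back out the unobserved characteristic.
    xi = [delta[j] - (sum(xk * bk for xk, bk in zip(x[j], beta)) - alpha * p[j])
          for j in range(J)]
    # Sample analogue of each moment condition.
    g = [sum(z[j][m] * xi[j] for j in range(J)) / J for m in range(len(z[0]))]
    return sum(gm * gm for gm in g)

# Fake data with xi = 0 at the true parameters.
x = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 0.2]]
p = [1.0, 2.0, 3.0, 0.5]
z = [[1.0, 0.4], [1.0, 1.1], [1.0, 2.0], [1.0, 0.3]]
beta_true, alpha_true = [0.5, 1.0], 0.7
delta = [sum(xk * bk for xk, bk in zip(xj, beta_true)) - alpha_true * pj
         for xj, pj in zip(x, p)]

q_true = gmm_objective(delta, x, p, z, beta_true, alpha_true)
q_wrong = gmm_objective(delta, x, p, z, beta_true, 1.2)
```

In the full random coefficient problem, each evaluation of the objective at a new $\sigma$ would first rerun the contraction mapping to obtain `delta`.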

• Standard mathematical programs (MATLAB, GAUSS, IMSL, NAG) contain software for optimization problems.

• One standard way to proceed is to do a rough global search first and then use a derivative based method second, once you have a very rough sense of the overall shape of the objective function.

• Multiple starting points are commonly used in order to search for multiple local solutions to the minimization problem.

• See Judd for an overview of numerical minimization.

• Doing a "fake data experiment" is a good way to learn how well the estimator works.

• Fix true parameters, simulate the model. Then

see if your computations allow you to get back

the correct answer.

9 Individual Level Data

• These models are discussed in some detail in Cameron and Trivedi, Chapter 15.

• In the notation of Cameron and Trivedi, j =

1, ..., J indexes choices and i = 1, ..., I indexes

households.

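• That is:

$$U_{ij} = x_{ij}'\beta_i + \varepsilon_{ij}, \qquad \beta_i \sim N(\beta, \Sigma_\beta)$$

• In the above, $\varepsilon_{ij}$ comes from the Weibull distribution as before (this is the type I extreme value distribution in the terminology more common today).

• Each household $i$ is allowed to have a unique set of marginal utilities, which come from a normal distribution with unknown mean and variance.

• In this model, the probability that household $i$ chooses product $j$, $p_{ij}$, is therefore:

$$p_{ij} = \frac{\exp(x_{ij}'\beta_i)}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta_i)}$$

• The probability of choice $j$, $p_j$, is therefore:

$$p_j(\beta, \Sigma_\beta) = \int \frac{\exp(x_{ij}'\beta_i)}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta_i)}\,\phi(\beta_i \mid \beta, \Sigma_\beta)\,d\beta_i$$

• where $\phi(\beta_i \mid \beta, \Sigma_\beta)$ is the normal density.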

• We could in principle estimate the model using MLE, since our model generates a likelihood for the choice probabilities.

• If $x_{ij}$ has a large dimension (e.g. there are many characteristics), then evaluation of the above integral is difficult.

• Therefore, we need to estimate these models using simulation.

• We will study the theory of simulation in detail next week; however, we will sketch how to form a simulated likelihood function.

• Suppose that the $\beta_i$ can be written as:

$$\beta_{i,k} = \beta_k + \eta_{i,k}\sigma_k, \quad k = 1, ..., K, \qquad \eta_{i,k} \text{ standard normal}$$

• In this specification, we are assuming that the random coefficients are independently distributed across $k$ with a normal distribution of mean $\beta_k$ and standard deviation $\sigma_k$.

• In the simplest simulation based estimator, we could make $s = 1, ..., S$ Monte Carlo draws $\eta_i^{(s)}$ of the random coefficients for each household $i$.

• A Monte Carlo estimator $\hat{p}_j(\beta, \Sigma_\beta)$ is then:

$$\hat{p}_j(\beta, \Sigma_\beta) = \frac{1}{S}\sum_{s=1}^{S} \frac{\exp(x_{ij}'\beta + \sum_k \eta_{i,k}^{(s)}\sigma_k x_{ijk})}{1 + \sum_{j'=1}^{J} \exp(x_{ij'}'\beta + \sum_k \eta_{i,k}^{(s)}\sigma_k x_{ij'k})}$$

• The "simulated" likelihood function would then be:

$$\ln \hat{L}(\beta, \Sigma_\beta) = \sum_{i=1}^{N}\sum_{j=1}^{J} y_{ij} \log\big(\hat{p}_j(\beta, \Sigma_\beta)\big)$$

• If we let the number of simulations become infinite (at an appropriate rate) as the sample size $N \to \infty$, this will yield a consistent estimator of our model parameters.
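The simulated likelihood can be sketched as follows. This is an illustration under several assumptions that are not in the notes: an outside option with utility zero is included in the likelihood (encoded as choice `None`), the random coefficients are independent across characteristics, and the function names and example values are hypothetical.

```python
import math

def simulated_choice_probs(x_i, beta, sigma, draws_i):
    """Simulated choice probabilities for one household.

    x_i[j][k]: regressors, beta[k] and sigma[k]: mean and standard
    deviation of coefficient k, draws_i[s][k]: fixed standard normal draws.
    """
    J, K = len(x_i), len(beta)
    probs = [0.0] * J
    for eta in draws_i:
        b = [beta[k] + sigma[k] * eta[k] for k in range(K)]
        expu = [math.exp(sum(x_i[j][k] * b[k] for k in range(K)))
                for j in range(J)]
        denom = 1.0 + sum(expu)  # the 1.0 is the outside option
        for j in range(J):
            probs[j] += expu[j] / denom / len(draws_i)
    return probs

def simulated_loglik(choices, x, beta, sigma, draws):
    """Sum over households of the log simulated probability of the choice made.

    choices[i] is an index in 0..J-1, or None for the outside option.
    """
    total = 0.0
    for i, c in enumerate(choices):
        probs = simulated_choice_probs(x[i], beta, sigma, draws[i])
        total += math.log(1.0 - sum(probs) if c is None else probs[c])
    return total

# One household, two goods, one characteristic; sigma = 0 gives plain logit.
x = [[[1.0], [2.0]]]
draws = [[[0.3], [-1.2]]]
ll = simulated_loglik([1], x, [0.5], [0.0], draws)
```

Because the log is taken of an average over draws, the simulated log likelihood is a biased estimate of the true log likelihood for any fixed number of draws.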

• It is also possible to derive the asymptotic variance matrix in a reasonably straightforward way.

• There are some limitations, however.

• A first limitation is that this estimator is in general biased for a fixed number of simulation draws.

• An alternative, unbiased estimator is based on an NLLS approach:

$$\sum_{i=1}^{N} x_{i,r}\big(y_{i,j} - \hat{p}_j(\beta, \Sigma_\beta)\big) = 0, \qquad r = 1, ..., K \text{ and } j = 1, ..., J$$

• Unfortunately, this estimator is not efficient in general and may not even be smooth without using some fairly sophisticated numerical approaches.

• A second limitation is that the variance of our estimates may be high if the distribution of random coefficients is flexibly specified.

• Hence, tightly parameterized models are required.

• The computational burden increases considerably in the number of choices.

• An alternative approach is to use Gibbs sampling.

10 A semiparametric alternative

• We briefly discuss a computationally simple, but flexible estimator due to Bajari, Fox and Ryan (2006).

• Let $(\beta_1^{(r)}, ..., \beta_K^{(r)})$ for $r = 1, ..., \infty$ be a sequence of real vectors that is dense in the domain of $\beta_i$.

• Assume that the random preference shock comes from the Weibull distribution as before.

• We will choose a large, but finite number of points of support $r = 1, ..., R$ for the distribution of random coefficients.

• Let $p^{(r)}$ denote the probability that $\beta_i = (\beta_1^{(r)}, ..., \beta_K^{(r)})$.

• Let $P(j)$ denote the probability that the choice $j$ is made. Then

$$P(j \mid x_{ij}) = \sum_{r=1}^{R} p^{(r)} \left(\frac{\exp(\beta^{(r)} x_{ij})}{\sum_{j'=1}^{J} \exp(\beta^{(r)} x_{ij'})}\right)$$

• Note that in the above we let regressors vary by both $j$ and $i$.

• Let $y_{ij} = 1$ if consumer $i$ chooses $j$ and zero otherwise.

• Straightforward algebra implies that:

$$y_{ij} = \sum_{r=1}^{R} p^{(r)} \left(\frac{\exp(\beta^{(r)} x_{ij})}{\sum_{j'=1}^{J} \exp(\beta^{(r)} x_{ij'})}\right) + e_{ij} \tag{1}$$

for $j = 1, ..., J$ and $i = 1, ..., I$, where $e_{ij} = y_{ij} - P(j \mid x_{ij})$.

• Since $e_{ij}$ is pure forecast error due to random sampling, it is orthogonal to all of our regressors and functions of our regressors.

• An attractive feature of this model is that it is linear in the parameters $p^{(r)}$ and we do not require nonlinear maximization to find the estimator.

• If we let $R$ be sufficiently large, this can approximate any discrete choice model to an arbitrary degree of precision due to a result by McFadden and Train (2000).

• A first (naive) estimator for this model would be to minimize:

$$\hat{p} = \arg\min_p \frac{1}{I}\sum_{i=1}^{I} \left(y_{ij} - \sum_{r=1}^{R} p^{(r)} \left(\frac{\exp(\beta^{(r)} x_{ij})}{\sum_{j'=1}^{J} \exp(\beta^{(r)} x_{ij'})}\right)\right)^2$$

• Note that this is just regression!

• We would then naively interpret our estimates as the probabilities $p^{(r)}$.

• In Monte Carlo studies, this performed poorly. Many of the coefficients $p^{(r)}$ were negative, for instance.

Instead, we propose using the following estimator:

$$\hat{p} = \arg\min_p \frac{1}{I}\sum_{i=1}^{I} \left(y_{ij} - \sum_{r=1}^{R} p^{(r)} \left(\frac{\exp(\beta^{(r)} x_{ij})}{\sum_{j'=1}^{J} \exp(\beta^{(r)} x_{ij'})}\right)\right)^2 \quad \text{s.t. } p^{(r)} \geq 0 \text{ and } \sum_r p^{(r)} = 1$$

• This is an inequality constrained regression, as considered by Judge and Takayama (1966), Geweke (1986) and Wolak (1987).

• This is a straightforward quadratic programming problem.

• The algorithm for finding the solution is implemented in standard software packages including Matlab.

• The standard errors are simple for this model and correspond to simple modifications of OLS formulas.
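To make the constrained regression concrete, the sketch below solves a small least squares problem over the probability simplex by projected gradient descent. This is a simple stand-in for the quadratic programming routines mentioned above, not the method of the original authors. The names are hypothetical: `A[i][r]` plays the role of the logit probability of the observed choice under support point $\beta^{(r)}$, and the fake data are generated without noise so that the true weights can be recovered exactly.

```python
import random

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0.0:
            theta = t  # keep the threshold from the last k that qualifies
    return [max(x - theta, 0.0) for x in v]

def constrained_ls(A, y, n_iter=3000):
    """Minimize sum_i (y_i - sum_r A[i][r] * p[r])^2 over the simplex."""
    n, R = len(A), len(A[0])
    # Step size 1/L, using the bound L <= 2 * (squared Frobenius norm of A).
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    p = [1.0 / R] * R
    for _ in range(n_iter):
        resid = [sum(A[i][r] * p[r] for r in range(R)) - y[i] for i in range(n)]
        grad = [2.0 * sum(A[i][r] * resid[i] for i in range(n)) for r in range(R)]
        p = project_to_simplex([p[r] - step * grad[r] for r in range(R)])
    return p

random.seed(2)
A = [[random.random() for _ in range(3)] for _ in range(60)]
p_true = [0.2, 0.5, 0.3]
y = [sum(a * w for a, w in zip(row, p_true)) for row in A]
p_hat = constrained_ls(A, y)
```

Every iterate is a valid probability vector by construction, which is exactly what the unconstrained regression failed to deliver in the Monte Carlo studies. For large $R$, a dedicated quadratic programming solver would be far faster than this loop.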

11 Identification.

• An important question to ask is whether our random coefficient discrete choice models are identified.

• That is, can the primitives (i.e. random utilities) be uniquely recovered from the data?

• The answer to this question in general is no.

• To see why, suppose that there was only a single consumer with a deterministic utility.

• This is a special case of our more general, random utility framework.

• Utility functions cannot be identified from choice behavior.

• We can always make monotonic transformations.

• Therefore, in general, distributions over utility functions cannot be identified.

11.1 Quasi Linear Preferences.

• One case where we can identify our model is the case of quasi linear preferences.

• Consider a simple example where $i = 1, 2$ consumers choose between two goods, $j = 1$ or $2$, and an outside option ($j = 0$).

• If utility is quasi linear, WLOG we can write the utility function for consumer $i$ as:

$$u_i(j, c) = \beta_{i,1} 1\{j = 1\} + \beta_{i,2} 1\{j = 2\} + c$$

• For our simple example, suppose that $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ and $(\beta_{2,1}, \beta_{2,2}) = (5, 6)$.

• If $p_2$ is sufficiently high, the demand for good 1 will be two units if $p_1$ is less than 1, one unit if $p_1$ is between 1 and 5, and zero if $p_1$ exceeds 5.

• So the economist learns that one consumer has a marginal utility for good 1 equal to 1 and another has a marginal utility of 5.

• In a similar fashion, the economist can learn that the marginal utilities for good 2 are equal to 2 and 6.
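The step function traced out by demand for good 1 can be written down directly. In this sketch, consumption $c$ is taken to be income minus the price paid, so a consumer buys good 1 when $\beta_{i,1} - p_1 > 0$ and good 2 is priced out of the market; the function name is hypothetical and ties at exact indifference are resolved as no purchase.

```python
def demand_good1(p1, betas):
    """Units of good 1 demanded when p2 is prohibitively high.

    betas is a list of (beta_i1, beta_i2) pairs, one per consumer.
    A consumer buys good 1 when its net surplus beta_i1 - p1 is positive.
    """
    return sum(1 for b1, _ in betas if b1 - p1 > 0)

betas = [(1.0, 2.0), (5.0, 6.0)]
demand_at_prices = [demand_good1(p1, betas) for p1 in (0.5, 3.0, 6.0)]
```

The prices at which demand drops by one unit are exactly the marginal utilities for good 1, which is why varying $p_1$ reveals the marginal distribution of $\beta_{i,1}$.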

• At this point, we cannot determine whether $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ or $(\beta_{1,1}, \beta_{1,2}) = (1, 6)$.

• However, note that when $p_1 = 1$ and $p_2 = 2$, the consumer is exactly indifferent between consuming good 1 and good 2.

• Therefore, the demand changes discontinuously at this point.

• In fact, demand changes discontinuously along the line where $\beta_{i,1} - p_1 = \beta_{i,2} - p_2$ and $p_1 \leq 1$, $p_2 \leq 2$.

• Therefore, we can conclude that the preferences of consumers in this market can be represented by $(\beta_{1,1}, \beta_{1,2}) = (1, 2)$ and $(\beta_{2,1}, \beta_{2,2}) = (5, 6)$.

• More generally, using this type of logic, we can demonstrate that the distribution of random coefficients for the model below is identified:

$$u_i(j, c) = \sum_{j'=1}^{J} \beta_{i,j'} 1\{j = j'\} + c \tag{3}$$

• It is also possible to prove that if (i) there is panel data on individual decisions and (ii) individual preferences remain fixed, then the distribution of preferences is identified (up to monotonic transformations of the utility function).

• See Bajari, Fox and Ryan for a proof.
