Bayesian Estimation of Discrete Games of Complete Information

Sridhar Narayanan∗

May 30, 2011

Discrete games of complete information have been used to analyze a variety of contexts such as market entry, technology adoption and peer effects. They are extensions of discrete choice models, where payoffs of each player are dependent on actions of other players, and each outcome is modeled as a Nash equilibrium of a game, where the players share common knowledge about the payoffs of all the players in the game. An issue with such games is that they typically have multiple equilibria, leading to the absence of a one-to-one mapping between parameters and outcomes. Theory typically has little to say about equilibrium selection in these games. Researchers have therefore had to make simplifying assumptions, either analyzing outcomes that do not have multiplicity, or making ad hoc assumptions about equilibrium selection. Another approach has been to use a bounds approach to set identify rather than point identify the parameters. A third approach has been to empirically estimate the equilibrium selection rule.

In this paper, we take a Bayesian MCMC approach to estimate the parameters of the payoff functions in such games. Instead of making ad hoc assumptions on equilibrium selection, we specify a prior over the possible equilibria, reflecting the analyst’s uncertainty about equilibrium selection, and find posterior estimates for the parameters that account for this uncertainty. We develop a sampler using the reversible jump algorithm to navigate the parameter space corresponding to multiple equilibria and obtain posterior draws whose marginal distributions are potentially multi-modal. When the equilibria are not identified, our approach goes beyond the bounds approach by providing posterior distributions of parameters, which may be important given that there are likely regions of low density for the parameters within the bounds. When data allow us to identify the equilibrium, our approach generates posterior estimates of the probability of specific equilibria, jointly with the estimates for the parameters. Our approach can also be cast in a hierarchical framework, allowing not just for heterogeneity in parameters, but also in equilibrium selection. Thus, it complements and extends the existing literature on dealing with multiplicity in discrete games.

We first demonstrate the methodology using simulated data, exploring it in depth. We then present two empirical applications, one in the context of joint consumption, using a dataset of casino visit decisions by married couples, and the second in the context of market entry by competing chains in the retail stationery market. We show the importance of accounting for multiple equilibria in these applications, and demonstrate how inferences can be distorted by making the typically used equilibrium selection assumptions. Our applications show that it is important for empirical researchers to take the issue of multiplicity of equilibria seriously, and that taking an empirical approach to the issue, such as the one we have demonstrated, can be very useful.

Keywords: Discrete games, multiple equilibria, Bayesian estimation, Markov Chain Monte Carlo methods, reversible jump algorithm

∗Graduate School of Business, Stanford University. Email: [email protected]. I thank Wes Hartmann for useful discussions, and the participants of the seminar at Goethe University and the Marketing Science Conference 2010 for useful comments. All errors are my own.

1 Introduction

In a number of situations, agents’ payoffs from their actions are not independent of actions of other agents. For instance, the profits that a firm would earn from entering a market would depend on the entry decisions of its competitors, or the benefits to adopting a technology or a new product with network effects depend on adoption decisions of others. Discrete games have been applied to several such contexts where decisions are discrete or can be discretized. Discrete games of complete information were first applied by Bresnahan and Reiss (1990) and Berry (1992) in market entry contexts. The actions of agents are modeled as the Nash equilibrium outcomes of a static game where all agents are fully informed about the payoffs of all other agents. Subsequently, such games have been applied to product quality choice (Mazzeo, 2002), pricing strategy choice (Zhu, Singh, and Manuszak, 2009) and joint consumption (Hartmann, 2010) amongst others. A specific issue with such games is the absence of a one-to-one mapping between parameters and outcomes - a problem of multiplicity of equilibria. The extant literature has proposed ways to deal with multiplicity, ranging from the modeling of outcomes that are unique even in the presence of multiplicity (Bresnahan and Reiss, 1990; Berry, 1992), specification of a sequential rather than simultaneous move game (Berry, 1992), assuming an ad hoc equilibrium selection rule (Hartmann, 2010), randomizing the equilibria (Soetevent and Kooreman, 2007), obtaining the identified set estimates rather than unidentified point estimates of parameters (Ciliberto and Tamer, 2009) and empirically estimating the equilibrium selection rule (Bajari, Hong, and Ryan, 2010). In this paper, we propose an alternative hierarchical Bayesian approach to deal with this issue of multiplicity, in which the uncertainty of the analyst over the multiple equilibria is accounted for and posterior parameter estimates are obtained that reflect this uncertainty.

The problem of multiple equilibria was explored in detail by Bresnahan and Reiss (1990, 1991) and Berry (1992) in their seminal studies of market entry. However, the idea itself had been pointed out earlier (for instance, Heckman 1978). The main issue with estimation in the presence of multiple equilibria is that for a given set of parameters, observable covariates and unobservable variables, there is the potential for more than one equilibrium outcome. This causes the likelihood to be ill-defined because the probabilities of the potential outcomes for a particular observation add up to a value greater than 1. Such an econometric model is termed incoherent (Tamer, 2003) and cannot be directly estimated using a maximum likelihood procedure without simplifying assumptions. The literature on discrete games has made various simplifying assumptions, but these assumptions either reduce the scope of the problems that can be studied (e.g. the study of the unique number of entrants rather than their identities in Bresnahan and Reiss (1990)) or can lead to inconsistency of estimates if the assumption is invalid (e.g. the assumed sequence of entry in Berry 1992, or the assumption of equilibrium selection in Hartmann 2010). An alternative approach has been to focus on estimating the bounds for parameters rather than point estimates, since the bounds are uniquely determined for a given set of outcomes in the presence of multiplicity (Tamer 2003; Ciliberto and Tamer 2009). While this approach has its appeal, one issue is that little is known about where the parameters lie within these bounds, reducing the utility of the estimates for any counterfactual analysis. A third approach has been to take an empirical approach to the problem of multiplicity, estimating the equilibrium selection rule (Bajari, Hong, and Ryan, 2010). However, this approach either requires the researcher to observe a set of variables that affect one player’s payoffs but can be excluded from the payoff functions of other players, or relies on an identification at infinity argument, where there are observations with sufficiently large values of covariates. In practice, it may be difficult to find such excluded variables in some contexts, or observations that meet the requirements for identification at infinity. For a detailed review of the empirical literature on discrete games, the reader is referred to Ellickson and Misra (2010).

We develop a hierarchical Bayesian approach to estimate parameters of the payoff functions of the players in the presence of multiple equilibria. Essentially, this approach takes a Bayesian model selection route to addressing the problem of multiplicity, by conditioning the model on equilibrium selection. Conditional on equilibrium selection, the likelihood is well-defined, and hence the parameters of this conditional model can be estimated. Thus, the problem can be considered to be one where there are multiple (conditional on equilibrium selection) models, over which the analyst has uncertainty. We develop a sampling algorithm for this problem, using a reversible jump MCMC algorithm to sample the parameters over the multiple models corresponding to the multiple equilibria jointly with indicators for the models themselves. This procedure generates draws from the posterior distribution of parameters across the multiple equilibria, reflecting our uncertainty about equilibrium selection. The procedure has several appealing features. Compared to an approach that estimates bounds on parameters (Ciliberto and Tamer, 2009), this approach gives us posterior distributions of parameters, and therefore tells us the regions within the bounds with high posterior probability of the parameters. It also nests within it an equilibrium selection rule of the kind proposed by Bajari, Hong, and Ryan (2010), generating posterior estimates of the selected equilibrium in the data, provided the equilibrium is identified. In cases where the equilibrium is not identified, the parameter estimates reflect the analyst’s prior uncertainty about equilibrium selection. It also nests within it equilibrium selection rules such as Pareto optimality (Hartmann, 2010). Thus, the proposed method provides a practical tool to deal with multiple equilibria in the estimation of discrete games, nesting within it many of the approaches used in the extant literature.

We demonstrate our proposed algorithm first using simulated data. We show that it is able to recover parameters of a discrete game with more than two players with a high degree of accuracy. We further demonstrate through these simulations the downsides of assuming ad hoc equilibrium selection mechanisms. We then apply the methodology to a social interaction game involving casino visits by married couples. We estimate the parameters of the model, and demonstrate how the parameter estimates as well as estimates of counterfactual simulations are biased when the presence of multiple equilibria is ignored or dealt with using an ad hoc equilibrium selection rule. We finally present an application to a competitive market entry game in the retail stationery market. Our estimates show that the commonly assumed equilibrium selection rule has low posterior probability in this market, raising questions about the assumption for other markets as well.

The rest of the paper is organized as follows. We first set up an illustrative model to demonstrate the issue of multiplicity for two different contexts - one where players’ payoffs are positively affected by the presence of other players, and another where they are negatively affected. We briefly discuss existing approaches to estimating discrete games of complete information, including approaches that have been used to deal with multiplicity of equilibria. Next, we explain in detail our proposed Bayesian method to estimate the parameters of the conditional model and model indicators using a reversible jump MCMC sampler. We discuss conceptual issues related to the methodology, as well as issues related to implementation. We next demonstrate our methodology using simulated data, and then present our empirical applications. We discuss the results of the empirical analysis, pointing in particular to the consequences of assumptions on equilibrium selection. We finally conclude.

2 Multiple equilibria in discrete games

We first set up an illustrative model for a market entry game, modifying the discussion for a social interactions game subsequently. Assume that there are a total of $i = 1, \ldots, I$ players in the game, and $t = 1, \ldots, T$ outcomes observed for these players. In a market entry context, the players are firms considering entry in the $T$ markets, and the outcomes represent the market structure in each market, i.e. which of the firms choose to enter and which do not. Let the payoff function for firm $i$ in market $t$ be the following if it chooses to enter the market.

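$$\pi_{it} = \alpha_i + X_{it}\beta - \gamma \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 0\Big) - \gamma\delta \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 1\Big) - \gamma\delta^2 \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 2\Big) - \ldots + \varepsilon_{it} \tag{1}$$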

Here, $\alpha_i$ is a player-specific intercept and represents the intrinsic profitability of the firm. $X_{it}$ is a vector of exogenous covariates, representing factors like the population size and other demographic factors of market $t$, and $\beta$ is a vector of coefficients for these covariates. These are followed by a set of terms that represent the effect of the presence of competitors in market $t$. $y_{it}$ represents the decision of firm $i$ to enter market $t$ and takes the value 1 if the firm chooses to enter and 0 otherwise. The first of the competitive effect terms thus captures the impact on firm $i$’s profits if it has one competitor in the market, and subsequent terms capture the effect of additional competitors. The parameters $\gamma$ and $\delta$ are assumed to be positive, and thus the competitive effect is assumed to be negative. Finally, there is an additively separable error $\varepsilon_{it}$, which is unobservable to the econometrician but is observed by all the firms. This may represent unobserved fixed costs of entering the market, which are firm specific. The common knowledge of the errors amongst all players of the game makes this a complete information game. The specification laid out above is similar to the one in Hartmann (2010) and captures the competitive effect in a parsimonious way.¹

Each firm considers its own entry decision, taking into account the optimal decisions of its competitors. It chooses to enter the market if it expects positive profits, and chooses to stay out of the market otherwise. The outcome is thus a Nash equilibrium of this game, with an equilibrium in pure strategies guaranteed with the restrictions on parameters that we have imposed (non-negative $\gamma$ and $\delta$).

As has been pointed out in the literature, such a game has multiple equilibria. To illustrate this point, we consider the two-player case without covariates. The profit functions of the two firms are then given by

$$\pi_{it} = \alpha_i - \gamma \cdot y_{-i,t} + \varepsilon_{it}, \quad i \in \{1, 2\} \tag{2}$$

where $y_{-i,t}$ is the entry decision of the competing firm.

Each firm enters the market if it expects positive profits. Let us first consider the case where neither firm enters market $t$. Thus, both $y_{1t}$ and $y_{2t}$ are 0 in this case. The profits for the firms in this situation are given by

$$\begin{aligned} \pi_{1t} &= \alpha_1 + \varepsilon_{1t} < 0 \\ \pi_{2t} &= \alpha_2 + \varepsilon_{2t} < 0 \end{aligned} \tag{3}$$

Thus, for neither firm to enter

$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 \\ \varepsilon_{2t} &< -\alpha_2 \end{aligned} \tag{4}$$

Firm 1 would enter and firm 2 would not enter the market if the former expected positive profits and the latter did not. In this case $y_{1t} = 1$ and $y_{2t} = 0$. Thus, the equilibrium conditions are given by

$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 \\ \varepsilon_{2t} &< -\alpha_2 + \gamma \end{aligned} \tag{5}$$

and firm 2 would enter the market and firm 1 would not if

$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 + \gamma \\ \varepsilon_{2t} &> -\alpha_2 \end{aligned} \tag{6}$$

Both firms would enter the market if both expected positive profits and this corresponds to

$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 + \gamma \\ \varepsilon_{2t} &> -\alpha_2 + \gamma \end{aligned} \tag{7}$$

¹ It is necessary to give structure to the competitive effect, particularly for a small number of players, since there are limited degrees of freedom available. For instance, in a two-player game, one can estimate at most three parameters in addition to the coefficients for exogenous covariates - for instance, two firm-specific intercepts and a competitive effect parameter. In a three-player game, one can estimate at most seven parameters, and hence we would not be able to estimate the most general competitive effect even in this case, where there is, for instance, a different effect of each firm on each of its competitors, in addition to firm-specific intercepts. As the number of players increases, the degrees of freedom increase exponentially, giving more flexibility in specifying the competitive effects.

The multiple equilibria in this case result from the fact that the conditions in equations 5 and 6 are not mutually exclusive. The equilibrium conditions are depicted graphically in figure 1. Consider panel A of this figure. There are nine regions depicted in this panel, which we shall refer to as cells. These cells are labeled I through IX and are defined by the various equilibrium conditions in equations 4 through 7. Now consider panel B of the figure. Cell I, which corresponds to neither firm entering (i.e. satisfying the equilibrium condition in 4), is shaded. Similarly, panels C, D and E are graphical representations of the equilibrium conditions in 5, 6 and 7 respectively. Note from panels C and D that they share a cell - cell V, i.e. the conditions for both equilibria are satisfied in this cell. This is the region of multiple equilibria, highlighted in panel F of the figure. This cell is consistent with two equilibria - either firm 1 alone enters, or firm 2 alone enters.
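The multiplicity in the middle cell can be checked numerically. The following is a minimal sketch, not part of the estimation procedure: the function name `pure_nash_entry` and the parameter values ($\alpha_1 = \alpha_2 = 0$, $\gamma = 1$) are illustrative assumptions. It tests every action profile of the two-firm game in equation 2 against the requirement that entrants earn positive profits and non-entrants would earn negative profits.

```python
def pure_nash_entry(alpha, gamma, eps):
    """Return all pure-strategy Nash equilibria (y1, y2) of the two-firm entry game.

    Entering pays alpha[i] - gamma * y_rival + eps[i]; staying out pays 0.
    """
    equilibria = []
    for y1 in (0, 1):
        for y2 in (0, 1):
            y = (y1, y2)
            stable = True
            for i in (0, 1):
                profit = alpha[i] - gamma * y[1 - i] + eps[i]
                # An entrant needs positive profit; a non-entrant needs negative profit.
                if (y[i] == 1 and profit <= 0) or (y[i] == 0 and profit >= 0):
                    stable = False
            if stable:
                equilibria.append(y)
    return equilibria


# Illustrative (assumed) parameter values: alpha = (0, 0), gamma = 1.
alpha, gamma = (0.0, 0.0), 1.0
print(pure_nash_entry(alpha, gamma, (-0.5, -0.5)))  # [(0, 0)]
print(pure_nash_entry(alpha, gamma, (1.5, 0.5)))    # [(1, 0)]
print(pure_nash_entry(alpha, gamma, (0.5, 0.5)))    # [(0, 1), (1, 0)]
print(pure_nash_entry(alpha, gamma, (1.5, 1.5)))    # [(1, 1)]
```

Only the error draw with both components between $-\alpha_i$ and $-\alpha_i + \gamma$ returns two equilibria, which is the overlap between the two single-entrant outcomes described above.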

Discrete games can also be set up to model social interactions, where players benefit from joint consumption of a product. A slightly modified version of the model used for analyzing market entry could be used for this context, although the equilibrium conditions and the nature of multiplicity of equilibria in this case are distinctly different. Let the payoffs of the players be given by

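$$\pi_{it} = \alpha_i + X_{it}\beta + \gamma \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 0\Big) + \gamma\delta \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 1\Big) + \gamma\delta^2 \cdot \mathbf{1}\Big(\sum_{j \neq i} y_{jt} > 2\Big) + \ldots + \varepsilon_{it} \tag{8}$$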

Note that the main difference between the payoffs in this social interaction game and those in the market entry game in equation 1 is the sign of the interaction terms. In this case, consumption by another player increases the payoffs from consumption. Thus, for $\gamma > 0$ and $\delta > 0$, we are guaranteed the existence of an equilibrium in pure strategies for any realization of the errors $\varepsilon_{it}$ and any values of the covariates and the parameters.

The equilibrium conditions for a two-player game with social interactions can be derived in a similar fashion as for the market entry game. Assuming there are no other covariates in the model, the payoffs are given by

$$\pi_{it} = \alpha_i + \gamma \cdot y_{-i,t} + \varepsilon_{it} \tag{9}$$

where $y_{-i,t}$ is the consumption decision of the other player.

Neither player consumes if both expect negative payoffs from consumption. The conditions for this equilibrium are

$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 \\ \varepsilon_{2t} &< -\alpha_2 \end{aligned} \tag{10}$$

If player 1 consumes and player 2 does not, the expected payoff of the former is positive and that of the latter is negative. Hence, the conditions for this equilibrium are

$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 \\ \varepsilon_{2t} &< -\alpha_2 - \gamma \end{aligned} \tag{11}$$

those for player 2 consuming alone are

$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 - \gamma \\ \varepsilon_{2t} &> -\alpha_2 \end{aligned} \tag{12}$$

and those for both players consuming jointly are

$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 - \gamma \\ \varepsilon_{2t} &> -\alpha_2 - \gamma \end{aligned} \tag{13}$$

The change in sign of the interaction effect in the payoff function has a non-trivial impact on the nature of the multiplicity of equilibria. In particular, the overlap now is between the equilibrium where neither player consumes and the one where both players consume. This is again best seen graphically. Figure 2 depicts the equilibrium conditions for the various outcomes. Panel A shows the nine regions defined by the equilibrium conditions in 10 through 13. Panel B of this figure shows us the conditions for the equilibrium where neither player consumes - this corresponds to cells I, II, IV and V. Panels C and D show the conditions for player 1 consuming alone (Cell III) and player 2 consuming alone (Cell VII) respectively. Panel E shows the conditions for joint consumption by both players (Cells V, VI, VIII and IX). As can be seen in Panel F, multiplicity of equilibria arises from the fact that Cell V is consistent with two equilibria - neither player consuming and both of them consuming.
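The overlap in Cell V is also what makes the likelihood ill-defined. The sketch below is an illustration under an assumption that is not part of the model above, namely that the two errors are independent standard normal. It computes the probability that each outcome's equilibrium conditions hold, for either type of game, and shows that these probabilities sum to one plus the probability of the overlapping cell. The names `norm_cdf` and `naive_outcome_probs` are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal CDF (errors assumed i.i.d. N(0, 1) for this sketch)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def naive_outcome_probs(alpha, gamma, social):
    """Probability that each outcome's equilibrium conditions are satisfied.

    Returns (probs, overlap), where overlap is the probability of the cell
    that is consistent with two equilibria.
    """
    if social:
        lo = [norm_cdf(-a - gamma) for a in alpha]  # P(eps_i < -alpha_i - gamma)
        hi = [norm_cdf(-a) for a in alpha]          # P(eps_i < -alpha_i)
        probs = {
            (0, 0): hi[0] * hi[1],
            (1, 0): (1 - hi[0]) * lo[1],
            (0, 1): lo[0] * (1 - hi[1]),
            (1, 1): (1 - lo[0]) * (1 - lo[1]),
        }
    else:
        lo = [norm_cdf(-a) for a in alpha]          # P(eps_i < -alpha_i)
        hi = [norm_cdf(-a + gamma) for a in alpha]  # P(eps_i < -alpha_i + gamma)
        probs = {
            (0, 0): lo[0] * lo[1],
            (1, 0): (1 - lo[0]) * hi[1],
            (0, 1): hi[0] * (1 - lo[1]),
            (1, 1): (1 - hi[0]) * (1 - hi[1]),
        }
    overlap = (hi[0] - lo[0]) * (hi[1] - lo[1])
    return probs, overlap


for social in (False, True):
    probs, overlap = naive_outcome_probs((0.0, 0.0), 1.0, social)
    print(social, sum(probs.values()), 1.0 + overlap)
```

For both games the two printed numbers coincide, so the excess over one is exactly the mass of the cell with two equilibria. Any equilibrium selection assumption amounts to deciding how that mass is split between the two outcomes that share it.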

With more than two players in the game, the issue of multiplicity remains and the number of cells with multiple equilibria increases. While it is somewhat tedious to work out the equilibrium conditions for a large number of players by hand, it is straightforward to enumerate the equilibria using a computer. This is because, irrespective of the values of the parameters, the cells that correspond to more than one equilibrium are the same for a given number of players and a given type of game (i.e. market entry or social interactions). The addition of covariates to the model also does not fundamentally change the equilibrium conditions and the cells corresponding to the different equilibria. For instance, the equilibrium conditions for a market entry game are modified to the following. For neither firm to enter the market, the conditions are

$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 - X_{1t}\beta \\ \varepsilon_{2t} &< -\alpha_2 - X_{2t}\beta \end{aligned} \tag{14}$$

Firm 1 would enter and firm 2 would not enter the market if the former expected positive profits and the latter did not. In this case $y_{1t} = 1$ and $y_{2t} = 0$. Thus, the equilibrium conditions are given by

$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 - X_{1t}\beta \\ \varepsilon_{2t} &< -\alpha_2 - X_{2t}\beta + \gamma \end{aligned} \tag{15}$$

and firm 2 would enter the market and firm 1 would not if

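$$\begin{aligned} \varepsilon_{1t} &< -\alpha_1 - X_{1t}\beta + \gamma \\ \varepsilon_{2t} &> -\alpha_2 - X_{2t}\beta \end{aligned} \tag{16}$$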

Both firms would enter the market if both expected positive profits and this corresponds to

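$$\begin{aligned} \varepsilon_{1t} &> -\alpha_1 - X_{1t}\beta + \gamma \\ \varepsilon_{2t} &> -\alpha_2 - X_{2t}\beta + \gamma \end{aligned} \tag{17}$$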

It can be seen from these equilibrium conditions that covariates merely shift the boundaries of the cells corresponding to the various outcomes.
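Enumerating equilibria by computer for more than two players can be sketched as a brute-force search over all $2^I$ action profiles. This is one possible implementation written for illustration, not the routine used in the paper; the names `interaction` and `pure_nash`, the argument `base` (standing for $\alpha_i + X_{it}\beta + \varepsilon_{it}$) and the numerical values are assumptions. The interaction term follows the geometric structure of equations 1 and 8.

```python
from itertools import product


def interaction(gamma, delta, n_others):
    """gamma * (1 + delta + ... + delta**(n_others - 1)); zero if no other player acts."""
    return sum(gamma * delta**k for k in range(n_others))


def pure_nash(base, gamma, delta, sign):
    """Enumerate pure-strategy Nash equilibria of an I-player discrete game.

    base[i] stands for alpha_i + X_it * beta + eps_it.
    sign = -1 gives the market entry game, sign = +1 the social interactions game.
    """
    n = len(base)
    found = []
    for y in product((0, 1), repeat=n):
        ok = True
        for i in range(n):
            payoff = base[i] + sign * interaction(gamma, delta, sum(y) - y[i])
            # Acting requires a positive payoff; not acting requires a negative one.
            if (y[i] == 1 and payoff <= 0) or (y[i] == 0 and payoff >= 0):
                ok = False
                break
        if ok:
            found.append(y)
    return found


# Three-player entry game: three equilibria, each with exactly one entrant.
print(pure_nash((0.5, 0.5, 0.5), gamma=1.0, delta=0.5, sign=-1))
# [(0, 0, 1), (0, 1, 0), (1, 0, 0)]

# Three-player social game: nobody consumes, or everybody consumes.
print(pure_nash((-0.5, -0.5, -0.5), gamma=1.0, delta=0.5, sign=+1))
# [(0, 0, 0), (1, 1, 1)]
```

The cost grows as $I \cdot 2^I$ per error draw, which is manageable for the small numbers of players typical in these applications.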

3 Estimating discrete games of complete information

3.1 Existing approaches

Figure 1: Equilibrium Conditions - Discrete Game of Market Entry

Figure 2: Equilibrium Conditions - Discrete Game of Social Interactions

There are several approaches that the literature has used to estimate complete information discrete games of the type laid out above. These approaches vary in how they deal with multiplicity of equilibria. The early literature on market entry (Bresnahan and Reiss, 1990, 1991; Berry, 1992) recognized that while there are multiple equilibria in market entry games, the number of entrants is unique for any given realization of the errors and any set of values of the covariates and parameters. For the two-player game, for instance, it is clear from figure 1 that the multiplicity is between the equilibrium where firm 1 enters alone and the one where firm 2 enters alone (cell V). Thus, if one were interested in modeling the number of firms that entered a market rather than their identities, there would be no problem of multiplicity. Estimation proceeds by specifying the distributions of the errors, which makes it possible to develop a maximum likelihood procedure to find parameter estimates.
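A likelihood over the number of entrants can be sketched for the two-firm game without covariates. The block below assumes independent standard normal errors, which is a simplification chosen for illustration and not the specification of the cited studies; `entrant_count_probs` and `log_likelihood` are hypothetical names. The probability of zero entrants comes from condition 4, that of two entrants from condition 7, and the single-entrant probability is the remainder, which absorbs the multiplicity cell.

```python
from math import erf, log, sqrt


def norm_cdf(x):
    """Standard normal CDF (errors assumed i.i.d. N(0, 1) for this sketch)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def entrant_count_probs(alpha, gamma):
    """Probabilities of observing 0, 1 or 2 entrants in the two-firm entry game."""
    a1, a2 = alpha
    p0 = norm_cdf(-a1) * norm_cdf(-a2)                              # condition (4)
    p2 = (1 - norm_cdf(-a1 + gamma)) * (1 - norm_cdf(-a2 + gamma))  # condition (7)
    p1 = 1.0 - p0 - p2  # either firm alone, including the multiplicity cell
    return {0: p0, 1: p1, 2: p2}


def log_likelihood(counts, alpha, gamma):
    """Log-likelihood of a list of observed entrant counts, one per market."""
    probs = entrant_count_probs(alpha, gamma)
    return sum(log(probs[n]) for n in counts)


observed = [0, 1, 1, 2, 1, 0]  # made-up data for illustration only
print(entrant_count_probs((0.0, 0.0), 1.0))
print(log_likelihood(observed, (0.0, 0.0), 1.0))
```

Because the three probabilities sum to one by construction, this likelihood is well-defined, at the cost of not using information on which firm entered.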

Another approach has been to make an ad hoc assumption about equilibrium selection. For instance, Hartmann (2010) estimates a game of social interactions to model the decisions of friends playing golf together. In such a game, the multiplicity is between no consumption by either player and joint consumption by both players in a two-player game. This approach relies on theoretical arguments to posit that agents choose the Pareto optimal outcome of both players choosing to consume over the Pareto dominated outcome of neither consuming. Thus, cell V in figure 2 is assumed to correspond entirely with the outcome of both players choosing to consume. With distributional assumptions on the errors, the likelihood of this model is now well-defined. A maintained assumption in this case is that the players are strictly better off by consuming the focal good than not consuming it. This could be violated, for instance, if the outside option involves consuming other goods, whose joint consumption gives greater payoffs than joint consumption of the focal good. For instance, if instead of playing golf together, the players could go to the movies together, it is not clear if the assumption on selection of the Pareto dominant equilibrium could be sustained.

A third approach is to specify the sequence in which players move in a sequential rather than simultaneous move game. For instance, Berry (1992) estimates a sequential move discrete game in the airline market by making the assumption that the more profitable player moves first. Such an assumption gets rid of the issue of multiplicity, though it depends on observing the order in which firms entered. In many empirical applications of discrete games, this order is unobserved.

The recent literature on discrete games has focused on taking an empirical approach to the issue of multiplicity. The methodology proposed by Ciliberto and Tamer (2009), for instance, relies on the fact that equilibrium conditions of the kind in equations 4 through 7 define the necessary conditions that need to be met for each of the outcomes, though multiplicity implies that some of those conditions are not sufficient to define an outcome. These necessary conditions can be used to set up moment inequalities that allow for set identification of the parameters even when the parameters are not point identified. This procedure identifies upper and lower bounds on the parameters, which are obtained through two sets of moment inequalities for each parameter. These two sets of inequalities correspond respectively to regions of the error space leading to unique equilibria, and those including the regions of multiple equilibria. The predicted choice probabilities for various outcomes corresponding to these regions of the error space are matched with the empirical choice probabilities to obtain estimates of the bounds on the parameters.
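The two sets of regions can be made concrete for the two-firm entry game. The sketch below is a simplified illustration of the idea of bounding outcome probabilities, not the estimator of Ciliberto and Tamer (2009); it assumes independent standard normal errors, and `outcome_probability_bounds` is a hypothetical name. For each outcome, the lower bound uses only the region where that outcome is the unique equilibrium, and the upper bound adds the region of multiple equilibria.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal CDF (errors assumed i.i.d. N(0, 1) for this sketch)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def outcome_probability_bounds(alpha, gamma):
    """(lower, upper) bounds on each outcome probability in the two-firm entry game."""
    lo = [norm_cdf(-a) for a in alpha]          # P(eps_i < -alpha_i)
    hi = [norm_cdf(-a + gamma) for a in alpha]  # P(eps_i < -alpha_i + gamma)
    cell_v = (hi[0] - lo[0]) * (hi[1] - lo[1])  # region with two equilibria
    p00 = lo[0] * lo[1]
    p11 = (1 - hi[0]) * (1 - hi[1])
    upper_10 = (1 - lo[0]) * hi[1]  # necessary condition (5), includes cell V
    upper_01 = hi[0] * (1 - lo[1])  # necessary condition (6), includes cell V
    return {
        (0, 0): (p00, p00),
        (1, 0): (upper_10 - cell_v, upper_10),
        (0, 1): (upper_01 - cell_v, upper_01),
        (1, 1): (p11, p11),
    }


for outcome, (lower, upper) in outcome_probability_bounds((0.0, 0.0), 1.0).items():
    print(outcome, round(lower, 4), round(upper, 4))
```

A parameter vector would be retained in the identified set when the observed outcome frequencies fall between these bounds, up to sampling error. The outcomes with unique equilibria have coinciding bounds, so all the slack comes from the two single-entrant outcomes.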

Another empirical approach to dealing with the issue of multiplicity is to estimate the equilibrium selection probabilities simultaneously with other model parameters. In general, the equilibrium selection probabilities are not identified if we merely observe the actions of the agents. However, in the presence of exclusion restrictions, they are identified. This is the approach taken by Bajari, Hong, and Ryan (2010), where a set of excluded covariates, i.e. covariates that affect the payoffs of one player but not the others, is used to identify the probabilities for the various equilibria as well as the payoff parameters simultaneously. An example of excluded variables used in the literature is distance from the headquarters and warehouses of retailers in a market entry game - the rationale being that the distance from its own headquarters or warehouse to a focal market would impact the profits of a retailer in the market (through transportation and management costs that depend on distance) but would not directly affect the profits of its competitors (except through the entry decision of the retailer). Another example, in the situation of joint consumption of a product by friends, is targeted promotions that are offered to and can be availed of by one player but not by the other.²

3.2 Identification of equilibrium using exclusion restrictions

In this sub-section, we develop the intuition for how exclusion restrictions can be used to identify the equilibrium selection probabilities. Assume a two-player competitive game of the type depicted in Figure 1. Consider without loss of generality the case where variable $X_1$ affects the payoffs of firm 1 but not those of firm 2, and variable $X_2$ affects the payoffs of firm 2 but not those of firm 1. The coefficients of $X_1$ and $X_2$ are respectively $\beta_1$ and $\beta_2$, which are assumed positive without loss of generality. Figures 3 and 4 graphically depict the effect of a shift in covariates on the outcome probabilities for the two equilibria for such a game - the first (equilibrium 1) where Cell V is entirely allocated to firm 1’s entry, and the second (equilibrium 2) where Cell V is entirely allocated to firm 2’s entry. Panel A in these two figures depicts the baseline situation, i.e. at given levels of the covariates and at some fixed set of values of the parameters.

Now consider Panel B in the two figures, where the covariate for firm 1, $X_1$, is shifted to a lower value $X_1'$. The shift in the covariate shifts the cell boundaries to the right in the case of both equilibria. This shifts the probabilities of the various outcomes - in particular, it increases the probability that neither firm enters (Cell I expands) and decreases the probability that both firms enter the market (Cell IX contracts). Note that the effect of the shift in covariate on these two outcomes is identical for the case of both equilibria. The shift in covariate also shifts the probabilities for the remaining two outcomes - firm 1 entering alone and firm 2 entering alone. In the case of both equilibrium 1 and equilibrium 2, the probability of firm 1 entering the market alone decreases and that for firm 2 entering alone increases. However, the change in probabilities is different for the two equilibria as long as the errors are not uniformly distributed. This is because the specific areas that shift from being consistent with firm 1 entering alone to firm 2 entering alone are different, and these two areas would have different densities of the distribution of errors for any non-uniform distribution. Thus, the shift in covariates causes different substitution patterns for the two equilibria. However, this alone is insufficient to identify the equilibrium from the data, since a given substitution pattern arising out of the shift of $X_1$ could be rationalized by not just the two equilibria but also different sets of parameters. Thus, for instance, a shift in shares consistent with the parameters $(\alpha, \beta_1, \gamma)$ in Panel B of Figure 4 could be potentially rationalized by a different set of parameters $(\alpha', \beta_1', \gamma')$ in Panel B of Figure 3, highlighting the fact that variation in this covariate alone cannot identify the equilibrium. We have a similar situation when we shift firm 2’s covariate $X_2$, as shown in Panel C of Figures 3 and 4. Another way to state this non-identification is that there are insufficient moments to identify both the equilibrium and the parameters.

² Bajari, Hong, and Ryan (2010) also discuss an identification at infinity approach to identify the equilibrium selection probabilities, which relies on observing local perturbations of covariates around values for which there is a unique equilibrium with probability approaching 1. Using an invariance assumption on the equilibrium selection rule, the probabilities of selecting the various equilibria are estimated using observations corresponding to other values of the covariates. In our Bayesian approach, we do not use such an approach and hence we do not discuss this in further detail.

Now consider the situation where we shift both covariates X1 and X2 simultaneously. This isdepicted in Panel D of the two figures. We see once again that the shift in the probabilities ofneither firm entering and both firms entering are identical for the two equilibria, and the shift inshares of the remaining two outcomes differ between the two equilibria. It can be seen from thefigure that the changes in shares that result from the simultaneous shifting of the two covariatesis not just a mere addition of the changes that take place when the two covariates are shiftedseparately. For instance, note that a small rectangular area at the bottom-left corner of CellV becomes a part of the area for the new Cell 1 when the two covariates are shifted, and thuscorrespond to the outcome of neither firm entering the market. This area is not a part of theexpanded Cell I for either the case where X1 alone shifts (Panel B) or where X2 alone shifts(Panel C). Similarly, the changes in the probabilities of the other outcomes in the case of bothcovariates shifting are not mere summations of the changes for each covariate shifting alone. Thiscan be thought of as an interaction effect of the two covariates on the outcomes. In the case ofthe two outcomes that involve only one firm entering alone, this interaction effect is differentfor the two equilibria, since the regions involving this interaction effect are different for the twoequilibria, with different densities of the error distribution (unless the distribution is uniform).Note, however, that due to the exclusion restriction, there is no direct interaction effect of thetwo covariates on the payoffs of the two firms. We thus have an additional set of moments thatcannot be rationalized merely by a different set of coefficients. Thus, the equilibrium is identifiedwhen we observe the three different scenarios - where X1 alone shifts, X2 shifts and both variablesshift. 
While we have presented a situation, for the purpose of illustration, where one covariate shifts while the other remains unchanged, the discussion can be extended to other patterns of variation in the data. In the absence of exclusion restrictions, the interaction effect of the shift of both covariates could be rationalized by an additional set of parameters (e.g. the effect of $X_1$ on firm 2's payoffs and so on), and the equilibrium would be unidentified. Note also that the parameters and equilibria of discrete games of this nature are not identified non-parametrically, i.e. we have to impose assumptions on the distributions of the unobservables.

This discussion illustrates how exclusion restrictions identify the parameters as well as the equilibrium. It can be readily extended to other games, for instance games of joint consumption and games with a different number of participants. An important point to note is that the number of equilibria that can be identified depends on the number of moments that the exclusion restrictions make available for such identification: we cannot identify more equilibria than there are moments. This places restrictions on the nature of the empirical analysis that can be done. For instance, in a competitive game with just three players, there are hundreds of equilibria in pure strategies. Thus, one would need at least that many moments through the exclusion restrictions if one wished to identify which of these equilibria are played in a specific empirical context. Since in practice such a large number of moments is unlikely to be available, the empirical researcher would need to reduce the problem to a more feasible one. For instance, one could distinguish between a Pareto dominant equilibrium and the set of all Pareto dominated equilibria in this case. Such restrictions are in practice necessary for any game that involves more than two players, since the number of equilibria increases exponentially in the number of players.

3.3 Bayesian approach

Our proposed Bayesian approach to estimating discrete games of complete information starts by making distributional assumptions on the errors and specifying prior distributions for the parameters. Due to multiple equilibria, the likelihood is not well defined and hence a standard Markov Chain procedure such as a Metropolis Hastings algorithm is not feasible. However, conditional on a selected equilibrium, the likelihood is well-defined. The main idea behind our proposed methodology is to treat each selected equilibrium as a model, and to set up an estimation procedure that augments the parameter state space with a model state. Thus, the estimation routine navigates both parameter and model spaces, and generates joint posterior draws of parameters and model indicators. When the equilibrium is identified, for instance through the presence of excluded variables, the estimation routine generates posterior estimates of model parameters as well as model indicators. When the equilibrium is not identified, the posterior probabilities of the model indicators would be the same as the prior probabilities (up to simulation error), but the procedure is able to uncover the posterior distribution of parameters that spans the various models and thus reflects the analyst's (prior) uncertainty about the equilibria. This procedure is related to the bounds procedure of Ciliberto and Tamer (2009) in that it provides estimates reflecting the full set of equilibria rather than selecting one of them a priori. However, in place of the extremities of the set of parameter values, the proposed method obtains posterior distributions of the parameters and thus gives us information about where the posterior distributions of parameters have greater mass and where they do not. Like Bajari, Hong, and Ryan (2010), our procedure allows the probabilities corresponding to each equilibrium to be estimated when they are identified, but also allows for estimation of parameters when they are not identified. Further, we obtain parameter estimates corresponding to each of the equilibria, whereas the procedure in Bajari, Hong, and Ryan (2010) estimates one set of parameters.

Figure 3: Relationship of outcome probabilities with covariates - Equilibrium 1

Figure 4: Relationship of outcome probabilities with covariates - Equilibrium 2

We first discuss a Bayesian approach to estimating a discrete game with the use of an ad-hoc equilibrium selection rule, as has often been used in the literature. We then build on this to develop a methodology for dealing with multiplicity of equilibria.

3.3.1 Bayesian estimation with an ad-hoc equilibrium selection

To illustrate the proposed methodology, we first start with the case of an entry game with two players, and subsequently extend it to cases with a greater number of players and to games of social interaction. Let the payoff functions of the two players be given by

$$\pi_{it} = \alpha_i + X_{it}\beta - \gamma \cdot y_{-i,t} + \varepsilon_{it} \tag{18}$$

where the notation is the same as before. $\pi_{it}$ represents firm $i$'s profits in market $t$. $\alpha_i$ represents a firm specific intercept. The vector $X_{it}$ represents a set of exogenous covariates, which potentially includes a set of excluded variables. If a covariate is an excluded variable, it takes the value 0 for all observations for the player whose payoff it is excluded from. The coefficient for the covariates is $\beta$, which includes the coefficients for the excluded variables. The interaction between firms is captured by the coefficient $\gamma$. Finally, $\varepsilon_{it}$ is assumed to be an error that is uncorrelated across markets.

The conditions for entry are therefore

$$y_{it} = \begin{cases} 1 & \text{if } \pi_{it} > 0 \\ 0 & \text{if } \pi_{it} < 0 \end{cases} \tag{19}$$

Conditional on an equilibrium being selected, the model is a bivariate probit model. The equilibrium conditions are similar to those described earlier, except that covariates are included and the equilibrium selection rule is imposed. Note from Figure 1 that there are two possible equilibria for a region of the error space (shown as Cell V in the figure): one where firm 1 enters the market, and another where firm 2 enters the market. Take the case of the first equilibrium being selected, i.e. firm 1 enters but firm 2 does not when errors are in Cell V. This could result, for instance, from the true data generating process being one with sequential entry, with firm 1 making entry decisions before firm 2. In such a case, the regions of the error space corresponding to the four possible outcomes are as follows:

1. $(y_{1t}, y_{2t}) = (0, 0) \Rightarrow (\varepsilon_{1t}, \varepsilon_{2t}) \in$ Cell I

2. $(y_{1t}, y_{2t}) = (0, 1) \Rightarrow (\varepsilon_{1t}, \varepsilon_{2t}) \in$ Cells IV/VII/VIII

3. $(y_{1t}, y_{2t}) = (1, 0) \Rightarrow (\varepsilon_{1t}, \varepsilon_{2t}) \in$ Cells II/III/V/VI

4. $(y_{1t}, y_{2t}) = (1, 1) \Rightarrow (\varepsilon_{1t}, \varepsilon_{2t}) \in$ Cell IX

Note here that the boundaries of the cells would be modified from those in Figure 1 due to the presence of covariates in the model. Thus $\alpha_i$ would be replaced by $\alpha_i + X_{it}\beta$ in all the cell boundaries on both axes. For a given guess of parameters, the likelihood for a given observation is the sum of the probabilities of the errors being in the cells corresponding to that outcome. With a suitable distributional assumption on the errors, the likelihood can be evaluated. We assume that the unobservables are normally distributed with mean 0, i.e.

$$\varepsilon_{it} \sim N(0, 1) \tag{20}$$

The variance of the error is fixed to 1 for identification purposes.

In order to specify the likelihood for a particular observation, we need to find the probability of the unobservables being in one of the cells corresponding to the outcome for that observation. This probability is the sum of the probabilities for each of the cells for that outcome. The probability of the unobservables being in any given cell $j$ for the $t$th observation is given by

$$P_{jt} = \int_{A_{jt}} f(\varepsilon_t)\, d\varepsilon_t \tag{21}$$

where $\varepsilon_t$ is the vector of unobservables, $f(\varepsilon_t)$ is the distribution of $\varepsilon_t$ and $A_{jt}$ denotes the region of the error space for the $j$th cell (i.e. the boundaries of the cell, which vary with observation because of variation in the covariates $X_{it}$). Since we have assumed $\varepsilon_t$ to be normally distributed, this reduces to the following expression in our two-player example

$$P_{jt} = \int_{l_{j1t}}^{u_{j1t}} \int_{l_{j2t}}^{u_{j2t}} \phi(\varepsilon_{1t})\, \phi(\varepsilon_{2t})\, d\varepsilon_{1t}\, d\varepsilon_{2t} = \left[\Phi(u_{j1t}) - \Phi(l_{j1t})\right] \left[\Phi(u_{j2t}) - \Phi(l_{j2t})\right] \tag{22}$$

where $l_{jit}$ and $u_{jit}$ represent the lower and upper boundaries of the $j$th cell for $\varepsilon_{it}$, $\phi(\cdot)$ represents the normal density function and $\Phi(\cdot)$ represents the normal distribution function. The boundaries of the cells have been defined earlier. For instance, for Cell I, $l_{I1t} = l_{I2t} = -\infty$, $u_{I1t} = -\alpha_1 - X_{1t}\beta$ and $u_{I2t} = -\alpha_2 - X_{2t}\beta$. The boundaries of the other cells are similarly obtained.

The likelihood of the tth observation is then

\begin{align}
L_t(\theta; X_t, y_t) ={}& 1(y_{1t}=0, y_{2t}=0)\,(P_{I,t}) + 1(y_{1t}=0, y_{2t}=1)\,(P_{IV,t} + P_{VII,t} + P_{VIII,t}) \tag{23} \\
&+ 1(y_{1t}=1, y_{2t}=0)\,(P_{II,t} + P_{III,t} + P_{V,t} + P_{VI,t}) + 1(y_{1t}=1, y_{2t}=1)\,(P_{IX,t}) \tag{24}
\end{align}


where $1(\cdot)$ is an indicator function, taking the value 1 if the conditions in it are true and 0 otherwise, $y_t$ is the vector of decisions, $X_t$ is the matrix formed by stacking all covariate vectors ($X_{it}$), and $\theta$ is the vector of all parameters.

The likelihood of the data is then

$$L(\theta; X, y) = \prod_t L_t(\theta; X_t, y_t) \tag{25}$$

Here, $y$ is the stacked vector of outcomes, and $X$ is the stacked matrix of covariates across all observations.
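To make the construction in equations (22) to (25) concrete, the following is a minimal sketch, not the code used for the paper. It assumes a single scalar covariate per firm, the selection rule above (firm 1 enters in Cell V), and a cell numbering in which Cells I to IX run left to right along the $\varepsilon_1$ axis and bottom to top along the $\varepsilon_2$ axis; that numbering is inferred from the outcome list above, since Figure 1 is not reproduced here. The function names are hypothetical, and the log of equation (25) is used for numerical stability.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cell_prob(lo1, hi1, lo2, hi2):
    """Equation (22): probability of a rectangular cell under independent N(0,1) errors."""
    return (norm_cdf(hi1) - norm_cdf(lo1)) * (norm_cdf(hi2) - norm_cdf(lo2))

def outcome_probs(v1, v2, gamma):
    """Outcome probabilities when firm 1 enters alone in Cell V.

    v_i = alpha_i + X_it * beta is firm i's deterministic monopoly payoff.
    """
    inf = math.inf
    # Cut points on each axis: monopoly threshold -v_i and duopoly threshold -v_i + gamma.
    cuts1 = [-inf, -v1, -v1 + gamma, inf]
    cuts2 = [-inf, -v2, -v2 + gamma, inf]
    # Assumed numbering: columns index eps_1 (low to high), rows index eps_2 (low to high).
    names = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
    cells = {}
    for row in range(3):
        for col in range(3):
            cells[names[3 * row + col]] = cell_prob(
                cuts1[col], cuts1[col + 1], cuts2[row], cuts2[row + 1])
    return {
        (0, 0): cells["I"],
        (0, 1): cells["IV"] + cells["VII"] + cells["VIII"],
        (1, 0): cells["II"] + cells["III"] + cells["V"] + cells["VI"],
        (1, 1): cells["IX"],
    }

def log_likelihood(theta, data):
    """Log of equation (25); data is a list of (x1, x2, y1, y2) tuples."""
    alpha1, alpha2, beta, gamma = theta
    total = 0.0
    for x1, x2, y1, y2 in data:
        probs = outcome_probs(alpha1 + x1 * beta, alpha2 + x2 * beta, gamma)
        total += math.log(probs[(y1, y2)])
    return total
```

Because the nine cells partition the error space, the four outcome probabilities returned by `outcome_probs` sum to one for any parameter values with $\gamma > 0$.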

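To complete the model, we need to specify prior distributions for the parameters $\alpha_1$, $\alpha_2$, $\beta$ and $\gamma$. Since $\gamma$ is constrained to be positive, we reparametrize it as $\gamma = \exp(\tilde{\gamma})$. Thus,

$$\theta = \begin{pmatrix} \alpha_1 & \alpha_2 & \beta & \tilde{\gamma} \end{pmatrix} \tag{26}$$

Let the prior distribution for $\theta$ be

$$\theta \sim N(\mu, \Sigma) \tag{27}$$

The posterior distribution of parameters is then given by

$$f(\theta | X, y, \mu, \Sigma) \propto L(\theta; X, y)\, f(\theta | \mu, \Sigma) \tag{28}$$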

The parameters can be estimated in an MCMC procedure that utilizes the Metropolis-Hastings algorithm (Chib and Greenberg, 1995). This involves generating a sequence of draws from a candidate density and then modifying it through a rejection step to achieve the detailed balance condition. This sequence of draws converges to the posterior distribution of parameters, allowing us to do inference or conduct counterfactual experiments.
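As an illustration of the rejection step, here is a minimal random-walk Metropolis-Hastings sketch. It is generic rather than the sampler used in the paper: `log_post` stands for the log of the right-hand side of equation (28), and the toy target below (a standard normal) is only an assumed placeholder so that the example runs on its own. The function name, step size and seed are hypothetical choices.

```python
import math
import random

def random_walk_mh(log_post, theta0, step, n_draws, seed=0):
    """Random-walk Metropolis-Hastings with independent normal increments."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    draws, accepted = [], 0
    for _ in range(n_draws):
        cand = [t + rng.gauss(0.0, step) for t in theta]
        cand_lp = log_post(cand)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior ratio); 1 - random() lies in (0, 1], so log is safe.
        if math.log(1.0 - rng.random()) < cand_lp - current:
            theta, current = cand, cand_lp
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_draws

def toy_log_post(theta):
    """Placeholder target: log density of a standard normal, up to a constant."""
    return -0.5 * sum(t * t for t in theta)

draws, rate = random_walk_mh(toy_log_post, [3.0], step=1.0, n_draws=20000)
kept = [d[0] for d in draws[2000:]]          # discard burn-in draws
post_mean = sum(kept) / len(kept)
post_var = sum((x - post_mean) ** 2 for x in kept) / len(kept)
```

In an application, `toy_log_post` would be replaced by the sum of the log likelihood and the log prior density for $\theta$.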

We have discussed the estimation of a market entry discrete game with two players, with the ad-hoc equilibrium selection rule of firm 1 entering in the case of multiple equilibria (where the two equilibria are of either firm 1 entering the market alone, or firm 2 entering it alone). But the methodology can be easily extended to more players. For instance, in a three player entry game, there are a total of 6 cells that are consistent with two equilibria each, and 2 cells that are consistent with three equilibria each. Conditional on $X$ and $\varepsilon$ (i.e. within a cell), there are up to 3 equilibria. However, there are a total of 576 ($= 2^6 \cdot 3^2$) possible combinations of equilibria if we do not condition on $X$ and $\varepsilon$. We henceforth refer to each such combination of cell-by-cell equilibria as an equilibrium profile. Conditional on choosing an equilibrium profile, there is a unique assignment of each cell to an outcome, and hence the likelihood for such a game is well-specified. Given the likelihood, a Metropolis Hastings algorithm for simulation from the posterior distributions of parameters is relatively straightforward to implement.
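The count of equilibrium profiles can be checked by brute-force enumeration. The short sketch below only uses the cell counts stated in the text (6 cells with two equilibria, 2 cells with three); the variable names are hypothetical.

```python
from itertools import product

# Number of equilibria consistent with each multiple-equilibrium cell
# in the three-player entry game described in the text.
equilibria_per_cell = [2] * 6 + [3] * 2

# An equilibrium profile picks one equilibrium index for every such cell.
profiles = list(product(*[range(n) for n in equilibria_per_cell]))
n_profiles = len(profiles)
```

Each element of `profiles` is a tuple of eight indices, one per cell, and could serve as a model label in the sampler discussed later.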

The methodology can also be extended to other equilibrium selection mechanisms. For instance, the assumption in Berry (1992) is that in cases of multiple equilibria, the firms enter in the order of profitability. This can be justified on the basis of an underlying assumption of sequential entry, where the firm with the greatest expected profits in a market enters first, followed by the next most profitable firm and so on. Thus, the equilibrium selection mechanism selects the equilibrium that maximizes industry profits market by market.

Implementing the equilibrium selection rule described above requires a modification of the procedure we have described for a rule that allocates the entire cell with multiple equilibria to one of the outcomes. This is because within a cell, different regions would be allocated to different outcomes if we assumed that the more profitable firm entered. To see this, let us first discuss this in the context of the two-player market entry game described earlier and depicted in Figure 1. The region for multiple equilibria is Cell V. The two outcomes that this cell is consistent with involve either of the firms entering the market alone. Consider first the outcome of firm 1 entering, and consider for notational simplicity the case of no covariates. The equilibrium selection rule is that the more profitable firm enters the market in the case of multiple equilibria. Hence, the profits of firm 1 entering alone must be greater than the profits of firm 2 entering alone, given a realization of the errors. Thus,

$$\pi_1(y_1 = 1, y_2 = 0) > \pi_2(y_1 = 0, y_2 = 1) \tag{29}$$

This implies

$$\alpha_1 + \varepsilon_1 > \alpha_2 + \varepsilon_2 \tag{30}$$

Thus, the condition for firm 1 being the more profitable firm and hence the outcome of $(y_1 = 1, y_2 = 0)$ being picked over $(y_1 = 0, y_2 = 1)$ is

$$\varepsilon_2 < -\alpha_2 + \alpha_1 + \varepsilon_1 \tag{31}$$

This condition describes a straight line with intercept $(-\alpha_2 + \alpha_1)$ and a slope equal to 1, i.e. it is a 45 degree line intersecting the $\varepsilon_2$-axis at $(-\alpha_2 + \alpha_1)$. It is easy to see that this line bisects the cell in two, running from the lower left corner to the upper right corner. In the region under the line in Cell V, firm 1 enters the market and in the region above the line, firm 2 enters. This is shown graphically in Figure 5.

To estimate the model with this equilibrium selection rule (sequential entry in decreasing order of profitability), we need to ensure that the likelihoods corresponding to each outcome are correctly evaluated. The likelihood with this equilibrium selection rule is thus

$$\begin{aligned}
L_t(\theta; X_t, y_t) ={}& 1(y_{1t}=0, y_{2t}=0)\,(P_{I,t}) + 1(y_{1t}=0, y_{2t}=1)\left(P_{IV,t} + P_{VII,t} + P_{VIII,t} + P_{V,t,(0,1)}\right) \\
&+ 1(y_{1t}=1, y_{2t}=0)\left(P_{II,t} + P_{III,t} + P_{V,t,(1,0)} + P_{VI,t}\right) \\
&+ 1(y_{1t}=1, y_{2t}=1)\,(P_{IX,t})
\end{aligned} \tag{32}$$


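where $P_{V,t,(0,1)}$ is the probability of the errors being in the part of Cell V that is assigned to the equilibrium of firm 2 entering alone, and $P_{V,t,(1,0)}$ is the probability that the errors are in the remainder of Cell V.

$$\begin{aligned}
P_{V,t,(0,1)} &= \int_{-\alpha_1}^{-\alpha_1+\gamma} \left[ \int_{-\alpha_2+\alpha_1+\varepsilon_{1t}}^{-\alpha_2+\gamma} \phi(\varepsilon_{2t})\, d\varepsilon_{2t} \right] \phi(\varepsilon_{1t})\, d\varepsilon_{1t} \\
&= \Phi(-\alpha_2+\gamma) \left[\Phi(-\alpha_1+\gamma) - \Phi(-\alpha_1)\right] - \int_{-\alpha_1}^{-\alpha_1+\gamma} \Phi(-\alpha_2+\alpha_1+\varepsilon_{1t})\, \phi(\varepsilon_{1t})\, d\varepsilon_{1t}
\end{aligned} \tag{33}$$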

The integral in the above expression cannot be evaluated analytically, and hence is evaluated numerically. The probabilities of the errors being in each of the other cells remain the same as before. Thus, an estimation procedure similar to the one described earlier can be used, with the only difference being in the probabilities for the two outcomes of either firm entering the market alone.
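The one-dimensional integral in equation (33) is smooth and has finite limits, so a standard quadrature rule suffices. The sketch below uses composite Simpson's rule, which is our assumption for illustration; the text does not say which numerical method is used. It covers the no-covariate case of equation (33), and the function names are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def prob_cell_v(alpha1, alpha2, gamma):
    """Probability of Cell V, from equation (22) without covariates."""
    return ((norm_cdf(-alpha1 + gamma) - norm_cdf(-alpha1))
            * (norm_cdf(-alpha2 + gamma) - norm_cdf(-alpha2)))

def prob_cell_v_firm2(alpha1, alpha2, gamma):
    """Equation (33): part of Cell V in which firm 2 is the more profitable firm."""
    lo, hi = -alpha1, -alpha1 + gamma
    first = norm_cdf(-alpha2 + gamma) * (norm_cdf(hi) - norm_cdf(lo))
    second = simpson(lambda e: norm_cdf(-alpha2 + alpha1 + e) * norm_pdf(e), lo, hi)
    return first - second

def prob_cell_v_firm1(alpha1, alpha2, gamma):
    """Remainder of Cell V, assigned to firm 1 entering alone."""
    return prob_cell_v(alpha1, alpha2, gamma) - prob_cell_v_firm2(alpha1, alpha2, gamma)
```

A useful sanity check is the symmetric case $\alpha_1 = \alpha_2$: the 45 degree line is then the diagonal of a square cell under a symmetric density, so each firm should receive exactly half of the Cell V probability.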

This procedure can easily be extended to more than 2 players. It is easy to verify for a three-player game of market entry, for instance, that with an equilibrium selection rule that has the firms entering a market in decreasing order of profitability, each cell with multiple equilibria would similarly be equally divided between the different outcomes that are consistent with the cell. As in the two-player example illustrated above, the likelihood for each observation could be constructed through assignment of the cells partially to multiple outcomes, instead of being assigned exclusively to one outcome.

To summarize, for any game with multiple equilibria, the likelihood is well-specified and a Metropolis Hastings procedure can be set up to sample from the posterior distribution of parameters as long as a particular equilibrium profile is selected, i.e. all the cells (or parts thereof) are uniquely assigned to specific equilibria.

3.3.2 Bayesian estimation with multiple equilibria

In this sub-section, we discuss the Bayesian estimation of discrete games of complete information without any assumption on equilibrium selection. We have already seen that without an equilibrium selection rule, there is no unique mapping between some of the cells and outcomes, and thus no well-specified likelihood. Conditional on equilibrium selection, however, the likelihood is well-specified. Consider a particular equilibrium profile, i.e. a unique mapping of cells or parts of the cells to specific equilibria, to be a model. Thus, there is a series of models, one corresponding to each equilibrium profile, each of which can be estimated using a Metropolis Hastings algorithm. We introduce the reversible jump algorithm, which allows us to simulate posterior distributions of parameters across multiple models, and discuss how we apply it to the context of discrete games with multiple equilibrium profiles corresponding to multiple models.


Figure 5: Equilibrium Conditions - Sequential Entry in Order of Profitability

Reversible jump algorithm

The reversible jump algorithm (Green, 1995; Green and Hastie, 2009) is a Markov Chain Monte Carlo algorithm that generates draws from a stationary distribution for an across-model state space which is potentially trans-dimensional. For instance, in modeling count data, there could be two potential models: a Poisson model, which is a single-parameter model, and a Negative Binomial model, which has two parameters. The reversible jump algorithm is an extension of the Metropolis Hastings algorithm, which simultaneously draws a model indicator as well as parameters. Like the Metropolis Hastings algorithm, it does this by generating draws from a candidate density, which is modified through a rejection step to satisfy the detailed balance condition.

Formally, let there be a countable set $K$ of models, with $k$ being a model indicator and $\theta_k$ being the parameter vector of dimension $n_k$ corresponding to the $k$th model. Let the data be represented by $D$. Let $L(D|k, \theta_k)$ be the likelihood of the data corresponding to the $k$th model and $p(\theta_k|k)$ be the density of the prior for the parameters of this model. Also, let there be a prior across models, with its density³ specified as $p(k)$. Then the joint posterior of the model and parameters is given by

$$\pi(k, \theta_k | D) \propto L(D | k, \theta_k)\, p(\theta_k | k)\, p(k) \tag{34}$$

³ It is not necessary to have separate priors for $\theta_k|k$ and $k$; it is sufficient to specify a prior $p(k, \theta_k)$. However, for the applications we consider, it is natural to specify the two priors separately.

Let $t$ denote a move type, including a forward move from $(k, \theta_k)$ to $(k', \theta'_{k'})$ as well as the reverse move, and let there be a countable set $T$ of such moves. Let $\theta'_{k'}$ have dimension $n_{k'}$. Let $m_t(k, \theta_k)$ indicate the probability of this move, which could be dependent on both the current model and parameter values of the current state. The reverse probability would thus be $m_t(k', \theta'_{k'})$. For the forward move, let us draw a vector of random numbers $u$ with dimension $r_t$ from a known distribution $f_t(u)$, and let the vector $u'$ be drawn for the reverse move. Let the dimension of $u'$ be $r'_t$, which satisfies the condition $n_k + r_t = n_{k'} + r'_t$, and let the distribution it is drawn from be $f'_t(u')$. Let the function that defines the transformation from $(\theta_k, u)$ to $(\theta'_{k'}, u')$ be denoted by $h_t(\theta_k, u)$ and the inverse function by $h'_t(\theta'_{k'}, u')$, such that

$$\begin{aligned}
(\theta'_{k'}, u') &= h_t(\theta_k, u) \\
(\theta_k, u) &= h'_t(\theta'_{k'}, u')
\end{aligned} \tag{35}$$

Finally, let the probability that the move is accepted be denoted by $\alpha_t((k, \theta_k), (k', \theta'_{k'}))$ and the reverse probability by $\alpha'_t((k', \theta'_{k'}), (k, \theta_k))$. To generate draws from the target density, in this case the joint posterior distribution of parameters and model indicators, it is sufficient to construct a Markov Chain whose transition kernel satisfies the detailed balance condition. It is straightforward to show that this is the case when

$$\begin{aligned}
&\iint_{(k,\theta_k) \in A,\, (k',\theta'_{k'}) \in B} \pi(k, \theta_k | D)\, m_t(k, \theta_k)\, f_t(u)\, \alpha_t\big((k, \theta_k), (k', \theta'_{k'})\big)\, d(k, \theta_k)\, du \\
&\quad = \iint_{(k',\theta'_{k'}) \in B,\, (k,\theta_k) \in A} \pi(k', \theta'_{k'} | D)\, m'_t(k', \theta'_{k'})\, f'_t(u')\, \alpha'_t\big((k', \theta'_{k'}), (k, \theta_k)\big)\, d(k', \theta'_{k'})\, du'
\end{aligned} \tag{36}$$

This condition says that it is equally likely for the chain to move from set $A$ to $B$ as it is to move from $B$ to $A$. This will be satisfied if the move of type $t$ is accepted with probability

$$\alpha_t\big((k, \theta_k), (k', \theta'_{k'})\big) = \min\left\{1,\ \frac{\pi(k', \theta'_{k'} | D)\, m'_t(k', \theta'_{k'})\, f'_t(u')}{\pi(k, \theta_k | D)\, m_t(k, \theta_k)\, f_t(u)} \left| \frac{\partial h_t(\theta_k, u)}{\partial (\theta_k, u)} \right| \right\} \tag{37}$$

The last term is the Jacobian for the transformation from $(\theta_k, u)$ to $(\theta'_{k'}, u')$.

To summarize the algorithm, we propose a candidate move from $(k, \theta_k)$ to $(k', \theta'_{k'})$ with probability $m_t(k, \theta_k)$ by first generating draws for $u$ from the density $f_t(u)$ and then transforming $(\theta_k, u)$ to $(\theta'_{k'}, u')$ using the function $h_t(\cdot)$. We then accept this move with probability $\alpha_t((k, \theta_k), (k', \theta'_{k'}))$. Repeated draws from this chain converge to the posterior distribution $\pi(k, \theta_k | D)$.

Reversible jump algorithm applied to discrete games

In the case of discrete games, the multiple models correspond to the multiple equilibrium profiles. Henceforth, we will use the terms model and equilibrium profile interchangeably. Thus, in a two-player entry game for instance, we have two models, with a region of the error space (Cell V in Figure 1) being consistent with two different equilibria. A given set of parameters that rationalizes observed outcomes for one model would not rationalize the observed outcomes for the second model. Thus, while the dimensions of the parameters do not differ for the different models, the parameter spaces are different. As we have discussed earlier, the reversible jump sampler is an MCMC sampler that traverses a broader parameter space that is the union of parameter spaces across multiple models. We thus employ this sampler for our context even though ours is not a trans-dimensional problem of the kind for which reversible jump samplers have typically been employed.

Consider a move type $t$, which involves a candidate move from model $k$ to model $k'$ as the forward move and $k'$ to $k$ as the reverse move. With priors $p(\theta_k|k)$ on the parameters conditional on the model and $p(k)$ on the model indicator, the posterior is given by equation 34. We assume a multinomial prior on the model indicator. Thus,

$$k \sim \text{Multinomial}(p) \tag{38}$$

where $p$ is a vector of prior probabilities associated with each of the $K$ models.

Let the probability of move $t$ be independent of the current state of the parameter $\theta_k$, i.e. $m_t(k, \theta_k) = m_t(k)$. Since the move type indicates the models corresponding to the current state and the candidate move, this can be more simply denoted by $m_{k,k'}$. The probability of the reverse move is similarly $m_{k',k}$. We make the further assumption that

$$m_{k,k'} = p_{k'} \tag{39}$$

where $p_{k'} = p(k')$ is the element of the prior $p$ corresponding to model $k'$. In other words, we generate a candidate move to model $k'$ with its prior probability, independent of the current model. This is not necessary for our algorithm (for instance, one could use a uniform probability of moves across models) but, as we will see, it is a natural choice for the jump probability and also simplifies the expression for the acceptance ratio.

Let the random vector $u$ have the same dimension as the parameter vector, and let this be a draw from a mean-zero normal distribution with variance-covariance matrix $\Sigma_u$. We define the variable transformation function $h_t(\theta_k, u)$ as follows

$$\begin{aligned}
\theta'_{k'} &= g_t(\theta_k) + u \\
u' &= u
\end{aligned} \tag{40}$$

The function $g_t(\theta_k)$ transforms the parameters from the parameter space of model $k$ to that of model $k'$ by matching a set of moments. The set of moments to be matched is specific to the empirical application, but could include the predicted shares of the full set of outcomes at various values of covariates. In our empirical applications, we match these moments for the mean levels of covariates and for values that are one standard deviation apart on either side of the mean. The number of moments to be matched needs to be at least as large as the dimension of $\theta_k$. We conduct this moment matching through an optimization step, which minimizes the mean squared difference between the sample analogs of these moments for the two models.
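The moment-matching step can be sketched for the simplest possible case. The example below is an illustration under strong assumptions, not the procedure used in our applications: it drops covariates, takes the four predicted outcome shares of the two-player entry game as the moments, writes $\theta = (\alpha_1, \alpha_2, \tilde{\gamma})$ with $\gamma = \exp(\tilde{\gamma})$, and uses a basic compass (pattern) search as a stand-in optimizer. All function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shares(theta, firm_in_cell_v):
    """Predicted shares (P00, P01, P10, P11); firm_in_cell_v (1 or 2) labels the model."""
    alpha1, alpha2, gamma_tilde = theta
    gamma = math.exp(gamma_tilde)
    lo1, hi1 = norm_cdf(-alpha1), norm_cdf(-alpha1 + gamma)
    lo2, hi2 = norm_cdf(-alpha2), norm_cdf(-alpha2 + gamma)
    p00 = lo1 * lo2
    p11 = (1.0 - hi1) * (1.0 - hi2)
    if firm_in_cell_v == 1:
        p10 = (1.0 - lo1) * hi2          # Cell V goes to firm 1
        p01 = 1.0 - p00 - p11 - p10
    else:
        p01 = (1.0 - lo2) * hi1          # Cell V goes to firm 2
        p10 = 1.0 - p00 - p11 - p01
    return (p00, p01, p10, p11)

def distance(theta_a, model_a, theta_b, model_b):
    """Mean squared difference between the moments of the two states."""
    sa, sb = shares(theta_a, model_a), shares(theta_b, model_b)
    return sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa)

def match_moments(theta, model_from, model_to, step=0.5, tol=1e-10, max_iter=10000):
    """Compass search for g(theta): parameters of model_to that mimic (model_from, theta)."""
    best = list(theta)
    best_val = distance(theta, model_from, best, model_to)
    iteration = 0
    while step > tol and iteration < max_iter:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                val = distance(theta, model_from, trial, model_to)
                if val < best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2.0                   # shrink the search pattern
        iteration += 1
    return best, best_val
```

With covariates, the same `distance` function would be evaluated at several covariate values (for example the mean and one standard deviation on either side of it) so that the number of moments is at least the dimension of $\theta_k$.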

The reverse function $h'_t(\theta'_{k'}, u')$ is the inverse of this transformation and is given by

$$\begin{aligned}
\theta_k &= g'_t(\theta'_{k'} - u') \\
u &= u'
\end{aligned} \tag{41}$$

where $g'_t(\theta'_{k'})$ is the inverse of $g_t(\theta_k)$.

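The Jacobian is thus given by

$$\left| \frac{\partial h_t(\theta_k, u)}{\partial (\theta_k, u)} \right| = \begin{vmatrix} \dfrac{\partial [g_t(\theta_k) + u]}{\partial \theta_k} & \dfrac{\partial [g_t(\theta_k) + u]}{\partial u} \\[2ex] \dfrac{\partial u}{\partial \theta_k} & \dfrac{\partial u}{\partial u} \end{vmatrix} = \begin{vmatrix} \dfrac{\partial g_t(\theta_k)}{\partial \theta_k} & I_{n_k} \\[2ex] 0 & I_{n_k} \end{vmatrix} = \left| \frac{\partial g_t(\theta_k)}{\partial \theta_k} \right| \tag{42}$$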

This Jacobian does not have an analytical solution, and is hence evaluated numerically in our empirical applications.
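One simple way to evaluate such a Jacobian numerically is by central finite differences followed by a determinant computation. The sketch below is an assumed approach for illustration (the text does not specify the numerical method), and both function names are hypothetical. Here `g` is any function that maps a list of parameters to a list of the same length.

```python
def numerical_jacobian(g, theta, h=1e-5):
    """Central-difference approximation to the matrix of partial derivatives of g."""
    n = len(theta)
    m = len(g(theta))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(theta)
        dn = list(theta)
        up[j] += h
        dn[j] -= h
        g_up, g_dn = g(up), g(dn)
        for i in range(m):
            jac[i][j] = (g_up[i] - g_dn[i]) / (2.0 * h)
    return jac

def abs_determinant(matrix):
    """Absolute determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]   # row swap only flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return abs(det)

# Example with a map whose Jacobian is known: g(t) = (2 t0 + t1, 3 t1), determinant 6.
jac_value = abs_determinant(numerical_jacobian(lambda t: [2 * t[0] + t[1], 3 * t[1]],
                                               [0.3, -1.2]))
```

When `g` is itself the output of an optimization step, each finite-difference evaluation requires re-running that optimization, which is the main computational cost of this term.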

The candidate move described above is like that in a random walk Metropolis Hastings sampler, except that the candidate density is centered at a transformed value of the current state instead of the current state itself. When the candidate move is to the same model as in the current state, the algorithm reduces to a standard random walk Metropolis Hastings algorithm. To see this, note that for $k' = k$, the function $g_t(\theta_k)$ that equalizes the moments for the current and candidate models is simply

$$g_t(\theta_k) = \theta_k$$

The Jacobian in this case equals 1. The probabilities $m_t(k, \theta_k)$ and $m'_t(k', \theta'_{k'})$ are equal, as are the densities $f_t(u)$ and $f'_t(u')$, and hence the acceptance probability in equation 37 reduces to the standard expression for a random walk Metropolis Hastings algorithm.

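Returning to the general case of a candidate move from model $k$ to model $k'$, the acceptance probability in equation 37 can be written as

$$\alpha_t\big((k, \theta_k), (k', \theta'_{k'})\big) = \min\left\{1,\ \frac{L(D | k', \theta'_{k'})\, p(\theta'_{k'} | k')\, p(k')\, p(k)\, f'_t(u')}{L(D | k, \theta_k)\, p(\theta_k | k)\, p(k)\, p(k')\, f_t(u)} \left| \frac{\partial g_t(\theta_k)}{\partial \theta_k} \right| \right\} \tag{43}$$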

where we have written out the expressions for $\pi(k, \theta_k | D)$ and $\pi(k', \theta'_{k'} | D)$, substituted in the values of $m_t(k, \theta_k)$ and $m_t(k', \theta'_{k'})$ using equation 39 and that for the Jacobian from equation 42. Noting that $f_t(u) = f'_t(u')$ and canceling terms, we obtain the acceptance probability for across-model moves as

$$\alpha_t\big((k, \theta_k), (k', \theta'_{k'})\big) = \min\left\{1,\ \frac{L(D | k', \theta'_{k'})\, p(\theta'_{k'} | k')}{L(D | k, \theta_k)\, p(\theta_k | k)} \left| \frac{\partial g_t(\theta_k)}{\partial \theta_k} \right| \right\} \tag{44}$$

For within-model moves, this acceptance probability is the standard random walk Metropolis Hastings ratio

$$\alpha_t\big((k, \theta_k), (k, \theta'_k)\big) = \min\left\{1,\ \frac{L(D | k, \theta'_k)\, p(\theta'_k | k)}{L(D | k, \theta_k)\, p(\theta_k | k)} \right\} \tag{45}$$

Not every move needs to be an across-model move. In general, it is efficient to have several sweeps of draws, with an across-model move being potentially attempted in every sweep, and several within-model moves within each sweep. To summarize, a draw from the sampler has the following steps:


1. The current draw is $(k, \theta_k)$.

2. Generate a candidate draw of model $k'$ from the multinomial distribution $\text{Multinomial}(p)$, where $p$ is also the vector of prior probabilities for the various models.

3. Using a set of moment matching conditions, generate $g(\theta_k)$. This is done by minimizing the distance between the moments for the states $(k, \theta_k)$ and $(k', g(\theta_k))$.

4. Numerically evaluate $\partial g(\theta_k) / \partial \theta_k$.

5. Generate a draw for $u$ from the normal distribution $N(0, \Sigma_u)$.

6. Set the candidate parameter vector $\theta'_{k'} = g(\theta_k) + u$.

7. Calculate the acceptance probability $\alpha_t((k, \theta_k), (k', \theta'_{k'}))$.

8. Generate $\mu \sim \text{Uniform}(0, 1)$.

9. If $\mu < \alpha_t((k, \theta_k), (k', \theta'_{k'}))$, accept the candidate draw $(k', \theta'_{k'})$; else set the new draw to be equal to the previous one $(k, \theta_k)$.
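The nine steps above can be sketched on a deliberately simple toy problem. Everything in this example is an assumption made for illustration: the two "models" are not entry-game equilibria but two normal-mean models, $y \sim N(\theta + s_k, 1)$ with known shifts $s_k$, so that the moment-matching map $g$ is an analytic shift and its Jacobian equals 1. In the entry game, $g$ and its Jacobian would instead come from the numerical moment-matching step. The data, priors and names are made up.

```python
import math
import random

DATA = [0.5, 1.5, 1.0, 0.8, 1.2]   # made-up observations
SHIFT = [0.0, 2.0]                 # model k says y ~ N(theta + SHIFT[k], 1)
PRIOR_K = [0.3, 0.7]               # prior p over the two models
PRIOR_SD = 10.0                    # theta | k ~ N(0, PRIOR_SD^2) in both models

def log_target(k, theta):
    """log L(D|k,theta) + log p(theta|k), up to a constant common to both models."""
    mu = theta + SHIFT[k]
    loglik = -0.5 * sum((y - mu) ** 2 for y in DATA)
    logprior = -0.5 * (theta / PRIOR_SD) ** 2
    return loglik + logprior

def g(theta, k, k_new):
    """Moment matching: keep the implied mean of y unchanged across models."""
    return theta + SHIFT[k] - SHIFT[k_new]

def reversible_jump(n_draws, step=0.5, seed=1):
    rng = random.Random(seed)
    k, theta = 0, 0.0                                    # step 1: current draw
    draws = []
    for _ in range(n_draws):
        k_new = 0 if rng.random() < PRIOR_K[0] else 1    # step 2: model from the prior
        centre = g(theta, k, k_new)                      # step 3: transformed state
        log_abs_jac = 0.0                                # step 4: g is a shift, |dg/dtheta| = 1
        u = rng.gauss(0.0, step)                         # step 5
        theta_new = centre + u                           # step 6
        # Step 7, equation (44) in logs: p(k), p(k') and the u densities cancel.
        log_alpha = log_target(k_new, theta_new) - log_target(k, theta) + log_abs_jac
        if math.log(1.0 - rng.random()) < log_alpha:     # steps 8 and 9
            k, theta = k_new, theta_new
        draws.append((k, theta))
    return draws

draws = reversible_jump(20000)
kept = draws[2000:]                                      # discard burn-in
share_model0 = sum(1 for k, _ in kept if k == 0) / len(kept)
mean_mu = sum(theta + SHIFT[k] for k, theta in kept) / len(kept)
```

In this toy example the data cannot distinguish the two models (each can reproduce any mean), so the share of draws from model 0 should stay close to its prior probability of 0.3, while the implied mean `mean_mu` is pinned down by the data.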

Tuning the reversible jump algorithm

One of the important choices to be made in practically implementing a Metropolis Hastings algorithm is the candidate density. In the case of a random walk algorithm with a normal candidate, this choice boils down to the choice of the variance covariance matrix for the candidate move. In principle, this choice does not matter as long as the density fulfills certain regularity conditions, but in practice this is a crucial choice. Intuitively, if the candidate moves to a location very far away from the current draw, it might be in a region where the target density is very low, and hence the move would get rejected. If the step size is too small, such that the candidate draw has a target density very similar to that of the current draw, the probability of acceptance becomes very high. In both cases, information is lost due to high autocorrelation of the draws. This means that it would take a very large number of draws to dissipate initial conditions, and to traverse the regions where the target density is non-negligible. Thus, a significant part of practical implementation of a Metropolis Hastings algorithm involves tuning the algorithm.

The issue of tuning assumes even greater importance in the case of the reversible jump algorithm. This is because the algorithm traverses parameter spaces across models, which are not entirely comparable. At an extreme, consider the case where $g(\theta_k) = \theta_k$ for all moves, including across-model moves. In principle, such a sampler would work, but the rejection rate of candidate draws might be unacceptably high and may render the algorithm practically useless. This is because the candidate draws generated by taking a deviation from the current draw may have very low target density for the candidate model, making it almost certain that the candidate draw would be rejected. The algorithm would not make across-model moves in such a case. The moment-matching step in our algorithm reduces the chance of a prohibitively high rejection rate, by ensuring that the target densities for the current draw and the transformed draw (i.e. $g(\theta_k)$) are at least at similar levels. But tuning is still a non-trivial part of making the reversible jump algorithm work. One challenge is that a small step size for one model might be a large one for another. Hence, it would be attractive to come up with an adaptive version of the algorithm that has some degree of 'automatic' tuning, as has been attempted for the standard Metropolis Hastings algorithm (Roberts and Rosenthal, 2001). These adaptive algorithms adjust the variance of the candidate draw to keep the acceptance rate close to a pre-specified target level. The problem with this approach is that there is typically no clear criterion (i.e. no generalizable choice of target acceptance rate) that can be used for automating the tuning, particularly in the case of the reversible jump algorithm. Thus, one would have to use a relatively arbitrary criterion (such as an arbitrarily chosen 'optimal' acceptance rate) to adaptively adjust the tuning. An alternative approach in the case of the Metropolis Hastings algorithm is to use a proposal that utilizes information about the gradient of the target density to generate efficient candidate draws in situations where rejection rates of the standard random walk algorithm are prohibitively high. This underlies the so-called Langevin algorithm (Roberts and Tweedie, 1996; Roberts and Rosenthal, 2006), which relies on generating draws that satisfy two conditions. First, it ensures that the acceptance rate is equal to 1 when the candidate draw is equal to the current draw. Second, it ensures that the gradient of the acceptance rate with respect to the candidate is equal to zero when the candidate is the current draw. A similar Langevin-like approach was introduced by Brooks, Giudici, and Roberts (2003) for the reversible jump algorithm, by proposing a draw that ensures that the acceptance rate is equal to 1 when the candidate is a transformed version of the current draw, and by setting the gradient of the acceptance rate with respect to a transformed version of the current draw to zero at the current draw. One of the costs of this approach is that it can be computationally expensive, particularly in the context where the transformation function $g(\theta_k)$ cannot be computed analytically.

We have tried both approaches in our Monte Carlo simulations and found in our context that an automatic tuning approach worked well, although it comes at the cost of using an arbitrary criterion such as the one in Roberts and Rosenthal (2001). We find that the sampler mixes quite well in this case, with an acceptance rate ranging from about 10% to 25%. We use this procedure in our empirical applications.
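To illustrate what automatic tuning of the step size can look like, the sketch below adjusts the log of the random-walk step after each batch of draws so that the batch acceptance rate drifts towards a target. This is a generic batch-wise scheme written for illustration, not necessarily the one used in our applications; the target rate of 0.25, the batch size, the adaptation increment and the toy standard normal target are all arbitrary assumptions.

```python
import math
import random

def adaptive_random_walk(log_post, theta0, n_batches=400, batch_size=50,
                         target_rate=0.25, step0=1.0, adapt=0.05, seed=0):
    """Scalar random-walk sampler whose step size is adapted batch by batch."""
    rng = random.Random(seed)
    theta, current = theta0, log_post(theta0)
    log_step = math.log(step0)
    rates = []
    for _ in range(n_batches):
        accepted = 0
        for _ in range(batch_size):
            cand = theta + rng.gauss(0.0, math.exp(log_step))
            cand_lp = log_post(cand)
            if math.log(1.0 - rng.random()) < cand_lp - current:
                theta, current = cand, cand_lp
                accepted += 1
        rate = accepted / batch_size
        rates.append(rate)
        # Widen the proposal if too many moves are accepted, narrow it otherwise.
        log_step += adapt if rate > target_rate else -adapt
    return math.exp(log_step), rates

# Start from a deliberately poor step size on a standard normal target.
tuned_step, rates = adaptive_random_walk(lambda t: -0.5 * t * t, 0.0, step0=100.0)
late_rate = sum(rates[-100:]) / 100
```

A common practice with schemes of this kind is to adapt only during the burn-in period, or to shrink the adaptation increment over time, and to base inference on draws generated with the step size held fixed.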

Assessing convergence

Assessing convergence of any MCMC sampler can be tricky and often depends on intuitive methods such as plotting the chains of the parameter draws and looking at measures such as autocorrelations. The issue of assessing convergence becomes much more complicated in the case of the reversible jump algorithm due to the across-model parameter space it samples from, and due to the difficulties in assessing convergence of the model indicator itself. One way to assess convergence of parameters is to see if there is good mixing of the parameters within each model. Depending on the dimensions of the parameter vector and the number of models involved, this can be an onerous task to do manually. An alternative is to compute a summary statistic or moment for each set of parameters that is invariant across models and assess convergence of that measure both within and across models (Brooks and Giudici, 2001). Assessing convergence of the model indicators can also be tricky since the indicator is a discrete variable. One indicator of convergence would be that successive blocks of draws have very similar proportions of indicators for the various models. Alternatively, one could use more formal tests of convergence such as the hypothesis tests suggested by Brooks, Giudici, and Philippe (2003).
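The block comparison for the model indicators can be sketched as follows. This is an informal diagnostic only, not one of the formal tests cited above, and the helper names block_proportions and max_block_spread are hypothetical.

```python
from collections import Counter


def block_proportions(indicators, n_blocks):
    """Share of each model indicator within successive, equal-length blocks."""
    block_len = len(indicators) // n_blocks
    if block_len == 0:
        raise ValueError("need at least one draw per block")
    models = sorted(set(indicators))
    proportions = []
    for b in range(n_blocks):
        block = indicators[b * block_len:(b + 1) * block_len]
        counts = Counter(block)
        proportions.append({k: counts[k] / block_len for k in models})
    return proportions


def max_block_spread(proportions):
    """Largest gap, over models, between the highest and lowest block share."""
    models = proportions[0].keys()
    return max(
        max(p[k] for p in proportions) - min(p[k] for p in proportions)
        for k in models
    )


# Usage: a chain that alternates between two models has identical blocks.
spread = max_block_spread(block_proportions([1, 2] * 500, n_blocks=4))
```

A small spread is consistent with convergence of the indicator but does not establish it; how small is small enough remains a judgment call.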

Identification of equilibria and the reversible jump algorithm

So far, we have laid out the method by which we can estimate discrete games with multiple equilibria using a reversible jump algorithm. However, the methodology does not assume anything about the identification of equilibria. When the equilibria are not identified, the posterior densities conditional on the model would be the same at the mode for each model. On average, the algorithm would draw each of the models with its prior probability. Thus, when the model is unidentified, the posterior probabilities for the model indicators would essentially be the same as the prior probabilities. The posterior estimates of the parameters conditional on the model would not be affected by the fact that the posterior probabilities are unidentified. However, the marginal posterior distributions of the parameters would reflect the prior probabilities of the models. When the model is identified, the posterior densities of the parameters would be different at the modes of the different models. The models with higher modes would on average be more likely to be drawn, since the acceptance probabilities of these models would be higher. Thus, the probabilities of drawing each model would depend on the relative levels of their posterior densities, and would differ from the prior probabilities. The stronger the identification of the model, the weaker would be the influence of the prior on the posterior probabilities. Thus, this approach nests both cases - when the equilibrium is unidentified and when it is identified. When there are no excluded variables available, the empirical researcher still benefits by obtaining parameter estimates that reflect her own uncertainty about the true equilibrium. When excluded variables are available, the researcher can let the data inform her about the true equilibrium.
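The relationship between prior and posterior model probabilities can be written compactly. The block below is a sketch in generic notation: the symbol m_k(D) for the marginal likelihood of model k is introduced here for illustration, and treating the unidentified case as one of equal marginal likelihoods is an idealization.

```latex
\[
p(k \mid D) = \frac{p(k)\, m_k(D)}{\sum_{j \in K} p(j)\, m_j(D)},
\qquad
m_k(D) = \int L(D \mid k, \theta_k)\, p(\theta_k \mid k)\, d\theta_k
\]
```

If the data cannot distinguish the models, so that m_k(D) is the same for every k, the ratio collapses to p(k | D) = p(k). The more the marginal likelihoods differ across models, the further the posterior probabilities move away from the prior.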

We have discussed identification of equilibria earlier and noted that there might be contexts where we don't have sufficient moments to identify every equilibrium profile, and therefore might only be able to identify subsets of equilibria. For instance, a three-player game of market entry has over 500 equilibrium profiles and it is unlikely that we would have the number of moments required to identify them. Instead, we may be interested in identifying the probabilities of a smaller set of equilibria being played. For instance, we may consider two sets of equilibria - one equilibrium profile that assumes that firms enter in decreasing order of profitability (Berry, 1992) and another set consisting of all other equilibria. In a social entry context, we might be interested in the probability that the Pareto dominant equilibrium is played (Hartmann, 2010), with the remaining equilibrium profiles placed in the other set. In each of these two instances, we could classify the two sets of equilibria as two models, with the specific equilibrium profile within each set randomized at each draw.⁴ Thus, we draw an indicator for which set of equilibrium profiles is played rather than a specific equilibrium profile, and then randomize the actual equilibrium within the set. The parameter draws for a set that contains more than one equilibrium profile would reflect the uncertainty within the set about which specific equilibrium is played.

Accommodating heterogeneity in equilibrium selection

In several contexts, one might be interested in modeling heterogeneity in equilibrium selection. Consider, for instance, the context of joint consumption of a good by friends (Hartmann, 2010), modeled as the outcome of a discrete game of complete information. In such games, one of the equilibria is always Pareto dominant. Instead of making the typical ad-hoc assumption that this equilibrium is selected, one could estimate the probability of its selection when one has suitable excluded variables. However, a valid concern is that different sets of friends may not share the same equilibrium selection rule. Since inference and evaluation of counterfactual outcomes depend on equilibrium selection, it is likely important in such situations to model the heterogeneity in equilibrium selection.

Accommodating heterogeneity in equilibrium selection is straightforward using the reversible jump algorithm we have laid out earlier in this section. Consider N units (e.g. sets of friends in the joint consumption example) for each of which we observe data Di, where i indexes the unit. Let ki be an indicator of the model for the ith unit and θiki represent the parameter vector for the ith unit under model ki. Then the unit-level equivalent of the posterior density in equation 34 is

\[ \pi(k_i, \theta_{ik_i} \mid D_i) \propto L(D_i \mid k_i, \theta_{ik_i})\, p(\theta_{ik_i} \mid k_i)\, p(k_i) \tag{46} \]

It might be infeasible to specify priors for the parameter vector for each unit separately and hence it might be useful to cast this model in a hierarchical framework, for instance by assuming that

\[ \theta_{i,k} \sim N(q_i \lambda_k, V_{\theta,k}) \]

where qi is a vector of observed covariates, λk is a population-level parameter vector specific to the model and Vθ,k is a population-level variance-covariance matrix. The model would be completed by specifying priors on these population-level parameters. With an assumption of conditional independence of the unit-level parameters θiki, we could set up an overall Gibbs sampler, with the population-level parameters being drawn as straightforward Gibbs steps using the full-conditional distributions of these parameters, and the unit-level parameters being drawn through a reversible jump step. Thus, if we assume a conditionally conjugate prior of the form given below

⁴ Such a randomization has also been used in Soetevent and Kooreman (2007), but in their case across all equilibria rather than a subset of them.

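\[ \lambda_k \sim N\left(\bar{\lambda},\; V_{\theta,k} \otimes A^{-1}\right) \quad \forall k \in K \tag{47} \]

\[ V_{\theta,k} \sim \mathrm{InverseWishart}(\nu, V) \quad \forall k \in K \tag{48} \]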

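we could derive the full conditional densities of the population-level parameters as follows

\[ \mathrm{vec}(\lambda_k) \mid \cdot \sim N\left( \mathrm{vec}\left( (Q_k' Q_k + A)^{-1} (Q_k' \Theta_k + A \bar{\lambda}) \right),\; V_{\theta,k} \otimes (Q_k' Q_k + A)^{-1} \right) \]

\[ V_{\theta,k} \mid \cdot \sim \mathrm{InverseWishart}(\nu + n_k,\; V + S_k) \]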

where Qk is the matrix formed by stacking up all the qi vectors corresponding to model k, Θk is similarly the stacked matrix of θi,k vectors for the model, nk is the number of units currently assigned to model k and Sk is the sum of squared residuals matrix. Thus, once we have draws of unit-level θi,k parameters, the population-level parameters can be drawn model by model as simple Gibbs steps. The hierarchical setup is similar to that for standard hierarchical Probit models (see for instance Rossi, McCulloch, and Allenby (1996)), and the set of draws at the population level is very similar to that for those models, except that we have separate population-level parameters for each model. The algorithm would involve a series of reversible jump steps, one for each unit, within a larger Gibbs sampler. We would draw a model indicator for each unit and could conduct inference on the posterior probabilities of equilibrium selection for each unit. One last point to make in this section is that even when the equilibria are not identified, for instance when there are no exclusion restrictions available, our experimentation with this method using Monte Carlo simulations shows that it is still useful to cast the unit-level model in a hierarchical framework, specifically when the payoff functions include covariates. The reason is that across-model moves depend on efficient moment matching between models to ensure that the sampler accepts candidate draws at a sufficient rate. Since the moments are matched for specific values of covariates (e.g. mean, mean plus one standard deviation, etc.), the posterior densities for the actual observations in the data depend on how close the covariates in the observations are to those used in the moment matching conditions. The covariates are typically closer within units, and there could typically be large dispersions away from these values when we pool observations across units, causing high rejection rates. Thus, specifying heterogeneity in equilibrium selection has practical benefits even when the equilibria are not identified. Note that the parameters of the payoff functions and any heterogeneity in those are still identified, and specifying heterogeneity in equilibrium selection does not affect inference for these parameters.

Discussion

We finally summarize the estimation strategy for discrete games of complete information in the presence of multiple equilibria. The method involves first enumerating the set of equilibria for the game, assuming that the equilibrium is in pure strategies. This assumption guarantees that there is a countable set of equilibria. We then specify priors over these equilibria. Note that the choice of an ad-hoc equilibrium selection rule, which assumes one specific equilibrium, is entirely nested within our approach since it is identical to specifying a prior probability of 1 for that equilibrium. We then apply the reversible jump MCMC algorithm to our multiple equilibria context, by assuming that each equilibrium profile is a model. While the likelihood is not well-specified in the presence of multiple equilibria, it is well-specified conditional on a model (an equilibrium profile). The reversible jump sampler negotiates the union of parameter spaces across models to generate draws from the joint posterior density of parameters and model indicators. When the model - in our case the equilibrium - is identified, the marginal posterior distribution of the model indicators reflects the information about the likelihood of the equilibrium being played in addition to the prior. When the model is not identified, it merely reflects the prior distribution. In this way, our approach nests both cases, where the equilibrium is identified and where it is not. It is thus complementary to two different streams of literature, one which does not assume that the equilibrium is identified but tries to recover bounds on parameters instead of point estimates, and another which attempts to recover parameter estimates when the equilibrium is identified using suitable exclusion restrictions. Relative to the former, our approach is able to recover not merely bounds for parameters, but marginal posterior distributions which may be potentially multi-modal, reflecting the multiple equilibria. Relative to the latter, our approach is able to recover not just parameter estimates common to models, but also model-specific parameters. Further, it can accommodate heterogeneity in equilibrium selection, given suitable data, and can recover not just unit-level parameters, but also unit-level likelihoods of equilibria being played.

4 Monte Carlo simulations

4.1 Two player game of social interaction

4.1.1 Unidentified equilibrium

We first simulate a social interaction game with two players, with players deciding whether to consume the good jointly with their partner, alone or not at all. We assume that there are no exclusion restrictions available. We observe 1000 such decisions by the individuals, with four potential outcomes in each observation. There are three parameters in the model: the intercepts for the two players (α1 and α2), and a social interaction effect capturing the effect of the other player's consumption on the focal player's payoffs (γ). We draw the errors (ε1 and ε2) from independent standard normal distributions. We then simulate the outcomes by checking which cell the error draw falls in. In case the error falls in Cell V in Figure 2, we assume that the Pareto dominant equilibrium is played, where the outcome of both players consuming Pareto dominates the outcome of neither consuming. We estimate the model using our algorithm, assuming a uniform prior over the multiple equilibria, and using diffuse priors for the other parameters.
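The data-generating process described above can be sketched as follows. The payoff form is an assumption made for illustration, since the payoff equation is defined elsewhere in the paper: player i's payoff from consuming is taken to be αi + γ·aj + εi, where aj equals 1 if the other player consumes, γ is positive, and the payoff from not consuming is normalized to zero. The function name simulate_outcomes and its arguments are hypothetical.

```python
import math
import random


def simulate_outcomes(alpha1, alpha2, log_gamma, n_obs,
                      pareto_dominant=True, seed=0):
    """Simulate (a1, a2) outcomes of a two-player social interaction game.

    Assumed payoff from consuming: alpha_i + gamma * a_j + eps_i, with the
    payoff from not consuming normalized to zero and gamma = exp(log_gamma).
    """
    rng = random.Random(seed)
    gamma = math.exp(log_gamma)
    outcomes = []
    for _ in range(n_obs):
        eps1 = rng.gauss(0.0, 1.0)
        eps2 = rng.gauss(0.0, 1.0)
        alone1, alone2 = alpha1 + eps1, alpha2 + eps2    # other abstains
        joint1, joint2 = alone1 + gamma, alone2 + gamma  # other consumes
        both_is_ne = joint1 >= 0 and joint2 >= 0
        none_is_ne = alone1 <= 0 and alone2 <= 0
        if both_is_ne and none_is_ne:
            # Region of multiplicity: apply the equilibrium selection rule.
            outcome = (1, 1) if pareto_dominant else (0, 0)
        elif both_is_ne:
            outcome = (1, 1)
        elif none_is_ne:
            outcome = (0, 0)
        elif alone1 > 0:
            # Player 1 consumes even alone; player 2 abstains even jointly.
            outcome = (1, 0)
        else:
            outcome = (0, 1)
        outcomes.append(outcome)
    return outcomes


# Usage: 1000 observations at the true values used in this simulation.
data = simulate_outcomes(-0.2, -0.4, -0.5, 1000, pareto_dominant=True)
```

Switching pareto_dominant to False reallocates the region of multiplicity to the outcome of neither player consuming while leaving all other cells unchanged, which is the sense in which the selection rule matters only for part of the error space.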

We report the estimates of this simulation in Table 1, which shows the posterior estimates of the parameters conditional on the model as well as the marginal estimates. Note that there are two models in this case, one corresponding to the Pareto dominant equilibrium (Equilibrium 1 in the Table) and the other corresponding to the Pareto dominated equilibrium being selected (Equilibrium 2 in the Table). The estimates conditional on the true model (i.e. Equilibrium 1) are quite close to the true parameter values. But the estimates conditional on the incorrect model are far away from the true parameter values for the two intercept parameters α1 and α2. We also depict the posterior densities for these parameters in Figure 6. The panels on the left show the marginal densities, while the panels on the right show the conditional densities. The true parameter values are depicted by vertical lines. The conditional densities for α1 and α2 are again relatively far apart, with the densities conditional on the selection of the Pareto dominant equilibrium being close to the true parameter values, while those for the Pareto dominated equilibrium do not enclose the true values. Moreover, the marginal densities for these parameters show two modes, reflecting the two equilibria. Also, the algorithm draws the Pareto dominant and Pareto dominated models in roughly the same proportion as the prior probabilities for the models (which were 0.5 for each of the models).

These estimates demonstrate two important things. First, they show the effect of equilibrium selection on the parameter estimates. The parameter estimates conditional on the incorrect model being selected are far away from the truth, and in the case of these simulations, do not even enclose the truth within the 95% posterior credible interval. Specifically, assuming the Pareto dominated model, which allocates the entire region of multiplicity (Cell V in Figure 2) to the outcome of neither player consuming, leads to an overestimation of the intercepts for the two players. This makes sense, given that with a smaller region allocated for the outcome of both players consuming, the observed outcomes can only be rationalized by increasing the intercepts. While we don't show the maximum likelihood estimates for these models, the estimates assuming the incorrect equilibrium show a similar pattern, with the truth not lying within the 95% confidence interval for these parameters. Second, these estimates show that our methodology is working, recovering the truth conditional on the true equilibrium selection, and recovering a potentially multi-modal posterior distribution for parameters. Thus, these estimates demonstrate how our methodology complements existing methodologies. For instance, relative to the bounds approach, we recover posterior densities, indicating the specific regions with high posterior probabilities rather than merely bounds.

Figure 6: Posterior Densities: 2 player game of social interaction

Table 1: Monte Carlo Simulation - 2 player game of social interaction

                          Posterior Estimates (Mean/Std. Dev.)
Parameter   True Value    Marginal            True Model          Model 2
α1          -0.2          -0.1346 (0.0790)    -0.2040 (0.0492)    -0.0755 (0.0434)
α2          -0.4          -0.3274 (0.0823)    -0.3989 (0.0549)    -0.2665 (0.0438)
log (γ)     -0.5          -0.4994 (0.0848)    -0.4873 (0.0862)    -0.5098 (0.0822)

probability (k = i): 0.48 0.52

4.1.2 Identified equilibrium

In this simulation, we add excluded covariates to the data generating process, which identifies the equilibrium as discussed earlier. We simulate one covariate for each of the two players, assuming that it is uniformly distributed between -1 and 1. We assume values for the coefficients for these excluded covariates, and keep everything else the same as the previous simulation, including the assumption that the Pareto dominant equilibrium is selected. We again estimate the model using our reversible jump algorithm, assuming an equal prior probability for both the Pareto dominant and Pareto dominated equilibria, and diffuse priors for the other parameters.

The results for this simulation are reported in Table 2. The results look similar to those for the previous simulation. Once again, we find that the intercepts for the two players depend critically on the assumption on equilibrium selection. The conditional estimates for the true model for all the parameters enclose the true value within the 95% credible interval, but not for the incorrect model. We also find that the estimates of the coefficients for the covariates, and of the social interaction parameter, are similar for both models. The interesting result is that the correct model, i.e. with the selection of the Pareto dominant equilibrium, is drawn more than twice as often as the Pareto dominated model. This demonstrates the identification of the equilibrium in the presence of excluded covariates.

Figure 7 shows the posterior densities of three parameters - the two intercepts and the social interaction parameter (the coefficients for the two excluded covariates are not shown, but have patterns similar to that of the social interaction parameter). The marginal density is a mixture of the densities for the two models, but with a higher contribution from the Pareto dominant model as it is drawn more often than the other model. The key point to note is that the marginal density in this case is closer to that for the true model and does not have multiple modes, though it has broader shoulders than the density for either model. As the posterior probability for a particular model increases, this marginal density should move closer and closer to the conditional density for that model. Thus, with better identification, we should get tighter estimates of the parameters.
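The mixture structure of the marginal density can be stated explicitly. The expression below is a sketch in generic notation for a parameter θ that is common to all models; it follows from the law of total probability rather than from anything specific to this simulation.

```latex
\[
p(\theta \mid D) = \sum_{k \in K} p(k \mid D)\, p(\theta \mid k, D)
\]
```

As p(k | D) approaches 1 for one model, the sum is dominated by that model's conditional density, which is why the marginal density moves toward the conditional density of the better-supported model.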

Table 2: Monte Carlo Simulation - 2 player game of social interaction - with covariates

                          Posterior Estimates (Mean/Std. Dev.)
Parameter   True Value    Marginal            True Model          Other Models
α1          -0.2          -0.1273 (0.0630)    -0.1569 (0.0491)    -0.0685 (0.0433)
α2          -0.4          -0.3424 (0.0676)    -0.3720 (0.0453)    -0.2837 (0.0510)
log (γ)     -0.5          -0.6515 (0.0968)    -0.6509 (0.0970)    -0.6526 (0.0963)
β1          0.2           0.2176 (0.0710)     0.2173 (0.0710)     0.2181 (0.0711)
β2          0.2           0.1873 (0.0723)     0.1870 (0.0711)     0.1880 (0.0747)

probability (k = i): 0.32 0.68

We have shown through these simulations that the algorithm is able to do a reasonably good job of recovering the parameters, and reflecting the uncertainty over equilibrium selection. It spans both extremes, when the model is unidentified, and when it is known with certainty (which would be equivalent to setting a prior probability of 1 on that model) and the continuum of situations in between, where the model is not known with certainty, but can be identified using the data.

Figure 7: Posterior Densities: 2 player game of social interaction - with covariates

4.2 Three-firm competitive game

We next conduct a simulation for a competitive game, with three players and with payoff functions (reduced form profit functions) as defined in equation 1. We also add excluded variables in this simulation; thus the equilibrium is identified (up to the number of moments available). Similar to the Monte Carlo simulations in the case of the 2-player social interaction game, we first assume parameter values. The parameters in this simulation include the intercepts - α1, α2 and α3, the two competitive effect parameters - γ and δ, and the coefficients for the three excluded covariates θ1, θ2 and θ3. We assume an equilibrium selection rule that picks the equilibrium that maximizes industry profits. The underlying assumption of such an equilibrium is that firms enter in decreasing order of profits. We then draw the errors for 1000 observations from independent standard normal distributions and generate the outcomes, applying the equilibrium selection rule when the errors are in a cell with multiple equilibria. Once again, estimation was carried out using our reversible jump approach, specifying diffuse priors for the paramete
