Identification in Multivariate Partial Observability Probit
Dale J. Poirier
University of California, Irvine, USA
September 15, 2011
Abstract
Poirier (1980, JoE) considered a bivariate probit model in which the binary dependent variables y1 and y2 were not observed individually, but the product z = y1·y2 was observed. This paper expands this notion of partial observability to multivariate settings.
Bivariate Probit
• Consider N independent observations from the latent bivariate regression model

  y*n1 = xn′β1 + εn1,  y*n2 = xn′β2 + εn2,  [εn1, εn2]′ ~ N2(0, Σ),

where Σ has unit variances and correlation ρ, and θ = [β1′, β2′, ρ]′ is an unknown parameter vector.
• The bivariate probit (BP) model arises when only the sign of y*ni is observed, i.e., yni = 1 if y*ni > 0, and yni = 0 if y*ni ≤ 0 (i = 1, 2).
  ◦ Zellner and Lee (1965, Econometrica) and Ashford and Sowden (1970, Biometrics) were early contributors.
  ◦ Chib and Greenberg (1998, Biometrika) provided a Bayesian analysis of multivariate probit (MP).
• The bivariate ordered probit model arises when yn1 and yn2 are observed in more than two categories.
• Of primary concern here is extending the case of bivariate partial observability (BPO) probit introduced in Poirier (1980, JoE), in which only zn = yn1·yn2 is observed. Poirier (1980) also considered BPO in sample selection models (Heckit-like estimators).
• BPO can be thought of as a 2×2 contingency table with covariates and partial observability, as suggested below.

Cell Counts
            y1 = 0   y1 = 1
  y2 = 0     n00      n10
  y2 = 1     n01      n11

  ◦ In BPO, we only observe (n00 + n01 + n10) = N − n11 and n11.
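To make the observability pattern concrete, here is a small simulation sketch (illustrative only, not from the paper; the intercepts and correlation are arbitrary values) showing that only the split (N − n11, n11) survives the BPO filter:

```python
import numpy as np

rng = np.random.default_rng(0)
N, rho = 1000, 0.5

# Latent bivariate normal errors with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
eps = rng.multivariate_normal(np.zeros(2), cov, size=N)

# Intercept-only latent indices (beta1 = 0.2, beta2 = -0.1 are arbitrary).
y1 = (0.2 + eps[:, 0] > 0).astype(int)
y2 = (-0.1 + eps[:, 1] > 0).astype(int)

# Full observability (BP) would reveal all four cell counts ...
n11 = int(np.sum((y1 == 1) & (y2 == 1)))

# ... but under BPO only z = y1*y2 is seen: the split (N - n11, n11).
z = y1 * y2
print("observed under BPO:", N - n11, n11)
```

The individual cells n00, n01, n10 cannot be recovered from z alone.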
Example: Consider a two-agent committee that requires both agents to be in favor in order for a motion to pass.
• For example, consider a man and a woman who contemplate getting married in the current year.
• You observe Z = 1 if they get married, and Z = 0 if they don't.
• But you don't observe their individual decisions Y1 = 0, 1 and Y2 = 0, 1.
Note: There are analogs of partial observability in the analysis of contingency tables in which cells cannot be fully distinguished.
• See Fienberg (1970), Cohen (1971), Pocock (1973), Haberman (1974), and Chen and Fienberg (1974, 1976).
• In both cases issues of identifiability require careful analysis.
• An important difference between these contingency table approaches and BPO is that the introduction of covariates provides ties across cells beyond the requirement that cell probabilities sum to unity.
Table 1: Applications with Partial Observability

Field               Article                       Discrete Outcomes
agriculture         Dimara and Skuras (2003)      producer aware of innovation;
                                                  producer adopts innovation
banking             Dwyer and Hassan (2007)       bank fails;
                                                  bank suspends payments
credit              Swain (2002)                  bank's decision on access;
                                                  household's demand for loans
education           Ballou (1996)                 individual seeks teaching job;
                                                  educator offers a job
development         Glewwe (1996)                 working;
                                                  private vs. public sector
IMF                 Przeworski and Vreeland       decision by IMF to extend financing;
                    (2002); Rajbhandari (2011)    decision by a gov. to apply for assistance
immigration         Aydemir (2002)                individual applies to emigrate;
                                                  host decides whether to accept
foreign investment  Konig (2003)                  ownership advantages hold;
                                                  location advantages hold;
                                                  internalization advantages hold
labor               Mohanty (1992)                individual seeks employment;
                                                  firm selects individual
law and economics   Feinstein (1990)              violation; detection
mergers             Brasington (2003)             richer school district;
                                                  poorer school district
• Farber (1982, Econometrica) considered an intermediate case between BP and BPO which can be visualized as the 2×2 contingency table with covariates and the following structure (censored probit).

Cell Counts
            y1 = 0   y1 = 1
  y2 = 0     n00      n10
  y2 = 1     n01      n11

  ◦ In this case, we observe (n00 + n01), n10, and n11.
• Maximum likelihood estimation of BPO is included in Stata.
• Cavanagh and Sherman (1998, JoE) introduced a class of rank estimators of scaled coefficients in semiparametric monotonic linear index models.
  ◦ CS also considered single-equation multiple-index models, including BPO probit.
  ◦ Importantly, their results imply the parameters of semiparametric partial observability models are identified without the normality assumption.
• Let Φ(·) denote the univariate standard normal cdf, and Φ2(·, ·; ρ) denote the bivariate standard normal cdf with correlation ρ.
• The likelihood functions for BP and BPO are

  L_BP(θ) = Π(n=1..N) Π(i=0,1) Π(j=0,1) p_nij^1(yn1 = i, yn2 = j),
  L_BPO(θ) = Π(n=1..N) p_n11^zn (1 − p_n11)^(1−zn),

where yn = [yn1, yn2]′ and zn = yn1·yn2 are observed and the component probabilities p_nij = Pr(yn1 = i, yn2 = j | xn, θ) are given in the following table.
Joint Probabilities p_nij (i, j = 0, 1) for BP

            yn1 = 0                        yn1 = 1
yn2 = 0   Φ2(−xn′β1, −xn′β2; ρ)          Φ2(xn′β1, −xn′β2; −ρ)
yn2 = 1   Φ2(−xn′β1, xn′β2; −ρ)          Φ2(xn′β1, xn′β2; ρ)
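As a numerical sketch (not part of the slides), the four BP cell probabilities can be evaluated with SciPy's bivariate normal cdf via the sign-flip identity Pr(y1 = i, y2 = j) = Φ2(q1·μ1, q2·μ2; q1q2ρ) with q = 2y − 1; the values of mu1, mu2, rho below are arbitrary:

```python
from scipy.stats import multivariate_normal

def cell_prob(i, j, mu1, mu2, rho):
    """Pr(y1 = i, y2 = j) = Phi2(q1*mu1, q2*mu2; q1*q2*rho), q = 2y - 1."""
    q1, q2 = 2 * i - 1, 2 * j - 1
    r = q1 * q2 * rho
    return multivariate_normal.cdf([q1 * mu1, q2 * mu2],
                                   mean=[0.0, 0.0],
                                   cov=[[1.0, r], [r, 1.0]])

mu1, mu2, rho = 0.4, -0.3, 0.6
p = {(i, j): cell_prob(i, j, mu1, mu2, rho) for i in (0, 1) for j in (0, 1)}

# BP sees which of the four cells occurred; BPO sees only z = y1*y2,
# i.e., a Bernoulli outcome with success probability p11.
p11 = p[(1, 1)]
print(p, p11)
```

The four probabilities sum to one, which also provides a quick check on the sign conventions in the table.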
• Meng and Schmidt (1985, IER) compared the relative variances for BP vs. BPO for a small number of parameter settings.
  ◦ The information loss under BPO can be "surprisingly large," particularly near points that are not identified.
  ◦ The cost increases with the fraction of the data for which the dependent variable is imperfectly observed.
• If ρ = 0:
  ◦ BP reduces to two univariate probit models with a block-diagonal information matrix.
  ◦ The BPO information matrix is not block diagonal.
• If ρ ≠ 0:
  ◦ BP is more efficient than univariate probit.
  ◦ In contrast, the underlying latent system shows no gains from pooling the two equations unless restrictions are added.
    – The nonlinearity of the model is why the OLS efficiency result for the linear latent model does not carry over.
    – But this result does carry over if a bivariate linear probability model is estimated.
Multivariate Probit

• Consider a sample of N independent observations from a multivariate probit model with latent variable representation

  y*nj = xn′βj + εnj (j = 1, ..., J; n = 1, ..., N),   (1)

where

  εn = [εn1, ..., εnJ]′ ~ N_J(0, R)   (2)

and R = [ρij] is a J×J correlation matrix.
and you only observe the binary outcomes

  ynj = 1 if y*nj > 0, and ynj = 0 otherwise (j = 1, ..., J).

  ◦ Analysis here is conditional on the K×1 vectors xn (n = 1, 2, ..., N).
  ◦ Stack all parameters into the M-dimensional vector θ, where M = JK + J(J − 1)/2 (e.g., M = 3K + 3 when J = 3).
  ◦ Finally, let φ_J(· | m, Σ) denote a J-dimensional normal pdf with mean m and covariance matrix Σ, and denote the corresponding cdf by

  Φ_J(· | m, Σ).   (3)
• The multivariate probit (MP) choice probability is

  Pr(yn | xn, θ) = ∫(A_nJ) ··· ∫(A_n1) φ_J(ε | 0, R) dε,   (4)

where the regions of integration are given by

  A_nj = (−xn′βj, ∞) if ynj = 1,  A_nj = (−∞, −xn′βj] if ynj = 0.   (5)
• A convenient representation for the MP choice probability can be obtained by reparameterizing μn and R to

  wnj = qnj·μnj,  Ωn = Qn R Qn,

where μnj = xn′βj, qnj = 2ynj − 1, and Qn = diag(qn1, ..., qnJ) (n = 1, ..., N; j = 1, ..., J). Then

  Pr(yn | xn, θ) = Φ_J(wn | 0, Ωn).   (6)

• The log-likelihood function for the J-dimensional multivariate probit is

  L(θ) = Σ(n=1..N) ln Φ_J(wn | 0, Ωn).   (7)
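Under this reparameterization, each choice probability is a single J-dimensional normal cdf evaluation, which SciPy can compute directly. A sketch (function names are mine; the values of μn and R are arbitrary illustrations):

```python
from itertools import product

import numpy as np
from scipy.stats import multivariate_normal

def mp_choice_prob(y, mu, R):
    """Pr(y_n = y) = Phi_J(w | 0, Q R Q), w_j = q_j*mu_j, q_j = 2*y_j - 1."""
    q = 2 * np.asarray(y) - 1
    w = q * mu
    omega = np.outer(q, q) * R          # Q R Q; the diagonal stays equal to one
    return multivariate_normal.cdf(w, mean=np.zeros(len(mu)), cov=omega)

def mp_loglik(Y, mu, R):
    """Log-likelihood: sum of log choice probabilities over observations."""
    return sum(np.log(mp_choice_prob(y, mu, R)) for y in Y)

# Trivariate example (mu_nj stands for x_n' beta_j; values are arbitrary).
mu = np.array([0.3, -0.2, 0.1])
R = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
probs = {y: mp_choice_prob(y, mu, R) for y in product((0, 1), repeat=3)}
print(probs[(1, 1, 1)], sum(probs.values()))
```

Summing the eight cell probabilities to one is a convenient sanity check on the sign flips in Qn.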
• Barrenechea (2007) derived the derivatives (8)–(9) of the MP choice probability (6) with respect to θ, where Ωn(j) is the (J − 1)×(J − 1) matrix obtained by omitting the jth row and column of Ωn, and Ωn(i, j) is the (J − 2)×(J − 2) matrix obtained by omitting the ith and jth rows and columns of Ωn.
• Table 2 expresses the choice probabilities (6) in terms of the original parameters μn and R in the case J = 3.

Table 2: Trivariate Probit Joint Probabilities

yn3 = 0:
  yn2 = 0, yn1 = 0:  Φ3(−μn1, −μn2, −μn3; ρ12, ρ13, ρ23)
  yn2 = 0, yn1 = 1:  Φ3(μn1, −μn2, −μn3; −ρ12, −ρ13, ρ23)
  yn2 = 1, yn1 = 0:  Φ3(−μn1, μn2, −μn3; −ρ12, ρ13, −ρ23)
  yn2 = 1, yn1 = 1:  Φ3(μn1, μn2, −μn3; ρ12, −ρ13, −ρ23)
yn3 = 1:
  yn2 = 0, yn1 = 0:  Φ3(−μn1, −μn2, μn3; ρ12, −ρ13, −ρ23)
  yn2 = 0, yn1 = 1:  Φ3(μn1, −μn2, μn3; −ρ12, ρ13, −ρ23)
  yn2 = 1, yn1 = 0:  Φ3(−μn1, μn2, μn3; −ρ12, −ρ13, ρ23)
  yn2 = 1, yn1 = 1:  Φ3(μn1, μn2, μn3; ρ12, ρ13, ρ23)

Here Φ3(a1, a2, a3; r12, r13, r23) denotes the trivariate standard normal cdf with the indicated correlations.
• The information matrix for multivariate probit is

  I_MP(θ) = Σ(n=1..N) Σ(i) gn(i | θ),   (14)

where the expected information in cell i = [i1, ..., iJ]′ is

  gn(i | θ) = [∂Pr(yn = i | θ)/∂θ][∂Pr(yn = i | θ)/∂θ]′ / Pr(yn = i | θ).   (15)
Multivariate Partial Observability Probit

• A direct extension of Poirier (1980) to multivariate partial observability (MPO) probit corresponds to observing only zn = yn1·yn2 ··· ynJ (n = 1, ..., N).

Table 3: Sampling Distribution for MPO

  zn = 0:  1 − Φ_J(μn | 0, R)
  zn = 1:  Φ_J(μn | 0, R)
• The log-likelihood for MPO is

  L_MPO(θ) = Σ(n=1..N) [zn ln Φ_J(μn | 0, R) + (1 − zn) ln(1 − Φ_J(μn | 0, R))],   (16)

where zn = yn1·yn2 ··· ynJ is observed.
• The information matrix for MPO is

  I_MPO(θ) = Σ(n=1..N) hn(θ)hn(θ)′ / {Φ_J(μn | 0, R)[1 − Φ_J(μn | 0, R)]},   (17)

where hn(θ) = ∂Φ_J(μn | 0, R)/∂θ.
• The additional information in MP over MPO is

  I_MP(θ) − I_MPO(θ) = Σ(n=1..N) [ Σ(i ≠ ι_J) gn(i | θ) − hn(θ)hn(θ)′ / (1 − Φ_J(μn | 0, R)) ],   (18)

where i = [i1, ..., iJ]′, ι_J = [1, ..., 1]′, hn(θ) = ∂Φ_J(μn | 0, R)/∂θ, and gn(i | θ) is given by (15).
  ◦ Clearly (18) is positive semidefinite.
  ◦ The internal sum in (18) is over the cells for yn that cannot be distinguished when zn = 0.
• MPO reflects unanimous consent among J agents.
  ◦ J = 2 (marriage)
  ◦ J = 12 (American criminal justice)
    – Zn = 1 (guilty)
    – Zn = 0 implies unanimous agreement on "not guilty" or a hung jury.
• The next section discusses conditions under which the information matrix (17) is positive definite, and hence θ is locally identified.
  ◦ But even when θ is identified, MPO is unlikely to provide much information on θ when J > 2.
  ◦ So it is natural to look at other types of partial information that involve more information than MPO, but less than MP.
• Suppose the complete set of J(J − 1)/2 bivariate products

  znij = yni·ynj (i, j = 1, ..., J; i < j)

are observed. BPO applied to all possible pairs in MP is defined to be multivariate bivariate partial observability (MBPO).
  ◦ MP uncovers 2^J cells for yn.
  ◦ MPO uncovers one cell, yn = [1, ..., 1]′, but cannot distinguish among the other 2^J − 1 cells.
  ◦ MBPO uncovers 2^J − J − 1 cells, but cannot distinguish among the J + 1 cells with all znij = 0 (i.e., the cell yn = 0_J and the J cells with exactly one ynj = 1).
  ◦ As J increases, the proportion of cells that can be distinguished increases.
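The cell counts in these bullets can be verified by brute force (standard library only): group the 2^J cells by the vector of pairwise products they imply, and count the singleton groups. A sketch (function names are mine):

```python
from collections import defaultdict
from itertools import combinations, product

def mbpo_signature(y):
    """All pairwise products z_ij = y_i * y_j (i < j) implied by cell y."""
    return tuple(y[i] * y[j] for i, j in combinations(range(len(y)), 2))

def mbpo_cell_counts(J):
    """Return (# cells MBPO uncovers, size of the largest confounded group)."""
    groups = defaultdict(list)
    for y in product((0, 1), repeat=J):
        groups[mbpo_signature(y)].append(y)
    uncovered = sum(1 for g in groups.values() if len(g) == 1)
    confounded = max(len(g) for g in groups.values())
    return uncovered, confounded

for J in (2, 3, 4, 5):
    print(J, mbpo_cell_counts(J))   # expect (2**J - J - 1, J + 1)
```

A cell with at least two components equal to one is pinned down uniquely by its pairwise products; the all-zeros cell and the J one-hot cells all map to the zero signature.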
• How interesting is the MBPO data situation?
  ◦ A data-collecting agency can use MBPO to design data releases that are only partially informative.
  ◦ Consider asking J binary (yes/no) questions, including a highly sensitive one in which ynJ = 0 is embarrassing.
    – If individuals are only asked whether all their answers equal one (MPO), or whether all answers on pairs of questions are one (MBPO), then a "no" answer does not imply ynJ = 0.
  ◦ Other suggestions?
• The marginal distribution of znij in MBPO is Bernoulli with

  Pr(znij = 1) = Φ2(xn′βi, xn′βj; ρij).

  ◦ Poirier (1980) gave conditions such that znij identifies βi, βj, and ρij.
  ◦ The joint sampling distribution of the znij = yni·ynj (i, j = 1, ..., J; i < j) provides a likelihood function for identifying θ.
    – Unlike using all possible BPs to estimate MP parameters [e.g., Kimhi (1994, AJAE)], MBPO uses the actual likelihood for the znij and not a quasi-likelihood.
      · "quasi" because the BPs are not independent.
    – MBPO involves J-dimensional integrals; BPO only requires two-dimensional integrals.
• If zn uncovers a specific cell, let dn = 1, and let dn = 0 otherwise.
  ◦ If dn = 1, then Pr(zn | θ) equals the probability of the uncovered cell.
  ◦ If dn = 0, then Pr(zn | θ) equals the sum of probabilities for all the cells that cannot be distinguished, i.e.,

  Pr(zn | θ) = Σ(i: all znij = 0) Pr(yn = i | θ).   (19)
• The MBPO log-likelihood is

  L_MBPO(θ) = Σ(n=1..N) ln Pr(zn | θ).   (20)

  ◦ The information matrix for MBPO, I_MBPO(θ), is obtained analogously.
  ◦ I_MBPO(θ) − I_MPO(θ) and I_MP(θ) − I_MBPO(θ) are positive semidefinite.
• Consider trivariate partial observability (TPO) (J = 3) [Konig (2003, Rev. World Econ.)].
  ◦ The additional information in trivariate probit (TP) over TPO is given by (18) with J = 3, where i = [i1, i2, i3]′ and gn(i | θ) is given by (15).
• Trivariate bivariate partial observability (TBPO) arises when only zn12 = yn1·yn2, zn13 = yn1·yn3, and zn23 = yn2·yn3 are observed.
  ◦ For some realizations of zn = [zn12, zn13, zn23]′ it is possible to recover the unobserved yn.
    – If zn12 = zn23 = 1, then yn = [1, 1, 1]′.
    – If zn12 = 1 and zn23 = 0, then yn = [1, 1, 0]′.
    – If zn12 = 0 and zn23 = 1, then yn = [0, 1, 1]′.
    – If zn12 = zn23 = 0 and zn13 = 1, then yn = [1, 0, 1]′.
    – Set dn = 1 if any of these four cases occurs.
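These four cases translate directly into code. A sketch (the function name is mine) that returns the recovered cell, or None when dn = 0:

```python
from itertools import product

def tbpo_recover(z12, z13, z23):
    """Recover y_n from (z_n12, z_n13, z_n23) when possible, else None."""
    if z12 and z23:
        return (1, 1, 1)
    if z12:                 # z23 = 0 forces y3 = 0
        return (1, 1, 0)
    if z23:                 # z12 = 0 forces y1 = 0
        return (0, 1, 1)
    if z13:                 # z12 = z23 = 0 forces y2 = 0
        return (1, 0, 1)
    return None             # d_n = 0: four cells remain indistinguishable

# Check the mapping against all eight cells for y_n.
for y in product((0, 1), repeat=3):
    z = (y[0] * y[1], y[0] * y[2], y[1] * y[2])
    print(y, z, tbpo_recover(*z))
```

Exactly the cells with at least two components equal to one are recovered, matching Table 4.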
    – Otherwise, set dn = 0; the four remaining cells cannot be distinguished when zn12 = zn13 = zn23 = 0 [see (19)]:

  [0, 0, 0]′, [1, 0, 0]′, [0, 1, 0]′, [0, 0, 1]′.   (23)

  ◦ See Table 4.
Table 4: Mapping zn12, zn13, zn23 → yn for TBPO

                    zn13 = 0                            zn13 = 1
           zn12 = 0              zn12 = 1        zn12 = 0    zn12 = 1
zn23 = 0   [0,0,0]′, [1,0,0]′,   [1,1,0]′        [1,0,1]′
           [0,1,0]′, [0,0,1]′
zn23 = 1   [0,1,1]′                                          [1,1,1]′

Note: Blank cells indicate impossible combinations.
• The joint distribution of zn is given in Table 5.
Table 5: Sampling Distribution for TBPO

zn12  zn13  zn23   Probability
 0     0     0     1 − Pr([1,1,0]′) − Pr([1,0,1]′) − Pr([0,1,1]′) − Pr([1,1,1]′)
 1     0     0     Pr(yn = [1, 1, 0]′)
 0     1     0     Pr(yn = [1, 0, 1]′)
 0     0     1     Pr(yn = [0, 1, 1]′)
 1     1     1     Pr(yn = [1, 1, 1]′)

Note: Other combinations of the znij are assigned probability zero.
• The TBPO log-likelihood is

  L_TBPO(θ) = Σ(n=1..N) { dn ln Pr(yn(zn) | θ) + (1 − dn) ln Pr(zn12 = zn13 = zn23 = 0 | θ) },   (24)

where yn(zn) is the cell recovered via Table 4 and the dn = 0 probability is the sum in (19).
  ◦ The information lost in going from TP to TBPO is I_TP(θ) − I_TBPO(θ).
  ◦ The information gained in going from TPO to TBPO is I_TBPO(θ) − I_TPO(θ).
• Just as in the case of MP, increasing J by one adds new parameters, but also the possibility of more information on βj (1 ≤ j ≤ J) provided ρj(J+1) ≠ 0. The same holds for partial observability.
  ◦ Going from J to J + 1 increases the number of parameters by K + J.
Identification

• Lindley (1971): "In passing it might be noted that unidentifiability causes no real difficulty in the Bayesian approach." (assuming a proper prior)
• Kadane (1974): "... identification is a property of the likelihood function, and is the same whether considered classically or from the Bayesian approach."
• Poirier (1988, ET): There is, however, no Bayesian free lunch. The "price" is that there exist quantities about which the data are uninformative, i.e., their marginal prior and posterior distributions are identical.
• Consider the bivariate case J = 2.
• The BPO likelihood is

  L_BPO(θ) = Π(n=1..N) [Φ2(xn′β1, xn′β2; ρ)]^zn [1 − Φ2(xn′β1, xn′β2; ρ)]^(1−zn).

• Because

  Φ2(xn′β1, xn′β2; ρ) = Φ2(xn′β2, xn′β1; ρ),

there is a labeling problem.
• Geweke (2007, Comp. Stat. & Data Anal.) discussed a similar labeling problem in the context of mixtures.
  ◦ If the function of interest is invariant to parameter permutation, then the labeling is not a problem.
    – An example is prediction of z.
    – However, obtaining posterior simulation convergence can be more challenging.
  ◦ If the function of interest involves, say, only β1, then the prior should also not be permutation invariant.
    – Different marginal priors for β1 and β2 should be used.
    – E.g., a dogmatic exclusion restriction on a component in β1.
  ◦ It is inherent in BPO that the researcher must introduce non-data-based information to distinguish between the two equations.
• Given restrictions R(θ) = 0, Poirier (1980, JoE) showed [using Rothenberg (1971, Econometrica)] that θ is locally identified provided

  rank [ I_BPO(θ)′ , (∂R(θ)/∂θ′)′ ]′ = M,

where the BPO information matrix is
where
• I_BPO(θ) is a weighted sum across observations of outer products of the gradient vectors ∂Φ2(xn′β1, xn′β2; ρ)/∂θ.
• Provided these gradients exhibit sufficient variation across observations and there is at least one exclusion restriction on β, I_BPO(θ) is nonsingular and θ is locally identified.
  ◦ If the excluded covariate is continuous, then the required variation is obtained.
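This rank argument can be checked numerically. The sketch below (my construction, using the standard closed-form derivatives of the bivariate normal cdf) accumulates outer products of the gradient vectors for a design in which a continuous covariate x2 enters equation 1 but is excluded from equation 2; the positive weights 1/[pn(1 − pn)] are dropped because positive diagonal weighting does not change the rank of a Gram matrix:

```python
import math

import numpy as np

def Phi(t):
    """Univariate standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    """Univariate standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def grad_Phi2(a, b, rho):
    """Closed-form gradient of Phi2(a, b; rho) w.r.t. (a, b, rho)."""
    s = math.sqrt(1.0 - rho * rho)
    da = phi(a) * Phi((b - rho * a) / s)
    db = phi(b) * Phi((a - rho * b) / s)
    drho = math.exp(-(a * a - 2.0 * rho * a * b + b * b)
                    / (2.0 * s * s)) / (2.0 * math.pi * s)
    return da, db, drho

def bpo_gram(x2, b11, b12, b21, rho):
    """Unweighted Gram matrix of the gradients, theta = (b11, b12, b21, rho);
    x2 is excluded from equation 2 (only an intercept b21 there)."""
    G = np.zeros((4, 4))
    for x in x2:
        da, db, dr = grad_Phi2(b11 + b12 * x, b21, rho)
        g = np.array([da, da * x, db, dr])  # chain rule for (b11, b12, b21, rho)
        G += np.outer(g, g)
    return G

full = bpo_gram(np.linspace(-2.0, 2.0, 50), 0.2, 0.5, -0.1, 0.3)
flat = bpo_gram(np.zeros(50), 0.2, 0.5, -0.1, 0.3)   # no variation in x2
print(np.linalg.matrix_rank(full), np.linalg.matrix_rank(flat))
```

With variation in the excluded continuous covariate the Gram matrix has full rank 4; with a degenerate design it collapses to rank 1, mirroring the intercept-only singularity discussed below.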
  ◦ The restriction ρ12 = 0 alone may or may not identify β1 and β2.
  ◦ The intermediate cases of Meng and Schmidt (1985, IER) (e.g., censored probit) are identified without any restrictions.
• Begin with the simplest case: no covariates.
• Japanese proverb: "Your garden is not complete until there's nothing more you can take out of it."
• Pearson (1900, Phil. Trans. Roy. Soc. of Lon., Series A) and Sheppard (1900, Trans. of the Cam. Phil. Soc.).
  ◦ K = 1, xn = 1 (n = 1, ..., N).
  ◦ ρ12 is the tetrachoric correlation coefficient.
  ◦ Now known as BP with no covariates.
  ◦ With full observability everything is identified. Life is great for BP!
Figure 1: BP Likelihood, β1 = 0, β2 = 0, ρ12 = 0
Figure 2: BP Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0
  ◦ If ρ12 ≠ 0, the MLE for BP is more efficient than univariate probit MLEs:

Cell Counts
            y1 = 0   y1 = 1
  y2 = 0     n00      n10
  y2 = 1     n01      n11

    – The MLE of pij = Pr(y1 = i, y2 = j) is nij/N. The MLEs of β1, β2, and ρ12 are

  β̂1 = Φ⁻¹((n10 + n11)/N),  β̂2 = Φ⁻¹((n01 + n11)/N),

and (implicitly)

  Φ2(β̂1, β̂2; ρ̂12) = n11/N.
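These closed-form MLEs take only a few lines to compute. A sketch (the cell counts are made-up illustrative values) using SciPy for Φ⁻¹ and Φ2, with ρ̂12 obtained by root-finding on the implicit equation, which is valid because Φ2 is increasing in ρ:

```python
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def phi2(a, b, rho):
    """Bivariate standard normal cdf with correlation rho."""
    return multivariate_normal.cdf([a, b], mean=[0.0, 0.0],
                                   cov=[[1.0, rho], [rho, 1.0]])

# Illustrative 2x2 cell counts under full observability (BP, no covariates).
n00, n10, n01, n11 = 30, 20, 15, 35
N = n00 + n10 + n01 + n11

b1_hat = norm.ppf((n10 + n11) / N)    # Phi(b1) = Pr(y1 = 1)
b2_hat = norm.ppf((n01 + n11) / N)    # Phi(b2) = Pr(y2 = 1)
# rho12 solves Phi2(b1_hat, b2_hat; rho) = n11/N implicitly.
rho_hat = brentq(lambda r: phi2(b1_hat, b2_hat, r) - n11 / N, -0.99, 0.99)
print(b1_hat, b2_hat, rho_hat)
```

With these counts n11/N exceeds the product of the marginal frequencies, so the implied tetrachoric correlation is positive.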
• For this intercept-only case, the sample information matrix for BPO is
where
C Clearly I_BPO(θ) is singular in this intercept-only case.
B The rank deficiency is 3 − 1 = 2. Knowing ρ12 does not identify β.
B β1, β2, and ρ12 are individually unidentified for BPO.
B Only the scalar Φ2(β1, β2; ρ12) is identified. The MLE of Φ2(β1, β2; ρ12) is the sample fraction of observations with zn = 1.
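A quick numerical check of this identification failure (my own illustration, again using a Simpson-quadrature approximation to Φ2): two quite different parameter points sharing the same Φ2 value imply identical BPO likelihoods for every data set, since the likelihood depends on θ only through p = Φ2(β1, β2; ρ12).

```python
from math import erf, exp, log, pi, sqrt
from statistics import NormalDist

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Phi2(a, b, rho, lo=-8.0, n=800):
    """Bivariate standard normal cdf by composite Simpson quadrature."""
    s = sqrt(1.0 - rho * rho)
    f = lambda x: exp(-0.5 * x * x) / sqrt(2.0 * pi) * Phi((b - rho * x) / s)
    h = (a - lo) / n
    acc = f(lo) + f(a)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return acc * h / 3.0

def bpo_loglik(b1, b2, rho, n1, N):
    """BPO log-likelihood: z_n = y1*y2 is Bernoulli(Phi2(b1, b2; rho))."""
    p = Phi2(b1, b2, rho)
    return n1 * log(p) + (N - n1) * log(1.0 - p)

# Two parameter points on the same level set of Phi2 (both give p = 0.25):
theta_a = (0.0, 0.0, 0.0)
theta_b = (0.5, NormalDist().inv_cdf(0.25 / Phi(0.5)), 0.0)
```

`bpo_loglik(*theta_a, 30, 100)` and `bpo_loglik(*theta_b, 30, 100)` agree to numerical precision, so the data cannot distinguish the two points.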
Figure 3: BPO Likelihood, β1 = 0, β2 = 0, ρ12 = 0
Figure 4: BPO Likelihood Contours, β1 = 0, β2 = 0, ρ12 = 0
C Clearly, without restrictions, BPO has a labeling problem.
B In fact, there are hyperplanes through the parameter space for which the likelihood is constant.
< β2 = ρ12 β1
< β1 = ρ12 β2
B Recall:
A “Peculiar” Case [Poirier (1980, JoE)]
Add a binary regressor to xn such that xn = [1, 0]′ for the first r observations, and xn = [1, 1]′ for the last N − r observations.
C Suppose β1 = [β11, 0]′ and β2 = [β21, β22]′, or in other words, R(θ) = β12 = 0 and ∂R/∂θ′ = [0, 1, 0, 0, 0].
B xn′β1 = β11 and xn′β2 = β21 (n = 1, ..., r)
B xn′β1 = β11 and xn′β2 = β21 + β22 (n = r + 1, ..., N).
C Under ρ12 = 0 the information matrix for BPO is
where
C Define
B δ′ I_BPO(θ) δ = 0, implying I_BPO(θ) is not positive definite.
B Therefore, θ is not identified if the excluded covariate is binary.
C Recall the “simple” case: there are three parameters β11, β21, ρ12, but only the scalar Φ2(β11, β21; ρ12) is identified with BPO.
B Now add β22, but there is only one additional cofactor value.
B Even with ρ12 = 0, β11, β21, and β22 are not individually identified, because the cofactor does not exhibit sufficient variation.
B If the cofactor also took on a third value, then β11, β21, and β22 would be individually identified if ρ12 = 0.
B If the cofactor is continuous, then no peculiar identification problem
arises.
B The non-linearity of ℓ_BPO(θ) provides the local identification.
C In the TPO (trivariate partial observability) case
where
B Identification depends on the nonsingularity of
which rests on sufficient variation of the gradient over observations.
B In the intercept-only case the only thing identified is the scalar Φ3(β1, β2, β3; ρ12, ρ23, ρ13).
B Unlike the rank deficiency of two in the bivariate case, the rank
deficiency is 6 - 1 = 5 in the trivariate case.
(17)
B Also there are labeling problems, with information matrix (17) becoming singular if β2 = ρ12 β1, β1 = ρ12 β2, β3 = ρ13 β1, β1 = ρ13 β3, β2 = ρ23 β3, or β3 = ρ23 β2.
B Following the discussion of the BPO case, let xn = [1, xn2, xn3]′ and suppose β1 = [β11, 0, 0]′, β2 = [β21, β22, 0]′, and β3 = [β31, 0, β33]′.
< In other words, two continuous covariates xn2 and xn3 are introduced together with the four restrictions β12 = β13 = β23 = β32 = 0.
< For the moment also suppose ρ = 03.
< Dropping the last three elements in (36) yields
Computation Issues
C The MP likelihood function requires computation of ΦJ(μn; ρ).
C J ≤ 4 in applications in health, labor, or education economics.
C Lesaffre and Kaufmann (1992, JASA) showed:
B ΦJ(μn; ρ) is strictly concave in β given ρ.
B ΦJ(μn; ρ) is not strictly concave in ρ.
B The MLE is consistent and asymptotically normal provided no perfect classifiers exist.
B They did not really address computational issues.
C Freedman and Sekhon (2010, Pol. Anal., p. 145) note numerical problems
with bivariate probit.
B Among the features they note (as functions of ρ12):
< strictly monotone in ρ12.
< convex on (−1, 0).
< concave on (0, 1).
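The monotonicity is easy to verify at the origin, where the quadrant probability has the closed form Φ2(0, 0; ρ12) = 1/4 + arcsin(ρ12)/(2π) (a standard bivariate normal identity; this check is mine, not Freedman and Sekhon's):

```python
from math import asin, pi

def quadrant_prob(rho):
    """Phi2(0, 0; rho) = 1/4 + arcsin(rho)/(2*pi): the standard bivariate
    normal quadrant probability, strictly increasing in rho on (-1, 1)."""
    return 0.25 + asin(rho) / (2.0 * pi)

# Evaluate on a grid to see the strict monotonicity in rho12.
grid = [i / 10.0 for i in range(-9, 10)]
vals = [quadrant_prob(r) for r in grid]
```

For example quadrant_prob(0.5) = 1/3 exactly, and vals is strictly increasing across the grid.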
C Huguenin, Pelgrin and Holly (2009) provide an exact decomposition of
the cdf of the J-variate (standardized) normal vector encountered in the
likelihood function of a multivariate probit model.
B They obtain a sum of multivariate integrals in which the highest dimension of the integrands is J − 1.
< Integration domains are bounded, simplifying the integration.
< Based on Plackett (1954, Biometrika).
B J = 4 is feasible.
B They also consider the singular case.
C Gassmann (2003, JCGS) also employed a recursive numerical method
based on Plackett (1954, Biometrika).
B Argues it works for J ≤ 10.
B Drezner (1994) and Genz (2004) used the method for J = 3.
C Gassmann, Deák, and Szántai (2002, JCGS) described and compared
numerical methods for finding multivariate probabilities over a rectangle.
B Computation times depend on J, the correlation structure, the
magnitude of the sought probability, and the required accuracy.
B Numerical tests were conducted on approximately 3,000 problems
generated randomly in up to J = 20 dimensions.
B The findings indicate that direct integration methods give acceptable
results for up to J = 12, provided that the probability mass of the
rectangle is not too large (less than about 0.9).
B For problems with small probabilities (less than 0.3) a crude Monte
Carlo method gives reasonable results quickly, while bounding
procedures perform best on problems with large probabilities (> 0.9).
B For larger problems, numerical integration with quasi-random Korobov points and a decomposition method due to Deák work best.
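The crude Monte Carlo approach is easy to illustrate (my own sketch, benchmarked against the closed-form quadrant probability rather than any of the cited methods):

```python
import random
from math import asin, pi, sqrt

def mc_quadrant_prob(rho, draws=200_000, seed=7):
    """Crude Monte Carlo estimate of P(X1 <= 0, X2 <= 0) for a standard
    bivariate normal with correlation rho, built from two independent
    normals via X2 = rho*Z1 + sqrt(1 - rho^2)*Z2."""
    random.seed(seed)
    s = sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(draws):
        z1 = random.gauss(0.0, 1.0)
        z2 = rho * z1 + s * random.gauss(0.0, 1.0)
        hits += (z1 <= 0.0) and (z2 <= 0.0)
    return hits / draws

exact = 0.25 + asin(0.5) / (2.0 * pi)   # = 1/3 for rho = 0.5
approx = mc_quadrant_prob(0.5)
```

With 200,000 draws the estimate lands within a few tenths of a percent of 1/3; since the Monte Carlo error shrinks only as O(draws^(-1/2)), crude MC suits small target probabilities and modest accuracy, matching the findings above.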
C Factor structures for the correlation matrix.
B Integration problems are typically greatly reduced.
B Butler and Moffitt (1982, Econometrica): equicorrelated.
B Ochi and Prentice (1984, Biometrika): equicorrelated.
B Hausman and Wise (1978, Econometrica): J = 3.
B Muthén (1979, JASA): J = 4.
B Hausman (1980, Cahiers du Séminaire d'Économétrie)
B Bock and Gibbons (1996, Biometrics)
B Gibbons and Wilcox-Gök (1998, JASA): MLE, J = 5.
B Song and Lee (2005, Statistica Sinica): MLE, J = 4.
B Ansari and Jedidi (2000, Psychometrika): Bayesian, J = 12.
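The computational payoff of an equicorrelated (one-factor) structure can be sketched as follows (my own illustration, not code from the papers above): writing Xj = sqrt(ρ)·U + sqrt(1 − ρ)·εj with independent standard normals collapses the J-dimensional cdf to a one-dimensional integral, so the cost grows linearly in J instead of exponentially.

```python
from math import erf, exp, pi, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def equicorrelated_cdf(b, rho, lo=-8.0, hi=8.0, n=1000):
    """P(X1 <= b1, ..., XJ <= bJ) for an equicorrelated multivariate normal
    (0 <= rho < 1), via the one-factor identity
    Phi_J(b; rho) = int phi(u) prod_j Phi((b_j - sqrt(rho)*u)/sqrt(1-rho)) du,
    evaluated with composite Simpson quadrature."""
    sr, s = sqrt(rho), sqrt(1.0 - rho)

    def f(u):
        acc = exp(-0.5 * u * u) / sqrt(2.0 * pi)
        for bj in b:
            acc *= Phi((bj - sr * u) / s)
        return acc

    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0
```

For J = 2, b = (0, 0), ρ = 0.5 this reproduces the arcsin value 1/3, and for ρ = 0 it factors into the product of univariate cdfs; this is the kind of dimension reduction the equicorrelated references above exploit.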
C Bayesians must sample from a non-tractable posterior that is proportional to the product of the prior and the MP likelihood.
B Data augmentation to the rescue! [Chib and Greenberg (1998, Biometrika), Albert and Chib (1993, JASA), McCulloch and Rossi (1994, JoE)].
C It is possible to sample from the posterior without computing the
likelihood function!
B Bayesian examples:
< Czado (1996): J = 5.
< Dey and Chen (1996), Chen and Dey (1998, Sankhyā): J = 6.
< Chib and Greenberg (1998): J = 7.
< Voicu and Buddelmeyer (2003, IZA): J = 6.
< Edwards and Allenby (2003, JMR): J = 25, equicorrelated.
< García-Zattera, Jara, Lesaffre and Declerck (2005): J = 8.
< Stefanescu and Turnbull (2005, Biom. J.): equicorrelated.
< Tabet (2007): J = 8.
< Lawrence et al. (2008): J = 6.
< Rosas and Shomer (2008, Leg. Stud. Quar.)
< Duvvuri and Gruca (2010, Psychometrika): J = 4.
< Hahn, Carvalho and Scott (2010): J = 100, six factors.
C MCMC methods have difficulty dealing with the generation of draws
from the correlation matrix, because its conditional density is of
nonstandard form.
B Methods have been developed to generate draws using variants of
< Metropolis-Hastings algorithms [Chib and Greenberg (1998,
Biometrika); Nierop et al. (2000)].
< Griddy-Gibbs methods [Barnard, McCulloch, and Meng (2000);
Liechty, Ramaswamy, and Cohen (2000)].
B These approaches are computationally intensive and therefore are of limited use when applied to high-dimensional problems.
C One contested issue is whether to work in terms of
B the unidentified covariance matrix [Edwards and Allenby (2003, JMR),
McCulloch and Rossi (1994, JoE), Rossi, Allenby, and McCulloch
(2005, Bayesian Statistics and Marketing)]
B the identified correlation matrix [Chib and Greenberg (1998,
Biometrika)]
Priors
C Given the information loss in BPO, an informative Bayesian analysis is
an attractive option for supplementing sample information.
B Identification of θ in the BPO model is often weak, and what is
needed is an informative prior with some public appeal, not a
“noninformative” prior.
B Below is a prior family which should be attractive in many situations.
B Later I will suggest a restricted version depending on only four easily
interpreted hyperparameters.
C Assume β = [β1′, β2′]′ ⊥ ρ with joint pdf
where K = K1 + K2.
B This is a conjugate multivariate normal prior for β given ρ.
B The prior pdf f(ρ) is the symmetric beta
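One standard way to realize a symmetric beta prior on (−1, 1) (my parametrization, since the slide's formula did not survive extraction) is ρ = 2u − 1 with u ~ Beta(a, a), giving f(ρ) ∝ (1 − ρ²)^(a−1):

```python
from math import gamma

def sym_beta_pdf(rho, a):
    """pdf of rho = 2u - 1 with u ~ Beta(a, a):
    f(rho) = (1 - rho^2)^(a-1) / (2^(2a-1) * B(a, a)) on (-1, 1).
    a = 1 is uniform; larger a concentrates mass near rho = 0."""
    if abs(rho) >= 1.0:
        return 0.0
    beta_aa = gamma(a) * gamma(a) / gamma(2.0 * a)
    return (1.0 - rho * rho) ** (a - 1.0) / (2.0 ** (2.0 * a - 1.0) * beta_aa)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule (n even)."""
    h = (hi - lo) / n
    acc = f(lo) + f(hi)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return acc * h / 3.0
```

Under this parametrization the density integrates to one and has variance 1/(2a + 1), so a directly controls how tightly the prior concentrates around independence (ρ = 0).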
Posterior
C The posterior pdf of θ is unfortunately analytically intractable, so Bayesian analysis rests on the ability to draw a sample from it.
B BPO can be viewed as a binary outcome model in which the link
function is a bivariate standard normal cdf.
B Thus consider the data-augmented posterior, where
B Set starting values and r = 0, and proceed as follows.
Step 1: Consider the conditional distribution of the latent data. The only difference from the standard data augmentation case is that here we must sample from a truncated bivariate normal distribution. Let
Set r = r + 1 and sample either
or
Step 2: Draw β from its conditional distribution, where
Step 3: Draw ρ from its conditional distribution, where
The random walk Metropolis-Hastings algorithm is a convenient way to
sample from this univariate distribution.
C Write ρ†(r) = ρ(r − 1) + u, where ρ†(r) is the candidate value, ρ(r − 1) is the current value, and u ~ N(0, N⁻¹).
C Let
Generate a proposed value ρ†(r) by drawing u. Put ρ(r) = ρ†(r) with probability α(ρ†, ρ), and leave ρ(r) = ρ(r − 1) with probability 1 − α(ρ†, ρ).
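A minimal sketch of this random-walk Metropolis-Hastings step (my own code; for illustration the target is just the symmetric beta prior for ρ, standing in for the intractable conditional posterior):

```python
import math
import random

def log_target(rho, a=3.0):
    """Log of a symmetric beta prior f(rho) proportional to (1-rho^2)^(a-1),
    standing in for the conditional density of rho (up to a constant)."""
    if abs(rho) >= 1.0:
        return float("-inf")
    return (a - 1.0) * math.log(1.0 - rho * rho)

def rw_metropolis_rho(R=20000, step=0.3, seed=1):
    """Random-walk Metropolis-Hastings: propose rho' = rho + u with
    u ~ N(0, step^2); accept with probability min(1, f(rho')/f(rho))."""
    random.seed(seed)
    rho, draws = 0.0, []
    for _ in range(R):
        prop = rho + random.gauss(0.0, step)
        log_alpha = log_target(prop) - log_target(rho)
        if random.random() < math.exp(min(0.0, log_alpha)):
            rho = prop          # accept the candidate
        draws.append(rho)       # otherwise keep the current value
    return draws
```

With a = 3 the draws have mean near 0 and variance near 1/(2a + 1) = 1/7; in the actual Step 3 the same accept/reject recipe is applied with the conditional posterior of ρ as the target.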
C Go back to Step 1 and repeat until convergence is obtained. As R → ∞, the resulting draws converge in distribution to the augmented posterior.
C Given a sample of draws (r = 1, 2, ..., R), it is then possible to compute posterior expectations of a quantity of interest, say, h(β1, β2, ρ).
Examples are:
B Moments of parameters.
B Posterior probabilities of the signs of parameters.
B Predictive density corresponding to covariates
and
(i, j = 0, 1).
Restricted Prior
C The following restrictions should be attractive in many cases. Their appeal rests on the ease of choosing the hyperparameters.
C Suppose
B the first element in xn1 and xn2 is unity,
B any continuous covariates are measured as deviations from their
sample means divided by sample standard deviations, and
B dummy variables are left as is.
B These conventions guarantee a simple interpretation for the intercepts and render the coefficients unitless, facilitating prior elicitation.
C Suppose , where (i = 1, 2), and
B This prior centers the slopes over zero, leaving specification of the hyperparameters to center the intercepts.
B One way of choosing the intercept hyperparameters (j = 1, 2) is to consider the average observation, subjectively assess Prob(ynj = 1) there, and then choose accordingly.
B The hyperparameter controls the prior precision for β relative to the precision in OLS applied separately to the latent system.
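The elicitation map for the intercept hyperparameters is just the inverse probit link (the assessed probability below is hypothetical):

```python
from statistics import NormalDist

def intercept_prior_mean(p):
    """If covariates are standardized so all slopes contribute zero at the
    average observation, then Prob(ynj = 1) = Phi(intercept) there, so an
    assessed probability p maps to an intercept prior mean of Phi^{-1}(p)."""
    return NormalDist().inv_cdf(p)

# e.g. assessing Prob(ynj = 1) = 0.7 at the average observation:
m_j = intercept_prior_mean(0.7)
```

An assessment of 0.5 maps to a zero intercept, matching the slope-centering convention above.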
C The remaining two hyperparameters a and c are candidates for a sensitivity
analysis reflecting the strength of the prior.
B The hyperparameter determines the prior precision of ρ.
B Centering the prior for ρ over zero (implying the two binary decisions are independent) is often a convenient reference point.
B This prior has many similarities with priors in Adkins and Hill (1996),
Zellner (1986), and Zellner and Rossi (1984, JoE), but applied here to
bivariate probit.
C Note ( j = 1, 2), where
B The posterior mean slopes shrink the OLS slopes toward zero, and the posterior mean intercept shrinks the OLS intercept toward its prior mean.
B Centering the slopes over non-zero values, or changing the prior precision matrix to an arbitrary one, does not complicate matters.
Concluding Comments
C multivariate-t [Chib (2000), Dey and Chen (1996, 2008)]
C Chen and Dey (1998, Sankhyā): scale mixture of multivariate normal link.
B Unlike Chib and Greenberg (1998, Biometrika), they use a Metropolized hit-and-run algorithm (Chen and Schmeiser, 1993).
C Chen and Shao (1999, JMA) considered multivariate data where some are
binary and others are ordinal; Bayesian, J = 3.
C multivariate multinomial [Zhang, Boscardin, and Belin (2008, CSDA)]
C Correlated binary response data with covariates:
B Liang and Zeger (1986, Biometrika)
B Zeger and Liang (1986, Biometrics)
B Prentice (1988, Biometrics)
B Tan, Qu and Kutner (1997, Comm. Stat.)
B Carey, Zeger and Diggle (1993, Biometrika)
B Glonek and McCullagh (1995, JRSSB).