Running Markov Chain with and without Markov bases

Hisayuki Hara

Niigata University

May 20, 2014
Algebraic Statistics 2014 @ Illinois Institute of Technology

This talk is based on joint works with R. Yoshida, A. Takemura, S. Aoki, T. Sei and T. Akasaka

H. Hara (Niigata U) Running Markov Chain w and w /o MB May 20, 2014 1 / 46

Contents

1 Markov bases
2 MB technologies for discrete logistic regression model
3 Lattice basis and exact test
4 Lattice basis of THMC model
5 Conclusion


Contingency tables

x := {x(i) | i ∈ I} : m-way contingency table.
P(x) : full exponential family.
b : sufficient statistic.
P(x | b) is the hypergeometric distribution.
Fiber F_b : the set of contingency tables sharing b.
Problem : sampling tables from F_b according to the hypergeometric distribution.


moves

When we regard x as an |I|-dimensional vector, x and b have a linear relation

Ax = b

A : configuration matrix

move z : an element of the integer kernel of A

Az = 0

By adding or subtracting moves, we can move around in Fb.

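\[
\begin{array}{ccc|c}
5 & 3 & 2 & 10\\
4 & 2 & 4 & 10\\
1 & 5 & 4 & 10\\ \hline
10 & 10 & 10 & 30
\end{array}
\;+\;
\begin{array}{ccc|c}
1 & -1 & 0 & 0\\
-1 & 1 & 0 & 0\\
0 & 0 & 0 & 0\\ \hline
0 & 0 & 0 & 0
\end{array}
\;=\;
\begin{array}{ccc|c}
6 & 2 & 2 & 10\\
3 & 3 & 4 & 10\\
1 & 5 & 4 & 10\\ \hline
10 & 10 & 10 & 30
\end{array}
\]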
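
The table example above can be checked numerically. The following is a minimal sketch, assuming the sufficient statistic b consists of the row and column sums of a 3 × 3 table; the helper name `margins` is hypothetical.

```python
# 3 x 3 table and move taken from the example on this slide.
x = [[5, 3, 2], [4, 2, 4], [1, 5, 4]]
z = [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]

def margins(t):
    """Row sums and column sums of a two-way table (the assumed statistic b)."""
    rows = [sum(r) for r in t]
    cols = [sum(c) for c in zip(*t)]
    return rows, cols

# Adding the move cell by cell gives a new table y = x + z.
y = [[a + b for a, b in zip(rx, rz)] for rx, rz in zip(x, z)]

print(y)
print("margins of z:", margins(z))                 # all zero, i.e. Az = 0
print("same fiber:", margins(x) == margins(y))
```

Because every margin of z is zero and y has no negative cell, y lies in the same fiber as x.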


Algebraic description of moves

move z : z = z^+ − z^-

binomial corresponding to z :

\[
u^{z^+} - u^{z^-}, \qquad
u^{z^+} = \prod_{i} u(i)^{z^+(i)}, \quad
u^{z^-} = \prod_{i} u(i)^{z^-(i)}
\]

z^+ = {z^+(i)} : positive part
z^- = {z^-(i)} : negative part
i : a cell of x


Markov basis

Definition (Toric ideal)

\[
I_A := \langle\, u^{z^+} - u^{z^-} \;;\; Az = 0 \,\rangle .
\]

Theorem (Diaconis and Sturmfels (1998))

B is a MB of A ⇔ {u^{z^+} − u^{z^-} ; z ∈ B} generates I_A.

Algebraically, a MB is defined as a set of generators of the toric ideal associated with A.


Markov basis

A Markov basis B is a set of moves connecting "every" fiber F_b.

Any two tables in the same fiber are mutually accessible by moves in the MB: for x, y ∈ F_b,

\[
\exists\, z_1, \dots, z_K \in B \ \text{ s.t. } \
y = x + \sum_{k=1}^{K} z_k, \qquad
x + \sum_{k=1}^{K'} z_k \ge 0 \ \text{ for all } K' \le K .
\]


Diaconis-Sturmfels Algorithm

This connectivity, combined with the standard Metropolis-Hastings procedure, enables us to sample contingency tables from an irreducible Markov chain whose stationary distribution is the hypergeometric distribution (Diaconis and Sturmfels, 1998).

[Diaconis and Sturmfels algorithm]

Step 0. x : contingency table, Pr(x | b) : hypergeometric distribution.
Step 1. Sample z ∈ B randomly.
Step 2. If x + z ≥ 0, set x ← x + z with probability

\[
\min\left( \frac{\Pr(x + z \mid b)}{\Pr(x \mid b)},\ 1 \right).
\]

Go to Step 1.
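
The steps above can be sketched in code. This is an illustrative sketch only, under assumptions not stated on the slide: the model is the two-way independence model (so b is the row and column sums), B is the set of basic moves on 2 × 2 minors, and Pr(x | b) is proportional to 1/∏ x(i)!. The function names are hypothetical.

```python
import math
import random

def log_weight(x):
    """Log of the unnormalised hypergeometric probability: -sum of log x(i)!."""
    return -sum(math.lgamma(v + 1) for row in x for v in row)

def random_basic_move(n_rows, n_cols, rng):
    """Basic move of the independence model: +1/-1 pattern on a 2 x 2 minor."""
    i1, i2 = rng.sample(range(n_rows), 2)
    j1, j2 = rng.sample(range(n_cols), 2)
    return {(i1, j1): 1, (i2, j2): 1, (i1, j2): -1, (i2, j1): -1}

def ds_step(x, rng):
    """One pass through Step 1 and Step 2 of the algorithm."""
    z = random_basic_move(len(x), len(x[0]), rng)
    y = [row[:] for row in x]
    for (i, j), v in z.items():
        y[i][j] += v
    if any(y[i][j] < 0 for (i, j) in z):
        return x                      # x + z has a negative cell: stay at x
    accept = math.exp(min(0.0, log_weight(y) - log_weight(x)))
    return y if rng.random() < accept else x

rng = random.Random(0)
x = [[5, 3, 2], [4, 2, 4], [1, 5, 4]]
for _ in range(2000):
    x = ds_step(x, rng)
print(x)   # a table with the same row and column sums as the start
```

The proposal is symmetric (z and −z are equally likely), so the acceptance probability reduces to the probability ratio in Step 2.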


Difficulty of implementation of exact test with MB

A Markov basis enables us to implement an exact test.
There are algebraic algorithms for computing a Markov basis.
Some software packages implement them, e.g. 4ti2.

However, the computational cost of these algorithms is considerably high.
In general, the structure of a Markov basis is very complicated, and it is also not easy to derive an explicit list of Markov basis elements theoretically.
Here we introduce some alternative methods for running a Markov chain without using a Markov basis:

1 A Markov subbasis
2 A lattice basis


Logistic regression models

p_{1j} : incidence rate for covariate level j = 1, 2, ..., J
p_{2j} = 1 − p_{1j}

logistic regression model with one covariate

\[
\log\left( \frac{p_{1j}}{1 - p_{1j}} \right) = \mu + \alpha j
\]

p_{1jk} : incidence rate for covariate levels (j, k) = (1, 1), ..., (J, K)
p_{2jk} = 1 − p_{1jk}

logistic regression model with two covariates

\[
\log\left( \frac{p_{1jk}}{1 - p_{1jk}} \right) = \mu + \alpha j + \beta k
\]


Logit model with one covariate

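sufficient statistic : $x_{1+}$, $\sum_{j=1}^{J} j\, x_{1j}$.
$x_{+1}, \dots, x_{+J}$ (the numbers of trials) are also fixed.

\[
b := \Bigl( x_{1+},\ x_{+1}, \dots, x_{+J},\ \sum_{j=1}^{J} j\, x_{1j} \Bigr).
\]

Define a fiber F_b as the set of tables sharing b.
Configuration Λ(A) :

\[
\Lambda(A) = \begin{pmatrix} A & 0 \\ E_J & E_J \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & J \end{pmatrix}.
\]

E_J : J × J identity matrix.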
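
As a small illustration of the block structure of Λ(A), the sketch below builds the matrix for J = 4 with plain Python lists. It assumes the cells are ordered as (x_{11}, ..., x_{1J}, x_{21}, ..., x_{2J}); the function names are hypothetical.

```python
def lawrence_lifting(A):
    """Stack (A 0) on top of (E E), where E is the identity of size n = #columns of A."""
    n = len(A[0])
    top = [list(row) + [0] * n for row in A]
    bottom = [[1 if c == r else 0 for c in range(n)] * 2 for r in range(n)]
    return top + bottom

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

J = 4
A = [[1] * J, list(range(1, J + 1))]
Lam = lawrence_lifting(A)

# A 2 x J table flattened row by row: first x_{1j}, then x_{2j}.
x = [2, 0, 1, 3,   1, 4, 2, 0]
# Entries: x_{1+}, sum_j j*x_{1j}, then the column totals x_{+1}, ..., x_{+J}.
print(matvec(Lam, x))
```

The first two entries of the product are the sufficient statistic of the logit model and the remaining J entries are the numbers of trials.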


Markov basis for Poisson regression

Λ(A) : Lawrence lifting of A

\[
A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & J \end{pmatrix}.
\]

A : configuration matrix for the Poisson regression model

\[
x_j \sim \mathrm{Po}(\mu_j), \qquad \log \mu_j = \mu + \alpha j, \qquad j = 1, \dots, J.
\]

Theorem (HTY (2010))

The set of all degree 2 moves

\[
\begin{array}{cccc}
j_1 & j_1 + k & j_2 - k & j_2 \\ \hline
1 & -1 & -1 & 1
\end{array}
\qquad 1 \le j_1 < j_2 \le J
\]

forms the minimum-fiber Markov basis for Poisson regression with one covariate.
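
The degree 2 moves in the theorem can be enumerated directly. The sketch below assumes that k runs over the integers with k ≥ 1 and j_1 + k ≤ j_2 − k (when the two inner positions coincide, that cell receives −2); this range is an assumption, since the slide does not state it.

```python
def degree2_moves(J):
    """All moves +1 at j1 and j2, -1 at j1 + k and j2 - k (1-based positions)."""
    moves = []
    for j1 in range(1, J + 1):
        for j2 in range(j1 + 1, J + 1):
            k = 1
            while j1 + k <= j2 - k:
                z = [0] * J
                for j, v in ((j1, 1), (j1 + k, -1), (j2 - k, -1), (j2, 1)):
                    z[j - 1] += v
                moves.append(z)
                k += 1
    return moves

def in_kernel(z):
    """Check Az = 0 for A = (1 ... 1; 1 2 ... J)."""
    return sum(z) == 0 and sum((j + 1) * v for j, v in enumerate(z)) == 0

moves = degree2_moves(4)
print(moves)
print(all(in_kernel(z) for z in moves))
```

Each move keeps both Σ x_j and Σ j x_j unchanged, which is exactly the condition Az = 0.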


Lattice basis for logit model

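Lemma (e.g. HAT2012)

The set of moves

\[
z(j_1, j_2; k) :=
\begin{array}{c|cccc}
 & j_1 & j_1 + k & j_2 - k & j_2 \\ \hline
i = 1 & 1 & -1 & -1 & 1 \\
i = 2 & -1 & 1 & 1 & -1
\end{array}
\qquad 1 \le j_1 < j_2 \le J
\]

forms a (not necessarily minimal) lattice basis for the logit model with one covariate.

Lattice basis : a basis of $\ker \Lambda(A) \cap \mathbb{Z}^{|I|}$.
Degree 4 moves are the minimum degree moves for the logit model.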


Lattice basis for logit model

Theorem (Sturmfels (1996))

The maximum degree of moves in a minimal Markov basis is 2J − 2.

logit model ⇔ homogeneous primitive partition identity

Corollary

The set of all moves (integer linear combinations of degree 4 moves) of degree at most 2J − 2 forms a Markov basis for the logit model with one covariate.


Markov subbasis for the model with one covariate

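$x_{+j}$ : number of trials.
The $x_{+j}$ are usually fixed by the sampling scheme, and sometimes we can assume $x_{+j} > 0$.

Theorem (Chen et al. (2005), HTY (2010))

The set of moves

\[
z(j_1, j_2; k) :=
\begin{array}{c|cccc}
 & j_1 & j_1 + k & j_2 - k & j_2 \\ \hline
i = 1 & 1 & -1 & -1 & 1 \\
i = 2 & -1 & 1 & 1 & -1
\end{array}
\qquad 1 \le j_1 < j_2 \le J
\]

connects every fiber satisfying $(x_{+1}, \dots, x_{+J}) > 0$.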


Model with two covariates


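\[
\log\left( \frac{p_{1jk}}{1 - p_{1jk}} \right) = \mu + \alpha j + \beta k
\]

sufficient statistic : $x_{1++}$, $\sum_{j=1}^{J} j\, x_{1j+}$, $\sum_{k=1}^{K} k\, x_{1+k}$.
$x_{+jk}$, j = 1, ..., J, k = 1, ..., K (the numbers of trials) are also fixed.

\[
b := \Bigl( x_{1++},\ x_{+11}, \dots, x_{+JK},\ \sum_{j=1}^{J} j\, x_{1j+},\ \sum_{k=1}^{K} k\, x_{1+k} \Bigr).
\]

F_b : the set of tables sharing b.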



Configuration Λ(A⊗B)

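\[
\Lambda(A \otimes B) = \begin{pmatrix} A \otimes B & 0 \\ E_{JK} & E_{JK} \end{pmatrix}
\]

\[
A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & J \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & K \end{pmatrix}.
\]

E_{JK} : JK × JK identity matrix.
A ⊗ B : Segre product.

A ⊗ B is a configuration for Poisson regression

\[
x_{jk} \sim \mathrm{Po}(\mu_{jk}), \qquad \log \mu_{jk} = \mu + \alpha j + \beta k .
\]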



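In general, Poisson regression with m covariates

\[
x(i) \sim \mathrm{Po}(\mu(i)), \qquad
\log \mu(i) = \mu + \alpha_1 i_1 + \alpha_2 i_2 + \cdots + \alpha_m i_m
\]

has a Markov basis consisting of only degree 2 moves (HTY2010).
The logit model, including the multinomial logit model, has a lattice basis consisting of only degree 4 moves.

In the case of the model with two covariates, the degree 4 moves have the form

\[
i = 1:\quad
\begin{array}{c|cccc}
 & j_1 & j_1 + c & j_2 - c & j_2 \\ \hline
k_1 & 1 & 0 & 0 & 0 \\
k_1 + d & 0 & -1 & 0 & 0 \\
k_2 - d & 0 & 0 & -1 & 0 \\
k_2 & 0 & 0 & 0 & 1
\end{array}
\qquad
i = 2:\quad
\begin{array}{c|cccc}
 & j_1 & j_1 + c & j_2 - c & j_2 \\ \hline
k_1 & -1 & 0 & 0 & 0 \\
k_1 + d & 0 & 1 & 0 & 0 \\
k_2 - d & 0 & 0 & 1 & 0 \\
k_2 & 0 & 0 & 0 & -1
\end{array}
\]

A sharp bound on the Markov degree is not known.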


Markov subbasis for the model with two covariates

Theorem (HTY (2010))

The set of all degree 4 moves defined on the previous slide connects every fiber satisfying $(x_{+11}, \dots, x_{+JK}) > 0$.

When every covariate is a dummy variable, this theorem also extends to the model with three covariates (AHT (2012)).
However, the extension to models with more than three covariates, or to models with multinomial responses, seems to be difficult at this point.


Lattice basis

x : contingency table
b : sufficient statistic
A : configuration

Ax = b

M : the set of moves

\[
M := \ker A \cap \mathbb{Z}^{|I|} = \{ z \in \mathbb{Z}^{|I|} \mid Az = 0 \}
\]

l = dim ker A

A lattice basis L := {z_1, ..., z_l} is a basis of M.


Exact test with Lattice basis

The computation of L is easy.
L itself does not guarantee the connectivity of every fiber.
Every move z can be written as an integer linear combination of elements of L:

\[
z = \alpha_1 z_1 + \cdots + \alpha_l z_l, \qquad \alpha_k \in \mathbb{Z}.
\]

If we generate moves in such a way that every integer combination of elements of a lattice basis has positive probability, then we can indeed guarantee the connectivity of every fiber (Diaconis and Sturmfels (1998), HAT2012).


An algorithm for generating moves

Lattice basis L := {z_1, ..., z_l}.

1 Draw $|\alpha_k| \overset{\text{iid}}{\sim} \mathrm{Po}(\lambda)$ (exclude the case $|\alpha_1| = \cdots = |\alpha_l| = 0$).
2 Set $\alpha_k \leftarrow |\alpha_k|$ or $\alpha_k \leftarrow -|\alpha_k|$ with probability 1/2 each, for k = 1, ..., l.
3 Compute a move z by

\[
z = \alpha_1 z_1 + \cdots + \alpha_l z_l .
\]
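
The three steps can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: the lattice basis shown is a hypothetical example for the configuration A = (1 1 1 1; 1 2 3 4), and the Poisson sampler uses Knuth's multiplication method, which is only suitable for moderate λ.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method for one Poisson(lam) draw."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def random_move(basis, lam, rng):
    """Random integer combination of lattice basis elements."""
    while True:                                   # step 1: redraw if all zero
        alpha = [poisson(lam, rng) for _ in basis]
        if any(alpha):
            break
    # step 2: attach a random sign to each coefficient
    alpha = [a if rng.random() < 0.5 else -a for a in alpha]
    # step 3: z = alpha_1 z_1 + ... + alpha_l z_l
    n = len(basis[0])
    return [sum(a * zk[c] for a, zk in zip(alpha, basis)) for c in range(n)]

A = [[1, 1, 1, 1], [1, 2, 3, 4]]
basis = [[1, -2, 1, 0], [0, 1, -2, 1]]            # example lattice basis of ker A
rng = random.Random(0)
z = random_move(basis, 1.0, rng)
print(z, [sum(a * v for a, v in zip(row, z)) for row in A])
```

Every integer combination has positive probability under this scheme, which is the property the preceding slide requires for connectivity.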


Exact test for trinomial logit model

Trinomial logit model with two covariates

\[
p_{ijk} = \frac{\exp(\mu + \alpha_i j + \beta_i k)}{1 + \sum_{i=1}^{I-1} \exp(\mu + \alpha_i j + \beta_i k)},
\qquad i = 1, 2, 3, \quad j = 1, \dots, J, \quad k = 1, \dots, K.
\]

A MB is not known.
There exists a LB consisting of degree 4 moves.
The number of all degree 4 moves >> l = dim ker A.
Here we regard all degree 4 moves as a lattice basis.


Simulation

We set J = 4, K = 4.
The cell frequency is five on average.
H1 is the model with three covariates.
The test statistic is the LR statistic.
The asymptotic χ² distribution has one degree of freedom.
In this case we can also compute a MB with 4ti2.


Results

Markov basis, (burn-in, iteration) = (1000, 10000)

[Figure: two density histograms of the sampled LR statistic (range 0 to 30), a trace plot of the LR statistic against the number of samples (0 to 10000), and an ACF plot against lag for the series LR.list.]


Results

Lattice basis with Po(1), (burn-in, iteration) = (1000, 10000)

[Figure: two density histograms of the sampled LR statistic (range 0 to 30), a trace plot of the LR statistic against the number of samples (0 to 10000), and an ACF plot against lag for the series LR.list.]

Lattice basis with Po(50), (burn-in, iteration) = (1000, 10000)

[Figure: two density histograms of the sampled LR statistic (range 0 to 30), a trace plot of the LR statistic against the number of samples (0 to 10000), and an ACF plot against lag for the series LR.list.]


Discussion

Diaconis and Sturmfels (1998) pointed out that a lattice basis is sometimes less stable than a Markov basis.
In our setting l = 42.
When we use only l moves as a lattice basis, the performance becomes less stable.
Here we use all degree 4 moves (>> l) as a lattice basis, and the performance becomes more stable.


Markov chain with finite states

$X_t$, t = 1, ..., T (T ≥ 3) : Markov chain with a finite state space $\mathcal{S} = \{1, \dots, S\}$, S ≥ 2

$\omega = (s_1, \dots, s_T) \in \mathcal{S}^T$ : observed path

[Figure: a sample path over time points 1, 2, ..., T and states 1, 2, ..., S.]


Homogeneous Markov chain model

$p_{ij} = P(X_{t+1} = j \mid X_t = i)$ : time homogeneous

$\{\pi_i\}$ : initial distribution of $X_1$

HMC model

\[
p(\omega) = \pi_{s_1}\, p_{s_1 s_2} \cdots p_{s_{T-1} s_T}
\]

HMC is a curved exponential family due to the constraint

\[
\sum_{s_{t+1} \in \mathcal{S}} p_{s_t s_{t+1}} = 1 .
\]

We cannot directly apply a conditional test to HMC.


Toric homogeneous Markov chain (THMC) model

\[
p(\omega) = c\, \gamma_{s_1}\, \beta_{s_1 s_2} \cdots \beta_{s_{T-1} s_T}
\]

THMC is HMC without the constraints $\sum_{s_{t+1} \in \mathcal{S}} p_{s_t s_{t+1}} = 1$.
THMC is a full exponential family.
THMC is the envelope exponential family of the HMC model.

We test a larger null hypothesis

H0 : THMC model

HMC ⊂ THMC
THMC is rejected → HMC is also rejected.

Since the THMC model is a full exponential family, we can use a conditional test with a Markov basis.


Markov chain and contingency table

observation : n paths W = {ω_1, ..., ω_n}
We can identify the n paths with a T-way contingency table $x = \{x(\omega),\ \omega \in \mathcal{S}^T\}$ such that

x(ω) is a cell frequency,
the sample size is n,
the number of levels is S.

path   count     path   count
0000     8       1000    13
0001    14       1001    11
0010    13       1010     9
0011    19       1011     9
0100    11       1100    10
0101     9       1101     8
0110    11       1110     9
0111    13       1111    10

              3rd = 0           3rd = 1
1st  2nd   4th = 0  4th = 1  4th = 0  4th = 1
 0    0       8       14       13       19
 0    1      11        9       11       13
 1    0      13       11        8        9
 1    1      10        8        9       10

Markov chain with S = 2 and T = 4


Sufficient statistic of THMC model

$x^1_s$ : frequency of the initial state $s_1 = s$

$x^t_{ij}$ : number of transitions i → j at time t

\[
x^+_{ij} = \sum_{t=1}^{T-1} x^t_{ij}
\]

: total number of transitions from i to j in all paths W

sufficient statistic of THMC

\[
b = b(x) = \{ x^1_s,\ s \in \mathcal{S} \} \cup \{ x^+_{ij},\ i, j \in \mathcal{S} \}.
\]
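
The sufficient statistic above can be computed from a set of paths by simple counting. The sketch below uses two small hypothetical path sets (not data from the talk) to show that different paths can share the same b and therefore lie in the same fiber.

```python
from collections import Counter

def thmc_sufficient_statistic(paths):
    """Initial-state counts x^1_s and pooled transition counts x^+_{ij}."""
    init = Counter(w[0] for w in paths)
    trans = Counter((i, j) for w in paths for i, j in zip(w, w[1:]))
    return init, trans

# Hypothetical example with S = 2 (states 0 and 1) and T = 4.
W1 = [(0, 0, 1, 0), (1, 1, 0, 1)]
W2 = [(0, 1, 0, 0), (1, 0, 1, 1)]

b1 = thmc_sufficient_statistic(W1)
b2 = thmc_sufficient_statistic(W2)
print(b1)
print(b1 == b2)   # the two path sets lie in the same fiber
```

Only the first state and the pooled transition counts enter b; the time at which a transition occurs does not.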


Lattice basis for THMC model

A : an $(S + S^2) \times S^T$ matrix.
rank A = $S^2 + S - 1$.
l = dim ker A = $S^T - S^2 - S + 1$.
Computation of a MB for this model is known to be quite hard.

S = 2 or T = 3 : e.g. HT2011, TA2011

Computation of a LB is easy.
When T is large, a lattice basis requires too much storage space.
Here we derive a LB for the THMC model explicitly.
Then we give an adaptive algorithm that generates moves by using the LB.


Type 1 move for THMC model

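type 1 move : $z = \{z(\omega),\ \omega \in \mathcal{S}^T\}$

\[
z(\omega) =
\begin{cases}
1, & \omega = (1\, s_2 \cdots s_{T-2}\, s_{T-1}\, s_T),\ (1 \cdots 1\, s_3),\ (1 \cdots 1\, s_4),\ \dots,\ (1 \cdots 1\, s_{T-1}) \\
-1, & \omega = (1 \cdots 1\, s_2 s_3),\ (1 \cdots 1\, s_3 s_4),\ \dots,\ (1 \cdots 1\, s_{T-1} s_T)
\end{cases}
\]

The degree is at most T − 2.
There are $S^{T-1} - S^2$ moves in this class.
A move in this class is not square-free in general.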


Type 2 move for THMC model

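type 2 move : $z = \{z(\omega),\ \omega \in \mathcal{S}^T\}$

\[
z(\omega) =
\begin{cases}
1, & \omega = (s_1 s_2 \cdots s_{T-2}\, s_{T-1}\, s_T),\ (1 \cdots 1\, s_1 1),\ (1 \cdots 1\, s_2),\ \dots,\ (1 \cdots 1\, s_{T-1}) \\
-1, & \omega = (s_1 1 \cdots 1),\ (1 \cdots 1\, s_1 s_2),\ \dots,\ (1 \cdots 1\, s_{T-1} s_T)
\end{cases}
\]

There are $(S - 1)(S^{T-1} - 1)$ moves in this class.
A move in this class is not square-free in general.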
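
The two move types can be built explicitly for small S and T and checked against the sufficient statistic. The sketch below is one reading of the definitions, under stated assumptions: coefficients of coinciding paths are added (consistent with the moves not being square-free), type 2 moves are enumerated over s_1 ≠ 1, and zero moves are discarded. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def pad(tail, T):
    """Path of length T that stays in state 1 and then ends with `tail`."""
    return (1,) * (T - len(tail)) + tuple(tail)

def clean(z):
    return {w: c for w, c in z.items() if c != 0}

def type1_move(path):
    """Type 1 move for a path (1, s_2, ..., s_T); path[t-1] holds s_t."""
    T, z = len(path), Counter()
    z[path] += 1
    for t in range(3, T):                    # (1...1 s_t), t = 3..T-1
        z[pad(path[t - 1:t], T)] += 1
    for t in range(2, T):                    # (1...1 s_t s_{t+1}), t = 2..T-1
        z[pad(path[t - 1:t + 1], T)] -= 1
    return clean(z)

def type2_move(path):
    """Type 2 move for a path (s_1, ..., s_T)."""
    T, z = len(path), Counter()
    z[path] += 1
    z[pad((path[0], 1), T)] += 1             # (1...1 s_1 1)
    for t in range(2, T):                    # (1...1 s_t), t = 2..T-1
        z[pad(path[t - 1:t], T)] += 1
    z[(path[0],) + (1,) * (T - 1)] -= 1      # (s_1 1...1)
    for t in range(1, T):                    # (1...1 s_t s_{t+1}), t = 1..T-1
        z[pad(path[t - 1:t + 1], T)] -= 1
    return clean(z)

def sufficient_statistic(z):
    """Initial-state and pooled transition counts of a signed table z."""
    b = Counter()
    for w, c in z.items():
        b["init", w[0]] += c
        for i, j in zip(w, w[1:]):
            b["trans", i, j] += c
    return clean(b)

S, T = 2, 4
states = range(1, S + 1)
type1 = [type1_move((1,) + s) for s in product(states, repeat=T - 1)]
type2 = [type2_move(s) for s in product(states, repeat=T) if s[0] != 1]
moves = [z for z in type1 + type2 if z]      # drop zero moves
print(all(sufficient_statistic(z) == {} for z in moves))
print(len(moves), S**T - S**2 - S + 1)       # number of nonzero moves vs l
```

For these small cases every constructed move has zero sufficient statistic, and the number of nonzero moves agrees with l = S^T − S^2 − S + 1.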


Lattice basis for THMC model

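Theorem (AHT2014)

The union of the type 1 moves and the type 2 moves forms a lattice basis for the THMC model.

Then $(z_1, \dots, z_l)^t$ can be written as

\[
\begin{pmatrix}
* & \cdots & * & 1 & & & 0 \\
* & \cdots & * & * & 1 & & \\
\vdots & & \vdots & \vdots & & \ddots & \\
* & \cdots & * & * & * & \cdots & 1
\end{pmatrix}
\]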


Adaptive move generating algorithm

Algorithm for generating moves for THMC model

$|\alpha| = \sum_{i=1}^{l} |\alpha_i|$

1 Generate |α| > 0 randomly.
2 Decide whether the move is of type 1 or type 2.
  2-1. The case of type 1 : generate $s_2, \dots, s_{T-2}$, then generate a move of type 1.
  2-2. The case of type 2 : generate $s_1, \dots, s_T$, then generate a move of type 2.
3 Generate |α| moves in this way and compute

\[
z = \alpha_1 z_1 + \cdots + \alpha_l z_l .
\]

The computational cost of generating a move is about O(|α|T).
No storage space is needed.
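
A rough sketch of this adaptive scheme is given below. Several choices are assumptions made for illustration, since the slide leaves them open: |α| is drawn from a Poisson distribution conditioned to be positive, type 1 and type 2 are chosen with probability 1/2 each, paths are drawn uniformly, and each generated basis move gets a random sign. The sketch also uses the fact that the type 2 formula with s_1 = 1 cancels down to the type 1 move, so a single builder covers both types.

```python
import math
import random
from collections import Counter

def pad(tail, T):
    """Path of length T that stays in state 1 and then ends with `tail`."""
    return (1,) * (T - len(tail)) + tuple(tail)

def basis_move(path):
    """Type 2 formula; for path[0] == 1 it reduces to the type 1 move."""
    T, z = len(path), Counter()
    z[path] += 1
    z[pad((path[0], 1), T)] += 1
    z[(path[0],) + (1,) * (T - 1)] -= 1
    for t in range(1, T):
        if t >= 2:
            z[pad(path[t - 1:t], T)] += 1
        z[pad(path[t - 1:t + 1], T)] -= 1
    return z

def poisson(lam, rng):
    """Knuth's multiplication method for one Poisson(lam) draw."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def adaptive_move(S, T, lam, rng):
    n = 0
    while n == 0:                                  # step 1: |alpha| > 0
        n = poisson(lam, rng)
    z = Counter()
    for _ in range(n):                             # steps 2 and 3
        path = tuple(rng.randint(1, S) for _ in range(T))
        if rng.random() < 0.5:
            path = (1,) + path[1:]                 # type 1: path starts at 1
        sign = rng.choice((1, -1))
        for w, c in basis_move(path).items():      # O(T) work per basis move
            z[w] += sign * c
    return {w: c for w, c in z.items() if c != 0}

rng = random.Random(0)
print(adaptive_move(3, 5, 0.5, rng))
```

Each basis move is built on the fly from a path in O(T) operations, so no list of basis elements is stored. A generated move can occasionally be the zero move, in which case the chain simply stays where it is.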


Marijuana use of 120 females

                 1978 = 1        1978 = 2        1978 = 3
1977 \ 1979     1    2    3     1    2    3     1    2    3
     1         76    6    6     4   12    1     0    0    1
     2          3    0    0     1    1    2     1    0    2
     3          0    0    0     0    0    0     0    2    2

Longitudinal data from 1977 to 1979 on marijuana use

120 female respondents who were age 14 in 1977

1. never use, 2. no more than once a month, 3. more than once a month

test statistic : Pearson's χ² statistic

We sampled 100,000 tables by MCMC after 50,000 burn-in steps.


Results

Markov basis, (burn-in, iteration) = (5000, 15000)

[Figure: two density histograms of the sampled Pearson χ² statistic, a trace plot of the statistic against the number of samples (0 to 15000), and an ACF plot against lag for the series P.list.]

Proposed algorithm, Po(0.5), (burn-in, iteration) = (5000, 15000)

[Figure: two density histograms of the sampled Pearson χ² statistic, a trace plot of the statistic against the number of samples (0 to 15000), and an ACF plot against lag for the series P.list.]


Conclusion

In this talk we proposed alternative methods for implementing an exact test without using a Markov basis.
These methods are practical and show performance comparable to the Markov basis method.
In general, it is not necessarily easy to obtain practical Markov subbases connecting specific fibers.
The computation of a lattice basis is easy for every model.
When a table is large or the sample size is small, the performance of the lattice basis method becomes less stable.
Markov basis technology still has many implementation problems.


References

Aoki, S., Hara, H. and Takemura, A. (2012). Markov Bases in Algebraic Statistics. Springer, New York.

Chen, Y., Dinwoodie, I., Dobra, A. and Huber, M. (2005). Lattice points, contingency tables and sampling. Integer Points in Polyhedra: Geometry, Number Theory, Algebra, Optimization, 65-78.

Diaconis, P. and Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. Ann. Statist., 26, 363-397.

Hara, H., Aoki, S. and Takemura, A. (2012). Running Markov chain without Markov basis. Harmony of Gröbner Bases and the Modern Industrial Society (Takayuki Hibi ed.), World Scientific, pp. 45-62.

Hara, H. and Takemura, A. (2010). Connecting tables with zero-one entries by a subset of Markov basis. Algebraic Methods in Statistics and Probability (II - Urbana Volume), AMS Contemporary Mathematics Series, 516, 199-213.

Hara, H., Takemura, A. and Yoshida, R. (2010). On connectivity of fibers with positive marginals in multiple logistic regression. J. Multivariate Anal., 101, 909-925.

