
Binary Contingency Tables in Theory and Practice

Ivona Bezáková (Rochester Institute of Technology)

Based on joint works with Nayantara Bhatnagar, Alistair Sinclair, Daniel Štefankovič, and Eric Vigoda.

DIMACS Workshop on Markov Chain Monte Carlo: Synthesizing Theory and Practice June 5th, 2007

The Voyage of the Beagle

Galápagos archipelago (1835)

Darwin’s Finches

© Robert H. Rothman

[The slides show the finch occurrence table: rows are finch species, columns are islands, entries record presence/absence, with row and column sums in the margins; the numeric entries are not recoverable from this transcript.]

Is the observed co-occurrence pattern due to chance OR competitive pressures?

Binary Contingency Tables

Input: marginals (row sums r1, r2, …, rm; column sums c1, c2, …, cn)

Sample space: 0/1 tables satisfying the marginals

Goal: count / sample

[The slide fills in an example 0/1 table consistent with small example marginals; the exact entries are garbled in this transcript.]
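As a purely illustrative aside (not from the talk), here is a minimal brute-force Python sketch of the counting goal; the marginals in the example call are hypothetical, and the enumeration is exponential in the table size, which is exactly why the sampling algorithms below are needed.

```python
import itertools

def count_tables(row_sums, col_sums):
    """Count 0/1 tables with the given marginals by brute force.
    Enumerates every 0/1 row with the right row sum, then checks the
    column sums; feasible only for tiny instances."""
    n = len(col_sums)
    rows_for = lambda r: [row for row in itertools.product((0, 1), repeat=n)
                          if sum(row) == r]
    return sum(
        all(sum(col) == c for col, c in zip(zip(*table), col_sums))
        for table in itertools.product(*(rows_for(r) for r in row_sums))
    )

print(count_tables([2, 1, 2], [2, 1, 2]))  # hypothetical 3x3 instance
```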

Different Approaches

Theory (Markov chain Monte Carlo with simulated annealing)

• Jerrum-Sinclair-Vigoda ’01: approximate permanent in O*(n^10), yields an O*((mn)^10) algorithm for m x n binary contingency tables

• Bezáková-Bhatnagar-Vigoda ’06: O*((mn)^3 (m+n)^5)

Practice (sequential importance sampling, Chen-Diaconis-Holmes-Liu ’05)

• Bezáková-Sinclair-Štefankovič-Vigoda ’06: negative example

• Jose Blanchet ’06: SIS works if marginals are O(n^{1/4})

• Bayati-Kim-Saberi ’07: alternative importance sampling method, works if marginals are O(n^{1/4})

Practice (the switching Markov chain, Diaconis-Gangolli ’94)

• Kannan-Tetali-Vempala ’97, Cooper-Dyer-Greenhill ’05: works for regular marginals
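For concreteness, the switching chain’s basic move is easy to state; here is a minimal sketch of the standard 2x2 switch (my own code, not from the talk):

```python
import random

def switch_step(table):
    """One move of the Diaconis-Gangolli switching chain: pick two rows
    and two columns; if the 2x2 submatrix is a checkerboard, flip it.
    Row and column sums are preserved, so the chain stays on tables
    with the given marginals."""
    m, n = len(table), len(table[0])
    i, k = random.sample(range(m), 2)
    j, l = random.sample(range(n), 2)
    a, b = table[i][j], table[i][l]
    c, d = table[k][j], table[k][l]
    if a == d and b == c and a != b:          # checkerboard: flip all four
        table[i][j], table[i][l] = 1 - a, 1 - b
        table[k][j], table[k][l] = 1 - c, 1 - d
    return table
```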

Permanent: Broder Chain [Broder ’88]

Perfect matching: a subset of vertex-disjoint edges covering all vertices

Permanent: the number of all perfect matchings (for a 0/1 matrix, the number of perfect matchings of the corresponding bipartite graph)

What for: uniform sampling of perfect matchings

How: Markov chain on perfect + near-perfect matchings (a near-perfect matching misses exactly two vertices, its “holes”)

At a perfect matching:

- remove a random edge

At a near-perfect matching:

- choose a random vertex

• if unmatched: match it to the other hole

• if matched: slide its matched edge to the hole (the old partner becomes the new hole)
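A minimal Python sketch of one Broder-chain step, under my own encoding (matchings as left-to-right dicts on a bipartite graph with parts {0, …, n−1}); moves that would leave the state space are simply rejected:

```python
import random

def broder_step(M, adj, n):
    """One step of the Broder chain.  M: dict left->right with n edges
    (perfect) or n-1 edges (near-perfect); adj[u]: set of right
    neighbours of left vertex u.  Invalid moves leave M unchanged."""
    M = dict(M)
    if len(M) == n:                              # perfect: remove a random edge
        del M[random.choice(list(M))]
        return M
    left_hole = next(u for u in range(n) if u not in M)
    right_hole = next(v for v in range(n) if v not in set(M.values()))
    side, w = random.choice([('L', u) for u in range(n)] +
                            [('R', v) for v in range(n)])
    if side == 'L':
        if w == left_hole:                       # unmatched: match to the other hole
            if right_hole in adj[w]:
                M[w] = right_hole
        elif right_hole in adj[w]:               # matched: slide its edge to the hole
            M[w] = right_hole                    # w's old partner is the new right hole
    else:
        u = next((x for x, v in M.items() if v == w), None)
        if u is None:                            # w is the right hole
            if w in adj[left_hole]:
                M[left_hole] = w
        elif w in adj[left_hole]:                # slide: w's old partner becomes the hole
            del M[u]
            M[left_hole] = w
    return M
```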

Broder Chain

Does it mix in polynomial time? Even if it did…

There are graphs with just 1 perfect matching but ≥ 2^{n/4} near-perfect matchings, so a uniform sample over the combined state space almost never lands on a perfect matching.

Thm [Jerrum-Sinclair ’89]: Rapid mixing if the number of perfect matchings is polynomially related to the number of near-perfect matchings.

Jerrum-Sinclair-Vigoda ’01: Change the weights so that perfect matchings take up a polynomial fraction.

Simulated Annealing for Permanent

Jerrum-Sinclair-Vigoda ’01: Change the weights so that perfect matchings take up a polynomial fraction.

The state space splits into n^2 + 1 regions: one per hole pair (u,v), plus the perfect matchings. Originally the regions have very different sizes; we want them all the same size.

Ideal weights (for a matching with holes u,v):

w(u,v) = (# perfects) / (# nears with holes u,v)

Good: a perfect matching is sampled with prob. 1/(n^2 + 1)

Bad: computing the ideal weights is as hard as the original problem

Solution: Approximate

Simulated Annealing for Permanent

The ideal weights are easy on Kn,n but difficult on the input graph G:

Unordered state (EASY): Kn,n        Ordered state (DIFFICULT): G

Solution: Approximate by annealing from Kn,n to G with a parameter λ = 1 … ~0. Edges of Kn,n that are absent from G are “λ-edges”: at λ = 1 they count fully (the graph behaves like Kn,n), and as λ → 0 they vanish, recovering G.

Simulated Annealing for Permanent

The weights need to be λ-weighted:

w(u,v) = λ(P) / λ(N(u,v))

where λ(M) = λ^(# λ-edges in M) and λ(S) = ∑_{M in S} λ(M), with P the set of perfect matchings and N(u,v) the set of near-perfect matchings with holes u,v.

Thm [Jerrum-Sinclair-Vigoda ’01]: The weighted Broder chain mixes rapidly if each w(u,v) is approximated within a constant factor.

Algorithm (sketch):

Initially λ = 1, so w(u,v) = n!/(n-1)! = n. Then repeat: given a constant-factor approximation of the w(u,v), run the chain to improve the approximation; decrease λ slightly (λ = 1, 0.7, 0.6, … ~0). The improved approximation for the old λ serves as the starting approximation for the new λ (e.g., a 4-approximation for the old λ is still a valid constant-factor approximation for the slightly smaller one).
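Schematically, the annealing loop looks like this (a sketch under my own naming; estimate_weights stands in for the real subroutine that runs the weighted Broder chain to re-estimate all w(u,v)):

```python
def anneal(n, estimate_weights, shrink=0.9, lam_min=1e-6):
    """Skeleton of the JSV-style annealing schedule (illustrative).
    At lam = 1 the lambda-weighted graph behaves like K_{n,n}, so the
    ideal weights are known exactly: w(u,v) = n!/(n-1)! = n.  Each
    round, the weights estimated at the old lam remain constant-factor
    accurate at the new lam, which keeps the chain rapidly mixing."""
    w = {(u, v): float(n) for u in range(n) for v in range(n)}
    lam = 1.0
    while lam > lam_min:
        w = estimate_weights(lam, w)   # run the weighted chain, improve w
        lam *= shrink                  # cool: the improved w is the new start
    return w
```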

BCT: Bipartite Graphs with Given Degrees

A 0/1 table is a bipartite graph: rows and columns are the two vertex classes, entry (i,j) = 1 is an edge between row i and column j, and the marginals are the degrees. Example (row sums 2 2 2 2, column sums 2 3 1 2):

0 1 0 1
1 1 0 0
1 0 1 0
0 1 0 1

“Sliding” Markov Chain on perfect and near tables

Perfect: remove a random edge (turn a random 1 into a 0, leaving one row and one column deficient)

Near: slide edges or match (move a 1 within its row or column toward a deficiency, or put the missing 1 back)

[The slides animate a sequence of such moves on the example table; the intermediate tables are omitted here.]
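The table↔graph dictionary is mechanical; a quick Python check on the slide’s example table:

```python
def table_to_edges(table):
    """View a 0/1 table as a bipartite graph: entry (i, j) = 1 is an
    edge between row vertex i and column vertex j."""
    return {(i, j) for i, row in enumerate(table)
                   for j, x in enumerate(row) if x}

table = [[0, 1, 0, 1],
         [1, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]
print(sorted(table_to_edges(table)))
print([sum(r) for r in table])        # row degrees / row sums: [2, 2, 2, 2]
print([sum(c) for c in zip(*table)])  # column degrees / sums: [2, 3, 1, 2]
```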

Simulated Annealing for BCT? [Bezáková-Bhatnagar-Vigoda ’06]

Recall the ideal weights, as for the permanent:

w(u,v) = λ(P) / λ(N(u,v)), where λ(T) = λ^(# λ-edges in T) and λ(S) = ∑_{T in S} λ(T).

Anneal the instance (e.g., row sums 2 2 2 2, column sums 2 3 1 2) from Kn,n (unordered, EASY) down to the same instance on a carefully chosen graph G* (ordered, DIFFICULT), with λ = 1 ……… ~0.

The catch: what if, for some u,v, there is no near-table that uses all real edges? Then λ(N(u,v)) = 0 at λ = 0.

Thm [Bezáková-Bhatnagar-Vigoda ’06]: There exists a graph G* with the given degree sequence s.t. between any two vertices there exists an “alternating” path of length ≤ 5.

Corollary: For every pair u,v there exists a (u,v)-near-table similar to G*.

Corollary: λ(N(u,v)) is “easy” to compute for λ ≈ 0.

Thm: It is possible to sample/count bipartite graphs with a given degree sequence (which are subgraphs of a given graph H ⊆ Kn,n) in time O*((nm)^3 (n+m)^5).

Different Approaches (recap of the overview above; we now turn from the annealing-based theory to the methods used in practice)

Importance Sampling for counting problems

Put a probability distribution σ on the points of the set plus an extra failure point ♦, with σ(x) > 0 for every point x of the set.

Random variable: η(s) = 1/σ(s) if s is in the set, and η(s) = 0 if s is ♦.

Unbiased estimator:

E[η] = ∑_x σ(x) · 1/σ(x) = size of the set
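In code, this estimator is a one-liner around any sampler; a sketch with a hypothetical sample() callback that returns a draw together with its probability (None standing for ♦):

```python
def importance_count(sample, trials=10000):
    """Mean of eta over independent draws from sigma.
    sample() -> (s, p): s is a point of the set drawn with probability p,
    or None for the failure point (the diamond).  Since
    E[eta] = sum_x sigma(x) * (1/sigma(x)) = |set|, the mean is unbiased."""
    total = 0.0
    for _ in range(trials):
        s, p = sample()
        if s is not None:
            total += 1.0 / p
    return total / trials
```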

Sequential Importance Sampling for BCT [Chen-Diaconis-Holmes-Liu ’05]

A specific σ:

• fill the table column-by-column

• assign each column ignoring the other column sums

• assign the column with probability proportional to ∏ ri/(n − ri), where the product ranges over i: rows with assignment 1 (ri is the current remaining sum of row i, n the number of columns), decrementing the remaining row sums after each column

[The slides step through the example instance column by column, updating the remaining row sums after each assignment until every remaining sum is 0.]
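A toy implementation of this σ, with assumptions flagged in the docstring (it brute-forces each column’s normalization, so it only runs on small instances; it is my reading of the slides, not the authors’ code):

```python
import itertools, random

def sis_estimate(row_sums, col_sums, trials=20000):
    """Toy SIS for binary contingency tables, column by column.
    A column with sum c is given its set S of 1-rows with probability
    proportional to prod_{i in S} r_i/(n - r_i), where r_i is the
    remaining sum of row i and n the number of columns; the
    normalisation is brute-forced over all c-subsets (keep m small).
    Returns the average of eta = 1/sigma(table), with eta = 0 on
    failure, an unbiased estimate of the number of tables."""
    m, n = len(row_sums), len(col_sums)
    assert all(r < n for r in row_sums)      # keeps n - r_i > 0 throughout
    total = 0.0
    for _ in range(trials):
        r, sigma = list(row_sums), 1.0
        for c in col_sums:
            subsets = [S for S in itertools.combinations(range(m), c)
                       if all(r[i] > 0 for i in S)]
            weights = []
            for S in subsets:
                w = 1.0
                for i in S:
                    w *= r[i] / (n - r[i])
                weights.append(w)
            Z = sum(weights)
            if Z == 0:                       # dead end: the diamond outcome
                sigma = 0.0
                break
            k = random.choices(range(len(subsets)), weights=weights)[0]
            sigma *= weights[k] / Z
            for i in subsets[k]:
                r[i] -= 1
        if sigma > 0 and all(x == 0 for x in r):
            total += 1.0 / sigma
    return total / trials

# Sanity check against the brute force above (hypothetical instance):
# count_tables([2, 1, 2], [2, 1, 2]) vs sis_estimate([2, 1, 2], [2, 1, 2])
```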

A Counterexample for SIS

The instance: row sums 1, 1, …, 1 plus one heavy row βm; column sums 1, 1, …, 1 plus one heavy column γm.

Thm [Bezáková-Sinclair-Štefankovič-Vigoda ’06]: For any β ≠ γ, the SIS output after any subexponential number of trials is off by an exponential factor (with high probability).

Simpler example: keep only the heavy row βm (all column sums 1). Then the same conclusion holds for any β.

Intuition. In a uniformly random table the heavy row chooses its βm ones uniformly at random, so a block of αm columns expects ~αβm ones from it; SIS places asymptotically fewer there. Almost all tables have ~αβm ones in the block, while the tables SIS sees with high probability lie outside this dominant set (all tables ⊃ tables with ~αβm ones ⊃ tables seen by SIS whp).

The result holds for any order of rows/columns. Alternating rows and columns?

SIS – Experimental Results

[Plot: log-scale of the SIS estimate (y-axis, roughly 10^201 to 10^206) versus the number of SIS steps (x-axis, 50 to 350) on the bad example with m = 300, β = 0.6, γ = 0.7; a horizontal line marks the correct value for comparison.]
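The full m = 300 experiment is out of reach for the toy sketches above, but a scaled-down version of the bad instance can be assembled from them (using the count_tables and sis_estimate sketches; the marginals here are chosen by me so the row and column totals balance):

```python
# One heavy row among unit rows, one heavy column among unit columns,
# echoing the beta*m / gamma*m construction at a toy scale.
row_sums = [3, 1, 1, 1, 1, 1]         # heavy row of 3    (total 8)
col_sums = [2, 1, 1, 1, 1, 1, 1]      # heavy column of 2 (total 8)
true_count = count_tables(row_sums, col_sums)   # brute force, from above
sis_count = sis_estimate(row_sums, col_sums, trials=50000)
print(true_count, sis_count)
# At this tiny size SIS is still serviceable; the theorem concerns the
# discrepancy growing exponentially with m.
```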

Open Problems

• Practical algorithm?

• Detecting convergence of SIS

• SIS for larger marginals?

• The Switching Markov chain of Diaconis-Gangolli?

• General contingency tables

• Cell-bounded tables

• Counting non-bipartite graphs with a given degree sequence