Factorized Asymptotic Bayesian Inference for Latent Feature Models


Description

Casual slides explaining our work accepted at NIPS 2013.

Transcript of Factorized Asymptotic Bayesian Inference for Latent Feature Models

Page 1: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Factorized Asymptotic Bayesian Inference for Latent Feature Models

Kohei Hayashi 1,2

1 National Institute of Informatics

2 JST, ERATO, Kawarabayashi Large Graph Project

Sept. 5th, 2014

Joint work with Ryohei Fujimaki (NEC Labs America)


Page 2: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Background

Generally, data are high-dimensional and consist of large samples

• Sensor data, texts, images, ...

• Raw data are often hard for humans to interpret

One purpose of machine learning: data interpretation

• Aim: to find meaningful features from data

Latent feature models


Page 3: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Example: Mixture Models (MMs)


Page 4: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Example: Mixture Models (MMs)


Page 5: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Model Selection

In MMs, selecting the number of components is important

• Too many components: difficult to interpret


Page 6: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Factorized Information Criterion (FIC)

Model selection for binary latent variable models

• MMs [Fujimaki&Morinaga AISTATS’12]

• HMMs [Fujimaki&Hayashi ICML’12]

Pros/Cons:

:) Asymptotically equivalent to the marginal likelihood
• Preferable for “Big Data” scenarios

:) Fast computation, no sensitive tuning parameters
• An alternative to nonparametric Bayesian methods

:( Only applicable to MM-type models


Page 7: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Contribution

Derive FIC for latent feature models

• Compact and accurate model selection

• Runtime is 5–50× faster (vs. IBP)

• May be applicable to other non-MM-type models (e.g., topic models)


Page 8: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Latent Feature Models (LFMs)


Page 9: Factorized Asymptotic Bayesian Inference for Latent Feature Models

LFM: an Extension of MM

• LFM considers combinations of components


Page 10: Factorized Asymptotic Bayesian Inference for Latent Feature Models

LFM: an Extension of MM

• LFM considers combinations of components

Page 11: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Observation Model (Likelihood)

For n = 1, . . . , N,

x_n = W z_n + ε_n    (1)

• x_n ∈ R^D: observation
• W ∈ R^{D×K}: linear bases
• z_n ∈ {0, 1}^K: binary latent variable
• ε_n ∈ R^D: Gaussian noise N(0, diag(λ)^{−1})

[Figure: the matrix factorization view — real observation X (D × N) ≈ real linear bases W (D × K) × binary latent variables Z (K × N)]
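To make the generative process concrete, here is a minimal Python sketch that samples synthetic data from the observation model (1), together with the Bernoulli prior on Z from the next slide; the sizes and the value of π are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 1000, 30, 5                        # example sizes (D = 30 as in the experiments)

pi = np.full(K, 0.3)                         # assumed Bernoulli weights pi_k
lam = np.ones(D)                             # noise precisions; Var(eps_nd) = 1 / lam_d

W = rng.normal(size=(D, K))                  # linear bases
Z = (rng.random((N, K)) < pi).astype(float)  # z_nk ~ Bernoulli(pi_k)
eps = rng.normal(scale=lam ** -0.5, size=(N, D))

X = Z @ W.T + eps                            # row n is x_n = W z_n + eps_n
```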

Page 12: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Priors

• p(Z) = ∏_n ∏_k π_k^{z_nk} (1 − π_k)^{1−z_nk}

• p(P) for P ≡ {π, W, λ}


Page 13: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Marginal Likelihood

A criterion for Bayesian model selection

p(X) = ∑_Z ∫ dP p(P) p(X, Z | P)    (2)

Problems:

• Integral w.r.t. P is intractable

• Sum over Z needs O(2^K) terms (see the toy sketch below)
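The exponential cost is easy to see with a toy enumeration (K = 4 is an arbitrary choice):

```python
from itertools import product

K = 4                                      # toy size; in practice K is unknown
configs = list(product([0, 1], repeat=K))  # every assignment z in {0,1}^K
assert len(configs) == 2 ** K              # the sum over z has 2^K terms per x_n
```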

Approach: use

• Laplace approximation

• Variational bound (+ mean field + linearization)


Page 14: Factorized Asymptotic Bayesian Inference for Latent Feature Models

FIC of LFMs


Page 15: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Variational Lower Bound

Suppose we have p(X, Z) = ∫ dP p(P) p(X, Z | P); then

log p(X) ≥ ∑_Z q(Z) log p(X, Z) + H(q)    (3)

• H(q) ≡ −∑_Z q(Z) log q(Z)
• Equality holds iff q(Z) is the true posterior
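As a sanity check of (3), a tiny numerical sketch with a made-up discrete joint (eight latent configurations standing in for the 2^K states; not the actual LFM):

```python
import numpy as np

rng = np.random.default_rng(0)

joint = rng.dirichlet(np.ones(8))    # toy p(X, Z=z) for one fixed X, 8 states of Z
q = rng.dirichlet(np.ones(8))        # an arbitrary variational distribution q(Z)

log_px = np.log(joint.sum())         # log p(X) by exact marginalization
entropy = -(q @ np.log(q))           # H(q)
bound = q @ np.log(joint) + entropy  # E_q[log p(X, Z)] + H(q)

assert bound <= log_px + 1e-12       # the lower bound (3) holds
# equality is reached by q(z) = joint(z) / joint.sum(), the true posterior
```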


Page 16: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Laplace Approximation of p(X,Z)

Suppose we have the maximum likelihood estimator P̂.

For N → ∞,

log p(X, Z) = log p(X, Z | P̂) − r(Z) − (D + K)/2 · log N + O_p(1)    (4)

• r(Z) ≡ D/2 ∑_k log ∑_n z_nk: complexity of the model
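The penalty r(Z) is cheap to evaluate; a minimal sketch (shapes as in the earlier snippet):

```python
import numpy as np

def model_complexity(Z, D):
    """r(Z) = (D / 2) * sum_k log(sum_n z_nk), the penalty in eq. (4).

    Its derivative w.r.t. E[z_nk], D / (2 * sum_m E[z_mk]), reappears
    (negated) inside the FAB update on a later slide: rarely used
    features are penalized hardest and shrink toward zero usage.
    """
    counts = Z.sum(axis=0)                 # sum_n z_nk for each feature k
    return 0.5 * D * np.log(counts).sum()  # features with count 0 are assumed deleted
```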


Page 17: Factorized Asymptotic Bayesian Inference for Latent Feature Models

By combining the two approximations (plugging (4) into the bound (3)), we obtain FIC_LFM:

FIC_LFM ≡ max_q E_q[ log p(X, Z | P̂) − r(Z) ] + H(q) − (D + K)/2 · log N

“Asymptotically” equivalent to the marginal likelihood:

log p(X) = FIC_LFM + O(1)

• r(Z) = D/2 ∑_k log ∑_n z_nk prefers sparse Z

[Plot: −log(x) over 0 < x ≤ 10, illustrating why smaller feature counts are rewarded]


Page 18: Factorized Asymptotic Bayesian Inference for Latent Feature Models

FAB Algorithm

Maximize w.r.t. q̃ and P by an EM-like algorithm

Maximize w.r.t. q̃

E_q̃[z_nk] ← sigmoid( c_nk + logit(π_k) − D / (2 ∑_m E_q̃[z_mk]) )

• c_nk = w_k^⊤ diag(λ) (x_n − ∑_{l≠k} E_q̃[z_nl] w_l − w_k / 2)

Maximize w.r.t. W and λ

• Closed-form solutions

Shrinkage of Z

• Delete z_k and w_k if ∑_n z_nk / N ≈ 0
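A minimal Python sketch of one such sweep, under the same assumed shapes as the earlier snippets; this illustrates the update rule above rather than reproducing the authors' implementation, and the closed-form M-step for W, λ, and π is omitted.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fab_e_step(X, W, lam, pi, Ez, eps=1e-3):
    """One coordinate-ascent sweep over E[z_nk], then the shrinkage check."""
    N, D = X.shape
    K = W.shape[1]
    Lam = lam[None, :]                               # noise precisions as a row

    for k in range(K):
        # residual with feature k held out: x_n - sum_{l != k} E[z_nl] w_l
        resid = X - Ez @ W.T + np.outer(Ez[:, k], W[:, k])
        c = (resid - 0.5 * W[:, k]) * Lam @ W[:, k]  # c_nk for all n at once
        penalty = D / (2.0 * Ez[:, k].sum())         # from the FIC term r(Z)
        logit_pi = np.log(pi[k] / (1.0 - pi[k]))
        Ez[:, k] = sigmoid(c + logit_pi - penalty)

    keep = Ez.mean(axis=0) > eps      # shrinkage: drop k if sum_n z_nk / N ~ 0
    return Ez[:, keep], W[:, keep], pi[keep]
```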

Page 19: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Experiments


Page 20: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Artificial Data

• Generate X by the observation model (D = 30)
• Error bars: s.d. over 10 trials

[Plot: elapsed time (sec, log scale) vs. N ∈ {100, 250, 500, 1000, 2000} for fab, em, ibp, meibp, and vb; two panels, true K = 5 and K = 10]

Computational time vs. N

Page 21: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Artificial Data (Cont’d)

[Plot: estimated K vs. N ∈ {100, 250, 500, 1000, 2000}; two panels, true K = 5 and K = 10]

Selected K vs. N


Page 22: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Block Data


Page 23: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Real Data

• Evaluate testing and training log-likelihoods (PLL and TLL)

Data (N × D)          Method  Time (h)  K             PLL            TLL
Sonar (208 × 49)      FAB     < 0.01    4.4 ± 1.1     −1.25 ± 0.02   −1.14 ± 0.03
                      EM      < 0.01    48.8 ± 0.5    −4.04 ± 0.46   −0.08 ± 0.07
                      IBP     3.3       69.6 ± 4.8    −4.48 ± 0.15    0.13 ± 0.02
Libras (360 × 90)     FAB     < 0.01    19.0 ± 0.7    −0.63 ± 0.03   −0.42 ± 0.03
                      EM      0.01      75.6 ± 8.6    −0.68 ± 0.11    0.76 ± 0.24
                      IBP     4.8       36.4 ± 1.1    −0.18 ± 0.01    0.13 ± 0.01
Auslan (16180 × 22)   FAB     0.04      6.0 ± 0.7     −1.34 ± 0.15   −0.92 ± 0.02
                      EM      0.2       22 ± 0        −1.79 ± 0.27   −0.78 ± 0.02
                      IBP     50.2      73 ± 5        −4.54 ± 0.08    0.08 ± 0.01
EEG (120576 × 32)     FAB     1.6       11.2 ± 1.6    −0.93 ± 0.02   −0.76 ± 0.04
                      EM      3.7       32 ± 0        −0.88 ± 0.09   −0.59 ± 0.01
                      IBP     53.0      46.4 ± 4.4    −3.16 ± 0.03   −0.26 ± 0.05
Piano (57931 × 161)   FAB     19.4      58.0 ± 3.5    −0.83 ± 0.01   −0.63 ± 0.02
                      EM      50.1      158.6 ± 3.4   −0.82 ± 0.02   −0.45 ± 0.01
                      IBP     55.8      89.6 ± 4.2    −1.83 ± 0.02   −0.84 ± 0.05
yaleB (2414 × 1024)   FAB     2.2       77.2 ± 7.9    −0.37 ± 0.02   −0.29 ± 0.03
                      EM      50.9      929 ± 20      −4.60 ± 1.20    0.80 ± 0.27
                      IBP     51.7      94.2 ± 7.5    −0.54 ± 0.02   −0.35 ± 0.02
USPS (110000 × 256)   FAB     11.2      110.2 ± 5.1   −0.96 ± 0.01   −0.64 ± 0.02
                      EM      45.7      256 ± 0       −1.06 ± 0.01   −0.36 ± 0.01
                      IBP     61.6      181.0 ± 4.8   −2.59 ± 0.08   −0.76 ± 0.01

Page 24: Factorized Asymptotic Bayesian Inference for Latent Feature Models

Summary

• Derive FIC for LFMs

• Develop a FAB algorithm that encourages sparseness

• Demonstrate that FAB is fast and accurate
