Factorized Asymptotic Bayesian Inference for Latent Feature Models
Kohei Hayashi¹,²
¹National Institute of Informatics
²JST, ERATO, Kawarabayashi Large Graph Project
Sept. 5th, 2014
Joint work with Ryohei Fujimaki (NEC Labs America)
Background
Generally, data are high-dimensional and consist of large samples
• Sensor data, texts, images, ...
• Raw data are often hard for humans to interpret
One purpose of machine learning: data interpretation
• Aim: to find meaningful features from data
Latent feature models
Example: Mixture Models (MMs)
Model Selection
In MMs, selection of the number of components is important
• Too many components: difficult to interpret
Factorized Information Criterion (FIC)
Model selection for binary latent variable models
• MMs [Fujimaki&Morinaga AISTATS’12]
• HMMs [Fujimaki&Hayashi ICML’12]
Pros/Cons:
:) Asymptotically equivalent to the marginal likelihood
  • Preferable for “Big Data” scenarios
:) Fast computation, no sensitive tuning parameters
  • An alternative to nonparametric Bayesian methods
:( Only applicable to MM-type models
Contribution
Derive FIC for latent feature models
• Compact and accurate model selection
• Runtime is 5–50× faster (vs. IBP)
• May be applicable to other non-MM-type models (e.g., topic models)
Latent Feature Models (LFMs)
LFM: an Extension of MM
• LFM considers combinations of components
Observation Model (Likelihood)

For $n = 1, \ldots, N$,

$$x_n = W z_n + \varepsilon_n \qquad (1)$$

• $x_n \in \mathbb{R}^D$: observation
• $W \in \mathbb{R}^{D \times K}$: linear bases
• $z_n \in \{0, 1\}^K$: binary latent variable
• $\varepsilon_n \in \mathbb{R}^D$: Gaussian noise $\mathcal{N}(0, \mathrm{diag}(\lambda)^{-1})$
[Figure: $X \approx WZ$; the real $D \times N$ observation matrix $X$ is factorized into real linear bases $W$ ($D \times K$) and a binary latent-variable matrix $Z$ ($K \times N$)]
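To make the notation concrete, here is a minimal NumPy sketch of the generative step in Eq. (1); the dimensions and parameter values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# A minimal sketch of the observation model (1): x_n = W z_n + eps_n.
# Variable names (N, D, K, W, Z, lam) follow the slide's notation;
# the concrete values are illustrative assumptions.
rng = np.random.default_rng(0)
N, D, K = 100, 30, 5

W = rng.normal(size=(D, K))          # linear bases, W in R^{D x K}
Z = rng.integers(0, 2, size=(N, K))  # binary latent variables, z_n in {0,1}^K
lam = np.ones(D)                     # per-dimension noise precisions

# eps_n ~ N(0, diag(lam)^{-1}); stack all n = 1..N into an N x D matrix
eps = rng.normal(scale=lam ** -0.5, size=(N, D))
X = Z @ W.T + eps                    # x_n = W z_n + eps_n, row-wise
```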
Priors
• $p(Z) = \prod_n \prod_k \pi_k^{z_{nk}} (1 - \pi_k)^{1 - z_{nk}}$
• $p(P)$ for $P \equiv \{\pi, W, \lambda\}$
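As a small companion to the sketch above, this is the Bernoulli prior on $Z$ in code; $\pi$ is assumed given here (the talk also places a prior $p(P)$ on $\{\pi, W, \lambda\}$, which is not sketched).

```python
import numpy as np

# Sketch of p(Z) = prod_n prod_k pi_k^{z_nk} (1 - pi_k)^{1 - z_nk}.
# The value of pi is an illustrative assumption.
rng = np.random.default_rng(1)
N, K = 100, 5
pi = np.full(K, 0.3)                       # activation probability of each feature

Z = (rng.random((N, K)) < pi).astype(int)  # z_nk ~ Bernoulli(pi_k)

# log p(Z) evaluated at the sampled Z
log_pZ = np.sum(Z * np.log(pi) + (1 - Z) * np.log1p(-pi))
```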
Marginal Likelihood
A criterion for Bayesian model selection
$$p(X) = \sum_Z \int dP \, p(P) \, p(X, Z \mid P) \qquad (2)$$
Problems:
• Integral w.r.t. P is intractable
• Sum over $Z$ needs $O(2^K)$ terms
Approach: use
• Laplace approximation
• Variational bound (+ mean field + linearization)
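To see why the sum over $Z$ is expensive, the sketch below marginalizes a single $z_n$ by brute force, with $P$ held fixed (i.e., ignoring the also-intractable integral over $P$). All concrete values are illustrative assumptions.

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

# Brute-force marginalization of one z_n: enumerating all 2^K binary
# vectors, which is what the O(2^K) on the slide refers to.
rng = np.random.default_rng(2)
D, K = 30, 5
W = rng.normal(size=(D, K))
pi = np.full(K, 0.3)
cov = np.eye(D)                      # diag(lam)^{-1} with lam = 1

x_n = rng.normal(size=D)
p_xn = 0.0
for z in itertools.product([0, 1], repeat=K):   # 2^K terms
    z = np.array(z)
    p_z = np.prod(pi ** z * (1 - pi) ** (1 - z))
    p_xn += p_z * multivariate_normal.pdf(x_n, mean=W @ z, cov=cov)
# Already 2^20 ~ 10^6 terms at K = 20; the approximations below avoid this.
```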
FIC of LFMs
Variational Lower Bound
Suppose we have $p(X, Z) = \int dP \, p(P) \, p(X, Z \mid P)$; then

$$\log p(X) \ge \sum_Z q(Z) \log p(X, Z) + H(q) \qquad (3)$$

• $H(q) \equiv -\sum_Z q(Z) \log q(Z)$
• Equality holds iff $q(Z)$ is the true posterior
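For completeness, (3) is Jensen's inequality applied to the log of an expectation under $q$; spelled out (a standard step not shown on the slide):

```latex
\log p(X)
  = \log \sum_Z q(Z)\, \frac{p(X, Z)}{q(Z)}
  \ge \sum_Z q(Z) \log \frac{p(X, Z)}{q(Z)}
  = \sum_Z q(Z) \log p(X, Z) + H(q)
```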
Laplace Approximation of $p(X, Z)$

Suppose we have the maximum likelihood estimator $\hat{P}$. For $N \to \infty$,

$$\log p(X, Z) = \log p(X, Z \mid \hat{P}) - r(Z) - \frac{D + K}{2} \log N + O_p(1) \qquad (4)$$

• $r(Z) \equiv \frac{D}{2} \sum_k \log \sum_n z_{nk}$: complexity of the model
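As a rough sketch of where (4) comes from (my reconstruction of the standard Laplace/BIC bookkeeping; the talk does not show this step):

```latex
p(X, Z) \approx p(X, Z \mid \hat{P})\, p(\hat{P})\,
         (2\pi)^{|P|/2}\, \big|\mathcal{H}(Z)\big|^{-1/2},
\qquad
\mathcal{H}(Z) = -\nabla_P^2 \log p(X, Z \mid P)\big|_{P = \hat{P}}
```

The Hessian block for each basis $w_k$ scales with $\sum_n z_{nk}$, the number of samples using feature $k$, so $-\frac{1}{2}\log|\mathcal{H}(Z)|$ contributes $-\frac{D}{2}\sum_k \log \sum_n z_{nk} = -r(Z)$, while the remaining $D + K$ parameters $\{\lambda, \pi\}$ contribute the usual $-\frac{D+K}{2}\log N$, up to $O_p(1)$.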
By combining the two approximations, we obtain $\mathrm{FIC}_{\mathrm{LFM}}$:

$$\mathrm{FIC}_{\mathrm{LFM}} = \max_q \; \mathbb{E}_q\!\left[\log p(X, Z \mid \hat{P}) - r(Z)\right] + H(q) - \frac{D + K}{2} \log N$$

“Asymptotically” equivalent to the marginal likelihood:

$$\log p(X) = \mathrm{FIC}_{\mathrm{LFM}} + O(1)$$

• $r(Z) = \frac{D}{2} \sum_k \log \sum_n z_{nk}$ prefers sparse $Z$
[Plot: $-\log(x)$ for $x \in (0, 10]$, illustrating the concave penalty]
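A tiny numerical illustration of why $r(Z)$ prefers sparse $Z$ (the counts below are made up):

```python
import numpy as np

# r(Z) = (D/2) * sum_k log(sum_n z_nk): because log is concave, spreading
# the same total activity over more features incurs a larger penalty.
D = 30

def r(counts):                        # counts[k] = sum_n z_nk
    counts = np.asarray(counts, dtype=float)
    return D / 2 * np.sum(np.log(counts))

print(r([200, 200]))              # 2 well-used features: ~158.9
print(r([100, 100, 100, 100]))    # same total usage over 4 features: ~276.3
```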
FAB Algorithm
Maximize w.r.t. $\tilde{q}$ and $P$ by an EM-like algorithm

Maximize w.r.t. $\tilde{q}$:

$$\mathbb{E}_{\tilde{q}}[z_{nk}] \leftarrow \mathrm{sigmoid}\!\left(c_{nk} + \mathrm{logit}(\pi_k) - \frac{D}{2 \sum_m \mathbb{E}_{\tilde{q}}[z_{mk}]}\right)$$

• $c_{nk} = w_k^\top \mathrm{diag}(\lambda)\left(x_n - \sum_{l \neq k} \mathbb{E}_{\tilde{q}}[z_{nl}] w_l - \frac{1}{2} w_k\right)$

Maximize w.r.t. $W$ and $\lambda$:
• Closed-form solutions

Shrinkage of $Z$:
• Delete $z_k$ and $w_k$ if $\sum_n z_{nk} / N \simeq 0$
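Below is a compact NumPy sketch of one FAB loop assembled from the updates on this slide. The data, initialization, shrinkage threshold, and M-step details (in particular the $\lambda$ and $\pi$ updates) are simplifying assumptions on my part; see the paper for the exact forms.

```python
import numpy as np
from scipy.special import expit, logit   # sigmoid and its inverse

rng = np.random.default_rng(3)
N, D, K = 500, 30, 10
X = rng.normal(size=(N, D))              # stand-in data

EZ = rng.random((N, K))                  # E_q[z_nk], random initialization
W = rng.normal(size=(D, K))
lam = np.ones(D)
pi = np.full(K, 0.5)

for it in range(50):
    # --- Maximize w.r.t. q: sigmoid update with the FAB shrinkage term ---
    for k in range(K):
        # x_n minus the reconstruction from all features except k
        resid = X - EZ @ W.T + np.outer(EZ[:, k], W[:, k])
        c_k = (resid - 0.5 * W[:, k]) @ (lam * W[:, k])      # c_nk for all n
        EZ[:, k] = expit(c_k + logit(pi[k]) - D / (2 * EZ[:, k].sum()))

    # --- Shrinkage of Z: delete feature k if sum_n E[z_nk] / N ~ 0 ---
    keep = EZ.sum(axis=0) / N > 1e-3
    EZ, W, pi = EZ[:, keep], W[:, keep], pi[keep]
    K = keep.sum()

    # --- Maximize w.r.t. W, lam (closed forms under mean field) ---
    EZZ = EZ.T @ EZ                       # E[Z^T Z]: Bernoulli => diagonal is the usage counts
    np.fill_diagonal(EZZ, EZ.sum(axis=0))
    W = np.linalg.solve(EZZ, EZ.T @ X).T
    # simplified: ignores the Bernoulli variance of Z in the residual
    lam = 1.0 / np.mean((X - EZ @ W.T) ** 2, axis=0)
    pi = EZ.mean(axis=0).clip(1e-3, 1 - 1e-3)  # clipped for numerical safety
```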
Experiments
Artificial Data
• Generate $X$ by the observation model ($D = 30$)
• Error bars: s.d. over 10 trials

[Plot: elapsed time (sec, log scale) vs. $N \in \{100, 250, 500, 1000, 2000\}$, panels for true $K = 5$ and $K = 10$; methods: fab, em, ibp, meibp, vb]

Computational time vs. $N$
Artificial Data (Cont’d)
[Plot: estimated $K$ vs. $N \in \{100, 250, 500, 1000, 2000\}$, panels for true $K = 5$ and $K = 10$]

Selected $K$ vs. $N$
Block Data
Real Data
• Evaluate testing and training errors (PLL and TLL)
Data (N × D)      Method  Time (h)  K             PLL            TLL
Sonar             FAB     < 0.01    4.4 ± 1.1     −1.25 ± 0.02   −1.14 ± 0.03
(208 × 49)        EM      < 0.01    48.8 ± 0.5    −4.04 ± 0.46   −0.08 ± 0.07
                  IBP     3.3       69.6 ± 4.8    −4.48 ± 0.15   0.13 ± 0.02
Libras            FAB     < 0.01    19.0 ± 0.7    −0.63 ± 0.03   −0.42 ± 0.03
(360 × 90)        EM      0.01      75.6 ± 8.6    −0.68 ± 0.11   0.76 ± 0.24
                  IBP     4.8       36.4 ± 1.1    −0.18 ± 0.01   0.13 ± 0.01
Auslan            FAB     0.04      6.0 ± 0.7     −1.34 ± 0.15   −0.92 ± 0.02
(16180 × 22)      EM      0.2       22 ± 0        −1.79 ± 0.27   −0.78 ± 0.02
                  IBP     50.2      73 ± 5        −4.54 ± 0.08   0.08 ± 0.01
EEG               FAB     1.6       11.2 ± 1.6    −0.93 ± 0.02   −0.76 ± 0.04
(120576 × 32)     EM      3.7       32 ± 0        −0.88 ± 0.09   −0.59 ± 0.01
                  IBP     53.0      46.4 ± 4.4    −3.16 ± 0.03   −0.26 ± 0.05
Piano             FAB     19.4      58.0 ± 3.5    −0.83 ± 0.01   −0.63 ± 0.02
(57931 × 161)     EM      50.1      158.6 ± 3.4   −0.82 ± 0.02   −0.45 ± 0.01
                  IBP     55.8      89.6 ± 4.2    −1.83 ± 0.02   −0.84 ± 0.05
yaleB             FAB     2.2       77.2 ± 7.9    −0.37 ± 0.02   −0.29 ± 0.03
(2414 × 1024)     EM      50.9      929 ± 20      −4.60 ± 1.20   0.80 ± 0.27
                  IBP     51.7      94.2 ± 7.5    −0.54 ± 0.02   −0.35 ± 0.02
USPS              FAB     11.2      110.2 ± 5.1   −0.96 ± 0.01   −0.64 ± 0.02
(110000 × 256)    EM      45.7      256 ± 0       −1.06 ± 0.01   −0.36 ± 0.01
                  IBP     61.6      181.0 ± 4.8   −2.59 ± 0.08   −0.76 ± 0.01
Summary
• Derived FIC for LFMs
• Developed a FAB algorithm with sparsity-inducing shrinkage
• Demonstrated that FAB is fast and accurate