Factorized Asymptotic Bayesian Inference for Latent Feature Models
Factorized Asymptotic Bayesian Inference for Latent Feature Models
Kohei Hayashi 1,2
1National Institute of Informatics
2JST, ERATO, Kawarabayashi Large Graph Project
Sept. 5th, 2014
Joint work with Ryohei Fujimaki (NEC Labs America)
1 / 24
Background
Generally, data are high-dimensional and consist of large samples
• Sensor data, texts, images, ...
• Raw data are often hard for humans to interpret
One purpose of machine learning: data interpretation
• Aim: to find meaningful features from data
Latent feature models
2 / 24
Example: Mixture Models (MMs)
3 / 24
Example: Mixture Models (MMs)
4 / 24
Model Selection
In MMs, selection of the number of components is important
• Too many components: difficult to interpret
5 / 24
Factorized Information Criterion (FIC)
Model selection for binary latent variable models
• MMs [Fujimaki&Morinaga AISTATS’12]
• HMMs [Fujimaki&Hayashi ICML’12]
Pros/Cons:
:) Asymptotically equivalent to the marginal likelihood
• Preferable for "Big Data" scenarios
:) Fast computation, no sensitive tuning parameters
• An alternative to nonparametric Bayesian methods
:( Only applicable to MM-type models
6 / 24
Contribution
Derive FIC for latent feature models
• Compact and accurate model selection
• Runtime is 5–50× faster (vs. IBP)
• May be applicable to other non-MM-type models (e.g. topic models)
7 / 24
Latent Feature Models (LFMs)
8 / 24
LFM: an Extension of MM
• LFM considers combinations of components
9 / 24
LFM: an Extension of MM
• LFM considers combinations of components
10 / 24
Observation Model (Likelihood)
For n = 1, . . . , N,
    x_n = W z_n + ε_n   (1)
• x_n ∈ R^D: observation
• W ∈ R^{D×K}: linear bases
• z_n ∈ {0, 1}^K: binary latent variable
• ε_n ∈ R^D: Gaussian noise N(0, diag(λ)^{−1})
[Diagram: X = Z W, with observation X (N × D, real), latent variable Z (N × K, binary), linear bases W (K × D, real)]
11 / 24
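For concreteness, data following observation model (1) can be generated as below (a minimal sketch; all parameter values here are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 500, 30, 5          # sample size, dimension, number of features

pi = np.full(K, 0.3)          # Bernoulli feature probabilities (assumed)
W = rng.normal(size=(D, K))   # linear bases
lam = np.full(D, 4.0)         # noise precisions; covariance is diag(lam)^-1

Z = (rng.random((N, K)) < pi).astype(int)        # binary latent variables
eps = rng.normal(scale=lam ** -0.5, size=(N, D)) # Gaussian noise
X = Z @ W.T + eps                                # observations x_n = W z_n + eps_n
```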
Priors
• p(Z) = ∏_n ∏_k π_k^{z_nk} (1 − π_k)^{1 − z_nk}
• p(P) for P ≡ {π, W, λ}
12 / 24
Marginal Likelihood
A criterion for Bayesian model selection
    p(X) = Σ_Z ∫ dP p(P) p(X, Z | P)   (2)
Problems:
• Integral w.r.t. P is intractable
• Sum over Z needs O(2^K)
Approach: use
• Laplace approximation
• Variational bound (+ mean field + linearization)
13 / 24
FIC of LFMs
14 / 24
Variational Lower Bound
Suppose we have p(X, Z) = ∫ dP p(P) p(X, Z | P); then
    log p(X) ≥ Σ_Z q(Z) log p(X, Z) + H(q)   (3)
• H(q) ≡ −Σ_Z q(Z) log q(Z)
• Equality holds iff q(Z) is the true posterior
15 / 24
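Bound (3) can be checked numerically by enumerating Z on a toy joint distribution (a sketch; the toy model and its values are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
# Toy joint p(X, z) over the 2^K configurations of z, for one fixed X
joint = rng.random(2 ** K)
log_px = np.log(joint.sum())          # exact log p(X) by enumeration

def lower_bound(q):
    # sum_Z q(Z) log p(X, Z) + H(q), as in eq. (3)
    q = q / q.sum()
    return np.sum(q * np.log(joint)) - np.sum(q * np.log(q))

q_any = rng.random(2 ** K)            # an arbitrary variational distribution
posterior = joint / joint.sum()       # the true posterior p(Z | X)

assert lower_bound(q_any) <= log_px + 1e-12        # (3) holds for any q
assert np.isclose(lower_bound(posterior), log_px)  # tight at the posterior
```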
Laplace Approximation of p(X,Z)
Suppose we have maximum likelihood estimators P̂.
For N → ∞,
    log p(X, Z) = log p(X, Z | P̂) − r(Z) − ((D + K)/2) log N + O_p(1)   (4)
• r(Z) ≡ (D/2) Σ_k log Σ_n z_nk: complexity of the model
16 / 24
By combining the two approximations, we obtain FIC_LFM:
    FIC_LFM ≡ max_q E_q[log p(X, Z | P̂) − r(Z)] + H(q) − ((D + K)/2) log N
"Asymptotically" equivalent to the marginal likelihood:
    log p(X) = FIC_LFM + O(1)
• r(Z) = (D/2) Σ_k log Σ_n z_nk prefers sparse Z
[Plot: −log(x) over x = 0–10, illustrating the penalty]
17 / 24
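A quick numeric illustration of why the penalty r(Z) = (D/2) Σ_k log Σ_n z_nk prefers sparse Z (a sketch; the dimensions and activation rates are assumed):

```python
import numpy as np

D = 30

def r(Z):
    # r(Z) = (D/2) * sum_k log(sum_n z_nk): the model-complexity penalty of eq. (4)
    return 0.5 * D * np.log(Z.sum(axis=0)).sum()

rng = np.random.default_rng(2)
N, K = 1000, 5
dense = (rng.random((N, K)) < 0.9).astype(int)   # features active in most samples
sparse = (rng.random((N, K)) < 0.1).astype(int)  # features active in few samples

# r(Z) grows with column sums, so subtracting it in FIC favors the sparse Z
assert r(sparse) < r(dense)
```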
FAB Algorithm
Maximize w.r.t. q̃ and P by an EM-like algorithm
Maximize w.r.t. q̃:
    E_q̃[z_nk] ← sigmoid( c_nk + logit(π_k) − D / (2 Σ_m E_q̃[z_mk]) )
• c_nk = w_k^⊤ diag(λ) (x_n − Σ_{l≠k} E_q̃[z_nl] w_l − (1/2) w_k)
Maximize w.r.t. W and λ:
• Closed-form solutions
Shrink Z:
• Delete z_k and w_k if Σ_n z_nk / N ≃ 0
18 / 24
Experiments
19 / 24
Artificial Data
• Generate X by the observation model (D = 30)
• Error bars: s.d. over 10 trials
[Plot: elapsed time (sec, log scale) vs. N ∈ {100, 250, 500, 1000, 2000}, panels for true K = 5 and K = 10; methods: fab, em, ibp, meibp, vb]
Computational time vs. N
20 / 24
Artificial Data (Cont'd)
[Plot: estimated K vs. N ∈ {100, 250, 500, 1000, 2000}, panels for true K = 5 and K = 10]
Selected K vs. N
21 / 24
Block Data
22 / 24
Real Data
• Evaluate test and training errors (PLL and TLL)

Data (N × D)         Method  Time (h)  K            PLL           TLL
Sonar (208 × 49)     FAB     < 0.01    4.4 ± 1.1    −1.25 ± 0.02  −1.14 ± 0.03
                     EM      < 0.01    48.8 ± 0.5   −4.04 ± 0.46  −0.08 ± 0.07
                     IBP     3.3       69.6 ± 4.8   −4.48 ± 0.15   0.13 ± 0.02
Libras (360 × 90)    FAB     < 0.01    19.0 ± 0.7   −0.63 ± 0.03  −0.42 ± 0.03
                     EM      0.01      75.6 ± 8.6   −0.68 ± 0.11   0.76 ± 0.24
                     IBP     4.8       36.4 ± 1.1   −0.18 ± 0.01   0.13 ± 0.01
Auslan (16180 × 22)  FAB     0.04      6.0 ± 0.7    −1.34 ± 0.15  −0.92 ± 0.02
                     EM      0.2       22 ± 0       −1.79 ± 0.27  −0.78 ± 0.02
                     IBP     50.2      73 ± 5       −4.54 ± 0.08   0.08 ± 0.01
EEG (120576 × 32)    FAB     1.6       11.2 ± 1.6   −0.93 ± 0.02  −0.76 ± 0.04
                     EM      3.7       32 ± 0       −0.88 ± 0.09  −0.59 ± 0.01
                     IBP     53.0      46.4 ± 4.4   −3.16 ± 0.03  −0.26 ± 0.05
Piano (57931 × 161)  FAB     19.4      58.0 ± 3.5   −0.83 ± 0.01  −0.63 ± 0.02
                     EM      50.1      158.6 ± 3.4  −0.82 ± 0.02  −0.45 ± 0.01
                     IBP     55.8      89.6 ± 4.2   −1.83 ± 0.02  −0.84 ± 0.05
yaleB (2414 × 1024)  FAB     2.2       77.2 ± 7.9   −0.37 ± 0.02  −0.29 ± 0.03
                     EM      50.9      929 ± 20     −4.60 ± 1.20   0.80 ± 0.27
                     IBP     51.7      94.2 ± 7.5   −0.54 ± 0.02  −0.35 ± 0.02
USPS (110000 × 256)  FAB     11.2      110.2 ± 5.1  −0.96 ± 0.01  −0.64 ± 0.02
                     EM      45.7      256 ± 0      −1.06 ± 0.01  −0.36 ± 0.01
                     IBP     61.6      181.0 ± 4.8  −2.59 ± 0.08  −0.76 ± 0.01
23 / 24
Summary
• Derive FIC for LFMs
• Develop a FAB algorithm with accelerated shrinkage
• Demonstrate FAB is fast and accurate
24 / 24