A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
Tomonari MASADA (正田备也), Nagasaki University (长崎大学)

Aim
• Obtain an informative summary of a large set of documents
• by extracting word lists, each relating to a specific topic
Topic modeling

Contribution
• We propose a new posterior estimation method for latent Dirichlet allocation (LDA) [Blei+ 03]
• by applying stochastic gradient variational Bayes (SGVB) [Kingma+ 14] to LDA

LDA [Blei+ 03]
• Achieve a clustering of word tokens by assigning each word token to one of the topics.
• $z_{di}$: the topic to which the $i$-th word token in document $d$ is assigned (discrete variables).
• $\theta_{dk}$: How often is topic $k$ talked about in document $d$? (continuous variables)
  • Topic probability distribution in each document
• $\phi_{kv}$: How often is word $v$ used to talk about topic $k$? (continuous variables)
  • Word probability distribution for each topic
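The roles of the three kinds of variables can be sketched with the usual LDA generative process. This is a minimal illustration, not the code behind the slides: the symmetric Dirichlet priors `alpha` and `beta`, the fixed document length, and all function names are assumptions made for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def sample_categorical(probs, rng):
    """Return index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate documents from LDA; alpha and beta are assumed symmetric priors."""
    rng = random.Random(seed)
    # phi[k][v]: how often word v is used to talk about topic k
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs, z, theta = [], [], []
    for _ in range(num_docs):
        # theta_d[k]: how often topic k is talked about in this document
        theta_d = sample_dirichlet([alpha] * num_topics, rng)
        # z_d[i]: topic assigned to the i-th word token (discrete)
        z_d = [sample_categorical(theta_d, rng) for _ in range(doc_len)]
        # each word is drawn from the word distribution of its assigned topic
        x_d = [sample_categorical(phi[k], rng) for k in z_d]
        theta.append(theta_d)
        z.append(z_d)
        docs.append(x_d)
    return docs, z, theta, phi


docs, z, theta, phi = generate_corpus(num_docs=3, doc_len=8, num_topics=2, vocab_size=5)
print(docs[0], z[0])
```

Posterior inference runs in the opposite direction: only `docs` is observed, and `z`, `theta`, and `phi` have to be estimated.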
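
Variational Bayesian (VB) inference = maximization of the evidence lower bound (ELBO)
• VB tries to approximate the true posterior.
• An approximate posterior $q(z, \Theta)$ is introduced when the ELBO is obtained by applying Jensen's inequality to the log evidence $\log p(x)$ of the observed words $x$:
$$\log p(x) = \log \int \sum_{z} p(x, z, \Theta)\, d\Theta \geq \int \sum_{z} q(z, \Theta) \log \frac{p(x, z, \Theta)}{q(z, \Theta)}\, d\Theta$$
• $z$: discrete hidden variables (topic assignments)
• $\Theta$: continuous hidden variables (multinomial parameters)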
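The lower-bound property can be checked numerically on a toy model. The sketch below assumes a model with only a discrete hidden variable (no continuous part), so that both the log evidence and the ELBO are exact sums; the joint probabilities are made-up numbers for illustration.

```python
import math


def log_evidence(joint):
    """log p(x) = log sum_z p(x, z) for a fixed observation x."""
    return math.log(sum(joint))


def elbo(joint, q):
    """sum_z q(z) log(p(x, z) / q(z)); terms with q(z) = 0 contribute nothing."""
    return sum(qz * math.log(pz / qz) for pz, qz in zip(joint, q) if qz > 0.0)


# Hypothetical joint probabilities p(x, z=k) for one fixed x and three topics.
joint = [0.10, 0.02, 0.08]
uniform_q = [1.0 / 3.0] * 3
posterior_q = [p / sum(joint) for p in joint]  # the true posterior p(z | x)

print(log_evidence(joint))        # log evidence
print(elbo(joint, uniform_q))     # strictly smaller than the log evidence
print(elbo(joint, posterior_q))   # matches the log evidence up to round-off
```

The bound is tight exactly when the approximate posterior equals the true posterior, which is why maximizing the ELBO pushes the approximation toward the true posterior.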
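
Factorization assumption
• We assume the approximate posterior factorizes as
$$q(z, \Theta) = q(z)\, q(\Theta)$$
to make the inference tractable.
• Then the ELBO can be written as
$$\mathcal{L} = \int q(\Theta) \sum_{z} q(z) \log p(x, z, \Theta)\, d\Theta - \int q(\Theta) \log q(\Theta)\, d\Theta - \sum_{z} q(z) \log q(z)$$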

Stochastic gradient variational Bayes (SGVB) [Kingma+ 14]
• A general framework for estimating the evidence lower bound (ELBO) in variational Bayes (VB)
• Only applicable to continuous distributions
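
(SGVB) Monte Carlo integration
• By using Monte Carlo integration, the ELBO can be estimated with random samples as
$$\mathcal{L} \approx \frac{1}{S} \sum_{s=1}^{S} \sum_{z} q(z) \log \frac{p(x, z, \Theta^{(s)})}{q(z)\, q(\Theta^{(s)})}, \qquad \Theta^{(s)} \sim q(\Theta)$$
• The discrete part is estimated in a similar manner to the original VB for LDA [Blei+ 03].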
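Monte Carlo integration itself is easy to sketch: an expectation under a continuous distribution is replaced by an average over random samples. The example below is a generic illustration, not the ELBO of LDA; the Gaussian sampler, the integrand, and the helper name `monte_carlo_expectation` are assumptions chosen so that the exact answer is known.

```python
import random


def monte_carlo_expectation(f, sampler, num_samples, rng):
    """Approximate E[f(theta)] by averaging f over samples drawn by sampler(rng)."""
    return sum(f(sampler(rng)) for _ in range(num_samples)) / num_samples


rng = random.Random(0)
mu, sigma = 1.0, 0.5

# E[theta^2] for theta ~ N(mu, sigma^2) has the closed form mu^2 + sigma^2.
estimate = monte_carlo_expectation(
    f=lambda theta: theta * theta,
    sampler=lambda r: r.gauss(mu, sigma),
    num_samples=20000,
    rng=rng,
)
exact = mu ** 2 + sigma ** 2

print(estimate, exact)
```

The estimate is itself random: a different seed gives a slightly different value, and the spread shrinks as the number of samples grows.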

(SGVB) Reparameterization
• SGVB can be applied "under certain mild conditions."
• We use logistic normal distributions to approximate the true posteriors of
  • $\theta_d$: per-document topic probability distributions, and
  • $\phi_k$: per-topic word probability distributions.
• We can efficiently sample from the logistic normal with reparameterization.
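A reparameterized draw from a logistic normal can be sketched as follows: sample standard Gaussian noise, shift and scale it with the variational parameters, then map the result onto the simplex with a softmax. The diagonal covariance and the names `mu` and `sigma` are assumptions made for this sketch.

```python
import math
import random


def softmax(values):
    """Map real values onto the probability simplex (numerically stable)."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mu, sigma, rng):
    """Reparameterized draw: theta = softmax(mu + sigma * eps), eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]  # noise that does not depend on mu or sigma
    return softmax([m + s * e for m, s, e in zip(mu, sigma, eps)])


rng = random.Random(0)
mu = [0.2, -1.0, 0.7]      # hypothetical variational means for three topics
sigma = [0.3, 0.3, 0.3]    # hypothetical standard deviations
theta_d = sample_logistic_normal(mu, sigma, rng)
print(theta_d)
```

Because the randomness enters only through `eps`, the sample is a differentiable function of `mu` and `sigma`, which is what makes gradient-based maximization possible.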

Maximize ELBO using gradient ascent

"Stochastic" gradient VB
• The expectation integrals in the ELBO are estimated by the Monte Carlo method.
• The derivatives of the ELBO depend on random samples.
• Randomness is incorporated into the maximization.
  • SGVB = VB where gradients are stochastic.
  • (Observation) It seems easier to avoid poor local optima.
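How random samples end up in the gradient can be sketched on a toy problem. The objective below is not the ELBO of LDA: it is an assumed stand-in, E[-(theta - target)^2] with theta = m + sigma * eps, chosen because its maximizer (m = target) is known. Step size, number of steps, and function name are likewise illustrative.

```python
import random


def stochastic_gradient_ascent(target=3.0, sigma=0.5, step_size=0.01,
                               num_steps=2000, seed=0):
    """Maximize E[-(theta - target)^2] over m, where theta = m + sigma * eps."""
    rng = random.Random(seed)
    m = 0.0
    for _ in range(num_steps):
        eps = rng.gauss(0.0, 1.0)          # fresh noise at every step
        theta = m + sigma * eps            # reparameterized sample
        grad_m = -2.0 * (theta - target)   # one-sample estimate of the gradient w.r.t. m
        m += step_size * grad_m            # ascent step with a noisy gradient
    return m


m_hat = stochastic_gradient_ascent()
print(m_hat)  # ends up close to the maximizer m = 3, with some residual noise
```

Each step uses a single sample, so individual gradients are noisy, yet on average they point toward the maximizer.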

Without randomness = with zero standard deviation
• A special case of the proposed method is quite similar to CVB0 [Asuncion+ 09].
• Our method has a context.
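The effect of a zero standard deviation can be sketched directly: with sigma = 0 the reparameterized sample no longer depends on the noise and collapses to the softmax of the mean. This only illustrates that sampling becomes deterministic; it does not reproduce the CVB0 update, and the diagonal-covariance logistic normal and parameter names are assumptions.

```python
import math
import random


def softmax(values):
    """Map real values onto the probability simplex (numerically stable)."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mu, sigma, rng):
    """theta = softmax(mu + sigma * eps) with eps ~ N(0, I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return softmax([m + s * e for m, s, e in zip(mu, sigma, eps)])


mu = [0.2, -1.0, 0.7]          # hypothetical variational means
zero_sigma = [0.0, 0.0, 0.0]   # zero standard deviation: no randomness left

# Different random seeds give the same "sample" once sigma is zero.
draw_a = sample_logistic_normal(mu, zero_sigma, random.Random(1))
draw_b = sample_logistic_normal(mu, zero_sigma, random.Random(2))
print(draw_a == draw_b == softmax(mu))
```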

Data sets for evaluation
         # docs     # vocabulary words
NYT      99,932     46,263
MOVIE    27,859     62,408
NSF      128,818    21,471
MED      125,490    42,830

Not that efficient in time…
• 500 iterations for the NYT data set when
  • LNV: 43 hours
  • CGS: 14 hours
  • VB: 23 hours
• However, parallelization with GPU works.
  • (preparing an implementation with TensorFlow)

Conclusion
• We incorporate randomness into variational inference for LDA by applying SGVB to LDA.
• The proposed method gives perplexities comparable to those of the existing inference methods for LDA.
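Perplexity can be sketched under the usual definition for topic models: the exponential of the negative average log-likelihood per word token, where each token's probability is a topic-weighted mixture of word probabilities. The exact evaluation protocol behind the reported numbers is not stated here, so the function below is an assumed, generic version.

```python
import math


def perplexity(docs, theta, phi):
    """exp(-(1/N) * sum over tokens of log sum_k theta[d][k] * phi[k][v])."""
    total_log_lik = 0.0
    num_tokens = 0
    for d, doc in enumerate(docs):
        for v in doc:
            # probability of word v in document d, mixing over all topics
            p = sum(theta[d][k] * phi[k][v] for k in range(len(phi)))
            total_log_lik += math.log(p)
            num_tokens += 1
    return math.exp(-total_log_lik / num_tokens)


# Hypothetical example: two topics that are both uniform over four words.
docs = [[0, 1, 2, 3]]
theta = [[0.5, 0.5]]
phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
print(perplexity(docs, theta, phi))
```

With uniform word distributions the perplexity equals the vocabulary size (four here); lower values indicate a model that predicts the held-out words better.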

Future work
• SGVB is a general framework for devising posterior inference for probabilistic models.
• We've already applied SGVB to CTM [Blei+ 05].
  • This will be presented as a poster at APWeb'16.
• SGVB is also applicable to other document models.
  • NVDM [Miao+ 16]: document modeling with MLP