Inter and Intra Topic Structure Learning with Word Embeddings
He Zhao 1  Lan Du 1  Wray Buntine 1  Mingyuan Zhou 2
Abstract

One important task of topic modeling for text analysis is interpretability. By discovering structured topics one is able to yield improved interpretability as well as modeling accuracy. In this paper, we propose a novel topic model with a deep structure that explores both inter-topic and intra-topic structures informed by word embeddings. Specifically, our model discovers inter topic structures in the form of topic hierarchies and discovers intra topic structures in the form of sub-topics, each of which is informed by word embeddings and captures a fine-grained thematic aspect of a normal topic. Extensive experiments demonstrate that our model achieves the state-of-the-art performance in terms of perplexity, document classification, and topic quality. Moreover, with topic hierarchies and sub-topics, the topics discovered in our model are more interpretable, providing an illuminating means to understand text data.
1. Introduction

Significant research effort has been devoted to developing advanced text analysis technologies. Probabilistic topic models, such as Latent Dirichlet Allocation (LDA), are popular approaches for this task, which discover latent topics from text collections. One preferred property of probabilistic topic models is interpretability: one can explain that a document is composed of topics and a topic is described by words. Although widely used, most variations of standard vanilla topic models (e.g., LDA) assume topics are independent and there are no structures among them. This limits those models' ability to explore any hierarchical thematic structures. Therefore, it is interesting to develop a model that is capable of exploring topic structures and yields not only improved modeling accuracy but also better interpretability.

1 Faculty of Information Technology, Monash University, Australia. 2 McCombs School of Business, University of Texas at Austin. Correspondence to: Lan Du <lan.du@monash.edu>, Mingyuan Zhou <mingyuan.zhou@mccombs.utexas.edu>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Table 1. Example local topics with top 10 words
1: journal science biology research journals international cell psychology scientific bioinformatics
2: fitness piano guitar swimming violin weightlifting lessons training swim weight
3: taylor prince swift william jovi bon woman gala pill jon
4: san auto theft grand andreas mobile gta game rockstar december
One popular direction to explore topic structure is using the hierarchical/deep representation of text data, such as the nested hierarchical Dirichlet process (nHDP) (Paisley et al., 2015), Deep Poisson Factor Analysis (DPFA) (Gan et al., 2015), and the Gamma Belief Network (GBN) (Zhou et al., 2016; Cong et al., 2017). In general, these models assume that topics in the higher layers of a hierarchy are more general/abstract than those in the lower layers. Therefore, by revealing hierarchical correlations between topics, topic hierarchies provide an intuitive way to understand text data.
In addition to topic hierarchies, we are also interested in analyzing the fine-grained thematic structure within each individual topic. In conventional models, topics are discovered locally from the word co-occurrences in a corpus, so we refer to those topics as local topics. Due to the limited context of a target corpus, some local topics may be hard to interpret because of the following two effects: (1) they can mix words that co-occur locally in the target corpus but are less semantically related in general; (2) local topics can be dominated by specialized words, which are less interpretable without extra knowledge. For example, we show four example topics from our experiments in Table 1, where we can see: Topic 1 is composed of words from both the "scientific publication" and "biology" aspects; Topic 2 is a mixture of "sports" and "music"; Topics 3 and 4 are very specific topics about a "singer" and a "video game" respectively. We humans are able to understand those local topics in the above way because we are equipped with the global semantics of the words, which lets us go beyond the local context of the target corpus. Therefore, we are motivated to propose a model that automatically analyzes the fine-grained thematic structures of local topics, further improving the interpretability of topic modeling.
WEDTM
Fortunately, word embeddings such as GloVe (Pennington et al., 2014), word2vec (Mikolov et al., 2013), and FastText (Bojanowski et al., 2017) can be used as an accessible source of global semantic information for topic models. Learned from large corpora, word embeddings encode the semantics of words with their locations in a space, where more related words are closer to each other. For example, in Topic 1, according to the distances between word embeddings, the words "biology, cell, psychology, bioinformatics" should be in one cluster and "journal, science, research, international, scientific" should be in the other. Therefore, if a topic model can leverage the information in word embeddings, it may discover the fine-grained thematic structures of local topics. Furthermore, it has been demonstrated that conventional topic models suffer from data sparsity, resulting in a large performance degradation on shorter internet-generated documents like tweets, product reviews, and news headlines (Zuo et al., 2016; Zhao et al., 2017c). In this case, word embeddings can also serve as complementary information to alleviate the sparsity issue in topic models.
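The clustering intuition above can be sketched with cosine similarity over embedding vectors. This is a minimal illustration only: the 3-dimensional vectors below are hypothetical stand-ins for real 50-dimensional GloVe embeddings, chosen so that the two thematic clusters of Topic 1 are separable.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for real GloVe embeddings (hypothetical values).
emb = {
    "journal": np.array([0.9, 0.1, 0.0]),
    "science": np.array([0.8, 0.2, 0.1]),
    "biology": np.array([0.1, 0.9, 0.2]),
    "cell":    np.array([0.05, 0.95, 0.1]),
}

# Words from the same thematic aspect of the topic sit closer together.
assert cosine(emb["biology"], emb["cell"]) > cosine(emb["biology"], emb["journal"])
assert cosine(emb["journal"], emb["science"]) > cosine(emb["journal"], emb["cell"])
```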
In this paper, we propose a novel deep structured topic model, named the Word Embeddings Deep Topic Model (WEDTM)1, which improves the interpretability of topic models by discovering topic hierarchies (i.e., inter topic structure) and fine-grained interpretations of local topics (i.e., intra topic structure). Specifically, the proposed model adapts a multi-layer Gamma Belief Network which generates deep representations of topics as well as documents. Moreover, WEDTM is able to split a local topic into a set of sub-topics, each of which captures one fine-grained thematic aspect of the local topic, in a way that each sub-topic is informed by word embeddings. WEDTM has the following key properties: (1) better interpretability with topic hierarchies and sub-topics informed by word embeddings; (2) state-of-the-art perplexity, document classification, and topic coherence performance, especially for sparse text data; (3) a straightforward Gibbs sampling algorithm facilitated by fully local conjugacy under data augmentation.
2. Related WorkDeep/hierarchical topic models: Several approaches havebeen developed to learn hierarchical representations ofdocuments and topics. The Pachinko Allocation model(PAM) (Li & McCallum, 2006) assumes the topic structureis modeled by a directed acyclic graph (DAG), which isdocument specific. nCRP (Blei et al., 2010) models topichierarchies by introducing a tree structure prior constructedwith multiple CRPs. Paisley et al. (2015); Kim et al. (2012);Ahmed et al. (2013) further extend nCRP by either softeningits constraints or applying it to different problems respec-
1https://github.com/ethanhezhao/WEDTM
tively. Poisson Factor Analysis (PFA) (Zhou et al., 2012)is a nonnegative matrix factorization model with Poissonlink, which is a popular alternative to LDA, for topic model-ing. The details of the close relationships between PFA andLDA can be found in Zhou (2018). There are several deepextensions to PFA for documents, such as DPFA (Gan et al.,2015), DPFM (Henao et al., 2015), and GBN (Zhou et al.,2016). Among them, GBN factorizes the factor score ma-trix (topic weights of documents) in PFA with nonnegativegamma-distributed hidden units connected by the weightsdrawn from the Dirichlet distribution. From a modeling per-spective, GBN is related to PAM, while GBN assumes thereis a corpus-level topic hierarchy shared by all the documents.As reported by Cong et al. (2017), GBN outperforms otherhierarchical models including nHDP, DPFA, and DPFM.Despite having the attractive properties, these deep modelsbarely consider intra topic structures or the sparsity issueassociated with internet-generated corpora.
Word embedding topic models: Recently, there has been growing interest in applying word embeddings to topic models, especially for sparse data. For example, WF-LDA (Petterson et al., 2010) extends LDA to model word features with the logistic-normal transform, and word embeddings are used as word features in Zhao et al. (2017b). LF-LDA (Nguyen et al., 2015) integrates word embeddings into LDA by replacing the topic-word Dirichlet multinomial component with a mixture of a Dirichlet multinomial component and a word embedding component. Due to the non-conjugacy in WF-LDA and LF-LDA, part of the inference has to be done by MAP optimization. Instead of generating tokens, Gaussian LDA (GLDA) (Das et al., 2015) directly generates word embeddings with the Gaussian distribution. The model proposed in Xun et al. (2017) further extends GLDA by modeling topic correlations. MetaLDA (Zhao et al., 2017c; 2018a) is a conjugate topic model that incorporates both document and word meta information. However, in MetaLDA, word embeddings have to be binarized, which can lose useful information. WEI-FTM (Zhao et al., 2017b) is a focused topic model where a topic focuses on a subset of words, informed by word embeddings. To our knowledge, topic hierarchies and sub-topics are not considered in most of the existing word embedding models.
3. The Proposed Model

Based on the PFA framework, WEDTM is a hierarchical model with two major components: one for discovering the inter topic hierarchies and the other for discovering intra topic structures (i.e., sub-topics) informed by word embeddings. The two components are connected by the bottom-layer topics, detailed as follows. Assume that each document $j$ is represented as a word count vector $x_j^{(1)} \in \mathbb{N}_0^V$, where $V$ is the size of the vocabulary; the pre-trained $L$-dimensional real-valued embeddings for each word $v \in \{1, \cdots, V\}$ are stored in an $L$-dimensional vector $f_v \in \mathbb{R}^L$. Now we consider WEDTM with $T$ hidden layers, where the $t$-th layer has $K_t$ topics and $k_t$ is the index of each topic. In the bottom layer ($t = 1$), there are $K_1$ (local) topics, each of which is associated with $S$ sub-topics. For clarity, we split the generative process of the model into three parts, shown as follows:
Generating documents:
$$x_j^{(1)} \sim \mathrm{Pois}\left(\Phi^{(1)} \theta_j^{(1)}\right), \quad \phi_{k_1}^{(1)} \sim \mathrm{Dir}\left(\beta_{k_1}\right), \quad \theta_j^{(1)} \sim \mathrm{Gam}\left(\Phi^{(2)} \theta_j^{(2)},\; p_j^{(2)}/(1 - p_j^{(2)})\right),$$

Inter structure:
$$\theta_j^{(T)} \sim \mathrm{Gam}\left(r,\; 1/c_j^{(T+1)}\right), \quad \theta_j^{(t)} \sim \mathrm{Gam}\left(\Phi^{(t+1)} \theta_j^{(t+1)},\; 1/c_j^{(t+1)}\right) \; (t < T), \quad \phi_{k_t}^{(t)} \sim \mathrm{Dir}\left(\eta_0 \mathbf{1}\right) \; (t > 1),$$

Intra structure:
$$w_{k_1}^{\langle s \rangle} \sim \mathcal{N}\left(0,\; \mathrm{diag}(1/\sigma^{\langle s \rangle})\right), \quad \alpha_{k_1}^{\langle s \rangle} \sim \mathrm{Gam}\left(\alpha_0^{\langle s \rangle}/S,\; 1/c_0^{\langle s \rangle}\right),$$
$$\beta_{vk_1}^{\langle s \rangle} \sim \mathrm{Gam}\left(\alpha_{k_1}^{\langle s \rangle},\; e^{f_v^\top w_{k_1}^{\langle s \rangle}}\right), \quad \beta_{vk_1} := \sum_s^S \beta_{vk_1}^{\langle s \rangle},$$
where $(t)$ is the index of the layer that a variable belongs to and $\langle s \rangle$ is the index of sub-topic $s$. To complete the model, we impose the following priors on the latent variables:
$$r_{k_T} \sim \mathrm{Gam}(\gamma_0/K_T,\; 1/c_0), \quad \gamma_0 \sim \mathrm{Gam}(a_0,\; 1/b_0), \quad p_j^{(t)} \sim \mathrm{Beta}(a_0, b_0),$$
$$c_j^{(t)} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad \alpha_0^{\langle s \rangle} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad c_0^{\langle s \rangle} \sim \mathrm{Gam}(e_0,\; 1/f_0), \quad \sigma_l^{\langle s \rangle} \sim \mathrm{Gam}(a_0,\; 1/b_0).$$
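The generative process above can be sketched numerically. The snippet below is a minimal, single-layer simulation of the model's bottom layer (the deeper gamma chain over $\theta$ is replaced by a plain gamma stand-in, and the embedding matrix is a hypothetical stand-in for pre-trained GloVe vectors), only to make the flow from sub-topic embeddings to word counts concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
V, K1, L, S, J = 1000, 20, 50, 5, 100  # vocab, topics, embed dim, sub-topics, docs

# Hypothetical pre-trained word embeddings f_v (stand-ins for GloVe vectors).
F = rng.standard_normal((L, V)) * 0.1

# Intra structure: sub-topic embeddings induce the asymmetric Dirichlet prior beta.
w = rng.standard_normal((S, K1, L)) * 0.1            # sub-topic embeddings w^<s>_{k1}
alpha = rng.gamma(1.0 / S, 1.0, size=(S, K1))        # shapes alpha^<s>_{k1}
scale = np.exp(np.einsum("skl,lv->skv", w, F))       # scales exp(f_v^T w^<s>_{k1})
beta = rng.gamma(alpha[:, :, None], scale).sum(0).T  # beta_{v,k1} = sum_s beta^<s>, (V, K1)

# Bottom layer: phi^(1)_{k1} ~ Dir(beta_{k1}) and x^(1)_j ~ Pois(Phi^(1) theta^(1)_j).
Phi1 = np.stack([rng.dirichlet(beta[:, k]) for k in range(K1)], axis=1)  # (V, K1)
theta = rng.gamma(1.0, 1.0, size=(K1, J))  # stand-in for the deeper gamma chain
X = rng.poisson(Phi1 @ theta)
assert X.shape == (V, J)
```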
We first take a look at the bottom layer of the model, i.e., the process of generating the documents, which follows the PFA framework. In this part, WEDTM models the word counts $x_j^{(1)}$ of a document with a Poisson (Pois) distribution and factorizes the Poisson parameters into a product of the factor loadings $\Phi^{(1)} \in \mathbb{R}_+^{V \times K_1}$ and hidden units $\theta_j^{(1)}$. $\theta_j^{(1)}$ is the first-layer latent representation (unnormalized topic weights) of document $j$, each element of which is drawn from a gamma (Gam) distribution2. The $k_1$-th column of $\Phi^{(1)}$, $\phi_{k_1}^{(1)} \in \mathbb{R}_+^V$, is the word distribution of topic $k_1$, drawn from a Dirichlet (Dir) distribution. We then explain the component for discovering inter topic hierarchies, which is similar to the structure of GBN (Zhou et al., 2016). Specifically, the shape parameter of $\theta_j^{(1)}$ is factorized into $\theta_j^{(2)}$ and $\Phi^{(2)} \in \mathbb{R}_+^{K_1 \times K_2}$, where $\theta_j^{(2)}$ is the second-layer latent representation of document $j$ and $\phi_{k_2}^{(2)} \in \mathbb{R}_+^{K_1}$ models the correlations between topic $k_2$ and all the first-layer topics. Note that, strictly speaking, $k_2$ is not a "real" topic as it is not a distribution over words, but it can be interpreted with words via $\Phi^{(1)} \phi_{k_2}^{(2)}$. By repeating this construction, we are able to build a deep structure to discover topic hierarchies.

2 The first and second parameters of the gamma distribution are the shape and scale respectively.
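The projection of a higher-layer topic onto the vocabulary can be sketched as a matrix-vector product. The values below are synthetic stand-ins, purely to illustrate that $\Phi^{(1)} \phi^{(2)}_{k_2}$ is itself a valid distribution over words:

```python
import numpy as np

rng = np.random.default_rng(1)
V, K1 = 8, 3
vocab = ["journal", "science", "cell", "biology", "piano", "guitar", "swim", "fitness"]

# Synthetic stand-ins: layer-1 topics over words and one layer-2 topic over
# layer-1 topics (hypothetical values, for illustration only).
Phi1 = np.stack([rng.dirichlet(np.ones(V) * 0.1) for _ in range(K1)], axis=1)  # (V, K1)
phi2 = rng.dirichlet(np.ones(K1) * 0.5)                                        # (K1,)

# A layer-2 topic is not itself a word distribution, but Phi^(1) phi^(2)_{k2}
# projects it onto the vocabulary, where it can be read off like a topic.
dist = Phi1 @ phi2
top_words = [vocab[i] for i in np.argsort(dist)[::-1][:3]]
assert np.isclose(dist.sum(), 1.0)  # the projection is still a distribution
```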
Now we explain how sub-topics are discovered for the bottom-layer topics with the help of word embeddings. First of all, WEDTM applies individual asymmetric Dirichlet parameters $\beta_{k_1} \in \mathbb{R}_+^V$ for each bottom-layer (local) topic $\phi_{k_1}^{(1)}$. We further construct $\beta_{vk_1} = \sum_s^S \beta_{vk_1}^{\langle s \rangle}$, where $\beta_{vk_1}^{\langle s \rangle}$ models how strongly word $v$ is associated with sub-topic $s$ in local topic $k_1$. For each sub-topic $s$, we introduce an $L$-dimensional sub-topic embedding $w_{k_1}^{\langle s \rangle} \in \mathbb{R}^L$. As $\beta_{vk_1}^{\langle s \rangle}$ is gamma distributed, its scale parameter is constructed from the dot product of the embeddings of sub-topic $s$ and word $v$ through the exponential function.
The basic idea of our model is summarized as follows:
1. In terms of sub-topics, we assume each (local, bottom-layer) topic is associated with several sub-topics, in a way that the sub-topics contribute to the prior of the local topic via a sum model (Zhou, 2016). Therefore, if a word dominates in one or more sub-topics, it is likely that the word will still dominate in the local topic. With this construction, a sub-topic is expected to capture one fine-grained thematic aspect of the local topic and each sub-topic can be directly interpreted with words via $\beta_{k_1}^{\langle s \rangle} \in \mathbb{R}_+^V$.
2. To leverage word embeddings to inform the learning of sub-topics, we introduce a sub-topic embedding for each of them, $w_{k_1}^{\langle s \rangle}$, which directly interacts with the word embeddings. Therefore, sub-topic embeddings are learned with both the local context of the target corpus and the global information of word embeddings. According to our model construction, the probability density function of $\beta_{vk_1}$ is the convolution of $S$ covariance-dependent gamma distributions (Zhou, 2016). Therefore, if the sub-topic embedding of $s$ and the word embedding of $v$ are close, their dot product will be large, giving a large expectation of $\beta_{vk_1}^{\langle s \rangle}$. The large expectation means that $v$ has a large weight in sub-topic $s$ of $k_1$. Finally, $\beta_{vk_1}^{\langle s \rangle}$ further contributes to the local topic's prior $\beta_{vk_1}$, informing $\phi_{vk_1}^{(1)}$ of the local topic.
3. It is also noteworthy to consider the special case of WEDTM where $S = 1$, meaning that there are no sub-topics and each local topic $k_1$ is associated with one topic embedding vector $w_{k_1}$. Consequently, in WEDTM, there are three latent variables capturing the weights between the words and local topic $k_1$: $e^{F^\top w_{k_1}}$ (where $F \in \mathbb{R}^{L \times V}$ stores the embeddings of all the words), $\beta_{k_1}$, and $\phi_{k_1}^{(1)}$, each of which is a vector over words. It is interesting to analyze their connections and differences. $e^{F^\top w_{k_1}}$ is the prior of $\beta_{k_1}$, while $\beta_{k_1}$ is the prior of $\phi_{k_1}^{(1)}$. So $e^{F^\top w_{k_1}}$ is the closest one to the word embeddings, i.e., the global semantic information, while $\phi_{k_1}^{(1)}$ is the closest one to the data, i.e., the local document context of the target corpus. Therefore, unlike conventional topic models with $\phi_{k_1}^{(1)}$ only, the three variables of WEDTM give three different views of the same topic, from global to local. We qualitatively show this interesting comparison in Section 5.4.
4. Last but not least, word embeddings in WEDTM can be viewed as serving as prior/complementary information that assists the learning of the whole model, which is important especially for sparse data.
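The alignment argument in point 2 can be checked numerically. Under $\beta^{\langle s \rangle}_{vk_1} \sim \mathrm{Gam}(\alpha^{\langle s \rangle}_{k_1},\ e^{f_v^\top w^{\langle s \rangle}_{k_1}})$ the expectation is shape times scale, $\alpha \cdot e^{f_v^\top w}$. The embedding values below are hypothetical, serving only to show that alignment in embedding space raises the expected weight:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 50
alpha = 0.5                                  # gamma shape alpha^<s>_{k1}
w = rng.standard_normal(L) * 0.2             # a sub-topic embedding (hypothetical)

f_near = w + rng.standard_normal(L) * 0.01   # word embedding aligned with the sub-topic
f_far = -w                                   # word embedding pointing away from it

# E[beta^<s>_{v,k1}] = alpha * exp(f_v^T w^<s>_{k1}); alignment raises it.
e_near = alpha * np.exp(f_near @ w)
e_far = alpha * np.exp(f_far @ w)
assert e_near > e_far
```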
4. Inference

Unlike many other word embedding topic models, the fully local conjugacy of WEDTM facilitates the derivation of an effective Gibbs sampling algorithm. As the sampling of the latent variables in the process of generating documents and modeling the inter topic structure is similar to GBN, the details can be found in Zhou et al. (2016). Here we focus on the sampling of the latent variables for modeling the intra topic structure.
Assume that the latent counts for the bottom-layer local topics, $x_{vjk_1}^{(1)}$, which count how many tokens of word $v$ in document $j$ are allocated to local topic $k_1$, have been sampled by Eq. (28) in Appendix B of Zhou et al. (2016).
Sample $\beta_{vk_1}^{\langle s \rangle}$. We first sample:
$$\left(h_{vk_1}^{\langle 1 \rangle}, \cdots, h_{vk_1}^{\langle S \rangle}\right) \sim \mathrm{Mult}\left(h_{vk_1},\; \frac{\beta_{vk_1}^{\langle 1 \rangle}}{\beta_{vk_1}}, \cdots, \frac{\beta_{vk_1}^{\langle S \rangle}}{\beta_{vk_1}}\right), \quad (1)$$
where $h_{vk_1} \sim \mathrm{CRT}\left(x_{v \cdot k_1}^{(1)},\; \beta_{vk_1}\right)$ (Zhou & Carin, 2015; Zhao et al., 2017a), and $x_{v \cdot k_1}^{(1)} := \sum_j x_{vjk_1}^{(1)}$3. Then:
$$\beta_{vk_1}^{\langle s \rangle} \sim \mathrm{Gam}\left(\alpha_{k_1}^{\langle s \rangle} + h_{vk_1}^{\langle s \rangle},\; \frac{1}{e^{-\pi_{vk_1}^{\langle s \rangle}} + \log \frac{1}{q_{k_1}}}\right), \quad (2)$$
where $q_{k_1} \sim \mathrm{Beta}\left(\beta_{\cdot k_1},\; x_{\cdot \cdot k_1}^{(1)}\right)$ (Zhao et al., 2018b) and we define $\pi_{vk_1}^{\langle s \rangle} := f_v^\top w_{k_1}^{\langle s \rangle}$.

3 We hereafter use $\cdot$ in place of an index to denote the sum over that dimension.
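The two auxiliary draws above are straightforward to simulate. A CRT$(n, r)$ draw is a sum of $n$ independent Bernoullis with probabilities $r/(r+i-1)$, $i = 1..n$ (Zhou & Carin, 2015), and Eq. (1) is a plain multinomial split. The sketch below uses toy, hypothetical values for the sub-topic weights:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_crt(n, r, rng):
    """Draw l ~ CRT(n, r), the Chinese restaurant table distribution:
    l = sum_{i=1}^{n} Bernoulli(r / (r + i - 1))."""
    if n == 0:
        return 0
    return int((rng.random(n) < r / (r + np.arange(n))).sum())

# h_{v,k1} ~ CRT(x^(1)_{v.,k1}, beta_{v,k1}), then Eq. (1) splits h across the
# S sub-topics in proportion to beta^<s>_{v,k1} (toy values, hypothetical).
beta_s = np.array([0.5, 1.5, 0.2, 0.1, 0.7])
h = sample_crt(25, beta_s.sum(), rng)
counts = rng.multinomial(h, beta_s / beta_s.sum())
assert 1 <= h <= 25 and counts.sum() == h
```

Note that the first customer always opens a table, so $h \geq 1$ whenever $n \geq 1$.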
Sample $\alpha_{k_1}^{\langle s \rangle}$. We first sample $g_{vk_1}^{\langle s \rangle} \sim \mathrm{CRT}\left(h_{vk_1}^{\langle s \rangle},\; \alpha_{k_1}^{\langle s \rangle}\right)$, then:
$$\alpha_{k_1}^{\langle s \rangle} \sim \mathrm{Gam}\left(\alpha_0^{\langle s \rangle}/S + g_{\cdot k_1}^{\langle s \rangle},\; \frac{1}{c_0^{\langle s \rangle} + \sum_v \log\left(1 + e^{\pi_{vk_1}^{\langle s \rangle}} \log \frac{1}{q_{k_1}}\right)}\right). \quad (3)$$

It is noteworthy that the hierarchical construction on $\alpha_{k_1}^{\langle s \rangle}$ is closely related to the gamma-negative binomial process and can be considered as a (truncated) gamma process (Zhou & Carin, 2015; Zhou, 2016) with an intrinsic shrinkage mechanism on $S$. This means the model is able to automatically learn the number of effective sub-topics.
Sample $w_{k_1}^{\langle s \rangle}$.
$$w_{k_1}^{\langle s \rangle} \sim \mathcal{N}\left(\mu_{k_1}^{\langle s \rangle},\; \Sigma_{k_1}^{\langle s \rangle}\right),$$
$$\mu_{k_1}^{\langle s \rangle} = \Sigma_{k_1}^{\langle s \rangle} \left[\sum_v^V \left(\frac{h_{vk_1}^{\langle s \rangle} - \alpha_{k_1}^{\langle s \rangle}}{2} - \omega_{vk_1}^{\langle s \rangle} \log\log\frac{1}{q_{k_1}}\right) f_v\right],$$
$$\Sigma_{k_1}^{\langle s \rangle} = \left[\mathrm{diag}(1/\sigma^{\langle s \rangle}) + \sum_v^V \omega_{vk_1}^{\langle s \rangle} f_v f_v^\top\right]^{-1}, \quad (4)$$
where $\omega_{vk_1}^{\langle s \rangle} \sim \mathrm{PG}\left(h_{vk_1}^{\langle s \rangle} + \alpha_{k_1}^{\langle s \rangle},\; \pi_{vk_1}^{\langle s \rangle} + \log\log\frac{1}{q_{k_1}}\right)$ and PG denotes the Polya gamma distribution (Polson et al., 2013). To sample from PG, we use the accurate and efficient approximate sampler of Zhou (2016).
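The Gaussian update of Eq. (4) can be sketched with plain linear algebra. In the sketch below, the Polya-gamma draws are replaced by the PG mean $b/(2c)\tanh(c/2)$ as a crude stand-in (a real implementation would use a PG sampler, e.g., the approximate one of Zhou (2016)), and all inputs are hypothetical synthetic values:

```python
import numpy as np

rng = np.random.default_rng(4)
V, L = 200, 50

F = rng.standard_normal((L, V)) * 0.3       # word embeddings f_v (hypothetical)
h = rng.poisson(1.0, size=V).astype(float)  # latent counts h^<s>_{v,k1}
alpha = 0.5                                 # alpha^<s>_{k1}
offset = 0.1                                # the offset log log(1/q_{k1})
sigma_inv = np.ones(L)                      # diag(1/sigma^<s>)

# Stand-in for omega ~ PG(h + alpha, psi): plug in the PG mean b/(2c)*tanh(c/2),
# purely to illustrate the Gaussian conditional update of Eq. (4).
psi = F.T @ (rng.standard_normal(L) * 0.1) + offset
omega = (h + alpha) / (2 * psi) * np.tanh(psi / 2)

Sigma = np.linalg.inv(np.diag(sigma_inv) + (F * omega) @ F.T)  # precision form, Eq. (4)
Sigma = (Sigma + Sigma.T) / 2                                  # symmetrize numerically
mu = Sigma @ (F @ ((h - alpha) / 2 - omega * offset))
w_new = rng.multivariate_normal(mu, Sigma)                     # draw w^<s>_{k1} | -
assert w_new.shape == (L,)
```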
Omitted derivations, details, and the overall algorithm arein the supplementary materials.
5. Experiments

We evaluate the proposed WEDTM by comparing it with several recent advances including deep topic models and word embedding topic models. The experiments were conducted on four real-world datasets including both regular and sparse texts. We report perplexity, document classification accuracy, and topic coherence scores. We also qualitatively analyze the topic hierarchies and sub-topics.
5.1. Experimental Settings
In the experiments, we used a regular text dataset (20NG) and three sparse text datasets (WS, TMN, Twitter), the details of which are as follows: 1. 20NG, 20 Newsgroup, consists of 18,774 articles with 20 categories. Following Zhou et al. (2016), we used the 2,000 most frequent terms after removing stopwords. The average document length is 76. 2. WS, Web Snippets, contains 12,237 web search snippets with 8 categories, used by Li et al. (2016); Zhao et al. (2017c;b). The vocabulary contains 10,052 tokens and there are 15 words in one snippet on average. 3. TMN, Tag My News, consists of 32,597 RSS news snippets from Tag My News with 7 categories, used by Nguyen et al. (2015); Zhao et al. (2017c;b). Each snippet contains a title and a short description. There are 13,370 tokens in the vocabulary and the average length of a snippet is 18. 4. Twitter, was extracted from the 2011 and 2012 microblog tracks of the Text REtrieval Conference (TREC)4 and preprocessed in Yin & Wang (2014). It has 11,109 tweets in total. The vocabulary size is 6,344 and a tweet contains 21 words on average.

[Figure 1: eight panels of relative per-heldout-word perplexity curves for WEDTM-1/2/3, GBN-1/2/3, MetaLDA, and WEI-FTM. Panels (a)-(d): WS, TMN, Twitter, and 20NG with varied K1; absolute GBN-1 perplexities: 704, 595, 548 (WS); 1498, 1299, 1190 (TMN); 461, 349, 305 (Twitter); 631, 522, 479 (20NG). Panels (e)-(h): the same datasets with varied proportions of training words; absolute GBN-1 perplexities: 1786, 994, 728, 595 (WS); 3140, 1925, 1495, 1299 (TMN); 699, 446, 382, 349 (Twitter); 704, 595, 549, 522 (20NG).]

Figure 1. (a)-(d): Relative per-heldout-word perplexity6 (the lower the better) with varied K1 and a fixed proportion (80%) of training words per document. (e)-(h): Relative per-heldout-word perplexity6 with a varied proportion of training words per document and fixed K1 (100 on WS, TMN, and Twitter; 200 on 20NG). The error bars indicate the standard deviations of 5 random trials. The number attached to WEDTM and GBN indicates the number of layers (i.e., T) used.
We compared WEDTM with: 1. GBN (Zhou et al., 2016), the state-of-the-art deep topic model. 2. MetaLDA (Zhao et al., 2017c; 2018a), the state-of-the-art topic model with binary meta information about documents and/or words; word embeddings need to be binarized before being used in the model. 3. WEI-FTM (Zhao et al., 2017b), the state-of-the-art focused topic model that incorporates real-valued word embeddings.

It is noteworthy that GBN was reported (Cong et al., 2017) to have better performance than other deep (hierarchical) topic models such as nHDP (Paisley et al., 2015), DPFA (Gan et al., 2015), and DPFM (Henao et al., 2015). MetaLDA and WEI-FTM were reported to perform better than other word embedding topic models including WF-LDA (Petterson et al., 2010) and GPUDMM (Li et al., 2016), as well as short text topic models like PTM (Zuo et al., 2016). Therefore, we considered the three above competitors to WEDTM.

4 http://trec.nist.gov/data/microblog.html
Originally, MetaLDA (when no document meta information is provided) and WEI-FTM follow the LDA framework, where the topic distribution for document $j$ is $\theta_j \sim \mathrm{Dir}(\alpha_0 \mathbf{1})$ and $\alpha_0$ is a hyperparameter (usually set to 0.1). For a fair comparison, we replaced this part with the PFA framework with the gamma-negative binomial process (Zhou & Carin, 2015), which is equivalent to GBN when T = 1 and closely related to the hierarchical Dirichlet process LDA (HDPLDA) (Teh et al., 2012).
For all the models, we used 50-dimensional GloVe word embeddings pre-trained on Wikipedia5. Except for MetaLDA, where we followed the paper to binarize the word embeddings, the other three models used the original real-valued embeddings. The hyperparameter settings we used for WEDTM and GBN are a0 = b0 = 0.01, e0 = f0 = 1.0, η0 = 0.05. For MetaLDA and WEI-FTM, we collected 1,000 MCMC samples after 1,000 burn-in iterations; for GBN and WEDTM, we collected 1,000 samples after 1,000 burn-ins for T = 1, and 500 samples after 500 burn-ins for T > 1, to estimate the posterior mean. Due to the shrinkage effect of WEDTM on S, discussed in Section 4, we set S = 5, which is large enough for all the topics.

5 https://nlp.stanford.edu/projects/glove/
[Figure 2: three panels of relative document classification accuracy curves for WEDTM-1/2/3, GBN-1/2/3, MetaLDA, and WEI-FTM on (a) WS, (b) TMN, and (c) 20NG. Absolute GBN-1 accuracies: 66.6, 80.9, 83.1, 82.2 (WS); 72.0, 77.9, 78.8, 79.4 (TMN); 64.5, 68.3, 69.6, 69.9 (20NG).]

Figure 2. Relative document classification accuracy6 (%) on WS, TMN, and 20NG with a varied proportion of training words in each training document. The results with K1 = 100 on WS and TMN and K1 = 200 on 20NG are reported.
[Figure 3: three panels of topic coherence curves for WEDTM, WEDTM-te, GBN, MetaLDA, WEI-FTM, and WEI-FTM-te on (a) WS, (b) TMN, and (c) Twitter.]

Figure 3. Topic coherence (NPMI, the higher the better) on WS, TMN, and Twitter with a varied proportion of training words per document. The results with K1 = 100 are reported. For WEDTM and WEI-FTM, the top words of a topic are generated by ranking the word distribution and by the dot product of topic and word embeddings (denoted "-te").
5.2. Perplexity
Perplexity is a measure widely used (Wallach et al., 2009) to evaluate the modeling accuracy of topic models. Here we randomly chose a certain proportion of the word tokens in each document as training and used the remaining ones to calculate per-heldout-word perplexity. Figure 1 shows the relative perplexity6 results of all the models on all the datasets, where we varied the number of bottom-layer topics as well as the proportion of training words. The proposed WEDTM performs significantly better than the others, especially on sparse data. There are several interesting remarks on the results: (1) The perplexity advantage of WEDTM over GBN becomes obvious when the corpus becomes sparse (e.g., WS/TMN/Twitter vs. 20NG, and 20% vs. 80% training words). This shows that using word embeddings as prior information benefits the model. (2) In general, increasing the depth of the model leads to better perplexity. However, when the data are too sparse (e.g., WS with 20% training words), the single-layer WEDTM and GBN perform better than their multi-layer counterparts. (3) Although MetaLDA and WEI-FTM leverage word embeddings as well, the proposed WEDTM outperforms them significantly, perhaps because the way WEDTM incorporates word embeddings is more effective.

6 We subtracted the score of GBN with only one layer (GBN-1) from the score of each model; the lines plot the differences, so GBN-1 is the horizontal line at "0". The absolute score of GBN-1 is given below each figure.
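A common way to compute per-heldout-word perplexity for PFA-style models is $\exp(-\frac{1}{N}\sum_{v,j} x_{vj} \log p(v \mid j))$ with $p(v \mid j)$ the normalized reconstruction $\Phi\theta$; the sketch below follows that recipe on synthetic stand-in data (the exact evaluation protocol of this paper follows Zhou et al. (2016) and may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(5)

def heldout_perplexity(Phi, theta, X_test):
    """Per-heldout-word perplexity exp(-(1/N) sum_{v,j} x_vj log p(v|j)),
    with p(v|j) the normalized reconstruction (Phi @ theta)[:, j]."""
    rates = Phi @ theta
    p = rates / rates.sum(axis=0, keepdims=True)
    N = X_test.sum()
    return float(np.exp(-(X_test * np.log(p + 1e-12)).sum() / N))

V, K, J = 50, 5, 10
Phi = np.stack([rng.dirichlet(np.ones(V)) for _ in range(K)], axis=1)  # (V, K)
theta = rng.gamma(1.0, 1.0, size=(K, J))
X_test = rng.poisson(Phi @ theta)          # synthetic heldout counts
ppl = heldout_perplexity(Phi, theta, X_test)
assert 1.0 < ppl < 2 * V
```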
5.3. Document Classification
We consider the multi-class classification task of predicting the categories of test documents to evaluate the quality of the latent document representations (unnormalized topic weights) extracted by these models.7 In this experiment, following Zhou et al. (2016), we ran the topic models on the training documents and trained an L2-regularized logistic regression using the LIBLINEAR package (Fan et al., 2008) with the latent representation $\theta_j^{(1)}$ as features. After training, we used the trained topic models to extract the latent representations of the test documents and the trained logistic regression to predict their categories. For all the datasets, we randomly selected 80% of the documents for training and used the remaining 20% for testing. Figure 2 shows the relative document classification accuracy6 results for all the models. It can be observed that with word embeddings, WEDTM significantly outperforms GBN; it is the best on TMN and 20NG, and the second-best on WS. Again, we see a similar phenomenon: word embeddings help more on the sparser data and increasing the network depth improves the accuracy.

7 The results on Twitter are not reported because each of its documents is associated with multiple categories.
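The feature-based evaluation pipeline can be sketched as follows. The topic weights below are synthetic, hypothetical stand-ins for the $\theta^{(1)}_j$ a trained model would extract, and a dependency-free gradient-descent logistic regression substitutes for LIBLINEAR (the paper's actual solver):

```python
import numpy as np

rng = np.random.default_rng(6)
K, J = 20, 400

# Stand-ins for first-layer representations theta^(1)_j (synthetic two-class data).
y = rng.integers(0, 2, size=J).astype(float)
theta = rng.gamma(1.0, 1.0, size=(J, K))
theta[y == 1, :5] += 2.0                     # class 1 weights a few topics more heavily

# Minimal L2-regularized logistic regression on the topic weights, as in
# Section 5.3 (plain gradient descent keeps the sketch dependency-free).
X = np.hstack([theta, np.ones((J, 1))])      # append an intercept column
Xtr, ytr, Xte, yte = X[:300], y[:300], X[300:], y[300:]
w, lam, lr = np.zeros(K + 1), 0.1, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-Xtr @ w))
    w -= lr * (Xtr.T @ (p - ytr) / len(ytr) + lam * w)
acc = float(((Xte @ w > 0) == (yte == 1)).mean())
assert acc > 0.8                             # well-separated synthetic classes
```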
5.4. Topic Coherence
Topic coherence is another popular evaluation of topic models (Zuo et al., 2016; Zhao et al., 2017b). It measures the semantic coherence of the most significant words (top words) in a topic. Here we used the Normalized Pointwise Mutual Information (NPMI) (Aletras & Stevenson, 2013; Lau et al., 2014) to calculate the topic coherence score of the top 10 words of each topic and report the average score over all the topics.8
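For a word pair, NPMI is the pointwise mutual information normalized by $-\log p(w_i, w_j)$, giving a score in $[-1, 1]$; the topic score averages this over the top-word pairs. A minimal sketch with hypothetical co-occurrence probabilities (real evaluations, as here, estimate them from a reference corpus such as Wikipedia):

```python
import math

def npmi(p_ij, p_i, p_j):
    """Normalized pointwise mutual information of a word pair, in [-1, 1]:
    NPMI = log(p_ij / (p_i * p_j)) / -log(p_ij)."""
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)

# Hypothetical co-occurrence probabilities, for illustration only.
assert npmi(0.008, 0.01, 0.01) > 0            # co-occurs far more often than chance
assert abs(npmi(0.0001, 0.01, 0.01)) < 1e-9   # independent pair scores ~ 0
assert npmi(0.00001, 0.01, 0.01) < 0          # co-occurs less often than chance
```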
To compare with the other models, in this experiment we set S = 1 for WEDTM. Recall that in WEDTM, from global to local, there are three ways to interpret a topic. Here we evaluate NPMI for two of them: $e^{F^\top w_{k_1}}$ and $\phi_{k_1}^{(1)}$. Figure 3 shows the NPMI scores for all the models on WS, TMN, and Twitter. It is not surprising to see that the top words generated by $e^{F^\top w_{k_1}}$ in WEDTM always gain the highest NPMI scores, meaning that those topics are more coherent. This is because the topic embeddings in WEDTM directly interact with the word embeddings. Moreover, if we just compare the topics generated by $\phi_{k_1}^{(1)}$, WEDTM also gives more coherent topics than the other models. This demonstrates that the proposed model is able to discover more interpretable topics.
5.5. Qualitative Analysis
As one of the most appealing properties of WEDTM is its interpretability, we conducted an extensive qualitative evaluation of the quality of the topics discovered by WEDTM, including topic embeddings, sub-topics, and topic hierarchies. More qualitative analysis, including topic hierarchy visualization and synthetic document generation, is shown in the supplementary materials.
Demonstration of topic embeddings: We demonstrate that WEDTM discovers more coherent topics by comparing them with those of GBN in Table 2. Here we set S = 1 as well. This demonstration further explains the numerical results in Figure 3. It is also interesting to compare the local interpretation ($\phi_k^{(1)}$) and the global interpretation (topic embeddings) of the same topic in WEDTM. For example, in the fifth set, the local interpretation (5.b) is about "networks and security," while the global interpretation (5.c) generalizes it with more general words related to "communications." We can also observe that although the local interpretation of WEDTM is not as close to the word embeddings as the global interpretation, as informed by the global interpretation, the local interpretation of WEDTM's topics is still considerably more coherent than those in GBN.

8 We used the Palmetto package with a large Wikipedia dump to compute NPMI (http://palmetto.aksw.org).
Demonstration of sub-topics: In Figure 4, we show the sub-topics discovered by WEDTM for the topics used as examples at the beginning of the paper (Table 1). It can be observed that the intra topic structures with sub-topics clearly help to explain the local topics. For example, WEDTM successfully splits Topic 1 into sub-topics related to "journal" and "biology," and Topic 2 into "music" and "sports". Moreover, with the help of word embeddings, WEDTM discovers general sub-topics for specific topics. For example, Topics 3 and 4 are more interpretable with the sub-topics of "singer" and "game" respectively. The experiment also empirically demonstrates the shrinkage mechanism of the model: for most topics, the number of effective sub-topics is less than the maximum S = 5.
Demonstration of topic hierarchies: Figure 5 shows an example that jointly demonstrates the inter and intra structures of WEDTM. The tree is a cluster of topics related to "health," where the topic hierarchies are discovered by ranking $\{\Phi^{(t)}\}_t$, the leaf nodes are the topics in the bottom layer, and each bottom-layer topic is associated with a set of sub-topics. In WEDTM, the inter topic structures are revealed in the form of topic hierarchies, while the intra topic structures are revealed in the form of sub-topics. Combining the two kinds of topic structures in this way gives a better view of the target corpus, which may further benefit other text analysis tasks.
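The ranking step used to draw the hierarchy can be sketched directly: the children of a layer-2 node are the layer-1 topics with the largest weights in the corresponding column of $\Phi^{(2)}$. The weights below are synthetic stand-ins, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(7)
K1, K2 = 6, 2

# Columns of Phi^(2) connect each layer-2 topic to the layer-1 topics
# (synthetic Dirichlet-distributed stand-in values).
Phi2 = np.stack([rng.dirichlet(np.ones(K1) * 0.2) for _ in range(K2)], axis=1)  # (K1, K2)

def children(k2, n=3):
    """Rank the layer-1 topics most strongly connected to layer-2 topic k2,
    as done when drawing the hierarchy from {Phi^(t)}_t."""
    return np.argsort(Phi2[:, k2])[::-1][:n].tolist()

kids = children(0)
assert len(kids) == 3 and Phi2[kids[0], 0] >= Phi2[kids[1], 0]
```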
6. Conclusion

In this paper, we have proposed WEDTM, a deep topic model that leverages word embeddings to discover inter topic structures in the form of topic hierarchies and intra topic structures in the form of sub-topics. Moreover, with the introduction of sub-topic embeddings, each sub-topic can be informed by the global information in word embeddings, so as to discover a fine-grained thematic aspect of a local topic. With topic embeddings, WEDTM provides different views of a topic, from global to local, which further improves the interpretability of the model. As a fully conjugate model, the inference of WEDTM can be done by a straightforward Gibbs sampling algorithm. Extensive experiments have shown that WEDTM achieves state-of-the-art performance on perplexity, document classification, and topic quality. In addition, with topic hierarchies, sub-topics, and topic embeddings, the model can discover more interpretable structured topics, which helps to obtain a better understanding of text data. Given the local conjugacy, it is possible to derive more scalable inference algorithms for WEDTM, such as stochastic variational inference and stochastic gradient MCMC, which is a good subject for future work.
WEDTM
Table 2. Top 10 words of five sets of example topics on the WS dataset. Each set contains the top words of 3 topics: topic 'a' is generated by φ(1)_k in GBN-3; topic 'b' is generated by φ(1)_k in WEDTM-3; topic 'c' is generated by exp(F^T w_k1) in WEDTM-3. Topics 'a' and 'b' are matched by the Hellinger distance of φ(1)_k. Topics 'b' and 'c' are two different ways of interpreting one topic in WEDTM.

Topic  Top 10 words                                                                              NPMI
1a     engine car buying home diesel selling fuel automobile violin jet                          -0.055
1b     engine motor diesel fuel gasoline jet electric engines gas technology                      0.202
1c     engine diesel engines gasoline steam electric fuel propulsion motors combustion            0.224
2a     party labor democratic political socialist movement union social news australian           0.168
2b     party political communist democratic socialist labor republican parties conservative leader 0.188
2c     party democratic communist labour liberal socialist conservative opposition elections republican 0.219
3a     cancer lung tobacco intelligence artificial information health symptoms smoking treatment  -0.006
3b     cancer lung tobacco information health smoking treatment gov research symptoms             0.050
3c     cancer breast diabetes pulmonary cancers patients asthma cardiovascular cholesterol obesity 0.050
4a     oscar academy awards swimming award winners swim oscars nominations picture                0.020
4b     art awards oscar academy gallery museum surrealism sculpture picasso arts                  0.076
4c     paintings awards award art museum gallery sculpture painting picasso portrait              0.087
5a     security computer network nuclear weapons networking spam virus spyware national           0.059
5b     security network wireless access networks spam spyware networking national computer        0.061
5c     wireless internet networks devices phone broadband users network wi-fi providers           0.143
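The NPMI column above is the topic-coherence score the paper evaluates with (following Aletras & Stevenson (2013) and Lau et al. (2014)). As a minimal illustration, pairwise NPMI coherence for a topic's top words can be estimated from document-level co-occurrence in a reference corpus; the function below is a hedged sketch (the helper name and corpus representation are assumptions, not the paper's code):

```python
import math
from itertools import combinations

def topic_npmi(top_words, documents, eps=1e-12):
    """Average NPMI over all word pairs of a topic.

    documents: list of token *sets* forming the reference corpus.
    NPMI(wi, wj) = log(p(wi, wj) / (p(wi) p(wj))) / -log p(wi, wj),
    with probabilities estimated as document frequencies.
    """
    n = len(documents)
    p = lambda ws: sum(1 for d in documents if ws <= d) / n
    scores = []
    for wi, wj in combinations(top_words, 2):
        pij = p({wi, wj})
        if pij == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
            continue
        pmi = math.log(pij / (p({wi}) * p({wj}) + eps))
        scores.append(pmi / -math.log(pij + eps))
    return sum(scores) / len(scores)
```

A topic whose top words frequently co-occur in the reference corpus scores close to 1; unrelated words drive the score toward -1, matching the sign pattern in the table.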
[Figure 4 word-cloud content omitted: four example topics (journals/science, music/fitness, concerts/royalty, video games) shown with their sub-topic word lists.]

Figure 4. The sub-topics (red) of the example topics (blue). A larger font size indicates a larger weight (∑_v β^{<s>}_{vk}) of a sub-topic for the local topic. We set S = 5 and trimmed off the sub-topics with extremely small weights.
[Figure 5 tree content omitted: a sub-tree of topics about "health" (e.g., health/cancer/nutrition, hiv/aids, theoretical models) with their sub-topics.]

Figure 5. One example sub-tree of the topic hierarchy discovered by WEDTM on the WS dataset with K1 = 50 and S = 5. The tree is generated in the same way as in Zhou et al. (2016). A line from node kt at layer t to node kt−1 at layer t−1 indicates that φ(t)_{kt−1,kt} > 1.5/Kt−1, and its width indicates the value of φ(t)_{kt−1,kt} (i.e., topic correlation strength). The outside border of a text box is colored orange, blue, or black if the node is at layer three, two, or one, respectively. For the leaf nodes, sub-topics are shown in the same way as in Figure 4.
References

Ahmed, A., Hong, L., and Smola, A. Nested Chinese restaurant franchise process: Applications to user tracking and document modeling. In ICML, 2013.

Aletras, N. and Stevenson, M. Evaluating topic coherence using distributional semantics. In Proc. of the 10th International Conference on Computational Semantics, pp. 13–22, 2013.

Blei, D., Griffiths, T., and Jordan, M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM (JACM), 57(2):7, 2010.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. Enriching word vectors with subword information. TACL, 5:135–146, 2017.

Buntine, W. and Hutter, M. A Bayesian view of the Poisson–Dirichlet process. arXiv preprint arXiv:1007.0296, 2012.

Cong, Y., Chen, B., Liu, H., and Zhou, M. Deep latent Dirichlet allocation with topic-layer-adaptive stochastic gradient Riemannian MCMC. In ICML, pp. 864–873, 2017.

Das, R., Zaheer, M., and Dyer, C. Gaussian LDA for topic models with word embeddings. In ACL, pp. 795–804, 2015.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008.

Gan, Z., Chen, C., Henao, R., Carlson, D., and Carin, L. Scalable deep Poisson factor analysis for topic modeling. In ICML, pp. 1823–1832, 2015.

Henao, R., Gan, Z., Lu, J., and Carin, L. Deep Poisson factor modeling. In NIPS, pp. 2800–2808, 2015.

Kim, J. H., Kim, D., Kim, S., and Oh, A. Modeling topic hierarchies with the recursive Chinese restaurant process. In CIKM, pp. 783–792. ACM, 2012.

Lau, J. H., Newman, D., and Baldwin, T. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In EACL, pp. 530–539, 2014.

Li, C., Wang, H., Zhang, Z., Sun, A., and Ma, Z. Topic modeling for short texts with auxiliary word embeddings. In SIGIR, pp. 165–174, 2016.

Li, W. and McCallum, A. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML, pp. 577–584, 2006.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111–3119, 2013.

Nguyen, D. Q., Billingsley, R., Du, L., and Johnson, M. Improving topic models with latent feature word representations. TACL, 3:299–313, 2015.

Paisley, J., Wang, C., Blei, D., and Jordan, M. Nested hierarchical Dirichlet processes. TPAMI, 37(2):256–270, 2015.

Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation. In EMNLP, pp. 1532–1543, 2014.

Petterson, J., Buntine, W., Narayanamurthy, S. M., Caetano, T. S., and Smola, A. J. Word features for Latent Dirichlet Allocation. In NIPS, pp. 1921–1929, 2010.

Polson, N., Scott, J., and Windle, J. Bayesian inference for logistic models using Polya–Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.

Teh, Y. W., Jordan, M., Beal, M., and Blei, D. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

Wallach, H. M., Mimno, D. M., and McCallum, A. Rethinking LDA: Why priors matter. In NIPS, pp. 1973–1981, 2009.

Xun, G., Li, Y., Zhao, W. X., Gao, J., and Zhang, A. A correlated topic model using word embeddings. In IJCAI, pp. 4207–4213, 2017.

Yin, J. and Wang, J. A Dirichlet multinomial mixture model-based approach for short text clustering. In SIGKDD, pp. 233–242. ACM, 2014.

Zhao, H., Du, L., and Buntine, W. Leveraging node attributes for incomplete relational data. In ICML, pp. 4072–4081, 2017a.

Zhao, H., Du, L., and Buntine, W. A word embeddings informed focused topic model. In ACML, pp. 423–438, 2017b.

Zhao, H., Du, L., Buntine, W., and Liu, G. MetaLDA: A topic model that efficiently incorporates meta information. In ICDM, pp. 635–644, 2017c.

Zhao, H., Du, L., Buntine, W., and Liu, G. Leveraging external information in topic modelling. Knowledge and Information Systems, pp. 1–33, 2018a.

Zhao, H., Rai, P., Du, L., and Buntine, W. Bayesian multi-label learning with sparse features and labels, and label co-occurrences. In AISTATS, pp. 1943–1951, 2018b.

Zhou, M. Softplus regressions and convex polytopes. arXiv preprint arXiv:1608.06383, 2016.

Zhou, M. Nonparametric Bayesian negative binomial factor analysis. Bayesian Analysis, 2018.

Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. TPAMI, 37(2):307–320, 2015.

Zhou, M., Hannah, L., Dunson, D. B., and Carin, L. Beta-negative binomial process and Poisson factor analysis. In AISTATS, pp. 1462–1471, 2012.

Zhou, M., Cong, Y., and Chen, B. Augmentable gamma belief networks. JMLR, 17(163):1–44, 2016.

Zuo, Y., Wu, J., Zhang, H., Lin, H., Wang, F., Xu, K., and Xiong, H. Topic modeling of short texts: A pseudo-document view. In SIGKDD, pp. 2105–2114, 2016.
Supplementary Material for "Inter and Intra Topic Structure Learning with Word Embeddings"

He Zhao 1 Lan Du 1 Wray Buntine 1 Mingyuan Zhou 2
1. Inference Details

Recall that the data is the word count vector x^{(1)}_j of document j. Given the PFA framework, we can apply collapsed Gibbs sampling to sample the bottom-layer topic for the i-th word in document j, v_ji, similar to Eq. (28) in Appendix B of Zhou et al. (2016), as follows:

P(z_{ji} = k_1) \propto \frac{\beta_{v_{ji} k_1} + x^{(1)-ji}_{v_{ji} \cdot k_1}}{\beta_{\cdot k_1} + x^{(1)-ji}_{\cdot\cdot k_1}} \left( x^{(1)-ji}_{\cdot j k_1} + \phi^{(2)}_{k_1:} \theta^{(2)}_j \right),   (1)

where z_{ji} is the topic index for v_{ji} and x^{(1)}_{vjk_1} := \sum_i \delta(v_{ji} = v, z_{ji} = k_1) counts the number of times that term v appears in document j; we use x^{-ji} to denote the count x calculated without considering word i in document j.
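As a concrete illustration, this collapsed Gibbs step for one word token can be sketched in NumPy as follows. This is a minimal sketch with hypothetical array names; a real implementation would sweep over all tokens and update the count caches incrementally:

```python
import numpy as np

def sample_topic(v, beta, x_vk, x_jk, phi2_theta2, rng):
    """Collapsed Gibbs draw of z_ji for one word token (Eq. 1).

    v           : vocabulary index of the current word
    beta        : (V, K1) Dirichlet pseudo-counts beta_{v k1}
    x_vk        : (V, K1) word-topic counts with the current token excluded
    x_jk        : (K1,)   document-topic counts for document j, token excluded
    phi2_theta2 : (K1,)   upper-layer prior mean Phi^(2) theta_j^(2)
    """
    p = (beta[v] + x_vk[v]) / (beta.sum(0) + x_vk.sum(0)) * (x_jk + phi2_theta2)
    p /= p.sum()                       # normalise the unnormalised posterior
    return rng.choice(len(p), p=p)     # draw the new topic index

rng = np.random.default_rng(0)
V, K = 5, 3
beta = np.full((V, K), 0.1)
x_vk = rng.integers(0, 5, (V, K)).astype(float)
x_jk = x_vk.sum(0)
z = sample_topic(2, beta, x_vk, x_jk, np.ones(K), rng)
```

The first factor is the (smoothed) probability of term v under each topic; the second factor is the document-side preference inherited from the layer above.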
Given the latent counts x^{(1)}_{v \cdot k_1}, the multinomial likelihood of \phi^{(1)}_{k_1} is proportional to \prod_v (\phi^{(1)}_{v k_1})^{x^{(1)}_{v \cdot k_1}}. Due to the Dirichlet-multinomial conjugacy, the joint likelihood of \beta_{k_1} is computed as:

\mathcal{L}(\beta_{v k_1}) \propto \frac{\Gamma(\beta_{\cdot k_1})}{\Gamma(\beta_{\cdot k_1} + x^{(1)}_{\cdot\cdot k_1})} \prod_v^V \frac{\Gamma(\beta_{v k_1} + x^{(1)}_{v \cdot k_1})}{\Gamma(\beta_{v k_1})}.   (2)

By introducing an auxiliary beta distributed variable:

q_{k_1} \sim \text{Beta}(\beta_{\cdot k_1}, x^{(1)}_{\cdot\cdot k_1}),   (3)

we can transform the first gamma ratio in Eq. (2) into (q_{k_1})^{\beta_{\cdot k_1}} (Zhao et al., 2018a). After this augmentation, one can show that the likelihood of \beta_{v k_1} is proportional to the negative binomial distribution.
Recall that \beta_{v k_1} = \sum_s^S \beta^{<s>}_{v k_1} and that the shape parameter of \beta^{<s>}_{v k_1}, \alpha^{<s>}_{k_1}, is drawn from a hierarchical gamma construction. This construction is closely related to the gamma-negative binomial process (Zhou & Carin, 2015; Zhou, 2016), which enables the model to automatically determine the number of effective sub-topics.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).
Furthermore, we introduce another auxiliary variable:

h_{v k_1} \sim \text{CRT}(x^{(1)}_{v \cdot k_1}, \beta_{v k_1}),   (4)

where h \sim \text{CRT}(n, r) stands for the Chinese Restaurant Table distribution (Zhou & Carin, 2015), which generates the number of tables h seated by n customers in a Chinese restaurant process with concentration parameter r (Buntine & Hutter, 2012). Given h_{v k_1}, the second gamma ratio can be augmented as (\beta_{v k_1})^{h_{v k_1}}. Finally, with the two auxiliary variables, Eq. (2) can be written as:

\mathcal{L}(\beta_{v k_1}, q_{k_1}, h_{v k_1}) \propto (q_{k_1})^{\beta_{v k_1}} (\beta_{v k_1})^{h_{v k_1}}.   (5)

Sample \beta^{<s>}_{v k_1}. Given the table counts h_{v k_1} in Eq. (5), we can sample the counts h^{<s>}_{v k_1} for each sub-topic s as follows:

\left( h^{<1>}_{v k_1}, \cdots, h^{<S>}_{v k_1} \right) \sim \text{Mult}\left( h_{v k_1}; \frac{\beta^{<1>}_{v k_1}}{\beta_{v k_1}}, \cdots, \frac{\beta^{<S>}_{v k_1}}{\beta_{v k_1}} \right).   (6)

Given h^{<s>}_{v k_1}, we can sample \beta^{<s>}_{v k_1} as:

\beta^{<s>}_{v k_1} \sim \text{Gam}\left( \alpha^{<s>}_{k_1} + h^{<s>}_{v k_1}, \frac{1}{e^{-\pi^{<s>}_{v k_1}} + \log \frac{1}{q_{k_1}}} \right),   (7)

where we define

\pi^{<s>}_{v k_1} := f_v^\top w^{<s>}_{k_1}.   (8)
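Equations (6)-(7) for a single (v, k1) pair can be sketched as follows. The variable names are illustrative, not from the authors' code; the multinomial split distributes the table count among sub-topics in proportion to their current components, and each component is then refreshed from its gamma conditional:

```python
import numpy as np

def sample_beta_s(h_vk, beta_s_vk, alpha_s_k, pi_s_vk, q_k, rng):
    """Sample the S components beta^<s>_{v k1} for one (v, k1) pair.

    h_vk      : total table count h_{v k1} from Eq. (4)
    beta_s_vk : (S,) current values beta^<s>_{v k1}
    alpha_s_k : (S,) gamma shape parameters alpha^<s>_{k1}
    pi_s_vk   : (S,) embedding scores pi^<s>_{v k1} = f_v^T w^<s>_{k1} (Eq. 8)
    q_k       : beta auxiliary variable q_{k1} from Eq. (3)
    """
    # Eq. (6): partition the tables among the S sub-topics
    h_s = rng.multinomial(h_vk, beta_s_vk / beta_s_vk.sum())
    # Eq. (7): gamma draw with posterior rate exp(-pi) + log(1/q)
    rate = np.exp(-pi_s_vk) + np.log(1.0 / q_k)
    return rng.gamma(alpha_s_k + h_s, 1.0 / rate)

rng = np.random.default_rng(1)
beta_s = np.array([0.5, 0.2, 0.2, 0.1])
new_beta_s = sample_beta_s(6, beta_s, np.full(4, 0.3), np.zeros(4), 0.5, rng)
```

Sub-topics with larger embedding scores pi receive a smaller posterior rate, hence a larger expected weight, which is how the word embeddings inform each sub-topic.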
Sample \alpha^{<s>}_{k_1}. By integrating \beta^{<s>}_{v k_1} out, we sample \alpha^{<s>}_{k_1} as:

\alpha^{<s>}_{k_1} \sim \text{Gam}\left( \alpha^{<s>}_0 / S + g^{<s>}_{\cdot k_1}, \frac{1}{c^{<s>}_0 + \sum_v^V \log\left( 1 + e^{\pi^{<s>}_{v k_1}} \log \frac{1}{q_{k_1}} \right)} \right),   (9)

where

g^{<s>}_{\cdot k_1} := \sum_v^V g^{<s>}_{v k_1},   (10)

g^{<s>}_{v k_1} \sim \text{CRT}\left( h^{<s>}_{v k_1}, \alpha^{<s>}_{k_1} \right).   (11)

According to the gamma-gamma conjugacy, \alpha^{<s>}_0 and c^{<s>}_0 can be sampled in a similar way.
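The CRT draws in Eqs. (4) and (11) can be generated directly from the Chinese-restaurant construction quoted above: the i-th customer opens a new table with probability r / (r + i - 1). A minimal sketch (hypothetical helper name):

```python
import numpy as np

def sample_crt(n, r, rng):
    """Draw h ~ CRT(n, r): number of tables opened by n customers
    in a Chinese restaurant process with concentration r."""
    if n == 0:
        return 0
    i = np.arange(n)  # customer i+1 opens a table w.p. r / (r + i)
    return int((rng.random(n) < r / (r + i)).sum())
```

Its expectation is sum_{i=1}^{n} r / (r + i - 1), which grows only logarithmically in n; this is what lets the table counts shrink the effective number of sub-topics.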
Sample w^{<s>}_{k_1}. After integrating \beta^{<s>}_{v k_1} out and ignoring unrelated terms, the joint likelihood related to w^{<s>}_{k_1} is proportional to:

\mathcal{L}\left( \pi^{<s>}_{v k_1} \right) \propto \frac{\left( e^{\pi^{<s>}_{v k_1} + \log \log \frac{1}{q_{k_1}}} \right)^{h^{<s>}_{v k_1}}}{\left( 1 + e^{\pi^{<s>}_{v k_1} + \log \log \frac{1}{q_{k_1}}} \right)^{\alpha^{<s>}_{k_1} + h^{<s>}_{v k_1}}}.   (12)

The above likelihood can be augmented with an auxiliary variable \omega^{<s>}_{v k_1} \sim \text{PG}\left( \alpha^{<s>}_{k_1} + h^{<s>}_{v k_1}, 0 \right), where PG denotes the Polya-gamma distribution (Polson et al., 2013). Given \omega^{<s>}_{v k_1}, we get:

\mathcal{L}\left( \pi^{<s>}_{v k_1}, \omega^{<s>}_{v k_1} \right) \propto \exp\left( \left( \frac{h^{<s>}_{v k_1} - \alpha^{<s>}_{k_1}}{2} - \omega^{<s>}_{v k_1} \log \log \frac{1}{q_{k_1}} \right) \pi^{<s>}_{v k_1} - \frac{\omega^{<s>}_{v k_1}}{2} \left( \pi^{<s>}_{v k_1} \right)^2 \right).   (13)
The above likelihood results in a normal likelihood for w^{<s>}_{k_1}. Therefore, we sample it from a multivariate normal distribution as follows:

w^{<s>}_{k_1} \sim \mathcal{N}\left( \mu^{<s>}_{k_1}, \Sigma^{<s>}_{k_1} \right),

\mu^{<s>}_{k_1} = \Sigma^{<s>}_{k_1} \left[ \sum_v^V \left( \frac{h^{<s>}_{v k_1} - \alpha^{<s>}_{k_1}}{2} - \omega^{<s>}_{v k_1} \log \log \frac{1}{q_{k_1}} \right) f_v \right],

\Sigma^{<s>}_{k_1} = \left[ \text{diag}(1/\sigma^{<s>}) + \sum_v^V \omega^{<s>}_{v k_1} f_v f_v^\top \right]^{-1},   (14)

where \mu^{<s>}_{k_1} \in \mathbb{R}^L and \Sigma^{<s>}_{k_1} \in \mathbb{R}^{L \times L}.
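Eq. (14) is a standard Bayesian-linear-regression style update and translates almost literally into NumPy. A sketch with hypothetical argument names, where F stacks the word embeddings f_v as rows:

```python
import numpy as np

def sample_w_s(F, h_s, alpha_s_k, omega_s, q_k, sigma_s, rng):
    """Sample w^<s>_{k1} from its Gaussian posterior (Eq. 14).

    F        : (V, L) word embedding matrix, rows f_v
    h_s      : (V,) sub-topic table counts h^<s>_{v k1}
    alpha_s_k: scalar shape alpha^<s>_{k1}
    omega_s  : (V,) Polya-gamma auxiliaries omega^<s>_{v k1}
    q_k      : scalar beta auxiliary q_{k1}
    sigma_s  : (L,) prior variances sigma^<s>
    """
    c = np.log(np.log(1.0 / q_k))
    # Sigma = [diag(1/sigma) + sum_v omega_v f_v f_v^T]^{-1}
    Sigma = np.linalg.inv(np.diag(1.0 / sigma_s) + (F.T * omega_s) @ F)
    # mu = Sigma [ sum_v ((h_v - alpha)/2 - omega_v c) f_v ]
    mu = Sigma @ (F.T @ ((h_s - alpha_s_k) / 2.0 - omega_s * c))
    return rng.multivariate_normal(mu, Sigma)

rng = np.random.default_rng(2)
F = rng.normal(size=(6, 3))
w = sample_w_s(F, rng.integers(0, 4, 6).astype(float), 0.5,
               rng.gamma(1.0, 1.0, 6) + 0.1, 0.3, np.ones(3), rng)
```

The `(F.T * omega_s) @ F` expression computes sum_v omega_v f_v f_v^T without an explicit loop over the vocabulary.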
According to Polson et al. (2013), we can sample:

\omega^{<s>}_{v k_1} \sim \text{PG}\left( h^{<s>}_{v k_1} + \alpha^{<s>}_{k_1}, \pi^{<s>}_{v k_1} + \log \log \frac{1}{q_{k_1}} \right).   (15)

To sample from the Polya-gamma distribution, we use the accurate and efficient approximate sampler of Zhou (2016).

Finally, \sigma^{<s>} can be sampled from its gamma posterior.
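For illustration, a simple approximate PG sampler can be built by truncating the infinite sum-of-gammas representation of the Polya-gamma distribution and adding the mean of the discarded tail. This is a generic truncation sketch, not the more refined approximation of Zhou (2016) that the paper actually uses:

```python
import numpy as np

def sample_pg_approx(b, c, rng, K=20):
    """Approximate draw from PG(b, c) via the representation
    PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    g_k ~ Gamma(b, 1), truncated at K terms with a tail-mean correction."""
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + c ** 2 / (4.0 * np.pi ** 2)
    draw = (rng.gamma(b, 1.0, size=K) / denom).sum() / (2.0 * np.pi ** 2)
    # E[PG(b, c)] = (b / (2c)) tanh(c / 2); add the truncated tail's mean
    mean_total = b / (2.0 * c) * np.tanh(c / 2.0) if c != 0 else b / 4.0
    mean_trunc = (b / denom).sum() / (2.0 * np.pi ** 2)
    return draw + max(mean_total - mean_trunc, 0.0)
```

Because the gamma shape b here is h + alpha, which need not be an integer, a series-based or approximate sampler is needed rather than the exact PG(1, z) rejection sampler.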
The inference algorithm is shown in Figure 1.
2. Visualization of the Discovered Topic Hierarchies and Sub-topics

Figures 2–9 show the topic hierarchies discovered by WEDTM on the WS, TMN, and Twitter datasets, respectively.
3. Generating Synthetic Documents

Below we provide several synthetic documents generated from WEDTM, following the GBN paper (Zhou et al., 2016). Given the trained {Φ(t)}_t, we used the generative model shown in Figure 1 of the main paper to generate simulated topic weights θ(1)_j. We show the top words ranked according to Φ(1) θ(1)_j. Below are some example synthetic documents generated in this manner with WEDTM trained on the TMN dataset. The generated documents are clearly interpretable.
1. study drug cancer risk health people heart women disease patients drugs researchers children brain found finds kids high doctors suggests research medical surgery food diabetes treatment blood years report men young shows year time scientists mets age linked yankees coli long care tuesday good hospital early prostate breast fda developing common adults monday outbreak older weight lower parents problems taking day studies virus aids babies life diet thursday loss levels higher wednesday home evidence make trial tests effective death sox therapy pregnancy experts prevent low patient run pressure pain alzheimer game flu type win number treat start season obesity symptoms
2. mets yankees game season sox win run baseball nfl league hit home time team red inning players day back beat phillies start pitcher innings night runs rangers giants major year games rays big manager york victory indians coach left past boston make draft dodgers hits good pitch hitter career bay tigers blue list play braves fans marlins tuesday years disabled high cubs thursday saturday lockout chicago week nationals top wednesday lead white jays twins streak los days philadelphia field long texas josh pitched ninth friday sunday francisco homer lineup put young college football times roundup pitching cleveland loss sports angeles
3. japan nuclear earthquake plant power tsunami crisis japanese oil quake radiation stocks disaster tokyo prices world friday tuesday fukushima investors hit crippled energy reactor water march monday safety thursday plants economic reactors market government week wednesday year country devastating wall high workers radioactive concerns worst street officials percent companies damage electric report massive fears earnings damaged economy impact food levels agency operator global daiichi plans markets stock chernobyl sales coast growth rise higher states atomic fuel united risk stricken health crude supply dollar low strong demand level recovery day magnitude quarter shares fell struck tepco billion commodities experts gains relief
4. wedding royal prince kate william middleton london queen britain ireland british irish friday elizabeth designer princess april couple week palace people visit idol american day fashion world made abbey art bride marriage diana duchess secret time honeymoon photo king dress dinner famous sarah cake back media lady buckingham year ring show photos hat coverage wednesday party saturday mcqueen month morning pop watch guests charlie days sheen westminster future kardashian star television night work crown tribute ago duke officials worn check years pre kim maya wear final site met ceremony music tonight museum nuptials harry ferguson pounds royals trip tuesday turned

5. apple sony mobile data ipad company corp network phone billion software google iphone computer mil-

Require: {x(1)_j}_j, T, S, {K_t}_t, a0, b0, e0, f0, η0, MaxIteration
Ensure: the sampled values of all the latent variables
1: Randomly initialise all the latent variables according to the generative process;
2: for t ← 1 to T do
3:   for iter ← 1 to MaxIteration do
4:     if t = 1 then
5:       /* Generating documents */
6:       Sample the bottom-layer topics by Eq. (1); calculate {x(1)_{vjk1}}_{v,j,k1};
7:     end if
8:     /* Inter topic structure */
9:     Do the upward-downward Gibbs sampler
10:    (Algorithm 1 in Zhou et al. (2016)) for {θ(t)_j, c(t)_j, p(t)_j}_{t,j}, {Φ(t)}_t, r;
11:    if t = 1 then
12:      /* Intra topic structure */
13:      Sample {q_k1}_{k1} by Eq. (3); sample {h_vk1}_{v,k1} by Eq. (4);
14:      for s ← 1 to S do
15:        Sample {h<s>_vk1}_{v,k1} by Eq. (6);
16:        Sample {g<s>_vk1}_{v,k1} by Eq. (11); calculate {g<s>_·k1}_{k1} by Eq. (10);
17:        Sample α<s>_0 and c<s>_0 from their gamma posteriors;
18:        Sample {α<s>_k1}_{k1} by Eq. (9);
19:        Sample {ω<s>_vk1}_{v,k1} by Eq. (15); sample σ<s> from its gamma posterior;
20:        Sample {w<s>_k1}_{k1} by Eq. (14);
21:        Calculate {π<s>_vk1}_{v,k1} by Eq. (8);
22:        Sample {β<s>_vk1}_{v,k1} by Eq. (7);
23:      end for
24:      Calculate {β_vk1}_{v,k1};
25:    end if
26:  end for
27: end for

Figure 1. Gibbs sampling algorithm for WEDTM.

[Figure 2 tree content omitted: topic-hierarchy word lists.]

Figure 2. Analogous plots to Figure 6 in the main paper for a tree about "computers" on the WS dataset.
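The synthetic-document procedure of Section 3 is a down-pass through the trained gamma belief network. A minimal sketch, assuming column-normalized loading matrices and hypothetical argument names, with all scale parameters c^{(t)}_j fixed to 1 for simplicity:

```python
import numpy as np

def synthetic_topword_doc(Phis, r, vocab, c=1.0, top=10, rng=None):
    """Generate one synthetic document's top words from trained loadings.

    Phis : list [Phi^(1), ..., Phi^(T)]; Phi^(t) has shape (K_{t-1}, K_t),
           with K_0 = V (the vocabulary size)
    r    : (K_T,) top-layer gamma shape parameters
    Down-pass: theta^(T) ~ Gam(r, 1/c), theta^(t) ~ Gam(Phi^(t+1) theta^(t+1), 1/c);
    words are then ranked by Phi^(1) theta^(1).
    """
    rng = rng or np.random.default_rng()
    theta = rng.gamma(r, 1.0 / c)                 # top-layer topic weights
    for Phi in reversed(Phis[1:]):                # Phi^(T), ..., Phi^(2)
        theta = rng.gamma(Phi @ theta, 1.0 / c)   # propagate one layer down
    word_rates = Phis[0] @ theta                  # Phi^(1) theta^(1)
    return [vocab[i] for i in np.argsort(-word_rates)[:top]]
```

Each call produces one simulated document's ranked word list, analogous to the five examples above.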
[Figure 3 tree content omitted: topic-hierarchy word lists.]

Figure 3. Analogous plots to Figure 6 in the main paper for a tree about "movie" on the WS dataset.
[Figure 4 tree content omitted: topic-hierarchy word lists.]

Figure 4. Analogous plots to Figure 6 in the main paper for a tree about "war" on the WS dataset.
[Figure 5 tree content omitted: topic-hierarchy word lists.]

Figure 5. Analogous plots to Figure 6 in the main paper for a tree about "daily life" on the TMN dataset.
[Plot content omitted: word lists for the topic tree, rendered graphically in Figure 6.]
Figure 6. Analogous plots to Figure 6 in the main paper for a tree about “politics” on the TMN dataset.
[Plot content omitted: word lists for the topic tree, rendered graphically in Figure 7.]
Figure 7. Analogous plots to Figure 6 in the main paper for a tree about “food and health” on the TMN dataset.
WEDTM
[Plot content omitted: word lists for the topic tree, rendered graphically in Figure 8.]
Figure 8. Analogous plots to Figure 6 in the main paper for a tree about “phone” on the Twitter dataset.
[Plot content omitted: word lists for the topic tree, rendered graphically in Figure 9.]
Figure 9. Analogous plots to Figure 6 in the main paper for a tree about “thanksgiving” on the Twitter dataset.
[Plot content omitted: word lists for the topic tree, rendered graphically in Figure 10.]
Figure 10. Analogous plots to Figure 6 in the main paper for a tree about “NBA” on the Twitter dataset.
lion microsoft market amazon tablet playstation maker devices wireless app technology security service sales users year smartphone android customers phones deal plans nokia report world week system online breach smartphones services amp apps group percent buy executive top thursday music information ceo computers samsung consumer location chief wednesday stores largest research business electronics growth books prices jobs china windows major selling hackers firm launch high digital tuesday friday shares intel companies systems tablets make years device city people population energy verizon blackberry profit internet bid store
Below are some example synthetic documents with WEDTM trained on the WS dataset.
1. research papers thesis project writing edu paper dissertation proposal reports patent technical report topics guide academic exploratory information write methods researchers publications parallel process phd issues ideas computing media topic custom projects essay help archives tools page labs web lab university communication resources ibm proposals intellectual home advice cluster dissertations survey library essays address practical links step abstract property bell systems linux performance master serve abstracts developed written department index quality exploring matters res theses documents course college collaborative processing programme materials focused preparing computer program students search guidelines online articles experience services recommendations welcome conduct funding aspects idea people
2. football team league news american fans nfl players soccer stadium sports indoor national com home statistics club history hockey baseball college fan association teams arena england game player texas website scores basketball professional fixtures espn season ncaa clubs university afl schedule tickets minnesota independent tables athletics sport welcome united leagues dedicated ice chicago giants nba stadiums online rugby stats nhl gymnastics coverage coaches levels delivers city arsenal index coach database conference standings information cstv fantasy bears articles views athletic moved rules assault bulls supporters officials kickoff division games women tournament photos body media fame covering adjacent sexual federation pages rutgers
3. health hiv prevention stress healthy aids diet nutrition hepatitis fitness food epidemic pressure infection information cdc disease people blood life gov diseases epidemiology treatment cholesterol picnic tips medical articles children safety symptoms fat topics heart foods living centers fatty help kids liver care researchers eating diagnosis who weight body virus recipes human control viral sheets com helps patterns and comprehensive diabetes risk trials int learn conditions advice influenza diets exercise affect public press consensus constant united eat infections listing unaids experts mind personality disorder causes world flu infectious guidelines global loss women avian humans population ucsf cope job brains index
4. system government parliamentary republic presidential maradona diego parliament systems westminster freedom independence democracy semi-presidential consensus constitutional armando elected congressional constitution executive america people declaration elections president czech gov australia election documents hand french political national congress authority united archives legislature kingdom country governments powers power public british representation canada argentine versus buenos separation born legislative central assembly affairs republics india peace france sri semi english branch body operates charters romania distributed presidency practice the parliaments governing house countries britain rule federal principles jury lanka foreign bill govern nunavut portuguese ireland aires immense head qld parl pub shtml politics ukraine parties
5. school research college edu graduate university education students elementary center student library district information schools program programs degree city public media papers phd master thesis law department undergraduate home project staff america regional writing scholarship calendar independent academic town paper administration news scholarships north dissertation community faculty welcome archives virginia children president union wisconsin temple proposal parents ohio pennsylvania teachers philadelphia reports guide houston nation degrees chronicle services massachusetts patent opportunities boston technical activities county study masters east report employment commission usa expo lewis michigan san academy memorial topics meeting institute campus homepage web arizona maine business announcements philly hub
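Each example above is a bag of high-probability words drawn from the trained model. As a hedged illustration only, and not the exact WEDTM generative procedure, a vanilla LDA-style sampler produces such a document by drawing topic proportions from a Dirichlet prior and then drawing each word from the assigned topic's word distribution. The vocabulary, topic matrix, and hyperparameter below are toy assumptions:

```python
import numpy as np

def sample_synthetic_doc(phi, vocab, alpha=0.1, doc_len=20, rng=None):
    """Sample a bag of words from an LDA-style generative model.

    phi   : (K, V) topic-word probability matrix (each row sums to 1)
    vocab : list of V word strings
    alpha : symmetric Dirichlet concentration for topic proportions
    """
    rng = np.random.default_rng(rng)
    K, V = phi.shape
    theta = rng.dirichlet([alpha] * K)              # document's topic proportions
    topics = rng.choice(K, size=doc_len, p=theta)   # one topic per token
    return [vocab[rng.choice(V, p=phi[k])] for k in topics]

# Toy example: two topics over a six-word vocabulary.
vocab = ["football", "league", "team", "health", "diet", "fitness"]
phi = np.array([
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # a "sports"-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],   # a "health"-like topic
])
doc = sample_synthetic_doc(phi, vocab, rng=0)
print(" ".join(doc))
```

Because the Dirichlet concentration is small, a sampled document tends to concentrate on one topic, mirroring the thematically coherent word lists shown in the examples.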
References

Polson, N., Scott, J., and Windle, J. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339–1349, 2013.
Buntine, W. and Hutter, M. A Bayesian view of the Poisson-Dirichlet process. arXiv preprint arXiv:1007.0296v2 [math.ST], 2012.
Zhao, H., Rai, P., Du, L., and Buntine, W. Bayesian multi-label learning with sparse features and labels, and label co-occurrences. In AISTATS, pp. 1943–1951, 2018.
Zhou, M. Softplus regressions and convex polytopes. arXiv preprint arXiv:1608.06383, 2016.
Zhou, M. and Carin, L. Negative binomial process count and mixture modeling. TPAMI, 37(2):307–320, 2015.
Zhou, M., Cong, Y., and Chen, B. Augmentable gamma belief networks. JMLR, 17(163):1–44, 2016.