Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models
ICDM'07 HPDM Workshop, 8/28/2007

Page 1:

Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Ramesh Nallapati
Joint work with John Lafferty, Amr Ahmed, William Cohen and Eric Xing
Machine Learning Department
Carnegie Mellon University

Page 2:

Introduction

• Statistical topic modeling: an attractive framework for topic discovery
  – Completely unsupervised
  – Models text very well
    • Lower perplexity compared to unigram models
  – Reveals meaningful semantic patterns
  – Can help summarize and visualize document collections
  – e.g.: PLSA, LDA, DPM, DTM, CTM, PA

Page 3:

Introduction

• A common assumption in all the variants:
  – Exchangeability: the "bag of words" assumption
  – Topics represented as a ranked list of words
• Consequences:
  – Word correlation information is lost
    • e.g.: "white-house" vs. "white" and "house"
    • Long-distance correlations

Page 4:

Introduction

• Objective:
  – To capture correlations between words within topics
• Motivation:
  – More interpretable representation of topics as a network of words rather than a list
  – Helps better visualize and summarize document collections
  – May reveal unexpected relationships and patterns within topics

Page 5:

Past Work: Topic Models

• Bigram topic models [Wallach, ICML 2006]
  – Requires KV(V-1) parameters
  – Only captures local dependencies
  – Does not model sparsity of correlations
  – Does not capture "within-topic" correlations

Page 6:

Past work: Other approaches

• Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
  – Word-pair correlation measured as a weighted count of the number of times the two words occur within a fixed-length window
  – Weight of an occurrence ∝ 1/(mutual distance) (see the counting sketch below)
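The HAL weighting scheme is simple enough to sketch in code. Below is a minimal Python illustration of the weighted-window counting described above; the function name and the window size of 10 are illustrative choices, not taken from the talk or the original HAL paper.

```python
from collections import defaultdict

def hal_cooccurrence(tokens, window=10):
    """HAL-style weighted co-occurrence: every pair of words appearing
    within `window` positions of each other contributes 1/distance to
    the pair's correlation weight (a sketch, not the authors' code)."""
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            dist = j - i
            pair = tuple(sorted((w, tokens[j])))
            weights[pair] += 1.0 / dist
    return weights

# Example:
# hal_cooccurrence("the white house issued a statement".split(), window=4)
```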

Page 7:

Past work: Other approaches

• Hyperspace Analog to Language (HAL) [Lund and Burgess, Cog. Sci., '96]
  – Plusses:
    • Sparse solutions, scalability
  – Minuses:
    • Only unearths global correlations, not semantic correlations
      – E.g.: "river – bank", "bank – check"
    • Only local dependencies

Page 8:

Past work: Other approaches

• Query expansion in IR
  – Similar in spirit: finds words that highly co-occur with the query words
  – However, not a corpus visualization tool: requires a query context to operate on
• WordNet
  – Semantic networks
  – Human labeled: not directly related to our goal

Page 9:

Our approach

• L1-norm regularization
  – Known to enforce sparse solutions
    • Sparsity permits scalability
  – Convex optimization problem
    • Globally optimal solutions
  – Recent advances in learning the structure of graphical models:
    • The L1 regularization framework asymptotically leads to the true structure

Page 10:

Background: LASSO

• Example: linear regression
• Regularization used to improve generalizability
  – E.g. 1: Ridge regression: L2-norm regularization
  – E.g. 2: Lasso: L1-norm regularization (standard objectives shown below)
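The regression objectives on this slide did not survive transcription; for reference, the standard formulations of the three estimators mentioned here (not copied from the slide) are:

```latex
\begin{align*}
\hat{\beta}_{\text{OLS}}   &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 \\
\hat{\beta}_{\text{ridge}} &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \\
\hat{\beta}_{\text{lasso}} &= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
\end{align*}
```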

Page 11:

Background: LASSO

• Lasso encourages sparse solutions

Page 12:

Background: Gaussian Random Fields

• Multivariate Gaussian distribution (density shown below)
• Random field structure: G = (V, E)
  – V: set of all variables {X_1, ..., X_p}
  – (s,t) ∈ E ⇔ (Σ⁻¹)_{st} ≠ 0
  – X_s ⊥ X_u | X_{N(s)} whenever u ∉ N(s)
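The density formula itself is lost in the transcript; the standard form the slide refers to, together with the precision-matrix characterization of the graph, is:

```latex
p(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
       \exp\!\left(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right),
\qquad
(s,t) \in E \iff (\Sigma^{-1})_{st} \neq 0 .
```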

Page 13:

Background: Gaussian Random Fields

• Estimating the graph structure of a GRF from data [Meinshausen and Buhlmann, Annals of Statistics, 2006]
  – Regress each variable onto the others, imposing an L1 penalty to encourage sparsity
  – Estimated neighborhood: the variables with nonzero regression coefficients (see the sketch below)
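The slide's equation is missing from the transcript; the neighborhood-selection estimator of Meinshausen and Buhlmann can be written as follows (notation mine), where X is the n × p data matrix and X_s its s-th column:

```latex
\hat{\theta}^{(s)} = \arg\min_{\theta:\,\theta_s = 0}\;
    \frac{1}{n}\bigl\|X_s - X\theta\bigr\|_2^2 + \lambda\|\theta\|_1,
\qquad
\hat{N}(s) = \{\, t \in V : \hat{\theta}^{(s)}_t \neq 0 \,\}.
```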

Page 14:

Background: Gaussian Random Fields

[Figure: true graph vs. estimated graph]
Courtesy: [Meinshausen and Buhlmann, Annals of Statistics, 2006]

Page 15:

Background: Gaussian Random Fields

• Application to topic models: CTM [Blei and Lafferty, NIPS, 2006]

Page 16:

Background: Gaussian Random Fields

• Application to CTM: [Blei & Lafferty, Annals of Applied Statistics, '07]

Page 17:

Structure learning of an MRF

• Ising model (standard form sketched below)
• L1-regularized conditional likelihood learns the true structure asymptotically [Wainwright, Ravikumar and Lafferty, NIPS'06]
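The model equations on this slide are not in the transcript; a standard statement of the setup (notation mine) is the Ising model over binary variables together with the per-node L1-regularized logistic regression used for structure recovery:

```latex
\begin{align*}
p_\theta(x) &\propto \exp\!\Bigl(\sum_{s \in V}\theta_s x_s
    + \sum_{(s,t)\in E}\theta_{st}\, x_s x_t\Bigr),
    \qquad x \in \{-1,+1\}^p \\[4pt]
\hat{\theta}^{(s)} &= \arg\min_{\theta}\;
    \frac{1}{n}\sum_{i=1}^{n}\log\!\bigl(1 + e^{-x_s^{(i)}\langle\theta,\,x_{\setminus s}^{(i)}\rangle}\bigr)
    + \lambda\|\theta\|_1
\end{align*}
```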

Page 18:

Structure learning of an MRF
Courtesy: [Wainwright, Ravikumar and Lafferty, NIPS'06]

Page 19:

Sparse Word Graphs

• Algorithm
  – Run LDA on the document collection and obtain topic assignments
  – Convert the topic assignments for each document into K binary vectors X (one binary vocabulary-indicator vector per topic per document)
  – Assume an MRF for each topic with X as the underlying data
  – Apply structure learning for the MRF using L1-regularized conditional likelihood (see the sketch below)
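A minimal sketch of the per-topic structure-learning step, assuming the binary representation read from the slide above (one document-by-vocabulary indicator matrix per topic) and substituting an off-the-shelf L1-regularized logistic regression for the interior-point solver cited later; function and parameter names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparse_word_graph(X_k, C=10.0, min_count=5):
    """Estimate a sparse word graph for one topic.

    X_k: binary (num_docs x vocab_size) array; X_k[d, w] = 1 iff word w
         was assigned to this topic in document d (my reading of the slide).
    Returns a dict mapping word index -> indices of its estimated neighbors.
    """
    n_docs, vocab = X_k.shape
    neighbors = {}
    for w in range(vocab):
        y = X_k[:, w]
        # Skip words that (almost) never or (almost) always occur in this topic.
        if y.sum() < min_count or y.sum() > n_docs - min_count:
            continue
        # Regress word w on all other words with an L1 penalty.
        others = np.delete(np.arange(vocab), w)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X_k[:, others], y)
        nz = others[np.flatnonzero(clf.coef_[0])]
        neighbors[w] = nz
    return neighbors
```

In practice X_k would be stored as a sparse matrix and the V per-word regressions distributed, as the scalability slide below notes.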

Page 20:

Sparse Word Graphs

Page 21:

Sparse Word Graphs: Scalability

• We still run V logistic regression problems, each of size V, for each topic: O(KV^2)!
  – However, each example is very sparse
  – The L1 penalty results in sparse solutions
  – Can run each topic in parallel (see the sketch below)
  – Efficient interior-point-based L1-regularized logistic regression [Koh, Kim & Boyd, JMLR '07]
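Per-topic parallelism is straightforward because the K structure-learning problems are independent. A minimal sketch using joblib (an assumed tooling choice, not mentioned in the talk) on top of the sparse_word_graph function sketched earlier:

```python
from joblib import Parallel, delayed

# topic_matrices: list of K binary (num_docs x vocab_size) arrays, one per topic.
def all_topic_graphs(topic_matrices, n_jobs=-1):
    """Run the per-topic structure learning in parallel, one job per topic."""
    return Parallel(n_jobs=n_jobs)(
        delayed(sparse_word_graph)(X_k) for X_k in topic_matrices
    )
```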

Page 22:

Experiments

• Small AP corpus
  – 2.2K docs, 10.5K unique words
• Ran a 10-topic LDA model
• Used λ = 0.1 in the L1 logistic regression
• Took just 45 min. per topic
• Very sparse solutions
  – Computes only under 0.1% of the total number of possible edges (rough count below)
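For a sense of scale (my arithmetic, based only on the vocabulary size quoted above): with V ≈ 10,500 unique words, the number of possible undirected edges per topic and the implied edge budget are roughly

```latex
\binom{V}{2} = \frac{10{,}500 \times 10{,}499}{2} \approx 5.5 \times 10^{7}
\quad\Longrightarrow\quad
0.1\% \approx 5.5 \times 10^{4} \text{ edges per topic.}
```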

Page 23:

Topic “Business”: neighborhood of top LDA terms

Page 24:

Topic “Business”: neighborhood of top edges

Page 25:

Topic “War”: neighborhood of top LDA terms

Page 26:

Topic “War”: neighborhood of top edges

Page 27:

Concluding remarks

• Pros
  – A highly scalable algorithm for capturing within-topic word correlations
  – Captures both short-distance and long-distance correlations
  – Makes topics more interpretable
• Cons
  – Not a complete probabilistic model
    • Significant modeling challenge since the correlations are latent

Page 28:

Concluding remarks

• Applications of Sparse Word Graphs
  – Better document summarization and visualization tool
  – Word sense disambiguation
  – Semantic query expansion
• Future Work
  – Evaluation on a "real task"
  – Build a unified statistical model