Bayesian Hierarchical Clustering Paper by K. Heller and Z. Ghahramani ICML 2005 Presented by...

Page 1:

Bayesian Hierarchical Clustering

Paper by K. Heller and Z. Ghahramani, ICML 2005
Presented by HAO-WEI, YEH

Page 2:

Outline

Background - Traditional Methods

Bayesian Hierarchical Clustering (BHC) Basic ideas

Dirichlet Process Mixture Model (DPM)

Algorithm

Experiment results

Conclusion

Page 3:

Background

Traditional Method

Page 4:

Hierarchical Clustering
Given: data points
Output: a tree (a series of nested clusters)
Leaves: data points
Internal nodes: nested clusters

Examples
Evolutionary tree of living organisms
Internet newsgroups
Newswire documents

Page 5:

Traditional Hierarchical Clustering

Bottom-up agglomerative algorithm

Closeness is based on a given distance measure (e.g., Euclidean distance between cluster means)
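As a minimal sketch (my own, not code from the paper), the bottom-up procedure with centroid linkage can be written as: repeatedly merge the two clusters whose means are closest.

```python
import numpy as np

def agglomerative(points):
    """Greedy bottom-up clustering: repeatedly merge the two clusters
    whose means are closest in Euclidean distance.
    Returns the merge history as (cluster_a, cluster_b) tuples."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # distance between cluster means (centroid linkage)
                d = np.linalg.norm(points[clusters[i]].mean(axis=0)
                                   - points[clusters[j]].mean(axis=0))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((tuple(clusters[i]), tuple(clusters[j])))
        clusters[i] += clusters[j]
        del clusters[j]
    return merges
```

With three 1-D points 0.0, 0.1 and 5.0, the first merge joins the two nearby points; the merge history is exactly the tree the next slides build on.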

Page 6:

Traditional Hierarchical Clustering (cont’d)

Limitations:
No guide to choosing the correct number of clusters, or where to prune the tree.

Distance metric selection (especially for data such as images or sequences).

Evaluation (probabilistic model): How do we evaluate how good a result is? How do we compare to other models? How do we make predictions and cluster new data with an existing hierarchy?

Page 7:

BHC

Bayesian Hierarchical Clustering

Page 8:

Bayesian Hierarchical Clustering

Basic idea: use marginal likelihoods to decide which clusters to merge.

P(data to merge were generated from the same mixture component)
vs. P(data to merge were generated from different mixture components)

Generative model: Dirichlet Process Mixture Model (DPM)

Page 9:

Dirichlet Process Mixture Model (DPM)

Formal Definition

Different perspectives:
Infinite limit of a finite mixture model (motivation and problems)
Stick-breaking process (what a generated distribution looks like)
Chinese restaurant process, Polya urn scheme

Benefits:
Conjugate prior
Unlimited number of clusters

"Rich get richer": does it really work? It depends!
Alternatives: Pitman-Yor process, Uniform Process, …

Page 10:

BHC Algorithm - Overview

Same as traditional: a one-pass, bottom-up method.

Initializes each data point in its own cluster, and iteratively merges pairs of clusters.

Difference: uses a statistical hypothesis test to choose which clusters to merge.

Page 11:

BHC Algorithm - Concepts

Two hypotheses to compare:
1. All of the data was generated i.i.d. from the same probabilistic model with unknown parameters.
2. The data has two or more clusters in it.

Page 12:

Hypothesis H1

Probability of the data under H1:

p(Dk | H1) = ∫ p(Dk | θ) p(θ | β) dθ

p(θ | β): prior over the parameters
Dk: data in the two trees to be merged

The integral is tractable with a conjugate prior.
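To illustrate why the integral is tractable, here is a hypothetical sketch for binary data under a Beta prior (a Beta-Bernoulli conjugate pair; the function names are mine, not the paper's): integrating out θ leaves a closed-form ratio of Beta functions.

```python
from math import lgamma

def log_beta(a, b):
    # log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal_bernoulli(data, a=1.0, b=1.0):
    """log p(Dk | H1) for binary data with a Beta(a, b) prior on theta:
    the integral over theta reduces to B(a + heads, b + tails) / B(a, b)."""
    h = sum(data)          # number of ones
    t = len(data) - h      # number of zeros
    return log_beta(a + h, b + t) - log_beta(a, b)
```

For example, with a uniform Beta(1, 1) prior the marginal likelihood of the two-point dataset [1, 0] is B(2, 2)/B(1, 1) = 1/6.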

Page 13:

Hypothesis H2

Probability of the data under H2 (a product over sub-trees):

p(Dk | H2) = p(Di | Ti) p(Dj | Tj)

Page 14:

From Bayes' rule, the posterior probability of the merged hypothesis:

rk = πk p(Dk | H1) / p(Dk | Tk), where p(Dk | Tk) = πk p(Dk | H1) + (1 − πk) p(Di | Ti) p(Dj | Tj)

The pair of trees with the highest rk is merged.

Natural place to cut the final tree: nodes where rk < 0.5.

πk depends on the number of data points and the DPM concentration parameter; rk reflects the hidden cluster structure beneath the distribution.
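A sketch (my own, in log space for numerical stability) of the merged-hypothesis posterior rk = πk p(Dk|H1) / p(Dk|Tk):

```python
from math import log, exp

def log_sum_exp(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))

def merge_posterior(log_pi, log_h1, log_left, log_right):
    """rk = pi_k p(Dk|H1) / p(Dk|Tk), with
    p(Dk|Tk) = pi_k p(Dk|H1) + (1 - pi_k) p(Di|Ti) p(Dj|Tj).
    All inputs are log probabilities. Returns (rk, log p(Dk|Tk))."""
    log_num = log_pi + log_h1
    log_alt = log(1.0 - exp(log_pi)) + log_left + log_right  # assumes pi_k < 1
    log_tree = log_sum_exp(log_num, log_alt)
    return exp(log_num - log_tree), log_tree
```

For instance, with πk = 0.5, p(Dk|H1) = 0.2 and subtree likelihoods of 0.5 each, rk ≈ 0.444, which falls below the 0.5 cut threshold, so this pair would not survive in the final tree.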

BHC Algorithm - Working Flow

Page 15:

Tree-Consistent Partitions

Consider the tree at right and all 15 possible partitions of {1,2,3,4}:

(1)(2)(3)(4), (1 2)(3)(4), (1 3)(2)(4), (1 4)(2)(3), (2 3)(1)(4),

(2 4)(1)(3), (3 4)(1)(2), (1 2)(3 4), (1 3)(2 4), (1 4)(2 3),

(1 2 3)(4), (1 2 4)(3), (1 3 4)(2), (2 3 4)(1), (1 2 3 4)

(1 2) (3) (4) and (1 2 3) (4) are tree-consistent partitions.

(1)(2 3)(4) and (1 3)(2 4) are not tree-consistent partitions.
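A small sketch (mine) that enumerates tree-consistent partitions, assuming the slide's figure is the cascading tree (((1,2),3),4), which is consistent with the examples above: each node either keeps all of its leaves in one block, or combines consistent partitions of its two subtrees.

```python
def leaves(tree):
    # flatten a nested-tuple tree into its leaf labels
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

def tree_partitions(tree):
    """All tree-consistent partitions: either keep this node's leaves
    as a single block, or combine any consistent partitions of the
    two subtrees."""
    if not isinstance(tree, tuple):
        return [[(tree,)]]
    parts = [[tuple(sorted(leaves(tree)))]]  # whole node as one block
    left, right = tree
    for pl in tree_partitions(left):
        for pr in tree_partitions(right):
            parts.append(pl + pr)
    return parts
```

For that tree there are exactly four tree-consistent partitions, including (1 2)(3)(4) and (1 2 3)(4); (1)(2 3)(4) and (1 3)(2 4) are never produced.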

Page 16:

Merged Hypothesis Prior (πk)

Based on DPM (CRP perspective)

πk = P(all points in Dk belong to one cluster) = α Γ(nk) / dk, where dk = α Γ(nk) + d_left · d_right (leaves are initialized with d = α).

The recursion over the d's accounts for all tree-consistent partitions.
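A log-space sketch of the πk/dk recursion (assuming the leaf initialization d = α; the function names are mine):

```python
from math import lgamma, log, exp

def log_sum_exp(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))

def merge_prior(alpha, n_k, log_d_left, log_d_right):
    """CRP-based merge prior:
        dk   = alpha * Gamma(nk) + d_left * d_right
        pik  = alpha * Gamma(nk) / dk
    Leaves have n = 1 and log d = log(alpha).
    Returns (pik, log dk) for use at the parent node."""
    log_a = log(alpha) + lgamma(n_k)
    log_d = log_sum_exp(log_a, log_d_left + log_d_right)
    return exp(log_a - log_d), log_d
```

Merging two singleton leaves with α = 1 gives πk = Γ(2) / (Γ(2) + 1·1) = 0.5, matching the intuition that with no other evidence the two hypotheses are equally likely a priori.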

Page 17:

Predictive Distribution

BHC allows us to define predictive distributions for new data points.

Note: P(x | D) ≠ P(x | Dk), even at the root!

Page 18:

Approximate Inference for DPM prior

BHC forms a lower bound on the marginal likelihood of an infinite mixture model by efficiently summing over an exponentially large subset of all partitions. Idea: deterministically sum over the partitions with high probability, thereby accounting for most of the mass.

Compared to MCMC methods, this is deterministic and more efficient.

Page 19:

Learning Hyperparameters

α: concentration parameter

β: defines the base distribution G0

Both are learned by recursive gradient computations and an EM-like method.

Page 20:

To Sum Up for BHC

A statistical model for comparing merges that also decides when to stop.

Allows us to define predictive distributions for new data points.

Provides approximate inference for the DPM marginal likelihood.

Parameters:
α: concentration parameter
β: defines the base distribution G0

Page 21:

Unique Aspects of BHC Algorithm

A hierarchical way of organizing nested clusters, not a hierarchical generative model.

Derived from the DPM.

Hypothesis test: one cluster vs. many other clusterings (compare to one vs. two clusters at each stage).

Not iterative, and does not require sampling (except for learning hyperparameters).

Page 22:

Results

from the experiments

Pages 23-28: experiment result figures.

Conclusion

and some take-home notes

Page 29:

Conclusion

Limitations of traditional methods, and how BHC addresses them:

No guide to choosing the correct number of clusters, or where to prune the tree.
→ Solved: natural stopping criterion.

Distance metric selection.
→ Solved: model-based criterion.

Evaluation, comparison, inference.
→ Solved: probabilistic model (plus some useful results for the DPM).

Page 30:

Summary

Defines a probabilistic model of the data; can compute the probability of a new data point belonging to any cluster in the tree.

Model-based criterion to decide on merging clusters.

Bayesian hypothesis testing is used to decide which merges are advantageous, and to decide the appropriate depth of the tree.

The algorithm can be interpreted as an approximate inference method for a DPM; it gives a new lower bound on the marginal likelihood by summing over exponentially many clusterings of the data.

Page 31:

Limitations

Inherent greediness

Lack of any incorporation of tree uncertainty

O(n²) complexity for building the tree

Page 32:

References

Main paper: Bayesian Hierarchical Clustering, K. Heller and Z. Ghahramani, ICML 2005

Thesis: Efficient Bayesian Methods for Clustering, Katherine Ann Heller

Other references:

Wikipedia

Paper Slides

www.ee.duke.edu/~lcarin/emag/.../DW_PD_100705.ppt

http://cs.brown.edu/courses/csci2950-p/fall2011/lectures/2011-10-13_ghosh.pdf

General ML

http://blog.echen.me/

Page 33:

References

Other references (cont'd)

DPM & Nonparametric Bayesian :

http://nlp.stanford.edu/~grenager/papers/dp_2005_02_24.ppt

https://www.cs.cmu.edu/~kbe/dp_tutorial.pdf

http://www.iro.umontreal.ca/~lisa/seminaires/31-10-2006.pdf

http://videolectures.net/mlss07_teh_dp/ , http://mlg.eng.cam.ac.uk/tutorials/07/ywt.pdf

http://www.cns.nyu.edu/~eorhan/notes/dpmm.pdf (Easy to read)

http://mlg.eng.cam.ac.uk/zoubin/talks/uai05tutorial-b.pdf

Heavy text:

http://stat.columbia.edu/~porbanz/reports/OrbanzTeh2010.pdf

http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/dp.pdf

http://www.stat.uchicago.edu/~pmcc/reports/clusters.pdf

Hierarchical DPM

http://www.cs.berkeley.edu/~jordan/papers/hdp.pdf

Other methods

https://people.cs.umass.edu/~wallach/publications/wallach10alternative.pdf

Page 34:

Thank You for Your Attention!