Page 1:

Frank Wood - fwood@cs.brown.edu

Gentle Introduction to Infinite Gaussian Mixture Modeling

… with an application in neuroscience

By Frank Wood

Rasmussen, NIPS 1999

Page 2:

Neuroscience Application: Spike Sorting

• Important in neuroscience and for medical device performance

• Neural electrical activity is recorded and “spikes” are manually detected and segmented

• “Spike sorting” is the process of deciding which waveforms are spikes and which of an unknown number of neurons each one came from

Page 3:

Spike Sorting Data

Many waveforms recorded on a single electrode and plotted together.

Accepted neuroscience assumption: each neuron produces an ideal mean spike waveform plus Gaussian noise.

(Figure: recorded waveforms and their PCA projection.)

We want to know which spikes came from which neuron.

Lewicki et al., 1999

Page 4:

Important Questions

• Did these two spikes come from the same neuron?
  – Did these two data points come from the same hidden class?

• How many neurons are there?
  – How many hidden classes are there?

• Which spikes came from which neurons?
  – What model best explains the data?

Page 5:

Mixture Modeling

A formalism for modeling a probability density function as a sum of parameterized functions:

p(y_i | \pi, \Theta) = \sum_{k=1}^{K} \pi_k \, N(y_i | \mu_k, \Sigma_k)

where
• y_i are the observations
• K is the number of hidden components
• \pi_1, …, \pi_K are the class weights (class prior probabilities, a multinomial)
• \mu_k, \Sigma_k are the Normal parameters of component k, and N(· | \mu_k, \Sigma_k) is the multivariate Normal (Gaussian) density
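As a concrete illustration of this formula, here is a minimal Python sketch that evaluates a two-component Gaussian mixture density at a point; the weights, means, and covariances are invented for the example and are not from the talk.

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for a 2-component mixture in 2-D
# (chosen only to illustrate the formula above).
weights = np.array([0.3, 0.7])                 # class weights pi_k
means = [np.array([0.0, 0.0]),                 # mu_k
         np.array([3.0, 3.0])]
covs = [np.eye(2),                             # Sigma_k
        np.array([[1.0, 0.5], [0.5, 1.0]])]

def mixture_density(y):
    """p(y) = sum_k pi_k * N(y; mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(y, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

print(mixture_density(np.array([1.0, 1.0])))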

Page 6:

Toy Data and Notation

the data, observed


Page 8:

Toy Data and Notation

red = 1, green = 2, blue = 3, black = 4

the data, observed


Page 10:

Goal: learn model parameters from unlabeled data

• Learn the mixture model parameters
  – Maximum likelihood estimation
    • Good if you are certain that your generative model is correct and if all you want is a point estimate of ‘the right answer’
    • Fast: expectation maximization parameter estimation
  – Bayesian estimation
    • Better if you would like to maintain a representation of your modeling uncertainty
    • Slow: sampling
    • No ‘right answer’ – learn a distribution instead
    • Can treat the number of hidden classes as a parameter to be learned

Page 11:

Bayesian Modeling

• Estimate a posterior distribution over models
• Provides a principled way to encode prior beliefs about the form of the solution
• Posterior distribution represented by samples
• Will enable us to estimate how many hidden classes there are

Posterior ∝ Likelihood × Prior:  P(\theta | Y) ∝ P(Y | \theta) P(\theta)

\theta = model
Y = observations / training data

Page 12:

What we need:

• Priors for the model parameters
• Sampler
  – To draw samples from the posterior distribution

Page 13:

Priors for the model parameters

• Prior over class assignments
  – Class assignments are multinomial; we will choose a conjugate Dirichlet prior. This allows us to specify a priori how likely we think each class will be.

• Prior over class distribution parameters
  – Class distributions are multivariate Normal. We will choose conjugate Normal × Inverse-Wishart priors. These let us specify a priori where and how broad we think each mixture density should be.
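In symbols (a standard parameterization; the exact notation on the slides is assumed rather than recovered), for a K-class model these prior choices read:

\pi | \alpha \sim \mathrm{Dirichlet}(\alpha/K, \ldots, \alpha/K), \qquad c_i | \pi \sim \mathrm{Multinomial}(\pi)

\Sigma_k \sim \mathrm{Inverse\text{-}Wishart}(\nu_0, \Lambda_0), \qquad \mu_k | \Sigma_k \sim \mathcal{N}(\mu_0, \Sigma_k / \kappa_0)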

Page 14:

Conjugate Priors

• A prior distribution is conjugate to a likelihood if multiplying the likelihood by the prior yields a distribution with the same functional form as the prior

• Examples:

  Likelihood            Conjugate Prior
  Poisson               Gamma
  Binomial              Beta
  Multinomial           Dirichlet
  Multivariate Normal   Multivariate Normal × Inverse-Wishart
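For the Multinomial–Dirichlet pair used later for class assignments, conjugacy means the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed class counts. A minimal Python sketch, with made-up counts and prior values for illustration:

import numpy as np

# Dirichlet prior over the weights of a 3-class multinomial (hypothetical values).
alpha_prior = np.array([1.0, 1.0, 1.0])

# Observed class counts from some data.
counts = np.array([8, 3, 1])

# Conjugacy: Dirichlet prior x Multinomial likelihood -> Dirichlet posterior.
alpha_posterior = alpha_prior + counts

# Posterior mean of each class weight.
print(alpha_posterior / alpha_posterior.sum())   # [0.6, 0.266..., 0.133...]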

Page 15:

Sampling the posterior distribution

• Simulate a Markov chain whose equilibrium distribution is the Bayesian mixture model posterior distribution

Posterior ∝ Likelihood × Prior

• Posterior: remember, a distribution over model parameters is what we seek
• Likelihood: Multivariate Normal
• Prior: CRP over class assignments (the Multinomial–Dirichlet limit); Normal × Inverse-Wishart over the Normal parameters

(Metropolis et al.; Geman & Geman)

Page 16:

But what about the infinite part?

– Properly parameterized, a posterior formed from a Multinomial Dirichlet conjugate pair is well behaved as the number of hidden classes approaches infinity.

– This results in a model with an infinite number of hidden causes, but one in which only a finite number are causal w.r.t. our finite dataset.

– The Chinese Restaurant Process is one process that generates samples from such a model.

• A hyperparameter (prior) will remain that allows us to specify our a priori belief about how many hidden classes cause our finite data.

Page 17:

Sampling class membership in an infinite mixture model: the Chinese Restaurant Process

(Figure: customers 1–11 seated at the tables of a restaurant with infinitely many tables.)

Exchangeable distribution (Aldous, 1985; Pitman, 2002)

Infinitely many tables

First customer sits at the first table.

Remaining customers seat themselves randomly: each new customer sits at an already occupied table with probability proportional to the number of customers already seated there, or at a new table with probability proportional to the concentration hyperparameter α.
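A minimal Python sketch of this seating process (the concentration value is arbitrary; this illustrates only the CRP prior over class assignments, not the full IGMM sampler):

import numpy as np

def crp_sample(n_customers, alpha, rng=None):
    """Sample a seating arrangement from the Chinese Restaurant Process.

    Returns one table assignment per customer."""
    rng = rng or np.random.default_rng()
    counts = []            # number of customers at each occupied table
    assignments = []
    for _ in range(n_customers):
        # Occupied tables are chosen proportionally to their counts;
        # a new table is chosen proportionally to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)       # open a new table
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments

print(crp_sample(11, alpha=1.0))   # e.g. [0, 0, 1, 0, 2, 1, ...]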

Page 18:

Infinite Gaussian Mixture Model Sampler

• Easy to implement and use
• Gibbs sampler – conjugate priors produce analytic conditional distributions for sampling
• Two-step iterative algorithm:
  – Sample Normal distribution means and covariances given a current assignment of data to classes
  – Sample the assignment of data to classes given current values for the means and covariances (CRP)
• After some time, the sampler converges to a set of samples from the posterior, i.e. a scored set of feasible models given the training data
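In symbols (notation assumed to match the priors above), one sweep of this Gibbs sampler alternates between the two conditional distributions:

(\mu_k, \Sigma_k) \mid \{y_i : c_i = k\} \;\sim\; \text{Normal-Inverse-Wishart posterior for class } k

p(c_i = k \mid c_{-i}, y_i, \mu, \Sigma) \;\propto\; p(c_i = k \mid c_{-i}) \, \mathcal{N}(y_i \mid \mu_k, \Sigma_k)

where the CRP prior gives p(c_i = k \mid c_{-i}) = n_{-i,k} / (N - 1 + \alpha) for an occupied class and \alpha / (N - 1 + \alpha) for a new class, whose parameters are drawn from the prior (or integrated out under it).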

Page 19:

Gibbs Sampling the Posterior

Page 20:

Toy Data Results

(Figures: distribution over the number of classes K; maximum a posteriori sample.)

Page 21:

Single channel spike sorting results

Infinite Mixture Model:
• Priors enforce preference for intuitive models
• CRP prior allows inference over # of hidden classes

Expectation Maximization:
• Lack of priors allows non-intuitive solutions
• No distribution over # of hidden classes

Page 22:

Conclusions

• Bayesian mixture modeling is a principled way to add prior information into the modeling process

• IMM / CRP is a way to estimate the number of hidden classes

• Infinite Gaussian mixture modeling is good for automatic spike sorting

Future Work

• Particle filtering for online spike sorting

Page 23:

Thank you

IGMM Software available at http://www.cs.brown.edu/~fwood/code.html

Thanks to Michael Black, Tom Griffiths, Sharon Goldwater, and the Brown University machine learning reading group.

This work was supported by NIH-NINDS R01 NS 50967-01 as part of the NSF/NIH Collaborative Research in Computational Neuroscience Program.

Page 24:

Generative Viewpoint

Pick class label according to multinomial

Generate observation according to class model

(Diagram: four classes, each with multinomial weight 0.25.)
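A minimal Python sketch of this generative process, assuming four equally weighted classes as in the diagram; the class means and covariances are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Class weights from the diagram: four equally likely classes.
weights = np.array([0.25, 0.25, 0.25, 0.25])

# Hypothetical 2-D class models (means with a shared identity covariance).
means = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
cov = np.eye(2)

def generate(n):
    """Pick a class label from the multinomial, then sample from that class's Gaussian."""
    labels = rng.choice(len(weights), size=n, p=weights)        # multinomial step
    data = np.array([rng.multivariate_normal(means[k], cov) for k in labels])
    return labels, data

labels, data = generate(5)
print(labels)
print(data)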

Page 25:

Mixture Modeling

• A formalism for modeling a probability density function as a sum of parameterized functions

• Observed population data is complicated – not well fit by a canonical parametric distribution

• Assume: ‘Hidden’ subpopulation data is simple – well fit by a canonical parametric distribution

• Hope: 1 hidden subpopulation <-> 1 simple parametric distribution

• Key questions:
  – How many hidden subpopulations are responsible for generating the data?
  – Which subpopulation did each data point come from?

1D MIXTURE MODEL PLOT HERE

Page 26:

Limiting Behavior of Uniform Dirichlet Prior
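The standard result behind this slide (see, e.g., Rasmussen, NIPS 1999; notation assumed here): with a symmetric Dirichlet(\alpha/K, \ldots, \alpha/K) prior on the class weights and the weights integrated out, the prior probability that point i joins class k given the other assignments is

p(c_i = k \mid c_{-i}, \alpha) = \frac{n_{-i,k} + \alpha/K}{N - 1 + \alpha}

where n_{-i,k} is the number of other points assigned to class k. This stays well behaved as K \to \infty:

p(c_i = k \mid c_{-i}, \alpha) \to \frac{n_{-i,k}}{N - 1 + \alpha} \ \text{for occupied classes}, \qquad p(\text{new class} \mid c_{-i}, \alpha) \to \frac{\alpha}{N - 1 + \alpha}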

Page 27:

Bayesian Mixture Model Priors

• Prior over class assignments

• Prior over class distribution parameters

Page 28:

Conjugacy – our friend

• If you choose a conjugate prior then the posterior will be in the same family as the prior.
  – Normal <-> Normal × Inverse-Wishart
  – Dirichlet <-> Multinomial
  – Analytic posteriors allow Gibbs sampling

Page 29:

Sampler

Page 30:

Maximum likelihood techniques

• Expectation maximization

• Bayesian Information Criterion (BIC)
  – but not Bayesian; no distribution over models is maintained

Page 31:

Example applications

• Modeling network packet traffic
  – Network applications’ performance is dependent on the distribution of incoming packets
  – Want a population model to build a fancy scheduler
  – Potentially multiple heterogeneous applications generating packet traffic
  – How many types of applications are generating packets?

• Clustering sensor data (robotics, sensor networks)
  – Robot encounters multiple types of physical environments (doors, walls, hallways, etc.)
  – How many types of environments are there?
  – How do we tell what type of space we are in?