PCA and Mixtures of PCA: Improving the robustness to outliers
M. Verleysen (UCL)
26 October 2007
Joint work with Nicolas Delannay (UCL machine learning group) and Cédric Archambeau (University College London)
Overview
- Principal Component Analysis: a reminder
- Probabilistic PCA
- Robust probabilistic PCA
- Mixtures of (robust) probabilistic PCA
- Experiments
Maximum variance direction [1/2]
- We look for an orthogonal coordinate system such that the elements of X in the new system are uncorrelated
- We look for axes that maximize variance after projection
[Figure: a correlated 2-D data cloud is rotated into a new coordinate system in which its components are uncorrelated]
Maximum variance direction [2/2]
- We look for axes which maximise the variance after projection, i.e. along an axis u with $\|u\|^2 = u^T u = 1$
- In the new coordinate system: $X_1 = Y u$
- Therefore:
$$\sigma^2_{X_1} = \mathrm{Var}(X_1) = E[X_1^T X_1] = u^T E[Y^T Y]\, u$$
- Maximum variance = min. distortion (LMS)
Maximum variance = minimum distortion
[Figure: data cloud in the (y1, y2) plane; the projection axis x1 simultaneously maximizes the projected variance and minimizes the orthogonal distortion]
Choice of direction [1/3]
- Choice of u?
$$e = \arg\max_u E[u^T Y^T Y\, u] = \arg\max_u u^T C_{YY}\, u = \arg\max_u u^T \underbrace{\Theta \Lambda \Theta^T}_{C_{YY}\ \mathrm{(EVD)}} u$$
$$\sigma^2_{X_1,\max} = e^T \underbrace{\Theta \Lambda \Theta^T}_{C_{YY}\ \mathrm{(EVD)}}\, e$$
Choice of direction [2/3]
- Choice of u?
$$e = \arg\max_u E[u^T Y^T Y\, u] \;\Rightarrow\; e = \Theta_1$$
where $\Theta_1$ is the eigenvector corresponding to the largest eigenvalue of $C_{YY}$.
$$\sigma^2_{X_1,\max} = [1\ 0\ \dots\ 0]\; \Theta^T \underbrace{C_{YY}}_{\Theta \Lambda \Theta^T\ \mathrm{(EVD)}} \Theta\; [1\ 0\ \dots\ 0]^T = [1\ 0\ \dots\ 0] \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{bmatrix} [1\ 0\ \dots\ 0]^T = \lambda_1$$
$$\sigma^2_{X_1,\max} = \lambda_1, \qquad \text{where } \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$$
Choice of direction [3/3]
- Classical result: the best choice for u1 is the eigenvector Θ1 associated with the largest eigenvalue λ1 of the matrix C_YY.
- In the space orthogonal to u1: the best choice for u2 is the eigenvector Θ2 associated with the second-largest eigenvalue λ2 of C_YY.
- And so on ... (see the NumPy sketch below)
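In code, this classical result is a short linear-algebra recipe: center the data, form the covariance matrix C_YY, and keep the eigenvectors with the largest eigenvalues. A minimal NumPy sketch (ours, not from the slides; names are illustrative):

```python
import numpy as np

def pca_eig(Y, J):
    """PCA of data matrix Y (N x D) via eigendecomposition of the
    covariance matrix; returns the top-J axes and projected data."""
    Yc = Y - Y.mean(axis=0)            # center the data
    C = Yc.T @ Yc / len(Y)             # covariance matrix C_YY (D x D)
    lam, Theta = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]      # sort descending: lam_1 >= lam_2 >= ...
    Theta, lam = Theta[:, order], lam[order]
    U = Theta[:, :J]                   # principal axes u_1 = Theta_1, u_2 = Theta_2, ...
    return U, lam[:J], Yc @ U          # axes, projected variances, projections X = Y U

# toy usage: a correlated 2-D cloud
rng = np.random.default_rng(0)
Y = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.5], [0.0, 0.5]])
U, lam, X = pca_eig(Y, J=1)
print(U, lam)  # lam[0] is the maximum projected variance sigma^2_{X_1}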
Probabilistic PCA
- Idea #1: generative probabilistic model with latent variables x
- Idea #2: linear dependency between y_n and x_n
- Idea #3: Gaussian distribution of latent variables
- Idea #4: Gaussian additive noise
$$\{y_n\}_{n=1}^N \subset \mathbb{R}^D, \qquad \{x_n\}_{n=1}^N \subset \mathbb{R}^J, \qquad J < D$$
$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid Wx + \mu,\ \tau^{-1} I_D)$$
Probabilistic PCA
$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid Wx + \mu,\ \tau^{-1} I_D)$$
- Isotropic noise: natural (what else?)
- Arbitrary center and variance of the latent distribution: no problem, because they are compensated by μ and W
- Gaussians: natural, and mathematical convenience!
- Marginal distribution (checked numerically below):
$$P(y) = \int_X P(y \mid x)\, P(x)\, dx = \mathcal{N}(y \mid \mu, \Sigma), \qquad \text{where}\quad \Sigma = W W^T + \tau^{-1} I_D$$
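The identity Σ = W Wᵀ + τ⁻¹ I_D is easy to verify by simulation: sample from the generative model and compare the empirical covariance of y with the formula. A small sketch (dimensions and parameter values are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
D, J, N, tau = 5, 2, 200_000, 4.0
W = rng.normal(size=(D, J))
mu = rng.normal(size=D)

x = rng.normal(size=(N, J))                       # x ~ N(0, I_J)
noise = rng.normal(scale=tau**-0.5, size=(N, D))  # eps ~ N(0, tau^-1 I_D)
y = x @ W.T + mu + noise                          # y | x ~ N(Wx + mu, tau^-1 I_D)

Sigma_empirical = np.cov(y, rowvar=False)
Sigma_theory = W @ W.T + np.eye(D) / tau          # Sigma = W W^T + tau^-1 I_D
print(np.max(np.abs(Sigma_empirical - Sigma_theory)))  # small, shrinks as N grows
```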
Probabilistic PCA training
- Finding the parameters θ = {W, μ, τ} in
$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid Wx + \mu,\ \tau^{-1} I_D)$$
- How? By finding the parameters that lead to maximum likelihood of the observations y_n:
$$\theta_{\mathrm{ML}} = \arg\max_\theta P(\{y_n\}) = \arg\max_\theta \log\Big(\prod_n P(y_n)\Big) = \arg\min_\theta \Big(-\sum_n \log P(y_n)\Big) = \arg\min_\theta \Big(\tfrac{1}{2} \sum_n (y_n - \mu)^T \Sigma^{-1} (y_n - \mu) + \tfrac{N}{2} \log\lvert 2\pi\Sigma \rvert\Big)$$
Probabilistic PCA training
- How to find θ_ML?
$$\theta_{\mathrm{ML}} = \arg\min_\theta \Big(\tfrac{1}{2} \sum_n (y_n - \mu)^T \Sigma^{-1} (y_n - \mu) + \tfrac{N}{2} \log\lvert 2\pi\Sigma \rvert\Big), \qquad \Sigma = W W^T + \tau^{-1} I_D$$
- ≡ fitting a multivariate Gaussian distribution to the observations, with a constrained covariance matrix
- How: EM algorithm (a minimal sketch follows below)
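For concreteness, here is a minimal sketch of the standard EM iteration for PPCA (following the usual Tipping & Bishop updates, with σ² playing the role of τ⁻¹; this is our illustration, not the authors' code):

```python
import numpy as np

def ppca_em(Y, J, n_iter=100, seed=0):
    """EM for probabilistic PCA; Y is (N, D). Returns W, mu, sigma2 = 1/tau."""
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    mu = Y.mean(axis=0)                     # ML estimate of mu is the sample mean
    Yc = Y - mu
    W = rng.normal(size=(D, J))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent x_n
        M = W.T @ W + sigma2 * np.eye(J)    # (J, J)
        Minv = np.linalg.inv(M)
        Ex = Yc @ W @ Minv                  # rows are E[x_n], shape (N, J)
        Sxx = N * sigma2 * Minv + Ex.T @ Ex # sum_n E[x_n x_n^T]
        # M-step: update W and sigma2
        W = Yc.T @ Ex @ np.linalg.inv(Sxx)
        sigma2 = (np.sum(Yc**2)
                  - 2 * np.sum((Yc @ W) * Ex)
                  + np.trace(Sxx @ W.T @ W)) / (N * D)
    return W, mu, sigma2
```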
Probabilistic PCA
- Is it useful? (Does the probabilistic formulation add something to the deterministic one?)
- Answer: no
  - Tipping and Bishop showed that the axes found by PPCA span the same subspace as those found by PCA (the standard equivalence between maximum likelihood with a Gaussian noise hypothesis and the LMS criterion)
- Answer: yes ☺
  - The probabilistic formulation makes it possible to introduce new, more realistic hypotheses
  - Ex: the LMS criterion (≡ Gaussian hypothesis) does not correspond to natural goals!
PCA and outliers
- With "easy" data
[Figure: well-behaved data cloud in the (y1, y2) plane and the fitted principal axis x1]
PCA and outliers
- With a few outliers
[Figure: the same cloud plus a few outliers (labelled "new data") in the (y1, y2) plane; the fitted axis x1 is pulled toward them]
PCA and outliers
- Why is it the case?
[Figure: with the axis x1 aligned to the main cloud, the new data (outliers) are not likely under the model (low posterior)]
[Figure: with the axis rotated toward the outliers, all data become more likely, so maximum likelihood prefers the rotated solution]
PCA robust to outliers
- Increasing the probability of outliers: Gaussian
[Figure: widening the Gaussian makes the new data more probable, but then all data close to the x1 axis become equally probable]
- Increasing the probability of outliers: Student
[Figure: with a Student-t model, for an identical outlier posterior there is better discrimination around the axis]
Student-t distribution
[Figure: Gaussian vs. Student-t densities P(x) (left) and negative log-densities −log P(x) (right); the Student-t has heavier tails, so large deviations are penalized far less (see the numerical check below)]
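The difference in tail behaviour is easy to quantify: the Gaussian negative log-density grows quadratically with the deviation, the Student-t only logarithmically. A tiny check with scipy (ν = 3 is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm, t

x = np.array([0.0, 2.0, 5.0, 10.0])
print(-norm.logpdf(x))     # quadratic growth: outliers become extremely unlikely
print(-t.logpdf(x, df=3))  # logarithmic growth: outliers stay plausible
```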
Robust PPCA
- Probabilistic PCA:
$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid Wx + \mu,\ \tau^{-1} I_D)$$
- Robust probabilistic PCA:
$$P(x) = \mathrm{St}(x \mid 0, I_J, \nu), \qquad P(y \mid x) = \mathrm{St}(y \mid Wx + \mu,\ \tau^{-1} I_D, \nu)$$
(identical ν in both distributions, for simplicity!)
Robust PPCA
- Model:
$$P(x) = \mathrm{St}(x \mid 0, I_J, \nu), \qquad P(y \mid x) = \mathrm{St}(y \mid Wx + \mu,\ \tau^{-1} I_D, \nu)$$
- But a Student-t is an infinite mixture of Gaussians:
$$\mathrm{St}(y \mid \mu, \Sigma, \nu) = \int_0^\infty \mathcal{N}\Big(y \,\Big|\, \mu, \tfrac{1}{u}\Sigma\Big)\, \mathrm{Ga}\Big(u \,\Big|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\Big)\, du, \qquad \nu > 0,$$
where Ga(u | ·, ·) is a Gamma distribution.
- Therefore the generative model is (the sketch below samples from this process):
$$P(u) = \mathrm{Ga}\Big(u \,\Big|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\Big), \qquad P(x \mid u) = \mathcal{N}\Big(x \,\Big|\, 0, \tfrac{1}{u} I_J\Big), \qquad P(y \mid x, u) = \mathcal{N}\Big(y \,\Big|\, Wx + \mu, \tfrac{1}{u\tau} I_D\Big)$$
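This infinite-mixture representation doubles as a sampler: draw the scale u from the Gamma, then draw x and the noise from Gaussians whose covariances are divided by u. A hedged sketch of the generative process (parameter values are arbitrary; note that NumPy's Gamma is parameterized by shape and scale, so rate ν/2 becomes scale 2/ν):

```python
import numpy as np

rng = np.random.default_rng(2)
D, J, N, tau, nu = 5, 2, 100_000, 4.0, 3.0
W, mu = rng.normal(size=(D, J)), rng.normal(size=D)

u = rng.gamma(shape=nu / 2, scale=2 / nu, size=N)          # u ~ Ga(nu/2, nu/2)
x = rng.normal(size=(N, J)) / np.sqrt(u)[:, None]          # x | u ~ N(0, I_J / u)
eps = rng.normal(size=(N, D)) / np.sqrt(u * tau)[:, None]  # noise | u ~ N(0, I_D / (u tau))
y = x @ W.T + mu + eps                                     # marginally, y ~ St(mu, Sigma, nu)
```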
Robust PPCA
- Model:
$$P(u) = \mathrm{Ga}\Big(u \,\Big|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\Big), \qquad P(x \mid u) = \mathcal{N}\Big(x \,\Big|\, 0, \tfrac{1}{u} I_J\Big), \qquad P(y \mid x, u) = \mathcal{N}\Big(y \,\Big|\, Wx + \mu, \tfrac{1}{u\tau} I_D\Big)$$
- Good news: the posterior is tractable:
$$P(y) = \int_0^\infty \!\! \int_X P(y \mid x, u)\, P(x \mid u)\, P(u)\, dx\, du = \mathrm{St}(y \mid \mu, \Sigma, \nu)$$
where again $\Sigma = W W^T + \tau^{-1} I_D$.
Robust PPCA training
- Finding the parameters θ = {W, μ, τ, ν} in
$$P(u) = \mathrm{Ga}\Big(u \,\Big|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\Big), \qquad P(x \mid u) = \mathcal{N}\Big(x \,\Big|\, 0, \tfrac{1}{u} I_J\Big), \qquad P(y \mid x, u) = \mathcal{N}\Big(y \,\Big|\, Wx + \mu, \tfrac{1}{u\tau} I_D\Big)$$
- How? By finding the parameters that lead to maximum likelihood of the observations y_n:
$$\theta_{\mathrm{ML}} = \arg\min_\theta \Big(-\sum_n \log P(y_n)\Big)$$
Robust PPCA advantages
1. Only one parameter to fix in advance (the dimension of the latent, i.e. projection, space)
2. Natural framework for an extension to mixtures
Mixtures of (robust) PPCA
- (P)PCA: linear dependencies in data only
- Mixtures of (P)PCA: nonlinear dependencies, through mixtures of linear manifolds
- Latent variable model, no common projection space
[Figure: data in the (y1, y2) plane approximated by three local linear components with axes x11, x12, x13]
Mixtures of (robust) PPCA
- Concept of (hidden) cluster membership:
$$P(y) = \sum_k \pi_k P_k(y)$$
where each $P_k(y)$ is a single PCA model and the mixture proportions satisfy $\pi_k > 0$, $\sum_k \pi_k = 1$ (see the sketch below).
[Figure: the same three-component cloud; each point is generated by one hidden component]
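In code, the mixture density is just a π-weighted sum of per-component densities; for the robust variant, each P_k(y) is the Student-t marginal St(y | μ_k, Σ_k, ν_k) with Σ_k = W_k W_kᵀ + τ_k⁻¹ I_D. A sketch assuming scipy.stats.multivariate_t is available (SciPy ≥ 1.6):

```python
from scipy.stats import multivariate_t

def mixture_density(y, pis, mus, Sigmas, nus):
    """P(y) = sum_k pi_k * St(y | mu_k, Sigma_k, nu_k) for a robust PPCA mixture,
    where Sigma_k = W_k W_k^T + tau_k^{-1} I_D has been precomputed."""
    return sum(pi * multivariate_t(mu, Sigma, df=nu).pdf(y)
               for pi, mu, Sigma, nu in zip(pis, mus, Sigmas, nus))
```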
Mixtures of (robust) PPCA
- z = [z1, z2, ..., zK] with z_k = 1 if y_n was generated by component k, z_k = 0 otherwise
- Latent variable model:
$$P(z) = \prod_k \pi_k^{z_k}$$
$$P(u \mid z) = \prod_k \mathrm{Ga}\Big(u \,\Big|\, \tfrac{\nu_k}{2}, \tfrac{\nu_k}{2}\Big)^{z_k}$$
$$P(x \mid u, z) = \prod_k \mathcal{N}\Big(x_k \,\Big|\, 0, \tfrac{1}{u} I_{J_k}\Big)^{z_k}$$
$$P(y \mid x, u, z) = \prod_k \mathcal{N}\Big(y \,\Big|\, W_k x_k + \mu_k, \tfrac{1}{u\tau_k} I_D\Big)^{z_k}$$
- Possibly different dimensionalities J_k across components
Mixtures of (robust) PPCA training
- Finding the parameters θ = {(W_k, μ_k, τ_k, ν_k, π_k)}_{k=1...K} in the model above
- How? By finding the parameters that lead to maximum likelihood of the observations y_n:
$$\theta_{\mathrm{ML}} = \arg\min_\theta \Big(-\sum_n \log P(y_n)\Big)$$
Mixtures of (robust) PPCA training
- In practice: EM algorithm
  - E-step: fix the model parameters, and compute expectations (over an approximate distribution) of the latent variables (a minimal sketch of this step follows below)
  - M-step: update the model parameters to maximize the likelihood
- Two hyper-parameters:
  - the number of components
  - the dimensionality of the latent representations
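As an illustration of the E-step, the expected cluster memberships (responsibilities) follow from Bayes' rule; the sketch below shows only this membership part, not the per-component expectations of u_n and x_n that the full E-step also requires (again, our illustration using the Student-t marginals of the robust model):

```python
import numpy as np
from scipy.stats import multivariate_t

def responsibilities(Y, pis, mus, Sigmas, nus):
    """E-step for the cluster labels on data Y (N, D):
    r_nk = pi_k St(y_n | mu_k, Sigma_k, nu_k) / sum_j pi_j St(y_n | mu_j, Sigma_j, nu_j)."""
    R = np.column_stack([pi * multivariate_t(mu, Sigma, df=nu).pdf(Y)
                         for pi, mu, Sigma, nu in zip(pis, mus, Sigmas, nus)])
    return R / R.sum(axis=1, keepdims=True)  # rows sum to 1
```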
Experiments
- PCA and robust PCA
[Figure: 2-D dataset in the (y1, y2) plane used to compare PCA and robust PCA]
Experiments
[Figure: 2x2 grid of fits on the same 2-D data in the (y1, y2) plane; columns: non-robust vs. robust, rows: single PCA vs. mixtures]
Experiments
- Three 3-D Gaussian clusters
  - Diagonal covariance matrix: diag(5, 1, 0.2) before rotation
  - Intrinsic dimensionality: 2
[Figures: left, the three 3-D Gaussian clusters in (y1, y2, y3); right, model selection without outliers: test −log P(y) of the robust model versus the number of components K, for latent dimensionalities J=1 and J=2]
Experiments
- True model (K=3, J=2), with outliers added
[Figures: left, performances on the test set: −log P(y) versus the number of outliers; right, degree-of-freedom parameters ν_k versus the number of outliers: some components stay almost Gaussian (large ν_k) while another takes the outliers into account (small ν_k)]
Experiments
- USPS handwritten digit dataset:
  - around 700 images of "2"
  - around 700 images of "3"
  - around 100 images of "0" (as outliers)
- Experiment: two 1-D components (two clusters)
- Illustrations: images close to the 1-D subspaces
[Figure: images close to the learned 1-D subspaces, standard mixtures vs. robust mixtures]
Conclusions
- Probabilistic formulation of PCA:
  - identical to PCA
  - but possible to introduce other hypotheses
- Robust PCA:
  - replacing Gaussians by Student-t
  - makes outliers more likely → lower influence on the model
- Mixtures of PCA:
  - latent variable model
  - better than mixtures of Gaussians through regularization (dimensionality of latent space)
- Mixtures of Robust PCA:
  - replacing Gaussians by Student-t