PCA and Mixtures of PCA: Improving the robustness to outliers

M. Verleysen, UCL
26 October 2007

Joint work with:
- Nicolas Delannay, UCL machine learning group
- Cédric Archambeau, University College London

Overview

Principal Component Analysis: a reminder
Probabilistic PCA
Robust probabilistic PCA
Mixtures of (robust) probabilistic PCA
Experiments


Maximum variance direction [1/2]

We look for an orthogonal coordinate system such that the elements of $X$ are uncorrelated in the new system.
We look for axes that maximize the variance after projection.

[Figure: a 2-D data cloud and its projection onto a candidate axis.]

Maximum variance direction [2/2]

We look for axes which maximise the variance after projection, i.e. along an axis $\mathbf{u}$ with $\mathbf{u}^T \mathbf{u} = \lVert \mathbf{u} \rVert^2 = 1$.

In the new coordinate system: $X_1 = \mathbf{u}^T Y$.

Therefore:
$$\sigma^2_{X_1} = \mathrm{Var}[X_1] = E[X_1 X_1^T] = \mathbf{u}^T E[Y Y^T]\, \mathbf{u}.$$

Maximum variance = minimum distortion (LMS).

Maximum variance = minimum distortion

[Figure: data points $(y_1, y_2)$ projected onto the axis $x_1$: maximising the variance of the projections is equivalent to minimising the squared residuals to the axis.]

Choice of direction [1/3]

Choice of $\mathbf{u}$?

$$\mathbf{e} = \arg\max_{\mathbf{u}} E[\mathbf{u}^T Y Y^T \mathbf{u}] = \arg\max_{\mathbf{u}} \mathbf{u}^T C_{YY}\, \mathbf{u} = \arg\max_{\mathbf{u}} \mathbf{u}^T \Theta \Lambda \Theta^T \mathbf{u},$$

where $C_{YY} = E[Y Y^T] \stackrel{\text{EVD}}{=} \Theta \Lambda \Theta^T$ is the eigenvalue decomposition of the covariance matrix, so that

$$\sigma^2_{X_1,\max} = \mathbf{e}^T \Theta \Lambda \Theta^T \mathbf{e}.$$

Choice of direction [2/3]

Choice of $\mathbf{u}$?

$$\mathbf{e} = \arg\max_{\mathbf{u}} E[\mathbf{u}^T Y Y^T \mathbf{u}] = \Theta_1,$$

where $\Theta_1$ is the eigenvector corresponding to the largest eigenvalue of $C_{YY}$. Indeed,

$$\sigma^2_{X_1,\max} = \Theta_1^T \Theta \Lambda \Theta^T \Theta_1 = \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \lambda_1,$$

where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.

Choice of direction [3/3]

Classical result: the best choice for $\mathbf{u}_1$ is the eigenvector $\Theta_1$ associated with the largest eigenvalue $\lambda_1$ of the matrix $C_{YY}$.

In the space orthogonal to $\mathbf{u}_1$: the best choice for $\mathbf{u}_2$ is the eigenvector $\Theta_2$ associated with the second largest eigenvalue $\lambda_2$ of $C_{YY}$.

And so on...
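To make the classical result concrete, here is a minimal numpy sketch (synthetic data, sizes and variable names are my own, not from the talk): it recovers the principal directions by eigendecomposition of the sample covariance and checks that the variance after projection onto $\Theta_1$ equals $\lambda_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with one elongated direction (illustrative only).
Y = rng.normal(size=(1000, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# Sample covariance C_YY of the centred data.
Yc = Y - Y.mean(axis=0)
C = Yc.T @ Yc / len(Yc)

# Eigenvalue decomposition C = Theta Lambda Theta^T, sorted so that
# lambda_1 >= lambda_2 >= ...
eigvals, Theta = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, Theta = eigvals[order], Theta[:, order]

# Variance after projection onto the first eigenvector equals lambda_1.
u1 = Theta[:, 0]
print(np.var(Yc @ u1), eigvals[0])  # the two numbers agree
```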


Probabilistic PCA

Idea #1: a generative probabilistic model with latent variables $x$
Idea #2: a linear dependency between $y_n$ and $x_n$
Idea #3: a Gaussian distribution of the latent variables
Idea #4: Gaussian additive noise

Given observations $\{y_n\}_{n=1}^N \subset \mathbb{R}^D$ and latent variables $\{x_n\}_{n=1}^N \subset \mathbb{R}^J$ with $J < D$:

$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid W x + \mu, \tau^{-1} I_D).$$

Probabilistic PCA

Marginal distribution:

$$P(y) = \int_X P(y \mid x)\, P(x)\, dx = \mathcal{N}(y \mid \mu, \Sigma), \qquad \text{where } \Sigma = W W^T + \tau^{-1} I_D.$$

Remarks on the model choices:
– Isotropic noise: natural (what else?)
– Arbitrary centre and variance of the latent prior: no problem, because they are compensated by $\mu$ and $W$
– Gaussians: natural, and mathematically convenient!
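A quick way to see the marginal at work is to sample from the generative model and compare the empirical covariance of $y$ with $\Sigma = W W^T + \tau^{-1} I_D$. A minimal sketch, with arbitrary sizes and parameters of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
D, J, N = 5, 2, 200_000      # illustrative sizes
W = rng.normal(size=(D, J))
mu = rng.normal(size=D)
tau = 4.0                    # noise precision

# Generative model: x ~ N(0, I_J), then y | x ~ N(W x + mu, (1/tau) I_D).
x = rng.normal(size=(N, J))
y = x @ W.T + mu + rng.normal(scale=tau ** -0.5, size=(N, D))

# The sample covariance of y approaches Sigma = W W^T + (1/tau) I_D.
Sigma = W @ W.T + np.eye(D) / tau
print(np.round(np.cov(y, rowvar=False), 2))
print(np.round(Sigma, 2))
```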

Probabilistic PCA training

Finding the parameters $\theta = \{W, \mu, \tau\}$ in

$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid W x + \mu, \tau^{-1} I_D).$$

How? By finding the parameters that lead to the maximum likelihood of the observations $y_n$:

$$\theta_{ML} = \arg\max_\theta \prod_n P(y_n) = \arg\min_\theta \left( -\sum_n \log P(y_n) \right) = \arg\min_\theta \left[ \frac{1}{2} \sum_n (y_n - \mu)^T \Sigma^{-1} (y_n - \mu) + \frac{N}{2} \log \lvert 2\pi \Sigma \rvert \right].$$

Probabilistic PCA training

How to find $\theta_{ML}$?

$$\theta_{ML} = \arg\min_\theta \left[ \frac{1}{2} \sum_n (y_n - \mu)^T \Sigma^{-1} (y_n - \mu) + \frac{N}{2} \log \lvert 2\pi \Sigma \rvert \right], \qquad \Sigma = W W^T + \tau^{-1} I_D.$$

This is equivalent to fitting a multivariate Gaussian distribution to the observations, with a constrained covariance matrix.

How: the EM algorithm.
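For concreteness, here is a compact EM recursion for PPCA in the spirit of Tipping and Bishop, written with $\sigma^2 = 1/\tau$. This is my own sketch of the standard updates, not code from the talk:

```python
import numpy as np

def ppca_em(Y, J, n_iter=200, seed=0):
    """EM for probabilistic PCA; returns W, mu and sigma2 = 1/tau."""
    rng = np.random.default_rng(seed)
    N, D = Y.shape
    mu = Y.mean(axis=0)              # ML estimate of mu is the sample mean
    Yc = Y - mu
    W = rng.normal(size=(D, J))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent x_n given W, sigma2.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(J))
        X = Yc @ W @ Minv                    # <x_n>, one row per observation
        Sxx = N * sigma2 * Minv + X.T @ X    # sum_n <x_n x_n^T>
        # M-step: update W, then the noise variance.
        W = (Yc.T @ X) @ np.linalg.inv(Sxx)
        sigma2 = (np.sum(Yc ** 2)
                  - 2 * np.sum((X @ W.T) * Yc)
                  + np.trace(Sxx @ W.T @ W)) / (N * D)
    return W, mu, sigma2
```

Up to an arbitrary rotation of the latent space, the columns of the returned $W$ span the same subspace as the leading eigenvectors found by classical PCA, consistently with the equivalence discussed on the next slides.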

Probabilistic PCA

Is it useful? (Does the probabilistic formulation add something to the deterministic one?)

Answer: no. Tipping and Bishop showed that the axes found by PPCA span the same subspace as those found by PCA (the standard equivalence between maximum likelihood under a Gaussian noise hypothesis and the LMS criterion).

Answer: yes! The probabilistic formulation makes it possible to introduce new, more realistic hypotheses.

Example: the LMS criterion (equivalent to the Gaussian hypothesis) does not correspond to natural goals!


PCA and outliers

With "easy" data:

[Figure: a 2-D data cloud $(y_1, y_2)$ and the first principal axis $x_1$ found by PCA.]

PCA and outliers

With a few outliers:

[Figure: the same cloud plus a few outlying points ("new data"); the first principal axis $x_1$ is pulled towards the outliers.]

PCA and outliers

Why is it the case?

[Figure: with the original axis, the new data points are not likely under the model (low posterior).]

[Figure: with the axis rotated towards the outliers, all the data become more likely, so maximum likelihood prefers the distorted axis.]

PCA robust to outliers

Increasing the probability of outliers with a Gaussian:

[Figure: widening the Gaussian covers the new data, but all data close to the $x_1$ axis become equally probable.]

PCA robust to outliers

Increasing the probability of outliers with a Student-$t$:

[Figure: for an identical outlier posterior, the Student-$t$ gives a better discrimination around the axis.]

Student-t distribution

[Figure, left: densities $P(x)$ of a Gaussian and of a Student-$t$; the Student-$t$ has heavier tails. Right: $-\log P(x)$; the Student-$t$ penalises distant points much less than the Gaussian's quadratic loss.]
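The right-hand plot can be checked numerically. A short scipy illustration of my own: the Gaussian penalises a distant point quadratically, while the Student-$t$ penalty grows far more slowly.

```python
import numpy as np
from scipy import stats

# Negative log-densities at increasingly distant points.
x = np.array([0.0, 2.0, 5.0])
for nu in (1.0, 3.0):
    print(f"-log St (nu={nu}):", -stats.t.logpdf(x, df=nu))
print("-log N           :", -stats.norm.logpdf(x))
# The Gaussian cost explodes at x = 5; the Student-t cost barely grows.
```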

Robust PPCA

Probabilistic PCA:
$$P(x) = \mathcal{N}(x \mid 0, I_J), \qquad P(y \mid x) = \mathcal{N}(y \mid W x + \mu, \tau^{-1} I_D).$$

Robust probabilistic PCA:
$$P(x) = \mathrm{St}(x \mid 0, I_J, \nu), \qquad P(y \mid x) = \mathrm{St}(y \mid W x + \mu, \tau^{-1} I_D, \nu).$$

The two degrees of freedom are taken identical ($\nu$) for simplicity!

Robust PPCA

Model:
$$P(x) = \mathrm{St}(x \mid 0, I_J, \nu), \qquad P(y \mid x) = \mathrm{St}(y \mid W x + \mu, \tau^{-1} I_D, \nu).$$

But the Student-$t$ is an infinite mixture of Gaussians:
$$\mathrm{St}(y \mid \mu, \Sigma, \nu) = \int_0^\infty \mathcal{N}\!\left(y \,\middle|\, \mu, \tfrac{1}{u}\Sigma\right) \mathrm{Ga}\!\left(u \,\middle|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\right) du, \qquad \nu > 0,$$
where $\mathrm{Ga}(u \mid \cdot,\cdot)$ is a Gamma distribution.

Therefore the generative model is:
$$P(u) = \mathrm{Ga}\!\left(u \,\middle|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\right), \qquad P(x \mid u) = \mathcal{N}\!\left(x \,\middle|\, 0, \tfrac{1}{u} I_J\right), \qquad P(y \mid x, u) = \mathcal{N}\!\left(y \,\middle|\, W x + \mu, \tfrac{1}{u\tau} I_D\right).$$
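The scale-mixture identity is easy to sanity-check in one dimension: draw $u \sim \mathrm{Ga}(\nu/2, \nu/2)$ and then $y \mid u \sim \mathcal{N}(0, 1/u)$, and compare the quantiles with those of a Student-$t$ with $\nu$ degrees of freedom. A sketch of my own (note that numpy's gamma sampler takes a scale, i.e. the inverse rate):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu, N = 3.0, 200_000

# u ~ Ga(nu/2, rate=nu/2); numpy uses shape/scale, so scale = 2/nu.
u = rng.gamma(shape=nu / 2, scale=2 / nu, size=N)
y = rng.normal(size=N) / np.sqrt(u)   # y | u ~ N(0, 1/u)

# The samples follow a Student-t with nu degrees of freedom.
print(np.quantile(y, [0.1, 0.5, 0.9]))
print(stats.t.ppf([0.1, 0.5, 0.9], df=nu))
```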

Robust PPCA

Good news: the posterior is tractable, and the marginal distribution is again a Student-$t$:
$$P(y) = \int_0^\infty \!\!\! \int_X P(y \mid x, u)\, P(x \mid u)\, P(u)\, dx\, du = \mathrm{St}(y \mid \mu, \Sigma, \nu),$$
where again $\Sigma = W W^T + \tau^{-1} I_D$.

Robust PPCA training

Finding the parameters $\theta = \{W, \mu, \tau, \nu\}$ in
$$P(u) = \mathrm{Ga}\!\left(u \,\middle|\, \tfrac{\nu}{2}, \tfrac{\nu}{2}\right), \qquad P(x \mid u) = \mathcal{N}\!\left(x \,\middle|\, 0, \tfrac{1}{u} I_J\right), \qquad P(y \mid x, u) = \mathcal{N}\!\left(y \,\middle|\, W x + \mu, \tfrac{1}{u\tau} I_D\right).$$

How? By finding the parameters that lead to the maximum likelihood of the observations $y_n$:
$$\theta_{ML} = \arg\min_\theta \left( -\sum_n \log P(y_n) \right).$$
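The mechanism behind the robustness can be made explicit. In the EM treatment of Student-$t$ models, the posterior expectation of the scale variable is $E[u \mid y] = (\nu + D) / (\nu + \delta)$, where $\delta$ is the squared Mahalanobis distance of $y$; it acts as a per-point weight in the parameter updates. A small sketch of this standard result (my own code, hypothetical parameter values):

```python
import numpy as np

def expected_u(y, mu, Sigma, nu):
    """E[u | y] for the Student-t scale mixture: (nu + D) / (nu + delta)."""
    D = len(mu)
    diff = y - mu
    delta = diff @ np.linalg.solve(Sigma, diff)  # squared Mahalanobis distance
    return (nu + D) / (nu + delta)

# Typical points get a weight around 1; outliers are strongly downweighted,
# hence their lower influence on the maximum likelihood fit.
mu, Sigma, nu = np.zeros(2), np.eye(2), 3.0
print(expected_u(np.array([1.0, 1.0]), mu, Sigma, nu))  # 1.0
print(expected_u(np.array([8.0, 8.0]), mu, Sigma, nu))  # ~0.04
```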

Robust PPCA advantages

1. Only one parameter to fix in advance: the dimension of the latent (projection) space.
2. A natural framework for an extension to mixtures.


Mixtures of (robust) PPCA

(P)PCA captures linear dependencies in the data only.
Mixtures of (P)PCA capture nonlinear dependencies, through mixtures of linear manifolds.
It is a latent variable model with no common projection space.

[Figure: a curved 2-D data cloud $(y_1, y_2)$ approximated by three local one-dimensional subspaces $x_{11}$, $x_{12}$, $x_{13}$.]

Mixtures of (robust) PPCA

Concept of (hidden) cluster membership:
$$P(y) = \sum_k \pi_k P_k(y),$$
where each $P_k(y)$ is a single PCA model and the mixture proportions satisfy $\pi_k > 0$, $\sum_k \pi_k = 1$.
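Each component density $P_k(y)$ is the Student-$t$ marginal $\mathrm{St}(y \mid \mu_k, \Sigma_k, \nu_k)$ of a robust PPCA model, so the mixture density is directly computable. A toy sketch with made-up parameters:

```python
import numpy as np
from scipy import stats

# Two robust components with illustrative (made-up) parameters.
pis = [0.6, 0.4]                      # mixture proportions, sum to 1
comps = [stats.multivariate_t(loc=[0, 0], shape=[[2.0, 0], [0, 0.2]], df=3),
         stats.multivariate_t(loc=[3, 3], shape=[[0.2, 0], [0, 2.0]], df=3)]

def mixture_pdf(y):
    # P(y) = sum_k pi_k P_k(y)
    return sum(pi * c.pdf(y) for pi, c in zip(pis, comps))

print(mixture_pdf(np.array([0.0, 0.0])))
print(mixture_pdf(np.array([3.0, 3.0])))
```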

Mixtures of (robust) PPCA

Let $z = [z_1, z_2, \dots, z_K]$ with $z_k = 1$ if $y_n$ was generated by component $k$, and $z_k = 0$ otherwise.

Latent variable model:
$$P(z) = \prod_k \pi_k^{z_k}, \qquad P(u \mid z) = \prod_k \mathrm{Ga}\!\left(u \,\middle|\, \tfrac{\nu_k}{2}, \tfrac{\nu_k}{2}\right)^{z_k},$$
$$P(x \mid u, z) = \prod_k \mathcal{N}\!\left(x \,\middle|\, 0, \tfrac{1}{u} I_{J_k}\right)^{z_k}, \qquad P(y \mid x, u, z) = \prod_k \mathcal{N}\!\left(y \,\middle|\, W_k x + \mu_k, \tfrac{1}{u \tau_k} I_D\right)^{z_k}.$$

Note: the components may have different latent dimensionalities $J_k$.

Mixtures of (robust) PPCA training

Finding the parameters $\theta = \{(W_k, \mu_k, \tau_k, \nu_k, \pi_k)\}_{k=1}^K$ in the latent variable model above.

How? By finding the parameters that lead to the maximum likelihood of the observations $y_n$:
$$\theta_{ML} = \arg\min_\theta \left( -\sum_n \log P(y_n) \right).$$

Mixtures of (robust) PPCA training

In practice: the EM algorithm.
– E-step: fix the model parameters, and compute expectations (over an approximate distribution) of the latent variables.
– M-step: update the model parameters to maximize the likelihood.

Two hyper-parameters (see the sketch below):
– the number of components
– the dimensionality of the latent representations
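As an illustration of one of the E-step expectations, the posterior responsibilities of the cluster labels $z$ follow from Bayes' rule applied to the Student-$t$ marginals. The sketch below (my own code, made-up parameters) computes only this expectation; the full E-step also needs the moments of $u_n$ and $x_n$:

```python
import numpy as np
from scipy import stats

def responsibilities(Y, pis, mus, Sigmas, nus):
    """Posterior probability that each y_n was generated by component k,
    using the Student-t marginals St(y | mu_k, Sigma_k, nu_k)."""
    R = np.column_stack([pi * stats.multivariate_t(mu, S, df=nu).pdf(Y)
                         for pi, mu, S, nu in zip(pis, mus, Sigmas, nus)])
    return R / R.sum(axis=1, keepdims=True)

# Toy check with two well-separated components.
Y = np.array([[0.0, 0.0], [5.0, 5.0]])
R = responsibilities(Y, [0.5, 0.5],
                     [np.zeros(2), 5 * np.ones(2)],
                     [np.eye(2), np.eye(2)], [3.0, 3.0])
print(np.round(R, 3))   # each point is assigned to its own component
```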


Experiments

PCA and robust PCA:

[Figure: a 2-D data set (axes $y_1$, $y_2$) with outliers, used to compare PCA and robust PCA.]

Experiments

[Figure: a 2-by-2 grid of plots (axes $y_1$, $y_2$) comparing the non-robust (left column) and robust (right column) variants, for a single PCA model (top row) and for mixtures (bottom row).]

Experiments

Three 3-D Gaussian clusters:
– diagonal covariance matrix diag(5, 1, 0.2) before rotation
– intrinsic dimensionality: 2

[Figure, left: the three clusters in the $(y_1, y_2, y_3)$ space. Right: $-\log P(y)$ as a function of the number of components $K$ and of the dimension of the latent space ($J = 1$, $J = 2$), for the robust model and for data without outliers.]

Experiments

True model ($K = 3$, $J = 2$), with outliers added.

[Figure, left: performance on the test set, $-\log P(y)$ as a function of the number of outliers. Right: the degree-of-freedom parameters $\nu_k$ as a function of the number of outliers; some components stay almost Gaussian (large $\nu_k$) while others take the outliers into account (small $\nu_k$).]

Experiments

USPS handwritten digit dataset:
– around 700 images of "2"
– around 700 images of "3"
– around 100 images of "0" (as outliers)

Experiment: two 1-D components (two clusters).
Illustrations: images close to the 1-D subspaces.

[Figure: images along the learned subspaces, for standard mixtures and for robust mixtures.]

Conclusions

Probabilistic formulation of PCA:
– identical to PCA
– but makes it possible to introduce other hypotheses

Robust PCA:
– replacing Gaussians by Student-t distributions
– makes outliers more likely → lower influence on the model

Mixtures of PCA:
– a latent variable model
– better than mixtures of Gaussians through regularization (dimensionality of the latent space)

Mixtures of robust PCA:
– replacing Gaussians by Student-t distributions