Sparsity Control for Robust Principal Component Analysis

Gonzalo Mateos and Georgios B. Giannakis

ECE Department, University of Minnesota

Acknowledgments: NSF grants no. CCF-1016605, EECS-1002180

Asilomar Conference, November 10, 2010

Principal Component Analysis

Our goal: robustify PCA by controlling outlier sparsity

Motivation: (statistical) learning from high-dimensional data

Principal component analysis (PCA) [Pearson’1901]: extraction of low-dimensional data structure; data compression and reconstruction

PCA is non-robust to outliers [Jolliffe’86]

[Figures: DNA microarray; traffic surveillance]

Our work in context

Robust PCA: robust covariance matrix estimators [Campbell’80], [Huber’81]; computer vision [Xu-Yuille’95], [De la Torre-Black’03]; low-rank matrix recovery from sparse errors [Wright et al’09]

Huber’s M-class and sparsity in linear regression [Fuchs’99]

Contemporary applications: anomaly detection in IP networks [Huang et al’07], [Kim et al’09]; video surveillance, e.g., [Oliver et al’99]

[Figure panels: Original | Robust PCA | 'Outliers']

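PCA formulations

Training data: $\{y_n\}_{n=1}^N$, $y_n \in \mathbb{R}^p$

Minimum reconstruction error: dimensionality reduction operator $s_n = U^T(y_n - m)$ and reconstruction operator $\hat{y}_n = m + U s_n$, with $U \in \mathbb{R}^{p \times q}$, $U^T U = I_q$, $q < p$:
$\min_{m,\,U} \sum_{n=1}^N \|y_n - m - U U^T (y_n - m)\|_2^2$

Maximum variance: $\max_{U:\,U^T U = I_q} \sum_{n=1}^N \|U^T (y_n - \hat{m})\|_2^2$

Factor analysis model: $y_n = m + U s_n + e_n$

Solution: $\hat{m} = \frac{1}{N}\sum_{n=1}^N y_n$; the columns of $\hat{U}$ are the $q$ dominant eigenvectors of the sample covariance $\hat{\Sigma}_y = \frac{1}{N}\sum_{n=1}^N (y_n - \hat{m})(y_n - \hat{m})^T$; $\hat{s}_n = \hat{U}^T (y_n - \hat{m})$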
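The following is a minimal NumPy sketch of the batch PCA solution above. It assumes the rows of an N x p array `Y` hold the training vectors $y_n$. The function names `pca` and `reconstruction_error` are illustrative and do not come from the original work. The sketch eigendecomposes the sample covariance and keeps the $q$ dominant eigenvectors.

```python
import numpy as np


def pca(Y, q):
    """Classical PCA of the rows of Y (N x p).

    Returns the sample mean m (p,), U (p x q) whose columns are the q dominant
    eigenvectors of the sample covariance, and S (N x q) with rows
    s_n = U^T (y_n - m)."""
    m = Y.mean(axis=0)
    Yc = Y - m                                   # centered data
    cov = Yc.T @ Yc / Y.shape[0]                 # sample covariance (p x p)
    _, eigvecs = np.linalg.eigh(cov)             # eigenvalues come in ascending order
    U = eigvecs[:, ::-1][:, :q]                  # keep the q dominant eigenvectors
    S = Yc @ U                                   # dimensionality reduction
    return m, U, S


def reconstruction_error(Y, m, U):
    """Sum over n of ||y_n - m - U U^T (y_n - m)||_2^2."""
    Yc = Y - m
    return float(np.sum((Yc - Yc @ U @ U.T) ** 2))


rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # a random 2-D subspace of R^5
Y = rng.standard_normal((50, 2)) @ U0.T + 3.0        # noise-free data on an affine 2-D subspace
m, U, S = pca(Y, 2)
print(reconstruction_error(Y, m, U))                 # on the order of floating-point round-off
```

Using `eigh` exploits the symmetry of the covariance. An SVD of the centered data gives the same subspace and is numerically preferable when $p$ is large.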

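Robustifying PCA

Least-trimmed squares (LTS) regression [Rousseeuw’87]

LTS-based PCA for robustness:

(LTS PCA) $\min_{m,\,U,\,\{s_n\}} \sum_{n=1}^{\nu} r_{[n]}^2(m, U, \{s_n\})$, with squared residuals $r_n^2 := \|y_n - m - U s_n\|_2^2$

$r_{[n]}^2$ is the $n$-th order statistic among $\{r_1^2, \ldots, r_N^2\}$, i.e., $r_{[1]}^2 \le r_{[2]}^2 \le \cdots \le r_{[N]}^2$

The trimming constant $\nu \le N$ determines the breakdown point: the $N - \nu$ largest residuals are left out of the cost.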
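The LTS PCA cost is easy to evaluate for given $(m, U, \{s_n\})$: sort the squared residuals and sum the $\nu$ smallest. The sketch below assumes the same row-wise array layout as before. The name `lts_pca_cost` is hypothetical.

```python
import numpy as np


def lts_pca_cost(Y, m, U, S, nu):
    """LTS PCA cost: sum of the nu smallest squared residuals
    r_n^2 = ||y_n - m - U s_n||_2^2 (rows of Y and S are indexed by n)."""
    r2 = np.sum((Y - m - S @ U.T) ** 2, axis=1)   # r_1^2, ..., r_N^2
    return float(np.sort(r2)[:nu].sum())          # r_[1]^2 + ... + r_[nu]^2


Y = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 5.0]])
U = np.array([[1.0], [0.0]])                      # 1-D subspace: the first axis
S = np.array([[1.0], [2.0], [0.0]])
print(lts_pca_cost(Y, np.zeros(2), U, S, nu=2))   # 0.0: the outlying third row is trimmed
print(lts_pca_cost(Y, np.zeros(2), U, S, nu=3))   # 25.0: no trimming
```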

Q: How should we go about minimizing the (LTS PCA) cost?

(LTS PCA) is nonconvex; do minimizer(s) even exist?

A: Try all $\binom{N}{\nu}$ subsets of size $\nu$, solve (PCA) on each, and pick the best.

Simple, but intractable beyond small problems.
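The exhaustive approach can be sketched as follows. For a fixed subset, minimizing over $(m, U, \{s_n\})$ is ordinary PCA on that subset. Taking the best subset therefore solves (LTS PCA) exactly, but at combinatorial cost. The helper names are illustrative.

```python
import itertools
import math

import numpy as np


def pca_cost(Yk, q):
    """Fit PCA to the rows of Yk; return (m, U, reconstruction error)."""
    m = Yk.mean(axis=0)
    Yc = Yk - m
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    U = Vt[:q].T                                  # q dominant right singular vectors
    return m, U, float(np.sum((Yc - Yc @ U @ U.T) ** 2))


def lts_pca_bruteforce(Y, q, nu):
    """Exhaustive LTS PCA: run PCA on every size-nu subset of rows, keep the best."""
    best_cost, best = math.inf, None
    for idx in itertools.combinations(range(Y.shape[0]), nu):
        m, U, cost = pca_cost(Y[list(idx)], q)
        if cost < best_cost:
            best_cost, best = cost, (idx, m, U)
    return best_cost, best


Y = np.array([[t, t] for t in range(8)] + [[0.0, 10.0]], dtype=float)
cost, (idx, m, U) = lts_pca_bruteforce(Y, q=1, nu=8)
print(idx)                     # the 8 collinear points; the outlier (row 8) is excluded
print(math.comb(100, 90))      # subsets to try for N = 100, nu = 90
```

With $N = 9$ there are only 9 subsets to try. For $N = 100$ and $\nu = 90$ there are already more than $10^{13}$.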

Modeling outliers

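Introduce auxiliary outlier variables $\{o_n\}_{n=1}^N$ such that $o_n = 0$ if $y_n$ is an inlier and $o_n \neq 0$ if $y_n$ is an outlier:
$y_n = m + U s_n + o_n + e_n$

Inliers obey the factor analysis model $y_n = m + U s_n + e_n$; outliers follow something else. Inlier noise: $\{e_n\}$ are zero-mean i.i.d. random vectors.

Remarks: $m$, $U$, $\{s_n\}$ and $\{o_n\}$ are all unknown. If outliers are sporadic, then the vector $[\|o_1\|_2, \ldots, \|o_N\|_2]^T$ is sparse!

Natural (but intractable) estimator:
$\min_{m,\,U,\,\{s_n\},\,\{o_n\}} \sum_{n=1}^N \|y_n - m - U s_n - o_n\|_2^2$ s.t. $\sum_{n=1}^N \mathbb{1}\{o_n \neq 0\} \le N - \nu$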
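To make the outlier model concrete, the sketch below draws synthetic data from $y_n = m + U s_n + o_n + e_n$. Several choices are assumptions made only for illustration: Gaussian $m$, $s_n$ and $e_n$, the noise level `sigma`, and the outlier magnitude `outlier_scale`. The function name is hypothetical.

```python
import numpy as np


def sample_outlier_model(N, p, q, outlier_rows, sigma=0.1, outlier_scale=10.0, seed=0):
    """Draw y_n = m + U s_n + o_n + e_n, n = 1..N (rows of the returned arrays).

    Assumed for illustration: Gaussian m, s_n and e_n; o_n = 0 except for the
    rows listed in outlier_rows, which get a large Gaussian vector."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((p, q)))     # p x q, orthonormal columns
    m = rng.standard_normal(p)
    S = rng.standard_normal((N, q))
    E = sigma * rng.standard_normal((N, p))              # zero-mean i.i.d. inlier noise
    O = np.zeros((N, p))
    O[outlier_rows] = outlier_scale * rng.standard_normal((len(outlier_rows), p))
    Y = m + S @ U.T + O + E
    return Y, m, U, S, O


Y, m, U, S, O = sample_outlier_model(N=100, p=20, q=3, outlier_rows=[4, 17, 58])
row_norms = np.linalg.norm(O, axis=1)                    # [||o_1||, ..., ||o_N||]
print(np.count_nonzero(row_norms))                       # sparse: only the outlier rows
```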

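LTS PCA as sparse regression

Lagrangian form:

(P0) $\min_{m,\,U,\,\{s_n\},\,\{o_n\}} \sum_{n=1}^N \|y_n - m - U s_n - o_n\|_2^2 + \lambda_0 \sum_{n=1}^N \mathbb{1}\{o_n \neq 0\}$

Tuning $\lambda_0 \ge 0$ controls the sparsity in $\{o_n\}$, and thus the number of outliers.

Proposition 1: If $\{\hat{m}, \hat{U}, \{\hat{s}_n\}, \{\hat{o}_n\}\}$ solves (P0) with $\lambda_0$ chosen such that $\sum_{n=1}^N \mathbb{1}\{\hat{o}_n \neq 0\} = N - \nu$, then $\{\hat{m}, \hat{U}, \{\hat{s}_n\}\}$ solves (LTS PCA) too.

This justifies the model and its estimator (P0), and ties sparsity with robustness.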
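For fixed $m$, $U$ and $\{s_n\}$, (P0) decouples across $n$. Write $r_n := y_n - m - U s_n$. Each subproblem $\min_{o_n} \|r_n - o_n\|_2^2 + \lambda_0 \mathbb{1}\{o_n \neq 0\}$ is then solved by hard thresholding: $\hat{o}_n = r_n$ if $\|r_n\|_2^2 > \lambda_0$, and $\hat{o}_n = 0$ otherwise.

The resulting cost $\sum_n \min(\|r_n\|_2^2, \lambda_0)$ caps every large residual at $\lambda_0$, which is one way to see the link with trimming. The sketch below uses hypothetical helper names and shows how a smaller $\lambda_0$ flags more rows as outliers.

```python
import numpy as np


def l0_outlier_update(R, lam0):
    """Row-wise minimizer of ||r_n - o_n||_2^2 + lam0 * 1{o_n != 0}:
    keep o_n = r_n when ||r_n||^2 > lam0 (flag as outlier), else o_n = 0."""
    r2 = np.sum(R ** 2, axis=1)
    return np.where((r2 > lam0)[:, None], R, 0.0)


def p0_cost_given_residuals(R, lam0):
    """Value of the (P0) cost after minimizing over {o_n}: sum_n min(r_n^2, lam0)."""
    return float(np.minimum(np.sum(R ** 2, axis=1), lam0).sum())


R = np.array([[3.0, 4.0], [0.1, 0.0], [0.0, 1.0]])   # squared residual norms: 25, 0.01, 1
for lam0 in (30.0, 2.0, 0.5, 0.001):
    n_out = int(np.count_nonzero(np.sum(l0_outlier_update(R, lam0) ** 2, axis=1)))
    print(lam0, n_out)                                # 0, 1, 2 and 3 outliers, respectively
```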

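Just relax!

(P0) is NP-hard; relax $\sum_n \mathbb{1}\{o_n \neq 0\}$ to $\sum_n \|o_n\|_2$:

(P2) $\min_{m,\,U,\,\{s_n\},\,\{o_n\}} \sum_{n=1}^N \|y_n - m - U s_n - o_n\|_2^2 + \lambda_2 \sum_{n=1}^N \|o_n\|_2$

Q: Does (P2) yield robust estimates $\hat{U}$?

A: Yep! The Huber estimator is a special case.

The role of the sparsity-controlling parameter $\lambda_2$ is central.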
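One way to see the Huber connection is to minimize (P2) over a single $o_n$ with the other variables fixed. With residual $r := y_n - m - U s_n$, the problem $\min_{o} \|r - o\|_2^2 + \lambda_2 \|o\|_2$ is solved by group soft-thresholding, $\hat{o} = (1 - \lambda_2 / (2\|r\|_2))_+ \, r$. Its minimum value is a Huber function of $\|r\|_2$ with threshold $\lambda_2/2$: $\rho(x) = x^2$ for $x \le \lambda_2/2$, and $\rho(x) = \lambda_2 x - \lambda_2^2/4$ otherwise. The NumPy sketch below checks this numerically. The names are illustrative.

```python
import numpy as np


def group_soft_threshold(r, lam):
    """Minimizer over o of ||r - o||_2^2 + lam * ||o||_2 (row-wise (P2) o-update)."""
    nrm = np.linalg.norm(r)
    if nrm <= lam / 2.0:
        return np.zeros_like(r)
    return (1.0 - lam / (2.0 * nrm)) * r


def huber(x, lam):
    """Huber function with threshold lam / 2, evaluated at x = ||r||_2 >= 0."""
    return x ** 2 if x <= lam / 2.0 else lam * x - lam ** 2 / 4.0


def p2_term(r, o, lam):
    """Per-datum (P2) cost for a fixed residual r := y_n - m - U s_n."""
    return float(np.sum((r - o) ** 2) + lam * np.linalg.norm(o))


for r in (np.array([0.3, 0.4]), np.array([3.0, 4.0])):
    o = group_soft_threshold(r, 2.0)
    print(p2_term(r, o, 2.0), huber(np.linalg.norm(r), 2.0))  # agree up to rounding
```

Small residuals are penalized quadratically and large ones only linearly. That linear growth is what limits the influence of outliers.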

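Entrywise outliers: use $\ell_1$-norm regularization

(P1) $\min_{m,\,U,\,\{s_n\},\,\{o_n\}} \sum_{n=1}^N \|y_n - m - U s_n - o_n\|_2^2 + \lambda_1 \sum_{n=1}^N \|o_n\|_1$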

[Figure panels: Original, with outlier pixels | Robust PCA (P2): entire image rejected | Robust PCA (P1): outlier pixels rejected]
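The contrast in the figure comes from the two $o_n$-updates. (P1) shrinks each entry separately, so only grossly corrupted pixels survive. (P2) shrinks the whole row, so a datum is either kept entirely or flagged entirely. The sketch below uses hypothetical names and a single made-up "image" with one corrupted pixel.

```python
import numpy as np


def soft_threshold(R, lam):
    """(P1) o-update: entrywise minimizer of (r - o)^2 + lam * |o|."""
    return np.sign(R) * np.maximum(np.abs(R) - lam / 2.0, 0.0)


def row_soft_threshold(R, lam):
    """(P2) o-update: row-wise minimizer of ||r_n - o_n||_2^2 + lam * ||o_n||_2."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / (2.0 * np.maximum(norms, 1e-12)), 0.0)
    return scale * R


# One "image" (row of residuals) whose third pixel is grossly corrupted.
R = np.array([[0.1, 0.1, 5.0, 0.1]])
print(np.count_nonzero(soft_threshold(R, 1.0)))      # 1: only the corrupted pixel
print(np.count_nonzero(row_soft_threshold(R, 1.0)))  # 4: the whole row is flagged
```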

Alternating minimization (P1)

$U$-update: reduced-rank Procrustes rotation; $\{o_n\}$-update: coordinatewise soft-thresholding

Proposition 2: Alg. 1’s iterates converge to a stationary point of (P1).
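The sketch below is a block-coordinate descent on (P1) in the spirit of Alg. 1, but it is not the authors' algorithm. Instead of the reduced-rank Procrustes step, it updates $(m, U, \{s_n\})$ jointly for fixed $\{o_n\}$ by a truncated SVD of the outlier-compensated data, which is an exact minimization for that block. The $\{o_n\}$-update is entrywise soft-thresholding with threshold $\lambda_1/2$; the factor $1/2$ arises because the squared loss is not scaled by $1/2$. Function names are hypothetical.

```python
import numpy as np


def p1_cost(Y, m, U, S, O, lam1):
    """(P1) cost: sum_n ||y_n - m - U s_n - o_n||^2 + lam1 * sum_n ||o_n||_1."""
    return float(np.sum((Y - m - S @ U.T - O) ** 2) + lam1 * np.abs(O).sum())


def robust_pca_l1(Y, q, lam1, n_iter=100):
    """Block-coordinate descent on the (P1) cost.

    (m, U, S)-step: exact minimizer for fixed O, i.e. PCA (truncated SVD) of Y - O.
    O-step: entrywise soft-thresholding of the residuals with threshold lam1 / 2.
    Neither step can increase the cost."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        X = Y - O                                  # outlier-compensated data
        m = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - m, full_matrices=False)
        U = Vt[:q].T                               # p x q, orthonormal columns
        S = (X - m) @ U                            # N x q
        R = Y - m - S @ U.T                        # residuals
        O = np.sign(R) * np.maximum(np.abs(R) - lam1 / 2.0, 0.0)
    return m, U, S, O
```

Because each block update is an exact minimization, the cost sequence is nonincreasing. If $\lambda_1/2$ exceeds every residual magnitude, $\{o_n\}$ stays at zero and the procedure reduces to plain PCA.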

Refinements

Nonconvex penalty terms approximate the count $\sum_n \mathbb{1}\{o_n \neq 0\}$ in (P0) better.

Options: SCAD [Fan-Li’01], or sum-of-logs [Candes et al’08]

Iterative linearization-minimization of the nonconvex penalty around the current outlier estimates yields an iteratively reweighted version of Alg. 1. Warm start: solution of (P1) or (P2). Bias reduction (cf. weighted Lasso [Zou’06]).
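As a sketch, assume an entrywise sum-of-logs penalty $\lambda \sum \log(|o| + \delta)$. Its linearization around the previous estimate has slope $1/(|o^{(k-1)}| + \delta)$. Each reweighted step is therefore a weighted soft-thresholding with those weights. The value of `delta`, the entrywise (rather than row-wise) form, and the helper names are assumptions for illustration.

```python
import numpy as np


def log_penalty_weights(O_prev, delta=1e-3):
    """Entrywise weights 1 / (|o| + delta): slope of log(|o| + delta) at O_prev."""
    return 1.0 / (np.abs(O_prev) + delta)


def weighted_soft_threshold(R, lam, W):
    """Entrywise minimizer of (r - o)^2 + lam * w * |o| (a weighted-Lasso step)."""
    return np.sign(R) * np.maximum(np.abs(R) - lam * W / 2.0, 0.0)


R = np.array([[0.5, 10.0]])
O_prev = np.array([[0.0, 10.0]])        # e.g., a (P1) solution used as warm start
plain = weighted_soft_threshold(R, 1.0, np.ones_like(R))
reweighted = weighted_soft_threshold(R, 1.0, log_penalty_weights(O_prev))
print(plain)        # large entry shrunk by 0.5
print(reweighted)   # large entry shrunk by only about 0.05; small entry stays at zero
```

Entries already estimated as large outliers receive small weights and are barely shrunk, which is the bias-reduction effect. Entries estimated as zero receive large weights and stay at zero.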

Discard the outliers identified in $\{\hat{o}_n\}$ and re-estimate: a missing data problem.
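One simple way to handle the resulting missing data problem is iterative imputation. This is a swapped-in technique, not necessarily the authors' choice. Entries flagged as outliers (for instance, a boolean mask `missing = O_hat != 0`, with `O_hat` a hypothetical (P1) estimate) are treated as unobserved. They are repeatedly filled with the current rank-$q$ reconstruction before PCA is re-fit. The function name is hypothetical.

```python
import numpy as np


def refit_missing(Y, missing, q, n_iter=100):
    """Rank-q PCA re-fit of Y treating entries with missing == True as unobserved.

    Iterative imputation: unobserved entries are repeatedly replaced by the
    current mean + rank-q reconstruction, and PCA is re-fit on the completed data.
    Returns (m, U, X), where X is the completed data matrix."""
    col_means = np.nanmean(np.where(missing, np.nan, Y), axis=0)
    X = np.where(missing, col_means, Y)            # initial fill: observed column means
    for _ in range(n_iter):
        m = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - m, full_matrices=False)
        U = Vt[:q].T                               # p x q, orthonormal columns
        L = m + (X - m) @ U @ U.T                  # mean + rank-q reconstruction
        X = np.where(missing, L, Y)                # observed entries never change
    return m, U, X
```

Observed entries are never modified. With no missing entries, the procedure reduces to ordinary PCA of `Y`.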

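Online robust PCA

Motivation: real-time data and memory limitations

Exponentially-weighted robust PCA: at time $t$,
$\min_{m,\,U,\,\{s_n\},\,\{o_n\}} \sum_{n=1}^{t} \beta^{t-n} \left[\|y_n - m - U s_n - o_n\|_2^2 + \lambda_2 \|o_n\|_2\right]$, with forgetting factor $0 < \beta \le 1$

Approximation [Yang’95]: at time $t$, do not re-estimate the past outlier vectors $\{o_n\}_{n=1}^{t-1}$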
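A hypothetical sketch of one online step follows. It assumes zero-mean (pre-centered) data and an initial basis `U0`, for instance from batch PCA on a short training block. For each new datum, $s_t$ and $o_t$ are first estimated with $U$ fixed. $U$ is then updated with projection-approximation (PAST-style) recursive least-squares recursions [Yang’95], driven by the outlier-compensated datum $y_t - o_t$. The class name, `inner_iters`, and the exact recursions are assumptions of this sketch, not a transcription of the authors' online algorithm.

```python
import numpy as np


class OnlineRobustSubspace:
    """Exponentially weighted robust subspace tracking (sketch, zero-mean data).

    Per datum y_t: (1) with U fixed, alternate s_t = U^T (y_t - o_t) and
    row-wise soft-thresholding of the residual to get o_t; (2) update U by a
    PAST-style RLS step on y_t - o_t. Past s_n and o_n are never re-estimated."""

    def __init__(self, U0, lam, beta=0.99, inner_iters=5):
        self.U = np.array(U0, dtype=float)      # p x q subspace estimate
        self.P = np.eye(self.U.shape[1])        # RLS inverse-correlation matrix (q x q)
        self.lam, self.beta, self.inner_iters = lam, beta, inner_iters

    def update(self, y):
        o = np.zeros_like(y, dtype=float)
        for _ in range(self.inner_iters):
            s = self.U.T @ (y - o)              # projection coefficients
            r = y - self.U @ s                  # residual
            nrm = np.linalg.norm(r)
            scale = max(0.0, 1.0 - self.lam / (2.0 * nrm)) if nrm > 0 else 0.0
            o = scale * r                       # group soft-thresholding
        x = y - o                               # outlier-compensated datum
        s = self.U.T @ x
        h = self.P @ s
        g = h / (self.beta + s @ h)             # RLS gain vector
        self.P = (self.P - np.outer(g, h)) / self.beta
        e = x - self.U @ s                      # projection-approximation error
        self.U = self.U + np.outer(e, g)
        return s, o


tracker = OnlineRobustSubspace(np.eye(4)[:, :2], lam=1.0)
_, o1 = tracker.update(np.array([1.0, -2.0, 0.0, 0.0]))    # in-subspace datum: o_t = 0
_, o2 = tracker.update(np.array([0.5, 0.5, 100.0, 0.0]))   # gross outlier: ||o_t|| = 99.5
```

Only $U$, the $q \times q$ matrix $P$ and the current datum are stored, which matches the memory motivation. The cost per datum does not grow with $t$.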

Video surveillance

[Figure panels: Original | PCA | Robust PCA | 'Outliers']

Data: http://www.cs.cmu.edu/~ftorre/

Online PCA in action

[Figure: angle between C(n) and C]

Inliers:

Outliers:

Figure of merit: angle between the estimated subspace C(n) and the true subspace C
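A common way to compute such an angle is the largest principal angle between two column spans. It equals the arccosine of the smallest singular value of $Q_A^T Q_B$, where $Q_A$ and $Q_B$ are orthonormal bases of the two subspaces. The sketch assumes both subspaces have the same dimension. The function name is hypothetical.

```python
import numpy as np


def subspace_angle_deg(A, B):
    """Largest principal angle (degrees) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)                     # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)                     # orthonormal basis of span(B)
    sv = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.degrees(np.arccos(np.clip(sv.min(), -1.0, 1.0))))


A = np.array([[1.0], [0.0], [0.0]])
print(subspace_angle_deg(A, np.array([[0.0], [1.0], [0.0]])))  # 90.0: orthogonal lines
print(subspace_angle_deg(A, np.array([[1.0], [1.0], [0.0]])))  # about 45 degrees
```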

Concluding summary

Sparsity control for robust PCA

LTS PCA as $\ell_0$-(pseudo)norm regularized regression (NP-hard); relaxation yields (group-)Lassoed PCA, an M-type estimator; the sparsity-controlling role of $\lambda$ is central

Tests on real video surveillance data for anomaly extraction

Batch and online robust PCA algorithms for i) outlier identification and ii) robust subspace tracking; refinements via nonconvex penalty terms

Ongoing research: preference measurement (conjoint analysis and collaborative filtering); robustifying kernel PCA and blind dictionary learning
