Sparsity Control for Robustness and Social Data Analysis


Transcript of Sparsity Control for Robustness and Social Data Analysis

Page 1: Sparsity Control for Robustness and Social Data Analysis


Sparsity Control for Robustness and Social Data Analysis

Gonzalo Mateos, ECE Department, University of Minnesota

Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI (AFOSR FA9550-10-1-0567) grant. Minneapolis, MN

December 9, 2011

Page 2: Sparsity Control for Robustness and Social Data Analysis


Learning from “Big Data”

“Data are widely available, what is scarce is the ability to extract wisdom from them”

Hal Varian, Google’s chief economist

Data deluge attributes: big, fast, productive, revealing, ubiquitous, smart, messy [K. Cukier, ``Harnessing the data deluge,'' Nov. 2011]

Page 3: Sparsity Control for Robustness and Social Data Analysis


Social-Computational Systems

Complex systems of people and computers

The vision: preference measurement (PM), analysis, and management to understand and engineer SoCS

The means: leverage the dual role of sparsity, for complexity control through variable selection and for robustness to outliers

Page 4: Sparsity Control for Robustness and Social Data Analysis


Conjoint analysis: widely used in marketing, healthcare, and psychology [Green-Srinivasan’78]

Goal: learn the consumer’s utility function from preference data; with linear utilities, ``how much is each part worth?''

Strategy: describe products by a set of attributes, the ``parts''; enables optimal design and positioning of new products

Success story [Wind et al’89], with attributes: room size, TV options, restaurant, transportation

Page 5: Sparsity Control for Robustness and Social Data Analysis


Modeling preliminaries

Respondents (e.g., consumers) rate profiles, each comprising several attributes

Linear utility: estimate the vector of partworths

Conjoint data collection formats:
(M1) Metric ratings
(M2) Choice-based conjoint data

Online SoCS-based preference data grow exponentially; inconsistent, corrupted, or irrelevant data appear as outliers
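To make the (M1) metric-ratings setup concrete, here is a small synthetic-data sketch in Python; the sizes, noise level, and 10% contamination rate are illustrative choices, not figures from the talk. Ratings follow a linear partworth model and a few responses are corrupted.

```python
# Illustrative sketch (not from the talk): simulate metric conjoint ratings
# under a linear partworth model, with a sparse set of outlying responses.
import numpy as np

rng = np.random.default_rng(1)
n_resp, n_attr = 200, 10                    # respondents x attributes (hypothetical sizes)
X = rng.standard_normal((n_resp, n_attr))   # profile attribute matrix
w_true = rng.standard_normal(n_attr)        # true partworth vector
y = X @ w_true + 0.1 * rng.standard_normal(n_resp)   # (M1) metric ratings

outliers = rng.choice(n_resp, size=int(0.1 * n_resp), replace=False)
y[outliers] += rng.normal(0.0, 5.0, size=outliers.size)  # corrupt 10% of the ratings
```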

Page 6: Sparsity Control for Robustness and Social Data Analysis

[Figure: residuals discarded by the trimming]

Robustifying PM: least-trimmed squares [Rousseeuw’87]

(LTS)   $\hat{\mathbf{w}}_{\mathrm{LTS}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{s} r_{[i]}^2(\mathbf{w})$

where $r_{[i]}^2(\mathbf{w})$ is the $i$-th order statistic among the squared residuals $r_1^2(\mathbf{w}), \ldots, r_N^2(\mathbf{w})$, and $s \leq N$ is the trimming constant

Q: How should we go about minimizing the nonconvex (LTS) cost?
A: Try all subsets of size $s$, solve the LS problem for each, and pick the best

Simple but intractable beyond small problems; near-optimal solvers exist [Rousseeuw’06], as does RANSAC [Fischler-Bolles’81]

G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec. 2011 (submitted).
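As a concrete illustration of the (LTS) criterion and of why exhaustive search is avoided, here is a minimal Python sketch: it evaluates the trimmed cost and runs a crude concentration-step heuristic in the spirit of FAST-LTS. The refitting scheme and names are my simplifications, not the talk's algorithm.

```python
# A minimal sketch of the least-trimmed squares (LTS) idea for a linear model
# y ~ X @ w. lts_objective evaluates the trimmed cost; lts_concentration is a
# crude heuristic, not a near-optimal solver.
import numpy as np

def lts_objective(w, X, y, s):
    """Sum of the s smallest squared residuals (the LTS criterion)."""
    r2 = (y - X @ w) ** 2
    return np.sort(r2)[:s].sum()

def lts_concentration(X, y, s, n_iter=20, seed=0):
    """Concentration-step heuristic: repeatedly refit on the s best-fitting points."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    idx = rng.choice(n, size=p, replace=False)        # random elemental start
    w = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    for _ in range(n_iter):
        keep = np.argsort((y - X @ w) ** 2)[:s]       # s smallest residuals
        w = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return w
```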

Page 7: Sparsity Control for Robustness and Social Data Analysis


Modeling outliers: introduce outlier variables $o_n$ such that $o_n \neq 0$ if datum $n$ is an outlier, and $o_n = 0$ otherwise

Nominal ratings obey (M1); outliers obey something else, e.g., $\epsilon$-contamination [Fuchs’99] or a Bayesian model [Jin-Rao’10]

Outlier-aware ratings model: $y_n = \mathbf{x}_n^{\top}\mathbf{w} + o_n + \varepsilon_n$

Both $\mathbf{w}$ and the outlier vector $\mathbf{o} := [o_1,\ldots,o_N]^{\top}$ are unknown; $\mathbf{o}$ is typically sparse!

Natural (but intractable) nonconvex estimator: jointly fit $\mathbf{w}$ and $\mathbf{o}$ by least squares while constraining the number of nonzero entries of $\mathbf{o}$

Page 8: Sparsity Control for Robustness and Social Data Analysis


LTS as sparse regression: Lagrangian form

(P0)   $\min_{\mathbf{w},\mathbf{o}} \ \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_0 \|\mathbf{o}\|_0$

Tuning parameter $\lambda_0$ controls the sparsity in $\mathbf{o}$, i.e., the number of outliers

Formally justifies the outlier-aware preference model and its estimator (P0); ties sparse regression with robust estimation

Proposition 1: If $\{\hat{\mathbf{w}}, \hat{\mathbf{o}}\}$ solves (P0) with $\lambda_0$ chosen such that $\|\hat{\mathbf{o}}\|_0 = N - s$, then $\hat{\mathbf{w}} = \hat{\mathbf{w}}_{\mathrm{LTS}}$ in (LTS).

Page 9: Sparsity Control for Robustness and Social Data Analysis


Just relax! (P0) is NP-hard, so relax the $\ell_0$-norm to the $\ell_1$-norm, e.g., [Tropp’06]

(P1)   $\min_{\mathbf{w},\mathbf{o}} \ \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_1 \|\mathbf{o}\|_1$

(P1) is convex, and thus efficiently solved; the role of the sparsity-controlling parameter $\lambda_1$ is central

Q: Does (P1) yield robust estimates $\hat{\mathbf{w}}$?
A: Yes! Huber’s estimator is a special case: minimizing (P1) over $\mathbf{o}$ first shows that $\hat{\mathbf{w}}$ minimizes $\sum_n \rho(y_n - \mathbf{x}_n^{\top}\mathbf{w})$, where $\rho$ is the Huber loss (quadratic for residuals of magnitude up to $\lambda_1/2$, linear beyond)
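Since (P1) is jointly convex in the partworths and the outlier vector, one simple way to solve it is block coordinate descent: a least-squares fit on outlier-compensated data alternating with soft-thresholding of the residuals. A minimal sketch, assuming the objective $\|\mathbf{y}-\mathbf{X}\mathbf{w}-\mathbf{o}\|_2^2 + \lambda_1\|\mathbf{o}\|_1$ and a user-chosen lam:

```python
# A minimal sketch of solving the convex relaxation (P1) by alternating
# minimization: least squares on outlier-compensated data, then a sparse
# outlier update via soft-thresholding. lam is assumed user-chosen.
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_lm(X, y, lam, n_iter=100):
    """Minimize ||y - Xw - o||_2^2 + lam*||o||_1 over (w, o)."""
    o = np.zeros(len(y))
    for _ in range(n_iter):
        w = np.linalg.lstsq(X, y - o, rcond=None)[0]  # LS on cleaned data
        o = soft(y - X @ w, lam / 2.0)                # sparse outlier update
    return w, o
```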

Page 10: Sparsity Control for Robustness and Social Data Analysis


Lassoing outliers: it suffices to solve a Lasso problem [Tibshirani’94]

Proposition 2: With $\mathbf{H} := \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$, the minimizers of (P1) are $\hat{\mathbf{o}} = \arg\min_{\mathbf{o}} \|(\mathbf{I}-\mathbf{H})(\mathbf{y}-\mathbf{o})\|_2^2 + \lambda_1\|\mathbf{o}\|_1$ (a Lasso problem) and $\hat{\mathbf{w}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}(\mathbf{y}-\hat{\mathbf{o}})$

Data-driven methods are available to select $\lambda_1$; Lasso solvers return the entire robustification path (RP)

[Figure: robustification path, outlier coefficients versus decreasing $\lambda_1$]
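A sketch of how this can be exercised with an off-the-shelf solver: after profiling out the partworths, the outlier vector solves a Lasso with design matrix $\mathbf{I}-\mathbf{H}$, so scikit-learn's lasso_path traces the whole robustification path. Note scikit-learn scales its alpha by the number of samples, so it differs from $\lambda_1$ by a constant factor.

```python
# A sketch of tracing the robustification path with scikit-learn's lasso_path,
# assuming X has full column rank. O[:, k] holds the outlier estimates at the
# k-th regularization value.
import numpy as np
from sklearn.linear_model import lasso_path

def robustification_path(X, y, n_lams=50):
    n = len(y)
    H = X @ np.linalg.pinv(X)                 # hat (projection) matrix
    A = np.eye(n) - H
    lams, O, _ = lasso_path(A, A @ y, n_alphas=n_lams)
    return lams, O
```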

Page 11: Sparsity Control for Robustness and Social Data Analysis


Nonconvex regularization: nonconvex penalty terms approximate the $\ell_0$-norm in (P0) better than the $\ell_1$-norm

Options: SCAD [Fan-Li’01], or sum-of-logs [Candes et al’08]

Iterative linearization-minimization of the concave penalty around the current iterate yields a sequence of weighted $\ell_1$ (Lasso) problems, with weights inversely proportional to the current outlier magnitudes

Initialize with the (P1) solution; bias reduction results (cf. adaptive Lasso [Zou’06])
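A sketch of the iterative linearization-minimization step for a sum-of-logs outlier penalty, implemented as reweighted $\ell_1$ minimization; the constants delta and lam and the inner solver are illustrative choices, not the talk's settings.

```python
# Reweighted-l1 sketch for a sum-of-logs outlier penalty: alternate a weighted
# (P1) solve with a relinearization of the concave penalty.
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def reweighted_outliers(X, y, lam, delta=1e-2, n_rounds=5, n_inner=100):
    n = X.shape[0]
    o = np.zeros(n)
    weights = np.ones(n)
    for _ in range(n_rounds):
        for _ in range(n_inner):               # weighted (P1) for fixed weights
            w = np.linalg.lstsq(X, y - o, rcond=None)[0]
            o = soft(y - X @ w, lam * weights / 2.0)
        weights = 1.0 / (np.abs(o) + delta)    # relinearize the log penalty
    return w, o
```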

Page 12: Sparsity Control for Robustness and Social Data Analysis


Comparison with RANSAC on synthetic data: i.i.d. errors, with nominal data and outliers drawn from different distributions
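The exact simulation parameters are not reproduced here; the following illustrative setup (dimensions, noise, and contamination are my choices) shows how such a comparison can be run against scikit-learn's RANSACRegressor, with the (P1) solver sketched earlier as the sparsity-controlled competitor.

```python
# Illustrative RANSAC comparison on contaminated synthetic data (setup is mine,
# not the talk's simulation).
import numpy as np
from sklearn.linear_model import RANSACRegressor, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true + 0.1 * rng.standard_normal(n)      # nominal: i.i.d. Gaussian noise
bad = rng.choice(n, size=40, replace=False)
y[bad] += rng.normal(0.0, 10.0, size=bad.size)     # outliers: large perturbations

ransac = RANSACRegressor(LinearRegression(), random_state=0).fit(X, y)
print("RANSAC error:", np.linalg.norm(ransac.estimator_.coef_ - w_true))
# w_p1, _ = robust_lm(X, y, lam=1.0)   # (P1) solver from the earlier sketch
```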

Page 13: Sparsity Control for Robustness and Social Data Analysis


Nonparametric regression

Interactions among attributes? Not captured by the linear utility model; driven by complex mechanisms that are hard to model

If one trusts data more than any parametric model, go nonparametric: the utility function lives in a space of ``smooth'' functions

Ill-posed problem; workaround: regularization [Tikhonov’77], [Wahba’90], i.e., an RKHS with a reproducing kernel and its associated norm
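A minimal sketch of the sparsity-controlled idea transplanted to nonparametric regression, assuming a kernel ridge (RKHS) smoother and illustrative tuning constants; this is a simplification, not the paper's exact estimator.

```python
# Robust nonparametric regression sketch: alternate a kernel ridge fit on
# outlier-compensated data with soft-thresholding of the residuals.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_kernel_regression(X, y, lam_outlier=1.0, ridge_alpha=1.0,
                             gamma=0.5, n_iter=30):
    o = np.zeros(len(y))
    for _ in range(n_iter):
        model = KernelRidge(alpha=ridge_alpha, kernel="rbf", gamma=gamma)
        model.fit(X, y - o)                                  # smooth fit to cleaned data
        o = soft(y - model.predict(X), lam_outlier / 2.0)    # flag outliers
    return model, o
```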

Page 14: Sparsity Control for Robustness and Social Data Analysis


Function approximation example

[Figure panels: true function, nonrobust predictions, robust predictions, refined predictions]

Effectiveness in rejecting outliers is apparent

G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans. Signal Process., 2012.

Page 15: Sparsity Control for Robustness and Social Data Analysis


Load curve data cleansing

Load curve: electric power consumption recorded periodically

Reliable data: key to realize smart grid vision [Hauser’09]

Uruguay’s power consumption (MW)

Faulty meters, communication errors Unscheduled maintenance, strikes, sport events

B-splines for load curve prediction and denoising [Chen et al ’10]
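A rough load-curve cleansing sketch along these lines, using SciPy's smoothing splines in place of the paper's B-spline estimator; the tuning constants and the synthetic example are illustrative.

```python
# Robust load-curve cleansing sketch: iterate a smoothing-spline fit on
# outlier-compensated readings and soft-threshold the residuals.
import numpy as np
from scipy.interpolate import UnivariateSpline

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_spline(t, y, lam=2.0, smooth=None, n_iter=20):
    o = np.zeros(len(y))
    for _ in range(n_iter):
        spl = UnivariateSpline(t, y - o, s=smooth)   # cubic smoothing spline
        o = soft(y - spl(t), lam / 2.0)              # nonzero o_i flags a bad sample
    return spl, o

# Example on synthetic hourly readings with a few corrupted hours
t = np.arange(96)                                    # four days of hourly samples
y = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(0).normal(0, 1, 96)
y[[10, 40, 70]] += 40                                # faulty-meter spikes
spl, o = robust_spline(t, y, lam=6.0, smooth=200.0)
```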

Page 16: Sparsity Control for Robustness and Social Data Analysis


NorthWrite data

Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky

Outliers: “building operational transition shoulder periods”; no manual labeling of outliers [Chen et al’10]

Energy consumption of a government building (’05-’10) Robust smoothing spline estimator, hours

Page 17: Sparsity Control for Robustness and Social Data Analysis


Principal Component Analysis

Our goal: robustify PCA by controlling outlier sparsity

Motivation: (statistical) learning from high-dimensional data

Principal component analysis (PCA) [Pearson’1901]
Extraction of low-dimensional data structure
Data compression and reconstruction
PCA is non-robust to outliers [Jolliffe’86]

Examples: DNA microarray data, traffic surveillance

Page 18: Sparsity Control for Robustness and Social Data Analysis


Our work in context

Robust PCA:
Robust covariance matrix estimators [Campbell’80], [Huber’81]
Computer vision [Xu-Yuille’95], [De la Torre-Black’03]
Low-rank matrix recovery from sparse errors, e.g., [Wright et al’09]

Contemporary applications tied to SoCS:
Anomaly detection in IP networks [Huang et al’07], [Kim et al’09]
Video surveillance, e.g., [Oliver et al’99]
Matrix completion for collaborative filtering, e.g., [Candes et al’09]

Page 19: Sparsity Control for Robustness and Social Data Analysis


PCA formulations

Training data: a set of high-dimensional data vectors

Minimum reconstruction error: find the compression and reconstruction operators minimizing the reconstruction error
Maximum variance: find the directions capturing maximum variance
Component analysis model: data modeled as low-dimensional latent components mapped through a tall factor matrix, plus noise

Solution: the dominant eigenvectors of the sample data covariance (equivalently, the SVD of the data matrix)
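For reference, a minimal (non-robust) PCA sketch via the SVD, assuming the data vectors are stored as the rows of Y.

```python
# Plain PCA via the SVD of the centered data matrix.
import numpy as np

def pca(Y, q):
    """Return the q principal directions, the projected data, and the mean."""
    m = Y.mean(axis=0)                     # sample mean
    U, s, Vt = np.linalg.svd(Y - m, full_matrices=False)
    components = Vt[:q]                    # top-q right singular vectors
    scores = (Y - m) @ components.T        # low-dimensional representation
    return components, scores, m

# Reconstruction from the q-dimensional scores:
# Y_hat = scores @ components + m
```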

Page 20: Sparsity Control for Robustness and Social Data Analysis


Robustifying PCA: outlier-aware model

Each datum is modeled as a low-rank bilinear component (latent coordinates times a factor matrix) plus a sparse outlier vector plus noise

Interpret: blind preference model with latent profiles

(P2): least-squares fit of the bilinear factors, with a row-wise sparsity-promoting (group $\ell_2$) penalty on the outlier matrix

An $\ell_0$-norm counterpart is tied to an LTS version of PCA (LTS PCA); (P2) subsumes an optimal (vector) Huber estimator; $\ell_1$-norm regularization handles entry-wise outliers

G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).

Page 21: Sparsity Control for Robustness and Social Data Analysis


Alternating minimization for (P2)

Factor update: SVD of the outlier-compensated data
Outlier update: row-wise vector soft-thresholding

Proposition 3: Alg. 1’s iterates converge to a stationary point of (P2).
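A rough sketch of this alternating scheme, a simplification of the algorithm described above (plain rank-q SVD step, single regularization constant lam), not the paper's exact Algorithm 1.

```python
# Outlier-aware PCA by alternating minimization: truncated SVD of the
# outlier-compensated data, then row-wise vector soft-thresholding.
import numpy as np

def row_soft(R, t):
    """Row-wise vector soft-thresholding: shrink each row's l2 norm by t."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * R

def robust_pca(Y, q, lam, n_iter=50):
    """Y: n x p data (rows are data vectors); q: subspace dimension."""
    O = np.zeros_like(Y)                       # outlier matrix (sparse rows)
    for _ in range(n_iter):
        Z = Y - O
        m = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - m, full_matrices=False)
        L = m + (U[:, :q] * s[:q]) @ Vt[:q]    # rank-q approximation
        O = row_soft(Y - L, lam / 2.0)         # rows with nonzero O flag outliers
    return L, O
```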


Page 22: Sparsity Control for Robustness and Social Data Analysis


Video surveillance

Data: http://www.cs.cmu.edu/~ftorre/

[Figure panels: Original, PCA, Robust PCA, ‘Outliers’]

Page 23: Sparsity Control for Robustness and Social Data Analysis


Big Five personality factors: five dimensions of personality traits [Goldberg’93], [Costa-McRae’92]

Measuring the Big Five: a short questionnaire (44 items), each item rated 1-5, e.g.,

``I see myself as someone who … is talkative,'' ``… is full of energy''

Big Five Inventory (BFI)

Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.

Discovered through factor analysis; WEIRD subjects

Page 24: Sparsity Control for Robustness and Social Data Analysis


BFI data


Eugene-Springfield community sample [Goldberg’08]: subjects, item responses, and factors

Robust PCA identifies 8 outlying subjects, validated via ``inconsistency'' scores, e.g., VRIN [Tellegen’88]

Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller

Page 25: Sparsity Control for Robustness and Social Data Analysis


Online robust PCA

Motivation: real-time data and memory limitations

Approach: exponentially-weighted robust PCA; at each time instant, previously obtained estimates are not recomputed from scratch
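A speculative sketch of an exponentially-weighted online variant, with assumptions of my own: a forgetting-factor covariance of outlier-compensated data, a subspace taken from its dominant eigenvectors, and vector soft-thresholding of each new residual. It conveys the flavor, not the paper's recursions.

```python
# Exponentially-weighted online robust PCA sketch (illustrative only).
import numpy as np

def soft_vec(r, t):
    nrm = np.linalg.norm(r)
    return np.zeros_like(r) if nrm <= t else (1.0 - t / nrm) * r

class OnlineRobustPCA:
    def __init__(self, p, q, lam=1.0, beta=0.99):
        self.q, self.lam, self.beta = q, lam, beta
        self.C = np.zeros((p, p))              # exponentially-weighted covariance
        self.U = np.eye(p)[:, :q]              # current subspace estimate

    def update(self, y):
        r = y - self.U @ (self.U.T @ y)        # residual off the current subspace
        o = soft_vec(r, self.lam / 2.0)        # sparse outlier estimate for this datum
        z = y - o                              # outlier-compensated datum
        self.C = self.beta * self.C + (1 - self.beta) * np.outer(z, z)
        w, V = np.linalg.eigh(self.C)
        self.U = V[:, -self.q:]                # dominant eigenvectors
        return o
```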

Page 26: Sparsity Control for Robustness and Social Data Analysis


Online PCA in action

[Figure: synthetic experiment with nominal data and injected outliers]

Page 27: Sparsity Control for Robustness and Social Data Analysis


Robust kernel PCA

Kernel (K)PCA [Scholkopf’97]: PCA after mapping the data from the input space to a feature space

Challenge: the feature space can be very high- (even infinite-) dimensional
Kernel trick: all computations involve only inner products, evaluated through the kernel function

Related to spectral clustering
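For context, a minimal standard kernel PCA sketch (not the robust variant): form an RBF Gram matrix, double-center it, and keep its leading eigenvectors; every computation uses only kernel evaluations.

```python
# Standard kernel PCA via the centered Gram matrix.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T      # pairwise squared distances
    return np.exp(-gamma * D)

def kernel_pca(X, q, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                   # center in feature space
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:q]                    # leading eigenpairs
    alphas = V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
    return Kc @ alphas                               # projections of the training data
```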

Page 28: Sparsity Control for Robustness and Social Data Analysis


Unveiling communities

Data: http://www-personal.umich.edu/~mejn/netdata/

Network: NCAA football teams (nodes), Fall ’00 games (edges); kernel defined on the graph

Conferences identified exactly: Big 10, Big 12, ACC, SEC, Big East; outliers: independent teams

ARI = 0.8967 (adjusted Rand index)

Page 29: Sparsity Control for Robustness and Social Data Analysis


Spectrum cartography

Goal: find a map giving the power spectrum at any spatial position and frequency

Idea: sensing radios collaborate to form a spatial map of the spectrum

Approach: basis expansion model for the spectrum map, estimated via nonparametric basis pursuit

[Figure: spectrum map, original vs. estimated]

J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.
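A small sketch of the group-sparse estimation behind ``group-Lasso on splines'': groups of basis-expansion coefficients are shrunk to zero jointly. This is a generic proximal-gradient group Lasso; the grouping, step size, and solver are illustrative choices, not the paper's algorithm.

```python
# Generic group Lasso by proximal gradient descent.
import numpy as np

def group_soft(v, t):
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

def group_lasso(A, y, groups, lam, n_iter=500):
    """Minimize 0.5*||y - A b||^2 + lam * sum_g ||b_g||_2.
    groups: list of index arrays partitioning the columns of A."""
    b = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant
    for _ in range(n_iter):
        z = b - step * (A.T @ (A @ b - y))          # gradient step
        for idx in groups:                          # prox: group-wise shrinkage
            b[idx] = group_soft(z[idx], step * lam)
    return b
```

In a spectrum-cartography-style use, each group would collect, e.g., the spline coefficients attached to one candidate basis function, so an entire basis is kept or discarded at once.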

Page 30: Sparsity Control for Robustness and Social Data Analysis


Distributed adaptive algorithms

Technical approaches:
Consensus-based in-network operation in ad hoc WSNs
Distributed optimization using alternating-direction methods
Online learning of statistics using stochastic approximation
Performance analysis via stochastic averaging

[Figure: learning curves (error vs. time t) comparing Jmin, Centralized-LMS, D-LMS, D-LMS with noisy links, Local-LMS, and Diffusion LMS]

Issues and significance:
Fast-varying (non-)stationary processes
Unavailability of statistical information
Online incorporation of sensor data
Noisy communication links

Improved learning through cooperation

G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.

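A compact sketch of cooperative in-network adaptive estimation in the diffusion style: each sensor runs a local LMS step and then averages with its one-hop neighbors. This generic adapt-then-combine scheme is illustrative; it is not the D-LMS or D-RLS algorithms referenced above.

```python
# Adapt-then-combine diffusion LMS over a sensor network (illustrative).
import numpy as np

def diffusion_lms(X, y, A, mu=0.05):
    """X: (K, T, p) regressors and y: (K, T) observations per sensor,
    A: (K, K) symmetric adjacency with self-loops."""
    K, T, p = X.shape
    C = A / A.sum(axis=1, keepdims=True)       # row-stochastic combination weights
    W = np.zeros((K, p))                       # per-sensor estimates
    for t in range(T):
        for k in range(K):                     # adapt: local LMS update
            e = y[k, t] - X[k, t] @ W[k]
            W[k] = W[k] + mu * e * X[k, t]
        W = C @ W                              # combine: average over neighbors
    return W
```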

Page 31: Sparsity Control for Robustness and Social Data Analysis


Unveiling network anomalies

Anomalies occur across flows and time; exploiting both dimensions yields enhanced detection capabilities

Approach: Flag anomalies across flows and time via sparsity and low rank

Payoff: Ensure high performance, QoS, and security in IP networks

M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec 2011 (submitted).
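A rough sketch of the sparsity-plus-low-rank idea for a flows-by-time traffic matrix (a simplification of my own, not the paper's estimator): alternate singular-value thresholding for the nominal low-rank traffic with entry-wise soft-thresholding for the sparse anomalies.

```python
# Low-rank plus sparse decomposition of a traffic matrix (illustrative).
import numpy as np

def soft(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def flows_low_rank_plus_sparse(Y, lam_star=1.0, lam_1=0.1, n_iter=100):
    """Decompose Y (flows x time) as low-rank L (nominal) + sparse S (anomalies)."""
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U * soft(s, lam_star)) @ Vt       # singular-value thresholding
        S = soft(Y - L, lam_1)                 # entry-wise anomaly thresholding
    return L, S
```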

Page 32: Sparsity Control for Robustness and Social Data Analysis


[Concluding diagram: outlier-resilient estimation, signal processing, Lasso]

Concluding summary

Research issues addressed:
Sparsity control for robust metric and choice-based PM
Kernel-based nonparametric utility estimation
Robust (kernel) principal component analysis
Scalable distributed real-time implementations

Control sparsity in model residuals for robust learning

Application domains:
Preference measurement and conjoint analysis
Psychometrics, personality assessment
Video surveillance
Social and power networks

Experimental validation with GPIPP personality ratings (~6M)

Gosling-Potter Internet Personality Project (GPIPP) - http://www.outofservice.com