The Nonparanormal Skeptic · people.ee.duke.edu/~lcarin/Esther6.7.2013.pdf · 07-06-2013


The Nonparanormal Skeptic

Han Liu, Fang Han, Ming Yuan, John Lafferty and Larry Wasserman

ICML 2012

Presented by Esther Salazar, Duke University

June 7, 2013

E. Salazar (Reading group) June 7, 2013 1 / 14


Summary

The nonparanormal SKEPTIC method is proposed for estimating high-dimensional undirected graphical models

SKEPTIC: Spearman/Kendall estimates preempt transformations to infer correlations

Nonparametric rank-based correlation coefficients: Spearman’s rho and Kendall’s tau

The authors argue that the nonparanormal graphical model can be a safe replacement for the Gaussian graphical model


Undirected graphical models (UGM)

UGMs provide a powerful framework for exploring interrelationships among a large number of random variables

The joint distribution of a random vector $X = (X_1, \dots, X_d)$ is associated with a graph $G = (V, E)$, where each vertex $i$ corresponds to $X_i$

The pair $(i, j)$ is not an element of the edge set $E$ if and only if $X_i$ is independent of $X_j$ given $\{X_k : k \neq i, j\}$

Goal: given $n$ observations of the random vector $X$, estimate the edge set $E$ (under Gaussianity, $E$ is the set of nonzero off-diagonal entries of the precision matrix)
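In the Gaussian case the link between the graph and the precision matrix can be seen on a toy example. A sketch, assuming a hypothetical 4-variable chain graph 0-1-2-3: the zero pattern of $\Omega$ gives the edge set, while the covariance $\Sigma = \Omega^{-1}$ is dense. The helpers `edge_set` and `partial_correlation` are illustrative names, not from the paper.

```python
import numpy as np

# Hypothetical precision matrix for the chain graph 0 - 1 - 2 - 3
# (strictly diagonally dominant, hence positive definite).
omega = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])
sigma = np.linalg.inv(omega)  # the covariance matrix is dense


def edge_set(precision, tol=1e-10):
    """Pairs (i, j), i < j, with a nonzero precision entry, i.e. the edges of G."""
    d = precision.shape[0]
    return sorted((i, j) for i in range(d) for j in range(i + 1, d)
                  if abs(precision[i, j]) > tol)


def partial_correlation(precision):
    """Partial correlation of X_i and X_j given the rest: -w_ij / sqrt(w_ii * w_jj)."""
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


print(edge_set(omega))                # [(0, 1), (1, 2), (2, 3)]
print(np.all(np.abs(sigma) > 1e-10))  # True: every pair is marginally correlated
```

So $X_0$ and $X_2$ are correlated marginally, yet conditionally independent given the other variables, which is exactly what a zero in $\Omega$ records.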


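When the dimension d is small: assume that X has a multivariate Gaussian distribution and test the sparsity pattern of $\Omega = \Sigma^{-1}$ based on the sample covariance matrix $\hat{\Sigma}_n$

Drawback: d must be strictly smaller than n (otherwise $\hat{\Sigma}_n$ is singular and cannot be inverted)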
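Why the classical approach needs d < n can be checked numerically. A sketch on simulated standard normal data (not from the paper): after centering, the sample covariance of n observations has rank at most n - 1, so it is singular whenever d ≥ n. The function name `sample_cov_rank` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_cov_rank(n, d):
    """Rank of the sample covariance of n i.i.d. N(0, I_d) draws."""
    x = rng.standard_normal((n, d))
    s = np.cov(x, rowvar=False)  # d x d, centered by the sample mean
    return np.linalg.matrix_rank(s)


print(sample_cov_rank(n=50, d=10))  # 10: full rank, so it can be inverted
print(sample_cov_rank(n=10, d=50))  # 9: rank is at most n - 1 < d, so singular
```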

In the high-dimensional setting (d > n), a number of methods have been proposed:

- Meinshausen & Bühlmann (2006): method based on parallel lasso regressions of each $X_i$ on $\{X_j : j \neq i\}$
- Friedman et al. (2008): Ω estimated with the graphical lasso (glasso) algorithm
- . . .
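The regression-based approach in the first item can be sketched in a few lines. This is a minimal illustration assuming a plain coordinate-descent lasso with a fixed penalty `lam` on standardized data; the function names, the penalty value, and the "or" rule for symmetrizing the neighborhoods are illustrative choices, not taken from the slides.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_b (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - X b, with b = 0 initially (astype copies)
    for _ in range(n_iter):
        for k in range(p):
            r += X[:, k] * b[k]          # partial residual without coordinate k
            z = X[:, k] @ r / n
            b[k] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[k]  # soft-thresholding
            r -= X[:, k] * b[k]          # put the updated coordinate back
    return b


def neighborhood_selection(X, lam, rule="or"):
    """Lasso-regress each standardized variable on all others; join i and j if a coefficient is nonzero."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    d = X.shape[1]
    nz = np.zeros((d, d), dtype=bool)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        beta = lasso_cd(X[:, others], X[:, i], lam)
        nz[i, others] = beta != 0
    return (nz | nz.T) if rule == "or" else (nz & nz.T)


# Usage on simulated Gaussian data from a hypothetical 5-node chain graph.
rng = np.random.default_rng(1)
d = 5
omega = 2.0 * np.eye(d) - 0.8 * (np.eye(d, k=1) + np.eye(d, k=-1))
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(omega), size=500)
adj = neighborhood_selection(X, lam=0.1)
```

The "or" rule keeps an edge if either regression selects it; the "and" rule requires both, which gives a sparser graph.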

Important issue: the normality assumption is restrictive, and conclusions inferred under it can be misleading

To relax this assumption, the nonparanormal distribution is proposed


The nonparanormal

Let $f = (f_1, \dots, f_d)$ be a set of monotonic univariate functions and let $\Sigma^0 \in \mathbb{R}^{d \times d}$ be a positive-definite correlation matrix with $\mathrm{diag}(\Sigma^0) = 1$

A $d$-dimensional random variable $X = (X_1, \dots, X_d)^T$ has a nonparanormal distribution, $X \sim NPN_d(f, \Sigma^0)$, if

$$f(X) := (f_1(X_1), \dots, f_d(X_d))^T \sim N(0, \Sigma^0)$$

For continuous functions $f$, Liu et al. (2009) show that the NPN family is equivalent to the Gaussian copula family

The authors claim that the NPN family is much richer than the Gaussian family. Moreover, the conditional independence graph is still encoded by the sparsity pattern of $\Omega^0 = (\Sigma^0)^{-1}$, i.e.

$$\Omega^0_{jk} = 0 \iff X_j \perp\!\!\!\perp X_k \mid X_{\setminus\{j,k\}}$$
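The definition can be simulated directly. A sketch assuming hypothetical increasing transforms (log, cube root, identity) and a hypothetical $\Sigma^0$ with $\Sigma^0_{13} = \Sigma^0_{12}\Sigma^0_{23}$, so that $\Omega^0_{13} = 0$ and the graph is the chain 1-2-3. The marginals of $X$ are non-Gaussian (lognormal, cubed normal), yet $f(X)$ is exactly Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlation matrix Sigma0 (unit diagonal, positive definite).
sigma0 = np.array([[1.0, 0.6, 0.3],
                   [0.6, 1.0, 0.5],
                   [0.3, 0.5, 1.0]])

# Hypothetical strictly increasing transforms f_j and their inverses.
f = [np.log, np.cbrt, lambda x: x]
f_inv = [np.exp, lambda z: z ** 3, lambda z: z]


def sample_npn(n):
    """Draw Z = f(X) ~ N(0, Sigma0), then set X_j = f_j^{-1}(Z_j)."""
    z = rng.multivariate_normal(np.zeros(3), sigma0, size=n)
    x = np.column_stack([f_inv[j](z[:, j]) for j in range(3)])
    return x, z


x, z = sample_npn(1000)
z_back = np.column_stack([f[j](x[:, j]) for j in range(3)])  # f(X) recovers the Gaussian
omega0 = np.linalg.inv(sigma0)  # its zero pattern gives the graph
```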


The Normal-score based Nonparanormal Graph Estimator (Liu et al., 2009)

Let $S^{ns} = [S^{ns}_{jk}]$ be the correlation matrix of the transformed data, where
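A sketch of a normal-score correlation matrix, assuming the transformed data are the normal scores $\Phi^{-1}(r/(n+1))$ of the within-column ranks $r$ (an assumption about the exact formula). The quantile function comes from Python's `statistics.NormalDist`; the helper names are illustrative. Because only ranks are used, any increasing transform of a column leaves the result unchanged.

```python
import numpy as np
from statistics import NormalDist

# Vectorized standard normal quantile function Phi^{-1} (standard library).
_phi_inv = np.vectorize(NormalDist().inv_cdf)


def normal_scores(x):
    """Map each column to Phi^{-1}(rank / (n + 1)); assumes continuous data (no ties)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1  # ranks 1..n within each column
    return _phi_inv(ranks / (n + 1))


def normal_score_corr(x):
    """Pearson correlation matrix of the normal scores (a sketch of S^ns)."""
    return np.corrcoef(normal_scores(x), rowvar=False)


rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=200)
x_warped = np.column_stack([np.exp(x[:, 0]), x[:, 1] ** 3])  # increasing transforms
print(np.allclose(normal_score_corr(x), normal_score_corr(x_warped)))  # True
```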


The Nonparanormal SKEPTIC: main idea

The main idea is to exploit Spearman’s rho and Kendall’s tau statistics to directly estimate the unknown correlation matrix $\Sigma^0$, without explicitly calculating the marginal transformation functions $f_j$

Both can be viewed as a form of nonparametric correlation between $X_j$ and $X_k$
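A minimal sketch of this idea, assuming the standard Gaussian-copula identities $\Sigma^0_{jk} = 2\sin(\pi\rho_{jk}/6) = \sin(\pi\tau_{jk}/2)$, which link the rank correlations to $\Sigma^0$ and so let the marginal transforms be skipped. The data, $\Sigma^0$, and helper names are hypothetical; Kendall's tau is computed naively over all pairs, which is fine for small $n$.

```python
import numpy as np


def ranks(x):
    """Within-column ranks 1..n (assumes no ties)."""
    return np.argsort(np.argsort(x, axis=0), axis=0) + 1.0


def spearman_rho(x):
    """Spearman's rho matrix: Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), rowvar=False)


def kendall_tau(x):
    """Kendall's tau matrix (tau-a): average sign concordance over all pairs of observations."""
    n, d = x.shape
    signs = [np.sign(x[:, None, j] - x[None, :, j]) for j in range(d)]  # n x n per column
    tau = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau[j, k] = tau[k, j] = (signs[j] * signs[k]).sum() / (n * (n - 1))
    return tau


def skeptic(x, method="kendall"):
    """Estimate Sigma0 via 2 sin(pi/6 * rho) or sin(pi/2 * tau)."""
    if method == "spearman":
        s = 2.0 * np.sin(np.pi / 6.0 * spearman_rho(x))
    else:
        s = np.sin(np.pi / 2.0 * kendall_tau(x))
    np.fill_diagonal(s, 1.0)
    return s


rng = np.random.default_rng(0)
sigma0 = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
z = rng.multivariate_normal(np.zeros(3), sigma0, size=500)
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3, z[:, 2]])  # NPN data
print(np.round(skeptic(x), 2))
```

Since only ranks and pairwise orderings enter, `skeptic(x)` and `skeptic(z)` coincide exactly.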


The Nonparanormal SKEPTIC: main idea

Population versions of Spearman’s rho and Kendall’s tau are given by
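For reference, the standard population definitions are below (the notation is an assumption): $F_j$ is the marginal CDF of $X_j$ and $(\tilde X_j, \tilde X_k)$ is an independent copy of $(X_j, X_k)$. Under the nonparanormal model both are monotone functions of $\Sigma^0_{jk}$ alone, which is what allows $\Sigma^0$ to be recovered without estimating $f$:

```latex
\begin{aligned}
\rho_{jk} &= \operatorname{Corr}\big(F_j(X_j),\, F_k(X_k)\big),\\
\tau_{jk} &= \mathbb{P}\big[(X_j - \tilde X_j)(X_k - \tilde X_k) > 0\big]
           - \mathbb{P}\big[(X_j - \tilde X_j)(X_k - \tilde X_k) < 0\big],\\
\Sigma^0_{jk} &= 2\sin\!\Big(\frac{\pi}{6}\,\rho_{jk}\Big) = \sin\!\Big(\frac{\pi}{2}\,\tau_{jk}\Big).
\end{aligned}
```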


NPN Skeptic with different graph estimators

NPN Skeptic with the graphical Dantzig selector

Main idea: take advantage of the connection between multivariate linear regression and the entries of Ω
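The connection can be checked numerically. A sketch with a hypothetical 3 × 3 precision matrix: regressing $X_j$ on the remaining variables gives coefficients $-\Omega_{\setminus j, j}/\Omega_{jj}$ and residual variance $1/\Omega_{jj}$. An estimator built on these regressions needs only a covariance estimate, which is where the SKEPTIC matrix S would be plugged in (stated as an assumption about the slide's construction).

```python
import numpy as np

# Hypothetical positive-definite precision matrix Omega and its covariance.
omega = np.array([[2.0, -0.8, 0.3],
                  [-0.8, 2.0, -0.5],
                  [0.3, -0.5, 1.5]])
sigma = np.linalg.inv(omega)

j, rest = 0, [1, 2]
# Population regression of X_j on X_rest: beta = Sigma_{rest,rest}^{-1} Sigma_{rest,j}.
beta = np.linalg.solve(sigma[np.ix_(rest, rest)], sigma[rest, j])
resid_var = sigma[j, j] - sigma[j, rest] @ beta

# Identity: beta = -Omega_{rest,j} / Omega_jj and resid_var = 1 / Omega_jj.
print(np.allclose(beta, -omega[rest, j] / omega[j, j]))  # True
print(np.isclose(resid_var, 1.0 / omega[j, j]))          # True
```

A zero regression coefficient therefore corresponds to a zero in $\Omega$, i.e. a missing edge.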


NPN Skeptic with different graph estimators

NPN Skeptic with CLIME (Cai et al., 2011, JASA)

CLIME: constrained ℓ1-minimization for inverse matrix estimation

Main idea: the estimated correlation matrix S can also be plugged into the CLIME estimator defined by
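For reference, the standard CLIME program of Cai et al. (2011), written with S in place of the sample covariance; the tuning parameter $\lambda$ and the symmetrization step follow the usual CLIME formulation and are an assumption about what the slide showed:

```latex
\begin{aligned}
\hat{\Omega}_1 &= \operatorname*{arg\,min}_{\Omega \in \mathbb{R}^{d \times d}} \sum_{j,k} |\Omega_{jk}|
  \quad \text{subject to} \quad \max_{j,k} \big|(S\Omega - I)_{jk}\big| \le \lambda,\\
\hat{\Omega}_{jk} &= \hat{\Omega}_{1,jk}\,\mathbb{1}\big(|\hat{\Omega}_{1,jk}| \le |\hat{\Omega}_{1,kj}|\big)
  + \hat{\Omega}_{1,kj}\,\mathbb{1}\big(|\hat{\Omega}_{1,jk}| > |\hat{\Omega}_{1,kj}|\big).
\end{aligned}
```

The first problem splits into d independent linear programs, one per column: minimize $\|\beta\|_1$ subject to $\|S\beta - e_j\|_\infty \le \lambda$, where $e_j$ is the $j$-th standard basis vector.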


NPN Skeptic with different graph estimators

NPN Skeptic with the graphical lasso

Main idea: plug the estimated correlation coefficient matrix S into the graphical lasso
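A minimal NumPy sketch of the plug-in step, assuming the block coordinate-descent form of the graphical lasso (Friedman et al., 2008) with the ℓ1 penalty on all entries of Θ. Here S is a hypothetical correlation matrix standing in for the SKEPTIC estimate, and `graphical_lasso` is an illustrative name, not a library API.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def graphical_lasso(S, lam, max_iter=100, tol=1e-8):
    """Approximately maximize log det(Theta) - tr(S Theta) - lam * sum_{j,k} |Theta_jk|."""
    d = S.shape[0]
    W = S + lam * np.eye(d)    # working covariance; its diagonal stays at S_jj + lam
    B = np.zeros((d, d - 1))   # lasso coefficients, one row per column j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(d):
            idx = [k for k in range(d) if k != j]
            W11, s12, beta = W[np.ix_(idx, idx)], S[idx, j], B[j]  # beta is a view into B
            # Inner coordinate descent for min_b 0.5 b'W11 b - s12'b + lam * ||b||_1.
            for _ in range(1000):
                beta_prev = beta.copy()
                for k in range(d - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, lam) / W11[k, k]
                if np.max(np.abs(beta - beta_prev)) < tol:
                    break
            W[idx, j] = W[j, idx] = W11 @ beta  # update row/column j of W
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover Theta column by column from the final coefficients.
    Theta = np.zeros((d, d))
    for j in range(d):
        idx = [k for k in range(d) if k != j]
        Theta[j, j] = 1.0 / (W[j, j] - W[idx, j] @ B[j])
        Theta[idx, j] = -B[j] * Theta[j, j]
    return (Theta + Theta.T) / 2.0  # symmetrize small numerical asymmetries


S = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])  # hypothetical S
theta = graphical_lasso(S, lam=0.15)
print(np.round(theta, 3))
```

With `lam=0` the sketch returns $S^{-1}$; once `lam` exceeds every off-diagonal $|S_{jk}|$, all off-diagonal entries of Θ are exactly zero (an empty graph).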


Important theoretical property

The authors prove that the NPN Skeptic achieves the optimal parametric rate of convergence for precision matrix estimation


Application: Stock price data from Yahoo! Finance

Data: daily closing prices of 452 stocks in the S&P 500 from Jan. 1, 2003 to Jan. 1, 2008, giving 1257 data points

$S_t = (S_{t,1}, \dots, S_{t,452})$, with $S_{t,j}$ denoting the closing price of stock $j$ on day $t$

They consider the log-return variables $X_{tj} = \log(S_{t,j}/S_{t-1,j})$
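Computing these variables from a price table takes one line. A sketch, assuming the prices are held in a (T, d) NumPy array with one row per trading day; the toy numbers are hypothetical, not the Yahoo! data. A series of T closing prices yields T - 1 returns.

```python
import numpy as np


def log_returns(prices):
    """X[t, j] = log(S[t, j] / S[t-1, j]) for a (T, d) array of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])


# Hypothetical toy prices: 4 trading days, 2 stocks.
prices = np.array([[100.0, 50.0],
                   [110.0, 50.0],
                   [99.0, 55.0],
                   [99.0, 60.5]])
x = log_returns(prices)
print(x.shape)  # (3, 2): one fewer row than the price series
```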

Goal: Build graphs over the indices j

The 452 stocks are categorized into 10 Global Industry Classification Standard (GICS) sectors. Stocks from the same GICS sector are expected to cluster together in the estimated graph.
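One hypothetical way to quantify this expectation (an illustrative diagnostic, not necessarily what the authors report) is the fraction of estimated edges whose two endpoints share a GICS sector. The function `within_sector_fraction` and the toy labels below are assumptions.

```python
import numpy as np


def within_sector_fraction(adj, sectors):
    """Fraction of edges in a symmetric boolean adjacency matrix joining two stocks of the same sector."""
    adj = np.asarray(adj, dtype=bool)
    sectors = np.asarray(sectors)
    i, j = np.nonzero(np.triu(adj, k=1))  # each undirected edge counted once
    if len(i) == 0:
        return float("nan")
    return float(np.mean(sectors[i] == sectors[j]))


# Hypothetical toy graph: 4 stocks in 2 sectors, edges (0, 1), (2, 3), (1, 2).
sectors = ["Energy", "Energy", "Financials", "Financials"]
adj = np.zeros((4, 4), dtype=bool)
for a, b in [(0, 1), (2, 3), (1, 2)]:
    adj[a, b] = adj[b, a] = True
print(within_sector_fraction(adj, sectors))  # 2 of the 3 edges are within-sector
```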
