Multivariate Quantiles and Ranks using Optimal Transportation

Bodhisattva Sen [1]

Department of Statistics, Columbia University, New York

Department of Statistics, George Mason University

Joint work with Promit Ghosal (Columbia University)

05 April, 2019

[1] Supported by NSF grants DMS-1712822 and AST-1614743

How to define ranks and quantiles in $\mathbb{R}^d$, $d > 1$?

Ranks and quantiles when d = 1

$X$ is a random variable with c.d.f. $F$

Rank: The rank of $x \in \mathbb{R}$ is $F(x)$

Property: If $F$ is continuous, $F(X) \sim \mathrm{Unif}([0, 1])$

Quantile: The quantile function is $F^{-1}$

Property: If $F$ is continuous, $F^{-1}(U) \sim F$ where $U \sim \mathrm{Unif}([0, 1])$
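The two properties can be checked by simulation. The sketch below is only an illustration: the choice of an Exponential(1) distribution, the sample size, and the helper names `exp_cdf` and `exp_quantile` are assumptions made here, not part of the talk.

```python
import math
import random

def exp_cdf(x, rate=1.0):
    """C.d.f. F of an Exponential(rate) random variable."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def exp_quantile(u, rate=1.0):
    """Quantile function F^{-1} of Exponential(rate), for u in [0, 1)."""
    return -math.log(1.0 - u) / rate

rng = random.Random(0)
n = 20000

# Rank property: F(X) should look Unif([0, 1]) when X ~ F.
ranks = [exp_cdf(rng.expovariate(1.0)) for _ in range(n)]
mean_rank = sum(ranks) / n                               # close to 1/2
frac_below_quarter = sum(r <= 0.25 for r in ranks) / n   # close to 1/4

# Quantile property: F^{-1}(U) should have c.d.f. F when U ~ Unif([0, 1]).
draws = [exp_quantile(rng.random()) for _ in range(n)]
mean_draw = sum(draws) / n                               # close to E[X] = 1
```

Any continuous $F$ with a closed-form inverse would serve equally well here.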

Defining quantiles, ranks, depth, etc. is difficult when $d > 1$

Lack of a natural ordering in $\mathbb{R}^d$ when $d > 1$

Many notions of multivariate quantiles/ranks have been suggested:

Puri and Sen (1971), Chaudhuri and Sengupta (1993), Möttönen and Oja (1995), Chaudhuri (1996), Liu and Singh (1993), Serfling (2010) ...

Spatial median and geometric quantile

Spatial median: $M := \arg\min_{m \in \mathbb{R}^d} E\|X - m\|$

Quantile when d = 1: For $u \in (0, 1)$,

$F^{-1}(u) = \arg\min_{x \in \mathbb{R}} E\big[\,|X - x| - (2u - 1)x\,\big]$

Geometric quantile [Chaudhuri (1996)]: For $\|u\| < 1$, let

$Q(u) := \arg\min_{x \in \mathbb{R}^d} E\big[\,\|X - x\| - \langle u, x \rangle\,\big]$
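The d = 1 characterization can be checked on data by replacing the expectation with a sample average. This is a minimal sketch under assumptions made here (Gaussian toy data, n = 101, u = 0.3, and the hypothetical helper names `objective` and `quantile_by_minimization`); it compares the minimizer with the usual order-statistic quantile.

```python
import math
import random

def objective(x, data, u):
    """Empirical version of E[|X - x| - (2u - 1) x]."""
    return sum(abs(xi - x) for xi in data) / len(data) - (2 * u - 1) * x

def quantile_by_minimization(data, u):
    """The objective is convex and piecewise linear with kinks at the data,
    so a minimizer can always be found among the data points."""
    return min(data, key=lambda x: objective(x, data, u))

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(101)]
u = 0.3

q_min = quantile_by_minimization(data, u)
# Usual sample quantile: the ceil(n u)-th order statistic.
q_sorted = sorted(data)[math.ceil(len(data) * u) - 1]
```

Here n u = 30.3 is not an integer, so the minimizer is unique and the two values coincide; when n u is an integer the objective is flat between two order statistics.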

Outline

1 Introduction to Optimal Transportation
   Monge’s Problem
   Kantorovich Relaxation: Primal Problem
   A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics
   Two-sample Goodness-of-fit Testing
   Independence Testing

Gaspard Monge (1781): What is the cheapest way to transport a pile of sand to cover a sinkhole?

[Figure: the Monge problem, a pile of sand and a sinkhole. Source: Blanchet (Columbia U. and Stanford U.)]

Goal: $\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))]$

µ (on $\mathcal{X}$) and ν (on $\mathcal{Y}$) are probability measures: $\int_{\mathcal{X}} d\mu(x) = \int_{\mathcal{Y}} d\nu(y) = 1$

$c(x, y) \ge 0$: cost of transporting $x$ to $y$ (e.g., $c(x, y) = \|x - y\|^p$)

$T$ transports µ to ν, i.e., $T(X) \sim \nu$ where $X \sim \mu$, or,

$\nu(B) = \mu(T^{-1}(B)) = \int_{T^{-1}(B)} d\mu$, $B \subset \mathcal{Y}$

One-dimensional optimal transport

Suppose $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; µ, ν abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.’s

Goals: (i) Transport µ to ν; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$

(ii) $T$ minimizes the cost $E_\mu[(X - T(X))^2]$; assume $c(x, y) = (x - y)^2$

[Figure: two densities p and q and the optimal transport map that morphs p into q.]

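The minimizing $T$ must satisfy (Why? Otherwise swapping the images of $x_0$ and $x_1$ would lower the cost)

$(x_0 - T(x_0))^2 + (x_1 - T(x_1))^2 \le (x_0 - T(x_1))^2 + (x_1 - T(x_0))^2$

Expanding the squares gives $(x_1 - x_0)(T(x_1) - T(x_0)) \ge 0$, so if $x_1 > x_0$ then $T(x_1) \ge T(x_0)$

So $T$ must be a monotone nondecreasing function

Therefore, choose $T(\cdot)$ so that (recall: $\nu(B) = \int_{T^{-1}(B)} d\mu$)

$\int_{-\infty}^{x} d\mu(s) = \int_{-\infty}^{T(x)} d\nu(y) \;\Rightarrow\; F_\mu(x) = F_\nu(T(x))$

Thus, $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)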
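For two empirical measures with the same number of atoms, the map $F_\nu^{-1} \circ F_\mu$ pairs the i-th smallest point of one sample with the i-th smallest point of the other. The sketch below checks, on a toy example, that this sorted pairing attains the smallest squared cost among all pairings; the Gaussian data and the names `monotone_matching` and `cost` are assumptions for illustration only.

```python
import itertools
import random

def monotone_matching(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y.
    This is T = F_nu^{-1} o F_mu for two empirical measures with n atoms each."""
    return dict(zip(sorted(xs), sorted(ys)))

def cost(xs, T):
    """Average squared displacement of the map T (a dict x -> T(x))."""
    return sum((x - T[x]) ** 2 for x in xs) / len(xs)

rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
ys = [rng.gauss(2.0, 0.5) for _ in range(6)]

T = monotone_matching(xs, ys)

# Exhaustive search over all 6! = 720 one-to-one pairings.
best_brute_force = min(
    cost(xs, dict(zip(xs, perm))) for perm in itertools.permutations(ys)
)
```

The exhaustive search is only feasible for very small n; the sorted pairing costs one sort.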

Optimal transportation when d = 1

$\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; µ, ν abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.’s

Goals: (i) Transport µ to ν; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$
(ii) $T$ minimizes the cost $E_\mu[(X - T(X))^2]$

Solution: $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)

Ranks and Quantiles when d = 1

When $\mu = \mathrm{Unif}([0, 1])$, $T = F_\nu^{-1}$ transports µ to ν: the quantile map

When $\nu = \mathrm{Unif}([0, 1])$, $T = F_\mu$: the rank map

Thus, when d = 1, the rank and quantile maps are solutions to the optimal transport problem

How to do this in higher dimensions, e.g., when $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$, $d > 1$?

Monge’s problem: Given probability measures µ and ν, solve

$\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))] = \inf_{T : T_{\#}\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$   (1)

where $T(X) \sim \nu$ iff $\mu(T^{-1}(B)) = \nu(B)$ for all Borel sets $B$

Drawbacks

The above optimization problem is highly non-linear and can be ill-posed

No admissible $T$ may exist; e.g., if µ is a Dirac delta and ν is not

Moreover, the infimum in (1) may not be attained, i.e., a limit of transport maps $\{T_i\}_{i \ge 1}$ may fail to be a transport map

Solution need not be unique (book shifting example)

Not much progress was made for about 160 years!

Kantorovich Relaxation: Primal Problem

Monge’s problem (M): $\inf_{T : T_{\#}\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$

Let $\Pi(\mu, \nu)$ be the class of joint distributions of $(X, Y) \sim \pi$ s.t.

$\pi_X$ = marginal of $X$ = µ, $\pi_Y$ = marginal of $Y$ = ν

Kantorovich relaxation (K): Solve

$\min_{\pi \in \Pi(\mu, \nu)} E_\pi[c(X, Y)] = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$

Always has a solution for $c(\cdot, \cdot) \ge 0$ lower semicontinuous (l.s.c.) [2]

Linear program (infinite dimensional)

Result: If µ is abs. cont. then (K) = (M) and $\pi_{opt} = (\mathrm{id}, T_{opt})_{\#}\mu$

Computation: Kantorovich dual problem

[2] A function $\varphi : \mathbb{R}^d \to \mathbb{R}$ is l.s.c. at $x_0$ iff $\liminf_{x \to x_0} \varphi(x) \ge \varphi(x_0)$

When µ and ν are discrete

Kantorovich relaxation: Solve $\min\{E_\pi[c(X, Y)] : \pi \in \Pi(\mu, \nu)\}$

Discrete version: µ and ν supported on $\{x_i\}_{i=1}^{M}$ and $\{y_j\}_{j=1}^{N}$. Then solve

$\min_{\{p_{ij} \ge 0\}} \Big\{ \sum_{i=1}^{M} \sum_{j=1}^{N} p_{ij}\, c(x_i, y_j) \;:\; \sum_{i=1}^{M} p_{ij} = \nu(y_j);\ \sum_{j=1}^{N} p_{ij} = \mu(x_i) \Big\}$

[Figure: discrete Kantorovich formulation (Earth Mover’s Distance). Source: S. Kolouri and G. K. Rohde, OT Crash Course]
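A small example may help to see why the relaxation is useful. Take µ to be a single Dirac mass and ν to have two atoms: no map $T$ can push µ to ν, yet a plan that splits the mass satisfies the marginal constraints of the discrete problem. The sketch below only checks feasibility and evaluates the cost; the toy measures and the helper names `marginals`, `is_coupling` and `plan_cost` are assumptions made for illustration.

```python
def marginals(plan):
    """Row and column sums of a plan given as {(i, j): p_ij}."""
    row, col = {}, {}
    for (i, j), p in plan.items():
        row[i] = row.get(i, 0.0) + p
        col[j] = col.get(j, 0.0) + p
    return row, col

def is_coupling(plan, mu, nu, tol=1e-12):
    """Check p_ij >= 0, sum_j p_ij = mu(x_i) and sum_i p_ij = nu(y_j)."""
    row, col = marginals(plan)
    return (
        all(p >= 0 for p in plan.values())
        and all(abs(row.get(i, 0.0) - m) <= tol for i, m in mu.items())
        and all(abs(col.get(j, 0.0) - v) <= tol for j, v in nu.items())
    )

def plan_cost(plan, xs, ys, c):
    """Transport cost sum_ij p_ij c(x_i, y_j)."""
    return sum(p * c(xs[i], ys[j]) for (i, j), p in plan.items())

xs, ys = [0.0], [-1.0, 1.0]           # atoms of mu and nu
mu = {0: 1.0}                         # mu = Dirac mass at 0
nu = {0: 0.5, 1: 0.5}                 # nu = half mass at -1, half at +1

split_plan = {(0, 0): 0.5, (0, 1): 0.5}   # send half of the atom each way
map_like_plan = {(0, 0): 1.0}             # what a map x -> -1 would do

squared = lambda x, y: (x - y) ** 2
```

Because µ is a point mass, `split_plan` is the only coupling here, so it is trivially optimal; for general discrete measures the minimization is a finite linear program.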

A Geometric Approach to Optimal Transportation

µ, ν: two probability measures on $\mathbb{R}^d$; $c(u, x) = \|u - x\|^2$

Monge’s problem [3] (M): $\inf_{T : T_{\#}\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u)$

$T_{\#}\mu$ is the push forward of µ by $T$, i.e., $T_{\#}\mu(B) = \mu(T^{-1}(B))$, $\forall B$

Kantorovich relaxation (K): $\min_{\pi \in \Pi(\mu, \nu)} \int \|u - x\|^2\, d\pi(u, x)$

Compared to the above notions, this approach has the following advantages:

It relies on appealing geometric ideas

It does not require any moment conditions

When d = 1, the optimal transport map is $T = F_\nu^{-1} \circ F_\mu$ irrespective of moment assumptions

[3] Monge’s problem is not meaningful unless µ and ν have finite second moments

$U \sim \mu$: abs. cont. distribution with support $S \subset \mathbb{R}^d$

Example: $\mu = \mathrm{Unif}([0, 1]^d)$ or $\mathrm{Unif}(B_d(0, 1))$

$X \sim \nu$; ν is a given probability measure on $\mathbb{R}^d$

Goal: Find the “optimal” transportation map $T$ s.t. $T_{\#}\mu = \nu$

Theorem [Knott and Smith, Brenier, McCann ...]

There is a µ-a.e. unique measurable mapping $Q : S \to \mathbb{R}^d$, transporting µ to ν (i.e., $Q_{\#}\mu = \nu$ or $Q(U) \sim \nu$), of the form

$Q(u) = \nabla\varphi(u)$, for µ-a.e. $u$,

where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function (cf. when d = 1).

If, in addition, µ, ν have finite second moments, then

(i) $Q(\cdot)$ is the µ-a.e. unique optimal transport map (sol. of (M)), i.e.,

$\inf_{T : T_{\#}\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u) = \int \|u - Q(u)\|^2\, d\mu(u)$;

(ii) the µ-a.e. unique optimal transference plan (sol. of (K)) is $\pi = (\mathrm{id}, Q)_{\#}\mu$

Quantile map when d ≥ 1

µ has an abs. cont. distribution with support $S \subset \mathbb{R}^d$

ν is a given probability measure on $\mathbb{R}^d$ (need not be abs. cont.)

Quantile map

The quantile map of ν (w.r.t. µ) is the µ-a.e. unique map [a]

$Q \equiv \nabla\varphi : S \to \mathbb{R}^d$

where $\nabla\varphi$ pushes µ to ν (i.e., $\nabla\varphi_{\#}\mu = \nu$) and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function.

Restatement: The quantile map $Q$ of ν (w.r.t. µ) is the µ-a.e. unique map that is the gradient of a convex function and pushes µ to ν.

[a] Note that $Q$ is uniquely defined only µ-a.e.; w.l.o.g., let $\varphi(u) = +\infty$ for $u \in S^c$

In the statistics literature this study was initiated by Chernozhukov et al. (2017, AoS); Hallin (2018, AoS, in revision)

Sample quantiles when d = 1

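$\mu = \mathrm{Uniform}([0, 1])$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^{n} \subset \mathbb{R}$

Let $X_{(1)} < \dots < X_{(n)}$ be the order statistics

[Figure: sample quantile function]

Then the sample quantile function $Q_n$ ($Q_{n\#}\mu = \nu_n$) reduces to

$Q_n(u) = X_{(i)}$, if $u \in \left(\frac{i-1}{n}, \frac{i}{n}\right)$, $i = 1, \dots, n$

At $\frac{i}{n}$, $i = 1, \dots, n - 1$, we are free to define

$Q_n\left(\frac{i}{n}\right) \in [X_{(i)}, X_{(i+1)}]$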

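Sample quantiles in $\mathbb{R}^d$, $d \ge 1$

µ abs. cont. with support $S \subset \mathbb{R}^d$; e.g., $\mu = \mathrm{Uniform}([0, 1]^d)$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^{n} \subset \mathbb{R}^d$

$Q_n$ is the transport (Monge) map s.t. $Q_{n\#}\mu = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$ and minimizes (in this case (K) = (M))

$\int \|u - T(u)\|^2\, d\mu(u) = \sum_{i=1}^{n} \int_{\{u \in S : T(u) = X_i\}} \|u - X_i\|^2\, d\mu(u)$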

Computation in the semi-discrete case

Obtain a convex subdivision of $S$: a “partition” $S = \cup_{i=1}^{n} Q_n^{-1}(X_i)$

Top-dimensional cells: convex polyhedral sets in the subdivision of $S$ with non-empty interior

Question: How to compute $Q_n$?

Figure: The data sets are drawn from the following distributions (clockwise, top to bottom): (i) $X \sim N_2((0, 0), I_2)$; (ii) $X \sim N_2((0, 0), \Sigma)$ where $\Sigma_{1,1} = \Sigma_{2,2} = 1$ and $\Sigma_{1,2} = \Sigma_{2,1} = 0.99$; (iii) two spiral structures with Gaussian perturbations (with small variance); and (iv) a mixture of four different distributions.

Rank map

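Quantile map

The quantile map of ν (w.r.t. µ): the µ-a.e. unique map $Q \equiv \nabla\varphi : S \to \mathbb{R}^d$

where $\nabla\varphi_{\#}\mu = \nu$ and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex [a] function

[a] By convention $\varphi(u) = +\infty$ for $u \notin S$

Rank map

The rank map of ν (w.r.t. µ) is defined by

$R \equiv \nabla\varphi^*$

where $\nabla\varphi_{\#}\mu = \nu$ and $\varphi^* : \mathbb{R}^d \to \mathbb{R}$ is the (convex) Legendre-Fenchel dual of $\varphi$:

$\varphi^*(x) := \sup_{u \in \mathbb{R}^d} \{\langle x, u \rangle - \varphi(u)\} = \sup_{u \in S} \{\langle x, u \rangle - \varphi(u)\}$

Note that the rank map $R(\cdot)$ is finite on $\mathbb{R}^d$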

When is $R = Q^{-1}$?

$X \sim \nu$: supported on a convex set $\mathcal{X} \subset \mathbb{R}^d$ with Lebesgue density [4]

$U \sim \mu$: supported on a convex compact set $S \subset \mathbb{R}^d$ with bounded density

Result [Ghosal and S. (2018+)]

Let $Q = \nabla\varphi$ be the quantile function of ν (w.r.t. µ), where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is convex. Then:

(i) The inverse function of $Q$ exists, and has the form

$Q^{-1} = \nabla\varphi^* =: R$,

where $\varphi^*$ is the Legendre-Fenchel dual of $\varphi$.

(ii) $Q$ is a homeomorphism from $\mathrm{Int}(S)$ to $\mathrm{Int}(\mathcal{X})$ (cf. when d = 1 [a])

(iii) $R = Q^{-1}$ is the ν-a.e. unique map that pushes ν to µ (i.e., $R_{\#}\nu = \mu$) and is the gradient of a convex function.

[a] When d = 1: the c.d.f. $F$ is continuous and strictly increasing

[4] With mild boundedness (from below and above) assumptions on the density on $\mathcal{X}$

Properties of the rank/quantile maps

Characterizes the distribution

The quantile and rank functions characterize the associated distribution

Equivariance under orthogonal transformations

Suppose $Y = AX$, where $A$ is a $d \times d$ matrix

$A$ is an orthogonal matrix, i.e., $AA^\top = A^\top A = I_d$

µ: spherically symmetric distribution (e.g., $\mu = \mathrm{Uniform}(B_d(0, 1))$)

Then, $Q_Y(u) = A\, Q_X(A^\top u)$ for µ-a.e. $u$

$R_Y(y) = A\, R_X(A^\top y)$, for a.e. $y \in \mathbb{R}^d$

Quantile/rank maps — equivariant under orthogonal transformations

Under mutual independence

$X = (X_1, X_2, \dots, X_k) \sim \nu$ where $k \ge 2$;

$X_i \sim \nu_i$, for $i = 1, \dots, k$, are r.v. in $\mathbb{R}^{d_i}$ (here $d_1 + \dots + d_k = d$)

$\mu = \mathrm{Uniform}([0, 1]^d)$

Let $Q$ and $Q_i$ be the quantile maps of $X$ and $X_i$, for $i = 1, \dots, k$, respectively (w.r.t. µ and $\mu_i = \mathrm{Uniform}([0, 1]^{d_i})$)

Let $R$ and $R_i$, for $i = 1, \dots, k$, be the corresponding rank maps

Mutual independence [Ghosal and S. (2018+)]

If $X_1, \dots, X_k$ are mutually independent then

$Q(u_1, \dots, u_k) = (Q_1(u_1), \dots, Q_k(u_k))$, for µ-a.e. $(u_1, \dots, u_k)$,

$R(x_1, \dots, x_k) = (R_1(x_1), \dots, R_k(x_k))$, for a.e. $(x_1, \dots, x_k) \in \mathbb{R}^d$.

Sample rank map when d ≥ 1

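$X_1, \dots, X_n \in \mathbb{R}^d$; µ is abs. cont. on a convex set $S \subset \mathbb{R}^d$

Sample quantile function: $Q_n \equiv \nabla\varphi_n$ pushes µ to $\nu = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$

Observe that $\nabla\varphi_n \in \{X_1, \dots, X_n\}$ µ-a.e.

Thus $\varphi_n$ is a piecewise affine convex function:

$\varphi_n(u) = \max_{i = 1, \dots, n} \{\langle X_i, u \rangle + h_i\}$ for $u \in S$, and $\varphi_n(u) = +\infty$ for $u \in S^c$

Sample rank map

The sample rank map is defined as

$R_n = \nabla\varphi_n^*$

where $\varphi_n^* : \mathbb{R}^d \to \mathbb{R}$ is also convex piecewise affine:

$\varphi_n^*(x) = \sup_{u \in S} \{\langle x, u \rangle - \varphi_n(u)\}$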
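In d = 1 with µ = Unif([0, 1]) the offsets $h_i$ can be written down by hand, which makes the max-of-affine form concrete: the lines $u \mapsto X_{(i)} u + h_i$ must cross at the breakpoints $u = i/n$. The sketch below is an illustration under that assumption only; the names `potential_offsets` and `sample_quantile` are hypothetical, and for d > 1 the offsets have to be found numerically.

```python
import random

def potential_offsets(order_stats):
    """Offsets h_i such that the lines u -> X_(i) u + h_i, i = 1..n,
    cross at the breakpoints u = i/n (d = 1, mu = Unif([0, 1]))."""
    n = len(order_stats)
    h = [0.0]
    for i in range(1, n):
        # Lines i and i+1 (1-based) must agree at u = i/n.
        h.append(h[-1] - (i / n) * (order_stats[i] - order_stats[i - 1]))
    return h

def sample_quantile(u, order_stats, h):
    """Q_n(u) = gradient of phi_n at u = slope of the maximizing line."""
    values = [x * u + hi for x, hi in zip(order_stats, h)]
    return order_stats[values.index(max(values))]

rng = random.Random(3)
xs = sorted(rng.gauss(0.0, 1.0) for _ in range(5))
h = potential_offsets(xs)

# Evaluate Q_n at the midpoint of each cell ((k)/n, (k+1)/n), k = 0..n-1.
midpoint_quantiles = [sample_quantile((k + 0.5) / 5, xs, h) for k in range(5)]
```

Each cell midpoint is mapped to the corresponding order statistic, so every cell has µ-mass 1/n as required.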

How to define the sample ranks $R_n(X_i)$?

The sample rank map is $R_n = \nabla\varphi_n^*$ (Recall: $Q_n = \nabla\varphi_n$)

But $\varphi_n^*$ is not differentiable at $X_i$, $i = 1, \dots, n$

Fact: $u \in \partial\varphi_n^*(X_i) \Leftrightarrow X_i \in \partial\varphi_n(u) = Q_n(u)$

Result: $R_n(X_i) \in \partial\varphi_n^*(X_i)$, which contains the top-dimensional cell $Q_n^{-1}(X_i)$!

The sample ranks $R_n(X_i)$ when d = 1

The sample rank map: $R_n(x) = \frac{i}{n}$, if $x \in (X_{(i)}, X_{(i+1)})$

Free to define $R_n(X_{(i)})$ as anything in the interval $[(i - 1)/n, i/n]$

Usual ranks when d = 1

The sample DF (rank): $F_n(x) = \frac{i}{n}$, if $x \in [X_{(i)}, X_{(i+1)})$

Convention: We can define $R_n(X_i) = \max_{u \in \mathrm{Cl}(Q_n^{-1}(X_i))} \|u\|$

Distribution-free multivariate ranks $R_n(X_i)$

When d = 1, the ranks $\{F_n(X_i)\}_{i=1}^{n}$ are identically distributed (on $\{1/n, 2/n, \dots, n/n\}$ with probability $1/n$ each)

We define $R_n(X_i)$ as a random point drawn from the uniform distribution on the cell $Q_n^{-1}(X_i)$, i.e.,

$R_n(X_i) \mid X_1, \dots, X_n \sim \mathrm{Uniform}(Q_n^{-1}(X_i))$

Result: If $\mu = \mathrm{Uniform}(S)$, then $R_n(X_i) \sim \mu = \mathrm{Uniform}(S)$

Compare: $R(X_i) \sim \mu = \mathrm{Uniform}(S)$, where $R$ is the population rank map
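The distribution-free property can be checked by simulation in d = 1, where the cell of the k-th order statistic is the interval ((k-1)/n, k/n). The sketch below is an illustration only: the lognormal data, n = 10, the number of replications and the function name `randomized_ranks` are assumptions made here.

```python
import random

def randomized_ranks(xs, rng):
    """Draw R_n(X_i) ~ Uniform((k - 1)/n, k/n), where X_i is the k-th
    order statistic; in d = 1 this interval is the cell Q_n^{-1}(X_i)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    ranks = [0.0] * n
    for k, i in enumerate(order):          # k = 0, ..., n - 1
        ranks[i] = (k + rng.random()) / n
    return ranks

rng = random.Random(4)
first_ranks = []
for _ in range(4000):
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(10)]
    first_ranks.append(randomized_ranks(sample, rng)[0])   # rank of X_1

# If R_n(X_1) ~ Unif([0, 1]), these should be near 0.5 and 0.3.
mean_rank = sum(first_ranks) / len(first_ranks)
frac_below_030 = sum(r <= 0.3 for r in first_ranks) / len(first_ranks)
```

Replacing the lognormal by any other continuous distribution should leave both summaries essentially unchanged, which is the point of the result.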

Glivenko-Cantelli type result [Ghosal & S. (2018+)]

Let $X_1, X_2, \dots \in \mathbb{R}^d$ be i.i.d. ν, where ν is abs. cont. with support $\mathcal{X}$

Take $\mu = \mathrm{Uniform}(B_d(0, 1))$

Let $Q$ and $R$ be the quantile and rank maps of ν (w.r.t. µ)

Suppose $Q = \nabla\varphi$ where $\nabla\varphi : \mathrm{Int}(S) \to \mathrm{Int}(\mathcal{X})$ is a homeomorphism

Let $Q_n$ and $R_n$ be any sample quantile and rank functions

Let $K_1 \subset \mathrm{Int}(S)$ be a compact set. Then, we have

$\sup_{u \in K_1} \|Q_n(u) - Q(u)\| \xrightarrow{a.s.} 0$ and $\sup_{x \in \mathbb{R}^d} \|R_n(x) - R(x)\| \xrightarrow{a.s.} 0$

Generalizes the G-C result in Chernozhukov et al. (2017, AoS), which:
(i) assumed that ν is compactly supported;
(ii) showed uniform convergence of $R_n$ only on compacts inside $\mathrm{Int}(\mathcal{X})$;
(iii) showed convergence in probability.

Two-sample Testing

Suppose that $X_1, \dots, X_m$ are i.i.d. $\nu_X$ (abs. cont.) on $\mathbb{R}^d$

Suppose that $Y_1, \dots, Y_n$ are i.i.d. $\nu_Y$ (abs. cont.) on $\mathbb{R}^d$

µ: distribution on $S \subset \mathbb{R}^d$ (e.g., $\mu = \mathrm{Unif}([0, 1]^d)$)

Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \ne \nu_Y$

Quantile maps

$\hat{Q}_X$ and $\hat{Q}_Y$ are the sample quantile maps for the $X_i$’s and $Y_j$’s

Population quantile maps: $Q_X$ and $Q_Y$

Recall: $Q_{X\#}\mu = \nu_X$ and $Q_{Y\#}\mu = \nu_Y$

Joint rank map: $\hat{R}_{m,n}$ is the rank map (properly defined) of the combined sample $X_1, \dots, X_m, Y_1, \dots, Y_n$

Test statistic:

$T_{m,n} := \int_S \big\| \hat{R}_{m,n}(\hat{Q}_X(u)) - \hat{R}_{m,n}(\hat{Q}_Y(u)) \big\|^2\, d\mu(u)$

Motivation: One-sample Cramér-von Mises statistic

$\int \{F_n(x) - F(x)\}^2\, dF(x) = \int_0^1 \{F_n(F^{-1}(u)) - u\}^2\, du$

When d = 1, $T_{m,n}$ is distribution-free!

Question: Is $T_{m,n}$ (asymptotically) distribution-free when d > 1?

Critical value: Can always be computed by a permutation test

Theorem [Ghosal and S. (2018+)]

Suppose that $\frac{m}{m+n} \to \theta \in (0, 1)$ as $m, n \to \infty$. Under $H_0 : \nu_X = \nu_Y$,

$T_{m,n} \xrightarrow{P} 0$, as $m, n \to \infty$.

Further, for $\nu_X \ne \nu_Y$ (and under mild regularity conditions on $\nu_X$ and $\nu_Y$),

$T_{m,n} \xrightarrow{P} c > 0$ as $m, n \to \infty$.
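A d = 1 version of the statistic and its permutation calibration can be sketched as follows. Everything specific here is an assumption for illustration: µ = Unif([0, 1]), the pooled empirical c.d.f. as the joint rank map, a midpoint Riemann sum for the integral, and the names `sample_quantile`, `two_sample_stat` and `permutation_p_value`. It is not the multivariate procedure of the talk, which requires computing semi-discrete transport maps.

```python
import bisect
import math
import random

def sample_quantile(sorted_xs, u):
    """Q_n(u) = X_(ceil(n u)) for u in (0, 1]."""
    return sorted_xs[max(math.ceil(len(sorted_xs) * u), 1) - 1]

def two_sample_stat(xs, ys, grid_size=200):
    """Riemann-sum approximation of
    T = int_0^1 (R(Q_X(u)) - R(Q_Y(u)))^2 du  with mu = Unif([0, 1]),
    where R is the empirical c.d.f. of the pooled sample."""
    sx, sy = sorted(xs), sorted(ys)
    pooled = sorted(xs + ys)
    rank = lambda t: bisect.bisect_right(pooled, t) / len(pooled)
    total = 0.0
    for k in range(grid_size):
        u = (k + 0.5) / grid_size
        total += (rank(sample_quantile(sx, u)) - rank(sample_quantile(sy, u))) ** 2
    return total / grid_size

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Share of random relabellings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = two_sample_stat(xs, ys)
    pooled, m = xs + ys, len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += two_sample_stat(pooled[:m], pooled[m:]) >= observed
    return (1 + hits) / (1 + n_perm)

rng = random.Random(5)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)]
ys_same = [rng.gauss(0.0, 1.0) for _ in range(30)]
ys_shifted = [rng.gauss(3.0, 1.0) for _ in range(30)]

p_shifted = permutation_p_value(xs, ys_shifted)
```

The permutation step relies only on exchangeability of the pooled sample under $H_0$, so the same calibration would apply to any other choice of statistic.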

Independence Testing

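$(X_1, Y_1), \dots, (X_n, Y_n)$ are i.i.d. ν (abs. cont.) on $\mathbb{R}^{d_X} \times \mathbb{R}^{d_Y}$; $d_X + d_Y = d$

Goal: Test $H_0 : X \perp\!\!\!\perp Y$ versus $H_1 : X \not\perp\!\!\!\perp Y$

$\mu_X = \mathrm{Unif}([0, 1]^{d_X})$, $\mu_Y = \mathrm{Unif}([0, 1]^{d_Y})$

$\mu = \mu_X \times \mu_Y = \mathrm{Unif}([0, 1]^d)$

$\hat{R}_n : \mathbb{R}^d \to \mathbb{R}^d$: rank map of the joint sample $(X_1, Y_1), \dots, (X_n, Y_n)$

$\hat{Q}_n$: sample quantile map of the joint sample $(X_1, Y_1), \dots, (X_n, Y_n)$

$\hat{R}^X_n : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$: rank map of $X_1, \dots, X_n$; similarly $\hat{R}^Y_n$ is the rank map of $Y_1, \dots, Y_n$

Define $\tilde{R}_n := (\hat{R}^X_n, \hat{R}^Y_n) : \mathbb{R}^d \to [0, 1]^d$

Test statistic: $T_n := \int_S \big\| \hat{R}_n(\hat{Q}_n(u)) - \tilde{R}_n(\hat{Q}_n(u)) \big\|^2\, d\mu(u)$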

Question: Is $T_n$ (asymptotically) distribution-free for d > 1?

Critical value: Can always be computed by the permutation principle

Under $H_0 : X \perp\!\!\!\perp Y$,

$T_n \xrightarrow{P} 0$, as $n \to \infty$.

Further, if $X \not\perp\!\!\!\perp Y$ (and under mild regularity conditions),

$T_n \xrightarrow{P} c > 0$, as $n \to \infty$.

Future research

Construct finite sample distribution-free goodness-of-fit tests

Power study of these testing procedures

Comparison with other methods; e.g., RKHS methods, energy distance methods, mutual information, etc.

Estimation of the “center” of the data cloud

Sample “median” $Q_n(0)$ when $\mu = \mathrm{Uniform}(B_d(0, 1))$

We can show $Q_n(0) \xrightarrow{P} Q(0)$. What about the rate of convergence?

What is the limiting distribution?

What about other sample quantiles?

Thank you very much!

Questions?

Kantorovich duality for general cost functions

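$\pi \in \Pi(\mu, \nu)$: all probability distributions on $\mathcal{X} \times \mathcal{Y}$ with marginals µ and ν

Kantorovich duality

Let $I(\pi) := \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$ where $c(\cdot, \cdot) \ge 0$ is l.s.c. Then,

$\inf_{\pi \in \Pi(\mu, \nu)} I(\pi) = \sup_{(\phi, \psi) \in \Phi_c} \Big\{ \int_{\mathcal{X}} \phi(x)\, d\mu(x) + \int_{\mathcal{Y}} \psi(y)\, d\nu(y) \Big\}$

where $\Phi_c$ is the set of all measurable functions $(\phi, \psi) \in L^1(\mu) \times L^1(\nu)$ satisfying

$\phi(x) + \psi(y) \le c(x, y)$ for µ-a.e. $x \in \mathcal{X}$, ν-a.e. $y \in \mathcal{Y}$.

In fact, one can restrict to functions in $\Phi_c$ that are bounded and continuous.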

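Duality when $c(x, y) = \|x - y\|^2$ and $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$

µ, ν: probability measures on $\mathbb{R}^d$ with finite second moments

$\pi \in \Pi(\mu, \nu)$: all prob. measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals µ and ν

Let $M_2 = \int_{\mathbb{R}^d} \|x\|^2\, d\mu(x) + \int_{\mathbb{R}^d} \|y\|^2\, d\nu(y) < +\infty$

Then, $\inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2\, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle\, d\pi(x, y)$

Kantorovich duality

For pairs $(\phi, \phi^*)$ of l.s.c. proper conjugate convex functions (so that $\langle x, y \rangle \le \phi(x) + \phi^*(y)$ for a.e. $x, y \in \mathbb{R}^d$), we have

$\sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle\, d\pi(x, y) = \inf_{(\phi, \phi^*)} \Big\{ \int \phi(x)\, d\mu(x) + \int \phi^*(y)\, d\nu(y) \Big\}$

Result: Suppose that $(\varphi, \varphi^*)$ solves the dual problem above. If µ is abs. cont., then

(i) the unique optimal transference plan is $\pi = (\mathrm{id}, \nabla\varphi)_{\#}\mu$;
(ii) $\nabla\varphi$ is the unique solution to the Monge problem:

$\int \|x - \nabla\varphi(x)\|^2\, d\mu(x) = \inf_{T : T_{\#}\mu = \nu} \int \|x - T(x)\|^2\, d\mu(x)$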

Proof: Step 1

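Let $J(\phi, \psi) := \int_{\mathbb{R}^d} \phi(x)\, d\mu(x) + \int_{\mathbb{R}^d} \psi(y)\, d\nu(y)$

Duality: $\inf_{\pi \in \Pi(\mu, \nu)} I(\pi) = \sup_{(\phi, \psi) \in \Phi} J(\phi, \psi)$

where $(\phi, \psi) \in \Phi$ iff for a.a. $x, y \in \mathbb{R}^d$, $\phi(x) + \psi(y) \le \|x - y\|^2$

Simple algebra yields: $\langle x, y \rangle \le \frac{\|x\|^2 - \phi(x)}{2} + \frac{\|y\|^2 - \psi(y)}{2}$

Define: $\tilde{\phi}(x) = \frac{\|x\|^2 - \phi(x)}{2}$, $\tilde{\psi}(y) = \frac{\|y\|^2 - \psi(y)}{2}$

$\inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2\, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle\, d\pi(x, y)$

$\sup_{(\phi, \psi) \in \Phi} J(\phi, \psi) = M_2 - 2 \inf_{(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$

where $(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}$ iff for a.a. $x, y$, $\langle x, y \rangle \le \tilde{\phi}(x) + \tilde{\psi}(y)$

Thus, $\sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle\, d\pi(x, y) = \inf_{(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$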
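The first identity used in Step 1, relating the quadratic cost to the inner-product problem through $M_2$, can be checked numerically on two small empirical measures. The sketch below makes several assumptions for illustration: uniform weights on five atoms each in the plane, Gaussian toy data, and a search restricted to one-to-one pairings (permutations) rather than all couplings.

```python
import itertools
import random

def dot(a, b):
    """Euclidean inner product <a, b>."""
    return sum(ai * bi for ai, bi in zip(a, b))

def sq_dist(a, b):
    """Squared Euclidean distance |a - b|^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

rng = random.Random(6)
n = 5
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
ys = [(rng.gauss(1, 1), rng.gauss(-1, 2)) for _ in range(n)]

# M2 = int |x|^2 dmu + int |y|^2 dnu for the two empirical measures.
M2 = sum(dot(x, x) for x in xs) / n + sum(dot(y, y) for y in ys) / n

# Search over all n! pairings i -> p[i].
perms = list(itertools.permutations(range(n)))
min_cost = min(sum(sq_dist(xs[i], ys[p[i]]) for i in range(n)) / n for p in perms)
max_corr = max(sum(dot(xs[i], ys[p[i]]) for i in range(n)) / n for p in perms)
```

Since $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y \rangle$ holds pointwise, `min_cost` equals `M2 - 2 * max_corr` up to rounding, and the same pairing attains both optima.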

Proof: Step 2

Double convexification trick to improve the admissible pairs in the dual problem

Semi-discrete OT

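Data: $X_1, \dots, X_n$ in $\mathbb{R}^d$; $\nu = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$

µ: an abs. cont. distribution on a compact convex set $S \subset \mathbb{R}^d$

The dual Kantorovich problem in this setting can be written as:

$\inf_{\psi \text{ convex}} \Big\{ \int \psi^*(x)\, d\mu(x) + \int \psi(y)\, d\nu(y) \Big\}$

Letting $\psi_i = \psi(X_i)$, the above minimization problem reduces to

$(M) = \inf_{(\psi_1, \dots, \psi_n) \in \mathbb{R}^n} \Big\{ \int_S \sup_{1 \le i \le n} \{\langle X_i, x \rangle - \psi_i\}\, d\mu(x) + \frac{1}{n} \sum_{i=1}^{n} \psi_i \Big\}$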
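One simple way to attack this finite-dimensional problem is gradient descent: the partial derivative of the objective in $\psi_i$ is $1/n$ minus the µ-mass of the cell on which index $i$ attains the supremum. The sketch below is a toy illustration under assumptions made here (d = 1, µ = Unif([0, 1]) approximated by a grid, three data points, a fixed step size, and the hypothetical names `cell_masses` and `solve_dual`); it is not the algorithm used in the talk.

```python
def cell_masses(points, psi, grid):
    """mu-mass of each cell {u : i = argmax_j (<X_j, u> - psi_j)},
    with mu = Unif([0, 1]) approximated by equally weighted grid points."""
    counts = [0] * len(points)
    for u in grid:
        scores = [x * u - p for x, p in zip(points, psi)]
        counts[scores.index(max(scores))] += 1
    return [c / len(grid) for c in counts]

def solve_dual(points, n_grid=600, step=0.2, n_iter=300):
    """Gradient descent on psi -> int max_i(<X_i, u> - psi_i) dmu(u) + mean(psi).
    The partial derivative in psi_i is 1/n - mu(cell_i)."""
    n = len(points)
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    psi = [0.0] * n
    for _ in range(n_iter):
        masses = cell_masses(points, psi, grid)
        psi = [p - step * (1.0 / n - m) for p, m in zip(psi, masses)]
    return psi, cell_masses(points, psi, grid)

points = [0.0, 1.0, 3.0]          # toy data X_1, X_2, X_3 in d = 1
psi, masses = solve_dual(points)  # each cell should end up with mass near 1/3
```

At the optimum every cell carries mass 1/n, so the induced map sends the lowest third of [0, 1] to the smallest point, and so on. In higher dimensions the cells are convex polyhedra and computing their masses is the expensive step.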

Some facts about convex functions

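Given a convex function $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ we define the subdifferential of $f$ at $x \in \mathrm{Dom}(f)$ by

$\partial f(x) := \{\xi \in \mathbb{R}^d : f(x) + \langle y - x, \xi \rangle \le f(y), \text{ for all } y \in \mathbb{R}^d\}$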

Any element in ∂f (x) is called a subgradient of f at x

If f is differentiable at x then ∂f (x) = {∇f (x)}

A convex function (in Rd) is a.e. differentiable

Legendre-Fenchel dual

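Let $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$. The convex conjugate $\phi^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ of $\phi$ is defined as

$\phi^*(x) := \sup_{y \in \mathbb{R}^d} \{\langle x, y \rangle - \phi(y)\}$

Lemma (Characterization of subdifferential)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper (i.e., $f(x) < \infty$ for some $x \in \mathbb{R}^d$) l.s.c. convex function. Then for all $x, y \in \mathbb{R}^d$,

$\langle x, y \rangle = f(x) + f^*(y) \iff y \in \partial f(x) \iff x \in \partial f^*(y)$.

Lemma (Legendre duality)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper function. Then the following three properties are equivalent: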

(i) f is convex l.s.c. function;

(ii) f = ψ∗ for some proper function ψ;

(iii) f ∗∗ = f .
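The conjugate and the biconjugate can be approximated on a grid, which gives a quick numerical feel for the Legendre duality lemma. The sketch below works in d = 1 and restricts all suprema to a grid on [-5, 5]; the grid, the two test functions and the name `conjugate` are assumptions made for illustration.

```python
def conjugate(f, grid):
    """Discrete Legendre-Fenchel transform: y -> max over the grid of x*y - f(x)."""
    return lambda y: max(x * y - f(x) for x in grid)

grid = [k / 20 for k in range(-100, 101)]        # x in [-5, 5], step 0.05

f = lambda x: 0.5 * x * x                        # convex and l.s.c.
f_star = conjugate(f, grid)                      # y^2 / 2 for y on the grid
f_star_star = conjugate(f_star, grid)            # recovers f on the grid

g = lambda x: min((x - 1) ** 2, (x + 1) ** 2)    # not convex
g_star_star = conjugate(conjugate(g, grid), grid)  # convex envelope of g
```

For the convex function the biconjugate reproduces `f`, while for the non-convex `g` it returns the convex envelope, which is strictly below `g` at 0.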

When is (K) = (M)? (with $c(x, y) = \|x - y\|^2$)

Discrete case

Suppose that µ and ν are supported on $\{x_i\}_{i=1}^{M}$ and $\{y_j\}_{j=1}^{N}$

Kantorovich’s problem (K):

$\min_{\{p_{ij} \ge 0\}} \Big\{ \sum_{i=1}^{M} \sum_{j=1}^{N} p_{ij}\, c(x_i, y_j) \;:\; \sum_{i=1}^{M} p_{ij} = \nu(y_j);\ \sum_{j=1}^{N} p_{ij} = \mu(x_i) \Big\}$

Monge’s problem (M): $\min_{T : T_{\#}\mu = \nu} \sum_{i=1}^{M} \|x_i - T(x_i)\|^2 \mu(x_i)$

In general, (K) $\ne$ (M)

If $M = N$ and

$\mu(x_i) = \nu(y_j) = \frac{1}{N}$ for all $i, j$,

then the optimal transference plan in the (K) problem coincides with the solution of the (M) problem

Absolutely continuous case

µ and ν are absolutely continuous

Then there is a unique solution to the (K) problem, which turns out to be also the solution of the (M) problem

Semi-discrete case

ν supported on a finite set $\{y_j\}_{j=1}^{N}$; µ abs. cont. with support $S \subset \mathbb{R}^d$

Monge’s problem: Find $T$ s.t. $T_{\#}\mu = \sum_{j=1}^{N} \nu_j \delta_{y_j}$ and minimizes

$\int \|x - T(x)\|^2\, d\mu(x) = \sum_{j=1}^{N} \int_{\{x : T(x) = y_j\}} \|x - y_j\|^2\, d\mu(x)$

Note that: $\mu(T^{-1}(y_j)) = \mu(\{x : T(x) = y_j\}) = \nu_j$

Then there is a unique solution to the (K) problem, which turns out to be also the solution of the (M) problem
