Multivariate Quantiles and Ranks using Optimal Transportation

Multivariate Quantiles and Ranks using Optimal Transportation Bodhisattva Sen 1 Department of Statistics Columbia University, New York Department of Statistics George Mason University Joint work with Promit Ghosal (Columbia University) 05 April, 2019 1 Supported by NSF grants DMS-1712822 and AST-1614743


Page 1: Multivariate Quantiles and Ranks using Optimal Transportation

Multivariate Quantiles and Ranks using Optimal Transportation

Bodhisattva Sen [1]

Department of Statistics, Columbia University, New York

Department of Statistics, George Mason University

Joint work with Promit Ghosal (Columbia University)

05 April, 2019

[1] Supported by NSF grants DMS-1712822 and AST-1614743

Page 2: Multivariate Quantiles and Ranks using Optimal Transportation

How to define ranks and quantiles in $\mathbb{R}^d$, d > 1?

Ranks and quantiles when d = 1

X is a random variable with c.d.f. F

Rank: The rank of $x \in \mathbb{R}$ is $F(x)$

Property: If F is continuous, $F(X) \sim \mathrm{Unif}([0, 1])$

Quantile: The quantile function is $F^{-1}$

Property: If F is continuous, $F^{-1}(U) \sim F$ where $U \sim \mathrm{Unif}([0, 1])$
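
The two properties above can be checked numerically. The sketch below is only an illustration: the choice of the Exp(1) distribution, the sample size, and the names `cdf` and `quantile` are assumptions made here, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def cdf(x):
    # Rank map: F(x) = 1 - exp(-x) is the c.d.f. of Exp(1) (assumed example).
    return 1.0 - np.exp(-x)

def quantile(u):
    # Quantile map: F^{-1}(u) = -log(1 - u).
    return -np.log1p(-u)

x = rng.exponential(scale=1.0, size=n)
ranks = cdf(x)                      # F(X) should look like Unif([0, 1])

u = rng.uniform(size=n)
draws = quantile(u)                 # F^{-1}(U) should look like Exp(1)

print(ranks.mean(), ranks.var())    # roughly 1/2 and 1/12
print(draws.mean(), draws.var())    # roughly 1 and 1
```

Comparing the first two moments is a weak check; a histogram or a Kolmogorov-Smirnov test would be a stricter one.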

Page 5: Multivariate Quantiles and Ranks using Optimal Transportation

Defining quantiles, ranks, depth, etc. is difficult when d > 1

Lack of a natural ordering in $\mathbb{R}^d$ when d > 1

Many notions of multivariate quantiles/ranks have been suggested:

Puri and Sen (1971), Chaudhuri and Sengupta (1993), Mottonen and Oja (1995), Chaudhuri (1996), Liu and Singh (1993), Serfling (2010) ...

Spatial median and geometric quantile

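Spatial median: $M := \arg\min_{m \in \mathbb{R}^d} E\|X - m\|$

Quantile when d = 1: For $u \in (0, 1)$,

$F^{-1}(u) = \arg\min_{x \in \mathbb{R}} E\big[\,|X - x| - (2u - 1)x\,\big]$

Geometric quantile [Chaudhuri (1996)]: For $\|u\| < 1$, let

$Q(u) := \arg\min_{x \in \mathbb{R}^d} E\big[\,\|X - x\| - \langle u, x \rangle\,\big]$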
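
As a sanity check on the d = 1 formula, one can minimize the empirical version of the objective and compare the result with an ordinary sample quantile. This is a minimal sketch on an assumed Gaussian sample; the helper names `objective` and `argmin_quantile` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2001)           # assumed sample

def objective(candidate, sample, u):
    # Empirical version of E[|X - x| - (2u - 1) x].
    return np.mean(np.abs(sample - candidate)) - (2.0 * u - 1.0) * candidate

def argmin_quantile(sample, u):
    # The objective is convex and piecewise linear in x, so a minimizer
    # can be found among the data points themselves.
    values = np.array([objective(c, sample, u) for c in sample])
    return sample[np.argmin(values)]

for u in (0.1, 0.5, 0.9):
    print(u, argmin_quantile(x, u), np.quantile(x, u))
```

The two columns should agree up to the spacing between neighbouring order statistics, since `np.quantile` interpolates while the minimizer is always a data point.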

Page 10: Multivariate Quantiles and Ranks using Optimal Transportation

Outline

1 Introduction to Optimal Transportation
    Monge’s Problem
    Kantorovich Relaxation: Primal Problem
    A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ (d ≥ 1)

3 Some Applications in Statistics
    Two-sample Goodness-of-fit Testing
    Independence Testing

Page 12: Multivariate Quantiles and Ranks using Optimal Transportation

Gaspard Monge (1781): What is the cheapest way to transport a pile of sand to cover a sinkhole?

[Figure: the Monge problem, moving a pile of sand to cover a sinkhole; slide from Blanchet (Columbia U. and Stanford U.)]

Goal: $\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))]$

$\mu$ (on $\mathcal{X}$) and $\nu$ (on $\mathcal{Y}$) are probability measures: $\int_{\mathcal{X}} d\mu(x) = \int_{\mathcal{Y}} d\nu(y) = 1$

$c(x, y) \ge 0$: cost of transporting x to y (e.g., $c(x, y) = \|x - y\|^p$)

T transports $\mu$ to $\nu$, i.e., $T(X) \sim \nu$ where $X \sim \mu$, or,

$\nu(B) = \mu(T^{-1}(B)) = \int_{T^{-1}(B)} d\mu$, $B \subset \mathcal{Y}$

Page 14: Multivariate Quantiles and Ranks using Optimal Transportation

One-dimensional optimal transport

Suppose $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu$, $\nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s

Goals: (i) Transport $\mu$ to $\nu$; i.e., find T s.t. if $X \sim \mu$ then $T(X) \sim \nu$

(ii) T minimizes cost $E_\mu[(X - T(X))^2]$; assume $c(x, y) = (x - y)^2$

Figure 3: Two densities p and q and the optimal transport map that morphs p into q.

where $p \ge 1$. When p = 1 this is also called the Earth Mover distance. The minimizer $J^*$ (which does exist) is called the optimal transport plan or the optimal coupling. In case there is an optimal transport map T then J is a singular measure with all its mass on the set $\{(x, T(x))\}$.

It can be shown that

$W_p^p(P, Q) = \sup_{\psi, \phi} \int \psi(y)\, dQ(y) - \int \phi(x)\, dP(x)$

where $\psi(y) - \phi(x) \le \|x - y\|^p$. This is called the dual formulation. In the special case where p = 1 we have the very simple representation

$W_1(P, Q) = \sup \left\{ \int f(x)\, dP(x) - \int f(x)\, dQ(x) : f \in \mathcal{F} \right\}$

where $\mathcal{F}$ denotes all maps from $\mathbb{R}^d$ to $\mathbb{R}$ such that $|f(y) - f(x)| \le \|x - y\|$ for all x, y.

When d = 1, the distance has a closed form:

$W_p(P, Q) = \left( \int_0^1 |F^{-1}(z) - G^{-1}(z)|^p\, dz \right)^{1/p}$

Page 16: Multivariate Quantiles and Ranks using Optimal Transportation

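The minimizing T must satisfy (Why? Heuristically, otherwise exchanging the images of $x_0$ and $x_1$ would lower the cost)

$(x_0 - T(x_0))^2 + (x_1 - T(x_1))^2 \le (x_0 - T(x_1))^2 + (x_1 - T(x_0))^2$

Expanding the squares, this is equivalent to $(x_1 - x_0)(T(x_1) - T(x_0)) \ge 0$, so if $x_1 > x_0$ then $T(x_1) \ge T(x_0)$

So T must be a monotone nondecreasing function

Therefore, choose $T(\cdot)$ so that (recall: $\nu(B) = \int_{T^{-1}(B)} d\mu$)

$\int_{-\infty}^{x} d\mu(t) = \int_{-\infty}^{T(x)} d\nu(y) \;\Rightarrow\; F_\mu(x) = F_\nu(T(x))$

Thus, $T = F_\nu^{-1} \circ F_\mu$ (and this map T is unique)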
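
For two samples of equal size, the empirical counterpart of the monotone map sends the k-th smallest point of one sample to the k-th smallest point of the other. The sketch below illustrates this under assumed distributions (standard normal and Exp(1)); `monotone_map` is a hypothetical helper name, and the comparison with a random matching is only a spot check, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(loc=0.0, scale=1.0, size=n)   # assumed sample from mu
y = rng.exponential(scale=1.0, size=n)       # assumed sample from nu

def monotone_map(x, y):
    # Empirical version of T = F_nu^{-1} o F_mu: send the k-th smallest
    # x to the k-th smallest y.
    ranks = np.argsort(np.argsort(x))        # rank of each x_i, from 0 to n - 1
    return np.sort(y)[ranks]

t_x = monotone_map(x, y)
cost_monotone = np.mean((x - t_x) ** 2)

# Any other one-to-one matching should cost at least as much.
perm = rng.permutation(n)
cost_random = np.mean((x - y[perm]) ** 2)

print(cost_monotone, cost_random)
```

Sorting makes the map nondecreasing by construction, which is exactly the property derived from the swap inequality.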

Page 19: Multivariate Quantiles and Ranks using Optimal Transportation

Optimal transportation when d = 1

$\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu$, $\nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s

Goals: (i) Transport $\mu$ to $\nu$; i.e., find T s.t. if $X \sim \mu$ then $T(X) \sim \nu$

(ii) T minimizes cost $E_\mu[(X - T(X))^2]$

Solution: $T = F_\nu^{-1} \circ F_\mu$ (and this map T is unique)

Ranks and Quantiles when d = 1

When $\mu = \mathrm{Unif}([0, 1])$, $T = F_\nu^{-1}$ transports $\mu$ to $\nu$: the quantile map

When $\nu = \mathrm{Unif}([0, 1])$, $T = F_\mu$: the rank map

Thus, when d = 1, the rank and quantile maps are solutions to the optimal transport problem

How to do this in higher dimensions, e.g., when $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$, d > 1?

Page 25: Multivariate Quantiles and Ranks using Optimal Transportation

Monge’s problem: Given probability measures $\mu$ and $\nu$ solve:

$\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))] = \inf_{T : T_\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$   (1)

where $T(X) \sim \nu$ iff $\mu(T^{-1}(B)) = \nu(B)$, for all Borel sets B

Drawbacks

The above optimization problem is highly non-linear and can be ill-posed

No admissible T may exist; e.g., if $\mu$ is a Dirac delta and $\nu$ is not

Moreover, the infimum in (1) may not be attained, i.e., a limit of transport maps $\{T_i\}_{i \ge 1}$ may fail to be a transport map

Solution need not be unique (book shifting example)

Not much progress was made for about 160 yrs!

Page 28: Multivariate Quantiles and Ranks using Optimal Transportation

Kantorovich Relaxation: Primal Problem

Monge’s problem (M): $\inf_{T : T_\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$

Let $\Pi(\mu, \nu)$ be the class of joint distributions of $(X, Y) \sim \pi$ s.t.

$\pi_X$ = marginal of X = $\mu$, $\pi_Y$ = marginal of Y = $\nu$

Kantorovich relaxation (K): Solve

$\min_{\pi \in \Pi(\mu, \nu)} E_\pi[c(X, Y)] = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$

Always has a solution for $c(\cdot, \cdot) \ge 0$ lower semicontinuous (l.s.c.) [2]

Linear program (infinite dimensional)

Result: If $\mu$ is abs. cont. then (K) = (M) and $\pi_{opt} = (\mathrm{id}, T_{opt})_\#\mu$

Computation: Kantorovich Dual Problem

[2] A function $\varphi : \mathbb{R}^d \to \mathbb{R}$ is l.s.c. at $x_0$ iff $\liminf_{x \to x_0} \varphi(x) \ge \varphi(x_0)$

Page 31: Multivariate Quantiles and Ranks using Optimal Transportation

When µ and ν are discrete

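Kantorovich relaxation: Solve $\min\{E_\pi[c(X, Y)] : \pi \in \Pi(\mu, \nu)\}$

Discrete version: $\mu$ and $\nu$ supported on $\{x_i\}_{i=1}^M$ and $\{y_j\}_{j=1}^N$. Then

$\min_{\{p_{ij} \ge 0\}} \sum_{i=1}^M \sum_{j=1}^N p_{ij}\, c(x_i, y_j)$ subject to $\sum_{i=1}^M p_{ij} = \nu(y_j)$, $\sum_{j=1}^N p_{ij} = \mu(x_i)$

Discrete Kantorovich formulation (Earth Mover’s Distance)

Let $\mu = \sum_{i=1}^N p_i \delta_{x_i}$ and $\nu = \sum_{j=1}^M q_j \delta_{y_j}$, where $\delta_{x_i}$ is a Dirac measure,

$K(\mu, \nu) = \min_\gamma \sum_i \sum_j c(x_i, y_j)\, \gamma_{ij}$ s.t. $\sum_j \gamma_{ij} = p_i$, $\sum_i \gamma_{ij} = q_j$, $\gamma_{ij} \ge 0$   (7)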

S. Kolouri and G. K. Rohde OT Crash Course
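
The discrete problem is a finite linear program, so a generic LP solver can handle small instances. The sketch below uses SciPy's `linprog` on a toy example; the support points, the weights, and the squared-distance cost are assumptions chosen for illustration, and a dedicated optimal transport solver would be preferable for anything large.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: mu on {0, 1}, nu on {0, 1, 2}, squared-distance cost.
x = np.array([0.0, 1.0])
mu = np.array([0.5, 0.5])
y = np.array([0.0, 1.0, 2.0])
nu = np.array([0.25, 0.5, 0.25])
M, N = len(x), len(y)
cost = (x[:, None] - y[None, :]) ** 2            # cost[i, j] = c(x_i, y_j)

# Unknowns p_ij, flattened row by row into a vector of length M * N.
row_sums = np.kron(np.eye(M), np.ones((1, N)))   # sum_j p_ij = mu(x_i)
col_sums = np.kron(np.ones((1, M)), np.eye(N))   # sum_i p_ij = nu(y_j)
A_eq = np.vstack([row_sums, col_sums])
b_eq = np.concatenate([mu, nu])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
plan = res.x.reshape(M, N)

print(res.fun)   # optimal transport cost
print(plan)      # optimal coupling p_ij
```

Note that the optimal coupling here splits the mass at each $x_i$ between two targets, so it is not induced by any map T; this is the sense in which (K) relaxes (M).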

Page 33: Multivariate Quantiles and Ranks using Optimal Transportation

A Geometric Approach to Optimal Transportation

$\mu$, $\nu$: two probability measures on $\mathbb{R}^d$; $c(u, x) = \|u - x\|^2$

Monge’s problem [3] (M): $\inf_{T : T_\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u)$

$T_\#\mu$ is the push forward of $\mu$ by T, i.e., $T_\#\mu(B) = \mu(T^{-1}(B))$, $\forall B$

Kantorovich Relaxation (K): $\min_{\pi \in \Pi(\mu, \nu)} \int \|u - x\|^2\, d\pi(u, x)$

Compared to the above notions, this approach has the following advantages:

This relies on appealing geometric ideas

Does not require any moment conditions

When d = 1, the optimal transport map is $T = F_\nu^{-1} \circ F_\mu$ irrespective of moment assumptions

[3] Monge’s problem is not meaningful unless $\mu$ and $\nu$ have finite second moments

Page 35: Multivariate Quantiles and Ranks using Optimal Transportation

$U \sim \mu$: abs. cont. distribution with support $S \subset \mathbb{R}^d$

Example: $\mu = \mathrm{Unif}([0, 1]^d)$ or $\mathrm{Unif}(B_d(0, 1))$

$X \sim \nu$; $\nu$ is a given probability measure on $\mathbb{R}^d$

Goal: Find the “optimal” transportation map T s.t. $T_\#\mu = \nu$

Theorem [Knott and Smith, Brenier, McCann ...]

There is a $\mu$-a.e. unique measurable mapping $Q : S \to \mathbb{R}^d$, transporting $\mu$ to $\nu$ (i.e., $Q_\#\mu = \nu$ or $Q(U) \sim \nu$), of the form

$Q(u) = \nabla\phi(u)$, for $\mu$-a.e. u,

where $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function (cf. when d = 1).

If, in addition, $\mu$, $\nu$ have finite second moments, then

(i) $Q(\cdot)$ is the $\mu$-a.e. unique optimal transport map (sol. (M)), i.e.,

$\inf_{T : T_\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u) = \int \|u - Q(u)\|^2\, d\mu(u)$;

(ii) $\mu$-a.e. unique optimal transference plan (sol. (K)) is $\pi = (\mathrm{id}, Q)_\#\mu$
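
One case where the gradient-of-convex-function map is available in closed form is the Gaussian one, which is used here purely as an illustration: the slides take a uniform reference measure, whereas this sketch assumes $\mu = N(0, I)$ and $\nu = N(m, \Sigma)$. In that case $Q(u) = m + \Sigma^{1/2} u$, the gradient of the convex potential $\phi(u) = \langle m, u \rangle + u^\top \Sigma^{1/2} u / 2$. The values of `m` and `sigma` and the helper `sqrtm_psd` are hypothetical.

```python
import numpy as np

def sqrtm_psd(sigma):
    # Symmetric square root of a positive semi-definite matrix.
    vals, vecs = np.linalg.eigh(sigma)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

m = np.array([1.0, -2.0])                      # assumed target mean
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                 # assumed target covariance
root = sqrtm_psd(sigma)

def Q(u):
    # Gradient of phi(u) = <m, u> + u' root u / 2, applied row by row.
    return m + u @ root

rng = np.random.default_rng(3)
u = rng.standard_normal(size=(200_000, 2))     # U ~ mu = N(0, I)
x = Q(u)                                       # Q(U) should be ~ N(m, sigma)

print(x.mean(axis=0))
print(np.cov(x, rowvar=False))
```

The Jacobian of Q is the symmetric positive definite matrix `root`, which is what being the gradient of a convex function amounts to for an affine map.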

Page 40: Multivariate Quantiles and Ranks using Optimal Transportation

Quantile map when d ≥ 1

$\mu$ has an abs. cont. distribution with support $S \subset \mathbb{R}^d$

$\nu$: a given probability measure on $\mathbb{R}^d$ (need not be abs. cont.)

Quantile map

The quantile map of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique map [a]

$Q \equiv \nabla\phi : S \to \mathbb{R}^d$

where $\nabla\phi$ pushes $\mu$ to $\nu$ (i.e., $(\nabla\phi)_\#\mu = \nu$) and $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function.

Restatement: The quantile map Q of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique map that is the gradient of a convex function and pushes $\mu$ to $\nu$.

[a] Note that Q is uniquely defined $\mu$-a.e.; w.l.o.g., let $\phi(u) = +\infty$ for $u \in S^c$

In the statistics literature this study was initiated by Chernozhukov et al. (2017, AoS); Hallin (2018, AoS, in revision)

Page 42: Multivariate Quantiles and Ranks using Optimal Transportation

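Sample quantiles when d = 1

$\mu = \mathrm{Uniform}([0, 1])$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}$

Let $X_{(1)} < \ldots < X_{(n)}$ be the order statistics

[Figure: Quantile function; the sample quantile function Q plotted against $u \in [0, 1]$]

Then the sample quantile function $Q_n$ (with $(Q_n)_\#\mu = \nu_n$) reduces to

$Q_n(u) = X_{(i)}$, if $u \in \left(\frac{i-1}{n}, \frac{i}{n}\right)$, i = 1, ..., n

At $\frac{i}{n}$, i = 1, ..., n − 1, we are free to define

$Q_n\left(\frac{i}{n}\right) \in [X_{(i)}, X_{(i+1)}]$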
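
The step-function form of $Q_n$ translates directly into code. The sketch below is a minimal illustration on a made-up sample; `sample_quantile` is a hypothetical helper, and at the breakpoints $u = i/n$ it picks $X_{(i)}$, which is one of the admissible choices.

```python
import math
import numpy as np

def sample_quantile(sample, u):
    # Q_n(u) = X_(i) for u in ((i - 1)/n, i/n); at u = i/n this returns X_(i),
    # one admissible value in [X_(i), X_(i+1)].
    order_stats = np.sort(sample)
    n = len(order_stats)
    i = min(max(math.ceil(n * u), 1), n)
    return order_stats[i - 1]

data = np.array([0.3, -1.2, 0.9, -0.4, 0.1])   # hypothetical sample, n = 5
print([sample_quantile(data, u) for u in (0.1, 0.35, 0.5, 0.95)])
```

Each order statistic is the image of an interval of length 1/n, which is why pushing the uniform distribution through this map gives mass 1/n to every data point.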

Page 43: Multivariate Quantiles and Ranks using Optimal Transportation

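Sample quantiles in $\mathbb{R}^d$, d ≥ 1

$\mu$ abs. cont. with support $S \subset \mathbb{R}^d$; e.g., $\mu = \mathrm{Uniform}([0, 1]^d)$

$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}^d$

$Q_n$ is the transport (Monge) map s.t. $(Q_n)_\#\mu = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ and minimizes (in this case (K) = (M))

$\int_S \|u - T(u)\|^2\, d\mu(u) = \sum_{i=1}^n \int_{\{u \in S : T(u) = X_i\}} \|u - X_i\|^2\, d\mu(u)$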

Page 45: Multivariate Quantiles and Ranks using Optimal Transportation

Computation in the semi-discrete case

Obtain a convex subdivision of S: a “partition” $S = \cup_{i=1}^n Q_n^{-1}(X_i)$

Top-dimensional cells: convex polyhedral sets in the subdivision of S with non-empty interior

Question: How to compute $Q_n$? (Figures and plots?)
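
One possible way to approximate $Q_n$, not necessarily the method used in the talk, is gradient ascent on the semi-discrete dual: attach a weight $w_i$ to each data point, assign u to the point minimizing $\|u - X_i\|^2 - w_i$, and adjust the weights until every cell carries mass 1/n. The sketch below replaces $\mu$ by a Monte Carlo sample; the atoms, the step size, and the iteration count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: n = 4 atoms X_i in the plane, reference mu = Unif([0, 1]^2).
atoms = np.array([[0.1, 0.2], [0.8, 0.3], [0.5, 0.9], [0.3, 0.6]])
n = len(atoms)
u = rng.uniform(size=(20_000, 2))              # Monte Carlo stand-in for mu

# sq_dist[k, i] = ||u_k - X_i||^2
sq_dist = ((u[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)

def assign(weights):
    # Q_n(u) = X_i on the cell where ||u - X_i||^2 - w_i is smallest.
    return np.argmin(sq_dist - weights[None, :], axis=1)

weights = np.zeros(n)
for _ in range(2000):
    mass = np.bincount(assign(weights), minlength=n) / len(u)
    # Supergradient of the concave dual: grow cells that hold too little mass.
    weights += 0.1 * (1.0 / n - mass)

cells = assign(weights)
cell_mass = np.bincount(cells, minlength=n) / len(u)
print(cell_mass)                               # each entry should be near 1/n
```

The resulting cells are convex polygons (a power diagram), matching the convex subdivision described above; Newton-type solvers converge much faster than this plain ascent when n is large.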

Page 46: Multivariate Quantiles and Ranks using Optimal Transportation

[Figure: four panels, each drawn on the unit square with both axes running from 0.0 to 1.0]

Figure: The data sets are drawn from the following distributions (clockwise top to bottom): (i) $X \sim N_2((0, 0), I_2)$; (ii) $X \sim N_2((0, 0), \Sigma)$ where $\Sigma_{1,1} = \Sigma_{2,2} = 1$ and $\Sigma_{1,2} = \Sigma_{2,1} = 0.99$; (iii) two spiral structures with Gaussian perturbations (with small variance); and (iv) a mixture of four different distributions.

Page 47: Multivariate Quantiles and Ranks using Optimal Transportation

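Rank map

$\mu$ has an abs. cont. distribution with support $S \subset \mathbb{R}^d$

$\nu$: a given probability measure on $\mathbb{R}^d$ (need not be abs. cont.)

Quantile map

The quantile map of $\nu$ (w.r.t. $\mu$): $\mu$-a.e. unique map $Q \equiv \nabla\phi : S \to \mathbb{R}^d$

where $(\nabla\phi)_\#\mu = \nu$ and $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex [a] function

[a] By convention $\phi(u) = +\infty$ for $u \notin S$

Rank map

The rank map of $\nu$ (w.r.t. $\mu$) is defined by

$R \equiv \nabla\phi^*$

where $(\nabla\phi)_\#\mu = \nu$, and $\phi^* : \mathbb{R}^d \to \mathbb{R}$ is the (convex) Legendre-Fenchel dual of $\phi$:

$\phi^*(x) := \sup_{u \in \mathbb{R}^d} \{\langle x, u \rangle - \phi(u)\} = \sup_{u \in S} \{\langle x, u \rangle - \phi(u)\}$

Note that the rank map $R(\cdot)$ is finite on $\mathbb{R}^d$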
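
A one-dimensional example makes the Legendre-Fenchel construction concrete. The sketch below assumes S = [0, 1] and $\phi(u) = u^2/2$ on S (so Q is the identity on S), computes $\phi^*$ on a grid, and differentiates it numerically; the grids and the helper name `conjugate` are choices made here for illustration.

```python
import numpy as np

# Hypothetical 1-D example: S = [0, 1] and phi(u) = u^2 / 2 on S (+inf outside),
# so that Q = phi' is the identity on S.
u_grid = np.linspace(0.0, 1.0, 2001)
phi = 0.5 * u_grid ** 2

def conjugate(x):
    # phi*(x) = sup_{u in S} { x u - phi(u) }, approximated on the grid.
    return np.max(np.outer(x, u_grid) - phi[None, :], axis=1)

x = np.linspace(-2.0, 3.0, 501)
phi_star = conjugate(x)
rank = np.gradient(phi_star, x)     # numerical R = (phi*)'

print(rank[[0, 250, 500]])          # values of R at x = -2, 0.5, 3
```

In this example the computed rank map agrees with clipping x to [0, 1]: it inverts Q inside S and stays finite outside, because the supremum is taken over the compact set S only.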

Page 49: Multivariate Quantiles and Ranks using Optimal Transportation

When is $R = Q^{-1}$?

$X \sim \nu$: supported on convex $\mathcal{X} \subset \mathbb{R}^d$ with Lebesgue density [4]

$U \sim \mu$: supported on convex compact $S \subset \mathbb{R}^d$ with bounded density

Result [Ghosal and S. (2018+)]

Let $Q = \nabla\phi$ be the quantile function of $\nu$ (w.r.t. $\mu$), where $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is convex. Then:

(i) The inverse function of Q exists, and has the form

$Q^{-1} = \nabla\phi^* =: R$,

where $\phi^*$ is the Legendre-Fenchel dual of $\phi$.

(ii) Q is a homeomorphism from Int(S) to Int($\mathcal{X}$) (cf. when d = 1 [a])

(iii) $R = Q^{-1}$ is the $\nu$-a.e. unique map that pushes $\nu$ to $\mu$ (i.e., $R_\#\nu = \mu$) which is the gradient of a convex function.

[a] the c.d.f. F is continuous and strictly increasing

[4] with mild boundedness (from below and above) assumptions on the density on $\mathcal{X}$

Page 52: Multivariate Quantiles and Ranks using Optimal Transportation

Properties of the rank/quantile maps

Characterizes the distribution

The quantile and rank functions characterize the associated distribution

Equivariance under orthogonal transformations

Suppose $Y = AX$, where $A$ is a $d \times d$ matrix

$A$ is an orthogonal matrix, i.e., $AA^\top = A^\top A = I_d$

$\mu$: spherically symmetric distribution (e.g., $\mu = \mathrm{Uniform}(B_d(0,1))$)

Then, $Q_Y(u) = A\,Q_X(A^\top u)$ for $\mu$-a.e. $u$

$R_Y(y) = A\,R_X(A^\top y)$ for a.e. $y \in \mathbb{R}^d$

Quantile/rank maps are equivariant under orthogonal transformations
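
A short sketch of why the equivariance holds may help; this is one standard argument, not necessarily the proof used by the authors. Write $Q_X = \nabla\varphi_X$ and let $\psi$ be the candidate potential for $Y$ (the symbol $\psi$ is introduced here only for illustration).

```latex
\begin{align*}
\psi(u) &:= \varphi_X(A^\top u)
  && \text{(convex: a convex function composed with a linear map)} \\
\nabla\psi(u) &= A\,\nabla\varphi_X(A^\top u) = A\,Q_X(A^\top u)
  && \text{(chain rule)} \\
U \sim \mu \;&\Rightarrow\; A^\top U \sim \mu
  && \text{(spherical symmetry of } \mu\text{)} \\
&\Rightarrow\; A\,Q_X(A^\top U) \stackrel{d}{=} AX = Y
  && \text{(since } Q_X \# \mu = \nu_X\text{)} \\
\psi^*(y) &= \sup_{u}\{\langle y, u\rangle - \varphi_X(A^\top u)\}
           = \sup_{v}\{\langle A^\top y, v\rangle - \varphi_X(v)\}
           = \varphi_X^*(A^\top y)
  && \text{(substitute } v = A^\top u\text{)} \\
\nabla\psi^*(y) &= A\,\nabla\varphi_X^*(A^\top y) = A\,R_X(A^\top y).
\end{align*}
```

Since $\nabla\psi$ is the gradient of a convex function pushing $\mu$ to the law of $Y$, the $\mu$-a.e. uniqueness of the quantile map gives $Q_Y = \nabla\psi$, and the last line gives the formula for $R_Y$.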

Page 55: Multivariate Quantiles and Ranks using Optimal Transportation

Under mutual independence

$X = (X_1, X_2, \ldots, X_k) \sim \nu$ where $k \ge 2$;

$X_i \sim \nu_i$, for $i = 1, \ldots, k$, are r.v. in $\mathbb{R}^{d_i}$ (here $d_1 + \ldots + d_k = d$)

$\mu = \mathrm{Uniform}([0,1]^d)$

Let $Q$ and $Q_i$ be the quantile maps of $X$ and $X_i$, for $i = 1, \ldots, k$, respectively (w.r.t. $\mu$ and $\mu_i = \mathrm{Uniform}([0,1]^{d_i})$)

Let $R$ and $R_i$, for $i = 1, \ldots, k$, be the corresponding rank maps

Mutual independence [Ghosal and S. (2018+)]

If $X_1, \ldots, X_k$ are mutually independent then

$Q(u_1, \ldots, u_k) = (Q_1(u_1), \ldots, Q_k(u_k))$, for $\mu$-a.e. $(u_1, \ldots, u_k)$,

$R(x_1, \ldots, x_k) = (R_1(x_1), \ldots, R_k(x_k))$, for a.e. $(x_1, \ldots, x_k) \in \mathbb{R}^d$.
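
The product structure can be checked with a separable potential. The following is a sketch of one standard argument (the symbols $\varphi$, $\varphi_i$ denote convex potentials with $\nabla\varphi_i = Q_i$, introduced here for illustration), not necessarily the authors' proof.

```latex
\begin{align*}
\varphi(u_1,\ldots,u_k) &:= \sum_{i=1}^k \varphi_i(u_i)
  && \text{(a sum of convex functions is convex)} \\
\nabla\varphi(u_1,\ldots,u_k) &= \bigl(Q_1(u_1),\ldots,Q_k(u_k)\bigr) \\
(U_1,\ldots,U_k) \sim \mu_1 \times \cdots \times \mu_k = \mu
  \;&\Rightarrow\; \bigl(Q_1(U_1),\ldots,Q_k(U_k)\bigr) \sim \nu_1 \times \cdots \times \nu_k = \nu
  && \text{(independence of the } X_i\text{)} \\
\varphi^*(x_1,\ldots,x_k) &= \sum_{i=1}^k \varphi_i^*(x_i)
  && \text{(conjugate of a separable sum)} \\
\nabla\varphi^*(x_1,\ldots,x_k) &= \bigl(R_1(x_1),\ldots,R_k(x_k)\bigr).
\end{align*}
```

Uniqueness of the gradient-of-convex map pushing $\mu$ to $\nu$ then identifies $\nabla\varphi$ with $Q$ and $\nabla\varphi^*$ with $R$.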

Page 57: Multivariate Quantiles and Ranks using Optimal Transportation

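Sample rank map when $d \ge 1$

$X_1, \ldots, X_n \in \mathbb{R}^d$; $\mu$ is abs. cont. on a convex set $S \subset \mathbb{R}^d$

Sample quantile function: $Q_n \equiv \nabla\varphi_n$ pushes $\mu$ to $\nu = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$

Observe that $\nabla\varphi_n \in \{X_1, \ldots, X_n\}$ $\mu$-a.e.

Thus $\varphi_n$ is a piecewise affine convex function:

$\varphi_n(u) = \begin{cases} \max_{i=1,\ldots,n} \{\langle X_i, u\rangle + h_i\}, & u \in S \\ +\infty, & u \in S^c \end{cases}$

Sample rank map

The sample rank map is defined as

$R_n = \nabla\varphi_n^*$

where $\varphi_n^* : \mathbb{R}^d \to \mathbb{R}$ is also convex piecewise affine:

$\varphi_n^*(x) = \sup_{u \in S} \{\langle x, u\rangle - \varphi_n(u)\}$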
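
To make the piecewise affine form concrete, here is a minimal sketch for the special case $d = 1$ with $\mu = \mathrm{Unif}[0,1]$ (an assumption made here so that the heights $h_i$ have a closed form). Each data point must receive $\mu$-mass $1/n$, so consecutive affine pieces have to cross at $u = i/n$; this fixes the $h_i$ up to an additive constant. The function names are hypothetical.

```python
import numpy as np

def fit_heights_1d(x_sorted):
    """Heights h_i such that pieces i and i+1 of max_i {x_i*u + h_i} cross at u = i/n."""
    n = len(x_sorted)
    h = np.zeros(n)
    for i in range(1, n):
        # Crossing condition: x[i-1]*(i/n) + h[i-1] == x[i]*(i/n) + h[i]
        h[i] = h[i - 1] - (x_sorted[i] - x_sorted[i - 1]) * i / n
    return h

def sample_quantile(u, x_sorted, h):
    """Q_n(u): the slope of the affine piece attaining the max at u."""
    u = np.atleast_1d(u)
    values = np.outer(u, x_sorted) + h          # shape (len(u), n)
    return x_sorted[np.argmax(values, axis=1)]

x = np.sort(np.array([2.0, -1.0, 0.5, 3.5]))
h = fit_heights_1d(x)
u = np.array([0.1, 0.3, 0.6, 0.9])
q = sample_quantile(u, x, h)
```

Away from the breakpoints, `q` coincides with the usual empirical quantile $X_{(\lceil nu \rceil)}$. For $d > 1$ the heights have no closed form and must be found numerically.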

Page 59: Multivariate Quantiles and Ranks using Optimal Transportation

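How to define the sample ranks $R_n(X_i)$?

The sample rank map is $R_n = \nabla\varphi_n^*$ (Recall: $Q_n = \nabla\varphi_n$)

But $\varphi_n^*$ is not differentiable at $X_i$, $i = 1, \ldots, n$

Fact: $u \in \partial\varphi_n^*(X_i) \iff X_i \in \partial\varphi_n(u)$, where $\partial\varphi_n(u) = \{Q_n(u)\}$ at every $u$ at which $\varphi_n$ is differentiable

Result: $R_n(X_i) \in \partial\varphi_n^*(X_i)$, which contains the top-dimensional cell $Q_n^{-1}(X_i)$!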

Page 61: Multivariate Quantiles and Ranks using Optimal Transportation

The sample ranks $R_n(X_i)$ when $d = 1$

The sample rank map: $R_n(x) = \frac{i}{n}$, if $x \in (X_{(i)}, X_{(i+1)})$

Free to define $R_n(X_{(i)})$ as anything in the interval $[(i-1)/n,\ i/n]$

Usual ranks when $d = 1$

The sample DF (rank): $F_n(x) = \frac{i}{n}$, if $x \in [X_{(i)}, X_{(i+1)})$

Convention: We can define $R_n(X_i) = \max_{u \in \mathrm{Cl}(Q_n^{-1}(X_i))} \|u\|$
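
The following small sketch checks, for $d = 1$ and $\mu = \mathrm{Unif}[0,1]$ (assumed here), that the max-norm convention recovers the usual ranks $F_n(X_i)$. The helper name `cells_1d` is hypothetical.

```python
import numpy as np

def cells_1d(x):
    """Endpoints of the cell Q_n^{-1}(X_i) = ((pos_i - 1)/n, pos_i/n) for each data point."""
    n = len(x)
    order = np.argsort(x)                 # order[k] is the index of the (k+1)-th smallest point
    pos = np.empty(n, dtype=int)
    pos[order] = np.arange(1, n + 1)      # pos[i] is the position of x[i] in sorted order
    return (pos - 1) / n, pos / n

x = np.array([2.0, -1.0, 0.5, 3.5])
lower, upper = cells_1d(x)

# Convention: largest |u| over the closed cell [lower, upper]
rank_convention = np.maximum(np.abs(lower), np.abs(upper))

# Usual ranks: empirical distribution function evaluated at the data points
ecdf_at_data = np.array([(x <= xi).mean() for xi in x])
```

Both arrays equal `[0.75, 0.25, 0.5, 1.0]` for this data set.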

Page 63: Multivariate Quantiles and Ranks using Optimal Transportation

Distribution-free multivariate ranks $R_n(X_i)$

When $d = 1$, the ranks $\{F_n(X_i)\}_{i=1}^n$ are identically distributed (on $\{1/n, 2/n, \ldots, n/n\}$ with probability $1/n$ each)

We define $R_n(X_i)$ as a random point drawn from the uniform distribution on the cell $Q_n^{-1}(X_i)$, i.e.,

$R_n(X_i) \mid X_1, \ldots, X_n \sim \mathrm{Uniform}(Q_n^{-1}(X_i))$

Result: If $\mu = \mathrm{Uniform}(S)$, then $R_n(X_i) \sim \mu = \mathrm{Uniform}(S)$

Compare: $R(X_i) \sim \mu = \mathrm{Uniform}(S)$, where $R$ is the population rank map
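
A quick simulation can illustrate the distribution-free property in the simplest setting. This sketch assumes $d = 1$, $\mu = \mathrm{Unif}[0,1]$, Gaussian data and $n = 5$; all of these are choices made here for illustration. The randomized rank of $X_1$ is drawn uniformly from its cell and should be $\mathrm{Unif}[0,1]$ across repetitions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 20000

samples = rng.normal(size=(reps, n))               # each row is one data set X_1, ..., X_n
pos = (samples <= samples[:, [0]]).sum(axis=1)     # position of X_1 within its row (1..n)
lower, upper = (pos - 1) / n, pos / n              # cell of X_1 is ((pos-1)/n, pos/n)

# Randomized rank: a uniform draw from the cell of X_1, given the data
randomized_rank = lower + (upper - lower) * rng.uniform(size=reps)

frac_below = np.mean(randomized_rank <= 0.3)       # should be close to 0.3
```

Replacing `rng.normal` by any other continuous distribution should leave the distribution of `randomized_rank` unchanged, which is the point of the result.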

Page 66: Multivariate Quantiles and Ranks using Optimal Transportation

Glivenko-Cantelli type result [Ghosal & S. (2018+)]

Let $X_1, X_2, \ldots \in \mathbb{R}^d$ be i.i.d. $\nu$, where $\nu$ is abs. cont. with support $\mathcal{X}$

Take $\mu = \mathrm{Uniform}(B_d(0,1))$

Let $Q$ and $R$ be the quantile and rank maps of $\nu$ (w.r.t. $\mu$)

Suppose $Q = \nabla\varphi$ where $\nabla\varphi : \mathrm{Int}(S) \to \mathrm{Int}(\mathcal{X})$ is a homeomorphism

Let $Q_n$ and $R_n$ be any sample quantile and rank functions

Let $K_1 \subset \mathrm{Int}(S)$ be a compact set. Then, we have

$\sup_{u \in K_1} \|Q_n(u) - Q(u)\| \xrightarrow{a.s.} 0$ and $\sup_{x \in \mathbb{R}^d} \|R_n(x) - R(x)\| \xrightarrow{a.s.} 0$

Generalizes the G-C result in Chernozhukov et al. (2017, AoS), which:
(i) assumed that $\nu$ is compactly supported;
(ii) showed uniform convergence of $R_n$ only on compacts inside $\mathrm{Int}(\mathcal{X})$;
(iii) showed convergence in probability.

Page 68: Multivariate Quantiles and Ranks using Optimal Transportation

Outline

1 Introduction to Optimal Transportation
   Monge’s Problem
   Kantorovich Relaxation: Primal Problem
   A Geometric Approach

2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)

3 Some Applications in Statistics
   Two-sample Goodness-of-fit Testing
   Independence Testing

Page 69: Multivariate Quantiles and Ranks using Optimal Transportation

Two-sample Testing

Suppose that $X_1, \ldots, X_m$ are i.i.d. $\nu_X$ (abs. cont.) on $\mathbb{R}^d$

Suppose that $Y_1, \ldots, Y_n$ are i.i.d. $\nu_Y$ (abs. cont.) on $\mathbb{R}^d$

$\mu$: Distribution on $S \subset \mathbb{R}^d$ (e.g., $\mu = \mathrm{Unif}([0,1]^d)$)

Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \ne \nu_Y$

Quantile maps

$\hat{Q}_X$ and $\hat{Q}_Y$ are the sample quantile maps for the $X_i$’s and $Y_j$’s

Population quantile maps: $Q_X$ and $Q_Y$

Recall: $Q_X \# \mu = \nu_X$ and $Q_Y \# \mu = \nu_Y$

Page 71: Multivariate Quantiles and Ranks using Optimal Transportation

Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \ne \nu_Y$

$\hat{Q}_X$ and $\hat{Q}_Y$ are the sample quantile maps for the $X_i$’s and $Y_j$’s

Joint rank map: $\hat{R}_{m,n}$ is the rank map (properly defined) of the combined sample $X_1, \ldots, X_m, Y_1, \ldots, Y_n$

Test statistic:

$T_{m,n} := \int_S \bigl\|\hat{R}_{m,n}(\hat{Q}_X(u)) - \hat{R}_{m,n}(\hat{Q}_Y(u))\bigr\|^2 \, d\mu(u)$

Motivation: One-sample Cramér-von Mises statistic

$\int \{F_n(x) - F(x)\}^2 \, dF(x) = \int_0^1 \{F_n(F^{-1}(u)) - u\}^2 \, du$
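
For intuition, the statistic can be computed directly when $d = 1$. The sketch below assumes $\mu = \mathrm{Unif}[0,1]$, takes the joint rank map to be the pooled empirical distribution function (one admissible convention in one dimension), and approximates the integral by a midpoint rule. Function names and grid size are choices made here, not the authors' implementation.

```python
import numpy as np

def empirical_quantile(sample, u):
    """Q_hat(u) = X_(ceil(n*u)), the 1D sample quantile map w.r.t. Unif[0, 1]."""
    s = np.sort(sample)
    idx = np.clip(np.ceil(len(s) * u).astype(int) - 1, 0, len(s) - 1)
    return s[idx]

def pooled_rank(pooled_sorted, t):
    """Pooled empirical distribution function evaluated at t."""
    return np.searchsorted(pooled_sorted, t, side="right") / len(pooled_sorted)

def two_sample_statistic(x, y, n_grid=2000):
    u = (np.arange(n_grid) + 0.5) / n_grid            # midpoint rule on [0, 1]
    pooled = np.sort(np.concatenate([x, y]))
    diff = (pooled_rank(pooled, empirical_quantile(x, u))
            - pooled_rank(pooled, empirical_quantile(y, u)))
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
t_null = two_sample_statistic(x, rng.normal(size=300))           # same distribution
t_alt = two_sample_statistic(x, rng.normal(loc=2.0, size=300))   # shifted distribution
```

In this toy run `t_alt` is much larger than `t_null`, in line with the consistency statement for the test.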

Page 73: Multivariate Quantiles and Ranks using Optimal Transportation

Test statistic: $T_{m,n} = \int_S \bigl\|\hat{R}_{m,n}(\hat{Q}_X(u)) - \hat{R}_{m,n}(\hat{Q}_Y(u))\bigr\|^2 \, d\mu(u)$

When $d = 1$, $T_{m,n}$ is distribution-free!

Question: Is $T_{m,n}$ (asymptotically) distribution-free when $d > 1$?

Critical value: Can always be computed by a permutation test

Theorem [Ghosal and S. (2018+)]

Suppose that $\frac{m}{m+n} \to \theta \in (0,1)$ as $m, n \to \infty$. Under $H_0 : \nu_X = \nu_Y$,

$T_{m,n} \xrightarrow{P} 0$, as $m, n \to \infty$.

Further, for $\nu_X \ne \nu_Y$ (and mild regularity conditions on $\nu_X$ and $\nu_Y$),

$T_{m,n} \xrightarrow{P} c > 0$ as $m, n \to \infty$.
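
The permutation calibration mentioned above can be sketched generically. The code below works with any two-sample statistic passed as a callable; the statistic `mean_gap` is only a simple stand-in used to keep the example short, not the rank-based statistic of the talk, and all names are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=500, seed=0):
    """Permutation p-value: reshuffle the pooled sample and recompute the statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    m = len(x)
    observed = statistic(x, y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)        # shuffles along the first axis (rows if d > 1)
        if statistic(perm[:m], perm[m:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)         # add-one correction keeps the p-value positive

def mean_gap(x, y):
    """Stand-in statistic for the demo: absolute difference of sample means."""
    return abs(np.mean(x) - np.mean(y))

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y_shifted = rng.normal(loc=3.0, size=50)
p_alt = permutation_pvalue(x, y_shifted, mean_gap)
```

Under $H_0$ the pooled observations are exchangeable, which is what justifies reshuffling the group labels regardless of the dimension $d$.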

Page 77: Multivariate Quantiles and Ranks using Optimal Transportation

Independence Testing

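$(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. $\nu$ (abs. cont.) on $\mathbb{R}^{d_X} \times \mathbb{R}^{d_Y}$; $d_X + d_Y = d$

Goal: Test $H_0 : X \perp\!\!\!\perp Y$ versus $H_1 : X \not\perp\!\!\!\perp Y$

$\mu_X = \mathrm{Unif}([0,1]^{d_X})$, $\mu_Y = \mathrm{Unif}([0,1]^{d_Y})$

$\mu = \mu_X \times \mu_Y = \mathrm{Unif}([0,1]^d)$

$\hat{R}_n : \mathbb{R}^d \to \mathbb{R}^d$: rank map of the joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$

$\hat{Q}_n$: sample quantile map of the joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$

$\hat{R}_n^X : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$: rank map of $X_1, \ldots, X_n$; similarly $\hat{R}_n^Y$

Define $\tilde{R}_n := (\hat{R}_n^X, \hat{R}_n^Y) : \mathbb{R}^d \to [0,1]^d$

Test statistic: $T_n := \int_S \bigl\|\hat{R}_n(\hat{Q}_n(u)) - \tilde{R}_n(\hat{Q}_n(u))\bigr\|^2 \, d\mu(u)$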

Page 79: Multivariate Quantiles and Ranks using Optimal Transportation

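$\hat{R}_n$ and $\hat{Q}_n$: rank and quantile maps based on $(X_1, Y_1), \ldots, (X_n, Y_n)$

$\tilde{R}_n := (\hat{R}_n^X, \hat{R}_n^Y)$; $\hat{R}_n^X : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$ is the rank map of $X_1, \ldots, X_n$; similarly, $\hat{R}_n^Y$ is the rank map of $Y_1, \ldots, Y_n$

Test statistic: $T_n := \int_S \bigl\|\hat{R}_n(\hat{Q}_n(u)) - \tilde{R}_n(\hat{Q}_n(u))\bigr\|^2 \, d\mu(u)$

Question: Is $T_n$ (asymptotically) distribution-free for $d > 1$?

Critical value: Can always be computed by the permutation principle

Theorem [Ghosal and S. (2018+)]

Under $H_0 : X \perp\!\!\!\perp Y$,

$T_n \xrightarrow{P} 0$, as $n \to \infty$.

Further, if $X \not\perp\!\!\!\perp Y$ (and under mild regularity conditions),

$T_n \xrightarrow{P} c > 0$, as $n \to \infty$.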

Page 83: Multivariate Quantiles and Ranks using Optimal Transportation

Future research

Construct finite sample distribution-free goodness-of-fit tests

Power study of these testing procedures

Comparison with other methods, e.g., RKHS methods, energy distance methods, mutual information, etc.

Estimation of the “center” of the data cloud

Sample “median” $\hat{Q}_n(0)$ when $\mu = \mathrm{Uniform}(B_d(0,1))$

We can show $\hat{Q}_n(0) \xrightarrow{P} Q(0)$. What about the rate of convergence?

What is the limiting distribution?

What about other sample quantiles?

Thank you very much!

Questions?

Page 85: Multivariate Quantiles and Ranks using Optimal Transportation

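Kantorovich duality for general cost functions

$\Pi(\mu, \nu)$: all probability distributions $\pi$ on $\mathcal{X} \times \mathcal{Y}$ with marginals $\mu$ and $\nu$

Kantorovich duality

Let $I(\pi) := \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi(x, y)$ where $c(\cdot, \cdot) \ge 0$ is l.s.c. Then,

$\inf_{\pi \in \Pi(\mu,\nu)} I(\pi) = \sup_{(\phi,\psi) \in \Phi_c} \left\{ \int_{\mathcal{X}} \phi(x) \, d\mu(x) + \int_{\mathcal{Y}} \psi(y) \, d\nu(y) \right\}$

where $\Phi_c$ is the set of all measurable functions $(\phi, \psi) \in L^1(\mu) \times L^1(\nu)$ satisfying

$\phi(x) + \psi(y) \le c(x, y)$ for $\mu$-a.e. $x \in \mathcal{X}$, $\nu$-a.e. $y \in \mathcal{Y}$.

In fact, one can restrict to functions in $\Phi_c$ that are bounded and continuous.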

Page 86: Multivariate Quantiles and Ranks using Optimal Transportation

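Duality when $c(x, y) = \|x - y\|^2$ and $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$

$\mu, \nu$: probability measures on $\mathbb{R}^d$ with finite second moments

$\Pi(\mu, \nu)$: all prob. measures $\pi$ on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$

Let $M_2 = \int_{\mathbb{R}^d} \|x\|^2 \, d\mu(x) + \int_{\mathbb{R}^d} \|y\|^2 \, d\nu(y) < +\infty$

Then, $\inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y\rangle \, d\pi(x, y)$

Kantorovich duality

For a pair $(\phi, \phi^*)$ of l.s.c. proper conjugate convex functions such that, for a.e. $x, y \in \mathbb{R}^d$, $\langle x, y\rangle \le \phi(x) + \phi^*(y)$, we have

$\sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y\rangle \, d\pi(x, y) = \inf_{(\phi,\phi^*)} \left\{ \int \phi(x) \, d\mu(x) + \int \phi^*(y) \, d\nu(y) \right\}$   (2)

Result: Suppose that $(\varphi, \varphi^*)$ solves (2). If $\mu$ is abs. cont., then

(i) the unique optimal transference plan is $\pi = (\mathrm{id}, \nabla\varphi) \# \mu$;
(ii) $\nabla\varphi$ is the unique solution to the Monge problem:

$\int \|x - \nabla\varphi(x)\|^2 \, d\mu(x) = \inf_{T : T \# \mu = \nu} \int \|x - T(x)\|^2 \, d\mu(x)$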

Page 87: Multivariate Quantiles and Ranks using Optimal Transportation

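Proof: Step 1

Let $J(\phi, \psi) := \int_{\mathbb{R}^d} \phi(x) \, d\mu(x) + \int_{\mathbb{R}^d} \psi(y) \, d\nu(y)$

Duality: $\inf_{\pi \in \Pi(\mu,\nu)} I(\pi) = \sup_{(\phi,\psi) \in \Phi} J(\phi, \psi)$

where $(\phi, \psi) \in \Phi$ iff for a.a. $x, y \in \mathbb{R}^d$, $\phi(x) + \psi(y) \le \|x - y\|^2$

Simple algebra (expand $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\langle x, y\rangle$ and divide by 2) yields: $\langle x, y\rangle \le \frac{\|x\|^2 - \phi(x)}{2} + \frac{\|y\|^2 - \psi(y)}{2}$

Define: $\tilde{\phi}(x) = \frac{\|x\|^2 - \phi(x)}{2}$, $\tilde{\psi}(y) = \frac{\|y\|^2 - \psi(y)}{2}$

$\inf_{\pi \in \Pi(\mu,\nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y\rangle \, d\pi(x, y)$

$\sup_{(\phi,\psi) \in \Phi} J(\phi, \psi) = M_2 - 2 \inf_{(\tilde{\phi},\tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$

where $(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}$ iff for a.a. $x, y$, $\langle x, y\rangle \le \tilde{\phi}(x) + \tilde{\psi}(y)$

Thus, $\sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y\rangle \, d\pi(x, y) = \inf_{(\tilde{\phi},\tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$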

Page 88: Multivariate Quantiles and Ranks using Optimal Transportation

Proof: Step 2

Double convexification trick to improve the admissible pairs in the dual problem
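
The slide only names the trick. A sketch of the standard argument, written for a generic admissible pair $(f, g)$ with $\langle x, y\rangle \le f(x) + g(y)$ (notation introduced here, ignoring a.e. technicalities), is:

```latex
\begin{align*}
g(y) \ge \langle x, y\rangle - f(x) \ \ \forall x
  \;&\Rightarrow\; g(y) \ge \sup_x\{\langle x, y\rangle - f(x)\} = f^*(y)
  \;\Rightarrow\; J(f, f^*) \le J(f, g), \\
f(x) \ge \langle x, y\rangle - f^*(y) \ \ \forall y
  \;&\Rightarrow\; f(x) \ge \sup_y\{\langle x, y\rangle - f^*(y)\} = f^{**}(x)
  \;\Rightarrow\; J(f^{**}, f^*) \le J(f, f^*), \\
(f^{**})^* &= f^*
  \quad\text{so } (f^{**}, f^*) \text{ is a pair of conjugate l.s.c. convex functions.}
\end{align*}
```

Each replacement keeps the pair admissible and does not increase $J$, so the infimum in the dual problem can be restricted to conjugate pairs of convex functions.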

Page 89: Multivariate Quantiles and Ranks using Optimal Transportation

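Semi-discrete OT

Data: $X_1, \ldots, X_n$ in $\mathbb{R}^d$; $\nu = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$

$\mu$: an abs. cont. distribution on a compact convex set $S \subset \mathbb{R}^d$

The dual Kantorovich problem in this setting can be written as:

$\inf_{\psi \text{ convex}} \int \psi^*(x) \, d\mu(x) + \int \psi(y) \, d\nu(y)$

Letting $\psi_i = \psi(X_i)$, the above minimization problem reduces to

$(M) = \inf_{\psi_1, \ldots, \psi_n} \left[ \int_S \sup_{i=1,\ldots,n} \{\langle X_i, x\rangle - \psi_i\} \, d\mu(x) + \frac{1}{n}\sum_{i=1}^n \psi_i \right]$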
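
One simple way to attack the finite-dimensional problem above numerically is subgradient descent on $(\psi_1, \ldots, \psi_n)$, with the integral over $S$ replaced by an average over Monte Carlo draws from $\mu$. This is a rough sketch under assumptions made here ($\mu = \mathrm{Unif}([0,1]^2)$, a $1/\sqrt{t}$ step size, hypothetical function names); it is not the algorithm used in the talk, and dedicated semi-discrete solvers are far more efficient. The $i$-th partial derivative of the objective is $1/n - \mu(\text{cell}_i)$, so the iteration pushes every cell mass towards $1/n$.

```python
import numpy as np

def dual_objective(psi, X, U):
    """Monte Carlo version of (M): mean_k max_i {<X_i, u_k> - psi_i} + mean_i psi_i."""
    return (U @ X.T - psi).max(axis=1).mean() + psi.mean()

def cell_masses(psi, X, U):
    """Fraction of the draws u_k whose maximizing index is i (the mu-mass of cell i)."""
    winners = np.argmax(U @ X.T - psi, axis=1)
    return np.bincount(winners, minlength=len(X)) / len(U)

def solve_semidiscrete(X, U, n_iter=300, step=0.5):
    n = len(X)
    psi = np.zeros(n)
    best_psi, best_val = psi.copy(), dual_objective(psi, X, U)
    for t in range(1, n_iter + 1):
        subgrad = 1.0 / n - cell_masses(psi, X, U)
        psi = psi - (step / np.sqrt(t)) * subgrad
        val = dual_objective(psi, X, U)
        if val < best_val:                      # keep the best iterate seen so far
            best_psi, best_val = psi.copy(), val
    return best_psi, best_val

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                     # data points
U = rng.uniform(size=(4000, 2))                 # draws from mu = Unif([0,1]^2)
psi0 = np.zeros(len(X))
psi_hat, val_hat = solve_semidiscrete(X, U)
masses = cell_masses(psi_hat, X, U)
```

At an exact minimizer every entry of `masses` would equal $1/n$; the sample quantile map then sends each draw $u$ to the data point $X_i$ attaining the maximum.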

Page 90: Multivariate Quantiles and Ranks using Optimal Transportation

Some facts about convex functions

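Given a convex function $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$, we define the subdifferential of $f$ at $x \in \mathrm{Dom}(f)$ by

$\partial f(x) := \{\xi \in \mathbb{R}^d : f(x) + \langle y - x, \xi\rangle \le f(y), \text{ for all } y \in \mathbb{R}^d\}$

Any element in $\partial f(x)$ is called a subgradient of $f$ at $x$

If $f$ is differentiable at $x$ then $\partial f(x) = \{\nabla f(x)\}$

A convex function on $\mathbb{R}^d$ is differentiable a.e. in the interior of its domain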
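
As a small illustration of the definition, the sketch below tests the subgradient inequality for $f(x) = |x|$ on a finite grid of test points (the grid and the helper name `is_subgradient` are choices made here). At the kink $x = 0$ the subdifferential is the whole interval $[-1, 1]$, while at a point of differentiability it is a single slope.

```python
import numpy as np

def is_subgradient(f, x, xi, test_points, tol=1e-12):
    """Check f(x) + (y - x)*xi <= f(y) for every y in test_points (1D case)."""
    return all(f(x) + (y - x) * xi <= f(y) + tol for y in test_points)

grid = np.linspace(-3.0, 3.0, 601)
candidates = np.linspace(-1.5, 1.5, 13)          # slopes -1.5, -1.25, ..., 1.5

# Slopes that pass the subgradient inequality at the kink x = 0
at_kink = [xi for xi in candidates if is_subgradient(abs, 0.0, xi, grid)]

# At x = 2 the function is differentiable with derivative 1
slope_one_ok = is_subgradient(abs, 2.0, 1.0, grid)
slope_half_ok = is_subgradient(abs, 2.0, 0.5, grid)
```

Here `at_kink` contains exactly the candidate slopes in $[-1, 1]$, `slope_one_ok` is true and `slope_half_ok` is false.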

Page 92: Multivariate Quantiles and Ranks using Optimal Transportation

Legendre-Fenchel dual

Let $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$. The convex conjugate $\phi^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ of $\phi$ is defined as

$\phi^*(x) := \sup_{y \in \mathbb{R}^d} \{\langle x, y\rangle - \phi(y)\}$

Lemma (Characterization of subdifferential)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper (i.e., $f(x) < \infty$ for some $x \in \mathbb{R}^d$) l.s.c. convex function. Then for all $x, y \in \mathbb{R}^d$,

$\langle x, y\rangle = f(x) + f^*(y) \iff y \in \partial f(x) \iff x \in \partial f^*(y)$.

Lemma (Legendre duality)

Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper function. Then the following three properties are equivalent:

(i) $f$ is a convex l.s.c. function;

(ii) $f = \psi^*$ for some proper function $\psi$;

(iii) $f^{**} = f$.
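
The Legendre duality lemma can be probed numerically with a discrete conjugate on a grid. In this sketch the supremum is restricted to the grid $[-4, 4]$ (a truncation assumed here for illustration), so it only approximates the true conjugate. For the convex function $f(x) = x^2/2$ the biconjugate returns $f$, while for a non-convex double well it returns the convex envelope instead.

```python
import numpy as np

def conjugate(values, grid):
    """Discrete conjugate on a grid: result[j] = max_i { grid[j]*grid[i] - values[i] }."""
    return np.max(np.outer(grid, grid) - values[None, :], axis=1)

grid = np.linspace(-4.0, 4.0, 801)

# Convex case: f(x) = x^2 / 2 is its own conjugate, hence f** = f
f = 0.5 * grid ** 2
f_star = conjugate(f, grid)
f_bistar = conjugate(f_star, grid)

# Non-convex case: a double well with minima at -1 and +1 and g(0) = 1
g = np.minimum((grid - 1.0) ** 2, (grid + 1.0) ** 2)
g_bistar = conjugate(conjugate(g, grid), grid)
mid = np.argmin(np.abs(grid))                     # index of the grid point closest to 0
```

At `grid[mid]` (approximately 0) the double well takes the value 1 but its biconjugate is approximately 0, so $g^{**} \ne g$, consistent with the equivalence (i) $\iff$ (iii).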

Page 93: Multivariate Quantiles and Ranks using Optimal Transportation

When is (K) = (M)? (with $c(x, y) = \|x - y\|^2$)

Discrete case

Suppose that $\mu$ and $\nu$ are supported on $\{x_i\}_{i=1}^M$ and $\{y_j\}_{j=1}^N$

Kantorovich’s problem (K):

$\min_{\{p_{ij} \ge 0\}} \sum_{i=1}^M \sum_{j=1}^N p_{ij} \, c(x_i, y_j)$ subject to $\sum_{i=1}^M p_{ij} = \nu(y_j)$ and $\sum_{j=1}^N p_{ij} = \mu(x_i)$

Monge’s problem (M): $\min_{T : T \# \mu = \nu} \sum_{i=1}^M \|x_i - T(x_i)\|^2 \mu(x_i)$

In general, (K) $\ne$ (M)

If $M = N$ and

$\mu(x_i) = \nu(y_j) = \frac{1}{N}$ for all $i, j$,

then the optimal transference plan in the (K) problem coincides with the solution of the (M) problem
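
In the equal-weight case $M = N$, Monge's problem is an assignment problem over permutations. The brute-force sketch below (feasible only for very small $N$; the function name is hypothetical) enumerates all permutations. In one dimension the optimal map is known to match sorted $x$'s to sorted $y$'s, which gives an independent check.

```python
import itertools
import numpy as np

def monge_bruteforce(x, y):
    """Minimize (1/N) * sum_i ||x_i - y_{perm(i)}||^2 over all permutations."""
    n = len(x)
    best_cost, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        cost = sum(np.sum((x[i] - y[j]) ** 2) for i, j in enumerate(perm)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost

x = np.array([0.9, 0.1, 0.5, 0.3])
y = np.array([2.0, 4.0, 1.0, 3.0])
perm, cost = monge_bruteforce(x, y)

# Monotone (sorted-to-sorted) matching for comparison
monotone = np.empty(len(x), dtype=int)
monotone[np.argsort(x)] = np.argsort(y)
```

Here `perm` is `(1, 2, 3, 0)`, which agrees with `monotone`. For larger $N$ one would use a dedicated assignment solver rather than enumeration.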

Page 94: Multivariate Quantiles and Ranks using Optimal Transportation

Absolutely continuous case

$\mu$ and $\nu$ are absolutely continuous

Then there is a unique solution to the (K) problem which turns out to be also the solution of the (M) problem

Semi-discrete case

$\nu$ supported on a finite set $\{y_j\}_{j=1}^N$; $\mu$ abs. cont. with support $S \subset \mathbb{R}^d$

Monge’s problem: Find $T$ s.t. $T \# \mu = \sum_{j=1}^N \nu_j \delta_{y_j}$ and $T$ minimizes

$\int \|x - T(x)\|^2 \, d\mu(x) = \sum_{j=1}^N \int_{\{x : T(x) = y_j\}} \|x - y_j\|^2 \, d\mu(x)$

Note that: $\mu(T^{-1}(y_j)) = \mu(\{x : T(x) = y_j\}) = \nu_j$

Then there is a unique solution to the (K) problem which turns out to be also the solution of the (M) problem