Multivariate Quantiles and Ranks using Optimal Transportation
Bodhisattva Sen¹
Department of Statistics, Columbia University, New York
Department of Statistics, George Mason University
Joint work with Promit Ghosal (Columbia University)
05 April, 2019
¹ Supported by NSF grants DMS-1712822 and AST-1614743
How to define ranks and quantiles in $\mathbb{R}^d$, $d > 1$?
Ranks and quantiles when $d = 1$
$X$ is a random variable with c.d.f. $F$
Rank: The rank of $x \in \mathbb{R}$ is $F(x)$
Property: If $F$ is continuous, $F(X) \sim \mathrm{Unif}([0, 1])$
Quantile: The quantile function is $F^{-1}$
Property: If $F$ is continuous, $F^{-1}(U) \sim F$ where $U \sim \mathrm{Unif}([0, 1])$
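The two properties can be checked numerically. The sketch below is only an illustration: it assumes $F$ is the Exp(1) c.d.f. (a choice not made in the talk), and the names `exp_cdf` and `exp_quantile` are hypothetical helpers.

```python
import math
import random

def exp_cdf(x):
    """Rank map F(x) = 1 - exp(-x) of the Exp(1) distribution."""
    return 1.0 - math.exp(-x)

def exp_quantile(u):
    """Quantile map F^{-1}(u) = -log(1 - u) of the Exp(1) distribution."""
    return -math.log(1.0 - u)

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 20000

# Ranks of Exp(1) draws should behave like Unif([0, 1]): mean close to 1/2.
ranks = [exp_cdf(rng.expovariate(1.0)) for _ in range(n)]
mean_rank = sum(ranks) / n

# Quantiles of uniform draws should behave like Exp(1): mean close to 1.
draws = [exp_quantile(rng.random()) for _ in range(n)]
mean_draw = sum(draws) / n
```

Comparing only the means is a crude check; a histogram of `ranks` would show the uniform shape more directly.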
Defining quantiles, ranks, depth, etc. difficult when $d > 1$
Lack of a natural ordering in $\mathbb{R}^d$, when $d > 1$
Many notions of multivariate quantiles/ranks have been suggested:
Puri and Sen (1971), Chaudhuri and Sengupta (1993), Mottonen and Oja (1995), Chaudhuri (1996), Liu and Singh (1993), Serfling (2010) ...
Spatial median and geometric quantile
Spatial median: $M := \arg\min_{m \in \mathbb{R}^d} E\|X - m\|$
Quantile when $d = 1$: For $u \in (0, 1)$,
$F^{-1}(u) = \arg\min_{x \in \mathbb{R}} E\big[|X - x| - (2u - 1)x\big]$
Geometric quantile [Chaudhuri (1996)]: For $\|u\| < 1$, let
$Q(u) := \arg\min_{x \in \mathbb{R}^d} E\big[\|X - x\| - \langle u, x \rangle\big]$
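The $d = 1$ identity can be checked on a sample by replacing the expectation with an average. This is a minimal sketch with hypothetical data and helper names; it assumes $nu$ is not an integer so that the minimizer is unique.

```python
import math

def quantile_objective(x, data, u):
    """Empirical version of E[|X - x| - (2u - 1) x]."""
    return sum(abs(xi - x) for xi in data) / len(data) - (2 * u - 1) * x

def quantile_by_minimization(data, u):
    """Minimize over the data points; the objective is convex and piecewise
    linear with kinks at the data, so a minimizer lies at one of them."""
    return min(data, key=lambda x: quantile_objective(x, data, u))

def quantile_by_order_statistic(data, u):
    """Usual sample quantile X_(ceil(n u))."""
    ordered = sorted(data)
    return ordered[math.ceil(len(data) * u) - 1]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical sample
```

For example, both functions return 3 at `u = 0.25` and the median 5 at `u = 0.5`. The geometric quantile replaces $|X - x|$ by $\|X - x\|$ and $(2u - 1)x$ by $\langle u, x \rangle$ in the same objective.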
Outline
1 Introduction to Optimal Transportation
  Monge's Problem
  Kantorovich Relaxation: Primal Problem
  A Geometric Approach
2 Quantile and Rank Functions in $\mathbb{R}^d$ ($d \ge 1$)
3 Some Applications in Statistics
  Two-sample Goodness-of-fit Testing
  Independence Testing
Gaspard Monge (1781): What is the cheapest way to transport a pile of sand to cover a sinkhole?
[Figure: Monge Problem. Source: Blanchet (Columbia U. and Stanford U.)]
Goal: $\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))]$
$\mu$ (on $\mathcal{X}$) and $\nu$ (on $\mathcal{Y}$) probability measures, $\int_{\mathcal{X}} d\mu(x) = \int_{\mathcal{Y}} d\nu(y) = 1$
$c(x, y) \ge 0$: cost of transporting $x$ to $y$ (e.g., $c(x, y) = \|x - y\|^p$)
$T$ transports $\mu$ to $\nu$, i.e., $T(X) \sim \nu$ where $X \sim \mu$, or,
$\nu(B) = \mu(T^{-1}(B)) = \int_{T^{-1}(B)} d\mu, \quad B \subset \mathcal{Y}$
One-dimensional optimal transport
Suppose $\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu, \nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s
Goals: (i) Transport $\mu$ to $\nu$; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$
(ii) $T$ minimizes cost $E_\mu[(X - T(X))^2]$; assume $c(x, y) = (x - y)^2$
Figure 3: Two densities $p$ and $q$ and the optimal transport map that morphs $p$ into $q$.
where $p \ge 1$. When $p = 1$ this is also called the Earth Mover distance. The minimizer $J^*$ (which does exist) is called the optimal transport plan or the optimal coupling. In case there is an optimal transport map $T$ then $J$ is a singular measure with all its mass on the set $\{(x, T(x))\}$.
It can be shown that
$W_p^p(P, Q) = \sup_{\psi, \phi} \int \psi(y)\, dQ(y) - \int \phi(x)\, dP(x)$
where $\psi(y) - \phi(x) \le \|x - y\|^p$. This is called the dual formulation. In the special case where $p = 1$ we have the very simple representation
$W_1(P, Q) = \sup\left\{ \int f(x)\, dP(x) - \int f(x)\, dQ(x) : f \in \mathcal{F} \right\}$
where $\mathcal{F}$ denotes all maps from $\mathbb{R}^d$ to $\mathbb{R}$ such that $|f(y) - f(x)| \le \|x - y\|$ for all $x, y$.
When $d = 1$, the distance has a closed form:
$W_p(P, Q) = \left( \int_0^1 |F^{-1}(z) - G^{-1}(z)|^p \, dz \right)^{1/p}$
The minimizing $T$ must satisfy (Why?)
$(x_0 - T(x_0))^2 + (x_1 - T(x_1))^2 \le (x_0 - T(x_1))^2 + (x_1 - T(x_0))^2$
This means that if $x_1 > x_0$ then $T(x_1) \ge T(x_0)$
So $T$ must be a monotone nondecreasing function
Therefore, choose $T(\cdot)$ so that (recall: $\nu(B) = \int_{T^{-1}(B)} d\mu$)
$\int_{-\infty}^{x} d\mu(x) = \int_{-\infty}^{T(x)} d\nu(y) \;\Rightarrow\; F_\mu(x) = F_\nu(T(x))$
Thus, $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)
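On the "(Why?)": swapping the images of $x_0$ and $x_1$ cannot lower the cost of an optimal map, and expanding the squares reduces the inequality to $(x_1 - x_0)(T(x_1) - T(x_0)) \ge 0$. The sketch below checks the discrete analogue by brute force. It assumes two empirical measures with the same number of equal-weight atoms (so it is not the abs. cont. setting of the slide), and the data and function names are hypothetical.

```python
from itertools import permutations

def matching_cost(xs, ys):
    """Average squared cost of sending xs[i] to ys[i]."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def monotone_matching(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y
    (the empirical analogue of F_nu^{-1} o F_mu)."""
    return dict(zip(sorted(xs), sorted(ys)))

xs = [0.3, -1.2, 2.5, 0.9, -0.4]   # atoms of mu (hypothetical data)
ys = [4.0, 1.5, 3.2, 0.1, 2.2]     # atoms of nu (hypothetical data)

T = monotone_matching(xs, ys)
sorted_cost = matching_cost(sorted(xs), sorted(ys))
# Exhaustive search over all 5! = 120 ways of pairing the atoms.
brute_force_cost = min(matching_cost(xs, perm) for perm in permutations(ys))
```

The monotone pairing attains the brute-force minimum, and `T` sends the smallest atom of `xs` to the smallest atom of `ys`.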
Optimal transportation when $d = 1$
$\mathcal{X}, \mathcal{Y} \subset \mathbb{R}$; $\mu, \nu$ abs. cont.; $F_\mu$ and $F_\nu$ c.d.f.'s
Goals: (i) Transport $\mu$ to $\nu$; i.e., find $T$ s.t. if $X \sim \mu$ then $T(X) \sim \nu$
(ii) $T$ minimizes cost $E_\mu[(X - T(X))^2]$
Solution: $T = F_\nu^{-1} \circ F_\mu$ (and this map $T$ is unique)
Ranks and Quantiles when $d = 1$
When $\mu = \mathrm{Unif}([0, 1])$, $T = F_\nu^{-1}$ transports $\mu$ to $\nu$: the quantile map
When $\nu = \mathrm{Unif}([0, 1])$, $T = F_\mu$: the rank map
Thus, when $d = 1$, the rank and quantile maps are solutions to the optimal transport problem
How to do this in higher dimensions, e.g., when $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$, $d > 1$?
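As a concrete $d = 1$ instance, the sketch below assumes $\mu = N(0, 1)$ and $\nu = N(2, 3^2)$ (illustrative choices, not from the talk) and uses the standard-library `statistics.NormalDist` for the c.d.f. and its inverse. In this case $F_\nu^{-1} \circ F_\mu$ is the affine map $x \mapsto 2 + 3x$.

```python
from statistics import NormalDist

mu = NormalDist(0.0, 1.0)   # source measure (illustrative choice)
nu = NormalDist(2.0, 3.0)   # target measure (illustrative choice)

def transport(x):
    """Optimal map T = F_nu^{-1} o F_mu."""
    return nu.inv_cdf(mu.cdf(x))

def rank(x):
    """Rank map of mu: the transport map onto Unif([0, 1])."""
    return mu.cdf(x)

def quantile(u):
    """Quantile map of nu: the transport map from Unif([0, 1])."""
    return nu.inv_cdf(u)
```

For instance, `transport(1.0)` is 5.0 up to floating-point error, and `transport` equals `quantile` composed with `rank`.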
Monge's problem: Given probability measures $\mu$ and $\nu$ solve:
$\inf_{T : T(X) \sim \nu} E_\mu[c(X, T(X))] = \inf_{T : T_\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x) \quad (1)$
where $T(X) \sim \nu$ iff $\mu(T^{-1}(B)) = \nu(B)$, for all $B$ Borel
Drawbacks
Above optimization problem is highly non-linear and can be ill-posed
No admissible $T$ may exist; e.g., if $\mu$ is the Dirac delta and $\nu$ is not
Moreover, the infimum in (1) may not be attained, i.e., a limit of transport maps $\{T_i\}_{i \ge 1}$ may fail to be a transport map
Solution need not be unique (book shifting example)
Not much progress was made for about 160 yrs!
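A minimal worked instance of the second drawback, with a specific (assumed) choice of $\mu$ and $\nu$:

```latex
\mu = \delta_0, \qquad \nu = \tfrac12\,\delta_{-1} + \tfrac12\,\delta_{1}.
% Any map T sends all of the mass at 0 to the single point T(0):
T_\#\mu = \delta_{T(0)} \neq \nu \quad \text{for every measurable } T,
% so the constraint set in (1) is empty: the mass at 0 would have to be split.
```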
Kantorovich Relaxation: Primal Problem
Monge's problem (M): $\inf_{T : T_\#\mu = \nu} \int_{\mathcal{X}} c(x, T(x))\, d\mu(x)$
Let $\Pi(\mu, \nu)$ be the class of joint distributions of $(X, Y) \sim \pi$ s.t.
$\pi_X$ = marginal of $X$ = $\mu$, $\pi_Y$ = marginal of $Y$ = $\nu$
Kantorovich relaxation (K): Solve
$\min_{\pi \in \Pi(\mu, \nu)} E_\pi[c(X, Y)] = \min_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, d\pi(x, y)$
Always has a solution for $c(\cdot, \cdot) \ge 0$ lower semicontinuous (l.s.c.)²
Linear program (infinite dimensional)
Result: If $\mu$ is abs. cont. then (K)=(M) and $\pi_{opt} = (id, T_{opt})_\#\mu$
Computation: Kantorovich Dual Problem
² A function $\phi : \mathbb{R}^d \to \mathbb{R}$ is l.s.c. at $x_0$ iff $\liminf_{x \to x_0} \phi(x) \ge \phi(x_0)$
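When $\mu$ and $\nu$ are discrete
Kantorovich relaxation: Solve $\min\{E_\pi[c(X, Y)] : \pi \in \Pi(\mu, \nu)\}$
Discrete version: $\mu$ and $\nu$ supported on $\{x_i\}_{i=1}^M$ and $\{y_j\}_{j=1}^N$. Then
$\min_{\{p_{ij} \ge 0\}} \sum_{i=1}^M \sum_{j=1}^N p_{ij}\, c(x_i, y_j) : \quad \sum_{i=1}^M p_{ij} = \nu(y_j); \quad \sum_{j=1}^N p_{ij} = \mu(x_i)$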
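Discrete Kantorovich formulation (Earth Mover's Distance)
Let $\mu = \sum_{i=1}^N p_i \delta_{x_i}$ and $\nu = \sum_{j=1}^M q_j \delta_{y_j}$, where $\delta_{x_i}$ is a Dirac measure,
$K(\mu, \nu) = \min_\gamma \sum_i \sum_j c(x_i, y_j)\, \gamma_{ij}$
s.t. $\sum_j \gamma_{ij} = p_i, \quad \sum_i \gamma_{ij} = q_j, \quad \gamma_{ij} \ge 0 \quad (7)$
Source: S. Kolouri and G. K. Rohde, OT Crash Course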
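For two atoms on each side the linear program can be solved by hand, which makes the mass splitting allowed by the relaxation visible. The sketch below is restricted to that 2x2 case; the function `solve_2x2`, the weights and the support points are all hypothetical. A general instance would need an LP solver.

```python
def solve_2x2(p, q, cost):
    """Solve the 2x2 discrete Kantorovich problem exactly.

    Every feasible plan has the form
        [[t,        p[0] - t],
         [q[0] - t, p[1] - q[0] + t]]
    with t in [max(0, q[0] - p[1]), min(p[0], q[0])]. The objective is
    linear in t, so an optimum is attained at one of the two endpoints.
    """
    def plan(t):
        return [[t, p[0] - t], [q[0] - t, p[1] - q[0] + t]]

    def total(g):
        return sum(cost[i][j] * g[i][j] for i in range(2) for j in range(2))

    endpoints = (max(0.0, q[0] - p[1]), min(p[0], q[0]))
    best = min((plan(t) for t in endpoints), key=total)
    return best, total(best)

x, y = [0.0, 1.0], [0.0, 1.0]            # support points (hypothetical)
p, q = [0.75, 0.25], [0.5, 0.5]          # weights p_i and q_j (hypothetical)
cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
gamma, value = solve_2x2(p, q, cost)
```

Here the optimal plan keeps mass 0.5 at 0 and moves mass 0.25 from 0 to 1, so the atom at $x_1 = 0$ is split between two destinations. No Monge map can do that, which is why (K) can be feasible when (M) is not.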
A Geometric Approach to Optimal Transportation
$\mu, \nu$: two probability measures on $\mathbb{R}^d$; $c(u, x) = \|u - x\|^2$
Monge's problem³ (M): $\inf_{T : T_\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u)$
$T_\#\mu$ is the push forward of $\mu$ by $T$, i.e., $T_\#\mu(B) = \mu(T^{-1}(B))$, $\forall B$
Kantorovich Relaxation (K): $\min_{\pi \in \Pi(\mu, \nu)} \int \|u - x\|^2\, d\pi(u, x)$
Compared to the above notions this approach has the following advantages:
This relies on appealing geometric ideas
Does not require any moment conditions
When $d = 1$ the optimal transport map is $T = F_\nu^{-1} \circ F_\mu$ irrespective of moment assumptions
³ Monge's problem is not meaningful unless $\mu$ and $\nu$ have finite second moments
$U \sim \mu$: abs. cont. distribution with support $S \subset \mathbb{R}^d$
Example: $\mu = \mathrm{Unif}([0, 1]^d)$ or $\mathrm{Unif}(B_d(0, 1))$
$X \sim \nu$; $\nu$ is a given probability measure in $\mathbb{R}^d$
Goal: Find the "optimal" transportation map $T$ s.t. $T_\#\mu = \nu$
Theorem [Knott and Smith, Brenier, McCann ...]
There is a $\mu$-a.e. unique measurable mapping $Q : S \to \mathbb{R}^d$, transporting $\mu$ to $\nu$ (i.e., $Q_\#\mu = \nu$ or $Q(U) \sim \nu$), of the form
$Q(u) = \nabla\varphi(u)$, for $\mu$-a.e. $u$,
where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function (cf. when $d = 1$).
If, in addition, $\mu, \nu$ have finite second moments, then
(i) $Q(\cdot)$ is the $\mu$-a.e. unique transport map (sol. (M)), i.e.,
$\inf_{T : T_\#\mu = \nu} \int \|u - T(u)\|^2\, d\mu(u) = \int \|u - Q(u)\|^2\, d\mu(u)$;
(ii) the $\mu$-a.e. unique optimal transference plan (sol. (K)) is $\pi = (id, Q)_\#\mu$
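A case where the map in the theorem is explicit, given here only as an illustration: it assumes Gaussian $\mu$ and $\nu$, which is not the reference measure used in the examples above.

```latex
% Assumed example: \mu = N(0, I_d), \nu = N(m, \Sigma), with \Sigma symmetric positive definite.
\varphi(u) = \langle m, u \rangle + \tfrac12\, u^\top \Sigma^{1/2} u,
\qquad
Q(u) = \nabla\varphi(u) = m + \Sigma^{1/2} u .
% \varphi is convex because \Sigma^{1/2} is positive definite, and
U \sim N(0, I_d) \;\Longrightarrow\; Q(U) = m + \Sigma^{1/2} U \sim N(m, \Sigma),
% so Q_\#\mu = \nu. For d = 1 this reads Q(u) = m + \sigma u = F_\nu^{-1}(F_\mu(u)).
```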
Quantile map when $d \ge 1$
$\mu$ has an abs. cont. distribution with support $S \subset \mathbb{R}^d$
$\nu$ a given probability measure in $\mathbb{R}^d$ (need not be abs. cont.)
Quantile map
The quantile map of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique mapᵃ
$Q \equiv \nabla\varphi : S \to \mathbb{R}^d$
where $\nabla\varphi$ pushes $\mu$ to $\nu$ (i.e., $\nabla\varphi_\#\mu = \nu$) and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex function.
Restatement: The quantile map $Q$ of $\nu$ (w.r.t. $\mu$) is the $\mu$-a.e. unique map that is the gradient of a convex function and pushes $\mu$ to $\nu$.
ᵃ Note that $Q$ is uniquely defined $\mu$-a.e.; w.l.o.g., let $\varphi(u) = +\infty$ for $u \in S^c$
In the statistics literature this study was initiated by Chernozhukov et al. (2017, AoS); Hallin (2018, AoS, in revision)
Sample quantiles when $d = 1$
$\mu = \mathrm{Uniform}([0, 1])$
$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}$
Let $X_{(1)} < \ldots < X_{(n)}$ be the order statistics
[Figure: Quantile function; horizontal axis $u$, vertical axis $Q$]
Then the sample quantile function $Q_n$ ($(Q_n)_\#\mu = \nu_n$) reduces to
$Q_n(u) = X_{(i)}$, if $u \in \left(\frac{i-1}{n}, \frac{i}{n}\right)$, $i = 1, \ldots, n$
At $\frac{i}{n}$, $i = 1, \ldots, n - 1$, we are free to define
$Q_n\left(\frac{i}{n}\right) \in [X_{(i)}, X_{(i+1)}]$
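The step function above can be sketched in a few lines. This is a minimal illustration with a hypothetical sample; at the breakpoints $u = i/n$ it makes one admissible choice, returning $X_{(i)}$.

```python
import math

def sample_quantile(data, u):
    """Q_n(u) = X_(i) for u in ((i-1)/n, i/n); at u = i/n this returns X_(i),
    one of the values allowed in [X_(i), X_(i+1)]."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    ordered = sorted(data)
    i = math.ceil(len(ordered) * u)
    return ordered[i - 1]

data = [0.4, -1.3, 0.9, -0.2, 0.1]   # hypothetical sample, n = 5
```

Each data point is the image of an interval of length $1/n$, which is the push-forward condition $(Q_n)_\#\mu = \nu_n$ for $\mu = \mathrm{Uniform}([0, 1])$.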
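Sample quantiles in $\mathbb{R}^d$, $d \ge 1$
$\mu$ abs. cont. with support $S \subset \mathbb{R}^d$; e.g., $\mu = \mathrm{Uniform}([0, 1]^d)$
$\nu \equiv \nu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ is the empirical distribution of $\{X_i\}_{i=1}^n \subset \mathbb{R}^d$
$Q_n$ is the transport (Monge) map s.t. $(Q_n)_\#\mu = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$ and minimizes (in this case (K)=(M))
$\int_S \|u - T(u)\|^2\, d\mu(u) = \sum_{i=1}^n \int_{\{u \in S : T(u) = X_i\}} \|u - X_i\|^2\, d\mu(u)$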
Computation in the semi-discrete case
Obtain a convex subdivision of $S$: a "partition" $S = \cup_{i=1}^n Q_n^{-1}(X_i)$
Top-dimensional cells: convex polyhedral sets in the subdivision of $S$ with non-empty interior
Question: How to compute $Q_n$? (Figures and plots?)
Figure (four panels, both axes running from 0.0 to 1.0): The data sets are drawn from the following distributions (clockwise top to bottom): (i) $X \sim N_2((0, 0), I_2)$; (ii) $X \sim N_2((0, 0), \Sigma)$ where $\Sigma_{1,1} = \Sigma_{2,2} = 1$ and $\Sigma_{1,2} = \Sigma_{2,1} = 0.99$; (iii) two spiral structures with Gaussian perturbations (with small variance); and (iv) a mixture of four different distributions.
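One naive way to approach the computation is sketched below. It is not the method used for the figure: it assumes the cells have the form $\{u : \langle X_i, u \rangle + h_i \ge \langle X_j, u \rangle + h_j \ \forall j\}$, replaces $\mu = \mathrm{Uniform}([0, 1]^2)$ by a grid, and adjusts the heights $h_i$ by a simple damped iteration with no convergence guarantee claimed. All names, the step size and the data are hypothetical.

```python
def cell_index(u, points, h):
    """Index i maximizing <X_i, u> + h_i, i.e. the cell containing u."""
    def score(i):
        return sum(a * b for a, b in zip(points[i], u)) + h[i]
    return max(range(len(points)), key=score)

def cell_masses(grid, points, h):
    """Fraction of grid points (a stand-in for mu) falling in each cell."""
    counts = [0] * len(points)
    for u in grid:
        counts[cell_index(u, points, h)] += 1
    return [c / len(grid) for c in counts]

def fit_heights(points, grid, step=0.1, iters=300):
    """Raise h_i when cell i is too light, lower it when too heavy,
    and keep the iterate whose cell masses are closest to 1/n."""
    n = len(points)
    h = [0.0] * n
    best_h, best_err = list(h), float("inf")
    for _ in range(iters):
        masses = cell_masses(grid, points, h)
        err = max(abs(m - 1.0 / n) for m in masses)
        if err < best_err:
            best_h, best_err = list(h), err
        h = [hi + step * (1.0 / n - m) for hi, m in zip(h, masses)]
    return best_h, best_err

m = 15
grid = [((a + 0.5) / m, (b + 0.5) / m) for a in range(m) for b in range(m)]
points = [(0.2, 0.1), (0.9, 0.4), (0.3, 0.8)]   # hypothetical data in R^2
initial_err = max(abs(x - 1.0 / 3) for x in cell_masses(grid, points, [0.0] * 3))
best_h, best_err = fit_heights(points, grid)
```

With the fitted heights, `points[cell_index(u, points, best_h)]` plays the role of $Q_n(u)$. Dedicated semi-discrete solvers compute the cells exactly instead of counting grid points.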
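Rank map
$\mu$ has an abs. cont. distribution with support $S \subset \mathbb{R}^d$
$\nu$ a given probability measure in $\mathbb{R}^d$ (need not be abs. cont.)
Quantile map
The quantile map of $\nu$ (w.r.t. $\mu$): $\mu$-a.e. unique map $Q \equiv \nabla\varphi : S \to \mathbb{R}^d$
where $\nabla\varphi_\#\mu = \nu$ and $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convexᵃ function
ᵃ By convention $\varphi(u) = +\infty$ for $u \notin S$
Rank map
The rank map of $\nu$ (w.r.t. $\mu$) is defined by
$R \equiv \nabla\varphi^*$
where $\nabla\varphi_\#\mu = \nu$, $\varphi^* : \mathbb{R}^d \to \mathbb{R}$ is the (convex) Legendre-Fenchel dual of $\varphi$:
$\varphi^*(x) := \sup_{u \in \mathbb{R}^d} \{\langle x, u \rangle - \varphi(u)\} = \sup_{u \in S} \{\langle x, u \rangle - \varphi(u)\}$
Note that the rank map $R(\cdot)$ is finite on $\mathbb{R}^d$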
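A small worked case of the dual, under the assumption $d = 1$ and $\mu = \nu = \mathrm{Unif}([0, 1])$ (so that $Q$ is the identity on $S = [0, 1]$):

```latex
\varphi(u) = \begin{cases} u^2/2, & u \in [0, 1] \\ +\infty, & u \notin [0, 1] \end{cases}
\qquad
\varphi^*(x) = \sup_{u \in [0, 1]} \{ xu - u^2/2 \}
= \begin{cases} 0, & x < 0 \\ x^2/2, & 0 \le x \le 1 \\ x - 1/2, & x > 1 \end{cases}
% Differentiating:
R(x) = \nabla\varphi^*(x) = \min\{\max\{x, 0\}, 1\} = F_\nu(x).
% R is defined and finite on all of \mathbb{R}, and it agrees with the c.d.f. of \nu.
```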
When is $R = Q^{-1}$?
$X \sim \nu$: supported on convex $\mathcal{X} \subset \mathbb{R}^d$ with Lebesgue density⁴
$U \sim \mu$: supported on convex compact $S \subset \mathbb{R}^d$ with bounded density
Result [Ghosal and S. (2018+)]
Let $Q = \nabla\varphi$ be the quantile function of $\nu$ (w.r.t. $\mu$), where $\varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is convex. Then:
(i) The inverse function of $Q$ exists, and has the form
$Q^{-1} = \nabla\varphi^* =: R$,
where $\varphi^*$ is the Legendre-Fenchel dual of $\varphi$.
(ii) $Q$ is a homeomorphism from $\mathrm{Int}(S)$ to $\mathrm{Int}(\mathcal{X})$ (cf. when $d = 1$ᵃ)
(iii) $R = Q^{-1}$ is the $\nu$-a.e. unique map that pushes $\nu$ to $\mu$ (i.e., $R_\#\nu = \mu$) which is the gradient of a convex function.
ᵃ the c.d.f. $F$ is continuous and strictly increasing
⁴ with mild boundedness (from below and above) assumptions on the density on $\mathcal{X}$
Properties of the rank/quantile maps
Characterizes the distribution
The quantile and rank functions characterize the associated distribution
Equivariance under orthogonal transformations
Suppose $Y = AX$, $A$ is a $d \times d$ matrix
$A$ is an orthogonal matrix, i.e., $AA^\top = A^\top A = I_d$
$\mu$: spherically symmetric distribution (e.g., $\mu = \mathrm{Uniform}(B_d(0, 1))$)
Then, $Q_Y(u) = A Q_X(A^\top u)$ for $\mu$-a.e. $u$
$R_Y(y) = A R_X(A^\top y)$, for a.e. $y \in \mathbb{R}^d$
Quantile/rank maps — equivariant under orthogonal transformations
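A short sketch of why the quantile identity holds, using only the characterization of $Q$ as the $\mu$-a.e. unique gradient of a convex function pushing $\mu$ to the target (the auxiliary function $\psi$ is introduced here for the argument):

```latex
% Write Q_X = \nabla\varphi_X and set \psi(u) := \varphi_X(A^\top u).
\nabla\psi(u) = A\,\nabla\varphi_X(A^\top u) = A\,Q_X(A^\top u),
% and \psi is convex (a convex function composed with a linear map).
% If U \sim \mu with \mu spherically symmetric, then A^\top U \sim \mu, hence
Q_X(A^\top U) \sim \nu_X \quad\Longrightarrow\quad A\,Q_X(A^\top U) \overset{d}{=} AX = Y .
% So \nabla\psi is the gradient of a convex function pushing \mu to the law of Y,
% and by \mu-a.e. uniqueness Q_Y(u) = A\,Q_X(A^\top u).
```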
Under mutual independence
$X = (X_1, X_2, \ldots, X_k) \sim \nu$ where $k \ge 2$;
$X_i \sim \nu_i$, for $i = 1, \ldots, k$, are r.v.'s in $\mathbb{R}^{d_i}$ (here $d_1 + \ldots + d_k = d$)
$\mu = \mathrm{Uniform}([0, 1]^d)$
Let $Q$ and $Q_i$ be the quantile maps of $X$ and $X_i$, for $i = 1, \ldots, k$, respectively (w.r.t. $\mu$ and $\mu_i = \mathrm{Uniform}([0, 1]^{d_i})$)
Let $R$ and $R_i$, for $i = 1, \ldots, k$, be the corresponding rank maps
Mutual independence [Ghosal and S. (2018+)]
If $X_1, \ldots, X_k$ are mutually independent then
$Q(u_1, \ldots, u_k) = (Q_1(u_1), \ldots, Q_k(u_k))$, for $\mu$-a.e. $(u_1, \ldots, u_k)$,
$R(x_1, \ldots, x_k) = (R_1(x_1), \ldots, R_k(x_k))$, for a.e. $(x_1, \ldots, x_k) \in \mathbb{R}^d$.
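Sample rank map when $d \ge 1$
$X_1, \ldots, X_n \in \mathbb{R}^d$; $\mu$ is abs. cont. on $S \subset \mathbb{R}^d$ convex
Sample quantile function: $Q_n \equiv \nabla\varphi_n$ pushes $\mu$ to $\nu = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$
Observe that $\nabla\varphi_n \in \{X_1, \ldots, X_n\}$ $\mu$-a.e.
Thus $\varphi_n$ is a piecewise affine convex function:
$\varphi_n(u) = \begin{cases} \max_{i=1,\ldots,n} \{\langle X_i, u \rangle + h_i\}, & u \in S \\ +\infty, & u \in S^c \end{cases}$
Sample rank map
The sample rank map is defined as
$R_n = \nabla\varphi_n^*$
where $\varphi_n^* : \mathbb{R}^d \to \mathbb{R}$ is also convex piecewise affine:
$\varphi_n^*(x) = \sup_{u \in S} \{\langle x, u \rangle - \varphi_n(u)\}$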
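In $d = 1$ with $\mu = \mathrm{Uniform}([0, 1])$ the heights $h_i$ are available in closed form, which gives a concrete instance of the piecewise affine $\varphi_n$. The sketch below assumes distinct data and normalizes $h_1 = 0$; the function names and the sample are hypothetical.

```python
import math

def heights_1d(ordered):
    """Heights h_i for mu = Unif([0, 1]) and sorted distinct data, d = 1.

    Adjacent affine pieces X_(i) u + h_i and X_(i+1) u + h_(i+1) must cross
    at u = i/n, which gives h_(i+1) = h_i - (X_(i+1) - X_(i)) * i / n.
    """
    n = len(ordered)
    h = [0.0]
    for i in range(1, n):
        h.append(h[-1] - (ordered[i] - ordered[i - 1]) * i / n)
    return h

def phi_n(u, ordered, h):
    """phi_n(u) = max_i {X_i u + h_i} on S = [0, 1], +infinity elsewhere."""
    if not 0.0 <= u <= 1.0:
        return math.inf
    return max(x * u + hi for x, hi in zip(ordered, h))

def q_n(u, ordered, h):
    """Gradient of phi_n at u: the slope X_i of the active affine piece."""
    scores = [x * u + hi for x, hi in zip(ordered, h)]
    return ordered[scores.index(max(scores))]

ordered = sorted([0.4, -1.3, 0.9, -0.2, 0.1])   # hypothetical sample
h = heights_1d(ordered)
```

With these heights the $i$-th piece is active exactly on $((i-1)/n, i/n)$, so `q_n` recovers the sample quantile function $Q_n(u) = X_{(i)}$.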
How to define the sample ranks $R_n(X_i)$?
The sample rank map is $R_n = \nabla\varphi_n^*$ (Recall: $Q_n = \nabla\varphi_n$)
But $\varphi_n^*$ is not differentiable at $X_i$; $i = 1, \ldots, n$
Fact: $u \in \partial\varphi_n^*(X_i) \Leftrightarrow X_i \in \partial\varphi_n(u) = Q_n(u)$
Result: $R_n(X_i) \in \partial\varphi_n^*(X_i)$ which contains the top-dim. cell $Q_n^{-1}(X_i)$!
The sample ranks Rn(Xi ) when d = 1
The sample rank map: $R_n(x) = i/n$, if $x \in (X_{(i)}, X_{(i+1)})$
Free to define $R_n(X_{(i)})$ as anything in the interval $[(i-1)/n, i/n]$
Usual ranks when d = 1
The sample DF (rank): $F_n(x) = i/n$, if $x \in [X_{(i)}, X_{(i+1)})$
Convention: We can define $R_n(X_i) = \max_{u \in \mathrm{Cl}(Q_n^{-1}(X_i))} \|u\|$
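A small numerical check of this convention, assuming d = 1 and µ = Uniform([0, 1]) so that the cell of the i-th order statistic is ((i − 1)/n, i/n). The helper name `rank_cells` is hypothetical.

```python
import numpy as np

def rank_cells(x):
    """Endpoints (lower, upper) of the cell Q_n^{-1}(X_i) for each data point."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = np.empty(n, dtype=int)
    r[np.argsort(x)] = np.arange(1, n + 1)  # r[i] = position of x[i] in sorted order
    return (r - 1) / n, r / n

x = np.array([2.3, -0.7, 5.1, 0.4])
lower, upper = rank_cells(x)

# Max of |u| over the closed cell [(i-1)/n, i/n] is the right endpoint i/n.
ranks_max_convention = upper

# Usual empirical distribution function evaluated at the data points.
ecdf_at_data = (x[None, :] <= x[:, None]).mean(axis=1)
print(ranks_max_convention, ecdf_at_data)
```

Under these assumptions the max-norm convention reproduces the usual ranks $F_n(X_i)$.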
Distribution-free multivariate ranks Rn(Xi )
When d = 1, the ranks $\{F_n(X_i)\}_{i=1}^{n}$ are identically distributed (on $\{1/n, 2/n, \ldots, n/n\}$ with probability $1/n$ each)
We define $R_n(X_i)$ as a random point drawn from the uniform distribution on the cell $Q_n^{-1}(X_i)$, i.e.,
$$R_n(X_i) \mid X_1, \ldots, X_n \sim \mathrm{Uniform}(Q_n^{-1}(X_i))$$
Result: If µ = Uniform(S), Rn(Xi ) ∼ µ = Uniform(S)
Compare: R(Xi ) ∼ µ = Uniform(S), R is the pop. rank map
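The randomized definition is easy to simulate in one dimension. The sketch below assumes d = 1, µ = Uniform([0, 1]) and i.i.d. continuous data, so that the cell of the i-th order statistic is ((i − 1)/n, i/n); the function name `randomized_ranks` is hypothetical.

```python
import numpy as np

def randomized_ranks(x, rng):
    """Draw R_n(X_i) uniformly from the cell ((r_i - 1)/n, r_i/n) of each point."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = np.empty(n, dtype=int)
    r[np.argsort(x)] = np.arange(1, n + 1)
    return rng.uniform((r - 1) / n, r / n)

rng = np.random.default_rng(1)
# Heavy-tailed data: the law of the randomized ranks should not depend on this choice.
x = rng.standard_cauchy(size=8)
u = randomized_ranks(x, rng)
print(np.round(u, 3))
```

For i.i.d. continuous data the rank index of $X_i$ is uniform on {1, ..., n} by exchangeability, and a uniform draw inside a uniformly chosen cell of width 1/n is Uniform([0, 1]); this is the d = 1 instance of the result stated above.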
Glivenko-Cantelli type result [Ghosal & S. (2018+)]
Let $X_1, X_2, \ldots \in \mathbb{R}^d$ be i.i.d. $\nu$ where $\nu$ is abs. cont. with support $\mathcal{X}$
Take $\mu = \mathrm{Uniform}(B_d(0, 1))$
Let $Q$ and $R$ be the quantile and rank maps of $\nu$ (w.r.t. $\mu$)
Suppose $Q = \nabla\varphi$ where $\nabla\varphi : \mathrm{Int}(S) \to \mathrm{Int}(\mathcal{X})$ is a homeomorphism
Let $Q_n$ and $R_n$ be any sample quantile and rank functions
Let $K_1 \subset \mathrm{Int}(S)$ be a compact set. Then, we have
$$\sup_{u \in K_1} \|Q_n(u) - Q(u)\| \xrightarrow{a.s.} 0 \quad \text{and} \quad \sup_{x \in \mathbb{R}^d} \|R_n(x) - R(x)\| \xrightarrow{a.s.} 0$$
Generalizes the G-C result in Chernozhukov et al. (2017, AoS) which:
(i) assumed that $\nu$ is compactly supported;
(ii) showed uniform convergence of $R_n$ only on compacts inside $\mathrm{Int}(\mathcal{X})$;
(iii) showed convergence in probability.
Outline
1 Introduction to Optimal Transportation
Monge’s Problem
Kantorovich Relaxation: Primal Problem
A Geometric Approach
2 Quantile and Rank Functions in Rd (d ≥ 1)
3 Some Applications in Statistics
Two-sample Goodness-of-fit Testing
Independence Testing
Two-sample Testing
Suppose that X1, . . . ,Xm are i.i.d. νX (abs. cont.) on Rd
Suppose that Y1, . . . ,Yn are i.i.d. νY (abs. cont.) on Rd
µ: Distribution on S ⊂ Rd (e.g., µ = Unif([0, 1]d))
Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \neq \nu_Y$
Quantile maps
$\hat{Q}_X$ and $\hat{Q}_Y$ are the sample quantile maps for the $X_i$’s and $Y_j$’s
Population quantile maps: $Q_X$ and $Q_Y$
Recall: $Q_X \# \mu = \nu_X$ and $Q_Y \# \mu = \nu_Y$
Goal: Test $H_0 : \nu_X = \nu_Y$ versus $H_1 : \nu_X \neq \nu_Y$
$\hat{Q}_X$ and $\hat{Q}_Y$ are the sample quantile maps for the $X_i$’s and $Y_j$’s
Joint rank map: $R_{m,n}$ is the rank map (properly defined) of the combined sample $X_1, \ldots, X_m, Y_1, \ldots, Y_n$
Test statistic:
$$T_{m,n} := \int_S \bigl\| R_{m,n}(\hat{Q}_X(u)) - R_{m,n}(\hat{Q}_Y(u)) \bigr\|^2 \, d\mu(u)$$
Motivation: One-sample Cramér-von Mises statistic
$$\int \{F_n(x) - F(x)\}^2 \, dF(x) = \int_0^1 \{F_n(F^{-1}(u)) - u\}^2 \, du$$
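The identity behind the motivation is a change of variables. A brief worked version, assuming $F$ is continuous and strictly increasing so that $F(F^{-1}(u)) = u$ and $dF(x) = du$ under $x = F^{-1}(u)$:

```latex
\[
\int \{F_n(x) - F(x)\}^2 \, dF(x)
\;\overset{x = F^{-1}(u)}{=}\;
\int_0^1 \{F_n(F^{-1}(u)) - F(F^{-1}(u))\}^2 \, du
= \int_0^1 \{F_n(F^{-1}(u)) - u\}^2 \, du .
\]
```

Read this way, the two-sample statistic replaces the rank map $F_n$ by the joint rank map, $F^{-1}$ by the sample quantile map of the $X_i$'s, and the reference value $u$ by the joint rank of the corresponding sample quantile of the $Y_j$'s.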
Test statistic: $T_{m,n} = \int_S \bigl\| R_{m,n}(\hat{Q}_X(u)) - R_{m,n}(\hat{Q}_Y(u)) \bigr\|^2 \, d\mu(u)$
When d = 1, $T_{m,n}$ is distribution-free!
Question: Is $T_{m,n}$ (asymptotically) distribution-free when d > 1?
Critical value: Can always be computed by permutation test
Theorem [Ghosal and S. (2018+)]
Suppose that $\frac{m}{m+n} \to \theta \in (0, 1)$ as $m, n \to \infty$. Under $H_0 : \nu_X = \nu_Y$,
$$T_{m,n} \xrightarrow{P} 0, \quad \text{as } m, n \to \infty.$$
Further, for $\nu_X \neq \nu_Y$ (and mild regularity conditions on $\nu_X$ and $\nu_Y$),
$$T_{m,n} \xrightarrow{P} c > 0 \quad \text{as } m, n \to \infty.$$
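To make the statistic and the permutation calibration concrete, here is a minimal sketch for d = 1 with µ = Uniform([0, 1]). It assumes the joint rank map is the pooled empirical distribution function and approximates the integral on a midpoint grid; all function names are hypothetical and nothing here is taken from the authors' code.

```python
import numpy as np

def empirical_quantile(sample, u):
    """Sample quantile map in d = 1: equals X_(i) for u in ((i-1)/m, i/m]."""
    xs = np.sort(sample)
    idx = np.clip(np.ceil(u * xs.size).astype(int) - 1, 0, xs.size - 1)
    return xs[idx]

def statistic(x, y, grid_size=1000):
    """Midpoint-grid approximation of T_{m,n} for d = 1."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    pooled = np.sort(np.concatenate([x, y]))

    def joint_rank(t):
        # Pooled empirical distribution function evaluated at t.
        return np.searchsorted(pooled, t, side="right") / pooled.size

    diff = joint_rank(empirical_quantile(x, u)) - joint_rank(empirical_quantile(y, u))
    return float(np.mean(diff ** 2))

def permutation_p_value(x, y, n_perm=199, seed=0):
    """Recompute the statistic after randomly reassigning the pooled sample."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if statistic(perm[: x.size], perm[x.size:]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = rng.normal(size=30) + 10.0  # strongly shifted alternative
t_obs, p_val = permutation_p_value(x, y)
print(t_obs, p_val)
```

The permutation step only uses exchangeability of the pooled sample under $H_0$, which is why it applies in any dimension once the rank and quantile maps can be computed.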
Independence Testing
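$(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. $\nu$ (abs. cont.) on $\mathbb{R}^{d_X} \times \mathbb{R}^{d_Y}$; $d_X + d_Y = d$
Goal: Test $H_0 : X \perp\!\!\!\perp Y$ versus $H_1 : X \not\perp\!\!\!\perp Y$
$\mu_X = \mathrm{Unif}([0, 1]^{d_X})$, $\mu_Y = \mathrm{Unif}([0, 1]^{d_Y})$
$\mu = \mu_X \times \mu_Y = \mathrm{Unif}([0, 1]^d)$
$R_n : \mathbb{R}^d \to \mathbb{R}^d$: rank map of joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$
$Q_n$: sample quantile map of joint sample $(X_1, Y_1), \ldots, (X_n, Y_n)$
$R_n^X : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$: rank map of $X_1, \ldots, X_n$; similarly $R_n^Y$
Define $\tilde{R}_n := (R_n^X, R_n^Y) : \mathbb{R}^d \to [0, 1]^d$
Test statistic: $T_n := \int_S \bigl\| R_n(Q_n(u)) - \tilde{R}_n(Q_n(u)) \bigr\|^2 \, d\mu(u)$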
$R_n$ and $Q_n$: rank and quantile maps based on $(X_1, Y_1), \ldots, (X_n, Y_n)$
$\tilde{R}_n := (R_n^X, R_n^Y)$; $R_n^X : \mathbb{R}^{d_X} \to \mathbb{R}^{d_X}$ is the rank map of $X_1, \ldots, X_n$; similarly $R_n^Y$ is the rank map of $Y_1, \ldots, Y_n$
Test statistic: $T_n := \int_S \bigl\| R_n(Q_n(u)) - \tilde{R}_n(Q_n(u)) \bigr\|^2 \, d\mu(u)$
Question: Is $T_n$ (asymptotically) distribution-free for d > 1?
Critical value: Can always be computed by permutation principle
Theorem [Ghosal and S. (2018+)]
Under $H_0 : X \perp\!\!\!\perp Y$,
$$T_n \xrightarrow{P} 0, \quad \text{as } n \to \infty.$$
Further, if $X \not\perp\!\!\!\perp Y$ (and mild regularity conditions),
$$T_n \xrightarrow{P} c > 0, \quad \text{as } n \to \infty.$$
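A heuristic for the null limit, written at the population level. The symbols $T$ and $\tilde{R}$ below are introduced only for this sketch: $T$ is the population analogue of $T_n$, and $\tilde{R} := (R^X, R^Y)$ stacks the marginal population rank maps.

```latex
\[
T := \int_S \bigl\| R(Q(u)) - \tilde{R}(Q(u)) \bigr\|^2 \, d\mu(u),
\qquad \tilde{R} := (R^X, R^Y),
\]
\[
X \perp\!\!\!\perp Y
\;\Longrightarrow\;
R = (R^X, R^Y) = \tilde{R} \ \text{ a.e.}
\;\Longrightarrow\;
T = 0 .
\]
```

The middle implication is the mutual independence result with k = 2; the theorem says the sample version inherits this limit, and that the limit is strictly positive under dependence given the stated regularity conditions.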
Future research
Construct finite sample distribution-free goodness-of-fit tests
Power study of these testing procedures
Comparison with other methods; e.g., RKHS methods, energy distance methods, mutual information, etc.
Estimation of the “center” of the data cloud
Sample “median” $Q_n(0)$ when $\mu = \mathrm{Uniform}(B_d(0, 1))$
We can show $Q_n(0) \xrightarrow{P} Q(0)$. What about the rate of convergence?
What is the limiting distribution?
What about other sample quantiles?
Thank you very much!
Questions?
Kantorovich duality for general cost functions
π ∈ Π(µ, ν): all probability dist. on X × Y with marginals µ and ν
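Kantorovich duality
Let $I(\pi) := \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi(x, y)$ where $c(\cdot, \cdot) \geq 0$ is l.s.c. Then,
$$\inf_{\pi \in \Pi(\mu, \nu)} I(\pi) = \sup_{(\phi, \psi) \in \Phi_c} \left\{ \int_{\mathcal{X}} \phi(x) \, d\mu(x) + \int_{\mathcal{Y}} \psi(y) \, d\nu(y) \right\}$$
where $\Phi_c$ is the set of all measurable functions $(\phi, \psi) \in L^1(\mu) \times L^1(\nu)$ satisfying
$$\phi(x) + \psi(y) \leq c(x, y) \quad \text{for } \mu\text{-a.e. } x \in \mathcal{X}, \ \nu\text{-a.e. } y \in \mathcal{Y}.$$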
In fact, one can restrict to fncs. in Φc that are bounded & continuous.
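Duality when $c(x, y) = \|x - y\|^2$ and $\mathcal{X} = \mathcal{Y} = \mathbb{R}^d$
$\mu, \nu$ probability measures on $\mathbb{R}^d$ with finite second moments
$\pi \in \Pi(\mu, \nu)$: all prob. measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$
Let $M_2 = \int_{\mathbb{R}^d} \|x\|^2 \, d\mu(x) + \int_{\mathbb{R}^d} \|y\|^2 \, d\nu(y) < +\infty$
Then,
$$\inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle \, d\pi(x, y)$$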
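The identity follows by expanding the square and using that every $\pi \in \Pi(\mu, \nu)$ has marginals $\mu$ and $\nu$. A worked version:

```latex
\begin{align*}
\int \|x - y\|^2 \, d\pi(x, y)
&= \int \|x\|^2 \, d\pi(x, y) + \int \|y\|^2 \, d\pi(x, y) - 2 \int \langle x, y \rangle \, d\pi(x, y) \\
&= \int \|x\|^2 \, d\mu(x) + \int \|y\|^2 \, d\nu(y) - 2 \int \langle x, y \rangle \, d\pi(x, y) \\
&= M_2 - 2 \int \langle x, y \rangle \, d\pi(x, y).
\end{align*}
```

Since $M_2$ does not depend on $\pi$, minimizing the left-hand side over $\Pi(\mu, \nu)$ is the same as maximizing $\int \langle x, y \rangle \, d\pi$.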
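Kantorovich duality
For a pair $(\phi, \phi^*)$ of l.s.c. proper conjugate cvx. func. s.t. for a.e. $x, y \in \mathbb{R}^d$, $\langle x, y \rangle \leq \phi(x) + \phi^*(y)$, we have
$$\sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle \, d\pi(x, y) = \inf_{(\phi, \phi^*)} \left\{ \int \phi(x) \, d\mu(x) + \int \phi^*(y) \, d\nu(y) \right\} \qquad (2)$$
Result: Suppose that $(\varphi, \varphi^*)$ solves (2). If $\mu$ is abs. cont., then
(i) the unique optimal transference plan is $\pi = (\mathrm{id}, \nabla\varphi) \# \mu$;
(ii) $\nabla\varphi$ is the unique solution to the Monge problem:
$$\int \|x - \nabla\varphi(x)\|^2 \, d\mu(x) = \inf_{T : T \# \mu = \nu} \int \|x - T(x)\|^2 \, d\mu(x)$$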
Proof: Step 1
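Let $J(\phi, \psi) := \int_{\mathbb{R}^d} \phi(x) \, d\mu(x) + \int_{\mathbb{R}^d} \psi(y) \, d\nu(y)$
Duality: $\inf_{\pi \in \Pi(\mu, \nu)} I(\pi) = \sup_{(\phi, \psi) \in \Phi} J(\phi, \psi)$
where $(\phi, \psi) \in \Phi$ iff for a.a. $x, y \in \mathbb{R}^d$, $\phi(x) + \psi(y) \leq \|x - y\|^2$
Simple algebra yields: $\langle x, y \rangle \leq \tfrac{1}{2}\bigl[\|x\|^2 - \phi(x)\bigr] + \tfrac{1}{2}\bigl[\|y\|^2 - \psi(y)\bigr]$
Define: $\tilde{\phi}(x) = \tfrac{1}{2}\bigl[\|x\|^2 - \phi(x)\bigr]$, $\tilde{\psi}(y) = \tfrac{1}{2}\bigl[\|y\|^2 - \psi(y)\bigr]$
$$\inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\pi(x, y) = M_2 - 2 \sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle \, d\pi(x, y)$$
$$\sup_{(\phi, \psi) \in \Phi} J(\phi, \psi) = M_2 - 2 \inf_{(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$$
where $(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}$ iff for a.a. $x, y$, $\langle x, y \rangle \leq \tilde{\phi}(x) + \tilde{\psi}(y)$
Thus, $\sup_{\pi \in \Pi(\mu, \nu)} \int \langle x, y \rangle \, d\pi(x, y) = \inf_{(\tilde{\phi}, \tilde{\psi}) \in \tilde{\Phi}} J(\tilde{\phi}, \tilde{\psi})$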
Proof: Step 2
Double convexification trick to improve the admissible pairs in the dual problem
Semi-discrete OT
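Data: $X_1, \ldots, X_n$ in $\mathbb{R}^d$; $\nu = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$
$\mu$: an abs. cont. distribution on compact convex set $S \subset \mathbb{R}^d$
The dual Kantorovich problem in this setting can be written as:
$$\inf_{\psi \text{ convex}} \int \psi^*(x) \, d\mu(x) + \int \psi(y) \, d\nu(y)$$
Let $\psi_i = \psi(X_i)$; the above minimization problem reduces to
$$(M) = \inf_{\psi_1, \ldots, \psi_n} \left[ \int_S \sup_{1 \leq i \leq n} \{\langle X_i, x \rangle - \psi_i\} \, d\mu(x) + \frac{1}{n} \sum_{i=1}^{n} \psi_i \right]$$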
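The finite-dimensional problem has a simple first-order condition. The following is a sketch, assuming the $X_i$ are distinct and $\mu$ is absolutely continuous (so cell boundaries carry no mass); the names $G$ and $C_i(\psi)$ are introduced here only for illustration.

```latex
\[
G(\psi_1, \ldots, \psi_n)
:= \int_S \max_{1 \le i \le n} \{\langle X_i, x \rangle - \psi_i\} \, d\mu(x)
 + \frac{1}{n} \sum_{i=1}^{n} \psi_i ,
\qquad
C_i(\psi) := \bigl\{ x \in S : \langle X_i, x \rangle - \psi_i \ge \langle X_j, x \rangle - \psi_j \ \ \forall j \bigr\},
\]
\[
\frac{\partial G}{\partial \psi_i}
= -\mu\bigl(C_i(\psi)\bigr) + \frac{1}{n}
\quad\Longrightarrow\quad
\mu\bigl(C_i(\psi)\bigr) = \frac{1}{n} \ \text{ for all } i \text{ at a minimizer.}
\]
```

So solving the dual amounts to choosing the offsets $\psi_i$ such that every cell of the induced partition of $S$ has $\mu$-mass $1/n$, which is exactly the pushforward constraint for the map $x \mapsto X_i$ on $C_i(\psi)$.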
Some facts about convex functions
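Given a convex function $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ we define the subdifferential of $f$ at $x \in \mathrm{Dom}(f)$ by
$$\partial f(x) := \bigl\{ \xi \in \mathbb{R}^d : f(x) + \langle y - x, \xi \rangle \leq f(y), \ \text{for all } y \in \mathbb{R}^d \bigr\}$$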
Any element in ∂f (x) is called a subgradient of f at x
If f is differentiable at x then ∂f (x) = {∇f (x)}
A convex function (in Rd) is a.e. differentiable
Legendre-Fenchel dual
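Let $\phi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$. The convex conjugate $\phi^* : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ of $\phi$ is defined as
$$\phi^*(x) := \sup_{y \in \mathbb{R}^d} \{\langle x, y \rangle - \phi(y)\}$$
Lemma (Characterization of subdifferential)
Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper (i.e., $f(x) < \infty$ for some $x \in \mathbb{R}^d$) l.s.c. convex function. Then for all $x, y \in \mathbb{R}^d$,
$$\langle x, y \rangle = f(x) + f^*(y) \iff y \in \partial f(x) \iff x \in \partial f^*(y).$$
Lemma (Legendre duality)
Let $f : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ be a proper function. Then the following three properties are equivalent: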
(i) f is convex l.s.c. function;
(ii) f = ψ∗ for some proper function ψ;
(iii) f ∗∗ = f .
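A numerical illustration of the conjugate and of property (iii), as a rough sketch: the supremum is replaced by a maximum over a finite grid, and the test function $f(x) = x^2/2$ (which is its own conjugate) is chosen for this example only. The helper name `conjugate_on_grid` is hypothetical.

```python
import numpy as np

def conjugate_on_grid(values, grid, slopes):
    """Approximate f*(y) = sup_x {x*y - f(x)} by a max over the grid points x."""
    return np.max(np.outer(slopes, grid) - values[None, :], axis=1)

# f(x) = x^2 / 2 sampled on a fine grid.
grid = np.linspace(-5.0, 5.0, 2001)
f = 0.5 * grid ** 2

# Conjugate at a few slopes; for this f the exact answer is y^2 / 2.
slopes = np.linspace(-2.0, 2.0, 9)
f_star = conjugate_on_grid(f, grid, slopes)

# Biconjugate: conjugate of f* (tabulated on [-2, 2]) evaluated on [-1, 1].
inner = np.linspace(-2.0, 2.0, 801)
f_star_inner = conjugate_on_grid(f, grid, inner)
points = np.linspace(-1.0, 1.0, 5)
f_bi = conjugate_on_grid(f_star_inner, inner, points)

print(np.max(np.abs(f_star - 0.5 * slopes ** 2)))
print(np.max(np.abs(f_bi - 0.5 * points ** 2)))
```

Both printed errors are at the level of the grid resolution, consistent with $f^{**} = f$ for a convex l.s.c. function.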
When is (K) = (M)? (with $c(x, y) = \|x - y\|^2$)
Discrete case
Suppose that $\mu$ and $\nu$ are supported on $\{x_i\}_{i=1}^{M}$ and $\{y_j\}_{j=1}^{N}$
Kantorovich’s problem (K):
$$\min_{\{p_{ij} \geq 0\}} \sum_{i=1}^{M} \sum_{j=1}^{N} p_{ij} \, c(x_i, y_j) \ : \quad \sum_{i=1}^{M} p_{ij} = \nu(y_j); \quad \sum_{j=1}^{N} p_{ij} = \mu(x_i)$$
Monge’s problem (M): $\min_{T : T \# \mu = \nu} \sum_{i=1}^{M} \|x_i - T(x_i)\|^2 \mu(x_i)$
In general, (K) ≠ (M)
If M = N and
$$\mu(x_i) = \nu(y_j) = \frac{1}{N} \quad \forall i, j,$$
then the optimal transference plan in the (K) problem coincides with the solution of the (M) problem
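In the uniform case M = N, transport maps are bijections, so Monge's problem can be solved by brute force for tiny N. The sketch below does this in d = 1 with made-up data; the function name `monge_brute_force` is hypothetical. It only searches over maps; the link to (K) is that the vertices of the set of plans with uniform marginals are (scaled) permutation matrices, by the Birkhoff-von Neumann theorem.

```python
import itertools
import numpy as np

def monge_brute_force(x, y):
    """Search all bijections x_i -> y_{sigma(i)} under uniform weights 1/N."""
    n = len(x)
    cost = (x[:, None] - y[None, :]) ** 2  # squared-distance cost matrix
    rows = np.arange(n)
    best = min(
        itertools.permutations(range(n)),
        key=lambda s: cost[rows, list(s)].sum(),
    )
    return np.array(best), cost[rows, list(best)].sum() / n

x = np.array([0.9, 0.1, 0.5, 0.3])
y = np.array([2.0, 1.0, 4.0, 3.0])
sigma, value = monge_brute_force(x, y)

# In d = 1 with quadratic cost the optimal map is monotone: sorted x to sorted y.
monotone = np.empty(len(x), dtype=int)
monotone[np.argsort(x)] = np.argsort(y)
print(sigma, monotone, value)
```

The brute-force optimum agrees with the monotone (sorted-to-sorted) matching, which is the discrete counterpart of the quantile map in one dimension.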
Absolutely continuous case
µ and ν are absolutely continuous
Then there is a unique solution to the (K) problem which turns out to be also the solution of the (M) problem
Semi-discrete case
$\nu$ supported on finite set $\{y_j\}_{j=1}^{N}$; $\mu$ abs. cont. with support $S \subset \mathbb{R}^d$
Monge’s problem: Find $T$ s.t. $T \# \mu = \sum_{j=1}^{N} \nu_j \delta_{y_j}$ and which minimizes
$$\int \|x - T(x)\|^2 \, d\mu(x) = \sum_{j=1}^{N} \int_{\{x : T(x) = y_j\}} \|x - y_j\|^2 \, d\mu(x)$$
Note that: $\mu(T^{-1}(y_j)) = \mu(\{x : T(x) = y_j\}) = \nu_j$
Then there is a unique solution to the (K) problem which turns out to be also the solution of the (M) problem