-
Riemannian Optimization and the Computation of the Divergences and the Karcher Mean of Symmetric Positive Definite Matrices

Xinru Yuan¹, Wen Huang², Kyle A. Gallivan¹, and P.-A. Absil³
¹Florida State University  ²Rice University  ³Université Catholique de Louvain

SIAM Conference on Applied Linear Algebra, May 7, 2018
-
Symmetric Positive Definite (SPD) Matrix
Definition
A symmetric matrix A is called positive definite iff all its eigenvalues are positive.

[Figure: a $2 \times 2$ SPD matrix visualized as an ellipse with semi-axes $u\sqrt{\lambda_u}$ and $v\sqrt{\lambda_v}$, and a $3 \times 3$ SPD matrix as an ellipsoid with semi-axes $u\sqrt{\lambda_u}$, $v\sqrt{\lambda_v}$, and $w\sqrt{\lambda_w}$.]
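The definition translates into a direct eigenvalue test. A minimal NumPy sketch (our own illustration, not from the talk):

```python
import numpy as np

def is_spd(A, tol=1e-12):
    """Symmetric and all eigenvalues positive."""
    if not np.allclose(A, A.T):
        return False
    # eigvalsh exploits symmetry and returns real eigenvalues
    return bool(np.linalg.eigvalsh(A).min() > tol)

print(is_spd(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True: eigenvalues 1 and 3
```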
-
Motivation of Averaging SPD Matrices
Possible applications of SPD matrices
- Diffusion tensors in medical imaging [CSV12, FJ07, RTM07]
- Describing images and video [LWM13, SFD02, ASF+05, TPM06, HWSC15]
Motivation of averaging SPD matrices
- Aggregate several noisy measurements of the same object
- Subtask in interpolation methods, segmentation, and clustering
-
Averaging Schemes: from Scalars to Matrices
Let $A_1, \ldots, A_K$ be SPD matrices.

Generalized arithmetic mean: $\frac{1}{K} \sum_{i=1}^{K} A_i$
→ Not appropriate in many practical applications

[Figure: ellipses for $A$, $\frac{A+B}{2}$, and $B$, with $\det A = 50$, $\det\big(\frac{A+B}{2}\big) = 267.56$, $\det B = 50$.]

Generalized geometric mean: $(A_1 \cdots A_K)^{1/K}$
→ Not appropriate due to non-commutativity
→ How to define a matrix geometric mean?
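The determinant blow-up of the arithmetic mean ("swelling") is easy to reproduce numerically. A minimal NumPy sketch with hypothetical matrices (the slide's $A$ and $B$ are not given, so the numbers below differ from 267.56):

```python
import numpy as np

# Two SPD matrices with equal determinants (hypothetical example)
A = np.diag([50.0, 1.0])   # det A = 50
B = np.diag([1.0, 50.0])   # det B = 50
M = (A + B) / 2            # arithmetic mean

print(np.linalg.det(A), np.linalg.det(B))  # 50.0 50.0
print(np.linalg.det(M))                    # 650.25, far larger than 50
```

A mean that respects determinants must avoid this inflation, which is one reason a proper matrix geometric mean is sought.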
-
Desired Properties of a Matrix Geometric Mean
The desired properties are given in the ALM list¹, some of which are:
- $G(A_{\pi(1)}, \ldots, A_{\pi(K)}) = G(A_1, \ldots, A_K)$ with $\pi$ a permutation of $(1, \ldots, K)$
- if $A_1, \ldots, A_K$ commute, then $G(A_1, \ldots, A_K) = (A_1 \cdots A_K)^{1/K}$
- $G(A_1, \ldots, A_K)^{-1} = G(A_1^{-1}, \ldots, A_K^{-1})$
- $\det(G(A_1, \ldots, A_K)) = (\det(A_1) \cdots \det(A_K))^{1/K}$

¹T. Ando, C.-K. Li, and R. Mathias. Geometric means. Linear Algebra and Its Applications, 385:305–334, 2004.
-
Geometric Mean of SPD Matrices
A well-known mean on the manifold of SPD matrices is the Karcher mean [Kar77]:

$$G(A_1, \ldots, A_K) = \arg\min_{X \in S^n_{++}} \frac{1}{2K} \sum_{i=1}^{K} \delta^2(X, A_i), \qquad (1)$$

where $\delta(X, Y) = \|\log(X^{-1/2} Y X^{-1/2})\|_F$ is the geodesic distance under the affine-invariant metric $g(\eta_X, \xi_X) = \operatorname{trace}(\eta_X X^{-1} \xi_X X^{-1})$.

- The Karcher mean defined in (1) satisfies all the geometric properties in the ALM list [LL11]
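Both $\delta$ and the cost in (1) translate directly into code. A minimal NumPy sketch (helper names are ours, not the authors' implementation):

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function f to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def geodesic_dist(X, Y):
    """delta(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F, the affine-invariant distance."""
    Xis = sym_fun(X, lambda w: w ** -0.5)   # X^{-1/2}
    return np.linalg.norm(sym_fun(Xis @ Y @ Xis, np.log), 'fro')

def karcher_cost(X, As):
    """f(X) = 1/(2K) * sum_i delta^2(X, A_i), the objective in (1)."""
    return sum(geodesic_dist(X, A) ** 2 for A in As) / (2 * len(As))
```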
-
Algorithms
$$G(A_1, \ldots, A_K) = \arg\min_{X \in S^n_{++}} \frac{1}{2K} \sum_{i=1}^{K} \delta^2(X, A_i)$$

- Riemannian steepest descent [RA11, Ren13] (sketched below)
- Riemannian Barzilai-Borwein method [IP15]
- Riemannian Newton method [RA11]
- Richardson-like iteration [BI13]
- Riemannian steepest descent, conjugate gradient, BFGS, and trust-region Newton methods [JVV12]
- Limited-memory Riemannian BFGS method [YHAG16]
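For the steepest-descent methods, the Riemannian gradient of (1) is $\operatorname{grad} f(X) = -\frac{1}{K} \sum_i \operatorname{Log}_X(A_i)$ with $\operatorname{Log}_X(A) = X^{1/2} \log(X^{-1/2} A X^{-1/2}) X^{1/2}$. A hedged NumPy sketch of one steepest-descent step (our own illustration, not the code benchmarked in the talk):

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def rsd_step(X, As, alpha=1.0):
    """One Riemannian steepest-descent step: X <- Exp_X(-alpha * grad f(X))."""
    Xs  = sym_fun(X, np.sqrt)                # X^{1/2}
    Xis = sym_fun(X, lambda w: w ** -0.5)    # X^{-1/2}
    # Mean of the logarithm maps: -grad f(X) = X^{1/2} S X^{1/2}
    S = sum(sym_fun(Xis @ A @ Xis, np.log) for A in As) / len(As)
    # Exp_X(alpha * X^{1/2} S X^{1/2}) = X^{1/2} exp(alpha * S) X^{1/2}
    return Xs @ sym_fun(alpha * S, np.exp) @ Xs
```

With $\alpha = 1$ this step reduces to a well-known fixed-point iteration for the Karcher mean.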
-
Conditioning of the Objective Function
[Figure: hemstitching phenomenon for steepest descent, contrasting a well-conditioned Hessian with an ill-conditioned Hessian.]

- Small condition number ⇒ fast convergence
- Large condition number ⇒ slow convergence
-
Conditioning of the Karcher Mean Objective Function
Riemannian metric: $g_X(\xi, \eta) = \operatorname{trace}(\xi X^{-1} \eta X^{-1})$
Euclidean metric: $g_X(\xi, \eta) = \operatorname{trace}(\xi \eta)$

Condition number $\kappa$ of the Hessian at the minimizer $\mu$:

Hessian under the Riemannian metric:
- $\kappa(H_R) \le 1 + \frac{\ln(\max \kappa_i)}{2}$, where $\kappa_i = \kappa(\mu^{-1/2} A_i \mu^{-1/2})$
- $\kappa(H_R) \le 20$ if $\max(\kappa_i) = 10^{16}$

Hessian under the Euclidean metric:
- $\frac{\kappa^2(\mu)}{\kappa(H_R)} \le \kappa(H_E) \le \kappa(H_R)\,\kappa^2(\mu)$
- $\kappa(H_E) \ge \kappa^2(\mu)/20$
-
BFGS Quasi-Newton Algorithm: from Euclidean to Riemannian
Update formula:
$$x_{k+1} = x_k + \alpha_k \eta_k \quad \longrightarrow \quad \text{replaced by } x_{k+1} = R_{x_k}(\alpha_k \eta_k)$$

Search direction: $\eta_k = -B_k^{-1} \operatorname{grad} f(x_k)$

$B_k$ update:
$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},$$
where $s_k = x_{k+1} - x_k$ and $y_k = \operatorname{grad} f(x_{k+1}) - \operatorname{grad} f(x_k)$.

On a manifold, $x_{k+1} - x_k$ is replaced by $R_{x_k}^{-1}(x_{k+1})$, and $\operatorname{grad} f(x_{k+1})$ and $\operatorname{grad} f(x_k)$ lie on different tangent spaces.
-
Riemannian BFGS (RBFGS) Algorithm
Update formula:
$$x_{k+1} = R_{x_k}(\alpha_k \eta_k) \quad\text{with}\quad \eta_k = -B_k^{-1} \operatorname{grad} f(x_k)$$

$B_k$ update [HGA15]:
$$B_{k+1} = \tilde{B}_k - \frac{\tilde{B}_k s_k (\tilde{B}_k s_k)^\flat}{(\tilde{B}_k s_k)^\flat s_k} + \frac{y_k y_k^\flat}{y_k^\flat s_k},$$
where $s_k = \mathcal{T}_{\alpha_k \eta_k} \alpha_k \eta_k$, $y_k = \beta_k^{-1} \operatorname{grad} f(x_{k+1}) - \mathcal{T}_{\alpha_k \eta_k} \operatorname{grad} f(x_k)$, and $\tilde{B}_k = \mathcal{T}_{\alpha_k \eta_k} \circ B_k \circ \mathcal{T}_{\alpha_k \eta_k}^{-1}$.

- Stores and transports $B_k^{-1}$ as a dense matrix
- Requires excessive computation time and storage space for large-scale problems
-
Limited-memory RBFGS (LRBFGS)
Riemannian BFGS:
$$B_{k+1} = \tilde{B}_k - \frac{\tilde{B}_k s_k (\tilde{B}_k s_k)^\flat}{(\tilde{B}_k s_k)^\flat s_k} + \frac{y_k y_k^\flat}{y_k^\flat s_k},$$
where $s_k = \mathcal{T}_{\alpha_k \eta_k} \alpha_k \eta_k$, $y_k = \beta_k^{-1} \operatorname{grad} f(x_{k+1}) - \mathcal{T}_{\alpha_k \eta_k} \operatorname{grad} f(x_k)$, and $\tilde{B}_k = \mathcal{T}_{\alpha_k \eta_k} \circ B_k \circ \mathcal{T}_{\alpha_k \eta_k}^{-1}$.

Limited-memory Riemannian BFGS:
- Stores only the $m$ most recent pairs $s_k$ and $y_k$
- Transports those vectors to the new tangent space rather than the entire matrix $B_k^{-1}$
- Computational and storage complexity depends upon $m$ (see the sketch below)
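Because the intrinsic representation makes the vector transport essentially an identity on coordinate vectors, the search direction can be obtained with the classical two-loop recursion applied to the stored pairs. A sketch (our own, assuming tangent vectors are flattened into 1-D coordinate arrays):

```python
import numpy as np

def lrbfgs_direction(grad, s_list, y_list, gamma=1.0):
    """Two-loop recursion: return eta = -H_k * grad from the m stored (s, y) pairs."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        a = (s @ q) / (y @ s)
        q = q - a * y
        alphas.append(a)
    r = gamma * q                                          # initial Hessian approx: gamma * I
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest pair first
        b = (y @ r) / (y @ s)
        r = r + (a - b) * s
    return -r
```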
-
Implementations
Representations of tangent vectors: $T_X S^n_{++} = \{S \in \mathbb{R}^{n \times n} \mid S = S^T\}$
- Extrinsic representation: $n^2$-dimensional vector
- Intrinsic representation: $d$-dimensional vector, where $d = n(n+1)/2$

Retraction
- Exponential mapping: $\operatorname{Exp}_X(\xi) = X^{1/2} \exp(X^{-1/2} \xi X^{-1/2}) X^{1/2}$
- Second-order approximation retraction [JVV12]: $R_X(\xi) = X + \xi + \frac{1}{2} \xi X^{-1} \xi$

Vector transport
- Parallel translation: $\mathcal{T}_{p_\eta}(\xi) = Q \xi Q^T$, with $Q = X^{1/2} \exp\!\big(\tfrac{X^{-1/2} \eta X^{-1/2}}{2}\big) X^{-1/2}$
- Vector transport by parallelization [HAG15]: essentially an identity

A sketch of these maps follows below.
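A minimal NumPy rendering of these maps (our own sketch, not the authors' implementation):

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def exp_retraction(X, xi):
    """Exp_X(xi) = X^{1/2} exp(X^{-1/2} xi X^{-1/2}) X^{1/2}."""
    Xs, Xis = sym_fun(X, np.sqrt), sym_fun(X, lambda w: w ** -0.5)
    return Xs @ sym_fun(Xis @ xi @ Xis, np.exp) @ Xs

def second_order_retraction(X, xi):
    """R_X(xi) = X + xi + (1/2) xi X^{-1} xi, cheaper than the exponential."""
    return X + xi + 0.5 * xi @ np.linalg.solve(X, xi)

def parallel_translation(X, eta, xi):
    """T(xi) = Q xi Q^T with Q = X^{1/2} exp((X^{-1/2} eta X^{-1/2}) / 2) X^{-1/2}."""
    Xs, Xis = sym_fun(X, np.sqrt), sym_fun(X, lambda w: w ** -0.5)
    Q = Xs @ sym_fun(0.5 * (Xis @ eta @ Xis), np.exp) @ Xis
    return Q @ xi @ Q.T
```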
-
Complexity Comparison for LRBFGS

Function and Riemannian gradient: the same cost ($f + \nabla f$) for both approaches.

Extrinsic approach (per iteration):
- Retraction: evaluate $R_X(\eta)$
- Riemannian metric: $6n^3 + o(n^3)$
- $2m$ vector transports

Intrinsic approach (per iteration):
- Retraction: compute $\eta$ from the intrinsic representation $\tilde{\eta}_d$, then evaluate $R_X(\eta)$; intrinsic cost = extrinsic cost $+\, 2n^3 + o(n^3)$
- Metric reduces to the Euclidean metric: $n^2 + o(n^2)$
- No explicit vector transport

Overall complexity per iteration:
- Extrinsic: $f + \nabla f + 27n^3 + 12mn^2 + 2m \times$ (vector transport cost)
- Intrinsic: $f + \nabla f + 22n^3/3 + 4mn^2$
-
Numerical Results: Comparison of Different Algorithms
$K = 100$, size $= 3 \times 3$, $d = 6$; well-conditioned case $1 \le \kappa(A_i) \le 200$ and ill-conditioned case $10^3 \le \kappa(A_i) \le 10^7$.

[Figure: evolution of the averaged distance $\operatorname{dist}(\mu, X_t)$ between the current iterate and the exact Karcher mean with respect to iterations and time (s), comparing the RL iteration, RSD-QR, RBB, LRBFGS ($m = 2, 4$), and RBFGS.]
-
Numerical Results: Comparison of Different Algorithms
$K = 30$, size $= 100 \times 100$, $d = 5050$; well-conditioned case $1 \le \kappa(A_i) \le 20$ and ill-conditioned case $10^4 \le \kappa(A_i) \le 10^7$.

[Figure: evolution of the averaged distance $\operatorname{dist}(\mu, X_t)$ between the current iterate and the exact Karcher mean with respect to iterations and time (s), comparing the RL iteration, RSD-QR, RBB, LRBFGS ($m = 2, 8$), and RBFGS.]
-
Numerical Results: Riemannian vs. Euclidean Metrics
Setups: $K = 100$, $n = 3$, $1 \le \kappa(A_i) \le 10^6$; and $K = 30$, $n = 100$, $1 \le \kappa(A_i) \le 10^5$.

[Figure: evolution of the averaged distance $\operatorname{dist}(\mu, X_t)$ between the current iterate and the exact Karcher mean with respect to iterations and time (s), comparing RSD, LRBFGS ($m = 0, 2$), and RBFGS, each under the Riemannian (-Rie) and the Euclidean (-Euc) metric.]
-
Outline
Karcher mean computation on $S^n_{++}$
Divergence-based means on $S^n_{++}$
L1-norm median computation on $S^n_{++}$
Application: Structure tensor image denoising
Summary
-
Motivations
Karcher mean:
$$K(A_1, \ldots, A_K) = \arg\min_{X \in S^n_{++}} \frac{1}{2K} \sum_{i=1}^{K} \delta^2(X, A_i), \qquad (1)$$
where $\delta(X, Y) = \|\log(X^{-1/2} Y X^{-1/2})\|_F$
- pros: satisfies the desired properties
- cons: high computational cost

Use divergences as alternatives to the geodesic distance due to their computational and empirical benefits.

A divergence is like a distance except it lacks:
- the triangle inequality
- symmetry
-
LogDet α-divergence and Associated Mean
The LogDet α-divergence-based mean is defined as
$$G(A_1, \ldots, A_K) = \arg\min_{X \in S^n_{++}} \frac{1}{2K} \sum_{i=1}^{K} \delta^2_{LD,\alpha}(A_i, X), \qquad (2)$$
where the LogDet α-divergence on $S^n_{++}$ is given by
$$\delta^2_{LD,\alpha}(X, Y) = \frac{4}{1 - \alpha^2} \log \frac{\det\!\big(\frac{1-\alpha}{2} X + \frac{1+\alpha}{2} Y\big)}{\det(X)^{\frac{1-\alpha}{2}} \det(Y)^{\frac{1+\alpha}{2}}}$$

- The LogDet α-divergence is asymmetric in general, except for $\alpha = 0$
- (2) defines the right mean; the left mean can be defined in a similar way.
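The divergence itself is cheap to evaluate. A NumPy sketch (our own; slogdet is used for numerical stability, and $-1 < \alpha < 1$ is assumed):

```python
import numpy as np

def logdet_alpha_div2(X, Y, alpha):
    """delta^2_{LD,alpha}(X, Y) for SPD X, Y and alpha in (-1, 1)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    _, ld_mix = np.linalg.slogdet(a * X + b * Y)   # log det of the convex combination
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return 4.0 / (1 - alpha ** 2) * (ld_mix - a * ld_x - b * ld_y)
```

Swapping X and Y changes the value unless $\alpha = 0$, consistent with the asymmetry remark above.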
-
Karcher Mean vs. LogDet α-divergence Mean
Complexity comparison for problem-related operations:

                  function    gradient   total
LD α-div. mean    $2Kn^3/3$   $3Kn^3$    $11Kn^3/3$
Karcher mean      $18Kn^3$    $5Kn^3$    $23Kn^3$

Invariance properties:

                  scaling   rotation   congruence   inversion
LD α-div. mean    ✓         ✓          ✓            ✗
Karcher mean      ✓         ✓          ✓            ✓
-
Numerical Experiment: Comparisons of Different Algorithms
$K = 100$, $n = 3$, and $10 \le \kappa(A_i) \le 10^6$; fixed-point iteration² with stepsize $\frac{1-\alpha}{2K}$.

[Figure: evolution of $\operatorname{dist}(\mu, X_t)$ with respect to iterations and time (s) for $\alpha = 0.9$, $0.5$, and $0$, comparing FP, RSD, RBB, LRBFGS, RBFGS, and RNewton.]

²Z. Chebbi and M. Moakher. Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function. Linear Algebra and its Applications, 436(7):1872–1889, 2012.
-
Numerical Experiment: Comparisons of Different Algorithms
$K = 100$, $n = 3$, and $10 \le \kappa(A_i) \le 10^6$

[Figure: evolution of $\operatorname{dist}(\mu, X_t)$ with respect to iterations and time (s) for $\alpha = 0.9$, $0.5$, and $0$, comparing FP, RSD, RBB, LRBFGS, RBFGS, and RNewton.]
-
Outline
Karcher mean computation on $S^n_{++}$
Divergence-based means on $S^n_{++}$
L1-norm median computation on $S^n_{++}$
Application: Structure tensor image denoising
Conclusion
-
Motivations
The mean of a set of points is sensitive to outliers
The median is robust to outliers
[Figure: The geometric mean and median in $\mathbb{R}^2$: a cluster of points plus outliers, with the computed mean and median marked.]
-
Riemannian Median of SPD Matrices
The Riemannian median of a set of SPD matrices is defined as
$$M(A_1, \ldots, A_K) = \arg\min_{X \in S^n_{++}} \frac{1}{2K} \sum_{i=1}^{K} \delta(A_i, X),$$
where $\delta(X, Y)$ is a distance or the square root of a divergence function.

- The cost function is non-smooth at $X = A_i$
- There may exist multiple local minima for some $\delta$
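For concreteness, a sketch of this objective with $\delta$ taken as the affine-invariant geodesic distance (our own illustration; note the non-smoothness whenever $X$ equals some $A_i$):

```python
import numpy as np

def sym_fun(S, f):
    """Apply a scalar function to a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def median_cost(X, As):
    """1/(2K) * sum_i delta(A_i, X) with the affine-invariant distance."""
    Xis = sym_fun(X, lambda w: w ** -0.5)    # X^{-1/2}
    dists = (np.linalg.norm(sym_fun(Xis @ A @ Xis, np.log), 'fro') for A in As)
    return sum(dists) / (2 * len(As))
```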
-
Numerical Experiment
Original data ($K = 50$) and data with outliers added.

[Table: $\delta_R$ of the computed median and mean for 0, 1, 5, 10, 25, and 50 outliers.]
-
Outline
Karcher mean computation on $S^n_{++}$
Divergence-based means on $S^n_{++}$
Riemannian L1-norm median computation on $S^n_{++}$
Application: Structure tensor image denoising
Conclusion
-
Application: Structure Tensor Image Denoising
- A structure tensor image is a spatially structured matrix field $I : \Omega \subset \mathbb{Z}^2 \to S^n_{++}$
- Noisy tensor images are simulated by replacing the pixel values by an outlier tensor with a given probability $Pr$
- Denoising is done by averaging matrices in the neighborhood of each pixel
- Mean Riemannian Error:
$$\mathrm{MRE} = \frac{1}{\#\Omega} \sum_{(i,j) \in \Omega} \delta_R(I_{i,j}, \tilde{I}_{i,j})$$

[Figure: the original tensor image and the noisy tensor image.]
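The error measure translates directly into code. A sketch (our own; the tensor fields are assumed stored as (H, W, n, n) arrays and `dist` is whatever $\delta_R$ is chosen):

```python
import numpy as np

def mean_riemannian_error(I_orig, I_denoised, dist):
    """MRE = (1/#Omega) * sum over pixels (i, j) of delta_R(I_ij, I~_ij)."""
    H, W = I_orig.shape[:2]
    total = sum(dist(I_orig[i, j], I_denoised[i, j])
                for i in range(H) for j in range(W))
    return total / (H * W)
```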
-
Structure Tensor Image Denoising: Pr = 0.02
[Figure: denoising results for the different means and medians at Pr = 0.02.]
-
Structure Tensor Image Denoising: MRE and Time
MRE and time comparison at $Pr = 0.02$:

Mean        A-m     LE-m    J-m     K-m     R-median   α-m     α-median
MRE         1.089   0.263   0.616   0.264   0.026      0.125   0.029
Time (s)    0.05    0.40    0.30    7.12    16.53      3.62    11.76
-
Structure Tensor Image Denoising: Pr = 0.1
[Figure: denoising results for the different means and medians at Pr = 0.1.]
-
Structure Tensor Image Denoising: MRE and Time
MRE and time comparison at $Pr = 0.1$:

Mean        A-m     LE-m    J-m     K-m     R-median   α-m     α-median
MRE         3.959   0.947   2.071   0.948   0.078      0.407   0.055
Time (s)    0.04    0.40    0.31    6.47    14.31      4.66    11.32
-
Summary
Karcher mean for SPD matrices:
- Analyze the conditioning of the Hessian of the Karcher mean cost function
- Apply a limited-memory Riemannian BFGS method, with efficient implementations, to computing the SPD Karcher mean
- Recommend using LRBFGS as the default method for the SPD Karcher mean computation

Other averaging techniques for SPD matrices:
- Investigate divergence-based means and Riemannian L1-norm medians on $S^n_{++}$
- Use recent developments in Riemannian optimization to develop efficient and robust algorithms on $S^n_{++}$
- Evaluate the performance of different averaging techniques in applications
-
Thank you!
-
References I
Ognjen Arandjelovic, Gregory Shakhnarovich, John Fisher, Roberto Cipolla, and Trevor Darrell. Face recognition with image sets using manifold density divergence. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 581–588. IEEE, 2005.

D. A. Bini and B. Iannazzo. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700–1710, 2013.

Guang Cheng, Hesamoddin Salehian, and Baba Vemuri. Efficient recursive algorithms for computing the mean diffusion tensor and applications to DTI segmentation. Computer Vision – ECCV 2012, pages 390–401, 2012.
-
References II
P. T. Fletcher and S. Joshi. Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Processing, 87(2):250–262, 2007.

Wen Huang, P.-A. Absil, and Kyle A. Gallivan. A Riemannian symmetric rank-one trust-region method. Mathematical Programming, 150(2):179–216, 2015.

Wen Huang, Kyle A. Gallivan, and P.-A. Absil. A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM Journal on Optimization, 25(3):1660–1685, 2015.

Zhiwu Huang, Ruiping Wang, Shiguang Shan, and Xilin Chen. Face recognition on large-scale video in the wild with hybrid Euclidean-and-Riemannian metric learning. Pattern Recognition, 48(10):3113–3124, 2015.
-
References III
Bruno Iannazzo and Margherita Porcelli. The Riemannian Barzilai-Borwein method with nonmonotone line-search and the Karcher mean computation. Optimization Online, December 2015.

B. Jeuris, R. Vandebril, and B. Vandereycken. A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electronic Transactions on Numerical Analysis, 39:379–402, 2012.

H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, 1977.

J. Lawson and Y. Lim. Monotonic properties of the least squares mean. Mathematische Annalen, 351(2):267–279, 2011.
-
References IV
Jiwen Lu, Gang Wang, and Pierre Moulin. Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 329–336, 2013.

Q. Rentmeesters and P.-A. Absil. Algorithm comparison for Karcher mean computation of rotation matrices and diffusion tensors. In 19th European Signal Processing Conference, pages 2229–2233, August 2011.

Q. Rentmeesters. Algorithms for data fitting on some common homogeneous spaces. PhD thesis, Université catholique de Louvain, 2013.
-
References V
Y. Rathi, A. Tannenbaum, and O. Michailovich. Segmenting images on the tensor manifold. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2007.

Gregory Shakhnarovich, John W. Fisher, and Trevor Darrell. Face recognition from long-term observations. In European Conference on Computer Vision, pages 851–865. Springer, 2002.

Oncel Tuzel, Fatih Porikli, and Peter Meer. Region covariance: a fast descriptor for detection and classification. In European Conference on Computer Vision, pages 589–600. Springer, 2006.

Xinru Yuan, Wen Huang, P.-A. Absil, and Kyle A. Gallivan. A Riemannian limited-memory BFGS algorithm for computing the matrix geometric mean. Procedia Computer Science, 80:2147–2157, 2016.