Semi-Riemannian Manifold Optimization
Tingran Gao
William H. Kruskal Instructor, Committee on Computational and Applied Mathematics (CCAM)
Department of Statistics, The University of Chicago
2018 China-Korea International Conference on Matrix Theory with Applications, Shanghai University, Shanghai, China
December 19, 2018
Outline
Motivation
I Riemannian Manifold Optimization
I Barrier Functions and Interior Point Methods
Semi-Riemannian Manifold Optimization
I Optimality Conditions
I Descent Direction
I Semi-Riemannian First Order Methods
I Metric Independence of Second Order Methods
Optimization on Semi-Riemannian Submanifolds
Joint work with Lek-Heng Lim (The University of Chicago) and Ke Ye (Chinese Academy of Sciences)
Riemannian Manifold Optimization
I Optimization on manifolds:

min_{x ∈ M} f(x)

where (M, g) is a (typically nonlinear, non-convex) Riemannian manifold and f : M → R is a smooth function on M
I A difficult constrained optimization problem, handled as unconstrained optimization from the perspective of manifold optimization
I Key ingredients: tangent spaces, geodesics, exponential map, parallel transport, gradient ∇f, Hessian ∇²f
Riemannian Manifold Optimization
I Riemannian Steepest Descent

x_{k+1} = Exp_{x_k}(−t∇f(x_k)), t ≥ 0

I Riemannian Newton’s Method

[∇²f(x_k)] η_k = −∇f(x_k)
x_{k+1} = Exp_{x_k}(tη_k), t ≥ 0
I Riemannian trust region
I Riemannian conjugate gradient
I ......
Gabay (1982), Smith (1994), Edelman et al. (1998), Absil et al. (2008), Adler et al. (2002), Ring and Wirth (2012), Huang et al. (2015), Sato (2016), etc.
Riemannian Manifold Optimization
I Applications: optimization problems on matrix manifolds
I Stiefel manifolds

V_k(R^n) = O(n) / O(n − k)

I Grassmannian manifolds

Gr(k, n) = O(n) / (O(k) × O(n − k))

I Flag manifolds: for n_1 + · · · + n_d = n,

Flag_{n_1,··· ,n_d} = O(n) / (O(n_1) × · · · × O(n_d))

I Shape spaces

(R^{k×n} \ {0}) / O(n)
I ......
From Riemannian to Semi-Riemannian
Motivation 1: Question from a Geometer
How does optimization depend on the choice of metrics?
I Critical points {x ∈M | ∇f (x) = 0} are metric independent
I Gradients, Hessians, Geodesics are metric dependent
I Index of Hessian becomes metric independent at a critical point; Morse theory
I Optimization trajectories are obviously metric dependent
I Newton’s equation is metric independent:

[∇²f(x_k)] η_k = −∇f(x_k)

yields the same η_k regardless of the metric used to define the gradient ∇f and Hessian ∇²f
I Does it still work if the metric structure is more general than Riemannian? (E.g. semi-Riemannian, Finsler, etc.)
Barrier Functions and Interior Point Methods
f : Q → R strictly convex, defined on an open subset Q ⊂ R^d
I ∇²f(x) ≻ 0 at every x ∈ Q
I ∇²f(x) defines a Riemannian metric on Q
I gradient under this Riemannian metric: [∇²f(x)]⁻¹ ∇f(x)
I Newton’s method for f is exactly gradient descent under the Riemannian metric defined by ∇²f(x)!
I Nesterov and Todd (2002): primal-dual central paths are “close to” geodesics under this Riemannian metric
• Nesterov, Yurii E., and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics 2, no. 4 (2002): 333-361.
• Duistermaat, Johannes J. On Hessian Riemannian structures. Asian Journal of Mathematics 5, no. 1 (2001): 79-91.
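The equivalence between Newton’s method and gradient descent under the Hessian metric is easy to check numerically. A minimal sketch (the barrier-type objective f and the point x0 are illustrative choices, not from the talk):

```python
import numpy as np

# For a strictly convex f, the gradient w.r.t. the Riemannian metric g_x = Hess f(x)
# is [Hess f(x)]^{-1} grad f(x), so a gradient step in that metric is a Newton step.
# Illustrative f(x) = 0.5*||x||^2 - sum(log(x)) on Q = (0, inf)^2.

def grad_f(x):
    return x - 1.0 / x

def hess_f(x):
    return np.diag(1.0 + 1.0 / x**2)

x0 = np.array([2.0, 0.5])
newton_dir = -np.linalg.solve(hess_f(x0), grad_f(x0))   # Newton direction
riem_grad = np.linalg.solve(hess_f(x0), grad_f(x0))     # gradient under the Hessian metric
assert np.allclose(newton_dir, -riem_grad)              # the two directions coincide
```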
From Riemannian to Semi-Riemannian
Motivation 2: Nonconvex objectives for interior point methods
I What if f : Q → R is not strictly convex?
I What happens to Newton’s method if the Hessian is not positive semi-definite? (Newton’s method with modified Hessian)
Semi-Riemannian Geometry
I Scalar product: 〈·, ·〉 : V × V → R, a symmetric bilinear form on a vector space V, not necessarily positive (semi-)definite
I Non-degeneracy: 〈v, w〉 = 0 for all w ∈ V ⇔ v = 0
I A semi-Riemannian manifold is a differentiable manifold M with a scalar product 〈·, ·〉_x defined on each T_xM, varying smoothly with x ∈ M
I A semi-Riemannian manifold is non-degenerate if 〈·, ·〉_x is non-degenerate for all x ∈ M
I Non-degeneracy need not be inherited by subspaces!
I Semi-Riemannian submanifolds, geodesics, parallel transport, gradient, Hessian, ...... are all well-defined in the non-degenerate case
I Of great interest to general relativity as a spacetime model (Lorentzian geometry)
Semi-Riemannian Geometry
A Simplest Degenerate Subspace Example
I Consider R² equipped with the Lorentzian metric of signature (−, +), i.e., for x = (x_1, x_2) ∈ R² and y = (y_1, y_2) ∈ R², define the scalar product as 〈x, y〉 = −x_1y_1 + x_2y_2
I Let W = span{(1, 1)}, which is a degenerate subspace of R²
I If we define W^⊥ := {v ∈ R² | 〈w, v〉 = 0 ∀w ∈ W}, then one has W = W^⊥!
I Take-home message: if W is a degenerate subspace of a scalar product space (V, 〈·, ·〉), then in general W + W^⊥ ≠ V, and it may well be the case that W ∩ W^⊥ ≠ 0!
I Nevertheless: if W is a non-degenerate subspace of V, then it is still true that V = W ⊕ W^⊥
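The example above can be checked in a few lines. A small numerical sketch (the vector v standing in for a general element of W^⊥ is an illustrative choice):

```python
import numpy as np

# R^{1,1} with the Lorentzian scalar product <x, y> = -x1*y1 + x2*y2:
# W = span{(1, 1)} satisfies W = W^perp.
I11 = np.diag([-1.0, 1.0])
w = np.array([1.0, 1.0])

def sp(x, y):
    return x @ I11 @ y

# w is orthogonal to itself: the scalar product restricted to W is identically zero
assert sp(w, w) == 0.0
# any v with <w, v> = 0 solves -v1 + v2 = 0, i.e. v is a multiple of (1, 1);
# hence W^perp = span{(1, 1)} = W, and W + W^perp = W != R^2
v = np.array([3.0, 3.0])  # an element of W^perp
assert sp(w, v) == 0.0
```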
Examples of Semi-Riemannian Manifolds
I Minkowski space R^{p,q} (p ≥ 0, q ≥ 0): the vector space R^{p+q} equipped with the scalar product

〈u, v〉 = u^T I_{p,q} v, where I_{p,q} = ( −I_p 0 ; 0 I_q )

I Square matrices R^{n×n}, equipped with the scalar product

〈A, B〉 = Tr(A^T I_{p,q} B) = vec(A)^T (I_n ⊗ I_{p,q}) vec(B)
I Submanifolds of semi-Riemannian manifolds
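The trace/vec identity above can be sanity-checked directly. A small sketch with illustrative p, q and random matrices, using the column-stacking vec (NumPy's `order='F'` reshape):

```python
import numpy as np

# Check: Tr(A^T I_{p,q} B) = vec(A)^T (I_n kron I_{p,q}) vec(B),
# where vec stacks columns (Fortran order in NumPy).
rng = np.random.default_rng(0)
p, q = 2, 3
n = p + q
Ipq = np.diag([-1.0] * p + [1.0] * q)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

lhs = np.trace(A.T @ Ipq @ B)
vecA = A.reshape(-1, order="F")          # column-stacking vec(A)
vecB = B.reshape(-1, order="F")
rhs = vecA @ np.kron(np.eye(n), Ipq) @ vecB
assert np.allclose(lhs, rhs)
```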
Examples of Semi-Riemannian Manifolds: Applications
I Spacetime models: Lorentzian spaces, de Sitter spaces, anti-de Sitter spaces, ......
I Network models: social networks “behave like” spaces of negative Alexandrov curvature
• Carroll, Sean M. Spacetime and geometry: An introduction to general relativity. San Francisco, USA: Addison-Wesley, 2004.
• Krioukov, Dmitri, Maksim Kitsak, Robert S. Sinkovits, David Rideout, David Meyer, and Marian Boguna. Network cosmology. Scientific Reports 2 (2012): 793.
Optimality Conditions
Given a semi-Riemannian manifold M with scalar product 〈·, ·〉_x on T_xM, define the semi-Riemannian gradient of a smooth function f ∈ C²(M) by

〈Df, X〉_x = Xf(x), ∀X ∈ Γ(TM), x ∈ M

and define the semi-Riemannian Hessian of f by

D²f(X, Y) = X(Yf) − (D_X Y)f, ∀X, Y ∈ Γ(TM)

where D_X Y is the covariant derivative of Y with respect to X.
Optimality Conditions
Necessary conditions: a local optimum x ∈ M satisfies
I (First-order necessary condition) Df(x) = 0 if f : M → R is first-order differentiable
I (Second-order necessary condition) Df(x) = 0 and D²f(x) ⪰ 0 if f : M → R is second-order differentiable
Sufficient condition:
I (Second-order sufficient condition) If x ∈ M is an interior point of a semi-Riemannian manifold M, Df(x) = 0, and D²f(x) ≻ 0, then x is a strict relative minimum of f
Descent Direction
I In general, −Df(x) is not a descent direction at x ∈ M! The main issue is that 〈Df(x), Df(x)〉_x may be negative, so the first-order approximation f(x − Df(x)) ≈ f(x) − 〈Df(x), Df(x)〉_x need not be smaller than f(x)
I A similar issue arises in Newton’s method: when the Hessian ∇²f(x) is not positive definite, the Newton direction −[∇²f(x)]⁻¹∇f(x) may not be a descent direction
I Solution for Newton’s method: modified Hessian methods
I For semi-Riemannian optimization: a similar idea of modifying Df(x)
Descent Direction
Solution: modify −Df(x) to obtain a descent direction on the semi-Riemannian manifold
I For each T_xM, use semi-Riemannian Gram-Schmidt to find an orthonormal basis e_1, · · · , e_d for T_xM
I Construct

[Df(x)]^+ = Σ_{i=1}^{d} 〈Df(x), e_i〉 e_i

I −[Df(x)]^+ is a descent direction, since

〈−[Df(x)]^+, Df(x)〉 = −Σ_{i=1}^{d} |〈Df(x), e_i〉|² < 0
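A minimal numerical sketch in R^{1,1} (the gradient value is an illustrative choice): −Df(x) fails to be a descent direction, while −[Df(x)]^+ built from the standard orthonormal basis succeeds.

```python
import numpy as np

# In R^{1,1} the standard basis is orthonormal for the Lorentzian scalar product
# (<e1, e1> = -1, <e2, e2> = 1), so [Df]^+ = sum_i <Df, e_i> e_i is easy to form.
I11 = np.diag([-1.0, 1.0])

def sp(u, v):
    return u @ I11 @ v

grad = np.array([2.0, 1.0])   # Euclidean gradient of some f at x (illustrative)
Df = I11 @ grad               # semi-Riemannian gradient in R^{1,1}

# -Df is NOT a descent direction here: <-Df, Df> = -(-4 + 1) = 3 > 0
assert sp(-Df, Df) > 0

e = np.eye(2)
Df_plus = sum(sp(Df, e[i]) * e[i] for i in range(2))
# in Minkowski coordinates the modified direction recovers the Euclidean gradient
assert np.allclose(Df_plus, grad)
# <-[Df]^+, Df> = -sum_i <Df, e_i>^2 < 0, so -[Df]^+ descends
assert np.isclose(sp(-Df_plus, Df), -sum(sp(Df, e[i]) ** 2 for i in range(2)))
assert sp(-Df_plus, Df) < 0
```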
Semi-Riemannian Steepest Descent I/II
Semi-Riemannian Steepest Descent II/II
Semi-Riemannian Conjugate Gradient
Semi-Riemannian Second-Order Methods
I Newton’s Method: solve [D²f(x)] η = −Df(x) for a descent direction η ∈ T_xM at x ∈ M. Note that this is exactly the same Newton’s equation as in Riemannian optimization!
I Trust Region: minimize a local quadratic model near each x ∈ M. Again the same as in Riemannian optimization!
I The equivalence between Riemannian and semi-Riemannian second-order methods is not just a coincidence; these methods build upon local polynomial surrogates, which are always metric-independent in the sense of jets.
Which manifolds admit semi-Riemannian structures?
I Euclidean spaces and all Riemannian manifolds are particular semi-Riemannian manifolds
I The tangent bundle of a manifold admitting a semi-Riemannian metric that is not Riemannian must admit at least one decomposition into two sub-bundles
I Any manifold with trivial tangent bundle admits semi-Riemannian metrics, e.g. all matrix Lie groups, Minkowski space R^{p,q}
I Unfortunately, almost all matrix Lie groups are either degenerate semi-Riemannian manifolds as submanifolds of the ambient Minkowski space, or do not admit closed-form semi-Riemannian geodesics and/or parallel transports
Optimization on Semi-Riemannian Submanifolds
Minkowski Spaces R^{p,q} (≅ R^{p+q} as vector spaces)
For all x, y ∈ R^{p,q} and V, W ∈ T_xR^{p,q}:
I Tangent Spaces: T_xR^{p,q} = R^{p,q}
I Scalar Product: 〈V, W〉_x = −Σ_{i=1}^{p} V_i W_i + Σ_{j=p+1}^{p+q} V_j W_j
I Geodesics: Exp_x(tV) = x + tV
I Parallel Transport: P_{x→y}V = V, ∀V ∈ T_xR^{p,q} = T_yR^{p,q}
I Semi-Riemannian Gradient: Df(x) = I_{p,q} ∇f(x)
I Semi-Riemannian Hessian: D²f(x) = I_{p,q} ∇²f(x) I_{p,q}
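The gradient formula can be checked against its defining property 〈Df(x), V〉_x = Vf(x) by finite differences. A sketch on an illustrative smooth f (the test point and direction are arbitrary choices):

```python
import numpy as np

# In R^{1,1}, Df(x) = I_{p,q} grad f(x); its Minkowski pairing with any V
# should equal the directional derivative of f at x along V.
p, q = 1, 1
Ipq = np.diag([-1.0] * p + [1.0] * q)

def f(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

x = np.array([0.3, -0.7])
V = np.array([0.5, 1.2])
Df = Ipq @ grad_f(x)                                   # semi-Riemannian gradient
dir_deriv = (f(x + 1e-6 * V) - f(x - 1e-6 * V)) / 2e-6  # central difference
assert abs(Df @ Ipq @ V - dir_deriv) < 1e-6            # <Df, V>_x = V f(x)
```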
Example: Minkowski Spaces
min_{x ∈ R^{1,1}} x^T A x, where A ≻ 0
[Two plots over the square [−1, 1] × [−1, 1]]
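A minimal sketch of semi-Riemannian steepest descent for this example (the matrix A, starting point, and step size are illustrative choices): in Minkowski space, geodesics are straight lines and the modified direction [Df(x)]^+ is formed against the standard orthonormal basis.

```python
import numpy as np

# min over R^{1,1} of f(x) = x^T A x with A symmetric positive definite (assumed).
I11 = np.diag([-1.0, 1.0])
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def f(x):
    return x @ A @ x

x = np.array([1.0, 1.0])
t = 0.1                                   # fixed step size (illustrative)
for _ in range(200):
    grad = 2 * A @ x                      # Euclidean gradient
    Df = I11 @ grad                       # semi-Riemannian gradient
    # modified descent direction [Df]^+ w.r.t. the standard orthonormal basis
    Df_plus = sum((Df @ I11 @ e) * e for e in np.eye(2))
    x = x - t * Df_plus                   # Exp_x(-t V) = x - t V in Minkowski space
assert f(x) < 1e-8                        # converges to the minimizer x = 0
```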
Optimization on Semi-Riemannian Submanifolds
Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}
I Tangent Spaces: T_xS^{p+q−1} = {V ∈ T_xR^{p,q} | 〈V, I_{p,q}x〉 = 0}
I Scalar Product: inherited from the ambient space R^{p,q}
I The induced scalar product is non-degenerate except on a set of measure zero, Z := {x ∈ S^{p+q−1} | x^T I_{p,q} x = 0}
I Projection from T_xR^{p,q} to T_xS^{p+q−1}:

Proj_x(V) = V − (V^T x / x^T I_{p,q} x) I_{p,q} x, ∀x ∈ S^{p+q−1} \ Z
I Geodesics: no closed-form expression, but one can use Riemannian geodesics for optimization purposes
I Parallel Transport: no closed-form expression, but one can use Riemannian parallel transports for optimization purposes
Optimization on Semi-Riemannian Submanifolds
Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}
I Semi-Riemannian Gradient:

Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p+q−1} \ Z

I Semi-Riemannian Hessian:

D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p+q−1} \ Z

(viewing ∇²f(x) as a linear map from T_xR^{p,q} to T_xR^{p,q}, and D²f(x) as a linear map from T_xS^{p+q−1} to T_xS^{p+q−1})
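A minimal sketch of the projection and induced gradient on S^{p+q−1} ⊂ R^{1,2}; the point x and the quadratic objective are illustrative choices:

```python
import numpy as np

# Projection Proj_x(V) = V - (V^T x / x^T I_{p,q} x) I_{p,q} x onto the tangent
# space of the Euclidean sphere, valid for x outside the degenerate set Z.
p, q = 1, 2
Ipq = np.diag([-1.0] * p + [1.0] * q)

def proj(x, V):
    return V - (V @ x) / (x @ Ipq @ x) * (Ipq @ x)

x = np.array([0.2, 0.8, np.sqrt(0.32)])   # on the unit sphere: ||x||_2 = 1
assert abs(x @ Ipq @ x) > 1e-6            # x lies outside Z = {x : x^T I x = 0}

A = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.4],
              [0.1, 0.4, 1.5]])           # symmetric (illustrative)
grad = 2 * A @ x                          # Euclidean gradient of f(x) = x^T A x
Df = proj(x, grad)                        # semi-Riemannian gradient on the sphere
# tangency: <Df, I_{p,q} x> = Df^T x = 0
assert abs(Df @ x) < 1e-12
```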
Example: Euclidean Spheres in Minkowski Spaces
max_{x_1² + · · · + x_{p+q}² = 1} x^T A x, where A = A^T
[Two convergence plots: error versus iteration, log scale from 10⁻⁸ to 10¹]
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

I Tangent Spaces: T_xS^{p,q} = {V ∈ T_xR^{p,q} | 〈V, x〉 = 0}
I Scalar Product: inherited from the ambient space R^{p,q}; non-degenerate!
I Projection from T_xR^{p,q} to T_xS^{p,q}:

Proj_x(V) = V − (V^T I_{p,q} x) x, ∀x ∈ S^{p,q}

I Semi-Riemannian Gradient:

Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p,q}

I Semi-Riemannian Hessian:

D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p,q}
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

I Geodesics: for any x ∈ S^{p,q} and V ∈ T_xS^{p,q},

Exp_x(tV) = x cos(t‖V‖) + (V/‖V‖) sin(t‖V‖), if 〈V, V〉 > 0,
Exp_x(tV) = x cosh(t‖V‖) + (V/‖V‖) sinh(t‖V‖), if 〈V, V〉 < 0,
Exp_x(tV) = x + tV, if 〈V, V〉 = 0,

where ‖V‖ := √|〈V, V〉|.
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

I Parallel Transport along geodesics: for any x ∈ S^{p,q} and V ∈ T_xS^{p,q}, the parallel transport of V along the geodesic emanating from x in the direction X ∈ T_xS^{p,q} is

P_{x→Exp_x(tX)}(V) = −(〈V, X〉/‖X‖)[x sin(t‖X‖) − (X/‖X‖) cos(t‖X‖)] + (V − 〈V, X〉X/‖X‖²), if 〈X, X〉 > 0,
P_{x→Exp_x(tX)}(V) = −(〈V, X〉/‖X‖)[x sinh(t‖X‖) + (X/‖X‖) cosh(t‖X‖)] + (V + 〈V, X〉X/‖X‖²), if 〈X, X〉 < 0,
P_{x→Exp_x(tX)}(V) = −〈V, X〉(tx + t²X/2) + V, if 〈X, X〉 = 0.
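The closed-form expressions above can be sanity-checked numerically. A sketch on S^{1,2} ⊂ R^{1,2} (the point x and tangent vectors X, V are illustrative): geodesics should stay on the pseudosphere, and parallel transport should preserve the scalar product.

```python
import numpy as np

Ipq = np.diag([-1.0, 1.0, 1.0])

def sp(u, v):
    return u @ Ipq @ v

def nrm(V):
    return np.sqrt(abs(sp(V, V)))

def Exp(x, V, t):
    s = sp(V, V)
    if abs(s) < 1e-14:                    # null direction
        return x + t * V
    n = nrm(V)
    if s > 0:
        return x * np.cos(t * n) + (V / n) * np.sin(t * n)
    return x * np.cosh(t * n) + (V / n) * np.sinh(t * n)

def transport(x, X, V, t):
    s, c = sp(X, X), sp(V, X)
    if abs(s) < 1e-14:                    # null geodesic direction
        return V - c * (t * x + 0.5 * t**2 * X)
    n = nrm(X)
    if s > 0:
        return (-c / n) * (x * np.sin(t * n) - (X / n) * np.cos(t * n)) \
               + (V - c * X / n**2)
    return (-c / n) * (x * np.sinh(t * n) + (X / n) * np.cosh(t * n)) \
           + (V + c * X / n**2)

x = np.array([0.0, 1.0, 0.0])             # on S^{1,2}: <x, x> = 1
X = np.array([0.3, 0.0, 1.0])             # tangent (<X, x> = 0), <X, X> > 0
V = np.array([1.0, 0.0, 0.2])             # tangent, <V, V> < 0
t = 0.7
y = Exp(x, X, t)
assert abs(sp(y, y) - 1.0) < 1e-12        # geodesic stays on the pseudosphere
W = transport(x, X, V, t)
assert abs(sp(W, W) - sp(V, V)) < 1e-12   # the scalar product is preserved
```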
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

Semi-Riemannian vs. Riemannian optimization on pseudospheres
I Riemannian geodesics and parallel transports are much harder to compute on S^{p,q}, if possible at all
I S^{p,q} is non-compact
• Fiori, Simone. Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds. IEEE Transactions on Neural Networks 21, no. 5 (2010): 841-852.
Example: Pseudospheres
min_{x ∈ S^{p,q}} ‖x − ξ‖₂², where ξ ∉ S^{p,q}
[Convergence plot: error versus iteration, log scale]
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem
Matrix Manifolds
I Matrix Lie groups have trivial tangent bundles, so they admit semi-Riemannian metrics of arbitrary index
I Unfortunately, very few matrix manifolds inherit non-degenerate semi-Riemannian structures from the ambient Minkowski space; geodesics and parallel transports for the non-degenerate ones are hard to compute in general
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
Thank you!
Future Work:
I More applications, especially for matrix computation
I Trajectory analysis for non-convex optimization
I Numerical Differential Geometry
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem