Semi-Riemannian Manifold Optimization
Tingran Gao
William H. Kruskal Instructor, Committee on Computational and Applied Mathematics (CCAM)
Department of Statistics, The University of Chicago
2018 China-Korea International Conference on Matrix Theory with Applications, Shanghai University, Shanghai, China
December 19, 2018
Outline
Motivation
I Riemannian Manifold Optimization
I Barrier Functions and Interior Point Methods
Semi-Riemannian Manifold Optimization
I Optimality Conditions
I Descent Direction
I Semi-Riemannian First Order Methods
I Metric Independence of Second Order Methods
Optimization on Semi-Riemannian Submanifolds
Joint work with Lek-Heng Lim (The University of Chicago) and Ke Ye (Chinese Academy of Sciences)
Riemannian Manifold Optimization
I Optimization on manifolds:

min_{x ∈ M} f(x)

where (M, g) is a (typically nonlinear, non-convex) Riemannian manifold and f : M → R is a smooth function on M
I A difficult constrained optimization problem, handled as unconstrained optimization from the perspective of manifold optimization
I Key ingredients: tangent spaces, geodesics, exponential map, parallel transport, gradient ∇f, Hessian ∇²f
Riemannian Manifold Optimization
I Riemannian Steepest Descent

x_{k+1} = Exp_{x_k}(−t∇f(x_k)), t ≥ 0

I Riemannian Newton’s Method

[∇²f(x_k)] η_k = −∇f(x_k)
x_{k+1} = Exp_{x_k}(tη_k), t ≥ 0
I Riemannian trust region
I Riemannian conjugate gradient
I ......
Gabay (1982), Smith (1994), Edelman et al. (1998), Absil et al. (2008), Adler et al. (2002), Ring and Wirth (2012), Huang et al. (2015), Sato (2016), etc.
Riemannian Manifold Optimization
I Applications: optimization problems on matrix manifolds
I Stiefel manifolds

V_k(R^n) = O(n) / O(n − k)

I Grassmannian manifolds

Gr(k, n) = O(n) / (O(k) × O(n − k))

I Flag manifolds: for n_1 + · · · + n_d = n,

Flag_{n_1,··· ,n_d} = O(n) / (O(n_1) × · · · × O(n_d))

I Shape spaces

(R^{k×n} \ {0}) / O(n)
I ......
From Riemannian to Semi-Riemannian
Motivation 1: Question from a Geometer
How does optimization depend on the choice of metrics?
I Critical points {x ∈M | ∇f (x) = 0} are metric independent
I Gradients, Hessians, Geodesics are metric dependent
I Index of Hessian becomes metric independent at a critical point; Morse theory
I Optimization trajectories are obviously metric dependent
I Newton’s equation is metric independent:

[∇²f(x_k)] η_k = −∇f(x_k)

yields the same η_k regardless of the metric used to define the gradient ∇f and Hessian ∇²f
I Does it still work if the metric structure is more general than Riemannian? (E.g. semi-Riemannian, Finsler, etc.)
Barrier Functions and Interior Point Methods
f : Q → R strictly convex, defined on an open subset Q ⊂ R^d
I ∇²f(x) ≻ 0 at every x ∈ Q
I ∇²f(x) defines a Riemannian metric on Q
I gradient under this Riemannian metric: [∇²f(x)]⁻¹ ∇f(x)
I Newton’s method for f is exactly gradient descent under the Riemannian metric defined by ∇²f(x)!
I Nesterov and Todd (2002): primal-dual central paths are “close to” geodesics under this Riemannian metric
• Nesterov, Yurii E., and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Foundations of Computational Mathematics 2, no. 4 (2002): 333-361.
• Duistermaat, Johannes J. On Hessian Riemannian structures. Asian Journal of Mathematics 5, no. 1 (2001): 79-91.
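The equivalence between Newton’s method and gradient descent under the Hessian metric is easy to check numerically. A minimal sketch (the barrier-type objective f and the point x0 are illustrative choices, not from the talk):

```python
import numpy as np

# For a strictly convex f, the gradient w.r.t. the Riemannian metric g_x = Hess f(x)
# is [Hess f(x)]^{-1} grad f(x), so a gradient step in that metric is a Newton step.
# Illustrative f(x) = 0.5*||x||^2 - sum(log(x)) on Q = (0, inf)^2.

def grad_f(x):
    return x - 1.0 / x

def hess_f(x):
    return np.diag(1.0 + 1.0 / x**2)

x0 = np.array([2.0, 0.5])
newton_dir = -np.linalg.solve(hess_f(x0), grad_f(x0))   # Newton direction
riem_grad = np.linalg.solve(hess_f(x0), grad_f(x0))     # gradient under the Hessian metric
assert np.allclose(newton_dir, -riem_grad)              # the two directions coincide
```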
From Riemannian to Semi-Riemannian
Motivation 2: Nonconvex objectives for interior point methods
I What if f : Q → R is not strictly convex?
I What happens to Newton’s method if the Hessian is not positive semi-definite? (Newton’s method with modified Hessian)
Semi-Riemannian Geometry
I Scalar product: 〈·, ·〉 : V × V → R, a symmetric bilinear form on a vector space V, not necessarily positive (semi-)definite
I Non-degeneracy: 〈v, w〉 = 0 for all w ∈ V ⇔ v = 0
I A semi-Riemannian manifold is a differentiable manifold M with a scalar product 〈·, ·〉_x defined on each T_xM, varying smoothly with x ∈ M
I A semi-Riemannian manifold is non-degenerate if 〈·, ·〉_x is non-degenerate for all x ∈ M
I Non-degeneracy need not be inherited by subspaces!
I Semi-Riemannian submanifolds, geodesics, parallel transport, gradient, Hessian, ...... are all well-defined in the non-degenerate case
I Of great interest to general relativity as a spacetime model (Lorentzian geometry)
Semi-Riemannian Geometry
A Simplest Degenerate Subspace Example
I Consider R² equipped with the Lorentzian metric of signature (−, +), i.e., for x = (x_1, x_2) ∈ R² and y = (y_1, y_2) ∈ R², define the scalar product as 〈x, y〉 = −x_1y_1 + x_2y_2
I Let W = span{(1, 1)}, which is a degenerate subspace of R²
I If we define W^⊥ := {v ∈ R² | 〈w, v〉 = 0 ∀w ∈ W}, then one has W = W^⊥!
I Take-home message: if W is a degenerate subspace of a scalar product space (V, 〈·, ·〉), then in general W + W^⊥ ≠ V, and it may well be the case that W ∩ W^⊥ ≠ 0!
I Nevertheless: if W is a non-degenerate subspace of V, then it is still true that V = W ⊕ W^⊥
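The example above can be checked in a few lines. A small numerical sketch (the vector v standing in for a general element of W^⊥ is an illustrative choice):

```python
import numpy as np

# R^{1,1} with the Lorentzian scalar product <x, y> = -x1*y1 + x2*y2:
# W = span{(1, 1)} satisfies W = W^perp.
I11 = np.diag([-1.0, 1.0])
w = np.array([1.0, 1.0])

def sp(x, y):
    return x @ I11 @ y

# w is orthogonal to itself: the scalar product restricted to W is identically zero
assert sp(w, w) == 0.0
# any v with <w, v> = 0 solves -v1 + v2 = 0, i.e. v is a multiple of (1, 1);
# hence W^perp = span{(1, 1)} = W, and W + W^perp = W != R^2
v = np.array([3.0, 3.0])  # an element of W^perp
assert sp(w, v) == 0.0
```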
Examples of Semi-Riemannian Manifolds
I Minkowski space R^{p,q} (p ≥ 0, q ≥ 0): the vector space R^{p+q} equipped with the scalar product

〈u, v〉 = u^T I_{p,q} v, where I_{p,q} = ( −I_p 0 ; 0 I_q )

I Square matrices R^{n×n}, equipped with the scalar product

〈A, B〉 = Tr(A^T I_{p,q} B) = vec(A)^T (I_n ⊗ I_{p,q}) vec(B)
I Submanifolds of semi-Riemannian manifolds
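The trace/vec identity above can be sanity-checked directly. A small sketch with illustrative p, q and random matrices, using the column-stacking vec (NumPy's `order='F'` reshape):

```python
import numpy as np

# Check: Tr(A^T I_{p,q} B) = vec(A)^T (I_n kron I_{p,q}) vec(B),
# where vec stacks columns (Fortran order in NumPy).
rng = np.random.default_rng(0)
p, q = 2, 3
n = p + q
Ipq = np.diag([-1.0] * p + [1.0] * q)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

lhs = np.trace(A.T @ Ipq @ B)
vecA = A.reshape(-1, order="F")          # column-stacking vec(A)
vecB = B.reshape(-1, order="F")
rhs = vecA @ np.kron(np.eye(n), Ipq) @ vecB
assert np.allclose(lhs, rhs)
```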
Examples of Semi-Riemannian Manifolds: Applications
I Spacetime models: Lorentzian spaces, de Sitter spaces, anti-de Sitter spaces, ......
I Network models: social networks “behave like” spaces of negative Alexandrov curvature
• Carroll, Sean M. Spacetime and geometry: An introduction to general relativity. San Francisco, USA: Addison-Wesley, 2004.
• Krioukov, Dmitri, Maksim Kitsak, Robert S. Sinkovits, David Rideout, David Meyer, and Marian Boguna. Network cosmology. Scientific Reports 2 (2012): 793.
Optimality Conditions
Given a semi-Riemannian manifold M with scalar product 〈·, ·〉_x on T_xM, define the semi-Riemannian gradient of a smooth function f ∈ C²(M) by

〈Df, X〉_x = Xf(x), ∀X ∈ Γ(TM), x ∈ M

and define the semi-Riemannian Hessian of f by

D²f(X, Y) = X(Yf) − (D_X Y)f, ∀X, Y ∈ Γ(TM)

where D_X Y is the covariant derivative of Y with respect to X.
Optimality Conditions
Necessary conditions: a local optimum x ∈ M satisfies
I (First-order necessary condition) Df(x) = 0 if f : M → R is first-order differentiable
I (Second-order necessary condition) Df(x) = 0 and D²f(x) ⪰ 0 if f : M → R is second-order differentiable
Sufficient condition:
I (Second-order sufficient condition) If x ∈ M is an interior point of a semi-Riemannian manifold M, Df(x) = 0, and D²f(x) ≻ 0, then x is a strict relative minimum of f
Descent Direction
I In general, −Df(x) is not a descent direction at x ∈ M! The main issue is that 〈Df(x), Df(x)〉_x may be negative, so the first-order approximation f(x − Df(x)) ≈ f(x) − 〈Df(x), Df(x)〉_x need not be smaller than f(x)
I A similar issue arises in Newton’s method: when the Hessian ∇²f(x) is not positive definite, the Newton direction −[∇²f(x)]⁻¹∇f(x) may not be a descent direction
I Solution for Newton’s method: modified Hessian methods
I For semi-Riemannian optimization: a similar idea of modifying Df(x)
Descent Direction
Solution: modify −Df(x) to obtain a descent direction on the semi-Riemannian manifold
I For each T_xM, use semi-Riemannian Gram-Schmidt to find an orthonormal basis e_1, · · · , e_d for T_xM
I Construct

[Df(x)]^+ = Σ_{i=1}^{d} 〈Df(x), e_i〉 e_i

I −[Df(x)]^+ is a descent direction, since

〈−[Df(x)]^+, Df(x)〉 = −Σ_{i=1}^{d} |〈Df(x), e_i〉|² < 0
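A minimal numerical sketch in R^{1,1} (the gradient value is an illustrative choice): −Df(x) fails to be a descent direction, while −[Df(x)]^+ built from the standard orthonormal basis succeeds.

```python
import numpy as np

# In R^{1,1} the standard basis is orthonormal for the Lorentzian scalar product
# (<e1, e1> = -1, <e2, e2> = 1), so [Df]^+ = sum_i <Df, e_i> e_i is easy to form.
I11 = np.diag([-1.0, 1.0])

def sp(u, v):
    return u @ I11 @ v

grad = np.array([2.0, 1.0])   # Euclidean gradient of some f at x (illustrative)
Df = I11 @ grad               # semi-Riemannian gradient in R^{1,1}

# -Df is NOT a descent direction here: <-Df, Df> = -(-4 + 1) = 3 > 0
assert sp(-Df, Df) > 0

e = np.eye(2)
Df_plus = sum(sp(Df, e[i]) * e[i] for i in range(2))
# in Minkowski coordinates the modified direction recovers the Euclidean gradient
assert np.allclose(Df_plus, grad)
# <-[Df]^+, Df> = -sum_i <Df, e_i>^2 < 0, so -[Df]^+ descends
assert np.isclose(sp(-Df_plus, Df), -sum(sp(Df, e[i]) ** 2 for i in range(2)))
assert sp(-Df_plus, Df) < 0
```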
Semi-Riemannian Steepest Descent I/II
Semi-Riemannian Steepest Descent II/II
Semi-Riemannian Conjugate Gradient
Semi-Riemannian Second-Order Methods
I Newton’s Method: solve [D²f(x)] η = −Df(x) for a descent direction η ∈ T_xM at x ∈ M. Note that this is exactly the same Newton’s equation as in Riemannian optimization!
I Trust Region: minimize a local quadratic model near each x ∈ M. Again the same as in Riemannian optimization!
I The equivalence between Riemannian and semi-Riemannian second-order methods is not just a coincidence; these methods build upon local polynomial surrogates, which are always metric-independent in the sense of jets.
Which manifolds admit semi-Riemannian structures?
I Euclidean spaces and all Riemannian manifolds are particular semi-Riemannian manifolds
I The tangent bundle of a manifold admitting a semi-Riemannian metric that is not Riemannian must admit at least one decomposition into two sub-bundles
I Any manifold with trivial tangent bundle admits semi-Riemannian metrics, e.g. all matrix Lie groups, Minkowski space R^{p,q}
I Unfortunately, almost all matrix Lie groups are either degenerate semi-Riemannian manifolds as submanifolds of the ambient Minkowski space, or do not admit closed-form semi-Riemannian geodesics and/or parallel transports
Optimization on Semi-Riemannian Submanifolds
Minkowski Spaces R^{p,q} (≅ R^{p+q} as vector spaces)
For all x, y ∈ R^{p,q} and V, W ∈ T_xR^{p,q}:
I Tangent Spaces: T_xR^{p,q} = R^{p,q}
I Scalar Product: 〈V, W〉_x = −Σ_{i=1}^{p} V_i W_i + Σ_{j=p+1}^{p+q} V_j W_j
I Geodesics: Exp_x(tV) = x + tV
I Parallel Transport: P_{x→y}V = V, ∀V ∈ T_xR^{p,q} = T_yR^{p,q}
I Semi-Riemannian Gradient: Df(x) = I_{p,q} ∇f(x)
I Semi-Riemannian Hessian: D²f(x) = I_{p,q} ∇²f(x) I_{p,q}
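The gradient formula can be checked against its defining property 〈Df(x), V〉_x = Vf(x) by finite differences. A sketch on an illustrative smooth f (the test point and direction are arbitrary choices):

```python
import numpy as np

# In R^{1,1}, Df(x) = I_{p,q} grad f(x); its Minkowski pairing with any V
# should equal the directional derivative of f at x along V.
p, q = 1, 1
Ipq = np.diag([-1.0] * p + [1.0] * q)

def f(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

x = np.array([0.3, -0.7])
V = np.array([0.5, 1.2])
Df = Ipq @ grad_f(x)                                   # semi-Riemannian gradient
dir_deriv = (f(x + 1e-6 * V) - f(x - 1e-6 * V)) / 2e-6  # central difference
assert abs(Df @ Ipq @ V - dir_deriv) < 1e-6            # <Df, V>_x = V f(x)
```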
Example: Minkowski Spaces
min_{x ∈ R^{1,1}} x^T A x, where A ≻ 0
[Two plots over the square [−1, 1] × [−1, 1]]
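A minimal sketch of semi-Riemannian steepest descent for this example (the matrix A, starting point, and step size are illustrative choices): in Minkowski space, geodesics are straight lines and the modified direction [Df(x)]^+ is formed against the standard orthonormal basis.

```python
import numpy as np

# min over R^{1,1} of f(x) = x^T A x with A symmetric positive definite (assumed).
I11 = np.diag([-1.0, 1.0])
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def f(x):
    return x @ A @ x

x = np.array([1.0, 1.0])
t = 0.1                                   # fixed step size (illustrative)
for _ in range(200):
    grad = 2 * A @ x                      # Euclidean gradient
    Df = I11 @ grad                       # semi-Riemannian gradient
    # modified descent direction [Df]^+ w.r.t. the standard orthonormal basis
    Df_plus = sum((Df @ I11 @ e) * e for e in np.eye(2))
    x = x - t * Df_plus                   # Exp_x(-t V) = x - t V in Minkowski space
assert f(x) < 1e-8                        # converges to the minimizer x = 0
```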
Optimization on Semi-Riemannian Submanifolds
Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}
I Tangent Spaces: T_xS^{p+q−1} = {V ∈ T_xR^{p,q} | 〈V, I_{p,q}x〉 = 0}
I Scalar Product: inherited from the ambient space R^{p,q}
I The induced scalar product is non-degenerate except on a set of measure zero, Z := {x ∈ S^{p+q−1} | x^T I_{p,q} x = 0}
I Projection from T_xR^{p,q} to T_xS^{p+q−1}:

Proj_x(V) = V − (V^T x / x^T I_{p,q} x) I_{p,q} x, ∀x ∈ S^{p+q−1} \ Z
I Geodesics: no closed-form expression, but one can use Riemannian geodesics for optimization purposes
I Parallel Transport: no closed-form expression, but one can use Riemannian parallel transports for optimization purposes
Optimization on Semi-Riemannian Submanifolds
Euclidean sphere S^{p+q−1} in a Minkowski space R^{p,q}
I Semi-Riemannian Gradient:

Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p+q−1} \ Z

I Semi-Riemannian Hessian:

D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p+q−1} \ Z

(viewing ∇²f(x) as a linear map from T_xR^{p,q} to T_xR^{p,q}, and D²f(x) as a linear map from T_xS^{p+q−1} to T_xS^{p+q−1})
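A minimal sketch of the projection and induced gradient on S^{p+q−1} ⊂ R^{1,2}; the point x and the quadratic objective are illustrative choices:

```python
import numpy as np

# Projection Proj_x(V) = V - (V^T x / x^T I_{p,q} x) I_{p,q} x onto the tangent
# space of the Euclidean sphere, valid for x outside the degenerate set Z.
p, q = 1, 2
Ipq = np.diag([-1.0] * p + [1.0] * q)

def proj(x, V):
    return V - (V @ x) / (x @ Ipq @ x) * (Ipq @ x)

x = np.array([0.2, 0.8, np.sqrt(0.32)])   # on the unit sphere: ||x||_2 = 1
assert abs(x @ Ipq @ x) > 1e-6            # x lies outside Z = {x : x^T I x = 0}

A = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.4],
              [0.1, 0.4, 1.5]])           # symmetric (illustrative)
grad = 2 * A @ x                          # Euclidean gradient of f(x) = x^T A x
Df = proj(x, grad)                        # semi-Riemannian gradient on the sphere
# tangency: <Df, I_{p,q} x> = Df^T x = 0
assert abs(Df @ x) < 1e-12
```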
Example: Euclidean Spheres in Minkowski Spaces
max_{x_1² + · · · + x_{p+q}² = 1} x^T A x, where A = A^T
[Two convergence plots: error versus iteration, log scale from 10⁻⁸ to 10¹]
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

I Tangent Spaces: T_xS^{p,q} = {V ∈ T_xR^{p,q} | 〈V, x〉 = 0}
I Scalar Product: inherited from the ambient space R^{p,q}; non-degenerate!
I Projection from T_xR^{p,q} to T_xS^{p,q}:

Proj_x(V) = V − (V^T I_{p,q} x) x, ∀x ∈ S^{p,q}

I Semi-Riemannian Gradient:

Df(x) = Proj_x(∇f(x)), ∀x ∈ S^{p,q}

I Semi-Riemannian Hessian:

D²f(x) = Proj_x ∘ ∇²f(x), ∀x ∈ S^{p,q}
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

I Geodesics: for any x ∈ S^{p,q} and V ∈ T_xS^{p,q},

Exp_x(tV) = x cos(t‖V‖) + (V/‖V‖) sin(t‖V‖), if 〈V, V〉 > 0,
Exp_x(tV) = x cosh(t‖V‖) + (V/‖V‖) sinh(t‖V‖), if 〈V, V〉 < 0,
Exp_x(tV) = x + tV, if 〈V, V〉 = 0,

where ‖V‖ := √|〈V, V〉|.
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

I Parallel Transport along geodesics: for any x ∈ S^{p,q} and V ∈ T_xS^{p,q}, the parallel transport of V along the geodesic emanating from x in the direction X ∈ T_xS^{p,q} is

P_{x→Exp_x(tX)}(V) = −(〈V, X〉/‖X‖)[x sin(t‖X‖) − (X/‖X‖) cos(t‖X‖)] + (V − 〈V, X〉X/‖X‖²), if 〈X, X〉 > 0,
P_{x→Exp_x(tX)}(V) = −(〈V, X〉/‖X‖)[x sinh(t‖X‖) + (X/‖X‖) cosh(t‖X‖)] + (V + 〈V, X〉X/‖X‖²), if 〈X, X〉 < 0,
P_{x→Exp_x(tX)}(V) = −〈V, X〉(tx + t²X/2) + V, if 〈X, X〉 = 0.
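The closed-form expressions above can be sanity-checked numerically. A sketch on S^{1,2} ⊂ R^{1,2} (the point x and tangent vectors X, V are illustrative): geodesics should stay on the pseudosphere, and parallel transport should preserve the scalar product.

```python
import numpy as np

Ipq = np.diag([-1.0, 1.0, 1.0])

def sp(u, v):
    return u @ Ipq @ v

def nrm(V):
    return np.sqrt(abs(sp(V, V)))

def Exp(x, V, t):
    s = sp(V, V)
    if abs(s) < 1e-14:                    # null direction
        return x + t * V
    n = nrm(V)
    if s > 0:
        return x * np.cos(t * n) + (V / n) * np.sin(t * n)
    return x * np.cosh(t * n) + (V / n) * np.sinh(t * n)

def transport(x, X, V, t):
    s, c = sp(X, X), sp(V, X)
    if abs(s) < 1e-14:                    # null geodesic direction
        return V - c * (t * x + 0.5 * t**2 * X)
    n = nrm(X)
    if s > 0:
        return (-c / n) * (x * np.sin(t * n) - (X / n) * np.cos(t * n)) \
               + (V - c * X / n**2)
    return (-c / n) * (x * np.sinh(t * n) + (X / n) * np.cosh(t * n)) \
           + (V + c * X / n**2)

x = np.array([0.0, 1.0, 0.0])             # on S^{1,2}: <x, x> = 1
X = np.array([0.3, 0.0, 1.0])             # tangent (<X, x> = 0), <X, X> > 0
V = np.array([1.0, 0.0, 0.2])             # tangent, <V, V> < 0
t = 0.7
y = Exp(x, X, t)
assert abs(sp(y, y) - 1.0) < 1e-12        # geodesic stays on the pseudosphere
W = transport(x, X, V, t)
assert abs(sp(W, W) - sp(V, V)) < 1e-12   # the scalar product is preserved
```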
Optimization on Semi-Riemannian Submanifolds
Pseudosphere S^{p,q} in a Minkowski space R^{p,q}

S^{p,q} := {x ∈ R^{p,q} | −x_1² − · · · − x_p² + x_{p+1}² + · · · + x_{p+q}² = 1}

Semi-Riemannian vs. Riemannian optimization on pseudospheres
I Riemannian geodesics and parallel transports are much harder to compute on S^{p,q}, if possible at all
I S^{p,q} is non-compact
• Fiori, Simone. Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds. IEEE Transactions on Neural Networks 21, no. 5 (2010): 841-852.
Example: Pseudospheres
min_{x ∈ S^{p,q}} ‖x − ξ‖₂², where ξ ∉ S^{p,q}
[Convergence plot: error versus iteration, log scale]
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem
Matrix Manifolds
I Matrix Lie groups have trivial tangent bundles, so they admit semi-Riemannian metrics of arbitrary index
I Unfortunately, very few matrix manifolds inherit non-degenerate semi-Riemannian structures from the ambient Minkowski space; geodesics and parallel transports for the non-degenerate ones are hard to compute in general
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
Thank you!
Future Work:
I More applications, especially for matrix computation
I Trajectory analysis for non-convex optimization
I Numerical Differential Geometry
• Tingran Gao, Lek-Heng Lim, and Ke Ye. Semi-Riemannian Manifold Optimization. arXiv:1812.07643 (2018)
• MATLAB code available at https://github.com/trgao10/SemiRiem