A SEMISMOOTH NEWTON METHOD FOR THE NEAREST EUCLIDEAN DISTANCE MATRIX PROBLEM∗

HOU-DUO QI†

Abstract. The Nearest Euclidean distance matrix problem (NEDM) is a fundamental computational problem in applications such as multidimensional scaling and molecular conformation from nuclear magnetic resonance data in computational chemistry. Especially in the latter application, the problem is often large scale with the number of atoms ranging from a few hundreds to a few thousands. In this paper, we introduce a semismooth Newton method that solves the dual problem of (NEDM). We prove that the method is quadratically convergent. We then present an application of the Newton method to NEDM with H-weights. We demonstrate the superior performance of the Newton method over existing methods including the latest quadratic semi-definite programming solver. This research also opens a new avenue towards efficient solution methods for the molecular embedding problem.

Key words. Euclidean distance matrix, semismooth Newton method, quadratic convergence.

AMS subject classifications. 49M45, 90C25, 90C33

1. Introduction. Finding a Euclidean distance matrix (EDM) that is nearest to a given data matrix is a fundamental computational problem in many applications including multidimensional scaling and molecular conformation from nuclear magnetic resonance data in computational chemistry. We do not intend to give a detailed account of the importance of EDM to the two applications. Instead we simply point to the excellent books [3] by Borg and Groenen and [9] by Cox and Cox for the former application, and [10] by Crippen and Havel and the review paper [31] (and references therein) by Neumaier for the latter. We also refer to a recent paper [13] by Fang and O'Leary for algorithmic comparisons of different approaches to the EDM completion problem, which is closely related to ours. For its link to the latest developments in semidefinite programming, see Dattorro [11], Toh [39] and a recent survey [27] by Krislock and Wolkowicz. The purpose of this paper is to propose an efficient Newton method for large scale problems of this type. Below we describe the problem in detail and review some existing methods that motivate our research.

Let Sn denote the space of n × n symmetric matrices equipped with the standard inner product 〈A, B〉 = trace(AB) for A, B ∈ Sn. Let ‖ · ‖ denote the induced Frobenius norm. Let Sn+ denote the cone of positive semidefinite matrices in Sn (often abbreviated as X ⪰ 0 for X ∈ Sn+). The so-called hollow subspace Snh is defined by

Snh := {A ∈ Sn : diag(A) = 0},

where diag(A) is the vector formed by the diagonal elements of A. A matrix D is an EDM if D ∈ Snh and there exist points p1, . . . , pn in IRr (r ≤ n − 1) such that Dij = ‖pi − pj‖2 for i, j = 1, . . . , n. IRr is often referred to as the embedding space and r is the embedding dimension when it is the smallest such r. It is probably the most well-known result on EDMs that a matrix D ∈ Snh is an EDM if and only if

J(−D)J ⪰ 0, where J := I − eeT /n, (1)

∗September 27, 2011. Revised February 28, 2012.
†School of Mathematics, University of Southampton, Highfield, Southampton SO17 1BJ, UK.

E-mail: [email protected].


where I (or In when an indication of dimension is needed) is the identity matrix in Sn and e is the vector of all ones in IRn. The origin of this result can be traced back to Schoenberg [36] and an independent work [40] by Young and Householder. See also Gower [20] for a nice derivation of (1). The corresponding embedding dimension is r = rank(JDJ) ≤ n − 1.

It is noted that the matrix J, when treated as an operator, is the orthogonal projection onto the subspace e⊥ := {x ∈ IRn : eTx = 0}. The characterization (1) simply requests that (−D) be positive semidefinite on the subspace e⊥, that is,

−D ∈ Kn+ := {A ∈ Sn : xTAx ≥ 0 ∀ x ∈ e⊥}. (2)
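As a quick illustration of (1)–(2) (an illustrative Python/NumPy sketch, not part of the original paper; the function name is_edm and the tolerance are our own choices), one can test whether a given symmetric matrix is an EDM as follows.

```python
import numpy as np

def is_edm(D, tol=1e-10):
    """Characterization (1): D is an EDM iff diag(D) = 0 and J(-D)J is psd."""
    n = D.shape[0]
    if not np.allclose(np.diag(D), 0.0):       # D must belong to S^n_h
        return False
    J = np.eye(n) - np.ones((n, n)) / n        # J = I - ee^T/n
    return np.linalg.eigvalsh(-J @ D @ J).min() >= -tol

# Squared pairwise distances of points in IR^r always pass the test.
P = np.random.rand(5, 3)                       # five points in IR^3
D = np.sum((P[:, None, :] - P[None, :, :])**2, axis=2)
print(is_edm(D))                               # True (up to rounding)
```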

It is easy to check whether a given data matrix D is an EDM via (1). If it is not, one often calculates the nearest EDM to D in order to retain as much distance information as possible. This problem can be formulated as the following nearest Euclidean distance matrix problem:

min ‖D −X‖2/2 s.t. X ∈ Snh ∩ Kn+. (3)

Given that (−D) is used in (2), the matrix D in (3) should be −D. This change of sign has been widely adopted to reformulate (3) (see, e.g., [16, 14, 1]) and it reminds us that the objective is to minimize a distance. The widely used H-weighted version (see [1]) is defined as

min ‖H ◦ (D − X)‖2/2 s.t. X ∈ Snh ∩ Kn+, (4)

where H ∈ Sn is nonnegative (i.e., Hij ≥ 0) and ◦ denotes the Hadamard product between matrices. In practice, the magnitude of Hij reflects the level of accuracy of the corresponding distance Dij. The H-weighted problem is much more difficult to solve than (3). (Note: (4) reduces to (3) when H = E, the matrix of all ones in Sn.) Our main purpose in this paper is to develop a fast convergent Newton method for (3) and then apply it to (4). Below we conduct a brief literature review on both problems.

Problem (3) has been the main subject of several important papers. We first note that the feasible region is the intersection of the subspace Snh and the closed convex cone Kn+. Hence, alternating projection methods of Dykstra–Han type [12, 21] are a choice. In fact, one such method, called the Modified Alternating Projection (MAP), was studied by Glunt et al. [16]. The same method was independently studied by Gaffke and Mathar [14], but based on a different projection formula on Kn+ (see (14) and (15)). However, MAP does not apply to (4) unless H = E. Problem (3) (and in general (4)) can also be solved by Semi-Definite Programming (SDP), initiated by Alfakih et al. [1]. We note that the dimension of Snh is n(n−1)/2, and so is the dimension of the cone En of Euclidean distance matrices, where En := Snh ∩ (−Kn+) (see [23]). Alfakih et al. introduced their famous linear mapping KV : Sn−1+ → En defined by

KV (X) = diag(V XV T )eT + e diag(V XV T )T − 2V XV T ,

where V ∈ IRn×(n−1) satisfies V TV = In−1 and V Te = 0. Then (4) is equivalent to the problem

min ‖H ◦ (KV (X) − D)‖2/2 s.t. X ∈ Sn−1+ . (5)
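To make the transformation concrete, here is an illustrative Python sketch of the map KV (not the authors' code): V is built here from the Householder matrix Q of Sect. 2.1 by dropping its last column, which is one convenient way, among others, to satisfy V TV = In−1 and V Te = 0.

```python
import numpy as np

def basis_V(n):
    """An n x (n-1) matrix V with V^T V = I_{n-1} and V^T e = 0, obtained
    from the Householder matrix Q = I - 2vv^T/v^Tv, v = [1,...,1, 1+sqrt(n)]^T
    (the last column of Q is parallel to e, so we drop it)."""
    v = np.ones(n)
    v[-1] = 1.0 + np.sqrt(n)
    Q = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)
    return Q[:, :-1]

def K_V(X, V):
    """K_V(X) = diag(VXV^T) e^T + e diag(VXV^T)^T - 2 VXV^T."""
    B = V @ X @ V.T
    d = np.diag(B)
    return np.add.outer(d, d) - 2.0 * B        # hollow by construction

n = 6
V = basis_V(n)
X = np.random.rand(n - 1, 2)
D = K_V(X @ X.T, V)                            # X X^T is psd, so D is an EDM
print(np.allclose(np.diag(D), 0.0))            # True
```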

Alfakih et al. studied an interior point method based on the Gauss–Newton direction. This method can only deal with problems of size up to a hundred [39]. Problem (5) (possibly with more linear equalities) was one of the major convex quadratic SDPs studied by Toh [39], where the problem size n can be a couple of thousands. Other linear mappings instead of KV can be used; see [27]. Generally speaking, specific transformations must take place before SDP can be applied to (3) and (4). Such transformations aim to shift the difficulty of handling the cone Kn+ to new objective functions defined on Sn+. The cost is that the new objective function is more complicated than the original distance function. Those transformations tend to destroy nice geometric properties of Kn+, which we will take advantage of to develop our Newton method.

Problem (3) also plays a very important role in solving the embedding problem. In multidimensional scaling, the given matrix D is often called a dissimilarity matrix and the embedding dimension r should be small; whereas in molecular conformation D is often called a predistance matrix (i.e., D ∈ Snh with Dij ≥ 0 for all i, j) and the embedding dimension is r = 3. Therefore, the well-known embedding problem [17] is to find the nearest EDM with a low embedding dimension r:

min ‖D −X‖2/2 s.t. X ∈ Snh ∩ Kn+ and rank(JXJ) ≤ r. (6)

Compared to (3), problem (6) is nonconvex and is hence extremely difficult to solve. Phase I of the two-phase methods in [17] for (6) is to find the optimal solution of (3) and modify it as a starting point for the Phase II method. Further developments can be found in [18, 19]. Fast algorithms for (3) are crucial in those applications.

Problem (3) also bears a remarkable resemblance to the nearest correlation matrix problem:

min ‖C −X‖2/2 s.t. diag(X) = e, X ∈ Sn+, (7)

where C ∈ Sn is given. The constraints define the set of all n × n correlation matrices. Higham [25] studied this problem and made it widely accessible to the communities of numerical analysis and optimization. The subsequent research, all trying to design efficient algorithms for (7), includes (to name just a few) [28, 5, 33, 39, 4]. An important approach that emerged from those studies is the Lagrangian dual approach, which was first applied to (7) by Malick [28] and Boyd and Xiao [5]. The dual approach was then studied by Qi and Sun [33] to design what is now known as one of the most efficient methods for (7): the semismooth Newton method. This also partly motivated our research in this paper.

When applied to problem (3), the Lagrangian dual problem becomes (see [30, Thm. 2.2] and also [35, 28, 5]):

min_{y∈IRn} θ(y) := ‖ProjKn+(D + A∗(y))‖2/2, (8)

where ProjKn+(·) denotes the orthogonal projection onto the closed convex cone Kn+ and A∗(y) := Diag(y), the diagonal matrix with y being its diagonal. The function θ(·) is just once continuously differentiable, but convex. Since the Slater condition holds for (3) (see Prop. 2.1), θ(·) is coercive, i.e., θ(y) → +∞ as ‖y‖ → +∞. Therefore, (8) must have an optimal solution, which can be found through the first-order optimality condition:

F (y) := ∇θ(y) = A( ProjKn+(D + A∗(y)) ) = 0, (9)

where A : Sn → IRn is the diagonal operator used in defining Snh . If y is a solution of (9), then

X := ProjKn+(D +A∗(y)), (10)


is the optimal solution of (3). Hence, it is enough to solve the dual problem, and it is relatively easy to solve as it is defined in IRn rather than in Sn.

It follows from the projection formula of Gaffke and Mathar (15) that F (y) is strongly semismooth because it is a composition of linear mappings and ΠSn+(·) (the orthogonal projection onto Sn+), and ΠSn+(·) has been known to be strongly semismooth [38, 6]. Now it becomes natural to develop the semismooth Newton method: Given y0 ∈ IRn, let k := 0. Compute Vk ∈ ∂F (yk) and

yk+1 = yk − Vk−1F (yk), k = 0, 1, 2, . . . , (11)

where ∂F (y) denotes the generalized Jacobian of F at y in the sense of Clarke [8, Sect. 2.6]. Since F is the gradient of θ, ∂F is often called the generalized Hessian of θ, denoted by ∂2θ(y). We refer to [33, Sect. 3] for a detailed development of (11) for (7). The above arguments leading to the Newton method (11) for (3) fail to hold for the H-weighted problem (4) because the projection onto Kn+ under the H-weights does not have an analytical formula. It is already very difficult to calculate the projection under the H-weights, let alone to compute its generalized Jacobian.

Therefore, our main tasks in this paper are (i) to address the quadratic convergence of (11); (ii) to demonstrate its superb numerical performance, especially on large scale problems; and (iii) to apply it to the H-weighted problem (4).

The paper is organized as follows. The first two sections below are devoted to the Newton method (11). In the next section, we include some notation and technical results. One of the results states that problem (3) is constraint nondegenerate (Prop. 2.2). A characterization of the constraint nondegeneracy (Prop. 2.3) generalizes the corresponding result in SDP of Alizadeh et al. [2]. In Section 3, we conduct the quadratic convergence analysis of Newton's method (11). The main result is Prop. 3.3, which says that every matrix in the generalized Jacobian of F at the optimal solution is positive definite. This result then leads to the quadratic convergence result Thm. 3.5. Sect. 4 includes an application of the Newton method to the H-weighted problem (4). We report our numerical results in Sect. 5 and we conclude the paper in Sect. 6 by discussing the use of Newton's method in future research.

2. Preliminaries. In this section, we first list most of the notation used in this paper and review two formulae for ProjKn+ . We finish this section by establishing two results on the Slater condition and constraint nondegeneracy of (3).

2.1. Notation and Two Formulae for ProjKn+ . Apart from Sn, Sn+, Snh , Kn+, J, A, and A∗, which we have mentioned in the introduction, we also need the following (“:=” means “define”): ei is the ith unit basis vector in IRn and e is the vector of all ones. Q := I − (2/vTv) vvT with v = [1, . . . , 1, 1 + √n ]T ∈ IRn (Householder matrix). ΠSn+(X) is the orthogonal projection of X onto Sn+. TKn+(A) and NKn+(A) are respectively the tangent cone and the normal cone of Kn+ at A ∈ Kn+. lin(TKn+(A)) is the largest linear space contained in TKn+(A). A ◦ B := [AijBij ] is the Hadamard product between two matrices A and B of the same size. The tangent cone can be defined as follows:

TKn+(A) := {Ā ∈ Sn : 〈Ā, B〉 ≤ 0 ∀ B ∈ NKn+(A)}. (12)

We note that Q is symmetric and orthogonal: Q2 = I. We often split a matrix X ∈ Sn into blocks

X = [X1 x; xT x0], with X1 ∈ Sn−1, x ∈ IRn−1, x0 ∈ IR,

where the semicolon separates block rows (a convention we use throughout).


For subsets α, β of {1, . . . , n}, denote by Bαβ the submatrix of B indexed by α and β. Bα denotes the submatrix consisting of the columns of B indexed by α, and |α| is the cardinality of α.

There are two known formulae for computing ProjKn+ . One is due to Hayden and Wells [22, Thm. 2.1]:

A ∈ Kn+ ⇐⇒ QAQ =: [A1 a; aT a0] and A1 ∈ Sn−1+ , (13)

and

ProjKn+(A) = Q [ΠSn−1+(A1) a; aT a0] Q, ∀ A ∈ Sn. (14)

The other is due to Gaffke and Mathar [14, Eq. 29]

ProjKn+(A) = A+ ΠSn+(−JAJ), ∀ A ∈ Sn. (15)

We note that the original projection formula of Gaffke and Mathar is onto (−Kn+). Each formula has its own advantage. Formula (14) states that the projection is in fact carried out onto Sn−1+ , while (15) brings the formula to the defining space Sn. We will use the Gaffke–Mathar formula in our numerical implementation and the Hayden–Wells formula for our analysis, because the latter brings out the rich structures that exist in TKn+(A).
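As a concrete aid to the reader (an illustrative Python/NumPy sketch under the stated formulas, not the implementation used in Sect. 5), formula (15) and the dual pair θ(y), F (y) from (8)–(9) can be coded directly; the primal solution (10) is then proj_K(D + np.diag(y)) at a dual solution y.

```python
import numpy as np

def proj_psd(A):
    """Pi_{S^n_+}(A): keep the nonnegative part of the spectrum."""
    lam, P = np.linalg.eigh(A)
    return (P * np.maximum(lam, 0.0)) @ P.T

def proj_K(A):
    """Gaffke-Mathar formula (15): Proj_{K^n_+}(A) = A + Pi_{S^n_+}(-JAJ)."""
    n = A.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return A + proj_psd(-J @ A @ J)

def theta_and_grad(y, D):
    """Dual objective (8) and its gradient (9):
    theta(y) = ||Proj_K(D + Diag(y))||^2 / 2,  F(y) = diag(Proj_K(D + Diag(y)))."""
    PA = proj_K(D + np.diag(y))
    return 0.5 * np.sum(PA * PA), np.diag(PA).copy()
```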

2.2. Slater Condition and Constraint Nondegeneracy. The following result on the Slater condition is important as it ensures that the dual problem (8) must have an optimal solution.

Proposition 2.1. The Slater condition holds, that is, int(Kn+) ∩ Snh ≠ ∅. Here, int(Kn+) represents the interior of Kn+.

Proof. Let

A0 := Q [In−1 0; 0 −(n − 1)] Q =: QU0Q.

According to (14), A0 ∈ Kn+. It is easy to verify that A0 = In − eeT , which implies diag(A0) = 0 and hence A0 ∈ Snh . Therefore A0 ∈ Snh ∩ Kn+. Let U be a neighborhood of U0. Then QUQ is a neighborhood of A0 because Q is orthogonal. Moreover, any matrix A ∈ QUQ has a corresponding matrix B ∈ U such that

QAQ = B = [B1 b; bT b0].

We can shrink U whenever necessary to make B1 close enough to In−1 so that B1 ∈ Sn−1+ . It follows from (13) that A ∈ Kn+. This proves that the whole neighborhood QUQ ⊆ Kn+. In other words, A0 ∈ int(Kn+) and the Slater condition holds.

Constraint nondegeneracy plays a very important role in optimization; see [2, Def. 5], [7, Def. 9], and [32, Sect. 2] for its use in SDP. Generally speaking, it ensures certain regularity of optimal solutions. For our problem (3), the nondegeneracy of A ∈ Snh ∩ Kn+ amounts to the following condition:

A( lin(TKn+(A)) ) = IRn. (16)


Let A ∈ Kn+ and

A = Q [Z z; zT z0] Q, Z ∈ Sn−1. (17)

Then Z ⪰ 0 by (13). We assume that rank(Z) = r and let 0 < λ1 ≤ λ2 ≤ . . . ≤ λr be the r positive eigenvalues of Z in nondecreasing order. Let Λ = Diag(λ1, . . . , λr). We assume that Z takes the following spectral decomposition:

Z = U [Λ 0; 0 0] UT , (18)

where UTU = In−1. The normal cone NKn+(A) is given by [16, Thm. 3.1]:

NKn+(A) = { Q [ U [0 0; 0 M] UT  0; 0  0 ] Q : −M ∈ Sn−r−1+ }.

Let

Ū := [U 0; 0 1] ∈ IRn×n. (19)

Then ŪTŪ = I and the normal cone can be equivalently written as

NKn+(A) = { QŪ [ [0 0; 0 M]  0; 0  0 ] ŪTQ : −M ∈ Sn−r−1+ }.

By the definition (12) of the tangent cone in terms of NKn+(A), we have

TKn+(A) = { QŪ [ [Σ1 Σ12; Σ12T Σ2]  a; aT  a0 ] ŪTQ : Σ1 ∈ Sr, Σ2 ∈ Sn−r−1+ , Σ12 ∈ IRr×(n−r−1), a ∈ IRn−1, a0 ∈ IR } (20)

= { Q [ U [Σ1 Σ12; Σ12T Σ2] UT  a; aT  a0 ] Q : Σ1 ∈ Sr, Σ2 ∈ Sn−r−1+ , Σ12 ∈ IRr×(n−r−1), a ∈ IRn−1, a0 ∈ IR }. (21)

The last equality used the facts that U is nonsingular and that [aT , a0] ranges over the whole space IRn. Now we are ready to prove the following result.

Proposition 2.2. Constraint nondegeneracy holds at each feasible point A of problem (3).

Proof. We only need to prove condition (16). It follows from (20) that

lin(TKn+(A)) = { QŪ [ [Σ1 Σ12; Σ12T 0]  a; aT  a0 ] ŪTQ : Σ1 ∈ Sr, Σ12 ∈ IRr×(n−r−1), a ∈ IRn−1, a0 ∈ IR }. (22)

It is obvious from (22) that

Ā := Q [0(n−1)×(n−1)  a; aT  a0] Q ∈ lin(TKn+(A)) ∀ [aT , a0]T ∈ IRn.


Let b ∈ IRn be arbitrary. We will find [aT , a0]T ∈ IRn such that

A(Ā) = b. (23)

We calculate the diagonal of Ā. For i = 1, . . . , n, we have

Āii = eiT Q [0(n−1)×(n−1) a; aT a0] Q ei
  = trace( (Qei)(eiTQ) [0(n−1)×(n−1) a; aT a0] )
  = enT (Qei eiT Q) [2a; a0]
  = −(1/√n) eT (ei eiT Q) [2a; a0]   (using Qen = −(1/√n) e)
  = −(1/√n) (eiT Q) [2a; a0].

We therefore have

A(Ā) = −(1/√n) Q [2a; a0].

Substituting this into (23) and solving for a and a0, we obtain

[2a; a0] = −√n Q b.

With such a choice of a and a0 in Ā, we have b = A(Ā) ∈ A(lin(TKn+(A))). This proves (16) and hence constraint nondegeneracy at A.

Prop. 2.2 is not practical enough for our use. We now develop a result for later use. Let Ei := ei eiT for i = 1, . . . , n and

Bi := QEiQ =: [Bi1 bi; (bi)T bi0] with Bi1 ∈ Sn−1.

Let U be defined as in (18). We define the corresponding index sets:

α(Z) := {i : λi > 0} and ᾱ(Z) := {1, 2, . . . , n − 1} \ α(Z). (24)

Whenever no confusion is caused, we abbreviate α(Z), ᾱ(Z) as α and ᾱ, respectively. We write

U = [Uα, Uᾱ].

We further define

Ci := [ [UαT Bi1 Uα  UαT Bi1 Uᾱ; UᾱT Bi1 Uα  0]  UT bi; (bi)T U  bi0 ]
  = ŪT Bi Ū − [ [0 0; 0 UᾱT Bi1 Uᾱ]  0; 0  0 ], (25)

where Ū is defined by (19).

Proposition 2.3. The matrices C1, . . . , Cn are linearly independent at each feasible point A of (3).


Proof. We note that the constraint nondegeneracy condition (16) is equivalent to

Null(A) + lin(TKn+(A)) = Sn, (26)

where Null(A) is the null space of A. (26) in turn is equivalent to

Null(A)⊥ ∩ ( lin(TKn+(A)) )⊥ = {0}, (27)

where Null(A)⊥ denotes the subspace orthogonal to Null(A). Since (27) holds, we prove the claim by contradiction. Assume that C1, . . . , Cn are linearly dependent. Then there exists 0 ≠ h ∈ IRn such that Σni=1 hiCi = 0. It follows from (25) that

Σni=1 hiCi = ŪT ( Σ hiBi ) Ū − [ [0 0; 0 UᾱT (Σ hiBi1) Uᾱ]  0; 0  0 ]
  = ŪT ( Q(A∗(h))Q ) Ū − [ [0 0; 0 UᾱT (Σ hiBi1) Uᾱ]  0; 0  0 ]. (28)

Hence, Σni=1 hiCi = 0 implies

ŪT ( Q(A∗(h))Q ) Ū = [ [0 0; 0 UᾱT (Σ hiBi1) Uᾱ]  0; 0  0 ]. (29)

It is obvious that A∗(h) ∈ Null(A)⊥. For any Ā ∈ lin(TKn+(A)), we have

〈A∗(h), Ā〉 = 〈ŪTQ(A∗(h))QŪ , ŪTQĀQŪ〉 = 〈UᾱT (Σ hiBi1) Uᾱ, 0〉 = 0.

The first equality used the fact that QŪ is orthogonal because each factor is. The second equality used (29) and the structure of lin(TKn+(A)) in (22). Hence

0 ≠ A∗h ∈ Null(A)⊥ ∩ ( lin(TKn+(A)) )⊥,

which contradicts (27). This proves the linear independence of C1, . . . , Cn.

As a matter of fact, it is not hard to derive constraint nondegeneracy at A from the linear independence of C1, . . . , Cn. In other words, the linear independence of C1, . . . , Cn is equivalent to constraint nondegeneracy at A. It is interesting to note that this equivalent characterization is a natural extension of a result of Alizadeh, Haeberly, and Overton [2, Thm. 6] on primal nondegeneracy in SDP from Sn+ to Kn+.

3. Quadratic Convergence. This section is mainly concerned with the quadratic convergence of Newton's method (11). Globalizing the Newton method is straightforward as the dual problem (8) is convex (see Sect. 5). Our key result is that every matrix in ∂2θ(y) is positive definite when y is an optimal solution of (8). This result will lead to the desired quadratic convergence. To facilitate our analysis, we need to study the structure of ∂2θ(y) = ∂F (y).

It follows from the definition of F (y) in (9) and the Jacobian chain rule of Clarke [8, Thm. 2.6.6] that we have

∂2θ(y)h ⊆ h − A( ∂ΠSn+(Y )(J(A∗h)J) ), (30)

where Y := −J(D + A∗y)J . We will reveal the rich structures in ∂ΠSn+(Y )(J(A∗h)J) step by step so as to prove our ultimate result on the quadratic convergence of (11).


3.1. Generalized Jacobian of ΠSn+(·). Let

Y := −J(D + A∗(y))J and Y = PΛPT , (31)

where PTP = I and Λ := Diag(λ1, . . . , λn) with λ1 ≥ λ2 ≥ . . . ≥ λn being the eigenvalues of Y in nonincreasing order. (Note that the eigenvalues in (18) are arranged in nondecreasing order; it will become clear why we have done so, see the comments below (53).) For those eigenvalues, define the corresponding symmetric matrix Ω ∈ Sn with entries

Ωij := (max{λi, 0} + max{λj , 0}) / (|λi| + |λj |), i, j = 1, . . . , n, (32)

where 0/0 is defined to be 1. We further define three index sets:

α(Y ) := {i : λi > 0}, β(Y ) := {i : λi = 0}, γ(Y ) := {i : λi < 0}. (33)

We will drop the dependence of those indices on Y whenever no confusion is caused. We have the following formula describing ∂ΠSn+(Y ).

Proposition 3.1. [37, Prop. 2.2] Suppose that Y ∈ Sn has the spectral decomposition as in (31). Then V ∈ ∂ΠSn+(Y ) if and only if there exists V|β| ∈ ∂ΠS|β|+(0) such that

V (H) = P [ H̃αα  H̃αβ  Ωαγ ◦ H̃αγ; H̃αβT  V|β|(H̃ββ)  0; ΩαγT ◦ H̃αγT  0  0 ] PT , ∀ H ∈ Sn, (34)

where H̃ := PTHP .

Therefore, to specify an element V ∈ ∂ΠSn+(Y ), one needs to specify the corresponding V|β| from ∂ΠS|β|+(0). It is usually complicated to specify all elements in ∂ΠS|β|+(0); see [29], which is solely devoted to a detailed characterization. But for us, we only need the following property of V|β|:

〈Z1, V|β|(Z2)〉 ≤ ‖Z1‖‖Z2‖, ∀ Z1, Z2 ∈ S|β|. (35)

This can be easily proved by using [7, Eq. (17)].

The general description in (34) is not adequate for our further analysis. We need to break it into pieces that will reveal the structure of our problem. Next we establish a useful relationship between P and Q.

3.2. Relationship between P and Q. The following identity has been used by Glunt et al. [16, p. 591]:

Q [In−1 0; 0 0] Q = J. (36)

It follows from (31) that

Y = Q [Y1 0; 0 0] Q, (37)


where we denote

−Q(D + A∗(y))Q =: [Y1 ȳ; ȳT ȳ0] with Y1 ∈ Sn−1. (38)

Let Y1 ∈ Sn−1 take the spectral decomposition

Y1 = UΛ1UT , (39)

where Λ1 := Diag(λ1, . . . , λn−1) with λ1 ≥ . . . ≥ λn−1 being the eigenvalues of Y1 and UTU = In−1. Define

ᾱ := {i : λi > 0}, β̄ := {i : λi = 0}, and γ̄ := {i : λi < 0}, (40)

and

U = [Uᾱ, Uβ̄ , Uγ̄ ]. (41)

Then we have

Ȳ := [Y1 0; 0 0] = [U 0; 0 1] [Λ1 0; 0 0] [UT 0; 0 1].

This means that, in addition to λ1, . . . , λn−1, 0 is the last eigenvalue of Ȳ and en is the corresponding eigenvector. It follows from (37) that Y and Ȳ share the same set of eigenvalues because Q is orthogonal. The relationship between the index sets α, β, and γ in (33) and ᾱ, β̄, and γ̄ is

α = ᾱ, β = β̄ ∪ {|ᾱ| + |β̄| + 1}, and γ = {i + 1 : i ∈ γ̄}.

We define Ū by

Ū := [Uᾱ Uβ̄ 0 Uγ̄ ; 0 0 1 0] =: [Ūα Ūβ Ūγ ], (42)

where

Ūα := [Uᾱ; 0], Ūβ := [Uβ̄ 0; 0 1], and Ūγ := [Uγ̄ ; 0].

We then arrive at

Y = QȲ Q = QŪΛŪTQ.

Therefore, the matrix P in (31) can be chosen to satisfy

P = QŪ , where Ū is defined by (42). (43)

3.3. Structure of ∂ΠSn+(Y )(J(A∗h)J). We let

H := J(A∗h)J and Ĥ := Q(A∗h)Q =: [H1 ĥ; ĥT ĥ0], (44)


where H1 ∈ Sn−1. By the identity in (36), we have

H = Q [H1 0; 0 0] Q and hence QHQ = [H1 0; 0 0].

We also note from (43) that

Pα = QŪα, Pβ = QŪβ , and Pγ = QŪγ .

We also recall from Prop. 3.1 that H̃ = PTHP . It follows that

H̃αα = PαTHPα = ŪαTQHQŪα = UᾱTH1Uᾱ.

Similarly, we can calculate the following:

H̃αβ = [UᾱTH1Uβ̄  0], H̃αγ = UᾱTH1Uγ̄ ,

and

H̃ββ = [Uβ̄TH1Uβ̄ 0; 0 0], H̃γγ = Uγ̄TH1Uγ̄ .

We have now completed our preparation to describe any element in ∂ΠSn+(Y )(J(A∗h)J) for h ∈ IRn. The description only uses the spectral information of Y1 in (39) and the matrix H1 defined in (44). We note that ĥ and ĥ0 in Ĥ of (44) do not appear in the description. We put it in a proposition.

Proposition 3.2. For any y ∈ IRn, let Y := −J(D + A∗(y))J , which assumes the spectral decomposition (31). Let the matrix Ω ∈ Sn be defined in (32). Let H := J(A∗h)J for h ∈ IRn. Then a matrix L ∈ ∂ΠSn+(Y )(J(A∗h)J) if and only if there exists V ∈ ∂ΠSn+(Y ) such that L = V (H). Furthermore, V (H) has the following characterization: there exists V|β| ∈ ∂ΠS|β|+(0) such that

V (H) = PW(H)PT , (45)

where P is defined by (43) and

W(H) = [ UᾱTH1Uᾱ  [UᾱTH1Uβ̄  0]  Ωαγ ◦ (UᾱTH1Uγ̄ ); [Uβ̄TH1Uᾱ; 0]  V|β|([Uβ̄TH1Uβ̄ 0; 0 0])  0; ΩαγT ◦ (Uγ̄TH1Uᾱ)  0  0 ]. (46)

Proof. This result is just a new interpretation of the formula in Prop. 3.1 in terms of the above calculations.

This proposition will be used to study the nonsingularity of ∂2θ(y) in the next subsection. Before going there, let us list two facts that will be used.

The first fact is a simple observation on ‖h‖ for h ∈ IRn:

‖h‖2 = ‖A∗(h)‖2 = ‖PT (A∗h)P‖2 = ‖ŪTQ(A∗h)QŪ‖2 = ‖ŪTĤŪ‖2. (47)


The second fact is about an inequality. Let

Gβ := [Uβ̄TH1Uβ̄  Uβ̄Tĥ; ĥTUβ̄  ĥ0], Ḡβ := [Uβ̄TH1Uβ̄ 0; 0 0].

It is easy to see that ‖Ḡβ‖ ≤ ‖Gβ‖ and

‖Gβ‖2 − ‖Ḡβ‖2 = 2‖Uβ̄Tĥ‖2 + ĥ02.

Hence we have

‖Gβ‖(‖Gβ‖ − ‖Ḡβ‖) ≥ ‖Uβ̄Tĥ‖2 + (1/2)ĥ02. (48)

3.4. Nonsingularity of ∂2θ(y). We are ready to prove the following technical result.

Proposition 3.3. Let y be an optimal solution of the dual problem (8). Then every matrix M ∈ ∂2θ(y) is positive definite.

Proof. We continue to use the notation developed so far. Let M ∈ ∂2θ(y). It follows from (30) that there exists V ∈ ∂ΠSn+(Y ) satisfying (45) and (46) such that

Mh = h − A( PW(H)PT ),

where H is defined by (44). We now calculate 〈h, Mh〉:

〈h, Mh〉 = ‖h‖2 − 〈A∗h, PW(H)PT 〉 = ‖h‖2 − 〈PT (A∗h)P, W(H)〉
= ‖h‖2 − 〈ŪTQ(A∗h)QŪ , W(H)〉 (by (43))
= ‖ŪTĤŪ‖2 − 〈ŪTĤŪ , W(H)〉 (by (47), (44))
= 2‖UᾱTĥ‖2 + 2‖Uγ̄Tĥ‖2 + 2‖UᾱTH1Uγ̄‖2 − 2〈UᾱTH1Uγ̄ , Ωαγ ◦ (UᾱTH1Uγ̄ )〉 + 2‖Uβ̄TH1Uγ̄‖2 + ‖Uγ̄TH1Uγ̄‖2 + ‖Gβ‖2 − 〈Gβ , V|β|(Ḡβ)〉.

The last equality made use of (46) and the structure of ŪTĤŪ . Define τmax := maxi∈α,j∈γ Ωij . By (32), 0 < τmax < 1. We continue to estimate 〈h, Mh〉:

〈h, Mh〉 ≥ 2‖UᾱTĥ‖2 + 2‖Uγ̄Tĥ‖2 + 2(1 − τmax)‖UᾱTH1Uγ̄‖2 + 2‖Uβ̄TH1Uγ̄‖2 + ‖Uγ̄TH1Uγ̄‖2 + ‖Gβ‖2 − ‖Gβ‖‖Ḡβ‖ (by (35))
≥ 2( ‖UᾱTĥ‖2 + ‖Uγ̄Tĥ‖2 + (1/2)‖Uβ̄Tĥ‖2 ) + ‖Uγ̄TH1Uγ̄‖2 + 2( (1 − τmax)‖UᾱTH1Uγ̄‖2 + ‖Uβ̄TH1Uγ̄‖2 ) + (1/2)ĥ02 (by (48))
≥ 0. (49)

Hence, the assumption 〈h, Mh〉 = 0 would imply

UᾱTĥ = 0, Uβ̄Tĥ = 0, Uγ̄Tĥ = 0, and ĥ0 = 0, (50)


and

UᾱTH1Uγ̄ = 0, Uβ̄TH1Uγ̄ = 0, Uγ̄TH1Uγ̄ = 0. (51)

Because of (41) and the nonsingularity of U , (50) implies

ĥ = 0 and ĥ0 = 0. (52)

Since y is an optimal solution of (8), A := ProjKn+(D + A∗y) is the optimal solution of (3) by (10). Obviously, A is feasible with respect to the constraints of (3). Constraint nondegeneracy holds at A due to Prop. 2.2. We assume A is decomposed as in (17). We now clarify what the matrix Z is. We recall the decomposition (38) and the Hayden–Wells formula (14) for ProjKn+ . It follows that

A = ProjKn+(D + A∗y) = Q [ΠSn−1+(−Y1)  −ȳ; −ȳT  −ȳ0] Q = Q [UΠSn−1+(−Λ1)UT  −ȳ; −ȳT  −ȳ0] Q.

Hence, the matrix Z in (17) has the form

Z = UΠSn−1+(−Λ1)UT . (53)

(Comparing this expression to (18) explains why we have used the notation U there.) Now we recall the definitions of α(Z) and ᾱ(Z) in (24). By comparing them to those in (40), we know that there is a one-to-one correspondence between the positive eigenvalues λi(Z) indexed by α(Z) and the negative eigenvalues λi indexed by γ̄, and that there is a one-to-one correspondence between the zero eigenvalues λi(Z) indexed by ᾱ(Z) and the eigenvalues λi indexed by ᾱ ∪ β̄. We also recall that the eigenvalues of Z in (18) are arranged in nondecreasing order and the eigenvalues in Λ1 are arranged in nonincreasing order. Therefore, the corresponding eigenvectors have the relationship

Uα(Z) = Uγ̄ and Uᾱ(Z) = Uᾱ∪β̄ .

Then the matrices C1, . . . , Cn defined in (25) are linearly independent by Prop. 2.3. It follows from (52) that

Ĥ = Q(A∗h)Q = [H1 ĥ; ĥT ĥ0] = [H1 0; 0 0].

Now we consider the linear combination Σ hiCi, which has been derived in (28):

Σni=1 hiCi = ŪT ( Q(A∗(h))Q ) Ū − [ [0 0; 0 UᾱT (Σ hiBi1) Uᾱ]  0; 0  0 ]
  = [ Uγ̄TH1Uγ̄   Uγ̄TH1Uᾱ∪β̄   0; (Uᾱ∪β̄)TH1Uγ̄   0   0; 0   0   0 ].

Then (51) forces Σ hiCi = 0. The linear independence of the Ci in turn forces h = 0. Therefore, 〈h, Mh〉 = 0 if and only if h = 0. In other words, 〈h, Mh〉 > 0 for 0 ≠ h ∈ IRn by (49). This proves that M is positive definite.


We now state a couple of consequences of Prop. 3.3. The first is on the uniqueness of the optimal solution of the dual problem (8). Let us regard the gradient function F (y) = ∇θ(y) as a mapping from IRn to IRn. The generalized Jacobian ∂F (y) is said to be of maximal rank provided that every matrix M in ∂F (y) is of maximal rank (i.e., nonsingular) [8, p. 253]. It follows from Prop. 3.3 that ∂F (y∗) is of maximal rank provided that y∗ is an optimal solution of (8). Then the inverse function theorem of Clarke [8, Thm. 7.1.1] and the convexity of (8) lead to the following result.

Corollary 3.4. The dual problem (8) has a unique optimal solution.

The second consequence of Prop. 3.3 is about the quadratic convergence of Newton's method (11). We state it as a theorem.

Theorem 3.5. Newton's method (11) is quadratically convergent provided that y0 is sufficiently close to the unique optimal solution y∗ of (8).

Proof. The general quadratic convergence-rate theorem of Qi and Sun [34, Thm. 3.2] for semismooth Newton methods has three conditions: (i) the function F is strongly semismooth, which is true in our case because F is a composition of linear mappings and the strongly semismooth mapping ΠSn+(·) [38]; (ii) every matrix in the generalized Jacobian ∂F (y∗) is nonsingular, which has been proved in Prop. 3.3; and (iii) the initial point y0 stays close to y∗. This proves our result.

Since (8) is convex, globalization of the Newton method (11) is an easy task. We simply use one of the well-developed globalization methods studied by Qi and Sun [33] in our numerical experiments.
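For orientation only, a minimal sketch of such a damped iteration is given below (illustrative Python; the Armijo constants are arbitrary choices, theta_and_grad is the dual pair sketched in Sect. 2, jac_vec applies an element Vy ∈ ∂F (y) as in Lemma 5.1 below, cg_solve is any matrix-free CG routine, and [33, Alg. 5.1] contains safeguards omitted here).

```python
import numpy as np

def newton_globalized(y0, theta_and_grad, jac_vec, cg_solve,
                      tol=1e-6, eta=1e-4, max_iter=100):
    """Damped semismooth Newton for the convex dual (8):
    solve V_k d = -F(y_k) inexactly by CG, then Armijo backtracking on theta."""
    y = y0.copy()
    for _ in range(max_iter):
        val, F = theta_and_grad(y)
        if np.linalg.norm(F) <= tol:                 # stopping rule, cf. (64)
            break
        d = cg_solve(lambda h: jac_vec(y, h), -F)    # inexact Newton direction
        if F @ d >= 0.0:                             # safeguard: fall back to
            d = -F                                   # the steepest descent step
        t = 1.0
        while theta_and_grad(y + t * d)[0] > val + eta * t * (F @ d):
            t *= 0.5                                 # Armijo line search
        y = y + t * d
    return y
```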

4. Application to the H-weighted Problem. As briefly mentioned in the Introduction, the H-weighted problem (4) is much more difficult to solve than the unweighted case (3). In this section, we develop a global method for this difficult problem. The most important feature of this method is that each subproblem is a diagonally weighted problem of (3), and this subproblem can be efficiently solved by a Newton method similar to (11). The bridge that links the H-weighted problem and the diagonally weighted problem is the majorization approach introduced by Gao and Sun [15] for an H-weighted nearest correlation matrix problem. We refer to [15] for more information about the majorization approach, initially used in multidimensional scaling. We will first demonstrate how this approach works for (4).

4.1. The Majorization Approach. Denote the objective function in (4) by

f(X) = 0.5‖H ◦ (X − D)‖2.

Obviously, f(·) is quadratic and its Taylor expansion at Xk ∈ Sn is

f(X) = f(Xk) + 〈H ◦ H ◦ (Xk − D), X − Xk〉 + 0.5‖H ◦ (X − Xk)‖2.

We replace the quadratic term by a simpler function ‖W 1/2(X − Xk)W 1/2‖2, which satisfies

‖W 1/2(X − Xk)W 1/2‖ ≥ ‖H ◦ (X − Xk)‖, ∀ X ∈ Sn,

where W := Diag(w) and 0 < w ∈ IRn. A particular choice recommended by Gao and Sun [15] is

wi := max{τ, max{Hij : j = 1, . . . , n}}, i = 1, . . . , n, (54)


where τ > 0 is a constant. Define

fk(X) := f(Xk) + 〈H ◦ H ◦ (Xk − D), X − Xk〉 + 0.5‖W 1/2(X − Xk)W 1/2‖2. (55)

We certainly have the following property

fk(X) ≥ f(X) ∀ X ∈ Sn and fk(Xk) = f(Xk). (56)

Because of this property, fk(X) is called a majorization of f at Xk. The majorization approach aims to solve the following problem:

min fk(X), s.t. X ∈ Snh ∩ Kn+. (57)

We note that problem (57) is strictly convex, so it has a unique solution, denoted by Xk+1. We then have (because of (56))

f(Xk+1) ≤ fk(Xk+1) ≤ fk(Xk) = f(Xk). (58)

In other words, the solution of (57) provides a better point Xk+1 than Xk in terms of the original objective function. Property (58) is known as the sandwich property in majorization approaches.

The numerical implication of the majorization approach is then to solve a sequence of problems (57) starting from X0. Theoretically, we get a sequence {Xk} with decreasing function values. Numerically, this approach is sensible only if the new problem (57) is much easier to solve than the original problem. We demonstrate below that this is the case.
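In outline, the resulting loop is as follows (an illustrative Python sketch; solve_diagonal stands for a solver of the diagonally weighted subproblem (59), e.g. the Newton method (62) developed below, and the weights follow (54)).

```python
import numpy as np

def majorize(D, H, solve_diagonal, tau=1.0, max_iter=50, ftol=1e-8):
    """Majorization loop for (4): each step solves a subproblem of type (59)."""
    w = np.maximum(tau, H.max(axis=1))            # diagonal weights, choice (54)
    f = lambda X: 0.5 * np.linalg.norm(H * (X - D))**2
    X = np.zeros_like(D)                          # or the solution of (3)
    for _ in range(max_iter):
        G = H * H * (X - D)                       # gradient of f at X_k
        Dbar = X - G / np.outer(w, w)             # Dbar_k = X_k - W^{-1} G W^{-1}
        X_new = solve_diagonal(Dbar, w)           # min ||W^{1/2}(X - Dbar)W^{1/2}||
        if f(X) - f(X_new) <= ftol:               # sandwich property (58):
            X = X_new                             # function values decrease
            break
        X = X_new
    return X
```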

4.2. Solving Subproblem (57). It is observed that problem (57) is actually a diagonally weighted problem of (3). To see this, we note that

fk(X) = (1/2)‖W 1/2(X − (Xk − Dk))W 1/2‖2 + f(Xk) − (1/2)‖W−1/2(H ◦ H ◦ (Xk − D))W−1/2‖2,

where Dk := W−1(H ◦ H ◦ (Xk − D))W−1. Ignoring the constant terms in fk, problem (57) is equivalent to

min (1/2)‖W 1/2(X − D̄k)W 1/2‖2, s.t. X ∈ Snh ∩ Kn+, (59)

where D̄k := Xk − Dk. Because W = Diag(w), we call this problem the diagonally weighted version of problem (3).

Let

X̂ := W 1/2XW 1/2 and D̂ := W 1/2D̄kW 1/2.

Then problem (59) is equivalent to

min (1/2)‖X̂ − D̂‖2 s.t. W−1/2X̂W−1/2 ∈ Snh ∩ Kn+. (60)

It is easy to verify that (because W is diagonal)

W−1/2X̂W−1/2 ∈ Snh if and only if X̂ ∈ Snh ,


and

W−1/2X̂W−1/2 ⪰ 0 on e⊥ if and only if X̂ ⪰ 0 on (W 1/2e)⊥.

Define the closed convex cone

Knw := { X ∈ Sn : X ⪰ 0 on (W 1/2e)⊥ }.

It follows from (60) that (59) is equivalent to

min (1/2)‖X̂ − D̂‖2 s.t. X̂ ∈ Snh ∩ Knw. (61)

This problem is almost the same as (3) except that Kn+ is replaced by Knw. We can develop Newton's method for this problem just as we have done for problem (3). We summarize this procedure below.

The corresponding dual problem and its first-order optimality condition are (see (8) and (9), respectively, for problem (3)):

min_{y∈IRn} θw(y) := (1/2)‖ProjKnw(D̂ + A∗(y))‖2,

and

Fw(y) := ∇θw(y) = A( ProjKnw(D̂ + A∗(y)) ) = 0.

The Newton method therefore takes the following form (see (11)):

yj+1 = yj − Vj−1Fw(yj), j = 0, 1, 2, . . . , (62)

where Vj ∈ ∂Fw(yj).

In order to implement Newton's method (62), we need to characterize the projection ProjKnw(A) for any A ∈ Sn. This can be done as follows. Let Q be the Householder transformation that maps the vector W 1/2e to [0, . . . , 0, −‖W 1/2e‖]T . Let

v := [ √w1, . . . , √wn−1, √wn + √(Σni=1 wi) ]T .

Then

Q = I − (2/vTv) vvT .

According to [22, Thm. 2.1] (take S = W 1/2e there), we have

ProjKnw(A) = Q [ΠSn−1+(A1)  a; aT  a0] Q,

where

QAQ = [A1 a; aT a0], with A1 ∈ Sn−1, a ∈ IRn−1, a0 ∈ IR.


Let

J := Q [In−1 0; 0 0] Q = I − (1/Σni=1 wi) √w √wT with √w := [√w1, . . . , √wn]T .

Then the corresponding formula of Gaffke and Mathar (15) is

ProjKnw(A) = A + ΠSn+(−JAJ).

We can repeat the analysis conducted in Subsect. 2.2 and Sect. 3 to conclude that the Newton method (62) is quadratically convergent (cf. Thm. 3.5). We omit the details.
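In code, the weighted projection differs from the unweighted sketch of Sect. 2 only in the projector J (illustrative Python; proj_psd is repeated for self-containment).

```python
import numpy as np

def proj_psd(A):
    lam, P = np.linalg.eigh(A)
    return (P * np.maximum(lam, 0.0)) @ P.T

def proj_Kw(A, w):
    """Weighted Gaffke-Mathar formula: Proj_{K^n_w}(A) = A + Pi_{S^n_+}(-JAJ),
    where J = I - sqrt(w)sqrt(w)^T / sum(w) projects onto (W^{1/2} e)-perp."""
    sw = np.sqrt(w)
    J = np.eye(len(w)) - np.outer(sw, sw) / w.sum()
    return A + proj_psd(-J @ A @ J)
```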

4.3. Global Method for the H-weighted Problem. Having addressed the quadratic convergence of Newton's method (62), we are ready to formally state our global method for the H-weighted problem (4).

Algorithm 4.1. (Global Method)
Step 1. Choose X0 ∈ Sn. Set k := 0.
Step 2. Define the function fk by (55). Use Newton's method (62) to solve the problem (57). Denote the obtained optimal solution by Xk+1.
Step 3. If Xk+1 = Xk, stop; otherwise, set k := k + 1 and go to Step 2.

We have the following remarks regarding this algorithm.
(R1) An obvious choice of the initial point X0 is obtained by solving the unweighted problem (3). The initial point for Newton's method (62) can be taken to be the solution of the dual problem from the previous subproblem (57).
(R2) As indicated in (56), the algorithm produces a sequence {Xk} with decreasing function values in terms of the original function f. Moreover, for any Xk, we have fk(Xk) = f(Xk) ≤ f(X0). That is, the sequence of function values {fk(Xk)} is bounded above. We note that the sequence of majorization functions {fk(X)} is a sequence of uniformly strictly convex quadratic functions (they share the same Hessian matrix). Hence, the sequence {Xk} is bounded. At this point, we would like to point out that the feasible region in [15] is bounded. Therefore, there is more flexibility there in choosing the majorization functions without having to worry about the boundedness of the generated sequence {Xk}. In contrast, our feasible region is the intersection of a subspace and a closed convex cone, and hence is unbounded.
(R3) When defining fk in (55), we replaced the H-weighted quadratic term by a diagonally weighted quadratic term. The information about the H-weights is not all lost: the linear term in (55) retains the original weights through (H ◦ H ◦ (Xk − D)). This linear term is extremely important in establishing the global convergence, which is stated below.
(R4) As stated in the next result, Alg. 4.1 is only globally convergent. However, numerical experiments in the next section show that it works very well even for large scale problems. The key to this good performance is that the Newton method (62) is extremely efficient.

Using the facts in Remarks (R2) and (R3) above, one can follow the proof of [15, Thm. 3.4] to establish the following result. We omit the details.

Theorem 4.2. Let {Xk} be the sequence generated by Alg. 4.1. Then {f(Xk)} is a monotonically decreasing sequence. If Xk+1 = Xk for some Xk, then Xk is an optimal solution of (4). Otherwise, the infinite sequence {f(Xk)} satisfies

(1/2)‖W 1/2(Xk+1 − Xk)W 1/2‖2 ≤ f(Xk) − f(Xk+1), k = 0, 1, . . . .


Moreover, the sequence {Xk} is bounded and any accumulation point is an optimal solution of (4).

5. Numerical Results. In this section, we conduct numerical tests on both problem (3) and the H-weighted problem (4). For the former, we use Newton's method (11), and for the latter we use Alg. 4.1. At each of its iterations, Alg. 4.1 uses the Newton method (62) to solve its subproblem (59). Since both Newton methods in their current forms are only locally quadratically convergent, we used a globalized version of each of the Newton methods in our implementation. The globalized version we used is taken from [33, Alg. 5.1]. This globalized Newton method is globally and quadratically convergent (see the convergence analysis in [33, Sect. 5]).

We would like to make three remarks about this globalized Newton method. We use (11) as an example. The first remark is about calculating the matrix Vk. This can be done by adapting the computing procedure of [33, Sect. 5 (a)] to our function F . We summarize the calculation in a lemma (for simplicity, we drop the iteration index k on y).

Lemma 5.1. (Computing Vy ∈ ∂F (y)) Let Y := −J(D + A∗(y))J have the spectral decomposition (31), with the index sets α, β, and γ defined by (33). Then a matrix Vy ∈ ∂F (y) can be computed as follows:

Vyh = h − A( P (My ◦ (PTHP ))PT ), ∀ h ∈ IRn,

where H := J(A∗h)J and My is defined by

My := [ Eαα  Eαβ  (τij(y))i∈α,j∈γ ; Eβα  0  0; (τij(y))Ti∈α,j∈γ  0  0 ], τij(y) := λi/(λi − λj), i ∈ α, j ∈ γ,

with E denoting the matrix of all ones of appropriate size.
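A direct transcription of Lemma 5.1 reads as follows (an illustrative Python sketch; in practice the eigendecomposition of Y is computed once per Newton iteration and the returned closure is then handed to CG, so each CG step costs only matrix products).

```python
import numpy as np

def make_jac_vec(D, y):
    """Return the map h -> V_y h of Lemma 5.1 for Y = -J(D + Diag(y))J."""
    n = len(y)
    J = np.eye(n) - np.ones((n, n)) / n
    lam, P = np.linalg.eigh(-J @ (D + np.diag(y)) @ J)
    lam, P = lam[::-1], P[:, ::-1]           # nonincreasing order, as in (31)
    eps = 1e-12 * max(1.0, np.abs(lam).max())
    a, g = lam > eps, lam < -eps             # index sets alpha and gamma
    M = np.zeros((n, n))                     # the matrix M_y of Lemma 5.1
    M[np.ix_(a, ~g)] = 1.0                   # ones blocks E_aa and E_ab
    M[np.ix_(~g, a)] = 1.0                   # ones block E_ba
    T = lam[a][:, None] / (lam[a][:, None] - lam[g][None, :])   # tau_ij
    M[np.ix_(a, g)] = T
    M[np.ix_(g, a)] = T.T
    def jac_vec(h):
        H = J @ np.diag(h) @ J               # H = J Diag(h) J
        return h - np.diag(P @ (M * (P.T @ H @ P)) @ P.T)
    return jac_vec
```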

Evaluating the explicit form of Vy costs a prohibitive O(n4) operations. We therefore chose the conjugate gradient (CG) method, which requires only matrix–vector products, to solve the Newton equation in (11). The second remark is about preconditioning CG with the diagonal preconditioner of Vy. The preconditioner can be calculated by adapting the computing procedure of [4, Sect. 3.2] for problem (7) to our case. The computational complexity is about 2n3, similar to that of [4].

Our last remark is about extending the Newton method to handle additional fixed distance constraints:

Xij = Dij for (i, j) ∈ B, (63)

where B is the index set that fixes those known distances Dij . Toh [39] included such constraints in solving (5). Test Example 5.6 considers such additional constraints. Our methodology and computation do not depend on A being the diagonal mapping; they apply to any type of linear equality constraints as long as we can cheaply calculate the adjoint A∗. However, if we have too many extra constraints of the type (63), we may lose the property of constraint nondegeneracy, which in turn may destroy the quadratic convergence of the Newton method. We would like to point out that it is a very complicated issue to know which constraints enjoy constraint nondegeneracy and which do not.

We will test the following problems. The first two problems are of dense type, i.e., Dij ≠ 0 when i ≠ j, while the remaining three enjoy certain sparsity patterns.

The first three problems are of the type of the unweighted problem (3) and the last two are of the H-weighted problem (4). Examples 5.3–5.6 refer to the EDM1 problem of Toh [39].

Example 5.2. [16] The predistance matrix D is randomly generated with values uniformly distributed between 10−5 and 10.

Example 5.3. This problem is a slight modification of the EDM1 problem of Toh [39]. First, we generate n random points, x1, . . . , xn, in the unit cube centered at the origin in IR3. We calculate Dij = ‖xi − xj‖2 (the squared distance between xi and xj). We then add to D an n × n random symmetric matrix with entries in [−α, α], where α = 0.3 in our test.

Example 5.4. This is the EDM1 problem of [39] except that the H-weight matrix is taken to be H = E. First, we generate n random points, x1, . . . , xn, in the unit cube centered at the origin in IR3. Then we set Dij = ‖xi − xj‖2 if the distance is less than a certain cut-off distance R; otherwise, we set Dij = 0. R = 1 in our test.

Example 5.5. This is the EDM1 problem of [39] except that we do not have fixed distances. Generate the matrix D as in Example 5.4 with various choices of R. The weight matrix H is chosen to be the 0–1 matrix having the same sparsity pattern as D. Density is calculated by nnz(D)/numel(D).

Example 5.6. This is the EDM1 problem of [39]. Generate D and H as in Example 5.5. The set of indices where additional distances of the type (63) are fixed is given by B := {(1, j) : D1j ≠ 0, j = 1, . . . , n}.
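For readers wishing to reproduce the data, the construction in Examples 5.4 and 5.5 can be sketched as follows (illustrative Python; the paper's experiments themselves were run in MATLAB, and the helper name edm1_data is ours).

```python
import numpy as np

def edm1_data(n, R=1.0, weighted=False, seed=0):
    """Examples 5.4/5.5: points in the unit cube centered at the origin in IR^3;
    squared distances are kept only when the distance is below the cut-off R."""
    rng = np.random.default_rng(seed)
    X = rng.random((n, 3)) - 0.5
    D = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    D[np.sqrt(D) >= R] = 0.0                    # drop distances beyond R
    H = (D != 0).astype(float) if weighted else np.ones((n, n))
    return D, H                                 # density = nnz(D)/numel(D)
```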

All tests were carried out using the 64-bit version of MATLAB R2011b on a Windows 7 desktop with a 64-bit operating system, an Intel(R) Core(TM) 2 Duo CPU at 3.16GHz, and 4.0 GB of RAM. In Table 1, we compare Newton's method with MAP [16] and the QSDP solver of Toh [39]. It follows from [14] or [28, Thm. 5.1] that the alternating projection method is actually the gradient method for the dual problem with step size 1. Therefore, the error measured between successive iterates by MAP is the norm of the gradient ‖∇θ(y)‖. We terminate MAP when

Res := ‖∇θ(yk)‖ ≤ tol, (64)

with tol = 10−5, and we stop Newton's method when (64) is satisfied with tol = 10−6. The reason why we chose 10−5 for MAP is that it ran into difficulties in some cases at higher accuracy (e.g., it took too many iterations to gain one more digit of accuracy). On the contrary, Newton's method can quickly reach a higher accuracy. This is well reflected by the cpu time (in hh:mm:ss format) and the number of iterations (Iter columns) used by the two methods. The starting point for both methods was set to be 0 and the maximum number of iterations of MAP is capped at 2000. As for the QSDP solver, we used the default parameter settings. The Obj column contains the objective function values returned by each method. The results reported below are the average over 10 randomly generated instances of each test problem.

The performance of Newton's method on unweighted problems in Table 1 is outstanding. It took under 1 minute to solve problems with n = 2000, which is equivalent to about 2 million independent variables in each problem. An interesting observation is that once it reached the level ‖∇θ(yk)‖ ≤ 10−1, Newton's method converges at a quadratic rate, taking just a few more steps to reach the required accuracy of 10−6. This observation seems independent of problem size and probably justifies why it took only about 4–8 steps to terminate for all the problems. On the contrary, MAP used an increasing number of iterations as n increases. We note that the complexity of one iteration of MAP is about one full eigenvalue–eigenvector decomposition.

The large number of iterations needed by MAP slows its convergence, and it took a long time to terminate. For example, Newton's method took about 8 seconds to solve Example 5.4 (n = 1000) while MAP used about 13 minutes. When n = 2000, the numbers are 51 seconds for Newton versus nearly 2 hours for MAP, which reached the maximum number of iterations with ‖∇θ(yk)‖ ≈ 10−3. This lower accuracy of the final iterate is reflected by the corresponding (slightly) lower objective function value 84996, compared with 84998 returned by the Newton method. This is because the final matrix returned by MAP is not yet (but close to) an EDM due to the low accuracy. QSDP suffers similar difficulties to MAP when n gets bigger than 1000. When n = 2000, it took more than 9 hours to terminate. QSDP being a general solver for quadratic SDPs, we feel that it has much room to improve on our test problems by taking advantage of problem structure, such as the problems being unweighted. An encouraging observation is that all methods were able to return almost the same objective function value on each test problem.

In Tables 2 and 3, we report our numerical experience with Alg. 4.1 and the QSDP solver on H-weighted problems with (e.g., Example 5.6) and without (e.g., Example 5.5) additional fixed distances (MAP is not applicable to this kind of problem). In our implementation, we used an inexact accelerated proximal gradient strategy, which was recently proposed by Jiang et al. [26, Alg. 2] for large scale linearly constrained convex QSDP. Please refer to [26] for the theoretical justification of using this strategy. We terminate Alg. 4.1 when

fprog := ( √f(Xk−1) − √f(Xk) ) / max{100, √f(Xk−1)} ≤ 10−5.

In other words, we stop the algorithm whenever there is a lack of relative progress in the successive objective function values. This stopping criterion was suggested by Gao and Sun [15] for their majorization method. We once again used the default parameter values for QSDP. In particular, it was terminated when the relative gap defined in [39] became less than 10−5.

It is observed that the H-weighted problem is much more difficult to solve than the unweighted one. The difficulty level seems to increase as the density of H decreases. In Table 2, we tested Examples 5.5 and 5.6 with fixed dimension n = 500, but with varying densities (ranging from 99.79% to 2.63%). In Table 3, we tested the two examples with varying dimensions (from n = 100 to 2000) and varying densities (from 91.98% to 2.59%). It is evident that our algorithm performed significantly faster than QSDP for all the problems. An important observation from those tables, as well as from our extensive numerical experiments unreported here, is that when the density is above about 10%, both Alg. 4.1 and the QSDP solver return almost the same objective function value. However, when it is below 10%, QSDP often terminated early as the psqmr solver used in QSDP reached the default maximum number of steps. This observation can be clearly seen from Table 3 for n ≥ 500 with density less than 5%. It is also worth mentioning that one can stop Alg. 4.1 at any iteration once a satisfactory approximate solution is obtained. This is because at each iteration of Alg. 4.1, the solution of the subproblem solved by the Newton method (62) already provides a Euclidean distance matrix.

6. Conclusion and Future Research. In this paper, we studied Newton's method for computing the nearest Euclidean distance matrix from a given predistance (or dissimilarity) matrix. Our theoretical analysis is mainly on the unweighted case (3). The main result is that the Newton method is quadratically convergent. This main result also holds for the diagonally weighted problem (59), which naturally arises from a majorization approach for the H-weighted problem (4). Our numerical experiments showed that Newton's method is extremely efficient even for large scale problems. This research also provides a solid foundation for other important problems.

One such problem is the embedding problem (6) and its H-weighted version:

min (1/2)‖H ◦ (X − D)‖2 s.t. X ∈ Snh ∩ Kn+ and rank(JXJ) ≤ r. (65)

In distance geometry models for molecular conformation, distances are often known to be contained in a box, i.e., lij ≤ Dij ≤ uij , 1 ≤ i, j ≤ n. “The difference between the upper bound and lower bound reflects the accuracy with which the data is known. To reflect this accuracy in the algorithms, it is important that weighted models be considered.” For more explanation on the above statement, see [18, p. 114], which recommends Hij = 1/(1 + 10(uij − lij)). The finding in this paper opens a new avenue for applying Newton's method to (65) through a penalty approach (i.e., penalizing the rank constraint). We hope to be able to report our results in a follow-up paper.

For the H-weighted problem (4), we proposed a majorization approach which, at each iteration, solves a diagonally weighted problem. As seen from Tables 2 and 3, this approach sometimes took a good number of iterations to reach the required accuracy. Given that the inner problem can be efficiently solved, one area we plan to investigate is strategies to improve the efficiency of the majorization approach, as well as other approaches.

Acknowledgments. The author would like to thank Prof. K.-C. Toh of National University of Singapore for his help on using the QSDP solver on our test problems. The author also wishes to thank the two referees for their valuable comments and constructive suggestions, which have significantly improved the quality of the paper.

REFERENCES

[1] A.Y. Alfakih, A. Khandani, and H. Wolkowicz, Solving Euclidean distance matrix com-pletion problems via semidefinite programming, Comput. Optim. Appl., 12 (1999), pp.13–30.

[2] F. Alizadeh, J.-P.A. Haeberly, and M.L. Overton, Complementarity and nondegeneracy in semidefinite programming, Math. Program., 77 (1997), pp. 111–128.

[3] I. Borg and P.J.F. Groenen, Modern Multidimensional Scaling: Theory and Applications, 2nd ed., Springer Series in Statistics, Springer, 2005.

[4] R. Borsdorf and N.J. Higham, A preconditioned Newton algorithm for the nearest correlation matrix, IMA J. Numer. Anal., 30 (2010), pp. 94–107.

[5] S. Boyd and L. Xiao, Least-squares covariance matrix adjustment, SIAM J. Matrix Anal. Appl., 27 (2005), pp. 532–546.

[6] X. Chen, H.-D. Qi, and P. Tseng, Analysis of nonsmooth symmetric matrix-valued functions with applications to semidefinite complementarity problems, SIAM J. Optim., 13 (2003), pp. 960–985.

[7] Z.X. Chan and D.F. Sun, Constraint nondegeneracy, strong regularity and nonsingularity in semidefinite programming, SIAM J. Optim., 19 (2008), pp. 370–396.

[8] F.H. Clarke, Optimization and Nonsmooth Analysis, John Wiley & Sons, New York, 1983.

[9] T.F. Cox and M.A.A. Cox, Multidimensional Scaling, 2nd ed., Chapman and Hall/CRC, 2001.

[10] G. Crippen and T. Havel, Distance Geometry and Molecular Conformation, Wiley, New York, 1988.

[11] J. Dattorro, Convex Optimization and Euclidean Distance Geometry, Meboo Publishing USA, 2005.

[12] R.L. Dykstra, An algorithm for restricted least squares regression, J. Amer. Statist. Assoc., 78 (1983), pp. 839–842.

[13] H.-R. Fang and D.P. O'Leary, Euclidean distance matrix completion problems, Optim. Methods Softw., to appear.


[14] N. Gaffke and R. Mathar, A cyclic projection algorithm via duality, Metrika, 36 (1989), pp. 29–54.

[15] Y. Gao and D.F. Sun, A majorized penalty approach for calibrating rank constrained correlation matrix problems, Technical Report, Department of Mathematics, National University of Singapore, March 2010.

[16] W. Glunt, T.L. Hayden, S. Hong, and J. Wells, An alternating projection algorithm for computing the nearest Euclidean distance matrix, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 589–600.

[17] W. Glunt, T.L. Hayden, and W.-M. Liu, The embedding problem for predistance matrices, Bulletin of Mathematical Biology, 53 (1991), pp. 769–796.

[18] W. Glunt, T.L. Hayden, and M. Raydan, Molecular conformations from distance matrices, J. Computational Chemistry, 14 (1993), pp. 114–120.

[19] W. Glunt, T.L. Hayden, and M. Raydan, Preconditioners for distance matrix algorithms, J. Computational Chemistry, 15 (1994), pp. 227–232.

[20] J.C. Gower, Properties of Euclidean and non-Euclidean distance matrices, Linear Algebra Appl., 67 (1985), pp. 81–97.

[21] S.P. Han, A successive projection method, Math. Programming, 40 (1988), pp. 1–14.

[22] T.L. Hayden and J. Wells, Approximation by matrices positive semidefinite on a subspace, Linear Algebra Appl., 109 (1988), pp. 115–130.

[23] T.L. Hayden, J. Wells, W.-M. Liu, and P. Tarazaga, The cone of distance matrices, Linear Algebra Appl., 144 (1991), pp. 153–169.

[24] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409–436.

[25] N.J. Higham, Computing the nearest correlation matrix – a problem from finance, IMA J. Numer. Anal., 22 (2002), pp. 329–343.

[26] K.F. Jiang, D.F. Sun, and K.-C. Toh, An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP, Technical Report, Department of Mathematics, National University of Singapore, September 2011.

[27] N. Krislock and H. Wolkowicz, Euclidean distance matrices and applications, in Handbook of Semidefinite, Cone and Polynomial Optimization, M. Anjos and J. Lasserre, eds., 2010.

[28] J. Malick, A dual approach to semidefinite least-squares problems, SIAM J. Matrix Anal. Appl., 26 (2004), pp. 272–284.

[29] J. Malick and H.S. Sendov, Clarke generalized Jacobian of the projection onto the cone of positive semidefinite matrices, Set-Valued Anal., 14 (2006), pp. 273–293.

[30] C.A. Micchelli and F.I. Utreras, Smoothing and interpolation in a convex subset of a Hilbert space, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 728–747.

[31] A. Neumaier, Molecular modeling of proteins and mathematical prediction of protein structure, SIAM Rev., 39 (1997), pp. 407–460.

[32] H.-D. Qi, Positive semidefinite matrix completions on chordal graphs and constraint nondegeneracy in semidefinite programming, Linear Algebra Appl., 430 (2009), pp. 1151–1164.

[33] H.-D. Qi and D.F. Sun, A quadratically convergent Newton method for computing the nearest correlation matrix, SIAM J. Matrix Anal. Appl., 28 (2006), pp. 360–385.

[34] L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Programming, 58 (1993), pp. 353–367.

[35] R.T. Rockafellar, Conjugate Duality and Optimization, SIAM, Philadelphia, 1974.

[36] I.J. Schoenberg, Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces vectoriels distanciés applicables vectoriellement sur l'espace de Hilbert", Ann. Math., 36 (1935), pp. 724–732.

[37] D.F. Sun, The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications, Math. Oper. Res., 31 (2006), pp. 761–776.

[38] D.F. Sun and J. Sun, Semismooth matrix valued functions, Math. Oper. Res., 27 (2002), pp. 150–169.

[39] K.-C. Toh, An inexact path-following algorithm for convex quadratic SDP, Math. Programming, 112 (2008), pp. 221–254.

[40] G. Young and A.S. Householder, Discussion of a set of points in terms of their mutual distances, Psychometrika, 3 (1938), pp. 19–22.

Table 1
Comparison between Newton, MAP, and QSDP on unweighted problems. For each problem size n, the columns report the iteration count (Iter), cpu time, residual (Res), and objective value (Obj) for Newton and for MAP, and Iter, cpu, gap, and Obj for QSDP.


Table 2
Comparison between Alg. 4.1 and QSDP on H-weighted problems (n = 500). "*" means that psqmr in QSDP reached the maximum number of steps and the algorithm terminated before reaching the accuracy.

                               Alg. 4.1                                QSDP
       Density    R    Iter    cpu    fprog       Obj        Iter    cpu     gap         Obj
       99.79%   1.5      2      4    1.44E-06   1755.9480     23    8:57    1.72E-06   1756.0345
       90.80%   1       18     26    5.79E-06   1088.3068     26    7:44    8.83E-06   1088.4785
       69.14%   0.8     30     38    1.95E-06    428.3699     24   11:13    1.31E-06    427.6617
E5.5   40.68%   0.6     93   1:33    9.00E-06     85.3167     21   11:10    1.80E-06     85.1006
       27.27%   0.5    144   2:08    9.87E-06     26.9529     20   13:46    2.53E-06     26.5405
       16.16%   0.4    101   1:30    9.83E-06      6.4204     20   19:15    4.39E-06      6.0031
        2.63%   0.2     40     36    9.28E-06      0.0192     20   24:04    1.34E-02*     0.0313
       99.79%   1.5      4   1:01    3.19E-07   2666.3489     50    9:03    5.82E-06   2666.6126
       90.80%   1       15   1:50    8.81E-06   1551.5751     48    8:34    5.08E-06   1551.6498
       69.14%   0.8     27   2:25    3.74E-06    527.9511     43   11:02    2.08E-06    527.2400
E5.6   40.68%   0.6     86   5:10    9.92E-06     95.4337     35   12:57    2.10E-06     95.2293
       27.27%   0.5    141   6:54    9.94E-06     28.3609     29   14:59    1.70E-06     27.9579
       16.16%   0.4    101   2:43    9.85E-06      6.6095     29   19:52    3.64E-06      6.1862
        2.63%   0.2     39     45    9.91E-06      0.0197     21    3:41    6.76E-02*     0.0874

Table 3
Comparison between Alg. 4.1 and QSDP on H-weighted problems. "*" means that psqmr in QSDP reached the maximum number of steps and the algorithm terminated before reaching the accuracy.

                                    Alg. 4.1                                QSDP
         n   Density    R     Iter    cpu     fprog       Obj      Iter     cpu       gap         Obj
       100   91.98%    1       11      1     4.47E-06   34.1656     14       14     2.27E-06    34.1322
       200   71.9%     0.8     24      5     3.41E-07   62.0968     16       31     8.62E-06    61.9664
E5.5   500    4.78%    0.25    51     44     9.80E-06    0.16880    21    32:42     2.63E-04*    0.13504
      1000    2.56%    0.2     56   4:11     9.59E-06    0.15303    21   3:52:29    1.05E-03*    0.12348
      1500    2.57%    0.2     68  13:34     9.82E-06    0.45769    23  13:21:57    1.20E-03*    0.38668
      2000    2.59%    0.2     80  17:55     9.77E-06    0.96547    27  36:58:21    4.76E-04*    0.81649
       100   91.98%    1       10      5     5.81E-06   50.8218     21       13     7.96E-06    50.7989
       200   71.9%     0.8     23     21     2.35E-07   84.1753     29       57     1.97E-06    84.0205
E5.6   500    4.78%    0.25    51   1:04     9.83E-06    0.17051    28    40:57     1.35E-04*    0.13508
      1000    2.56%    0.2     56   4:33     9.56E-06    0.15484    30   4:11:52    8.39E-04*    0.12346
      1500    2.57%    0.2     68  14:38     9.81E-06    0.46239    33  12:54:59    1.22E-03*    0.39156
      2000    2.59%    0.2     80  34:55     9.80E-06    0.97490    36  36:58:49    8.22E-04*    0.83625