S Regularization Methods and Fixed Point Algorithms for ... · The matrix completion problem (1.2)...

S1/2 Regularization Methods and Fixed Point Algorithms for

Affine Rank Minimization Problems

Dingtao Peng ∗ Naihua Xiu † and Jian Yu ‡

Abstract

The affine rank minimization problem is to minimize the rank of a matrix under linearconstraints. It has many applications in various areas such as statistics, control, systemidentification and machine learning. Unlike the literatures which use the nuclear norm orthe general Schatten q (0 < q < 1) quasi-norm to approximate the rank of a matrix, inthis paper we use the Schatten 1/2 quasi-norm approximation which is a better approx-imation than the nuclear norm but leads to a nonconvex, nonsmooth and non-Lipschitzoptimization problem. It is important that we give a globally necessary optimality condi-tion for the S1/2 regularization problem by virtue of the special objective function. This isvery different from the local optimality conditions usually used for the general Sq regular-ization problems. Explicitly, the globally optimality condition for the S1/2 regularizationproblem is a fixed point equation associated with the singular value half thresholdingoperator. Naturally, we propose a fixed point iterative scheme for the problem. We alsoprovide the convergence analysis of this iteration. By discussing the location and settingof the optimal regularization parameter as well as using an approximate singular valuedecomposition procedure, we get a very efficient algorithm, half norm fixed point algo-rithm with an approximate SVD (HFPA algorithm), for the S1/2 regularization problem.Numerical experiments on randomly generated and real matrix completion problems arepresented to demonstrate the effectiveness of the proposed algorithm.

Key words. affine rank minimization problem; matrix completion problem; S1/2 regu-larization problem; fixed point algorithm; singular value half thresholding operator

AMS Subject Classification. 90C06, 90C26, 90C59, 65F22

1 Introduction

The affine rank minimization problem, which is to minimize the rank of a matrix under linearconstraints, can be described as follows

minX∈Rm×n rank(X)s.t. A(X) = b,

(1.1)

∗College of Science, Guizhou University, Guiyang 550025, Guizhou, China; and School of Science, BeijingJiaotong University, Beijing 100044, China; ([email protected]). This author was supported by the NSFCgrant 11171018 and the Guizhou Provincial Science and Technology Foundation grant 20102133.

†School of Science, Beijing Jiaotong University, Beijing 100044, China ([email protected]). This authorwas supported by the National Basic Research Program of China grant 2010CB732501 and the NSFC grant71271021.

‡College of Science, Guizhou University, Guiyang 550025, Guizhou, China ([email protected]).

1

where b ∈ Rp is a given vector andA : Rm×n → Rp is a given linear transformation determinedby p matrices A1, · · · , Ap ∈ Rm×n via

A(X) := [⟨A1, X⟩, · · · , ⟨Ap, X⟩]T for all X ∈ Rm×n,

with ⟨Ai, X⟩ := trace(ATi X), i = 1, · · · , p. An important special case of (1.1) is the matrix

completion problem [6]

minX∈Rm×n rank(X)s.t. Xi,j = Mi,j , (i, j) ∈ Ω,

(1.2)

where X and M are both m × n matrices, Ω is a subset of index pairs (i, j), and a smallsubset Mi,j |(i, j) ∈ Ω of the entries is known.

Many applications arising in various areas can be captured by solving the model (1.1),for instance, the low-degree statistical models for a random process [17, 36], the low-orderrealization of linear control systems [19, 37], low-dimensional embedding of datum in Eu-clidean spaces [20], system identification in engineering [28], machine learning [32], and otherapplications [18]. The matrix completion problem (1.2) often arises, for which the examplesinclude the Netflix problem, global positioning, remote sensing and so on [5, 6]. Moreover,problem (1.1) is an extension of the well-known sparse signal recovery (or compressed sens-ing) which is formulated as finding a sparsest solution of an underdetermined system of linearequations [7, 15].

Problem (1.1) was considered by Fazel [18] in which its computational complexity isanalyzed and it is proved to be an NP-hard problem. To overcome such a difficulty, Fazel [18]and other researchers (e.g., [6, 8, 34]) have suggested to relax the rank of X by the nuclearnorm, that is, to consider the following nuclear norm minimization problem

minX∈Rm×n ∥X∥∗s.t. A(X) = b,

(1.3)

or the nuclear norm regularization problem

minX∈Rm×n ∥A(X)− b∥22 + λ∥X∥∗ (1.4)

if the data contain noises, where ∥X∥∗ is the nuclear norm of X, i.e., the sum of its singularvalues. It is well-known that problems (1.3) and (1.4) are both convex and therefore, can beeasier (at least in theory) solved than (1.1).

Many existing algorithms rely on nuclear norm. For examples, problem (1.3) can bereformulated as a semidefinite programming [34] and be solved by SDPT3 [41]; Lin et al [26],and Tao and Yuan [39] adopt augmented lagrangian multipliers (ALM) methods to solverobust PCA problems and its extension which contain the matrix completion problem as aspecial case; The SVT [4] solves (1.3) by applying a singular value thresholding operator; Tohand Yun [40] solve a general model that contains (1.3) as a special case by accelerated proximalgradient (APG); Liu, Sun and Toh [27] present a framework of proximal point algorithms inthe primal, dual and primal-dual forms for solving the nuclear norm minimization with linearequality and second order cone constraints; Ma, Goldfarb and Chen [29] proposed fixed pointand Bregman iterative algorithms for solving problem (1.3).

Considering the nonconvexity of the original problem (1.1), some researchers [23, 25, 30,31, 33] suggest to use the Schatten q (0 < q < 1) quasi-norm (for short, q norm) relaxation,that is, to solve the q-norm minimization problem

minX∈Rm×n ∥X∥qqs.t. A(X) = b,

(1.5)

2

or the Sq regularization problem

minX∈Rm×n ∥A(X)− b∥22 + λ∥X∥qq (1.6)

if the data contain noises, where the Schatten q quasi-norm of X is defined by ∥X∥qq :=∑minm,ni=1 σq

i and σi (i = 1, · · · ,minm,n) are the singular values of X. Problems (1.5) isintermediate between (1.1) and (1.3) in the sense that

rank(X) =

minm,n∑i = 1σi = 0

σ0i , ∥X∥qq =

minm,n∑i=1

σqi , and ∥X∥∗ =

minm,n∑i=1

σ1i .

Obviously, the q quasi-norm is a better approximation of the rank function than the nuclearnorm, but it leads to a nonconvex, nonsmooth, non-Lipschitz optimization problem for whichthe global minimizers are difficult to find.

In fact, the nonconvex relaxation method was firstly proposed in the region of sparse signalrecovery [9,10]. Recently, nonconvex regularization methods associated with the ℓq (0 < q <1) norm have attracted much attention and many theoretical results and algorithms have beendeveloped to solve the nonconvex, nonsmooth, even non-Lipschitz optimization problems, see,e.g., [2, 14, 23, 25, 31]. Extensive computational results have shown that using the ℓq normcan find very sparse solutions under very little measurements, see, e.g., [9–14, 25, 31, 36, 45].However, since the ℓq norm minimization is a nonconvex, non-smooth and non-Lipschitzproblem, it is in general difficult to give a theoretical guarantee for finding a global solution.Moreover, which q should be selected is another interesting problem. The results in [43–45]revealed that the ℓ1/2 relaxation can be somehow regarded as a representation among all theℓq relaxation with q in (0, 1) in the sense that the ℓ1/2 relaxation has more powerful recoveringability than the ℓq relaxation as 1/2 < q < 1, meanwhile the recovering ability has no muchdifference between the ℓ1/2 relaxation and the ℓq relaxation as 0 < q < 1/2. Moreover, Xu etal [44] in fact provided a globally necessary optimality condition for the ℓ1/2 regularizationproblem, which is expressed as a fixed point equation involving the half thresholding function.This condition may not hold at the local minimizers. Then they developed a fast iterative halfthresholding algorithm for the ℓ1/2 regularization problem which matches the iterative hardthresholding algorithm for the ℓ0 regularization problem and the iterative soft thresholdingalgorithm for the ℓ1 regularization problem.

In this paper, inspired by the works of nonconvex regularization method, especially ℓ1/2regularization mentioned above, we focus our attention on the following S1/2 regularizationproblem

minX∈Rm×n

∥AX − b∥22 + λ∥X∥1/21/2

, (1.7)

where ∥X∥1/21/2 =∑minm,n

i=1 σ1/2i and σi (i = 1, · · · ,minm,n) are the singular values of X.

This paper is organized as follows. In Section 2, we briefly discuss the relation bween theglobal minimizers of problem (1.5) and problem (1.6). In Section 3, we deduce an analyticalthresholding expression associated with the solutions to problem (1.7), and establish an exactlower bound of the nonzero singular values of the solutions. Moveover, we prove that thesolutions to problem (1.7) are fixed points of a matrix-valued thresholding operator. InSection 4, based on the fixed point condition, we give a naturally iterative formula, andprovide the convergence analysis of our proposed iteration. Section 5 discusses the locationof the optimal regularization parameter and the setting of the parameter which coincides with

3

the fixed point continuation technique used in the convex optimization. Since the singularvalue decomposition is computationally expensive, in Section 6 we employ an approximatesingular value decomposition procedure to cut the cost of time. Thus we get a very fast, robustand powerful algorithm which we call HFPA algorithm (half norm fixed point algorithmwith an approximate SVD). Numerical experiments on randomly generated and real matrixcompletion problems are presented in Section 7 to demonstrate the effectiveness of the HFPAalgorithm. At last, we conclude our results in section 8.

Before continuing, we summarize the notations that will be used in this paper. Through-out this paper, without loss of generality, we always suppose m ≤ n. Let ∥x∥2 denotethe Euclidean norm of any vector x ∈ Rp. For any x, y ∈ Rp, ⟨x, y⟩ = xT y denotes theinner product of two vectors. For any matrix X ∈ Rm×n, σ(X) = (σ1(X), · · · , σm(X))T

denotes the vector of singular values of X arranged in nonincreasing order, and it will besimply denoted by σ = (σ1, · · · , σm)T if no confusion is caused in the context; Diag(σ(X))denotes a diagonal matrix whose diagonal vector is σ(X); and ∥X∥F denotes the Frobe-

nius norm of X, i.e., ∥X∥F =(∑

i,j X2ij

)1/2=(∑m

i=1 σ2i

)1/2. For any X,Y ∈ Rm×n,

⟨X,Y ⟩ = tr(Y TX) denotes the inner product of two matrices. Let the linear transfor-mation A : Rm×n → Rp be determined by p given matrices A1, · · · , Ap ∈ Rm×n, thatis, A(X) = (⟨A1, X⟩, · · · , ⟨Ap, X⟩)T . Define A = (vec(A1), · · · , (vec(Ap))

T ∈ Rp×mn andx = vec(X) ∈ Rmn where vec(·) is the stretch operator, then we have A(X) = Ax and∥A(X)∥2 ≤ ∥A∥∥X∥F , where ∥A∥ := max∥A(X)∥2 : ∥X∥F = 1 = ∥A∥2 and ∥A∥2 is thespectral norm of the matrix A. Let A∗ denote the adjoint of A. Then for any y ∈ Rp, wehave A∗y =

∑pi=1 yiAi and ⟨A(X), y⟩ = ⟨X,A∗y⟩ = ⟨vec(X), vec(A∗y)⟩ = ⟨x,AT y⟩.

2 Relation between global minimizers of problem (1.5) andproblem (1.6)

We now show that in some sense, problem (1.5) can be solved by solving problem (1.6).The theorem here is general and covers problem (1.7) as a special case. We note that theregularization term ∥X∥qq is nonconvex, nonsmooth and non-Lipschitz, hence the result isnontrivial.

Theorem 2.1 For each λ > 0, the set of global minimizers of (1.6) is nonempty and bounded.Let λk be a decreasing sequence of positive numbers with λk → 0, and Xλk

be a globalminimizer of problem (1.6) with λ = λk. Suppose that problem (1.5) is feasible, then Xλk

is bounded and any of its accumulation points is a global minimizer of problem (1.5).

Proof. Since Cλ(X) := ∥A(X) − b∥22 + λ∥X∥qq ≥ λ∥X∥qq, the objective function Cλ(X)is bounded from below and is coercive, i.e., Cλ(X) → ∞ if ∥X∥F → ∞, and hence the set ofglobal minimizers of (1.6) is nonempty and bounded.

Suppose that problem (1.5) is feasible and X is any feasible point, then AX = b. SinceXλk

is a global minimizer of problem (1.6) with λ = λk, we have

maxλk∥Xλk

∥qq, ∥A(X)λk− b∥22

≤ λk∥Xλk

∥qq + ∥A(X)λk− b∥22

≤ λk∥X∥qq + ∥AX − b∥22= λk∥X∥qq.

From λk∥Xλk∥qq ≤ λk∥X∥qq, we get ∥Xλk

∥qq ≤ ∥X∥qq, that is, the sequence Xλk is bounded.

Thus, Xλk has at least one accumulation point. Let X∗ be any accumulation point of

4

Xλk. From ∥A(X)λk

− b∥22 ≤ λk∥X∥qq and λk → 0, we derive A(X)∗ = b, that is, X∗ is afeasible point of problem (1.5). It follows from ∥Xλk

∥qq ≤ ∥X∥qq that ∥X∗∥qq ≤ ∥X∥qq. Thenby the arbitrariness of X, we obtain that X∗ is a global minimizer of problem (1.5).

3 Globally necessary optimality condition

In this section, we give a globally necessary optimality condition for problem (1.7), whichperhaps does not hold at the local minimizers. This condition is expressed as a matrix-valued fixed point equation associated with a special thresholding operator which we calledhalf thresholding operator. Before we start to research the S1/2 regularization problem, webegin with introducing the half thresholding operator.

3.1 Half thresholding operator

First, we introduce the half thresholding function, which is to minimize a real-valued function.The following key lemma follows but is different from Xu et al [44].

Lemma 3.1 Let t ∈ R, λ > 0 be two given real numbers. Suppose that x∗ ∈ R is a globalminimizer of the problem

minx≥0

f(x) := (x− t)2 + λx1/2. (3.1)

Then x∗ is uniquely determined by (3.1) when t =3√544 λ2/3, and can be analytically expressed

by

x∗ = hλ(t) :=

hλ,1/2(t), if t >

3√544 λ2/3

hλ,1/2(t), 0, if t =3√544 λ2/3

0, if t <3√544 λ2/3

(3.2)

where

hλ,1/2(t) =2

3t

(1 + cos

(2π

3− 2

3ϕλ(t)

))(3.3)

with

ϕλ(t) = arccos

(λ

8

(t

3

)−3/2). (3.4)

Proof. Firstly, we consider the positive stationary points of (3.1). The first order optimalcondition of (3.1) gives that

x− t+λ

4√x= 0. (3.5)

This equation implies that if and only if t > 0 it has positive roots, and that if t ≤ 0, f(x)is increasing on [0,+∞) and x = 0 is the unique minimizer of (3.1). Hence, we need onlyto consider t > 0 from now on. By solving equation (3.5) and comparing the values of f ateach root of equation (3.5), Xu et al [44] have showed that x = hλ,1/2(t) defined by (3.3) isthe unique desired positive stationary point of (3.1) such that f(x) is the smallest among the

5

values of f at its all positive stationary points (see (14),(15) and (16) in [44], we note that,in (16), xi >

34λ

2/3 is not necessary, in fact, xi > 0 is enough).The rest thing is to compare the values between f(x) and f(0). Fortunately, Xu et al

(see Lemma 1 and Lemma 2 in [44]) have showed that

f(x) < f(0) ⇔ t >3√54

4λ2/3

and

f(x) = f(0) ⇔ t =3√54

4λ2/3.

The other case is naturally

f(x) > f(0) ⇔ t <3√54

4λ2/3.

The above three relationships imply

x∗ =

x, if t >

3√544 λ2/3

x, 0, if t =3√544 λ2/3

0, if t <3√544 λ2/3

,

which completes the proof.

Figure 1 shows the minimizers of the function f(x) with two different pairs of (t, λ).Specifically, in (a) t = 2, λ = 8 and in (b) t = 4, λ = 8. In (b), x = 0 is a local minimizer of

f(x); Meanwhile, since t = 4 >3√544 λ2/3 = 3

√54, the global minimizer is x∗ = hλ,1/2(4) > 0.

−1 0 1 2 3 4 50

5

10

15

20

25

x(a)

f(x)

t =2, λ=8

−1 0 1 2 3 4 5−2

0

2

4

6

8

10

12

14

16

18

x(b)

f(x)

t =4, λ=8

Figure 1: The minimizers of the function f(x) with two different pairs of (t, λ).

Lemma 3.2 (Appendix A in [44]) If t >3√544 λ2/3, then the function hλ(t) is strictly

increasing.

Similar to [33, 44], using hλ(·) defined in Lemma 3.1, we can define the following halfthresholding function and half thresholding operators.

6

Definition 3.3 (Half thresholding function) Assume t ∈ R. For any λ > 0, the functionhλ(·) defined by (3.2)-(3.4) is called a half thresholding function.

Definition 3.4 (Vector half thresholding operator) For any λ > 0, the vector half threshold-ing operator Hλ(·) is defined as

Hλ(x) := (hλ(x1), hλ(x2), · · · , hλ(xn))T , ∀x ∈ Rn.

Definition 3.5 (Matrix half thresholding operator) Suppose Y ∈ Rm×n of rank r admits asingular value decomposition (SVD) as

Y = UDiag(σ)V T ,

where U and V are respectively m× r and n× r matrices with orthonormal columns, and thevector σ = (σ1, σ2, · · · , σr)T consists of positive singular values of Y arranged in nonincreas-ing order (Unless specified otherwise, we will always suppose the SVD of a matrix is givenin this reduced form). For any λ > 0, the matrix half thresholding operator Hλ(·) : Rm×n →Rm×n is defined by

Hλ(Y ) := UDiag(Hλ(σ))VT .

In what follows, we will see that the matrix half thresholding operator defined above is in

fact a proximal operator associated with ∥X∥1/21/2, a nonconvex and non-Lispschitz function.This in some sense can be regarded as an extension of the well-known proximal operatorassociated with convex functions [27,35].

Lemma 3.6 The global minimizer Xs of the following problem

minX∈Rm×n

∥X − Y ∥2F + λ∥X∥1/21/2 (3.6)

can be analytically given by

Xs = Hλ(Y ).

Proof. See the Appendix.

3.2 Fixed point equation for global minimizers

Now, we can begin to consider our S1/2 regularization problem (1.7):

minX∈Rm×n

∥AX − b∥22 + λ∥X∥1/21/2

. (3.7)

For any λ, µ > 0 and Z ∈ Rm×n, let

Cλ(X) := ∥A(X)− b∥22 + λ∥X∥1/21/2, (3.8)

Cλ,µ(X,Z) := µ(Cλ(X)− ∥A(X)−A(Z)∥22

)+ ∥X − Z∥2F , (3.9)

Bµ(Z) := Z + µA∗(b−A(Z)). (3.10)

Lemma 3.7 If Xs ∈ Rm×n is a global minimizer of Cλ,µ(X,Z) for any fixed λ, µ and Z,then Xs can be analytically expressed by

Xs = Hλµ(Bµ(Z)). (3.11)

7

Proof. Note that Cλ,µ(X,Z) can be reexpressed as

Cλ,µ(X,Z) = µ(∥A(X)− b∥22 + λ∥X∥1/21/2 − ∥A(X)−A(Z)∥22

)+ ∥X − Z∥2F

= ∥X∥2F + 2µ⟨A(X),A(Z)⟩ − 2µ⟨A(X), b⟩ − 2⟨X,Z⟩+ λµ∥X∥1/21/2

+∥Z∥2F + µ∥b∥22 − µ∥A(Z)∥22= ∥X∥2F − 2⟨X,Z + µA∗(b−A(Z))⟩+ λµ∥X∥1/21/2

+∥Z∥2F + µ∥b∥22 − µ∥A(Z)∥22= ∥X∥2F − 2⟨X,Bµ(Z)⟩+ λµ∥X∥1/21/2 + ∥Z∥2F + µ∥b∥22 − µ∥A(Z)∥22.

= ∥X −Bµ(Z)∥2F + λµ∥X∥1/21/2

+∥Z∥2F + µ∥b∥22 − µ∥A(Z)∥22 − ∥Bµ(Z)∥2F .

This implies that minimizing Cλ,µ(X,Z) for any fixed λ, µ and Z is equivalent to solving

minX∈Rm×n

∥X −Bµ(Z)∥2F + λµ∥X∥1/21/2

.

By applying Lemma 3.6 with Y = Bµ(Z), we get expression (3.11).

Lemma 3.8 Let λ and µ be two fixed numbers satisfying λ > 0 and 0 < µ ≤ ∥A∥−2. If X∗

is a global minimizer of Cλ(X), then X∗ is also a global minimizer of Cλ,µ(X,X∗), that is,

Cλ,µ(X∗, X∗) ≤ Cλ,µ(X,X∗) for all X ∈ Rm×n. (3.12)

Proof. Since 0 < µ ≤ ∥A∥−2, we have

∥X −X∗∥2F − µ∥AX −AX∗∥22 ≥ 0.

Hence for any X ∈ Rm×n,

Cλ,µ(X,X∗) = µ(Cλ(X)− ∥AX −AX∗∥22

)+ ∥X −X∗∥2F

= µ(∥AX − b∥22 + λ∥X∥1/21/2

)+(∥X −X∗∥2F − µ∥AX −AX∗∥22

)≥ µ

(∥AX − b∥22 + λ∥X∥1/21/2

)= µCλ(X)

≥ µCλ(X∗) = Cλ,µ(X

∗, X∗),

where the last inequality is due to that X∗ is a global minimizer of Cλ(X). The proof is thuscomplete.

By applying Lemmas 3.7 and 3.8, we can now derive the main result of this section.

Theorem 3.9 Given λ > 0, 0 < µ ≤ ∥A∥−2. Let X∗ be a global minimizer of problem (1.7)and Bµ(X

∗) = X∗ + µA∗(b−A(X)∗) admit the following SVD

Bµ(X∗) = U∗Diag(σ(Bµ(X

∗)))V ∗T . (3.13)

Then X∗ satisfies the following fixed point equation

X∗ = Hλµ(Bµ(X∗)). (3.14)

8

Particularly, one can express

[σ(X∗)]i = hλµ([σ(Bµ(X∗))]i)

=

hλµ,1/2([σ(Bµ(X

∗))]i), if [σ(Bµ(X∗))]i >

3√544 (λµ)2/3

hλµ,1/2([σ(Bµ(X∗))]i), 0, if [σ(Bµ(X

∗))]i =3√544 (λµ)2/3

0, if [σ(Bµ(X∗))]i <

3√544 (λµ)2/3.

(3.15)

Moreover, we have

either [σ(X∗)]i ≥3√54

6(λµ)2/3 or [σ(X∗)]i = 0. (3.16)

Proof. Since X∗ is a global minimizer of Cλ(X), by Lemma 3.8, X∗ is also a globalminimizer of Cλ,µ(X,X∗). Consequently by Lemma 3.7, X∗ satisfies equation (3.14). (3.15)is a reexpression of equation (3.14) in the form of component. According to (3.2)-(3.4), bydirect computation, we have

limt↓

3√544

(λµ)2/3ϕλµ(t) =

π

4and lim

t↓3√544

(λµ)2/3hλµ(t) =

3√54

6(λµ)2/3.

This limitation together with the strict monotonicity of hλµ on t >3√544 (λµ)2/3 (Lemma 3.2)

implies that [σ(X∗)]i >3√546 (λµ)2/3 as [σ(Bµ(X

∗))]i >3√544 (λµ)2/3. The last one of (3.15)

shows [σ(X∗)]i = 0 as [σ(Bµ(X∗))]i <

3√544 (λµ)2/3. Thus, (3.16) is derived.

Theorem 3.10 provides not only the lower bound estimation, say3√546 (λµ)2/3, for the

nonzero singular values of the global minimizers of the S1/2 regularization problem, but alsoa global necessary optimality condition in the form of a fixed point equation associated withthe matrix half thresholding operator Hλµ(·). In one hand, it is analogous to the fixed pointcondition of the nuclear norm regularization solution associated with the so-called singularvalue shrinkage operator (see, e.g., [4,29]). On the other hand, the half thresholding operatorassociated here is more complicated than the singular value shrinkage operator due to ournonconvex, nonsmooth and non-Lipschitz minimization problem.

Definition 3.10 We call X∗ a global stationary point of problem (1.7) if there exists 0 <µ ≤ ∥A∥−2 such that X∗ satisfies the fixed point equation (3.14).

4 Fixed point iteration and its convergence

According to the fixed point equation (3.14), a fixed point iterative formula for the S1/2

regularization problem (1.7) can be naturally proposed as follows: given X0,

Xk+1 = Hλµ(Xk + µA∗(b−A(X)k)). (4.1)

To simplify the process of iterations and for the aim to find low rank solutions, we make aslightly adjustment of hλµ in (4.1) as follows

hλµ(t) :=

hλµ,1/2(t), if t >

3√544 (λµ)2/3

0, otherwise.(4.2)

The adjustment here is to choose hλµ(t) = 0 when t =3√544 (λµ)2/3.

Next, let us analyze the convergence of the above fixed point iteration.

9

Theorem 4.1 Given λ > 0, choose 0 < µ < ∥A∥−2. Let Xk be the sequence generated byiteration (4.1). Then

(i) Cλ(Xk) is strictly monotonically decreasing and converges to Cλ(X∗) where X∗ is

any accumulation point of Xk.(ii) Xk is asymptotically regular, that is, lim

k→∞∥Xk+1 −Xk∥F = 0.

(iii) Any accumulation point of Xk is a global stationary point of problem (1.7).

Proof. (i) Let Cλ(X), Cλ,µ(X,Z) and Bµ(Z) be defined by (3.8)-(3.10), and Bµ(Z)admit the SVD as Bµ(Z) = UDiag(σ)V T where U ∈ Rm×r, V ∈ Rn×r and σ ∈ Rr

++.From Lemma 3.7, we have

Cλ,µ(Hλµ(Bµ(Z)), Z) = minX

Cλ,µ(X,Z),

and therefore,

Cλ,µ(Xk+1, Xk) = minX

Cλ,µ(X,Xk), (4.3)

where Xk+1 = Hλµ(Bµ(Xk)) = UkDiag(Hλµ(σk))VTk and UkDiag(σk)V

Tk is the SVD of

Bµ(Xk). Since 0 < µ < ∥A∥−2, we have

∥A(X)k+1 −A(X)k∥22 −1

µ∥Xk+1 −Xk∥2F < 0.

Hence,

Cλ(Xk+1) =1

µ

(Cλ,µ(Xk+1, Xk)− ∥Xk+1 −Xk∥2F

)+ ∥A(X)k+1 −A(X)k∥22

≤ 1

µ

(Cλ,µ(Xk, Xk)− ∥Xk+1 −Xk∥2F

)+ ∥A(X)k+1 −A(X)k∥22

=1

µCλ,µ(Xk, Xk) +

(∥A(X)k+1 −A(X)k∥22 −

1

µ∥Xk+1 −Xk∥2F

)<

1

µCλ,µ(Xk, Xk) = Cλ(Xk),

which shows that Cλ(Xk) is strictly monotonically decreasing. Since Cλ(Xk) is boundedfrom below, Cλ(Xk) converges to a constant C∗. From Xk ⊂ X : Cλ(X) ≤ Cλ(X0)which is bounded, it follows that Xk is bounded, and therefore Xk has at least oneaccumulation point. Let X∗ be an accumulation point of Xk. By the continuity of Cλ(X)and the convergence of Cλ(Xk), we get Cλ(Xk) → C∗ = Cλ(X

∗) as k → +∞.(ii) Since 0 < µ < ∥A∥−2, we have 0 < δ := 1− µ∥A∥2 < 1 and

∥Xk+1 −Xk∥2F ≤ 1

δ

(∥Xk+1 −Xk∥2F − µ∥A(X)k+1 −A(X)k∥22

).

From (3.8), (3.9) and (4.3), we derive

µ[Cλ(Xk)− Cλ(Xk+1)] = Cλ,µ(Xk, Xk)− µCλ(Xk+1)

≥ Cλ,µ(Xk+1, Xk)− µCλ(Xk+1)

= ∥Xk+1 −Xk∥2F − µ∥A(X)k+1 −A(X)k∥22.

10

The above two inequalities yield that, for any positive integer K,

K∑k=0

∥Xk+1 −Xk∥2F ≤ 1

δ

K∑k=0

(∥Xk+1 −Xk∥2F − µ∥A(X)k+1 −A(X)k∥22

)≤ µ

δ

K∑k=0

(Cλ(Xk)− Cλ(Xk+1))

=µ

δ(Cλ(X0)− Cλ(XK+1))

≤ µ

δCλ(X0).

Hence,∞∑k=0

∥Xk+1 − Xk∥2F < +∞, and so ∥Xk+1 − Xk∥F → 0 as k → +∞. Thus, Xk is

asymptotically regular.(iii) Let Xkj be a convergent subsequence of Xk and X∗ be its limit point, i.e.,

Xkj → X∗, as kj → +∞. (4.4)

From the above limitation, we derive

Bµ(Xkj ) = Xkj + µA∗(b−A(Xkj )) → X∗ + µA∗(b−A(X∗)) = Bµ(X∗), as kj → +∞,

i.e.,

UkjDiag(σkj )VTkj

→ U∗Diag(σ∗)V ∗T , as kj → +∞, (4.5)

where Bµ(Xkj ) = UkjDiag(σkj )VTkj

and Bµ(X∗) = U∗Diag(σ∗)V ∗T are the SVDs of Bµ(Xkj )

and Bµ(X∗) respectively. According to (4.5) and [22, Corollary 7.3.8], we have

[σkj ]i → [σ∗]i for each i = 1, · · · r, as kj → +∞, (4.6)

where r is the rank of Bµ(X∗). By the selection principle (see, e.g., [22, Lemma 2.1.8]), we

can suppose that

Ukj → U , Diag(σkj ) → Diag(σ∗), Vkj → V , as kj → +∞, (4.7)

for some U ∈ Rm×r and V ∈ Rn×r both with orthonormal columns. From (4.7), we getUkjDiag(σkj )V

Tkj

→ UDiag(σ∗)V T . This together with (4.5) implies

UDiag(σ∗)V T = U∗Diag(σ∗)V ∗T = Bµ(X∗). (4.8)

The limitation (4.4) and the asymptotical regularity of Xk imply

∥Xkj+1 −X∗∥F ≤ ∥Xkj+1 −Xkj∥F + ∥Xkj −X∗∥F → 0, as kj → +∞,

which verifies that Xkj+1 also converges to X∗. Note that Xkj+1 = UkjDiag(Hλµ(σkj ))VTkj,

which together with Xkj+1 → X∗ yields

UkjDiag(Hλµ(σkj ))VTkj

→ X∗, as kj → +∞. (4.9)

If there holds

hλµ([σkj ]i) → hλµ([σ∗]i) for each i = 1, 2, · · · r, as kj → +∞, (4.10)

11

then from (4.7), (4.10) and (4.8), we get

UkjDiag(Hλµ(σkj ))VTkj

→ UDiag(Hλµ(σ∗))V T = Hλµ(Bµ(X

∗))

as kj → +∞,

where the last equality is due to the well-definedness 1 of Hλµ(·). The above limitation aswell as (4.9) gives X∗ = Hλµ(Bµ(X

∗)), that is, X∗ is a global stationary point of problem(1.7).

The rest thing is to prove (4.10) to be true.

For i = 1, · · · , r, if [σ∗]i <3√544 (λµ)2/3, then by (4.6),

[σkj ]i <3√54

4(λµ)2/3 when kj is sufficiently large.

This inequality as well as the definition of hλµ in (4.2) gives

hλµ([σkj ]i) = 0 → hλµ([σ∗]i) = 0, as kj → +∞.

If [σ∗]i >3√544 (λµ)2/3, then by (4.6),

[σkj ]i >3√54

4(λµ)2/3 when kj is sufficiently large.

Note that although hλµ(·) defined by (4.2) is not continuous on [0,+∞), it is continuous in(3√544 (λµ)2/3,+∞

). So it follows from [σkj ]i → [σ∗]i that

hλµ([σkj ]i) → hλµ([σ∗]i), as kj → +∞.

If [σ∗]i =3√544 (λµ)2/3, since [σkj ]i → [σ∗]i, there are two possible cases:

Case 1: There is a subsequence of [σkj ]i, say [σkjm ]i, converging to [σ∗]i such that[σkjm ]i ≤ [σ∗]i for each kjm . In this case, we have

hλµ([σkjm ]i) = 0 → hλµ([σ∗]i) = 0, as kjm → +∞.

Case 2: There is a subsequence of [σkj ]i, say [σkjn ]i, converging to [σ∗]i such that

[σkjn ]i > [σ∗]i =3√544 (λµ)2/3 for each kjn . However, we will verify this case can never happen

as long as µ is chosen appropriately.If Case 2 happens,there is a large integer N1 such that

[σkjn ]i ∈( 3√54

4(λµ)2/3,

3√54

3(λµ)2/3

)holds for any kjn ≥ N1. By (ii), ∥Xkjn+1 −Xkjn

∥F → 0 as kjn → +∞. Then there is a largeinteger N2 ≥ N1 such that

[σkjn+1]i ∈( 3√54

4(λµ)2/3,

3√54

3(λµ)2/3

)(4.11)

1The matrix half thresholding operator Hλµ : Rm×n → Rm×n here is in fact a non-symmetric Lowner’soperator [38] associated with the half thresholding function hλµ : R → R. The non-symmetric Lowner’soperator Hλµ : Rm×n → Rm×n is called well-defined if it is independent of the choice of the matrices U and Vin the SVD. In other words, if Y ∈ Rm×n has two different SVDs such as Y = U1Diag(σ)V T

1 = U2Diag(σ)V T2 ,

we have Hλµ(Y ) = U1Diag(hλµ(σ1), · · · , hλµ(σm))V T1 = U2Diag(hλµ(σ1), · · · , hλµ(σm))V T

2 . Theorem 1 ofLecture III in [38] proves that a non-symmetric Lowner’s operator H : Rm×n → Rm×n associated with ascalar valued function h : R+ → R+ is well-defined if and only if h(0) = 0. By this theorem, our matrix halfthresholding operator Hλµ is well-defined since hλµ(0) = 0.

12

holds for any kjn ≥ N2.On the other hand, since Bµ(Xkjn

) = Xkjn+ µA∗(b − A(Xkjn

)) is continuous in µ andBµ(Xkjn

) → Xkjnif µ → 0, we know that if µ is chosen sufficiently small, [σ(Bµ(Xkjn

))]i willbe closed to [σkjn ]i. Let µ be chosen such that

[σ(Bµ(Xkjn))]i ∈

( 3√54

4(λµ)2/3,

3√54

3(λµ)2/3

)holds for any kjn ≥ N2. According to (3.2)-(3.4), by direct computation, we know

limt↓

3√544

(λµ)2/3ϕλµ(t) =

π

4and lim

t↓3√544

(λµ)2/3hλµ(t) =

3√54

6(λµ)2/3.

Note that [σkjn+1]i = hλµ([σ(Bµ(Xkjn))]i) and hλµ(·) is increasing in (

3√544 (λµ)2/3,+∞)

(Lemma 3.2), then there is a large integer N3 ≥ N2 such that

[σkjn+1]i = hλµ([σ(Bµ(Xkjn ))]i) ∈( 3√54

6(λµ)2/3,

3√54

4(λµ)2/3

)(4.12)

holds for any kjn ≥ N3. One can find that (4.12) is in contradiction with (4.11). Thiscontradiction shows that Case 2 will never happen as long as µ is chosen appropriately.

Therefore, we have shown (4.10) is true. The proof is thus complete.

5 Setting of parameters and fixed point contiuation

In this section, we discuss the problem of parameter selection in our algorithm. As we allknow, the quality of solutions to optimization problems depends seriously on the setting ofregularization parameter λ. But the selection of proper parameters is a very hard prob-lem. There is no optimal rule in general. Nevertheless, when some prior information (e.g.,low rank) is known for a problem, it is realistic to set the regularization parameter morereasonably.

5.1 Location of the optimal regularization parameter

We begin with finding the location of the optimal λ∗, which then serves as the basis of theparameter setting strategy used in the algorithm to be proposed. Specifically, suppose thata problem can be formulated as an S1/2 regularization form (1.7), whose solutions are thematrices of rank r. Thus, we are required to solve the S1/2 regularization problem restrictedto the subregion

∑r = X ∈ Rm×n| rank(X) = r. For any µ, denote by Bµ(X) = X +

µA∗(b−A(X)). Assume X∗ is a solution to the S1/2 regularization problem and σ(Bµ(X∗))

is arranged in nonincreasing order. By Theorem 3.9 (particularly (3.16)) and (4.2), we have

[σ(Bµ(X∗))]i >

3√54

4(λ∗µ)2/3 ⇔ [σ(X∗)]i >

3√54

6(λ∗µ)2/3 ⇔ i ∈ 1, 2, · · · , r

and

[σ(Bµ(X∗))]i ≤

3√54

4(λ∗µ)2/3 ⇔ [σ(X∗)]i = 0 ⇔ i ∈ r + 1, r + 2, · · · , n,

13

which implies

√96

9µ([σ(Bµ(X

∗))]r+1)3/2 ≤ λ∗ <

√96

9µ([σ(Bµ(X

∗))]r)3/2 .

The above estimation provides an exact location of where the optimal parameter should be.We can then take

λ∗ =

√96

9µ

((1− α) ([σ(Bµ(X

∗))]r+1)3/2 + α ([σ(Bµ(X

∗))]r)3/2)

with any α ∈ [0, 1). Especially, a most reliable choice of λ∗ is

λ∗ =

√96

9µ([σ(Bµ(X

∗))]r+1)3/2 . (5.1)

Of course, it may not be the best choice since we should note that the larger λ∗, the larger

threshold value3√544 (λ∗µ)2/3, and the lower rank of the solution resulted by the thresholding

algorithm.We also note that formula (5.1) is valid for any fixed µ. We will use it with a fixed µ0

satisfying 0 < µ0 < ∥A∥−2 below. In applications, we may use Xk instead of the real solutionX∗ and the rank of Xk instead of r + 1, that is, we can take

λk+1 =

√96

9µ0([σ(Xk)]rk)

3/2 , (5.2)

where rk is the rank of Xk. More often, we can also take

λk+1 = max

λ, min

ηλk,

√96

9µ0([σ(Xk)]rk)

3/2

, (5.3)

where λ is a sufficiently small but positive real number, and η ∈ (0, 1) is a constant and rkis the rank of Xk. In this case, λk can keep monotonically decreasing. In next subsection,one will see that (5.3) may result an acceleration of the iteration.

5.2 Interpretation as a method of fixed point continuation

In this subsection, we recast (5.3) as a continuation technique (i.e., homotopy approach)which accelerates the convergence of the fixed point iteration. In [21], Hale et al. de-scribe a continuation technique to accelerate the convergence of the fixed point iterationfor the ℓ1 regularization problem. Inspired by this work, Ma et al. [29] provide a sim-ilar continuation technique to accelerate the convergence of the fixed point iteration forthe nuclear norm regularization problem. As shown in [21, 29], this continuation techniqueimproves considerably the convergence speed of fixed point iterations. The main idea intheir continuation technique, explained in our context, is to choose a decreasing sequenceλk : λ1 > λ2 > · · · > λL = λ > 0, then in the kth iteration, use λ = λk. Therefore,formula (5.3) coincides with this continuation technique. Generally speaking, our algorithmcan be regarded as a fixed point continuation algorithm, but is implemented to a nonsmooth,nonconvex and non-Lipschitz optimization problem.

Thus, a fixed point iterative algorithm based on the half norm of matrices for problem(1.7) can be specified as follows.

14

Algorithm 5.2. Half Norm Fixed Point algorithm (HFP algorithm)Given the linear operator A : Rm×n → Rp and the vector b ∈ Rp;Set the parameters µ0 > 0, λ > 0 and η ∈ (0, 1).- Initialize: Choose the initial values X0, λ1 with λ1 ≫ λ, set X = X0 and λ = λ1.- for k = 1 : maxiter, do λ = λk,

-while NOT converged, do• Compute B = X + µ0A∗(b−A(X)), and its SVD, say B = UDiag(σ)V T

• Compute X = UDiag(Hλµ0(σ))VT

- end while, and output: Xk, σk, rk =rank(Xk);

- set λk+1 = maxλ, min

ηλk,

√96

9µ0([σ(Xk)]rk)

3/2

;

- if λk+1 = λ, return;- end for

In Algorithm 5.2, the positive integer maxiter is large enough that the convergence inouter loop can be ensured.

5.3 Stopping criteria for inner loops

Note that in the half norm fixed point algorithm, in the kth inner loop we solve problem(1.7) for a fixed λ = λk. We must determine when to stop this inner iteration and go tothe next inner iteration. Since when Xk gets close to an optimal solution X∗, the distancebetween Xk and Xk+1 should become very small. Hence, we can use the following criterion

∥Xk+1 −Xk∥Fmax1, ∥Xk∥F

< xtol, (5.4)

where xtol is a small positive number.Besides the above stopping criterion, we use Im to control the maximum number of the

inner loops. i.e., if the stopping rule (5.4) is not satisfied after Im iterations, we terminatethe subproblem and update λ to start the next subproblem.

6 HFPA algorithm: HFP algorithm with an approximate SVD

In Algorithm 5.2, computing singular value decompositions is the main computational cost.Inspired by the works of Cai et al. [4] and Ma et al. [29], instead of computing the full SVDof the matrix B in each iteration, we implement a variant of HFP algorithm in which wecompute only a rank-r approximation to B, where r is an estimator of the rank of the opti-mal solution. We call this half norm fixed point algorithm with an approximate SVD HFPAalgorithm. This approach greatly reduces the computational effort required by the algo-rithm. Specifically, we compute an approximate SVD by a fast Monte Carlo algorithm: theLinear Time SVD algorithm developed by Drineas et al. [16]. For a given matrix A ∈ Rm×n,and parameters cs, ks ∈ Z+ with 1 ≤ ks ≤ cs ≤ n, and pini=1, pi ≥ 0,

∑ni=1 pi = 1, this

algorithm returns an approximation to the largest ks singular values and corresponding leftsingular vectors of the matrix A in linear O(m + n) time. The Linear Time ApproximateSVD Algorithm is outlined below.

15

Linear Time Approximate SVD Algorithm [16,29]- Input: A ∈ Rm×n,cs, ks ∈ Z+ s.t. 1 ≤ ks ≤ cs ≤ n, pini=1 s.t. pi ≥ 0,

∑ni=1 pi = 1.

- Output: Hks ∈ Rm×ks and σt(C), t = 1, 2, · · · , ks.- For t = 1 to cs,

• Pick it ∈ 1, 2, · · · , n with Probit = α = pα, α = 1, 2, · · · , n.• Set C(t) = A(it)/

√ctpit .

- Compute CTC and its SVD, say CTC =∑cs

t=1 σ2t (C)ytyt

T.

- Compute ht = Cyt/σt(C) for t = 1, 2, · · · , ks.- Return Hks , where H

(t)ks

= ht, and σt(C), t = 1, 2, · · · , ks.

The outputs σt(C) (t = 1, 2, · · · , ks) are approximations to the largest ks singular values

and H(t)ks

(t = 1, 2, · · · , k) are approximations to the corresponding left singular vectors ofthe matrix A. Thus, the SVD of A is approximated by

A ≈ Aks := HksDiag(σ(C))(ATHksDiag(1/σ(C)T ).

Drineas et al. [16] prove that with high probability, Aks is an approximation to the bestrank-ks approximation to A.

In our numerical experiments, same as in [29], we set cs = 2rm − 2, where rm =[(m+ n−

√(m+ n)2 − 4p

)/2]is, for a given number of entries sampled, the largest rank

of m× n matrices for which the matrix completion problem has a unique solution. We referro [29] for how to set ks. We also set all pi equal to 1/n. For more details for the choicesof the parameters in the Linear Time Approximate SVD Algorithm, please see [16, 29]. TheLinear Time Approximate SVD Code we will use is written by Shiqian Ma and is availableat http://www.columbia.edu/∼sm2756/FPCA.htm.

7 Numerical experiments

In this section, we report some numerical results on a series of matrix completion problemsof the form (1.2) to demonstrate the performance of the HFPA algorithm. The purpose ofnumerical experiments is to assess the effectiveness, accuracy, robustness and convergence ofthe algorithm. The effectiveness is measured by how few measurements required to exactlyrecover a low-rank matrix. The fewer the measurements used by an algorithm, the betterthe algorithm. Under the same measurements, the shorter time used by an algorithm andthe higher accuracy achieved by it, the better the algorithm. We will also test the robustnessof the algorithm with respect to the varying dimensions, the varying ranks and the varyingsampling ratios. To compare performance of finding low-rank matrix solutions, some othercompetitive algorithms such as the singular value thresholding algorithm (SVT2) [4], the fixedpoint contiuation algorithm based on an approximate SVD using the iterative Lanczos algo-rithm (FPC3) [29], the fixed point contiuation algorithm based on a linear time approximateSVD (FPCA4) [29] have been also demonstrated together with our HFPA algorithm. Notethat the former three algorithms are all based on the nuclear norm minimization, while these

2The SVT code is available at http://svt.caltech.edu, which is written by Emmanuel Candes, October2008, and last modified by Farshad Harirchi and Stephen Becker, April 2011.

3The FPC code is available at http://svt.caltech.edu, which is coded by Stephen Becker, March 2009. Hereferred to [29].

4The FPCA code is available at http://www.columbia.edu/∼sm2756/FPCA.htm, which is coded and mod-ified by Shiqian Ma, July 2008 and April 2009, respectively.

16

four algorithms all depend on the approximate SVD. We also note that some manifold-basedalgorithms without SVD, such as GenRTR [1], RTRMC [3], OptSpace [24] and LMaFit [42],have good performances. Because of space constraints, we will not compare to them.

All computational experiments were performed in MATLAB R2009a on a Dell desktopcomputer with an Intel(R) Core (TM) i3-2120 3.29GHZ CPU and 3.23GB of RAM.

In our simulations, we will use the same way as used in relevant researches (for instance,[4, 6, 29]) to generate m × n matrices of rank r. The procedure is that: we first generaterandom matrices ML ∈ Rm×r and MR ∈ Rn×r with i.i.d. Gaussian entries, and then setM = MLM

TR . We then sampled a subset Ω of p entries uniformly at random. Thus, the

entries of M on Ω are observed data and M is the real unknown matrix. For each problemwith m × n matrix M , measurement number p and rank r, we solved a fixed number ofrandomly created matrix completion problems. We use SR:=p/(mn), i.e., the number ofmeasurements divided by the number of entries of the matrix, to denote the sampling ratio.Recall that an m × n matrix of rank r depends upon df:=r(m + n − r) degrees of freedom.Then OS:=p/df is the oversampling ratio, i.e., the ratio between the number of sampledentries and the ‘true dimensionality’ of an m × n matrix of rank r. Note that if OS< 1,then there is always an infinite number of matrices with rank r with the given entries, sowe cannot hope to recover the matrix in this situation. We also note that when OS≥ 1, thecloser to 1 the OS is, the more difficult to recover the matrix. For this reason, following [29],we call a matrix completion problem a ‘easy’ problem if OS and SR in this problem are suchthat OS×SR>0.5 and OS > 2.6, equivalently a ‘hard’ problem if OS×SR≤0.5 or OS≤ 2.6.In the tables, FR := 1/OS = df/p is an often used quantity in literatures. We use ‘rank’ todenote the average rank of matrices that are recovered by an algorithm. We use ‘time’ and‘iter’ to denote the average time (seconds) and the average number of iterations respectivelythat an algorithm takes to reach convergence.

We use three relative errors: the relative error on Ω, the relative recovery error in theFrobinus norm and the relative recovery error in the spectral norm

rel.err(Ω) :=∥M(Ω)−Xopt(Ω)∥F

∥M(Ω)∥F, rel.err.F :=

∥M −Xopt∥F∥M∥F

, rel.err.s :=∥M −Xopt∥2

∥M∥2to evaluate the closeness of Xopt to M , where Xopt is the ‘optimal’ solution to (1.2) obtainedby an algorithm.

The parameters and initial values in HFPA algorithm for matrix completion problemsare listed in Table 1.

Table 1: Parameters and initial values in HFPA algorithm

λ = 1e− 4, η = 1/4, λ1 = min3,mn/p∥A∗(b)∥2, X0 = A∗(b), maxiter = 10, 000,if hard problem & max(m,n) < 1000 µ0 = 1.7; Im = 200;else

if SR < 0.5 & min(m,n) ≥ 1000 µ0 = 1.98; else µ0 = 1.5; end; Im = 10;end

7.1 Results for randomly created noiseless matrix completion problems

Our first experiment is to compare the recovering ability of HFPA to SVT, FPC and FPCA forsmall and easy matrix completion problems. Here a ’small’ matrix means that the dimension

17

of the matrix is less than 200, Specifically in the first experiment, we take m = n = 100,OS= 3, FR=0.33 and let the real rank increased from 6 to 16 per 1 increase. The tolerance inthe four algorithms is set to be 10−4. For each scale of these problems, we solve 10 randomlycreated matrix completion problems. The computational results for this experiment aredisplayed in Table 2.

From Table 2, the first observation is that only HFPA recovers all the real ranks. Whenr < 11, the recovered ranks by SVT are larger than the real ranks; the recovered ranks byFPC are also larger than the real ones when r < 10; the same thing happens to FPCA asthe real rank equal to 16. The second observation is that HFPA runs fastest among the fouralgorithms for most of the problems. As the real ranks change from 6 to 16, the time costby HFPA is almost no change. Although when r ≤ 6, FPCA is slightly faster than HFPA inseveral percent seconds, HFPA is much faster than FPCA when r ≥ 12. Obviously, HFPAruns faster than SVT and FPC. At last, let us make a comparison among the accuraciesachieved by the four algorithms. We can observe that HFPA achieves the most accuratesolutions for most of the problems; even when r ≥ 12, at least one of the three relative errorsby HFPA achieves 10−6; meanwhile the accuracies of SVT and FPC are not very good whenr ≤ 7, and FPCA begins to yield very inaccurate solutions when r ≥ 13. We can drawa conclusion that for the small and easy matrix completion problems, HFPA is very fast,effective and robust.

Our second experiment is to compare the recovering abilities of HFPA to SVT, FPCand FPCA for small but hard matrix completion problems. These problems are ‘hard’ andchallenging to recover because the oversampling ratio OS=2 is very close to 1, which impliesthat the observed data are very limited with respect to the freedom degree of the unknownmatrices. In this experiment, we take m = n = 100, OS=2, FR=0.50 and let r increasedfrom 2 to 24 per 2 increases. For this set of problems, SR ranges from 7.9% to 84.5%. Thetolerance in this experiment is set to be 10−6. For each scale of these problems, we alsosolve 10 randomly created matrix completion problems. The results are displayed in Table 3.From Table 3, we find that SVT and FPC cannot work well in the sense that the recoveredranks by them are far more than the real ranks and the accuracies of their solutions are pooruntil the real rank increases to 20. It is clear that FPCA and HFPA both work very well.We can observe that as r increases from 2 to 24, the time cost by HFPA and FPCA are bothincreasing but in slow speed. As we can see, HFPA shares the accuracy as good as or slightlybetter than FPCA, however the former is obviously faster than the later.

Now we begin to test our algorithm for large randomly created matrix completion prob-lems. We only run 5 times for each large scale problems. The numerical results of HFPAfor easy and large matrix completion problems are presented in Table 4. For easy problems,since SVT performs in general better than FPC, we omit the results of FPC in Table 4 forthe sake of limited spaces. For example, when m = n = 1000, r = 10, OS=6, SR=0.119 andFR=0.17, FPC costs more than 350 seconds to recover the matrix and SVT only costs about8 seconds while they achieve the similar accuracy.

From Table 4, we can see that for a 3000× 3000 unknown matrix of rank 200 with 38.7%sampling ratio, HFPA can well recover it in only 12 minutes, while SVT needs half an hourand FPCA fails to work. We also find that for a fixed scale unknown matrix, the decrease ofsampling ratio has little influence on the computational time of HFPA, however the increaseof sampling ratio can remarkably improve its accuracy. We can conclude that for these easyproblems some of which have a very low rank and some of which have a low but not very lowrank, HFPA is always powerful enough to recover them.

For hard problems, without any exception SVT and FPC either diverge or cannot termi-

18

Table 2: Comparison of SVT, FPC, FPCA and HFPA for randomly created small and easymatrix completion problems (m = n = 100, r = 6 : 1 : 16, OS=3, FR=0.33, xtol=10−4)

r SR Solver rank iter time rel.err(Ω) rel.err.F rel.err.sSVT 11 1000 29.19 4.43e-2 3.51e-2 5.51e-2

6 0.349 FPC 10 742 14.39 3.67e-4 1.13e-2 2.11e-2HFPA 6 96 0.14 1.58e-4 3.44e-4 5.43e-4FPCA 6 84 0.12 9.56e-5 1.91e-4 2.49e-4SVT 11 1000 29.21 3.02e-2 3.49e-2 7.32e-2










16 0.883 FPC 16 85 1.72 1.13e-4 1.21e-4 8.76e-5HFPA 16 51 0.13 1.76e-6 1.72e-6 1.65e-6FPCA 17 600 1.10 6.70e-1 6.30e-1 6.89e-1

19

Table 3: Comparison of SVT, FPC, FPCA and HFPA for randomly created small but hardmatrix completion problems (m = n = 100, r = 2 : 2 : 24, OS=2, FR=0.50, xtol=10−6).

r SR Solver rank iter time rel.err(Ω) rel.err.F rel.err.sSVT Divergent - - -












24 0.845 FPC 24 234 5.67 1.22e-4 1.52e-4 1.28e-4HFPA 24 1611 3.17 4.37e-7 4.31e-7 5.66e-7FPCA 24 5011 9.50 1.93e-7 2.96e-7 4.14e-7

20

Table 4: Comparison of SVT, FPCA and HFPA for randomly created large and easy matrixcompletion problems (xtol=10−4).

Problems SVT HFPA FPCAm=n r OS SR FR time rel.err.F time rel.err.F time rel.err.F200 20 3 0.570 0.33 9.29 1.80e-4 0.43 2.98e-5 0.64 3.19e-5

4 0.760 0.25 2.33 1.37e-4 0.43 1.24e-6 4.73 3.36e-3500 50 3 0.570 0.33 29.20 1.73e-4 3.67 1.76e-5 6.50 2.56e-5

4 0.760 0.25 11.25 1.38e-4 3.45 7.73e-7 44.00 6.00e-4800 80 3 0.570 0.33 78.78 1.70e-4 15.48 1.21e-5 28.99 2.84e-5

4 0.760 0.25 37.70 1.34e-4 15.03 2.25e-7 162.05 2.57e-31000 10 6 0.119 0.17 8.29 1.65e-4 4.96 4.05e-4 6.66 4.36e-4

50 4 0.390 0.25 51.33 1.61e-5 14.56 1.34e-5 24.86 3.14e-5100 3 0.570 0.33 129.58 1.67e-4 21.99 1.04e-5 42.38 2.29e-5

2000 10 6 0.060 0.17 17.70 1.69e-4 26.87 9.01e-4 35.51 9.14e-4100 4 0.390 0.25 231.85 1.60e-4 195.95 1.06e-5 356.37 2.36e-5200 3 0.570 0.33 909.87 1.69e-4 257.87 1.11e-5 1413.06 4.34e-5

3000 50 5 0.165 0.20 167.38 1.54e-4 94.99 2.45e-4 126.15 2.54e-4100 4 0.262 0.25 368.42 1.66e-4 283.15 8.67e-5 420.30 9.86e-5200 3 0.387 0.33 1837.82 1.85e-4 717.59 3.88e-5 Out of memory!

Table 5: Comparison of FPCA and HFPA for randomly created large and hard matrixcompletion problems (xtol=10−4).

Problems HFPA FPCAm=n r OS SR FR time rel.err(Ω) rel.err.F time rel.err(Ω) rel.err.F200 10 2 0.195 0.50 2.03 1.43e-4 3.60e-4 7.11 2.44e-4 6.91e-4

20 1.3 0.247 0.77 3.39 1.24e-4 5.75e-4 10.62 2.04e-4 1.07e-3500 25 2 0.195 0.50 15.48 1.40e-4 3.07e-4 60.16 2.44e-4 5.97e-4

50 1.2 0.228 0.83 38.24 1.30e-4 6.66e-4 98.68 2.21e-4 1.53e-3800 40 2 0.195 0.50 43.85 1.10e-4 2.32e-4 185.64 2.25e-4 5.36e-4

80 1.2 0.228 0.83 123.46 1.29e-4 6.77e-4 336.79 2.21e-4 1.53e-31000 20 2 0.079 0.50 21.00 2.95e-4 8.19e-4 22.79 2.80e-4 7.67e-4

50 2 0.195 0.50 18.01 1.64e-4 3.99e-4 20.05 1.64e-4 3.92e-4100 1.5 0.285 0.67 49.41 1.12e-4 3.09e-4 54.00 1.11e-4 2.98e-4

2000 20 3 0.060 0.33 52.15 4.21e-4 8.94e-4 60.78 4.18e-4 8.74e-450 2 0.099 0.50 89.02 2.58e-4 6.59e-4 100.45 2.52e-4 6.47e-4100 2 0.195 0.50 109.22 1.69e-4 3.99e-4 127.21 1.67e-4 3.94e-4200 2 0.380 0.50 348.40 7.58e-5 1.50e-4 414.71 7.38e-5 1.41e-4

3000 50 2 0.066 0.50 277.01 3.16e-4 8.28e-4 290.93 3.27e-4 8.52e-4100 2 0.131 0.50 282.62 2.14e-4 5.27e-4 304.70 2.10e-4 5.17e-4200 2 0.258 0.50 659.74 1.39e-4 3.12e-4 747.07 1.33e-4 2.96e-4300 2 0.320 0.50 1247.87 8.15e-5 1.60e-4 1420.86 6.90e-5 1.32e-4

21

nate in one hour, or yield very inaccurate solutions. For example, when m = n = 200, r = 10and SR=0.195 which is the simplest case, SVT costs more than 300 seconds to recover amatrix of rank 43 with the relative error in Frobinus norm of 10−1, while FPC recovers amatrix of rank 69 with relative error in Frobinus norm of 10−1. Another simple example isthat when m = n = 200, r = 20 and SR=0.380, SVT costs more than 700 seconds to recovera matrix of rank 87 with relative error in Frobinus norm of 10−1, while FPC recovers a matrixof rank 96 with relative error in Frobinus norm of 10−2. Therefore, in this case, only FPCAis comparable to HFPA. The results are displayed in Table 5. From Table 5, we can see thatHFPA still has a powerful recovering ability for hard and large matrix completion problems.

7.2 Results for matrix completion from noisy sampled entries

In this subsection, we simply demonstrate the results of HFPA for matrix completion prob-lems from noisy sampled entries. Suppose we observe data from the following model

Bij = Mij + Zij , (i, j) ∈ Ω, (7.1)

where Z is a zero-mean Gaussian white noise with standard deviation σ. The results ofHFPA together with SVT and FPCA are displayed in Table 6. The quantities are averagesof 5 runs. The tolerance is set to be 10−4.

From Table 6, we see that for the noisy sampled data HFPA performs as well as or slightlybetter than FPCA, while it is obviously more powerful than SVT.

Table 6: Numerical results for SVT, HFPA and FPCA on random matrix completion prob-lems with noisy

noise Problems SVT HFPA FPCAlevel σ m=n r OS SR time rel.err.F time rel.err.F time rel.err.F10−2 1000 10 6 0.119 74.38 3.15e-3 33.44 1.71e-3 34.88 1.73e-3

50 4 0.390 448.84 1.65e-3 206.44 1.02e-3 208.34 1.02e-3100 3 0.570 930.83 1.26e-3 294.20 8.29e-4 292.99 1.09e-3150 2 0.555 - - 291.02 9.04e-4 298.56 1.30e-3

10−1 1000 10 6 0.119 - - 33.22 1.71e-2 34.49 1.73e-250 4 0.390 1770.99 *1.66e-2 203.97 1.00e-2 204.89 1.01e-2100 3 0.570 2392.36 **1.23e-2 289.36 8.38e-3 288.25 1.10e-2150 2 0.555 - - 308.96 5.09e-3 309.24 7.36e-3

* The recovered rank by SVT is 125.** The recovered rank by SVT is 167.- The SVT algorithm can not terminate in one hour.

7.3 Results for real problems

In this subsection, we apply HFPA to image inpainting problems in order to test its ef-fectiveness in real data matrices. It is well known that grayscale images and color imagescan be expressed by matrices and tensors, respectively. In grayscale image inpainting, thegrayscale value of some of the pixels of the image are missing, and we want to fill in thesemissing values. If the image is of low-rank, or of numerical low-rank, we can solve the imageinpainting problem as a matrix completion problem (1.2) (see, e.g., [29]). Here, Figure 2(a)is a 600 × 903 grayscale image of rank 600. We applied SVD to Figure 2(a) and truncated

22

(a)100 200 300 400 500 600 700 800 900

(b)100 200 300 400 500 600 700 800 900

(c)100 200 300 400 500 600 700 800 900

(d)100 200 300 400 500 600 700 800 900

(e)100 200 300 400 500 600 700 800 900

(f)100 200 300 400 500 600 700 800 900

(g)100 200 300 400 500 600 700 800 900

(h)100 200 300 400 500 600 700 800 900

Figure 2: (a):Original 600 × 903 image with full rank; (b): Image of rank 80 truncated from (a);(c): 50% randomly masked from (a); (d): Recovered image from (c) (rel.err.F = 8.30e-2); (e): 50%randomly masked from (b); (f): Recovered image from (e) (rel.err.F = 6.56e-2); (g): Deterministicallymasked from (b); (h): Recovered image from (g) (rel.err.F= 6.97e-2).

23

this decomposition to get the rank-80 image which is shown in Figure 2(b). Figure 2(c) is amasked version of the image in Figure 2(a), where half of the pixels in Figure 2(a) have beenlost at random. Figure 2(d) is an image obtained from Figure 2(c) by applying HFPA. Figure2(d) is of rank 54 and it is a low-rank approximation to Figure 2(a) with a relative error of8.30e-2. Figure 2(e) is a masked version of the image in Figure 2(b), where half of the pixelsin Figure 2(b) have been masked uniformly at random. Figure 2(f) is the image obtainedfrom Figure 2(e) by applying HFPA. Figure 2(f) is of rank 46 and it is an approximation toFigure 2(b) with a relative error of 6.56e-2. Figure 2(g) is another masked image obtainedfrom Figure 2(b), where 10 percent of the pixels have been masked in a non-random fashion.Figure 2(h) is the image obtained from Figure 2(g) by applying HFPA. Figure 2(h) is of rank52 and it is an approximation to Figure 2(b) with a relative error of 6.97e-2.

8 Conclusion

In this paper, we proposed using the S1/2 regularization method, a nonconvex, nonsmooth,and non-Lipschitz optimization problem, to solve the affine rank minimization problem. Wefirst gave a globally necessary optimality condition for the S1/2 regularization problem, whichwas characterized as a matrix-valued fixed point equation associated with the singular valuehalf thresholding operator. Then the fixed point iterative method for the S1/2 regularizationproblem was naturally proposed, and the convergence analysis was established. By using auseful parameter setting strategy together with an approximate singular value decompositionprocedure, we get a very efficient algorithm (HFPA) for affine rank minimization problems.Numerical results on matrix completion problems showed that the proposed HFPA algorithmis very fast, efficient and robust.

Acknowledgements

The authors are grateful to Prof. Shiqian Ma at The Chinese University of Hong Kong forsharing the FPCA code. The authors also thank Prof. Wotao Yin at Rice University for hisvaluable suggestion in the convergence analysis.

References

[1] P.-A. Absil, C. Baker, and K. Gallivan, Trust-region methods on Riemannian manifolds,Found. Comput. Math., 7(2007), pp. 303-330.

[2] W. Bian, X. Chen, Worst-case complexity of smoothing quadratic regularization meth-ods for non-Lipschitzian optimization, SIAM J. Optim., 23 (2013), pp. 1718-1741.

[3] N. Boumal, and P. Absil, RTRMC: A riemannian trust-region method for low-rankmatrix completion, In NIPS, 2011.

[4] J. Cai, E. Candes, and Z. Shen, A singular value thresholding algorithm for matrixcompletion, SIAM J. Optim., 20 (2010), pp. 1956-1982.

[5] E. Candes, and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98(2010), pp. 925-936.

[6] E. Candes, and B. Recht, Exact matrix completion via convex optimization, Found.Comput. Math., 9 (2009), pp. 717-772.

24

[7] E. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal recon-struction from highly incomplete frequency information, IEEE Trans. Inform. Theory,52 (2006), 489-509.

[8] E. Candes, and T. Tao, The power of convex relaxation: Near-optimal matrix comple-tion, IEEE Trans. Inform. Theory, 56 (2010), pp. 2053-2080.

[9] R. Chartrand, Exact reconstructions of sparse signals via nonconvex minimization, IEEESignal Process. Lett., 14 (2007), pp. 707-710.

[10] R. Chartrand, Nonconvex regularization for shape preservation, IEEE Inter. Confer.Image Process., I (2007), pp. 293-296.

[11] R. Chartrand, Fast algorithms for nonconvex compressive sensing: MRI reconstructionfrom very few data, in Proc. IEEE Int. Symp. Biomed. Imag., 2009, pp. 262-265.

[12] R. Chartrand, and V. Staneva, Restricted isometry properties and nonconvex compres-sive sensing, Inverse Problems, 24 (2008), pp. 20-35.

[13] X. Chen, D. Ge, Z. Wang, Y. Ye, Complexity of unconstrained ℓ2-ℓp minimization, Math.Program., to appear.

[14] X. Chen, F. Xu, and Y. Ye, Lower bound theory of nonzero entries in solutions of ℓ2−ℓpminimization, SIAM J. Sci. Comput., 32 (2010), pp. 2832-2852.

[15] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52 (2006), pp. 1289-1306.

[16] P. Drineas, R. Kannan, and M. W. Mahoney, Fast Monte Carlo algorithms for matricesii: computing low-rank approximations to a matrix, SIAM J. Comput., 36 (2006), pp.158-183.

[17] B. Efron, T. Hastie, and I. M. Johnstone et al, Least angle regression, Annals of Statis-tics, 32 (2004), pp. 407-499.

[18] M. Fazel, Matrix rank minimization with applications, PhD thesis, Stanford University,2002.

[19] M. Fazel, H. Hindi, and S. Boyd, A rank minimization heuristic with application tominimum order system approximation, in Proc. Amer. Control Confer., 2001.

[20] M. Fazel, H. Hindi, and S. Boyd, Log-det heuristic for matrix rank minimization withapplications to Hankel and Euclidean distance matrices, in Proc. Amer. Control Confer.,2003.

[21] E. T. Hale, W. Yin, and Y. Zhang, A fixed-point continuation method for ℓ1-regularizedminimization: methodology and convergence, SIAM J. Optim., 19 (2008), pp. 1107-1130.

[22] R. Horn, and C. Johnson, Matrix Analysis, Cambridge University Press, New York,1990.

[23] S. Ji, K.-F. Sze, and Z. Zhou et al, Beyond convex relaxation: a polynomial-time non-convex optimization approach to network localization, avaliable athttp://www.stanford.edu/ yyye/lp snl local.

[24] R. Keshavan, A. Montanari, and S. Oh, Matrix completion from a few entries, IEEETrans. Inform. Theory, 56 (2010), pp. 2980-2998.

[25] M.-J. Lai, Y. Xu, and W. Yin, Improved iteratively rewighted least squares for uncon-strained smoothed ℓp minimization, Rice CAAM technical report 11-12, 2012.

[26] Z. Lin, M. Chen, and Y. Ma, The augmented Lagrange multiplier method for exactrecovery of corrupted low-rank matrices, arxiv.org/abs/1009.5055v2, 2011.

[27] Y. Liu, D. Sun, and K.-C. Toh, An implementable proximal point algorithmic frameworkfor nuclear norm minimization, Math. Program. Ser A, 133 (2012), pp. 399-436.

25

[28] Z. Liu, and L. Vandenberghe, Interior-point method for nuclear norm approximationwith application to system identification, SIAM J. Matrix Anal. Appl., 31 (2009), pp.1235-1256.

[29] S. Ma, D. Goldfarb, and L. Chen, Fixed point and Bregman iterative methods for matrixrank minimization, Math. Program. Ser A, 128 (2011), pp. 321-353.

[30] K. Mohan, and M. Fazel, Iterative reweighted algorithms for matrix rank minimization,J Machine Learn. Res., 13 (2012), pp. 3253-3285.

[31] F. Nie, H. Huang, and C. Ding, Low-rank matrix recovery via efficient Schatten p-norm minimization, Proceedings of the Twenty-Sixth AAAI Conference on ArtificialIntelligence, 2012, pp. 655-661.

[32] A. Rakotomamonjy, R. Flamary, and G. Gasso et al, ℓp-ℓq penalty for sparse linear andsparse multiple kernel multitask learning, IEEE Trans. Neural Netw., 22 (2011), pp.1307-1320.

[33] G. Rao, Y. Peng, and Z. Xu, Robust sparse and low-rank matrix fraction based on theS1/2 modeling, Science China-Infor. Sci., to appear.

[34] B. Recht, M. Fazel, and P. Parrilo, Guaranteed minimum rank solutions of matrixequations via nuclear norm minimization, SIAM Review, 52 (2010), pp. 471-501.

[35] R. T. Rockafellar, Monotone Operators and the proximal point algorithm, SIAM J.Control Optim., 14 (1976), pp. 877-898.

[36] A. Rohde, and A. Tsybakov, Estimation of high-dimensional low-rank matrices, Annalsof Statistics, 39 (2011), pp. 887-930.

[37] R. Skelton, T. Iwasaki, and K. Grigoriadis, A unified algebraic approach to linear controldesign, Taylor and Francis, 1998.

[38] D. Sun, Matrix Conic Programming, Dalian University of Science and Technology,Dalian, 2011.

[39] M. Tao, and X. Yuan, Recovering low-rank and sparse components of matrices fromincomlete and noisy observations, SIAM J. Optim., 21 (2011), pp. 57-81.

[40] K.-C. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regu-larized least squares problems, Pacific J. Optim., 6 (2010), 615-640.

[41] R. Tutuncu, K. Toh, and M. Todd, Solving semidefinite-quadratic-linear programs usingSDPT3, Math. Program. Ser. B, 95 (2003), pp. 189-217.

[42] Z. Wen, W. Yin, and Y. Zhang, Solving a low-rank factorization model for matrixcompletion by a nonlinear successive over-relaxation algorithm, Math. Program. Comp.,4 (2012), pp. 333-361.

[43] Z. Xu, Data modeling: Visual psychology approach and ℓ1/2 regularization theory, inProc. Inter. Congr. Math., 4 (2010), pp. 3151-3184.

[44] Z. Xu, X. Chang, and F. Xu et al, ℓ1/2 regularization: a thresholding representationtheory and a fast solver, IEEE Trans. Neural Netw. Learn. Sys., 23 (2012), pp. 1013-1027.

[45] Z. Xu, H. Guo, and Y. Wang et al, Representation of ℓ1/2 regularizer among ℓq (0 < q ≤1) regularizer: an experimental study based on phase diagram, Acta Autom. Sinica, 38(2012), pp. 1225-1228.

26

AAppendix: Proof of Lemma 3.6

The proof here follows [33] but is more complete than there.Let the SVD of Y ∈ Rm×n be Y = UDiag(σ)V T . Denote by

σ = (σ1, · · · , σr)T , U = [u1, · · · , ur], V = [v1, · · · , vr], (A.1)

where ui ∈ Rm, vi ∈ Rn (i = 1, · · · , r) are the orthonormal columns of U and V respectively,and σ1 ≥ · · · ≥ σr > 0.

Let X admit the SVD as X = U ′Diag(σ′)V ′T =∑n

i=1 σ′iu

′iv

′Ti , where σ′ = (σ′

1, · · · , σ′m),

U ′ = [u′1, · · · , u′m], V = [v′1, · · · , v′m] with u′i ∈ Rm, v′i ∈ Rn and σ′1 ≥ · · · ≥ σ′

m ≥ 0. Notethat the SVD of X here may not be in the most reduced form. Denote by ti = u′Ti Y v′i foreach i = 1, · · · ,m. Note that

∥X − Y ∥2F + λ∥X∥1/21/2 = ∥U ′Diag(σ′)V ′T − Y ∥2F + λ∥U ′Diag(σ′)V ′T ∥1/21/2

= ∥Diag(σ′)− U ′TY V ′∥2F + λ∥Diag(σ′)∥1/21/2

= ∥Diag(σ′)− U ′TY V ′∥2F + λ∥Diag(σ′)∥1/21/2

=

m∑i=1

[(σ′i − u′Ti Y v′i)

2 + λσ′1/2i ]

=

m∑i=1

[(σ′i − ti)

2 + λσ′1/2i ]. (A.2)

Let

Q(U ′, V ′) = min

n∑

i=1

[(σ′i − ti)

2 + λσ′1/2i ] | σ′

1 ≥ 0, · · · , σ′m ≥ 0

,

then problem (3.6) is equivalent to

minU ′,V ′

Q(U ′, V ′)

s.t. U ′TU ′ = Im, V ′TV ′ = Im.(A.3)

Let f(σ′i) = (σ′

i − ti)2 + λσ

′1/2i , then

Q(U ′, V ′) = min

m∑i=1

f(σ′i)∣∣ σ′

1 ≥ 0, · · · , σ′m ≥ 0

. (A.4)

Fixing U ′, V ′, note that∑m

i=1 f(σ′i) is separable as to (σ′

1, · · · , σ′m). Hence, solving problem

(A.4) is equivalent to solving the following m problems, for each i = 1, · · · ,m,

minσ′i≥0

f(σ′i) = (σ′

i − ti)2 + λσ

′1/2i . (A.5)

By Lemma 3.1, for each i = 1, · · · ,m, the global minimizer of (A.5) is σ∗i = hλ(ti). Thus

f(σ∗i ) = [hλ(ti)− ti]

2 + λh1/2λ (ti).

27

From (3.2) and (A.5), we know that f(σ∗i ) = 0 as ti ≤

3√544 λ2/3, and that f(σ∗

i ) < 0 and

σ∗i > 0 as ti >

3√544 λ2/3. When ti >

3√544 λ2/3, the first order optimal condition of (A.5) yields

that

2hλ(ti)− 2ti +1

2λh

−1/2λ (ti) = 0. (A.6)

By direct computation and using (A.6), we get

df(σ∗i )

dti= 2hλ(ti)h

′λ(ti)− 2hλ(ti)− 2tih

′λ(ti) +

1

2λh

−1/2λ (ti)h

′λ(ti)

= h′λ(ti)

(2hλ(ti)− 2ti +

1

2λh

−1/2λ (ti)

)− 2hλ(ti)

= −2hλ(ti) = −2σ∗i < 0.

This implies that f(σ∗i ) = h2λ(ti)−2tihλ(ti)+λh

1/2λ (ti) is a monotonically decreasing function

in the variable ti = u′Ti Y v′i. Note that Q(U ′, V ′) =∑m

i=1 f(σ∗i ). Therefore, solving problem

(A.3) is equivalent to solving the following m problems, for each i = 1, · · · ,m,

maxu′i,v

′i

ti = u′Ti Y v′i =r∑

i=1σiu

′Ti uiv

Ti v

′i

s.t. ∥u′i∥2 = 1, ∥v′i∥2 = 1,u′i ⊥ u′1, u′2, · · · , u′i−1,v′i ⊥ v′1, v′2, · · · , v′i−1.

(A.7)

Note that σ1 ≥ · · · ≥ σr > 0 and m ≥ r. Solving the m problems one by one from i = 1 toi = m, we can find that for each i = 1, · · · , r, the maximizer of (A.7) is u∗i = ui, v

∗i = vi and

the optimal value of the objective function is t∗i = uTi Y vi = σi, where ui, vi, σi all belong to Y(see (A.1)); and that for each i = r + 1, · · · ,m, since u∗i⊥u1, · · · , ur and v∗i⊥v1, · · · , vr,the optimal value of the objective function is t∗i = 0, and then hλ(t

∗i ) = 0. Thus, we have

Xs =m∑i=1

σ∗i u

∗i v

∗iT =

m∑i=1

hλ(t∗i )u

∗i v

∗iT

=r∑

i=1

hλ(σi)uivTi +

m∑i=r+1

0 (u∗i v∗iT )

= UDiag(Hλ(σ))VT = Hλ(Y ).

where Hλ(·) is defined as Definition 3.5. The proof is thus complete.

28

S Regularization Methods and Fixed Point Algorithms for ... · The matrix completion problem (1.2)...

Documents

Transcript of S Regularization Methods and Fixed Point Algorithms for ... · The matrix completion problem (1.2)...