
Lower Bounds on the Mean-Squared Error of Low-Rank Matrix Reconstruction

    Gongguo Tang, Member, IEEE, and Arye Nehorai, Fellow, IEEE

Abstract: We investigate the behavior of the mean-squared error (MSE) of low-rank matrix reconstruction and its special case, matrix completion. We first derive the constrained Cramér-Rao bound (CRB) on the MSE matrix of any locally unbiased estimator, and then analyze the behavior of the constrained CRB when a subset of entries of the underlying matrix is randomly observed. We design an alternating minimization procedure to compute the maximum likelihood estimator (MLE) for the low-rank matrix, and demonstrate through numerical simulations that the performance of the MLE approaches the constrained CRB when the signal-to-noise ratio is high. Applying a Chapman-Robbins type Barankin bound allows us to derive lower bounds on the worst-case scalar MSE. We demonstrate that the worst-case scalar MSE is infinite even if the model is identifiable. However, the infinite scalar MSE is achieved only on a set of low-rank matrices with measure zero. We discuss the implications of these bounds and compare them with the empirical performance of the matrix LASSO estimator and the existing bounds in the literature.

Index Terms: Barankin bound, Chapman-Robbins bound, constrained Cramér-Rao bound, low-rank matrix reconstruction, matrix completion, maximum likelihood estimator, mean-squared error.

    I. INTRODUCTION

RECONSTRUCTION of a low-rank matrix from noisy linear measurements, especially from a subset of its entries corrupted by noise, appears in many signal processing branches, such as factor analysis, linear system realization [1], [2], matrix completion [3], [4], quantum state tomography [5], face recognition [6], [7], and Euclidean embedding [8], to name a few (see [9]-[11] for discussions and references therein). Suppose X is a p x q matrix with rank r much smaller than min(p, q); then the goal of low-rank matrix reconstruction is to determine X from the linear measurements

y = A(X) + w,    (1)

where y is the measurement vector, A is the sensing operator, and w is the noise vector. In particular, when the operator A observes a subset of entries of the matrix X, the resulting problem is called matrix completion.
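To fix ideas for the matrix completion case, the following minimal sketch instantiates the measurement model (1) with an entry-sampling operator and additive white Gaussian noise. All variable names and parameter values (p, q, r, sigma, rho) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of the matrix completion measurement model (1): observe a random
# subset of entries of a rank-r matrix, each corrupted by additive noise.
rng = np.random.default_rng(0)
p, q, r = 20, 15, 2
sigma, rho = 0.1, 0.3                      # noise level, fraction of observed entries

X = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))   # rank-r ground truth
observed = rng.random((p, q)) < rho                              # entry-sampling operator
y = X[observed] + sigma * rng.standard_normal(observed.sum())    # noisy measurements
print("matrix entries:", p * q, " observed measurements:", y.size)
```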

Manuscript received October 19, 2010; revised March 28, 2011; accepted June 17, 2011. Date of publication July 12, 2011; date of current version September 14, 2011. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean Pierre Delmas. This work was supported by the Department of Defense under the Air Force Office of Scientific Research MURI Grant FA9550-05-1-0443, ONR Grant N000140810849, and NSF Grants CCF-1014908 and CCF-0963742.

The authors are with the Preston M. Green Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TSP.2011.2161471

Since the size of the measurement vector y is usually less than the size of the matrix X, the measurement model (1) is underdetermined.

In this paper, we investigate the behavior of the MSE in estimating X under the unbiasedness condition. For a fixed matrix X, we derive a constrained Cramér-Rao bound (CRB) on the MSE matrix that applies to any locally unbiased estimator. The bound depends on the sensing operator A, the row and column spaces of the underlying matrix X, and the noise level of w. We approximate the typical behavior of the constrained CRB using a concentration of measure argument. We design an alternating algorithm to compute the maximum likelihood estimator (MLE) of the low-rank matrix. Numerical simulations show that when the signal-to-noise ratio (SNR) is relatively high, the scalar MSE of the MLE, which is equal to the trace of the MSE matrix, approaches the trace of the constrained CRB. The constrained CRB is helpful for system design, as it provides insight into which properties of the sensing operator are important for low-rank matrix recovery. Under a globally unbiased condition, we show that the worst-case scalar MSE is infinite for any estimator. Actually, this infinite MSE is achieved by any matrix that is not strictly of rank r.

We review approaches to solve for X from its measurements. In a noiseless setting, solutions to the model (1) are not unique. A natural strategy to obtain the true X is to find the solution with lowest rank that is consistent with the measurement, i.e.,

min rank(X) subject to A(X) = y,    (2)

where rank(X) denotes the rank of a matrix. Unfortunately, the optimization problem (2) is NP-hard. A variety of computationally affordable methods, which work well for the noisy case, have been proposed to estimate X by exploiting its low-rankness. We are particularly interested in the behavior of the matrix LASSO estimator, which solves the following regularized nuclear norm minimization problem:

min (1/2) ||y - A(X)||_2^2 + mu ||X||_*,    (3)

where ||X||_* is the nuclear norm of the matrix. We use the fixed point continuation with approximate SVD (FPCA) algorithm [12] to efficiently solve (3). For the matrix completion problem, we also design an alternating minimization procedure to compute the MLE, assuming knowledge of the rank r of X. More specifically, we write X = L R^T for a p x r matrix L and a q x r matrix R, and alternately minimize

||y - A(L R^T)||_2^2    (4)

with respect to L and R while fixing the other. We compare the performance of the FPCA, the MLE, and the derived constrained Cramér-Rao bound. Numerical simulations show that when the SNR is high, the biased matrix LASSO estimator is suboptimal, and the constrained CRB is achieved by the MLE.

Universal lower bounds on the MSE matrix (or error covariance matrix) of any unbiased estimator, most notably the CRB, have long been used as a benchmark for system performance in the signal processing field [13]. However, the application of the CRB requires a regular parameter space (an open set in Euclidean space, for example), which is not satisfied by the low-rank matrix reconstruction problem. The parameter space for low-rank matrices cannot even be represented as the zero set of continuously differentiable equality and inequality constraints, a common form of constraints in the theory of the constrained CRB [14]-[16]. In [17]-[19], the constrained CRB is applied to study unbiased estimators for sparse vectors. In this paper, we analyze the low-rank matrix reconstruction problem by employing the Chapman-Robbins form of the Barankin bound [20]-[22], as well as a multiparameter Cramér-Rao type lower bound with the parameter space constrained to a non-open subset of Euclidean space [14]-[16]. The most significant challenge presented by applying the Chapman-Robbins type Barankin bound is to optimize the lower bound over all possible test points. We address the challenge by establishing a technical lemma on the behavior of a matrix function. To address the issue of representing the constraints in applying the constrained CRB, we directly derive the constrained Cramér-Rao lower bound from the Chapman-Robbins bound with additional regularity conditions [14].

The paper is organized as follows. In Section II, we introduce model assumptions for (1), the Chapman-Robbins type Barankin bound, and the constrained CRB. In Section III, we derive the constrained CRB on the MSE matrix for any locally unbiased estimator. Section IV shows that, by applying the Barankin bound, the worst-case scalar MSE for a globally unbiased low-rank matrix estimator is infinite. In Section V, the constrained CRB is compared with the empirical performance of the matrix LASSO estimator and the MLE. Section VI is a concluding summary.

II. MODEL ASSUMPTIONS, THE CHAPMAN-ROBBINS BOUND, AND THE CONSTRAINED CRAMÉR-RAO BOUND

In this section, we introduce model assumptions, and review the Chapman-Robbins type Barankin bound and the constrained CRB. Suppose we have a low-rank matrix X in the parameter space

X_r = {X : X is p x q with rank(X) <= r}.    (5)

For any matrix X, we use vec(X) to denote the vector obtained by stacking the columns of X into a single column vector. Similarly, for a vector x of length pq, we use mat(x) to denote the operation of reshaping x into a p x q matrix such that vec(mat(x)) = x. Without introducing any ambiguity, we identify X with vec(X). We observe X through the linear measurement mechanism

y = A(X) + w,    (6)

where the noise vector w is Gaussian with zero mean and covariance matrix C. It is convenient to rewrite (6) in the following matrix-vector form:

y = A vec(X) + w,    (7)

where A also denotes the m x pq matrix corresponding to the operator A, namely, A vec(X) = A(X) for all X. Therefore, the measurement vector y follows a Gaussian distribution with probability density function (pdf)

p(y; X) = (2 pi)^(-m/2) det(C)^(-1/2) exp( -(1/2) (y - A vec(X))^T C^{-1} (y - A vec(X)) ).    (8)

Our goal is to derive lower bounds on the MSE matrix for any unbiased estimator that infers the deterministic parameter X from y. We consider two types of unbiasedness requirements: global unbiasedness and local unbiasedness. Global unbiasedness requires that an estimator is unbiased at any parameter point in the low-rank parameter space; that is,

    (9)

Here, the expectation is taken with respect to the noise. The local unbiasedness condition only imposes the unbiasedness constraint on parameters in the neighborhood of a single point. More precisely, we require

    (10)

where the test points range over low-rank matrices in a small neighborhood of the given point. Refer to [17] for more discussion on the implications of unbiasedness conditions in the similar sparse estimation scenario.
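For concreteness, the two requirements in (9) and (10) can be rendered in standard notation as follows. This is a plausible reading rather than a quotation of the paper, assuming the parameter space X_r = {X : rank(X) <= r}, the estimator X_hat(y), and a Frobenius-ball neighborhood of radius epsilon:

```latex
% Global unbiasedness: unbiased at every low-rank parameter point.
\mathbb{E}_{w}\bigl[\widehat{X}(y)\bigr] = X
  \quad \text{for all } X \in \mathcal{X}_r .

% Local unbiasedness at a fixed X: unbiased only in a small neighborhood of X.
\mathbb{E}_{w}\bigl[\widehat{X}(y)\bigr] = X'
  \quad \text{for all } X' \in \mathcal{X}_r
  \ \text{with}\ \|X' - X\|_F \le \epsilon .
```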

It is useful to distinguish two kinds of stability results for low-rank matrix reconstruction. For the first kind, we seek a condition on the operator A (e.g., the matrix restricted isometry property) such that we can reconstruct all low-rank matrices stably from the measurement y. For the second kind, we study the stability of reconstructing a specific X and find the set of sensing operators that work well for this particular X. We usually need to identify good low-rank matrices that can be reconstructed stably by all or most of the sensing operators. We see that, for the first kind of problem, it is suitable to consider the worst-case MSE among all matrices with rank less than a specified value under global unbiasedness, while for the second kind it is more appropriate to focus on locally unbiased estimators and a fixed low-rank matrix X. Therefore, for the first kind of problem, we will apply the Chapman-Robbins type Barankin bound in a worst-case framework, and for the second kind we will apply the constrained CRB.

We now present the Chapman-Robbins version of the Barankin bound [14], [20], [21] on the MSE matrix (or error covariance matrix), defined as follows:

    (11)

for any unbiased estimator. For any integer number of test points and arbitrary test-point vectors that are not equal to the true parameter vector, we define the finite differences of the parameters and of the corresponding likelihood ratios as

(12)


If the estimator is unbiased at the true parameter and at the test points, the Chapman-Robbins bound states that the MSE matrix satisfies the matrix inequality

    (13)

where the covariance-type matrix is formed from the likelihood-ratio differences and the dagger denotes the pseudo-inverse. The matrix inequality is understood in the sense that the difference between the two sides is positive semidefinite. Taking the trace of both sides of (13) yields a bound on the scalar MSE

    (14)

While the MSE matrix is a more accurate and complete measure of system performance, the scalar MSE is sometimes more amenable to analysis. We will use the Chapman-Robbins bound to derive a lower bound on the worst-case scalar MSE as follows:

    (15)

We include a proof of (15) in Appendix A.

The worst-case scalar MSE has been used by many researchers as an estimation performance criterion [23]-[26]. In many cases, it is desirable to minimize the scalar MSE to obtain a good estimator. However, the scalar MSE depends explicitly on the unknown parameters when the parameters are deterministic, and hence cannot be optimized directly [26]. To circumvent this difficulty, the authors of [23]-[26] resort to a minimax framework to find estimators that minimize the worst-case scalar MSE. When an analytical expression of the scalar MSE is not available, it makes sense for system design to identify a lower bound on the worst-case MSE [27] and minimize it [28].

A constrained Cramér-Rao bound for a locally convex parameter space under the local unbiasedness condition is obtained by taking the test points arbitrarily close to the true parameter. Suppose that the true parameter and the test points are contained in the constraint set for a sufficiently small neighborhood. Then, under certain regularity conditions, the MSE matrix for any locally unbiased estimator satisfies [14, Lemma 2]

    (16)

where the matrix in (16) is any matrix whose column space equals the span of the feasible (constraint-preserving) directions at the true parameter, and where the Fisher information matrix is given by

    (17)

Note that the positive definiteness assumed in Lemma 2 of [14] can be relaxed.
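As a concrete illustration of (16)-(17), the sketch below evaluates the standard constrained-CRB expression U (U^T J U)^+ U^T of [14]-[16], which (16) appears to instantiate. It is a minimal sketch assuming white Gaussian noise with variance sigma2 (so that the Fisher information matrix is J = A^T A / sigma2) and a matrix U whose columns span the feasible directions; the function and variable names are illustrative, not the paper's.

```python
import numpy as np

def constrained_crb(A, U, sigma2):
    """Constrained CRB matrix U (U^T J U)^+ U^T for y = A x + w, w ~ N(0, sigma2 I).

    A      : (m, n) measurement matrix.
    U      : (n, d) matrix whose columns span the feasible directions.
    sigma2 : white-noise variance.
    Returns an (n, n) lower bound on the MSE matrix of a locally unbiased estimator.
    """
    J = A.T @ A / sigma2                   # Fisher information matrix (white Gaussian noise)
    M = U.T @ J @ U                        # information restricted to the feasible subspace
    return U @ np.linalg.pinv(M) @ U.T     # pseudo-inverse handles rank deficiency

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, d = 50, 20, 5
    A = rng.standard_normal((m, n))
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))   # orthonormal feasible directions
    B = constrained_crb(A, U, sigma2=0.01)
    print("scalar CRB (trace):", np.trace(B))
```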

III. THE CONSTRAINED CRAMÉR-RAO BOUND FOR ANY LOCALLY UNBIASED ESTIMATOR

In this section, we apply (16) to derive the constrained CRB on the MSE matrix for any locally unbiased estimator. We are particularly interested in the matrix completion problem, for which we study the typical behavior of the derived constrained CRB in a probabilistic framework. We also propose an alternating minimization algorithm which computes the MLE for matrix completion.

A. Constrained Cramér-Rao Lower Bound

For any X in the parameter space, in order to employ (16) and let the test points lie in the set of rank-r matrices, we need to carefully select the direction vectors. Suppose that X = U S V^T is the singular value decomposition of X, with U and V having orthonormal columns and S containing the positive singular values. If we define directions of the form

(18)

then the matrix X plus any multiple of such a direction still has rank at most r. If additionally the perturbing vectors are linearly independent when viewed as vectors, then the resulting directions are also linearly independent. To see this, note that multiplying both sides of

(19)

by an appropriate orthogonal factor yields the claim. Therefore, we can find at most qr linearly independent directions in this manner. Similarly, if we instead perturb along the column space of X, we find another pr linearly independent directions. However, the union of the two sets of directions is linearly dependent. As a matter of fact, we have only r(p + q - r) linearly independent directions, as explicitly constructed in the proof of Theorem 1.

We first need the following lemma, whose proof is given in Appendix B.

Lemma 1: Suppose Q1 and Q2 are two full column rank matrices such that the column space of Q1 is contained in the column space of Q2. If the information matrix is positive definite, then the bound obtained from (16) using Q1 is dominated, in the positive semidefinite order, by the bound obtained using Q2.

We have the following theorem.

Theorem 1: Suppose X has the full-size singular value decomposition X = [U, U_perp] S [V, V_perp]^T, where U and V contain the first r left and right singular vectors and U_perp and V_perp complete them to orthonormal bases. The MSE matrix at X for any locally unbiased estimator satisfies

    (20)

    with

    (21)

as long as the inverse in (21) exists.

Proof: It is easy to compute that the Fisher information matrix for (6) is A^T C^{-1} A. Lemma 1 tells us that we should find linearly independent directions spanning as large a subspace as possible. If two sets of directions span the same subspace, then they are equivalent in maximizing the


lower bound. Therefore, without loss of generality, we take the following directions:

    (22)

We denote the index set of valid indices in the above directions accordingly. Collecting these directions as the columns of a matrix,

    we have

    (23)

which implies the claimed relation. Therefore, we obtain

    (24)

    The constrained CRB (16) then implies that

    (25)

with the matrix defined in (21).

An immediate corollary is the following lower bound on the scalar MSE.

Corollary 1: Under the notation and assumptions of Theorem 1, we have

    (26)

    Proof: Taking the trace of both sides of (20) yields

    (27)

which is (26).

The condition of Theorem 1 is satisfied if the sensing operator maps every nonzero matrix with rank at most 2r to a nonzero measurement. To see this, first note that

(28)

due to the definition of the direction matrix in the proof of Theorem 1. Thus, we only need to show

(29)

for any nonzero coefficient vector, which is actually a consequence of the fact that

(30)

is nonzero and has rank at most 2r. Here, e_i denotes the i-th canonical basis vector, i.e., the vector with the i-th component equal to one and the rest of the components equal to zero.

The more restrictive requirement that the sensing operator annihilate no nonzero matrix with rank at most 2r is of a global nature and is never satisfied by the matrix completion problem. To see this, suppose the operator selects the entries in a given index set; then any nonzero rank-one matrix supported on an unobserved entry is mapped to zero. However, the condition of Theorem 1 itself is met for matrix completion if the sampling operation selects linearly independent rows of the direction matrix constructed in the proof. This will happen with high probability for random selection operators unless the singular vectors of X are very spiky.
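The scalar bound of Corollary 1 can be evaluated numerically for a given sampling pattern by building an orthonormal basis of the tangent directions U M^T + N V^T used in the proof of Theorem 1 and applying the constrained-CRB expression. The sketch below does this under the assumptions of white noise with variance sigma2 and the standard form sigma2 * tr((Q^T A^T A Q)^{-1}) for an orthonormal Q; variable names are illustrative, not the paper's.

```python
import numpy as np

def crb_trace_lowrank(A, X, r, sigma2):
    """Scalar constrained CRB for estimating a rank-r matrix X from y = A vec(X) + w.

    A      : (m, p*q) measurement matrix acting on vec(X) (column-major stacking).
    X      : (p, q) underlying low-rank matrix.
    r      : assumed rank of X.
    sigma2 : white-noise variance.
    """
    p, q = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T
    # Columns of I_q (x) U and V (x) I_p span the tangent directions vec(U M^T + N V^T).
    D = np.hstack([np.kron(np.eye(q), U), np.kron(V, np.eye(p))])
    Qfull, s, _ = np.linalg.svd(D, full_matrices=False)
    d = int((s > 1e-10 * s[0]).sum())           # numerical rank; equals r*(p+q-r) for rank-r X
    Q = Qfull[:, :d]                            # orthonormal basis of the tangent space
    M = Q.T @ (A.T @ A) @ Q / sigma2            # restricted Fisher information
    # If M is singular, the condition of Theorem 1 fails and the bound is unbounded.
    return np.trace(np.linalg.inv(M))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p, q, r, sigma2 = 10, 8, 2, 0.01
    X = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
    mask = rng.random(p * q) < 0.7              # random entry sampling (matrix completion)
    A = np.eye(p * q)[mask]
    print("constrained CRB on scalar MSE:", crb_trace_lowrank(A, X, r, sigma2))
```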

We note that U and V in Theorem 1 are determined by the underlying matrix X, while the semi-orthogonal matrices U_perp and V_perp are arbitrary as long as they span the spaces orthogonal to the column spaces of U and V, respectively. Suppose we have another such choice. Since each new matrix spans the same space as the original one, we have

(31)

for suitable invertible matrices. Then, it is easy to see that

(32)

which together with Lemma 1 implies that the choice of U_perp and V_perp does not affect the bound (20).

We now present a simplified bound.

Corollary 2: Under the conditions of Theorem 1, we have the following simplified but slightly looser bound:

    (33)


In particular, for the matrix completion problem with white noise of variance sigma^2, the above bound further simplifies to

    (34)

Here, for each column j, the first index set collects the rows of the observed entries in that column; for each row i, the second index set collects the columns of the observed entries in that row; the corresponding submatrices of U and V keep only the rows indicated by these index sets.

Proof: Note that, in Theorem 1, we could also equivalently take the direction matrices to be

    (35)

    (36)

If we take each of these two direction sets separately, then according to Lemma 1 it is easy to get two lower bounds:

    (37)

    and

    (38)

When the noise is white and we observe a subset of entries of X, i.e., the matrix completion problem with white noise, each row of the observation matrix A has a single one and all other elements are zeros. If we exclude the possibility of repeatedly observing entries, then A^T A is a diagonal matrix with ones and zeros on the diagonal, where the ones correspond to the observed locations in X. Then algebraic manipulations of (37) and (38) yield the desired (34). Thus, the conclusion of the corollary holds.

The simplified bound given in (34) is not as tight as the one given in Theorem 1, as shown in Fig. 3. However, it is much easier to compute (34) when p and q are large.
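The per-column/per-row structure behind (34) can be sketched numerically: for matrix completion with white noise, A^T A is the diagonal observation mask, so restricting the Fisher information to the directions (35) and (36) decouples across columns and rows, respectively. The snippet below computes the two resulting lower bounds in the spirit of (37)-(38) and uses their maximum as a cheap surrogate; the exact algebraic combination used in (34) is not reproduced here, and the variable names are illustrative assumptions.

```python
import numpy as np

def completion_crb_relaxed(U, V, observed, sigma2):
    """Cheap lower bounds on the scalar MSE for matrix completion with white noise.

    U, V     : (p, r) and (q, r) singular-vector factors of the underlying matrix.
    observed : boolean (p, q) mask of observed entries (no repeats).
    sigma2   : noise variance.
    Returns (column_bound, row_bound).
    """
    p, r = U.shape
    q = V.shape[0]
    # Restricting the information matrix to directions vec(U M^T) yields a block-diagonal
    # matrix with blocks U_Omega_j^T U_Omega_j, one block per column j.
    col_bound = sigma2 * sum(
        np.trace(np.linalg.inv(U[observed[:, j]].T @ U[observed[:, j]]))
        for j in range(q)
    )
    # Similarly, the directions vec(N V^T) decouple across rows.
    row_bound = sigma2 * sum(
        np.trace(np.linalg.inv(V[observed[i, :]].T @ V[observed[i, :]]))
        for i in range(p)
    )
    return col_bound, row_bound

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    p, q, r, sigma2 = 30, 20, 2, 0.01
    X = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    observed = rng.random((p, q)) < 0.5
    cb, rb = completion_crb_relaxed(U[:, :r], Vt[:r, :].T, observed, sigma2)
    print("relaxed bounds (per-column, per-row):", cb, rb, "-> max:", max(cb, rb))
```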

B. Probability Analysis of the Constrained Cramér-Rao Bound

We analyze the behavior of the bound in Corollary 1 for the matrix completion problem with white noise. Suppose we randomly and uniformly observe m entries of the matrix X, the corresponding index set of which is denoted by Omega. Based on this measurement model, we rewrite

    (39)

Fig. 1. Normalized constrained Cramér-Rao bound and its approximation.

where the k-th term involves the k-th row of A. Note that

    (40)

Since the relevant matrix function is convex on positive semidefinite matrices [29, p. 283, Proposition 8.5.15, xviii], Jensen's inequality implies that

    (41)

Our result of Theorem 3, specifically (50), strengthens the above result and says that the bound actually concentrates around its nominal value with high probability. As a matter of fact, when p and q are relatively large, the bound is very close to this approximation, as illustrated in Fig. 1.

We need to establish a result stronger than (40), namely that the eigenvalues of the suitably scaled restricted information matrix concentrate around one with high probability, which implies that the lower bound concentrates around its approximation. For this purpose, we need to study the following quantity:

    (42)


We first present a lemma about the behavior of a Rademacher average [30], whose proof is given in Appendix C.

Lemma 2: Let the vectors under consideration have uniformly bounded entries. Then

    (43)

where the weights are independent symmetric plus/minus-one-valued random variables (the Rademacher sequence), and the constant depends on the uniform entry bound.

We establish the following theorem, whose proof is given in Appendix D.

    Theorem 2: Suppose

    (44)

for some constant. Then, if

    (45)

    we have

    (46)

The proof essentially follows [31]. Note that a standard symmetrization technique implies that the left-hand side of (46) is bounded by

(47)

which is a scaled version of the Rademacher average considered in Lemma 2.

The assumption (44) means that the singular vectors spread across all coordinates, i.e., they are not very spiky. Assumptions similar to this are used in many theoretical results for matrix completion [4], [32]. Recall that only U and V are determined by the underlying matrix X, while U_perp and V_perp are arbitrary aside from forming orthonormal bases for the spaces orthogonal to the column spaces of U and V, respectively. Furthermore, the constrained CRB does not depend on the choice of U_perp and V_perp. Hence, it is not very natural to impose conditions such as (44) on U_perp and V_perp. We conjecture that if the rank r is not extremely large, it is always possible to construct U_perp and V_perp such that the assumption (44) holds for them whenever it holds for U and V. If this conjecture can be shown, our assumption is the same as the weak incoherence property imposed in [4] and [32]. However, currently, we are not able to prove this conjecture. Note that our Theorem 2 and Theorem 3 are based on the assumption (44) and not on the conjecture we raise here.

Because the elements of the relevant direction vectors are products of entries of the singular vectors, we conclude from (44) that

    (48)

    or equivalently

    (49)

Hence, the uniform boundedness condition in Lemma 2 is satisfied.

We proceed to use a concentration inequality to show the following high-probability result.

    Theorem 3: Under the assumption (44), we have

    (50)

with probability greater than a value that approaches one, for some constant, as long as the number of observed entries satisfies (45).

Theorem 3 follows from the following concentration of measure result:

    (51)

whose proof follows [31] with minor modifications. Hence, we omit the proof in this paper. The implication of (51) is that the eigenvalues of the scaled restricted information matrix are between 1/2 and 3/2 with high probability.

We compare our approximate bound with existing results for matrix completion. Our result is an approximation of a universal lower bound; in some sense, it is more of a necessary condition. The results of [4], [32] are for particular algorithms, and they are sufficient conditions that guarantee the stability of those algorithms. Consider the following optimization problem:

    (52)

    (53)

For this problem, a typical result ([4], Equation III.3) states that, under suitable conditions, the solution obeys with high probability

    (54)

Our probability analysis says that any locally unbiased estimator approximately satisfies

    (55)

Since the number of measurements must be greater than the number of degrees of freedom of a rank-r matrix, the right-hand side of (55) is essentially on the order of the noise variance, which means the per-element error is proportional to the noise level. However, the existing bound in (54) is proportional to a larger, dimension-dependent factor. Considering that the MLE approaches the constrained CRB (see Fig. 3), we see that the bound (54) is not optimal.


TABLE I: MAXIMUM LIKELIHOOD ESTIMATOR ALGORITHM

    C. Maximum Likelihood Estimation for Matrix Completion

In this subsection, assuming knowledge of the matrix rank r, we present a simple alternating minimization algorithm to heuristically compute the maximum likelihood estimator of a low-rank matrix based on a few noise-corrupted entries. When the SNR is relatively high, the algorithm performs very well. Suppose X is of rank r; then we can write X as

X = L R^T, where L is a p x r matrix and R is a q x r matrix.    (56)

Assume that we observe a few entries of X that are corrupted by noise:

(57)

where l_i^T denotes the i-th row of L, r_j^T denotes the j-th row of R, the noise is independent identically distributed (i.i.d.) Gaussian with variance sigma^2, and Omega is the index set of all observed entries. The MLEs of L and R are obtained by minimizing

    (58)

We adopt an alternating minimization procedure. First, for fixed R, setting the derivative of the objective (58) with respect to each row of L to zero gives

    (59)

for each row index i. Similarly, when L is fixed, we get

    (60)

for each column index j. The initialization is taken from the SVD of the zero-filled matrix of observed entries: the initial L and R are formed from its leading r singular vectors and values, and we then alternate between (59) and (60). To increase stability, we perform a QR decomposition of L and use the obtained orthonormal factor as L. Hence, L is always a matrix with orthonormal columns. The overall algorithm is summarized in Table I.
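A compact sketch of the alternating minimization in Table I is given below. It is a minimal illustration, assuming the factorization X = L R^T, SVD initialization from the zero-filled observed matrix, QR re-orthonormalization of L at each sweep, and white noise; the variable names are illustrative rather than the paper's.

```python
import numpy as np

def mle_matrix_completion(Y_obs, observed, r, n_iters=50):
    """Alternating-minimization MLE for matrix completion (rank r assumed known).

    Y_obs    : (p, q) array with the observed noisy entries (zeros elsewhere).
    observed : boolean (p, q) mask of observed entries.
    r        : target rank.
    """
    p, q = Y_obs.shape
    # Initialize L and R from the SVD of the zero-filled observation matrix.
    U0, s0, V0t = np.linalg.svd(Y_obs, full_matrices=False)
    L = U0[:, :r]
    R = V0t[:r, :].T * s0[:r]

    for _ in range(n_iters):
        # Update each row of L by least squares over that row's observed entries (R fixed).
        for i in range(p):
            cols = observed[i, :]
            L[i] = np.linalg.lstsq(R[cols], Y_obs[i, cols], rcond=None)[0]
        # Re-orthonormalize L for numerical stability, as in Table I.
        L, _ = np.linalg.qr(L)
        # Update each row of R by least squares over that column's observed entries (L fixed).
        for j in range(q):
            rows = observed[:, j]
            R[j] = np.linalg.lstsq(L[rows], Y_obs[rows, j], rcond=None)[0]
    return L @ R.T

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    p, q, r, sigma = 40, 30, 2, 0.05
    X = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
    observed = rng.random((p, q)) < 0.5
    Y_obs = np.where(observed, X + sigma * rng.standard_normal((p, q)), 0.0)
    X_hat = mle_matrix_completion(Y_obs, observed, r)
    print("relative error:", np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```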

IV. APPLICATION OF THE BARANKIN BOUND FOR LOW-RANK MATRIX ESTIMATION

In this section, we apply the Chapman-Robbins type Barankin bound to the low-rank matrix reconstruction problem (6) with Gaussian noise.

A. Chapman-Robbins Type Barankin Bound With One Test Point

We first consider the Chapman-Robbins bound with a single test point. Supposing that we are given any globally unbiased estimator of X, we derive lower bounds on its worst-case scalar MSE. The intent of this subsection is mainly to demonstrate the application of the Chapman-Robbins bound to low-rank matrix estimation without going into the complicated matrix manipulations of the next subsection. According to (13), (14), and (15), we have

    (61)

Using the pdf of the Gaussian distribution, we calculate the integral in (61) as

    (62)

Since maximization with respect to the pair of rank-r matrices is equivalent to maximization with respect to their difference over matrices of rank at most 2r, the lower bound in (61) becomes

    (63)
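For reference, the Gaussian integral behind (62) can be written out as follows. This is a sketch under the white-noise assumption w ~ N(0, sigma^2 I), with Delta denoting the difference between the test point and the true matrix; the notation is assumed for illustration, not taken from the paper.

```latex
% Second moment of the likelihood ratio for y ~ N(A\,\mathrm{vec}(X), \sigma^2 I):
\int \frac{p(y; X + \Delta)^2}{p(y; X)}\, dy
  = \exp\!\left( \frac{\|A\,\mathrm{vec}(\Delta)\|_2^2}{\sigma^2} \right),
% so the single-test-point Chapman-Robbins denominator is
\int \frac{p(y; X + \Delta)^2}{p(y; X)}\, dy \;-\; 1
  = \exp\!\left( \frac{\|A\,\mathrm{vec}(\Delta)\|_2^2}{\sigma^2} \right) - 1 .
```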

In order to perform the maximization in (63), we establish the following lemma, whose proof is fairly easy (see also the proof of Lemma 5 in Appendix F).

Lemma 3: For fixed parameters, define a scalar function as

    (64)


Then, we have the following:
1) the function is decreasing in its argument, and is strictly decreasing if the measurement of the corresponding test direction is nonzero;
2) its limit as the test direction is scaled toward zero is the corresponding classical Cramér-Rao expression.

Using these facts, Lemma 3 allows us to rewrite the maximization in (63) as

    (65)

Note that when the supremum is achievable, we switch to the maximization notation.

Therefore, we have the following proposition.

Proposition 1: The worst-case scalar MSE for any globally unbiased estimator satisfies

    (66)

The result of Proposition 1 merits discussion. The quantity appearing in the bound is closely related to the matrix restricted isometry property (RIP) constant [33]:

    (67)

which is believed to guarantee stable low-rank matrix recovery. From Proposition 1 or from (63), it is easy to see that the worst-case MSE is infinite if the sensing operator maps some nonzero matrix of rank at most 2r to zero. This is because the model (6) is not identifiable under this condition. The requirement that no such matrix be annihilated guarantees that there are no two matrices of rank r that give rise to the same measurement in the noiseless setting; i.e., the measurements of two distinct rank-r matrices never coincide. This condition was discussed following Theorem 1; recall that it is not satisfied by the matrix completion problem.

    B. The Worst-Case Scalar MSE Is Infinite

In this subsection, by optimizing the lower bound in (13) over multiple test points, we demonstrate that the worst-case scalar MSE is infinite even if the model is identifiable. Denote the test-point differences accordingly. Then, we compute the general element of the associated matrix as

Fig. 2. The limiting behavior of the matrix function of Lemma 5 and related error functions as the scaling of the test points approaches zero.

    (68)

where for the second equality we used the Gaussian form of the pdf. Note that (68) coincides with (62) in the single-test-point case. Therefore, we obtain a closed-form expression for the matrix in (13), in which the elementwise exponential function of a matrix and the column vector with all ones appear.

Although the lower bound (13) uses a pseudoinverse for generality, the following lemma shows that the relevant matrix is always invertible. The proof, which is given in Appendix E, relies on the fact that Gaussian pdfs with a common covariance matrix but distinct means are linearly independent.

Lemma 4: If the sensing operator maps every nonzero matrix with rank of at most 2r to a nonzero measurement, the covariance-type matrix in (13) is positive definite.

The lower bound in Proposition 1 is not very strong since we have only one test point. In order to consider multiple test points, we extend Lemma 3 to the matrix case in the following lemma. For generality, the lemma is stated so as to cover both cases needed below. The proof is given in Appendix F.

Lemma 5: Suppose we are given two full-rank matrices of appropriate dimensions. For a positive scaling parameter, define a matrix-valued function as

    (69)


Fig. 3. Performance of the MLE and the FPCA compared with two Cramér-Rao bounds.

and define the trace of this matrix function accordingly as

(70)

Then, the following hold.
1) The matrix function is strictly decreasing in the Löwner partial order if the sensing operator maps every nonzero matrix with rank of at most 2r to a nonzero measurement.
2) In a suitable limit of the scaling parameter the matrix function converges; in particular, in the special case used below, the limit is the corresponding classical Cramér-Rao expression.
3) In the complementary case, the trace diverges.

Numerical simulations show that, under the condition of Lemma 5, more general results hold.
1) In the first regime, the matrix function always converges; furthermore, in a further special case, the limit equals the classical bound.
2) In the second regime, a weaker condition is sufficient for divergence of the trace.

We illustrate these results in Fig. 2. For example, the second subfigure shows that, in the first regime, the matrix function converges to the classic Cramér-Rao bound. Although we have not yet been successful in showing the above general results analytically, the special cases covered by Lemma 5 suffice for our purpose.

Lemma 5 leads to the following theorem.

Theorem 4: Suppose that the model is identifiable, i.e., the sensing operator maps every nonzero matrix with rank of at most 2r to a nonzero measurement. Then, the worst-case scalar MSE for any globally unbiased estimator is infinite. In addition, the infinite scalar MSE is achieved at any matrix whose rank is strictly less than r.

Proof: Consider any X whose rank is strictly less than r. For a small scalar t > 0, define a perturbed matrix by adding t times a rank-one term formed from columns of the identity matrices. Due to the rank inequality

(71)

the perturbed matrix still has rank at most r, so it remains a valid parameter point. According to Lemma 5, the trace of the bound in (13) can be made arbitrarily large by letting t approach 0. Hence, the MSE for X is infinite and all conclusions of the theorem hold.

Theorem 4 essentially says there is no globally unbiased estimator that has finite worst-case scalar MSE, even if the model is identifiable. Actually, there is no locally unbiased estimator with finite worst-case scalar MSE for a matrix with rank less than r. Fortunately, these sets of matrices form a measure-zero subset of the parameter space. The applicability of the key lemma (Lemma 5) to the worst-case scalar MSE also hinges on the fact that the signal/parameter vector can be arbitrarily small, which enables


TABLE II: PARAMETER CONFIGURATION FOR SIMULATIONS

us to drive the perturbation parameter in the proof to zero. We might obtain a bounded MSE if we impose additional restrictions on the parameter space, for example, a positive threshold for the absolute values of the nonzero singular values.

V. NUMERICAL SIMULATIONS

In this section, we show several numerical examples to demonstrate the performance of the FPCA, which solves the matrix LASSO estimator (3), and of the MLE algorithm given in Table I, and compare them with the derived constrained CRB (26).

We describe the experiment setup for the constrained CRB first. In four experiments, whose results are shown in Fig. 3, we generated a rank-r matrix X as a scaled product of two Gaussian factor matrices; the factors had i.i.d. entries with mean zero and variance one, and the scaling set the signal level. We randomly observed a fraction of the entries of X corrupted by white Gaussian noise of variance sigma^2. We then ran the FPCA and the MLE algorithms for many different realizations of the noise and recorded the averaged MSE. The parameter configurations of the four experiments are summarized in Table II. The regularization parameter in the matrix LASSO estimator (3) solved by the FPCA was set according to whether the optimization problem was hard or easy [12]. We varied different parameters, e.g., the noise level, the matrix size, and the fraction of observed entries, and plotted the root mean-squared error (RMSE) as a function of these varying variables. Note that the RMSE is the square root of the scalar MSE divided by the number of matrix entries, i.e., it is the square root of the per-element error. In addition to the empirical RMSE, we also plotted the constrained CRB (26) and the relaxed bound (34) as functions of the varying variables. For comparison with the RMSE, we also divided them by the number of matrix entries and took the square root. In Fig. 3, the bound given by (26) is labeled CRB and that of (34) is labeled CRB2.

In Fig. 3(a) and (b), we see that the FPCA performs better

than both the MLE and the predictions of the constrained CRB for high levels of noise. This is achieved by introducing a large bias toward the zero matrix. However, for relatively high SNR, the performance of the MLE is better than that of the FPCA, especially when the matrix rank is high. This finding confirms that biased estimators are suboptimal when the signal is strong and implies that there is room to improve the performance of current matrix completion techniques in the relatively high SNR region. In addition, the constrained CRB (26) predicts the behavior of the MLE very well. It also serves as a lower bound on the performance of the matrix LASSO estimator for low levels of noise. However, the constrained CRB fails to capture the threshold phenomenon, in which the performance of the MLE suddenly improves at a certain SNR level. This threshold phenomenon is observed in many signal processing problems and can be captured by considering bounds tighter than the CRB. We also notice a gap between the constrained CRB (26) and its relaxed version (34), labeled CRB2 in the figure. In addition, from Fig. 3(c) and (d) we see that the performance of both the FPCA and the MLE improves as the matrix dimensions and the fraction of observed entries increase, which is correctly predicted by the constrained CRB.
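A self-contained sketch of this kind of experiment is given below. It reuses a bare-bones alternating-minimization completion routine and compares the empirical RMSE with the approximation sigma * sqrt(r (p + q - r) / m) suggested by the concentration analysis of Section III-B for white noise. The parameter values are illustrative and are not those of Table II; all names are assumptions for illustration.

```python
import numpy as np

def als_complete(Y, observed, r, sweeps=30):
    """Bare-bones alternating least squares for matrix completion (rank r known)."""
    U0, s0, V0t = np.linalg.svd(Y, full_matrices=False)
    L, R = U0[:, :r], V0t[:r, :].T * s0[:r]
    for _ in range(sweeps):
        for i in range(Y.shape[0]):
            L[i] = np.linalg.lstsq(R[observed[i]], Y[i, observed[i]], rcond=None)[0]
        L, _ = np.linalg.qr(L)                 # stabilizing re-orthonormalization
        for j in range(Y.shape[1]):
            R[j] = np.linalg.lstsq(L[observed[:, j]], Y[observed[:, j], j], rcond=None)[0]
    return L @ R.T

rng = np.random.default_rng(4)
p, q, r, rho, sigma, trials = 40, 40, 2, 0.4, 0.05, 20
rmse = []
for _ in range(trials):
    X = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
    observed = rng.random((p, q)) < rho
    Y = np.where(observed, X + sigma * rng.standard_normal((p, q)), 0.0)
    X_hat = als_complete(Y, observed, r)
    rmse.append(np.linalg.norm(X_hat - X) / np.sqrt(p * q))

m = rho * p * q                                  # expected number of observed entries
crb_rmse = sigma * np.sqrt(r * (p + q - r) / m)  # approximate normalized CRB (white noise)
print("empirical RMSE:", np.mean(rmse), " approx. CRB RMSE:", crb_rmse)
```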

    VI. CONCLUSION

We analyzed the behavior of the MSE matrix and the scalar MSE of locally and globally unbiased estimators for low-rank matrix reconstruction. Compared with the performance analysis of low-rank matrix recovery for specific algorithms, these lower bounds apply to any unbiased estimator. The global and local unbiasedness requirements are related to the two kinds of stability problems raised in low-rank matrix reconstruction: stability results applying to all low-rank matrices and those applying to a specific low-rank matrix. We derived a constrained CRB for any locally unbiased estimator and showed that the predicted performance bound is approached by the MLE. Due to the good performance of the MLE, our ongoing work involves designing more efficient implementations of the basic MLE algorithm presented in this paper, demonstrating its convergence, and incorporating procedures to automatically estimate the rank r. We also demonstrated that the worst-case MSE for any globally unbiased estimator is infinite, which is achieved by matrices of rank strictly less than r.

APPENDIX A: PROOF OF (15)

Proof: Note that in the lower bound on the scalar MSE [see (14)]

    (72)

the right-hand side depends on the number of test points and on the test points themselves, while the left-hand side does not depend on these quantities. The tightest bound on the scalar MSE for any globally unbiased estimator of a particular X is obtained by maximizing over all integers and all possible test points:

    (73)

Intuitively, the tightest bound on the worst-case scalar MSE is obtained by taking an additional maximization over all possible parameter points X. We develop this intuition more rigorously in the following. For any candidate value, suppose there exists a parameter point such that

    (74)


    Then, we have

    (75)

Due to the arbitrariness of this value, we have proved (15).

APPENDIX B: PROOF OF LEMMA 1

Proof: The conditions of the lemma imply that there exists a full-rank matrix relating the two direction matrices. Since the information matrix is positive definite, we construct a complementary matrix, orthogonal to the first set of directions with respect to the inner product defined by the information matrix, through the Gram-Schmidt orthogonalization process. Define the resulting nonsingular matrix accordingly. Then, we have

(76)

where for the first equality we relied on the fact that the expression is invariant under invertible transformations of the direction matrix.

APPENDIX C: PROOF OF LEMMA 2

Proof: The proof essentially follows [34]. Using the comparison principle, we replace the Rademacher sequence in (47) by a standard Gaussian sequence and then apply Dudley's inequality [35]:

    (77)

where the supremum is over the appropriate unit ball, and the covering number counts the minimal number of balls of a given radius, under the metric defined below, needed to cover the set. The metric is defined by the Gaussian process as follows:

    (78)

    As a consequence, we have

    (79)

where the remaining quantity is defined as above. Now note the following containments:

    (80)

where the last set is the unit ball under the metric of the Gaussian process. We use the following two estimates on the covering numbers:

    (81)

    (82)

    We compute the Dudley integral as follows:

    (83)

Choosing the splitting point of the integral appropriately yields an upper bound of the form

    (84)


APPENDIX D: PROOF OF THEOREM 2

Proof: Denote the left-hand side of (46) by a single symbol. Conditioned on a choice of the sampled index set and using Lemma 2, we get

    (85)

    which implies that

    (86)

as long as the number of measurements is large enough. Thus, if we take

    (87)

    then we have (46).

APPENDIX E: PROOF OF LEMMA 4

Proof: Suppose we are given a nonzero column vector. We have

    (88)

Therefore, the quadratic form is equivalent to a quadratic form in a family of Gaussian functions. If we can show that these Gaussian functions are linearly independent, then the matrix is positive definite under the lemma's conditions. To this end, we compute that the Gram matrix associated with these functions is

(89)

We note that if the sensing operator maps every nonzero matrix with rank of at most 2r to a nonzero measurement, the mean vectors for the Gaussian pdfs in the family are distinct. According to [36, p. 14], the Gram matrix is then nonsingular, which implies that the functions in the family are linearly independent.

APPENDIX F: PROOF OF LEMMA 5

    Proof:

1) If the sensing operator maps every nonzero matrix with rank of at most 2r to a nonzero measurement, then, according to Lemma 4, the covariance-type matrix is positive definite. Hence, the pseudoinverse in the definition of the matrix function is actually an inverse. Taking the derivative of the function with respect to the scaling parameter yields

    (90)

It suffices to show that the resulting matrix is positive semidefinite. To this end, suppose we are given a nonzero column vector, and construct the function

    (91)

where the scalar variable is positive. Obviously, the claim holds at the boundary value. Therefore, it suffices to show that the inequality holds for all positive values of the variable. Taking the derivative of this function gives

    (92)

where the indicated matrices are defined by the test points. We note that the first is a positive definite matrix and the second is a positive semidefinite matrix with positive diagonal elements under the lemma's conditions. Therefore, according to the Schur product theorem [37, Theorem 7.5.3, p. 458] and [38, Theorem 8.17, p. 300], the Hadamard product of the two is positive definite. Therefore, the constructed function is increasing, which establishes the required sign of the derivative. Thus, we conclude the strict monotonicity of the matrix function in the Löwner partial order claimed in part 1.

2) For the second claim, in the stated regime the full-rankness of the direction matrix implies that the relevant limits exist. Therefore, both matrices appearing in the limit are invertible. The conclusion follows from the continuity of the matrix inverse.

3) If the complementary condition holds and the direction matrix is of full rank, we get

    (93)

and

(94)

where the eigenvalues of a matrix are listed in decreasing order. Due to the continuity of matrix eigenvalues with respect to matrix entries, we have

    (95)


Since the matrix in question is rank deficient, its last eigenvalues are zeros. Therefore, we obtain

    (96)

    REFERENCES

[1] L. El Ghaoui and P. Gahinet, "Rank minimization under LMI constraints: A framework for output feedback problems," presented at the Eur. Control Conf., Groningen, The Netherlands, Jun. 28, 1993.

[2] M. Fazel, H. Hindi, and S. Boyd, "A rank minimization heuristic with application to minimum order system approximation," in Proc. Amer. Control Conf., 2001, vol. 6, pp. 4734-4739.

[3] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. Comput. Math., vol. 9, no. 6, pp. 717-772, 2009.

[4] E. J. Candès and Y. Plan, "Matrix completion with noise," Proc. IEEE, vol. 98, no. 6, pp. 925-936, Jun. 2010.

[5] D. Gross, Y. Liu, S. T. Flammia, S. Becker, and J. Eisert, "Quantum state tomography via compressed sensing," Phys. Rev. Lett., vol. 105, no. 15, pp. 150401-150404, Oct. 2010.

[6] R. Basri and D. W. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 2, pp. 218-233, Feb. 2003.

[7] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," J. ACM, vol. 58, no. 3, pp. 1-37, Jun. 2011.

[8] N. Linial, E. London, and Y. Rabinovich, "The geometry of graphs and some of its algorithmic applications," Combinatorica, vol. 15, pp. 215-245, 1995.

[9] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization," SIAM Rev., vol. 52, no. 3, pp. 471-501, 2010.

[10] M. Fazel, "Matrix rank minimization with applications," Ph.D. dissertation, Stanford Univ., Stanford, CA, 2002.

[11] E. J. Candès and Y. Plan, "Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements," IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2342-2359, Apr. 2011.

[12] S. Ma, D. Goldfarb, and L. Chen, "Fixed point and Bregman iterative methods for matrix rank minimization," Math. Programm., vol. 120, no. 2, pp. 1-33, 2009.

[13] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramér-Rao bound: Further results and comparisons," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 12, pp. 2140-2150, Dec. 1990.

[14] J. D. Gorman and A. O. Hero, "Lower bounds for parametric estimation with constraints," IEEE Trans. Inf. Theory, vol. 36, no. 6, pp. 1285-1301, Nov. 1990.

[15] T. L. Marzetta, "A simple derivation of the constrained multiple parameter Cramér-Rao bound," IEEE Trans. Signal Process., vol. 41, no. 6, pp. 2247-2249, Jun. 1993.

[16] P. Stoica and B. C. Ng, "On the Cramér-Rao bound under parametric constraints," IEEE Signal Process. Lett., vol. 5, no. 7, pp. 177-179, Jul. 1998.

[17] Z. Ben-Haim and Y. C. Eldar, "The Cramér-Rao bound for estimating a sparse parameter vector," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3384-3389, Jun. 2010.

[18] Z. Ben-Haim and Y. C. Eldar, "On the constrained Cramér-Rao bound with a singular Fisher information matrix," IEEE Signal Process. Lett., vol. 16, no. 6, pp. 453-456, Jun. 2009.

[19] A. Jung, Z. Ben-Haim, F. Hlawatsch, and Y. C. Eldar, "Unbiased estimation of a sparse vector in white Gaussian noise," ArXiv e-prints, May 2010 [Online]. Available: http://arxiv.org/abs/1005.5697

[20] J. M. Hammersley, "On estimating restricted parameters," J. Roy. Stat. Soc. Series B, vol. 12, no. 2, pp. 192-240, 1950.

[21] D. G. Chapman and H. Robbins, "Minimum variance estimation without regularity assumptions," Ann. Math. Stat., vol. 22, no. 4, pp. 581-586, 1951.

[22] R. McAulay and E. Hofstetter, "Barankin bounds on parameter estimation," IEEE Trans. Inf. Theory, vol. 17, no. 6, pp. 669-676, Nov. 1971.

[23] J. Pilz, "Minimax linear regression estimation with symmetric parameter restrictions," J. Stat. Planning Inference, vol. 13, pp. 297-318, 1986.

[24] Y. C. Eldar, A. Ben-Tal, and A. Nemirovski, "Robust mean-squared error estimation in the presence of model uncertainties," IEEE Trans. Signal Process., vol. 53, no. 1, pp. 168-181, Jan. 2005.

[25] Y. Guo and B. C. Levy, "Worst-case MSE precoder design for imperfectly known MIMO communications channels," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2918-2930, Aug. 2005.

[26] Y. C. Eldar, "Minimax MSE estimation of deterministic parameters with noise covariance uncertainties," IEEE Trans. Signal Process., vol. 54, no. 1, pp. 138-145, Jan. 2006.

[27] P. A. Parker, P. Mitran, D. W. Bliss, and V. Tarokh, "On bounds and algorithms for frequency synchronization for collaborative communication systems," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3742-3752, Aug. 2008.

[28] A. Funai and J. A. Fessler, "Cramér-Rao bound analysis of joint B1/T1 mapping methods in MRI," in IEEE Int. Symp. Biomed. Imaging: From Nano to Macro, Apr. 2010, pp. 712-715.

[29] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas With Application to Linear Systems Theory. Princeton, NJ: Princeton Univ. Press, 2005.

[30] M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. New York: Springer-Verlag, 1991.

[31] M. Rudelson and R. Vershynin, "Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements," in Proc. 40th Conf. Inf. Sci. Syst. (CISS 2006), Mar. 2006, pp. 207-212.

[32] E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053-2080, May 2010.

[33] E. J. Candès, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus de l'Académie des Sciences, ser. I, vol. 346, pp. 589-592, 2008.

[34] M. Rudelson and R. Vershynin, "On sparse reconstruction from Fourier and Gaussian measurements," Commun. Pure Appl. Math., vol. 61, pp. 1025-1045, 2008.

[35] M. Talagrand, The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. New York: Springer, 2005.

[36] M. D. Buhmann, Radial Basis Functions: Theory and Implementations. Cambridge, U.K.: Cambridge Univ. Press, 2009.

[37] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge Univ. Press, 1990.

[38] J. R. Schott, Matrix Analysis for Statistics, 2nd ed. New York: Wiley-Interscience, 2005.

Gongguo Tang (S'09-M'11) received the B.Sc. degree in mathematics from Shandong University, China, in 2003, the M.Sc. degree in systems science from the Chinese Academy of Sciences, China, in 2006, and the Ph.D. degree in electrical and systems engineering from Washington University in St. Louis, MO, in 2011.

He is currently a Postdoctoral Research Associate at the Department of Electrical and Computer Engineering, University of Wisconsin-Madison. His research interests are in the area of sparse signal processing, matrix completion, mathematical programming, statistical signal processing, detection and estimation, and their applications.

Arye Nehorai (S'80-M'83-SM'90-F'94) received the B.Sc. and M.Sc. degrees from the Technion-Israel Institute of Technology, Haifa, Israel, and the Ph.D. degree from Stanford University, Stanford, CA.

He was formerly a faculty member at Yale University and the University of Illinois at Chicago. He is currently the Eugene and Martha Lohman Professor and Chair of the Department of Electrical and Systems Engineering at Washington University in St. Louis (WUSTL) and serves as the Director of the Center for Sensor Signal and Information Processing at WUSTL.

Dr. Nehorai served as Editor-in-Chief of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2000 to 2002. From 2003 to 2005, he was Vice-President (Publications) of the IEEE Signal Processing Society (SPS), Chair of the Publications Board, and a member of the Executive Committee of this Society. He was the Founding Editor of the special columns on Leadership Reflections in the IEEE Signal Processing Magazine from 2003 to 2006. He received the 2006 IEEE SPS Technical Achievement Award and the 2010 IEEE SPS Meritorious Service Award. He was elected Distinguished Lecturer of the IEEE SPS for the term 2004 to 2005. He was corecipient of the IEEE SPS 1989 Senior Award for Best Paper, coauthor of the 2003 Young Author Best Paper Award, and corecipient of the 2004 Magazine Paper Award. In 2001, he was named University Scholar of the University of Illinois. He is the Principal Investigator of the Multidisciplinary University Research Initiative (MURI) project entitled Adaptive Waveform Diversity for Full Spectral Dominance. He has been a Fellow of the Royal Statistical Society since 1996.