Lower bounds for the low-rank matrix approximation

Low-rank matrix recovery is an active topic drawing the attention of many researchers. It addresses the problem of approximating the observed data matrix by an unknown low-rank matrix. Suppose that A is a low-rank matrix approximation of D, where D and A are $m \times n$ matrices. Based on a useful decomposition of $D^{\dagger} - A^{\dagger}$, for the unitarily invariant norm $\|\cdot\|$, when $\|D\|\geq\|A\|$ and $\|D\|\leq\|A\|$, two sharp lower bounds of $D - A$ are derived respectively.
The presented simulations and applications demonstrate our results when the approximation matrix A is low-rank and the perturbation matrix is sparse.


Introduction
In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data.
Low-rank approximation of a linear operator is ubiquitous in applied mathematics, scientific computing, numerical analysis, and a number of other areas. For example, a low-rank matrix could correspond to a low-degree statistical model for a random process (e.g., factor analysis), a low-order realization of a linear system [], or a low-dimensional embedding of data in the Euclidean space []. Further applications arise in image processing and computer vision [-], bioinformatics, background modeling and face recognition [], latent semantic indexing [, ], machine learning [-], and control []. These data may have thousands or even billions of dimensions, and a large number of samples may have the same or similar structure. The important information typically lies in some low-dimensional subspace or low-dimensional manifold, but is corrupted by perturbative components (sometimes by a sparse component).
Let $D \in \mathbb{R}^{m\times n}$ be an observed data matrix which is combined as
$$D = A + E,$$
where $A \in \mathbb{R}^{m\times n}$ is the low-rank component and $E \in \mathbb{R}^{m\times n}$ is the perturbation component of D. The singular value decomposition (SVD []) is a method for dealing with such high-dimensional data. If the matrix E is small, the classical principal components analysis (PCA [-]) can seek the best rank-r estimate of A by solving the following constrained optimization via the SVD of D and then projecting the columns of D onto the subspace spanned by the r principal left singular vectors of D:
$$\min_{A,E} \|E\|_F \quad \text{s.t.} \quad \mathrm{Rank}(A) \leq r, \quad \|D - A\|_F \leq \epsilon,$$
where $r \ll \min(m, n)$ is the target dimension of the subspace, $\epsilon$ is an upper bound on the perturbative component $\|E\|_F$, and $\|\cdot\|_F$ is the Frobenius norm. Despite its many advantages, traditional PCA suffers from the fact that the estimate $\hat{A}$ obtained by classical PCA can be arbitrarily far from the true A when E is sufficiently sparse (relative to the rank of A). The reason for this poor performance is precisely that traditional PCA makes sense for Gaussian noise and not for sparse noise. Robust PCA (RPCA []) is a more recent family of methods that aims to make PCA robust to large errors and outliers. That is, RPCA is an upgrade of PCA.
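The classical PCA step described above can be illustrated with a truncated SVD. The following is a minimal sketch, not the authors' code: the helper name `best_rank_r`, the matrix sizes, and the noise level are assumptions chosen for illustration, and the perturbation is taken to be small and dense (the regime where PCA is expected to work).

```python
import numpy as np

def best_rank_r(D, r):
    """Best rank-r approximation of D in the Frobenius norm via truncated SVD."""
    U, s, Vh = np.linalg.svd(D, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

rng = np.random.default_rng(0)
m, n, r = 60, 50, 5                                             # hypothetical sizes
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # low-rank component
E = 0.01 * rng.standard_normal((m, n))                          # small dense perturbation
D = A + E

A_hat = best_rank_r(D, r)

# Equivalent view: project the columns of D onto the span of the
# r principal left singular vectors of D.
U_r = np.linalg.svd(D, full_matrices=False)[0][:, :r]
A_proj = U_r @ (U_r.T @ D)

rel_err = np.linalg.norm(A_hat - A, "fro") / np.linalg.norm(A, "fro")
```

With a small dense E the relative error stays small; replacing E by a few large sparse entries is the setting in which this estimate can degrade.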
There are several reasons for studying lower bounds of the low-rank matrix approximation problem. Firstly, as far as we know, no existing literature considers the lower bound of the low-rank matrix approximation problem; this paper is the first to put one forward. Secondly, for the low-rank approximation, when a perturbation E exists, there is an approximation error which cannot be avoided, that is, the approximation error cannot equal 0, but tends to 0. Thirdly, from our main results, we can clearly see the influence of the spectral norm ($\|\cdot\|_2$) on the low-rank matrix approximation. For example, for our main result of Case II, when the largest singular value of the matrix D is larger, the approximation error $\|D - A\|$ is smaller. In addition, the lower bound can verify whether the solution obtained by an algorithm is optimal. For details, please refer to the experiments section of our paper. Therefore, it is necessary and significant to study the lower bound of the low-rank matrix approximation problem.
Remark . PCA and RPCA are methods for the low-rank approximation problem when perturbation item exists. Our aim is to prove that no matter what method is used, the lower bound of error always exists and it cannot be avoided with the perturbation item E. Considering the existence of error, this paper focuses on the specific situation of this lower bound.

Notations
For a matrix $A \in \mathbb{R}^{m\times n}$, let $\|A\|_2$ and $\|A\|_*$ denote the spectral norm and the nuclear norm (i.e., the sum of its singular values), respectively. Let $\|\cdot\|$ be a unitarily invariant norm. The pseudo-inverse and the conjugate transpose of A are denoted by $A^{\dagger}$ and $A^H$, respectively. We consider the singular value decomposition (SVD) of a matrix A of rank r
$$A = U \Sigma V^H, \quad \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r),$$
where U and V are $m \times r$ and $n \times r$ matrices with orthonormal columns, respectively, and the $\sigma_i$ are the positive singular values. We always assume that the SVD of a matrix is given in the reduced form above. Furthermore, $\langle A, B\rangle = \mathrm{trace}(A^H B)$ denotes the standard inner product; then the Frobenius norm is
$$\|A\|_F = \sqrt{\langle A, A\rangle} = \sqrt{\mathrm{trace}(A^H A)} = \sqrt{\sum_{i=1}^{r} \sigma_i^2}.$$
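The notation above can be checked numerically. The sketch below is an illustration only: the test matrix and the rank threshold are assumptions. It computes the reduced SVD of a rank-3 matrix and recovers the spectral, nuclear, and Frobenius norms from the singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank-3 example

# Reduced SVD: keep only the strictly positive singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))       # numerical rank (threshold is an assumption)
U, s, Vh = U[:, :r], s[:r], Vh[:r, :]

spectral = s[0]                         # ||A||_2: largest singular value
nuclear = s.sum()                       # ||A||_*: sum of singular values
frobenius = np.sqrt(np.sum(s ** 2))     # ||A||_F: root of the sum of squares

# Standard inner product <A, A> = trace(A^H A) equals ||A||_F^2.
inner = np.trace(A.conj().T @ A)
```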

Organization
In this paper, we study a perturbation theory for low-rank matrix approximation. When $\|D\| \geq \|A\|$ or $\|D\| \leq \|A\|$, two sharp lower bounds of $D - A$ are derived for a unitarily invariant norm, respectively. This work is organized as follows. In the Preliminaries section, we review relevant linear algebra and some preliminary results. In the main results section, under different norms, two sharp lower bounds of $D - A$ are given for the low-rank approximation problem, together with the proof of the main theorem. In the Simulations and Applications sections, examples and applications are given to verify the provided lower bounds. Finally, we conclude the paper with a short discussion.

Preliminaries
To prove our main results, we first recall the following results, which are used in the subsequent discussion.

Unitarily invariant norm
An important property of a Euclidean space is that shapes and distances do not change under rotation. In particular, for any vector x and any unitary matrix U, we have
$$\|Ux\|_2 = \|x\|_2.$$
An analogous property is shared by the spectral and Frobenius norms: namely, for any unitary matrices U and V for which the product $UAV^H$ is defined,
$$\|UAV^H\|_p = \|A\|_p, \quad p = 2, F.$$
These examples suggest the following definition.

Definition . A norm $\|\cdot\|$ on $\mathbb{C}^{m\times n}$ is unitarily invariant if
$$\|UAV^H\| = \|A\|$$
for any unitary matrices U and V. It is normalized if $\|A\| = \|A\|_2$ whenever A is of rank one.

Remark . Let $\Sigma = UAV^H$ be the singular value decomposition of the matrix A of order n. Let $\|\cdot\|$ be a unitarily invariant norm. Since U and V are unitary, $\|A\| = \|\Sigma\|$. Thus $\|A\|$ is a function of the singular values of A.

The 2-norm plays a special role in the theory of unitarily invariant norms, as the following theorem shows.

Theorem . ([]) Let · be a family of unitarily invariant norm. Then
We have observed that the spectral and Frobenius norms are unitarily invariant. However, not all norms are unitarily invariant as the following example shows.

Example . Let
obviously, $\|A\|_{\infty} =$ , but for a unitary matrix

Remark . It is easy to verify that the nuclear norm $\|\cdot\|_*$ is a unitarily invariant norm.
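The distinction between unitarily invariant and non-invariant norms can be checked numerically. The matrix and rotation below are hypothetical choices made for this sketch and are not the matrices of the example above: a 45-degree rotation changes the infinity norm (maximum absolute row sum) of a simple rank-one matrix, while the spectral, Frobenius, and nuclear norms are unchanged.

```python
import numpy as np

# Hypothetical illustration matrix (not the one used in the paper's example).
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# A real unitary (orthogonal) matrix: rotation by 45 degrees.
c = np.cos(np.pi / 4)
U = np.array([[c, -c],
              [c,  c]])

UA = U @ A

inf_before = np.linalg.norm(A, np.inf)    # maximum absolute row sum of A
inf_after = np.linalg.norm(UA, np.inf)    # changes after the rotation

# Unitarily invariant norms: spectral, Frobenius, nuclear.
invariant = {
    p: (np.linalg.norm(A, p), np.linalg.norm(UA, p))
    for p in (2, "fro", "nuc")
}
```

Here `inf_before` is 1 while `inf_after` is cos(45°), so the infinity norm is not unitarily invariant.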

Projection
Let $\mathbb{C}^m$ and $\mathbb{C}^n$ be m- and n-dimensional inner product spaces over the complex field, respectively, and let $A \in \mathbb{C}^{m\times n}$ be a linear transformation from $\mathbb{C}^n$ into $\mathbb{C}^m$.

Definition . ([]) The column space (range) of A is denoted by
and the null space of A by Further, we let ⊥ denote the orthogonal complement and get The following properties [] of the pseudo-inverse are easily established.

Theorem . ([])
For any matrix A, the following hold.

The decomposition of $D^{\dagger} - A^{\dagger}$
In this section, we focus on the decomposition of $D^{\dagger} - A^{\dagger}$ and a general bound of the perturbation theory for pseudo-inverses. Firstly, according to the orthogonal projection, we can deduce the following lemma.

Lemma . For any matrix A, P A = AA † and P A H = A † A, then we have
The proof is completed.
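The projectors $P_A = AA^{\dagger}$ and $P_{A^H} = A^{\dagger}A$ used in the lemma can be formed directly from the pseudo-inverse. The following sketch checks a few standard facts about them (idempotence, symmetry, and that they fix A); these are textbook properties of orthogonal projectors and are not claimed to be the exact identities stated in the lemma. The test matrix and the `rcond` cutoff are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank-2, 6 x 4

# Explicit cutoff so rounding-level singular values are treated as zero.
A_pinv = np.linalg.pinv(A, rcond=1e-10)

P_A = A @ A_pinv        # orthogonal projector onto the column space R(A)
P_AH = A_pinv @ A       # orthogonal projector onto R(A^H)

P_A_perp = np.eye(6) - P_A      # projector onto the orthogonal complement of R(A)
P_AH_perp = np.eye(4) - P_AH    # projector onto the orthogonal complement of R(A^H)
```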
Using Lemma ., the decompositions of D † -A † are developed by Wedin [].
where γ is given in Table .
In the following section, based on Theorem ., we provide two lower error bounds of D -A for a unitarily invariant norm.
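To see how a pseudo-inverse perturbation bound yields a lower bound on the error, the sketch below rearranges the classical form of Wedin's inequality, $\|D^{\dagger} - A^{\dagger}\| \leq \gamma \max\{\|A^{\dagger}\|_2^2, \|D^{\dagger}\|_2^2\}\,\|D - A\|$. This is an assumption of the sketch: the constants used ($(1+\sqrt{5})/2$ for the spectral norm, $\sqrt{2}$ for the Frobenius norm, 3 for a general unitarily invariant norm) are the textbook values and may differ from the table referenced above, and the resulting quantity is not the sharper bound of the main theorem. The helper name `pinv_lower_bound` is hypothetical.

```python
import numpy as np

# Textbook constants for Wedin's bound (assumed; see the lead-in).
GAMMA = {2: (1.0 + np.sqrt(5.0)) / 2.0, "fro": np.sqrt(2.0), "nuc": 3.0}

def pinv_lower_bound(D, A, p):
    """Lower bound on ||D - A||_p obtained by rearranging Wedin's inequality."""
    D_pinv = np.linalg.pinv(D, rcond=1e-10)
    A_pinv = np.linalg.pinv(A, rcond=1e-10)
    scale = max(np.linalg.norm(A_pinv, 2), np.linalg.norm(D_pinv, 2)) ** 2
    return np.linalg.norm(D_pinv - A_pinv, p) / (GAMMA[p] * scale)

rng = np.random.default_rng(4)
m, n, r = 12, 10, 3                                             # hypothetical sizes
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # low-rank component
E = 0.1 * rng.standard_normal((m, n))                           # dense perturbation
D = A + E

# For each norm: (lower bound, actual error ||D - A||).
results = {p: (pinv_lower_bound(D, A, p), np.linalg.norm(D - A, p)) for p in GAMMA}
```

In each entry of `results` the first value should not exceed the second, since it is only a rearrangement of an upper bound on $\|D^{\dagger} - A^{\dagger}\|$.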

Our main results
In this section, we consider the lower bound theory for the low-rank matrix approximation based on a useful decomposition of $D^{\dagger} - A^{\dagger}$. When Rank(A) ≤ Rank(D), some sharp lower bounds of $D - A$ are derived in terms of a unitarily invariant norm. In order to prove our result, some lemmas are listed below.

Lemma . ([]) Let D = A + E, the projections P D and P A satisfy
therefore ) Let A, D ∈ C m×n , Rank(A) = r, Rank(D) = s, r ≤ s, then there exists a unitary matrix Q ∈ C m×m such that According to Lemma ., we can easily get the following result. and and Therefore, they have the same singular values which yield that P ⊥ This is a useful lemma that we will use in the proof of the main result. In order to prove our main theorem, two lower bounds of D -A are required by the following lemma.

Lemma . For the unitarily invariant norm, if Rank(A) ≤ Rank(D), then the lower bound of D -A satisfies:
Case I: We complete the proof of Lemma ..
Our main results can be described as the following theorem.
Theorem . Suppose that D = A + E, Rank(A) ≤ Rank(D), for the unitarily invariant norm · , the error of D -A has the following bounds.
where the value options for γ are the same as in Table .
Proof Case I: For D ≥ A , by Theorem . and Lemma . (), we can deduce Case II: Similarly, for D ≤ A , by Theorem . and Lemma . (), we can deduce where the value options for γ are the same as in Table . In summary, we prove the lower bounds of Theorem ..
Remark . From the main theorem, we can see that if D = A , then D -A = . However, in the problem of low-rank matrix approximation, D is not necessarily equal to A , so the approximation error is present. Furthermore, when D is close to A , simulations demonstrate that the error has a very small magnitude (see Section ).
In this section, we discuss the error bounds under different conditions for the unitarily invariant norm. Based on a useful decomposition of D † -A † , for D ≥ A and D ≤ A , we have bounds () and (), respectively. The two error bounds are useful in low-rank matrix approximation. The following experiments illustrate our results when the approximation matrix A is low-rank and the perturbation matrix E is sparse.

The singular value thresholding algorithm
Our results are obtained by a singular value thresholding (SVT []) algorithm. This algorithm is easy to implement and surprisingly effective both in terms of computational cost and storage requirement when the minimum nuclear norm solution is also the lowest-rank solution. The specific algorithm is described as follows.
For the low-rank matrix approximation problem contaminated with a perturbation term E, we observe the data matrix $D = A + E$. To approximate D, we can solve the convex optimization problem where $\|\cdot\|_*$ denotes the nuclear norm of a matrix (i.e., the sum of its singular values).
For solving (), we introduce the soft-thresholding operator $\mathcal{D}_\tau$ [], which is defined as
$$\mathcal{D}_\tau(W) = U \mathcal{S}_\tau(\Sigma) V^H, \quad \mathcal{S}_\tau(\Sigma) = \mathrm{diag}\bigl(\max(\sigma_i - \tau, 0)\bigr),$$
where $W = U\Sigma V^H$ is the SVD of W. In general, this operator can effectively shrink some singular values toward zero. The following theorem concerns the shrinkage operators [-], which will be used at each iteration of the proposed algorithms.

Theorem . ([])
For each $\tau > 0$ and $W \in \mathbb{R}^{m\times n}$, the singular value shrinkage operator $\mathcal{D}_\tau(\cdot)$ obeys
$$\mathcal{D}_\tau(W) = \arg\min_{A} \ \tau\|A\|_* + \frac{1}{2}\|A - W\|_F^2.$$
By introducing a Lagrange multiplier Y to remove the inequality constraint, one has the augmented Lagrangian function of ()
The iterative scheme of the classical augmented Lagrangian multipliers method is
Based on the optimality conditions, () is equivalent to
where ∂(·) denotes the subgradient operator of a convex function. Then, by Theorem . above, we have the iterative solution
The SVT approach works as described in Algorithm .

Algorithm  SVT
Task: Approximate the solution of ().
Input: Observation matrix $D = A + E$, weight τ.
$Y_0$ = zeros(m, n)
while the termination criterion is not met, do
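The singular value shrinkage operator at the core of the SVT iteration can be sketched in a few lines. This is an illustration of the operator only, not a reproduction of the full algorithm above: the function names `svt_shrink` and `objective`, the test matrix, and the value of `tau` are assumptions. The code also evaluates the proximal objective $\tau\|A\|_* + \frac{1}{2}\|A - W\|_F^2$ so that the minimizing property of the theorem can be checked against nearby points.

```python
import numpy as np

def svt_shrink(W, tau):
    """Singular value shrinkage: soft-threshold the singular values of W by tau."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh

def objective(A, W, tau):
    """Proximal objective tau * ||A||_* + 0.5 * ||A - W||_F^2."""
    return tau * np.linalg.norm(A, "nuc") + 0.5 * np.linalg.norm(A - W, "fro") ** 2

rng = np.random.default_rng(5)
W = rng.standard_normal((8, 6))     # hypothetical input matrix
tau = 1.0                           # hypothetical threshold

A_star = svt_shrink(W, tau)

# Singular values of the result are those of W shifted down by tau and clipped at 0.
s_W = np.linalg.svd(W, compute_uv=False)
s_star = np.linalg.svd(A_star, compute_uv=False)
```

Singular values of W below `tau` are set to zero, which is why the operator tends to return a matrix of lower rank than its input.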

Simulations
In this section, we use the SVT algorithm for the low-rank matrix approximation problem. Let $D = A + E \in \mathbb{R}^{m\times n}$ be the available data. For simplicity, we restrict our examples to square matrices (m = n). We draw A from independent random matrices and generate the perturbation matrix E to be sparse, with nonzero entries drawn from an i.i.d. Gaussian distribution. Specifically, the rank of the matrix A and the number of sparse entries of the perturbation matrix E are selected to be %m and %m  , respectively.
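A test problem of this kind can be generated as follows. This is a sketch under stated assumptions: the function name `make_problem` is hypothetical, and the fractions `rank_frac` and `sparse_frac` are placeholder values chosen for illustration, since the percentages used in the experiments are not recoverable from the text above.

```python
import numpy as np

def make_problem(m, rank_frac, sparse_frac, sigma=1.0, seed=0):
    """Return (D, A, E) with A low-rank and E sparse with Gaussian nonzeros."""
    rng = np.random.default_rng(seed)
    r = max(1, int(round(rank_frac * m)))
    # Low-rank component: product of two independent Gaussian factors.
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
    # Sparse perturbation: k randomly placed i.i.d. Gaussian entries.
    k = int(round(sparse_frac * m * m))
    e = np.zeros(m * m)
    idx = rng.choice(m * m, size=k, replace=False)
    e[idx] = sigma * rng.standard_normal(k)
    E = e.reshape(m, m)
    return A + E, A, E

# Placeholder fractions (assumptions, not the paper's settings).
D, A, E = make_problem(m=100, rank_frac=0.05, sparse_frac=0.05)
err_fro = np.linalg.norm(D - A, "fro")   # equals ||E||_F by construction
```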

Applications
In this section, we use the SVT algorithm for low-rank image approximation. From Figures  and , compared with the original image (a), the low-rank image (b) loses some details. We can hardly get any detailed information from the incomplete image (c). However, the output image (d) $= A_k$, which is obtained by the SVT algorithm, can recover the details of the low-rank image (b). If we denote image (b) by a low-rank matrix A, then image (c) is the observed data matrix D, which is perturbed by a sparse matrix E, that is, $D = A + E$. Using the SVT algorithm for the low-rank image approximation problem, the lower bound comparison results are shown in Table . The computed values of $\|E\|_F = \|D - A\|_F$ are .e- and .e- for the images Cameraman and Barbara, respectively. The Frobenius norms of our lower bound (), however, are .e- and .e- for the images Cameraman and Barbara, respectively. That is to say, our error bounds indicate that the SVT algorithm can still be improved.

Conclusion
The low-rank matrix approximation problem arises in a number of applications, including model selection, system identification, complexity theory, and optics. Based on a useful decomposition of $D^{\dagger} - A^{\dagger}$, this paper reviewed the previous work and provided two sharp lower bounds for the low-rank matrix recovery problem with a unitarily invariant norm. From our main theorem, we can see that if $D = A$, then $\|D - A\| = 0$. However, in the problem of low-rank matrix approximation, D is not necessarily equal to A, so the approximation error is present. Furthermore, from the main results, we can clearly see the influence of the spectral norm ($\|\cdot\|_2$) on the low-rank matrix approximation. For example, in Case II, when the largest singular value of the matrix D is larger, the error $\|D - A\|$ is smaller.
Finally, we use the SVT algorithm for the low-rank matrix approximation problem. Table  shows that our lower bounds (), () are smaller than lower bound (). Simulation results demonstrate that the lower bounds have a very small magnitude. In the Applications section, we use the SVT algorithm for the low-rank image approximation problem; the lower bound comparison results are shown in Table . From the comparison results, we find that our lower bounds can verify whether the SVT algorithm can be improved.