Lower bounds for the low-rank matrix approximation
- Jicheng Li^{1},
- Zisheng Liu^{1, 2} and
- Guo Li^{3}
https://doi.org/10.1186/s13660-017-1564-z
© The Author(s) 2017
- Received: 5 August 2017
- Accepted: 10 November 2017
- Published: 21 November 2017
Abstract
Low-rank matrix recovery is an active topic drawing the attention of many researchers. It addresses the problem of approximating the observed data matrix by an unknown low-rank matrix. Suppose that A is a low-rank matrix approximation of D, where D and A are \(m \times n\) matrices. Based on a useful decomposition of \(D^{\dagger} - A^{\dagger}\), two sharp lower bounds of \(\|D - A\|\) are derived for the unitarily invariant norm \(\|\cdot\|\), one for the case \(\|D\|\geq\|A\|\) and one for the case \(\|D\|\leq\|A\|\). The presented simulations and applications illustrate our results when the approximation matrix A is low-rank and the perturbation matrix is sparse.
Keywords
- low-rank matrix
- approximation
- error estimation
- pseudo-inverse
- matrix norms
MSC
- 15A23
- 34D10
- 68W25
- 90C25
- 90C59
1 Introduction
In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data.
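As a minimal illustration of this minimization problem, the sketch below assumes that the fit is measured in the spectral or Frobenius norm, in which case the truncated SVD is a minimizer (the Eckart-Young result, cf. [15]). The function name `best_rank_k` and the test matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np

def best_rank_k(D, k):
    """Truncated SVD: keep the k largest singular triplets of D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 6))      # hypothetical data matrix
k = 2
A_k = best_rank_k(D, k)

s = np.linalg.svd(D, compute_uv=False)
spec_err = np.linalg.norm(D - A_k, 2)       # equals the (k+1)-th singular value
fro_err = np.linalg.norm(D - A_k, "fro")    # equals the l2 norm of the discarded ones
```

The errors of this approximation are determined entirely by the discarded singular values of D, which is why the rank constraint acts as a complexity constraint on the model.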
Low-rank approximation of a linear operator is ubiquitous in applied mathematics, scientific computing, numerical analysis, and a number of other areas. For example, a low-rank matrix could correspond to a low-degree statistical model for a random process (e.g., factor analysis), a low-order realization of a linear system [1], or a low-dimensional embedding of data in the Euclidean space [2]. Further applications include image processing and computer vision [3–5], bioinformatics, background modeling and face recognition [6], latent semantic indexing [7, 8], machine learning [9–12], and control [13]. Such data may have thousands or even billions of dimensions, and a large number of samples may share the same or a similar structure. The important information typically lies in a low-dimensional subspace or low-dimensional manifold, but it is corrupted by perturbative components, which are sometimes sparse.
Despite its many advantages, traditional PCA suffers from the fact that the estimate Â obtained by classical PCA can be arbitrarily far from the true A when E is sufficiently sparse (relative to the rank of A). The reason for this poor performance is that traditional PCA is designed for Gaussian noise and not for sparse noise. Robust PCA (RPCA [6]) is a family of methods that aims to make PCA robust to large errors and outliers. In this sense, RPCA is an upgrade of PCA.
There are several reasons for studying lower bounds for the low-rank matrix approximation problem. Firstly, as far as we know, no previous work considers a lower bound for this problem, and this paper is the first to put one forward. Secondly, when a perturbation E exists, the approximation error cannot be avoided: it cannot equal 0, although it may tend to 0. Thirdly, our main results clearly show the influence of the spectral norm (\(\|\cdot\|_{2}\)) on the low-rank matrix approximation. For example, in Case II of our main result, the larger the largest singular value of the matrix D, the smaller the approximation error of \((D-A)\). In addition, the lower bound can be used to check whether the solution obtained by an algorithm is optimal. For details, see the experiments in Section 4. Therefore, it is necessary and significant to study the lower bound of the low-rank matrix approximation problem.
Remark 1.1
PCA and RPCA are methods for the low-rank approximation problem when a perturbation term exists. Our aim is to prove that, no matter which method is used, a lower bound of the error always exists and cannot be avoided in the presence of the perturbation term E. Given that the error exists, this paper focuses on the specific form of this lower bound.
1.1 Notations
1.2 Organization
In this paper, we study a perturbation theory for low-rank matrix approximation. Two sharp lower bounds of \(\|D - A\|\) are derived for a unitarily invariant norm, one for the case \(\|D\|\geq\|A\|\) and one for the case \(\|D\|\leq\|A\|\). This work is organized as follows. In Section 2, we provide a review of relevant linear algebra and some preliminary results. In Section 3, two sharp lower bounds of \(D - A\) are given for the low-rank approximation problem under different norms, and the proof of Theorem 3.5 is presented. In Section 4, examples and applications are given to verify the provided lower bounds. Finally, we conclude the paper with a short discussion.
2 Preliminaries
To prove our main results, we first recall the following definitions and results.
2.1 Unitarily invariant norm
Definition 2.1
([18])
Remark 2.2
The 2-norm plays a special role in the theory of unitarily invariant norms as the following theorem shows.
Theorem 2.3
([18])
We have observed that the spectral and Frobenius norms are unitarily invariant. However, not all norms are unitarily invariant as the following example shows.
Example 2.4
Remark 2.5
It is easy to verify that the nuclear norm \(\|\cdot\|_{\ast}\) is a unitarily invariant norm.
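The invariance property can be checked numerically. The sketch below assumes real matrices (so unitary means orthogonal) and uses randomly generated factors. The counterexample with the matrix 1-norm is our own illustration and is not necessarily the norm used in Example 2.4.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
# Random orthogonal (real unitary) factors from QR decompositions.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = U @ A @ V

# Spectral, Frobenius and nuclear norms of A and U A V should agree.
invariant = {
    name: bool(np.isclose(np.linalg.norm(A, kind), np.linalg.norm(B, kind)))
    for name, kind in [("spectral", 2), ("frobenius", "fro"), ("nuclear", "nuc")]
}

# Counterexample: the matrix 1-norm (maximum absolute column sum).
c = np.sqrt(0.5)
Q = np.array([[c, -c], [c, c]])           # rotation by 45 degrees
one_norm_identity = np.linalg.norm(np.eye(2), 1)      # 1
one_norm_rotated = np.linalg.norm(Q @ np.eye(2), 1)   # sqrt(2)
```

All three norms above depend on A only through its singular values, which is the reason they survive multiplication by U and V, whereas the 1-norm changes under the rotation Q.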
2.2 Projection
Let \(\mathbb{C}^{m}\) and \(\mathbb{C}^{n}\) be m and n-dimensional inner product spaces over the complex field, respectively, and \(A\in\mathbb{C}^{m\times n}\) be a linear transformation from \(\mathbb{C}^{n}\) into \(\mathbb{C}^{m}\).
Definition 2.6
([18])
The following properties [18] of the pseudo-inverse are easily established.
Theorem 2.7
([18])
- 1. If \(A\in\mathbb{C}^{m\times n}\) has rank n, then \(A^{\dagger }=(A^{H} A)^{-1}A^{H}\) and \(A^{\dagger}A=I^{(n)}\).
- 2. If \(A\in\mathbb{C}^{m\times n}\) has rank m, then \(A^{\dagger }=A^{H}(A A^{H} )^{-1}\) and \(A A^{\dagger}=I^{(m)}\).
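Both closed forms in Theorem 2.7 can be compared with a numerically computed pseudo-inverse. The sketch assumes real Gaussian test matrices, which have full rank with probability one. The sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Case 1: full column rank (rank n), here 6 x 3.
A = rng.standard_normal((6, 3))
A_pinv = np.linalg.pinv(A)
left_formula = np.linalg.inv(A.T @ A) @ A.T       # (A^H A)^{-1} A^H

# Case 2: full row rank (rank m), here 3 x 6.
B = A.T
B_pinv = np.linalg.pinv(B)
right_formula = B.T @ np.linalg.inv(B @ B.T)      # B^H (B B^H)^{-1}
```

In practice one would not form the inverses explicitly. They are used here only to mirror the formulas of the theorem.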
Theorem 2.8
([18])
For any matrix A, \(P_{A} = AA^{\dagger}\) is the orthogonal projector onto \(\mathcal{R}(A)\), \(P_{A^{H}}= A^{\dagger}A\) is the orthogonal projector onto \(\mathcal{R}(A^{H})\), \(I - P_{A^{H}}\) is the orthogonal projector onto \(\mathcal{N}(A)\).
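The projector properties of Theorem 2.8 can be verified for a rank-deficient matrix. The sketch below assumes a real rank-2 test matrix built from thin Gaussian factors. The cutoff `rcond=1e-10` is our choice, made so that numerically zero singular values are discarded.

```python
import numpy as np

rng = np.random.default_rng(3)
# Rank-2 matrix of size 5 x 4, built as a product of thin factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
A_pinv = np.linalg.pinv(A, rcond=1e-10)

P_A = A @ A_pinv              # orthogonal projector onto R(A)
P_AH = A_pinv @ A             # orthogonal projector onto R(A^H)
P_null = np.eye(4) - P_AH     # orthogonal projector onto N(A)

def is_orthogonal_projector(P):
    """Idempotent and symmetric (Hermitian in the real case)."""
    return bool(np.allclose(P @ P, P) and np.allclose(P, P.T))
```

The trace of each projector equals the dimension of the subspace it projects onto, so `np.trace(P_A)` is close to 2 and `np.trace(P_null)` is close to 2 for this 5 x 4 example.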
2.3 The decomposition of \(D^{\dagger} - A^{\dagger}\)
In this section, we focus on the decomposition of \(D^{\dagger} - A^{\dagger}\) and a general bound of the perturbation theory for pseudo-inverses. Firstly, according to the orthogonal projection, we can deduce the following lemma.
Lemma 2.9
Proof
Using Lemma 2.9, the decompositions of \(D^{\dagger} - A^{\dagger}\) are developed by Wedin [19].
Theorem 2.10
([19])
By Lemma 2.9, using \(P_{A} = AA^{\dagger}\), \(P_{A^{H}}= A^{\dagger}A\), \(P_{A}^{\perp}=I-P_{A}\), \(P_{A^{H}}^{\perp}=I-P_{A^{H}}\), these expressions can be verified.
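For a numerical check, we assume one standard form of Wedin's decomposition with \(E = D - A\), namely \(D^{\dagger}-A^{\dagger} = -D^{\dagger}EA^{\dagger} + (D^{H}D)^{\dagger}E^{H}P_{A}^{\perp} + P_{D^{H}}^{\perp}E^{H}(AA^{H})^{\dagger}\). The matrix sizes, the perturbation scale and the helper `pinv` with its cutoff are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 5, 7, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 matrix
E = 0.5 * rng.standard_normal((m, n))                          # dense perturbation
D = A + E                                                      # rank 5 almost surely

def pinv(M):
    # Explicit cutoff so that numerically zero singular values are discarded.
    return np.linalg.pinv(M, rcond=1e-10)

A_pinv, D_pinv = pinv(A), pinv(D)
P_A_perp = np.eye(m) - A @ A_pinv       # I - P_A
P_DH_perp = np.eye(n) - D_pinv @ D      # I - P_{D^H}

lhs = D_pinv - A_pinv
rhs = (-D_pinv @ E @ A_pinv
       + pinv(D.T @ D) @ E.T @ P_A_perp
       + P_DH_perp @ E.T @ pinv(A @ A.T))
residual = np.linalg.norm(lhs - rhs)    # should be at rounding level
```

The shapes were chosen so that all three terms are nonzero: \(P_{A}^{\perp}\) has rank 3 and \(P_{D^{H}}^{\perp}\) has rank 2 in this example.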
In previous work [19], Wedin developed a general bound of the perturbation theory for pseudo-inverses. Theorem 2.11 is based on a useful decomposition of \(D^{\dagger} - A^{\dagger}\), where D and A are \(m \times n\) matrices. Sharp estimates of \(\|D^{\dagger} - A^{\dagger}\|\) are derived for a unitarily invariant norm. In [20], Chen et al. presented some new perturbation bounds for the orthogonal projections \(\|P_{D} - P_{A}\|\).
Theorem 2.11
([19])
Value options for γ
∥⋅∥ | Arbitrary | Spectral | Frobenius |
---|---|---|---|
γ | 3 | \(\frac{1+\sqrt{5}}{2}\) | \(\sqrt{2}\) |
Remark 2.12
For the spectral norm, by formula (11) we can achieve \(\gamma= \frac {1+\sqrt{5}}{2}\). When \(\|\cdot\|\) is the Frobenius norm, by formula (12), we have \(\gamma= \sqrt{2}\). Similarly, for an arbitrary unitarily invariant norm, according to formula (13), we can deduce \(\gamma=3\).
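The constants in the table can be tried out numerically. The sketch assumes that the bound has the classical form \(\|D^{\dagger}-A^{\dagger}\| \leq\gamma\max\{\|A^{\dagger}\|_{2}^{2}, \|D^{\dagger}\|_{2}^{2}\}\|D-A\|\) from [19], and it uses the nuclear norm as one instance of an arbitrary unitarily invariant norm. The test matrices are hypothetical.

```python
import numpy as np

# gamma from the table; the nuclear norm stands in for "arbitrary".
GAMMA = {"spectral": (1 + np.sqrt(5)) / 2, "frobenius": np.sqrt(2), "nuclear": 3.0}
KIND = {"spectral": 2, "frobenius": "fro", "nuclear": "nuc"}

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
D = A + 0.1 * rng.standard_normal((6, 4))       # perturbed matrix
A_pinv, D_pinv = np.linalg.pinv(A), np.linalg.pinv(D)
scale = max(np.linalg.norm(A_pinv, 2), np.linalg.norm(D_pinv, 2)) ** 2

bound_holds = {}
for name, gamma in GAMMA.items():
    lhs = np.linalg.norm(D_pinv - A_pinv, KIND[name])
    rhs = gamma * scale * np.linalg.norm(D - A, KIND[name])
    bound_holds[name] = bool(lhs <= rhs)
```

Dividing both sides by \(\gamma\max\{\|A^{\dagger}\|_{2}^{2}, \|D^{\dagger}\|_{2}^{2}\}\) turns this upper bound on \(\|D^{\dagger}-A^{\dagger}\|\) into a lower bound on \(\|D-A\|\).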
Remark 2.13
In the following section, based on Theorem 2.11, we provide two lower error bounds of \(D - A\) for a unitarily invariant norm.
3 Our main results
In this section, we consider the lower bound theory for the low-rank matrix approximation based on a useful decomposition of \(D^{\dagger} - A^{\dagger}\). When \(\operatorname{Rank}(A)\leq \operatorname{Rank}(D)\), some sharp lower bounds of \(D - A \) are derived in terms of a unitarily invariant norm. In order to prove our result, some lemmas are listed below.
Lemma 3.1
([18])
Lemma 3.2
([21])
According to Lemma 3.2, we can easily get the following result.
Lemma 3.3
Proof
We will use this lemma in the proof of the main result. To prove our main theorem, we also need the two lower bounds of \(D - A\) given in the following lemma.
Lemma 3.4
For the unitarily invariant norm, if \(\operatorname{Rank}(A)\leq \operatorname{Rank}(D)\), then the lower bound of \(D-A\) satisfies:
Proof
Our main results can be described as the following theorem.
Theorem 3.5
Suppose that \(D = A + E\) and \(\operatorname{Rank}(A)\leq \operatorname{Rank}(D)\). Then, for the unitarily invariant norm \(\|\cdot\|\), the error \(\|D - A\|\) satisfies the following bounds.
Proof
Remark 3.6
From the main theorem, we can see that if \(\|D\|=\|A\|\), then the lower bounds on \(\|D - A \|\) reduce to 0. However, in the problem of low-rank matrix approximation, \(\|D\|\) is not necessarily equal to \(\|A\|\), so the approximation error is present. Furthermore, when \(\|D\|\) is close to \(\|A\|\), simulations demonstrate that the error has a very small magnitude (see Section 4).
In this section, we discuss the error bounds under different conditions for the unitarily invariant norm. Based on a useful decomposition of \(D^{\dagger} - A^{\dagger}\), for \(\| D\| \geq\|A\|\) and \(\| D\| \leq\|A\|\), we have bounds (26) and (27), respectively. The two error bounds are useful in low-rank matrix approximation. The following experiments illustrate our results when the approximation matrix A is low-rank and the perturbation matrix E is sparse.
4 Experiments
4.1 The singular value thresholding algorithm
Our results are obtained by a singular value thresholding (SVT [22]) algorithm. This algorithm is easy to implement and surprisingly effective both in terms of computational cost and storage requirement when the minimum nuclear norm solution is also the lowest-rank solution. The specific algorithm is described as follows.
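The core step of SVT is a soft-thresholding of the singular values. The sketch below shows only this shrinkage operator, assuming the form \(U\max(\Sigma-\tau I, 0)V^{H}\) from [22]. It is not the authors' full iteration, and the function name `svt`, the threshold and the test matrix are illustrative assumptions.

```python
import numpy as np

def svt(Y, tau):
    """Soft-threshold the singular values of Y at level tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink, then clip at zero
    return (U * s_shrunk) @ Vt

rng = np.random.default_rng(6)
Y = rng.standard_normal((6, 5))           # hypothetical input matrix
tau = 1.0
X = svt(Y, tau)

s_in = np.linalg.svd(Y, compute_uv=False)
s_out = np.linalg.svd(X, compute_uv=False)
```

Singular values below the threshold are set to zero, so the output has rank at most that of the input, which is how the operator promotes low-rank solutions.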
Theorem 4.1
([22])
4.2 Simulations
In this section, we use the SVT algorithm for the low-rank matrix approximation problem. Let \(D = A + E\in\mathbb{R}^{m\times n}\) be the available data. For simplicity, we restrict our examples to square matrices (\(m=n\)). We draw A from independent random matrices and generate a sparse perturbation matrix E whose nonzero entries follow an i.i.d. Gaussian distribution. Specifically, the rank of the matrix A and the number of nonzero entries of the perturbation matrix E are set to \(5\% m\) and \(5\% m^{2}\), respectively.
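A test problem of this kind can be generated as follows. The sketch assumes that A is the product of two Gaussian factors, that the support of E is chosen uniformly at random, and that the nonzero entries have unit variance. These details and the function name `make_problem` are our assumptions and may differ from the authors' setup.

```python
import numpy as np

def make_problem(m, rank_frac=0.05, sparse_frac=0.05, seed=0):
    """Return (D, A, E) with D = A + E, A low-rank and E sparse."""
    rng = np.random.default_rng(seed)
    r = max(1, int(round(rank_frac * m)))            # rank of A: 5% of m
    A = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))
    k = int(round(sparse_frac * m * m))              # nonzeros of E: 5% of m^2
    idx = rng.choice(m * m, size=k, replace=False)   # random support
    rows, cols = np.unravel_index(idx, (m, m))
    E = np.zeros((m, m))
    E[rows, cols] = rng.standard_normal(k)           # i.i.d. Gaussian entries
    return A + E, A, E

D, A, E = make_problem(100)
```

For m = 100 this gives a rank-5 matrix A and a perturbation E with 500 nonzero entries.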
Lower bound comparison results
m = n | Bound (24), \(\|\cdot\|_{2}\) | Bound (24), \(\|\cdot\|_{F}\) | Bound (25), \(\|\cdot\|_{2}\) | Bound (25), \(\|\cdot\|_{F}\) | Bound (12), \(\|\cdot\|_{2}\) | Bound (12), \(\|\cdot\|_{F}\) |
---|---|---|---|---|---|---|
100 | 8.13e-7 | 1.89e-7 | 1.54e-7 | 3.31e-7 | 1.01e-4 | 1.27e-4 |
500 | 5.11e-8 | 3.71e-8 | 4.22e-8 | 4.62e-8 | 4.23e-4 | 5.22e-4 |
1,000 | 3.76e-8 | 2.14e-8 | 1.01e-8 | 1.19e-8 | 5.57e-4 | 7.48e-4 |
4.3 Applications
Lower bound comparison results of low-rank image approximation
Image | Cameraman | Barbara |
---|---|---|
\(\|E\|_{F}\) | 8.71e-2 | 7.23e-2 |
Bound (25) | 2.59e-5 | 1.09e-5 |
Iters | 200 | 200 |
5 Conclusion
The low-rank matrix approximation problem arises in a number of applications, including model selection, system identification, complexity theory, and optics. Based on a useful decomposition of \(D^{\dagger} - A^{\dagger}\), this paper reviewed the previous work and provided two sharp lower bounds for the low-rank matrix recovery problem with a unitarily invariant norm.
From our main Theorem 3.5, we can see that if \(\|D\|=\|A\|\), then the lower bounds on \(\|D - A \|\) reduce to 0. However, in the problem of low-rank matrix approximation, \(\|D\|\) is not necessarily equal to \(\|A\|\), so the approximation error is present. Furthermore, the main results clearly show the influence of the spectral norm (\(\|\cdot\|_{2}\)) on the low-rank matrix approximation. For example, in Case II, the larger the largest singular value of the matrix D, the smaller the error of \(D-A\).
Finally, we used the SVT algorithm for the low-rank matrix approximation problem. Table 2 shows that our lower bounds (24) and (25) are smaller than lower bound (12). Simulation results demonstrate that the lower bounds have a very small magnitude. In the applications section, we used the SVT algorithm for the low-rank image approximation problem, and the comparison of the lower bounds is shown in Table 3. From the comparison results, we find that our lower bounds can be used to check whether the SVT algorithm can be improved.
Declarations
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under grant No. 11671318, and the Fundamental Research Funds for the Central Universities (Xi’an Jiaotong University, Grant No. xkjc2014008).
Authors’ contributions
All authors worked in coordination. All authors carried out the proof, read and approved the final version of the manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
- Fazel, M, Hindi, H, Boyd, S: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference, vol. 6, pp. 4734-4739 (2002)
- Linial, N, London, E, Rabinovich, Y: The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 215-245 (1995)
- Tomasi, C, Kanade, T: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vis. 9, 137-154 (1992)
- Chen, P, Suter, D: Recovering the missing components in a large noisy low-rank matrix: application to SFM. IEEE Trans. Pattern Anal. Mach. Intell. 26(8), 1051-1063 (2004)
- Liu, ZS, Li, JC, Li, G, Bai, JC, Liu, XN: A new model for sparse and low-rank matrix decomposition. J. Appl. Anal. Comput. 2, 600-617 (2017)
- Wright, J, Ganesh, A, Shankar, R, Yigang, P, Ma, Y: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Twenty-Third Annual Conference on Neural Information Processing Systems (NIPS 2009) (2009)
- Deerwester, S, Dumais, ST, Landauer, T, Furnas, G, Harshman, R: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. Technol. 41(6), 391-407 (1990)
- Papadimitriou, C, Raghavan, P, Tamaki, H, Vempala, S: Latent semantic indexing, a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217-235 (2000)
- Argyriou, A, Evgeniou, T, Pontil, M: Multi-task feature learning. Adv. Neural Inf. Process. Syst. 19, 41-48 (2007)
- Abernethy, J, Bach, F, Evgeniou, T, Vert, JP: Low-rank matrix factorization with attributes. arXiv:cs/0611124 (2006)
- Amit, Y, Fink, M, Srebro, N, Ullman, S: Uncovering shared structures in multiclass classification. In: Proceedings of the 24th International Conference on Machine Learning, pp. 17-24. ACM, New York (2007)
- Zhang, HY, Lin, ZC, Zhang, C, Gao, J: Robust latent low rank representation for subspace clustering. Neurocomputing 145, 369-373 (2014)
- Mesbahi, M, Papavassilopoulos, GP: On the rank minimization problem over a positive semidefinite linear matrix inequality. IEEE Trans. Autom. Control 42, 239-243 (1997)
- Golub, GH, Van Loan, CF: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
- Eckart, C, Young, G: The approximation of one matrix by another of lower rank. Psychometrika 1(3), 211-218 (1936)
- Hotelling, H: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24(6), 417-520 (1932)
- Jolliffe, I: Principal Component Analysis. Springer, Berlin (1986)
- Stewart, GW, Sun, JG: Matrix Perturbation Theory. Academic Press, New York (1990)
- Wedin, P-Å: Perturbation theory for pseudo-inverses. BIT Numer. Math. 13(2), 217-232 (1973)
- Chen, YM, Chen, XS, Li, W: On perturbation bounds for orthogonal projections. Numer. Algorithms 73, 433-444 (2016)
- Sun, JG: Matrix Perturbation Analysis, 2nd edn. Science Press, Beijing (2001)
- Cai, J-F, Candès, EJ, Shen, ZW: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956-1982 (2010)
- Candès, EJ, Li, X, Ma, Y, Wright, J: Robust principal component analysis? J. ACM 58(3), 1-37 (2011)
- Tao, M, Yuan, XM: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 20(1), 57-81 (2011)