A customized proximal point algorithm for stable principal component pursuit with nonnegative constraint
Journal of Inequalities and Applications volume 2015, Article number: 148 (2015)
Abstract
The stable principal component pursuit (SPCP) problem represents a large class of mathematical models appearing in sparse optimization-related applications such as image restoration and web data ranking. In this paper, we focus on designing a new primal-dual algorithm for the SPCP problem with nonnegative constraint. Our method is based on the framework of the proximal point algorithm. By fully exploiting the special structure of the SPCP problem, the method enjoys the advantage of being easily implementable. A global convergence result is established for the proposed method. Preliminary numerical results demonstrate that the method is efficient.
Introduction
With the development of information technology, the subject of high-dimensional data has become more and more popular in science and engineering applications such as image and video processing, web document analysis, and bioinformatics. Intensive research attention has recently been devoted to analyzing, processing, and extracting useful information from high-dimensional data efficiently and accurately. Classical principal component analysis (PCA) is the most widely used tool for high-dimensional data analysis, and it plays a fundamental role in dimensionality reduction. PCA computes the singular value decomposition (SVD) of a matrix to obtain a low-dimensional approximation to high-dimensional data in the \(\ell_{2}\) sense [1]. However, PCA usually breaks down when the given data are corrupted by gross errors; in other words, classical PCA is not robust to gross errors or outliers. To overcome this issue, many methods have been proposed. In [2], a new model called principal component pursuit (PCP) was proposed by Candès et al., which succeeds under weak assumptions. It is assumed that the matrix \(M\in\mathbb{R}^{m\times n}\) is of the form \(M=L+S\), where L is the underlying low-rank matrix representing the principal components and S is a sparse matrix, most of whose entries are zero. To recover L and S, PCP solves the following convex optimization problem:
where \(\|L\|_{*}\) denotes the nuclear norm of L, which is equal to the sum of its singular values, \(\|S\|_{1}=\sum_{i,j}|S_{i,j}|\) is the \(\ell_{1}\) norm of S, and ρ is a parameter balancing low rank and sparsity.
In [3], it was shown that the recovery is still feasible even when the data matrix M is corrupted with a dense error matrix Z such that \(\|Z\|_{F}\leq\delta\). Indeed, this can be accomplished by solving the following stable principal component pursuit (SPCP) problem:
where the matrix Z is the noise, \(\|Z\|_{F}\) denotes the Frobenius norm of Z, and \(\sigma>0\) is the noise level. Note that (1.1) is a special case of the SPCP problem (1.2) with \(\sigma=0\). In practical applications such as background extraction from surveillance video, face recognition, and video denoising, the low-rank matrix L usually represents an image. Therefore, it makes sense to add a nonnegative constraint \(L\geq0\) to (1.2). This results in the following SPCP problem with nonnegative constraint:
where \(\mathcal{I}(\cdot)\) is an indicator function [4].
Recently, many algorithms using only first-order information for solving the SPCP problem (1.2) have been proposed. Aybat and Iyengar proposed a first-order augmented Lagrangian algorithm (FALC), the first algorithm with a known complexity bound for the SPCP problem, in [5]. Tao and Yuan developed the alternating splitting augmented Lagrangian method (ASALM) and its variant (VASALM) for solving (1.2) in [6]. Aybat et al. proposed a new first-order algorithm, NSA, based on partial variable splitting in [7]. Nevertheless, how to solve (1.3) has not received enough attention. We can only find that Ma proposed an alternating proximal gradient method (APGM), based on the framework of the alternating direction method of multipliers, for solving (1.3) in [4]. In this paper, we propose a customized proximal point algorithm with a special proximal regularization parameter to solve the SPCP problem. Note that (1.3) is well structured in the sense that a separable structure emerges in both the objective function and the constraints. A natural idea is to develop a customized algorithm that takes advantage of the favorable structure of (1.3). This is the main motivation of our paper.
The rest of this paper is organized as follows. In Section 2, we give some useful preliminaries. In Section 3, we present the customized PPA for solving (1.3) and the convergence analysis is shown in Section 4. In Section 5, we compare our algorithm with APGM to illustrate the efficiency by performing numerical experiments. Finally, some conclusions are drawn in Section 6.
Preliminaries
In order to facilitate the analysis, instead of (1.3) we consider the following more general separable convex optimization problem with linear constraints:
where \(A\in\mathbb{R}^{m\times n}\), \(B\in\mathbb{R}^{m\times p}\), \(b\in\mathbb{R}^{m}\), \(\mathcal{X} \subseteq\mathbb{R}^{n}\) and \(\mathcal{Y} \subseteq\mathbb{R}^{p}\) are convex sets, and \(f(x): \mathbb{R}^{n}\rightarrow\mathbb{R}\) and \(g(y): \mathbb{R}^{p}\rightarrow\mathbb{R}\) are both convex but not necessarily smooth functions [8]. Throughout, the solution set of (2.1) is assumed to be nonempty. Furthermore, we assume that \(f(x)\) and \(g(y)\) are ‘simple’, which means that their proximal operators have a closed-form representation or can be solved efficiently to high precision [9]. The proximal operator of a function \(\varphi(x): \mathbb{R}^{n}\rightarrow \mathbb{R}\) is defined as
for any given \(a\in\mathbb{R}^{n}\) and \(\xi>0\) [10]. The nuclear norm of L and the \(\ell_{1}\) norm of S in (1.3) are both simple functions. Under this assumption, we will show that our algorithm for solving (1.3) results in easy proximal subproblems.
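As a small numerical illustration (not from the paper), consider the proximal operator under the common convention \(\operatorname{prox}_{\varphi,\xi}(a)=\operatorname{argmin}_{x}\{\varphi(x)+\frac{\xi}{2}\|x-a\|^{2}\}\); the exact scaling in the paper's definition may differ. For the one-dimensional function \(\varphi(x)=x^{2}\), the minimizer has a closed form, which we verify against a brute-force grid search:

```python
import numpy as np

# Proximal operator of phi(x) = x^2 in one dimension, assuming the convention
# prox_{phi, xi}(a) = argmin_x { phi(x) + (xi/2)(x - a)^2 }.
def prox_sq(a, xi):
    # setting the derivative 2x + xi*(x - a) to zero gives x = xi*a/(2 + xi)
    return xi * a / (2.0 + xi)

# brute-force check on a fine grid
a, xi = 1.5, 4.0
grid = np.linspace(-3, 3, 600001)
x_star = grid[np.argmin(grid**2 + xi / 2 * (grid - a)**2)]
print(prox_sq(a, xi), x_star)  # both close to 1.0
```

The same "simple function" idea is what later makes the L- and S-subproblems of the algorithm cheap to solve.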
We signify the subdifferential of the convex function \(f(x)\) by \(\partial{f(x)}\),
and each \(d\in\partial{f(x)}\) is called a subgradient of \(f(x)\) [11]. Let \(\theta(x) \in\partial{f(x)}\); then we have
Adding (2.4a) and (2.4b), we easily get
which implies that the mapping \(\theta(\cdot)\) is monotone.
Now, we show that (2.1) can be characterized by a variational inequality (VI) framework. Let \(\lambda\in\mathbb{R}^{m}\) be the Lagrangian multiplier associated with the linear constraint in (2.1), then the Lagrangian function of (2.1) is
By deriving the optimality conditions of (2.1), we find that solving (2.1) is equivalent to finding a triple \((x^{*},y^{*};\lambda^{*})\) which satisfies
where \(\theta(x^{*})\in\partial{f(x^{*})}\) and \(\gamma(y^{*})\in\partial {g(y^{*})}\). By denoting
and
problem (2.7) can be rewritten as the variational inequality reformulation
Obviously, the mapping \(F(w)\) defined in (2.8a) is monotone. The solution set of (2.9), denoted by \(\Omega^{*}\), is nonempty under the assumption that the solution set of (2.1) is nonempty.
The new algorithm
In this section, we present our new algorithm to solve VI (2.9). We first review the classical PPA.
The PPA, first proposed by Martinet in [12] and further developed by Rockafellar in [13], plays a vital role in optimization. Given the iterate \(w^{k}\), the classical PPA generates the new iterate \(w^{k+1}\in\Omega\) via the following procedure:
where the metric proximal parameter \(G\in\mathbb{R}^{n\times n}\) is required to be a positive definite matrix. A popular choice of G is \(G=\beta I\), where \(\beta>0\) and I is the identity matrix [14]. Here, we are ready to present our new algorithm for solving (2.1).
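As a toy illustration (not from the paper), the classical PPA step with \(G=\beta I\) applied to a smooth quadratic has a closed form at each iteration:

```python
import numpy as np

# Classical PPA with G = beta*I on min 0.5 * w'Qw (a toy smooth problem):
# each iterate solves w^{k+1} = argmin_w 0.5 w'Qw + (beta/2)||w - w^k||^2,
# whose closed form is (Q + beta*I)^{-1} (beta * w^k).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])  # positive definite
beta = 1.0
w = np.array([1.0, -1.0])
for _ in range(200):
    w = np.linalg.solve(Q + beta * np.eye(2), beta * w)
print(w)  # converges to the minimizer w* = 0
```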
Algorithm 1
(The main algorithm for (2.1))

Let \(r>0\) and \(s>\frac{1}{r}\|B^{T}B\|\), and take \((x^{0},y^{0};\lambda^{0})\in\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^{m}\) as the initial point.
 Step 1.:

Update y, x and λ:
$$\begin{aligned}& y^{k+1}=\operatorname{argmin} \biggl\{ g(y)+\frac{r}{2} \biggl\Vert y-y^{k}-\frac{1}{r}B^{T}\lambda^{k} \biggr\Vert ^{2} \Bigm| y\in\mathcal{Y} \biggr\} , \\& x^{k+1}=\operatorname{argmin} \biggl\{ f(x)-\biggl(\lambda^{k}-\frac{1}{s}\bigl(Ax^{k}+B\bigl(2y^{k+1}-y^{k}\bigr)-b\bigr)\biggr)^{T}Ax \\& \hphantom{x^{k+1}=}{}+\frac{1}{2s}\bigl\Vert A\bigl(x-x^{k}\bigr)\bigr\Vert ^{2}+\frac{1}{2}\bigl\Vert x-x^{k}\bigr\Vert ^{2} \Bigm| x\in\mathcal{X} \biggr\} , \\& \lambda^{k+1}=\lambda^{k}-\frac{1}{s}\bigl(Ax^{k+1}+B\bigl(2y^{k+1}-y^{k}\bigr)-b\bigr). \end{aligned}$$
 Step 2.:

If the termination criterion is met, stop the algorithm; otherwise, go to Step 1.
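To make the updates concrete, the following toy script runs the three steps above on a small quadratic instance with \(f(x)=\frac{1}{2}\|x\|^{2}\) and \(g(y)=\frac{1}{2}\|y\|^{2}\), so that every subproblem has a closed form. The signs follow our reconstruction of the scheme, and the script is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

# Toy instance of  min 0.5||x||^2 + 0.5||y||^2  s.t.  A x + B y = b.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 2))
b = rng.standard_normal(4)
r = 2.0
s = 1.1 * np.linalg.norm(B.T @ B, 2) / r   # enforce s > ||B^T B|| / r

x, y, lam = np.zeros(3), np.zeros(2), np.zeros(4)
for _ in range(5000):
    # y-step: proximal step on g(y) = 0.5||y||^2 around y^k + B^T lam^k / r
    y_new = r * (y + B.T @ lam / r) / (1.0 + r)
    # x-step: for f(x) = 0.5||x||^2 the subproblem is an unconstrained quadratic
    q = lam - (A @ x + B @ (2 * y_new - y) - b) / s
    x = np.linalg.solve(2 * np.eye(3) + A.T @ A / s,
                        A.T @ q + (A.T @ A / s + np.eye(3)) @ x)
    # lambda-step, using the extrapolated point 2y^{k+1} - y^k
    lam = lam - (A @ x + B @ (2 * y_new - y) - b) / s
    y = y_new

# KKT conditions: x* = A^T lam*, y* = B^T lam*, A x* + B y* = b,
# hence lam* solves (A A^T + B B^T) lam* = b
lam_star = np.linalg.solve(A @ A.T + B @ B.T, b)
print(np.linalg.norm(x - A.T @ lam_star), np.linalg.norm(y - B.T @ lam_star))
```

Note that the quadratic x-step simplifies exactly because the scheme linearizes the augmented term at \(x^{k}\); this is the structural trick that later makes the SPCP subproblems closed-form.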
The new customized PPA described above is closely related to the alternating direction method of multipliers (ADMM) with two blocks [15], whose global convergence has been proven in many works. However, there are three variables in (1.3), and if we applied such a splitting to the SPCP problem directly, convergence could not be guaranteed [16]. Moreover, the proximal mapping of \(\|L\|_{*}+\mathcal{I}(L\geq0)\) is difficult to compute [4]. By introducing an auxiliary variable K, we can group L and S as one big block \([L;S]\), and group Z and K as another big block \([Z;K]\). Then (1.3) can be rewritten in a form similar to (2.1):
Then we can solve (1.3) or (3.2) by applying the new customized PPA as follows:
The appeal of the above scheme is that all the subproblems have closed-form solutions. Note that model (1.3) reduces to (1.1) when we set \(\sigma=0\) and drop the constraint \(L\geq0\); under these circumstances, subproblems (3.3a) and (3.3b) yield the solution of (1.1). We now show why the four subproblems can be solved easily. The first subproblem (3.3a) amounts to evaluating the proximal mapping of the nuclear norm \(\|L\|_{*}\) and can be expressed as
where the matrix shrinkage operation \(\operatorname{MatShrink}(M, \alpha)\) (\(\alpha>0\)) is defined as
and \(U\operatorname{Diag}(\sigma)V^{T}\) is the SVD of the matrix M, see [17] and [18].
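The matrix shrinkage operation is the familiar singular value thresholding and can be sketched in a few lines of numpy (the function name `mat_shrink` is ours):

```python
import numpy as np

def mat_shrink(M, alpha):
    """MatShrink(M, alpha): soft-threshold the singular values of M,
    i.e. U diag(max(sigma - alpha, 0)) V^T where M = U diag(sigma) V^T."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(sigma - alpha, 0.0)) @ Vt

# for a nonnegative diagonal matrix the singular values are the diagonal entries
M = np.diag([3.0, 1.0, 0.2])
print(np.round(mat_shrink(M, 0.5), 3))  # diag(2.5, 0.5, 0)
```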
The S-subproblem (3.3b) can be solved by
where the \(\ell_{1}\) shrinkage operator [19] \(\operatorname {Shrink}(M, \alpha)\) is defined as
The closedform solution of the third subproblem (3.3c) can be written as
which means projecting the matrix \(W^{k} :=M+s\Lambda_{1}^{k}-(2L^{k+1}-L^{k}+2S^{k+1}-S^{k})\) onto the Euclidean ball \(\|Z\|_{F}\leq \sigma\). The K-subproblem (3.3d) corresponds to projecting the matrix \((2L^{k+1}-L^{k}+s\Lambda_{2}^{k}-M)\) onto the nonnegative orthant, and this can be done by
where the max function is componentwise [4].
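The remaining operators also reduce to a few lines. The following numpy sketch (function names are ours) implements the \(\ell_{1}\) shrinkage, the projection onto the Frobenius ball, and the projection onto the nonnegative orthant:

```python
import numpy as np

def shrink(M, alpha):
    """Componentwise soft-thresholding: the proximal mapping of alpha*||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - alpha, 0.0)

def proj_fro_ball(W, sigma):
    """Projection onto {Z : ||Z||_F <= sigma}: rescale when outside the ball."""
    nrm = np.linalg.norm(W, 'fro')
    return W if nrm <= sigma else (sigma / nrm) * W

def proj_nonneg(W):
    """Projection onto the nonnegative orthant: componentwise max with zero."""
    return np.maximum(W, 0.0)

S = np.array([[3.0, -0.2], [-1.5, 0.7]])
print(shrink(S, 0.5))         # entries with magnitude below 0.5 are zeroed out
print(proj_fro_ball(S, 1.0))  # rescaled so the Frobenius norm equals 1
print(proj_nonneg(S))         # negative entries clipped to 0
```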
Convergence analysis
In this section, we will show the global convergence result of the algorithm proposed for solving (2.1) or (3.2). First, we need to prove the following lemma.
Lemma 1
Let \(w^{k+1}=(x^{k+1},y^{k+1};\lambda^{k+1})\) be generated by the proposed algorithm from the given \(w^{k}=(x^{k},y^{k};\lambda^{k})\). Then we have
where
Proof
Deriving the first-order optimality condition of the first equality in Algorithm 1, we obtain
It can also be expressed as
Similarly, the second iteration in Algorithm 1, for solving \(x^{k+1}\), shows that
Substituting \(\lambda^{k}=\lambda^{k+1}+\frac{1}{s} (Ax^{k+1}+B(2y^{k+1}-y^{k})-b ) \) into (4.4), we get
Note that the λiteration can be written as
Combining (4.3), (4.5) and (4.6), we obtain
Utilizing the notation of w, \(F(w)\) and the matrix G, the inequality above can be rewritten as
In other words, the lemma is proved. Inequality (4.8) implies that the proposed algorithm is in fact equivalent to the proximal point algorithm with a special proximal regularization parameter matrix G. □
Now we show and prove the contractive property of the proposed algorithm.
Lemma 2
Assume that the parameters satisfy \(r>0\) and \(s>\frac{1}{r}\|B^{T}B\|\). Let \(\{w^{k}\}\) be the sequence generated by the new algorithm from an arbitrary initial iterate \(w^{0}\). Then
The norm \(\|\cdot\|_{G}\) is defined by \(\|w\|_{G}^{2}=\langle w,Gw \rangle \), and the corresponding inner product \(\langle\cdot,\cdot\rangle_{G}\) is defined as \(\langle u,v\rangle_{G}=\langle u,Gv\rangle\).
Proof
Because \(w^{*}\in\Omega^{*}\) is optimal to (2.1), it follows from the KKT conditions that the following hold:
As we see, the optimality condition for the first subproblem in Algorithm 1 is
Combining (4.10b) and (4.11) and using the fact that \(\gamma(\cdot)\) is monotone, we have
Similarly, the optimality condition for the subproblem with respect to x can be given by
Substituting the λsubproblem into (4.12), we obtain
Combining (4.10a) and (4.13), we get
Summing (4.12) and (4.15), we obtain
Combining the λsubproblem with (4.10c), we get
Note that (4.17) can be rewritten as
Using the definition of \(\langle\cdot,\cdot\rangle_{G}\) and (4.18), we have
Recalling (4.16), we easily get
Therefore
Combining (4.21) with the identity
we get
This completes the proof. Note that the sequence \(\{w^{k}\}\) is Fejér monotone with respect to the solution set. In addition, the proposed algorithm for solving (2.1) or (3.2) fits the framework of contraction-type methods. Therefore, using the Fejér monotonicity and the contraction property, the rest of the convergence proof is standard and we omit it here; we refer the reader to [20] for more details. □
Numerical results
In this section, we study the performance of Algorithm 1 for solving (1.3). Our codes were written in MATLAB R2009a, and all of the experiments were performed on a laptop with an Intel Core 2 Duo CPU at 2.2 GHz and 2 GB of memory. In the experiments, we generate the data randomly in the same way as in [4]. For given n and \(r< n\), the \(n\times n\) matrix \(L^{*}\) with rank r was generated as \(R_{1}R_{2}^{T}\), where \(R_{1}\) and \(R_{2}\) are both \(n\times r\) random matrices with all components distributed uniformly in \([0,1]\). Thus \(L^{*}\) is the nonnegative low-rank matrix we want to recover. The support of the sparse matrix \(S^{*}\) was chosen uniformly at random, and the nonzero components of \(S^{*}\) were drawn uniformly from the interval \([-500,500]\). The components of the noise matrix \(Z^{*}\) were generated as i.i.d. Gaussian with standard deviation \(10^{-4}\). Then we set \(M=L^{*}+S^{*}+Z^{*}\). Following the suggestion in [2], we chose \(\rho=1/\sqrt{n}\). The starting point for the two algorithms was set as \(L^{0}=K^{0}=M\), \(S^{0}=Z^{0}=0\), \(\Lambda_{1}^{0}=\Lambda_{2}^{0}=0\). In our experiments, we use
as the stopping criterion, where the tolerance parameter \(\epsilon_{r}\) is chosen as \(10^{-4}\). We denote the rank ratio \(R_{r} := r/n\) and the sparsity ratio \(C_{r} := \operatorname{cardinality}(S^{*})/n^{2}\), so that the rank of \(L^{*}\) is \(n R_{r}\) and the cardinality of \(S^{*}\) is \(n^{2} C_{r}\). For different choices of n, \(R_{r}\) and \(C_{r}\), we focus on the iteration numbers and CPU times in the experiments. Then we define
as the root mean square errors of the low-rank matrix L (\(\mathit{rms}_{L}\)) and of the sparse matrix S (\(\mathit{rms}_{S}\)), respectively, where k is the current iteration number. To make the results more convincing, we randomly created ten examples and averaged the results over the ten runs. The numerical results on CPU times and iteration numbers are presented in Table 1 and Table 2. As we can see, our algorithm shows performance competitive with APGM in most cases, and in some cases Algorithm 1 tends to be more efficient than APGM. For example, when \(n\in[150,250]\), \(R_{r} =0.02\) and \(C_{r} =0.02\), Algorithm 1 needs less CPU time and many fewer iterations than APGM.
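The random data generation described above can be sketched as follows (the seed and the sizes \(n\), \(r\), \(C_{r}\) are arbitrary choices for illustration):

```python
import numpy as np

# Synthetic data as described: nonnegative low-rank L*, sparse S* with entries
# in [-500, 500], Gaussian noise Z* with standard deviation 1e-4.
rng = np.random.default_rng(1)
n, r, Cr = 100, 5, 0.02
L = rng.uniform(0, 1, (n, r)) @ rng.uniform(0, 1, (n, r)).T   # rank r, L >= 0
S = np.zeros((n, n))
idx = rng.choice(n * n, int(Cr * n * n), replace=False)       # random support
S.flat[idx] = rng.uniform(-500, 500, idx.size)
Z = 1e-4 * rng.standard_normal((n, n))
M = L + S + Z
rho = 1.0 / np.sqrt(n)   # balancing parameter suggested in [2]
print(np.linalg.matrix_rank(L), (S != 0).sum())
```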
To better observe the convergence and performance of our algorithm, we plot the evolution of the objective function value in Figure 1, of \(\mathit{rms}_{S}\) in Figure 2, and of \(\mathit{rms}_{L}\) in Figure 3. The plots indicate that the root mean square errors of S and L decrease gently at first; however, when approaching the stopping criterion, \(\mathit{rms}_{S}\) and \(\mathit{rms}_{L}\) decrease more rapidly for Algorithm 1 than for APGM. In other words, Algorithm 1 meets the stopping criterion faster than APGM.
Conclusions
In this paper, we proposed a new algorithm, based on the PPA, for solving the SPCP problem (1.3). The global convergence of the algorithm is established, and the computational results indicate that it achieves performance comparable with APGM; in certain circumstances, it obtains better results than APGM.
References
 1.
Aybat, NS, Goldfarb, D, Ma, S: Efficient algorithms for robust and stable principal component pursuit problems. Comput. Optim. Appl. 58(1), 1-29 (2014)
 2.
Candès, EJ, Li, X, Ma, Y, Wright, J: Robust principal component analysis? J. ACM 58(3), 11 (2011)
 3.
Zhou, Z, Li, X, Wright, J, Candès, E, Ma, Y: Stable principal component pursuit. In: 2010 IEEE International Symposium on Information Theory (ISIT), pp. 1518-1522. IEEE Press, New York (2010)
 4.
Ma, S: Alternating proximal gradient method for convex minimization. Preprint (2012)
 5.
Aybat, NS, Iyengar, G: A unified approach for minimizing composite norms. Math. Program. 144(1-2), 181-226 (2014)
 6.
Tao, M, Yuan, X: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57-81 (2011)
 7.
Aybat, NS, Goldfarb, D, Iyengar, G: Fast first-order methods for stable principal component pursuit (2011). arXiv:1105.2126
 8.
He, B, Yuan, X: On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective. Preprint
 9.
He, B, Yuan, X, Zhang, W: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559-572 (2013)
 10.
Gu, G, He, B, Yuan, X: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59(1-2), 135-161 (2014)
 11.
Boyd, S: EE364b Course Notes: Subgradient Methods. Stanford University, Stanford, CA (2010)
 12.
Martinet, B: Brève communication. Régularisation d’inéquations variationnelles par approximations successives. ESAIM: Math. Model. Numer. Anal. 4(R3), 154-158 (1970)
 13.
Rockafellar, RT: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97-116 (1976)
 14.
Ma, F, Ni, M, Zhu, L, Yu, Z: An implementable first-order primal-dual algorithm for structured convex optimization. Abstr. Appl. Anal. 2014, Article ID 396753 (2014)
 15.
Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1-122 (2011)
 16.
Chen, C, He, B, Ye, Y, Yuan, X: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program., 1-23 (2014)
 17.
Ma, S, Goldfarb, D, Chen, L: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128(1-2), 321-353 (2011)
 18.
Cai, JF, Candès, EJ, Shen, Z: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956-1982 (2010)
 19.
Parikh, N, Boyd, S: Proximal algorithms. Found. Trends Optim. 1(3), 123-231 (2013)
 20.
Ma, S, Xue, L, Zou, H: Alternating direction methods for latent variable Gaussian graphical model selection. Neural Comput. 25(8), 2172-2198 (2013)
Acknowledgements
The work is supported in part by the Natural Science Foundation of China Grant 71401176 and the Natural Science Foundation of Jiangsu Province Grant BK20141071.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All the authors contributed equally. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Huai, K., Ni, M., Ma, F. et al. A customized proximal point algorithm for stable principal component pursuit with nonnegative constraint. J Inequal Appl 2015, 148 (2015). https://doi.org/10.1186/s1366001506686
Keywords
 proximal point method
 customized
 stable principal component pursuit
 primaldual algorithm