# A customized proximal point algorithm for stable principal component pursuit with nonnegative constraint

## Abstract

The stable principal component pursuit (SPCP) problem represents a large class of mathematical models appearing in sparse optimization-related applications such as image restoration and web data ranking. In this paper, we focus on designing a new primal-dual algorithm for the SPCP problem with nonnegative constraint. Our method is based on the framework of the proximal point algorithm. By fully exploiting the special structure of the SPCP problem, the method enjoys the advantage of being easily implementable. A global convergence result is established for the proposed method. Preliminary numerical results demonstrate that the method is efficient.

## Introduction

With the development of information technology, high-dimensional data has become increasingly common in science and engineering applications such as image and video processing, web document analysis and bioinformatics data processing. Intensive research attention has recently been devoted to analyzing, processing and extracting useful information from high-dimensional data efficiently and accurately. The classical principal component analysis (PCA) is the most widely used tool for high-dimensional data analysis, and it plays a fundamental role in dimensionality reduction. PCA computes the singular value decomposition (SVD) of a matrix to obtain a low-dimensional approximation to high-dimensional data in the $$\ell_{2}$$ sense . However, PCA usually breaks down when the given data is corrupted by gross errors. In other words, the classical PCA is not robust to gross errors or outliers. To overcome this issue, many methods have been proposed. In , a new model called principal component pursuit (PCP) was proposed by Candès and Wright under weak assumptions. It is assumed that the matrix $$M\in\mathbb{R}^{m\times n}$$ is of the form $$M=L+S$$, where L is the underlying low-rank matrix representing the principal components and S is a sparse matrix, most of whose entries are zero. To recover L and S, PCP requires solving the following convex optimization problem:

$$\begin{array}{ll} \min & \|L\|_{*}+\rho\|S\|_{1} \\ \quad \mbox{s.t.} &L+S=M, \\ &L,S\in\mathbb{R}^{m\times n}, \end{array}$$
(1.1)

where $$\|L\|_{*}$$ denotes the nuclear norm of L, which is equal to the sum of its singular values, $$\|S\|_{1}=\sum_{i,j}|S_{i,j}|$$ is the $$\ell_{1}$$ norm of S, and ρ is a parameter balancing the low-rank and sparsity.

In , it was shown that the recovery is still feasible even when the data matrix M is corrupted with a dense error matrix Z such that $$\|Z\|_{F}\leq\sigma$$. Indeed, this can be accomplished by solving the following stable principal component pursuit (SPCP) problem:

$$\begin{array}{ll} \min & \|L\|_{*}+\rho\|S\|_{1} \\ \quad \mbox{s.t.} &L+S+Z=M, \\ &\|Z\|_{F}\leq\sigma, \\ &L,S,Z\in\mathbb{R}^{m\times n}, \end{array}$$
(1.2)

where the matrix Z is the noise, $$\|Z\|_{F}$$ denotes the Frobenius norm of Z, and $$\sigma>0$$ is the noise level. Note that (1.1) corresponds to the limiting case $$\sigma=0$$ of the SPCP problem (1.2). In practical applications such as background extraction from face recognition, video denoising and surveillance video, the low-rank matrix L typically represents an image. Therefore, adding a nonnegative constraint $$L\geq0$$ to (1.2) makes sense. This results in the following SPCP problem with nonnegative constraint:

$$\begin{array}{ll} \min & \|L\|_{*}+\rho\|S\|_{1}+\mathcal{I} (\|Z\|_{F}\leq\sigma )+\mathcal{I}(L\geq0) \\ \quad \mbox{s.t.} &L+S+Z=M, \\ &L,S,Z\in\mathbb{R}^{m\times n}, \end{array}$$
(1.3)

where $$\mathcal{I}(\cdot)$$ is an indicator function .
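To make the ingredients of (1.3) concrete, the following is a minimal NumPy sketch that evaluates the finite part of the objective and checks the constraints for given matrices. The function names `spcp_objective` and `spcp_is_feasible` and the tolerance `tol` are illustrative assumptions, not part of the original paper.

```python
import numpy as np

def spcp_objective(L, S, rho):
    """Finite part of the objective in (1.3): ||L||_* + rho * ||S||_1."""
    nuclear_norm = np.linalg.norm(L, ord="nuc")  # sum of the singular values of L
    l1_norm = np.abs(S).sum()                    # entrywise l1 norm of S
    return nuclear_norm + rho * l1_norm

def spcp_is_feasible(L, S, Z, M, sigma, tol=1e-8):
    """True when both indicator terms in (1.3) vanish and L + S + Z = M (up to tol)."""
    fits_data = np.linalg.norm(L + S + Z - M, "fro") <= tol
    noise_ok = np.linalg.norm(Z, "fro") <= sigma + tol
    nonnegative = L.min() >= -tol
    return bool(fits_data and noise_ok and nonnegative)
```

When `spcp_is_feasible` returns `False`, at least one indicator function in (1.3) takes the value $$+\infty$$ or the linear constraint is violated, so the point is not admissible for the model.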

Recently, many algorithms using only first-order information for solving the SPCP problem (1.2) have been proposed. Aybat and Iyengar proposed a first-order augmented Lagrangian algorithm (FALC), which was the first algorithm with a known complexity bound for solving the SPCP problem, in . Tao and Yuan developed the alternating splitting augmented Lagrangian method (ASALM) and its variant (VASALM) for solving (1.2) in . Aybat et al. advanced a new first-order algorithm, NSA, based on partial variable splitting in . Nevertheless, how to solve (1.3) has not received enough attention. We are aware only of the alternating proximal gradient method (APGM) proposed by Ma in , which is based on the framework of the alternating direction method of multipliers. In this paper, we propose a customized proximal point algorithm (PPA) with a special proximal regularization parameter to solve the SPCP problem (1.3). Note that (1.3) is well-structured in the sense that the separable structure emerges in both the objective function and the constraints. A natural idea is to develop a customized algorithm to take advantage of the favorable structure of (1.3). This is the main motivation of our paper.

The rest of this paper is organized as follows. In Section 2, we give some useful preliminaries. In Section 3, we present the customized PPA for solving (1.3) and the convergence analysis is shown in Section 4. In Section 5, we compare our algorithm with APGM to illustrate the efficiency by performing numerical experiments. Finally, some conclusions are drawn in Section 6.

## Preliminaries

In order to facilitate the analysis, we consider the following separable convex optimization problem with linear constraint instead of (1.3):

$$\begin{array}{ll} \min & f(x)+g(y) \\ \quad \mbox{s.t.} &Ax+By=b, \\ &x\in\mathcal{X},\qquad y\in\mathcal{Y}, \end{array}$$
(2.1)

where $$A\in\mathbb{R}^{m\times n}$$, $$B\in\mathbb{R}^{m\times p}$$, $$b\in\mathbb{R}^{m}$$, $$\mathcal{X} \subseteq\mathbb{R}^{n}$$ and $$\mathcal{Y} \subseteq\mathbb{R}^{p}$$ are convex sets, and $$f(x): \mathbb{R}^{n}\rightarrow\mathbb{R}$$ and $$g(y): \mathbb {R}^{p}\rightarrow\mathbb{R}$$ are both convex but not necessarily smooth functions . Throughout, the solution set of (2.1) is assumed to be nonempty. Furthermore, we assume that $$f(x)$$ and $$g(y)$$ are ‘simple’, which means that their proximal operators have a closed-form representation or can be computed efficiently to high precision . The proximal operator of the function $$\varphi(x): \mathbb{R}^{n}\rightarrow \mathbb{R}$$ is defined as

$$\operatorname{Prox}(\varphi,\xi,a)=\operatorname{argmin} \biggl\{ \varphi(x)+\frac{1}{2\xi}\|x-a\| ^{2}\Bigm| x\in\mathcal{X} \biggr\}$$
(2.2)

for any given $$a\in\mathbb{R}^{n}$$ and $$\xi>0$$ . The nuclear norm of L and the $$\ell_{1}$$ norm of S in (1.3) are both simple functions. Under this assumption, we will show that our algorithm for solving (1.3) leads to easy proximal subproblems.
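As a simple illustration of definition (2.2) (a standard textbook example, stated here under the assumptions $$n=1$$ and $$\mathcal{X}=\mathbb{R}$$), take $$\varphi(x)=|x|$$. The minimizer is the scalar soft-thresholding of a:

$$\operatorname{Prox}\bigl(|\cdot|,\xi,a\bigr)=\operatorname{argmin} \biggl\{ |x|+\frac{1}{2\xi}(x-a)^{2}\Bigm| x\in\mathbb{R} \biggr\} =\operatorname{sign}(a)\max \bigl\{ |a|-\xi,0 \bigr\} .$$

For instance, with $$\xi=1$$ one obtains $$\operatorname{Prox}(|\cdot|,1,3)=2$$ and $$\operatorname{Prox}(|\cdot|,1,0.5)=0$$. Applying the same formula entrywise gives the proximal operator of the $$\ell_{1}$$ norm.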

We denote the subdifferential of the convex function $$f(x)$$ by $$\partial{f(x)}$$,

$$\partial{f(x)}:= \bigl\{ d\in\mathbb{R}^{n}\mid f(z)-f(x) \geq {d^{T}(z-x)},\forall{z}\in\mathbb{R}^{n} \bigr\} ,$$
(2.3)

and each $$d\in\partial{f(x)}$$ is called a subgradient of $$f(x)$$ . Let $$\theta(x) \in\partial{f(x)}$$ and $$\theta(z) \in\partial{f(z)}$$; then we have

\begin{aligned}& f(z)-f(x)\geq(z-x)^{T}\theta(x), \end{aligned}
(2.4a)
\begin{aligned}& f(x)-f(z)\geq(x-z)^{T}\theta(z). \end{aligned}
(2.4b)

Adding (2.4a) and (2.4b), we obtain

$$(x-z)^{T} \bigl(\theta(x)-\theta(z) \bigr)\geq{0}, \quad \forall{x,z}\in\mathbb{R}^{n},$$
(2.5)

which implies that the mapping $$\theta(\cdot)$$ is monotone.

Now, we show that (2.1) can be characterized by a variational inequality (VI) framework. Let $$\lambda\in\mathbb{R}^{m}$$ be the Lagrangian multiplier associated with the linear constraint in (2.1), then the Lagrangian function of (2.1) is

$$\mathcal{L}(x,y;\lambda)=f(x)+g(y)-\lambda^{T}(Ax+By-b).$$
(2.6)

By deriving the optimality conditions of (2.1), we find that solving (2.1) is equivalent to finding a triple $$(x^{*},y^{*};\lambda^{*})$$ which satisfies

$$\left \{ \begin{array}{l} x^{*}\in\mathcal{X}, \quad (x-x^{*})^{T}(\theta(x^{*})-A^{T}\lambda^{*})\geq{0},\quad \forall{x}\in\mathcal{X}, \\ y^{*}\in\mathcal{Y}, \quad (y-y^{*})^{T}(\gamma(y^{*})-B^{T}\lambda^{*})\geq{0},\quad \forall{y}\in\mathcal{Y}, \\ Ax^{*}+By^{*}-b=0, \end{array} \right .$$
(2.7)

where $$\theta(x^{*})\in\partial{f(x^{*})}$$ and $$\gamma(y^{*})\in\partial {g(y^{*})}$$. By denoting

$$u= \left ( \begin{array}{@{}c@{}} x\\ y \end{array} \right ), \qquad w= \left ( \begin{array}{@{}c@{}} x \\ y \\ \lambda \end{array} \right ),\qquad h(u)=f(x)+g(y), \qquad F(w)=\left ( \begin{array}{@{}c@{}} -A^{T}\lambda \\ -B^{T}\lambda \\ Ax+By-b \end{array} \right ),$$
(2.8a)

and

$$\Omega=\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^{m},$$
(2.8b)

problem (2.7) can be rewritten as the variational inequality reformulation

$$w^{*}\in\Omega,\quad h(u)-h \bigl(u^{*} \bigr)+ \bigl(w-w^{*} \bigr)^{T}F \bigl(w^{*} \bigr)\geq0,\quad \forall w\in\Omega.$$
(2.9)

The mapping $$F(w)$$ defined in (2.8a) is monotone, since it is affine with a skew-symmetric linear part. The solution set of (2.8a)-(2.9), denoted by $$\Omega^{*}$$, is nonempty under the assumption that the solution set of (2.1) is not empty.

## The new algorithm

In this section, we present our new algorithm to solve VI (2.9). We begin by reviewing the classical PPA.

Since the PPA was first proposed by Martinet in  and further developed by Rockafellar in , it has played a vital role in optimization. Given the iterate $$w^{k}$$, the classical PPA generates the new iterate $$w^{k+1}\in\Omega$$ via the following procedure:

$$h(u)-h \bigl(u^{k+1} \bigr)+ \bigl(w-w^{k+1} \bigr)^{T} \bigl(F \bigl(w^{k+1} \bigr)+G \bigl(w^{k+1}-w^{k} \bigr) \bigr)\geq0,\quad \forall w\in\Omega,$$
(3.1)

where the metric proximal parameter G is required to be a symmetric positive definite matrix whose dimension matches that of w. A popular choice of G is $$G=\beta I$$, where $$\beta>0$$ and I is the identity matrix . We are now ready to present our new algorithm for solving (2.1).

### Algorithm 1

(The main algorithm for (2.1))

• Let $$r>0$$ and $$s>\frac{1}{r}\|B^{T}B\|$$, take $$(x^{0},y^{0};\lambda^{0})\in\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^{m}$$ as the initial point.

Step 1. Update y, x and λ:

\begin{aligned}& y^{k+1}=\operatorname{argmin} \biggl\{ g(y)+\frac{r}{2} \biggl\| y-y^{k}-\frac{1}{r}B^{T}\lambda^{k} \biggr\| ^{2}\Bigm|{y\in \mathcal{Y}} \biggr\} , \\& x^{k+1}=\operatorname{argmin} \biggl\{ f(x)-\biggl(\lambda^{k}- \frac{1}{s}\bigl(Ax^{k}+B\bigl(2y^{k+1}-y^{k} \bigr)-b\bigr)\biggr)^{T}Ax \\& \hphantom{x^{k+1}=}{}+\frac{1}{2s}\bigl\| A\bigl(x-x^{k}\bigr) \bigr\| ^{2}+\frac{1}{2}\bigl\| x-x^{k}\bigr\| ^{2}\Bigm|{x\in \mathcal{X}}\biggr\} , \\& \lambda^{k+1}=\lambda^{k}-\frac{1}{s} \bigl(Ax^{k+1}+B\bigl(2y^{k+1}-y^{k}\bigr)-b \bigr). \end{aligned}
Step 2. If the termination criterion is met, stop the algorithm; otherwise, go to Step 1.
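The three updates of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch under the assumption $$A=I$$ and $$\mathcal{X}=\mathbb{R}^{n}$$, $$\mathcal{Y}=\mathbb{R}^{p}$$, in which case the two quadratic terms of the x-subproblem combine into a single proximal term with weight $$(1+s)/(2s)$$. The callables `prox_f(a, t)` and `prox_g(a, t)` are hypothetical helpers that return the minimizer of $$\varphi(v)+\frac{1}{2t}\|v-a\|^{2}$$, and a fixed iteration count stands in for the unspecified termination criterion.

```python
import numpy as np

def customized_ppa(prox_f, prox_g, B, b, r, s, x0, y0, lam0, iters=500):
    """Sketch of Algorithm 1 for min f(x) + g(y) s.t. x + B y = b (i.e. A = I).

    prox_f(a, t) and prox_g(a, t) must return argmin_v phi(v) + ||v - a||^2 / (2 t).
    The parameters should satisfy r > 0 and s > ||B^T B|| / r.
    """
    x, y, lam = x0.copy(), y0.copy(), lam0.copy()
    for _ in range(iters):
        # y-step: proximal step on g centred at y^k + B^T lam^k / r
        y_new = prox_g(y + B.T @ lam / r, 1.0 / r)
        # x-step: with A = I the subproblem is a proximal step on f
        # with weight (1 + s) / (2 s), i.e. prox parameter s / (1 + s)
        resid = x + B @ (2.0 * y_new - y) - b
        x_new = prox_f(x + (s * lam - resid) / (1.0 + s), s / (1.0 + s))
        # multiplier step, using the new x and the extrapolated y
        lam = lam - (x_new + B @ (2.0 * y_new - y) - b) / s
        x, y = x_new, y_new
    return x, y, lam

# Toy usage: f(x) = 0.5*||x||^2, g(y) = 0.5*||y||^2, B = I, so x* = y* = b / 2.
b = np.array([2.0, 4.0])
prox_sq = lambda a, t: a / (1.0 + t)  # prox of 0.5*||.||^2
x, y, lam = customized_ppa(prox_sq, prox_sq, np.eye(2), b, r=1.0, s=2.0,
                           x0=np.zeros(2), y0=np.zeros(2), lam0=np.zeros(2), iters=200)
```

For a general matrix A the x-subproblem is no longer a plain proximal step and would need its own solver; the sketch deliberately avoids that case.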

The new customized PPA described above is known as the alternating direction method of multipliers (ADMM) with two blocks . Its global convergence has been established in the literature. However, there are three variables in (1.3). If we apply the customized PPA to the SPCP problem directly, the convergence of the algorithm cannot be guaranteed . Moreover, the proximal mapping of $$\|L\|_{*}+\mathcal {I}(L\geq0)$$ is difficult to compute . By introducing a new auxiliary variable K, which is required to equal L and carries the nonnegativity constraint, we can group L and S as one big block $$[L;S]$$, and group Z and K as another big block $$[Z;K]$$. Then (1.3) can be rewritten in a form similar to (2.1):

$$\textstyle\begin{array}{ll} \min_{L, S, Z, K} & \|L\|_{*}+\rho\|S\|_{1}+\mathcal{I} (\|Z\| _{F}\leq\sigma )+\mathcal{I}(K\geq0) \\ \mbox{s.t.} &\left ( \begin{array}{@{}c@{\quad}c@{}} I &0 \\ 0 &I \end{array} \right )\left ( \begin{array}{@{}c@{}} Z \\ K \end{array} \right )+\left ( \begin{array}{@{}c@{\quad}c@{}} I &I\\ -I &0 \end{array} \right )\left ( \begin{array}{@{}c@{}} L\\ S \end{array} \right )= \left ( \begin{array}{@{}c@{}} M\\ 0 \end{array} \right ). \end{array}$$
(3.2)

Then we can solve (1.3) or (3.2) by applying the new customized PPA as follows:

\begin{aligned}& L^{k+1}=\operatorname{argmin}\|L\|_{*}+\frac{r}{2} \biggl\Vert L-L^{k}-\frac{1}{r} \bigl(\Lambda _{1}^{k}- \Lambda_{2}^{k} \bigr) \biggr\Vert _{F}^{2}, \end{aligned}
(3.3a)
\begin{aligned}& S^{k+1}=\operatorname{argmin}\rho\|S\|_{1}+\frac{r}{2} \biggl\Vert S-S^{k}-\frac{1}{r}\Lambda _{1}^{k} \biggr\Vert _{F}^{2}, \end{aligned}
(3.3b)
\begin{aligned}& Z^{k+1}= \operatorname{argmin}\mathcal{I} \bigl(\Vert Z\Vert _{F}\leq\sigma \bigr) \\& \hphantom{Z^{k+1}=}{}+\frac{1+s}{2s} \biggl\Vert Z-Z^{k}- \frac{1}{1+s} \bigl(s\Lambda _{1}^{k}- \bigl(2L^{k+1}-L^{k}+2S^{k+1}-S^{k}+Z^{k}-M \bigr) \bigr) \biggr\Vert _{F}^{2}, \end{aligned}
(3.3c)
\begin{aligned}& K^{k+1}= \operatorname{argmin}\mathcal{I}(K\geq0) \\& \hphantom{K^{k+1}=}{}+ \frac{1+s}{2s} \biggl\Vert K-K^{k}- \frac{1}{1+s} \bigl(s\Lambda _{2}^{k}- \bigl(L^{k}-2L^{k+1}+K^{k} \bigr) \bigr) \biggr\Vert _{F}^{2}, \end{aligned}
(3.3d)
\begin{aligned}& \Lambda_{1}^{k+1}=\Lambda_{1}^{k}- \frac{1}{s} \bigl(2L^{k+1}-L^{k}+2S^{k+1}-S^{k}+Z^{k+1}-M \bigr), \end{aligned}
(3.3e)
\begin{aligned}& \Lambda_{2}^{k+1}=\Lambda_{2}^{k}- \frac{1}{s} \bigl(K^{k+1}+L^{k}-2L^{k+1} \bigr). \end{aligned}
(3.3f)

The appeal of the above scheme is that all the subproblems have closed-form solutions. Note that model (1.3) reduces to (1.1) when we set $$\sigma=0$$ and drop the constraint $$L\geq0$$. Under these circumstances, only subproblems (3.3a) and (3.3b) are needed to solve (1.1). We now show why the four subproblems can be solved easily. The first subproblem (3.3a) amounts to evaluating the proximal mapping of the nuclear norm $$\|L\|_{*}$$ and can be expressed by

$$L^{k+1} := \operatorname{MatShrink} \biggl(L^{k}+ \frac{1}{r} \bigl(\Lambda_{1}^{k}- \Lambda _{2}^{k} \bigr), \frac{1}{r} \biggr),$$
(3.4)

where the matrix shrinkage operation $$\operatorname{MatShrink}(M, \alpha)$$ ($$\alpha>0$$) is defined as

$$\operatorname{MatShrink}(M, \alpha):= U\operatorname{Diag} \bigl(\max\{\sigma - \alpha,0\} \bigr)V^{T},$$

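and $$U\operatorname{Diag}(\sigma)V^{T}$$ is the SVD of the matrix M (in this definition, M stands for a generic matrix and σ for the vector of its singular values, not for the data matrix and the noise level of (1.3)), see  and .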
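The matrix shrinkage operation can be sketched in a few lines of NumPy. The function name `mat_shrink` is an illustrative choice, and the thin SVD is used only to keep the factors small.

```python
import numpy as np

def mat_shrink(X, alpha):
    """Singular value thresholding: U diag(max(sv - alpha, 0)) V^T."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD, X = U diag(sv) Vt
    sv_shrunk = np.maximum(sv - alpha, 0.0)            # shrink each singular value
    return (U * sv_shrunk) @ Vt                        # scale the columns of U, then recombine
```

With this helper, update (3.4) would read `mat_shrink(L + (Lam1 - Lam2) / r, 1.0 / r)`, where `Lam1` and `Lam2` are assumed names for the multipliers $$\Lambda_{1}^{k}$$ and $$\Lambda_{2}^{k}$$.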

The S-subproblem (3.3b) can be solved by

$$S^{k+1} := \operatorname{Shrink} \biggl(S^{k}+ \frac{1}{r}\Lambda_{1}^{k}, \frac{\rho}{r} \biggr),$$
(3.5)

where the $$\ell_{1}$$ shrinkage operator  $$\operatorname {Shrink}(M, \alpha)$$ is defined as

$$\bigl[\operatorname{Shrink}(M, \alpha) \bigr]_{ij}:=\left \{ \begin{array}{l@{\quad}l} M_{ij}+\alpha,& \mbox{if } M_{ij}< -\alpha, \\ M_{ij}-\alpha,& \mbox{if } M_{ij}>\alpha, \\ 0,& \mbox{if } |M_{ij}|\leq\alpha. \end{array} \right .$$
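The three cases of the shrinkage operator collapse into a single vectorized expression. The sketch below uses the illustrative name `shrink`.

```python
import numpy as np

def shrink(X, alpha):
    """Entrywise soft-thresholding: move every entry towards zero by alpha."""
    return np.sign(X) * np.maximum(np.abs(X) - alpha, 0.0)
```

Entries with magnitude at most `alpha` are set to zero, which is what produces sparsity in the S-iterates of (3.5).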

The closed-form solution of the third subproblem (3.3c) can be written as

$$Z^{k+1} :=W^{k} / \max \bigl\{ 1, \bigl\Vert W^{k} \bigr\Vert _{F}/\sigma \bigr\} ,$$
(3.6)

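which means projecting the matrix $$W^{k} :=\frac{1}{1+s} \bigl(M+s \bigl(\Lambda _{1}^{k}+Z^{k} \bigr)- \bigl(2L^{k+1}-L^{k}+2S^{k+1}-S^{k} \bigr) \bigr)$$, the center of the quadratic term in (3.3c), onto the Euclidean ball $$\|Z\|_{F}\leq \sigma$$. The K-subproblem (3.3d) corresponds to projecting the matrix $$\frac{1}{1+s} (2L^{k+1}-L^{k}+s(\Lambda_{2}^{k}+K^{k}) )$$ onto the nonnegative orthant. Since this projection commutes with multiplication by a positive scalar, it can be done by

$$K^{k+1} :=\frac{1}{1+s}\max \bigl\{ 2L^{k+1}-L^{k}+s \bigl(\Lambda_{2}^{k}+K^{k} \bigr),0 \bigr\} ,$$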
(3.7)

where the max function is componentwise .
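Putting the pieces together, one full pass of scheme (3.3a)-(3.3f) might look as follows. This is a sketch, not the authors' MATLAB code: the projection centers for Z and K are written out by specializing Algorithm 1 to the block form (3.2), with identity coefficient on $$[Z;K]$$ and right-hand side $$[M;0]$$, which is an assumption of this example. The names `spcp_ppa_step`, `Lam1` and `Lam2` are illustrative.

```python
import numpy as np

def spcp_ppa_step(L, S, Z, K, Lam1, Lam2, M, rho, sigma, r, s):
    """One pass of (3.3a)-(3.3f); assumes sigma > 0, r > 0 and s > ||B^T B|| / r."""
    # (3.3a): singular value thresholding with threshold 1/r
    U, sv, Vt = np.linalg.svd(L + (Lam1 - Lam2) / r, full_matrices=False)
    L_new = (U * np.maximum(sv - 1.0 / r, 0.0)) @ Vt
    # (3.3b): entrywise soft-thresholding with threshold rho/r
    T = S + Lam1 / r
    S_new = np.sign(T) * np.maximum(np.abs(T) - rho / r, 0.0)
    # extrapolated points 2*L^{k+1} - L^k and 2*S^{k+1} - S^k
    L_bar, S_bar = 2.0 * L_new - L, 2.0 * S_new - S
    # (3.3c): project the centre of the quadratic term onto the ball ||Z||_F <= sigma
    Cz = Z + (s * Lam1 - (L_bar + S_bar + Z - M)) / (1.0 + s)
    Z_new = Cz / max(1.0, np.linalg.norm(Cz, "fro") / sigma)
    # (3.3d): project the centre of the quadratic term onto the nonnegative orthant
    Ck = K + (s * Lam2 - (K - L_bar)) / (1.0 + s)
    K_new = np.maximum(Ck, 0.0)
    # (3.3e), (3.3f): multiplier updates
    Lam1_new = Lam1 - (L_bar + S_bar + Z_new - M) / s
    Lam2_new = Lam2 - (K_new - L_bar) / s
    return L_new, S_new, Z_new, K_new, Lam1_new, Lam2_new
```

A driver would call this function repeatedly and stop once a residual test is met. The dominant cost per pass is the SVD in the L-update; the remaining steps are entrywise operations.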

## Convergence analysis

In this section, we will show the global convergence result of the algorithm proposed for solving (2.1) or (3.2). First, we need to prove the following lemma.

### Lemma 1

Let $$w^{k+1}=(x^{k+1},y^{k+1};\lambda^{k+1})$$ be generated by the proposed algorithm from the given $$w^{k}=(x^{k},y^{k};\lambda^{k})$$. Then we can have

$$h(u)-h \bigl(u^{k+1} \bigr)+ \bigl(w-w^{k+1} \bigr)^{T} \bigl(F \bigl(w^{k+1} \bigr)+G \bigl(w^{k+1}-w^{k} \bigr) \bigr)\geq0, \quad \forall w\in\Omega,$$
(4.1)

where

$$G= \left ( \begin{array}{@{}c@{\quad}c@{\quad}c@{}} I & 0 &0 \\ 0 & rI &B^{T} \\ 0 & B &sI \end{array} \right ).$$

### Proof

Deriving the first-order optimality condition of the y-subproblem in Algorithm 1, we obtain

$$g(y)-g \bigl(y^{k+1} \bigr)+ \bigl(y-y^{k+1} \bigr)^{T} \bigl[-B^{T}\lambda^{k}+r \bigl(y^{k+1}-y^{k} \bigr) \bigr]\geq0,\quad \forall y\in \mathcal{Y}.$$
(4.2)

It can also be expressed as

$$g(y)-g \bigl(y^{k+1} \bigr)+ \bigl(y-y^{k+1} \bigr)^{T} \bigl[B^{T} \bigl(\lambda^{k+1}- \lambda^{k} \bigr)-B^{T}\lambda ^{k+1}+r \bigl(y^{k+1}-y^{k} \bigr) \bigr]\geq0, \quad \forall y\in \mathcal{Y}.$$
(4.3)

Similarly, the x-subproblem in Algorithm 1 yields

\begin{aligned}& f(x)-f \bigl(x^{k+1} \bigr)+ \bigl(x-x^{k+1} \bigr)^{T} \biggl[-A^{T}\lambda^{k}+\frac{1}{s}A^{T} \bigl(Ax^{k}+B \bigl(2y^{k+1}-y^{k} \bigr)-b \bigr) \\& \quad {}+ \biggl(I+\frac {1}{s}A^{T}A \biggr) \bigl(x^{k+1}-x^{k} \bigr) \biggr]\geq0,\quad \forall x\in \mathcal{X}. \end{aligned}
(4.4)

Substituting $$\lambda^{k}=\lambda^{k+1}+\frac{1}{s} (Ax^{k+1}+B(2y^{k+1}-y^{k})-b )$$ into (4.4), we get

$$f(x)-f \bigl(x^{k+1} \bigr)+ \bigl(x-x^{k+1} \bigr)^{T} \bigl(-A^{T}\lambda^{k+1}+ \bigl(x^{k+1}-x^{k} \bigr) \bigr)\geq 0, \quad \forall x\in \mathcal{X}.$$
(4.5)

Note that the λ-iteration can be written as

$$Ax^{k+1}+By^{k+1}-b+B \bigl(y^{k+1}-y^{k} \bigr)+s \bigl(\lambda^{k+1}-\lambda^{k} \bigr)=0.$$
(4.6)

Combining (4.3), (4.5) and (4.6), we obtain

\begin{aligned}& h(u)-h \bigl(u^{k+1} \bigr)+\left ( \begin{array}{@{}c@{}} x-x^{k+1} \\ y- y^{k+1} \\ \lambda-\lambda^{k+1} \end{array} \right )^{T} \\& \quad {}\times \left \{\left ( \begin{array}{@{}c@{}} -A^{T}\lambda^{k+1} \\ -B^{T}\lambda^{k+1} \\ Ax^{k+1}+By^{k+1}-b \end{array} \right )+ \left ( \begin{array}{@{}c@{}} (x^{k+1}-x^{k}) \\ r(y^{k+1}-y^{k})+B^{T}(\lambda^{k+1}-\lambda^{k}) \\ B(y^{k+1}-y^{k})+s(\lambda^{k+1}-\lambda^{k}) \end{array} \right ) \right \}\geq0,\quad \forall w\in\Omega. \end{aligned}
(4.7)

Utilizing the notation of w, $$F(w)$$ and the matrix G, the inequality above can be rewritten as

$$h(u)-h \bigl(u^{k+1} \bigr)+ \bigl(w-w^{k+1} \bigr)^{T} \bigl(F \bigl(w^{k+1} \bigr)+G \bigl(w^{k+1}-w^{k} \bigr) \bigr)\geq0, \quad \forall w\in\Omega.$$
(4.8)

In other words, the lemma is proved. Inequality (4.8) implies that the proposed algorithm is in fact equivalent to the proximal point algorithm with a special proximal regularization parameter matrix G. □
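The requirement $$s>\frac{1}{r}\|B^{T}B\|$$ in Algorithm 1 is what makes the matrix G of Lemma 1 positive definite. A short verification by the Schur complement (a standard argument, sketched here under the assumption that $$\|\cdot\|$$ denotes the spectral norm) reads as follows. Since $$r>0$$,

$$G\succ0 \quad\Longleftrightarrow\quad \left ( \begin{array}{@{}c@{\quad}c@{}} rI &B^{T} \\ B &sI \end{array} \right )\succ0 \quad\Longleftrightarrow\quad sI-\frac{1}{r}BB^{T}\succ0 \quad\Longleftrightarrow\quad rs>\lambda_{\max} \bigl(BB^{T} \bigr)= \bigl\Vert B^{T}B \bigr\Vert .$$

For the coefficient matrix of $$[L;S]$$ in (3.2), one finds

$$B^{T}B=\left ( \begin{array}{@{}c@{\quad}c@{}} 2I &I \\ I &I \end{array} \right ),\qquad \bigl\Vert B^{T}B \bigr\Vert =\frac{3+\sqrt{5}}{2}\approx2.618,$$

so any $$r>0$$ and $$s>0$$ with $$rs>(3+\sqrt{5})/2$$ satisfy the requirement.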

Now we show and prove the contractive property of the proposed algorithm.

### Lemma 2

Assume that the parameters satisfy $$r>0$$ and $$s>\frac{1}{r}\| B^{T}B\|$$. Let $$\{w^{k}\}$$ be the sequence generated by the new algorithm with an arbitrary initial iterate $$w^{0}$$. Then it holds that

$$\bigl\Vert w^{k+1}-w^{*} \bigr\Vert _{G}^{2}\leq \bigl\Vert w^{k}-w^{*} \bigr\Vert _{G}^{2}- \bigl\Vert w^{k+1}-w^{k} \bigr\Vert _{G}^{2}, \quad \forall w^{*}\in\Omega^{*}.$$
(4.9)

The norm $$\|\cdot\|_{G}^{2}$$ is defined as $$\|w\|_{G}^{2}=\langle w,Gw \rangle$$ and the corresponding inner product $$\langle\cdot,\cdot\rangle_{G}$$ is defined as $$\langle u,v\rangle_{G}=\langle u,Gv\rangle$$.

### Proof

Because $$w^{*}\in\Omega^{*}$$ is optimal to (2.1), it follows from the KKT conditions that the following hold:

\begin{aligned}& 0\in\partial f \bigl(x^{*} \bigr)-A^{T}\lambda^{*}, \end{aligned}
(4.10a)
\begin{aligned}& 0\in\partial g \bigl(y^{*} \bigr)-B^{T}\lambda^{*}, \end{aligned}
(4.10b)
\begin{aligned}& 0=Ax^{*}+By^{*}-b. \end{aligned}
(4.10c)

As we see, the optimality condition for the first subproblem in Algorithm 1 is

$$0\in\partial g \bigl(y^{k+1} \bigr)+r \biggl(y^{k+1}-y^{k}- \frac{1}{r}B^{T}\lambda ^{k} \biggr).$$
(4.11)

Combining (4.10b) and (4.11) and using the monotonicity of the subdifferential $$\partial g$$, we have

$$\bigl(y^{k+1}-y^{*} \bigr)^{T} \bigl(B^{T} \lambda^{k}-r \bigl(y^{k+1}-y^{k} \bigr)-B^{T} \lambda^{*} \bigr)\geq 0.$$
(4.12)

Similarly, the optimality condition for the subproblem with respect to x can be given by

\begin{aligned} 0 \in&\partial f \bigl(x^{k+1} \bigr)-A^{T} \biggl( \lambda^{k}-\frac {1}{s} \bigl(Ax^{k}+B \bigl(2y^{k+1}-y^{k} \bigr)-b \bigr) \biggr) \\ &{}+ \frac {1}{s}A^{T}A \bigl(x^{k+1}-x^{k} \bigr)+ \bigl(x^{k+1}-x^{k} \bigr). \end{aligned}
(4.13)

Substituting the λ-update into (4.13), we obtain

$$0\in\partial f \bigl(x^{k+1} \bigr)-A^{T} \lambda^{k+1}+ \bigl(x^{k+1}-x^{k} \bigr).$$
(4.14)

Combining (4.10a) and (4.14) and using the monotonicity of the subdifferential $$\partial f$$, we get

$$\bigl(x^{k+1}-x^{*} \bigr)^{T} \bigl(A^{T} \lambda^{k+1}- \bigl(x^{k+1}-x^{k} \bigr)-A^{T}\lambda^{*} \bigr)\geq0.$$
(4.15)

Summing (4.12) and (4.15), we obtain

\begin{aligned}& \bigl(x^{k+1}-x^{*} \bigr)^{T}A^{T} \bigl(\lambda^{k+1}-\lambda ^{*} \bigr)+ \bigl(x^{k+1}-x^{*} \bigr)^{T} \bigl(x^{k}-x^{k+1} \bigr)+ \bigl(y^{k+1}-y^{*} \bigr)^{T}B^{T} \bigl(\lambda ^{k}-\lambda^{k+1} \bigr) \\& \quad {}+ \bigl(y^{k+1}-y^{*} \bigr)^{T}B^{T} \bigl( \lambda^{k+1}-\lambda ^{*} \bigr)+r \bigl(y^{k+1}-y^{*} \bigr)^{T} \bigl(y^{k}-y^{k+1} \bigr)\geq0. \end{aligned}
(4.16)

Combining the λ-subproblem with (4.10c), we get

$$\bigl(\lambda^{k+1}-\lambda ^{*} \bigr)^{T} \bigl(B \bigl(y^{k}+y^{*}-2y^{k+1} \bigr)-A \bigl(x^{k+1}-x^{*} \bigr)+s \bigl(\lambda^{k}-\lambda ^{k+1} \bigr) \bigr)\geq0.$$
(4.17)

Note that (4.17) can be rewritten as

\begin{aligned}& \bigl(\lambda^{k+1}-\lambda^{*} \bigr)^{T}B \bigl(y^{k}-y^{k+1} \bigr)+s \bigl(\lambda^{k+1}- \lambda ^{*} \bigr)^{T} \bigl(\lambda^{k}-\lambda^{k+1} \bigr) \\& \quad \geq \bigl(x^{k+1}-x^{*} \bigr)^{T}A^{T} \bigl( \lambda^{k+1}-\lambda ^{*} \bigr)+ \bigl(y^{k+1}-y^{*} \bigr)^{T}B^{T} \bigl(\lambda^{k+1}-\lambda^{*} \bigr). \end{aligned}
(4.18)

Using the definition of $$\langle\cdot,\cdot\rangle_{G}$$ and (4.18), we have

\begin{aligned}& \bigl\langle w^{k+1}-w^{*},w^{k}-w^{k+1} \bigr\rangle _{G} \\& \quad \geq \bigl(x^{k+1}-x^{*} \bigr)^{T}A^{T} \bigl( \lambda^{k+1}-\lambda ^{*} \bigr)+ \bigl(x^{k+1}-x^{*} \bigr)^{T} \bigl(x^{k}-x^{k+1} \bigr) \\& \qquad {}+ \bigl(y^{k+1}-y^{*} \bigr)^{T}B^{T} \bigl( \lambda^{k}-\lambda ^{k+1} \bigr)+r \bigl(y^{k+1}-y^{*} \bigr)^{T} \bigl(y^{k}-y^{k+1} \bigr) \\& \qquad {}+ \bigl(y^{k+1}-y^{*} \bigr)^{T}B^{T} \bigl( \lambda ^{k+1}-\lambda^{*} \bigr). \end{aligned}
(4.19)

Recalling (4.16), we obtain

$$\bigl\langle w^{k+1}-w^{*},w^{k}-w^{k+1} \bigr\rangle _{G}\geq0.$$
(4.20)

Therefore

$$\bigl\langle w^{k+1}-w^{k},w^{*}-w^{k} \bigr\rangle _{G}\geq \bigl\Vert w^{k+1}-w^{k} \bigr\Vert _{G}^{2}.$$
(4.21)

Combining (4.21) with the identity

$$\bigl\Vert w^{k+1}-w^{*} \bigr\Vert _{G}^{2}= \bigl\Vert w^{k+1}-w^{k} \bigr\Vert _{G}^{2}-2 \bigl\langle w^{k+1}-w^{k},w^{*}-w^{k} \bigr\rangle _{G}+ \bigl\Vert w^{k}-w^{*} \bigr\Vert _{G}^{2},$$
(4.22)

we get

$$\bigl\Vert w^{k+1}-w^{*} \bigr\Vert _{G}^{2}\leq \bigl\Vert w^{k}-w^{*} \bigr\Vert _{G}^{2}- \bigl\Vert w^{k+1}-w^{k} \bigr\Vert _{G}^{2}.$$
(4.23)

This completes the proof of (4.9). Inequality (4.9) shows that the sequence $$\{w^{k}\}$$ is Fejér monotone with respect to the solution set $$\Omega^{*}$$ in the G-norm. In addition, the proposed algorithm to solve (2.1) or (3.2) falls within the framework of contraction type methods. Therefore, by using the Fejér monotonicity and the contraction property, the rest of the convergence proof is standard and is omitted here. We refer the reader to  for more details. □
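The contraction inequality (4.9) can be checked numerically on a toy instance. The sketch below is an illustration only: it assumes the problem $$\min\frac{1}{2}\|x\|^{2}+\frac{1}{2}\|y\|^{2}$$ subject to $$x+y=b$$ (so $$A=B=I$$), for which both subproblems of Algorithm 1 have closed-form solutions and the KKT point is $$x^{*}=y^{*}=\lambda^{*}=b/2$$. The function names are hypothetical.

```python
import numpy as np

def run_toy_ppa(b, r, s, iters):
    """Iterates w^k = (x^k, y^k, lam^k) of Algorithm 1 for
    min 0.5*||x||^2 + 0.5*||y||^2  s.t.  x + y = b   (A = B = I)."""
    n = b.size
    x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    iterates = [np.concatenate([x, y, lam])]
    for _ in range(iters):
        y_new = (r * y + lam) / (1.0 + r)                             # y-step in closed form
        resid = x + 2.0 * y_new - y - b
        x_new = ((1.0 + s) * x + s * lam - resid) / (1.0 + 2.0 * s)   # x-step in closed form
        lam = lam - (x_new + 2.0 * y_new - y - b) / s                 # multiplier step
        x, y = x_new, y_new
        iterates.append(np.concatenate([x, y, lam]))
    return iterates

def fejer_gaps(b, r=1.0, s=2.0, iters=30):
    """Right-hand side minus left-hand side of (4.9) at every iteration."""
    n = b.size
    I, O = np.eye(n), np.zeros((n, n))
    G = np.block([[I, O, O], [O, r * I, I], [O, I, s * I]])   # G of Lemma 1 with B = I
    w_star = np.concatenate([b / 2, b / 2, b / 2])            # KKT point of the toy problem
    sq = lambda v: float(v @ G @ v)                           # squared G-norm
    ws = run_toy_ppa(b, r, s, iters)
    return [sq(ws[k] - w_star) - sq(ws[k + 1] - ws[k]) - sq(ws[k + 1] - w_star)
            for k in range(iters)]
```

If the parameters satisfy $$rs>\|B^{T}B\|=1$$, every entry returned by `fejer_gaps` should be nonnegative up to rounding error, in line with Lemma 2.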

## Numerical results

In this section, we study the performance of Algorithm 1 for solving (1.3). Our codes were written in MATLAB R2009a, and all of the experiments were performed on a laptop with an Intel Core 2 Duo CPU at 2.2 GHz and 2 GB memory. In the experiments, we generate the data randomly in the same way as in . For given n and $$r< n$$ (here r denotes the rank, not the parameter r of Algorithm 1), the $$n\times n$$ matrix $$L^{*}$$ of rank r was generated as $$R_{1}R_{2}^{T}$$, where $$R_{1}$$ and $$R_{2}$$ are both $$n\times r$$ random matrices with all components uniformly distributed in $$[0,1]$$. Thus $$L^{*}$$ is the nonnegative low-rank matrix we want to recover. The support of the sparse matrix $$S^{*}$$ was chosen uniformly at random, and the nonzero components of $$S^{*}$$ were drawn uniformly in the interval $$[-500,500]$$. The components of the noise matrix $$Z^{*}$$ were generated as i.i.d. Gaussian with standard deviation $$10^{-4}$$. Then we set $$M=L^{*}+S^{*}+Z^{*}$$. According to the suggestion in , we chose $$\rho=1/\sqrt{n}$$. The starting point for the two algorithms was set as $$L^{0}=K^{0}=-M$$, $$S^{0}=Z^{0}=0$$, $$\Lambda _{1}^{0}=\Lambda_{2}^{0}=0$$. In our experiments, we use

$$\mathrm{resid}=\frac{\|L+S+Z-M\|_{F}}{\|M\|_{F}}< \epsilon_{r}$$
(5.1)

as the stopping criterion. The tolerance $$\epsilon _{r}$$ is chosen as $$10^{-4}$$. We denote $$R_{r} := r/n$$ and $$C_{r} := \operatorname {cardinality}(S^{*})/(n^{2})$$, so that the rank of $$L^{*}$$ is $$n R_{r}$$ and the cardinality of $$S^{*}$$ is $$n^{2} C_{r}$$. For different choices of n, $$R_{r}$$ and $$C_{r}$$, we focus on the iteration numbers and CPU times in the experiments. Then we define

$$\mathit{rms}_{L} :=\frac{\|L-L^{*}\|_{F}}{k},\qquad \mathit{rms}_{S} :=\frac{\|S-S^{*}\|_{F}}{k}$$
(5.2)

as the root mean square error of the low-rank matrix L ($$\mathit{rms}_{L}$$) and of the sparse matrix S ($$\mathit{rms}_{S}$$), respectively, where k is the current number of iterations. To make the comparison more reliable, we randomly created ten examples and averaged the results over the ten runs. The numerical results for CPU times and iteration numbers are presented in Table 1 and Table 2. As we can see, our algorithm is competitive with APGM in most cases. In some cases, Algorithm 1 tends to be more efficient than APGM. For example, when $$n\in[150,250]$$, $$R_{r} =0.02$$ and $$C_{r} =0.02$$, Algorithm 1 needs less CPU time and far fewer iterations than APGM.

To better observe the convergence and performance of our algorithm, we plot the evolution of the objective function value in Figure 1, of $$\mathit{rms}_{S}$$ in Figure 2 and of $$\mathit{rms}_{L}$$ in Figure 3. The plots indicate that the root mean square errors of S and L decrease slowly at first. However, when approaching the stopping criterion, the $$\mathit{rms}_{S}$$ and $$\mathit{rms}_{L}$$ of Algorithm 1 decrease more rapidly than those of APGM. In other words, Algorithm 1 meets the stopping criterion faster than APGM.
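The random test instances and the stopping rule (5.1) described at the start of this section can be sketched in NumPy as follows. The original experiments were run in MATLAB, so this is only an illustrative translation: the function names, the `seed` argument and the use of `numpy.random.default_rng` are assumptions of the example.

```python
import numpy as np

def make_instance(n, rank, card_ratio, noise_std=1e-4, seed=0):
    """Random SPCP instance: M = L* + S* + Z* with L* nonnegative and low-rank."""
    rng = np.random.default_rng(seed)
    # L* = R1 R2^T with entries of R1, R2 uniform in [0, 1]
    L_star = rng.uniform(0.0, 1.0, (n, rank)) @ rng.uniform(0.0, 1.0, (n, rank)).T
    # S*: support chosen uniformly at random, nonzeros uniform in [-500, 500]
    S_flat = np.zeros(n * n)
    support = rng.choice(n * n, size=int(round(card_ratio * n * n)), replace=False)
    S_flat[support] = rng.uniform(-500.0, 500.0, size=support.size)
    S_star = S_flat.reshape(n, n)
    # Z*: i.i.d. Gaussian noise with the given standard deviation
    Z_star = noise_std * rng.standard_normal((n, n))
    M = L_star + S_star + Z_star
    rho = 1.0 / np.sqrt(n)
    return M, L_star, S_star, Z_star, rho

def relative_residual(L, S, Z, M):
    """Quantity compared with the tolerance in the stopping rule (5.1)."""
    return np.linalg.norm(L + S + Z - M, "fro") / np.linalg.norm(M, "fro")
```

In this sketch, `rank` plays the role of $$nR_{r}$$ and `card_ratio` the role of $$C_{r}$$; an iteration would stop once `relative_residual` falls below $$10^{-4}$$.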

## Conclusions

To solve the SPCP problem (1.3), we proposed a new algorithm based on the PPA. The global convergence of the algorithm is established. The computational results indicate that our algorithm achieves performance comparable with that of APGM, and in certain circumstances it obtains better results than APGM.

## References

1. Aybat, NS, Goldfarb, D, Ma, S: Efficient algorithms for robust and stable principal component pursuit problems. Comput. Optim. Appl. 58(1), 1-29 (2014)
2. Candès, EJ, Li, X, Ma, Y, Wright, J: Robust principal component analysis? J. ACM 58(3), 11 (2011)
3. Zhou, Z, Li, X, Wright, J, Candes, E, Ma, Y: Stable principal component pursuit. In: Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pp. 1518-1522. IEEE Press, New York (2010)
4. Ma, S: Alternating proximal gradient method for convex minimization. Preprint (2012)
5. Aybat, NS, Iyengar, G: A unified approach for minimizing composite norms. Math. Program. 144(1-2), 181-226 (2014)
6. Tao, M, Yuan, X: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57-81 (2011)
7. Aybat, NS, Goldfarb, D, Iyengar, G: Fast first-order methods for stable principal component pursuit (2011). arXiv:1105.2126
8. He, B, Yuan, X: On the direct extension of ADMM for multi-block separable convex programming and beyond: from variational inequality perspective
9. He, B, Yuan, X, Zhang, W: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56(3), 559-572 (2013)
10. Gu, G, He, B, Yuan, X: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59(1-2), 135-161 (2014)
11. Boyd, S: EE364b Course Notes: Sub-Gradient Methods. Stanford University, Stanford, CA (2010)
12. Martinet, B: Brève communication. Régularisation d’inéquations variationnelles par approximations successives. ESAIM: Math. Model. Numer. Anal. 4(R3), 154-158 (1970)
13. Rockafellar, RT: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97-116 (1976)
14. Ma, F, Ni, M, Zhu, L, Yu, Z: An implementable first-order primal-dual algorithm for structured convex optimization. Abstr. Appl. Anal. 2014, Article ID 396753 (2014)
15. Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1-122 (2011)
16. Chen, C, He, B, Ye, Y, Yuan, X: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program., 1-23 (2014)
17. Ma, S, Goldfarb, D, Chen, L: Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program. 128(1-2), 321-353 (2011)
18. Cai, J-F, Candès, EJ, Shen, Z: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956-1982 (2010)
19. Parikh, N, Boyd, S: Proximal algorithms. Found. Trends Optim. 1(3), 123-231 (2013)
20. Ma, S, Xue, L, Zou, H: Alternating direction methods for latent variable Gaussian graphical model selection. Neural Comput. 25(8), 2172-2198 (2013)

## Acknowledgements

The work is supported in part by the Natural Science Foundation of China Grant 71401176 and the Natural Science Foundation of Jiangsu Province Grant BK20141071.

## Author information


### Corresponding author

Correspondence to Kaizhan Huai.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

All the authors contributed equally. All authors read and approved the final manuscript.
