# The modified accelerated Bregman method for regularized basis pursuit problem

## Abstract

In this paper, a modified accelerated Bregman method (MABM) for solving the regularized basis pursuit problem is considered and analyzed in detail. This idea is based on the fact that the linearized Bregman method (LBM) proposed by Osher et al. (Multiscale Model. Simul. 4(2):460-489, 2005) is equivalent to a gradient descent method applied to a certain dual formulation which converges to the solution of the regularized basis pursuit problem. The proposed method is based on an extrapolation technique which is used in accelerated proximal gradient methods presented by Nesterov (Dokl. Akad. Nauk SSSR 269:543-547, 1983). It is verified that the modified accelerated Bregman method (MABM) is equivalent to the corresponding accelerated augmented Lagrangian method (AALM). The theoretical results confirm that the method has a rapid convergence rate of $O\left(1/{\left(k+1\right)}^{2}\right)$.

## 1 Introduction

Compressed sensing, a research field concerned with how to acquire information efficiently, plays a crucial role in signal processing, image restoration, and related areas. Its name stems from the idea of encoding a large sparse signal using a relatively small number of linear measurements and then decoding the signal either by minimizing the 1-norm or by using a combinatorial algorithm, a greedy algorithm, etc. It grew out of elementary analysis and approximation theory by Kashin [1], but was brought to the forefront by the work of Candès et al. [2-4] and Donoho [5], who constructed specific methods and showed their application prospects. The concrete model considered in compressed sensing is the so-called basis pursuit problem [6, 7]

$\underset{x\in {\mathrm{\Re }}^{n}}{min}{\parallel x\parallel }_{1}\quad \text{s.t.}\quad Ax=b,$
(1.1)

where $A\in {\mathrm{\Re }}^{m×n}$, $b\in {\mathrm{\Re }}^{m}$. It can be regarded as a convex relaxation of the following NP-hard discrete optimization problem:

$\underset{x\in {\mathrm{\Re }}^{n}}{min}{\parallel x\parallel }_{0}\quad \text{s.t.}\quad Ax=b,$
(1.2)

where $A\in {\mathrm{\Re }}^{m×n}$, $b\in {\mathrm{\Re }}^{m}$, ${\parallel \cdot \parallel }_{0}$ denotes the number of nonzero elements.

In many practical circumstances, data or signals are represented by matrices, which is convenient for data processing and analysis. However, the data are often corrupted, incomplete, or polluted by noise. Restoring the original data in this case is the central difficulty of the matrix reconstruction problem. By analogy with the treatment of the original signal (in vector form) in compressed sensing, matrix reconstruction refers to exactly or approximately recovering an original compressible or sparsely representable matrix by means of a proper model. The matrix reconstruction problem can be classified into matrix completion (MC) [8, 9] and matrix recovery (MR) [10, 11], and it arises in various areas. A famous application of matrix completion is the Netflix system [12]; the matrix recovery problem originates from face image processing, background modeling and so forth, and it can be applied to image alignment, as by Peng et al. [13]. Generally speaking, the matrix reconstruction problem is written as

$\underset{X}{min}\,\mathrm{rank}\left(X\right)\quad \text{s.t.}\quad X\in C,$
(1.3)

where $X\in {\mathrm{\Re }}^{m×n}$ and C is a convex set. Normally, the following affine constrained minimization problem can be considered:

$\underset{X}{min}\,\mathrm{rank}\left(X\right)\quad \text{s.t.}\quad A\left(X\right)=b,$
(1.4)

where the linear map $A:{\mathrm{\Re }}^{m×n}\to {\mathrm{\Re }}^{d}$, $b\in {\mathrm{\Re }}^{d}$ is a given vector.

This model arises in many fields, such as determining a low-order controller for a plant [14], finding a minimum order linear system realization [15], and solving low-dimensional Euclidean embedding problems [16]. As a natural generalization of the compressed sensing approach, (1.4) can be relaxed to the nuclear norm minimization problem

$\underset{X}{min}{\parallel X\parallel }_{\ast }\quad \text{s.t.}\quad A\left(X\right)=b,$
(1.5)

where ${\parallel \cdot \parallel }_{\ast }$ denotes the sum of singular values of the corresponding matrix. The matrix completion problem

(1.6)

where $X\in {\mathrm{\Re }}^{m×n}$, $M\in {\mathrm{\Re }}^{m×n}$, and Ω is a certain index set, is considered a special case of (1.3). The idea is inspired by the methods presented in, e.g., [17, 18]. Now, we consider the so-called regularized basis pursuit problem

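$\underset{x\in {\mathrm{\Re }}^{n}}{min}{\parallel x\parallel }_{1}+\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}\quad \text{s.t.}\quad Ax=b.$
(1.7)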

Adding a penalty term for the constraint to the objective function of the regularized basis pursuit problem, we obtain

$\underset{x\in {\mathrm{\Re }}^{n}}{min}{\parallel x\parallel }_{1}+\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}.$
(1.8)

Particularly in [18], adding the term $\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}$ in (1.7) yields a tractable objective function that is strictly convex. Thus, the linearly constrained regularized basis pursuit problem has a unique solution and its dual problem is smooth. When μ is sufficiently small, the solution to (1.7) is also a solution of (1.1). This exact regularization property of (1.7) was studied in [19, 20]. Because problem (1.7) has a 2-norm term in the regularizer, it is considered less sensitive to noise than the basis pursuit problem (1.1).

To solve (1.1), a linearized Bregman iteration method was presented in [21], which was motivated by [22]. The idea of the linearized Bregman method is to combine a fixed point iteration with the Bregman method of [23, 24], but the linearized Bregman method converges rather slowly. Hence, many acceleration schemes have been proposed in the literature. For example, the kicking technique was introduced in [21]. Yin [19] verified that the linearized Bregman method is equivalent to a gradient descent method applied to the Lagrangian dual problem of (1.7), and improved the linearized Bregman method by utilizing Barzilai-Borwein step sizes with line search [25], nonlinear conjugate gradient methods, and the limited memory BFGS method [26]. Huang et al. [18] proposed an accelerated linearized Bregman method, which is based on the fact that the linearized Bregman method is equivalent to the gradient descent method applied to a Lagrangian dual of problem (1.7) and on the extrapolation technique adopted in the accelerated proximal gradient methods [27] proposed by Nesterov et al. To solve problem (1.7), Goldstein et al. [28] used an alternating split technique together with its Lagrange dual problem.

Based on these studies, we extend the accelerated Bregman method to solve (1.7), in which the objective function may be non-differentiable but is still well behaved (convex and continuous). We put forward a new update formula based on the accelerated Bregman method. It can be proved that the modified accelerated Bregman method is equivalent to the corresponding accelerated augmented Lagrangian method, and that the latter has a rapid convergence rate, which can be regarded as an improvement on [17].

The rest of this article is organized as follows. In Section 2, we sketch the original Bregman method and the linearized Bregman method which are useful for the subsequent analysis. In Section 3, we introduce the accelerated augmented Lagrangian method (AALM), and we present our modified accelerated Bregman method (MABM). Section 4 is devoted to the convergence of the regularized basis pursuit problem and here we analyze the error bound of the MABM in detail. In Section 5, we give some conclusions and discuss the research plans for our future work.

## 2 The original Bregman method and its linearized form

The Bregman method was introduced into image processing by Osher et al. in [23] for solving total-variation (TV)-based image restoration problems. Let $\phi \left(\cdot \right)$ be a Bregman function. The Bregman distance [29] of the point u to the point v is defined as

${D}_{\phi }^{p}\left(u,v\right):=\phi \left(u\right)-\phi \left(v\right)-〈p,u-v〉,$
(2.1)

where $p\in \partial \phi \left(v\right)$, the subdifferential of φ at v. For problem (1.7), the Bregman iterative regularization procedure recursively solves

${x}^{k+1}:=arg\underset{x}{min}{D}_{\phi }^{{p}^{k}}\left(x,{x}^{k}\right)+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}$
(2.2)

for $k=0,1,\dots$ , starting with ${x}^{0}=0$, ${p}^{0}=0$. Since (2.2) is a convex programming problem, the optimality conditions can be given by

$0\in \partial \phi \left({x}^{k+1}\right)-{p}^{k}-\lambda {A}^{T}\left(b-A{x}^{k+1}\right),$
(2.3)

from which we obtain the updated formula

${p}^{k+1}:={p}^{k}+\lambda {A}^{T}\left(b-A{x}^{k+1}\right).$
(2.4)

Hence, the Bregman iterative scheme is given by

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}{D}_{\phi }^{{p}^{k}}\left(x,{x}^{k}\right)+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2},\\ {p}^{k+1}:={p}^{k}+\lambda {A}^{T}\left(b-A{x}^{k+1}\right).\end{array}$
(2.5)
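To make definition (2.1) concrete, the following minimal Python sketch evaluates the Bregman distance for $\phi \left(x\right)={\parallel x\parallel }_{1}+\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}$, the function used later in (2.8). The helper names `phi`, `subgradient`, and `bregman_distance` are illustrative assumptions and do not come from the paper; at zero entries the sketch picks 0 from the interval $[-1,1]$, which is one valid choice of subgradient.

```python
def phi(x, mu):
    """phi(x) = ||x||_1 + (mu / 2) * ||x||_2^2."""
    return sum(abs(xi) for xi in x) + 0.5 * mu * sum(xi * xi for xi in x)


def subgradient(v, mu):
    """One element p of the subdifferential of phi at v.

    Assumption of this sketch: at v_i = 0 we choose 0 from [-1, 1].
    """
    def sign(t):
        return 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)

    return [sign(vi) + mu * vi for vi in v]


def bregman_distance(u, v, p, mu):
    """D_phi^p(u, v) = phi(u) - phi(v) - <p, u - v>, as in (2.1)."""
    inner = sum(pi * (ui - vi) for pi, ui, vi in zip(p, u, v))
    return phi(u, mu) - phi(v, mu) - inner
```

For instance, with $\mu =0.5$, $u=(1,-1)$ and $v=(1,1)$, the chosen subgradient is $p=(1.5,1.5)$ and the distance equals 3: the 1-norm part contributes 2 and the quadratic part contributes $\frac{\mu }{2}{\parallel u-v\parallel }_{2}^{2}=1$. The distance of a point to itself is 0.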

By the following lemma, we get the equivalent form of (2.5)

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}\phi \left(x\right)+\frac{\lambda }{2}{\parallel Ax-{b}^{k}\parallel }_{2}^{2},\\ {b}^{k+1}:={b}^{k}+\left(b-A{x}^{k+1}\right),\end{array}$
(2.6)

starting with ${x}^{0}=0$, ${b}^{0}=b$.

For the subsequent theoretical analysis, we now establish several equivalence results and give detailed proofs.

Lemma 2.1 The x-update in the first line of the Bregman iterative scheme (2.5) yields the same minimizer as the first line of (2.6) if

${p}^{k}=\lambda {A}^{T}\left({b}^{k}-b\right)$
(2.7)

holds, where λ is a certain positive constant.

Proof From the first formula of (2.5) and the definition of Bregman distance (2.1), we get

$\begin{array}{rcl}{x}^{k+1}& =& arg\underset{x}{min}\phi \left(x\right)-\phi \left({x}^{k}\right)-〈{p}^{k},x-{x}^{k}〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}\\ & =& arg\underset{x}{min}\phi \left(x\right)-〈\lambda {A}^{T}\left({b}^{k}-b\right),x〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}\\ & =& arg\underset{x}{min}\phi \left(x\right)-\lambda 〈{b}^{k}-b,Ax-b〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}+\frac{\lambda }{2}{\parallel {b}^{k}-b\parallel }_{2}^{2}\\ & =& arg\underset{x}{min}\phi \left(x\right)+\frac{\lambda }{2}{\parallel Ax-{b}^{k}\parallel }_{2}^{2},\end{array}$

therefore, we complete the proof. □

From the discussion above, we can give a crucial conclusion.

Theorem 2.2 The original Bregman iterative scheme (2.5) is equivalent to its variant (2.6).

Proof By induction, in fact, we only need to verify that (2.7) holds.

If $k=0$, then ${p}^{0}=\lambda {A}^{T}\left({b}^{0}-b\right)=0$, so (2.7) holds by the initial conditions ${p}^{0}=0$ and ${b}^{0}=b$.

Now, we suppose that (2.7) holds for $k-1$. Then

$\begin{array}{rcl}{p}^{k}& =& {p}^{k-1}+\lambda {A}^{T}\left(b-A{x}^{k}\right)\\ =& \lambda {A}^{T}\left({b}^{k-1}-b\right)+\lambda {A}^{T}\left(b-A{x}^{k}\right)\\ =& \lambda {A}^{T}\left({b}^{k-1}-b+b-A{x}^{k}\right)\\ =& \lambda {A}^{T}\left({b}^{k}-b\right),\end{array}$

where the first equality is from the second line of (2.5), the second equality is from the induction hypothesis, and the last equality is from the second line of (2.6). Therefore, the original Bregman iterative scheme (2.5) is equivalent to its variant (2.6). □

In [24, 30, 31] and the references therein, it was shown that the Bregman iterative method is equivalent to the augmented Lagrangian method. The significance of this statement will be demonstrated in our later analysis in Section 4.

For the following analysis, we define the Lagrangian function of (1.7),

$L\left(x,\sigma \right):=\phi \left(x\right)+〈\sigma ,b-Ax〉,$

where $\sigma \in {\mathrm{\Re }}^{m}$ is the Lagrangian multiplier. The corresponding augmented Lagrangian function can be expressed as

$L\left(x,\sigma ,\lambda \right):=\phi \left(x\right)+〈\sigma ,b-Ax〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2},$
(2.8)

where $\phi \left(x\right):={\parallel x\parallel }_{1}+\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}$ and $\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}$ is the penalty term.

The augmented Lagrangian iterative scheme is

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}\phi \left(x\right)+〈{\sigma }^{k},b-Ax〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2},\\ {\sigma }^{k+1}:={\sigma }^{k}+\lambda \left(b-A{x}^{k+1}\right),\end{array}$
(2.9)

starting from ${\sigma }^{0}=0$.

Lemma 2.3 The iterates $\left\{{x}^{k}\right\}$ generated by the first line of (2.9) and by the first line of (2.6) coincide if

${b}^{k}=b+\frac{1}{\lambda }{\sigma }^{k}$
(2.10)

holds.

Proof From the first formula of (2.6) and (2.10), we get

$\begin{array}{rcl}{x}^{k+1}& =& arg\underset{x}{min}\phi \left(x\right)+\frac{\lambda }{2}{\parallel Ax-b-\frac{{\sigma }^{k}}{\lambda }\parallel }_{2}^{2}\\ =& arg\underset{x}{min}\phi \left(x\right)+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}+〈{\sigma }^{k},b-Ax〉.\end{array}$

The two objective functions differ only by the constant $\frac{1}{2\lambda }{\parallel {\sigma }^{k}\parallel }_{2}^{2}$, which does not affect the minimizer; comparing with the first line of (2.9), we obtain the conclusion. □

Theorem 2.4 The augmented Lagrangian iterative scheme (2.9) is equivalent to the Bregman iterative method variant (2.6).

Proof The proof is analogous to that of Theorem 2.2. By mathematical induction, we only need to show that (2.10) holds.

If $k=0$, then ${b}^{0}=b+\frac{{\sigma }^{0}}{\lambda }$, so (2.10) holds by the initial conditions ${\sigma }^{0}=0$ and ${b}^{0}=b$. Now, we suppose that (2.10) holds for $n\le k-1$. Then, when $n=k$,

$\begin{array}{rcl}{b}^{k}& =& {b}^{k-1}+\left(b-A{x}^{k}\right)=b+\frac{1}{\lambda }{\sigma }^{k-1}+\left(b-A{x}^{k}\right)\\ =& b+\frac{1}{\lambda }{\sigma }^{k-1}+\frac{1}{\lambda }\left({\sigma }^{k}-{\sigma }^{k-1}\right)=b+\frac{1}{\lambda }{\sigma }^{k},\end{array}$

where the first equality is from the second term of (2.6), the second equality has its roots in induction, and the third equality is derived from the second term of (2.9). Therefore, the augmented Lagrangian iterative scheme (2.9) is equivalent to the Bregman iterative method variant (2.6). Moreover, we can get the equivalence of (2.5) and (2.9) from the two theorems above. □
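The equivalence of (2.6) and (2.9) can be checked numerically on a scalar toy problem. The sketch below assumes $n=m=1$, so that $A$ is a single number `a`, and uses $\phi \left(x\right)=|x|+\frac{\mu }{2}{x}^{2}$; in that case both subproblems have the closed-form minimizer $\mathrm{shrink}(g,1)/(\mu +\lambda {a}^{2})$, where $g$ collects the linear terms. The function names and the test values are illustrative assumptions, not part of the paper.

```python
def shrink(v, tau):
    """Scalar soft-thresholding: sign(v) * max(|v| - tau, 0)."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def bregman_variant(a, b, lam, mu, iters):
    """Scheme (2.6) for scalars: minimize |x| + mu/2 x^2 + lam/2 (a x - b_k)^2."""
    bk, xs = b, []
    for _ in range(iters):
        x = shrink(lam * a * bk, 1.0) / (mu + lam * a * a)
        bk = bk + (b - a * x)          # b^{k+1} = b^k + (b - A x^{k+1})
        xs.append(x)
    return xs


def augmented_lagrangian(a, b, lam, mu, iters):
    """Scheme (2.9) for scalars, solved directly from its own optimality condition."""
    sigma, xs = 0.0, []
    for _ in range(iters):
        # minimize |x| + mu/2 x^2 + sigma (b - a x) + lam/2 (a x - b)^2
        x = shrink(a * (sigma + lam * b), 1.0) / (mu + lam * a * a)
        sigma = sigma + lam * (b - a * x)   # multiplier update
        xs.append(x)
    return xs
```

With `a = 2.0`, `b = 1.0`, `lam = 1.0`, `mu = 0.1`, both routines produce the same iterates up to rounding, and the iterates approach the feasible point $x=0.5$, in line with the relation ${b}^{k}=b+{\sigma }^{k}/\lambda$.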

To solve the subproblem (2.2) of the Bregman iterative method, different algorithms were proposed in [27, 32, 33]; however, these methods require many inner iterations to solve the subproblem. To overcome this difficulty, the linearized Bregman method is obtained by linearizing the last term in (2.2) into $〈{A}^{T}\left(A{x}^{k}-b\right),x〉$ (where $\lambda =1$) and adding the ${l}_{2}$-proximity term $\frac{1}{2\gamma }{\parallel x-{x}^{k}\parallel }^{2}$. The concrete scheme is

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}{D}_{\phi }^{{p}^{k}}\left(x,{x}^{k}\right)+〈{A}^{T}\left(A{x}^{k}-b\right),x〉+\frac{1}{2\gamma }{\parallel x-{x}^{k}\parallel }_{2}^{2},\\ {p}^{k+1}:={p}^{k}-{A}^{T}\left(A{x}^{k}-b\right)-\frac{1}{\gamma }\left({x}^{k+1}-{x}^{k}\right).\end{array}$
(2.11)

By the optimality conditions, we obtain the following formula:

$0\in \partial \phi \left({x}^{k+1}\right)-{p}^{k}-{A}^{T}\left(b-A{x}^{k}\right)+\frac{1}{\gamma }\left({x}^{k+1}-{x}^{k}\right).$

That is, the second line of (2.11) guarantees ${p}^{k+1}\in \partial \phi \left({x}^{k+1}\right)$. The linearized Bregman method was analyzed in [7, 19, 21]: when $\gamma <\frac{2}{{\parallel A\parallel }^{2}}$ (where $\parallel \cdot \parallel$ denotes the spectral norm of the corresponding matrix), the iterates of the linearized Bregman method converge to the solution of the regularized basis pursuit problem.

Accelerated Bregman algorithms have appeared in recent years, for example in [18]. However, we argue that the accelerated Bregman method still leaves considerable room for improvement, and we give some improvements on these accelerated Bregman methods. It can be verified that the proposed method has a theoretical convergence rate of $O\left(1/{\left(k+1\right)}^{2}\right)$.

## 3 The modified accelerated Bregman method

In [24], the linearized Bregman algorithm is written as follows.

Algorithm 3.1 (Linearized Bregman method (LBM))

Step 0. Input: $\phi \left(\cdot \right),H\left(\cdot \right),\gamma >0$; initial point: ${x}^{0}$, ${p}^{0}$.

Step 1. Initialize: $k=0$, let ${x}^{0}=0$ and ${p}^{0}=0$.

Step 2. Compute ${x}^{k+1}:=arg{min}_{x}{D}_{\phi }^{{p}^{k}}\left(x,{x}^{k}\right)+〈\mathrm{\nabla }H\left({x}^{k}\right),x〉+\frac{1}{2\gamma }{\parallel x-{x}^{k}\parallel }_{2}^{2}$.

Step 3. Set ${p}^{k+1}:={p}^{k}-\mathrm{\nabla }H\left({x}^{k}\right)-\frac{1}{\gamma }\left({x}^{k+1}-{x}^{k}\right)$.

Step 4. Set $k:=k+1$, go to step 2.

For the iterative scheme above, $H\left(x\right)=\frac{1}{2}{\parallel Ax-b\parallel }_{2}^{2}$. Next, we give an equivalent form in the following lemma.

Lemma 3.2 The linearized Bregman method (LBM) in Algorithm 3.1 is equivalent to the iterative scheme

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}\phi \left(x\right)+\frac{1}{2\gamma }{\parallel x-\gamma {u}^{k}\parallel }_{2}^{2},\\ {u}^{k+1}:={u}^{k}+{A}^{T}\left(b-A{x}^{k+1}\right),\end{array}$
(3.1)

starting from ${u}^{0}={A}^{T}b$, where ${u}^{k}={p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)+\frac{{x}^{k}}{\gamma }$.

Proof From step 3 of Algorithm 3.1, we have

${p}^{k+1}={p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)-\frac{1}{\gamma }\left({x}^{k+1}-{x}^{k}\right)=\cdots =\sum _{i=0}^{k}{A}^{T}\left(b-A{x}^{i}\right)-\frac{1}{\gamma }{x}^{k+1}.$
(3.2)

Thus, we denote

${u}^{k}:={p}^{k+1}+\frac{1}{\gamma }{x}^{k+1}={p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)+\frac{1}{\gamma }{x}^{k}=\sum _{i=0}^{k}{A}^{T}\left(b-A{x}^{i}\right);$

then we simplify step 2 of Algorithm 3.1,

$\begin{array}{rcl}{x}^{k+1}& =& arg\underset{x}{min}\phi \left(x\right)-〈{p}^{k},x〉-〈{A}^{T}\left(b-A{x}^{k}\right),x〉+\frac{1}{2\gamma }{\parallel x-{x}^{k}\parallel }_{2}^{2}\\ & =& arg\underset{x}{min}\phi \left(x\right)+\frac{1}{2\gamma }{\parallel x-\gamma \left[{p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)+\frac{{x}^{k}}{\gamma }\right]\parallel }_{2}^{2}\\ & =& arg\underset{x}{min}\phi \left(x\right)+\frac{1}{2\gamma }{\parallel x-\gamma {u}^{k}\parallel }_{2}^{2}.\end{array}$

On the other hand,

$\begin{array}{rcl}{u}^{k+1}& =& {p}^{k+1}+{A}^{T}\left(b-A{x}^{k+1}\right)+\frac{1}{\gamma }{x}^{k+1}\\ =& {p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)-\frac{1}{\gamma }\left({x}^{k+1}-{x}^{k}\right)+{A}^{T}\left(b-A{x}^{k+1}\right)+\frac{{x}^{k+1}}{\gamma }\\ =& {p}^{k}+{A}^{T}\left(b-A{x}^{k}\right)+\frac{1}{\gamma }{x}^{k}+{A}^{T}\left(b-A{x}^{k+1}\right)\\ =& {u}^{k}+{A}^{T}\left(b-A{x}^{k+1}\right),\end{array}$

where the second equality is from (3.2). So, step 2 and step 3 in Algorithm 3.1 can be rewritten as (3.1). □
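Scheme (3.1) is easy to implement once the x-update has a closed form. The sketch below assumes $\phi \left(x\right)={\parallel x\parallel }_{1}+\frac{\mu }{2}{\parallel x\parallel }_{2}^{2}$, for which the first line of (3.1) reduces componentwise to ${x}^{k+1}=\frac{\gamma }{1+\gamma \mu }\,\mathrm{shrink}\left({u}^{k},1\right)$ with soft-thresholding `shrink`. It is written in plain Python with list-based helpers (`matvec`, `rmatvec`, `linearized_bregman`), all of which are hypothetical names chosen for illustration.

```python
def shrink(v, tau):
    """Componentwise soft-thresholding: sign(v_i) * max(|v_i| - tau, 0)."""
    out = []
    for vi in v:
        if vi > tau:
            out.append(vi - tau)
        elif vi < -tau:
            out.append(vi + tau)
        else:
            out.append(0.0)
    return out


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Compute A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def linearized_bregman(A, b, gamma, mu=0.0, iters=100):
    """Iterate scheme (3.1), starting from u^0 = A^T b."""
    u = rmatvec(A, b)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        # x^{k+1} = argmin phi(x) + 1/(2 gamma) ||x - gamma u^k||^2 (closed form)
        x = [gamma * s / (1.0 + gamma * mu) for s in shrink(u, 1.0)]
        # u^{k+1} = u^k + A^T (b - A x^{k+1})
        residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        u = [ui + gi for ui, gi in zip(u, rmatvec(A, residual))]
    return x
```

As a toy instance, take `A = [[1.0, 1.0]]`, `b = [1.0]`, `mu = 0.0` and `gamma = 0.5`, which satisfies $\gamma <2/{\parallel A\parallel }^{2}=1$. The first iterate is $x=(0,0)$, the second is $x=(0.5,0.5)$, and from then on the residual $b-Ax$ vanishes, so the iterates stay at this feasible point.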

Yin et al. presented various techniques, for example, line search, L-BFGS and BB steps, to accelerate the linearized Bregman method. Later, the following accelerated linearized Bregman algorithm was proposed in [18].

Algorithm 3.3 (Accelerated linearized Bregman method (ALBM))

Step 0. Initialize: ${x}^{0}={\stackrel{˜}{x}}^{0}={p}^{0}={\stackrel{˜}{p}}^{0}=0$, $\gamma >0$, $\lambda >0$, $k=0$.

Step 1. Compute ${x}^{k+1}:=arg{min}_{x}{D}_{\phi }^{{\stackrel{˜}{p}}^{k}}\left(x,{\stackrel{˜}{x}}^{k}\right)+\lambda 〈{A}^{T}\left(A{\stackrel{˜}{x}}^{k}-b\right),x〉+\frac{1}{2\gamma }{\parallel x-{\stackrel{˜}{x}}^{k}\parallel }_{2}^{2}$.

Step 2. Set ${p}^{k+1}:={\stackrel{˜}{p}}^{k}-\lambda {A}^{T}\left(A{\stackrel{˜}{x}}^{k}-b\right)-\frac{1}{\gamma }\left({x}^{k+1}-{\stackrel{˜}{x}}^{k}\right)$.

Step 3. Set ${\stackrel{˜}{x}}^{k+1}:={\alpha }_{k}{x}^{k+1}+\left(1-{\alpha }_{k}\right){x}^{k}$.

Step 4. Set ${\stackrel{˜}{p}}^{k+1}:={\alpha }_{k}{p}^{k+1}+\left(1-{\alpha }_{k}\right){p}^{k}$.

Step 5. Set $k:=k+1$, go to step 1.

These methods motivated us to consider the following algorithm, and its acceleration idea is based on the extrapolation technique proposed by Nesterov in [34, 35]; see also the references therein.

Algorithm 3.4 (Modified accelerated Bregman method (MABM))

Step 0. Initialize: ${x}^{0}={p}^{0}={\stackrel{˜}{p}}^{0}=0$, ${t}_{0}=1$, $\lambda >0$, $k=0$.

Step 1. Compute ${x}^{k+1}:=arg{min}_{x}{D}_{\phi }^{{p}^{k}}\left(x,{x}^{k}\right)+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2}$.

Step 2. Set ${\stackrel{˜}{p}}^{k+1}:={p}^{k}+\lambda {A}^{T}\left(b-A{x}^{k+1}\right)$.

Step 3. Set ${p}^{k+1}:=\left(1-\frac{1-2{t}_{k}}{{t}_{k+1}}\right){\stackrel{˜}{p}}^{k+1}+\frac{1-2{t}_{k}}{{t}_{k+1}}{p}^{k}$.

Step 4. Set ${t}_{k+1}=\frac{1}{3}\left(\sqrt{1+9{t}_{k}^{2}}+1\right)$.

Step 5. Set $k:=k+1$, go to step 1.
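The scalar sequence $\left\{{t}_{k}\right\}$ and the combination weight $\frac{1-2{t}_{k}}{{t}_{k+1}}$ in Algorithm 3.4 can be tabulated independently of the problem data. The short Python sketch below does this; note that an implementation has to evaluate the formula of step 4 before the weight of step 3, because the weight involves ${t}_{k+1}$. The function name `mabm_weights` and the tuple layout are assumptions made for illustration.

```python
import math


def mabm_weights(num_iters, t0=1.0):
    """Return a list of (t_k, t_{k+1}, w_k) with w_k = (1 - 2 t_k) / t_{k+1}.

    t_{k+1} follows step 4 of Algorithm 3.4:
        t_{k+1} = (sqrt(1 + 9 t_k^2) + 1) / 3.
    """
    rows = []
    t = t0
    for _ in range(num_iters):
        t_next = (math.sqrt(1.0 + 9.0 * t * t) + 1.0) / 3.0
        w = (1.0 - 2.0 * t) / t_next
        rows.append((t, t_next, w))
        t = t_next
    return rows
```

Squaring the recursion of step 4 gives the identity ${t}_{k+1}^{2}-\frac{2}{3}{t}_{k+1}={t}_{k}^{2}$, which is a convenient sanity check for the tabulated values. Since ${t}_{k}\ge 1$, every weight ${w}_{k}$ is negative, so step 3 extrapolates beyond ${\stackrel{˜}{p}}^{k+1}$ rather than averaging.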

The basic idea of the equivalence between the MABM and the corresponding AALM can be traced back to [31]. In particular, our update for ${p}^{k}$ is expected to improve on the preliminary iterate ${\stackrel{˜}{p}}^{k}$ [17], since it incorporates information from the previous iterate. In this way, better iterative efficiency can be expected, which is precisely our purpose in modifying the method. We now carry out a series of transformations in preparation for the convergence proof in Section 4.

Lemma 3.5 The MABM in Algorithm 3.4 is equivalent to the following iterative scheme:

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}\phi \left(x\right)+\frac{\lambda }{2}{\parallel Ax-{b}^{k}\parallel }_{2}^{2},\\ {\stackrel{˜}{b}}^{k+1}:={b}^{k}+\left(b-A{x}^{k+1}\right),\\ {b}^{k+1}:=\left(1-\frac{1-2{t}_{k}}{{t}_{k+1}}\right){\stackrel{˜}{b}}^{k+1}+\frac{1-2{t}_{k}}{{t}_{k+1}}{b}^{k},\end{array}$
(3.3)

starting with ${b}^{0}=b$.

Proof Recalling (2.7) and noting Theorem 2.2, we can prove that (2.7) holds for all k by induction.

If $k=0$, ${p}^{0}=\lambda {A}^{T}\left({b}^{0}-b\right)=0$, (2.7) holds by the initial conditions ${p}^{0}=0$ and ${b}^{0}=b$. Now, we assume that (2.7) holds for $n\le k-1$, then

$\begin{array}{rcl}{\stackrel{˜}{p}}^{n}& =& {p}^{n-1}+\lambda {A}^{T}\left(b-A{x}^{n}\right)=\lambda {A}^{T}\left({b}^{n-1}-b\right)+\lambda {A}^{T}\left(b-A{x}^{n}\right)\\ =& \lambda {A}^{T}\left({b}^{n-1}-b+b-A{x}^{n}\right)=\lambda {A}^{T}\left({\stackrel{˜}{b}}^{n}-b\right),\end{array}$
(3.4)

where the first equality is directly derived from step 2 of Algorithm 3.4, the second equality is derived from the induction hypothesis, and the fourth equality utilizes the second line of (3.3).

Moreover, when $n=k$, we have

$\begin{array}{rcl}{p}^{k}& =& \left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right){\stackrel{˜}{p}}^{k}+\frac{1-2{t}_{k-1}}{{t}_{k}}{p}^{k-1}\\ & =& \left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right)\lambda {A}^{T}\left({\stackrel{˜}{b}}^{k}-b\right)+\frac{1-2{t}_{k-1}}{{t}_{k}}\lambda {A}^{T}\left({b}^{k-1}-b\right)\\ & =& \lambda {A}^{T}\left[\left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right){\stackrel{˜}{b}}^{k}+\frac{1-2{t}_{k-1}}{{t}_{k}}{b}^{k-1}-b\right]\\ & =& \lambda {A}^{T}\left({b}^{k}-b\right),\end{array}$

where the first equality is from step 3 of Algorithm 3.4 (with k replaced by $k-1$), the second equality is from (3.4) and the induction hypothesis, and the fourth equality is from the third line of (3.3), so (2.7) holds for all k. Namely, the MABM in Algorithm 3.4 is equivalent to (3.3). □

Lemma 3.6 Iterative scheme (3.3) is equivalent to the following AALM iterative scheme:

$\left\{\begin{array}{l}{x}^{k+1}:=arg{min}_{x}\phi \left(x\right)+〈{\sigma }^{k},b-Ax〉+\frac{\lambda }{2}{\parallel Ax-b\parallel }_{2}^{2},\\ {\stackrel{˜}{\sigma }}^{k+1}:={\sigma }^{k}+\lambda \left(b-A{x}^{k+1}\right),\\ {\sigma }^{k+1}:=\left(1-\frac{1-2{t}_{k}}{{t}_{k+1}}\right){\stackrel{˜}{\sigma }}^{k+1}+\frac{1-2{t}_{k}}{{t}_{k+1}}{\sigma }^{k},\end{array}$
(3.5)

starting from ${\sigma }^{0}=0$, where ${\sigma }^{k}\in {\mathrm{\Re }}^{m}$ is the Lagrangian multiplier.

Proof The argument is similar to that of Theorem 2.4; we only need to verify that (2.10) holds. To this end, we proceed by mathematical induction.

If $k=0$, then ${b}^{0}=b+\frac{{\sigma }^{0}}{\lambda }$, so (2.10) holds by the initial conditions ${\sigma }^{0}=0$ and ${b}^{0}=b$. Suppose that (2.10) holds for $n\le k-1$, then, when $n=k$, we have

$\begin{array}{rcl}{\stackrel{˜}{b}}^{k}& =& {b}^{k-1}+\left(b-A{x}^{k}\right)=b+\frac{1}{\lambda }{\sigma }^{k-1}+\left(b-A{x}^{k}\right)\\ =& b+\frac{1}{\lambda }{\sigma }^{k-1}+\frac{1}{\lambda }\left({\stackrel{˜}{\sigma }}^{k}-{\sigma }^{k-1}\right)=b+\frac{1}{\lambda }{\stackrel{˜}{\sigma }}^{k},\end{array}$

where the first equality is from the second term of (3.3), the second equality stems from induction, and the third equality is derived from the second term of (3.5).

Thus, we get

$\begin{array}{rcl}{b}^{k}& =& \left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right){\stackrel{˜}{b}}^{k}+\frac{1-2{t}_{k-1}}{{t}_{k}}{b}^{k-1}\\ & =& \left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right)\left(b+\frac{1}{\lambda }{\stackrel{˜}{\sigma }}^{k}\right)+\frac{1-2{t}_{k-1}}{{t}_{k}}\left(b+\frac{1}{\lambda }{\sigma }^{k-1}\right)\\ & =& b+\frac{1}{\lambda }\left[\left(1-\frac{1-2{t}_{k-1}}{{t}_{k}}\right){\stackrel{˜}{\sigma }}^{k}+\frac{1-2{t}_{k-1}}{{t}_{k}}{\sigma }^{k-1}\right]\\ & =& b+\frac{1}{\lambda }{\sigma }^{k}.\end{array}$

Therefore, AALM (3.5) is equivalent to iterative scheme (3.3). Moreover, we can get the equivalence of the MABM in Algorithm 3.4 to AALM from the above two lemmas. □

Theorem 3.7 The MABM in Algorithm 3.4 is equivalent to the corresponding AALM (3.5).

## 4 The convergence analysis

A practical challenge for the regularized basis pursuit problem is to offer an efficient method to solve the non-smooth optimization problems. Many algorithms have been proposed in recent years [36]. In these methods, some schemes of approximation that have to do with the non-smooth norm term are usually employed. However, a fast global convergence is difficult to guarantee. Due to the non-smooth nature of the 1-norm, a simple method to solve these problems is the subgradient approach [37], which converges only as $O\left(\frac{1}{\sqrt{k}}\right)$, where k is the iteration counter.

In this paper, we present an efficient method with a fast global convergence rate to solve the regularized basis pursuit problem. In particular, we verify that the basic scheme behaves like an extended gradient algorithm with a convergence rate of $O\left(\frac{1}{k}\right)$, as for smooth problems. Following the Nesterov method for accelerating the gradient method [34, 35], we show that the MABM can be further accelerated to converge as $O\left(1/{\left(k+1\right)}^{2}\right)$.

A series of lemmas in the following are to ensure the convergence rate of the MABM.

Lemma 4.1 Let $\left({x}^{n},{\sigma }^{n}\right)$ be generated by the augmented Lagrangian iterative scheme (2.9), and let $\left({x}^{\ast },{\sigma }^{\ast }\right)$ be a globally optimal solution of the problem

$\underset{x}{min}\phi \left(x\right)\quad \text{s.t.}\quad Ax=b,$
(4.1)

then the inequality

$\phi \left({x}^{k}\right)-\phi \left(x\right)\ge 〈\sigma ,A\left({x}^{k}-x\right)〉$
(4.2)

for any $\left(x,\sigma \right)=\left({x}^{n},{\sigma }^{n}\right)$ and $\left({x}^{\ast },{\sigma }^{\ast }\right)$ holds.

Proof By the optimality conditions, we get

$0\in \partial \phi \left({x}^{n}\right)-{A}^{T}{\sigma }^{n-1}-\lambda {A}^{T}\left(b-A{x}^{n}\right).$

From the second term of (2.9), we obtain

${A}^{T}{\sigma }^{n}\in \partial \phi \left({x}^{n}\right).$
(4.3)

By the definition of subdifferential, we have

$\phi \left({x}^{k}\right)\ge \phi \left({x}^{n}\right)+〈{\sigma }^{n},A\left({x}^{k}-{x}^{n}\right)〉.$

So (4.2) holds for $\left(x,\sigma \right)=\left({x}^{n},{\sigma }^{n}\right)$. Owing to the fact that the solution $\left({x}^{\ast },{\sigma }^{\ast }\right)$ of (4.1) satisfies the KKT conditions, it is easy to obtain

$0\in \partial \phi \left({x}^{\ast }\right)-{A}^{T}{\sigma }^{\ast }\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}A{x}^{\ast }=b.$

Compared with (4.3), inequality (4.2) holds for $\left(x,\sigma \right)=\left({x}^{\ast },{\sigma }^{\ast }\right)$, which completes the proof. □

Lemma 4.2 Let $\left({x}^{k},{\sigma }^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (2.9). For any $\left(x,\sigma \right)$ as in Lemma 4.1, the inequality

$L\left({x}^{k},{\sigma }^{k}\right)-L\left(x,\sigma \right)\ge \frac{1}{\lambda }{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-\frac{1}{\lambda }〈{\sigma }^{k-1}-\sigma ,{\sigma }^{k-1}-{\sigma }^{k}〉$
(4.4)

holds.

Proof By the definition of the Lagrangian function and Lemma 4.1, we get the following result:

$\begin{array}{rcl}L\left({x}^{k},{\sigma }^{k}\right)-L\left(x,\sigma \right)& =& \phi \left({x}^{k}\right)-\phi \left(x\right)-〈\sigma ,b-Ax〉+〈{\sigma }^{k},b-A{x}^{k}〉\\ & \ge & 〈\sigma ,A\left({x}^{k}-x\right)〉-〈\sigma ,b-Ax〉+〈{\sigma }^{k},b-A{x}^{k}〉\\ & =& 〈{\sigma }^{k}-\sigma ,b-A{x}^{k}〉\\ & =& \frac{1}{\lambda }〈{\sigma }^{k}-\sigma ,{\sigma }^{k}-{\sigma }^{k-1}〉\\ & =& \frac{1}{\lambda }{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-\frac{1}{\lambda }〈{\sigma }^{k-1}-\sigma ,{\sigma }^{k-1}-{\sigma }^{k}〉.\end{array}$

The inequality is derived from (4.2), and the third equality follows from the multiplier update in (2.9), that is, $b-A{x}^{k}=\frac{1}{\lambda }\left({\sigma }^{k}-{\sigma }^{k-1}\right)$. □

From the fact that ${A}^{T}{\sigma }^{k}\in \partial \phi \left({x}^{k}\right)$, ${A}^{T}{\sigma }^{\ast }\in \partial \phi \left({x}^{\ast }\right)$, and the definition of subdifferential, we have

$\phi \left({x}^{\ast }\right)\ge \phi \left({x}^{k}\right)+〈{\sigma }^{k},A\left({x}^{\ast }-{x}^{k}\right)〉,$

then

$\phi \left({x}^{\ast }\right)-〈{\sigma }^{\ast },A{x}^{\ast }-b〉\ge \phi \left({x}^{k}\right)-〈{\sigma }^{k},A{x}^{k}-b〉,$

thus

$L\left({x}^{\ast },{\sigma }^{\ast }\right)\ge L\left({x}^{k},{\sigma }^{k}\right),$
(4.5)

where we exploit the fact that $A{x}^{\ast }=b$.

Lemma 4.3 Let $\left({x}^{k},{\sigma }^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (2.9), then

${\parallel {\sigma }^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}\le {\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-2\lambda \left(L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\sigma }^{k}\right)\right).$

Proof Considering (4.4) and replacing $\left(x,\sigma \right)$ with $\left({x}^{\ast },{\sigma }^{\ast }\right)$, we get

$\begin{array}{rcl}{\parallel {\sigma }^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}& =& {\parallel {\sigma }^{k}-{\sigma }^{k-1}+{\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}\\ =& {\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}+2〈{\sigma }^{k}-{\sigma }^{k-1},{\sigma }^{k-1}-{\sigma }^{\ast }〉+{\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}\\ \le & {\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}+{\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}\\ -2{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-2\lambda \left(L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\sigma }^{k}\right)\right)\\ =& {\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-2\lambda \left(L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\sigma }^{k}\right)\right),\end{array}$

which completes the proof. □

Moreover, ${\parallel {\sigma }^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}\le {\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}$ holds due to (4.5), which implies the global convergence of (2.9). By summing the above inequality over $k=1,2,\dots ,n$, we get

$\sum _{k=1}^{n}{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}\le {\parallel {\sigma }^{0}-{\sigma }^{\ast }\parallel }_{2}^{2}.$
(4.6)

Hence,

$\underset{k\to \mathrm{\infty }}{lim}{\parallel {\sigma }^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}=0.$
(4.7)

Theorem 4.4 Let $\left({x}^{k},{\sigma }^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (2.9), then $L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\sigma }^{k}\right)=O\left(\frac{1}{k}\right)$.

Proof Noting (4.4) of Lemma 4.2 and substituting $\left({x}^{n-1},{\sigma }^{n-1}\right)$ for $\left(x,\sigma \right)$, we get

$L\left({x}^{n},{\sigma }^{n}\right)-L\left({x}^{n-1},{\sigma }^{n-1}\right)\ge \frac{1}{\lambda }{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}.$
(4.8)

Multiplying (4.8) by $n-1$ and summing over $n=1,\dots ,k$, we obtain

$\begin{array}{c}\sum _{n=1}^{k}\left\{\left(n-1\right)L\left({x}^{n},{\sigma }^{n}\right)-\left(n-1\right)L\left({x}^{n-1},{\sigma }^{n-1}\right)\right\}\hfill \\ \phantom{\rule{1em}{0ex}}=\sum _{n=1}^{k}\left\{nL\left({x}^{n},{\sigma }^{n}\right)-\left(n-1\right)L\left({x}^{n-1},{\sigma }^{n-1}\right)-L\left({x}^{n},{\sigma }^{n}\right)\right\}\hfill \\ \phantom{\rule{1em}{0ex}}=kL\left({x}^{k},{\sigma }^{k}\right)-\sum _{n=1}^{k}L\left({x}^{n},{\sigma }^{n}\right)\hfill \\ \phantom{\rule{1em}{0ex}}\ge \sum _{n=1}^{k}\frac{n-1}{\lambda }{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}\ge 0.\hfill \end{array}$

From the above inequality, we have

$\sum _{n=1}^{k}L\left({x}^{n},{\sigma }^{n}\right)\le kL\left({x}^{k},{\sigma }^{k}\right).$
(4.9)
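The rearrangement behind (4.9) is a summation-by-parts step, and it can be checked numerically. The sketch below is only an illustration: the list `values` is a hypothetical nondecreasing sequence standing in for $L\left({x}^{n},{\sigma }^{n}\right)$, $n=0,\dots ,k$ (monotonicity is what (4.8) provides), and the function names are not from the paper.

```python
import random

def weighted_difference_sum(a):
    """Return sum_{n=1}^{k} (n-1) * (a[n] - a[n-1]) for a = [a_0, ..., a_k]."""
    k = len(a) - 1
    return sum((n - 1) * (a[n] - a[n - 1]) for n in range(1, k + 1))

def closed_form(a):
    """Return k * a_k - sum_{n=1}^{k} a_n, the rearranged form used in the proof."""
    k = len(a) - 1
    return k * a[k] - sum(a[1:])

random.seed(0)
# Hypothetical nondecreasing values standing in for L(x^n, sigma^n), n = 0..k.
values = [0.0]
for _ in range(10):
    values.append(values[-1] + random.random())

lhs = weighted_difference_sum(values)
rhs = closed_form(values)
print("weighted sum:", lhs, "closed form:", rhs)
```

Both quantities agree up to rounding, and they are nonnegative whenever the sequence is nondecreasing, which is exactly the statement $\sum_{n=1}^{k}L\left({x}^{n},{\sigma }^{n}\right)\le kL\left({x}^{k},{\sigma }^{k}\right)$.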

On the other hand, it follows from Lemma 4.3 that

$L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{n},{\sigma }^{n}\right)\le \frac{1}{2\lambda }\left({\parallel {\sigma }^{n-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}-{\parallel {\sigma }^{n}-{\sigma }^{\ast }\parallel }_{2}^{2}\right).$

Summing the inequality over $n=1,\dots ,k$, we obtain

$\begin{array}{c}\sum _{n=1}^{k}\left(L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{n},{\sigma }^{n}\right)\right)\hfill \\ \phantom{\rule{1em}{0ex}}=-\sum _{n=1}^{k}L\left({x}^{n},{\sigma }^{n}\right)+kL\left({x}^{\ast },{\sigma }^{\ast }\right)\hfill \\ \phantom{\rule{1em}{0ex}}\le \frac{1}{2\lambda }\sum _{n=1}^{k}\left({\parallel {\sigma }^{n-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{n}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}\right)\hfill \\ \phantom{\rule{1em}{0ex}}=\frac{1}{2\lambda }\left({\parallel {\sigma }^{0}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}-\sum _{n=1}^{k}{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}\right).\hfill \end{array}$
(4.10)

Combining (4.9) with (4.10), we get

$\begin{array}{rcl}-kL\left({x}^{k},{\sigma }^{k}\right)+kL\left({x}^{\ast },{\sigma }^{\ast }\right)& \le & -\sum _{n=1}^{k}L\left({x}^{n},{\sigma }^{n}\right)+kL\left({x}^{\ast },{\sigma }^{\ast }\right)\\ & \le & \frac{1}{2\lambda }\left({\parallel {\sigma }^{0}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\sigma }^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}-\sum _{n=1}^{k}{\parallel {\sigma }^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}\right)\\ & \le & \frac{1}{2\lambda }{\parallel {\sigma }^{0}-{\sigma }^{\ast }\parallel }_{2}^{2}.\end{array}$

Consequently, we obtain

$L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\sigma }^{k}\right)\le \frac{1}{2k\lambda }{\parallel {\sigma }^{\ast }-{\sigma }^{0}\parallel }_{2}^{2}=O\left(\frac{1}{k}\right).$

This completes the proof. □

Comparing (2.9) with (3.5) and replacing $\left({x}^{k},{\sigma }^{k}\right)$ with $\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)$, we obtain the following two lemmas, which are the counterparts of Lemmas 4.2 and 4.3.

Lemma 4.5 Let $\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (3.5). For any $\left(x,\sigma \right)$, we have the inequality

$L\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)-L\left(x,\sigma \right)\ge \frac{1}{\lambda }{\parallel {\stackrel{˜}{\sigma }}^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-\frac{1}{\lambda }〈{\sigma }^{k-1}-\sigma ,{\sigma }^{k-1}-{\stackrel{˜}{\sigma }}^{k}〉.$
(4.11)

Lemma 4.6 Let $\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (3.5), then

${\parallel {\stackrel{˜}{\sigma }}^{k}-{\sigma }^{\ast }\parallel }_{2}^{2}\le {\parallel {\sigma }^{k-1}-{\sigma }^{\ast }\parallel }_{2}^{2}-{\parallel {\stackrel{˜}{\sigma }}^{k}-{\sigma }^{k-1}\parallel }_{2}^{2}-2\lambda \left(L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)\right).$
(4.12)

Theorem 4.7 Let $\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)$ be generated by the augmented Lagrangian iterative scheme (3.5), then

$L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)=O\left(1/{\left(k+1\right)}^{2}\right).$

Proof We work from step 4 of Algorithm 3.4, i.e., ${t}_{k+1}=\frac{1}{3}\left(\sqrt{1+9{t}_{k}^{2}}+1\right)$, where ${t}_{0}=1$. By a simple calculation, we have

${t}_{k+1}\left(3{t}_{k+1}-2\right)=3{t}_{k}^{2}$
(4.13)

and

${t}_{k}\ge \frac{k+2}{3}.$
(4.14)
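Both facts can be seen directly from the recursion. Writing it as $3{t}_{k+1}-1=\sqrt{1+9{t}_{k}^{2}}$ and squaring gives (4.13), while $\sqrt{1+9{t}_{k}^{2}}\ge 3{t}_{k}$ gives ${t}_{k+1}\ge {t}_{k}+\frac{1}{3}$, from which (4.14) follows by induction from ${t}_{0}=1$. The short sketch below checks both numerically; the helper name `t_sequence` and the number of terms are illustrative choices, not part of Algorithm 3.4.

```python
import math

def t_sequence(num_terms, t0=1.0):
    """Generate t_0, ..., t_{num_terms-1} via t_{k+1} = (sqrt(1 + 9 t_k^2) + 1) / 3."""
    ts = [t0]
    for _ in range(num_terms - 1):
        ts.append((math.sqrt(1.0 + 9.0 * ts[-1] ** 2) + 1.0) / 3.0)
    return ts

ts = t_sequence(50)

# Largest violation of the identity t_{k+1} (3 t_{k+1} - 2) = 3 t_k^2 from (4.13).
identity_gap = max(
    abs(ts[k + 1] * (3.0 * ts[k + 1] - 2.0) - 3.0 * ts[k] ** 2)
    for k in range(len(ts) - 1)
)

# Lower bound t_k >= (k + 2) / 3 from (4.14).
bound_holds = all(ts[k] >= (k + 2) / 3.0 for k in range(len(ts)))

print("max identity gap:", identity_gap)
print("lower bound holds:", bound_holds)
```

The identity gap stays at the level of floating-point rounding, and the lower bound holds for every computed term.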

Applying Lemma 4.5 with $k=n$, first with $\left(x,\sigma \right)=\left({x}^{n-1},{\stackrel{˜}{\sigma }}^{n-1}\right)$ and then with $\left(x,\sigma \right)=\left({x}^{\ast },{\sigma }^{\ast }\right)$, we get the two inequalities

${w}_{n-1}-{w}_{n}\ge \frac{1}{\lambda }{\parallel {\stackrel{˜}{\sigma }}^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}-\frac{1}{\lambda }〈{\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n-1},{\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}〉,$
(4.15)
$-{w}_{n}\ge \frac{1}{\lambda }{\parallel {\stackrel{˜}{\sigma }}^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}-\frac{1}{\lambda }〈{\sigma }^{n-1}-{\sigma }^{\ast },{\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}〉,$
(4.16)

where

${w}_{n}:=L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{n},{\stackrel{˜}{\sigma }}^{n}\right).$

Multiplying (4.15) by $3{t}_{n}-2$ and (4.16) by 2, and adding the results, we obtain

$\begin{array}{c}\left(3{t}_{n}-2\right){w}_{n}-3{t}_{n}{w}_{n+1}\hfill \\ \phantom{\rule{1em}{0ex}}\ge \frac{3{t}_{n}}{\lambda }{\parallel {\stackrel{˜}{\sigma }}^{n}-{\sigma }^{n-1}\parallel }_{2}^{2}+\frac{1}{\lambda }〈2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-3{t}_{n}{\sigma }^{n-1},{\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}〉.\hfill \end{array}$
(4.17)

For the sake of convenience, we denote

${P}_{n}:=2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-4{t}_{n}{\stackrel{˜}{\sigma }}^{n}+{t}_{n}{\sigma }^{n-1}$

and

${Q}_{n}:=2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-2{t}_{n}{\stackrel{˜}{\sigma }}^{n}-{t}_{n}{\sigma }^{n-1}.$

Multiplying both sides of (4.17) by ${t}_{n}$, we have

$\begin{array}{c}{t}_{n}\left(3{t}_{n}-2\right){w}_{n}-3{t}_{n}^{2}{w}_{n+1}\hfill \\ \phantom{\rule{1em}{0ex}}\ge \frac{3}{\lambda }{\parallel {t}_{n}\left({\stackrel{˜}{\sigma }}^{n}-{\sigma }^{n-1}\right)\parallel }_{2}^{2}\hfill \\ \phantom{\rule{2em}{0ex}}+\frac{1}{\lambda }〈2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-3{t}_{n}{\sigma }^{n-1},{t}_{n}\left({\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}\right)〉\hfill \\ \phantom{\rule{1em}{0ex}}=\frac{1}{\lambda }〈2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-3{t}_{n}{\stackrel{˜}{\sigma }}^{n},{t}_{n}\left({\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}\right)〉\hfill \\ \phantom{\rule{1em}{0ex}}=\frac{1}{4\lambda }{\parallel 2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-4{t}_{n}{\stackrel{˜}{\sigma }}^{n}+{t}_{n}{\sigma }^{n-1}\parallel }_{2}^{2}\hfill \\ \phantom{\rule{2em}{0ex}}-\frac{1}{4\lambda }{\parallel 2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-2{t}_{n}{\stackrel{˜}{\sigma }}^{n}-{t}_{n}{\sigma }^{n-1}\parallel }_{2}^{2}\hfill \\ \phantom{\rule{1em}{0ex}}=\frac{1}{4\lambda }{\parallel {P}_{n}\parallel }_{2}^{2}-\frac{1}{4\lambda }{\parallel {Q}_{n}\parallel }_{2}^{2},\hfill \end{array}$

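where the second equality follows from the identity ${\parallel \alpha +\beta \parallel }_{2}^{2}-{\parallel \alpha -\beta \parallel }_{2}^{2}=4〈\alpha ,\beta 〉$, applied with $\alpha =2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-3{t}_{n}{\stackrel{˜}{\sigma }}^{n}$ and $\beta ={t}_{n}\left({\sigma }^{n-1}-{\stackrel{˜}{\sigma }}^{n}\right)$.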
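As a sanity check on this step, the following sketch evaluates both sides of the identity for random vectors. All numerical values here are hypothetical stand-ins (random vectors for ${\sigma }^{\ast }$, ${\stackrel{˜}{\sigma }}^{n-1}$, ${\stackrel{˜}{\sigma }}^{n}$, ${\sigma }^{n-1}$ and an arbitrary positive `t_n`); only the algebraic relation between $\alpha$, $\beta$, ${P}_{n}$ and ${Q}_{n}$ is being tested.

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i], computed componentwise."""
    return [sum(c * vec[j] for c, vec in zip(coeffs, vectors))
            for j in range(len(vectors[0]))]

random.seed(1)
dim = 5
# Hypothetical stand-ins for sigma^*, sigma-tilde^{n-1}, sigma-tilde^{n}, sigma^{n-1}.
sigma_star, tilde_prev, tilde_cur, sigma_prev = (
    [random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(4)
)
t_n = 1.7  # arbitrary positive value, chosen only for illustration

alpha = combine([2.0, 3.0 * t_n - 2.0, -3.0 * t_n], [sigma_star, tilde_prev, tilde_cur])
beta = combine([t_n, -t_n], [sigma_prev, tilde_cur])

# P_n = alpha + beta and Q_n = alpha - beta, written out as in the definitions above.
P = combine([2.0, 3.0 * t_n - 2.0, -4.0 * t_n, t_n],
            [sigma_star, tilde_prev, tilde_cur, sigma_prev])
Q = combine([2.0, 3.0 * t_n - 2.0, -2.0 * t_n, -t_n],
            [sigma_star, tilde_prev, tilde_cur, sigma_prev])

lhs = dot(alpha, beta)
rhs = (dot(P, P) - dot(Q, Q)) / 4.0
print("inner product:", lhs, "norm difference / 4:", rhs)
```

The two printed values coincide up to rounding, confirming that ${P}_{n}=\alpha +\beta$ and ${Q}_{n}=\alpha -\beta$ are the right pair for the identity.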

Then we have

$\begin{array}{rcl}{P}_{n}-{Q}_{n}& =& \left[2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-4{t}_{n}{\stackrel{˜}{\sigma }}^{n}+{t}_{n}{\sigma }^{n-1}\right]\\ & & -\left[2{\sigma }^{\ast }+\left(3{t}_{n}-2\right){\stackrel{˜}{\sigma }}^{n-1}-2{t}_{n}{\stackrel{˜}{\sigma }}^{n}-{t}_{n}{\sigma }^{n-1}\right]\\ & =& 2{t}_{n}{\sigma }^{n-1}-2{t}_{n}{\stackrel{˜}{\sigma }}^{n}\\ & =& 2{t}_{n}{\sigma }^{n-1}-2{t}_{n}\frac{{t}_{n+1}{\sigma }^{n}-\left(1-2{t}_{n}\right){\sigma }^{n-1}}{{t}_{n+1}+2{t}_{n}-1}\\ & =:& {K}_{1}\left({\sigma }^{n-1}-{\sigma }^{n}\right),\end{array}$

where

${K}_{1}:=\frac{2{t}_{n}{t}_{n+1}}{{t}_{n+1}+2{t}_{n}-1},$

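and the third equality follows from rearranging the third update of the iterative scheme (3.5) to express ${\stackrel{˜}{\sigma }}^{n}$ in terms of ${\sigma }^{n}$ and ${\sigma }^{n-1}$. Thus, using (4.13) on the left-hand side,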

$\begin{array}{rcl}3{t}_{n-1}^{2}{w}_{n}-3{t}_{n}^{2}{w}_{n+1}& \ge & \frac{1}{4\lambda }{\parallel {P}_{n}\parallel }_{2}^{2}-\frac{1}{4\lambda }{\parallel {P}_{n}-\left({P}_{n}-{Q}_{n}\right)\parallel }_{2}^{2}\\ & \ge & \frac{1}{4\lambda }{\parallel {P}_{n}\parallel }_{2}^{2}-\frac{1}{4\lambda }\left({\parallel {P}_{n}\parallel }_{2}^{2}+{\parallel {P}_{n}-{Q}_{n}\parallel }_{2}^{2}\right)\\ & \ge & -\frac{1}{4\lambda }{\parallel {P}_{n}-{Q}_{n}\parallel }_{2}^{2}\\ & =& -\frac{{K}_{1}^{2}}{4\lambda }{\parallel {\sigma }^{n-1}-{\sigma }^{n}\parallel }_{2}^{2}.\end{array}$

Summing the inequality over $n=1,\dots ,k$, we get

$3{t}_{0}^{2}{w}_{1}-3{t}_{k}^{2}{w}_{k+1}\ge -\sum _{n=1}^{k}\frac{{K}_{1}^{2}}{4\lambda }{\parallel {\sigma }^{n-1}-{\sigma }^{n}\parallel }_{2}^{2}.$
(4.18)

Then, combining (4.7) and (4.14) with (4.18), we have

$\begin{array}{rcl}{w}_{k+1}& \le & \frac{3{t}_{0}^{2}{w}_{1}+{\sum }_{n=1}^{k}\frac{{K}_{1}^{2}}{4\lambda }{\parallel {\sigma }^{n-1}-{\sigma }^{n}\parallel }_{2}^{2}}{3{t}_{k}^{2}}\\ & \le & \frac{9{t}_{0}^{2}{w}_{1}+3\epsilon }{{\left(k+2\right)}^{2}},\end{array}$

as k tends to infinity, where ε is an arbitrarily small positive number. Hence,

$L\left({x}^{\ast },{\sigma }^{\ast }\right)-L\left({x}^{k},{\stackrel{˜}{\sigma }}^{k}\right)=O\left(1/{\left(k+1\right)}^{2}\right).$

Thus, we complete the convergence proof. □
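The algebra relating ${P}_{n}-{Q}_{n}$ to ${K}_{1}$ in the proof above can also be checked numerically. The sketch below assumes the expression for ${\stackrel{˜}{\sigma }}^{n}$ that is substituted in the third equality of that computation; the vectors are random hypothetical stand-ins for ${\sigma }^{n-1}$ and ${\sigma }^{n}$, and the two consecutive step parameters are taken from the recursion with ${t}_{0}=1$ purely for illustration.

```python
import math
import random

def next_t(t):
    """One step of the recursion t_{k+1} = (sqrt(1 + 9 t_k^2) + 1) / 3."""
    return (math.sqrt(1.0 + 9.0 * t * t) + 1.0) / 3.0

random.seed(2)
dim = 4
# Hypothetical multiplier vectors standing in for sigma^{n-1} and sigma^{n}.
sigma_prev = [random.uniform(-1.0, 1.0) for _ in range(dim)]
sigma_cur = [random.uniform(-1.0, 1.0) for _ in range(dim)]

t_n = next_t(next_t(1.0))  # two recursion steps from t_0 = 1
t_next = next_t(t_n)       # the following step parameter
denom = t_next + 2.0 * t_n - 1.0

# Extrapolated multiplier, as substituted in the third equality of the computation.
sigma_tilde = [(t_next * c - (1.0 - 2.0 * t_n) * p) / denom
               for c, p in zip(sigma_cur, sigma_prev)]

k1 = 2.0 * t_n * t_next / denom

# P_n - Q_n = 2 t_n (sigma^{n-1} - sigma-tilde^{n}) should equal K_1 (sigma^{n-1} - sigma^{n}).
p_minus_q = [2.0 * t_n * (p - s) for p, s in zip(sigma_prev, sigma_tilde)]
expected = [k1 * (p - c) for p, c in zip(sigma_prev, sigma_cur)]
max_gap = max(abs(u - v) for u, v in zip(p_minus_q, expected))
print("K_1 =", k1, "max componentwise gap:", max_gap)
```

The componentwise gap is at rounding level, so the vector identity holds for this choice of parameters; the denominator is positive because every ${t}_{k}\ge 1$.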

Remark The right-hand side of (4.18) relies on the fact that the penalty factor λ can be chosen as a monotonically increasing sequence (say ${\lambda }_{k}$) that depends on the selection of ${t}_{k}$ in Algorithm 3.4. This choice not only guarantees convergence of the terms in which ${K}_{1}$ is divided by ${\lambda }_{k}$, but also plays a critical role in penalizing violation of the constraint.

## 5 Conclusion

In this paper, we put forward the modified accelerated Bregman method (MABM) for solving the regularized basis pursuit problem. Building on recent literature on the accelerated Bregman method, we propose several improvements and analyze their theoretical feasibility in detail. We show that the proposed MABM has a convergence rate of $O\left(1/{\left(k+1\right)}^{2}\right)$. In future work, we will study how to combine the advantages of LBM with our MABM, in both theory and numerical experiments.

## References

1. Kashin B: The widths of certain finite dimensional sets and classes of smooth functions. Izv. Akad. Nauk SSSR, Ser. Mat. 1977, 41: 334-351.

2. Candès E, Romberg J, Tao T: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52: 489-509.

3. Candès E, Romberg J, Tao T: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 2006, 59: 1207-1223. 10.1002/cpa.20124

4. Candès E, Tao T: Decoding by linear programming. IEEE Trans. Inf. Theory 2005, 51: 4203-4215. 10.1109/TIT.2005.858979

5. Donoho D: Compressed sensing. IEEE Trans. Inf. Theory 2006, 52: 1289-1306.

6. Candès E, Wakin M: An introduction to compressive sampling. IEEE Signal Process. Mag. 2008, 21: 21-30.

7. Cai J, Osher S, Shen Z: Linearized Bregman iterations for compressed sensing. Math. Comput. 2009, 78: 1515-1536. 10.1090/S0025-5718-08-02189-3

8. Cai J, Candès E, Shen Z: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 2010, 20: 1956-1982. 10.1137/080738970

9. Candès E, Recht B: Exact matrix completion via convex optimization. Found. Comput. Math. 2009, 9: 717-772. 10.1007/s10208-009-9045-5

10. Candès E, Li X, Ma Y, Wright J: Robust principal component analysis. Preprint 2009.

11. Figueiredo MAT, Nowak RD, Wright SJ: Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 2007, 1(4): 586-597.

12. Beck A, Teboulle M: Fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2: 183-202. 10.1137/080716542

13. Peng Y, Ganesh A, Wright J, Xu W, Ma Y: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2010.

14. Ghaoui LE, Gahinet P: Rank minimization under LMI constraints: a framework for output feedback problems. Proceedings of the European Control Conference 1993.

15. Fazel M, Hindi H, Boyd SP: A rank minimization heuristic with application to minimum order system approximation. Proceedings of the American Control Conference 2001, 6: 4734-4739.

16. Linial N, London E, Rabinovich Y: The geometry of graphs and some of its algorithmic applications. Combinatorica 1995, 15: 215-245. 10.1007/BF01200757

17. Kang M, Yun S, Woo H, Kang M: Accelerated Bregman method for linearly constrained ${l}_{1}-{l}_{2}$ minimization. J. Sci. Comput. 2012, 56: 515-534.

18. Huang B, Ma SQ, Goldfarb D: Accelerated linearized Bregman method. J. Sci. Comput. 2013, 54: 428-453. 10.1007/s10915-012-9592-9

19. Yin W: Analysis and generalizations of the linearized Bregman method. SIAM J. Imaging Sci. 2010, 3: 856-877. 10.1137/090760350

20. Friedlander M, Tseng P: Exact regularization of convex programs. SIAM J. Optim. 2007, 18: 1326-1350.

21. Osher S, Mao Y, Dong B, Yin W: Fast linearized Bregman iteration for compressive sensing and sparse denoising. Commun. Math. Sci. 2010, 8: 93-111. 10.4310/CMS.2010.v8.n1.a6

22. Darbon J, Osher S: Fast discrete optimization for sparse approximations and deconvolutions. Preprint 2007.

23. Osher S, Burger M, Goldfarb D, Xu J, Yin W: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 2005, 4(2): 460-489. 10.1137/040605412

24. Yin W, Osher S, Goldfarb D, Darbon J: Bregman iterative algorithms for ${\ell }_{1}$-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 2008, 1(1): 143-168. 10.1137/070703983

25. Barzilai J, Borwein J: Two point step size gradient methods. IMA J. Numer. Anal. 1988, 8: 141-148. 10.1093/imanum/8.1.141

26. Liu D, Nocedal J: On the limited memory method for large scale optimization. Math. Program., Ser. B 1989, 45: 503-528. 10.1007/BF01589116

27. Bennett J, Lanning S: The Netflix prize. Proceedings of KDD Cup and Workshop 2007.

28. Goldstein T, Osher S: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2009, 2(2): 323-343. 10.1137/080725891

29. Bregman L: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 1967, 7: 200-217.

30. Powell MJD: A method for nonlinear constraints in minimization problems. In Optimization Edited by: Fletcher R. 1972, 283-298.

31. Rockafellar RT: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1976, 1: 97-116. 10.1287/moor.1.2.97

32. Hale ET, Yin W, Zhang Y: Fixed-point continuation for ${l}_{1}$ minimization: methodology and convergence. SIAM J. Optim. 2008, 19: 1107-1130. 10.1137/070698920

33. Berg EVD, Friedlander MP: Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 2008, 31: 890-912.

34. Nesterov YE: A method for unconstrained convex minimization problem with the rate of convergence $O\left(1/{k}^{2}\right)$. Dokl. Akad. Nauk SSSR 1983, 269: 543-547.

35. Nesterov YE: Introductory Lectures on Convex Optimization, vol. 87. 2004, 220-236.

36. Rennie JDM, Srebro N: Fast maximum margin matrix factorization for collaborative prediction. Proceedings of the International Conference on Machine Learning 2005, 713-719.

37. Bertsekas D: Nonlinear Programming. Athena Scientific, Nashua; 1999.

## Acknowledgements

The project is supported by the Scientific Research Special Fund Project of Fujian University (Grant No. JK2013060), Fujian Natural Science Foundation (Grant No. 2013J01006) and the National Natural Science Foundation of China (Grant No. 11071041).

## Author information

### Corresponding author

Correspondence to Changfeng Ma.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Xie, Y., Ke, Y. & Ma, C. The modified accelerated Bregman method for regularized basis pursuit problem. J Inequal Appl 2014, 130 (2014). https://doi.org/10.1186/1029-242X-2014-130

### Keywords

• compressed sensing
• linear constraint
• Bregman method
• modified accelerated method
• convergence rate