The symmetric ADMM with indefinite proximal regularization and its application

Because it updates the Lagrangian multiplier twice at each iteration, the symmetric alternating direction method of multipliers (S-ADMM) often performs better than other ADMM-type methods. In practical applications, proximal terms with positive definite proximal matrices are often added to its subproblems, and it is well known that a large proximal parameter often results in the 'too-small-step-size' phenomenon. In this paper, we generalize the proximal matrix from positive definite to indefinite, and propose a new S-ADMM with indefinite proximal regularization (termed IPS-ADMM) for two-block separable convex programming with linear constraints. Without any additional assumptions, we prove the global convergence of the IPS-ADMM and analyze its worst-case $\mathcal{O}(1/t)$ convergence rate in an ergodic sense in terms of iteration complexity. Finally, some numerical results are included to illustrate the efficiency of the IPS-ADMM.


Introduction
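Let $\mathbb{R}^{n_i}$ stand for an $n_i$-dimensional Euclidean space, and let $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ be a nonempty, closed and convex set, where $i = 1, 2$. For two continuous closed convex functions $\theta_i(x_i): \mathbb{R}^{n_i} \to \mathbb{R}$ ($i = 1, 2$), the canonical two-block separable convex programming problem with linear equality constraints is

$$\min\{\theta_1(x_1) + \theta_2(x_2) \mid A_1x_1 + A_2x_2 = b,\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2\}, \tag{1}$$

where $A_i \in \mathbb{R}^{m \times n_i}$ ($i = 1, 2$), $b \in \mathbb{R}^m$. Throughout, the solution set of (1) is assumed to be nonempty. Convex programming (1) has promising applicability in modeling many concrete problems arising in a wide range of disciplines, such as statistical learning, inverse problems and image processing; see, e.g., [-] for more details. It has been studied extensively in the literature, and researchers have developed many numerical methods to solve it during the last decades. These are mainly based on the well-known Douglas-Rachford splitting method [, ] and the Peaceman-Rachford splitting method [, ], which originate in the partial differential equation (PDE) literature. Concretely, applying the Douglas-Rachford splitting method to the dual of (1) [, ], we get the well-known alternating direction method of multipliers (ADMM) [, ], whose iterative scheme reads

$$\begin{cases} x_1^{k+1} \in \arg\min_{x_1 \in \mathcal{X}_1} \{\theta_1(x_1) - (\lambda^k)^\top A_1x_1 + \frac{\beta}{2}\|A_1x_1 + A_2x_2^k - b\|^2\}, \\ x_2^{k+1} \in \arg\min_{x_2 \in \mathcal{X}_2} \{\theta_2(x_2) - (\lambda^k)^\top A_2x_2 + \frac{\beta}{2}\|A_1x_1^{k+1} + A_2x_2 - b\|^2\}, \\ \lambda^{k+1} = \lambda^k - s\beta(A_1x_1^{k+1} + A_2x_2^{k+1} - b), \end{cases} \tag{2}$$

where $\lambda \in \mathbb{R}^m$ is the Lagrangian multiplier, $\beta > 0$ is a penalty parameter, and $s \in (0, \frac{1+\sqrt{5}}{2})$ is a relaxation factor. Analogously, applying the Peaceman-Rachford splitting method to the dual of (1), we get the symmetric ADMM [-], which generates its sequence via the scheme

$$\begin{cases} x_1^{k+1} \in \arg\min_{x_1 \in \mathcal{X}_1} \{\theta_1(x_1) - (\lambda^k)^\top A_1x_1 + \frac{\beta}{2}\|A_1x_1 + A_2x_2^k - b\|^2\}, \\ \lambda^{k+\frac12} = \lambda^k - r\beta(A_1x_1^{k+1} + A_2x_2^k - b), \\ x_2^{k+1} \in \arg\min_{x_2 \in \mathcal{X}_2} \{\theta_2(x_2) - (\lambda^{k+\frac12})^\top A_2x_2 + \frac{\beta}{2}\|A_1x_1^{k+1} + A_2x_2 - b\|^2\}, \\ \lambda^{k+1} = \lambda^{k+\frac12} - s\beta(A_1x_1^{k+1} + A_2x_2^{k+1} - b), \end{cases} \tag{3}$$

where the feasible region of $(r, s)$ is

$$\mathcal{D} = \left\{(r, s) \;\middle|\; r \in (-1, 1),\ s \in \left(0, \tfrac{1+\sqrt{5}}{2}\right),\ r + s > 0,\ |r| < 1 + s - s^2\right\}.$$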
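As a quick sanity check on the parameter region, the following minimal sketch tests whether a pair $(r, s)$ belongs to the feasible region, assuming the region is $\{(r, s) \mid r \in (-1, 1),\ s \in (0, (1+\sqrt{5})/2),\ r + s > 0,\ |r| < 1 + s - s^2\}$. The function name `in_domain` is hypothetical and only serves the illustration.

```python
import math

# Upper bound of the second step size s: (1 + sqrt(5)) / 2.
GOLDEN = (1.0 + math.sqrt(5.0)) / 2.0


def in_domain(r: float, s: float) -> bool:
    """Return True if (r, s) lies in the assumed step-size region D."""
    return (
        -1.0 < r < 1.0
        and 0.0 < s < GOLDEN
        and r + s > 0.0
        and abs(r) < 1.0 + s - s * s
    )


# r = 0, s = 1 corresponds to the classical ADMM with unit step.
print(in_domain(0.0, 1.0))
# A large s leaves little room for r, since 1 + s - s^2 shrinks to 0.
print(in_domain(0.9, 1.6))
```

The last condition couples the two step sizes: as $s$ approaches $(1+\sqrt{5})/2$, the admissible interval for $r$ collapses around zero.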
Both methods make full use of the separable structure of (1), and minimize over the primal variables $x_1$ and $x_2$ individually in the Gauss-Seidel way. As elaborated in [], the S-ADMM updates the Lagrangian multiplier twice at each iteration and thus the variables $x_1$, $x_2$ are treated in a symmetric manner. The S-ADMM includes some well-known ADMM-based schemes as special cases. For example, it reduces to the original ADMM (2) when $r = 0$, and reduces to the generalized ADMM [] when $r \in (-1, 1)$, $s = 1$. Therefore, the S-ADMM provides a unified framework to study the ADMM-type methods. The convergence results of the S-ADMM with any $(r, s) \in \mathcal{D}$, including global convergence and the worst-case $\mathcal{O}(1/t)$ convergence rate in an ergodic sense, have been established in []. To the best of the authors' knowledge, the worst-case $\mathcal{O}(1/t)$ convergence rate of the S-ADMM in a nonergodic sense is still missing.

In practical applications, the computation of the S-ADMM is dominated by the two subproblems related to $x_1$ and $x_2$, which are easily solvable in some cases but challenging in others. To address this issue, proximal terms are often added to these subproblems, which can linearize the quadratic term $\frac{\beta}{2}\|A_ix_i\|^2$ ($i = 1, 2$) of these subproblems, and as a result we have the following proximal S-ADMM (termed PS-ADMM) [-]:

$$\begin{cases} x_1^{k+1} \in \arg\min_{x_1 \in \mathcal{X}_1} \{\theta_1(x_1) - (\lambda^k)^\top A_1x_1 + \frac{\beta}{2}\|A_1x_1 + A_2x_2^k - b\|^2\}, \\ \lambda^{k+\frac12} = \lambda^k - r\beta(A_1x_1^{k+1} + A_2x_2^k - b), \\ x_2^{k+1} \in \arg\min_{x_2 \in \mathcal{X}_2} \{\theta_2(x_2) - (\lambda^{k+\frac12})^\top A_2x_2 + \frac{\beta}{2}\|A_1x_1^{k+1} + A_2x_2 - b\|^2 + \frac12\|x_2 - x_2^k\|_G^2\}, \\ \lambda^{k+1} = \lambda^{k+\frac12} - s\beta(A_1x_1^{k+1} + A_2x_2^{k+1} - b), \end{cases} \tag{4}$$

where $G \in \mathbb{R}^{n_2 \times n_2}$ is a positive definite matrix. When we set $G = \tau I_{n_2} - \beta A_2^\top A_2$ with $\tau > \beta\|A_2^\top A_2\|$, the quadratic term $\frac{\beta}{2}\|A_2x_2\|^2$ in the $x_2$-subproblem of the PS-ADMM is offset. Then, if $\mathcal{X}_2 = \mathbb{R}^{n_2}$, the PS-ADMM only needs to compute the proximal mapping of the involved convex function $\theta_2(\cdot)$ at each iteration, which is often simple enough to have a closed-form solution in many practical applications, such as $\theta_2(x_2) = \|x_2\|_1$ in compressive sensing problems [] and $\theta_2(x_2) = \|x_2\|_*$ (here $x_2$ is a square matrix) in robust principal component analysis models [], where $\|x_2\|_*$ is defined as the sum of all singular values of $x_2$.
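The reduction of the $x_2$-subproblem to a proximal mapping can be made explicit by a short computation. The derivation below is a sketch under stated assumptions: the multiplier enters the augmented Lagrangian through the term $-\lambda^\top(A_1x_1 + A_2x_2 - b)$, $\lambda^{k+\frac12}$ denotes the multiplier after its first update, $G = \tau I_{n_2} - \beta A_2^\top A_2$, and $\rho^k$, $q^k$ are shorthand symbols introduced here for illustration only.

$$\begin{aligned} &\tfrac{\beta}{2}\|A_1x_1^{k+1} + A_2x_2 - b\|^2 + \tfrac12\|x_2 - x_2^k\|_G^2 \\ &\quad = \tfrac{\tau}{2}\|x_2 - x_2^k\|^2 + \beta\,(x_2 - x_2^k)^\top A_2^\top \rho^k + \tfrac{\beta}{2}\|\rho^k\|^2, \qquad \rho^k := A_1x_1^{k+1} + A_2x_2^k - b, \end{aligned}$$

because the two terms $\pm\frac{\beta}{2}(x_2 - x_2^k)^\top A_2^\top A_2 (x_2 - x_2^k)$ cancel. Adding $\theta_2(x_2) - (\lambda^{k+\frac12})^\top A_2x_2$ and completing the square gives, for $\mathcal{X}_2 = \mathbb{R}^{n_2}$,

$$x_2^{k+1} = \arg\min_{x_2}\Big\{\theta_2(x_2) + \tfrac{\tau}{2}\big\|x_2 - \big(x_2^k - \tfrac{1}{\tau}q^k\big)\big\|^2\Big\}, \qquad q^k := A_2^\top\big(\beta\rho^k - \lambda^{k+\frac12}\big),$$

that is, the proximal mapping of $\theta_2/\tau$ evaluated at $x_2^k - q^k/\tau$. The completion of the square only requires $\tau > 0$, so the same reduction holds when $G$ is indefinite.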
The curse accompanying the above improvement in solvability is that the proximal parameter $\tau$ is not easy to determine for some problems in practice. A large $\tau$ increases the weight of the quadratic term $\frac12\|x_2 - x_2^k\|_G^2$ in the objective function of the $x_2$-subproblem and inevitably results in the 'too-small-step-size' phenomenon. The advance of $x_2$ at the $k$th iteration is then tiny, which often slows down the convergence of the corresponding method. Therefore, it is meaningful to expand the feasible set of $\tau$. Obviously, if we further reduce $\tau$ to $\tau \le \beta\|A_2^\top A_2\|$, the proximal matrix $G$ is no longer positive definite and is in general indefinite, and it is thus natural to ask whether the corresponding method with such $G$ is still globally convergent. Quite recently the authors in [-] partially answered the question. More specifically, for the ADMM (2) with $s = 1$, He et al. [] have proved that the feasible set of $\tau$ can be expanded to $\{\tau \mid \tau > 0.75\beta\|A_2^\top A_2\|\}$; for the ADMM (2) with $s \in (0, \frac{1+\sqrt{5}}{2})$, Sun et al. [] have proved that the feasible set of $\tau$ can be expanded to a set whose lower bound is a multiple of $\beta\|A_2^\top A_2\|$ depending on $\min\{s, 1 + s - s^2\}$. Then, for the S-ADMM with $r \in (-1, 1)$, $s = 1$, Gao et al. [] have proved that the feasible set of $\tau$ can be expanded to a set whose lower bound is $\beta\|A_2^\top A_2\|$ scaled by a ratio of two quadratic polynomials in $r$. Other relevant studies can be found in [, ]. In this paper, we continue to study along this direction, and present a new feasible set of $\tau$, which generalizes those in [-] to any $(r, s) \in \mathcal{D}$. Furthermore, we show that for any $(r, s) \in \mathcal{D}$, the global convergence of the S-ADMM with some indefinite proximal regularization can be guaranteed.
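The loss of positive definiteness can be seen on a tiny example. The sketch below assumes $G = \tau I - \beta A^\top A$ for a $2 \times 2$ matrix $A$ and takes $\|A^\top A\|$ to be the spectral norm; the matrix $A$, the factors 1.1 and 0.8, and the helper names are illustrative choices, not values from the paper.

```python
import math


def gram_eigs(A):
    """Smallest and largest eigenvalue of A^T A for a 2x2 matrix A."""
    (a11, a12), (a21, a22) = A
    # Entries of the symmetric Gram matrix A^T A.
    g11 = a11 * a11 + a21 * a21
    g12 = a11 * a12 + a21 * a22
    g22 = a12 * a12 + a22 * a22
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = 0.5 * (g11 + g22)
    rad = math.hypot(0.5 * (g11 - g22), g12)
    return mid - rad, mid + rad


def prox_matrix_eigs(A, beta, tau):
    """Smallest and largest eigenvalue of G = tau*I - beta*A^T A."""
    lo, hi = gram_eigs(A)
    return tau - beta * hi, tau - beta * lo


A = [[1.0, 1.0], [0.0, 1.0]]
beta = 1.0
_, gram_norm = gram_eigs(A)  # spectral norm of A^T A
for factor in (1.1, 0.8):  # illustrative multiples of beta * ||A^T A||
    g_min, g_max = prox_matrix_eigs(A, beta, factor * beta * gram_norm)
    kind = "positive definite" if g_min > 0 else "indefinite"
    print(f"tau = {factor} * beta * ||A^T A||: "
          f"eig(G) in [{g_min:.3f}, {g_max:.3f}] -> {kind}")
```

With the smaller factor, $G$ has one negative and one positive eigenvalue, so $\|x\|_G^2 = x^\top G x$ can take either sign.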
The rest of the paper is organized as follows. In Section 2, we summarize some preliminaries which are useful for further discussion. Then, in Section 3, we list the iterative scheme of the IPS-ADMM and prove its convergence results, including the global convergence and the convergence rate. Some preliminary numerical results are reported in Section 4. Finally, some conclusions are drawn in Section 5.

Preliminaries
In this section, we first list some notation used in this paper, and then characterize problem (1) by a mixed variational inequality problem. Some matrices and variables to simplify the notation of our later analysis are also defined.
For any two vectors $x, y \in \mathbb{R}^n$, $\langle x, y \rangle$ or $x^\top y$ denotes their inner product. For any two matrices $A \in \mathbb{R}^{s \times m}$, $B \in \mathbb{R}^{n \times s}$, the Kronecker product of $A$ and $B$ is defined as $A \otimes B = (a_{ij}B)$. We let $\|\cdot\|_1$ and $\|\cdot\|$ be the $\ell_1$-norm and $\ell_2$-norm for vector variables, respectively. $I_n$ denotes the $n \times n$ identity matrix. If the matrix $G \in \mathbb{R}^{n \times n}$ is symmetric, we use the symbol $\|x\|_G^2$ to denote $x^\top Gx$ even if $G$ is indefinite; $G \succ 0$ (resp., $G \succeq 0$) denotes that the matrix $G$ is positive definite (resp., semi-definite).
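To illustrate the block definition $A \otimes B = (a_{ij}B)$, here is a minimal pure-Python sketch; the function name `kron` and the list-of-rows matrix representation are choices made for this illustration.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows.

    Block (i, j) of the result equals A[i][j] * B.
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
for row in kron(A, B):
    print(row)
```

Each row of the result pairs one row of `A` with one row of `B`, so the output has `len(A) * len(B)` rows and `len(A[0]) * len(B[0])` columns.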
Let us split the feasible set $\mathcal{D}$ of the parameters $(r, s)$ into the following five subsets: Obviously, the set $\{\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3, \mathcal{D}_4, \mathcal{D}_5\}$ is a simplicial partition of the set $\mathcal{D}$. Throughout, the proximal matrix $G$ is defined by

$$G = \tau I_{n_2} - \beta A_2^\top A_2,$$

where we set $\tau = \alpha\bar{\tau}$ with $\bar{\tau} > \beta\|A_2^\top A_2\|$, $\alpha \in (c(r, s), +\infty)$, and $c(r, s)$ is defined by which provides more choices for researchers or practitioners.
Furthermore, we define an auxiliary matrix as follows:

Invoking the first-order optimality condition for convex programming, we get the following equivalent form of problem (1): find a vector $w^* \in \mathcal{W}$ such that

$$\theta(x) - \theta(x^*) + (w - w^*)^\top F(w^*) \ge 0, \quad \forall w \in \mathcal{W},$$

with $x = (x_1, x_2)$, $w = (x_1, x_2, \lambda)$, $\theta(x) = \theta_1(x_1) + \theta_2(x_2)$, $\mathcal{W} = \mathcal{X}_1 \times \mathcal{X}_2 \times \mathbb{R}^m$ and

$$F(w) = \begin{pmatrix} -A_1^\top\lambda \\ -A_2^\top\lambda \\ A_1x_1 + A_2x_2 - b \end{pmatrix}.$$

Obviously, the problem () is a mixed variational inequality problem, which is denoted by MVI($\theta$, $F$, $\mathcal{W}$). The mapping $F(w)$ defined in () is not only monotone, but also satisfies the property

$$(w' - w)^\top(F(w') - F(w)) = 0, \quad \forall w, w' \in \mathcal{W}.$$

Furthermore, the solution set of MVI($\theta$, $F$, $\mathcal{W}$), denoted by $\mathcal{W}^*$, is nonempty under the nonempty assumption for the solution set of problem (1).

Now, let us define three matrices in order to make our following analysis more succinct. Set Then, the matrices $M$, $Q$, $H$ defined, respectively, in (), () satisfy

Proof The proof of () is trivial, and we only need to prove (). By the positive definiteness of $P$, we only need to prove that $H(2{:}3, 2{:}3)$ is positive definite. Here $H(2{:}3, 2{:}3)$ denotes the corresponding sub-matrix formed from the rows and columns with the indices $(2{:}3)$ and $(2{:}3)$ as in Matlab. Substituting () into the right-hand side of (), we get where the relationship comes from $\alpha > 0$ and $\bar{\tau} > \beta\|A_2^\top A_2\|$. Since the matrix $A_2$ is full column rank, we only need to prove the positive definiteness of the matrix which can be further written as where $\otimes$ denotes the matrix Kronecker product. Then, we only need to show the 2-by-2 matrix Therefore, the matrix $H$ is positive definite. The proof is completed.
At the end of this section, let us summarize two criteria to measure the worst-case $\mathcal{O}(1/t)$ convergence rate of the ADMM-type methods in an ergodic sense.
(1) For a given compact set $\tilde{D} \subset \mathbb{R}^{m+n}$, let $d = \sup\{\|w - w^0\| \mid w \in \tilde{D}\}$, where $w^0$ is the initial iterate. He et al. [] established the following criterion: where $w_t = \frac{1}{t+1}\sum_{k=0}^{t} w^k$, $C > 0$, and $t$ is the iteration counter. This criterion is used in [, ]. Obviously, we can only ensure that any $w \in \tilde{D}$ satisfies (). Therefore, the criterion () is not reasonable.
where $c > 0$. Proposition  in [] indicates that the vector $x_t \in \mathcal{X}_1 \times \mathcal{X}_2$ is an optimal solution to (1) if and only if the left-hand side of () equals zero. Compared with (), the criterion () is more reasonable. Therefore, we shall use a criterion similar to () to measure the $\mathcal{O}(1/t)$ convergence rate of our new method.
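Both criteria are stated for an averaged iterate rather than the last one. As a minimal sketch, assuming the ergodic point is the plain average $w_t = \frac{1}{t+1}\sum_{k=0}^{t} w^k$ of the first $t+1$ iterates (whether the original or an auxiliary sequence is averaged is not fixed here), the running mean can be maintained without storing the history. The generator name `ergodic_averages` is hypothetical.

```python
def ergodic_averages(iterates):
    """Yield w_t = (1/(t+1)) * sum_{k=0}^{t} w^k for t = 0, 1, ...

    `iterates` is any iterable of equally long lists of floats.
    """
    running = None
    for t, w in enumerate(iterates):
        if running is None:
            running = list(w)
        else:
            # Incremental mean update: avoids keeping all past iterates.
            running = [m + (x - m) / (t + 1) for m, x in zip(running, w)]
        yield list(running)


history = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
for t, w_bar in enumerate(ergodic_averages(history)):
    print(t, w_bar)
```

An ergodic rate bounds the quality of `w_bar`, which may behave quite differently from the latest iterate.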

Algorithm and convergence results
In this section, we first present the symmetric ADMM with indefinite proximal regularization (termed IPS-ADMM), and then prove the convergence results of the sequence generated by the IPS-ADMM.
Step 1. Compute the new iterate $w^{k+1} = (x_1^{k+1}, x_2^{k+1}, \lambda^{k+1})$ by the following iterative scheme: ()
Step 2. If $\|w^k - w^{k+1}\| \le \varepsilon$, then stop; otherwise set $k := k + 1$, and go to Step 1.
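The two steps can be tried out on a toy problem. The sketch below is not the paper's scheme verbatim: it assumes the symmetric update order (first block, first multiplier update with factor $r$, second block with the extra term $\frac12\|x_2 - x_2^k\|_G^2$, second multiplier update with factor $s$), omits any proximal term on the first block, and applies it to the scalar problem $\min \frac12(x_1 - a)^2 + \frac12(x_2 - c)^2$ subject to $x_1 + x_2 = b$. The values $r = s = 0.5$ and $\alpha = 0.9$ are illustrative and are not checked against the bound $c(r, s)$.

```python
def toy_symmetric_admm(a, c, b, beta=1.0, r=0.5, s=0.5, alpha=0.9,
                       eps=1e-12, max_iter=500):
    """Toy run: min 0.5*(x1-a)^2 + 0.5*(x2-c)^2  s.t.  x1 + x2 = b."""
    tau_bar = 1.01 * beta        # tau_bar > beta * ||A2^T A2|| = beta here
    g = alpha * tau_bar - beta   # scalar G = tau - beta; negative for alpha = 0.9
    x1, x2, lam = 0.0, 0.0, 0.0
    iters = 0
    for k in range(max_iter):
        iters = k + 1
        # x1-subproblem in closed form (no proximal term in this sketch).
        x1_new = (a + lam - beta * (x2 - b)) / (1.0 + beta)
        # First multiplier update, factor r.
        lam_half = lam - r * beta * (x1_new + x2 - b)
        # x2-subproblem with the extra term 0.5 * g * (x2 - x2_old)^2.
        x2_new = (c + lam_half - beta * (x1_new - b) + g * x2) / (1.0 + beta + g)
        # Second multiplier update, factor s.
        lam_new = lam_half - s * beta * (x1_new + x2_new - b)
        # Stopping test ||w^k - w^{k+1}|| <= eps.
        step = ((x1_new - x1) ** 2 + (x2_new - x2) ** 2
                + (lam_new - lam) ** 2) ** 0.5
        x1, x2, lam = x1_new, x2_new, lam_new
        if step <= eps:
            break
    return x1, x2, lam, iters


print(toy_symmetric_admm(1.0, 2.0, 5.0))
```

For $a = 1$, $c = 2$, $b = 5$ the constrained minimizer is $x_1 = 2$, $x_2 = 3$ with multiplier $\lambda = 1$, which gives a simple reference point for the iterates even though the scalar weight `g` is negative.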
Remark . Since the global convergence of the IPS-ADMM with $\alpha \ge 1$ has been established in the literature [, -], in the following, we restrict $\alpha \in (c(r, s), 1)$.
To prove the convergence results of the IPS-ADMM, we first define a block matrix and an auxiliary variable.
Proof The proof of this lemma is similar to that of Lemma . and Theorem . in [], which is omitted.
Remark . By the definition of $F(\cdot)$ in (), (), for any $(x_1, x_2, \lambda) \in \mathbb{R}^{m+n_1+n_2}$ such that $A_1x_1 + A_2x_2 = b$, the left-hand side of () can be written as Then, substituting the above equality into the left-hand side of (), we get Comparing all the terms appearing in () and (), we find that the left-hand side of () does not yet contain the term $\|Ax^{k+1} - b\|^2$, and, due to the indefiniteness of $R$, the term $\|v^k - \tilde{v}^k\|_R^2$ on the right-hand side of () may be negative. Now let us deal with the term $\|v^k - \tilde{v}^k\|_R^2$; in doing so, the term $\|Ax^{k+1} - b\|^2$ will also appear. By a manipulation, we get the concrete expression of the matrix $R$, which is as follows:

Proof The proof of this lemma is similar to that of Lemma . in [], which is omitted.
The following lemma deals with the crossing term $(Ax^{k+1} - b)^\top A_2(x_2^{k+1} - x_2^k)$ on the right-hand side of (), whose proof is mainly motivated by those of Lemma . in [] and Lemma . in [].
Let $\{(x_1^k, x_2^k, \lambda^k)\}$ be the sequence generated by the IPS-ADMM. Then we have

Proof The first-order optimality condition of the $x_2$-subproblem in () indicates that, for any Similarly, taking $x_2 = x_2^{k+1}$ in () for $k := k - 1$, we have Then, adding the above two inequalities, we get From the update formula for $\lambda$ in (), we have Substituting the above equality into the left-hand side of (), we get By the definitions of G and G  (see () and ()), we have where the last inequality comes from the Cauchy-Schwarz inequality. Substituting the above inequality into the right-hand side of () and rearranging terms, we get the assertion () immediately.
Then, substituting () into the right-hand side of (), we get the following main theorem, which provides a lower bound of $\|w^k - \tilde{w}^k\|_R^2$. The lower bound is composed of the term $\|Ax^{k+1} - b\|^2$, some terms of the form $\|w - w^{k+1}\|^2 - \|w - w^k\|^2$, and some others.

Let $\{(x_1^k, x_2^k, \lambda^k)\}$ be the sequence generated by the IPS-ADMM. Then we have
Now, let us rewrite all the terms on the right-hand side of () by some quadratic terms, and mainly deal with the term x k x k+   G  and the crossing term (Ax kb) A  (x k x k+  ). According to the simplicial partition $\mathcal{D}_i$ ($i = 1, 2, \ldots, 5$) of the set $\mathcal{D}$ in (), the following analysis is divided into five cases, which are discussed in the following five subsections.

Case 1: $(r, s) \in \mathcal{D}_1$
Proof We prove the assertion () from the definition of the matrix R directly. Define an auxiliary matrix R  as By the expression of R in (), we have which can be written as Obviously, when α > α  , the above matrix is positive definite. Therefore, the matrix S is positive definite, and then the matrices R and R  are both positive definite by the full column rank of A  and the positive definiteness of P. By a manipulation, we get By the positive definiteness of the matrix S, we get the assertion (). By the definitions of α  and α  , we have Therefore, α  > α  , for any (r, s) ∈ D  . By some manipulations, we have Therefore, α  ∈ (α  , ), for any (r, s) ∈ D  .
Remark . For any (r, s) ∈ D  , Gao et al. [] have proved that α G . = r  -r+ r  -r+ is a lower bound of α. The curves of α  and α G with r ∈ (-, ) are drawn in Figure , from which we have α  < α G if r ∈ (-, ), and α  > α G if r ∈ (, ). Therefore, compared with that in [], the feasible set of τ in this paper is expanded if r ∈ (-, ), and is shrunk if r ∈ (, ). However, Gao et al. only established the worst-case convergence rate of the IPS-ADMM using the criterion (), and we shall prove the worst-case convergence rate of the IPS-ADMM using the more reasonable criterion (); see the following Theorem ..
Proof By the Cauchy-Schwarz inequality, we have Then, substituting the above inequality into the right-hand side of (), we get which proves (). From the definition of T  , α ∈ (α  , ), (r, s) ∈ D  , it is easy to verify that C  , C  , C  , C  > . From the definition of C  , we get Furthermore, by some manipulations, ∀(r, s) ∈ D  , we have Therefore, α  ∈ (α  , ), for any (r, s) ∈ D  .
Proof By the Cauchy-Schwarz inequality, we have Then, substituting the above inequality into the right-hand side of (), we get which proves (). From the definition of T  , α ∈ (α  , ), (r, s) ∈ D  , it is easy to verify that C  , C  , C  , C  > . From the definition of C  , for any (r, s) ∈ D  , we get By the definition of α  , for any (r, s) ∈ D  , we have where the first inequality follows from $s^2 < 1 + r + s$, and the second inequality comes from r < , $s \in (0, \frac{1+\sqrt{5}}{2})$, r < s+  . By some manipulations, we obtain Therefore, α  ∈ (α  , ), for any (r, s) ∈ D  .
In the remainder of this section, we shall establish the convergence results of the sequence generated by the IPS-ADMM. First, based on () and Lemmas .-., we can get the following theorem.
 , x k  , λ k )} be the sequence generated by the IPS-ADMM. Then, for any (r, s) ∈ D, α ∈ (c(r, s), ), where c(r, s) is defined in (), we have With the above theorems in hand, now we are ready to prove the global convergence of the IPS-ADMM.
Let $\{(x_1^k, x_2^k, \lambda^k)\}$ be the sequence generated by the IPS-ADMM. Then, if $A_1$, $A_2$ are both full column rank, the sequence $\{(x^k, \lambda^k)\}$ is bounded and converges to a point $(x^\infty, \lambda^\infty) \in \mathcal{W}^*$.

Numerical results
In this section, we demonstrate the promising numerical behavior of the IPS-ADMM in solving an image restoration problem: the total-variational denoising problem. All the codes were written in Matlab Ra and all the numerical experiments were conducted on a THINKPAD notebook with Pentium(R) Dual-Core CPU@. GHz and  GB RAM.
Below, we consider the total-variational (TV) denoising problem []: where $D = [D_1; D_2]$ is a discrete gradient operator with $D_1: \mathbb{R}^n \to \mathbb{R}^n$, $D_2: \mathbb{R}^n \to \mathbb{R}^n$ being the finite-difference operators in the horizontal and vertical directions, respectively; $\eta > 0$ is the regularization parameter. Here, we set η = .
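For concreteness, the action of a discrete gradient operator of this kind can be sketched in a few lines. The version below assumes forward differences with periodic boundary conditions on a small image stored as a list of rows; the paper does not specify the boundary treatment, and the helper names are hypothetical. The anisotropic total variation (the $\ell_1$-norm of all differences) is shown only as one common way to use the operator.

```python
def discrete_gradient(img):
    """Return (horizontal, vertical) forward differences of a 2-D image.

    Periodic boundary conditions are an assumption of this sketch.
    """
    rows, cols = len(img), len(img[0])
    dx = [[img[i][(j + 1) % cols] - img[i][j] for j in range(cols)]
          for i in range(rows)]
    dy = [[img[(i + 1) % rows][j] - img[i][j] for j in range(cols)]
          for i in range(rows)]
    return dx, dy


def anisotropic_tv(img):
    """Sum of absolute horizontal and vertical differences."""
    dx, dy = discrete_gradient(img)
    return (sum(abs(v) for row in dx for v in row)
            + sum(abs(v) for row in dy for v in row))


image = [[0.0, 1.0], [0.0, 1.0]]
print(discrete_gradient(image))
print(anisotropic_tv(image))
```

A constant image has zero total variation, while each jump between neighbouring pixels contributes its magnitude.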
Introducing an auxiliary variable $x \in \mathbb{R}^{2n}$, we can reformulate () as Obviously, () is a special case of (1), and therefore the IPS-ADMM is applicable. Now, let us elaborate on how to derive the closed-form solutions for the subproblems arising in the IPS-ADMM. Set $P = \tau_1 I_{2n}$, $G = \alpha\tau_2 I_n - \beta D^\top D$. For given $(x^k, y^k, \lambda^k)$, the first subproblem is which has a closed-form solution:

$$x^{k+1} = \operatorname{shrink}\left(\frac{\tau_1 x^k + \beta Dy^k + \lambda^k}{\beta + \tau_1},\ \frac{\eta}{\beta + \tau_1}\right).$$
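The closed-form update of the first subproblem can be sketched as follows, assuming that shrink denotes componentwise soft-thresholding, $\operatorname{shrink}(v, \kappa)_i = \operatorname{sign}(v_i)\max\{|v_i| - \kappa, 0\}$; an isotropic (group-wise) variant would differ. The function `x_update` and its argument names are hypothetical, and `Dy` stands for the precomputed vector $Dy^k$.

```python
def shrink(v, kappa):
    """Componentwise soft-thresholding: sign(v_i) * max(|v_i| - kappa, 0)."""
    return [max(vi - kappa, 0.0) + min(vi + kappa, 0.0) for vi in v]


def x_update(x_old, Dy, lam, beta, tau1, eta):
    """Evaluate shrink((tau1*x_old + beta*Dy + lam)/(beta+tau1), eta/(beta+tau1))."""
    denom = beta + tau1
    center = [(tau1 * xo + beta * d + l) / denom
              for xo, d, l in zip(x_old, Dy, lam)]
    return shrink(center, eta / denom)


print(shrink([3.0, -0.5, -2.0], 1.0))
print(x_update([0.0], [4.0], [0.0], beta=1.0, tau1=1.0, eta=1.0))
```

Entries whose magnitude is below the threshold are set to zero, which is what makes the update cheap and promotes sparsity of the auxiliary variable.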
For given $x^{k+1}$, $y^k$, $\lambda^{k+\frac12}$, the third subproblem is which has a closed-form solution: For the IPS-ADMM, we set β = , τ  = ., τ  = .β D D , α = .c(r, s). For the PS-ADMM, we set $G = \tau_2 I_n - \beta D^\top D$. The initialization is chosen as x  = , y  = b, λ  = . The stopping criterion is the same as that in []: $\|x^{k+1} - Dy^{k+1}\| \le \epsilon^{\mathrm{pri}}$ and $\|\beta D^\top(y^{k+1} - y^k)\| \le \epsilon^{\mathrm{dual}}$, where $\epsilon^{\mathrm{pri}} = \sqrt{n}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\max\{\|x^{k+1}\|, \|Dy^{k+1}\|\}$, and $\epsilon^{\mathrm{dual}} = \sqrt{n}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\|y^{k+1}\|$ with abs =  - and rel =  - . We use the following Matlab scripts to generate some synthetic data

Conclusions
In this paper, a symmetric ADMM with indefinite proximal regularization for two-block linearly constrained convex programming is proposed. Under mild conditions, we have established the global convergence and the worst-case $\mathcal{O}(1/t)$ convergence rate in an ergodic sense of the new method. Some numerical results are given, which illustrate that the new method often performs better than its counterpart with positive definite proximal regularization. Note that this paper only discusses the symmetric ADMM with indefinite proximal regularization for two-block separable convex problems. In the future, we shall study ADMM-type methods with indefinite proximal regularization for the multi-block case.