The convergence rate of the proximal alternating direction method of multipliers with indefinite proximal regularization

The proximal alternating direction method of multipliers (P-ADMM) is an efficient first-order method for solving the separable convex minimization problems. Recently, He et al. have further studied the P-ADMM and relaxed the proximal regularization matrix of its second subproblem to be indefinite. This is especially significant in practical applications since the indefinite proximal matrix can result in a larger step size for the corresponding subproblem and thus can often accelerate the overall convergence speed of the P-ADMM. In this paper, without the assumptions that the feasible set of the studied problem is bounded or the objective function's component $\theta_{i}(\cdot)$ of the studied problem is strongly convex, we prove the worst-case $\mathcal{O}(1/t)$ convergence rate in an ergodic sense of the P-ADMM with a general Glowinski relaxation factor $\gamma\in(0,\frac{1+\sqrt{5}}{2})$, which is a supplement of the previously known results in this area. Furthermore, some numerical results on compressive sensing are reported to illustrate the effectiveness of the P-ADMM with indefinite proximal regularization.


Introduction
Let $\theta_i : \mathbb{R}^{n_i} \to (-\infty, +\infty]$ ($i = 1, 2$) be two lower semicontinuous proper (not necessarily smooth) functions. This work aims to solve the following two-block separable convex minimization problem:
$$\min\{\theta_1(x_1) + \theta_2(x_2) \mid A_1x_1 + A_2x_2 = b\},$$
where $A_i \in \mathbb{R}^{l\times n_i}$ ($i = 1, 2$) and $b \in \mathbb{R}^l$. If there are convex set constraints $x_i \in X_i$ ($i = 1, 2$), where $X_i \subseteq \mathbb{R}^{n_i}$ are simple convex sets such as nonnegative cones or positive semidefinite cones, then we can define the indicator function $I_{X_i}(\cdot)$ ($I_{X_i}(x_i) = 0$ if $x_i \in X_i$; otherwise, $I_{X_i}(x_i) = +\infty$), by which we can incorporate the constraints $x_i \in X_i$ ($i = 1, 2$) into the objective function of (), and get the following equivalent form:
$$\min\{\theta_1(x_1) + I_{X_1}(x_1) + \theta_2(x_2) + I_{X_2}(x_2) \mid A_1x_1 + A_2x_2 = b\}.$$
Then, we can further introduce some auxiliary variables and functions to rewrite the above problem as problem () (please refer to [] for more details). Therefore, problem () is quite general, and in fact problems like () come from diverse applications, such as latent variable graphical model selection [], sparse inverse covariance selection [], stable principal component pursuit with nonnegative constraint [], and robust alignment for linearly correlated images []. As a first-order method, the following algorithm, the proximal alternating direction method of multipliers (P-ADMM) [-], is quite efficient for solving () or related problems, especially in the large-scale case.
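For orientation, one common way of writing a P-ADMM iteration for the two-block problem is sketched below. It is a sketch in standard notation, not a verbatim restatement of the algorithm: the penalty parameter $\beta > 0$, the proximal matrices $G_1$, $G_2$, the weighted norm $\|x\|_G^2 = x^\top G x$, and the sign convention for the multiplier $\lambda$ are assumptions chosen to match how these symbols are used in the rest of the text.

```latex
\begin{aligned}
x_1^{k+1} &\in \arg\min_{x_1}\Big\{\theta_1(x_1) - \langle \lambda^k, A_1x_1\rangle
  + \tfrac{\beta}{2}\,\|A_1x_1 + A_2x_2^k - b\|^2
  + \tfrac{1}{2}\,\|x_1 - x_1^k\|_{G_1}^2\Big\},\\
x_2^{k+1} &\in \arg\min_{x_2}\Big\{\theta_2(x_2) - \langle \lambda^k, A_2x_2\rangle
  + \tfrac{\beta}{2}\,\|A_1x_1^{k+1} + A_2x_2 - b\|^2
  + \tfrac{1}{2}\,\|x_2 - x_2^k\|_{G_2}^2\Big\},\\
\lambda^{k+1} &= \lambda^k - \gamma\beta\,\big(A_1x_1^{k+1} + A_2x_2^{k+1} - b\big).
\end{aligned}
```

With $G_1 = 0$, $G_2 = 0$ and $\gamma = 1$ this form reduces to the classical ADMM.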
The parameter $\gamma$ in the P-ADMM is called the Glowinski relaxation factor in the literature, and $\gamma > 1$ can often accelerate the P-ADMM []. Owing to its high efficiency, the P-ADMM has been intensively studied during the past few decades, and many customized variants of the P-ADMM have been proposed for concrete separable minimization problems [-].
In this paper, we focus on the P-ADMM itself; the theory developed in this work can easily be extended to its various variants. Let us briefly analyze the structural advantages of the P-ADMM. The P-ADMM fully utilizes the separable structure inherent in the original problem (): it decouples the primal variable $(x_1, x_2)$ and yields two lower-dimensional subproblems. At each iteration, the computation of the P-ADMM is therefore dominated by solving these two subproblems. Fortunately, the two subproblems in () often admit closed-form solutions provided that $\theta_i(\cdot)$ ($i = 1, 2$) are simple functions (such as $\theta_i(\cdot) = \|\cdot\|_1$, $\|\cdot\|_2$ or $\|\cdot\|_*$) and the matrices $A_i$ ($i = 1, 2$) are unitary (i.e., $A_i^\top A_i$ ($i = 1, 2$) are identity matrices). Even if $A_i$ ($i = 1, 2$) are not unitary, we can judiciously set $G_i = rI_{n_i} - \beta A_i^\top A_i$ with $r > \beta\|A_i^\top A_i\|$ ($i = 1, 2$), and then the two subproblems in the P-ADMM also have closed-form solutions in many practical applications.
The global convergence of the P-ADMM with $\gamma = 1$ has been proved in [, ] for some concrete models of (), and in [], Xu and Wu presented an elegant analysis of the global convergence of the P-ADMM with $\gamma \in (0, \frac{1+\sqrt{5}}{2})$ for the general model (). Quite recently, He et al. [] further studied the P-ADMM and made substantial advances by relaxing the matrix $G_2$ in the proximal regularization term of its second subproblem to be indefinite. This is desirable in practical applications since an indefinite proximal matrix permits a larger step size for the subproblem and may thus accelerate the overall convergence of the P-ADMM.
Compared with the study of the global convergence of the P-ADMM, research on its convergence rate is scarce in the literature. In [, ], under the assumption that the feasible set of () is bounded, He et al. proved the worst-case $O(1/t)$ convergence rate of the P-ADMM with $\gamma = 1$, where $t$ denotes the iteration counter. In [], Lin et al. presented a parallel version of the P-ADMM with an adaptive penalty $\beta$, and proved that the convergence rate of their method is also $O(1/t)$. In addition, Goldstein et al. [] proved a better convergence rate than $O(1/t)$ for the P-ADMM scheme with $\gamma = 1$ and $G_1 = 0$, $G_2 = 0$ under the assumption that $\theta_i(\cdot)$ ($i = 1, 2$) are both strongly convex, which is usually violated in practice and thus excludes many practical applications of the P-ADMM. Then, by introducing some free parameters $\alpha_k$ and $\gamma_k$, Xu [] developed a new variant of the P-ADMM for (), which refined the results in []. In fact, assuming only that a component $\theta_i(\cdot)$ of the objective function is strongly convex, Xu [] proved that the new method has an $O(1/t)$ convergence rate with constant parameters and an $O(1/t^2)$ convergence rate with adaptive parameters.
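The closed-form claim for the choice $G_i = rI_{n_i} - \beta A_i^\top A_i$ can be illustrated with a small sketch. It assumes $\theta_i = \nu\|\cdot\|_1$ and the subproblem form $\nu\|x\|_1 - \langle\lambda, Ax\rangle + \frac{\beta}{2}\|Ax + c\|^2 + \frac{1}{2}\|x - x^k\|_G^2$, where the vector `c` stands for the fixed part of the constraint residual contributed by the other block. The function names, the weight `nu`, and the random test data are hypothetical. With this $G$ the $A^\top A$ terms cancel and the minimizer is a single soft-thresholding step.

```python
import numpy as np

def soft_threshold(u, kappa):
    """Proximal map of kappa*||.||_1: shrink every entry of u toward zero by kappa."""
    return np.sign(u) * np.maximum(np.abs(u) - kappa, 0.0)

def linearized_l1_step(A, c, lam, x_k, nu, beta, r):
    """Closed-form minimizer of
         nu*||x||_1 - <lam, A x> + beta/2*||A x + c||^2 + 1/2*||x - x_k||_G^2
       for G = r*I - beta*A^T A (assumed form; r > beta*||A^T A||)."""
    u = x_k - A.T @ (beta * (A @ x_k + c) - lam) / r
    return soft_threshold(u, nu / r)

def subproblem_objective(x, A, c, lam, x_k, nu, beta, r):
    """Evaluate the subproblem objective directly, with G formed explicitly."""
    G = r * np.eye(A.shape[1]) - beta * A.T @ A
    d = x - x_k
    return (nu * np.abs(x).sum() - lam @ (A @ x)
            + 0.5 * beta * np.sum((A @ x + c) ** 2) + 0.5 * d @ G @ d)

# Hypothetical random data for a sanity check.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))
c = rng.standard_normal(5)
lam = rng.standard_normal(5)
x_k = rng.standard_normal(8)
nu, beta = 0.1, 1.0
r = 1.01 * beta * np.linalg.norm(A.T @ A, 2)  # spectral norm, so G is positive definite

x_new = linearized_l1_step(A, c, lam, x_k, nu, beta, r)
f_new = subproblem_objective(x_new, A, c, lam, x_k, nu, beta, r)

# No random perturbation of x_new should give a smaller objective value.
perturbed = [subproblem_objective(x_new + 0.1 * rng.standard_normal(8),
                                  A, c, lam, x_k, nu, beta, r) for _ in range(200)]
is_minimizer = all(f >= f_new - 1e-9 for f in perturbed)
print(is_minimizer)
```

The design point is that the update needs only two matrix-vector products with $A$ and no linear solve, which is why this choice of $G_i$ is attractive at large scale.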
In this paper, we aim to further improve the above results by removing the assumptions of strong convexity of $\theta_i(\cdot)$ and boundedness of the feasible set of (), and prove that the P-ADMM for the convex minimization problem () has a worst-case $O(1/t)$ convergence rate in an ergodic sense, which partially improves the results in [, -, ].
The remainder of the paper is organized as follows. Section 2 gives some useful preliminaries. In Section 3, we prove the convergence rate of the P-ADMM in detail. In Section 4, a simple experiment on compressive sensing is conducted to demonstrate the effectiveness of the P-ADMM.

Preliminaries
In this section, we summarize some basic concepts and preliminaries that will be used in the later discussion.
First, we list some notation to be used in this paper. $\langle\cdot,\cdot\rangle$ denotes the inner product of $\mathbb{R}^n$; $G \succ 0$ (or $G \succeq 0$) denotes that the symmetric matrix $G$ is positive definite (or positive semidefinite). The set of all relative interior points of a given nonempty convex set $C$ is denoted by $\mathrm{ri}(C)$.
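Then, if $f : \mathbb{R}^n \to \mathbb{R}$ is convex, we have the following first-order necessary condition:
$$f(x) \ge f(y) + \langle \xi, x - y\rangle, \quad \forall x, y \in \mathbb{R}^n,\ \xi \in \partial f(y),$$
where $\partial f(y) = \{\xi \in \mathbb{R}^n : f(\bar{y}) \ge f(y) + \langle \xi, \bar{y} - y\rangle \text{ for all } \bar{y} \in \mathbb{R}^n\}$ denotes the subdifferential of $f(\cdot)$ at the point $y$.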
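As a small numerical illustration of the subgradient inequality that defines $\partial f(y)$, the sketch below takes $f = \|\cdot\|_1$ and uses $\mathrm{sign}(y)$ as one valid subgradient at $y$. The choice of $f$, the helper names, and the random test points are assumptions made only for this example.

```python
import numpy as np

def l1_norm(x):
    return np.abs(x).sum()

def l1_subgradient(y):
    # sign(y) is one element of the subdifferential of ||.||_1 at y
    # (it picks 0 for the coordinates where y_i = 0).
    return np.sign(y)

rng = np.random.default_rng(1)
y = rng.standard_normal(6)
y[0] = 0.0                      # include a kink of the l1 norm
xi = l1_subgradient(y)

# gap(x) = f(x) - f(y) - <xi, x - y> must be nonnegative for every x.
gaps = []
for _ in range(1000):
    x = 3.0 * rng.standard_normal(6)
    gaps.append(l1_norm(x) - l1_norm(y) - xi @ (x - y))
min_gap = min(gaps)
print(min_gap >= -1e-12)
```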
The following equality is used frequently in the paper: From now on, we denote Throughout this paper, we make the following assumptions.
Then, under Assumption ., it follows from Corollaries .. and . that $(x_1^*, x_2^*)$ is an optimal solution to problem () iff there exists a Lagrange multiplier $\lambda^* \in \mathbb{R}^l$ such that $(x_1^*, x_2^*, \lambda^*)$ is a solution of the following KKT system: The set of the solutions of () is denoted by $W^*$. By Assumption ., (), and (), for any $(x^*, \lambda^*) = (x_1^*, x_2^*, \lambda^*) \in W^*$, we have the following useful inequality:
Assumption . The solution set $W^*$ of the KKT system () is nonempty, and there is at least one $(x_1^*, x_2^*, \lambda^*) \in W^*$ with $\lambda^* \neq 0$.
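For readers who want a concrete picture, the following sketch shows what a KKT system for the two-block problem typically looks like, together with one inequality it implies. It assumes the Lagrangian $L(x_1, x_2, \lambda) = \theta_1(x_1) + \theta_2(x_2) - \langle\lambda, A_1x_1 + A_2x_2 - b\rangle$ and convex $\theta_i$. The sign convention and the shorthand $\theta(x) = \theta_1(x_1) + \theta_2(x_2)$, $Ax = A_1x_1 + A_2x_2$ are assumptions, so the displayed forms may differ from the paper's own numbered equations.

```latex
\begin{aligned}
& A_1^\top \lambda^* \in \partial\theta_1(x_1^*), \qquad
  A_2^\top \lambda^* \in \partial\theta_2(x_2^*), \qquad
  A_1x_1^* + A_2x_2^* = b, \\[4pt]
& \theta_i(x_i) \ge \theta_i(x_i^*) + \langle A_i^\top\lambda^*,\, x_i - x_i^*\rangle
  \quad (i = 1, 2) \\
& \Longrightarrow\quad
  \theta(x) - \theta(x^*) - \langle \lambda^*,\, Ax - b\rangle \ge 0
  \quad \text{for all } x = (x_1, x_2).
\end{aligned}
```

The implication follows by summing the two subgradient inequalities and using $Ax^* = b$.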

Convergence rate of the P-ADMM
In this section, we prove the convergence rate of the P-ADMM. To accomplish this, we need to impose some restrictions on the matrices $A_i$, $G_i$ ($i = 1, 2$) in the P-ADMM as follows.
, the parameter α can take any value of the interval [., ). Obviously, the parameter α in this paper can also obtain the lower bound . if γ = .
Let us introduce some matrices to simplify our notation in the subsequent analysis. More specifically, we set , we see that the matrices $\bar{G}_2$, $M$, $N$, $H$ defined by () are all positive definite. However, the matrix $G_2$ defined in Assumption . may be indefinite. For example, when γ = , α = ., and τ = .β
Remark . From the definitions of $G_2$ and $\bar{G}_2$, we have
Now, we start proving the convergence rate of the P-ADMM under Assumptions .-. and Assumption .. First, we prove three lemmas step by step.
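The possibility of an indefinite $G_2$ alongside a positive definite $\bar{G}_2$ can be checked numerically. The sketch below assumes the forms $\bar{G}_2 = \tau I - \beta A_2^\top A_2$ and $G_2 = \alpha\tau I - \beta A_2^\top A_2$ with $\alpha \in (0, 1)$. These forms, the random matrix `A2`, and the values of `alpha`, `beta` and `tau` are all hypothetical choices for illustration and are not taken from the paper's assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
A2 = rng.standard_normal((4, 6))          # hypothetical constraint matrix
beta = 1.0
spec = np.linalg.norm(A2.T @ A2, 2)       # largest eigenvalue of A2^T A2
tau = 1.05 * beta * spec                  # hypothetical: makes G2_bar positive definite
alpha = 0.8                               # hypothetical shrink factor in (0, 1)

I = np.eye(A2.shape[1])
G2_bar = tau * I - beta * A2.T @ A2       # assumed form of the positive definite matrix
G2 = alpha * tau * I - beta * A2.T @ A2   # assumed form of the relaxed proximal matrix

eig_bar = np.linalg.eigvalsh(G2_bar)
eig = np.linalg.eigvalsh(G2)
print("G2_bar positive definite:", eig_bar.min() > 0)
print("G2 indefinite:", eig.min() < 0 < eig.max())
```

Shrinking $\tau$ by the factor $\alpha$ pushes the smallest eigenvalue below zero while the largest stays positive, which corresponds to the larger step size discussed in the introduction.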
Proof. Note that the optimality condition for the first subproblem (i.e., the subproblem with respect to $x_1$) in () is , and the second equality uses the updating formula for $\lambda$ in (). Then () can be rewritten as where the inequality comes from the convexity of $\theta_1(\cdot)$ and (). Similarly, the optimality condition for the second subproblem (i.e., the subproblem with respect to $x_2$) in () gives i.e., where the inequality follows from the convexity of $\theta_2(\cdot)$ and (). Then, adding () and (), we obtain where the last equality comes from the identity (). Now, let us deal with the inner-product term involving $Ax^{k+1} - b$ on the right side of (). Specifically, from the updating formula for $\lambda$ in () again, we can get where the second equality comes from $\lambda^{k+1} = \lambda^k - \gamma(\lambda^k - \tilde{\lambda}^k)$, and the last equality uses the identity (). Then, substituting () into () yields (). This completes the proof.
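The relation between the multiplier update and the constraint residual used in this proof can be made explicit as follows. The sketch assumes the multiplier update $\lambda^{k+1} = \lambda^k - \gamma\beta(Ax^{k+1} - b)$ and defines an auxiliary multiplier $\tilde{\lambda}^k$ as the full step with $\gamma = 1$. Both the definition of $\tilde{\lambda}^k$ and the shorthand $Ax^{k+1} = A_1x_1^{k+1} + A_2x_2^{k+1}$ are assumptions.

```latex
\begin{aligned}
\tilde{\lambda}^k &:= \lambda^k - \beta\,(Ax^{k+1} - b), \\
\lambda^{k+1} &= \lambda^k - \gamma\beta\,(Ax^{k+1} - b)
              = \lambda^k - \gamma\,(\lambda^k - \tilde{\lambda}^k), \\
Ax^{k+1} - b &= \tfrac{1}{\beta}\,(\lambda^k - \tilde{\lambda}^k)
              = \tfrac{1}{\gamma\beta}\,(\lambda^k - \lambda^{k+1}).
\end{aligned}
```

Under these assumptions, any inner product with $Ax^{k+1} - b$ can be rewritten purely in terms of multiplier differences, which is the substitution the two "equalities" in the proof refer to.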
The following lemma aims to further refine the cross term $\beta\langle Ax^{k+1} - b, A_2x_2^k - A_2x_2^{k+1}\rangle$ on the right side of ().
That is, Similarly, taking $x_2 = x_2^{k+1}$ in () for $k := k - 1$, we have That is, Adding () and (), we obtain We have Substituting the above equality into (), we obtain Then, substituting () into () yields (). The proof is completed. Now, let us deal with the term -γ  on the right side of ().
Let $\{(x_1^k, x_2^k, \lambda^k)\}$ be the sequence generated by the P-ADMM. Then, we have
Proof. Obviously, by the updating formula for $\lambda$ in (), we have Then, applying the Cauchy-Schwarz inequality, we can get Then, substituting the above two inequalities into (), and by some simple manipulations, we obtain which is the same as the assertion (), and the lemma is thus proved.
Substituting () into (), we get the following important inequality: Now, let us deal with all the terms related to the variable $x_2$ on the right side of (). From the definition of the matrices $G_2$, $N$ and (), we have Then, substituting the above inequality into (), we can obtain where the inequality comes from α ∈ (, ] and α ≥ - . Based on (), we can prove the worst-case $O(1/t)$ convergence rate in an ergodic sense of the P-ADMM.
where $(x^*, \lambda^*) = (x_1^*, x_2^*, \lambda^*)$ with $\lambda^* \neq 0$ is a point satisfying the KKT conditions in (), and $D$ is a constant defined by

Numerical results
With $x_1 = x$, $x_2 = x$, () can be recast as which is a special case of (), and thus the P-ADMM can be used to solve CS. In our experiment, the stopping criterion of the P-ADMM is set as where $f_k$ denotes the function value of () at the $k$-th iterate. The initial points of $x_1$, $x_2$, $\lambda$ are all set as $A^\top y$, and owing to the limited memory of our computer, we only test a medium-scale instance of () with n = ,, m = , k = , where $k$ is the number of random nonzero elements contained in the original signal. In addition, we set and ν = ., $G_2 = \tau I_n - \beta A^\top A$ with τ = , β = mean(abs(b)). In the literature, the relative error (RelErr) is usually used to measure the quality of the recovered signal and is defined by
$$\mathrm{RelErr} = \frac{\|\tilde{x} - \bar{x}\|}{\|\bar{x}\|},$$
where $\tilde{x}$ and $\bar{x}$ denote the recovered signal and the original signal, respectively.
First, let us illustrate the sensitivity of the P-ADMM to $\gamma$. We choose different values of $\gamma$ in the interval [., .] (more specifically, we take γ = ., ., . . . , .). The objective value of () and the CPU time in seconds required by the P-ADMM are depicted in Figure , and the number of iterations and the RelErr attained by the P-ADMM are depicted in Figure .
According to the curves in Figures -, we can see that the relaxation factor $\gamma$ works well for a wide range of values and, based on this experiment, values greater than . are preferred. Now let us test the effectiveness of the P-ADMM with the indefinite proximal matrix $G_2 = (\alpha\tau - \beta)I_n$. Here we set α = ., τ = ., β = , and γ = . The numerical results of one experiment are as follows: the objective value is .; the CPU time is .; the number of iterations is  and the RelErr is .%. The original signal, the measurement, and the signal recovered by the P-ADMM for this test scenario are given in Figure , which shows clearly that the original signal is recovered with high precision. This indicates that the P-ADMM is effective even though the proximal matrix $G_2$ is indefinite.
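The experiment can be mimicked at toy scale with the sketch below. It is not the authors' code. It assumes the splitting $\min\ \nu\|x_1\|_1 + \frac{1}{2}\|Ax_2 - y\|^2$ subject to $x_1 - x_2 = 0$ (so $A_1 = I$, $A_2 = -I$, $b = 0$), takes $G_1 = 0$ and $G_2 = (\alpha\tau - \beta)I_n$, and uses hypothetical sizes and parameter values (`n`, `m`, `k`, `nu`, `beta`, `tau`, `alpha`, `gamma`, `iters`). Only the initialization at $A^\top y$ and the RelErr measure follow the text. The running average of the iterates is returned as well, since the rate result is stated in the ergodic sense.

```python
import numpy as np

def soft_threshold(u, kappa):
    """Proximal map of kappa*||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - kappa, 0.0)

def padmm_cs(A, y, nu, beta, tau, alpha, gamma, iters):
    """P-ADMM sketch for  min nu*||x1||_1 + 0.5*||A x2 - y||^2  s.t.  x1 - x2 = 0,
    with G1 = 0 and G2 = (alpha*tau - beta)*I (assumed forms)."""
    n = A.shape[1]
    g = alpha * tau - beta                    # g < 0 makes G2 indefinite
    # x2-subproblem optimality: (A^T A + (beta + g) I) x2 = A^T y - lam + beta*x1 + g*x2_old
    M_inv = np.linalg.inv(A.T @ A + (beta + g) * np.eye(n))
    Aty = A.T @ y
    x1, x2, lam = Aty.copy(), Aty.copy(), Aty.copy()   # initial points set to A^T y
    x1_sum = np.zeros(n)
    for _ in range(iters):
        x1 = soft_threshold(x2 + lam / beta, nu / beta)   # x1-subproblem (closed form)
        x2 = M_inv @ (Aty - lam + beta * x1 + g * x2)     # x2-subproblem (linear solve)
        lam = lam - gamma * beta * (x1 - x2)              # multiplier update
        x1_sum += x1
    return x1, x1_sum / iters                 # last iterate and ergodic average

# Hypothetical toy instance: k-sparse signal, Gaussian measurements, no noise.
rng = np.random.default_rng(0)
n, m, k = 100, 60, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k)
y = A @ x_true

beta = 1.0
x_last, x_ergodic = padmm_cs(A, y, nu=0.05, beta=beta, tau=1.01 * beta,
                             alpha=0.8, gamma=1.0, iters=1000)
rel_err = np.linalg.norm(x_last - x_true) / np.linalg.norm(x_true)
rel_err_ergodic = np.linalg.norm(x_ergodic - x_true) / np.linalg.norm(x_true)
print("RelErr (last iterate):", rel_err)
print("RelErr (ergodic average):", rel_err_ergodic)
```

With these hypothetical values, $\alpha\tau - \beta < 0$, so the proximal term in the second subproblem is indefinite. That subproblem nevertheless stays strongly convex because $A^\top A + \alpha\tau I$ is positive definite.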