Modified hybrid decomposition of the augmented Lagrangian method with larger step size for three-block separable convex programming

The Jacobian decomposition and the Gauss–Seidel decomposition of the augmented Lagrangian method (ALM) are two popular methods for separable convex programming. However, their convergence is not guaranteed for three-block separable convex programming. In this paper, we present a modified hybrid decomposition of ALM (MHD-ALM) for three-block separable convex programming, which first updates all variables by a hybrid decomposition of ALM and then corrects the output by a correction step with constant step size $\alpha \in (0, 2-\sqrt{2})$, which is much less restrictive than the step sizes in similar methods. Furthermore, we show that $2-\sqrt{2}$ is the optimal upper bound of the constant step size α. The soundness of MHD-ALM is supported by theoretical analysis, including global convergence, ergodic convergence rate, nonergodic convergence rate, and refined ergodic convergence rate. MHD-ALM is applied to the video background extraction problem, and numerical results indicate that it is numerically reliable and requires less computation.


Introduction
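Many problems encountered in applied mathematics can be formulated as separable convex programming, such as the basis pursuit (BP) problem [1][2][3], the video background extraction problem [4][5][6][7], image decomposition [8][9][10], and so on. Thus solving separable convex programming plays a fundamental role in applied mathematics and has drawn persistent attention. In the existing literature, several forms of separable convex programming have been investigated [11][12][13][14][15], among which the following three-block separable convex programming has attracted particular interest:
$$\min \Big\{ \theta_1(x_1) + \theta_2(x_2) + \theta_3(x_3) \;\Big|\; A_1 x_1 + A_2 x_2 + A_3 x_3 = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2, 3 \Big\}, \tag{1}$$
where $\theta_i : R^{n_i} \to (-\infty, +\infty]$ (i = 1, 2, 3) are lower semicontinuous proper convex functions, $A_i \in R^{l \times n_i}$ (i = 1, 2, 3), $b \in R^l$, and $\mathcal{X}_i$ (i = 1, 2, 3) are nonempty closed convex sets in $R^{n_i}$. Throughout this paper, we assume that the solution set of problem (1) is nonempty. The Lagrangian and augmented Lagrangian functions of problem (1) are defined, respectively, as
$$L(x_1, x_2, x_3, \lambda) = \sum_{i=1}^{3} \theta_i(x_i) - \Big\langle \lambda, \sum_{i=1}^{3} A_i x_i - b \Big\rangle, \tag{2}$$
$$L_\beta(x_1, x_2, x_3, \lambda) = L(x_1, x_2, x_3, \lambda) + \frac{\beta}{2} \Big\| \sum_{i=1}^{3} A_i x_i - b \Big\|^2, \tag{3}$$
where $\lambda \in R^l$ is the Lagrange multiplier associated with the linear constraints in (1), and β > 0 is a penalty parameter. Applying the augmented Lagrangian method (ALM) [16] to problem (1), we obtain the following iterative scheme:
$$\begin{cases} (x_1^{k+1}, x_2^{k+1}, x_3^{k+1}) = \arg\min \{ L_\beta(x_1, x_2, x_3, \lambda^k) \mid x_i \in \mathcal{X}_i,\ i = 1, 2, 3 \}, \\ \lambda^{k+1} = \lambda^k - \beta \big( \sum_{i=1}^{3} A_i x_i^{k+1} - b \big). \end{cases} \tag{4}$$
The three variables $x_1$, $x_2$, $x_3$ are all involved in the minimization problem of (4), which often makes the method hard to implement. One technique to handle this is to split the subproblem into several smaller subproblems. If we split it in a Gauss–Seidel manner and adopt the well-known alternating direction method of multipliers (ADMM) [11], we obtain the following iterative scheme:
$$\begin{cases} x_1^{k+1} = \arg\min \{ L_\beta(x_1, x_2^k, x_3^k, \lambda^k) \mid x_1 \in \mathcal{X}_1 \}, \\ x_2^{k+1} = \arg\min \{ L_\beta(x_1^{k+1}, x_2, x_3^k, \lambda^k) \mid x_2 \in \mathcal{X}_2 \}, \\ x_3^{k+1} = \arg\min \{ L_\beta(x_1^{k+1}, x_2^{k+1}, x_3, \lambda^k) \mid x_3 \in \mathcal{X}_3 \}, \\ \lambda^{k+1} = \lambda^k - \beta \big( \sum_{i=1}^{3} A_i x_i^{k+1} - b \big). \end{cases} \tag{5}$$
On the other hand, if we split it in a Jacobian manner, we get the following fully parallel iterative scheme:
$$\begin{cases} x_1^{k+1} = \arg\min \{ L_\beta(x_1, x_2^k, x_3^k, \lambda^k) \mid x_1 \in \mathcal{X}_1 \}, \\ x_2^{k+1} = \arg\min \{ L_\beta(x_1^k, x_2, x_3^k, \lambda^k) \mid x_2 \in \mathcal{X}_2 \}, \\ x_3^{k+1} = \arg\min \{ L_\beta(x_1^k, x_2^k, x_3, \lambda^k) \mid x_3 \in \mathcal{X}_3 \}, \\ \lambda^{k+1} = \lambda^k - \beta \big( \sum_{i=1}^{3} A_i x_i^{k+1} - b \big). \end{cases} \tag{6}$$
Compared with the minimization problem in (4), the minimization problems in (5) and (6) are of smaller scale, and they fully exploit the separable structure of the objective function of (1), so the schemes (5) and (6) are easier to implement.
However, their convergence cannot be guaranteed unless further conditions are imposed, as shown in [12,17].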
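The contrast between the sequential scheme (5) and the parallel scheme (6) can be seen on a tiny instance. The following Python sketch is illustrative only. It assumes a two-block toy problem with zero objective functions and the single scalar constraint x1 + x2 = 0 (chosen here for illustration, not necessarily the example used in [12,17]), the multiplier update λ ← λ − β(x1 + x2), and hypothetical function names. Under these assumptions every subproblem is a scalar quadratic with a closed-form minimizer.

```python
def gauss_seidel_step(x1, x2, lam, beta=1.0):
    """One sequential sweep: the x2 update sees the new x1."""
    # argmin over x1 of -lam*x1 + beta/2*(x1 + x2)^2
    x1_new = lam / beta - x2
    # argmin over x2, using the freshly updated x1
    x2_new = lam / beta - x1_new
    lam_new = lam - beta * (x1_new + x2_new)
    return x1_new, x2_new, lam_new


def jacobian_step(x1, x2, lam, beta=1.0):
    """One parallel sweep: both blocks use the old values."""
    x1_new = lam / beta - x2
    x2_new = lam / beta - x1  # uses the old x1
    lam_new = lam - beta * (x1_new + x2_new)
    return x1_new, x2_new, lam_new


def residual_after(step, start, iterations):
    """Constraint violation |x1 + x2| after a number of sweeps."""
    x1, x2, lam = start
    for _ in range(iterations):
        x1, x2, lam = step(x1, x2, lam)
    return abs(x1 + x2)


start = (1.0, 0.5, 0.3)  # arbitrary starting point (assumption)
gs_residual = residual_after(gauss_seidel_step, start, 20)
jacobi_residual = residual_after(jacobian_step, start, 20)
```

On this toy instance the sequential sweep drives the constraint violation to zero, while the parallel sweep amplifies it at every iteration, which is the kind of behavior that motivates adding a correction step.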
Compared with the regularization method, the prediction-correction method has attracted extensive interest, and during the past decades many scholars have performed studies in this direction. For example, He et al. [24] proposed an ADMM-based contraction type method for solving multi-block separable convex programming, which first generates a temporal iterate by (5), and then corrects it with a Gaussian back substitution procedure. Later, He et al. [12] developed a full Jacobian decomposition of the augmented Lagrangian method for solving multi-block separable convex programming, which first generates a temporal iterate by (6), and then corrects it with a constant step size or varying step size. Different from the above, Han et al. [13] proposed a partial splitting augmented Lagrangian method for solving three-block separable convex programming, which first updates the primal variables x 1 , x 2 , x 3 in a partially-parallel manner, and then corrects x 3 , λ with a constant step size. Later, Wang et al. [25] presented a proximal partially-parallel splitting method for solving multi-block separable convex programming, which first updates all primal variables in a partially-parallel manner, and then corrects the output with a constant step size or varying step size. Quite recently, Chang et al. [26] proposed a convergent prediction-correction-based ADMM in which more minimization problems are involved. In conclusion, the above iteration schemes first generate a temporal iterate by (5) or (6) or their variants, and then generate the new iterate by correcting the temporal iterate with varying step size or a constant step size.
Varying step size needs to be dynamically updated at each iteration, which might be computationally demanding for large-scale (1). Hence in this paper, we consider the prediction-correction method with constant step size for solving problem (1). To the best of our knowledge, He et al. [12] first proposed a prediction-correction method with constant step size for solving (1), and they proved that the upper bound of the constant step size is 0.2679. By taking a hybrid splitting of (4) as the prediction step, Wang et al. [25] relaxed the upper bound of the constant step size to 0.3670 and Han et al. [13] further relaxed it to 0.3820. In practice, to enhance the numerical efficiency of the corresponding iteration method, larger values of the step size are preferred as long as the convergence is still guaranteed [26]. In this paper, based on the methods in [12,13,25], we propose a modified hybrid decomposition of the augmented Lagrangian method with constant step size, whose upper bound is relaxed to 0.5858.
The rest of this paper is organized as follows. Section 2 lists some notations and basic results. In Sect. 3, we present a modified hybrid decomposition of the augmented Lagrangian method with larger step size for problem (1) and establish its global convergence and refined convergence rate. Furthermore, a simple example is given to illustrate that $2-\sqrt{2} \approx 0.5858$ is the optimal upper bound of the constant step size in MHD-ALM. In Sect. 4, some numerical results are given to demonstrate the numerical advantage of a larger step size. Finally, a brief conclusion, including some possible future work, is drawn in Sect. 5.

Preliminaries
In this section, we give some notations and basic results about the minimization problem (1), which will be used in the forthcoming discussions.
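Throughout this paper, we define the following notations:
$$x = (x_1, x_2, x_3), \quad w = (x, \lambda), \quad \theta(x) = \theta_1(x_1) + \theta_2(x_2) + \theta_3(x_3),$$
and
$$\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{X}_3, \quad W = \mathcal{X} \times R^l.$$
Definition 2.1 A tuple $(x^*, \lambda^*) \in W$ is called a saddle point of the Lagrangian function (2) if it satisfies the inequalities
$$L(x^*, \lambda) \le L(x^*, \lambda^*) \le L(x, \lambda^*), \quad \forall (x, \lambda) \in W. \tag{7}$$
Solving problem (1) is equivalent to finding a saddle point of L(x, λ) [26,27]. Therefore, to solve (1), we only need to solve the two inequalities in (7), which can be written as the following mixed variational inequality: find $w^* \in W$ such that
$$\theta(x) - \theta(x^*) + (w - w^*)^\top F(w^*) \ge 0, \quad \forall w \in W, \tag{8}$$
where
$$F(w) = \begin{pmatrix} -A_1^\top \lambda \\ -A_2^\top \lambda \\ -A_3^\top \lambda \\ A_1 x_1 + A_2 x_2 + A_3 x_3 - b \end{pmatrix}. \tag{9}$$
Because F(w) is an affine mapping whose coefficient matrix is skew-symmetric, it satisfies the following property:
$$(w - w')^\top \big( F(w) - F(w') \big) = 0, \quad \forall w, w' \in R^{n_1 + n_2 + n_3 + l}. \tag{10}$$
The mixed variational inequality (8) is denoted by MVI(W, F, θ), and its solution set is denoted by $W^*$, which is nonempty by the assumption on problem (1).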
To solve MVI(W, F, θ), He et al. [28] presented the following prototype algorithm, denoted by ProAlo.
Prediction: For a given iterate, generate a predictor satisfying the prediction inequality (11), where the matrix Q has the property that $Q^\top + Q$ is positive definite.
Correction: Determine a nonsingular matrix M and a scalar α > 0, and generate the new iterate by the correction formula (12).
Under Condition 2.1, which is imposed on the matrices Q and M, He et al. [28] established the convergence results of ProAlo, including the global convergence and the worst-case O(1/t) convergence rate in both the ergodic and nonergodic senses, where t is the iteration counter. See Theorems 3.3, 4.2, 4.5 in [28].
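Condition 2.1 is not reproduced in the text above. As a hedged illustration, the following Python sketch assumes that it takes the form customary for prediction-correction frameworks of this kind: the matrix $H = QM^{-1}$ is symmetric positive definite, and the matrix $G = Q^\top + Q - \alpha M^\top H M$ is positive definite. The function names and the small test matrices are hypothetical and serve only to show how such a condition can be checked numerically for given Q, M, and α.

```python
import numpy as np


def is_positive_definite(S, tol=1e-10):
    """Check positive definiteness of the symmetric part of S."""
    symmetric_part = 0.5 * (S + S.T)
    return bool(np.all(np.linalg.eigvalsh(symmetric_part) > tol))


def satisfies_condition(Q, M, alpha):
    """Assumed form of the convergence condition on (Q, M, alpha)."""
    H = Q @ np.linalg.inv(M)          # H = Q M^{-1}
    if not np.allclose(H, H.T):       # H must be symmetric
        return False
    G = Q.T + Q - alpha * (M.T @ H @ M)
    return is_positive_definite(H) and is_positive_definite(G)


identity = np.eye(3)
# With Q = M = I we get H = I and G = (2 - alpha) I.
small_step_ok = satisfies_condition(identity, identity, 1.5)
large_step_ok = satisfies_condition(identity, identity, 2.5)
```

For the trivial choice Q = M = I the check passes exactly when α < 2, which shows how the admissible range of α is dictated by the positive definiteness of G.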
To end this section, we give the following lemma which will be used in the subsequent section.

Algorithm and its convergence
In this section, we give the process of the modified hybrid decomposition of the augmented Lagrangian method (MHD-ALM) for three-block separable convex programming (1) and establish its convergence results, including global convergence, ergodic convergence rate, nonergodic convergence rate, and refined ergodic convergence rate.
The methods in [12,13,24,25,26] and MHD-ALM all fall into the algorithmic framework of prediction-correction methods. The main differences among these methods are: (i) in the prediction step, the methods in [24,26] update all the primal variables in a sequential order; the method in [12] updates all the primal variables in a parallel manner; the methods in [13,25] and MHD-ALM update all the primal variables in a partially parallel manner, i.e., they first update $x_1$ and then update $x_2$, $x_3$ in a parallel manner; (ii) in the correction step, the method in [13] updates $x_3$, λ; the method in [26] and MHD-ALM update $x_2$, $x_3$, λ; and the methods in [12,24,25] update all the variables.
The convergence analysis of MHD-ALM needs the following assumption and auxiliary sequence.
Assumption 3.1 The matrices $A_2$ and $A_3$ in (1) are both full column rank.
Define an auxiliary sequence $\{\hat{w}^k\}$ as in (15).
To prove the convergence results of MHD-ALM, we need only cast it into ProAlo and ensure that the following two conditions hold: (i) the generated sequence satisfies (11) and (12); (ii) the resulting matrices Q and M satisfy Condition 2.1 in Sect. 2. We first verify the first condition. Based on Lemma 2.1, we can derive the first-order optimality conditions of the subproblems in (13), which are summarized in the following lemma.
Lemma 3.1 Let $\{w^k\}$ be the sequence generated by MHD-ALM and $\{\hat{w}^k\}$ be defined as in (15). Then inequality (16) holds, where the matrix Q is defined by (17).
Proof Based on Lemma 2.1 and using the notation $\hat{w}^k$ in (15), the first-order optimality conditions for the three minimization problems in (13) can be written as three variational inequalities, one for each block. Furthermore, the definition of the variable $\hat{\lambda}^k$ in (15) gives a fourth inequality for the multiplier. Adding the above four inequalities, rearranging terms, and using the definitions of the matrix Q and the function F(w), we get the result (16). This completes the proof.
This and inequality (16) indicate that the mixed variational inequality (8) holds at the current iterate. Therefore, $w^k \in W^*$, and the stopping criterion of MHD-ALM is reasonable.
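By the definition of $\hat{\lambda}^k$ in (15), the updating formula of $\hat{\lambda}^k$ can be written in closed form. This together with (14) and (15) gives a correction step of the form (12), where the matrix M is defined by (19). Now, to establish the convergence results of MHD-ALM, we only need to verify that the matrices Q and M satisfy Condition 2.1 in Sect. 2.
Lemma 3.2 Let the matrices Q and M be defined by (17) and (19). If α ∈ (0, 0.5858) and Assumption 3.1 holds, then Q and M satisfy Condition 2.1.
Proof (i) From the definition of Q, we obtain the symmetric matrix $Q^\top + Q$, which is positive definite by Assumption 3.1.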

Theorem 3.1 (Global convergence) Let $\{w^k\}$ be the sequence generated by MHD-ALM. Then it converges to a vector $w^\infty$, which belongs to $W^*$.

Theorem 3.2 (Ergodic convergence rate) Let $\{w^k\}$ be the sequence generated by MHD-ALM, and let $\{\hat{w}^k\}$ be the corresponding sequence defined in (15). Set $\hat{w}_t = \frac{1}{t} \sum_{k=0}^{t-1} \hat{w}^k$. Then, for any integer t ≥ 1, inequality (21) holds.

Theorem 3.3 (Nonergodic convergence rate) Let $\{w^k\}$ be the sequence generated by MHD-ALM. Then, for any $w^* \in W^*$ and integer t ≥ 1, a worst-case O(1/t) bound holds, in which $c_0 > 0$ is a constant.
The term H on the right-hand side of (21) is used to measure the ergodic convergence rate of MHD-ALM. However, it is independent of the distance between the initial iterate $w^0$ and the solution set $W^*$, and it is also hard to estimate because of the variable v. Therefore, inequality (21) is not a reasonable criterion to measure the ergodic convergence rate of MHD-ALM. In the following, we give a refined result in terms of the objective function and the constraint violation of problem (1), which is more reasonable, accurate, and intuitive.

Lemma 3.3 Let {w k } be the sequence generated by MHD-ALM. Then, for any w ∈ W, we have
Proof The proof is similar to that of Lemma 3.1 in [28] and is omitted for brevity.
Theorem 3.4 (Refined ergodic convergence rate) Let $\{w^k\}$ be the sequence generated by MHD-ALM. Then, for any integer t ≥ 1, there exists a constant c > 0 such that the objective error and the constraint violation at the ergodic average are bounded in terms of c/t.
Proof Choose $w^* = (x^*, \lambda^*) \in W^*$. Then, for any $\lambda \in R^l$, we have $\tilde{w}^* := (x^*, \lambda) \in W$. From the definition of F(w) in (9) and property (10), we obtain a first inequality. Setting $w = \tilde{w}^*$ in (22), we get a second inequality. Combining the two inequalities, summing the result from k = 0 to t − 1, and dividing both sides by t, we arrive, by the convexity of $\theta_i$ (i = 1, 2, 3), at inequality (23). Since (23) holds for any λ, a suitable choice of λ yields a bound on the constraint violation.
Since $x^* \in X^*$ (here $X^*$ denotes the solution set of problem (1)), a corresponding bound on the objective error follows. Combining the two bounds completes the proof.
As mentioned in Sect. 1, He et al. [12] used a simple example to show that the iterative scheme (6) may diverge for two-block separable convex programming. If we set $\theta_1 = 0$, $A_1 = 0$ in (1) and MHD-ALM, then MHD-ALM reduces to the method in [12]. In this case, the feasible set of α in [12] is (0, 0.3670), the same as that of the method in [25] for three-block separable convex programming. Now we revisit the example given in [12].
Example 3.1 Consider the linear equation (24), which is a special case of problem (1) with, in particular, $\theta_1 = 0$ and $A_1 = 0$, so in the following we do not consider the variable $x_1$. For MHD-ALM, we set β = 1 and fix the initial point. The iteration stops when the stopping criterion is met or the number of iterations exceeds 10,000. The numerical results are shown graphically in Fig. 1, which illustrates that when α ≤ 0.5, the number of iterations decreases as α increases, while when α ∈ (0.5, 0.55), the number of iterations increases quickly. Therefore, α = 0.5 is optimal for this problem, and larger values of α in its feasible set can indeed enhance the numerical performance of MHD-ALM. Of course, some extreme values, such as the values near the upper bound 0.5858, are not appropriate choices.
Now, we show that MHD-ALM may diverge for $\alpha \ge 2-\sqrt{2}$. By some simple manipulations, the iterative scheme (13), (14) for problem (24) can be written in the compact form (25) with an iteration matrix P(α), which has three eigenvalues. Let us consider the following two cases:
(1) For any $\alpha > 2-\sqrt{2}$, one eigenvalue of P(α) has modulus larger than 1. Then ρ(P(α)) > 1 for $\alpha > 2-\sqrt{2}$, where ρ(P(α)) is the spectral radius of P(α). Hence, the iterative scheme (25) with $\alpha > 2-\sqrt{2}$ is divergent for this problem.
(2) For $\alpha = 2-\sqrt{2}$, the matrix $P(2-\sqrt{2})$ admits an eigenvalue decomposition. Thus, by (25), the iterates can be written in closed form, from which it follows that they fail to converge in general.
With similar reasoning, we can prove that the iterative scheme (30), (31) is convergent for any $\alpha \in (0, 2-\sqrt{3})$.
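The threshold $2-\sqrt{2}$ can be checked numerically on a small model. The Python sketch below rests on several assumptions, because the equation (24) and the schemes (13), (14), (25) are not reproduced in this text: the toy equation is taken to be x2 + x3 = 0 with zero objective functions, β = 1 as in Example 3.1, the prediction step updates x2 and x3 in parallel, the predicted multiplier is computed from the predicted x2 and x3, and the correction is the relaxed update v ← v − α(v − ṽ) on v = (x2, x3, λ). All names are hypothetical.

```python
import numpy as np

# Prediction map on v = (x2, x3, lambda) for the assumed toy problem:
#   x2_pred  = lambda - x3
#   x3_pred  = lambda - x2
#   lam_pred = lambda - (x2_pred + x3_pred) = x2 + x3 - lambda
T = np.array([[0.0, -1.0, 1.0],
              [-1.0, 0.0, 1.0],
              [1.0, 1.0, -1.0]])


def iteration_matrix(alpha):
    """Matrix of the relaxed update v_new = v - alpha * (v - T v)."""
    return np.eye(3) - alpha * (np.eye(3) - T)


def smallest_eigenvalue(alpha):
    """Smallest eigenvalue; eigvalsh sorts eigenvalues in ascending order."""
    return np.linalg.eigvalsh(iteration_matrix(alpha))[0]


def is_nonexpansive(alpha):
    """True when every eigenvalue lies in (-1, 1]."""
    eigenvalues = np.linalg.eigvalsh(iteration_matrix(alpha))
    return bool(eigenvalues[0] > -1.0 and eigenvalues[-1] <= 1.0 + 1e-12)


threshold = 2.0 - np.sqrt(2.0)
```

In this model the eigenvalue 1 belongs to the direction (1, −1, 0), which consists of solutions of the toy equation, and the smallest eigenvalue equals $1 - \alpha(2+\sqrt{2})$. It reaches −1 exactly at $\alpha = 2-\sqrt{2}$, in agreement with the two cases discussed above.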

Numerical results
In this section, we demonstrate the practical efficiency of MHD-ALM by applying it to recover low-rank and sparse components of matrices from incomplete and noisy observations. Furthermore, to give more insight into the behavior of MHD-ALM, we compare it with the full Jacobian decomposition of the augmented Lagrangian method (FJD-ALM) [12] and the proximal partially parallel splitting method with constant step size (PPPSM) [25]. All experiments are performed on a PC with a Pentium(R) Dual-Core CPU T4400@2.2 GHz and 4 GB of RAM, running a 64-bit Windows operating system. The mathematical model of recovering low-rank and sparse components of matrices from incomplete and noisy observations is [20]
$$\min_{L, S, U} \ \|L\|_* + \tau \|S\|_1 + \frac{1}{2\mu} \|P_\Omega(U)\|_F^2 \quad \text{s.t.} \quad L + S + U = D, \tag{32}$$
where $D \in R^{p \times q}$ is a given matrix, τ > 0 is a balancing parameter, μ > 0 is a penalty parameter, $\Omega \subseteq \{1, 2, \ldots, p\} \times \{1, 2, \ldots, q\}$ is the index set of the observable entries of D, and $P_\Omega : R^{p \times q} \to R^{p \times q}$ is the projection operator defined by
$$(P_\Omega(X))_{ij} = \begin{cases} X_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases}$$
Problem (32) is a concrete instance of the generic problem (1), so MHD-ALM is applicable. For this problem, the three minimization problems in (13) all admit closed-form solutions, which can be found in [20].
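The closed-form solutions themselves are given in [20] and are not reproduced here. As a hedged sketch, the Python functions below implement the standard building blocks from which such closed forms are typically assembled, assuming that the low-rank component is promoted by a nuclear norm term and the sparse component by an ℓ1 norm term: entrywise soft-thresholding, singular value thresholding, and the projection onto the observed entries. The function names are illustrative and the exact thresholds used inside MHD-ALM are not specified here.

```python
import numpy as np


def soft_threshold(X, t):
    """Entrywise shrinkage: proximal operator of t * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)


def singular_value_threshold(X, t):
    """Shrink singular values by t: proximal operator of t * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the columns of U by the shrunken singular values.
    return (U * np.maximum(s - t, 0.0)) @ Vt


def project_observed(X, mask):
    """Keep entries where mask is True and zero out the rest."""
    return np.where(mask, X, 0.0)


shrunk_vector = soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0)
shrunk_matrix = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

Soft-thresholding costs one pass over the matrix, whereas singular value thresholding requires a singular value decomposition and usually dominates the cost of each iteration.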

Simulation example
We generate the synthetic data of (32) in the same way as [5,20]. Specifically, let $L^*$ and $S^*$ be the low-rank matrix and the sparse matrix, respectively, and let rr, spr, and sr denote the rank ratio of $L^*$ (i.e., r/p), the ratio of nonzero entries of $S^*$ (i.e., $\|S^*\|_0/(pq)$), and the ratio of observed entries (i.e., $|\Omega|/(pq)$), respectively. The observed part of the matrix D is generated by Matlab scripts in which b is the vectorization of D. In this experiment, we set $\tau = 1/\sqrt{p}$, μ = p + √ 8pσ /10, $\beta = 0.06|\Omega|/\|P_\Omega(D)\|_1$, and the initial iterate $(L^0, S^0, U^0, \lambda^0) = (0, 0, 0, 0)$, and we stop when the stopping criterion (33) is satisfied or the number of iterations exceeds 500.
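The data generation can be sketched in Python as follows. This is not the Matlab script of [5,20], which is not reproduced in this text, but an illustrative stand-in under stated assumptions: the low-rank matrix is a product of two Gaussian factors, the nonzero entries of the sparse matrix and the noise are Gaussian, and the supports are drawn uniformly at random. The function name, the argument names, and the example sizes are hypothetical.

```python
import numpy as np


def generate_problem(p, q, rr, spr, sr, sigma, seed=0):
    """Return (low_rank, sparse, mask, observed) for a synthetic test."""
    rng = np.random.default_rng(seed)
    # Low-rank part of rank r = rr * p (assumed Gaussian factors).
    r = max(1, int(round(rr * p)))
    low_rank = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
    # Sparse part with spr * p * q nonzero entries at random positions.
    n_sparse = int(round(spr * p * q))
    sparse = np.zeros(p * q)
    support = rng.choice(p * q, size=n_sparse, replace=False)
    sparse[support] = rng.standard_normal(n_sparse)
    sparse = sparse.reshape(p, q)
    # Observation mask with sr * p * q observed entries.
    n_observed = int(round(sr * p * q))
    mask = np.zeros(p * q, dtype=bool)
    mask[rng.choice(p * q, size=n_observed, replace=False)] = True
    mask = mask.reshape(p, q)
    # Noisy data, visible only on the observed entries.
    noisy = low_rank + sparse + sigma * rng.standard_normal((p, q))
    observed = np.where(mask, noisy, 0.0)
    return low_rank, sparse, mask, observed


low_rank, sparse, mask, observed = generate_problem(
    p=40, q=40, rr=0.05, spr=0.05, sr=0.8, sigma=1e-3)
```

Fixing the seed makes runs of different methods comparable, since all of them then see the same low-rank part, sparse part, and observation pattern.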
The parameters in the three tested methods are listed as follows:

Application example
In this subsection, we apply the proposed method to solve the video background extraction problem with missing and noisy data [29]. The test video was taken in an airport and consists of 200 grayscale frames, each having 144 × 176 pixels. We need to separate its background and foreground. Vectorizing all frames of the video, we get a matrix $D \in R^{25,344 \times 200}$, in which each column represents a frame. Let $L, S \in R^{25,344 \times 200}$ be the matrix representations of its background and foreground (i.e., the moving objects), respectively. Then the rank of L is exactly one, and S should be sparse with only a small number of nonzero elements. We assume that only a fraction of the entries of D can be observed, whose indices are collected in the index set Ω. Then the background extraction problem with missing and noisy data can be cast as problem (32). In the experiment, the parameters in MHD-ALM are set as α = 0.5 and $\beta = 0.005|\Omega|/\|P_\Omega(D)\|_1$, the parameters in (32) are set as $\tau = 1/\sqrt{p}$ and μ = 0.01, and the initial iterate is $(L^0, S^0, U^0, \lambda^0) = (0, 0, 0, 0)$. We use the same stopping criterion as (33) with the tolerance $10^{-2}$. Figure 2 displays the separation results of the 10th and 125th frames of the video with sr = 0.7, which indicate that the proposed MHD-ALM successfully separates the background and foreground of the two frames.
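The vectorization step can be sketched in Python as follows. The sketch assumes that the video is stored as an array of shape (frames, height, width) and that pixels are flattened row by row. The ordering used in the original experiments is not stated (Matlab flattens column by column), and the function names are hypothetical. A zero array stands in for the actual airport video.

```python
import numpy as np


def frames_to_matrix(frames):
    """Stack (n_frames, height, width) into (height * width, n_frames)."""
    n_frames = frames.shape[0]
    # Each frame becomes one column of the data matrix.
    return frames.reshape(n_frames, -1).T


def matrix_to_frames(matrix, height, width):
    """Inverse of frames_to_matrix, used to view L and S as images."""
    return matrix.T.reshape(-1, height, width)


video = np.zeros((200, 144, 176))   # placeholder data, not the real video
video[10, 3, 5] = 1.0               # mark one pixel of one frame
D = frames_to_matrix(video)
restored = matrix_to_frames(D, 144, 176)
```

With 200 frames of 144 × 176 pixels, the resulting matrix has 25,344 rows and 200 columns, and the recovered columns of L and S can be mapped back to images with the inverse function.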

Conclusion
In this paper, a modified hybrid decomposition of the augmented Lagrangian method is proposed for three-block separable convex programming, whose most important characteristic is that its correction step adopts a constant step size. We showed that the optimal upper bound of the constant step size is $2-\sqrt{2}$. Preliminary numerical results indicate that the proposed method is more efficient than similar methods in the literature.
The following two issues deserve further research: (i) Since Condition 2.1 is only a sufficient condition for the convergence of ProAlo, is 1 the optimal upper bound of α in the iterative scheme (28), (29)? Similarly, is $2-\sqrt{2}$ the optimal upper bound of α in the iterative scheme (30), (31)? (ii) If we choose different step sizes for $x_2$, $x_3$, λ in the correction step of MHD-ALM, the feasible set of these step sizes needs more discussion.