On the global and linear convergence of direct extension of ADMM for 3-block separable convex minimization models

Abstract

In this paper, we show that when the alternating direction method of multipliers (ADMM) is extended directly to 3-block separable convex minimization problems, it is convergent if one function in the objective possesses sub-strong monotonicity, which is weaker than strong convexity. In particular, we estimate the global linear convergence rate of the direct extension of ADMM, measured by the iteration complexity, under some additional conditions.

1 Introduction

There is still a gap between the empirical efficiency of the direct extension of ADMM in a variety of applications and the lack of theoretical conditions that both ensure its convergence and are satisfied by those applications. For this reason, the main attention of this paper is paid to the convergence of the direct extension of ADMM for 3-block separable convex optimization problems.

We consider the following separable convex minimization problem whose objective function is the sum of three functions without coupled variables:

$$ \begin{aligned} &\min \theta_{1}(x_{1})+ \theta_{2}(x_{2})+\theta_{3}(x_{3}) \\ &\quad \mbox{s.t. } A_{1}x_{1}+A_{2}x_{2}+A_{3}x_{3}=b, \end{aligned} $$
(1)

where \(A_{i}\in\mathcal{R}^{l\times n_{i}}\) (\(i = 1,2,3\)), \(b\in\mathcal {R}^{l}\), and \(\theta_{i}:\mathcal{R}^{n_{i}}\rightarrow(-\infty,+\infty ]\) (\(i = 1,2,3\)) are closed proper convex (not necessarily smooth) functions. This model has many applications in practice, for example, the latent variable Gaussian graphical model selection in [1], the quadratic discriminant analysis model in [2], and the robust principal component analysis model with noisy and incomplete data in [3, 4]. The augmented Lagrangian function of (1) is defined as

$$ \mathcal{L_{\beta}}(x_{1},x_{2},x_{3}, \lambda):=\sum_{i=1}^{3}\theta _{i}(x_{i})-\Biggl\langle \lambda,\sum _{i=1}^{3}A_{i}x_{i}-b\Biggr\rangle +\frac{\beta }{2}\Biggl\Vert \sum_{i=1}^{3}A_{i}x_{i}-b \Biggr\Vert ^{2}, $$
(2)

where \(\lambda\in\mathcal{R}^{l}\) and \(\beta>0\).

The classical alternating direction method of multipliers (ADMM) for solving 2-block separable convex minimization problems was first introduced by Gabay and Mercier [5] and by Glowinski and Marrocco [6], and its iterative scheme can be described by

$$\begin{aligned}& x_{1}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{1}} \biggl\{ \theta _{1}(x_{1})-\bigl\langle \lambda ^{k},A_{1}x_{1}\bigr\rangle +\frac{\beta}{2} \bigl\Vert A_{1}x_{1}+A_{2}x^{k}_{2}-b \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(3a)
$$\begin{aligned}& x_{2}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{2}} \biggl\{ \theta _{2}(x_{2})-\bigl\langle \lambda ^{k},A_{2}x_{2}\bigr\rangle +\frac{\beta}{2}\bigl\Vert A_{1}x^{k+1}_{1}+A_{2}x_{2}-b \bigr\Vert ^{2}\biggr\} , \end{aligned}$$
(3b)
$$\begin{aligned}& \lambda^{k+1}=\lambda^{k}-\alpha\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b \bigr), \end{aligned}$$
(3c)

where \(\alpha>0\) is called the step-length. The convergence of ADMM has been well established in the literature (see [5, 7, 8]). For more details on the ADMM, the reader can also refer to [5, 9–13].
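For concreteness, the following is a minimal Python sketch of the iteration (3a)-(3c). It is our own illustration, not part of the original presentation: the subproblem solvers `argmin_x1` and `argmin_x2` are assumed to be supplied by the user, and no stopping test is included.

```python
import numpy as np

def admm_2block(argmin_x1, argmin_x2, A1, A2, b, beta, alpha=1.0, max_iter=500):
    """Classical 2-block ADMM (3a)-(3c).

    argmin_x1(x2, lam) and argmin_x2(x1, lam) are user-supplied solvers of the
    x1- and x2-subproblems; alpha is the step-length in the multiplier update.
    """
    x2 = np.zeros(A2.shape[1])
    lam = np.zeros(b.shape[0])
    for _ in range(max_iter):
        x1 = argmin_x1(x2, lam)                              # (3a)
        x2 = argmin_x2(x1, lam)                              # (3b)
        lam = lam - alpha * beta * (A1 @ x1 + A2 @ x2 - b)   # (3c)
    return x1, x2, lam
```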

Because of the extreme simplicity and efficiency of the classical ADMM in numerous applications, such as mathematical imaging science and signal processing, it is natural to extend the classical ADMM (3a)-(3c) directly to (1). The direct extension of the ADMM for solving problem (1) consists of the following iterations:

$$\begin{aligned}& x_{1}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{1}} \mathcal{L_{\beta }}\bigl(x_{1},x_{2}^{k},x_{3}^{k}, \lambda ^{k}\bigr), \end{aligned}$$
(4a)
$$\begin{aligned}& x_{2}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{2}} \mathcal{L_{\beta }}\bigl(x_{1}^{k+1},x_{2},x_{3}^{k}, \lambda^{k}\bigr), \end{aligned}$$
(4b)
$$\begin{aligned}& x_{3}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{3}} \mathcal{L_{\beta }}\bigl(x_{1}^{k+1},x_{2}^{k+1},x_{3}, \lambda^{k}\bigr), \end{aligned}$$
(4c)
$$\begin{aligned}& \lambda^{k+1}=\lambda^{k}-\alpha\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \bigr). \end{aligned}$$
(4d)

Despite the scheme working very well for many concrete applications of (1) (see, e.g., [1, 2, 4, 14]), Chen et al. [15] showed by a counterexample that (4a)-(4d) can fail to converge. The lack of convergence of (4a)-(4d) has inspired some improved algorithms, which mainly follow one of two approaches. One approach is to correct the output of (4a)-(4d). For example, the authors of [16, 17] added a Gaussian back substitution correction step in each iteration after all the block variables are updated. Although these algorithms numerically perform slightly slower than the scheme (4a)-(4d), they possess global convergence. The other approach is to employ a simple proximal term to solve the \(x_{i}\)-subproblems in (4a)-(4d) inexactly, which makes the subproblems much easier to solve and the entire algorithm faster. The reader can refer to [4, 18–25].

On the other hand, several researchers have also studied the convergence of the direct extension of the ADMM (4a)-(4d) under some strong conditions. Han and Yuan [26] showed that the scheme (4a)-(4d) with \(\alpha=1\) is convergent if the functions \(\theta_{i}\) (\(i=1,2,3\)) are all strongly convex and the penalty parameter β is chosen in a certain interval. Subsequently, these conditions were weakened in [27, 28], where the authors showed that strong convexity of two of the functions suffices to ensure the convergence of (4a)-(4d) with \(\alpha=1\). These conditions were further weakened by Cai et al. [29], who proved that the scheme (4a)-(4d) with \(\alpha=1\) is convergent if one function in the objective is strongly convex. Very recently, Li et al. [30] showed that the directly extended 3-block ADMM with \(\alpha \in(0,(1+\sqrt{5})/2)\) is convergent if β is smaller than a certain threshold, the first and third coefficient matrices in the linear constraint have full column rank, and the second function in the objective is strongly convex. However, the strong convexity requirement excludes many applications that are efficiently solved by the scheme (4a)-(4d). Thus, these conditions are mainly of theoretical interest and seem too strict to be satisfied by the applications mentioned above.

In the cyclic sense, the scheme (4a)-(4d) with \(\alpha=1\) can be rewritten as

$$\begin{aligned}& x_{1}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{1}} \mathcal{L_{\beta }}\bigl(x_{1},x_{2}^{k},x_{3}^{k}, \lambda ^{k}\bigr), \end{aligned}$$
(5a)
$$\begin{aligned}& x_{2}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{2}} \mathcal{L_{\beta }}\bigl(x_{1}^{k+1},x_{2},x_{3}^{k}, \lambda^{k}\bigr), \end{aligned}$$
(5b)
$$\begin{aligned}& \lambda^{k+1}=\lambda^{k}-\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b \bigr), \end{aligned}$$
(5c)
$$\begin{aligned}& x_{3}^{k+1}=\mathop{\operatorname{arg\,min}}_{x_{3}} \mathcal{L_{\beta }}\bigl(x_{1}^{k+1},x_{2}^{k+1},x_{3}, \lambda^{k+1}\bigr). \end{aligned}$$
(5d)
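This reordering can be made explicit in code. The following minimal Python sketch is our own illustration, with user-supplied subproblem solvers `argmin_xi`: compared with (4a)-(4d), the multiplier is updated before the \(x_{3}\)-subproblem using the previous \(x_{3}\), and the \(x_{3}\)-subproblem then uses the new multiplier.

```python
import numpy as np

def admm_3block_cyclic(argmin_x1, argmin_x2, argmin_x3, A1, A2, A3, b, beta,
                       max_iter=500):
    """Cyclically reordered 3-block ADMM (5a)-(5d).

    argmin_xi(...) minimizes the augmented Lagrangian (2) over the i-th block,
    with the remaining blocks and the multiplier fixed at the passed values.
    """
    x2 = np.zeros(A2.shape[1])
    x3 = np.zeros(A3.shape[1])
    lam = np.zeros(b.shape[0])
    for _ in range(max_iter):
        x1 = argmin_x1(x2, x3, lam)                               # (5a)
        x2 = argmin_x2(x1, x3, lam)                               # (5b)
        lam = lam - beta * (A1 @ x1 + A2 @ x2 + A3 @ x3 - b)      # (5c): old x3
        x3 = argmin_x3(x1, x2, lam)                               # (5d): new lambda
    return x1, x2, x3, lam
```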

In this manuscript, we show that (5a)-(5d) is convergent if one function in the objective of (1) is sub-strongly monotone (in the sense of Assumption 3.1 below), together with some minor restrictions on the coefficient matrices \(A_{1}\), \(A_{2}\), \(A_{3}\) and the penalty parameter β. This explains why the direct extension of ADMM (4a)-(4d) works well for some applications even though no strongly convex function appears in them. Furthermore, we establish a global linear convergence rate for the direct extension of ADMM (5a)-(5d) under some additional conditions.

After presenting the needed preliminary material in Section 2, we devote Section 3 to a proof of the global and linear convergence of the scheme (5a)-(5d) under some assumptions. In Section 4, we construct an example which satisfies the convergence conditions given in Section 3 but does not satisfy the condition that one of the functions in the objective is strongly convex.

2 Preliminaries

In this section we summarize some notation and fundamental tools of variational analysis.

We use \(\langle\cdot,\cdot\rangle\) to denote the inner product of \(\mathcal{R}^{n}\), and denote by \(\|\cdot\|\) its induced norm. \(\mathbf {B}_{r}(x)\) stands for the closed ball of radius r centered at x. Throughout the paper all vectors are column vectors. For a symmetric matrix A, we use \(\lambda_{\min}(A)\) and \(\lambda_{\max}(A)\) to denote its smallest and largest eigenvalues, respectively. A real symmetric matrix \(A\in\mathcal{R}^{n\times n}\) is called positive definite (respectively, positive semi-definite) if \(x^{T}Ax>0\) for all \(x\neq0\) (respectively, \(x^{T}Ax\geq0\) for all x). We denote this by \(A\succ0\) (respectively, \(A\succeq0\)). For real symmetric matrices \(A,B\in \mathcal{R}^{n\times n}\), we write \(A\succ B\) (or \(A\succeq B\)) to mean \(A-B\succ0\) (or \(A-B\succeq0\)). We denote by \(\|x\|_{M}:=\sqrt{x^{T}Mx}\) the M-norm of the vector x when the matrix M is symmetric and positive definite. For a given matrix A, we use

$$\|A\|:=\sup_{x\neq0}\frac{\|Ax\|}{\|x\|} $$

to denote its norm.

Given a nonempty subset C in \(\mathcal{R}^{n}\), its indicator function is defined as

$$\delta(x;C):= \textstyle\begin{cases} 0,&x\in C, \\ +\infty,&\mbox{otherwise}. \end{cases} $$

A function \(f: \mathcal{R}^{n}\rightarrow\mathcal{R}\) is convex if

$$f\bigl(\alpha x+(1-\alpha)y\bigr)\leq\alpha f(x)+(1-\alpha)f(y), \quad \forall x,y\in\mathcal{R}^{n}, \forall\alpha\in[0,1], $$

and it is strongly convex with modulus \(\mu> 0\) if

$$f\bigl(\alpha x+(1-\alpha)y\bigr)\leq\alpha f(x)+(1-\alpha)f(y)- \frac{\mu }{2}\alpha(1-\alpha)\|x-y\|^{2},\quad \forall x,y\in \mathcal{R}^{n}, \forall \alpha\in[0,1]. $$

A multifunction \(F: \mathcal{R}^{n}\rightrightarrows\mathcal{R}^{n}\) (see [31]) is monotone if

$$\langle y_{1}-y_{2},x_{1}-x_{2}\rangle \geq0,\quad \forall y_{1}\in F(x_{1}), \forall y_{2}\in F(x_{2}), $$

and strongly monotone with modulus \(\mu> 0\) if

$$\langle y_{1}-y_{2},x_{1}-x_{2}\rangle \geq\mu\|x_{1}-x_{2}\|^{2},\quad \forall y_{1}\in F(x_{1}), \forall y_{2}\in F(x_{2}). $$

It is well known that a function f is convex if and only if ∂f, the subdifferential of f, is monotone; and f is strongly convex if and only if ∂f is strongly monotone (see, e.g., [31]).

For a differentiable function f, the gradient ∇f is called Lipschitz continuous with constant \(L_{f}>0\) if

$$\bigl\Vert \nabla f(x)-\nabla f(y)\bigr\Vert \leq L_{f}\Vert x-y\Vert ,\quad \forall x,y\in\mathcal{R}^{n}. $$

For any two vectors x and y with the same dimension, we have

$$ 2\langle x,y\rangle\leq t\|x\|^{2}+\frac{1}{t}\|y \|^{2},\quad \forall t>0. $$
(6)

Throughout this paper, we make the following standard assumption.

Assumption 2.1

There is a point \((\hat{x}_{1},\hat{x}_{2},\hat{x}_{3})\in\operatorname{ri} (\operatorname{dom}\theta_{1}\times\operatorname{dom}\theta_{2}\times \operatorname{dom}\theta_{3})\) such that \(A_{1}\hat{x}_{1}+A_{2}\hat{x}_{2}+A_{3}\hat{x}_{3}=b\).

Suppose that this constraint qualification (CQ) holds. Then we know from Corollary 28.2.2 and Corollary 28.3.1 of [31] that \((x_{1}^{*},x_{2}^{*},x_{3}^{*})\in\operatorname{ri} (\operatorname{dom}\theta_{1}\times\operatorname{dom}\theta_{2}\times\operatorname{dom}\theta_{3})\) is an optimal solution to problem (1) if and only if there exists a Lagrange multiplier \(\lambda^{*}\in\mathcal{R}^{l}\) such that \((x_{1}^{*},x_{2}^{*},x_{3}^{*},\lambda^{*})\) is a solution to the following Karush-Kuhn-Tucker (KKT) system:

$$\begin{aligned}& 0\in\partial\theta_{1}\bigl(x_{1}^{*}\bigr)-A_{1}^{T} \lambda^{*}, \end{aligned}$$
(7a)
$$\begin{aligned}& 0\in\partial\theta_{2}\bigl(x_{2}^{*}\bigr)-A_{2}^{T} \lambda^{*}, \end{aligned}$$
(7b)
$$\begin{aligned}& 0\in\partial\theta_{3}\bigl(x_{3}^{*}\bigr)-A_{3}^{T} \lambda^{*}, \end{aligned}$$
(7c)
$$\begin{aligned}& 0=A_{1}x_{1}^{*}+A_{2}x_{2}^{*}+A_{3}x_{3}^{*}-b. \end{aligned}$$
(7d)

We denote by \(\mathcal{W}^{*}\) the set of the solutions of (7a)-(7d).
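When the functions \(\theta_{i}\) are differentiable, the inclusions (7a)-(7c) become equations, and the KKT residual of a candidate point can be monitored directly. The following helper is our own sketch under that smoothness assumption:

```python
import numpy as np

def kkt_residual(grad_thetas, As, b, xs, lam):
    """Maximum norm of the residuals of (7a)-(7d) for differentiable theta_i.

    grad_thetas: gradient callables of theta_1, theta_2, theta_3.
    As:          coefficient matrices A_1, A_2, A_3.
    xs:          primal blocks x_1, x_2, x_3;  lam: candidate multiplier.
    """
    dual = [np.linalg.norm(g(x) - A.T @ lam)
            for g, A, x in zip(grad_thetas, As, xs)]                  # (7a)-(7c)
    primal = np.linalg.norm(sum(A @ x for A, x in zip(As, xs)) - b)   # (7d)
    return max(max(dual), primal)
```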

3 Convergence

In this section, we prove that the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) generated by the direct extension of ADMM (5a)-(5d) converges to a point \((x_{1}^{*}, x^{*}_{2}, x^{*}_{3}, \lambda^{*})\) which is a solution of the KKT system (7a)-(7d) under the following assumption. In the following, the matrices \(A_{1}\), \(A_{2}\), and \(A_{3}\) are assumed to have full column rank. We define the notation

$$ G:= \begin{pmatrix}\beta A_{2}^{T}A_{2}&{\mathbf{0}}&{\mathbf{0}}\\{\mathbf{0}}&\beta A_{3}^{T}A_{3}&-A^{T}_{3}\\{\mathbf{0}}&-A_{3}&\frac{1}{\beta}I \end{pmatrix} , \qquad G_{1}:= \begin{pmatrix}\beta A_{2}^{T}A_{2}&{\mathbf{0}}&{\mathbf{0}}\\{\mathbf{0}}&\beta(1+\frac {3}{\rho}) A_{3}^{T}A_{3}&-A^{T}_{3}\\{\mathbf{0}}&-A_{3}&\frac{1}{\beta}I \end{pmatrix} , $$

and

$$ w:=(x_{1},x_{2},x_{3}, \lambda)^{T}, \qquad v:=(x_{2},x_{3}, \lambda)^{T}, $$

where \(\rho>0\) and β is the penalty parameter in the direct extension of ADMM (5a)-(5d). Then the matrices G and \(G_{1}\) are symmetric.

3.1 Global convergence

Assumption 3.1

(Sub-strong monotonicity)

There exist \((\tilde{x}_{1}^{*},\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde {\lambda}^{*})\in\mathcal{W}^{*}\) and a real number \(\mu_{3}>0\) such that

$$ \bigl\langle y_{3}-A_{3}^{T}\tilde{ \lambda}^{*},x_{3}-\tilde{x}^{*}_{3}\bigr\rangle \geq\mu _{3}\bigl\Vert x_{3}-\tilde{x}^{*}_{3}\bigr\Vert ^{2}, \quad \text{for all } x_{3}\in \mathcal {R}^{n_{3}} \text{ and } y_{3}\in\partial\theta_{3}(x_{3}). $$
(8)

Now, we start proving the convergence of the iterative scheme (5a)-(5d) under Assumption 3.1. First, we give several lemmas.

Lemma 3.1

Suppose Assumption 3.1 holds, and let the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) be generated by the direct extension of ADMM (5a)-(5d). Then we have

$$\begin{aligned} \bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k}-v^{k+1}\bigr) \geq&\bigl\langle A_{2} \bigl(x_{2}^{k}-x_{2}^{k+1}\bigr), \lambda^{k}-\lambda^{k+1}\bigr\rangle \\ &{}-\beta\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{3}\bigl(x_{3}^{k}-\tilde {x}_{3}^{*}\bigr)\bigr\rangle +\mu_{3}\bigl\Vert x_{3}^{k+1}-\tilde{x}^{*}_{3}\bigr\Vert ^{2}, \end{aligned}$$
(9)

where \(\tilde{v}^{*}=(\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde{\lambda }^{*})\) with \((\tilde{x}_{1}^{*},\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde{\lambda }^{*})\) introduced in Assumption 3.1.

Proof

Indeed, the optimality conditions of the subproblems in (5a), (5b), and (5d) can be written as

$$\begin{aligned}& 0\in\partial\theta_{1}\bigl(x_{1}^{k+1} \bigr)-A_{1}^{T}\lambda^{k}+\beta A_{1}^{T}\bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k}+A_{3}x_{3}^{k}-b \bigr), \end{aligned}$$
(10a)
$$\begin{aligned}& 0\in\partial\theta_{2}\bigl(x_{2}^{k+1} \bigr)-A_{2}^{T}\lambda^{k}+\beta A_{2}^{T}\bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k}-b \bigr), \end{aligned}$$
(10b)
$$\begin{aligned}& 0\in\partial\theta_{3}\bigl(x_{3}^{k+1} \bigr)-A_{3}^{T}\lambda^{k+1}+\beta A_{3}^{T}\bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b \bigr). \end{aligned}$$
(10c)

Using (5c), (10a)-(10c) can be rewritten as

$$\begin{aligned}& 0\in\partial\theta_{1}\bigl(x_{1}^{k+1} \bigr)-A_{1}^{T}\lambda^{k+1}+\beta A_{1}^{T}A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr), \end{aligned}$$
(11a)
$$\begin{aligned}& 0\in\partial\theta_{2}\bigl(x_{2}^{k+1} \bigr)-A_{2}^{T}\lambda^{k+1}, \end{aligned}$$
(11b)
$$\begin{aligned}& 0\in\partial\theta_{3}\bigl(x_{3}^{k+1} \bigr)-A_{3}^{T}\lambda^{k+1}+ A_{3}^{T} \bigl(\lambda ^{k}-\lambda^{k+1}\bigr)+\beta A_{3}^{T}A_{3}\bigl(x_{3}^{k+1}-x_{3}^{k} \bigr). \end{aligned}$$
(11c)

Using the monotonicity of the subdifferential and Assumption 3.1, it follows from (7a)-(7d) and (11a)-(11c) that we have

$$\begin{aligned}& \bigl\langle A_{1}^{T}\bigl(\lambda^{k+1}-\tilde{ \lambda}^{*}\bigr)+\beta A_{1}^{T}A_{2} \bigl(x_{2}^{k+1}-x_{2}^{k} \bigr),x_{1}^{k+1}-\tilde{x}^{*}_{1}\bigr\rangle \geq0, \end{aligned}$$
(12a)
$$\begin{aligned}& \bigl\langle A_{2}^{T}\bigl(\lambda^{k+1}-\tilde{ \lambda}^{*}\bigr),x_{2}^{k+1}-\tilde {x}^{*}_{2}\bigr\rangle \geq0, \end{aligned}$$
(12b)
$$\begin{aligned}& \bigl\langle A_{3}^{T}\bigl(\lambda^{k+1}-\tilde{ \lambda}^{*}\bigr)+A_{3}^{T}\bigl(\lambda ^{k+1}- \lambda^{k}\bigr)-\beta A_{3}^{T}A_{3} \bigl(x_{3}^{k+1}-x_{3}^{k} \bigr),x_{3}^{k+1}-\tilde {x}^{*}_{3}\bigr\rangle \\& \quad \geq\mu_{3}\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}, \end{aligned}$$
(12c)

where \(\mu_{3}>0\). Adding up these three inequalities in (12a)-(12c) and using (7d), we obtain

$$\begin{aligned}& \mu_{3}\bigl\Vert x_{3}^{k+1} - \tilde{x}^{*}_{3}\bigr\Vert ^{2} \\& \quad \leq\bigl\langle \lambda^{k+1}-\tilde{\lambda}^{*},A_{1} \bigl(x_{1}^{k+1}-\tilde {x}_{1}^{*} \bigr)+A_{2}\bigl(x_{2}^{k+1}-\tilde{x}_{2}^{*} \bigr)+A_{3}\bigl(x_{3}^{k+1}-\tilde {x}_{3}^{*}\bigr)\bigr\rangle \\& \qquad {}+\bigl\langle x_{1}^{k+1}-\tilde{x}_{1}^{*}, \beta A_{1}^{T}A_{2}\bigl(x_{2}^{k+1}-x_{2}^{k} \bigr) \bigr\rangle +\bigl\langle x_{3}^{k+1}- \tilde{x}_{3}^{*}, A_{3}^{T}\bigl(\lambda ^{k+1}-\lambda^{k}\bigr) \bigr\rangle \\& \qquad {}+\bigl\langle x_{3}^{k+1}-\tilde{x}_{3}^{*}, \beta A_{3}^{T}A_{3}\bigl(x_{3}^{k}-x_{3}^{k+1} \bigr) \bigr\rangle \\& \quad = \biggl\langle \lambda^{k+1}-\tilde{\lambda}^{*},\frac{1}{\beta} \bigl(\lambda ^{k}-\lambda^{k+1}\bigr)-A_{3} \bigl(x_{3}^{k}-x_{3}^{k+1}\bigr)\biggr\rangle \\& \qquad {}+\bigl\langle x_{3}^{k+1}-\tilde{x}_{3}^{*}, \beta A_{3}^{T}A_{3}\bigl(x_{3}^{k}-x_{3}^{k+1} \bigr)-A_{3}^{T}\bigl(\lambda^{k}- \lambda^{k+1}\bigr) \bigr\rangle \\& \qquad {}+\bigl\langle x_{1}^{k+1}-\tilde{x}_{1}^{*}, \beta A_{1}^{T}A_{2}\bigl(x_{2}^{k+1}-x_{2}^{k} \bigr) \bigr\rangle . \end{aligned}$$

Using the notations G and v, we further obtain

$$\begin{aligned}& \bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k}-v^{k+1}\bigr) \\& \quad \geq\beta\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{1}\bigl(x_{1}^{k+1}-\tilde {x}^{*}_{1}\bigr)\bigr\rangle +\mu_{3}\bigl\Vert x_{3}^{k+1}-\tilde{x}^{*}_{3}\bigr\Vert ^{2} \\& \qquad {}+\beta\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{2}\bigl(x_{2}^{k+1}-\tilde {x}^{*}_{2}\bigr)\bigr\rangle \\& \quad =\beta\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{1}\bigl(x_{1}^{k+1}-\tilde {x}^{*}_{1}\bigr)+A_{2}\bigl(x_{2}^{k+1}- \tilde{x}^{*}_{2}\bigr)\bigr\rangle +\mu_{3}\bigl\Vert x_{3}^{k+1}-\tilde {x}^{*}_{3}\bigr\Vert ^{2} \\& \quad =\beta\biggl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),\frac{1}{\beta}\bigl(\lambda ^{k}-\lambda^{k+1} \bigr)-A_{3}\bigl(x_{3}^{k}-\tilde{x}_{3}^{*} \bigr)\biggr\rangle +\mu_{3}\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}, \end{aligned}$$
(13)

which implies (9) and thus completes the proof. □

Lemma 3.2

There exists a real number \(\rho\in(0,1)\) such that the matrix \(G_{1}\) is symmetric and positive definite.

Proof

Let

$$ P_{1}:= \begin{pmatrix}\beta(1+\frac{3}{\rho}) A_{3}^{T}A_{3}&-A^{T}_{3}\\-A_{3}&\frac {1}{\beta}I \end{pmatrix} . $$

In order to show that the matrix \(G_{1}\) is symmetric and positive definite, it suffices to show that the matrix \(P_{1}\) is positive definite, since \(A_{2}\) has full column rank and hence \(\beta A_{2}^{T}A_{2}\succ0\). Observe that

$$\begin{aligned} \frac{1}{\beta}I-A_{3}\biggl[\beta\biggl(1+ \frac{3}{\rho}\biggr) A_{3}^{T}A_{3} \biggr]^{-1}A_{3}^{T}&=\frac{1}{\beta}I- \frac{1}{\beta}\cdot\frac {\rho}{3+\rho}A_{3}\bigl(A_{3}^{T}A_{3} \bigr)^{-1}A_{3}^{T} \\ &\succeq\frac{1}{\beta}I-\frac{1}{\beta}\cdot\frac{\rho}{3+\rho }\cdot \frac{\lambda_{\max}(A_{3}A_{3}^{T})}{\lambda_{\min}(A_{3}^{T}A_{3})}I \\ &=\biggl[1-\frac{\rho}{3+\rho}\cdot\frac{\lambda_{\max }(A_{3}A_{3}^{T})}{\lambda_{\min}(A_{3}^{T}A_{3})}\biggr]\frac{1}{\beta}I. \end{aligned}$$

If \(\lambda_{\max}(A_{3}A_{3}^{T})\leq\lambda_{\min}(A_{3}^{T}A_{3})\), then for any \(\rho\in(0,1)\), we have

$$\frac{1}{\beta}I-A_{3}\biggl[\beta\biggl(1+\frac{3}{\rho} \biggr) A_{3}^{T}A_{3}\biggr]^{-1}A_{3}^{T} \succ0. $$

Otherwise, for any

$$\rho\in\biggl(0,\frac{3\lambda_{\min}(A_{3}^{T}A_{3})}{\lambda_{\max }(A_{3}A_{3}^{T})-\lambda_{\min}(A_{3}^{T}A_{3})}\biggr), $$

we have

$$\frac{1}{\beta}I-A_{3}\biggl[\beta\biggl(1+\frac{3}{\rho} \biggr) A_{3}^{T}A_{3}\biggr]^{-1}A_{3}^{T} \succ0. $$

Thus, it follows from the Schur complement condition (see [32], Section A.5.5) that there exists a real number \(\rho\in(0,1)\) such that the matrix \(P_{1}\) is symmetric and positive definite, and hence so is \(G_{1}\). □
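Lemma 3.2 is easy to check numerically. The sketch below is our own illustration: it assembles \(G_{1}\) for randomly generated full-column-rank \(A_{2}\) and \(A_{3}\) and a sample choice of β and ρ, and tests positive definiteness via the smallest eigenvalue.

```python
import numpy as np

def build_G1(A2, A3, beta, rho):
    """Assemble the matrix G1 defined at the beginning of Section 3."""
    l, n2 = A2.shape
    n3 = A3.shape[1]
    G1 = np.zeros((n2 + n3 + l, n2 + n3 + l))
    G1[:n2, :n2] = beta * A2.T @ A2
    G1[n2:n2 + n3, n2:n2 + n3] = beta * (1.0 + 3.0 / rho) * (A3.T @ A3)
    G1[n2:n2 + n3, n2 + n3:] = -A3.T
    G1[n2 + n3:, n2:n2 + n3] = -A3
    G1[n2 + n3:, n2 + n3:] = (1.0 / beta) * np.eye(l)
    return G1

rng = np.random.default_rng(0)
A2 = rng.standard_normal((5, 2))          # full column rank with probability one
A3 = rng.standard_normal((5, 3))
G1 = build_G1(A2, A3, beta=0.5, rho=0.5)
print(np.linalg.eigvalsh(G1).min())       # a positive value confirms G1 is positive definite
```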

Lemma 3.3

Let the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) be generated by the direct extension of ADMM (5a)-(5d) with \(\beta\in(0,2\rho\mu_{3}/(5\|A_{3}^{T}A_{3}\|))\) and \(\rho\in(0,1)\) defined in Lemma  3.2. Suppose Assumption  3.1 holds. Then there is a real number \(\eta>0\) such that

$$ \bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}\leq\bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}-\eta \bigl\Vert v^{k}-v^{k+1} \bigr\Vert _{G_{1}}^{2}. $$
(14)

Proof

Note that (11b) is also true for \(k := k - 1\), i.e.,

$$0\in\partial\theta_{2}\bigl(x_{2}^{k} \bigr)-A_{2}^{T}\lambda^{k}. $$

Using the monotonicity of the subdifferential \(\partial\theta_{2}\), we have

$$\begin{aligned} 0&\leq\bigl\langle A_{2}^{T} \lambda^{k}-A_{2}^{T}\lambda^{k+1}, x_{2}^{k}-x_{2}^{k+1}\bigr\rangle \\ &=\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),\lambda^{k}-\lambda^{k+1}\bigr\rangle . \end{aligned}$$
(15)

It follows from (6) that

$$ -2\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{3}\bigl(x_{3}^{k}-\tilde{x}_{3}^{*} \bigr)\bigr\rangle \geq -\rho\bigl\Vert A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr)\bigr\Vert ^{2}-\frac{1}{\rho}\bigl\Vert A_{3} \bigl(x_{3}^{k}-\tilde {x}_{3}^{*}\bigr) \bigr\Vert ^{2} $$
(16)

and

$$\begin{aligned} -\bigl\Vert A_{3}\bigl(x_{3}^{k}-x_{3}^{k+1} \bigr)\bigr\Vert ^{2} =&-\bigl\Vert A_{3} \bigl(x_{3}^{k}-\tilde {x}_{3}^{*} \bigr)-A_{3}\bigl(x_{3}^{k+1}-\tilde{x}_{3}^{*} \bigr)\bigr\Vert ^{2} \\ =&-\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}+2\bigl\langle A_{3}\bigl(x_{3}^{k}-\tilde {x}_{3}^{*}\bigr),A_{3}\bigl(x_{3}^{k+1}- \tilde{x}_{3}^{*}\bigr) \bigr\rangle \\ &{} -\bigl\Vert A_{3}\bigl(x_{3}^{k+1}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2} \\ \ge&-2\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}-2\bigl\Vert A_{3}\bigl(x_{3}^{k+1}-\tilde {x}_{3}^{*}\bigr)\bigr\Vert ^{2}. \end{aligned}$$
(17)

Let

$$ P:= \begin{pmatrix}\beta(1+\frac{1}{\rho}) A_{3}^{T}A_{3}&-A^{T}_{3}\\-A_{3}&\frac {1}{\beta}I \end{pmatrix} . $$

It follows from (9) that

$$\begin{aligned}& \bigl(v^{k} -\tilde{v}^{*}\bigr)^{T}G\bigl(v^{k}- \tilde{v}^{*}\bigr) \\& \quad =\bigl[\bigl(v^{k}-v^{k+1}\bigr)+\bigl(v^{k+1}- \tilde {v}^{*}\bigr)\bigr]^{T}G\bigl[\bigl(v^{k}-v^{k+1} \bigr)+\bigl(v^{k+1}-\tilde{v}^{*}\bigr)\bigr] \\& \quad =\bigl(v^{k}-v^{k+1}\bigr)^{T}G \bigl(v^{k}-v^{k+1}\bigr)+2\bigl(v^{k+1}-\tilde {v}^{*} \bigr)^{T}G\bigl(v^{k}-v^{k+1}\bigr) \\& \qquad {}+\bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde{v}^{*}\bigr) \\& \quad \geq\bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde {v}^{*}\bigr)+\bigl(v^{k}-v^{k+1} \bigr)^{T}G\bigl(v^{k}-v^{k+1}\bigr) \\& \qquad {}+2\mu_{3}\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}+2\bigl\langle A_{2} \bigl(x_{2}^{k}-x_{2}^{k+1}\bigr), \lambda^{k}-\lambda^{k+1}\bigr\rangle \\& \qquad {}-2\beta\bigl\langle A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr),A_{3}\bigl(x_{3}^{k}-\tilde {x}_{3}^{*}\bigr)\bigr\rangle , \end{aligned}$$

which together with (15), (16), and (17) gives

$$\begin{aligned}& \bigl(v^{k}-\tilde{v}^{*}\bigr)^{T}G\bigl(v^{k}- \tilde{v}^{*}\bigr) \\& \quad \geq\bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde {v}^{*}\bigr)+\bigl(v^{k}-v^{k+1} \bigr)^{T}G\bigl(v^{k}-v^{k+1}\bigr) \\& \qquad {}+2\mu_{3}\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}-\beta\rho\bigl\Vert A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr)\bigr\Vert ^{2}-\frac{\beta}{\rho}\bigl\Vert A_{3} \bigl(x_{3}^{k}-\tilde {x}_{3}^{*}\bigr) \bigr\Vert ^{2} \\& \quad = \bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde{v}^{*}\bigr)+\beta(1-\rho)\bigl\Vert A_{2} \bigl(x_{2}^{k}-x_{2}^{k+1}\bigr)\bigr\Vert ^{2} \\& \qquad {}+\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1} \\ \lambda^{k}-\lambda^{k+1} \end{pmatrix} -\frac{\beta}{\rho}\bigl\Vert A_{3} \bigl(x_{3}^{k}-x_{3}^{k+1}\bigr)\bigr\Vert ^{2} \\& \qquad {}+2\mu_{3}\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}-\frac{\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k}-\tilde{x}_{3}^{*} \bigr)\bigr\Vert ^{2} \\& \quad \ge\bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde{v}^{*}\bigr)+\beta(1-\rho)\bigl\Vert A_{2} \bigl(x_{2}^{k}-x_{2}^{k+1}\bigr)\bigr\Vert ^{2} \\& \qquad {}+\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1} \\ \lambda^{k}-\lambda^{k+1} \end{pmatrix} +2\mu_{3}\bigl\Vert x_{3}^{k+1}-\tilde{x}^{*}_{3}\bigr\Vert ^{2} \\& \qquad {}-\frac{2\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k+1}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}- \frac {2\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}- \frac{\beta}{\rho }\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2} \\& \quad \geq\bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde{v}^{*}\bigr)+\frac{\beta }{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k+1}-\tilde{x}_{3}^{*} \bigr)\bigr\Vert ^{2}+\frac{2\beta}{\rho}\bigl\Vert A_{3} \bigl(x_{3}^{k+1}-\tilde{x}_{3}^{*}\bigr) \bigr\Vert ^{2} \\& \qquad {}+\beta(1-\rho)\bigl\Vert A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr)\bigr\Vert ^{2}+\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1} \\ \lambda^{k}-\lambda^{k+1} \end{pmatrix} \\& {}-\frac{\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}- \frac {2\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k}- \tilde{x}_{3}^{*}\bigr)\bigr\Vert ^{2}+\biggl(2 \mu_{3}-\frac {5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho}\biggr)\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}, \end{aligned}$$

which implies that

$$\begin{aligned}& \bigl(v^{k+1}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k+1}-\tilde{v}^{*}\bigr)+\frac{3\beta}{\rho }\bigl\Vert A_{3}\bigl(x_{3}^{k+1}-\tilde{x}_{3}^{*} \bigr)\bigr\Vert ^{2} \\& \quad \leq\bigl(v^{k}-\tilde{v}^{*}\bigr)^{T}G \bigl(v^{k}-\tilde{v}^{*}\bigr)+\frac{3\beta}{\rho}\bigl\Vert A_{3}\bigl(x_{3}^{k}-\tilde{x}_{3}^{*} \bigr)\bigr\Vert ^{2}-\biggl(2\mu_{3}-\frac{5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho} \biggr)\bigl\Vert x_{3}^{k+1}-\tilde{x}^{*}_{3}\bigr\Vert ^{2} \\& \qquad {}-\beta(1-\rho)\bigl\Vert A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr)\bigr\Vert ^{2}-\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1} \\ \lambda^{k}-\lambda^{k+1} \end{pmatrix} . \end{aligned}$$
(18)

Using the notation \(G_{1}\) and (18), we have

$$\begin{aligned} \bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \leq&\bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}-\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1}\\\lambda^{k}-\lambda^{k+1} \end{pmatrix} \\ &{} -\biggl(2\mu_{3}-\frac{5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho}\biggr)\bigl\Vert x_{3}^{k+1}-\tilde {x}^{*}_{3}\bigr\Vert ^{2}-\beta(1-\rho)\bigl\Vert A_{2}\bigl(x_{2}^{k}-x_{2}^{k+1} \bigr)\bigr\Vert ^{2}. \end{aligned}$$
(19)

To prove that such an \(\eta>0\) exists for (14), note first that \(\beta(1-\rho)A_{2}^{T}A_{2}\) is positive definite (since \(\rho\in(0,1)\) and \(A_{2}\) has full column rank) and that P is positive definite by the same Schur complement argument as in Lemma 3.2, so the corresponding terms in (19) dominate a positive multiple of \(\|v^{k}-v^{k+1}\|_{G_{1}}^{2}\); it then only remains to require \(2\mu_{3}-\frac{5\beta\|A_{3}^{T}A_{3}\|}{\rho}>0\), which holds since \(\beta<2\rho\mu_{3}/(5\|A_{3}^{T}A_{3}\|)\). □

Now, we are ready to prove the convergence of the sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) generated by the direct extension of ADMM (5a)-(5d) under Assumption 3.1. The result is summarized in the following theorem.

Theorem 3.1

Let the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) be generated by the direct extension of ADMM (5a)-(5d) with \(\beta\in(0,2\rho\mu_{3}/(5\|A_{3}^{T}A_{3}\|))\) and \(\rho\in(0,1)\) defined in Lemma  3.2. Suppose Assumption  3.1 holds. Then the sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) converges to a KKT point in \(\mathcal{W}^{*}\).

Proof

It follows from (14) that

$$ \lim_{k\rightarrow+\infty}\bigl\Vert v^{k}-v^{k+1} \bigr\Vert _{G_{1}}=0 $$
(20)

and the sequence \(\{v^{k}\}\) is bounded. Equation (5c) then further implies that \(\{x^{k}_{1}\}\) is also bounded and hence the sequence \(\{ (x_{1}^{k},x_{2}^{k},x_{3}^{k},\lambda^{k})\}\) generated by (5a)-(5d) is bounded. The boundedness of the sequence \(\{(x_{1}^{k},x_{2}^{k},x_{3}^{k},\lambda ^{k})\}\) indicates that there is at least one cluster point of \(\{ (x_{1}^{k},x_{2}^{k},x_{3}^{k},\lambda^{k})\}\). Let \(\bar{w}:=(\bar{x}_{1},\bar {x}_{2},\bar{x}_{3},\bar{\lambda})\) be an arbitrary cluster point of \(\{ (x_{1}^{k},x_{2}^{k},x_{3}^{k},\lambda^{k})\}\) and \(\{ (x_{1}^{k_{j}},x_{2}^{k_{j}},x_{3}^{k_{j}},\lambda^{k_{j}})\}\) be the subsequence converging to w̄. By the inequality (19), we have \(\bar{x}_{3}=\tilde{x}_{3}^{*}\). It follows from (5c) and (11a)-(11c) that

$$\begin{aligned}& 0\in\partial\theta_{1}\bigl(x_{1}^{k_{j}} \bigr)-A_{1}^{T}\lambda^{k_{j}}+\beta A_{1}^{T}A_{2}\bigl(x_{2}^{k_{j}-1}-x_{2}^{k_{j}} \bigr), \end{aligned}$$
(21a)
$$\begin{aligned}& 0\in\partial\theta_{2}\bigl(x_{2}^{k_{j}} \bigr)-A_{2}^{T}\lambda^{k_{j}}, \end{aligned}$$
(21b)
$$\begin{aligned}& 0\in\partial\theta_{3}\bigl(x_{3}^{k_{j}} \bigr)-A_{3}^{T}\lambda^{k_{j}}+ A_{3}^{T} \bigl(\lambda ^{k_{j}-1}-\lambda^{k_{j}}\bigr)+\beta A_{3}^{T}A_{3}\bigl(x_{3}^{k_{j}}-x_{3}^{k_{j}-1} \bigr), \end{aligned}$$
(21c)
$$\begin{aligned}& 0=\frac{1}{\beta}\bigl(\lambda^{k_{j}}-\lambda ^{k_{j}-1} \bigr)+A_{1}x_{1}^{k_{j}}+A_{2}x_{2}^{k_{j}}+A_{3}x_{3}^{k_{j}}-b+A_{3} \bigl(x_{3}^{k_{j}-1}-x_{3}^{k_{j}} \bigr). \end{aligned}$$
(21d)

Taking the limit in (21a)-(21d) and using (20), we obtain

$$\begin{aligned}& 0\in\partial\theta_{1}(\bar{x}_{1})-A_{1}^{T} \bar{\lambda}, \end{aligned}$$
(22a)
$$\begin{aligned}& 0\in\partial\theta_{2}(\bar{x}_{2})-A_{2}^{T} \bar{\lambda}, \end{aligned}$$
(22b)
$$\begin{aligned}& 0\in\partial\theta_{3}(\bar{x}_{3})-A_{3}^{T} \bar{\lambda}, \end{aligned}$$
(22c)
$$\begin{aligned}& 0=A_{1}\bar{x}_{1}+A_{2}\bar{x}_{2}+A_{3} \bar{x}_{3}-b, \end{aligned}$$
(22d)

which implies that \((\bar{x}_{1},\bar{x}_{2},\bar{x}_{3},\bar{\lambda})\) is a KKT point in \(\mathcal{W}^{*}\). It follows from (14) and (5c) that the iterative sequence \(\{(x_{1}^{k},x_{2}^{k},x_{3}^{k},\lambda^{k})\}\) generated by the direct extension of ADMM (5a)-(5d) converges to a KKT point in \(\mathcal {W}^{*}\). The proof is completed. □

Remark 3.1

If the sequence \(\{x_{3}^{k}\}\) is bounded, then Assumption 3.1 can be replaced by the following assumption.

Assumption 3.2

There exist \((\tilde{x}_{1}^{*},\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde {\lambda}^{*})\in\mathcal{W}^{*}\) and a real number \(\mu_{3}>0\) such that

$$ \bigl\langle y_{3}-A_{3}^{T}\tilde{ \lambda}^{*},x_{3}-\tilde{x}^{*}_{3}\bigr\rangle \geq\mu _{3}\bigl\Vert x_{3}-\tilde{x}^{*}_{3}\bigr\Vert ^{2},\quad \text{for all } x_{3}\in \mathbf {B}_{a}\bigl(\tilde{x}_{3}^{*}\bigr) \text{ and } y_{3}\in\partial\theta_{3}(x_{3}), $$
(23)

where \(a=\max_{k}\{\|x_{3}^{k}-\tilde{x}^{*}_{3}\|\}\).

3.2 Global linear convergence

Cai et al. [29] showed that the global linear convergence of the direct extension of ADMM (4a)-(4d) can be ensured if \(\theta_{2}\) and \(\theta_{3}\) are strongly convex. In this subsection, we show that a global linear convergence rate of the direct extension of ADMM (5a)-(5d) can be ensured under weaker conditions. More precisely, we establish the global linear convergence of the iterative scheme (5a)-(5d) by showing that there exist \(\sigma\in(0,1)\) and \(\eta _{1}>0\) such that

$$ \bigl\Vert v^{k}-v^{k+1}\bigr\Vert _{G_{1}}\leq\sigma^{k}\frac{\Vert v^{0}-\tilde{v}^{*}\Vert _{G_{1}}}{\sqrt{\eta_{1}}}. $$
(24)

Assumption 3.3

\(A^{T}_{2}\) has full column rank. For any \((x_{1}^{*},x_{2}^{*},x_{3}^{*},\lambda^{*})\in \mathcal{W}^{*}\), there exists a real number \(\mu_{2}>0\) such that

$$ \bigl\langle y_{2}-A_{2}^{T} \lambda^{*},x_{2}-x^{*}_{2}\bigr\rangle \geq\mu_{2}\bigl\Vert x_{2}-x^{*}_{2}\bigr\Vert ^{2}, \quad \text{for all } x_{2}\in\mathcal{R}^{n_{2}} \text{ and } y_{2}\in \partial\theta_{2}(x_{2}). $$
(25)

Theorem 3.2

Let the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) be generated by the direct extension of ADMM (5a)-(5d) with \(\beta\in(0,2\rho\mu_{3}/(5\|A_{3}^{T}A_{3}\|))\) and \(\rho\in(0,1)\) defined in Lemma 3.2. Suppose Assumption 3.1 and Assumption 3.3 hold. If the function \(\theta_{2}\) is differentiable and its gradient \(\nabla\theta_{2}\) is Lipschitz continuous with constant \(L_{2}>0\), then there exists \(\delta>0\) such that (24) holds with \(\sigma=1/\sqrt{1+\delta}\).

Proof

Since Assumption 3.1 and Assumption 3.3 hold, following the same arguments as in Lemma 3.1 and Lemma 3.3, we have

$$\begin{aligned} \bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \leq&\bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \\ &{}-\bigl(x_{3}^{k}-x_{3}^{k+1}, \lambda^{k}-\lambda^{k+1}\bigr)P \begin{pmatrix}x_{3}^{k}-x_{3}^{k+1} \\ \lambda^{k}-\lambda^{k+1} \end{pmatrix} -\beta(1-\rho)\bigl\Vert A_{2} \bigl(x_{2}^{k}-x_{2}^{k+1}\bigr)\bigr\Vert ^{2} \\ &{}-2\mu_{2}\bigl\Vert x_{2}^{k+1}- \tilde{x}^{*}_{2}\bigr\Vert ^{2}-\biggl(2\mu_{3}- \frac{5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho}\biggr)\bigl\Vert x_{3}^{k+1}- \tilde{x}^{*}_{3}\bigr\Vert ^{2}, \end{aligned}$$

where \(\mu_{2}>0\) and \(\mu_{3}>0\). Thus, there is a real number \(\eta _{1}>0\) such that

$$\begin{aligned} \bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}-\bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \geq&\eta _{1}\bigl\Vert v^{k}-v^{k+1}\bigr\Vert _{G_{1}}^{2}+2 \mu_{2}\bigl\Vert x_{2}^{k+1}-\tilde{x}^{*}_{2} \bigr\Vert ^{2} \\ &{} +\biggl(2\mu_{3}-\frac{5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho}\biggr)\bigl\Vert x_{3}^{k+1}-\tilde {x}^{*}_{3}\bigr\Vert ^{2}. \end{aligned}$$
(26)

Since \(\theta_{2}\) is differentiable and \(\nabla\theta_{2}\) is Lipschitz continuous with positive constant \(L_{2}\), it follows from (7b) and (11b) that

$$\bigl\Vert x_{2}^{k+1}-\tilde{x}^{*}_{2}\bigr\Vert \geq\frac{1}{L_{2}}\bigl\Vert A_{2}^{T}\bigl( \lambda ^{k+1}-\tilde{\lambda}^{*}\bigr)\bigr\Vert , $$

which together with (26) yields

$$\begin{aligned} \bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}-\bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \geq&\eta _{1}\bigl\Vert v^{k}-v^{k+1}\bigr\Vert _{G_{1}}^{2}+ \biggl(2\mu_{3}-\frac{5\beta \Vert A_{3}^{T}A_{3}\Vert }{\rho}\biggr)\bigl\Vert x_{3}^{k+1}-\tilde{x}^{*}_{3}\bigr\Vert ^{2} \\ &{}+\mu_{2}\bigl\Vert x_{2}^{k+1}- \tilde{x}^{*}_{2}\bigr\Vert ^{2}+\mu_{2} \frac{1}{L_{2}^{2}}\bigl\Vert A_{2}^{T}\bigl( \lambda^{k+1}-\tilde{\lambda}^{*}\bigr)\bigr\Vert ^{2}. \end{aligned}$$
(27)

Since the matrix \(A_{2}^{T}\) has full column rank, the inequality (27) implies that there exists \(\delta>0\) such that

$$ \bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}\geq(1+\delta)\bigl\Vert v^{k+1}- \tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}. $$
(28)

Using (27) again, we obtain

$$\begin{aligned} \eta_{1}\bigl\Vert v^{k}-v^{k+1} \bigr\Vert ^{2}_{G_{1}} \leq&\bigl\Vert v^{k}- \tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}-\bigl\Vert v^{k+1}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \\ \leq&\bigl\Vert v^{k}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}} \leq\frac{1}{(1+\delta)^{k}}\bigl\Vert v^{0}-\tilde{v}^{*}\bigr\Vert ^{2}_{G_{1}}. \end{aligned}$$
(29)

Letting \(\sigma=\frac{1}{\sqrt{1+\delta}}\), we see that (24) holds. □

Notice that if \(\|v^{k}-v^{k+1}\|^{2}_{G}=0\), it follows from (5c) and (11a)-(11c) that

$$\begin{aligned} \begin{aligned} &0\in\partial\theta_{1}\bigl(x_{1}^{k+1} \bigr)-A_{1}^{T}\lambda^{k+1}, \\ &0\in\partial\theta_{2}\bigl(x_{2}^{k+1} \bigr)-A_{2}^{T}\lambda^{k+1}, \\ &0\in\partial\theta_{3}\bigl(x_{3}^{k+1} \bigr)-A_{3}^{T}\lambda^{k+1}, \\ &0=A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+A_{3}x_{3}^{k+1}-b, \end{aligned} \end{aligned}$$

which shows that \((x_{1}^{k+1},x_{2}^{k+1},x_{3}^{k+1},\lambda^{k+1})\) is a solution of (7a)-(7d). Thus, Theorem 3.2 establishes a global linear convergence rate for the direct extension of ADMM (5a)-(5d), and it suggests an easily implementable stopping criterion for the scheme (5a)-(5d):

$$ \max\biggl\{ \frac{\|x_{2}^{k}-x_{2}^{k+1}\|}{1+\|x^{k}_{2}\|},\frac{\| x_{3}^{k}-x_{3}^{k+1}\|}{1+\|x^{k}_{3}\|}, \frac{\|\lambda^{k}-\lambda^{k+1}\| }{1+\|\lambda^{k}\|}\biggr\} < \epsilon. $$
(30)
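A direct implementation of the test (30) reads as follows (a sketch of ours; the tolerance ε is left to the user):

```python
import numpy as np

def stop_admm(x2, x2_new, x3, x3_new, lam, lam_new, eps):
    """Relative-change stopping criterion (30) for the scheme (5a)-(5d)."""
    rel_change = max(
        np.linalg.norm(x2 - x2_new) / (1.0 + np.linalg.norm(x2)),
        np.linalg.norm(x3 - x3_new) / (1.0 + np.linalg.norm(x3)),
        np.linalg.norm(lam - lam_new) / (1.0 + np.linalg.norm(lam)),
    )
    return rel_change < eps
```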

4 Example

Chen et al. [15] constructed the following example of solving a 3-dimensional linear system:

$$ \begin{aligned} &\min 0\times x_{1}+0\times x_{2}+0\times x_{3} \\ &\quad \mbox{s.t. } \begin{pmatrix}1&1&1\\1&1&2\\1&2&2 \end{pmatrix} \begin{pmatrix}x_{1}\\x_{2}\\x_{3} \end{pmatrix} = \begin{pmatrix}0\\0\\0 \end{pmatrix} \end{aligned} $$
(31)

to show that the direct extension of the alternating direction method of multipliers applied to the above 3-block optimization problem (treating each variable as one block) diverges.

We replace the term \(0\times x_{3}\) by \(\mu\|A_{3}x_{3}\|_{1}+\delta (x_{3};\mathbf{B}_{r}(0))\) with \(\mu>0\) and arbitrary \(r>0\) in (31), and obtain

$$ \begin{aligned} &\min 0\times x_{1}+0\times x_{2}+\mu\|A_{3}x_{3}\|_{1}+\delta \bigl(x_{3};\mathbf {B}_{r}(0)\bigr) \\ &\quad \mbox{s.t. } \begin{pmatrix}1&1&1\\1&1&2\\1&2&2 \end{pmatrix} \begin{pmatrix}x_{1}\\x_{2}\\x_{3} \end{pmatrix} = \begin{pmatrix}0\\0\\0 \end{pmatrix} . \end{aligned} $$
(32)

The example (32) can be rewritten as the 3-block optimization problem (1) with the following specifications:

  • \(\theta_{1}(x_{1}):=0\times x_{1}\), \(\theta _{2}(x_{2}):=0\times x_{2}\), \(\theta_{3}(x_{3}):=\mu\|A_{3}x_{3}\|_{1}+\delta (x_{3};\mathbf{B}_{r}(0))\);

  • The coefficients \(A_{i}\) (\(i = 1, 2, 3\)) and the vector b are given by

    $$A_{1}:= \begin{pmatrix}1\\1\\1 \end{pmatrix} , \qquad A_{2}:= \begin{pmatrix}1\\1\\2 \end{pmatrix} , \qquad A_{3}:= \begin{pmatrix}1\\2\\2 \end{pmatrix} , \qquad b:= \begin{pmatrix}0\\0\\0 \end{pmatrix} . $$

Obviously, \(\theta_{3}(x_{3})\) is not strongly convex. For problem (32), \((x_{1},x_{2},x_{3},\lambda)=(0,0,0,0)\) is a KKT point. If the iterative sequence \(\{(x_{1}^{k}, x^{k}_{2}, x^{k}_{3}, \lambda^{k})\}\) is generated by the direct extension of ADMM (5a)-(5d), then we have \(x_{3}^{k}\in\mathbf{B}_{r}(0)\), ∀k. To justify the convergence of the direct extension of ADMM (5a)-(5d) applied to (32), one just needs to show that Assumption 3.2 holds at \((\tilde{x}_{1}^{*},\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde{\lambda }^{*})=(0,0,0,0)\).

For any \(x_{3}\in\mathbf{B}_{r}(\tilde{x}_{3}^{*})\), we have

$$ \mu\|A_{3}x_{3}\|_{1}=5\mu|x_{3}|\ge \frac{5\mu}{r}x_{3}^{2}. $$

Thus,

$$ \theta_{3}(x_{3})\geq\theta_{3} \bigl(\tilde{x}_{3}^{*}\bigr)+\bigl\langle A_{3}^{T} \tilde {\lambda}^{*},x_{3}-\tilde{x}_{3}^{*}\bigr\rangle + \alpha^{r}\bigl\Vert x_{3}-\tilde{x}_{3}^{*}\bigr\Vert ^{2},\quad \forall x_{3}\in\mathbf{B}_{r} \bigl(\tilde{x}_{3}^{*}\bigr), $$
(33)

where \(\alpha^{r}=5\mu/r\). On the other hand, since the function \(\theta_{3}\) is convex, we have

$$ \theta_{3}\bigl(\tilde{x}_{3}^{*}\bigr)\geq \theta_{3}(x_{3})+\bigl\langle y_{3},\tilde {x}_{3}^{*}-x_{3}\bigr\rangle ,\quad \forall x_{3}\in \mathbf{B}_{r}\bigl(\tilde {x}_{3}^{*}\bigr) \mbox{ and } \forall y_{3}\in\partial\theta_{3}(x_{3}). $$
(34)

Adding up (33) and (34), we obtain

$$ \bigl\langle y_{3}-A_{3}^{T}\tilde{ \lambda}^{*},x_{3}-\tilde{x}_{3}^{*}\bigr\rangle \ge \alpha^{r}\bigl\| x_{3}-\tilde{x}_{3}^{*}\bigr\| ^{2}, \quad \forall x_{3}\in\mathbf {B}_{r}\bigl( \tilde{x}_{3}^{*}\bigr) \mbox{ and } \forall y_{3}\in\partial \theta_{3}(x_{3}). $$

Thus, Assumption 3.2 holds for (32) at \((\tilde{x}_{1}^{*},\tilde{x}_{2}^{*},\tilde{x}_{3}^{*},\tilde{\lambda }^{*})=(0,0,0,0)\) with \(\mu_{3}=5\mu/r\). By Remark 3.1 and Theorem 3.1, the direct extension of ADMM (5a)-(5d) applied to (32) is convergent.
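As a numerical sanity check (our own illustration, not taken from the paper), the cyclic scheme (5a)-(5d) can be run on (32) with closed-form scalar updates. The values of μ, r, ρ, and the starting point below are illustrative assumptions, and β is chosen below the bound \(2\rho\mu_{3}/(5\|A_{3}^{T}A_{3}\|)=2\rho\mu/(9r)\) used in Theorem 3.1.

```python
import numpy as np

A1, A2, A3 = np.array([1., 1., 1.]), np.array([1., 1., 2.]), np.array([1., 2., 2.])
mu, r, rho = 1.0, 1.0, 0.5            # theta_3(x3) = 5*mu*|x3| + indicator(|x3| <= r)
beta = 0.9 * 2 * rho * (5 * mu / r) / (5 * (A3 @ A3))   # below 2*rho*mu_3/(5*||A3^T A3||)

def soft(z, t):                       # soft-thresholding operator
    return np.sign(z) * max(abs(z) - t, 0.0)

x1, x2, x3, lam = 1.0, 1.0, 1.0, np.zeros(3)            # illustrative starting point
for _ in range(2000):
    # (5a), (5b): unconstrained scalar quadratic subproblems solved in closed form
    x1 = (A1 @ (lam / beta - (A2 * x2 + A3 * x3))) / (A1 @ A1)
    x2 = (A2 @ (lam / beta - (A1 * x1 + A3 * x3))) / (A2 @ A2)
    # (5c): multiplier update with the old x3
    lam = lam - beta * (A1 * x1 + A2 * x2 + A3 * x3)
    # (5d): soft-threshold, then project onto [-r, r] (one-dimensional convex problem)
    s = A1 * x1 + A2 * x2
    x3 = soft(A3 @ lam - beta * (A3 @ s), 5 * mu) / (beta * (A3 @ A3))
    x3 = min(max(x3, -r), r)

print(abs(x1), abs(x2), abs(x3))      # the iterates should approach the KKT point 0
```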

Remark 4.1

If the function \(\theta_{3}\) is strongly convex, then Assumption 3.1 or Assumption 3.2 holds trivially. The example (32) shows that the direct extension of ADMM (5a)-(5d) applied to (32) is convergent even though \(\theta_{3}\) is not strongly convex. This explains why the original scheme of the direct extension of ADMM works well for some applications even though there is no strongly convex function in the objective.

References

  1. Chandrasekaran, V, Parrilo, PA, Willsky, AS: Latent variable graphical model selection via convex optimization. Ann. Stat. 40, 1935-1967 (2012)

  2. McLachlan, GJ: Discriminant Analysis and Statistical Pattern Recognition. Wiley-Interscience, New York (2004)

  3. Candès, EJ, Li, X, Ma, Y, Wright, J: Robust principal component analysis. J. ACM 58(3), Article 11 (2011)

  4. Tao, M, Yuan, XM: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57-81 (2011)

  5. Gabay, D, Mercier, B: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17-40 (1976)

  6. Glowinski, R, Marrocco, A: Sur l’approximation, par éléments fins d’ordren, et la résolution, par pénalisation-dualité, d’une classe de problèmes de dirichlet nonlinéares. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 9, 41-76 (1975)

  7. Fortin, M, Glowinski, R: Augmented Lagrangian Methods. North-Holland, Amsterdam (1983)

  8. Glowinski, R: Lectures on Numerical Methods for Nonlinear Variational Problems. Springer, Berlin (1980)

  9. Boyd, S, Parikh, N, Chu, E, Peleato, B, Eckstein, J: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1-122 (2010)

  10. Chan, TF, Glowinski, R: Finite element approximation and iterative solution of a class of mildly non-linear elliptic equations. Technical report, Stanford University (1978)

  11. Eckstein, J: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR research report RRR 32-2012, Rutgers University (2012)

  12. He, BS, Liu, H, Wang, ZR, Yuan, XM: A strictly Peaceman-Rachford splitting method for convex programming. SIAM J. Optim. 24, 1011-1040 (2014)

  13. Lin, Z, Chen, M, Wu, L, Ma, Y: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Math. Program. 9(22), 15-26 (2010)

  14. Peng, YG, Ganesh, A, Wright, J, Xu, WL, Ma, Y: Robust alignment by sparse and low rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2233-2246 (2012)

  15. Chen, CH, He, BS, Ye, YY, Yuan, XM: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1-2), 57-79 (2016)

  16. He, BS, Tao, M, Yuan, XM: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313-340 (2012)

  17. He, BS, Tao, M, Yuan, XM: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Manuscript

  18. Chen, G, Teboulle, M: A proximal-based decomposition method for convex minimization problems. Math. Program. 64, 81-101 (1994)

  19. Deng, W, Yin, WT: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889-916 (2016)

  20. Eckstein, J: Some saddle-function splitting methods for convex programming. Optim. Methods Softw. 4, 75-83 (1994)

  21. Fazel, M, Pong, TK, Sun, DF, Tseng, P: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34, 946-977 (2013)

  22. He, BS, Liao, LZ, Han, D, Yang, H: A new inexact alternating direction method for monotone variational inequalities. Math. Program. 92, 103-118 (2002)

  23. Hong, MY, Luo, ZQ: On the linear convergence of the alternating direction method of multipliers. Math. Program. (2016). doi:10.1007/s10107-016-1034-2

  24. Xu, MH, Wu, T: A class of linearized proximal alternating direction methods. J. Optim. Theory Appl. 155, 321-337 (2011)

  25. Yang, J, Zhang, Y: Alternating direction algorithms for \(l_{1}\) problems in compressive sensing. SIAM J. Sci. Comput. 33, 250-278 (2011)

  26. Han, DR, Yuan, XM: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227-238 (2013)

  27. Chen, CH, Shen, Y, You, YF: On the convergence analysis of the alternating direction method of multipliers with three blocks. Abstr. Appl. Anal. 2013, Article ID 183961 (2013)

  28. Lin, TY, Ma, SQ, Zhang, SZ: On the convergence rate of multi-block ADMM (2014). arXiv:1408.4265

  29. Cai, XJ, Han, DR, Yuan, XM: The direct extension of ADMM for three-block separable convex minimization models is convergent when one function is strongly convex. Manuscript (2014)

  30. Li, M, Sun, DF, Toh, KC: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia-Pac. J. Oper. Res. 32(4), Article ID 1550024 (2015)

  31. Rockafellar, RT: Convex Analysis. Princeton University Press, Princeton (1970)

  32. Boyd, S, Vandenberghe, L: Convex Optimization. Cambridge University Press, Cambridge (2004)

Acknowledgements

This work was supported by the National Natural Sciences Grants (No. 11371116, No. 41071262, and No. 41101243).

Author information

Corresponding author

Correspondence to Tingquan Deng.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Sun, H., Wang, J. & Deng, T. On the global and linear convergence of direct extension of ADMM for 3-block separable convex minimization models. J Inequal Appl 2016, 227 (2016). https://doi.org/10.1186/s13660-016-1173-2
