# Sequential inertial linear ADMM algorithm for nonconvex and nonsmooth multiblock problems with nonseparable structure

## Abstract

The alternating direction method of multipliers (ADMM) has been widely used to solve linearly constrained problems in signal processing, matrix decomposition, machine learning, and many other fields. This paper introduces two linearized ADMM algorithms, namely the sequential partial linear inertial ADMM (SPLI-ADMM) and the sequential complete linear inertial ADMM (SCLI-ADMM), which integrate the linearized ADMM approach with the inertial technique in a fully nonconvex framework with nonseparable structure. The iterative schemes use either partial or full linearization and incorporate the sequential gradient of the composite term in each subproblem's update, so that every iteration uses the latest available information to improve the efficiency of the algorithms. Under some mild conditions, we prove, with the help of the KŁ property, that the sequences generated by the two proposed algorithms converge to critical points of the problem. Finally, numerical results are reported to show the effectiveness of the proposed algorithms.

## 1 Introduction

In this paper, we consider the following linearly constrained nonconvex optimization problem with multiple block variables:

\begin{aligned} \begin{aligned}& \underset{x_{i}, y}{\min } \sum_{i = 1}^{n} {{f_{i}} ( {{x_{i}}} )} + g ( {{x_{1}}, {x_{2}},\ldots , {x_{n}} ,y} ), \\ &\quad \text{s.t. }\sum_{i = 1}^{n} {{A_{i}} {x_{i}}} + By =b, \end{aligned} \end{aligned}
(1.1)

where $${x_{i}} \in {\mathbb{R} ^{{p_{i}}}} ( {i = 1,2, \ldots, n} )$$ and $$y \in {\mathbb{R} ^{q}}$$ are variables; each $${f_{i}}:{\mathbb{R} ^{{p_{i}}}} \to \mathbb{R} \cup \{ { + \infty } \} ( {i = 1,2, \ldots, n} )$$ is a proper lower semicontinuous function that is nonconvex and possibly nonsmooth; $$g:{\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}} \to \mathbb{R}$$ is continuously differentiable, and its gradient $$\nabla g$$ is Lipschitz continuous with modulus $$l_{g}>0$$; $${A_{i}} \in {\mathbb{R}^{m \times {p_{i}}}} ( {i = 1,2, \ldots, n} )$$ and $$B \in {\mathbb{R}^{m \times {q}}}$$ are given matrices; and $$b \in {\mathbb{R} ^{m}}$$. Denote $$\mathbf{x}_{[i,j]} = (x_{i},x_{i+1}, \ldots , x_{j-1},x_{j})$$ and $$\mathbf{Ax}_{[j,k]} = \sum_{i=j}^{k}A_{i}x_{i}$$.

The Augmented Lagrangian Function (ALF) of (1.1) is defined as

\begin{aligned} \begin{aligned} L ( {\mathbf{x}_{[1,n]} ,y,\lambda } ) = \sum _{i=1}^{n}{f_{i}} ( {{x_{i}}} ) +g ( \mathbf{x}_{[1,n]},y ) - \langle {\lambda , \mathbf{Ax}_{[1,n]} + By - b} \rangle +\frac{\beta}{2} \Vert \mathbf{Ax}_{[1,n]} + By - b \Vert ^{2}, \end{aligned} \end{aligned}

where $$\lambda \in \mathbb{R}^{m}$$ is the Lagrangian dual variable, and $$\beta >0$$ is a penalty parameter.

Problem (1.1) encompasses many nonconvex optimization problems arising in signal processing, image reconstruction, matrix decomposition, machine learning, and other fields [13]. When the number of blocks n equals 2 and $$g(\cdot )$$ is identically zero, the problem reduces to a separable problem. If the problem contains only the mixed term, it is similar to the problem studied in [4]. On the other hand, if the variable y is absent, it reduces to the problem studied in [5]. Hence, problem (1.1) extends the objective functions considered in [4–6], covering a broader range of scenarios with additional variables and potential mixed terms, and thereby reflects the versatility and complexity of contemporary applications.

Indeed, ADMM has been established as a powerful tool for solving two-block separable convex optimization problems [7, 8]. However, its effectiveness and convergence guarantees become much more intricate when dealing with nonconvex problems, especially when the number of blocks exceeds two. Zhang et al. [9] tackled this challenge by proposing a proximal ADMM for solving three-block nonconvex optimization tasks, building upon the groundwork laid by Sun et al. [10]. Meanwhile, Wang et al. [11] proposed an inertial proximal partially symmetric ADMM, suitable for handling multiblock separable nonconvex optimization problems. Hien et al. [12] developed an inertial version of ADMM, referred to as iADMM, which integrated the majorization-minimization principle within each block update step to address a specific class of nonconvex low-rank representation problems. Chao et al. [13] contributed to this area with a linear Bregman ADMM algorithm for nonconvex multiblock optimization problems featuring nonseparable structures.

The inertial technique, initially conceived by Polyak [17], is an acceleration strategy that takes the dynamics of the optimization process into account by incorporating information from the last two iterates, thereby mitigating large differences between consecutive points. Zavriev et al. [18] subsequently extended the inertial technique to nonconvex optimization problems, a significant step in broadening its applicability. Recently, the inertial technique has been widely combined with various optimization algorithms to improve their performance on nonconvex problems. Boţ et al. [19] proposed an inertial forward-backward algorithm for minimizing the sum of two nonconvex functions. Attouch et al. [20] introduced an inertial proximal method and a proximal alternating projection method for maximal monotone problems and minimization problems, respectively. Pock et al. [21] proposed a linear Inertial Proximal Alternating Minimization Algorithm (IPAMA) for a diverse range of nonconvex and nonsmooth optimization problems. Building on these advances, researchers have integrated the inertial technique with the Alternating Direction Method of Multipliers (ADMM). Hien et al. [22] developed an Inertial Alternating Direction Method of Multipliers (iADMM) for a class of nonconvex multiblock optimization problems with nonlinear coupling constraints, and Wang et al. [11] introduced an Inertial Proximal Partially Symmetric ADMM for nonconvex settings, further highlighting the efficacy of combining inertial techniques with ADMM.

Inspired by the works [11, 13, 16, 23], in this paper we construct two new linearized inertial ADMM variants for problem (1.1): the sequential partial linear inertial ADMM (SPLI-ADMM) and the sequential complete linear inertial ADMM (SCLI-ADMM).

The novelty of this paper can be summarized as follows:

(I) The proposed algorithms combine the inertial effect with linearization. Linearization makes each subproblem easier to solve, which improves the practicality of the algorithms, while the inertial effect contributes to faster convergence.

(II) Unlike conventional approaches such as that in [13], in the linearization phase the gradient of the mixed term in the $$x_{j}$$-subproblem is computed as $${\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,j-1]}^{k+1} ,\mathbf{x}_{[j,n]}^{k} ,{y^{k}}})$$ rather than $${\nabla _{{x_{j}}}}g( {\mathbf{x}_{[1,n]}^{k} ,{y^{k}}} )$$. This allows the mixed term to be linearized dynamically as the iterates progress: each block update uses the most recently computed blocks. Consequently, we refer to it as a sequential gradient iteration scheme.

The rest of this paper is organized as follows: In Sect. 2, some necessary preliminaries for further analysis are summarized. Then, we establish the convergence of the two algorithms in Sect. 3. Section 4 shows the validity of the algorithms by some numerical experiments. Finally, some conclusions are drawn in Sect. 5.

## 2 Preliminaries

In this section, we recall some basic notations and preliminary results, which will be used in this paper. Throughout, $${\mathbb{R}^{n}}$$ denotes the n-dimensional Euclidean space, $$\mathbb{R} \cup \{ { + \infty } \}$$ denotes the extended real number set, and $$\mathbb{N}$$ denotes the natural number set. The image space of a matrix $$Q \in {\mathbb{R} ^{m \times n}}$$ is defined as $${\mathop{\mathrm{Im}} } Q: = \{ {Qx:x \in { \mathbb{R}^{n}}} \}$$. If $$Q \ne 0$$, let $${\rho _{\min (Q^{\mathrm{T}}Q)}}$$ denote the smallest positive eigenvalue of the matrix $${Q^{\mathrm{T}}Q}$$. $$\Vert \cdot \Vert$$ represents the Euclidean norm. $$\operatorname{dom} f: = \{ {x \in {\mathbb{R} ^{n}}:f ( x ) < + \infty } \}$$ is the domain of a function $$f:{\mathbb{R} ^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$$, and $$\langle {x,y} \rangle = {x^{\mathrm{T}}}y = \sum_{i = 1}^{n} {{x_{i}}{y_{i}}}$$.

### Definition 2.1

([24])

Let $$f:\mathbb{R}^{n}\to \mathbb{R}\bigcup \{+\infty \}$$ be a proper lower semicontinuous function.

(I) The Fréchet subdifferential, or regular subdifferential, of f at $$x\in {\mathrm{dom}} f$$, written $$\hat{\partial} f(x)$$, is defined as

\begin{aligned} \hat{\partial} f(x)= \biggl\{ x^{*}\in \mathbb{R}^{n}:\liminf_{y\to x,\, y \neq x}\frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{ \Vert y-x \Vert }\geq 0 \biggr\} , \end{aligned}

when $$x\notin \operatorname{dom}f$$, we set $$\hat{\partial} f( x) = \emptyset$$.

(II) The limiting-subdifferential, or simply the subdifferential, of f at $$x\in {\mathrm{dom}}f$$, written $$\partial f(x)$$, is defined as

\begin{aligned} \partial f(x)= \bigl\{ x^{*}\in \mathbb{R}^{n}:\exists\, x_{k}\to x \text{ and } x_{k}^{*}\to x^{*} \text{ with } f(x_{k}) \to f(x) \text{ and } x_{k}^{*} \in \hat{\partial}f(x_{k}) \bigr\} . \end{aligned}

(III) A point that satisfies

\begin{aligned} 0\in \partial f(x) \end{aligned}

is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by crit f.
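A one-dimensional example (not taken from the paper) shows that the two subdifferentials can differ for nonconvex functions. For $$f(x) = -\vert x \vert$$ on $$\mathbb{R}$$, f is differentiable away from 0 with $$f'(x) = -\operatorname{sign}(x)$$, while at the kink

\begin{aligned} \hat{\partial} f(0) = \emptyset , \qquad \partial f(0) = \{ -1, 1 \} . \end{aligned}

Indeed, for a candidate $$x^{*}$$ the quotient in (I) equals $$-1-x^{*}$$ as $$y \downarrow 0$$ and $$x^{*}-1$$ as $$y \uparrow 0$$, and both cannot be nonnegative; the limiting subdifferential collects the limits $$-\operatorname{sign}(x_{k})$$ of the gradients along $$x_{k}\to 0$$. Since $$0 \notin \partial f(x)$$ for every x, crit f is empty.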

### Proposition 2.1

We collect some basic properties of the subdifferential [24].

(I) $$\hat{\partial} f(x) \subseteq \partial f(x)$$ for each $$x\in \mathbb{R}^{n}$$, where the first set is closed and convex, while the second set is closed but not necessarily convex.

(II) Let $$x_{k}^{*}\in \partial f(x_{k})$$ with $$\lim_{k\to \infty}(x_{k},x_{k}^{*})=(x,x^{*})$$ and $$f(x_{k})\to f(x)$$; then $$x^{*}\in \partial f(x)$$.

(III) If $$f: \mathbb{R}^{n}\to \mathbb{R}\bigcup \{ + \infty \}$$ is proper lower semicontinuous and $$g:\mathbb{R}^{n}\to \mathbb{R}$$ is continuously differentiable, then $$\partial (f+g)(x)=\partial f(x)+\nabla g(x)$$ for any $$x\in \operatorname{dom}f$$.

### Definition 2.2

If $${\omega ^{*}} = { ( {x_{1}^{*}, \ldots, x_{n}^{*},{y^{*}},{ \lambda ^{*}}} )^{T}}$$ satisfies

\begin{aligned} \textstyle\begin{cases} A_{i}^{\mathrm{T}}{\lambda ^{*}} \in \partial {f_{i}} ( {x_{i}^{*}} ) + {\nabla _{{x_{i}}}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ),\quad i = 1,2, \ldots n, \\ {B^{\mathrm{T}}}{\lambda ^{*}} = {\nabla _{y}}g ( {x_{1}^{*}, \ldots x_{n}^{*},{y^{*}}} ), \\ {A_{1}}x_{1}^{*} + \cdots + {A_{n}}x_{n}^{*}+B{y^{*}} = b, \end{cases}\displaystyle \end{aligned}
(2.1)

then $${\omega ^{*}}$$ is called a critical point or stationary point of the Lagrangian function $$L ( {x_{1}}, \ldots, {x_{n}},y,\lambda )$$.

A key ingredient in proving the convergence of ADMM for nonconvex optimization problems is the assumption that the Lagrangian function satisfies the Kurdyka-Łojasiewicz property (KŁ property) [19, 25]. For notational simplicity, we use $${\Phi _{\eta }} ( {\eta > 0} )$$ to denote the set of concave functions $$\varphi : [ 0, \eta ) \to [ 0, \infty )$$ such that

(I) $$\varphi ( 0 ) = 0$$;

(II) φ is continuously differentiable on $$( {0,\eta } )$$ and continuous at 0;

(III) $$\varphi ' ( s ) > 0$$ for all $$s \in ( {0,\eta } )$$.

The KŁ property can be described as follows.

### Definition 2.3

(see [19, 26]) (KŁ property) Let $$f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$$ be a proper lower semicontinuous function. If there exist $$\eta \in ( 0 , { + \infty } ]$$, a neighborhood U of $${x^{*}}$$, and a continuous concave function $$\varphi \in {\Phi _{\eta }}$$ such that for all $$x \in U \cap \{ {x \in {\mathbb{R}^{n}}:f ( {{x^{*}}} ) < f ( x ) < f ( {{x^{*}}} ) + \eta } \}$$, it holds that

\begin{aligned} \varphi ' \bigl( {f ( x ) - f \bigl( {{x^{*}}} \bigr)} \bigr)\operatorname{dist} \bigl( {0,\partial f ( x )} \bigr) \ge 1, \end{aligned}
(2.2)

where $$\operatorname{dist}(x,S):=\inf \{\|y-x\|:y\in S\}$$ denotes the distance from a point x to a set S. Then, f is said to have the KŁ property at $${x^{*}}$$.
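As a quick sanity check of Definition 2.3 (an illustrative example, not from the paper), take $$f(x) = x^{2}$$ on $$\mathbb{R}$$, $$x^{*} = 0$$, and $$\varphi (s) = \sqrt{s}$$, which belongs to $$\Phi _{\eta }$$ for every $$\eta > 0$$: it is concave, continuous at 0, continuously differentiable on $$(0,\eta )$$, and $$\varphi '(s) = \frac{1}{2\sqrt{s}} > 0$$. For every $$x \neq 0$$ we have $$\partial f(x) = \{2x\}$$, hence

\begin{aligned} \varphi ' \bigl( f ( x ) - f ( 0 ) \bigr)\operatorname{dist} \bigl( 0,\partial f ( x ) \bigr) = \frac{1}{2\sqrt{x^{2}}}\cdot 2 \vert x \vert = 1 \ge 1, \end{aligned}

so f has the KŁ property at 0 with $$U = \mathbb{R}$$.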

### Lemma 2.1

(see [25]) (Uniformized KŁ property) Suppose that $$f:{\mathbb{R}^{n}} \to \mathbb{R} \cup \{ { + \infty } \}$$ is a proper lower semicontinuous function and Ω is a compact set. If f is constant on Ω, i.e., $$f ( x ) \equiv {f^{*}}$$ for all $$x \in \Omega$$, and f satisfies the KŁ property at each point of Ω, then there exist $$\varepsilon > 0,\eta > 0$$ and $$\varphi \in {\Phi _{\eta }}$$ such that

\begin{aligned} \varphi ' \bigl( {f ( x ) - {f^{*}}} \bigr) \operatorname{dist} \bigl( {0, \partial f ( x )} \bigr) \ge 1, \end{aligned}
(2.3)

for all $$x \in \{ {x \in {\mathbb{R}^{n}}:\operatorname{dist} ( {x,\Omega } ) < \varepsilon } \} \cap \{ {{f^{*}} < f ( x ) < {f^{*}} + \eta } \}$$.

### Lemma 2.2

(see [25]) (Descent lemma) Let $$h:{\mathbb{R}^{n}} \to \mathbb{R}$$ be a continuously differentiable function whose gradient $$\nabla h$$ is Lipschitz continuous with modulus $${l_{h}} > 0$$. Then, for any $$x,y \in {\mathbb{R}^{n}}$$, we have

\begin{aligned} \bigl\vert {h ( y ) - h ( x ) - \bigl\langle { \nabla h ( x ),y - x} \bigr\rangle } \bigr\vert \le \frac{{{l_{h}}}}{2}{ \Vert {y - x} \Vert ^{2}}. \end{aligned}
(2.4)
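The bound $$\vert h(y)-h(x)-\langle \nabla h(x),y-x\rangle \vert \le \frac{l_{h}}{2}\Vert y-x \Vert ^{2}$$ can be checked numerically on a simple function. The sketch below assumes $$h(x) = x_{1}^{2} + 3x_{2}^{2}$$, whose gradient $$(2x_{1}, 6x_{2})$$ is Lipschitz with modulus $$l_{h} = 6$$ (the largest eigenvalue of the Hessian diag(2, 6)); the test function, helper names and sampling range are illustrative assumptions.

```python
import random

# Assumed example (not from the paper): h(x) = x1^2 + 3*x2^2 on R^2.
# Its gradient (2*x1, 6*x2) is Lipschitz with modulus l_h = 6.
L_H = 6.0


def h(x):
    return x[0] ** 2 + 3.0 * x[1] ** 2


def grad_h(x):
    return [2.0 * x[0], 6.0 * x[1]]


def descent_sides(x, y, l_h=L_H):
    """Both sides of |h(y) - h(x) - <grad h(x), y - x>| <= (l_h/2) ||y - x||^2."""
    d = [yi - xi for xi, yi in zip(x, y)]
    linear = sum(gi * di for gi, di in zip(grad_h(x), d))
    lhs = abs(h(y) - h(x) - linear)
    rhs = 0.5 * l_h * sum(di * di for di in d)
    return lhs, rhs


def max_violation(n_samples=2000, seed=0):
    """Largest lhs - rhs over random pairs; expected to be <= 0 up to rounding."""
    rng = random.Random(seed)
    worst = -float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(2)]
        lhs, rhs = descent_sides(x, y)
        worst = max(worst, lhs - rhs)
    return worst


print(max_violation())
```

For this quadratic the bound is attained along the $$x_{2}$$-axis, where the curvature equals $$l_{h}$$.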

### Lemma 2.3

(see [27]) Let $$Q \in {\mathbb{R}^{m \times n}}$$ be a nonzero matrix, and let $${\rho _{\min (Q^{\mathrm{T}}Q)}}$$ denote the smallest positive eigenvalue of $${Q^{\mathrm{T}}Q}$$. Then, for every $$u \in {\mathbb{R}^{m}}$$, it holds that

\begin{aligned} \sqrt {\rho _{\min (Q^{\mathrm{T}}Q)}} \Vert {P_{Q}u} \Vert \le \Vert {Q^{\mathrm{T}}u} \Vert , \end{aligned}
(2.5)

where $${P_{Q}}$$ denotes the Euclidean projection onto $${\mathrm{Im}}(Q)$$.
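In the convergence proof this bound is applied with Q = B to a vector $$v \in \mathrm{Im}\,B$$, for which $$P_{B}v = v$$, in the form $$\sqrt{\rho _{\min (B^{\mathrm{T}}B)}}\Vert v \Vert \le \Vert B^{\mathrm{T}}v \Vert$$. The Python sketch below checks this form on an assumed 3×2 matrix B with $$B^{\mathrm{T}}B = [[2,1],[1,2]]$$, whose eigenvalues are 1 and 3; the matrix and the helper names are illustrative, not from the paper.

```python
import math
import random

# Hypothetical example matrix B (3x2): B^T B = [[2, 1], [1, 2]], eigenvalues 1 and 3.
B = [[1.0, 1.0],
     [0.0, 1.0],
     [1.0, 0.0]]


def gram(M):
    """Return M^T M for a list-of-lists matrix."""
    rows, cols = len(M), len(M[0])
    return [[sum(M[k][i] * M[k][j] for k in range(rows)) for j in range(cols)]
            for i in range(cols)]


def smallest_positive_eig_2x2(S, tol=1e-12):
    """Smallest positive eigenvalue of a symmetric 2x2 matrix (closed form)."""
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return min(e for e in ((tr - disc) / 2.0, (tr + disc) / 2.0) if e > tol)


def worst_ratio(M, rho, n_samples=1000, seed=1):
    """Minimum of ||M^T v|| / (sqrt(rho) * ||v||) over random v = M w in Im M."""
    rng = random.Random(seed)
    rows, cols = len(M), len(M[0])
    worst = math.inf
    for _ in range(n_samples):
        w = [rng.uniform(-1.0, 1.0) for _ in range(cols)]
        v = [sum(M[i][j] * w[j] for j in range(cols)) for i in range(rows)]
        mtv = [sum(M[i][j] * v[i] for i in range(rows)) for j in range(cols)]
        norm_v = math.sqrt(sum(t * t for t in v))
        if norm_v > 0.0:
            ratio = math.sqrt(sum(t * t for t in mtv)) / (math.sqrt(rho) * norm_v)
            worst = min(worst, ratio)
    return worst


rho = smallest_positive_eig_2x2(gram(B))
print(rho, worst_ratio(B, rho))  # the ratio should not drop below 1 (up to rounding)
```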

## 3 Algorithms and their convergence

In this section, we propose two linear inertial ADMM algorithms, the sequential partial linear inertial ADMM (SPLI-ADMM) and the sequential complete linear inertial ADMM (SCLI-ADMM), and prove their convergence under suitable conditions. We also prove the boundedness of the generated sequence.

### 3.1 Two linear inertial algorithms

First, we present Algorithm 1 for (1.1).

In every iteration, each subproblem uses a sequential gradient to update its variable. Specifically, in the $$(k+1)$$th update of $$x_{i}$$ $$(i=1,\ldots ,n)$$, the mixed term $$g(\mathbf{x}_{[1,i-1]}^{k+1} ,x_{i},\mathbf{x}_{[i+1,n]}^{k} ,y^{k})$$ is replaced with a linearized approximation plus an inertial proximal term: $$g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) + \langle x_{i}-x_{i}^{k}, \nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k}) \rangle + \frac{\tau}{2}\| x_{i}-z_{i}^{k} \|^{2}$$. Here, the sequential gradient $$\nabla _{x_{i}} g(\mathbf{x}_{[1,i-1]}^{k+1} ,\mathbf{x}_{[i,n]}^{k} ,y^{k})$$ is recomputed for each subproblem, so it reflects the most recent variable updates. Since the y-subproblem is not linearized, we call this method the sequential partial linear inertial ADMM.

For the $$x_{j}$$-subproblems $$(j=1,\ldots ,n)$$ and the y-subproblem, we define the following auxiliary functions, respectively:

\begin{aligned} &\begin{aligned} \hat{f}_{j}^{k}(x_{j})={}&{f_{j}} ( {{x_{j}}} ) + \bigl\langle x_{j}-x_{j}^{k}, \nabla _{x_{j}} g \bigl(\mathbf{x}_{[1,j-1]}^{k+1} , \mathbf{x}_{[j,n]}^{k} ,y^{k} \bigr) \bigr\rangle \\ &{}+ \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,j-1]}^{k+1} + {A_{j}}x_{j} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {{x_{j}} - z_{j}^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}
(3.1)
\begin{aligned} &\begin{aligned} \hat{h}^{k}(y)=g \bigl( \mathbf{x}_{[1,n]}^{k+1} ,y \bigr) + \frac{\beta }{2} \biggl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}, \end{aligned} \end{aligned}
(3.2)

where

\begin{aligned} \textstyle\begin{cases} z_{1}^{k} = x_{1}^{k} + {\theta _{k}} ( {x_{1}^{k-1} - x_{1}^{k}} ), \\ z_{2}^{k} = x_{2}^{k} + {\theta _{k}} ( {x_{2}^{k-1} - x_{2}^{k}} ), \\ \vdots \\ z_{n}^{k} = x_{n}^{k} + {\theta _{k}} ( {x_{n}^{k-1} - x_{n}^{k}} ), \end{cases}\displaystyle \end{aligned}
(3.3)

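and $$\theta _{k}\in [0,\frac{1}{2})$$. Using these auxiliary functions, each iteration of Algorithm 1 (SPLI-ADMM) computes $$x_{j}^{k+1} \in \arg\min \hat{f}_{j}^{k}(x_{j})$$ for $$j=1,\ldots ,n$$ in turn, then $$y^{k+1} \in \arg\min \hat{h}^{k}(y)$$, and finally updates the multiplier by $$\lambda ^{k+1} = \lambda ^{k} - \beta ( \mathbf{Ax}_{[1,n]}^{k+1} + By^{k+1} - b )$$.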

### Remark 1

(I) The auxiliary functions defined in (3.1) contain the inertial terms $$\frac{\tau}{2}\|x_{i}-z_{i}^{k}\|^{2}$$, $$i=1,2,\ldots ,n$$. Inertial schemes compute the new iterate from the two previous iterates; by adding the inertial term to the $$x_{i}$$-subproblems, the update is steered along the direction $$x_{i}^{k}-x_{i}^{k-1}$$.

(II) The purpose of linearizing the mixed term in $$x_{i}$$-subproblem is to use the properties of differentiable blocks and simplify the calculation of each iteration.

(III) The initial point $$\mathbf{x}_{[1,n]}^{-1} =\mathbf{x}_{[1,n]}^{0} = 0, y^{-1}=y^{0}=0$$ is chosen to facilitate the proof that the sequence $$\{\omega ^{k}\}$$ generated by the algorithm is bounded.

Algorithm 2 (SCLI-ADMM) is obtained from Algorithm 1 by further linearization. Its $$x_{i}$$-subproblems $$(i=1,\ldots ,n)$$ are the same as those of Algorithm 1, so the iterative scheme (3.4) is unchanged. In the $$(k+1)$$th update of y, we replace $$g(\mathbf{x}_{[1,n]}^{k+1} ,y)$$ with a linearized approximation plus a regularization term, $$g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) + \langle y-y^{k}, \nabla _{y} g(\mathbf{x}_{[1,n]}^{k+1},y^{k}) \rangle + \frac{\tau}{2}\|y-y^{k} \|^{2}$$. Since all subproblems of Algorithm 2 are linearized and updated sequentially, we call it the sequential complete linear inertial ADMM.

Dropping the constant $$g(\mathbf{x}_{[1,n]}^{k+1},y^{k})$$, the auxiliary function of the y-subproblem is

\begin{aligned} \begin{aligned} \bar{{h}}^{k}(y)= \bigl\langle y-y^{k}, \nabla _{y} g \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k} \bigr) \bigr\rangle + \frac{\beta }{2}{ \biggl\Vert {\mathbf{Ax}_{[1,n]}^{k+1} + By - b - \frac{{{\lambda ^{k}}}}{\beta }} \biggr\Vert ^{2}} + \frac{{{\tau }}}{2}{ \bigl\Vert {y - y^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}
(3.8)
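To make the two update schemes concrete, the following Python sketch runs them on a hypothetical scalar instance that is not from the paper: every block is one-dimensional, $$f_{i}(x) = \mu \vert x \vert$$ (convex, chosen because its proximal step is closed-form soft-thresholding), $$g(\mathbf{x},y) = \frac{1}{2}(\sum_{i} x_{i} - y)^{2}$$, whose gradient is Lipschitz with $$l_{g} = n+1$$, and the constraint is $$\sum_{i} a_{i}x_{i} + By = b$$ with scalar $$B \neq 0$$. The $$x_{j}$$-updates minimize (3.1) exactly using the sequential gradient and the inertial point (3.3); the y-update minimizes (3.2) for SPLI-ADMM or (3.8) for SCLI-ADMM; the multiplier update is $$\lambda ^{k+1} = \lambda ^{k} - \beta (\mathbf{Ax}_{[1,n]}^{k+1} + By^{k+1} - b)$$. All function names are hypothetical, $$\lambda ^{0} = 0$$ is an assumption, and the default parameters were picked to satisfy Assumption A(IV) for n = 2, B = 1 ($$l_{g} = 3$$, $$\rho _{\min (B^{\mathrm{T}}B)} = 1$$).

```python
"""Hypothetical toy instance of SPLI-ADMM / SCLI-ADMM with scalar blocks.

Assumptions (not from the paper): f_i(x) = mu*|x|, g(x, y) = 0.5*(sum(x) - y)**2,
constraint sum_i a_i*x_i + B*y = b with scalars a_i and B != 0.
"""


def soft_threshold(v, mu):
    """Proximal map of mu*|.|: sign(v) * max(|v| - mu, 0)."""
    if v > mu:
        return v - mu
    if v < -mu:
        return v + mu
    return 0.0


def grad_g(xs, y):
    """Gradient of g = 0.5*(sum(xs) - y)**2 as (list of d/dx_i, d/dy)."""
    s = sum(xs) - y
    return [s] * len(xs), -s


def step(xs, xs_prev, y, lam, a, B, b, mu, beta, tau, theta, complete=False):
    """One iteration; complete=False is SPLI-ADMM, complete=True is SCLI-ADMM."""
    n = len(xs)
    new_x = list(xs)  # overwritten block by block (Gauss-Seidel sweep)
    for j in range(n):
        # sequential gradient at (x^{k+1}_{[1,j-1]}, x^k_{[j,n]}, y^k)
        gx, _ = grad_g(new_x, y)
        z = xs[j] + theta * (xs_prev[j] - xs[j])  # inertial point, as in (3.3)
        # every constraint term except a_j*x_j, shifted by -lam/beta
        c = sum(a[i] * new_x[i] for i in range(n) if i != j) + B * y - b - lam / beta
        # (3.1) reduces to mu*|x| + v*x + (alpha/2)*x**2 + const
        alpha = beta * a[j] ** 2 + tau
        v = gx[j] + beta * a[j] * c - tau * z
        new_x[j] = -soft_threshold(v, mu) / alpha
    ax = sum(ai * xi for ai, xi in zip(a, new_x))
    shift = beta * B * (ax - b - lam / beta)
    if complete:
        # (3.8): g linearized at (x^{k+1}, y^k); the subproblem is quadratic in y
        _, gy = grad_g(new_x, y)
        new_y = (-gy - shift + tau * y) / (beta * B ** 2 + tau)
    else:
        # (3.2): g kept exactly; d/dy of 0.5*(s - y)**2 is y - s
        new_y = (sum(new_x) - shift + tau * y) / (1.0 + beta * B ** 2 + tau)
    new_lam = lam - beta * (ax + B * new_y - b)
    return new_x, new_y, new_lam


def run(n_iter, complete=False, a=(1.0, 1.0), B=1.0, b=1.0, mu=0.1,
        beta=400.0, tau=10.0, theta=0.2):
    """Iterate from x^{-1} = x^0 = 0, y^0 = 0 (Remark 1 (III)); lambda^0 = 0 is assumed."""
    xs = [0.0] * len(a)
    xs_prev = list(xs)
    y, lam = 0.0, 0.0
    for _ in range(n_iter):
        new_x, y, lam = step(xs, xs_prev, y, lam, a, B, b, mu, beta, tau, theta, complete)
        xs_prev, xs = xs, new_x
    return xs, y, lam


if __name__ == "__main__":
    for complete in (False, True):
        xs, y, lam = run(500, complete=complete)
        residual = sum(xs) + y - 1.0  # constraint residual for a = (1, 1), B = 1, b = 1
        print("SCLI" if complete else "SPLI", xs, y, lam, residual)
```

A useful self-check is identity (3.12): after each SPLI-ADMM step, $$B^{\mathrm{T}}\lambda ^{k+1} = \nabla _{y}g(\mathbf{x}_{[1,n]}^{k+1},y^{k+1}) + \tau \Delta y^{k+1}$$, and for SCLI-ADMM the same holds with the gradient evaluated at $$(\mathbf{x}_{[1,n]}^{k+1},y^{k})$$.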

### 3.2 A descent inequality

A crucial element in establishing the convergence of these algorithms is to verify the descent property of the regularized augmented Lagrangian function sequence. To facilitate the analysis, the following notation is used throughout this paper. For $$k\ge 1$$,

\begin{aligned} \begin{aligned}&\Delta x_{i}^{k+1} = x_{i}^{k+1}-x_{i}^{k}, \qquad\Delta y^{k+1}=y^{k+1}-y^{k},\qquad \Delta \lambda ^{k+1}=\lambda ^{k+1}-\lambda ^{k}, \\ & \Delta \mathbf{x}_{[i,j]}^{k+1} = \bigl(\Delta x_{i}^{k+1},\ldots , \Delta x_{j}^{k+1} \bigr), \qquad \bigl\Vert \Delta \mathbf{x}_{[i,j]}^{k+1} \bigr\Vert ^{2} =\sum_{s=i}^{j} \bigl\Vert \Delta x_{s}^{k+1} \bigr\Vert ^{2} . \end{aligned} \end{aligned}

The convergence analysis relies on the following assumptions:

### Assumption A

(I) g is bounded from below and $$l_{g}$$-Lipschitz differentiable, i.e., its gradient is $$l_{g}$$-Lipschitz continuous: $$\Vert { \nabla g(u) - \nabla g(v)} \Vert \le {l_{g}} \Vert {u - v} \Vert$$ for all $$u,v \in {\mathbb{R} ^{p_{1}}}\times{\mathbb{R} ^{p_{2}}}\times \cdots \times{\mathbb{R} ^{p_{n}}}\times{\mathbb{R} ^{q}}$$;

(II) $$f_{i}$$, $$i=1,\ldots ,n$$ are proper lower semicontinuous, and $$f_{i}$$ are bounded from below;

(III) $$B\neq 0$$ and $$\{b\}\bigcup \{\bigcup_{i=1}^{n} \mathop{\mathrm{Im}}A_{i} \} \subseteq \mathop{\mathrm{Im}}B$$; this holds, in particular, when the linear operator B is surjective;

(IV) For Algorithm 1 and Algorithm 2, $$\theta _{k} \in [0,\frac{1}{2} )$$, $$\tau >0$$, and τ and β are large enough that $$\tau > \frac{2+l_{g}}{1-2\theta _{k}}$$ and $$\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \}$$;

(V) Let $$X:= {\mathbb{R}}^{p_{1}}\times \cdots \times{\mathbb{R}}^{p_{n}} \times{\mathbb{R}}^{q}\times{\mathbb{R}}^{m}$$. The set $$\{\omega \in X:L_{\beta}(\omega )\leq L_{\beta}({\omega}^{0}) \}$$ is bounded.
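The thresholds in A(IV) are explicit, so admissible parameters can be computed before running either algorithm. The Python sketch below (the function name is hypothetical) returns the strict lower bounds on τ and, for a chosen τ, on β. Because the second bound on β contains $$1/\theta _{k}$$, it is finite only when $$\theta _{k} > 0$$, so the sketch requires $$0 < \theta _{k} < \frac{1}{2}$$; the example values $$l_{g} = 3$$, $$\theta _{k} = 0.2$$, $$\rho _{\min (B^{\mathrm{T}}B)} = 1$$ are assumptions.

```python
def assumption_a4_bounds(l_g, theta, rho_min, tau):
    """Return (tau_min, beta_min) from Assumption A(IV); both bounds are strict."""
    if not 0.0 < theta < 0.5:
        # the second beta bound divides by theta, so theta = 0 is excluded here
        raise ValueError("this sketch needs 0 < theta < 1/2")
    tau_min = (2.0 + l_g) / (1.0 - 2.0 * theta)
    if tau <= tau_min:
        raise ValueError(f"tau must exceed {tau_min:.4g}")
    s = tau ** 2 + l_g ** 2
    beta_min = max(3.0 * s / rho_min, 6.0 * s / (tau * theta * rho_min))
    return tau_min, beta_min


# Example: l_g = 3, theta = 0.2, rho_min = 1 and tau = 10
print(assumption_a4_bounds(3.0, 0.2, 1.0, 10.0))
```

With these values τ must exceed 25/3 and, for τ = 10, β must exceed 327.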

To establish the descent property, we need the following lemmas.

### Lemma 3.1

For Algorithm 1 and each $$k \in \mathbb{N}$$, we have

\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}
(3.9)

For Algorithm 2 and each $$k \in \mathbb{N}$$, we have

\begin{aligned} \begin{aligned} { \bigl\Vert {\Delta \lambda ^{k + 1}} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}
(3.10)

### Proof

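Since $$\Delta \lambda ^{k+1} = -\beta ( \mathbf{Ax}_{[1,n]}^{k+1} + By^{k+1} - b ) \in \mathop{\mathrm{Im}}B$$ by Assumption A(III), Lemma 2.3 with $$Q=B$$ gives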

\begin{aligned} \bigl\Vert {{\Delta \lambda ^{k + 1}}} \bigr\Vert \le \frac{1}{{\sqrt {\rho _{\min (B^{\mathrm{T}}B)}} }} \bigl\Vert {B^{\mathrm{T}}} { \Delta \lambda ^{k + 1}} \bigr\Vert . \end{aligned}
(3.11)

For Algorithm 1, the optimality condition of the y-subproblem (3.2) yields

\begin{aligned} \begin{aligned} 0 = {\nabla _{y}}g \bigl( \mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr) - {B^{\mathrm{T}}} {\lambda ^{k}} + \beta {B^{\mathrm{T}}} \bigl( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b \bigr) + {\tau } \bigl({\Delta y^{k + 1}} \bigr) . \end{aligned} \end{aligned}

Since $${\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( {\mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k + 1}} - b})$$, we have

\begin{aligned} \begin{aligned} {B^{\mathrm{T}}} {\lambda ^{k + 1}} = {\nabla _{y}}g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}} \bigr)+\tau \bigl(\Delta y^{k+1} \bigr) . \end{aligned} \end{aligned}
(3.12)

Let $${u^{k}}=(\mathbf{x}_{[1,n]}^{k},{y^{k}})$$. Using Assumption A (I) and (3.12), we have

\begin{aligned} \begin{aligned} &{ \bigl\Vert {{B^{\mathrm{T}}} { \lambda ^{k + 1}} - {B^{\mathrm{T}}} {\lambda ^{k}}} \bigr\Vert ^{2}} \\ &\quad={ \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) + \tau \Delta y^{k+1} - \tau \Delta y^{k} \bigr\Vert ^{2}} \\ &\quad= \bigl\Vert {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k+1} \bigr\Vert ^{2} + \bigl\Vert \tau \Delta y^{k} \bigr\Vert ^{2} - 2 \bigl\langle \tau \Delta y^{k+1} , \tau \Delta y^{k} \bigr\rangle \\ &\qquad{} - 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - { \nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k} \bigr\rangle + 2 \bigl\langle {\nabla _{y}}g \bigl(u^{k+1} \bigr) - {\nabla _{y}}g \bigl(u^{k} \bigr) , \tau \Delta y^{k+1} \bigr\rangle \\ &\quad\le 3l_{g}^{2}{ \bigl\Vert {\Delta u^{k+1}} \bigr\Vert ^{2}}+3\tau ^{2} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}+3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le 3l_{g}^{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + 3 \bigl(l_{g}^{2}+ \tau ^{2} \bigr) \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} +3\tau ^{2} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}
(3.13)
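The first inequality in (3.13) uses an elementary bound that is worth making explicit. For any vectors $$v_{1}, v_{2}, v_{3}$$, applying $$2\langle p,q\rangle \le \Vert p \Vert ^{2} + \Vert q \Vert ^{2}$$ to each cross term gives

\begin{aligned} \Vert v_{1}+v_{2}+v_{3} \Vert ^{2} = \Vert v_{1} \Vert ^{2}+ \Vert v_{2} \Vert ^{2}+ \Vert v_{3} \Vert ^{2}+2\langle v_{1},v_{2}\rangle +2\langle v_{1},v_{3}\rangle +2\langle v_{2},v_{3}\rangle \le 3 \bigl( \Vert v_{1} \Vert ^{2}+ \Vert v_{2} \Vert ^{2}+ \Vert v_{3} \Vert ^{2} \bigr), \end{aligned}

which is applied with $$v_{1} = \nabla _{y}g(u^{k+1}) - \nabla _{y}g(u^{k})$$, $$v_{2} = \tau \Delta y^{k+1}$$, $$v_{3} = -\tau \Delta y^{k}$$, together with $$\Vert v_{1} \Vert \le l_{g}\Vert \Delta u^{k+1} \Vert$$ from Assumption A(I) and $$\Vert \Delta u^{k+1} \Vert ^{2} = \Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \Vert ^{2} + \Vert \Delta y^{k+1} \Vert ^{2}$$.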

Combining (3.13) with (3.11), we obtain

\begin{aligned} \begin{aligned} { \bigl\Vert \Delta \lambda ^{k + 1} \bigr\Vert ^{2}} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}

For Algorithm 2, similarly, we get

\begin{aligned} \begin{aligned} \bigl\Vert \Delta{\lambda ^{k + 1}} \bigr\Vert ^{2} \le \frac{3l_{g}^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + \frac{3\tau ^{2}}{\rho _{\min (B^{\mathrm{T}}B)}}{ \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2}} + \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}. \end{aligned} \end{aligned}

The proof is completed. □

To simplify the analysis, we introduce the following notation: $${w^{k}} = (\mathbf{x}_{[1,n]}^{k},{y^{k}},{\lambda ^{k}})$$, $${u^{k}}=( \mathbf{x}_{[1,n]}^{k},y^{k})$$, and $${r_{k}}=\mathbf{Ax}_{[1,n]}^{k} + B{y^{k}} - b$$. The following lemma is key to proving the monotonicity of the sequence $$\{\hat{L}_{\beta }(\hat{w}^{k+1})\}$$ defined in (3.20).

### Lemma 3.2

For Algorithm 1 and Algorithm 2, select $$\theta _{k} \in [0,\frac{1}{2} )$$ and $${\tau},{\beta}$$ large enough to ensure $$\tau > \frac{2+l_{g}}{1-2\theta _{k}}$$ and $$\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{6(\tau ^{2}+l_{g}^{2})}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \}$$.

Then, for each $$k \in \mathbb{N}$$, we have

\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}+ { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + { \delta _{1}} \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2} + { \bigl\Vert \Delta{{y^{k}} } \bigr\Vert ^{2}} \bigr), \end{aligned} \end{aligned}
(3.14)

where $$\delta _{2} >\delta _{1}>0$$.
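For concrete parameter values, candidate constants can be read off from the final expression for $$L_{\beta }(w^{k+1})$$ obtained in the proof below. Taking $$\delta _{2}$$ as the smaller of the coefficients of $$\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \Vert ^{2}$$ and $$\Vert \Delta y^{k+1} \Vert ^{2}$$, and $$\delta _{1}$$ as the larger of the coefficients of $$\Vert \Delta \mathbf{x}_{[1,n]}^{k} \Vert ^{2}$$ and $$\Vert \Delta y^{k} \Vert ^{2}$$, is one natural choice; this is an assumption of the sketch, not necessarily the authors' constants. Under A(IV) with $$\theta _{k} > 0$$, one can check that this choice gives $$\delta _{2} > \delta _{1} = \tau \theta _{k}/2 > 0$$. The function name is hypothetical.

```python
def descent_constants(l_g, theta, tau, beta, rho_min):
    """Candidate (delta_2, delta_1) read off from the coefficients in Lemma 3.2's proof.

    delta_2: min of the coefficients of ||Dx^{k+1}||^2 and ||Dy^{k+1}||^2 (descent side)
    delta_1: max of the coefficients of ||Dx^k||^2 and ||Dy^k||^2 (carry-over side)
    """
    c = beta * rho_min
    coef_x_new = tau * (1.0 - theta) / 2.0 - l_g / 2.0 - 3.0 * l_g ** 2 / c
    coef_y_new = tau / 2.0 - 3.0 * (tau ** 2 + l_g ** 2) / c
    coef_x_old = tau * theta / 2.0
    coef_y_old = 3.0 * tau ** 2 / c
    return min(coef_x_new, coef_y_new), max(coef_x_old, coef_y_old)


# Parameters satisfying A(IV) for l_g = 3, theta = 0.2, rho_min = 1 (tau > 25/3, beta > 327)
delta_2, delta_1 = descent_constants(l_g=3.0, theta=0.2, tau=10.0, beta=400.0, rho_min=1.0)
print(delta_2, delta_1)
```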

### Proof

We first give the proof for Algorithm 1.

From (3.1) and (3.4), for $$j=1,\ldots ,n$$, we have

\begin{aligned} \begin{aligned} &{f_{j}} \bigl(x_{j}^{k + 1} \bigr) + \bigl\langle {\Delta x_{j}^{k + 1},{ \nabla _{{x_{j}}}}g \bigl( \mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\qquad{}- \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j]}^{k+1} + \mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,j]}^{k+1} +\mathbf{Ax}_{[j+1,n]}^{k} + B{y^{k}} - b} \bigr\Vert ^{2}} \\ &\quad\le{f_{j}} \bigl(x_{j}^{k} \bigr) - \bigl\langle {{\lambda ^{k}},\mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,j-1]}^{k+1} + \mathbf{Ax}_{[j,n]}^{k} + B{y^{k}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k} - z_{j}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{j}^{k + 1} - z_{j}^{k}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}

From (3.2) and (3.5), we have

\begin{aligned} \begin{aligned} &g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{ \lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}}, \mathbf{Ax}_{[1,n]}^{k+1}+ B{y^{k}} - b} \bigr\rangle + \frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k}} - b \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}

Summing the above inequalities for $$j=1,\ldots ,n$$ and adding the y-inequality (the intermediate terms telescope), we have

\begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) + g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{} - \sum_{i=1}^{n}{ \bigl\langle {\Delta x_{i}^{k + 1} ,{ \nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle } + \frac{{{\tau }}}{{\mathrm{{2}}}}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\qquad{} - \frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}} \bigl\Vert {y^{k + 1} - y^{k}} \bigr\Vert ^{2}, \end{aligned}

hence

\begin{aligned} \begin{aligned} &\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k + 1} \bigr) + g \bigl({u^{k + 1}} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k + 1}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k + 1}}} \Vert ^{2}} \\ &\quad\le\sum_{i=1}^{n}{f_{i}} \bigl(x_{i}^{k} \bigr) +g \bigl(u^{k} \bigr) - \bigl\langle {{\lambda ^{k}},{r_{k}}} \bigr\rangle + \frac{\beta }{2}{ \Vert {{r_{k}}} \Vert ^{2}} \\ &\qquad{}+ \underbrace{g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle }_{ \mathcal{A}} \\ & \qquad{}\underbrace{+\frac{{{\tau }}}{{\mathrm{{2}}}}\sum_{i=1}^{n}{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - \frac{\tau}{2}\sum _{i=1}^{n}{ \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}}}_{ \mathcal{B}} - \frac{\tau }{2} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}

On the one hand, by Lemma 2.2 applied to each block, part $$\mathcal{A}$$ can be bounded as

\begin{aligned} \begin{aligned} &g \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k}} \bigr)- g \bigl(u^{k} \bigr) - \sum_{i=1}^{n} \bigl\langle {\Delta x_{i}^{k + 1}, {\nabla _{{x_{i}}}}g \bigl( \mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k}, y^{k} \bigr)} \bigr\rangle \\ &\quad=\sum_{i=1}^{n} \bigl\lbrace g \bigl( \mathbf{x}_{[1,i]}^{k+1}, \mathbf{x}_{[i+1,n]}^{k},{y^{k}} \bigr)\\ &\qquad{}- g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr) - \bigl\langle {\Delta x_{i}^{k + 1},{\nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k+1}, \mathbf{x}_{[i,n]}^{k},y^{k} \bigr)} \bigr\rangle \bigr\rbrace \\ &\quad\le\frac{l_{g}}{2} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}. \end{aligned} \end{aligned}
(3.15)

On the other hand, by the definitions of $$z_{i}^{k}, i=1,2,\ldots ,n$$, we have

\begin{aligned} \begin{aligned} &{ \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}} - { \bigl\Vert {x_{i}^{k + 1} - z_{i}^{k}} \bigr\Vert ^{2}} \\ &\quad= \theta _{k}^{2}{ \bigl\Vert {x_{i}^{k-1} - x_{i}^{k}} \bigr\Vert ^{2}} - \bigl\Vert {x_{i}^{k+1} - x_{i}^{k} + {\theta _{k}} \bigl(x_{i}^{k} - x_{i}^{k - 1} \bigr)} \bigr\Vert { ^{2}} \\ &\quad= - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + 2{\theta _{k}} \bigl\langle {x_{i}^{k} - x_{i}^{k + 1},x_{i}^{k} - x_{i}^{k - 1}} \bigr\rangle \\ &\quad\le - { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k+1} - x_{i}^{k}} \bigr\Vert ^{2}} + {\theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}} \\ &\quad=- (1 - {\theta _{k}}){ \bigl\Vert {x_{i}^{k+1}- x_{i}^{k}} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert {x_{i}^{k} - x_{i}^{k - 1}} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}

Thus, summing over $$i=1,\ldots ,n$$, part $$\mathcal{B}$$ satisfies

\begin{aligned} \begin{aligned} {\sum_{i=1}^{n} \bigl\Vert {x_{i}^{k} - z_{i}^{k}} \bigr\Vert ^{2}}-{ \sum_{i=1}^{n} \bigl\Vert {x_{i}^{k+1} - z_{i}^{k}} \bigr\Vert ^{2}} \le - (1 - {\theta _{k}}){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \theta _{k}} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}}. \end{aligned} \end{aligned}
(3.16)
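The bound (3.16) is easy to sanity-check numerically. The NumPy sketch below is our own illustration (the helper `inertial_gap` is a hypothetical name). It assumes the inertial point has the form $$z_{i}^{k} = x_{i}^{k} + \theta _{k}(x_{i}^{k} - x_{i}^{k-1})$$, which is the form used in the first line of the computation above, and it stacks all blocks into one vector so that the block sum in (3.16) becomes a single squared norm. The gap between the two sides turns out to be exactly $$\theta _{k}\Vert \Delta \mathbf{x}^{k+1} - \Delta \mathbf{x}^{k}\Vert ^{2}$$, i.e., the slack discarded in the inequality step.

```python
import numpy as np

def inertial_gap(x_prev, x_cur, x_next, theta):
    """Return (lhs, rhs) of (3.16) for block-stacked iterates x^{k-1}, x^k, x^{k+1}."""
    z = x_cur + theta * (x_cur - x_prev)            # inertial point z^k (assumed form)
    lhs = np.sum((x_cur - z) ** 2) - np.sum((x_next - z) ** 2)
    rhs = (-(1.0 - theta) * np.sum((x_next - x_cur) ** 2)
           + theta * np.sum((x_cur - x_prev) ** 2))
    return lhs, rhs

rng = np.random.default_rng(0)
for _ in range(1000):
    x_prev, x_cur, x_next = rng.standard_normal((3, 6))
    theta = rng.uniform(0.0, 1.0)
    lhs, rhs = inertial_gap(x_prev, x_cur, x_next, theta)
    # The slack equals theta * ||(x^{k+1} - x^k) - (x^k - x^{k-1})||^2 >= 0
    slack = theta * np.sum(((x_next - x_cur) - (x_cur - x_prev)) ** 2)
    assert np.isclose(rhs - lhs, slack)
    assert lhs <= rhs + 1e-12
```

Because the slack is a nonnegative multiple of a squared norm, the inequality holds for every $$\theta _{k}\ge 0$$; the restriction on $$\theta _{k}$$ only matters later, when the coefficients in (3.19) are compared.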

From Lemma 2.2, (3.15) and (3.16), we obtain

\begin{aligned} \begin{aligned} {L_{\beta }} \bigl( \mathbf{x}_{[1,n]}^{k+1},y^{k+1},\lambda ^{k} \bigr) \le {}& {L_{ \beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} \\ &{}-\frac{\tau}{2} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{{\tau \theta _{k} }}{\mathrm{{2}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} . \end{aligned} \end{aligned}
(3.17)

Recall that, by the multiplier update, $$r^{k+1} = \mathbf{Ax}_{[1,n]}^{k+1} + By^{k+1} - b = -\frac{1}{\beta}\Delta \lambda ^{k+1}$$. Hence

\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) - \bigl\langle {\Delta{\lambda ^{k+1}} ,{r^{k + 1}}} \bigr\rangle \\ &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) + \frac{1}{\beta} \bigl\langle \Delta{\lambda ^{k+1}}, \Delta{\lambda ^{k+1}} \bigr\rangle \\ &= {L_{\beta }} \bigl(\mathbf{x}_{[1,n]}^{k+1},{y^{k + 1}},{ \lambda ^{k}} \bigr) +\frac{1}{\beta} \bigl\Vert {\Delta{\lambda ^{k+1}}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}
(3.18)
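The identity (3.18) depends only on the multiplier step, so it can be checked with arbitrary data. The sketch below is our own (the helper `aug_lagrangian` is a hypothetical name) and assumes, as in the expansion of $$L_{\beta}$$ in the proof of Theorem 3.1, that $$L_{\beta} = \sum_{i} f_{i} + g - \langle \lambda , r\rangle + \frac{\beta}{2}\Vert r\Vert ^{2}$$ with $$r = \sum_{i} A_{i}x_{i} + By - b$$, and that $$\lambda ^{k+1} = \lambda ^{k} - \beta r^{k+1}$$ as in (3.28).

```python
import numpy as np

def aug_lagrangian(obj_value, lam, residual, beta):
    """L_beta from the value of sum_i f_i + g, the multiplier and r = sum_i A_i x_i + B y - b."""
    return obj_value - lam @ residual + 0.5 * beta * (residual @ residual)

rng = np.random.default_rng(1)
m, beta = 4, 2.5
obj = 3.0                                # sum_i f_i + g at (x^{k+1}, y^{k+1}); any value works
r_next = rng.standard_normal(m)          # r^{k+1}
lam_k = rng.standard_normal(m)
lam_next = lam_k - beta * r_next         # multiplier step lambda^{k+1} = lambda^k - beta r^{k+1}

# Only the multiplier changes between the two evaluations, so the primal part cancels.
increase = (aug_lagrangian(obj, lam_next, r_next, beta)
            - aug_lagrangian(obj, lam_k, r_next, beta))
assert np.isclose(increase, np.sum((lam_next - lam_k) ** 2) / beta)
```

Since the primal variables are fixed, no assumption on $$f_{i}$$ or g enters this step; the increase $$\frac{1}{\beta}\Vert \Delta \lambda ^{k+1}\Vert ^{2}$$ must then be controlled through the bound (3.9).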

Substituting (3.9) and (3.17) into (3.18), we have

\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{{{l_{g}}}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{{{\tau (1-{\theta _{k}}) }}}{{\mathrm{{2}}}}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} - \frac{\tau}{2}{ \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{\theta _{k}} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} \\ &\qquad{}+ \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} { \bigl\Vert \Delta y^{k + 1} \bigr\Vert ^{2}}+ \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}} } { \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}} \\ &\quad={L_{\beta }} \bigl({w^{k}} \bigr) - \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}- \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr) \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert ^{2} + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}

Hence,

\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta y^{k} \bigr\Vert ^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}

Since $$\beta > \max \{ \frac{3(l_{g}^{2}+\tau ^{2})}{\rho _{\min (B^{\mathrm{T}}B)}}, \frac{3 (\tau ^{2}+l_{g}^{2} )}{\tau \theta _{k}\rho _{\min (B^{\mathrm{T}}B)}} \}$$, we have $$\frac{6(l_{g}^{2}+\tau ^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}} < 1$$ and $$\frac{\tau \theta _{k}}{2} > \frac{3(\tau ^{2}+l_{g}^{2})}{\beta \rho _{\min (B^{\mathrm{T}}B)}}$$, and therefore

\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( \frac{\tau}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(\tau ^{2}+l_{g}^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2} \\ &\quad\le {L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}

Let $$\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 , \delta _{1}=\frac{\tau}{2}\theta _{k}$$. Since $$\frac{\tau}{2} - 1 \ge \delta _{2}$$, we get

\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert \Delta{{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}
(3.19)

Since , which further implies that $${ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2}$$, we obtain $$\delta _{2} >\delta _{1}>0$$. That is, (3.14) holds.
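For concrete parameter values, the constants in (3.19) can be evaluated directly. The helper below is our own sketch (hypothetical name `descent_constants`): it returns $$\delta _{1} = \tau \theta _{k}/2$$ and $$\delta _{2} = \tau (1-\theta _{k})/2 - l_{g}/2 - 1$$ and checks the inequalities invoked in the proof for Algorithm 1, namely $$3(l_{g}^{2}+\tau ^{2})/(\beta \rho _{\min (B^{\mathrm{T}}B)}) < 1$$, the same quantity below $$\delta _{1}$$, and $$\delta _{2} > \delta _{1} > 0$$ (equivalently $$\theta _{k} > 0$$ and $$\tau (1-2\theta _{k}) > l_{g} + 2$$). It is a sanity check of these inequalities only, not a restatement of the paper's parameter conditions.

```python
def descent_constants(tau, theta, l_g, beta, rho_min):
    """Return (delta_1, delta_2, ok) for the sufficient-decrease inequality (3.19).

    rho_min stands for the smallest eigenvalue of B^T B (assumed positive).
    """
    delta_1 = 0.5 * tau * theta
    delta_2 = 0.5 * tau * (1.0 - theta) - 0.5 * l_g - 1.0
    kappa = 3.0 * (l_g ** 2 + tau ** 2) / (beta * rho_min)   # coefficient bounded in the proof
    ok = kappa < 1.0 and kappa < delta_1 and delta_2 > delta_1 > 0.0
    return delta_1, delta_2, ok

# tau = 10, theta = 0.1, l_g = 1 gives delta_1 = 0.5 and delta_2 = 3.0;
# with rho_min = 1, kappa = 303 / beta must stay below 0.5.
print(descent_constants(10.0, 0.1, 1.0, beta=1000.0, rho_min=1.0))
print(descent_constants(10.0, 0.1, 1.0, beta=100.0, rho_min=1.0))
```

The example illustrates the trade-off in the proof: a larger proximal weight $$\tau$$ makes $$\delta _{2} > \delta _{1}$$ easier to satisfy but forces a larger penalty parameter $$\beta$$.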

Similarly, for Algorithm 2, we obtain

\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr) + \biggl( { \frac{\tau (1 - \theta _{k} )}{2}} - \frac{l_{g}}{2} - \frac{3l_{g}^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} \\ &\qquad{}+ \biggl( \frac{\tau}{2} - \frac{l_{g}}{2} - \frac{3\tau ^{2}}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr)+ \frac{\tau \theta _{k}}{2} { \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{3(l_{g}^{2}+\tau ^{2})}{\beta {\rho _{\min (B^{\mathrm{T}}B)}}} \bigl\Vert {\Delta y^{k} } \bigr\Vert ^{2}. \end{aligned} \end{aligned}

Since , which further implies and , it follows that

\begin{aligned} \begin{aligned} &{L_{\beta }} \bigl({w^{k + 1}} \bigr)+ \biggl( { \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2} + \biggl( { \frac{\tau (1-\theta _{k})}{2}} - \frac{l_{g}}{2} - 1 \biggr){ \bigl\Vert \Delta y^{k + 1} \bigr\Vert }^{2} \\ &\quad\le{L_{\beta }} \bigl({w^{k}} \bigr) + \frac{\tau \theta _{k}}{2}{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert }^{2} + \frac{\tau \theta _{k}}{2} \bigl\Vert \Delta{y^{k}} \bigr\Vert ^{2}. \end{aligned} \end{aligned}

Let $$\delta _{2}={\frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2}-1 , \delta _{1}=\frac{\tau}{2}\theta _{k}$$. We have

\begin{aligned} \begin{aligned} {L_{\beta }} \bigl({w^{k + 1}} \bigr) + {\delta _{2}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2}} + { \bigl\Vert {\Delta{y^{k + 1}}} \bigr\Vert ^{2}} \bigr) \le {L_{\beta }} \bigl({w^{k}} \bigr) + {\delta _{1}} \bigl({ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k} \bigr\Vert ^{2}} + { \bigl\Vert \Delta y^{k} \bigr\Vert ^{2}} \bigr). \end{aligned} \end{aligned}

Since , which further implies that $${ \frac{\tau (1-\theta _{k} )}{2}} - \frac{l_{g}}{2} - 1 > \frac{\tau \theta _{k}}{2}$$, we get $$\delta _{2}>\delta _{1}>0$$. That is, (3.14) holds. The lemma is proved. □

### Remark 2

Based on Lemma 3.2, we can define the following function

\begin{aligned} {\hat{L}_{\beta }} ( \hat{w} ) = {\hat{L}_{\beta }} ( {u,\lambda ,v} ) = {L_{\beta }} ( {u,\lambda } ) + {\delta _{\mathrm{{1}}}} { \Vert {u - v} \Vert ^{2}}, \end{aligned}
(3.20)

where

\begin{aligned} \begin{aligned} u= ( \mathbf{x}_{[1,n]},y ), v =( \tilde{ \mathbf{x}}_{[1,n]}, \tilde{y} ), \hat{w} = (u,\lambda ,v) = ( \mathbf{x}_{[1,n]},y , \lambda , \tilde{\mathbf{x}}_{[1,n]}, \tilde{y} ) \end{aligned} \end{aligned}

and

\begin{aligned} \begin{aligned} { \Vert {u - v} \Vert ^{2}} = { \Vert \mathbf{x}_{[1,n]} - \tilde{\mathbf{x}}_{[1,n]} \Vert ^{2}} + \Vert y - \tilde{y} \Vert ^{2} . \end{aligned} \end{aligned}

Set $${\hat{\omega} ^{k + 1}} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1},{ \lambda ^{{k + 1}}}, \mathbf{x}_{[1,n]}^{k},y^{k} ), u^{k+1} = (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} )$$. Thus,

\begin{aligned} \begin{aligned} {\hat{L}_{\beta }} \bigl(\hat{ \omega}^{k+1} \bigr)={\hat{L}_{\beta }} \bigl( {{u^{{k + 1}}},{ \lambda ^{{k + 1}}},{u^{k}}} \bigr) = {L_{\beta }} \bigl( u^{k+1},{\lambda ^{{k + 1}}} \bigr) + {\delta _{\mathrm{{1}}}} \bigl({{{ \bigl\Vert \Delta u^{k + 1} \bigr\Vert }^{2}}} \bigr). \end{aligned} \end{aligned}
(3.21)

The following lemma shows that the sequence $$\{{\hat{L}_{\beta }} ( {{u^{k}},{\lambda ^{k}},{u^{k-1}}} )\}$$ is monotonically decreasing.

### Lemma 3.3

Let $${\hat{L}_{\beta }}$$ be defined by (3.20). Then, under Assumption A, for Algorithm 1 and Algorithm 2, we have:

\begin{aligned} \hat{ L }_{\beta} \bigl(\hat{\omega}^{k+1} \bigr)+ \delta \bigl( \bigl\Vert \Delta u^{k+1} \bigr\Vert ^{2} \bigr) \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr). \end{aligned}
(3.22)

That is, the sequence $$\{ {{{\hat{L}}_{\beta }}(\hat{\omega}^{k+1})}\}$$ is decreasing.

### Proof

Set $$\delta = {\delta _{2}} - {\delta _{1}} > 0$$. Then the result follows directly from Lemma 3.2. □

### 3.3 The cluster points of $$\{\omega ^{k}\}$$ are contained in $$critL_{\beta}$$

In this subsection, using the closedness of the limiting subdifferential mentioned above, we prove the subsequential convergence of the sequence $$\{\omega ^{k}\}$$. Since the proof for Algorithm 2 is similar to that for Algorithm 1, we only present the proof for Algorithm 1.

### Lemma 3.4

Suppose $$\lbrace{\omega ^{k}}\rbrace$$ is the sequence generated by Algorithm 1. If Assumption A holds, then the following statements are true:

(I) The sequence $$\{\omega ^{k}\}$$ is bounded. (II) $$\hat{L}_{\beta}(\hat{\omega}^{k})$$ is bounded from below and convergent; moreover,

\begin{aligned} \sum_{k\ge 0} \bigl\Vert \omega ^{k+1}- \omega ^{k} \bigr\Vert ^{2} < +\infty . \end{aligned}
(3.23)

(III) The sequences $$\hat{L}_{\beta}(\hat{\omega}^{k})$$ and $${L}_{\beta}({\omega}^{k})$$ have the same limit $$\hat{L}_{*}$$.

### Proof

(I) Because of the decreasing property of $$\{\hat{L}_{\beta}(\hat{\omega}^{k})\}$$, we get

\begin{aligned} \begin{aligned} L_{\beta} \bigl(\omega ^{k} \bigr) \le \hat{L}_{\beta} \bigl(\hat{\omega}^{k} \bigr)\le \hat{L}_{\beta} \bigl(\hat{\omega}^{0} \bigr) = L_{\beta} \bigl(\omega ^{0} \bigr) + \delta _{1} \bigl\Vert u^{0}-u^{-1} \bigr\Vert ^{2}=L_{\beta} \bigl(\omega ^{0} \bigr), \end{aligned} \end{aligned}

where the last equality holds because $$\|u^{0}-u^{-1}\|^{2}=0$$ by the initialization $$x_{i}^{0}=x_{i}^{-1}, i=1,\ldots ,n$$, and $$y^{0}=y^{-1}$$ in Algorithm 1. Hence, $$\{\omega ^{k}\}\subseteq \{\omega \in X:L_{\beta}(\omega )\leq L_{ \beta}({\omega}^{0})\}$$. By Assumption A(V), the sequence $$\{\omega ^{k}\}$$ is bounded.

(II) Since $$\lbrace{\omega ^{k}}\rbrace$$ is bounded, so is $$\lbrace{\hat{\omega}^{k}}\rbrace$$, and hence it has at least one cluster point. Let $$\hat{\omega}^{*}$$ be a cluster point of $$\lbrace{\hat{\omega}^{k}}\rbrace$$ with $$\lim_{j\rightarrow +\infty}\hat{\omega}^{k_{j}}={\hat{\omega}^{*}}$$. Since the functions $$f_{i} (i=1,2,\ldots ,n)$$ are proper lower semicontinuous and g is continuously differentiable, $$\hat{L}_{\beta} (\cdot )$$ is proper lower semicontinuous. Hence, we have

\begin{aligned} \begin{aligned} \liminf_{j \to +\infty} \hat{L}_{\beta} \bigl(\hat{\omega}^{k_{j}} \bigr) \ge \hat{L}_{\beta} \bigl(\hat{\omega}^{*} \bigr). \end{aligned} \end{aligned}

By the boundedness of $$f_{i}$$, g and $$\{\omega ^{k}\}_{k\ge 0}$$ and the definition of $$\hat{L}_{\beta}$$, the sequence $$\{\hat{L}_{\beta}(\hat{\omega}^{k})\}$$ is bounded from below. Since it is also monotonically decreasing by Lemma 3.3, it is convergent, and combining its monotonicity with the inequality above gives $$\hat{L}_{\beta}(\hat{\omega}^{*}) \le \lim_{j\to +\infty}\hat{L}_{\beta}(\hat{\omega}^{k_{j}}) \le \hat{L}_{\beta}(\hat{\omega}^{k})$$ for all k. It follows from (3.22) that

\begin{aligned} \delta \bigl( { {{ \bigl\Vert {\Delta{u^{k+1}} } \bigr\Vert }^{2}}} \bigr) \le {\hat{L}_{\beta }} \bigl(\hat{w}^{k} \bigr)-{\hat{L}_{\beta }} \bigl(\hat{w}^{k+1} \bigr). \end{aligned}

Summing up the above inequality for $$k =0,\ldots ,N$$ and letting $$N \to \infty$$, we have

\begin{aligned} \begin{aligned} \delta \sum_{k=0}^{+\infty} \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert { \Delta{y^{k+1}} } \bigr\Vert }^{2}}} \bigr) \le {\hat{L}_{\beta }} \bigl( \hat{w}^{0} \bigr)-{\hat{L}_{\beta }} \bigl(\hat{w}^{*} \bigr) < +\infty . \end{aligned} \end{aligned}

Since $$\delta > 0$$, it follows that

\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} < +\infty , \qquad\sum_{k=1}^{+\infty} \bigl\Vert {\Delta y^{k + 1}} \bigr\Vert ^{2} < +\infty . \end{aligned} \end{aligned}
(3.24)

Consequently, due to (3.9), we have

\begin{aligned} \begin{aligned} \sum_{k=1}^{+\infty} \bigl\Vert {\lambda ^{k + 1}} - {\lambda ^{{k}}} \bigr\Vert ^{2} < + \infty . \end{aligned} \end{aligned}
(3.25)

Combining (3.24) and (3.25) yields (3.23).

(III) From (3.24), we have $$\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \Vert ^{2} \to 0$$ and $$\Vert {\Delta y^{k + 1}} \Vert ^{2} \to 0$$. Combining with the definition of $${\hat{L}_{\beta }}(\hat{w}^{k})$$ in (3.21) yields $$\hat{L}_{*} = \lim_{k\to +\infty}\hat{L}_{\beta}(\hat{\omega}^{k}) = \lim_{k\to +\infty}{L}_{\beta}({\omega}^{k})$$. The lemma is proved. □

The following lemma provides upper estimates for the limiting subgradients of $$\hat{L}_{\beta}(\cdot )$$, which play a key role in the convergence analysis of the sequences generated by Algorithm 1 and Algorithm 2.

### Lemma 3.5

Let $$\{ {{\omega ^{k}}} \}$$ be a sequence generated by Algorithm 1. Then, there exists $$C > 0$$ such that

\begin{aligned} \begin{aligned} d \bigl( {0,\partial {L_{\beta }} \bigl( {{\omega ^{k + 1}}} \bigr)} \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}
(3.26)

### Proof

By the definition of the augmented Lagrangian function $${L_{\beta }} ( \cdot )$$, we have

\begin{aligned} \textstyle\begin{cases} {\partial _{{x_{j}}}}{L_{\beta }}(u^{k+1},{\lambda ^{k + 1}} ) = \partial {f_{j}}( {x_{j}^{{k + 1}}} ) + {\nabla _{{x_{j}}}}g ( \mathbf{x}_{[1,n]}^{k+1},y^{k+1} )- A_{j}^{T}({\lambda ^{k + 1}} - \beta{r^{k + 1}}), \\ {\partial _{y}}{L_{\beta }}( u^{k+1},{\lambda ^{k + 1}} ) = {\nabla _{y}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {B^{T}}{\lambda ^{k + 1}} + \beta {B^{T}}{r^{k + 1}}, \\ {\partial _{\lambda }}{L_{\beta }}( {u^{k+1},{\lambda ^{k + 1}}}) = -r^{k+1} = \frac{1}{\beta }({\lambda ^{k+1}} - {\lambda ^{k}}). \end{cases}\displaystyle \end{aligned}
(3.27)

From the optimality conditions of (3.1)–(3.2), we have

\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1},\mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}{\lambda ^{k + 1}} + \beta A_{j}^{T} \Delta \mathbf{Ax}_{[j+1,n]}^{k+1} - \beta A_{j}^{T}B({y^{k}} - {y^{k + 1}}) \\ \quad{} - {\tau}(x_{j}^{k+1} - z_{j}^{k}) \in \partial {f_{j}}( {x_{j}^{k+1}} ), \\ {B^{\mathrm{T}}}{\lambda ^{k + 1}} - {\tau}({y^{k + 1}} - y^{k}) = { \nabla _{y}}g( {{u^{k + 1}}} ), \\ {\lambda ^{k + 1}} = {\lambda ^{k}} - \beta ( \mathbf{Ax}_{[1,n]}^{k+1} + B{y^{k+1}} - b), \end{cases}\displaystyle \end{aligned}
(3.28)

where $$\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} =\mathbf{Ax}_{[j+1,n]}^{k+1} - \mathbf{Ax}_{[j+1,n]}^{k}$$ and, by the multiplier update, $$\beta r^{k+1} = \beta (\mathbf{Ax}_{[1,n]}^{k+1} + By^{k+1} - b) = \lambda ^{k} - \lambda ^{k+1}$$. Substituting (3.28) into (3.27), we have

\begin{aligned} { \bigl( {\rho _{1}^{k + 1},\rho _{2}^{k + 1}, \ldots ,\rho _{n}^{k + 1}}, \rho _{n+1}^{k + 1}, \rho _{n+2}^{k + 1} \bigr)^{T}} \in \partial {L_{\beta }} \bigl( {x_{1}^{k + 1},x_{2}^{k + 1}, \ldots ,x_{n}^{k+1},{y^{k + 1}},{ \lambda ^{k + 1}}} \bigr), \end{aligned}

where

\begin{aligned} \begin{aligned} \textstyle\begin{cases} \rho _{j}^{{k + 1}} = {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,n]}^{k+1},y^{k+1} ) - {\nabla _{{x_{j}}}}g (\mathbf{x}_{[1,j-1]}^{k+1}, \mathbf{x}_{[j,n]}^{k},y^{k} ) + A_{j}^{T}({\lambda ^{k}} - {\lambda ^{k + 1}}) \\ \phantom{\rho _{j}^{{k + 1}} =}{} + \beta A_{j}^{T}\Delta \mathbf{Ax}_{[j+1,n]}^{k+1} + \beta A_{j}^{T}B({y^{k + 1}} - {y^{k}}) - {\tau}(x_{j}^{k+1} - z_{j}^{k}), (j=1,\ldots ,n), \\ \rho _{n+1}^{k+1} = {B^{\mathrm{T}}}({\lambda ^{k}} - { \lambda ^{k + 1}}) - {\tau}({y^{k + 1}} - y^{k}), \\ \rho _{n+2}^{k+1} = \frac{1}{\beta }({\lambda ^{k+1}} - { \lambda ^{k}}). \end{cases}\displaystyle \end{aligned} \end{aligned}
(3.29)

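Since $$\nabla g$$ is Lipschitz continuous on bounded subsets and $$\{ \omega ^{k} \}$$ is bounded, it follows from (III) of Assumption A, (3.14), the bound (3.9) on $$\Vert \Delta \lambda ^{k+1} \Vert$$, and the identity $$x_{j}^{k+1} - z_{j}^{k} = \Delta x_{j}^{k+1} - \theta _{k}\Delta x_{j}^{k}$$ that there exists $$C > 0$$ such that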

\begin{aligned} \begin{aligned} d \bigl(0,\partial L_{\beta} \bigl({\omega ^{k + 1}} \bigr) \bigr) \le C \Biggl( \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k+1} \bigr\Vert + \bigl\Vert \Delta{y^{k + 1}} \bigr\Vert + \sum_{i=1}^{n} \bigl\Vert \Delta x_{i}^{k} \bigr\Vert + \bigl\Vert \Delta{y^{k }} \bigr\Vert \Biggr). \end{aligned} \end{aligned}

Similarly, we can derive the same conclusion for Algorithm 2. We omit the proof here. □

### Theorem 3.1

Denote the sets of cluster points of the sequences $$\{ {{\omega ^{k}}} \}$$ and $$\{ {{{\hat{\omega}}^{k}}}\}$$ by Ω and Ω̂, respectively. Then the following statements hold:

(I) If $$\omega ^{*}$$ is a cluster point of $$\{\omega ^{k}\}$$ and $$\{\omega ^{k_{j}}\}_{j\ge 0}$$ is a subsequence such that $$\lim_{j\to +\infty}\omega ^{k_{j}} = \omega ^{*}$$, then

\begin{aligned} \begin{aligned} \lim_{j\to \infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) = L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}

(II) $$\Omega \subseteq critL_{\beta}$$.

(III) $$\lim_{k\to +\infty}d(\omega ^{k},\Omega ) = 0$$.

(IV) Ω is a nonempty, compact and connected set.

### Proof

(I) Since $$x_{i}^{k_{j}+1}$$ is a minimizer of the $$x_{i}$$-subproblem, its objective value does not exceed the value at $$x_{i}^{*}$$; that is,

\begin{aligned} &{f_{i}} \bigl(x_{i}^{k_{j} + 1} \bigr) + \bigl\langle {x_{i}^{k_{j} + 1} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i]}^{k_{j}+1} +\mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+ \frac{\beta }{2}{ \bigl\Vert {\mathbf{Ax}_{[1,i]}^{k_{j}+1} + \mathbf{Ax}_{[i+1,n]}^{k_{j}}+ B{y^{k_{j}}} - b} \bigr\Vert ^{2}} \\ &\quad\le {f_{i}} \bigl(x_{i}^{*} \bigr) + \bigl\langle {x_{i}^{*} - x_{i}^{k_{j}},{ \nabla _{{x_{i}}}}g \bigl(\mathbf{x}_{[1,i-1]}^{k_{j}+1}, \mathbf{x}_{[i,n]}^{k_{j}},y^{k_{j}} \bigr)} \bigr\rangle \\ &\qquad{} - \bigl\langle {{\lambda ^{k_{j}}}, \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b} \bigr\rangle \\ &\qquad{}+\frac{\beta }{2}{ \bigl\Vert \mathbf{Ax}_{[1,i-1]}^{k_{j}+1} +A_{i}x_{i}^{*} +\mathbf{Ax}_{[i+1,n]}^{k_{j}} + B{y^{k_{j}}} - b \bigr\Vert ^{2}} + \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{*} - z_{i}^{k_{j}}} \bigr\Vert ^{2}} - \frac{{{\tau }}}{{\mathrm{{2}}}}{ \bigl\Vert {x_{i}^{{k_{j}} + 1} - z_{i}^{k_{j}}} \bigr\Vert ^{2}}. \end{aligned}

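Combining the inequality above with $$\lim_{j\to \infty}{\omega}^{k_{j}+1}=\lim_{j\to \infty}{\omega}^{k_{j}}=\omega ^{*}$$ (the first equality holds because $$\omega ^{k+1}-\omega ^{k}\to 0$$ by (3.23)), we have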

\begin{aligned} \begin{aligned} \limsup_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)\le f_{i} \bigl({x_{i}^{*}} \bigr). \end{aligned} \end{aligned}

Since $$f_{i}$$ $$( i=1,\ldots ,n)$$ is lower semicontinuous, $$f_{i}(x_{i}^{*})\le \liminf_{j\to \infty}f_{i}(x_{i}^{k_{j}+1})$$. It follows that

\begin{aligned} \lim_{j\to \infty}f_{i} \bigl(x_{i}^{k_{j}+1} \bigr)=f_{i} \bigl({x_{i}^{*}} \bigr). \end{aligned}

Since g is continuous, we further obtain

\begin{aligned} \begin{aligned} &\lim_{j\to +\infty} L_{\beta} \bigl( \omega ^{k_{j}} \bigr) \\ &\quad=\lim_{j\to +\infty} \Biggl( \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{k_{j}} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{k_{j}},y^{k_{j}} \bigr) - \bigl\langle {\lambda ^{k_{j}} , \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b} \bigr\rangle \\ &\qquad{} +\frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{k_{j}} + By^{k_{j}} - b \bigr\Vert ^{2} \Biggr) \\ &\quad= \sum_{i=1}^{n}{f_{i}} \bigl( {{x_{i}}}^{*} \bigr) +g \bigl( \mathbf{x}_{[1,n]}^{*},y^{*} \bigr) - \bigl\langle { \lambda ^{*} , \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b} \bigr\rangle + \frac{\beta}{2} \bigl\Vert \mathbf{Ax}_{[1,n]}^{*} + By^{*} - b \bigr\Vert ^{2} \\ &\quad=L_{\beta} \bigl(\omega ^{*} \bigr). \end{aligned} \end{aligned}

(II) From Lemma 3.4, we have $$x_{i}^{k+1} - x_{i}^{k} \to 0$$, $$y^{k+1} - y^{k} \to 0$$ and $$\lambda ^{k+1} - \lambda ^{k} \to 0$$. Thus, by Lemma 3.5, $$d(0,\partial L_{\beta}(\omega ^{k_{j}})) \to 0$$ as $$j\to \infty$$, while $$\omega ^{k_{j}} \to \omega ^{*}$$ and $$L_{\beta}(\omega ^{k_{j}}) \to L_{\beta}(\omega ^{*})$$ as $$j\to \infty$$. Using the closedness of $$\partial f_{i}$$, the continuity of $$\nabla g$$ and the relations above, we take the limit $$k=k_{j}\to \infty$$ in (3.28) and obtain

\begin{aligned} \textstyle\begin{cases} - {\nabla _{{x_{j}}}}g( \mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}) + A_{j}^{ \mathrm{T}}{\lambda ^{*}} \in \partial {f_{j}} ( {x_{j}^{*}} ), \quad j = 1,\ldots ,n, \\ {\nabla _{y}}g( {\mathbf{x}_{[1,n]}^{{{*}}},{y^{*}}}) = {B^{\mathrm{T}}}{ \lambda ^{*}}, \\ \mathbf{Ax}_{[1,n]}^{*} + B{y^{*}} - b = 0, \end{cases}\displaystyle \end{aligned}

which implies that $$\omega ^{*}$$ is a critical point of $$L_{\beta} (\cdot )$$. Since $$\omega ^{*}$$ is an arbitrary cluster point of $$\{\omega ^{k}\}$$, we conclude that $$\Omega \subseteq critL_{\beta}$$.

(III), (IV) The proof follows that of Theorem 5(ii) and (iii) in Bolte et al. [19], combined with Remark 5 of the same reference, which shows that the properties in (III) and (IV) hold for every sequence satisfying $$w^{k+1}-w^{k} \to 0$$ as $$k\to +\infty$$. By (3.23), this condition is satisfied here. □

### 3.4 Global convergence under the Kurdyka–Łojasiewicz property

In this subsection, we prove the global convergence of the sequence $$\{(\mathbf{x}_{[1,n]}^{k} , y^{k}, \lambda ^{k})\}$$ generated by Algorithm 1 and Algorithm 2 with the help of the Kurdyka–Łojasiewicz property. Since the proofs for the two algorithms are identical, we only present the proof for Algorithm 1.

### Theorem 3.2

(Global convergence)

Suppose that Assumption A holds, and $$\hat{L} ( {\hat{\omega}} )$$ satisfies the KŁ property at each point of Ω̂, then

(I) $$\sum_{k = 1}^{\infty }{\| {{\omega ^{k}} - {\omega ^{k - 1}}}\|} < \infty$$.

(II) $$\{ {{\omega ^{k}}} \}$$ converges to a critical point of $$L_{\beta} ( \cdot )$$.

### Proof

From Theorem 3.1, we have $$\mathop {\lim }_{k \to + \infty } \hat{L}( {{{\hat{\omega}}^{k}}} ) = \hat{L} ( {{{\hat{\omega}}^{*}}} )$$ for all $${\hat{\omega}^{*}} \in \hat{\Omega}$$. We consider two cases.

(i) Suppose there exists an integer $${k_{0}}$$ such that $${\hat{L}_{\beta }}( {{{\hat{\omega}}^{{k_{0}}}}}) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$$. Then, by Lemma 3.3, for all $$k > {k_{0}}$$, we have

\begin{aligned} \begin{aligned} \delta \bigl( \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert ^{2} + \bigl\Vert \Delta y^{k+1} \bigr\Vert ^{2} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) \le {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{{k_{0}}}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) = 0 . \end{aligned} \end{aligned}
(3.30)

Thus, for any $$k > {k_{0}}$$, we have $$x_{i}^{k + 1} = x_{i}^{k}, i=1,2,\ldots ,n, {y^{k + 1}} = {y^{k}}$$. Hence, for any $$k > {k_{0}} + 1$$, one has $${\hat{\omega}^{k + 1}} = {\hat{\omega}^{k}}$$, and the assertion holds.

(ii) Otherwise, since $$\{ \hat{L}_{\beta}(\hat{\omega}^{k})\}$$ is nonincreasing and converges to $$\hat{L}_{\beta}(\hat{\omega}^{*})$$, we have $${\hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}} ) > {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$$ for all k. Since $$\mathop {\lim }_{{k} \to + \infty }d( {{{\hat{\omega}}^{k}}, \hat{\Omega}} )= 0$$, for any given $$\varepsilon > 0$$ there exists $${k_{1}} > 0$$ such that $$d( {{{\hat{\omega}}^{k}},\hat{\Omega}}) < \varepsilon$$ for all $$k > {k_{1}}$$. Since $$\mathop {\lim }_{{k} \to + \infty } {\hat{L}_{\beta }}( {{{ \hat{\omega}}^{k}}} ) = {\hat{L}_{\beta }} ( {{{\hat{\omega}}^{*}}} )$$, for any given $$\eta > 0$$ there exists $${k_{2}} > 0$$ such that $${ \hat{L}_{\beta }}( {{{\hat{\omega}}^{k}}}) < {\hat{L}_{\beta }} ( {{{ \hat{\omega}}^{*}}} ) + \eta$$ for all $$k > {k_{2}}$$. Consequently, when $$k > \tilde{k}: = \max \{ {{k_{1}},{k_{2}}} \}$$,

\begin{aligned} d \bigl( {{{\hat{\omega}}^{k}},\hat{\Omega}} \bigr) < \varepsilon , { \hat{L}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) < {\hat{L}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) + \eta . \end{aligned}
(3.31)

Since Ω̂ is a nonempty compact set and $${\hat{L}_{\beta }} ( \cdot )$$ is constant on Ω̂, applying Lemma 2.1, we have

\begin{aligned} \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr) } \bigr) d \bigl( {0, \partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \ge 1, \quad\forall k > \tilde{k}. \end{aligned}
(3.32)

Let $${a_{k}}:= \sum_{i=1}^{n}\|\Delta x_{i}^{k} \| + \|\Delta y^{k} \|$$. For all $$k > \tilde{k}$$, it follows from (3.32) and Lemma 3.5 (applied to $$\hat{L}_{\beta}$$ through its definition (3.20)) that there exists $$C > 0$$ such that

\begin{aligned} \frac{1}{{\varphi '( {{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}}) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{*}}})} )}} \le d \bigl( {0,\partial {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr)} \bigr) \le C ( {a_{k}}+ {a_{k+1}} ). \end{aligned}
(3.33)

From the concavity of φ, we have

\begin{aligned} \begin{aligned} &\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \\ &\quad\ge \varphi ' \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k + 1}}} \bigr)} \bigr) \\ &\quad\ge \frac{{ {{{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k + 1}}} )} }}{{d ( {0,\partial {{\hat{L}}_{\beta }} ( {{{\hat{\omega}}^{k}}} )} )}} \ge \frac{{{{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k}}} ) - {{\hat{L}}_{\beta }}( {{{\hat{\omega}}^{k + 1}}})}}{{C( {a_{k}}+ {a_{k+1}} )}}. \end{aligned} \end{aligned}
(3.34)
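To make (3.32) and the concavity step in (3.34) concrete, the sketch below uses a toy function of our own choosing, $$F(x) = \Vert x\Vert ^{2}$$, which has the KŁ property at its critical point $$x^{*}=0$$ with the desingularizing function $$\varphi (s) = 2\sqrt{s}$$ (concave, $$\varphi (0)=0$$, $$\varphi ' > 0$$ on $$(0,\infty )$$). For every $$x \ne 0$$, $$\varphi '(F(x)-F(x^{*}))\,\Vert \nabla F(x)\Vert = 2$$, so the KŁ inequality holds; the tangent-line inequality $$\varphi (a)-\varphi (b) \ge \varphi '(a)(a-b)$$ is the concavity fact used in the first step of (3.34). Neither F nor this particular φ comes from the paper.

```python
import numpy as np

def phi(s, c=2.0):
    """Desingularizing function phi(s) = c * sqrt(s): concave, phi(0) = 0, phi' > 0 on (0, inf)."""
    return c * np.sqrt(s)

def dphi(s, c=2.0):
    return 0.5 * c / np.sqrt(s)

def F(x):
    """Toy objective F(x) = ||x||^2 with critical point x* = 0 and F(x*) = 0."""
    return float(x @ x)

def grad_F(x):
    return 2.0 * x

rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.standard_normal(3)
    # KL inequality as in (3.32): phi'(F(x) - F(x*)) * dist(0, dF(x)) >= 1 (here it equals 2)
    assert dphi(F(x)) * np.linalg.norm(grad_F(x)) >= 1.0
    # Tangent-line inequality for concave phi, used in the first step of (3.34)
    a, b = rng.uniform(0.01, 5.0, size=2)
    assert phi(a) - phi(b) >= dphi(a) * (a - b) - 1e-12
```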

From Lemma 3.3, we have

\begin{aligned} & {{\delta \bigl( { {{ \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{ \bigl\Vert {\Delta{y^{k{+}1}} } \bigr\Vert }^{2}}} \bigr)}} \\ &\quad\le \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{ \hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{C({a_{k}}+ {a_{k+1}} )}}. \end{aligned}

From the inequalities $$\sum_{i=1}^{n}t_{i}\le \sqrt{n\sum_{i=1}^{n}t_{i}^{2}}$$ for $$t_{i}\ge 0$$ and $$\sqrt{st}\le s+\frac{1}{4}t$$ for $$s,t\ge 0$$, we obtain

\begin{aligned} \begin{aligned} a_{k+1} \le{}& \bigl( { {{(n+1) \bigl\Vert \Delta \mathbf{x}_{[1,n]}^{k+1} \bigr\Vert }^{2}}}+{ {{(n+1) \bigl\Vert {\Delta{y^{k+1}} } \bigr\Vert }^{2}}} \bigr)^{\frac{1}{2}} \\ \le{} &\sqrt{\frac{C(n+1)}{\delta} \bigl(\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k + 1}}} \bigr) - {{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) \bigr){{({a_{k}}+ {a_{k+1}} )}}} \\ \le{} & \underbrace{\frac{C(n+1)}{\delta} \bigl( {\varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k+1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr)}_{s} + \frac{1}{4}{\underbrace{({a_{k}}+ {a_{k+1}} )}_{t}}. \end{aligned} \end{aligned}

Summing the above inequality over $$k=k'+2,\ldots ,M$$, where $$k' := \tilde{k}$$, and using $$\varphi \ge 0$$ together with $$\sum_{k=k'+2}^{M} a_{k} \le a_{k'+2} + \sum_{k=k'+2}^{M} a_{k+1}$$, yields

\begin{aligned} \begin{aligned} \sum_{k=k'+2}^{M} {a_{k+1}} \le {}& \frac{C(n+1)}{\delta} \bigl( { \varphi \bigl( {{{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) - \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{M+1}}} \bigr) - {{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr)} \bigr) + \frac{1}{4}\sum_{k=k'+2}^{M}{{( {a_{k}}+ {a_{k+1}} )}} \\ \le {}& \frac{C(n+1)}{\delta} \varphi \bigl( {{{ \hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{\hat{\omega}}^{*}}} \bigr)} \bigr) + \frac{1}{4} a_{k'+2} + \frac{1}{2}\sum_{k=k'+2}^{M} a_{k+1}. \end{aligned} \end{aligned}

Rearranging and letting $$M\to \infty$$, we get

\begin{aligned} \sum_{k=k'+2}^{\infty} {a_{k+1}} \le \frac{2C(n+1)}{\delta} \varphi \bigl( {{{\hat{L}}_{\beta }} \bigl( {{{\hat{\omega}}^{k'+2}}} \bigr) - {{\hat{L}}_{ \beta }} \bigl( {{{ \hat{\omega}}^{*}}} \bigr)} \bigr) + \frac{1}{2} {a_{k'+2}} < \infty . \end{aligned}

Hence $$\sum_{k\ge 1} a_{k} = \sum_{k\ge 1} \bigl( \sum_{i=1}^{n}\Vert \Delta x_{i}^{k} \Vert + \Vert \Delta y^{k} \Vert \bigr) < \infty$$. Moreover, by (3.9), $$\Vert \Delta \lambda ^{k+1} \Vert$$ is bounded by a constant multiple of $$a_{k+1}+a_{k}$$, so $$\sum_{k} \Vert \lambda ^{k+1}-\lambda ^{k} \Vert < \infty$$ as well. Therefore, $$\sum_{k=1}^{\infty} \| \omega ^{k+1}-\omega ^{k}\| < \infty$$, and (I) is proved.

(II) By (I), $$\{\omega ^{k}\}$$ is a Cauchy sequence and hence convergent. Combining this with Theorem 3.1, we obtain that $$\{ {{\omega ^{k}}} \}$$ converges to a critical point of $$L_{\beta} ( \cdot )$$. □
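The finite-length argument in part (I) rests on a recursion of the form $$a_{k+1} \le K(\Phi _{k}-\Phi _{k+1}) + \frac{1}{4}(a_{k}+a_{k+1})$$ with a constant $$K>0$$ and a nonincreasing, nonnegative sequence $$\Phi _{k} = \varphi (\hat{L}_{\beta}(\hat{\omega}^{k}) - \hat{L}_{\beta}(\hat{\omega}^{*}))$$. The toy check below uses synthetic $$\Phi _{k}$$ and K chosen by us; it builds the extreme sequence that satisfies the recursion with equality and confirms that every partial sum $$\sum_{k=k_{0}}^{M} a_{k+1}$$ stays below $$2K\Phi _{k_{0}} + \frac{1}{2}a_{k_{0}}$$, which is what summing the recursion from $$k_{0}$$ and rearranging gives.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 1.7                                                 # stands for the constant K > 0
Phi = np.sort(rng.uniform(0.0, 1.0, size=200))[::-1]    # nonincreasing, nonnegative Phi_k
a = np.empty_like(Phi)
a[0] = 0.8                                              # a_{k0}, arbitrary nonnegative start
for k in range(len(Phi) - 1):
    # Largest a_{k+1} allowed by a_{k+1} <= K (Phi_k - Phi_{k+1}) + (a_k + a_{k+1}) / 4
    a[k + 1] = (K * (Phi[k] - Phi[k + 1]) + 0.25 * a[k]) / 0.75

partial_sums = np.cumsum(a[1:])                         # sum_{k=k0}^{M} a_{k+1} for each M
bound = 2.0 * K * Phi[0] + 0.5 * a[0]
assert np.all(partial_sums <= bound)
print(partial_sums[-1], bound)
```

The partial sums are bounded independently of M, which is exactly why the increments $$a_{k}$$ are summable and $$\{\omega ^{k}\}$$ has finite length.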

## 4 Numerical experiments

This section reports the numerical results of applying Algorithm 1 and Algorithm 2 to an $$l_{\frac{1}{2}}$$-regularization problem and a matrix decomposition problem. All experiments were run in Matlab 2020b on a Windows 11 laptop with an AMD Ryzen 5 3550H CPU (3.5 GHz) and 16 GB of RAM.

### 4.1 $$l_{\frac{1}{2}}$$-regularization problem

In compressed sensing, we consider the following optimization problem

\begin{aligned} \begin{aligned} \min_{x} \Vert Mx-b \Vert ^{2} + \varphi \Vert x \Vert _{0}, \end{aligned} \end{aligned}
(4.1)

where $$M\in \mathbb{R}^{m\times n}$$ is the measurement matrix, $$b\in \mathbb{R}^{m}$$ is the observation vector, φ is the regularization parameter, and $$\| x\|_{0}$$ denotes the number of nonzero components of x. Since problem (4.1) is NP-hard, the $$l_{0}$$ norm is often relaxed to the $$l_{\frac{1}{2}}$$ quasi-norm in practical applications [28], which leads to the following nonconvex problem:

\begin{aligned} \begin{aligned} &\min_{x,y} \varphi \Vert x \Vert _{1/2}^{1/2}+\frac{1}{2}{{ \Vert y \Vert }^{2}} \\ &\quad\text{s.t. }Mx-y=b, \end{aligned} \end{aligned}
(4.2)

where $$\|x\|_{\frac{1}{2}}=(\sum_{i=1}^{n} \vert x_{i}\vert ^{\frac{1}{2}})^{2}$$.

Based on (4.2), we construct the following problem:

\begin{aligned} \begin{aligned} &\min_{x_{1},x_{2},y} c \Vert x_{1} \Vert _{(1/2)}^{(1/2)}+ \frac{1}{2}{{ \Vert x_{2} \Vert }^{2}}+ \frac{1}{2}{{ \Vert {{B}_{1}}x_{1}+{{B}_{2}}x_{2}+y \Vert }^{2}} \\ &\quad\text{s.t.} A_{1}x_{1}+A_{2}x_{2}+y=b. \end{aligned} \end{aligned}
(4.3)

To verify the effectiveness of Algorithm 1 and Algorithm 2, we test them on problem (4.3) and compare them with LADMM (see Note 1).

Applying Algorithm 1 to problem (4.3) yields

\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \biggl(\frac{1}{\mu _{1}} \biggl[ \tau z_{1}^{k}-B_{1}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr)\\ \phantom{x_{1}^{k+1}=}{}- \beta A_{1}^{T} \biggl(A_{2}x_{2}^{k} +y^{k} -b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr] , \frac{2c}{\mu _{1}} \biggr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} \biggl[\tau z_{2}^{k}-B_{2}^{T} \bigl(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} \bigr) -\beta A_{2}^{T} \biggl( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta} \biggr) \biggr] , \\ y^{k+1}=\frac{1}{\mu _{3}} \biggl[\tau y^{k}- \bigl(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1} \bigr)- \beta \biggl( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } \biggr) \biggr], \\ \lambda ^{k+1}=\lambda ^{k}-\beta \bigl(A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b \bigr), \end{cases}\displaystyle \end{aligned}

where $$\mu _{1}=\tau +\beta \rho _{\max}( A_{1}^{T}A_{1} )$$, $$\mu _{2}=1+\tau +\beta \rho _{\max}( A_{2}^{T}A_{2} )$$, $$\mu _{3}=1+\tau +\beta$$, $$\rho _{\max}(\cdot )$$ denotes the largest eigenvalue, and $$H(\cdot ,\cdot )$$ is the half shrinkage operator [29], defined componentwise as $$H ( x,\alpha ) = ( h_{\alpha}(x_{1}), h_{\alpha}(x_{2}),\ldots ,h_{\alpha}(x_{n}) )^{T}$$ with

\begin{aligned} h_{\alpha}(x_{i})= \textstyle\begin{cases} \frac{2x_{i}}{3} \bigl( 1+\cos \bigl( \frac{2}{3} ( \pi -\phi _{\alpha}( \lvert x_{i} \rvert ) ) \bigr) \bigr) , & \lvert x_{i} \rvert > \frac{\sqrt[3]{54}}{4}\alpha ^{2/3}, \\ 0, & \text{otherwise}, \end{cases}\displaystyle \end{aligned}
(4.4)

where

\begin{aligned} \begin{aligned} \phi _{\alpha}\bigl( \lvert x_{i} \rvert \bigr)=\arccos \biggl( \frac{\alpha}{8} \biggl( \frac{\lvert x_{i} \rvert }{3} \biggr)^{-3/2} \biggr) . \end{aligned} \end{aligned}
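A minimal NumPy sketch of the half shrinkage operator (4.4) follows; `half_threshold` is our own (hypothetical) name. We assume, following [29], that $$h_{\alpha}(x_{i})$$ is the minimizer of $$(z-x_{i})^{2}+\alpha |z|^{1/2}$$, which is how the $$x_{1}$$-update uses it with $$\alpha =2c/\mu _{1}$$.

```python
import numpy as np


def half_threshold(x, alpha):
    """Componentwise half shrinkage H(x, alpha) following (4.4).

    Entries with |x_i| <= (54^(1/3)/4) * alpha^(2/3) are set to zero; the
    remaining entries are shrunk by the cosine formula.
    """
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * alpha ** (2.0 / 3.0)
    keep = np.abs(x) > threshold
    xk = x[keep]
    # phi_alpha(|x_i|); the arccos argument stays below 1 above the threshold
    phi = np.arccos((alpha / 8.0) * (np.abs(xk) / 3.0) ** (-1.5))
    out[keep] = (2.0 * xk / 3.0) * (1.0 + np.cos((2.0 / 3.0) * (np.pi - phi)))
    return out


print(half_threshold([0.5, 2.0, -2.0], alpha=1.0))
```

With $$\alpha =1$$ the threshold is about 0.945, so an input of 0.5 is set to zero while an input of 2 is shrunk to about 1.81; the operator is odd, so $$-2$$ maps to about $$-1.81$$.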

Applying Algorithm 2 to problem (4.3) yields

\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \bigl(\frac{1}{\mu _{1}} [ \tau z_{1}^{k}-B_{1}^{T}(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k})- \beta A_{1}^{T}(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta }) ] , \frac{2c}{\mu _{1}} \bigr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} [\tau z_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T} ( A_{1}x_{1}^{k+1}+y^{k}-b-\frac{\lambda ^{k}}{\beta} ) ] , \\ y^{k+1}=\frac{1}{\mu _{4}}[\tau y^{k}-(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1}+y^{k})- \beta ( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta } ) ], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b), \end{cases}\displaystyle \end{aligned}

where $$\mu _{4}=\tau +\beta$$. Applying LADMM to problem (4.3), we obtain

\begin{aligned} \textstyle\begin{cases} x_{1}^{k+1}=H \bigl(\frac{1}{\mu _{1}} [ \tau x_{1}^{k}-B_{1}^{T}(B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k})- \beta A_{1}^{T}(A_{2}x_{2}^{k}+y^{k}-b- \frac{\lambda ^{k}}{\beta }) ] , \frac{2c}{\mu _{1}} \bigr), \\ x_{2}^{k+1}=\frac{1}{\mu _{2}} [\tau x_{2}^{k}-B_{2}^{T} (B_{1}x_{1}^{k}+B_{2}x_{2}^{k}+y^{k} ) -\beta A_{2}^{T}( A_{1}x_{1}^{k+1}+y^{k}-b- \frac{\lambda ^{k}}{\beta}) ] , \\ y^{k+1}=\frac{1}{\mu _{3}}[\tau y^{k}-(B_{1}x_{1}^{k+1}+B_{2}x_{2}^{k+1})- \beta ( A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}-b- \frac{\lambda ^{k}}{\beta }) ], \\ \lambda ^{k+1}=\lambda ^{k}-\beta (A_{1}x_{1}^{k+1}+A_{2}x_{2}^{k+1}+y^{k+1}-b). \end{cases}\displaystyle \end{aligned}

In the experiment, the parameters are set as follows: the dimensions are $$m=5000$$ and $$n=1000$$, the penalty parameter is $$\beta =1000$$, $$b=0$$, $$c=1$$, and the inertial parameter is fixed at $$\theta =0.15$$. The initial points are $$x_{1}^{-1}= x_{1}^{0}=0$$, $$x_{2}^{-1}= x_{2}^{0}=0$$, $$y^{0}=0$$, and $$\lambda ^{0}=0$$, and $$A_{1}, A_{2}, B_{1}, B_{2}$$ are random matrices. The stopping criterion for all methods is defined as

\begin{aligned} \begin{aligned} \Vert r_{k} \Vert = \bigl\Vert A_{1}x_{1}^{k}+A_{2}x_{2}^{k} +y^{k}-b \bigr\Vert \le 10^{-8}. \end{aligned} \end{aligned}
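For monitoring runs on problem (4.3), the sketch below evaluates the objective of (4.3) and the residual $$\|r_{k}\|$$ used in the stopping rule. It is Python/NumPy with hypothetical function names, and it only evaluates these quantities; it does not implement Algorithm 1 or Algorithm 2.

```python
import numpy as np


def objective_43(x1, x2, y, B1, B2, c):
    """c*||x1||_{1/2}^{1/2} + ||x2||^2/2 + ||B1 x1 + B2 x2 + y||^2/2, cf. (4.3)."""
    coupling = B1 @ x1 + B2 @ x2 + y          # argument of the coupling term
    penalty = np.sum(np.sqrt(np.abs(x1)))     # ||x1||_{1/2}^{1/2}
    return float(c * penalty + 0.5 * (x2 @ x2) + 0.5 * (coupling @ coupling))


def residual_norm(x1, x2, y, A1, A2, b):
    """||r_k|| = ||A1 x1 + A2 x2 + y - b||."""
    return float(np.linalg.norm(A1 @ x1 + A2 @ x2 + y - b))


def stop(x1, x2, y, A1, A2, b, tol=1e-8):
    """Stopping rule ||r_k|| <= tol (tol = 1e-8 in the experiment)."""
    return residual_norm(x1, x2, y, A1, A2, b) <= tol
```

Tracking both values per iteration gives the two kinds of curves (objective value and $$\log (\|r_{k}\|)$$) described next.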

In the tests, we consider four cases: $$\tau =30$$, $$\tau =35$$, $$\tau =40$$, and $$\tau =45$$. The numerical results of the three algorithms are reported in Table 1, which lists the number of iterations required to satisfy the stopping criterion (“Iter”), the total computing time in seconds (“times”), and the value of the stopping criterion on a logarithmic scale (“log(Crit)”). To illustrate the convergence behavior, the curves of the objective value and of $$\log (\|r_{k}\|)$$ for $$\tau =45$$ are presented in Fig. 1.

Table 1 shows that the two proposed algorithms require fewer iterations and less computing time than LADMM. Figure 1(a) plots the objective value against the iteration number and indicates that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 1(b) further confirms the time efficiency of the two algorithms, especially once “log(Crit)” falls below −4.

### 4.2 Matrix decomposition

Now, we consider the matrix decomposition problem, which has the following form:

\begin{aligned} \min \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2}\quad \text{s.t. } L+S=T, \end{aligned}
(4.5)

where $$M\in \mathbb{R}^{p\times n}$$ is the observed matrix and $$L,S,T \in \mathbb{R}^{p\times n}$$ are the decision variables. The nuclear norm is $$\|L\|_{*}:=\sum_{i=1}^{\min(p,n)}\vert \sigma _{i}(L)\vert ^{ \frac{1}{2}}$$, the sparsity term is $$\|S\|_{1}:=\sum_{j=1}^{n}\sum_{i=1}^{p}\vert S_{ij}\vert$$, ω is the penalty factor, and α is the trade-off parameter between the nuclear norm $$\|L\|_{*}$$ and the $$l_{1}$$-norm $$\|S\|_{1}$$. The augmented Lagrangian function (ALF) of problem (4.5) is defined as

\begin{aligned} \begin{aligned} L_{\beta} (L,S,T,\lambda )= \Vert L \Vert _{*}+\alpha \Vert S \Vert _{1}+ \frac{\omega}{2} \Vert T-M \Vert ^{2} -\langle \lambda , L+S-T\rangle + \frac{\beta}{2} \Vert L+S-T \Vert ^{2}, \end{aligned} \end{aligned}

where λ is the Lagrange multiplier.

Applying SPLI-ADMM to problem (4.5), we get the closed-form iterative formulas:

\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}

where $$V(\cdot ,\mu )$$ is the singular value thresholding operator [30] and $$S(\cdot ,\mu )$$ is the soft shrinkage operator [31]. Applying SCLI-ADMM to problem (4.5), we get

\begin{aligned} \textstyle\begin{cases} z_{L}^{k}=L^{k}+\theta (L^{k}-{L}^{k-1} ), z_{S}^{k}=S^{k}+ \theta (S^{k}-{S}^{k-1} ), \\ L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau z_{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau z_{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\tau T^{k}+\beta (L^{k+1}+S^{k+1} )+\omega (M-T^{k})-\lambda ^{k}}{\beta +\tau}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ), \end{cases}\displaystyle \end{aligned}

Applying LADMM to problem (4.5), we have

\begin{aligned} \textstyle\begin{cases} L^{k+1}=V( \frac{\beta (T^{k}-S^{k} )+\lambda ^{k}+\tau{L}^{k}}{\beta +\tau}, \frac{1}{\beta +\tau}), \\ S^{k+1}=S( \frac{\beta (T^{k}-L^{k+1} )+\lambda ^{k}+\tau{S}^{k}}{\beta +\tau}, \frac{\alpha}{\beta +\tau}), \\ T^{k+1}= \frac{\beta (L^{k+1}+S^{k+1} )+\omega M-\lambda ^{k}}{\beta +\omega}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (L^{k+1}+S^{k+1}-T^{k+1} ). \end{cases}\displaystyle \end{aligned}
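The three schemes for (4.5) share the $$L$$- and $$S$$-updates and differ only in the inertial extrapolation and the $$T$$-update, so they can be sketched as one NumPy step function. The names `svt`, `soft`, and `md_step` are hypothetical. We assume that $$\|L\|_{*}$$ is the standard nuclear norm (sum of singular values), whose proximal map is the singular value thresholding of [30], and we follow the printed formulas, including the LADMM $$T$$-update without a τ term.

```python
import numpy as np


def svt(X, mu):
    """Singular value thresholding V(X, mu): shrink each singular value by mu."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - mu, 0.0)) @ Vt


def soft(X, mu):
    """Entrywise soft shrinkage S(X, mu) = sign(X) * max(|X| - mu, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - mu, 0.0)


def md_step(state, M, alpha, beta, tau, omega, theta, variant="spli"):
    """One iteration of SPLI-ADMM, SCLI-ADMM, or LADMM on (4.5).

    state = (L, S, T, lam, L_prev, S_prev); returns the next state.
    """
    L, S, T, lam, L_prev, S_prev = state
    if variant == "ladmm":
        zL, zS = L, S                          # LADMM: no inertial step
    else:
        zL = L + theta * (L - L_prev)          # z_L^k
        zS = S + theta * (S - S_prev)          # z_S^k
    L_new = svt((beta * (T - S) + lam + tau * zL) / (beta + tau), 1.0 / (beta + tau))
    S_new = soft((beta * (T - L_new) + lam + tau * zS) / (beta + tau), alpha / (beta + tau))
    LS = L_new + S_new
    if variant == "spli":                      # omega-term kept exact
        T_new = (tau * T + beta * LS + omega * M - lam) / (beta + omega + tau)
    elif variant == "scli":                    # omega-term linearized at T^k
        T_new = (tau * T + beta * LS + omega * (M - T) - lam) / (beta + tau)
    elif variant == "ladmm":                   # as printed: no tau term
        T_new = (beta * LS + omega * M - lam) / (beta + omega)
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    lam_new = lam - beta * (LS - T_new)
    return (L_new, S_new, T_new, lam_new, L, S)
```

A run would start from the zero initialization described below, for example `Z = np.zeros_like(M)` and `state = (Z, Z, Z, Z, Z, Z)`, and call `md_step` repeatedly until the RelChg rule holds or 3000 iterations are reached. Keeping the variant as a flag makes it easy to see that the schemes differ only in two places.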

We set $$p=n=100$$ and test 8 different pairs $$(r.,spr.)$$. We choose $$\alpha =\frac{0.2}{\sqrt{n}}$$, $$\theta =0.3$$, $$\omega =1000$$, $$\beta =5$$, and $$\tau =1$$, and the matrices L, S, and T are initialized to zero. The matrix M is generated randomly in MATLAB. The stopping criterion is defined as

\begin{aligned} \begin{aligned} \operatorname { RelChg }:= \frac{ \Vert (L^{k+1}, S^{k+1}, T^{k+1} )- (L^{k}, S^{k}, T^{k} ) \Vert _{F}}{ \Vert (L^{k}, S^{k}, T^{k} ) \Vert _{F}+1} \leqslant 10^{-8} \quad\text{or}\quad k>3000. \end{aligned} \end{aligned}

Let $$(\hat{L},\hat{S},\hat{T})$$ be a numerical solution of problem (4.5) and $$(L^{*},S^{*},T^{*})$$ the ground-truth solution. We measure the quality of the recovery by the relative error, defined by

\begin{aligned} \begin{aligned} \operatorname{RelErr}:= \frac{ \Vert (\hat{L},\hat{S}, \hat{T})- (L^{*},S^{*}, T^{*} ) \Vert _{F}}{ \Vert (L^{*},S^{*}, T^{*} ) \Vert _{F}+1} . \end{aligned} \end{aligned}
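Both measures can be computed as below. We read $$\|(L,S,T)\|_{F}$$ as the Frobenius norm of the stacked triple, that is $$\sqrt{\|L\|_{F}^{2}+\|S\|_{F}^{2}+\|T\|_{F}^{2}}$$; this reading and the function names are our assumptions.

```python
import numpy as np


def stacked_fro(mats):
    """Frobenius norm of a tuple of matrices: sqrt(sum_i ||X_i||_F^2)."""
    return float(np.sqrt(sum(np.sum(X * X) for X in mats)))


def rel_chg(new, old):
    """RelChg between consecutive iterates (L, S, T)."""
    diff = [a - b for a, b in zip(new, old)]
    return stacked_fro(diff) / (stacked_fro(old) + 1.0)


def rel_err(est, truth):
    """RelErr of an estimate (L_hat, S_hat, T_hat) against (L*, S*, T*)."""
    diff = [a - b for a, b in zip(est, truth)]
    return stacked_fro(diff) / (stacked_fro(truth) + 1.0)


def should_stop(new, old, k, tol=1e-8, max_iter=3000):
    """Stop when RelChg <= tol or k > max_iter, as in the rule above."""
    return rel_chg(new, old) <= tol or k > max_iter
```

The "+1" in both denominators keeps the ratios well defined when the reference triple is zero, as it is at the zero initialization.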

Table 2 compares the three algorithms for different $$(r.,spr.)$$, where “r.” is the rank of the matrix L, “$$spr$$.” is the sparsity of the sparse matrix S, and “Iter” is the number of iterations; $$\|S\|_{0}$$ denotes the number of nonzero elements of S. The curves of the stopping criterion and of the relative error of the three algorithms are plotted in Fig. 2.

Table 2 shows that SPLI-ADMM and SCLI-ADMM take less time and fewer iterations than LADMM under the same conditions, which indicates that the two proposed algorithms are more efficient than LADMM for different ranks and sparsity ratios. In Fig. 2, the stopping-criterion curves of the two trials (Fig. 2(a) and (c)) show that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 2(b) and (d) indicate that the matrices L and S are recovered more accurately by SPLI-ADMM and SCLI-ADMM, since for the same “Iter” the “RelErr” of LADMM is larger than that of SPLI-ADMM.

## 5 Conclusion

This paper contributes to nonconvex optimization by developing two linearized ADMM algorithms, SPLI-ADMM and SCLI-ADMM, and analyzing their convergence. By integrating an inertial strategy into a linearized framework, these algorithms improve efficiency in solving linearly constrained problems with nonseparable structure. A key novelty is the use of sequential gradients of the mixed (coupling) term, which is not typical of conventional ADMM approaches and allows each variable to be updated with the latest information. The KŁ property is used to guarantee the convergence of the generated sequences. Finally, the numerical experiments show that the proposed algorithms are effective and more time-efficient than LADMM.

## Data Availability

No datasets were generated or analysed during the current study.

## Notes

1. LADMM is the special case of SPLI-ADMM in which the inertial parameter $$\theta _{k} = 0$$.

## References

1. Yang, J., Zhang, Y.: Alternating direction algorithms for $\ell_1$-problems in compressive sensing. SIAM J. Sci. Comput. 33(1), 250–278 (2011)

2. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

3. Ding, J., Zhang, X., Chen, M., Xue, K., Zhang, C., Pan, M.: Differentially private robust ADMM for distributed machine learning. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 1302–1311. IEEE, Los Angeles (2019)

4. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2019)

5. Jiang, B., Lin, T., Ma, S., Zhang, S.: Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput. Optim. Appl. 72(1), 115–157 (2019)

6. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

7. Chen, L., Sun, D., Toh, K.-C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66(2), 327–343 (2017)

8. Chang, X., Liu, S., Zhao, P., Song, D.: A generalization of linearized alternating direction method of multipliers for solving two-block separable convex programming. J. Comput. Appl. Math. 357, 251–272 (2019)

9. Zhang, C., Song, Y., Cai, X., Han, D.: An extended proximal ADMM algorithm for three-block nonconvex optimization problems. J. Comput. Appl. Math. 398, 113681 (2021)

10. Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25(2), 882–915 (2015)

11. Wang, X., Shao, H., Liu, P., Wu, T.: An inertial proximal partially symmetric ADMM-based algorithm for linearly constrained multi-block nonconvex optimization problems with applications. J. Comput. Appl. Math. 420, 114821 (2023)

12. Hien, L.T.K., Phan, D.N., Gillis, N.: Inertial alternating direction method of multipliers for non-convex non-smooth optimization. Comput. Optim. Appl. 83(1), 247–285 (2022)

13. Chao, M., Deng, Z., Jian, J.: Convergence of linear Bregman ADMM for nonconvex and nonsmooth problems with nonseparable structure. Complexity 2020, 1–14 (2020)

14. Li, X., Mo, L., Yuan, X., Zhang, J.: Linearized alternating direction method of multipliers for sparse group and fused lasso models. Comput. Stat. Data Anal. 79, 203–221 (2014)

15. Ling, Q., Shi, W., Wu, G., Ribeiro, A.: DLM: decentralized linearized alternating direction method of multipliers. IEEE Trans. Signal Process. 63(15), 4051–4064 (2015)

16. Liu, Q., Shen, X., Gu, Y.: Linearized ADMM for nonconvex nonsmooth optimization with convergence analysis. IEEE Access 7, 76131–76144 (2019)

17. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

18. Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in nonconvex optimization problems. Comput. Math. Model. 4(4), 336–341 (1993)

19. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

20. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9(1), 3–11 (2001)

21. Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9(4), 1756–1787 (2016)

22. Hien, L.T.K., Papadimitriou, D.: An inertial ADMM for a class of nonconvex composite optimization with nonlinear coupling constraints (2022). arXiv preprint. arXiv:2212.11336

23. Boţ, R.I., Nguyen, D.-K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45(2), 682–712 (2020)

24. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

25. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

26. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward–backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)

27. Goncalves, M.L.N., Melo, J.G., Monteiro, R.D.C.: Convergence rate bounds for a proximal ADMM with over-relaxation stepsize parameter for solving nonconvex linearly constrained problems (2017). arXiv preprint. arXiv:1702.01850

28. Zeng, J., Lin, S., Wang, Y., Xu, Z.: $l_{1/2}$ regularization: convergence of iterative half thresholding algorithm. IEEE Trans. Signal Process. 62(9), 2317–2329 (2014)

29. Xu, Z., Chang, X., Xu, F., Zhang, H.: $l_{1/2}$ regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012)

30. Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20(4), 1956–1982 (2010)

31. Bonesky, T., Maass, P.: Iterated soft shrinkage with adaptive operator evaluations. J. Inverse Ill-Posed Probl. 17(4), 337–358 (2009)

## Funding

This work is supported by the National Natural Science Foundation of China under grants 72071130, 71901145, and 12371308; the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2022126); and the Key Lab of Intelligent and Green Flexographic Printing (No. KLIGFP-01).

## Author information


### Contributions

Z. wrote the introduction. Z. and K. developed the theoretical framework and established the convergence analysis for the entire study. K. and Q. assisted with the numerical experiments and prepared the figures and tables. Y. finalized the manuscript content and structure, ensuring consistency and coherence. Z. and Y. acquired the financial support for the project leading to this publication. All authors reviewed the manuscript.

### Corresponding author

Correspondence to Yazheng Dang.

## Ethics declarations

Not applicable.

### Competing interests

The authors declare no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions


Xue, Z., Yang, K., Ma, Q. et al. Sequential inertial linear ADMM algorithm for nonconvex and nonsmooth multiblock problems with nonseparable structure. J Inequal Appl 2024, 65 (2024). https://doi.org/10.1186/s13660-024-03141-1