Sequential inertial linear ADMM algorithm for nonconvex and nonsmooth multiblock problems with nonseparable structure

The alternating direction method of multipliers (ADMM) has been widely used to solve linearly constrained problems in signal processing, matrix decomposition, machine learning, and many other fields. This paper introduces two linearized ADMM algorithms, namely sequential partial linear inertial ADMM (SPLI-ADMM) and sequential complete linear inertial ADMM (SCLI-ADMM), which integrate the linearized ADMM approach with the inertial technique in a fully nonconvex framework with nonseparable structure. Iterative schemes are formulated using either partial or full linearization while also incorporating the sequential gradient of the composite term in each subproblem's update. This adaptation ensures that each iteration utilizes the latest information to improve the efficiency of the algorithms. Under some mild conditions, we prove that the sequences generated by the two proposed algorithms converge to critical points of the problem with the help of the KŁ property. Finally, some numerical results are reported to show the effectiveness of the proposed algorithms.


Introduction
In this paper, we consider the following linearly constrained nonconvex optimization problem with multiple block variables:

min_{x_1,…,x_n,y} ∑_{i=1}^n f_i(x_i) + g(x_{[1,n]}, y)  s.t.  ∑_{i=1}^n A_i x_i + By = b,   (1.1)

where x_i ∈ R^{p_i} (i = 1, 2, …, n) and y ∈ R^q are the variables, each f_i : R^{p_i} → R ∪ {+∞} (i = 1, 2, …, n) is a proper lower semicontinuous function, which is nonconvex and (possibly) nonsmooth, g : R^{p_1} × ⋯ × R^{p_n} × R^q → R is continuously differentiable, ∇g is Lipschitz continuous with modulus l_g > 0, A_i ∈ R^{m×p_i} (i = 1, 2, …, n) and B ∈ R^{m×q} are given matrices, and b ∈ R^m. Denote x_{[i,j]} = (x_i, x_{i+1}, …, x_{j-1}, x_j) and Ax_{[j,k]} = ∑_{i=j}^k A_i x_i. The augmented Lagrangian function (ALF) of (1.1) is defined as

L_β(x_{[1,n]}, y, λ) = ∑_{i=1}^n f_i(x_i) + g(x_{[1,n]}, y) − ⟨λ, Ax_{[1,n]} + By − b⟩ + (β/2)‖Ax_{[1,n]} + By − b‖²,

where λ ∈ R^m is the Lagrangian dual variable, and β > 0 is a penalty parameter. Problem (1.1) encapsulates a multitude of nonconvex optimization problems across various domains, including signal processing, image reconstruction, matrix decomposition, machine learning, etc. [1–3]. When the number of blocks n equals 2 and g(·) is identically zero, this problem degenerates into the two-block separable problem. If the problem contains merely a mixed term, it becomes similar to the problem in [4]. On the other hand, if the variable y is absent, the problem becomes the one studied in [5]. Hence, problem (1.1) extends the scope of the objective functions found in the literature [4–6], encompassing a broader range of scenarios with additional variables and potential mixed terms, thereby reflecting the versatility and complexity encountered in contemporary applications.
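As a quick concreteness check (a minimal sketch with hypothetical sizes and random data, not part of the paper's experiments), the ALF above can be evaluated directly; at any feasible point the multiplier and penalty terms vanish, so L_β coincides with the objective:

```python
import numpy as np

def alf(f_vals, g_val, lam, beta, residual):
    # L_beta = sum_i f_i + g - <lam, r> + (beta/2)||r||^2, with r = A x_[1,n] + B y - b
    return sum(f_vals) + g_val - lam @ residual + 0.5 * beta * residual @ residual

rng = np.random.default_rng(0)
m, p, q, n_blocks = 4, 3, 2, 2                        # hypothetical sizes
A = [rng.standard_normal((m, p)) for _ in range(n_blocks)]
B = rng.standard_normal((m, q))
x = [rng.standard_normal(p) for _ in range(n_blocks)]
y = rng.standard_normal(q)
b = sum(Ai @ xi for Ai, xi in zip(A, x)) + B @ y      # choose b so that (x, y) is feasible

r = sum(Ai @ xi for Ai, xi in zip(A, x)) + B @ y - b  # residual is exactly zero here
f_vals = [np.abs(xi).sum() for xi in x]               # e.g. f_i = the l_1 norm
z = np.concatenate(x + [y])
g_val = 0.5 * z @ z                                   # a smooth nonseparable coupling term

lam = rng.standard_normal(m)
obj = sum(f_vals) + g_val
print(np.isclose(alf(f_vals, g_val, lam, beta=10.0, residual=r), obj))  # True
```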
Indeed, ADMM has been established as a powerful tool for solving two-block separable convex optimization problems [7,8]. However, its effectiveness and convergence guarantees become much more intricate when dealing with nonconvex problems, especially when the number of blocks exceeds two. Zhang et al. [9] tackled this challenge by proposing a proximal ADMM for solving three-block nonconvex optimization tasks, building upon the groundwork laid by Sun et al. [10]. Meanwhile, Wang et al. [11] proposed an inertial proximal partially symmetric ADMM, suitable for handling multiblock separable nonconvex optimization problems. Hien et al. [12] developed an inertial version of ADMM, referred to as iADMM, which integrated the majorization-minimization principle within each block update step to address a specific class of nonconvex low-rank representation problems. Chao et al. [13] contributed to this area with a linear Bregman ADMM algorithm for nonconvex multiblock optimization problems featuring nonseparable structures.
Linearized Alternating Direction Method of Multipliers (LADMM) simplifies the problem-solving process and significantly decreases the computational overhead associated with traditional ADMM. By linearizing certain components of the optimization problem at each iteration, LADMM allows for more straightforward and efficient updates. Li et al. [14] effectively utilized LADMM in the context of the least absolute shrinkage and selection operator (LASSO) problem, demonstrating that this linearized approach is simple and highly efficient. Ling et al. [15] further extended the application of LADMM by introducing a decentralized linearized ADMM algorithm, which solely linearizes the objective functions at each iterative step. This method facilitates distributed computation and can handle large-scale problems more effectively. Specifically addressing nonconvex and nonsmooth scenarios, Liu et al. [16] proposed a two-block linearized ADMM. This variant linearizes the mixed term and the quadratic penalty term in the augmented Lagrangian function (ALF), thereby providing a viable solution strategy for such challenging optimization problems. Chao et al. [13] presented a linear Bregman ADMM, which only linearizes the mixed term, for solving three-block nonseparable problems. This approach maintains the efficiency gains of LADMM while adapting it to accommodate the complexities inherent in multiblock and nonseparable optimizations.
The inertial technique, initially conceived by Polyak [17], serves as an acceleration strategy that takes into account the dynamics of the optimization process by incorporating information from the last two iterations, thereby mitigating substantial differences between consecutive points. Subsequently, Zavriv et al. [18] expanded the use of the inertial technique to tackle nonconvex optimization problems, marking a significant milestone in broadening the applicability of this methodology. Recently, the inertial technique has seen widespread adoption in conjunction with various optimization algorithms to enhance their performance in solving nonconvex optimization problems. Bot et al. [19] proposed an inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. Attouch et al. [20] introduced an inertial proximal method and a proximal alternating projection method for maximal-monotone problems and minimization problems, respectively. Pock et al. [21] went on to propose a linear Inertial Proximal Alternating Minimization Algorithm (IPAMA) for a diverse range of nonconvex and nonsmooth optimization problems. Building upon these advancements, researchers have successfully integrated the inertial technique with ADMM. Hien et al. [22] developed an inertial alternating direction method of multipliers (iADMM) specifically designed for a class of nonconvex multiblock optimization problems with nonlinear coupling constraints. Wang et al. [11] also introduced an inertial proximal partially symmetric ADMM, tailored for nonconvex settings, further highlighting the versatility and efficacy of combining inertial techniques with ADMM in modern optimization methodologies.
The novelty of this paper can be summarized as follows: (I) The proposed algorithms combine the inertial effect with the linearization technique. The former improves the practical performance of the algorithms, while the latter contributes to fast convergence by simplifying each subproblem.
(II) Unlike conventional approaches such as those in [13], during the linearization phase, the gradient of the mixed term in the x_j-subproblem is evaluated at the most recently updated blocks, i.e., as ∇_{x_j} g(x_{[1,j−1]}^{k+1}, x_{[j,n]}^k, y^k). This distinctive characteristic enables us to linearize the mixed term dynamically based on the progress of the iterate sequence, meaning that each update depends on the current state of the iterates. Consequently, it is referred to as a sequential gradient iteration scheme.
The rest of this paper is organized as follows: In Sect. 2, some necessary preliminaries for further analysis are summarized. Then, we establish the convergence of the two algorithms in Sect. 3. Section 4 shows the validity of the algorithms by some numerical experiments. Finally, some conclusions are drawn in Sect. 5.

Preliminaries
In this section, we recall some basic notation and preliminary results, which will be used in this paper. Throughout, R^n denotes the n-dimensional Euclidean space, R ∪ {+∞} denotes the extended real number set, and N denotes the natural number set. The image space of a matrix Q ∈ R^{m×n} is defined as Im Q := {Qx : x ∈ R^n}, and ρ_min(Q^T Q) denotes the smallest positive eigenvalue of Q^T Q. ‖·‖ represents the Euclidean norm. dom f := {x ∈ R^n : f(x) < +∞} is the domain of a function f : R^n → R ∪ {+∞}, and ⟨x, y⟩ = x^T y = ∑_{i=1}^n x_i y_i.
(I) The Fréchet subdifferential, or regular subdifferential, of f at x ∈ dom f, written ∂̂f(x), is defined as

∂̂f(x) := {x* ∈ R^n : lim inf_{y→x, y≠x} [f(y) − f(x) − ⟨x*, y − x⟩]/‖y − x‖ ≥ 0}.

(II) The limiting subdifferential, or simply the subdifferential, of f at x ∈ dom f, written ∂f(x), is defined as

∂f(x) := {x* ∈ R^n : there exist x^k → x with f(x^k) → f(x) and x*_k ∈ ∂̂f(x^k) with x*_k → x*}.

A point x satisfying 0 ∈ ∂f(x) is called a critical point or a stationary point of the function f. The set of critical points of f is denoted by crit f.

Proposition 2.1
We collect some basic properties of the subdifferential [24]. (I) ∂̂f(x) ⊆ ∂f(x) for each x ∈ R^n, where the first set is closed and convex, while the second set is only closed.
The KŁ property can be described as follows.
Definition 2.3 (see [19,26]) (KŁ property) Let f : R^n → R ∪ {+∞} be a proper lower semicontinuous function. If there exist η ∈ (0, +∞], a neighborhood U of x*, and a continuous concave function ϕ ∈ Φ_η (i.e., ϕ : [0, η) → [0, +∞) with ϕ(0) = 0, ϕ continuously differentiable on (0, η), and ϕ' > 0 on (0, η)) such that for all x ∈ U ∩ {x : f(x*) < f(x) < f(x*) + η},

ϕ'(f(x) − f(x*)) · d(0, ∂f(x)) ≥ 1,

where the distance from x to a set S is defined by d(x, S) := inf{‖y − x‖ : y ∈ S}, then f is said to have the KŁ property at x*.
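As a standard illustration (a textbook example, not taken from this paper), the function f(x) = |x|^p with p > 1 satisfies the KŁ property at x* = 0 with desingularizing function ϕ(s) = s^{1/p}:

```latex
% KL property of f(x) = |x|^p, p > 1, at x^* = 0.
% For x \neq 0: f(x) - f(x^*) = |x|^p and d(0, \partial f(x)) = p\,|x|^{p-1}.
% Take \varphi(s) = s^{1/p} \in \Phi_\eta (concave, \varphi(0) = 0). Then
\varphi'\bigl(f(x) - f(x^*)\bigr)\, d\bigl(0, \partial f(x)\bigr)
  = \tfrac{1}{p}\,|x|^{p(\frac{1}{p}-1)} \cdot p\,|x|^{p-1}
  = |x|^{1-p}\,|x|^{p-1} = 1 \ge 1,
% so the KL inequality holds on any punctured neighborhood of 0 with \eta = +\infty.
```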
Lemma 2.1 (see [25]) (Uniformized KŁ property) Suppose that f : R^n → R ∪ {+∞} is a proper lower semicontinuous function and Ω is a compact set. If f(x) ≡ f* for all x ∈ Ω and f satisfies the KŁ property at each point of Ω, then there exist ε > 0, η > 0 and ϕ ∈ Φ_η such that

ϕ'(f(x) − f*) · d(0, ∂f(x)) ≥ 1  for all x ∈ {x : d(x, Ω) < ε} ∩ {x : f* < f(x) < f* + η}.   (2.3)

Lemma 2.2 (see [25]) (Descent lemma) Let h : R^n → R be a continuously differentiable function whose gradient ∇h is Lipschitz continuous with modulus l_h > 0. Then, for any x, y ∈ R^n, we have

h(y) ≤ h(x) + ⟨∇h(x), y − x⟩ + (l_h/2)‖y − x‖².   (2.4)

Lemma 2.3 (see [27]) Let Q ∈ R^{m×n} be a nonzero matrix, and let ρ_min(Q^T Q) denote the smallest positive eigenvalue of Q^T Q. Then, for every u ∈ R^m, it holds that

‖Q^T u‖² ≥ ρ_min(Q^T Q) ‖P_Q(u)‖²,

where P_Q denotes the Euclidean projection onto Im(Q).
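Lemma 2.2 is easy to sanity-check numerically. The sketch below uses h = sin (whose gradient cos is 1-Lipschitz, so l_h = 1) and verifies the quadratic upper bound (2.4) at random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(1)
h, dh, l_h = np.sin, np.cos, 1.0   # derivative of cos is -sin, bounded by 1 in magnitude

for _ in range(1000):
    x, y = rng.uniform(-10, 10, size=2)
    upper = h(x) + dh(x) * (y - x) + 0.5 * l_h * (y - x) ** 2
    assert h(y) <= upper + 1e-12   # descent-lemma bound (2.4)
print("descent lemma verified on 1000 random pairs")
```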

Algorithms and their convergence
In this section, we propose two linear inertial ADMM algorithms, the sequential partial linear inertial ADMM (SPLI-ADMM) and the sequential complete linear inertial ADMM (SCLI-ADMM), and prove their convergence under some suitable conditions. Furthermore, we prove the boundedness of the generated sequence.

Two linear inertial algorithms
First, we present Algorithm 1 for (1.1).
In every iteration of the subproblems, our approach utilizes the sequential gradient to update the variables. Specifically, for the (k+1)th iteration of x_i (i = 1, …, n), the mixed term g(x_{[1,n]}, y) is replaced with a linearized approximation that includes an inertial proximal term. Here, the sequential gradient ∇_{x_i} g(x_{[1,i−1]}^{k+1}, x_{[i,n]}^k, y^k) is refreshed for each subproblem, reflecting the most recent variable updates. Note that the y-subproblem remains unlinearized, so we call the method sequential partial linear inertial ADMM. For the x_i-subproblems (i = 1, …, n) and the y-subproblem, respectively, we get the auxiliary functions (3.1) and (3.2), where τ > 0 and θ_k ∈ [0, 1/2). Utilizing the auxiliary functions above, the update rules are summarized in Algorithm 1: while the stopping criterion is not met, for i = 1, …, n each block x_i^{k+1} minimizes its auxiliary function in turn, then y^{k+1} solves the y-subproblem, the multiplier λ^{k+1} is updated, and finally the algorithm returns (x_1^{k+1}, …, x_n^{k+1}, y^{k+1}, λ^{k+1}).
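To illustrate the update pattern (a minimal sketch, not the paper's exact scheme), consider a toy instance with f_1 = f_2 = λ_r‖·‖_1, g(x_1, x_2, y) = (c/2)‖x_1 + x_2 − y‖², and the constraint x_1 + x_2 + y = b. Each x_i-step linearizes only g at the latest iterates (the sequential gradient), keeps the quadratic penalty, and adds an inertial proximal term; the y-step is solved exactly. All parameter values are hypothetical:

```python
import numpy as np

def soft(v, t):                       # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(2)
d = 20
b = rng.standard_normal(d)
lam_r, c, beta, tau, theta = 0.1, 1.0, 10.0, 5.0, 0.15   # hypothetical parameters

x1 = x1_old = np.zeros(d)
x2 = x2_old = np.zeros(d)
y = np.zeros(d)
lam = np.zeros(d)

for k in range(1000):
    # x1-step: sequential gradient of g at (x1^k, x2^k, y^k), inertial point v1
    v1 = x1 + theta * (x1 - x1_old)
    g1 = c * (x1 + x2 - y)
    q1 = (beta * (b - x2 - y) + tau * v1 - g1 + lam) / (beta + tau)
    x1_old, x1 = x1, soft(q1, lam_r / (beta + tau))

    # x2-step: the sequential gradient now uses the fresh x1^{k+1}
    v2 = x2 + theta * (x2 - x2_old)
    g2 = c * (x1 + x2 - y)
    q2 = (beta * (b - x1 - y) + tau * v2 - g2 + lam) / (beta + tau)
    x2_old, x2 = x2, soft(q2, lam_r / (beta + tau))

    # y-step: unlinearized, i.e. exact minimization of the smooth subproblem
    s = x1 + x2
    y = ((c - beta) * s + lam + beta * b) / (c + beta)

    # dual update, matching the -<lam, r> sign convention of the ALF
    lam = lam - beta * (s + y - b)

print(np.linalg.norm(x1 + x2 + y - b))   # constraint residual, should be small
```

The y-step is the exact stationary point of (c/2)‖s − y‖² − ⟨λ^k, y⟩ + (β/2)‖s + y − b‖², which is why no linearization is needed there in SPLI-ADMM.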
Remark 1 (I) The auxiliary functions defined in (3.1) contain the inertial proximal term with parameter τ for each i = 1, 2, …, n. The inertial scheme computes the new iterate by employing the two previous iterates; by adding the inertial term to the x_i-subproblems, the iteration tends toward the extrapolated direction x_i^k − x_i^{k−1}. (II) The purpose of linearizing the mixed term in the x_i-subproblem is to exploit the properties of the differentiable block and to simplify the computation at each iteration.
Algorithm 2 is obtained by further linearization on the basis of Algorithm 1. The x_i-subproblems (i = 1, …, n) are the same as those of Algorithm 1, and the iterative scheme can be written as (3.4). During the (k+1)th iteration for updating y, we replace the function g(x_{[1,n]}^{k+1}, y) with a linearized approximation plus a regularization term. In Algorithm 2, all the subproblems are linearized and sequentially updated; hence we call it the sequential complete linear inertial ADMM.
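The linearized y-step admits a closed form whenever B has friendly structure. The sketch below (with B = I and a hypothetical regularization weight μ, following our reading of the surrogate) solves the linearized y-subproblem exactly and verifies its stationarity condition:

```python
import numpy as np

def linearized_y_step(grad_g_y, lam, s, b, y_prev, beta, mu):
    """Minimize <grad_g_y, y> - <lam, y> + (beta/2)||s + y - b||^2 + (mu/2)||y - y_prev||^2,
    a sketch of the fully linearized y-surrogate of Algorithm 2 with B = I (our assumption)."""
    return (lam - grad_g_y + beta * (b - s) + mu * y_prev) / (beta + mu)

rng = np.random.default_rng(3)
d = 10
grad_g_y = rng.standard_normal(d)     # stands in for the gradient of g w.r.t. y at (x^{k+1}, y^k)
lam, s, b, y_prev = (rng.standard_normal(d) for _ in range(4))
beta, mu = 10.0, 4.0                  # hypothetical parameters

y_new = linearized_y_step(grad_g_y, lam, s, b, y_prev, beta, mu)
# stationarity of the surrogate at y_new:
grad = grad_g_y - lam + beta * (s + y_new - b) + mu * (y_new - y_prev)
print(np.allclose(grad, 0.0))         # True
```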
The auxiliary function of y-subproblem is as follows

A descent inequality
A crucial element in establishing the convergence of these algorithms is to verify the descent property of the regularized augmented Lagrangian function sequence. To facilitate our analysis, the following notation is introduced throughout this paper for k ≥ 1. The convergence analysis relies on the following assumptions.

Assumption A (I) g is l_g-Lipschitz differentiable and bounded from below; that is, ∇g is l_g-Lipschitz continuous.

To show the descent property, the following lemmas are necessary.

Lemma 3.1 For Algorithm 1, for each k ∈ N, we have the bound stated below; for Algorithm 2, for each k ∈ N, the analogous bound holds.

Proof Using Assumption A(III) and Lemma 2.3, we obtain (3.11). For Algorithm 1, the optimality condition of the y-subproblem in (3.2) yields (3.12). Using Assumption A(I) and (3.12), we have (3.13). It follows from this formula and (3.11) that the desired bound holds. For Algorithm 2, similarly, we get the corresponding bound. The proof is completed.
To streamline the analysis, some further notation is given below. The following lemma is important for proving the monotonicity of the sequence {L̂_β(ŵ^{k+1})} defined in (3.20).
Proof We first give the proof for Algorithm 1. From (3.1) and (3.4), for j = 1, …, n, we obtain the corresponding inequalities, and from (3.2) and (3.5) we obtain the inequality for the y-update. Summing these formulas over j = 1, …, n yields an estimate containing two parts, A and B. On the one hand, from Lemma 2.2, part A can be written as (3.15). On the other hand, by the definitions of z_i^k, i = 1, 2, …, n, part B can be bounded as (3.16). From Lemma 2.2, (3.15) and (3.16), we obtain (3.17). Recall (3.18). Substituting (3.9) and (3.17) into (3.18), we obtain the descent estimate for Algorithm 1. Similarly, for Algorithm 2, we obtain the analogous estimate. Since β satisfies the lower bound stated above, it follows that δ_2 > δ_1 > 0. That is, (3.14) holds. The lemma is proved.
Remark 2 Based on Lemma 3.2, we can define the regularized function L̂_β as above. The following lemma implies that the sequence {L̂_β(u^k, λ^k, u^{k−1})} is monotonically decreasing.

The cluster points of {ω^k} are contained in crit L_β
In this subsection, together with the closedness of the limiting subdifferential mentioned above, we prove the subsequential convergence of the sequence {ω^k}. The proof for Algorithm 2 is similar to that for Algorithm 1, so we omit it here.
The following lemma provides upper estimates for the limiting subgradients of L_β(·), which is important for the convergence analysis of the sequences generated by Algorithm 1 and Algorithm 2.

Proof By the definition of the augmented Lagrangian function L_β(·), we compute its partial subdifferentials. From the optimality conditions of (3.1)–(3.2), we obtain the corresponding inclusions. Since ∇g is Lipschitz continuous on bounded subsets and {ω^k} is bounded, by (III) of Assumption A and combining (3.14), there exists C > 0 such that the claimed estimate holds. Similarly, we can derive the same conclusion for Algorithm 2; we omit the proof here.
Theorem 3.1 Denote the sets of cluster points of the sequences {ω^k} and {ω̂^k} by Ω and Ω̂, respectively. We have that:
(III) lim_{k→+∞} d(ω^k, Ω) = 0.
(IV) Ω is a non-empty, compact and connected set.
Proof (I) Since x_i^{k+1} is the minimizer of the x_i-subproblem, we have the corresponding inequality. Combining it with lim_{j→∞} ω^{k_j+1} = ω*, we obtain the limit superior estimate, and it follows that lim_{j→∞} f_i(x_i^{k_j+1}) = f_i(x_i*). Since g is continuous, we further obtain lim_{j→∞} L_β(ω^{k_j}) = L_β(ω*).
(II) From Lemma 3.4, we have that x_i^{k+1} − x_i^k → 0, y^{k+1} − y^k → 0 and λ^{k+1} − λ^k → 0. Thus, according to Lemma 3.5, it follows that ∂L_β(ω^{k_j}) → 0 as j → ∞, while ω^{k_j} → ω* and L_β(ω^{k_j}) → L_β(ω*) as j → ∞. Because of the closedness of ∂f_i, the continuity of ∇g and the relations above, we take the limit k = k_j → ∞ in (3.28), and then we have 0 ∈ ∂L_β(ω*). Thus, ω* is a critical point of L_β, i.e., Ω ⊆ crit L_β.
(III), (IV) The proof follows a similar approach to that of Theorems 5(ii) and (iii) in Bolte et al. [19], while incorporating the insights from Remark 5 in the same reference. This remark establishes that the properties detailed in (III) and (IV) are inherent to sequences satisfying the condition ω^{k+1} − ω^k → 0 as k → +∞. Such a generic property is indeed applicable in our context, as demonstrated by (3.23).

Global convergence under the Kurdyka-Łojasiewicz property
In this subsection, we prove the global convergence of the sequence {(x_{[1,n]}^k, y^k, λ^k)} generated by Algorithm 1 and Algorithm 2 with the help of the Kurdyka–Łojasiewicz property. Since the proofs for the two algorithms are identical, we only prove the global convergence of Algorithm 1.

Theorem 3.2 (Global convergence) Suppose that Assumption A holds, and L̂(ω̂) satisfies the KŁ property at each point of Ω̂. Then the sequence {ω^k} generated by Algorithm 1 converges to a critical point of L_β(·).
Proof From Theorem 3.1, we have lim_{k→+∞} L̂(ω̂^k) = L̂(ω̂*) for all ω̂* ∈ Ω̂. We consider two cases.
Letting M → ∞, we obtain the finite-length property. (II) {ω^k} is a Cauchy sequence, and thus it is convergent. Combining (I) with Theorem 3.1, we obtain that {ω^k} converges to a critical point of L_β(·).

Numerical experiments
This section presents the numerical results of applying Algorithm 1 and Algorithm 2 to the l_{1/2}-regularization problem and the matrix decomposition problem. All experimental computations were executed using Matlab 2020b on a Windows 11 laptop equipped with an AMD Ryzen 5 3550H CPU operating at 3.5 GHz and 16 GB of RAM.

l_{1/2}-regularization problem
In compressed sensing, we consider the following optimization problem:

min_x ‖Mx − b‖² + ϕ‖x‖_0,   (4.1)

where M ∈ R^{m×n} is the measuring matrix, b ∈ R^m is the observation vector, and ϕ is the regularization parameter; ‖x‖_0 denotes the number of nonzero components of x. However, problem (4.1) is NP-hard, so some scholars relax the l_0 norm to the l_{1/2} norm in practical applications [28]; the problem is then converted to the following nonconvex problem:

min_x ‖Mx − b‖² + ϕ‖x‖_{1/2}^{1/2},   (4.2)

where ‖x‖_{1/2} = (∑_{i=1}^n |x_i|^{1/2})². Based on (4.2), we construct problem (4.3). To verify the validity of Algorithm 1 and Algorithm 2, we test them and compare them with LADMM. Applying Algorithm 1 to problem (4.3) yields closed-form updates in which H(·, ·) is the half shrinkage operator [29], defined componentwise in (4.4): it returns zero whenever the magnitude of the input falls below the threshold (54^{1/3}/4)α^{2/3}, and otherwise applies the half-thresholding formula.
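Since the paper's display of H(·, ·) is garbled, we restate it here in the form given by Xu et al. for l_{1/2} regularization (an assumption about which operator the paper means): components below the threshold (54^{1/3}/4)α^{2/3} are set to zero, and the rest are shrunk by a cosine formula.

```python
import numpy as np

def half_shrink(t, alpha):
    """Componentwise half-thresholding operator H(t, alpha) for l_{1/2} regularization
    (form of Xu et al. [29]; restated from the literature, not from the paper's display)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * alpha ** (2.0 / 3.0)
    big = np.abs(t) > thresh
    phi = np.arccos((alpha / 8.0) * (np.abs(t[big]) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * t[big] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

print(half_shrink([0.5, -0.5], 1.0))   # below threshold 54^(1/3)/4 ~ 0.945, so both zero
print(half_shrink([2.0], 1e-12))       # nearly no shrinkage for tiny alpha, close to 2.0
```

As alpha → 0 the operator approaches the identity, which matches the role of the shrinkage weight in the x-updates.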
Applying Algorithm 2 to problem (4.3) yields the analogous updates with μ_4 = τ + β, and applying LADMM to problem (4.3) gives the corresponding scheme. (LADMM is a special case of SPLI-ADMM with inertial parameter θ_k = 0.) In the experiments, we configure the parameters as follows: the dimensions are set to m = 5000 and n = 1000, the penalty parameter is chosen as β = 1000, b = 0, c = 1, and the inertial parameter is fixed at θ = 0.15. The initial points are selected as x_1^{−1} = x_1^0 = 0, x_2^{−1} = x_2^0 = 0, y^0 = 0, and λ^0 = 0. A_1, A_2, B_1, B_2 are random matrices. The stopping criterion of all these methods is defined in terms of the residual r^k. Throughout the testing phase, we conduct experiments with four cases: τ = 30, τ = 35, τ = 40, and τ = 45. The numerical results of the three algorithms are reported in Table 1. We report the number of iterations required to satisfy the stopping criterion ("Iter"), the total computing time in seconds ("times"), and the value of the stopping criterion ("log(Crit)"). Moreover, to visually illustrate the convergence behavior, the curves of the objective value and log(‖r^k‖) at τ = 45 are presented in Fig. 1.
From Table 1, we can see that the two proposed algorithms have higher time efficiency and need fewer iterations in comparison with LADMM. Figure 1(a) illustrates the trends of the objective value under the same number of iterations, clearly indicating that SPLI-ADMM and SCLI-ADMM have better convergence performance than LADMM. Figure 1(b) again demonstrates the high time efficiency of our two algorithms, especially when "log(Crit)" is less than −4.

Matrix decomposition
Now, we consider the matrix decomposition problem, which has the form given in (4.5); λ denotes the corresponding Lagrange multiplier.
Let L̂, Ŝ and T̂ be a numerical solution of problem (4.5). We measure the quality of the recovery by the relative error, which is defined by

RelErr := ‖(L̂, Ŝ, T̂) − (L*, S*, T*)‖_F / (‖(L*, S*, T*)‖_F + 1).
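The relative error above stacks the three matrix differences; a direct implementation (a sketch with hypothetical variable names) is:

```python
import numpy as np

def rel_err(L_hat, S_hat, T_hat, L_star, S_star, T_star):
    """RelErr = ||(L^, S^, T^) - (L*, S*, T*)||_F / (||(L*, S*, T*)||_F + 1)."""
    num = np.sqrt(np.linalg.norm(L_hat - L_star, 'fro') ** 2
                  + np.linalg.norm(S_hat - S_star, 'fro') ** 2
                  + np.linalg.norm(T_hat - T_star, 'fro') ** 2)
    den = np.sqrt(np.linalg.norm(L_star, 'fro') ** 2
                  + np.linalg.norm(S_star, 'fro') ** 2
                  + np.linalg.norm(T_star, 'fro') ** 2) + 1.0
    return num / den

rng = np.random.default_rng(4)
L = rng.standard_normal((5, 4))
S = rng.standard_normal((5, 4))
T = rng.standard_normal((5, 4))
print(rel_err(L, S, T, L, S, T))   # exact recovery gives 0.0
```

The "+ 1" in the denominator keeps the measure well defined even when the ground truth is the zero triple.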
Table 2 illustrates the comparison between different (r., spr.), where "r." represents the rank of the matrix L, "spr." represents the sparsity of the sparse matrix S, and "Iter" represents the number of iterations; ‖S‖_0 denotes the number of nonzero elements of S. Besides, the iterative curves of the stopping criterion and the relative error of the three algorithms are plotted in Fig. 2. Table 2 shows that SPLI-ADMM and SCLI-ADMM take less time and fewer iterations under the same conditions, which demonstrates that our two proposed algorithms are more efficient than LADMM for different rank and sparsity ratios. In Fig. 2, the curves of the stopping criterion (see Fig. 2(a) and (c)) in two trials demonstrate that SPLI-ADMM and SCLI-ADMM converge faster than LADMM. Figure 2(b) and (d) indicate clearly that the matrices L and S are better recovered by SPLI-ADMM and SCLI-ADMM, because the "RelErr" of LADMM is greater than that of SPLI-ADMM for the same "Iter".

Conclusion
This paper has made some extensions in the field of nonconvex optimization through the development and convergence analysis of two linearized ADMM algorithms, SPLI-ADMM and SCLI-ADMM. By integrating an inertial strategy within a linearized framework, these algorithms improve the efficacy of solving linearly constrained problems with nonseparable structure. A key novelty lies in the utilization of sequential gradients of the mixed term, which is not typically found in conventional ADMM approaches, enabling the proposed algorithms to use the latest information to update each variable. The KŁ property has been used to guarantee the convergence of the generated sequences. Finally, the results of numerical experiments show that the proposed algorithms exhibit superior time efficiency and validity.

(4.5)
where M ∈ R^{p×n} is the observed matrix, and L, S, T ∈ R^{p×n} are the decision variables. The nuclear norm is ‖L‖_* := ∑_{i=1}^{min(p,n)} σ_i(L), the sparse term is ‖S‖_1 := ∑_{i=1}^p ∑_{j=1}^n |S_{ij}|, ω is the penalty factor, and α is the trade-off parameter between the nuclear norm ‖L‖_* and the l_1-norm ‖S‖_1. The ALF of problem (4.5) is defined as

Table 1
Numerical results under different τ

Table 2
Summary of three algorithms for eight different (r., spr.)