An ADMM-based SQP method for separably smooth nonconvex optimization

This work develops a splitting approach for solving separably smooth nonconvex linearly constrained optimization problems. Building on the ideas of two classical methods, namely sequential quadratic programming (SQP) and the alternating direction method of multipliers (ADMM), we propose an ADMM-based SQP method. We focus on decomposing the quadratic programming (QP) subproblem of the primal problem into small-scale QP subproblems which, once embedded with Bregman distances, can be solved effectively; this step is followed by a dual-ascent-type update of the Lagrangian multipliers. Under suitable conditions, including the crucial Kurdyka–Łojasiewicz property, we establish the global and strong convergence properties of the proposed method.


Introduction
Nonconvex optimization problems arise in a variety of applications, ranging from signal and image processing to machine learning [1]. This class of problems is often structured, with an explicitly separable objective, yet can be rather challenging to deal with.
In this paper, we consider the following nonconvex optimization problem with linear constraints and a separable objective function:
$$\min_{x,y}\ f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b, \tag{1}$$
where $f : \mathbb{R}^{n_1} \to \mathbb{R}$ and $g : \mathbb{R}^{n_2} \to \mathbb{R}$ are continuously differentiable but not necessarily convex, and the matrices $A \in \mathbb{R}^{m\times n_1}$, $B \in \mathbb{R}^{m\times n_2}$ and the vector $b \in \mathbb{R}^m$ are assumed to be given.
To solve problem (1) with separable structure, when f and g are convex, one simple but powerful method is the alternating direction method of multipliers (ADMM), originally proposed in [2,3]. ADMM and its variants have gained popularity among many researchers; see, e.g., [4–9]. The standard iterative scheme of the classical ADMM for problem (1) acts as follows:
$$x^{k+1} = \arg\min_x L_\beta(x, y^k, \lambda^k), \tag{2a}$$
$$y^{k+1} = \arg\min_y L_\beta(x^{k+1}, y, \lambda^k), \tag{2b}$$
$$\lambda^{k+1} = \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b), \tag{2c}$$
where $\beta > 0$ is a penalty parameter and
$$L_\beta(x, y, \lambda) := f(x) + g(y) - \langle \lambda, Ax + By - b\rangle + \frac{\beta}{2}\|Ax + By - b\|^2 \tag{3}$$
is the augmented Lagrangian function with Lagrangian multiplier $\lambda \in \mathbb{R}^m$. In contrast to the developments of ADMM for the convex case outlined above, some works have investigated the behavior of splitting approaches on nonconvex problems, although a rigorous analysis is generally very difficult. Note that the recent works [10–12] all dealt with nonconvex problems by means of ADMM-type methods and established favorably crucial convergence results. In particular, a Bregman modification of ADMM for problems whose objective is the sum of a smooth function and a nonconvex function was considered by Wang et al. [10]. Meanwhile, Li et al. [11] devised two types of splitting methods, in which only one subproblem was embedded with a Bregman distance. Moreover, Hong et al. [12] focused on solving nonconvex consensus and sharing problems. We believe that it is very meaningful and important to further study the characteristics of splitting approaches designed for the nonconvex setting and to enlarge their applicable spectrum, given the needs of practice.
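To make the scheme (2a)–(2c) concrete, the following minimal sketch runs the classical ADMM on a toy convex instance of problem (1). All data are illustrative choices, not from the paper: $f(x) = \tfrac12\|x-c\|^2$, $g(y) = \tfrac12\|y\|^2$, and the constraint $x - y = 0$ (i.e., $A = I$, $B = -I$, $b = 0$), with the sign convention of the augmented Lagrangian above; both subproblems then have closed-form solutions.

```python
import numpy as np

# Classical ADMM (2a)-(2c) on a toy convex instance of (1), assuming
# f(x) = 0.5||x - c||^2, g(y) = 0.5||y||^2, constraint x - y = 0
# (A = I, B = -I, b = 0).  The minimizer is x = y = c/2, with lam = -c/2.
c = np.array([1.0, 2.0])
beta = 1.0
x, y, lam = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(200):
    # (2a): minimize over x -- stationarity gives (x - c) - lam + beta(x - y) = 0
    x = (c + lam + beta * y) / (1.0 + beta)
    # (2b): minimize over y -- stationarity gives y + lam - beta(x - y) = 0
    y = (beta * x - lam) / (1.0 + beta)
    # (2c): multiplier update with residual Ax + By - b = x - y
    lam = lam - beta * (x - y)
```

For this instance the iterates converge linearly to $x = y = c/2$ with $\lambda = -c/2$, which satisfies the KKT conditions $\nabla f(x) = \lambda$, $\nabla g(y) = -\lambda$, $x = y$.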
On the other hand, it is well known that the sequential quadratic programming (SQP) method, dating back to [13], is one of the most efficient methods for solving smooth constrained optimization problems, since it enjoys good theoretical properties and stable numerical performance with a better approximation of the primal problem. For more than half a century, the SQP method has seen rapid development and fruitful achievements; see, e.g., [14–21] and the references therein. Recently, Jian et al. [22] discussed a class of separably smooth nonconvex optimization problems with linear constraints and closed convex sets, and presented an ADMM-SQP method. In that work, the QP subproblem of the primal problem was split into two smaller-scale QP subproblems, which were solved in a Jacobian manner. Moreover, an inexact Armijo line search was carried out to guarantee the descent property of the augmented Lagrangian function, and global convergence was proved under proper conditions. As is known, the classical QP subproblem associated with problem (1) reads as follows:
$$\min_{x,y}\ \nabla f(x^k)^T(x - x^k) + \nabla g(y^k)^T(y - y^k) + \frac12 \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix}^{\!T} H_k \begin{pmatrix} x - x^k \\ y - y^k \end{pmatrix} \quad \text{s.t.}\quad Ax + By = b, \tag{4}$$
where $H_k \in \mathbb{R}^{(n_1+n_2)\times(n_1+n_2)}$ is a symmetric approximation to the Hessian of the Lagrangian function, namely $L_0(x,y,\lambda)$ (that is, (3) with $\beta = 0$), of problem (1) with respect to the variables $(x, y)$.

Consider that
$$\nabla^2_{(x,y)} L_0(x, y, \lambda) = \begin{pmatrix} \nabla^2 f(x) & 0 \\ 0 & \nabla^2 g(y) \end{pmatrix};$$
then it is reasonable to choose the matrix $H_k$ of the block-diagonal form
$$H_k = \begin{pmatrix} H^x_k & 0 \\ 0 & H^y_k \end{pmatrix},$$
where $H^x_k$ and $H^y_k$ are symmetric approximations of $\nabla^2 f(x^k)$ and $\nabla^2 g(y^k)$, respectively. In view of this, QP subproblem (4) can be converted into the following structured form:
$$\min_{x,y}\ \nabla f(x^k)^T(x - x^k) + \frac12\|x - x^k\|^2_{H^x_k} + \nabla g(y^k)^T(y - y^k) + \frac12\|y - y^k\|^2_{H^y_k} \quad \text{s.t.}\quad Ax + By = b, \tag{6}$$
whose objective function is separable.
In this paper, motivated by the splitting scheme applied to the QP subproblem in [22] and by the Bregman modification of ADMM in [10], we focus on QP subproblem (6) of the primal problem and propose an ADMM-based SQP algorithm for the nonconvex setting. The resulting method makes use of the separable structure of QP subproblem (6) and decomposes it into two relatively small-scale QP subproblems in a Gauss–Seidel manner, which are further equipped with additional Bregman distances and can then be solved effectively. The main difference from [22] is that our proposed method does not involve any line search, and its convergence properties can be proved in terms of a potential function under suitable conditions. The remainder of this paper is structured as follows. In Sect. 2, the ADMM-based SQP algorithm is established after some elementary preliminaries are prepared. Section 3 presents the convergence properties of the proposed algorithm. Finally, we give the conclusions in Sect. 4.
Notation. Throughout this paper, $\mathbb{R}^n$ stands for the $n$-dimensional real Euclidean space, $I$ is the identity matrix, and $\|\cdot\|$ is the Euclidean norm with inner product $\langle\cdot,\cdot\rangle$. For any vector $x$ and matrix $H$, we denote $\|x\|^2_H := x^T H x$, where $T$ is the transpose operation. $H \succ 0$ means that the matrix $H$ is positive definite (resp. positive semidefinite, $H \succeq 0$), while $H \succ G$ is used to denote $H - G \succ 0$ (resp. $H \succeq G$, $H - G \succeq 0$); moreover, the minimum eigenvalue of a matrix $H$ is denoted by $\sigma_H$. For brevity, we additionally introduce the notations
$$w := (x, y, \lambda), \quad w^k := (x^k, y^k, \lambda^k), \quad \hat w := (x, y, \lambda, \hat y), \quad \hat w^k := (x^k, y^k, \lambda^k, y^{k-1}),$$
and the primal–dual errors
$$\Delta x^k := x^k - x^{k-1}, \quad \Delta y^k := y^k - y^{k-1}, \quad \Delta\lambda^k := \lambda^k - \lambda^{k-1}.$$

Preliminaries and ADMM-based SQP method
In this section, we provide some preliminaries that are useful in the sequel, and then describe the ADMM-based SQP method in detail.
The domain of a function $f$ is defined as $\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$. For a subset $S \subseteq \mathbb{R}^n$ and a point $x \in \mathbb{R}^n$, the distance from $x$ to $S$ is defined as $d(x, S) := \inf\{\|x - y\| : y \in S\}$, and by convention $d(x, S) = +\infty$ for all $x$ when $S = \emptyset$.
Definition (KŁ property) (a) A proper lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is said to have the Kurdyka–Łojasiewicz (KŁ) property at $\bar x \in \operatorname{dom}\partial f$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar x$, and a continuous concave function $\phi : [0, \eta) \to \mathbb{R}_+$ satisfying (i) $\phi(0) = 0$, $\phi$ is continuously differentiable on $(0, \eta)$, and $\phi'(s) > 0$ for all $s \in (0, \eta)$, such that, for all $x \in U \cap \{x : f(\bar x) < f(x) < f(\bar x) + \eta\}$, the KŁ inequality holds:
$$\phi'\big(f(x) - f(\bar x)\big)\, d\big(0, \partial f(x)\big) \ge 1.$$
(b) Let $\Phi_\eta$ be the set of concave functions that satisfy (i); if $f$ satisfies the KŁ property at each point of $\operatorname{dom}\partial f$, then $f$ is called a KŁ function.

Lemma 1 ([24], Uniformized KŁ property) Let $\Omega$ be a compact set, and let $f$ be a proper lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and satisfies the KŁ property at each point of $\Omega$. Then there exist $\varepsilon > 0$, $\eta > 0$, and $\phi \in \Phi_\eta$ such that, for all $\bar x \in \Omega$ and for all $x$ in the intersection
$$\{x \in \mathbb{R}^n : d(x, \Omega) < \varepsilon\} \cap \{x : f(\bar x) < f(x) < f(\bar x) + \eta\},$$
one has $\phi'(f(x) - f(\bar x))\, d(0, \partial f(x)) \ge 1$.

A semialgebraic set $S \subseteq \mathbb{R}^n$ is a finite union of sets of the form
$$\{x \in \mathbb{R}^n : p_1(x) = \cdots = p_k(x) = 0,\ q_1(x) < 0, \ldots, q_l(x) < 0\},$$
where $p_1, \ldots, p_k$ and $q_1, \ldots, q_l$ are real polynomial functions. A function $f : \mathbb{R}^n \to \mathbb{R}$ is semialgebraic if its graph is a semialgebraic subset of $\mathbb{R}^{n+1}$. Such a function satisfies the KŁ property (see, e.g., [23,25,26]) with $\phi(s) = c s^{1-\theta}$ for some $\theta \in [0, 1)$ and some $c > 0$. On the other hand, some important stability properties of semialgebraic functions can be found in [27]:
• finite sums and products of semialgebraic functions are semialgebraic;
• scalar products are semialgebraic;
• indicator functions of semialgebraic sets are semialgebraic;
• generalized inverses of semialgebraic mappings are semialgebraic;
• compositions of semialgebraic functions or mappings are semialgebraic;
• functions of the type $x \in \mathbb{R}^n \mapsto f(x) = \sup_{y\in C} g(x, y)$ (resp. $x \mapsto \inf_{y\in C} g(x, y)$), where $g$ and $C$ are semialgebraic, are semialgebraic.
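As a small numerical illustration of the KŁ inequality with the desingularizing function $\phi(s) = c s^{1-\theta}$, consider the semialgebraic function $f(x) = x^2$ at $\bar x = 0$: with the hand-picked toy choices $\theta = 1/2$ and $c = 2$ (not from the paper), the product $\phi'(f(x) - f(\bar x))\,|f'(x)|$ equals $2 \ge 1$ for every $x \ne 0$.

```python
import numpy as np

# KL inequality check for the semialgebraic function f(x) = x^2 at xbar = 0,
# with desingularizing function phi(s) = c * s^(1 - theta), theta = 1/2, c = 2,
# so that phi'(s) = s^(-1/2).  Then phi'(f(x) - f(xbar)) * |f'(x)|
# = |x|^(-1) * 2|x| = 2 >= 1 for every x != 0 near xbar.
f = lambda x: x ** 2
df = lambda x: 2.0 * x
theta, c = 0.5, 2.0
dphi = lambda s: c * (1.0 - theta) * s ** (-theta)   # phi'(s)

for x in np.linspace(-1.0, 1.0, 201):
    if x != 0.0:
        assert dphi(f(x) - f(0.0)) * abs(df(x)) >= 1.0 - 1e-9
```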
For a continuously differentiable convex function $f$ on $\mathbb{R}^n$, the associated Bregman distance $\Delta_f$ is defined as
$$\Delta_f(x_1, x_2) := f(x_1) - f(x_2) - \langle \nabla f(x_2), x_1 - x_2\rangle \quad \text{for any } x_1, x_2 \in \mathbb{R}^n.$$
Let us now collect some important properties of the Bregman distance [10].
• Nonnegativity: $\Delta_f(x_1, x_2) \ge 0$ for all $x_1, x_2$; moreover, if $f$ is strongly convex with modulus $\sigma$, then $\Delta_f(x_1, x_2) \ge \frac{\sigma}{2}\|x_1 - x_2\|^2$.

We apply the splitting idea of the classical ADMM to the structured QP subproblem (6): the variables $x$ and $y$ are updated alternately at each iteration with the regularizing Bregman distances $\Delta_\varphi(\cdot, x^k)$ and $\Delta_\psi(\cdot, y^k)$, respectively, followed by an update of the Lagrangian multiplier $\lambda$. This procedure can be formulated as follows:
$$x^{k+1} = \arg\min_x \Big\{ \nabla f(x^k)^T(x - x^k) + \tfrac12\|x - x^k\|^2_{H^x_k} - \langle \lambda^k, Ax + By^k - b\rangle + \tfrac{\beta}{2}\|Ax + By^k - b\|^2 + \Delta_\varphi(x, x^k) \Big\}, \tag{7a}$$
$$y^{k+1} = \arg\min_y \Big\{ \nabla g(y^k)^T(y - y^k) + \tfrac12\|y - y^k\|^2_{H^y_k} - \langle \lambda^k, Ax^{k+1} + By - b\rangle + \tfrac{\beta}{2}\|Ax^{k+1} + By - b\|^2 + \Delta_\psi(y, y^k) \Big\}, \tag{7b}$$
$$\lambda^{k+1} = \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b), \tag{7c}$$
where $\varphi$ and $\psi$ are continuously differentiable, strongly convex functions with moduli $\sigma_\varphi$, $\sigma_\psi$ on $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively. Notice that the objective functions of (7a) and (7b) are strictly convex if, in addition to the strong convexity of $\varphi$ and $\psi$, the conditions $H^x_k + \beta A^TA + \sigma_\varphi I_{n_1} \succ 0$ and $H^y_k + \beta B^TB + \sigma_\psi I_{n_2} \succ 0$ hold. Invoking the first-order optimality conditions of (7a) and (7b), we have
$$0 = \nabla f(x^k) + H^x_k\,\Delta x^{k+1} - A^T\lambda^k + \beta A^T(Ax^{k+1} + By^k - b) + \nabla\varphi(x^{k+1}) - \nabla\varphi(x^k),$$
$$0 = \nabla g(y^k) + H^y_k\,\Delta y^{k+1} - B^T\lambda^k + \beta B^T(Ax^{k+1} + By^{k+1} - b) + \nabla\psi(y^{k+1}) - \nabla\psi(y^k).$$
By the update formula (7c), these can be rewritten as
$$A^T\lambda^{k+1} = \nabla f(x^k) + H^x_k\,\Delta x^{k+1} + \nabla\varphi(x^{k+1}) - \nabla\varphi(x^k) - \beta A^TB\,\Delta y^{k+1}, \tag{9a}$$
$$B^T\lambda^{k+1} = \nabla g(y^k) + H^y_k\,\Delta y^{k+1} + \nabla\psi(y^{k+1}) - \nabla\psi(y^k). \tag{9b}$$
Based on the above analysis and preparation, we now describe the proposed algorithm for solving problem (1) in detail (Algorithm 1).
Step 0. Choose a penalty parameter $\beta > 0$, strongly convex kernels $\varphi$, $\psi$, symmetric matrices $H^x_0$, $H^y_0$, and an initial point $w^0 = (x^0, y^0, \lambda^0)$; set $k := 0$.
Step 1. Compute $x^{k+1}$ by (7a), $y^{k+1}$ by (7b), and $\lambda^{k+1}$ by (7c).
Step 2. If a termination criterion is not met, calculate new matrices $H^x_{k+1}$ and $H^y_{k+1}$, set $k := k + 1$, and return to Step 1.
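The following is a minimal numerical sketch of one possible realization of the iteration (7a)–(7c), assuming the quadratic Bregman kernels $\varphi = \psi = \frac{\sigma}{2}\|\cdot\|^2$ (so $\Delta_\varphi(x, x^k) = \frac{\sigma}{2}\|x - x^k\|^2$) and the exact Hessians $H^x_k = \nabla^2 f(x^k)$, $H^y_k = \nabla^2 g(y^k)$; each QP subproblem then reduces to a strongly convex quadratic minimization, i.e., a linear system. All problem data in the driver are illustrative choices, not from the paper.

```python
import numpy as np

def admm_sqp(grad_f, hess_f, grad_g, hess_g, A, B, b,
             x, y, lam, beta=10.0, sigma=1.0, iters=1000):
    """Sketch of (7a)-(7c) with quadratic Bregman kernels (sigma/2)||.||^2."""
    I1, I2 = np.eye(x.size), np.eye(y.size)
    for _ in range(iters):
        # (7a): x-subproblem, solved via its stationarity linear system
        Hx = hess_f(x)
        x = np.linalg.solve(Hx + sigma * I1 + beta * A.T @ A,
                            (Hx + sigma * I1) @ x - grad_f(x)
                            + A.T @ lam + beta * A.T @ (b - B @ y))
        # (7b): y-subproblem with the fresh x (Gauss-Seidel order)
        Hy = hess_g(y)
        y = np.linalg.solve(Hy + sigma * I2 + beta * B.T @ B,
                            (Hy + sigma * I2) @ y - grad_g(y)
                            + B.T @ lam + beta * B.T @ (b - A @ x))
        # (7c): dual-ascent-type multiplier update
        lam = lam - beta * (A @ x + B @ y - b)
    return x, y, lam

# Illustrative driver: f(x) = 0.5 x^T P x with indefinite P (so f is
# nonconvex), g(y) = 0.5 ||y||^2, constraint x + y = b (A = B = I).
P = np.diag([1.0, -0.2])
Q = np.eye(2)
A, B, b = np.eye(2), np.eye(2), np.array([1.0, 1.0])
x, y, lam = admm_sqp(lambda v: P @ v, lambda v: P,
                     lambda v: Q @ v, lambda v: Q,
                     A, B, b, np.zeros(2), np.zeros(2), np.zeros(2))
```

For this choice of $\beta$ and $\sigma$ the iteration drives the primal residual $Ax + By - b$ and the KKT errors $\|\nabla f(x) - A^T\lambda\|$, $\|\nabla g(y) - B^T\lambda\|$ to zero, even though $f$ itself is nonconvex.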
Remark 1 At first glance, one might view the ADMM-based SQP method proposed in this paper as a special case of the well-known Bregman ADMM [10], whose iterative scheme reads
$$x^{k+1} = \arg\min_x \big\{ L_\beta(x, y^k, \lambda^k) + \Delta_{\tilde\varphi_k}(x, x^k) \big\}, \quad y^{k+1} = \arg\min_y \big\{ L_\beta(x^{k+1}, y, \lambda^k) + \Delta_{\tilde\psi_k}(y, y^k) \big\}, \quad \lambda^{k+1} = \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b),$$
for suitable concrete choices of the functions $\tilde\varphi_k(x)$ and $\tilde\psi_k(y)$, such as
$$\tilde\varphi_k(x) = \varphi(x) + \tfrac12\|x\|^2_{H^x_k} - f(x), \qquad \tilde\psi_k(y) = \psi(y) + \tfrac12\|y\|^2_{H^y_k} - g(y).$$
However, this view is a matter of form only and may cause disagreements. In fact, by definition a Bregman distance requires the convexity of its kernel; see, e.g., [28,29]. Meanwhile, it is worth pointing out that the matrices $H^x_k$ and $H^y_k$ in this paper are not required to be positive semidefinite/definite, and the functions $f$ and $g$ are also not necessarily convex.
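A quick numerical check of the Bregman distance and its strong-convexity lower bound, using the quadratic kernel $f(x) = \tfrac12 x^T H x$ for an arbitrary positive definite $H$ (an illustrative choice), for which $\Delta_f(x_1, x_2) = \tfrac12\|x_1 - x_2\|^2_H$ exactly:

```python
import numpy as np

# Bregman distance Delta_f(x1, x2) := f(x1) - f(x2) - <grad f(x2), x1 - x2>.
def bregman(f, grad_f, x1, x2):
    return f(x1) - f(x2) - grad_f(x2) @ (x1 - x2)

# Quadratic kernel f(x) = 0.5 x^T H x with H positive definite (illustrative).
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])
f = lambda x: 0.5 * x @ H @ x
grad_f = lambda x: H @ x

x1, x2 = np.array([1.0, -1.0]), np.array([0.5, 2.0])
d = bregman(f, grad_f, x1, x2)
# For a quadratic kernel the Bregman distance is exactly 0.5 ||x1 - x2||_H^2:
assert np.isclose(d, 0.5 * (x1 - x2) @ H @ (x1 - x2))
# Strong-convexity lower bound with modulus sigma = lambda_min(H):
sigma = np.linalg.eigvalsh(H).min()
assert d >= 0.5 * sigma * np.linalg.norm(x1 - x2) ** 2 - 1e-12
```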

Convergence analysis
This section is devoted to the convergence analysis of the ADMM-based SQP method introduced in Sect. 2. First, we consider some basic assumptions as follows.
Assumption 1 Let $\min\{\sigma_0, \sigma_\varphi, \sigma_\psi\} > 0$, and let $f : \mathbb{R}^{n_1} \to \mathbb{R}$ and $g : \mathbb{R}^{n_2} \to \mathbb{R}$ be continuously differentiable functions. Assume that the following conditions hold:
(i) $BB^T \succeq \sigma_0 I$, namely $B$ has full row rank;
(ii) $\varphi$ and $\psi$ are strongly convex with moduli $\sigma_\varphi$ and $\sigma_\psi$, respectively;
(iii) $\nabla f$, $\nabla g$, $\nabla\varphi$, and $\nabla\psi$ are Lipschitz continuous with moduli $\ell_f$, $\ell_g$, $\ell_\varphi$, $\ell_\psi > 0$, respectively.

Remark 2 From Assumption 2, in order to guarantee that the final two relations in (10) hold, it suffices to choose the quantities involved accordingly, where $\sigma_{H^x_k}$ and $\sigma_{H^y_k}$ denote the minimum eigenvalues of the matrices $H^x_k$ and $H^y_k$, respectively. Besides, under these conditions both QP subproblems (7a) and (7b) admit a unique optimal solution.
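Assumption 1(i) is easy to verify numerically: $BB^T \succeq \sigma_0 I$ holds with $\sigma_0 = \lambda_{\min}(BB^T) > 0$ exactly when $B$ has full row rank, and it yields the bound $\|B^Tv\| \ge \sqrt{\sigma_0}\,\|v\|$ used later in the proof of Lemma 2. A sketch with an illustrative matrix $B$:

```python
import numpy as np

# Assumption 1(i): B B^T >= sigma_0 I with sigma_0 = lambda_min(B B^T) > 0
# iff B has full row rank.  B below is an illustrative 2x3 full-row-rank matrix.
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
sigma0 = np.linalg.eigvalsh(B @ B.T).min()
assert sigma0 > 0                      # B has full row rank

# Consequence: ||B^T v||^2 = v^T B B^T v >= sigma_0 ||v||^2 for all v.
v = np.array([1.0, -2.0])
assert np.linalg.norm(B.T @ v) >= np.sqrt(sigma0) * np.linalg.norm(v) - 1e-12
```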
To design an appropriate merit function for problem (1), we introduce the modified potential function
$$\hat L_\beta(\hat w) := L_\beta(x, y, \lambda) + \delta\|y - \hat y\|^2,$$
where $\delta > 0$ is the constant specified in (13). Before giving the descent property of $\hat L_\beta(\cdot)$, we first establish a series of technical results that contribute to characterizing the convergence properties of the ADMM-based SQP algorithm. To this end, we now provide an upper estimate for the quantity $\|\Delta\lambda^{k+1}\|^2$.

Lemma 2 Suppose that Assumptions 1 and 2 are satisfied. Then there exists a constant $\gamma > 0$, determined by Assumptions 1 and 2, such that
$$\|\Delta\lambda^{k+1}\|^2 \le \gamma\big(\|\Delta y^{k+1}\|^2 + \|\Delta y^k\|^2\big). \tag{14}$$
Proof First, it follows directly from Assumption 1(i) that
$$\|\Delta\lambda^{k+1}\|^2 \le \frac{1}{\sigma_0}\|B^T\Delta\lambda^{k+1}\|^2. \tag{15}$$
Again, by the optimality condition (9b), we have
$$B^T\lambda^{k+1} = \nabla g(y^k) + H^y_k\,\Delta y^{k+1} + \nabla\psi(y^{k+1}) - \nabla\psi(y^k). \tag{16}$$
Then, using (16), Assumption 1(iii), and the boundedness of $\{H^y_k\}$ in Assumption 2, we obtain
$$\|B^T\Delta\lambda^{k+1}\|^2 = \big\|\nabla g(y^k) + H^y_k\,\Delta y^{k+1} + \nabla\psi(y^{k+1}) - \nabla\psi(y^k) - \nabla g(y^{k-1}) - H^y_{k-1}\,\Delta y^k - \nabla\psi(y^k) + \nabla\psi(y^{k-1})\big\|^2 \le \tilde\gamma\big(\|\Delta y^{k+1}\|^2 + \|\Delta y^k\|^2\big)$$
for some $\tilde\gamma > 0$, which together with relation (15) immediately establishes the assertion. □
To proceed, the following lemma bounds the pointwise change of the augmented Lagrangian function.

Lemma 3
Suppose that Assumptions 1 and 2 are satisfied, and let $\{w^k\}$ be the sequence generated by the ADMM-based SQP method. Then there exist constants $c_x, c_y, c_\lambda > 0$, determined by Assumptions 1 and 2, such that
$$L_\beta(w^{k+1}) \le L_\beta(w^k) - c_x\|\Delta x^{k+1}\|^2 - c_y\|\Delta y^{k+1}\|^2 + c_\lambda\|\Delta y^k\|^2.$$
Proof From Assumption 1(iii), $\nabla f$ and $\nabla g$ are Lipschitz continuous, so the descent lemma gives
$$f(x^{k+1}) \le f(x^k) + \nabla f(x^k)^T\Delta x^{k+1} + \frac{\ell_f}{2}\|\Delta x^{k+1}\|^2 \tag{19}$$
and, similarly,
$$g(y^{k+1}) \le g(y^k) + \nabla g(y^k)^T\Delta y^{k+1} + \frac{\ell_g}{2}\|\Delta y^{k+1}\|^2. \tag{20}$$
By the definition of $L_\beta(\cdot)$ and formula (7c), we have
$$L_\beta(x^{k+1}, y^{k+1}, \lambda^{k+1}) - L_\beta(x^{k+1}, y^{k+1}, \lambda^k) = \frac{1}{\beta}\|\Delta\lambda^{k+1}\|^2. \tag{21}$$
Moreover, since $y^{k+1}$ is a minimizer of (7b), using the strong convexity of $\psi$ together with relation (20), the change $L_\beta(x^{k+1}, y^{k+1}, \lambda^k) - L_\beta(x^{k+1}, y^k, \lambda^k)$ can be bounded above by a negative multiple of $\|\Delta y^{k+1}\|^2$ (relation (22)). Again, similarly recalling the $x$-subproblem (7a), one obtains the analogous bound in terms of $\|\Delta x^{k+1}\|^2$ for the $x$-update (relation (23)). Summing relations (21), (22), and (23), we obtain (24), which, combined with relation (14), justifies the conclusion. □

Lemma 4 Suppose that Assumptions 1 and 2 hold, and let $\{w^k\}$ be the sequence generated by the ADMM-based SQP method. Then we have
$$\hat L_\beta(\hat w^{k+1}) \le \hat L_\beta(\hat w^k) - \tfrac12\|\Delta x^{k+1}\|^2_{\bar H^x_k} - \tfrac12\|\Delta y^{k+1}\|^2_{\bar H^y_k},$$
where the matrices $\bar H^x_k$ and $\bar H^y_k$ are defined in (11a) and (11b), respectively.
Proof Clearly, we observe from Lemma 3 that the augmented Lagrangian decreases up to the error terms estimated there. Thus, using the definitions of $\hat L_\beta(\cdot)$, $\bar H^x_k$, and $\bar H^y_k$, the conclusion follows immediately. □
Notice that the boundedness of the iterative sequence $\{w^k\}$ generated by the ADMM-based SQP method plays an important role in the existence of a cluster point, so it is necessary to impose some additional conditions, as follows.
Assumption 3 Assume that the following conditions are satisfied:
(i) there exists a constant $\bar\sigma > 0$ such that $g^* := \inf_y \{g(y) - \bar\sigma\|\nabla g(y)\|^2\} > -\infty$;
(ii) $f$ and $g$ are coercive, i.e., $\lim_{\|x\|\to+\infty} f(x) = +\infty$ and $\lim_{\|y\|\to+\infty} g(y) = +\infty$.
We now prove the boundedness of the iterative sequence $\{w^k\}$, which in turn shows that the potential function constructed above is bounded from below.

Lemma 5 Suppose that Assumptions 1, 2, and 3 are all satisfied, and let $\{w^k\}$ be the sequence generated by the ADMM-based SQP method. Then $\{w^k\}$ is bounded, and there exists a constant $L$ such that $\hat L_\beta(\hat w^k) \ge L$ for all $k$.
Proof First, by completing the square, we obtain a lower bound for $\hat L_\beta(\hat w^k)$ (relation (25)). Next, recalling relation (16), Assumption 1(iii), and Assumption 2 gives an estimate of $\|\lambda^k\|$ in terms of $\|\nabla g(y^{k-1})\|$ and $\|\Delta y^k\|$ (estimate (26)), and hence a bound on $\frac{1}{2\beta}\|\lambda^k\|^2$. Substituting this into (25) and invoking the definition of $\delta$ in (13), we obtain relation (27). On the other hand, from Lemma 4 and Assumption 2, we know that $\{\hat L_\beta(\hat w^k)\}$ is nonincreasing, and thus
$$\hat L_\beta(\hat w^k) \le \hat L_\beta(\hat w^1) \quad \text{for all } k \ge 1. \tag{28}$$
Since $\lim_{\|x\|\to+\infty} f(x) = +\infty$ implies $f^* := \inf_x f(x) > -\infty$, this, together with Assumption 3 and relations (27) and (28), implies that the sequences $\{x^k\}$, $\{\|\nabla g(y^k)\|\}$, and $\{\Delta y^k\}$ are bounded, and then the boundedness of $\{\lambda^k\}$ follows directly from estimate (26). Moreover, $\{y^k\}$ is also bounded, since $\lim_{\|y\|\to+\infty} g(y) = +\infty$ implies $\inf_y g(y) > -\infty$. Therefore, the sequence $\{w^k\}$ is bounded.

Finally, ignoring some nonnegative terms in (27), one has
$$\hat L_\beta(\hat w^k) \ge f^* + g^* =: L,$$
and the proof is completed. □ Now we are ready to establish the global convergence of the ADMM-based SQP method.

Theorem 1 Suppose that Assumptions 1, 2, and 3 hold, and let $\{w^k\}$ be the sequence generated by the ADMM-based SQP method. Then:
(i) $\sum_{k=1}^{\infty}\|\Delta x^{k+1}\|^2 < \infty$ and $\sum_{k=1}^{\infty}\|\Delta y^{k+1}\|^2 < \infty$;
(ii) every cluster point $w^*$ of $\{w^k\}$ is a KKT point of problem (1).

Proof (i) Note first from Lemma 4 that
$$\tfrac12\|\Delta x^{k+1}\|^2_{\bar H^x_k} + \tfrac12\|\Delta y^{k+1}\|^2_{\bar H^y_k} \le \hat L_\beta(\hat w^k) - \hat L_\beta(\hat w^{k+1}).$$
Thus, summing this inequality from $k = 1$ to $n$ and using Lemma 5 yields
$$\sum_{k=1}^{n}\Big(\tfrac12\|\Delta x^{k+1}\|^2_{\bar H^x_k} + \tfrac12\|\Delta y^{k+1}\|^2_{\bar H^y_k}\Big) \le \hat L_\beta(\hat w^1) - L < \infty.$$
Again, by Assumption 2, for every $k$ the matrices $\bar H^x_k$ and $\bar H^y_k$ are positive definite. Thus $\sum_{k=1}^{\infty}\|\Delta x^{k+1}\|^2 < \infty$ and $\sum_{k=1}^{\infty}\|\Delta y^{k+1}\|^2 < \infty$.
(ii) By Lemma 5, the sequence $\{w^k\}$ is bounded and thus has at least one cluster point. Let $w^*$ be a cluster point of $\{w^k\}$, and let $\{w^{k_j}\}$ be a convergent subsequence with $\lim_{j\to\infty} w^{k_j} = w^*$. On the other hand, from Assumption 1(iii), assertion (i), and Lemma 2, we know that $\Delta x^k \to 0$, $\Delta y^k \to 0$, and $\Delta\lambda^k \to 0$. In view of this, taking limits in (7c), (9a), and (9b) along the subsequence $\{w^{k_j}\}$, we obtain
$$Ax^* + By^* = b, \qquad A^T\lambda^* = \nabla f(x^*), \qquad B^T\lambda^* = \nabla g(y^*).$$
This implies that $w^*$ is a KKT point of problem (1). □
It is well known that when the potential function enjoys the geometric property known as the KŁ property, Theorem 1 can typically be strengthened, since the limit point of the iterative sequence is then unique.

Theorem 2 Suppose that Assumptions 1, 2, and 3 hold, suppose that $f$ and $g$ are semialgebraic functions, and let $\{w^k\}$ be the sequence generated by the ADMM-based SQP method. Then the whole sequence $\{w^k\}$ converges to a KKT point of problem (1).
Proof Clearly, by Theorem 1, it suffices to prove that the sequence $\{w^k\}$ is convergent. From Lemmas 4 and 5, we know that $\{\hat w^k\}$ is bounded and that $\{\hat L_\beta(\hat w^k)\}$ is nonincreasing and bounded from below, so $L^* := \lim_{k\to\infty}\hat L_\beta(\hat w^k)$ exists. The rest of the proof is divided into two cases.
Suppose first that $\hat L_\beta(\hat w^N) = L^*$ for some $N \ge 1$. Since $\{\hat L_\beta(\hat w^k)\}$ is nonincreasing, we have $\hat L_\beta(\hat w^k) = L^*$ for all $k \ge N$. Then, according to Lemma 4, $x^{N+t} = x^N$ and $y^{N+t} = y^N$ hold for all $t \ge 0$. This implies that the sequences $\{x^k\}$ and $\{y^k\}$ converge in finitely many steps. Furthermore, it follows from relation (14) that $\lambda^{N+t} = \lambda^{N+1}$ for all $t \ge 1$. Hence $\{\lambda^k\}$ is also convergent, which justifies the assertion.
Suppose next that $\hat L_\beta(\hat w^k) > L^*$ for all $k$. Note that $\hat L_\beta(\cdot)$ is a semialgebraic function due to the semialgebraicity of $f$ and $g$; hence it is a KŁ function. The remaining analysis involves three steps.
To begin with, we show that $\hat L_\beta(\cdot)$ is constant on $\Omega$, the set of all cluster points of the sequence $\{\hat w^k\}$, and then apply the uniformized KŁ property of Lemma 1.
That is, the sequences $\{x^k\}$ and $\{y^k\}$ are convergent. Moreover, summing relation (40) from $k = k_1$ to $\infty$, we also obtain $\sum_{k=k_1}^{\infty}\|\Delta\lambda^k\| < +\infty$, which implies that $\{\lambda^k\}$ is convergent too. That is, $\{w^k\}$ is convergent. Therefore, combining this with Theorem 1, the whole proof is finished. □

Conclusions
In this paper, an ADMM-based SQP method for separably smooth nonconvex problems with linear equality constraints is proposed. Incorporating favorable ideas from the SQP method and the classical ADMM, the QP subproblem of the original problem is split into smaller-scale QP subproblems, which can be solved easily with the help of Bregman distances, thus relieving the difficulty of solving the large-scale QP associated with the primal nonconvex optimization problem. Additionally, the Lagrangian multipliers are updated by a dual ascent step. Based on the KŁ property and other standard assumptions, the proposed method is shown to be globally and strongly convergent in terms of the potential function.
As future work, it is tempting to consider whether these theoretical results can be extended to a relaxation factor in the multiplier update (7c), or to accelerated variants of the ADMM-based SQP method. These are interesting issues worthy of further investigation.