An ADMM-based SQP method for separably smooth nonconvex optimization
Journal of Inequalities and Applications volume 2020, Article number: 81 (2020)
Abstract
This work concerns a splitting approach for solving separably smooth nonconvex linearly constrained optimization problems. Based on ideas from two classical methods, namely sequential quadratic programming (SQP) and the alternating direction method of multipliers (ADMM), we propose an ADMM-based SQP method. We focus on decomposing the quadratic programming (QP) subproblem of the primal problem into small-scale QP subproblems which, further equipped with Bregman distances, can be solved effectively; the iteration is then completed by a dual-ascent-type update of the Lagrangian multipliers. Under suitable conditions, including the crucial Kurdyka–Łojasiewicz property, we establish the global and strong convergence properties of the proposed method.
1 Introduction
Nonconvex optimization problems arise in a variety of applications, ranging from signal and image processing to machine learning [1]. This class of problems is often structured, with an explicitly separable objective, although such problems may still be rather challenging to deal with.
In this paper, we consider the following nonconvex optimization problem with linear constraints and a separable objective function:

$$ \min_{x\in R^{n_{1}},\, y\in R^{n_{2}}}\ f(x)+g(y) \quad \text{s.t.}\quad Ax+By=b, \tag{1} $$

where \(f:R^{n_{1}}\rightarrow R\) and \(g:R^{n_{2}}\rightarrow R\) are continuously differentiable, but not necessarily convex, while the matrices \(A \in R^{m\times n_{1}}\), \(B \in R^{m\times n_{2}}\) and the vector \(b\in R^{m} \) are assumed to be given.
To solve problem (1) with separable structure, when f and g are convex, one simple but powerful method is the alternating direction method of multipliers (ADMM), which was originally proposed in [2, 3]. ADMM and its related variants have gained popularity among many researchers; see, e.g., [4–9]. The standard iterative scheme of the classical ADMM for problem (1) acts as follows:

$$ \textstyle\begin{cases} x^{k+1} \in \arg \min_{x} \mathcal{L}_{\beta }(x,y^{k},\lambda ^{k}), \\ y^{k+1} \in \arg \min_{y} \mathcal{L}_{\beta }(x^{k+1},y,\lambda ^{k}), \\ \lambda ^{k+1} = \lambda ^{k}-\beta \bigl(Ax^{k+1}+By^{k+1}-b\bigr), \end{cases} \tag{2} $$

where \(\beta >0\) is a penalty parameter and

$$ \mathcal{L}_{\beta }(x,y,\lambda ):= f(x)+g(y)-\langle \lambda, Ax+By-b \rangle +\frac{\beta }{2} \Vert Ax+By-b \Vert ^{2} \tag{3} $$

is the augmented Lagrangian function with a Lagrangian multiplier \(\lambda \in R^{m}\).
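To make the classical scheme concrete, here is a minimal numerical sketch (ours, not from the paper): it runs the three ADMM updates on a toy instance of problem (1) with \(f(x)=\frac{1}{2}\|x-p\|^{2}\), \(g(y)=\frac{1}{2}\|y-q\|^{2}\), and \(A=B=I\), assuming the dual-update sign convention \(\lambda^{k+1}=\lambda^{k}-\beta (Ax^{k+1}+By^{k+1}-b)\); with these choices each subproblem has a closed-form minimizer.

```python
import numpy as np

# Toy instance of problem (1): f(x) = 0.5*||x - p||^2, g(y) = 0.5*||y - q||^2,
# constraint x + y = b (i.e. A = B = I). All data below are our own choices.
p, q, b = np.array([1.0]), np.array([2.0]), np.array([5.0])
beta = 1.0
x, y, lam = np.zeros(1), np.zeros(1), np.zeros(1)

for _ in range(100):
    # x-step: argmin_x 0.5||x-p||^2 - <lam, x+y-b> + (beta/2)||x+y-b||^2
    x = (p + lam + beta * (b - y)) / (1.0 + beta)
    # y-step: same structure in y, using the fresh x (Gauss-Seidel order)
    y = (q + lam + beta * (b - x)) / (1.0 + beta)
    # dual update with the assumed sign convention
    lam = lam - beta * (x + y - b)

# For this strongly convex instance the iterates approach the KKT point
# x* = 2, y* = 3, lam* = 1.
```

The closed forms in the x- and y-steps are simply the stationarity conditions of the respective quadratic subproblems.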
In contrast to the developments of ADMM for the convex case outlined above, several works have investigated the properties of splitting approaches for nonconvex problems, although a rigorous analysis is generally very difficult. Note that the recent works in [10–12] all dealt with nonconvex problems by means of ADMM-type methods and established crucial convergence results. In particular, a Bregman modification of ADMM for problems whose objective is the sum of a smooth function and a nonconvex function was considered by Wang et al. [10]. Meanwhile, Li et al. [11] devised two types of splitting methods, in which only one subproblem was equipped with a Bregman distance. Moreover, Hong et al. [12] focused on solving nonconvex consensus and sharing problems. We believe that it is very meaningful and important to further study the characteristics of splitting approaches designed for the nonconvex setting and to enlarge their range of applications, given the needs of practice.
On the other hand, it is well known that the sequential quadratic programming (SQP) method, dating back to [13], is one of the most efficient methods for solving smooth constrained optimization problems, since it enjoys good theoretical properties and stable numerical performance thanks to a better approximation of the primal problem. For more than half a century, the SQP method has undergone rapid development with fruitful achievements; see, e.g., [14–21] and the references therein. Recently, Jian et al. [22] discussed a class of separably smooth nonconvex optimization problems with linear constraints and closed convex sets and presented an ADMM-SQP method. In that work, the QP subproblem of the primal problem was split into two smaller-scale QP subproblems, which were solved in a Jacobian manner. Moreover, an inexact Armijo line search was carried out to ensure the descent property of the augmented Lagrangian function, and global convergence was proved under proper conditions. As we know, the classical QP subproblem associated with problem (1) reads as follows:
where \(H_{k} \in R^{(n_{1}+ n_{2})\times (n_{1}+ n_{2})}\) is a symmetric approximation to the Hessian of the Lagrangian function, namely \(\mathcal{L}_{0} (x,y,\lambda )\) with \(\beta =0\) in (3), for problem (1) with respect to the variables \((x,y)\). Considering that this Hessian is block diagonal,

$$ \nabla ^{2}_{(x,y)} \mathcal{L}_{0}(x,y,\lambda ) = \begin{pmatrix} \nabla ^{2} f(x) & 0 \\ 0 & \nabla ^{2} g(y) \end{pmatrix}, $$

it is reasonable to choose the matrix \(H_{k}\) of the block-diagonal form

$$ H_{k} = \begin{pmatrix} H_{k}^{x} & 0 \\ 0 & H_{k}^{y} \end{pmatrix}, $$

where \(H_{k}^{x}\) and \(H_{k}^{y}\) are symmetric approximations of \(\nabla ^{2} f(x^{k})\) and \(\nabla ^{2} g(y^{k})\), respectively. In view of this, QP subproblem (4) can be converted into the following structure:
where we find that the objective function is separable.
In this paper, motivated by the ideas of the splitting scheme applied to the QP subproblem in [22] and of the Bregman modification of ADMM in [10], we focus on QP subproblem (6) of the primal problem and propose an ADMM-based SQP algorithm in the nonconvex setting. The resulting method makes use of the separable structure of QP subproblem (6) and decomposes it into two relatively small-scale QP subproblems in a Gauss–Seidel manner, which are further equipped with additional Bregman distances and then solved effectively. The main difference from the work [22] is that our proposed method does not require any line search, and its convergence properties can be proved in terms of a potential function under suitable conditions.
The remainder of this paper is structured as follows. In Sect. 2, the ADMM-based SQP algorithm is established as some elementary preliminaries are prepared. Section 3 presents the convergence properties of the proposed algorithm. Finally, we give the conclusions in Sect. 4.
Notation. Throughout this paper, \(R^{n}\) stands for the n-dimensional real Euclidean space, I is an identity matrix, \(\| \cdot \|\) is the Euclidean norm equipped with inner product \(\langle \cdot,\cdot \rangle \). For any vector x and matrix H, we denote \(\| x \|_{H}^{2}:= x^{\mathrm{T}} H x\), where T is the transpose operation. \(H \succ 0\) means that the matrix H is positive definite (resp., positive semidefinite, \(H \succeq 0\)), while \(H \succ G\) is used to denote \(H - G\succ 0\) (resp., \(H-G \succeq 0\), \(H\succeq G\)), and moreover the minimum eigenvalue of a matrix H is denoted by \(\sigma _{H}\). For brevity, we additionally introduce the following notations:
and the primal-dual errors
2 Preliminaries and ADMM-based SQP method
In this section, we provide some preliminaries that are useful in the sequel, and then describe the ADMM-based SQP method in detail.
The domain of function f is defined as \(\operatorname{dom}f:=\{ x\in R^{n}: f(x)<+\infty \}\). For a subset \(S\subseteq R^{n}\) and a point \(x\in R^{n}\), the distance from x to S is defined as \(d(x,S):= \inf \{ \|x-y\|: y\in S\}\) and by convention \(d(x,S) = + \infty \) for all x when \(S=\emptyset \).
Definition 1
([23], KŁ property)
Let f be a proper lower semicontinuous function, and let ∂f be the basic subdifferential of f in domf. Then
- (a)
The function f is said to have the Kurdyka–Łojasiewicz (KŁ) property at \(x^{\ast }\in \operatorname{dom} \partial f\) if there exist \(\eta \in (0,+\infty ]\), a neighborhood U of \(x^{\ast }\), and a continuous and concave function \(\varphi:[0,\eta )\rightarrow R_{+}\) such that
- (i)
\(\varphi (0) = 0\), and φ is continuously differentiable on \((0,\eta )\) with \(\varphi ' >0\);
- (ii)
for all x in \(U\cap \{x\in R^{n}: f(x^{\ast })< f(x)< f(x^{\ast })+\eta \}\), the KŁ inequality holds
$$ \varphi '\bigl(f(x)-f\bigl(x^{\ast }\bigr)\bigr)d\bigl(0, \partial f(x)\bigr)\geq 1. $$
- (b)
Let \(\varPhi _{\eta }\) be the set of concave functions that satisfy (i); if f satisfies the KŁ property at each point of \(\operatorname{dom} \partial f\), then f is called a KŁ function.
Lemma 1
([24], Uniformized KŁ property)
Let Ω be a compact set, and let f be a proper lower semicontinuous function. Assume that f is constant on Ω and satisfies the KŁ property at each point of Ω. Then there exist \(\epsilon > 0\), \(\eta >0\), and \(\varphi \in \varPhi _{\eta }\) such that, for all \(\bar{x}\in \varOmega \) and for all x in the following intersection
one has
A semialgebraic set \(S\subseteq R^{n}\) is a finite union of sets of the form

$$ \bigl\{ x\in R^{n}: p_{i}(x)=0,\ i=1,\ldots,k;\ q_{j}(x)< 0,\ j=1,\ldots,l \bigr\} , $$

where \(p_{1},\ldots, p_{k}\) and \(q_{1}, \ldots, q_{l}\) are real polynomial functions. A function \(f: R^{n} \rightarrow R\) is semialgebraic if its graph is a semialgebraic subset of \(R^{n+1}\). Such a function satisfies the KŁ property, see, e.g., [23, 25, 26], with \(\varphi (s)=cs^{1-\theta }\) for some \(\theta \in [0,1)\) and some \(c>0\). On the other hand, some important stability properties of semialgebraic functions can be found in [27]:
- finite sums and products of semialgebraic functions are semialgebraic;
- scalar products are semialgebraic;
- indicator functions of semialgebraic sets are semialgebraic;
- generalized inverses of semialgebraic mappings are semialgebraic;
- compositions of semialgebraic functions or mappings are semialgebraic;
- functions of the type \(R^{n} \ni x\rightarrow f(x)= \sup_{y\in C} g(x,y)\) (resp. \(R^{n} \ni x\rightarrow f(x)= \inf_{y\in C} g(x,y)\)), where g and C are semialgebraic, are semialgebraic.
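As a concrete illustration (our example, not from the source): the semialgebraic function \(f(x)=x^{2}\) satisfies the KŁ property at \(x^{\ast }=0\) with \(\theta = \frac{1}{2}\) and \(c=1\), i.e., with \(\varphi (s)=s^{1/2}\):

```latex
% f(x) = x^2, x^* = 0, f(x^*) = 0; take \varphi(s) = s^{1/2}, so
% \varphi'(s) = \tfrac{1}{2} s^{-1/2}.  For every x \neq 0,
\varphi'\bigl(f(x)-f\bigl(x^{\ast}\bigr)\bigr)\, d\bigl(0,\partial f(x)\bigr)
  = \frac{1}{2\sqrt{x^{2}}}\cdot \lvert 2x\rvert
  = \frac{2\lvert x\rvert}{2\lvert x\rvert} = 1 \geq 1,
% so the KL inequality of Definition 1 holds with U = R and any \eta > 0.
```

Here the desingularizing function and the choice of \(x^{\ast }\) are ours; the inequality is exactly the one in Definition 1.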
For a continuously differentiable convex function f on \(R^{n}\), the associated Bregman distance \(\triangle _{f}\) is defined as

$$ \triangle _{f}\bigl(x^{1},x^{2}\bigr):= f\bigl(x^{1}\bigr)-f\bigl(x^{2}\bigr)-\bigl\langle \nabla f\bigl(x^{2}\bigr), x^{1}-x^{2}\bigr\rangle $$

for any \(x^{1},x^{2} \in R^{n}\). Let us now collect some important properties of the Bregman distance [10].
- Nonnegativity: \(\triangle _{f}(x^{1},x^{2})\geq 0, \triangle _{f}(x^{1},x^{1})=0\) for all \(x^{1}, x^{2}\).
- Convexity: \(\triangle _{f}(x^{1},x^{2})\) is convex in \(x^{1}\), but not necessarily in \(x^{2}\).
- Strong convexity: If f is \(\sigma _{f}\)-strongly convex, then \(\triangle _{f}(x^{1},x^{2})\geq \frac{\sigma _{f}}{2} \| x^{1}-x^{2} \|^{2}\) for all \(x^{1}, x^{2}\).
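These properties are easy to check numerically for a concrete kernel. The sketch below (our illustration; the helper name `bregman` is ours) uses the 1-strongly convex kernel \(f(x)=\frac{1}{2}\|x\|^{2}\), for which the Bregman distance reduces to \(\frac{1}{2}\|x^{1}-x^{2}\|^{2}\), so the strong-convexity lower bound holds with equality.

```python
import numpy as np

def bregman(f, grad_f, x1, x2):
    """Bregman distance: D_f(x1, x2) = f(x1) - f(x2) - <grad f(x2), x1 - x2>."""
    return f(x1) - f(x2) - grad_f(x2) @ (x1 - x2)

# Kernel f(x) = 0.5*||x||^2 is 1-strongly convex, hence
# D_f(x1, x2) = 0.5*||x1 - x2||^2 exactly (the bound with sigma_f = 1 is tight).
f = lambda x: 0.5 * x @ x
gf = lambda x: x

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])
d = bregman(f, gf, x1, x2)   # equals 0.5 * ||x1 - x2||^2 = 6.5
```

Nonnegativity and \(\triangle _{f}(x^{1},x^{1})=0\) can be verified directly on this instance.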
For the current primal-dual iterate \((x^{k},y^{k},\lambda ^{k})\in R^{n_{1}}\times R^{n_{2}}\times R^{m}\), we apply the splitting idea of the classical ADMM to the structured QP subproblem (6), so that the variables x and y are updated alternately at each iteration with regularized Bregman distances \(\triangle _{\phi }(\cdot,x^{k})\) and \(\triangle _{\psi }(\cdot,y^{k})\), respectively, followed by the update of the Lagrangian multiplier λ. Such a procedure can be formulated as follows:
where ϕ and ψ are continuously differentiable and strongly convex functions with moduli \(\sigma _{\phi }\) and \(\sigma _{\psi }\) on \(R^{n_{1}}\) and \(R^{n_{2}}\), respectively. Notice that the objective functions of (7a) and (7b) are strictly convex once we invoke the strong convexity of ϕ and ψ together with the conditions \(H_{k}^{x} + \beta A^{\mathrm{T}} A + \sigma _{\phi }I_{n_{1}}\succ 0\) and \(H_{k}^{y} + \beta B^{\mathrm{T}} B + \sigma _{\psi }I_{n_{2}}\succ 0\).
Invoking the first-order optimality conditions of (7a) and (7b), we have
Combined with the update formula (7c), these can be rewritten as
Based on the above analysis and preparation, we now describe the proposed algorithm in detail for solving problem (1) as follows (Algorithm 1).
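As a concrete illustration of the scheme (7a)-(7c), the following sketch implements one iteration under explicit assumptions of ours: quadratic Bregman kernels \(\phi = \frac{\sigma _{\phi }}{2}\|\cdot \|^{2}\) and \(\psi = \frac{\sigma _{\psi }}{2}\|\cdot \|^{2}\), the multiplier update \(\lambda ^{k+1}=\lambda ^{k}-\beta (Ax^{k+1}+By^{k+1}-b)\), and subproblems built from the linearized objective plus the augmented-Lagrangian quadratic, so that each subproblem's first-order optimality condition is the positive definite linear system solved below. This is a sketch under those assumptions, not the authors' exact Algorithm 1, and all names are ours.

```python
import numpy as np

def admm_sqp_step(x, y, lam, grad_f, grad_g, Hx, Hy, A, B, b,
                  beta=1.0, sig_phi=0.1, sig_psi=0.1):
    """One iteration of (7a)-(7c) with quadratic Bregman kernels (our choice),
    so each strictly convex QP subproblem reduces to a linear system whose
    matrix is exactly the one required to be positive definite in the text."""
    n1, n2 = x.size, y.size
    # (7a): x-subproblem via its first-order optimality condition
    Mx = Hx + beta * A.T @ A + sig_phi * np.eye(n1)
    rx = (Hx + sig_phi * np.eye(n1)) @ x - grad_f(x) + A.T @ lam \
         + beta * A.T @ (b - B @ y)
    x_new = np.linalg.solve(Mx, rx)
    # (7b): y-subproblem, Gauss-Seidel style (uses the fresh x_new)
    My = Hy + beta * B.T @ B + sig_psi * np.eye(n2)
    ry = (Hy + sig_psi * np.eye(n2)) @ y - grad_g(y) + B.T @ lam \
         + beta * B.T @ (b - A @ x_new)
    y_new = np.linalg.solve(My, ry)
    # (7c): dual-ascent-type multiplier update (assumed sign convention)
    lam_new = lam - beta * (A @ x_new + B @ y_new - b)
    return x_new, y_new, lam_new

# Toy run: f(x) = 0.5*||x - p||^2, g(y) = 0.5*||y - q||^2, A = B = I,
# with Hx, Hy taken as the exact Hessians (identity matrices).
n = 2
p, q, b = np.array([1.0, 0.0]), np.array([2.0, 1.0]), np.array([5.0, 3.0])
A = B = np.eye(n)
Hx = Hy = np.eye(n)
x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(500):
    x, y, lam = admm_sqp_step(x, y, lam,
                              lambda x: x - p, lambda y: y - q,
                              Hx, Hy, A, B, b)
# iterates approach the KKT point x* = p + (b-p-q)/2, y* = q + (b-p-q)/2
```

For this strongly convex toy instance the fixed point of the step is the KKT point of the constrained problem, which the loop approaches geometrically.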
Remark 1
At first glance, one might view the ADMM-based SQP method proposed in this paper as a special case of the well-known Bregman ADMM [10], whose iterative scheme is given as follows:
when given some concrete choices of functions \(\tilde{\phi }_{k}(x)\) and \(\tilde{\psi }_{k}(y)\) such as
However, this view holds only as a matter of form: by definition, a Bregman distance requires the convexity of its kernel function, see, e.g., [28, 29]. Meanwhile, it is worth pointing out that the matrices \(H^{x}_{k}\) and \(H^{y}_{k}\) in this paper are not required to be positive semidefinite or positive definite, and the functions f and g are not necessarily convex either.
3 Convergence analysis
This section is devoted to the convergence analysis of the ADMM-based SQP method introduced in Sect. 2. First, we consider some basic assumptions as follows.
Assumption 1
Let \(\min \{\sigma _{0},\sigma _{\phi },\sigma _{\psi }\}>0\), \(f:R^{n_{1}}\rightarrow R\), and \(g:R^{n_{2}}\rightarrow R\) be continuously differentiable functions. Assume that the following conditions hold:
- (i)
\({B B^{\mathrm{T}}}\succeq \sigma _{0} I\), namely, B has full row rank;
- (ii)
ϕ and ψ are strongly convex with modulus \(\sigma _{\phi },\sigma _{\psi }\), respectively;
- (iii)
∇f, ∇g, ∇ϕ, and ∇ψ are Lipschitz continuous with modulus \(\ell _{f}, \ell _{g}, \ell _{\phi }, \ell _{\psi }> 0\), respectively.
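Condition (i) above is easy to verify numerically for a given matrix B: the largest admissible \(\sigma _{0}\) is the smallest eigenvalue of \(BB^{\mathrm{T}}\), which is positive exactly when B has full row rank. A small sketch (the matrices are our own examples):

```python
import numpy as np

# Assumption 1(i): B B^T >= sigma0 * I with sigma0 > 0, i.e. B has full row rank.
# sigma0 can be taken as lambda_min(B B^T), the smallest singular value squared.
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])      # 2x3 matrix with independent rows
sigma0 = np.linalg.eigvalsh(B @ B.T).min()
assert sigma0 > 0                    # full row rank: Assumption 1(i) holds

B_bad = np.array([[1.0, 2.0, 0.0],
                  [2.0, 4.0, 0.0]])  # second row = 2 * first row
# rank deficient: lambda_min(B_bad B_bad^T) = 0, so Assumption 1(i) fails
```

For the first B above, \(BB^{\mathrm{T}} = \bigl(\begin{smallmatrix}5&2\\2&10\end{smallmatrix}\bigr)\), whose smallest eigenvalue is \((15-\sqrt{41})/2\).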
Assumption 2
The matrix sequences \(\{H_{k}^{x}\}\) and \(\{H_{k}^{y}\}\) are bounded, and there exist constants \(h,\eta ^{x},\eta ^{y}>0\) such that
where
Remark 2
From Assumption 2, in order to guarantee that the final two relations in (10) hold, it suffices to choose the quantities
and
where \(\sigma _{H^{x}_{k}}\) and \(\sigma _{H^{y}_{k}}\) are the minimum eigenvalues of the matrices \(H_{k}^{x}\) and \(H_{k}^{y}\), respectively. Moreover, each of the QP subproblems (7a) and (7b) then has a unique optimal solution.
To design an appropriate merit function for problem (1), we introduce a modified potential function \(\hat{\mathcal{L}}_{\beta }: R^{n_{1}}\times R^{n_{2}}\times R^{m} \times R^{n_{2}}\rightarrow R\) defined as
Before giving the descent property of \(\hat{\mathcal{L}}_{\beta }(\cdot )\), we first establish a series of technical results, which contribute to characterizing the convergence properties of the ADMM-based SQP algorithm. To begin, we provide an upper estimate for the quantity \(\| \triangle \lambda ^{k+1} \|^{2} \).
Lemma 2
Suppose that Assumptions 1 and 2 are satisfied. Then we have
Proof
First, it follows directly from Assumption 1(i) that
Again, by the optimality condition (9b), we have
Then, using inequality (16), Assumption 1(iii), and the boundedness of \(\{ H_{k}^{y} \}\) in Assumption 2, we obtain
which together with relation (15) immediately establishes the assertion. □
To proceed, the following lemma bounds the pointwise change of the augmented Lagrangian function.
Lemma 3
Suppose that Assumptions 1 and 2 are satisfied, and let \(\{w^{k}\}\) be the sequence generated by the ADMM-based SQP method. Then the following assertion holds:
Proof
From Assumption 1(iii), ∇f and ∇g are Lipschitz continuous, so we can deduce
and similarly
By the definition of \(\mathcal{L}_{\beta }(\cdot )\) and formula (7c), we have
Moreover, since \(y^{k+1}\) is a minimizer of (7b), using the strong convexity of ψ and relation (20), we have
Again, similarly recalling the update for x-subproblem (7a), one also has
Summing up relations (21), (22), and (23), we obtain
This, combined with relation (14), justifies the conclusion. □
We next give the descent property of \(\hat{\mathcal{L}}_{\beta }(\cdot )\), i.e., the sequence \(\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}\) is monotonically nonincreasing under Assumptions 1 and 2.
Lemma 4
Suppose that Assumptions 1 and 2 hold, and let \(\{w^{k}\}\) be the sequence generated by the ADMM-based SQP method. Then we have
where the matrices \(\mathcal{H}_{k}^{x} \) and \(\mathcal{H}_{k}^{y} \) are defined in (11a) and (11b), respectively.
Proof
Clearly, we observe from Lemma 3 that
Thus, using the definitions of \(\hat{\mathcal{L}}_{\beta }(\cdot )\), \(\mathcal{H}_{k}^{x}\), and \(\mathcal{H}_{k}^{y}\), the conclusion is immediately satisfied. □
Notice that the boundedness of the iterative sequence \(\{ w^{k}\}\) generated by the ADMM-based SQP method plays an important role in the existence of a cluster point, so it is necessary to consider some additional conditions as follows.
Assumption 3
Assume that the following conditions are satisfied:
- (i)
There exists a constant \(\tilde{\sigma } > 0\) with \(g^{\ast }:= \inf_{y} \{ g(y) - \tilde{\sigma } \| \nabla g(y) \| ^{2} \} >-\infty \);
- (ii)
f and g are coercive, i.e., \(\lim_{\| x \|\rightarrow +\infty } f(x) =+\infty, \lim_{\| y \| \rightarrow +\infty } g(y) = +\infty \);
- (iii)
\(\beta > \max \{\bar{\beta }, \frac{1}{\tilde{\sigma }\sigma _{0}}\}\).
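To see that condition (i) above is not vacuous, consider the coercive quadratic \(g(y)=\frac{1}{2}\|y\|^{2}\) (our example, not from the source); then \(\nabla g(y)=y\) and

```latex
g(y) - \tilde{\sigma}\,\bigl\Vert \nabla g(y)\bigr\Vert ^{2}
  = \tfrac{1}{2}\Vert y\Vert ^{2} - \tilde{\sigma}\,\Vert y\Vert ^{2}
  = \Bigl(\tfrac{1}{2} - \tilde{\sigma}\Bigr)\Vert y\Vert ^{2},
```

so \(g^{\ast }=0>-\infty \) for any \(\tilde{\sigma }\in (0,\frac{1}{2}]\), and Assumption 3(i) holds together with the coercivity required in (ii).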
We now prove the boundedness of the iterative sequence \(\{ w^{k}\}\), which further shows that the constructed potential function is bounded from below.
Lemma 5
Suppose that Assumptions 1, 2, and 3 are all satisfied, and let \(\{w^{k}\}\) be the sequence generated by the ADMM-based SQP method. Then \(\{ w^{k}\}\) is bounded, and there exists a constant \(\underline{\mathcal{L}}\) such that \(\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) \geq \underline{\mathcal{L}} > - \infty \) for all \(k>0\).
Proof
First, by completing the square, we have
Next, recalling relation (16), Assumption 1(iii), and Assumption 2 gives
and hence
Substituting this into (25) and invoking the definition of δ in (13), we obtain
On the other hand, from Lemma 4 and Assumption 2, we know that \(\{ \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}\) is nonincreasing, and thus we have
Since \(\lim_{\| x \|\rightarrow +\infty } f(x) =+\infty \) implies that \(f^{\ast }:=\inf_{ x }f(x) > -\infty \), combining this with Assumption 3 and relations (27) and (28) shows that the sequences \(\{x^{k}\}\), \(\{\| \nabla g(y^{k}) \|\}\), and \(\{\|\triangle y^{k}\|\}\) are bounded; the boundedness of \(\{\lambda ^{k}\}\) then follows directly from estimate (26). Moreover, \(\{y^{k}\}\) is also bounded, since \(\lim_{\| y \|\rightarrow +\infty } g(y) = +\infty \) implies that \(\inf_{ y } g(y) > -\infty \). Therefore, the sequence \(\{w^{k}\}\) is bounded.
Finally, dropping some nonnegative terms in (27), one has
where \(\underline{\mathcal{L}}:=f^{\ast }+g^{\ast }\), and the proof is completed. □
Now, we are ready to establish the global convergence of the ADMM-based SQP method.
Theorem 1
Suppose that Assumptions 1, 2, and 3 are all satisfied, and let \(\{w^{k}\}\) be the sequence generated by the ADMM-based SQP method. Then
- (i)
\(\lim_{k\rightarrow \infty } ( \| \triangle x^{k+1} \| + \| \triangle y^{k+1}\| +\| \triangle \lambda ^{k+1}\| ) =0\);
- (ii)
any cluster point \(w^{\ast }\) of the sequence \(\{w^{k}\}\) is a KKT point of problem (1).
Proof
(i) Note first from Lemma 4 that
Thus, summing up this inequality from \(k=1\) to n, using Lemma 5 yields
Again, by Assumption 2, we know that, for any k, both matrices \(\mathcal{H}_{k}^{x}\) and \(\mathcal{H}_{k}^{y}\) are positive definite. Thus
Furthermore, in view of estimate (14), one also has \(\sum_{k=1}^{\infty }\|\triangle \lambda ^{k+1} \|^{2} < \infty \). So,
which completes assertion (i).
(ii) By Lemma 5, we argue that \(\{ w^{k} \}\) is bounded and thus there exists at least one cluster point. Assume that \(w^{\ast }\) is a cluster point of \(\{ w^{k} \}\), and let \(\{ w^{k_{j}}\}\) be a convergent subsequence such that \(\lim_{j\rightarrow \infty } w^{k_{j}}=w^{\ast }\). On the other hand, from Assumption 1(iii) and assertion (i), we know
In view of this, and taking limit in (7c), (9a), and (9b) along the sequence \(\{ w^{k_{j}}\}\), we obtain
This implies that \(w^{\ast }\) is a KKT point of problem (1). □
It is well known that when the potential function enjoys the geometric KŁ property, Theorem 1 can typically be strengthened: the limit point of the iterative sequence is unique.
Theorem 2
Suppose that Assumptions 1, 2, and 3 hold, suppose that f and g are semialgebraic functions, and let \(\{w^{k}\}\) be the sequence generated by the ADMM-based SQP method. Then the whole sequence \(\{ w^{k} \}\) converges to a KKT point of problem (1).
Proof
Clearly, by Theorem 1, it suffices to prove that the sequence \(\{ w^{k}\}\) is convergent. From Lemmas 4 and 5, we know that \(\{ \hat{w}^{k} \}\) is also bounded and that \(\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}\) is monotonically nonincreasing and bounded from below, so \(\mathcal{L}^{\ast }:=\lim_{k\rightarrow \infty } \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\) exists. The rest of the proof is divided into two cases.
Suppose first that \(\hat{\mathcal{L}}_{\beta }(\hat{w}^{N}) = \mathcal{L}^{\ast }\) for some \(N \geq 1\). Since \(\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}\) is nonincreasing, we have \(\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) = \mathcal{L}^{\ast }\) for any \(k \geq N\). Then, according to Lemma 4, \(x^{N+t} = x^{N}\) and \(y^{N+t} = y^{N} \) hold for any \(t \geq 0\). This implies that both sequences \(\{ x^{k} \}\) and \(\{ y^{k} \}\) converge finitely. Furthermore, it follows from relation (14) that \(\lambda ^{N+t} = \lambda ^{N+1}\) for any \(t\geq 1\). Hence, \(\{ \lambda ^{k} \}\) is also convergent, which justifies the assertion.
Suppose next that \(\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) > \mathcal{L}^{\ast }\) for all k. Note that \(\hat{\mathcal{L}}_{\beta }(\cdot )\) is a semialgebraic function due to the semialgebraicity of f and g, hence it is a KŁ function. Subsequently, this part involves three steps for analysis.
To begin with, we prove that \(\hat{\mathcal{L}}_{\beta }(\cdot )\) is constant on Ω, where Ω is a set of all cluster points of the sequence \(\{ \hat{w}^{k} \}\), and then utilize the uniformized KŁ property in Lemma 1.
By the boundedness of \(\{ \hat{w}^{k} \}\), obviously, Ω is nonempty and \(d(\hat{w}^{k},\varOmega ) \rightarrow 0\). Similarly, as proved in [24, Lemma 5(iii)], we also know that Ω is a compact and connected set. Let \(\hat{w}^{\ast }\in \varOmega \) be arbitrary, and consider a convergent subsequence \(\{ \hat{w}^{k_{j}} \}\) of \(\{ \hat{w}^{k} \}\) converging to \(\hat{w}^{\ast }\). Then, by the continuity of \(\hat{\mathcal{L}}_{\beta }(\cdot )\), we have \(\mathcal{L}^{\ast }= \lim_{j\rightarrow \infty } \hat{\mathcal{L}}_{\beta }(\hat{w}^{k_{j}}) = \hat{\mathcal{L}}_{\beta }( \hat{w}^{\ast })\), so \(\{ \hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) \}\) is convergent, namely \(\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\rightarrow \hat{\mathcal{L}}_{\beta }(\hat{w}^{\ast })\). Thus, \(\hat{\mathcal{L}}_{\beta }(\cdot )\) is constant on Ω since \(\hat{w}^{\ast }\in \varOmega \) is arbitrary. Now, using Lemma 1, for any \(\delta _{0}>0\), \(\eta >0\), there exists an integer \(k_{1}>0\) such that \(d(\hat{w}^{k},\varOmega ) < \delta _{0}\) and \(\mathcal{L}^{\ast }< {\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})} < \mathcal{L}^{\ast }+ \eta \), and
Next, we attempt to bound the distance from 0 to \(\nabla \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\). Taking the partial derivative of \(\hat{\mathcal{L}}_{\beta }(\cdot )\) at \(\hat{w}^{k}\) with respect to variable x, we have
where the final equality follows from the optimality condition (8a). Likewise, for variable y, we have
where the second equality follows from (8b), and the final equality utilizes formula (7c). Additionally,
and
Since ∇ϕ and ∇ψ are Lipschitz continuous, then combining Assumption 2 with relations (33)–(36), there exists a constant \(a>0\) such that
Finally, based on relations (32) and (37), we start to study the convergence of the entire sequence \(\{ w^{k}\} \). For convenience, let
and
Since \(\hat{\mathcal{L}}_{\beta }(\cdot )\) is nonincreasing and, by Definition 1, φ is monotone, we clearly have \(\Delta^{k} \geq 0\) for \(k\geq k_{1}\). Multiplying both sides of (37) by \(\Delta^{k}\), using inequality (32) and the concavity of φ, as well as Lemma 4 and Assumption 2, we obtain for all \(k \geq k_{1}\) that
where \(\sigma _{1}:= \frac{1}{4} \min \{\eta ^{x}, \eta ^{y} \}>0\). Now, dividing both sides of (38) by \(\sigma _{1}\), taking the square root, and then applying the inequality \(\frac{u+v}{2}\geq \sqrt{uv}\) for \(u,v\geq 0 \) to the resulting inequality, we get
where \(m>0\) is an arbitrary constant. Then, by relation (16) and Assumptions 1 and 2, we can derive
Accordingly, for simplicity, letting \(d_{1}:= \frac{\ell _{g} +\ell _{\psi }+ h}{\sqrt{{\sigma _{0}}}}\) and \(d_{2}:= \frac{\ell _{\psi }+ h}{\sqrt{{\sigma _{0}}}}\), we have
and
Hence, substituting these into (39) and regrouping terms, we obtain
which, after some manipulation, can be rewritten as
As a result, summing inequality (41) from \(k=k_{1}\) to ∞ and making use of the nonnegativity of φ, we have
Note that \(m>0\) is arbitrary, now selecting \(m > 1+ {d}_{1}+ {d}_{2}\), and hence \(1-\frac{1}{2m}>0\) and \(1-\frac{1+ {d}_{1}+ {d}_{2}}{m} >0\), then it follows directly from (42) that
That is, the sequences \(\{ x^{k} \} \) and \(\{ y^{k} \} \) are convergent. Moreover, summing relation (40) from \(k=k_{1}\) to ∞, we also obtain
which implies that \(\{ \lambda ^{k} \} \) is convergent too. That is, \(\{ w^{k} \} \) is convergent. Combining this with Theorem 1 finishes the whole proof. □
4 Conclusions
In this paper, an ADMM-based SQP method for separably smooth nonconvex problems with linear equality constraints is proposed. Incorporating the favorable ideas of the SQP method and the classical ADMM, the QP subproblem of the original problem is split into smaller-scale QP subproblems, which can be easily solved with the help of Bregman distances, thereby relieving the difficulty of solving the large-scale QP associated with the primal nonconvex optimization problem. Additionally, we update the Lagrangian multipliers via a dual ascent step. Based on the KŁ property and other standard assumptions, the proposed method is shown to be globally and strongly convergent in terms of the potential function.
As future work, it is tempting to consider whether these theoretical results can be extended to allow a relaxation factor in the multiplier update (7c), or an acceleration technique for variants of the ADMM-based SQP method. This is an interesting issue worthy of further investigation.
References
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2, 17–40 (1976)
Glowinski, R., Marrocco, A.: Sur l’approximation par éléments finis d’ordre un et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. RAIRO. Anal. Numér. 9(2), 41–76 (1975)
Goldfarb, D., Ma, S.Q., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. 141(1–2), 349–382 (2013)
Goldstein, T., Donoghue, B.O., Setzer, S.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)
He, B.S., Liu, H., Wang, Z.R., Yuan, X.M.: A strictly contractive Peaceman–Rachford splitting method for convex programming. SIAM J. Optim. 24, 1011–1040 (2014)
He, B.S., Yuan, X.M.: On the \(\mathcal{O}(\frac{1}{n})\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
He, B.S., Yuan, X.M.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)
Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Wang, F.H., Xu, Z.B., Xu, H.K.: Convergence of alternating direction method with multipliers for nonconvex composite problems. Preprint, arXiv:1410.8625 (2014)
Li, G.Y., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
Hong, M.Y., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)
Wilson, R.B.: A simplicial method for concave programming. PhD thesis, Graduate School of Business Administration, Harvard University, Cambridge (1963)
Robinson, S.: A quadratically convergent algorithm for general nonlinear programming problems. Math. Program. 3, 145–156 (1972)
Fukushima, M.: A successive quadratic programming algorithm with global and superlinear convergence properties. Math. Program. 35, 253–264 (1986)
Solodov, M.V.: Global convergence of an SQP method without boundedness assumptions on any of the iterative sequences. Math. Program., Ser. A 118, 1–12 (2009)
Panier, E.R., Tits, A.L.: A superlinearly convergent feasible method for the solution of inequality constrained optimization problems. SIAM J. Control Optim. 25, 934–950 (1987)
Zhu, Z.B., Jian, J.B.: An efficient feasible SQP algorithm for inequality constrained optimization. Nonlinear Anal., Real World Appl. 10, 1220–1228 (2009)
Jian, J.B.: A superlinearly convergent implicit smooth SQP algorithm for mathematical programs with nonlinear complementarity constraints. Comput. Optim. Appl. 31(3), 335–361 (2005)
Jian, J.B., Zheng, H.Y., Tang, C.M., Hu, Q.J.: A new superlinearly convergent norm-relaxed method of strongly sub-feasible direction for inequality constrained optimization. Appl. Math. Comput. 182, 955–976 (2006)
Jian, J.B.: Fast Algorithms for Smooth Constrained Optimization—Theoretical Analysis and Numerical Experiments. Science Press, Beijing (2010)
Jian, J.B., Lao, Y.X., Chao, M.T., Ma, G.D.: ADMM-SQP algorithm for two blocks linear constrained nonconvex optimization. Oper. Res. Trans. 22(2), 79–92 (2018)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Bregman, L.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Acknowledgements
The authors are very grateful to the referees for their valuable comments and suggestions on the early version of the paper.
Availability of data and materials
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China (No. 11771383), the Natural Science Foundation of Guangxi Province (No. 2018GXNSFFA281007), and the Middle-aged and Young Teachers’ Basic Ability Promotion Project of Guangxi Province (No. 2017KY0537).
Contributions
ML conceived of the description of the ADMM-based SQP method, the convergence analysis and drafted the manuscript. JJ carried out the idea of this paper and participated in the convergence analysis. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Liu, M., Jian, J. An ADMM-based SQP method for separably smooth nonconvex optimization. J Inequal Appl 2020, 81 (2020). https://doi.org/10.1186/s13660-020-02347-3