# An ADMM-based SQP method for separably smooth nonconvex optimization

## Abstract

This work concerns a splitting approach for solving separably smooth nonconvex optimization problems with linear constraints. Building on two classical methods, namely sequential quadratic programming (SQP) and the alternating direction method of multipliers (ADMM), we propose an ADMM-based SQP method. We decompose the quadratic programming (QP) subproblem of the primal problem into small-scale QP subproblems, each of which is regularized by a Bregman distance and can be solved effectively; this is followed by a dual-ascent-type update of the Lagrangian multipliers. Under suitable conditions together with the Kurdyka–Łojasiewicz property, we establish the global and strong convergence properties of the proposed method.

## Introduction

Nonconvex optimization problems arise in a variety of applications, ranging from signal and image processing to machine learning. Problems of this class are often structured and have a separable objective, although they may be rather challenging to solve.

In this paper, we consider the following nonconvex optimization problems with linear constraints and a separable objective function:

\begin{aligned} &\min_{x,y} \quad f(x)+ g(y), \\ &\mathrm{s.t.}\quad Ax + By = b, \end{aligned}
(1)

where $$f:R^{n_{1}}\rightarrow R$$ and $$g:R^{n_{2}}\rightarrow R$$ are continuously differentiable but not necessarily convex, and the matrices $$A \in R^{m\times n_{1}}$$, $$B \in R^{m\times n_{2}}$$ and the vector $$b\in R^{m}$$ are given.

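To solve problem (1) with its separable structure when f and g are convex, one simple but powerful method is the alternating direction method of multipliers (ADMM), which was originally proposed in [2, 3]. ADMM and its variants have since been studied by many researchers. The standard iterative scheme of the classical ADMM for problem (1) reads as follows:

\begin{aligned} \textstyle\begin{cases} x^{k+1}=\mathop{\arg \min }_{x\in R^{n_{1}}} \mathcal{L}_{\beta }(x,y^{k}, \lambda ^{k}), \\ y^{k+1}=\mathop{\arg \min }_{y\in R^{n_{2}}} \mathcal{L}_{\beta }(x^{k+1},y, \lambda ^{k}), \\ \lambda ^{k+1}=\lambda ^{k}- \beta (Ax^{k+1}+By^{k+1}-b), \end{cases}\displaystyle \end{aligned}
(2)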

where $$\beta >0$$ is a penalty parameter and

$$\mathcal{L}_{\beta }(x,y,\lambda ) = f(x)+ g(y)- \lambda ^{\mathrm{T}} (Ax + By - b) + \frac{\beta }{2} \Vert Ax + By - b \Vert ^{2}$$
(3)

is the augmented Lagrangian function with a Lagrangian multiplier $$\lambda \in R^{m}$$.

In contrast to the developments of ADMM for the convex case outlined above, there are also some works that investigate the properties of splitting approaches for nonconvex problems, although a rigorous analysis is generally very difficult. Several recent works deal with nonconvex problems by means of ADMM-type methods and establish important convergence results. In particular, Wang et al. considered a Bregman modification of ADMM for problems whose objective is the sum of a smooth function and a nonconvex function. Meanwhile, Li et al. devised two types of splitting methods in which only one subproblem is equipped with a Bregman distance. Moreover, Hong et al. focused on solving nonconvex consensus and sharing problems. Given the practical need, we believe it is worthwhile to further study splitting approaches designed for the nonconvex setting and to broaden their range of applications.

On the other hand, it is well known that the sequential quadratic programming (SQP) method is one of the most efficient methods for solving smooth constrained optimization problems, since it enjoys good theoretical properties and stable numerical performance owing to its accurate approximation of the primal problem. Over more than half a century, the SQP method has developed rapidly and produced fruitful results. Recently, Jian et al. discussed a class of separably smooth nonconvex optimization problems with linear constraints and closed convex sets and presented an ADMM-SQP method. In that work, the QP subproblem of the primal problem was split into two smaller-scale QP subproblems, which were solved in a Jacobian manner. Moreover, an inexact Armijo line search was carried out to ensure the descent property of the augmented Lagrangian function, and global convergence was proved under suitable conditions. As is well known, the classical QP subproblem associated with problem (1) reads as follows:

\begin{aligned} &\min_{x,y} \quad \nabla f \bigl(x^{k}\bigr)^{\mathrm{T}} \bigl(x-x^{k}\bigr)+ \nabla g \bigl(y^{k}\bigr)^{ \mathrm{T}} \bigl(y-y^{k}\bigr) + \frac{1}{2} \bigl\Vert \bigl(x-x^{k},y-y^{k}\bigr) \bigr\Vert ^{2}_{H_{k}}, \\ &{\mathrm{s.t.}}\quad Ax + By = b, \end{aligned}
(4)

where $$H_{k} \in {\mathrm{R}}^{(n_{1}+ n_{2})\times (n_{1}+ n_{2})}$$ is a symmetric approximation to the Hessian, with respect to the variables $$(x,y)$$, of the Lagrangian function of problem (1), namely $$\mathcal{L}_{0} (x,y,\lambda )$$ obtained by setting $$\beta =0$$ in (3). Since

$$\nabla ^{2}_{(x,y)} \mathcal{L}_{0} (x,y,\lambda ) = \begin{pmatrix} \nabla ^{2} f(x) & O \\ O & \nabla ^{2} g(y) \end{pmatrix},$$

it is reasonable to choose the matrix $$H_{k}$$ of the form

$$H_{k} = \begin{pmatrix} H_{k}^{x} & O \\ O & H_{k}^{y} \end{pmatrix},$$
(5)

where $$H_{k}^{x}$$ and $$H_{k}^{y}$$ are the symmetric approximations of $$\nabla ^{2} f(x^{k})$$ and $$\nabla ^{2} g(y^{k})$$, respectively. In view of this, QP subproblem (4) can be converted into the following structure:

\begin{aligned}& \min_{x,y}\quad \nabla f \bigl(x^{k}\bigr)^{\mathrm{T}} \bigl(x-x^{k}\bigr) + \frac{1}{2} \bigl\Vert x-x^{k} \bigr\Vert ^{2}_{H_{k}^{x}} + \nabla g\bigl(y^{k}\bigr)^{\mathrm{T}} \bigl(y-y^{k}\bigr) + \frac{1}{2} \bigl\Vert y-y^{k} \bigr\Vert ^{2}_{H_{k}^{y}} \\ &\mathrm{s.t.}\quad Ax + By = b, \end{aligned}
(6)

where we find that the objective function is separable.
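To see why the block-diagonal choice (5) makes the objective of (4) separable, write $$u:=x-x^{k}$$ and $$v:=y-y^{k}$$ (shorthand introduced here only for illustration). The quadratic term then splits as

$$\bigl\Vert (u,v) \bigr\Vert ^{2}_{H_{k}} = \begin{pmatrix} u \\ v \end{pmatrix}^{\mathrm{T}} \begin{pmatrix} H_{k}^{x} & O \\ O & H_{k}^{y} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = u^{\mathrm{T}} H_{k}^{x} u + v^{\mathrm{T}} H_{k}^{y} v = \Vert u \Vert ^{2}_{H_{k}^{x}} + \Vert v \Vert ^{2}_{H_{k}^{y}}.$$

Hence the only remaining coupling between x and y in (6) is through the linear constraint $$Ax + By = b$$.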

In this paper, motivated by the splitting scheme applied to the QP subproblem and by the Bregman modification of ADMM, we focus on QP subproblem (6) of the primal problem and propose an ADMM-based SQP algorithm for the nonconvex setting. The resulting method exploits the separable structure of QP subproblem (6) and decomposes it into two relatively small-scale QP subproblems in a Gauss–Seidel manner; these are further equipped with Bregman distances and can then be solved effectively. The main difference from the former work is that our proposed method does not require any line search, and its convergence properties can be proved in terms of a potential function under suitable conditions.

The remainder of this paper is structured as follows. In Sect. 2, we collect some elementary preliminaries and then present the ADMM-based SQP algorithm. Section 3 establishes the convergence properties of the proposed algorithm. Finally, we give the conclusions in Sect. 4.

Notation. Throughout this paper, $$R^{n}$$ stands for the n-dimensional real Euclidean space, I is an identity matrix, and $$\| \cdot \|$$ is the Euclidean norm induced by the inner product $$\langle \cdot,\cdot \rangle$$. For any vector x and matrix H, we denote $$\| x \|_{H}^{2}:= x^{\mathrm{T}} H x$$, where T is the transpose operation. $$H \succ 0$$ (resp., $$H \succeq 0$$) means that the matrix H is positive definite (resp., positive semidefinite), while $$H \succ G$$ (resp., $$H\succeq G$$) denotes $$H - G\succ 0$$ (resp., $$H-G \succeq 0$$). Moreover, the minimum eigenvalue of a matrix H is denoted by $$\sigma _{H}$$. For brevity, we additionally introduce the following notation:

$$w:=(x,y,\lambda ),\qquad w^{k}:=\bigl(x^{k},y^{k}, \lambda ^{k}\bigr),\qquad \hat{w}:= (x,y, \lambda,\hat{y}),\qquad \hat{w}^{k}:= \bigl(x^{k},y^{k},\lambda ^{k},y^{k-1}\bigr),$$

and the primal-dual errors

$$\triangle x^{k}:=x^{k}-x^{k-1},\qquad \triangle y^{k}:=y^{k}-y^{k-1}, \qquad\triangle \lambda ^{k}:=\lambda ^{k}-\lambda ^{k-1}.$$

## Preliminaries and ADMM-based SQP method

In this section, we provide some preliminaries that are useful in the sequel, and then describe the ADMM-based SQP method in detail.

The domain of function f is defined as $$\operatorname{dom}f:=\{ x\in R^{n}: f(x)<+\infty \}$$. For a subset $$S\subseteq R^{n}$$ and a point $$x\in R^{n}$$, the distance from x to S is defined as $$d(x,S):= \inf \{ \|x-y\|: y\in S\}$$ and by convention $$d(x,S) = + \infty$$ for all x when $$S=\emptyset$$.

### Definition 1

(KŁ property)

Let f be a proper lower semicontinuous function, and let ∂f be the basic subdifferential of f on dom f. Then

(a) The function f is said to have the Kurdyka–Łojasiewicz (KŁ) property at $$x^{\ast }\in \operatorname{dom} \partial f$$ if there exist $$\eta \in (0,+\infty ]$$, a neighborhood U of $$x^{\ast }$$, and a continuous and concave function $$\varphi:[0,\eta )\rightarrow R_{+}$$ such that

(i) $$\varphi (0) = 0$$, and φ is continuously differentiable on $$(0,\eta )$$ with $$\varphi ' >0$$;

(ii) for all x in $$U\cap \{x\in R^{n}: f(x^{\ast })< f(x)< f(x^{\ast })+\eta \}$$, the KŁ inequality holds:

$$\varphi '\bigl(f(x)-f\bigl(x^{\ast }\bigr)\bigr)d\bigl(0, \partial f(x)\bigr)\geq 1.$$

(b) Let $$\varPhi _{\eta }$$ be the set of concave functions that satisfy (i); if f satisfies the KŁ property at each point of $$\operatorname{dom} \partial f$$, then f is called a KŁ function.

### Lemma 1

(Uniformized KŁ property)

Let Ω be a compact set, and let f be a proper lower semicontinuous function. Assume that f is constant on Ω and satisfies the KŁ property at each point of Ω. Then there exist $$\epsilon > 0,\eta >0$$, and $$\varphi \in \varPhi _{\eta }$$ such that, for all $$\bar{x}\in \varOmega$$ and for all x in the following intersection

$$\bigl\{ x\in R^{n}: d(x,\varOmega )< \epsilon \bigr\} \cap \bigl\{ x\in R^{n}: f(\bar{x})< f(x)< f( \bar{x})+\eta \bigr\} ,$$

one has

$$\varphi '\bigl(f(x)-f(\bar{x})\bigr)d\bigl(0,\partial f(x)\bigr) \geq 1.$$

A semialgebraic set $$S\subseteq R^{n}$$ is a finite union of sets of the form

$$\bigl\{ x \in R^{n}: p_{1}(x) =\cdots = p_{k}(x) = 0, q_{1}(x) < 0, \ldots, q_{l}(x) < 0\bigr\} ,$$

where $$p_{1},\ldots, p_{k}$$ and $$q_{1}, \ldots, q_{l}$$ are real polynomial functions. A function $$f: R^{n} \rightarrow R$$ is semialgebraic if its graph is a semialgebraic subset of $$R^{n+1}$$. Such a function satisfies the KŁ property, see, e.g., [23, 25, 26], with $$\varphi (s)=cs^{1-\theta }$$ for some $$\theta \in [0,1)$$ and some $$c>0$$. Moreover, semialgebraic functions enjoy the following important stability properties:

• finite sums and products of semialgebraic functions are semialgebraic;

• scalar products are semialgebraic;

• indicator functions of semialgebraic sets are semialgebraic;

• generalized inverses of semialgebraic mappings are semialgebraic;

• compositions of semialgebraic functions or mappings are semialgebraic;

• functions of the type $$R^{n} \ni x\rightarrow f(x)= \sup_{y\in C} g(x,y)$$ (resp. $$R^{n} \ni x\rightarrow f(x)= \inf_{y\in C} g(x,y)$$), where g and C are semialgebraic, are semialgebraic.
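As a simple illustration of the KŁ inequality in Definition 1 (an example chosen here, not taken from the references), consider the semialgebraic function $$f(x)=x^{2}$$ on R with $$x^{\ast }=0$$. Taking $$\varphi (s)=s^{1/2}$$, that is, $$c=1$$ and $$\theta =1/2$$, gives for every $$x\neq 0$$

$$\varphi '\bigl(f(x)-f\bigl(x^{\ast }\bigr)\bigr)d\bigl(0,\partial f(x)\bigr) = \frac{1}{2\sqrt{x^{2}}} \cdot \vert 2x \vert = \frac{1}{2 \vert x \vert } \cdot 2 \vert x \vert = 1 \geq 1,$$

so the KŁ inequality holds at $$x^{\ast }=0$$, here with equality.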

For a continuously differentiable convex function f on $$R^{n}$$, the associated Bregman distance $$\triangle _{f}$$ is defined as

$$\triangle _{f}\bigl(x^{1},x^{2}\bigr) = f \bigl(x^{1}\bigr) - f\bigl(x^{2}\bigr) - \bigl\langle \nabla f\bigl(x^{2}\bigr),x^{1}-x^{2} \bigr\rangle$$

for any $$x^{1},x^{2} \in R^{n}$$. Let us now collect some important properties of the Bregman distance.

• Nonnegativity: $$\triangle _{f}(x^{1},x^{2})\geq 0, \triangle _{f}(x^{1},x^{1})=0$$ for all $$x^{1}, x^{2}$$.

• Convexity: $$\triangle _{f}(x^{1},x^{2})$$ is convex in $$x^{1}$$, but not necessarily in $$x^{2}$$.

• Strong convexity: If f is $$\sigma _{f}$$-strongly convex, then $$\triangle _{f}(x^{1},x^{2})\geq \frac{\sigma _{f}}{2} \| x^{1}-x^{2} \|^{2}$$ for all $$x^{1}, x^{2}$$.
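The definition and the properties above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `bregman` and the two generating functions are illustrative choices, not objects from the paper. For the generator $$\frac{1}{2}\|x\|^{2}$$ the Bregman distance reduces to $$\frac{1}{2}\|x^{1}-x^{2}\|^{2}$$, which matches the strong convexity bound with modulus 1.

```python
def bregman(f, grad_f, x1, x2):
    """Bregman distance f(x1) - f(x2) - <grad f(x2), x1 - x2> for list-valued vectors."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_f(x2), x1, x2))
    return f(x1) - f(x2) - inner


# Generator 1 (illustrative): f(x) = 0.5 * ||x||^2, which is 1-strongly convex.
def half_sq_norm(x):
    return 0.5 * sum(v * v for v in x)


def grad_half_sq_norm(x):
    return list(x)


# Generator 2 (illustrative): f(x) = sum_i x_i^4, convex but not strongly convex.
def quartic(x):
    return sum(v ** 4 for v in x)


def grad_quartic(x):
    return [4.0 * v ** 3 for v in x]


x1, x2 = [1.0, -2.0], [0.5, 1.0]
d_quad = bregman(half_sq_norm, grad_half_sq_norm, x1, x2)   # equals 0.5 * ||x1 - x2||^2
d_quart = bregman(quartic, grad_quartic, x1, x2)            # nonnegative by convexity
d_self = bregman(quartic, grad_quartic, x1, x1)             # distance of a point to itself
```

Note that the distance is not symmetric in general, which is why the order of the arguments in $$\triangle _{f}(x^{1},x^{2})$$ matters.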

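For the current primal-dual iterate $$(x^{k},y^{k},\lambda ^{k})\in R^{n_{1}}\times R^{n_{2}}\times R^{m}$$, we apply the splitting idea of the classical ADMM to the structured QP subproblem (6): the variables x and y are updated alternately at each iteration, regularized by the Bregman distances $$\triangle _{\phi }(\cdot,x^{k})$$ and $$\triangle _{\psi }(\cdot,y^{k})$$, respectively, and the Lagrangian multiplier λ is updated afterwards. This procedure can be formulated as follows:

\begin{aligned} x^{k+1} ={}& \mathop{\arg \min }_{x\in R^{n_{1}}} \biggl\{ \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}} \bigl(x-x^{k}\bigr) + \frac{1}{2} \bigl\Vert x-x^{k} \bigr\Vert ^{2}_{H_{k}^{x}} - \bigl(\lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax+By^{k}-b\bigr) \\ &{} + \frac{\beta }{2} \bigl\Vert Ax+By^{k}-b \bigr\Vert ^{2} + \triangle _{\phi }\bigl(x,x^{k}\bigr) \biggr\} , \end{aligned}
(7a)
\begin{aligned} y^{k+1} ={}& \mathop{\arg \min }_{y\in R^{n_{2}}} \biggl\{ \nabla g\bigl(y^{k}\bigr)^{\mathrm{T}} \bigl(y-y^{k}\bigr) + \frac{1}{2} \bigl\Vert y-y^{k} \bigr\Vert ^{2}_{H_{k}^{y}} - \bigl(\lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax^{k+1}+By-b\bigr) \\ &{} + \frac{\beta }{2} \bigl\Vert Ax^{k+1}+By-b \bigr\Vert ^{2} + \triangle _{\psi }\bigl(y,y^{k}\bigr) \biggr\} , \end{aligned}
(7b)
\begin{aligned} \lambda ^{k+1} = \lambda ^{k} - \beta \bigl(Ax^{k+1}+By^{k+1}-b\bigr), \end{aligned}
(7c)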

where ϕ and ψ are continuously differentiable and strongly convex functions with modulus $$\sigma _{\phi },\sigma _{\psi }$$ on $$R^{n_{1}}$$ and $$R^{n_{2}}$$, respectively. Notice that, by the strong convexity of ϕ and ψ, the objective functions of (7a) and (7b) are strictly convex whenever $$H_{k}^{x} + \beta A^{\mathrm{T}} A + \sigma _{\phi }I_{n_{1}}\succ 0$$ and $$H_{k}^{y} + \beta B^{\mathrm{T}} B + \sigma _{\psi }I_{n_{2}}\succ 0$$.

Invoking the first-order optimality conditions of (7a) and (7b), we have

\begin{aligned} & \nabla f\bigl(x^{k}\bigr) + H_{k}^{x} \triangle x^{k+1} - A^{\mathrm{T}} \lambda ^{k} + \beta A^{\mathrm{T}} \bigl( A x^{k+1} +B y^{k}-b\bigr) + \nabla \phi \bigl(x^{k+1}\bigr) - \nabla \phi \bigl(x^{k} \bigr) =0, \end{aligned}
(8a)
\begin{aligned} & \nabla g\bigl(y^{k}\bigr) + H_{k}^{y} \triangle y^{k+1} - B^{\mathrm{T}} \lambda ^{k} + \beta B^{\mathrm{T}} \bigl( A x^{k+1} +B y^{k+1}-b\bigr) + \nabla \psi \bigl(y^{k+1}\bigr) - \nabla \psi \bigl(y^{k} \bigr) =0. \end{aligned}
(8b)

By the update formula (7c), these conditions can be rewritten as

\begin{aligned} & \nabla f\bigl(x^{k}\bigr) + H_{k}^{x} \triangle x^{k+1} - A^{\mathrm{T}} \lambda ^{k+1} - \beta A^{\mathrm{T}} B \triangle y^{k+1} + \nabla \phi \bigl(x^{k+1}\bigr) - \nabla \phi \bigl(x^{k}\bigr) =0, \end{aligned}
(9a)
\begin{aligned} & \nabla g\bigl(y^{k}\bigr) + H_{k}^{y} \triangle y^{k+1} - B^{\mathrm{T}} \lambda ^{k+1} + \nabla \psi \bigl(y^{k+1}\bigr) - \nabla \psi \bigl(y^{k}\bigr) =0. \end{aligned}
(9b)
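The passage from (8a) to (9a) uses only the multiplier update $$\lambda ^{k+1}=\lambda ^{k}-\beta (Ax^{k+1}+By^{k+1}-b)$$, which is the form of the update consistent with relation (21). Spelled out for the x-block, the terms involving the multiplier and the residual satisfy

\begin{aligned} - A^{\mathrm{T}} \lambda ^{k} + \beta A^{\mathrm{T}} \bigl( A x^{k+1} +B y^{k}-b\bigr) &= - A^{\mathrm{T}} \bigl[ \lambda ^{k} - \beta \bigl( A x^{k+1} +B y^{k+1}-b\bigr) \bigr] - \beta A^{\mathrm{T}} B \bigl(y^{k+1}-y^{k}\bigr) \\ &= - A^{\mathrm{T}} \lambda ^{k+1} - \beta A^{\mathrm{T}} B \triangle y^{k+1}. \end{aligned}

For the y-block, the residual in (8b) already involves $$y^{k+1}$$, so the same substitution yields (9b) without an extra term.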

Based on the above analysis and preparation, we now describe the proposed algorithm for solving problem (1) in detail (Algorithm 1).
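The following is a minimal sketch of one possible realization of the iteration, under simplifying assumptions that are not from the paper: all variables are scalars ($$n_{1}=n_{2}=m=1$$), the Bregman kernels are the quadratics $$\phi (x)=\frac{\sigma _{\phi }}{2}x^{2}$$ and $$\psi (y)=\frac{\sigma _{\psi }}{2}y^{2}$$, and $$H_{k}^{x}$$, $$H_{k}^{y}$$ are held constant. Under these choices the optimality conditions (8a) and (8b) are linear in the step, so each subproblem is solved by a single division. The function name `admm_sqp_scalar` and its parameters are hypothetical, and the toy problem is convex and serves only as a sanity check.

```python
def admm_sqp_scalar(grad_f, grad_g, a, b_coef, rhs, x, y, lam,
                    beta=1.0, hx=1.0, hy=1.0, s_phi=1.0, s_psi=1.0, iters=300):
    """Scalar sketch of the x-update, y-update, multiplier-update loop.

    Constraint: a*x + b_coef*y = rhs. hx and hy play the role of H_k^x and H_k^y
    (kept constant here, an assumption); s_phi and s_psi are the moduli of the
    quadratic Bregman kernels (also an assumption).
    """
    for _ in range(iters):
        # x-step, from (8a): (hx + beta*a^2 + s_phi) * dx = -(model gradient at x^k).
        r = a * x + b_coef * y - rhs
        dx = -(grad_f(x) - a * lam + beta * a * r) / (hx + beta * a * a + s_phi)
        x += dx
        # y-step, from (8b): uses the freshly updated x (Gauss-Seidel order).
        r = a * x + b_coef * y - rhs
        dy = -(grad_g(y) - b_coef * lam + beta * b_coef * r) / (hy + beta * b_coef * b_coef + s_psi)
        y += dy
        # Multiplier step: lam^{k+1} = lam^k - beta * (a*x^{k+1} + b_coef*y^{k+1} - rhs).
        lam -= beta * (a * x + b_coef * y - rhs)
    return x, y, lam


# Toy problem (illustration only): min 0.5*(x-1)^2 + 0.5*(y-3)^2  s.t.  x - y = 0.
# Its KKT point is x = y = 2 with multiplier lam = 1.
x_opt, y_opt, lam_opt = admm_sqp_scalar(lambda x: x - 1.0, lambda y: y - 3.0,
                                        a=1.0, b_coef=-1.0, rhs=0.0,
                                        x=0.0, y=0.0, lam=0.0)
```

In higher dimensions the two divisions become linear solves with the matrices $$H_{k}^{x} + \beta A^{\mathrm{T}} A + \sigma _{\phi }I_{n_{1}}$$ and $$H_{k}^{y} + \beta B^{\mathrm{T}} B + \sigma _{\psi }I_{n_{2}}$$, again assuming quadratic kernels.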

### Remark 1

At first glance, one might view the ADMM-based SQP method proposed in this paper as a special case of the well-known Bregman ADMM , whose iterative scheme is given as follows:

\begin{aligned} \textstyle\begin{cases} x^{k+1}=\mathop{\arg \min }_{x\in R^{n_{1}}} \{\mathcal{L}_{\beta }(x,y^{k}, \lambda ^{k}) + \triangle _{\tilde{\phi }_{k}}(x,x^{k})\}, \\ y^{k+1}=\mathop{\arg \min }_{y\in R^{n_{2}}} \{\mathcal{L}_{\beta }(x^{k+1},y, \lambda ^{k})+ \triangle _{\tilde{\psi }_{k}}(y,y^{k})\}, \\ \lambda ^{k+1}=\lambda ^{k}- \beta (Ax^{k+1}+By^{k+1}-b), \end{cases}\displaystyle \end{aligned}

when given some concrete choices of functions $$\tilde{\phi }_{k}(x)$$ and $$\tilde{\psi }_{k}(y)$$ such as

$$\tilde{\phi }_{k}(x):=\phi (x)+\frac{1}{2} \Vert x \Vert ^{2}_{H^{x}_{k}}-f(x), \qquad\tilde{\psi }_{k}(y):=\psi (y)+ \frac{1}{2} \Vert y \Vert ^{2}_{H^{y}_{k}}-g(y).$$

However, this resemblance is purely formal. In fact, a Bregman distance requires its generating function to be convex by definition, see, e.g., [28, 29], whereas $$\tilde{\phi }_{k}$$ and $$\tilde{\psi }_{k}$$ above need not be convex: the matrices $$H^{x}_{k}$$ and $$H^{y}_{k}$$ in this paper are not required to be positive semidefinite or positive definite, and neither f nor g is necessarily convex.

## Convergence analysis

This section is devoted to the convergence analysis of the ADMM-based SQP method introduced in Sect. 2. First, we consider some basic assumptions as follows.

### Assumption 1

Let $$f:R^{n_{1}}\rightarrow R$$ and $$g:R^{n_{2}}\rightarrow R$$ be continuously differentiable functions, and let $$\min \{\sigma _{0},\sigma _{\phi },\sigma _{\psi }\}>0$$. Assume that the following conditions hold:

(i) $${B B^{\mathrm{T}}}\succeq \sigma _{0} I$$, namely B has full row rank;

(ii) ϕ and ψ are strongly convex with modulus $$\sigma _{\phi },\sigma _{\psi }$$, respectively;

(iii) ∇f, ∇g, ∇ϕ, and ∇ψ are Lipschitz continuous with modulus $$\ell _{f}, \ell _{g}, \ell _{\phi }, \ell _{\psi }> 0$$, respectively.

### Assumption 2

The matrix sequences $$\{H_{k}^{x}\}$$ and $$\{H_{k}^{y}\}$$ are bounded, and there exist constants $$h,\eta ^{x},\eta ^{y}>0$$ such that

$$\bigl\Vert H_{k}^{x} \bigr\Vert \leq h,\qquad \bigl\Vert H_{k}^{y} \bigr\Vert \leq h,\qquad \mathcal{H}_{k}^{x} \succeq \eta ^{x} I_{n_{1}},\qquad \mathcal{H}_{k}^{y}\succeq \eta ^{y} I_{n_{2}}, \quad\forall k,$$
(10)

where

\begin{aligned} & \mathcal{H}_{k}^{x}:= H_{k}^{x} + \sigma _{\phi }I_{n_{1}}-\ell _{f} I_{n_{1}}, \end{aligned}
(11a)
\begin{aligned} & \mathcal{H}_{k}^{y}:= H_{k}^{y} + \sigma _{\psi }I_{n_{2}} - \biggl( \ell _{g} + \frac{4 (\ell _{\psi }+h)^{2} +4(\ell _{g}+\ell _{\psi }+h )^{2}}{\beta \sigma _{0}} \biggr)I_{n_{2}}. \end{aligned}
(11b)

### Remark 2

To guarantee that the final two relations in (10) of Assumption 2 hold, it suffices to choose the quantities involved such that

$$\inf \sigma _{H^{x}_{k}} + \sigma _{\phi }- \ell _{f}\geq \eta ^{x},$$

and

$$\beta > \bar{\beta }:= \frac{4[(\ell _{\psi }+h)^{2} + (\ell _{g} +\ell _{\psi }+ h )^{2}] }{\sigma _{0} (\inf \sigma _{H^{y}_{k}} + \sigma _{\psi }- \ell _{g}-\eta ^{y})}>0,\quad \forall k,$$
(12)

where $$\sigma _{H^{x}_{k}}$$ and $$\sigma _{H^{y}_{k}}$$ are the minimum eigenvalues of matrices $$H_{k}^{x}$$ and $$H_{k}^{y}$$, respectively. Moreover, under these choices both QP subproblems (7a) and (7b) have a unique optimal solution.
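The threshold in (12) is easy to evaluate once the constants are known. The sketch below, with hypothetical helper names and made-up constants, computes $$\bar{\beta }$$ and then checks that a penalty parameter above the threshold leaves the smallest eigenvalue of the matrix in (11b) at least $$\eta ^{y}$$.

```python
def beta_bar(l_g, l_psi, h, sigma_0, min_eig_hy, sigma_psi, eta_y):
    """Threshold (12): 4[(l_psi+h)^2 + (l_g+l_psi+h)^2] / (sigma_0 * gap)."""
    gap = min_eig_hy + sigma_psi - l_g - eta_y
    if gap <= 0:
        raise ValueError("need inf sigma_{H^y} + sigma_psi - l_g - eta_y > 0")
    return 4.0 * ((l_psi + h) ** 2 + (l_g + l_psi + h) ** 2) / (sigma_0 * gap)


def curly_hy_min_eig(beta, l_g, l_psi, h, sigma_0, min_eig_hy, sigma_psi):
    """Smallest eigenvalue of the matrix in (11b) when H_k^y has smallest eigenvalue min_eig_hy."""
    penalty = (4.0 * (l_psi + h) ** 2 + 4.0 * (l_g + l_psi + h) ** 2) / (beta * sigma_0)
    return min_eig_hy + sigma_psi - l_g - penalty


# Illustrative constants (made up for this example, not from the paper).
consts = dict(l_g=1.0, l_psi=3.0, h=1.0, sigma_0=1.0, min_eig_hy=0.5, sigma_psi=3.0)
threshold = beta_bar(eta_y=0.5, **consts)                      # 164 / 2
margin = curly_hy_min_eig(beta=2.0 * threshold, **consts)      # must be at least eta_y = 0.5
```

The example also shows how conservative the bound can be: even for moderate Lipschitz constants the admissible penalty parameter is large.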

To design an appropriate merit function for problem (1), we introduce a modified potential function $$\hat{\mathcal{L}}_{\beta }: R^{n_{1}}\times R^{n_{2}}\times R^{m} \times R^{n_{2}}\rightarrow R$$ defined as

$$\hat{\mathcal{L}}_{\beta }(\hat{w}):= \mathcal{L}_{\beta }(w) + \delta \Vert y- \hat{y} \Vert ^{2},\qquad \delta:= \frac{2(\ell _{g}+\ell _{\psi }+ h)^{2}}{\beta \sigma _{0}}.$$
(13)

Before giving the descent property of $$\hat{\mathcal{L}}_{\beta }(\cdot )$$, we first establish a series of technical results that help characterize the convergence properties of the ADMM-based SQP algorithm. We begin with an upper estimate for the quantity $$\| \triangle \lambda ^{k+1} \|^{2}$$.

### Lemma 2

Suppose that Assumptions 1 and 2 are satisfied. Then we have

$$\bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert ^{2} \leq \frac{2 (\ell _{\psi }+ h)^{2}}{\sigma _{0}} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} + \frac{2(\ell _{g}+\ell _{\psi }+ h)^{2}}{\sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}.$$
(14)

### Proof

First, it follows directly from Assumption 1(i) that

$$\bigl\Vert B^{\mathrm{T}} \triangle \lambda ^{k+1} \bigr\Vert ^{2} \geq \sigma _{0} \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert ^{2}.$$
(15)

Again, by the optimality condition (9b), we have

$$B^{\mathrm{T}} \lambda ^{k+1} = \nabla g \bigl(y^{k}\bigr) + H_{k}^{y} \triangle y^{k+1} + \nabla \psi \bigl(y^{k+1}\bigr) - \nabla \psi \bigl(y^{k}\bigr).$$
(16)

Then, using equality (16) at iterations k and k−1, Assumption 1(iii), and the boundedness of $$\{ H_{k}^{y} \}$$ in Assumption 2, we obtain

\begin{aligned} & \bigl\Vert B^{\mathrm{T}} \triangle \lambda ^{k+1} \bigr\Vert ^{2} \\ &\quad= \bigl\Vert \bigl(\nabla g\bigl(y^{k}\bigr) + H_{k}^{y} \triangle y^{k+1} + \nabla \psi \bigl(y^{k+1}\bigr) - \nabla \psi \bigl(y^{k}\bigr)\bigr) \\ &\qquad{} -\bigl(\nabla g\bigl(y^{k-1}\bigr) + H_{k-1}^{y} \triangle y^{k} + \nabla \psi \bigl(y^{k}\bigr) - \nabla \psi \bigl(y^{k-1}\bigr)\bigr) \bigr\Vert ^{2} \\ &\quad \leq \bigl[ (\ell _{\psi }+ h) \bigl\Vert \triangle y^{k+1} \bigr\Vert + ( \ell _{g}+ \ell _{\psi }+ h) \bigl\Vert \triangle y^{k} \bigr\Vert \bigr]^{2} \\ &\quad\leq 2 (\ell _{\psi }+ h)^{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} + 2(\ell _{g}+ \ell _{\psi }+ h)^{2} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}, \end{aligned}
(17)

which together with relation (15) immediately establishes the assertion. □
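The first inequality in (17) compresses several elementary bounds. As a sketch of the intermediate step, grouping the difference of the two brackets term by term and using the Lipschitz constants $$\ell _{g}$$, $$\ell _{\psi }$$ of ∇g, ∇ψ together with $$\Vert H_{k}^{y} \Vert \leq h$$ gives

\begin{aligned} \bigl\Vert B^{\mathrm{T}} \triangle \lambda ^{k+1} \bigr\Vert \leq{}& \bigl\Vert \nabla g\bigl(y^{k}\bigr) - \nabla g\bigl(y^{k-1}\bigr) \bigr\Vert + \bigl\Vert H_{k}^{y} \triangle y^{k+1} \bigr\Vert + \bigl\Vert H_{k-1}^{y} \triangle y^{k} \bigr\Vert \\ &{} + \bigl\Vert \nabla \psi \bigl(y^{k+1}\bigr) - \nabla \psi \bigl(y^{k}\bigr) \bigr\Vert + \bigl\Vert \nabla \psi \bigl(y^{k}\bigr) - \nabla \psi \bigl(y^{k-1}\bigr) \bigr\Vert \\ \leq{}& (\ell _{\psi }+ h) \bigl\Vert \triangle y^{k+1} \bigr\Vert + (\ell _{g}+ \ell _{\psi }+ h) \bigl\Vert \triangle y^{k} \bigr\Vert . \end{aligned}

The last inequality in (17) is then the elementary estimate $$(a+b)^{2}\leq 2a^{2}+2b^{2}$$.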

To proceed, the following lemma bounds the pointwise change of the augmented Lagrangian function.

### Lemma 3

Suppose that Assumptions 1 and 2 are satisfied, and let $$\{w^{k}\}$$ be the sequence generated by the ADMM-based SQP method. Then the following assertion is true:

\begin{aligned} &\mathcal{L}_{\beta }\bigl(w^{k+1}\bigr) - \mathcal{L}_{\beta }\bigl(w^{k}\bigr) \\ &\quad \leq - \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{(H_{k}^{x} + \sigma _{\phi }I_{n_{1}}-\ell _{f} I_{n_{1}}) }+ \frac{2 (\ell _{g}+\ell _{\psi }+ h )^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2} \\ &\qquad{} - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{(H_{k}^{y} +\sigma _{\psi }I_{n_{2}} - (\ell _{g} + \frac{4 (\ell _{\psi }+ h)^{2}}{\beta \sigma _{0}})I_{n_{2}})}. \end{aligned}
(18)

### Proof

By Assumption 1(iii), ∇f and ∇g are Lipschitz continuous, so we can deduce

\begin{aligned} & f\bigl(x^{k+1}\bigr)- f\bigl(x^{k}\bigr) - \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}} \bigl(x^{k+1}-x^{k} \bigr) \\ &\quad = \int _{ 0}^{1} \nabla f \bigl(x^{k} + s \bigl(x^{k+1}-x^{k}\bigr) \bigr)^{\mathrm{T}} \bigl(x^{k+1}-x^{k}\bigr) \,ds - \int _{ 0}^{1} \nabla f \bigl(x^{k} \bigr)^{\mathrm{T}} \bigl(x^{k+1}-x^{k}\bigr) \,ds \\ &\quad = \int _{0}^{1} \bigl[ \nabla f\bigl(x^{k} + s\bigl(x^{k+1}-x^{k}\bigr)\bigr) - \nabla f \bigl(x^{k} \bigr) \bigr]^{\mathrm{T}} \bigl(x^{k+1}-x^{k} \bigr) \,ds \\ &\quad\leq \int _{0}^{1} \bigl\Vert \nabla f \bigl(x^{k} + s\bigl(x^{k+1}-x^{k}\bigr)\bigr) - \nabla f\bigl(x^{k} \bigr) \bigr\Vert \cdot \bigl\Vert x^{k+1}-x^{k} \bigr\Vert \,ds \\ &\quad\leq \ell _{f} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2} \int _{0}^{1} s \,ds = \frac{\ell _{f} }{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}, \end{aligned}
(19)

and similarly

$$g\bigl(y^{k+1}\bigr)\leq g\bigl(y^{k}\bigr) + \nabla g\bigl(y^{k}\bigr)^{\mathrm{T}} \bigl(y^{k+1}-y^{k} \bigr) + \frac{\ell _{g}}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}.$$
(20)
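Inequalities (19) and (20) do not require convexity, only a Lipschitz continuous gradient. A quick numerical illustration with a nonconvex function chosen here for convenience: f = sin, whose derivative cos is Lipschitz with constant 1. The helper name `descent_gap` is hypothetical.

```python
import math


def descent_gap(f, df, lip, x, d):
    """Left side minus right side of f(x + d) <= f(x) + f'(x) * d + (lip / 2) * d^2."""
    return f(x + d) - (f(x) + df(x) * d + 0.5 * lip * d * d)


# Evaluate the gap on a grid of base points x and steps d in [-5, 5].
grid = [i * 0.25 for i in range(-20, 21)]
worst_gap = max(descent_gap(math.sin, math.cos, 1.0, x, d) for x in grid for d in grid)
```

A nonpositive `worst_gap` (up to rounding) means the quadratic upper bound holds at every grid pair, even in regions where sin is concave or convex.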

By the definition of $$\mathcal{L}_{\beta }(\cdot )$$ and formula (7c), we have

\begin{aligned} & \mathcal{L}_{\beta }\bigl(x^{k+1},y^{k+1}, \lambda ^{k+1}\bigr) -\mathcal{L}_{\beta }\bigl(x^{k+1},y^{k+1}, \lambda ^{k}\bigr) \\ &\quad =\bigl(\lambda ^{k} -\lambda ^{k+1}\bigr)^{\mathrm{T}} \bigl(Ax^{k+1}+By^{k+1}-b \bigr) = \frac{1 }{\beta } \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert ^{2}. \end{aligned}
(21)

Moreover, since $$y^{k+1}$$ is a minimizer of (7b), then using the strong convexity of ψ and relation (20), we have

\begin{aligned} & \mathcal{L}_{\beta }\bigl(x^{k+1},y^{k+1}, \lambda ^{k}\bigr) -\mathcal{L}_{\beta }\bigl(x^{k+1},y^{k}, \lambda ^{k}\bigr) \\ &\quad = f\bigl(x^{k+1}\bigr) + g\bigl(y^{k+1}\bigr) -\bigl(\lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax^{k+1}+By^{k+1}-b \bigr) + \frac{\beta }{2} \bigl\Vert Ax^{k+1}+By^{k+1}-b \bigr\Vert ^{2} \\ &\qquad{}- \biggl[f\bigl(x^{k+1}\bigr) + g\bigl(y^{k}\bigr) -\bigl( \lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax^{k+1}+By^{k}-b \bigr) + \frac{\beta }{2} \bigl\Vert Ax^{k+1}+By^{k}-b \bigr\Vert ^{2} \biggr] \\ &\quad \leq g\bigl(y^{k+1}\bigr)- g\bigl(y^{k}\bigr) -\nabla g \bigl(y^{k}\bigr)^{\mathrm{T}} \bigl(y^{k+1}-y^{k} \bigr) - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{H_{k}^{y}}- \triangle _{\psi }\bigl(y^{k+1},y^{k} \bigr) \\ &\quad \leq \frac{\ell _{g}}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{H_{k}^{y}} -\frac{\sigma _{\psi }}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} \\ &\quad = -\frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{(H_{k}^{y} +\sigma _{\psi }I_{n_{2}}- \ell _{g} I_{n_{2}})}. \end{aligned}
(22)

Again, similarly recalling the update for x-subproblem (7a), one also has

\begin{aligned} &\mathcal{L}_{\beta }\bigl(x^{k+1},y^{k}, \lambda ^{k}\bigr) -\mathcal{L}_{\beta }\bigl(x^{k},y^{k}, \lambda ^{k}\bigr) \\ &\quad = f\bigl(x^{k+1}\bigr)+ g\bigl(y^{k}\bigr) -\bigl(\lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax^{k+1}+By^{k}-b \bigr) + \frac{\beta }{2} \bigl\Vert Ax^{k+1}+By^{k}-b \bigr\Vert ^{2} \\ &\qquad{}- \biggl[f\bigl(x^{k}\bigr) + g\bigl(y^{k}\bigr) -\bigl( \lambda ^{k}\bigr)^{\mathrm{T}} \bigl(Ax^{k}+By^{k}-b \bigr) + \frac{\beta }{2} \bigl\Vert Ax^{k}+By^{k}-b \bigr\Vert ^{2}\biggr] \\ &\quad \leq f\bigl(x^{k+1}\bigr)- f\bigl(x^{k}\bigr) -\nabla f \bigl(x^{k}\bigr)^{\mathrm{T}} \bigl(x^{k+1}-x^{k} \bigr) - \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{H_{k}^{x}} - \triangle _{\phi }\bigl(x^{k+1},x^{k} \bigr) \\ &\quad \leq \frac{\ell _{f}}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2} - \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{H_{k}^{x}} -\frac{\sigma _{\phi }}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2} \\ &\quad = -\frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{(H_{k}^{x} + \sigma _{\phi }I_{n_{1}}- \ell _{f} I_{n_{1}})}. \end{aligned}
(23)

Summing up relations (21), (22), and (23), we obtain

\begin{aligned} \mathcal{L}_{\beta }\bigl(w^{k+1}\bigr) - \mathcal{L}_{\beta } \bigl(w^{k}\bigr) \leq{}& \frac{1 }{\beta } \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert ^{2} -\frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{(H_{k}^{x} + \sigma _{\phi }I_{n_{1}}-\ell _{f} I_{n_{1}})} \\ &{} - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{(H_{k}^{y} +\sigma _{\psi }I_{n_{2}}- \ell _{g} I_{n_{2}}) }. \end{aligned}

This, combined with relation (14), justifies the conclusion. □
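To make the last step explicit, dividing (14) by β gives

\begin{aligned} \frac{1 }{\beta } \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert ^{2} \leq \frac{2 (\ell _{\psi }+ h)^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} + \frac{2(\ell _{g}+\ell _{\psi }+ h)^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}, \end{aligned}

and the first term on the right is absorbed into the weighted norm of $$\triangle y^{k+1}$$ through the identity, valid for any symmetric matrix M, scalar c, and vector v,

$$-\frac{1}{2} \Vert v \Vert ^{2}_{M} + c \Vert v \Vert ^{2} = -\frac{1}{2} \Vert v \Vert ^{2}_{M - 2cI}.$$

With $$c = 2(\ell _{\psi }+h)^{2}/(\beta \sigma _{0})$$ this produces the factor 4 appearing in (18).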

We next give the descent property of $$\hat{\mathcal{L}}_{\beta }(\cdot )$$, i.e., the sequence $$\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}$$ is monotonically nonincreasing under Assumptions 1 and 2.

### Lemma 4

Suppose that Assumptions 1 and 2 hold, and let $$\{w^{k}\}$$ be the sequence generated by the ADMM-based SQP method. Then we have

$$\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k+1}\bigr) \leq \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr) - \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{\mathcal{H}_{k}^{x} } - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{\mathcal{H}_{k}^{y}},$$
(24)

where matrices $$\mathcal{H}_{k}^{x}$$ and $$\mathcal{H}_{k}^{y}$$ are defined in (11a) and (11b), respectively.

### Proof

Clearly, we observe from Lemma 3 that

\begin{aligned} &\mathcal{L}_{\beta }\bigl(w^{k+1}\bigr) + \frac{2 (\ell _{g}+\ell _{\psi }+ h )^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} \\ &\quad \leq \mathcal{L}_{\beta }\bigl(w^{k}\bigr) + \frac{2 (\ell _{g}+\ell _{\psi }+ h )^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2} - \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{(H_{k}^{x} + \sigma _{\phi }I_{n_{1}}- \ell _{f} I_{n_{1}}) } \\ &\qquad{} - \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{ (H_{k}^{y} +\sigma _{\psi }I_{n_{2}} -(\ell _{g} + \frac{4 (\ell _{\psi }+ h)^{2} +4(\ell _{g}+\ell _{\psi }+ h )^{2}}{\beta \sigma _{0}})I_{n_{2}} )}. \end{aligned}

Thus, using the definitions of $$\hat{\mathcal{L}}_{\beta }(\cdot )$$, $$\mathcal{H}_{k}^{x}$$, and $$\mathcal{H}_{k}^{y}$$, the conclusion follows immediately. □

Notice that the boundedness of the iterative sequence $$\{ w^{k}\}$$ generated by the ADMM-based SQP method plays an important role in the existence of a cluster point, so it is necessary to consider some additional conditions as follows.

### Assumption 3

Assume that the following conditions are satisfied:

(i) There exists a constant $$\tilde{\sigma } > 0$$ with $$g^{\ast }:= \inf_{y} \{ g(y) - \tilde{\sigma } \| \nabla g(y) \| ^{2} \} >-\infty$$;

(ii) f and g are coercive, i.e., $$\lim_{\| x \|\rightarrow +\infty } f(x) =+\infty, \lim_{\| y \| \rightarrow +\infty } g(y) = +\infty$$;

(iii) $$\beta > \max \{\bar{\beta }, \frac{1}{\tilde{\sigma }\sigma _{0}}\}$$.

We now prove the boundedness of the iterative sequence $$\{ w^{k}\}$$, which in turn shows that the potential function constructed above is bounded from below.

### Lemma 5

Suppose that Assumptions 1, 2, and 3 are all satisfied, and let $$\{w^{k}\}$$ be the sequence generated by the ADMM-based SQP method. Then $$\{ w^{k}\}$$ is bounded, and there exists a constant $$\underline{\mathcal{L}}$$ such that $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) \geq \underline{\mathcal{L}} > - \infty, \forall k>0$$.

### Proof

First, by completing the square, we have

\begin{aligned} \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) ={}& f\bigl(x^{k}\bigr) +g\bigl(y^{k}\bigr) -\bigl( \lambda ^{k}\bigr)^{\mathrm{T}} \bigl(A x^{k} +B y^{k} -b \bigr) \\ &{} + \frac{\beta }{2} \bigl\Vert A x^{k} +B y^{k} -b \bigr\Vert ^{2} + \delta \bigl\Vert \triangle y^{k} \bigr\Vert ^{2} \\ ={}& f\bigl(x^{k}\bigr) +g\bigl(y^{k}\bigr) + \frac{\beta }{2} \biggl\Vert A x^{k} +B y^{k} -b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} \\ & {}- \frac{1}{2\beta } \bigl\Vert \lambda ^{k} \bigr\Vert ^{2} + \delta \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}. \end{aligned}
(25)
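The second equality in (25) can be checked directly. Writing $$r:=Ax^{k}+By^{k}-b$$ (a shorthand used only here), expanding the square gives

$$\frac{\beta }{2} \biggl\Vert r - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} = \frac{\beta }{2} \Vert r \Vert ^{2} - \bigl(\lambda ^{k}\bigr)^{\mathrm{T}} r + \frac{1}{2\beta } \bigl\Vert \lambda ^{k} \bigr\Vert ^{2},$$

so that $$-(\lambda ^{k})^{\mathrm{T}} r + \frac{\beta }{2} \Vert r \Vert ^{2} = \frac{\beta }{2} \Vert r - \lambda ^{k}/\beta \Vert ^{2} - \frac{1}{2\beta } \Vert \lambda ^{k} \Vert ^{2}$$, which is exactly the rearrangement used in (25).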

Next, recalling relation (16), Assumption 1(iii), and Assumption 2 gives

\begin{aligned} \sigma _{0} \bigl\Vert \lambda ^{k} \bigr\Vert ^{2} & \leq \bigl\Vert B^{\mathrm{T}} \lambda ^{k} \bigr\Vert ^{2} = \bigl\Vert \nabla g\bigl(y^{k-1}\bigr) + H_{k-1}^{y} \triangle y^{k} + \nabla \psi \bigl(y^{k}\bigr)- \nabla \psi \bigl(y^{k-1}\bigr) \bigr\Vert ^{2} \\ & \leq \bigl( \bigl\Vert \nabla g\bigl(y^{k-1}\bigr)- \nabla g \bigl(y^{k}\bigr) \bigr\Vert + \bigl\Vert \nabla g \bigl(y^{k}\bigr) \bigr\Vert + ( \ell _{\psi }+ h ) \bigl\Vert \triangle y^{k} \bigr\Vert \bigr)^{2} \\ & \leq \bigl( \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert + ( \ell _{g} + \ell _{\psi }+ h) \bigl\Vert \triangle y^{k} \bigr\Vert \bigr)^{2} \\ & \leq 2 \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert ^{2} + 2(\ell _{g} + \ell _{\psi }+ h)^{2} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}, \end{aligned}

and hence

\begin{aligned} \frac{1}{2\beta } \bigl\Vert \lambda ^{k} \bigr\Vert ^{2} \leq \frac{1}{\beta \sigma _{0}} \bigl\Vert \nabla g \bigl(y^{k}\bigr) \bigr\Vert ^{2} + \frac{(\ell _{g} + \ell _{\psi }+ h)^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}. \end{aligned}
(26)

Substituting this into (25) and invoking the definition of δ in (13), we obtain

\begin{aligned} \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) \geq{}& f\bigl(x^{k}\bigr) +g\bigl(y^{k}\bigr) - \frac{1}{\beta \sigma _{0}} \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert ^{2} \\ &{} + \frac{\beta }{2} \biggl\Vert A x^{k} +B y^{k} -b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2} + \frac{ (\ell _{g} +\ell _{\psi }+ h)^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2} \\ ={}& f\bigl(x^{k}\bigr) + \bigl(g\bigl(y^{k}\bigr) -\tilde{ \sigma } \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert ^{2} \bigr) +\biggl( \tilde{\sigma }- \frac{1}{\beta \sigma _{0}}\biggr) \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert ^{2} \\ &{} + \frac{\beta }{2} \biggl\Vert A x^{k} +B y^{k} -b - \frac{\lambda ^{k}}{\beta } \biggr\Vert ^{2}+ \frac{(\ell _{g} + \ell _{\psi }+ h)^{2}}{\beta \sigma _{0}} \bigl\Vert \triangle y^{k} \bigr\Vert ^{2}. \end{aligned}
(27)

On the other hand, from Lemma 4 and Assumption 2, we know that $$\{ \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}$$ is nonincreasing, and thus we have

$$\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{1} \bigr) \geq \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr).$$
(28)

Since f is coercive, i.e., $$\lim_{\| x \|\rightarrow +\infty } f(x) =+\infty$$, we have $$f^{\ast }:=\inf_{ x }f(x) > -\infty$$. Together with Assumption 3 and relations (27) and (28), this implies that the sequences $$\{x^{k}\}$$, $$\{\| \nabla g(y^{k}) \|\}$$, and $$\{\|\triangle y^{k}\|\}$$ are bounded, and the boundedness of $$\{\lambda ^{k}\}$$ then follows directly from estimate (26). Moreover, $$\{y^{k}\}$$ is also bounded, since the coercivity $$\lim_{\| y \|\rightarrow +\infty } g(y) = +\infty$$ implies that $$\inf_{ y } g(y) > -\infty$$. Therefore, the sequence $$\{w^{k}\}$$ is bounded.

Finally, dropping the nonnegative terms of (27), one has

$$\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) \geq f \bigl(x^{k}\bigr) +g\bigl(y^{k}\bigr) - \tilde{\sigma } \bigl\Vert \nabla g\bigl(y^{k}\bigr) \bigr\Vert ^{2}\geq \underline{\mathcal{L}},$$

where $$\underline{\mathcal{L}}:=f^{\ast }+g^{\ast }$$, and the proof is completed. □

Now, we are ready to establish the global convergence of the ADMM-based SQP method.

### Theorem 1

Suppose that Assumptions 1, 2, and 3 are all satisfied, and let $$\{w^{k}\}$$ be the sequence generated by the ADMM-based SQP method. Then

(i) $$\lim_{k\rightarrow \infty } ( \| \triangle x^{k+1} \| + \| \triangle y^{k+1}\| +\| \triangle \lambda ^{k+1}\| ) =0$$;

(ii) any cluster point $$w^{\ast }$$ of the sequence $$\{w^{k}\}$$ is a KKT point of problem (1).

### Proof

(i) Note first from Lemma 4 that

$$\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) -\hat{ \mathcal{L}}_{\beta }\bigl( \hat{w}^{k+1}\bigr) \geq \frac{1}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{ \mathcal{H}_{k}^{x} } + \frac{1}{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{\mathcal{H}_{k}^{y}}.$$

Thus, summing this inequality from $$k=1$$ to n and using Lemma 5 yields

\begin{aligned} \infty & > \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{1}\bigr) - \underline{\mathcal{L}} \geq \hat{ \mathcal{L}}_{\beta }\bigl(\hat{w}^{1}\bigr) - \hat{ \mathcal{L}}_{\beta }\bigl(\hat{w}^{n+1}\bigr) \\ & \geq \frac{1}{2}\sum_{k=1}^{n} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}_{\mathcal{H}_{k}^{x} } + \frac{1}{2}\sum_{k=1}^{n} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2}_{\mathcal{H}_{k}^{y}}. \end{aligned}
(29)

Again, by Assumption 2, we know that, for any k, both matrices $$\mathcal{H}_{k}^{x}$$ and $$\mathcal{H}_{k}^{y}$$ are positive definite. Thus

$$\sum_{k=1}^{\infty } \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2} < \infty\quad \text{and} \quad\sum _{k=1}^{\infty } \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} < \infty.$$

Furthermore, in view of estimate (14), one also has $$\sum_{k=1}^{\infty }\|\triangle \lambda ^{k+1} \|^{2} < \infty$$. So,

$$\bigl\Vert \triangle x^{k+1} \bigr\Vert \rightarrow 0, \qquad\bigl\Vert \triangle y^{k+1} \bigr\Vert \rightarrow 0, \qquad \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert \rightarrow 0,\quad k \rightarrow \infty,$$
(30)

which completes assertion (i).

(ii) By Lemma 5, we argue that $$\{ w^{k} \}$$ is bounded and thus there exists at least one cluster point. Assume that $$w^{\ast }$$ is a cluster point of $$\{ w^{k} \}$$, and let $$\{ w^{k_{j}}\}$$ be a convergent subsequence such that $$\lim_{j\rightarrow \infty } w^{k_{j}}=w^{\ast }$$. On the other hand, from Assumption 1(iii) and assertion (i), we know

$$\nabla \phi \bigl(x^{k+1}\bigr)- \nabla \phi \bigl(x^{k}\bigr)\rightarrow 0,\qquad \nabla \psi \bigl(y^{k+1}\bigr)- \nabla \psi \bigl(y^{k}\bigr)\rightarrow 0, \quad k\rightarrow \infty.$$
(31)

In view of this, and taking limit in (7c), (9a), and (9b) along the sequence $$\{ w^{k_{j}}\}$$, we obtain

$$\nabla f\bigl(x^{\ast }\bigr)- A^{\mathrm{T}} \lambda ^{\ast }=0,\qquad \nabla g\bigl(y^{\ast }\bigr)- B^{ \mathrm{T}} \lambda ^{\ast }=0,\qquad A x^{\ast }+B y^{\ast }=b.$$

This implies that $$w^{\ast }$$ is a KKT point of problem (1). □
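The three limiting relations above also suggest a practical way to monitor how far an iterate is from being a KKT point of problem (1). The following is only a minimal sketch in plain Python, not part of the method analyzed in this paper: the helper names (`matvec`, `transpose`, `norm`, `kkt_residual`) and the toy instance are assumptions introduced for illustration, and matrices are stored as lists of rows.

```python
import math

def matvec(M, v):
    """Return the product M v for a matrix M stored as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(t * t for t in v))

def kkt_residual(grad_f, grad_g, A, B, b, x, y, lam):
    """Largest norm among the three KKT residuals of problem (1)."""
    At_lam = matvec(transpose(A), lam)
    Bt_lam = matvec(transpose(B), lam)
    r_x = [gf - a for gf, a in zip(grad_f(x), At_lam)]   # grad f(x) - A^T lam
    r_y = [gg - c for gg, c in zip(grad_g(y), Bt_lam)]   # grad g(y) - B^T lam
    Ax, By = matvec(A, x), matvec(B, y)
    r_c = [p + q - r for p, q, r in zip(Ax, By, b)]      # A x + B y - b
    return max(norm(r_x), norm(r_y), norm(r_c))

# Hypothetical toy instance: f(x) = 0.5*||x||^2, g(y) = 0.5*||y||^2,
# A = B = I, b = (2, 2); its KKT point is x = y = lam = (1, 1).
I2 = [[1.0, 0.0], [0.0, 1.0]]
identity_grad = lambda v: list(v)  # gradient of 0.5*||v||^2
res_kkt = kkt_residual(identity_grad, identity_grad, I2, I2, [2.0, 2.0],
                       [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
res_bad = kkt_residual(identity_grad, identity_grad, I2, I2, [2.0, 2.0],
                       [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

In this toy instance `res_kkt` vanishes at the KKT point, while `res_bad` equals the constraint violation at the origin. Taking the maximum of the three norms is one possible design choice; a sum or a scaled combination would serve equally well as a stopping indicator.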

It is well known that when the potential function has the geometric property known as the KŁ property, Theorem 1 can typically be strengthened: the whole iterative sequence converges, so its limit point is unique.

### Theorem 2

Suppose that Assumptions 1, 2, and 3 hold, and suppose that f and g are semialgebraic functions. Let $$\{w^{k}\}$$ be the sequence generated by the ADMM-based SQP method. Then the whole sequence $$\{ w^{k} \}$$ converges to a KKT point of problem (1).

### Proof

Clearly, by Theorem 1, it suffices to prove that the sequence $$\{ w^{k}\}$$ is convergent. From Lemmas 4 and 5, we know that $$\{ \hat{w}^{k} \}$$ is also bounded and that $$\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}$$ is nonincreasing and bounded from below, so the limit $$\mathcal{L}^{\ast }:=\lim_{k\rightarrow \infty } \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})$$ exists. The rest of the proof is divided into two cases.

Suppose first that $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{N}) = \mathcal{L}^{\ast }$$ for some $$N \geq 1$$. Since $$\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}$$ is nonincreasing, we have $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) = \mathcal{L}^{\ast }$$ for any $$k \geq N$$. Then, according to Lemma 4, $$x^{N+t} = x^{N}$$ and $$y^{N+t} = y^{N}$$ hold for any $$t \geq 0$$. This implies that both sequences $$\{ x^{k} \}$$ and $$\{ y^{k} \}$$ converge in finitely many steps. Furthermore, it follows from relation (14) that $$\lambda ^{N+t} = \lambda ^{N+1}$$ for any $$t\geq 1$$. Hence, $$\{ \lambda ^{k} \}$$ is also convergent, which justifies the assertion.

Suppose next that $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) > \mathcal{L}^{\ast }$$ for all k. Note that $$\hat{\mathcal{L}}_{\beta }(\cdot )$$ is a semialgebraic function due to the semialgebraicity of f and g, hence it is a KŁ function. The analysis of this case proceeds in three steps.

To begin with, we prove that $$\hat{\mathcal{L}}_{\beta }(\cdot )$$ is constant on Ω, where Ω is the set of all cluster points of the sequence $$\{ \hat{w}^{k} \}$$, and then utilize the uniformized KŁ property in Lemma 1.

By the boundedness of $$\{ \hat{w}^{k} \}$$, Ω is nonempty and $$d(\hat{w}^{k},\varOmega ) \rightarrow 0$$. Arguing as in [24, Lemma 5(iii)], we also know that Ω is a compact and connected set. Let $$\hat{w}^{\ast }\in \varOmega$$ be arbitrary, and consider a subsequence $$\{ \hat{w}^{k_{j}} \}$$ of $$\{ \hat{w}^{k} \}$$ converging to $$\hat{w}^{\ast }$$. Then, by the continuity of $$\hat{\mathcal{L}}_{\beta }(\cdot )$$, we have $$\mathcal{L}^{\ast }= \lim_{j\rightarrow \infty } \hat{\mathcal{L}}_{\beta }(\hat{w}^{k_{j}}) = \hat{\mathcal{L}}_{\beta }( \hat{w}^{\ast })$$, that is, $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\rightarrow \hat{\mathcal{L}}_{\beta }(\hat{w}^{\ast })$$. Thus, $$\hat{\mathcal{L}}_{\beta }(\cdot )$$ is constant on Ω, with value $$\mathcal{L}^{\ast }$$, since $$\hat{w}^{\ast }\in \varOmega$$ is arbitrary. Now, using Lemma 1 with its constants $$\delta _{0}>0$$ and $$\eta >0$$, and recalling that $$d(\hat{w}^{k},\varOmega ) \rightarrow 0$$ and $$\hat{\mathcal{L}}_{\beta }(\hat{w}^{k}) \rightarrow \mathcal{L}^{\ast }$$, there exists an integer $$k_{1}>0$$ such that, for all $$k\geq k_{1}$$, $$d(\hat{w}^{k},\varOmega ) < \delta _{0}$$ and $$\mathcal{L}^{\ast }< {\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})} < \mathcal{L}^{\ast }+ \eta$$, and hence

$$\varphi '\bigl({\hat{\mathcal{L}}_{\beta } \bigl(\hat{w}^{k}\bigr)} - \mathcal{L}^{\ast }\bigr) d\bigl(0, \nabla \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr)\bigr) \geq 1, \quad \forall k\geq k_{1}.$$
(32)
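As a quick illustration of the form of inequality (32), consider a one-dimensional example that is not part of the method: the function $$F(u)=\frac{1}{2}u^{2}$$ with minimum value $$F^{\ast }=0$$ and the concave function $$\varphi (s)=\sqrt{2s}$$, both of which are assumptions chosen here purely for illustration. Since $$\varphi '(s)=1/\sqrt{2s}$$, for every $$u\neq 0$$ we get

$$\varphi '\bigl(F(u)-F^{\ast }\bigr) \bigl\vert F'(u) \bigr\vert = \frac{1}{\sqrt{u^{2}}} \vert u \vert = 1 \geq 1,$$

so the KŁ inequality holds, here with equality. In (32), the role of F is played by the potential function $$\hat{\mathcal{L}}_{\beta }(\cdot )$$ evaluated along the iterates $$\hat{w}^{k}$$.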

Next, we attempt to bound the distance from 0 to $$\nabla \hat{\mathcal{L}}_{\beta }(\hat{w}^{k})$$. Taking the partial derivative of $$\hat{\mathcal{L}}_{\beta }(\cdot )$$ at $$\hat{w}^{k}$$ with respect to variable x, we have

\begin{aligned} \nabla _{x} \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr) & = \nabla f\bigl(x^{k}\bigr) - A^{\mathrm{T}} \lambda ^{k} + \beta A^{\mathrm{T}} \bigl(Ax^{k} + By^{k} -b\bigr) \\ & = - H_{k}^{x} \triangle x^{k+1} -\beta A^{\mathrm{T}} A \triangle x^{k+1} -\bigl(\nabla \phi \bigl(x^{k+1}\bigr)-\nabla \phi \bigl(x^{k}\bigr)\bigr), \end{aligned}
(33)

where the final equality follows from the optimality condition (8a). Likewise, for variable y, we have

\begin{aligned} \nabla _{y} \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr) = {}&\nabla g\bigl(y^{k}\bigr) - B^{\mathrm{T}} \lambda ^{k} + \beta B^{\mathrm{T}} \bigl(Ax^{k} + By^{k} -b\bigr) + 2 \delta \triangle y^{k} \\ = {}&{- }H_{k}^{y} \triangle y^{k+1} -\bigl(\nabla \psi \bigl(y^{k+1}\bigr)-\nabla \psi \bigl(y^{k}\bigr)\bigr) \\ &{} - \beta B^{\mathrm{T}} \bigl(A \triangle x^{k+1} + B \triangle y^{k+1} \bigr) + 2 \delta \triangle y^{k} \\ ={}&{ -} H_{k}^{y} \triangle y^{k+1} -\bigl(\nabla \psi \bigl(y^{k+1}\bigr)-\nabla \psi \bigl(y^{k}\bigr)\bigr) \\ & {}+ B^{\mathrm{T}} \bigl(\triangle \lambda ^{k+1}- \triangle \lambda ^{k} \bigr)+ 2 \delta \triangle y^{k}, \end{aligned}
(34)

where the second equality follows from (8b), and the final equality utilizes formula (7c). Additionally,

$$\nabla _{\lambda }\hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr) = -\bigl(Ax^{k} + By^{k} -b\bigr)= \frac{1}{\beta } \triangle \lambda ^{k}$$
(35)

and

$$\nabla _{\hat{y}} \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr) = - 2 \delta \triangle y^{k}.$$
(36)

Since ∇ϕ and ∇ψ are Lipschitz continuous, combining Assumption 2 with relations (33)–(36) shows that there exists a constant $$a>0$$ such that

$$d\bigl(0,\nabla \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr)\bigr) \leq a \bigl( \bigl\Vert \triangle x^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k} \bigr\Vert + \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert + \bigl\Vert \triangle \lambda ^{k} \bigr\Vert \bigr).$$
(37)

Finally, based on relations (32) and (37), we study the convergence of the entire sequence $$\{ w^{k}\}$$. For convenience, let

$$\Delta ^{k}:= \varphi \bigl(\hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k} \bigr) - \mathcal{L}^{\ast }\bigr)-\varphi \bigl( \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k+1}\bigr) - \mathcal{L}^{\ast }\bigr)$$

and

$$\ell ^{k}:= \bigl\Vert \triangle x^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k} \bigr\Vert + \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert + \bigl\Vert \triangle \lambda ^{k} \bigr\Vert .$$

As $$\{\hat{\mathcal{L}}_{\beta }(\hat{w}^{k})\}$$ is nonincreasing and, by Definition 1, φ is monotonically increasing, we have $$\Delta^{k} \geq 0$$ for $$k\geq k_{1}$$. Multiplying both sides of (37) by $$\Delta^{k}$$, and using inequality (32) and the concavity of φ, as well as Lemma 4 and Assumption 2, we obtain for all $$k \geq k_{1}$$ that

\begin{aligned} \ell ^{k} \cdot \bigl(a \Delta ^{k}\bigr) & \geq d\bigl(0,\nabla \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k} \bigr)\bigr)\cdot {\Delta }^{k} \\ & \geq d\bigl(0,\nabla \hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k}\bigr)\bigr) \cdot \varphi ' \bigl(\hat{ \mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) - \mathcal{L}^{\ast }\bigr) \cdot \bigl(\hat{\mathcal{L}}_{\beta } \bigl(\hat{w}^{k}\bigr) -\hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k+1}\bigr)\bigr) \\ & \geq \hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k}\bigr) -\hat{ \mathcal{L}}_{\beta }\bigl(\hat{w}^{k+1}\bigr) \geq \frac{1}{2} \bigl( \bigl\Vert \triangle x^{k+1} \bigr\Vert _{ \mathcal{H}_{k}^{x}}^{2} + \bigl\Vert \triangle y^{k+1} \bigr\Vert _{\mathcal{H}_{k}^{y}}^{2} \bigr) \\ & \geq \frac{\eta ^{x}}{2} \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2}+ \frac{\eta ^{y} }{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} \\ & \geq \frac{1}{2} \min \bigl\{ \eta ^{x}, \eta ^{y}\bigr\} \bigl( \bigl\Vert \triangle x^{k+1} \bigr\Vert ^{2} + \bigl\Vert \triangle y^{k+1} \bigr\Vert ^{2} \bigr) \\ & \geq \frac{1}{4} \min \bigl\{ \eta ^{x}, \eta ^{y}\bigr\} \bigl( \bigl\Vert \triangle x^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k+1} \bigr\Vert \bigr)^{2} \\ & := \sigma _{1} \bigl( \bigl\Vert \triangle x^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k+1} \bigr\Vert \bigr)^{2}, \end{aligned}
(38)

where $$\sigma _{1}:= \frac{1}{4} \min \{\eta ^{x}, \eta ^{y} \}>0$$. Now, dividing both sides of (38) by $$\sigma _{1}$$, taking the square root, and then applying the inequality $$\frac{u+v}{2}\geq \sqrt{uv}$$ for $$u,v\geq 0$$ with $$u=\ell ^{k}/m$$ and $$v=am\Delta ^{k}/\sigma _{1}$$, we get

$$\frac{\ell ^{k}}{2m} + \frac{am}{2\sigma _{1}}{\Delta }^{k} \geq \bigl\Vert \triangle x^{k+1} \bigr\Vert + \bigl\Vert \triangle y^{k+1} \bigr\Vert ,\quad k \geq k_{1},$$
(39)

where $$m>0$$ is an arbitrary constant. Then, by relation (16), Assumptions 1 and 2, we can derive

\begin{aligned} \sqrt{{\sigma _{0}}} \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert \leq{}& \bigl\Vert B^{\mathrm{T}} \triangle \lambda ^{k+1} \bigr\Vert \\ = {}&\bigl\Vert \bigl(\nabla g\bigl(y^{k}\bigr) + H_{k}^{y} \triangle y^{k+1} + \nabla \psi \bigl(y^{k+1}\bigr) -\nabla \psi \bigl(y^{k}\bigr) \bigr) \\ & {} - \bigl(\nabla g\bigl(y^{k-1}\bigr)+ H_{k-1}^{y} \triangle y^{k} + \nabla \psi \bigl(y^{k}\bigr) -\nabla \psi \bigl(y^{k-1}\bigr) \bigr) \bigr\Vert \\ \leq {}&(\ell _{g} +\ell _{\psi }+ h ) \bigl\Vert \triangle y^{k} \bigr\Vert + (\ell _{\psi }+ h ) \bigl\Vert \triangle y^{k+1} \bigr\Vert . \end{aligned}

Accordingly, setting $$d_{1}:= \frac{\ell _{g} +\ell _{\psi }+ h}{\sqrt{{\sigma _{0}}}}$$ and $$d_{2}:= \frac{\ell _{\psi }+ h}{\sqrt{{\sigma _{0}}}}$$ for simplicity, we have

$$\bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert \leq d_{1} \bigl\Vert \triangle y^{k} \bigr\Vert + d_{2} \bigl\Vert \triangle y^{k+1} \bigr\Vert ,$$
(40)

and

$$\bigl\Vert \triangle \lambda ^{k} \bigr\Vert \leq d_{1} \bigl\Vert \triangle y^{k-1} \bigr\Vert + d_{2} \bigl\Vert \triangle y^{k} \bigr\Vert .$$

Hence, substituting these into (39) and regrouping terms, we obtain

\begin{aligned} &\biggl(1-\frac{1}{2m}\biggr) \bigl\Vert \triangle x^{k+1} \bigr\Vert + \biggl(1-\frac{1+d_{2}}{2m}\biggr) \bigl\Vert \triangle y^{k+1} \bigr\Vert \\ &\quad \leq \frac{1+d_{1}+d_{2}}{2m} \bigl\Vert \triangle y^{k} \bigr\Vert + \frac{d_{1}}{2m} \bigl\Vert \triangle y^{k-1} \bigr\Vert + \frac{am}{2\sigma _{1}}{\Delta }^{k}, \end{aligned}

which, after adding and subtracting suitable terms on the right-hand side, can be rewritten as

\begin{aligned} &\biggl(1-\frac{1}{2m}\biggr) \bigl\Vert \triangle x^{k+1} \bigr\Vert + \biggl(1- \frac{1+d_{1}+ d_{2}}{m} \biggr) \bigl\Vert \triangle y^{k+1} \bigr\Vert \\ &\quad \leq \frac{1+2d_{1}+d_{2}}{2m} \bigl( \bigl\Vert \triangle y^{k} \bigr\Vert - \bigl\Vert \triangle y^{k+1} \bigr\Vert \bigr) + \frac{d_{1}}{2m} \bigl( \bigl\Vert \triangle y^{k-1} \bigr\Vert - \bigl\Vert \triangle y^{k} \bigr\Vert \bigr) + \frac{am}{2\sigma _{1}}{ \Delta}^{k}. \end{aligned}
(41)
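The regrouping behind (41) can be spelled out as follows; this is only a sketch of the intermediate algebra, using the constants $$d_{1}$$, $$d_{2}$$, and m already defined. The right-hand side of the inequality preceding (41), without the $$\Delta ^{k}$$ term, can be written as

\begin{aligned} \frac{1+d_{1}+d_{2}}{2m} \bigl\Vert \triangle y^{k} \bigr\Vert + \frac{d_{1}}{2m} \bigl\Vert \triangle y^{k-1} \bigr\Vert ={}& \frac{d_{1}}{2m} \bigl( \bigl\Vert \triangle y^{k-1} \bigr\Vert - \bigl\Vert \triangle y^{k} \bigr\Vert \bigr) + \frac{1+2d_{1}+d_{2}}{2m} \bigl( \bigl\Vert \triangle y^{k} \bigr\Vert - \bigl\Vert \triangle y^{k+1} \bigr\Vert \bigr) \\ &{} + \frac{1+2d_{1}+d_{2}}{2m} \bigl\Vert \triangle y^{k+1} \bigr\Vert . \end{aligned}

Moving the last term to the left-hand side changes the coefficient of $$\|\triangle y^{k+1}\|$$ to

$$1-\frac{1+d_{2}}{2m} - \frac{1+2d_{1}+d_{2}}{2m} = 1- \frac{1+d_{1}+d_{2}}{m},$$

which is the coefficient appearing in (41).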

As a result, summing inequality (41) from $$k=k_{1}$$ to ∞ and making use of the nonnegativity of φ, we have

\begin{aligned} & \biggl(1-\frac{1}{2m}\biggr) \sum _{k=k_{1}}^{\infty } \bigl\Vert \triangle x^{k+1} \bigr\Vert + \biggl(1-\frac{1+{d}_{1}+{d}_{2}}{m} \biggr) \sum _{k=k_{1}}^{\infty } \bigl\Vert \triangle y^{k+1} \bigr\Vert \\ &\quad \leq \frac{1+2{d}_{1}+{d}_{2}}{2m} \bigl\Vert \triangle y^{k_{1}} \bigr\Vert + \frac{{d}_{1}}{2m} \bigl\Vert \triangle y^{k_{1}-1} \bigr\Vert + \frac{am}{2\sigma _{1}} \varphi \bigl(\hat{\mathcal{L}}_{\beta }\bigl( \hat{w}^{k_{1}}\bigr) -\mathcal{L}^{\ast }\bigr). \end{aligned}
(42)
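To see why the sums on the right-hand side collapse in (42), note that the difference terms in (41) telescope. As a sketch, for any $$n\geq k_{1}$$,

\begin{aligned} \sum_{k=k_{1}}^{n} \bigl( \bigl\Vert \triangle y^{k} \bigr\Vert - \bigl\Vert \triangle y^{k+1} \bigr\Vert \bigr) &= \bigl\Vert \triangle y^{k_{1}} \bigr\Vert - \bigl\Vert \triangle y^{n+1} \bigr\Vert \leq \bigl\Vert \triangle y^{k_{1}} \bigr\Vert , \\ \sum_{k=k_{1}}^{n} \bigl( \bigl\Vert \triangle y^{k-1} \bigr\Vert - \bigl\Vert \triangle y^{k} \bigr\Vert \bigr) &= \bigl\Vert \triangle y^{k_{1}-1} \bigr\Vert - \bigl\Vert \triangle y^{n} \bigr\Vert \leq \bigl\Vert \triangle y^{k_{1}-1} \bigr\Vert , \\ \sum_{k=k_{1}}^{n} \Delta ^{k} &= \varphi \bigl(\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k_{1}}\bigr) -\mathcal{L}^{\ast }\bigr) - \varphi \bigl(\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{n+1}\bigr) -\mathcal{L}^{\ast }\bigr) \leq \varphi \bigl(\hat{\mathcal{L}}_{\beta }\bigl(\hat{w}^{k_{1}}\bigr) -\mathcal{L}^{\ast }\bigr), \end{aligned}

where the last bound uses $$\varphi \geq 0$$. Letting $$n\rightarrow \infty$$ then gives (42).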

Since $$m>0$$ is arbitrary, we may select $$m > 1+ {d}_{1}+ {d}_{2}$$, so that $$1-\frac{1}{2m}>0$$ and $$1-\frac{1+ {d}_{1}+ {d}_{2}}{m} >0$$. It then follows directly from (42) that

$$\sum_{k=k_{1}}^{\infty } \bigl\Vert \triangle x^{k+1} \bigr\Vert < \infty \quad\text{and}\quad \sum _{k=k_{1}}^{\infty } \bigl\Vert \triangle y^{k+1} \bigr\Vert < \infty.$$

That is, the sequences $$\{ x^{k} \}$$ and $$\{ y^{k} \}$$ have finite length; hence they are Cauchy sequences and therefore convergent. Moreover, summing relation (40) from $$k=k_{1}$$ to ∞, we also obtain

$$\sum_{k=k_{1}}^{\infty } \bigl\Vert \triangle \lambda ^{k+1} \bigr\Vert \leq {d}_{1} \sum _{k=k_{1}}^{\infty } \bigl\Vert \triangle y^{k} \bigr\Vert + {d}_{2} \sum_{k=k_{1}}^{\infty } \bigl\Vert \triangle y^{k+1} \bigr\Vert < \infty,$$

which implies that $$\{ \lambda ^{k} \}$$ is convergent too. That is, $$\{ w^{k} \}$$ is convergent. Therefore, combining with Theorem 1, the whole proof is finished. □

## Conclusions

In this paper, an ADMM-based SQP method for separably smooth nonconvex problems with linear equality constraints is proposed. Combining the ideas of the SQP method and the classical ADMM, the QP subproblem of the original problem is split into smaller-scale QP subproblems, which can be solved easily with the help of Bregman distances; this relieves the difficulty of solving the large-scale QP associated with the primal nonconvex optimization problem. Additionally, the Lagrangian multipliers are updated by a dual ascent step. Based on the KŁ property and other standard assumptions, the proposed method is globally and strongly convergent in terms of the potential function.

As future work, it is tempting to consider whether these theoretical results can be used to develop a relaxation factor in the multiplier update (7c) or some acceleration technique for variants of the ADMM-based SQP method. This issue is worthy of further investigation.

## References

1. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

2. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2, 17–40 (1976)

3. Glowinski, R., Marrocco, A.: Sur l’approximation par éléments finis d’ordre un et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. RAIRO. Anal. Numér. 9(2), 41–76 (1975)

4. Goldfarb, D., Ma, S.Q., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. 141(1–2), 349–382 (2013)

5. Goldstein, T., O’Donoghue, B., Setzer, S., Baraniuk, R.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)

6. He, B.S., Liu, H., Wang, Z.R., Yuan, X.M.: A strictly contractive Peaceman–Rachford splitting method for convex programming. SIAM J. Optim. 24, 1011–1040 (2014)

7. He, B.S., Yuan, X.M.: On the $$\mathcal{O}(\frac{1}{n})$$ convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

8. He, B.S., Yuan, X.M.: On non-ergodic convergence rate of Douglas–Rachford alternating direction method of multipliers. Numer. Math. 130(3), 567–577 (2015)

9. Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

10. Wang, F.H., Xu, Z.B., Xu, H.K.: Convergence of alternating direction method with multipliers for nonconvex composite problems. Preprint (2014). arXiv:1410.8625

11. Li, G.Y., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)

12. Hong, M.Y., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J. Optim. 26(1), 337–364 (2016)

13. Wilson, R.B.: A simplicial method for concave programming. PhD thesis, Graduate School of Business Administration, Harvard University, Cambridge (1963)

14. Robinson, S.: A quadratically convergent algorithm for general nonlinear programming problems. Math. Program. 3, 145–156 (1972)

15. Fukushima, M.: A successive quadratic programming algorithm with global and superlinear convergence properties. Math. Program. 35, 253–264 (1986)

16. Solodov, M.V.: Global convergence of an SQP method without boundedness assumptions on any of the iterative sequences. Math. Program., Ser. A 118, 1–12 (2009)

17. Panier, E.R., Tits, A.L.: A superlinearly convergent feasible method for the solution of inequality constrained optimization problems. SIAM J. Control Optim. 25, 934–950 (1987)

18. Zhu, Z.B., Jian, J.B.: An efficient feasible SQP algorithm for inequality constrained optimization. Nonlinear Anal., Real World Appl. 10, 1220–1228 (2009)

19. Jian, J.B.: A superlinearly convergent implicit smooth SQP algorithm for mathematical programs with nonlinear complementarity constraints. Comput. Optim. Appl. 31(3), 335–361 (2005)

20. Jian, J.B., Zheng, H.Y., Tang, C.M., Hu, Q.J.: A new superlinearly convergent norm-relaxed method of strongly sub-feasible direction for inequality constrained optimization. Appl. Math. Comput. 182, 955–976 (2006)

21. Jian, J.B.: Fast Algorithms for Smooth Constrained Optimization—Theoretical Analysis and Numerical Experiments. Science Press, Beijing (2010)

22. Jian, J.B., Lao, Y.X., Chao, M.T., Ma, G.D.: ADMM-SQP algorithm for two blocks linear constrained nonconvex optimization. Oper. Res. Trans. 22(2), 79–92 (2018)

23. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

24. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

25. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

26. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)

27. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)

28. Bregman, L.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)

29. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

### Acknowledgements

The authors are very grateful to the referees for their valuable comments and suggestions on the early version of the paper.

## Funding

This work was supported by the National Natural Science Foundation of China (No. 11771383), the Natural Science Foundation of Guangxi Province (No. 2018GXNSFFA281007), and the Middle-aged and Young Teachers’ Basic Ability Promotion Project of Guangxi Province (No. 2017KY0537).

## Author information

### Contributions

ML conceived of the description of the ADMM-based SQP method, the convergence analysis and drafted the manuscript. JJ carried out the idea of this paper and participated in the convergence analysis. All authors read and approved the final manuscript.

### Corresponding author

Correspondence to Jinbao Jian.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions 