A superlinearly convergent SSDP algorithm for nonlinear semidefinite programming

Abstract

In this paper, we present a sequential semidefinite programming (SSDP) algorithm for nonlinear semidefinite programming. At each iteration, a linear semidefinite programming subproblem and a modified quadratic semidefinite programming subproblem are solved to generate a master search direction. To avoid the Maratos effect, a second-order correction direction is determined by solving a new quadratic programming subproblem. A penalty function is then used as a merit function for arc search. Superlinear convergence is shown under strict complementarity and the strong second-order sufficient conditions with the sigma term. Finally, some preliminary numerical results are reported.

Introduction

Consider the following nonlinear semidefinite programming (NLSDP) problem with a negative semidefinite matrix constraint:

$$ \begin{gathered} \min \quad f(x) \\ \mathrm{s.t.} \quad {\mathcal{A}}(x)\preceq 0, \end{gathered} $$
(1.1)

where \(f:\mathbb{R}^{n}\rightarrow \mathbb{R}\), \(\mathcal{A}: \mathbb{R}^{n}\rightarrow {\mathrm{\mathbb{S}}}^{m}\), \(\mathrm{ \mathbb{S}}^{m}\) is the set of m-order symmetric matrices, and \({\mathbb{S}}^{m}_{+} \) (\({\mathbb{S}}^{m}_{-}\)) is the set of m-order positive (negative) semidefinite matrices. \(\mathcal{A}(x)\preceq 0\) means that \(\mathcal{A}(x)\) is a negative semidefinite matrix.

Nonlinear semidefinite programming has many applications both in theory and in practice. Many convex optimization problems, such as variational inequality problems and fixed point problems [1,2,3], can be reformulated as convex NLSDP. Robust control problems, optimal structural design, and truss design problems can be reformulated as NLSDP (see [4,5,6]). There is a large literature on algorithms for NLSDP, for example, the augmented Lagrangian method [7,8,9,10,11,12], the primal-dual interior point method [13, 14], and the sequential semidefinite programming (SSDP) method [15,16,17,18,19,20,21]. Our research focuses on the SSDP method.

The SSDP method, a generalization of the SQP method for classical nonlinear programming, is one of the effective methods for nonlinear semidefinite programming. For example, Correa and Ramirez [16] proposed a global SSDP algorithm for NLSDP. Recently, as illustrated by the extensive numerical experiments in [20, 21], the SSDP algorithm has performed very well in finding solutions to NLSDP. At each iteration of the SSDP method, a special quadratic semidefinite programming subproblem is solved to generate a search direction. However, just like the traditional SQP method, most existing SSDP methods have some inherent pitfalls: (1) The first direction finding subproblem (DFP for short), namely a quadratic semidefinite programming (QSDP for short), is not ensured to be consistent. The algorithm in [21] is based on the assumption that an optimal solution of the first DFP exists. The algorithm in [20] goes directly to a feasibility restoration phase when the first DFP is inconsistent, and the feasibility restoration phase increases the computational cost. (2) The optimal solution to the first DFP is not ensured to be an improving direction, so the Maratos effect may occur. As a result, superlinear convergence is not guaranteed.

Since NLSDP contains a negative semidefinite matrix constraint, these drawbacks are more difficult to deal with than in the SQP method for classical nonlinear programming. In this paper, we borrow the modification strategy for the quadratic programming subproblem in nonlinear programming from [22]. We first construct a linear semidefinite programming (LSDP for short) subproblem, and then, by means of the solution of the LSDP, we construct a special QSDP, which is ensured to be consistent, to yield the master search direction. To avoid the Maratos effect, a second-order correction direction is introduced, which is determined by solving a new quadratic programming subproblem. A penalty function is used as a merit function for arc search. The proposed algorithm possesses superlinear convergence under strict complementarity and the strong second-order sufficient conditions with the sigma term.

The paper is organized as follows. Some notations and preliminaries are described in the next section. In Sect. 3, we present our algorithm in detail and analyze its feasibility. Under some mild conditions, the global convergence and superlinear convergence are shown in Sect. 4 and Sect. 5, respectively. In Sect. 6, the preliminary numerical results are reported. Some concluding remarks are given in the last section.

Preliminaries

In this section, for the sake of convenience, some definitions, notations, and results for NLSDP are introduced.

The differential operator \(D\mathcal{A}(x): {\mathbb{R}}^{n} \to {\mathbb{S}}^{m} \) is defined by

$$ D\mathcal{A}(x)d:=\sum_{i=1}^{n}d_{i} \frac{\partial {\mathcal{A}}(x)}{ \partial x_{i}}, \quad \forall d\in {\mathbb{R}}^{n}. $$
(2.1)

The adjoint operator \(D\mathcal{A}(x)^{*}\) of \(D\mathcal{A}(x)\) is defined by

$$ D\mathcal{A}(x)^{*}Z= \biggl(\biggl\langle \frac{\partial {\mathcal{A}}(x)}{ \partial x_{1}}, Z\biggr\rangle ,\ldots,\biggl\langle \frac{\partial {\mathcal{A}}(x)}{ \partial x_{n}}, Z\biggr\rangle \biggr)^{\mathrm{T}},\quad \forall Z\in {\mathrm{ \mathbb{S}}}^{m}, $$
(2.2)

where \(\langle A, B\rangle \) denotes the inner product of the matrices A and B, defined by \(\langle A, B\rangle =\operatorname{Tr}(AB)\) for any \(A, B\in {\mathbb{S}}^{m}\), and \(\operatorname{Tr}(\cdot )\) is the trace of a matrix.
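The two operators (2.1) and (2.2) can be sketched in a few lines of code. The following is a minimal illustration, assuming the partial derivatives \(\partial \mathcal{A}(x)/\partial x_{i}\) are already available as plain nested lists; the function names and the sample matrices are hypothetical and chosen only for this example. It checks the defining adjoint identity \(\langle D\mathcal{A}(x)d, Z\rangle = d^{\mathrm{T}}(D\mathcal{A}(x)^{*}Z)\).

```python
def inner(A, B):
    """Trace inner product <A, B> = Tr(AB) for square matrices of equal size."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def apply_DA(partials, d):
    """D A(x) d = sum_i d_i * dA/dx_i, where partials[i] holds dA/dx_i at x."""
    m = len(partials[0])
    return [
        [sum(d[i] * partials[i][r][c] for i in range(len(d))) for c in range(m)]
        for r in range(m)
    ]


def apply_DA_adjoint(partials, Z):
    """D A(x)^* Z = (<dA/dx_1, Z>, ..., <dA/dx_n, Z>)^T."""
    return [inner(dA, Z) for dA in partials]


# Hypothetical data: partial derivatives of an affine map A(x) = A0 + x1*A1 + x2*A2.
A1 = [[1.0, 2.0], [2.0, 0.0]]
A2 = [[0.0, 1.0], [1.0, 3.0]]
partials = [A1, A2]
d = [0.5, -1.0]
Z = [[2.0, 1.0], [1.0, 4.0]]

lhs = inner(apply_DA(partials, d), Z)  # <D A(x) d, Z>
rhs = sum(di * gi for di, gi in zip(d, apply_DA_adjoint(partials, Z)))  # d^T (D A(x)^* Z)
```

For this data both sides evaluate to the same number, which is exactly what makes \(D\mathcal{A}(x)^{*}\) the adjoint of \(D\mathcal{A}(x)\) with respect to the trace inner product.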

The operator \(D^{2} {\mathcal{A}}(x): {\mathbb{R}}^{n}\times {\mathbb{R}} ^{n}\rightarrow {\mathbb{S}}^{m}\) is defined by

$$ d^{\mathrm{T}} D^{2} {\mathcal{A}}(x) \bar{d}:= \sum_{i,j=1}^{n} d_{i} \bar{d}_{j} \frac{\partial ^{2} {\mathcal{A}}(x)}{\partial x_{i} \partial x_{j}}, \quad \forall d, \bar{d}\in { \mathbb{R}}^{n}. $$
(2.3)

Definition 2.1

Given \(x\in \mathbb{R}^{n}\), if there exists a matrix \(\varLambda \in \mathbb{S}^{m}_{+}\) such that

$$\begin{aligned}& \nabla f(x)+D\mathcal{A}(x)^{\ast }\varLambda =0, \end{aligned}$$
(2.4a)
$$\begin{aligned}& \mathcal{A}(x)\preceq 0, \end{aligned}$$
(2.4b)
$$\begin{aligned}& \operatorname{Tr}\bigl(\varLambda {\mathcal{A}}(x)\bigr)=0, \end{aligned}$$
(2.4c)

then x is called a KKT point of NLSDP (1.1), the matrix Λ is called a Lagrangian multiplier, and (2.4a)–(2.4c) are called the KKT conditions of NLSDP (1.1).
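To make Definition 2.1 concrete, here is a small numerical check of (2.4a)–(2.4c) on a toy problem. It is only a sketch: the problem \(\min x\) subject to \(\operatorname{diag}(-x,-1)\preceq 0\), the helper names, and the tolerance are assumptions made for illustration, and the eigenvalue helper is restricted to symmetric 2×2 matrices so that the example needs nothing beyond the standard library.

```python
import math


def lambda_max_2x2(M):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def inner(A, B):
    """Trace inner product <A, B> = Tr(AB)."""
    n = len(A)
    return sum(A[i][j] * B[j][i] for i in range(n) for j in range(n))


def is_kkt_point(grad_f, partials, A_x, Lam, tol=1e-9):
    """Check (2.4a)-(2.4c) plus Lam >= 0 for 2x2 matrices, up to a tolerance."""
    stationarity = [g + inner(dA, Lam) for g, dA in zip(grad_f, partials)]
    neg_Lam = [[-v for v in row] for row in Lam]
    return (
        all(abs(s) <= tol for s in stationarity)  # (2.4a)
        and lambda_max_2x2(A_x) <= tol            # (2.4b): A(x) negative semidefinite
        and abs(inner(Lam, A_x)) <= tol           # (2.4c)
        and lambda_max_2x2(neg_Lam) <= tol        # Lam positive semidefinite
    )


# Toy problem (hypothetical): min x  s.t.  A(x) = diag(-x, -1) <= 0, i.e. x >= 0.
grad_f = [1.0]                          # gradient of f(x) = x
partials = [[[-1.0, 0.0], [0.0, 0.0]]]  # dA/dx
A_at_0 = [[0.0, 0.0], [0.0, -1.0]]      # A(0)
Lam = [[1.0, 0.0], [0.0, 0.0]]          # candidate multiplier

kkt_ok = is_kkt_point(grad_f, partials, A_at_0, Lam)
kkt_bad = is_kkt_point(grad_f, partials, A_at_0, [[0.0, 0.0], [0.0, 0.0]])
```

With the multiplier \(\varLambda =\operatorname{diag}(1,0)\) all conditions hold at \(x=0\), whereas the zero multiplier violates stationarity (2.4a).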

For \(A\in \mathbb{S}^{m}\), let \(\lambda _{1}(A)\geq \cdots \geq \lambda _{m}(A)\) denote the eigenvalues of A in nonincreasing order, so that \(\lambda _{1}(A)\) is the largest eigenvalue of A. The following results will be used in the subsequent analysis.

Lemma 2.1

([23])

For any \(A,B\in \mathbb{S}^{m}\), the following inequality is true:

$$ \operatorname{Tr}(AB)\leq \sum_{i=1}^{m} \lambda _{i}(A)\lambda _{i}(B), $$
(2.5)

the equality holds if and only if there exists an invertible matrix P such that \(P^{-1}AP\) and \(P^{-1}BP\) are diagonal.

Based on Lemma 2.1, the following result is obvious.

Lemma 2.2

For any \(A\in \mathbb{S}^{m}, B\in \mathbb{S}^{m}_{+}\), the following inequality is true:

$$ \operatorname{Tr}(AB)\leq \lambda _{1}(A) \operatorname{Tr}(B). $$
(2.6)
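For completeness, a brief sketch of why (2.6) follows from (2.5), assuming the eigenvalues in Lemma 2.1 are listed in nonincreasing order so that \(\lambda _{i}(A)\leq \lambda _{1}(A)\) for every i: since \(B\in \mathbb{S}^{m}_{+}\), each \(\lambda _{i}(B)\geq 0\), and therefore

$$ \operatorname{Tr}(AB)\leq \sum_{i=1}^{m} \lambda _{i}(A)\lambda _{i}(B)\leq \lambda _{1}(A)\sum_{i=1}^{m} \lambda _{i}(B)=\lambda _{1}(A) \operatorname{Tr}(B). $$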

Lemma 2.3

([24] (Weyl’s inequality))

Suppose \(A,B\in \mathbb{S}^{m}\), then \(\lambda _{1}(A+B)\leq \lambda _{1}(A)+\lambda _{1}(B)\).

Lemma 2.4

For any \(A, B\in \mathbb{S}^{m}\), if \(\lambda _{1}(A+B)<\lambda _{1}(A)\), then the following inequality is true:

$$ \lambda _{1}(A+\eta B)\leq \lambda _{1}(A), \quad \forall \eta \in (0,1). $$

Proof

If \(\lambda _{1}(B)\leq 0\), then it follows from Lemma 2.3 that

$$ \lambda _{1}(A+\eta B)\leq \lambda _{1}(A)+\lambda _{1}(\eta B)\leq \lambda _{1}(A), $$

that is, the result is true.

If \(\lambda _{1}(B)>0\), note that \(\lambda _{1}(A+B)<\lambda _{1}(A)\) and \(\eta \in (0,1)\), then it follows from Lemma 2.3 that

$$\begin{aligned} \lambda _{1}(A+\eta B) =&\lambda _{1}\bigl(A+B+(\eta -1)B\bigr) \\ \leq &\lambda _{1}(A+B)+\lambda _{1}\bigl((\eta -1)B \bigr) \\ \leq &\lambda _{1}(A)+(\eta -1)\lambda _{1}(B) \\ \leq &\lambda _{1}(A), \end{aligned}$$

so the result is proved. □
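The conclusion of Lemma 2.4 can also be reached without a case split, using only the convexity of \(\lambda _{1}(\cdot )\) on \(\mathbb{S}^{m}\) (a standard fact, since \(\lambda _{1}(A)=\max_{\Vert u \Vert =1}u^{\mathrm{T}}Au\) is a maximum of linear functions of A). A short sketch: writing \(A+\eta B=(1-\eta )A+\eta (A+B)\) for \(\eta \in (0,1)\), one gets

$$ \lambda _{1}(A+\eta B)\leq (1-\eta )\lambda _{1}(A)+\eta \lambda _{1}(A+B)< (1-\eta )\lambda _{1}(A)+\eta \lambda _{1}(A)=\lambda _{1}(A). $$

This argument in fact gives the strict inequality \(\lambda _{1}(A+\eta B)<\lambda _{1}(A)\).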

Let \(x^{k}\) be the current iterate. Motivated by the idea in [22], we construct a linear SDP subproblem (LSDP) as follows:

$$\begin{aligned} \operatorname{LSDP} \bigl(x^{k}\bigr) & \min \quad z \\ &\mathrm{s.t.} \quad {\mathcal{A}}\bigl(x^{k}\bigr)+D\mathcal{A} \bigl(x^{k}\bigr)d\preceq z E_{m}, \\ &\hphantom{\mathrm{s.t.} \quad } z\geq 0, \quad \Vert d \Vert \leq 1, \end{aligned}$$
(2.7)

where \(E_{m}\) is the m-order identity matrix.
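One reason \(\operatorname{LSDP}(x^{k})\) (2.7) is always consistent is that the pair \((d,z)=(0,\lambda _{1}(\mathcal{A}(x^{k}))_{+})\) satisfies all of its constraints. The sketch below checks this numerically for one hypothetical 2×2 value of \(\mathcal{A}(x^{k})\); the helper names and the data are assumptions for illustration, and no SDP solver is involved.

```python
import math


def lambda_max_2x2(M):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def lsdp_feasible(A_x, DA_d, z, d_norm, tol=1e-9):
    """Constraints of LSDP (2.7) for 2x2 data:
    A + DA d - z*E negative semidefinite, z >= 0, ||d|| <= 1."""
    shifted = [
        [A_x[i][j] + DA_d[i][j] - (z if i == j else 0.0) for j in range(2)]
        for i in range(2)
    ]
    return lambda_max_2x2(shifted) <= tol and z >= 0.0 and d_norm <= 1.0


A_x = [[1.0, 2.0], [2.0, -1.0]]      # hypothetical A(x^k); lambda_1 = sqrt(5) > 0
zero = [[0.0, 0.0], [0.0, 0.0]]      # D A(x^k) d for d = 0
P_x = max(lambda_max_2x2(A_x), 0.0)  # lambda_1(A(x^k))_+

trivial_ok = lsdp_feasible(A_x, zero, P_x, 0.0)       # (d, z) = (0, lambda_1(A(x^k))_+)
too_small = lsdp_feasible(A_x, zero, 0.5 * P_x, 0.0)  # a smaller z fails when d = 0
```

Consequently the optimal value satisfies \(z_{k}\leq \lambda _{1}(\mathcal{A}(x^{k}))_{+}\); a strictly smaller \(z_{k}\) is only possible with a nonzero direction d.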

It is known that \(\operatorname{LSDP}(x^{k})\) (2.7) has optimal solutions. Let \(((\widehat{d}^{k})^{\mathrm{T}}, z_{k})^{\mathrm{T}}\) be an optimal solution of \(\operatorname{LSDP}(x^{k})\) (2.7). Now we construct a quadratic semidefinite programming (\(\operatorname{QSDP}(x^{k},H_{k})\)) by means of \(z_{k}\) as follows:

$$\begin{aligned} \begin{aligned} \operatorname{QSDP} \bigl(x^{k},H_{k} \bigr) &\qquad \min \quad \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d+ \frac{1}{2}d^{\mathrm{T}}H_{k} d \\ & \qquad\mathrm{s.t.} \quad {\mathcal{A}}\bigl(x^{k}\bigr)+D\mathcal{A} \bigl(x^{k}\bigr)d\preceq z_{k} E_{m}, \end{aligned} \end{aligned}$$
(2.8)

where \(H_{k}\in \mathbb{S}^{n}\) is the Hessian matrix, or an approximation of the Hessian matrix, of the Lagrangian function of NLSDP (1.1) at \(x^{k}\).

Generally, the optimal solution \(d^{k}\) to \(\operatorname{QSDP}(x ^{k}, H_{k})\) (2.8) cannot be guaranteed to avoid the Maratos effect and get superlinear convergence, so it needs a modification. To this end, motivated by the ideas in [20], we introduce a second-order correction direction by solving the following subproblem:

$$ \begin{gathered} \min \quad \nabla f \bigl(x^{k}\bigr) ^{\mathrm{T}}\bigl(d^{k}+d\bigr)+ \frac{1}{2}\bigl(d^{k}+d\bigr)^{ \mathrm{T}}H_{k} \bigl(d^{k}+d\bigr) \\ \mathrm{s.t.} \quad \bar{N}_{k}^{\mathrm{T}}\bigl(\mathcal{A} \bigl(x^{k}+d^{k}\bigr)+D\mathcal{A} \bigl(x^{k}\bigr)d\bigr) \bar{N}_{k}=- \bigl\Vert d^{k} \bigr\Vert ^{\varrho }E_{m-r}, \end{gathered} $$
(2.9)

where \(\varrho \in (2,3)\), \(r=\operatorname{rank}(\mathcal{A}(x^{k})+D \mathcal{A}(x^{k})d^{k})\), \({\bar{N}_{k}}=(p_{1}^{k},p_{2}^{k},\ldots , p_{m-r}^{k})\in {\mathbb{R}}^{m\times (m-r)}\), and \(\{p_{1}^{k},p _{2}^{k}, \ldots , p_{m-r}^{k}\}\) is an orthogonal basis for the null space of the matrix \(\mathcal{A}(x^{k})+D\mathcal{A}(x^{k})d^{k}\).

The following basic assumptions are required.

A 1

The functions \(f(x)\) and \(\mathcal{A}(x)\) are continuously differentiable.

A 2

There exist two positive constants a and ā such that

$$a \Vert d \Vert ^{2}\leq d^{\mathrm{T}}H_{k}d \leq \bar{a} \Vert d \Vert ^{2},\quad \forall d \in \mathbb{R}^{n}. $$

Under Assumptions A1–A2, the following lemma follows.

Lemma 2.5

Suppose that Assumptions A1–A2 hold. Then subproblem \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8) has a unique solution \(d^{k}\), and there exists a matrix \(\varLambda _{k}\in \mathbb{S}^{m}_{+}\) satisfying the KKT conditions of \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8), i.e.,

$$\begin{aligned}& \nabla f\bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)^{\ast }\varLambda _{k}+H_{k}d^{k} = 0, \end{aligned}$$
(2.10a)
$$\begin{aligned}& \mathcal{A}\bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)d^{k} \preceq z_{k} E_{m}, \end{aligned}$$
(2.10b)
$$\begin{aligned}& \operatorname{Tr}\bigl(\varLambda _{k}\bigl(\mathcal{A} \bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)d^{k}-z _{k}E_{m}\bigr)\bigr) = 0. \end{aligned}$$
(2.10c)

Define a measure of constraint violation for NLSDP (1.1) as follows:

$$\begin{aligned} P(x)=\lambda _{1}\bigl(\mathcal{A}(x) \bigr)_{+}, \end{aligned}$$
(2.11)

where \(\lambda _{1}(\mathcal{A}(x))_{+}=\max \{\lambda _{1}(\mathcal{A}(x)), 0 \}\). Obviously, \(P(x)=0\) if and only if x is a feasible point of NLSDP (1.1).

By means of \(P(x)\), we define a penalty function as a merit function for arc search:

$$ \theta _{\alpha }(x)=f(x)+\alpha P(x)=f(x)+ \alpha \lambda _{1}\bigl( \mathcal{A}(x)\bigr)_{+}, $$
(2.12)

where \(\alpha >0\) is a penalty parameter. The function \(\theta _{ \alpha }(x)\) comes from the Han penalty function for nonlinear programming.
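The constraint violation (2.11) and the merit function (2.12) translate directly into code. The following is a minimal sketch, assuming the values \(f(x)\) and \(\mathcal{A}(x)\) have already been evaluated; the function names and sample matrices are hypothetical, and the eigenvalue helper handles only symmetric 2×2 matrices to keep the example dependency-free.

```python
import math


def lambda_max_2x2(M):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def constraint_violation(A_x):
    """P(x) = lambda_1(A(x))_+ as in (2.11)."""
    return max(lambda_max_2x2(A_x), 0.0)


def merit(f_x, A_x, alpha):
    """theta_alpha(x) = f(x) + alpha * P(x) as in (2.12)."""
    return f_x + alpha * constraint_violation(A_x)


infeasible_A = [[1.0, 0.0], [0.0, -2.0]]  # lambda_1 = 1 > 0, so P = 1
feasible_A = [[-1.0, 0.0], [0.0, -2.0]]   # lambda_1 = -1 <= 0, so P = 0

theta_infeasible = merit(3.0, infeasible_A, alpha=10.0)
theta_feasible = merit(3.0, feasible_A, alpha=10.0)
```

At a feasible point the penalty term vanishes and the merit function coincides with the objective value; at an infeasible point the violation is weighted by the penalty parameter α.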

Lemma 2.6

Suppose that Assumptions A1–A2 hold, \(d^{k}\) is the optimal solution of \(\operatorname{QSDP}(x^{k}, H_{k})\) (2.8). Then the directional derivative \(\theta '_{\alpha }(x^{k};d^{k})\) satisfies the following inequality:

$$\begin{aligned} & \theta '_{\alpha }\bigl(x^{k};d^{k} \bigr) \\ &\quad \leq \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha \bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr) \end{aligned}$$
(2.13)
$$\begin{aligned} &\quad \leq -\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}+ \operatorname{Tr}\bigl(\varLambda _{k}\bigl( \mathcal{A} \bigl(x^{k}\bigr)-z_{k}E_{m}\bigr)\bigr)- \alpha \bigl(\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}- \lambda _{1}(z_{k}E_{m})_{+} \bigr), \end{aligned}$$
(2.14)

where \(z_{k}\) is the optimal value of \(\operatorname{LSDP}(x^{k})\) (2.7), \(\varLambda _{k}\) is a KKT multiplier of \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8) corresponding to the constraint.

Proof

First, it follows from (2.10a) that

$$\begin{aligned} \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k} = -\sum_{i=1}^{n}d_{i}^{k} \operatorname{Tr} \biggl(\varLambda _{k}\frac{\partial {\mathcal{A}}(x^{k})}{ \partial x_{i}} \biggr)- \bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}. \end{aligned}$$
(2.15)

Since \(\lambda _{1}(\cdot )_{+}\) is convex, we obtain

$$\begin{aligned}& \lambda _{1} \Biggl(\mathcal{A}\bigl(x^{k}\bigr)+t \sum_{i=1}^{n}d_{i }^{k} \frac{ \partial {\mathcal{A}}(x^{k})}{\partial x_{i}} \Biggr)_{+} \\& \quad \leq (1-t)\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}+t\lambda _{1} \Biggl(\mathcal{A}\bigl(x ^{k}\bigr)+\sum_{i=1}^{n}d_{i }^{k} \frac{\partial {\mathcal{A}}(x^{k})}{ \partial x_{i}} \Biggr)_{+} \\& \quad \leq (1-t)\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}+t\lambda _{1}(z_{k}E _{m})_{+}, \end{aligned}$$

the last inequality above is due to (2.10b). By the definition of directional derivative and the above inequality, we get

$$\begin{aligned}& \bigl[\lambda _{1}(\cdot )_{+} \bigr]'\bigl(\mathcal{A}\bigl(x^{k}\bigr);D\mathcal{A} \bigl(x^{k}\bigr)d ^{k}\bigr) \\& \quad =\lim_{t\rightarrow 0^{+}}t^{-1}\Biggl[\lambda _{1}\Biggl(\mathcal{A}\bigl(x^{k}\bigr)+t \sum _{i=1}^{n}d_{i}^{k} \frac{\partial {\mathcal{A}}(x^{k})}{\partial x_{i}}\Biggr)_{+}-\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr)\bigr)_{+}\Biggr] \\& \quad \leq \lim_{t\rightarrow 0^{+}}t^{-1}\bigl[(1-t) \lambda _{1}\bigl(\mathcal{A}\bigl(x ^{k}\bigr) \bigr)_{+}+t\lambda _{1}(z_{k}E_{m})_{+}- \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+}\bigr] \\& \quad =-\bigl(\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr). \end{aligned}$$
(2.16)

Combining with the definition of the directional derivative \(\theta '_{ \alpha }(x^{k};d^{k})\) and (2.16), we have

$$\begin{aligned} \theta '_{\alpha }\bigl(x^{k};d^{k} \bigr) = & \triangledown f\bigl(x^{k}\bigr)^{ \mathrm{T}}d^{k}+ \alpha \bigl(\bigl[\lambda _{1}(\cdot )_{+} \bigr]'\bigl(\mathcal{A}\bigl(x^{k}\bigr);D \mathcal{A}\bigl(x^{k}\bigr)d^{k}\bigr)\bigr) \\ \leq & \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha \bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr), \end{aligned}$$

that is, inequality (2.13) holds.

By (2.15), we obtain

$$\begin{aligned} & \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha \bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr) \\ &\quad \leq -\sum_{i=1}^{n}d_{i}^{k}{ \operatorname{Tr}} \biggl(\varLambda _{k}\frac{ \partial {\mathcal{A}}(x^{k})}{\partial x_{i}} \biggr)- \bigl(d^{k}\bigr)^{ \mathrm{T}}H_{k}d^{k}- \alpha \bigl(\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}- \lambda _{1}(z_{k}E_{m})_{+} \bigr). \end{aligned}$$
(2.17)

It follows from (2.10c) that

$$ -\operatorname{Tr} \Biggl(\varLambda _{k} \Biggl(\sum _{i=1}^{n}d_{i}^{k} \frac{ \partial {\mathcal{A}}(x^{k})}{\partial x_{i}} \Biggr) \Biggr)=-\operatorname{Tr} \bigl( \varLambda _{k}\bigl(D\mathcal{A}\bigl(x^{k}\bigr)d^{k} \bigr) \bigr)=\operatorname{Tr}\bigl(\varLambda _{k}\bigl(\mathcal{A} \bigl(x^{k}\bigr)-z_{k}E_{m}\bigr)\bigr). $$
(2.18)

Substituting the above equality into (2.17), one has

$$\begin{aligned}& \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha \bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr) \\& \quad \leq -\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}+ \operatorname{Tr}\bigl(\varLambda _{k}\bigl( \mathcal{A} \bigl(x^{k}\bigr)-z_{k}E_{m}\bigr)\bigr)- \alpha \bigl(\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}- \lambda _{1}(z_{k}E_{m})_{+} \bigr), \end{aligned}$$

that is, inequality (2.14) holds. □

The algorithm

In this section, we first present our algorithm in detail, and then analyze its feasibility.

Algorithm A

Step 0.:

Given \(x^{0}\in \mathbb{R}^{n}\), \(H_{0}=E_{n}\) (identity matrix), \(t\in (0,1)\), \(\alpha _{0}>0\), \(\bar{P}\in (1,10)\), \(\sigma \in (0,1)\), \(\beta \in (0,\frac{1}{2})\), \(\eta _{1}>0\). Let \(k:=0\).

Step 1.:

Solve \(\operatorname{LSDP}(x ^{k})\) (2.7) to get an optimal solution \(((\widehat{d}^{k})^{\mathrm{T}},z_{k})^{\mathrm{T}}\). If \(\lambda _{1}(\mathcal{A}(x^{k}))>0\) and \(z_{k}=\lambda _{1}( \mathcal{A}(x^{k}))\), then stop.

Step 2.:

(Generate a master direction). Solve \(\operatorname{QSDP}(x ^{k},H _{k})\) (2.8) to get the optimal solution \(d^{k}\). If \(d^{k}=0\), then stop.

Step 3.:

(Generate a second-order correction direction). Solve subproblem (2.9) and let \(\widetilde{d}^{k}\) be the solution. If there is no solution of subproblem (2.9) or \(\|\widetilde{d} ^{k}\| > \|d^{k}\|\), then set \(\widetilde{d}^{k}=0\).

Step 4.:

(Update \(\alpha _{k}\)) The update rule of \(\alpha _{k}\) is as follows:

$$ \alpha _{k+1}:= \textstyle\begin{cases} \alpha _{k}, & \mbox{if } \Delta (x^{k},\alpha _{k})\leq -(d^{k})^{\mathrm{T}} H_{k} d^{k}; \\ \frac{\triangledown f(x^{k})^{\mathrm{T}}d^{k}+(d^{k})^{\mathrm{T}}H _{k} d^{k}}{\lambda _{1}(\mathcal{A}(x^{k}))_{+}-z_{k}}+\eta _{1},& \mbox{otherwise,} \end{cases} $$
(3.1)

where \(\Delta (x^{k},\alpha _{k})\) is defined by

$$ \Delta \bigl(x^{k},\alpha _{k}\bigr)= \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha _{k}\bigl(\lambda _{1}\bigl(\mathcal{A} \bigl(x^{k}\bigr)\bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr), $$
(3.2)

that is, \(\Delta (x^{k},\alpha _{k})\) is the right-hand side of inequality (2.13).
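The update rule (3.1) with \(\Delta \) from (3.2) involves only scalars once the subproblems have been solved. A minimal sketch follows, assuming the quantities \(\nabla f(x^{k})^{\mathrm{T}}d^{k}\), \((d^{k})^{\mathrm{T}}H_{k}d^{k}\), \(\lambda _{1}(\mathcal{A}(x^{k}))_{+}\), and \(z_{k}\) are supplied by the caller; the function and argument names are hypothetical. It uses \(\lambda _{1}(z_{k}E_{m})_{+}=z_{k}\), which holds because \(z_{k}\geq 0\).

```python
def delta(grad_f_dot_d, alpha, P_x, z):
    """Delta(x^k, alpha) from (3.2); lambda_1(z E_m)_+ equals z because z >= 0."""
    return grad_f_dot_d - alpha * (P_x - z)


def update_alpha(alpha, grad_f_dot_d, dHd, P_x, z, eta1):
    """Penalty update (3.1): keep alpha if Delta <= -d^T H d, otherwise enlarge it.

    The second branch divides by P_x - z, so it assumes z < P_x there.
    """
    if delta(grad_f_dot_d, alpha, P_x, z) <= -dHd:
        return alpha
    return (grad_f_dot_d + dHd) / (P_x - z) + eta1


# Hypothetical numbers: grad f^T d = 2, d^T H d = 1, P(x^k) = 1, z_k = 0.5, eta_1 = 0.1.
alpha_new = update_alpha(1.0, 2.0, 1.0, 1.0, 0.5, 0.1)    # Delta = 1.5 > -1, so alpha grows
alpha_kept = update_alpha(10.0, 2.0, 1.0, 1.0, 0.5, 0.1)  # Delta = -3 <= -1, so alpha stays
delta_after = delta(2.0, alpha_new, 1.0, 0.5)
```

With the enlarged parameter, \(\Delta (x^{k},\alpha _{k+1})=-(d^{k})^{\mathrm{T}}H_{k}d^{k}-\eta _{1}(\lambda _{1}(\mathcal{A}(x^{k}))_{+}-z_{k})\), which is why the descent condition holds after the update.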

Step 5.:

(Arc search) Let \(t_{k}\) be the first number of the sequence \(\{1,\sigma ,\sigma ^{2},\ldots \}\) satisfying the following inequalities:

$$\begin{aligned}& \theta _{\alpha _{k+1}}\bigl(x^{k}+t_{k}d^{k}+t_{k}^{2} \widetilde{d}^{k}\bigr) \leq \theta _{\alpha _{k+1}} \bigl(x^{k}\bigr)+\beta t_{k}\Delta \bigl(x^{k},\alpha _{k+1}\bigr), \quad \mbox{if }P \bigl(x^{k}\bigr)\leq \bar{P}; \end{aligned}$$
(3.3)
$$\begin{aligned}& \textstyle\begin{cases} \theta _{\alpha _{k+1}}(x^{k}+t_{k}d^{k}+t_{k}^{2}\widetilde{d}^{k}) \leq \theta _{\alpha _{k+1}}(x^{k})+\beta t_{k}\Delta (x^{k},\alpha _{k+1}), \\ P(x^{k}+t_{k}d^{k}) \leq P(x^{k}), \end{cases}\displaystyle \mbox{if }P\bigl(x^{k}\bigr)> \bar{P}. \end{aligned}$$
(3.4)
Step 6.:

Let \(x^{k+1}:=x^{k}+t_{k}d^{k}+t_{k}^{2} \widetilde{d}^{k}\), update \(H_{k}\) by some method to \(H_{k+1}\) such that \(H_{k+1}\) is positive definite. Set \(k:=k+1\), and return to Step 1.
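The arc search of Step 5 can be sketched as a backtracking loop over \(t\in \{1,\sigma ,\sigma ^{2},\ldots \}\). The code below is an illustration under stated assumptions, not the authors' implementation: the merit function and the violation measure are passed in as callables, the iteration cap `max_iter` is an added safeguard, and the toy functions at the bottom are stand-ins rather than a genuine semidefinite problem.

```python
def arc_search(theta, P, x, d, d_tilde, delta_val,
               sigma=0.5, beta=0.25, P_bar=5.0, max_iter=60):
    """Return (t, x_new) for the first t in {1, sigma, sigma^2, ...} passing (3.3)/(3.4)."""
    theta_x, P_x = theta(x), P(x)
    t = 1.0
    for _ in range(max_iter):
        # Trial point on the arc x + t*d + t^2*d_tilde.
        trial = [xi + t * di + t * t * ci for xi, di, ci in zip(x, d, d_tilde)]
        ok = theta(trial) <= theta_x + beta * t * delta_val
        if ok and P_x > P_bar:
            # Extra requirement of (3.4): no increase of the violation along x + t*d.
            straight = [xi + t * di for xi, di in zip(x, d)]
            ok = P(straight) <= P_x
        if ok:
            return t, trial
        t *= sigma
    raise RuntimeError("arc search did not terminate within max_iter reductions")


# Toy stand-ins (hypothetical): theta(x) = x^2 with no constraint violation.
theta = lambda x: x[0] ** 2
P = lambda x: 0.0
x0, d0, d0_tilde = [1.0], [-2.0], [0.0]
delta0 = -4.0  # plays the role of Delta(x^k, alpha_{k+1}) < 0

t_k, x_next = arc_search(theta, P, x0, d0, d0_tilde, delta0)
```

In this toy run the full step \(t=1\) fails the sufficient decrease test and the first reduction \(t=\sigma =0.5\) is accepted. Lemma 3.1 below is what guarantees that \(\Delta (x^{k},\alpha _{k+1})<0\), so the loop is meaningful.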

Remark

In Algorithm A, thanks to the modified subproblem strategy, the quadratic semidefinite programming subproblem (2.8) that yields the master search direction is guaranteed to be consistent; moreover, a solution to subproblem (2.8) is ensured to exist. This is very different from the approaches in [20, 21].

In what follows, we analyze the feasibility of Algorithm A. To this end, it is necessary to extend the definition of infeasible stationary point for nonlinear programming [25, 26] to nonlinear semidefinite programming.

Definition 3.1

A point \(\widetilde{x}\in \mathbb{R}^{n}\) is called an infeasible stationary point of NLSDP (1.1) if \(\lambda _{1}(\mathcal{A}( \widetilde{x}))>0\) and

$$\begin{aligned} \min_{d\in \mathbb{R}^{n}}\max \bigl\{ \lambda _{1}\bigl(\mathcal{A}( \widetilde{x})+D\mathcal{A}(\widetilde{x})d \bigr), 0\bigr\} =\lambda _{1}\bigl( \mathcal{A}(\widetilde{x}) \bigr)_{+}=P(\widetilde{x}). \end{aligned}$$
(3.5)
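As a simple illustration (a constructed example, not taken from the numerical experiments), take \(n=m=1\) and \(\mathcal{A}(x)=x^{2}+1\), so that \(\mathbb{S}^{1}=\mathbb{R}\) and \(\lambda _{1}(\mathcal{A}(x))=x^{2}+1>0\) for every x; this problem has no feasible point. At \(\widetilde{x}=0\) we have \(D\mathcal{A}(0)=0\), hence

$$ \min_{d\in \mathbb{R}}\max \bigl\{ \lambda _{1}\bigl(\mathcal{A}(0)+D\mathcal{A}(0)d\bigr), 0\bigr\} =\max \{1,0\}=1=P(0), $$

so \(\widetilde{x}=0\) satisfies (3.5) and is an infeasible stationary point: no linearized step can reduce the constraint violation there.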

Theorem 3.1

Suppose that Assumptions A1–A2 hold, then the following two results are true:

  1. (1)

    If Algorithm A stops at Step 1, then \(x^{k}\) is an infeasible stationary point of NLSDP (1.1).

  2. (2)

    If Algorithm A stops at Step 2, then \(x^{k}\) is a KKT point of NLSDP (1.1).

Proof

(1) If Algorithm A stops at Step 1, i.e., \(z_{k}=\lambda _{1}(\mathcal{A}(x^{k})) =P(x^{k})>0\), then \(x^{k}\) is an infeasible point of NLSDP (1.1). In the following, we prove that \(x^{k}\) is an infeasible stationary point of NLSDP (1.1), i.e.,

$$ \min_{d\in \mathbb{R}^{n}}\max \bigl\{ \lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr)+D \mathcal{A}\bigl(x^{k} \bigr)d\bigr),0\bigr\} =\max \bigl\{ \lambda _{1}\bigl(\mathcal{A} \bigl(x^{k}\bigr)\bigr),0\bigr\} =P\bigl(x ^{k}\bigr). $$

By contradiction, suppose that \(x^{k}\) is not an infeasible stationary point, so there exists \(d^{k,0}\in \mathbb{R}^{n}\) such that

$$\begin{aligned} \max \bigl\{ \lambda _{1}\bigl(\mathcal{A} \bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)d^{k,0}\bigr),0\bigr\} < P\bigl(x ^{k}\bigr). \end{aligned}$$
(3.6)

If \(\|d^{k,0}\|> 1\), then \(\frac{1}{\|d^{k,0}\|}<1\), so by Lemma 2.4, we have

$$ \max \biggl\{ \lambda _{1} \biggl(\mathcal{A}\bigl(x^{k} \bigr)+\frac{1}{ \Vert d^{k,0} \Vert }D \mathcal{A}\bigl(x^{k} \bigr)d^{k,0} \biggr), 0 \biggr\} < P\bigl(x^{k}\bigr), $$
(3.7)

hence, we suppose, without loss of generality, that \(\|d^{k,0}\| \le 1\). Let

$$ {\widetilde{z}}_{k}:=\max \bigl\{ \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr)+D \mathcal{A}\bigl(x^{k}\bigr)d^{k,0}\bigr),0\bigr\} < P \bigl(x^{k}\bigr). $$
(3.8)

Obviously, \(((d^{k,0})^{\mathrm{T}},\widetilde{z}_{k})^{\mathrm{T}}\) is a feasible point of subproblem \(\operatorname{LSDP}(x^{k})\) (2.7). Since \(z_{k}\) is the optimal value of \(\operatorname{LSDP}(x ^{k})\) (2.7), one has

$$\begin{aligned} z_{k}\leq \widetilde{z}_{k}< P \bigl(x^{k}\bigr), \end{aligned}$$
(3.9)

which contradicts \(z_{k}=P(x^{k})\). Hence, if Algorithm A stops at Step 1, then \(x^{k}\) is an infeasible stationary point of NLSDP (1.1).

(2) If Algorithm A stops at Step 2, i.e., \(d^{k}=0\), then by the KKT conditions (2.10a)–(2.10c) of \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8), we obtain

$$\begin{aligned}& \nabla f\bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)^{\ast }\varLambda _{k} =0, \end{aligned}$$
(3.10a)
$$\begin{aligned}& \mathcal{A}\bigl(x^{k}\bigr) \preceq z_{k} E_{m}, \end{aligned}$$
(3.10b)
$$\begin{aligned}& \varLambda _{k} \succeq 0, \end{aligned}$$
(3.10c)
$$\begin{aligned}& \operatorname{Tr}\bigl(\varLambda _{k}\bigl(\mathcal{A} \bigl(x^{k}\bigr)-z_{k}E_{m}\bigr)\bigr) =0. \end{aligned}$$
(3.10d)

Now we prove \(z_{k}=0\). By contradiction, suppose \(z_{k}\neq 0\); then \(z _{k}>0\). It follows that \(x^{k}\) is an infeasible point of NLSDP (1.1), i.e., \(\lambda _{1}(\mathcal{A}(x^{k})) >0 \), since otherwise \((0,0)\) would be feasible for \(\operatorname{LSDP}(x^{k})\) (2.7) and hence \(z_{k}=0\). It is obvious that \((0, \lambda _{1}(\mathcal{A}(x^{k})) )\) is a feasible point of \(\operatorname{LSDP}(x^{k})\) (2.7), so the optimal value satisfies \(z_{k}\le \lambda _{1}(\mathcal{A}(x^{k}))\). Since Algorithm A does not stop at Step 1, one has \(z_{k}< \lambda _{1}(\mathcal{A}(x^{k}))\), which implies that \(d=0\) is not a feasible point of \(\operatorname{QSDP} (x^{k},H_{k})\) (2.8). This contradicts the fact that \(d^{k}=0\) is the optimal solution of \(\operatorname{QSDP} (x^{k},H _{k})\) (2.8). Therefore, \(z_{k}=0\). Substituting \(z_{k}=0\) into (3.10b) and (3.10d) and combining with (3.10a) and (3.10c), we conclude that \(x^{k}\) is a KKT point of NLSDP (1.1). □

Lemma 3.1

Suppose that Assumptions A1–A2 hold, if Algorithm A does not stop at Step 1 and \(d^{k}\neq 0\), then the following conclusions are true:

  1. (i)

    If \(P(x^{k})>0\), then the directional derivative \(P'(x ^{k};d^{k})<0\);

  2. (ii)

    \(\theta '_{\alpha _{k+1}}(x^{k};d^{k})<0\);

  3. (iii)

    \(\Delta (x^{k},\alpha _{k+1})<0\), so Algorithm A is well defined.

Proof

(i) If \(P(x^{k})>0\), it means that \(x^{k}\) is an infeasible point of NLSDP (1.1). We can prove \(z _{k}<\lambda _{1}(\mathcal{A}(x^{k}))_{+}\). By contradiction, if \(z _{k}\ge \lambda _{1}(\mathcal{A}(x^{k}))_{+}\), then \((0^{\mathrm{T}}, \lambda _{1}(\mathcal{A}(x^{k}))_{+})\) is the optimal solution of \(\operatorname{LSDP}(x ^{k})\) (2.7), which implies that Algorithm A stops at Step 1. This is a contradiction. So it follows from the definition of directional derivative and (2.16) that

$$\begin{aligned} P'\bigl(x^{k};d^{k}\bigr) = \bigl[ \lambda _{1}(\cdot )_{+}\bigr]'\bigl( \mathcal{A}\bigl(x^{k}\bigr);D \mathcal{A}\bigl(x^{k} \bigr)d^{k}\bigr) \leq -\bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr)\bigr)_{+}- \lambda _{1}(z_{k}E_{m})_{+}\bigr) < 0, \end{aligned}$$
(3.11)

that is, the result (i) is true.

(ii) The proof is divided into two cases.

Case A. \(x^{k}\) is a feasible point of NLSDP (1.1). We obtain \(z_{k}=0\) and \(\lambda _{1}(\mathcal{A}(x^{k}))_{+}=0\), so by (2.14) and Lemma 2.2, we obtain

$$\begin{aligned} \theta _{\alpha _{k+1}}'\bigl(x^{k};d^{k} \bigr) \leq & -\bigl(d^{k}\bigr)^{\mathrm{T}}H _{k}d^{k}+\operatorname{Tr}\bigl(\varLambda _{k} \bigl(\mathcal{A}\bigl(x^{k}\bigr)\bigr)\bigr) \\ \leq & -\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}+ \operatorname{Tr}(\varLambda _{k}) \lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr)\bigr)_{+} \\ =&-\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}< 0, \end{aligned}$$

the last inequality above is due to Assumption A2 and \(d^{k} \ne 0\).

Case B. \(x^{k}\) is an infeasible point of NLSDP (1.1). This implies \(\lambda _{1}(\mathcal{A}(x^{k}))_{+}=\lambda _{1}( \mathcal{A}(x^{k}))>0\).

Since Algorithm A does not stop at Step 1, we have \(z_{k} < \lambda _{1}(\mathcal{A}(x^{k}))\). Therefore, it follows from (2.13), (3.1), and Assumption A2 that

$$\theta '_{\alpha _{k+1}}\bigl(x^{k};d^{k} \bigr)\leq -\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k}d ^{k}< 0. $$

(iii) If \(x^{k}\) is a feasible point of NLSDP (1.1), then it is obvious that \(z_{k}=\lambda _{1}(\mathcal{A}(x^{k}))_{+}=0\) is the optimal value of \(\operatorname{LSDP}(x^{k})\) (2.7). So \(\Delta (x^{k},\alpha _{k+1})=\triangledown f(x^{k})^{\mathrm{T}}d^{k}\). Note that \(d=0\) is a feasible solution of \(\operatorname{QSDP} (x^{k},H_{k})\) (2.8), so we get

$$ \Delta \bigl(x^{k},\alpha _{k+1}\bigr)=\triangledown f \bigl(x^{k}\bigr)^{\mathrm{T}}d^{k} \le - \bigl(d^{k}\bigr)^{\mathrm{T}}H_{k} d^{k}< 0. $$

If \(x^{k}\) is an infeasible point of NLSDP (1.1), then, according to the update rule (3.1) of \(\alpha _{k}\), it is sufficient to consider the second part of (3.1). It follows from (3.2) and (3.1) that

$$\begin{aligned}& \Delta \bigl(x^{k},\alpha _{k+1}\bigr) \\& \quad \le \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \biggl(\frac{\triangledown f(x^{k})^{\mathrm{T}}d^{k}+(d^{k})^{\mathrm{T}}H_{k} d^{k}}{\lambda _{1}(\mathcal{A}(x^{k}))_{+}-z_{k}}+\eta _{1} \biggr) \bigl(\lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+}-\lambda _{1}(z_{k}E_{m})_{+} \bigr) \\& \quad \le -\bigl(d^{k}\bigr)^{\mathrm{T}}H_{k} d^{k}< 0. \end{aligned}$$
(3.12)

Further, the arc search of Algorithm A is valid. Hence, Algorithm A is well defined. □

Global convergence

Based on Theorem 3.1, in this section, without loss of generality, we suppose that the sequence \(\{x^{k}\}\) generated by Algorithm A is infinite. In what follows, we will prove that \(\{x^{k}\}\) is bounded under some appropriate conditions, and that any accumulation point \(x^{*}\) of \(\{x^{k}\}\) is either an infeasible stationary point, or a KKT point of NLSDP (1.1). To this end, the following additional assumptions are necessary.

A 3

For any \(c>0\), the level set \(L_{c}:=\{x\in \mathbb{R}^{n}\mid P(x) \leq c\}\) is bounded.

A 4

For any feasible point x of NLSDP (1.1), MFCQ is satisfied at x, that is, there exists \(d\in \mathbb{R}^{n}\) such that

$$ \mathcal{A}(x)+D\mathcal{A}(x)d\prec 0. $$

Lemma 4.1

Suppose that Assumptions A1–A3 hold, then the iterative sequence \(\{x^{k}\}\) is bounded.

Proof

One of the following situations occurs:

  1. (i)

    If there exists an integer \(k_{1}\) such that \(P(x^{k}) \leq \bar{P}\) for any \(k>k_{1}\), then \(x^{k}\in L_{\bar{P}}\) for any \(k>k_{1}\). So \(\{x^{k}\}\) is also bounded because \(L_{\bar{P}}\) is bounded.

  2. (ii)

    If there exists an integer \(k_{2}\) such that \(P(x^{k})>\bar{P}\) for any \(k>k_{2}\), then it follows from Step 5 that \(x^{k}\in L_{P(x^{k_{2}})}\) for any \(k>k_{2}\). So \(\{x^{k}\}\) is also bounded because \(L_{P(x^{k_{2}})}\) is bounded.

  3. (iii)

    If neither (i) nor (ii) occurs, i.e., \(P(x^{k})\leq \bar{P}\) and \(P(x^{k})>\bar{P}\) each occur infinitely often, then there exists an index set \(\{k_{j}\}\) satisfying

    $$ P\bigl(x^{k_{j}}\bigr)\leq \bar{P}, \qquad P \bigl(x^{k_{j}+1}\bigr)>\bar{P},\quad \forall j\in \{1,2,\ldots\}. $$
    (4.1)

So by arc search strategy, there exists an index set \(\{s_{j}\}\) associated with \(\{k_{j}\}\) such that

$$k_{j}< s_{j}< k_{j+1},\quad P \bigl(x^{s_{j}}\bigr)>\bar{P},\qquad P\bigl(x^{s_{j}+1}\bigr)\leq \bar{P},\quad \forall j\in \{1,2,\ldots\}. $$

For convenience, let \(N:=\{1,2,\ldots\}\), \(N_{j}:=\{k\mid k_{j}< k< k_{j+1} \}\), then we obtain

$$ \{k\in N\mid k>k_{1}\}=\Bigl(\bigcup _{j} N_{j}\Bigr)\cup \{k_{2},k_{3},k_{4}, \ldots \}. $$

We know from (4.1) that \(\{x^{k_{j}}\}\subseteq L_{\bar{P}}\), so \(\{x^{k_{j}}\}\) is bounded. Hence, \(\{x^{k}\}\) is bounded as long as we can prove that there exists \(\bar{c}>0\) such that \(x^{k}\in L_{ \bar{c}}\) for all \(j\in N\) and \(k\in N_{j}\). Combining the boundedness of \(\{x^{k_{j}}\}\) with Assumption A1, we get that \(\{\triangledown f(x^{k_{j}})\}\) is bounded, i.e., there exists \(M>0\) such that \(\| \triangledown f(x^{k_{j}})\|\leq M\) for any \(j\in N\). In addition, it follows from \(\operatorname{LSDP}(x^{k})\) (2.7) and \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8) that \(\{d^{k_{j}}\}\) is bounded. In view of \(x^{k_{j}+1}=x ^{k_{j}}+t_{k_{j}}d^{k_{j}}+t_{k_{j}}^{2}\widetilde{d}^{k_{j}}\) with \(t_{k_{j}}\in (0,1]\) and \(\|\widetilde{d}^{k_{j}}\|\leq \|d^{k_{j}}\|\) (Steps 3, 5, and 6 of Algorithm A), we get that \(\{x^{k_{j}+1}\}\) is bounded. Further, \(\{P(x^{k_{j}+1})\}\) is bounded due to the continuity of \(P(x)\). So there exists \(\bar{c}>0\) such that \(P(x^{k_{j}+1})\leq \bar{c}\) for all \(j\in N\).

At last, by (4.1) and Step 5 in Algorithm A, one has

$$\begin{aligned} \bar{c}\geq P\bigl(x^{k_{j}+1}\bigr)\geq P\bigl(x^{k_{j}+2}\bigr) \geq \cdots\geq P\bigl(x^{s_{j}}\bigr) \geq \bar{P}, \end{aligned}$$
(4.2)
$$\begin{aligned} \bar{P}\geq P\bigl(x^{s_{j}+1}\bigr),\bar{P}\geq P\bigl(x^{s_{j}+2} \bigr),\ldots,\bar{P} \geq P\bigl(x^{k_{j+1}}\bigr). \end{aligned}$$
(4.3)

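For any \(k\in N\) with \(k>k_{1}\) and \(k\notin \{k_{j}\}\), we can find \(k_{j}\) and \(k_{j+1}\) such that \(k\in (k_{j},k_{j+1})\). Since \(\bar{c}\geq P(x^{k_{j}+1})>\bar{P}\), it follows from (4.1), (4.2), and (4.3) that \(x^{k}\in L_{\bar{c}}\) for all \(k>k_{1}\), and hence \(\{x^{k}\}\) is bounded by Assumption A3. □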

Lemma 4.2

Suppose that Assumptions A1–A4 hold. If \(\alpha _{k}\rightarrow +\infty \), then every accumulation point \(x^{*}\) of \(\{x^{k}\}\) generated by Algorithm A is an infeasible stationary point of NLSDP (1.1).

Proof

If \(\alpha _{k}\rightarrow +\infty \), then it follows from (3.1) that the sequence \(\{\frac{\triangledown f(x ^{k})^{\mathrm{T}}d^{k}+(d^{k})^{\mathrm{T}}H_{k} d^{k}}{\lambda _{1}( \mathcal{A}(x^{k}))_{+}-z_{k}}\}\) diverges to +∞.

By (2.15) and (2.18), we have

$$\begin{aligned} \frac{\triangledown f(x^{k})^{\mathrm{T}}d^{k}+(d^{k})^{\mathrm{T}} H _{k} d^{k}}{\lambda _{1}(\mathcal{A}(x^{k}))_{+}-z_{k}} \le &\frac{ \operatorname{Tr}(\varLambda _{k}(\mathcal{A}(x^{k})-z_{k}E_{m}))}{\lambda _{1}(\mathcal{A}(x^{k})-z_{k}E_{m})} \\ \leq& \frac{\operatorname{Tr}(\varLambda _{k})\lambda _{1}(\mathcal{A}(x^{k})-z_{k}E_{m})}{\lambda _{1}( \mathcal{A}(x^{k})-z_{k}E_{m})} \\ =&\operatorname{Tr}(\varLambda _{k}). \end{aligned}$$
(4.4)

If \(x^{*}\) is a feasible point of NLSDP (1.1), then, by Assumption A4, we know that MFCQ is satisfied at \(x^{*}\). Similar to the proof of Theorem 5.1 in [27], we obtain that the set Ω of the KKT Lagrangian multipliers for \(\operatorname{QSDP}(x^{*},H_{*})\) (2.8) is nonempty and bounded. Note that \(\varLambda _{k} \stackrel{ \mathcal{K}}{\to } \varLambda _{*}\) and \(\varLambda _{*} \in \varOmega \), so \(\{\varLambda _{k}\}\) is bounded. Therefore it follows from (4.4) that \(\{\frac{\triangledown f(x^{k})^{\mathrm{T}}d^{k}+(d^{k})^{ \mathrm{T}} H_{k} d^{k}}{\lambda _{1}(\mathcal{A}(x^{k}))_{+}-z_{k}}\}\) is bounded. This is a contradiction. Hence, \(x^{*}\) is an infeasible point, i.e., \(\lambda _{1}(\mathcal{A}(x^{*}))>0\). Further, it is obvious that \((0,\lambda _{1}(\mathcal{A}(x^{*})))\) is a feasible solution of \(\operatorname{LSDP}(x ^{*})\) (2.7), so \(z_{*}\leq \lambda _{1}(\mathcal{A}(x^{*}))\). Let \((d^{*},z_{*})\) be an optimal solution of \(\operatorname{LSDP} (x^{*})\) (2.7), then by the constraint of \(\operatorname{LSDP}(x^{*})\) (2.7), we have

$$ \lambda _{1}\bigl(\mathcal{A}\bigl(x^{*}\bigr)+D \mathcal{A}\bigl(x^{*}\bigr)d^{*}\bigr)\leq \lambda _{1}\bigl(\mathcal{A}\bigl(x^{*}\bigr)\bigr), $$

further,

$$ \max \bigl\{ \lambda _{1}\bigl(\mathcal{A}\bigl(x^{*} \bigr)+D\mathcal{A}\bigl(x^{*}\bigr)d^{*}\bigr),0\bigr\} \leq \lambda _{1}\bigl(\mathcal{A}\bigl(x^{*}\bigr) \bigr). $$

Therefore, we get

$$ \min_{d\in \mathbb{R}^{n}}\max \bigl\{ \lambda _{1}\bigl( \mathcal{A}\bigl(x^{*}\bigr)+D \mathcal{A}\bigl(x^{*} \bigr)d\bigr),0\bigr\} \leq \max \bigl\{ \lambda _{1}\bigl(\mathcal{A} \bigl(x^{*}\bigr)+D \mathcal{A}\bigl(x^{*} \bigr)d^{*}\bigr),0\bigr\} \leq \lambda _{1}\bigl( \mathcal{A}\bigl(x^{*}\bigr)\bigr). $$

Let \(d=0\), then \(\max \{\lambda _{1}(\mathcal{A}(x^{*})+D\mathcal{A}(x ^{*})d),0\}=\lambda _{1}(\mathcal{A}(x^{*}))\), which together with the above inequality implies

$$ \min_{d\in \mathbb{R}^{n}}\max \bigl\{ \lambda _{1}\bigl( \mathcal{A}\bigl(x^{*}\bigr)+D \mathcal{A}\bigl(x^{*} \bigr)d\bigr),0\bigr\} =\lambda _{1}\bigl(\mathcal{A}\bigl(x^{*} \bigr)\bigr), $$

that is, \(x^{*}\) is an infeasible stationary point of NLSDP (1.1). □
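The characterization reached at the end of this proof can be probed numerically. The following is a minimal sketch, not part of Algorithm A: it samples directions with \(\|d\|\leq 1\) and tests whether any of them lowers the linearized violation \(\max \{\lambda _{1}(\mathcal{A}(x)+D\mathcal{A}(x)d),0\}\) below \(\lambda _{1}(\mathcal{A}(x))\). Restricting to the unit ball loses nothing, because the linearized violation is convex in d. The helper names and the two toy constraints are hypothetical, and a sampling test can only refute stationarity, not prove it.

```python
import numpy as np

def lambda1(M):
    """Largest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(M)[-1])

def linearized_violation(A, dA, d):
    """max{lambda_1(A + sum_i d_i * dA[i]), 0} for a direction d."""
    lin = A + sum(di * dAi for di, dAi in zip(d, dA))
    return max(lambda1(lin), 0.0)

def looks_infeasible_stationary(A, dA, n_samples=2000, tol=1e-10, seed=0):
    """Sampling test: x is infeasible and no sampled direction with
    ||d|| <= 1 reduces the linearized violation below lambda_1(A)."""
    base = lambda1(A)
    if base <= 0.0:                      # x is feasible, so it cannot qualify
        return False
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        d = rng.normal(size=len(dA))
        d /= max(1.0, np.linalg.norm(d))  # keep ||d|| <= 1
        if linearized_violation(A, dA, d) < base - tol:
            return False
    return True

# Hypothetical toy constraint A(x) = diag(x1^2 + 1, x2 - 1) at x = (0, 0):
# the derivative with respect to x1 vanishes, so the violation cannot be reduced.
A0 = np.array([[1.0, 0.0], [0.0, -1.0]])
dA_flat = [np.zeros((2, 2)), np.array([[0.0, 0.0], [0.0, 1.0]])]
stationary = looks_infeasible_stationary(A0, dA_flat)

# Hypothetical toy constraint A(x) = diag(x1 + 1, x2 - 1) at x = (0, 0):
# moving along -e1 reduces the violation, so x is not stationary.
dA_lin = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])]
not_stationary = looks_infeasible_stationary(A0, dA_lin)
```

In the first toy case the linearization equals \(\operatorname{diag}(1,-1+d_{2})\), whose largest eigenvalue stays at 1 for every \(|d_{2}|\leq 1\); in the second case any direction with \(d_{1}<0\) lowers it.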

In the rest of the paper, we assume \(\alpha _{k}<+\infty \). According to the update rule (3.1), the following conclusion is shown easily.

Lemma 4.3

Suppose that Assumptions A1–A4 hold. Then there exists an integer \(k_{0}\) such that \(\alpha _{k}\equiv \alpha _{k_{0}} \triangleq \alpha >0\) for any \(k\geq k_{0}\).

Based on Lemma 4.3, in the rest of the paper, without loss of generality, we assume that \(\alpha _{k}\equiv \alpha \), \(k=1, 2, \ldots \) .

Lemma 4.4

Suppose that Assumptions A1–A2 hold, \(x^{k} {\longrightarrow } x^{*}\), \(H_{k} {\longrightarrow } H_{*}\). Then \(z_{k} {\longrightarrow } z_{*}\), \(d^{k} {\longrightarrow } d^{*}\), where \(z_{k}\), \(z_{*}\) are the optimal solutions of \(\operatorname{LSDP}(x^{k})\) (2.7) and \(\operatorname{LSDP}(x^{*})\) (2.7), respectively, and \(d^{k}\), \(d ^{*}\) are the optimal solutions of \(\operatorname{QSDP}(x^{k}, H_{k})\) (2.8) and \(\operatorname{QSDP}(x^{*}, H_{*})\) (2.8), respectively.

Proof

Since \(z_{k}\) is the optimal value of \(\operatorname{LSDP}(x^{k})\) (2.7) and \((0, \lambda _{1}(\mathcal{A}(x^{k}))_{+})\) is a feasible solution of \(\operatorname{LSDP}(x^{k})\) (2.7), we obtain \(z_{k}\leq \lambda _{1}(\mathcal{A}(x^{k}))_{+}\). By the boundedness of \(\{\lambda _{1}(\mathcal{A}(x^{k}))\}\) and \(z_{k}\geq 0\), it follows that \(\{z_{k}\}\) is bounded. According to the sensitivity theory of semidefinite programming in [28], we know that the first part of the conclusions is true.

Now consider the second part of the conclusions. We first prove \(\{d ^{k}\}\) is bounded. It follows from \(\operatorname{LSDP}(x^{k})\) (2.7) that \(\|\widehat{d}^{k}\|\leq 1\). And obviously, \(\widehat{d}^{k}\) is a feasible solution of \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8), so one obtains

$$\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}+ \frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}H _{k}d^{k}\leq \nabla f\bigl(x^{k} \bigr)^{\mathrm{T}}\widehat{d}^{k}+ \frac{1}{2}\bigl( \widehat{d}^{k}\bigr)^{\mathrm{T}}H_{k} \widehat{d}^{k}, $$

further, the above inequality gives rise to

$$\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}+ \frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}H _{k}d^{k}\leq \bigl\Vert \nabla f \bigl(x^{k}\bigr) \bigr\Vert \bigl\Vert \widehat{d}^{k} \bigr\Vert + \frac{1}{2} \bigl\Vert \widehat{d}^{k} \bigr\Vert ^{2}\bar{a}\leq M_{1}+\frac{1}{2} \bar{a}. $$

On the other hand, one has

$$\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}+ \frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}H _{k}d^{k}\geq - \bigl\Vert \nabla f \bigl(x^{k}\bigr) \bigr\Vert \bigl\Vert d^{k} \bigr\Vert +a \bigl\Vert d^{k} \bigr\Vert ^{2}\geq -M_{1} \bigl\Vert d^{k} \bigr\Vert +a \bigl\Vert d^{k} \bigr\Vert ^{2}. $$

The two inequalities above indicate that \(\{d^{k}\}\) is bounded.

Suppose that \(d^{k} \not \to d^{*}\), then there exists a subsequence \(\{d^{s}\}_{K_{1}}\subseteq \{d^{k}\}\) converging to \(\bar{d}\) (\(\neq d ^{*}\)). For any feasible solution d of \(\operatorname{QSDP}(x^{*}, H_{*})\) (2.8), since \(z_{k} {\rightarrow } z_{*}\), there exists a feasible solution \(d^{m}\) of \(\operatorname{QSDP}(x^{s}, H_{s})\) (2.8) such that

$$d^{m} \stackrel{K_{1}}{\longrightarrow } d. $$

Since \(d^{s}\) is the solution of \(\operatorname{QSDP}(x^{s}, H_{s})\) (2.8), one has

$$\nabla f\bigl(x^{s}\bigr)^{\mathrm{T}}d^{s}+ \frac{1}{2}\bigl(d^{s}\bigr)^{\mathrm{T}}H _{s} d^{s}\leq \nabla f\bigl(x^{s} \bigr)^{\mathrm{T}}d^{m}+\frac{1}{2}\bigl(d^{m} \bigr)^{ \mathrm{T}} H_{s} d^{m}. $$

Let \(s \stackrel{K_{1}}{\longrightarrow } \infty \), \(m \stackrel{K_{1}}{ \longrightarrow } \infty \), one gets

$$\nabla f\bigl(x^{*}\bigr)^{\mathrm{T}}\bar{d}+ \frac{1}{2}\bar{d}^{\mathrm{T}}H _{*}\bar{d} \leq \nabla f\bigl(x^{*}\bigr)^{\mathrm{T}}d+\frac{1}{2}d^{ \mathrm{T}}H_{*}d, $$

which means that \(\bar{d}\) is a solution of \(\operatorname{QSDP}(x^{*}, H_{*})\) (2.8). This contradicts the uniqueness of the solution of \(\operatorname{QSDP}(x ^{*}, H_{*})\) (2.8). □

Lemma 4.5

Suppose that Assumptions A1–A4 hold, \(x^{*}\) is an accumulation point of the sequence \(\{x^{k}\}\) generated by Algorithm A, i.e., \(x^{k}\stackrel{K}{\longrightarrow } x^{*}\). If \(x^{*}\) is not an infeasible stationary point of NLSDP (1.1), then \(d^{k}\stackrel{K}{\longrightarrow }0\).

Proof

By contradiction, suppose that \(\{d^{k}\}_{K}\) does not converge to 0. Then there exist a constant \(b>0\) and an index subset \(K'\subseteq K\) such that

$$ \bigl\Vert d^{k} \bigr\Vert \geq b>0 $$
(4.5)

for any \(k\in K'\). The following proof is divided into two steps.

Step A. We first prove \(\underline{t}:=\inf \{t_{k}, k \in K'\}>0\).

By Taylor expansion and the boundedness of the sequence \(\{d^{k}\}\), one has

$$\begin{aligned} f\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr) =&f\bigl(x^{k}\bigr)+t\nabla f \bigl(x^{k}\bigr)^{ \mathrm{T}}d^{k}+o(t), \\ {\mathcal{A}}\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr) =&\mathcal{A}\bigl(x ^{k} \bigr)+t\sum_{i=1}^{n} d_{i}^{k}\frac{\partial {\mathcal{A}}(x^{k})}{ \partial x_{i}}+o(t). \end{aligned}$$
(4.6)

In view of \(t\leq 1\), combining with the convexity of \(\lambda _{1}( \cdot )_{+}\) and \(\operatorname{QSDP}(x^{k},H_{k})\) (2.8), one obtains

$$\begin{aligned}& \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr)\bigr)_{+} \\& \quad \leq (1-t)\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}+t\lambda _{1} \Biggl( \mathcal{A} \bigl(x^{k}\bigr)+\sum_{i=1}^{n} d_{i}^{k}\frac{\partial {\mathcal{A}}(x ^{k})}{\partial x_{i}} \Biggr)_{+}+o(t) \\& \quad \leq (1-t)\lambda _{1}\bigl(\mathcal{A}\bigl(x^{k} \bigr)\bigr)_{+}+t\lambda _{1}(z_{k}E _{m})_{+}+o(t), \end{aligned}$$
(4.7)

which together with (2.12) and (4.6) gives

$$\begin{aligned}& \theta _{\alpha }\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr) \\& \quad \leq f\bigl(x^{k}\bigr)+t\nabla f\bigl(x^{k} \bigr)^{\mathrm{T}}d^{k}+o(t)+\alpha \bigl[(1-t) \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+}+t\lambda _{1}(z_{k}E_{m})_{+}+o(t) \bigr] \\& \quad =f\bigl(x^{k}\bigr)+\alpha \lambda _{1}\bigl( \mathcal{A}\bigl(x^{k}\bigr)\bigr)_{+}+t\bigl(\nabla f \bigl(x ^{k}\bigr)^{\mathrm{T}}d^{k}-\alpha \bigl( \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+}- \lambda _{1}(z_{k} E_{m})_{+}\bigr)\bigr)+o(t) \\& \quad =\theta _{\alpha }\bigl(x^{k}\bigr)+t\Delta \bigl(x^{k},\alpha \bigr)+o(t), \end{aligned}$$

so we obtain

$$\begin{aligned} \theta _{\alpha }\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr)-\theta _{\alpha }\bigl(x^{k} \bigr)-\beta t\Delta \bigl(x^{k},\alpha \bigr)\leq (1-\beta )t \Delta \bigl(x ^{k},\alpha \bigr)+o(t). \end{aligned}$$
(4.8)

By Lemma 4.4 and (3.12), we get

$$ \Delta \bigl(x^{k},\alpha \bigr)\leq -\bigl(d^{k} \bigr)^{\mathrm{T}}H_{k}d^{k}\longrightarrow - \bigl(d^{*}\bigr)^{\mathrm{T}}H_{*}d^{*}< 0 \quad \mbox{as } k\ \bigl(\in K'\bigr)\longrightarrow \infty , $$

so we have

$$ \Delta \bigl(x^{k},\alpha \bigr)\leq -0.5\bigl(d^{*} \bigr)^{\mathrm{T}}H_{*}d^{*} $$

for k (\(\in K'\)) sufficiently large. Substituting the above inequality into (4.8) gives

$$ \theta _{\alpha }\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr)-\theta _{\alpha }\bigl(x^{k} \bigr)-\beta t\Delta \bigl(x^{k},\alpha \bigr)\leq -0.5(1-\beta )t \bigl(d^{*}\bigr)^{ \mathrm{T}}H_{*}d^{*}+o(t), $$
(4.9)

which means that, for k (\(\in K'\)) sufficiently large and t sufficiently small, inequality (3.3) or the first inequality of (3.4) holds.

In what follows, we consider the second inequality of (3.4).

Note that \(P(x^{k})>\bar{P}>0\), so \(P(x^{*})=\lim_{K'} P(x^{k}) \ge \bar{P}>0\), which means \(x^{*}\) is an infeasible solution of NLSDP (1.1). Since \(x^{*}\) is not an infeasible stationary point of NLSDP (1.1), it follows that \(z_{*}-\lambda _{1}(\mathcal{A}(x ^{*}))_{+}<0\). Further, we have

$$ \lambda _{1}(z_{k}E_{m})_{+}- \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+} \longrightarrow z_{*}-\lambda _{1}\bigl(\mathcal{A}\bigl(x^{*}\bigr) \bigr)_{+}< 0\quad \mbox{as }k\bigl(\in K'\bigr) \rightarrow \infty , $$

so it follows that, for k (\(\in K'\)) sufficiently large,

$$ \lambda _{1}(z_{k}E_{m})_{+}- \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+} < 0.5\bigl(z _{*}-\lambda _{1} \bigl(\mathcal{A}\bigl(x^{*}\bigr)\bigr)_{+}\bigr). $$

By (4.7), (2.11), and the above inequality, one has

$$\begin{aligned} P\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr) \leq &P\bigl(x^{k}\bigr)+t \bigl(\lambda _{1}(z _{k}E_{m})_{+}- \lambda _{1}\bigl(\mathcal{A}\bigl(x^{k}\bigr) \bigr)_{+}\bigr)+o(t) \\ \leq &P\bigl(x^{k}\bigr)+0.5t\bigl(z_{*}-\lambda _{1}\bigl(\mathcal{A}\bigl(x^{*}\bigr) \bigr)_{+}\bigr)+o(t), \end{aligned}$$

equivalently,

$$\begin{aligned} P\bigl(x^{k}+td^{k}+t^{2} \widetilde{d}^{k}\bigr)-P\bigl(x^{k}\bigr)\leq 0.5t \bigl(z_{*}-\lambda _{1}\bigl(\mathcal{A} \bigl(x^{*}\bigr)\bigr)_{+}\bigr)+o(t), \end{aligned}$$
(4.10)

which implies that, for k (\(\in K'\)) sufficiently large and t sufficiently small, the second inequality in (3.4) holds.

Summarizing the analysis above, we can conclude \(\underline{t}:= \inf \{t_{k}, k\in K'\}>0\).

Step B. Based on \(\underline{t}= \inf \{t_{k}, k\in K'\}>0 \), we prove a contradiction will occur.

It follows from (3.3) or (3.4) that \(\{\theta _{ \alpha }(x^{k})\}\) is nonincreasing and

$$ \theta _{\alpha }\bigl(x^{k+1}\bigr)\leq \theta _{\alpha }\bigl(x^{k}\bigr)-0.5ab^{2}\beta \underline{t} $$
(4.11)

for any \(k\in K'\), where b is defined in (4.5). And one obtains from (2.12) that

$$ \theta _{\alpha }\bigl(x^{k}\bigr)=f\bigl(x^{k} \bigr)+\alpha \bigl(\lambda _{1}\bigl(\mathcal{A}\bigl(x ^{k}\bigr)\bigr)_{+}\bigr)\geq f\bigl(x^{k} \bigr), $$

combining with the boundedness of \(\{f(x^{k})\}\), we conclude that \(\{\theta _{\alpha }(x^{k})\}\) is convergent. Taking \(k\stackrel{ K'}{ \longrightarrow }\infty \) in (4.11), we obtain \(-0.5ab^{2} \beta \underline{t}\geq 0\). This is a contradiction. So \(\lim_{K}d^{k}=0\). □
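The sufficient-decrease test that drives Step A can be sketched in code. The following is a minimal illustration, assuming the merit function \(\theta _{\alpha }(x)=f(x)+\alpha \lambda _{1}(\mathcal{A}(x))_{+}\) and the acceptance test \(\theta _{\alpha }(x+td+t^{2}\widetilde{d})\leq \theta _{\alpha }(x)+\beta t\Delta \) used in the proof above. The additional test on \(P\) in (3.4) is omitted, and the halving rule `shrink`, the floor `t_min`, and all function names are assumptions of this sketch rather than details of Algorithm A.

```python
import numpy as np

def lambda1_plus(M):
    """max{lambda_1(M), 0}: the constraint violation used in the merit function."""
    return max(float(np.linalg.eigvalsh(M)[-1]), 0.0)

def merit(f, A, x, alpha):
    """theta_alpha(x) = f(x) + alpha * lambda_1(A(x))_+."""
    return f(x) + alpha * lambda1_plus(A(x))

def arc_search(f, A, x, d, d_corr, delta, alpha, beta=0.4, shrink=0.5, t_min=1e-12):
    """Backtrack along the arc x + t*d + t^2*d_corr until
    theta(x + t*d + t^2*d_corr) <= theta(x) + beta*t*delta.
    `shrink` and `t_min` are assumptions of this sketch."""
    theta0 = merit(f, A, x, alpha)
    t = 1.0
    while t >= t_min:
        trial = x + t * d + t**2 * d_corr
        if merit(f, A, trial, alpha) <= theta0 + beta * t * delta:
            return t, trial
        t *= shrink
    raise RuntimeError("no acceptable step found")

# Hypothetical one-dimensional example: f(x) = x^4 with the 1 x 1 constraint x - 2 <= 0.
f = lambda x: x[0] ** 4
A = lambda x: np.array([[x[0] - 2.0]])
x0 = np.array([1.0])
d0 = np.array([-4.0])      # steepest-descent direction at x0, chosen to force backtracking
d_corr0 = np.zeros(1)      # no second-order correction in this toy case
delta0 = 4.0 * (-4.0)      # grad f(x0)^T d0; the violation terms vanish since x0 is feasible
t_acc, x_new = arc_search(f, A, x0, d0, d_corr0, delta0, alpha=80.1)
```

In this toy run the trial steps \(t=1, 0.5, 0.25\) are rejected and \(t=0.125\) is accepted, which mirrors the role of \(\underline{t}>0\) in the proof: as long as \(\Delta \) stays bounded away from zero, the accepted step cannot shrink to zero.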

Based on the above results, we are now in a position to present the global convergence of Algorithm A.

Theorem 4.1

Suppose that Assumptions A1–A4 hold, \(x^{*}\) is an accumulation point of the sequence \(\{x^{k}\}\) generated by Algorithm A. Then either \(x^{*}\) is an infeasible stationary point, or a KKT point of NLSDP (1.1).

Proof

Without loss of generality, we suppose that \(x^{*}\) is not an infeasible stationary point of NLSDP (1.1). In what follows, we show that \(x^{*}\) is a KKT point of NLSDP (1.1).

By Lemmas 4.4 and 4.5, we know that \(d^{*}=0\) is an optimal solution of \(\operatorname{QSDP}(x^{*},H_{*})\) (2.8), so it follows from Lemma 2.5 that there exists \(\varLambda _{*}\in \mathbb{S}^{m}_{+}\) such that

$$\begin{aligned}& \nabla f\bigl(x^{*}\bigr)+D\mathcal{A}\bigl(x^{*} \bigr)^{\ast }\varLambda _{*} = 0, \end{aligned}$$
(4.12a)
$$\begin{aligned}& \mathcal{A}\bigl(x^{*}\bigr) \preceq z_{*} E_{m}, \end{aligned}$$
(4.12b)
$$\begin{aligned}& \operatorname{Tr}\bigl(\varLambda _{*}\bigl(\mathcal{A} \bigl(x^{*}\bigr)-z_{*}E_{m}\bigr)\bigr)=0. \end{aligned}$$
(4.12c)

Now we prove \(z_{*}=0\). By contradiction, if \(z_{*}\neq 0\), then \(z_{*}>0\). Since \((0, \lambda _{1}(\mathcal{A}(x^{*}))_{+})\) is a feasible solution of \(\operatorname{LSDP}(x^{*})\) (2.7), we get \(\lambda _{1}(\mathcal{A}(x^{*}))_{+}\geq z _{*}>0\), which implies that \(x^{*}\) is an infeasible point of NLSDP (1.1). On the other hand, we get \(z_{*}\geq \lambda _{1}(\mathcal{A}(x^{*}))_{+}>0\) by (4.12b), so \(z_{*}=\lambda _{1}(\mathcal{A}(x^{*}))_{+}\). Obviously, \((0, z_{*})\) with \(z_{*}=\lambda _{1}(\mathcal{A}(x^{*}))_{+}\) is an optimal solution of \(\operatorname{LSDP}(x^{*})\) (2.7). In a manner similar to the proof of Theorem 3.1, we can conclude that \(x^{*}\) is an infeasible stationary point of NLSDP (1.1), which is a contradiction.

Substituting \(z_{*}=0\) into (4.12a)–(4.12c), one obtains

$$\begin{aligned}& \nabla f\bigl(x^{*}\bigr)+D\mathcal{A}\bigl(x^{*} \bigr)^{\ast }\varLambda _{*} =0, \\& \mathcal{A}\bigl(x^{*}\bigr)\preceq 0, \qquad \operatorname{Tr}\bigl( \varLambda _{*}{\mathcal{A}}\bigl(x^{*}\bigr)\bigr) =0, \end{aligned}$$

which means that \(x^{*}\) is a KKT point of NLSDP (1.1). □
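As a concrete check of the KKT system obtained at the end of this proof, the residuals of its three conditions can be evaluated numerically. The sketch below is illustrative only; the function name is hypothetical. It uses the instance \(f(x)=e^{-x_{1}-x_{2}}\) with \(\mathcal{A}(x)=\bigl({\begin{smallmatrix} x_{1} & 1 \\ 1 & x_{2} \end{smallmatrix}}\bigr)\), which has the same form as test problem (6.2). The candidate pair \(x=(-1,-1)^{\mathrm{T}}\), \(\varLambda =e^{2}\bigl({\begin{smallmatrix} 1 & 1 \\ 1 & 1 \end{smallmatrix}}\bigr)\) comes from a short hand calculation and is not taken from the reported numerical results.

```python
import numpy as np

def kkt_residuals(grad_f, A, dA, Lam):
    """Residuals of the KKT system for min f(x) s.t. A(x) negative semidefinite.

    grad_f : gradient of f at x
    A      : the matrix A(x)
    dA     : list of partial derivatives dA/dx_i at x
    Lam    : candidate multiplier (symmetric matrix)
    """
    # Adjoint of the derivative: (DA(x)^* Lam)_i = <dA_i, Lam>.
    adjoint = np.array([np.sum(dAi * Lam) for dAi in dA])
    stationarity = float(np.linalg.norm(grad_f + adjoint))
    primal = max(float(np.linalg.eigvalsh(A)[-1]), 0.0)     # violation of A(x) <= 0
    dual = max(-float(np.linalg.eigvalsh(Lam)[0]), 0.0)     # violation of Lam >= 0
    complementarity = abs(float(np.trace(Lam @ A)))
    return stationarity, primal, dual, complementarity

# Candidate point and multiplier for f(x) = exp(-x1 - x2), A(x) = [[x1, 1], [1, x2]].
x = np.array([-1.0, -1.0])
grad_f = -np.exp(-x[0] - x[1]) * np.ones(2)
A = np.array([[x[0], 1.0], [1.0, x[1]]])
dA = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])]
Lam = np.exp(2.0) * np.ones((2, 2))
residuals = kkt_residuals(grad_f, A, dA, Lam)
```

All four residuals vanish up to rounding error for this pair. Since the objective is convex and the constraint is affine in this instance, the KKT conditions are also sufficient for optimality there.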

Superlinear convergence

In this section, we analyze the superlinear convergence of Algorithm A. At first, we prove that a full step can be accepted for k sufficiently large, and then we present the superlinear convergence. To this end, Assumption A1 should be strengthened as the following one:

A 5

The functions \(f(x)\) and \(\mathcal{A}(x)\) are twice continuously differentiable.

Besides, the following assumptions are necessary:

A 6

([20])

The sequence \(\{x^{k}\}\) generated by Algorithm A is an infinite sequence, and \(\lim_{k\to \infty } x ^{k}= x^{*}\), where \(x^{*}\) is a KKT point of NLSDP (1.1). In addition, let \(\varLambda _{*}\) be the corresponding Lagrangian multiplier.

A 7

([29])

The constrained nondegeneracy condition holds at \((x^{*},\varLambda _{*})\).

Denote

$$\begin{aligned} Y_{ij}^{k}= \biggl(\bigl(p_{i}^{k} \bigr)^{\mathrm{T}}\frac{\partial {\mathcal{A}}(x ^{k})}{\partial x_{1}}p_{j}^{k}, \bigl(p_{i}^{k}\bigr)^{\mathrm{T}} \frac{\partial {\mathcal{A}}(x^{k})}{\partial x_{2}}p_{j}^{k}, \ldots , \bigl(p_{i}^{k}\bigr)^{ \mathrm{T}} \frac{\partial {\mathcal{A}}(x^{k})}{\partial x_{n}}p_{j} ^{k} \biggr)^{\mathrm{T}}, \end{aligned}$$

where \(1\leq i\leq j\leq m-r\), \(\{p_{1}^{k}, \ldots , p_{m-r}^{k}\}\) is an orthogonal basis, which is introduced in Sect. 2. And let \(\lim_{k\to \infty } Y_{ij}^{k}=Y_{ij}^{*}\). The constrained nondegeneracy condition (i.e., Assumption A7) is equivalent to the fact that the matrix \((Y_{11}^{*}, Y_{12}^{*}, Y_{22}^{*}, Y _{13}^{*}, Y_{23}^{*}, Y_{33}^{*}, \ldots , Y_{m-r,m-r}^{*})\) has full column rank, which implies that \(\varLambda _{*}\) is unique. Based on this result, we know that subproblem (2.9) has a unique solution.
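The rank test described above is easy to carry out numerically. The following sketch assembles the vectors \(Y_{ij}\) at a given point and checks the column rank. It assumes that \(p_{1},\ldots ,p_{m-r}\) form an orthonormal basis of the null space of \(\mathcal{A}(x)\), obtained here from an eigendecomposition; this reading of the basis from Sect. 2, the tolerance, and the function name are assumptions of the sketch.

```python
import numpy as np

def nondegeneracy_matrix(A, dA, tol=1e-8):
    """Matrix with columns Y_11, Y_12, Y_22, Y_13, ... built from an
    orthonormal basis of the (numerical) null space of A."""
    w, V = np.linalg.eigh(A)
    P = V[:, np.abs(w) <= tol]           # columns p_1, ..., p_{m-r}
    q = P.shape[1]
    cols = []
    for j in range(q):
        for i in range(j + 1):           # pairs (i, j) with i <= j
            cols.append([P[:, i] @ dAl @ P[:, j] for dAl in dA])
    return np.array(cols).T              # shape n x q(q+1)/2

# Illustrative data: A(x) = [[x1, 1], [1, x2]] at x = (-1, -1), where rank A = 1.
A = np.array([[-1.0, 1.0], [1.0, -1.0]])
dA = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 1.0]])]
Y = nondegeneracy_matrix(A, dA)
full_column_rank = np.linalg.matrix_rank(Y) == Y.shape[1]
```

Here the null space is spanned by \((1,1)^{\mathrm{T}}/\sqrt{2}\), so the single column is \(Y_{11}=(1/2,1/2)^{\mathrm{T}}\) and the rank condition holds. Full column rank requires \(n\geq (m-r)(m-r+1)/2\).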

A 8

([29])

The strong second-order sufficient condition holds at \(x^{*}\), i.e.,

$$d^{T}\nabla _{xx}L\bigl(x^{*},\varLambda _{*}\bigr)d+\varGamma _{\mathcal{A}(x^{*})}\bigl( \varLambda _{*},D\mathcal{A}\bigl(x^{*}\bigr)d\bigr)>0 $$

for any \(d\in \operatorname{app}(\varLambda _{*})\backslash \{0\}\), where \(\operatorname{app}(\varLambda _{*}):=\{d\mid D\mathcal{A}(x^{*})d\in \operatorname{aff}(\mathcal{C}(\mathcal{A}(x^{*})+\varLambda _{*}; \mathbb{S}_{-}^{m}))\}\), \(\varGamma _{A}(B, C):=-2\langle B,CA^{\dotplus }C\rangle \), \(A^{\dotplus }\) is the Moore–Penrose pseudoinverse of A.
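The sigma term \(\varGamma _{A}(B,C)=-2\langle B,CA^{\dotplus }C\rangle \) in Assumption A8 can be evaluated directly from its definition. The sketch below does so with NumPy's pseudoinverse; the function name and the sample matrices are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def sigma_term(A, B, C):
    """Gamma_A(B, C) = -2 <B, C A^+ C>, where A^+ is the Moore-Penrose
    pseudoinverse of A and <.,.> is the trace inner product."""
    A_pinv = np.linalg.pinv(A)
    return -2.0 * float(np.sum(B * (C @ A_pinv @ C)))

# Hypothetical data: A negative semidefinite of rank 1, B positive semidefinite
# with <A, B> = 0, and C an arbitrary symmetric matrix.
A = np.diag([-2.0, 0.0])
B = np.diag([0.0, 3.0])
C = np.array([[1.0, 2.0], [2.0, 5.0]])
gamma = sigma_term(A, B, C)
```

For these matrices \(A^{\dotplus }=\operatorname{diag}(-1/2,0)\), so \(\varGamma _{A}(B,C)=3\cdot 2^{2}=12\). The term is nonnegative whenever \(A\preceq 0\) and \(B\succeq 0\), which is why it strengthens the curvature of the Lagrangian in the second-order condition.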

A 9

The strict complementarity condition is satisfied at \((x^{*},\varLambda _{*})\), i.e.,

$$\operatorname{rank}\bigl(\mathcal{A}\bigl(x^{*}\bigr)\bigr)=r,\qquad \operatorname{rank}(\varLambda _{*})=m-r. $$

A 10

\(\|(W_{k}-H_{k})d^{k}\|=o(\|d^{k}\|)\), where

$$\begin{aligned}& W_{k}=\nabla _{xx}L\bigl(x^{k},\varLambda _{k}\bigr)=\nabla ^{2}f\bigl(x^{k}\bigr)+ \nabla _{xx} ^{2}\bigl\langle \varLambda _{k},\mathcal{A}\bigl(x^{k}\bigr)\bigr\rangle , \\& \nabla _{xx}^{2}\bigl\langle \varLambda _{k},\mathcal{A}\bigl(x^{k}\bigr)\bigr\rangle = \begin{pmatrix} \langle \varLambda _{k},\frac{\partial ^{2}{\mathcal{A}}(x^{k})}{\partial x_{1}\partial x_{1}}\rangle & \ldots &\langle \varLambda _{k},\frac{\partial ^{2}{\mathcal{A}}(x^{k})}{\partial x_{1}\partial x_{n}}\rangle \\ \ldots& \ldots &\ldots \\ \langle \varLambda _{k},\frac{\partial ^{2}{\mathcal{A}}(x^{k})}{\partial x_{n}\partial x_{1}}\rangle & \ldots &\langle \varLambda _{k},\frac{\partial ^{2}{\mathcal{A}}(x^{k})}{\partial x_{n}\partial x_{n}}\rangle \end{pmatrix}. \end{aligned}$$
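To make the definition of \(W_{k}\) concrete, the following sketch assembles the Hessian of the Lagrangian entry by entry and evaluates the ratio \(\|(W_{k}-H_{k})d^{k}\|/\|d^{k}\|\) that Assumption A10 requires to tend to zero. The function names and the small data set are hypothetical.

```python
import numpy as np

def lagrangian_hessian(hess_f, d2A, Lam):
    """W = hess_f + [<Lam, d^2 A / dx_i dx_j>]_{ij}.

    d2A[i][j] is the m x m matrix of second partial derivatives of A
    with respect to x_i and x_j."""
    W = np.array(hess_f, dtype=float)
    n = W.shape[0]
    for i in range(n):
        for j in range(n):
            W[i, j] += np.sum(Lam * d2A[i][j])
    return W

def a10_ratio(W, H, d):
    """||(W - H) d|| / ||d||, the quantity controlled by Assumption A10."""
    return float(np.linalg.norm((W - H) @ d) / np.linalg.norm(d))

# Hypothetical data: f(x) = x^2 and A(x) = diag(x^2 - 1, -1), so d^2A/dx^2 = diag(2, 0).
hess_f = np.array([[2.0]])
d2A = [[np.diag([2.0, 0.0])]]
Lam = np.diag([2.0, 0.0])
W = lagrangian_hessian(hess_f, d2A, Lam)      # 2 + <Lam, diag(2, 0)> = 6
ratio = a10_ratio(W, np.array([[5.5]]), np.array([0.1]))
```

When \(\mathcal{A}\) is affine, as in the test problems of the numerical section, all second derivatives of \(\mathcal{A}\) vanish and \(W_{k}\) reduces to \(\nabla ^{2}f(x^{k})\).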

Based on Assumptions A5–A6, we know that \(z_{k}=0\) for sufficiently large k, so for sufficiently large k, the two subproblems LSDP (2.7) and QSDP (2.8) can be replaced by the following subproblem:

$$ \begin{gathered} \min \quad \nabla f \bigl(x^{k}\bigr)^{\mathrm{T}}d+\frac{1}{2}d^{\mathrm{T}}H _{k}d \\ \mathrm{s.t.} \quad {\mathcal{A}}\bigl(x^{k}\bigr)+D\mathcal{A} \bigl(x^{k}\bigr)d\preceq 0. \end{gathered} $$
(5.1)

The following conclusions are the results in [20], which are important for the analysis of the acceptance of full step size (i.e., Lemma 5.2).

Lemma 5.1

Suppose that Assumptions A2–A10 hold, then the following conclusions are true:

  (i) \(\lim_{k\rightarrow \infty }\varLambda _{k}=\varLambda _{*}\).

  (ii) \(\operatorname{rank}(\mathcal{A}(x^{k})+D\mathcal{A}(x^{k})d ^{k})=\operatorname{rank}(\mathcal{A}(x^{*}))=r\) for all k sufficiently large.

  (iii) If \(\widetilde{d}^{k}\) is a solution to subproblem (2.9), then there exists \(\widehat{\varPhi }_{k}\in {\mathbb{S}} ^{q}\) such that

    $$ \nabla f\bigl(x^{k}\bigr)+H_{k}d^{k}+H_{k} \widetilde{d}^{k}+D\mathcal{A}\bigl(x^{k} \bigr)^{*} {\bar{N}_{k}}^{\mathrm{T}}\widehat{\varPhi }_{k}{\bar{N}_{k}}=0. $$
    (5.2)

  (iv) \(\|\widetilde{d}^{k}\|=O(\|d^{k}\|^{2})\) and \(\|\varPhi _{k}- \widehat{\varPhi }_{k}\|=O(\|d^{k}\|^{2})\) for all k sufficiently large, where \(\varPhi _{k}\) satisfies \(\varLambda _{k}={\bar{N}_{k}}\varPhi _{k}{\bar{N} _{k}}^{\mathrm{T}}\).

  (v) \(\lambda _{1}(\mathcal{A}(x^{k}+d^{k}+\widetilde{d} ^{k}))\leq 0\) holds for all k sufficiently large.

Based on Lemma 5.1, we are now in a position to show that the full step size can be accepted for k sufficiently large, which plays a key role in the superlinear convergence.

Lemma 5.2

Suppose that Assumptions A2–A10 hold, then the full step size can be accepted, i.e., \(t_{k}\equiv 1\) for k sufficiently large.

Proof

By Lemma 5.1(v), we know \(\lambda _{1}( \mathcal{A}(x^{k}+d^{k}+\widetilde{d}^{k}))\leq 0\) for all k sufficiently large, which implies \(P(x^{k})\leq \bar{P}\) for all k sufficiently large. So according to the arc search strategy (see Step 5 in Algorithm A), it is sufficient to prove

$$\theta _{\alpha }\bigl(x^{k}+d^{k}+ \widetilde{d}^{k}\bigr)\leq \theta _{\alpha }\bigl(x ^{k}\bigr)+\beta \Delta _{k} $$

for k sufficiently large, or equivalently,

$$ T_{k}:=\theta _{\alpha } \bigl(x^{k}+d^{k}+\widetilde{d}^{k}\bigr)- \theta _{\alpha }\bigl(x^{k}\bigr)-\beta \Delta _{k}\leq 0 $$
(5.3)

for k sufficiently large.

By (2.12), Taylor expansion, and Lemma 5.1(iv), (v), we obtain

$$\begin{aligned}& \theta _{\alpha }\bigl(x^{k}+d^{k}+ \widetilde{d}^{k}\bigr)-\theta _{\alpha }\bigl(x ^{k}\bigr) \\& \quad =\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}+ \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}} {\widetilde{d}^{k}}+ \frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}\nabla ^{2}f\bigl(x^{k}\bigr)d ^{k}-\alpha P \bigl(x^{k}\bigr)+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr). \end{aligned}$$
(5.4)

It follows from the constraints of subproblem (2.9) that

$$ \bar{N}_{k}^{\mathrm{T}}\bigl(\mathcal{A} \bigl(x^{k}+d^{k}\bigr)\bigr)\bar{N}_{k}=- \bar{N}_{k}^{\mathrm{T}}\bigl(D\mathcal{A}\bigl(x^{k} \bigr)\widetilde{d}^{k}\bigr)\bar{N} _{k}+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr), $$
(5.5)

which gives rise to

$$\begin{aligned} -\bigl\langle D\mathcal{A}\bigl(x^{k}\bigr)\widetilde{d}^{k}, \bar{N}_{k}\hat{\varPhi } _{k}\bar{N}_{k}^{\mathrm{T}} \bigr\rangle =\bigl\langle {\mathcal{A}}\bigl(x^{k}+d ^{k}\bigr),\bar{N}_{k}\hat{\varPhi }_{k} \bar{N}_{k}^{\mathrm{T}}\bigr\rangle +o\bigl( \bigl\Vert d ^{k} \bigr\Vert ^{2}\bigr). \end{aligned}$$

By (5.2), (2.2), and the above equality, one has

$$\begin{aligned} \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}{ \widetilde{d}^{k}} =&-\bigl\langle D \mathcal{A} \bigl(x^{k}\bigr)\widetilde{d}^{k},\bar{N}_{k} \hat{\varPhi }_{k}\bar{N} _{k}^{\mathrm{T}}\bigr\rangle -\bigl(d^{k}+\widetilde{d}^{k} \bigr)^{\mathrm{T}}H_{k} {\widetilde{d}^{k}} \\ =&\bigl\langle {\mathcal{A}}\bigl(x^{k}+d^{k}\bigr), \bar{N}_{k}\hat{\varPhi }_{k} \bar{N}_{k}^{\mathrm{T}} \bigr\rangle +o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr) \\ =&\bigl\langle \bar{N}_{k}^{\mathrm{T}}{\mathcal{A}} \bigl(x^{k}+d^{k}\bigr)\bar{N} _{k},\hat{ \varPhi }_{k}\bigr\rangle +o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr) \\ =&\bigl\langle \bar{N}_{k}^{\mathrm{T}}{\mathcal{A}} \bigl(x^{k}+d^{k}\bigr)\bar{N} _{k},\hat{ \varPhi }_{k}-\varPhi _{k}\bigr\rangle \\ &{}+\bigl\langle \bar{N}_{k}^{\mathrm{T}} {\mathcal{A}}\bigl(x^{k}+d^{k} \bigr)\bar{N}_{k},\varPhi _{k}\bigr\rangle +o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr) \\ =& \bigl\langle {\mathcal{A}}\bigl(x^{k}+d^{k}\bigr), \bar{N}_{k}\varPhi _{k} \bar{N} _{k}^{\mathrm{T}} \bigr\rangle +o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr), \end{aligned}$$

where the last equality is due to Lemma 5.1(iv) and (5.5).

Note that \({\bar{N}_{k}}\varPhi _{k}{\bar{N}_{k}}^{\mathrm{T}}=\varLambda _{k}\), by Taylor expansion and (2.10c), we get

$$\begin{aligned} \triangledown f\bigl(x^{k}\bigr)^{\mathrm{T}}{ \widetilde{d}^{k}} =&\biggl\langle {\mathcal{A}} \bigl(x^{k}\bigr)+D\mathcal{A}\bigl(x^{k} \bigr)d^{k}+\frac{1}{2}\bigl(d^{k} \bigr)^{ \mathrm{T}}D^{2}{\mathcal{A}}\bigl(x^{k} \bigr)d^{k},\varLambda _{k}\biggr\rangle +o\bigl( \bigl\Vert d ^{k} \bigr\Vert ^{2}\bigr) \\ =&\biggl\langle \frac{1}{2}\bigl(d^{k} \bigr)^{\mathrm{T}}D^{2}{\mathcal{A}}\bigl(x^{k} \bigr)d ^{k},\varLambda _{k}\biggr\rangle +o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr) \\ =&\frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}\nabla _{xx}^{2}\bigl\langle \varLambda _{k}, \mathcal{A}\bigl(x^{k}\bigr)\bigr\rangle d^{k}+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr). \end{aligned}$$
(5.6)

By (5.3), (5.4), and (5.6), we have

$$\begin{aligned} T_{k} =&\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha P\bigl(x^{k}\bigr)+\nabla f\bigl(x ^{k} \bigr)^{\mathrm{T}}{\widetilde{d}^{k}}+\frac{1}{2} \bigl(d^{k}\bigr)^{\mathrm{T}} \nabla ^{2}f \bigl(x^{k}\bigr)d^{k} \\ &{}+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr)-\beta \Delta _{k} \\ =&\nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha P\bigl(x^{k}\bigr)+\frac{1}{2}\bigl(d ^{k}\bigr)^{\mathrm{T}}\nabla _{xx}^{2} \bigl\langle \varLambda _{k},\mathcal{A}\bigl(x ^{k} \bigr)\bigr\rangle d^{k} \\ &{}+\frac{1}{2}\bigl(d^{k}\bigr)^{\mathrm{T}}\nabla ^{2}f\bigl(x^{k}\bigr)d^{k}-\beta \Delta _{k}+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr). \end{aligned}$$
(5.7)

It follows from (3.2) and (3.12) that

$$ \nabla f\bigl(x^{k}\bigr)^{\mathrm{T}}d^{k}- \alpha P\bigl(x^{k}\bigr)=\Delta _{k}\leq -\bigl(d ^{k}\bigr)^{\mathrm{T}}H_{k}d^{k}. $$
(5.8)

Noting that \(\frac{1}{2}-\beta >0\), it follows from (5.7), (5.8), and Assumption A2 that

$$\begin{aligned} T_{k} \leq &\biggl(\frac{1}{2}-\beta \biggr)\Delta _{k}+\frac{1}{2}\bigl(d^{k} \bigr)^{ \mathrm{T}}(W_{k}-H_{k})d^{k}+o \bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr) \\ \leq &-\biggl(\frac{1}{2}-\beta \biggr)a \bigl\Vert d^{k} \bigr\Vert ^{2}+\frac{1}{2} \bigl(d^{k}\bigr)^{ \mathrm{T}}(W_{k}-H_{k})d^{k}+o \bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr), \end{aligned}$$

which together with Assumption A10 gives

$$T_{k}\leq -\biggl(\frac{1}{2}-\beta \biggr)a \bigl\Vert d^{k} \bigr\Vert ^{2}+o\bigl( \bigl\Vert d^{k} \bigr\Vert ^{2}\bigr)\leq 0 $$

for k sufficiently large. So we get the conclusion. □

Based on Lemma 5.2, we now present the superlinear convergence of Algorithm A. The proof is similar to that of Theorem 3.3 in [17].

Theorem 5.1

Suppose that Assumptions A2–A10 hold. Then the sequence \(\{x^{k}\}\) generated by Algorithm A is superlinearly convergent, that is, \(\|x^{k+1}-x^{*}\|=o(\|x^{k}-x^{*}\|)\).
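In practice, the rate stated in Theorem 5.1 is usually inspected through the successive error ratios \(\|x^{k+1}-x^{*}\|/\|x^{k}-x^{*}\|\), which must tend to zero for a superlinearly convergent sequence. The following sketch computes these ratios for a given list of iterates. The function name is hypothetical, and the sequence used here is synthetic; it is not output of Algorithm A.

```python
import math

def error_ratios(iterates, x_star):
    """Ratios ||x^{k+1} - x*|| / ||x^k - x*|| along a sequence of iterates."""
    errs = [math.dist(x, x_star) for x in iterates]
    return [e1 / e0 for e0, e1 in zip(errs, errs[1:]) if e0 > 0.0]

# Synthetic sequence with errors 1, 1e-1, 1e-3, 1e-6, 1e-10 (illustration only).
x_star = (0.0, 0.0)
iterates = [(10.0 ** (-k * (k + 1) / 2), 0.0) for k in range(5)]
ratios = error_ratios(iterates, x_star)
```

For the synthetic sequence the ratios decrease by a factor of ten at each step, the typical signature of superlinear convergence. A linearly convergent sequence would instead show ratios settling at a positive constant.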

Numerical experiments

In this section, we report preliminary numerical experiments with Algorithm A. The test problems are chosen from [13, 30]. Algorithm A was coded in MATLAB (2017a) and run on a computer with a 3.60 GHz CPU and a 64-bit Windows 7 system.

The parameters are chosen as follows:

$$ \alpha _{0}=80.1,\qquad \bar{P}=5,\qquad \eta _{1}=0.1,\qquad \eta _{2}=0.2,\qquad \beta =0.4. $$

The stopping criterion is \(\| d^{k}\| \leq 10^{-6}\).

Problem 1

([30])

$$ \begin{gathered} \min \quad f(x)=\sin {x_{1}}+\cos {x_{2}} \\ \mathrm{s.t.} \quad \begin{pmatrix} x_{1} & 1 \\ 1 &x_{2} \end{pmatrix} \preceq 0, \\ x=(x_{1},x_{2})^{\mathrm{T}}\in \mathbb{R}^{2}. \end{gathered} $$
(6.1)

Problem 2

([30])

$$ \begin{gathered} \min \quad f(x)=e^{-x_{1}-x_{2}} \\ \mathrm{s.t.} \quad \begin{pmatrix} x_{1} & 1 \\ 1 &x_{2} \end{pmatrix} \preceq 0, \\ x=(x_{1},x_{2})^{\mathrm{T}}\in \mathbb{R}^{2}. \end{gathered} $$
(6.2)
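For readers who wish to reproduce the experiments, Problems 1 and 2 can be written down in a few lines. The sketch below is in Python rather than the MATLAB used for the reported results, and the function names are hypothetical. It provides the objectives, their gradients, the shared constraint matrix, and the violation measure \(\lambda _{1}(\mathcal{A}(x))_{+}\).

```python
import numpy as np

def A_matrix(x):
    """Constraint matrix shared by Problems 1 and 2: [[x1, 1], [1, x2]]."""
    return np.array([[x[0], 1.0], [1.0, x[1]]])

def f1(x):
    """Objective of Problem 1."""
    return np.sin(x[0]) + np.cos(x[1])

def grad_f1(x):
    return np.array([np.cos(x[0]), -np.sin(x[1])])

def f2(x):
    """Objective of Problem 2."""
    return np.exp(-x[0] - x[1])

def grad_f2(x):
    return -np.exp(-x[0] - x[1]) * np.ones(2)

def violation(x):
    """lambda_1(A(x))_+ : zero exactly when A(x) is negative semidefinite."""
    return max(float(np.linalg.eigvalsh(A_matrix(x))[-1]), 0.0)

# The feasible set is {x : x1 <= 0, x2 <= 0, x1 * x2 >= 1}.
v_feasible = violation(np.array([-2.0, -2.0]))    # eigenvalues -3 and -1
v_infeasible = violation(np.array([0.0, 0.0]))    # eigenvalues -1 and 1
```

Since the constraint matrix is affine in x, its partial derivatives are the constant matrices \(\operatorname{diag}(1,0)\) and \(\operatorname{diag}(0,1)\).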

For the above two test problems, we compare Algorithm A with the algorithm in [30]. The numerical results are listed in Table 1. The meaning of the notations in Table 1 is described as follows:

$$\textstyle\begin{array} {rl@{\qquad }rl} x^{0}: & \mbox{the initial point}, & \mbox{Iter}: & \mbox{the number of iterations}, \\ \mbox{time}: & \mbox{the CPU time}, & f_{\mathrm{final}}:& \mbox{the final objective value}. \end{array} $$
Table 1 Numerical results

The numerical results in Table 1 indicate that Algorithm A is much more robust than the algorithm in [30], although it requires more CPU time. The shorter CPU time of the algorithm in [30] is due to the fact that it solves only one subproblem at each iteration.

Problem 3

([13])

The nearest correlation matrix (NCM) problem:

$$ \begin{gathered} \min \quad f(X)=\frac{1}{2} \Vert X-C \Vert _{F}^{2} \\ \mathrm{s.t.} \quad X\succeq \epsilon I, \\ \quad X_{ii}=1,\quad i=1,2,\ldots,m, \end{gathered} $$
(6.3)

where \(C\in {\mathbb{S}}^{m}\) is a given matrix, \(X\in {\mathbb{S} ^{m}}\), ϵ is a scalar.

In the implementation, we set \(\epsilon =10^{-3}\) and generate the matrix C randomly with all diagonal elements equal to 1. For every fixed dimension, the test is repeated ten times.
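A test instance of this kind can be generated as follows. This is a minimal sketch under stated assumptions: the entries of C are drawn uniformly from \([-1,1]\) (the sampling distribution is not specified in the text), the semidefinite constraint is read in the negative semidefinite form \(\epsilon I-X\preceq 0\), which is the usual form of the NCM problem, and all function names are hypothetical.

```python
import numpy as np

def random_ncm_instance(m, seed=0):
    """Random symmetric matrix C with unit diagonal.
    The uniform distribution on [-1, 1] is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(-1.0, 1.0, size=(m, m))
    C = (G + G.T) / 2.0
    np.fill_diagonal(C, 1.0)
    return C

def ncm_objective(X, C):
    """f(X) = 0.5 * ||X - C||_F^2."""
    return 0.5 * float(np.linalg.norm(X - C, "fro")) ** 2

def ncm_violation(X, eps=1e-3):
    """lambda_1(eps*I - X)_+ : zero exactly when X - eps*I is positive semidefinite."""
    shifted = eps * np.eye(X.shape[0]) - X
    return max(float(np.linalg.eigvalsh(shifted)[-1]), 0.0)

C = random_ncm_instance(5)
X0 = np.eye(5)                      # a simple feasible starting point
start_value = ncm_objective(X0, C)
start_violation = ncm_violation(X0)
```

With the diagonal fixed at 1, the free variables are the \(m(m-1)/2\) strictly upper-triangular entries of X, which is one natural choice for the vector x of dimension n.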

The numerical results are listed in Table 2. The meaning of the notations in Table 2 is described as follows:

$$\textstyle\begin{array} {rl@{\qquad }rl} n: & \mbox{the dimensionality of }x;& m: & \mbox{the order of the matrix }\mathcal{A}(x); \\ \mbox{A-Iter}: & \mbox{the average iterative number}. \end{array} $$
Table 2 Numerical results of NCM

The numerical results in Table 2 indicate that the average number of iterations of Algorithm A is smaller than that of the other two algorithms. Hence, Algorithm A is comparable to them.

Concluding remarks

In this paper, we have presented a new SSDP algorithm for nonlinear semidefinite programming. Two carefully constructed subproblems are solved to generate the master search direction. In order to avoid the Maratos effect, a second-order correction direction is introduced by solving a new quadratic programming subproblem. A penalty function is used as a merit function for arc search. The global convergence and superlinear convergence of the proposed algorithm are shown under some mild conditions. The preliminary numerical results indicate that the proposed algorithm is effective and comparable.

References

  1. 1.

    Tuyen, N.V., Yao, J.C., Wen, C.F.: A note on approximate Karush–Kuhn–Tucker conditions in locally Lipschitz multiobjective optimization. Optim. Lett. 13, 163–174 (2019)

  2. 2.

    Chang, S.S., Wen, C.F., Yao, J.C.: Common zero point for a finite family of inclusion problems of accretive mappings in Banach spaces. Optimization 67, 1183–1196 (2018)

  3. 3.

    Takahashi, W., Wen, C.F., Yao, J.C.: The shrinking projection method for a finite family of demimetric mappings with variational inequality problems in a Hilbert space. Fixed Point Theory 19, 407–419 (2018)

  4. 4.

    Kanno, Y., Takewaki, I.: Sequential semidefinite program for maximum robustness design of structures under load uncertainty. J. Optim. Theory Appl. 130(2), 265–287 (2006)

  5. 5.

    Jarre, F.: An interior point method for semidefinite programming. Optim. Eng. 1, 347–372 (2000)

  6. 6.

    Ben, T.A., Jarre, F., Kovara, M., Nemirovski, A., Zowe, J.: Optimization design of trusses under a nonconvex global buckling constraint. Optim. Eng. 1, 189–213 (2000)

  7. 7.

    Noll, D., Torki, M., Apkarian, P.: Partially augmented Lagrangian method for matrix inequality constraints. SIAM J. Optim. 15(1), 161–184 (2001)

  8. 8.

    Sun, J., Zhang, L.W., Wu, Y.: Properties of the augmented Lagrangian in nonlinear semidefinite optimization. J. Optim. Theory Appl. 129(3), 437–456 (2006)

  9. 9.

    Sun, D.F., Sun, J., Zhang, L.W.: The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math. Program. 114(2), 349–391 (2008)

  10. 10.

    Luo, H.Z., Wu, H.X., Chen, G.T.: On the convergence of augmented Lagrangian methods for nonlinear semidefinite programming. J. Glob. Optim. 54(3), 599–618 (2012)

  11. 11.

    Wu, H.X., Luo, H.Z., Ding, X.D.: Global convergence of modified augmented Lagrangian methods for nonlinear semidefinite programming. Comput. Optim. Appl. 56(3), 531–558 (2013)

  12. 12.

    Wu, H.X., Luo, H.Z., Yang, J.F.: Nonlinear separation approach for the augmented Lagrangian in nonlinear semidefinite programming. J. Glob. Optim. 59(4), 695–727 (2014)

  13. 13.

    Yamashita, H., Yabe, H., Harada, K.: A primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 135, 89–121 (2012)

  14. 14.

    Yamashita, H., Yabe, H.: Local and superlinear convergence of a primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 132, 1–30 (2012)

  15. 15.

    Fares, B., Noll, D., Apkarian, P.: Robust control via sequential semidefinite programming. SIAM J. Control Optim. 40, 1791–1820 (2002)

  16. 16.

    Correa, R., Ramirez, H.C.: A global algorithm for nonlinear semidefinite programming. SIAM J. Optim. 15(1), 303–318 (2004)

  17. 17.

    Wang, Y., Zhang, S.W., Zhang, L.W.: A note on convergence analysis of an SQP-type method for nonlinear semidefinite programming. J. Inequal. Appl. 2008, 218345 (2007)

  18.

    Gomez, W., Ramirez, H.: A filter algorithm for nonlinear semidefinite programming. Comput. Appl. Math. 29(2), 297–328 (2010)

  19.

    Zhu, Z.B., Zhu, H.L.: A filter method for nonlinear semidefinite programming with global convergence. Acta Math. Sin. 30(10), 1810–1826 (2014)

  20.

    Zhao, Q., Chen, Z.W.: On the superlinear local convergence of a penalty-free method for nonlinear semidefinite programming. J. Comput. Appl. Math. 308, 1–19 (2016)

  21.

    Zhao, Q., Chen, Z.W.: An SQP-type method with superlinear convergence for nonlinear semidefinite programming. Asia-Pac. J. Oper. Res. 35, 1850009 (2018)

  22.

    Zhang, J.L., Zhang, X.S.: A robust SQP method for optimization with inequality constraints. J. Comput. Appl. Math. 21(2), 247–256 (2003)

  23.

    Theobald, C.M.: An inequality for the trace of the product of two symmetric matrices. Math. Proc. Camb. Philos. Soc. 77(2), 265–267 (1975)

  24.

    So, W.: Commutativity and spectra of Hermitian matrices. Linear Algebra Appl. 212–213(15), 121–129 (1994)

  25.

    Dunn, J.C.: On the convergence of projected gradient processes to singular critical points. J. Optim. Theory Appl. 55(2), 203–216 (1987)

  26.

    Burke, J.V., Moré, J.J.: On the identification of active constraints. SIAM J. Numer. Anal. 25, 1197–1211 (1988)

  27.

    Burke, J.V., Han, S.P.: A robust sequential quadratic programming method. Math. Program. 43(1–3), 277–303 (1989)

  28.

    Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems, vol. 13, pp. 63–83. Springer, Berlin (2000)

  29.

    Sun, D.F.: The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications. Math. Oper. Res. 31(4), 761–776 (2006)

  30.

    Li, Z.Y.: Differential-Algebraic Algorithm and Sequential Penalty Algorithm for Semidefinite Programming. Fujian Normal University, Fujian (2006) (in Chinese)

  31.

    Yamakawa, Y., Yamashita, N.: A differentiable merit function for the shifted perturbed Karush–Kuhn–Tucker conditions of the nonlinear semidefinite programming. Pac. J. Optim. 11(3), 557–579 (2015)

Availability of data and materials

Not applicable.

Authors’ information

Jianling Li, Email: jianlingli@126.com; Hui Zhang, Email: 605007820@qq.com

Funding

This work was supported by the National Natural Science Foundation (No. 11561005) and the Science Foundation of Guangxi Province (No. 2016GXNSFAA380248).

Author information

The authors completed the paper. The authors read and approved the final manuscript.

Correspondence to Jian Ling Li.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Keywords

  • Nonlinear semidefinite programming
  • Penalty function
  • Sequential semidefinite programming
  • Global convergence
  • Superlinear convergence