In this section, for simplicity, we introduce and use the following notation for problem (1):
$$\begin{aligned}& X: = \bigl\{ x \in R^{n}: f_{j}(x) \le 0, j \in J\bigr\} , \end{aligned}$$
(2)
$$\begin{aligned}& I(x): = \bigl\{ i \in I:f_{i}(x) = F(x)\bigr\} ,\qquad J(x): = \bigl\{ j \in J:f_{j}(x) = 0\bigr\} . \end{aligned}$$
(3)
As in the analysis of other projection algorithms, we need the following linear independence assumption.
Assumption A1
The functions \(f_{j}(x)\) (\(j \in I \cup J\)) are all continuously differentiable, and for each \(x \in X\) there exists an index \(l_{x} \in I(x)\) such that the gradient vectors \(\{ \nabla f_{i}(x) - \nabla f_{l_{x}}(x), i \in I(x)\backslash \{ l_{x}\};\nabla f_{j}(x),j \in J(x)\}\) are linearly independent.
Remark 1
We can easily find that Assumption A1 is equivalent to Assumption A11:
Assumption A11
The vectors \(\{ \nabla f_{i}(x) - \nabla f_{t}(x), i \in I(x)\backslash \{ t\}; \nabla f_{j}(x),j \in J(x)\} \) are linearly independent for arbitrary \(t \in I(x)\).
For a given point \(x^{k}\) and a parameter \(\varepsilon \ge 0\), we use the following approximate active sets in this paper:
$$ \textstyle\begin{cases} I_{k} = \{ i \in I: - \tilde{\varrho}_{k} \le f_{i}(x^{k}) - F(x^{k}) \le 0\}, \\ J_{k} = \{ j \in J: - \tilde{\varrho}_{k} \le f_{j}(x^{k}) \le 0\}, \end{cases} $$
(4)
where \(\tilde{\varrho}_{k}\) is taken by
$$ \tilde{\varrho}_{0} = \varepsilon,\qquad \tilde{\varrho}_{k} = \min \{ \varepsilon,\varrho_{k - 1}\},\quad k \ge 1, $$
(5)
\(\varrho_{k - 1} \ge 0\) is the identification function value corresponding to the previous iteration point \(x^{k - 1}\), calculated by (17). At each iteration only \(\varrho_{k}\) needs to be computed, since \(\varrho_{k - 1}\) was already obtained at the previous iteration, so the computational cost per iteration does not increase.
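The working-set rule (4)-(5) can be illustrated with a short Python sketch; all numbers below (the tolerance, the previous identification value, and the function values at \(x^{k}\)) are made-up assumptions for demonstration only.

```python
# Illustrative sketch of the approximate active sets (4)-(5); all numbers
# (epsilon, varrho_{k-1}, and the function values at x^k) are assumed.
eps = 0.1                                  # parameter epsilon
varrho_prev = 0.04                         # varrho_{k-1} from (17)
tilde_varrho = min(eps, varrho_prev)       # (5), case k >= 1

f_obj = {1: -0.2, 2: 0.0, 3: -0.03}        # f_i(x^k), i in I = {1, 2, 3}
f_con = {4: -0.01, 5: -0.5}                # f_j(x^k), j in J = {4, 5}

F = max(f_obj.values())                    # F(x^k)
I_k = {i for i, v in f_obj.items() if -tilde_varrho <= v - F <= 0}   # (4)
J_k = {j for j, v in f_con.items() if -tilde_varrho <= v <= 0}       # (4)
print(sorted(I_k), sorted(J_k))            # -> [2, 3] [4]
```

Note that index 3 enters \(I_k\) although \(f_{3}(x^{k}) < F(x^{k})\): the rule keeps nearly active pieces within the tolerance \(\tilde{\varrho}_{k}\), which is what makes the sets contain \(I(x^{k})\) and \(J(x^{k})\).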
For the current iteration point \(x^{k}\), for convenience, we define
$$\begin{aligned}& F^{k}: = F\bigl(x^{k}\bigr),\qquad f_{i}^{k}: = f_{i}\bigl(x^{k}\bigr),\qquad g_{i}^{k}: = \nabla f_{i}\bigl(x^{k}\bigr),\quad i \in I \cup J, \\& l_{k} \in I\bigl(x^{k}\bigr),\qquad I_{k}^{0}: = I_{k}\backslash \{ l_{k}\},\qquad L_{k}: = I_{k}^{0} \cup J_{k} = (I_{k} \cup J_{k})\backslash \{ l_{k}\}, \end{aligned}$$
where \(l_{k} \in I(x^{k})\). In theory, the index \(l_{k}\) can be any element of the active set \(I(x^{k})\) in Algorithm A, and all the theoretical analyses remain the same; for convenience, we set \(l_{k} = l_{x^{k}} := \min \{ i:i \in I(x^{k})\} \). One easily finds that \(I(x^{k}) \subseteq I_{k}\) and \(J(x^{k}) \subseteq J_{k}\). Therefore, for a feasible point \(x^{k}\) of the problem (1), the stationarity (optimality) conditions can be stated as
$$ \textstyle\begin{cases} \sum_{i \in I_{k}}\lambda_{i}^{k}g_{i}^{k} + \sum_{j \in J_{k}}\lambda_{j}^{k}g_{j}^{k} = 0,\qquad \sum_{i \in I_{k}}\lambda_{i}^{k} = 1, \\ \lambda_{i}^{k} \ge 0,\quad \lambda_{i}^{k}(f_{i}^{k} - F^{k}) = 0,\quad i \in I_{k};\qquad \lambda_{j}^{k} \ge 0,\quad \lambda_{j}^{k}f_{j}^{k} = 0,\quad j \in J_{k}. \end{cases} $$
(6)
The foregoing conditions can be rewritten in the following equivalent form:
$$ \textstyle\begin{cases} \sum_{i \in I_{k}^{0}}\lambda_{i}^{k}(g_{i}^{k} - g_{l_{k}}^{k}) + \sum_{j \in J_{k}}\lambda_{j}^{k}g_{j}^{k} = - g_{l_{k}}^{k}, \\ \lambda_{i}^{k} \ge 0,\quad \lambda_{i}^{k}(f_{i}^{k} - F^{k}) = 0,\quad i \in I_{k}^{0};\qquad \lambda_{j}^{k} \ge 0,\quad \lambda_{j}^{k}f_{j}^{k} = 0,\quad j \in J_{k}, \\ \lambda_{l_{k}}^{k} = 1 - \sum_{i \in I_{k}^{0}}\lambda_{i}^{k} \ge 0. \end{cases} $$
(7)
We define a generalized gradient projection matrix to test whether the current iteration point \(x^{k}\) satisfies (6):
$$ P_{k}: = E_{n} - N_{k}Q_{k}, $$
(8)
where \(E_{n}\) is the \(n\)th-order identity matrix, and
$$\begin{aligned}& N_{k}: = \bigl(g_{i}^{k} - g_{l_{k}}^{k},i \in I_{k}^{0};g_{j}^{k},j \in J_{k}\bigr),\qquad Q_{k}: = \bigl(N_{k}^{T}N_{k} + D_{k}\bigr)^{-1}N_{k}^{T}, \end{aligned}$$
(9)
$$\begin{aligned}& D_{k}: = \operatorname{diag}\bigl(D_{j}^{k},j \in L_{k}\bigr),\qquad D_{j}^{k}: = \textstyle\begin{cases} (F^{k} - f_{j}^{k})^{p},&j \in I_{k}^{0}; \\ ( - f_{j}^{k})^{p}, &j \in J_{k}. \end{cases}\displaystyle \end{aligned}$$
(10)
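To make the construction (8)-(10) concrete, here is a minimal pure-Python sketch for \(n = 2\) with a single index in \(I_{k}^{0}\) and empty \(J_{k}\), so \(N_{k}\) is one column and the inverse in \(Q_{k}\) reduces to a scalar; the gradients and the residual \(F^{k} - f_{i}^{k} = 0.03\) are illustrative assumptions, not data from the paper.

```python
# Minimal sketch of (8)-(10): n = 2, I_k^0 = {i}, J_k empty, so N_k is a
# single column and (N_k^T N_k + D_k)^{-1} is a scalar. All numbers assumed.
p = 2
g_l = [1.0, 0.0]                           # gradient of f_{l_k} at x^k
g_i = [0.0, 1.0]                           # gradient of f_i, i in I_k^0
N = [g_i[0] - g_l[0], g_i[1] - g_l[1]]     # (9): the single column of N_k
D = 0.03 ** p                              # (10): D_i^k = (F^k - f_i^k)^p
den = N[0]*N[0] + N[1]*N[1] + D            # N_k^T N_k + D_k (a scalar)
Q = [N[0] / den, N[1] / den]               # (9): Q_k, here a row vector
# (8): P_k = E_2 - N_k Q_k
P = [[1.0 - N[0]*Q[0], -N[0]*Q[1]],
     [-N[1]*Q[0], 1.0 - N[1]*Q[1]]]
Pg = [P[0][0]*g_l[0] + P[0][1]*g_l[1],     # P_k g_{l_k}^k
      P[1][0]*g_l[0] + P[1][1]*g_l[1]]
```

One can check numerically that \(N_{k}^{T}P_{k}g_{l_{k}}^{k} = D_{k}Q_{k}g_{l_{k}}^{k}\) in this instance, in line with Lemma 1(ii) below.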
Suppose that \((x^{k},\lambda_{L_{k}}^{k})\) is a stationary point pair. From (7), it is not difficult to obtain
$$N_{k}^{T}N_{k}\lambda_{L_{k}}^{k} = - N_{k}^{T}g_{l_{k}}^{k},\qquad D_{k}\lambda_{L_{k}}^{k} = 0,\qquad \lambda_{L_{k}}^{k} \ge 0. $$
These further imply that
$$\begin{aligned}& \bigl(N_{k}^{T}N_{k} + D_{k}\bigr) \lambda_{L_{k}}^{k} = - N_{k}^{T}g_{l_{k}}^{k}, \qquad \lambda_{L_{k}}^{k} = - Q_{k}g_{l_{k}}^{k}, \end{aligned}$$
(11)
$$\begin{aligned}& P_{k}g_{l_{k}}^{k} = 0, \qquad \lambda_{l_{k}}^{k} \ge 0, \qquad D_{j}^{k}\lambda_{j}^{k} = 0,\qquad \lambda_{j}^{k} \ge 0,\quad j \in L_{k}. \end{aligned}$$
(12)
Based on the above analysis, we introduce the following optimal identification function for the stationary point:
$$ \rho_{k}: = \bigl\Vert P_{k}g_{l_{k}}^{k} \bigr\Vert ^{2} + \omega_{k} + \bar{\omega}_{k}^{2}, $$
(13)
where
$$\begin{aligned}& \omega_{k}: = \sum_{j \in L_{k}}\max \bigl\{ - \mu_{j}^{k},\mu_{j}^{k}D_{j}^{k} \bigr\} ,\qquad \bar{\omega}_{k}: = \max \bigl\{ - \mu_{l_{k}}^{k},0 \bigr\} , \end{aligned}$$
(14)
$$\begin{aligned}& \mu_{L_{k}}^{k}: = - Q_{k}g_{l_{k}}^{k},\qquad \mu_{l_{k}}^{k}: = 1 - \sum_{i \in I_{k}^{0}} \mu_{i}^{k}. \end{aligned}$$
(15)
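Continuing the scalar-case instance used above (one index in \(I_{k}^{0}\), no constraints, assumed data \(N = (-1,1)\), \(D = 0.03^{2}\), \(g_{l_{k}} = (1,0)\)), the quantities (13)-(15) can be sketched as follows.

```python
# Illustrative scalar-case sketch of (13)-(15); with the assumed data
# N = (-1, 1), D = 0.03^2 and g_{l_k} = (1, 0) one gets
# den = N^T N + D = 2.0009 and P_k g_{l_k}^k = (1 - 1/den, 1/den).
den = 2.0009
Pg = [1.0 - 1.0/den, 1.0/den]              # P_k g_{l_k}^k
D = 0.0009                                 # D_i^k from (10)
mu_i = 1.0/den                             # (15): mu_{L_k}^k = -Q_k g_{l_k}^k
mu_l = 1.0 - mu_i                          # (15): multiplier of index l_k
omega = max(-mu_i, mu_i * D)               # (14)
omega_bar = max(-mu_l, 0.0)                # (14)
rho = Pg[0]**2 + Pg[1]**2 + omega + omega_bar**2    # (13)
print(rho > 0)                             # -> True: x^k is not stationary
```

Here \(\mu_{i}^{k} > 0\) but \(\mu_{i}^{k}D_{i}^{k} > 0\), so \(\omega_{k} > 0\) and \(\rho_{k} > 0\): the test correctly reports that this (assumed) point is not stationary.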
Before describing our algorithm, we state the following lemma.
Lemma 1
Suppose that Assumption
A1
holds. Then

(i)
the matrix
\(N_{k}^{T}N_{k} + D_{k}\)
is nonsingular and positive definite. Suppose
\(x^{k} \to x^{*}\), \(N_{k} \to N_{ *}\), \(D_{k} \to D_{ *}\), then the matrix
\(N_{ *}^{T}N_{ *} + D_{ *}\)
is also nonsingular and positive definite;

(ii)
\(N_{k}^{T}P_{k} = D_{k}Q_{k}\), \(N_{k}^{T}Q_{k}^{T} = E_{\vert L_{k}\vert } - D_{k}(N_{k}^{T}N_{k} + D_{k})^{-1}\);

(iii)
\((g_{l_{k}}^{k})^{T}P_{k}g_{l_{k}}^{k} = \Vert P_{k}g_{l_{k}}^{k}\Vert ^{2} + \sum_{j \in L_{k}}(\mu_{j}^{k})^{2}D_{j}^{k}\);

(iv)
\(\rho_{k} = 0\)
if and only if
\(x^{k}\)
is a stationary point of the problem (1).
Proof
Under Assumption A1, the matrix \(N_{k}^{T}N_{k} + D_{k}\) is nonsingular by [25], Theorem 1.1.9. Then we can prove that the matrix \(N_{*}^{T}N_{*} + D_{*}\) is also positive definite, similarly to [25], Lemma 2.2.2. Conclusions (ii) and (iii) can be obtained similarly to [25], Theorem 1.1.9. Next, we prove conclusion (iv) in detail.
(iv) If \(\rho_{k} = 0\), combining the equations (13), (8), and (15), we obtain
$$0 = P_{k}g_{l_{k}}^{k} = g_{l_{k}}^{k} - N_{k}Q_{k}g_{l_{k}}^{k} = g_{l_{k}}^{k} + N_{k}\mu_{L_{k}}^{k}. $$
From \(\omega_{k} = 0\) we have \(\max \{ - \mu_{j}^{k},\mu_{j}^{k}D_{j}^{k}\} = 0\), which implies \(\mu_{j}^{k} \ge 0\) and \(\mu_{j}^{k}D_{j}^{k} = 0\) for all \(j \in L_{k}\); similarly, \(\bar{\omega}_{k} = 0\) yields \(\mu_{l_{k}}^{k} \ge 0\). So we have
$$\textstyle\begin{cases} \sum_{i \in I_{k}}\mu_{i}^{k}g_{i}^{k} + \sum_{j \in J_{k}}\mu_{j}^{k}g_{j}^{k} = 0, \qquad \sum_{i \in I_{k}}\mu_{i}^{k} = 1, \\ \mu_{i}^{k} \ge 0, \quad \mu_{i}^{k}(f_{i}^{k} - F^{k}) = 0, \quad i \in I_{k};\qquad \mu_{j}^{k} \ge 0, \quad \mu_{j}^{k}f_{j}^{k} = 0, \quad j \in J_{k}. \end{cases} $$
Therefore, \(x^{k} \) is a stationary point of the problem (1) with the multiplier \((\mu_{I_{k} \cup J_{k}}^{k}, 0_{(I \cup J)\backslash (I_{k} \cup J_{k})})\).
Conversely, if \(x^{k}\) is a stationary point of the problem (1) with the multiplier vector \(\lambda^{k}\), then from (6)-(15) one knows that \(\mu_{L_{k}}^{k} = \lambda_{L_{k}}^{k}\) and \(\rho_{k} = 0\). □
The above results show that the current iteration point \(x^{k}\) is a stationary point of the problem (1) if and only if \(\rho_{k} = 0\); that is, \(\rho_{k}\) is an optimal identification function. In the case \(\rho_{k} > 0\), using \(I_{k}\) and \(J_{k}\), we compute the search direction, which is motivated by the generalized gradient projection technique in [25], Chapter II, and [26]:
$$ d^{k} = \rho_{k}^{\xi} \bigl\{ - P_{k}g_{l_{k}}^{k} + Q_{k}^{T}v_{L_{k}}^{k} \bigr\} - \varrho_{k}Q_{k}^{T}e^{k}, $$
(16)
where the parameter \(\xi > 0\), \(e^{k} = (1,1, \ldots,1)^{T} \in R^{\vert L_{k}\vert }\),
$$ \varrho_{k} = \frac{\rho_{k}^{1 + \xi}}{1 + \Vert \mu_{L_{k}}^{k}\Vert _{1}}, $$
(17)
the vector \(v_{L_{k}}^{k} = (v_{j}^{k}, j \in L_{k})\) with
$$ v_{j}^{k} = \textstyle\begin{cases} \bar{\omega}_{k} - 1,& \mbox{if }\mu_{j}^{k} < 0,j \in I_{k}^{0}; \\ \bar{\omega}_{k} + D_{j}^{k},&\mbox{if }\mu_{j}^{k} \ge 0,j \in I_{k}^{0}, \end{cases}\displaystyle \qquad v_{j}^{k} = \textstyle\begin{cases} - 1,&\mbox{if }\mu_{j}^{k} < 0,j \in J_{k}; \\ D_{j}^{k},&\mbox{if }\mu_{j}^{k} \ge 0,j \in J_{k}. \end{cases} $$
(18)
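As a hedged numerical check, the direction (16)-(18) and the descent bound of Lemma 2(i) below can be evaluated on the same assumed scalar instance as before (\(N = (-1,1)\), \(D = 0.0009\), \(g_{l_{k}} = (1,0)\), \(\xi = 1\)); all inputs are illustrative.

```python
# Scalar-case sketch of the search direction (16)-(18) with assumed data:
# N = (-1, 1), D = 0.0009, g_{l_k} = (1, 0), xi = 1; here mu_i >= 0, so
# (18) selects v_i = omega_bar + D_i^k.
xi = 1.0
den = 2.0009                               # N^T N + D
D = 0.0009
QT = [-1.0/den, 1.0/den]                   # Q_k^T, a single column here
Pg = [1.0 - 1.0/den, 1.0/den]              # P_k g_{l_k}^k
mu_i = 1.0/den                             # -Q_k g_{l_k}^k  (>= 0)
omega_bar = 0.0                            # from (14) in this instance
v = omega_bar + D                          # (18)
rho = Pg[0]**2 + Pg[1]**2 + mu_i * D       # (13)-(14), since omega_bar = 0
varrho = rho**(1.0 + xi) / (1.0 + abs(mu_i))   # (17), ||mu||_1 is scalar
dk = [rho**xi * (-Pg[t] + QT[t]*v) - varrho*QT[t] for t in range(2)]   # (16)
gd = dk[0]                                 # (g_{l_k}^k)^T d^k, g_{l_k}=(1,0)
print(gd <= -rho**xi * omega_bar - varrho) # -> True: Lemma 2(i) holds here
```

The printed inequality is exactly the bound of Lemma 2(i), confirming numerically that \(d^{k}\) is a descent direction for this assumed instance.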
The following lemma describes an important property of the search direction: it is a feasible descent direction.
Lemma 2
Suppose that Assumption
A1
holds. Then

(i)
\((g_{l_{k}}^{k})^{T}d^{k} \le - \rho_{k}^{\xi} \bar{\omega}_{k} - \varrho_{k}\);

(ii)
\((g_{j}^{k})^{T}d^{k} \le - \varrho_{k}\), \(\forall j \in (I(x^{k})\backslash \{ l_{k}\} ) \cup J(x^{k})\);

(iii)
\(F'(x^{k};d^{k}) \le - \varrho_{k}\), where
\(F'(x^{k};d^{k})\)
is the directional derivative of
\(F(x)\)
at the point
\(x^{k}\)
along the direction
\(d^{k}\).
Proof
(i) First, by (16), together with Lemma 1(iii), (15), (18) and (14), we obtain
$$\begin{aligned} \bigl(g_{l_{k}}^{k}\bigr)^{T}d^{k} &= \rho_{k}^{\xi} \bigl\{ - \bigl(g_{l_{k}}^{k} \bigr)^{T}P_{k}g_{l_{k}}^{k} + \bigl(Q_{k}g_{l_{k}}^{k}\bigr)^{T}v_{L_{k}}^{k} \bigr\} - \varrho_{k}\bigl(Q_{k}g_{l_{k}}^{k} \bigr)^{T}e^{k} \\ &= \rho_{k}^{\xi} \biggl\{ - \bigl\Vert P_{k}g_{l_{k}}^{k}\bigr\Vert ^{2} - \sum _{j \in L_{k}}\bigl(\mu_{j}^{k} \bigr)^{2}D_{j}^{k} - \bigl(\mu_{L_{k}}^{k} \bigr)^{T}v_{L_{k}}^{k}\biggr\} + \varrho_{k} \bigl(\mu_{L_{k}}^{k}\bigr)^{T}e^{k} \\ &\le \rho_{k}^{\xi} \biggl\{ - \bigl\Vert P_{k}g_{l_{k}}^{k}\bigr\Vert ^{2} - \sum _{\mu_{i}^{k} < 0,i \in I_{k}^{0}}\mu_{i}^{k}(\bar{ \omega}_{k} - 1) - \sum_{\mu_{i}^{k} \ge 0,i \in I_{k}^{0}} \mu_{i}^{k}\bigl(\bar{\omega}_{k} + D_{i}^{k}\bigr) \\ &\quad {}- \sum_{\mu_{j}^{k} < 0,j \in J_{k}}\bigl( - \mu_{j}^{k} \bigr) - \sum_{\mu_{j}^{k} \ge 0,j \in J_{k}}\mu_{j}^{k}D_{j}^{k} \biggr\} + \varrho_{k}\bigl(\mu_{L_{k}}^{k} \bigr)^{T}e^{k} \\ &= \rho_{k}^{\xi} \biggl\{ - \bigl\Vert P_{k}g_{l_{k}}^{k}\bigr\Vert ^{2} - \sum _{\mu_{j}^{k} < 0,j \in L_{k}}\bigl( - \mu_{j}^{k}\bigr) - \sum_{\mu_{j}^{k} \ge 0,j \in L_{k}}\mu_{j}^{k}D_{j}^{k} - \bar{\omega}_{k}\sum_{i \in I_{k}^{0}} \mu_{i}^{k}\biggr\} + \varrho_{k}\bigl( \mu_{L_{k}}^{k}\bigr)^{T}e^{k} \\ &= \rho_{k}^{\xi} \biggl\{ - \bigl\Vert P_{k}g_{l_{k}}^{k}\bigr\Vert ^{2} - \omega_{k} - \bar{\omega}_{k}\sum _{i \in I_{k}^{0}}\mu_{i}^{k}\biggr\} + \varrho_{k}\bigl(\mu_{L_{k}}^{k}\bigr)^{T}e^{k}. \end{aligned} $$
In addition, based on the definition of \(\mu_{l_{k}}^{k}\), it immediately follows that \(\bar{\omega}_{k}\sum_{i \in I_{k}^{0}}\mu_{i}^{k} = \bar{\omega}_{k} + \bar{\omega}_{k}( - \mu_{l_{k}}^{k})\). If \(\mu_{l_{k}}^{k} \ge 0\), we have \(\bar{\omega}_{k} = 0\), so \(\bar{\omega}_{k}( - \mu_{l_{k}}^{k}) = 0 = \bar{\omega}_{k}^{2}\). If, on the other hand, \(\mu_{l_{k}}^{k} < 0\), then \(\bar{\omega}_{k} = - \mu_{l_{k}}^{k}\) and \(\bar{\omega}_{k}( - \mu_{l_{k}}^{k}) = \bar{\omega}_{k}^{2}\). Thus, \(\bar{\omega}_{k}\sum_{i \in I_{k}^{0}}\mu_{i}^{k} = \bar{\omega}_{k} + \bar{\omega}_{k}^{2}\) always holds. We have from (13) and (17)
$$ \begin{aligned}[b] \bigl(g_{l_{k}}^{k} \bigr)^{T}d^{k} &\le \rho_{k}^{\xi} \bigl\{ - \bigl\Vert P_{k}g_{l_{k}}^{k}\bigr\Vert ^{2} - \omega_{k} - \bar{\omega}_{k} - \bar{\omega}_{k}^{2} \bigr\} + \varrho_{k}\bigl(\mu_{L_{k}}^{k} \bigr)^{T}e^{k} \\ &= \rho_{k}^{\xi} ( - \rho_{k} - \bar{ \omega}_{k}) + \varrho_{k}\bigl(\mu_{L_{k}}^{k} \bigr)^{T}e^{k} \\ &\le - \rho_{k}^{\xi} \bar{\omega}_{k} - \rho_{k}^{1 + \xi} + \varrho_{k}\bigl\Vert \mu_{L_{k}}^{k}\bigr\Vert _{1} \\ &= - \rho_{k}^{\xi} \bar{\omega}_{k} - \varrho_{k}. \end{aligned} $$
(19)
(ii) From Lemma 1(ii) and (16), we obtain
$$ \begin{aligned}[b] N_{k}^{T}d^{k} &= \rho_{k}^{\xi} \bigl\{ - N_{k}^{T}P_{k}g_{l_{k}}^{k} + N_{k}^{T}Q_{k}^{T}v_{L_{k}}^{k} \bigr\} - \varrho_{k}N_{k}^{T}Q_{k}^{T}e^{k} \\ &= \rho_{k}^{\xi} \bigl\{ - D_{k}Q_{k}g_{l_{k}}^{k} + v_{L_{k}}^{k} - D_{k}\bigl(N_{k}^{T}N_{k} + D_{k}\bigr)^{-1}v_{L_{k}}^{k}\bigr\} \\ &\quad {}- \varrho_{k}\bigl\{ E_{\vert L_{k}\vert } - D_{k} \bigl(N_{k}^{T}N_{k} + D_{k} \bigr)^{-1}\bigr\} e^{k}. \end{aligned} $$
(20)
Then we discuss the following two cases, respectively.
Case 1. For \(i \in (I(x^{k})\backslash \{ l_{k}\} ) \subseteq I_{k}^{0}\), it follows that \(D_{i}^{k} = 0\). From (20), we have \((g_{i}^{k} - g_{l_{k}}^{k})^{T}d^{k} = \rho_{k}^{\xi} v_{i}^{k} - \varrho_{k}\). Then, combining (18) with conclusion (i), we have
$$\bigl(g_{i}^{k}\bigr)^{T}d^{k} = \bigl(g_{l_{k}}^{k}\bigr)^{T}d^{k} + \rho_{k}^{\xi} v_{i}^{k} - \varrho_{k} \le - \varrho_{k} - \rho_{k}^{\xi} \bar{\omega}_{k} + \rho_{k}^{\xi} \bar{ \omega}_{k} - \varrho_{k} = - 2\varrho_{k} \le - \varrho_{k}. $$
Case 2. For \(j \in J(x^{k}) \subseteq J_{k}\), \(D_{j}^{k} = 0 \) holds. It follows from (20) and (18) that
$$\bigl(g_{j}^{k}\bigr)^{T}d^{k} = \rho_{k}^{\xi} v_{j}^{k} - \varrho_{k} \le - \varrho_{k}. $$
Summarizing the above two cases, conclusion (ii) holds.
(iii) Since \(F'(x^{k};d^{k}) = \max \{ (g_{i}^{k})^{T}d^{k}, i \in I(x^{k})\}\), conclusions (i) and (ii) give \(F'(x^{k};d^{k}) \le - \varrho_{k}\). □
Based on the improved direction \(d^{k}\) defined by (16) and analysed above, we are now ready to describe the steps of our algorithm as follows.
Algorithm A
Step 0. Choose an initial feasible point \(x^{0} \in X\) and parameters: \(\alpha,\beta \in (0,1)\), \(\varepsilon > 0\), \(p \ge 1\), \(\xi > 0\). Let \(k: = 0\).
Step 1. For the current iteration point \(x^{k}\), generate the working sets \(I_{k}\), \(J_{k}\) by (4) and (5), and calculate the projection matrix \(P_{k}\) and the identification function values \(\rho_{k}\) and \(\varrho_{k}\) by (8), (13)-(15) and (17). If \(\rho_{k} = 0\), then \(x^{k}\) is a stationary point of the problem (1); stop. Otherwise, go to Step 2.
Step 2. Obtain the search direction \(d^{k}\) by (16)-(18).
Step 3. Compute the step size \(t_{k}\), the maximum \(t\) in the sequence \(\{ 1,\beta,\beta^{2}, \ldots \}\) satisfying
$$\begin{aligned}& F\bigl(x^{k} + td^{k}\bigr) \le F^{k} - \alpha t \varrho_{k}, \end{aligned}$$
(21)
$$\begin{aligned}& f_{j}\bigl(x^{k} + td^{k}\bigr) \le 0,\quad j \in J. \end{aligned}$$
(22)
Step 4. Let \(x^{k + 1} = x^{k} + t_{k}d^{k}\), \(k: = k + 1\), and go back to Step 1.
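To illustrate Steps 3-4, here is a hedged sketch of the Armijo-type search (21)-(22) on a made-up two-piece instance; the functions, the point \(x^{k}\), the direction \(d^{k}\), and the value of \(\varrho_{k}\) below are illustrative assumptions, not the paper's data.

```python
# Sketch of Step 3, the line search (21)-(22), on an assumed instance:
# F(x) = max{x_1^2 + x_2^2, (x_1 - 2)^2} subject to -x_1 <= 0.
alpha, beta = 0.5, 0.5
f1 = lambda x: x[0]**2 + x[1]**2
f2 = lambda x: (x[0] - 2.0)**2
F = lambda x: max(f1(x), f2(x))
con = lambda x: -x[0]                  # single constraint f_3(x) <= 0

xk = (0.5, 0.0)
dk = (1.0, 0.0)                        # a descent direction at xk (f2 active)
varrho = 0.1                           # assumed identification value (17)

t = 1.0
while True:
    xt = (xk[0] + t*dk[0], xk[1] + t*dk[1])
    if F(xt) <= F(xk) - alpha*t*varrho and con(xt) <= 0:   # (21)-(22)
        break
    t *= beta                          # next member of {1, beta, beta^2, ...}
tk = t                                 # Step 3: accepted step size
x_next = (xk[0] + tk*dk[0], xk[1] + tk*dk[1])   # Step 4
print(tk, F(x_next))                   # -> 0.5 1.0
```

Here \(t = 1\) fails (21) because the other piece \(f_{1}\) takes over along the ray, while \(t = \beta = 0.5\) is accepted; the next iterate stays feasible and reduces \(F\) from 2.25 to 1.0.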
Remark 2
The inequality (21) is equivalent to \(f_{i}(x^{k} + td^{k}) \le F^{k} - \alpha t\varrho_{k}\), \(i \in I\).
By Lemma 2, we have \(F'(x^{k};d^{k}) \le - \varrho_{k} < 0\) and \((g_{j}^{k})^{T}d^{k} \le - \varrho_{k}\) for all \(j \in J(x^{k})\), so (21) and (22) hold for all \(t > 0\) small enough; hence Algorithm A is well defined.