First, let \(\Vert x\Vert =\sqrt{x^{\top}x}\) denote the Euclidean norm. Let \({C^{*}}\) denote the solution set of \(\operatorname{CES}(F,C)\). Throughout this paper, we assume that:

(A1)
The solution set \({C^{*}}\) is nonempty.

(A2)
The mapping \(F(\cdot)\) is monotone on C, i.e.,
$$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq0,\quad \mbox{for all } x,y\in C. $$

(A3)
The mapping \(F(\cdot)\) is Lipschitz continuous on C, i.e., there is a positive constant L such that
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \leq L\Vert x-y\Vert ,\quad \mbox{for all } x,y\in C. $$

(A4)
The mapping \(F(\cdot)\) is strongly monotone on C, i.e., there is a positive constant η such that
$$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq\eta\Vert x-y\Vert ^{2}, \quad \mbox{for all } x,y\in C. $$
(2)
Obviously, (A4) implies (A2), and from (2) and the Cauchy–Schwarz inequality, we have
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \geq\eta\Vert x-y\Vert , \quad \mbox{for all } x,y\in C. $$
(3)
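To make assumptions (A2)–(A4) and the implication (3) concrete, the following sketch checks them numerically for an affine map \(F(x)=Ax+b\); the matrix, vector, and strong-monotonicity modulus η here are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative affine map F(x) = A x + b; A and b are assumed choices.
A = np.array([[3.0, 1.0], [-1.0, 2.0]])
b = np.array([1.0, -1.0])
F = lambda v: A @ v + b

# For affine F, (A4) holds with eta = smallest eigenvalue of (A + A^T)/2.
eta = np.linalg.eigvalsh((A + A.T) / 2).min()

rng = np.random.default_rng(1)
for _ in range(100):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    d = x - y
    # (2): <F(x) - F(y), x - y> >= eta * ||x - y||^2
    assert (F(x) - F(y)) @ d >= eta * (d @ d) - 1e-10
    # (3): ||F(x) - F(y)|| >= eta * ||x - y||  (via Cauchy-Schwarz)
    assert np.linalg.norm(F(x) - F(y)) >= eta * np.linalg.norm(d) - 1e-10
```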
Then let \(P_{C}(\cdot)\) denote the projection mapping from \(R^{n}\) onto the convex set C, i.e.,
$$ P_{C}(x)=\operatorname{argmin}\bigl\{ \Vert x-y\Vert \mid y\in C\bigr\} , $$
which has the following nonexpansive property:
$$ \bigl\Vert P_{C}(x)-P_{C}(y)\bigr\Vert \leq\Vert x-y\Vert ,\quad \forall x,y\in R^{n}. $$
(4)
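For concreteness, when C is the nonnegative orthant \(R^{n}_{+}\) (one common choice; the paper only assumes C is convex), the projection has the closed form of componentwise clipping, and the nonexpansive property (4) can be checked numerically:

```python
import numpy as np

# Projection onto C = R^n_+, one concrete choice of the convex set C
# (an illustrative assumption; the analysis needs only closed convex C).
def project_nonneg(x):
    """P_C(x) = argmin{ ||x - y|| : y in C } for C = R^n_+."""
    return np.maximum(x, 0.0)

# Numerical check of the nonexpansive property (4).
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert (np.linalg.norm(project_nonneg(x) - project_nonneg(y))
            <= np.linalg.norm(x - y) + 1e-12)
```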
Now, we review the spectral gradient method for the unconstrained minimization problem:
$$ \min f(x),\quad x\in R^{n}, $$
(5)
where \(f: R^{n}\rightarrow R\) is smooth and its gradient is available. The spectral gradient method for solving (5) is an iterative method of the form
$$ x_{k+1}=x_{k}-\alpha_{k}\nabla f(x_{k}), $$
where \(\alpha_{k}\) is a step size defined by (see [9])
$$ \alpha_{k}^{\mathrm{I}}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}\quad \mbox{or}\quad \alpha_{k}^{\mathrm{II}}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}}, $$
(6)
in which \(s_{k-1}=x_{k}-x_{k-1}\), \(y_{k-1}=\nabla f(x_{k})-\nabla f(x_{k-1})\). The step sizes (6) are called Barzilai–Borwein (BB) step sizes, and the corresponding gradient methods are spectral gradient methods. The spectral gradient method with step size \(\alpha_{k}^{\mathrm{II}}\) has been extended to solve the constrained equations (1) by Yu et al. [6]; however, as discussed in the Introduction, we do not know whether the method in [6] possesses a linear convergence rate. In the following, we extend the spectral gradient method with step sizes \(\alpha_{k}^{\mathrm{I}}\) and \(\alpha_{k}^{\mathrm{II}}\) to solve the constrained equations (1) by some new Armijo-type line searches, and we propose two spectral gradient projection methods, which are not only globally convergent but also have a linear convergence rate.
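The unconstrained spectral gradient iteration with the BB step \(\alpha_{k}^{\mathrm{I}}\) can be sketched as follows for a strictly convex quadratic; the test problem and iteration count are illustrative assumptions.

```python
import numpy as np

# Spectral gradient sketch for min f(x) = 0.5 x'Ax - b'x (grad f = Ax - b),
# using the BB step alpha_I from (6).
def spectral_gradient(A, b, x0, iters=100, alpha0=1.0):
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                     # gradient at the current iterate
    alpha = alpha0
    for _ in range(iters):
        x_new = x - alpha * g         # x_{k+1} = x_k - alpha_k * grad f(x_k)
        g_new = A @ x_new - b
        s, y = x_new - x, g_new - g   # s_{k-1} and y_{k-1} of (6)
        if y @ y > 1e-16:             # safeguard once converged
            alpha = (s @ y) / (y @ y) # BB step alpha_I
        x, g = x_new, g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x_min = spectral_gradient(A, b, np.zeros(2))   # stationarity: A x = b
```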
The spectral gradient projection methods are stated as follows.
Algorithm 2.1
Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), the parameters \(0<\rho<1\), \(0<\sigma<r<1\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).
Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.
Step 2. Compute \(d_{k}\) by
$$ d_{k}=\left \{ \begin{array}{l@{\quad}l} -F(x_{k}),& \mbox{if } k=0, \\ -\theta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right . $$
(7)
where
$$ \theta_{k}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}, $$
(8)
which is similar to \(\alpha_{k}^{\mathrm{I}}\) defined in (6), \(y_{k-1}=F(x_{k})-F(x_{k-1})\), but \(s_{k-1}\) is defined by
$$ s_{k-1}=x_{k}-x_{k-1}+ry_{k-1}, $$
which is different from the standard definition of \(s_{k1}\). Stop if \(d_{k}=0\); otherwise, go to Step 3.
Step 3. Find the trial point \(z_{k}=x_{k}+\alpha_{k} d_{k}\), where \(\alpha_{k}=\beta_{k}\rho^{m_{k}}\) with \(m_{k}\) being the smallest nonnegative integer m such that
$$ -\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(9)
Step 4. Compute
$$ x_{k+1}=P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr], $$
(10)
where
$$ \xi_{k}=\frac{\langle F(z_{k}), x_{k}-z_{k}\rangle }{\Vert F(z_{k})\Vert ^{2}}. $$
(11)
Choose an initial step size \(\beta_{k+1}\) such that \(\beta_{k+1}\in [\beta_{\mathrm{min}},\beta_{\mathrm{max}}]\). Set \(k:=k+1\) and go to Step 1.
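A minimal sketch of Algorithm 2.1, assuming C is the nonnegative orthant (so \(P_C\) is componentwise clipping) and keeping \(\beta_k\) fixed at 1, which lies in \([\beta_{\mathrm{min}},\beta_{\mathrm{max}}]\); the test map F and all parameter values are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

# Sketch of Algorithm 2.1 for a strongly monotone affine F on C = R^n_+.
def algorithm_2_1(F, project, x0, rho=0.5, sigma=0.1, r=0.5, gamma=1.5,
                  beta=1.0, tol=1e-8, max_iter=2000):
    x = np.asarray(x0, dtype=float)
    x_prev = Fx_prev = None
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:          # Step 1
            break
        y = None if Fx_prev is None else Fx - Fx_prev
        if y is None or (y @ y) < 1e-30:       # Step 2: first step / safeguard
            d = -Fx
        else:
            s = x - x_prev + r * y             # modified s_{k-1} of (8)
            d = -((s @ y) / (y @ y)) * Fx      # d_k = -theta_k F(x_k), (7)
        alpha = beta                           # Step 3: line search (9)
        while -(F(x + alpha * d) @ d) < sigma * (Fx @ Fx):
            alpha *= rho
        z = x + alpha * d
        Fz = F(z)                              # Step 4: steps (10)-(11)
        xi = (Fz @ (x - z)) / (Fz @ Fz)
        x_prev, Fx_prev = x, Fx
        x = project(x - gamma * xi * Fz)
    return x

A = np.array([[2.0, 1.0], [0.0, 2.0]])         # symmetric part positive definite
x_true = np.array([1.0, 2.0])                  # solution placed inside C = R^2_+
F = lambda v: A @ (v - x_true)
sol = algorithm_2_1(F, lambda v: np.maximum(v, 0.0), np.array([5.0, 5.0]))
```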
Algorithm 2.2
Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), compute L, the Lipschitz constant of \(F(\cdot)\), choose the parameters \(0<\rho <1\), \(0< r<1\), \(0<\sigma<r^{2}/(L+r)\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).
Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.
Step 2. Compute \(d_{k}\) by
$$d_{k}=\left \{ \begin{array}{l@{\quad}l}-F(x_{k}), &\mbox{if } k=0, \\ -\vartheta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right . $$
where
$$\vartheta_{k}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}}, $$
which is similar to \(\alpha_{k}^{\mathrm{II}}\) defined in (6), \(s_{k-1}=x_{k}-x_{k-1}\), but \(y_{k-1}\) is defined by
$$ y_{k-1}=F(x_{k})-F(x_{k-1})+rs_{k-1}, $$
which is different from the standard definition of \(y_{k-1}\). Stop if \(d_{k}=0\); otherwise, go to Step 3.
Step 3. Find the trial point \(z_{k}=x_{k}+\alpha_{k} d_{k}\), where \(\alpha_{k}=\beta_{k}\rho^{m_{k}}\) with \(m_{k}\) being the smallest nonnegative integer m such that
$$ -\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\Vert d_{k} \Vert ^{2}. $$
(12)
Step 4. See Step 4 of Algorithm 2.1.
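Algorithm 2.2 differs from Algorithm 2.1 only in the search direction and the line search test; those two pieces can be sketched as follows (names, test problem, and parameter values are illustrative assumptions).

```python
import numpy as np

# Direction of Algorithm 2.2: vartheta_k = s's / s'y with the *modified*
# y_{k-1} = F(x_k) - F(x_{k-1}) + r * s_{k-1}, which guarantees
# s'y >= r ||s||^2 for monotone F.
def direction_2_2(x, x_prev, Fx, Fx_prev, r):
    s = x - x_prev
    y = Fx - Fx_prev + r * s
    return -((s @ s) / (s @ y)) * Fx       # d_k = -vartheta_k F(x_k)

# Line search (12): backtrack until -<F(x + alpha d), d> >= sigma ||d||^2.
def line_search_2_2(F, x, d, sigma, rho, beta):
    alpha = beta
    while -(F(x + alpha * d) @ d) < sigma * (d @ d):
        alpha *= rho
    return alpha
```

With a Lipschitz constant L for F, the parameter condition \(\sigma<r^{2}/(L+r)\) from Step 0 guarantees this backtracking terminates.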
The discussions of the global convergence and linear convergence rate of Algorithm 2.2 are similar to those of Algorithm 2.1. Therefore, in the following, we discuss Algorithm 2.1 in detail and only state the corresponding results for Algorithm 2.2.
Remark 2.1
For Algorithm 2.1, by (3), we have
$$\begin{aligned} s_{k-1}^{\top}y_{k-1}&=\langle x_{k}-x_{k-1}+ry_{k-1},y_{k-1} \rangle \\ &\leq\frac{1}{\eta}\Vert y_{k-1}\Vert ^{2}+r\Vert y_{k-1} \Vert ^{2} \\ &= \biggl(\frac{1}{\eta}+r \biggr)\Vert y_{k-1}\Vert ^{2}. \end{aligned}$$
In addition, by the monotonicity of \(F(\cdot)\), we also have
$$ s_{k-1}^{\top}y_{k-1}\geq r\Vert y_{k-1} \Vert ^{2}. $$
So we have from the above two inequalities and (7)
$$ r\bigl\Vert F(x_{k})\bigr\Vert \leq \Vert d_{k}\Vert \leq \biggl(\frac {1}{\eta}+r \biggr)\bigl\Vert F(x_{k})\bigr\Vert , $$
(13)
from which we can get \(\Vert F(x_{k})\Vert =0\) if \(\Vert d_{k}\Vert =0\), which means \(x_{k}\) is a solution of \(\operatorname{CES}(F,C)\). Thus, Algorithm 2.1 can also terminate when \(\Vert d_{k}\Vert =0\). Similarly, for Algorithm 2.2, by the Lipschitz continuity and monotonicity of \(F(\cdot)\), we can deduce that
$$\frac{\Vert F(x_{k})\Vert }{L+r}\leq\Vert d_{k}\Vert \leq\frac{\Vert F(x_{k})\Vert }{r}. $$
In what follows, we assume that \(\Vert F(x_{k})\Vert \neq0\) and \(\Vert d_{k}\Vert \neq0\) for all k, i.e., Algorithm 2.1 or Algorithm 2.2 generates an infinite sequence \(\{x_{k}\}\).
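The bounds (13) can be checked numerically for a strongly monotone map; the affine F below and the value of r are illustrative choices.

```python
import numpy as np

# Numerical check of the bounds (13) for the Algorithm 2.1 direction.
A = np.array([[4.0, 1.0], [-1.0, 3.0]])
b = np.array([0.5, -0.5])
F = lambda v: A @ v + b
eta = np.linalg.eigvalsh((A + A.T) / 2).min()   # strong-monotonicity modulus
r = 0.5

rng = np.random.default_rng(2)
for _ in range(100):
    x_prev, x = rng.standard_normal(2), rng.standard_normal(2)
    y = F(x) - F(x_prev)
    s = x - x_prev + r * y              # modified s_{k-1} of Algorithm 2.1
    d = -((s @ y) / (y @ y)) * F(x)     # d_k = -theta_k F(x_k), (7)-(8)
    nF = np.linalg.norm(F(x))
    # (13): r ||F(x_k)|| <= ||d_k|| <= (1/eta + r) ||F(x_k)||
    assert r * nF - 1e-10 <= np.linalg.norm(d) <= (1/eta + r) * nF + 1e-10
```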
Remark 2.2
In (10), we attach a relaxation factor \(\gamma \in(0,2)\) to \(F(z_{k})\) based on numerical experience.
Remark 2.3
The line search (9) is different from those of [6, 7]; it is well defined by the following lemma.
Lemma 2.1
For all
\(k\geq0\), there exists a nonnegative number
\(m_{k}\)
satisfying (9).
Proof
For the sake of contradiction, we suppose that there exists \(k_{0}\geq0\) such that (9) is not satisfied for any nonnegative integer m, i.e.,
$$-\bigl\langle F\bigl(x_{k_{0}}+\beta_{k_{0}}\rho^{m}d_{k_{0}} \bigr),d_{k_{0}}\bigr\rangle <\sigma\bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2},\quad \forall m\geq0. $$
Letting \(m\rightarrow\infty\) and using the continuity of \(F(\cdot)\) yield
$$ -\bigl\langle F(x_{k_{0}}),d_{k_{0}}\bigr\rangle \leq \sigma \bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2}. $$
(14)
On the other hand, by (7) and (13), we obtain
$$ -\bigl\langle F(x_{0}),d_{0}\bigr\rangle =\bigl\Vert F(x_{0})\bigr\Vert ^{2}>r\bigl\Vert F(x_{0}) \bigr\Vert ^{2} $$
and
$$ -\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1, $$
which together with (14) means that \(\sigma\geq r\); however, this contradicts the fact that \(\sigma< r\). Therefore the assertion of Lemma 2.1 holds. This completes the proof. □
For the line search (12), we have a similar result in the following lemma.
Lemma 2.2
For all
\(k\geq0\), there exists a nonnegative number
\(m_{k}\)
satisfying (12).
Proof
The lemma can be proved by contradiction, as in the proof of Lemma 2.1; we omit the details for concision. This completes the proof. □
The step size \(\alpha_{k}\) and the norm \(\Vert F(x_{k})\Vert \) satisfy the following property, which is an important result for proving the global convergence of Algorithm 2.1.
Lemma 2.3
Suppose that
\(F(\cdot)\)
is strongly monotone and let
\(\{x_{k}\}\)
and
\(\{z_{k}\}\)
be the sequences generated by Algorithm
2.1, then
\(\{x_{k}\}\)
and
\(\{z_{k}\}\)
are both bounded. Furthermore, we have
$$ \lim_{k\rightarrow\infty}\alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}=0. $$
(15)
Proof
From (9), we have
$$ \bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle \geq\sigma \alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}>0. $$
(16)
For any \(x^{*}\in C^{*}\), from (4), we have
$$\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad = \bigl\Vert P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr]-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-\gamma\xi_{k}F(z_{k})-x^{*} \bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle +\gamma^{2}\xi _{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2}. \end{aligned}$$
(17)
By the monotonicity of the mapping \(F(\cdot)\) and \(F(x^{*})=0\), we have
$$\begin{aligned}& \bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle +\bigl\langle F(z_{k}),z_{k}-x^{*}\bigr\rangle \\& \quad \geq \bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\bigl\langle F\bigl(x^{*}\bigr),z_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle . \end{aligned}$$
(18)
Substituting (16) and (18) into (17), we have
$$\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\gamma ^{2}\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\gamma(2-\gamma) \frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}- \gamma(2-\gamma) \frac{\sigma^{2}\alpha_{k}^{2}\Vert F(x_{k})\Vert ^{4}}{\Vert F(z_{k})\Vert ^{2}} , \end{aligned}$$
(19)
which together with \(\gamma\in(0,2)\) indicates that, for all k,
$$ \bigl\Vert x_{k+1}-x^{*}\bigr\Vert \leq\bigl\Vert x_{k}-x^{*}\bigr\Vert , $$
(20)
which shows that the sequence \(\{x_{k}\}\) is bounded. By (13), \(\{d_{k}\}\) is bounded and so is \(\{z_{k}\}\). Then, by the continuity of \(F(\cdot)\), there exists a constant \(M>0\) such that \(\Vert F(z_{k})\Vert \leq M\) for all k. Therefore it follows from (19) that
$$\gamma(2-\gamma)\frac{\sigma^{2}}{M^{2}}\sum_{k=0}^{\infty}\alpha_{k}^{2}\bigl\Vert F(x_{k})\bigr\Vert ^{4}\leq\sum_{k=0}^{\infty}\bigl(\bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2}\bigr)<\infty, $$
which implies that the assertion (15) holds. The proof is completed. □
Lemma 2.4
Suppose that
\(F(\cdot)\)
is monotone and Lipschitz continuous and let
\(\{x_{k}\}\)
and
\(\{z_{k}\}\)
be the sequences generated by Algorithm
2.2, then
\(\{x_{k}\}\)
and
\(\{z_{k}\}\)
are both bounded. Furthermore, we have
$$\lim_{k\rightarrow\infty}\alpha_{k}\Vert d_{k} \Vert ^{2}=0. $$
Proof
The conclusion is slightly different from (15), which results from the difference between the right-hand sides of the line searches (9) and (12). In fact, this conclusion can be proved as in Lemma 2.3, and we omit the proof for concision. This completes the proof. □
Now, we establish the global convergence theorems for Algorithm 2.1 and Algorithm 2.2.
Theorem 2.1
Suppose that the conditions in Lemma
2.3
hold. Then the sequence
\(\{x_{k}\}\)
generated by Algorithm
2.1
globally converges to a solution of
\(\operatorname{CES}(F,C)\).
Proof
We consider the following two possible cases.
Case 1: \(\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert =0\), which together with the continuity of \(F(\cdot)\) implies that the sequence \(\{x_{k}\}\) has some accumulation point \(\bar{x}\) such that \(F(\bar{x})=0\). From (20), \(\{\Vert x_{k}-\bar{x}\Vert \}\) converges, and since \(\bar{x}\) is an accumulation point of \(\{x_{k}\}\), \(\{x_{k}\}\) must converge to \(\bar{x}\).
Case 2: \(\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert >0\). Then by (15), it follows that \(\lim_{k\rightarrow\infty}\alpha_{k}=0\). Therefore, from the line search (9), for sufficiently large k, we have
$$ -\bigl\langle F\bigl(x_{k}+\beta_{k}\rho ^{m_{k}-1}d_{k}\bigr),d_{k}\bigr\rangle <\sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}. $$
(21)
Since \(\{x_{k}\}\) and \(\{d_{k}\}\) are both bounded, we can choose convergent subsequences; letting \(k\rightarrow\infty\) along them in (21), we obtain
$$ -\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \leq\sigma \bigl\Vert F(\bar{x})\bigr\Vert ^{2}, $$
(22)
where \(\bar{x}\), \(\bar{d}\) are limit points of corresponding subsequences. On the other hand, by (13), we obtain
$$ -\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1. $$
Letting \(k\rightarrow\infty\) in the above inequality, we obtain
$$ -\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \geq r\bigl\Vert F( \bar{x})\bigr\Vert ^{2}. $$
(23)
Thus, by (22) and (23), we get \(r\leq\sigma\), and this contradicts the fact that \(r>\sigma\). Therefore \(\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert >0\) does not hold. This completes the proof. □
For Algorithm 2.2, we also have the following global convergence.
Theorem 2.2
Suppose that the conditions in Lemma
2.4
hold. Then the sequence
\(\{x_{k}\}\)
generated by Algorithm
2.2
globally converges to a solution of
\(\operatorname{CES}(F,C)\).
Proof
Following a process similar to the proof of Theorem 2.1, we can get the desired conclusion. This completes the proof. □