# Two spectral gradient projection methods for constrained equations and their linear convergence rate

## Abstract

Due to its simplicity and numerical efficiency for unconstrained optimization problems, the spectral gradient method has received increasing attention in recent years. In this paper, two spectral gradient projection methods for constrained equations are proposed, which combine the well-known spectral gradient method with the hyperplane projection method. The new methods are not only derivative-free, but also completely matrix-free, and consequently they can be applied to solve large-scale constrained equations. Under the condition that the underlying mapping of the constrained equations is Lipschitz continuous or strongly monotone, we establish the global convergence of the new methods. Compared with the existing gradient methods for solving such problems, the new methods possess a linear convergence rate under some error bound conditions. Furthermore, a relaxation factor γ is attached to the update step to accelerate convergence. Preliminary numerical results show that the methods are efficient and promising in practice.

## 1 Introduction

In this paper, we consider the problem of finding a solution of the following constrained equations, denoted by $$\operatorname{CES}(F,C)$$,

$$F\bigl(x^{*}\bigr)=0\quad \mbox{subject to}\quad x^{*}\in C,$$
(1)

where $$F: C\rightarrow R^{n}$$ is a given continuous nonlinear mapping and C is a nonempty closed convex subset of $$R^{n}$$. Obviously, when $$C=R^{n}$$, (1) reduces to a system of nonlinear equations, which has been studied intensively. The constrained system of equations (1) appears in a wide variety of problems in applied mathematics, and some important problems, such as economic equilibrium problems, power flow equations, and chemical equilibrium systems, can be reformulated in the form (1).

Among various numerical methods for solving $$\operatorname{CES}(F,C)$$, the gradient projection methods (GPMs) are the most efficient, especially when the projection onto the feasible set C is easy to implement. For example, when C is the nonnegative orthant, or a box, or a ball, GPMs require the lowest computational cost. In addition, the GPMs are also the simplest, because they do not need to store any matrix during the iteration process. Therefore, they are completely matrix-free, and consequently, they can be applied to solve large-scale $$\operatorname{CES}(F,C)$$.

It is well known that the spectral gradient method [9, 10] and the conjugate gradient method are two efficient methods for solving large-scale unconstrained optimization problems due to their simplicity and low storage. Recently, combined with the projection technique, they have been extended to solve constrained equations $$\operatorname{CES}(F,C)$$ [6, 7]. Yu et al. proposed a spectral gradient projection method for solving monotone $$\operatorname{CES}(F,C)$$, which can be applied to nonsmooth constrained equations and works quite well even for large-scale $$\operatorname{CES}(F,C)$$. Quite recently, Liu et al. developed two unified frameworks of sufficient descent conjugate gradient projection methods for solving monotone $$\operatorname{CES}(F,C)$$, which can also be applied to large-scale nonsmooth constrained equations. However, the convergence rate of the methods in [6, 7] has not been investigated, so whether they converge linearly is an open problem. Can we design a spectral/conjugate gradient projection method with a linear convergence rate for $$\operatorname{CES}(F,C)$$? In this paper, we answer this question positively for the spectral gradient projection method. Note that Dai and Liao proved a nice result for the spectral gradient method: they established its R-linear convergence for strongly convex quadratics in any number of dimensions, and they also proved local R-linear convergence for general objective functions. Under some mild conditions, the general minimization problem they discuss is equivalent to a system of nonlinear equations. In this paper, we establish the local R-linear convergence of the spectral gradient method for the system of constrained nonlinear equations. Therefore, our result extends their conclusion in some sense.

In fact, in this paper, motivated by the projection methods in [13, 14] and the spectral gradient method, we propose two spectral gradient projection methods for solving nonsmooth constrained equations. They can be viewed as combinations of the well-known spectral gradient method and the hyperplane projection method, and they possess a linear convergence rate under some error bound conditions. The remainder of this paper is organized as follows. In the next section, we describe the new methods and present their global convergence analysis. The linear convergence rates of the new methods are established in Section 3. Numerical results are reported in Section 4. Finally, some concluding remarks are given in Section 5.

## 2 Algorithm and convergence analysis

First, we denote by $$\|x\|=\sqrt{x^{\top}x}$$ the Euclidean norm. Let $${C^{*}}$$ denote the solution set of $$\operatorname{CES}(F,C)$$. Throughout this paper, we assume that:

(A1) The solution set $${C^{*}}$$ is nonempty.

(A2) The mapping $$F(\cdot)$$ is monotone on C, i.e.,

$$\bigl\langle F(x)-F(y), x-y\bigr\rangle \geq0,\quad \mbox{for all } x,y\in C.$$

(A3) The mapping $$F(\cdot)$$ is Lipschitz continuous on C, i.e., there is a positive constant L such that

$$\bigl\Vert F(x)-F(y)\bigr\Vert \leq L\|x-y\|,\quad \mbox{for all } x,y\in C.$$

(A4) The mapping $$F(\cdot)$$ is strongly monotone on C, i.e., there is a positive constant η such that

$$\bigl\langle F(x)-F(y), x-y\bigr\rangle \geq\eta\|x-y\| ^{2}, \quad \mbox{for all } x,y\in C.$$
(2)

Obviously, (A4) implies (A2), and from (2) and the Cauchy-Schwarz inequality, we have

$$\bigl\Vert F(x)-F(y)\bigr\Vert \geq\eta\|x-y\|, \quad \mbox{for all } x,y\in C.$$
(3)

Then let $$P_{C}(\cdot)$$ denote the projection mapping from $$R^{n}$$ onto the convex set C, i.e.,

$$P_{C}(x)=\operatorname{argmin}\bigl\{ \Vert x-y\Vert |y\in C\bigr\} ,$$

which has the following nonexpansive property:

$$\bigl\Vert P_{C}(x)-P_{C}(y)\bigr\Vert \leq\|x-y\|,\quad \forall x,y\in R^{n}.$$
(4)
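For the simple feasible sets mentioned in the Introduction (nonnegative orthant, box, ball), the projection $$P_{C}$$ has a closed form. The following Python sketch is only an illustration: the function names are hypothetical and do not come from the paper. It also spot-checks the nonexpansive property (4) on one pair of points.

```python
import math

def project_orthant(x):
    """Projection onto the nonnegative orthant: negative entries become 0."""
    return [max(t, 0.0) for t in x]

def project_box(x, lo, hi):
    """Projection onto the box [lo, hi]^n: clip every entry."""
    return [min(max(t, lo), hi) for t in x]

def project_ball(x, radius):
    """Projection onto the Euclidean ball of the given radius centred at 0."""
    nrm = math.sqrt(sum(t * t for t in x))
    if nrm <= radius:
        return list(x)
    return [radius * t / nrm for t in x]

def dist(x, y):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Spot-check of the nonexpansive property (4) for the three projections.
x = [1.5, -2.0, 0.3]
y = [-0.5, 4.0, 0.1]
projections = (
    project_orthant,
    lambda v: project_box(v, -1.0, 1.0),
    lambda v: project_ball(v, 1.0),
)
for proj in projections:
    assert dist(proj(x), proj(y)) <= dist(x, y) + 1e-12
```

Each of these projections costs O(n) operations, which is why such feasible sets suit projection methods well.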

Now, we review the spectral gradient method for the unconstrained minimization problem:

$$\min f(x),\quad x\in R^{n},$$
(5)

where $$f: R^{n}\rightarrow R$$ is smooth and its gradient is available. The spectral gradient method for solving (5) is an iterative method of the form

$$x_{k+1}=x_{k}-\alpha_{k}\nabla f(x_{k}),$$

where $$\alpha_{k}$$ is a step size defined by

$$\alpha_{k}^{\mathrm{I}}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}\quad \mbox{or}\quad \alpha_{k}^{\mathrm{II}}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}},$$
(6)

in which $$s_{k-1}=x_{k}-x_{k-1}$$, $$y_{k-1}=\nabla f(x_{k})-\nabla f(x_{k-1})$$. The step sizes (6) are called Barzilai-Borwein (BB) step sizes, and the corresponding gradient methods are called spectral gradient methods. The spectral gradient method with step size $$\alpha_{k}^{\mathrm{II}}$$ has been extended to solve the constrained equations (1) by Yu et al.; however, as discussed in the Introduction, it is not known whether their method possesses a linear convergence rate. In the following, we extend the spectral gradient method with step sizes $$\alpha_{k}^{\mathrm{I}}$$ and $$\alpha_{k}^{\mathrm {II}}$$ to solve the constrained equations (1) by means of new Armijo-type line searches, and we propose two spectral gradient projection methods, which are not only globally convergent, but also have a linear convergence rate.
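As a small illustration of the two BB step sizes in (6), the sketch below evaluates them for the quadratic $$f(x)=\frac{1}{2}(x_{1}^{2}+4x_{2}^{2})$$. The quadratic, the two points, and the helper names are assumptions chosen for this example and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def bb_step_sizes(x_prev, x_curr, g_prev, g_curr):
    """Return (alpha_I, alpha_II) from (6) for two consecutive iterates."""
    s = [a - b for a, b in zip(x_curr, x_prev)]   # s_{k-1} = x_k - x_{k-1}
    y = [a - b for a, b in zip(g_curr, g_prev)]   # y_{k-1} = g_k - g_{k-1}
    sy = dot(s, y)
    return sy / dot(y, y), dot(s, s) / sy

def grad(x):
    """Gradient of the example quadratic f(x) = 0.5 * (x1^2 + 4 * x2^2)."""
    return [x[0], 4.0 * x[1]]

x_prev, x_curr = [1.0, 1.0], [2.0, 2.0]
alpha_1, alpha_2 = bb_step_sizes(x_prev, x_curr, grad(x_prev), grad(x_curr))
print(alpha_1, alpha_2)
```

Here $$s_{k-1}=(1,1)^{\top}$$ and $$y_{k-1}=(1,4)^{\top}$$, so $$\alpha_{k}^{\mathrm{I}}=5/17$$ and $$\alpha_{k}^{\mathrm{II}}=2/5$$. Both values lie between 1/4 and 1, the reciprocals of the Hessian eigenvalues, and $$\alpha_{k}^{\mathrm{I}}\leq\alpha_{k}^{\mathrm{II}}$$ by the Cauchy-Schwarz inequality.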

The spectral gradient projection methods are stated as follows.

### Algorithm 2.1

Step 0. Set an arbitrary initial point $$x_{0}\in{C}$$, the parameters $$0<\rho<1$$, $$0<\sigma<r<1$$, $$0<\gamma<2$$, and $$0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}$$. Set the initial step size $$\beta_{0}=1$$ and set $$k:=0$$.

Step 1. If $$F(x_{k})=0$$, then stop; otherwise, go to Step 2.

Step 2. Compute $$d_{k}$$ by

$$d_{k}=\left \{ \begin{array}{l@{\quad}l} -F(x_{k}),& \mbox{if } k=0, \\ -\theta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right .$$
(7)

where

$$\theta_{k}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}},$$
(8)

which is similar to $$\alpha_{k}^{\mathrm{I}}$$ defined in (6), $$y_{k-1}=F(x_{k})-F(x_{k-1})$$, but $$s_{k-1}$$ is defined by

$$s_{k-1}=x_{k}-x_{k-1}+ry_{k-1},$$

which is different from the standard definition of $$s_{k-1}$$. Stop if $$d_{k}=0$$; otherwise, go to Step 3.

Step 3. Find the trial point $$z_{k}=x_{k}+\alpha_{k} d_{k}$$, where $$\alpha_{k}=\beta_{k}\rho^{m_{k}}$$ with $$m_{k}$$ being the smallest nonnegative integer m such that

$$-\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$
(9)

Step 4. Compute

$$x_{k+1}=P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr],$$
(10)

where

$$\xi_{k}=\frac{\langle F(z_{k}), x_{k}-z_{k}\rangle }{\Vert F(z_{k})\Vert ^{2}}.$$
(11)

Choose an initial step size $$\beta_{k+1}$$ such that $$\beta_{k+1}\in [\beta_{\mathrm{min}},\beta_{\mathrm{max}}]$$. Set $$k:=k+1$$ and go to Step 1.
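The steps of Algorithm 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sgp_algorithm_2_1` is hypothetical, the exact test $$F(x_{k})=0$$ is replaced by a tolerance, the initial step size is fixed to $$\beta_{k}=1$$, and the backtracking loop is capped for safety. The test mapping $$f_{i}(x)=e^{x_{i}}-1$$ on the nonnegative orthant is used only as a toy example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def sgp_algorithm_2_1(F, project, x0, rho=0.6, r=1e-3, sigma=1e-4,
                      gamma=1.0, beta=1.0, tol=1e-5, max_iter=1000):
    """Sketch of Algorithm 2.1; returns (approximate solution, iterations)."""
    x, Fx = list(x0), F(x0)
    x_old = F_old = None
    for k in range(max_iter):
        if norm(Fx) <= tol:                       # Step 1, with a tolerance
            return x, k
        # Step 2: theta_k from (8), using s_{k-1} = x_k - x_{k-1} + r*y_{k-1}.
        theta = 1.0                               # k = 0 gives d_0 = -F(x_0)
        if x_old is not None:
            y = [a - b for a, b in zip(Fx, F_old)]
            s = [a - b + r * c for a, b, c in zip(x, x_old, y)]
            if dot(y, y) > 0.0:
                theta = dot(s, y) / dot(y, y)
        d = [-theta * f for f in Fx]
        # Step 3: line search (9) with alpha = beta * rho^m, m = 0, 1, ...
        alpha, rhs = beta, sigma * dot(Fx, Fx)
        for _ in range(200):
            z = [a + alpha * b for a, b in zip(x, d)]
            Fz = F(z)
            if -dot(Fz, d) >= rhs:
                break
            alpha *= rho
        else:
            raise RuntimeError("line search (9) did not terminate")
        # Step 4: relaxed projection step (10) with xi_k from (11).
        xi = dot(Fz, [a - b for a, b in zip(x, z)]) / dot(Fz, Fz)
        x_old, F_old = x, Fx
        x = project([a - gamma * xi * f for a, f in zip(x, Fz)])
        Fx = F(x)
    return x, max_iter

# Toy example: f_i(x) = exp(x_i) - 1 on the nonnegative orthant.
F = lambda x: [math.exp(t) - 1.0 for t in x]
project = lambda x: [max(t, 0.0) for t in x]
x, iters = sgp_algorithm_2_1(F, project, [1.0] * 5)
print(iters, norm(F(x)))
```

Note that only evaluations of F and projections onto C are required; no Jacobian or other matrix is formed or stored.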

### Algorithm 2.2

Step 0. Set an arbitrary initial point $$x_{0}\in{C}$$, compute L, the Lipschitz constant of $$F(\cdot)$$, choose the parameters $$0<\rho <1$$, $$0< r<1$$, $$0<\sigma<r^{2}/(L+r)$$, $$0<\gamma<2$$, and $$0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}$$. Set the initial step size $$\beta_{0}=1$$ and set $$k:=0$$.

Step 1. If $$F(x_{k})=0$$, then stop; otherwise, go to Step 2.

Step 2. Compute $$d_{k}$$ by

$$d_{k}=\left \{ \begin{array}{l@{\quad}l}-F(x_{k}), &\mbox{if } k=0, \\ -\vartheta_{k}F(x_{k}), &\mbox{if } k\geq1, \end{array} \right .$$

where

$$\vartheta_{k}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}},$$

which is similar to $$\alpha_{k}^{\mathrm{II}}$$ defined in (6), $$s_{k-1}=x_{k}-x_{k-1}$$, but $$y_{k-1}$$ is defined by

$$y_{k-1}=F(x_{k})-F(x_{k-1})+rs_{k-1},$$

which is different from the standard definition of $$y_{k-1}$$. Stop if $$d_{k}=0$$; otherwise, go to Step 3.

Step 3. Find the trial point $$z_{k}=x_{k}+\alpha_{k} d_{k}$$, where $$\alpha_{k}=\beta_{k}\rho^{m_{k}}$$ with $$m_{k}$$ being the smallest nonnegative integer m such that

$$-\bigl\langle F(x_{k}+\alpha_{k} d_{k}),d_{k}\bigr\rangle \geq \sigma\|d_{k} \|^{2}.$$
(12)

Step 4. See Step 4 of Algorithm 2.1.
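Algorithm 2.2 differs from Algorithm 2.1 only in the spectral coefficient and in the right-hand side of the line search. The sketch below is again an illustration under assumptions: the function name is hypothetical, a tolerance replaces the exact stopping test, $$\beta_{k}=1$$ throughout, and the linear test mapping $$F(x)=Ax+q$$ with $$A=\bigl({2\atop -1}\ {1\atop 2}\bigr)$$, $$q=(-4,-3)^{\top}$$ is chosen for this example (it is monotone with Lipschitz constant $$L=\sqrt{5}$$ and has the solution $$(1,2)^{\top}$$). The values $$r=0.1$$ and $$\sigma=10^{-3}$$ satisfy $$\sigma<r^{2}/(L+r)$$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def sgp_algorithm_2_2(F, project, x0, rho=0.6, r=0.1, sigma=1e-3,
                      gamma=1.0, beta=1.0, tol=1e-6, max_iter=1000):
    """Sketch of Algorithm 2.2; returns (approximate solution, iterations)."""
    x, Fx = list(x0), F(x0)
    x_old = F_old = None
    for k in range(max_iter):
        if norm(Fx) <= tol:                       # Step 1, with a tolerance
            return x, k
        # Step 2: vartheta_k with y_{k-1} = F(x_k) - F(x_{k-1}) + r*s_{k-1}.
        vartheta = 1.0                            # k = 0 gives d_0 = -F(x_0)
        if x_old is not None:
            s = [a - b for a, b in zip(x, x_old)]
            y = [a - b + r * c for a, b, c in zip(Fx, F_old, s)]
            if dot(s, y) > 0.0:
                vartheta = dot(s, s) / dot(s, y)
        d = [-vartheta * f for f in Fx]
        # Step 3: line search (12); the right-hand side uses ||d_k||^2.
        alpha, rhs = beta, sigma * dot(d, d)
        for _ in range(200):
            z = [a + alpha * b for a, b in zip(x, d)]
            Fz = F(z)
            if -dot(Fz, d) >= rhs:
                break
            alpha *= rho
        else:
            raise RuntimeError("line search (12) did not terminate")
        # Step 4: same projection step as in Algorithm 2.1, see (10)-(11).
        xi = dot(Fz, [a - b for a, b in zip(x, z)]) / dot(Fz, Fz)
        x_old, F_old = x, Fx
        x = project([a - gamma * xi * f for a, f in zip(x, Fz)])
        Fx = F(x)
    return x, max_iter

def F(x):
    """Example mapping F(x) = A x + q with solution (1, 2)."""
    return [2.0 * x[0] + x[1] - 4.0, -x[0] + 2.0 * x[1] - 3.0]

def project(x):
    """Projection onto the nonnegative orthant."""
    return [max(t, 0.0) for t in x]

x, iters = sgp_algorithm_2_2(F, project, [0.0, 0.0])
print(iters, x)
```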

The discussions of the global convergence and linear convergence rate of Algorithm 2.2 are similar to those of Algorithm 2.1. Therefore, in the following, we discuss Algorithm 2.1 in detail, and we only state the corresponding results for Algorithm 2.2.

### Remark 2.1

For Algorithm 2.1, by (3), we have

\begin{aligned} s_{k-1}^{\top}y_{k-1}&=\langle x_{k}-x_{k-1}+ry_{k-1},y_{k-1} \rangle \\ &\leq\frac{1}{\eta}\|y_{k-1}\|^{2}+r\|y_{k-1} \|^{2} \\ &= \biggl(\frac{1}{\eta}+r \biggr)\|y_{k-1}\|^{2}. \end{aligned}

In addition, by the monotonicity of $$F(\cdot)$$, we also have

$$s_{k-1}^{\top}y_{k-1}\geq r\|y_{k-1} \|^{2}.$$

So we have from the above two inequalities and (7)

$$r\bigl\Vert F(x_{k})\bigr\Vert \leq \|d_{k}\|\leq \biggl(\frac {1}{\eta}+r \biggr)\bigl\Vert F(x_{k})\bigr\Vert ,$$
(13)

from which we can get $$\|F(x_{k})\|=0$$ if $$\|d_{k}\|=0$$, which means $$x_{k}$$ is a solution of $$\operatorname{CES}(F,C)$$. Thus, Algorithm 2.1 can also terminate when $$\|d_{k}\|=0$$. Similarly, for Algorithm 2.2, by the Lipschitz continuity and monotonicity of $$F(\cdot)$$, we can deduce that

$$\frac{\|F(x_{k})\|}{L+r}\leq\|d_{k}\|\leq\frac{\|F(x_{k})\|}{r}.$$

In what follows, we assume that $$\|F(x_{k})\|\neq0$$ and $$\|d_{k}\|\neq0$$, for all k, i.e., Algorithm 2.1 or Algorithm 2.2 generates an infinite sequence $$\{x_{k}\}$$.
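The bounds in Remark 2.1 amount to $$r\leq\theta_{k}\leq1/\eta+r$$ and $$1/(L+r)\leq\vartheta_{k}\leq1/r$$, since $$\|d_{k}\|$$ equals the spectral coefficient times $$\|F(x_{k})\|$$ for $$k\geq1$$. The following sketch checks them numerically for the linear mapping $$F(x)=Ax$$ with $$A=\bigl({2\atop -1}\ {1\atop 2}\bigr)$$, for which $$\eta=2$$ and $$L=\sqrt{5}$$. The mapping, the value $$r=0.1$$, and the helper name `spectral_coefficients` are assumptions made for this example.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def F(x):
    """F(x) = A x with A = [[2, 1], [-1, 2]]: eta = 2 and L = sqrt(5)."""
    return [2.0 * x[0] + x[1], -x[0] + 2.0 * x[1]]

def spectral_coefficients(x_old, x_new, F, r):
    """Return (theta, vartheta) as defined in Algorithms 2.1 and 2.2."""
    dx = [a - b for a, b in zip(x_new, x_old)]
    dF = [a - b for a, b in zip(F(x_new), F(x_old))]
    s1 = [a + r * b for a, b in zip(dx, dF)]   # modified s_{k-1} (Alg. 2.1)
    y2 = [a + r * b for a, b in zip(dF, dx)]   # modified y_{k-1} (Alg. 2.2)
    theta = dot(s1, dF) / dot(dF, dF)
    vartheta = dot(dx, dx) / dot(dx, y2)
    return theta, vartheta

eta, L, r = 2.0, math.sqrt(5.0), 0.1
rng = random.Random(0)
for _ in range(1000):
    x_old = [rng.uniform(-5.0, 5.0) for _ in range(2)]
    x_new = [rng.uniform(-5.0, 5.0) for _ in range(2)]
    if x_old == x_new:
        continue                                # coefficients undefined
    theta, vartheta = spectral_coefficients(x_old, x_new, F, r)
    assert r - 1e-12 <= theta <= 1.0 / eta + r + 1e-12
    assert 1.0 / (L + r) - 1e-12 <= vartheta <= 1.0 / r + 1e-12
```

For this particular mapping the coefficients are constant, $$\theta_{k}=2/5+r$$ and $$\vartheta_{k}=1/(2+r)$$, because $$\langle Ax,x\rangle=2\|x\|^{2}$$ and $$\|Ax\|^{2}=5\|x\|^{2}$$.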

### Remark 2.2

In (10), we attach a relaxation factor $$\gamma \in(0,2)$$ to $$F(z_{k})$$ based on numerical experience.

### Remark 2.3

The line search (9) is different from those in [6, 7]. The following lemma shows that it is well defined.

### Lemma 2.1

For all $$k\geq0$$, there exists a nonnegative integer $$m_{k}$$ satisfying (9).

### Proof

For the sake of contradiction, we suppose that there exists $$k_{0}\geq0$$ such that (9) is not satisfied for any nonnegative integer m, i.e.,

$$-\bigl\langle F\bigl(x_{k_{0}}+\beta_{k_{0}}\rho^{m}d_{k_{0}} \bigr),d_{k_{0}}\bigr\rangle <\sigma\bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2},\quad \forall m\geq0.$$

Letting $$m\rightarrow\infty$$ and using the continuity of $$F(\cdot)$$ yield

$$-\bigl\langle F(x_{k_{0}}),d_{k_{0}}\bigr\rangle \leq \sigma \bigl\Vert F(x_{k_{0}})\bigr\Vert ^{2}.$$
(14)

On the other hand, by (7) and (13), we obtain

$$-\bigl\langle F(x_{0}),d_{0}\bigr\rangle =\bigl\Vert F(x_{0})\bigr\Vert ^{2}>r\bigl\Vert F(x_{0}) \bigr\Vert ^{2}$$

and

$$-\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1,$$

which together with (14) means that $$\sigma\geq r$$. However, this contradicts the fact that $$\sigma< r$$. Therefore the assertion of Lemma 2.1 holds. This completes the proof. □

For the line search (12), a similar result holds, as stated in the following lemma.

### Lemma 2.2

For all $$k\geq0$$, there exists a nonnegative integer $$m_{k}$$ satisfying (12).

### Proof

The lemma can be proved by contradiction in the same way as Lemma 2.1, so we omit the proof for concision. □

The step length $$\alpha_{k}$$ and the norm of the function $$F(x_{k})$$ satisfy the following property, which is an important result for proving the global convergence of Algorithm 2.1.

### Lemma 2.3

Suppose that $$F(\cdot)$$ is strongly monotone and let $$\{x_{k}\}$$ and $$\{z_{k}\}$$ be the sequences generated by Algorithm 2.1, then $$\{x_{k}\}$$ and $$\{z_{k}\}$$ are both bounded. Furthermore, we have

$$\lim_{k\rightarrow\infty}\alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}=0.$$
(15)

### Proof

From (9), we have

$$\bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle \geq\sigma \alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2}>0.$$
(16)

For any $$x^{*}\in C^{*}$$, from (4), we have

\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad = \bigl\Vert P_{C}\bigl[x_{k}-\gamma \xi_{k}F(z_{k})\bigr]-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-\gamma\xi_{k}F(z_{k})-x^{*} \bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle +\gamma^{2}\xi _{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2}. \end{aligned}
(17)

By the monotonicity of the mapping $$F(\cdot)$$, we have

\begin{aligned}& \bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle +\bigl\langle F(z_{k}),z_{k}-x^{*}\bigr\rangle \\& \quad \geq \bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\bigl\langle F\bigl(x^{*}\bigr),z_{k}-x^{*}\bigr\rangle \\& \quad = \bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle . \end{aligned}
(18)

Substituting (16) and (18) into (17), we have

\begin{aligned}& \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma \xi_{k}\bigl\langle F(z_{k}),x_{k}-z_{k} \bigr\rangle +\gamma ^{2}\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2} \\& \quad = \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\gamma(2- \gamma) \frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}} \\& \quad \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}- \gamma(2-\gamma) \frac{\sigma^{2}\alpha_{k}^{2}\Vert F(x_{k})\Vert ^{4}}{\Vert F(z_{k})\Vert ^{2}} , \end{aligned}
(19)

which together with $$\gamma\in(0,2)$$ indicates that, for all k,

$$\bigl\Vert x_{k+1}-x^{*}\bigr\Vert \leq\bigl\Vert x_{k}-x^{*}\bigr\Vert ,$$
(20)

which shows that the sequence $$\{x_{k}\}$$ is bounded. By (13), $$\{d_{k}\}$$ is bounded and so is $$\{z_{k}\}$$. Then, by the continuity of $$F(\cdot)$$, there exists a constant $$M>0$$ such that $$\|F(z_{k})\|\leq M$$, for all k. Therefore it follows from (19) that

$$\gamma(2-\gamma)\frac{\sigma^{2}}{M^{2}}\sum_{k=0}^{\infty}\alpha_{k}^{2}\bigl\Vert F(x_{k})\bigr\Vert ^{4}\leq\sum_{k=0}^{\infty}\bigl(\bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2}\bigr)<\infty,$$

which implies that the assertion (15) holds. The proof is completed. □
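The equality in (19) is stated without detail. As a short check, inserting the definition (11) of $$\xi_{k}$$ into the last two terms gives

$$\begin{aligned} -2\gamma\xi_{k}\bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle +\gamma^{2}\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2} &=-2\gamma\frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}}+\gamma^{2}\frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}} \\ &=-\gamma(2-\gamma)\frac{\langle F(z_{k}),x_{k}-z_{k}\rangle ^{2}}{\Vert F(z_{k})\Vert ^{2}}. \end{aligned}$$

The factor $$\gamma(2-\gamma)$$ is positive exactly when $$0<\gamma<2$$, which is why the relaxation factor is restricted to this interval.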

### Lemma 2.4

Suppose that $$F(\cdot)$$ is monotone and Lipschitz continuous and let $$\{x_{k}\}$$ and $$\{z_{k}\}$$ be the sequences generated by Algorithm 2.2, then $$\{x_{k}\}$$ and $$\{z_{k}\}$$ are both bounded. Furthermore, we have

$$\lim_{k\rightarrow\infty}\alpha_{k}\|d_{k} \|^{2}=0.$$

### Proof

The conclusion is a little different from (15), which results from the difference between the right-hand sides of the line searches (9) and (12). It can be proved in the same way as Lemma 2.3, so we omit the proof for concision. □

Now, we establish the global convergence theorems for Algorithm 2.1 and Algorithm 2.2.

### Theorem 2.1

Suppose that the conditions in Lemma  2.3 hold. Then the sequence $$\{x_{k}\}$$ generated by Algorithm 2.1 globally converges to a solution of $$\operatorname{CES}(F,C)$$.

### Proof

We consider the following two possible cases.

Case 1: $$\liminf_{k\rightarrow\infty}\|F(x_{k})\|=0$$, which together with the continuity of $$F(\cdot)$$ implies that the sequence $$\{x_{k}\}$$ has some accumulation point $$\bar{x}$$ such that $$F(\bar{x})=0$$. From (20), $$\{\|x_{k}-\bar{x}\|\}$$ converges, and since $$\bar{x}$$ is an accumulation point of $$\{x_{k}\}$$, $$\{x_{k}\}$$ must converge to $$\bar{x}$$.

Case 2: $$\liminf_{k\rightarrow\infty}\|F(x_{k})\|>0$$. Then by (15), it follows that $$\lim_{k\rightarrow\infty}\alpha_{k}=0$$. Therefore, from the line search (9), for sufficiently large k, we have

$$-\bigl\langle F\bigl(x_{k}+\beta_{k}\rho ^{m_{k}-1}d_{k}\bigr),d_{k}\bigr\rangle <\sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$
(21)

Since $$\{x_{k}\}$$ and $$\{d_{k}\}$$ are both bounded, we can choose convergent subsequences; letting $$k\rightarrow\infty$$ along them in (21), we obtain

$$-\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \leq\sigma \bigl\Vert F(\bar{x})\bigr\Vert ^{2},$$
(22)

where $$\bar{x}$$, $$\bar{d}$$ are limit points of corresponding subsequences. On the other hand, by (13), we obtain

$$-\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2},\quad \forall k \geq1.$$

Letting $$k\rightarrow\infty$$ in the above inequality, we obtain

$$-\bigl\langle F(\bar{x}),\bar{d}\bigr\rangle \geq r\bigl\Vert F( \bar{x})\bigr\Vert ^{2}.$$
(23)

Thus, by (22) and (23), we get $$r\leq\sigma$$, and this contradicts the fact that $$r>\sigma$$. Therefore $$\liminf_{k\rightarrow\infty}\|F(x_{k})\|>0$$ does not hold. This completes the proof. □

For Algorithm 2.2, we also have the following global convergence.

### Theorem 2.2

Suppose that the conditions in Lemma  2.4 hold. Then the sequence $$\{x_{k}\}$$ generated by Algorithm 2.2 globally converges to a solution of $$\operatorname{CES}(F,C)$$.

### Proof

Following a process similar to the proof for Theorem 2.1, we can get the desired conclusion. This completes the proof. □

## 3 Convergence rate

By Theorem 2.1 and Theorem 2.2, we know that the sequence $$\{x_{k}\}$$ generated by Algorithm 2.1 or Algorithm 2.2 converges to a solution of $$\operatorname{CES}(F,C)$$. In what follows, we always assume that $$x_{k}\rightarrow x^{*}$$ as $$k\rightarrow\infty$$, where $$x^{*}\in C^{*}$$. To establish the local convergence rate of the sequence generated by Algorithm 2.1 or Algorithm 2.2, we need the following assumption.

### Assumption 3.1

For $$x^{*}\in C^{*}$$, there exist three positive constants δ, c, and L such that

$$c\operatorname{dist}\bigl(x,C^{*}\bigr)\leq\bigl\Vert F(x)\bigr\Vert ,\quad \forall x\in N\bigl(x^{*},\delta\bigr)$$
(24)

and

$$\bigl\Vert F(x)-F(y)\bigr\Vert \leq L\|x-y\|,\quad \forall x,y \in N\bigl(x^{*},\delta\bigr),$$
(25)

where $$\operatorname{dist}(x,C^{*})$$ denotes the distance from x to the solution set $$C^{*}$$, and

$$N\bigl(x^{*},\delta\bigr)=\bigl\{ x\in R^{n}|\bigl\Vert x-x^{*}\bigr\Vert \leq\delta\bigr\} .$$

Obviously, (A3) in Section 2 implies (25). Here, we set the constant c so that

$$0<\frac{\gamma(2-\gamma)\sigma\alpha c^{2}\eta ^{2}}{ L^{2}(\beta_{\mathrm{max}}L(1+r\eta)+\eta)^{2}}<1.$$
(26)

Now, we analyze the convergence rate of the sequence $$\{x_{k}\}$$ generated by Algorithm 2.1 or Algorithm 2.2 under the conditions (24) and (25).

### Lemma 3.1

If (A4) and the conditions in Assumption  3.1 hold, then the sequence $$\{\alpha_{k}\}$$ generated by the line search (9) has a positive bound from below.

### Proof

We only need to prove that for sufficiently large k, $$\alpha_{k}$$ has a positive bound from below. If $$\alpha_{k}<\beta_{k}$$, i.e., $$m_{k}\geq1$$, then by the construction of $$\alpha_{k}$$, we have

$$-\bigl\langle F\bigl(x_{k}+\beta_{k}\alpha_{k} \rho^{-1}d_{k}\bigr),d_{k}\bigr\rangle <\sigma\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$

In addition, by (7), we have

$$-\bigl\langle F(x_{k}),d_{k}\bigr\rangle = \theta_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq r\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$

Then, by the above two inequalities, we can obtain

$$\bigl\langle F\bigl(x_{k}+\beta_{k} \alpha_{k}\rho ^{-1}d_{k}\bigr)-F(x_{k}),d_{k} \bigr\rangle \geq(r-\sigma)\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$
(27)

On the other hand, from (13) and (25), we have

$$\bigl\langle F\bigl(x_{k}+\beta_{k} \alpha_{k}\rho ^{-1}d_{k}\bigr)-F(x_{k}),d_{k} \bigr\rangle \leq\frac{L\beta_{k}\alpha_{k}}{\rho}\|d_{k}\| ^{2}\leq \frac{L\beta_{k}\alpha_{k}(1+r\eta)^{2}}{\rho\eta^{2}}\bigl\Vert F(x_{k})\bigr\Vert ^{2}.$$
(28)

By (27) and (28), for k sufficiently large we obtain

$$\alpha_{k}\geq\frac{\rho(r-\sigma)\eta^{2}}{L\beta_{k}(1+r\eta)^{2}}\geq\frac {\rho(r-\sigma)\eta^{2}}{L\beta_{\mathrm{max}}(1+r\eta)^{2}}.$$

Therefore, there is a positive constant α, such that

$$\alpha_{k}\geq\alpha,$$
(29)

for all k. The proof is completed. □

### Lemma 3.2

If (A2), (A3), and the conditions in Assumption  3.1 hold, then the sequence $$\{\alpha_{k}\}$$ generated by the line search (12) has a positive bound from below.

### Proof

The proof is similar to that of Lemma 3.1, and we omit it for concision. This completes the proof. □

### Theorem 3.1

In addition to the assumptions in Theorem  2.1, if conditions (24) and (25) hold, then the sequence $$\{ \operatorname{dist}(x_{k},C^{*})\}$$ generated by Algorithm  2.1 converges locally to 0 at the Q-linear rate, hence the sequence $$\{x_{k}\}$$ converges locally to $$x^{*}$$ at the R-linear rate.

### Proof

Let $$v_{k}\in C^{*}$$ be the closest solution to $$x_{k}$$. That is, $$\|x_{k}-v_{k}\| =\operatorname{dist}(x_{k},C^{*})$$. By (19), we have

$$\|x_{k+1}-v_{k}\|^{2}\leq \|x_{k}-v_{k}\|^{2}-\gamma (2-\gamma) \frac{\langle F(z_{k}), x_{k}-z_{k}\rangle^{2}}{\|F(z_{k})\|^{2}}.$$
(30)

For sufficiently large k, it follows from (13) and (25) that

\begin{aligned} \bigl\Vert F(z_{k})\bigr\Vert &=\bigl\Vert F(z_{k})-F(v_{k}) \bigr\Vert \\ &\leq L\Vert z_{k}-v_{k}\Vert \\ &\leq L\bigl(\Vert x_{k}-z_{k}\Vert +\Vert x_{k}-v_{k}\Vert \bigr) \\ &\leq L\bigl(\beta_{\mathrm{max}}\Vert d_{k}\Vert +\Vert x_{k}-v_{k}\Vert \bigr) \\ &\leq L \biggl(\frac{\beta_{\mathrm{max}}(1+r\eta)\Vert F(x_{k})\Vert }{\eta}+\Vert x_{k}-v_{k} \Vert \biggr) \\ &=L \biggl(\frac{\beta_{\mathrm{max}}(1+r\eta)\Vert F(x_{k})-F(v_{k})\Vert }{\eta }+\Vert x_{k}-v_{k}\Vert \biggr) \\ &\leq L \biggl(\frac{\beta_{\mathrm{max}}L(1+r\eta)}{\eta}+1 \biggr)\Vert x_{k}-v_{k} \Vert \\ &= L \biggl(\frac{\beta_{\mathrm{max}}L(1+r\eta)}{\eta}+1 \biggr)\operatorname {dist} \bigl(x_{k},C^{*}\bigr). \end{aligned}

Thus, from (9), (24), and (29), for sufficiently large k, we have

$$\bigl\langle F(z_{k}),x_{k}-z_{k}\bigr\rangle \geq\sigma\alpha_{k}\bigl\Vert F(x_{k})\bigr\Vert ^{2} \geq\sigma\alpha\bigl\Vert F(x_{k})\bigr\Vert ^{2}\geq\sigma\alpha c^{2}\operatorname{dist}^{2} \bigl(x_{k},C^{*}\bigr).$$

Substituting the above two inequalities into (30) and from (26), we have

$$\operatorname{dist}^{2}\bigl(x_{k+1},C^{*}\bigr)\leq \|x_{k+1}-v_{k}\|^{2}\leq \biggl(1- \frac {\gamma(2-\gamma)\sigma\alpha c^{2}\eta^{2}}{ L^{2}(\beta_{\mathrm {max}}L(1+r\eta)+\eta)^{2}} \biggr)\operatorname{dist}^{2}\bigl(x_{k},C^{*} \bigr),$$

which implies that the sequence $$\{\operatorname{dist}(x_{k},C^{*})\}$$ converges locally to 0 at the Q-linear rate. Therefore, the sequence $$\{x_{k}\}$$ converges locally to $$x^{*}$$ at the R-linear rate. The proof is completed. □

### Theorem 3.2

In addition to the assumptions in Theorem 2.2, if conditions (24) and (25) hold, then the sequence $$\{ \operatorname{dist}(x_{k},C^{*})\}$$ generated by Algorithm 2.2 converges locally to 0 at the Q-linear rate, hence the sequence $$\{x_{k}\}$$ converges locally to $$x^{*}$$ at the R-linear rate.

### Proof

The proof is similar to that of Theorem 3.1, and we also omit it for concision. This completes the proof. □

## 4 Numerical results

In this section, we test Algorithm 2.1 and Algorithm 2.2, and compare them with the spectral gradient projection method in . We give the following three simple problems to test the efficiency of the three methods.

### Problem 1

The mapping $$F(\cdot)$$ is taken as $$F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}$$, where

$$f_{i}(x)=e^{x_{i}}-1, \quad \mbox{for } i=1,2,\ldots,n$$

and $${C}={R}_{+}^{n}$$. Obviously, this problem has a unique solution $$x^{*}=(0,0,\ldots,0)^{\top}$$.

### Problem 2

The mapping $$F(\cdot)$$ is taken as $$F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}$$, where

$$f_{i}(x)=x_{i}-\sin|x_{i}-1|,\quad \mbox{for } i=1,2,\ldots,n$$

and $${C}=\{x\in{R}_{+}^{n}|\sum_{i=1}^{n}x_{i}\leq n, x_{i}\geq0, i=1,2,\ldots,n\}$$. Obviously, Problem 2 is nonsmooth at $$x=(1,1,\ldots,1)^{\top}$$.
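For Problem 2 the feasible set is the intersection of the nonnegative orthant with a half-space, so the projection is less immediate than for a box. The sketch below shows one standard way to compute it, using the sort-based projection onto a scaled simplex when the sum constraint is active. The function names are hypothetical, and this routine is an assumption of the example, not necessarily what the authors used.

```python
import math

def F_problem2(x):
    """Mapping of Problem 2: f_i(x) = x_i - sin|x_i - 1|."""
    return [t - math.sin(abs(t - 1.0)) for t in x]

def project_problem2(x):
    """Euclidean projection onto {x >= 0, sum(x) <= n} with n = len(x)."""
    n = len(x)
    clipped = [max(t, 0.0) for t in x]
    if sum(clipped) <= n:
        return clipped                      # sum constraint is inactive
    # Sum constraint active: project onto the simplex {x >= 0, sum(x) = n}.
    u = sorted(x, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - n) / j
        if uj - t > 0.0:                    # index j still belongs to the support
            tau = t
    return [max(t - tau, 0.0) for t in x]

print(project_problem2([3.0, 3.0]))
print(F_problem2([1.0, 1.0]))
```

The sort makes this projection cost O(n log n), which is still cheap compared with solving a general quadratic program.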

### Problem 3

The problem is adapted from . The mapping $$F(\cdot)$$ is taken as $$F(x)=D(x)+Mx$$, where $$D(x)$$ and Mx are the nonlinear part and linear part of $$F(x)$$, respectively. Here, the components of $$D(x)$$ are defined by $$D_{j}(x)=a_{j}\arctan(x_{j})$$, where $$a_{j}$$ is a random variable in $$(0,100)$$, and the matrix $$M=A^{\top}A+B$$, where A is an $$n\times n$$ matrix whose entries are randomly generated in the interval $$(-1,1)$$ and B is a skew-symmetric matrix generated in the same way. In addition, $${C}={R}_{+}^{n}$$.
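A random instance of Problem 3 can be generated as in the following sketch. The function name `make_problem3`, the seed handling, and the use of Python's `random.uniform` (which samples a closed interval rather than an open one) are assumptions of this illustration. The mapping is monotone because each $$a_{j}\arctan(\cdot)$$ is increasing and $$x^{\top}Mx=\|Ax\|^{2}\geq0$$.

```python
import math
import random

def make_problem3(n, seed=0):
    """Return (F, M) for a random instance F(x) = D(x) + M x of Problem 3."""
    rng = random.Random(seed)
    a = [rng.uniform(0.0, 100.0) for _ in range(n)]
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Skew-symmetric B: B[j][i] = -B[i][j], zero diagonal.
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            B[i][j] = rng.uniform(-1.0, 1.0)
            B[j][i] = -B[i][j]
    # M = A^T A + B
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) + B[i][j]
          for j in range(n)] for i in range(n)]

    def F(x):
        return [a[i] * math.atan(x[i]) + sum(M[i][j] * x[j] for j in range(n))
                for i in range(n)]

    return F, M

F3, M3 = make_problem3(4, seed=1)
print(F3([0.0] * 4))   # x = 0 is a solution, since arctan(0) = 0 and M*0 = 0
```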

The codes are written in Matlab 7.0 and run on a personal computer with a 2.0 GHz CPU. The parameters used in Algorithm 2.1 and Algorithm 2.2 are set as $$\rho=0.6$$, $$r=10^{-3}$$, $$\sigma=10^{-4}$$, and $$\gamma=1.8$$ for Problem 1 and $$\gamma=1$$ for Problems 2 and 3. The initial step size in Step 2 of Algorithm 2.1 or Algorithm 2.2 is set to be $$\beta_{k}=1$$. We stop the iteration if the iteration number exceeds 1,000 or the inequality $$\|F(x_{k})\|\leq10^{-5}$$ is satisfied. The method in  (denoted by CGD) is implemented with the following parameters: $$\rho=0.1$$, $$r=0.01$$, $$\sigma =10^{-4}$$, and $$\xi=1$$.

For Problems 1 and 2, the initial point is set as $$x_{0}=\operatorname {ones}(n,1)$$, and for Problem 3, the initial point is set as $$x_{0}=\operatorname{rand}(n,1)$$. Tables 1-3 give the numerical results by Algorithm 2.1, Algorithm 2.2, and CGD with different dimensions, where Iter. denotes the iteration number, Fn denotes the number of function evaluations, and CPU denotes the CPU time in seconds when the algorithms terminate.

The numerical results given in Tables 1-3 show that: (1) the three methods can solve all the tested problems successfully; (2) for the two easy Problems 1 and 2, Algorithm 2.2 performs a little better than Algorithm 2.1 in terms of CPU time, and both methods perform better than CGD with respect to the three criteria Iter., Fn, and CPU; (3) for the difficult Problem 3, Algorithm 2.1 performs best among the three methods, and both Algorithm 2.1 and Algorithm 2.2 perform much better than CGD, especially in terms of CPU time. From the above analysis, we conclude that Algorithm 2.1 and Algorithm 2.2 are better than CGD on these test problems.

## 5 Conclusions

Two spectral gradient projection methods for solving constrained equations have been developed, which are not only derivative-free, but also completely matrix-free. Consequently, they can be applied to solve large-scale nonsmooth constrained equations. We established the global convergence without the requirement of differentiability of the equations, and presented the linear convergence rate under standard conditions. We also reported some numerical results to show the efficiency of the proposed methods.

## References

1. Dirkse, SP, Ferris, MC: MCPLIB: a collection of nonlinear mixed complementarity problems. Optim. Methods Softw. 5, 319-345 (1995)

2. Wood, AJ, Wollenberg, BF: Power Generation, Operation, and Control. Wiley, New York (1996)

3. Meintjes, K, Morgan, AP: A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 22, 333-361 (1987)

4. Qi, LQ, Tong, XJ, Li, DH: An active-set projected trust region algorithm for box constrained nonsmooth equations. J. Optim. Theory Appl. 120, 601-625 (2004)

5. Ortega, JM, Rheinboldt, WC: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

6. Yu, ZS, Lin, J, Sun, J, Xiao, YH, Liu, LY, Li, ZH: Spectral gradient projection method for monotone nonlinear equations with convex constraints. Appl. Numer. Math. 59, 2416-2423 (2009)

7. Liu, SY, Huang, YY, Jiao, HW: Sufficient descent conjugate gradient methods for solving convex constrained nonlinear monotone equations. Abstr. Appl. Anal. 2014, Article ID 305643 (2014)

8. Sun, M, Liu, J: Three derivative-free projection methods for large-scale nonlinear equations with convex constraints. J. Appl. Math. Comput. (2014). doi:10.1007/s12190-014-0774-5

9. Barzilai, J, Borwein, JM: Two point stepsize gradient methods. IMA J. Numer. Anal. 8, 141-148 (1988)

10. Birgin, EG, Martinez, JM, Raydan, M: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60, 1-21 (2014)

11. Fletcher, R, Reeves, C: Function minimization by conjugate gradients. Comput. J. 7, 149-154 (1964)

12. Dai, YH, Liao, LZ: R-Linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1-10 (2002)

13. Wang, CW, Wang, YJ, Xu, CL: A projection method for a system of nonlinear monotone equations with convex constraints. Math. Methods Oper. Res. 66, 33-46 (2007)

14. Zheng, L: A new projection algorithm for solving a system of nonlinear equations with convex constraints. Bull. Korean Math. Soc. 50, 823-832 (2013)

15. Xiao, YH, Zhu, H: A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 405, 310-319 (2013)

## Acknowledgements

The authors gratefully acknowledge the helpful comments and suggestions of the anonymous reviewers. This work is supported by the National Natural Science Foundation of China (71371139, 11302188), the Shanghai Shuguang Talent Project (13SG24), the Shanghai Pujiang Talent Project (12PJC069), and the Foundation of Teachers Professional Development of Zhejiang Provincial Visiting Scholar in Higher School.

## Author information


### Corresponding author

Correspondence to Yongrui Duan.

### Competing interests

The authors declare that they have no competing interests.

### Authors’ contributions

The first author has designed the two algorithms and the second author has refined them. Both authors have equally contributed in the numerical results. Both authors read and approved the final manuscript.


## Rights and permissions

Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
