Two spectral gradient projection methods for constrained equations and their linear convergence rate

Due to its simplicity and numerical efficiency for unconstrained optimization problems, the spectral gradient method has received increasing attention in recent years. In this paper, two spectral gradient projection methods for constrained equations are proposed; they combine the well-known spectral gradient method with the hyperplane projection method. The new methods are not only derivative-free but also completely matrix-free, and consequently they can be applied to solve large-scale constrained equations. Under the condition that the underlying mapping of the constrained equations is Lipschitz continuous or strongly monotone, we establish the global convergence of the new methods. In contrast to the existing gradient methods for such problems, the new methods possess a linear convergence rate under some error bound conditions. Furthermore, a relaxation factor γ is attached to the update step to accelerate convergence. Preliminary numerical results show that they are efficient and promising in practice.


Introduction
In this paper, we consider the problem of finding a solution of the following constrained equations, denoted by CES(F, C): find x ∈ C such that F(x) = 0, where F : C → R^n is a given continuous nonlinear mapping and C is a nonempty closed convex subset of R^n. Obviously, when C = R^n, () reduces to the system of nonlinear equations, which has been intensively studied by many scholars. The constrained system of equations () appears in a wide variety of problems in applied mathematics, and some important problems, such as economic equilibrium problems [], power flow equations [], and chemical equilibrium systems [], can be reformulated as problems of the kind (). Among the various numerical methods for solving CES(F, C) [-], the gradient projection methods (GPMs) are the most efficient, especially when the projection onto the feasible set C is easy to implement. For example, when C is the nonnegative orthant, a box, or a ball, GPMs require the lowest computational cost. In addition, the GPMs are also the simplest, because they do not need to store any matrix during the iteration process. Therefore, they are completely matrix-free, and consequently they can be applied to solve large-scale CES(F, C).
It is well known that the spectral gradient method [, ] and the conjugate gradient method [] are two efficient methods for solving large-scale unconstrained optimization problems due to their simplicity and low storage. Recently, combined with the projection technique, they have been extended to solve the constrained equations CES(F, C) by some scholars [, ]. In [], Yu et al. proposed a spectral gradient projection method for solving monotone CES(F, C), which can be applied to nonsmooth constrained equations and works quite well even for large-scale CES(F, C). Quite recently, Liu et al. [] developed two unified frameworks of sufficient descent conjugate gradient projection methods for solving monotone CES(F, C), which can also be applied to large-scale nonsmooth constrained equations. However, the convergence rate of the methods in [, ] has not been investigated, so whether they have a linear convergence rate is an open problem. Can we design a spectral/conjugate gradient projection method with a linear convergence rate for CES(F, C)? In this paper, we answer this question positively for the spectral gradient projection method. Note that, in [], Dai and Liao proved a nice conclusion for the spectral gradient method: they established the R-linear convergence of the spectral gradient method for strongly convex quadratics of any dimension, and they also proved local R-linear convergence for general objective functions. Obviously, the general minimization problem discussed in [] is equivalent to a system of nonlinear equations under some mild conditions. In this paper, we establish the local R-linear convergence of the spectral gradient method for systems of constrained nonlinear equations; therefore, our result extends the conclusion in [] in some sense.
In fact, in this paper, motivated by the projection methods in [, ] and the spectral gradient method in [], we propose two spectral gradient projection methods for solving nonsmooth constrained equations, which can be viewed as combinations of the well-known spectral gradient method and the famous hyperplane projection method, and they possess a linear convergence rate under some error bound conditions. The remainder of this paper is organized as follows. In the next section, we describe the new methods and present their global convergence analysis. The linear convergence rates of the new methods are established in Section . Numerical results are reported in Section . Finally, some concluding remarks are given in Section .
(A) The mapping F(·) is strongly monotone on C, i.e., there is a positive constant η such that Obviously, (A) implies (A), and from () and the Cauchy-Schwartz inequality, we have Then let P C (·) denote the projection mapping from R n onto the convex set C, i.e., which has the following nonexpansive property: Now, we review the spectral gradient method for the unconstrained minimization problem: where f : R n → R is smooth and its gradient is available. The spectral gradient for solving () is an iterative method of the form where α k is a step size defined by (see []) The step sizes () are called Barzilai-Borwein (BB) step sizes, and the corresponding gradient methods are spectral gradient methods. The spectral gradient with step size α II k has been extended to solve the constrained equations () by Yu et al. [], however, as discussed in the Introduction, we do not know whether the method in [] possesses the linear convergence rate. In the following, we will extend the spectral gradient with step size α I k and α II k to solve constrained equations () by some new type Armijo line searches, and we propose two spectral gradient projection methods, which are not only globally convergent, but also have a linear convergence rate.
The spectral gradient projection methods are stated as follows.
Step . If F(x k ) = , then stop; otherwise, go to Step .
Step . Compute d k by Step .
Step . Find the trial point Step . Compute Choose an initial step size β k+ such that β k+ ∈ [β min , β max ]. Set k := k +  and go to Step .
Step . If F(x k ) = , then stop; otherwise, go to Step .
Step . Compute d k by which is similar to α II k defined in (), s k- = x kx k- , but y k- is defined by Step .
Step . Find the trial point Step . See Step  of Algorithm ..
The discussions of the global convergence and linear convergence rate of Algorithm . are similar to those of Algorithm .. Therefore, in the following, we discuss Algorithm . in detail, and we only give the corresponding results of Algorithm ..
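To make the overall structure concrete, the following Python sketch implements a generic spectral gradient hyperplane projection iteration of the kind both algorithms follow. Because the displayed formulas of the algorithms were lost from this version of the text, every concrete choice below (the direction d_k = -beta_k F(x_k), the Armijo-type backtracking test, the projection step size lambda_k, and the BB-type update of beta_k) is an assumption made for illustration, not the authors' exact rule.

    import numpy as np

    def sgp_solve(F, proj_C, x0, beta0=1.0, beta_min=1e-10, beta_max=1e10,
                  rho=0.5, sigma=1e-4, gamma=1.0, tol=1e-6, max_iter=2000):
        # Generic spectral gradient hyperplane projection sketch for F(x) = 0, x in C.
        x = np.asarray(x0, dtype=float)
        beta = beta0
        Fx = F(x)
        for _ in range(max_iter):
            if np.linalg.norm(Fx) <= tol:
                break
            d = -beta * Fx                                   # spectral-gradient direction (assumed form)
            alpha, z, Fz = 1.0, x, Fx
            for _ in range(60):                              # Armijo-type backtracking (assumed test)
                z = x + alpha * d
                Fz = F(z)
                if -(Fz @ d) >= sigma * alpha * (d @ d):
                    break
                alpha *= rho
            if np.linalg.norm(Fz) <= tol:                    # trial point already solves the equation
                return z
            lam = (Fz @ (x - z)) / (Fz @ Fz)                 # hyperplane projection step size
            x_new = proj_C(x - gamma * lam * Fz)             # relaxed projection update
            Fx_new = F(x_new)
            s, y = x_new - x, Fx_new - Fx                    # BB quantities
            beta = (s @ s) / (s @ y) if abs(s @ y) > 1e-16 else beta0
            beta = float(np.clip(beta, beta_min, beta_max))  # keep the step size in [beta_min, beta_max]
            x, Fx = x_new, Fx_new
        return x

The sketch only evaluates F and the projection onto C, which is why the framework is derivative-free and matrix-free; for C = R^n_+ the projection is simply componentwise truncation, proj_C = lambda v: np.maximum(v, 0.0).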
In addition, by the monotonicity of F(·), we also have (). Combining the above two inequalities with (), we obtain (), from which we get F(x_k) = 0 if d_k = 0, which means that x_k is a solution of CES(F, C). Thus, Algorithm . can also terminate when d_k = 0. Similarly, for Algorithm ., by the Lipschitz continuity and monotonicity of F(·), we can deduce that (). In what follows, we assume that F(x_k) ≠ 0 and d_k ≠ 0 for all k, i.e., Algorithm . or Algorithm . generates an infinite sequence {x_k}.
Remark . In (), we attach a relax factor γ ∈ (, ) to F(z k ) based on numerical experiences.
Remark . The line search () is different from that of [, ], which is well defined by the following lemma. Proof For the sake of contradiction, we suppose that there exists k  ≥  such that () is not satisfied for any nonnegative integer m, i.e., Letting m → ∞ and using the continuity of F(·) yield On the other hand, by () and (), we obtain which together with () means that σ ≥ r, however, this contradicts the fact that σ < r. Therefore the assertion of Lemma . holds. This completes the proof. Proof The lemma can be proved by contradiction as that of Lemma ., and we omit the proof for concision. This completes the proof.
The step length α_k and the norm of F(x_k) satisfy the following property, which is an important result for proving the global convergence of Algorithm .
Lemma . Suppose that F(·) is strongly monotone and let {x k } and {z k } be the sequences generated by Algorithm ., then {x k } and {z k } are both bounded. Furthermore, we have For any x * ∈ C * , from (), we have By the monotonicity of the mapping F(·), we have Substituting () and () into (), we have which together with γ ∈ (, ) indicates that, for all k, which shows that the sequence {x k } is bounded. By (), {d k } is bounded and so is {z k }. Then, by the continuity of F(·), there exists a constant M >  such that F(z k ) ≤ M, for all k. Therefore it follows from () that which implies that the assertion () holds. The proof is completed.

Lemma . Suppose that F(·) is monotone and Lipschitz continuous and let {x
Proof The conclusion is slightly different from (), which results from the difference between the right-hand sides of the line searches () and (). In fact, this conclusion can be proved in the same way as Lemma ., and we omit the proof for concision. This completes the proof.

Theorem . Suppose that the conditions in Lemma . hold. Then the sequence {x_k} generated by Algorithm . globally converges to a solution of CES(F, C).

Proof We consider the following two possible cases.
Case : lim inf k→∞ F(x k ) = , which together with the continuity of F(·) implies that the sequence {x k } has some accumulation pointx such that F(x) = . From (), { x k -x } converges, and sincex is an accumulation point of {x k }, {x k } must converge tox.
Case : lim inf k→∞ F(x k ) > . Then by (), it follows that lim k→∞ α k = . Therefore, from the line search (), for sufficiently large k, we have Since {x k }, {d k } are both bounded, we can choose a sequence and letting k → ∞ in (), we can obtain wherex,d are limit points of corresponding subsequences. On the other hand, by (), we obtain Letting k → ∞ in the above inequality, we obtain Thus, by () and (), we get r ≤ σ , and this contradicts the fact that r > σ . Therefore lim inf k→∞ F(x k ) >  does not hold. This completes the proof.
For Algorithm ., we also have the following global convergence.
Theorem . Suppose that the conditions in Lemma . hold. Then the sequence {x k } generated by Algorithm . globally converges to a solution of CES(F, C).
Proof Following a process similar to the proof for Theorem ., we can get the desired conclusion. This completes the proof.

Convergence rate
By Theorem . and Theorem ., we know that the sequence {x k } generated by Algorithm . or Algorithm . converges to a solution of CES (F, C). In what follows, we always assume that x k → x * as k → ∞, where x * ∈ C * . To establish the local convergence rate of the sequence generated by Algorithm . or Algorithm ., we need the following assumption.
Assumption . For x * ∈ C * , there exist three positive constants δ, c, and L such that and where dist(x, C * ) denotes the distance from x to the solution set C * , and Obviously, (A) in Section  implies (). Here, we set the constant c so that Proof We only need to prove that for sufficiently large k, α k has a positive bound from below. If α k ≤ β k , then by the construction of α k , we have In addition, by (), we have Then, by the above two inequalities, we can obtain On the other hand, from () and (), we have By () and (), for k sufficiently large we obtain Therefore, there is a positive constant α, such that for all k. The proof is completed. Proof The proof is similar to that of Lemma ., and we omit it for concision. This completes the proof.

Theorem . In addition to the assumptions in Theorem ., if conditions () and ()
hold, then the sequence {dist(x k , C * )} generated by Algorithm . converges locally to  at the Q-linear rate, hence the sequence {x k } converges locally to x * at the R-linear rate.
Proof Let v_k ∈ C* be the solution closest to x_k, that is, ‖x_k − v_k‖ = dist(x_k, C*). By (), we have (). For sufficiently large k, it follows from () and () that (). Thus, from (), (), and (), for sufficiently large k, we have (). Substituting the above two inequalities into () and using (), we have (), which implies that the sequence {dist(x_k, C*)} converges locally to 0 at a Q-linear rate. Therefore, the sequence {x_k} converges locally to x* at an R-linear rate. The proof is completed.
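For completeness, the convergence-rate notions used in these two theorems are the standard ones: {dist(x_k, C*)} converges to 0 Q-linearly if there is a θ ∈ (0, 1) with

\[
\operatorname{dist}(x_{k+1},C^{*}) \;\le\; \theta\,\operatorname{dist}(x_{k},C^{*}) \quad \text{for all sufficiently large } k,
\]

and {x_k} converges to x* R-linearly if ‖x_k − x*‖ is dominated by such a Q-linearly convergent sequence, e.g. ‖x_k − x*‖ ≤ M θ^k for some constant M > 0.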
Theorem . In addition to the assumptions in Theorem ., if conditions () and () hold, then the sequence {dist(x k , C * )} generated by Algorithm . converges locally to  at the Q-linear rate, hence the sequence {x k } converges locally to x * at an R-linear rate.
Proof The proof is similar to that of Theorem ., and we also omit it for concision. This completes the proof.

Numerical results
In this section, we test Algorithm . and Algorithm . and compare them with the spectral gradient projection method in []. We use the following three simple problems to test the efficiency of the three methods. Here, the components of D(x) are defined by D_j(x) = a_j arctan(x_j), where a_j is a random variable in (, ), and the matrix M = A^T A + B, where A is an n × n matrix whose entries are randomly generated in the interval (-, ), and the skew-symmetric matrix B is generated in the same way. In addition, C = R^n_+.
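The displayed definition of the test mapping was lost from this version of the text; only the ingredients D(x), M, and C are described above. The following Python sketch builds those ingredients. The sampling intervals (0, 1) and (-1, 1) and the assembled mapping F(x) = Mx + D(x), a form commonly used in the related literature, are assumptions rather than the authors' exact problem.

    import numpy as np

    def make_test_problem(n, seed=0):
        # Ingredients described in the text: D_j(x) = a_j * arctan(x_j),
        # M = A^T A + B with A random and B skew-symmetric, C = R^n_+.
        rng = np.random.default_rng(seed)
        a = rng.uniform(0.0, 1.0, size=n)            # a_j, assumed to lie in (0, 1)
        A = rng.uniform(-1.0, 1.0, size=(n, n))      # entries assumed to lie in (-1, 1)
        S = rng.uniform(-1.0, 1.0, size=(n, n))
        B = S - S.T                                  # skew-symmetric matrix
        M = A.T @ A + B
        F = lambda x: M @ x + a * np.arctan(x)       # assumed form F(x) = Mx + D(x)
        proj_C = lambda v: np.maximum(v, 0.0)        # projection onto the nonnegative orthant
        return F, proj_C

With these two pieces, the earlier sgp_solve sketch can be run directly, e.g. F, proj_C = make_test_problem(1000) followed by x = sgp_solve(F, proj_C, np.ones(1000)).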
The codes are written in Matlab . and run on a personal computer with a . GHz CPU. The parameters used in Algorithm . and Algorithm . are set as ρ = ., r =  - , σ =  - , and γ = . for Problem  and γ =  for Problems  and . The initial step size in Step  of Algorithm . and Algorithm . is set to β_k = . We stop the iteration if the iteration number exceeds  or the inequality ‖F(x_k)‖ ≤  - is satisfied. The method in [] (denoted by CGD) is implemented with the following parameters: ρ = ., r = ., σ =  - , and ξ = .
For Problems  and , the initial point is set as x  = ones(n, ), and for Problem , the initial point is set as x  = rand(n, ). Tables - give the numerical results by Algorithm ., Algorithm ., and CGD with different dimensions, where Iter. denotes the iteration number, Fn denotes the number of function evaluations, and CPU denotes the CPU time in seconds when the algorithms terminate.
The numerical results given in Tables - show that: (1) the three methods solve all the tested problems successfully; (2) for the two easy Problems  and , Algorithm . performs a little better than Algorithm . in terms of CPU time, and both methods perform better than CGD with respect to all three criteria: Iter., Fn, and CPU; (3) for the difficult Problem , Algorithm . performs best among the three methods, and both Algorithm . and Algorithm . perform much better than CGD, especially in terms of CPU time. From the above analysis, we conclude that Algorithm . and Algorithm . outperform CGD.

Conclusions
Two spectral gradient projection methods for solving constrained equations have been developed; they are not only derivative-free but also completely matrix-free, and consequently they can be applied to solve large-scale nonsmooth constrained equations. We established their global convergence without requiring differentiability of the underlying mapping and proved a linear convergence rate under some error bound conditions. We also reported numerical results that demonstrate the efficiency of the proposed methods.