Two spectral gradient projection methods for constrained equations and their linear convergence rate
Journal of Inequalities and Applications, volume 2015, Article number: 8 (2015)
Abstract
Due to its simplicity and numerical efficiency for unconstrained optimization problems, the spectral gradient method has received increasing attention in recent years. In this paper, two spectral gradient projection methods for constrained equations are proposed, which combine the well-known spectral gradient method with the hyperplane projection method. The new methods are not only derivative-free but also completely matrix-free, and consequently they can be applied to large-scale constrained equations. Under the condition that the underlying mapping of the constrained equations is Lipschitz continuous or strongly monotone, we establish the global convergence of the new methods. In contrast to existing gradient projection methods for such problems, whose convergence rate has not been analyzed, the new methods are shown to converge linearly under some error bound conditions. Furthermore, a relaxation factor γ is attached in the update step to accelerate convergence. Preliminary numerical results show that the methods are efficient and promising in practice.
Introduction
In this paper, we consider the problem of finding a solution of the following constrained equations, denoted by $\operatorname{CES}(F,C)$:
$$ F(x)=0,\quad x\in C, \tag{1} $$
where $F: C\rightarrow R^{n}$ is a given continuous nonlinear mapping and C is a nonempty closed convex set of $R^{n}$. Obviously, when $C=R^{n}$, (1) reduces to a system of nonlinear equations, which has been studied intensively. The constrained system of equations (1) arises in a wide variety of problems in applied mathematics; important examples such as economic equilibrium problems [1], power flow equations [2], and chemical equilibrium systems [3] can be reformulated as problems of the form (1).
Among the various numerical methods for solving $\operatorname{CES}(F,C)$ [4–8], gradient projection methods (GPMs) are among the most efficient, especially when the projection onto the feasible set C is easy to compute. For example, when C is the nonnegative orthant, a box, or a ball, the projection has a closed form and each GPM iteration is very cheap. In addition, GPMs are simple, because they do not need to store any matrix during the iteration process. Therefore, they are completely matrix-free and can be applied to large-scale $\operatorname{CES}(F,C)$.
It is well known that the spectral gradient method [9, 10] and the conjugate gradient method [11] are two efficient methods for solving large-scale unconstrained optimization problems, owing to their simplicity and low storage requirements. Recently, combined with the projection technique, both have been extended to constrained equations $\operatorname{CES}(F,C)$ [6, 7]. In [6], Yu et al. proposed a spectral gradient projection method for monotone $\operatorname{CES}(F,C)$, which can be applied to nonsmooth constrained equations and works well even for large-scale $\operatorname{CES}(F,C)$. More recently, Liu et al. [7] developed two unified frameworks of sufficient descent conjugate gradient projection methods for monotone $\operatorname{CES}(F,C)$, which also apply to large-scale nonsmooth constrained equations. However, the convergence rate of the methods in [6, 7] was not investigated, so whether they converge linearly remains an open problem. Can we design a spectral or conjugate gradient projection method with a linear convergence rate for $\operatorname{CES}(F,C)$? In this paper, we answer this question affirmatively for the spectral gradient projection method. Note that Dai and Liao [12] proved a notable result for the spectral gradient method: they established its R-linear convergence for strongly convex quadratics of any dimension, as well as its local R-linear convergence for general objective functions. Under some mild conditions, the minimization problem discussed in [12] is equivalent to a system of nonlinear equations (the stationarity condition $\nabla f(x)=0$). In this paper, we establish the local R-linear convergence of spectral gradient projection methods for systems of constrained nonlinear equations; in this sense, our result extends the conclusion of [12].
More precisely, motivated by the projection methods in [13, 14] and the spectral gradient method in [6], we propose two spectral gradient projection methods for solving nonsmooth constrained equations. They can be viewed as combinations of the well-known spectral gradient method and the hyperplane projection method, and they possess a linear convergence rate under some error bound conditions. The remainder of this paper is organized as follows. In Section 2, we describe the new methods and analyze their global convergence. Their linear convergence rates are established in Section 3. Numerical results are reported in Section 4. Finally, some concluding remarks are given in Section 5.
Algorithm and convergence analysis
First, we denote by $\Vert x\Vert =\sqrt{x^{\top}x}$ the Euclidean norm. Let ${C^{*}}$ denote the solution set of $\operatorname{CES}(F,C)$. Throughout this paper, we use the following assumptions:

(A1)
The solution set ${C^{*}}$ is nonempty.

(A2)
The mapping $F(\cdot)$ is monotone on C, i.e.,
$$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq0,\quad \mbox{for all } x,y\in C. $$
(A3)
The mapping $F(\cdot)$ is Lipschitz continuous on C, i.e., there is a positive constant L such that
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \leq L\Vert x-y\Vert ,\quad \mbox{for all } x,y\in C. $$
(A4)
The mapping $F(\cdot)$ is strongly monotone on C, i.e., there is a positive constant η such that
$$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq\eta\Vert x-y\Vert ^{2}, \quad \mbox{for all } x,y\in C. \tag{2} $$
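Obviously, (A4) implies (A2). Moreover, combining (2) with the Cauchy-Schwarz inequality $\langle F(x)-F(y),x-y\rangle \leq\Vert F(x)-F(y)\Vert \Vert x-y\Vert $, we have
$$ \bigl\Vert F(x)-F(y)\bigr\Vert \geq\eta\Vert x-y\Vert ,\quad \mbox{for all } x,y\in C. \tag{3} $$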
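Let $P_{C}(\cdot)$ denote the projection mapping from $R^{n}$ onto the convex set C, i.e.,
$$ P_{C}(x)=\arg\min\bigl\{\Vert x-y\Vert : y\in C\bigr\}, $$
which has the following nonexpansive property:
$$ \bigl\Vert P_{C}(x)-P_{C}(y)\bigr\Vert \leq\Vert x-y\Vert ,\quad \mbox{for all } x,y\in R^{n}. \tag{4} $$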
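For the simple feasible sets mentioned in the Introduction, $P_{C}$ has a closed form. The following minimal NumPy sketch (the function names are ours, chosen for illustration) implements the projection onto the nonnegative orthant and onto a box, and spot-checks the nonexpansive property on random points; such a check is a sanity test, not a proof.

```python
import numpy as np


def project_orthant(x):
    """Euclidean projection onto C = R^n_+: clip negative entries to zero."""
    return np.maximum(x, 0.0)


def project_box(x, lower, upper):
    """Euclidean projection onto the box C = {x : lower <= x <= upper}."""
    return np.clip(x, lower, upper)


def nonexpansive_on_samples(project, dim=5, trials=1000, seed=0):
    """Check ||P(x) - P(y)|| <= ||x - y|| on random pairs (sanity check only)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        if np.linalg.norm(project(x) - project(y)) > np.linalg.norm(x - y) + 1e-12:
            return False
    return True


print(nonexpansive_on_samples(project_orthant))
print(nonexpansive_on_samples(lambda v: project_box(v, -1.0, 1.0)))
```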
Now, we review the spectral gradient method for the unconstrained minimization problem
$$ \min_{x\in R^{n}} f(x), \tag{5} $$
where $f: R^{n}\rightarrow R$ is smooth and its gradient is available. The spectral gradient method for solving (5) is an iterative method of the form
$$ x_{k+1}=x_{k}-\alpha_{k}\nabla f(x_{k}), $$
where $\alpha_{k}$ is a step size defined by (see [9])
$$ \alpha_{k}^{\mathrm{I}}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}}\quad\mbox{or}\quad \alpha_{k}^{\mathrm{II}}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}}, \tag{6} $$
in which $s_{k-1}=x_{k}-x_{k-1}$ and $y_{k-1}=\nabla f(x_{k})-\nabla f(x_{k-1})$. The step sizes (6) are called Barzilai-Borwein (BB) step sizes, and the corresponding gradient methods are called spectral gradient methods. The spectral gradient method with step size $\alpha_{k}^{\mathrm{II}}$ has been extended to the constrained equations (1) by Yu et al. [6]; however, as discussed in the Introduction, it is not known whether the method in [6] converges linearly. In the following, we extend the spectral gradient method with step sizes $\alpha_{k}^{\mathrm{I}}$ and $\alpha_{k}^{\mathrm{II}}$ to the constrained equations (1) by means of new Armijo-type line searches, and we propose two spectral gradient projection methods that are not only globally convergent but also linearly convergent.
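The two classical Barzilai-Borwein step sizes can be computed from two consecutive iterates and gradients. A minimal NumPy sketch, assuming $s_{k-1}^{\top}y_{k-1}>0$ (true, e.g., for strictly convex f) and using a toy quadratic that is not from the paper:

```python
import numpy as np


def bb_step_sizes(x_k, x_prev, g_k, g_prev):
    """Barzilai-Borwein step sizes from two consecutive iterates and gradients.

    s = x_k - x_{k-1},  y = g_k - g_{k-1}  (g = gradient of f)
    alpha_I  = s's / s'y
    alpha_II = s'y / y'y
    Assumes s'y > 0, which holds, e.g., when f is strictly convex and s != 0.
    """
    s = x_k - x_prev
    y = g_k - g_prev
    sy = s @ y
    if sy <= 0:
        raise ValueError("s'y must be positive for the BB step sizes")
    return (s @ s) / sy, sy / (y @ y)


# Strongly convex quadratic f(x) = 0.5 x'Ax, whose gradient is A x.
A = np.diag([1.0, 10.0])
x_prev, x_k = np.array([1.0, 1.0]), np.array([0.5, 0.2])
alpha_1, alpha_2 = bb_step_sizes(x_k, x_prev, A @ x_k, A @ x_prev)
next_x = x_k - alpha_1 * (A @ x_k)   # one spectral gradient step with alpha_I
```

For this quadratic, $s=(-0.5,-0.8)^{\top}$ and $y=As$, so $\alpha^{\mathrm{I}}=0.89/6.65\approx0.134$ and $\alpha^{\mathrm{II}}=6.65/64.25\approx0.104$. Both lie between the reciprocals of the extreme eigenvalues of A (0.1 and 1), and $\alpha^{\mathrm{I}}\geq\alpha^{\mathrm{II}}$ follows from the Cauchy-Schwarz inequality.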
The spectral gradient projection methods are stated as follows.
Algorithm 2.1
Step 0. Set an arbitrary initial point $x_{0}\in{C}$, the parameters $0<\rho<1$, $0<\sigma<r<1$, $0<\gamma<2$, and $0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}$. Set the initial step size $\beta_{0}=1$ and set $k:=0$.
Step 1. If $F(x_{k})=0$, then stop; otherwise, go to Step 2.
Step 2. Compute $d_{k}$ by
where
which is similar to $\alpha_{k}^{\mathrm{I}}$ defined in (6), with $y_{k-1}=F(x_{k})-F(x_{k-1})$, but with $s_{k-1}$ defined by
which differs from the standard definition of $s_{k-1}$. Stop if $d_{k}=0$; otherwise, go to Step 3.
Step 3. Find the trial point $z_{k}=x_{k}+\alpha_{k} d_{k}$, where $\alpha_{k}=\beta_{k}\rho^{m_{k}}$ with $m_{k}$ being the smallest nonnegative integer m such that
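Step 4. Compute
$$ x_{k+1}=P_{C}\bigl[x_{k}-\gamma\xi_{k}F(z_{k})\bigr], \tag{10} $$
where
$$ \xi_{k}=\frac{F(z_{k})^{\top}(x_{k}-z_{k})}{\Vert F(z_{k})\Vert ^{2}}. \tag{11} $$
Choose an initial step size $\beta_{k+1}$ such that $\beta_{k+1}\in [\beta_{\mathrm{min}},\beta_{\mathrm{max}}]$. Set $k:=k+1$ and go to Step 1.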
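Step 4 is a relaxed hyperplane projection step. Assuming the standard form $x_{k+1}=P_{C}[x_{k}-\gamma\xi_{k}F(z_{k})]$ with $\xi_{k}=F(z_{k})^{\top}(x_{k}-z_{k})/\Vert F(z_{k})\Vert ^{2}$ for (10)-(11), it can be sketched in NumPy as follows. The toy mapping, the trial point, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def relaxed_projection_step(x_k, z_k, F_z, project, gamma=1.0):
    """Relaxed hyperplane projection step (assumed form of (10)-(11)).

    The hyperplane {u : F(z_k)'(u - z_k) = 0} separates x_k from the solution set
    of a monotone F whenever F(z_k)'(x_k - z_k) > 0. We move x_k along -F(z_k) by
    gamma times its distance to that hyperplane (gamma = 1 lands exactly on it),
    then project back onto C.
    """
    xi = F_z @ (x_k - z_k) / (F_z @ F_z)
    return project(x_k - gamma * xi * F_z)


# Toy monotone map F(x) = M x (symmetric part of M is positive definite),
# with unique solution x* = 0 in C = R^2_+.
M = np.array([[2.0, 1.0], [-1.0, 1.0]])
F = lambda x: M @ x
project = lambda v: np.maximum(v, 0.0)

x_k = np.array([1.0, 2.0])
z_k = x_k - 0.1 * F(x_k)      # hypothetical trial point z_k = x_k + alpha_k d_k
x_next = relaxed_projection_step(x_k, z_k, F(z_k), project, gamma=1.8)
dist_before, dist_after = np.linalg.norm(x_k), np.linalg.norm(x_next)
```

Here the distance to $x^{*}=0$ drops from about 2.24 to about 1.75; for monotone F and $\gamma\in(0,2)$ it cannot increase.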
Algorithm 2.2
Step 0. Set an arbitrary initial point $x_{0}\in{C}$, compute L, the Lipschitz constant of $F(\cdot)$, choose the parameters $0<\rho <1$, $0< r<1$, $0<\sigma<r^{2}/(L+r)$, $0<\gamma<2$, and $0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}$. Set the initial step size $\beta_{0}=1$ and set $k:=0$.
Step 1. If $F(x_{k})=0$, then stop; otherwise, go to Step 2.
Step 2. Compute $d_{k}$ by
where
which is similar to $\alpha_{k}^{\mathrm{II}}$ defined in (6), with $s_{k-1}=x_{k}-x_{k-1}$, but with $y_{k-1}$ defined by
which differs from the standard definition of $y_{k-1}$. Stop if $d_{k}=0$; otherwise, go to Step 3.
Step 3. Find the trial point $z_{k}=x_{k}+\alpha_{k} d_{k}$, where $\alpha_{k}=\beta_{k}\rho^{m_{k}}$ with $m_{k}$ being the smallest nonnegative integer m such that
Step 4. See Step 4 of Algorithm 2.1.
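The Step 0 parameter ranges of the two algorithms differ only in how σ is coupled to r. A small helper (hypothetical, not from the paper) makes both conditions explicit; the values of $\beta_{\mathrm{min}}$ and $\beta_{\mathrm{max}}$ in the usage lines are placeholders, since they are not reported, and the Lipschitz constant L = 1 is likewise made up.

```python
def valid_parameters(rho, r, sigma, gamma, beta_min, beta_max, L=None):
    """Check the Step 0 parameter ranges.

    Algorithm 2.1 (L is None): 0 < rho < 1 and 0 < sigma < r < 1.
    Algorithm 2.2 (L given):   0 < rho < 1, 0 < r < 1 and
                               0 < sigma < r**2 / (L + r), L the Lipschitz constant.
    Both algorithms:           0 < gamma < 2 and 0 < beta_min < beta_max.
    """
    common = 0 < rho < 1 and 0 < gamma < 2 and 0 < beta_min < beta_max
    if L is None:
        return common and 0 < sigma < r < 1
    return common and 0 < r < 1 and 0 < sigma < r**2 / (L + r)


# Algorithm 2.1 with rho = 0.6, r = 1e-3, sigma = 1e-4, gamma = 1.8
# (beta_min = 1e-3 and beta_max = 1e3 are placeholders).
ok_21 = valid_parameters(0.6, 1e-3, 1e-4, 1.8, 1e-3, 1e3)
# Algorithm 2.2 with a hypothetical L = 1: r**2 / (L + r) = 0.25 / 1.5 for r = 0.5.
ok_22 = valid_parameters(0.6, 0.5, 0.1, 1.0, 1e-3, 1e3, L=1.0)
bad_22 = valid_parameters(0.6, 0.5, 0.2, 1.0, 1e-3, 1e3, L=1.0)
```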
The analyses of the global convergence and the linear convergence rate of Algorithm 2.2 are similar to those of Algorithm 2.1. Therefore, in the following, we discuss Algorithm 2.1 in detail and only state the corresponding results for Algorithm 2.2.
Remark 2.1
For Algorithm 2.1, by (3), we have
In addition, by the monotonicity of $F(\cdot)$, we also have
Combining the above two inequalities with (7), we have
from which we get $\Vert F(x_{k})\Vert =0$ if $\Vert d_{k}\Vert =0$, which means that $x_{k}$ is a solution of $\operatorname{CES}(F,C)$. Thus, Algorithm 2.1 can also terminate when $\Vert d_{k}\Vert =0$. Similarly, for Algorithm 2.2, by the Lipschitz continuity and monotonicity of $F(\cdot)$, we can deduce that
In what follows, we assume that $\Vert F(x_{k})\Vert \neq0$ and $\Vert d_{k}\Vert \neq0$ for all k, i.e., Algorithm 2.1 or Algorithm 2.2 generates an infinite sequence $\{x_{k}\}$.
Remark 2.2
In (10), we attach a relaxation factor $\gamma \in(0,2)$ to $F(z_{k})$; this choice is based on numerical experience.
Remark 2.3
The line search (9) differs from those in [6, 7]; the following lemma shows that it is well defined.
Lemma 2.1
For all $k\geq0$, there exists a nonnegative integer $m_{k}$ satisfying (9).
Proof
For the sake of contradiction, we suppose that there exists $k_{0}\geq0$ such that (9) is not satisfied for any nonnegative integer m, i.e.,
Letting $m\rightarrow\infty$ and using the continuity of $F(\cdot)$ yield
On the other hand, by (7) and (13), we obtain
and
which together with (14) implies $\sigma\geq r$. This contradicts $\sigma< r$, and therefore the assertion of Lemma 2.1 holds. This completes the proof. □
For the line search (12), we have a similar result.
Lemma 2.2
For all $k\geq0$, there exists a nonnegative integer $m_{k}$ satisfying (12).
Proof
The lemma can be proved by contradiction in the same way as Lemma 2.1; we omit the details for brevity. This completes the proof. □
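Step 3 of both algorithms is a backtracking search over $\alpha=\beta_{k}\rho^{m}$, $m=0,1,2,\ldots$, and Lemmas 2.1 and 2.2 guarantee that it terminates. A generic sketch follows; the acceptance test `accept` is a placeholder standing for (9) or (12), and the cap `max_m` is our own floating-point safeguard, not part of the algorithms.

```python
def backtracking(beta_k, rho, accept, max_m=100):
    """Return (alpha_k, m_k) with alpha_k = beta_k * rho**m_k, where m_k is the
    smallest nonnegative integer m such that accept(beta_k * rho**m) holds.

    `accept` stands for the line-search inequality (9) or (12); Lemmas 2.1 and 2.2
    show that some finite m works, and max_m only guards against round-off.
    """
    for m in range(max_m + 1):
        alpha = beta_k * rho**m
        if accept(alpha):
            return alpha, m
    raise RuntimeError("no acceptable step size found within max_m reductions")


# Toy acceptance test: accept any step no larger than 0.3.
alpha_k, m_k = backtracking(1.0, 0.6, lambda a: a <= 0.3)
```

With $\beta_{k}=1$ and $\rho=0.6$, the trials are 1, 0.6, 0.36, and 0.216, so the toy test is first satisfied at $m_{k}=3$.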
The step length $\alpha_{k}$ and the norm of the function $F(x_{k})$ satisfy the following property, which is an important result for proving the global convergence of Algorithm 2.1.
Lemma 2.3
Suppose that $F(\cdot)$ is strongly monotone, and let $\{x_{k}\}$ and $\{z_{k}\}$ be the sequences generated by Algorithm 2.1. Then $\{x_{k}\}$ and $\{z_{k}\}$ are both bounded. Furthermore, we have
Proof
From (9), we have
For any $x^{*}\in C^{*}$, from (4), we have
By the monotonicity of the mapping $F(\cdot)$, we have
Substituting (16) and (18) into (17), we have
which together with $\gamma\in(0,2)$ indicates that, for all k,
which shows that the sequence $\{x_{k}\}$ is bounded. By (13), $\{d_{k}\}$ is bounded, and so is $\{z_{k}\}$. Then, by the continuity of $F(\cdot)$, there exists a constant $M>0$ such that $\Vert F(z_{k})\Vert \leq M$ for all k. Therefore, it follows from (19) that
which implies that the assertion (15) holds. The proof is completed. □
Lemma 2.4
Suppose that $F(\cdot)$ is monotone and Lipschitz continuous, and let $\{x_{k}\}$ and $\{z_{k}\}$ be the sequences generated by Algorithm 2.2. Then $\{x_{k}\}$ and $\{z_{k}\}$ are both bounded. Furthermore, we have
Proof
This conclusion differs slightly from (15) because the right-hand sides of the line searches (9) and (12) differ. It can be proved in the same way as Lemma 2.3, and we omit the proof for brevity. This completes the proof. □
Now, we establish the global convergence theorems for Algorithm 2.1 and Algorithm 2.2.
Theorem 2.1
Suppose that the conditions in Lemma 2.3 hold. Then the sequence $\{x_{k}\}$ generated by Algorithm 2.1 globally converges to a solution of $\operatorname{CES}(F,C)$.
Proof
We consider the following two possible cases.
Case 1: $\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert =0$. Together with the continuity of $F(\cdot)$ and the boundedness of $\{x_{k}\}$, this implies that $\{x_{k}\}$ has an accumulation point $\bar{x}$ with $F(\bar{x})=0$; since C is closed and $x_{k}\in C$ for all k, we have $\bar{x}\in C^{*}$. By (20) with $x^{*}=\bar{x}$, the sequence $\{\Vert x_{k}-\bar{x}\Vert \}$ converges, and since a subsequence of it tends to 0, the whole sequence $\{x_{k}\}$ converges to $\bar{x}$.
Case 2: $\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert >0$. Then, by (15), $\lim_{k\rightarrow\infty}\alpha_{k}=0$. Therefore, from the line search (9), for sufficiently large k, we have
Since $\{x_{k}\}$ and $\{d_{k}\}$ are both bounded, we can choose a subsequence along which both converge; letting $k\rightarrow\infty$ along this subsequence in (21), we obtain
where $\bar{x}$ and $\bar{d}$ are the limits of the corresponding subsequences. On the other hand, by (13), we obtain
Letting $k\rightarrow\infty$ in the above inequality, we obtain
Thus, by (22) and (23), we get $r\leq\sigma$, which contradicts $r>\sigma$. Therefore, $\liminf_{k\rightarrow\infty}\Vert F(x_{k})\Vert >0$ cannot hold. This completes the proof. □
For Algorithm 2.2, we also have the following global convergence.
Theorem 2.2
Suppose that the conditions in Lemma 2.4 hold. Then the sequence $\{x_{k}\}$ generated by Algorithm 2.2 globally converges to a solution of $\operatorname{CES}(F,C)$.
Proof
Following a process similar to the proof for Theorem 2.1, we can get the desired conclusion. This completes the proof. □
Convergence rate
By Theorem 2.1 and Theorem 2.2, we know that the sequence $\{x_{k}\}$ generated by Algorithm 2.1 or Algorithm 2.2 converges to a solution of $\operatorname{CES}(F,C)$. In what follows, we always assume that $x_{k}\rightarrow x^{*}$ as $k\rightarrow\infty$, where $x^{*}\in C^{*}$. To establish the local convergence rate of the sequence generated by Algorithm 2.1 or Algorithm 2.2, we need the following assumption.
Assumption 3.1
For $x^{*}\in C^{*}$, there exist three positive constants δ, c, and L such that
and
where $\operatorname{dist}(x,C^{*})$ denotes the distance from x to the solution set $C^{*}$, and
Obviously, (A3) in Section 2 implies (25). Here, we set the constant c so that
Now, we analyze the convergence rate of the sequence $\{x_{k}\}$ generated by Algorithm 2.1 or Algorithm 2.2 under the conditions (24) and (25).
Lemma 3.1
If (A4) and the conditions in Assumption 3.1 hold, then the sequence $\{\alpha_{k}\}$ generated by the line search (9) is bounded below by a positive constant.
Proof
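We only need to prove that $\alpha_{k}$ is bounded below by a positive constant for sufficiently large k. If $\alpha_{k}=\beta_{k}$, then $\alpha_{k}\geq\beta_{\mathrm{min}}>0$ for $k\geq1$, since $\beta_{k}\in[\beta_{\mathrm{min}},\beta_{\mathrm{max}}]$. If $\alpha_{k}<\beta_{k}$, then, by the construction of $\alpha_{k}$, the step size $\alpha_{k}/\rho$ does not satisfy (9), and we have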
In addition, by (7), we have
Then, by the above two inequalities, we can obtain
On the other hand, from (13) and (25), we have
By (27) and (28), for k sufficiently large we obtain
Therefore, there is a positive constant α such that
for all k. The proof is completed. □
Lemma 3.2
If (A2), (A3), and the conditions in Assumption 3.1 hold, then the sequence $\{\alpha_{k}\}$ generated by the line search (12) is bounded below by a positive constant.
Proof
The proof is similar to that of Lemma 3.1, and we omit it for concision. This completes the proof. □
Theorem 3.1
In addition to the assumptions in Theorem 2.1, if conditions (24) and (25) hold, then the sequence $\{ \operatorname{dist}(x_{k},C^{*})\}$ generated by Algorithm 2.1 converges locally to 0 at a Q-linear rate; hence the sequence $\{x_{k}\}$ converges locally to $x^{*}$ at an R-linear rate.
Proof
Let $v_{k}\in C^{*}$ be the solution closest to $x_{k}$, that is, $\Vert x_{k}-v_{k}\Vert =\operatorname{dist}(x_{k},C^{*})$. By (19), we have
For sufficiently large k, it follows from (13) and (25) that
Thus, from (9), (24), and (29), for sufficiently large k, we have
Substituting the above two inequalities into (30) and from (26), we have
which implies that the sequence $\{\operatorname{dist}(x_{k},C^{*})\}$ converges locally to 0 at a Q-linear rate. Therefore, the sequence $\{x_{k}\}$ converges locally to $x^{*}$ at an R-linear rate. The proof is completed. □
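The passage from the Q-linear rate of $\{\operatorname{dist}(x_{k},C^{*})\}$ to the R-linear rate of $\{x_{k}\}$ can be made explicit. As a sketch, assume (as the proof of Lemma 2.3 indicates) that (20) gives $\Vert x_{j+1}-u\Vert \leq\Vert x_{j}-u\Vert $ for every $u\in C^{*}$ and all j, and let $q\in(0,1)$ be the Q-linear factor, so that $\operatorname{dist}(x_{k+1},C^{*})\leq q\operatorname{dist}(x_{k},C^{*})$ for all $k\geq k_{0}$. Taking $u=v_{k}$ and letting $j\rightarrow\infty$:

$$ \begin{aligned} \Vert x_{j}-v_{k}\Vert &\leq\Vert x_{k}-v_{k}\Vert =\operatorname{dist}(x_{k},C^{*}),\quad j\geq k \;\Longrightarrow\; \Vert x^{*}-v_{k}\Vert \leq\operatorname{dist}(x_{k},C^{*}),\\ \Vert x_{k}-x^{*}\Vert &\leq\Vert x_{k}-v_{k}\Vert +\Vert v_{k}-x^{*}\Vert \leq2\operatorname{dist}(x_{k},C^{*})\leq2q^{k-k_{0}}\operatorname{dist}(x_{k_{0}},C^{*}),\quad k\geq k_{0}. \end{aligned} $$

Hence $\limsup_{k\rightarrow\infty}\Vert x_{k}-x^{*}\Vert ^{1/k}\leq q<1$, which is the R-linear rate.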
Theorem 3.2
In addition to the assumptions in Theorem 2.2, if conditions (24) and (25) hold, then the sequence $\{ \operatorname{dist}(x_{k},C^{*})\}$ generated by Algorithm 2.2 converges locally to 0 at a Q-linear rate; hence the sequence $\{x_{k}\}$ converges locally to $x^{*}$ at an R-linear rate.
Proof
The proof is similar to that of Theorem 3.1, and we also omit it for concision. This completes the proof. □
Numerical results
In this section, we test Algorithm 2.1 and Algorithm 2.2 and compare them with the conjugate gradient projection method in [15]. We use the following three test problems to compare the efficiency of the three methods.
Problem 1
The mapping $F(\cdot)$ is taken as $F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}$, where
and ${C}={R}_{+}^{n}$. Obviously, this problem has a unique solution $x^{*}=(0,0,\ldots,0)^{\top}$.
Problem 2
The mapping $F(\cdot)$ is taken as $F(x)=(f_{1}(x), f_{2}(x),\ldots,f_{n}(x))^{\top}$, where
and ${C}=\{x\in{R}_{+}^{n} \mid \sum_{i=1}^{n}x_{i}\leq n, x_{i}\geq0, i=1,2,\ldots,n\} $. Obviously, Problem 2 is nonsmooth at $x=(1,1,\ldots,1)^{\top}$.
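Unlike the orthant used in Problems 1 and 3, the feasible set of Problem 2 couples the variables through $\sum_{i}x_{i}\leq n$. Its projection still has a cheap closed form. The following NumPy sketch (our own, not taken from the paper) clips at zero when that is already feasible and otherwise applies the standard sort-based projection onto $\{x\geq0,\ \sum_{i}x_{i}=n\}$, which is where the projection lands when the sum constraint is active.

```python
import numpy as np


def project_capped_simplex(y, cap):
    """Euclidean projection of y onto C = {x : x >= 0, sum(x) <= cap}, cap > 0.

    KKT conditions give x = max(y - tau, 0) for some tau >= 0. If tau = 0 is
    feasible we are done; otherwise sum(x) = cap, and tau is found by the
    sort-based simplex projection.
    """
    y = np.asarray(y, dtype=float)
    x = np.maximum(y, 0.0)
    if x.sum() <= cap:
        return x
    u = np.sort(y)[::-1]                     # entries in decreasing order
    css = np.cumsum(u) - cap                 # partial sums minus the cap
    k = np.arange(1, y.size + 1)
    last = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[last] / (last + 1)
    return np.maximum(y - tau, 0.0)


# Problem 2 with n = 3: C = {x >= 0, x_1 + x_2 + x_3 <= 3}.
print(project_capped_simplex([3.0, 3.0, 3.0], 3))   # sum constraint active
print(project_capped_simplex([0.5, -1.0, 0.2], 3))  # only clipping needed
```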
Problem 3
The problem is adapted from [13]. The mapping $F(\cdot)$ is taken as $F(x)=D(x)+Mx$, where $D(x)$ and $Mx$ are the nonlinear and linear parts of $F(x)$, respectively. The components of $D(x)$ are defined by $D_{j}(x)=a_{j}\arctan(x_{j})$, where $a_{j}$ is a random number in $(0,100)$, and the matrix $M=A^{\top}A+B$, where A is an $n\times n$ matrix whose entries are randomly generated in the interval $(-1,1)$ and B is a skew-symmetric matrix generated in the same way. In addition, ${C}={R}_{+}^{n}$.
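A NumPy sketch of a Problem 3 instance follows. The description above does not say how B is made skew-symmetric; here we assume $B=(R-R^{\top})/2$ with $R_{ij}$ uniform in $(-1,1)$, and we draw $a_{j}$ from $[0,100)$ as a stand-in for $(0,100)$. The comments explain why F is monotone and why $x^{*}=0$ solves $\operatorname{CES}(F,C)$.

```python
import numpy as np


def make_problem3(n, seed=0):
    """Hypothetical generator for F(x) = D(x) + M x with D_j(x) = a_j * arctan(x_j)
    and M = A'A + B, following the description of Problem 3.

    Assumptions: a_j ~ U[0, 100), A_ij ~ U[-1, 1), B = (R - R')/2 with R_ij ~ U[-1, 1).
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, 100.0, size=n)
    A = rng.uniform(-1.0, 1.0, size=(n, n))
    R = rng.uniform(-1.0, 1.0, size=(n, n))
    B = 0.5 * (R - R.T)                  # skew-symmetric: x'Bx = 0 for all x
    M = A.T @ A + B
    return lambda x: a * np.arctan(x) + M @ x


n = 50
F = make_problem3(n)
# Monotonicity: a_j >= 0 and arctan is increasing, A'A is positive semidefinite,
# and the skew-symmetric B contributes nothing to (F(x) - F(y))'(x - y).
rng = np.random.default_rng(1)
x, y = rng.normal(size=n), rng.normal(size=n)
monotone_gap = (F(x) - F(y)) @ (x - y)
# D(0) = 0 and M 0 = 0, so x* = 0 (which lies in C = R^n_+) solves CES(F, C).
residual_at_zero = np.linalg.norm(F(np.zeros(n)))
```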
The codes are written in MATLAB 7.0 and run on a personal computer with a 2.0 GHz CPU. The parameters of Algorithm 2.1 and Algorithm 2.2 are set as $\rho=0.6$, $r=10^{-3}$, and $\sigma=10^{-4}$, with $\gamma=1.8$ for Problem 1 and $\gamma=1$ for Problems 2 and 3. The initial step size of the line search in Algorithm 2.1 and Algorithm 2.2 is set to $\beta_{k}=1$ for all k. We stop the iteration if the number of iterations exceeds 1,000 or the inequality $\Vert F(x_{k})\Vert \leq10^{-5}$ is satisfied. The method in [15] (denoted by CGD) is implemented with the parameters $\rho=0.1$, $r=0.01$, $\sigma =10^{-4}$, and $\xi=1$.
For Problems 1 and 2, the initial point is $x_{0}=\operatorname{ones}(n,1)$, and for Problem 3, it is $x_{0}=\operatorname{rand}(n,1)$. Tables 1–3 report the numerical results of Algorithm 2.1, Algorithm 2.2, and CGD for different dimensions, where Iter. denotes the number of iterations, Fn the number of function evaluations, and CPU the CPU time in seconds when the algorithms terminate.
The numerical results in Tables 1–3 show that: (1) all three methods solve all the test problems successfully; (2) for the two easy Problems 1 and 2, Algorithm 2.2 performs slightly better than Algorithm 2.1 in CPU time, and both perform better than CGD in all three criteria (Iter., Fn, and CPU); (3) for the more difficult Problem 3, Algorithm 2.1 performs best among the three methods, and both Algorithm 2.1 and Algorithm 2.2 perform much better than CGD, especially in CPU time. We therefore conclude that, on these test problems, Algorithm 2.1 and Algorithm 2.2 outperform CGD.
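To tie the pieces together, the following NumPy sketch runs a derivative-free spectral gradient projection loop with settings like those described above ($\rho=0.6$, $\sigma=10^{-4}$, $\beta_{k}=1$, stop when $\Vert F(x_{k})\Vert \leq10^{-5}$ or after 1,000 iterations). It is not Algorithm 2.1 or 2.2 verbatim: the modified $s_{k-1}$ or $y_{k-1}$ and the exact line searches (9) and (12) are not reproduced here, so it assumes the classical BB step and a common derivative-free acceptance test, $-F(z_{k})^{\top}d_{k}\geq\sigma\alpha_{k}\Vert F(z_{k})\Vert \Vert d_{k}\Vert ^{2}$. The Problem 3 data use hypothetical choices (in particular $B=(R-R^{\top})/2$), and the sketch does not reproduce the reported MATLAB experiments.

```python
import numpy as np


def spectral_projection_sketch(F, project, x0, rho=0.6, sigma=1e-4, gamma=1.0,
                               tol=1e-5, max_iter=1000, theta_bounds=(1e-10, 1e10)):
    """Illustrative derivative-free spectral gradient projection loop.

    Assumptions (NOT Algorithm 2.1 or 2.2 verbatim):
      * d_k = -theta_k F(x_k) with the classical BB step theta_k = s's / s'y
        (s = x_k - x_{k-1}, y = F(x_k) - F(x_{k-1})), clipped, and theta_0 = 1;
      * backtracking with beta_k = 1: accept alpha = rho**m once
        -F(z)'d >= sigma * alpha * ||F(z)|| * ||d||**2, where z = x + alpha d;
      * relaxed hyperplane projection update with gamma in (0, 2).
    Stops when ||F(x_k)|| <= tol or after max_iter iterations; returns (x, k).
    """
    x = project(np.asarray(x0, dtype=float))
    Fx = F(x)
    x_prev = F_prev = None
    for k in range(max_iter):
        if np.linalg.norm(Fx) <= tol:
            return x, k
        theta = 1.0
        if x_prev is not None:
            s, y = x - x_prev, Fx - F_prev
            if s @ y > 0:
                theta = (s @ s) / (s @ y)
        theta = float(np.clip(theta, *theta_bounds))
        d = -theta * Fx
        alpha = 1.0
        while True:  # finite: -F(x)'d = theta * ||F(x)||^2 > 0 as alpha -> 0
            z = x + alpha * d
            Fz = F(z)
            if -(Fz @ d) >= sigma * alpha * np.linalg.norm(Fz) * (d @ d):
                break
            alpha *= rho
        if not np.any(Fz):  # exact zero of F (may lie outside C in general)
            return z, k + 1
        xi = Fz @ (x - z) / (Fz @ Fz)
        x_prev, F_prev = x, Fx
        x = project(x - gamma * xi * Fz)
        Fx = F(x)
    return x, max_iter


# Usage on a small Problem 3-like instance (hypothetical data, see lead-in).
rng = np.random.default_rng(0)
n = 20
a = rng.uniform(0.0, 100.0, size=n)
A = rng.uniform(-1.0, 1.0, size=(n, n))
R = rng.uniform(-1.0, 1.0, size=(n, n))
M = A.T @ A + 0.5 * (R - R.T)
F = lambda v: a * np.arctan(v) + M @ v
x0 = rng.random(n)                   # analogue of rand(n, 1)
x_sol, iters = spectral_projection_sketch(F, lambda v: np.maximum(v, 0.0), x0)
```

Every accepted trial point satisfies $F(z_{k})^{\top}(x_{k}-z_{k})\geq0$, so for this monotone F, whose solution is $x^{*}=0$, the distance $\Vert x_{k}-x^{*}\Vert $ never increases, whatever accuracy is reached within the iteration cap.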
Conclusions
We have developed two spectral gradient projection methods for solving constrained equations, which are not only derivative-free but also completely matrix-free. Consequently, they can be applied to large-scale nonsmooth constrained equations. We established their global convergence without requiring differentiability of the equations, and proved a linear convergence rate under some error bound conditions. We also reported numerical results showing the efficiency of the proposed methods.
References
 1.
Dirkse, SP, Ferris, MC: MCPLIB: a collection of nonlinear mixed complementarity problems. Optim. Methods Softw. 5, 319–345 (1995)
 2.
Wood, AJ, Wollenberg, BF: Power Generation, Operation, and Control. Wiley, New York (1996)
 3.
Meintjes, K, Morgan, AP: A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 22, 333–361 (1987)
 4.
Qi, LQ, Tong, XJ, Li, DH: An active-set projected trust region algorithm for box constrained nonsmooth equations. J. Optim. Theory Appl. 120, 601–625 (2004)
 5.
Ortega, JM, Rheinboldt, WC: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
 6.
Yu, ZS, Lin, J, Sun, J, Xiao, YH, Liu, LY, Li, ZH: Spectral gradient projection method for monotone nonlinear equations with convex constraints. Appl. Numer. Math. 59, 2416–2423 (2009)
 7.
Liu, SY, Huang, YY, Jiao, HW: Sufficient descent conjugate gradient methods for solving convex constrained nonlinear monotone equations. Abstr. Appl. Anal. 2014, Article ID 305643 (2014)
 8.
Sun, M, Liu, J: Three derivative-free projection methods for large-scale nonlinear equations with convex constraints. J. Appl. Math. Comput. (2014). doi:10.1007/s12190-014-0774-5
 9.
Barzilai, J, Borwein, JM: Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
 10.
Birgin, EG, Martinez, JM, Raydan, M: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60, 1–21 (2014)
 11.
Fletcher, R, Reeves, C: Function minimization by conjugate gradients. Comput. J. 7, 149–154 (1964)
 12.
Dai, YH, Liao, LZ: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1–10 (2002)
 13.
Wang, CW, Wang, YJ, Xu, CL: A projection method for a system of nonlinear monotone equations with convex constraints. Math. Methods Oper. Res. 66, 33–46 (2007)
 14.
Zheng, L: A new projection algorithm for solving a system of nonlinear equations with convex constraints. Bull. Korean Math. Soc. 50, 823–832 (2013)
 15.
Xiao, YH, Zhu, H: A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 405, 310–319 (2013)
Acknowledgements
The authors gratefully acknowledge the helpful comments and suggestions of the anonymous reviewers. This work is supported by the National Natural Science Foundation of China (71371139, 11302188), the Shanghai Shuguang Talent Project (13SG24), the Shanghai Pujiang Talent Project (12PJC069), and the Foundation of Teachers Professional Development of Zhejiang Provincial Visiting Scholar in Higher School.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The first author designed the two algorithms, and the second author refined them. Both authors contributed equally to the numerical results. Both authors read and approved the final manuscript.
Rights and permissions
Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Keywords
 constrained equations
 spectral gradient method
 projection method
 global convergence