Two spectral gradient projection methods for constrained equations and their linear convergence rate
- Jing Liu^{1} and
- Yongrui Duan^{2}
https://doi.org/10.1186/s13660-014-0525-z
© Liu and Duan; licensee Springer 2015
- Received: 20 July 2014
- Accepted: 12 December 2014
- Published: 10 January 2015
Abstract
Owing to its simplicity and numerical efficiency for unconstrained optimization problems, the spectral gradient method has received increasing attention in recent years. In this paper, two spectral gradient projection methods for constrained equations are proposed, which combine the well-known spectral gradient method with the hyperplane projection method. The new methods are not only derivative-free but also completely matrix-free, and consequently they can be applied to large-scale constrained equations. Under the condition that the underlying mapping of the constrained equations is Lipschitz continuous or strongly monotone, we establish the global convergence of the new methods. In contrast to the existing gradient methods for such problems, the new methods possess a linear convergence rate under some error bound conditions. Furthermore, a relaxation factor γ is attached to the update step to accelerate convergence. Preliminary numerical results show that the new methods are efficient and promising in practice.
Keywords
- constrained equations
- spectral gradient method
- projection method
- global convergence
1 Introduction
Among various numerical methods for solving \(\operatorname{CES}(F,C)\) [4–8], the gradient projection methods (GPMs) are the most efficient, especially when the projection onto the feasible set C is easy to implement. For example, when C is the nonnegative orthant, or a box, or a ball, GPMs require the lowest computational cost. In addition, the GPMs are also the simplest, because they do not need to store any matrix during the iteration process. Therefore, they are completely matrix-free, and consequently, they can be applied to solve large-scale \(\operatorname{CES}(F,C)\).
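To make the remark about cheap projections concrete, the following minimal sketch shows the closed-form Euclidean projections onto the three sets mentioned above. The function names and the list-based vector representation are illustrative assumptions and do not come from the paper.

```python
import math


def project_orthant(x):
    """Project x onto the nonnegative orthant by clipping negatives to zero."""
    return [max(t, 0.0) for t in x]


def project_box(x, lower, upper):
    """Project x onto the box {v : lower <= v <= upper}, componentwise."""
    return [min(max(t, lo), hi) for t, lo, hi in zip(x, lower, upper)]


def project_ball(x, center, radius):
    """Project x onto the Euclidean ball with the given center and radius."""
    diff = [t - c for t, c in zip(x, center)]
    dist = math.sqrt(sum(v * v for v in diff))
    if dist <= radius:
        return list(x)  # already inside the ball
    scale = radius / dist  # pull the point back onto the sphere
    return [c + scale * v for c, v in zip(center, diff)]
```

Each projection costs one pass over the vector and stores no matrix, which is why such feasible sets suit large-scale problems.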
It is well known that the spectral gradient method [9, 10] and the conjugate gradient method [11] are two efficient methods for solving large-scale unconstrained optimization problems, owing to their simplicity and low storage requirements. Recently, combined with the projection technique, they have been extended to solve the constrained equations \(\operatorname{CES}(F,C)\) [6, 7]. In [6], Yu et al. proposed a spectral gradient projection method for solving monotone \(\operatorname{CES}(F,C)\), which can be applied to nonsmooth constrained equations and works quite well even for large-scale \(\operatorname{CES}(F,C)\). Quite recently, Liu et al. [7] developed two unified frameworks of sufficient descent conjugate gradient projection methods for solving monotone \(\operatorname{CES}(F,C)\), which can also be applied to large-scale nonsmooth constrained equations. However, the convergence rates of the methods in [6, 7] were not investigated, so whether they converge linearly remains an open problem. Can we design a spectral/conjugate gradient projection method with a linear convergence rate for \(\operatorname{CES}(F,C)\)? In this paper, we answer this question positively for the spectral gradient projection method. Note that, in [12], Dai and Liao established the R-linear convergence of the spectral gradient method for strongly convex quadratics of any dimension, and they also proved local R-linear convergence for general objective functions. Under some mild conditions, the general minimization problem discussed in [12] is equivalent to a system of nonlinear equations. In this paper, we establish the local R-linear convergence of the spectral gradient method for systems of constrained nonlinear equations. Therefore, our result extends the conclusion in [12] in some sense.
Motivated by the projection methods in [13, 14] and the spectral gradient method in [6], we propose two spectral gradient projection methods for solving nonsmooth constrained equations. They can be viewed as combinations of the well-known spectral gradient method and the hyperplane projection method, and they possess a linear convergence rate under some error bound conditions. The remainder of this paper is organized as follows. In the next section, we describe the new methods and present their global convergence analysis. The linear convergence rates of the new methods are established in Section 3. Numerical results are reported in Section 4. Finally, some concluding remarks are given in Section 5.
2 Algorithm and convergence analysis
- (A1) The solution set \({C^{*}}\) is nonempty.
- (A2) The mapping \(F(\cdot)\) is monotone on C, i.e., $$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq0,\quad \mbox{for all } x,y\in C. $$
- (A3) The mapping \(F(\cdot)\) is Lipschitz continuous on C, i.e., there is a positive constant L such that $$ \bigl\Vert F(x)-F(y)\bigr\Vert \leq L\|x-y\|,\quad \mbox{for all } x,y\in C. $$
- (A4) The mapping \(F(\cdot)\) is strongly monotone on C, i.e., there is a positive constant η such that $$ \bigl\langle F(x)-F(y), x-y\bigr\rangle \geq\eta\|x-y\| ^{2}, \quad \mbox{for all } x,y\in C. \tag{2} $$
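As a brief aside on how these assumptions relate: (A4) clearly implies (A2), and when (A3) and (A4) hold together the two constants are linked. The following one-line check uses only the Cauchy-Schwarz inequality and the definitions above; it is a reader's note rather than a statement taken from the paper.

$$ \eta\|x-y\|^{2} \leq \bigl\langle F(x)-F(y), x-y\bigr\rangle \leq \bigl\Vert F(x)-F(y)\bigr\Vert \|x-y\| \leq L\|x-y\|^{2},\quad \mbox{for all } x,y\in C. $$

Hence \(\eta\leq L\) whenever C contains two distinct points.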
The spectral gradient projection methods are stated as follows.
Algorithm 2.1
Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), the parameters \(0<\rho<1\), \(0<\sigma<r<1\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).
Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.
Algorithm 2.2
Step 0. Set an arbitrary initial point \(x_{0}\in{C}\), compute L, the Lipschitz constant of \(F(\cdot)\), choose the parameters \(0<\rho <1\), \(0< r<1\), \(0<\sigma<r^{2}/(L+r)\), \(0<\gamma<2\), and \(0<\beta_{\mathrm {min}}<\beta_{\mathrm{max}}\). Set the initial step size \(\beta_{0}=1\) and set \(k:=0\).
Step 1. If \(F(x_{k})=0\), then stop; otherwise, go to Step 2.
Step 4. See Step 4 of Algorithm 2.1.
The discussions of the global convergence and linear convergence rate of Algorithm 2.2 are similar to those of Algorithm 2.1. Therefore, in the following, we discuss Algorithm 2.1 in detail, and we only give the corresponding results for Algorithm 2.2.
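Steps 2 to 4 and the line searches (9) and (12) are not reproduced in the text above, so the following Python sketch should not be read as Algorithm 2.1 or Algorithm 2.2. It illustrates the general shape of a spectral gradient hyperplane projection iteration under explicitly assumed choices: a classical Solodov-Svaiter-type backtracking line search stands in for (9) and (12), the spectral coefficient `theta` is a Barzilai-Borwein quotient computed from `y = F(x_new) - F(x) + r*s`, and the update is `x_new = P_C[x - gamma*xi*F(z)]`. The names `theta`, `theta_min`, `theta_max`, `xi` and the function name are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def spectral_projection_solve(F, project, x0, rho=0.6, sigma=1e-4, r=1e-3,
                              gamma=1.0, theta_min=1e-10, theta_max=1e10,
                              tol=1e-5, max_iter=1000):
    """Illustrative spectral gradient hyperplane projection iteration."""
    x = project(x0)
    Fx = F(x)
    theta = 1.0  # assumed initial spectral coefficient
    for k in range(max_iter):
        if norm(Fx) <= tol:
            return x, k
        d = [-theta * f for f in Fx]  # derivative-free search direction
        # Stand-in line search (NOT the paper's (9) or (12)): shrink alpha
        # until -<F(x + alpha*d), d> >= sigma * alpha * ||d||^2.
        alpha = 1.0
        z = [a + alpha * b for a, b in zip(x, d)]
        Fz = F(z)
        while -dot(Fz, d) < sigma * alpha * dot(d, d) and alpha > 1e-16:
            alpha *= rho
            z = [a + alpha * b for a, b in zip(x, d)]
            Fz = F(z)
        Fz_sq = dot(Fz, Fz)
        if Fz_sq == 0.0:
            x_new = project(z)  # z already solves F(z) = 0
        else:
            # Relaxed projection onto the hyperplane through z with normal
            # F(z), followed by projection onto C.
            xi = dot(Fz, [a - b for a, b in zip(x, z)]) / Fz_sq
            x_new = project([a - gamma * xi * f for a, f in zip(x, Fz)])
        F_new = F(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b + r * c for a, b, c in zip(F_new, Fx, s)]
        sy = dot(s, y)
        # Barzilai-Borwein-type quotient, safeguarded to a fixed interval.
        theta = min(theta_max, max(theta_min, dot(s, s) / sy)) if sy > 0.0 else 1.0
        x, Fx = x_new, F_new
    return x, max_iter
```

For instance, with the monotone mapping \(F_{i}(x)=e^{x_{i}}-1\) on the nonnegative orthant, calling `spectral_projection_solve(lambda x: [math.exp(t) - 1.0 for t in x], lambda x: [max(t, 0.0) for t in x], [1.0] * 5)` drives \(\|F(x_{k})\|\) below the tolerance while keeping every iterate feasible. Only function values of F are used, and nothing larger than a vector is stored.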
Remark 2.1
In what follows, we assume that \(\|F(x_{k})\|\neq0\) and \(\|d_{k}\|\neq0\), for all k, i.e., Algorithm 2.1 or Algorithm 2.2 generates an infinite sequence \(\{x_{k}\}\).
Remark 2.2
In (10), we attach a relaxation factor \(\gamma \in(0,2)\) to \(F(z_{k})\), based on numerical experience.
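The role of the interval \((0,2)\) can be seen from a short, standard computation. The sketch below assumes that the update (10) has the usual hyperplane projection form \(x_{k+1}=P_{C}[x_{k}-\gamma\xi_{k}F(z_{k})]\) with \(\xi_{k}=\langle F(z_{k}),x_{k}-z_{k}\rangle/\|F(z_{k})\|^{2}>0\), that \(x^{*}\in C^{*}\), and that the monotonicity inequality also holds at \(z_{k}\), so that \(\langle F(z_{k}),z_{k}-x^{*}\rangle\geq0\). The symbols \(P_{C}\) (projection onto C) and \(\xi_{k}\) are notation assumed here. Using the nonexpansiveness of \(P_{C}\) and \(\langle F(z_{k}),x_{k}-x^{*}\rangle\geq\langle F(z_{k}),x_{k}-z_{k}\rangle=\xi_{k}\|F(z_{k})\|^{2}\), one obtains

$$ \bigl\Vert x_{k+1}-x^{*}\bigr\Vert ^{2} \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-2\gamma\xi_{k}\bigl\langle F(z_{k}),x_{k}-x^{*}\bigr\rangle +\gamma^{2}\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2} \leq \bigl\Vert x_{k}-x^{*}\bigr\Vert ^{2}-\gamma(2-\gamma)\xi_{k}^{2}\bigl\Vert F(z_{k})\bigr\Vert ^{2}. $$

The factor \(\gamma(2-\gamma)\) is positive exactly when \(0<\gamma<2\), so under these assumptions the distance to any solution does not increase for such γ.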
Remark 2.3
The line search (9) differs from those of [6, 7]. The following lemma shows that it is well defined.
Lemma 2.1
For all \(k\geq0\), there exists a nonnegative number \(m_{k}\) satisfying (9).
Proof
A similar result holds for the line search (12), as stated in the following lemma.
Lemma 2.2
For all \(k\geq0\), there exists a nonnegative number \(m_{k}\) satisfying (12).
Proof
The lemma can be proved by contradiction in the same way as Lemma 2.1, and we omit the proof for brevity. This completes the proof. □
The step length \(\alpha_{k}\) and the norm of the function \(F(x_{k})\) satisfy the following property, which is an important result for proving the global convergence of Algorithm 2.1.
Lemma 2.3
Proof
Lemma 2.4
Proof
The conclusion differs slightly from (15) because the right-hand sides of the line searches (9) and (12) differ. It can be proved in the same way as Lemma 2.3, and we omit the proof for brevity. This completes the proof. □
Now, we establish the global convergence theorems for Algorithm 2.1 and Algorithm 2.2.
Theorem 2.1
Suppose that the conditions in Lemma 2.3 hold. Then the sequence \(\{x_{k}\}\) generated by Algorithm 2.1 globally converges to a solution of \(\operatorname{CES}(F,C)\).
Proof
We consider the following two possible cases.
Case 1: \(\liminf_{k\rightarrow\infty}\|F(x_{k})\|=0\), which together with the continuity of \(F(\cdot)\) implies that the sequence \(\{x_{k}\}\) has some accumulation point \(\bar{x}\) such that \(F(\bar{x})=0\). From (20), \(\{\|x_{k}-\bar{x}\|\}\) converges, and since \(\bar{x}\) is an accumulation point of \(\{x_{k}\}\), \(\{x_{k}\}\) must converge to \(\bar{x}\).
For Algorithm 2.2, we also have the following global convergence.
Theorem 2.2
Suppose that the conditions in Lemma 2.4 hold. Then the sequence \(\{x_{k}\}\) generated by Algorithm 2.2 globally converges to a solution of \(\operatorname{CES}(F,C)\).
Proof
Following a process similar to the proof for Theorem 2.1, we can get the desired conclusion. This completes the proof. □
3 Convergence rate
By Theorem 2.1 and Theorem 2.2, we know that the sequence \(\{x_{k}\}\) generated by Algorithm 2.1 or Algorithm 2.2 converges to a solution of \(\operatorname{CES}(F,C)\). In what follows, we always assume that \(x_{k}\rightarrow x^{*}\) as \(k\rightarrow\infty\), where \(x^{*}\in C^{*}\). To establish the local convergence rate of the sequence generated by Algorithm 2.1 or Algorithm 2.2, we need the following assumption.
Assumption 3.1
Lemma 3.1
If (A4) and the conditions in Assumption 3.1 hold, then the sequence \(\{\alpha_{k}\}\) generated by the line search (9) is bounded below by a positive constant.
Proof
Lemma 3.2
If (A2), (A3), and the conditions in Assumption 3.1 hold, then the sequence \(\{\alpha_{k}\}\) generated by the line search (12) is bounded below by a positive constant.
Proof
The proof is similar to that of Lemma 3.1, and we omit it for concision. This completes the proof. □
Theorem 3.1
In addition to the assumptions in Theorem 2.1, if conditions (24) and (25) hold, then the sequence \(\{ \operatorname{dist}(x_{k},C^{*})\}\) generated by Algorithm 2.1 converges locally to 0 at a Q-linear rate, and hence the sequence \(\{x_{k}\}\) converges locally to \(x^{*}\) at an R-linear rate.
Proof
Theorem 3.2
In addition to the assumptions in Theorem 2.2, if conditions (24) and (25) hold, then the sequence \(\{ \operatorname{dist}(x_{k},C^{*})\}\) generated by Algorithm 2.2 converges locally to 0 at a Q-linear rate, and hence the sequence \(\{x_{k}\}\) converges locally to \(x^{*}\) at an R-linear rate.
Proof
The proof is similar to that of Theorem 3.1, and we also omit it for concision. This completes the proof. □
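As a reminder of the terminology in Theorems 3.1 and 3.2, the standard definitions can be written as follows. The constants \(q\in(0,1)\), \(c>0\), \(\tau>0\) and the index \(k_{0}\) are notation introduced here for illustration. Q-linear convergence of the distances and R-linear convergence of the iterates mean, respectively,

$$ \operatorname{dist}\bigl(x_{k+1},C^{*}\bigr)\leq q \operatorname{dist}\bigl(x_{k},C^{*}\bigr) \quad \mbox{for all } k\geq k_{0}, \qquad \bigl\Vert x_{k}-x^{*}\bigr\Vert \leq c q^{k} \quad \mbox{for all } k\geq k_{0}. $$

One common way to pass from the first property to the second, sketched here under the additional assumption (not taken from the text above) that \(\|x_{k+1}-x_{k}\|\leq\tau \operatorname{dist}(x_{k},C^{*})\) for all \(k\geq k_{0}\), is to sum the steps along the tail of the sequence:

$$ \bigl\Vert x_{k}-x^{*}\bigr\Vert \leq\sum_{j\geq k}\|x_{j+1}-x_{j}\| \leq\tau\sum_{j\geq k}\operatorname{dist}\bigl(x_{j},C^{*}\bigr) \leq\frac{\tau \operatorname{dist}(x_{k_{0}},C^{*})}{1-q} q^{k-k_{0}}. $$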
4 Numerical results
In this section, we test Algorithm 2.1 and Algorithm 2.2, and compare them with the conjugate gradient method in [15]. We use the following three simple problems to test the efficiency of the three methods.
Problem 1
Problem 2
Problem 3
The problem is adapted from [13]. The mapping \(F(\cdot)\) is taken as \(F(x)=D(x)+Mx\), where \(D(x)\) and Mx are the nonlinear part and the linear part of \(F(x)\), respectively. Here, the components of \(D(x)\) are defined by \(D_{j}(x)=a_{j}\arctan(x_{j})\), where \(a_{j}\) is a random variable in \((0,100)\), and the matrix \(M=A^{\top}A+B\), where A is an \(n\times n\) matrix whose entries are randomly generated in the interval \((-1,1)\), and B is a skew-symmetric matrix generated in the same way. In addition, \({C}={R}_{+}^{n}\).
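The construction of Problem 3 can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the authors' Matlab code: the function name `make_problem3`, the `seed` argument, and the way B is made skew-symmetric (sampling the strict upper triangle in \((-1,1)\) and mirroring it with opposite sign) are assumptions.

```python
import math
import random


def make_problem3(n, seed=0):
    """Return F(x) = D(x) + M x with D_j(x) = a_j * arctan(x_j), M = A^T A + B."""
    rng = random.Random(seed)
    # a_j drawn from [0, 100]; the endpoints have probability zero in practice.
    a = [rng.uniform(0.0, 100.0) for _ in range(n)]
    A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Assumed construction of the skew-symmetric part: B[j][i] = -B[i][j].
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            B[i][j] = rng.uniform(-1.0, 1.0)
            B[j][i] = -B[i][j]
    # M = A^T A + B: a positive semidefinite part plus a skew-symmetric part.
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) + B[i][j]
          for j in range(n)] for i in range(n)]

    def F(x):
        return [a[j] * math.atan(x[j]) + sum(M[j][k] * x[k] for k in range(n))
                for j in range(n)]

    return F
```

Since \(A^{\top}A\) is positive semidefinite, the skew-symmetric part contributes nothing to \(\langle Mx-My,x-y\rangle\), and each \(a_{j}\arctan(\cdot)\) is nondecreasing, a mapping built this way is monotone, and \(x=0\) is a solution lying in \({R}_{+}^{n}\).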
The codes are written in Matlab 7.0 and run on a personal computer with a 2.0 GHz CPU. The parameters used in Algorithm 2.1 and Algorithm 2.2 are set as \(\rho=0.6\), \(r=10^{-3}\), \(\sigma=10^{-4}\), and \(\gamma=1.8\) for Problem 1 and \(\gamma=1\) for Problems 2 and 3. The initial step size in Step 2 of Algorithm 2.1 or Algorithm 2.2 is set to be \(\beta_{k}=1\). We stop the iteration if the iteration number exceeds 1,000 or the inequality \(\|F(x_{k})\|\leq10^{-5}\) is satisfied. The method in [15] (denoted by CGD) is implemented with the following parameters: \(\rho=0.1\), \(r=0.01\), \(\sigma =10^{-4}\), and \(\xi=1\).
Numerical results with different dimensions of Problem 1
Numerical results with different dimensions of Problem 2
The numerical results given in Tables 1-3 show that: (1) the three methods solve all the tested problems successfully; (2) for the two easy Problems 1 and 2, Algorithm 2.2 performs slightly better than Algorithm 2.1 in CPU time, and both methods outperform CGD on all three criteria: Iter., Fn, and CPU; (3) for the difficult Problem 3, Algorithm 2.1 performs best among the three methods, and both Algorithm 2.1 and Algorithm 2.2 perform much better than CGD, especially in CPU time. From the above analysis, we conclude that Algorithm 2.1 and Algorithm 2.2 are better than CGD on these test problems.
5 Conclusions
Two spectral gradient projection methods for solving constrained equations have been developed, which are not only derivative-free, but also completely matrix-free. Consequently, they can be applied to solve large-scale nonsmooth constrained equations. We established the global convergence without the requirement of differentiability of the equations, and presented the linear convergence rate under standard conditions. We also reported some numerical results to show the efficiency of the proposed methods.
Declarations
Acknowledgements
The authors gratefully acknowledge the helpful comments and suggestions of the anonymous reviewers. This work is supported by the National Natural Science Foundation of China (71371139, 11302188), the Shanghai Shuguang Talent Project (13SG24), the Shanghai Pujiang Talent Project (12PJC069), and the Foundation of Teachers Professional Development of Zhejiang Provincial Visiting Scholar in Higher School.
Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
References
- Dirkse, SP, Ferris, MC: MCPLIB: a collection of nonlinear mixed complementarity problems. Optim. Methods Softw. 5, 319-345 (1995)
- Wood, AJ, Wollenberg, BF: Power Generation, Operation, and Control. Wiley, New York (1996)
- Meintjes, K, Morgan, AP: A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 22, 333-361 (1987)
- Qi, LQ, Tong, XJ, Li, DH: An active-set projected trust region algorithm for box constrained nonsmooth equations. J. Optim. Theory Appl. 120, 601-625 (2004)
- Ortega, JM, Rheinboldt, WC: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
- Yu, ZS, Lin, J, Sun, J, Xiao, YH, Liu, LY, Li, ZH: Spectral gradient projection method for monotone nonlinear equations with convex constraints. Appl. Numer. Math. 59, 2416-2423 (2009)
- Liu, SY, Huang, YY, Jiao, HW: Sufficient descent conjugate gradient methods for solving convex constrained nonlinear monotone equations. Abstr. Appl. Anal. 2014, Article ID 305643 (2014)
- Sun, M, Liu, J: Three derivative-free projection methods for large-scale nonlinear equations with convex constraints. J. Appl. Math. Comput. (2014). doi:10.1007/s12190-014-0774-5
- Barzilai, J, Borwein, JM: Two point stepsize gradient methods. IMA J. Numer. Anal. 8, 141-148 (1988)
- Birgin, EG, Martinez, JM, Raydan, M: Spectral projected gradient methods: review and perspectives. J. Stat. Softw. 60, 1-21 (2014)
- Fletcher, R, Reeves, C: Function minimization by conjugate gradients. Comput. J. 7, 149-154 (1964)
- Dai, YH, Liao, LZ: R-Linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1-10 (2002)
- Wang, CW, Wang, YJ, Xu, CL: A projection method for a system of nonlinear monotone equations with convex constraints. Math. Methods Oper. Res. 66, 33-46 (2007)
- Zheng, L: A new projection algorithm for solving a system of nonlinear equations with convex constraints. Bull. Korean Math. Soc. 50, 823-832 (2013)
- Xiao, YH, Zhu, H: A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 405, 310-319 (2013)