 Research
 Open Access
An efficient modification of the Hestenes–Stiefel nonlinear conjugate gradient method with restart property
 Zabidin Salleh^{1} and
 Ahmad Alhawarat^{1}
https://doi.org/10.1186/s13660-016-1049-5
© Salleh and Alhawarat 2016
 Received: 8 November 2015
 Accepted: 18 March 2016
 Published: 6 April 2016
Abstract
The conjugate gradient (CG) method is one of the most popular methods for solving nonlinear unconstrained optimization problems. The Hestenes–Stiefel (HS) CG formula is considered one of the most efficient formulas, and the HS coefficient satisfies the conjugacy condition regardless of the line search method used. However, the HS parameter may not satisfy the global convergence properties of the CG method with the Wolfe–Powell line search if the descent condition is not satisfied. In this paper, we use the original HS CG formula with a mild condition to construct a CG method with a restart that employs the negative gradient. The convergence and descent properties are established for the strong Wolfe–Powell (SWP) and weak Wolfe–Powell (WWP) line searches. Under this condition, we guarantee that the HS formula is nonnegative, its value is bounded, and the number of restarts is not too high. Numerical computations with the SWP line search on a set of standard optimization problems demonstrate the robustness and efficiency of the new version of the CG parameter in comparison with the latest and classical CG formulas. An example illustrates the benefit of using different initial points to obtain different solutions of multimodal optimization functions.
Keywords
 conjugate gradient method
 Wolfe–Powell line search
 Hestenes–Stiefel formula
 restart condition
 performance profile
1 Introduction
2 Motivation and the new modification
The following algorithm describes the steps of the CG method with (17) and the SWP line search for solving the optimization problems.
Algorithm 1
 Step 0.:

Initialization. Given \(x_{1}\), set \(k = 1\).
 Step 1.:

If \(\Vert g_{k} \Vert \le \varepsilon\), then stop, where \(0 < \varepsilon \ll 1\).
 Step 2.:

Compute \(\beta_{k}\) based on (17).
 Step 3.:

Compute the search direction \(d_{k}\) based on (3).
 Step 4.:

Compute the step length \(\alpha_{k}\) using the SWP line search (9) and (11).
 Step 5.:

Update the new point based on (2).
 Step 6.:

Convergence test and stopping criterion: if \(\Vert g_{k} \Vert \le \varepsilon\), then stop; otherwise, set \(k = k + 1\) and go to Step 2.
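The steps above can be sketched in code. Since Eq. (17) for \(\beta_{k}^{\mathrm{ZA}}\) is not reproduced here, the sketch substitutes the plain Hestenes–Stiefel coefficient truncated at zero, restarting with the negative gradient whenever the direction fails a descent test; Armijo backtracking likewise stands in for the Wolfe–Powell line search. All names and tolerances are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of Algorithm 1.  The paper's beta from Eq. (17) is
# replaced here by the plain Hestenes-Stiefel coefficient truncated at
# zero, and Armijo backtracking stands in for the Wolfe-Powell search;
# both substitutions are assumptions made for illustration only.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def backtracking(f, g, d, x, c1=1e-4):
    # Armijo backtracking: halve alpha until a sufficient decrease holds.
    alpha, fx, slope = 1.0, f(x), dot(g, d)
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c1 * alpha * slope:
        alpha *= 0.5
        if alpha < 1e-20:          # safeguard against an endless loop
            break
    return alpha

def cg_with_restart(f, grad, x1, eps=1e-6, max_iter=10000):
    x = list(x1)
    g = grad(x)
    d = [-gi for gi in g]          # Step 0: the first direction is -g
    for _ in range(max_iter):
        if norm(g) <= eps:         # Steps 1 and 6: stopping test
            break
        if dot(g, d) >= 0:         # restart with -g if d is not descent
            d = [-gi for gi in g]
        alpha = backtracking(f, g, d, x)           # Step 4 (stand-in)
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        denom = dot(d, y)
        # Steps 2-3: Hestenes-Stiefel coefficient, kept nonnegative
        beta = dot(g_new, y) / denom if abs(denom) > 1e-30 else 0.0
        beta = max(beta, 0.0)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        x, g = x_new, g_new        # Step 5: move to the new point
    return x

# Example: minimize a simple convex quadratic starting from the origin.
sol = cg_with_restart(lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2,
                      lambda x: [2 * (x[0] - 1), 20 * (x[1] + 2)],
                      [0.0, 0.0])
print([round(v, 4) for v in sol])  # close to the minimizer [1, -2]
```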
3 Global convergence properties for the \(\beta_{k}^{\mathrm{ZA}}\) method
Because we are interested in finding stationary points of nonlinear objective functions that are bounded below and have Lipschitz continuous gradients, the following standard assumption is needed.
Assumption 1
 I.The level set \(\Psi = \{ x \mid f(x) \le f(x_{1})\}\) is bounded, i.e., there is a positive constant M such that$$\Vert x \Vert \le M, \quad \forall x \in \Psi. $$
 II.In some neighborhood N of Ψ, f is continuously differentiable, and its gradient is Lipschitz continuous, i.e., for all \(x,y \in N\), there is a constant \(L > 0\) such that$$\bigl\Vert g(x) - g(y) \bigr\Vert \le L\Vert x - y \Vert . $$
Lemma 3.1
The following two theorems demonstrate that (17) satisfies the descent condition with SWP and WWP line searches.
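For readability, we recall that (13) denotes the sufficient descent condition in its standard form (restated here; the numbering follows the source):

```latex
% Sufficient descent condition (13): there exists c in (0, 1) such that
g_{k}^{T} d_{k} \le - c \Vert g_{k} \Vert^{2} \quad \text{for all } k \ge 1.
```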
Theorem 3.1
Let the sequences \(\{ g_{k} \}\) and \(\{ d_{k} \}\) be generated by methods (2), (3) and (17) with step length \(\alpha_{k}\), which is computed using the SWP line searches (9) and (11) with \(\sigma < \frac{1}{3}\); then the sufficient descent condition (13) holds for some \(c \in (0, 1)\).
Proof
Theorem 3.2
Assume the sequences \(\{ g_{k} \}\) and \(\{ d_{k} \}\) are generated using the methods (2), (3) and (17) with step length \(\alpha_{k}\), which is computed via the WWP line search given by (9) and (10); then the descent condition (13) holds.
Proof
Gilbert and Nocedal [7] presented a useful property for proving the global convergence of methods related to the PRP (HS) formula. The property is as follows.
Property ∗
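For readability, we recall the statement of Property ∗ as given in [7]: assuming \(0 < \gamma \le \Vert g_{k} \Vert \le \bar{\gamma}\) for all k, a method of the form (2)-(3) has Property ∗ if there exist constants \(b > 1\) and \(\lambda > 0\) such that

```latex
% Property (*) of Gilbert and Nocedal [7], with s_{k-1} = x_k - x_{k-1}:
\lvert \beta_{k} \rvert \le b \quad \text{for all } k,
\qquad \text{and} \qquad
\Vert s_{k-1} \Vert \le \lambda \;\Longrightarrow\; \lvert \beta_{k} \rvert \le \frac{1}{2b}.
```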
Lemma 3.2
Consider the CG method as defined in (2), (3), and (17) and the step length computed using the WWP line search. If the equation in (22) and Assumption 1 hold, then \(\beta_{k}^{\mathrm{ZA}}\) satisfies Property ∗.
Proof
The proofs of the forthcoming lemmas and of Theorem 3.3 can originally be found in [7]. However, we present them here for readability. The following lemma shows that, if the CG formula satisfies Property ∗, then small steps cannot occur too often.
Lemma 3.3
Lemma 3.4
Theorem 3.3
Suppose that Assumption 1 holds. Assume that the sequences \(\{ g_{k} \}\) and \(\{ d_{k} \}\) are generated by Algorithm 1, where \(\alpha_{k}\) is computed using the WWP line search, and that the sufficient descent condition (13) holds. In addition, suppose that Property ∗ holds. Then we have \(\liminf_{k \to \infty} \Vert g_{k} \Vert = 0\).
Proof
Using Lemmas 3.2, 3.3, and 3.4 and Theorem 3.3, the global convergence of Algorithm 1 with the Wolfe–Powell line search is established similarly to Theorem 4.3 in [7]. Therefore, we present the following theorem without proof.
4 Numerical results and discussion
Table 1  A list of test problem functions

| No. | Function | Dimension/s | Initial points |
| --- | --- | --- | --- |
| 1 | Extended White & Holst function | 2, 500, 1,000, 5,000, 10,000 | (−1.2,1,−1.2,1,…,−1.2,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 2 | Extended Rosenbrock function | 2, 500, 1,000, 5,000, 10,000 | (−1.2,1,−1.2,1,…,−1.2,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 3 | Six hump function | 2 | (1,1), (5,5), (10,10), (15,15) |
| 4 | Extended Beale function | 2, 500, 1,000, 5,000, 10,000 | (−1,−1,…,−1), (0.5,0.5,…,0.5), (10,10,…,10), (1,1,…,1) |
| 5 | Three hump function | 2 | (1,1), (5,5), (10,10), (15,15) |
| 6 | Extended Himmelblau function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 7 | Diagonal 2 function | 2, 500, 1,000, 5,000, 10,000 | (0.2,0.2,…,0.2), (0.25,0.25,…,0.25), (0.5,0.5,…,0.5), (1,1,…,1) |
| 8 | NONSCOMP function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (−1,−1,…,−1), (−2,−2,…,−2), (−5,−5,…,−5) |
| 9 | Extended DENSCHNB function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 10 | Shallow function | 2, 500, 1,000, 5,000, 10,000 | (−2,−2,…,−2), (2,2,…,2), (5,5,…,5), (10,10,…,10) |
| 11 | Booth function | 2 | (1,1), (5,5), (10,10), (15,15) |
| 12 | Extended quadratic penalty function QP2 | 2, 500, 1,000, 5,000, 10,000 | (2,2,…,2), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 13 | DIXMAANA function | 300, 6,000, 9,000, 12,000 | (1,1,…,1), (2,2,…,2), (3,3,…,3), (5,5,…,5) |
| 14 | DIXMAANB function | 300, 6,000, 9,000, 12,000 | (1,1,…,1), (2,2,…,2), (3,3,…,3), (5,5,…,5) |
| 15 | NONDIA function | 2, 500, 1,000, 5,000, 10,000 | (−2,−2,…,−2), (−1,−1,…,−1), (0,0,…,0), (1,1,…,1) |
| 16 | Extended tridiagonal 1 function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 17 | DQDRTIC function | 2, 500, 1,000, 5,000, 10,000 | (−1,−1,…,−1), (1,1,…,1), (2,2,…,2), (3,3,…,3) |
| 18 | Diagonal 4 function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 19 | Extended Cliff function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 20 | Shallow function | 2, 500, 1,000, 5,000, 10,000 | (−2,−2,…,−2), (2,2,…,2), (5,5,…,5), (10,10,…,10) |
| 21 | NONDIA (Shanno78) | 2, 500, 1,000, 5,000, 10,000 | (−2,−2,…,−2), (−1,−1,…,−1), (0,0,…,0), (1,1,…,1) |
| 22 | Raydan 2 function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 23 | Extended block diagonal BD1 function | 2, 500, 1,000, 5,000, 10,000 | (0.1,0.1,…,0.1), (0.2,0.2,…,0.2), (0.3,0.3,…,0.3), (0.4,0.4,…,0.4) |
| 24 | Generalized tridiagonal 1 function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 25 | Diagonal 4 function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 26 | Extended Powell function | 2, 500, 1,000, 5,000, 10,000 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
| 27 | Perturbed quadratic function | 2, 500, 1,000, 5,000, 10,000 | (0.5,0.5,…,0.5), (2,2,…,2), (1,1,…,1), (10,10,…,10) |
| 28 | A quadratic function QF2 | 2, 500, 1,000, 5,000, 10,000 | (0.5,0.5,…,0.5), (1,1,…,1), (5,5,…,5), (10,10,…,10) |
| 29 | Sum squares function | 2, 500, 1,000, 5,000, 10,000 | (−5,−5,…,−5), (−1,−1,…,−1), (1,1,…,1), (5,5,…,5) |
| 30 | Zettl function | 2 | (1,1,…,1), (5,5,…,5), (10,10,…,10), (15,15,…,15) |
The initial point \(x_{1} \in \mathbb{R}^{n}\) is arbitrary. As shown in Table 1, we used different initial points based on the original standard points. We observe from the numerical results that different initial points often led to different stationary points for the multimodal functions. In addition, the efficiency of the algorithm depended on the initial point for every function. For example, the efficiency of the FR algorithm on the extended Rosenbrock function with the initial point \((-1.2, 1, -1.2, 1, \ldots, -1.2, 1)\) differs from that with \((5, 5, \ldots, 5)\) or \((10, 10, \ldots, 10)\) as the initial point. Moreover, as Powell [20] observed, the initial point affects the value of the CG coefficient; for example, the PRP or HS parameter may fail to obtain the solution if its value is negative, whereas with another initial point the value of PRP is nonnegative and satisfies the descent property. This result motivated us to further study the initial points. Moreover, different dimensions were used for every function, and the dimension range was \([2, 10\text{,}000]\).
Initial points and the corresponding optimal points for the Himmelblau function

| Initial point | Optimal solution | Function value |
| --- | --- | --- |
| (1,1) | (3,2) | f(3,2)=0 |
| (−1,−1) | (3.5844,−1.8481) | f(3.5844,−1.8481)=0 |
| (10,10) | (−3.7793,−3.2832) | f(−3.7793,−3.2832)=0 |
| (−5,−5) | (−2.8051,3.1313) | f(−2.8051,3.1313)=0 |
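The initial-point dependence shown in the table can be reproduced with a few lines of code. The sketch below uses plain gradient descent with Armijo backtracking on the Himmelblau function, an illustrative stand-in for the paper's CG algorithm (all names and tolerances are assumptions); each starting point is driven to one of the four global minima, all with function value 0, though a different method may land in a different basin.

```python
# Himmelblau's function has four global minima, all with value 0; which
# minimum a descent method reaches depends on the initial point.  Plain
# gradient descent with Armijo backtracking (a simplified stand-in for
# the paper's CG method) demonstrates the effect.
def himmelblau(x, y):
    return (x * x + y - 11) ** 2 + (x + y * y - 7) ** 2

def himmelblau_grad(x, y):
    return (4 * x * (x * x + y - 11) + 2 * (x + y * y - 7),
            2 * (x * x + y - 11) + 4 * y * (x + y * y - 7))

def minimize(x, y, eps=1e-8, max_iter=5000):
    for _ in range(max_iter):
        gx, gy = himmelblau_grad(x, y)
        if gx * gx + gy * gy <= eps:     # stop when the gradient is tiny
            break
        alpha, fxy = 1.0, himmelblau(x, y)
        # Armijo backtracking along the steepest descent direction
        while himmelblau(x - alpha * gx, y - alpha * gy) > \
                fxy - 1e-4 * alpha * (gx * gx + gy * gy):
            alpha *= 0.5
            if alpha < 1e-20:            # safeguard against an endless loop
                break
        x, y = x - alpha * gx, y - alpha * gy
    return x, y

for start in [(1, 1), (-1, -1), (10, 10), (-5, -5)]:
    sol = minimize(*start)
    print(start, "->", tuple(round(v, 4) for v in sol))
```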
The performance results are shown in Figures 2 and 3 using the performance profile introduced by Dolan and Moré [21].
Based on the left side of Figures 2 and 3, the ZA curve clearly lies above the others. As previously mentioned, PRP^{∗} appears to outperform HPRP because the latter restarts too often with the negative gradient. Furthermore, WYL performs better than NPRP and DPRP. Although the PRP and HS methods are efficient, both have theoretical shortcomings; thus, the percentage of functions solved with the PRP formula does not exceed 75%. The HS+ formula also has a theoretical problem when the directional derivative is positive; hence, it may not satisfy the descent property with the SWP line search, and the percentage of functions solved with the HS+ formula is approximately 90%. The FR formula satisfies the descent and convergence properties, but we terminated the program several times because it cycled without reaching the solution. For all algorithms, the time limit to obtain a solution was 500 seconds.
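The performance profile of Dolan and Moré [21] is straightforward to compute: for solver s on problem p with cost \(t_{p,s}\), the performance ratio is \(r_{p,s} = t_{p,s} / \min_{s} t_{p,s}\), and \(\rho_{s}(\tau)\) is the fraction of problems with \(r_{p,s} \le \tau\). The sketch below uses synthetic timings (an assumption for illustration; the paper's data underlie Figures 2 and 3):

```python
# Minimal sketch of the Dolan-More performance profile [21]: for each
# solver s and problem p with cost times[p][s], the performance ratio is
# times[p][s] divided by the best cost over all solvers on problem p,
# and rho_s(tau) is the fraction of problems solved within a factor tau
# of the best solver.  Failures are recorded as float("inf").
def perf_profile(times, tau):
    """times[p][s]: cost of solver s on problem p; returns rho_s(tau)."""
    n_prob, n_solv = len(times), len(times[0])
    rho = []
    for s in range(n_solv):
        solved = 0
        for p in range(n_prob):
            best = min(times[p])
            ratio = times[p][s] / best if times[p][s] != float("inf") else float("inf")
            if ratio <= tau:
                solved += 1
        rho.append(solved / n_prob)
    return rho

# Synthetic timings (seconds) for three solvers on four problems;
# inf marks a failure (e.g. the 500-second limit was exceeded).
times = [
    [1.0, 2.0, 4.0],
    [3.0, 1.5, float("inf")],
    [2.0, 2.0, 2.0],
    [0.5, 4.0, 1.0],
]
print(perf_profile(times, 1.0))  # -> [0.75, 0.5, 0.25]: wins per solver
print(perf_profile(times, 4.0))  # -> [1.0, 0.75, 0.75]
```

Plotting \(\rho_{s}(\tau)\) against \(\tau\) yields curves like those in Figures 2 and 3: the higher a solver's curve, the more robust and efficient it is.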
5 Conclusion
In this paper, we used the HS CG formula with a restart property. The global convergence and descent properties were established for the WWP and SWP line searches. The numerical results demonstrate that the new modification is more robust and efficient than the other CG parameters tested.
Declarations
Acknowledgements
The authors are grateful to the editor and the anonymous reviewers for their valuable comments and suggestions, which have substantially improved this paper. In addition, we acknowledge the Ministry of Higher Education Malaysia and Universiti Malaysia Terengganu; this study was partially supported under the Fundamental Research Grant Scheme (FRGS) Vote no. 59347.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Hestenes, MR, Stiefel, E: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)
 Fletcher, R, Reeves, CM: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)
 Polak, E, Ribiere, G: Note sur la convergence de méthodes de directions conjuguées. ESAIM: Math. Model. Numer. Anal. 3(R1), 35–43 (1969)
 Wolfe, P: Convergence conditions for ascent methods. SIAM Rev. 11, 226–235 (1969)
 Wolfe, P: Convergence conditions for ascent methods. II: some corrections. SIAM Rev. 13, 185–188 (1971)
 Dai, YH, Liao, LZ: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 43(1), 87–101 (2001)
 Gilbert, JC, Nocedal, J: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2(1), 21–42 (1992)
 Zoutendijk, G: Nonlinear programming, computational methods. In: Integer and Nonlinear Programming, pp. 37–86. North-Holland, Amsterdam (1970)
 Moré, JJ, Thuente, DJ: On line search algorithms with guaranteed sufficient decrease. Mathematics and Computer Science Division Preprint MCS-P153-0590, Argonne National Laboratory, Argonne, IL (1990)
 Touati-Ahmed, D, Storey, C: Efficient hybrid conjugate gradient techniques. J. Optim. Theory Appl. 64, 379–397 (1990)
 Wei, Z, Yao, S, Liu, L: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183(2), 1341–1350 (2006)
 Zhang, L: An improved Wei-Yao-Liu nonlinear conjugate gradient method for optimization computation. Appl. Math. Comput. 215(6), 2269–2274 (2009)
 Dai, Z, Wen, F: Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property. Appl. Math. Comput. 218(14), 7421–7430 (2012)
 Sun, W, Yuan, YX: Optimization Theory and Methods: Nonlinear Programming, vol. 1. Springer, Berlin (2006)
 Alhawarat, A, Mamat, M, Rivaie, M, Salleh, Z: An efficient hybrid conjugate gradient method with the strong Wolfe-Powell line search. Math. Probl. Eng. 2015, Article ID 103517 (2015)
 Yuan, G, Meng, Z, Li, Y: A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168(1), 129–152 (2016)
 Alhawarat, A, Mamat, M, Rivaie, M, Mohd, I: A new modification of nonlinear conjugate gradient coefficients with global convergence properties. Int. J. Math. Comput. Stat. Nat. Phys. Eng. 8(1), 54–60 (2014)
 Bongartz, I, Conn, AR, Gould, N, Toint, PL: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. 21(1), 123–160 (1995)
 Andrei, N: An unconstrained optimization test functions collection. Adv. Model. Optim. 10(1), 147–161 (2008)
 Powell, MJD: Nonconvex Minimization Calculations and the Conjugate Gradient Method. Springer, Berlin (1984)
 Dolan, ED, Moré, JJ: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)