Global convergence of a modified conjugate gradient method
 Xuesha Wu
https://doi.org/10.1186/1029-242X-2014-248
© Wu; licensee Springer. 2014
Received: 19 March 2014
Accepted: 4 June 2014
Published: 18 July 2014
Abstract
A modified conjugate gradient method for solving unconstrained optimization problems is proposed. The method satisfies the sufficient descent condition under the strong Wolfe line search, and its global convergence is established by a simple argument. Numerical results show that the proposed method is promising for the given test problems.
MSC: 90C26, 65H10.
1 Introduction
The nonlinear conjugate gradient method is one of the most effective methods for solving unconstrained optimization problems. It comprises a class of unconstrained optimization algorithms characterized by low memory requirements and strong local and global convergence properties. In this paper, a modified nonlinear conjugate gradient method is proposed and analyzed.
Consider the unconstrained optimization problem
$\underset{x\in {R}^{n}}{min}f(x),$
where $f:{R}^{n}\to R$ is a smooth function whose gradient is denoted by g.
The update parameters of the three methods discussed here are
${\beta }_{k}^{\mathrm{PRP}}=\frac{{g}_{k}^{T}({g}_{k}-{g}_{k-1})}{{\parallel {g}_{k-1}\parallel }^{2}},\phantom{\rule{1em}{0ex}}{\beta }_{k}^{\mathrm{LS}}=\frac{{g}_{k}^{T}({g}_{k}-{g}_{k-1})}{-{d}_{k-1}^{T}{g}_{k-1}},\phantom{\rule{1em}{0ex}}{\beta }_{k}^{\mathrm{HZ}}=\frac{{({y}_{k-1}-2{d}_{k-1}\frac{{\parallel {y}_{k-1}\parallel }^{2}}{{d}_{k-1}^{T}{y}_{k-1}})}^{T}{g}_{k}}{{d}_{k-1}^{T}{y}_{k-1}},$
where ${y}_{k-1}={g}_{k}-{g}_{k-1}$ and $\parallel \cdot \parallel $ is the Euclidean norm. The corresponding methods are generally called the PRP, LS, and HZ conjugate gradient methods. If f is a strictly convex quadratic function, these methods are equivalent when an exact line search is used. If f is nonconvex, their behaviors may differ.
When the objective function is convex, Polak and Ribière [1] proved that the PRP method is globally convergent under the exact line search, but Powell [5] showed that the PRP method does not converge globally for some nonconvex functions. Nevertheless, in the past few years the PRP method has generally been regarded as the most efficient conjugate gradient method in practical computation. One remarkable property of the PRP method is that it essentially performs a restart if a bad direction occurs (see [6]). However, Powell [5] constructed an example showing that the PRP method can cycle infinitely without approaching any stationary point even if an exact line search is used. This counterexample also indicates that the PRP method may not be globally convergent when the objective function is nonconvex. Recently, Zhang et al. [7] proposed a descent modified PRP conjugate gradient method and proved its global convergence. The LS method has properties similar to those of the PRP method. The global convergence of the LS method with the Grippo-Lucidi line search has been proved in [8], and some researchers have studied the LS method further (see Liu [9], Liu and Du [10]). In addition, Hager and Zhang [4] gave another effective method, namely the CG_DESCENT method. It not only has stable convergence, but it also shows effective numerical results. In this method, the parameter ${\beta }_{k}$ is computed by ${\beta }_{k}=max\{{\beta }_{k}^{\mathrm{HZ}},{\eta }_{k}\}$, where ${\eta }_{k}=\frac{-1}{\parallel {d}_{k-1}\parallel min\{\eta ,\parallel {g}_{k-1}\parallel \}}$, $\eta >0$.
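As a quick numerical sketch, the classical PRP, LS, and HZ parameters mentioned above can be computed as follows (a minimal illustration using the standard textbook definitions with ${y}_{k-1}={g}_{k}-{g}_{k-1}$; the function names are illustrative, not from the paper):

```python
import numpy as np

def beta_prp(g_new, g_old):
    """Polak-Ribiere-Polyak: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = g_new - g_old
    return float(g_new @ y) / float(g_old @ g_old)

def beta_ls(g_new, g_old, d_old):
    """Liu-Storey: g_k^T (g_k - g_{k-1}) / (-d_{k-1}^T g_{k-1})."""
    y = g_new - g_old
    return float(g_new @ y) / float(-(d_old @ g_old))

def beta_hz(g_new, g_old, d_old):
    """Hager-Zhang parameter (the beta underlying CG_DESCENT)."""
    y = g_new - g_old
    dy = float(d_old @ y)                      # d_{k-1}^T y_{k-1}
    return float((y - 2.0 * d_old * float(y @ y) / dy) @ g_new) / dy
```

For a strictly convex quadratic with exact line searches all three coincide, which is easy to spot-check numerically.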
In the next section, a modified conjugate gradient method is proposed. In Section 3, we prove the global convergence of the proposed method for nonconvex functions in the case of the strong Wolfe line search. In Section 4, we report some numerical results.
2 The new algorithm
where ${t}_{k}=\frac{\parallel {g}_{k}\parallel }{\parallel {g}_{k-1}\parallel }$, $\lambda \in (0,1)$ and $\lambda >2\sigma $. Obviously, the denominator of (2.1) is a convex combination of $-{d}_{k-1}^{T}{g}_{k-1}$ and $max\{0,{g}_{k}^{T}{d}_{k-1}\}$, which keeps the denominator of ${\beta }_{k}^{\mathrm{LS}}$ from tending to zero. We now formally state the corresponding algorithm for unconstrained optimization problems.
Algorithm 2.1

Step 0: Given an initial point ${x}_{1}\in {R}^{n}$, $\epsilon \ge 0$, $\lambda =0.8$. Set $k=1$.

Step 1: If $\parallel {g}_{1}\parallel \le \epsilon $, then stop.

Step 2: Compute ${\alpha }_{k}$ by the strong Wolfe line search ($0<\delta <\sigma <\frac{1}{2}$):
$f({x}_{k}+{\alpha }_{k}{d}_{k})\le f({x}_{k})+\delta {\alpha }_{k}{g}_{k}^{T}{d}_{k},$ (2.2)
$|g{({x}_{k}+{\alpha }_{k}{d}_{k})}^{T}{d}_{k}|\le -\sigma {g}_{k}^{T}{d}_{k}.$ (2.3)

Step 3: Let ${x}_{k+1}={x}_{k}+{\alpha }_{k}{d}_{k}$ and ${g}_{k+1}=g({x}_{k+1})$; if $\parallel {g}_{k+1}\parallel \le \epsilon $, then stop.

Step 4: Compute ${\beta}_{k+1}$ by (2.1), and generate ${d}_{k+1}$ by (1.3).

Step 5: Set $k=k+1$, and go to Step 2.
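The loop structure of Steps 0-5 can be sketched as below. Two loud caveats: formula (2.1) itself is not reproduced in this text, so the classical LS parameter is used here purely as a placeholder, and the strong Wolfe search (2.2)-(2.3) is replaced by a plain Armijo backtracking stand-in with a steepest-descent restart safeguard. This is a sketch under those assumptions, not the paper's method.

```python
import numpy as np

def cg_sketch(f, grad, x0, eps=1e-6, max_iter=500):
    """Skeleton of a conjugate gradient loop in the shape of Algorithm 2.1.
    beta is the classical LS formula as a placeholder for (2.1); the line
    search is simplified Armijo backtracking, not the strong Wolfe search."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # initial steepest-descent direction
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:         # Steps 1/3: stopping test
            break
        # Step 2 (simplified): backtrack until the Armijo condition (2.2) holds
        alpha, delta = 1.0, 0.01
        while f(x + alpha * d) > f(x) + delta * alpha * float(g @ d):
            alpha *= 0.5
        x_new = x + alpha * d                # Step 3: move to the new iterate
        g_new = grad(x_new)
        # Step 4: placeholder LS-type beta; (2.1) instead uses a convex
        # combination in the denominator to keep it away from zero.
        denom = float(-(d @ g))
        beta = float(g_new @ (g_new - g)) / denom if denom > 1e-16 else 0.0
        d = -g_new + beta * d                # direction update as in (1.3)
        if float(g_new @ d) >= 0.0:          # safeguard: restart if not descent
            d = -g_new
        x, g = x_new, g_new                  # Step 5: advance the counter
    return x
```

On a simple quadratic such as $f(x)={\parallel x\parallel }^{2}$ the sketch converges in a couple of iterations; the safeguard is not part of Algorithm 2.1, whose search direction is a descent direction by construction under the strong Wolfe search.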
In many convergence analyses, the sufficient descent condition ${g}_{k}^{T}{d}_{k}\le -c{\parallel {g}_{k}\parallel }^{2}$ (for some constant $c>0$) is assumed to hold, because it plays an important role in proving the global convergence of conjugate gradient methods. Fortunately, in this paper the search direction ${d}_{k}$ satisfies the sufficient descent condition under the strong Wolfe line search without any additional assumption.
From the above inequality, the conclusion (2.5) holds for $k+1$. Thus, the conclusion (2.5) holds for $k\in {N}^{+}$. □
3 Global convergence of Algorithm 2.1
In order to prove the global convergence of Algorithm 2.1, the following assumptions for the objective function are often used.
 (i)
The level set $\mathrm{\Omega}=\{x\mid f(x)\le f({x}_{1})\}$ is bounded, where ${x}_{1}$ is the starting point.
 (ii) In some neighborhood V of Ω, the objective function f is continuously differentiable, and its gradient is Lipschitz continuous; i.e., there exists a constant $L>0$ such that
$\parallel g(x)-g(y)\parallel \le L\parallel x-y\parallel ,\phantom{\rule{1em}{0ex}}\text{for all }x,y\in V.$ (3.1)
The conclusion of the following lemma, often called the Zoutendijk condition, is usually used to prove the global convergence properties of conjugate gradient methods. It was originally established by Zoutendijk [15].
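For completeness, the Zoutendijk condition referred to here is the standard summability statement (with ${g}_{k}$ and ${d}_{k}$ as above):

```latex
\sum_{k\ge 1}\frac{(g_k^{T} d_k)^{2}}{\|d_k\|^{2}} < \infty .
```

Combined with a sufficient descent condition and a suitable bound on the growth of $\parallel {d}_{k}\parallel $, it is the key estimate from which $\underset{k\to \mathrm{\infty }}{lim inf}\parallel {g}_{k}\parallel =0$ is derived.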
Proof This lemma can be proved in a similar way as in [16], so we omit it. □
 (1)
there exists a constant $b>1$ such that $|{\beta }_{k}^{\mathrm{VLS}}|\le b$;
 (2)
there exists a constant $\tau >0$ such that $\parallel {x}_{k}-{x}_{k-1}\parallel \le \tau \phantom{\rule{0.25em}{0ex}}\Rightarrow \phantom{\rule{0.25em}{0ex}}|{\beta }_{k}^{\mathrm{VLS}}|\le \frac{1}{2b}$.
□
where ${\mathrm{\Re }}_{k,\mathrm{\Delta }}^{\tau }\triangleq \{i\in {Z}^{+}:k\le i\le k+\mathrm{\Delta }-1,\parallel {x}_{i}-{x}_{i-1}\parallel \ge \tau \}$, and $|{\mathrm{\Re }}_{k,\mathrm{\Delta }}^{\tau }|$ denotes the number of elements of ${\mathrm{\Re }}_{k,\mathrm{\Delta }}^{\tau }$.
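As a concrete reading of this definition, $|{\mathrm{\Re }}_{k,\mathrm{\Delta }}^{\tau }|$ simply counts the indices i in the window $k\le i\le k+\mathrm{\Delta }-1$ whose step $\parallel {x}_{i}-{x}_{i-1}\parallel $ reaches the threshold τ. A minimal sketch (the function name and the list-of-iterates interface are illustrative assumptions):

```python
import numpy as np

def count_large_steps(xs, k, delta, tau):
    """|R_{k,delta}^tau|: number of indices i with k <= i <= k + delta - 1
    and ||x_i - x_{i-1}|| >= tau, for a sequence of iterates xs."""
    return sum(
        1
        for i in range(k, k + delta)
        if np.linalg.norm(xs[i] - xs[i - 1]) >= tau
    )
```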
Proof This lemma plays an important role in proving the global convergence of the PRP, HS, and LS conjugate gradient methods, among others. It was originally proved in [17]. From Remark 2.1 and Lemma 3.3, it is easy to see that Algorithm 2.1 satisfies the conclusion of Lemma 3.4. □
where ${s}_{i-1}={x}_{i}-{x}_{i-1}$.
Let τ come from Lemma 3.4, and define $\mathrm{\Delta }=\lceil 8\xi /\tau \rceil $, i.e., the positive integer satisfying $8\xi /\tau \le \mathrm{\Delta }<(8\xi /\tau )+1$.
Thus we complete the proof of Theorem 3.1. □
4 Numerical results
In this section, we compare the performance of Algorithm 2.1 with that of the PRP+ method [18] and the CG_DESCENT method [4], in terms of the number of function evaluations and the CPU time in seconds, under the strong Wolfe line search. The test problems are some large-scale unconstrained optimization problems from [19, 20]. The parameters in the line search are chosen as follows: $\delta =0.01$, $\sigma =0.1$. The program terminates when ${\parallel {g}_{k}\parallel }_{\mathrm{\infty }}\le {10}^{-6}$ is satisfied. All codes were written in Fortran 6.0 and run on a PC with a 2.0 GHz processor, 512 MB of memory, and the Windows XP operating system.
The numerical results of Algorithm 2.1, the PRP+ method, and the CG_DESCENT method (each entry reports the number of function evaluations∖CPU time in seconds)
Problems  Dim  Algorithm 2.1  PRP+ method  CG_DESCENT method 

Extended Freudenstein & Roth  5,000  58∖0.07  1,144∖0.09  13,235∖1.24 
10,000  12∖0.01  426∖0.07  23∖0.01  
Extended Trigonometric  5,000  30∖0.07  61∖0.08  79∖0.10 
10,000  33∖0.19  112∖0.28  161∖0.40  
Extended Rosenbrock  5,000  41∖0.02  67∖0.02  60∖0.02 
10,000  34∖0.03  62∖0.03  57∖0.01  
Extended White & Holst  5,000  40∖0.02  53∖0.02  45∖0.02 
10,000  38∖0.01  43∖0.01  52∖0.02  
Extended Beale  5,000  15∖0.01  26∖0.02  24∖0.00 
10,000  15∖0.01  26∖0.00  24∖0.01  
Extended Penalty  5,000  11∖0.01  51∖0.00  1,979∖0.17 
10,000  18∖0.02  36∖0.02  50∖0.02  
Perturbed Quadratic  5,000  705∖0.35  1,462∖0.42  1,471∖0.36 
10,000  1,353∖0.88  2,059∖1.19  2,014∖0.95  
Raydan 2  5,000  9∖0.00  9∖0.00  9∖0.00 
10,000  9∖0.02  9∖0.02  9∖0.02  
Diagonal 2  5,000  432∖0.44  987∖0.67  699∖0.45 
10,000  595∖1.28  1,117∖1.52  1,209∖1.60  
Generalized Tridiagonal 1  5,000  42∖0.02  2,013∖0.27  53∖0.01 
10,000  71∖0.14  578∖0.15  707∖0.19  
Extended Tridiagonal 1  5,000  12∖0.00  23∖0.00  28∖0.00 
10,000  12∖0.01  28∖0.02  29∖0.01  
Extended Three Expo Terms  5,000  8∖0.01  21∖0.01  15∖0.02 
10,000  12∖0.04  19∖0.05  15∖0.03  
Generalized Tridiagonal 2  5,000  50∖0.03  94∖0.03  95∖0.03 
10,000  62∖0.05  97∖0.06  77∖0.03  
Diagonal 4  5,000  8∖0.00  8∖0.00  8∖0.00 
10,000  8∖0.00  8∖0.00  8∖0.00  
Diagonal 5  5,000  9∖0.01  9∖0.02  9∖0.02 
10,000  9∖0.03  9∖0.03  9∖0.03  
Extended Himmelblau  5,000  18∖0.00  35∖0.00  16∖0.00 
10,000  18∖0.02  35∖0.03  16∖0.00  
Generalized PSC1  5,000  1,886∖3.20  633∖0.60  17,679∖14.59 
10,000  729∖2.61  1,271∖2.39  8,364∖13.94  
Extended PSC1  5,000  17∖0.02  13∖0.01  16∖0.02 
10,000  17∖0.03  15∖0.02  16∖0.03  
Extended Powell  5,000  46∖0.03  138∖0.03  250∖0.05 
10,000  74∖0.06  88∖0.04  311∖0.09  
Extended BlockDiagonal BD1  5,000  55∖0.02  40∖0.01  33∖0.01 
10,000  53∖0.06  47∖0.05  46∖0.05  
Extended Maratos  5,000  71∖0.03  132∖0.03  103∖0.02 
10,000  69∖0.02  96∖0.03  99∖0.03  
Quadratic Diagonal Perturbed  5,000  793∖0.22  880∖0.22  2,111∖0.39 
10,000  1,549∖0.62  1,303∖0.66  2,966∖1.16  
Extended Wood  5,000  85∖0.02  57∖0.01  135∖0.01 
10,000  84∖0.04  65∖0.03  116∖0.05  
Extended Hiebert  5,000  176∖0.04  137∖0.03  120∖0.03 
10,000  173∖0.05  137∖0.05  114∖0.03  
QuadraticQF1  5,000  1,222∖0.25  1,854∖0.50  1,397∖0.30 
10,000  1,396∖0.82  1,864∖0.97  2,545∖1.06  
Extended Quadratic Penalty QP2  5,000  45∖0.05  80∖0.06  76∖0.06 
10,000  43∖0.11  71∖0.13  84∖0.14  
QuadraticQF2  5,000  1,167∖0.40  1,620∖0.44  1,613∖0.36 
10,000  1,430∖1.14  2,625∖1.45  2,941∖1.31  
Extended EP1  5,000  6∖0.00  6∖0.00  6∖0.00 
10,000  413∖0.21  513∖0.25  439∖0.22  
Extended Tridiagonal 2  5,000  875∖0.23  2,436∖0.27  857∖0.11 
10,000  5,139∖1.17  5,857∖1.29  6,569∖1.47  
ARWHEAD  5,000  16∖0.00  16∖0.00  32∖0.01 
10,000  11∖0.00  14∖0.00  11∖0.00  
NONDIA  5,000  15∖0.00  15∖0.00  17∖0.00 
10,000  15∖0.02  16∖0.00  17∖0.02  
DQDRTIC  5,000  17∖0.00  19∖0.01  21∖0.00 
10,000  26∖0.01  25∖0.00  23∖0.01  
DIXMAANA  5,000  11∖0.02  12∖0.02  16∖0.00 
10,000  11∖0.01  12∖0.01  14∖0.02  
DIXMAANB  5,000  21∖0.01  20∖0.02  23∖0.00 
10,000  21∖0.02  20∖0.02  23∖0.03  
DIXMAANC  5,000  22∖0.01  24∖0.01  28∖0.02 
10,000  22∖0.01  25∖0.02  28∖0.01  
Broyden Tridiagonal  5,000  77∖0.02  125∖0.03  132∖0.03 
10,000  76∖0.04  121∖0.08  110∖0.05  
Almost Perturbed Quadratic  5,000  1,156∖0.30  1,479∖0.39  1,448∖0.31 
10,000  1,906∖0.95  2,198∖1.19  2,149∖0.92  
Tridiagonal Perturbed Quadratic  5,000  1,489∖0.37  1,783∖0.53  1,562∖0.41 
10,000  1,140∖1.15  1,879∖1.11  2,477∖1.25  
EDENSCH  5,000  325∖0.04  362∖0.05  2,157∖0.33 
10,000  1,106∖0.26  1,492∖0.45  1,594∖0.47  
VARDIM  5,000  34∖0.00  46∖0.02  46∖0.00 
10,000  47∖0.02  52∖0.03  52∖0.03  
LIARWHD  5,000  24∖0.01  29∖0.02  34∖0.00 
10,000  28∖0.01  38∖0.01  41∖0.02  
Diagonal 6  5,000  9∖0.00  9∖0.00  9∖0.00 
10,000  9∖0.01  9∖0.02  9∖0.01  
DIXMAANG  5,000  1,186∖0.42  715∖0.39  660∖0.33 
10,000  2,261∖1.84  1,275∖1.42  1,042∖1.07  
DIXMAANI  5,000  524∖0.38  850∖0.47  702∖0.36 
10,000  658∖1.15  1,266∖1.37  985∖1.00  
DIXMAANJ  5,000  770∖0.35  829∖0.44  770∖0.38 
10,000  1,823∖1.19  1,108∖1.14  1,108∖1.09  
DIXMAANK  5,000  1,510∖0.52  715∖0.41  812∖0.41 
10,000  3,658∖2.03  963∖1.09  1,303∖1.31  
ENGVAL1  5,000  5,277∖0.55  7,474∖0.86  6,436∖0.69 
10,000  7,380∖1.72  7,196∖1.62  21,402∖4.56  
COSINE  5,000  30∖0.02  33∖0.02  31∖0.03 
10,000  30∖0.03  34∖0.04  31∖0.03  
DENSCHNB  5,000  10∖0.01  14∖0.02  13∖0.02 
10,000  10∖0.00  15∖0.00  13∖0.00  
DENSCHNF  5,000  19∖0.01  39∖0.01  35∖0.01 
10,000  19∖0.02  29∖0.02  31∖0.02  
SINQUAD  5,000  515∖0.51  976∖0.83  566∖0.47 
10,000  2,011∖2.61  2,116∖3.59  5,989∖10.08 
Clearly, Algorithm 2.1 is competitive with the PRP+ method and the CG_DESCENT method in both the number of function evaluations and CPU time on the given test problems, which suggests that Algorithm 2.1 merits further study.
Declarations
Acknowledgements
The author wishes to express heartfelt thanks to the anonymous referees and the editor for their detailed and helpful suggestions for revising the manuscript.
Authors’ Affiliations
References
 Polak E, Ribière G: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Opér. 1969, 3(16): 35–43.
 Polyak BT: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 1969, 9: 94–112. 10.1016/0041-5553(69)90035-4
 Liu Y, Storey C: Efficient generalized conjugate gradient algorithms. Part 1: theory. J. Optim. Theory Appl. 1992, 69: 129–137.
 Hager WW, Zhang H: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16: 170–192. 10.1137/030601880
 Powell MJD: Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis. Lecture Notes in Mathematics 1066. Springer, Berlin; 1984: 122–141.
 Hager WW, Zhang H: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2: 35–58.
 Zhang L, Zhou W, Li DH: A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006, 26: 629–640. 10.1093/imanum/drl016
 Li ZF, Chen J, Deng NY: A new conjugate gradient method and its global convergence properties. Math. Program. 1997, 78: 375–391.
 Liu J: Convergence properties of a class of nonlinear conjugate gradient methods. Comput. Oper. Res. 2013, 40: 2656–2661. 10.1016/j.cor.2013.05.013
 Liu J, Du X: Global convergence of a modified LS method. Math. Probl. Eng. 2012, 2012: Article ID 910303.
 Li M, Chen Y, Qu AP: Global convergence of a modified Liu-Storey conjugate gradient method. U.P.B. Sci. Bull., Ser. A 2012, 74: 11–26.
 Tang C, Wei Z, Li G: A new version of the Liu-Storey conjugate gradient method. Appl. Math. Comput. 2007, 189: 302–313. 10.1016/j.amc.2006.11.098
 Liu J, Du X, Wang K: Convergence of descent methods with variable parameters. Acta Math. Appl. Sin. 2010, 33: 222–230. (in Chinese)
 Wei Z, Yao S, Liu L: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183: 1341–1350. 10.1016/j.amc.2006.05.150
 Zoutendijk G: Nonlinear programming, computational methods. In Integer and Nonlinear Programming. Edited by Abadie J. North-Holland, Amsterdam; 1970: 37–86.
 Li ZF, Chen J, Deng NY: Convergence properties of conjugate gradient methods with Goldstein line searches. J. China Agric. Univ. 1996, I(4): 15–18.
 Dai YH, Yuan Y: Nonlinear Conjugate Gradient Method. Shanghai Scientific & Technical Publishers, Shanghai; 2000. (in Chinese)
 Powell MJD: Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 1986, 28: 487–500. 10.1137/1028154
 Bongartz I, Conn AR, Gould NIM, Toint PL: CUTE: constrained and unconstrained testing environments. ACM Trans. Math. Softw. 1995, 21: 123–160. 10.1145/200979.201043
 Andrei N: An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10: 147–161.
 Dolan ED, Moré JJ: Benchmarking optimization software with performance profiles. Math. Program. 2002, 91: 201–213. 10.1007/s101070100263
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.