Global convergence of a modified conjugate gradient method
Journal of Inequalities and Applications volume 2014, Article number: 248 (2014)
Abstract
A modified conjugate gradient method for solving unconstrained optimization problems is proposed. It satisfies the sufficient descent condition under the strong Wolfe line search, and its global convergence is established in a straightforward way. The numerical results show that the proposed method is promising for the given test problems.
MSC:90C26, 65H10.
1 Introduction
The nonlinear conjugate gradient method is one of the most effective methods for solving unconstrained optimization problems. It comprises a class of unconstrained optimization algorithms characterized by low memory requirements and strong local and global convergence properties. In this paper, a modified nonlinear conjugate gradient method is proposed and analyzed.
Consider the following unconstrained optimization problem:
\underset{x\in {R}^{n}}{min}f(x),(1.1)
where f:{R}^{n}\to R is a smooth function and its gradient is denoted by g.
The conjugate gradient methods for solving the above problem often use the following iterative rule:
{x}_{k+1}={x}_{k}+{\alpha}_{k}{d}_{k},(1.2)
where {x}_{k} is the current iterate, the stepsize {\alpha}_{k} is a positive scalar generated by some line search, and the search direction {d}_{k} is defined by
{d}_{k}=-{g}_{k}\text{ for }k=1,\qquad {d}_{k}=-{g}_{k}+{\beta}_{k}{d}_{k-1}\text{ for }k\ge 2,(1.3)
where {g}_{k}=\mathrm{\nabla}f({x}_{k}) and {\beta}_{k} is the conjugate parameter, which determines the performance of the corresponding method. There are many well-known choices of {\beta}_{k}, such as
{\beta}_{k}^{\mathrm{PRP}}=\frac{{g}_{k}^{T}({g}_{k}-{g}_{k-1})}{{\parallel {g}_{k-1}\parallel}^{2}},\qquad {\beta}_{k}^{\mathrm{LS}}=\frac{{g}_{k}^{T}({g}_{k}-{g}_{k-1})}{-{d}_{k-1}^{T}{g}_{k-1}},\qquad {\beta}_{k}^{\mathrm{HZ}}={\left({y}_{k-1}-2{d}_{k-1}\frac{{\parallel {y}_{k-1}\parallel}^{2}}{{d}_{k-1}^{T}{y}_{k-1}}\right)}^{T}\frac{{g}_{k}}{{d}_{k-1}^{T}{y}_{k-1}},\quad {y}_{k-1}={g}_{k}-{g}_{k-1},
where \parallel \cdot \parallel is the Euclidean norm. The corresponding methods are generally called the PRP, LS, and HZ conjugate gradient methods. If f is a strictly convex quadratic function, these methods are equivalent when an exact line search is used. If f is nonconvex, their behaviors may differ.
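For concreteness, the PRP and LS parameters can be evaluated directly from two consecutive gradients and the previous search direction. The sketch below uses the standard formulas with illustrative toy vectors (not data from this paper); note that when {d}_{k-1}=-{g}_{k-1} the two parameters coincide, since then -{d}_{k-1}^{T}{g}_{k-1}={\parallel {g}_{k-1}\parallel}^{2}.

```python
import numpy as np

def beta_prp(g_new, g_old):
    # PRP: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    return g_new @ (g_new - g_old) / (g_old @ g_old)

def beta_ls(g_new, g_old, d_old):
    # LS: g_k^T (g_k - g_{k-1}) / (-d_{k-1}^T g_{k-1})
    return g_new @ (g_new - g_old) / -(d_old @ g_old)

g_old = np.array([1.0, 0.0])
g_new = np.array([0.5, -0.25])
d_old = -g_old  # steepest-descent direction at the previous iterate

b_prp = beta_prp(g_new, g_old)
b_ls = beta_ls(g_new, g_old, d_old)
# with d_{k-1} = -g_{k-1}, the two denominators agree, so b_prp == b_ls
```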
When the objective function is convex, Polak and Ribière [1] proved that the PRP method is globally convergent under the exact line search. But Powell [5] showed that the PRP method does not converge globally for some nonconvex functions. Nevertheless, the PRP method has generally been regarded as the most efficient conjugate gradient method in practical computation. One remarkable property of the PRP method is that it essentially performs a restart if a bad direction occurs (see [6]). But Powell [5] constructed an example showing that the PRP method can cycle infinitely without approaching any stationary point even if an exact line search is used. This counterexample also indicates that the PRP method may not be globally convergent when the objective function is nonconvex. Recently, Zhang et al. [7] proposed a descent modified PRP conjugate gradient method and proved its global convergence. The LS method has properties similar to those of the PRP method. The global convergence of the LS method with the Grippo-Lucidi line search has been proved in [8]. Some researchers have further studied the LS method (see Liu [9], Liu and Du [10]). In addition, Hager and Zhang [4] gave another effective method, namely the CG-DESCENT method. It not only has stable convergence, but it also shows effective numerical performance. In this method, the parameter {\beta}_{k} is computed by {\beta}_{k}=max\{{\beta}_{k}^{\mathrm{HZ}},{\eta}_{k}\}, where {\eta}_{k}=\frac{-1}{\parallel {d}_{k-1}\parallel min\{\eta ,\parallel {g}_{k-1}\parallel \}}, \eta >0.
In the next section, a modified conjugate gradient method is proposed. In Section 3, we prove the global convergence of the proposed method for nonconvex functions in the case of the strong Wolfe line search. In Section 4, we report some numerical results.
2 The new algorithm
Recently, several researchers have studied variants of the LS method. For example, Li et al. [11] proposed a modified LS method where the parameter {\beta}_{k} is computed by
where t>\frac{1}{4} is a constant. They proved the global convergence of the modified method with the Armijo line search and the Wolfe line search. Tang et al. [12] proved the global convergence of the LS method with a new line search. Liu et al. [13] studied a modified LS method where the parameter {\beta}_{k} is computed by
where \rho >1+\xi, \xi >0. They proved the global convergence of the corresponding method with the Wolfe line search. In 2006, Wei et al. [14] proposed a modified PRP method where the parameter {\beta}_{k} is obtained by
{\beta}_{k}^{\mathrm{WYL}}=\frac{{g}_{k}^{T}({g}_{k}-\frac{\parallel {g}_{k}\parallel}{\parallel {g}_{k-1}\parallel}{g}_{k-1})}{{\parallel {g}_{k-1}\parallel}^{2}}.
They proved its global convergence with the exact line search, the strong Wolfe line search, and the Grippo-Lucidi line search, respectively. Their work overcomes the convergence failure of the PRP method. Inspired by their work, we consider a variant of the LS method, i.e.
{\beta}_{k}^{\mathrm{VLS}}=\frac{{g}_{k}^{T}({g}_{k}-{t}_{k}{g}_{k-1})}{\lambda (-{d}_{k-1}^{T}{g}_{k-1})+(1-\lambda )max\{0,{g}_{k}^{T}{d}_{k-1}\}},(2.1)
where {t}_{k}=\frac{\parallel {g}_{k}\parallel}{\parallel {g}_{k-1}\parallel}, \lambda \in (0,1) and \lambda >2\sigma. Obviously, the denominator of (2.1) is a convex combination of -{d}_{k-1}^{T}{g}_{k-1} and max\{0,{g}_{k}^{T}{d}_{k-1}\}, which prevents the denominator of {\beta}_{k}^{\mathrm{LS}} from tending to zero. We now formally state the corresponding algorithm for unconstrained optimization problems.
Algorithm 2.1

Step 0: Given an initial point {x}_{1}\in {R}^{n}, \epsilon \ge 0, \lambda =0.8. Set {d}_{1}=-{g}_{1} and k=1.

Step 1: If \parallel {g}_{1}\parallel \le \epsilon, then stop.

Step 2: Compute {\alpha}_{k} by the strong Wolfe line search (0<\delta <\sigma <\frac{1}{2}):
f({x}_{k}+{\alpha}_{k}{d}_{k})\le f({x}_{k})+\delta {\alpha}_{k}{g}_{k}^{T}{d}_{k},(2.2)
|g{({x}_{k}+{\alpha}_{k}{d}_{k})}^{T}{d}_{k}|\le -\sigma {g}_{k}^{T}{d}_{k}.(2.3)
Step 3: Let {x}_{k+1}={x}_{k}+{\alpha}_{k}{d}_{k} and {g}_{k+1}=g({x}_{k+1}). If \parallel {g}_{k+1}\parallel \le \epsilon, then stop.

Step 4: Compute {\beta}_{k+1} by (2.1), and generate {d}_{k+1} by (1.3).

Step 5: Set k=k+1, go to step 2.
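As a rough illustration, the steps above can be sketched in code. The β formula below is written from this section's description of (2.1) (a WYL-type numerator over the stated convex-combination denominator) and should be read as an assumption; the bracketing/bisection strong Wolfe search is a generic stand-in for conditions (2.2)-(2.3), not the authors' implementation.

```python
import numpy as np

def strong_wolfe(f, grad, x, d, delta=0.01, sigma=0.1, alpha=1.0):
    """Crude bracketing/bisection search for a step satisfying (2.2)-(2.3)."""
    fx, slope0 = f(x), grad(x) @ d           # slope0 = g_k^T d_k < 0 for a descent direction
    lo, hi = 0.0, np.inf
    for _ in range(80):
        xa = x + alpha * d
        if f(xa) > fx + delta * alpha * slope0:
            hi = alpha                        # Armijo condition (2.2) fails: step too long
        else:
            s = grad(xa) @ d
            if abs(s) <= -sigma * slope0:     # strong Wolfe condition (2.3) holds
                return alpha
            lo, hi = (alpha, hi) if s < 0 else (lo, alpha)
        alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
    return alpha

def beta_vls(g_new, g_old, d_old, lam=0.8):
    """Assumed form of (2.1); lam plays the role of lambda, with lambda > 2*sigma."""
    t = np.linalg.norm(g_new) / np.linalg.norm(g_old)
    num = g_new @ (g_new - t * g_old)         # WYL-type numerator, nonnegative by Cauchy-Schwarz
    den = lam * -(d_old @ g_old) + (1.0 - lam) * max(0.0, g_new @ d_old)
    return num / den

def cg_vls(f, grad, x, tol=1e-6, max_iter=500):
    """Algorithm 2.1 sketch: conjugate gradient iteration with the VLS parameter."""
    g = grad(x)
    d = -g                                    # d_1 = -g_1 by (1.3)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:          # Steps 1/3: stopping test
            break
        a = strong_wolfe(f, grad, x, d)       # Step 2
        x = x + a * d                         # Step 3
        g_new = grad(x)
        d = -g_new + beta_vls(g_new, g, d) * d  # Step 4: (2.1) and (1.3)
        g = g_new
    return x

# a strictly convex quadratic test: f(x) = 0.5*x0^2 + 5*x1^2
f = lambda x: 0.5 * x[0] ** 2 + 5.0 * x[1] ** 2
grad = lambda x: np.array([x[0], 10.0 * x[1]])
sol = cg_vls(f, grad, np.array([3.0, 1.0]))
```

On such a quadratic the iterates drive the gradient below the tolerance quickly; on nonconvex problems the descent property of Lemma 2.1 requires λ > 2σ, which the defaults (λ = 0.8, σ = 0.1) satisfy.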
In some references, the sufficient descent condition
{g}_{k}^{T}{d}_{k}\le -c{\parallel {g}_{k}\parallel}^{2},\quad c>0,(2.4)
is always assumed to hold, because it plays an important role in proving the global convergence of conjugate gradient methods. Fortunately, in this paper the search direction {d}_{k} satisfies the sufficient descent condition under the strong Wolfe line search without any additional assumption.
Lemma 2.1 Let the sequences \{{g}_{k}\} and \{{d}_{k}\} be generated by Algorithm 2.1. Then we obtain
{g}_{k}^{T}{d}_{k}\le -(1-\frac{2\sigma}{\lambda}){\parallel {g}_{k}\parallel}^{2},\quad \mathrm{\forall}k\in {N}^{+}.(2.5)
Proof The conclusion can be proved by induction. Since {g}_{1}^{T}{d}_{1}=-{\parallel {g}_{1}\parallel}^{2}, the conclusion (2.5) holds for k=1. Now we assume that the conclusion (2.5) holds for some k\ge 1 and that {g}_{k+1}\ne 0. One gets from (1.3) that
{g}_{k+1}^{T}{d}_{k+1}=-{\parallel {g}_{k+1}\parallel}^{2}+{\beta}_{k+1}^{\mathrm{VLS}}{g}_{k+1}^{T}{d}_{k}.
If {g}_{k+1}^{T}{d}_{k}\le 0, then {\beta}_{k+1}^{\mathrm{VLS}}{g}_{k+1}^{T}{d}_{k}\le 0 because {\beta}_{k+1}^{\mathrm{VLS}}\ge 0. If {g}_{k+1}^{T}{d}_{k}>0, then (2.3) gives 0<{g}_{k+1}^{T}{d}_{k}\le -\sigma {g}_{k}^{T}{d}_{k}, and since the denominator of (2.1) is at least \lambda (-{d}_{k}^{T}{g}_{k}) and the numerator is at most 2{\parallel {g}_{k+1}\parallel}^{2}, we obtain {\beta}_{k+1}^{\mathrm{VLS}}{g}_{k+1}^{T}{d}_{k}\le \frac{2\sigma}{\lambda}{\parallel {g}_{k+1}\parallel}^{2}. In either case, {g}_{k+1}^{T}{d}_{k+1}\le -(1-\frac{2\sigma}{\lambda}){\parallel {g}_{k+1}\parallel}^{2}.
From the above inequality, the conclusion (2.5) holds for k+1. Thus, the conclusion (2.5) holds for k\in {N}^{+}. □
Remark 2.1 From (2.5) and the definition of {\beta}_{k}^{\mathrm{VLS}}, it is not difficult to find that
{\beta}_{k}^{\mathrm{VLS}}\ge 0,\quad \mathrm{\forall}k\ge 2.
3 Global convergence of Algorithm 2.1
In order to prove the global convergence of Algorithm 2.1, the following assumptions for the objective function are often used.
Assumption (H)

(i)
The level set \mathrm{\Omega}=\{x\mid f(x)\le f({x}_{1})\} is bounded, where {x}_{1} is the starting point.

(ii)
In some neighborhood V of Ω, the objective function f is continuously differentiable, and its gradient is Lipschitz continuous, i.e., there exists a constant L>0 such that
\parallel g(x)-g(y)\parallel \le L\parallel x-y\parallel ,\phantom{\rule{1em}{0ex}}\text{for all }x,y\in V.(3.1)
From Assumption (H), there exists a constant \tilde{r}>0 such that
\parallel g(x)\parallel \le \tilde{r},\phantom{\rule{1em}{0ex}}\text{for all }x\in \mathrm{\Omega}.(3.2)
The conclusion of the following lemma, often called the Zoutendijk condition, is usually used to prove the global convergence properties of conjugate gradient methods. It was originally established by Zoutendijk [15].
Lemma 3.1 Suppose Assumption (H) holds. Let the sequences \{{g}_{k}\} and \{{d}_{k}\} be generated by Algorithm 2.1; then we have
\sum_{k\ge 1}\frac{{({g}_{k}^{T}{d}_{k})}^{2}}{{\parallel {d}_{k}\parallel}^{2}}<+\mathrm{\infty}.(3.3)
Lemma 3.2 Suppose Assumption (H) holds. Let the sequences \{{g}_{k}\} and \{{d}_{k}\} be generated by Algorithm 2.1, and let there exist a constant r>0 such that
\parallel {g}_{k}\parallel \ge r,\quad \mathrm{\forall}k\ge 1.(3.4)
Then we have {d}_{k}\ne 0 and
\sum_{k\ge 2}{\parallel {u}_{k}-{u}_{k-1}\parallel}^{2}<+\mathrm{\infty},\quad \text{where }{u}_{k}=\frac{{d}_{k}}{\parallel {d}_{k}\parallel}.
Proof This lemma can be proved in a similar way as in [16], so we omit it. □
Lemma 3.3 Suppose Assumption (H) holds. Let the sequences \{{g}_{k}\} and \{{d}_{k}\} be generated by Algorithm 2.1, and let the sequence \{{g}_{k}\} satisfy (3.4). Then the conjugate parameter {\beta}_{k}^{\mathrm{VLS}} has property (\ast ), i.e.,

(1)
there exists a constant b>1 such that {\beta}_{k}^{\mathrm{VLS}}\le b;

(2)
there exists a constant \tau >0, such that \parallel {x}_{k}-{x}_{k-1}\parallel \le \tau \Rightarrow {\beta}_{k}^{\mathrm{VLS}}\le \frac{1}{2b}.
Proof It follows from (2.1), (3.2), (3.4), and (2.5) that
{\beta}_{k}^{\mathrm{VLS}}\le \frac{{g}_{k}^{T}({g}_{k}-{t}_{k}{g}_{k-1})}{\lambda (-{d}_{k-1}^{T}{g}_{k-1})}\le \frac{2{\parallel {g}_{k}\parallel}^{2}}{(\lambda -2\sigma ){\parallel {g}_{k-1}\parallel}^{2}}\le \frac{2{\tilde{r}}^{2}}{(\lambda -2\sigma ){r}^{2}}\underline{\underline{\mathrm{\Delta}}}b,
where we may enlarge b if necessary so that b>1.
Define \tau =\frac{(\lambda -2\sigma ){r}^{2}}{4L\tilde{r}b}. If \parallel {x}_{k}-{x}_{k-1}\parallel \le \tau, it then follows from Assumption (H)(ii) that
\parallel {g}_{k}-{t}_{k}{g}_{k-1}\parallel \le \parallel {g}_{k}-{g}_{k-1}\parallel +|1-{t}_{k}|\parallel {g}_{k-1}\parallel \le 2\parallel {g}_{k}-{g}_{k-1}\parallel \le 2L\tau ,
and hence {\beta}_{k}^{\mathrm{VLS}}\le \frac{2L\tau \parallel {g}_{k}\parallel}{(\lambda -2\sigma ){r}^{2}}\le \frac{2L\tilde{r}\tau}{(\lambda -2\sigma ){r}^{2}}=\frac{1}{2b}.
□
Lemma 3.4 Suppose Assumption (H) holds. Consider any method of the form (1.2)-(1.3), where {\beta}_{k}\ge 0 and {\alpha}_{k} satisfies the strong Wolfe line search. If {\beta}_{k} has the property (\ast ), and (2.5) and (3.4) hold, then there exists a constant \tau >0 such that, for any \mathrm{\Delta}\in {Z}^{+} and {k}_{0}\in {Z}^{+}, there is k\ge {k}_{0} with
|{\mathrm{\Re}}_{k,\mathrm{\Delta}}^{\tau}|>\frac{\mathrm{\Delta}}{2},
where {\mathrm{\Re}}_{k,\mathrm{\Delta}}^{\tau}\underline{\underline{\mathrm{\Delta}}}\{i\in {Z}^{+}:k\le i\le k+\mathrm{\Delta}-1,\parallel {x}_{i}-{x}_{i-1}\parallel >\tau \} and |{\mathrm{\Re}}_{k,\mathrm{\Delta}}^{\tau}| denotes the number of elements of {\mathrm{\Re}}_{k,\mathrm{\Delta}}^{\tau}.
Proof This lemma plays an important role in proving the global convergences of PRP, HS, and LS conjugate gradient methods, and so on. It was originally proved in [17]. From Remark 2.1 and Lemma 3.3, it is easy to find that Algorithm 2.1 leads to the conclusion of Lemma 3.4. □
Theorem 3.1 Suppose Assumption (H) holds. Let the sequences \{{g}_{k}\} and \{{d}_{k}\} be generated by Algorithm 2.1. If {\beta}_{k}^{\mathrm{VLS}} has the property (\ast ) and (2.5) holds, then we obtain
\underset{k\to \mathrm{\infty}}{lim inf}\parallel {g}_{k}\parallel =0.(3.5)
Proof We argue by contradiction. Suppose that (3.5) does not hold; then there exists r>0 such that
\parallel {g}_{k}\parallel \ge r,\quad \mathrm{\forall}k\ge 1.(3.6)
We also define {u}_{k}=\frac{{d}_{k}}{\parallel {d}_{k}\parallel}; then for all l,k\in {Z}^{+} (l\ge k), we have
{x}_{l}-{x}_{k}=\sum_{i=k+1}^{l}{s}_{i-1}=\sum_{i=k+1}^{l}\parallel {s}_{i-1}\parallel {u}_{k}+\sum_{i=k+1}^{l}\parallel {s}_{i-1}\parallel ({u}_{i-1}-{u}_{k}),(3.7)
where {s}_{i-1}={x}_{i}-{x}_{i-1}.
From Assumption (H), we know that there exists a constant \xi >0 such that
\parallel {x}_{k}\parallel \le \xi ,\quad \mathrm{\forall}k\ge 1.(3.8)
By (3.7), we have
Since (3.8) and (3.9) hold, we have
Let τ come from Lemma 3.4, and define \mathrm{\Delta}=\lceil 8\xi /\tau \rceil, so that 8\xi /\tau \le \mathrm{\Delta}<(8\xi /\tau )+1 and \mathrm{\Delta}\in {Z}^{+}.
From Lemma 3.2, we know that there exists {k}_{0} such that
\sum_{i\ge {k}_{0}}{\parallel {u}_{i}-{u}_{i-1}\parallel}^{2}\le \frac{1}{4\mathrm{\Delta}}.(3.11)
From the Cauchy-Schwarz inequality and (3.11), for any i\in [k,k+\mathrm{\Delta}-1] with k\ge {k}_{0}, we have
From Lemma 3.4, we know that there exists k\ge {k}_{0} such that
|{\mathrm{\Re}}_{k,\mathrm{\Delta}}^{\tau}|>\frac{\mathrm{\Delta}}{2}.(3.13)
By (3.10), (3.12), and (3.13), we have
From (3.14), we have \mathrm{\Delta}<8\xi /\tau, which contradicts the definition of Δ. Therefore,
\underset{k\to \mathrm{\infty}}{lim inf}\parallel {g}_{k}\parallel =0.
Thus we complete the proof of Theorem 3.1. □
4 Numerical results
In this section, we compare the performance of Algorithm 2.1 with those of the PRP+ method [18] and the CG-DESCENT method [4], in terms of the number of function evaluations and the CPU time in seconds, under the strong Wolfe line search. The test problems are some large-scale unconstrained optimization problems from [19, 20]. The parameters in the line search are chosen as follows: \delta =0.01, \sigma =0.1. The program terminates when {\parallel {g}_{k}\parallel}_{\mathrm{\infty}}\le {10}^{-6} is satisfied. All codes were written in Fortran 6.0 and run on a PC with a 2.0 GHz CPU, 512 MB of memory, and the Windows XP operating system.
The numerical results are reported in Table 1. The first column ‘Problems’ represents the problem’s name in [19, 20]. ‘Dim’ denotes the dimension of the test problems. The detailed numerical results are listed in the form NF∖CPU, where NF and CPU denote the number of function evaluations and CPU time in seconds, respectively.
We say that, for the i th problem, the performance of method M1 was better than that of method M2 if the CPU time, or the number of function evaluations, of M1 was smaller than that of M2. In order to estimate the overall effect, we apply the performance profiles of Dolan and Moré [21] in CPU time. Since some CPU times in Table 1 are zero, in order to obtain a comprehensive comparison of the two methods, we take the average CPU time of each method, denoted av(\mathrm{M}1) and av(\mathrm{M}2), and add the corresponding average to the CPU time of each problem. According to their description, the top curve represents the method that solved the most problems in a time that was within a factor τ of the best time; see Figure 1 and Figure 2. Using the same approach, we also compare the number of function evaluations; see Figure 3 and Figure 4.
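The profile construction just described can be sketched as follows. The timing table is a hypothetical placeholder, and the zero-CPU-time adjustment (adding each method's average time to its own entries) is one reading of the averaging step described above.

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More performance profile rho_s(tau) from a (problems x solvers) table."""
    times = np.asarray(times, dtype=float)
    # shift every column by that solver's average time, so that recorded
    # zeros do not produce degenerate ratios
    times = times + times.mean(axis=0, keepdims=True)
    ratios = times / times.min(axis=1, keepdims=True)  # factor of the best time per problem
    # rho_s(tau): fraction of problems solver s finishes within a factor tau of the best
    return np.array([[np.mean(ratios[:, s] <= t) for s in range(times.shape[1])]
                     for t in taus])

# hypothetical CPU times for 4 problems and 2 methods (M1, M2)
table = [[0.0, 0.1],
         [0.2, 0.2],
         [0.1, 0.4],
         [0.3, 0.0]]
profile = performance_profile(table, taus=[1.0, 2.0, 4.0])
```

Each row of `profile` is one value of τ; the "top curve" in the figures corresponds to the solver whose column reaches 1.0 fastest as τ grows.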
Obviously, Algorithm 2.1 is competitive with the PRP+ method and the CG-DESCENT method in the number of function evaluations and CPU time. Thus, Algorithm 2.1 is worthy of further study.
References
Polak E, Ribière G: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Oper. 1969,3(16):35–43.
Polyak BT: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 1969, 9: 94–112. 10.1016/0041-5553(69)90035-4
Liu Y, Storey C: Efficient generalized conjugate gradient algorithms. Part 1: theory. J. Optim. Theory Appl. 1992, 69: 129–137.
Hager WW, Zhang H: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16: 170–192. 10.1137/030601880
Powell MJD: Nonconvex minimization calculations and the conjugate gradient method. Lecture Notes in Mathematics 1066. In Numerical Analysis. Springer, Berlin; 1984:122–141.
Hager WW, Zhang H: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2: 35–58.
Zhang L, Zhou W, Li DH: A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006, 26: 629–640. 10.1093/imanum/drl016
Li ZF, Chen J, Deng NY: A new conjugate gradient method and its global convergence properties. Math. Program. 1997, 78: 375–391.
Liu J: Convergence properties of a class of nonlinear conjugate gradient methods. Comput. Oper. Res. 2013, 40: 2656–2661. 10.1016/j.cor.2013.05.013
Liu J, Du X: Global convergence of a modified LS method. Math. Probl. Eng. 2012., 2012: Article ID 910303
Li M, Chen Y, Qu AP: Global convergence of a modified Liu-Storey conjugate gradient method. U.P.B. Sci. Bull., Ser. A 2012, 74: 11–26.
Tang C, Wei Z, Li G: A new version of the Liu-Storey conjugate gradient method. Appl. Math. Comput. 2007, 189: 302–313. 10.1016/j.amc.2006.11.098
Liu J, Du X, Wang K: Convergence of descent methods with variable parameters. Acta Math. Appl. Sin. 2010, 33: 222–230. (in Chinese)
Wei Z, Yao S, Liu L: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183: 1341–1350. 10.1016/j.amc.2006.05.150
Zoutendijk G: Nonlinear programming, computational methods. In Integer and Nonlinear Programming. Edited by: Abadie J. North-Holland, Amsterdam; 1970:37–86.
Li ZF, Chen J, Deng NY: Convergence properties of conjugate gradient methods with Goldstein line searches. J. China Agric. Univ. 1996,I(4):15–18.
Dai YH, Yuan Y: Nonlinear Conjugate Gradient Method. Shanghai Scientific & Technical Publishers, Shanghai; 2000. (in Chinese)
Powell MJD: Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 1986, 28: 487–500. 10.1137/1028154
Bongartz I, Conn AR, Gould NIM, Toint PL: CUTE: constrained and unconstrained testing environments. ACM Trans. Math. Softw. 1995, 21: 123–160. 10.1145/200979.201043
Andrei N: An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10: 147–161.
Dolan ED, Moré JJ: Benchmarking optimization software with performance profiles. Math. Program. 2002, 91: 201–213. 10.1007/s101070100263
Acknowledgements
The author wishes to express her heartfelt thanks to the anonymous referees and the editor for their detailed and helpful suggestions for revising the manuscript.
Competing interests
The author declares that she has no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Wu, X. Global convergence of a modified conjugate gradient method. J Inequal Appl 2014, 248 (2014). https://doi.org/10.1186/1029-242X-2014-248
DOI: https://doi.org/10.1186/1029-242X-2014-248