Modified nonlinear conjugate gradient method with sufficient descent condition for unconstrained optimization
Journal of Inequalities and Applications volume 2011, Article number: 57 (2011)
Abstract
In this paper, an efficient modified nonlinear conjugate gradient method for solving unconstrained optimization problems is proposed. An attractive property of the modified method is that the direction generated at each step is always a descent direction, without any line search. The global convergence of the modified method is established under the general Wolfe line search condition. Numerical comparisons with the well-known Polak–Ribière–Polyak (PRP) method, the CG-DESCENT method and the DSP-CG method on the unconstrained optimization problems from Moré and Garbow (ACM Trans Math Softw 7:17–41, 1981) show that the modified method is efficient and stable, so it can be widely used in scientific computation.
Mathematics Subject Classification (2010) 90C26 · 65H10
1 Introduction
The conjugate gradient method comprises a class of unconstrained optimization algorithms characterized by low memory requirements and strong local and global convergence properties. The purpose of this paper is to study the global convergence properties and practical computational performance of a modified nonlinear conjugate gradient method for unconstrained optimization, without restarts and under appropriate conditions.
In this paper, we consider the unconstrained optimization problem:

min f(x), x ∈ R^n, (1.1)

where f : R^n → R is a real-valued, continuously differentiable function.
When applied to the nonlinear problem (1.1), a nonlinear conjugate gradient method generates a sequence {x_k}, k ≥ 1, starting from an initial guess x_1 ∈ R^n, using the recurrence

x_{k+1} = x_k + α_k d_k, (1.2)

where the positive step size α_k is obtained by some line search, and the search direction d_k is generated by the rule:

d_{k+1} = -g_{k+1} + β_{k+1} d_k, d_1 = -g_1, (1.3)

where g_k = ∇f(x_k) and β_k is a scalar. Well-known formulas for β_k are the Liu–Storey (LS) formula and the Polak–Ribière–Polyak (PRP) formula, given by

β_k^{LS} = g_k^T y_{k-1} / (-g_{k-1}^T d_{k-1}), (1.4)

β_k^{PRP} = g_k^T y_{k-1} / ‖g_{k-1}‖², (1.5)

respectively, where ‖·‖ denotes the Euclidean norm and y_{k-1} = g_k - g_{k-1}. The corresponding methods are generally called the LS and PRP conjugate gradient methods. If f is a strictly convex quadratic function, the two methods are equivalent when an exact line search is used. If f is nonconvex, their behaviors may be quite different. In the past two decades, the convergence properties of the LS and PRP methods have been studied intensively by many researchers (e.g., [1–5]).
In practical computation, the PRP method is generally believed to be one of the most efficient conjugate gradient methods, and it has received much attention in recent years. One remarkable property of the method is that it essentially performs a restart if a bad direction occurs (see [6]). However, Powell [7] constructed an example showing that the method can cycle infinitely without approaching any stationary point even if an exact line search is used. This counterexample also indicates that the method may fail to converge when the objective function is nonconvex. Therefore, during the past few years, much effort has been devoted to creating new formulas for β_k that not only possess global convergence for general functions but are also superior to the original methods from the computational point of view (see [8–17]).
In this paper, we further study the conjugate gradient method for the solution of unconstrained optimization problems, focusing our attention on the scalar β_k of [12]. We introduce a modified version of the LS conjugate gradient method. An attractive property of the proposed method is that the generated directions are always descent directions; moreover, this property is independent of the line search used and of the convexity of the objective function. Under the general Wolfe line search condition, we establish the global convergence of the proposed method. We also report numerical experiments on a large set of unconstrained optimization problems from [18], which indicate that the proposed method performs better than the classic PRP method, the CG-DESCENT method and the DSP-CG method. This paper is organized as follows. In Section 2, we propose our algorithm, together with some assumptions on the objective function and some lemmas. In Section 3, the global convergence analysis is provided under the general Wolfe line search condition. In the last section, we report the numerical experiments and the comparisons with the PRP method, the CG-DESCENT method and the DSP-CG method.
2 The sufficient descent property
Algorithm 2.1:

Step 1: Data x_1 ∈ R^n, ε ≥ 0. Set d_1 = -g_1; if ‖g_1‖ ≤ ε, then stop.

Step 2: Compute α_k by the general Wolfe line searches (δ ∈ (0, 1), σ_1 ∈ (δ, 1), σ_2 ≥ 0):

f(x_k + α_k d_k) ≤ f(x_k) + δ α_k g_k^T d_k, (2.1)

σ_1 g_k^T d_k ≤ g(x_k + α_k d_k)^T d_k ≤ -σ_2 g_k^T d_k. (2.2)

Step 3: Let x_{k+1} = x_k + α_k d_k and g_{k+1} = g(x_{k+1}); if ‖g_{k+1}‖ ≤ ε, then stop.

Step 4: Generate d_{k+1} by (1.3), in which β_{k+1} is computed by

β_{k+1}^{VLS} = max{ β_{k+1}^{LS} - u · (‖y_k‖² / (g_k^T d_k)²) · g_{k+1}^T d_k, 0 }, u > 1/4. (2.3)

Step 5: Set k = k + 1 and go to Step 2.
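As a concrete illustration of Step 2, the sketch below checks whether a given step size satisfies the general Wolfe conditions in their standard form; the function name, test problem and parameter values are our illustrative assumptions, not part of the paper.

```python
import numpy as np

def satisfies_general_wolfe(f, grad, x, d, alpha,
                            delta=0.01, sigma1=0.1, sigma2=0.1):
    """Check the general Wolfe conditions for a trial step alpha:
         f(x + a d) <= f(x) + delta * a * g^T d          (sufficient decrease)
         sigma1 g^T d <= g(x + a d)^T d <= -sigma2 g^T d (curvature)
    assuming the standard form of the general Wolfe line search."""
    gd = grad(x) @ d
    x_new = x + alpha * d
    gd_new = grad(x_new) @ d
    armijo = f(x_new) <= f(x) + delta * alpha * gd
    curvature = sigma1 * gd <= gd_new <= -sigma2 * gd
    return bool(armijo and curvature)

# illustrative check on f(x) = 0.5 ||x||^2 from x0 = (1, 0) along d0 = -g
f = lambda x: 0.5 * float(x @ x)
grad = lambda x: x
x0 = np.array([1.0, 0.0])
d0 = -grad(x0)
ok_full = satisfies_general_wolfe(f, grad, x0, d0, alpha=1.0)  # both hold
ok_half = satisfies_general_wolfe(f, grad, x0, d0, alpha=0.5)  # curvature fails
```

For this quadratic, α = 1 satisfies both conditions, while α = 0.5 violates the lower curvature bound.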
In this paper, we prove the global convergence of the new algorithm under the following assumption.
Assumption (H):

(i) The level set Ω = {x ∈ R^n : f(x) ≤ f(x_1)} is bounded, where x_1 is the starting point.

(ii) In a neighborhood V of Ω, f is continuously differentiable and its gradient g is Lipschitz continuous, namely, there exists a constant L > 0 such that

‖g(x) - g(y)‖ ≤ L‖x - y‖, ∀ x, y ∈ V. (2.4)
Obviously, from Assumption (H) (i), there exists a positive constant ξ such that

‖x - y‖ ≤ ξ, ∀ x, y ∈ Ω, (2.5)

where ξ is the diameter of Ω. From Assumption (H) (ii), we also know that there exists a constant r̃ > 0 such that

‖g(x)‖ ≤ r̃, ∀ x ∈ V. (2.6)
In some studies of conjugate gradient methods, the sufficient descent condition

g_k^T d_k ≤ -c‖g_k‖², c > 0,

plays an important role. Unfortunately, this condition is hard to guarantee. However, the following lemma establishes the sufficient descent property of Algorithm 2.1 independently of the line search and of the convexity of the objective function.
Lemma 2.1 Consider any method (1.2)–(1.3), where β_k = β_k^{VLS}. Then

g_k^T d_k ≤ -(1 - 1/(4u))‖g_k‖². (2.7)
Proof. Multiplying (1.3) by g_k^T, we have

g_k^T d_k = -‖g_k‖² + β_k g_k^T d_{k-1}. (2.8)

From (2.3), if β_k = 0, then

g_k^T d_k = -‖g_k‖² ≤ -(1 - 1/(4u))‖g_k‖².

If β_k = β_k^{LS} - u · (‖y_{k-1}‖²/(g_{k-1}^T d_{k-1})²) · g_k^T d_{k-1}, then from (1.4) and (2.8), we have

g_k^T d_k = -‖g_k‖² + (g_k^T y_{k-1})(g_k^T d_{k-1})/(-g_{k-1}^T d_{k-1}) - u‖y_{k-1}‖²(g_k^T d_{k-1})²/(g_{k-1}^T d_{k-1})². (2.9)

We apply the inequality

a^T b ≤ u‖a‖² + (1/(4u))‖b‖²

to the first term in (2.9) with

a = (g_k^T d_{k-1}/(-g_{k-1}^T d_{k-1})) y_{k-1}, b = g_k,

then we have

(g_k^T y_{k-1})(g_k^T d_{k-1})/(-g_{k-1}^T d_{k-1}) ≤ u‖y_{k-1}‖²(g_k^T d_{k-1})²/(g_{k-1}^T d_{k-1})² + (1/(4u))‖g_k‖².

From the above inequality and (2.9), we have

g_k^T d_k ≤ -‖g_k‖² + (1/(4u))‖g_k‖² = -(1 - 1/(4u))‖g_k‖².

Hence the conclusion (2.7) holds.
□
3 Global convergence of the modified method
The conclusion of the following lemma, often called the Zoutendijk condition, is used to prove the global convergence of nonlinear conjugate gradient methods. It was originally given by Zoutendijk [19] under the Wolfe line searches. In the following lemma, we will prove the Zoutendijk condition under the general Wolfe line searches.
Lemma 3.1 Suppose Assumption (H) holds. Consider iterations of the form (1.2)–(1.3), where d_k satisfies g_k^T d_k < 0 for k ∈ N^+ and α_k satisfies the general Wolfe line searches. Then

∑_{k≥1} (g_k^T d_k)²/‖d_k‖² < +∞. (3.1)
Proof. From (2.2) and Assumption (H) (ii), we have

(σ_1 - 1) g_k^T d_k ≤ (g_{k+1} - g_k)^T d_k ≤ L α_k ‖d_k‖²,

then

α_k ≥ ((σ_1 - 1)/L) · g_k^T d_k/‖d_k‖².

From (2.1) and the inequality above, we get

f(x_k) - f(x_{k+1}) ≥ -δ α_k g_k^T d_k ≥ (δ(1 - σ_1)/L) · (g_k^T d_k)²/‖d_k‖².

From Assumption (H) (i), f is bounded below on Ω; summing the inequality above over k yields (3.1).
□
Lemma 3.2 Suppose Assumption (H) holds. Consider the method (1.2)–(1.3), where β_k = β_k^{VLS}, and α_k satisfies the general Wolfe line searches. If there exists a positive constant r such that

‖g_k‖ ≥ r, ∀ k ≥ 1, (3.2)

then d_k ≠ 0 for each k and

∑_{k≥2} ‖u_k - u_{k-1}‖² < +∞,

where u_k = d_k/‖d_k‖.
Proof. From (3.2), it follows from the descent property of Lemma 2.1 that d_k ≠ 0 for each k. Define

r_k = -g_k/‖d_k‖ and δ_k = β_k^{VLS} ‖d_{k-1}‖/‖d_k‖.

By (1.3), we have

u_k = r_k + δ_k u_{k-1}.

Since u_k is a unit vector, we have

‖r_k‖ = ‖u_k - δ_k u_{k-1}‖ = ‖δ_k u_k - u_{k-1}‖.

Since δ_k ≥ 0, it follows that

(1 + δ_k)‖u_k - u_{k-1}‖ ≤ ‖u_k - δ_k u_{k-1}‖ + ‖δ_k u_k - u_{k-1}‖ = 2‖r_k‖,

so that ‖u_k - u_{k-1}‖ ≤ 2‖r_k‖. From (3.1), (3.2) and Lemma 2.1, we have

∑_{k≥1} ‖r_k‖² = ∑_{k≥1} ‖g_k‖²/‖d_k‖² ≤ (1/((1 - 1/(4u))² r²)) ∑_{k≥1} (g_k^T d_k)²/‖d_k‖² < +∞. (3.3)

By (3.3), we have

∑_{k≥2} ‖u_k - u_{k-1}‖² ≤ 4 ∑_{k≥2} ‖r_k‖² < +∞.
□
Lemma 3.3 Suppose Assumption (H) holds. Consider the method (1.2)–(1.3), where β_k = β_k^{VLS}, and α_k satisfies the general Wolfe line searches. If (3.2) holds, then

|β_k^{VLS}| ≤ ρ, (3.4)

where ρ = (Lξ/((1 - 1/(4u)) r²)) · (r̃ + uLξ · max{σ_1, σ_2}).
Proof. Define s_{k-1} = x_k - x_{k-1}. From (2.2), we have

|g_k^T d_{k-1}| ≤ max{σ_1, σ_2} · |g_{k-1}^T d_{k-1}|. (3.5)

By (1.4), (2.3), (2.4)–(2.7) and (3.5), we have

|β_k^{VLS}| ≤ |g_k^T y_{k-1}|/|g_{k-1}^T d_{k-1}| + u · (‖y_{k-1}‖²/(g_{k-1}^T d_{k-1})²) · |g_k^T d_{k-1}|
≤ (L‖s_{k-1}‖/((1 - 1/(4u)) r²)) · (r̃ + uL‖s_{k-1}‖ max{σ_1, σ_2})
≤ (Lξ/((1 - 1/(4u)) r²)) · (r̃ + uLξ max{σ_1, σ_2}) = ρ.
□
Theorem 3.1 Suppose Assumption (H) holds. Consider the method (1.2)–(1.3), where β_k = β_k^{VLS} and α_k satisfies the general Wolfe line searches; then either g_k = 0 for some k or

lim inf_{k→∞} ‖g_k‖ = 0.
Proof. If g_{ k } = 0 for some k, we have the conclusion. In the following, we suppose that g_{ k } ≠ 0 for all k, then (3.2) holds, and we can obtain a contradiction.
We also define u_i = d_i/‖d_i‖; then for any l, k ∈ Z^+ with l > k, we have

x_l - x_k = ∑_{i=k}^{l-1} s_i = ∑_{i=k}^{l-1} ‖s_i‖ u_i.

By the triangle inequality, we have

∑_{i=k}^{l-1} ‖s_i‖ ≤ ‖x_l - x_k‖ + ∑_{i=k}^{l-1} ‖s_i‖ ‖u_i - u_k‖ ≤ ξ + ∑_{i=k}^{l-1} ‖s_i‖ ‖u_i - u_k‖. (3.6)
Let Δ be a positive integer, chosen large enough that
where ξ and ρ appear in Lemma 3.3.
By the conclusion of Lemma 3.2, there exists a k_0 large enough such that

∑_{i≥k_0} ‖u_i - u_{i-1}‖² ≤ 1/(4Δ). (3.7)

For any i ∈ [k + 1, k + Δ] with k ≥ k_0, by (3.7) and the Cauchy–Schwarz inequality, we have

‖u_i - u_k‖ ≤ ∑_{j=k+1}^{i} ‖u_j - u_{j-1}‖ ≤ ((i - k) ∑_{j=k+1}^{i} ‖u_j - u_{j-1}‖²)^{1/2} ≤ (Δ · 1/(4Δ))^{1/2} = 1/2.
Combining this inequality and (3.6), we have

∑_{i=k}^{l-1} ‖s_i‖ ≤ ξ + (1/2) ∑_{i=k}^{l-1} ‖s_i‖,

then

∑_{i=k}^{l-1} ‖s_i‖ ≤ 2ξ,

for all l ∈ [k + 1, k + Δ].
Define λ = (L/((1 - 1/(4u)) r²)) · (r̃ + uL · max{σ_1, σ_2}). From Lemma 3.3, we have
Define S_i = 2λ²‖s_i‖². By (1.3) and (2.6), for all l ≥ k_0 + 1, we have
From the inequality above, we have
In the inequality above, the product is defined to be one whenever the index range is vacuous. Let us consider a product of Δ consecutive S_i, where k ≥ k_0. Combining (2.5), (3.7) and (3.9), and using the arithmetic–geometric mean inequality, we have
then the sum in (3.10) is bounded, independent of l.
From Lemma 3.2 and (3.2), we have

∑_{k≥1} 1/‖d_k‖² < +∞,

which implies ‖d_k‖ → +∞; this contradicts the bound on ‖d_l‖, independent of l > k_0, established above. Hence,

lim inf_{k→∞} ‖g_k‖ = 0.
□
4 Numerical results
In this section, we compare the modified conjugate gradient method, denoted the VLS method, with the PRP method, the CG-DESCENT (η = 0.01) method [12] and the DSP-CG (C = 0.5) method [17], both in average performance and in CPU time performance, under the general Wolfe line search with δ = 0.01, σ_1 = σ_2 = 0.1 and u = 0.5. The 78 test problems come from [18], and the termination condition of the experiments is ‖g_k‖ ≤ 10^{-6}, or Itmax > 9999, where Itmax denotes the maximal number of iterations. All codes were written in Matlab 7.0 and run on a PC with a 2.0 GHz CPU, 512 MB of memory and the Windows XP operating system.
The numerical results of our tests are reported in Table 1. The first column "N" gives the problem's index, which corresponds to "N" in Table 2. The detailed numerical results are listed in the form NI/NF/NG/CPU, where NI, NF, NG and CPU denote the number of iterations, the number of function evaluations, the number of gradient evaluations and the CPU time in seconds, respectively. "Dim" denotes the dimension of the test problem. If the iteration limit was exceeded, the run was stopped; this is indicated by NaN. In Table 2, "Problem" gives the problem's name in [18].
Firstly, in order to rank the average performance of all the conjugate gradient methods above, one can compute the total number of function and gradient evaluations by the formula

N_total = NF + l × NG, (4.1)

where l is some integer. According to the results on automatic differentiation [20, 21], the value of l can be set to 5, i.e.,

N_total = NF + 5 × NG. (4.2)

That is to say, one gradient evaluation is equivalent to five function evaluations if automatic differentiation is used.
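Under this convention, the total cost measure is a one-liner; the function name below is ours:

```python
def n_total(nf, ng, l=5):
    """Total evaluation cost: one gradient counts as l function evaluations,
    with l = 5 following the automatic-differentiation convention."""
    return nf + l * ng

# e.g. 100 function evaluations and 20 gradient evaluations
cost = n_total(nf=100, ng=20)   # 100 + 5 * 20
```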
By making use of (4.2), we compare the VLS method with the DSP-CG, PRP and CG-DESCENT methods as follows: for the i-th problem, compute the total numbers of function and gradient evaluations required by the VLS, DSP-CG, PRP and CG-DESCENT methods by formula (4.2), and denote them by N_total,i(VLS), N_total,i(DSP-CG), N_total,i(PRP) and N_total,i(CG-DESCENT), respectively. Then we calculate the ratio

γ_i(method) = N_total,i(method) / N_total,i(VLS).
If the i_0-th problem cannot be run by a method, we use the constant λ = max{γ_i(method) : i ∈ S_1} instead of γ_{i_0}(method), where S_1 denotes the set of test problems that the method can run. The geometric mean of these ratios for the VLS method over all the test problems is defined by

γ(method) = (∏_{i∈S} γ_i(method))^{1/|S|},

where S denotes the set of the test problems and |S| denotes the number of elements in S. One advantage of the above rule is that the comparison is relative and hence is not dominated by a few problems for which the method requires a great number of function and gradient evaluations.
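The per-problem ratios and their geometric mean can be sketched as follows (function names are ours; `math.prod` requires Python 3.8+):

```python
from math import prod

def gamma_ratios(n_method, n_vls):
    """Per-problem ratios gamma_i = N_total,i(method) / N_total,i(VLS)."""
    return [m / v for m, v in zip(n_method, n_vls)]

def gamma_mean(ratios):
    """Geometric mean of the ratios over the whole test set S."""
    return prod(ratios) ** (1.0 / len(ratios))

# two hypothetical problems: equal cost on the first, 4x cost on the second
r = gamma_ratios([200, 800], [200, 200])   # ratios 1.0 and 4.0
g = gamma_mean(r)                          # geometric mean sqrt(1 * 4) = 2.0
```

By construction, γ(VLS) = 1, and a method with γ > 1 is more expensive than VLS on average.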
According to the above rule, it is clear that γ(VLS) = 1. The values of γ(DSP-CG), γ(PRP) and γ(CG-DESCENT) are listed in Table 3.
Secondly, we adopt the performance profiles of Dolan and Moré [22] to compare the VLS method with the DSP-CG, PRP and CG-DESCENT methods in terms of CPU time (see Figure 1). In Figure 1,

P(τ) = (number of test problems whose CPU time is within a factor τ of the best time) / (total number of test problems).
That is, for each method, we plot the fraction P of problems for which the method is within a factor τ of the best time. The left side of the figure gives the percentage of the test problems for which a method is fastest; the right side gives the percentage of the test problems that were successfully solved by each of the methods. The top curve is the method that solved the most problems in a time that was within a factor τ of the best time. Since the top curve in Figure 1 corresponds to the VLS method, this method is clearly fastest for this set of 78 test problems. In particular, the VLS method is fastest for about 60% of the test problems, and it ultimately solves 100% of the test problems.
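A minimal sketch of the Dolan–Moré performance-profile computation described above, assuming a table of CPU times with one row per method and one column per problem (names and shapes are our assumptions):

```python
import numpy as np

def performance_profile(times, taus):
    """Dolan-More performance profile. times[s][p] is the CPU time of
    method s on problem p; returns, for each method, the fraction P(tau)
    of problems solved within a factor tau of the best time."""
    times = np.asarray(times, dtype=float)
    best = times.min(axis=0)           # best time on each problem
    ratios = times / best              # performance ratios r_{p,s}
    n_prob = times.shape[1]
    return {s: [(ratios[s] <= tau).sum() / n_prob for tau in taus]
            for s in range(times.shape[0])}

# two hypothetical methods on two problems
times = np.array([[1.0, 2.0],
                  [2.0, 2.0]])        # rows: methods, columns: problems
prof = performance_profile(times, taus=[1.0, 2.0])
```

Here method 0 is fastest on both problems (P(1) = 1.0), while method 1 is fastest on only one (P(1) = 0.5) but matches the best time within a factor of 2 on both (P(2) = 1.0).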
From Table 3 and Figure 1, it is clear that the VLS method performs better in both the average performance and the CPU time performance, which implies that the proposed modified method is computationally efficient.
References
Liu Y, Storey C: Efficient generalized conjugate gradient algorithms. Part 1: theory. J Optim Theory Appl 1992, 69: 129–137.
Polak E, Ribière G: Note sur la convergence de méthodes de directions conjuguées. Rev Française Informat Recherche Opérationnelle 3e Année 1969, 16: 35–43.

Polyak BT: The conjugate gradient method in extreme problems. USSR Comput Math Math Phys 1969, 9: 94–112. 10.1016/0041-5553(69)90035-4
Gaohang Yu, Yanlin Zhao, Zengxin Wei: A descent nonlinear conjugate gradient method for large-scale unconstrained optimization. Appl Math Comput 2007, 187: 636–643. 10.1016/j.amc.2006.08.087
Jinkui Liu, Xianglin Du, Kairong Wang: Convergence of descent methods with variable parameters. Acta Math Appl Sin 2010, 33: 222–230. (in Chinese)
Hager WW, Zhang H: A survey of nonlinear conjugate gradient methods. Pac J Optim 2006, 2: 35–58.
Powell MJD: Nonconvex minimization calculations and the conjugate gradient method. In Numerical Analysis (Dundee, 1983). Lecture Notes in Mathematics. Volume 1066. Springer, Berlin; 1984:122–141.
Andrei N: Scaled conjugate gradient algorithms for unconstrained optimization. Comput Optim Appl 2007, 38: 401–416. 10.1007/s10589-007-9055-7
Andrei N: Another nonlinear conjugate gradient algorithm for unconstrained optimization. Optim Methods Softw 2009, 24: 89–104. 10.1080/10556780802393326
Birgin EG, Martínez JM: A spectral conjugate gradient method for unconstrained optimization. Appl Math Optim 2001, 43: 117–128. 10.1007/s0024500100030
Dai YH, Liao LZ: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl Math Optim 2001, 43: 87–101. 10.1007/s002450010019
Hager WW, Zhang H: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J Optim 2005, 16: 170–192. 10.1137/030601880
Li G, Tang C, Wei Z: New conjugacy condition and related new conjugate gradient methods for unconstrained optimization. J Comput Appl Math 2007, 202: 523–539. 10.1016/j.cam.2006.03.005
Wei Z, Li G, Qi L: New quasiNewton methods for unconstrained optimization problems. Appl Math Comput 2006, 175: 1156–1188. 10.1016/j.amc.2005.08.027
Zhang L, Zhou W, Li DH: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J Numer Anal 2006, 26: 629–640. 10.1093/imanum/drl016
Yuan G: Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optim Lett 2009, 3: 11–21. 10.1007/s11590-008-0086-5
Gaohang Yu, Lutai Guan, Wufan Chen: Spectral conjugate gradient methods with sufficient descent property for large-scale unconstrained optimization. Optim Methods Softw 2008, 23: 275–293. 10.1080/10556780701661344
Moré JJ, Garbow BS, Hillstrom KE: Testing unconstrained optimization software. ACM Trans Math Softw 1981, 7: 17–41. 10.1145/355934.355936
Zoutendijk G: Nonlinear programming, computational methods. In Integer and Nonlinear Programming. Edited by: Abadie J. NorthHolland, Amsterdam; 1970:37–86.
Dai Y, Ni Q: Testing different conjugate gradient methods for large-scale unconstrained optimization. J Comput Math 2003, 21: 311–320.
Griewank A: On automatic differentiation. In Mathematical Programming: Recent Developments and Applications. Edited by: Iri M, Tannabe K. Kluwer Academic Publishers, Dordrecht; 1989:84–108.
Dolan ED, Moré JJ: Benchmarking optimization software with performance profiles. Math Program 2001, 91: 201–213.
Acknowledgements
The authors wish to express their heartfelt thanks to the referees and Professor K. Teo for their detailed and helpful suggestions for revising the manuscript. At the same time, we are grateful for the suggestions of Lijuan Zhang. This work was supported by the Natural Science Foundation of the Chongqing Education Committee (KJ091104, KJ101108) and Chongqing Three Gorges University (09ZZ060).
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Jinkui Liu carried out the new method studies, designed all the steps of proof in this research and drafted the manuscript. Shaoheng Wang participated in writing the all codes of the algorithm and suggested many good ideas that made this paper possible. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Liu, J., Wang, S. Modified nonlinear conjugate gradient method with sufficient descent condition for unconstrained optimization. J Inequal Appl 2011, 57 (2011). https://doi.org/10.1186/1029242X201157