Global convergence of a modified conjugate gradient method

Abstract

A modified conjugate gradient method for solving unconstrained optimization problems is proposed. The method satisfies the sufficient descent condition under the strong Wolfe line search, and its global convergence is established in a simple way. Numerical results show that the proposed method is promising on the given test problems.

MSC: 90C26, 65H10.

1 Introduction

The nonlinear conjugate gradient method is one of the most effective methods for solving unconstrained optimization problems. It comprises a class of algorithms characterized by low memory requirements and strong local and global convergence properties. In this paper, a modified nonlinear conjugate gradient method is proposed and analyzed.

Consider the following unconstrained optimization problem:

$$\min_{x\in\mathbb{R}^{n}}f(x), \qquad (1.1)$$

where $f:\mathbb{R}^{n}\to\mathbb{R}$ is a smooth function whose gradient is denoted by $g$.

Conjugate gradient methods for solving problem (1.1) use the following iterative scheme:

$$x_{k+1}=x_{k}+\alpha_{k}d_{k}, \qquad (1.2)$$

where x k is the current iterate, the stepsize α k is a positive scalar which is generated by some line search, and the search direction d k is defined by

$$d_{k}=\begin{cases}-g_{k}, & \text{for } k=1;\\ -g_{k}+\beta_{k}d_{k-1}, & \text{for } k\ge2,\end{cases} \qquad (1.3)$$

where $g_{k}=\nabla f(x_{k})$ and $\beta_{k}$ is the conjugate parameter, which determines the performance of the corresponding method. There are many well-known choices of $\beta_{k}$, such as

$$\beta_{k}^{PRP}=\frac{g_{k}^{T}(g_{k}-g_{k-1})}{\|g_{k-1}\|^{2}}\quad(\text{Polak-Ribière-Polyak (PRP) [1, 2]}),$$
$$\beta_{k}^{LS}=\frac{g_{k}^{T}(g_{k}-g_{k-1})}{-d_{k-1}^{T}g_{k-1}}\quad(\text{Liu-Storey (LS) [3]}),$$
$$\beta_{k}^{HZ}=\Bigl(y_{k-1}-2d_{k-1}\frac{\|y_{k-1}\|^{2}}{d_{k-1}^{T}y_{k-1}}\Bigr)^{T}\frac{g_{k}}{d_{k-1}^{T}y_{k-1}},\quad y_{k-1}=g_{k}-g_{k-1}\quad(\text{Hager-Zhang (HZ) [4]}),$$

where $\|\cdot\|$ denotes the Euclidean norm. The corresponding methods are generally called the PRP, LS, and HZ conjugate gradient methods. If $f$ is a strictly convex quadratic function, these methods are equivalent when an exact line search is used; if $f$ is non-convex, their behaviors may differ.
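For concreteness, the three parameters above can be written as short helper functions. The following is only an illustrative Python/NumPy sketch (the function names are chosen here and do not come from the cited papers):

```python
import numpy as np

def beta_prp(g_new, g_old):
    # PRP: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    y = g_new - g_old
    return g_new @ y / (g_old @ g_old)

def beta_ls(g_new, g_old, d_old):
    # LS: g_k^T (g_k - g_{k-1}) / (-d_{k-1}^T g_{k-1})
    y = g_new - g_old
    return g_new @ y / (-(d_old @ g_old))

def beta_hz(g_new, g_old, d_old):
    # HZ: (y_{k-1} - 2 d_{k-1} ||y_{k-1}||^2 / (d_{k-1}^T y_{k-1}))^T g_k / (d_{k-1}^T y_{k-1})
    y = g_new - g_old
    dy = d_old @ y
    return (y - 2.0 * d_old * (y @ y) / dy) @ g_new / dy
```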

When the objective function is convex, Polak and Ribière [1] proved that the PRP method is globally convergent under the exact line search. Powell [5] showed, however, that the PRP method does not converge globally for some non-convex functions. Nevertheless, the PRP method has generally been regarded as one of the most efficient conjugate gradient methods in practical computation. One remarkable property of the PRP method is that it essentially performs a restart when a bad direction occurs (see [6]). But Powell [5] constructed an example showing that the PRP method can cycle infinitely without approaching any stationary point, even if an exact line search is used. This counter-example also indicates that the PRP method may not be globally convergent when the objective function is non-convex. Recently, Zhang et al. [7] proposed a descent modified PRP conjugate gradient method and proved its global convergence. The LS method has properties similar to those of the PRP method. The global convergence of the LS method with the Grippo-Lucidi line search has been proved in [8], and some researchers have studied the LS method further (see Liu [9], Liu and Du [10]). In addition, Hager and Zhang [4] gave another effective method, the CG-DESCENT method. It not only has stable convergence, but it also shows effective numerical results. In this method, the parameter $\beta_{k}$ is computed by $\beta_{k}=\max\{\beta_{k}^{HZ},\eta_{k}\}$, where $\eta_{k}=\dfrac{-1}{\|d_{k-1}\|\min\{\eta,\|g_{k-1}\|\}}$, $\eta>0$.
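A sketch of this truncation in the same Python/NumPy style (the default value η = 0.01 is an illustrative choice made here, not a value taken from this paper):

```python
import numpy as np

def beta_cg_descent(g_new, g_old, d_old, eta=0.01):
    # beta_k = max{beta_k^HZ, eta_k}, eta_k = -1 / (||d_{k-1}|| * min{eta, ||g_{k-1}||})
    y = g_new - g_old
    dy = d_old @ y
    b_hz = (y - 2.0 * d_old * (y @ y) / dy) @ g_new / dy
    eta_k = -1.0 / (np.linalg.norm(d_old) * min(eta, np.linalg.norm(g_old)))
    return max(b_hz, eta_k)
```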

In the next section, a modified conjugate gradient method is proposed. In Section 3, we prove the global convergence of the proposed method for non-convex functions in the case of the strong Wolfe line search. In Section 4, we report some numerical results.

2 The new algorithm

Recently, several variants of the LS method have been studied. For example, Li et al. [11] proposed a modified LS method in which the parameter $\beta_{k}$ is computed by

$$\beta_{k}=\frac{g_{k}^{T}(g_{k}-g_{k-1})}{-d_{k-1}^{T}g_{k-1}}-t\frac{\|g_{k}-g_{k-1}\|^{2}\,d_{k-1}^{T}g_{k}}{(d_{k-1}^{T}g_{k-1})^{2}},$$

where $t>\frac{1}{4}$ is a constant. They proved the global convergence of the modified method with the Armijo line search and the Wolfe line search. Tang et al. [12] gave a new version of the LS method together with a new line search and established its convergence. Liu et al. [13] studied a modified LS method in which the parameter $\beta_{k}$ is computed by

$$\beta_{k}^{LS2}=\begin{cases}\dfrac{g_{k}^{T}(g_{k}-g_{k-1})-\rho|g_{k}^{T}d_{k-1}|}{-g_{k-1}^{T}d_{k-1}}, & \text{if } \min\{1,\rho-1-\xi\}\|g_{k}\|^{2}>|g_{k}^{T}g_{k-1}|,\\[2mm] 0, & \text{else},\end{cases}$$

where $\rho>1+\xi$ and $\xi>0$. They proved the global convergence of the corresponding method with the Wolfe line search. In 2006, Wei et al. [14] proposed a modified PRP method where the parameter $\beta_{k}$ is given by

$$\beta_{k}=\frac{g_{k}^{T}\bigl(g_{k}-\frac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}\bigr)}{\|g_{k-1}\|^{2}}.$$

They proved its global convergence with the exact line search, the strong Wolfe line search, and the Grippo-Lucidi line search, respectively. Their work overcomes the weak convergence of the PRP method. Inspired by their work, we consider a variant of the LS method, namely

$$\beta_{k}^{VLS}=\frac{g_{k}^{T}(g_{k}-t_{k}g_{k-1})}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}, \qquad (2.1)$$

where $t_{k}=\frac{\|g_{k}\|}{\|g_{k-1}\|}$, $\lambda\in(0,1)$, and $\lambda>2\sigma$. Obviously, the denominator of (2.1) is a convex combination of $-d_{k-1}^{T}g_{k-1}$ and $\max\{0,g_{k}^{T}d_{k-1}\}$, which may prevent the denominator from tending to zero, a difficulty that can occur with $\beta_{k}^{LS}$. We now state formally the corresponding algorithm for unconstrained optimization problems.
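As an illustration, the parameter (2.1) can be evaluated as in the following Python/NumPy sketch (the function name is chosen here; the default $\lambda=0.8$ matches Step 0 of Algorithm 2.1 below):

```python
import numpy as np

def beta_vls(g_new, g_old, d_old, lam=0.8):
    # beta^VLS of (2.1): t_k = ||g_k|| / ||g_{k-1}||; the denominator is a convex
    # combination of -d_{k-1}^T g_{k-1} and max{0, g_k^T d_{k-1}}.
    t = np.linalg.norm(g_new) / np.linalg.norm(g_old)
    numerator = g_new @ (g_new - t * g_old)
    denominator = lam * (-(d_old @ g_old)) + (1.0 - lam) * max(0.0, g_new @ d_old)
    return numerator / denominator
```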

Algorithm 2.1

  • Step 0: Given an initial point $x_{1}\in\mathbb{R}^{n}$, $\varepsilon\ge0$, and $\lambda=0.8$. Set $k=1$.

  • Step 1: If $\|g_{1}\|\le\varepsilon$, then stop.

  • Step 2: Compute $\alpha_{k}$ by the strong Wolfe line search ($0<\delta<\sigma<\frac{1}{2}$):

    $$f(x_{k}+\alpha_{k}d_{k})\le f(x_{k})+\delta\alpha_{k}g_{k}^{T}d_{k}, \qquad (2.2)$$
    $$\bigl|g(x_{k}+\alpha_{k}d_{k})^{T}d_{k}\bigr|\le-\sigma g_{k}^{T}d_{k}. \qquad (2.3)$$
  • Step 3: Let $x_{k+1}=x_{k}+\alpha_{k}d_{k}$ and $g_{k+1}=g(x_{k+1})$. If $\|g_{k+1}\|\le\varepsilon$, then stop.

  • Step 4: Compute $\beta_{k+1}$ by (2.1), and generate $d_{k+1}$ by (1.3).

  • Step 5: Set $k=k+1$ and go to Step 2.
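The steps above can be sketched in Python as follows. This is only an illustrative implementation under stated assumptions, not the Fortran code used in Section 4: SciPy's Wolfe line search is used as a stand-in for the strong Wolfe search (2.2)-(2.3), and the fallback step taken when the line search fails is an ad hoc choice.

```python
import numpy as np
from scipy.optimize import line_search, rosen, rosen_der

def cg_vls(f, grad, x0, eps=1e-6, delta=0.01, sigma=0.1, lam=0.8, max_iter=10000):
    # Illustrative sketch of Algorithm 2.1.
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # d_1 = -g_1, cf. (1.3)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:          # Steps 1 and 3: stopping test
            break
        # Step 2: Wolfe line search (SciPy) standing in for (2.2)-(2.3)
        alpha = line_search(f, grad, x, d, gfk=g, c1=delta, c2=sigma)[0]
        if alpha is None:                     # line search failed: take a small step
            alpha = 1e-4
        x_new = x + alpha * d                 # Step 3
        g_new = grad(x_new)
        # Step 4: beta^VLS of (2.1), then d_{k+1} by (1.3)
        t = np.linalg.norm(g_new) / np.linalg.norm(g)
        denom = lam * (-(d @ g)) + (1.0 - lam) * max(0.0, g_new @ d)
        d = -g_new + (g_new @ (g_new - t * g) / denom) * d
        x, g = x_new, g_new                   # Step 5
    return x

x_star = cg_vls(rosen, rosen_der, np.zeros(10))  # example run on the Rosenbrock function
```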

In some references, the sufficient descent condition

$$g_{k}^{T}d_{k}\le-c\|g_{k}\|^{2},\quad c>0, \qquad (2.4)$$

is often assumed to hold, because it plays an important role in proving the global convergence of conjugate gradient methods. Fortunately, in this paper the search direction $d_{k}$ satisfies the sufficient descent condition under the strong Wolfe line search without any additional assumption.

Lemma 2.1 Let the sequences $\{g_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1. Then

$$g_{k}^{T}d_{k}\le-\Bigl(1-\frac{2\sigma}{\lambda}\Bigr)\|g_{k}\|^{2}. \qquad (2.5)$$

Proof The conclusion is proved by induction. Since $g_{1}^{T}d_{1}=-\|g_{1}\|^{2}$, the conclusion (2.5) holds for $k=1$. Now assume that (2.5) holds for some $k\ge1$ and that $g_{k+1}\ne0$. From (1.3) one gets

$$\begin{aligned}\frac{-g_{k+1}^{T}d_{k+1}}{\|g_{k+1}\|^{2}}&=1-\beta_{k+1}^{VLS}\frac{g_{k+1}^{T}d_{k}}{\|g_{k+1}\|^{2}}\ge1-\bigl|\beta_{k+1}^{VLS}\bigr|\frac{|g_{k+1}^{T}d_{k}|}{\|g_{k+1}\|^{2}}\\&\ge1-\frac{\|g_{k+1}\|^{2}+\frac{\|g_{k+1}\|}{\|g_{k}\|}|g_{k+1}^{T}g_{k}|}{\lambda|g_{k}^{T}d_{k}|}\cdot\frac{|g_{k+1}^{T}d_{k}|}{\|g_{k+1}\|^{2}}\\&\ge1-\frac{2\|g_{k+1}\|^{2}}{\lambda|g_{k}^{T}d_{k}|}\cdot\frac{\sigma|g_{k}^{T}d_{k}|}{\|g_{k+1}\|^{2}}=1-\frac{2\sigma}{\lambda}.\end{aligned}$$

From the above inequality, the conclusion (2.5) holds for $k+1$. Thus (2.5) holds for all $k\in\mathbb{N}^{+}$. □
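For instance, with the value $\lambda=0.8$ from Step 0 and the line search parameter $\sigma=0.1$ used in the numerical experiments of Section 4, the bound (2.5) becomes

$$g_{k}^{T}d_{k}\le-\Bigl(1-\frac{2\cdot0.1}{0.8}\Bigr)\|g_{k}\|^{2}=-0.75\,\|g_{k}\|^{2},$$

so every search direction is a descent direction and (2.4) holds with $c=0.75$.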

Remark 2.1 From (2.5) and the definition of $\beta_{k}^{VLS}$, it is not difficult to find that

$$\begin{aligned}\beta_{k}^{VLS}&=\frac{g_{k}^{T}(g_{k}-t_{k}g_{k-1})}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}\ge\frac{\|g_{k}\|^{2}-\frac{\|g_{k}\|}{\|g_{k-1}\|}|g_{k}^{T}g_{k-1}|}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}\\&\ge\frac{\|g_{k}\|^{2}-\frac{\|g_{k}\|}{\|g_{k-1}\|}\|g_{k}\|\|g_{k-1}\|}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}=0.\end{aligned}$$
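In a numerical implementation, both conclusions can be monitored at every iteration. A small check of this kind (a hypothetical helper added here for illustration; the tolerance absorbs floating-point rounding) might look like:

```python
import numpy as np

def check_iteration(g, d, beta, lam=0.8, sigma=0.1, tol=1e-12):
    # g, d: NumPy gradient and direction vectors at iteration k; beta: beta_k^VLS
    # (2.5): g_k^T d_k <= -(1 - 2*sigma/lam) * ||g_k||^2
    sufficient_descent = np.dot(g, d) <= -(1.0 - 2.0 * sigma / lam) * np.dot(g, g) + tol
    # Remark 2.1: beta^VLS >= 0
    return sufficient_descent and beta >= -tol
```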

3 Global convergence of Algorithm 2.1

In order to prove the global convergence of Algorithm 2.1, the following assumptions for the objective function are often used.

Assumption (H)

(i) The level set $\Omega=\{x\mid f(x)\le f(x_{1})\}$ is bounded, where $x_{1}$ is the starting point.

(ii) In some neighborhood $V$ of $\Omega$, the objective function $f$ is continuously differentiable and its gradient is Lipschitz continuous; that is, there exists a constant $L>0$ such that

$$\|g(x)-g(y)\|\le L\|x-y\|,\quad\text{for all } x,y\in V. \qquad (3.1)$$

From Assumption (H), there exists a constant $\tilde{r}>0$ such that

$$\|g_{k}\|\le\tilde{r},\quad\text{for all } k.$$

The conclusion of the following lemma, often called the Zoutendijk condition, is usually used to prove the global convergence properties of conjugate gradient methods. It was originally established by Zoutendijk [15].

Lemma 3.1 Suppose Assumption (H) holds. Let the sequences $\{g_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1. Then

$$\sum_{k\ge1}\frac{(g_{k}^{T}d_{k})^{2}}{\|d_{k}\|^{2}}<+\infty. \qquad (3.2)$$

Lemma 3.2 Suppose Assumption (H) holds. Let the sequences $\{g_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1, and suppose there exists a constant $r>0$ such that

$$\|g_{k}\|\ge r,\quad\text{for all } k\ge1. \qquad (3.3)$$

Then we have

$$\sum_{k\ge2}\|u_{k}-u_{k-1}\|^{2}<+\infty,\qquad u_{k}=\frac{d_{k}}{\|d_{k}\|}.$$

Proof This lemma can be proved in a similar way as in [16], so we omit it. □

Lemma 3.3 Suppose Assumption (H) holds. Let the sequences $\{g_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1, and let the sequence $\{g_{k}\}$ satisfy

$$0<r\le\|g_{k}\|\le\tilde{r},\quad\text{for all } k\ge1. \qquad (3.4)$$

Then the conjugate parameter $\beta_{k}^{VLS}$ has Property (∗), i.e.,

(1) there exists a constant $b>1$ such that $|\beta_{k}^{VLS}|\le b$;

(2) there exists a constant $\tau>0$ such that $\|x_{k}-x_{k-1}\|\le\tau$ implies $|\beta_{k}^{VLS}|\le\frac{1}{2b}$.

Proof It follows from (2.1), (3.4), and (2.5) that

$$\begin{aligned}\bigl|\beta_{k}^{VLS}\bigr|&=\left|\frac{g_{k}^{T}(g_{k}-t_{k}g_{k-1})}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}\right|\le\frac{\|g_{k}\|\bigl(\|g_{k}\|+\frac{\tilde{r}}{r}\|g_{k-1}\|\bigr)}{\lambda|d_{k-1}^{T}g_{k-1}|}\\&\le\frac{\|g_{k}\|\bigl(\|g_{k}\|+\frac{\tilde{r}}{r}\|g_{k-1}\|\bigr)}{(\lambda-2\sigma)\|g_{k-1}\|^{2}}\le\frac{\tilde{r}\bigl(\tilde{r}+\frac{\tilde{r}^{2}}{r}\bigr)}{(\lambda-2\sigma)r^{2}}=\frac{\tilde{r}^{2}(r+\tilde{r})}{(\lambda-2\sigma)r^{3}}=b.\end{aligned}$$

Define $\tau=\frac{(\lambda-2\sigma)r^{2}}{4L\tilde{r}b}$. If $\|x_{k}-x_{k-1}\|\le\tau$, it then follows from Assumption (H)(ii) that

$$\begin{aligned}\bigl|\beta_{k}^{VLS}\bigr|&=\left|\frac{g_{k}^{T}(g_{k}-t_{k}g_{k-1})}{-\lambda d_{k-1}^{T}g_{k-1}+(1-\lambda)\max\{0,g_{k}^{T}d_{k-1}\}}\right|\le\frac{\|g_{k}\|\bigl(\|g_{k}-g_{k-1}\|+\|g_{k-1}-t_{k}g_{k-1}\|\bigr)}{\lambda|d_{k-1}^{T}g_{k-1}|}\\&\le\frac{\tilde{r}\bigl(L\tau+\|g_{k-1}-t_{k}g_{k-1}\|\bigr)}{\lambda|d_{k-1}^{T}g_{k-1}|}\le\frac{\tilde{r}\bigl(L\tau+\bigl|\|g_{k}\|-\|g_{k-1}\|\bigr|\bigr)}{(\lambda-2\sigma)\|g_{k-1}\|^{2}}\\&\le\frac{\tilde{r}\bigl(L\tau+\|g_{k}-g_{k-1}\|\bigr)}{(\lambda-2\sigma)\|g_{k-1}\|^{2}}\le\frac{2L\tau\tilde{r}}{(\lambda-2\sigma)r^{2}}=\frac{1}{2b}.\end{aligned}$$

 □

Lemma 3.4 Suppose Assumption (H) holds. Consider any method of the form (1.2)-(1.3) with $\beta_{k}\ge0$, where $\alpha_{k}$ satisfies the strong Wolfe line search. If $\beta_{k}$ has Property (∗), and (2.5) and (3.4) hold, then there exists a constant $\tau>0$ such that, for any $\Delta\in\mathbb{Z}^{+}$ and $k_{0}\in\mathbb{Z}^{+}$, there is an index $k\ge k_{0}$ with

$$\bigl|\mathcal{K}_{k,\Delta}^{\tau}\bigr|>\frac{\Delta}{2},$$

where $\mathcal{K}_{k,\Delta}^{\tau}\triangleq\{i\in\mathbb{Z}^{+}:k\le i\le k+\Delta-1,\ \|x_{i}-x_{i-1}\|>\tau\}$ and $|\mathcal{K}_{k,\Delta}^{\tau}|$ denotes the number of elements of $\mathcal{K}_{k,\Delta}^{\tau}$.

Proof This lemma plays an important role in proving the global convergence of the PRP, HS, and LS conjugate gradient methods, among others. It was originally proved in [17]. From Remark 2.1 and Lemma 3.3, it is easy to see that Algorithm 2.1 satisfies the hypotheses of Lemma 3.4, so the conclusion holds. □

Theorem 3.1 Suppose Assumption (H) holds. Let the sequences $\{g_{k}\}$ and $\{d_{k}\}$ be generated by Algorithm 2.1. If $\beta_{k}^{VLS}$ has Property (∗) and (2.5) holds, then

$$\liminf_{k\to+\infty}\|g_{k}\|=0. \qquad (3.5)$$

Proof We argue by contradiction. Suppose that (3.5) does not hold. Then there exists $r>0$ such that

$$\|g_{k}\|\ge r,\quad\text{for all } k\ge1. \qquad (3.6)$$

We also define $u_{k}=\frac{d_{k}}{\|d_{k}\|}$. Then for all $l,k\in\mathbb{Z}^{+}$ with $l\ge k$, we have

$$x_{l}-x_{k-1}=\sum_{i=k}^{l}\|x_{i}-x_{i-1}\|u_{i-1}=\sum_{i=k}^{l}\|s_{i-1}\|u_{k-1}+\sum_{i=k}^{l}\|s_{i-1}\|(u_{i-1}-u_{k-1}), \qquad (3.7)$$

where $s_{i-1}=x_{i}-x_{i-1}$.

From Assumption (H), we know that there exists a constant ξ>0 such that

$$\|x\|\le\xi,\quad\text{for } x\in V. \qquad (3.8)$$

By (3.7), we have

$$\sum_{i=k}^{l}\|s_{i-1}\|u_{k-1}=(x_{l}-x_{k-1})-\sum_{i=k}^{l}\|s_{i-1}\|(u_{i-1}-u_{k-1}). \qquad (3.9)$$

Since (3.8) and (3.9) hold, we have

$$\sum_{i=k}^{l}\|s_{i-1}\|\le2\xi+\sum_{i=k}^{l}\|s_{i-1}\|\,\|u_{i-1}-u_{k-1}\|. \qquad (3.10)$$

Let $\tau$ be the constant from Lemma 3.4, and define $\Delta=\lceil8\xi/\tau\rceil$, so that $8\xi/\tau\le\Delta<8\xi/\tau+1$ and $\Delta\in\mathbb{Z}^{+}$.

From Lemma 3.2, we know that there exists $k_{0}$ such that

$$\sum_{i\ge k_{0}}\|u_{i+1}-u_{i}\|^{2}\le\frac{1}{4\Delta}. \qquad (3.11)$$

From the Cauchy-Schwarz inequality and (3.11), and letting $i\in[k,k+\Delta-1]$ with $k\ge k_{0}$, we have

$$\|u_{i-1}-u_{k-1}\|\le\sum_{j=k}^{i-1}\|u_{j}-u_{j-1}\|\le(i-k)^{\frac{1}{2}}\Biggl(\sum_{j=k}^{i-1}\|u_{j}-u_{j-1}\|^{2}\Biggr)^{\frac{1}{2}}\le\Delta^{\frac{1}{2}}\Bigl(\frac{1}{4\Delta}\Bigr)^{\frac{1}{2}}=\frac{1}{2}. \qquad (3.12)$$

From Lemma 3.4, we know that there exists $k\ge k_{0}$ such that

$$\bigl|\mathcal{K}_{k,\Delta}^{\tau}\bigr|>\frac{\Delta}{2}. \qquad (3.13)$$

By (3.10), (3.12), and (3.13), we have

$$2\xi\ge\frac{1}{2}\sum_{i=k}^{k+\Delta-1}\|s_{i-1}\|>\frac{\tau}{2}\bigl|\mathcal{K}_{k,\Delta}^{\tau}\bigr|>\frac{\tau\Delta}{4}. \qquad (3.14)$$

From (3.14), we have $\Delta<8\xi/\tau$, which contradicts the definition of $\Delta$. Therefore,

$$\liminf_{k\to+\infty}\|g_{k}\|=0.$$

Thus we complete the proof of Theorem 3.1. □

4 Numerical results

In this section, we compare the performance of Algorithm 2.1 with that of the PRP+ method [18] and the CG-DESCENT method [4], in terms of the number of function evaluations and the CPU time in seconds, under the strong Wolfe line search. The test problems are large-scale unconstrained optimization problems from [19, 20]. The line search parameters are chosen as $\delta=0.01$ and $\sigma=0.1$. The program terminates when $\|g_{k}\|\le10^{-6}$. All codes were written in Fortran 6.0 and run on a PC with a 2.0 GHz processor, 512 MB of memory, and the Windows XP operating system.

The numerical results are reported in Table 1. The first column ‘Problems’ gives the problem’s name in [19, 20], and ‘Dim’ denotes the dimension of the test problem. The detailed numerical results are listed in the form NF/CPU, where NF and CPU denote the number of function evaluations and the CPU time in seconds, respectively.

Table 1 The numerical results of Algorithm 2.1, the PRP+ method, and the CG-DESCENT method

We say that, for the i-th problem, the performance of method M1 was better than that of method M2 if the CPU time (or the number of function evaluations) of M1 was smaller than that of M2. In order to estimate the overall performance, we apply the performance profiles of Dolan and Moré [21] with respect to CPU time. Since some CPU times in Table 1 are zero, we first compute the average CPU time of each method, denoted av(M1) and av(M2), and add the mean of av(M1) and av(M2) to the CPU time of every problem before profiling. According to their description, the top curve corresponds to the method that solved the most problems within a factor τ of the best time; see Figure 1 and Figure 2. Using the same approach, we also compare the methods with respect to the number of function evaluations; see Figure 3 and Figure 4.
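A Dolan-Moré profile of this kind can be produced with a short script. The sketch below (Python with NumPy and Matplotlib; the function name and data layout are assumptions made here) expects the measured times collected in a problems-by-solvers array, with the zero-time shift described above already applied:

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(times, labels):
    # times: (n_problems, n_solvers) array of CPU times or function-evaluation counts
    times = np.asarray(times, dtype=float)
    ratios = times / times.min(axis=1, keepdims=True)      # Dolan-More performance ratios
    taus = np.linspace(1.0, ratios.max(), 200)
    for j, label in enumerate(labels):
        rho = [(ratios[:, j] <= t).mean() for t in taus]   # fraction solved within factor t
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel("factor tau of the best time")
    plt.ylabel("fraction of problems solved")
    plt.legend()
    plt.show()

# e.g., performance_profile(shifted_times, ["Algorithm 2.1", "PRP+", "CG-DESCENT"])
```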

Figure 1: Performance profiles with respect to CPU time in seconds.

Figure 2: Performance profiles with respect to CPU time in seconds.

Figure 3: Performance profiles with respect to the number of iterations.

Figure 4: Performance profiles with respect to the number of iterations.

Clearly, Algorithm 2.1 is competitive with the PRP+ method and the CG-DESCENT method in terms of the number of function evaluations and CPU time. Thus, Algorithm 2.1 deserves further study.

References

  1. Polak E, Ribière G: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Oper. 1969,3(16):35–43.

  2. Polyak BT: The conjugate gradient method in extreme problems. USSR Comput. Math. Math. Phys. 1969, 9: 94–112. 10.1016/0041-5553(69)90035-4

  3. Liu Y, Storey C: Efficient generalized conjugate gradient algorithms. Part 1: theory. J. Optim. Theory Appl. 1992, 69: 129–137.

  4. Hager WW, Zhang H: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 2005, 16: 170–192. 10.1137/030601880

  5. Powell MJD: Nonconvex minimization calculations and the conjugate gradient method. Lecture Notes in Mathematics 1066. In Numerical Analysis. Springer, Berlin; 1984:122–141.

  6. Hager WW, Zhang H: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2: 35–58.

  7. Zhang L, Zhou W, Li DH: A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 2006, 26: 629–640. 10.1093/imanum/drl016

  8. Li ZF, Chen J, Deng NY: A new conjugate gradient method and its global convergence properties. Math. Program. 1997, 78: 375–391.

  9. Liu J: Convergence properties of a class of nonlinear conjugate gradient methods. Comput. Oper. Res. 2013, 40: 2656–2661. 10.1016/j.cor.2013.05.013

  10. Liu J, Du X: Global convergence of a modified LS method. Math. Probl. Eng. 2012, 2012: Article ID 910303

  11. Li M, Chen Y, Qu A-P: Global convergence of a modified Liu-Storey conjugate gradient method. U.P.B. Sci. Bull., Ser. A 2012, 74: 11–26.

  12. Tang C, Wei Z, Li G: A new version of the Liu-Storey conjugate gradient method. Appl. Math. Comput. 2007, 189: 302–313. 10.1016/j.amc.2006.11.098

  13. Liu J, Du X, Wang K: Convergence of descent methods with variable parameters. Acta Math. Appl. Sin. 2010, 33: 222–230. (in Chinese)

  14. Wei Z, Yao S, Liu L: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 2006, 183: 1341–1350. 10.1016/j.amc.2006.05.150

  15. Zoutendijk G: Nonlinear programming, computational methods. In Integer and Nonlinear Programming. Edited by: Abadie J. North-Holland, Amsterdam; 1970:37–86.

  16. Li ZF, Chen J, Deng NY: Convergence properties of conjugate gradient methods with Goldstein line searches. J. China Agric. Univ. 1996,I(4):15–18.

  17. Dai YH, Yuan Y: Nonlinear Conjugate Gradient Method. Shanghai Scientific & Technical Publishers, Shanghai; 2000. (in Chinese)

  18. Powell MJD: Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 1986, 28: 487–500. 10.1137/1028154

  19. Bongartz I, Conn AR, Gould NIM, Toint PL: CUTE: constrained and unconstrained testing environments. ACM Trans. Math. Softw. 1995, 21: 123–160. 10.1145/200979.201043

  20. Andrei N: An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10: 147–161.

  21. Dolan ED, Moré JJ: Benchmarking optimization software with performance profiles. Math. Program. 2002, 91: 201–213. 10.1007/s101070100263


Acknowledgements

The author wishes to express her heartfelt thanks to the anonymous referees and the editor for their detailed and helpful suggestions for revising the manuscript.

Author information

Correspondence to Xuesha Wu.

Competing interests

The author declares that she has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


Cite this article

Wu, X. Global convergence of a modified conjugate gradient method. J Inequal Appl 2014, 248 (2014). https://doi.org/10.1186/1029-242X-2014-248
