
Three modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property

Abstract

In this paper, three modified Polak-Ribière-Polyak (PRP) conjugate gradient methods for unconstrained optimization are proposed. They are based on the two-term PRP method proposed by Cheng (Numer. Funct. Anal. Optim. 28:1217-1230, 2007), the three-term PRP method proposed by Zhang et al. (IMA J. Numer. Anal. 26:629-640, 2006), and the descent PRP method proposed by Yu et al. (Optim. Methods Softw. 23:275-293, 2008). These modified methods possess the sufficient descent property without any line searches. Moreover, if the exact line search is used, they reduce to the classical PRP method. Under standard assumptions, we show that these three methods converge globally with a Wolfe line search. We also report some numerical results to show the efficiency of the proposed methods.

Introduction

Consider the unconstrained optimization problem:

$$ \min f(x), x\in\mathcal{R}^{n}, $$
(1)

where \(f: \mathcal{R}^{n}\rightarrow\mathcal{R}\) is continuously differentiable, and its gradient \(g(x)\) is available. Conjugate gradient methods are efficient for solving (1), especially for large-scale problems. A conjugate gradient method generates an iterate sequence \(\{x_{k}\}\) by

$$ x_{k+1}=x_{k}+\alpha_{k} d_{k}, \quad k=0,1,\ldots, $$
(2)

where \(x_{k}\) is the current iterate, \(\alpha_{k}>0\) is the step size and computed by certain line search, and \(d_{k}\) is the search direction defined by

$$ d_{k}=\left \{ \begin{array}{l@{\quad}l}-g_{k},& \text{if } k=0, \\ -g_{k}+\beta_{k}d_{k-1}, &\text{if } k\geq1, \end{array} \right . $$
(3)

in which \(\beta_{k}\) is an important parameter. Generally, different conjugate gradient methods correspond to different choices of the parameter \(\beta_{k}\). Some well-known formulas for \(\beta_{k}\) include the Fletcher-Reeves (FR) [1], the Polak-Ribière-Polyak (PRP) [2, 3], the Liu-Storey (LS) [4], the Dai-Yuan (DY) [5], the Hestenes-Stiefel (HS) [6] and the conjugate descent (CD) [7] formulas. In this paper, we focus our attention on the PRP method, in which the parameter \(\beta_{k}\) is given by

$$ \beta_{k}^{\mathrm{PRP}}=\frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\| g_{k-1}\|^{2}}, $$
(4)

where \(\|\cdot\|\) is the 2-norm. In the convergence analysis and implementation of conjugate gradient methods, one often uses an inexact line search such as a Wolfe line search, a strong Wolfe line search, or an Armijo line search. The Wolfe line search finds a step size \(\alpha_{k}\) satisfying

$$ \left \{ \begin{array}{l} f(x_{k}+\alpha_{k} d_{k})\leq f(x_{k})+\rho\alpha_{k}g_{k}^{\top}d_{k}, \\ g(x_{k}+\alpha_{k} d_{k})^{\top}d_{k}\geq\sigma g_{k}^{\top}d_{k}, \end{array} \right . $$
(5)

where \(0<\rho<\sigma<1\). The strong Wolfe line search computes \(\alpha_{k}\) such that

$$ \left \{ \begin{array}{l} f(x_{k}+\alpha_{k} d_{k})\leq f(x_{k})+\rho\alpha_{k}g_{k}^{\top}d_{k}, \\ |g(x_{k}+\alpha_{k} d_{k})^{\top}d_{k}|\leq\sigma|g_{k}^{\top}d_{k}|, \end{array} \right . $$
(6)

where \(0<\rho<1/2\) and \(\sigma\in(\rho,1)\). The Armijo line search sets \(\alpha_{k}\) to the largest element of \(\{\rho^{j}\mid j=0,1,\ldots\}\) satisfying

$$ f(x_{k}+\alpha_{k} d_{k})\leq f(x_{k})+\delta\alpha_{k}g_{k}^{\top}d_{k}, $$
(7)

where \(\delta\in(0,1)\) and \(\rho\in(0,1)\) are two constants.
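As an illustration, the three criteria above can be checked programmatically. The following sketch (Python with NumPy; the helper names are ours, not from the paper) tests whether a candidate step size satisfies (5), (6), or (7):

```python
import numpy as np

def satisfies_armijo(f, g, x, d, alpha, delta=0.1):
    # Armijo condition (7): sufficient decrease of f along d
    return bool(f(x + alpha * d) <= f(x) + delta * alpha * (g(x) @ d))

def satisfies_wolfe(f, g, x, d, alpha, rho=0.1, sigma=0.5, strong=False):
    # Wolfe (5) = sufficient decrease + curvature condition;
    # strong Wolfe (6) bounds |g(x_k + alpha d_k)^T d_k| instead.
    decrease = f(x + alpha * d) <= f(x) + rho * alpha * (g(x) @ d)
    new_slope, old_slope = g(x + alpha * d) @ d, g(x) @ d
    if strong:
        curvature = abs(new_slope) <= sigma * abs(old_slope)
    else:
        curvature = new_slope >= sigma * old_slope
    return bool(decrease and curvature)

# demo on f(x) = ||x||^2 with the steepest descent direction
f = lambda x: x @ x
g = lambda x: 2.0 * x
x, d = np.array([1.0]), np.array([-2.0])
ok = satisfies_wolfe(f, g, x, d, 0.25)         # well-scaled step
too_small = satisfies_wolfe(f, g, x, d, 0.01)  # violates the curvature condition
```

Note how the tiny step passes the decrease test but fails the curvature test, which is exactly the phenomenon the curvature condition is designed to rule out.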

The PRP method is generally regarded as one of the most efficient conjugate gradient methods and has been studied by many researchers [2, 3, 8]. Polak and Ribière [2] proved that the PRP method with the exact line search is globally convergent under a strong convexity assumption on the objective function f. Gilbert and Nocedal [3] conducted an elegant analysis and showed that the PRP method is globally convergent if \(\beta_{k}^{\mathrm{PRP}}\) is restricted to be non-negative (denoted \(\beta_{k}^{\mathrm{PRP+}}\)) and \(\alpha_{k}\) is determined by a line search step satisfying the sufficient descent condition

$$ g_{k}^{\top}d_{k}\leq-c \|g_{k}\|^{2},\quad c>0, $$
(8)

in addition to the Wolfe line search condition (5). Grippo and Lucidi [8] proposed new line search conditions, which can ensure that the PRP method is globally convergent for nonconvex problems. However, the method given by Grippo and Lucidi [8] does not perform better than the PRP method, which employs \(\beta_{k}^{\mathrm{PRP+}}\) and the Wolfe line search in the numerical computations. Therefore, great attention is given to the problem of finding the methods which not only have global convergence but also have nice numerical performance [916].

Recently, two new conjugate gradient methods obtained by modifying the PRP method, called the two-term PRP method (denoted CTPRP) and the three-term PRP method (denoted ZTPRP), have been proposed by Cheng [9] and Zhang et al. [10], respectively, in which the direction \(d_{k}\) is given by

$$d_{k}^{\mathrm{CTPRP}}=- \biggl(1+\beta_{k}^{\mathrm{PRP}} \frac{g_{k}^{\top}d_{k-1}}{\|g_{k}\|^{2}} \biggr)g_{k}+\beta_{k}^{\mathrm{PRP}}d_{k-1}, \quad \forall k\geq1 $$

or

$$d_{k}^{\mathrm{ZTPRP}}=-g_{k}+\beta_{k}^{\mathrm{PRP}}d_{k-1}- \theta _{k}y_{k-1},\quad \forall k\geq1, $$

where

$$y_{k-1}=g_{k}-g_{k-1}, \qquad \theta_{k}= \frac{g_{k}^{\top}d_{k-1}}{\|g_{k-1}\|^{2}}. $$
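A quick numerical check with random vectors standing in for \(g_{k-1}\), \(g_{k}\), \(d_{k-1}\) confirms that both directions satisfy \(g_{k}^{\top}d_{k}=-\|g_{k}\|^{2}\) exactly (our own sketch; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
g_prev, g, d_prev = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)

y = g - g_prev
beta_prp = g @ y / (g_prev @ g_prev)            # formula (4)
theta = g @ d_prev / (g_prev @ g_prev)

d_ctprp = -(1.0 + beta_prp * (g @ d_prev) / (g @ g)) * g + beta_prp * d_prev
d_ztprp = -g + beta_prp * d_prev - theta * y

# both residuals vanish up to rounding: g^T d = -||g||^2
lhs_ct, lhs_zt = g @ d_ctprp, g @ d_ztprp
```

The identity is algebraic, not asymptotic: in \(d^{\mathrm{CTPRP}}\) the extra scaling of \(-g_{k}\) cancels the \(\beta_{k}^{\mathrm{PRP}}g_{k}^{\top}d_{k-1}\) term, and in \(d^{\mathrm{ZTPRP}}\) the \(\theta_{k}g_{k}^{\top}y_{k-1}\) term does the same.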

An attractive feature of the CTPRP method and the ZTPRP method is that they satisfy \(g_{k}^{\top}d_{k}=-\|g_{k}\|^{2}\), independently of the line search used. Moreover, these two methods are globally convergent if a modified Armijo-type line search or a strong Wolfe line search is used, and the numerical results in [9, 10] show some potential advantages of the proposed methods. In addition, Yu et al. [11] proposed another variant of the PRP method, denoted YTPRP, whose direction is defined by

$$d_{k}^{\mathrm{YTPRP}}=-g_{k}+\beta_{k}^{\mathrm{YPRP}}d_{k-1}, \quad \forall k\geq1, $$

where

$$\beta_{k}^{\mathrm{YPRP}}=\beta_{k}^{\mathrm{PRP}}-C \frac{\|y_{k-1}\| ^{2}g_{k}^{\top}d_{k-1}}{\|g_{k-1}\|^{4}}\quad \text{and}\quad C>\frac{1}{4}. $$

An attractive feature of the direction \(d_{k}^{\mathrm{YTPRP}}\) is that it satisfies \(g_{k}^{\top}d_{k}\leq-(1-1/(4C))\|g_{k}\|^{2}\), which is also independent of the line search used.

Note that the global convergence of the above three methods is established under an Armijo-type line search or the strong Wolfe line search. It is well known that the step size generated by the Armijo line search may approach zero, in which case the reduction of the objective function is very small; this slows down the optimization process. The strong Wolfe line search avoids this phenomenon, but as the parameter \(\sigma\rightarrow0^{+}\) it approaches the exact line search, so its computational cost increases heavily. The Wolfe line search also avoids this phenomenon while requiring less computation than the strong Wolfe line search to find a suitable step size at each iteration. Therefore, the Wolfe line search can enhance the efficiency of the conjugate gradient method.

In this paper, we investigate some variations of the PRP method under a Wolfe line search. Specifically, we slightly modify \(\beta_{k}^{\mathrm{PRP}}\) and propose three modified PRP methods based on the search directions \(d_{k}^{\mathrm{CTPRP}}\), \(d_{k}^{\mathrm{ZTPRP}}\), and \(d_{k}^{\mathrm{YTPRP}}\), which possess not only the sufficient descent property for any line search but also global convergence with a Wolfe line search. The remainder of the paper is organized as follows: In Section 2, we propose the modified PRP methods and prove their convergence. In Section 3, we present some numerical results using the test problems in [17]. Section 4 concludes the paper with final remarks.

Three modified PRP methods

First, we give the following basic assumptions as regards the objective function \(f(x)\).

Assumptions

  1. (H1)

    The level set \(R_{0}=\{x|f(x)\leq f(x_{0})\}\) is bounded.

  2. (H2)

    In some neighborhood N of \(R_{0}\), the gradient \(g(x)\) is Lipschitz continuous, i.e., there exists a constant \(L>0\) such that

    $$\bigl\Vert g(x)-g(y)\bigr\Vert \leq L\|x-y\|, \quad \text{for any } x, y\in N. $$

Assumptions (H1) and (H2) imply that there exist positive constants γ and B such that

$$ \bigl\Vert g(x)\bigr\Vert \leq\gamma,\quad \forall x\in R_{0} $$
(9)

and

$$ \|x-y\|\leq B,\quad \forall x,y\in R_{0}. $$
(10)

Recently, Wei et al. [18] proposed a variation of the FR method which we call the VFR method, in which the parameter \(\beta_{k}\) is defined by

$$\beta_{k}^{\mathrm{VFR}}=\frac{\mu_{1}\|g_{k}\|^{2}}{\mu_{2}|g_{k}^{\top}d_{k-1}|+\mu _{3}\|g_{k-1}\|^{2}}, $$

where \(\mu_{1}\in(0,+\infty)\), \(\mu_{2}\in(\mu_{1}+\epsilon_{1},+\infty)\), \(\mu_{3}\in(0,+\infty)\), and \(\epsilon_{1}\) is any given positive constant. An attractive feature of the VFR method is that the sufficient descent condition \(g_{k}^{\top}d_{k}\leq-(1-\frac{\mu_{1}}{\mu_{2}})\|g_{k}\|^{2}\) always holds, independently of the line search used. The idea of Wei et al. [18] was further extended to the Wei-Yao-Liu method by Dai and Wen [19]. Here, motivated by the ideas of Wei et al. [18] and Dai and Wen [19], we construct two modified PRP methods, in which the parameter \(\beta_{k}\) is specified as follows:

$$ \beta_{k}^{\mathrm{MPRP}}=\frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\mu |g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}} $$
(11)

or

$$ \beta_{k}^{\mathrm{MPRP+}}=\max \biggl\{ \frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}},0 \biggr\} , $$
(12)

where \(\mu\geq0\) is a constant. Obviously, if \(\mu=0\) or the line search is exact, the new parameter \(\beta_{k}^{\mathrm{MPRP}}\) or \(\beta _{k}^{\mathrm{MPRP+}}\) reduces to the classical parameter \(\beta _{k}^{\mathrm{PRP}}\) in [2] or \(\beta_{k}^{\mathrm{PRP+}}\) in [3].
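A direct transcription of (11) and (12) makes the reduction to \(\beta_{k}^{\mathrm{PRP}}\) at \(\mu=0\) explicit (our own sketch with hypothetical helper names; NumPy assumed):

```python
import numpy as np

def beta_mprp(g, g_prev, d_prev, mu):
    # formula (11); for mu = 0 this is exactly beta_PRP of (4)
    return g @ (g - g_prev) / (mu * abs(g @ d_prev) + g_prev @ g_prev)

def beta_mprp_plus(g, g_prev, d_prev, mu):
    # formula (12): the non-negative truncation, analogous to beta_PRP+
    return max(beta_mprp(g, g_prev, d_prev, mu), 0.0)

rng = np.random.default_rng(1)
g_prev, g, d_prev = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
beta_prp = g @ (g - g_prev) / (g_prev @ g_prev)   # classical PRP value for comparison
```

For \(\mu>0\), the extra term \(\mu|g_{k}^{\top}d_{k-1}|\) only enlarges the denominator, so \(|\beta_{k}^{\mathrm{MPRP}}|\leq|\beta_{k}^{\mathrm{PRP}}|\).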

First, using the parameter \(\beta_{k}^{\mathrm{MPRP}}\) and the direction \(d_{k}^{\mathrm{CTPRP}}\), we present the following conjugate gradient method (denoted the TMPRP1 method).

TMPRP1 method

(Two-term modified PRP method)

Step 0.:

Give an initial point \(x_{0}\in\mathcal{R}^{n}\), \(\mu\geq0\), \(0<\rho<\sigma<1\), and set \(d_{0}=-g_{0}\), \(k:=0\).

Step 1.:

If \(\|g_{k}\|=0\) then stop; otherwise go to Step 2.

Step 2.:

Compute \(d_{k}\) by

$$ d_{k}=\left \{ \begin{array}{l@{\quad}l} -g_{k},& \text{if } k=0, \\ - (1+\beta_{k}^{\mathrm{MPRP}}\frac{g_{k}^{\top}d_{k-1}}{\|g_{k}\| ^{2}} )g_{k}+\beta_{k}^{\mathrm{MPRP}}d_{k-1},& \text{if } k\geq1. \end{array} \right . $$
(13)

Determine the step size \(\alpha_{k}\) by Wolfe line search (5).

Step 3.:

Set \(x_{k+1}=x_{k}+\alpha_{k} d_{k}\), and \(k:=k+1\); go to Step 1.
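Steps 0-3 of the TMPRP1 method can be sketched as follows (our own Python/NumPy transcription, not the authors' Matlab code; the bisection/expansion Wolfe search is a simple stand-in for the routine adapted from [25]):

```python
import numpy as np

def wolfe_search(f, grad, x, d, rho=0.1, sigma=0.5, max_iter=60):
    # search for a step satisfying the Wolfe conditions (5)
    alpha, lo, hi = 1.0, 0.0, np.inf
    fx, slope = f(x), grad(x) @ d           # slope < 0 for a descent direction
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + rho * alpha * slope:
            hi = alpha                      # sufficient decrease fails: shrink
        elif grad(x + alpha * d) @ d < sigma * slope:
            lo = alpha                      # curvature fails: enlarge
        else:
            return alpha
        alpha = 2.0 * alpha if hi == np.inf else 0.5 * (lo + hi)
    return alpha

def tmprp1(f, grad, x0, mu=1e-4, tol=1e-5, max_iter=1000):
    x = np.asarray(x0, dtype=float)         # Step 0
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:        # Step 1
            break
        alpha = wolfe_search(f, grad, x, d) # Wolfe line search (5)
        x = x + alpha * d                   # Step 3
        g_new = grad(x)
        if np.linalg.norm(g_new) <= tol:
            break
        beta = g_new @ (g_new - g) / (mu * abs(g_new @ d) + g @ g)  # (11)
        # direction (13); it satisfies g^T d = -||g||^2 by construction
        d = -(1.0 + beta * (g_new @ d) / (g_new @ g_new)) * g_new + beta * d
        g = g_new
    return x

# demo on the strictly convex quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star = tmprp1(f, grad, np.array([3.0, -2.0]))
```

Because \(g_{k}^{\top}d_{k}=-\|g_{k}\|^{2}\) holds for every iterate, the line search always receives a descent direction, regardless of how inexact the previous step was.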

Similarly, using the parameter \(\beta_{k}^{\mathrm{MPRP}}\) and the direction \(d_{k}^{\mathrm{ZTPRP}}\), we present the following conjugate gradient method (denoted the TMPRP2 method).

TMPRP2 method

(Three-term modified PRP method)

Step 0.:

Give an initial point \(x_{0}\in\mathcal{R}^{n}\), \(\mu\geq0\), \(0<\rho<\sigma<1\), and set \(d_{0}=-g_{0}\), \(k:=0\).

Step 1.:

If \(\|g_{k}\|=0\) then stop; otherwise go to Step 2.

Step 2.:

Compute \(d_{k}\) by

$$ d_{k}=\left \{ \begin{array}{l@{\quad}l} -g_{k},& \text{if } k=0, \\ -g_{k}+\beta_{k}^{\mathrm{MPRP}}d_{k-1}-\vartheta_{k} y_{k-1}, & \text{if } k\geq1, \end{array} \right . $$
(14)

where \(\vartheta_{k}=g_{k}^{\top}d_{k-1}/(\|g_{k-1}\|^{2}+\mu|g_{k}^{\top}d_{k-1}|)\). Determine the step size \(\alpha_{k}\) by Wolfe line search (5).

Step 3.:

Set \(x_{k+1}=x_{k}+\alpha_{k} d_{k}\), and \(k:=k+1\); go to Step 1.

Using a parameter similar to \(\beta_{k}^{\mathrm{YPRP}}\), we present the following conjugate gradient method (denoted the TMPRP3 method).

TMPRP3 method

(Three-term descent PRP method)

Step 0.:

Give an initial point \(x_{0}\in\mathcal{R}^{n}\), \(\mu\geq0\), \(t>1\), \(0<\rho<\sigma<1\), and set \(d_{0}=-g_{0}\), \(k:=0\).

Step 1.:

If \(\|g_{k}\|=0\) then stop; otherwise go to Step 2.

Step 2.:

Compute \(d_{k}\) by

$$ d_{k}=\left \{ \begin{array}{l@{\quad}l} -g_{k}, &\text{if } k=0, \\ -g_{k}+\beta_{k}^{\mathrm{VPRP}}d_{k-1}+\nu_{k} (y_{k-1}-s_{k-1}), &\text{if } k\geq1, \end{array} \right . $$
(15)

where

$$ \begin{aligned} &\beta_{k}^{\mathrm{VPRP}}=\frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\mu |g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}}-t \frac{\|y_{k-1}\|^{2}g_{k}^{\top}d_{k-1}}{(\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}}, \\ &\nu_{k}=\frac{g_{k}^{\top}d_{k-1}}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}}. \end{aligned} $$
(16)

Determine the step size \(\alpha_{k}\) by Wolfe line search (5).

Step 3.:

Set \(x_{k+1}=x_{k}+\alpha_{k} d_{k}\), and \(k:=k+1\); go to Step 1.

Remark 2.1

If the constant \(\mu=0\), then the TMPRP1 method and TMPRP2 method reduce to the methods proposed by Cheng [9] and Zhang et al. [10], respectively, and the TMPRP3 method reduces to a method similar to that proposed by Yu et al. [20].

Remark 2.2

Obviously, if the line search is exact, then the direction generated by (13) or (14) or (15) reduces to (3) with \(\beta _{k}=\beta_{k}^{\mathrm{PRP}}\). Therefore, in the following, we assume that \(\mu>0\).

Remark 2.3

From (13) and (14), we can easily obtain

$$ g_{k}^{\top}d_{k}=- \|g_{k}\|^{2}\quad \text{and} \quad \|g_{k}\|\leq \| d_{k}\|. $$
(17)

This indicates that the TMPRP1 method and the TMPRP2 method satisfy the sufficient descent property. In addition, from the following lemma, we can see that the TMPRP3 method also satisfies this property.
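The identity in (17) can be confirmed numerically: for arbitrary vectors and any \(\mu>0\), the directions (13) and (14) give \(g_{k}^{\top}d_{k}=-\|g_{k}\|^{2}\) exactly (our own sketch; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.5
g_prev, g, d_prev = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)

D = mu * abs(g @ d_prev) + g_prev @ g_prev     # shared denominator
beta = g @ (g - g_prev) / D                    # beta_MPRP, formula (11)
theta = g @ d_prev / D                         # vartheta_k in (14)

d13 = -(1.0 + beta * (g @ d_prev) / (g @ g)) * g + beta * d_prev   # (13)
d14 = -g + beta * d_prev - theta * (g - g_prev)                    # (14)

# both residuals vanish up to rounding: g^T d + ||g||^2 = 0
err13 = g @ d13 + g @ g
err14 = g @ d14 + g @ g
```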

Lemma 2.1

Let \(\{x_{k}\}\) and \(\{d_{k}\}\) be generated by the TMPRP3 method, then we have

$$ g_{k}^{\top}d_{k}\leq- \biggl(1- \frac{1}{t} \biggr)\|g_{k}\|^{2}. $$
(18)

Proof

We have from (15) and (16)

$$\begin{aligned} g_{k}^{\top}d_{k} =&-\|g_{k} \|^{2}+ \biggl(\frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}}-t\frac{\|y_{k-1}\|^{2}g_{k}^{\top}d_{k-1}}{(\mu |g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}} \biggr)g_{k}^{\top}d_{k-1} \\ &{}+\frac{g_{k}^{\top}d_{k-1}}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\| ^{2}}\bigl(g_{k}^{\top}y_{k-1}-g_{k}^{\top}s_{k-1}\bigr) \\ \leq&-\|g_{k}\|^{2}+2\frac{g_{k}^{\top}y_{k-1}g_{k}^{\top}d_{k-1}}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2}}-t \frac{\|y_{k-1}\|^{2}(g_{k}^{\top}d_{k-1})^{2}}{(\mu |g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}} \\ &{}-\frac{\alpha_{k-1}(g_{k}^{\top}d_{k-1})^{2}}{\mu|g_{k}^{\top}d_{k-1}|+\| g_{k-1}\|^{2}} \\ \leq&-\|g_{k}\|^{2}+2 \biggl(\frac{1}{\sqrt{t}}g_{k} \biggr)^{\top}\biggl(\frac {\sqrt{t}g_{k}^{\top}d_{k-1}}{\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\| ^{2}}y_{k-1} \biggr)-t \frac{\|y_{k-1}\|^{2}(g_{k}^{\top}d_{k-1})^{2}}{(\mu |g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}} \\ \leq&-\|g_{k}\|^{2}+\frac{1}{t}\|g_{k} \|^{2}+t\frac{\|y_{k-1}\|^{2}(g_{k}^{\top}d_{k-1})^{2}}{(\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}}-t\frac{\|y_{k-1}\| ^{2}(g_{k}^{\top}d_{k-1})^{2}}{(\mu|g_{k}^{\top}d_{k-1}|+\|g_{k-1}\|^{2})^{2}} \\ =&- \biggl(1-\frac{1}{t} \biggr)\|g_{k}\|^{2}, \end{aligned}$$

which shows that (18) holds for all \(k\geq1\) since \(t>1\); for \(k=0\), \(g_{0}^{\top}d_{0}=-\|g_{0}\|^{2}\) also satisfies (18). This completes the proof. □
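Since the proof of Lemma 2.1 only uses the Cauchy-Schwarz inequality and \(2uv\leq u^{2}+v^{2}\), the bound (18) actually holds for arbitrary vectors, not just CG iterates. A randomized check (our own sketch; NumPy assumed, with \(s_{k-1}=\alpha_{k-1}d_{k-1}\) for a positive step size):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, t = 0.1, 2.0
violation = -np.inf
for _ in range(1000):
    g_prev, g, d_prev = rng.normal(size=6), rng.normal(size=6), rng.normal(size=6)
    alpha_prev = rng.uniform(0.01, 2.0)
    y = g - g_prev
    s = alpha_prev * d_prev
    D = mu * abs(g @ d_prev) + g_prev @ g_prev            # common denominator
    beta = g @ y / D - t * (y @ y) * (g @ d_prev) / D**2  # beta_VPRP, (16)
    nu = g @ d_prev / D                                   # nu_k, (16)
    d = -g + beta * d_prev + nu * (y - s)                 # direction (15)
    # (18) requires g^T d <= -(1 - 1/t)||g||^2; record the worst excess
    violation = max(violation, g @ d + (1.0 - 1.0 / t) * (g @ g))
```

A non-positive `violation` over all trials is consistent with (18).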

Remark 2.4

From the proof of Lemma 2.1, we can see that if the term \(s_{k-1}\) in \(d_{k}\) is deleted, then the above sufficient descent property still holds.

The global convergence proofs of the above three methods are similar; here we only prove the global convergence of the TMPRP1 method, since the arguments for the other two methods are analogous.

The following lemma, called the Zoutendijk condition, is often used to prove the global convergence of conjugate gradient methods. It was originally given by Zoutendijk in [21].

Lemma 2.2

Suppose that \(x_{0}\) is a starting point for which assumptions (H1) and (H2) hold. Consider any method in the form of (2), where \(d_{k}\) is a descent direction and \(\alpha_{k}\) satisfies the Wolfe condition (5) or the strong Wolfe condition (6). Then we have

$$\sum_{k=0}^{\infty}\frac{(g_{k}^{\top}d_{k})^{2}}{\|d_{k}\|^{2}}< +\infty. $$

This together with (17) shows that

$$ \sum_{k=0}^{\infty}\frac{\|g_{k}\|^{4}}{\|d_{k}\|^{2}}< +\infty. $$
(19)

Definition 2.1

The function \(f(x)\) is said to be uniformly convex on \(\mathcal{R}^{n}\), if there is a positive constant m such that

$$d^{\top}\nabla^{2}f(x)d\geq m\|d\|^{2},\quad \forall x,d\in\mathcal{R}^{n}, $$

where \(\nabla^{2}f(x)\) is the Hessian matrix of the function \(f(x)\).

Now we prove the strong global convergence of the TMPRP1 method for uniformly convex functions.

Lemma 2.3

Let the sequences \(\{x_{k}\}\) and \(\{d_{k}\}\) be generated by TMPRP1 method, and the function \(f(x)\) be uniformly convex, then we have

$$ c_{1}\alpha_{k}\|d_{k} \|^{2}\leq-g_{k}^{\top}d_{k}, $$
(20)

where \(c_{1}=(1-\rho)^{-1}m/2\).

Proof

See Lemma 2.1 in [22]. □

The proof of the following theorem is similar to that of Theorem 2.1 in [22]. For completeness, we give the proof.

Theorem 2.1

Suppose that the assumptions (H1) and (H2) hold, and \(f(x)\) is uniformly convex, then we have

$$\lim_{k\rightarrow\infty}\|g_{k}\|=0. $$

Proof

From (11), (20), and (H2), we have

$$ \bigl\vert \beta_{k}^{\mathrm{MPRP}}\bigr\vert \leq\biggl\vert \frac{g_{k}^{\top}(g_{k}-g_{k-1})}{\|g_{k-1}\|^{2}}\biggr\vert \leq\frac{L\alpha _{k-1}\|g_{k}\|\|d_{k-1}\|}{-g_{k-1}^{\top}d_{k-1}}\leq\frac{L}{c_{1}} \frac {\|g_{k}\|}{\|d_{k-1}\|}. $$

This together with (13) shows that

$$\begin{aligned} \|d_{k}\| \leq&\|g_{k}\|+\bigl\vert \beta_{k}^{\mathrm{MPRP}}\bigr\vert \frac{\|g_{k}\|\|d_{k-1}\|}{\|g_{k}\| ^{2}} \|g_{k}\|+\bigl\vert \beta_{k}^{\mathrm{MPRP}}\bigr\vert \|d_{k-1}\| \\ \leq&\|g_{k}\|+\frac{2L}{c_{1}}\|g_{k}\| \\ =& \biggl(1+\frac{2L}{c_{1}} \biggr)\|g_{k}\|. \end{aligned}$$

Then, letting \(\sqrt{A}=1+\frac{2L}{c_{1}}\), we get \(\|d_{k}\|^{2}\leq A\| g_{k}\|^{2}\). So, by (19), we get

$$\lim_{k\rightarrow\infty}\|g_{k}\|^{2}=\lim _{k\rightarrow\infty}\frac{\| g_{k}\|^{4}}{\|g_{k}\|^{2}}\leq A\lim_{k\rightarrow\infty} \frac{\|g_{k}\|^{4}}{\| d_{k}\|^{2}}=0. $$

This completes the proof. □

We now investigate the global convergence of the TMPRP1 method with the Wolfe line search (5) for nonconvex functions. In the remainder of this section, we replace \(\beta_{k}^{\mathrm{MPRP}}\) with \(\beta_{k}^{\mathrm{MPRP+}}\) in (13).

The next lemma corresponds to Lemma 4.3 in [23] and Theorem 3.2 in [24].

Lemma 2.4

Suppose that assumptions (H1) and (H2) hold. Let \(\{ x_{k}\}\) be the sequence generated by TMPRP1 method. If there exists a constant \(\varepsilon>0\) such that \(\|g_{k}\| \geq\varepsilon\) for all \(k\geq0\), then we have

$$ \sum_{k=0}^{\infty}\|u_{k+1}-u_{k}\|^{2}< +\infty, $$
(21)

where \(u_{k}=d_{k}/\|d_{k}\|\).

Proof

From (17) and \(\|g_{k}\|\geq\varepsilon\) for all k, we have \(\|d_{k}\|>0\) for all k. Therefore, \(u_{k}\) is well defined. Define

$$r_{k}=-\frac{ (1+\beta_{k}^{\mathrm{MPRP+}}\frac{g_{k}^{\top}d_{k-1}}{\| g_{k}\|^{2}} )}{\|d_{k}\|}g_{k} \quad \text{and}\quad \delta_{k}=\beta_{k}^{\mathrm {MPRP+}}\frac{\|d_{k-1}\|}{\|d_{k}\|}. $$

Then we have

$$u_{k}=r_{k}+\delta_{k} u_{k-1}. $$

Since \(u_{k-1}\) and \(u_{k}\) are unit vectors, we can write

$$\|r_{k}\|=\|u_{k}-\delta_{k} u_{k-1}\|=\|\delta_{k} u_{k}-u_{k-1}\|. $$

Noting that \(\delta_{k}\geq0\), we get

$$ \|u_{k}-u_{k-1}\|\leq\bigl\Vert (1+ \delta_{k}) (u_{k}-u_{k-1})\bigr\Vert \leq\| u_{k}-\delta_{k} u_{k-1}\|+\|\delta_{k} u_{k}-u_{k-1} \|=2\|r_{k}\|. $$
(22)

From (10), (11), and (H2), we have

$$ \bigl\vert \beta_{k}^{\mathrm{MPRP+}}\bigr\vert \frac{|g_{k}^{\top}d_{k-1}|}{\| g_{k}\|^{2}}\leq\frac{\|g_{k}\|LB}{\mu|g_{k}^{\top}d_{k-1}|}\frac{|g_{k}^{\top}d_{k-1}|}{\|g_{k}\|^{2}}\leq\frac{LB}{\varepsilon\mu}. $$
(23)

From (9), (10), and (23), it follows that there exists a constant \(M_{1}\geq0\) such that

$$ \biggl\Vert - \biggl(1+\beta_{k}^{\mathrm{MPRP+}} \frac{g_{k}^{\top}d_{k-1}}{\|g_{k}\|^{2}} \biggr)g_{k}\biggr\Vert \leq\|g_{k}\|+ \frac{LB}{\varepsilon\mu }\gamma\leq\gamma+\frac{LB}{\varepsilon\mu}\gamma\doteq M_{1}. $$
(24)

Thus, from (19) and (24), we get

$$\sum_{k=0}^{\infty}\|r_{k} \|^{2}\leq\sum_{k=0}^{\infty}\frac{M_{1}^{2}}{\|d_{k}\| ^{2}}=\sum_{k=0}^{\infty}\frac{M_{1}^{2}}{\|g_{k}\|^{4}}\frac{\|g_{k}\|^{4}}{\|d_{k}\| ^{2}}\leq\frac{M_{1}^{2}}{\varepsilon^{4}}\sum _{k=0}^{\infty}\frac{\|g_{k}\|^{4}}{\| d_{k}\|^{2}}< +\infty, $$

which together with (22) completes the proof. □

The following theorem establishes the global convergence of the TMPRP1 method with Wolfe line search (5) for general nonconvex functions. The proof is analogous to that of Theorem 3.2 in [24].

Theorem 2.2

Let the assumptions (H1) and (H2) hold. Then the sequence \(\{x_{k}\}\) generated by TMPRP1 method satisfies

$$ \liminf_{k\rightarrow\infty}\|g_{k}\|=0. $$
(25)

Proof

Assume that the conclusion (25) is not true. Then there exists a constant \(\varepsilon>0\) such that

$$\|g_{k}\|\geq\varepsilon, \quad \forall k\geq0. $$

The proof is divided into the following two steps.

Step I. A bound on the steps \(s_{k}\). We observe that for any \(l\geq k\),

$$ x_{l}-x_{k}=\sum _{j=k}^{l-1}(x_{j+1}-x_{j})=\sum _{j=k}^{l-1}\| s_{j} \|u_{j}=\sum_{j=k}^{l-1} \|s_{j}\|u_{k}+\sum_{j=k}^{l-1} \|s_{j}\|(u_{j}-u_{k}), $$
(26)

where \(s_{j}=x_{j+1}-x_{j}\) and \(u_{k}\) is defined in Lemma 2.4. Using the triangle inequality and \(\|u_{k}\|=1\), we can write (26) as

$$ \sum_{j=k}^{l-1} \|s_{j}\|\leq\|x_{l}-x_{k}\|+\sum _{j=k}^{l-1}\|s_{j}\|\|u_{j}-u_{k} \|\leq B+\sum_{j=k}^{l-1}\|s_{j}\| \|u_{j}-u_{k}\|. $$
(27)

Let Δ be an arbitrary but fixed positive integer. It follows from Lemma 2.4 that there is an index \(k_{\Delta}\) such that

$$ \sum_{i\geq k_{\Delta}}\|u_{i+1}-u_{i} \|^{2}\leq\frac {1}{4\Delta}. $$
(28)

If \(j>k\geq k_{\Delta}\) with \(j-k\leq\Delta\), then by (28) and Cauchy-Schwarz inequality, we have

$$\begin{aligned} \|u_{j}-u_{k}\| \leq&\sum_{i=k}^{j-1} \|u_{i+1}-u_{i}\| \\ \leq& \Biggl((j-k)\sum_{i=k}^{j-1} \|u_{i+1}-u_{i}\|^{2} \Biggr)^{\frac {1}{2}} \\ \leq& \biggl(\Delta\frac{1}{4\Delta} \biggr)^{\frac{1}{2}}= \frac{1}{2}. \end{aligned}$$

Combining this with (27) yields

$$ \sum_{j=k}^{l-1} \|s_{j}\|\leq2B, $$
(29)

where \(l>k\geq k_{\Delta}\) with \(l-k\leq\Delta\).

Step II. A bound on the direction \(d_{k}\). From (13) and (24), we have

$$\begin{aligned} \|d_{k}\|^{2} \leq& \biggl(\biggl\| - \biggl(1+ \beta_{k}^{\mathrm{MPRP+}}\frac{g_{k}^{\top}d_{k-1}}{\|g_{k}\|^{2}} \biggr)g_{k}\biggr\| + \bigl\vert \beta_{k}^{\mathrm{MPRP+}}\bigr\vert \|d_{k-1}\| \biggr)^{2} \\ \leq& \bigl(M_{1}+\bigl\vert \beta_{k}^{\mathrm{MPRP+}} \bigr\vert \|d_{k-1}\| \bigr)^{2} \\ \leq&2M_{1}^{2}+2\bigl(\beta_{k}^{\mathrm{MPRP+}} \bigr)^{2}\|d_{k-1}\|^{2} \\ \leq&2M_{1}^{2}+\frac{2L^{2}\gamma^{2}\|s_{k-1}\|^{2}}{\varepsilon^{4}}\|d_{k-1} \|^{2}. \end{aligned}$$

By the same argument as in Case III of the proof of Theorem 3.2 in [3], we obtain the conclusion (25). This completes the proof. □

Remark 2.5

From Theorem 2.2, we can see that the TMPRP1 method possesses better convergence properties than the CTPRP method in [9]: the TMPRP1 method converges globally for nonconvex minimization problems with a Wolfe line search, whereas the CTPRP method requires a strong Wolfe line search. We also note that the term \(\mu|g_{k}^{\top}d_{k-1}|\) in the denominator of (11) plays an important role in the proof of Lemma 2.4.

Numerical results

In this section, we present some numerical results to compare the performance of the TMPRP1 method, the CG_DESCENT method in [24] and the DTPRP method in [19].

  • TMPRP1: the TMPRP1 method with Wolfe line search (5), with \(\mu=10^{-4}\), \(\rho=0.1\), \(\sigma=0.5\);

  • CG_DESCENT: the CG_DESCENT method with Wolfe line search (5), with \(\rho=0.1\), \(\sigma=0.5\);

  • DTPRP: the DTPRP method with Wolfe line search (5), with \(\mu =1.2\), \(\rho=0.1\), \(\sigma=0.5\).

All codes were written in Matlab 7.1 and run on a portable computer. We stopped the iteration if the number of iterations exceeded 1,000 or \(\| g_{k}\|<10^{-5}\). We use some test problems in [17] with different dimensions. Our numerical results are listed in the form NI/NF/CPU, where the symbols NI, NF, and CPU denote the number of iterations, the number of function evaluations, and the CPU time in seconds, respectively; ‘F’ means the method failed. The code of the Wolfe line search (5) is adapted from [25]. In Figures 1 and 2, we adopt the performance profiles of Dolan and Moré [12] to compare the CPU time of the TMPRP1 method, the CG_DESCENT method, and the DTPRP method. That is, for each method, we plot the fraction P of problems for which the method is within a factor τ of the best time. The left side of each figure gives the percentage of the test problems for which a method is fastest, while the right side gives the percentage of the test problems that are successfully solved by each method. The top curve corresponds to the method that solved the most problems in a time within a factor τ of the best time. From Table 1 and Figures 1 and 2, we can see that the TMPRP1 method performs better than the CG_DESCENT method and the DTPRP method; thus the proposed TMPRP1 method is computationally efficient.

Figure 1. Performance profiles of TMPRP1 and CG_DESCENT with respect to CPU time.

Figure 2. Performance profiles of TMPRP1 and DTPRP with respect to CPU time.

Table 1 The results for the methods on the tested problems
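The Dolan-Moré performance profile used in Figures 1 and 2 can be computed in a few lines (our own sketch, not the authors' code; NumPy assumed, with solver failures encoded as `np.inf`):

```python
import numpy as np

def performance_profile(times, taus):
    # times[i, s]: CPU time of solver s on problem i; np.inf marks 'F'
    best = times.min(axis=1, keepdims=True)   # best time per problem
    ratios = times / best                     # r_{i,s} in Dolan-More
    # P_s(tau): fraction of problems solved within a factor tau of the best
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# toy data: solver 0 is fastest on 2 of 3 problems, solver 1 fails once
times = np.array([[1.0, 2.0],
                  [3.0, 1.5],
                  [2.0, np.inf]])
P = performance_profile(times, taus=[1.0, 2.0, 4.0])
```

The value at \(\tau=1\) is the fraction of problems on which a solver is fastest, and the limit for large τ is its overall success rate, matching the reading of the figures described above.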

Conclusion

This paper proposed three modified PRP conjugate gradient methods, which improve some recently proposed PRP conjugate gradient methods. The global convergence of the proposed methods is established under the Wolfe line search, and their effectiveness has been shown by some numerical examples. We find that the performance of the TMPRP1 method is related to the parameter μ in \(\beta_{k}^{\mathrm{MPRP}}\); therefore, how to choose a suitable parameter μ deserves further investigation.

References

  1. Fletcher, R, Reeves, CM: Function minimization by conjugate gradients. Comput. J. 7, 149-154 (1964)

  2. Polak, E, Ribière, G: Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Oper. 16, 35-43 (1969)

  3. Gilbert, JC, Nocedal, J: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21-42 (1992)

  4. Liu, YL, Storey, CS: Efficient generalized conjugate gradient algorithms, part 1: theory. J. Optim. Theory Appl. 69, 129-137 (1991)

  5. Dai, YH, Yuan, YX: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10, 177-182 (2000)

  6. Hestenes, MR, Stiefel, EL: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409-432 (1952)

  7. Fletcher, R: Practical Methods of Optimization. Volume 1: Unconstrained Optimization. Wiley, New York (1987)

  8. Grippo, L, Lucidi, S: A globally convergent version of the Polak-Ribière gradient method. Math. Program. 78, 375-391 (1997)

  9. Cheng, WY: A two-term PRP-based descent method. Numer. Funct. Anal. Optim. 28, 1217-1230 (2007)

  10. Zhang, L, Zhou, WJ, Li, DH: A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26, 629-640 (2006)

  11. Yu, GH, Guan, LT, Chen, WF: Spectral conjugate gradient methods with sufficient descent property for large-scale unconstrained optimization. Optim. Methods Softw. 23, 275-293 (2008)

  12. Dolan, ED, Moré, JJ: Benchmarking optimization software with performance profiles. Math. Program. 91, 201-213 (2002)

  13. Wei, ZX, Li, G, Qi, LQ: Global convergence of the Polak-Ribière-Polyak conjugate gradient method with inexact line searches for non-convex unconstrained optimization problems. Math. Comput. 77, 2173-2193 (2008)

  14. Li, G, Tang, CM, Wei, ZX: New conjugacy condition and related new conjugate gradient methods for unconstrained optimization. J. Comput. Appl. Math. 202, 523-539 (2007)

  15. Yu, G, Guan, L, Li, G: Global convergence of modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property. J. Ind. Manag. Optim. 3, 565-579 (2008)

  16. Dai, YH, Kou, CX: A nonlinear conjugate gradient algorithm with an optimal property and an improved Wolfe line search. SIAM J. Optim. 23, 296-320 (2013)

  17. Andrei, N: Unconstrained optimization by direct searching (2007). http://camo.ici.ro/neculai/UNO/UNO.FOR

  18. Wei, ZX, Yao, SW, Liu, LY: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183, 1341-1350 (2006)

  19. Dai, ZF, Wen, FH: Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property. Appl. Math. Comput. 218, 7421-7430 (2012)

  20. Yu, GH, Zhao, YL, Wei, ZX: A descent nonlinear conjugate gradient method for large-scale unconstrained optimization. Appl. Math. Comput. 187, 636-643 (2007)

  21. Zoutendijk, G: Nonlinear programming, computational methods. In: Abadie, J (ed.) Integer and Nonlinear Programming, pp. 37-86. North-Holland, Amsterdam (1970)

  22. Dai, ZF, Tian, BS: Global convergence of some modified PRP nonlinear conjugate gradient methods. Optim. Lett. 5(4), 615-630 (2011)

  23. Dai, YH, Liao, LZ: New conjugacy conditions and related nonlinear conjugate gradient methods. Appl. Math. Optim. 43, 87-101 (2001)

  24. Hager, WW, Zhang, HC: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2, 35-58 (2006)

  25. Wu, QZ, Zheng, ZY, Deng, W: Operations Research and Optimization: MATLAB Programming, pp. 66-69. China Machine Press, Beijing (2010)


Acknowledgements

The authors gratefully acknowledge the helpful comments and suggestions of the anonymous reviewers. This work was partially supported by the domestic visiting scholar project funding of Shandong Province outstanding young teachers in higher schools, the foundation of Scientific Research Project of Shandong Universities (No. J13LI03), and the Shandong Province Statistical Research Project (No. 20143038).

Author information


Corresponding author

Correspondence to Min Sun.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

The first author has designed the three methods and the second author has refined them. Both authors have equally contributed in the numerical results. All authors read and approved the final manuscript.

Rights and permissions

Open Access This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.


About this article


Cite this article

Sun, M., Liu, J. Three modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property. J Inequal Appl 2015, 125 (2015). https://doi.org/10.1186/s13660-015-0649-9


Keywords

  • conjugate gradient method
  • sufficient descent property
  • global convergence