Least-squares-based three-term conjugate gradient methods

Abstract

In this paper, we first propose a new three-term conjugate gradient (CG) method, named LSTT, which uses the least-squares technique to determine the CG parameter. We then present two improved variants of the LSTT CG method, designed to achieve global convergence for general nonlinear functions. The least-squares technique used here combines the advantages of two existing efficient CG methods. The search directions produced by all three methods are sufficient descent directions, independent of the line search procedure. Moreover, with the Wolfe–Powell line search, LSTT is proved to be globally convergent for uniformly convex functions, and the two improved variants are globally convergent for general nonlinear functions. Preliminary numerical results illustrate that our methods are efficient and have advantages over two well-known three-term CG methods.

Introduction

Consider the following unconstrained optimization problem:

$$ \min_{x \in\mathbb{R}^{n}} f(x), $$

where \(f: \mathbb{R}^{n}\rightarrow\mathbb{R}\) is a continuously differentiable function whose gradient function is denoted by \(g(x)\).

Conjugate gradient (CG) methods are known to be among the most efficient methods for unconstrained optimization owing to their simple structure, low storage requirements, and good numerical behavior. CG methods have been widely used to solve practical problems, especially large-scale ones such as image recovery [1], condensed matter physics [2], environmental science [3], and unit commitment problems [4–6].

For the current iteration point \(x_{k}\), the CG methods yield the new iterate \(x_{k+1}\) by the formula

$$ x_{k+1}=x_{k}+\alpha_{k}d_{k},\quad k=0, 1, \ldots, $$

where \(\alpha_{k}\) is the stepsize determined by a certain line search and \(d_{k}\) is the so-called search direction in the form of

$$\begin{aligned} d_{k}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k}, & k=0,\\ -g_{k}+\beta_{k} d_{k-1}, & k\geq1, \end{array}\displaystyle \right . \end{aligned}$$

in which \(\beta_{k}\) is a scalar parameter. Different choices of \(\beta_{k}\) correspond to different CG methods. Some classical and well-known choices of \(\beta _{k}\) are:

$$\begin{aligned}& \beta_{k}^{\mathrm{HS}}=\frac{g_{k}^{T}y_{k-1}}{{d_{k-1}^{T}y_{k-1}}}, \quad\text{Hestenes and Stiefel (HS) [7];} \\& \beta_{k}^{\mathrm{FR}}=\frac{ \Vert g_{k} \Vert ^{2}}{ \Vert g_{k-1} \Vert ^{2}}, \quad\text{Fletcher and Reeves (FR) [8];} \\& \beta_{k}^{\mathrm{PRP}}=\frac{g_{k}^{T}y_{k-1}}{ \Vert g_{k-1} \Vert ^{2}}, \quad\text{Polak, Ribière, and Polyak (PRP) [9, 10];} \\& \beta_{k}^{\mathrm{DY}}=\frac{ \Vert g_{k} \Vert ^{2}}{{d_{k-1}^{T}y_{k-1}}}, \quad\text{Dai and Yuan (DY) [11],} \end{aligned}$$

where \(g_{k}=g(x_{k})\), \(y_{k-1}=g_{k}-g_{k-1}\), and \(\|\cdot\|\) denotes the Euclidean norm.
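The four formulas translate directly into code. The following minimal NumPy sketch (the helper name `classical_betas` is our own, not from the paper) evaluates them from \(g_k\), \(g_{k-1}\), and \(d_{k-1}\). The small example uses the special case \(d_{k-1}=-g_{k-1}\) with \(g_k\perp g_{k-1}\), in which \(d_{k-1}^Ty_{k-1}=\|g_{k-1}\|^2\) and \(g_k^Ty_{k-1}=\|g_k\|^2\), so all four values coincide.

```python
import numpy as np


def classical_betas(g, g_prev, d_prev):
    """Return the HS, FR, PRP, and DY values of beta_k.

    g: current gradient g_k; g_prev: previous gradient g_{k-1};
    d_prev: previous search direction d_{k-1}.
    """
    y = g - g_prev                # y_{k-1} = g_k - g_{k-1}
    dy = d_prev @ y               # d_{k-1}^T y_{k-1}
    gg_prev = g_prev @ g_prev     # ||g_{k-1}||^2
    return {
        "HS": (g @ y) / dy,
        "FR": (g @ g) / gg_prev,
        "PRP": (g @ y) / gg_prev,
        "DY": (g @ g) / dy,
    }


# Special case: d_{k-1} = -g_{k-1} and g_k orthogonal to g_{k-1}.
g_prev = np.array([1.0, 0.0])
g = np.array([0.0, 2.0])
betas = classical_betas(g, g_prev, -g_prev)
```

Here every entry of `betas` equals \(\|g_k\|^2/\|g_{k-1}\|^2=4\); for general vectors the four values differ.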

Here are two commonly used line searches for choosing the stepsize \(\alpha_{k}\).

  • The Wolfe–Powell line search: the stepsize \(\alpha _{k}\) satisfies the following two relations:

    $$ f(x_{k}+\alpha_{k}d_{k})-f(x_{k}) \leq\delta\alpha_{k} g_{k}^{T}d_{k} $$
    (1)

    and

    $$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq\sigma g_{k}^{T}d_{k}, $$
    (2)

    where \(0<\delta<\sigma<1\).

  • The strong Wolfe–Powell line search: the stepsize \(\alpha_{k}\) satisfies both (1) and the following relation:

    $$ \bigl\vert g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \bigr\vert \leq\sigma \bigl\vert g_{k}^{T}d_{k} \bigr\vert . $$
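Both line-search tests are easy to state in code. The sketch below (function names are our own) checks a trial stepsize against (1) and (2) and against the strong variant; the defaults \(\delta=0.01\), \(\sigma=0.1\) are the values used in Sect. 5. On \(f(x)=\tfrac12\|x\|^2\) from \(x=(1,1)\) along \(d=-g\), the step \(\alpha=1.9\) satisfies the Wolfe–Powell conditions but not the strong ones, while \(\alpha=0.5\) violates the curvature condition (2).

```python
import numpy as np


def satisfies_wolfe(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """True if alpha satisfies the Wolfe-Powell conditions (1) and (2)."""
    gd = grad(x) @ d                                              # g_k^T d_k
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) - f(x) <= delta * alpha * gd   # (1)
    curvature = grad(x_new) @ d >= sigma * gd                     # (2)
    return bool(sufficient_decrease and curvature)


def satisfies_strong_wolfe(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """True if alpha satisfies (1) and the strong curvature condition."""
    gd = grad(x) @ d
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) - f(x) <= delta * alpha * gd
    strong_curvature = abs(grad(x_new) @ d) <= sigma * abs(gd)
    return bool(sufficient_decrease and strong_curvature)


def f(x):
    return 0.5 * x @ x        # f(x) = ||x||^2 / 2


def grad(x):
    return x


x0 = np.array([1.0, 1.0])
d0 = -grad(x0)                # steepest-descent direction
```

The strong condition is more restrictive because it also rejects steps that overshoot far enough that \(g(x_k+\alpha_kd_k)^Td_k\) becomes large and positive.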

In recent years, based on the above classical formulas and line searches, many variants of CG methods have been proposed, including spectral CG methods [12, 13], hybrid CG methods [14, 15], and three-term CG methods [16, 17]. Among them, three-term CG methods have attracted particular attention, and a great deal of effort has been devoted to developing them; see, e.g., [18–23]. In particular, by combining the PRP method [9, 10] with the BFGS quasi-Newton method [24], Zhang et al. [22] presented a three-term PRP CG method (TTPRP). Their motivation is that the PRP method performs well numerically but is generally not a descent method when an Armijo-type line search is used. The direction of TTPRP is given by

$$ d_{k}^{\mathrm{TTPRP}} =\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{PRP}}d_{k-1} -\theta_{k}^{(1)}y_{k-1}, & \mbox{if } k\geq1, \end{array}\displaystyle \right . $$

where

$$ \theta_{k}^{(1)}=\frac{g_{k}^{T}d_{k-1}}{ \Vert g_{k-1} \Vert ^{2}}, $$
(3)

which is always a descent direction (independent of line searches) for the objective function.

In the same way, Zhang et al. [25] presented a three-term FR CG method (TTFR) whose direction is in the form of

$$ d_{k}^{\mathrm{TTFR}} =\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{FR}}d_{k-1} -\theta_{k}^{(1)}g_{k}, & \mbox{if } k\geq1, \end{array}\displaystyle \right . $$

where \(\theta_{k}^{(1)}\) is given by (3). Later, Zhang et al. [23] proposed a three-term HS CG method (TTHS) whose direction is defined by

$$\begin{aligned} d_{k}^{\mathrm{TTHS}}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{HS}}d_{k-1} -\theta_{k}^{(2)}y_{k-1}, & \mbox{if } k\geq1, \end{array}\displaystyle \right . \end{aligned}$$
(4)

where

$$\theta_{k}^{(2)}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}. $$

The above approaches [22, 23, 25] have a common advantage that the relation \(d_{k}^{T}g_{k}=-\|g_{k}\|^{2}\) holds. This means that they always generate descent directions without the help of line searches. Moreover, they can all achieve global convergence under suitable line searches.
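The identity \(d_k^Tg_k=-\|g_k\|^2\) is easy to confirm numerically. The sketch below is a direct transcription of the TTPRP, TTFR, and TTHS directions for \(k\ge1\) (function names are ours), evaluated on arbitrary test vectors chosen so that \(d_{k-1}^Ty_{k-1}>0\).

```python
import numpy as np


def ttprp(g, g_prev, d_prev):
    """TTPRP direction of Zhang et al. [22] for k >= 1."""
    y = g - g_prev
    gg_prev = g_prev @ g_prev
    theta1 = (g @ d_prev) / gg_prev                  # theta_k^(1), eq. (3)
    return -g + (g @ y) / gg_prev * d_prev - theta1 * y


def ttfr(g, g_prev, d_prev):
    """TTFR direction of Zhang et al. [25] for k >= 1."""
    gg_prev = g_prev @ g_prev
    theta1 = (g @ d_prev) / gg_prev                  # theta_k^(1), eq. (3)
    return -g + (g @ g) / gg_prev * d_prev - theta1 * g


def tths(g, g_prev, d_prev):
    """TTHS direction (4) of Zhang et al. [23] for k >= 1."""
    y = g - g_prev
    dy = d_prev @ y
    theta2 = (g @ d_prev) / dy                       # theta_k^(2)
    return -g + (g @ y) / dy * d_prev - theta2 * y


g = np.array([1.0, -2.0, 0.5])
g_prev = np.array([2.0, 1.0, -1.0])
d_prev = np.array([-2.0, -1.5, 1.0])
# g_k^T d_k for each method; all should equal -||g_k||^2 = -5.25.
checks = {fn.__name__: g @ fn(g, g_prev, d_prev) for fn in (ttprp, ttfr, tths)}
```

In each case the \(\beta_k d_{k-1}\) term and the third term contribute equal and opposite amounts to \(g_k^Td_k\), which is why no line search is needed for descent.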

Before putting forward the idea of our new three-term CG methods, we first briefly review a hybrid CG method (HCG) proposed by Babaie-Kafaki and Ghanbari [26], in which the search direction is in the form of

$$\begin{aligned} d_{k}^{\mathrm{HCG}}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{HCG}}d_{k-1}, & \mbox{if } k\geq1, \end{array}\displaystyle \right . \end{aligned}$$

where the parameter is a convex combination of the PRP and FR formulas:

$$\begin{aligned} \beta_{k}^{\mathrm{HCG}}=(1-\theta_{k})\beta_{k}^{\mathrm{PRP}}+ \theta _{k}\beta_{k}^{\mathrm{FR}},\quad \mbox{with } \theta_{k}\in[0,1]. \end{aligned}$$

The choice of \(\theta_{k}\) is critical for the practical performance of the HCG method. Since the TTHS method has good theoretical properties and numerical performance, Babaie-Kafaki and Ghanbari [26] proposed selecting \(\theta_{k}\) so that the direction \(d_{k}^{\mathrm{HCG}}\) is as close as possible to \(d_{k}^{\mathrm{TTHS}}\) in Euclidean distance, i.e., the optimal choice \(\theta_{k}^{*}\) is obtained by solving the least-squares problem

$$ \theta_{k}^{*}=\arg\min_{\theta_{k}\in[0,1]} \bigl\Vert d_{k}^{\mathrm {HCG}}-d_{k}^{\mathrm{TTHS}} \bigr\Vert ^{2}. $$
(5)

Similarly, Babaie-Kafaki and Ghanbari [27] proposed another hybrid CG method by combining HS with DY, in which the combination coefficient is also determined by the least-squares technique (5). The numerical results in [26, 27] show that this least-squares-based approach is very efficient.
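Problem (5) is a one-dimensional convex quadratic in \(\theta_k\). Writing \(d_k^{\mathrm{HCG}}-d_k^{\mathrm{TTHS}}=r_k+\theta_k w_k\) with \(w_k=(\beta_k^{\mathrm{FR}}-\beta_k^{\mathrm{PRP}})d_{k-1}\) and \(r_k=(\beta_k^{\mathrm{PRP}}-\beta_k^{\mathrm{HS}})d_{k-1}+\theta_k^{(2)}y_{k-1}\), its minimizer over \([0,1]\) is the unconstrained minimizer \(-r_k^Tw_k/\|w_k\|^2\) clipped to the interval. The sketch below is our own derivation for illustration, not the closed form given in [26]; the function names are hypothetical, and a grid search serves only as a cross-check.

```python
import numpy as np


def hcg_objective(theta, g, g_prev, d_prev):
    """Objective of (5): squared distance between d^HCG(theta) and d^TTHS."""
    y = g - g_prev
    dy, gg_prev = d_prev @ y, g_prev @ g_prev
    beta_hcg = (1.0 - theta) * (g @ y) / gg_prev + theta * (g @ g) / gg_prev
    d_hcg = -g + beta_hcg * d_prev
    d_tths = -g + (g @ y) / dy * d_prev - (g @ d_prev) / dy * y
    return float(np.sum((d_hcg - d_tths) ** 2))


def hcg_theta_star(g, g_prev, d_prev):
    """Minimizer of (5) over [0, 1] via the 1-D least-squares formula."""
    y = g - g_prev
    dy, gg_prev = d_prev @ y, g_prev @ g_prev
    beta_prp, beta_fr, beta_hs = (g @ y) / gg_prev, (g @ g) / gg_prev, (g @ y) / dy
    r = (beta_prp - beta_hs) * d_prev + (g @ d_prev) / dy * y   # part independent of theta
    w = (beta_fr - beta_prp) * d_prev                            # coefficient of theta
    ww = w @ w
    if ww == 0.0:
        return 0.0                       # objective does not depend on theta
    return float(np.clip(-(r @ w) / ww, 0.0, 1.0))


g = np.array([1.0, -2.0, 0.5])
g_prev = np.array([2.0, 1.0, -1.0])
d_prev = np.array([-2.0, -1.5, 1.0])
theta = hcg_theta_star(g, g_prev, d_prev)
grid_best = min(hcg_objective(s, g, g_prev, d_prev) for s in np.linspace(0.0, 1.0, 2001))
```

For these particular vectors the unconstrained minimizer lies above 1, so the clipped value \(\theta_k^*=1\) (the pure FR parameter) is returned. Clipping is valid because the objective is a convex quadratic in a single variable.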

Summarizing the above discussion, we make two observations: (1) three-term CG methods perform well both theoretically and numerically; (2) the least-squares technique can greatly improve the efficiency of CG methods. Putting these together, the main goal of this paper is to develop new three-term CG methods based on the least-squares technique. More precisely, we first propose a basic three-term CG method, namely LSTT, in which the least-squares technique combines the advantages of two existing efficient CG methods. With the Wolfe–Powell line search, LSTT is proved to be globally convergent for uniformly convex functions. To obtain global convergence for general nonlinear functions, we further present two improved variants of the LSTT CG method. All three methods generate sufficient descent directions independent of any line search procedure. Finally, some preliminary numerical results are reported to illustrate that our methods are efficient and have advantages over two well-known three-term CG methods.

The paper is organized as follows. In Sect. 2, we present the basic LSTT CG method. Global convergence of LSTT is proved in Sect. 3. Two improved variants of LSTT and their convergence analysis are given in Sect. 4. Numerical results are reported in Sect. 5. Some concluding remarks are made in Sect. 6.

Least-squares-based three-term (LSTT) CG method

In this section, we first derive a new three-term CG formula, and then present the corresponding CG algorithm. Our formula is based on the following modified HS (MHS) formula proposed by Hager and Zhang [28, 29]:

$$ \beta_{k}^{\mathrm{MHS}}(\tau_{k})= \beta_{k}^{\mathrm{HS}}- \tau_{k}\frac { \Vert y_{k-1} \Vert ^{2}g_{k}^{T}d_{k-1}}{(d_{k-1}^{T}y_{k-1})^{2}}, $$
(6)

where \(\tau_{k}\) (≥0) is a parameter. The corresponding direction is then given by

$$\begin{aligned} d_{k}^{\mathrm{MHS}}(\tau_{k})=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k})d_{k-1}, & \mbox{if } k\geq1. \end{array}\displaystyle \right . \end{aligned}$$
(7)

Different choices of \(\tau_{k}\) will lead to different types of CG formulas. In particular, \(\beta_{k}^{\mathrm{MHS}}(0)=\beta_{k}^{\mathrm{HS}}\), and \(\beta_{k}^{\mathrm{MHS}}(2)\) is just the formula proposed in [28].

In this paper, we present a more sophisticated choice of \(\tau_{k}\) by making use of the least-squares technique. More precisely, the optimal choice \(\tau_{k}^{*}\) is determined such that the direction \(d_{k}^{\mathrm{MHS}}\) is as close as possible to \(d_{k}^{\mathrm{TTHS}}\), i.e., it is generated by solving the least-squares problem

$$ \tau_{k}^{*}= \arg\min_{\tau_{k}\in[0,1]} \bigl\Vert d_{k}^{\mathrm {MHS}}(\tau_{k})-d_{k}^{\mathrm{TTHS}} \bigr\Vert ^{2}. $$
(8)

Substituting (4) and (7) into (8), we have

$$ \tau_{k}^{*}=\arg\min_{\tau_{k}\in[0,1]} \biggl\Vert \frac {g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}y_{k-1}-\tau_{k}\frac{ \Vert y_{k-1} \Vert ^{2}g_{k}^{T}d_{k-1}}{(d_{k-1}^{T}y_{k-1})^{2}}d_{k-1} \biggr\Vert ^{2}, $$

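Setting the derivative of this one-dimensional quadratic with respect to \(\tau_{k}\) to zero yields the unconstrained minimizer below (if \(g_{k}^{T}d_{k-1}=0\), the objective vanishes for every \(\tau_{k}\), so this value is still optimal). By the Cauchy–Schwarz inequality it lies in \([0,1]\), hence it also solves (8):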

$$ \tau_{k}^{*}=\frac{(d_{k-1}^{T}y_{k-1})^{2}}{ \Vert y_{k-1} \Vert ^{2} \Vert d_{k-1} \Vert ^{2}}. $$
(9)

Thus, from (6), we obtain

$$ \beta_{k}^{\mathrm{MHS}}\bigl(\tau_{k}^{*} \bigr)=\beta_{k}^{\mathrm{HS}}-\frac {g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}}. $$
(10)
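A quick numerical check of (9) and (10): the sketch below (helper names are our own) evaluates the objective of (8) directly from (4) and (7), and confirms on sample vectors that \(\tau_k^*\) lies in \([0,1]\), is no worse than any point of a fine grid, and turns (6) into (10).

```python
import numpy as np


def beta_mhs(g, y, d_prev, tau):
    """Modified HS parameter (6)."""
    dy = d_prev @ y
    return (g @ y) / dy - tau * (y @ y) * (g @ d_prev) / dy**2


def tau_star(y, d_prev):
    """Closed-form minimizer (9)."""
    return (d_prev @ y) ** 2 / ((y @ y) * (d_prev @ d_prev))


def ls_objective(g, y, d_prev, tau):
    """Objective of (8): squared distance between d^MHS(tau) and d^TTHS."""
    dy = d_prev @ y
    d_mhs = -g + beta_mhs(g, y, d_prev, tau) * d_prev             # (7)
    d_tths = -g + (g @ y) / dy * d_prev - (g @ d_prev) / dy * y   # (4)
    return float(np.sum((d_mhs - d_tths) ** 2))


g = np.array([1.0, -2.0, 0.5])
g_prev = np.array([2.0, 1.0, -1.0])
d_prev = np.array([-2.0, -1.5, 1.0])
y = g - g_prev
t = tau_star(y, d_prev)
grid_best = min(ls_objective(g, y, d_prev, s) for s in np.linspace(0.0, 1.0, 2001))
```

Geometrically, \(\tau_k^*\) projects the \(y_{k-1}\) component of \(d_k^{\mathrm{TTHS}}\) onto the span of \(d_{k-1}\), which is why it equals the squared cosine of the angle between \(d_{k-1}\) and \(y_{k-1}\).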

At this point, the two-term direction \(d_{k}^{\mathrm {MHS}}(\tau_{k}^{*})\) obtained from (9) and (10) may seem “good enough”; however, it is not always a descent direction of the objective function. To overcome this difficulty, we propose a least-squares-based three-term (LSTT) direction by adding a third term to \(d_{k}^{\mathrm {MHS}}(\tau_{k}^{*})\) as follows:

$$\begin{aligned} d_{k}^{\mathrm{LSTT}}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k},& \mbox{if } k=0,\\ -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})d_{k-1}-\theta_{k}y_{k-1}, & \mbox{if } k\geq1, \end{array}\displaystyle \right . \end{aligned}$$
(11)

where

$$ \theta_{k}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}. $$
(12)

The following lemma shows that the direction \(d_{k}^{\mathrm{LSTT}}\) (11) is a sufficient descent direction, which is independent of the line search used.

Lemma 1

Let the search direction \(d_{k}:=d_{k}^{\mathrm{LSTT}}\) be generated by (11). Then it satisfies the following sufficient descent condition:

$$ g_{k}^{T}d_{k}\leq- \Vert g_{k} \Vert ^{2}. $$
(13)

Proof

For \(k=0\), we have \(d_{0}=-g_{0}\), so it follows that \(g_{0}^{T}d_{0}=-\|g_{0}\|^{2}\).

For \(k\geq1\), we have

$$d_{k}=-g_{k}+\beta_{k}^{\mathrm{MHS}}\bigl( \tau_{k}^{*}\bigr)d_{k-1}-\theta_{k}y_{k-1}, $$

which along with (10) and (12) shows that

$$\begin{aligned} g_{k}^{T}d_{k} = & - \Vert g_{k} \Vert ^{2}+ \biggl(\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac {g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}} \biggr)g_{k}^{T}d_{k-1}- \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}g_{k}^{T}y_{k-1} \\ = & - \Vert g_{k} \Vert ^{2}- \frac{(g_{k}^{T}d_{k-1})^{2}}{ \Vert d_{k-1} \Vert ^{2}} \\ \leq& - \Vert g_{k} \Vert ^{2}. \end{aligned}$$

So the proof is completed. □
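The last line of the proof gives the exact value \(g_k^Td_k=-\|g_k\|^2-(g_k^Td_{k-1})^2/\|d_{k-1}\|^2\). The minimal sketch below (the name `lstt_direction` is our own) builds the LSTT direction (11) from (10) and (12) and checks this identity on sample vectors.

```python
import numpy as np


def lstt_direction(g, g_prev, d_prev):
    """LSTT direction (11) for k >= 1."""
    y = g - g_prev
    dy = d_prev @ y                                  # d_{k-1}^T y_{k-1}
    gd = g @ d_prev                                  # g_k^T d_{k-1}
    beta = (g @ y) / dy - gd / (d_prev @ d_prev)     # beta^MHS(tau*), eq. (10)
    theta = gd / dy                                  # eq. (12)
    return -g + beta * d_prev - theta * y


g = np.array([1.0, -2.0, 0.5])
g_prev = np.array([2.0, 1.0, -1.0])
d_prev = np.array([-2.0, -1.5, 1.0])
d = lstt_direction(g, g_prev, d_prev)
predicted = -(g @ g) - (g @ d_prev) ** 2 / (d_prev @ d_prev)   # last line of the proof
```

The extra term \(-(g_k^Td_{k-1})^2/\|d_{k-1}\|^2\) is what makes (13) an inequality rather than the equality \(g_k^Td_k=-\|g_k\|^2\) of TTPRP, TTFR, and TTHS.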

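Now, we formally present the least-squares-based three-term CG algorithm (Algorithm 1), which uses \(d_{k}^{\mathrm{LSTT}}\) (11) as the search direction. Note that it reduces to the classical HS method if an exact line search is executed in Step 3: then \(g_{k}^{T}d_{k-1}=0\), so \(\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})=\beta_{k}^{\mathrm{HS}}\) by (10) and \(\theta_{k}=0\) by (12).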

[Algorithm 1 (figure): Least-squares-based three-term CG algorithm (LSTT)]
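The steps of Algorithm 1 are given as a figure and are not reproduced in this text. As a rough guide, here is a minimal NumPy sketch of how the pieces fit together, assuming the usual layout: start with \(d_0=-g_0\), stop when \(\|g_k\|\le\epsilon\), compute \(\alpha_k\) by a Wolfe–Powell line search, then update \(x_{k+1}\) and build \(d_{k+1}\) from (10) to (12). The bisection line search, the function names, and the quadratic test problem are our own assumptions; the defaults \(\delta=0.01\), \(\sigma=0.1\), \(\epsilon=10^{-6}\) and the 2000-iteration cap mirror Sect. 5.

```python
import numpy as np


def wolfe_bisection(f, grad, x, d, delta=0.01, sigma=0.1, max_iter=100):
    """Find a stepsize satisfying (1) and (2) by bracketing and bisection.

    This simple search is an assumed stand-in; the paper does not specify
    how its Wolfe-Powell stepsizes are computed.
    """
    fx, gd = f(x), grad(x) @ d
    lo, hi, alpha = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + delta * alpha * gd:     # (1) fails: step too long
            hi = alpha
        elif grad(x + alpha * d) @ d < sigma * gd:         # (2) fails: step too short
            lo = alpha
        else:
            return alpha
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return alpha


def lstt(f, grad, x0, eps=1e-6, max_iter=2000):
    """Sketch of Algorithm 1 (LSTT) under an assumed step layout."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                       # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:             # stopping test
            break
        alpha = wolfe_bisection(f, grad, x, d)   # line search step
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        dy = d @ y                               # positive under (2) and descent
        gd = g_new @ d
        beta = (g_new @ y) / dy - gd / (d @ d)   # eq. (10)
        theta = gd / dy                          # eq. (12)
        d = -g_new + beta * d - theta * y        # eq. (11)
        x, g = x_new, g_new
    return x, g


# Uniformly convex test function f(x) = 0.5 (x - c)^T A (x - c).
A = np.diag(np.arange(1.0, 6.0))
c = np.linspace(-1.0, 1.0, 5)


def f(x):
    return 0.5 * (x - c) @ A @ (x - c)


def grad(x):
    return A @ (x - c)


x_final, g_final = lstt(f, grad, np.zeros(5))
```

Because the test function is uniformly convex, Theorem 1 predicts \(\|g_k\|\to0\), so the returned iterate should approach the minimizer \(c\). The restart rule of Algorithm 2 would amount to one extra sign test on `beta`.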

Convergence analysis for uniformly convex functions

In this section, we establish the global convergence of Algorithm 1 for uniformly convex functions, where the stepsize \(\alpha_{k}\) at Step 3 is generated by the Wolfe–Powell line search (1) and (2). For this purpose, we first make two standard assumptions on the objective function, which hold throughout the rest of the paper.

Assumption 1

The level set \(\varOmega=\{x\in\mathbb{R}^{n}| f(x)\leq f(x_{0})\}\) is bounded.

Assumption 2

There is an open set \(\mathcal{O}\) containing Ω in which \(f(x)\) is continuously differentiable and its gradient \(g(x)\) is Lipschitz continuous, i.e., there exists a constant \(L>0\) such that

$$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq L \Vert x-y \Vert ,\quad \forall x,y \in\mathcal{O}. $$
(14)

From Assumptions 1 and 2, it is not difficult to verify that there is a constant \(\gamma>0\) such that

$$ \bigl\Vert g(x) \bigr\Vert \leq\gamma, \quad\forall x \in\varOmega. $$
(15)

The following lemma, known as the Zoutendijk condition [30], is commonly used in proving the convergence of CG methods.

Lemma 2

Suppose that the sequence \(\{x_{k}\}\) of iterates is generated by Algorithm 1. If the search direction \(d_{k}\) satisfies \(g_{k}^{T}d_{k}<0\) and the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2), then we have

$$ \sum_{k=0}^{\infty}\frac{(g_{k}^{T}d_{k})^{2}}{ \Vert d_{k} \Vert ^{2}}< +\infty. $$
(16)

From Lemma 1, we know that if Algorithm 1 does not stop, then

$$g_{k}^{T}d_{k}\leq- \Vert g_{k} \Vert ^{2}< 0. $$

Thus, under Assumptions 1 and 2, relation (16) holds immediately for Algorithm 1.

Now, we present the global convergence of Algorithm 1 (with \(\epsilon=0\)) for uniformly convex functions.

Theorem 1

Suppose that the sequence \(\{x_{k}\}\) of iterates is generated by Algorithm 1, and that the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2). If f is uniformly convex on the level set Ω, i.e., there exists a constant \(\mu>0\) such that

$$ \bigl(g(x)-g(y) \bigr)^{T}(x-y)\geq\mu \Vert x-y \Vert ^{2},\quad \forall x, y\in \varOmega, $$
(17)

then either \(\|g_{k}\|=0\) for some k, or

$$\lim_{k\rightarrow\infty} \Vert g_{k} \Vert =0. $$

Proof

If \(\|g_{k}\|=0\) for some k, then the algorithm stops. So, in what follows, we assume that an infinite sequence \(\{x_{k}\}\) is generated.

According to Lipschitz condition (14), the following relation holds:

$$ \Vert y_{k-1} \Vert = \Vert g_{k}-g_{k-1} \Vert \leq L \Vert x_{k}-x_{k-1} \Vert = L \Vert s_{k-1} \Vert , $$
(18)

where \(s_{k-1}:=x_{k}-x_{k-1}\). In addition, from (17) it follows that

$$ y_{k}^{T}s_{k}\geq\mu \Vert s_{k} \Vert ^{2}. $$
(19)

By combining the definition of \(d_{k}\) (cf. (10), (11), and (12)) with relations (18) and (19), and noting that \(d_{k-1}=s_{k-1}/\alpha_{k-1}\) with \(\alpha_{k-1}>0\), we have

$$\begin{aligned} \Vert d_{k} \Vert =& \biggl\Vert -g_{k}+ \biggl( \frac {g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}} \biggr)d_{k-1}- \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}y_{k-1} \biggr\Vert \\ \leq& \Vert g_{k} \Vert +\frac{ \Vert g_{k} \Vert \Vert y_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert d_{k-1} \Vert + \Vert g_{k} \Vert +\frac{ \Vert g_{k} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert y_{k-1} \Vert \\ = & 2 \Vert g_{k} \Vert +2\frac{ \Vert g_{k} \Vert \Vert y_{k-1} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \\ \leq& 2 \Vert g_{k} \Vert +2L\frac{ \Vert g_{k} \Vert \Vert s_{k-1} \Vert ^{2}}{\mu \Vert s_{k-1} \Vert ^{2}} \\ = & \biggl(2+\frac{2L}{\mu} \biggr) \Vert g_{k} \Vert . \end{aligned}$$

This together with Lemma 1 and (16) shows that

$$+\infty> \sum_{k=0}^{\infty}\frac{(g_{k}^{T}d_{k})^{2}}{ \Vert d_{k} \Vert ^{2}} \geq\sum_{k=0}^{\infty}\frac{ \Vert g_{k} \Vert ^{4}}{ \Vert d_{k} \Vert ^{2}} \geq \frac{1}{\omega^{2}}\sum_{k=0}^{\infty} \Vert g_{k} \Vert ^{4}, \quad\text{with } \omega=2+2L/\mu, $$

which implies that \(\lim_{k\rightarrow\infty}\|g_{k}\|=0\). □
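The key estimate in this proof, \(\|d_k\|\le(2+2L/\mu)\|g_k\|\), uses only (18), (19), and \(s_{k-1}=\alpha_{k-1}d_{k-1}\), so it can be spot-checked on a uniformly convex quadratic, whose extreme Hessian eigenvalues give \(\mu\) and \(L\). The sketch below is our own illustration: \(g_k\) is drawn at random because the bound does not depend on how \(g_k\) arose. It also checks \(\|d_k\|\ge\|g_k\|\), which follows from Lemma 1 and the Cauchy–Schwarz inequality.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
eigs = np.linspace(0.5, 4.0, n)                     # so mu = 0.5 and L = 4.0
A = Q @ np.diag(eigs) @ Q.T                         # Hessian of a uniformly convex quadratic
mu, L = eigs.min(), eigs.max()
omega = 2.0 + 2.0 * L / mu                          # constant from the proof of Theorem 1

ratios = []
for _ in range(1000):
    d_prev = rng.standard_normal(n)
    s_prev = rng.uniform(0.1, 2.0) * d_prev         # s_{k-1} = alpha_{k-1} d_{k-1}
    y = A @ s_prev                                  # y_{k-1} = g_k - g_{k-1} for a quadratic
    g = rng.standard_normal(n)                      # arbitrary g_k
    dy, gd = d_prev @ y, g @ d_prev
    beta = (g @ y) / dy - gd / (d_prev @ d_prev)    # eq. (10)
    d = -g + beta * d_prev - (gd / dy) * y          # eq. (11) with (12)
    ratios.append(np.linalg.norm(d) / np.linalg.norm(g))
```

In practice the observed ratios are usually well below \(\omega\); the bound is what the Zoutendijk argument needs, not a tight estimate.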

Two improved variants of the LSTT CG method

Note that the global convergence of Algorithm 1 is established only for uniformly convex functions. In this section, we present two improved variants of Algorithm 1, both of which are globally convergent for general nonlinear functions.

An improved version of LSTT (LSTT+)

In fact, the main obstacle to convergence for general functions is that \(\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) (cf. (10)) may be negative. Therefore, following a strategy similar to that of [31], we present the first modification of the direction \(d_{k}^{\mathrm{LSTT}}\) (11) as follows:

$$\begin{aligned} d_{k}^{\mathrm{LSTT+}}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})d_{k-1}-\theta_{k}y_{k-1}, & \mbox{if } k>0 \mbox{ and } \beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})>0,\\ -g_{k}, & \mbox{otherwise}, \end{array}\displaystyle \right . \end{aligned}$$
(20)

where \(\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) and \(\theta_{k}\) are given by (10) and (12), respectively. The corresponding algorithm is given in Algorithm 2.
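The restart rule in (20) amounts to a single sign test. The sketch below (names are our own) implements the direction for \(k>0\) and exercises both branches on small hand-picked vectors; in each case \(d_{k-1}\) is a descent direction at \(g_{k-1}\) and \(d_{k-1}^Ty_{k-1}>0\), as a Wolfe–Powell step would ensure.

```python
import numpy as np


def lstt_plus_direction(g, g_prev, d_prev):
    """Direction (20) for k > 0: three-term LSTT step if beta > 0, else -g_k."""
    y = g - g_prev
    dy = d_prev @ y
    gd = g @ d_prev
    beta = (g @ y) / dy - gd / (d_prev @ d_prev)    # beta^MHS(tau*), eq. (10)
    if beta <= 0.0:
        return -g                                    # "otherwise" branch of (20)
    theta = gd / dy                                  # eq. (12)
    return -g + beta * d_prev - theta * y


# Case 1: beta > 0, so the three-term direction is used.
g1 = np.array([1.0, -2.0, 0.5])
d1 = lstt_plus_direction(g1, np.array([2.0, 1.0, -1.0]), np.array([-2.0, -1.5, 1.0]))

# Case 2: beta = -1 + 0.5 = -0.5 <= 0, so the method restarts with -g_k.
g2 = np.array([0.0, 1.0])
d2 = lstt_plus_direction(g2, np.array([0.0, 2.0]), np.array([1.0, -1.0]))
```

Both branches satisfy (13): the restart branch with equality, the three-term branch by Lemma 1.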

[Algorithm 2 (figure): Improved version of LSTT algorithm (LSTT+)]

Obviously, the search direction \(d_{k}\) generated by Algorithm 2 satisfies the sufficient descent condition (13). Therefore, if the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2), then the Zoutendijk condition (16) also holds for Algorithm 2.

The following lemma establishes some further important properties of the search direction \(d_{k}\).

Lemma 3

Suppose that the sequence \(\{d_{k}\}\) of directions is generated by Algorithm 2, and that the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2). If there is a constant \(c>0\) such that \(\|g_{k}\|\geq c\) for any k, then

$$\begin{aligned} d_{k}\neq0\quad\textit{for each }k,\quad\textit{and}\quad\sum_{k=0}^{\infty}{ \|u_{k}-u_{k-1}\| ^{2}}< +\infty, \end{aligned}$$

where \(u_{k}={d_{k}}/{\|d_{k}\|}\).

Proof

Firstly, from Lemma 1 and the fact that \(\|g_{k}\|\geq c\), we have

$$ g_{k}^{T}d_{k}\leq- \Vert g_{k} \Vert ^{2}\leq-c^{2},\quad \forall k, $$
(21)

which implies that \(d_{k}\neq0\) for each k.

Secondly, from (16) and (21), we have

$$\begin{aligned} {c}^{4}\sum_{k=0}^{\infty}\frac{1}{ \Vert d_{k} \Vert ^{2}}\leq\sum_{k=0}^{\infty}\frac{ \Vert g_{k} \Vert ^{4}}{ \Vert d_{k} \Vert ^{2}}\leq\sum_{k=0}^{\infty}\frac {(g_{k}^{T}d_{k})^{2}}{ \Vert d_{k} \Vert ^{2}}< +\infty. \end{aligned}$$
(22)

Now we rewrite the direction \(d_{k}\) in (20) as

$$\begin{aligned} d_{k}=-g_{k}+\beta_{k}^{+}d_{k-1}- \theta^{+}_{k}y_{k-1}, \end{aligned}$$
(23)

where

$$\begin{aligned} \theta^{+}_{k}=\left \{ \textstyle\begin{array}{l@{\quad}l} \theta_{k}, & \mbox{if } \beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})>0,\\ 0, & \mbox{otherwise}, \end{array}\displaystyle \right . \quad \mbox{and}\quad \beta_{k}^{+}=\max \bigl\{ \beta_{k}^{\mathrm{MHS}} \bigl(\tau_{k}^{*}\bigr), 0 \bigr\} . \end{aligned}$$

Denote

$$\begin{aligned} a_{k}=\frac{-g_{k}-\theta_{k}^{+}y_{k-1}}{ \Vert d_{k} \Vert },\qquad b_{k}= \beta_{k}^{+}\frac{ \Vert d_{k-1} \Vert }{ \Vert d_{k} \Vert }. \end{aligned}$$
(24)

According to (23) and (24), it follows that

$$\begin{aligned} u_{k}=\frac{d_{k}}{ \Vert d_{k} \Vert }=\frac{-g_{k}-\theta_{k}^{+}y_{k-1}+\beta _{k}^{+}d_{k-1}}{ \Vert d_{k} \Vert }=a_{k}+b_{k}u_{k-1}. \end{aligned}$$

Since \(\|u_{k}\|=\|u_{k-1}\|=1\), we obtain

$$\begin{aligned} \Vert a_{k} \Vert = \Vert u_{k}-b_{k}u_{k-1} \Vert = \Vert b_{k}u_{k}-u_{k-1} \Vert . \end{aligned}$$

Since \(b_{k}\geq0\), we get

$$\begin{aligned} \Vert u_{k}-u_{k-1} \Vert \leq& \bigl\Vert (1+b_{k}) (u_{k}-u_{k-1}) \bigr\Vert \\ \leq& \Vert u_{k}-b_{k}u_{k-1} \Vert + \Vert b_{k}u_{k}-u_{k-1} \Vert \\ = & 2 \Vert a_{k} \Vert . \end{aligned}$$
(25)

On the other hand, from the Wolfe–Powell line search condition (2) and (21), we have

$$ d_{k-1}^{T}y_{k-1}=d_{k-1}^{T}(g_{k}-g_{k-1}) \geq(1-\sigma) \bigl(-d_{k-1}^{T}g_{k-1}\bigr)\geq(1- \sigma){c}^{2}>0. $$
(26)

Since \(g_{k-1}^{T}d_{k-1}<0\), we have

$$g_{k}^{T}d_{k-1}= d_{k-1}^{T}y_{k-1}+g_{k-1}^{T}d_{k-1}< d_{k-1}^{T}y_{k-1}. $$

This together with (26) shows that

$$ \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}< 1. $$
(27)

Again from (2), it follows that

$$g_{k}^{T}d_{k-1}\geq\sigma g_{k-1}^{T}d_{k-1}=- \sigma y_{k-1}^{T}d_{k-1}+\sigma g_{k}^{T}d_{k-1}, $$

which implies

$$ \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}\geq\frac{-\sigma}{1-\sigma}. $$
(28)

By combining (27) and (28), we have

$$ \biggl\vert \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}} \biggr\vert \leq\max \biggl\{ \frac{\sigma}{1-\sigma},1 \biggr\} . $$
(29)

In addition, the following relation comes directly from (15)

$$ \Vert y_{k-1} \Vert = \Vert g_{k}-g_{k-1} \Vert \leq \Vert g_{k} \Vert + \Vert g_{k-1} \Vert \leq2\gamma. $$
(30)

Finally, from (15), (29), and (30), we give a bound on the numerator of \(a_{k}\):

$$\begin{aligned} \bigl\Vert -g_{k}-\theta_{k}^{+}y_{k-1} \bigr\Vert \leq& \Vert g_{k} \Vert + \biggl\vert \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}} \biggr\vert \Vert y_{k-1} \Vert \\ \leq& \Vert g_{k} \Vert +\max \biggl\{ \frac{\sigma}{1-\sigma},1 \biggr\} \Vert y_{k-1} \Vert \\ \leq& M, \end{aligned}$$

where \(M=\gamma+2\gamma\max \{\frac{\sigma}{1-\sigma },1 \}\). This together with (25) shows that

$$\Vert u_{k}-u_{k-1} \Vert ^{2}\leq4 \Vert a_{k} \Vert ^{2}\leq\frac{4M^{2}}{ \Vert d_{k} \Vert ^{2}}. $$

Summing the above relation over k and using (22), the proof is completed. □
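Two elementary steps of this proof can be spot-checked numerically: inequality (25), which needs only \(\|u_k\|=\|u_{k-1}\|=1\) and \(b_k\ge0\), and bound (29), which involves only the scalars \(g_{k-1}^Td_{k-1}<0\) and \(g_k^Td_{k-1}\ge\sigma g_{k-1}^Td_{k-1}\) (condition (2)), since \(d_{k-1}^Ty_{k-1}\) is their difference. The randomized sketch below is our own illustration (sampling ranges are arbitrary, \(\sigma=0.1\) as in Sect. 5). It also shows with an explicit counterexample that (25) can fail when \(b_k<0\), which is why the truncation \(\beta_k^+\ge0\) matters.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.1                                    # value used in Sect. 5


def unit(v):
    return v / np.linalg.norm(v)


# (25): ||u_k - u_{k-1}|| <= 2 ||a_k|| whenever a_k = u_k - b_k u_{k-1}, b_k >= 0.
worst_25 = 0.0
for _ in range(10000):
    u_prev, u = unit(rng.standard_normal(4)), unit(rng.standard_normal(4))
    b = rng.uniform(0.0, 5.0)
    a = u - b * u_prev
    worst_25 = max(worst_25, np.linalg.norm(u - u_prev) / (2.0 * np.linalg.norm(a)))

# The sign of b_k matters: with b = -1 and u_k = -u_{k-1}, a_k = 0 but u_k != u_{k-1}.
e1 = np.array([1.0, 0.0, 0.0, 0.0])
a_bad = -e1 - (-1.0) * e1

# (29): with t_prev = g_{k-1}^T d_{k-1} < 0 and t = g_k^T d_{k-1} >= sigma * t_prev
# (condition (2)), d_{k-1}^T y_{k-1} = t - t_prev and theta_k = t / (t - t_prev).
thetas = []
for _ in range(10000):
    t_prev = -rng.uniform(0.01, 10.0)
    t = sigma * t_prev + rng.exponential(5.0)
    thetas.append(t / (t - t_prev))
bound_29 = max(sigma / (1.0 - sigma), 1.0)
```

The samples of \(\theta_k\) fill the interval \([-\sigma/(1-\sigma),1)\) predicted by (27) and (28).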

We are now ready to prove the global convergence of Algorithm 2.

Theorem 2

Suppose that the sequence \(\{x_{k}\}\) of iterates is generated by Algorithm 2, and that the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2). Then either \(\|g_{k}\|=0\) for some k or

$$\liminf_{k\rightarrow\infty} \Vert g_{k} \Vert =0. $$

Proof

Suppose by contradiction that there is a constant \(c>0\) such that \(\| g_{k}\|\geq c\) for any k. So the conditions of Lemma 3 hold.

We first show that there is a bound on the steps \(s_{k}\), whose proof is a modified version of [28, Thm. 3.2]. From Assumption 1, there is a constant \(B>0\) such that

$$\Vert x_{k} \Vert \leq B,\quad \forall k, $$

which implies

$$ \Vert x_{l}-x_{k} \Vert \leq \Vert x_{l} \Vert + \Vert x_{k} \Vert \leq2B. $$
(31)

For any \(l\geq k\), it is clear that

$$x_{l}-x_{k}=\sum_{j=k}^{l-1}{(x_{j+1}-x_{j})}= \sum_{j=k}^{l-1}{ \Vert s_{j} \Vert u_{j}}=\sum_{j=k}^{l-1}{ \Vert s_{j} \Vert u_{k}}+\sum _{j=k}^{l-1}{ \Vert s_{j} \Vert (u_{j}-u_{k}}). $$

This together with the triangle inequality and (31) shows that

$$ \sum_{j=k}^{l-1}{ \Vert s_{j} \Vert }\leq \Vert x_{l}-x_{k} \Vert + \sum_{j=k}^{l-1}{ \Vert s_{j} \Vert \Vert u_{j}-u_{k} \Vert }\leq2B+\sum _{j=k}^{l-1}{ \Vert s_{j} \Vert \Vert u_{j}-u_{k} \Vert }. $$
(32)

Denote

$$\xi:=\frac{2\gamma L}{(1-\sigma)c^{2}}, $$

where σ, L, and γ are given in (2), (14), and (15), respectively. Let \(\triangle\) be a positive integer, chosen large enough that

$$\begin{aligned} \triangle\geq8\xi B. \end{aligned}$$
(33)

Moreover, from Lemma 3, we can choose an index \(k_{0}\) large enough that

$$\begin{aligned} \sum_{i\geq k_{0}}{ \Vert u_{i+1}-u_{i} \Vert ^{2}}\leq\frac{1}{4\triangle}. \end{aligned}$$
(34)

Thus, if \(j>k\geq k_{0}\) and \(j-k\leq\triangle\), we can derive the following relations by (34) and the Cauchy–Schwarz inequality:

$$\begin{aligned} \Vert u_{j}-u_{k} \Vert \leq& \sum _{i=k}^{j-1}{ \Vert u_{i+1}-u_{i} \Vert } \\ \leq& \sqrt{j-k} \Biggl(\sum_{i=k}^{j-1}{ \Vert u_{i+1}-u_{i} \Vert ^{2}} \Biggr)^{\frac{1}{2}} \\ \leq& \sqrt{\triangle} \biggl(\frac{1}{4\triangle} \biggr)^{\frac {1}{2}}=\frac{1}{2}. \end{aligned}$$
(35)

Combining (32) and (35), we have

$$ \sum_{j=k}^{l-1}{ \Vert s_{j} \Vert }\leq4B, $$
(36)

where \(l>k\geq k_{0}\) and \(l-k\leq\triangle\).

Next, we prove that there is a bound on the directions \(d_{k}\).

If \(d_{k}=-g_{k}\) in (20), then from (15) we have

$$ \Vert d_{k} \Vert \leq\gamma. $$
(37)

In what follows, we consider the case where

$$d_{k}=-g_{k}+\beta_{k}^{\mathrm{MHS}}\bigl( \tau_{k}^{*}\bigr)d_{k-1}-\theta_{k}y_{k-1}. $$

Thus, from (15), (18), and (26), we have

$$\begin{aligned} \Vert d_{k} \Vert ^{2} =& \biggl\Vert -g_{k}+ \biggl(\frac {g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}} \biggr)d_{k-1}-\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}y_{k-1} \biggr\Vert ^{2} \\ \leq& \biggl( \Vert g_{k} \Vert +\frac{ \Vert g_{k} \Vert \Vert y_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert d_{k-1} \Vert + \Vert g_{k} \Vert + \frac{ \Vert g_{k} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert y_{k-1} \Vert \biggr)^{2} \\ = & \biggl(2 \Vert g_{k} \Vert +2\frac{ \Vert g_{k} \Vert \Vert y_{k-1} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \biggr)^{2} \\ \leq& \biggl(2\gamma+\frac{2\gamma L}{(1-\sigma)c^{2}} \Vert s_{k-1} \Vert \Vert d_{k-1} \Vert \biggr)^{2} \\ \leq& 8\gamma^{2}+2\xi^{2} \Vert s_{k-1} \Vert ^{2} \Vert d_{k-1} \Vert ^{2}. \end{aligned}$$

Then, by defining \(S_{j}=2\xi^{2}\|s_{j}\|^{2}\), for \(l>k_{0}\), we have

$$\begin{aligned} \Vert d_{l} \Vert ^{2}\leq8 \gamma^{2} \Biggl(\sum_{i=k_{0}+1}^{l}{ \prod_{j=i}^{l-1}{S_{j}}} \Biggr)+ \Vert d_{k_{0}} \Vert ^{2}\prod_{j=k_{0}}^{l-1}{S_{j}}. \end{aligned}$$
(38)

From (36), following the corresponding lines in [28, Thm. 3.2], we can conclude that the right-hand side of (38) is bounded by a constant independent of l. Together with (37), this shows that \(\{\|d_{k}\|\}\) is bounded, so \(\sum_{k=0}^{\infty}1/\|d_{k}\|^{2}=+\infty\), which contradicts (22). Therefore, \(\liminf_{k\rightarrow\infty}\|g_{k}\|=0\). □

A modified version of LSTT+ (MLSTT+)

In order to further improve the efficiency of Algorithm 2, we propose a modified version of \(d_{k}^{\mathrm{LSTT+}}\) (20) as follows:

$$\begin{aligned} d_{k}^{\mathrm{MLSTT+}}=\left \{ \textstyle\begin{array}{l@{\quad}l} -g_{k}+\beta_{k}^{\mathrm{MLSTT+}}d_{k-1}-\theta_{k}z_{k-1}, & \mbox{if } k>0 \mbox{ and } \beta_{k}^{\mathrm{MLSTT+}}>0,\\ -g_{k}, & \mbox{otherwise}, \end{array}\displaystyle \right . \end{aligned}$$
(39)

where \(\theta_{k}\) is given by (12) and

$$\begin{aligned}& \beta_{k}^{\mathrm{MLSTT+}}=\frac {g_{k}^{T}z_{k-1}}{d_{k-1}^{T}y_{k-1}}- \frac{g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}}, \end{aligned}$$
(40)
$$\begin{aligned}& z_{k-1}=g_{k}-\frac{ \Vert g_{k} \Vert }{ \Vert g_{k-1} \Vert }g_{k-1}. \end{aligned}$$
(41)

The difference between (20) and (39) is that \(y_{k-1}\) is replaced by \(z_{k-1}\) in the numerator of the CG parameter and in the third term, while the denominator \(d_{k-1}^{T}y_{k-1}\) and \(\theta_{k}\) are unchanged. This idea, which was introduced to improve the well-known PRP method, originated from [32]. The substitution seems useful here because it can increase the likelihood that the CG parameter is positive, so that the three-term direction is used more often. Indeed, as the iterations proceed, \(\|g_{k}\|\) approaches zero, so the ratio \(\|g_{k}\|/\|g_{k-1}\|\) is often less than 1. If, in addition, \(g_{k}^{T}g_{k-1}>0\), then we have

$$g_{k}^{T}z_{k-1}= \Vert g_{k} \Vert ^{2}- \frac{ \Vert g_{k} \Vert }{ \Vert g_{k-1} \Vert }g_{k}^{T}g_{k-1}> \Vert g_{k} \Vert ^{2}-g_{k}^{T}g_{k-1}=g_{k}^{T}y_{k-1}. $$

The following lemma shows that the search direction (39) also has sufficient descent property.

Lemma 4

Let the search direction \(d_{k}\) be generated by (39). Then it satisfies the following sufficient descent condition (independent of line search):

$$ g_{k}^{T}d_{k}\leq- \Vert g_{k} \Vert ^{2}. $$
(42)

Proof

The proof is similar to that of Lemma 1. □
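The cancellation behind Lemma 4 mirrors Lemma 1: in the three-term branch of (39), \(g_k^Td_k=-\|g_k\|^2-(g_k^Td_{k-1})^2/\|d_{k-1}\|^2\), and in the restart branch \(g_k^Td_k=-\|g_k\|^2\). The sketch below (names are our own) implements (39) with (40) and (41) and checks the three-term identity on sample vectors for which \(\beta_k^{\mathrm{MLSTT+}}>0\).

```python
import numpy as np


def mlstt_plus_direction(g, g_prev, d_prev):
    """Direction (39) for k > 0, with z_{k-1} from (41)."""
    y = g - g_prev
    z = g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev   # eq. (41)
    dy = d_prev @ y                                  # denominators still use y_{k-1}
    gd = g @ d_prev
    beta = (g @ z) / dy - gd / (d_prev @ d_prev)     # eq. (40)
    if beta <= 0.0:
        return -g                                     # restart branch of (39)
    theta = gd / dy                                   # eq. (12)
    return -g + beta * d_prev - theta * z


g = np.array([1.0, -2.0, 0.5])
g_prev = np.array([2.0, 1.0, -1.0])
d_prev = np.array([-2.0, -1.5, 1.0])
d = mlstt_plus_direction(g, g_prev, d_prev)
```

Replacing \(y_{k-1}\) by \(z_{k-1}\) in both the numerator of \(\beta_k\) and the third term keeps the two contributions to \(g_k^Td_k\) equal and opposite, which is why the descent property survives.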

From Lemma 4, we know that the Zoutendijk condition (16) also holds for Algorithm 3. In what follows, we show that Algorithm 3 is globally convergent for general functions. The following lemma illustrates that the direction \(d_{k}\) generated by Algorithm 3 inherits some useful properties of \(d_{k}^{\mathrm{LSTT+}}\) (20), whose proof is a modification of Lemma 3.

[Algorithm 3 (figure): A modified version of LSTT+ algorithm (MLSTT+)]

Lemma 5

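Suppose that the sequence \(\{d_{k}\}\) of directions is generated by Algorithm 3, and that the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2). If there is a constant \(c>0\) such that \(\|g_{k}\|\geq c\) for any k, then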

$$\begin{aligned} d_{k}\neq0 \quad\textit{for each }k, \quad\textit{and}\quad \sum_{k=0}^{\infty}{ \|u_{k}-u_{k-1}\| ^{2}}< +\infty, \end{aligned}$$

where \(u_{k}={d_{k}}/{\|d_{k}\|}\).

Proof

From the related analysis in Lemma 3, we have

$$ {c}^{4}\sum_{k=0}^{\infty}\frac{1}{ \Vert d_{k} \Vert ^{2}}< +\infty. $$
(43)

Now we rewrite the direction \(d_{k}\) in (39) as

$$\begin{aligned} d_{k}=-g_{k}-\hat{\theta}^{+}_{k}z_{k-1}+ \hat{\beta}_{k}^{+}d_{k-1}, \end{aligned}$$
(44)

where

$$\begin{aligned} \hat{\theta}^{+}_{k}=\left \{ \textstyle\begin{array}{l@{\quad}l} \theta_{k}, & \mbox{if } \beta_{k}^{\mathrm{MLSTT+}}>0,\\ 0, & \mbox{otherwise}, \end{array}\displaystyle \right . \quad\mbox{and}\quad \hat{\beta}_{k}^{+}=\max \bigl\{ \beta_{k}^{\mathrm{MLSTT+}}, 0 \bigr\} . \end{aligned}$$
(45)

Define

$$\begin{aligned} \hat{a}_{k}=\frac{-g_{k}-\hat{\theta}^{+}_{k}z_{k-1}}{ \Vert d_{k} \Vert },\qquad \hat {b}_{k}= \hat{\beta}_{k}^{+}\frac{ \Vert d_{k-1} \Vert }{ \Vert d_{k} \Vert }. \end{aligned}$$
(46)

According to (44) and (46), it follows that

$$\begin{aligned} u_{k}=\frac{d_{k}}{ \Vert d_{k} \Vert }=\frac{-g_{k}-\hat{\theta}^{+}_{k}z_{k-1}+\hat {\beta}_{k}^{+}d_{k-1}}{ \Vert d_{k} \Vert }=\hat{a}_{k}+ \hat{b}_{k}u_{k-1}. \end{aligned}$$

Thus, following the lines in the proof of Lemma 3, we get

$$\begin{aligned} \Vert u_{k}-u_{k-1} \Vert \leq2 \Vert \hat{a}_{k} \Vert . \end{aligned}$$
(47)

Moreover, we also have

$$ \biggl\vert \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}} \biggr\vert \leq\max \biggl\{ \frac{\sigma}{1-\sigma},1 \biggr\} \quad \text{and}\quad \Vert y_{k-1} \Vert \leq 2 \gamma. $$
(48)

The following relations hold by the definition of \(z_{k-1}\) (41):

$$\begin{aligned} \Vert z_{k-1} \Vert \leq& \Vert g_{k}-g_{k-1} \Vert + \biggl\Vert g_{k-1}- \frac{ \Vert g_{k} \Vert }{ \Vert g_{k-1} \Vert }g_{k-1} \biggr\Vert \\ = & \Vert y_{k-1} \Vert + \biggl\vert 1- \frac{ \Vert g_{k} \Vert }{ \Vert g_{k-1} \Vert } \biggr\vert \Vert g_{k-1} \Vert \\ \leq& \Vert y_{k-1} \Vert + \Vert g_{k-1}-g_{k} \Vert \\ = & 2 \Vert y_{k-1} \Vert . \end{aligned}$$
(49)

By combining (15), (48), and (49), we bound the numerator of \(\hat{a}_{k}\):

$$\begin{aligned} \bigl\Vert -g_{k}-\hat{\theta}^{+}_{k}z_{k-1} \bigr\Vert \leq& \Vert g_{k} \Vert + \biggl\vert \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}} \biggr\vert \Vert z_{k-1} \Vert \\ \leq& \Vert g_{k} \Vert +2\max \biggl\{ \frac{\sigma}{1-\sigma},1 \biggr\} \Vert y_{k-1} \Vert \\ \leq& \hat{M}, \end{aligned}$$

where \(\hat{M}=\gamma+4\gamma\max \{\frac{\sigma}{1-\sigma },1 \}\). This together with (47) shows that

$$\begin{aligned} \Vert u_{k}-u_{k-1} \Vert ^{2}\leq4 \Vert \hat{a}_{k} \Vert ^{2}\leq\frac{4\hat{M}^{2}}{ \Vert d_{k} \Vert ^{2}}. \end{aligned}$$

Summing the above inequalities over k and utilizing (43), we complete the proof. □
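The only new ingredient in this proof is (49), whose third step is the reverse triangle inequality \(\bigl|\|g_{k-1}\|-\|g_k\|\bigr|\le\|g_{k-1}-g_k\|\). The sketch below is our own randomized spot check of both that step and the resulting bound \(\|z_{k-1}\|\le2\|y_{k-1}\|\).

```python
import numpy as np

rng = np.random.default_rng(4)
worst_z = 0.0     # largest observed ||z_{k-1}|| / (2 ||y_{k-1}||)
worst_rev = 0.0   # largest observed | ||g_{k-1}|| - ||g_k|| | / ||y_{k-1}||
for _ in range(10000):
    g, g_prev = rng.standard_normal((2, 5))
    y = g - g_prev
    z = g - (np.linalg.norm(g) / np.linalg.norm(g_prev)) * g_prev   # eq. (41)
    ny = np.linalg.norm(y)
    worst_z = max(worst_z, np.linalg.norm(z) / (2.0 * ny))
    worst_rev = max(worst_rev, abs(np.linalg.norm(g_prev) - np.linalg.norm(g)) / ny)
```

Because \(\|z_{k-1}\|\) is controlled by \(\|y_{k-1}\|\), the constants in Lemma 5 and Theorem 3 only double relative to Lemma 3 and Theorem 2 (compare \(\hat M\) with \(M\)).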

We finally present the global convergence of Algorithm 3.

Theorem 3

Suppose that the sequence \(\{x_{k}\}\) of iterates is generated by Algorithm 3. Then either \(\|g_{k}\|=0\) for some k or

$$\liminf_{k\rightarrow\infty} \Vert g_{k} \Vert =0. $$

Proof

Suppose, to the contrary, that there is a constant \(c>0\) such that \(\|g_{k}\|\geq c\) for all k. Then the conclusions of Lemma 5 hold.

Without loss of generality, we only consider the case where

$$d_{k}=-g_{k}+\beta_{k}^{\mathrm{MLSTT+}}d_{k-1}- \theta_{k}z_{k-1}. $$

So from (15), (18), (26), and (49), we obtain

$$\begin{aligned} \Vert d_{k} \Vert ^{2} =& \biggl\Vert -g_{k}+ \biggl(\frac {g_{k}^{T}z_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{ \Vert d_{k-1} \Vert ^{2}} \biggr)d_{k-1} -\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}z_{k-1} \biggr\Vert ^{2} \\ \leq& \biggl( \Vert g_{k} \Vert +\frac{ \Vert g_{k} \Vert \Vert z_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert d_{k-1} \Vert + \Vert g_{k} \Vert + \frac{ \Vert g_{k} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \Vert z_{k-1} \Vert \biggr)^{2} \\ = & \biggl(2 \Vert g_{k} \Vert +2\frac{ \Vert g_{k} \Vert \Vert z_{k-1} \Vert \Vert d_{k-1} \Vert }{d_{k-1}^{T}y_{k-1}} \biggr)^{2} \\ \leq& \biggl(2\gamma+\frac{4\gamma L}{(1-\sigma)c^{2}} \Vert s_{k-1} \Vert \Vert d_{k-1} \Vert \biggr)^{2} \\ \leq& 2\eta^{2}+2\rho^{2} \Vert s_{k-1} \Vert ^{2} \Vert d_{k-1} \Vert ^{2}, \end{aligned}$$

where \(\eta=2\gamma\) and \(\rho=\frac{4\gamma L}{(1-\sigma)c^{2}}\).
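The chain of inequalities above relies on two elementary facts. The Cauchy–Schwarz inequality bounds the term \(\frac{|g_{k}^{T}d_{k-1}|}{\|d_{k-1}\|^{2}}\|d_{k-1}\|\) by \(\|g_{k}\|\), which produces the second \(\|g_{k}\|\) in the second line. The inequality \(2ab\leq a^{2}+b^{2}\), applied with \(a=\eta\) and \(b=\rho\|s_{k-1}\|\|d_{k-1}\|\), gives the last line:

$$\begin{aligned} \frac{ \vert g_{k}^{T}d_{k-1} \vert }{ \Vert d_{k-1} \Vert ^{2}} \Vert d_{k-1} \Vert \leq\frac{ \Vert g_{k} \Vert \Vert d_{k-1} \Vert }{ \Vert d_{k-1} \Vert ^{2}} \Vert d_{k-1} \Vert = \Vert g_{k} \Vert ,\qquad \bigl(\eta+\rho \Vert s_{k-1} \Vert \Vert d_{k-1} \Vert \bigr)^{2}\leq2\eta^{2}+2\rho^{2} \Vert s_{k-1} \Vert ^{2} \Vert d_{k-1} \Vert ^{2}. \end{aligned}$$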

The remainder of the argument is analogous to that of Theorem 2 and is therefore omitted. □
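The direction in the case considered in the proof of Theorem 3 takes only a few lines to compute. The Python sketch below implements only the untruncated formula displayed in that proof, not the full MLSTT+ rule of Algorithm 3. The names `mlstt_direction_case` and `descent_gap` are hypothetical. The sketch also checks an identity that follows from the formula. In \(g_{k}^{T}d_{k}=-\|g_{k}\|^{2}+\beta_{k}g_{k}^{T}d_{k-1}-\theta_{k}g_{k}^{T}z_{k-1}\), the two terms containing \(d_{k-1}^{T}y_{k-1}\) cancel. This leaves \(g_{k}^{T}d_{k}=-\|g_{k}\|^{2}-(g_{k}^{T}d_{k-1})^{2}/\|d_{k-1}\|^{2}\leq-\|g_{k}\|^{2}\), a sufficient descent inequality that does not depend on the line search.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mlstt_direction_case(g, d_prev, y_prev, z_prev):
    """Direction of the case written out in the proof of Theorem 3 (no truncation):
        d_k = -g_k + beta*d_{k-1} - theta*z_{k-1},
        beta = g^T z / d^T y - g^T d / ||d||^2,  theta = g^T d / d^T y,
    with d = d_{k-1}, y = y_{k-1}, z = z_{k-1}.  Requires d^T y != 0."""
    dy = dot(d_prev, y_prev)
    gd = dot(g, d_prev)
    beta = dot(g, z_prev) / dy - gd / dot(d_prev, d_prev)
    theta = gd / dy
    return [-gi + beta * di - theta * zi for gi, di, zi in zip(g, d_prev, z_prev)]


def descent_gap(g, d_prev, y_prev, z_prev):
    """g^T d_k minus the predicted value -||g||^2 - (g^T d_{k-1})^2 / ||d_{k-1}||^2."""
    d = mlstt_direction_case(g, d_prev, y_prev, z_prev)
    predicted = -dot(g, g) - dot(g, d_prev) ** 2 / dot(d_prev, d_prev)
    return dot(g, d) - predicted


# Small worked case: predicted g^T d_k = -1 - 1/2.
print(dot([1.0, 0.0], mlstt_direction_case([1.0, 0.0], [-1.0, -1.0],
                                            [-1.0, -2.0], [0.0, 1.0])))  # prints -1.5

# Random trials with d^T y kept away from zero (a Wolfe step makes it positive).
random.seed(2)
gaps = []
for _ in range(1000):
    g = [random.gauss(0.0, 1.0) for _ in range(6)]
    dp = [random.gauss(0.0, 1.0) for _ in range(6)]
    zp = [random.gauss(0.0, 1.0) for _ in range(6)]
    yp = [di + 0.3 * random.gauss(0.0, 1.0) for di in dp]
    if dot(dp, yp) > 0.1:
        gaps.append(abs(descent_gap(g, dp, yp, zp)))
print("trials:", len(gaps), "largest |gap|:", max(gaps))
```

The identity holds for any vector \(z_{k-1}\). The specific choice (41) matters for the convergence bounds but not for this descent property.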

Numerical results

In this section, we test the practical effectiveness of Algorithm 2 (LSTT+) and Algorithm 3 (MLSTT+), both of which are globally convergent for general functions under the Wolfe–Powell line search. We compare them with the TTPRP method [22] and the TTHS method [23] on 104 test problems from the CUTE library [33–35], with dimensions ranging from 2 to 5,000,000.

All codes were written in Matlab R2014a and run on a PC with 4 GB of RAM under the Windows 7 operating system. The stepsizes \(\alpha_{k}\) are generated by the Wolfe–Powell line search with \(\sigma=0.1\) and \(\delta=0.01\). In Tables 1, 2, 3, “Name” and “n” denote the abbreviated name of the test problem and its dimension. “Itr/NF/NG” stand for the number of iterations, function evaluations, and gradient evaluations, respectively. “Tcpu” and “\(\|g_{*}\|\)” denote the CPU time and the norm of the gradient at the final iterate, respectively. The stopping criterion is \(\|g_{k}\|\leq10^{-6}\) or \(\mathrm{Itr}>2000\).
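The Wolfe–Powell conditions are not restated in this section. The sketch below assumes their standard (weak) form, with \(\delta\) the sufficient-decrease parameter and \(\sigma\) the curvature parameter, \(0<\delta<\sigma<1\). This matches the values \(\delta=0.01\) and \(\sigma=0.1\) quoted above. The sketch is a simple bisection/extrapolation search written in Python for illustration, not the authors' Matlab code. The function name `wolfe_powell_step`, the initial trial step \(\alpha=1\), the iteration cap, and the test quadratic are our assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def wolfe_powell_step(f, grad, x, d, delta=0.01, sigma=0.1, alpha=1.0, max_iter=100):
    """Return a step alpha satisfying the standard weak Wolfe-Powell conditions
    (assumed form; requires 0 < delta < sigma < 1 and a descent direction d):
        f(x + alpha d) <= f(x) + delta * alpha * g(x)^T d     (sufficient decrease)
        g(x + alpha d)^T d >= sigma * g(x)^T d                 (curvature)
    by bisection between a too-short and a too-long trial step."""
    fx, gd = f(x), dot(grad(x), d)
    if gd >= 0:
        raise ValueError("d must be a descent direction (g^T d < 0)")
    lo, hi = 0.0, math.inf
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > fx + delta * alpha * gd:
            hi = alpha                      # too long: sufficient decrease fails
        elif dot(grad(x_new), d) < sigma * gd:
            lo = alpha                      # too short: curvature condition fails
        else:
            return alpha                    # both conditions hold
        alpha = 2.0 * lo if math.isinf(hi) else 0.5 * (lo + hi)
    raise RuntimeError("no Wolfe-Powell step found within max_iter trials")


# Usage on a hypothetical ill-conditioned quadratic f(x) = 0.5 * sum(c_i * x_i^2).
C = [1.0, 10.0, 100.0]


def quad(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(C, x))


def quad_grad(x):
    return [c * xi for c, xi in zip(C, x)]


x0 = [1.0, 1.0, 1.0]
d0 = [-gi for gi in quad_grad(x0)]                  # steepest-descent direction
print(wolfe_powell_step(quad, quad_grad, x0, d0))   # prints 0.015625
```

Halving from \(\alpha=1\) keeps every trial step a power of two here. The first step that passes the sufficient-decrease test, \(1/64\), also satisfies the curvature condition.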

Table 1 Numerical comparisons of four CG methods
Table 2 Numerical comparisons of four CG methods (continued)
Table 3 Numerical comparisons of four CG methods (continued)

To show the differences in numerical performance among the four CG methods more clearly, we present the performance profiles of Dolan and Moré [36] in Figs. 1, 2, 3, 4, with respect to Itr, NF, NG, and Tcpu, respectively. These profiles are defined as follows.

Figure 1 Performance profiles on Itr of four CG methods

Figure 2 Performance profiles on NF of four CG methods

Figure 3 Performance profiles on NG of four CG methods

Figure 4 Performance profiles on Tcpu of four CG methods

Denote the set of \(n_{p}\) test problems by \(\mathcal{P}\) and the set of solvers by \(\mathcal{S}\). Let \(t_{p, s}\) be the Tcpu (or Itr, NF, NG) required by solver \(s \in\mathcal{S}\) to solve problem \(p \in\mathcal{P}\), and define the performance ratio as

$$\begin{aligned} r_{p, s}=t_{p, s} \big/ \min_{s' \in\mathcal{S}} t_{p, s'}. \end{aligned}$$

If solver \(s\) fails on problem \(p\) (entries marked “NaN” in Tables 1, 2, 3), we set \(r_{p, s}=2 \max\{r_{p, s'}: s' \in\mathcal{S}\}\), where the maximum is taken over the solvers that solved \(p\). The performance profile of each solver is then defined by

$$\begin{aligned} \rho_{s}(\tau)=\frac{1}{n_{p}} \operatorname{size} \bigl(\{p \in \mathcal{P}: \log_{2} r_{p, s} \leq\tau\} \bigr), \end{aligned}$$

where \(\operatorname{size}(A)\) stands for the number of elements in the set A. Hence \(\rho_{s}(\tau)\) is the probability for solver \(s \in\mathcal{S}\) that its performance ratio \(r_{p, s}\) is at most \(2^{\tau}\), that is, that its cost is within a factor \(2^{\tau}\) of the best, for \(\tau\in \mathbb{R}\). The function \(\rho_{s}\) is thus the (cumulative) distribution function of the performance ratio on a \(\log_{2}\) scale, and the solver whose curve lies on top performs best. Refer to [36] for more details.

For each method, the performance profile plots the fraction \(\rho_{s}(\tau)\) of the problems for which the method is within a factor \(2^{\tau}\) of the best result. The left side of each figure (\(\tau=0\)) gives the percentage of the test problems for which a method is the best. The right side gives the percentage of the test problems that each method solves successfully. The top curve belongs to the method that solves the most problems within a factor \(2^{\tau}\) of the best result.
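The profile construction above is easy to reproduce. The following Python sketch is our own illustration. The function name `performance_profile` and the toy data are hypothetical, and the sketch assumes strictly positive costs and at least one successful solver per problem. It computes \(r_{p,s}\), applies the failure rule stated above for the “NaN” entries (twice the largest finite ratio on that problem), and evaluates \(\rho_{s}(\tau)\) on a list of \(\tau\) values.

```python
import math


def performance_profile(T, taus):
    """Performance profiles with the failure rule described in the text.

    T[p][s] is the positive cost (Tcpu, Itr, NF or NG) of solver s on problem p,
    or None for a failure ("NaN" in Tables 1, 2, 3).  Assumes every problem is
    solved by at least one solver.  Returns rho[s][i] = rho_s(taus[i])."""
    n_p, n_s = len(T), len(T[0])
    ratios = []
    for row in T:
        best = min(t for t in row if t is not None)
        r = [None if t is None else t / best for t in row]
        r_fail = 2.0 * max(x for x in r if x is not None)  # 2 * largest finite ratio
        ratios.append([r_fail if x is None else x for x in r])
    return [[sum(math.log2(ratios[p][s]) <= tau for p in range(n_p)) / n_p
             for tau in taus]
            for s in range(n_s)]


# Toy data (hypothetical): 3 problems, 2 solvers; solver 1 fails on problem 1.
T = [[1.0, 2.0], [4.0, None], [3.0, 3.0]]
print(performance_profile(T, taus=[0.0, 1.0]))  # prints [[1.0, 1.0], [0.3333333333333333, 1.0]]
```

In this toy run, solver 1 reaches \(\rho_{1}(1)=1\) even though it failed on problem 1, because its failure is assigned the ratio \(2\times1=2\). Dolan and Moré [36] instead give every failure a ratio \(r_{M}\) larger than all ratios in the test set, so that failures never count as solved within the plotted range. The convention chosen therefore affects how the right end of a profile should be read.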

In Figs. 1, 2, 3, 4, we compare the LSTT+ and MLSTT+ methods with the TTPRP and TTHS methods. Figure 1 shows that MLSTT+ requires the fewest iterations on about 51% of the test problems and ultimately solves about 98% of them. LSTT+ performs second best, solving 88% of the test problems successfully, while TTPRP and TTHS solve about 80% and 78% of the test problems, respectively. Figure 2 shows that MLSTT+ also performs best in terms of function evaluations, requiring the fewest on about 49% of the test problems; LSTT+ is second, with about 40%. Figure 3 shows that MLSTT+ and LSTT+ outperform the other two methods in terms of gradient evaluations: MLSTT+ requires the fewest gradient evaluations on about 56% of the test problems, and LSTT+ on about 41%. In Fig. 4, MLSTT+ again performs best in CPU time, being the fastest on about 53% of the test problems, followed by LSTT+ with 42%. Since all methods use the same line search, these results suggest that LSTT+ and MLSTT+ are more efficient than TTPRP and TTHS on this test set.

Taken together, Tables 1, 2, 3 and Figs. 1, 2, 3, 4 show that LSTT+ and MLSTT+ perform better than TTPRP and TTHS, with MLSTT+ performing best. This indicates that the proposed methods have good numerical performance.

Conclusion

In this paper, we have presented three new three-term CG methods that determine the CG parameters by the least-squares technique. All three generate sufficient descent directions independently of any line search procedure. The basic method is globally convergent for uniformly convex functions, while the two improved variants are globally convergent for general nonlinear functions. Preliminary numerical results show that our methods are promising.

References

  1. Tripathi, A., McNulty, I., Shpyrko, O.G.: Ptychographic overlap constraint errors and the limits of their numerical recovery using conjugate gradient descent methods. Opt. Express 22(2), 1452–1466 (2014)

  2. Antoine, X., Levitt, A., Tang, Q.: Efficient spectral computation of the stationary states of rotating Bose–Einstein condensates by preconditioned nonlinear conjugate gradient methods. J. Comput. Phys. 343, 92–109 (2017)

  3. Azimi, A., Daneshgar, E.: Indoor contaminant source identification by inverse zonal method: Levenberg–Marquardt and conjugate gradient methods. Adv. Build. Energy Res. 12(2), 250–273 (2018)

  4. Yang, L.F., Jian, J.B., Wang, Y.Y., Dong, Z.Y.: Projected mixed integer programming formulations for unit commitment problem. Int. J. Electr. Power Energy Syst. 68, 195–202 (2015)

  5. Yang, L.F., Jian, J.B., Zhu, Y.N., Dong, Z.Y.: Tight relaxation method for unit commitment problem using reformulation and lift-and-project. IEEE Trans. Power Syst. 30(1), 13–23 (2015)

  6. Yang, L.F., Zhang, C., Jian, J.B., Meng, K., Xu, Y., Dong, Z.Y.: A novel projected two-binary-variable formulation for unit commitment in power systems. Appl. Energy 187, 732–745 (2017)

  7. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)

  8. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)

  9. Polak, E., Ribière, G.: Note sur la convergence de méthodes de directions conjuguées. Revue Française d'Informatique et de Recherche Opérationnelle 16(16), 35–43 (1969)

  10. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 9(4), 94–112 (1969)

  11. Dai, Y.H., Yuan, Y.X.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10(1), 177–182 (1999)

  12. Dong, X.L., Liu, H.W., He, Y.B.: New version of the three-term conjugate gradient method based on spectral scaling conjugacy condition that generates descent search direction. Appl. Math. Comput. 269, 606–617 (2015)

  13. Jian, J.B., Chen, Q., Jiang, X.Z., Zeng, Y.F., Yin, J.H.: A new spectral conjugate gradient method for large-scale unconstrained optimization. Optim. Methods Softw. 32(3), 503–515 (2017)

  14. Sun, M., Liu, J.: New hybrid conjugate gradient projection method for the convex constrained equations. Calcolo 53(3), 399–411 (2016)

  15. Mtagulwa, P., Kaelo, P.: An efficient modified PRP-FR hybrid conjugate gradient method for solving unconstrained optimization problems. Appl. Numer. Math. 145, 111–120 (2019)

  16. Dong, X.-L., Han, D.-R., Ghanbari, R., Li, X.-L., Dai, Z.-F.: Some new three-term Hestenes–Stiefel conjugate gradient methods with affine combination. Optimization 66(5), 759–776 (2017)

  17. Al-Baali, M., Narushima, Y., Yabe, H.: A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization. Comput. Optim. Appl. 60(1), 89–110 (2015)

  18. Babaie-Kafaki, S., Ghanbari, R.: Two modified three-term conjugate gradient methods with sufficient descent property. Optim. Lett. 8(8), 2285–2297 (2014)

  19. Arzuka, I., Bakar, M.R.A., Leong, W.J.: A scaled three-term conjugate gradient method for unconstrained optimization. J. Inequal. Appl. 2016(1), Article ID 325 (2016)

  20. Liu, J.K., Feng, Y.M., Zou, L.M.: Some three-term conjugate gradient methods with the inexact line search condition. Calcolo 55(2), Article ID 16 (2018)

  21. Li, M.: A family of three-term nonlinear conjugate gradient methods close to the memoryless BFGS method. Optim. Lett. 12(8), 1911–1927 (2018)

  22. Zhang, L., Zhou, W.J., Li, D.H.: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26(4), 629–640 (2006)

  23. Zhang, L., Zhou, W.J., Li, D.H.: Some descent three-term conjugate gradient methods and their global convergence. Optim. Methods Softw. 22(4), 697–711 (2007)

  24. Dennis, J.E. Jr., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. 19(1), 46–89 (1977)

  25. Zhang, L., Zhou, W.J., Li, D.H.: Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search. Numer. Math. 104(4), 561–572 (2006)

  26. Babaie-Kafaki, S., Ghanbari, R.: A hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient methods. Numer. Algorithms 68(3), 481–495 (2015)

  27. Babaie-Kafaki, S., Ghanbari, R.: A hybridization of the Hestenes–Stiefel and Dai–Yuan conjugate gradient methods based on a least-squares approach. Optim. Methods Softw. 30(4), 673–681 (2015)

  28. Hager, W.W., Zhang, H.C.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. 16(1), 170–192 (2005)

  29. Hager, W.W., Zhang, H.C.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2(1), 35–58 (2006)

  30. Zoutendijk, G.: Nonlinear programming, computational methods. In: Abadie, J. (ed.) Integer and Nonlinear Programming, pp. 37–86. North-Holland, Amsterdam (1970)

  31. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2(1), 21–42 (1992)

  32. Wei, Z.X., Yao, S.W., Liu, L.Y.: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183(2), 1341–1350 (2006)

  33. Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)

  34. Bongartz, I., Conn, A.R., Gould, N., Toint, P.L.: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. 21(1), 123–160 (1995)

  35. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. 10(1), 147–161 (2008)

  36. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)


Acknowledgements

The authors wish to thank the two anonymous referees and the editor for their constructive and pertinent suggestions for improving both the presentation and the numerical experiments. They also thank the funding agencies for their support.

Availability of data and materials

Not applicable.

Funding

This work was supported by the National Natural Science Foundation (11761013) and Guangxi Natural Science Foundation (2018GXNSFFA281007) of China.

Author information

All authors read and approved the final manuscript. CT mainly contributed to the algorithm design and convergence analysis; SL mainly contributed to the convergence analysis and numerical results; and ZC mainly contributed to the algorithm design.

Correspondence to Chunming Tang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Tang, C., Li, S. & Cui, Z. Least-squares-based three-term conjugate gradient methods. J Inequal Appl 2020, 27 (2020). https://doi.org/10.1186/s13660-020-2301-6


MSC

  • 90C30
  • 65K05
  • 49M37

Keywords

  • Three-term conjugate gradient method
  • Least-squares technique
  • Sufficient descent property
  • Wolfe–Powell line search
  • Global convergence