
# Least-squares-based three-term conjugate gradient methods

*Journal of Inequalities and Applications*
**volume 2020**, Article number: 27 (2020)

## Abstract

In this paper, we first propose a new three-term conjugate gradient (CG) method, named LSTT, in which the CG parameter is determined by a least-squares technique. We then present two improved variants of the LSTT CG method, aiming to obtain global convergence for general nonlinear functions. The least-squares technique used here combines the advantages of two existing efficient CG methods. The search directions produced by the three proposed methods are sufficient descent directions independent of any line search procedure. Moreover, with the Wolfe–Powell line search, LSTT is proved to be globally convergent for uniformly convex functions, and the two improved variants are globally convergent for general nonlinear functions. Preliminary numerical results are reported to illustrate that our methods are efficient and have advantages over two well-known three-term CG methods.

## 1 Introduction

Consider the following unconstrained optimization problem:

$$ \min_{x\in\mathbb{R}^{n}} f(x), $$

where \(f: \mathbb{R}^{n}\rightarrow\mathbb{R}\) is a continuously differentiable function whose gradient function is denoted by \(g(x)\).

Conjugate gradient (CG) methods are known to be among the most efficient methods for unconstrained optimization due to their advantages of simple structure, low storage, and nice numerical behavior. CG methods have been widely used to solve practical problems, especially large-scale problems such as image recovery [1], condensed matter physics [2], environmental science [3], and unit commitment problems [4–6].

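For the current iterate \(x_{k}\), a CG method generates the new iterate \(x_{k+1}\) by the formula

$$ x_{k+1}=x_{k}+\alpha_{k}d_{k}, $$

where \(\alpha_{k}\) is the stepsize determined by a certain line search and \(d_{k}\) is the search direction of the form

$$ d_{k}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}d_{k-1}, & k\geq1, \end{cases} $$

in which \(\beta_{k}\) is a parameter. Different choices of \(\beta_{k}\) correspond to different CG methods. Some classical formulas for the CG parameter \(\beta_{k}\) are those of Hestenes–Stiefel (HS) [7], Fletcher–Reeves (FR) [8], Polak–Ribière–Polyak (PRP) [9, 10], and Dai–Yuan (DY) [11]:

$$ \beta_{k}^{\mathrm{HS}}=\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}},\qquad \beta_{k}^{\mathrm{FR}}=\frac{\|g_{k}\|^{2}}{\|g_{k-1}\|^{2}},\qquad \beta_{k}^{\mathrm{PRP}}=\frac{g_{k}^{T}y_{k-1}}{\|g_{k-1}\|^{2}},\qquad \beta_{k}^{\mathrm{DY}}=\frac{\|g_{k}\|^{2}}{d_{k-1}^{T}y_{k-1}}, $$

where \(g_{k}=g(x_{k})\), \(y_{k-1}=g_{k}-g_{k-1}\), and \(\|\cdot\|\) denotes the Euclidean norm.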

Here are two commonly used line searches for choosing the stepsize \(\alpha_{k}\).

*The Wolfe–Powell line search*: the stepsize \(\alpha_{k}\) satisfies the following two relations:

$$ f(x_{k}+\alpha_{k}d_{k})-f(x_{k}) \leq\delta\alpha_{k} g_{k}^{T}d_{k} $$(1)

and

$$ g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \geq\sigma g_{k}^{T}d_{k}, $$(2)

where \(0<\delta<\sigma<1\).

*The strong Wolfe–Powell line search*: the stepsize \(\alpha_{k}\) satisfies both (1) and the following relation:

$$ \bigl\vert g(x_{k}+\alpha_{k}d_{k})^{T}d_{k} \bigr\vert \leq\sigma \bigl\vert g_{k}^{T}d_{k} \bigr\vert . $$
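To make the two line-search tests concrete, the following minimal sketch checks condition (1), condition (2), and the strong variant for a given trial stepsize. The helper names `dot` and `wolfe_flags` and the quadratic test function are illustrative assumptions, not part of the paper; the default values \(\delta=0.01\) and \(\sigma=0.1\) are those reported in Sect. 5.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def wolfe_flags(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Return (cond1, cond2, strong) for the trial stepsize alpha."""
    g0_d = dot(grad(x), d)                         # g_k^T d_k (negative for descent)
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    g1_d = dot(grad(x_new), d)                     # g(x_k + alpha d_k)^T d_k
    cond1 = f(x_new) - f(x) <= delta * alpha * g0_d    # relation (1)
    cond2 = g1_d >= sigma * g0_d                       # relation (2)
    strong = abs(g1_d) <= sigma * abs(g0_d)            # strong Wolfe-Powell relation
    return cond1, cond2, strong

# Hypothetical test problem: f(x) = 0.5 * ||x||^2 with the steepest descent direction.
f = lambda x: 0.5 * dot(x, x)
grad = lambda x: list(x)
x0 = [1.0, -2.0]
d0 = [-1.0, 2.0]

exact = wolfe_flags(f, grad, x0, d0, alpha=1.0)       # exact minimizer along d0
tiny = wolfe_flags(f, grad, x0, d0, alpha=1e-3)       # too short: (2) fails
long_step = wolfe_flags(f, grad, x0, d0, alpha=1.95)  # (1) and (2) hold, strong fails
```

On this example the long step overshoots the one-dimensional minimizer: it still satisfies (1) and (2), but not the strong relation, which bounds the directional derivative from both sides.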

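In recent years, based on the above classical formulas and line searches, many variants of CG methods have been proposed, including spectral CG methods [12, 13], hybrid CG methods [14, 15], and three-term CG methods [16, 17]. Among them, the three-term CG methods have attracted particular attention, and a great deal of effort has been devoted to developing methods of this kind; see, e.g., [18–23]. In particular, by combining the PRP method [9, 10] with the BFGS quasi-Newton method [24], Zhang et al. [22] presented a three-term PRP CG method (TTPRP). Their motivation is that the PRP method has good numerical performance but is generally not a descent method when an Armijo-type line search is executed. The direction of TTPRP is given by

$$ d_{k}^{\mathrm{TTPRP}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{PRP}}d_{k-1}-\theta_{k}^{(1)}y_{k-1}, & k\geq1, \end{cases} $$

where

$$ \theta_{k}^{(1)}=\frac{g_{k}^{T}d_{k-1}}{\|g_{k-1}\|^{2}}, $$(3)

which is always a descent direction (independent of line searches) for the objective function.

In the same way, Zhang et al. [25] presented a three-term FR CG method (TTFR) whose direction is of the form

$$ d_{k}^{\mathrm{TTFR}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{FR}}d_{k-1}-\theta_{k}^{(1)}g_{k}, & k\geq1, \end{cases} $$

where \(\theta_{k}^{(1)}\) is given by (3). Later, Zhang et al. [23] proposed a three-term HS CG method (TTHS) whose direction is defined by

$$ d_{k}^{\mathrm{TTHS}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{HS}}d_{k-1}-\theta_{k}^{(2)}y_{k-1}, & k\geq1, \end{cases} $$(4)

where

$$ \theta_{k}^{(2)}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}. $$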

The above approaches [22, 23, 25] have a common advantage that the relation \(d_{k}^{T}g_{k}=-\|g_{k}\|^{2}\) holds. This means that they always generate descent directions without the help of line searches. Moreover, they can all achieve global convergence under suitable line searches.

Before putting forward the idea of our new three-term CG methods, we first briefly review a hybrid CG method (HCG) proposed by Babaie-Kafaki and Ghanbari [26], in which the search direction is of the form

$$ d_{k}^{\mathrm{HCG}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{HCG}}d_{k-1}, & k\geq1, \end{cases} $$

where the parameter is given by a convex combination of the FR and PRP formulas

$$ \beta_{k}^{\mathrm{HCG}}=(1-\theta_{k})\beta_{k}^{\mathrm{PRP}}+\theta_{k}\beta_{k}^{\mathrm{FR}},\quad \theta_{k}\in[0,1]. $$

Clearly, the choice of \(\theta_{k}\) is critical for the practical performance of the HCG method. Taking into account that the TTHS method has good theoretical properties and numerical performance, Babaie-Kafaki and Ghanbari [26] proposed to select \(\theta_{k}\) such that the direction \(d_{k}^{\mathrm{HCG}}\) is as close as possible to \(d_{k}^{\mathrm{TTHS}}\) in the sense that their distance is minimized, i.e., the optimal choice \(\theta_{k}^{*}\) is obtained by solving the least-squares problem

$$ \theta_{k}^{*}=\arg\min_{\theta_{k}} \bigl\Vert d_{k}^{\mathrm{HCG}}-d_{k}^{\mathrm{TTHS}} \bigr\Vert ^{2}. $$(5)

Similarly, Babaie-Kafaki and Ghanbari [27] proposed another hybrid CG method by combining HS with DY, in which the combination coefficient is also determined by the least-squares technique (5). The numerical results in [26, 27] show that this least-squares-based approach is very efficient.

Summarizing the above discussions, we have the following two observations: (1) the three-term CG methods perform well both theoretically and numerically; (2) the least-squares technique can greatly improve the efficiency of CG methods. Putting these together, the main goal of this paper is to develop new three-term CG methods that are based on the least-squares technique. More precisely, we first propose a basic three-term CG method, namely LSTT, in which the least-squares technique combines the advantages of two existing efficient CG methods. With the Wolfe–Powell line search, LSTT is proved to be globally convergent for uniformly convex functions. In order to obtain global convergence for general nonlinear functions, we further present two improved variants of the LSTT CG method. All three methods generate sufficient descent directions independent of any line search procedure. Finally, some preliminary numerical results are reported to illustrate that our methods are efficient and have advantages over two well-known three-term CG methods.

The paper is organized as follows. In Sect. 2, we present the basic LSTT CG method. Global convergence of LSTT is proved in Sect. 3. Two improved variants of LSTT and their convergence analysis are given in Sect. 4. Numerical results are reported in Sect. 5. Some concluding remarks are made in Sect. 6.

## 2 Least-squares-based three-term (LSTT) CG method

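In this section, we first derive a new three-term CG formula, and then present the corresponding CG algorithm. Our formula is based on the following modified HS (MHS) formula proposed by Hager and Zhang [28, 29]:

$$ \beta_{k}^{\mathrm{MHS}}(\tau_{k})=\beta_{k}^{\mathrm{HS}}-\tau_{k}\frac{\|y_{k-1}\|^{2}g_{k}^{T}d_{k-1}}{(d_{k-1}^{T}y_{k-1})^{2}}, $$(6)

where \(\tau_{k}\) (≥0) is a parameter. The corresponding direction is then given by

$$ d_{k}^{\mathrm{MHS}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k})d_{k-1}, & k\geq1. \end{cases} $$(7)

Different choices of \(\tau_{k}\) lead to different types of CG formulas. In particular, \(\beta_{k}^{\mathrm{MHS}}(0)=\beta_{k}^{\mathrm{HS}}\), and \(\beta_{k}^{\mathrm{MHS}}(2)\) is just the formula proposed in [28].

In this paper, we present a more sophisticated choice of \(\tau_{k}\) by making use of the least-squares technique. More precisely, the optimal choice \(\tau_{k}^{*}\) is determined such that the direction \(d_{k}^{\mathrm{MHS}}\) is as close as possible to \(d_{k}^{\mathrm{TTHS}}\), i.e., it is generated by solving the least-squares problem

$$ \tau_{k}^{*}=\arg\min_{\tau_{k}} \bigl\Vert d_{k}^{\mathrm{MHS}}-d_{k}^{\mathrm{TTHS}} \bigr\Vert ^{2}. $$(8)

Substituting (4) and (7) in (8), we have

$$ \tau_{k}^{*}=\arg\min_{\tau_{k}} \biggl\Vert \frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}y_{k-1}-\tau_{k}\frac{\|y_{k-1}\|^{2}g_{k}^{T}d_{k-1}}{(d_{k-1}^{T}y_{k-1})^{2}}d_{k-1} \biggr\Vert ^{2}, $$

which implies

$$ \tau_{k}^{*}=\frac{(d_{k-1}^{T}y_{k-1})^{2}}{\|y_{k-1}\|^{2}\|d_{k-1}\|^{2}}. $$(9)

Thus, from (6), we obtain

$$ \beta_{k}^{\mathrm{MHS}}\bigl(\tau_{k}^{*}\bigr)=\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{\|d_{k-1}\|^{2}}. $$(10)

So far, it seems that the two-term direction \(d_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) obtained from (9) and (10) is a “good enough” direction; however, it may not always be a descent direction of the objective function. In order to overcome this difficulty, we propose a least-squares-based three-term (LSTT) direction by adding a third term to \(d_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) as follows:

$$ d_{k}^{\mathrm{LSTT}}= \begin{cases} -g_{k}, & k=0, \\ -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})d_{k-1}-\theta_{k}y_{k-1}, & k\geq1, \end{cases} $$(11)

where

$$ \theta_{k}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}. $$(12)

The following lemma shows that the direction \(d_{k}^{\mathrm{LSTT}}\) (11) is a sufficient descent direction, independent of the line search used.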

### Lemma 1

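*Let the search direction* \(d_{k}:=d_{k}^{\mathrm{LSTT}}\) *be generated by* (11). *Then it satisfies the following sufficient descent condition*:

$$ g_{k}^{T}d_{k}\leq-\|g_{k}\|^{2}. $$(13)

### Proof

For \(k=0\), we have \(d_{0}=-g_{0}\), so it follows that \(g_{0}^{T}d_{0}=-\|g_{0}\|^{2}\).

For \(k\geq1\), we have

$$ g_{k}^{T}d_{k}=-\|g_{k}\|^{2}+\beta_{k}^{\mathrm{MHS}}\bigl(\tau_{k}^{*}\bigr)g_{k}^{T}d_{k-1}-\theta_{k}g_{k}^{T}y_{k-1}, $$

which along with (10) and (12) shows that

$$ g_{k}^{T}d_{k}=-\|g_{k}\|^{2}-\frac{(g_{k}^{T}d_{k-1})^{2}}{\|d_{k-1}\|^{2}}\leq-\|g_{k}\|^{2}. $$

So the proof is completed. □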

Now, we formally present the least-squares-based three-term CG algorithm (Algorithm 1) that uses \(d_{k}^{\mathrm{LSTT}}\) (11) as the search direction. Note that it reduces to the classical HS method if an exact line search is executed in Step 3.
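As a minimal sketch of how such a direction could be computed, the snippet below assumes that the CG parameter in (10) is \(\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{\|d_{k-1}\|^{2}}\), that \(\theta_{k}\) in (12) is \(\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}\), and that the third term enters as \(-\theta_{k}y_{k-1}\). The function name `lstt_direction` and the test vectors are hypothetical, and the sketch covers only the direction update, not the full Algorithm 1.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def lstt_direction(g, g_prev, d_prev):
    """Three-term direction -g + beta*d_prev - theta*y (assumed form of (10)-(12))."""
    y = [a - b for a, b in zip(g, g_prev)]               # y_{k-1} = g_k - g_{k-1}
    dty = dot(d_prev, y)                                 # d_{k-1}^T y_{k-1}, assumed nonzero
    gtd = dot(g, d_prev)                                 # g_k^T d_{k-1}
    beta = dot(g, y) / dty - gtd / dot(d_prev, d_prev)   # assumed formula (10)
    theta = gtd / dty                                    # assumed formula (12)
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]

# Hypothetical data for one iteration.
g_prev = [1.0, 2.0, -1.0]
d_prev = [-1.0, -2.0, 1.0]
g = [0.5, -1.0, 2.0]
d = lstt_direction(g, g_prev, d_prev)

# Under the assumed formulas, g^T d = -||g||^2 - (g^T d_prev)^2 / ||d_prev||^2.
lhs = dot(g, d)
rhs = -dot(g, g) - dot(g, d_prev) ** 2 / dot(d_prev, d_prev)

# If g is orthogonal to d_prev (exact line search), the HS direction is recovered.
g_exact = [2.0, -1.0, 0.0]
d_exact = lstt_direction(g_exact, g_prev, d_prev)
y_exact = [a - b for a, b in zip(g_exact, g_prev)]
beta_hs = dot(g_exact, y_exact) / dot(d_prev, y_exact)
d_hs = [-gi + beta_hs * di for gi, di in zip(g_exact, d_prev)]
```

The two checks mirror the two properties stated in the text: the direction is a sufficient descent direction regardless of the stepsize, and it coincides with the HS direction when \(g_{k}^{T}d_{k-1}=0\).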

## 3 Convergence analysis for uniformly convex functions

In this section, we establish the global convergence of Algorithm 1 for uniformly convex functions. The stepsize \(\alpha_{k}\) at Step 3 is generated by the Wolfe–Powell line search (1) and (2). For this purpose, we first make two standard assumptions on the objective function, which are assumed to hold throughout the rest of the paper.

### Assumption 1

The level set \(\varOmega=\{x\in\mathbb{R}^{n}| f(x)\leq f(x_{0})\}\) is bounded.

### Assumption 2

There is an open set \(\mathcal{O}\) containing *Ω*, in which \(f(x)\) is continuously differentiable and its gradient function \(g(x)\) is Lipschitz continuous, i.e., there exists a constant \(L>0\) such that

$$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq L\|x-y\|,\quad \forall x,y\in\mathcal{O}. $$(14)

From Assumptions 1 and 2, it is not difficult to verify that there is a constant \(\gamma>0\) such that

$$ \bigl\Vert g(x) \bigr\Vert \leq\gamma,\quad \forall x\in\varOmega. $$(15)

The following lemma, known as the *Zoutendijk condition* [30], is commonly used in proving the convergence of CG methods.

### Lemma 2

*Suppose that the sequence* \(\{x_{k}\}\) *of iterates is generated by Algorithm* 1. *If the search direction* \(d_{k}\) *satisfies* \(g_{k}^{T}d_{k}<0\) *and the stepsize* \(\alpha_{k}\) *is calculated by the Wolfe–Powell line search* (1) *and* (2), *then we have*

$$ \sum_{k=0}^{\infty}\frac{(g_{k}^{T}d_{k})^{2}}{\|d_{k}\|^{2}}< \infty. $$(16)

From Lemma 1, we know that if Algorithm 1 does not stop, then

$$ g_{k}^{T}d_{k}\leq-\|g_{k}\|^{2}< 0. $$

Thus, under Assumptions 1 and 2, relation (16) holds immediately for Algorithm 1.

Now, we present the global convergence of Algorithm 1 (with \(\epsilon=0\)) for uniformly convex functions.

### Theorem 1

*Suppose that the sequence* \(\{x_{k}\}\) *of iterates is generated by Algorithm* 1, *and that the stepsize* \(\alpha_{k}\) *is calculated by the Wolfe–Powell line search* (1) *and* (2). *If* \(f\) *is uniformly convex on the level set* *Ω*, *i.e., there exists a constant* \(\mu>0\) *such that*

$$ \bigl(g(x)-g(y)\bigr)^{T}(x-y)\geq\mu\|x-y\|^{2},\quad \forall x,y\in\varOmega, $$(17)

*then either* \(\|g_{k}\|=0\) *for some* \(k\), *or*

$$ \lim_{k\rightarrow\infty}\|g_{k}\|=0. $$

### Proof

If \(\|g_{k}\|=0\) for some *k*, then the algorithm stops. So, in what follows, we assume that an infinite sequence \(\{x_{k}\}\) is generated.

According to Lipschitz condition (14), the following relation holds:

where \(s_{k-1}:=x_{k}-x_{k-1}\). In addition, from (17) it follows that

By combining the definition of \(d_{k}\) (cf. (10), (11), and (12)) with relations (18) and (19), we have

This together with Lemma 1 and (16) shows that

which implies that \(\lim_{k\rightarrow\infty}\|g_{k}\|=0\). □
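As a worked sketch of the bound behind this argument, assume that the direction has the form \(d_{k}=-g_{k}+\beta_{k}d_{k-1}-\theta_{k}y_{k-1}\) with \(\beta_{k}=\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{\|d_{k-1}\|^{2}}\) and \(\theta_{k}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}\), that the Lipschitz condition yields \(\|y_{k-1}\|\leq L\|s_{k-1}\|\), and that uniform convexity together with \(s_{k-1}=\alpha_{k-1}d_{k-1}\) yields \(d_{k-1}^{T}y_{k-1}\geq\mu\|s_{k-1}\|\|d_{k-1}\|\). These forms are assumptions made for illustration, and the constant obtained below need not coincide with the one used in the proof. By the triangle and Cauchy–Schwarz inequalities,

$$ \begin{aligned} \|d_{k}\| &\leq\|g_{k}\|+\frac{\|g_{k}\|\|y_{k-1}\|}{d_{k-1}^{T}y_{k-1}}\|d_{k-1}\|+\frac{\|g_{k}\|\|d_{k-1}\|}{\|d_{k-1}\|^{2}}\|d_{k-1}\|+\frac{\|g_{k}\|\|d_{k-1}\|}{d_{k-1}^{T}y_{k-1}}\|y_{k-1}\| \\ &\leq\|g_{k}\|+\frac{L}{\mu}\|g_{k}\|+\|g_{k}\|+\frac{L}{\mu}\|g_{k}\| = \biggl(2+\frac{2L}{\mu} \biggr)\|g_{k}\|. \end{aligned} $$

Combining this with the sufficient descent condition \(g_{k}^{T}d_{k}\leq-\|g_{k}\|^{2}\) and the Zoutendijk condition (16) gives

$$ \sum_{k=0}^{\infty}\|g_{k}\|^{2}\leq \biggl(2+\frac{2L}{\mu} \biggr)^{2}\sum_{k=0}^{\infty}\frac{\|g_{k}\|^{4}}{\|d_{k}\|^{2}}\leq \biggl(2+\frac{2L}{\mu} \biggr)^{2}\sum_{k=0}^{\infty}\frac{(g_{k}^{T}d_{k})^{2}}{\|d_{k}\|^{2}}< \infty, $$

so the terms \(\|g_{k}\|^{2}\) of a convergent series must tend to zero.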

## 4 Two improved variants of the LSTT CG method

Note that the global convergence of Algorithm 1 is established only for uniformly convex functions. In this section, we present two improved variants of Algorithm 1, which both have global convergence property for general nonlinear functions.

### 4.1 An improved version of LSTT (LSTT+)

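In fact, the main difficulty impairing convergence for general functions is that \(\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) (cf. (10)) may be negative. So, similar to the strategy used in [31], we present the first modification of direction \(d_{k}^{\mathrm{LSTT}}\) (11) as follows:

$$ d_{k}^{\mathrm{LSTT+}}= \begin{cases} -g_{k}+\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})d_{k-1}-\theta_{k}y_{k-1}, & \text{if } k\geq1 \text{ and } \beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})>0, \\ -g_{k}, & \text{otherwise}, \end{cases} $$(20)

where \(\beta_{k}^{\mathrm{MHS}}(\tau_{k}^{*})\) and \(\theta_{k}\) are given by (10) and (12), respectively. The corresponding algorithm is given in Algorithm 2.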
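The truncation idea can be sketched as follows, under the assumptions that the CG parameter is \(\frac{g_{k}^{T}y_{k-1}}{d_{k-1}^{T}y_{k-1}}-\frac{g_{k}^{T}d_{k-1}}{\|d_{k-1}\|^{2}}\), that \(\theta_{k}=\frac{g_{k}^{T}d_{k-1}}{d_{k-1}^{T}y_{k-1}}\), and that the method falls back to the steepest descent direction whenever the parameter is not positive. The function name `lstt_plus_direction` and the test vectors are hypothetical; this is not a reproduction of Algorithm 2.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def lstt_plus_direction(g, g_prev, d_prev):
    """Return (direction, beta); uses -g when the assumed CG parameter is not positive."""
    y = [a - b for a, b in zip(g, g_prev)]               # y_{k-1}
    dty = dot(d_prev, y)                                 # assumed nonzero
    gtd = dot(g, d_prev)
    beta = dot(g, y) / dty - gtd / dot(d_prev, d_prev)   # assumed CG parameter
    if beta > 0:
        theta = gtd / dty
        d = [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]
    else:
        d = [-gi for gi in g]                            # steepest descent fallback
    return d, beta

# Case 1 (hypothetical data): positive parameter, three-term direction is used.
d_pos, beta_pos = lstt_plus_direction([0.5, -1.0, 2.0], [1.0, 2.0, -1.0], [-1.0, -2.0, 1.0])

# Case 2 (hypothetical data): negative parameter, fallback to -g.
g_neg = [0.5, 0.5]
d_neg, beta_neg = lstt_plus_direction(g_neg, [1.0, 1.0], [-1.0, 0.0])
```

In both branches the returned direction satisfies \(g_{k}^{T}d_{k}\leq-\|g_{k}\|^{2}\), which is why the truncation does not affect the sufficient descent property.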

Obviously, the search direction \(d_{k}\) generated by Algorithm 2 satisfies the sufficient descent condition (13). Therefore, if the stepsize \(\alpha_{k}\) is calculated by the Wolfe–Powell line search (1) and (2), then the Zoutendijk condition (16) also holds for Algorithm 2.

The following lemma shows some other important properties of the search direction \(d_{k}\).

### Lemma 3

*Suppose that the sequence* \(\{d_{k}\}\) *of directions is generated by Algorithm* 2, *and that the stepsize* \(\alpha_{k}\) *is calculated by the Wolfe–Powell line search* (1) *and* (2). *If there is a constant* \(c>0\) *such that* \(\|g_{k}\|\geq c\) *for any* \(k\), *then*

$$ \sum_{k=1}^{\infty}\|u_{k}-u_{k-1}\|^{2}< \infty, $$

*where* \(u_{k}={d_{k}}/{\|d_{k}\|}\).

### Proof

Firstly, from Lemma 1 and the fact that \(\|g_{k}\|\geq c\), we have

which implies that \(d_{k}\neq0\) for each *k*.

Secondly, from (16) and (21), we have

Now we rewrite the direction \(d_{k}\) in (20) as

where

Denote

According to (23) and (24), it follows that

From the fact that \(\|u_{k}\|=1\), we obtain

Since \(b_{k}\geq0\), we get

On the other hand, from the Wolfe–Powell line search condition (2) and (21), we have

Since \(g_{k-1}^{T}d_{k-1}<0\), we have

This together with (26) shows that

Again from (2), it follows that

which implies

By combining (27) and (28), we have

In addition, the following relation comes directly from (15)

Finally, from (15), (29), and (30), we give a bound on the numerator of \(a_{k}\):

where \(M=\gamma+2\gamma\max \{\frac{\sigma}{1-\sigma },1 \}\). This together with (25) shows that

Summing the above relation over *k* and using (22), the proof is completed. □

We are now ready to prove the global convergence of Algorithm 2.

### Theorem 2

*Suppose that the sequence* \(\{x_{k}\}\) *of iterates is generated by Algorithm* 2, *and that the stepsize* \(\alpha_{k}\) *is calculated by the Wolfe–Powell line search* (1) *and* (2). *Then either* \(\|g_{k}\|=0\) *for some* \(k\) *or*

$$ \liminf_{k\rightarrow\infty}\|g_{k}\|=0. $$

### Proof

Suppose by contradiction that there is a constant \(c>0\) such that \(\| g_{k}\|\geq c\) for any *k*. So the conditions of Lemma 3 hold.

We first show that there is a bound on the steps \(s_{k}\), whose proof is a modified version of [28, Thm. 3.2]. From Assumption 1, there is a constant \(B>0\) such that

which implies

For any \(l\geq k\), it is clear that

This together with the triangle inequality and (31) shows that

Denote

where *σ*, *L*, and *γ* are given in (2), (14), and (15), respectively. Let △ be a positive integer, chosen large enough that

Moreover, from Lemma 3, we can choose an index \(k_{0}\) large enough that

Thus, if \(j>k\geq k_{0}\) and \(j-k\leq\triangle\), we can derive the following relations by (34) and the Cauchy–Schwarz inequality:

Combining (32) and (35), we have

where \(l>k\geq k_{0}\) and \(l-k\leq\triangle\).

Next, we prove that there is a bound on the directions \(d_{k}\).

If \(d_{k}=-g_{k}\) in (20), then from (15) we have

In what follows, we consider the case where

Thus, from (15), (18), and (26), we have

Then, by defining \(S_{j}=2\xi^{2}\|s_{j}\|^{2}\), for \(l>k_{0}\), we have

From (36), following the corresponding lines in [28, Thm. 3.2], we can conclude that the right-hand side of (38) is bounded, and the bound is independent of *l*. This together with (37) contradicts (22). Therefore, \(\liminf_{k\rightarrow\infty}\|g_{k}\|=0\). □

### 4.2 A modified version of LSTT+ (MLSTT+)

In order to further improve the efficiency of Algorithm 2, we propose a modified version of \(d_{k}^{\mathrm{LSTT+}}\) (20) as follows:

where \(\theta_{k}\) is given by (12) and

The difference between (20) and (39) is that \(y_{k-1}\) is replaced by \(z_{k-1}\). This idea, which aims to improve the famous PRP method, originated from [32]. Such a substitution seems useful here in that it could increase the possibility of the CG parameter being positive, and as a result, the three-term direction is used more often. In fact, as iterations go along, \(\|g_{k}\|\) approaches zero asymptotically, and therefore the fact that \(\|g_{k}\|/\|g_{k-1}\|<1\) may frequently happen. If in addition \(g_{k}^{T}g_{k-1}>0\), then we have
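To spell out this comparison, suppose, as an assumption following the modification of [32], that \(z_{k-1}=g_{k}-\frac{\|g_{k}\|}{\|g_{k-1}\|}g_{k-1}\). If \(\|g_{k}\|/\|g_{k-1}\|<1\) and \(g_{k}^{T}g_{k-1}>0\), then

$$ g_{k}^{T}z_{k-1}=\|g_{k}\|^{2}-\frac{\|g_{k}\|}{\|g_{k-1}\|}g_{k}^{T}g_{k-1}>\|g_{k}\|^{2}-g_{k}^{T}g_{k-1}=g_{k}^{T}y_{k-1}, $$

so replacing \(y_{k-1}\) by \(z_{k-1}\) enlarges the inner product with \(g_{k}\). For instance, with the hypothetical values \(g_{k-1}=(2,0)^{T}\) and \(g_{k}=(1,0)^{T}\), one gets \(z_{k-1}=(0,0)^{T}\) and \(g_{k}^{T}z_{k-1}=0\), whereas \(g_{k}^{T}y_{k-1}=-1\).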

The following lemma shows that the search direction (39) also has sufficient descent property.

### Lemma 4

*Let the search direction* \(d_{k}\) *be generated by* (39). *Then it satisfies the following sufficient descent condition* (*independent of line search*):

### Proof

The proof is similar to that of Lemma 1. □

From Lemma 4, we know that the Zoutendijk condition (16) also holds for Algorithm 3. In what follows, we show that Algorithm 3 is globally convergent for general functions. The following lemma illustrates that the direction \(d_{k}\) generated by Algorithm 3 inherits some useful properties of \(d_{k}^{\mathrm{LSTT+}}\) (20), whose proof is a modification of Lemma 3.

### Lemma 5

*Suppose that the sequence* \(\{d_{k}\}\) *of directions is generated by Algorithm* 3. *If there is a constant* \(c>0\) *such that* \(\|g_{k}\|\geq c\) *for any* \(k\), *then*

$$ \sum_{k=1}^{\infty}\|u_{k}-u_{k-1}\|^{2}< \infty, $$

*where* \(u_{k}={d_{k}}/{\|d_{k}\|}\).

### Proof

From the related analysis in Lemma 3, we have

Now we redisplay the direction \(d_{k}\) in (39) as

where

Define

According to (44) and (46), it follows that

Thus, following the lines in the proof of Lemma 3, we get

Moreover, we also have

The following relations hold by the definition of \(z_{k-1}\) (41):

By combining (15), (48), and (49), we put a bound on the numerator of \(\|\hat{a}_{k}\|\):

where \(\hat{M}=\gamma+4\gamma\max \{\frac{\sigma}{1-\sigma },1 \}\). This together with (47) shows that

Summing the above inequalities over *k* and utilizing (43), we complete the proof. □

We finally present the global convergence of Algorithm 3.

### Theorem 3

*Suppose that the sequence* \(\{x_{k}\}\) *of iterates is generated by Algorithm* 3. *Then either* \(\|g_{k}\|=0\) *for some* \(k\) *or*

$$ \liminf_{k\rightarrow\infty}\|g_{k}\|=0. $$

### Proof

Suppose by contradiction that there is a constant \(c>0\) such that \(\|g_{k}\|\geq c\) for any *k*. Then the conclusions of Lemma 5 hold.

Without loss of generality, we only consider the case where

So from (15), (18), (26), and (49), we obtain

where \(\eta=2\gamma\) and \(\rho=\frac{4\gamma L}{(1-\sigma)c^{2}}\).

The remainder of the argument is analogous to that of Theorem 2, hence omitted here. □

## 5 Numerical results

In this section, we aim to test the practical effectiveness of Algorithm 2 (LSTT+) and Algorithm 3 (MLSTT+) which are both convergent for general functions under the Wolfe–Powell line search. The numerical results are compared with the TTPRP [22] method and the TTHS [23] method by solving 104 test problems from the CUTE library [33–35], whose dimensions range from 2 to 5,000,000.

All codes were written in Matlab R2014a and run on a PC with 4 GB of RAM and the Windows 7 operating system. The stepsizes \(\alpha_{k}\) are generated by the Wolfe–Powell line search with \(\sigma=0.1\) and \(\delta=0.01\). In Tables 1, 2, 3, “Name” and “n” denote the abbreviation of the test problem and its dimension. “Itr/NF/NG” stand for the number of iterations, function evaluations, and gradient evaluations, respectively. “Tcpu” and “\(\|g_{*}\|\)” denote the CPU time and the final norm of the gradient, respectively. The stopping criterion is \(\| g_{k}\|\leq10^{-6}\) or \(\mathrm{Itr}>2000\).

To clearly show the differences in numerical performance between the above-mentioned four CG methods, we present the performance profiles introduced by Dolan and Moré [36] in Figs. 1, 2, 3, 4 (with respect to Itr, NF, NG, and Tcpu, respectively), which are constructed as follows.

Denote the whole set of \(n_{p}\) test problems by \(\mathcal{P}\), and the set of solvers by \(\mathcal{S}\). Let \(t_{p, s}\) be the Tcpu (the Itr or others) required to solve problem \(p \in\mathcal{P}\) by solver \(s \in\mathcal{S}\), and define the performance ratio as

$$ r_{p,s}=\frac{t_{p,s}}{\min\{t_{p,s}: s\in\mathcal{S}\}}. $$

For entries \(t_{p, s}\) marked “NaN” in Tables 1, 2, 3, we set \(r_{p, s}=2 \max\{r_{p, s}: s \in\mathcal{S}\}\). The performance profile for each solver is then defined by

$$ \rho_{s}(\tau)=\frac{1}{n_{p}}\operatorname{size} \{p\in\mathcal{P}: r_{p,s}\leq\tau \}, $$

where \(\operatorname{size}(A)\) stands for the number of elements in the set *A*. Hence \(\rho_{s}(\tau)\) is the probability for solver \(s \in\mathcal{S}\) that the performance ratio \(r_{p, s}\) is within a factor \(\tau\in \mathbb{R}\). The function \(\rho_{s}\) is the (cumulative) distribution function for the performance ratio. The solver whose curve lies on top outperforms the rest of the solvers. Refer to [36] for more details.

For each method, the performance profile plots the fraction \(\rho _{s}(\tau)\) of the problems for which the method is within a factor *τ* of the best time. The left side of the figure represents the percentage of the test problems for which a method is the fastest. The right side represents the percentage of the test problems that are successfully solved by each of the methods. The top curve is the method that solved the most problems in a time that was within a factor *τ* of the best time.
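The construction of a performance profile can be sketched in a few lines. The snippet below assumes the ratio \(r_{p,s}=t_{p,s}/\min_{s}t_{p,s}\) and the profile \(\rho_{s}(\tau)=\operatorname{size}\{p: r_{p,s}\leq\tau\}/n_{p}\), marks a failed run by `None` (playing the role of “NaN”), and assumes that every problem is solved by at least one solver. The function name `performance_profile` and the cost table are hypothetical and are not taken from Tables 1, 2, 3.

```python
def performance_profile(costs, taus):
    """costs[p][s]: cost of solver s on problem p, None for a failed run.

    Returns rho[i][s], the fraction of problems with ratio r_{p,s} <= taus[i].
    """
    n_p, n_s = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        best = min(t for t in row if t is not None)      # best cost on this problem
        r = [t / best if t is not None else None for t in row]
        worst = max(v for v in r if v is not None)
        # Failed runs get twice the largest ratio attained on this problem.
        ratios.append([v if v is not None else 2.0 * worst for v in r])
    return [
        [sum(1 for p in range(n_p) if ratios[p][s] <= tau) / n_p for s in range(n_s)]
        for tau in taus
    ]

# Hypothetical costs for 3 problems and 3 solvers; solver 2 fails on problem 0.
costs = [
    [1.0, 2.0, None],
    [4.0, 2.0, 8.0],
    [3.0, 3.0, 6.0],
]
rho = performance_profile(costs, taus=[1.0, 2.0])
```

The row for \(\tau=1\) gives the fraction of problems on which each solver attains the best cost, which corresponds to the left end of the plotted curves. Note that, with the convention above, a failed run is counted as solved once \(\tau\) reaches its assigned ratio, so the profile is informative only for \(\tau\) below that value.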

In Figs. 1, 2, 3, 4, we compare the performance of the LSTT+ and MLSTT+ methods with that of the TTPRP and TTHS methods. We observe from Fig. 1 that MLSTT+ requires the smallest number of iterations on about 51% of the test problems, and it ultimately solves about 98% of the test problems. LSTT+ has the second best performance, solving 88% of the test problems successfully, while TTPRP and TTHS solve about 80% and 78% of the test problems successfully, respectively. Figure 2 shows that MLSTT+ exhibits the best performance with respect to the number of function evaluations, since it solves about 49% of the test problems with the smallest number of function evaluations; LSTT+ has the second best performance, with about 40% in the same sense. From Fig. 3, MLSTT+ and LSTT+ perform better than the other two methods with respect to the number of gradient evaluations: MLSTT+ solves about 56% of the test problems with the smallest number of gradient evaluations, while LSTT+ does so for about 41%. In Fig. 4, MLSTT+ displays the best performance with respect to CPU time, since it solves about 53% of the test problems with the least CPU time; LSTT+ is second, with 42%. Since all methods were implemented with the same line search, we conclude that the LSTT+ and MLSTT+ methods appear to be more efficient.

Combining Tables 1, 2, 3 and Figs. 1, 2, 3, 4, we conclude that LSTT+ and MLSTT+ perform better than TTPRP and TTHS, with MLSTT+ performing best. This shows that the methods proposed in this paper possess good numerical performance.

## 6 Conclusion

In this paper, we have presented three new three-term CG methods that are based on the least-squares technique to determine the CG parameters. All can generate sufficient descent directions without the help of a line search procedure. The basic one is globally convergent for uniformly convex functions, while the other two improved variants possess global convergence for general nonlinear functions. Preliminary numerical results show that our methods are very promising.

## References

1. Tripathi, A., McNulty, I., Shpyrko, O.G.: Ptychographic overlap constraint errors and the limits of their numerical recovery using conjugate gradient descent methods. Opt. Express **22**(2), 1452–1466 (2014)
2. Antoine, X., Levitt, A., Tang, Q.: Efficient spectral computation of the stationary states of rotating Bose–Einstein condensates by preconditioned nonlinear conjugate gradient methods. J. Comput. Phys. **343**, 92–109 (2017)
3. Azimi, A., Daneshgar, E.: Indoor contaminant source identification by inverse zonal method: Levenberg–Marquardt and conjugate gradient methods. Adv. Build. Energy Res. **12**(2), 250–273 (2018)
4. Yang, L.F., Jian, J.B., Wang, Y.Y., Dong, Z.Y.: Projected mixed integer programming formulations for unit commitment problem. Int. J. Electr. Power Energy Syst. **68**, 195–202 (2015)
5. Yang, L.F., Jian, J.B., Zhu, Y.N., Dong, Z.Y.: Tight relaxation method for unit commitment problem using reformulation and lift-and-project. IEEE Trans. Power Syst. **30**(1), 13–23 (2015)
6. Yang, L.F., Zhang, C., Jian, J.B., Meng, K., Xu, Y., Dong, Z.Y.: A novel projected two-binary-variable formulation for unit commitment in power systems. Appl. Energy **187**, 732–745 (2017)
7. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. **49**(6), 409–436 (1952)
8. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. **7**(2), 149–154 (1964)
9. Polak, E.: Note sur la convergence de méthodes de directions conjuguées. Revue Francaise Information Recherche Operationnelle **16**(16), 35–43 (1969)
10. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. **9**(4), 94–112 (1969)
11. Dai, Y.H., Yuan, Y.X.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. **10**(1), 177–182 (1999)
12. Dong, X.L., Liu, H.W., He, Y.B.: New version of the three-term conjugate gradient method based on spectral scaling conjugacy condition that generates descent search direction. Appl. Math. Comput. **269**, 606–617 (2015)
13. Jian, J.B., Chen, Q., Jiang, X.Z., Zeng, Y.F., Yin, J.H.: A new spectral conjugate gradient method for large-scale unconstrained optimization. Optim. Methods Softw. **32**(3), 503–515 (2017)
14. Sun, M., Liu, J.: New hybrid conjugate gradient projection method for the convex constrained equations. Calcolo **53**(3), 399–411 (2016)
15. Mtagulwa, P., Kaelo, P.: An efficient modified PRP-FR hybrid conjugate gradient method for solving unconstrained optimization problems. Appl. Numer. Math. **145**, 111–120 (2019)
16. Dong, X.-L., Han, D.-R., Ghanbari, R., Li, X.-L., Dai, Z.-F.: Some new three-term Hestenes–Stiefel conjugate gradient methods with affine combination. Optimization **66**(5), 759–776 (2017)
17. Albaali, M., Narushima, Y., Yabe, H.: A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization. Comput. Optim. Appl. **60**(1), 89–110 (2015)
18. Babaie-Kafaki, S., Ghanbari, R.: Two modified three-term conjugate gradient methods with sufficient descent property. Optim. Lett. **8**(8), 2285–2297 (2014)
19. Arzuka, I., Bakar, M.R.A., Leong, W.J.: A scaled three-term conjugate gradient method for unconstrained optimization. J. Inequal. Appl. **2016**(1), Article ID 325 (2016)
20. Liu, J.K., Feng, Y.M., Zou, L.M.: Some three-term conjugate gradient methods with the inexact line search condition. Calcolo **55**(2), Article ID 16 (2018)
21. Li, M.: A family of three-term nonlinear conjugate gradient methods close to the memoryless BFGS method. Optim. Lett. **12**(8), 1911–1927 (2018)
22. Zhang, L., Zhou, W.J., Li, D.H.: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. **26**(4), 629–640 (2006)
23. Zhang, L., Zhou, W.J., Li, D.H.: Some descent three-term conjugate gradient methods and their global convergence. Optim. Methods Softw. **22**(4), 697–711 (2007)
24. Dennis, J.E. Jr., Moré, J.J.: Quasi-Newton methods, motivation and theory. SIAM Rev. **19**(1), 46–89 (1977)
25. Zhang, L., Zhou, W.J., Li, D.H.: Global convergence of a modified Fletcher–Reeves conjugate gradient method with Armijo-type line search. Numer. Math. **104**(4), 561–572 (2006)
26. Babaie-Kafaki, S., Ghanbari, R.: A hybridization of the Polak–Ribière–Polyak and Fletcher–Reeves conjugate gradient methods. Numer. Algorithms **68**(3), 481–495 (2015)
27. Babaie-Kafaki, S., Ghanbari, R.: A hybridization of the Hestenes–Stiefel and Dai–Yuan conjugate gradient methods based on a least-squares approach. Optim. Methods Softw. **30**(4), 673–681 (2015)
28. Hager, W.W., Zhang, H.C.: A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim. **16**(1), 170–192 (2005)
29. Hager, W.W., Zhang, H.C.: A survey of nonlinear conjugate gradient methods. Pac. J. Optim. **2**(1), 35–58 (2006)
30. Zoutendijk, G.: Nonlinear programming, computational methods. In: Abadie, J. (ed.) Integer and Nonlinear Programming, pp. 37–86. North-Holland, Amsterdam (1970)
31. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. **2**(1), 21–42 (1992)
32. Wei, Z.X., Yao, S.W., Liu, L.Y.: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. **183**(2), 1341–1350 (2006)
33. Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. **7**(1), 17–41 (1981)
34. Bongartz, I., Conn, A.R., Gould, N., Toint, P.L.: CUTE: constrained and unconstrained testing environment. ACM Trans. Math. Softw. **21**(1), 123–160 (1995)
35. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. **10**(1), 147–161 (2008)
36. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. **91**(2), 201–213 (2002)

### Acknowledgements

The authors wish to thank the two anonymous referees and the editor for their constructive and pertinent suggestions for improving both the presentation and the numerical experiments. They also gratefully acknowledge the funding support.

### Availability of data and materials

Not applicable.

## Funding

This work was supported by the National Natural Science Foundation (11761013) and Guangxi Natural Science Foundation (2018GXNSFFA281007) of China.

## Author information

### Contributions

All authors read and approved the final manuscript. CT mainly contributed to the algorithm design and convergence analysis; SL mainly contributed to the convergence analysis and numerical results; and ZC mainly contributed to the algorithm design.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Tang, C., Li, S. & Cui, Z. Least-squares-based three-term conjugate gradient methods.
*J Inequal Appl* **2020**, 27 (2020). https://doi.org/10.1186/s13660-020-2301-6
