A new spectral conjugate gradient method for large-scale unconstrained optimisation

Spectral conjugate gradient methods are of considerable interest and have been proved effective for strictly convex quadratic minimisation. In this paper, a new spectral conjugate gradient method is proposed for solving large-scale unconstrained optimisation problems. Motivated by the advantages of the approximate optimal stepsize strategy used in gradient methods, we design a new scheme for choosing the spectral and conjugate parameters. Furthermore, the new search direction satisfies the spectral property and the sufficient descent condition. Under suitable assumptions, the global convergence of the developed method is established. Numerical comparisons on a set of 130 test problems show that the proposed method outperforms several existing methods.


Introduction
Consider the following unconstrained optimisation problem:

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and bounded from below. The conjugate gradient method is one of the most effective line search methods for solving the unconstrained optimisation problem (1) due to its low memory requirements and simple computation. Let $x_0$ be an arbitrary initial approximation to a solution of problem (1). The iterative formula of the conjugate gradient method is given by

$$x_{k+1} = x_k + \alpha_k d_k, \qquad (2)$$

where $\alpha_k > 0$ is a stepsize. The search direction $d_k$ is defined by

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \geq 1, \end{cases} \qquad (3)$$

© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
where $g_k = \nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $\beta_k$ is a conjugate parameter. Different choices of $\beta_k$ correspond to different conjugate gradient methods; well-known formulas for $\beta_k$ can be found in [8, 12-14, 17, 26]. The stepsize $\alpha_k > 0$ is usually obtained by the Wolfe line search

$$f(x_k + \alpha_k d_k) \leq f(x_k) + c_1 \alpha_k g_k^T d_k, \qquad (4)$$

$$g(x_k + \alpha_k d_k)^T d_k \geq c_2 g_k^T d_k, \qquad (5)$$

where $0 < c_1 \leq c_2 < 1$. In order to exclude points that are far from stationary points of $f$ along the direction $d_k$, the strong Wolfe line search is used, which requires $\alpha_k$ to satisfy (4) and

$$|g(x_k + \alpha_k d_k)^T d_k| \leq c_2 |g_k^T d_k|. \qquad (6)$$

Combining the conjugate gradient method with the spectral gradient method [3], a spectral conjugate gradient method (SCG) was proposed by Birgin et al. [5]. Let

$$d_k = -\theta_k g_k + \beta_k d_{k-1}, \qquad (7)$$

where the spectral parameter $\theta_k$ and the conjugate parameter $\beta_k$ are defined by

$$\theta_k = \frac{s_{k-1}^T s_{k-1}}{s_{k-1}^T y_{k-1}}, \qquad \beta_k = \frac{(\theta_k y_{k-1} - s_{k-1})^T g_k}{d_{k-1}^T y_{k-1}}, \qquad (8)$$

respectively, with $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. Obviously, if $\theta_k = 1$, the method reduces to a classical conjugate gradient method; if $\beta_k = 0$, it reduces to the spectral gradient method. The SCG [5] was modified by Yu et al. [32] in order to achieve descent directions. Moreover, there are other ways to determine $\theta_k$ and $\beta_k$. For instance, based on the descent condition, Wan et al. [29] and Zhang et al. [35] presented modified PRP and FR spectral conjugate gradient methods, respectively. Due to the strong convergence of the Newton method, Andrei [1] proposed an accelerated conjugate gradient method, which took advantage of the Newton method to improve the performance of the conjugate gradient method. Following this idea, Parvaneh et al. [24] proposed a new SCG, which is a modified version of the method suggested by Jian et al. [15]. Masoud [21] introduced a scaled conjugate gradient method which inherits the good properties of classical conjugate gradient methods. More references in this field can be found in [6, 10, 20, 28, 34].
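To make the direction update concrete, the following Python sketch (our own illustration; the helper name `scg_direction` and the toy vectors are not from the paper) implements the spectral direction (7) with Birgin-Martinez-type parameter choices, and shows the two limiting cases mentioned above:

```python
import numpy as np

def scg_direction(g, d_prev, s, y, theta=None, beta=None):
    """Spectral conjugate gradient direction d_k = -theta_k g_k + beta_k d_{k-1}.

    By default theta and beta follow Birgin-Martinez-type choices; passing
    theta=1 recovers a classical conjugate gradient direction, and beta=0
    recovers the spectral gradient direction.
    """
    if theta is None:
        theta = (s @ s) / (s @ y)                    # spectral parameter
    if beta is None:
        beta = ((theta * y - s) @ g) / (d_prev @ y)  # conjugate parameter
    return -theta * g + beta * d_prev

# Limiting cases on toy data (note s^T y > 0, as the theory requires).
g = np.array([3.0, -1.0])
d_prev = np.array([1.0, 2.0])
s, y = np.array([0.2, 0.1]), np.array([0.4, 0.3])

d_sg = scg_direction(g, d_prev, s, y, beta=0.0)   # spectral gradient step
d_cg = scg_direction(g, d_prev, s, y, theta=1.0)  # classical CG-type step
```

With `theta=1.0` and `beta=0.0` the update collapses to the steepest descent direction $-g_k$, matching the remark after (8).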
Recently, Liu et al. [18, 19] introduced an approximate optimal stepsize $\alpha_k^{AOS}$ for the gradient method. They constructed a quadratic approximation model of $f(x_k - \alpha g_k)$,

$$\varphi(\alpha) = f(x_k) - \alpha \|g_k\|^2 + \frac{\alpha^2}{2} g_k^T B_k g_k,$$

where the approximate Hessian matrix $B_k$ is symmetric and positive definite. By minimising $\varphi(\alpha)$, they obtained

$$\alpha_k^{AOS} = \frac{\|g_k\|^2}{g_k^T B_k g_k}$$

and proposed the approximate optimal gradient method. If $B_k = \frac{s_{k-1}^T y_{k-1}}{\|s_{k-1}\|^2} I$ is selected, then $\alpha_k^{AOS}$ reduces to $\alpha_k^{BB1}$, and the corresponding method is the BB method [3]. If $B_k = \frac{1}{\bar{\alpha}_k^{BB}} I$ is chosen, where $\bar{\alpha}_k^{BB}$ is some modified BB stepsize, then $\alpha_k^{AOS}$ reduces to $\bar{\alpha}_k^{BB}$, and the corresponding method is the corresponding modified BB method [4, 7, 30]. And if $B_k = \frac{1}{t} I$, $t > 0$, then $\alpha_k^{AOS}$ is the fixed stepsize $t$, and the corresponding method is the gradient method with fixed stepsize [16, 22, 33]. In this sense, the approximate optimal gradient method is a generalisation of the BB methods.
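The reduction to the BB1 stepsize is easy to verify numerically. The following Python sketch (our own illustration; the quadratic test function and variable names are not from [18, 19]) shows that $\alpha_k^{AOS}$ coincides with $\alpha_k^{BB1}$ when the scalar choice of $B_k$ is used:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Two consecutive iterates of a gradient method on f(x) = 0.5 x^T A x.
A = np.diag(np.arange(1.0, n + 1.0))      # SPD Hessian of the test function
x_prev = rng.standard_normal(n)
x_curr = x_prev - 0.1 * (A @ x_prev)      # one fixed-stepsize gradient step

g_prev, g_curr = A @ x_prev, A @ x_curr
s = x_curr - x_prev                        # s_{k-1}
y = g_curr - g_prev                        # y_{k-1}  (here s^T y = s^T A s > 0)

def alpha_aos(g, B):
    """Approximate optimal stepsize alpha = ||g||^2 / (g^T B g)."""
    return (g @ g) / (g @ B @ g)

# Scalar choice B_k = (s^T y / ||s||^2) I collapses alpha_AOS to the
# BB1 stepsize alpha_BB1 = ||s||^2 / (s^T y), independently of g.
B_scalar = ((s @ y) / (s @ s)) * np.eye(n)
alpha_bb1 = (s @ s) / (s @ y)

print(alpha_aos(g_curr, B_scalar), alpha_bb1)  # the two values coincide
```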
In this paper, we propose a new spectral conjugate gradient method based on the idea of the approximate optimal stepsize. Compared with the SCG method [5], the proposed method generates a sufficient descent direction at each iteration without requiring additional computational cost. Under suitable assumptions, the global convergence of the proposed method is established.
The rest of this paper is organised as follows. In Sect. 2, a new spectral conjugate gradient algorithm is presented and its computational costs are analysed. The global convergence of the proposed method is established in Sect. 3. In Sect. 4, some numerical experiments are used to show that the proposed method is superior to the SCG [5] and DY [8] methods. Conclusions are drawn in Sect. 5.

The new spectral conjugate gradient algorithm
In this section, we propose a new spectral conjugate gradient method with the form of (7). Let $\bar{d}_k$ be a classical conjugate gradient direction. We first consider the approximate quadratic model of $f(x_k + \alpha \bar{d}_k)$,

$$\psi(\alpha) = f(x_k) + \alpha g_k^T \bar{d}_k + \frac{\alpha^2}{2} \bar{d}_k^T B_k \bar{d}_k.$$

By $\frac{d\psi}{d\alpha} = 0$, we obtain the approximate optimal stepsize $\alpha_k^*$ associated with $\psi(\alpha)$:

$$\alpha_k^* = -\frac{g_k^T \bar{d}_k}{\bar{d}_k^T B_k \bar{d}_k}. \qquad (9)$$

Here, we choose the BFGS update formula to generate $B_k$, that is,

$$B_k = B_{k-1} - \frac{B_{k-1} s_{k-1} s_{k-1}^T B_{k-1}}{s_{k-1}^T B_{k-1} s_{k-1}} + \frac{y_{k-1} y_{k-1}^T}{s_{k-1}^T y_{k-1}}. \qquad (10)$$

To reduce the computational and storage costs, memoryless BFGS schemes are usually used in place of $B_k$; see [2, 23, 25]. In this paper, we choose $B_{k-1}$ as the scalar matrix $\xi \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} I$, $\xi > 0$. Then (10) can be rewritten as

$$B_k = \xi \frac{\|y_{k-1}\|^2}{s_{k-1}^T y_{k-1}} \left( I - \frac{s_{k-1} s_{k-1}^T}{\|s_{k-1}\|^2} \right) + \frac{y_{k-1} y_{k-1}^T}{s_{k-1}^T y_{k-1}}. \qquad (11)$$

It is easy to prove that if $s_{k-1}^T y_{k-1} > 0$, then $B_k$ is symmetric and positive definite. The direction $\bar{d}_k$ is chosen by the DY formula [8], i.e.,

$$\bar{d}_k = -g_k + \beta_k^{DY} d_{k-1}, \qquad \beta_k^{DY} = \frac{\|g_k\|^2}{d_{k-1}^T y_{k-1}}. \qquad (12)$$

Substituting (11) and (12) into (9) yields the spectral parameter $\theta_k$ and the conjugate parameter $\beta_k$. To ensure the sufficient descent property of the direction and the boundedness of the spectral parameter $\theta_k$, the truncating technique in [19] is adopted in the choices of $\theta_k$ and $\beta_k$. Based on the above analysis, we describe the following algorithm (NSCG).
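The construction above can be sketched in a few lines of Python. This is our own illustrative reading of (9)-(12), not the paper's exact algorithm: the truncation interval `[lo, hi]` is a placeholder for the truncating technique of [19], and the function name is hypothetical.

```python
import numpy as np

def nscg_direction(g, d_prev, s, y, xi=1.0001, lo=1e-2, hi=1e2):
    """Sketch of the new spectral direction: an approximate optimal stepsize
    (9) applied to a DY direction (12), with B_k built from (11).

    The truncation interval [lo, hi] is an illustrative placeholder.
    """
    n = g.size
    # Scalar seed B_{k-1} = xi * (||y||^2 / s^T y) * I, then one BFGS update.
    c = xi * (y @ y) / (s @ y)
    B_prev = c * np.eye(n)
    Bs = B_prev @ s
    B = B_prev - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
    # Classical DY direction d_bar.
    beta_dy = (g @ g) / (d_prev @ y)
    d_bar = -g + beta_dy * d_prev
    # Approximate optimal stepsize along d_bar, used as the spectral parameter.
    alpha = -(g @ d_bar) / (d_bar @ B @ d_bar)
    theta = min(max(alpha, lo), hi)       # truncation keeps theta bounded
    return -theta * g + theta * beta_dy * d_prev, theta

# Toy data with s^T y > 0 (guaranteed in practice by the Wolfe line search).
g = np.array([1.0, -2.0, 0.5])
d_prev = np.array([-1.0, 1.0, 0.0])
s = np.array([0.1, 0.2, -0.05])
y = np.array([0.3, 0.1, 0.02])
d, theta = nscg_direction(g, d_prev, s, y)
```

The returned direction has the spectral form $d_k = -\theta_k g_k + \beta_k d_{k-1}$ with $\beta_k = \theta_k \beta_k^{DY}$.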
Remark 1 Compared with the SCG algorithm, the extra computational work of the NSCG algorithm appears to be the inner product $g_{k-1}^T s_{k-1}$ per iteration. But $g_{k-1}^T s_{k-1}$ has to be computed anyway while implementing the Wolfe conditions, so the extra computational work is negligible.
Remark 2 It is well known that $s_{k-1}^T y_{k-1} > 0$ is guaranteed by the Wolfe line search. Since (11) implies a memoryless quasi-Newton update, it can be seen from [27] and [31] that the eigenvalues of $B_k$ are bounded below and above by positive constants. Together with (15), the parameter $\theta_k$ satisfies

$$m \leq \theta_k \leq M,$$

where $m$ and $M$ are positive constants. The following theorem indicates that the search direction generated by the NSCG algorithm satisfies the sufficient descent condition.
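The positive definiteness used in Remark 2 is easy to check numerically. The following Python sketch (our own construction; the curvature pair and the value of $\xi$ are illustrative) builds $B_k$ from the scalar seed as in (11) and confirms that its eigenvalues are strictly positive:

```python
import numpy as np

n, xi = 6, 1.0001

# A curvature pair with s^T y > 0, as guaranteed by the Wolfe line search.
s = np.array([1.0, -0.5, 0.3, 0.8, -0.2, 0.6])
y = np.array([1.2, -0.4, 0.2, 0.9, -0.1, 0.5])
assert s @ y > 0

# Scalar seed B_{k-1} = xi * (||y||^2 / s^T y) * I, then one BFGS update (10).
c = xi * (y @ y) / (s @ y)
B_prev = c * np.eye(n)
Bs = B_prev @ s
B = B_prev - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

eigs = np.linalg.eigvalsh(B)
print(eigs.min() > 0)  # True: B_k is symmetric positive definite
```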

Convergence analysis
In this section, the convergence of the NSCG algorithm is analysed. Throughout, we assume that $g_k \neq 0$ for all $k \geq 0$; otherwise a stationary point has already been obtained. We make the following assumptions.
Assumption 3.1 The level set $\Omega = \{x \in \mathbb{R}^n : f(x) \leq f(x_0)\}$ is bounded, and in some neighbourhood of $\Omega$ the gradient $g(x) = \nabla f(x)$ is Lipschitz continuous, i.e., there exists a constant $L > 0$ such that $\|g(x) - g(y)\| \leq L \|x - y\|$.

Assumption 3.1 implies that there exists a constant $\Gamma \geq 0$ such that

$$\|g(x)\| \leq \Gamma \quad \text{for all } x \in \Omega.$$

The following lemma, called the Zoutendijk condition, was originally given by Zoutendijk [36].

Lemma 3.1 Suppose that Assumption 3.1 holds and the stepsize $\alpha_k$ satisfies the Wolfe conditions with $g_k^T d_k < 0$ for all $k$. Then

$$\sum_{k \geq 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty.$$
Proof It is sufficient to prove that if (22) is not true, then (23) holds. We argue by contradiction. Suppose that there exists $\gamma > 0$ such that $\|g_k\| \geq \gamma$ for all $k \geq 0$, which is condition (24). From (7) and Theorem 2.1, we obtain inequality (25). Besides, pre-multiplying (7) by $g_k^T$ gives identity (26). Using the triangle inequality and (6), together with the Cauchy-Schwarz inequality, (26) yields inequality (27). Therefore, from (25) and (27) we obtain (28). By (24) and $\theta_k \geq m$, for all sufficiently large $k$ there exists a positive constant $\lambda$ such that (29) holds. Therefore, from (28) and (29), the desired bound holds for all sufficiently large $k$. Combining this with the Zoutendijk condition (Lemma 3.1), we deduce that inequality (23) holds.

Corollary 3.1 Suppose that all the conditions of Lemma 3.2 hold. If

$$\sum_{k \geq 0} \frac{1}{\|d_k\|^2} = \infty, \qquad (30)$$

then $\liminf_{k \to \infty} \|g_k\| = 0$.
Proof Suppose that there is a positive constant $\gamma$ such that $\|g_k\| \geq \gamma$ for all $k \geq 0$. From Lemma 3.2, we then have $\sum_{k \geq 0} \frac{1}{\|d_k\|^2} < \infty$, which contradicts (30). Hence Corollary 3.1 is true.
In the following, we establish the global convergence theorem of the NSCG algorithm.

Numerical results
In this section, we show the computational performance of the NSCG algorithm. All codes are written in Matlab R2015b and run on a PC with a 2.50 GHz CPU and 4.00 GB of RAM. Our test set consists of 130 problems [9] with from 100 to 5,000,000 variables. We use the same stopping criterion

$$\|g_k\| \leq \varepsilon$$

for all algorithms, and set the parameters $\varepsilon = 10^{-6}$, $\xi = 1.0001$, $c_1 = 0.0001$ and $c_2 = 0.9$. Liu et al. [19] proposed the GM_AOS 1, GM_AOS 2 and GM_AOS 3 algorithms, of which GM_AOS 2 performed slightly better than the others. When the quadratic model is considered, the algorithm developed in [18] is identical to the GM_AOS 1 algorithm. In a certain sense, our algorithm can be viewed as an extension of the SCG algorithm [5] and a modification of the DY algorithm [8]. Therefore, we adopt the performance profiles introduced by Dolan et al. [11] to compare the numerical performance of the NSCG, SCG, DY and GM_AOS 2 algorithms.
Note that the number of iterations (Itr), the number of function evaluations (NF), the number of gradient evaluations (NG) and the CPU time (Tcpu) are important indicators of the numerical performance of an optimisation method. In the profiles, the top curve corresponds to the method that solves the most problems within a given factor of the best performance. The horizontal axis gives the factor $\tau$, whose leftmost value measures the percentage of test problems for which a method is the fastest (efficiency), while the vertical axis gives the percentage $\psi$ of test problems that are successfully solved by each method (robustness). Moreover, we present the number of problems solved by the tested algorithms with the minimum number of Itr, NF and NG and the minimum Tcpu. If a run fails, we set its Itr, NF and NG to a large positive integer and its Tcpu to 1000 seconds. Under this convention, only the NSCG algorithm solves all test problems, while the SCG, DY and GM_AOS 2 algorithms solve 98.5%, 93.8% and 92.3% of the problems, respectively.
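For readers unfamiliar with the metric, here is a minimal Python sketch of how a Dolan-More performance profile is computed (the toy cost matrix and the function name `performance_profile` are our own illustration, not the paper's data):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile from a (problems x solvers) cost
    matrix T, with np.inf marking a failed run. Returns, for each tau,
    the fraction of problems each solver solves within tau times the
    best cost achieved on that problem."""
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                     # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

# Toy data: 4 problems, 2 solvers; solver B fails on the last problem.
T = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [2.0, 1.0],
              [5.0, np.inf]])
profile = performance_profile(T, taus=[1.0, 2.0, 4.0])
print(profile)  # row for tau=1: fraction of problems where each solver is best
```

At $\tau = 1$ the profile reads off efficiency (fraction of wins), and as $\tau$ grows it approaches each solver's robustness (fraction of problems solved at all), which is exactly how Figs. 1-4 are interpreted.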
From Figs. 1-4, we can see that the NSCG algorithm is the top performer, being more successful and more robust than the SCG, DY and GM_AOS 2 algorithms. For example, in Fig. 1 the curve of the NSCG algorithm lies above those of the other algorithms. Observe that the NSCG algorithm is also the fastest of the four algorithms in Figs. 2, 3 and 4. To conclude, the NSCG algorithm is more effective than the other algorithms with respect to all the measures (Itr, NF, NG, Tcpu).

Conclusions
In this paper, a new spectral conjugate gradient method has been proposed based on the idea of the approximate optimal stepsize. Moreover, the memoryless BFGS formula is embedded in our algorithm to reduce the computational and storage costs. Under suitable assumptions, the global convergence of the proposed method is established. Numerical results show that the method is efficient and competitive.