A q-Polak–Ribière–Polyak conjugate gradient algorithm for unconstrained optimization problems

A Polak–Ribière–Polyak (PRP) algorithm is one of the oldest and most popular conjugate gradient algorithms for solving nonlinear unconstrained optimization problems. In this paper, we present a q-variant of the PRP method (q-PRP) for which both the sufficient descent and conjugacy conditions are satisfied at every iteration. The proposed method is globally convergent under both the standard Wolfe conditions and the strong Wolfe conditions. Numerical results show that the proposed method is promising on a set of test problems with different starting points. Moreover, the method reduces to the classical PRP method as the parameter q approaches 1.


Introduction
Conjugate gradient (CG) methods have played an important role in solving nonlinear optimization problems due to the simplicity of their iterations and very low memory requirements [1,2]. The CG methods are not among the fastest or most robust optimization algorithms available today, but they remain very popular among engineers and mathematicians for solving nonlinear optimization problems [3][4][5]. The methods date back to 1952, when Hestenes and Stiefel introduced a CG method [6] for solving symmetric positive definite linear systems of equations. In the 1960s, Fletcher and Reeves [7] extended the approach to unconstrained nonlinear optimization, yielding the Fletcher–Reeves (FR) conjugate gradient method.
Conjugate gradient methods deflect the steepest descent method [8] by adding to the negative gradient a positive multiple of the direction used in the previous step. They require only first-order derivatives and overcome the slow convergence of the steepest descent method. By enforcing conjugacy, they modify the steepest descent direction and thus enhance the efficiency and reliability of the algorithm. Different conjugate gradient algorithms correspond to different choices of the scalar parameter β_k [6,7,9]. The parameter β_k is selected so as to minimize a convex quadratic function over a subspace spanned by a set of mutually conjugate descent directions, although the effectiveness of the algorithm depends on the accuracy of the line searches.
Quantum calculus, known as q-calculus, is the study of calculus without limits, where classical mathematical formulas are obtained as q approaches 1. In q-calculus, the classical derivative is replaced by the q-difference operator. Jackson [10,11] was the first to have some applications of the q-calculus and introduced the q-analogue of the classical derivative and integral operators. Applications of q-calculus play an important role in various fields of mathematics and physics [12][13][14][15][16][17][18][19][20].
In 1969, Polak and Ribière [21] and Polyak [22] independently proposed a conjugate gradient method, later called the Polak–Ribière–Polyak (PRP) method. In practical computation, the PRP method performs much better than the FR method on many unconstrained optimization problems because it automatically recovers once a small step-length is generated, although global convergence of the PRP method was proved only for strictly convex functions [23]. For general nonlinear functions, Powell showed that the PRP method can cycle infinitely without approaching a solution even if the step-length is chosen to be the least positive minimizer of the line search function [24]. To remedy this, Gilbert and Nocedal [25] followed Powell's suggestion [26] to modify the PRP method and showed that the modified PRP method is globally convergent for exact and inexact line searches.
In 2019, Yuan et al. proposed a modified three-term conjugate gradient algorithm based on a modified Armijo line search technique [27]. In 2020, they designed a modified conjugate gradient method with a sufficient descent property and a trust-region property [28]. The authors of [29] proposed a modified Hestenes–Stiefel (HS) conjugate gradient algorithm for solving large-scale smooth and nonsmooth optimization problems.
In 2020, Yuan et al. further modified the PRP method and established a global convergence proof with the modified weak Wolfe–Powell line search technique for nonconvex functions. Their numerical results demonstrated the competitiveness of the method compared to existing methods, and the engineering Muskingum model and image restoration problems were used to illustrate interesting aspects of the algorithm [30]. Generalized conjugate gradient algorithms have also been studied for solving large-scale unconstrained optimization problems arising in real-world applications, and two open problems were formulated [31][32][33].
Preliminary experimental optimization results using q-calculus were first shown in the field of global optimization [34]. This idea was later utilized in stochastic q-neurons, whose activation functions are converted into corresponding stochastic q-activation functions to improve algorithmic effectiveness. The q-gradient concept was further utilized in the least mean square algorithm to inherit fast convergence with less dependence on the eigenvalues of the input correlation matrix [35]. A modified least mean square algorithm using q-calculus was also proposed, which automatically adapts the learning rate with respect to the error and was shown to converge quickly [36]. In optimization, q-calculus has been employed in Newton, modified Newton, BFGS, and limited-memory BFGS methods for solving unconstrained nonlinear optimization problems [19,[37][38][39][40], typically requiring few iterations. Among conjugate gradient methods, a q-analogue of the Fletcher–Reeves method was developed [41] to optimize unimodal and multimodal functions; Gaussian perturbations were used in some iterations to ensure global convergence, albeit only in a probabilistic sense.
In this paper, we propose a q-variant of the PRP method, called q-PRP, with a sufficient descent property that holds independently of the line search and of any convexity assumption on the objective function. Under a condition on the q-gradient of the objective function and some other appropriate conditions, the proposed method is globally convergent. Numerical experiments are conducted to show the effectiveness of the q-PRP algorithm. On a set of test functions with different starting points, it was able to escape many local minima and reach global minima thanks to the q-gradient.
The remainder of this paper is organized as follows. The next section presents the essential preliminaries. The main results are given in Sect. 3, and their convergence proofs in Sect. 4. Numerical examples illustrating the theoretical results are analyzed in Sect. 5. The paper ends with a conclusion and directions for future work.

Essential preliminaries
In this section, the principal terms of q-calculus are formed by assuming 0 < q < 1, as follows. The q-integer [n]_q is defined by

[n]_q = (1 - q^n) / (1 - q)

for all n ∈ N. The q-analogue of (1 + x)^n_q is the polynomial

(1 + x)^n_q = (1 + x)(1 + qx) · · · (1 + q^(n-1) x).

The q-derivative of x^n with respect to x is [n]_q x^(n-1). The q-derivative D_q f of a function f is given by

D_q f(x) = (f(x) - f(qx)) / ((1 - q) x), x ≠ 0,

and D_q f(0) = f'(0), provided f is differentiable at 0.

Example 2.1 Let the function f : R → R be such that f(x) = ln x. Then we have

D_q ln x = (ln x - ln(qx)) / ((1 - q) x) = ln(1/q) / ((1 - q) x).

It is obvious that the q-derivative of a function is a linear operator, that is, for any constants a and b, we have [42]

D_q (a f(x) + b g(x)) = a D_q f(x) + b D_q g(x).
Let f(x) be a continuous function on [a, b], where a, b ∈ R. Then there exist q̂ ∈ (0, 1) and x ∈ (a, b) [43] such that the q-mean value theorem holds for all q ∈ (q̂, 1) ∪ (1, q̂^(-1)). The q-partial derivative of a function f : R^n → R at x ∈ R^n with respect to x_i, where the scalar q ∈ (0, 1), is given as [34]

D_{q,x_i} f(x) = (f(x_1, ..., x_i, ..., x_n) - f(x_1, ..., q x_i, ..., x_n)) / ((1 - q) x_i), x_i ≠ 0.

We now choose the parameter q as a vector, that is, q = (q_1, ..., q_n)^T with each q_i ∈ (0, 1). Then the q-gradient vector [34] of f is

∇_q f(x) = (D_{q_1,x_1} f(x), ..., D_{q_n,x_n} f(x))^T.

Let {q_i^k} be a real sequence generated for each i = 1, ..., n and k = 0, 1, 2, ..., from a fixed starting number 0 < q_i^0 < 1. The sequence {q^k} converges to (1, ..., 1)^T as k → ∞ [38]; thus, the q-gradient reduces to the classical gradient in the limit. For the sake of convenience, we denote the q-gradient vector of f at x_k by g_k^{q_k} = ∇_{q_k} f(x_k). In the next section, we present the q-PRP method. To improve efficiency, we utilize the q-gradient in inexact line search methods to generate a step-length which ensures reduction of the objective function value.
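A minimal sketch of the q-gradient with one dilation parameter per coordinate (our own helper; the forward-difference fallback at x_i = 0, where the Jackson quotient is undefined, is an assumption):

```python
import numpy as np

def q_gradient(f, x, q):
    """Component-wise q-partial derivatives of f at x, with one
    dilation parameter q[i] in (0, 1) per coordinate."""
    x = np.asarray(x, dtype=float)
    g = np.empty_like(x)
    for i in range(x.size):
        xq = x.copy()
        if x[i] == 0.0:
            h = 1e-8                 # classical forward difference at x_i = 0
            xq[i] = h
            g[i] = (f(xq) - f(x)) / h
        else:
            xq[i] = q[i] * x[i]      # dilate the i-th coordinate only
            g[i] = (f(x) - f(xq)) / ((1.0 - q[i]) * x[i])
    return g
```

For f(x) = x_1^2 + x_2^2 this yields the exact q-partials (1 + q_i) x_i, and as every q_i tends to 1 it approaches the classical gradient.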

On q-Polak-Ribière-Polyak conjugate gradient algorithm
Consider the following unconstrained nonlinear optimization problem:

(P)  min f(x), x ∈ R^n,

where f : R^n → R is a continuously q-differentiable function. Numerical optimization algorithms for general objective functions differ mainly in how they generate the search directions.
In conjugate gradient algorithms, a sequence of iterates is generated from a given starting point x_0 ∈ R^n by the scheme

x_{k+1} = x_k + α_k d_k^{q_k},   (2)

for all k ≥ 0, where x_k is the current iterate, d_k^{q_k} is a descent direction of f at x_k, and α_k > 0 is the step-length. Note that the descent direction d_k^{q_k} = -g_k^{q_k} leads to the q-steepest descent method [34]. In the case where q_k approaches (1, 1, ..., 1)^T as k → ∞, the method reduces to the classical steepest descent method [7]. The search direction d_k^{q_k} is guaranteed to be a descent direction provided that

(g_k^{q_k})^T d_k^{q_k} < 0.   (3)

The directions d_k^{q_k} are generated in the light of classical conjugate direction methods [7,9,21,44,45] as

d_0^{q_0} = -g_0^{q_0},  d_k^{q_k} = -g_k^{q_k} + β_{k-1}^{q-PRP} d_{k-1}^{q_{k-1}} for k ≥ 1,   (4)

where β_k^{q-PRP} ∈ R is modified from the scalar quantity β_k of the PRP method and is given by

β_k^{q-PRP} = (g_{k+1}^{q_{k+1}})^T (g_{k+1}^{q_{k+1}} - g_k^{q_k}) / ‖g_k^{q_k}‖^2.   (5)

Several well-known conjugate gradient methods are available, such as the FR (Fletcher–Reeves) [7], PRP (Polak–Ribière–Polyak) [9,21], and HS (Hestenes–Stiefel) [6] methods. Among these, the PRP method is considered the best in practical computation. In order to guarantee global convergence, we choose d_k^{q_k} to satisfy the sufficient descent condition

(g_k^{q_k})^T d_k^{q_k} ≤ -c ‖g_k^{q_k}‖^2,   (6)

where c > 0 is a constant. There are several approaches to finding the step-length. Among them, the exact line search [46,47] is time consuming and sometimes difficult to carry out. Therefore, researchers adopt inexact line search techniques such as the Wolfe line search [48], the Goldstein line search [49], or the Armijo line search with backtracking [50]. The most used conditions for determining the step-length are the standard Wolfe line search conditions:

f(x_k + α_k d_k^{q_k}) ≤ f(x_k) + δ α_k (g_k^{q_k})^T d_k^{q_k}   (7)

and

(g^{q_k}(x_k + α_k d_k^{q_k}))^T d_k^{q_k} ≥ σ (g_k^{q_k})^T d_k^{q_k},   (8)

where 0 < δ < σ < 1. The first condition (7) is called the Armijo condition, which ensures a sufficient reduction of the objective function value, while the second condition (8) is called the curvature condition, which rules out unacceptably short step-lengths.
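The Wolfe tests (7)-(8), and the strong variant (11) used later, can be checked mechanically. This is an illustrative helper (the name and the defaults δ = 10^-4, σ = 0.4 are our assumptions), with the (q-)gradient passed as a generic callback:

```python
import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, delta=1e-4, sigma=0.4, strong=False):
    """Check the Armijo condition (7) together with the curvature
    condition (8), or its strong variant (11), for a trial step alpha."""
    g0_d = float(grad(x) @ d)                     # initial directional derivative
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g0_d
    g1_d = float(grad(x + alpha * d) @ d)         # directional derivative at trial point
    if strong:
        curvature = abs(g1_d) <= -sigma * g0_d    # strong curvature test (11)
    else:
        curvature = g1_d >= sigma * g0_d          # standard curvature test (8)
    return armijo and curvature
```

For the quadratic f(x) = ½‖x‖^2 with d = -x, the exact step α = 1 passes both the standard and the strong tests, while an overlong step fails the Armijo condition.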
To investigate the global convergence property of the PRP method, a modified Armijo line search was proposed [51]. For given constants μ > 0 and δ, ρ ∈ (0, 1), the line search finds the step-length α_k = max{μ ρ^j : j = 0, 1, 2, ...} such that (2) and (4) satisfy

f(x_k + α_k d_k^{q_k}) ≤ f(x_k) - δ α_k^2 ‖d_k^{q_k}‖^2   (9)

and

-C_1 ‖g^{q_{k+1}}(x_{k+1})‖^2 ≤ (g^{q_{k+1}}(x_{k+1}))^T d_{k+1}^{q_{k+1}} ≤ -C_2 ‖g^{q_{k+1}}(x_{k+1})‖^2,

where 0 < C_2 < 1 < C_1 are constants. Accordingly, since {f(x_k)}_{k≥0} is a nonincreasing sequence, we have Σ_{k≥0} α_k^2 ‖d_k^{q_k}‖^2 < ∞. Equivalently,

lim_{k→∞} α_k ‖d_k^{q_k}‖ = 0.   (10)

It is worth mentioning that a step-length computed by the standard Wolfe conditions (7)-(8) may not be sufficiently close to a minimizer of (P). Instead, the strong Wolfe conditions can be used, which consist of (7) together with the following strengthened version of (8):

|(g^{q_k}(x_k + α_k d_k^{q_k}))^T d_k^{q_k}| ≤ -σ (g_k^{q_k})^T d_k^{q_k}.   (11)

From (11), we see that as σ → 0, a step-length satisfying (7) and (11) tends to the optimal step-length [2]. Note that an appropriate choice of starting point has a positive effect on the computational cost and convergence speed of the algorithm. The modified PRP conjugate gradient-like method introduced in [52] can be presented in the context of q-calculus as

d_k^{q_k} = -g_k^{q_k} + β_{k-1}^{q-PRP} d_{k-1}^{q_{k-1}} - θ_k (g_k^{q_k} - g_{k-1}^{q_{k-1}}), k ≥ 1, d_0^{q_0} = -g_0^{q_0}.   (12)

With the q-gradient, we can have a modification of [52] by taking

θ_k = (g_k^{q_k})^T d_{k-1}^{q_{k-1}} / ‖g_{k-1}^{q_{k-1}}‖^2.   (13)

From (12) and (13), for k ≥ 1 we obtain

(g_k^{q_k})^T d_k^{q_k} = -‖g_k^{q_k}‖^2.   (14)

This implies that d_k^{q_k} provides a q-descent direction of the objective function at x_k. It is worth mentioning that if an exact line search [53] is used to compute the step-length α_k, then θ_k = 0, and q_k → (1, 1, ..., 1)^T as k → ∞; the q-PRP method then reduces to the classical PRP method.
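Reading (12)-(13) as a three-term PRP-type update (our reconstruction of the extracted formulas), the identity (14), g^T d = -‖g‖^2 independent of the line search, can be verified numerically for arbitrary vectors:

```python
import numpy as np

def three_term_qprp_direction(g_new, g_old, d_old):
    """Three-term PRP-style direction (our reading of (12)-(13)):
    d = -g_new + beta * d_old - theta * (g_new - g_old),
    with beta and theta chosen so that g_new^T d = -||g_new||^2
    regardless of how the previous step-length was chosen."""
    y = g_new - g_old
    denom = float(g_old @ g_old)
    beta = float(g_new @ y) / denom           # PRP-type parameter, cf. (5)
    theta = float(g_new @ d_old) / denom      # cf. (13); zero under exact line search
    return -g_new + beta * d_old - theta * y
```

The two correction terms cancel in the inner product with g_new, which is exactly why the sufficient descent property (14) needs no convexity assumption.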
The number of iterations the algorithm takes differs from one problem to another. We therefore present the following Algorithm 1 for solving problem (P).

Global convergence
In this section, we prove the global convergence of Algorithm 1 under the following assumptions.
Assumption 4.1 The level set Ω = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded, where x_0 is the starting point.

Assumption 4.2
In some neighborhood N of Ω, f has a continuous q-derivative, and the q-gradient is Lipschitz continuous; that is, there exists a constant L > 0 such that

‖∇_q f(x) - ∇_q f(y)‖ ≤ L ‖x - y‖ for all x, y ∈ N.

Algorithm 1 (q-PRP conjugate gradient method)
7: Find d_k^{q_k} using (12)
8: end if
9: Determine α_k either by (7)-(8) or by (7) and (11)
10: Set x_{k+1} = x_k + α_k d_k^{q_k}
11: for i = 1, ..., n do
12: Set q_i^{k+1}
13: end for
14: end for
Output: approximate minimizer x_k

Since {f(x_k)} is nonincreasing, it is clear that the sequence {x_k} generated by Algorithm 1 is contained in Ω. From Assumptions 4.1 and 4.2, there is a constant η > 0 such that ‖∇_q f(x)‖ ≤ η for each x ∈ Ω. Based on Assumption 4.1, there exists a positive constant B such that ‖x‖ ≤ B for all x ∈ Ω. Unless otherwise specified, let {x_k} and {d_k^{q_k}} be the iterate sequence and q-descent direction sequence generated by Algorithm 1. We now present the following lemma.
Lemma 4.1 Suppose that Assumptions 4.1 and 4.2 hold. If there exists a constant ε > 0 such that

‖g_k^{q_k}‖ ≥ ε   (16)

for all k, then there exists a constant M > 0 such that the q-descent direction satisfies

‖d_k^{q_k}‖ ≤ M   (18)

for all k.
Proof From (12) and (16), for k ≥ 1 we obtain an expression for d_k^{q_k}. Taking the norm of both sides of this equation and using (16), we get (17). From Assumption 4.2 and (17), we obtain an estimate of ‖d_k^{q_k}‖ in terms of ‖d_{k-1}^{q_{k-1}}‖. From (10), α_{k-1} ‖d_{k-1}^{q_{k-1}}‖ → 0 and, since {q_k} approaches (1, ..., 1)^T as k → ∞, there exist a constant s ∈ (0, 1) and an integer k_0 such that the following inequality holds for all k ≥ k_0:

‖d_k^{q_k}‖ ≤ 2η + s ‖d_{k-1}^{q_{k-1}}‖.   (19)

From (19), for any k > k_0 we get

‖d_k^{q_k}‖ ≤ 2η (1 + s + s^2 + · · ·) + s^{k-k_0} ‖d_{k_0}^{q_{k_0}}‖.

For k sufficiently large with s ∈ (0, 1), the second term of the above inequality satisfies s^{k-k_0} ‖d_{k_0}^{q_{k_0}}‖ ≤ 2η. Thus we get ‖d_k^{q_k}‖ ≤ 2η/(1 - s) + 2η =: M, and hence (18).
We now show that the modified q-PRP method with the modified Armijo-type line search introduced in [51], adapted to the q-gradient, is globally convergent.
Theorem 4.2 Suppose that Assumptions 4.1 and 4.2 hold and let {x_k} be generated by Algorithm 1 with the modified Armijo line search (9). Then lim inf_{k→∞} ‖g_k^{q_k}‖ = 0.

Proof Seeking a contradiction, suppose that the conclusion is not true. Then there exists a constant ε > 0 such that ‖g_k^{q_k}‖ ≥ ε for all k. If lim inf_{k→∞} α_k > 0, then from (10) and (14) we get lim_{k→∞} ‖g_k^{q_k}‖ = 0, which contradicts assumption (22). Suppose instead that lim inf_{k→∞} α_k = 0, that is, there is an infinite index set K such that lim_{k→∞, k∈K} α_k = 0. Suppose that step 9 of Algorithm 1 utilizes (9) to generate the step-length. When k ∈ K is sufficiently large and ρ^{-1} α_k, for ρ ∈ (0, 1) [52], does not satisfy (9), the reverse inequality must hold. From the q-mean value theorem, there is γ_k ∈ (0, 1) such that the corresponding difference of function values can be expressed through the q-gradient at an intermediate point. From Lemma 4.1 and Assumption 4.2, this expression is bounded in terms of the Lipschitz constant L > 0. From (22) and (23), and using (14), we obtain a positive lower bound on α_k ‖d_k^{q_k}‖. Since {d_k^{q_k}} is bounded and lim_{k∈K, k→∞} α_k = 0, this gives a contradiction. The proof is complete.
The following important result, introduced by Zoutendijk [54], can be expressed in the sense of q-calculus as follows.

Lemma 4.3 Suppose that Assumptions 4.1 and 4.2 hold. Consider the iteration method (2) and (4), where d_k^{q_k} satisfies (3) and α_k is obtained by the standard Wolfe line search conditions (7)-(8) or by the strong Wolfe line search conditions (7) and (11). Then

Σ_{k≥0} ((g_k^{q_k})^T d_k^{q_k})^2 / ‖d_k^{q_k}‖^2 < ∞.   (24)
We now present the convergence analysis of Algorithm 1 with the standard Wolfe conditions, which is a modification of [55,56] in the sense of q-calculus. In this case, the step-lengths are bounded below by a positive constant.
Theorem 4.4 Suppose that Assumptions 4.1 and 4.2 hold, and let {x_k} be generated by Algorithm 1, where α_k satisfies the standard Wolfe conditions (7)-(8). Then

lim inf_{k→∞} ‖g_k^{q_k}‖ = 0.   (25)

Proof From (3) and the first Wolfe condition (7), we have f(x_{k+1}) ≤ f(x_k). This means that the sequence {f(x_k)}_{k≥0} is nonincreasing and bounded. From the second standard Wolfe condition (8) and Assumption 4.2, we get

α_k ≥ ((σ - 1) / L) · (g_k^{q_k})^T d_k^{q_k} / ‖d_k^{q_k}‖^2.

From the first standard Wolfe condition (7), we then have

f(x_k) - f(x_{k+1}) ≥ (δ (1 - σ) / L) · ((g_k^{q_k})^T d_k^{q_k})^2 / ‖d_k^{q_k}‖^2.

Summing over k, the Zoutendijk condition (24) holds, that is,

Σ_{k≥0} ((g_k^{q_k})^T d_k^{q_k})^2 / ‖d_k^{q_k}‖^2 < ∞.   (26)

From Assumption 4.1, there exists a constant B such that ‖x‖ ≤ B for all x ∈ Ω. This, together with (6) and (26), leads to (25).
We present the following theorem, which is a modification of that in [57] using the q-gradient for the q-PRP method with strong Wolfe conditions.

Theorem 4.5 Suppose that Assumptions 4.1 and 4.2 hold and that, in Algorithm 1, the step-length α_k satisfies the strong Wolfe conditions (7) and (11). Then either (27) holds, that is, lim inf_{k→∞} ‖g_k^{q_k}‖ = 0, or Σ_{k≥1} ‖g_k^{q_k}‖^4 / ‖d_k^{q_k}‖^2 < ∞.

Proof From (4), for all k ≥ 1 we have d_k^{q_k} + g_k^{q_k} = β_{k-1}^{q-PRP} d_{k-1}^{q_{k-1}}. Squaring both sides of this equation, we get (28). Pre-multiplying (4) for k ≥ 1 by g_k^{q_k}, we get (29). From (29) and the second strong Wolfe condition (11), one obtains (30). From an elementary inequality, we can express (30) in a form with c_1 = 1/(1 + σ^2). Noting (28), one gets a further estimate; using (31), we obtain (32). If (27) is not true, then from the Zoutendijk condition (24) with (32) we obtain an inequality (33), which holds for k sufficiently large with q_k approaching (1, ..., 1)^T. From (32) and (33), one immediately obtains a contradiction. The proof is complete.
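For completeness, the squaring step at the start of the proof can be written out: from (4), d_k^{q_k} + g_k^{q_k} = β_{k-1}^{q-PRP} d_{k-1}^{q_{k-1}}, hence

```latex
\bigl\|d_k^{q_k}\bigr\|^2
  = \bigl(\beta_{k-1}^{q\text{-}\mathrm{PRP}}\bigr)^2 \bigl\|d_{k-1}^{q_{k-1}}\bigr\|^2
    - 2\,\bigl(g_k^{q_k}\bigr)^{\!\top} d_k^{q_k}
    - \bigl\|g_k^{q_k}\bigr\|^2 .
```

This is the standard identity obtained by expanding ‖d_k + g_k‖^2 and rearranging; the strong Wolfe bound (11) is then applied to the cross term.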
The following lemma immediately follows from the above convergence theorem.

Lemma 4.6
Suppose that Assumptions 4.1 and 4.2 hold and that, in Algorithm 1, the step-length is determined by the strong Wolfe conditions. If

Σ_{k≥1} ‖g_k^{q_k}‖^r / ‖d_k^{q_k}‖^2 = ∞   (34)

for any r ∈ [0, 4], then the method converges in the sense that (27) is true.
Proof If (27) is not true, then ‖g_k^{q_k}‖ is bounded away from zero and, from Theorem 4.5, it follows that

Σ_{k≥1} ‖g_k^{q_k}‖^r / ‖d_k^{q_k}‖^2 < ∞.   (35)

Because ‖g_k^{q_k}‖ is bounded away from zero and r ∈ [0, 4], it is easy to see that (35) contradicts (34). Therefore, the lemma is true.
The above inequality shows that if a conjugate gradient method fails to converge, then the length of the search direction will diverge to infinity. Observe that in the above developments, the sufficient descent condition is assumed. This lemma is very useful for proving the global convergence of some conjugate gradient methods without assuming the sufficient descent condition.

Numerical illustration
In this section, we investigate the computational efficiency of Algorithm 1 using the standard Wolfe conditions (7) and (8) and the strong Wolfe conditions (7) and (11), respectively, in contrast to the classical PRP method under the same two sets of conditions. All codes for Algorithm 1 and the classical PRP method were written in R version 3.6.1 and run on a laptop with an Intel(R) Core(TM) i3-4005U 1.70 GHz CPU and 4 GB RAM. The iteration was set to terminate once it exceeded 1000 iterations or the norm of the gradient fell below 10^-6.
Example 5.1 Consider the Mishra 6 test function. We find the q-gradient of this function at a chosen starting point. The complete computational details are given in Table 1 and depicted graphically in Fig. 1. Note that Fig. 2 provides a three-dimensional view of the Mishra 6 test function.
Example 5.2 Consider a function f : R^2 → R given by

f(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2.

The Rosenbrock function, also called Rosenbrock's valley or the banana function, is a nonconvex, unimodal, and nonseparable function whose global minimum is difficult to find numerically. It has only one global minimizer, located at the point (1, 1)^T, with the search range [-100, 100] for x_1 and x_2. For the experiment, we first generated 37 different starting points from the interval [-5, 5] for the Rosenbrock function. The numerical results are shown in Table 2 for Algorithm 1 and in Table 3 for the classical PRP algorithm. From these tables, we observe that the number of iterations (NI) is smaller for Algorithm 1 than for the classical PRP method. The column headings of both tables are self-explanatory. Figure 3 compares the numbers of iterations graphically.

The Rastrigin test function is a nonconvex, multimodal, and separable function which has several local minimizers arranged in a regular lattice but only one global minimizer, located at the point (0, 0)^T. The search range for the Rastrigin function is [-5.12, 5.12] in both x_1 and x_2. This function poses a fairly difficult problem due to its large search space and its large number of local minima. With a chosen starting point (0.2, 0.2)^T, we minimize this function through Algorithm 1 using the strong Wolfe conditions. Note that q-PRP terminates in 5 iterations at the global minimizer, escaping the nearby local minima; this is one of the advantages of using the q-gradient in our proposed method over the classical method.

We now execute Algorithm 1 on a set of test functions taken from the CUTEr library [59] with 51 different starting points under the standard and strong Wolfe conditions, respectively. Note that the direction d_k^{q_k} generated by the proposed method is a q-descent direction due to the involvement of the q-gradient. Tables 4 and 5 list the numerical results of Algorithm 1 for 51 different starting points on a set of test problems, and Figs. 4 and 5 show this comparison graphically for both the q-PRP and classical PRP methods under the standard and strong Wolfe conditions, respectively. We conclude that our method outperforms the classical method, requiring fewer iterations on the selected set of test problems.
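The paper's experiments were run in R; the following Python sketch is our own illustrative re-creation, not the authors' code. The helper names (`q_grad`, `qprp_minimize`), the backtracking Armijo search in place of the Wolfe routines, and the schedule driving q toward 1 are all assumptions:

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def q_grad(f, x, q):
    """q-gradient with a single dilation parameter q for every coordinate."""
    g = np.empty_like(x)
    for i in range(x.size):
        if abs(x[i]) < 1e-12:
            h = 1e-8                     # forward difference at x_i = 0
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x)) / h
        else:
            xq = x.copy()
            xq[i] = q * x[i]
            g[i] = (f(x) - f(xq)) / ((1.0 - q) * x[i])
    return g

def qprp_minimize(f, x0, q0=0.95, max_iter=300, tol=1e-6):
    x = np.asarray(x0, dtype=float)
    q = q0
    g = q_grad(f, x, q)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # backtracking Armijo line search: sufficient decrease along d
        alpha, delta = 1.0, 1e-4
        while f(x + alpha * d) > f(x) + delta * alpha * float(g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        # drive q toward 1 (recovering the classical PRP parameter in the
        # limit), but keep 1 - q away from zero to avoid cancellation
        q = min(0.5 * (1.0 + q), 1.0 - 1e-6)
        g_new = q_grad(f, x_new, q)
        beta = float(g_new @ (g_new - g)) / float(g @ g)  # PRP-type parameter
        d = -g_new + beta * d
        if float(g_new @ d) >= 0.0:      # safeguard: restart with steepest q-descent
            d = -g_new
        x, g = x_new, g_new
    return x, f(x)
```

From the classical starting point (-1.2, 1.0)^T, the Armijo safeguard guarantees that the objective value is driven below its initial value; as q approaches 1, the β update coincides with the classical PRP parameter.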

Conclusion and future work
This paper proposed the q-PRP conjugate gradient method, an improvement of the classical PRP conjugate gradient method. The global convergence of the proposed method was established under the standard and strong Wolfe line searches. The effectiveness of the proposed method was shown through numerical examples: owing to the q-gradient, it converges quickly on a set of test problems with different starting points. The inclusion of q-calculus in other conjugate gradient methods deserves further investigation.