
A q-Polak–Ribière–Polyak conjugate gradient algorithm for unconstrained optimization problems

Abstract

The Polak–Ribière–Polyak (PRP) algorithm is one of the oldest and most popular conjugate gradient algorithms for solving nonlinear unconstrained optimization problems. In this paper, we present a q-variant of the PRP (q-PRP) method for which both the sufficient descent and conjugacy conditions are satisfied at every iteration. The proposed method is globally convergent under the standard Wolfe conditions and the strong Wolfe conditions. The numerical results show that the proposed method is promising for a set of given test problems with different starting points. Moreover, the method reduces to the classical PRP method as the parameter q approaches 1.

Introduction

The conjugate gradient (CG) methods have played an important role in solving nonlinear optimization problems due to the simplicity of their iterations and their very low memory requirements [1, 2]. Although CG methods are not among the fastest or most robust optimization algorithms available today, they remain very popular among engineers and mathematicians for solving nonlinear optimization problems [3–5]. The origin of these methods dates back to 1952, when Hestenes and Stiefel introduced a CG method [6] for solving symmetric positive definite linear systems of equations. In the 1960s, Fletcher and Reeves extended this method, now called FR [7], to unconstrained nonlinear optimization problems.

Conjugate gradient methods deflect the steepest descent direction [8] by adding to it a multiple of the direction used in the previous step. They require only first-order derivatives and overcome the slow convergence of the steepest descent method. By incorporating conjugacy into the steepest descent direction, conjugate gradient methods improve the efficiency and reliability of the algorithm. Different conjugate gradient algorithms correspond to different choices of the scalar parameter \(\beta _{k}\) [6, 7, 9]. The parameter \(\beta _{k}\) is selected so that, for a convex quadratic function, the method minimizes the function over the subspace spanned by a set of mutually conjugate descent directions; for general functions, however, the effectiveness of the algorithm depends on the accuracy of the line searches.

Quantum calculus, known as q-calculus, is the study of calculus without limits, in which the classical formulas are recovered as q approaches 1. In q-calculus, the classical derivative is replaced by the q-difference operator. Jackson [10, 11] was among the first to develop applications of q-calculus and introduced the q-analogues of the classical derivative and integral operators. q-calculus plays an important role in various fields of mathematics and physics [12–20].

In 1969, Polak and Ribière [21] and Polyak [22] independently proposed a conjugate gradient method, later called the Polak–Ribière–Polyak (PRP) method. In practical computation, the PRP method performs much better than the FR method on many unconstrained optimization problems because it automatically recovers once a small step-length is generated, although the global convergence of the PRP method was proved only for strictly convex functions [23]. For general nonlinear functions, Powell showed that the PRP method can cycle infinitely without approaching a solution, even if the step-length is chosen to be the least positive minimizer of the line search function [24]. To remedy this, Gilbert and Nocedal [25] followed Powell’s suggestion [26] to modify the PRP method and showed that the modified method is globally convergent under exact and inexact line searches.

In 2019, Yuan et al. proposed a modified three-term conjugate gradient algorithm based on a modified Armijo line search technique [27]. In 2020, they designed a modified conjugate gradient method with a sufficient descent property and a trust region property [28]. The authors of [29] proposed a modified Hestenes–Stiefel (HS) conjugate gradient algorithm for solving large-scale complex smooth and nonsmooth optimization problems.

In 2020, Yuan et al. further studied the PRP method and established its global convergence for nonconvex functions under a modified weak Wolfe–Powell line search technique. The numerical results demonstrated that the method is competitive with existing methods, and the engineering Muskingum model and image restoration problems were used to illustrate its practical performance [30]. Generalized conjugate gradient algorithms have also been studied for solving large-scale unconstrained optimization problems arising in real-world applications, and two open problems were formulated [31–33].

Preliminary experimental results for optimization using q-calculus were first reported in the field of global optimization [34]. This idea was later used in stochastic q-neurons, whose activation functions are converted into the corresponding stochastic q-activation functions to improve the effectiveness of the algorithm. The q-gradient concept was further used in the least mean square (LMS) algorithm to inherit its fast convergence property with less dependence on the eigenvalues of the input correlation matrix [35]. A modified least mean square algorithm based on q-calculus, which automatically adapts the learning rate with respect to the error, was also proposed and shown to converge fast [36]. In optimization, q-calculus has been employed in Newton, modified Newton, BFGS, and limited memory BFGS methods for solving unconstrained nonlinear optimization problems [19, 37–40] with a small number of iterations. In the field of conjugate gradient methods, a q-analogue of the Fletcher–Reeves method was developed [41] to optimize unimodal and multimodal functions, where Gaussian perturbations were used in some iterations to ensure global convergence, though only in the probabilistic sense.

In this paper, we propose a q-variant of the PRP method, called q-PRP, whose sufficient descent property holds independently of the line search and without any convexity assumption on the objective function. Under a condition on the q-gradient of the objective function and some other appropriate conditions, the proposed method is globally convergent. Numerical experiments are conducted to show the effectiveness of the q-PRP algorithm. For a set of test functions with different starting points, the method was able to escape from many local minima and reach global minima, owing to the q-gradient.

The remainder of this paper is organized as follows. In the next section, we present the essential preliminaries. The main results are presented in Sect. 3, and their convergence proofs are given in Sect. 4. Numerical examples illustrating the theoretical results are analyzed in Sect. 5. The paper ends with a conclusion and directions for future work.

Essential preliminaries

In this section, we recall the basic notions of q-calculus, assuming \(0< q<1\). The q-integer \([n]_{q}\) is defined by

$$ [n]_{q} = \textstyle\begin{cases} \frac{1-q^{n}}{ 1-q},& q\ne 1, \\ n,& q=1, \end{cases} $$

for all \(n\in \mathbb{N}\). The q-analogue of \((1+x)^{n}\) is the polynomial given by

$$ (1+x)_{q}^{n} = \textstyle\begin{cases} 1, & n=0, \\ \prod_{k=0}^{n-1} (1+q^{k}x), & n\geq 1. \end{cases} $$

The q-derivative of \(x^{n}\) with respect to x is \([n]_{q}x^{n-1}\). The q-derivative \(D_{q}f\) of a function f is given by

$$ D_{q}f(x) = \frac{f(qx)- f(x)}{qx- x}, $$

if \(x\ne 0\), and \(D_{q}f(0)=f'(0)\), provided \(f'(0)\) exists. Note that

$$ \lim_{q\to 1}D_{q}f(x)=\lim_{q\to 1} \frac{f(qx)-f(x)}{(q-1)x} = \frac{{\mathrm{d}}f(x)}{{\mathrm{d}} x}, $$

if f is differentiable.

Example 2.1

Let the function \(f : \mathbb{R}\to \mathbb{R}\) be such that \(f(x)=\ln x\). Then, we have

$$ \biggl( \frac{{\mathrm{d}}}{ {\mathrm{d}} x} \biggr)_{q} \ln x = \frac{ \ln x - \ln ( qx) }{ (1-q)x } = \frac{ \ln \frac{1}{q}}{(1-q)x}. $$
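As a quick numerical sanity check of the definition of \(D_{q}\) and of Example 2.1, the following Python sketch evaluates the q-difference quotient directly and compares it with the closed form above. The function names `q_derivative` and `q_derivative_log_closed_form` are ours, chosen for illustration; as q approaches 1, both values should approach the classical derivative \(1/x\).

```python
import math


def q_derivative(f, x, q):
    """Jackson q-derivative D_q f(x) = (f(qx) - f(x)) / (qx - x), valid for x != 0, q != 1."""
    if x == 0 or q == 1:
        raise ValueError("D_q f is defined through the classical derivative when x == 0 or q == 1")
    return (f(q * x) - f(x)) / (q * x - x)


def q_derivative_log_closed_form(x, q):
    """Closed form from Example 2.1: D_q ln x = ln(1/q) / ((1 - q) x)."""
    return math.log(1.0 / q) / ((1.0 - q) * x)


if __name__ == "__main__":
    x = 2.0
    for q in (0.5, 0.9, 0.999):
        numeric = q_derivative(math.log, x, q)
        closed = q_derivative_log_closed_form(x, q)
        print(f"q={q}: quotient={numeric:.6f}, closed form={closed:.6f}, classical 1/x={1 / x}")
```

The same helper also reproduces \(D_{q}x^{n}=[n]_{q}x^{n-1}\); for example, `q_derivative(lambda t: t ** 3, 2.0, 0.5)` equals \((1+q+q^{2})x^{2}\) at \(q=0.5\), \(x=2\).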

The q-derivative is a linear operator; that is, for any constants a and b, we have [42]

$$ D_{q} \bigl\{ af(x) + bg(x) \bigr\} = aD_{q} f(x) + b D_{q} g(x). $$

The following q-mean value theorem holds [43]. Let \(f(x)\) be a continuous function on \([a, b]\), where \(a, b \in \mathbb{R}\). Then there exist \(\hat{q} \in (0, 1)\) and \(x \in (a,b)\) such that

$$ f(b) - f(a) = D_{q} f(x) (b-a), $$

for all \(q \in (\hat{q}, 1) \cup (1, \hat{q}^{-1})\). The q-partial derivative of a function \(f : \mathbb{R}^{n} \to \mathbb{R}\) at \(x\in \mathbb{R}^{n}\) with respect to \(x_{i}\), for a scalar \(q \in (0,1)\), is given by [34]

$$ D_{q, x_{i}} f(x) = \textstyle\begin{cases} \frac{1}{(1-q) x_{i}} [ f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ) \\ \quad {}- f (x_{1}, x_{2},\ldots , x_{i-1}, q x_{i},x_{i+1},\ldots , x_{n} ) ], & x_{i}\ne 0, q\ne 1, \\ \frac{\partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, 0, x_{i+1},\ldots , x_{n} ),& x_{i}=0, \\ \frac{ \partial }{\partial x_{i}} f ( x_{1}, x_{2},\ldots , x_{i-1}, x_{i}, x_{i+1},\ldots , x_{n} ),& q=1. \end{cases} $$

We now choose the parameter q as a vector, that is,

$$ q=(q_{1},\ldots , q_{i},\ldots , q_{n})^{T} \in \mathbb{R}^{n}. $$

Then, the q-gradient vector [34] of f is

$$ \nabla _{q} f(x)^{T} = \begin{bmatrix} D_{q_{1}, x_{1}} f(x) & \ldots & D_{q_{i}, x_{i}} f(x) & \ldots & D_{q_{n}, x_{n}} f(x) \end{bmatrix} . $$

Let \(\{ q^{k}_{i} \}\) be a real sequence defined by

$$ q^{k+1}_{i} = 1- \frac{ q^{k}_{i}}{ (k+1)^{2}}, $$
(1)

for each \(i=1,\ldots ,n\), where \(k=0,1,2,\ldots \), and a fixed starting value \(0< q^{0}_{i} < 1\). By induction, \(0<q^{k}_{i}<1\) implies \(1-\frac{1}{(k+1)^{2}}<q^{k+1}_{i}<1\), so \(q^{k}_{i}\to 1\) for each i, and the sequence \(\{q^{k}\}\) converges to \((1,\ldots , 1)^{T}\) as \(k \to \infty \) [38]. Thus, the q-gradient reduces to the classical gradient in the limit. For the sake of convenience, we represent the q-gradient vector of f at \(x^{k}\) as

$$ g_{q^{k}} \bigl( x^{k} \bigr) = \nabla _{q^{k}} f \bigl( x^{k} \bigr). $$
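The recursion (1) is easy to simulate. The Python sketch below (helper names `next_q` and `q_sequence` are hypothetical) generates \(q^{0}, q^{1}, \ldots\) componentwise for an illustrative starting vector; note that \(q^{1}_{i}=1-q^{0}_{i}\), after which the entries climb toward 1.

```python
def next_q(q_k, k):
    """One step of recursion (1), applied componentwise: q_i^{k+1} = 1 - q_i^k / (k + 1)^2."""
    return [1.0 - qi / (k + 1) ** 2 for qi in q_k]


def q_sequence(q0, num_steps):
    """Return the list [q^0, q^1, ..., q^num_steps] for a start vector with entries in (0, 1)."""
    if not all(0.0 < qi < 1.0 for qi in q0):
        raise ValueError("each entry of q^0 must lie in (0, 1)")
    seq = [list(q0)]
    for k in range(num_steps):
        seq.append(next_q(seq[-1], k))
    return seq


if __name__ == "__main__":
    seq = q_sequence([0.2, 0.9], 50)
    for k in (0, 1, 2, 10, 50):
        print(k, seq[k])
```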

Example 2.2

Consider the function \(f : \mathbb{R}^{2} \to \mathbb{R}\) defined by

$$ f(x) = x_{1} x_{2}^{2} + 4x_{1}^{2}. $$

Then, the q-gradient is given as

$$ \nabla _{q^{k}} f(x)^{T} = \begin{bmatrix} 4(1+q^{k}_{1})x_{1}+x_{2}^{2} & x_{1}(1+q^{k}_{2})x_{2} \end{bmatrix} . $$
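The closed form in Example 2.2 can be checked against the componentwise definition of the q-partial derivative. In the sketch below, `q_gradient` is our own illustrative helper; in the cases \(x_{i}=0\) or \(q_{i}=1\), where the definition calls for the classical partial derivative, it substitutes a central finite difference, which is an assumption of this sketch and not part of the definition. The second component of the closed form involves \(q_{2}\), since it comes from scaling \(x_{2}\).

```python
def q_gradient(f, x, q, h=1e-6):
    """q-gradient of f: R^n -> R at x, following the componentwise definition.

    For x_i != 0 and q_i != 1 the i-th entry is
        [f(x) - f(x with x_i replaced by q_i * x_i)] / ((1 - q_i) * x_i).
    Otherwise the classical partial derivative is approximated by a central difference.
    """
    fx, grad = f(x), []
    for i, (xi, qi) in enumerate(zip(x, q)):
        if xi != 0.0 and qi != 1.0:
            scaled = list(x)
            scaled[i] = qi * xi
            grad.append((fx - f(scaled)) / ((1.0 - qi) * xi))
        else:
            plus, minus = list(x), list(x)
            plus[i] += h
            minus[i] -= h
            grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


def f(x):
    # Example 2.2: f(x) = x1 * x2^2 + 4 * x1^2
    return x[0] * x[1] ** 2 + 4.0 * x[0] ** 2


def q_gradient_closed_form(x, q):
    # Closed form of Example 2.2, with q_1 in the first entry and q_2 in the second.
    return [4.0 * (1.0 + q[0]) * x[0] + x[1] ** 2, x[0] * (1.0 + q[1]) * x[1]]


if __name__ == "__main__":
    x, q = [1.5, -2.0], [0.3, 0.7]
    print(q_gradient(f, x, q))
    print(q_gradient_closed_form(x, q))
```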

In the next section, we present the q-PRP method. To improve efficiency, we use the q-gradient in inexact line search methods to generate a step-length that ensures a reduction of the objective function value.

On q-Polak–Ribière–Polyak conjugate gradient algorithm

Consider the following unconstrained nonlinear optimization problem:

$$ (P) \quad \min_{x\in \mathbb{R}^{n}} f(x), $$

where \(f: \mathbb{R}^{n} \to \mathbb{R}\) is a continuously q-differentiable function. Numerical optimization algorithms for general objective functions differ mainly in how they generate search directions. In conjugate gradient algorithms, a sequence of iterates is generated from a given starting point \(x^{0} \in \mathbb{R}^{n}\) by the following scheme:

$$ x^{k+1}=x^{k}+p^{k}, \qquad p^{k}=\alpha _{k}d_{q^{k}}^{k}, $$
(2)

for all \(k\ge 0\), where \(x^{k}\) is the current iterate, \(d_{q^{k}}^{k}\) is a descent direction of f at \(x^{k}\), and \(\alpha _{k}>0\) is the step-length. Note that the choice \(d_{q^{k}}^{k} = -g_{q^{k}}^{k}\) leads to the q-steepest descent method [34]. When \(q^{k}\) approaches \((1,1,\ldots , 1)^{T}\) as \(k\to \infty \), this method reduces to the classical steepest descent method [7]. The search direction \(d_{q^{k}}^{k}\) is a descent direction provided that

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}< 0. $$
(3)

Following the classical conjugate direction methods [7, 9, 21, 44, 45], the directions \(d_{q^{k}}^{k}\) are generated as

$$ d_{ q^{k}}^{k} = \textstyle\begin{cases} -g_{q^{k}}^{k},& k=0, \\ -g_{q^{k}}^{k}+\beta _{k}^{q-\mathrm{PRP}}d_{q^{k-1}}^{k-1},& k \ge 1, \end{cases} $$
(4)

where the scalar \(\beta _{k}^{q-\mathrm{PRP}}\in \mathbb{R}\) is the q-analogue of the parameter \(\beta _{k}\) of the PRP method, given by

$$ \beta _{k}^{q-\mathrm{PRP}} = \frac{ (g_{q^{k}}^{k} )^{T} (g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$
(5)
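To make (4) and (5) concrete, here is a minimal pure-Python sketch that computes \(\beta _{k}^{q-\mathrm{PRP}}\) from two consecutive q-gradients and forms the next direction. The helper names are hypothetical, and the q-gradients are passed in as plain lists, so the sketch does not depend on how they were computed.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def beta_q_prp(g_k, g_prev):
    """beta_k^{q-PRP} from (5): g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    y = [a - b for a, b in zip(g_k, g_prev)]
    return dot(g_k, y) / dot(g_prev, g_prev)


def q_prp_direction(g_k, g_prev=None, d_prev=None):
    """Direction (4): -g_k at k = 0, otherwise -g_k + beta_k^{q-PRP} * d_{k-1}."""
    if g_prev is None or d_prev is None:
        return [-a for a in g_k]
    beta = beta_q_prp(g_k, g_prev)
    return [-a + beta * d for a, d in zip(g_k, d_prev)]


if __name__ == "__main__":
    print(q_prp_direction([1.0, 2.0], [1.0, 0.0], [-1.0, 0.0]))
```

By itself, the two-term direction (4) need not satisfy the descent condition (3) at every iteration under an inexact line search, which is what motivates the three-term modification (12)–(13) discussed in this section.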

Well-known conjugate gradient methods include the FR (Fletcher–Reeves) [7], PRP (Polak–Ribière–Polyak) [9, 21], and HS (Hestenes–Stiefel) [6] methods. Among these, the PRP method is generally regarded as one of the most efficient in practical computation. To guarantee global convergence, we choose \(d_{q^{k}}^{k}\) to satisfy the sufficient descent condition:

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \le - c \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(6)

where \(c>0\) is a constant. There are several approaches to finding the step-length. Among them, the exact line search [46, 47] is time consuming and sometimes difficult to carry out. Therefore, researchers often adopt inexact line search techniques, such as the Wolfe line search [48], the Goldstein line search [49], or the Armijo line search with backtracking [50]. The most widely used conditions for determining the step-length are the so-called standard Wolfe line search conditions:

$$ f \bigl( x^{k}+\alpha _{k}d_{q^{k}}^{k} \bigr) \le f\bigl(x^{k}\bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(7)

and

$$ g_{q^{k}} \bigl( x^{k} + \alpha _{k}d_{q^{k}}^{k} \bigr)^{T}d_{q^{k}}^{k} \ge \sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$
(8)

where \(0<\delta <\sigma <1\). The first condition (7) is called the Armijo condition, which ensures a sufficient reduction of the objective function value, while the second condition (8) is called the curvature condition, which rules out excessively short step-lengths. To investigate the global convergence of the PRP method, a modified Armijo line search was proposed in [51]. For given constants \(\mu >0\) and \(\delta , \rho \in (0, 1)\), this line search finds

$$ \alpha _{k}=\max \biggl\{ \rho ^{j} \frac{\mu \lvert (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} \rvert }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} : j = 0, 1,\ldots \biggr\} $$

such that the iterates and directions generated by (2) and (4) satisfy

$$ f \bigl(x^{k+1} \bigr) \le f \bigl(x^{k} \bigr) - \delta \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(9)

and

$$ - C_{1} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2} \le \bigl( g_{q^{k+1}} \bigl( x^{k+1} \bigr) \bigr)^{T} d_{q^{k+1}}^{k+1} \le -C_{2} \bigl\lVert g_{q^{k+1}} \bigl(x^{k+1} \bigr) \bigr\rVert ^{2}, $$

where \(0< C_{2}<1<C_{1}\) are constants. Since, by (9), \(\{ f(x^{k})\}_{k\ge 0}\) is a nonincreasing sequence, and it is bounded below, summing (9) over k gives

$$ \sum_{k=0}^{\infty } \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} < \infty . $$

In particular,

$$ \lim_{k\to \infty } \alpha _{k} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert =0. $$
(10)
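A minimal Python sketch of the modified Armijo line search above: it tries \(j=0,1,\ldots\) in turn, so the first accepted trial is the largest step of the form \(\rho ^{j}\mu \lvert g^{T}d\rvert /\lVert d\rVert ^{2}\) satisfying (9). The default values of μ, ρ, δ and the cap `max_backtracks` are our own illustrative choices, and the sketch checks only (9), not the two-sided bound on the next direction.

```python
def modified_armijo_step(f, x, d, g_dot_d, mu=1.0, rho=0.5, delta=1e-4, max_backtracks=40):
    """Modified Armijo line search of the form used in (9).

    Tries alpha = rho^j * mu * |g^T d| / ||d||^2 for j = 0, 1, ... and returns the first
    (hence largest) alpha with f(x + alpha d) <= f(x) - delta * alpha^2 * ||d||^2.
    `g_dot_d` is the inner product of the (q-)gradient at x with d.
    """
    d_norm2 = sum(di * di for di in d)
    alpha = mu * abs(g_dot_d) / d_norm2
    fx = f(x)
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - delta * alpha ** 2 * d_norm2:
            return alpha
        alpha *= rho
    raise RuntimeError("no step satisfying (9) found within max_backtracks trials")


if __name__ == "__main__":
    def sphere(v):
        return v[0] ** 2 + v[1] ** 2

    # Steepest descent direction at (1, 1); the full trial step overshoots, so it is halved.
    print(modified_armijo_step(sphere, [1.0, 1.0], [-2.0, -2.0], -8.0))
```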

It is worth mentioning that a step-length computed by the standard Wolfe line search conditions (7)–(8) may not be sufficiently close to a minimizer of the line search function \(\alpha \mapsto f(x^{k}+\alpha d_{q^{k}}^{k})\). Instead, the strong Wolfe line search conditions can be used; they consist of (7) together with the following strengthened version of (8):

$$ \bigl\lvert g_{q^{k}} \bigl(x^{k} + \alpha _{k} d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert \le -\sigma \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} $$
(11)

From (11), we see that as \(\sigma \to 0\), a step-length satisfying (7) and (11) tends to the optimal (exact) step-length [2]. Note that an appropriate choice of starting point has a positive effect on the computational cost and convergence speed of the algorithm. The modified PRP conjugate gradient-like method introduced in [52] can be written in terms of the q-gradient as:

$$\begin{aligned} d_{q^{k}}^{k} = \textstyle\begin{cases} - g_{q^{k}}^{k}, & k=0, \\ -g_{q^{k}}^{k} + \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1} - \theta ^{k} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{ k-1} ),& k>0. \end{cases}\displaystyle \end{aligned}$$
(12)

Adapting [52] to the q-gradient, we take

$$\begin{aligned} \theta ^{k} = \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. \end{aligned}$$
(13)

From (12) and (13) for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k}-g_{q^{k-1}}^{k-1} ) }{ \lVert g_{ q^{k-1}}^{k-1} \rVert ^{2} } d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1} }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{ k-1} \bigr), $$

and taking the inner product with \(g_{q^{k}}^{k}\), the last two terms cancel, so that

$$\begin{aligned} \bigl( d_{q^{k}}^{k} \bigr)^{T} g_{q^{k}}^{k} &= - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. \end{aligned}$$
(14)

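This implies that \(d_{q^{k}}^{k}\) provides a q-descent direction of the objective function at \(x^{k}\); in fact, (14) is the sufficient descent condition (6) with \(c=1\), and it holds independently of the line search. It is worth mentioning that if the exact line search [53] is used to compute the step-length \(\alpha _{k}\), then the new gradient is orthogonal to the previous direction, so that \(\theta ^{k}=0\); since, moreover, \(q^{k}\to (1, 1,\ldots , 1)^{T}\) as \(k\to \infty \), the q-PRP method then reduces to the classical PRP method.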
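Identity (14) can be checked numerically. The sketch below is pure Python; the function name and test vectors are ours. It builds the three-term direction (12) with \(\theta ^{k}\) from (13) and confirms that \((d_{q^{k}}^{k})^{T}g_{q^{k}}^{k}=-\lVert g_{q^{k}}^{k}\rVert ^{2}\) for arbitrary previous data, which is why the descent property does not depend on the line search.

```python
def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def three_term_q_prp_direction(g_k, g_prev=None, d_prev=None):
    """Direction (12) with beta from (5) and theta from (13)."""
    if g_prev is None or d_prev is None:
        return [-a for a in g_k]                      # k = 0: q-steepest descent
    y = [a - b for a, b in zip(g_k, g_prev)]          # g_k - g_{k-1}
    denom = dot(g_prev, g_prev)                       # ||g_{k-1}||^2
    beta = dot(g_k, y) / denom                        # (5)
    theta = dot(g_k, d_prev) / denom                  # (13)
    return [-a + beta * d - theta * b for a, d, b in zip(g_k, d_prev, y)]


if __name__ == "__main__":
    g_prev, d_prev, g_k = [1.0, 0.0], [-1.0, 0.0], [1.0, 2.0]
    d_k = three_term_q_prp_direction(g_k, g_prev, d_prev)
    # (14): the two printed numbers should agree.
    print(d_k, dot(g_k, d_k), -dot(g_k, g_k))
```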

The number of iterations required by the algorithm varies considerably from one problem to another. The q-PRP method for solving problem \((P)\) is summarized in Algorithm 1.

Algorithm 1

q-PRP conjugate gradient algorithm
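The detailed steps of Algorithm 1 are not reproduced in this text. The following Python sketch only illustrates how the ingredients stated above fit together: the q-gradient, the update (1) of q, the modified Armijo step (9), the iteration (2), and the three-term direction (12)–(13). It is not a transcription of Algorithm 1. The stopping rule \(\lVert g_{q^{k}}^{k}\rVert \le\) `tol`, all parameter values, the finite-difference fallback in `q_gradient`, and the restart along \(-g_{q^{k}}^{k}\) when backtracking fails are our own assumptions.

```python
def q_gradient(f, x, q, h=1e-6):
    """q-gradient of f at x; classical partials (x_i == 0 or q_i == 1) use a central difference."""
    fx, g = f(x), []
    for i, (xi, qi) in enumerate(zip(x, q)):
        if xi != 0.0 and qi != 1.0:
            y = list(x)
            y[i] = qi * xi
            g.append((fx - f(y)) / ((1.0 - qi) * xi))
        else:
            p, m = list(x), list(x)
            p[i] += h
            m[i] -= h
            g.append((f(p) - f(m)) / (2.0 * h))
    return g


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def q_prp_sketch(f, x0, q0, tol=1e-8, max_iter=500, mu=1.0, rho=0.5, delta=1e-4,
                 max_backtracks=40):
    """Hypothetical q-PRP loop; returns the final iterate and the iteration count."""
    x, q = list(x0), list(q0)
    g = q_gradient(f, x, q)
    d = [-a for a in g]                                  # k = 0: q-steepest descent
    k = 0
    while k < max_iter and dot(g, g) ** 0.5 > tol:
        fx, dd = f(x), dot(d, d)
        alpha = mu * abs(dot(g, d)) / dd                 # largest trial step (j = 0)
        for _ in range(max_backtracks):
            trial = [a + alpha * b for a, b in zip(x, d)]
            if f(trial) <= fx - delta * alpha ** 2 * dd:  # modified Armijo condition (9)
                break
            alpha *= rho
        else:
            d = [-a for a in g]                          # assumed safeguard: restart along -g
            k += 1
            continue
        x = trial                                        # iteration (2)
        q = [1.0 - qi / (k + 1) ** 2 for qi in q]        # update (1): q^k -> q^{k+1}
        g_new = q_gradient(f, x, q)
        y = [a - b for a, b in zip(g_new, g)]            # g^{k+1} - g^k
        denom = dot(g, g)
        beta = dot(g_new, y) / denom                     # (5)
        theta = dot(g_new, d) / denom                    # (13)
        d = [-a + beta * b - theta * c for a, b, c in zip(g_new, d, y)]  # (12)
        g = g_new
        k += 1
    return x, k


if __name__ == "__main__":
    def quad(x):
        # Hypothetical test function; its q-gradient vanishes only at the origin for every q.
        return x[0] ** 2 + 10.0 * x[1] ** 2

    x_star, iters = q_prp_sketch(quad, [3.0, -1.0], [0.5, 0.5], max_iter=2000)
    print(x_star, iters)
```

On this simple quadratic, the sketch is expected to drive the iterates toward the origin; we make no claim about its behavior on the test problems of Sect. 5.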

Global convergence

In this section, we prove the global convergence of Algorithm 1 under the following assumptions.

Assumption 4.1

The level set

$$ \Omega = \bigl\{ x \in \mathbb{R}^{n} : f(x) \le f \bigl(x^{0}\bigr) \bigr\} , $$

is bounded, where \(x^{0}\) is a starting point.

Assumption 4.2

In some neighborhood N of Ω, f has a continuous q-derivative and there exists a constant \(L>0\) such that

$$ \bigl\lVert g_{q}(x) - g_{q}(y) \bigr\rVert \le L \lVert x-y \rVert , $$
(15)

for \(x, y \in N\).

Since \(\{f(x^{k})\}\) is nonincreasing, the sequence \(\{ x^{k} \}\) generated by Algorithm 1 is contained in Ω. From Assumptions 4.1 and 4.2, there is a constant \(\eta >0\) such that

$$ \bigl\lVert g_{q^{k}}(x) \bigr\rVert \le \eta , $$
(16)

for each \(x\in \Omega \). By Assumption 4.1, there exists a positive constant \(\mathcal{B}\) such that \(\lVert x\rVert \le \mathcal{B}\) for all \(x\in \Omega \). Unless otherwise specified, let \(\{x^{k}\}\) and \(\{d_{q^{k}}^{k}\}\) denote the iterate sequence and the q-descent direction sequence generated by Algorithm 1. We now present the following lemma.

Lemma 4.1

Suppose that \(\{q^{k}\}\) is generated by (1) and that there exists a constant \(\epsilon >0\) such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(17)

for all k, then there exists a constant \(\mathcal{M} > 0\) such that the q-descent direction satisfies

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \mathcal{M}, $$
(18)

for all k.

Proof

From (12) and (13), for \(k\ge 1\), we obtain

$$ d_{q^{k}}^{k} = -g_{q^{k}}^{k} + \frac{ ( g_{q^{k}}^{k} )^{T} ( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} )}{ \lVert g_{q^{k-1}}^{ k-1} \rVert ^{2}} d_{q^{k-1}}^{k-1} - \frac{ ( g_{q^{k}}^{k} )^{T} d_{q^{k-1}}^{k-1}}{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl( g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \bigr). $$

Taking the norm of both sides of the above equation and using (16), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ \lVert g_{q^{k}}^{k} - g_{q^{k-1}}^{k-1} \rVert \lVert d_{q^{k-1}}^{k-1} \rVert }{ \lVert g_{q^{k-1}}^{k-1} \rVert ^{2}}. $$

From Assumption 4.2 and (17), we have

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \eta + 2 \eta \frac{ L\alpha _{k-1} \lVert d_{q^{k-1}}^{k-1} \rVert }{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{ k-1} \bigr\rVert . $$
(19)

From (10), \(\alpha _{k-1}d_{q^{k-1}}^{k-1}\to 0\) and since \(\{q^{k}\}\) approaches \((1,\ldots , 1)^{T}\) as \(k\to \infty \), there exist a constant \(s\in (0, 1)\) and an integer \(k_{0}\) such that the following inequality holds for all \(k\ge k_{0}\):

$$ 2 \eta \frac{ L\alpha _{k-1}}{ \epsilon ^{2}} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \le s. $$

From (19), we get for any \(k>k_{0}\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert &\le \eta + s \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert \\ &\le \eta ( 1+s) +s^{2} \bigl\lVert d_{q^{k-2}}^{k-2} \bigr\rVert \\ & \quad \vdots \\ &\le \eta \bigl( 1 + s + s^{2} + \cdots + s^{k-k_{0}-1} \bigr) + s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . \end{aligned}$$

Since \(s\in (0, 1)\), the first term satisfies \(\eta (1+s+\cdots +s^{k-k_{0}-1}) = \eta \frac{1-s^{k-k_{0}}}{1-s} < \frac{\eta }{1-s}\), and the second term satisfies

$$ s^{k-k_{0}} \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert < \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Thus, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert < \frac{\eta }{ 1 - s} + \bigl\lVert d_{ q^{k_{0}}}^{k_{0}} \bigr\rVert . $$

Choosing

$$ \mathcal{M} = \max \biggl\{ \bigl\lVert d_{q^{0}}^{0} \bigr\rVert , \bigl\lVert d_{q^{1}}^{1} \bigr\rVert , \bigl\lVert d_{q^{2}}^{2} \bigr\rVert ,\ldots , \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert , \frac{\eta }{1-s}+ \bigl\lVert d_{q^{k_{0}}}^{k_{0}} \bigr\rVert \biggr\} , $$

we obtain (18). □

We now show that the q-PRP method with the modified Armijo-type line search of [51], adapted to the q-gradient, is globally convergent.

Theorem 4.2

Assume that Assumptions 4.1 and 4.2 hold. Then Algorithm 1 generates an infinite sequence \(\{x^{k}\}\) such that

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(20)

Proof

For the sake of obtaining a contradiction, we suppose that the given conclusion is not true. Then, there exists a constant \(\epsilon >0\) such that

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert \ge \epsilon , $$
(21)

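for all k. If \(\liminf_{ k \to \infty } \alpha _{k} > 0\), then (10) gives \(\lVert d_{q^{k}}^{k}\rVert \to 0\); since (14) and the Cauchy–Schwarz inequality yield \(\lVert g_{q^{k}}^{k}\rVert ^{2} = -(g_{q^{k}}^{k})^{T}d_{q^{k}}^{k} \le \lVert g_{q^{k}}^{k}\rVert \lVert d_{q^{k}}^{k}\rVert \), we get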

$$ \liminf_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This contradicts assumption (21). Now suppose that \(\liminf_{k\to \infty }\alpha _{k}=0\), that is, there is an infinite index set \(\mathcal{K}\) such that

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \alpha _{k} = 0. $$

Suppose that step 9 of Algorithm 1 uses (9) to generate the step-length. When \(k\in \mathcal{K}\) is sufficiently large, the larger trial step \(\rho ^{-1}\alpha _{k}\), with \(\rho \in (0 , 1)\) [52], does not satisfy (9), so we must have

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl( x^{k} \bigr) > - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(22)

From the q-mean value theorem, there is \(\gamma _{k} \in (0,1)\) such that

$$ f \bigl( x^{k}+\rho ^{-1} \alpha _{k} d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) = \rho ^{-1} \alpha _{k}g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}, $$

that is,

$$\begin{aligned} f \bigl( x^{k}+\rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) -f \bigl( x^{k} \bigr) &= \rho ^{-1} \alpha _{k} \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ & \quad{} + \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}} \bigl( x^{k} + \gamma _{k} \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - g_{q^{k}} \bigl(x^{k} \bigr) \bigr)^{T} d_{q^{k}}^{k}. \end{aligned}$$

From Lemma 4.1 and Assumption 4.2, we have

$$ f \bigl( x^{k} + \rho ^{-1} \alpha _{k}d_{q^{k}}^{k} \bigr) - f \bigl(x^{k} \bigr) \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L\rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, $$
(23)

where L is the Lipschitz constant in (15). From (22) and (23),

$$ - \delta \rho ^{-2}\alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} \le \rho ^{-1} \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} + L \rho ^{-2} \alpha _{k}^{2} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Using (14), we get

$$ \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \le \alpha _{k} ( \delta +L) \rho ^{-1} \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}. $$

Since \(\{d_{q^{k}}^{k}\}\) is bounded by Lemma 4.1 and \(\lim_{k\in \mathcal{K}, k \to \infty } \alpha _{k}=0\),

$$ \lim_{\substack{k \to \infty ,\\ k \in \mathcal{K}}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$

This contradicts (21). The proof is complete. □

The following important result of Zoutendijk [54] can be restated in terms of the q-gradient as follows:

Lemma 4.3

Suppose that Assumptions 4.1 and 4.2 hold. Consider the iterative methods (2) and (4), where \(d_{q^{k}}^{k}\) satisfies (3) and \(\alpha _{k}\) is obtained by the standard Wolfe line search conditions (7)–(8) or by the strong Wolfe line search conditions (7) and (11). Then,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(24)

We now present the convergence analysis of Algorithm 1 under the standard Wolfe conditions, which adapts [55, 56] to q-calculus. In this case, we assume that the step-lengths are bounded below by a positive constant.

Theorem 4.4

Assume that the line search fulfills the standard Wolfe conditions (7)–(8). If there exists a constant \(\alpha _{0}\in (0,1]\) such that \(\alpha _{k}\ge \alpha _{0}\) for all \(k\ge 0\), then

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0. $$
(25)

Proof

From (3) and the first Wolfe condition (7), we have

$$\begin{aligned} f \bigl( x^{k+1} \bigr) &\le f \bigl(x^{k} \bigr) + \delta \alpha _{k} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le f \bigl( x^{k} \bigr)\le f \bigl( x^{k-1} \bigr)\le \cdots \le f \bigl( x^{0} \bigr). \end{aligned}$$

This means that the sequence \(\{f(x^{k})\}_{k\ge 0}\) is nonincreasing and bounded. From the second standard Wolfe condition (8) and Assumption 4.2, we get

$$\begin{aligned} - ( 1 - \sigma ) \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} &\le \bigl( g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \\ &\le \bigl\lVert g_{q^{k+1}}^{k+1} - g_{q^{k}}^{k} \bigr\rVert \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \alpha _{k} L \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2}, \end{aligned}$$

that is,

$$ - \frac{ (1-\sigma ) (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k}}{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \le \alpha _{k}. $$

Multiplying both sides by the negative quantity \(\delta (g_{q^{k}}^{k})^{T}d_{q^{k}}^{k}\), which reverses the inequality, we get

$$ - \frac{ ( 1 - \sigma ) \delta ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ L \lVert d_{q^{k}}^{k} \rVert ^{2}} \ge \alpha _{k} \delta \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k}. $$

From the first standard Wolfe condition (7), we have

$$ \frac{ \delta ( 1 - \sigma )}{ L } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \le f \bigl(x^{k} \bigr) - f \bigl(x^{k+1} \bigr). $$

Summing this inequality over k and using the boundedness of \(\{f(x^{k})\}_{k\ge 0}\), we obtain

$$\begin{aligned} \begin{aligned} \frac{ \delta (1 - \sigma )}{L} \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} &\le f \bigl(x^{0} \bigr) - f \bigl(x^{1} \bigr) + \bigl( f \bigl(x^{1} \bigr) - f \bigl( x^{2} \bigr) \bigr) + \cdots \\ &= f \bigl(x^{0} \bigr) - \lim_{ k \to \infty } f \bigl(x^{k} \bigr)< +\infty . \end{aligned} \end{aligned}$$

Thus, the Zoutendijk condition (24) holds, that is,

$$ \sum_{k=0}^{\infty } \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< + \infty . $$
(26)

Since \(x^{k}, x^{k+1}\in \Omega \), Assumption 4.1 gives

$$ \bigl\lVert p^{k} \bigr\rVert = \bigl\lVert \alpha _{k} d_{q^{k}}^{k} \bigr\rVert = \bigl\lVert x^{k+1}-x^{k} \bigr\rVert \le 2\mathcal{B}. $$

Since \(\alpha _{k}\ge \alpha _{0}\), we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert \le \frac{2\mathcal{B}}{\alpha _{0}}. $$

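Hence \(\{d_{q^{k}}^{k}\}\) is bounded, say \(\lVert d_{q^{k}}^{k}\rVert \le \bar{M}\) for all k. Together with (6), which gives \(((g_{q^{k}}^{k})^{T}d_{q^{k}}^{k})^{2}\ge c^{2}\lVert g_{q^{k}}^{k}\rVert ^{4}\), and (26), this yields \(\sum_{k=0}^{\infty }\lVert g_{q^{k}}^{k}\rVert ^{4}\le \frac{\bar{M}^{2}}{c^{2}}\sum_{k=0}^{\infty }\frac{((g_{q^{k}}^{k})^{T}d_{q^{k}}^{k})^{2}}{\lVert d_{q^{k}}^{k}\rVert ^{2}}<\infty \), which leads to (25). □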

The following theorem, a modification of a result in [57], uses the q-gradient for the q-PRP method with the strong Wolfe conditions.

Theorem 4.5

Suppose that \(x^{0}\) is a starting point and Assumptions 4.1 and 4.2 hold. Let \(\{x^{k}\}\) be the sequence generated by Algorithm 1, with \(\beta _{k}^{q-\mathrm{PRP}}\) given by (5), and let the step-length \(\alpha _{k}\) satisfy the strong Wolfe conditions (7) and (11). Then either

$$ \lim_{k\to \infty } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert =0 \quad \textit{or}\quad \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}}< \infty . $$
(27)

Proof

From (4), for all \(k\ge 1\), we have

$$ d_{q^{k}}^{k} + g_{q^{k}}^{k} = \beta _{k}^{q-\mathrm{PRP}} d_{q^{k-1}}^{k-1}. $$

Taking the squared norm of both sides of the above equation, we get

$$ \bigl\lVert d_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + 2 \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = \bigl( \beta _{k}^{ q - \mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. $$

Since \(d_{q^{k}}^{k}\) satisfies the descent condition \((g_{q^{k}}^{k})^{T} d_{q^{k}}^{k} < 0\),

$$\begin{aligned} \bigl\lVert d_{q^{k}}^{ k} \bigr\rVert ^{2} \ge - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl\lVert d_{q^{k-1}}^{k-1} \bigr\rVert ^{2}. \end{aligned}$$
(28)

Taking the inner product of (4) with \(g_{q^{k}}^{k}\) for \(k\ge 1\), we get

$$ \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} = - \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} + \beta _{k}^{ q - \mathrm{PRP}} \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k-1}}^{k-1}. $$
(29)

From (29) and the second strong Wolfe condition (11), one obtains

$$ \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert + \sigma \bigl\lvert \beta _{k}^{q-\mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert \ge \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2}. $$
(30)

From the Cauchy–Schwarz inequality

$$ ( a + \sigma b)^{2} \le \bigl( 1 + \sigma ^{2} \bigr) \bigl( a^{2}+b^{2} \bigr), $$

for all a, b, \(\sigma \ge 0\), with

$$ a = \bigl\lvert \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr\rvert $$

and

$$ b = \bigl\lvert \beta _{k}^{q- \mathrm{PRP}} \bigr\rvert \bigl\lvert \bigl( g_{q^{k-1}}^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr\rvert , $$

squaring (30) and applying this inequality, we obtain

$$\begin{aligned} \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q-\mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1} }^{ k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \ge c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4}, \end{aligned}$$
(31)

where \(c_{1} = \frac{1}{(1+\sigma ^{2})}\). Note that

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad = \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl(g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \frac{ \lVert d_{q^{k}}^{k} \rVert ^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl( \bigl(g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \biggr]. \end{aligned}$$

From (28) one gets

$$\begin{aligned} &\frac{ ( (g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ & \quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ \bigl( \bigl( g_{q^{k}}^{k} \bigr)^{T} d_{q^{k}}^{k} \bigr)^{2} + \bigl( \beta _{k}^{q - \mathrm{PRP}} \bigr)^{2} \bigl( \bigl( g_{q^{k-1}}^{k-1} \bigr)^{T} d_{q^{k-1}}^{k-1} \bigr)^{2} \\ &\quad \quad{} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2} } \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned}$$

Using (31), we obtain

$$ \begin{aligned} &\frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \\ &\quad \ge \frac{1}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \biggl[ c_{1} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{4} - \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2} }{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \bigl\lVert g_{q^{k}}^{k} \bigr\rVert ^{2} \biggr]. \end{aligned} $$
(32)

If (27) is not true, then from the Zoutendijk condition (24) with (32) we obtain the following inequality:

$$\begin{aligned} \frac{ ( ( g_{q^{k}}^{k} )^{T} d_{q^{k}}^{k} )^{2}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} + \frac{ ( ( g_{q^{k-1}}^{k-1} )^{T} d_{q^{k-1}}^{k-1} )^{2}}{ \lVert d_{q^{k-1}}^{k-1} \rVert ^{2}} \ge \frac{c}{2} \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4} }{ \lVert d_{q^{k}}^{k} \rVert ^{2}} \end{aligned}$$
(33)

which holds for k sufficiently large with \(q^{k}\) approaching \((1,\ldots ,1)^{T}\). From (32) and (33), one immediately recovers (30). □

The following lemma immediately follows from the above convergence theorem.

Lemma 4.6

Suppose that Assumptions 4.1 and 4.2 hold and that the step-length in Algorithm 1 is determined by the strong Wolfe conditions. If

$$ \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{r}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} = + \infty , $$
(34)

for some \(r\in [0, 4]\), then the method converges in the sense that (27) holds.

Proof

If (27) is not true, then from Theorem 4.5, it follows that

$$\begin{aligned} \sum_{k=1}^{\infty } \frac{ \lVert g_{q^{k}}^{k} \rVert ^{4}}{ \lVert d_{q^{k}}^{k} \rVert ^{2}} < + \infty . \end{aligned}$$
(35)

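Because \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero, there is a constant \(\gamma > 0\) with \(\lVert g_{q^{k}}^{k} \rVert \ge \gamma \) for all k. Since \(r \le 4\), this gives \(\lVert g_{q^{k}}^{k} \rVert ^{4} \ge \gamma ^{4-r} \lVert g_{q^{k}}^{k} \rVert ^{r}\), so (35) implies \(\sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{r} / \lVert d_{q^{k}}^{k} \rVert ^{2} \le \gamma ^{r-4} \sum_{k=1}^{\infty } \lVert g_{q^{k}}^{k} \rVert ^{4} / \lVert d_{q^{k}}^{k} \rVert ^{2} < +\infty \). This contradicts (34). Therefore, the lemma is true. □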

Inequality (35) shows that if a conjugate gradient method fails to converge, then the norms of its search directions must diverge to infinity. When \(\lVert g_{q^{k}}^{k} \rVert \) is bounded away from zero, (35) forces \(\sum_{k} 1/\lVert d_{q^{k}}^{k} \rVert ^{2} < +\infty \). Observe that in the above developments, the sufficient descent condition is assumed. This lemma is very useful for proving the global convergence of some conjugate gradient methods without assuming the sufficient descent condition.

Numerical illustration

In this section, we investigate the computational efficiency of Algorithm 1 under the standard Wolfe conditions (7) and (8) and under the strong Wolfe conditions (7) and (11). We compare it with the classical PRP method under the same two sets of conditions.

All code for Algorithm 1 and the classical PRP method was written in R version 3.6.1. It was run on a laptop with an Intel(R) Core(TM) i3-4005U 1.70 GHz processor and 4 GB RAM. The iteration was terminated when the number of iterations exceeded 1000 or when the gradient norm fell to \(10^{-6}\) or below.
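A minimal Python/NumPy sketch of the classical PRP baseline follows, for readers who want to experiment with the stopping rule described above (at most 1000 iterations, gradient norm at most \(10^{-6}\)). It is not the authors' R code. The strong Wolfe line search uses bracketing with plain bisection, a simplified form of the zoom procedure in Nocedal and Wright [46]. The constants c1 = 1e-4 and c2 = 0.1 are our assumptions. The restart to steepest descent whenever d is not a descent direction is a safeguard we added. The function names are hypothetical.

```python
import numpy as np


def strong_wolfe_step(f, grad, x, d, c1=1e-4, c2=0.1, alpha_max=10.0, max_iter=50):
    """Step length satisfying the strong Wolfe conditions, found by bracketing
    and bisection (a simplified variant of Nocedal & Wright, Alg. 3.5/3.6)."""
    phi0, dphi0 = f(x), grad(x) @ d           # dphi0 < 0 for a descent direction

    def phi(a):
        return f(x + a * d)

    def dphi(a):
        return grad(x + a * d) @ d

    def zoom(lo, hi, phi_lo):
        a = 0.5 * (lo + hi)
        for _ in range(max_iter):
            a = 0.5 * (lo + hi)               # bisection in place of interpolation
            phi_a = phi(a)
            if phi_a > phi0 + c1 * a * dphi0 or phi_a >= phi_lo:
                hi = a                        # sufficient decrease fails: shrink from above
            else:
                dphi_a = dphi(a)
                if abs(dphi_a) <= -c2 * dphi0:
                    return a                  # both strong Wolfe conditions hold
                if dphi_a * (hi - lo) >= 0:
                    hi = lo
                lo, phi_lo = a, phi_a
        return a                              # best effort after max_iter bisections

    a_prev, phi_prev, a = 0.0, phi0, 1.0
    for i in range(max_iter):
        phi_a = phi(a)
        if phi_a > phi0 + c1 * a * dphi0 or (i > 0 and phi_a >= phi_prev):
            return zoom(a_prev, a, phi_prev)
        dphi_a = dphi(a)
        if abs(dphi_a) <= -c2 * dphi0:
            return a
        if dphi_a >= 0:
            return zoom(a, a_prev, phi_a)
        a_prev, phi_prev, a = a, phi_a, min(2.0 * a, alpha_max)
    return a


def prp_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Classical PRP conjugate gradient; stops when ||g|| <= tol or after max_iter steps."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k                       # converged after k iterations
        if g @ d >= 0:
            d = -g                            # safeguard: restart if d is not a descent direction
        alpha = strong_wolfe_step(f, grad, x, d)
        x = x + alpha * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)  # classical PRP parameter beta_k
        d = -g_new + beta * d
        g = g_new
    return x, max_iter
```

Take the convex quadratic \(f(x) = \tfrac{1}{2}x^{T}Ax - b^{T}x\) with A = [[3, 1], [1, 2]] and b = (1, 1)ᵀ. Calling `prp_cg(f, grad, [2.0, -1.0])` should return a point close to (0.2, 0.4)ᵀ, the solution of Ax = b. A q-PRP variant would replace `grad` by a q-gradient and update q between iterations, following Algorithm 1.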

Example 5.1

Consider the Mishra 6 function [58], \(f : \mathbb{R}^{2}\to \mathbb{R}\), given by

$$\begin{aligned} f(x) &= -\log \bigl( \sin ^{2} \bigl( ( \cos x_{1} + \cos x_{2} )^{2} \bigr)- \cos ^{2} \bigl( ( \sin x_{1} + \sin x_{2} )^{2} \bigr) \bigr) \\ & \quad{} + 0.1 \bigl[ (x_{1}-1)^{2} + (x_{2}-1)^{2} \bigr]. \end{aligned}$$

We find the q-gradient of the above function at the point

$$ x=(2.88 , 1.82)^{T}, $$

with the starting parameter value

$$ q^{1}=(0.32 , 0.32)^{T}. $$

We run the q-gradient algorithm [39] for \(k=1,\ldots ,50\) iterations so that \(q^{50}\) approaches

$$ (0.999607921, 0.999607921)^{T}, $$

and in the 50th iteration we get the q-gradient

$$ g_{q^{50}}^{50}=(-0.41348771, -0.63704079)^{T}. $$

The complete computational details are given in Table 1 and shown graphically in Fig. 1. Figure 2 gives a three-dimensional view of the Mishra 6 test function.
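The sketch below shows how a q-gradient like the one in this example can be evaluated. It assumes the Jackson-type definition used in the q-gradient literature [34, 41]: component i is the difference quotient obtained by dilating only \(x_{i}\) by \(q_{i}\). Where \(x_{i} = 0\) or \(q_{i} = 1\) the quotient is undefined, and this sketch falls back to a central finite difference. That fallback and the function name `q_gradient` are our own choices. The sketch does not reproduce the q-update rule of the algorithm in [39].

```python
import numpy as np


def q_gradient(f, x, q, h=1e-6):
    """Jackson-type q-gradient of f at x, with one q parameter per coordinate.

    Component i is [f(x_1, ..., q_i x_i, ..., x_n) - f(x)] / ((q_i - 1) x_i).
    When x_i == 0 or q_i == 1 the quotient is undefined, so this sketch falls
    back to a central finite difference for the classical partial derivative.
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    fx = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        if x[i] != 0.0 and q[i] != 1.0:
            xq = x.copy()
            xq[i] = q[i] * x[i]               # dilate only the i-th coordinate
            g[i] = (f(xq) - fx) / ((q[i] - 1.0) * x[i])
        else:
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g


# For f(x) = x1^2 + x2^2 the q-gradient is (q_i + 1) x_i componentwise.
sq = lambda x: float(x @ x)
print(q_gradient(sq, [1.0, 2.0], [0.5, 0.5]))        # (q + 1) x = [1.5, 3.0]
print(q_gradient(sq, [1.0, 2.0], [0.9996, 0.9996]))  # close to the gradient [2, 4]
```

As each \(q_{i}\) approaches 1, the quotient tends to the classical partial derivative. This mirrors the behavior of \(q^{50}\) in Example 5.1.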

Figure 1

Graphical representation of the q-gradient of Mishra 6 function based on Table 1

Figure 2

Three-dimensional view of Mishra 6 function

Table 1 q-Gradient of Example 5.1

Example 5.2

Consider a function \(f : \mathbb{R}^{2} \to \mathbb{R}\) given by

$$ f(x_{1}, x_{2}) =(1-x_{1})^{2}+100 \bigl(x_{2}-x_{1}^{2} \bigr)^{2}. $$

The Rosenbrock function, also called Rosenbrock's valley or the banana function, is nonconvex, unimodal, and nonseparable. Locating its minimum numerically is difficult because the minimizer lies at the bottom of a long, narrow, curved valley. It has a single global minimizer, located at the point

$$ x^{*}=(1 , 1)^{T}, $$

with the search range \([-100, 100]\) for \(x_{1}\) and \(x_{2}\). For the experiment, we first generated 37 different starting points from the interval \([-5, 5]\) for the Rosenbrock function. The numerical results are shown in Table 2 for Algorithm 1 and in Table 3 for the classical PRP algorithm. These tables show that Algorithm 1 requires fewer iterations \((NI)\) than the classical PRP method. The columns of both tables are self-explanatory. Figure 3 compares the numbers of iterations graphically.
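The following sketch defines the Rosenbrock function and its analytic gradient, for anyone reproducing this experiment in Python, and draws 37 starting points from \([-5, 5]^{2}\). The paper does not state how its starting points were sampled. The uniform distribution and the seed below are assumptions, so these points will not match those behind Tables 2 and 3.

```python
import numpy as np


def rosenbrock(x):
    """f(x1, x2) = (1 - x1)^2 + 100 (x2 - x1^2)^2."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    """Analytic gradient of the two-dimensional Rosenbrock function."""
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])


# Hypothetical sampling of 37 starting points, uniform on [-5, 5]^2 with a fixed
# seed; the sampling scheme actually used in the paper is not specified.
rng = np.random.default_rng(seed=0)
starts = rng.uniform(-5.0, 5.0, size=(37, 2))
```

At the minimizer (1, 1)ᵀ both the function value and the gradient vanish. At the classical starting point (-1.2, 1)ᵀ the function value is 24.2.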

Figure 3

Graphical representation of q-PRP and PRP algorithms based on Tables 2 and 3

Table 2 Numerical results of Example 5.2 using Algorithm 1
Table 3 Numerical results of Example 5.2 using classical PRP Algorithm

Example 5.3

Consider the Rastrigin function \(f : \mathbb{R}^{2} \to \mathbb{R}\) given by

$$ f(x_{1}, x_{2}) = 20 + x_{1}^{2}+x_{2}^{2} - 10(\cos 2\pi x_{1} + \cos 2\pi x_{2}). $$

The Rastrigin test function is nonconvex, multimodal, and separable. It has many local minimizers arranged in a regular lattice but only one global minimizer, located at the point

$$ x^{*}=(0, 0)^{T}. $$

The search range for the Rastrigin function is \([-5.12, 5.12]\) in both \(x_{1}\) and \(x_{2}\). The function is a fairly difficult test problem because of its large search space and its large number of local minima. Starting from \((0.2, 0.2)^{T}\), we minimize it with Algorithm 1 under the strong Wolfe conditions. The q-PRP method terminates after 5 iterations with

$$ g_{q^{5}}^{5}=(0.0001900418 , 0.0001900418)^{T}, $$

with step-length \(\alpha _{5}=0.252244535\). Thus we obtain an approximation of the global minimizer,

$$ x^{*} \approx x^{5}= (-2.05643\times 10^{-8} , -2.05643\times 10^{-8})^{T}, $$

with function value

$$ f\bigl(x^{5}\bigr) = 1.669775\times 10^{-13}. $$

In contrast, the classical PRP method, run under the strong Wolfe conditions from the same starting point, also terminates after 5 iterations, with

$$ g^{5}=(1.776357\times 10^{-10} , 2.66453\times 10^{-10})^{T} $$

and \(\alpha _{5}=0.002547382\). However, it fails to reach the global minimizer and stops instead at

$$ x^{5}= (-1.990911 , -1.990911)^{T}, $$

with

$$ f\bigl(x^{5}\bigr)=7.967698, $$

which is not the global minimum. This illustrates one advantage of the q-gradient in the proposed method over the classical method.
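The sketch below gives the Rastrigin function in the form written above, with both cosine terms inside the factor 10, together with its analytic gradient. With this grouping, f(0, 0) = 0, which agrees with the near-zero value that q-PRP reports at the origin. The gradient formula is our own derivation and is not taken from the paper.

```python
import numpy as np


def rastrigin(x):
    """Two-dimensional Rastrigin function: 20 + sum_i (x_i^2 - 10 cos(2 pi x_i))."""
    x = np.asarray(x, dtype=float)
    return 20.0 + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))


def rastrigin_grad(x):
    """Analytic gradient: df/dx_i = 2 x_i + 20 pi sin(2 pi x_i)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)
```

Each coordinate contributes \(x_{i}^{2} + 10(1 - \cos 2\pi x_{i}) \ge 0\), so \(f \ge 0\), with equality only at the origin. This is why \((0, 0)^{T}\) is the unique global minimizer. Near every integer lattice point the cosine terms nearly cancel the constant 20, which produces the trap of local minima; for example, f(-2, -2) = 8.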

We now run Algorithm 1 on a set of test functions taken from the CUTEr library [59], using 51 different starting points under the standard and strong Wolfe conditions. Note that the direction \(d_{q}^{k}\) generated by the proposed method is a q-descent direction because it involves the q-gradient. Tables 4 and 5 list the numerical results of Algorithm 1 and of the classical PRP method, respectively, for the 51 starting points. Figures 4 and 5 compare the two methods graphically under the standard and strong Wolfe conditions, respectively. For the selected test problems, our method requires fewer iterations than the classical method.
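To show how a single q-PRP direction is formed, here is a small sketch of one update \(d = -g_{q} + \beta d_{\mathrm{prev}}\). This section does not restate the definition of \(\beta _{k}^{q-\mathrm{PRP}}\). The sketch therefore assumes the PRP form with q-gradients in place of gradients, \(\beta = g_{q}^{T}(g_{q} - g_{q,\mathrm{prev}}) / \lVert g_{q,\mathrm{prev}} \rVert ^{2}\), and it reads "q-descent" as \(g_{q}^{T} d < 0\). Both readings are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def q_prp_direction(g_q, g_q_prev, d_prev):
    """One direction update d = -g_q + beta * d_prev.

    beta is ASSUMED to take the PRP form with q-gradients in place of gradients:
    beta = g_q^T (g_q - g_q_prev) / ||g_q_prev||^2.
    """
    g_q, g_q_prev, d_prev = (np.asarray(v, dtype=float) for v in (g_q, g_q_prev, d_prev))
    beta = g_q @ (g_q - g_q_prev) / (g_q_prev @ g_q_prev)
    return -g_q + beta * d_prev, beta


def is_q_descent(g_q, d):
    """Treat d as a q-descent direction when g_q^T d < 0 (assumed reading)."""
    return float(np.dot(g_q, d)) < 0.0


d, beta = q_prp_direction([0.5, 1.0], [1.0, 0.0], [-1.0, 0.0])
# beta = 0.75, d = [-1.25, -1.0], and g_q^T d = -1.625 < 0
```

In a full iteration, \(g_{q}\) would be evaluated at the new point with the updated parameter \(q^{k}\) before this update is applied.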

Figure 4

Graphical representation of q-PRP and PRP algorithms under standard Wolfe conditions based on Tables 4 and 5

Figure 5

Graphical representation of q-PRP and PRP algorithms under strong Wolfe conditions based on Tables 4 and 5

Table 4 Numerical results using Algorithm 1
Table 5 Numerical results using classical PRP

Conclusion and future work

This paper proposed the q-PRP conjugate gradient method, an improvement of the classical PRP conjugate gradient method based on the q-gradient. The global convergence of the proposed method is established under the standard and strong Wolfe line searches, and its effectiveness is shown by several numerical examples. For a set of test problems with different starting points, the proposed method converges faster than the classical PRP method, which we attribute to the q-gradient. Incorporating q-calculus into other conjugate gradient methods deserves further investigation.

Availability of data and materials

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

  1. Mishra, S.K., Ram, B.: Conjugate gradient methods. In: Introduction to Unconstrained Optimization with R, pp. 211–244. Springer, Singapore (2019)
  2. Andrei, N.: Nonlinear Conjugate Gradient Methods for Unconstrained Optimization. Springer, Berlin (2020)
  3. Li, Y., Chen, W., Zhou, H., Yang, L.: Conjugate gradient method with pseudospectral collocation scheme for optimal rocket landing guidance. Aerosp. Sci. Technol. 104, 105999 (2020)
  4. Liu, J., Du, S., Chen, Y.: A sufficient descent nonlinear conjugate gradient method for solving m-tensor equations. J. Comput. Appl. Math. 371, 112709 (2020)
  5. Yuan, G., Li, T., Hu, W.: A conjugate gradient algorithm for large-scale nonlinear equations and image restoration problems. Appl. Numer. Math. 147, 129–141 (2020)
  6. Hestenes, M.R., Stiefel, E., et al.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)
  7. Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. 7(2), 149–154 (1964)
  8. Mishra, S.K., Ram, B.: Steepest descent method. In: Introduction to Unconstrained Optimization with R, pp. 131–173. Springer, Singapore (2019)
  9. Polak, E., Ribiere, G.: Note sur la convergence de méthodes de directions conjuguées. ESAIM: Math. Model. Numer. Anal. 3(R1), 35–43 (1969)
  10. Jackson, F.H.: On q-functions and a certain difference operator. Earth Environ. Sci. Trans. R. Soc. Edinb. 46(2), 253–281 (1909)
  11. Jackson, D.O., Fukuda, T., Dunn, O., Majors, E.: On q-definite integrals. Q. J. Pure Appl. Math. 41, 193–203 (1910)
  12. Ernst, T.: A method for q-calculus. J. Nonlinear Math. Phys. 10(4), 487–525 (2003)
  13. Awan, M.U., Talib, S., Kashuri, A., Noor, M.A., Chu, Y.-M.: Estimates of quantum bounds pertaining to new q-integral identity with applications. Adv. Differ. Equ. 2020, 424 (2020)
  14. Samei, M.E.: Existence of solutions for a system of singular sum fractional q-differential equations via quantum calculus. Adv. Differ. Equ. 2020, 23 (2020). https://doi.org/10.1186/s13662-019-2480-y
  15. Liang, S., Samei, M.E.: New approach to solutions of a class of singular fractional q-differential problem via quantum calculus. Adv. Differ. Equ. 2020, 14 (2020). https://doi.org/10.1186/s13662-019-2489-2
  16. Ahmadian, A., Rezapour, S., Salahshour, S., Samei, M.E.: Solutions of sum-type singular fractional q-integro-differential equation with m-point boundary value problem using quantum calculus. Math. Methods Appl. Sci. 43(15), 8980–9004 (2020). https://doi.org/10.1002/mma.6591
  17. Samei, M.E., Hedayati, H., Rezapour, S.: Existence results for a fraction hybrid differential inclusion with Caputo–Hadamard type fractional derivative. Adv. Differ. Equ. 2019, 163 (2019). https://doi.org/10.1186/s13662-019-2090-8
  18. Samei, M.E., Ranjbar, G.K., Hedayati, V.: Existence of solutions for equations and inclusions of multiterm fractional q-integro-differential with nonseparated and initial boundary conditions. J. Inequal. Appl. 2019, 273 (2019). https://doi.org/10.1186/s13660-019-2224-2
  19. Mishra, S.K., Panda, G., Ansary, M.A.T., Ram, B.: On q-Newton's method for unconstrained multiobjective optimization problems. J. Appl. Math. Comput. 63, 391–410 (2020)
  20. Lai, K.K., Mishra, S.K., Ram, B.: On q-quasi-Newton's method for unconstrained multiobjective optimization problems. Mathematics 8(4), 616 (2020)
  21. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 9(4), 94–112 (1969)
  22. Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comput. Math. Math. Phys. 9(4), 94–112 (1969)
  23. Yuan, Y.: Numerical Methods for Nonlinear Programming. Shanghai Sci. Technol., Shanghai (1993)
  24. Powell, M.J.: Nonconvex minimization calculations and the conjugate gradient method. In: Numerical Analysis, pp. 122–141. Springer, Berlin (1984)
  25. Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2(1), 21–42 (1992)
  26. Powell, M.J.: Convergence properties of algorithms for nonlinear optimization. SIAM Rev. 28(4), 487–500 (1986)
  27. Yuan, G., Li, T., Hu, W.: A conjugate gradient algorithm and its application in large-scale optimization problems and image restoration. J. Inequal. Appl. 2019, 247 (2019). https://doi.org/10.1186/s13660-019-2192-6
  28. Yuan, G., Li, T., Hu, W.: A conjugate gradient algorithm for large-scale nonlinear equations and image restoration problems. Appl. Numer. Math. 147, 129–141 (2020). https://doi.org/10.1016/j.apnum.2019.08.022
  29. Hu, W., Wu, J., Yuan, G.: Some modified Hestenes–Stiefel conjugate gradient algorithms with application in image restoration. Appl. Numer. Math. 158, 360–376 (2020). https://doi.org/10.1016/j.apnum.2020.08.009
  30. Yuan, G., Lu, J., Wang, Z.: The PRP conjugate gradient algorithm with a modified WWP line search and its application in the image restoration problems. Appl. Numer. Math. 152, 1–11 (2020). https://doi.org/10.1016/j.apnum.2020.01.019
  31. Yuan, G., Wei, Z., Yang, Y.: The global convergence of the Polak–Ribière–Polyak conjugate gradient algorithm under inexact line search for nonconvex functions. J. Comput. Appl. Math. 362, 262–275 (2019)
  32. Yuan, G., Wang, X., Zhou, S.: The projection technique for two open problems of unconstrained optimization problems. J. Optim. Theory Appl. 186, 590–619 (2020)
  33. Zhang, M., Zhou, Y., Wang, S.: A modified nonlinear conjugate gradient method with the Armijo line search and its application. Math. Probl. Eng. 2020, 6210965 (2020)
  34. Soterroni, A.C., Galski, R.L., Ramos, F.M.: The q-gradient vector for unconstrained continuous optimization problems. In: Operations Research Proceedings 2010, pp. 365–370. Springer, Berlin (2011)
  35. Sadiq, A., Usman, M., Khan, S., Naseem, I., Moinuddin, M., Al-Saggaf, U.M.: q-LMF: quantum calculus-based least mean fourth algorithm. In: Fourth International Congress on Information and Communication Technology, pp. 303–311. Springer, Berlin (2020)
  36. Sadiq, A., Khan, S., Naseem, I., Togneri, R., Bennamoun, M.: Enhanced q-least mean square. Circuits Syst. Signal Process. 38(10), 4817–4839 (2019)
  37. Chakraborty, S.K., Panda, G.: q-line search scheme for optimization problem. arXiv preprint, arXiv:1702.01518 (2017)
  38. Chakraborty, S.K., Panda, G.: Newton like line search method using q-calculus. In: International Conference on Mathematics and Computing, pp. 196–208. Springer, Berlin (2017)
  39. Lai, K.K., Mishra, S.K., Panda, G., Chakraborty, S.K., Samei, M.E., Ram, B.: A limited memory q-BFGS algorithm for unconstrained optimization problems. J. Appl. Math. Comput. (2020). https://doi.org/10.1007/s12190-020-01432-6
  40. Lai, K.K., Mishra, S.K., Panda, G., Ansary, M.A.T., Ram, B.: On q-steepest descent method for unconstrained multiobjective optimization problems. AIMS Math. 5(6), 5521–5540 (2020)
  41. Gouvêa, É.J., Regis, R.G., Soterroni, A.C., Scarabello, M.C., Ramos, F.M.: Global optimization using q-gradients. Eur. J. Oper. Res. 251(3), 727–738 (2016)
  42. Aral, A., Gupta, V., Agarwal, R.P., et al.: Applications of q-Calculus in Operator Theory. Springer, New York (2013)
  43. Rajković, P., Stanković, M., Marinković, S.D.: Mean value theorems in q-calculus. Mat. Vesn. 54(3–4), 171–178 (2002)
  44. Dai, Y.-H., Yuan, Y.: A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 10(1), 177–182 (1999)
  45. Fletcher, R.: Practical Methods of Optimization, vol. 80, 4. Wiley, New York (1987)
  46. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006)
  47. Mishra, S.K., Ram, B.: Introduction to Unconstrained Optimization with R. Springer, Singapore (2019)
  48. Wolfe, P.: Convergence conditions for ascent methods, II: some corrections. SIAM Rev. 13(2), 185–188 (1971)
  49. Goldstein, A.A.: On steepest descent. J. Soc. Ind. Appl. Math., A, on Control 3(1), 147–151 (1965)
  50. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16(1), 1–3 (1966)
  51. Grippo, L., Lucidi, S.: A globally convergent version of the Polak–Ribiere conjugate gradient method. Math. Program. 78(3), 375–391 (1997)
  52. Zhang, L., Zhou, W., Li, D.-H.: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26(4), 629–640 (2006)
  53. Mishra, S.K., Ram, B.: One-dimensional optimization methods. In: Introduction to Unconstrained Optimization with R, pp. 85–130. Springer, Boston (2019)
  54. Zoutendijk, G.: Nonlinear programming, computational methods. In: Integer and Nonlinear Programming, pp. 37–86 (1970)
  55. Yuan, G.: Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems. Optim. Lett. 3(1), 11–21 (2009)
  56. Aminifard, Z., Babaie-Kafaki, S.: A modified descent Polak–Ribiére–Polyak conjugate gradient method with global convergence property for nonconvex functions. Calcolo 56(2), 16 (2019)
  57. Dai, Y., Han, J., Liu, G., Sun, D., Yin, H., Yuan, Y.-X.: Convergence properties of nonlinear conjugate gradient methods. SIAM J. Optim. 10(2), 345–358 (2000)
  58. Mishra, S.K.: Global optimization by differential evolution and particle swarm methods: evaluation on some benchmark functions. Available at SSRN 933827 (2006)
  59. Gould, N.I.M., Orban, D., Toint, P.L.: CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 29, 373–394 (2003)


Acknowledgements

The first author was supported by the Science and Engineering Research Board (Grant No. DST-SERB-MTR-2018/000121). The third author was supported by Bu-Ali Sina University. The fourth author was supported by the University Grants Commission (IN) (Grant No. UGC-2015-UTT-59235). Constructive comments by the referees are gratefully acknowledged.

Funding

Not applicable.

Author information


Contributions

The authors declare that the study was realized in collaboration with equal responsibility. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammad Esmael Samei.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Mishra, S.K., Chakraborty, S.K., Samei, M.E. et al. A q-Polak–Ribière–Polyak conjugate gradient algorithm for unconstrained optimization problems. J Inequal Appl 2021, 25 (2021). https://doi.org/10.1186/s13660-021-02554-6


MSC

  • 34A08
  • 34B16
  • 90C26
  • 39A13

Keywords

  • Unconstrained optimization
  • Conjugate gradient method
  • Global convergence
  • q-calculus