Skip to main content


A conjugate gradient algorithm and its application in large-scale optimization problems and image restoration

  • 551 Accesses


To solve large-scale unconstrained optimization problems, a modified PRP conjugate gradient algorithm is proposed and is found to be interesting because it combines the steepest descent algorithm with the conjugate gradient method and successfully fully utilizes their excellent properties. For smooth functions, the objective algorithm sufficiently utilizes information about the gradient function and the previous direction to determine the next search direction. For nonsmooth functions, a Moreau–Yosida regularization is introduced into the proposed algorithm, which simplifies the process in addressing complex problems. The proposed algorithm has the following characteristics: (i) a sufficient descent feature as well as a trust region trait; (ii) the ability to achieve global convergence; (iii) numerical results for large-scale smooth/nonsmooth functions prove that the proposed algorithm is outstanding compared to other similar optimization methods; (iv) image restoration problems are done to turn out that the given algorithm is successful.


The concerned problem is given by

$$ \min \bigl\{ f(x) \mid x \in \Re ^{n} \bigr\} , $$

where the function \(f: \Re ^{n}\rightarrow \Re \) and \(f\in C^{2}\). The above model is quite typical but a difficult mathematic model and is seen throughout daily life, work, and scientific research, thus being the focus of a great variety of careers. Experts and scholars have conducted numerous in-depth studies and achieved a series of fruitful results (see, e.g., [2, 5, 12, 22, 29,30,31, 40, 50, 52, 53]). It is quite noticeable that the steepest descent method is simple, and the computational and memory requirements are low. In the negative gradient direction, the function’s value decreases rapidly, which makes it easy to think that this is a suitable search direction, although the convergence rate of the gradient method is not always fast. Later, experts and scholars modified this method and presented an efficient conjugate gradient method, which provides a simple form but high performance. There are two aspects of optimization problems: the step length and the search direction. In general, the mathematical formula for (1.1) is

$$ x_{k+1}=x_{k}+\alpha _{k}d_{k}, \quad k \in \{0, 1, 2,\ldots \}, $$

where \(x_{k}\) is the current iteration point, \(\alpha _{k}\) is called the step length, and \(d_{k}\) is the kth search direction. The formula for \(d_{k}\) is often defined by

$$ d_{k+1}=\textstyle\begin{cases} -g_{k+1}+\beta _{k}d_{k}, & \mbox{if } k\geq 1, \\ -g_{k+1},& \mbox{if } k=0, \end{cases} $$

where \(\beta _{k} \in \Re \). In addition, increasingly more efficient and successful conjugate gradient algorithms have been proposed using a variety of expression for \(\beta _{k}\) as well as \(d_{k}\) (see, e.g., [9, 11, 27, 38, 42,43,44, 47, 49]). The well-known PRP algorithm [26, 27] is of the following form:

$$ \beta _{k}^{\mathrm{PRP}}=\frac{g_{k+1}^{T}(g_{k+1}-g_{k})}{ \Vert g_{(x _{k})} \Vert ^{2}}, $$

where \(g_{k}\), \(g_{k+1}\) and \(f_{k}\) denote \(g(x_{k})\), \(g(x_{k+1})\) and \(f(x_{k})\), respectively. \(g(x_{k+1})=g_{k+1}=\nabla f(x_{k+1})\) is the gradient function of the objective function f at \(x_{k+1}\). It is remarkable that the PRP conjugate algorithm is extremely effective for large-scale optimization problems. It is regrettable that it fails to achieve global convergence when addressing nonconvex function problems under the so-called weak Wolfe–Powell (WWP) line search technique. Its formula is as follows:

$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \ge \rho g_{k}^{T}d_{k} $$


$$ f(x_{k}+\alpha _{k}d_{k}) \le f_{k}+\varphi \alpha _{k}g_{k}^{T}d_{k}, $$

where \(\varphi \in (0, 1/2)\), \(\alpha _{k} > 0\) and \(\rho \in (\varphi , 1)\). To address the above exchanging problem, Yuan, Wei, and Lu [48] developed the following innovative formula for the normal WWP line search technique (called Yuan, Wei, and Lu line search (YWL)) and obtained numerous rich theoretical results:

$$ f(x_{k}+\alpha _{k}d_{k}) \le f_{k}+\varphi \alpha _{k}g_{k}^{T}d_{k}+ \alpha _{k}\min \biggl[-\varphi _{1}g_{k}^{T}d_{k}, \delta \frac{\alpha _{k}}{2} \Vert d_{k} \Vert ^{2} \biggr] $$


$$ g(x_{k}+\alpha _{k}d_{k})^{T}d_{k} \ge \rho g_{k}^{T}d_{k}+\min \bigl[- \varphi _{1}g_{k}^{T}d_{k},\delta \alpha _{k} \Vert d_{k} \Vert \Vert d_{k} \Vert \bigr], $$

where \(\varphi \in (0,1/2)\), \(\rho \in (\varphi ,1)\) and \(\varphi _{1} \in (0,\varphi )\). Further study can be found in [45]. Based on the innovation of the YWL line search technique, Yuan et al. [41] focused on the usual Armijo line search technique and proposed a modified Armijo line search technique as follows:

$$ f(x_{k}+\alpha _{k}d_{k}) \le f(x_{k})+\lambda \alpha _{k}g_{k}^{T}d _{k}+\alpha _{k}\min \biggl[-\lambda _{1}g_{k}^{T}d_{k}, \lambda \frac{\alpha _{k}}{2} \Vert d_{k} \Vert ^{2} \biggr], $$

where \(\lambda , \gamma \in (0,1)\), \(\lambda _{1} \in (0,\lambda )\), and \(\alpha _{k}\) is the largest number of \(\{\gamma ^{k}\mid k=0,1,2,\ldots \}\). It is interesting that some scholars not only focus on the expression of the coefficient \(\beta _{k}\) but also attempt to modify the formula of the search direction \(d_{k+1}\). Nonlinear conjugate gradient methods are increasingly more interesting to scholars because of their simplicity and lower equipment requirements for the calculation environment. Thus, HS (see [13, 16, 33]) and PRP algorithms (see [35, 51]) are widely used to solve complex problems in various fields. Currently, some experts focus on the three-term conjugate gradient because its search direction sometimes has the descent and automatic trust region properties. Motivated by the above discussion, a new modified three-term conjugate gradient algorithm based on the modified Armijo line search technique is proposed. The algorithm has the following properties:

  • The search direction has a sufficient decrease and a trust region property.

  • For general functions, the proposed algorithm under mild assumptions possesses global convergence.

  • The new algorithm combines the deepest descent method with the conjugate gradient algorithm through the size of the coefficients, and the numerical results demonstrate the method’s good performance compared with established algorithms.

  • The corresponding numerical results prove that the discussed method is efficient as well as successful at solving general problems.

  • The paper successfully combines the mathematic theory with real-world application. On the one hand, the proposed algorithm has a good performance in solving the large-scale optimization problems, on the other hand, it is introduced in the image restoration, which has wild application in biological engineering, medical sciences and other areas of science and engineering.

The remainder of this paper is organized as follows: The next section presents the motivation and the content of the algorithm to solve large-scale smooth problems includes the important mathematical characters; the similar optimization algorithm was presented to solve large-scale non-smooth optimization problems; the Sect. 4 presents the application of the Sect. 3 in the problem of the image restoration; the paper’s conclusion and algorithm’s characters was listed in Sect. 5. Without loss of generality, \(f(x_{k})\) and \(f(x_{k+1})\) are replaced by \(f_{k}\) and \(f_{k+1}\), and \(\|\cdot \|\) is the Euclidean norm.

New three-term conjugate gradient algorithm for smooth problems

The three-term conjugate gradient algorithm has seen extensive study and obtained extremely good theoretical results. In the light of the work by Toouati-Ahmed, Storey [34], Al-Baali [1], Gilbert, and Nocedal [17] on conjugate gradient methods, the sufficient descent condition is crucial for the global convergence. From this, a famous formula for the search direction \(d_{k+1}\) emerges. Zhang [51] proposed the following formula:

$$ d_{k+1}= \textstyle\begin{cases} -g_{k+1} + \frac{g_{k+1}^{T}y_{k}d_{k}-d_{k}^{T}g_{k+1}y_{k}}{g_{k} ^{T}g_{k}} & \mbox{if } k\geq 1, \\ -g_{k+1},& \mbox{if } k=0, \end{cases} $$

where \(y_{k}=g_{k+1}-g_{k}\). It is notable that the three-term conjugate gradient algorithm was firstly introduced in solving optimization problems and the numeral results proves it is competitive than similar methods, thus this paper choose it as the compared algorithm in Sects. 2.3 and 3.2. In [23], Nazareth proposed another variety of formula,

$$ d_{k+1}=-y_{k}+\frac{y_{k}^{T}y_{k}}{y_{k}^{T}d_{k}}d_{k}+ \frac{y_{k-1} ^{T}y_{k}}{y_{k-1}^{T}d_{k-1}}d_{k-1}, $$

where \(y_{k}=g_{k+1}-g_{k}\), \(g_{k}\) is the gradient function value at the point \(x_{k}\), and \(d_{0}=d_{-1}=0\). In [14], Deng and Zhong expressed a new three-term conjugate gradient formula as follows:

$$ d_{k+1}=-g_{k+1}-\biggl(\biggl(1- \frac{y_{k}^{T}y_{k}}{y_{k}^{T}s_{k}}\biggr)\frac{s _{k}^{T}g_{k+1}}{y_{k}^{T}s_{k}}-\frac{y_{k}^{T}g_{k+1}}{y_{k}^{T}s _{k}} \biggr)s_{k}-\frac{s_{k}^{T}g_{k+1}}{y_{k}^{T}s_{k}}y_{k}, $$

where \(s_{k}=x_{k+1}-x_{k}\). Based on the above discussion, we express the new three-term algorithm under the modified Armijo line search technique (1.9) as follows:

$$ d_{k+1}= \textstyle\begin{cases} -g_{k+1}+\frac{g_{k+1}^{T}y_{k}^{*}d_{k}-d_{k}^{T}g_{ k+1}y_{k}^{*}}{ \max \{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert ,\min (\xi _{3} \Vert g_{k} \Vert ^{2}, \xi _{4}d_{k}^{T}d_{k})\}} & \mbox{if } k \ge 1, \\ -g_{k+1} & \mbox{if } k = 0, \end{cases} $$

where \(\xi _{2}, \xi _{3}, \xi _{4} > 0\). To gather more information about the objective function, we address the corresponding gradient function as well as the initial point, let \(y_{k}^{*}=y_{k}+\varphi _{k}(x_{k+1}-x_{k})\), where \(y_{k}=g_{k+1}-g_{k}\), \(\varphi _{k}= \max \{0, B_{k}\}\), and \(B_{k}=\frac{(g_{k+1}+g_{k})^{T}s_{k}+2(f_{k}-f _{k+1})}{\|x_{k+1}-x_{k}\|^{2}}\). This plays an important role in theory and numerical performance [46]. From the above discussion, we introduce a new PRP algorithm (Algorithm 2.1).

Algorithm steps

Algorithm 2.1

Step 1::

(Initiation) Choose an initial point \(x_{0}\), \(\gamma \in (0,1)\), \(\xi _{2}\), \(\xi _{3}\), \(\xi _{4}> 0\), and positive constants \(\varepsilon \in (0,1)\). Let \(k=0\), \(d_{0}=-g_{0}\).

Step 2::

If \(\|g_{k}\| \leq \varepsilon \), then stop.

Step 3::

Find the step length, where the calculation \(\alpha _{k} = \max \{\gamma ^{k}\mid k=0, 1, 2, \ldots \}\) stems from (1.9).

Step 4::

Set the new iteration point of \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).

Step 5::

Update the search direction by (2.4).

Step 6::

If \(\|g_{k+1}\|\leq \varepsilon \) holds, the algorithm stops. Otherwise, go to next step.

Step 7::

Let \(k:=k+1\) and go to Step 3.

Algorithm characteristics

This section states the properties of the sufficient descent, trust region as well as global convergence of Algorithm 2.1.

Lemma 2.1

If the search direction \(d_{k}\) is generated by (2.4), then

$$ g_{k}^{T}d_{k}=- \Vert g_{k} \Vert \Vert g_{k} \Vert $$


$$ \Vert d_{k} \Vert \leq (1+2/\xi _{2}) \Vert g_{k} \Vert , $$

where σ are positive constants.


On the one hand, it is true that (2.5) and (2.6) are correct if \(k=0\).

On the other hand, from (2.4),

$$\begin{aligned} g_{k+1}^{T}d_{k+1} =&g_{k+1}^{T}\biggl[-g_{k+1}+ \frac{g_{k+1}^{T}y_{k}^{*}d _{k}-d_{k}^{T}g_{k+1}y_{k}^{*}}{\max \{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k} ^{*} \Vert ,\min (\xi _{3} \Vert g_{k} \Vert ^{2}, \xi _{4}d_{k}^{T}d_{k})\}}\biggr] \\ =& - \Vert g_{k+1} \Vert ^{2}+\frac{g_{k+1}^{T}g_{k+1}^{T}y_{k}^{*}d_{k}-g_{k+1} ^{T}d_{k}^{T}g_{k+1}y_{k}^{*}}{\max \{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert ,\min (\xi _{3} \Vert g_{k} \Vert ^{2}, \xi _{4}d_{k}^{T}d_{k})\}} \\ =&- \Vert g_{k+1} \Vert ^{2} \end{aligned}$$


$$\begin{aligned} \Vert d_{k+1} \Vert =& \biggl\Vert g_{k+1}+ \frac{g_{k+1}^{T}y_{k}^{*}d_{k}-d_{k}^{T}g _{k+1}y_{k}^{*}}{\max \{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert ,\min (\xi _{3} \Vert g_{k} \Vert ^{2}, \xi _{4}d_{k}^{T}d_{k})\}} \biggr\Vert \\ \leq & \Vert g_{k+1} \Vert + \biggl\Vert \frac{g_{k+1}^{T}y_{k}^{*}d_{k}-d_{k}^{T}g_{k+1}y _{k}^{*}}{\max \{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert ,\min (\xi _{3} \Vert g _{k} \Vert ^{2}, \xi _{4}d_{k}^{T}d_{k})\}} \biggr\Vert \\ \leq & \Vert g_{k+1} \Vert +\frac{ \Vert g_{k+1} \Vert \Vert y_{k}^{*} \Vert \Vert d_{k} \Vert + \Vert d_{k} \Vert \Vert g_{k+1} \Vert \Vert y_{k}^{*} \Vert }{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert } \\ \leq & \Vert g_{k+1} \Vert +2\frac{ \Vert g_{k+1} \Vert \Vert y_{k}^{*} \Vert \Vert d_{k} \Vert }{\xi _{2} \Vert d_{k}^{T} \Vert \Vert y_{k}^{*} \Vert } \\ =&(1+2/\xi _{2}) \Vert g_{k+1} \Vert . \end{aligned}$$

It is true that (2.5) and (2.6) demonstrate that the search direction has a sufficient descent trait and a trust region property, respectively. □

Aiming at achieving global convergence, we propose the following mild assumptions.

Assumption (i)

The level set of \(\varOmega =\{x\mid f(x) \leq f(x _{0})\}\) is bounded.

Assumption (ii)

The objective function \(f(x) \in C^{2}\) is bounded from below, and its gradient function \(g(x)\) is Lipschitz continuous, i.e., there exists a positive constant τ such that

$$ \bigl\Vert g(x)-g(y) \bigr\Vert \leq \tau \Vert x-y \Vert ,\quad x, y \in R^{n}. $$

Based on the above discussion and established conclusion concerning the modified Armijo line search of being reasonable and necessary (see [48]), the global convergence algorithm is established as follows.

Theorem 2.1

If assumptions (i)(ii) are true and the corresponding sequences of \(\{x_{k}\}\), \(\{d_{k}\}\), \(\{g_{k}\}\), \(\{\alpha _{k}\}\) are generated by Algorithm 2.1, then we arrive at the conclusion that

$$ \lim_{k \rightarrow \infty } \Vert g_{k} \Vert =0. $$


Suppose that the conclusion of the above theorem is incorrect, i.e., there exist a positive constant \(\sigma _{3}\) and index number \(k^{\prime } \) such that

$$ \Vert g_{k} \Vert \geq \sigma _{3}, \quad \forall k \geq k^{\prime }. $$

Based on (1.9) and (2.5),

$$\begin{aligned} f(x_{k}+\alpha _{k}d_{k}) \le& f(x_{k})+\lambda \alpha _{k}g_{k}^{T}d _{k}+\alpha _{k}\min \biggl[-\lambda _{1}g_{k}^{T}d_{k}, \lambda \frac{\alpha _{k}}{2} \Vert d_{k} \Vert ^{2} \biggr] \\ \leq& f(x_{k})+\alpha _{k}(\lambda -\lambda _{1})g _{k}^{T}d_{k}. \end{aligned}$$

Then with the above formulas with \(k=0\) from ∞ and combining with Assumption (ii), we obtain

$$ \sum_{k=0}^{\infty }\alpha _{k}( \lambda -\lambda _{1})g_{k}^{T}d_{k} \leq f_{0}-f_{\infty }< \infty . $$

Based on the convergence theorem of sequences,

$$ \lim_{k \rightarrow \infty } (\lambda -\lambda _{1})\alpha _{k}g_{k} ^{T}d_{k} = 0. $$

Then we have

$$ \lim_{k \rightarrow \infty } \alpha _{k}g_{k}^{T}d_{k} = 0, $$

from the formula of (2.5), then

$$ \lim_{k \rightarrow \infty } \alpha _{k} \Vert g_{k} \Vert ^{2} = 0. $$

This means that \(\{\alpha _{k}\} \rightarrow 0\), \(k \rightarrow \infty \) or \(\{\|g_{k}\|\} \rightarrow 0\), \(k \rightarrow \infty \). We then state two cases:

(i) If \(\{\alpha _{k}\} \rightarrow 0\), \(k \rightarrow \infty \), consider the line search method, for every suitable parameter \(\alpha _{k}\),

$$ f(x_{k}+\alpha _{k}/\gamma d_{k}) > f(x_{k})+\lambda \alpha _{k}/\gamma g_{k}^{T}d_{k}+ \alpha _{k}/\gamma \min \bigl(-\lambda _{1}g_{k}^{T}d_{k}, \lambda \alpha _{k} \Vert d_{k} \Vert ^{2}/(2\gamma )\bigr), $$

there thus exists a positive constant \(\lambda ^{*} \leq \lambda _{1}\), such that

$$ f(x_{k}+\alpha _{k}d_{k}/\gamma )-f(x_{k}) \geq -\bigl(\lambda -\lambda ^{*}\bigr) \alpha _{k}/\gamma \Vert g_{k} \Vert ^{2}. $$

Using (2.5), Assumption (ii) and the continuity of \(f(x)\) and \(g(x)\), we have

$$\begin{aligned} f(x_{k}+\alpha _{k}d_{k}/\gamma )-f(x_{k}) =& \alpha _{k}/\gamma g(x _{k}+ \eta _{k} \alpha _{k}/\gamma d_{k})^{T}d_{k} \\ =&\alpha _{k}/\gamma g(x_{k})^{T}d_{k}+ \bigl[\alpha _{k}/\gamma g(x_{k}+ \eta _{k} \alpha _{k}/\gamma d_{k})-g(x_{k}) \bigr]^{T}d_{k} \\ \leq & \alpha _{k}/\gamma g(x_{k})^{T}d_{k}+ \eta _{k}\tau \alpha _{k} ^{2}/\gamma ^{2} \Vert d_{k} \Vert ^{2} \\ =&-\alpha _{k}/\gamma \bigl\Vert g(x_{k}) \bigr\Vert ^{2}+\eta _{k}\tau \alpha _{k}^{2}/ \gamma ^{2} \Vert d_{k} \Vert ^{2}, \end{aligned}$$

where \(\eta _{k} \in (0,1)\). Comparing the above two expressions, we have

$$ -\alpha _{k}/\gamma \Vert g_{x_{k}} \Vert ^{2}+\eta _{k}\tau \alpha _{k}^{2}/ \gamma ^{2} \Vert d_{k} \Vert ^{2} \geq - \bigl(\lambda -\lambda ^{*}\bigr)/\gamma \alpha _{k} \Vert g_{x_{k}} \Vert ^{2} $$


$$ \Vert g_{x_{k}} \Vert ^{2}-\eta _{k}\tau \alpha _{k}/\gamma \bigl\Vert d_{k}^{2} \bigr\Vert \leq \bigl( \lambda -\lambda ^{*}\bigr) \bigl\Vert g(x_{k}) \bigr\Vert ^{2}. $$


$$ \alpha _{k} \geq \gamma \bigl(1-\lambda +\lambda ^{*} \bigr)/\bigl(\eta _{k}\tau \sigma ^{2}\bigr)>0, $$

where \(\sigma \in (1+\frac{2}{\xi _{2}}, \infty )\). This contradicts the assumption of case (i).

(ii) Clearly, \(\{g_{k}\} \rightarrow 0\) if \(\alpha _{k}\) is a positive finite constant when k is a sufficiently large constant from the formula of (2.14). This conclusion does not satisfy the assumption of (2.10); this completes the proof. □

Numerical results

Related content is presented in this section and consists of two parts: test problems and corresponding numerical results. To measure the algorithm’s efficiency, we compare Algorithm 2.1 with Algorithm 1 in [51] in terms of NI, NFG, and CPU on the test problems listed in Table 2 of Appendix 1, which are from [3], where NI, NFG, and CPU indicate the number of iterations, the sum of the calculation’s frequency of the objective function and gradient function, and the calculation time needed to solve various test problems (in seconds), respectively. Algorithm 1 is different from the objective algorithm in the formula of \(d_{k+1}\) that was determined by (2.1), and the remainder of Algorithm 1 is identical to Algorithm 2.1.

Stopping rule: If \(| f(x_{k})| > e_{1}\), let \(\mathit{stop}1=\frac{| f(x_{k})-f(x_{k+1})| }{| f(x_{k})| }\) or \(\mathit{stop}1=| f(x_{k})-f(x_{k+1})| \). If the condition \(\|g(x)\|< \epsilon \) or \(\mathit{stop}1 < e_{2}\) is satisfied, the algorithm stops, where \(e_{1}=e_{2}=10^{-4}\), \(\epsilon =10^{-4}\). On the one hand, based on the virtual case, the proposed algorithm also stops if the number of iterations is greater than 10,000 and the iteration number of \(\alpha _{k}\) is greater than 5. On the other hand, ‘NO’ and ‘problem’ in Table 2 indicate the number of the tested problem and the name of the problem, respectively.

Initiation: \(\lambda =0.9\), \(\lambda _{1}=0.4\), \(\xi _{3}= 300\), \(\xi _{2}=\xi _{4}=0.01\), \(\gamma =0.01\).

Dimension: \(30\text{,}000\), \(90\text{,}000\), \(150\text{,}000\), \(210\text{,}000\).

Calculation environment: The calculation environment is a computer with 2 GB of memory, a Pentium (R) Dual-Core CPU E5800@3.20 GHz, and the 64-bit Windows 7 operating system.

The algorithms’ numerical results are listed in Table 3 of Appendix 1 with their corresponding NI, NFG and CPU. Then, based on the technique in [15], plots of the corresponding figures are presented for the proposed algorithm. Some of the test problems are especially complex in that the above algorithms fail to solve them. Thus, a list of the numerical results with the corresponding problem index is given in Table 3, where ‘NO’ and ‘Dim’ are the index of the problem and the dimension of the variable, respectively. Clearly, the proposed algorithm (Algorithm 2.1) is effective from the above figures because the point’s value on the algorithm’s curve is larger than for the other algorithms. In Fig. 1, the red curve’s value of the initial point is close to 0.9, while Algorithm 1 only arrives at a value of 0.6. This means that Algorithm 2.1 addresses complex problems fewer iterations. In Fig. 2, the red curve is above the left curve because the calculation number of the objective function is less than the left one when addressing practical problems, which is an important aspect for measuring the performance of an algorithm. It is well know that the calculation time (CPU time) is the most essential metric of an algorithm because the speed of the algorithm is the most basic feature. The objective algorithm not only is well defined because the curve seems more feasible and smooth but also can address most complex problems because the largest value of the point on the curve is close to 0.98, which indicates that the proposed algorithm is highly effective. Overall, the proposed algorithm cannot only solve smooth problems but it also enriches the knowledge of optimization and lays the foundation for further in-depth studies.

Figure 1

Performance profiles of these methods (NI)

Figure 2

Performance profiles of these methods (NFG)

Figure 3

Performance profiles of these methods (CPU time)

Algorithm for nonsmooth problems

From the previous section, the proposed algorithm is trustworthy and has good potential based on the fundamental numerical results. Thus, this section attempts to apply the proposed method to nonsmooth problems. It is interesting that the vast majority of practical conditions are harsh; therefore, Newton’s series of methods are often unsatisfactory for solving such problems because they require information about the gradient function [37, 39, 46]. Currently, most experts and scholars focus on bundled methods, which are successful solutions to small-scale problems (see [18, 19, 24, 36]) but fail to solve large-scale practical problems. With the development of science and technology, it is becoming an urgent need to design a simple but effective algorithm to solve large-scale nonsmooth problems. Based on the simplicity of the conjugate gradient method, some experts and scholars have proposed relevant algorithms and made numerous fruitful theoretical achievements (see [20, 28]).

Consider the following problem:

$$ \min \theta (x), $$

where \(\theta (x)\) sometimes is nonsmooth and \(x \in R^{n}\). A famous technique is called ‘Moreau–Yosida’ regularization, which calculates the equivalent solution of the previous problem through a modified object function with the formula

$$ \min_{t \in R^{n}} \biggl\{ \theta (t)+ \frac{ \Vert t-x \Vert ^{2}}{2\chi }\biggr\} , $$

where χ and \(\|\cdot \|\) denote a positive constant and the Euclidean norm, respectively. Without loss of generality, we denote \(\theta ^{M}(x)\) as in (3.1) ‘Moreau–Yosida’ regularization, i.e.,

$$ \theta ^{M}(x)=\min_{t \in R^{n}} \biggl\{ \theta (t)+\frac{ \Vert t-x \Vert ^{2}}{2 \chi }\biggr\} . $$

The ‘Moreau–Yosida’ regularization technique was introduced because of its outstanding properties such as differentiability (see [21, 25]). Assume that the function (3.1) is convex and that its ‘Moreau–Yosida’ regularization obtains the best solution of \(\omega (x)=\operatorname{argmin} \theta ^{M}(t)\). Based on the mathematic knowledge and the established conclusion, we then have

$$ \nabla \theta ^{M}(x)=\frac{\omega (x)-x}{\chi }. $$

The most surprising result is that the function \(\theta ^{M}(x)\) is instantly smooth and that its gradient function is Lipschitz continuous, which is not true for the original function. It is worth noting that (3.1) and (3.3) are equivalent to each other because they have the same solution. In the remainder of this discussion, we attempt to further study the goal of (3.3) because it is clear and brief to a large extent, and we introduce relevant components of the objective algorithm simultaneously. Some necessary properties of sufficient descent and trust region will be listed in the next section. Then we provide relevant numerical results to show the performance of the proposed algorithm and draw a conclusion with regard of the whole paper.

New algorithm and its necessary properties

We start with the following formula for \(d_{k+1}\), which is an important component of the proposed algorithm in addressing complex problems:

$$ d_{k+1}=\textstyle\begin{cases} -\nabla \theta ^{M}(x_{k+1})^{T}+\frac{\nabla \theta ^{M}(x_{k+1}) y _{k}d_{k} -d_{k}^{T}\nabla \theta ^{M}(x_{k+1})y_{k}}{\max \{\xi _{2} \Vert d _{k} \Vert \Vert y_{k} \Vert ,\min \{\xi _{3} \Vert \nabla \theta ^{M}(x_{k}) \Vert ^{2},\xi _{4} \Vert d_{k} \Vert ^{2}\}\}}, & \mbox{if } k\geq 1, \\ -\nabla \theta ^{M}(x_{k+1}),& \mbox{if } k=0, \end{cases} $$

where \(s_{k}=x_{k+1}-x_{k}\), \(y_{k}=\nabla \theta ^{M}(x_{k+1})-\nabla \theta ^{M}(x_{k})\), and \(\xi _{2}\), \(\xi _{3}\), and \(\xi _{4}\) are positive constants. The step length \(\alpha _{k}\) is determined by

$$\begin{aligned} \theta ^{M}(x_{k}+\alpha _{k}d_{k}) \leq& \theta ^{M}(x_{k})+ \lambda \alpha _{k}\nabla \theta ^{M}(x_{k})^{T}d_{k} \\ &{}+ \alpha _{k}\min \bigl[-\lambda _{1}\nabla \theta ^{M}(x_{k})^{T}d_{k}, \lambda \alpha _{k}d_{k}^{T}d _{k}/2 \bigr], \end{aligned}$$

where \(\lambda , \gamma \in (0,1)\), \(\lambda _{1} \in (0,\lambda )\), and \(\alpha _{k}\) is the largest number of \(\{\gamma ^{k}\mid k=0,1,2,\ldots \}\). It is well known from the case of smooth functions that the new search direction has satisfactory descent and trust region properties; therefore, we merely list them without proof. We have

$$\begin{aligned}& \nabla \theta ^{M}(x_{k})^{T} d_{k}=- \bigl\Vert \nabla \theta ^{M}(x_{k}) \bigr\Vert ^{2}, \end{aligned}$$
$$\begin{aligned}& \Vert d_{k} \Vert \leq \sigma \bigl\Vert \nabla \theta ^{M}(x_{k}) \bigr\Vert , \end{aligned}$$

where σ is the same as in (2.7). Now, manifesting specific algorithm steps, we express the cause of the existing \(\alpha '_{k} \in \Re \) that satisfies the demands of the modified Armijo line search formula and provides the global convergence of the proposed algorithm.

Algorithm 3.1

Step 1::

(Initiation) Choose an initial point \(x_{0}\), \(\gamma \in (0,1)\), \(\xi _{2}\), \(\xi _{3}\), and \(\xi _{4}> 0\) and positive constants \(\varepsilon \in (0,1)\). Let \(k=0\), \(d_{0}=-\nabla \theta ^{M}(x _{0})\).

Step 2::

If \(\|\nabla \theta ^{M}(x_{k})\| \leq \varepsilon \), then stop.

Step 3::

Find the step length, i.e., the calculation \(\alpha _{k} = \max \{\gamma ^{k}\mid k=0, 1, 2, \ldots \}\) stemming from (3.6).

Step 4::

Set the new iteration point \(x_{k+1}=x_{k}+\alpha _{k}d_{k}\).

Step 5::

Update the search direction by (3.5).

Step 6::

If \(\|\nabla \theta ^{M}(x_{k+1})\|\leq \varepsilon \) holds, the algorithm stops; otherwise, go to the next step.

Step 7::

Let \(k:=k+1\) and go to Step 3.

To express the validity of the step length \(\alpha _{k}\) in (3.6) and the global convergence of Algorithm 3.1, the following assumptions are necessary.


  1. (i)

    The level set \(\pi =\{x\mid \theta ^{M}(x) \leq \theta ^{M}(x_{0})\}\) is bounded.

  2. (ii)

    The function \(\theta ^{M}(x) \in C^{2}\) is bounded from below.

From the ‘Moreau–Yosida’ regularization technique, the function \(\theta ^{M}(x)\) is Lipschitz continuous, i.e., there exists a positive constant κ subject to

$$ \bigl\Vert \nabla \theta ^{M}(x)-\nabla \theta ^{M}(y) \bigr\Vert \leq \kappa \Vert x-y \Vert . $$

Theorem 3.1

If Assumptions (i)(ii) are true, then there exists a constant \(\alpha _{k}\) that satisfies the requirements of (3.6).


We introduce the following function:

$$\begin{aligned} \vartheta (\alpha ) =&\theta ^{M}(x_{k}+\alpha d_{k})-\theta ^{M}(x_{k})- \lambda \alpha \nabla \theta ^{M}(x_{k})^{T}d_{k} \\ &{}- \alpha \min \bigl[-\lambda _{1}\nabla \theta ^{M}(x_{k})^{d}k, \lambda \alpha d_{k}^{T}d_{k}/2\bigr]. \end{aligned}$$

Based on the established theorem, following the sufficient decrease of (3.7), for sufficiently small positive α, we have

$$\begin{aligned} \vartheta (\alpha ) =&\theta ^{M}(x_{k}+\alpha d_{k})-\theta ^{M}(x_{k})- \lambda \alpha \nabla \theta ^{M}(x_{k})^{T}d_{k} \\ &{}- \alpha \min \bigl[-\lambda _{1}\nabla \theta ^{M}(x_{k})^{T}d_{k}, \lambda \alpha d_{k}^{T}d_{k}/2\bigr] \\ =&\alpha (1+\lambda _{1}-\lambda )\nabla \theta ^{M}(x_{k})^{T} d_{k}+o( \alpha ) < 0, \end{aligned}$$

where the latter inequality holds since the objective function is continuous. Thus, there exists a constant \(0<\alpha _{0}<1\) such that \(\vartheta (\alpha _{0}) < 0\); on the other hand, \(\vartheta (0)=0\), and based on the function’s continuous property, there exists a constant \(\alpha _{1}\) such that

$$\begin{aligned} \vartheta (\alpha _{1}) =&\theta ^{M}(x_{k}+ \alpha _{1} d_{k})-\theta ^{M}(x _{k})-\lambda \alpha _{1} \nabla \theta ^{M}(x_{k})^{T}d_{k} \\ &{}-\alpha _{1} \min \bigl[-\lambda _{1}\nabla \theta ^{M}(x_{k})^{d}k, \lambda \alpha _{1} d _{k}^{T}d_{k}/2\bigr]< 0. \end{aligned}$$


$$\begin{aligned} \theta ^{M}(x_{k}+\alpha _{1} d_{k}) < &\theta ^{M}(x_{k})+\lambda \alpha _{1} \nabla \theta ^{M}(x_{k})^{T}d_{k} \\ &{}+ \alpha _{1} \min \bigl[-\lambda _{1} \nabla \theta ^{M}(x_{k})^{T}d_{k}, \lambda \alpha _{1} d_{k}^{T}d_{k}/2 \bigr] \end{aligned}$$

is correct, and this means that the modified Armijo line search is well defined. From the above discussion, Algorithm 3.1 has the properties of the sufficient descent and a trust region, and we can now present the theorem of global convergence. □

Theorem 3.2

If the above assumptions are satisfied and the relative sequences \(\{x_{k}\}\), \(\{\alpha _{k}\}\), \(\{d_{k}\}\), \(\{\theta ^{M}(x_{k})\}\) are generated by Algorithm 3.1, then we have \(\lim_{k \rightarrow \infty }\|\nabla \theta ^{M}(x_{k})\|=0\).

We neglect the proof because its proof is similar to that of Theorem 2.1.

Nonsmooth numerical experiment

Two algorithms are proposed and compared to the proposed algorithm because this section measures the objective algorithm’s efficiency on the test problems listed in Table 4. In addition, the problems are only different from Algorithm 3.1 in the formula for \(d_{k+1}\). The relevant numerical data are listed in Table 5 of Appendix 2, and we plot the corresponding graphs based on these data, where ‘NI’, ‘NF’, and ‘CPU’ are the iteration number, calculation number of the objective function and the algorithm’s run time (in seconds). The first metric is determined by

$$ d_{k+1}= \textstyle\begin{cases} -\nabla \theta ^{M}(x_{k+1}) + \frac{\nabla \theta ^{M}(x_{k+1}))^{T}y _{k}d_{k}-d_{k}^{T}\nabla \theta ^{M}(x_{k+1})y_{k}}{\nabla \theta ^{M}(x _{k})^{T}\nabla \theta ^{M}(x_{k})}, & \mbox{if } k\geq 1, \\ -\nabla \theta ^{M}(x_{k+1}),& \mbox{if } k=0. \end{cases} $$

In [51], without loss of generality, calling Algorithm 2, the other algorithm in [4] is calculated as

$$ d_{k+1}= \textstyle\begin{cases} \frac{-y_{k}^{T}s_{k}\nabla \theta ^{M}(x_{k+1}) + y_{k}^{T}\nabla \theta ^{M}(x_{k+1})s_{k}-s_{k}^{T}g_{k+1}y_{k}}{ \Vert \nabla \theta ^{M}(x _{k}) \Vert ^{2}}, & \mbox{if } k\geq 1, \\ -\nabla \theta ^{M}(x_{k+1}),& \mbox{if } k=0, \end{cases} $$

denoted as Algorithm 3.

Dimension: 150,000, 180,000, 192,000, 210,000, 222,000, 231,000, 240,000, 252,000, 270,000.

Initiation: \(\lambda =0.9\), \(\lambda _{1}=0.4\), \(\xi _{3}= 100\), \(\xi _{2}=\xi _{4}=0.01\), \(\gamma =0.5\).

Stopping rule: If NI is no greater than 10,000, \(|f(x_{k+1})-f(x _{k})| < 1e-7\) and if the iteration number of \(\alpha _{k}\) is no greater than 5, then the algorithm stops.

Calculation environment: The calculation environment is a computer with 2 GB of memory, a Pentium (R) Dual-Core CPU E5800@3.20 GHz and the 64-bit Windows 7 operating system.

From Figs. 46, the proposed algorithm is effective and successful to a large extent. First, the computational data of the algorithm fully address complex situations. Second, the algorithm in the design of the search direction carefully considers the corresponding function, gradient function and current direction. In Figs. 4 and 5, the curve of Algorithm 3.1 is above the other two curves because the number of iterations is much lower. Its initial point is close to 0.75, which is much larger than the other algorithms. Note that the proposed algorithm’s computation time is the best of the three algorithms because the curve increases rapidly and is very smooth. In other words, its curve has a wonderful initiation point, which results in a high efficiency in addressing complex issues.

Figure 4

Performance profiles of these methods (NI)

Figure 5

Performance profiles of these methods (NF)

Figure 6

Performance profiles of these methods (CPU)

Applications of Algorithm 3.1 in image restoration

It is well known that many modern applications of optimization call for studying large-scale nonsmooth convex optimization problems, where the image restoration problem arising in image processing is an illustrating example. The image restoration problem plays an important role in biological engineering, medical sciences and other areas of science and engineering (see [6, 10, 32] etc.), which is to reconstruct an image of an unknown scene from an observed image. The most common image degradation model is defined by the following system:

$$ b = Ax + \eta, $$

where \(x \in \Re ^{n}\) is the underlying images, \(b \in \Re ^{m}\) is the observed images, A is an \(m \times n\) blurring matrix, and \(\eta \in \Re ^{m}\) denotes the noise. One way to get the unknown η is to solving the problem \(\min_{x \in \Re ^{n}} \|Ax+b\|^{2}\). This problem will not have a satisfactory solution since the system is very sensitive to lack of information and the noise. The regularized least square problem is often used to overcome the above shortcoming

$$ \min_{x \in \Re ^{n}} \Vert Ax+b \Vert ^{2}+ \lambda \Vert Dx \Vert _{1}, $$

where \(\|\cdot \|_{1}\) is the \(l^{1}\) norm, λ is the regularization parameter controlling the trade-off between the regularization term and the data-fitting term, and D is a linear operator. It is easy to see that the above problem is a nonsmooth convex optimization problem and it is typically of large scale since the \(l^{1}\) norm is nonsmooth.

Image restoration problem

The above section tells us that Algorithm 3.1 can be used for large-scale nonsmooth problems. Then we will use this algorithm to solve the above image noise problem, where the parameters are the same as those of Sect. 3.2 different from \(\xi _{2}=\xi _{3}=\xi _{4}=1\). All codes are written by MATLAB r2017a and run on a PC with an Intel Pentium(R) Xeon(R) E5507 CPU @2.27 GHz, 6.00 GB of RAM, and the Windows 7 operating system. The stopped condition is

$$ \frac{ \vert \theta _{\alpha }(v_{k+1})-\theta _{\alpha }(v_{k}) \vert }{ \vert \theta _{\alpha }(v_{k}) \vert }\leq 10^{-2} $$


$$ \frac{ \Vert v_{k+1}- v_{k} \Vert }{ \Vert v_{k} \Vert }\leq 10^{-2}, $$


$$ \theta _{\alpha }(v_{k})=\sum_{(i,j)\in N} \biggl\{ \sum_{(m,n) \in V_{i,j} \setminus N} \varphi _{\alpha }(v_{i,j}-y_{m,n})+ \frac{1}{2}\sum_{(m,n)\in V_{i,j} \cap N} \varphi _{\alpha }(v_{i,j}-v _{m,n})\biggr\} , $$

the noise candidate indices set \(N:=\{(i,j)\in {A} \mid \bar{y} _{i,j} \neq y_{i,j},y_{i,j}=s_{\mathrm{min}} \mbox{ or } s_{ \mathrm{max}}\}\), \(s_{\mathrm{max}}\) is the maximum of the noisy pixel and \(s_{\mathrm{min}}\) denotes the minimum of the noisy pixel, \({A}=\{1,2,\ldots , M\}\times \{1,2,3,\ldots ,N\}\), \(V_{i,j}=\{(i,j-1),(i,j+1),(i-1,j),(i+1,j) \}\) is the neighborhood of \((i,j)\), y denotes the observed noisy image of x corrupted by the salt-and-pepper noise, ȳ is defined by the image obtained by applying the adaptive median filter method to the noisy image y in the first phase, x is the true image with M-by-N pixels, and \(x_{i,j}\) denotes the gray level of x at pixel location \((i,j)\). It is easy to see that the regularity of \(\theta _{\alpha }\) only depends on \(\varphi _{\alpha }\) and there exist many properties as regards \(\theta _{\alpha }\) and \(\varphi _{\alpha }\) that are studied by many scholars (see [7, 8] etc.). In the experiments, Lena (\(256 \times 256\)), Cameraman (\(256\times 256\)), Barbara (\(512\times 512\)), Banoon (\(512\times 512\)), and Man (\(1024\times 1024\)) are the tested images. To compare Algorithm 3.1 with other similar algorithm, we also test the well-known PRP conjugate gradient algorithm, where the Step 5 of Algorithm 3.1 is replaced by the PRP formula. The tested performances of these two algorithms (Algorithm 3.1 and PRP algorithm) are listed and the spent time is stated in Table 1.

Table 1 The CPU time of PRP algorithm and Algorithm 3.1 in seconds

Results and discussion

Figures 7 and 8 show that Algorithm 3.1 and PRP algorithm have good performance to solve the image restoration and both of them can successfully do this problem. From the results of Table 1 it turns out that Algorithm 3.1 is competitive to PRP algorithm since it needs less CPU time to restoration of the most given images than those of the PRP algorithm.

Figure 7

Restoration of the images Lena, Cameraman, Barbara, Banoon, and Man by PRP algorithm and Algorithm 3.1. From left to right: the noisy image with 20% salt-and-pepper noise, the restorations obtained by minimizing z with PRP algorithm and Algorithm 3.1, respectively

Figure 8

Restoration of the images Lena, Cameraman, Barbara, Banoon, and Man by PRP algorithm and Algorithm 3.1. From left to right: the noisy image with 50% salt-and-pepper noise, the restorations obtained by minimizing z with PRP algorithm and Algorithm 3.1, respectively


This paper proposes a new PRP algorithm that combines the innovative formula of the search direction \(d_{k+1}\) with the modified Armijo line technique: (i) In the design of the proposed algorithm, the key information about the objective function, the gradient function and its current direction is collected and applied to complex problems, and the numerical results show that the proposed algorithm is efficient. (ii) For nonsmooth problems, the introduced ‘Moreau–Yosida’ regularization technique succeeds in enhancing the proposed algorithm, and the numerical results prove the validity and simplicity of the discussed algorithm. (iii) Image restoration problems are done by Algorithm 3.1 and from the tested results it turns out that the given algorithm has better performance than those of the normal PRP algorithm. However, there are some problems with the optimization method that need to be studied such as how to better leverage the benefits of the steepest descent method while overcoming its shortcomings.


  1. 1.

    Al-Baali, A., Albaali, M.: Descent property and global convergence of the Fletcher–Reeves method with inexact line search. IMA J. Numer. Anal. 5(1), 121–124 (1985)

  2. 2.

    Al-Baali, M., Narushima, Y., Yabe, H.: A family of three-term conjugate gradient methods with sufficient descent property for unconstrained optimization. Comput. Optim. Appl. 60(1), 89–110 (2015)

  3. 3.

    Andrei, N.: An unconstrained optimization test functions collection. Environ. Sci. Technol. 10(1), 6552–6558 (2008)

  4. 4.

    Andrei, N.: On three-term conjugate gradient algorithms for unconstrained optimization. Appl. Math. Comput. 241(11), 19–29 (2008)

  5. 5.

    Argyros, I.K., George, S.: Local convergence analysis of Jarratt-type schemes for solving equations. Appl. Set-Valued Anal. Optim. 1, 53–62 (2019)

  6. 6.

    Banham, M.R., Katsaggelos, A.K.: Digital image restoration. IEEE Signal Process. Mag. 14, 24–41 (1997)

  7. 7.

    Cai, J.F., Chan, R.H., Fiore, C.D.: Minimization of a detail-preserving regularization functional for impulse noise removal. J. Math. Imaging Vis. 29, 79–91 (2007)

  8. 8.

    Cai, J.F., Chan, R.H., Morini, B.: Minimization of an edge-preserving regularization functional by conjugate gradient types methods. In: Image Processing Based on Partial Differential Equations: Prceedings of the International Conference on PDE-Based Image Processsing and Related Inverse Problems, CMA, Oslo, August 8–12, 2005, pp. 109–122. Springer, Berlin (2007)

  9. 9.

    Cardenas, S.: Efficient generalized conjugate gradient algorithms. I. Theory. J. Optim. Theory Appl. 69(1), 129–137 (1991)

  10. 10.

    Chan, C.L., Katsaggelos, A.K., Sahakian, A.V.: Image sequence filtering in quantum-limited noise with applications to low-dose fluoroscopy. IEEE Trans. Med. Imaging 12, 610–621 (1993)

  11. 11.

    Dai, Z.: A mixed HS–DY conjugate gradient methods. Math. Numer. Sin. (2005)

  12. 12.

    Dai, Z., Wen, F.: A generalized approach to sparse and stable portfolio optimization problem. J. Ind. Manag. Optim. 14, 1651–1666 (2018)

  13. 13.

    Dai, Z.F.: Two modified HS type conjugate gradient methods for unconstrained optimization problems. Nonlinear Anal., Theory Methods Appl. 74(3), 927–936 (2011)

  14. 14.

    Deng, S., Wan, Z.: A three-term conjugate gradient algorithm for large-scale unconstrained optimization problems. Appl. Numer. Math. 92, 70–81 (2015)

  15. 15.

    Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2001)

  16. 16.

    Du, X., Liu, J.: Global convergence of a spectral HS conjugate gradient method. Proc. Eng. 15(4), 1487–1492 (2011)

  17. 17.

    Gilbert, J.C., Nocedal, J.: Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2(1), 21–42 (1990)

  18. 18.

    Haarala, M., Miettinen, K., Mäkelä, M.M.: New limited memory bundle method for large-scale nonsmooth optimization. Optim. Methods Softw. 19(6), 673–692 (2004)

  19. 19.

    Haarala, N., Miettinen, K., Mäkelä, M.M.: Globally convergent limited memory bundle method for large-scale nonsmooth optimization. Math. Program. 109(1), 181–205 (2007)

  20. 20.

    Lin, C.J., Weng, R.C., Keerthi, S.S.: Trust region Newton method for logistic regression. J. Mach. Learn. Res. 9(2), 627–650 (2008)

  21. 21.

    Meng, F., Zhao, G.: On second-order properties of the Moreau–Yosida regularization for constrained nonsmooth convex programs. Numer. Funct. Anal. Optim. 25(5–6), 515–529 (2004)

  22. 22.

    Narushima, Y., Yabe, H., Ford, J.A.: A three-term conjugate gradient method with sufficient descent property for unconstrained optimization. SIAM J. Optim. 21(1), 212–230 (2016)

  23. 23.

    Nazareth, L.: A conjugate direction algorithm without line searches. J. Optim. Theory Appl. 23(3), 373–387 (1977)

  24. 24.

    Oustry, F.: A second-order bundle method to minimize the maximum eigenvalue function. Math. Program. 89(1), 1–33 (2000)

  25. 25.

    Pearson, J.W., Martin, S., Wathen, A.J.: Preconditioners for state-constrained optimal control problems with Moreau–Yosida penalty function. Numer. Linear Algebra Appl. 21(1), 81–97 (2013)

  26. 26.

    Polak, E.: The conjugate gradient method in extreme problems. Comput. Math. Math. Phys. 9, 94–112 (1969)

  27. 27.

    Polak, E., Ribière, G.: Note sur la convergence de directions conjugees. Rev. Fr. Inform. Rech. Opér. 3, 35–43 (1969)

  28. 28.

    Schramm, H., Zowe, J.: A version of the bundle method for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2(1), 121–152 (2006)

  29. 29.

    Sheng, Z., Yuan, G.: An effective adaptive trust region algorithm for nonsmooth minimization. Comput. Optim. Appl. 71, 251–271 (2018)

  30. 30.

    Sheng, Z., Yuan, G., et al.: An adaptive trust region algorithm for large-residual nonsmooth least squares problems. J. Ind. Manag. Optim. 14, 707–718 (2018)

  31. 31.

    Sheng, Z., Yuan, G., Cui, Z.: A new adaptive trust region algorithm for optimization problems. Acta Math. Sci. 38(2), 479–496 (2018)

  32. 32.

    Slump, C.H.: Real-time image restoration in diagnostic X-ray imaging, the effects on quantum noise. In: Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II, Conference B: Pattern Recognition Methodology and Systems, pp. 693–696 (1992)

  33. 33.

    Sun, Q., Liu, Q.: Global convergence of modified HS conjugate gradient method. J. Appl. Math. Comput. 22(3), 289–297 (2009)

  34. 34.

    Touati-Ahmed, D., Storey, C.: Efficient hybrid conjugate gradient techniques. J. Optim. Theory Appl. 64(2), 379–397 (1990)

  35. 35.

    Wan, Z., Yang, Z.L. Wang, Y.L.: New spectral PRP conjugate gradient method for unconstrained optimization. Appl. Math. Lett. 24(1), 16–22 (2011)

  36. 36.

    Wang, W., Qiao, X., Han, Y.: A proximal bundle method for nonsmooth and nonconvex constrained optimization. Comput. Stat. Data Anal. 34(34), 3464–3485 (2011)

  37. 37.

    Wei, Z., Li, G., Qi, L.: New quasi-Newton methods for unconstrained optimization problems. Appl. Math. Comput. 175(2), 1156–1188 (2006)

  38. 38.

    Wei, Z., Yao, S., Liu, L.: The convergence properties of some new conjugate gradient methods. Appl. Math. Comput. 183(2), 1341–1350 (2006)

  39. 39.

    Wei, Z., Yu, G., Yuan, G., et al.: The superlinear convergence of a modified BFGS-type method for unconstrained optimization. Comput. Optim. Appl. 29(3), 315–332 (2004)

  40. 40.

    Ying, L., Hai, Z., Fei, L.: A modified proximal gradient method for a family of nonsmooth convex optimization problems. J. Oper. Res. Soc. China 5, 391–403 (2017)

  41. 41.

    Yuan, G., Hu, W., Wang, B.: A modified Armijo line search technique for large-scale nonconvex smooth and convex nonsmooth optimization problems. Preprint (2017)

  42. 42.

    Yuan, G., Li, Y., Li, Y.: A modified Hestenes and Stiefel conjugate gradient algorithm for large-scale nonsmooth minimizations and nonlinear equations. J. Optim. Theory Appl. 168(1), 129–152 (2016)

  43. 43.

    Yuan, G., Lu, X.: A modified PRP conjugate gradient method. Ann. Oper. Res. 166(1), 73–90 (2009)

  44. 44.

    Yuan, G., Lu, X., Wei, Z.: A conjugate gradient method with descent direction for unconstrained optimization. J. Comput. Appl. Math. 233(2), 519–530 (2009)

  45. 45.

    Yuan, G., Sheng, Z., Wang, P., Hu, W., Li, C.: The global convergence of a modified BFGS method for nonconvex functions. J. Comput. Appl. Math. 327, 274–294 (2018)

  46. 46.

    Yuan, G., Wei, Z.: Convergence analysis of a modified BFGS method on convex minimizations. Comput. Optim. Appl. 47(2), 237–255 (2010)

  47. 47.

    Yuan, G., Wei, Z., Li, G.: A modified Polak–Ribière–Polyak conjugate gradient algorithm for nonsmooth convex programs. J. Comput. Appl. Math. 255, 86–96 (2014)

  48. 48.

    Yuan, G., Wei, Z., Lu, X.: Global convergence of BFGS and PRP methods under a modified weak Wolfe–Powell line search. Appl. Math. Model. 47, 811–825 (2017)

  49. 49.

    Yuan, G., Zhang, M.: A three-terms Polak–Ribière–Polyak conjugate gradient algorithm for large-scale nonlinear equations. J. Comput. Appl. Math. 286, 186–195 (2015)

  50. 50.

    Zaslavski, A.J.: Three convergence results for continuous descent methods with a convex objective function. J. Appl. Numer. Optim. 1, 53–61 (2019)

  51. 51.

    Zhang, L., Zhou, W., Li, D.H.: A descent modified Polak–Ribière–Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal. 26(4), 629–640 (2006)

  52. 52.

    Zhao, X., Ng, K.F., Li, C., Yao, J.C.: Linear regularity and linear convergence of projection-based methods for solving convex feasibility problems. Appl. Math. Optim. 78, 613–641 (2018)

  53. 53.

    Zhou, W.: Some descent three-term conjugate gradient methods and their global convergence. Optim. Methods Softw. 22(4), 697–711 (2007)

Download references


The authors would like to thank for the support funds.


This work was supported by the National Natural Science Foundation of China (Grant No. 11661009), the Guangxi Natural Science Fund for Distinguished Young Scholars (No. 2015GXNSFGA139001), and the Guangxi Natural Science Key Fund (No. 2017GXNSFDA198046).

Author information

GY mainly analyzed the theory results and organized this paper, TL did the numerical experiments of smooth problems and WH focused on the nonsmooth problems and image problems. All authors read and approved the final manuscript.

Correspondence to Wujie Hu.

Ethics declarations

Competing interests

The authors declare to have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

Table 2 Test problems
Table 3 Numerical results

Appendix 2

Table 4 Test problems
Table 5 Numerical results

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Yuan, G., Li, T. & Hu, W. A conjugate gradient algorithm and its application in large-scale optimization problems and image restoration. J Inequal Appl 2019, 247 (2019).

Download citation


  • 90C26


  • Conjugate gradient
  • Nonconvex and nonsmooth
  • Descent property
  • Global convergence