# Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method

## Abstract

In this paper, a new appropriate diagonal matrix estimation of the Hessian is introduced by minimizing the Byrd and Nocedal function subject to the weak secant equation. The Hessian estimate is used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. Moreover, to counteract the adverse effect of monotonicity, we introduce a new nonmonotone strategy. The global and superlinear convergence of the suggested algorithm is established under some standard conditions. The numerical experiments on unconstrained optimization test functions show that the new algorithm is efficient and robust.

## 1 Introduction

In this paper, we deal with the following unconstrained optimization problem:

$$\min_{x\in \mathbb{R}^{n}} f(x),$$
(1.1)

where $$f:\mathbb{R}^{n} \to \mathbb{R}$$ is a twice continuously differentiable function.

Line search (LS) and trust region (TR) methods are two prominent classes of iterative methods to solve the problem (1.1). The LS method, for a given initial point $$x_{0} \in \mathbb{R}^{n}$$, is a procedure that computes a step length $$\alpha _{k}$$ in the specific direction $$p_{k}$$ and considers a new point as $$x_{k+1} = x_{k} +\alpha _{k} p_{k}$$. On the other hand, the TR algorithm computes a trial step $$p_{k}$$ which is an approximate solution of the following quadratic subproblem:

\begin{aligned} &\min g^{T}_{k}p+\frac{1}{2}p^{T}B_{k}p, \\ & \quad \text{s.t. } \Vert p \Vert \le {\Delta }_{k}, \end{aligned}
(1.2)

in which $$g_{k}=\nabla f(x_{k})$$, $$B_{k}\in \mathbb{R}^{n\times n}$$ is the exact Hessian $$\nabla ^{2} f(x_{k})$$, or a symmetric approximation of it, and $$\Delta _{k}>0$$ is the TR radius. In the rest of the paper, $$\|\cdot \|$$ refers to the Euclidean norm. According to [19] (also see [5, Theorem 8.5]), $$p^{*}$$ is the exact solution of (1.2) if and only if there exists a $$\lambda ^{*} \ge 0$$ such that

$$\textstyle\begin{cases} (B_{k}+\lambda ^{*} I)p^{*} = -g_{k}, \\ \Vert p^{*} \Vert \le \Delta _{k}, \\ \lambda ^{*}(\Delta _{k}- \Vert p^{*} \Vert )=0, \end{cases}$$

and $$(B_{k} + \lambda ^{*} I)$$ is positive semidefinite.
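As a tiny numerical illustration of this characterization (the example and variable names are ours, not from the paper), consider a boundary case in which the multiplier is active:

```python
import numpy as np

# Hypothetical example: B = diag(1, 3), g = (4, 0), Delta = 1.
# With lambda* = 3 the step p* = -(B + lambda* I)^{-1} g lies exactly on the
# TR boundary, so all three optimality conditions hold simultaneously.
B = np.diag([1.0, 3.0])
g = np.array([4.0, 0.0])
Delta = 1.0
lam_star = 3.0
p_star = np.linalg.solve(B + lam_star * np.eye(2), -g)   # = (-1, 0)

on_boundary = abs(np.linalg.norm(p_star) - Delta) < 1e-12
complementarity = lam_star * (Delta - np.linalg.norm(p_star))
```

Here $$B+\lambda ^{*} I$$ is positive definite, so the solution is unique.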

The regularized Newton method (RNM) is another efficient approach for solving the problem (1.1) and has good convergence properties; see [7, 16, 21, 23, 24, 26]. At each iteration of the RNM, the trial step $$p_{k}$$ is obtained by approximately minimizing the following unconstrained quadratic function:

\begin{aligned} {\min \psi _{k}(p)}=f_{k}+g^{T}_{k}p+ \frac{1}{2}p^{T}B_{k}p+ \frac{1}{2} \lambda _{k} \Vert p \Vert ^{2}, \end{aligned}
(1.3)

where $$f_{k}=f(x_{k})$$ and $$\lambda _{k}$$ is called the regularized parameter. Here we define

$$\lambda _{k}= \frac{1}{2} \Delta _{k} \min \bigl\lbrace \Vert g_{k} \Vert , \vert f_{k} \vert , 1 \bigr\rbrace .$$
(1.4)

It is worth noting that the update rule of $$\Delta _{k}$$ is similar to the TR radius. At each iteration, the RNM method obtains the trial step $$p_{k}$$ by solving the following regularized Newton equation:

$$( B_{k}+\lambda _{k} I )p_{k}=-g_{k},$$
(1.5)

where $$I$$ is the identity matrix and $$(B_{k}+\lambda _{k} I)$$ is positive semidefinite. Therefore, $$p_{k}$$ is well defined. If $$(B_{k}+\lambda _{k} I)$$ is positive definite, $$p_{k}$$ is unique. We can conclude that $$p_{k}$$ solves (1.5) if and only if it is the global minimizer of the unconstrained quadratic function (1.3).

If we let $$\Delta _{k}= \|p_{k}\| = \| - (B_{k} + \lambda _{k} I)^{-1} g_{k}\|$$, then it can be verified (see [19, Theorem 6.1.2]) that $$p_{k}$$ is also a solution of the TR subproblem (1.2).
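For concreteness, a minimal sketch of one regularized step, combining the parameter rule (1.4) with equation (1.5) (the function names and the test problem are our own assumptions):

```python
import numpy as np

def regularization_parameter(delta_k, g_k, f_k):
    # lambda_k = (1/2) * Delta_k * min{ ||g_k||, |f_k|, 1 }, cf. (1.4)
    return 0.5 * delta_k * min(np.linalg.norm(g_k), abs(f_k), 1.0)

def rnm_step(B_k, g_k, lambda_k):
    # Solve the regularized Newton equation (B_k + lambda_k I) p_k = -g_k, cf. (1.5)
    n = len(g_k)
    return np.linalg.solve(B_k + lambda_k * np.eye(n), -g_k)

# Illustrative quadratic f(x) = x_1^2 + 10 x_2^2 at x = (1, 1)
B = np.diag([2.0, 20.0])        # exact Hessian
g = np.array([2.0, 20.0])       # gradient at x
f_val = 11.0
lam = regularization_parameter(1.0, g, f_val)   # Delta_k = 1 gives lam = 0.5
p = rnm_step(B, g, lam)
# As lam -> 0 the step tends to the Newton step -B^{-1} g = (-1, -1)
```

Only one linear system is solved per iteration, which is the cost advantage over the TR subproblem noted below.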

By the famous result given by Powell in [17] (also see [19, Lemma 6.1.3]), we know that

$$\psi _{k}(0)-\psi _{k}(p_{k}) \ge \gamma \Vert g_{k} \Vert \min \biggl\lbrace \Vert p_{k} \Vert , \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\rbrace ,$$
(1.6)

The preceding inequality and (1.3) indicate that

$$g^{T}_{k} p_{k}\le -\gamma \Vert g_{k} \Vert \min \biggl\lbrace \Vert p_{k} \Vert , \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\rbrace ,$$
(1.7)

where $$\gamma \in (0,1)$$ is a constant.

Generally, solving the TR subproblem is more expensive than the RNM subproblem. In the RNM, only one equation (1.5) is solved at each iteration. Hence, the computational cost of obtaining an RNM step is much lower than solving a TR subproblem [24].

The most common update formula for $$B_{k}$$ is the BFGS update formula. Numerically, this method needs $$O(n^{2})$$ storage, which makes it unsuitable for large-scale problems. The application of quasi-Newton methods for solving large-scale unconstrained optimization problems has been extended by limited-memory quasi-Newton methods [15] and truncated Newton methods [12, 14]. However, the implementation of these methods for their practical usage is very sophisticated, and the associated software is quite complex [3]. Therefore, researchers have considered an alternative approach for the matrix $$B_{k}$$, in which a diagonal matrix $$B_{k} = \mathrm{diag}(b^{(1)} _{k} , b^{(2)}_{k} ,\dots , b^{(n)}_{k} )$$ is used to approximate the Hessian matrix [4, 11, 18, 27]. Observe that in this method, only $$O(n)$$ storage is required to store $$B_{k}$$ [4].

This paper first introduces an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8]. Subsequently, a new nonmonotone strategy to overcome the adverse effect of monotonicity is introduced. The estimation of the Hessian is used to correct the framework of a new nonmonotone TR algorithm with the regularized quasi-Newton method. The suggested algorithm exploits a stronger nonmonotone strategy far from the solution and a weaker one close to the solution. We prove that the new algorithm is globally and superlinearly convergent.

In the next section, an appropriate diagonal matrix estimation of the Hessian is derived. In Sect. 3, the new nonmonotone strategy and the structure of the suggested algorithm are explained. Section 4 is associated with the convergence analysis of the new algorithm. In Sect. 5, some numerical experiments on a set of unconstrained optimization test problems are examined. The conclusions are given in Sect. 6.

## 2 Derivation of new diagonal updating

In the quasi-Newton method framework, the Hessian approximation matrix $$B_{k+1}$$ is usually required to satisfy the secant equation

$$B_{k+1}s_{k }= y_{k},$$
(2.1)

where $$s_{k}=x_{k+1}-x_{k}$$ and $$y_{k}=g_{k+1}-g_{k}$$. To find an appropriate diagonal matrix estimation of the Hessian in the sense of

$$B_{k} = \mathrm{diag}\bigl(b^{(1)} _{k} , b^{(2)}_{k} ,\dots , b^{(n)}_{k} \bigr),$$

we assume that $$B_{k}$$ is positive definite, and $$s^{T}_{k}y_{k} > 0$$ for all k. Since it is difficult for a diagonal matrix to satisfy the known secant equation (2.1), we will consider that $$B_{k+1}$$ satisfies the weak secant equation of Dennis and Wolkowicz [8], namely

$$s^{T}_{k}B_{k+1}s_{k}=s^{T}_{k} y_{k}.$$
(2.2)

The motivation for using the weak secant equation (2.2) can be seen in [4]. Byrd and Nocedal [6] introduced the function

$$\varphi (A) = \mathrm{tr}(A)-\ln \bigl(\det (A)\bigr),$$

defined on positive definite matrices, where $$\ln (\cdot )$$ denotes the natural logarithm. This is an elegant and efficient tool for analyzing the global properties of quasi-Newton methods. We will introduce an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation (2.2) as follows:

$$\min \varphi (B_{k+1})=\min \bigl( \mathrm{tr}(B_{k+1})-\ln \bigl(\det (B_{k+1})\bigr) \bigr),$$
(2.3)

subject to

$$s^{T}_{k}B_{k+1}s_{k}=s^{T}_{k} {y}_{k}=\varrho _{k}.$$
(2.4)

To achieve a new diagonal updating formula, we give the following penalized version of (2.3) and (2.4):

$$L=\min \bigl(\varphi (B_{k+1} )+ \bigl(s^{T}_{k} B_{k+1}s_{k}- \varrho _{k} \bigr)^{2} \bigr).$$
(2.5)

Now, having in mind that $$\mathrm{tr}(B_{k+1}) = b^{1}_{ k+1} +\cdots + b^{n} _{k+1}$$ and $$\det (B_{k+1}) = b^{1} _{k+1} \cdots b^{n}_{ k+1}$$, the minimization problem (2.5) becomes

$$L=\min \Biggl( b^{1}_{ k+1} +\cdots + b^{n} _{k+1}-\ln \bigl(b^{1} _{k+1} \cdots b^{n}_{ k+1} \bigr)+ \Biggl(\sum _{i=1}^{n} b^{i}_{k+1} \bigl(s^{i}_{k} \bigr)^{2}-\varrho _{k} \Biggr)^{2} \Biggr),$$
(2.6)

where $$s^{i}_{k}, i = 1, \dots , n$$, are the components of vector $$s_{k}$$.

The required solution of (2.3) and (2.4) is a stationary point of the penalized function. Hence, from (2.6), we have

$$\frac{\partial L}{\partial b^{i}_{k+1}}=1-\frac{1}{b^{i}_{k+1}}+2 \bigl(s^{i}_{k} \bigr)^{2} \bigl(b^{i}_{k+1} \bigl(s^{i}_{k} \bigr)^{2}- \varrho _{k} \bigr)=0.$$
(2.7)

Multiplying (2.7) by $$b^{i}_{k+1}$$ yields the quadratic equation $$2 (s^{i}_{k} )^{4} (b^{i}_{k+1} )^{2}+ (1-2\varrho _{k} (s^{i}_{k} )^{2} )b^{i}_{k+1}-1=0$$, whose positive root gives the elements of the diagonal matrix $$B_{k+1}$$ as

$${b}^{i}_{k+1}= \frac{2\varrho _{k} (s^{i}_{k})^{2}-1+\sqrt{ (2\varrho _{k} (s^{i}_{k} )^{2}-1 )^{2}+8 (s^{i}_{k} )^{4}}}{4 (s^{i}_{k} )^{4}},$$
(2.8)

which are positive and well defined for $$s^{i}_{k}\neq 0$$. Since we have $$s^{T}_{k}y_{k} > 0$$ for all k, to ensure positivity as well as uniform boundedness of $$b^{i}_{k+1}$$ given by (2.8) in general situations, we set

$$\bar{b}^{i} _{k+1} = \textstyle\begin{cases} {{b}^{i} _{k+1}}, & \text{if } L_{k} \le{b}^{i} _{k+1}\le U_{k}, \\ {b}^{i} _{k}, & \text{if } s^{i}_{k}=0 \text{ or } s^{T}_{k} y_{k} \le 0, \\ \min \lbrace \max \lbrace \frac{s^{T}_{k} y_{k}}{s^{T}_{k} s_{k}},\upsilon \rbrace , \frac{1}{\upsilon} \rbrace , & \text{otherwise}, \end{cases}$$
(2.9)

where $$\upsilon$$ is a small positive constant. Therefore, our Hessian approximation can be given by

$$B_{k+1}=\mathrm{diag}\bigl(\bar{b}^{1}_{k+1}, \bar{b}^{2}_{k+1},\dots , \bar{b}^{n}_{k+1} \bigr).$$
(2.10)

A crucial problem is choosing the bounds $$L_{k}$$ and $$U_{k}$$. Here, we introduce an adaptive strategy to determine them. Let us begin by considering the curvature of $$f (x)$$ in direction $$s_{k}$$, which is represented by

$$s^{T}_{k} \bar{H}_{k} s_{k}=s^{T}_{k} y_{k},$$

in which $$\bar{H}_{k} = \int _{0}^{1} \nabla ^{2} f (x_{k} +ts_{k})\,dt$$ is the average Hessian matrix along $$s_{k}$$. Since it is not practical to compute the eigenvalues of the Hessian matrix at each iteration, we can estimate its size based on the scalar

$$t_{k}=\frac{s^{T}_{k} \bar{H}_{k} s_{k}}{s^{T}_{k} s_{k}}= \frac{s^{T}_{k} y_{k}}{s^{T}_{k} s_{k}}.$$

Now $$L_{k}$$ and $$U_{k}$$ in (2.9) can be chosen according to the value of $$t_{k}$$ as follows:

$$L_{k} = \max \bigl\{ \sigma _{1}(t_{k})t_{k}, \underline{L} \bigr\} ,\qquad U_{k} = \min \bigl\{ \sigma _{2}(t_{k}) \vert t_{k} \vert , \bar{U} \bigr\} ,$$

where $$0 \le \sigma _{1}(t_{k}) \le 1$$, $$\sigma _{2}(t_{k})\ge 1$$, $$0< \underline{L} < 1$$, and $$\bar{U} > 1$$. Obviously, the values of the two bounds can be adjusted by $$\sigma _{1}$$ and $$\sigma _{2}$$, whose values depend on $$t_{k}$$. According to the relation (2.9), there exist two positive constants m and M such that

$$m=\min \lbrace \upsilon ,\underline{L} \rbrace \le \bar{b}^{i} _{k+1}\le M=\max \biggl\lbrace \frac{1}{\upsilon}, \bar{U} \biggr\rbrace .$$
(2.11)
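A minimal sketch of the resulting update (2.8)–(2.9) with the adaptive bounds above, taking $$\sigma _{1}$$ and $$\sigma _{2}$$ as constants and using illustrative values for $$\upsilon$$, $$\underline{L}$$, and $$\bar{U}$$ (the function name and all defaults are our own assumptions):

```python
import numpy as np

def diagonal_update(b_prev, s, y, upsilon=1e-4, sigma1=0.5, sigma2=2.0,
                    L_floor=0.1, U_cap=1e4):
    """One diagonal Hessian update following (2.8)-(2.9) (sketch)."""
    sTy = float(s @ y)                    # varrho_k = s_k^T y_k
    if sTy <= 0.0:                        # curvature safeguard: keep B_k
        return b_prev.copy()
    t = sTy / float(s @ s)                # curvature estimate t_k
    L = max(sigma1 * t, L_floor)          # adaptive bounds L_k and U_k
    U = min(sigma2 * abs(t), U_cap)
    fallback = min(max(t, upsilon), 1.0 / upsilon)
    b_new = np.empty_like(b_prev)
    for i in range(len(s)):
        si2 = s[i] ** 2
        if si2 == 0.0:
            b_new[i] = b_prev[i]          # s_k^i = 0: keep the previous entry
            continue
        a = 2.0 * sTy * si2 - 1.0
        root = (a + np.sqrt(a * a + 8.0 * si2 * si2)) / (4.0 * si2 * si2)  # (2.8)
        b_new[i] = root if L <= root <= U else fallback
    return b_new
```

Each accepted entry is a stationary point of the penalized function, i.e., it satisfies (2.7) exactly, and only $$O(n)$$ storage and work per iteration are required.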

## 3 Nonmonotone strategy and new algorithm

Grippo et al. [10] found that monotonically decreasing objective function values in the classical iterative schemes for solving (1.1) may reduce the convergence speed of the TR method, especially in the presence of narrow curved valleys; see also [20]. As a remedy, researchers have devoted considerable effort to developing nonmonotone strategies that guarantee global convergence [1, 2, 18, 25]. The pioneering nonmonotone LS method was introduced by Grippo et al. [10] as follows:

$$f(x_{k}+\alpha _{k} p_{k})\le f_{l(k)}+ \sigma \alpha _{k} g^{T}_{k} p_{k},$$
(3.1)

in which $$\sigma \in (0, 1)$$ is a constant,

$$f_{l(k)}=\max_{0\le j\le \phi (k)}\{f_{k-j}\},$$

$$\phi (0) = 0$$, $$0 \le \phi (k) \le \min \{\phi (k -1) + 1, N\}$$ for all $$k\ge 1$$, and N is a nonnegative integer. Despite the advantages of this strategy, Zhang and Hager [25] found that it suffers from various weaknesses. They therefore proposed a nonmonotone strategy based on a weighted average of previous consecutive iterates. Moreover, using an adaptive convex combination of $$f_{l(k)}$$ and $$f_{k}$$, Amini et al. [2] proposed an effective substitute for (3.1).

To counteract the adverse effect of monotonicity, here we introduce the following hybrid nonmonotone LS condition:

$$f(x_{k}+\alpha _{k} p_{k})\le D_{k}+ \delta \alpha _{k} \biggl(g^{T}_{k} p_{k}-\frac{1}{2} \alpha _{k} p^{T}_{k}B_{k}p_{k} \biggr),$$
(3.2)

where $$\delta \in (0,1)$$ is a constant and

$$D_{k}= \textstyle\begin{cases} f_{k}, & k=0, \\ (\xi _{k}f_{l(k)}+f_{k} )/ ({\xi _{k}+1} ), & k \ge 1 , \end{cases}$$
(3.3)

with $$\xi _{k}\in [0,1]$$. As we see, the definition of the mean values $$D_{k}$$ implies that each $$D_{k}$$ is a convex combination of $$f_{l(k)}$$ and $$f_{k}$$. For a given $$\xi _{0}\in [0,1]$$, to calculate $$\xi _{k}$$ we employ the following update formula:

$$\xi _{k} = \textstyle\begin{cases} {\xi _{0}}/{2}, & k=1, \\ (\xi _{k-1}+\xi _{k-2} )/{2},& k\ge 2. \end{cases}$$
(3.4)

The new nonmonotone LS is performed in a backtracking scheme. That is, the step length $$\alpha _{k}$$ is the largest member of $$\{\rho ^{j} \beta _{k}\}_{j \ge 0}$$ with $$\rho \in (0,1)$$ and $$\beta _{k}>0$$ which satisfies inequality (3.2) [2, 19]. Similar to [13], we set $$\beta _{k}=-{g^{T}_{k} p_{k}}/{\pi _{k}\|p_{k}\|^{2}}$$, where $$\pi _{k}=\|y_{k}\|/\|p_{k}\|$$.
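The pieces (3.2)–(3.4) can be sketched as follows (function names and default parameter values are our own assumptions, not from the paper):

```python
import numpy as np

def xi_sequence(xi0, k):
    """xi_k from (3.4): xi_1 = xi_0/2, xi_k = (xi_{k-1} + xi_{k-2})/2 for k >= 2."""
    xi_prev2, xi_prev1 = xi0, xi0 / 2.0
    if k == 1:
        return xi_prev1
    for _ in range(2, k + 1):
        xi_prev2, xi_prev1 = xi_prev1, (xi_prev1 + xi_prev2) / 2.0
    return xi_prev1

def D_value(f_k, f_lk, xi_k):
    """D_k from (3.3) for k >= 1: convex combination of f_{l(k)} and f_k."""
    return (xi_k * f_lk + f_k) / (xi_k + 1.0)

def nonmonotone_step(f, x, p, g, B_diag, D_k, delta=1e-4, rho=0.5, beta=1.0):
    """Largest alpha in {rho^j * beta} satisfying the hybrid condition (3.2)."""
    gTp, pBp = float(g @ p), float(p @ (B_diag * p))
    alpha = beta
    while f(x + alpha * p) > D_k + delta * alpha * (gTp - 0.5 * alpha * pBp):
        alpha *= rho
    return alpha

# Example on f(x) = 0.5 x^2 with the steepest descent direction
f = lambda x: 0.5 * float(x @ x)
x = np.array([1.0]); g = x.copy(); p = -g; B = np.array([1.0])
xi2 = xi_sequence(0.5, 2)                    # (0.25 + 0.5)/2 = 0.375
D = D_value(f(x), f(x), xi2)                 # here f_{l(k)} = f_k = 0.5
alpha = nonmonotone_step(f, x, p, g, B, D)   # the full step alpha = 1 is accepted
```

Because $$D_{k}\ge f_{k}$$, condition (3.2) is weaker than the corresponding monotone Armijo-type condition and accepts longer steps.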

Let $$p_{k}$$ be the solution of (1.3) in which $$B_{k}$$ is a diagonal matrix. To determine whether a trial step will be accepted, we compute $$\hat{r}_{k}$$, the ratio between the actual reduction of $$f$$ and the reduction predicted by the model function $$\psi _{k}(p)$$, by the following relation:

$$\hat{r}_{k}= \frac{D_{k} -f(x_{k}+p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})},$$
(3.5)

where $$D_{k}$$ is computed by (3.3).

The new TR ratio implies that the suggested algorithm enjoys the best convergence results by exploiting a stronger nonmonotone strategy far from the solution and a weaker one close to the solution; see [25].

Now, we can present the framework of the new algorithm as follows (see Algorithm 1).
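Algorithm 1 itself appears as a figure in the original article. Based on the quantities defined above and the update rules used in the analysis of Sect. 4 (the ratio test with $$\mu _{0}$$, radius factors $$0<c_{1}<1<c_{2}$$, and the LS fallback), one plausible reading of its main loop is the following sketch; all names, default values, and the simplified scalar diagonal safeguard are our assumptions, not the authors' exact method:

```python
import numpy as np

def nmtr_rqn(f, grad, x0, tol=1e-6, max_iter=200,
             mu0=0.1, c1=0.5, c2=2.0, delta=1e-4, rho=0.5, N=5, xi0=0.5):
    """Sketch of the main loop; parameter names follow the text."""
    x = np.asarray(x0, dtype=float)
    Delta = 1.0
    xi_prev2 = xi_prev1 = xi0
    b = np.ones_like(x)                          # diagonal Hessian estimate B_k
    f_hist = [f(x)]
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        lam = 0.5 * Delta * min(np.linalg.norm(g), abs(f_hist[-1]), 1.0)  # (1.4)
        p = -g / (b + lam)                       # regularized QN step, cf. (1.5)
        f_l = max(f_hist[-(N + 1):])             # f_{l(k)} over a sliding window
        xi = xi0 / 2.0 if k == 1 else (xi_prev1 + xi_prev2) / 2.0         # (3.4)
        D = f_hist[-1] if k == 0 else (xi * f_l + f_hist[-1]) / (xi + 1.0)  # (3.3)
        pred = -(g @ p) - 0.5 * (p @ (b * p)) - 0.5 * lam * (p @ p)
        r_hat = (D - f(x + p)) / pred            # nonmonotone TR ratio (3.5)
        if r_hat >= mu0:                         # successful step: shrink Delta
            step = p
            Delta = c1 * Delta
        else:                                    # fall back on the LS (3.2)
            alpha, gTp, pBp = 1.0, float(g @ p), float(p @ (b * p))
            while f(x + alpha * p) > D + delta * alpha * (gTp - 0.5 * alpha * pBp):
                alpha *= rho
            step = alpha * p
            Delta = c2 * Delta
        y = grad(x + step) - g
        x = x + step
        # simplified scalar safeguard in place of the full update (2.8)-(2.9)
        if step @ y > 0:
            b = np.full_like(x, min(max((step @ y) / (step @ step), 1e-4), 1e4))
        xi_prev2, xi_prev1 = xi_prev1, xi
        f_hist.append(f(x))
    return x

# Example: minimize f(x) = 0.5 ||x||^2 starting from (3, -4)
sol = nmtr_rqn(lambda x: 0.5 * float(x @ x), lambda x: x.copy(),
               np.array([3.0, -4.0]))
```

Note that, unlike a classical TR method, a larger $$\Delta _{k}$$ here means a larger regularization parameter $$\lambda _{k}$$ and hence a shorter step, which is why unsuccessful iterations enlarge $$\Delta _{k}$$ in the convergence analysis of Sect. 4.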

## 4 Convergence analysis

In this section, we examine the convergence properties of the suggested algorithm. To this end, the following standard assumption is needed [18].

### Assumption 4.1

The level set $$\varGamma (x_{0}) = \{x\vert f (x)\le f (x_{0})\}$$ satisfies $$\varGamma (x_{0}) \subset \Im$$, where $$\Im$$ is a closed and bounded subset of $$\mathbb{R}^{n}$$.

### Remark 1

Let $$f(x)$$ be a twice continuously differentiable function. Therefore, Assumption 4.1 implies that there exists a constant $$M_{1} > 0$$ such that

$$\bigl\Vert \nabla ^{2} f(x) \bigr\Vert \le M_{1}, \quad \forall x\in \Im .$$
(4.1)

Therefore, using the mean value theorem, one can conclude that

$$\bigl\Vert g(x) - g(y) \bigr\Vert \le M_{1} \Vert x - y \Vert ,\quad \forall x, y \in \Im ,$$

which means that $$g(x)$$ is Lipschitz continuous on $$\Im$$.

To establish global convergence of the iterative scheme $$x_{k+1} = x_{k} + \alpha _{k} p_{k}$$, with the backtracking LS satisfying (3.2), we assume that Assumption 4.1 holds and the direction $$p_{k}$$ satisfies the following sufficient descent conditions:

$$g^{T}_{k} p_{k}\le - a_{1} \Vert g_{k} \Vert ^{2} \quad \text{and} \quad \Vert p_{k} \Vert \le a_{2} \Vert g_{k} \Vert ,$$
(4.2)

where $$a_{1}$$ and $$a_{2}$$ are two positive real-valued constants. For convenience in the discussion, we consider two index sets as follows:

$$\mathcal{I}=\{k:\hat{r}_{k}\ge \mu _{0}\} \quad \text{and} \quad \mathcal{J}=\{k:\hat{r}_{k}< \mu _{0}\}.$$

### Lemma 4.1

Suppose that the sequence $$\{B_{k}\}$$ is generated by Algorithm 1. Then for any k, $$B_{k}$$ is positive definite.

### Proof

According to the definition $$B_{k}=\mathrm{diag}(\bar{b}^{1}_{k},\bar{b}^{2}_{k},\dots ,\bar{b}^{n}_{k})$$ and relation (2.9), this is obvious. □

### Lemma 4.2

Suppose that sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then we have

$$\bigl\vert f_{k}-f_{k+1}- \bigl(\psi _{k}(0)-\psi _{k}(p_{k}) \bigr) \bigr\vert \le O\bigl( \Vert p_{k} \Vert ^{2}\bigr).$$

### Proof

Using the Taylor expansion with (2.11) and Assumption 4.1, we get

\begin{aligned} \bigl\vert f_{k}-f_{k+1}-\bigl(\psi _{k} (0)- \psi _{k} (p_{k} )\bigr) \bigr\vert &= \biggl\vert -\frac{1}{2}{p_{k}}^{T} \nabla ^{2} f(x_{k})p_{k}+\frac{1}{2}{p_{k}}^{T}B_{k} p_{k} \biggr\vert +O \bigl({ \Vert p_{k} \Vert }^{2} \bigr) \\ &= \frac{1}{2} \bigl\vert {p_{k}}^{T}\bigl(B_{k}- \nabla ^{2} f(x_{k})\bigr)p_{k} \bigr\vert +O\bigl({ \Vert p_{k} \Vert }^{2}\bigr) \\ &\le \frac{1}{2}(M+M_{1}){ \Vert p_{k} \Vert }^{2}+O\bigl({ \Vert p_{k} \Vert }^{2} \bigr)=O\bigl({ \Vert p_{k} \Vert }^{2}\bigr). \end{aligned}

Hence, the proof is complete. □

### Lemma 4.3

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then for all $$k\in \mathbb{N}\cup \{0\}$$, we have $$x_{k} \in \Gamma (x_{0})$$.

### Proof

We consider two cases.

Case 1. If $$k\in \mathcal{I}$$, then from (1.6) and (3.5), we can write

$$f_{k+1}\le D_{k}-{\mu }_{0} \gamma \Vert g_{k} \Vert \min \biggl\{ \Vert p_{k} \Vert , \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} ,$$
(4.3)

which shows that $$f_{k+1}\le D_{k}$$, for all $$k \in \mathcal{I}$$.

Case 2. If $$k\in \mathcal{J}$$, then the trial step is rejected and LS must be performed. Through (1.7) we know that $$g ^{T}_{k} p_{k} \le 0$$ for all k. Therefore, from this inequality along with (3.2), we conclude

\begin{aligned} f_{k+1}-D_{k} \le \delta \alpha _{k} \biggl( g^{T}_{k} p_{k}- \frac{1}{2}\alpha _{k} p^{T}_{k} B_{k}p_{k} \biggr) \le \delta \alpha _{k}g^{T}_{k} p_{k} \le 0. \end{aligned}
(4.4)

Hence, we have $$f_{k+1}\le D_{k}$$ for all $$k \in \mathcal{J}$$. In addition, using the definition of $$f_{l(k)}$$ and (3.3), we have

$$D_{k}=\frac{\xi _{k} f_{l(k)}+f_{k}}{{\xi _{k} }+1}\le \frac{\xi _{k} f_{l(k)}+f_{l(k)}}{{\xi _{k} }+1}= f_{l(k)}.$$
(4.5)

From (4.3) and (4.4) along with (4.5), we have $$f_{k+1}\le D_{k}\le f_{l(k)}\le f_{0}$$ for all $$k\in \mathbb{N}\cup \{0\}$$. Therefore, the sequence $$\{x_{k}\}$$ is contained in $$\Gamma (x_{0})$$. □

### Lemma 4.4

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then the sequence $$\{f_{l(k)}\}$$ is convergent.

### Proof

From the definition of $$f_{l(k+1)}$$ and Lemma 4.3, we have

$$f_{l(k+1)}=\max_{0\le j\le \phi (k+1)}\{f_{k+1-j}\} \le \max _{0\le j\le \phi (k)+1}\{f_{k+1-j}\}=\max \{f_{l(k)}, f_{k+1} \}=f_{l(k)}.$$

Thus, $$\{f_{l(k)}\}$$ is a nonincreasing sequence. Moreover, by Assumption 4.1, $$\{f_{k}\}$$ is bounded below, and hence so is $$\{f_{l(k)}\}$$. Therefore, the sequence $$\{f_{l(k)}\}$$ is convergent. □

### Lemma 4.5

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then we have

$$f_{k+1}\le D_{k+1}.$$

### Proof

From definition of $$f_{l(k+1)}$$, we have $$f_{k+1}\le f_{l(k+1)}$$, for all $$k\in \mathbb{N}$$. Thus, according to (3.3), we can write

$$f_{k+1}=\frac{\xi _{k+1}f_{k+1}+f_{k+1}}{\xi _{k+1}+1}\le \frac{\xi _{k+1}f_{l(k+1)}+f_{k+1}}{\xi _{k+1}+1}=D_{k+1}, \quad \forall k \in \mathbb{N}\cup \{0\}.$$

This completes the proof of the lemma. □

### Lemma 4.6

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then Step 4 of the algorithm is well defined.

### Proof

First, suppose by contradiction that there exists $$k \in \mathcal{J}$$ such that

$$f(x_{k}+\alpha _{k} p_{k}) > D_{k}+\delta \alpha _{k} \biggl( g^{T}_{k} p_{k}-\frac{1}{2}\alpha _{k} p^{T}_{k} B_{k} p_{k} \biggr).$$

Using Taylor expansion and Lemma 4.5, we obtain

$$f_{k}+\alpha _{k} g^{T}_{k} p_{k} + \frac{1}{2} \alpha ^{2}_{k} p^{T}_{k} \nabla ^{2} f(\zeta _{k}) p_{k}>f_{k}+ \delta \alpha _{k} \biggl( g^{T}_{k} p_{k}- \frac{1}{2}\alpha _{k} p^{T}_{k} B_{k} p_{k} \biggr)$$

for some $$\zeta _{k} \in (x_{k}, x_{k}+\alpha _{k}p_{k})$$. Therefore, using (2.11) and Assumption 4.1, we can write

$$(1-\delta ) g^{T}_{k} p_{k} + \frac{1}{2} \alpha _{k} \Vert p_{k} \Vert ^{2} ( \delta M+M_{1})>0.$$

If $$\alpha _{k}\to 0$$, then we get

$$(1-\delta ) g^{T}_{k} p_{k} >0.$$
(4.6)

Due to the fact that $$\delta \in (0, 1)$$, inequality (4.6) leads us to $$g^{T}_{k} p_{k}> 0$$, which contradicts (1.7). So, Step 4 of Algorithm 1 is well defined. □

### Lemma 4.7

Assume that the sequence $$\{x_{k} \}$$ is generated by Algorithm 1. Then for all $$k\in \mathcal{J}$$, the step length $$\alpha _{k}$$ satisfies

$$\alpha _{k}>\frac{(1-\delta )\rho a_{1} }{a^{2}_{2}(\delta M+M_{1})}.$$

### Proof

Assume $$\alpha =\frac{\alpha _{k}}{\rho}$$. It follows from Step 4 of Algorithm 1 that

$$f(x_{k}+\alpha p_{k})> D_{k}+\delta \alpha \biggl( g^{T}_{k} p_{k}- \frac{1}{2}\alpha p^{T}_{k} B_{k} p_{k} \biggr).$$
(4.7)

By Taylorâ€™s expansion, we have

$$f(x_{k}+\alpha p_{k})=f_{k}+ \alpha g^{T}_{k} p_{k}+\frac{1}{2} \alpha ^{2} p_{k}^{T} \nabla ^{2} f(\zeta _{k})p_{k},$$
(4.8)

where $$\zeta _{k} \in (x_{k}, x_{k}+\alpha p_{k})$$. From (4.1), (4.7), (4.8), and Lemma 4.5, we obtain

$$\delta \alpha \biggl( g^{T}_{k} p_{k}- \frac{1}{2}\alpha p^{T}_{k} B_{k} p_{k} \biggr)< \alpha g^{T}_{k} p_{k}+\frac{1}{2}\alpha ^{2} M_{1} \Vert p_{k} \Vert ^{2},$$

therefore,

$$-(1-\delta ) g^{T}_{k} p_{k}< \frac{1}{2}\alpha M_{1} \Vert p_{k} \Vert ^{2}+ \frac{1}{2}\delta \alpha p^{T}_{k} B_{k} p_{k} \le \frac{1}{2}\alpha \Vert p_{k} \Vert ^{2}( \delta M+M_{1}).$$
(4.9)

The combination of (4.2) and (4.9) implies that

$$(1-\delta ) \frac{a_{1}}{a^{2}_{2}} \Vert p_{k} \Vert ^{2} < \alpha \Vert p_{k} \Vert ^{2} ( \delta M+M_{1}),$$

hence $$\alpha > \frac{(1-\delta ) a_{1}}{a^{2}_{2}(\delta M+M_{1})}$$. Since $$\alpha _{k}=\rho \alpha$$, the stated bound follows, which completes the proof of the lemma. □

### Lemma 4.8

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then we have

$$\lim_{k\to \infty} f_{l(k)}=\lim _{k\to \infty } f_{k}.$$
(4.10)

### Proof

We consider the following two cases:

Case 1. $$k \in \mathcal{I}$$. It follows from (3.5) and Lemma 4.5 that

$$\hat{r}_{k}= \frac {{D}_{k}-f(x_{k}+ p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})}\ge \frac {f_{k}-f(x_{k}+ p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})}\ge \mu _{0}.$$

Now, arguing as in the proof of Theorem 3.2 in [1], we can deduce that (4.10) holds.

Case 2. $$k \in \mathcal{J}$$. For $$k>N$$, using (3.2) and Lemma 4.3, we can write

\begin{aligned} f (x_{l(k)} ) &= f (x_{l(k)-1}+\alpha _{l(k)-1} p_{l(k)-1} ) \\ & \le D_{l(k)-1}+\delta \alpha _{l(k)-1} \biggl( g^{T}_{l(k)-1} p_{l(k)-1}- \frac{1}{2} \alpha _{l(k)-1}p^{T}_{l(k)-1} B_{l(k)-1} p_{l(k)-1} \biggr) \\ &\le f(x_{l(k)-1})+\delta \alpha _{l(k)-1} g^{T}_{l(k)-1} p_{l(k)-1}. \end{aligned}

So, from Lemma 4.4, since $$\alpha _{l(k)-1} g^{T}_{l(k)-1} p_{l(k)-1}<0$$, we can conclude that

$$\lim_{k\to \infty}\alpha _{l(k)-1} g^{T}_{l(k)-1} p_{l(k)-1}=0.$$
(4.11)

Now, from (1.7) along with (4.2), we have

\begin{aligned} g^{T}_{k} p_{k} & \le - \gamma \Vert g_{k} \Vert \min \biggl\lbrace \Vert p_{k} \Vert , \frac{ \Vert g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\rbrace \le -\gamma \frac{ \Vert p_{k} \Vert }{a_{2}} \min \biggl\lbrace \Vert p_{k} \Vert , \frac{ \Vert p_{k} \Vert }{a_{2} M} \biggr\rbrace \\ & \le -\frac{\gamma}{a_{2}} \min \biggl\lbrace 1, \frac{1}{a_{2}M} \biggr\rbrace \Vert p_{k} \Vert ^{2}. \end{aligned}
(4.12)

Thus, using (4.11) and (4.12), it follows that

$$\lim_{k\to \infty}\alpha _{l(k)-1} \Vert p_{l(k)-1} \Vert =0.$$

The remainder of the proof proceeds as in [10] and is omitted here. □

### Corollary 4.1

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. Then we have

$$\lim_{k\to \infty} D_{k}=\lim_{k\to \infty} f_{k}.$$

### Proof

From Lemmas 4.3 and 4.5, we have $$f_{k}\le D_{k}\le f_{l(k)}$$. This completes the proof by using Lemma 4.8. □

### Lemma 4.9

Suppose that the sequence $$\{x_{k}\}$$ is generated by Algorithm 1. If the sequence $$\{x_{k}\}$$ does not converge to a stationary point, i.e., there exists a constant $$\varepsilon >0$$ such that

$$\Vert g_{k} \Vert > \varepsilon,$$
(4.13)

holds for all $$k\in \mathbb{N}$$, then

$$f_{k+1}\le D_{k} -\varphi \min \biggl\{ \Vert p_{k} \Vert , \frac{\varepsilon}{M} \biggr\} ,$$
(4.14)

holds for all $$k\in \mathbb{N}$$.

### Proof

We consider two cases as follows:

Case 1. $$k\in \mathcal{I}$$. From (1.6), (2.11), (3.5), and (4.13), we have

$$f_{k+1}\le D_{k} -{\mu }_{0} \gamma \varepsilon \min \biggl\{ \Vert p_{k} \Vert ,\frac{\varepsilon}{M} \biggr\} .$$
(4.15)

Case 2. $$k\in \mathcal{J}$$. Similar to Case 2 in the proof of Lemma 4.3, it follows that

$$f_{k+1} -D_{k}\le \delta \alpha _{k} g^{T}_{k} p_{k}.$$

Now from (1.7), (2.11), (4.13), and Lemma 4.7, we get

$$f_{k+1}\le D_{k} - \gamma \delta \frac{(1-\delta )\rho a_{1} }{a^{2}_{2}(\delta M+ M_{1})}\varepsilon \min \biggl\{ \Vert p_{k} \Vert ,\frac{\varepsilon}{M} \biggr\} .$$
(4.16)

Let $$\varphi =\min \lbrace \mu _{0}\varepsilon \gamma ,\gamma \varepsilon \delta \frac{(1-\delta )\rho a_{1} }{a^{2}_{2}(\delta M+M_{1})} \rbrace$$. Combining (4.15) and (4.16), we conclude that relation (4.14) is valid for all $$k \in \mathbb{N}$$. □

In this situation, it is possible to prove the following convergence theorems for Algorithm 1.

### Theorem 4.1

Algorithm 1 either terminates in finitely many iterations, or generates an infinite sequence $$\{x_{k}\}$$ which satisfies

$$\liminf_{k\to \infty} \Vert g_{k} \Vert =0.$$
(4.17)

### Proof

If Algorithm 1 terminates in finitely many iterations, the theorem is true. If (4.17) is not true, then there exists a constant $$\varepsilon > 0$$ such that (4.13) holds.

Let $$S = \lbrace k:\hat{r}_{k}\ge \mu _{0} \rbrace$$. We prove that $$\lambda _{k}\to \infty$$ and $$\Delta _{k}\to \infty$$, as $$k\to \infty$$. From

$$\lambda _{k}=\frac{1}{2}\Delta _{k} \min \bigl\lbrace \Vert g_{k} \Vert , \vert f_{k} \vert ,1 \bigr\rbrace ,$$

it follows that $$\lambda _{k}\to \infty \Longleftrightarrow \Delta _{k}\to \infty$$.

We consider the following cases:

Case 1. If S is a finite set, then there exists some $$\bar{k} > 0$$ such that $$\hat{r}_{k}<\mu _{0}$$ holds for all $$k> \bar{k}$$. Thus we have that $$\Delta _{k+1}\ge c_{2} \Delta _{k}$$ holds for all $$k > \bar{k}$$. Since $$c_{2} > 1$$, we conclude that

$$\Delta _{k}\to \infty , \quad \text{and thus}\quad \lambda _{k}\to \infty \quad ( \text{as } k \to \infty ).$$
(4.18)

Case 2. If S is an infinite set, then from Lemma 4.9, we have that

$$f_{k+1}- D_{k}\le -\varphi \min \biggl\{ \Vert p_{k} \Vert , \frac{\varepsilon}{M} \biggr\} \le 0$$

holds for all $$k\in S$$. Thanks to Corollary 4.1, we get

$$\lim_{ k\to \infty , k\in S} \min \biggl\{ \Vert p_{k} \Vert , \frac{\varepsilon}{M} \biggr\} =0,$$

which implies that

$$\lim_{ k\to \infty , k\in S} p_{k}=0.$$
(4.19)

From the above equality, together with relation (1.5), it follows that

$$\lambda _{k}\to \infty ,\quad \text{and thus} \quad \Delta _{k}\to \infty \quad ( \text{as } k \to \infty , k\in S).$$
(4.20)

Case 3. Suppose that $$S^{c}$$ denotes the complementary set of S and $$S ^{c}$$ is an infinite set. Now we only need to prove that $$\lambda _{k}\to \infty$$, as $$k\to \infty$$ and $$k \in S^{c}$$. Let $$I^{*}= \lbrace k_{i}:k_{i} - 1\in S \text{ and } k_{i}\in S^{c} \rbrace$$, then $$\{k_{i} -1\}$$ is an infinite subset of S. Using (4.20), we have $$\Delta _{k_{i} -1} \to \infty$$ as $$i\to \infty$$. From $$k_{i} - 1\in S$$, we conclude that $$\hat{r}_{k_{i} -1}\ge \mu _{0}$$ and $$\Delta _{k_{i}}=c_{1}\Delta _{k_{i} -1}$$ or $$\Delta _{k_{i}}=\Delta _{k_{i} -1}$$ hold. Since $$0< c_{1}<1$$, $$\Delta _{k_{i}}\ge c_{1}\Delta _{k_{i} -1}$$ holds for all $$k_{i}\in I^{*}$$.

Hence, we have

$$\Delta _{k_{i}}\to \infty \quad \text{as } k_{i} \in I ^{*}, i\to \infty .$$
(4.21)

For any $$k\notin S$$, there exists an index $$k_{i}$$ such that $$k_{i} \le k$$ and all iterations between $$k_{i}$$ and k are unsuccessful. According to the construction of Algorithm 1, we can write

$$\Delta _{k}=c_{2} \Delta _{k-1}=\cdots =c_{2}^{k-k_{i}}\Delta _{k_{i}} \ge \Delta _{k_{i}},$$
(4.22)

thus it follows from (4.21) and (4.22) that

$$\Delta _{k}\to \infty ,\quad \text{and thus}\quad \lambda _{k} \to \infty\quad \bigl( \text{as } k \to \infty , k\in S^{c}\bigr).$$
(4.23)

Combining (4.20) and (4.23), we have that

$$\Delta _{k}\to \infty , \quad \text{and thus} \quad \lambda _{k} \to \infty\quad ( \text{as } k \to \infty ).$$
(4.24)

Now from (1.5) and (4.24), we can write

$$\Vert p_{k} \Vert = \bigl\Vert (B_{k}+\lambda _{k} I)^{-1}g_{k} \bigr\Vert \le {\lambda}^{-1}_{k} \Vert g_{k} \Vert \to 0,$$
(4.25)

for all $$k\in \mathbb{N}$$. Thanks to (4.25), it follows that $$p_{k} \to 0$$. Therefore, using (1.6), (2.11), and Lemma 4.2 as $$k \to \infty$$, we get

$$\biggl\vert \frac {f_{k}-f(x_{k}+ p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})}-1 \biggr\vert \le \frac{O ( \Vert p_{k} \Vert ^{2} )}{\gamma \varepsilon \min \lbrace \Vert p_{k} \Vert ,\frac{\varepsilon}{M} \rbrace} \to 0,$$

from which we can deduce that

$$\hat{r}_{k}= \frac {{D}_{k}-f(x_{k}+ p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})}\ge \frac {f_{k}-f(x_{k}+ p_{k})}{\psi _{k}(0)-\psi _{k}(p_{k})}\ge \mu _{0},$$
(4.26)

for all sufficiently large k. The construction of Algorithm 1 and (4.26) shows that there exists a positive constant $$\Delta ^{*}$$ such that $$\Delta _{k}\le \Delta ^{*}$$ holds for all sufficiently large k, which contradicts (4.24). The proof is complete. □

### Theorem 4.2

Suppose the infinite sequence $$\{x_{k}\}$$, convergent to $${x^{*}}$$, is generated by Algorithm 1. In addition, assume that $$\nabla ^{2} f(x^{*})$$ is positive definite. If the condition

$$\lim_{k\to \infty } \frac{ \Vert (\nabla ^{2} f(x^{*})-B_{k} ) p_{k} \Vert }{ \Vert p_{k} \Vert }=0,$$
(4.27)

holds, then the sequence $$\{x_{k}\}$$ converges to $$x^{*}$$ superlinearly.

### Proof

Let $$p_{k}$$ be the exact solution of (1.3) which satisfies

$$(B_{k}+\lambda _{k} I ) p_{k}=-g_{k}, \quad \text{or}\quad p_{k}= - (B_{k}+\lambda _{k} I )^{-1} g_{k}.$$
(4.28)

We show that $$\hat{r}_{k} \ge \mu _{1}$$, for k sufficiently large and $$\Delta _{k}\to \infty$$ as $$k\to \infty$$. First, we define

$$r_{k}= \frac{f_{k} -f_{k+1}}{\psi _{k}(0) -\psi _{k}(p_{k})},$$
(4.29)

and then we prove that $$r_{k}\ge \mu _{1}$$. From (4.29) it follows that

$$r_{k}\ge \mu _{1}\quad \Longleftrightarrow\quad {f_{k} -f_{k+1}}-\mu _{1} \bigl( {\psi _{k}(0) -\psi _{k}(p_{k})} \bigr)\ge 0,$$

thus we can express

$$r_{k}\ge \mu _{1} \quad \Longleftrightarrow\quad f_{k+1} -\psi _{k}(p_{k})+ (1-\mu _{1} ) \bigl(\psi _{k}(p_{k})-f_{k} \bigr)\le 0.$$
(4.30)

From the direct computation, we obtain

\begin{aligned} f_{k}-\psi _{k}(p_{k})& = -g^{T}_{k} p_{k}-\frac{1}{2}p^{T}_{k} B_{k} p_{k}-\frac{1}{2} \lambda _{k} \Vert p_{k} \Vert ^{2} \\ & =- [ g_{k}+B_{k} p_{k}+\lambda _{k} p_{k} ] ^{T} p_{k} + \frac{1}{2}p^{T}_{k} B_{k} p_{k}+\frac{1}{2} \lambda _{k} \Vert p_{k} \Vert ^{2} \\ & = \frac{1}{2}p^{T}_{k} B_{k} p_{k}+\frac{1}{2} \lambda _{k} \Vert p_{k} \Vert ^{2} \ge \frac{1}{2}\lambda _{k} \Vert p_{k} \Vert ^{2}, \end{aligned}

where the last equality is obtained from (4.28). Therefore, we have that

$$f_{k}-\psi _{k}(p_{k}) \ge \frac{1}{2}\lambda _{k} \Vert p_{k} \Vert ^{2}.$$
(4.31)

By Taylor expansion, we can write

$$f_{k+1}=f_{k}+g^{T}_{k} p_{k}+\frac{1}{2}p^{T}_{k} \nabla ^{2} f( \zeta _{k}) p_{k},$$
(4.32)

where $$\zeta _{k} \in (x_{k}, x_{k+1})$$. From (1.3) and (4.32), we get

\begin{aligned} f_{k+1} -\psi _{k}(p_{k})& =\frac{1}{2}p^{T}_{k} \bigl(\nabla ^{2} f(\zeta _{k})-B_{k} \bigr)p_{k} -\frac{1}{2} \lambda _{k} \Vert p_{k} \Vert ^{2} \\ & =\frac{1}{2}p^{T}_{k} \bigl(\nabla ^{2} f(\zeta _{k})-\nabla ^{2} f \bigl(x^{*}\bigr) \bigr)p_{k}+\frac{1}{2}p^{T}_{k} \bigl(\nabla ^{2} f\bigl(x^{*}\bigr)-B_{k} \bigr)p_{k} -\frac{1}{2} \lambda _{k} \Vert p_{k} \Vert ^{2} \\ & \le \frac{1}{2}p^{T}_{k} \bigl( \nabla ^{2} f(\zeta _{k})-\nabla ^{2} f \bigl(x^{*}\bigr) \bigr)p_{k}+\frac{1}{2}p^{T}_{k} \bigl(\nabla ^{2} f\bigl(x^{*}\bigr)-B_{k} \bigr)p_{k} \\ & \le \frac{1}{2} \bigl\Vert \nabla ^{2} f(\zeta _{k})-\nabla ^{2} f\bigl(x^{*}\bigr) \bigr\Vert \Vert p_{k} \Vert ^{2}+\frac{1}{2} \bigl\Vert \bigl(\nabla ^{2} f\bigl(x^{*} \bigr)-B_{k}\bigr)p_{k} \bigr\Vert \Vert p_{k} \Vert \\ & = \frac{1}{2} \biggl( \bigl\Vert \nabla ^{2} f(\zeta _{k})-\nabla ^{2} f\bigl(x^{*}\bigr) \bigr\Vert +\frac{ \Vert (\nabla ^{2} f(x^{*})-B_{k})p_{k} \Vert }{ \Vert p_{k} \Vert } \biggr) \Vert p_{k} \Vert ^{2}. \end{aligned}
(4.33)

Since $$\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})$$, using (4.27) and (4.33), we obtain

$$f_{k+1} -\psi _{k}(p_{k}) \le 0.$$
(4.34)

It follows from (4.31), (4.34), and $$0< \mu _{1}<1$$ that

$$f_{k+1} -\psi _{k}(p_{k})+ (1-\mu _{1} ) \bigl(\psi _{k}(p_{k})-f_{k} \bigr)\le -\frac{1}{2} (1-\mu _{1} ) \lambda _{k} \Vert p_{k} \Vert ^{2}\le 0,$$

therefore, from (4.30), we have $$r_{k}\ge \mu _{1}$$. Now using (3.5) and Lemma 4.5, we can write

$$\hat{r}_{k}=\frac{D_{k} -f_{k+1}}{\psi _{k}(0) -\psi _{k}(p_{k})}\ge \frac{f_{k} -f_{k+1}}{\psi _{k}(0) -\psi _{k}(p_{k})}=r_{k} \ge \mu _{1},$$

for $$k$$ sufficiently large. Using Step 4 of Algorithm 1, we have $$\Delta _{k+1}=c_{2}\Delta _{k}$$, which implies that $$\lim_{k\to \infty}\Delta _{k} = \infty $$. Therefore, since $$\|p_{k}\|\le \Delta _{k}$$ holds with $$p_{k}\to 0$$, the complementarity condition (1.4) gives

$$\lim_{k\to \infty}\lambda _{k} = 0 .$$
(4.35)

By the Taylor expansion, we can write

$$g_{k+1}= g_{k}+\nabla ^{2} f(\zeta _{k})p_{k},$$
(4.36)

where $$\zeta _{k} \in (x_{k},x_{k+1})$$. So, from (4.28) and (4.36) for k sufficiently large, we have

\begin{aligned} \Vert g_{k+1} \Vert & = \bigl\Vert \nabla ^{2} f(\zeta _{k})p_{k}- B_{k} p_{k} -\lambda _{k} p_{k} \bigr\Vert \\ & = \bigl\Vert \bigl(\nabla ^{2} f(\zeta _{k})-\nabla ^{2} f\bigl(x^{*}\bigr) \bigr)p_{k} - \bigl( B_{k}-\nabla ^{2} f\bigl(x^{*}\bigr) \bigr)p_{k}- \lambda _{k} p_{k} \bigr\Vert \\ & \le \biggl[ \bigl\Vert \bigl(\nabla ^{2} f(\zeta _{k})-\nabla ^{2} f\bigl(x^{*}\bigr) \bigr) \bigr\Vert + \frac{ \Vert ( B_{k}-\nabla ^{2} f(x^{*}) )p_{k} \Vert }{ \Vert p_{k} \Vert } +\lambda _{k} \biggr] \Vert p_{k} \Vert . \end{aligned}

Thus we can write

$$\frac{ \Vert g_{k+1} \Vert }{ \Vert p_{k} \Vert }\le \biggl[ \bigl\Vert \bigl(\nabla ^{2} f( \zeta _{k})-\nabla ^{2} f \bigl(x^{*}\bigr) \bigr) \bigr\Vert + \frac{ \Vert ( B_{k}-\nabla ^{2} f(x^{*}) )p_{k} \Vert }{ \Vert p_{k} \Vert } +\lambda _{k} \biggr].$$
(4.37)

The right-hand side of (4.37) converges to 0, since $$\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})$$ and by (4.27) and (4.35). Hence, we deduce that

$$\lim_{k\to \infty}\frac{ \Vert g_{k+1} \Vert }{ \Vert p_{k} \Vert }=0.$$
(4.38)

Since $$f$$ is twice continuously differentiable, by Assumption 4.1 we conclude that there exists $$\tau > 0$$ such that

$$\tau \bigl\Vert x_{k+1}-x^{*} \bigr\Vert \le \Vert g_{k+1} \Vert .$$
(4.39)

From (4.39), it follows that

\begin{aligned} \frac{ \Vert g_{k+1} \Vert }{ \Vert p_{k} \Vert }\ge \frac{\tau \Vert x_{k+1}-x^{*} \Vert }{ \Vert x_{k+1}-x_{k} \Vert } &\ge \frac{\tau \Vert x_{k+1}-x^{*} \Vert }{ \Vert x_{k+1}-x^{*} \Vert + \Vert x_{k}-x^{*} \Vert } \\ &= \tau \frac{\frac{ \Vert x_{k+1}-x^{*} \Vert }{ \Vert x_{k}-x^{*} \Vert }}{\frac{ \Vert x_{k+1}-x^{*} \Vert }{ \Vert x_{k}-x^{*} \Vert }+1}. \end{aligned}
(4.40)

In view of (4.38), from (4.40) we have $$\lim_{k\to \infty} \frac{ \| x_{k+1}-x^{*}\|}{\| x_{k}-x^{*}\|}=0$$. Therefore, the convergence rate of $$\{x_{k}\}$$ is superlinear. □
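The superlinear rate established above can be observed numerically. The following minimal Python sketch is illustrative only and is not the paper's CNTR implementation: the test function $$f(x)=x^{2}+e^{x}$$ and the regularization choice $$\lambda _{k}=|g_{k}|$$ are our own assumptions. It runs a one-dimensional regularized Newton iteration and computes the error ratios $$\|x_{k+1}-x^{*}\|/\|x_{k}-x^{*}\|$$, which tend to 0.

```python
import numpy as np

# Illustrative sketch: 1-D regularized Newton iteration
#   x_{k+1} = x_k - g_k / (H_k + lam_k),  lam_k = |g_k|,
# applied to f(x) = x^2 + exp(x), whose unique minimizer solves 2x + e^x = 0.
# The regularization vanishes with the gradient, so the error ratio
# ||x_{k+1} - x*|| / ||x_k - x*|| tends to 0 (superlinear convergence).

def grad(x):
    return 2.0 * x + np.exp(x)

def hess(x):
    return 2.0 + np.exp(x)

def regularized_newton(x0, iters):
    xs = [x0]
    x = x0
    for _ in range(iters):
        g = grad(x)
        lam = abs(g)                      # regularization vanishing with g_k
        x = x - g / (hess(x) + lam)
        xs.append(x)
    return np.array(xs)

xs = regularized_newton(1.0, iters=20)
x_star = xs[-1]                           # high-accuracy reference minimizer
errs = np.abs(xs[:7] - x_star)            # early errors, all well above eps
ratios = errs[1:] / errs[:-1]             # e_{k+1}/e_k, should decay to 0
```

Printing `ratios` shows a sequence decaying from roughly 0.6 toward 0, the numerical signature of a superlinear rate.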

## 5 Numerical experiments

In this section, we report the performance of the proposed algorithm, CNTR, and compare it with the NARQNLS algorithm of Zhang and Ni [24] and the NTRLS algorithm of Qunyan and Dan [18].

The experiments have been performed on a set of unconstrained test functions chosen from Andrei [3], which are listed in Table 1.

All numerical computations were performed in the MATLAB R2020b (9.9.0.1467703) programming environment. The codes were run on a PC with an Intel(R) Core(TM) i7-1355U processor (2.8 GHz) and 16 GB of RAM.

In the practical implementation of the CNTR algorithm, we set $$\mu _{0}=0.1$$, $$\mu _{1}=0.8$$, $$c_{1}=0.25$$, $$c_{2}=2$$, $$\underline{L}=0.001$$, $$\bar{U}=1000$$, $$\upsilon =10^{-5}$$, $$\Delta _{0}=0.1$$, $$\rho =0.5$$, $$\pi _{0}=0.4$$, and $$N=10$$. If $$t_{k}< 0$$, then $$\sigma _{1} = 0$$ and $$\sigma _{2} = 1$$; if $$0\le t_{k} \le 10$$, then $$\sigma _{1} = 0.5$$ and $$\sigma _{2} = 5$$; and if $$t_{k}>10$$, then $$\sigma _{1} = 1$$ and $$\sigma _{2} = 10$$. To compute $$\xi _{k}$$, we take $$\xi _{0}=0.85$$ and then update $$\xi _{k}$$ by relation (3.4).
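For readability, the parameter settings just listed can be collected in code. The following sketch is illustrative only; the function names are ours, not taken from the paper's implementation.

```python
# Illustrative sketch of the reported CNTR parameter settings.
# Function names are hypothetical, not from the paper's code.

def cntr_parameters():
    """Default CNTR parameters as reported in the experiments."""
    return {
        "mu0": 0.1, "mu1": 0.8,      # ratio-test acceptance thresholds
        "c1": 0.25, "c2": 2.0,       # radius contraction / expansion factors
        "L": 0.001, "U": 1000.0,     # safeguarding bounds on diagonal entries
        "upsilon": 1e-5,             # tolerance
        "Delta0": 0.1,               # initial trust-region radius
        "rho": 0.5, "pi0": 0.4, "N": 10,
        "xi0": 0.85,                 # initial value for the xi_k update (3.4)
    }

def sigma_rule(t_k):
    """Piecewise choice of (sigma1, sigma2) driven by t_k, as in the text."""
    if t_k < 0:
        return 0.0, 1.0
    elif t_k <= 10:
        return 0.5, 5.0
    else:
        return 1.0, 10.0
```

For example, `sigma_rule(4.0)` returns `(0.5, 5.0)`, matching the middle branch of the rule above.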

The values of all parameters used for the NARQNLS and NTRLS algorithms are the same as those in [24] and [18], respectively. All algorithms were terminated when an iterate satisfied $$\|g(x_{k})\|\le 10^{-5}$$ or when $$k> 10000$$.

The results obtained are reported in Table 2. The notations used in the tables are defined as follows: NI, the number of iterations; NF, the number of function evaluations; and $$(-)$$, failure to converge within 10000 iterations.

A glance at Table 2 shows that the CNTR algorithm has solved all the test functions, while the other considered algorithms have failed in some cases.

Dolan and Moré [9] proposed a method to compare the performance of iterative algorithms by displaying performance profiles.

Figures 1, 2, and 3 show the performance profile of CNTR and the other considered algorithms in terms of the number of iterations $$(NI)$$, number of function evaluations $$(NF)$$, and CPU time, respectively.

From Figs. 1 and 2, it can easily be seen that CNTR has the most wins among all considered algorithms. More precisely, the CNTR algorithm is the best in terms of the total number of iterations and function evaluations in more than 47% and 68% of the test functions, respectively. In Fig. 3, we observe that in more than 85% of cases, the CNTR algorithm is faster than the other algorithms. Another notable feature of these three figures is that the performance profile of the CNTR algorithm grows faster than the other profiles. These observations imply that the CNTR algorithm is more efficient and robust than the other considered algorithms.
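Performance-profile curves of the kind shown in Figs. 1, 2, and 3 can be computed from a table of raw costs in a few lines. The sketch below is our own illustration of the Dolan–Moré profile $$\rho _{s}(\tau )$$ (the function name and data layout are assumptions, not the paper's code), and it assumes every problem is solved by at least one solver so that all performance ratios are finite.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-Moré performance profile.

    T    : (n_problems, n_solvers) array of costs (e.g. NI, NF, or CPU time),
           with np.inf marking a failure on that problem.
    taus : iterable of threshold factors tau >= 1.
    Returns rho with rho[i, s] = fraction of problems that solver s solves
    within a factor taus[i] of the best solver on each problem.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem (assumed finite)
    ratios = T / best                     # performance ratios r_{p,s}
    rho = np.array([(ratios <= tau).mean(axis=0) for tau in taus])
    return rho
```

Plotting each column of `rho` against `taus` gives one profile curve per solver; the value at $$\tau =1$$ is the fraction of "wins", and the height for large $$\tau $$ measures robustness.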

## 6 Conclusion

Minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8], we have introduced an appropriate diagonal matrix estimation of the Hessian. The Hessian estimate has been used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. To overcome the adverse effect of monotonicity, we have introduced a new nonmonotone strategy. The global and superlinear convergence of the proposed algorithm has been established under some standard conditions. The Dolan–Moré performance profiles have shown that the suggested algorithm is efficient and robust on the set of unconstrained optimization test functions.

## Data Availability

No datasets were generated or analysed during the current study.

## References

1. Ahookhosh, M., Amini, K.: A nonmonotone trust region method with adaptive radius for unconstrained optimization. Comput. Math. Appl. 60(3), 411–422 (2010)

2. Ahookhosh, M., Amini, K., Peyghami, M.R.: A nonmonotone trust region line search method for large scale unconstrained optimization. Appl. Math. Model. 36(1), 478–487 (2012)

3. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. 10(1), 147–161 (2008)

4. Andrei, N.: A diagonal quasi-Newton updating method for unconstrained optimization. Numer. Algorithms 81(4), 575–590 (2019)

5. Andrei, N.: Modern Numerical Nonlinear Optimization. Springer Optimization and Its Applications, vol. 195. Springer, Berlin (2022)

6. Byrd, R., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989)

7. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)

8. Dennis, J.E., Wolkowicz, H.: Sizing and least-change secant methods. SIAM J. Numer. Anal. 30(5), 1291–1314 (1993)

9. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

10. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 23(4), 707–716 (1986)

11. Leong, W.J., Enshaei, S., Kek, S.L.: Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm. Numer. Algorithms 86(3), 1225–1241 (2021)

12. Li, Y.J., Li, D.H.: Truncated regularized Newton method for convex minimizations. Comput. Optim. Appl. 43(1), 119–131 (2009)

13. Liu, J., Ma, C.: A nonmonotone trust region method with new inexact line search for unconstrained optimization. Numer. Algorithms 64(1), 1–20 (2013)

14. Nash, S.G.: Preconditioning of truncated-Newton methods. SIAM J. Sci. Stat. Comput. 6(3), 599–616 (1985)

15. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980)

16. Polyak, R.A.: Regularized Newton method for unconstrained convex optimization. Math. Program. 120(1), 125–145 (2009)

17. Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, vol. 2, pp. 1–25. Academic Press, New York (1975)

18. Qunyan, Z., Dan, H.: Nonmonotone adaptive trust region method with line search based on new diagonal updating. Appl. Numer. Math. 91, 75–88 (2015)

19. Sun, W., Yuan, Y.X.: Optimization Theory and Methods. Nonlinear Programming. Springer, New York (2006)

20. Toint, P.L.: An assessment of nonmonotone line search techniques for unconstrained optimization. SIAM J. Sci. Comput. 17(3), 725–739 (1996)

21. Ueda, K., Yamashita, N.: A regularized Newton method without line search for unconstrained optimization. Comput. Optim. Appl. 59(1–2), 321–351 (2014)

22. Wan, Z., Huang, S., Zheng, X.D.: New cautious BFGS algorithm based on modified Armijo-type line search. J. Inequal. Appl. 2012(1), 1 (2012)

23. Zhang, H., Ni, Q.: A new regularized quasi-Newton algorithm for unconstrained optimization. Appl. Math. Comput. 259, 460–469 (2015)

24. Zhang, H., Ni, Q.: A new regularized quasi-Newton method for unconstrained optimization. Optim. Lett. 12(1), 1639–1658 (2018)

25. Zhang, H.C., Hager, W.W.: A nonmonotone line search technique for unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)

26. Zhou, W., Chen, X.: On the convergence of a modified regularized Newton method for convex optimization with singular solutions. J. Comput. Appl. Math. 239(1), 179–188 (2013)

27. Zhu, M., Nazareth, J.L., Wolkowicz, H.: The quasi-Cauchy relation and diagonal updating. SIAM J. Optim. 9(4), 1192–1204 (1999)


## Author information


### Contributions

The authors confirm their contributions to the manuscript as follows: study conception and design: Ali Ashrafi; convergence analysis: Seyed Hamzeh Mirzaei; numerical tests and interpretation of results: Seyed Hamzeh Mirzaei; draft manuscript preparation: Seyed Hamzeh Mirzaei and Ali Ashrafi. All authors reviewed the results and approved the final version of the manuscript.

### Corresponding author

Correspondence to Ali Ashrafi.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions


Mirzaei, S.H., Ashrafi, A. Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method. J Inequal Appl 2024, 90 (2024). https://doi.org/10.1186/s13660-024-03161-x