- Research
- Open access

# Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method

*Journal of Inequalities and Applications*
**volume 2024**, Article number: 90 (2024)

## Abstract

In this paper, a new appropriate diagonal matrix estimation of the Hessian is introduced by minimizing the Byrd and Nocedal function subject to the weak secant equation. The Hessian estimate is used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. Moreover, to counteract the adverse effect of monotonicity, we introduce a new nonmonotone strategy. The global and superlinear convergence of the suggested algorithm is established under some standard conditions. The numerical experiments on unconstrained optimization test functions show that the new algorithm is efficient and robust.

## 1 Introduction

In this paper, we deal with the following unconstrained optimization problem:

\[ \min_{x\in \mathbb{R}^{n}} f(x), \qquad (1.1) \]

where \(f:\mathbb{R}^{n} \to \mathbb{R}\) is a twice continuously differentiable function.

Line search (LS) and trust region (TR) methods are two prominent classes of iterative methods for solving problem (1.1). Given an initial point \(x_{0} \in \mathbb{R}^{n}\), an LS method computes a step length \(\alpha _{k}\) along a specified direction \(p_{k}\) and sets the new point \(x_{k+1} = x_{k} +\alpha _{k} p_{k}\). A TR algorithm, on the other hand, computes a trial step \(p_{k}\) as an approximate solution of the following quadratic subproblem:

\[ \min_{p\in \mathbb{R}^{n}} \ g_{k}^{T} p+\frac{1}{2} p^{T} B_{k} p \quad \text{subject to } \|p\|\le \Delta _{k}, \qquad (1.2) \]

in which \(g_{k}=\nabla f(x_{k})\), \(B_{k}\in \mathbb{R}^{n\times n}\) is the exact Hessian \(\nabla ^{2} f(x_{k})\) or a symmetric approximation of it, and \(\Delta _{k}>0\) is the TR radius. In the rest of the paper, \(\|\cdot \|\) refers to the Euclidean norm. According to [19] (see also [5, Theorem 8.5]), \(p^{*}\) is the exact solution of (1.2) if and only if there exists a \(\lambda \ge 0\) such that

\[ (B_{k}+\lambda I)p^{*}=-g_{k}, \qquad \lambda \bigl(\Delta _{k}-\|p^{*}\|\bigr)=0, \qquad \|p^{*}\|\le \Delta _{k}, \]

and \((B_{k} + \lambda I)\) is positive semidefinite.

The regularized Newton method (RNM) is another efficient approach for solving problem (1.1) and has good convergence properties; see [7, 16, 21, 23, 24, 26]. At each iteration of the RNM, the trial step \(p_{k}\) is obtained by approximately minimizing the following unconstrained quadratic function:

\[ \psi _{k}(p)=f_{k}+g_{k}^{T} p+\frac{1}{2} p^{T} B_{k} p+\frac{\lambda _{k}}{2}\|p\|^{2}, \qquad (1.3) \]

where \(f_{k}=f(x_{k})\) and \(\lambda _{k}\) is called the regularization parameter. Here we define

It is worth noting that the update rule for \(\Delta _{k}\) is similar to that of the TR radius. At each iteration, the RNM obtains the trial step \(p_{k}\) by solving the following regularized Newton equation:

\[ (B_{k}+\lambda _{k} I)p=-g_{k}, \qquad (1.5) \]

where *I* is the identity matrix and \((B_{k}+\lambda _{k} I)\) is positive semidefinite; hence \(p_{k}\) is well defined. If \((B_{k}+\lambda _{k} I)\) is positive definite, then \(p_{k}\) is unique. We conclude that \(p_{k}\) solves (1.5) if and only if it is the global minimizer of the unconstrained quadratic function (1.3).

If we let \(\Delta _{k}= \|p_{k}\| = \| - (B_{k} + \lambda _{k} I)^{-1} g_{k}\|\), then it can be verified (see [19, Theorem 6.1.2]) that \(p_{k}\) is also a solution of the TR subproblem (1.2).
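To make this step concrete, the following minimal sketch solves the regularized Newton equation (1.5) and recovers the radius \(\Delta _{k}=\|p_{k}\|\); the matrix `B`, gradient `g`, and parameter `lam` are illustrative inputs, not data from the paper.

```python
import numpy as np

def regularized_newton_step(B, g, lam):
    """Solve (B + lam*I) p = -g for the trial step p, cf. (1.5)."""
    n = g.shape[0]
    return np.linalg.solve(B + lam * np.eye(n), -g)

# Small positive definite example: B + lam*I > 0, so the step is unique.
B = np.array([[2.0, 0.0],
              [0.0, 0.5]])
g = np.array([1.0, -1.0])
p = regularized_newton_step(B, g, lam=0.5)

# With Delta = ||p||, p also solves the TR subproblem (1.2).
Delta = np.linalg.norm(p)
```

Here \(p=(-0.4,\,1.0)\), and one linear solve suffices, in contrast to a TR subproblem, which may need several.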

By the famous result given by Powell in [17] (also see [19, Lemma 6.1.3]), we know that

The preceding inequality and (1.3) indicate that

where \(\gamma \in (0,1)\) is a constant.

Generally, solving the TR subproblem (1.2) is more expensive than solving the RNM subproblem. In the RNM, only the single linear system (1.5) is solved at each iteration; hence, the computational cost of obtaining an RNM step is much lower than that of solving a TR subproblem [24].

The most common update formula for \(B_{k}\) is the BFGS formula. Numerically, this method needs \(O(n^{2})\) storage, which makes it unsuitable for large-scale problems. The application of quasi-Newton methods to large-scale unconstrained optimization has been extended by limited-memory quasi-Newton methods [15] and truncated Newton methods [12, 14]. However, the practical implementation of these methods is quite sophisticated, and the associated software is complex [3]. Therefore, researchers have considered an alternative approach in which a diagonal matrix \(B_{k} = \mathrm{diag}(b^{(1)} _{k} , b^{(2)}_{k} ,\dots , b^{(n)}_{k} )\) is used to approximate the Hessian matrix [4, 11, 18, 27]. Observe that this approach requires only \(O(n)\) storage for \(B_{k}\) [4].
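The storage argument is easy to see in code: a diagonal approximation keeps only the \(n\) diagonal entries and applies \(B_{k}\) to a vector in \(O(n)\) operations (a generic illustration, not the paper's implementation).

```python
import numpy as np

n = 5
b = 2.0 * np.ones(n)            # diagonal entries b_k^(i): O(n) storage
p = np.arange(1.0, n + 1.0)     # a trial step

Bp = b * p                      # B_k p as an elementwise product, O(n) work
quad = p @ Bp                   # p^T B_k p, as needed by the quadratic model
```

A dense BFGS matrix would instead need the full \(n\times n\) array and \(O(n^{2})\) work for the same product.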

This paper first introduces an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8]. Subsequently, a new nonmonotone strategy to overcome the adverse effect of monotonicity is introduced. The Hessian estimate is used to correct the framework of a new nonmonotone TR algorithm with the regularized quasi-Newton method. The suggested algorithm exploits a stronger nonmonotone strategy far from the solution and a weaker one close to it. We prove that the new algorithm is globally and superlinearly convergent.

In the next section, an appropriate diagonal matrix estimation of the Hessian is derived. In Sect. 3, the new nonmonotone strategy and the structure of the suggested algorithm are explained. Section 4 is associated with the convergence analysis of the new algorithm. In Sect. 5, some numerical experiments on a set of unconstrained optimization test problems are examined. The conclusions are given in Sect. 6.

## 2 Derivation of new diagonal updating

In the quasi-Newton method framework, the Hessian approximation matrix \(B_{k+1}\) is usually required to satisfy the secant equation

\[ B_{k+1}s_{k}=y_{k}, \qquad (2.1) \]

where \(s_{k}=x_{k+1}-x_{k}\) and \(y_{k}=g_{k+1}-g_{k}\). To find an appropriate diagonal matrix estimation of the Hessian in the sense of

we assume that \(B_{k}\) is positive definite and \(s^{T}_{k}y_{k} > 0\) for all *k*. Since it is difficult for a diagonal matrix to satisfy the secant equation (2.1), we instead require that \(B_{k+1}\) satisfy the weak secant equation of Dennis and Wolkowicz [8], namely

\[ s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}. \qquad (2.2) \]

The motivation for using the weak secant equation (2.2) can be found in [4]. Byrd and Nocedal [6] introduced the function

\[ \mathrm{tr}(B)-\ln \bigl(\det (B)\bigr), \]

defined on positive definite matrices, where \(\ln (\cdot )\) denotes the natural logarithm. This function is an elegant and efficient tool for analyzing the global properties of quasi-Newton methods. We introduce an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation (2.2), as follows:

\[ \min _{B_{k+1}} \ \mathrm{tr}(B_{k+1})-\ln \bigl(\det (B_{k+1})\bigr) \qquad (2.3) \]

subject to

\[ s_{k}^{T}B_{k+1}s_{k}=s_{k}^{T}y_{k}. \qquad (2.4) \]

To achieve a new diagonal updating formula, we give the following penalized version of (2.3) and (2.4):

Now, having in mind that \(\mathrm{tr}(B_{k+1}) = b^{1}_{ k+1} +\cdots + b^{n} _{k+1}\) and \(\det (B_{k+1}) = b^{1} _{k+1} \cdots b^{n}_{ k+1}\), the minimization problem (2.5) becomes

where \(s^{i}_{k}, i = 1, \dots , n\), are the components of vector \(s_{k}\).

The required solution of (2.3) and (2.4) is a stationary point of the penalized function. Hence, from (2.6), we have

Therefore, using (2.7), the elements of the diagonal matrix \(B_{k+1}\) can be expressed as

which are positive and well defined for \(s^{i}_{k}\neq 0\). Since \(s^{T}_{k}y_{k} > 0\) for all *k*, to ensure the positivity and uniform boundedness of \(b^{i}_{k+1}\) given by (2.8) in general situations, we set

where *υ* is a small positive constant. Therefore, our Hessian approximation is given by

A crucial problem is choosing the bounds \(L_{k}\) and \(U_{k}\). Here, we introduce an adaptive strategy to determine them. Let us begin by considering the curvature of \(f (x)\) in direction \(s_{k}\), which is represented by

in which \(\bar{H}_{k} = \int _{0}^{1} \nabla ^{2} f (x_{k} +ts_{k})\,dt\) is the average Hessian matrix along \(s_{k}\). Since it is not practical to compute the eigenvalues of the Hessian matrix at each iteration, we estimate its size based on the scalar

\[ t_{k}=\frac{s_{k}^{T}y_{k}}{s_{k}^{T}s_{k}}, \]

which is available at no extra cost because \(s_{k}^{T}\bar{H}_{k}s_{k}=s_{k}^{T}y_{k}\).

Now \(L_{k}\) and \(U_{k}\) in (2.9) can be chosen according to the value of \(t_{k}\) as follows:

where \(0 \le \sigma _{1}(t_{k}) \le 1\), \(\sigma _{2}(t_{k})\ge 1\), \(0< \underline{L} < 1\), and \(\bar{U} > 1\). Obviously, the values of the two bounds can be adjusted by \(\sigma _{1}\) and \(\sigma _{2}\), whose values depend on \(t_{k}\). According to relation (2.9), there exist two positive constants *m* and *M* such that
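The two-stage rule above can be sketched as follows. The piecewise choice of \(\sigma _{1}, \sigma _{2}\) uses the thresholds reported in Sect. 5; the curvature estimate and the final projection of the raw diagonal entries onto \([L_{k}, U_{k}]\) are a hedged reading of (2.9), since the exact elementwise formula (2.8) is not reproduced here, and `b_raw` is a hypothetical input.

```python
import numpy as np

def curvature_estimate(s, y):
    """t_k estimating the size of the average Hessian along s_k,
    using the identity s^T y = s^T H_bar s."""
    return (s @ y) / (s @ s)

def sigmas(t_k):
    """Piecewise choice of (sigma_1, sigma_2) from t_k, with the
    thresholds used in the experiments of Sect. 5."""
    if t_k < 0:
        return 0.0, 1.0
    if t_k <= 10:
        return 0.5, 5.0
    return 1.0, 10.0

def safeguard(b_raw, L_k, U_k, upsilon=1e-5):
    """Hedged reading of (2.9): replace ill-defined entries (e.g. from
    s_k^(i) = 0) by upsilon, then project onto [L_k, U_k]."""
    b = np.where(np.isfinite(b_raw), b_raw, upsilon)
    return np.clip(b, L_k, U_k)

s = np.array([1.0, -2.0])
y = np.array([0.5, -1.0])
t_k = curvature_estimate(s, y)               # = 0.5 here

b_raw = np.array([0.5, 1e7, -3.0, np.nan])   # hypothetical raw entries
b_safe = safeguard(b_raw, L_k=0.001, U_k=1000.0)
```

After safeguarding, every diagonal entry lies in \([L_{k}, U_{k}]\), so the bounds *m* and *M* above hold automatically.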

## 3 Nonmonotone strategy and new algorithm

Grippo et al. [10] observed that forcing a monotone decrease of the objective function values in the classical iterative schemes for solving (1.1) may reduce the convergence speed of the TR method, especially in the presence of narrow curved valleys; see also [20]. As a remedy, scholars have developed nonmonotone strategies that still guarantee global convergence [1, 2, 18, 25]. The pioneering nonmonotone LS method was introduced by Grippo et al. [10] as follows:

\[ f(x_{k}+\alpha _{k} p_{k})\le f_{l(k)}+\sigma \alpha _{k} g_{k}^{T} p_{k}, \qquad (3.1) \]

in which \(\sigma \in (0, 1)\) is a constant,

\[ f_{l(k)}=\max_{0\le j\le \phi (k)} f_{k-j}, \]

\(\phi (0) = 0\), \(0 \le \phi (k) \le \min \{\phi (k -1) + 1, N\}\) for all \(k\ge 1\), and *N* is a nonnegative integer. Despite the advantages of this strategy, Zhang and Hager [25] found that it suffers from several weaknesses and therefore proposed a nonmonotone strategy based on a weighted average of previous consecutive iterates. Moreover, using an adaptive convex combination of \(f_{l(k)}\) and \(f_{k}\), Amini et al. [2] proposed an effective substitute for (3.1).

To counteract the adverse effect of monotonicity, here we introduce the following hybrid nonmonotone LS condition:

\[ f(x_{k}+\alpha _{k} p_{k})\le D_{k}+\delta \alpha _{k} g_{k}^{T} p_{k}, \qquad (3.2) \]

where \(\delta \in (0,1)\) is a constant and

\[ D_{k}=\xi _{k} f_{l(k)}+(1-\xi _{k}) f_{k}, \qquad (3.3) \]

with \(\xi _{k}\in [0,1]\). As we see, the definition of \(D_{k}\) implies that each \(D_{k}\) is a convex combination of \(f_{l(k)}\) and \(f_{k}\). For a given \(\xi _{0}\in [0,1]\), to calculate \(\xi _{k}\) we employ the following update formula:

The new nonmonotone LS is performed in a backtracking scheme; that is, the step length \(\alpha _{k}\) is the largest member of \(\{\rho ^{j} \beta _{k}\}_{j \ge 0}\), with \(\rho \in (0,1)\) and \(\beta _{k}>0\), which satisfies inequality (3.2) [2, 19]. Similar to [13], we set \(\beta _{k}=-{g^{T}_{k} p_{k}}/{(\pi _{k}\|p_{k}\|^{2})}\), where \(\pi _{k}=\|y_{k}\|/\|p_{k}\|\).
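A compact sketch of this backtracking scheme follows. The acceptance inequality is written in generic Armijo form with the nonmonotone reference value \(D_{k}\) on the right-hand side; this form is an assumption modeled on the Grippo-type rule (3.1), since the exact inequality (3.2) is not reproduced in the text.

```python
import numpy as np

def nonmonotone_backtracking(f, x, p, g, D_k, delta=0.5, rho=0.5,
                             beta=1.0, max_backtracks=50):
    """Return the largest alpha in {rho^j * beta : j >= 0} satisfying
    f(x + alpha*p) <= D_k + delta*alpha*g^T p  (assumed form of (3.2))."""
    gTp = g @ p
    alpha = beta
    for _ in range(max_backtracks):
        if f(x + alpha * p) <= D_k + delta * alpha * gTp:
            return alpha
        alpha *= rho
    return alpha

# Quadratic example with a steepest-descent direction.
f = lambda x: x @ x
x = np.array([1.0, 1.0])
g = 2.0 * x
p = -g
D_k = f(x)   # at the first iterate f_{l(0)} = f_0, so D_0 = f_0
alpha = nonmonotone_backtracking(f, x, p, g, D_k)
```

Here the accepted step is `alpha = 0.5`, which lands exactly at the minimizer of this quadratic.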

Let \(p_{k}\) be the solution of (1.3), in which \(B_{k+1}\) is a diagonal matrix. To determine whether a trial step is accepted, we compute \(\hat{r}_{k}\), the ratio of the actual reduction of \(f\) to the reduction predicted by the model function \(\psi _{k}(p)\), by the following relation:

where \(D_{k}\) is computed by (3.3).

The new TR ratio implies that the suggested algorithm exploits a stronger nonmonotone strategy far from the solution and a weaker one close to it, which yields the best convergence behavior; see [25].
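The acceptance test can be sketched as a ratio of nonmonotone actual reduction to predicted model reduction; since the exact formula (3.5) is not reproduced above, this form, with \(D_{k}\) replacing \(f_{k}\) in the numerator, is an assumption consistent with the surrounding description, and the numbers are purely illustrative.

```python
def nonmonotone_ratio(f_trial, D_k, model_at_0, model_at_p):
    """Assumed form of the nonmonotone TR ratio: actual reduction is
    measured from D_k rather than f_k; predicted reduction comes from
    the model psi_k."""
    return (D_k - f_trial) / (model_at_0 - model_at_p)

# Illustrative values: this step would be accepted for mu_0 = 0.1.
r_hat = nonmonotone_ratio(f_trial=0.8, D_k=1.2, model_at_0=1.0, model_at_p=0.5)
```

Because \(D_{k}\ge f_{k}\), this ratio is never smaller than the classical monotone one, so trial steps are accepted more often far from the solution.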

Now, we can present the framework of the new algorithm as follows (see Algorithm 1).

## 4 Convergence analysis

In this section, we examine the convergence properties of the suggested algorithm. To this end, the following standard assumption is needed [18].

### Assumption 4.1

The level set \(\varGamma (x_{0}) = \{x \mid f (x)\le f (x_{0})\}\) satisfies \(\varGamma (x_{0}) \subset \Im \), where ℑ is a closed and bounded subset of \(\mathbb{R}^{n}\).

### Remark 1

Let \(f(x)\) be a twice continuously differentiable function. Therefore, Assumption 4.1 implies that there exists a constant \(M_{1} > 0\) such that

Therefore, using the mean value theorem, one can conclude that

which means that \(g(x)\) is Lipschitz continuous in ℑ.

To establish the global convergence of the iterative scheme \(x_{k+1} = x_{k} + \alpha _{k} p_{k}\) with the backtracking LS satisfying (3.2), we assume that Assumption 4.1 holds and that the direction \(p_{k}\) satisfies the following sufficient descent conditions:

where \(a_{1}\) and \(a_{2}\) are two positive real-valued constants. For convenience in the discussion, we consider two index sets as follows:

### Lemma 4.1

*Suppose that the sequence* \(\{B_{k}\}\) *is generated by Algorithm* 1. *Then for any* *k*, \(B_{k}\) *is positive definite*.

### Proof

According to the definition of \(B_{k}=\mathrm{diag}(\bar{b}^{1}_{k},\bar{b}^{2}_{k},\dots ,\bar{b}^{n}_{k})\) and relation (2.9), this is obvious. □

### Lemma 4.2

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then we have*

### Proof

Using the Taylor expansion with (2.11) and Assumption 4.1, we get

Hence, the proof is complete. □

### Lemma 4.3

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then for all* \(k\in \mathbb{N}\cup \{0\}\), *we have* \(x_{k} \in \Gamma (x_{0})\).

### Proof

We consider two cases.

Case 1. If \(k\in \mathcal{I}\), then from (1.6) and (3.5), we can write

which shows that \(f_{k+1}\le D_{k}\), for all \(k \in \mathcal{I}\).

Case 2. If \(k\in \mathcal{J}\), then the trial step is rejected and the LS must be performed. From (1.7), we know that \(g ^{T}_{k} p_{k} \le 0\) for all *k*. Therefore, combining this inequality with (3.2), we conclude

Hence, we have \(f_{k+1}\le D_{k}\) for all \(k \in \mathcal{J}\). In addition, using the definition of \(f_{l(k)}\) and (3.3), we have

From (4.3) and (4.4) along with (4.5), we have \(f_{k+1}\le D_{k}\le f_{l(k)}\le f_{0} \) for all \(k\in \mathbb{N}\cup \{0\}\). Therefore, the sequence \(\{x_{k}\}\) is contained in \(\Gamma (x_{0})\). □

### Lemma 4.4

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then the sequence* \(\{f_{l(k)}\}\) *is convergent*.

### Proof

From the definition of \(f_{l(k+1)}\) and Lemma 4.3, we have

Thus, \(\{f_{l(k)}\}\) is a nonincreasing sequence. Moreover, the boundedness of \(\{f_{k}\}\) from below implies that \(\{f_{l(k)}\}\) is bounded below. Therefore, the sequence \(\{f_{l(k)}\}\) is convergent. □

### Lemma 4.5

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then we have*

### Proof

From the definition of \(f_{l(k+1)}\), we have \(f_{k+1}\le f_{l(k+1)}\) for all \(k\in \mathbb{N}\). Thus, according to (3.3), we can write

This completes the proof of the lemma. □

### Lemma 4.6

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then Step* 4 *of the algorithm is well defined*.

### Proof

First, suppose by contradiction that there exists \(k \in \mathcal{J}\) such that

Using Taylor expansion and Lemma 4.5, we obtain

for some \(\zeta _{k} \in (x_{k}, x_{k}+\alpha _{k}p_{k})\). Therefore, using (2.11) and Assumption 4.1, we can write

If \(\alpha _{k}\to 0\), then we get

Due to the fact that \(\delta \in (0, 1)\), inequality (4.6) leads to \(g^{T}_{k} p_{k}\ge 0\), which contradicts (1.7). Hence, Step 4 of Algorithm 1 is well defined. □

### Lemma 4.7

*Assume that the sequence* \(\{x_{k} \}\) *is generated by Algorithm* 1. *Then for all* \(k\in \mathcal{J}\), *the step length* \(\alpha _{k}\) *satisfies*

### Proof

Assume \(\alpha =\frac{\alpha _{k}}{\rho} \). It follows from Step 4 of Algorithm 1 that

By Taylor's expansion, we have

where \(\zeta _{k} \in (x_{k}, x_{k}+\alpha p_{k})\). From (4.1), (4.7), (4.8), and Lemma 4.5, we obtain

therefore,

The combination of (4.2) and (4.9) implies that

which completes the proof of the lemma. □

### Lemma 4.8

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then we have*

### Proof

We consider the following two cases:

Case 1. \(k \in \mathcal{I}\). It follows from (3.5) and Lemma 4.5 that

Now, similarly as in the proof of Theorem 3.2 in [1], we can deduce that (4.10) holds.

Case 2. \(k \in \mathcal{J}\). For \(k>N\), using (3.2) and Lemma 4.3, we can write

So, from Lemma 4.4, since \(\alpha _{l(k)-1} g^{T}_{l(k)-1} p_{l(k)-1}<0 \), we can conclude that

Now, from (1.7) along with (4.2), we have

Thus, using (4.11) and (4.12), it follows that

The remainder of the proof follows as in [10] and is omitted here. □

### Corollary 4.1

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *Then we have*

### Proof

From Lemmas 4.3 and 4.5, we have \(f_{k}\le D_{k}\le f_{l(k)} \). This completes the proof by using Lemma 4.8. □

### Lemma 4.9

*Suppose that the sequence* \(\{x_{k}\}\) *is generated by Algorithm* 1. *If the sequence* \(\{x_{k}\}\) *does not converge to a stationary point*, *i*.*e*., *there exists a constant* \(\varepsilon >0\) *such that*

*holds for all* \(k\in \mathbb{N}\), *then*

*holds for all* \(k\in \mathbb{N}\).

### Proof

We consider two cases as follows:

Case 1. \(k\in \mathcal{I}\). From (1.6), (2.11), (3.5), and (4.13), we have

Case 2. \(k\in \mathcal{J}\). Similar to Case 2 in the proof of Lemma 4.3, it follows that

Now from (1.7), (2.11), (4.13), and Lemma 4.7, we get

Let \(\varphi =\min \lbrace \mu _{0}\varepsilon \gamma ,\gamma \varepsilon \delta \frac{(1-\delta )\rho a_{1} }{a^{2}_{2}(\delta M+M_{1})} \rbrace \). Combining (4.15) and (4.16), we conclude that relation (4.14) is valid for all \(k \in \mathbb{N}\). □

In this situation, it is possible to prove the following convergence theorems for Algorithm 1.

### Theorem 4.1

*Algorithm* 1 *either terminates in finitely many iterations or generates an infinite sequence* \(\{x_{k}\}\) *satisfying*

### Proof

If Algorithm 1 terminates in finitely many iterations, the theorem is true. If (4.17) is not true, then there exists a constant \(\varepsilon > 0\) such that (4.13) holds.

Let \(S = \lbrace k:\hat{r}_{k}\ge \mu _{0} \rbrace \). We prove that \(\lambda _{k}\to \infty \) and \(\Delta _{k}\to \infty \), as \(k\to \infty \). From

it follows that \(\lambda _{k}\to \infty \Longleftrightarrow \Delta _{k}\to \infty \).

We consider the following cases:

Case 1. If *S* is a finite set, then there exists some \(\bar{k} > 0\) such that \(\hat{r}_{k}<\mu _{0}\) holds for all \(k> \bar{k}\). Thus we have that \(\Delta _{k+1}\ge c_{2} \Delta _{k}\) holds for all \(k > \bar{k}\). Since \(c_{2} > 1\), we conclude that

Case 2. If *S* is an infinite set, then from Lemma 4.9, we have that

holds for all \(k\in S\). Thanks to Corollary 4.1, we get

which implies that

From the above equality, together with relation (1.5), it follows that

Case 3. Suppose that \(S^{c}\) denotes the complement of *S* and that \(S ^{c}\) is an infinite set. Now we only need to prove that \(\lambda _{k}\to \infty \) as \(k\to \infty \) with \(k \in S^{c}\). Let \(I^{*}= \lbrace k_{i}:k_{i} - 1\in S \text{ and } k_{i}\in S^{c} \rbrace \); then \(\{k_{i} -1\}\) is an infinite subset of *S*. Using (4.20), we have \(\Delta _{k_{i} -1} \to \infty \) as \(i\to \infty \). From \(k_{i} - 1\in S\), we conclude that \(\hat{r}_{k_{i} -1}\ge \mu _{0}\) and that either \(\Delta _{k_{i}}=c_{1}\Delta _{k_{i} -1}\) or \(\Delta _{k_{i}}=\Delta _{k_{i} -1}\) holds. Since \(0< c_{1}<1\), \(\Delta _{k_{i}}\ge c_{1}\Delta _{k_{i} -1}\) holds for all \(k_{i}\in I^{*}\).

Hence, we have

For any \(k\notin S\), there exists an index \(k_{i}\) such that \(k_{i} \le k\) and all iterations between \(k_{i}\) and *k* are unsuccessful. According to the construction of Algorithm 1, we can write

thus it follows from (4.21) and (4.22) that

Combining (4.20) and (4.23), we have that

Now from (1.5) and (4.24), we can write

for all \(k\in \mathbb{N}\). Thanks to (4.25), it follows that \(p_{k} \to 0\). Therefore, using (1.6), (2.11), and Lemma 4.2 as \(k \to \infty \), we get

from which we can deduce that

for all sufficiently large *k*. The construction of Algorithm 1 and (4.26) shows that there exists a positive constant \(\Delta ^{*}\) such that \(\Delta _{k}\le \Delta ^{*} \) holds for all sufficiently large *k*, which contradicts (4.24). The proof is completed. □

### Theorem 4.2

*Suppose that the infinite sequence* \(\{x_{k}\}\), *convergent to* \(x^{*}\), *is generated by Algorithm* 1. *In addition*, *assume that* \(\nabla ^{2} f(x^{*})\) *is positive definite*. *If the condition*

*holds*, *then the sequence* \(\{x_{k}\}\) *converges to* \(x^{*}\) *superlinearly*.

### Proof

Let \(p_{k}\) be the exact solution of (1.3) which satisfies

We show that \(\hat{r}_{k} \ge \mu _{1}\) for *k* sufficiently large and that \(\Delta _{k}\to 0 \) as \(k\to \infty \). First, we define

and then we prove that \(r_{k}\ge \mu _{1}\). From (4.29) it follows that

thus we can express

From the direct computation, we obtain

where the last equality is obtained from (4.28). Therefore, we have that

By Taylor expansion, we can write

where \(\zeta _{k} \in (x_{k}, x_{k+1})\). From (1.3) and (4.32), we get

Since \(\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})\), using (4.27) and (4.33), we obtain

It follows from (4.31), (4.34), and \(0< \mu _{1}<1\) that

therefore, from (4.30), we have \(r_{k}\ge \mu _{1}\). Now using (3.5) and Lemma 4.5, we can write

for *k* sufficiently large. Using Step 4 of Algorithm 1, we have \(\Delta _{k+1}=c_{1}\Delta _{k}\), which implies that \(\lim_{k\to \infty}\Delta _{k} = 0 \). Therefore, from (1.4), we get

By the Taylor expansion, we can write

where \(\zeta _{k} \in (x_{k},x_{k+1})\). So, from (4.28) and (4.36) for *k* sufficiently large, we have

Thus we can write

The right-hand side of (4.37) converges to 0 due to the fact that \(\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})\), and by using (4.27) and (4.35). Hence, we deduce that

Since \(f (x)\) is twice continuously differentiable, along with Assumption 4.1, we conclude that there exists \(\tau > 0\) such that

From (4.39), it follows that

In view of (4.38), from (4.40) we have \(\lim_{k\to \infty} \frac{ \| x_{k+1}-x^{*}\|}{\| x_{k}-x^{*}\|}=0 \). Therefore, the convergence rate of \(\{x_{k}\}\) is superlinear. □

## 5 Numerical experiments

In this section, we report the performance of the proposed algorithm, CNTR, as well as some comparisons of the CNTR algorithm with the NARQNLS algorithm of Zhang and Ni [24] and the NTRLS algorithm of Qunyan and Dan [18].

The experiments have been performed on a set of unconstrained test functions chosen from Andrei [3], which are listed in Table 1.

The numerical calculations were performed in the MATLAB R2020b (9.9.0.1467703) programming environment. The codes were run on a PC with an Intel(R) Core(TM) i7-1355U processor (2.8 GHz) and 16 GB of RAM.

In practical implementations of the CNTR algorithm, we set \(\mu _{0}=0.1\), \(\mu _{1}=0.8\), \(c_{1}=0.25\), \(c_{2}=2\), \(\underline{L}=0.001\), \(\bar{U}=1000\), \(\upsilon =10^{-5}\), \(\Delta _{0}=0.1\), \(\rho =0.5\), \(\pi _{0}=0.4\), and \(N=10\). If \(t_{k}< 0 \), then \(\sigma _{1} = 0\), \(\sigma _{2} = 1\); if \(0\le t_{k} \le 10 \), then \(\sigma _{1} = 0.5\), \(\sigma _{2} = 5\); if \(t_{k}>10 \), then \(\sigma _{1} = 1\), \(\sigma _{2} = 10\). To calculate \(\xi _{k}\), we consider \(\xi _{0}=0.85\) and then update \(\xi _{k}\) by relation (3.4).

The values of all parameters used in the NARQNLS and NTRLS algorithms are the same as in [24] and [18], respectively. All algorithms were terminated at an iterate satisfying \(\|g(x_{k})\|\le 10^{-5}\) or when \(k> 10000\).

The results obtained are reported in Table 2. The notations used in the tables are defined as follows: *NI*, the number of iterations; *NF*, the number of function evaluations; and \((-)\), failure to terminate within 10000 iterations.

A glance at Table 2 shows that the CNTR algorithm solved all the test functions, while the other considered algorithms failed in some cases.

Dolan and Moré [9] proposed a method for comparing the performance of iterative algorithms through a statistical process that displays performance profiles.
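For reference, the standard Dolan-Moré profile computes, for each solver, the fraction of problems it solves within a factor \(\tau \) of the best solver; here is a minimal sketch with a synthetic cost table, not the paper's results.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.  T[p, s] is the cost (NI, NF, or
    CPU time) of solver s on problem p, with np.inf marking failures.
    Returns rho[s, j]: the fraction of problems on which solver s is
    within a factor taus[j] of the best solver."""
    best = T.min(axis=1, keepdims=True)       # best cost on each problem
    ratios = T / best                         # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

T = np.array([[1.0, 2.0],
              [4.0, 2.0],
              [3.0, np.inf]])                 # solver 2 fails on problem 3
rho = performance_profile(T, taus=[1.0, 2.0])
```

The value at \(\tau =1\) is each solver's win fraction, and the height of each curve as \(\tau \) grows measures robustness; this is how Figs. 1-3 are read.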

Figures 1, 2, and 3 show the performance profile of CNTR and the other considered algorithms in terms of the number of iterations \((NI)\), number of function evaluations \((NF)\), and CPU time, respectively.

From Figs. 1 and 2, it can be easily seen that CNTR has the most wins among all considered algorithms. More precisely, the CNTR algorithm is the best in terms of the total number of iterations and function evaluations in more than 47% and 68% of the test functions, respectively. In Fig. 3, we observe that in more than 85% of cases, the CNTR algorithm is faster than the other algorithms. Another remarkable factor of these three figures is that the performance profile of the CNTR algorithm grows faster than the other profiles. These observations imply that the CNTR algorithm is more efficient and robust than the other considered algorithms.

## 6 Conclusion

Minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8], we have introduced an appropriate diagonal matrix estimation of the Hessian. The Hessian estimate has been used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. To overcome the adverse effect of monotonicity, we have introduced a new nonmonotone strategy. The global and superlinear convergence of the proposed algorithm has been established under some standard conditions. The Dolan-Moré performance profiles show that the suggested algorithm is efficient and robust on the set of unconstrained optimization test functions.

## Data Availability

No datasets were generated or analysed during the current study.

## References

1. Ahookhosh, M., Amini, K.: A nonmonotone trust region method with adaptive radius for unconstrained optimization. Comput. Math. Appl. **60**(3), 411–422 (2010)
2. Ahookhosh, M., Amini, K., Peyghami, M.R.: A non-monotone trust region line search method for large scale unconstrained optimization. Appl. Math. Model. **36**(1), 478–487 (2012)
3. Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. **10**(1), 147–161 (2008)
4. Andrei, N.: A diagonal quasi-Newton updating method for unconstrained optimization. Numer. Algorithms **81**(4), 575–590 (2019)
5. Andrei, N.: Modern Numerical Nonlinear Optimization. Springer Optimization and Its Applications, vol. 195. Springer, Berlin (2022)
6. Byrd, R., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. **26**(3), 727–739 (1989)
7. Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. **127**(2), 245–295 (2011)
8. Dennis, J.E., Wolkowicz, H.: Sizing and least-change secant methods. SIAM J. Numer. Anal. **30**(5), 1291–1314 (1993)
9. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. **91**(2), 201–213 (2002)
10. Grippo, L., Lampariello, F., Lucidi, S.: A non-monotone line search technique for Newton's method. SIAM J. Numer. Anal. **23**(4), 707–716 (1986)
11. Leong, W.J., Enshaei, S., Kek, S.L.: Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm. Numer. Algorithms **86**(3), 1225–1241 (2021)
12. Li, Y.J., Li, D.H.: Truncated regularized Newton method for convex minimizations. Comput. Optim. Appl. **43**(1), 119–131 (2009)
13. Liu, J., Ma, C.: A non-monotone trust region method with new inexact line search for unconstrained optimization. Numer. Algorithms **64**(1), 1–20 (2013)
14. Nash, S.G.: Preconditioning of truncated-Newton methods. SIAM J. Sci. Stat. Comput. **6**(3), 599–616 (1985)
15. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. **35**(151), 773–782 (1980)
16. Polyak, R.A.: Regularized Newton method for unconstrained convex optimization. Math. Program. **120**(1), 125–145 (2009)
17. Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, vol. 2, pp. 1–25. Academic Press, New York (1975)
18. Qunyan, Z., Dan, H.: Non-monotone adaptive trust region method with line search based on new diagonal updating. Appl. Numer. Math. **91**, 75–88 (2015)
19. Sun, W., Yuan, Y.X.: Optimization Theory and Methods. Nonlinear Programming. Springer, New York (2006)
20. Toint, P.L.: An assessment of non-monotone line search techniques for unconstrained optimization. SIAM J. Sci. Comput. **17**(3), 725–739 (1996)
21. Ueda, K., Yamashita, N.: A regularized Newton method without line search for unconstrained optimization. Comput. Optim. Appl. **59**(1–2), 321–351 (2014)
22. Wan, Z., Huang, S., Zheng, X.D.: New cautious BFGS algorithm based on modified Armijo-type line search. J. Inequal. Appl. **2012**(1), 1 (2012)
23. Zhang, H., Ni, Q.: A new regularized quasi-Newton algorithm for unconstrained optimization. Appl. Math. Comput. **259**, 460–469 (2015)
24. Zhang, H., Ni, Q.: A new regularized quasi-Newton method for unconstrained optimization. Optim. Lett. **12**(1), 1639–1658 (2018)
25. Zhang, H.C., Hager, W.W.: A non-monotone line search technique for unconstrained optimization. SIAM J. Optim. **14**(4), 1043–1056 (2004)
26. Zhou, W., Chen, X.: On the convergence of a modified regularized Newton method for convex optimization with singular solutions. J. Comput. Appl. Math. **239**(1), 179–188 (2013)
27. Zhu, M., Nazareth, J.L., Wolkowicz, H.: The quasi-Cauchy relation and diagonal updating. SIAM J. Optim. **9**(4), 1192–1204 (1999)

## Acknowledgements

Not applicable.

## Funding

Not applicable.

## Author information

### Authors and Affiliations

### Contributions

The authors confirm their contributions to the manuscript as follows: study conception and design: Ali Ashrafi; convergence analysis: Seyed Hamzeh Mirzaei; numerical tests and interpretation of results: Seyed Hamzeh Mirzaei; draft manuscript preparation: Seyed Hamzeh Mirzaei and Ali Ashrafi. All authors reviewed the results and approved the final version of the manuscript.

### Corresponding author

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Mirzaei, S.H., Ashrafi, A. Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method.
*J Inequal Appl* **2024**, 90 (2024). https://doi.org/10.1186/s13660-024-03161-x

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/s13660-024-03161-x