Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method
Journal of Inequalities and Applications volume 2024, Article number: 90 (2024)
Abstract
In this paper, a new appropriate diagonal matrix estimation of the Hessian is introduced by minimizing the Byrd and Nocedal function subject to the weak secant equation. The Hessian estimate is used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. Moreover, to counteract the adverse effect of monotonicity, we introduce a new nonmonotone strategy. The global and superlinear convergence of the suggested algorithm is established under some standard conditions. The numerical experiments on unconstrained optimization test functions show that the new algorithm is efficient and robust.
1 Introduction
In this paper, we deal with the following unconstrained optimization problem:
where \(f:\mathbb{R}^{n} \to \mathbb{R}\) is a twice continuously differentiable function.
Line search (LS) and trust region (TR) methods are two prominent classes of iterative methods to solve the problem (1.1). The LS method, for a given initial point \(x_{0} \in \mathbb{R}^{n}\), is a procedure that computes a step length \(\alpha _{k}\) in the specific direction \(p_{k}\) and considers a new point as \(x_{k+1} = x_{k} +\alpha _{k} p_{k}\). On the other hand, the TR algorithm computes a trial step \(p_{k}\) which is an approximate solution of the following quadratic subproblem:
in which \(g_{k}=\nabla f(x_{k})\), \(B_{k}\in \mathbb{R}^{n\times n}\) is the exact Hessian \(\nabla ^{2} f(x_{k})\) or a symmetric approximation of it, and \(\Delta _{k}>0\) is the TR radius. In the rest of the paper, \(\|\cdot \|\) refers to the Euclidean norm. According to [19] (also see [5, Theorem 8.5]), \(p^{*}\) is the exact solution of (1.2) if and only if there exists a \(\lambda \ge 0\) such that
and \((B_{k} + \lambda I)\) is positive semidefinite.
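For illustration only, this characterization can be exercised numerically. The sketch below, under the simplifying assumption that \(B_{k}\) is a positive diagonal matrix, finds the \(\lambda \ge 0\) with \((B_{k}+\lambda I)p = -g_{k}\) and \(\lambda (\Delta _{k}-\|p\|)=0\) by bisection on \(\lambda \); the diagonal restriction and the bisection tolerance are simplifications, not part of the paper's method.

```python
import numpy as np

def tr_step_diag(b, g, delta, tol=1e-12):
    """Solve min g^T p + 0.5 p^T diag(b) p subject to ||p|| <= delta
    for a positive diagonal b, using the optimality conditions
    (B + lam*I) p = -g, lam >= 0, lam * (delta - ||p||) = 0."""
    b = np.asarray(b, float)
    g = np.asarray(g, float)
    p = -g / b                          # interior Newton step (lam = 0)
    if np.linalg.norm(p) <= delta:
        return p, 0.0
    # Otherwise ||p(lam)|| = delta for some lam > 0; ||p(lam)|| is
    # strictly decreasing in lam, so bracket and bisect.
    norm = lambda lam: np.linalg.norm(g / (b + lam))
    lo, hi = 0.0, 1.0
    while norm(hi) > delta:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if norm(mid) > delta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return -g / (b + lam), lam
```

When the unconstrained step already lies inside the region, the multiplier is zero; otherwise the returned step sits on the boundary \(\|p\|=\Delta _{k}\), matching the complementarity condition above.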
The regularized Newton method (RNM) is another efficient approach for solving the problem (1.1) and has good convergence properties; see [7, 16, 21, 23, 24, 26]. At each iteration of the RNM, the trial step \(p_{k}\) is obtained by approximately minimizing the following unconstrained quadratic function:
where \(f_{k}=f(x_{k})\) and \(\lambda _{k}\) is called the regularized parameter. Here we define
It is worth noting that the update rule of \(\Delta _{k}\) is similar to that of the TR radius. At each iteration, the RNM obtains the trial step \(p_{k}\) by solving the following regularized Newton equation:
where I is the identity matrix and \((B_{k}+\lambda _{k} I)\) is positive semidefinite. Therefore, \(p_{k}\) is well defined. If \((B_{k}+\lambda _{k} I)\) is positive definite, \(p_{k}\) is unique. We can conclude that \(p_{k}\) solves (1.5) if and only if it is the global minimizer of the unconstrained quadratic function (1.3).
If we let \(\Delta _{k}= \|p_{k}\| = \| - (B_{k} + \lambda _{k} I)^{-1} g_{k}\|\), then it can be verified (see [19, Theorem 6.1.2]) that \(p_{k}\) is also a solution of the TR subproblem (1.2).
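As a quick numerical check of this connection (with illustrative values of \(b\), \(g\), and \(\lambda \), not data from the paper), the following snippet computes the regularized step for a diagonal \(B_{k}\) and verifies that it globally minimizes the strictly convex regularized model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
b = rng.uniform(0.5, 2.0, n)           # positive diagonal Hessian estimate
g = rng.normal(size=n)                 # gradient g_k
lam = 0.3                              # regularization parameter lambda_k

p_star = -g / (b + lam)                # regularized Newton step, diagonal case

def model(p):
    """Regularized quadratic model minus the constant f_k."""
    return g @ p + 0.5 * p @ (b * p) + 0.5 * lam * p @ p

# p_star is the global minimizer of this strictly convex quadratic,
# so it should not be beaten by any random trial point.
trials = rng.normal(size=(100, n))
assert all(model(p_star) <= model(p) + 1e-12 for p in trials)

delta_k = float(np.linalg.norm(p_star))  # this norm plays the role of Delta_k
```

Setting the radius to this norm is exactly the identification described above: the regularized step is then also the exact solution of the corresponding TR subproblem.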
By the famous result given by Powell in [17] (also see [19, Lemma 6.1.3]), we know that
The preceding inequality and (1.3) indicate that
where \(\gamma \in (0,1)\) is a constant.
Generally, solving the TR subproblem is more expensive than the RNM subproblem. In the RNM, only one equation (1.5) is solved at each iteration. Hence, the computational cost of obtaining an RNM step is much lower than solving a TR subproblem [24].
The most common update formula for \(B_{k}\) is the BFGS update formula. Numerically, this method needs \(O(n^{2})\) storage, which makes it unsuitable for large-scale problems. The application of quasi-Newton methods for solving large-scale unconstrained optimization problems has been extended by limited-memory quasi-Newton methods [15] and truncated Newton methods [12, 14]. However, the implementation of these methods for their practical usage is very sophisticated, and the associated software is quite complex [3]. Therefore, researchers have considered an alternative approach for the matrix \(B_{k}\), in which a diagonal matrix \(B_{k} = \mathrm{diag}(b^{(1)} _{k} , b^{(2)}_{k} ,\dots , b^{(n)}_{k} )\) is used to approximate the Hessian matrix [4, 11, 18, 27]. Observe that in this method, only \(O(n)\) storage is required to store \(B_{k}\) [4].
This paper first introduces an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8]. Subsequently, a new nonmonotone strategy to overcome the adverse effect of monotonicity is introduced. The estimation of the Hessian is used to correct the framework of a new nonmonotone TR algorithm with the regularized quasi-Newton method. The suggested algorithm exploits a stronger nonmonotone strategy far from the solution and a weaker one close to the solution. We prove that the new algorithm is globally and superlinearly convergent.
In the next section, an appropriate diagonal matrix estimation of the Hessian is derived. In Sect. 3, the new nonmonotone strategy and the structure of the suggested algorithm are explained. Section 4 is associated with the convergence analysis of the new algorithm. In Sect. 5, some numerical experiments on a set of unconstrained optimization test problems are examined. The conclusions are given in Sect. 6.
2 Derivation of new diagonal updating
In the quasi-Newton method framework, the Hessian approximation matrix \(B_{k+1}\) is usually required to satisfy the secant equation
where \(s_{k}=x_{k+1}-x_{k}\) and \(y_{k}=g_{k+1}-g_{k}\). To find an appropriate diagonal matrix estimation of the Hessian in the sense of
we assume that \(B_{k}\) is positive definite, and \(s^{T}_{k}y_{k} > 0\) for all k. Since it is difficult for a diagonal matrix to satisfy the known secant equation (2.1), we will consider that \(B_{k+1}\) satisfies the weak secant equation of Dennis and Wolkowicz [8], namely
The motivation for using the weak secant equation (2.2) can be seen in [4]. Byrd and Nocedal [6] introduced the function
defined on positive definite matrices, where \(\ln (\cdot )\) denotes the natural logarithm. This is an elegant and efficient tool for analyzing the global properties of quasi-Newton methods. We will introduce an appropriate diagonal matrix estimation of the Hessian by minimizing the Byrd and Nocedal [6] function subject to the weak secant equation (2.2) as follows:
subject to
To achieve a new diagonal updating formula, we give the following penalized version of (2.3) and (2.4):
Now, having in mind that \(\mathrm{tr}(B_{k+1}) = b^{1}_{k+1} +\cdots + b^{n}_{k+1}\) and \(\det (B_{k+1}) = b^{1}_{k+1} \cdots b^{n}_{k+1}\), the minimization problem (2.5) becomes
where \(s^{i}_{k}, i = 1, \dots , n\), are the components of vector \(s_{k}\).
The required solution of (2.3) and (2.4) is a stationary point of the penalized function. Hence, from (2.6), we have
Therefore, using (2.7), the elements of the diagonal matrix \(B_{k+1}\) can be expressed as
which are positive and well defined for \(s^{i}_{k}\neq 0\). Since we have \(s^{T}_{k}y_{k} > 0\) for all k, to ensure positiveness as well as uniform boundedness of \(b^{i}_{k+1}\) given by (2.8) in general situations, we set
where \(\upsilon \) is a small positive constant. Therefore, our Hessian approximation can be given by
A crucial issue is choosing the bounds \(L_{k}\) and \(U_{k}\). Here, we introduce an adaptive strategy to determine them. Let us begin by considering the curvature of \(f (x)\) in the direction \(s_{k}\), which is represented by
in which \(\bar{H}_{k} = \int _{0}^{1} \nabla ^{2} f (x_{k} +ts_{k})\,dt\) is the average Hessian matrix along \(s_{k}\). Since it is not practical to compute the eigenvalues of the Hessian matrix at each iteration, we can estimate its size based on the scalar
Now \(L_{k}\) and \(U_{k}\) in (2.9) can be chosen according to the value of \(t_{k}\) as follows:
where \(0 \le \sigma _{1}(t_{k}) \le 1\), \(\sigma _{2}(t_{k})\ge 1\), \(0< \underline{L} < 1\), and \(\bar{U} > 1\). Obviously, the values of two bounds can be adjusted by \(\sigma _{1}\) and \(\sigma _{2}\), whose values depend on \(t_{k}\). According to the relation (2.9), there exist two positive constants m and M such that
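The bound selection can be sketched in code. Note the hedges: the scalar \(t_{k}\) is assumed here to be the Rayleigh-quotient curvature estimate \(s_{k}^{T}y_{k}/\|s_{k}\|^{2}\), and the rule combining \(\sigma _{1}(t_{k})\), \(\sigma _{2}(t_{k})\), \(\underline{L}\), and \(\bar{U}\) into \(L_{k}\) and \(U_{k}\) is an illustrative assumption, not the paper's exact formula. The piecewise values of \(\sigma _{1}\) and \(\sigma _{2}\) are those reported in Sect. 5.

```python
def sigmas(t):
    """Piecewise choice of (sigma_1, sigma_2) as reported in Sect. 5."""
    if t < 0:
        return 0.0, 1.0
    elif t <= 10:
        return 0.5, 5.0
    else:
        return 1.0, 10.0

def adaptive_bounds(s, y, L_low=0.001, U_up=1000.0):
    """Curvature estimate t_k and bounds L_k, U_k.
    ASSUMPTION: t_k = s^T y / s^T s, and the bounds scale |t_k| by
    sigma_1, sigma_2 and clamp into [L_low, 1] and [1, U_up]."""
    t = sum(si * yi for si, yi in zip(s, y)) / sum(si * si for si in s)
    s1, s2 = sigmas(t)
    L = min(max(L_low, s1 * abs(t)), 1.0)   # keep 0 < L_low <= L <= 1
    U = max(min(U_up, s2 * abs(t)), 1.0)    # keep 1 <= U <= U_up
    return t, L, U
```

Any diagonal entry produced by (2.8) is then clipped into \([L_{k}, U_{k}]\), which enforces the uniform bounds \(m\) and \(M\) used in the convergence analysis.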
3 Nonmonotone strategy and new algorithm
Grippo et al. [10] found that enforcing monotonically decreasing objective function values in the classical iterative schemes for solving (1.1) may reduce the convergence speed of the TR method, especially in the presence of narrow curved valleys; see also [20]. As a remedy, scholars have devoted considerable effort to developing nonmonotone strategies that guarantee global convergence [1, 2, 18, 25]. The pioneering nonmonotone LS method was introduced by Grippo et al. [10] as follows:
in which \(\sigma \in (0, 1)\) is a constant,
\(\phi (0) = 0\), \(0 \le \phi (k) \le \min \{\phi (k -1) + 1, N\}\) for all \(k\ge 1\), and N is a nonnegative integer. Despite the advantages of this strategy, Zhang and Hager [25] found that it suffers from various weaknesses. They therefore proposed a nonmonotone strategy based on a weighted average of previous consecutive iterations. Moreover, using an adaptive convex combination of \(f_{l(k)}\) and \(f_{k}\), Amini et al. [2] proposed an effective substitute for the reference value in (3.1).
To counteract the adverse effect of monotonicity, here we introduce the following hybrid nonmonotone LS condition:
where \(\delta \in (0,1)\) is a constant and
with \(\xi _{k}\in [0,1]\). As we see, the definition of the mean values \(D_{k}\) implies that each \(D_{k}\) is a convex combination of \(f_{l(k)}\) and \(f_{k}\). For a given \(\xi _{0}\in [0,1]\), to calculate \(\xi _{k}\) we employ the following update formula:
The new nonmonotone LS is performed in a backtracking scheme. That is, the step length \(\alpha _{k}\) is the largest member of \(\{\rho ^{j} \beta _{k}\}_{j \ge 0}\), with \(\rho \in (0,1)\) and \(\beta _{k}>0\), that satisfies inequality (3.2) [2, 19]. Similar to [13], we set \(\beta _{k}=-{g^{T}_{k} p_{k}}/{\pi _{k}\|p_{k}\|^{2}}\), where \(\pi _{k}=\|y_{k}\|/\|p_{k}\|\).
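A minimal sketch of this backtracking scheme follows. Two hedges: condition (3.2) is assumed here to take the Armijo-type form \(f(x_{k}+\alpha p_{k}) \le D_{k} + \delta \alpha g_{k}^{T}p_{k}\), and since \(\pi _{k}\) involves \(y_{k}\), which is unavailable before the step is taken, the sketch starts from a unit trial step instead of \(\beta _{k}\).

```python
import numpy as np

def nonmonotone_linesearch(f, x, p, g, f_hist, xi, delta=1e-4, rho=0.5):
    """Backtracking on an assumed Armijo-type form of condition (3.2):
        f(x + a*p) <= D_k + delta * a * g^T p,
    with D_k = xi * f_l(k) + (1 - xi) * f_k, where f_l(k) is the maximum
    of the stored recent function values (at most N of them)."""
    f_lk = max(f_hist)                  # f_l(k): max over recent values
    fk = f_hist[-1]                     # current value f_k
    Dk = xi * f_lk + (1.0 - xi) * fk    # convex combination (3.3)
    gTp = float(g @ p)                  # must be negative for descent
    a = 1.0                             # initial trial step (assumption)
    while f(x + a * p) > Dk + delta * a * gTp:
        a *= rho                        # backtrack: a <- rho * a
    return a
```

Because \(D_{k}\ge f_{k}\), this condition is weaker than the classical monotone Armijo rule, so larger steps tend to be accepted far from the solution.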
Let \(p_{k}\) be the solution of (1.3) in which \(B_{k+1}\) is a diagonal matrix. To determine whether a trial step will be accepted, we compute \(\hat{r}_{k}\), the ratio between the actual reduction measured with respect to \(D_{k}\) and the reduction predicted by the model function \(\psi _{k}(p)\), by the following relation:
where \(D_{k}\) is computed by (3.3).
The new TR ratio implies that the suggested algorithm combines a stronger nonmonotone strategy far from the solution with a weaker one close to the solution, which yields the best convergence behavior; see [25].
Now, we can present the framework of the new algorithm as follows (see Algorithm 1).
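The overall flow can be sketched in a few lines. This is a schematic, not the paper's exact Algorithm 1: the link \(\lambda _{k}=\Delta _{k}\), the simplified nonmonotone reference \(D_{k}\), and the clipped sizing update for the diagonal estimate are all illustrative assumptions standing in for relations (1.4), (3.3), and (2.8)-(2.10).

```python
import numpy as np

def cntr_sketch(f, grad, x0, max_iter=200, tol=1e-5,
                mu0=0.1, c1=0.25, c2=2.0, delta0=0.1, L=1e-3, U=1e3):
    """Schematic sketch of the nonmonotone regularized quasi-Newton TR idea:
    a diagonal Hessian estimate kept in [L, U], a step from
    (B + lam*I) p = -g, a nonmonotone ratio test, and radius-style updates.
    ASSUMPTIONS: lam_k = Delta_k; D_k is a plain running maximum; the
    diagonal update is a clipped sizing rule, not the paper's (2.8)."""
    x = np.asarray(x0, float)
    b = np.ones_like(x)                      # diagonal Hessian estimate
    delta = delta0
    f_hist = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        lam = delta                          # assumed relation lam_k = Delta_k
        p = -g / (b + lam)                   # regularized quasi-Newton step
        pred = -(g @ p + 0.5 * p @ (b * p))  # predicted model decrease
        Dk = max(f_hist[-10:])               # crude nonmonotone reference
        r = (Dk - f(x + p)) / max(pred, 1e-16)
        if r >= mu0:                         # successful: accept the step
            s, y = p, grad(x + p) - g
            x = x + p
            f_hist.append(f(x))
            if s @ y > 0:                    # clipped sizing of the diagonal
                b = np.clip(b * (s @ y) / max(s @ (b * s), 1e-16), L, U)
            delta = max(c1 * delta, 1e-8)    # reduce regularization
        else:                                # unsuccessful: regularize more
            delta = c2 * delta
    return x
```

On a simple strictly convex quadratic, the loop drives the gradient below the tolerance in a handful of iterations, with the diagonal estimate quickly picking up the true curvature.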
4 Convergence analysis
In this section, we examine the convergence properties of the suggested algorithm. To this end, the following standard assumption is needed [18].
Assumption 4.1
The level set \(\varGamma (x_{0}) = \{x \mid f (x)\le f (x_{0})\}\) satisfies \(\varGamma (x_{0}) \subset \Im \), where \(\Im \) is a closed and bounded subset of \(\mathbb{R}^{n}\).
Remark 1
Let \(f(x)\) be a twice continuously differentiable function. Therefore, Assumption 4.1 implies that there exists a constant \(M_{1} > 0\) such that
Therefore, using the mean value theorem, one can conclude that
which means that \(g(x)\) is Lipschitz continuous in \(\Im \).
To establish global convergence of the iterative scheme \(x_{k+1} = x_{k} + \alpha _{k} p_{k}\), with the backtracking LS satisfying (3.2), we assume that Assumption 4.1 holds and the direction \(p_{k}\) satisfies the following sufficient descent conditions:
where \(a_{1}\) and \(a_{2}\) are two positive real-valued constants. For convenience in the discussion, we consider two index sets as follows:
Lemma 4.1
Suppose that the sequence \(\{B_{k}\}\) is generated by Algorithm 1. Then for any k, \(B_{k}\) is positive definite.
Proof
According to the definition of \(B_{k}=\mathrm{diag}(\bar{b}^{1}_{k},\bar{b}^{2}_{k},\dots ,\bar{b}^{n}_{k})\) and relation (2.9), this is obvious. □
Lemma 4.2
Suppose that sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then we have
Proof
Using the Taylor expansion with (2.11) and Assumption 4.1, we get
Hence, the proof is complete. □
Lemma 4.3
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then for all \(k\in \mathbb{N}\cup \{0\}\), we have \(x_{k} \in \Gamma (x_{0})\).
Proof
We consider two cases.
Case 1. If \(k\in \mathcal{I}\), then from (1.6) and (3.5), we can write
which shows that \(f_{k+1}\le D_{k}\), for all \(k \in \mathcal{I}\).
Case 2. If \(k\in \mathcal{J}\), then the trial step is rejected and LS must be performed. Through (1.7) we know that \(g ^{T}_{k} p_{k} \le 0\) for all k. Therefore, from this inequality along with (3.2), we conclude
Hence, we have \(f_{k+1}\le D_{k}\) for all \(k \in \mathcal{J}\). In addition, using the definition of \(f_{l(k)}\) and (3.3), we have
From (4.3) and (4.4) along with (4.5), we have \(f_{k+1}\le D_{k}\le f_{l(k)}\le f_{0} \) for all \(k\in \mathbb{N}\cup \{0\}\). Therefore, the sequence \(\{x_{k}\}\) is contained in \(\Gamma (x_{0})\). □
Lemma 4.4
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then the sequence \(\{f_{l(k)}\}\) is convergent.
Proof
From the definition of \(f_{l(k+1)}\) and Lemma 4.3, we have
Thus, \(\{f_{l(k)}\}\) is a nonincreasing sequence. Also, the boundedness of \(\{f_{k}\}\) leads to a lower bound. Therefore, the sequence \(\{f_{l(k)}\}\) is convergent. □
Lemma 4.5
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then we have
Proof
From the definition of \(f_{l(k+1)}\), we have \(f_{k+1}\le f_{l(k+1)}\) for all \(k\in \mathbb{N}\). Thus, according to (3.3), we can write
This completes the proof of the lemma. □
Lemma 4.6
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then Step 4 of the algorithm is well defined.
Proof
First, suppose by contradiction that there exists \(k \in \mathcal{J}\) such that
Using Taylor expansion and Lemma 4.5, we obtain
for some \(\zeta _{k} \in (x_{k}, x_{k}+\alpha _{k}p_{k})\). Therefore, using (2.11) and Assumption 4.1, we can write
If \(\alpha _{k}\to 0\), then we get
Due to the fact that \(\delta \in (0, 1)\), inequality (4.6) leads us to \(g^{T}_{k} p_{k}\ge 0\), which contradicts (1.7). So, Step 4 of Algorithm 1 is well defined. □
Lemma 4.7
Assume that the sequence \(\{x_{k} \}\) is generated by Algorithm 1. Then for all \(k\in \mathcal{J}\), the step length \(\alpha _{k}\) satisfies
Proof
Assume \(\alpha =\frac{\alpha _{k}}{\rho} \). It follows from Step 4 of Algorithm 1 that
By Taylor’s expansion, we have
where \(\zeta _{k} \in (x_{k}, x_{k}+\alpha p_{k})\). From (4.1), (4.7), (4.8), and Lemma 4.5, we obtain
therefore,
The combination of (4.2) and (4.9) implies that
which completes the proof of the lemma. □
Lemma 4.8
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then we have
Proof
We consider the following two cases:
Case 1. \(k \in \mathcal{I}\). It follows from (3.5) and Lemma 4.5 that
Now, arguing as in the proof of Theorem 3.2 in [1], we can deduce that (4.10) holds.
Case 2. \(k \in \mathcal{J}\). For \(k>N\), using (3.2) and Lemma 4.3, we can write
So, from Lemma 4.4, since \(\alpha _{l(k)-1} g^{T}_{l(k)-1} p_{l(k)-1}<0 \), we can conclude that
Now, from (1.7) along with (4.2), we have
Thus, using (4.11) and (4.12), it follows that
The remainder of the proof can be found in [10] and is omitted here. □
Corollary 4.1
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. Then we have
Proof
From Lemmas 4.3 and 4.5, we have \(f_{k}\le D_{k}\le f_{l(k)} \). This completes the proof by using Lemma 4.8. □
Lemma 4.9
Suppose that the sequence \(\{x_{k}\}\) is generated by Algorithm 1. If the sequence \(\{x_{k}\}\) does not converge to a stationary point, i.e., there exists a constant \(\varepsilon >0\) such that
holds for all \(k\in \mathbb{N}\), then
holds for all \(k\in \mathbb{N}\).
Proof
We consider two cases as follows:
Case 1. \(k\in \mathcal{I}\). From (1.6), (2.11), (3.5), and (4.13), we have
Case 2. \(k\in \mathcal{J}\). Similar to Case 2 in the proof of Lemma 4.3, it follows that
Now from (1.7), (2.11), (4.13), and Lemma 4.7, we get
Let \(\varphi =\min \lbrace \mu _{0}\varepsilon \gamma ,\gamma \varepsilon \delta \frac{(1-\delta )\rho a_{1} }{a^{2}_{2}(\delta M+M_{1})} \rbrace \). Combining (4.15) and (4.16), we conclude that relation (4.14) is valid, for all \(k \in \mathbb{N}\). □
In this situation, it is possible to prove the following convergence theorems for Algorithm 1.
Theorem 4.1
Algorithm 1 either terminates in finitely many iterations, or generates an infinite sequence \(\{x_{k}\}\) which satisfies
Proof
If Algorithm 1 terminates in finitely many iterations, the theorem is true. If (4.17) is not true, then there exists a constant \(\varepsilon > 0\) such that (4.13) holds.
Let \(S = \lbrace k:\hat{r}_{k}\ge \mu _{0} \rbrace \). We prove that \(\lambda _{k}\to \infty \) and \(\Delta _{k}\to \infty \), as \(k\to \infty \). From
it follows that \(\lambda _{k}\to \infty \Longleftrightarrow \Delta _{k}\to \infty \).
We consider the following cases:
Case 1. If S is a finite set, then there exists some \(\bar{k} > 0\) such that \(\hat{r}_{k}<\mu _{0}\) holds for all \(k> \bar{k}\). Thus we have that \(\Delta _{k+1}\ge c_{2} \Delta _{k}\) holds for all \(k > \bar{k}\). Since \(c_{2} > 1\), we conclude that
Case 2. If S is an infinite set, then from Lemma 4.9, we have that
holds for all \(k\in S\). Thanks to Corollary 4.1, we get
which implies that
From the above equality, together with relation (1.5), it follows that
Case 3. Suppose that \(S^{c}\) denotes the complement of S and \(S ^{c}\) is an infinite set. Now we only need to prove that \(\lambda _{k}\to \infty \) as \(k\to \infty \) with \(k \in S^{c}\). Let \(I^{*}= \lbrace k_{i}:k_{i} - 1\in S \text{ and } k_{i}\in S^{c} \rbrace \); then \(\{k_{i} -1\}\) is an infinite subset of S. Using (4.20), we have \(\Delta _{k_{i} -1} \to \infty \) as \(i\to \infty \). From \(k_{i} - 1\in S\), we conclude that \(\hat{r}_{k_{i} -1}\ge \mu _{0}\) and that either \(\Delta _{k_{i}}=c_{1}\Delta _{k_{i} -1}\) or \(\Delta _{k_{i}}=\Delta _{k_{i} -1}\) holds. Since \(0< c_{1}<1\), \(\Delta _{k_{i}}\ge c_{1}\Delta _{k_{i} -1}\) holds for all \(k_{i}\in I^{*}\).
Hence, we have
For any \(k\notin S\), there exists an index \(k_{i}\) such that \(k_{i} \le k\) and all iterations between \(k_{i}\) and k are unsuccessful. According to the construction of Algorithm 1, we can write
thus it follows from (4.21) and (4.22) that
Combining (4.20) and (4.23), we have that
Now from (1.5) and (4.24), we can write
for all \(k\in \mathbb{N}\). Thanks to (4.25), it follows that \(p_{k} \to 0\). Therefore, using (1.6), (2.11), and Lemma 4.2 as \(k \to \infty \), we get
from which we can deduce that
for all sufficiently large k. The construction of Algorithm 1 and (4.26) shows that there exists a positive constant \(\Delta ^{*}\) such that \(\Delta _{k}\le \Delta ^{*} \) holds for all sufficiently large k, which contradicts (4.24). The proof is completed. □
Theorem 4.2
Suppose the infinite sequence \(\{x_{k}\}\), convergent to \({x^{*}}\), is generated by Algorithm 1. In addition, assume that \(\nabla ^{2} f(x^{*})\) is positive definite. If the condition
holds, then the sequence \(\{x_{k}\}\) converges to \(x^{*}\) superlinearly.
Proof
Let \(p_{k}\) be the exact solution of (1.3) which satisfies
We show that \(\hat{r}_{k} \ge \mu _{1}\) for k sufficiently large and \(\Delta _{k}\to 0\) as \(k\to \infty \). First, we define
and then we prove that \(r_{k}\ge \mu _{1}\). From (4.29) it follows that
thus we can express
From the direct computation, we obtain
where the last equality is obtained from (4.28). Therefore, we have that
By Taylor expansion, we can write
where \(\zeta _{k} \in (x_{k}, x_{k+1})\). From (1.3) and (4.32), we get
Since \(\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})\), using (4.27) and (4.33), we obtain
It follows from (4.31), (4.34), and \(0< \mu _{1}<1\) that
therefore, from (4.30), we have \(r_{k}\ge \mu _{1}\). Now using (3.5) and Lemma 4.5, we can write
for k sufficiently large. Using Step 4 of Algorithm 1, we have \(\Delta _{k+1}=c_{1}\Delta _{k}\), which implies that \(\lim_{k\to \infty}\Delta _{k} = 0 \). Therefore, from (1.4), we get
By the Taylor expansion, we can write
where \(\zeta _{k} \in (x_{k},x_{k+1})\). So, from (4.28) and (4.36) for k sufficiently large, we have
Thus we can write
The right-hand side of (4.37) converges to 0 due to the fact that \(\nabla ^{2} f(\zeta _{k})\to \nabla ^{2} f(x^{*})\), and by using (4.27) and (4.35). Hence, we deduce that
Since \(f (x)\) is twice continuously differentiable, along with Assumption 4.1, we conclude that there exists \(\tau > 0\) such that
From (4.39), it follows that
In view of (4.38), from (4.40) we have \(\lim_{k\to \infty} \frac{ \| x_{k+1}-x^{*}\|}{\| x_{k}-x^{*}\|}=0 \). Therefore, the sequence \(\{x_{k}\}\) converges to \(x^{*}\) superlinearly. □
5 Numerical experiments
In this section, we report the performance of the proposed algorithm, CNTR, as well as some comparisons of the CNTR algorithm with the NARQNLS algorithm of Zhang and Ni [24] and the NTRLS algorithm of Qunyan and Dan [18].
The experiments have been performed on a set of unconstrained test functions. All test functions are chosen from Andrei [3], which are listed in Table 1.
We performed the numerical calculations in the MATLAB R2020b (9.9.0.1467703) programming environment. The codes were run on a PC with an Intel(R) Core(TM) i7-1355U 2.8 GHz processor and 16 GB of RAM.
In practical implementations of the CNTR algorithm, we set \(\mu _{0}=0.1\), \(\mu _{1}=0.8\), \(c_{1}=0.25\), \(c_{2}=2\), \(\underline{L}=0.001\), \(\bar{U}=1000\), \(\upsilon =10^{-5}\), \(\Delta _{0}=0.1\), \(\rho =0.5\), \(\pi _{0}=0.4\), and \(N=10\). If \(t_{k}< 0 \), then \(\sigma _{1} = 0\), \(\sigma _{2} = 1\); if \(0\le t_{k} \le 10 \), then \(\sigma _{1} = 0.5\), \(\sigma _{2} = 5\); if \(t_{k}>10 \), then \(\sigma _{1} = 1\), \(\sigma _{2} = 10\). To calculate \(\xi _{k}\), we set \(\xi _{0}=0.85\) and then update \(\xi _{k}\) by relation (3.4).
The values of all parameters used for the NARQNLS and NTRLS algorithms are the same as in [24] and [18], respectively. All algorithms were terminated when an iterate satisfied \(\|g(x_{k})\|\le 10^{-5}\) or when \(k> 10000\).
The results obtained are reported in Table 2. The notations used in the table are defined as follows: NI, the number of iterations; NF, the number of function evaluations; and \((-) \), a failure in which the number of iterations exceeded 10000.
Taking a glance at Table 2, it can be seen that the CNTR algorithm has solved all the test functions, while the other considered algorithms have failed in some cases.
Dolan and Moré [9] proposed a new method to compare the performance of iterative algorithms with a statistical process by displaying performance profiles.
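The Dolan–Moré profile can be computed directly from a table of solver costs; a minimal sketch with synthetic data (not the paper's results) follows. For each problem, the cost of every solver is divided by the best cost on that problem, and \(\rho _{s}(\tau )\) is the fraction of problems on which solver s stays within a factor \(\tau \) of the best.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.
    T[p, s] = cost (e.g. NI, NF, or CPU time) of solver s on problem p,
    with np.inf marking a failure. Returns rho[s, t] = fraction of
    problems whose ratio T[p, s] / min_s' T[p, s'] is at most taus[t]."""
    T = np.asarray(T, float)
    best = T.min(axis=1, keepdims=True)      # best cost on each problem
    ratios = T / best                        # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Synthetic example: two solvers, three problems; solver 0 wins twice
# but fails on the third problem.
T = [[2.0, 4.0],
     [1.0, 3.0],
     [np.inf, 5.0]]
rho = performance_profile(T, taus=[1.0, 2.0])
```

The value \(\rho _{s}(1)\) is the fraction of wins for solver s, while the behavior for large \(\tau \) reflects robustness, which is how Figs. 1-3 are read.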
Figures 1, 2, and 3 show the performance profile of CNTR and the other considered algorithms in terms of the number of iterations \((NI)\), number of function evaluations \((NF)\), and CPU time, respectively.
From Figs. 1 and 2, it can be easily seen that CNTR has the most wins among all considered algorithms. More precisely, the CNTR algorithm is the best in terms of the total number of iterations and function evaluations in more than 47% and 68% of the test functions, respectively. In Fig. 3, we observe that in more than 85% of cases, the CNTR algorithm is faster than the other algorithms. Another remarkable factor of these three figures is that the performance profile of the CNTR algorithm grows faster than the other profiles. These observations imply that the CNTR algorithm is more efficient and robust than the other considered algorithms.
6 Conclusion
Minimizing the Byrd and Nocedal [6] function subject to the weak secant equation of Dennis and Wolkowicz [8], we have introduced an appropriate diagonal matrix estimation of the Hessian. The Hessian estimate has been used to correct the framework of a nonmonotone trust region algorithm with the regularized quasi-Newton method. To overcome the adverse effect of monotonicity, we have introduced a new nonmonotone strategy. The global and superlinear convergence of the proposed algorithm has been established under some standard conditions. The Dolan-Moré performance profiles show that the suggested algorithm is efficient and robust on the set of unconstrained optimization test functions.
Data Availability
No datasets were generated or analysed during the current study.
References
Ahookhosh, M., Amini, K.: A nonmonotone trust region method with adaptive radius for unconstrained optimization. Comput. Math. Appl. 60(3), 411–422 (2010)
Ahookhosh, M., Amini, K., Peyghami, M.R.: A non-monotone trust region line search method for large scale unconstrained optimization. Appl. Math. Model. 36(1), 478–487 (2012)
Andrei, N.: An unconstrained optimization test functions collection. Adv. Model. Optim. 10(1), 147–161 (2008)
Andrei, N.: A diagonal quasi-Newton updating method for unconstrained optimization. Numer. Algorithms 81(4), 575–590 (2019)
Andrei, N.: Modern Numerical Nonlinear Optimization. Springer Optimization and Its Applications, vol. 195. Springer, Berlin (2022)
Byrd, R., Nocedal, J.: A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal. 26(3), 727–739 (1989)
Cartis, C., Gould, N.I.M., Toint, P.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127(2), 245–295 (2011)
Dennis, J.E., Wolkowicz, H.: Sizing and least-change secant methods. SIAM J. Numer. Anal. 30(5), 1291–1314 (1993)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Grippo, L., Lampariello, F., Lucidi, S.: A non-monotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23(4), 707–716 (1986)
Leong, W.J., Enshaei, S., Kek, S.L.: Diagonal quasi-Newton methods via least change updating principle with weighted Frobenius norm. Numer. Algorithms 86(3), 1225–1241 (2021)
Li, Y.J., Li, D.H.: Truncated regularized Newton method for convex minimizations. Comput. Optim. Appl. 43(1), 119–131 (2009)
Liu, J., Ma, C.: A non-monotone trust region method with new inexact line search for unconstrained optimization. Numer. Algorithms 64(1), 1–20 (2013)
Nash, S.G.: Preconditioning of truncated-Newton methods. SIAM J. Sci. Stat. Comput. 6(3), 599–616 (1985)
Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comput. 35(151), 773–782 (1980)
Polyak, R.A.: Regularized Newton method for unconstrained convex optimization. Math. Program. 120(1), 125–145 (2009)
Powell, M.J.D.: Convergence properties of a class of minimization algorithms. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming, vol. 2, pp. 1–25. Academic Press, New York (1975)
Qunyan, Z., Dan, H.: Non-monotone adaptive trust region method with line search based on new diagonal updating. Appl. Numer. Math. 91, 75–88 (2015)
Sun, W., Yuan, Y.X.: Optimization Theory and Methods. Nonlinear Programming. Springer, New York (2006)
Toint, P.L.: An assessment of non-monotone line search techniques for unconstrained optimization. SIAM J. Sci. Comput. 17(3), 725–739 (1996)
Ueda, K., Yamashita, N.: A regularized Newton method without line search for unconstrained optimization. Comput. Optim. Appl. 59(1–2), 321–351 (2014)
Wan, Z., Huang, S., Zheng, X.D.: New cautious BFGS algorithm based on modified Armijo-type line search. J. Inequal. Appl. 2012(1), 1 (2012)
Zhang, H., Ni, Q.: A new regularized quasi-Newton algorithm for unconstrained optimization. Appl. Math. Comput. 259, 460–469 (2015)
Zhang, H., Ni, Q.: A new regularized quasi-Newton method for unconstrained optimization. Optim. Lett. 12(1), 1639–1658 (2018)
Zhang, H.C., Hager, W.W.: A non-monotone line search technique for unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)
Zhou, W., Chen, X.: On the convergence of a modified regularized Newton method for convex optimization with singular solutions. J. Comput. Appl. Math. 239(1), 179–188 (2013)
Zhu, M., Nazareth, J.L., Wolkowicz, H.: The quasi-Cauchy relation and diagonal updating. SIAM J. Optim. 9(4), 1192–1204 (1999)
Acknowledgements
Not applicable.
Funding
Not applicable.
Author information
Contributions
The authors confirm their contributions to the manuscript as follows: study conception and design: Ali Ashrafi; convergence analysis: Seyed Hamzeh Mirzaei; performing numerical tests and interpretation of results: Seyed Hamzeh Mirzaei; draft manuscript preparation: Seyed Hamzeh Mirzaei and Ali Ashrafi. All authors reviewed the results and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Mirzaei, S.H., Ashrafi, A. Correction of nonmonotone trust region algorithm based on a modified diagonal regularized quasi-Newton method. J Inequal Appl 2024, 90 (2024). https://doi.org/10.1186/s13660-024-03161-x