Skip to main content

Advertisement

Springer Nature is making Coronavirus research free. View research | View latest news | Sign up for updates

A simple approximated solution method for solving fractional trust region subproblems of nonlinearly equality constrained optimization

  • 91 Accesses

Abstract

In this paper, a fractional model is used to solve nonlinearly constrained optimization problems. In order to solve the fractional trust region subproblems simply, we propose an approximated solution method by cyclically fixing the fractional coefficient part of the approximate function. The global convergence of the fractional trust region method is proved, and the numerical results show that the new algorithm is effective and stable.

Introduction

In this paper, we consider the nonlinear equality constrained optimization problem

$$\begin{aligned}& \min_{x\in R^{n}} f(x), \end{aligned}$$
(1.1)
$$\begin{aligned}& \text{s.t.}\quad C(x)=0, \end{aligned}$$
(1.2)

where \(C(x)=(c_{1}(x), \ldots , c_{m}(x))^{T}\) (\(m\leq n\)) and \(f(x)\), \(c_{i}(x)\) (\(i=1,\ldots ,m\)) are continuously differentiable. The Lagrangian function for problem (1.1)–(1.2) is defined as follows:

$$\begin{aligned} L(x,\lambda ) = f(x) + \sum_{i=1}^{m} \lambda _{i} c_{i}(x), \end{aligned}$$
(1.3)

where \(\lambda _{i}\) for \(i=1,\ldots ,m\) are Lagrange multipliers. Problem (1.1)–(1.2) has been studied by many researchers, including Han [1], Powell [2], Yuan and Sun [3], Powell and Yuan [4], etc.

There are many efficient methods to solve problem (1.1)–(1.2), and the trust region method is a very effective method (see [413]). In addition, the book of Conn, Gould, and Toint [14] is an excellent and comprehensive one on trust region methods. However, most of these methods use the quadratic model to approximate \(f(x)\). The sequential quadratic programming method for (1.1)–(1.2) computes a search direction by minimizing a quadratic model of the Lagrangian subject to the linearized constraints. That is, at the kth iteration, the following subproblem

$$\begin{aligned}& \min_{s \in R^{n}} \varrho _{k} (s) = g_{k}^{T}s + \frac{1}{2} s ^{T}B_{k}s, \end{aligned}$$
(1.4)
$$\begin{aligned}& \text{s.t.}\quad A_{k}^{T}s + C_{k} =0 \end{aligned}$$
(1.5)

is solved to obtain a search direction \(s_{k}\), where \(x_{k}\) is the current iterate point, \(g_{k}=\nabla f(x_{k})\), \(B_{k}\) is symmetric and an approximation to the Hessian \(\nabla _{xx} L(x_{k}, \lambda _{k})\) of the Lagrangian of problem (1.1)–(1.2), \(A_{k}=[\nabla c_{1}(x _{k}),\ldots ,\nabla c_{m}(x_{k})]\) and \(C_{k}=C(x_{k})\). The constraint gradients \(\nabla c_{i}(x_{k})\) are assumed to be linearly independent for all \(x_{k}\). However, if the objective function possesses high nonlinear property and the iterative point is far away from the minimum, the quadratic model could not approximate the original problem very well, which may lead to iteration proceeding slowly.

In 1980, Davidon [15] first proposed the collinear scaling of variables and conic model method for unconstrained optimization. The conic model is

$$\begin{aligned} \tilde{\phi }_{k} (s) = f_{k}+ \frac{g_{k}^{T}s}{1 - a_{k}^{T}s} + \frac{1}{2}\frac{s^{T}B_{k}s}{(1 - a_{k}^{T}s)^{2}}, \end{aligned}$$
(1.6)

where \(a_{k}\in R^{n}\) is a horizontal vector. Then, Sorensen [16] published detailed results on a class of conic model methods and proved that a particular member of this class has the Q-superlinear convergence. Many other scholars have also studied the trust region algorithm of the conic model (see [17, 18]).

The trial step of a trust region algorithm is usually obtained by solving a trust region subproblem. In our trust region subproblem, the trust region bound constraint is

$$\begin{aligned} \Vert s \Vert \le \Delta _{k}, \end{aligned}$$
(1.7)

where \(\Delta _{k}\) is the trust region radius at the kth iteration. It is easy to see that there is a possibility that the linearized constraints (1.5) may have no solutions in the trust region (1.7). To overcome this difficulty, we use a relaxed version of the linearized constraint, which was proposed by Byrd, Schnabel, and Schultz in [19]. In [20], Sun also used this relaxed version of the linearized constraint and proposed a conic trust region method for nonlinearly constrained optimization. That is, at the kth iteration, the trial step \(s_{k}\) is computed by solving the following conic model trust region subproblem:

$$\begin{aligned}& \min_{s \in R^{n}} \phi _{k} (s) = \frac{g_{k}^{T}s}{1 - a_{k} ^{T}s} + \frac{1}{2}\frac{s^{T}B_{k}s}{(1 - a_{k}^{T}s)^{2}}, \end{aligned}$$
(1.8)
$$\begin{aligned}& \text{s.t.}\quad A_{k}^{T}s + \theta _{k} C_{k} =0, \end{aligned}$$
(1.9)
$$\begin{aligned}& \hphantom{\text{s.t.}\quad} \Vert s \Vert \le \Delta _{k}, \end{aligned}$$
(1.10)

where \(\|\cdot \|\) refers to the Euclidean norm and \(\theta _{k}\) is a relaxation parameter. \(\theta _{k}\in (0,1]\) is chosen such that the feasible set of (1.9) and (1.10) is not empty. Geometrically speaking, the role of \(\theta _{k}\) is to compress the feasible area of each constraints of (1.5) to the direction of origin (see [8, 21]).

The above conic model (1.8) has only one parameter \(a_{k}\) and less degree of freedom, which affects the effect of the model approaching the objective function. Therefore, we consider selecting a fractional model with more parameters, which can make full use of the function and gradient information in the previous iteration and can approximate the original function well, thus obtaining a new method for solving the optimization problem. In [22], Zhu considered the unconstrained optimization problem and proposed a fractional model

$$\begin{aligned} \psi _{k} (s) = h_{k}(s) g^{T}_{k}s + \frac{1}{2}h_{k}^{2}(s) s^{T}B _{k}s, \end{aligned}$$
(1.11)

where

$$\begin{aligned} h_{k}(s) = \frac{(1 + c_{k}^{T}s)}{(1 - a_{k}^{T}s)(1 - b_{k}^{T}s)}, \end{aligned}$$
(1.12)

and horizontal vectors \(a_{k}, b_{k}, c_{k}\in R^{n}\) are bounded. The gradient of \(\psi _{k} (s)\) is

$$\begin{aligned} \nabla \psi _{k} (s)= \frac{1}{v_{s}^{3}} \omega (s)\bigl[v_{s} g_{k} + \bigl(1 + c_{k}^{T} s\bigr)B_{k}s\bigr], \end{aligned}$$
(1.13)

where

$$\begin{aligned}& v_{s}=\bigl(1 - a_{k}^{T}s \bigr) \bigl(1 - b_{k}^{T}s\bigr), \end{aligned}$$
(1.14)
$$\begin{aligned}& \omega (s)=\bigl(1 + c_{k}^{T}s \bigr)v_{s}I + \bigl[c_{k}v_{s} + \bigl(1 + c_{k}^{T}s\bigr) \bigl(a _{k} \bigl(1-b_{k}^{T}s\bigr) + b_{k} \bigl(1- a_{k}^{T}s\bigr)\bigr)\bigr] {s^{T}}. \end{aligned}$$
(1.15)

If \(b_{k}=c_{k}=0\), then \(\psi _{k} (s)\) is reduced to the conic model. If \(a_{k}=b_{k}=c_{k}=0\), then \(\psi _{k}(s)\) is the quadratic model \(\varrho _{k} (s)\). The fractional model is new and it is an extension to conic model \(\phi _{k} (s)\). It has more choice of parameter vectors and can make use of their function and gradient information. Based the new fractional model \(\psi _{k} (s)\), a simplified fractional trust region subproblem

$$\begin{aligned}& \min_{s \in R^{n}} \psi _{k} (s), \end{aligned}$$
(1.16)
$$\begin{aligned}& \text{s.t.}\quad \Vert s \Vert \le \tilde{\Delta }_{k} \end{aligned}$$
(1.17)

is proposed, where

$$\begin{aligned}& \tilde{\Delta }_{k} = \min \biggl\{ {\Delta _{k}, \frac{\epsilon _{1}}{{ \Vert a_{k} \Vert }},\frac{\epsilon _{1}}{{ \Vert b_{k} \Vert }}, \frac{ \epsilon _{1}}{{ \Vert c_{k} \Vert }}} \biggr\} , \end{aligned}$$
(1.18)
$$\begin{aligned}& 0 < \epsilon _{1} < 1, \end{aligned}$$
(1.19)

and the parameters \(a_{k}\), \(b_{k}\), \(c_{k}\) were chosen such that

$$ \Vert a_{k} \Vert \tilde{\Delta }_{k} \leq \epsilon _{1}, \qquad \Vert b_{k} \Vert \tilde{\Delta }_{k} \leq \epsilon _{1}, \qquad \Vert c_{k} \Vert \tilde{\Delta } _{k} \leq \epsilon _{1}. $$
(1.20)

Subproblem (1.16)–(1.17) is solved only in quasi-Newton direction in [22]. Based on this, we study in depth the Newton point and the steepest descent point of the fractional trust region subproblem, so as to construct a simple dogleg method to solve the subproblem (see [23]). Numerical experiments show that the fractional model trust region quasi-Newton algorithm seems to be superior to the conic model trust region algorithm in terms of the number of iterations and the running time as the dimension of the optimization problem increases. For the linear equality constrained optimization problem, the null space technique is used to delete the linear equality constraint, and the fractional trust region method for solving the linear equality constrained optimization problem is proposed (see [24]).

Now we use the fractional model to solve nonlinearly constrained optimization problems. In order to solve problem (1.1)–(1.2), we consider the following fractional trust region subproblem. That is, if the current iteration point is \(x_{k}\), then the trial step \(s_{k}\) is computed by

$$\begin{aligned}& \min_{s \in R^{n}} \psi _{k} (s) , \end{aligned}$$
(1.21)
$$\begin{aligned}& \text{s.t.}\quad A_{k}^{T}s+ \theta _{k} C_{k} =0, \end{aligned}$$
(1.22)
$$\begin{aligned}& \hphantom{\text{s.t.}\quad}\bigl\vert \bigl(1 - a_{k}^{T}s\bigr) \bigl(1 - b_{k}^{T}s\bigr) \bigr\vert \geq \varepsilon _{0}, \end{aligned}$$
(1.23)
$$\begin{aligned}& \hphantom{\text{s.t.}\quad} \Vert s \Vert \le \Delta _{k}, \end{aligned}$$
(1.24)

where \(\varepsilon _{0}\in (0,1)\). The purpose of adding constraint (1.23) is to guarantee that the objective function \(\psi _{k} (s)\) is bounded in the trust region and this also increases the difficulty of calculation.

We know that the exact solution of the trust region subproblem is often difficult to obtain, so many approximate solution methods have been spawned. For example, the trust region method is often combined with dogleg method, conjugate gradient method, inexact line search method, alternate direction search method, and other methods to obtain approximate solutions of trust region subproblems. In this paper, we also consider an approximate solution method for the subproblem of the trust region. First, the null space technique is used to remove the linear equality constraints of the trust region subproblem; then the fractional model of the subproblem is reduced to a quadratic model by cyclically fixing the coefficient part; finally, the problem can be easily solved by searching in the descending direction to obtain an approximate solution to the subproblem.

In the global algorithm, we propose a quasi-Newton trust region method with a fractional model and prove the global convergence of the new algorithm. The numerical experiment shows that the new algorithm is effective and robust, especially for large-scale test problems.

The organization of this paper is as follows. In Sect. 2, the fractional trust region subproblem and an algorithm for solving the subproblem are presented. In Sect. 3, we propose a quasi-Newton method with a fractional model for nonlinearly equality constrained optimization and prove its convergence under some reasonable conditions. Numerical results are given in Sect. 4.

Throughout this paper, we use \(\|\cdot \|\) for the 2-norm.

The algorithm of fractional trust region subproblem

In order to solve (1.21)–(1.24), firstly we consider removing constraint (1.23) by the same process of simplification as in [22]. Therefore, when the parameters \(a_{k}\), \(b _{k}\), \(c_{k}\) satisfy (1.20) where

$$\begin{aligned} 0 < \epsilon _{1}\leq 1-\sqrt{\varepsilon _{0}} < 1, \end{aligned}$$
(2.1)

then subproblem (1.21)–(1.24) can be rewritten as the following reduced subproblem:

$$\begin{aligned}& \min_{s \in R^{n}} \psi _{k} (s), \end{aligned}$$
(2.2)
$$\begin{aligned}& \text{s.t.}\quad A_{k}^{T}s+ \theta _{k} C_{k} =0, \end{aligned}$$
(2.3)
$$\begin{aligned}& \hphantom{\text{s.t.}\quad} \Vert s \Vert \le \tilde{\Delta }_{k}, \end{aligned}$$
(2.4)

where \(\tilde{\Delta }_{k} \) is defined as (1.18).

The null-space technique (see [20, 25, 26]) is an important technique for solving optimization problems with equality constraints. In the following, we show the method to eliminate constraint (1.22). Assume that \(A_{k}\) has full column rank, and there exist an orthogonal matrix \(Q_{k}\) and a nonsingular upper triangular matrix \(R_{k}\) such that

Ak=QkRk=[Qk(1)Qk(2)][Rk(1)0]=Qk(1)Rk(1),
(2.5)

where \(Q_{k}^{(1)}\in R^{n\times m}\), \(Q_{k}^{(2)}\in R^{n\times (n-m)}\), and \(R^{(1)}_{k} \in R^{m \times m}\). Then (1.22) can be rewritten as

$$\begin{aligned} \bigl(R^{(1)}_{k}\bigr)^{T} \bigl(Q_{k}^{(1)}\bigr)^{T} s = -\theta _{k} C_{k}. \end{aligned}$$
(2.6)

Therefore the feasible point for (1.22) can be presented by

$$\begin{aligned} s = \tilde{s}_{k}+ Q_{k}^{(2)} u \end{aligned}$$
(2.7)

for any \(u \in R^{n-m}\), where

$$\begin{aligned} \tilde{s}_{k} = -\theta _{k} Q_{k}^{(1)} \bigl(R^{(1)}_{k}\bigr)^{-T} C_{k}, \end{aligned}$$
(2.8)

and \(Q_{k}^{(2)} u\) lies in the null space of \(A_{k}\). We denote

$$\begin{aligned} l_{k} = - Q_{k}^{(1)} \bigl(R^{(1)}_{k} \bigr)^{-T} C_{k}, \end{aligned}$$
(2.9)

then

$$\begin{aligned} \tilde{s}_{k} = \theta _{k} l_{k}. \end{aligned}$$
(2.10)

In order to ensure that s lies in the trust region, we choose \(\theta _{k}\) such that the norm of \(\tilde{s}_{k}\) is at most \(\iota \tilde{\Delta }_{k}\), where \(\iota \in (0,1)\) is a given constant. That is,

$$\begin{aligned} \Vert \tilde{s}_{k} \Vert = \theta _{k} \Vert l_{k} \Vert \leq \iota \tilde{\Delta } _{k}. \end{aligned}$$
(2.11)

Define

$$\begin{aligned} \theta _{k} = \min \biggl\{ 1, \frac{\iota \tilde{\Delta }_{k}}{ \Vert l _{k} \Vert } \biggr\} ,\qquad \widehat{\Delta }_{k} = \sqrt{ \tilde{\Delta }^{2}_{k}- \Vert \tilde{s} _{k} \Vert ^{2} }. \end{aligned}$$
(2.12)

Then the fractional trust region subproblem (2.2)–(2.4) becomes

$$\begin{aligned}& \min_{u \in R^{n-m}} \tilde{\psi }_{k} (u) = h_{k} \bigl(s(u)\bigr) g^{T}_{k} s(u) + \frac{1}{2} h _{k}^{2} \bigl(s(u)\bigr) s(u)^{T} B_{k} s(u), \end{aligned}$$
(2.13)
$$\begin{aligned}& \text{s.t.}\quad \Vert u \Vert \le \widehat{ \Delta }_{k}, \end{aligned}$$
(2.14)

where \(s(u)=\tilde{s}_{k}+ Q_{k}^{(2)} u\).

Remark 1

\(\| u \| \le \widehat{ \Delta }_{k}\) is equivalent to \(\| s \| \le \tilde{\Delta }_{k}\). It is easy to see that if \(\| u \| \le \widehat{ \Delta }_{k}\), then from (2.7), (2.12), and \(\|Q_{k}^{(2)}\|=1\), we have

$$\begin{aligned} \Vert s \Vert = \sqrt{ \Vert \tilde{s}_{k} \Vert ^{2} + \bigl\Vert Q_{k}^{(2)} u \bigr\Vert ^{2}} \leq \sqrt{ \Vert \tilde{s}_{k} \Vert ^{2}+ \Vert u \Vert ^{2} }\leq \sqrt{ \Vert \tilde{s}_{k} \Vert ^{2} +\widehat{\Delta }^{2}_{k}}= \tilde{\Delta }_{k}. \end{aligned}$$

On the other hand, if \(\| s \| \le \tilde{\Delta }_{k}\), then from (2.5) we have

$$\begin{aligned} \Vert u \Vert = \bigl\Vert Q_{k}^{(2)} u \bigr\Vert = \sqrt{ \Vert s \Vert ^{2}- \Vert \tilde{s}_{k} \Vert ^{2} }\leq \sqrt{\tilde{\Delta }^{2}_{k}- \Vert \tilde{s}_{k} \Vert ^{2} }= \widehat{\Delta }_{k}. \end{aligned}$$

Therefore, if \(u_{\ast }\) is the solution of (2.13)–(2.14), then \(s_{\ast } = \tilde{s}_{k}+ Q_{k} ^{(2)} u_{\ast }\) is the solution of (2.2)–(2.4), whereas if \(s_{\ast }\) is the solution of (2.2)–(2.4), then \(u_{\ast } = (Q_{k}^{(2)})^{T}(s_{\ast }-\tilde{s}_{k})\) is the solution of (2.13)–(2.14).

In order to solve subproblem (2.13)–(2.14) by a simple method, we first fix the fractional part of \(\tilde{\psi }_{k} (u)\) by letting \(u=0\) in \(h_{k} (s(u))\). At the same time, we set \(u= - \tau \nabla \tilde{\psi }_{k} (0)\), where \(\tau >0\), \(\nabla \tilde{\psi }_{k} (0)\neq 0\). By direct computation, we know that \(\nabla \tilde{\psi }_{k} (0)= (Q_{k}^{(2)})^{T} \nabla \psi _{k} ( \tilde{s}_{k})\), where \(\nabla \psi _{k} (s)\) is defined by (1.13). Then from (2.7) we have that (2.13)–(2.14) reduce to the following simplified subproblem:

$$\begin{aligned}& \min_{\tau \in R} \varphi _{k} (\tau ), \end{aligned}$$
(2.15)
$$\begin{aligned}& \text{s.t.}\quad 0\leq \tau \le \tau _{ \Delta }, \end{aligned}$$
(2.16)

where

$$\begin{aligned}& \varphi _{k} (\tau )= h_{k} (\tilde{s}_{k}) g^{T}_{k} s(\tau ) + \frac{1}{2} h_{k}^{2} (\tilde{s}_{k}) s(\tau )^{T} B_{k} s(\tau ) \end{aligned}$$
(2.17)
$$\begin{aligned}& \hphantom{\varphi _{k} (\tau )} = \frac{1}{2} h_{k}( \tilde{s}_{k}) \bigl(a_{\tau }^{(1)} \tau ^{2} + b_{ \tau }^{(1)} \tau + c_{\tau }^{(1)} \bigr), \end{aligned}$$
(2.18)
$$\begin{aligned}& s(\tau )= \tilde{s}_{k}- \tau Q_{k}^{(2)} \bigl(Q_{k}^{(2)}\bigr)^{T}\nabla \psi _{k} (\tilde{s}_{k}), \end{aligned}$$
(2.19)
$$\begin{aligned}& a_{\tau }^{(1)} = h_{k}(\tilde{s}_{k}) \nabla \psi _{k} (\tilde{s}_{k})^{T} Q_{k}^{(2)} \bigl(Q_{k}^{(2)} \bigr)^{T} B_{k} Q_{k}^{(2)} \bigl(Q _{k}^{(2)}\bigr)^{T}\nabla \psi _{k} (\tilde{s}_{k}), \end{aligned}$$
(2.20)
$$\begin{aligned}& b_{\tau }^{(1)} = -2\nabla \psi _{k} ( \tilde{s}_{k})^{T} Q_{k}^{(2)} \bigl(Q _{k}^{(2)}\bigr)^{T}\bigl( g_{k} + h_{k}(\tilde{s}_{k}) B_{k} \tilde{s}_{k}\bigr), \end{aligned}$$
(2.21)
$$\begin{aligned}& c_{\tau }^{(1)} = 2 g_{k}^{T} \tilde{s}_{k}+h_{k}(\tilde{s}_{k}) \tilde{s}_{k}^{T} B_{k} \tilde{s}_{k} \end{aligned}$$
(2.22)

and

$$\begin{aligned} \tau _{ \Delta }=\frac{ \widehat{ \Delta }_{k}}{ \Vert (Q_{k}^{(2)})^{T} \nabla \psi (\tilde{s}_{k}) \Vert }. \end{aligned}$$
(2.23)

Combining with (1.17)–(1.20) and (2.11), we can obtain that \(\|\tilde{s}_{k}\| \leq \tilde{\Delta }_{k}\) and

$$\begin{aligned} \zeta _{1} \leq h_{k} ( \tilde{s}_{k})\leq \zeta _{2}, \end{aligned}$$
(2.24)

where

$$\begin{aligned} \zeta _{1}= \frac{1-\epsilon _{1}}{(1 + {\epsilon _{1}})^{2}},\qquad \zeta _{2}= \frac{1+\epsilon _{1} }{{(1 -{\epsilon _{1}})}^{2}}. \end{aligned}$$
(2.25)

From (1.19), it is obvious that \(0<\zeta _{1}<1\) and \(\zeta _{2}>1\). From lines 2 and 3 before (2.15), we know that \((Q_{k}^{(2)})^{T} \nabla \psi _{k}(\tilde{s}_{k})\neq 0\), where \(Q_{k}^{(2)}\) is defined in (2.5). For \(Q_{k}^{(2)}\in R ^{n\times (n-m)}\), \(\nabla \psi _{k} (\tilde{s}_{k})\in R^{n}\), then \(Q_{k}^{(2)}(Q_{k}^{(2)})^{T}\nabla \psi _{k} (\tilde{s}_{k})\in R^{n}\) and \(Q_{k}^{(2)}(Q_{k}^{(2)})^{T}\nabla \psi _{k} (\tilde{s}_{k}) \neq 0\) can be proved easily. Assume that \(B_{k}\) is a symmetric positive definite matrix, then combining with (2.20), (2.24), and (2.25) we have

$$ a_{\tau }^{(1)}>0. $$
(2.26)

When \(B_{k}\) is only symmetric but not positive definite matrix, then the modified BFGS formula can be used to modify the positive semi-definite matrix to a positive definite matrix (see [25]). Hence, it is easy to see that the solution of (2.15)–(2.16) is

$$\begin{aligned} \tau ^{(1)} = \textstyle\begin{cases} \min \{{\tau _{\mathrm{axis}}^{(1)}, \tau _{ \Delta }} \}, & \text{if } b_{\tau }^{(1)} < 0, \\ 0, & \text{otherwise,} \end{cases}\displaystyle \end{aligned}$$
(2.27)

where

$$\begin{aligned} \tau _{\mathrm{axis}}^{(1)} = \frac{ - b_{\tau }^{(1)} }{2a_{\tau } ^{(1)}}. \end{aligned}$$
(2.28)

We know that \(u^{(1)} =-\tau ^{(1)}\nabla \tilde{\psi }_{k} (0)\) is an approximate solution of (2.13)–(2.14). In order to better approximate the solution of the subproblem, we are constantly repeating the above process similarly, except for fixing the fractional part. That is, in the ith iteration we substitute \(u=-\tau ^{(i)}\nabla \tilde{\psi }_{k} (0)\) into \(h_{k} (s(u))\), where \(i=2,3,\ldots \) . We denote

$$\begin{aligned} h_{k}^{(i)}= h_{k} \bigl(s \bigl(-\tau ^{(i)}\nabla \tilde{\psi }_{k} (0)\bigr) \bigr). \end{aligned}$$
(2.29)

Then we can obtain a similar trust-region subproblem to (2.15)–(2.16). That is,

$$\begin{aligned}& \min_{\tau \in R} \varphi _{k}^{(i)} (\tau ), \end{aligned}$$
(2.30)
$$\begin{aligned}& \text{s.t.}\quad 0\leq \tau \le \tau _{ \Delta }, \end{aligned}$$
(2.31)

where

$$\begin{aligned}& \varphi _{k}^{(i)} (\tau )= h_{k}^{(i)} g^{T}_{k} s(\tau ) + \frac{1}{2} \bigl(h_{k}^{(i)}\bigr)^{2} s(\tau )^{T} B_{k} s(\tau ) \end{aligned}$$
(2.32)
$$\begin{aligned}& \hphantom{\varphi _{k}^{(i)} (\tau )} = \frac{1}{2} h_{k}^{(i)} \bigl(a_{\tau }^{(i)} \tau ^{2} + b_{\tau }^{(i)} \tau + c_{\tau }^{(i)} \bigr), \end{aligned}$$
(2.33)
$$\begin{aligned}& a_{\tau }^{(i)} = h_{k}^{(i)}\nabla \psi _{k} (\tilde{s}_{k})^{T} Q _{k}^{(2)} \bigl(Q_{k}^{(2)} \bigr)^{T} B_{k} Q_{k}^{(2)} \bigl(Q_{k}^{(2)}\bigr)^{T} \nabla \psi _{k} (\tilde{s}_{k}) , \end{aligned}$$
(2.34)
$$\begin{aligned}& b_{\tau }^{(i)} = -2\nabla \psi _{k} ( \tilde{s}_{k})^{T}Q_{k}^{(2)} \bigl(Q _{k}^{(2)}\bigr)^{T} \bigl( g_{k} + h_{k}^{(i)} B_{k} \tilde{s}_{k}\bigr), \end{aligned}$$
(2.35)
$$\begin{aligned}& c_{\tau }^{(i)} = 2 g_{k}^{T} \tilde{s}_{k}+h_{k}^{(i)} \tilde{s}_{k} ^{T} B_{k} \tilde{s}_{k}. \end{aligned}$$
(2.36)

By the computation, we have the solution of subproblem (2.30)–(2.31) as follows:

$$\begin{aligned} \tau ^{(i)} = \textstyle\begin{cases} \min \{{\tau _{\mathrm{axis}}^{(i)} , \tau _{ \Delta }} \}, & \text{if } b_{ \tau }^{(i)} < 0, \\ 0, & \text{otherwise,} \end{cases}\displaystyle \end{aligned}$$
(2.37)

where

$$\begin{aligned} \tau _{\mathrm{axis}}^{(i)} = \frac{ - b_{\tau }^{(i)} }{2a_{\tau } ^{(i)}}. \end{aligned}$$
(2.38)

However, if \(\tau ^{(i)}= 0\), then subproblem (2.30)–(2.31) has no positive real roots. For this case, we set \(b_{k} = c_{k}= 0\), and then the fractional model reduces to the conic model. Subproblem (1.21)–(1.24) can be solved by Algorithm 2.4 in [26]. That is, we calculate

$$\begin{aligned}& \tilde{a}_{k} = \bigl(Q_{k}^{(2)} \bigr)^{T}a_{k},\qquad \bar{a} = \frac{\tilde{a}_{k}}{1-a_{k}^{T}\tilde{s}_{k}}, \qquad \bar{g} = \frac{g_{k}}{1-a_{k}^{T}\tilde{s}_{k}}, \\& \bar{B} = \frac{B_{k}}{(1-a_{k}^{T}\tilde{s}_{k})^{2}},\qquad \alpha _{1} = 1- \Vert \bar{a} \Vert ^{2}\widehat{\Delta }^{2}, \\& \widehat{\Delta } = \min \biggl\{ \sqrt{ \tilde{\Delta }^{2} _{k}- \Vert \tilde{s}_{k} \Vert ^{2} }, \frac{1-a_{k}^{T}\tilde{s}_{k}- \varepsilon _{0}}{ \Vert a_{k} \Vert } \biggr\} ,\qquad m = \frac{ \Vert \bar{a} \Vert \widehat{\Delta }^{2}}{\alpha _{1} },\qquad \Delta _{1} = \frac{\widehat{\Delta }}{\sqrt{\alpha _{1}}}, \\& \hat{g} = \bigl(Q_{k}^{(2)}\bigr)^{T}\bar{g}+ \bar{a}\tilde{s}_{k}^{T}\bar{g}+ \bar{a} \tilde{s}_{k}^{T}\bar{B}\tilde{s}_{k}+ \bigl(Q_{k}^{(2)}\bigr)^{T}\bar{B} \tilde{s}_{k},\qquad V = \operatorname{diag}(\sqrt{\alpha _{1}},1,\ldots,1), \\& \hat{B} = \bigl(Q_{k}^{(2)}\bigr)^{T} \bar{B}Q_{k}^{(2)} + \tilde{s}_{k}^{T} \bar{B}\tilde{s}_{k}\bar{a}\bar{a}^{T} +\bar{a} \tilde{s}_{k}^{T} \bar{B}Q_{k}^{(2)}+ \bigl(Q_{k}^{(2)}\bigr)^{T}\bar{B} \tilde{s}_{k}\bar{a}^{T}, \\& g_{1} = V^{-1}Z\hat{g} + mV^{-1}Z \hat{B}Z^{T}e_{1},\qquad B_{1} = V^{-1}Z\hat{B}Z^{T}V^{-1}, \end{aligned}$$

where Z is an orthonormal rotation matrix and satisfies \(Z\bar{a}= \|\bar{a}e_{1}\|\), and \(e_{1}=(1,0,\ldots,0)^{T}\). Then the solution of (2.13)–(2.14) is

$$\begin{aligned} u_{\ast }= \frac{(1-a_{k}^{T}\tilde{s}_{k})(Z^{T}Vd^{\ast }+m Z^{T}e _{1})}{1-a_{k}^{T}\tilde{s}_{k}+\tilde{a}_{k}^{T}(Z^{T}Vd^{\ast }+m Z ^{T}e_{1})}, \end{aligned}$$
(2.39)

where \(d^{\ast }\) is the solution of the quadratic trust region subproblem

$$\begin{aligned}& \min_{d \in R^{n}} g_{1}^{T}d+ \frac{1}{2}d^{T} B_{1} d, \\& \text{s.t.}\quad \Vert d \Vert \leq \Delta _{1}. \end{aligned}$$

Now we give an algorithm for solving the fractional trust region subproblems (2.13)–(2.14).

Subalgorithm 2.1

Step 0.:

Input the data of the kth iteration, i.e., \(0<\epsilon _{1}\), \(\iota <1 \), \(\varepsilon >0\), \(a_{k}\), \(b_{k}\), \(c_{k}\), \(g_{k}\), \(B_{k}\), \(A _{k}\), \(C_{k}\), and \(\tilde{\Delta }_{k}\). Set \(i=1\).

Step 1.:

Compute \(Q_{k}^{(1)}\), \(Q_{k}^{(2)}\), and \(R^{(1)}_{k}\) as defined in (2.5).

Step 2.:

Compute \(l_{k}\), \(\theta _{k}\), \(\widehat{\Delta }_{k}\), and \(\tilde{s}_{k}\) as defined in (2.9)–(2.12). Substitute \(\tilde{s}_{k}\) into formula (1.13) and obtain \(\nabla \psi _{k} (\tilde{s}_{k})\).

Step 3.:

Calculate \(\tau _{\ast }\).

Step 3.1:

Compute \(h_{k} (\tilde{s}_{k})\) and \(\nabla \psi _{k} ( \tilde{s}_{k})\). Compute \(a_{\tau }^{(1)}\), \(b_{\tau }^{(1)}\), \(c_{\tau } ^{(1)}\), and \(\tau _{ \Delta }\) from (2.20)–(2.23). Solve (2.15)–(2.16) obtaining \(\tau ^{(1)}\) as defined in (2.27). If \(\tau ^{(1)}=0\), go to Step 5; otherwise, \(i=i+1\), go to Step 3.2.

Step 3.2:

Compute \(h_{k}^{(i)}\) and \(a_{\tau }^{(i)}\), \(b_{\tau } ^{(i)}\), \(c_{\tau }^{(i)}\) from (2.29) and (2.34)–(2.36). Solve (2.30)–(2.31) obtaining \(\tau ^{(i)}\) as defined in (2.37). If \(\tau ^{(i)}=0\), go to Step 5; otherwise, \(i=i+1\), go to Step 3.3.

Step 3.3:

If \(|\tau ^{(i)}-\tau ^{(i-1)}|<\varepsilon \), then \(\tau _{\ast }=\tau ^{(i)}\), and stop. Otherwise, go to Step 3.2.

Step 4.:

Calculate \(u_{\ast } = -\tau _{\ast }(Q_{k}^{(2)})^{T} \nabla \psi _{k}(\tilde{s}_{k})\), then \(s_{\ast } = \tilde{s}_{k}+ Q _{k}^{(2)} u_{\ast }\).

Step 5.:

Set \(b_{k} = c_{k}= 0\). Calculate \(u_{\ast } \) as defined in (2.39), then \(s_{\ast } = \tilde{s}_{k}+ Q_{k}^{(2)} u_{\ast }\).

From the above analysis, we know that the steepest descent point \(u_{\ast } \) is an approximate solution of the fractional trust region subproblem (2.13)–(2.14).

Global convergence

In this section, we propose a quasi-Newton method with a fractional model for nonlinearly equality constrained optimization and prove its convergence under some reasonable conditions. In order to solve problem (1.1)–(1.2), we approximate \(f(x)\) with a fractional model of the form

$$ m_{k}(s)=f_{k}+\frac{(1+c_{k}^{T}s)g_{k}^{T}s}{(1-a_{k}^{T}s) (1-b _{k}^{T}s)}+ \frac{1}{2}\frac{(1+c_{k}^{T}s)^{2}s^{T}B_{k}s}{(1-a_{k} ^{T}s)^{2}(1-b_{k}^{T}s)^{2}}, $$
(3.1)

where \(f_{k}=f(x_{k})\), \(g_{k}=\nabla f(x_{k})\), \(B_{k}\in R^{n\times n}\) is a positive definite matrix, and \(a_{k}, b_{k}, c_{k}\in R^{n}\) are parameter vectors. We choose these vectors such that (3.1) satisfies the following conditions:

$$\begin{aligned}& m_{k}(0)=f_{k},\qquad \nabla m_{k}(0)=g_{k}, \end{aligned}$$
(3.2)
$$\begin{aligned}& m_{k}(-s_{k-1}) = f_{k-1},\qquad \nabla m_{k}(-s_{k-1}) = g_{k-1}, \end{aligned}$$
(3.3)

where \(x_{k}=x_{k-1}+s_{k-1}\). Obviously, (3.2) holds. Then from (3.3) we have

figurea

where \(v_{k-1} = (1-a_{k}^{T} s_{k-1})(1-b_{k}^{T} s_{k-1})\) and

$$\begin{aligned} Q_{k-1} =& \bigl(1-c_{k}^{T} s_{k-1}\bigr)v_{k-1}I-\bigl[c_{k}v_{k-1}+ \bigl(1-c_{k}^{T} s _{k-1}\bigr) \bigl(a_{k}\bigl(1+B_{k}^{T} s_{k-1}\bigr) \\ &{}+B_{k}\bigl(1+a_{k}^{T} s_{k-1}\bigr)\bigr)\bigr]s_{k-1} ^{T}. \end{aligned}$$
(3.6)

We choose

$$ a_{k}=k_{1}g_{k-1}, \qquad b_{k}=k_{2}B_{k-1}s_{k-1} , \qquad c_{k}=k_{3}g_{k}, $$
(3.7)

where \(k_{1}\), \(k_{2}\), \(k_{3}\) are unknown parameters, and details of the choice of parameters \(k_{1}\), \(k_{2}\), \(k_{3}\) can be found in [24].

The merit function we applied is the \(L_{1}\) exact penalty function

$$\begin{aligned} P(x)=f(x)+\sigma \bigl\Vert C(x) \bigr\Vert _{1}, \end{aligned}$$
(3.8)

where \(\sigma >0\) is a penalty parameter. We know that, for σ sufficiently large, any strong local minimizer of (1.1)–(1.2) is a local minimizer of \(P(x)\) (see [27]). It is found that this function is very convenient to be used as a merit function to force global convergence in line search type algorithms (for example, see [1]). We define the actual reduction in the merit function by

$$\begin{aligned} \operatorname{Ared}_{k}=P(x_{k})-P(x_{k}+s_{k}), \end{aligned}$$
(3.9)

where \(s_{k}\) is a trial step computed by the algorithm at \(x_{k}\). The above choice of actual reduction is used to prove the global convergence of the algorithm conveniently. Correspondingly, the predicted reduction is defined as

$$\begin{aligned} \operatorname{Pred}_{k}=\psi _{k} (0)-\psi _{k} (s_{k})+\sigma _{k}\bigl( \Vert C_{k} \Vert _{1}- \bigl\Vert C_{k}+A_{k}^{T}s_{k} \bigr\Vert _{1}\bigr), \end{aligned}$$
(3.10)

where \(\psi _{k} (s)\) is defined as (1.11). If

$$\begin{aligned} \psi _{k} (0)-\psi _{k} (s_{k})\geq - \frac{\sigma _{k-1} }{2}\bigl( \Vert C_{k} \Vert _{1}- \bigl\Vert C_{k}+A_{k}^{T}s_{k} \bigr\Vert _{1}\bigr), \end{aligned}$$
(3.11)

we set the penalty parameter

$$\begin{aligned} \sigma _{k}=\sigma _{k-1} ; \end{aligned}$$
(3.12)

otherwise,

$$\begin{aligned} \sigma _{k} = \max \biggl[2\sigma _{k-1}, \frac{2(\psi _{k} ( s_{k})-\psi _{k} (0))}{ \Vert C_{k} \Vert _{1}- \Vert C_{k}+A_{k}^{T}s_{k} \Vert _{1}} \biggr]. \end{aligned}$$
(3.13)

Then

$$\begin{aligned} \operatorname{Pred}_{k} \geq \frac{1 }{2}\sigma _{k}\bigl( \Vert C_{k} \Vert _{1}- \bigl\Vert C_{k}+A _{k}^{T}s_{k} \bigr\Vert _{1}\bigr) \end{aligned}$$
(3.14)

holds for all k.

In the following algorithm, we compute the ratio of the actual reduction and the predicted reduction to choose the next iterate point and update the new trust region. Now we give the new quasi-Newton algorithm based on the fractional model (3.1).

Algorithm 3.1

Step 0.:

Choose \(x_{0}\in R^{n}\), \(\varepsilon >0\), \(\varepsilon _{0}>0\), \(\epsilon _{1} \in (0, 1)\), \(\Delta _{\max }>0\), \(0<\iota _{1}< \iota _{2}<1\), \(0<\delta _{1} <1< \delta _{2}\), \(\sigma _{0}>0\), \(B_{0} =I\), and the initial trust region radius \(\Delta _{0}\in (0,\Delta _{\max }]\). Set \(k=0\).

Step 1.:

Stopping criterion. Compute \(g_{k}\), \(C_{k}\), and \(Q_{k}^{(2)}\) as defined in (2.5). If \(\| (Q_{k}^{(2)} )^{T} g_{k} \|\leq \varepsilon \) and \(\| C_{k} \|\leq \varepsilon \), then \(x_{\ast }=x_{k}\), stop. If \(k=0\), go to Step 3.

Step 2.:

Update \(B_{k+1}\) by

$$ B_{k+1} = B_{k}-\frac{B_{k}s_{k}s_{k}^{T}B_{k}}{s_{k}^{T}B_{k}s_{k}}+ \frac{z_{k}z_{k}^{T}}{z_{k}^{T}s_{k}}, $$
(3.15)

where

$$\begin{aligned}& z_{k}=\vartheta y_{k}+(1-\vartheta )B_{k}s_{k}, \quad \vartheta \in [0,1], \\& \vartheta =\textstyle\begin{cases} 1,& \text{if } y_{k}^{T}s_{k}\geq 0.2s_{k}^{T}B_{k}s_{k}, \\ \frac{0.8s_{k}^{T}B_{k}s_{k}}{s_{k}^{T}B_{k}s_{k}-y_{k}^{T}s_{k}}, &\text{otherwise}, \end{cases}\displaystyle \end{aligned}$$

and \(y_{k}=g_{k+1}-g_{k}\).

Step 3.:

If \(k\leq 1\), then set \(a_{k}=b_{k}=c_{k}=0\) and \(d_{k}=- B_{k}^{-1}g _{k}\), compute \(\alpha _{k}\) such that Wolfe–Powell conditions are satisfied, and set \(x_{k+1}=x_{k}+s_{k}=x_{k}+\alpha _{k}d_{k}\). \(k=k+1\) and go to Step 1.

Step 4.:

Compute

$$\begin{aligned} \textstyle\begin{cases} \alpha = g_{k-1}^{T}s_{k-1},\qquad \tilde{\alpha }=g_{k-1}^{T} \xi _{1},\qquad \hat{\alpha } =g_{k-1}^{T}\xi _{2}, \\ \beta =s_{k-1}^{T}B_{k-1}s_{k-1},\qquad \tilde{\beta } =s_{k-1}^{T}B_{k-1}\xi _{1},\qquad \hat{\beta } =s_{k-1}^{T}B_{k-1}\xi _{2}, \\ \zeta =s_{k-1}^{T}B_{k}s_{k-1},\qquad \tilde{\zeta } =s_{k-1}^{T}B_{k}\xi _{1},\qquad \hat{\zeta } =s_{k-1}^{T}B_{k}\xi _{2}, \\ \gamma = g_{k}^{T}s_{k-1},\qquad \tilde{\gamma } =g_{k}^{T}\xi _{1},\qquad \hat{\gamma } =g_{k}^{T} \xi _{2}, \end{cases}\displaystyle \end{aligned}$$
(3.16)

where \(\xi _{1}\) and \(\xi _{2}\) are chosen such that

$$\begin{aligned} \tilde{\alpha } = \tilde{\gamma } = \hat{\zeta } = \hat{\gamma } = 0. \end{aligned}$$
(3.17)
Step 5.:

If \(\gamma =0\), then go to Step 6; otherwise, compute

$$\begin{aligned}& \eta = \bigl(\gamma +\sqrt{\gamma ^{2}+ 2\zeta (f_{k-1}- f_{k}) }\bigr)/ \zeta , \\& \ddot{\gamma } =\eta \tilde{\beta } \zeta -\eta \beta \tilde{\zeta }-\tilde{\beta }\gamma , \\& \tilde{\iota }= \gamma -\eta \zeta ,\qquad \dot{\iota }= \tilde{\iota }\bigl( \hat{\alpha }\tilde{\beta }- \eta ^{2} \tilde{\zeta }\hat{\beta } \bigr),\\& \dot{\gamma }= \alpha \dot{\iota }+\eta \hat{\alpha } \tilde{\beta } \tilde{\iota }^{2}. \end{aligned}$$

If \(\dot{\gamma }=0\) or \(\ddot{\gamma }=0\), then go to Step 6; otherwise, calculate

$$\begin{aligned}& k_{1} = -\dot{\iota }/\dot{\gamma },\qquad k_{2} = \eta \tilde{\zeta }/\ddot{\gamma }, \\& k_{3} = \frac{1-\eta (1+k_{1}\alpha )(1+k_{2}\beta )}{\gamma }, \end{aligned}$$

and \(a_{k}\), \(b_{k}\), \(c_{k}\) as determined in (3.7), go to Step 7.

Step 6.:

Let \(b_{k}=c_{k}=0\). Calculate

$$\begin{aligned}& \rho _{k} = (f_{k-1}- f_{k})^{2}- \alpha \gamma , \end{aligned}$$
(3.18)
$$\begin{aligned}& \dot{\beta } = \textstyle\begin{cases} \frac{(f_{k-1}- f_{k})+\sqrt{\rho _{k}}}{-\alpha },&\text{if } \rho_{k} \geq 0, \\ 1,&\text{otherwise}, \end{cases}\displaystyle \end{aligned}$$
(3.19)

and set

$$\begin{aligned} a_{k}= \frac{1-\dot{\beta }}{\alpha }g_{k-1}. \end{aligned}$$
(3.20)
Step 7.:

If \(\|a_{k}\| > \frac{\epsilon _{1}}{\tilde{\Delta }_{k}}\), then \(a_{k} = \frac{\epsilon _{1} a_{k}}{\tilde{\Delta }_{k}\|a_{k}\|}\). Update \(b_{k}\) and \(c_{k}\) in the same way so that (1.20) are satisfied.

Step 8.:

Solve subproblem (2.2)–(2.4) by Subalgorithm 2.1 to get \(s_{k}\).

Step 9.:

If (3.11) holds, then \(\sigma _{k}=\sigma _{k-1}\); otherwise, calculate \(\sigma _{k}\) as defined in (3.13).

Step 10.:

Compute

$$ \rho _{k}=\frac{\operatorname{Ared}_{k}}{\operatorname{Pred}_{k}}, $$
(3.21)

where \(\operatorname{Ared}_{k}\), \(\operatorname{Pred}_{k}\) are defined in (3.9) and (3.10).

Step 11.:

Update the trust region radius

$$\begin{aligned} \tilde{\Delta }_{k+1}= \textstyle\begin{cases} \delta _{1} \tilde{\Delta }_{k}, &\text{if } \rho _{k}\leq \iota _{1}, \\ \min \{\delta _{2}\tilde{\Delta }_{k}, \Delta _{\max }\}, &\text{if } \rho _{k}\geq \iota _{2} \text{ and } \Vert s_{k} \Vert = \tilde{\Delta }_{k}, \\ \tilde{\Delta }_{k}, &\text{otherwise}. \end{cases}\displaystyle \end{aligned}$$
(3.22)
Step 12.:

If \(\rho _{k}\geq \iota _{1}\), then \(x_{k+1}=x_{k}+s_{k}\). Set \(k=k+1\), and go to Step 1; otherwise \(x_{k+1}=x_{k}\), \(k=k+1\), and go to Step 6.

Remark 2

In order to ensure the positive definite transfer of the Hessian matrix, we use the modified BFGS formula in Step 2 to iterate to get the positive definite matrix \(B_{k+1}\) (see [25]).

In the following, we establish the convergence results of Algorithm 3.1. The focus of this paper is to transform the fractional model trust region subproblem of the equality constrained optimization model into a simple one-dimensional quadratic model subproblem by cyclically fixing the fractional coefficient part of the model, so as to obtain a new approximate solution method for solving the subproblem. Therefore, the framework of the global convergence proof is similar to the proof process of the conic model trust region algorithm in [20], with a major difference being the lower bound of reduction in each iteration (see Lemma 3.1), which is an important result required in the proof of convergence.

Lemma 3.1

Suppose that (1.20) holds, where \(\epsilon _{1}\)satisfies (2.1). If \(\{x_{k}\}\), \(\{B_{k}\}\), and \(\{(A_{k}^{T}A_{k})^{-1} \}\)are uniformly bounded. Let \(s_{k}\)be the solution of subproblem (2.2)(2.4), then there exist positive constants \(M_{1}\)and \(M_{2}\)such that

$$\begin{aligned} \psi _{k} (0)-\psi _{k} (s_{k}) \geq & M_{1} \bigl\Vert \bigl(Q_{k}^{(2)}\bigr)^{T}g_{k} \bigr\Vert \min \biggl\{ \tilde{\Delta }_{k}, \frac{ \Vert (Q_{k}^{(2)})^{T} g _{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} \\ & {}- M_{2} \bigl(1+ \Vert B_{k} \Vert \bigr) \min \bigl\{ \tilde{\Delta }_{k}, \Vert C _{k} \Vert \bigr\} \end{aligned}$$
(3.23)

for allk.

Proof

Define

$$\begin{aligned} s_{k} (t) = \theta _{k} l_{k} -t Q_{k}^{(2)}\bigl(Q_{k}^{(2)} \bigr)^{T} g_{k}, \end{aligned}$$
(3.24)

where \(\theta _{k}\) and \(l_{k} \) are defined in (2.12) and (2.9) respectively. Then we can see that \(s_{k} (t)\) is in the feasible region of subproblem (2.2)–(2.4) for all \(t\in [0, \widehat{\Delta }_{k}/{\| (Q_{k}^{(2)})^{T} g_{k} \|}]\). From the definition of \(s_{k}\) and \(s_{k} (t)\), we have

$$\begin{aligned} \psi _{k} (0)-\psi _{k} (s_{k}) \geq \psi _{k} (0)-\psi _{k} \bigl(s_{k}(t)\bigr) \end{aligned}$$
(3.25)

for all \(t\in [0, \widehat{\Delta }_{k}/{\| (Q_{k}^{(2)})^{T} g_{k} \|}]\). From (1.20), (2.24), and \(s_{k} (t)\leq \tilde{\Delta }_{k}\) it follows

$$\begin{aligned} \zeta _{1} \leq h_{k} \bigl(s_{k}(t)\bigr)\leq \zeta _{2}, \end{aligned}$$
(3.26)

where \(\zeta _{1}\), \(\zeta _{2}\) are defined in (2.25). By \(\|Q_{k}^{(2)}\|=1\), the Cauchy–Schwarz inequality, and for all \(t\in [0, \widehat{\Delta }_{k}/{\| (Q_{k}^{(2)})^{T} g_{k} \|}]\), we have

$$\begin{aligned}& \psi _{k} (0)-\psi _{k} \bigl(s_{k}(t)\bigr) \\& \quad = -h_{k} \bigl(s_{k}(t)\bigr) \theta _{k} g^{T}_{k} l_{k} + t h_{k} \bigl(s_{k}(t)\bigr) \bigl\Vert \bigl(Q_{k}^{(2)}\bigr)^{T} g_{k} \bigr\Vert ^{2} \\& \qquad {}- \frac{1}{2} \bigl(h_{k} \bigl(s_{k}(t) \bigr)\bigr)^{2} \theta _{k} ^{2}l_{k}^{T} B _{k} l_{k} -\bigl(h_{k} \bigl(s_{k}(t)\bigr)\bigr)^{2} \theta _{k}l_{k}^{T} B_{k}Q_{k}^{(2)} \bigl(-t\bigl(Q _{k}^{(2)}\bigr)^{T} g_{k}\bigr) \\& \qquad {} - \frac{1}{2} \bigl(h_{k} \bigl(s_{k}(t) \bigr)\bigr)^{2} t^{2} g_{k}^{T} Q_{k}^{(2)} \bigl(Q _{k}^{(2)} \bigr)^{T}B_{k} Q_{k}^{(2)} \bigl(Q_{k}^{(2)}\bigr)^{T} g_{k} \\& \quad \geq -\zeta _{2}\theta _{k} \Vert l_{k} \Vert \Vert g_{k} \Vert + t\zeta _{1} \bigl\Vert \bigl(Q_{k} ^{(2)} \bigr)^{T} g_{k} \bigr\Vert ^{2} \\& \qquad {} - \frac{1}{2} \zeta _{2}^{2} \bigl( \theta _{k} ^{2} \Vert l_{k} \Vert ^{2} \Vert B _{k} \Vert + 2 \theta _{k} \Vert l_{k} \Vert \Vert B_{k} \Vert \widehat{ \Delta }_{k} +t ^{2} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert ^{2} \Vert B_{k} \Vert \bigr) \\& \quad \geq t\zeta _{1} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert ^{2}- \frac{1}{2} t^{2} \zeta _{2}^{2} \bigl\Vert \bigl(Q_{k}^{(2)}\bigr)^{T} g_{k} \bigr\Vert ^{2} \Vert B_{k} \Vert \\& \qquad {} - \zeta _{2}\theta _{k} \Vert l_{k} \Vert \Vert g_{k} \Vert - \zeta _{2}^{2} \theta _{k} \Vert l_{k} \Vert \Vert B_{k} \Vert \bigl( \theta _{k} \Vert l_{k} \Vert +\widehat{\Delta } _{k} \bigr). \end{aligned}$$
(3.27)

By (2.12), we obtain

$$\begin{aligned}& \max_{t\in [0, \widehat{\Delta }_{k}/{ \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }]} \biggl\{ t\zeta _{1} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert ^{2}- \frac{1}{2} t^{2}\zeta _{2}^{2} \bigl\Vert \bigl(Q_{k}^{(2)}\bigr)^{T} g_{k} \bigr\Vert ^{2} \Vert B _{k} \Vert \biggr\} \\& \quad \geq \frac{\zeta _{1}}{2} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert ^{2} \min \biggl\{ \frac{\widehat{\Delta }_{k}}{ \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }, \frac{ \zeta _{1}}{\zeta _{2}^{2} \Vert B_{k} \Vert } \biggr\} \\& \quad \geq \frac{\zeta _{1}}{2} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert \min \biggl\{ \tilde{\Delta }_{k}, \frac{\zeta _{1} \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }{ \zeta _{2}^{2} \Vert B_{k} \Vert } \biggr\} - \frac{\zeta _{2}}{2} \bigl\Vert \bigl(Q_{k}^{(2)}\bigr)^{T} g_{k} \bigr\Vert \bigl(\theta _{k} \Vert l_{k} \Vert \bigr) \\& \quad \geq M_{1} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert \min \biggl\{ \tilde{\Delta }_{k}, \frac{ \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} - \zeta _{2}^{2} \theta _{k} \Vert l_{k} \Vert \Vert g_{k} \Vert , \end{aligned}$$
(3.28)

where \(0<\zeta _{1}<1\), \(\zeta _{2}>1\), \(M_{1}= {\zeta _{1}^{2}}/(2\zeta _{2}^{2})\), and the second inequality is obtained from the triangular inequality properties. Then from (3.25)–(3.28) we have

$$\begin{aligned} \psi _{k} (0)-\psi _{k} (s_{k}) \geq & \max_{t\in [0, \widehat{\Delta }_{k}/{ \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }]} \psi _{k} (0)-\psi _{k} \bigl(s_{k}(t)\bigr) \\ \geq & M_{1} \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert \min \biggl\{ \tilde{\Delta }_{k}, \frac{ \Vert (Q_{k}^{(2)})^{T} g_{k} \Vert }{ \Vert B_{k} \Vert } \biggr\} \\ &{} - \zeta _{2}^{2}\theta _{k} \Vert l_{k} \Vert \bigl[ \Vert B_{k} \Vert \bigl( \theta _{k} \Vert l_{k} \Vert +\widehat{\Delta }_{k} \bigr)+2 \Vert g_{k} \Vert \bigr]. \end{aligned}$$
(3.29)

Besides, since \(\{(A_{k}^{T}A_{k})^{-1}\}\) are uniformly bounded, then from Lemma 3.1 in [20] we know that there exists a positive constant \(\zeta _{3}\) such that

$$\begin{aligned} \theta _{k} \Vert l_{k} \Vert \leq \min \bigl\{ \iota \tilde{\Delta }_{k}, \zeta _{3} \Vert C_{k} \Vert \bigr\} \end{aligned}$$

holds where \(\iota \in (0,1)\) is defined in (2.11). Then we have

$$\begin{aligned} \theta _{k} \Vert l_{k} \Vert \leq \max \{ \iota , \zeta _{3} \} \min \bigl\{ \tilde{\Delta }_{k}, \Vert C_{k} \Vert \bigr\} . \end{aligned}$$
(3.30)

The boundedness of \(\{x_{k}\}\), rule (3.22), and inequality (3.30) imply that \(\theta _{k} \| l_{k}\|+ \widehat{\Delta }_{k} \) and \(\| g_{k}\|\) are bounded above uniformly. For instance, if \(\theta _{k} \| l_{k}\|+ \widehat{\Delta }_{k} \leq \kappa _{1}\) and \(\| g_{k}\| \leq \kappa _{2}\) for all sufficiently large k, then we have

$$\begin{aligned} \Vert B_{k} \Vert \bigl( \theta _{k} \Vert l_{k} \Vert +\widehat{\Delta }_{k} \bigr)+2 \Vert g_{k} \Vert \leq \kappa _{1} \Vert B_{k} \Vert +2\kappa _{2} \leq \max \{ \kappa _{1}, 2 \kappa _{2} \} \bigl( \Vert B_{k} \Vert +1\bigr). \end{aligned}$$
(3.31)

Let

$$\begin{aligned} M_{2}= \zeta _{2}^{2} \max \{ \iota , \zeta _{3} \} \max \{ \kappa _{1}, 2\kappa _{2} \} . \end{aligned}$$

Then from (3.29)–(3.31) we know that (3.23) holds. Hence the theorem is proved. □

The proofs of the following Theorems 3.13.2 are similar to those in [20], so we only give the conclusion and omit the proofs.

Theorem 3.1

Under the conditions of Lemma 3.1, we have

$$\begin{aligned} \lim_{k\to \infty } \Vert C_{k} \Vert = 0. \end{aligned}$$

Theorem 3.2

Under the conditions of Lemma 3.1, we have

$$\begin{aligned} \lim_{k\to \infty } \inf \bigl\Vert \bigl(Q_{k}^{(2)} \bigr)^{T} g_{k} \bigr\Vert = 0. \end{aligned}$$

Theorem 3.3

Let the conditions of Lemma 3.1hold and \(\lambda _{k} = (R^{(1)} _{k})^{-1} (Q_{k}^{(1)})^{T} g_{k} \). Then we have

$$\begin{aligned} \lim_{k\to \infty } \inf \Vert g_{k}-A_{k}\lambda _{k} \Vert = 0. \end{aligned}$$
(3.32)

Proof

From the definition of \(\lambda _{k}\) and (2.5), we have

$$\begin{aligned} g_{k}-A_{k}\lambda _{k} =& \bigl(Q^{(1)}_{k} \bigl(Q_{k}^{(1)} \bigr)^{T} + Q^{(2)} _{k} \bigl(Q_{k}^{(2)} \bigr)^{T}\bigr)g_{k}- Q_{k}^{(1)}R^{(1)}_{k} \bigl(R^{(1)}_{k}\bigr)^{-1} \bigl(Q_{k}^{(1)}\bigr)^{T} g_{k} \\ =& Q^{(2)}_{k} \bigl(Q_{k}^{(2)} \bigr)^{T}g_{k}. \end{aligned}$$

From Theorem 3.2 it follows that (3.32) holds. □

From Theorems 3.1 and 3.3, we know that there exists a subsequence of \(\{x_{k}\}\) produced by Algorithm 3.1 converging to the KT point of (1.1)–(1.2).

Numerical experiment

In this section, Algorithm 3.1 (abbreviated as FTR) is tested with some test problems, where eight problems are directly chosen from [28, 29] and are listed in Table 1.

Table 1 Test functions

Moreover, in order to test Algorithm 3.1 more generally, we designed three problems (Problems 9–11) where Problems 10–11 are chosen from [30, 31] and the nonlinear equality constraints are polynomials (see [26]).

  1. 9.

    Conic function

    $$\begin{aligned}& f(x)=\sum_{i=1}^{n/2} \frac{x_{2i-1}^{2}+x_{2i}^{2}}{(1-x_{2i-1})^{2}} \\& \mbox{s.t.}\quad c_{i}(x)=x_{2i}^{2}-x_{2i-1}x_{2i}+10=0, \quad 1 \leq i \leq n/2. \end{aligned}$$
  2. 10.

    Generalized Brown function

    $$\begin{aligned}& f(x)= \sum_{i=1}^{n/2} \bigl( \bigl(x_{2i-1}^{2}\bigr)^{(x_{2i}^{2}+1)}+ \bigl(x_{2i} ^{2}\bigr)^{(x_{2i-1}^{2}+1)} \bigr) \\& \mbox{s.t.}\quad c_{i}(x)=(3-2x_{i+1})x_{i+1}+1-x_{i}-2 x_{i+2}=0,\quad 1 \leq i \leq n-2. \end{aligned}$$
  3. 11.

    Penalty-I function

    $$\begin{aligned}& f(x)= 10^{-5} \sum_{i=1}^{n} (x_{i}-1)^{2}+\Biggl(\sum_{i=1}^{n} x_{i} ^{2}-0.25\Biggr)^{2} \\& \mbox{s.t.}\quad c_{i}(x)=3x_{i}^{3} +4x_{i+1}^{2}=0, \quad 1 \leq i \leq n-1. \end{aligned}$$

Our fractional model is proposed on the basis of the conic model, and it is a generalized form of the conic model. Both of them are suitable for solving the optimization problem that the objective function is non-quadratic and the curvature changes severely. At this time, the quadratic model methods often produce a poor prediction of the minimizer of the function. In order to compare the calculation results of the fractional model and the conic model, we let \(b_{k}=c_{k}=0\) in Algorithm 3.1. Then we can obtain the conic model algorithm and call this algorithm CTR. Therefore, we solve these test problems by FTR and CTR respectively and compare their results.

All the computations are carried out in Matlab R2013b on a microcomputer in double precision arithmetic. These tests use the same stopping criterion \(\| (Q_{k}^{(2)} )^{T} g_{k} \|\leq \varepsilon \) and \(\| C_{k} \|\leq \varepsilon \). If the stoping criterion is not satisfied with \(\text{Iter} \leq 5000\), then the test is terminated. We mark these by an in such tables. The columns in the tables have the following meanings: Pro denotes the number of the test problems; n is the dimension of the test problems, Iter is the number of iterations; nf and ng are the numbers of function and gradient evaluations respectively; Qg is the value of Euclidean norm \(\|(Q_{k}^{(2)} )^{T} g_{k}\|\) at the final iteration; CPU(s) denotes the total iteration time of the algorithm in seconds.

The parameters in these algorithms are as follows:

$$\begin{aligned}& B_{0} = I, \qquad \epsilon _{1} = 0.6,\qquad \varepsilon =10^{-6},\qquad \varepsilon _{0}=10^{-3}, \qquad \Delta _{0} = 2, \\& \Delta _{\max } = 10,\qquad \iota = \sqrt{0.5},\qquad \sigma _{0}=1,\qquad \iota _{1} = 0.01,\qquad \iota _{2} =0.7, \\& \delta _{1} =0.25,\qquad \delta _{2} =1.5. \end{aligned}$$

The numerical comparison for 11 test problems is listed in Table 2. Because of the difference in the initial iteration point, we actually performed 19 experiments. In terms of the number of iterations, our algorithm FTR may be somewhat superior to CTR for 17 tests, and the two algorithms are the same in efficiency for the other 2 tests. Because FTR needs some extra algebra computation for some parameters, FTR takes more time than CTR.

Table 2 The numerical results of Algorithm 3.1 for some test problems

We know that large-scale problems with 10,000 or more dimensions can be tested for unconstrained optimization problems. But when the test problem contains nonlinear equality constraints, the difficulty of solving the problem is greatly increased. Therefore, the dimensions of the nonlinear equality constraint problems are limited. From Table 3, we can see that as the dimensions of each test problem range from 25 to 400, we have computed nine numerical comparisons experiments for three test questions. It can be found that our algorithm can get the minimum value of the function after a finite number of iterations, but the algorithm CTR can not, and some test problem algorithms failed. The number of iterations of FTR is less than that of CTR. That is, with the increase of the dimension of the problem, our algorithm FTR can be superior to CTR both in iteration number and algorithm stability.

Table 3 The numerical results of Algorithm 3.1 for some test problems (\(\varepsilon =10^{-4}\))

Conclusions

The fractional model in Algorithm 3.1 is the extension of a conic model. By using more information of function and gradient from the previous iterations and choosing parameters flexibly, the fractional model can be more approximate to the original problem. When solving the trust region subproblem, we reduce the fractional model of the subproblem to a quadratic model by cycling the fixed coefficient part, and iteratively search in the descending direction to obtain the approximate solution of the subproblem. Solving of the subproblems is creative and the algorithm is simple. The theoretical and numerical results show that the method is effective and robust.

References

  1. 1.

    Han, S.P.: A globally convergent method for nonlinear programming. J. Optim. Theory Appl. 22(3), 297–309 (1977)

  2. 2.

    Powell, M.J.D.: Variable Metric Methods for Constrained Optimization. Springer, Berlin Heidelberg (1983)

  3. 3.

    Yuan, Y.X., Sun, W.Y.: Conic Methods for Unconstrained Minimization and Tensor Methods for Nonlinear Equations. Science Press, Beijing (1997)

  4. 4.

    Powell, M.J.D., Yuan, Y.: A trust region algorithm for equality constrained optimization. Math. Program. 49(1), 189–211 (1990)

  5. 5.

    Vardi, A.: A trust region algorithm for equality constrained minimization: convergence properties and implementation. SIAM J. Numer. Anal. 22(3), 575–591 (1981)

  6. 6.

    Boggs, P.T., Byrd, R.H., Schnabel, R.B.: A stable and efficient algorithm for nonlinear orthogonal distance regression. SIAM J. Sci. Stat. Comput. 8(6), 1052–1078 (1987)

  7. 7.

    Toint, P.L.: Global convergence of a class of trust region methods for nonconvex minimization in Hilbert space. IMA J. Numer. Anal. 8, 231–252 (1988)

  8. 8.

    Zhang, J.Z., Zhu, D.T.: Projected quasi-Newton algorithm with trust region for constrained optimization. J. Optim. Theory Appl. 67(2), 369–393 (1990)

  9. 9.

    El-Alem, M.: A robust trust-region algorithm with a nonmonotonic penalty parameter scheme for constrained optimization. SIAM J. Optim. 5(2), 348–378 (1995)

  10. 10.

    Qu, S.J., Zhou, Y.Y., Zhang, Y.L., Wahab, M.I.M., Zhang, G., Ye, Y.Y.: Optimal strategy for a green supply chain considering shipping policy and default risk. Comput. Ind. Eng. 131, 172–186 (2019)

  11. 11.

    Huang, R.P., Qu, S.J., Yang, X.G., Liu, Z.M.: Multi-stage distributionally robust optimization with risk aversion. J. Ind. Manag. Optim. (2019). https://doi.org/10.3934/jimo.2019109

  12. 12.

    Qin, X., Yao, J.C.: A viscosity iterative method for a split feasibility problem. J. Nonlinear Convex Anal. 20, 1497–1506 (2019)

  13. 13.

    Dai, Y., Yamashita, N.: Convergence analysis of sparse quasi-Newton updates with positive definite matrix completion for two-dimensional functions. Numer. Algebra Control Optim. 1, 61–69 (2011)

  14. 14.

    Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics, Philadelphia (2000)

  15. 15.

    Davidon, W.C.: Conic approximations and collinear scalings for optimizers. SIAM J. Numer. Anal. 17(17), 268–281 (1980)

  16. 16.

    Sorensen, D.C.: The q-superlinear convergence of a collinear scaling algorithm for unconstrained optimization. SIAM J. Numer. Anal. 17(17), 84–114 (1980)

  17. 17.

    Zhang, X., Wen, X., Ni, Q.: Subspace trust-region algorithm with conic model for unconstrained optimization. Numer. Algebra Control Optim. 3, 223–234 (2013)

  18. 18.

    Zhu, H., Ni, Q.: A simple alternating direction method for the conic trust region subproblem. Math. Probl. Eng. 2018, Article ID 5358191 (2018)

  19. 19.

    Byrd, R.H., Shultz, G.A.: A trust region algorithm for nonlinearly constrained optimization. SIAM J. Numer. Anal. 24(5), 1152–1170 (1987)

  20. 20.

    Sun, W.Y., Yuan, Y.X.: A conic trust-region method for nonlinearly constrained optimization. Ann. Oper. Res. 103(1), 175–191 (2001)

  21. 21.

    Omojokun, E.O.: Trust region algorithms for optimization with nonlinear equality and inequality constraints. Ph.D. thesis, Boulder, CO, USA (1989)

  22. 22.

    Zhu, H., Ni, Q., Zeng, M.L.: A quasi-Newton trust region method based on a new fractional model. Numer. Algebra Control Optim. 5(3), 237–249 (2015)

  23. 23.

    Zhu, H., Ni, Q., Dang, C., Zhang, H.: A trust region method based on the fractional model for unconstrained optimization. Sci. Sin., Math. 48(4), 531–546 (2018)

  24. 24.

    Zhu, H., Ni, Q., Zhang, L., Yang, W.: A fractional trust region method for linear equality constrained optimization. Discrete Dyn. Nat. Soc. 2016(2), 1–10 (2016)

  25. 25.

    Ni, Q.: Optimization Method and Program Design. Science Press, Beijing (2009)

  26. 26.

    Zhang, L.W., Ni, Q.: Trust region algorithm of new conic model for nonlinearly equality constrained optimization. J. Numer. Methods Comput. Appl. 31(4), 279–289 (2010)

  27. 27.

    Coleman, T.F., Conn, A.R.: Second-order conditions for an exact penalty function. Math. Program. 19(1), 178–185 (1980)

  28. 28.

    Schittkowski, K.: More Test Examples for Nonlinear Programming Codes. Springer, New York (1979)

  29. 29.

    Lukšan, L., Vlček, J.: Indefinitely preconditioned inexact Newton method for large sparse equality constrained nonlinear programming problems. Numer. Linear Algebra Appl. 5(3), 219–247 (1998)

  30. 30.

    Zhu, M., Xue, Y., Sheng, Z.F.: A quasi-Newton type trust region method based on the conic model. Numer. Math. J. Chin. Univ. 17(1), 36–47 (1995)

  31. 31.

    Moré, J.J., Garbow, B.S., Hillstrom, K.E.: Testing unconstrained optimization software. ACM Trans. Math. Softw. 7(1), 17–41 (1981)

Download references

Acknowledgements

We are grateful to the editors and referees for their suggestions and comments.

Availability of data and materials

Not applicable.

Funding

Honglan Zhu was supported by the National Natural Science Foundation of China (NSFC) grant 11926357. Qin Ni was supported by the NSFC grant 11771210 and the Natural Science Foundation of Jiangsu Province (No. BK20141409). Xuebing Zhang was supported by the Basic Science Research Program by Natural Science Foundation of Jiangsu Province (No. BK20150420) and the Startup Foundation for Introducing Talent of NUIST.

Author information

All authors worked in cooperation. All authors carried out the proof, read and approved the current version of the manuscript.

Correspondence to Honglan Zhu.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhu, H., Ni, Q. & Zhang, X. A simple approximated solution method for solving fractional trust region subproblems of nonlinearly equality constrained optimization. J Inequal Appl 2020, 25 (2020). https://doi.org/10.1186/s13660-020-2300-7

Download citation

Keywords

  • Nonlinearly equality constrained optimization
  • Fractional model
  • Trust region method
  • Approximated solution method
  • Global convergence