In this section, we analyze the global convergence of the proposed method, where we assume that \(g_{k}\neq0\) for all \(k\geq0\); otherwise a stationary point has already been reached. First, we show that the search direction satisfies the sufficient descent and conjugacy conditions. To present these results, the following assumption is needed.
Assumption 1
The objective function f is twice continuously differentiable and uniformly convex, and the gradient g is Lipschitz continuous on the level set
$$ K= \bigl\{ x\in\Re^{n}|f(x)\leq f(x_{0}) \bigr\} . $$
(17)
That is, there exist positive constants \(\psi_{1}\), \(\psi_{2}\), and L such that
$$ \bigl\Vert g(x)-g(y)\bigr\Vert \leq L\Vert x-y\Vert $$
(18)
and
$$ \psi_{1}\Vert z\Vert ^{2}\leq z^{T} G(x) z\leq\psi_{2}\Vert z\Vert ^{2}, $$
(19)
for all \(z \in\Re^{n}\) and \(x,y \in K\), where \(G(x)\) is the Hessian matrix of f.
Under Assumption 1, we can easily deduce that
$$ \psi_{1}\Vert s_{k}\Vert ^{2} \leq s^{T}_{k} y_{k}\leq\psi_{2}\Vert s_{k}\Vert ^{2}, $$
(20)
where \(s^{T}_{k} y_{k}=s^{T}_{k} \bar{G} s_{k} \) and \(\bar{G}=\int_{0}^{1} G(x_{k} + \lambda s_{k}) \,d \lambda\). We begin by showing that the updating matrix (9) is positive definite.
Lemma 3.1
Suppose that Assumption 1 holds; then the matrix (9) is positive definite.
Proof
In order to show that the matrix (9) is positive definite, we need to show that \(\mu_{k}\) is well defined and bounded. First, by the Cauchy-Schwarz inequality we have
$$\begin{aligned} \biggl(\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}} \biggr)^{2}- \frac{s_{k}^{T}s_{k}}{y_{k}^{T}y_{k}}&= \frac{ (s_{k}^{T}s_{k} ) ( (s_{k}^{T}s_{k} ) (y_{k}^{T}y_{k} )- (y_{k}^{T}s_{k} )^{2} )}{(y_{k}^{T}s_{k})^{2}(y_{k}^{T}y_{k})} \\ &\geq 0, \end{aligned}$$
and this implies that the scaling parameter \(\mu_{k}\) is well defined. It follows from (20) that
$$\begin{aligned} 0&< \mu_{k} =\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}}- \biggl( \biggl( \frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}} \biggr)^{2}-\frac{s_{k}^{T}s_{k}}{y_{k}^{T}y_{k}} \biggr)^{\frac{1}{2}} \\ & \leq\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}}\leq\frac{\Vert s_{k}\Vert ^{2}}{\psi_{1}\Vert s_{k}\Vert ^{2}}=\frac{1}{\psi_{1}}. \end{aligned}$$
Moreover, multiplying and dividing \(\mu_{k}\) by its conjugate expression gives
$$ \mu_{k} =\frac{s_{k}^{T}s_{k}/y_{k}^{T}y_{k}}{\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}}+ \bigl( (\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}} )^{2}-\frac{s_{k}^{T}s_{k}}{y_{k}^{T}y_{k}} \bigr)^{\frac{1}{2}}} \geq\frac{y_{k}^{T}s_{k}}{2y_{k}^{T}y_{k}}\geq\frac{\psi_{1}}{2L^{2}}, $$
where the last inequality uses (20) together with \(\Vert y_{k}\Vert \leq L\Vert s_{k}\Vert \), which follows from (18). Hence \(\mu_{k}\) is bounded above and away from zero.
Since the scaling parameter is positive and bounded, for any non-zero vector \(p\in\Re^{n}\) we obtain
$$\begin{aligned} p^{T} Q_{k+1}p&=\mu_{k} p^{T} p + \frac{p^{T}s_{k}s_{k}^{T}p}{s_{k}^{T}y_{k}}- \mu _{k} \frac{p^{T}y_{k} y_{k}^{T}p }{y_{k}^{T} y_{k}} \\ &= \mu_{k} \biggl[\frac{(p^{T} p)(y_{k}^{T} y_{k})- p^{T}y_{k} y_{k}^{T}p}{y_{k}^{T} y_{k}} \biggr]+ \frac{(p^{T}s_{k})^{2}}{s_{k}^{T}y_{k}}. \end{aligned}$$
By the Cauchy-Schwarz inequality and (20), we have \((p^{T} p)(y_{k}^{T} y_{k})- (p^{T} y_{k} )(y_{k}^{T}p) \geq0 \) and \(y_{k}^{T} s_{k}>0\), which implies that the matrix (9) is positive definite \(\forall k\geq0\).
Observe also that
$$\begin{aligned} \operatorname{tr}(Q_{k+1})&=\operatorname{tr}(\mu_{k} I)+\frac{s_{k}^{T}s_{k}}{s_{k}^{T}y_{k}}- \mu_{k} \frac {y_{k}^{T}y_{k} }{y_{k}^{T} y_{k}} \\ &=(n-1)\mu_{k}+\frac{s_{k}^{T}s_{k}}{s_{k}^{T}y_{k}} \\ &\leq\frac{n-1}{\psi_{1}}+\frac{\Vert s_{k}\Vert ^{2}}{\psi_{1}\Vert s_{k}\Vert ^{2}} \\ &=\frac{n}{\psi_{1}}. \end{aligned}$$
(21)
Now,
$$ 0 < \frac{1}{\psi_{2}}\leq \biggl(\frac{s_{k}^{T}s_{k}}{y_{k}^{T}s_{k}} \biggr) \leq \operatorname{tr}(Q_{k+1}) \leq\frac{n}{\psi_{1}}. $$
(22)
Thus, \(\operatorname{tr}(Q_{k+1})\) is bounded. On the other hand, by the Sherman-Morrison formula (\(Q^{-1}_{k+1}\) is precisely the memoryless matrix obtained by updating \(\frac{1}{\mu_{k}} I \) with the direct DFP formula), we obtain
$$ Q^{-1}_{k+1}=\frac{1}{\mu_{k}} I - \frac{1}{\mu_{k}}\frac {y_{k}s_{k}^{T}+s_{k} y_{k}^{T}}{s_{k}^{T}y_{k}}+ \biggl(1+ \frac{1}{\mu_{k}} \frac {s^{T}_{k} s_{k} }{s_{k}^{T} y_{k}} \biggr)\frac{y_{k} y_{k}^{T} }{s_{k}^{T} y_{k}}. $$
(23)
Using \(\frac{1}{\mu_{k}}\leq\frac{2L^{2}}{\psi_{1}}\), we can also establish the boundedness of \(\operatorname{tr}(Q^{-1}_{k+1})\) as
$$\begin{aligned} \operatorname{tr}\bigl(Q^{-1}_{k+1}\bigr)&=\operatorname{tr} \biggl( \frac{1}{\mu_{k}} I \biggr)-\frac{2}{\mu_{k}}\frac {s_{k}^{T}y_{k}}{s_{k}^{T}y_{k}} + \frac{\Vert y_{k}\Vert ^{2}}{s_{k}^{T}y_{k}} +\frac{1}{\mu_{k}} \frac{\Vert s_{k}\Vert ^{2} \Vert y_{k}\Vert ^{2}}{ (s_{k}^{T} y_{k} )^{2}} \\ &\leq\frac{n-2}{\mu_{k}} +\frac{L^{2}\Vert s_{k}\Vert ^{2}}{\psi_{1}\Vert s_{k}\Vert ^{2}} + \frac{1}{\mu_{k}}\frac{L^{2}\Vert s_{k}\Vert ^{4}}{\psi^{2}_{1}\Vert s_{k}\Vert ^{4}} \\ &\leq\frac{2(n-2)L^{2}}{\psi_{1}} +\frac{L^{2}}{\psi_{1}} +\frac{2L^{4}}{\psi ^{3}_{1}} \\ &=\omega, \end{aligned}$$
(24)
where \(\omega=\frac{2(n-2)L^{2}}{\psi_{1}}+\frac{L^{2}}{\psi_{1}} +\frac {2L^{4}}{\psi^{3}_{1}} >0\), for \(n \geq2\). □
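To make the computations above concrete, the following sketch (our own illustration, not part of the original analysis) builds \(Q_{k+1}\) from (9) for a uniformly convex quadratic, computes \(\mu_{k}\) by the formula used in the proof, and numerically confirms positive definiteness and that (23) really is the inverse. All variable names and test data are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A uniformly convex quadratic f(x) = 0.5 x^T A x gives y_k = A s_k,
# so (20) holds with psi_1, psi_2 the extreme eigenvalues of A.
A = rng.standard_normal((n, n))
A = A @ A.T + np.eye(n)                 # symmetric positive definite Hessian
s = rng.standard_normal(n)              # step s_k
y = A @ s                               # gradient difference y_k

sTs, sTy, yTy = s @ s, s @ y, y @ y

# Scaling parameter mu_k; the radicand is nonnegative by Cauchy-Schwarz.
mu = sTs / sTy - np.sqrt((sTs / sTy) ** 2 - sTs / yTy)

# Updating matrix (9): Q = mu I + s s^T/(s^T y) - mu y y^T/(y^T y).
Q = mu * np.eye(n) + np.outer(s, s) / sTy - mu * np.outer(y, y) / yTy

# Inverse formula (23): the direct DFP update of (1/mu) I.
Q_inv = (np.eye(n) / mu
         - (np.outer(y, s) + np.outer(s, y)) / (mu * sTy)
         + (1 + sTs / (mu * sTy)) * np.outer(y, y) / sTy)

assert np.all(np.linalg.eigvalsh(Q) > 0)    # Lemma 3.1: Q is positive definite
assert np.allclose(Q @ Q_inv, np.eye(n))    # (23) inverts (9)
print("tr(Q) =", np.trace(Q), " tr(Q^-1) =", np.trace(Q_inv))
```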
Now, we shall state the sufficient descent property of the proposed search direction in the following lemma.
Lemma 3.2
Suppose that Assumption 1 holds for the objective function f; then the search direction (12) satisfies the sufficient descent condition \(g_{k+1}^{T} d_{k+1}\leq-c\Vert g_{k+1}\Vert ^{2}\) for some constant \(c>0\).
Proof
Since \(- g_{k+1}^{T}d_{k+1} \geq\frac{1}{\operatorname{tr}(Q^{-1}_{k+1})}\Vert g_{k+1}\Vert ^{2} \) (see, for example, Leong [22] and Babaie-Kafaki [23]), by using (24) we have
$$ -g_{k+1}^{T}d_{k+1}\geq c\Vert g_{k+1}\Vert ^{2}, $$
(25)
where \(c=\min \{1,\frac{1}{\omega} \}\). Thus,
$$ g_{k+1}^{T}d_{k+1} \leq-c\Vert g_{k+1}\Vert ^{2}. $$
(26)
□
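Numerically, the descent bound (25) is easy to observe. A minimal sketch with synthetic data of our own choosing, where `g` plays the role of \(g_{k+1}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A = A @ A.T + np.eye(n)
s = rng.standard_normal(n)
y = A @ s
g = rng.standard_normal(n)              # plays the role of g_{k+1}

sTs, sTy, yTy = s @ s, s @ y, y @ y
mu = sTs / sTy - np.sqrt((sTs / sTy) ** 2 - sTs / yTy)
Q = mu * np.eye(n) + np.outer(s, s) / sTy - mu * np.outer(y, y) / yTy

d = -Q @ g                              # search direction (12)

# -g^T d = g^T Q g >= ||g||^2 / tr(Q^{-1}), the bound behind (25).
assert -g @ d >= (g @ g) / np.trace(np.linalg.inv(Q)) - 1e-12
```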
Dai and Liao [24] extended the classical conjugacy condition from \(y_{k} ^{T} d_{k+1}=0\) to
$$ y_{k} ^{T} d_{k+1} =-t \bigl(s_{k}^{T}g_{k+1}\bigr), $$
(27)
where \(t\geq0\). We now show that the proposed method also satisfies this extended conjugacy condition.
Lemma 3.3
Suppose that Assumption 1 holds; then the search direction (12) satisfies the conjugacy condition (27).
Proof
By (12), we obtain
$$\begin{aligned} y_{k} ^{T} d_{k+1}&=-\mu_{k} y_{k}^{T} g_{k+1}- \frac {s^{T}_{k}g_{k+1}}{s^{T}_{k}y_{k}} y_{k}^{T}s_{k} +\mu_{k}\frac{y_{k}^{T} g_{k+1}}{y_{k}^{T} y_{k}}y_{k}^{T} y_{k} \\ &=-\mu_{k} y_{k}^{T} g_{k+1}- s_{k}^{T}g_{k+1}+ \mu_{k} y_{k}^{T} g_{k+1} \\ &=- s_{k}^{T}g_{k+1}, \end{aligned}$$
so the conjugacy condition (27) holds with \(t=1\). □
The following lemma gives the boundedness of the search direction.
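Since the identity above is exact, it can be confirmed to machine precision. A small sketch, again with synthetic data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
A = A @ A.T + np.eye(n)
s = rng.standard_normal(n)
y = A @ s
g = rng.standard_normal(n)              # plays the role of g_{k+1}

sTs, sTy, yTy = s @ s, s @ y, y @ y
mu = sTs / sTy - np.sqrt((sTs / sTy) ** 2 - sTs / yTy)

# Direction (12) written out term by term, as in the proof.
d = -mu * g - (s @ g) / sTy * s + mu * (y @ g) / yTy * y

# Dai-Liao conjugacy condition (27) with t = 1.
assert np.isclose(y @ d, -(s @ g))
```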
Lemma 3.4
Suppose that Assumption 1 holds; then there exists a constant \(P>0\) such that \(\Vert d_{k+1}\Vert \leq P\Vert g_{k+1}\Vert \), where \(d_{k+1}\) is defined by (12).
Proof
Combining (10) with the boundedness of \(\operatorname{tr}(Q_{k+1})\) in (22) gives
$$\begin{aligned} \Vert d_{k+1}\Vert &=\Vert Q_{k+1}g_{k+1} \Vert \\ &\leq \operatorname{tr}(Q_{k+1})\Vert g_{k+1}\Vert \\ & \leq P\Vert g_{k+1}\Vert , \end{aligned}$$
(28)
where \(P=\frac{n}{\psi_{1}}\). □
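A quick numerical sanity check of (28), with our own test data; the bound \(\Vert Q_{k+1}g_{k+1}\Vert \leq\operatorname{tr}(Q_{k+1})\Vert g_{k+1}\Vert \) holds for any symmetric positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
A = A @ A.T + np.eye(n)
s = rng.standard_normal(n)
y = A @ s
g = rng.standard_normal(n)

sTs, sTy, yTy = s @ s, s @ y, y @ y
mu = sTs / sTy - np.sqrt((sTs / sTy) ** 2 - sTs / yTy)
Q = mu * np.eye(n) + np.outer(s, s) / sTy - mu * np.outer(y, y) / yTy

# ||d|| = ||Q g|| <= lambda_max(Q) ||g|| <= tr(Q) ||g|| for SPD Q.
assert np.linalg.norm(Q @ g) <= np.trace(Q) * np.linalg.norm(g)
```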
In order to establish the convergence result, we give the following lemma.
Lemma 3.5
Suppose that Assumption 1 holds. Then there exist positive constants \(\gamma_{1}\) and \(\gamma_{2}\) such that any steplength \(\alpha_{k}\) generated by Step 3 of Algorithm 1 satisfies either
$$ f(x_{k}+\alpha_{k} d_{k})-f(x_{k}) \leq\frac{-\gamma_{1} (g_{k}^{T}d_{k} )^{2}}{\Vert d_{k}\Vert ^{2}}, $$
(29)
or
$$ f(x_{k}+\alpha_{k}d_{k})-f(x_{k}) \leq\gamma_{2} g_{k}^{T}d_{k}. $$
(30)
Proof
Suppose first that (16) is satisfied with \(\alpha_{k}=1\); then
$$ f(x_{k}+\alpha_{k}d_{k})-f(x_{k}) \leq\delta g_{k}^{T}d_{k}, $$
(31)
which is (30) with \(\gamma_{2}=\delta\).
Suppose now that \(\alpha_{k}< 1\). Then the backtracking in Step 3 implies that (16) failed for the trial steplength \(\alpha=\frac{\alpha_{k}}{p_{1}}\); that is,
$$ f(x_{k}+\alpha d_{k})-f(x_{k})> \delta\alpha g_{k}^{T}d_{k}. $$
(32)
Now, by the mean-value theorem there exists a scalar \(\tau_{k}\in(0,1)\) such that
$$ f(x_{k}+\alpha d_{k})-f(x_{k})= \alpha g ( x_{k}+\tau_{k}\alpha d_{k} )^{T} d_{k}. $$
(33)
Combining (32) and (33) with the Lipschitz condition (18) and \(\tau_{k}<1\), we have
$$\begin{aligned} ( \delta-1 ) \alpha g_{k}^{T}d_{k} & < \alpha \bigl(g(x_{k}+\tau_{k}\alpha d_{k})-g_{k} \bigr)^{T}d_{k} \\ &\leq\alpha \bigl\Vert g(x_{k}+\tau_{k}\alpha d_{k})-g_{k} \bigr\Vert \Vert d_{k}\Vert \\ &\leq L \tau_{k} \bigl( \alpha \Vert d_{k}\Vert \bigr) ^{2} \\ &< L \bigl( \alpha \Vert d_{k}\Vert \bigr) ^{2}, \end{aligned}$$
which implies
$$ \alpha\geq-\frac{(1-\delta)(g_{k}^{T}d_{k})}{L \Vert d_{k}\Vert ^{2}}. $$
(34)
Now,
$$ \alpha_{k}= p_{1}\alpha\geq- \frac{p_{1}(1-\delta)(g_{k}^{T}d_{k})}{L \Vert d_{k}\Vert ^{2}}. $$
(35)
Substituting (35) in (16) we have the following:
$$\begin{aligned} f(x_{k}+\alpha_{k}d_{k})-f(x_{k}) & \leq\delta\alpha_{k} g_{k}^{T}d_{k} \\ &\leq-\frac{\delta p_{1}(1-\delta)(g_{k}^{T}d_{k})}{L \Vert d_{k}\Vert ^{2}} \bigl(g_{k}^{T}d_{k}\bigr) \\ &=\frac{-\gamma_{1} (g_{k}^{T}d_{k} )^{2}}{\Vert d_{k}\Vert ^{2}}, \end{aligned}$$
where
$$\gamma_{1}=\frac{\delta p_{1}(1-\delta)}{L }. $$
Therefore
$$ f(x_{k}+\alpha_{k} d_{k})-f(x_{k}) \leq\frac{-\gamma_{1} (g_{k}^{T}d_{k} )^{2}}{\Vert d_{k}\Vert ^{2}}. $$
(36)
□
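The proof treats (16) as the Armijo-type acceptance test of Step 3 with backtracking factor \(p_{1}\in(0,1)\). The following is a minimal backtracking sketch consistent with that reading; the values of \(\delta\), \(p_{1}\), the starting steplength, and the test problem are our placeholders, not the paper's choices.

```python
import numpy as np

def backtracking(f, x, d, g, delta=1e-4, p1=0.5, alpha=1.0, max_trials=50):
    """Return a steplength accepted by the Armijo-type test (16).

    Starting from alpha = 1, multiply by the backtracking factor
    p1 in (0, 1) until f(x + alpha d) - f(x) <= delta * alpha * g^T d.
    """
    fx, gTd = f(x), g @ d
    for _ in range(max_trials):
        if f(x + alpha * d) - fx <= delta * alpha * gTd:
            break
        alpha *= p1
    return alpha

# Example on a convex quadratic along the steepest descent direction.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
x = np.array([1.0, 1.0])
g = A @ x
print("accepted steplength:", backtracking(f, x, -g, g))
```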
Theorem 3.6
Suppose that Assumption 1 holds. Then Algorithm 1 generates a sequence of iterates \(\{x_{k} \}\) such that
$$ \lim_{k\rightarrow\infty} \Vert g_{k}\Vert = 0. $$
(37)
Proof
As a direct consequence of Lemma 3.5, the sufficient descent property (26), and the boundedness of the search direction (28), we have either
$$\begin{aligned} f(x_{k}+\alpha_{k} d_{k})-f(x_{k}) &\leq\frac{-\gamma_{1} (g_{k}^{T}d_{k} )^{2}}{\Vert d_{k}\Vert ^{2}} \\ &\leq\frac{-\gamma_{1} c^{2}\Vert g_{k}\Vert ^{4}}{P^{2}\Vert g_{k}\Vert ^{2}} \\ &=\frac{-\gamma_{1} c^{2}}{P^{2}}\Vert g_{k}\Vert ^{2} \end{aligned}$$
(38)
or
$$\begin{aligned} f(x_{k}+\alpha_{k} d_{k})-f(x_{k}) &\leq\gamma_{2} g_{k}^{T}d_{k} \\ &\leq-\gamma_{2} c\Vert g_{k}\Vert ^{2}. \end{aligned}$$
(39)
Hence, in either case, there exists a positive constant \(\gamma_{3}=\min \{\frac{\gamma_{1} c^{2}}{P^{2}}, \gamma_{2} c \}\) such that
$$ f(x_{k}+\alpha_{k} d_{k})-f(x_{k}) \leq-\gamma_{3}\Vert g_{k}\Vert ^{2}. $$
(40)
Since the steplength \(\alpha_{k} \) generated by Algorithm 1 is bounded away from zero, (40) implies that \(\{f (x_{k} )\}\) is a non-increasing sequence. Since f is bounded below on the level set K, \(\{f (x_{k} )\}\) converges, and taking limits in (40) gives
$$0=\lim_{k \rightarrow\infty} \bigl(f (x_{k+1} )-f (x_{k} ) \bigr)\leq-\gamma_{3}\lim_{k \rightarrow\infty} \Vert g_{k}\Vert ^{2}, $$
and as a result
$$ \lim_{k\rightarrow\infty} \Vert g_{k}\Vert =0. $$
(41)
□
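To see Theorem 3.6 at work, the following self-contained sketch combines the direction (12), the scaling \(\mu_{k}\), and a backtracking search satisfying (16) on a uniformly convex quadratic. The tolerances and test problem are our own choices; this is an illustration under Assumption 1, not the authors' implementation.

```python
import numpy as np

def solve(f, grad, x, tol=1e-8, delta=1e-4, p1=0.5, max_iter=2000):
    g = grad(x)
    d = -g                                   # initial steepest descent step
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        # Backtrack until the Armijo-type condition (16) holds.
        alpha, fx, gTd = 1.0, f(x), g @ d
        while f(x + alpha * d) - fx > delta * alpha * gTd:
            alpha *= p1
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sTs, sTy, yTy = s @ s, s @ y, y @ y
        # Scaling mu_k; the radicand is clipped at 0 to guard rounding error.
        mu = sTs / sTy - np.sqrt(max((sTs / sTy) ** 2 - sTs / yTy, 0.0))
        # Matrix-free form of d_{k+1} = -Q_{k+1} g_{k+1}, i.e. direction (12).
        d = -mu * g_new - (s @ g_new) / sTy * s + mu * (y @ g_new) / yTy * y
        x, g = x_new, g_new
    return x, np.linalg.norm(g), k

A = np.diag(np.linspace(1.0, 50.0, 20))
x, gnorm, k = solve(lambda x: 0.5 * x @ A @ x, lambda x: A @ x, np.ones(20))
print(f"||g_k|| = {gnorm:.2e} after {k} iterations")
```

On this test problem the gradient norm decreases to the tolerance well within the iteration budget, which is the behavior (37) predicts for uniformly convex objectives.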