New inertial proximal gradient methods for unconstrained convex optimization problems
Journal of Inequalities and Applications volume 2020, Article number: 255 (2020)
Abstract
The proximal gradient method is a powerful tool for solving composite convex optimization problems. In this paper, we first propose inexact inertial acceleration methods, based on viscosity approximation and the proximal scaled gradient algorithm, to accelerate the convergence of the algorithm. Under reasonable conditions on the parameters, we prove that our algorithms converge strongly to a solution of the problem, which is also the unique solution of a variational inequality problem. Second, we propose an inexact alternated inertial proximal point algorithm. Under suitable conditions, a weak convergence theorem is proved. Finally, numerical results illustrate the performance of our algorithms and provide a comparison with related algorithms. Our results improve and extend corresponding results recently reported by many authors.
1 Introduction
Let H be a real Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \) and induced norm \(\Vert \cdot \Vert \), and let C be a nonempty closed convex subset of H. Let \(\Gamma _{0}(H)\) denote the class of functions on H that are proper, convex, and lower semicontinuous. We deal with the unconstrained convex optimization problem of the following type:
$$ \min_{x\in H} f(x)+g(x), $$(1.1)
where \(f,g\in \Gamma _{0}(H)\). It is often the case that f is differentiable and g is subdifferentiable.
Problem (1.1) was first studied in [13] in 1978 and provides a natural framework for studying various generic optimization models. In recent years, many researchers have proposed algorithms to solve problem (1.1) and have established numerous weak and strong convergence results; see, e.g., [1, 6, 12, 23, 25], just to name a few. Many important optimization problems can be cast in this form. See, for instance, [23], where the author studied properties and iterative methods for the lasso as a special case of (1.1); since the involved \(l_{1}\) norm promotes sparsity, good results can be obtained for the corresponding problem.
The following proposition is very useful for constructing iterative algorithms.
Proposition 1.1
(see [23])
Let \(f,g\in \Gamma _{0}(H)\). Let \(x^{*}\in H\) and \(\lambda >0\). Assume that f is finite-valued and differentiable on H. Then \(x^{*}\) is a solution to (1.1) if and only if \(x^{*}\) solves the fixed point equation
$$ x^{*}=\operatorname{prox}_{\lambda g}\bigl(x^{*}-\lambda \nabla f\bigl(x^{*}\bigr)\bigr). $$(1.2)
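For the reader's convenience, here is a brief sketch of the standard first-order optimality argument behind Proposition 1.1 (recalled here for illustration, not part of the cited statement), using only that \(\operatorname{prox}_{\lambda g}=(I+\lambda \partial g)^{-1}\) is single-valued:
$$ x^{*} \text{ solves (1.1)} \quad\Longleftrightarrow\quad 0\in \nabla f\bigl(x^{*}\bigr)+\partial g\bigl(x^{*}\bigr) \quad\Longleftrightarrow\quad x^{*}-\lambda \nabla f\bigl(x^{*}\bigr)\in (I+\lambda \partial g)x^{*} \quad\Longleftrightarrow\quad x^{*}=\operatorname{prox}_{\lambda g}\bigl(x^{*}-\lambda \nabla f\bigl(x^{*}\bigr)\bigr). $$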
On the other hand, errors are often produced in the process of computation. It is an important property of an algorithm that it guarantees convergence of the iterates under summable errors. Many authors have studied algorithms with perturbations and their convergence; some related results can be found in [3–5]. In 2011, Boikanyo and Morosanu [2] introduced a proximal point algorithm with an error sequence. Under the summability condition on the errors and some additional conditions on the parameters, they obtained a strong convergence theorem.
In 2016, Jin, Censor, and Jiang [11] presented the projected scaled gradient (PSG) method with bounded perturbations in a finite dimensional setting for solving the following minimization problem:
where f is a continuously differentiable, convex function. More precisely, the method generates a sequence according to
and converges to a solution of problem (1.3) under suitable conditions, where \(D(x_{n})\) is a diagonal scaling matrix.
In 2017, Xu [24] extended the method to an infinite dimensional Hilbert space and studied superiorization techniques for the relaxed PSG. The following iterative step was introduced:
where \(\tau _{n}\in [0,1]\). A weak convergence theorem was obtained in [24].
Quite recently, Guo and Cui [8] considered the modified proximal gradient method:
where h is a contractive mapping. The algorithm converges strongly to a solution of problem (1.1).
To accelerate the convergence of iterative methods, Polyak [19] introduced the heavy ball method \(x_{n+1}=x_{n}-\lambda \nabla f(x_{n})+\delta (x_{n}-x_{n-1})\), which can speed up gradient descent.
This modification was made immensely popular by Nesterov's accelerated gradient algorithm [18]. Generally, an inertial iteration for an operator P takes the form \(x_{n+1}=P(x_{n}+\delta _{n}(x_{n}-x_{n-1}))\).
In 2009, Beck and Teboulle [1] proposed a fast iterative shrinkage-thresholding algorithm for linear inverse problems. By applying the inertial technique, the new iterate \(x_{n}\) is not computed at the previous point \(x_{n-1}\), but rather at a point \(y_{n}\) that is a very specific linear combination of the previous two iterates \(x_{n-1}\) and \(x_{n-2}\). Therefore, the convergence of the algorithm is greatly accelerated.
In 2015, for solving the maximal monotone inclusion problem, Mu and Peng [17] introduced alternated inertial proximal point iterates as follows:
where \(y_{n}\) is defined as
In equation (1.9), T is a set-valued maximal monotone operator and \(\lambda >0\). This form is much less common than general inertia; however, it enjoys good convergence properties and performance.
In 2017, Iutzeler and Hendrickx [10] proposed a generic acceleration scheme for optimization algorithms via relaxation and inertia, and they also used alternated inertial acceleration in their algorithm. They obtained convergence of the iterative sequence under suitable assumptions.
Very recently, Shehu and Gibali [21] studied a new alternated inertial procedure for solving split feasibilities. Under some mild assumptions, they showed that the sequence converges strongly.
In this paper, mainly inspired and motivated by the above works, we introduce several iterative algorithms. Firstly, we combine a contractive mapping and the proximal operator to propose an inertial accelerated proximal gradient method with errors for solving problem (1.1). Under general and flexible conditions, we prove that the sequence converges strongly. Further, we extend the algorithm to a more general viscosity inertial acceleration method. Secondly, we propose an alternated inertial proximal gradient algorithm with errors to solve problem (1.1), and we prove that the sequence converges weakly under appropriate conditions. Finally, we present several numerical examples to illustrate the effectiveness of our iterative schemes.
2 Preliminaries
We start by recalling some lemmas, definitions, and propositions needed in the proof of the main results.
Recall that, given a nonempty closed convex subset C of a real Hilbert space H, for any \(x\in H\) there exists a unique nearest point in C, denoted by \(P_{C}x\), such that
$$ \Vert x-P_{C}x \Vert \leq \Vert x-y \Vert ,\quad \forall y\in C. $$
Such a \(P_{C}x\) is called the metric projection of H onto C.
Lemma 2.1
(see [14])
Let C be a nonempty closed convex subset of a real Hilbert space H. Given \(x\in H\) and \(y\in C\), then \(y=P_{C}x\) if and only if the relation
$$ \langle x-y, z-y\rangle \leq 0 $$
holds for all \(z\in C\).
Lemma 2.2
Let H be a real Hilbert space. Then the following statements hold:
(i) \(\Vert x+y\Vert ^{2}=\Vert x\Vert ^{2}+2\langle x,y\rangle +\Vert y\Vert ^{2}\), \(\forall x,y \in H\);
(ii) \(\Vert x+y\Vert ^{2}\leq \Vert x\Vert ^{2}+2\langle x+y,y\rangle \), \(\forall x,y \in H\);
(iii) \(\Vert \alpha x+(1-\alpha )y\Vert ^{2}=\alpha \Vert x\Vert ^{2}+(1-\alpha )\Vert y\Vert ^{2}- \alpha (1-\alpha )\Vert x-y\Vert ^{2}\) for all \(\alpha \in \mathbb{R}\) and \(x,y\in H\).
Definition 2.3
A mapping \(F:H\rightarrow H\) is said to be
(i) Lipschitzian if there exists a positive constant L such that
$$ \Vert Fx-Fy \Vert \leq L \Vert x-y \Vert ,\quad \forall x,y\in H. $$
In particular, if \(L=1\), F is called nonexpansive; if \(L\in [0,1)\), F is called contractive.
(ii) α-averaged (α-av for short) if
$$ F=(1-\alpha )I+\alpha T, $$
where \(\alpha \in (0,1)\) and \(T:H\rightarrow H\) is nonexpansive.
Proposition 2.4
([22])
(i) If \(T_{1}, T_{2},\ldots, T_{N}\) are averaged mappings, then the composition \(T_{N}T_{N-1}\cdots T_{1}\) is averaged. In particular, if \(T_{i}\) is \(\alpha _{i}\)-av for each \(i=1,2\), where \(\alpha _{i} \in (0,1)\), then \(T_{2}T_{1}\) is \((\alpha _{2}+\alpha _{1}-\alpha _{2}\alpha _{1})\)-av.
(ii) If the mappings \(\{T_{i}\}^{N}_{i=1}\) are averaged and have a common fixed point, then
$$ \bigcap^{N}_{i=1}\operatorname{Fix}(T_{i})= \operatorname{Fix}(T_{1}\cdots T_{N}). $$
Here the notation \(\operatorname{Fix}(T)\) denotes the set of fixed points of the mapping T, that is, \(\operatorname{Fix}(T) := \{x \in H : Tx = x\}\).
(iii) If T is ν-ism (i.e., ν-inverse strongly monotone: \(\langle x-y,Tx-Ty\rangle \geq \nu \Vert Tx-Ty\Vert ^{2}\) for all \(x,y\in H\)), then, for any \(\tau >0\), τT is \(\frac{\nu }{\tau }\)-ism.
(iv) T is averaged if and only if \(I-T\) is ν-ism for some \(\nu >\frac{1}{2}\). Indeed, for any \(0<\alpha <1\), T is α-averaged if and only if \(I-T\) is \(\frac{1}{2\alpha }\)-ism.
Definition 2.5
(see [16])
The proximal operator of \(\varphi \in \Gamma _{0}(H)\) is defined by
$$ \operatorname{prox}_{\varphi }(x):=\mathop{\operatorname{arg\,min}}_{v\in H} \biggl\{ \varphi (v)+\frac{1}{2} \Vert v-x \Vert ^{2} \biggr\} ,\quad x\in H. $$
The proximal operator of φ of order \(\lambda >0\) is defined as the proximal operator of λφ, that is,
$$ \operatorname{prox}_{\lambda \varphi }(x):=\mathop{\operatorname{arg\,min}}_{v\in H} \biggl\{ \varphi (v)+\frac{1}{2\lambda } \Vert v-x \Vert ^{2} \biggr\} ,\quad x\in H. $$
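As a concrete finite dimensional illustration (not part of the cited definition), for \(\varphi =\Vert \cdot \Vert _{1}\) on \(\mathbb{R}^{N}\) the proximal operator reduces to componentwise soft thresholding, which is exactly the operator used in Example 4.1 below. A minimal NumPy sketch:

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of lam * ||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Small check: components with |x(i)| <= lam are set to zero,
# the others are shrunk towards zero by lam.
x = np.array([1.5, -0.2, 0.7])
print(prox_l1(x, 0.5))   # -> [ 1.  -0.   0.2]
```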
Lemma 2.6
The proximal identity
$$ \operatorname{prox}_{\lambda \varphi }x=\operatorname{prox}_{\mu \varphi } \biggl(\frac{\mu }{\lambda }x+ \biggl(1-\frac{\mu }{\lambda } \biggr)\operatorname{prox}_{\lambda \varphi }x \biggr) $$
holds for \(\varphi \in \Gamma _{0}(H)\), \(\lambda >0\), and \(\mu >0\).
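A quick numerical sanity check of this identity, for the scalar function \(\varphi =\vert \cdot \vert \) whose proximal operator is soft thresholding (a small illustrative script with arbitrarily chosen values of x, λ, and μ, not part of the cited lemma):

```python
import numpy as np

def prox_abs(x, lam):
    """prox of lam * |.| in one dimension (soft thresholding)."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

x, lam, mu = 5.0, 2.0, 0.5
lhs = prox_abs(x, lam)
rhs = prox_abs(mu / lam * x + (1.0 - mu / lam) * prox_abs(x, lam), mu)
print(lhs, rhs)   # both sides equal 3.0
```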
Lemma 2.7
(Demiclosedness principle, see [7])
Let H be a real Hilbert space, and let \(T:H\rightarrow H \) be a nonexpansive mapping with \(\operatorname{Fix}(T)\neq \emptyset \). If \(\{x_{n}\}\) is a sequence in H weakly converging to x and if \(\{(I-T)x_{n}\}\) converges strongly to y, then \((I-T)x=y\); in particular, if \(y=0\), then \(x\in \operatorname{Fix}(T)\).
Lemma 2.8
(see [9])
Assume that \(\{s_{n}\}\) is a sequence of nonnegative real numbers such that
$$ s_{n+1}\leq (1-\gamma _{n})s_{n}+\gamma _{n}\mu _{n},\quad n\geq 0, $$
$$ s_{n+1}\leq s_{n}-\eta _{n}+\varphi _{n},\quad n\geq 0, $$
where \(\{\gamma _{n}\}\) is a sequence in \((0,1)\), \(\{\eta _{n}\}\) is a sequence of nonnegative real numbers, and \(\{\mu _{n}\}\) and \(\{\varphi _{n}\}\) are two sequences in \(\mathbb{R}\) such that
(i) \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \);
(ii) \(\lim_{n\rightarrow \infty }\varphi _{n}=0\);
(iii) \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\) for any subsequence \(\{n_{k}\}\subset \{n\}\).
Then \(\lim_{n\rightarrow \infty }s_{n}=0\).
Lemma 2.9
(see [7])
Let C be a nonempty closed convex subset of a real Hilbert space H. Let \(\{x_{n}\}\) be a sequence in H satisfying the properties:
(i) \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists for each \(z\in C\);
(ii) \(\omega _{w}(x_{n})\subset C\), where \(\omega _{w}(x_{n}):=\{x : \exists x_{n_{j}}\rightharpoonup x\}\) (\(\{x_{n_{j}}\}\) being a subsequence of \(\{x_{n}\}\)) denotes the weak ω-limit set of \(\{x_{n}\}\).
Then \(\{x_{n} \}\) converges weakly to a point in C.
Lemma 2.10
(see [20])
Let \(\{s_{n}\}\) be a sequence of nonnegative numbers satisfying the generalized nonincreasing property
$$ s_{n+1}\leq s_{n}+\sigma _{n},\quad n\geq 0, $$
where \(\{\sigma _{n}\}\) is a sequence of nonnegative numbers such that \(\sum_{n=0}^{\infty }\sigma _{n}<\infty \). Then \(\{s_{n}\}\) is bounded and \(\lim_{n\rightarrow \infty }s_{n} \) exists.
3 Main results
3.1 Inertial proximal gradient algorithm
In this section, we combine the viscosity iterative method with the proximal gradient method to approximate the unique solution of the following variational inequality problem (VIP for short): find \(x^{*}\in \operatorname{Fix}(V_{\lambda })\) such that
$$ \bigl\langle (I-h)x^{*},x-x^{*} \bigr\rangle \geq 0,\quad \forall x\in \operatorname{Fix}(V_{\lambda }), $$(3.1)
where \(h: H\rightarrow H\) is ρ-contractive and \(V_{\lambda }:=\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\) is nonexpansive.
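A fact used repeatedly in the proofs below is that \(V_{\lambda }\) is not merely nonexpansive but \(\frac{2+\lambda L}{4}\)-averaged whenever ∇f is L-Lipschitzian and \(0<\lambda <\frac{2}{L}\). A sketch of the argument, combining the Baillon–Haddad theorem (a standard fact not restated in the preliminaries) with Proposition 2.4:
$$ \nabla f \text{ is } \tfrac{1}{L}\text{-ism} \quad\Longrightarrow\quad \lambda \nabla f \text{ is } \tfrac{1}{\lambda L}\text{-ism} \quad\Longrightarrow\quad I-\lambda \nabla f \text{ is } \tfrac{\lambda L}{2}\text{-averaged}, $$
and since \(\operatorname{prox}_{\lambda g}\) is firmly nonexpansive, hence \(\frac{1}{2}\)-averaged, Proposition 2.4(i) gives that \(V_{\lambda }=\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\) is \((\frac{1}{2}+\frac{\lambda L}{2}-\frac{1}{2}\cdot \frac{\lambda L}{2})\)-averaged, that is, \(\frac{2+\lambda L}{4}\)-averaged.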
We propose an inertial acceleration algorithm.
Algorithm 1
1. Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).
2. Given \(x_{n}\), \(x_{n-1}\), compute
$$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$(3.2)
3. Calculate the next iterate via
$$ x_{n+1}=\alpha _{n}h(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr). $$(3.3)
4. If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n:=n+1\) and go to step 2.
Rewrite iteration (3.3) as follows:
$$ x_{n+1}=\alpha _{n}h(y_{n})+(1-\alpha _{n}) \bigl(\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n}\nabla f(y_{n})\bigr)+\tilde{e}_{n}\bigr), $$(3.4)
where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and
$$ \tilde{e}_{n}=\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n}\nabla f(y_{n})+\hat{e}_{n}\bigr)-\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n}\nabla f(y_{n})\bigr). $$(3.5)
Note that \(\Vert \tilde{e}_{n}\Vert \leq \Vert \hat{e}_{n}\Vert \leq \Vert e(y_{n})\Vert +\lambda _{n} \Vert \theta (y_{n})\Vert \); hence it is easy to get \(\sum_{n=0}^{\infty }\Vert \tilde{e}_{n}\Vert <\infty \) from conditions (iii)–(iv) of Theorem 3.1. We use S to denote the solution set of problem (1.1).
Theorem 3.1
Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Let h be a ρ-contractive self-mapping of H with \(0\leq \rho <1\), and assume that ∇f is L-Lipschitzian and that D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 1, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that
(i) \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);
(ii) \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);
(iii) \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);
(iv) \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);
(v) \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).
Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).
Proof
We divide the proof into several steps.
Step 1. Show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),
Put \(V_{\lambda _{n}}:=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)\). From (3.4) and (3.5), we have
From conditions (iii)–(v) and \(\alpha _{n}>0\), we get \(\{(\delta _{n}\Vert x_{n}-x_{n-1}\Vert +\Vert \tilde{e}_{n}\Vert )/\alpha _{n}\}\) is bounded. Thus there exists some \(M_{1}>0\) such that
for all \(n\geq 0\). Then mathematical induction implies that
Therefore, the sequence \(\{x_{n}\}\) is bounded and so are \(\{y_{n}\}\), \(\{h(y_{n})\}\), and \(\{V_{\lambda _{n}}y_{n}\}\).
Step 2. Show that \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies
for any subsequence \(\{n_{k}\}\subset \{n\}\). Firstly, fix \(z\in S\); we have
Then from (3.4) we get
where \(M_{2}\) is some constant such that
Put \(\gamma _{n}:=\alpha _{n}(2-\alpha _{n}(1+2\rho ^{2})-2(1-\alpha _{n}) \rho )\). Using (3.4) and (3.7), we deduce that
Secondly, since \(V_{\lambda _{n}} \) is \(\frac{2+\lambda _{n}L}{4} \)-av by Proposition 2.4, we can rewrite
where \(w_{n}=\frac{2+\lambda _{n}L}{4}\), \(T_{n}\) is nonexpansive and, by condition (ii), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.4), (3.8), and (3.10), we obtain
Set
Since \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \) and \(\varphi _{n}\rightarrow 0\) hold obviously, in order to complete the proof by using Lemma 2.8, it suffices to verify that \(\eta _{n_{k}}\rightarrow 0\) (\(k\rightarrow \infty \)) implies
for any subsequence \(\{n_{k}\}\subset \{n\}\).
Indeed, as \(k\rightarrow \infty \), \(\eta _{n_{k}}\rightarrow 0\) implies \(\Vert T_{n_{k}}y_{n_{k}}-y_{n_{k}}\Vert \rightarrow 0\). From (3.10), we have
Due to condition (v), it follows that
Thus, we have
It follows from (3.12) and (3.13) that
Step 3. Show that
Take \(\tilde{x} \in \omega _{w}(x_{n_{k}})\) and assume that \(\{x_{n_{k_{j}}}\}\) is a subsequence of \(\{x_{n_{k}}\}\) weakly converging to x̃. Without loss of generality, we still use \(\{x_{n_{k}}\}\) to denote \(\{x_{n_{k_{j}}}\}\). Assume that \(\lambda _{n_{k}}\rightarrow \lambda \); then \(0<\lambda <\frac{2}{L}\). Set \(V_{\lambda }=\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\); then \(V_{\lambda }\) is nonexpansive. Set
Using the proximal identity of Lemma 2.6, we deduce that
Since \(\{x_{n}\}\) is bounded, ∇f is Lipschitz continuous, and \(\lambda _{n_{k}}\rightarrow \lambda \), we immediately derive from the last relation that \(\Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}}\Vert \rightarrow 0\). As a result, we find
Using Lemma 2.7, we get \(\omega _{w}(x_{n_{k}})\subset S\). Meanwhile, we have
Also, since \(x^{*}\) is the unique solution of variational inequality problem (3.1), we get
and hence \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\). □
Furthermore, we extend Algorithm 1 to a more general viscosity iterative algorithm. Suppose that the sequence of contractive mappings \(\{h_{n}(x)\}\) is uniformly convergent on any bounded subset B of H. Assume that the solution set \(S\neq \emptyset \); next we prove that the sequence \(\{x_{n}\}\) generated by Algorithm 2 converges strongly to a point \(x^{*}\in S\), which also solves variational inequality (3.1).
A more general inertial iterative algorithm is as follows.
Algorithm 2
1. Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).
2. Given \(x_{n}\), \(x_{n-1}\), compute
$$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$(3.20)
3. Calculate the next iterate via
$$ x_{n+1}=\alpha _{n}h_{n}(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr). $$(3.21)
4. If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n:=n+1\) and go to step 2.
Theorem 3.2
Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent. Let \(\{h_{n}\}\) be a sequence of \(\rho _{n}\)-contractive self-mappings of H with \(0<\rho _{l}=\liminf_{n\rightarrow \infty }\rho _{n}\leq \limsup_{n \rightarrow \infty }\rho _{n}=\rho _{u}<1\), and suppose that \(\{h_{n}(x)\}\) is uniformly convergent on any bounded subset B of H. Assume that ∇f is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), define the sequence \(\{x_{n}\}\) by Algorithm 2, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that
(i) \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);
(ii) \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);
(iii) \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);
(iv) \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);
(v) \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).
Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).
Proof
Using the uniform convergence of the sequence of contractive mappings \(\{h_{n}\}\) and consulting [6], we have \(\lim_{n\rightarrow \infty }h_{n}=h\). The proof can be completed by techniques similar to those used in Theorem 3.1. □
3.2 Alternated inertial proximal gradient algorithm
In light of the ideas of [10, 17, 21] and related references, combined with the proximal gradient method, we consider the following algorithm.
Algorithm 3
1. Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).
2. Given \(x_{n}\), \(x_{n-1}\), compute
$$ y_{n}= \textstyle\begin{cases} x_{n}+\delta _{n}(x_{n}-x_{n-1}), &n \text{ odd}, \\ x_{n}, &n \text{ even}. \end{cases} $$(3.22)
3. Calculate the next iterate via
$$ x_{n+1}=\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}-\lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr). $$(3.23)
4. If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n:=n+1\) and go to step 2.
Similar to (3.3), we rewrite (3.23) as follows:
where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and
Theorem 3.3
Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Assume that ∇f is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 3, where \(\lambda _{n}\in (0,\frac{2}{L})\). Suppose that
(i) \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);
(ii) \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);
(iii) \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);
(iv) \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).
Then \(\{x_{n}\}\) converges weakly to a solution of the minimization problem of (1.1).
Proof
Step 1. Show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),
Applying conditions (ii) and (iv), we deduce that \(\{x_{2n}\}\) is bounded. Since
It is easy to get that \(\{x_{n}\}\) is bounded and so are \(\{y_{n}\}\) and \(\{V_{\lambda _{n}}y_{n}\}\). Also, it follows from (3.25) and (3.26) that \(\{x_{n}\}\) is quasi-Fejér monotone with respect to S. By Lemma 2.10, \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists.
Step 2. Show that \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\) and \(\lim_{n\rightarrow \infty }\Vert x_{n}-V_{\lambda _{n}}x_{n}\Vert =0\). Firstly, fix \(z\in S\); by Lemma 2.2 and the Schwarz inequality, we have
Since \(V_{\lambda _{n}}\) is \(\frac{2+\lambda _{n}L}{4} \)-av, we see that
where \(w_{n}=\frac{2+\lambda _{n}L}{4}\), \(T_{n}\) is nonexpansive. From condition (ii), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.23) and (3.26), we obtain
where \(M_{3}=\sup \{2\Vert y_{2n+1}-z\Vert +\Vert \tilde{e}_{2n+1}\Vert \}\).
With the help of equality (3.28), we have
where \(M_{4}=\sup \{2\Vert x_{2n}-z\Vert +\Vert \tilde{e}_{2n}\Vert \}\).
Substituting (3.30) into (3.29), we get
Hence, we have the following result:
Noting the fact that \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\), we deduce from (3.32) that
In particular, \(\lim_{n\rightarrow \infty }\Vert T_{2n}x_{2n}-x_{2n}\Vert =0\). Now we have
Similarly, we argue that
Observe that
From (3.35) and condition (ii), we get
It follows from (3.36) and condition (iv) that
Combining (3.34) and (3.38), we obtain \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\). This yields
Step 3. Show that
Since \(\{\lambda _{n}\}\) is bounded, we may assume that a subsequence \(\{\lambda _{n_{k}}\}\) converges to some λ. By a method similar to Step 3 in Theorem 3.1, we conclude that (3.40) holds. By Lemma 2.9, \(\{x_{n}\}\) converges weakly. □
4 Numerical illustrations
In this section, we consider the following two examples to demonstrate the effectiveness of the algorithms and the convergence results of Theorem 3.1 and Theorem 3.3.
Example 4.1
Let \(H=\mathbb{R}^{N}\). Define \(h(x)=\frac{1}{10}x\). Take \(f(x)=\frac{1}{2}\Vert Ax-b\Vert ^{2}\); then \(\nabla f(x)=A^{T}(Ax-b)\) with Lipschitz constant \(L=\Vert A^{T}A\Vert \), where \(A^{T}\) denotes the transpose of A. Take \(g(x)=\Vert x\Vert _{1}\); then problem (1.1) becomes an \(l_{1}\)-regularized least squares (lasso-type) problem.
From [15], we know that
$$ \operatorname{prox}_{\lambda _{n} \Vert \cdot \Vert _{1}}x=\bigl(\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(1), \operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(2),\ldots, \operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(N)\bigr)^{T}, $$
where \(\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(i)=\max \{\vert x(i)\vert -\lambda _{n},0\}\operatorname{sign}(x(i))\) and \(x(i)\) denotes the ith element of x, \(i=1,2,\ldots,N\). Let \(D(y_{n})\) be the diagonal matrix with entries \(y_{n}(i)\), that is, \(D_{ii}=y_{n}(i)\), \(i=1,2,\ldots, N\). Given \(\alpha _{n}=\frac{1}{100n}\), \(\lambda _{n}=\frac{1}{30L}\frac{n+1}{n+2}\), and
for every \(n\geq 0\). Generate an \(M\times N\) random matrix A whose entries are sampled independently from a uniform distribution, and generate a random vector b whose entries are drawn from a Gaussian distribution with zero mean and unit variance.
According to the iterative process of Theorem 3.1, the sequence \(\{x_{n}\}\) is generated by
Next, we use MATLAB for the numerical implementation. Set \(M=100\), \(N=1000\). Under the same parameters, we contrast our scheme with iterative algorithm (4.2) of reference [6]. Taking different error tolerances ϵ, we obtain the numerical results in Table 1, where n and t denote the number of iterations and the running time (tic/toc), respectively. We use \(\Vert x_{n+1}-x_{n}\Vert <\epsilon \) as the stopping criterion.
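For readers who wish to reproduce a run of this experiment, the following is a minimal NumPy sketch of the setup and of Algorithm 1 specialized to this lasso example (the original experiments were carried out in MATLAB; the inertial weight \(\delta _{n}\) and the perturbation term are not fully reproduced above, so the choices \(\delta _{n}=1/n^{2}\) and \(e(y_{n})=0\) below are assumptions of this sketch):

```python
import numpy as np

def soft_threshold(x, lam):
    """prox of lam * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(0)
M, N = 100, 1000
A = rng.uniform(size=(M, N))       # entries sampled from a uniform distribution
b = rng.standard_normal(M)         # Gaussian entries, zero mean, unit variance
L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant ||A^T A|| of grad f

h = lambda x: x / 10.0             # the contraction h(x) = x/10 from the example
x_prev = np.zeros(N)
x = np.zeros(N)
eps = 1e-4
for n in range(1, 100000):
    alpha = 1.0 / (100 * n)
    lam = (n + 1) / (30.0 * L * (n + 2))
    delta = 1.0 / n ** 2                       # assumed inertial weight
    y = x + delta * (x - x_prev)               # inertial step (3.2)
    grad = A.T @ (A @ y - b)                   # grad f(y) = A^T (A y - b)
    z = soft_threshold(y - lam * (y * grad), lam)   # D(y) grad = y .* grad
    x_next = alpha * h(y) + (1.0 - alpha) * z
    if np.linalg.norm(x_next - x) < eps:       # stopping rule ||x_{n+1}-x_n|| < eps
        break
    x_prev, x = x, x_next
```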
In addition, we compare the values of \(\Vert x_{n+1}-x_{n}\Vert \) at the same number of iterations for (4.1) and (4.2). The results can be seen in Fig. 1. We also present the running time and the number of iterations for different stopping tolerances ϵ; see Fig. 2.
Figure 1. The comparison of \(\Vert x_{n+1}-x_{n}\Vert \) of inertial acceleration (IA) and without inertial acceleration (UA) for \((M,N)= (100,1000)\) of Example 4.1
Figure 2. The comparison of running time and iteration steps of inertial acceleration (IA) and without inertial acceleration (UA) with the same stopping criterion for \((M,N)= (100,1000)\) of Example 4.1
It can easily be seen from Table 1, Fig. 1, and Fig. 2 that Algorithm 1 is faster than iterative formula (4.2) without the inertial step. Under the same stopping criterion, the values of \(\Vert x_{n+1}-x_{n}\Vert \) and \(\Vert Ax_{n}-b\Vert \) produced by Algorithm 1 are smaller.
In what follows, we give an example in an infinite dimensional space.
Example 4.2
Suppose that \(H=L^{2}([0,1])\) with the norm \(\Vert x\Vert =(\int _{0}^{1}(x(t))^{2}\,dt)^{\frac{1}{2}}\) and the inner product \(\langle x,y\rangle =\int _{0}^{1}x(t)y(t)\,dt\), \(\forall x,y\in H\). Define \(h(x)=\frac{1}{2}x\) and \(Ax(t)=tx(t)\). Let \(f(x)=\frac{1}{2}\Vert Ax-u\Vert ^{2}\), and let g be the indicator function of C, where \(u\in H\) is a fixed function and \(C=\{x\in H: \Vert x\Vert \leq 1\}\).
By the definition of f and g, we obtain
and
where \(\iota _{C}\) denotes the indicator function and
We also deduce that the adjoint operator of A is A itself, i.e., \(A^{*}=A\). Take \(D(x_{n})=I\) and set the parameters \(\alpha _{n}=\frac{1}{1000n}\) and \(\lambda _{n}=\frac{n}{L(n+1)}\). According to the iterative algorithm of Theorem 3.1, we get the following sequence \(\{x_{n}\}\):
The numerical integration method used in this example is the trapezoidal formula. We test these two algorithms with different stopping criteria. The numerical results are shown in Table 2.
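As an illustration of how this infinite dimensional example can be discretized for computation (the paper states only that the trapezoidal rule is used for numerical integration; the grid size and the test function u below are assumptions of this sketch), one may sample functions on a uniform grid of \([0,1]\), approximate the \(L^{2}\) norm by the trapezoidal rule, and use the fact that \(\operatorname{prox}_{\lambda g}=P_{C}\) is the metric projection onto the closed unit ball:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)      # uniform grid on [0, 1]
dt = t[1] - t[0]

def l2_norm(x):
    """Trapezoidal-rule approximation of the L^2([0,1]) norm."""
    y = x * x
    return np.sqrt(dt * (np.sum(y) - 0.5 * (y[0] + y[-1])))

def A(x):
    """The operator (Ax)(t) = t x(t); it is self-adjoint, so A* = A."""
    return t * x

def project_ball(x):
    """prox of the indicator of C = {x : ||x|| <= 1}, i.e. the projection P_C."""
    nrm = l2_norm(x)
    return x if nrm <= 1.0 else x / nrm

# One projected (D = I) gradient step of f(x) = 0.5 * ||Ax - u||^2 with an
# illustrative choice of u; here ||A|| = 1, so any step size in (0, 2) works.
u = np.sin(np.pi * t)
x = np.zeros_like(t)
lam = 0.5
grad = A(A(x) - u)                   # grad f(x) = A*(Ax - u), using A* = A
x_new = project_ball(x - lam * grad)
print(l2_norm(x_new))
```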
In what follows, we present a comparison of the inertial proximal gradient algorithm (IA) and the alternated inertial proximal gradient algorithm (AIA). Setting \(e(y_{n})=\frac{1}{n^{2}}\) as the outer perturbation, we report the numerical results in Table 3.
It is observed that the norm of \(x_{n}\) approaches 1 as the number of iterations increases. In this example, the alternated inertial algorithm needs fewer iterations and less running time than the inertial algorithm, but the difference between the two algorithms is not large.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Boikanyo, O.A., Morosanu, G.: Strong convergence of a proximal point algorithm with bounded error sequence. Optim. Lett. 7, 415–420 (2013)
Censor, Y., Davidi, R., Herman, G.T.: Perturbation resilience and superiorization of iterative algorithms. Inverse Probl. 26, 065008 (2010)
Davidi, R., Herman, G.T., Censor, Y.: Perturbation-resilient block-iterative projection methods with application to image reconstruction from projections. Int. Trans. Oper. Res. 16, 505–524 (2009)
Dong, Q.L., Zhao, J., He, S.N.: Bounded perturbation resilience of the viscosity algorithm. J. Inequal. Appl. 2016, 299 (2016)
Duan, P.C., Song, M.M.: General viscosity iterative approximation for solving unconstrained convex optimization problems. J. Inequal. Appl. 2015, 334 (2015)
Goebel, K., Kirk, W.A.: Topics in Metric Fixed Point Theory. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1990)
Guo, Y.N., Cui, W.: Strong convergence and bounded perturbation resilience of a modified proximal gradient algorithm. J. Inequal. Appl. 2018, 103 (2018)
He, S.N., Yang, C.P.: Solving the variational inequality problem defined on intersection of finite level sets. Abstr. Appl. Anal. 2013, Article ID 942315 (2013)
Iutzeler, F., Hendrickx, J.M.: A generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Softw. 34, 383–405 (2019)
Jin, W., Censor, Y., Jiang, M.: Bounded perturbation resilience of projected scaled gradient methods. Comput. Optim. Appl. 63, 365–392 (2016)
Alghamdi, M.A., Alghamdi, M.A., Shahzad, N., Xu, H.K.: Properties and iterative methods for the Q-lasso. Abstr. Appl. Anal. 2013, Article ID 250943 (2013)
Auslender, A.: Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Mangasarian, O.L., Meyer, R.R., Robinson, S.M. (eds.) Nonlinear Programming 3. Academic Press, New York (1978)
Marino, G., Xu, H.K.: Weak and strong convergence theorems for strict pseudo-contractions in Hilbert spaces. J. Math. Anal. Appl. 329, 336–346 (2007)
Micchelli, C.A., Shen, L.X., Xu, Y.S.: Proximity algorithms for image models: denoising. Inverse Probl. 27, 045009 (2011)
Moreau, J.J.: Propriétés des applications 'prox'. C. R. Acad. Sci. Paris Sér. A Math. 256, 1069–1071 (1963)
Mu, Z.G., Peng, Y.: A note on the inertial proximal point method. Stat. Optim. Inf. Comput. 3, 241–248 (2015)
Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc. Publications Division, New York (1987)
Shehu, Y., Gibali, A.: New inertial relaxed method for solving split feasibilities. Optim. Lett. (2020). https://doi.org/10.1007/s11590-020-01603-1
Xu, H.K.: Averaged mappings and the gradient-projection algorithm. J. Optim. Theory Appl. 150, 360–378 (2011)
Xu, H.K.: Properties and iterative methods for the lasso and its variants. Chin. Ann. Math. 35B(3), 1–18 (2014)
Xu, H.K.: Bounded perturbation resilience and superiorization techniques for the projected scaled gradient method. Inverse Probl. 33, 044008 (2017)
Yao, Z.S., Cho, S.Y., Kang, S.M., Zhu, L.J.: A regularized algorithm for the proximal split feasibility problem. Abstr. Appl. Anal. 2014, Article ID 894272 (2014)
Acknowledgements
The authors would like to thank the referee for valuable suggestions to improve the manuscript.
Funding
The authors thank the Foundation of Tianjin Key Lab for Advanced Signal Processing (2019ASP-TJ04) and the Scientific Research Project of Tianjin Municipal Education Commission (2019KJ133) for support.
Contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Duan, P., Zhang, Y. & Bu, Q. New inertial proximal gradient methods for unconstrained convex optimization problems. J Inequal Appl 2020, 255 (2020). https://doi.org/10.1186/s13660-020-02522-6
MSC
- 47H09
- 47H10
- 47J25
- 65K10
- 90C25
Keywords
- Convex optimization
- Viscosity approximation
- Proximal operator
- Inertial acceleration
- Alternated inertial acceleration