
New inertial proximal gradient methods for unconstrained convex optimization problems


The proximal gradient method is a highly powerful tool for solving the composite convex optimization problem. In this paper, firstly, we propose inexact inertial acceleration methods based on the viscosity approximation and proximal scaled gradient algorithm to accelerate the convergence of the algorithm. Under reasonable parameters, we prove that our algorithms strongly converge to some solution of the problem, which is the unique solution of a variational inequality problem. Secondly, we propose an inexact alternated inertial proximal point algorithm. Under suitable conditions, the weak convergence theorem is proved. Finally, numerical results illustrate the performances of our algorithms and present a comparison with related algorithms. Our results improve and extend the corresponding results reported by many authors recently.

1 Introduction

Let H be a real Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \) and induced norm \(\Vert \cdot \Vert \), and let C be a nonempty closed convex subset of H. Let \(\Gamma _{0}(H)\) denote the class of functions on H that are proper, convex, and lower semicontinuous. We deal with the unconstrained convex optimization problem of the following type:

$$ \min_{x\in H}f(x)+g(x), $$

where \(f,g\in \Gamma _{0}(H)\). It is often the case that f is differentiable and g is only subdifferentiable.

Problem (1.1) was first studied in 1978 in [13] and provides a natural framework for treating various generic optimization models in a unified way. In recent years, many researchers have proposed algorithms for solving problem (1.1) and have established weak and strong convergence results; see, for example, [1, 6, 12, 23, 25]. Many important optimization problems can be cast in this form. See, for instance, [23], where the author studied the properties of, and iterative methods for, the lasso as a special case of (1.1); because the \(l_{1}\) norm promotes sparsity, good results can be obtained for the corresponding problem.

The following proposition is very useful for constructing the iterative algorithms.

Proposition 1.1

(see [23])

Let \(f,g\in \Gamma _{0}(H)\). Let \(x^{*}\in H\) and \(\lambda >0\). Assume that f is finite-valued and differentiable on H. Then \(x^{*}\) is a solution to (1.1) if and only if \(x^{*}\) solves the fixed point equation

$$ x^{*}=\bigl(\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\bigr)x^{*}. $$
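The fixed point characterization can be checked numerically on a one-dimensional toy problem. The sketch below is only an illustration and is not taken from the paper: it assumes \(f(x)=\frac{1}{2}(x-3)^{2}\) and \(g(x)=\vert x\vert \), whose sum is minimized at \(x^{*}=2\), and all function names are hypothetical.

```python
def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |.| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def grad_f(x: float) -> float:
    """Gradient of the illustrative smooth term f(x) = 0.5 * (x - 3)^2."""
    return x - 3.0


def fixed_point_residual(x: float, lam: float) -> float:
    """|x - prox_{lam g}(x - lam * grad f(x))| with g = |.|."""
    return abs(x - soft_threshold(x - lam * grad_f(x), lam))


x_star = 2.0  # minimizer of 0.5 * (x - 3)^2 + |x|
# The residual vanishes at the solution for every step size lam > 0 ...
residuals_at_solution = [fixed_point_residual(x_star, lam) for lam in (0.1, 0.5, 1.5)]
# ... but not at a point that is not a solution.
residual_elsewhere = fixed_point_residual(1.0, 0.5)
```

The residual is zero at \(x^{*}\) for each tested \(\lambda \), in line with the fact that the characterization holds for every \(\lambda >0\).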

On the other hand, errors are often produced in the course of computation, so it is an important property of an algorithm that its iterates still converge under summable errors. Many authors have studied algorithms with perturbations and their convergence; some related results can be found in [35]. In 2011, Boikanyo and Morosanu [2] introduced a proximal point algorithm with an error sequence. Under a summability condition on the errors and some additional conditions on the parameters, they obtained a strong convergence theorem.

In 2016, Jin, Censor, and Jiang [11] presented the projected scaled gradient (PSG) method with bounded perturbations in a finite dimensional setting for solving the following minimization problem:

$$ \min_{x\in C}f(x), $$

where f is a continuously differentiable, convex function. More precisely, the method generates a sequence according to

$$ x_{n+1}=P_{C}\bigl(x_{n}-\lambda _{n}D(x_{n})\nabla f(x_{n})+e(x_{n}) \bigr),\quad n\geq 0,$$

and converges to a solution of problem (1.3) under suitable conditions, where \(D(x_{n})\) is a diagonal scaling matrix.

In 2017, Xu [24] extended the method to the infinite dimensional setting and applied superiorization techniques to the relaxed PSG method. The following iterative step was introduced:

$$ x_{n+1}=(1-\tau _{n})x_{n}+\tau _{n}P_{C}\bigl(x_{n}-\gamma _{n}D(x_{n}) \nabla f(x_{n})+e(x_{n})\bigr),\quad n\geq 0,$$

where \(\tau _{n}\in [0,1]\). The weak convergence theorem was obtained in [24].

Quite recently, Guo and Cui [8] considered the modified proximal gradient method:

$$ x_{n+1}=\alpha _{n} h(x_{n})+(1- \alpha _{n})\operatorname{prox}_{\lambda _{n} g}(I- \lambda _{n} \nabla f) (x_{n})+e(x_{n}),\quad n\geq 0,$$

where h is a contractive mapping. The algorithm converges strongly to a solution of problem (1.1).

To accelerate the convergence of iteration methods, Polyak [19] introduced the following algorithm that can speed up gradient descent:

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=y_{n}-\lambda _{n}\nabla F(x_{n}). \end{cases} $$

This modification was made immensely popular by Nesterov’s accelerated gradient algorithm [18]. In general, an inertial iteration for an operator \(\mathbf{P}\) reads

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\mathbf{P}(y_{n}). \end{cases} $$
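The general inertial scheme can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the inertial parameter is held constant (the text allows a sequence \(\delta _{n}\)), the operator is a hypothetical affine contraction on the real line, and the function name `inertial_iterate` is not from the paper.

```python
from typing import Callable


def inertial_iterate(
    P: Callable[[float], float],
    x0: float,
    x1: float,
    delta: float,
    n_steps: int,
) -> float:
    """Run y_n = x_n + delta * (x_n - x_{n-1}), x_{n+1} = P(y_n)."""
    x_prev, x = x0, x1
    for _ in range(n_steps):
        y = x + delta * (x - x_prev)   # extrapolation (inertial) step
        x_prev, x = x, P(y)            # apply the operator at y, not at x
    return x


# Hypothetical operator: an affine contraction whose fixed point is 2.
limit = inertial_iterate(lambda y: 0.5 * y + 1.0, x0=0.0, x1=0.0, delta=0.3, n_steps=100)
```

Setting `delta=0.0` recovers the plain iteration \(x_{n+1}=\mathbf{P}(x_{n})\), which makes it easy to compare the two on a given operator.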

In 2009, Beck and Teboulle [1] proposed a fast iterative shrinkage-thresholding algorithm for linear inverse problems. With the inertial technique, the iteration operator is applied not at the previous point \(x_{n-1}\), but at the point \(y_{n}\), which is a very specific linear combination of the previous two points \(\{x_{n-1},x_{n-2}\}\). As a result, the convergence of the algorithm is greatly accelerated.

In 2015, for solving the maximal monotone inclusion problem, Mu and Peng [17] introduced alternated inertial proximal point iterates as follows:

$$ x_{n+1}=J_{\lambda T}(y_{n}), $$

where \(y_{n}\) is defined as

$$ y_{n}= \textstyle\begin{cases} x_{n}+\delta _{n}(x_{n}-x_{n-1}), &n=\mathit{odd}, \\ x_{n}, &n=\mathit{even}. \end{cases} $$

In equation (1.9), T is a set-valued maximal monotone operator and \(\lambda >0\). This alternated form is much less common than general inertia; however, it has good convergence properties and performance.

In 2017, Iutzeler and Hendrickx [10] proposed a generic acceleration for optimization algorithms via relaxation and inertia; they also used alternated inertial acceleration in their algorithm. They obtained the convergence of the iterative sequence under suitable assumptions.

Very recently, Shehu and Gibali [21] studied a new alternated inertial procedure for solving split feasibility problems. Under some mild assumptions, they showed that the sequence converges strongly.

In this paper, mainly inspired and motivated by the above works, we introduce several iterative algorithms. Firstly, we combine the contractive mapping and proximal operator to propose an inertial acceleration proximal gradient method with errors for solving problem (1.1). Under more general and flexible conditions, we prove that the sequence converges strongly. Further, we extend the algorithm to a more generalized viscosity inertial acceleration method. Secondly, we propose a kind of alternating inertial proximal point algorithm with errors to solve problem (1.1), then we prove that the sequence converges weakly under appropriate conditions. Finally, we present several numerical examples to illustrate the effectiveness of our iterative schemes.

2 Preliminaries

We start by recalling some lemmas, definitions, and propositions needed in the proof of the main results.

Recall that, given a nonempty closed convex subset C of a real Hilbert space H, for any \(x\in H\) there exists a unique nearest point in C, denoted by \(P_{C}x\), such that

$$ \Vert x-P_{C}x \Vert \leq \Vert x-y \Vert , \quad \forall y \in C. $$

The mapping \(P_{C}\) is called the metric projection of H onto C.

Lemma 2.1

(see [14])

Let C be a nonempty closed convex subset of a real Hilbert space H. Given \(x\in H\) and \(y\in C\), we have \(y=P_{C}x\) if and only if

$$ \langle x-y,y-z\rangle \geq 0, \quad \forall z\in C. $$
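The characterization in Lemma 2.1 can be illustrated in \(\mathbb{R}^{3}\). The sketch below assumes C is the box \([0,1]^{3}\), for which the projection is coordinate-wise clipping, and tests the inequality on a finite grid of points of C; the helper names are hypothetical, and a grid check is of course an illustration rather than a proof.

```python
from itertools import product
from typing import List, Sequence


def project_box(x: Sequence[float], lo: float, hi: float) -> List[float]:
    """Metric projection of x onto the box C = [lo, hi]^d (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


x = [2.0, -0.5, 0.3]
y = project_box(x, 0.0, 1.0)          # clipping gives [1.0, 0.0, 0.3]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]    # sample points z in C
# Smallest value of <x - y, y - z> over the sampled z; Lemma 2.1 says it is >= 0.
min_inner = min(
    inner([xi - yi for xi, yi in zip(x, y)], [yi - zi for yi, zi in zip(y, z)])
    for z in product(grid, repeat=3)
)
```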

Lemma 2.2

Let H be a real Hilbert space, the following statements hold:

  1. (i)

    \(\Vert x+y\Vert ^{2}=\Vert x\Vert ^{2}+2\langle x,y\rangle +\Vert y\Vert ^{2}\), \(\forall x,y \in H \).

  2. (ii)

    \(\Vert x+y\Vert ^{2}\leq \Vert x\Vert ^{2}+2\langle x+y,y\rangle \), \(\forall x,y \in H \).

  3. (iii)

    \(\Vert \alpha x+(1-\alpha )y\Vert ^{2}=\alpha \Vert x\Vert ^{2}+(1-\alpha )\Vert y\Vert ^{2}- \alpha (1-\alpha )\Vert x-y\Vert ^{2}\) for all \(\alpha \in \mathbb{R}\) and \(x,y\in H \).
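Item (ii) is a direct consequence of item (i). For the reader's convenience, a short derivation (added here as a worked step, not part of the original statement) reads:

$$ \Vert x+y \Vert ^{2}= \Vert x \Vert ^{2}+2\langle x,y\rangle + \Vert y \Vert ^{2} = \Vert x \Vert ^{2}+2\langle x+y,y\rangle - \Vert y \Vert ^{2} \leq \Vert x \Vert ^{2}+2\langle x+y,y\rangle . $$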

Definition 2.3

A mapping \(F:H\rightarrow H\) is said to be

  1. (i)

    Lipschitzian if there exists a positive constant L such that

    $$ \Vert F x-Fy \Vert \leq L \Vert x-y \Vert ,\quad \forall x,y\in H. $$

    In particular, if \(L=1\), F is called nonexpansive. If \(L\in [0,1)\), F is called contractive.

  2. (ii)

    an α-averaged mapping (α-av for short) if

    $$ F=(1-\alpha )I+\alpha T, $$

    where \(\alpha \in (0,1)\) and \(T:H\rightarrow H\) is nonexpansive.

Proposition 2.4


  1. (i)

    If \(T_{1}, T_{2},\ldots, T_{n} \) are averaged mappings, then we can get that \(T_{n}T_{n-1}\cdots T_{1}\) is averaged. In particular, if \(T_{i}\) is \(\alpha _{i}\)-av for each \(i=1,2\), where \(\alpha _{i} \in (0,1)\), then \(T_{2}T_{1}\) is \((\alpha _{2}+\alpha _{1}-\alpha _{2}\alpha _{1})\)-av.

  2. (ii)

    If the mappings \(\{T_{i}\}^{N}_{i=1}\) are averaged and have a common fixed point, then we have

    $$ \bigcap^{N}_{i=1}\operatorname{Fix}(T_{i})= \operatorname{Fix}(T_{1}\cdots T_{N}). $$

    Here, the notation \(\operatorname{Fix}(T)\) denotes the set of fixed points of the mapping T; that is, \(\operatorname{Fix}(T) := \{x \in H : Tx = x\}\).

  3. (iii)

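    If T is ν-ism (ν-inverse strongly monotone, that is, \(\langle x-y,Tx-Ty\rangle \geq \nu \Vert Tx-Ty\Vert ^{2}\) for all \(x,y\in H\)), then, for any \(\tau >0\), τT is \(\frac{\nu }{\tau }\)-ism.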

  4. (iv)

    T is averaged if and only if \(I-T\) is ν-ism for some \(\nu >\frac{1}{2}\). Indeed, for any \(0<\alpha <1\), T is α-averaged if and only if \(I-T\) is \(\frac{1}{2\alpha }\)-ism.
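Items (i), (iii), and (iv) can be combined to obtain the averagedness constant of the operator \(\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\) used in Sect. 3. The sketch below relies on two standard facts that are assumptions here, since they are not stated in this section: \(\operatorname{prox}_{\lambda g}\) is firmly nonexpansive, hence \(\frac{1}{2}\)-av, and \(\nabla f\) is \(\frac{1}{L}\)-ism whenever it is L-Lipschitzian (Baillon–Haddad theorem). For \(0<\lambda <\frac{2}{L}\):

$$\begin{aligned} &\nabla f \text{ is } \tfrac{1}{L}\text{-ism} \ \Longrightarrow \ \lambda \nabla f \text{ is } \tfrac{1}{\lambda L}\text{-ism} \ \Longrightarrow \ I-\lambda \nabla f \text{ is } \tfrac{\lambda L}{2}\text{-av}, \\ &\operatorname{prox}_{\lambda g}(I-\lambda \nabla f) \text{ is } \biggl(\frac{1}{2}+\frac{\lambda L}{2}-\frac{1}{2}\cdot \frac{\lambda L}{2}\biggr)\text{-av}, \quad \text{that is, } \frac{2+\lambda L}{4}\text{-av}. \end{aligned}$$

The first line uses (iii) with \(\tau =\lambda \) and then (iv) with \(\frac{1}{2\alpha }=\frac{1}{\lambda L}\); the second line uses (i) with \(\alpha _{2}=\frac{1}{2}\) and \(\alpha _{1}=\frac{\lambda L}{2}\).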

Definition 2.5

(see [16])

The proximal operator of \(\varphi \in \Gamma _{0}(H)\) is defined by

$$ \operatorname{prox}_{\varphi }(x)=\arg \min_{\nu \in H}\biggl\{ \varphi (\nu )+\frac{1}{2} \Vert \nu -x \Vert ^{2}\biggr\} , \quad x\in H. $$

The proximal operator of φ of order \(\lambda >0\) is defined as the proximal operator of λφ, that is,

$$ \operatorname{prox}_{\lambda \varphi }(x)=\arg \min_{\nu \in H}\biggl\{ \varphi (\nu )+ \frac{1}{2\lambda } \Vert \nu -x \Vert ^{2}\biggr\} , \quad x\in H . $$
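As a concrete instance of Definition 2.5, take \(\varphi =\vert \cdot \vert \) on the real line, for which \(\operatorname{prox}_{\lambda \varphi }\) is the well-known soft-thresholding map. The sketch below compares this closed form with a brute-force minimization of \(\varphi (\nu )+\frac{1}{2\lambda }(\nu -x)^{2}\) on a grid; the grid bounds and resolution are arbitrary choices made for illustration.

```python
def soft_threshold(x: float, lam: float) -> float:
    """Closed form of prox_{lam * |.|}(x)."""
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)


def prox_brute_force(x: float, lam: float, half_width: float = 5.0, n: int = 100_001) -> float:
    """Approximate argmin_v { |v| + (v - x)^2 / (2 * lam) } on a uniform grid."""
    best_v, best_val = 0.0, float("inf")
    for i in range(n):
        v = -half_width + 2.0 * half_width * i / (n - 1)
        val = abs(v) + (v - x) ** 2 / (2.0 * lam)
        if val < best_val:
            best_v, best_val = v, val
    return best_v


# (x, lam) pairs covering both the shrinking and the thresholding-to-zero regimes.
cases = [(3.0, 1.0), (-0.4, 0.5), (1.2, 2.0), (-2.5, 0.7)]
max_gap = max(abs(soft_threshold(x, lam) - prox_brute_force(x, lam)) for x, lam in cases)
```

The two values agree up to the grid spacing, which is \(10^{-4}\) with the defaults above.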

Lemma 2.6

The proximal identity

$$ \operatorname{prox}_{\lambda \varphi }x=\operatorname{prox}_{\mu \varphi } \biggl(\frac{\mu }{\lambda }x+\biggl(1- \frac{\mu }{\lambda }\biggr) \operatorname{prox}_{\lambda \varphi }x\biggr) $$

holds for \(\varphi \in \Gamma _{0}(H)\), \(\lambda >0 \) and \(\mu >0\).
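The proximal identity can be verified numerically for a specific function. The sketch below assumes \(\varphi =\vert \cdot \vert \) on the real line, so that the proximal operator is soft-thresholding, and evaluates the gap between the two sides of the identity for several triples \((x,\lambda ,\mu )\); the function names are hypothetical.

```python
def prox_abs(x: float, lam: float) -> float:
    """prox_{lam * |.|}(x), i.e. soft-thresholding with threshold lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def identity_gap(x: float, lam: float, mu: float) -> float:
    """Gap between prox_{lam phi} x and prox_{mu phi}((mu/lam) x + (1 - mu/lam) prox_{lam phi} x)."""
    p = prox_abs(x, lam)
    inner_point = (mu / lam) * x + (1.0 - mu / lam) * p
    return abs(p - prox_abs(inner_point, mu))


# Both mu < lam and mu > lam are covered, since the lemma allows any lam, mu > 0.
gaps = [
    identity_gap(x, lam, mu)
    for x in (-3.0, -0.2, 0.0, 0.7, 4.5)
    for lam in (0.5, 1.0, 2.0)
    for mu in (0.3, 1.0, 3.0)
]
```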

Lemma 2.7

(Demiclosedness principle, see [7])

Let H be a real Hilbert space, and let \(T:H\rightarrow H \) be a nonexpansive mapping with \(\operatorname{Fix}(T)\neq \emptyset \). If \(\{x_{n}\}\) is a sequence in H weakly converging to x and if \(\{(I-T)x_{n}\}\) converges strongly to y, then \((I-T)x=y\); in particular, if \(y=0\), then \(x\in \operatorname{Fix}(T)\).

Lemma 2.8

(see [9])

Assume that \(\{s_{n}\}\) is a sequence of nonnegative real numbers such that

$$\begin{aligned}& s_{n+1}\leq (1-\gamma _{n})s_{n}+\gamma _{n}\mu _{n}, \quad n\geq 0, \\& s_{n+1}\leq s_{n}-\eta _{n}+\varphi _{n}, \quad n\geq 0, \end{aligned}$$

where \(\{\gamma _{n}\}\) is a sequence in \((0,1)\), \(\{\eta _{n}\}\) is a sequence of nonnegative real numbers and \(\{\mu _{n}\}\) and \(\{\varphi _{n}\}\) are two sequences in \(\mathbb{R}\) such that

  1. (i)

    \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \),

  2. (ii)

    \(\lim_{n\rightarrow \infty }\varphi _{n}=0\),

  3. (iii)

    \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\) for any subsequence \(\{n_{k}\}\subset \{n\}\).

Then \(\lim_{n\rightarrow \infty }s_{n}=0\).

Lemma 2.9

(see [7])

Let C be a nonempty closed convex subset of a real Hilbert space H. Let \(\{x_{n}\}\) be a sequence in H satisfying the properties:

  1. (i)

    \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists for each \(z\in C\),

  2. (ii)

    \(\omega _{w}(x_{n})\subset C\), where \(\omega _{w}(x_{n}):=\{x : \exists x_{n_{j}}\rightharpoonup x\}\) (\(\{x_{n_{j}} \}\) is a subsequence of \(\{x_{n}\}\)) denotes the weak ω-limit set of \(\{x_{n}\}\).

Then \(\{x_{n} \}\) converges weakly to a point in C.

Lemma 2.10

(see [20])

Let \(\{s_{n}\}\) be a sequence of nonnegative numbers satisfying the generalized nonincreasing property

$$ s_{n+1}\leq s_{n}+\sigma _{n},\quad n\geq 0, $$

where \(\{\sigma _{n}\}\) is a sequence of nonnegative numbers such that \(\sum_{n=0}^{\infty }\sigma _{n}<\infty \). Then \(\{s_{n}\}\) is bounded and \(\lim_{n\rightarrow \infty }s_{n} \) exists.

3 Main results

3.1 Inertial proximal gradient algorithm

In this section, we use a viscosity iterative method to approximate the unique solution of the following variational inequality problem (VIP for short):

$$ \bigl\langle (I-h)x^{*},\tilde{x}-x^{*}\bigr\rangle \geq 0,\quad \forall \tilde{x}\in \operatorname{Fix}(V_{\lambda }), $$

where \(h: H\rightarrow H\) is ρ-contractive and \(V_{\lambda }\) is nonexpansive.

We propose an inertial acceleration algorithm.

Algorithm 1

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\alpha _{n}h(y_{n})+(1- \alpha _{n}) \bigl(\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}- \lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr)\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n= n+1\) and go to 2.
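The steps of Algorithm 1 can be sketched in code. The example below is an illustration under several assumptions that are not prescribed by the algorithm: \(D=I\) and \(e\equiv 0\) (no scaling, no errors), \(g=\Vert \cdot \Vert _{1}\), \(\alpha _{n}=\frac{1}{n+1}\), and an inertial parameter capped so that \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert \leq \frac{1}{n^{2}}\). The test problem \(f(x)=\frac{1}{2}\Vert x-b\Vert ^{2}\) and the names `algorithm1`, `delta`, and `max_iter` are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """prox of t * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def norm(v: Vector) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def algorithm1(
    grad_f: Callable[[Vector], Vector],
    h: Callable[[Vector], Vector],
    x0: Vector,
    x1: Vector,
    lam: float,
    delta: float = 0.3,
    eps: float = 1e-12,
    max_iter: int = 5000,
) -> Vector:
    """Sketch of Algorithm 1 with D = I, e = 0, g = ||.||_1 and alpha_n = 1/(n+1)."""
    x_prev, x = x0, x1
    for n in range(1, max_iter + 1):
        alpha = 1.0 / (n + 1)
        step = norm([a - b for a, b in zip(x, x_prev)])
        # Assumed rule: cap delta_n so that delta_n * ||x_n - x_{n-1}|| <= 1/n^2.
        delta_n = delta if step == 0.0 else min(delta, 1.0 / (n * n * step))
        y = [a + delta_n * (a - b) for a, b in zip(x, x_prev)]        # step 2
        forward = [yi - lam * gi for yi, gi in zip(y, grad_f(y))]     # gradient step
        backward = soft_threshold(forward, lam)                        # proximal step
        x_next = [alpha * hi + (1.0 - alpha) * bi for hi, bi in zip(h(y), backward)]
        x_prev, x = x, x_next
        if norm([a - b for a, b in zip(x, x_prev)]) < eps:             # step 4
            break
    return x


b = [3.0, -0.5, 1.5]                        # f(x) = 0.5 * ||x - b||^2, so L = 1
x_approx = algorithm1(
    grad_f=lambda x: [xi - bi for xi, bi in zip(x, b)],
    h=lambda x: [0.5 * xi for xi in x],     # a 0.5-contraction
    x0=[0.0, 0.0, 0.0],
    x1=[0.0, 0.0, 0.0],
    lam=0.5,
)
```

For this test problem the minimizer of \(f+g\) is the soft-thresholding of b at level 1, that is, \((2,0,0.5)\). Because \(\alpha _{n}\rightarrow 0\) only slowly, the iterates approach this point at a rate comparable to \(\alpha _{n}\), so the stopping test is backed by an iteration cap.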

Rewrite iteration (3.3) as follows:

$$\begin{aligned} x_{n+1}&=\alpha _{n}h(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+\hat{e}_{n}\bigr) \\ &=\alpha _{n}h(y_{n})+(1-\alpha _{n}) \bigl( \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n} \nabla f(y_{n})\bigr)+\tilde{e}_{n}\bigr), \end{aligned}$$

where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and

$$ \tilde{e}_{n}=\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+ \hat{e}_{n}\bigr)- \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n} \nabla f(y_{n})\bigr). $$

Since \(\operatorname{prox}_{\lambda _{n}g}\) is nonexpansive, \(\Vert \tilde{e}_{n}\Vert \leq \Vert \hat{e}_{n}\Vert \leq \Vert e(y_{n})\Vert +\lambda _{n} \Vert \theta (y_{n})\Vert \). Because \(\{\lambda _{n}\}\) is bounded, conditions (iii)–(iv) of Theorem 3.1 then give \(\sum_{n=0}^{\infty }\Vert \tilde{e}_{n}\Vert <\infty \). We use S to denote the solution set of problem (1.1).
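The error bound above can be illustrated on a single step. In the sketch below, all numerical values (the point y, the gradient, the diagonal of D, and the perturbation e) are made up for illustration, and \(g=\Vert \cdot \Vert _{1}\) is assumed so that the proximal operator is soft-thresholding.

```python
import math
from typing import List

Vector = List[float]


def soft_threshold(v: Vector, t: float) -> Vector:
    """prox of t * ||.||_1 (a nonexpansive map), applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def norm(v: Vector) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


lam = 0.5
y = [1.0, -2.0, 0.25]
grad = [0.4, -1.0, 2.0]            # stands in for grad f(y)
d = [0.9, 1.1, 1.0]                # hypothetical diagonal of D(y)
e = [0.01, -0.02, 0.0]             # hypothetical perturbation e(y)

# theta(y) = grad f(y) - D(y) grad f(y) and e_hat = lam * theta(y) + e(y)
theta = [g - di * g for g, di in zip(grad, d)]
e_hat = [lam * th + ei for th, ei in zip(theta, e)]

# Unperturbed and perturbed proximal gradient steps from the same point y.
exact = soft_threshold([yi - lam * g for yi, g in zip(y, grad)], lam)
scaled = soft_threshold(
    [yi - lam * di * g + ei for yi, di, g, ei in zip(y, d, grad, e)], lam
)
e_tilde = [s - x for s, x in zip(scaled, exact)]
```

Here `norm(e_tilde)` does not exceed `norm(e_hat)`, which in turn does not exceed `norm(e) + lam * norm(theta)`, matching the chain of inequalities in the text.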

Theorem 3.1

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Let h be a ρ-contractive self-map of H with \(0\leq \rho <1\), and let \(\nabla f\) be L-Lipschitzian. Assume that D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 1, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that

  1. (i)

    \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);

  2. (ii)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  5. (v)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).


Proof. We divide the proof into several steps.

Step 1. Show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),

$$\begin{aligned} \Vert y_{n}-z \Vert &= \bigl\Vert x_{n}+\delta _{n}(x_{n}-x_{n-1})-z \bigr\Vert \\ &\leq \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert . \end{aligned}$$

Put \(V_{\lambda _{n}}:=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)\). From (3.4) and (3.5), we have

$$\begin{aligned} & \Vert x_{n+1}-z \Vert \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert \\ &\quad = \bigl\Vert \alpha _{n}\bigl(h(y_{n})-z \bigr)+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}-z)+(1- \alpha _{n})\tilde{e}_{n} \bigr\Vert \\ &\quad \leq \alpha _{n} \bigl\Vert h(y_{n})-h(z) \bigr\Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert +(1-\alpha _{n}) \Vert V_{\lambda _{n}}y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \alpha _{n}\rho \Vert y_{n}-z \Vert + \alpha _{n} \bigl\Vert h(z)-z \bigr\Vert +(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad =\bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert y_{n}-z \Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert +\alpha _{n} \bigl\Vert h(z)-z \bigr\Vert + \Vert \tilde{e}_{n} \Vert \\ &\quad =\bigl(1-\alpha _{n}(1-\rho )\bigr) \Vert x_{n}-z \Vert +\alpha _{n}(1-\rho ) \frac{ \Vert h(z)-z \Vert +(\delta _{n} \Vert x_{n}-x_{n-1} \Vert + \Vert \tilde{e}_{n} \Vert )/\alpha _{n}}{1-\rho }. \end{aligned}$$

From conditions (iii)–(v) and \(\alpha _{n}>0\), we get \(\{(\delta _{n}\Vert x_{n}-x_{n-1}\Vert +\Vert \tilde{e}_{n}\Vert )/\alpha _{n}\}\) is bounded. Thus there exists some \(M_{1}>0\) such that

$$ M_{1}\geq \sup \bigl\{ \bigl\Vert h(z)-z \bigr\Vert +\bigl(\delta _{n} \Vert x_{n}-x_{n-1} \Vert + \Vert \tilde{e}_{n} \Vert \bigr)/\alpha _{n}\bigr\} $$

for all \(n\geq 0\). Then the mathematical induction implies that

$$ \Vert x_{n}-z \Vert \leq \max \biggl\{ \Vert x_{0}-z \Vert , \frac{M_{1}}{1-\rho }\biggr\} . $$

Therefore, the sequence \(\{x_{n}\}\) is bounded and so are \(\{y_{n}\}\), \(\{h(y_{n})\}\), and \(\{V_{\lambda _{n}}y_{n}\}\).

Step 2. Show that \(\lim_{k\rightarrow \infty }\eta _{n_{k}}=0\) implies

$$ \lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert =0 $$

for any subsequence \(\{n_{k}\}\subset \{n\}\). First, fix \(z\in S\). We have

$$\begin{aligned} \Vert y_{n}-z \Vert ^{2}&= \bigl\Vert x_{n}+\delta _{n}(x_{n}-x_{n-1})-z \bigr\Vert ^{2} \\ &\leq \Vert x_{n}-z \Vert ^{2}+2\bigl\langle x_{n}-z+\delta _{n}(x_{n}-x_{n-1}), \delta _{n}(x_{n}-x_{n-1})\bigr\rangle \\ &\leq \Vert x_{n}-z \Vert ^{2}+2\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert + \delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr). \end{aligned}$$

Then from (3.4) we get

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert ^{2} \\ &\quad \leq \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n})V_{\lambda _{n}}y_{n}-z \bigr\Vert ^{2} +2(1-\alpha _{n})\bigl\langle \alpha _{n}h(y_{n})+(1- \alpha _{n})V_{ \lambda _{n}}y_{n}-z,\tilde{e}_{n} \bigr\rangle \\ &\quad \quad {}+ \Vert \tilde{e}_{n} \Vert ^{2} \\ &\quad \leq \alpha _{n}^{2} \bigl\Vert h(y_{n})-z \bigr\Vert ^{2}+(1-\alpha _{n})^{2} \Vert V_{ \lambda _{n}}y_{n}-z \Vert ^{2}+2\alpha _{n}(1-\alpha _{n})\bigl\langle h(y_{n})-z,V_{ \lambda _{n}}y_{n}-z \bigr\rangle \\ &\quad \quad {} +\bigl(2\alpha _{n} \bigl\Vert h(y_{n})-z \bigr\Vert +2(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \bigr) \Vert \tilde{e}_{n} \Vert \\ &\quad \leq 2\alpha _{n}^{2}\bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert ^{2}+ \bigl\Vert h(z)-z \bigr\Vert ^{2}\bigr)+(1- \alpha _{n})^{2} \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} + 2\alpha _{n}(1-\alpha _{n})\bigl\langle h(y_{n})-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq 2\alpha _{n}^{2}\bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert ^{2}+ \bigl\Vert h(z)-z \bigr\Vert ^{2}\bigr)+(1- \alpha _{n})^{2} \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} + 2\alpha _{n}(1-\alpha _{n}) \bigl( \bigl\Vert h(y_{n})-h(z) \bigr\Vert \Vert y_{n}-z \Vert + \bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z\bigr\rangle \bigr)+M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \bigl(1-\alpha _{n}\bigl(2-\alpha _{n} \bigl(1+2\rho ^{2}\bigr)-2(1-\alpha _{n}) \rho \bigr)\bigr) \Vert y_{n}-z \Vert ^{2} \\ &\quad \quad {} +2\alpha _{n}(1-\alpha _{n})\bigl\langle h(z)-z,V_{ \lambda _{n}}y_{n}-z\bigr\rangle +2\alpha _{n}^{2} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert , \end{aligned}$$

where \(M_{2}\) is some constant such that

$$ M_{2}\geq \sup \bigl\{ 2\alpha _{n} \bigl\Vert h(y_{n})-z \bigr\Vert +2(1-\alpha _{n}) \Vert y_{n}-z \Vert + \Vert \tilde{e}_{n} \Vert \bigr\} . $$

Put \(\gamma _{n}:=\alpha _{n}(2-\alpha _{n}(1+2\rho ^{2})-2(1-\alpha _{n}) \rho )\). Using (3.4) and (3.7), we deduce that

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad \leq (1-\gamma _{n}) \Vert x_{n}-z \Vert ^{2}+2\delta _{n}(1-\gamma _{n}) \Vert x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr) \\ &\quad\quad {} +2\alpha _{n}(1-\alpha _{n})\bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle +2\alpha _{n}^{2} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

Secondly, since \(V_{\lambda _{n}} \) is \(\frac{2+\lambda _{n}L}{4} \)-av by Proposition 2.4, we can rewrite

$$\begin{aligned} V_{\lambda _{n}}=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)=(1-w_{n})I+w_{n}T_{n}, \end{aligned}$$

where \(w_{n}=\frac{2+\lambda _{n}L}{4}\), \(T_{n}\) is nonexpansive and, by condition (ii), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.4), (3.8), and (3.10), we obtain

$$\begin{aligned} & \Vert x_{n+1}-z \Vert ^{2} \\ &\quad = \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n}) (V_{\lambda _{n}}y_{n}+ \tilde{e}_{n})-z \bigr\Vert ^{2} \\ &\quad \leq \bigl\Vert \alpha _{n}h(y_{n})+(1-\alpha _{n})V_{\lambda _{n}}y_{n}-z \bigr\Vert ^{2}+M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \bigl\Vert V_{\lambda _{n}}y_{n}-z+\alpha _{n} \bigl(h(y_{n})-V_{\lambda _{n}}y_{n}\bigr) \bigr\Vert ^{2} +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \Vert V_{\lambda _{n}}y_{n}-z \Vert ^{2}+{ \alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\Vert ^{2}+2\alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad = \bigl\Vert (1-w_{n})y_{n}+w_{n}T_{n}y_{n}-z \bigr\Vert ^{2}+{\alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {} +2\alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{ \lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad =(1-w_{n}) \Vert y_{n}-z \Vert ^{2}+w_{n} \Vert T_{n}y_{n}-T_{n}z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2} \\ &\quad \quad {} +{\alpha _{n}}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2}+2 \alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \Vert y_{n}-z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}+ \alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {}+2\alpha _{n}\bigl\langle V_{ \lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert \\ &\quad \leq \Vert x_{n}-z \Vert ^{2}-w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}+ \alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2} \\ &\quad \quad {}+2\alpha _{n}\bigl\langle V_{ \lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle \\ & \quad \quad {} +2\delta _{n} \Vert 
x_{n}-x_{n-1} \Vert \bigl( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert \bigr)+M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

For each \(n\geq 0\), set


$$\begin{aligned}& s_{n}= \Vert x_{n}-z \Vert ^{2},\quad\quad \eta _{n}=w_{n}(1-w_{n}) \Vert T_{n}y_{n}-y_{n} \Vert ^{2}, \\& \begin{aligned} \mu _{n}={}&\frac{1}{2-\alpha _{n}(1+2{\rho }^{2})-2(1-\alpha _{n})\rho }\biggl(2{ \alpha _{n}} \bigl\Vert h(z)-z \bigr\Vert ^{2}+M_{2} \frac{ \Vert \tilde{e}_{n} \Vert }{\alpha _{n}} \\ &{}+ \frac{2\delta _{n} \Vert x_{n}-x_{n-1} \Vert ( \Vert x_{n}-z \Vert +\delta _{n} \Vert x_{n}-x_{n-1} \Vert )}{\alpha _{n}} \\ &{} +2(1-\alpha _{n})\bigl\langle h(z)-z,V_{\lambda _{n}}y_{n}-z \bigr\rangle \biggr), \end{aligned} \\& \varphi _{n}=\alpha _{n}^{2} \bigl\Vert h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\Vert ^{2}+2 \alpha _{n}\bigl\langle V_{\lambda _{n}}y_{n}-z,h(y_{n})-V_{\lambda _{n}}y_{n} \bigr\rangle +M_{2} \Vert \tilde{e}_{n} \Vert . \end{aligned}$$

Since \(\sum_{n=0}^{\infty }\gamma _{n}=\infty \) and \(\varphi _{n}\rightarrow 0\) hold obviously, in order to complete the proof by using Lemma 2.8, it suffices to verify that \(\eta _{n_{k}}\rightarrow 0\) (\(k\rightarrow \infty \)) implies

$$ \limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0 $$

for any subsequence \(\{n_{k}\}\subset \{n\}\).

Indeed, since \(w_{n_{k}}(1-w_{n_{k}})\) is bounded away from zero, \(\eta _{n_{k}}\rightarrow 0\) as \(k\rightarrow \infty \) implies \(\Vert T_{n_{k}}y_{n_{k}}-y_{n_{k}}\Vert \rightarrow 0\). From (3.10), we have

$$\begin{aligned} \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert =w_{n_{k}} \Vert y_{n_{k}}-T_{n_{k}}y_{n_{k}} \Vert \rightarrow 0. \end{aligned}$$

Due to condition (v), it follows that

$$\begin{aligned} \Vert y_{n_{k}}-x_{n_{k}} \Vert =\delta _{n_{k}} \Vert x_{n_{k}}-x_{n_{k}-1} \Vert \rightarrow 0. \end{aligned}$$

Thus, we have

$$\begin{aligned} &\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad =\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-y_{n_{k}}+y_{n_{k}}-V_{ \lambda _{n_{k}}}y_{n_{k}}+V_{\lambda _{n_{k}}}y_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad \leq \lim_{k\rightarrow \infty } \Vert x_{n_{k}}-y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert V_{\lambda _{n_{k}}}y_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert \\ &\quad \leq \lim_{k\rightarrow \infty }2 \Vert x_{n_{k}}-y_{n_{k}} \Vert +\lim_{k \rightarrow \infty } \Vert y_{n_{k}}-V_{\lambda _{n_{k}}}y_{n_{k}} \Vert . \end{aligned}$$

It follows from (3.12) and (3.13) that

$$\begin{aligned} &\lim_{k\rightarrow \infty } \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert =0. \end{aligned}$$

Step 3. Show that

$$\begin{aligned} \omega _{w}(x_{n_{k}})\subset S. \end{aligned}$$

Take \(\tilde{x} \in \omega _{w}(x_{n_{k}})\) and assume that \(\{x_{n_{k_{j}}}\}\) is a subsequence of \(\{x_{n_{k}}\}\) weakly converging to \(\tilde{x}\). Without loss of generality, we still use \(\{x_{n_{k}}\}\) to denote \(\{x_{n_{k_{j}}}\}\). Passing to a further subsequence if necessary, assume \(\lambda _{n_{k}}\rightarrow \lambda \); then \(0<\lambda <\frac{2}{L}\) by condition (ii). Set \(V_{\lambda }=\operatorname{prox}_{\lambda g}(I-\lambda \nabla f)\); then \(V_{\lambda }\) is nonexpansive. Set

$$ t_{k}=x_{n_{k}}-\lambda _{n_{k}}\nabla f(x_{n_{k}}), \quad\quad z_{k}=x_{n_{k}}- \lambda \nabla f(x_{n_{k}}). $$

Using the proximal identity of Lemma 2.6, we deduce that

$$\begin{aligned} & \Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \\ &\quad = \Vert \operatorname{prox}_{\lambda _{n_{k}}g}t_{k}- \operatorname{prox}_{\lambda g}z_{k} \Vert \\ &\quad = \biggl\Vert \operatorname{prox}_{\lambda g}\biggl( \frac{\lambda }{\lambda _{n_{k}}}t_{k}+ \biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr) \operatorname{prox}_{{\lambda _{n_{k}}}g}t_{k}\biggr)-\operatorname{prox}_{ \lambda g}z_{k} \biggr\Vert \\ &\quad \leq \biggl\Vert \frac{\lambda }{\lambda _{n_{k}}}t_{k}+\biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr)\operatorname{prox}_{\lambda _{n_{k}}g}t_{k}-z_{k} \biggr\Vert \\ &\quad \leq \frac{\lambda }{\lambda _{n_{k}}} \Vert t_{k}-z_{k} \Vert +\biggl(1- \frac{\lambda }{\lambda _{n_{k}}}\biggr) \Vert \operatorname{prox}_{\lambda _{n_{k}}g}t_{k}-z_{k} \Vert \\ &\quad =\frac{\lambda }{\lambda _{n_{k}}} \vert \lambda _{n_{k}}-\lambda \vert \bigl\Vert \nabla f(x_{n_{k}}) \bigr\Vert +\biggl(1-\frac{\lambda }{\lambda _{n_{k}}} \biggr) \Vert \operatorname{prox}_{ \lambda _{n_{k}}g}t_{k}-z_{k} \Vert . \end{aligned}$$

Since \(\{x_{n}\}\) is bounded, \(\nabla f\) is Lipschitz continuous, and \(\lambda _{n_{k}}\rightarrow \lambda \), we immediately derive from the last relation that \(\Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}}\Vert \rightarrow 0\). As a result, we find

$$ \Vert x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \leq \Vert x_{n_{k}}-V_{\lambda _{n_{k}}}x_{n_{k}} \Vert + \Vert V_{\lambda _{n_{k}}}x_{n_{k}}-V_{\lambda }x_{n_{k}} \Vert \rightarrow 0. $$

Using Lemma 2.7, we get \(\omega _{w}(x_{n_{k}})\subset S\). Meanwhile, we have

$$\begin{aligned} &\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},V_{\lambda _{n_{k}}}y_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},V_{\lambda _{n_{k}}}x_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},x_{n_{k}}-x^{*} \bigr\rangle \\ &\quad =\bigl\langle h\bigl(x^{*}\bigr)-x^{*}, \tilde{x}-x^{*}\bigr\rangle ,\quad \forall \tilde{x} \in S. \end{aligned}$$

Also, since \(x^{*}\) is the unique solution of variational inequality problem (3.1), we get

$$ \limsup_{k\rightarrow \infty }\bigl\langle h\bigl(x^{*} \bigr)-x^{*},x_{n_{k}}-x^{*} \bigr\rangle \leq 0, $$

and hence \(\limsup_{k\rightarrow \infty }\mu _{n_{k}}\leq 0\). Applying Lemma 2.8 with \(z=x^{*}\), we conclude that \(\Vert x_{n}-x^{*}\Vert \rightarrow 0\). □

Furthermore, we extend Algorithm 1 to a more general viscosity iterative algorithm. Suppose that the sequence of contractive mappings \(\{h_{n}\}\) converges uniformly on every bounded subset B of H. Assuming that the solution set \(S\neq \emptyset \), we prove that the sequence \(\{x_{n}\}\) generated by Algorithm 2 converges strongly to a point \(x^{*}\in S\), which also solves variational inequality (3.1).

A more general inertial iterative algorithm is as follows.

Algorithm 2

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}). $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\alpha _{n}h_{n}(y_{n})+(1- \alpha _{n}) \bigl(\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}- \lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr)\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n=n+1\) and go to 2.

Theorem 3.2

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent. Let \(\{h_{n}\}\) be a sequence of \(\rho _{n}\)-contractive self-mappings of H with \(0<\rho _{l}=\liminf_{n\rightarrow \infty }\rho _{n}\leq \limsup_{n \rightarrow \infty }\rho _{n}=\rho _{u}<1\), and assume that \(\{h_{n}\}\) converges uniformly on every bounded subset B of H. Assume that \(\nabla f\) is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), define the sequence \(\{x_{n}\}\) by Algorithm 2, where \(\lambda _{n}\in (0,\frac{2}{L})\), \(\alpha _{n}\in (0,\frac{2+\lambda _{n} L}{4})\). Suppose that

  1. (i)

    \(\lim_{n\rightarrow \infty }\alpha _{n}=0\), \(\sum_{n=0}^{\infty }\alpha _{n}=\infty \);

  2. (ii)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  5. (v)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges strongly to \(x^{*}\), where \(x^{*}\) is a solution of (1.1), which is also the unique solution of variational inequality problem (3.1).


Proof. Using the uniform convergence of the sequence of contractive mappings \(\{h_{n}\}\) and consulting [6], we have \(\lim_{n\rightarrow \infty }h_{n}=h\). The proof can then be completed by techniques similar to those used for Theorem 3.1. □

3.2 Alternated inertial proximal gradient algorithm

Inspired by the ideas of [10, 17, 21] and related references, and combining them with the proximal gradient method, we consider the following algorithm.

Algorithm 3

  1. 1.

    Choose \(x_{0},x_{1} \in H\) and set \(n:=1\).

  2. 2.

    Given \(x_{n}\), \(x_{n-1}\), compute

    $$ y_{n}= \textstyle\begin{cases} x_{n}+\delta _{n}(x_{n}-x_{n-1}), &n \text{ odd}, \\ x_{n}, &n \text{ even}. \end{cases} $$
  3. 3.

    Calculate the next iterate via

    $$ x_{n+1}=\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}-\lambda _{n}D(y_{n})\nabla f(y_{n})+e(y_{n})\bigr). $$
  4. 4.

    If \(\Vert x_{n}-x_{n+1}\Vert <\epsilon \), then stop. Otherwise, set \(n=n+1\) and go to 2.
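The steps of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under the assumptions \(D=I\) and \(e=0\); the function name and the callables `grad_f`, `prox_g`, `lam`, and `delta` are hypothetical interfaces introduced for the example, with vectors stored as plain lists.

```python
import math


def alternated_inertial_pg(grad_f, prox_g, x0, x1, lam, delta, max_iter=500, eps=1e-8):
    """Alternated inertial proximal gradient iteration with D = I and e = 0.

    grad_f(y) returns the gradient as a list, prox_g(v, t) the proximal point
    of t*g at v, lam(n) the step size and delta(n, x, x_prev) the inertial weight.
    """
    x_prev, x = list(x0), list(x1)
    for n in range(1, max_iter + 1):
        if n % 2 == 1:
            # odd n: extrapolate along the last displacement
            d = delta(n, x, x_prev)
            y = [xi + d * (xi - pi) for xi, pi in zip(x, x_prev)]
        else:
            # even n: no inertial term
            y = list(x)
        step = lam(n)
        g = grad_f(y)
        # forward (gradient) step followed by backward (proximal) step
        x_next = prox_g([yi - step * gi for yi, gi in zip(y, g)], step)
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_next, x)))
        x_prev, x = x, x_next
        if change < eps:
            break
    return x
```

For instance, with \(f(x)=\frac{1}{2}\Vert x-c\Vert ^{2}\), \(g=\Vert \cdot \Vert _{1}\), a constant step \(\lambda _{n}=1/2\), and \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert =1/n^{2}\), the iterates approach the soft-thresholded vector obtained by shrinking each entry of c toward zero by 1.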

Similar to (3.3), we rewrite (3.23) as follows:

$$ x_{n+1}=\operatorname{prox}_{\lambda _{n}g} \bigl(y_{n}-\lambda _{n}\nabla f(y_{n})\bigr)+ \tilde{e}_{n}, $$

where \(\hat{e}_{n}=\lambda _{n}\theta (y_{n})+e(y_{n})\), \(\theta (y_{n})=\nabla f(y_{n})-D(y_{n})\nabla f(y_{n})\), and

$$ \tilde{e}_{n}=\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}\nabla f(y_{n})+ \hat{e}_{n}\bigr)- \operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}-\lambda _{n} \nabla f(y_{n})\bigr). $$
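As a brief check of how the perturbation terms are controlled (a sketch that uses only the nonexpansiveness of the proximal operator and the definitions above), the error \(\tilde{e}_{n}\) satisfies

$$ \Vert \tilde{e}_{n} \Vert \leq \Vert \hat{e}_{n} \Vert \leq \lambda _{n} \bigl\Vert \theta (y_{n}) \bigr\Vert + \bigl\Vert e(y_{n}) \bigr\Vert . $$

Since \(\lambda _{n}<\frac{2}{L}\), summability of \(\Vert e(y_{n})\Vert \) and \(\Vert \theta (y_{n})\Vert \) therefore implies \(\sum_{n}\Vert \tilde{e}_{n}\Vert <\infty \).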

Theorem 3.3

Let \(f,g\in \Gamma _{0}(H)\) and assume that (1.1) is consistent (i.e., \(S\neq \emptyset \)). Assume that \(\nabla f\) is L-Lipschitzian and D is a diagonal scaling matrix. Given \(x_{0},x_{1}\in H\), let \(\{x_{n}\}\) be a sequence generated by Algorithm 3, where \(\lambda _{n}\in (0,\frac{2}{L})\). Suppose that

  1. (i)

    \(0<\liminf_{n\rightarrow \infty }\lambda _{n}\leq \limsup_{n \rightarrow \infty }\lambda _{n}<\frac{2}{L}\);

  2. (ii)

    \(\sum_{n=0}^{\infty }\Vert e(y_{n})\Vert <\infty \);

  3. (iii)

    \(\sum_{n=0}^{\infty }\Vert \theta (y_{n})\Vert <\infty \);

  4. (iv)

    \(\sum_{n=0}^{\infty }\delta _{n}\Vert x_{n}-x_{n-1}\Vert <\infty \).

Then \(\{x_{n}\}\) converges weakly to a solution of the minimization problem of (1.1).


Proof

Step 1. We show that \(\{x_{n}\}\) is bounded. For any \(z\in S\),

$$\begin{aligned} \Vert x_{2n+2}-z \Vert &= \Vert V_{\lambda _{2n+1}}y_{2n+1}+ \tilde{e}_{2n+1}-z \Vert \\ &\leq \Vert y_{2n+1}-z \Vert + \Vert \tilde{e}_{2n+1} \Vert \\ &= \bigl\Vert x_{2n+1}+\delta _{2n+1}(x_{2n+1}-x_{2n})-z \bigr\Vert + \Vert \tilde{e}_{2n+1} \Vert \\ &\leq \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert + \Vert \tilde{e}_{2n+1} \Vert . \end{aligned}$$

Moreover,

$$\begin{aligned} \Vert x_{2n+1}-z \Vert &= \Vert V_{\lambda _{2n}}y_{2n}+ \tilde{e}_{2n}-z \Vert \\ &= \Vert V_{\lambda _{2n}}x_{2n}+\tilde{e}_{2n}-z \Vert \\ &\leq \Vert x_{2n}-z \Vert + \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Combining these two estimates with conditions (ii), (iii), and (iv), we deduce that \(\{x_{n}\}\) is bounded, and so are \(\{y_{n}\}\) and \(\{V_{\lambda _{n}}y_{n}\}\). It also follows from (3.25) and (3.26) that \(\{x_{n}\}\) is quasi-Fejér monotone with respect to S. By Lemma 2.10, \(\lim_{n\rightarrow \infty }\Vert x_{n}-z\Vert \) exists.
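To make the boundedness argument explicit, the two estimates of Step 1 can be chained into a single recursion over the even-indexed iterates (a sketch obtained by inserting the second estimate into the first):

$$\begin{aligned} \Vert x_{2n+2}-z \Vert \leq \Vert x_{2n}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert + \Vert \tilde{e}_{2n+1} \Vert + \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

The last three terms are summable under the assumptions of the theorem, which is exactly the form required for quasi-Fejér monotonicity.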

Step 2. We show that \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\) and \(\lim_{n\rightarrow \infty }\Vert x_{n}-V_{\lambda _{n}}x_{n}\Vert =0\). First, fix \(z\in S\). By Lemma 2.2 and the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \Vert y_{2n+1}-z \Vert ^{2}&= \bigl\Vert x_{2n+1}+\delta _{2n+1}(x_{2n+1}-x_{2n})-z \bigr\Vert ^{2} \\ &\leq \Vert x_{2n+1}-z \Vert ^{2}+2\bigl\langle x_{2n+1}-z+\delta _{2n+1}(x_{2n+1}-x_{2n}), \delta _{2n+1}(x_{2n+1}-x_{2n})\bigr\rangle \\ &\leq \Vert x_{2n+1}-z \Vert ^{2} \\ &\quad{} +2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr). \end{aligned}$$
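For the reader's convenience, the first inequality above is an instance of an elementary estimate, which we assume to be the content of Lemma 2.2. Writing \(u=x_{2n+1}-z\) and \(v=\delta _{2n+1}(x_{2n+1}-x_{2n})\) (notation introduced only for this remark), we have

$$\begin{aligned} \Vert u+v \Vert ^{2}&= \Vert u \Vert ^{2}+2\langle u,v\rangle + \Vert v \Vert ^{2} \\ &= \Vert u \Vert ^{2}+2\langle v,u+v\rangle - \Vert v \Vert ^{2} \\ &\leq \Vert u \Vert ^{2}+2\langle v,u+v\rangle . \end{aligned}$$

The second inequality then follows by applying the Cauchy–Schwarz inequality and the triangle inequality to \(\langle v,u+v\rangle \).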

Since \(V_{\lambda _{n}}\) is \(\frac{2+\lambda _{n}L}{4} \)-av, we see that

$$\begin{aligned} V_{\lambda _{n}}=\operatorname{prox}_{\lambda _{n}g}(I-\lambda _{n}\nabla f)=(1-w_{n})I+w_{n}T_{n}, \end{aligned}$$

where \(w_{n}=\frac{2+\lambda _{n}L}{4}\) and \(T_{n}\) is nonexpansive. From condition (i), we get \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\). Combining (3.23) and (3.26), we obtain

$$\begin{aligned} \Vert x_{2n+2}-z \Vert ^{2}&= \Vert V_{\lambda _{2n+1}}y_{2n+1}+ \tilde{e}_{2n+1}-z \Vert ^{2} \\ &= \Vert V_{\lambda _{2n+1}}y_{2n+1}-z \Vert ^{2}+2\langle V_{\lambda _{2n+1}}y_{2n+1}-z, \tilde{e}_{2n+1}\rangle + \Vert \tilde{e}_{2n+1} \Vert ^{2} \\ &\leq \Vert y_{2n+1}-z \Vert ^{2}+ \Vert \tilde{e}_{2n+1} \Vert \bigl(2 \Vert y_{2n+1}-z \Vert + \Vert \tilde{e}_{2n+1} \Vert \bigr) \\ &\leq \Vert x_{2n+1}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ &\quad{} +M_{3} \Vert \tilde{e}_{2n+1} \Vert , \end{aligned}$$

where \(M_{3}=\sup \{2\Vert y_{2n+1}-z\Vert +\Vert \tilde{e}_{2n+1}\Vert \}\).

With the help of equality (3.28), we have

$$\begin{aligned} & \Vert x_{2n+1}-z \Vert ^{2} \\ &\quad = \Vert V_{\lambda _{2n}}y_{2n}+\tilde{e}_{2n}-z \Vert ^{2} \\ &\quad = \bigl\Vert (1-w_{2n})x_{2n}+w_{2n}T_{2n}x_{2n}-z \bigr\Vert ^{2}+2\langle V_{\lambda _{2n}}x_{2n}-z, \tilde{e}_{2n}\rangle + \Vert \tilde{e}_{2n} \Vert ^{2} \\ &\quad \leq (1-w_{2n}) \Vert x_{2n}-z \Vert ^{2}+w_{2n} \Vert T_{2n}x_{2n}-T_{2n}z \Vert ^{2}-w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} \\ & \quad \quad {} +\bigl(2 \Vert x_{2n}-z \Vert + \Vert \tilde{e}_{2n} \Vert \bigr) \Vert \tilde{e}_{2n} \Vert \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}-w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2}+M_{4} \Vert \tilde{e}_{2n} \Vert , \end{aligned}$$

where \(M_{4}=\sup \{2\Vert x_{2n}-z\Vert +\Vert \tilde{e}_{2n}\Vert \}\).

Substituting (3.30) into (3.29), we get

$$\begin{aligned} & \Vert x_{2n+2}-z \Vert ^{2} \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ & \quad \quad {} -w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2}+M_{3} \Vert \tilde{e}_{2n+1} \Vert +M_{4} \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Hence, we have the following result:

$$\begin{aligned} &w_{2n}(1-w_{2n}) \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} \\ &\quad \leq \Vert x_{2n}-z \Vert ^{2}- \Vert x_{2n+2}-z \Vert ^{2}+2\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigl( \Vert x_{2n+1}-z \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \bigr) \\ & \quad \quad {} +M_{3} \Vert \tilde{e}_{2n+1} \Vert +M_{4} \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

Noting the fact that \(\frac{1}{2}<\liminf_{n\rightarrow \infty }w_{n}\leq \limsup_{n \rightarrow \infty }w_{n}<1\), we deduce from (3.32) that

$$\begin{aligned} \sum_{n=0}^{\infty } \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2} < \infty . \end{aligned}$$

In particular, \(\lim_{n\rightarrow \infty }\Vert T_{2n}x_{2n}-x_{2n}\Vert =0\). Now we have

$$\begin{aligned} \Vert x_{2n+1}-x_{2n} \Vert \leq w_{2n} \Vert T_{2n}x_{2n}-x_{2n} \Vert + \Vert \tilde{e}_{2n} \Vert \rightarrow 0. \end{aligned}$$
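The summability of \(\Vert T_{2n}x_{2n}-x_{2n}\Vert ^{2}\) used above follows from a telescoping argument, sketched here with two constants introduced only for this remark: \(c=\inf_{n}w_{2n}(1-w_{2n})>0\) and \(K=2\sup_{n}(\Vert x_{2n+1}-z\Vert +\delta _{2n+1}\Vert x_{2n+1}-x_{2n}\Vert )<\infty \). Summing (3.32) over \(n=1,\ldots,N\) gives

$$\begin{aligned} c\sum_{n=1}^{N} \Vert T_{2n}x_{2n}-x_{2n} \Vert ^{2}&\leq \Vert x_{2}-z \Vert ^{2}- \Vert x_{2N+2}-z \Vert ^{2}+K\sum_{n=1}^{N}\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \\ &\quad{} +M_{3}\sum_{n=1}^{N} \Vert \tilde{e}_{2n+1} \Vert +M_{4}\sum_{n=1}^{N} \Vert \tilde{e}_{2n} \Vert . \end{aligned}$$

The right-hand side stays bounded as \(N\rightarrow \infty \) because \(\{x_{n}\}\) is bounded and the remaining sums converge by the assumptions of the theorem.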

Similarly, we argue that

$$\begin{aligned} \sum_{n=0}^{\infty } \Vert T_{2n+1}y_{2n+1}-y_{2n+1} \Vert ^{2} < \infty . \end{aligned}$$

Observe that

$$\begin{aligned} x_{2n+2}=(1-w_{2n+1})y_{2n+1}+w_{2n+1}T_{2n+1}y_{2n+1}+ \tilde{e}_{2n+1}. \end{aligned}$$

From (3.35) and conditions (ii) and (iii), we get

$$\begin{aligned} \Vert x_{2n+2}-y_{2n+1} \Vert \leq w_{2n+1} \Vert T_{2n+1}y_{2n+1}-y_{2n+1} \Vert + \Vert \tilde{e}_{2n+1} \Vert \rightarrow 0. \end{aligned}$$

It follows from (3.36) and condition (iv) that

$$\begin{aligned} \Vert x_{2n+2}-x_{2n+1} \Vert &\leq \Vert x_{2n+2}-y_{2n+1} \Vert + \Vert y_{2n+1}-x_{2n+1} \Vert \\ &= \Vert x_{2n+2}-y_{2n+1} \Vert +\delta _{2n+1} \Vert x_{2n+1}-x_{2n} \Vert \rightarrow 0. \end{aligned}$$

Combining (3.34) and (3.38), we obtain \(\lim_{n\rightarrow \infty }\Vert x_{n+1}-x_{n}\Vert =0\). This yields

$$\begin{aligned} \Vert x_{n}-V_{\lambda _{n}}x_{n} \Vert &\leq \Vert x_{n}-x_{n+1} \Vert + \Vert x_{n+1}-V_{ \lambda _{n}}y_{n} \Vert + \Vert V_{\lambda _{n}}y_{n}-V_{\lambda _{n}}x_{n} \Vert \\ &\leq \Vert x_{n}-x_{n+1} \Vert + \Vert \tilde{e}_{n} \Vert + \Vert y_{n}-x_{n} \Vert \rightarrow 0. \end{aligned}$$

Step 3. Show that

$$\begin{aligned} \omega _{w}(x_{n})\subset S. \end{aligned}$$

Since \(\{\lambda _{n}\}\) is bounded, we may assume that a subsequence \(\{\lambda _{n_{k}}\}\) converges to some λ. Arguing as in Step 3 of the proof of Theorem 3.1, we conclude that (3.40) holds. By Lemma 2.9, \(\{x_{n}\}\) converges weakly to a point of S. □

4 Numerical illustrations

In this section, we consider two examples that demonstrate the effectiveness of the algorithms and illustrate the convergence results of Theorem 3.1 and Theorem 3.3.

Example 4.1

Let \(H=\mathbb{R}^{N}\). Define \(h(x)=\frac{1}{10}x\). Take \(f(x)=\frac{1}{2}\Vert Ax-b\Vert ^{2}\), so that \(\nabla f(x)=A^{T}(Ax-b)\) is Lipschitz continuous with constant \(L=\Vert A^{T}A\Vert \), where \(A^{T}\) denotes the transpose of A. Take \(g(x)=\Vert x\Vert _{1}\), then

$$ \operatorname{prox}_{\lambda g}x=\arg \min_{v\in H}\biggl\{ \frac{1}{2\lambda } \Vert v-x \Vert ^{2}+ \Vert v \Vert _{1}\biggr\} . $$

From [15], we know that

$$ \operatorname{prox}_{\lambda _{n}\Vert \cdot \Vert _{1}}x=\bigl[\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(1), \operatorname{prox}_{ \lambda _{n}\vert \cdot \vert }x(2),\ldots, \operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(N) \bigr]^{T}, $$

where \(\operatorname{prox}_{\lambda _{n}\vert \cdot \vert }x(i)=\max \{\vert x(i)\vert -\lambda _{n},0\}\operatorname{sign}(x(i))\) and \(x(i)\) denotes the ith element of x, \(i=1,2,\ldots,N\). Let D be the diagonal matrix with entries \(D_{ii}=y_{n}(i)\), \(i=1,2,\ldots, N\). We take \(\alpha _{n}=\frac{1}{100n}\), \(\lambda _{n}=\frac{1}{30L}\frac{n+1}{n+2}\), and

$$ \delta _{n}= \textstyle\begin{cases} \frac{1}{n^{2} \Vert x_{n}-x_{n-1} \Vert }, & \Vert x_{n}-x_{n-1} \Vert \neq 0, \\ 0, & \Vert x_{n}-x_{n-1} \Vert = 0 \end{cases} $$

for every \(n\geq 1\). We generate an \(M\times N\) random matrix A whose entries are sampled independently from a uniform distribution, and a random vector b whose entries are drawn from a Gaussian distribution with zero mean and unit variance.
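The two building blocks just described, the componentwise soft-thresholding formula for \(\operatorname{prox}_{\lambda \Vert \cdot \Vert _{1}}\) and the rule for \(\delta _{n}\), can be sketched in Python as follows. The function names are hypothetical and vectors are stored as plain lists.

```python
import math


def prox_l1(x, lam):
    """Componentwise soft-thresholding: max(|x_i| - lam, 0) * sign(x_i)."""
    return [math.copysign(max(abs(xi) - lam, 0.0), xi) for xi in x]


def inertial_weight(n, x, x_prev):
    """delta_n = 1 / (n^2 * ||x_n - x_{n-1}||), and 0 when the iterates coincide."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_prev)))
    return 0.0 if dist == 0.0 else 1.0 / (n * n * dist)
```

With this rule, \(\delta _{n}\Vert x_{n}-x_{n-1}\Vert \) equals \(1/n^{2}\) whenever the iterates differ, so the summability condition on the inertial term holds automatically.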

According to the iterative process of Theorem 3.1, the sequence \(\{x_{n}\}\) is generated by

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\alpha _{n}h(y_{n})+(1-\alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(y_{n}- \lambda _{n}D(y_{n})A^{T}(Ay_{n}-b)+e(y_{n})\bigr). \end{cases} $$
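A minimal Python sketch of this scheme is given below. It is not the authors' MATLAB code: it assumes \(D=I\) and \(e=0\), and it replaces L by the squared Frobenius norm of A, which is an upper bound for \(\Vert A^{T}A\Vert \) and therefore keeps \(\lambda _{n}\) inside \((0,\frac{2}{L})\). All function names are hypothetical.

```python
import math
import random


def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Product A^T r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft(v, t):
    """Componentwise soft-thresholding with threshold t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def inertial_viscosity_lasso(A, b, x0, x1, max_iter=3000, eps=1e-6):
    """Inertial viscosity scheme with h(x) = x/10, assuming D = I and e = 0."""
    # Squared Frobenius norm: an upper bound for ||A^T A||, used in place of L.
    L = sum(a * a for row in A for a in row)
    x_prev, x = list(x0), list(x1)
    for n in range(1, max_iter + 1):
        alpha = 1.0 / (100.0 * n)
        lam = (n + 1.0) / (30.0 * L * (n + 2.0))
        dist = math.sqrt(sum((a - c) ** 2 for a, c in zip(x, x_prev)))
        delta = 1.0 / (n * n * dist) if dist > 0.0 else 0.0
        y = [a + delta * (a - c) for a, c in zip(x, x_prev)]
        residual = [ri - bi for ri, bi in zip(matvec(A, y), b)]
        grad = rmatvec(A, residual)
        p = soft([yi - lam * gi for yi, gi in zip(y, grad)], lam)
        x_next = [alpha * yi / 10.0 + (1.0 - alpha) * pi for yi, pi in zip(y, p)]
        change = math.sqrt(sum((a - c) ** 2 for a, c in zip(x_next, x)))
        x_prev, x = x, x_next
        if change < eps:
            break
    return x, n


if __name__ == "__main__":
    random.seed(0)
    M, N = 5, 20  # a much smaller instance than the one used in the paper
    A = [[random.random() for _ in range(N)] for _ in range(M)]
    b = [random.gauss(0.0, 1.0) for _ in range(M)]
    x0 = [random.gauss(0.0, 1.0) for _ in range(N)]
    x, iters = inertial_viscosity_lasso(A, b, x0, x0)
    print(iters)
```

The Frobenius bound is a conservative choice that avoids computing a spectral norm in pure Python; with the exact value of L the steps would be larger.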

Next, we implement the algorithms in MATLAB with \(M=100\) and \(N=1000\). Under the same parameters, we compare (4.1) with the following iterative algorithm (4.2) from [6], which has no inertial step. For different tolerances ϵ, the numerical results are reported in Table 1, where n and t denote the number of iterations and the running time (measured with tic/toc), respectively. We use \(\Vert x_{n+1}-x_{n}\Vert <\epsilon \) as the stopping criterion.

$$\begin{aligned} x_{n+1}=\alpha _{n}h(x_{n})+(1- \alpha _{n})\operatorname{prox}_{\lambda _{n}g}\bigl(x_{n}- \lambda _{n}DA^{T}(Ax_{n}-b)+e(x_{n}) \bigr). \end{aligned}$$
Table 1 Comparison of Algorithm 1 (IA) with the algorithm without inertia step (UA) for Example 4.1. \(x_{0}=\operatorname{randn}(N,1)\)

In addition, we compare the values of \(\Vert x_{n+1}-x_{n}\Vert \) for (4.1) and (4.2) at the same number of iterations; the results are shown in Fig. 1. Figure 2 reports the running time and the number of iterations for different tolerances ϵ.

Figure 1 Comparison of \(\Vert x_{n+1}-x_{n}\Vert \) with inertial acceleration (IA) and without inertial acceleration (UA) for \((M,N)= (100,1000)\) in Example 4.1

Figure 2 Comparison of running time and iteration steps with inertial acceleration (IA) and without inertial acceleration (UA) under the same stopping criteria for \((M,N)= (100,1000)\) in Example 4.1

Table 1, Fig. 1, and Fig. 2 show that Algorithm 1 is faster than the iterative formula (4.2), which has no inertial step. For the same stopping criterion, the values of \(\Vert x_{n+1}-x_{n}\Vert \) and \(\Vert Ax_{n}-b\Vert \) obtained by Algorithm 1 are smaller.

In what follows, we give an example in an infinite dimensional space.

Example 4.2

Suppose that \(H=L^{2}([0,1])\) with the norm \(\Vert x\Vert =(\int _{0}^{1}(x(t))^{2}\,dt)^{\frac{1}{2}}\) and the inner product \(\langle x,y\rangle =\int _{0}^{1}x(t)y(t)\,dt\), \(\forall x,y\in H\). Define \(h(x)=\frac{1}{2}x\) and \(Ax(t)=tx(t)\). Let \(f(x)=\frac{1}{2}\Vert Ax-u\Vert ^{2}\) and let g be the indicator function of C, where \(u\in H\) is a fixed function and \(C=\{x\in H : \Vert x\Vert \leq 1\}\).

By the definition of f and g, we obtain

$$ \nabla f(x)=A^{*}(Ax-u) $$

and

$$ \operatorname{prox}_{\lambda g}x=\arg \min_{v\in H}\biggl\{ \frac{1}{2\lambda } \Vert v-x \Vert ^{2}+ \iota _{C}(v) \biggr\} =P_{C}(x), $$

where \(\iota _{C}\) denotes the indicator function and

$$ \iota _{C}(x)= \textstyle\begin{cases} 0, & \text{if } x\in C, \\ \infty ,& \text{if } x\notin C. \end{cases} $$

We also deduce that A is self-adjoint, i.e., \(A^{*}=A\). Taking \(D(x_{n})=I\), \(\alpha _{n}=\frac{1}{1000n}\), and \(\lambda _{n}=\frac{n}{L(n+1)}\) in the iterative algorithm of Theorem 3.1, we get the following sequence \(\{x_{n}\}\):

$$ \textstyle\begin{cases} y_{n}=x_{n}+\delta _{n}(x_{n}-x_{n-1}), \\ x_{n+1}=\frac{1}{1000n}\frac{1}{2}y_{n}+(1-\frac{1}{1000n})P_{C}(y_{n}- \frac{n}{L(n+1)}A(Ay_{n}-u)). \end{cases} $$

Integrals are approximated by the trapezoidal rule. We test Algorithm 1 and Algorithm 3 with different stopping criteria; the numerical results are shown in Table 2.
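The following Python sketch shows one way to discretize this example on a uniform grid, with the \(L^{2}\) norm approximated by the trapezoidal rule and the projection computed as \(P_{C}(x)=x/\max \{1,\Vert x\Vert \}\). It is an illustration under stated assumptions rather than the authors' implementation: we take \(L=1\), \(e=0\), and the same rule for \(\delta _{n}\) as in Example 4.1, none of which is specified for this example in the text. The function names and the grid size m are hypothetical.

```python
import math


def trapz_norm(x, h):
    """L^2([0,1]) norm of grid values x, approximated by the trapezoidal rule."""
    s = 0.5 * (x[0] ** 2 + x[-1] ** 2) + sum(v * v for v in x[1:-1])
    return math.sqrt(h * s)


def project_unit_ball(x, h):
    """P_C(x) = x / max(1, ||x||) for C = {x : ||x|| <= 1}."""
    scale = max(1.0, trapz_norm(x, h))
    return [v / scale for v in x]


def example_42(u, x0, x1, m=100, max_iter=2000, eps=1e-6):
    """Discretized inertial scheme with D = I, e = 0 and L = 1 (all assumed)."""
    h = 1.0 / m
    t = [i * h for i in range(m + 1)]
    uv = [u(ti) for ti in t]
    x_prev = [x0(ti) for ti in t]
    x = [x1(ti) for ti in t]
    for n in range(1, max_iter + 1):
        alpha = 1.0 / (1000.0 * n)
        lam = n / (n + 1.0)
        dist = trapz_norm([a - c for a, c in zip(x, x_prev)], h)
        delta = 1.0 / (n * n * dist) if dist > 0.0 else 0.0
        y = [a + delta * (a - c) for a, c in zip(x, x_prev)]
        # gradient A*(Ay - u) with (Ay)(t) = t*y(t) and A* = A
        grad = [ti * (ti * yi - ui) for ti, yi, ui in zip(t, y, uv)]
        p = project_unit_ball([yi - lam * gi for yi, gi in zip(y, grad)], h)
        # viscosity term h(y) = y/2 combined with the projected gradient step
        x_next = [alpha * 0.5 * yi + (1.0 - alpha) * pi for yi, pi in zip(y, p)]
        change = trapz_norm([a - c for a, c in zip(x_next, x)], h)
        x_prev, x = x, x_next
        if change < eps:
            break
    return x, n
```

A call such as `example_42(math.exp, lambda t: t, lambda t: t * t)` corresponds to the data \(u=e^{t}\), \(x_{0}=t\), \(x_{1}=t^{2}\) of Table 2; the returned grid function always has discrete norm at most 1.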

Table 2 Comparison of Algorithm 3 (AIA) with Algorithm 1 (IA) for Example 4.2. \(u=e^{t}\), \(x_{0}=t\), \(x_{1}=t^{2}\)

In what follows, we compare the inertial proximal gradient algorithm (IA) with the alternated inertial proximal gradient algorithm (AIA) under the outer perturbation \(e(y_{n})=\frac{1}{n^{2}}\). The numerical results are reported in Table 3.

Table 3 Comparison of Algorithm 3 (AIA) with Algorithm 1 (IA) for Example 4.2. \(u=\sin t\), \(x_{0}=t\), \(x_{1}=2t\)

We observe that the norm of \(x_{n}\) approaches 1 as the number of iterations increases. In this example, the alternated inertial algorithm needs fewer iterations and less running time than the inertial algorithm, although the difference between the two is small.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  2. Boikanyo, O.A., Morosanu, G.: Strong convergence of a proximal point algorithm with bounded error sequence. Optim. Lett. 72, 415–420 (2013)

  3. Censor, Y., Davidi, R., Herman, G.T.: Perturbation resilience and superiorization of iterative algorithms. Inverse Probl. 26, 065008 (2010)

  4. Davidi, R., Herman, G.T., Censor, Y.: Perturbation-resilient block-iterative projection methods with application to image reconstruction from projections. Int. Trans. Oper. Res. 16, 505–524 (2009)

  5. Dong, Q.L., Zhao, J., He, S.N.: Bounded perturbation resilience of the viscosity algorithm. J. Inequal. Appl. 2016, 299 (2016)

  6. Duan, P.C., Song, M.M.: General viscosity iterative approximation for solving unconstrained convex optimization problems. J. Inequal. Appl. 2015, 334 (2015)

  7. Goebel, K., Kirk, W.A.: Topics in Metric Fixed Point Theory. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (1990)

  8. Guo, Y.N., Cui, W.: Strong convergence and bounded perturbation resilience of a modified proximal gradient algorithm. J. Inequal. Appl. 2018, 103 (2018)

  9. He, S.N., Yang, C.P.: Solving the variational inequality problem defined on intersection of finite level sets. Abstr. Appl. Anal. 2013, Article ID 942315 (2013)

  10. Iutzeler, F., Hendricks, M.: A generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Softw. 34, 383–405 (2019)

  11. Jin, W., Censor, Y., Jiang, M.: Bounded perturbation resilience of projected scaled gradient methods. Comput. Optim. Appl. 63, 365–392 (2016)

  12. Mahammand, A.A., Naseer, S., Xu, H.K.: Properties and iterative methods for Q-lasso. Abstr. Appl. Anal. 8, Article ID 250943 (2013)

  13. Mangasarian, O.L., Meyer, R.R., Robinson, S.M., Auslender, A.: Minimisation de fonctions localement lipschitziennes: applications à la programmation mi-convexe, mi-différentiable. In: Nonlinear Programming 3. Academic Press, New York (1978)

  14. Marino, G., Xu, H.K.: Weak and strong convergence theorems for strict pseudo-contractions in Hilbert spaces. J. Math. Anal. Appl. 329, 336–346 (2007)

  15. Micchelli, C.A., Shen, L.X., Xu, Y.S.: Proximity algorithms for image models: denoising. Inverse Probl. 27, 045009 (2011)

  16. Moreau, J.J.: Propriétés des applications ‘prox’. C. R. Acad. Sci. Paris Sér. A Math. 256, 1069–1071 (1963)

  17. Mu, Z.G., Peng, Y.: A note on the inertial proximal point method. Stat. Optim. Inf. Comput. 3, 241–248 (2015)

  18. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)

  19. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)

  20. Polyak, B.T.: Introduction to Optimization. Optimization Software Inc. Publications Division, New York (1987)

  21. Shehu, Y., Gibali, A.: New inertial relaxed method for solving split feasibilities. Optim. Lett. (2020)

  22. Xu, H.K.: Averaged mappings and the gradient-projection algorithm. J. Optim. Theory Appl. 150, 360–378 (2011)

  23. Xu, H.K.: Properties and iterative methods for the lasso and its variants. Chin. Ann. Math. 35B(3), 1–18 (2014)

  24. Xu, H.K.: Bounded perturbation resilience and superiorization techniques for the projected scaled gradient method. Inverse Probl. 33, 044008 (2017)

  25. Yao, Z.S., Cho, S.Y., Kang, S.M., Zhu, L.J.: A regularized algorithm for the proximal split feasibility problem. Abstr. Appl. Anal. 2014, Article ID 894272 (2014)


Acknowledgements

The authors would like to thank the referee for valuable suggestions to improve the manuscript.


Funding

The authors thank the Foundation of Tianjin Key Lab for Advanced Signal Processing (2019ASP-TJ04) and the Scientific Research Project of Tianjin Municipal Education Commission (2019KJ133) for support.

Author information

Authors and Affiliations



All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Peichao Duan.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Duan, P., Zhang, Y. & Bu, Q. New inertial proximal gradient methods for unconstrained convex optimization problems. J Inequal Appl 2020, 255 (2020).
