A proximal gradient method with double inertial steps for minimization problems involving demicontractive mappings


In this article, we present a novel proximal gradient method with double inertial steps for finding a common solution of a minimization problem and the fixed point problem of a demicontractive mapping. We establish a weak convergence theorem for this method and provide a numerical example on a signal recovery problem.

1 Introduction

Optimization and fixed point problems are fundamental mathematical concepts with wide-ranging applications in fields such as engineering, medicine, signal processing, and image processing. Engineers routinely need to minimize costs, optimize designs, or maximize system efficiency, all of which can be framed as optimization problems. Fixed point theorems, in turn, provide a robust mathematical framework for establishing the existence of solutions to many engineering problems. Signal processing benefits substantially from fixed point formulations, particularly within optimization methods, which offer an effective way to handle the complexities of denoising and restoration tasks. Notably, the least absolute shrinkage and selection operator (LASSO) [1] is a pivotal optimization problem in signal reconstruction and is widely recognized for its effectiveness in compressed sensing. In image processing, optimization techniques and fixed point methods are likewise invaluable for tasks such as image deblurring and image inpainting (see [2–7] for more information).

In 2014, Jaggi [8] established an equivalence between the LASSO and support vector machines (SVMs) in the following sense: for any SVM with an \(L_{2}\)-norm loss function, there is a LASSO formulation with the same optimal solutions, and vice versa, so that either problem can be translated into the other. Moreover, the sparsity of a LASSO solution equals the number of support vectors of the corresponding SVM, and many useful properties and sublinear-time algorithms for SVMs arise naturally from properties of the LASSO. SVMs are commonly used for classification and regression and have many applications in natural language processing (NLP), particularly in information extraction and email phishing detection. In information extraction, SVMs are highly effective for tasks such as named entity recognition, text categorization, and relation extraction, since they can identify entities and patterns in unstructured text [9]. In email phishing detection [10, 11], SVMs use features such as sender addresses and message content to perform binary classification, distinguishing legitimate emails from suspicious ones. SVMs can effectively detect anomalies in email traffic, but their success depends on the quality of the features, the data representation, and the training dataset. Ensemble approaches, which combine SVMs with other models, improve phishing detection further, and such systems must be updated regularly to keep pace with evolving phishing strategies. In 2021, Afrin et al. [12] employed SVMs together with LASSO feature selection to predict liver disease. More recently, Cholamjiak and Das [13] developed a modified projective forward-backward splitting algorithm for several models, including the LASSO, to predict Parkinson’s disease using an extreme learning machine.

A prominent method for minimizing the sum of a smooth function and a nonsmooth function is the proximal gradient algorithm, which originates in [14] (see also [15]). Each iteration takes a gradient step on the smooth function and then applies the proximity operator of the nonsmooth function. It is widely recognized that adding inertia, also known as Nesterov’s acceleration [16], can significantly improve both the theoretical and the practical convergence rates of this approach. The recent popularity of Nesterov’s acceleration has spurred numerous variants, such as those in [17–19]. Particularly noteworthy is the fast iterative shrinkage-thresholding algorithm (FISTA) of Beck and Teboulle [17], which attains the \(O(1/k^{2})\) convergence rate of Nesterov’s optimal gradient method for convex composite objective functions.

Throughout this article, denote by \(\mathcal{H}\) a real Hilbert space with the inner product \(\langle \cdot , \cdot \rangle \) and the associated norm \(\|\cdot \|\). Let \(\mathbb{R}\) and \(\mathbb{N}\) be the sets of real numbers and nonnegative integers, respectively. We are interested in the following minimization problem:

$$\begin{aligned} \min_{x\in \mathcal{H}} f(x)+g(x), \end{aligned}\tag{1.1}$$

where \(f : \mathcal{H}\to \mathbb{R}\) and \(g : \mathcal{H}\to (-\infty , +\infty ]\) are proper, lower semi-continuous (l.s.c.), and convex functions on \(\mathcal{H}\). Furthermore, f is assumed to be differentiable with an L-Lipschitz continuous gradient \(\nabla f\). The set of minimizers of \(f+g\) is denoted by \(\arg \min (f+g)\). It is well known that

$$\begin{aligned} \tilde{x}\in \arg \min (f+g) \quad \iff \quad 0\in (\nabla f+\partial g) ( \tilde{x}), \end{aligned}$$

where ∂g is the subdifferential of g. Recently, Kesornprom and Cholamjiak [20] introduced a proximal gradient method that combines the inertial technique with an adaptive step size, demonstrated its effectiveness for the minimization problem (1.1), and applied it to X-ray image deblurring. Similarly, Kankam and Cholamjiak [21] studied image restoration modeled as the minimization problem (1.1).

Next, we consider the following fixed point problem:

$$\begin{aligned} \text{find } x\in \mathcal{H} \text{ such that } x = Tx, \end{aligned}\tag{1.2}$$

where \(T : \mathcal{H}\to \mathcal{H}\) is a mapping. We denote by \(Fix(T)\) the fixed point set of T. The Mann iteration [22] is one of the most frequently used algorithms for solving the fixed point problem (1.2). In 2008, Maingé [23] introduced an algorithm that combines the inertial technique with the Mann iteration to solve (1.2); under certain conditions, the sequence it generates converges weakly to a fixed point of a nonexpansive mapping. In 2018, Dong et al. [24] introduced the general inertial Mann iteration for nonexpansive mappings, which contains the method of [23] as a special case. According to [24], the sequence generated by the general inertial Mann iteration converges weakly to a fixed point under suitable conditions.

Motivated by these works, this paper proposes a novel proximal gradient method that incorporates the general inertial Mann iteration and establishes, under suitable control conditions, a weak convergence theorem for finding a common solution of the minimization problem (1.1) and the fixed point problem (1.2) for a demicontractive mapping. Furthermore, we demonstrate the efficacy of the proposed algorithm by applying it to a signal recovery problem.

2 Preliminaries

To establish our main result, this section provides the necessary definitions and lemmas. We use the symbol → to denote strong convergence and ⇀ to denote weak convergence. Let \(s, t\in \mathcal{H}\) and \(\eta \in \mathbb{R}\). Then we have

$$\begin{aligned} &\Vert s+t \Vert ^{2} = \Vert s \Vert ^{2} + 2 \langle s, t\rangle + \Vert t \Vert ^{2}, \end{aligned}\tag{2.1}$$
$$\begin{aligned} &\Vert s+t \Vert ^{2} \leq \Vert s \Vert ^{2} + 2\langle t, s+t\rangle , \end{aligned}\tag{2.2}$$

$$\begin{aligned} \bigl\Vert \eta s + (1-\eta )t \bigr\Vert ^{2} &= \eta \Vert s \Vert ^{2} + (1-\eta ) \Vert t \Vert ^{2} - \eta (1-\eta ) \Vert s-t \Vert ^{2}. \end{aligned}\tag{2.3}$$
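The three relations above are used repeatedly in the proofs. As a quick sanity check, the following minimal Python sketch tests them on random vectors in \(\mathbb{R}^{k}\), which stands in for \(\mathcal{H}\) purely for illustration; the helper names are hypothetical. Note that the last identity holds for every real \(\eta\), not only for \(\eta \in [0,1]\).

```python
import random

def inner(s, t):
    """Euclidean inner product on R^k (a stand-in for <.,.> on H)."""
    return sum(a * b for a, b in zip(s, t))

def lin(a, s, b, t):
    """Return the linear combination a*s + b*t of two vectors."""
    return [a * x + b * y for x, y in zip(s, t)]

def check_identities(k=5, trials=1000, tol=1e-9, seed=0):
    """Test the three relations on random vectors s, t and a random real eta."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.uniform(-1, 1) for _ in range(k)]
        t = [rng.uniform(-1, 1) for _ in range(k)]
        eta = rng.uniform(-2, 2)
        st = lin(1, s, 1, t)
        # ||s+t||^2 = ||s||^2 + 2<s,t> + ||t||^2
        if abs(inner(st, st) - (inner(s, s) + 2 * inner(s, t) + inner(t, t))) > tol:
            return False
        # ||s+t||^2 <= ||s||^2 + 2<t, s+t>  (the gap is exactly ||t||^2)
        if inner(st, st) > inner(s, s) + 2 * inner(t, st) + tol:
            return False
        # ||eta s + (1-eta) t||^2 = eta||s||^2 + (1-eta)||t||^2 - eta(1-eta)||s-t||^2
        v = lin(eta, s, 1 - eta, t)
        d = lin(1, s, -1, t)
        rhs = eta * inner(s, s) + (1 - eta) * inner(t, t) - eta * (1 - eta) * inner(d, d)
        if abs(inner(v, v) - rhs) > tol:
            return False
    return True

print(check_identities())
```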

Definition 2.1

Let \(h : \mathcal{H}\to (-\infty , +\infty ]\) be a proper, convex, and l.s.c. function, and let \(\tilde{c}>0\). The proximity operator of h of order \(\tilde{c}\) is defined by

$$\begin{aligned} \mathrm{prox}_{\tilde{c} h}(s) := \underset{t\in \mathcal{H}}{\arg \min} \biggl\lbrace h(t)+ \frac{1}{2\tilde{c}} \Vert s-t \Vert ^{2} \biggr\rbrace \end{aligned}$$

for all \(s\in \mathcal{H}\).
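For \(h=\Vert \cdot \Vert _{1}\) on \(\mathbb{R}^{N}\), the case used in Section 4, the proximity operator has the well-known closed form of componentwise soft-thresholding at level \(\tilde{c}\). The sketch below (function names are illustrative) implements it and compares it against a brute-force grid minimization of the objective in the definition, in one dimension.

```python
def prox_l1(s, c):
    """prox_{c*||.||_1}(s): componentwise soft-thresholding at level c > 0."""
    return [max(abs(x) - c, 0.0) * (1.0 if x >= 0 else -1.0) for x in s]

def prox_objective(t, s, c):
    """The objective ||t||_1 + ||s - t||^2 / (2c) minimized in the definition above."""
    return sum(abs(x) for x in t) + sum((a - b) ** 2 for a, b in zip(s, t)) / (2 * c)

# Brute-force check in one dimension: no grid point beats the closed form.
s, c = [1.7], 0.5
p = prox_l1(s, c)
grid = [[-3 + 0.001 * i] for i in range(6001)]
best = min(prox_objective(t, s, c) for t in grid)
print(p, prox_objective(p, s, c) <= best + 1e-12)
```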

Next, let \(T : \mathcal{H}\to \mathcal{H}\) be a mapping and \(G : \mathcal{H}\rightarrow 2^{\mathcal{H}}\) be a multivalued mapping.

Definition 2.2

T is said to be

  (i)

    μ-demicontractive if \(Fix(T)\neq \emptyset \) and there is \(\mu \in [0, 1)\) such that for all \(s\in \mathcal{H}\) and all \(p\in Fix(T)\),

    $$ \Vert Ts-p \Vert ^{2}\leq \Vert s-p \Vert ^{2}+ \mu \Vert s-Ts \Vert ^{2}, $$
  (ii)

    L-Lipschitz continuous if there is \(L>0\) such that

    $$ \Vert Ts-Tt \Vert \leq L \Vert s-t \Vert $$

    for all \(s,t\in \mathcal{H}\).
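Demicontractive mappings need not be nonexpansive. As a hypothetical illustration (not taken from this article), \(T=-2I\) on \(\mathbb{R}^{k}\) has \(Fix(T)=\{0\}\) and is 2-Lipschitz; since \(\Vert Ts\Vert ^{2}=4\Vert s\Vert ^{2}\) and \(\Vert s-Ts\Vert ^{2}=9\Vert s\Vert ^{2}\), inequality (i) holds exactly when \(\mu \geq 1/3\). The sketch below samples random points to test (i); random sampling can only refute the property, not prove it.

```python
import random

def minus_two(s):
    """Hypothetical example T = -2I on R^k; its only fixed point is 0."""
    return [-2.0 * x for x in s]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def satisfies_demicontractive(T, mu, p, k=3, trials=500, tol=1e-9, seed=1):
    """Test ||Ts - p||^2 <= ||s - p||^2 + mu ||s - Ts||^2 at random s.
    The point p is assumed to be a fixed point of T."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = [rng.uniform(-10.0, 10.0) for _ in range(k)]
        Ts = T(s)
        if sq_dist(Ts, p) > sq_dist(s, p) + mu * sq_dist(s, Ts) + tol:
            return False
    return True

origin = [0.0, 0.0, 0.0]
print(satisfies_demicontractive(minus_two, 1 / 3, origin))  # mu = 1/3 suffices
print(satisfies_demicontractive(minus_two, 0.3, origin))    # mu = 0.3 is too small
```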

Definition 2.3

G is said to be

  (i)

    monotone if for all \((s, u), (t, v)\in graph(G)\) (the graph of mapping G),

    $$ \langle u-v, s-t\rangle \geq 0, $$
  (ii)

    maximal monotone if for every \((s, u)\in \mathcal{H}\times \mathcal{H}\),

    $$ (s, u)\in graph(G)\quad \iff \quad \langle u-v, s-t\rangle \geq 0 \quad \text{for all } (t, v)\in graph(G). $$

Definition 2.4

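[25] Suppose \(Fix(T)\neq \emptyset \). Then \(I-T\) is said to be demiclosed at zero if, for any sequence \(\{s_{n}\}\subset \mathcal{H}\), the following implication holds:

$$ s_{n}\rightharpoonup s \quad \text{and} \quad \Vert s_{n}-Ts_{n} \Vert \to 0 \quad \implies \quad s\in Fix(T). $$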

Lemma 2.5

[26] If T is a Lipschitz continuous and monotone mapping and G is a maximal monotone mapping, then the mapping \(T+G\) is maximal monotone.

Lemma 2.6

[27] Let \(\{x_{n}\}\) and \(\{\Lambda _{n}\}\) be nonnegative sequences of real numbers satisfying \(x_{n+1}\leq (1+\Lambda _{n})x_{n}+\Lambda _{n}x_{n-1}\). Then \(x_{n+1}\leq K\cdot \prod_{j=1}^{n}(1+2\Lambda _{j})\), where \(K = \max \{x_{1}, x_{2}\}\). Furthermore, if \(\sum_{n=1}^{\infty}\Lambda _{n}<\infty \), then \(\{x_{n}\}\) is bounded.
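A small numerical illustration of Lemma 2.6, under the assumption that the recursion holds with equality (the worst case) and with the summable choice \(\Lambda _{n}=1/n^{2}\), which is only an example: the sketch checks the product bound at every step and shows that the sequence stays bounded.

```python
def lemma26_run(Lam, x1, x2, n_max=200):
    """Worst case of Lemma 2.6: x_{n+1} = (1 + Lam(n)) x_n + Lam(n) x_{n-1} for n >= 2.
    Returns the sequence and whether x_{n+1} <= K * prod_{j=1}^{n} (1 + 2 Lam(j)) always held."""
    x = {1: x1, 2: x2}
    K = max(x1, x2)
    prod = (1 + 2 * Lam(1)) * (1 + 2 * Lam(2))   # prod_{j=1}^{n} for n = 2
    bound_ok = True
    for n in range(2, n_max):
        x[n + 1] = (1 + Lam(n)) * x[n] + Lam(n) * x[n - 1]
        bound_ok = bound_ok and x[n + 1] <= K * prod * (1 + 1e-12)
        prod *= 1 + 2 * Lam(n + 1)                 # advance to prod_{j=1}^{n+1}
    return x, bound_ok

x, ok = lemma26_run(lambda n: 1.0 / n ** 2, 1.0, 2.0)
print(ok, max(x.values()))
```

Since \(\prod_{j\geq 1}(1+2/j^{2})\) is finite (about 9.57), the bound itself stays below \(2\times 9.57\), which is the boundedness conclusion of the lemma.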

Lemma 2.7

[28] Let \(\{x_{n}\}\) and \(\{y_{n}\}\) be sequences of nonnegative real numbers such that \(\sum_{n=1}^{\infty} y_{n}<\infty \) and \(x_{n+1}\leq x_{n}+y_{n}\). Then \(\{x_{n}\}\) is a convergent sequence.

Lemma 2.8

[29, Opial] Let \(\{s_{n}\}\) be a sequence in \(\mathcal{H}\) and Ψ be a nonempty subset of \(\mathcal{H}\). If, for every \(s^{*}\in \Psi \), \(\lbrace \|s_{n}-s^{*}\| \rbrace \) converges and every weak sequential cluster point of \(\{s_{n}\}\) belongs to Ψ, then \(\{s_{n}\}\) converges weakly to a point in Ψ.

3 Main result

We first assume that the following conditions are satisfied for the convergence analysis of our algorithm:

Condition 1. \(f : \mathcal{H}\to \mathbb{R}\) and \(g : \mathcal{H}\to (-\infty , +\infty ]\) are two proper, l.s.c., and convex functions.

Condition 2. f is differentiable and has an L-Lipschitz continuous gradient \(\nabla f\).

Condition 3. \(T : \mathcal{H}\to \mathcal{H}\) is a μ-demicontractive mapping such that \(I-T\) is demiclosed at zero.

Condition 4. \(\Psi := \arg \min (f+g) \cap Fix(T)\) is nonempty.

Remark 3.1

It is known from [29] that \(\tilde{x}\in \arg \min (f+g)\) if and only if \(\tilde{x} = \mathrm{prox}_{\tilde{c} g}(I-\tilde{c}\nabla f) \tilde{x}\), where \(\tilde{c}>0\). If \(w_{n} = y_{n} = u_{n} = Tu_{n}\) in Algorithm 1, then \(w_{n}\in \Psi \).
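The fixed point characterization in Remark 3.1 can be checked on a hypothetical one-dimensional instance (not from this article): \(f(x)=\frac{1}{2}(ax-b)^{2}\) and \(g(x)=|x|\) with \(a\neq 0\). From \(0\in a(ax-b)+\partial |x|\), the minimizer is \(\tilde{x}=\mathrm{sign}(ab)\max \{|ab|-1,0\}/a^{2}\), and it should satisfy \(\tilde{x}=\mathrm{prox}_{\tilde{c}g}(\tilde{x}-\tilde{c}\nabla f(\tilde{x}))\) for every \(\tilde{c}>0\).

```python
def soft(v, c):
    """prox of c*|.| on R (soft-thresholding)."""
    return max(abs(v) - c, 0.0) * (1.0 if v >= 0 else -1.0)

def lasso_1d_minimizer(a, b):
    """Minimizer of (a*x - b)^2 / 2 + |x| for a != 0, namely soft(a*b, 1) / a^2."""
    return soft(a * b, 1.0) / a ** 2

def prox_grad_map(x, a, b, c):
    """x -> prox_{c g}(x - c * grad f(x)) with grad f(x) = a * (a*x - b)."""
    return soft(x - c * a * (a * x - b), c)

a, b = 2.0, 3.0
x_star = lasso_1d_minimizer(a, b)
for c in (0.05, 0.1, 0.25, 1.0):
    # the residual should vanish (up to rounding) for every c > 0
    print(c, prox_grad_map(x_star, a, b, c) - x_star)
```

A non-minimizer is not fixed: for instance, \(x=0\) is mapped to \(0.5\) when \(\tilde{c}=0.1\).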

Algorithm 1
Modified Proximal Gradient Algorithm
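Because the listing of Algorithm 1 is not reproduced in this text, the following Python sketch assembles one iteration solely from the relations that the proof of Theorem 3.2 uses: \(z_{n}=s_{n}+\theta _{n}(s_{n}-s_{n-1})\) and \(w_{n}=z_{n}+\zeta _{n}(z_{n}-s_{n-1})\) (Claims 4 and 5), \(y_{n}=\mathrm{prox}_{\tau _{n}g}(w_{n}-\tau _{n}\nabla f(w_{n}))\) (Claim 2), \(u_{n}=y_{n}+\tau _{n}(\nabla f(w_{n})-\nabla f(y_{n}))\) and \(s_{n+1}=(1-\eta _{n})u_{n}+\eta _{n}Tu_{n}\) (Claim 3). The step-size update is an assumption modeled on [30] and chosen so that (3.2) holds; it may differ from the authors' rule, so this is a reconstruction, not their implementation. The one-dimensional test problem and all names are hypothetical; the parameters mirror setting (iii) of Section 4, except that \(\theta _{n}=1/n^{2}\) is used throughout.

```python
import math

def _sub(x, y):
    return [a - b for a, b in zip(x, y)]

def _norm(x):
    return math.sqrt(sum(a * a for a in x))

def algorithm1_sketch(s0, s1, grad_f, prox_g, T, theta, zeta, eta, q, p,
                      lam=0.6, tau1=0.09, iters=200):
    """Iterate the relations used in the proof of Theorem 3.2 (reconstruction)."""
    s_prev, s, tau = s0, s1, tau1
    for n in range(1, iters + 1):
        z = [a + theta(n) * d for a, d in zip(s, _sub(s, s_prev))]  # first inertial step
        w = [a + zeta(n) * d for a, d in zip(z, _sub(z, s_prev))]   # second inertial step
        gw = grad_f(w)
        y = prox_g([a - tau * g for a, g in zip(w, gw)], tau)      # forward-backward step
        gy = grad_f(y)
        u = [a + tau * (g1 - g2) for a, g1, g2 in zip(y, gw, gy)]  # forward correction
        Tu = T(u)
        s_next = [(1 - eta(n)) * a + eta(n) * b for a, b in zip(u, Tu)]  # relaxed Mann step
        # Assumed step-size rule: ensures tau_{n+1}||grad f(w)-grad f(y)|| <= lam q_n ||w-y||.
        den = _norm(_sub(gw, gy))
        if den > 0:
            tau = min(lam * q(n) * _norm(_sub(w, y)) / den, tau + p(n))
        else:
            tau = tau + p(n)
        s_prev, s = s, s_next
    return s

# Hypothetical 1-D problem: f(x) = (2x - 3)^2 / 2, g = |x|, minimizer 1.25, L = 4.
def grad_f(x):
    return [2.0 * (2.0 * x[0] - 3.0)]

def prox_g(v, c):
    return [max(abs(a) - c, 0.0) * (1.0 if a >= 0 else -1.0) for a in v]

c_tilde = 0.1  # any value in (0, 2/L)

def T(x):
    # T = prox_{c g}(I - c grad f): nonexpansive here, with Fix(T) = argmin(f + g)
    return prox_g([a - c_tilde * g for a, g in zip(x, grad_f(x))], c_tilde)

result = algorithm1_sketch(
    [5.0], [4.0], grad_f, prox_g, T,
    theta=lambda n: 1.0 / n ** 2,
    zeta=lambda n: 1.0 / (5 * n + 2) ** 2,
    eta=lambda n: 0.9,
    q=lambda n: 1.0 + 1.0 / (n + 1),
    p=lambda n: 1.0 / (5 * n + 2) ** 2,
)
print(result)
```

Here T is nonexpansive, so \(\mu =0\) and \(\eta _{n}=0.9\) satisfies \((\mathcal{C}3)\); the iterates approach the common solution \(1.25\).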

We are now prepared for the main convergence theorem.

Theorem 3.2

Let \(\{s_{n}\}\) be generated by Algorithm 1. Assume that the following conditions hold:

\((\mathcal{C}1)\) \(\sum_{n=1}^{\infty} p_{n}<\infty \); \((\mathcal{C}2)\) \(\lim_{n\rightarrow \infty} q_{n}=1\); \((\mathcal{C}3)\) \(\bar{\eta}<\eta _{n}<1-\mu \) for some \(\bar{\eta} > 0\);

\((\mathcal{C}4)\) \(\sum_{n=1}^{\infty}\theta _{n}<\infty \); \((\mathcal{C}5)\) \(\sum_{n=1}^{\infty}\zeta _{n}<\infty \).

Then \(\{s_{n}\}\) converges weakly to an element of Ψ.


Proof

Let \(\tilde{s}\in \Psi \). We prove the following claims.

Claim 1. \(\lim_{n\to \infty}\varrho _{n} = \lambda \), where \(\varrho _{n} = \frac {\lambda q_{n}\tau _{n}}{\tau _{n+1}}\).

Since \(\nabla f\) is L-Lipschitz continuous, if \(\nabla f(w_{n})\neq \nabla f(y_{n})\), then

$$\begin{aligned} \frac{\lambda q_{n} \Vert w_{n}-y_{n} \Vert }{ \Vert \nabla f(w_{n})-\nabla f(y_{n}) \Vert } \geq \frac{\lambda q_{n} \Vert w_{n}-y_{n} \Vert }{L \Vert w_{n}-y_{n} \Vert } = \frac{\lambda q_{n}}{L} \geq \frac{\lambda}{L}. \end{aligned}$$

By using the same technique as in the proof of [30, Lemma 3.1], we obtain

$$\begin{aligned} \lim_{n\to \infty}\tau _{n}=\tau \in \biggl[\min \biggl\{ \tau _{1}, \frac {\lambda}{L} \biggr\} , \tau _{1}+p \biggr], \end{aligned}$$

where \(p = \sum_{n=1}^{\infty} p_{n}\). It follows from \((\mathcal{C}2)\) that

$$\begin{aligned} \lim_{n\to \infty}\varrho _{n} = \lim _{n\to \infty} \frac{\lambda q_{n}\tau _{n}}{\tau _{n+1}} = \lambda . \end{aligned}$$
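Claim 1 only uses the step-size rule and the Lipschitz continuity of \(\nabla f\). The sketch below assumes the rule \(\tau _{n+1}=\min \{\lambda q_{n}\Vert w_{n}-y_{n}\Vert /\Vert \nabla f(w_{n})-\nabla f(y_{n})\Vert ,\ \tau _{n}+p_{n}\}\) (in the spirit of [30]; an assumption, since the algorithm listing is not reproduced here), replaces the iterates by random points, and uses the hypothetical L-Lipschitz gradient \(L\tanh (x)\) of the convex function \(L\log \cosh (x)\). It checks that every \(\tau _{n}\) stays in \([\min \{\tau _{1},\lambda /L\},\ \tau _{1}+p]\), the interval appearing in Claim 1.

```python
import math
import random

LAM, TAU1 = 0.6, 0.09  # lambda and tau_1 from setting (iii) in Section 4

def q(n):
    return 1.0 + 1.0 / (n + 1)

def p(n):
    return 1.0 / (5 * n + 2) ** 2

def step_sizes(L, n_max=500, seed=0):
    """Assumed rule tau_{n+1} = min(LAM q_n |w-y| / |f'(w)-f'(y)|, tau_n + p_n), with random
    points w, y and the hypothetical L-Lipschitz gradient f'(x) = L * tanh(x)."""
    rng = random.Random(seed)
    taus = [TAU1]
    for n in range(1, n_max):
        w, y = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)
        den = abs(L * math.tanh(w) - L * math.tanh(y))
        tau = taus[-1]
        taus.append(min(LAM * q(n) * abs(w - y) / den, tau + p(n)) if den > 0 else tau + p(n))
    return taus

upper = TAU1 + sum(p(n) for n in range(1, 500))  # tau_1 plus a partial sum of p_n
for L in (1.0, 4.0, 20.0):
    taus = step_sizes(L)
    lower = min(TAU1, LAM / L)
    print(L, lower - 1e-12 <= min(taus), max(taus) <= upper + 1e-12)
```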

Claim 2. For any \(n\in \mathbb{N}\),

$$\begin{aligned} \bigl\langle y_{n}-\tilde{s}, y_{n}-w_{n}+ \tau _{n} \bigl(\nabla f(w_{n})- \nabla f(y_{n}) \bigr) \bigr\rangle \leq 0. \end{aligned}$$

By using the definition of \(y_{n}\), we have

$$\begin{aligned} (I-\tau _{n}\nabla f)w_{n}\in (I+\tau _{n} \partial g)y_{n}. \end{aligned}$$

Thus, we can write

$$\begin{aligned} c_{n}= \frac{w_{n}-y_{n}}{\tau _{n}}-\nabla f(w_{n}), \end{aligned}$$

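where \(c_{n}\in \partial g(y_{n})\). By Lemma 2.5, the mapping \(\nabla f+\partial g\) is maximal monotone, and in particular monotone. Since \(\nabla f(y_{n})+c_{n}\in (\nabla f+\partial g)(y_{n})\) and \(0\in (\nabla f+\partial g)(\tilde{s})\), this leads to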

$$\begin{aligned} \bigl\langle y_{n}-\tilde{s}, \nabla f(y_{n})+c_{n} \bigr\rangle \geq 0, \end{aligned}$$

and thus

$$\begin{aligned} \bigl\langle y_{n}-\tilde{s}, y_{n}-w_{n}+ \tau _{n} \bigl(\nabla f(w_{n})- \nabla f(y_{n}) \bigr) \bigr\rangle \leq 0. \end{aligned}$$

Claim 3. For any \(n\in \mathbb{N}\),

$$\begin{aligned} \Vert s_{n+1}-\tilde{s} \Vert ^{2}&\leq \Vert w_{n}-\tilde{s} \Vert ^{2}- \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}-\eta _{n}(1-\mu -\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2}. \end{aligned}$$

From (2.1), we have

$$\begin{aligned} \Vert u_{n}-\tilde{s} \Vert ^{2} &= \bigl\Vert y_{n}-\tilde{s} + \tau _{n} \bigl(\nabla f(w_{n})- \nabla f(y_{n}) \bigr) \bigr\Vert ^{2} \\ &= \bigl\Vert (y_{n}-w_{n})+(w_{n}- \tilde{s}) \bigr\Vert ^{2} + \tau _{n}^{2} \bigl\Vert \nabla f(w_{n})- \nabla f(y_{n}) \bigr\Vert ^{2} \\ &\quad +2\tau _{n} \bigl\langle y_{n}-\tilde{s}, \nabla f(w_{n})-\nabla f(y_{n}) \bigr\rangle \\ &= \Vert y_{n}-w_{n} \Vert ^{2}+ \Vert w_{n}-\tilde{s} \Vert ^{2} + \tau _{n}^{2} \bigl\Vert \nabla f(w_{n})-\nabla f(y_{n}) \bigr\Vert ^{2}+2\langle y_{n}-w_{n}, w_{n}- \tilde{s}\rangle \\ &\quad +2\tau _{n} \bigl\langle y_{n}-\tilde{s}, \nabla f(w_{n})-\nabla f(y_{n}) \bigr\rangle \\ &= \Vert w_{n}-\tilde{s} \Vert ^{2}- \Vert y_{n}-w_{n} \Vert ^{2} + \tau _{n}^{2} \bigl\Vert \nabla f(w_{n})-\nabla f(y_{n}) \bigr\Vert ^{2}+2\langle y_{n}-w_{n}, y_{n}- \tilde{s}\rangle \\ &\quad +2\tau _{n} \bigl\langle y_{n}-\tilde{s}, \nabla f(w_{n})-\nabla f(y_{n}) \bigr\rangle \\ &= \Vert w_{n}-\tilde{s} \Vert ^{2}- \Vert y_{n}-w_{n} \Vert ^{2} + \tau _{n}^{2} \bigl\Vert \nabla f(w_{n})-\nabla f(y_{n}) \bigr\Vert ^{2} \\ &\quad +2 \bigl\langle y_{n}-\tilde{s}, y_{n}-w_{n}+ \tau _{n} \bigl(\nabla f(w_{n})- \nabla f(y_{n}) \bigr) \bigr\rangle . \end{aligned}$$

Using Claim 2, we get

$$\begin{aligned} \Vert u_{n}-\tilde{s} \Vert ^{2} \leq \Vert w_{n}-\tilde{s} \Vert ^{2}- \Vert y_{n}-w_{n} \Vert ^{2} + \tau _{n}^{2} \bigl\Vert \nabla f(w_{n})-\nabla f(y_{n}) \bigr\Vert ^{2}. \end{aligned}\tag{3.1}$$

By the definitions of \(u_{n}\) and \(\tau _{n}\), we have

$$\begin{aligned} \Vert u_{n}-y_{n} \Vert =\tau _{n} \bigl\Vert \nabla f(w_{n})-\nabla f(y_{n}) \bigr\Vert \leq \varrho _{n} \Vert w_{n}-y_{n} \Vert . \end{aligned}\tag{3.2}$$

This together with (3.1) implies that

$$\begin{aligned} \Vert u_{n}-\tilde{s} \Vert ^{2}\leq \Vert w_{n}-\tilde{s} \Vert ^{2}- \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}. \end{aligned}$$

Combining this with (2.3) and the demicontractivity of T, we derive

$$\begin{aligned} \Vert s_{n+1}-\tilde{s} \Vert ^{2} &=(1-\eta _{n}) \Vert u_{n}-\tilde{s} \Vert ^{2}+ \eta _{n} \Vert Tu_{n}-\tilde{s} \Vert ^{2}-\eta _{n}(1-\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &\leq (1-\eta _{n}) \Vert u_{n}-\tilde{s} \Vert ^{2}+\eta _{n} \Vert u_{n}-\tilde{s} \Vert ^{2}+\eta _{n}\mu \Vert Tu_{n}-u_{n} \Vert ^{2}\\ &\quad -\eta _{n}(1-\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &= \Vert u_{n}-\tilde{s} \Vert ^{2}-\eta _{n}(1-\mu -\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &\leq \Vert w_{n}-\tilde{s} \Vert ^{2}- \bigl(1- \varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}- \eta _{n}(1-\mu -\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2}. \end{aligned}$$

Claim 4. \(\lim_{n\rightarrow \infty}\|s_{n}- \tilde{s}\|\) exists.

From Claim 1, we immediately get that \(\lim_{n\to \infty}(1-\varrho _{n}^{2}) = 1-\lambda ^{2} > 0\), and so we can find \(n_{0}\in \mathbb{N}\) such that \(1-\varrho _{n}^{2} > 0\) for all \(n\geq n_{0}\). By the definitions of \(w_{n}\), \(z_{n}\), and using Claim 3, we have, for all \(n\geq n_{0}\),

$$\begin{aligned} \Vert s_{n+1}-\tilde{s} \Vert &\leq \Vert w_{n}- \tilde{s} \Vert \\ &\leq \Vert z_{n}-\tilde{s} \Vert +\zeta _{n} \Vert z_{n}-s_{n-1} \Vert \\ &\leq \Vert s_{n}-\tilde{s} \Vert +\theta _{n} \Vert s_{n}-s_{n-1} \Vert +\zeta _{n}(1+ \theta _{n}) \Vert s_{n}-s_{n-1} \Vert \\ &\leq \Vert s_{n}-\tilde{s} \Vert +\Lambda _{n} \Vert s_{n}-s_{n-1} \Vert \end{aligned}\tag{3.3}$$
$$\begin{aligned} &\leq (1+\Lambda _{n}) \Vert s_{n}-\tilde{s} \Vert + \Lambda _{n} \Vert s_{n-1}- \tilde{s} \Vert , \end{aligned}\tag{3.4}$$

where \(\Lambda _{n} = \theta _{n}+\zeta _{n}(1+\theta _{n})\). By \((\mathcal{C}4)\) and \((\mathcal{C}5)\), we have \(\sum_{n=1}^{\infty}\Lambda _{n}<\infty \), which, together with Lemma 2.6 and (3.4), shows that \(\{s_{n}\}\) is bounded. This implies that

$$\begin{aligned} \sum_{n=1}^{\infty}\Lambda _{n} \Vert s_{n}-s_{n-1} \Vert < \infty , \end{aligned}\tag{3.5}$$

and so, since \(w_{n}-s_{n} = \Lambda _{n}(s_{n}-s_{n-1})\),

$$\begin{aligned} \lim_{n\to \infty} \Vert s_{n}-w_{n} \Vert = \lim_{n\to \infty}\Lambda _{n} \Vert s_{n}-s_{n-1} \Vert = 0. \end{aligned}\tag{3.6}$$

By Lemma 2.7, (3.3), and (3.5), the sequence \(\lbrace \Vert s_{n}-\tilde{s} \Vert \rbrace \) is convergent.

Claim 5. \(\lim_{n\rightarrow \infty}\|s_{n}-u_{n} \| = 0\).

Indeed, applying Claim 3 and (2.2), we have

$$\begin{aligned} \Vert s_{n+1}-\tilde{s} \Vert ^{2} &\leq \Vert z_{n}-\tilde{s} \Vert ^{2}+2\zeta _{n} \langle z_{n}-s_{n-1}, w_{n}-\tilde{s}\rangle \\ &\quad - \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}-\eta _{n}(1-\mu - \eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &\leq \Vert s_{n}-\tilde{s} \Vert ^{2}+2\theta _{n}\langle s_{n}-s_{n-1}, z_{n}- \tilde{s}\rangle +2\zeta _{n}(1+\theta _{n})\langle s_{n}-s_{n-1}, w_{n}- \tilde{s}\rangle \\ &\quad - \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}-\eta _{n}(1-\mu - \eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &\leq \Vert s_{n}-\tilde{s} \Vert ^{2}+2\theta _{n} \Vert s_{n}-s_{n-1} \Vert \Vert z_{n}- \tilde{s} \Vert +2\zeta _{n}(1+\theta _{n}) \Vert s_{n}-s_{n-1} \Vert \Vert w_{n}- \tilde{s} \Vert \\ &\quad - \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}-\eta _{n}(1-\mu - \eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2} \\ &\leq \Vert s_{n}-\tilde{s} \Vert ^{2}+2M\Lambda _{n} \Vert s_{n}-s_{n-1} \Vert \\ &\quad - \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2}-\eta _{n}(1-\mu - \eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2}, \end{aligned}$$

where \(M := \sup_{n\in \mathbb{N}}\{\|z_{n}-\tilde{s}\|, \|w_{n}- \tilde{s}\|\}\). It follows that

$$\begin{aligned} \bigl(1-\varrho _{n}^{2} \bigr) \Vert w_{n}-y_{n} \Vert ^{2} &\leq \Vert s_{n}-\tilde{s} \Vert ^{2}- \Vert s_{n+1}- \tilde{s} \Vert ^{2}+2M\Lambda _{n} \Vert s_{n}-s_{n-1} \Vert \\ &\quad -\eta _{n}(1-\mu -\eta _{n}) \Vert Tu_{n}-u_{n} \Vert ^{2}. \end{aligned}$$

From Claim 4, \((\mathcal{C}3)\), (3.6), and \(\lim_{n\to \infty}(1-\varrho _{n}^{2})> 0\), we obtain

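$$\begin{aligned} \lim_{n\to \infty} \Vert w_{n}-y_{n} \Vert = 0. \end{aligned}\tag{3.7}$$

In the same way, isolating the term containing \(\Vert Tu_{n}-u_{n} \Vert ^{2}\) in the preceding estimate and using \((\mathcal{C}3)\), we obtain

$$\begin{aligned} \lim_{n\to \infty} \Vert Tu_{n}-u_{n} \Vert = 0. \end{aligned}\tag{3.8}$$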

Using \(\lim_{n\to \infty}\varrho _{n} = \lambda \), (3.2), and (3.7), we deduce

$$\begin{aligned} \lim_{n\to \infty} \Vert u_{n}-y_{n} \Vert = 0. \end{aligned}\tag{3.9}$$

Combining this with (3.6) and (3.7), we obtain

$$\begin{aligned} \Vert s_{n}-u_{n} \Vert &\leq \Vert s_{n}-w_{n} \Vert + \Vert w_{n}-y_{n} \Vert + \Vert y_{n}-u_{n} \Vert \to 0 \quad \text{as } n\to \infty . \end{aligned}$$

Claim 6. Every weak sequential cluster point of \(\{s_{n}\}\) belongs to Ψ.

Let \(s^{*}\) be a weak sequential cluster point of \(\{s_{n}\}\), that is, \(s_{n_{k}}\rightharpoonup s^{*}\) as \(k\to \infty \) for some subsequence \(\{s_{n_{k}}\}\) of \(\{s_{n}\}\). By Claim 5, \(u_{n_{k}}\rightharpoonup s^{*}\) as \(k\to \infty \). Combining this with (3.8) and the demiclosedness of \(I-T\) at zero, we obtain \(s^{*}\in Fix(T)\). Moreover, since \(\Vert s_{n}-y_{n}\Vert \leq \Vert s_{n}-w_{n}\Vert +\Vert w_{n}-y_{n}\Vert \to 0\) by (3.6) and (3.7), we also have \(y_{n_{k}}\rightharpoonup s^{*}\). Next, we show that \(s^{*}\in \arg \min (f+g)\). Let \((v, u)\in graph (\nabla f+\partial g )\), that is, \(u-\nabla f(v)\in \partial g(v)\). By the definition of \(y_{n}\), we have \(\frac {w_{n_{k}}-y_{n_{k}}-\tau _{n_{k}}\nabla f(w_{n_{k}})}{\tau _{n_{k}}} \in \partial g(y_{n_{k}})\). By the maximal monotonicity of ∂g, we have

$$\begin{aligned} \biggl\langle v-y_{n_{k}}, u-\nabla f(v)- \frac{w_{n_{k}}-y_{n_{k}}-\tau _{n_{k}}\nabla f(w_{n_{k}})}{\tau _{n_{k}}} \biggr\rangle \geq 0. \end{aligned}$$

Thus, by the monotonicity of \(\nabla f\), we get

$$\begin{aligned} \langle v-y_{n_{k}}, u \rangle &\geq \biggl\langle v-y_{n_{k}}, \nabla f(v)+ \frac{w_{n_{k}}-y_{n_{k}}-\tau _{n_{k}}\nabla f(w_{n_{k}})}{\tau _{n_{k}}} \biggr\rangle \\ &= \bigl\langle v-y_{n_{k}}, \nabla f(v)-\nabla f(y_{n_{k}}) \bigr\rangle + \bigl\langle v-y_{n_{k}}, \nabla f(y_{n_{k}})- \nabla f(w_{n_{k}}) \bigr\rangle \\ &\quad + \biggl\langle v-y_{n_{k}}, \frac{w_{n_{k}}-y_{n_{k}}}{\tau _{n_{k}}} \biggr\rangle \\ &\geq \bigl\langle v-y_{n_{k}}, \nabla f(y_{n_{k}})-\nabla f(w_{n_{k}}) \bigr\rangle +\frac{1}{\tau _{n_{k}}} \langle v-y_{n_{k}}, w_{n_{k}}-y_{n_{k}} \rangle . \end{aligned}$$

It follows from the Lipschitz continuity of \(\nabla f\), \(\lim_{k\to \infty}\tau _{n_{k}}=\tau >0\) (Claim 1), (3.7), and (3.9) that

$$\begin{aligned} \bigl\langle v-s^{*}, u \bigr\rangle = \lim_{k \rightarrow \infty} \langle v-y_{n_{k}}, u \rangle \geq 0, \end{aligned}$$

from which, together with the maximal monotonicity of \(\nabla f+\partial g\), we get that \(s^{*}\in \arg \min (f+g)\). Therefore, \(s^{*}\in \Psi \). Finally, by Lemma 2.8, we conclude that \(\{s_{n}\}\) converges weakly to an element of Ψ. □

4 Signal recovery problem

We consider the signal recovery problem modeled by the linear equation

$$\begin{aligned} b = Ax^{*}+\varepsilon , \end{aligned}$$

where \(x^{*}\in \mathbb{R}^{N}\) is the original signal, \(b\in \mathbb{R}^{M}\) is the observed signal with noise ε, and \(A\in \mathbb{R}^{M\times N}\) \((M < N)\) is a filter matrix. It is well known that recovering \(x^{*}\) from this underdetermined linear equation can be formulated as the LASSO problem:

$$\begin{aligned} \min_{x\in \mathbb{R}^{N}} \frac{1}{2} \Vert Ax-b \Vert ^{2}_{2}+ \Vert x \Vert _{1}. \end{aligned}$$

Hence we can apply our algorithm to this problem with \(f(\cdot ) = \frac{1}{2}\|A(\cdot )-b\|^{2}_{2}\), \(g(\cdot ) = \|\cdot \|_{1}\), and \(T = \mathrm{prox}_{\tilde{c} g}(I-\tilde{c}\nabla f)\), where \(0<\tilde{c}<\frac {2}{\|A\|^{2}_{2}}\). Here \(\nabla f(x)=A^{T}(Ax-b)\) is Lipschitz continuous with constant \(L=\|A\|^{2}_{2}\); for this range of \(\tilde{c}\), T is nonexpansive (hence 0-demicontractive with \(I-T\) demiclosed at zero), and by Remark 3.1, \(Fix(T)=\arg \min (f+g)\). We compare Algorithm 1 numerically with Algorithm 3.1 in [20] (IMFB) and Algorithm 2.1 in [21] (IFBAS). All computations are performed in Matlab R2021a on an iMac (Apple M1 chip with 16GB of RAM). The original signal \(x^{*}\) has d nonzero components drawn from the uniform distribution on \([-2, 2]\), and A is a Gaussian matrix generated by the command \(randn(M, N)\), with signal size \(N = 5000\) and \(M = 2500\). The observation b is generated by adding white Gaussian noise ε with variance \(\sigma ^{2} = 0.01\), and the initial points are generated randomly. Let \(t_{0} = 1\) and \(t_{n} = \frac {1+\sqrt{1+4t_{n-1}^{2}}}{2}\) for all \(n\in \mathbb{N}\). The control parameters of each algorithm are set as follows:
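The experimental setup can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' Matlab code: `random.gauss` stands in for `randn`, the sizes are reduced from \(N=5000\), \(M=2500\) so that it runs quickly, \(\|A\|_{2}^{2}\) is estimated by power iteration, and all function names are hypothetical. The final line checks that one application of T does not increase the LASSO objective, as expected for \(\tilde{c}<2/L\).

```python
import math
import random

def make_problem(N=60, M=30, d=6, sigma2=0.01, seed=0):
    """Small hypothetical instance of b = A x* + eps."""
    rng = random.Random(seed)
    x_true = [0.0] * N
    for i in rng.sample(range(N), d):
        x_true[i] = rng.uniform(-2.0, 2.0)             # d nonzero entries in [-2, 2]
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]  # like randn(M, N)
    std = math.sqrt(sigma2)                             # white Gaussian noise, variance sigma2
    b = [sum(a * x for a, x in zip(row, x_true)) + rng.gauss(0.0, std) for row in A]
    return A, b, x_true

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out

def grad_f(A, b, x):
    """grad f(x) = A^T (A x - b) for f(x) = ||A x - b||_2^2 / 2."""
    return rmatvec(A, [ax - bi for ax, bi in zip(matvec(A, x), b)])

def objective(A, b, x):
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(v * v for v in r) + sum(abs(v) for v in x)

def lipschitz_estimate(A, iters=300, seed=1):
    """Power iteration for the largest eigenvalue of A^T A, i.e. L = ||A||_2^2."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(A[0]))]
    nv = math.sqrt(sum(x * x for x in v))
    v = [x / nv for x in v]
    est = 0.0
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        est = math.sqrt(sum(x * x for x in w))  # ||A^T A v|| with ||v|| = 1
        v = [x / est for x in w]
    return est

def make_T(A, b, c):
    """T = prox_{c g}(I - c grad f) with g = ||.||_1 (soft-thresholding at level c)."""
    def T(x):
        v = [xi - c * gi for xi, gi in zip(x, grad_f(A, b, x))]
        return [max(abs(t) - c, 0.0) * (1.0 if t >= 0 else -1.0) for t in v]
    return T

A, b, x_true = make_problem()
L = lipschitz_estimate(A)
T = make_T(A, b, 1.0 / L)  # c = 1/||A||_2^2, as in setting (iii)
x0 = [0.0] * len(x_true)
print(L, objective(A, b, x0), objective(A, b, T(x0)))
```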

\((i)\) IFBAS: \(\alpha _{1} = 0.09\), \(\delta = 0.6\), and

$$ \theta _{n}= \textstyle\begin{cases} \frac {1}{n^{2}} &\text{if } n > 1500; \\ \frac {t_{n-1}-1}{t_{n}} & \text{otherwise;} \end{cases} $$

\((ii)\) IMFB: \(\lambda _{1} = 0.09\), \(\delta = 0.6\), and

$$ \theta _{n}= \textstyle\begin{cases} \frac {1}{n^{2}} &\text{if } n > 1500; \\ \frac {t_{n-1}-1}{t_{n}} & \text{otherwise;} \end{cases} $$

\((iii)\) Algorithm 1: \(\tilde{c} = \frac {1}{\|A\|^{2}_{2}}\), \(\tau _{1} = 0.09\), \(\lambda = 0.6\), \(\eta _{n} = 0.9\), \(q_{n} = 1+\frac {1}{n+1}\), \(p_{n} = \zeta _{n} = \frac {1}{(5n+2)^{2}}\), and

$$ \theta _{n}= \textstyle\begin{cases} \frac {1}{n^{2}} &\text{if } n > 1500; \\ \frac {t_{n-1}-1}{t_{n}} & \text{otherwise.} \end{cases} $$
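The inertial schedule shared by all three settings can be computed as below (a sketch; names are illustrative). It uses FISTA-type weights \((t_{n-1}-1)/t_{n}\) up to \(n=1500\) and \(1/n^{2}\) afterwards, so only the summable tail matters for \((\mathcal{C}4)\). The sketch also evaluates \(\Lambda _{n}=\theta _{n}+\zeta _{n}(1+\theta _{n})\) from Claim 4 with \(\zeta _{n}=1/(5n+2)^{2}\) from setting (iii) and sums it over \(1500<n\leq 3000\).

```python
import math

def theta_schedule(n_max, switch=1500):
    """theta_n = (t_{n-1} - 1)/t_n for n <= switch and 1/n^2 afterwards, with t_0 = 1."""
    t_prev = 1.0
    thetas = {}
    for n in range(1, n_max + 1):
        t = (1.0 + math.sqrt(1.0 + 4.0 * t_prev ** 2)) / 2.0
        thetas[n] = 1.0 / n ** 2 if n > switch else (t_prev - 1.0) / t
        t_prev = t
    return thetas

def zeta(n):
    return 1.0 / (5 * n + 2) ** 2  # zeta_n = p_n in setting (iii)

thetas = theta_schedule(3000)
# Partial sum of Lambda_n over 1500 < n <= 3000: small, since the tail is summable.
tail = sum(thetas[n] + zeta(n) * (1 + thetas[n]) for n in range(1501, 3001))
print(thetas[1], thetas[2], thetas[1500], thetas[1501], tail)
```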

We measure the accuracy of the recovered signal by the mean-squared error \(MSE_{n} = \frac{1}{N}\|v_{n}-x^{*}\|_{2}^{2}\), where \(\{v_{n}\}\) is the sequence being measured, and each algorithm is stopped once \(MSE_{n}<5\times 10^{-5}\). The numerical results are reported below.
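The accuracy measure and the stopping rule \(MSE_{n}<5\times 10^{-5}\) can be sketched as follows. The one-step update `step` is a hypothetical placeholder for any of the compared solvers; in the toy usage, a simple averaging map that halves the error at each step stands in for an actual algorithm.

```python
def mse(v, x_true):
    """MSE_n = ||v_n - x*||_2^2 / N."""
    return sum((a - b) ** 2 for a, b in zip(v, x_true)) / len(x_true)

def run_until_tolerance(step, v0, x_true, tol=5e-5, max_iter=10000):
    """Apply the hypothetical update `step` until MSE < tol; return (v, iterations)."""
    v = v0
    for k in range(max_iter):
        if mse(v, x_true) < tol:
            return v, k
        v = step(v)
    return v, max_iter

# Toy usage: each step halves the distance to x*, so the MSE shrinks by a factor of 4.
x_true = [1.0, -2.0, 0.0, 0.5]
v, k = run_until_tolerance(lambda v: [0.5 * (a + b) for a, b in zip(v, x_true)],
                           [0.0] * 4, x_true)
print(k, mse(v, x_true))
```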

For \(d = 500\), Fig. 1 shows the original signal, the measurement, and the signals recovered by each algorithm, and Fig. 2 plots the mean-squared error of all three algorithms against the number of iterations. As shown in Table 1, Algorithm 1 requires less CPU time and fewer iterations than IFBAS and IMFB, indicating that, in this experiment, the proposed algorithm outperforms the other two.

Figure 1
The original signal, the measurement, and the recovered signals by the three algorithms in case \(d = 500\)

Figure 2
Illustration of the mean-squared error value versus the number of iterations using the three algorithms in case \(d=500\)

Table 1 The numerical comparison of the three algorithms

Data Availability

No datasets were generated or analysed during the current study.

References


  1. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  2. Deep, A., Abbas, S., Singh, B., Alharthi, M.R., Nisar, K.S.: Solvability of functional stochastic integral equations via Darbo’s fixed point theorem. Alex. Eng. J. 60(6), 5631–5636 (2021)

  3. Qu, Z., Liu, C., Zhu, J., Zhang, Y., Zhou, Y., Wang, L.: Two-step proximal gradient descent algorithm for photoacoustic signal unmixing. J. Photoacoust. 27, 100379 (2022)

  4. Jiang, X., Zeng, X., Sun, J., Chen, J.: Distributed proximal gradient algorithm for non-convex optimization over time-varying networks. IEEE Trans. Control Netw. Syst. 10(2), 1005–1017 (2023)

  5. Khowaja, S.A., Lee, I.H., Dev, K., Jarwar, M.A., Qureshi, N.M.F.: Get your foes fooled: proximal gradient split learning for defense against model inversion attacks on iomt data. IEEE Trans. Netw. Sci. Eng. 10(5), 2607–2616 (2023)

  6. Mouktonglang, T., Poochinapan, K., Suparatulatorn, R.: A parallel method for common variational inclusion and common fixed point problems with applications. Carpath. J. Math. 39(1), 189–200 (2023)

  7. Suantai, S., Inkrong, P., Cholamjiak, P.: Forward–backward–forward algorithms involving two inertial terms for monotone inclusions. Comput. Appl. Math. 42(6), 255 (2023)

  8. Jaggi, M.: An equivalence between the Lasso and support vector machines. In: Suykens, J.A.K., Signoretto, M., Argyriou, A. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 1–26. Chapman and Hall/CRC, Boca Raton (2014)

  9. Li, Y., Bontcheva, K., Cunningham, H.: Adapting SVM for data sparseness and imbalance: a case study in information extraction. Nat. Lang. Eng. 15(2), 241–271 (2009)

  10. Kumar, A., Chatterjee, J.M., Díaz, V.G.: A novel hybrid approach of SVM combined with NLP and probabilistic neural network for email phishing. Int. J. Electr. Comput Syst. Eng. 10(1), 486 (2020)

  11. Salloum, S., Gaber, T., Vadera, S., Shaalan, K.: A systematic literature review on phishing email detection using natural language processing techniques. IEEE Access 10, 65703–65727 (2022)

  12. Afrin, S., Shamrat, F.J.M., Nibir, T.I., Muntasim, M.F., Moharram, M.S., Imran, M.M., Abdulla, M.: Supervised machine learning based liver disease prediction approach with LASSO feature selection. Bull. Electr. Eng. Inform. 10(6), 3369–3376 (2021)

  13. Cholamjiak, W., Das, S.: A modified projective forward-backward splitting algorithm for variational inclusion problems to predict Parkinson’s disease. Appl. Math. Sci. Eng. 32(1), 2314650 (2024)

  14. Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  15. Polyak, B.T.: Introduction to Optimization. Optim. Softw. Inc., New York (1987)

  16. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(O(\frac{1}{k^{2}})\). Sov. Math. Dokl. 27(2), 372–376 (1983)

  17. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  18. Ross, I.M.: Generating Nesterov’s accelerated gradient algorithm by using optimal control theory for optimization. J. Comput. Appl. Math. 423, 114968 (2023)

  19. Oka, T., Misawa, R., Yamada, T.: Nesterov’s acceleration for level set-based topology optimization using reaction-diffusion equations. Appl. Math. Model. 120, 57–78 (2023)

  20. Kesornprom, S., Cholamjiak, P.: A modified inertial proximal gradient method for minimization problems and applications. AIMS Math. 7(5), 8147–8161 (2022)

  21. Kankam, K., Cholamjiak, P.: Inertial proximal gradient method using adaptive stepsize for convex minimization problems. Thai J. Math. 21(2), 277–287 (2023)

  22. Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4(3), 506–510 (1953)

  23. Maingé, P.E.: Convergence theorems for inertial KM-type algorithms. J. Comput. Appl. Math. 219, 223–236 (2008)

  24. Dong, Q.L., Cho, Y.J., Rassias, T.M.: General inertial Mann algorithms and their convergence analysis for nonexpansive mappings. In: Rassias, T.M. (ed.) Applications of Nonlinear Analysis, pp. 175–191 (2018)

  25. Zhou, H., Qin, X.: Fixed Points of Nonlinear Operators. Iterative Methods. de Gruyter, Berlin (2020)

  26. Brézis, H.: Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert. Math. Studies, vol. 5. North-Holland, Amsterdam (1973)

  27. Hanjing, A., Suantai, S.: A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 8(3), 378 (2020)

  28. Auslender, A., Teboulle, M., Ben-Tiba, S.: A logarithmic-quadratic proximal method for variational inequalities. Comput. Optim. Appl. 12, 31–40 (1999)

  29. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011)

  30. Liu, H., Yang, J.: Weak convergence of iterative methods for solving quasimonotone variational inequalities. Comput. Optim. Appl. 77, 491–508 (2020)



This research work was partially supported by the CMU Proactive Researcher, Chiang Mai University [grant number 818/2566] and the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F650018].


Author information



Conceptualization: R.S.; Writing-original draft: T.M., R.S.; Software: W.C. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Raweerote Suparatulatorn.

Ethics declarations

Competing interests

The authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Mouktonglang, T., Chaiwino, W. & Suparatulatorn, R. A proximal gradient method with double inertial steps for minimization problems involving demicontractive mappings. J Inequal Appl 2024, 69 (2024).

  • Received:

  • Accepted:

  • Published:

  • DOI:

Mathematics Subject Classification