A proximal gradient method with double inertial steps for minimization problems involving demicontractive mappings

In this article, we present a novel proximal gradient method based on double inertial steps for finding fixed points of demicontractive mappings and solving minimization problems. We establish a weak convergence theorem for the proposed method and illustrate its performance with a numerical example from signal recovery.


Introduction
Optimization and fixed point problems are fundamental mathematical concepts with wide-ranging applications across various fields, including engineering, medicine, signal processing, and image processing. Engineers routinely need to minimize costs, optimize designs, or maximize system efficiency, all of which can be framed as optimization problems. In parallel, fixed point theorems play a significant role in particular engineering challenges, providing a robust mathematical framework for establishing the existence of solutions in diverse settings. Signal processing benefits substantially from fixed point formulations, particularly within optimization methodology, which offers a resilient framework for denoising and restoration tasks. Notably, the least absolute shrinkage and selection operator (LASSO) [1] is a pivotal optimization problem that plays a critical role in signal reconstruction; acknowledged for its efficacy within the compressed sensing paradigm, the LASSO enjoys widespread recognition in the signal processing literature. Within image processing, optimization techniques and fixed point methods are likewise of paramount importance, proving invaluable for challenges such as image deblurring and image inpainting (see [2–7] for comprehensive information).
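For concreteness, the LASSO can be written in the following standard form, where the regularization weight, denoted here by μ > 0, is a modeling choice:

$$
\min_{x \in \mathbb{R}^N} \ \frac{1}{2}\|Ax - b\|_2^2 + \mu\,\|x\|_1,
$$

with A a measurement matrix and b the observed data; this is exactly the form used in the signal recovery experiment at the end of this paper.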
In 2014, Jaggi [8] showed an equivalence between the LASSO and support vector machines (SVMs) in the following sense: given any SVM with L2-norm loss, a corresponding LASSO formulation has the same optimal solutions, and vice versa. As a result, each problem can be translated into the other. A further consequence is that the sparsity of a LASSO solution equals the number of support vectors of the corresponding SVM, and many useful properties and sublinear-time algorithms for SVMs arise naturally from LASSO properties. SVMs are commonly used for classification and regression tasks and have an extensive list of applications in natural language processing (NLP), particularly in information extraction and email phishing detection. They are highly effective in information extraction tasks such as named entity recognition, text categorization, and relation extraction, identifying entities and patterns in unstructured text, as demonstrated in [9]. In email phishing detection [10, 11], SVMs use features such as sender addresses and message content to perform binary classification, distinguishing valid from questionable emails. SVMs can effectively detect anomalies in email traffic, but their success depends on the quality of the features, the data representation, and the training dataset. Ensemble approaches, which combine SVMs with other models, improve the effectiveness of phishing detection; this underscores the importance of regularly updating such systems to respond to shifting phishing strategies. Moreover, in 2021, Afrin et al. [12] employed SVMs in conjunction with LASSO feature selection to predict liver disease. More recently, Cholamjiak and Das [13] developed a modified projective forward-backward splitting algorithm for several models, including the LASSO, aimed at predicting Parkinson's disease via the extreme learning machine.
Prominent optimization techniques for minimizing the sum of a smooth function and a nonsmooth function include the proximal gradient algorithm [14] (see also [15]). This method alternates a gradient step on the smooth function with an application of the proximity operator of the nonsmooth function. It is widely recognized that adding inertia, also referred to as Nesterov's acceleration [16], can notably enhance both the theoretical and the practical convergence rate of this approach. The recent surge in popularity of Nesterov's acceleration [16] has spurred numerous variants, such as those detailed in [17–19]. Particularly noteworthy is the fast iterative shrinkage-thresholding algorithm (FISTA) of Beck and Teboulle [17], which attains a convergence rate matching Nesterov's optimal gradient method for convex composite objective functions.
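To make the forward–backward structure concrete, the following is a minimal Matlab sketch of FISTA applied to the LASSO. It follows the textbook form of [17]; the function name, the fixed step size 1/‖A‖², and stopping after a fixed iteration count are our illustrative choices, not prescriptions from this paper.

```matlab
% Minimal FISTA sketch for min_x 0.5*||A*x - b||^2 + mu*||x||_1.
function x = fista_lasso(A, b, mu, iters)
    c = 1 / norm(A)^2;           % step size 1/L, L = Lipschitz constant of grad f
    x = zeros(size(A, 2), 1);    % current iterate x_n
    y = x;                       % extrapolated (inertial) point
    t = 1;                       % Nesterov parameter t_n
    for n = 1:iters
        grad = A' * (A * y - b);                     % forward (gradient) step on f
        z    = y - c * grad;
        xnew = sign(z) .* max(abs(z) - c * mu, 0);   % backward (proximal) step: soft-thresholding
        tnew = (1 + sqrt(1 + 4 * t^2)) / 2;          % inertial parameter update
        y    = xnew + ((t - 1) / tnew) * (xnew - x); % inertial extrapolation
        x    = xnew;
        t    = tnew;
    end
end
```

The inertial extrapolation in the penultimate line of the loop is the acceleration mechanism that double inertial schemes, such as the one proposed below, extend by carrying two extrapolation terms.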
Throughout this article, denote by H a real Hilbert space with inner product ⟨·, ·⟩ and associated norm ‖·‖. Let R and N be the sets of real numbers and nonnegative integers, respectively. We are interested in the following minimization problem:

$$
\min_{x \in H} \ f(x) + g(x), \tag{1.1}
$$

where f : H → R and g : H → (−∞, +∞] belong to the class of proper, lower semicontinuous (l.s.c.), and convex functions on H. Furthermore, the function f is assumed to be differentiable with L-Lipschitz continuous gradient ∇f. The set of minimizers of f + g is denoted by arg min(f + g). It is well known that

$$
x^* \in \arg\min(f + g) \quad \Longleftrightarrow \quad 0 \in \nabla f(x^*) + \partial g(x^*),
$$

where ∂g is the subdifferential of g. Recently, Kesornprom and Cholamjiak [20] introduced a new proximal gradient method that integrates the inertial technique with an adaptive step size, demonstrating its effectiveness on the minimization problem (1.1); the algorithm was applied to X-ray image deblurring. Similarly, Kankam and Cholamjiak [21] investigated image restoration as a mathematical model via the minimization problem (1.1). Next, we consider the following fixed point problem:

$$
\text{find } x \in H \text{ such that } Tx = x, \tag{1.2}
$$

where T : H → H is a mapping. We denote by Fix(T) the fixed point set of T. The Mann iteration [22] is prominent among the algorithms frequently employed for solving the fixed point problem (1.2). In 2008, Maingé [23] introduced an algorithm that integrates the inertial technique with the Mann iteration to address the fixed point problem (1.2); under certain conditions, the iterative sequence generated by this algorithm converges weakly to a fixed point of a nonexpansive mapping. The general inertial Mann iteration for a nonexpansive mapping was introduced by Dong et al. in 2018 [24], and the method in [23] is a specific instance of it. According to [24], the sequence generated by the general inertial Mann iteration converges weakly to a fixed point under suitable conditions.
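For orientation, the general inertial Mann iteration of [24] can be written, up to renaming of the parameters, as two inertial extrapolations with possibly different coefficients followed by a Mann step:

$$
\begin{aligned}
y_n &= x_n + \alpha_n (x_n - x_{n-1}),\\
z_n &= x_n + \beta_n (x_n - x_{n-1}),\\
x_{n+1} &= (1 - \lambda_n)\, y_n + \lambda_n T z_n,
\end{aligned}
$$

which reduces to the method of [23] when the two inertial coefficients coincide. This two-parameter structure is the prototype of the double inertial steps used in Algorithm 1 below.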
Drawing upon insights from the preceding research, this paper proposes a novel proximal gradient method that incorporates the general inertial Mann iteration, and we obtain a weak convergence theorem for solving both the minimization problem (1.1) and the fixed point problem (1.2) associated with a demicontractive mapping, subject to specified control conditions. Furthermore, the efficacy of the proposed algorithm is demonstrated on a signal recovery problem, underscoring its practical utility.

Preliminaries
To establish our primary result, this section provides the necessary definitions and lemmas. We use the symbol → to represent strong convergence and ⇀ to represent weak convergence. Let s, t ∈ H and η ∈ R. Then we have

$$
\|s + t\|^2 = \|s\|^2 + 2\langle s, t\rangle + \|t\|^2
$$

and

$$
\|\eta s + (1 - \eta)t\|^2 = \eta\|s\|^2 + (1 - \eta)\|t\|^2 - \eta(1 - \eta)\|s - t\|^2. \tag{2.1}
$$

Definition 2.4 [25] Suppose Fix(T) ≠ ∅. Then I − T is said to be demiclosed at zero if, for any {s_n} ⊂ H, the following implication holds:

$$
s_n \rightharpoonup s \ \text{ and } \ (I - T)s_n \to 0 \ \implies \ s \in \operatorname{Fix}(T).
$$

Lemma 2.5 [26] If T is a Lipschitz continuous and monotone mapping and G is a maximal monotone mapping, then the mapping T + G is maximal monotone.
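For completeness, we also recall the standard notion used in Condition 3 below: a mapping T : H → H with Fix(T) ≠ ∅ is called μ-demicontractive, with μ ∈ [0, 1), if

$$
\|Ts - s^*\|^2 \le \|s - s^*\|^2 + \mu\,\|s - Ts\|^2 \quad \text{for all } s \in H,\ s^* \in \operatorname{Fix}(T).
$$

Every nonexpansive mapping with a fixed point is 0-demicontractive, so this class strictly contains the nonexpansive mappings discussed in the introduction.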
Lemma 2.6 [27] Let {x_n} and {ε_n} be nonnegative sequences of real numbers satisfying x_{n+1} ≤ (1 + ε_n)x_n + ε_n for all n ∈ N and ∑_{n=1}^∞ ε_n < ∞. Then {x_n} is bounded.

Lemma 2.7 [28] Let {x_n} and {y_n} be sequences of nonnegative real numbers such that ∑_{n=1}^∞ y_n < ∞ and x_{n+1} ≤ x_n + y_n for all n ∈ N. Then {x_n} is a convergent sequence.

Lemma 2.8 (Opial) Let {s_n} be a sequence in H and Ω a nonempty subset of H. If, for every s* ∈ Ω, {‖s_n − s*‖} converges and every weak sequential cluster point of {s_n} belongs to Ω, then {s_n} converges weakly to a point in Ω.
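We also recall the proximity operator that underlies the backward step of the method: for a proper, l.s.c., convex g and c > 0,

$$
\operatorname{prox}_{cg}(x) = \operatorname*{arg\,min}_{y \in H}\Big\{ g(y) + \tfrac{1}{2c}\,\|x - y\|^2 \Big\},
$$

which is single-valued with full domain. Remark 3.1 below restates the optimality condition for (1.1) as the fixed point equation x = prox_{cg}(I − c∇f)x.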

Main result
We first assume that the following conditions are satisfied for the convergence analysis of our algorithm:

Condition 1. f : H → R and g : H → (−∞, +∞] are two proper, l.s.c., and convex functions.

Condition 2. f is differentiable and has an L-Lipschitz continuous gradient ∇f.

Condition 3. T : H → H is a μ-demicontractive mapping such that I − T is demiclosed at zero.

Condition 4. The solution set Ω := arg min(f + g) ∩ Fix(T) is nonempty.

Algorithm 1 Modified Proximal Gradient Algorithm
Initialization: Select arbitrary elements s_0, s_1 ∈ H and let λ ∈ (0, 1).
Iterative Steps: Construct {s_n} by using the following steps:
Step 1. Compute w_n, y_n, and u_n. If w_n = y_n = u_n = Tu_n, then stop and w_n ∈ Ω. Otherwise, go to Step 2.
Step 2. Update the step size τ_{n+1} and the iterate s_{n+1}, replace n with n + 1, and return to Step 1.

Remark 3.1 It is known from [29] that x ∈ arg min(f + g) if and only if x = prox_{cg}(I − c∇f)x, where c > 0. In particular, if w_n = y_n = u_n = Tu_n in Algorithm 1, then w_n ∈ Ω.
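The overall control flow of a double-inertial proximal gradient scheme of this type can be sketched in Matlab as follows. The concrete choices in the sketch, namely constant inertial weights θ and ζ, a fixed step size τ, and a fixed Mann coefficient η, are simplifying placeholders rather than the exact update rules of Algorithm 1, which uses an adaptively updated τ_n.

```matlab
% Schematic double-inertial proximal gradient loop (placeholder update rules,
% not the exact formulas of Algorithm 1).
% gradf: handle for grad f; proxg(x, tau): handle for prox of tau*g; T: handle for T.
function s = double_inertial_prox_grad(gradf, proxg, T, s0, s1, iters)
    theta = 0.3;   % first inertial weight (placeholder)
    zeta  = 0.1;   % second inertial weight (placeholder)
    tau   = 0.09;  % step size (Algorithm 1 updates this adaptively; fixed here)
    eta   = 0.9;   % Mann coefficient, matching the experiments below
    sprev = s0;  s = s1;
    for n = 1:iters
        w = s + theta * (s - sprev);        % first inertial extrapolation
        z = s + zeta  * (s - sprev);        % second inertial extrapolation
        y = proxg(w - tau * gradf(w), tau); % forward-backward step at w
        snext = (1 - eta) * z + eta * T(y); % Mann-type step toward Fix(T)
        sprev = s;  s = snext;
    end
end
```

When T is the identity, the Mann-type step collapses and the loop becomes a purely inertial forward–backward scheme. We are now prepared for the main convergence theorem.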
Theorem 3.2 Let {s_n} be generated by Algorithm 1, and assume that conditions (C1)–(C5) on the parameter sequences hold. Then {s_n} converges weakly to a point of Ω.
Proof Let s ∈ Ω. We prove the following claims.
Claim 1. lim_{n→∞} δ_n = λ, where δ_n = λq_nτ_n/τ_{n+1}. Since ∇f is an L-Lipschitz continuous mapping, if ∇f(w_n) ≠ ∇f(y_n), then ‖∇f(w_n) − ∇f(y_n)‖ ≤ L‖w_n − y_n‖ yields a positive lower bound for the step size ratio. Using the same technique as in the proof of [30, Lemma 3.1], we obtain the desired limit, where p = ∑_{n=1}^∞ p_n; the claim then follows from (C2).

Claim 2. For any n ∈ N, the stated estimate holds. By the definition of y_n, we obtain the associated inclusion, where c_n ∈ ∂g(y_n). By Lemma 2.5, the mapping ∇f + ∂g is maximal monotone, which leads to the intermediate inequality and thus to the claim.

Claim 3. From (2.1) and Claim 2, we get the first estimate. By the definitions of u_n and τ_n, this together with (3.1) implies the second estimate. Applying this to (2.3) and the demicontractiveness of T, we derive the claim.

Claim 4. lim_{n→∞} ‖s_n − s‖ exists.
From Claim 1, we immediately get that lim_{n→∞}(1 − δ_n²) = 1 − λ² > 0, and so we can find n_0 ∈ N such that 1 − δ_n² > 0 for all n ≥ n_0. By the definitions of w_n and z_n, and using Claim 3, we obtain, for all n ≥ n_0, the estimate (3.4), where ε_n = θ_n + ζ_n(1 + θ_n). By (C4) and (C5), we have ∑_{n=1}^∞ ε_n < ∞, which together with Lemma 2.6 and (3.4) shows that {s_n} is bounded. This yields (3.5), and it follows from Lemma 2.7, (3.3), and (3.5) that {‖s_n − s‖} is convergent.

Claim 5. lim_{n→∞} ‖s_n − u_n‖ = 0. Indeed, applying Claim 3 and (2.2), we obtain the estimate (3.6). From Claim 4, (C3), (3.6), and lim_{n→∞}(1 − δ_n²) > 0, we obtain (3.7); in view of (3.6) and (3.7), this implies (3.8).

Claim 6. Every weak sequential cluster point of {s_n} belongs to Ω. Let s* be a weak sequential cluster point of {s_n}, meaning that s_{n_k} ⇀ s* as k → ∞ for some subsequence {s_{n_k}} of {s_n}. By Claim 5, u_{n_k} ⇀ s* as k → ∞. Together with (3.8) and the demiclosedness of I − T at zero, this gives s* ∈ Fix(T). Next, we show that s* ∈ arg min(f + g). Let (v, u) ∈ graph(∇f + ∂g), that is, u − ∇f(v) ∈ ∂g(v). By the definition of y_n, we have c_{n_k} ∈ ∂g(y_{n_k}). By the maximal monotonicity of ∂g and then the monotonicity of ∇f, we obtain the corresponding inequalities. It follows from the Lipschitz continuity of ∇f, (3.7), and (3.9) that ⟨v − s*, u⟩ ≥ 0, from which, together with the maximal monotonicity of ∇f + ∂g, we conclude that 0 ∈ (∇f + ∂g)(s*), that is, s* ∈ arg min(f + g). Therefore, s* ∈ Ω. Finally, by Lemma 2.8, we conclude that {s_n} converges weakly to a point of Ω. This completes the proof.

Signal recovery problem
We consider the signal recovery problem modeled by the linear equation

$$
b = Ax^* + \varepsilon,
$$

where x* ∈ R^N is the original signal, b ∈ R^M is the observed signal with noise ε, and A ∈ R^{M×N} (M < N) is a filter matrix. It is well known that solving this linear equation is equivalent to solving the LASSO problem

$$
\min_{x \in \mathbb{R}^N} \ \frac{1}{2}\|Ax - b\|_2^2 + \mu\,\|x\|_1,
$$

so we can apply our algorithm with f(x) = (1/2)‖Ax − b‖₂² and g(x) = μ‖x‖₁. We present a numerical comparison of Algorithm 1 with Algorithm 3.1 in [20] (IMFB) and Algorithm 2.1 in [21] (IFBAS). All computations are performed in Matlab R2021a on an iMac (Apple M1 chip with 16 GB of RAM). The original signal x* is generated by the uniform distribution on [−2, 2] with d nonzero components, and A is a Gaussian matrix generated by the command randn(M, N), where the signal size is N = 5000 and M = 2500. The observation b is generated by adding white Gaussian noise ε with variance σ² = 0.01, and the initial points are randomly generated. Let t_0 = 1 and t_n = (1 + √(1 + 4t_{n−1}²))/2 for all n ∈ N. The control parameters of each algorithm are chosen as follows:
(i) IFBAS: α_1 = 0.09, δ = 0.6, and the inertial parameter equal to (t_{n−1} − 1)/t_n for n ≤ 1500 and modified for n > 1500;
(ii) IMFB: λ_1 = 0.09, δ = 0.6, and the inertial parameter equal to (t_{n−1} − 1)/t_n for n ≤ 1500 and modified for n > 1500;
(iii) Algorithm 1: c = 1, τ_1 = 0.09, λ = 0.6, η_n = 0.9, q_n = 1 + 1/(n + 1), p_n = ζ_n = 1/(5n + 2)², and θ_n = (t_{n−1} − 1)/t_n for n ≤ 1500 and modified for n > 1500.
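Under this setup, a test instance can be generated along the following lines; the choice of support via randperm and the exact noise call are our assumptions, since the text specifies only the distributions and the noise variance.

```matlab
% Generate a test instance of the signal recovery experiment (assumed details:
% support chosen uniformly at random; noise drawn as sqrt(0.01)*randn).
N = 5000;  M = 2500;  d = 500;             % signal size, measurements, sparsity
x_true = zeros(N, 1);
idx = randperm(N, d);                      % support of the sparse signal
x_true(idx) = -2 + 4 * rand(d, 1);         % nonzero entries uniform on [-2, 2]
A = randn(M, N);                           % Gaussian filter matrix
b = A * x_true + sqrt(0.01) * randn(M, 1); % observation with noise variance 0.01
% Mean-squared error used as the stopping criterion (< 5e-5):
mse = @(v) norm(v - x_true)^2 / N;
```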
We measure the accuracy of the signal recovery by the mean-squared error, defined as MSE_n = (1/N)‖v_n − x*‖₂², where {v_n} is the sequence being measured; each algorithm is run until MSE_n < 5 × 10⁻⁵. The numerical results are illustrated next.
When d = 500, Fig. 1 illustrates the original signal, the measurement, and the signals recovered by each algorithm. Figure 2 displays the mean-squared error for all three algorithms in the same scenario. As shown in Table 1, our algorithm improves the CPU time and reduces the number of iterations compared with IFBAS and IMFB, indicating that the new algorithm outperforms the other two.


Figure 1 The original signal, the measurement, and the recovered signals by the three algorithms in the case d = 500

Figure 2 The mean-squared error of the three algorithms in the case d = 500

Table 1 The numerical comparison of the three algorithms in terms of the number of iterations and CPU time