# On convergence and complexity analysis of an accelerated forward–backward algorithm with linesearch technique for convex minimization problems and applications to data prediction and classification

## Abstract

In this work, we introduce a new accelerated algorithm using a linesearch technique for solving convex minimization problems in the form of a sum of two lower semicontinuous convex functions. Weak convergence of the proposed algorithm is established without assuming Lipschitz continuity of the gradient of the objective function. Moreover, the complexity of this algorithm is also analyzed. Some numerical experiments in machine learning, namely regression and classification problems, are also discussed. Furthermore, in our experiments, we evaluate the convergence behavior of this new algorithm and compare it with various algorithms from the literature. In these experiments, our algorithm performs better than the others.

## Introduction

In this paper, we study the convex minimization problem in the form of a summation of two convex functions. It can be expressed as follows:

$$\min_{x \in H} \bigl\{ f(x) + g(x)\bigr\} ,$$
(1)

where $$f,g: H \rightarrow \mathbb{R}\cup \{+\infty \}$$ are proper, lower semicontinuous convex functions and H is a Hilbert space. This problem has been analyzed extensively due to its applications in major subjects such as physics, economics, engineering, statistics, and computer science. Some examples of the applications are compressed sensing, signal and image processing, medical image reconstruction, automatic control systems, and machine learning tasks in the form of data prediction and data classification. As seen in  and the references therein, these problems can be formulated as (1).

In the case that f is differentiable, then $$x^{*}$$ solves (1) if and only if

$$x^{*} = \mathrm{prox}_{\alpha g}(I - \alpha \triangledown f) \bigl(x^{*}\bigr),$$
(2)

where $$\alpha >0$$, $$\mathrm{prox}_{\alpha g}(x^{*}) = J_{\alpha }^{\partial g}(x^{*}) = (I + \alpha \partial g)^{-1}(x^{*})$$, ∂g is the subdifferential of g, and I is the identity mapping. One of the most famous algorithms for solving (1) is the forward–backward algorithm  which is defined in the following form:

$$x_{n+1} = \mathrm{prox}_{\alpha _{n} g}(I - \alpha _{n} \triangledown f) (x_{n}) \quad \text{for all } n \in \mathbb{N},$$
(3)

where $$\alpha _{n}$$ is a suitable step size. This method has been studied and improved in many works; see [2, 3, 9, 10] for examples. Most of these works assume that $$\triangledown f$$ is L-Lipschitz continuous, which might be challenging to verify in general. So, in this work, we turn our attention to iterative methods for which the Lipschitz continuity of $$\triangledown f$$ is not required.
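
As a minimal illustration of iteration (3), the following Python sketch runs the forward–backward step in $$H = \mathbb{R}$$ on a toy problem that is our own assumption, not taken from the source: $$f(x) = \frac{1}{2}(x-3)^{2}$$ and $$g(x) = |x|$$, whose minimizer is $$x^{*} = 2$$. The names `soft_threshold`, `grad_f`, and `forward_backward_step`, as well as the constant step size 0.5, are hypothetical choices; a constant step is admissible here only because the gradient of this particular f is 1-Lipschitz.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.| on the real line."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def grad_f(x):
    # Toy smooth term f(x) = 0.5*(x - 3)^2 (an assumption for illustration).
    return x - 3.0


def forward_backward_step(x, alpha):
    # Forward (gradient) step on f, then backward (proximal) step on g = |.|.
    return soft_threshold(x - alpha * grad_f(x), alpha)


x = 0.0
for _ in range(60):
    x = forward_backward_step(x, 0.5)

# Residual of the fixed-point characterization (2) at the known minimizer 2.
residual = abs(forward_backward_step(2.0, 0.5) - 2.0)
```

In this toy setting each step halves the distance to the minimizer, so `x` ends up numerically at 2 and `residual` vanishes.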

In 2016, Cruz and Nghia  replaced the L-Lipschitz continuity of $$\triangledown f$$ with the following conditions.

1. A1.

f, g are proper lower semicontinuous convex functions with $$\operatorname{dom} g \subseteq \operatorname{dom} f$$,

2. A2.

f is differentiable on an open set containing $$\operatorname{dom} g$$, and $$\triangledown f$$ is uniformly continuous on any bounded subset of $$\operatorname{dom} g$$ and maps any bounded subset of $$\operatorname{dom} g$$ to a bounded set in H.

Moreover, the authors introduced a linesearch technique as follows:

### Linesearch 1

Given $$x \in \operatorname{dom} g$$, $$\sigma > 0$$, $$\theta \in (0,1)$$, and $$\delta > 0$$.

Input Set $$\alpha = \sigma$$.

While $$\alpha \| \triangledown f(\mathrm{prox}_{\alpha g}(x - \alpha \triangledown f(x))) - \triangledown f(x) \| > \delta \| \mathrm{prox}_{\alpha g}(x - \alpha \triangledown f(x)) - x \|$$

Set $$\alpha = \theta \alpha$$

End While

Output α.
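
The loop in Linesearch 1 can be sketched in Python as follows. This is only an illustrative sketch: the helper names (`norm`, `sub`, `fb_step`, `linesearch1`) and the toy problem $$f(x) = \frac{1}{2}\|x - c\|^{2}$$, $$g(x) = \|x\|_{1}$$ are assumptions made here, not part of the source. For this particular f the stopping test reduces to $$\alpha \leq \delta$$ whenever the forward–backward point differs from x.

```python
import math


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


# Toy data (assumption): f(x) = 0.5*||x - c||^2 and g(x) = lam*||x||_1.
c, lam = [3.0, -0.5], 1.0


def grad_f(x):
    return sub(x, c)


def prox_g(v, step):
    # Componentwise soft-thresholding with threshold step*lam.
    tau = step * lam
    return [math.copysign(max(abs(t) - tau, 0.0), t) for t in v]


def fb_step(x, alpha):
    # prox_{alpha g}(x - alpha * grad f(x))
    g = grad_f(x)
    return prox_g([xi - alpha * gi for xi, gi in zip(x, g)], alpha)


def linesearch1(x, sigma, theta, delta):
    alpha = sigma
    while True:
        p = fb_step(x, alpha)
        lhs = alpha * norm(sub(grad_f(p), grad_f(x)))
        rhs = delta * norm(sub(p, x))
        if lhs <= rhs:      # negation of the While condition
            return alpha
        alpha *= theta      # backtrack: alpha = theta * alpha


alpha_out = linesearch1([0.0, 0.0], sigma=1.0, theta=0.5, delta=0.4)
```

With $$\sigma = 1$$, $$\theta = 0.5$$, $$\delta = 0.4$$, and $$x = (0,0)$$, the trial step is halved twice and the sketch returns 0.25.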

They asserted that Linesearch 1 terminates after a finite number of iterations and introduced the following algorithm:

### Algorithm 1

Given $$x_{1} \in \operatorname{dom} g$$, $$\sigma > 0$$, $$\theta \in (0,1)$$, and $$\delta \in (0,\frac{1}{2})$$. For $$n \in \mathbb{N}$$,

$$x_{n+1} = \mathrm{prox}_{\gamma _{n} g}(I - \gamma _{n} \triangledown f) (x_{n}),$$
(4)

where $$\gamma _{n} := \textbf{Linesearch}\text{ 1} (x_{n},\sigma ,\theta , \delta )$$. They proved a weak convergence theorem for (4) under assumptions A1 and A2.

Following the idea of Cruz and Nghia, very recently, Kankam et al.  introduced a new linesearch technique as follows.

### Linesearch 2

Given $$x \in \operatorname{dom} g$$, $$\sigma > 0$$, $$\theta \in (0,1)$$, and $$\delta > 0$$. Define

\begin{aligned}& L(x,\alpha ) = \mathrm{prox}_{\alpha g}\bigl(x-\alpha \triangledown f(x) \bigr)\quad \text{and } \\& S(x,\alpha ) = \mathrm{prox}_{\alpha g}\bigl(L(x,\alpha )-\alpha \triangledown f\bigl(L(x, \alpha )\bigr)\bigr). \end{aligned}

Input Set $$\alpha = \sigma$$.

While

\begin{aligned}& \alpha \max \bigl\{ \bigl\Vert \triangledown f\bigl(S(x,\alpha )\bigr) - \triangledown f\bigl(L(x, \alpha )\bigr) \bigr\Vert , \bigl\Vert \triangledown f\bigl(L(x,\alpha )\bigr) - \triangledown f(x) \bigr\Vert \bigr\} \\& \quad > \delta \bigl( \bigl\Vert S(x,\alpha ) - L(x,\alpha ) \bigr\Vert + \bigl\Vert L(x,\alpha ) - x \bigr\Vert \bigr) \end{aligned}

Set $$\alpha = \theta \alpha$$

End While

Output α.

They showed that Linesearch 2 terminates after finitely many iterations, and then established the following two-step algorithm.

### Algorithm 2

Given $$x_{1} \in \operatorname{dom} g$$, $$\sigma > 0$$, $$\theta \in (0,1)$$, and $$\delta \in (0,\frac{1}{8})$$. For $$n \in \mathbb{N}$$,

$$\textstyle\begin{cases} y_{n} = \mathrm{prox}_{\gamma _{n} g}(x_{n} - \gamma _{n} \triangledown f(x_{n}) ), \\ x_{n+1} = \mathrm{prox}_{\gamma _{n} g}(y_{n} - \gamma _{n} \triangledown f(y_{n}) ), \end{cases}$$
(5)

where $$\gamma _{n} := \textbf{Linesearch}\text{ 2}(x_{n},\sigma ,\theta ,\delta )$$. They proved that this algorithm converges weakly to a solution of (1) under assumptions A1 and A2.
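
To make the two-step structure concrete, here is a hedged Python sketch of Linesearch 2 and iteration (5) in $$H = \mathbb{R}$$. The toy problem ($$f(x) = \frac{1}{2}(x-3)^{2}$$, $$g(x) = |x|$$, minimizer 2), the function names, and the parameter values are assumptions chosen for illustration; they are not taken from the source.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.| on the real line."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def grad_f(x):
    # Toy smooth term f(x) = 0.5*(x - 3)^2 (an assumption for illustration).
    return x - 3.0


def fb(x, alpha):
    # prox_{alpha g}(x - alpha * grad f(x)) with g = |.|
    return soft_threshold(x - alpha * grad_f(x), alpha)


def linesearch2(x, sigma, theta, delta):
    alpha = sigma
    while True:
        L = fb(x, alpha)    # L(x, alpha)
        S = fb(L, alpha)    # S(x, alpha)
        lhs = alpha * max(abs(grad_f(S) - grad_f(L)),
                          abs(grad_f(L) - grad_f(x)))
        rhs = delta * (abs(S - L) + abs(L - x))
        if lhs <= rhs:      # negation of the While condition
            return alpha
        alpha *= theta


def algorithm2(x1, sigma, theta, delta, iters):
    x = x1
    for _ in range(iters):
        gamma = linesearch2(x, sigma, theta, delta)
        y = fb(x, gamma)    # first forward-backward step
        x = fb(y, gamma)    # second step with the same gamma
    return x


gamma_first = linesearch2(0.0, sigma=1.0, theta=0.5, delta=0.1)
x_final = algorithm2(0.0, sigma=1.0, theta=0.5, delta=0.1, iters=300)
```

In this sketch the first call to the linesearch backtracks from 1 to 0.125, and the iterates approach the minimizer 2.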

Recently, many authors employed the inertial technique in order to accelerate their algorithms. It was first introduced by Polyak  for solving smooth convex minimization problems. After that many inertial-type algorithms have been introduced and analyzed. For instance, in 2001, Alvarez and Attouch  introduced the idea of an inertial-proximal operator to solve the inclusion problem of a maximal monotone operator A. It was defined as follows:

$$x_{n+1} = J^{A}_{\lambda _{n}}\bigl(x_{n} + \theta _{n}(x_{n} - x_{n-1})\bigr)\quad \text{for all } n \in \mathbb{N},$$

where $$x_{0},x_{1} \in H$$ are given as starting points, and $$\{\lambda _{n}\}$$ and $$\{ \theta _{n} \}$$ are nonnegative real sequences. In this algorithm, $$\theta _{n} (x_{n} - x_{n-1})$$ is regarded as an inertial term.

In 2019, Attouch and Cabot  analyzed the convergence rate of an algorithm called RIPA defined by

$$\textstyle\begin{cases} y_{n} = x_{n} + \theta _{n}(x_{n} - x_{n-1}), \\ x_{n+1} = (1-\rho _{n})y_{n} + \rho _{n}J_{\mu _{n}}^{A}(y_{n}), \end{cases}$$

where A is a maximal monotone operator. Under mild restrictions on the control parameters, they showed that RIPA achieves a fast convergence rate.

Inertial-type algorithms have been proposed and studied widely by many authors, see , which showed that an inertial step improves the convergence rate of algorithms.

There are several approaches to solving (1). Many authors have proposed algorithms for solving inclusion problems. For instance, Moudafi  proposed an algorithm for solving inclusion problems in Hilbert spaces, and Cholamjiak and Shehu  introduced an algorithm for such problems in Banach spaces; we refer to these works for a more comprehensive discussion of inclusion problems and related problems. Under the assumption that $$\triangledown f$$ is Lipschitz continuous, the algorithms proposed in [23, 24] can be used to solve (1).

Another approach to solving (1) is to solve a proximal split feasibility problem. This problem can be reduced to the convex minimization problem (1). Many authors have introduced algorithms for solving this problem; we refer to Shehu and Iyiola  for a more in-depth discussion of this topic.

Inspired by the works mentioned above, we aim to introduce a new two-step algorithm which combines a linesearch technique with an inertial step to improve its performance. We obtain weak convergence of the proposed algorithm to a solution of (1) without assuming $$\triangledown f$$ to be L-Lipschitz continuous. Moreover, the complexity of this algorithm is also analyzed. Then, we apply our algorithm to solving regression and classification problems. Furthermore, we compare the performance of the proposed algorithm with other linesearch algorithms, namely Algorithms 1 and 2.

This work is organized as follows: In Sect. 2, we recall some definitions and lemmas which will be used in the main results. In Sect. 3, a new algorithm is introduced. We show that the proposed algorithm converges weakly to a solution of (1) as well as analyze its complexity. In Sect. 4, experiments on data classification and regression problems are conducted. Then, we evaluate the performance of the proposed algorithm and other algorithms using various evaluation tools. In the last section, Sect. 5, the conclusion of this research is included.

## Preliminaries

We recall some definitions and lemmas which are crucial to the main results in this section.

We denote by $$x_{n} \rightarrow x$$ and $$x_{n} \rightharpoonup x$$ the strong and weak convergence of $$\{x_{n}\}$$ to x, respectively. Let $$h: H \rightarrow \mathbb{R}\cup \{+\infty \}$$ be a proper lower semicontinuous convex function and $$\operatorname{dom} h = \{ x \in H : h(x) < +\infty \}$$.

For any $$x \in H$$, a subdifferential of h at x is defined by

$$\partial h(x) := \bigl\{ u \in H : \langle u, y-x \rangle + h(x)\leq h(y) \text{ for all } y \in H \bigr\} .$$

A proximal operator $$\mathrm{prox}_{\alpha h} : H \rightarrow \operatorname{dom} h$$ is defined by

$$\mathrm{prox}_{\alpha h} (x) = (I + \alpha \partial h)^{-1} (x),$$

where I is an identity operator and $$\alpha > 0$$. This operator is single-valued with full domain, and the following holds:

$$\frac{x - \mathrm{prox}_{\alpha h}(x)}{\alpha } \in \partial h\bigl(\mathrm{prox}_{\alpha h}(x) \bigr)\quad \text{for all } x \in H \text{ and } \alpha > 0.$$
(6)
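
As a sanity check of inclusion (6), the sketch below uses the standard closed form of the proximal operator of $$h = |\cdot |$$ on $$\mathbb{R}$$ (soft-thresholding) and verifies numerically that $$(x - \mathrm{prox}_{\alpha h}(x))/\alpha$$ lies in $$\partial h(\mathrm{prox}_{\alpha h}(x))$$. The choice of h, the sample points, and the helper names `prox_abs` and `in_subdiff_abs` are assumptions made for illustration only.

```python
def prox_abs(x, alpha):
    """prox_{alpha*|.|}(x): soft-thresholding with threshold alpha."""
    if x > alpha:
        return x - alpha
    if x < -alpha:
        return x + alpha
    return 0.0


def in_subdiff_abs(u, p, tol=1e-12):
    """Check u in the subdifferential of |.| at p, up to a tolerance."""
    if p > 0:
        return abs(u - 1.0) <= tol      # subdifferential is {1}
    if p < 0:
        return abs(u + 1.0) <= tol      # subdifferential is {-1}
    return -1.0 - tol <= u <= 1.0 + tol  # subdifferential at 0 is [-1, 1]


checks = []
for x in (-2.0, -0.3, 0.0, 0.4, 5.0):
    for alpha in (0.5, 1.0, 2.0):
        p = prox_abs(x, alpha)
        u = (x - p) / alpha             # left-hand side of (6)
        checks.append(in_subdiff_abs(u, p))

all_hold = all(checks)
```

Every sampled pair satisfies the inclusion, which is the property the proofs below rely on when they pass from a proximal step to a subgradient inequality.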

Next, we recall some crucial lemmas for this work.

### Lemma 1

()

Let ∂h be the subdifferential of h. Then ∂h is maximal monotone. Moreover, its graph, $$\operatorname{Gph}(\partial h):= \{ (x,y) \in H \times H: y \in \partial h(x) \}$$, is demiclosed. In other words, if a sequence $$\{(x_{n},y_{n})\} \subseteq \operatorname{Gph}(\partial h)$$ is such that $$\{x_{n}\}$$ converges weakly to x and $$\{y_{n}\}$$ converges strongly to y, then $$(x,y) \in \operatorname{Gph}(\partial h)$$.

### Lemma 2

()

Let $$f,g :H \rightarrow \mathbb{R}$$ be proper lower semicontinuous convex functions with $$\operatorname{dom} g \subseteq \operatorname{dom} f$$ and $$J(x,\beta ) = \mathrm{prox}_{\beta g}(x - \beta \triangledown f(x))$$. Then, for any $$x \in \operatorname{dom} g$$ and $$\beta _{2} \geq \beta _{1} > 0$$, we have

$$\frac{\beta _{2}}{\beta _{1}} \bigl\Vert x - J(x,\beta _{1}) \bigr\Vert \geq \bigl\Vert x - J(x, \beta _{2}) \bigr\Vert \geq \bigl\Vert x - J(x,\beta _{1}) \bigr\Vert .$$
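
The two inequalities of Lemma 2 can be checked numerically on a small example. The sketch below is an illustration under our own assumptions ($$H = \mathbb{R}$$, $$f(x) = \frac{1}{2}(x+1)^{2}$$, $$g(x) = |x|$$, the point $$x = 0.5$$, and a hand-picked grid of step sizes); it is not a proof and the names are hypothetical.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.| on the real line."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def J(x, beta):
    # J(x, beta) = prox_{beta g}(x - beta * grad f(x)) with the toy choices
    # f(x) = 0.5*(x + 1)^2 and g(x) = |x| (assumptions for illustration).
    return soft_threshold(x - beta * (x + 1.0), beta)


x = 0.5
betas = [0.1, 0.2, 0.5, 1.0, 2.0]          # increasing step sizes
dist = [abs(x - J(x, b)) for b in betas]   # |x - J(x, beta)|
tol = 1e-12

# Right inequality: beta -> |x - J(x, beta)| is nondecreasing.
nondecreasing = all(dist[i] <= dist[i + 1] + tol
                    for i in range(len(betas) - 1))

# Left inequality: (beta2/beta1)*|x - J(x, beta1)| >= |x - J(x, beta2)|.
scaled_bound = all(betas[j] / betas[i] * dist[i] + tol >= dist[j]
                   for i in range(len(betas))
                   for j in range(i, len(betas)))
```

Both flags evaluate to true on this grid, matching the statement of the lemma for every pair $$\beta _{2} \geq \beta _{1}$$ that was sampled.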

### Lemma 3

()

Let H be a real Hilbert space. Then, for all $$a,b \in H$$ and $$\zeta \in [0,1]$$, the following hold:

1. (i)

$$\| a \pm b \|^{2} = \| a \|^{2} \pm 2\langle a,b \rangle + \| b \|^{2}$$,

2. (ii)

$$\| \zeta a + (1-\zeta )b \|^{2} = \zeta \| a \|^{2} + (1-\zeta ) \| b \|^{2} - \zeta (1-\zeta )\| a-b \|^{2}$$,

3. (iii)

$$\| a + b\|^{2} \leq \| a \|^{2} + 2\langle b,a+b \rangle$$.

### Lemma 4

()

Let $$\{a_{n}\}$$ and $$\{\beta _{n}\}$$ be sequences of nonnegative real numbers such that

$$a_{n+1} \leq (1+\beta _{n})a_{n} + \beta _{n} a_{n-1}\quad \textit{for all } n\in \mathbb{N}.$$

Then the following holds:

$$a_{n+1} \leq K\cdotp \prod_{j=1}^{n} (1+2\beta _{j}), \quad \textit{where } K = \max \{a_{1},a_{2} \}.$$

Moreover, if $$\sum_{n=1}^{+\infty }\beta _{n} < +\infty$$, then $$\{a_{n}\}$$ is bounded.

### Lemma 5

()

Let $$\{a_{n}\}$$, $$\{b_{n}\}$$, and $$\{\delta _{n}\}$$ be sequences of nonnegative real numbers such that

$$a_{n+1} \leq (1+\delta _{n}) a_{n} + b_{n} \quad \textit{for all } n\in \mathbb{N}.$$

If $$\sum_{n=1}^{+\infty }\delta _{n} < +\infty$$ and $$\sum_{n=1}^{+\infty }b_{n} < +\infty$$, then $$\lim_{n \rightarrow +\infty } a_{n}$$ exists.

### Lemma 6

(, Opial)

Let H be a Hilbert space and $$\{x_{n}\}$$ be a sequence in H such that there exists a nonempty subset Ω of H satisfying the following:

1. (i)

for any $$x^{*} \in \Omega , \lim_{n \rightarrow +\infty } \|x_{n}-x^{*} \|$$ exists;

2. (ii)

every weak-cluster point of $$\{x_{n}\}$$ belongs to Ω.

Then $$\{x_{n}\}$$ converges weakly to an element in Ω.

Throughout this work, we suppose that a solution of (1) exists and the set of these solutions is denoted by $$S_{*}$$.

## Main results

In this section, we propose an accelerated algorithm by employing a linesearch technique (Linesearch 1) together with the inertial technique for solving (1) and prove its weak convergence. Our algorithm is defined as follows.

### Algorithm 3

Given $$x_{0}, x_{1} \in \operatorname{dom} g$$, $$\sigma > 0$$, $$\theta \in (0,1)$$, $$\delta \in (0, \frac{1}{2})$$, $$\alpha _{n} \in [0,1]$$, and $$\beta _{n} \geq 0$$. For $$n \in \mathbb{N}$$,

$$\textstyle\begin{cases} \hat{x}_{n} = x_{n} + \beta _{n}(x_{n} - x_{n-1}), \\ y_{n} = P_{\operatorname{dom} g} \hat{x}_{n}, \\ z_{n} = \mathrm{prox}_{\gamma _{n} g}(y_{n} - \gamma _{n} \triangledown f(y_{n}) ), \\ x_{n+1} = (1-\alpha _{n})z_{n} + \alpha _{n} \mathrm{prox}_{\rho _{n} g}(z_{n} - \rho _{n} \triangledown f(z_{n}) ), \end{cases}$$

where $$\gamma _{n} := \textbf{Linesearch}\text{ 1} (y_{n},\sigma ,\theta ,\delta )$$ and $$\rho _{n} := \textbf{Linesearch}\text{ 1}(z_{n},\gamma _{n}, \theta ,\delta )$$, and $$P_{\operatorname{dom} g}$$ is a metric projection onto domg.
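
The four updates of Algorithm 3 can be sketched in Python as below. Everything specific in this block is an assumption made for illustration and is not taken from the source: the toy problem $$f(x) = \frac{1}{2}\|x - c\|^{2}$$ and $$g(x) = \lambda \sum_{i} x_{i}$$ plus the indicator of the nonnegative orthant (so that $$\operatorname{dom} g$$ is closed and the projection is a componentwise clip), the helper names, and the parameter choices $$\alpha _{n} = 0.5$$, $$\beta _{n} = 0.5^{n}$$ (a summable sequence, as B1 below requires).

```python
import math


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


# Toy data (assumption): f(x) = 0.5*||x - c||^2,
# g(x) = lam*sum(x) + indicator of {x >= 0}.
c, lam = [3.0, -0.5, 1.2], 1.0


def grad_f(x):
    return sub(x, c)


def prox_g(v, step):
    # prox of step*g: shift by step*lam, then clip at zero.
    return [max(t - step * lam, 0.0) for t in v]


def project_dom_g(v):
    # Metric projection onto dom g, here the nonnegative orthant.
    return [max(t, 0.0) for t in v]


def fb_step(x, step):
    return prox_g([xi - step * gi for xi, gi in zip(x, grad_f(x))], step)


def linesearch1(x, sigma, theta, delta):
    alpha = sigma
    while True:
        p = fb_step(x, alpha)
        if alpha * norm(sub(grad_f(p), grad_f(x))) <= delta * norm(sub(p, x)):
            return alpha
        alpha *= theta


def algorithm3(x0, x1, sigma, theta, delta, alpha_n, beta_n, iters):
    x_prev, x = x0, x1
    for n in range(1, iters + 1):
        b = beta_n(n)
        x_hat = [xi + b * (xi - pi) for xi, pi in zip(x, x_prev)]  # inertia
        y = project_dom_g(x_hat)
        gamma = linesearch1(y, sigma, theta, delta)
        z = fb_step(y, gamma)
        rho = linesearch1(z, gamma, theta, delta)   # starts from gamma
        w = fb_step(z, rho)
        a = alpha_n(n)
        x_prev, x = x, [(1 - a) * zi + a * wi for zi, wi in zip(z, w)]
    return x


# Closed-form minimizer of this toy problem, used only for comparison.
x_star = [max(ci - lam, 0.0) for ci in c]
x_out = algorithm3([0.0] * 3, [0.0] * 3, sigma=1.0, theta=0.5, delta=0.4,
                   alpha_n=lambda n: 0.5, beta_n=lambda n: 0.5 ** n,
                   iters=100)
```

On this separable toy problem the minimizer is available in closed form, so `x_out` can be compared directly with `x_star`. The second linesearch is started from $$\gamma _{n}$$ rather than $$\sigma$$, which guarantees $$\rho _{n} \leq \gamma _{n}$$.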

### Theorem 7

Let H be a real Hilbert space, $$f :H \rightarrow \mathbb{R}\cup \{+\infty \}$$ and $$g :H \rightarrow \mathbb{R}\cup \{+\infty \}$$ be proper lower semicontinuous convex functions satisfying A1 and A2. In addition, suppose that domg is closed and the following is satisfied, for all $$n \in \mathbb{N}$$:

1. B1.

$$\sum_{n=1}^{+\infty } \beta _{n} < +\infty$$.

Then a sequence $$\{x_{n}\}$$ generated by Algorithm 3 converges weakly to a point in $$S_{*}$$. In other words, $$\{x_{n}\}$$ converges weakly to a solution of (1).

### Proof

For the sake of convenience, we denote $$w_{n} = \mathrm{prox}_{\rho _{n} g}(z_{n} - \rho _{n} \triangledown f(z_{n}))$$, and let $$x^{*} \in S_{*}$$. For any $$x \in \operatorname{dom} g$$ and $$n \in \mathbb{N}$$, we first prove the following:

\begin{aligned}& \Vert y_{n} - x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2} \geq 2\gamma_{n}[(f+g) (z_{n}) - (f+g) (x)] + (1-2\delta ) \Vert z_{n} - y_{n} \Vert ^{2}, \end{aligned}
(7)
\begin{aligned}& \Vert z_{n} - x \Vert ^{2} - \Vert w_{n} - x \Vert ^{2} \geq 2\rho_{n}[(f+g) (w_{n}) - (f+g) (x)] + (1-2\delta ) \Vert w_{n} - z_{n} \Vert ^{2}. \end{aligned}
(8)

In order to show (7), we obtain from (6) that

$$\frac{y_{n} - z_{n}}{\gamma _{n}} - \triangledown f(y_{n}) \in \partial g(z_{n}) \quad \text{for all } n \in \mathbb{N}.$$

By the definitions of $$\partial g(z_{n})$$, $$\triangledown f(y_{n})$$, and $$\triangledown f(z_{n})$$, we have

\begin{aligned}& g(x) - g(z_{n}) \geq \biggl\langle \frac{y_{n} - z_{n}}{\gamma _{n}} - \triangledown f(y_{n}), x - z_{n} \biggr\rangle , \\& f(x) - f(y_{n}) \geq \bigl\langle \triangledown f(y_{n}), x - y_{n} \bigr\rangle \text{ and } f(y_{n}) - f(z_{n}) \geq \bigl\langle \triangledown f(z_{n}) , y_{n} - z_{n} \bigr\rangle \end{aligned}

for all $$n \in \mathbb{N}$$. From these inequalities and the definition of $$\gamma _{n}$$, we obtain

\begin{aligned} f(x) - f(y_{n}) + g(x) - g(z_{n})\geq{}& \frac{1}{\gamma _{n}}\langle y_{n} - z_{n}, x - z_{n} \rangle + \bigl\langle \triangledown f(y_{n}), z_{n} - y_{n} \bigr\rangle \\ = {}& \frac{1}{\gamma _{n}}\langle y_{n} - z_{n}, x - z_{n} \rangle + \bigl\langle \triangledown f(y_{n}) - \triangledown f(z_{n}) , z_{n} - y_{n} \bigr\rangle \\ & {}+ \bigl\langle \triangledown f(z_{n}), z_{n} - y_{n} \bigr\rangle \\ \geq{}& \frac{1}{\gamma _{n}}\langle y_{n} - z_{n}, x - z_{n} \rangle - \bigl\Vert \triangledown f(y_{n}) - \triangledown f(z_{n}) \bigr\Vert \Vert z_{n} - y_{n} \Vert \\ &{}+ \bigl\langle \triangledown f(z_{n}), z_{n} - y_{n} \bigr\rangle \\ \geq{} & \frac{1}{\gamma _{n}}\langle y_{n} - z_{n}, x - z_{n} \rangle - \frac{\delta }{\gamma _{n}} \Vert z_{n} - y_{n} \Vert ^{2} + f(z_{n}) - f(y_{n}) \end{aligned}

for all $$n \in \mathbb{N}$$. Consequently,

$$\frac{1}{\gamma _{n}}\langle y_{n} - z_{n}, z_{n} -x \rangle \geq (f+g) (z_{n}) - (f+g) (x) - \frac{\delta }{\gamma _{n}} \Vert z_{n} - y_{n} \Vert ^{2} \quad \text{for all } n \in \mathbb{N}.$$

Since $$\langle y_{n} - z_{n}, z_{n} -x \rangle = \frac{1}{2}( \| y_{n} -x \|^{2} - \| y_{n} - z_{n}\|^{2} - \| z_{n} - x \|^{2})$$, we have

$$\frac{1}{2\gamma _{n}}\bigl( \Vert y_{n} -x \Vert ^{2} - \Vert y_{n} - z_{n} \Vert ^{2} - \Vert z_{n} - x \Vert ^{2}\bigr) \geq (f+g) (z_{n}) - (f+g) (x) - \frac{\delta }{\gamma _{n}} \Vert z_{n} - y_{n} \Vert ^{2}$$

for all $$n \in \mathbb{N}$$. Hence, for any $$x \in \operatorname{dom} g$$, we have

\begin{aligned} \Vert y_{n} -x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2} &\geq 2\gamma _{n} \bigl[(f +g) (z_{n}) - (f +g) (x)\bigr] + (1-2\delta ) \Vert z_{n} - y_{n} \Vert ^{2} \end{aligned}

for all $$n \in \mathbb{N}$$. Furthermore, since $$x^{*} \in S_{*} \subseteq \operatorname{dom} g$$, we have

\begin{aligned} \bigl\Vert y_{n} -x^{*} \bigr\Vert ^{2} - \bigl\Vert z_{n} - x^{*} \bigr\Vert ^{2} &\geq 2\gamma _{n}\bigl[(f +g) (z_{n}) - (f +g) \bigl(x^{*}\bigr)\bigr] + (1 -2\delta ) \Vert z_{n} - y_{n} \Vert ^{2} \\ & \geq (1 -2\delta ) \Vert z_{n} - y_{n} \Vert ^{2} \quad \text{for all } n \in \mathbb{N}. \end{aligned}
(9)

To prove (8), using the same arguments, we obtain the following inequalities:

\begin{aligned}& \frac{z_{n} - w_{n}}{\rho _{n}} - \triangledown f(z_{n}) \in \partial g(w_{n}), \\& g(x) - g(w_{n}) \geq \biggl\langle \frac{z_{n} - w_{n}}{\rho _{n}} - \triangledown f(z_{n}), x - w_{n} \biggr\rangle , \\& f(x) - f(z_{n}) \geq \bigl\langle \triangledown f(z_{n}), x - z_{n} \bigr\rangle \quad \text{and}\quad f(z_{n}) - f(w_{n}) \geq \bigl\langle \triangledown f(w_{n}) , z_{n} - w_{n} \bigr\rangle \end{aligned}

for all $$n \in \mathbb{N}$$. Again, using the above inequalities, we have

\begin{aligned} f(x) - f(z_{n}) + g(x) - g(w_{n})\geq{}& \frac{1}{\rho _{n}}\langle z_{n} - w_{n}, x - w_{n} \rangle + \bigl\langle \triangledown f(z_{n}), w_{n} - z_{n} \bigr\rangle \\ = {}&\frac{1}{\rho _{n}}\langle z_{n} - w_{n}, x - w_{n} \rangle + \bigl\langle \triangledown f(z_{n}) - \triangledown f(w_{n}) , w_{n} - z_{n} \bigr\rangle \\ &{} + \bigl\langle \triangledown f(w_{n}), w_{n} - z_{n} \bigr\rangle \\ \geq{}& \frac{1}{\rho _{n}}\langle z_{n} - w_{n}, x - w_{n} \rangle - \bigl\Vert \triangledown f(z_{n}) - \triangledown f(w_{n}) \bigr\Vert \Vert w_{n} - z_{n} \Vert \\ &{}+ \bigl\langle \triangledown f(w_{n}), w_{n} - z_{n} \bigr\rangle \\ \geq {}&\frac{1}{\rho _{n}}\langle z_{n} - w_{n}, x - w_{n} \rangle - \frac{\delta }{\rho _{n}} \Vert w_{n} - z_{n} \Vert ^{2} + f(w_{n}) - f(z_{n}) \end{aligned}

for all $$n \in \mathbb{N}$$, which implies that

$$\frac{1}{\rho _{n}}\langle z_{n} - w_{n}, w_{n} -x \rangle \geq (f+g) (w_{n}) - (f+g) (x) - \frac{\delta }{\rho _{n}} \Vert w_{n} - z_{n} \Vert ^{2} \quad \text{for all } n \in \mathbb{N}.$$

Since $$\langle z_{n} - w_{n}, w_{n} -x \rangle = \frac{1}{2}( \| z_{n} -x \|^{2} - \| z_{n} - w_{n}\|^{2} - \| w_{n} - x \|^{2})$$, we get

$$\frac{1}{2\rho _{n}}\bigl( \Vert z_{n} -x \Vert ^{2} - \Vert z_{n} - w_{n} \Vert ^{2} - \Vert w_{n} - x \Vert ^{2}\bigr) \geq (f+g) (w_{n}) - (f+g) (x) - \frac{\delta }{\rho _{n}} \Vert w_{n} - z_{n} \Vert ^{2}$$

for all $$n \in \mathbb{N}$$. It follows that, for all $$x \in \operatorname{dom} g$$ and $$n \in \mathbb{N}$$,

$$\Vert z_{n} - x \Vert ^{2} - \Vert w_{n} - x \Vert ^{2} \geq 2\rho_{n}[(f+g) (w_{n}) - (f+g) (x)] + (1-2\delta ) \Vert w_{n} - z_{n} \Vert ^{2}.$$

So, putting $$x = x^{*}$$, we obtain

$$\bigl\Vert z_{n} - x^{*} \bigr\Vert ^{2} - \bigl\Vert w_{n} - x^{*} \bigr\Vert ^{2} \geq (1-2\delta ) \Vert w_{n} - z_{n} \Vert ^{2} \quad \text{for all } n \in \mathbb{N}.$$
(10)

Furthermore, from the definition of $$x_{n+1}$$, (9) and (10), we conclude that

\begin{aligned} \bigl\Vert x_{n+1}-x^{*} \bigr\Vert ^{2} &= (1-\alpha _{n}) \bigl\Vert z_{n} - x^{*} \bigr\Vert ^{2} + \alpha _{n} \bigl\Vert w_{n} - x^{*} \bigr\Vert ^{2} -(1-\alpha _{n}) (\alpha _{n}) \Vert z_{n} - w_{n} \Vert ^{2} \\ &\leq \bigl\Vert z_{n} - x^{*} \bigr\Vert ^{2} \end{aligned}
(11)
\begin{aligned} &\leq \bigl\Vert y_{n} - x^{*} \bigr\Vert ^{2} \quad \text{for all } n \in \mathbb{N}. \end{aligned}
(12)

Now, we show that $$\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \|$$ exists.

From (12) and the nonexpansiveness of $$P_{\operatorname{dom} g}$$, we obtain the following:

\begin{aligned} \bigl\Vert x_{n+1} - x^{*} \bigr\Vert &\leq \bigl\Vert y_{n} - x^{*} \bigr\Vert \\ & = \bigl\Vert P_{\operatorname{dom} g} \hat{x}_{n} - P_{\operatorname{dom} g} x^{*} \bigr\Vert \\ &\leq \bigl\Vert \hat{x}_{n} - x^{*} \bigr\Vert \\ & \leq \bigl\Vert x_{n} - x^{*} \bigr\Vert + \beta _{n} \Vert x_{n} - x_{n-1} \Vert \\ &\leq (1+\beta _{n}) \bigl\Vert x_{n} - x^{*} \bigr\Vert + \beta _{n} \bigl\Vert x_{n-1} -x^{*} \bigr\Vert \quad \text{for all } n \in \mathbb{N}. \end{aligned}
(13)

By Lemma 4 and B1, $$\{x_{n}\}$$ is bounded, and hence $$\sum_{n=1}^{+\infty }\beta _{n}\|x_{n} - x_{n-1}\| < + \infty$$. Consequently,

$$\Vert \hat{x}_{n} - x_{n} \Vert = \beta _{n} \Vert x_{n} - x_{n-1} \Vert \rightarrow 0,\quad \text{as } n \rightarrow +\infty .$$
(14)

From (13), we have

$$\bigl\Vert x_{n+1} - x^{*} \bigr\Vert \leq \bigl\Vert x_{n} - x^{*} \bigr\Vert + \beta _{n} \Vert x_{n} - x_{n-1} \Vert \quad \text{for all } n \in \mathbb{N}.$$

By Lemma 5, we get that $$\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \|$$ exists. Now, from the convexity of domg and the definitions of $$z_{n-1}$$ and $$x_{n}$$, we conclude that $$x_{n} \in \operatorname{dom} g$$. Consequently,

$$\Vert \hat{x}_{n} - y_{n} \Vert \leq \Vert \hat{x}_{n} - x_{n} \Vert \rightarrow 0,\quad \text{as } n \rightarrow +\infty .$$
(15)

By (14) and (15), we have $$\lim_{n \rightarrow +\infty }\| x_{n} - y_{n} \| = 0$$. Using (13) and (14), we obtain $$\lim_{n \rightarrow +\infty }\| x_{n} - x^{*} \| = \lim_{n \rightarrow +\infty }\| y_{n} - x^{*} \|$$. From (11) and (12), we get $$\lim_{n \rightarrow +\infty }\| y_{n} - x^{*} \| = \lim_{n \rightarrow +\infty }\| z_{n} - x^{*} \|$$, and hence (9) implies that $$\lim_{n \rightarrow +\infty }\| y_{n} - z_{n} \| = 0$$. As a result, we have $$\lim_{n \rightarrow +\infty }\| x_{n} - z_{n} \| = 0$$.

Next, we prove that every weak-cluster point of $$\{x_{n}\}$$ belongs to $$S_{*}$$. To do this, let w be a weak-cluster point of $$\{x_{n}\}$$. Then there exists a subsequence $$\{x_{n_{k}}\}$$ of $$\{x_{n}\}$$ such that $$x_{n_{k}} \rightharpoonup w$$ and hence $$z_{n_{k}} \rightharpoonup w$$.

If $$\gamma _{n_{k}} \neq \sigma$$ for only finitely many k, then we may suppose without loss of generality that $$\gamma _{n_{k}} = \sigma$$ for all $$k \in \mathbb{N}$$. The definition of $$\gamma _{n_{k}}$$ implies that

$$\bigl\Vert \triangledown f(z_{n_{k}})- \triangledown f(y_{n_{k}}) \bigr\Vert \leq \frac{\delta }{\sigma } \Vert z_{n_{k}} - y_{n_{k}} \Vert .$$

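Since $$\| z_{n_{k}} - y_{n_{k}} \| \rightarrow 0$$ and $$\triangledown f$$ is uniformly continuous on bounded subsets of $$\operatorname{dom} g$$, we get $$\lim_{k \rightarrow +\infty }\| \triangledown f(z_{n_{k}})- \triangledown f(y_{n_{k}}) \| = 0$$. We know that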

$$\frac{ y_{n_{k}} - z_{n_{k}} }{\gamma _{n_{k}}} - \triangledown f(y_{n_{k}}) + \triangledown f(z_{n_{k}}) \in \partial g(z_{n_{k}}) + \triangledown f(z_{n_{k}}) = \partial (f+g) (z_{n_{k}}).$$

We conclude from the demiclosedness of $$\operatorname{Gph}(\partial (f+g))$$ that $$(w,0) \in \operatorname{Gph}(\partial (f+g))$$. Hence, $$0 \in \partial (f+g)(w)$$, which implies that $$w \in S_{*}$$.

Now, suppose that there exists a subsequence $$\{z_{n_{k_{j}}}\}$$ of $$\{z_{n_{k}}\}$$ such that $$\gamma _{n_{k_{j}}} \leq \sigma \theta$$ for all $$j \in \mathbb{N}$$. In this case, we can set $$\hat{\gamma }_{n_{k_{j}}} = \frac{\gamma _{n_{k_{j}}}}{\theta }$$ and $$\hat{z}_{n_{k_{j}}} = \mathrm{prox}_{\hat{\gamma }_{n_{k_{j}}}g} (y_{n_{k_{j}}} - \hat{\gamma }_{n_{k_{j}}} \triangledown f(y_{n_{k_{j}}}))$$. By the definition of $$\gamma _{n_{k_{j}}}$$, we obtain

$$\bigl\Vert \triangledown f(\hat{z}_{n_{k_{j}}}) - \triangledown f(y_{n_{k_{j}}}) \bigr\Vert > \frac{\delta }{\hat{\gamma }_{n_{k_{j}}}} \Vert \hat{z}_{n_{k_{j}}} - y_{n_{k_{j}}} \Vert .$$
(16)

Moreover, by Lemma 2, we have

$$\frac{1}{\theta } \Vert y_{n_{k_{j}}} - z_{n_{k_{j}}} \Vert \geq \Vert y_{n_{k_{j}}} - \hat{z}_{n_{k_{j}}} \Vert .$$

Therefore, $$\| y_{n_{k_{j}}} - \hat{z}_{n_{k_{j}}}\| \rightarrow 0$$ as $$j \rightarrow +\infty$$, which implies that $$\hat{z}_{n_{k_{j}}} \rightharpoonup w$$. Again, using the uniform continuity of $$\triangledown f$$, we obtain $$\| \triangledown f(\hat{z}_{n_{k_{j}}}) - \triangledown f(y_{n_{k_{j}}}) \| \rightarrow 0$$ as $$j \rightarrow +\infty$$. Combining with (16), we obtain $$\frac{\| \hat{z}_{n_{k_{j}}} - y_{n_{k_{j}}}\|}{\hat{\gamma }_{n_{k_{j}}}} \rightarrow 0$$ as $$j \rightarrow +\infty$$. Moreover, we know that

$$\frac{ y_{n_{k_{j}}}-\hat{z}_{n_{k_{j}}} }{\hat{\gamma }_{n_{k_{j}}}} - \triangledown f(y_{n_{k_{j}}}) + \triangledown f( \hat{z}_{n_{k_{j}}}) \in \partial g(\hat{z}_{n_{k_{j}}}) + \triangledown f(\hat{z}_{n_{k_{j}}}) = \partial (f+g) (\hat{z}_{n_{k_{j}}}).$$

It implies, by the demiclosedness of $$\operatorname{Gph}(\partial (f+g))$$, that $$0 \in \partial (f+g)(w)$$, so $$w \in S_{*}$$.

By Lemma 6, we obtain that $$\{x_{n}\}$$ converges weakly to an element in $$S_{*}$$, and the proof is complete. □

Setting $$\beta _{n} = 0$$ and $$\alpha _{n} = 0$$ for all $$n \in \mathbb{N}$$, we get $$y_{n} = x_{n}$$, and hence Algorithm 3 reduces to Algorithm 1. As a consequence of Theorem 7, we obtain the following result, which is one part of Theorem 4.2 in .

### Corollary 8

Let H be a real Hilbert space, $$f,g :H \rightarrow \mathbb{R}\cup \{+\infty \}$$ be proper lower semicontinuous convex functions satisfying A1 and A2. If $$S_{*} \neq \emptyset$$, then a sequence $$\{x_{n}\}$$ generated by Algorithm 1 converges weakly to a point in $$S_{*}$$.

In the next theorem, we analyze the complexity of Algorithm 3. We first introduce the control sequence $$\{t_{n}\}$$ defined in  by

$$t_{n} = 1 + \sum_{k = n}^{+\infty } \Biggl(\prod_{i=n}^{k} \beta _{i}\Biggr)\quad \text{for all } n \in \mathbb{N}.$$
(17)

This sequence is well defined if the following assumption holds:

$$\sum_{k = n}^{+\infty }\Biggl(\prod _{i=n}^{k}\beta _{i}\Biggr) < + \infty \quad \text{for all } n \in \mathbb{N}.$$

It is easy to see that under the above assumption we have

$$\beta _{n}t_{n+1} = t_{n} - 1 \quad \text{for all } n \in \mathbb{N}.$$
(18)
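
For completeness, identity (18) can be obtained by splitting off the common factor $$\beta _{n}$$ from every product in definition (17); the following short derivation uses only (17) and the summability assumption stated above:

$$\begin{aligned} t_{n} - 1 &= \sum_{k=n}^{+\infty } \prod_{i=n}^{k} \beta _{i} = \beta _{n} + \sum_{k=n+1}^{+\infty } \beta _{n} \prod_{i=n+1}^{k} \beta _{i} \\ &= \beta _{n} \Biggl( 1 + \sum_{k=n+1}^{+\infty } \prod_{i=n+1}^{k} \beta _{i} \Biggr) = \beta _{n} t_{n+1}. \end{aligned}$$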

Next, we prove the following theorem.

### Theorem 9

Given $$x_{0} = x_{1} \in \operatorname{dom} g$$, let $$\{x_{n}\}$$ be a sequence generated by Algorithm 3, and suppose that all assumptions in Theorem 7 hold. Additionally, suppose that the following assumptions hold for all $$n \in \mathbb{N}$$:

1. C1.

$$\sum_{k = n}^{+\infty }(\prod_{i=n}^{k}\beta _{i}) < + \infty$$, and $$t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n}$$,

2. C2.

$$\alpha _{n} \in [\frac{1}{2}, 1]$$, and $$\alpha _{n} \leq \alpha _{n-1}$$,

3. C3.

$$\gamma _{n} = \textbf{Linesearch}\textit{ 1}(y_{n},\rho _{n-1},\theta , \delta )$$, $$\rho _{n} := \textbf{Linesearch}\textit{ 1}(z_{n},\gamma _{n},\theta ,\delta )$$, and $$\rho _{n} \geq \rho > 0$$.

Then

$$(f+g) (x_{n+1})- \min_{x \in H}(f+g) (x) \leq \frac{d(x_{1},S_{*})^{2} + t^{2}_{1}\zeta _{1}[(f+g)(x_{1})- \min_{x \in H}(f+g)(x)]}{3\rho t^{2}_{n+1}}$$

for all $$n \in \mathbb{N}$$, where $$\zeta _{1} = 2(\gamma _{1} + \alpha _{1} \rho _{1})$$. In other words,

$$(f+g) (x_{n+1})- \min_{x \in H}(f+g) (x) = \mathcal{O}\biggl( \frac{1}{t^{2}_{n+1}}\biggr) \quad \textit{for all } n \in \mathbb{N}.$$

### Proof

Let $$x^{*} \in S_{*}$$. For any $$x \in \operatorname{dom} g$$, we know that

\begin{aligned}& \Vert y_{n} - x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2} \geq 2\gamma _{n}\bigl[(f+g) (z_{n}) - (f+g) (x)\bigr], \end{aligned}
(19)
\begin{aligned}& \Vert z_{n} - x \Vert ^{2} - \Vert w_{n} - x \Vert ^{2} \geq 2\rho _{n} \bigl[(f+g) (w_{n}) - (f+g) (x)\bigr] \end{aligned}
(20)

for all $$n \in \mathbb{N}$$. Put $$x = z_{n}$$ in (20), then

$$- \Vert w_{n} - z_{n} \Vert ^{2} \geq 2\rho _{n}\bigl[(f+g) (w_{n}) - (f+g) (z_{n})\bigr],$$

thus $$(f+g)(z_{n}) \geq (f+g)(w_{n})$$ for all $$n \in \mathbb{N}$$. Since f and g are convex, we have

\begin{aligned} (f+g) (x_{n+1}) &\leq (1-\alpha _{n}) (f+g) (z_{n}) + \alpha _{n}(f+g) (w_{n}) \\ &\leq (f+g) (z_{n}). \end{aligned}
(21)

From the definition of $$x_{n+1}$$, we obtain

\begin{aligned} \Vert x_{n+1} - x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2} ={}& (1-\alpha _{n}) \Vert z_{n} - x \Vert ^{2} + \alpha _{n} \Vert w_{n} - x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2} \\ &{} -(1-\alpha _{n})\alpha _{n} \Vert z_{n} - w_{n} \Vert ^{2} \\ \leq {}&\alpha _{n} \bigl( \Vert w_{n} -x \Vert ^{2} - \Vert z_{n} - x \Vert ^{2}\bigr). \end{aligned}

Hence,

$$\Vert z_{n} - x \Vert ^{2} - \Vert x_{n+1} - x \Vert ^{2} \geq \alpha _{n} \bigl( \Vert z_{n} - x \Vert ^{2} - \Vert w_{n} -x \Vert ^{2}\bigr) \quad \text{for all } n \in \mathbb{N}.$$
(22)

Combining (20) and (22), we have

$$\Vert z_{n} - x \Vert ^{2} - \Vert x_{n+1} - x \Vert ^{2} \geq 2 \alpha _{n} \rho _{n}\bigl[(f+g) (w_{n}) - (f+g) (x) \bigr] \quad \text{for all } n \in \mathbb{N}.$$
(23)

Summing (19) and (23), we get

\begin{aligned} \Vert y_{n} - x \Vert ^{2} - \Vert x_{n+1} - x \Vert ^{2}\geq{}& 2\gamma _{n}\bigl[(f+g) (z_{n}) - (f+g) (x)\bigr] \\ & {}+ 2 \alpha _{n} \rho _{n}\bigl[(f+g) (w_{n}) - (f+g) (x)\bigr] \\ \geq{} &2\gamma _{n}(f+g) (z_{n}) + 2 \alpha _{n} \rho _{n}(f+g) (w_{n}) \\ & {}- 2(\gamma _{n} + \alpha _{n} \rho _{n}) (f+g) (x) \end{aligned}
(24)

for all $$n \in \mathbb{N}$$. We claim that

$$2\gamma _{n}(f+g) (z_{n}) + 2 \alpha _{n} \rho _{n}(f+g) (w_{n}) \geq 2( \gamma _{n} + \alpha _{n} \rho _{n}) (f+g) (x_{n+1}) \quad \text{for all } n \in \mathbb{N}.$$
(25)

To validate our claim, we know from (21) and C2 that

\begin{aligned} (f+g) (z_{n}) + (f+g) (w_{n})= {}& (1-\alpha _{n}) (f+g) (z_{n}) + \alpha _{n}(f+g) (w_{n}) \\ &{}+ \alpha _{n}(f+g) (z_{n}) + (1-\alpha _{n}) (f+g) (w_{n}) \\ \geq {}&(f+g) (x_{n+1}) + \alpha _{n}(f+g) (z_{n}) + (1 -\alpha _{n}) (f+g) (w_{n}) \\ ={}& (f+g) (x_{n+1}) + \biggl( 1- \frac{1-\alpha _{n}}{\alpha _{n}} \biggr) (f+g) (z_{n}) \\ &{}+ \frac{1-\alpha _{n}}{\alpha _{n}}\bigl[(1-\alpha _{n}) (f+g) (z_{n}) + \alpha _{n}(f+g) (w_{n})\bigr] \\ \geq{}& (f+g) (x_{n+1}) + \biggl( 1- \frac{1-\alpha _{n}}{\alpha _{n}} \biggr) (f+g) (z_{n}) + \frac{1-\alpha _{n}}{\alpha _{n}}(f+g) (x_{n+1}) \\ \geq{}& 2(f+g) (x_{n+1}) \quad \text{for all } n \in \mathbb{N}. \end{aligned}

Consequently,

\begin{aligned} 2\gamma _{n}(f+g) (z_{n}) + 2 \alpha _{n} \rho _{n}(f+g) (w_{n}) ={}&2( \gamma _{n} - \alpha _{n} \rho _{n}) (f+g) (z_{n}) \\ &{}+ 2 \alpha _{n} \rho _{n}\bigl[(f+g) (z_{n}) + (f+g) (w_{n})\bigr] \\ \geq {}&2(\gamma _{n} - \alpha _{n} \rho _{n}) (f+g) (x_{n+1}) \\ &{}+ 4\alpha _{n} \rho _{n}(f+g) (x_{n+1}) \\ ={}& 2(\gamma _{n} + \alpha _{n} \rho _{n}) (f+g) (x_{n+1}) \end{aligned}

for all $$n \in \mathbb{N}$$. For simplicity, we denote $$\zeta _{n} = 2(\gamma _{n} + \alpha _{n} \rho _{n})$$. We note that $$\zeta _{n} \geq \zeta _{n+1}$$ for all $$n \in \mathbb{N}$$ from C2 and C3. We also know that $$\| \hat{x}_{n} - x \| \geq \| y_{n} - x \|$$ since $$x \in \operatorname{dom} g$$ and $$P_{\operatorname{dom} g}$$ is nonexpansive.

So, from (24) and (25), we have

$$\Vert \hat{x}_{n} - x \Vert ^{2} - \Vert x_{n+1} - x \Vert ^{2} \geq \zeta _{n}\bigl[(f+g) (x_{n+1})- (f+g) (x)\bigr] \quad \text{for all } n \in \mathbb{N}.$$
(26)

We know that $$x_{n},x^{*} \in \operatorname{dom} g$$ and $$t_{n+1}> 1$$. Thus, we conclude that $$(1-\frac{1}{t_{n+1}})x_{n} + \frac{1}{t_{n+1}}x^{*} \in \operatorname{dom} g$$. By putting $$x = (1-\frac{1}{t_{n+1}})x_{n} + \frac{1}{t_{n+1}}x^{*}$$ in (26), we obtain

\begin{aligned} & \biggl\Vert x_{n+1} - \biggl(1- \frac{1}{t_{n+1}}\biggr)x_{n} - \frac{1}{t_{n+1}}x^{*} \biggr\Vert ^{2} - \biggl\Vert \hat{x}_{n} - \biggl(1-\frac{1}{t_{n+1}}\biggr)x_{n} - \frac{1}{t_{n+1}}x^{*} \biggr\Vert ^{2} \\ &\quad \leq \zeta _{n}\biggl[(f+g) \biggl(\biggl(1-\frac{1}{t_{n+1}} \biggr)x_{n} + \frac{1}{t_{n+1}}x^{*}\biggr) - (f+g) (x_{n+1})\biggr] \\ &\quad \leq \zeta _{n}\biggl[\biggl(1-\frac{1}{t_{n+1}}\biggr) (f+g) (x_{n}) + \frac{1}{t_{n+1}}(f+g) \bigl(x^{*}\bigr) - (f+g) (x_{n+1})\biggr] \\ &\quad = \zeta _{n}\biggl[\biggl(1-\frac{1}{t_{n+1}}\biggr)\bigl[(f+g) (x_{n})-(f+g) \bigl(x^{*}\bigr)\bigr] - \bigl[(f+g) (x_{n+1})-(f+g) \bigl(x^{*}\bigr)\bigr]\biggr] \end{aligned}
(27)

for all $$n \in \mathbb{N}$$. We also have, for $$n \in \mathbb{N}$$,

\begin{aligned} & \biggl\Vert x_{n+1}- \biggl(1- \frac{1}{t_{n+1}}\biggr)x_{n} - \frac{1}{t_{n+1}}x^{*} \biggr\Vert ^{2} - \biggl\Vert \hat{x}_{n} - \biggl(1-\frac{1}{t_{n+1}}\biggr)x_{n} - \frac{1}{t_{n+1}}x^{*} \biggr\Vert ^{2} \\ &\quad = \frac{1}{t^{2}_{n+1}}\bigl( \bigl\Vert t_{n+1}x_{n+1}-(t_{n+1}-1)x_{n}-x^{*} \bigr\Vert ^{2} \\ &\qquad {}- \bigl\Vert t_{n+1}x_{n} + \beta _{n}t_{n+1}(x_{n}-x_{n-1}) - (t_{n+1}-1)x_{n}-x^{*} \bigr\Vert ^{2}\bigr) \\ &\quad = \frac{1}{ t^{2}_{n+1}}\bigl( \bigl\Vert t_{n+1}x_{n+1} - (t_{n+1} - 1)x_{n}-x^{*} \bigr\Vert ^{2} - \bigl\Vert (t_{n}-1) (x_{n} - x_{n-1}) + x_{n} - x^{*} \bigr\Vert ^{2}\bigr) \\ &\quad =\frac{1}{ t^{2}_{n+1}}\bigl( \bigl\Vert t_{n+1}x_{n+1} - (t_{n+1} - 1)x_{n}-x^{*} \bigr\Vert ^{2} - \bigl\Vert t_{n}x_{n} - (t_{n} - 1)x_{n-1} - x^{*} \bigr\Vert ^{2}\bigr) \end{aligned}
(28)

and

\begin{aligned} &\zeta _{n}\biggl(1-\frac{1}{t_{n+1}}\biggr) \bigl[(f+g) (x_{n})-(f+g) \bigl(x^{*}\bigr)\bigr] - \zeta _{n}\bigl[(f+g) (x_{n+1})-(f+g) \bigl(x^{*} \bigr)\bigr] \\ &\quad = \frac{\zeta _{n}}{t^{2}_{n+1}}\bigl[\bigl(t^{2}_{n+1} - t_{n+1}\bigr)\bigl[(f + g) (x_{n}) - (f + g) \bigl(x^{*}\bigr)\bigr] \\ &\qquad {} - t^{2}_{n+1}\bigl[(f + g) (x_{n+1}) - (f + g) \bigl(x^{*}\bigr)\bigr]\bigr] \\ &\quad \leq \frac{\zeta _{n}}{t^{2}_{n+1}}\bigl[t^{2}_{n}\bigl[(f + g) (x_{n}) - (f + g) \bigl(x^{*}\bigr)\bigr] - t^{2}_{n+1}\bigl[(f + g) (x_{n+1}) - (f + g) \bigl(x^{*}\bigr)\bigr]\bigr]. \end{aligned}
(29)
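The last inequality in (29) can be unpacked as follows. This is only a sketch: it assumes, as recalled in Remark 10, that C1 includes $$t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n}$$, and it uses the fact that $$x^{*}$$ is a minimizer of $$f+g$$:

$$\bigl(t^{2}_{n+1} - t_{n+1}\bigr)\bigl[(f+g) (x_{n}) - (f+g) \bigl(x^{*}\bigr)\bigr] \leq t^{2}_{n}\bigl[(f+g) (x_{n}) - (f+g) \bigl(x^{*}\bigr)\bigr], \quad \text{since } (f+g) (x_{n}) - (f+g) \bigl(x^{*}\bigr) \geq 0.$$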

Hence, we obtain from (27), (28), and (29) that, for $$n \in \mathbb{N}$$,

\begin{aligned} & \frac{1}{ t^{2}_{n+1}}\bigl( \bigl\Vert t_{n+1}x_{n+1}-(t_{n+1}-1)x_{n}-x^{*} \bigr\Vert ^{2} - \bigl\Vert t_{n}x_{n} - (t_{n}-1)x_{n-1} - x^{*} \bigr\Vert ^{2}\bigr) \\ &\quad \leq \frac{\zeta _{n}}{t^{2}_{n+1}}\bigl[t^{2}_{n}\bigl[(f + g) (x_{n}) - (f + g) \bigl(x^{*}\bigr)\bigr]- t^{2}_{n+1}\bigl[(f + g) (x_{n+1}) - (f + g) \bigl(x^{*}\bigr)\bigr]\bigr]. \end{aligned}
(30)

We know that $$\zeta _{n+1} \leq \zeta _{n}$$, so after rearranging (30), we have, for $$n \in \mathbb{N}$$,

\begin{aligned} t^{2}_{n+1}\zeta _{n+1} \bigl[(f+g) (x_{n+1})-(f+g) \bigl(x^{*}\bigr)\bigr]\leq{}& \bigl\Vert t_{n}x_{n} - (t_{n}-1)x_{n-1} - x^{*} \bigr\Vert ^{2} \\ &{}- \bigl\Vert t_{n+1}x_{n+1}-(t_{n+1}-1)x_{n}-x^{*} \bigr\Vert ^{2} \\ &{} + t^{2}_{n}\zeta _{n} \bigl[(f+g) (x_{n})-(f+g) \bigl(x^{*}\bigr)\bigr]. \end{aligned}
(31)

Furthermore, by using (31), we can inductively show that

\begin{aligned} t^{2}_{n+1}\zeta _{n+1}\bigl[(f+g) (x_{n+1})-(f+g) \bigl(x^{*}\bigr)\bigr]\leq{}& \bigl\Vert t_{n}x_{n} - (t_{n}-1)x_{n-1} - x^{*} \bigr\Vert ^{2} \\ &{}+ t^{2}_{n}\zeta _{n}\bigl[(f+g) (x_{n})-(f+g) \bigl(x^{*}\bigr)\bigr] \\ \leq{}& \bigl\Vert t_{n-1}x_{n-1}-(t_{n-1}-1)x_{n-2}-x^{*} \bigr\Vert ^{2} \\ &{} + t^{2}_{n -1}\zeta _{n -1}\bigl[(f + g) (x_{n -1}) - (f + g) \bigl(x^{*}\bigr)\bigr] \\ & \vdots \\ \leq{} &\bigl\Vert t_{1}x_{1} - (t_{1}-1)x_{0}-x^{*} \bigr\Vert ^{2} \\ &{}+ t^{2}_{1}\zeta _{1} \bigl[(f+g) (x_{1})-(f+g) \bigl(x^{*}\bigr)\bigr] \end{aligned}

for all $$n \in \mathbb{N}$$. Since $$\zeta _{n} = 2(\gamma _{n} + \alpha _{n}\rho _{n}) \geq 3\rho$$, we obtain, for all $$n \in \mathbb{N}$$, that

\begin{aligned} (f+g) (x_{n+1})- \min_{x \in H}(f+g) (x)\leq{}& \frac{1}{t^{2}_{n+1}\zeta _{n+1}} \bigl\Vert x_{1}-x^{*} \bigr\Vert ^{2} \\ &{}+ \frac{t^{2}_{1}\zeta _{1}}{t^{2}_{n+1}\zeta _{n+1}}\bigl[(f+g) (x_{1})-(f+g) \bigl(x^{*}\bigr)\bigr] \\ \leq{} &\frac{ \Vert x_{1} - x^{*} \Vert ^{2} + t^{2}_{1}\zeta _{1}[(f + g)(x_{1})- \min_{x \in H}(f + g)(x)]}{3\rho t^{2}_{n+1}}. \end{aligned}

Since $$x^{*}$$ is chosen from $$S_{*}$$ arbitrarily, we have

$$(f+g) (x_{n+1})- \min_{x \in H}(f+g) (x) \leq \frac{d(x_{1},S_{*})^{2} + t^{2}_{1}\zeta _{1}[(f+g)(x_{1})- \min_{x \in H}(f+g)(x)]}{3\rho t^{2}_{n+1}}$$

for all $$n \in \mathbb{N}$$. Hence, we obtain the desired results and the proof is complete. □

### Remark 10

To justify that there exists a sequence $$\{\beta _{n}\}$$ satisfying C1, we choose

$$\beta _{n} = \textstyle\begin{cases} 0.9, &\text{if } n \leq 1000 \\ \frac{1}{n^{2}}, &\text{if } n \geq 1001. \end{cases}$$

Obviously, $$\beta _{n} \geq \beta _{n+1}$$ for all $$n \in \mathbb{N}$$. Since

$$\sum_{k = n}^{+\infty }\Biggl(\prod _{i=n}^{k}\beta _{i}\Biggr) = \beta _{n} + \beta _{n} \beta _{n+1} + \beta _{n} \beta _{n+1} \beta _{n+2} + \cdots ,$$

we have

$$\sum_{k = n}^{+\infty }\Biggl(\prod _{i=n}^{k}\beta _{i}\Biggr) - \sum _{k = n+1}^{+\infty }\Biggl(\prod _{i=n+1}^{k}\beta _{i}\Biggr) \geq 0,$$

and hence $$t_{n+1} \leq t_{n}$$. Furthermore, it is easy to see that

$$\sum_{k = 1}^{+\infty }\Biggl(\prod _{i=1}^{k}\beta _{i}\Biggr) < + \infty .$$

Therefore, $$\sum_{k = n}^{+\infty }(\prod_{i=n}^{k}\beta _{i}) < + \infty$$ and $$t^{2}_{n+1} - t_{n+1} \leq t^{2}_{n}$$ for all $$n \in \mathbb{N}$$, so C1 is satisfied.
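The conditions discussed in Remark 10 can also be checked numerically. The sketch below assumes that $$t_{n} = 1 + \sum_{k = n}^{+\infty }(\prod_{i=n}^{k}\beta _{i})$$, a definition inferred from the identity $$\beta _{n}t_{n+1} = t_{n} - 1$$ used in (28) and not restated in this section, so that $$t_{n} = 1 + \beta _{n}t_{n+1}$$. The infinite tail is truncated at a hypothetical index `n_max`, and all function names are illustrative.

```python
def beta(n):
    """Inertial parameter chosen in Remark 10."""
    return 0.9 if n <= 1000 else 1.0 / n**2


def t_sequence(beta_fn, n_max):
    """Approximate t_1, ..., t_{n_max} via t_n = 1 + beta_n * t_{n+1}.

    Assumption: the tail beyond n_max is dropped, i.e. t_{n_max+1} is set to 1.
    Index 0 of the returned list is unused so that t[n] matches t_n.
    """
    t = [0.0] * (n_max + 2)
    t[n_max + 1] = 1.0
    for n in range(n_max, 0, -1):
        t[n] = 1.0 + beta_fn(n) * t[n + 1]
    return t


t = t_sequence(beta, n_max=5000)
# t_{n+1} <= t_n, as claimed in Remark 10
monotone = all(t[n + 1] <= t[n] for n in range(1, 5000))
# t_{n+1}^2 - t_{n+1} <= t_n^2, the growth condition attributed to C1
c1_growth = all(t[n + 1] ** 2 - t[n + 1] <= t[n] ** 2 for n in range(1, 5000))
print(monotone, c1_growth)
```

A finite check of this kind does not replace the argument in the remark; it only illustrates the behavior of the truncated sequence.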

## Applications to data classification and regression problems

In this section, we apply Algorithm 3 to solving regression and classification problems. Moreover, we conduct some numerical experiments for comparing the performance of Algorithm 3 with Algorithm 1 and Algorithm 2.

Machine learning is an application of artificial intelligence (AI) which has the ability to automatically learn and improve from experience. There are many techniques for the machine to learn. In this work, we focus on the extreme learning machine (ELM) introduced by Huang et al., which is defined as follows.

Let $$S := \{(x_{k},t_{k}): x_{k} \in \mathbb{R}^{n}, t_{k} \in \mathbb{R}^{m}, k = 1,2,\ldots,N\}$$ be a training set of N distinct samples, where $$x_{k}$$ is an input and $$t_{k}$$ is the corresponding target. The output function of ELM for single-hidden-layer feedforward networks (SLFNs) with M hidden nodes and activation function G is

$$o_{j} = \sum_{i=1}^{M} \eta _{i} G\bigl(\langle w_{i},x_{j} \rangle + b_{i}\bigr),$$

where $$w_{i}$$ is the weight vector connecting the ith hidden node and the input nodes, $$\eta _{i}$$ is the weight vector connecting the ith hidden node and the output nodes, and $$b_{i}$$ is the bias of the ith hidden node. The hidden layer output matrix H is defined as follows:

$$\textbf{H} = \begin{bmatrix} G(\langle w_{1},x_{1} \rangle + b_{1}) & \cdots & G(\langle w_{M},x_{1} \rangle + b_{M}) \\ \vdots & \ddots & \vdots \\ G(\langle w_{1},x_{N} \rangle + b_{1}) & \cdots & G(\langle w_{M},x_{N} \rangle + b_{M}) \end{bmatrix} .$$

Solving ELM amounts to finding $$\eta = [\eta ^{T}_{1},\ldots,\eta ^{T}_{M} ]^{T}$$ such that $$\textbf{H}\eta = \textbf{T}$$, where $$\textbf{T} = [t^{T}_{1},\ldots,t^{T}_{N} ]^{T}$$ is the training target data. We can write the solution η in the form $$\eta = \textbf{H}^{\dagger }\textbf{T}$$, where $$\textbf{H}^{\dagger }$$ is the Moore–Penrose generalized inverse of H. However, if $$\textbf{H}^{\dagger }$$ does not exist, then η is quite difficult to find. In this case, we can employ the concept of convex minimization to find such η without relying on the existence of $$\textbf{H}^{\dagger }$$.
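As a minimal illustration of the ELM construction above, the following sketch builds the hidden layer output matrix H with a sigmoid activation and computes $$\eta = \textbf{H}^{\dagger }\textbf{T}$$ with NumPy. The data, the sizes, and the random choice of $$w_{i}$$ and $$b_{i}$$ are assumptions made for illustration (random hidden parameters are customary for ELM, but the values here are not taken from the experiments), and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden_layer_matrix(X, W, b):
    """H[j, i] = G(<w_i, x_j> + b_i) for inputs X (N x n), weights W (M x n), biases b (M,)."""
    return sigmoid(X @ W.T + b)


rng = np.random.default_rng(0)      # fixed seed, illustrative only
N, n, M, m = 20, 3, 5, 2            # hypothetical sizes
X = rng.standard_normal((N, n))     # inputs x_1, ..., x_N as rows
T = rng.standard_normal((N, m))     # targets t_1, ..., t_N as rows
W = rng.standard_normal((M, n))     # hidden weights w_i (random, by assumption)
b = rng.standard_normal(M)          # biases b_i (random, by assumption)

H = hidden_layer_matrix(X, W, b)    # shape (N, M)
eta = np.linalg.pinv(H) @ T         # eta = H^dagger T, shape (M, m)
residual = np.linalg.norm(H @ eta - T)
print(H.shape, eta.shape, residual)
```

When N exceeds M, as here, $$\textbf{H}\eta = \textbf{T}$$ generally has no exact solution and the pseudoinverse returns a least-squares fit, which is one motivation for the regularized formulation that follows.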

To prevent overfitting, we use the following regularization, known as the least absolute shrinkage and selection operator (LASSO):

$$\text{Minimize:} \quad \Vert \textbf{H}\eta - \textbf{T} \Vert ^{2}_{2} + \lambda \Vert \eta \Vert _{1},$$
(32)

where λ is a regularization parameter, and consider $$f(x) =\| \textbf{H}x - \textbf{T} \|^{2}_{2}$$ and $$g(x) = \lambda \| x \|_{1}$$. Based on this model, we conduct some numerical experiments on a regression of a sine function and a classification on the Iris and heart disease datasets.
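For this choice of f and g, the gradient is $$\nabla f(x) = 2\textbf{H}^{T}(\textbf{H}x - \textbf{T})$$ and the proximal operator of $$\alpha g$$ is componentwise soft thresholding with threshold $$\alpha \lambda$$. The sketch below applies the classical forward–backward step $$x_{n+1} = \mathrm{prox}_{\alpha g}(x_{n} - \alpha \nabla f(x_{n}))$$ with a fixed step size. It is not Algorithm 3: the linesearch and the inertial term are omitted, the step size $$1/L$$ with $$L = 2\| \textbf{H} \|^{2}_{2}$$ is an assumption, and the data are synthetic.

```python
import numpy as np


def soft_threshold(v, tau):
    """prox of tau * ||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(H, T, x, lam):
    """f(x) + g(x) = ||Hx - T||_2^2 + lam * ||x||_1."""
    return np.sum((H @ x - T) ** 2) + lam * np.sum(np.abs(x))


def forward_backward_lasso(H, T, lam, n_iter=500):
    """Plain forward-backward iteration with a fixed step (not Algorithm 3)."""
    L = 2.0 * np.linalg.norm(H, 2) ** 2      # Lipschitz constant of grad f
    alpha = 1.0 / L                          # fixed step size (assumption)
    x = np.zeros((H.shape[1],) + T.shape[1:])
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ x - T)       # forward (gradient) step on f
        x = soft_threshold(x - alpha * grad, alpha * lam)  # backward (prox) step on g
    return x


rng = np.random.default_rng(1)               # synthetic data, illustrative only
H = rng.standard_normal((30, 8))
T = rng.standard_normal(30)
x_hat = forward_backward_lasso(H, T, lam=1e-5)
print(lasso_objective(H, T, x_hat, 1e-5))
```

The fixed step requires the Lipschitz constant of the gradient, which is exactly the kind of information a linesearch is meant to avoid.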

Throughout Sects. 4.1 and 4.2, we use the sigmoid as an activation function. Moreover, we choose parameters according to the hypotheses of Theorem 7. All experiments are performed on an Intel Core i5-7500 CPU with 16GB RAM and a GeForce GTX 1060 6GB GPU.

### Experiments for regression

We generate ten distinct points $$x_{1},x_{2},\ldots,x_{10}$$ in the interval $$[-4, 4]$$, define the training set $$S := \{\sin x_{n} : n=1,\ldots,10\}$$, and use the graph of the sine function on $$[-4, 4]$$ as the target. Moreover, we set $$M = 25$$ as the number of hidden nodes, and $$\lambda = 10^{-5}$$.

For the first experiment, we set $$\delta = 0.49$$, $$\sigma = 0.1$$, $$\theta = 0.1$$, and $$\alpha _{n} = \zeta _{n} = \frac{0.9n}{n+1}$$ in Algorithm 3 to evaluate the convergence behavior of Algorithm 3 with various inertial parameters $$\beta _{n}$$, namely

\begin{aligned}& \beta ^{1}_{n} = 0, \qquad \beta _{n}^{2}= \textstyle\begin{cases} \frac{n}{n+1}, &\text{if } n \leq 10\text{,}000, \\ \frac{1}{n^{2}}, &\text{if } n \geq 10\text{,}001, \end{cases}\displaystyle \\& \beta ^{3}_{n} = \textstyle\begin{cases} 0.9, &\text{if } n \leq 10\text{,}000, \\ \frac{1}{n^{2}}, &\text{if } n \geq 10\text{,}001, \end{cases}\displaystyle \quad \text{and}\quad \beta _{n}^{4} = \frac{10^{10}}{ \Vert x_{n} - x_{n-1} \Vert ^{3}+n^{3}+10^{10}} . \end{aligned}
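The four inertial parameter choices above can be written as small functions, which makes the switch at $$n = 10\text{,}000$$ explicit. This is only a sketch; the function names are hypothetical, and `step_norm` stands for $$\Vert x_{n} - x_{n-1} \Vert$$, which must be supplied by the iteration.

```python
def beta1(n):
    """beta^1_n = 0 (no inertial term)."""
    return 0.0


def beta2(n):
    """beta^2_n = n / (n + 1) up to n = 10,000, then 1 / n^2."""
    return n / (n + 1) if n <= 10_000 else 1.0 / n**2


def beta3(n):
    """beta^3_n = 0.9 up to n = 10,000, then 1 / n^2."""
    return 0.9 if n <= 10_000 else 1.0 / n**2


def beta4(n, step_norm):
    """beta^4_n = 10^10 / (||x_n - x_{n-1}||^3 + n^3 + 10^10)."""
    return 1e10 / (step_norm**3 + n**3 + 1e10)


for n in (1, 10_000, 10_001):
    print(n, beta1(n), beta2(n), beta3(n), beta4(n, step_norm=0.1))
```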

To evaluate the performance, we use the mean squared error (MSE) defined as follows:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\bar{y}_{i} - y_{i})^{2}.$$

Using an MSE of $$1 \times 10^{-3}$$ or a maximum of 1000 iterations as the stopping criterion, we obtain the results in Table 1, which show that some inertial parameters improve the performance of Algorithm 3 substantially.

In the next experiment, we compare the performance of Algorithm 3 with Algorithm 1 and Algorithm 2. All the parameters are chosen as seen in Table 2.

With an MSE of $$1 \times 10^{-3}$$ or a maximum of 30,000 iterations as the stopping criterion, we obtain the results shown in Table 3.

From Table 3, we see that Algorithm 3 takes only 433 iterations to reach the stopping criteria, so it outperforms both Algorithm 1 and 2.

In the following experiment, we evaluate the performance of each algorithm at the 700th iteration with mean absolute error (MAE) and root mean squared error (RMSE) defined as follows:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \vert \bar{y}_{i} - y_{i} \vert , \qquad \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\bar{y}_{i} - y_{i})^{2}}.$$
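The error measures used in this section (MSE, MAE, and RMSE) can be computed as in the following sketch. The prediction and target values are hypothetical and serve only to exercise the formulas; they are not taken from the experiments.

```python
import math


def mse(pred, target):
    """Mean squared error: average of (pred_i - target_i)^2."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


def mae(pred, target):
    """Mean absolute error: average of |pred_i - target_i|."""
    return sum(abs(p - y) for p, y in zip(pred, target)) / len(target)


def rmse(pred, target):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(pred, target))


pred = [1.0, 2.0, 4.0]      # hypothetical predictions
target = [1.0, 3.0, 2.0]    # hypothetical targets
print(mse(pred, target), mae(pred, target), rmse(pred, target))
```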

The results can be seen in Table 4.

As seen in Table 4, Algorithm 3 achieves the lowest MAE and RMSE at the 700th iteration. In Fig. 1, we illustrate the performance of each algorithm at the 700th iteration.

### Data classification

We conduct some experiments on the Iris dataset and the heart disease dataset from https://archive.ics.uci.edu. The Iris dataset contains three classes of Iris plants with 50 instances of each, and the heart disease dataset contains two classes, namely 165 patients with heart disease and 138 patients without heart disease. See Table 5 for more details of the datasets.

We set the number of hidden nodes $$M = 35$$ and $$\lambda = 10^{-5}$$ for both datasets. To estimate the optimal weight η, we use Algorithm 1, Algorithm 2, and Algorithm 3, and the output O of the training and testing sets is calculated by $$O = \textbf{H} \eta$$.

Furthermore, each dataset is split into a training set and a testing set; see Table 6 for details.

The accuracy is calculated by the following:

$$\text{accuracy } = \frac{\text{correctly predicted data}}{\text{ all data }}\times 100.$$

We denote by acc.train and acc.test the accuracy on the training and testing sets, respectively. We first compare the accuracy of Algorithm 3 at the 100th iteration with different inertial parameters $$\beta _{n}$$, namely

$$\beta ^{1}_{n} = 0, \qquad \beta _{n}^{2}= \textstyle\begin{cases} \frac{n}{n+1}, &\text{if } n \leq 1000 \\ \frac{1}{n^{2}},& \text{if } n \geq 1001, \end{cases}\displaystyle \quad \text{and}\quad \beta _{n}^{3} = \textstyle\begin{cases} 0.9, &\text{if } n \leq 1000 \\ \frac{1}{n^{2}}, &\text{if } n \geq 1001. \end{cases}$$

By setting $$\sigma = 0.49$$, $$\delta = 0.1$$, $$\theta = 0.1$$, and $$\alpha _{n} = \frac{0.9n}{n+1}$$ in Algorithm 3, the numerical experiment for data classification can be seen in Table 7.

It is observed that $$\beta ^{3}_{n}$$ achieves the highest accuracy, so throughout this section we choose $$\beta ^{3}_{n}$$ as inertial parameters.

The next experiment is a comparison of the performance for Algorithm 1, Algorithm 2, and Algorithm 3 at the 100th iteration. See Table 8 for the result.

Now, we employ 10-fold stratified cross validation on both Iris and heart disease datasets. We denote

$$\text{Average ACC} = \sum_{i=1}^{N} \frac{x_{i}}{y_{i}} \times 100\%/N,$$

where N is the number of folds, $$x_{i}$$ is the number of correctly predicted samples at fold i, and $$y_{i}$$ is the number of all samples at fold i. We also define

$$\text{err}_{L\%} = \frac{\text{sum of errors in all 10 training sets}}{\text{sum of all samples in 10 training sets}} \times 100\%,$$

and

$$\text{err}_{T\%} = \frac{\text{sum of errors in all 10 testing sets}}{\text{sum of all samples in 10 testing sets}} \times 100\%.$$

Then we define

$$\text{ERR}_{\%} = (\text{err}_{L\%} + \text{err}_{T\%})/2 .$$
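The cross-validation measures defined above can be computed from per-fold counts as in the sketch below. The function names and the fold counts are hypothetical; only the formulas follow the definitions in the text.

```python
def average_acc(correct, totals):
    """Average ACC: mean over folds of (x_i / y_i) * 100."""
    n_folds = len(correct)
    return sum(x / y for x, y in zip(correct, totals)) * 100.0 / n_folds


def pooled_error(errors, totals):
    """err_%: total errors over all folds divided by total samples, times 100."""
    return sum(errors) / sum(totals) * 100.0


def err_percent(train_errors, train_totals, test_errors, test_totals):
    """ERR_%: average of the pooled training and testing error percentages."""
    err_l = pooled_error(train_errors, train_totals)
    err_t = pooled_error(test_errors, test_totals)
    return (err_l + err_t) / 2.0


# Hypothetical counts for two folds (not experimental data)
print(average_acc(correct=[9, 8], totals=[10, 10]))
print(err_percent([1, 2], [10, 10], [0, 1], [5, 5]))
```

Note that Average ACC averages the per-fold ratios, whereas the error percentages pool the counts over all folds before dividing, so the two need not sum to 100 when fold sizes differ.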

In Table 9, we show the result for classification of Iris dataset at the 100th iteration by Algorithm 1, Algorithm 2, and Algorithm 3 at each fold.

In Table 10, we show the result of heart disease dataset at the 100th iteration.

According to Tables 9 and 10, we can conclude that Algorithm 3 achieves the highest accuracy.

## Conclusions

In this paper, a new algorithm for solving convex minimization problems, which combines an inertial technique with the linesearch technique proposed by Cruz and Nghia, is introduced and studied. We prove weak convergence of the proposed algorithm to a solution of (1) without assuming that the gradient of f is L-Lipschitz continuous. The complexity theorem is also proved under some control conditions. We also employ our algorithm as a machine learning algorithm based on the extreme learning machine (ELM) model introduced by Huang et al. for regression and classification problems. Moreover, our experiments show that the proposed algorithm exhibits good convergence behavior, requiring few iterations and attaining high accuracy on regression and classification problems, which suggests that it is faster than Algorithm 1 and Algorithm 2 in these tests.

## Availability of data and materials

The datasets analysed during the current study are available in https://archive.ics.uci.edu.

## References

1. Chen, M., Zhang, H., Lin, G., Han, Q.: A new local and nonlocal total variation regularization model for image denoising. Clust. Comput. 22, 7611–7627 (2019)

2. Combettes, P.L., Wajs, V.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

3. Hanjing, A., Suantai, S.: A fast image restoration algorithm based on a fixed point and optimization method. Mathematics 8, 378 (2020). https://doi.org/10.3390/math8030378

4. Kankam, K., Pholasa, N., Cholamjiak, C.: On convergence and complexity of the modified forward-backward method involving new linesearches for convex minimization. Math. Methods Appl. Sci. 42, 1352–1362 (2019)

5. Kowalski, M., Meynard, A., Wu, H.: Convex optimization approach to signals with fast varying instantaneous frequency. Appl. Comput. Harmon. Anal. 44, 89–122 (2018)

6. Shehu, Y., Iyiola, O.S., Ogbuisi, F.U.: Iterative method with inertial terms for nonexpansive mappings: applications to compressed sensing. Numer. Algorithms 83, 1321–1347 (2020)

7. Zhang, Y., Li, X., Zhao, G., Cavalcante, C.C.: Signal reconstruction of compressed sensing based on alternating direction method of multipliers. Circuits Syst. Signal Process. 39, 307–323 (2020)

8. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

9. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

10. Bussaban, L., Suantai, S., Kaewkhao, A.: A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpath. J. Math. 36, 21–30 (2020)

11. Bello Cruz, J.Y., Nghia, T.T.: On the convergence of the forward-backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)

12. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)

13. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)

14. Attouch, H., Cabot, A.: Convergence rate of a relaxed inertial proximal algorithm for convex minimization. Optimization 69, 1281–1312 (2019)

15. Abbas, M., Iqbal, H.: Two inertial extragradient viscosity algorithms for solving variational inequality and fixed point problems. J. Nonlinear Var. Anal. 4, 377–398 (2020)

16. Abass, H.A., Aremu, K.O., Jolaoso, L.O., Mewomo, O.T.: An inertial forward-backward splitting method for approximating solutions of certain optimization problems. J. Nonlinear Funct. Anal. 2020, 6 (2020). https://doi.org/10.23952/jnfa.2020.6

17. Luo, Y.: An inertial splitting algorithm for solving inclusion problems and its applications to compressed sensing. J. Appl. Numer. Optim. 2, 279–295 (2020)

18. Alvarez, F.: Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim. 14, 773–782 (2004)

19. Bot, R.I., Csetnek, E.R., Laszlo, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)

20. Chidume, C.E., Kumam, P., Adamu, A.: A hybrid inertial algorithm for approximating solution of convex feasibility problems with applications. Fixed Point Theory Appl. 2020, 12 (2020). https://doi.org/10.1186/s13663-020-00678-w

21. Thong, D.V., Vinh, N.T., Cho, Y.J.: New strong convergence theorem of the inertial projection and contraction method for variational inequality problems. Numer. Algorithms 84, 285–305 (2020)

22. Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Sci. Comput. 24, 232–256 (2014)

23. Moudafi, A.: On the convergence of the forward-backward algorithm for null-point problems. J. Nonlinear Var. Anal. 2, 263–268 (2018)

24. Cholamjiak, P., Shehu, Y.: Inertial forward-backward splitting method in Banach spaces with application to compressed sensing. Appl. Math. 64(4), 409–435 (2019)

25. Shehu, Y., Iyiola, O.S.: Strong convergence result for proximal split feasibility problem in Hilbert spaces. Optimization 66(12), 2275–2290 (2017)

26. Burachik, R.S., Iusem, A.N.: Set-Valued Mappings and Enlargements of Monotone Operators. Springer, Berlin (2008)

27. Huang, Y., Dong, Y.: New properties of forward-backward splitting and a practical proximal-descent algorithm. Appl. Math. Comput. 237, 60–68 (2014)

28. Takahashi, W.: Introduction to Nonlinear and Convex Analysis. Yokohama Publishers, Yokohama (2009)

29. Moudafi, A., Al-Shemas, E.: Simultaneous iterative methods for split equality problem. Trans. Math. Program. Appl. 1, 1–11 (2013)

30. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)

31. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc., Ser. B, Stat. Methodol. 58, 267–288 (1996)

32. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)

33. Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.J., Sandhu, S., Guppy, K.H., Lee, S., Froelicher, V.: International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 64, 304–310 (1989). https://doi.org/10.1016/0002-9149(89)90524-9

## Acknowledgements

DC was supported by Post-Doctoral Fellowship of Chiang Mai University, Thailand. This research was also supported by Chiang Mai University and Thailand Science Research and Innovation under the project IRN62W0007.

## Funding

This work was funded by Chiang Mai University and Thailand Science Research and Innovation.

## Author information

### Contributions

Writing original draft preparation, PS; review and editing, WI; software and editing, DC; supervision, SS. All authors have read and agreed to the published version of the manuscript.

### Corresponding author

Correspondence to Suthep Suantai.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions 