In this section, we establish a thresholding representation theory for the problem \((QP_{a}^{\lambda})\), which underlies the proposed algorithm. We then propose an iterative fraction thresholding algorithm (IFTA) to solve the problem \((QP_{a}^{\lambda})\) for all \(a>0\).

### 3.1 Thresholding representation theory

For any fixed parameters \(\lambda>0\), \(\mu>0\), \(a>0\) and any \(x, y\in\mathbb{R}^{n}\), let

$$ C_{1}(x)= \bigl\Vert F(x)x-b \bigr\Vert _{2}^{2}+\lambda P_{a}(x) $$

(19)

and

$$ C_{2}(x, y)=\mu \bigl\Vert F(y)x-b \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)-\mu \bigl\Vert F(y)x-F(y)y \bigr\Vert _{2}^{2}+ \Vert x-y \Vert _{2}^{2}. $$

(20)

It is clear that \(C_{2}(x,x)=\mu C_{1}(x)\) for all \(\mu>0\).
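The identity \(C_{2}(x,x)=\mu C_{1}(x)\) can be checked numerically with a minimal sketch. Two ingredients here are assumptions for illustration only: the penalty is taken to be the fraction function \(P_{a}(x)=\sum_{i}a\vert x_{i}\vert/(1+a\vert x_{i}\vert)\) (its definition is not restated in this section), and `F` is a hypothetical matrix-valued map.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * q for p, q in zip(row, v)) for row in A]

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(p * p for p in v)

def sub(u, v):
    return [p - q for p, q in zip(u, v)]

def P(x, a):
    """Assumed fraction penalty: sum_i a|x_i| / (1 + a|x_i|)."""
    return sum(a * abs(p) / (1.0 + a * abs(p)) for p in x)

def F(y):
    """Hypothetical quasi-linear map: a 2x3 matrix whose first entry depends on y."""
    return [[1.0 + 0.1 * y[0], 2.0, 0.0],
            [0.0, 1.0, -1.0]]

def C1(x, b, lam, a):
    """C_1(x) as in (19)."""
    return sq_norm(sub(matvec(F(x), x), b)) + lam * P(x, a)

def C2(x, y, b, lam, mu, a):
    """C_2(x, y) as in (20)."""
    Fy = F(y)
    Fyx, Fyy = matvec(Fy, x), matvec(Fy, y)
    return (mu * sq_norm(sub(Fyx, b)) + lam * mu * P(x, a)
            - mu * sq_norm(sub(Fyx, Fyy)) + sq_norm(sub(x, y)))

# Arbitrary test data (not from the paper).
x = [0.5, -1.0, 2.0]
b = [1.0, 0.0]
lam, mu, a = 0.3, 0.05, 2.0
lhs = C2(x, x, b, lam, mu, a)   # C_2(x, x)
rhs = mu * C1(x, b, lam, a)     # mu * C_1(x)
```

With \(y=x\), the last two terms of (20) vanish, so `lhs` and `rhs` agree up to rounding.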

### Theorem 1

*Let* \(\lambda>0\) *and* \(0<\mu<L_{\ast}^{-1}\), *where* \(L_{\ast}>0\) *satisfies* \(\Vert F(x^{\ast })x-F(x^{\ast})x^{\ast} \Vert _{2}^{2}\leq L_{\ast} \Vert x-x^{\ast} \Vert _{2}^{2}\) *for all* \(x\in\mathbb{R}^{n}\). *If* \(x^{\ast}\) *is an optimal solution of* \(\min_{x\in\mathbb {R}^{n}}C_{1}(x)\), *then* \(x^{\ast}\) *is also an optimal solution of* \(\min_{x\in\mathbb{R}^{n}}C_{2}(x,x^{\ast})\), *that is*,

$$C_{2}\bigl(x^{\ast},x^{\ast}\bigr)\leq C_{2} \bigl(x,x^{\ast}\bigr) $$

*for any* \(x\in\mathbb{R}^{n}\).

### Proof

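By the definition of \(C_{2}(x, y)\), and since \(0<\mu<L_{\ast}^{-1}\) gives \(\Vert x-x^{\ast} \Vert _{2}^{2}-\mu \Vert F(x^{\ast})x-F(x^{\ast})x^{\ast} \Vert _{2}^{2}\geq(1-\mu L_{\ast}) \Vert x-x^{\ast} \Vert _{2}^{2}\geq0\), we have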

$$\begin{aligned} C_{2}\bigl(x,x^{\ast}\bigr)&=\mu \bigl\Vert F \bigl(x^{\ast}\bigr)x-b \bigr\Vert _{2}^{2}+\lambda \mu P_{a}(x)-\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x-F \bigl(x^{\ast}\bigr)x^{\ast} \bigr\Vert _{2}^{2}+ \bigl\Vert x-x^{\ast} \bigr\Vert _{2}^{2} \\ &\geq\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x-b \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x) \\ &\geq \mu C_{1}\bigl(x^{\ast}\bigr) \\ &=C_{2}\bigl(x^{\ast},x^{\ast}\bigr). \end{aligned}$$

□

### Theorem 2

*For any* \(\lambda>0\), \(\mu>0\) *and any solution* \(x^{\ast}\) *of* \(\min_{x\in\mathbb{R}^{n}}C_{1}(x)\), *the problem* \(\min_{x\in\mathbb {R}^{n}}C_{2}(x,x^{\ast})\) *is equivalent to*

$$ \min_{x\in\mathbb{R}^{n}} \bigl\{ \bigl\Vert x-B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+ \lambda\mu P_{a}(x) \bigr\} , $$

(21)

*where* \(B_{\mu}(x^{\ast})=x^{\ast}+\mu F(x^{\ast})^{\top }(b-F(x^{\ast})x^{\ast})\).

### Proof

Expanding the squared norms in the definition, \(C_{2}(x,x^{\ast})\) can be rewritten as

$$\begin{aligned} C_{2}\bigl(x,x^{\ast}\bigr)={}& \bigl\Vert x- \bigl(x^{\ast}-\mu F\bigl(x^{\ast}\bigr)^{\top}F \bigl(x^{\ast }\bigr)x^{\ast}+\mu F\bigl(x^{\ast} \bigr)^{\top}b\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)+\mu \Vert b \Vert _{2}^{2}+ \bigl\Vert x^{\ast} \bigr\Vert _{2}^{2} \\ &{}-\mu \bigl\Vert F\bigl(x^{\ast}\bigr)x^{\ast} \bigr\Vert _{2}^{2}- \bigl\Vert x^{\ast}-\mu F \bigl(x^{\ast }\bigr)^{\top}F\bigl(x^{\ast} \bigr)x^{\ast}+\mu F\bigl(x^{\ast}\bigr)^{\top}b \bigr\Vert _{2}^{2} \\ ={}& \bigl\Vert x-B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x)+\mu \Vert b \Vert _{2}^{2}+ \bigl\Vert x^{\ast} \bigr\Vert _{2}^{2}-\mu \bigl\Vert F\bigl(x^{\ast} \bigr)x^{\ast} \bigr\Vert _{2}^{2}- \bigl\Vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}, \end{aligned}$$

which implies, since the last four terms do not depend on \(x\), that \(\min_{x\in\mathbb{R}^{n}}C_{2}(x,x^{\ast})\) for any \(\lambda>0\), \(\mu>0\) is equivalent to

$$\min_{x\in\mathbb{R}^{n}} \bigl\{ \bigl\Vert x-B_{\mu} \bigl(x^{\ast}\bigr) \bigr\Vert _{2}^{2}+\lambda\mu P_{a}(x) \bigr\} . $$

□
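The completing-the-square step in this proof can be verified numerically. The sketch below evaluates \(C_{2}(x,x^{\ast})\) and the objective of (21) at several trial points and records their difference, which should be the same constant each time. The fraction form of \(P_{a}\), the matrix `Fs` standing for \(F(x^{\ast})\), and all data are assumptions for illustration; the identity is algebraic, so `xs` need not be an actual minimizer for the check to work.

```python
def matvec(A, v):
    return [sum(p * q for p, q in zip(row, v)) for row in A]

def sq_norm(v):
    return sum(p * p for p in v)

def sub(u, v):
    return [p - q for p, q in zip(u, v)]

def P(x, a):
    """Assumed fraction penalty: sum_i a|x_i| / (1 + a|x_i|)."""
    return sum(a * abs(p) / (1.0 + a * abs(p)) for p in x)

def B_mu(Fs, xs, b, mu):
    """B_mu(x*) = x* + mu * F(x*)^T (b - F(x*) x*), with Fs standing for F(x*)."""
    Ft = [list(col) for col in zip(*Fs)]          # transpose of F(x*)
    step = matvec(Ft, sub(b, matvec(Fs, xs)))
    return [p + mu * q for p, q in zip(xs, step)]

def C2(x, xs, Fs, b, lam, mu, a):
    """C_2(x, x*) as in (20)."""
    Fx, Fxs = matvec(Fs, x), matvec(Fs, xs)
    return (mu * sq_norm(sub(Fx, b)) + lam * mu * P(x, a)
            - mu * sq_norm(sub(Fx, Fxs)) + sq_norm(sub(x, xs)))

def reduced(x, xs, Fs, b, lam, mu, a):
    """Objective of problem (21)."""
    return sq_norm(sub(x, B_mu(Fs, xs, b, mu))) + lam * mu * P(x, a)

Fs = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]   # hypothetical value of F(x*)
xs = [0.2, 0.0, -0.7]                       # hypothetical x*
b = [1.0, 0.5]
lam, mu, a = 0.3, 0.05, 2.0
trial_points = [[0.0, 0.0, 0.0], [1.0, -2.0, 3.0], [0.5, 0.1, -4.0]]
# The gap should not depend on the trial point x.
gaps = [C2(x, xs, Fs, b, lam, mu, a) - reduced(x, xs, Fs, b, lam, mu, a)
        for x in trial_points]
```

Each entry of `gaps` should match \(\mu\Vert b\Vert_{2}^{2}+\Vert x^{\ast}\Vert_{2}^{2}-\mu\Vert F(x^{\ast})x^{\ast}\Vert_{2}^{2}-\Vert B_{\mu}(x^{\ast})\Vert_{2}^{2}\) up to rounding.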

Combining Theorems 1 and 2 with Lemma 1, we obtain the thresholding representation of \((QP_{a}^{\lambda})\):

$$ x^{\ast}=G_{a,\lambda\mu}\bigl(B_{\mu} \bigl(x^{\ast}\bigr)\bigr), $$

(22)

where the operator \(G_{a,\lambda\mu}\) is the operator of Definition 1 with *λ* replaced by *λμ*. With the thresholding representation (22), the IFTA for solving the regularization problem \((QP_{a}^{\lambda})\) is naturally defined as

$$ x^{k+1}=G_{a, \lambda\mu}\bigl(B_{\mu} \bigl(x^{k}\bigr)\bigr),\quad k=0,1,2,\ldots, $$

(23)

where \(B_{\mu}(x^{k})=x^{k}+\mu F(x^{k})^{\top}(b-F(x^{k})x^{k})\).
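To make the operator in (23) concrete, here is a hedged sketch of a componentwise fraction thresholding operator. Definition 1 lies outside this section, so the closed form is an assumption: it is the largest real root of the cubic stationarity condition of \(\min_{x}\{(x-t)^{2}+\lambda a\vert x\vert/(1+a\vert x\vert)\}\), written in trigonometric form, combined with the two threshold values \(\frac{\lambda}{2}a\) and \(\sqrt{\lambda}-\frac{1}{2a}\). The names `threshold`, `g` and `G` are illustrative.

```python
import math

def threshold(a, lam):
    """Threshold value: lam*a/2 if lam <= 1/a^2, else sqrt(lam) - 1/(2a)."""
    if lam <= 1.0 / a**2:
        return lam * a / 2.0
    return math.sqrt(lam) - 1.0 / (2.0 * a)

def g(t, a, lam):
    """Scalar minimiser of (x - t)^2 + lam * a|x| / (1 + a|x|) (assumed closed form)."""
    if abs(t) <= threshold(a, lam):
        return 0.0
    c = 1.0 + a * abs(t)
    # Clip guards against rounding pushing the argument outside [-1, 1].
    arg = 27.0 * lam * a**2 / (4.0 * c**3) - 1.0
    phi = math.acos(max(-1.0, min(1.0, arg)))
    # Largest root y = 1 + a|x| of y^3 - c*y^2 + lam*a^2/2 = 0.
    y = c / 3.0 * (1.0 + 2.0 * math.cos(phi / 3.0 - math.pi / 3.0))
    return math.copysign((y - 1.0) / a, t)

def G(v, a, lam):
    """Apply the scalar thresholding to every component of v."""
    return [g(t, a, lam) for t in v]
```

One step of (23) would then read `x_next = G(B, a, lam * mu)`, where `B` holds the components of \(B_{\mu}(x^{k})\); note that the product \(\lambda\mu\) is passed in place of \(\lambda\).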

### 3.2 Adjusting the values for the regularization parameter \(\lambda>0\)

In this subsection, the cross-validation method (see [9, 10, 12]) is adopted to adjust the value of the regularization parameter \(\lambda>0\) automatically. This choice is most reasonable when some prior information about the regularization problem, such as the sparsity of the solution, is available. Suppose that the vector \(x^{\ast}\) of sparsity *r* is the optimal solution of the regularization problem \((QP_{a}^{\lambda})\), and, without loss of generality, assume that

$$\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{1}\geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{2}\geq\cdots\geq \bigl\vert B_{\mu} \bigl(x^{\ast}\bigr) \bigr\vert _{r}\geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r+1}\geq \cdots \geq \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{n}\geq0. $$

Then it follows from (14) that

$$\begin{aligned} &\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{i}>t_{a,\lambda\mu}^{\ast}\quad\Leftrightarrow\quad i\in\{1,2,\ldots,r \}, \\ &\bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{i}\leq t_{a,\lambda\mu}^{\ast}\quad\Leftrightarrow\quad i\in\{r+1,r+2, \ldots,n\}, \end{aligned}$$

where \(t_{a,\lambda\mu}^{\ast}\) is obtained by replacing *λ* with *λμ* in \(t_{a,\lambda}^{\ast}\).

Since \(t_{a,\lambda\mu}^{2}\leq t_{a,\lambda\mu}^{\ast}\leq t_{a,\lambda\mu}^{1}\), we have

$$ \textstyle\begin{cases} \vert B_{\mu}(x^{\ast}) \vert _{r}\geq t_{a,\lambda\mu}^{\ast}\geq t_{a,\lambda\mu}^{2}=\sqrt{\lambda\mu}-\frac{1}{2a}; \\ \vert B_{\mu}(x^{\ast}) \vert _{r+1}\leq t_{a,\lambda\mu}^{\ast}\leq t_{a,\lambda \mu}^{1}=\frac{\lambda\mu}{2}a. \end{cases} $$

(24)

It follows that

$$ \frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}\leq\lambda\leq\frac {(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu}. $$

(25)
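For completeness, the two bounds in (25) are obtained by solving each line of (24) for *λ*, using only \(a>0\) and \(\mu>0\) (both sides of the second inequality are nonnegative, so squaring is allowed):

$$\begin{aligned} \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r+1}\leq\frac{\lambda\mu}{2}a \quad&\Longrightarrow\quad \lambda\geq\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}, \\ \bigl\vert B_{\mu}\bigl(x^{\ast}\bigr) \bigr\vert _{r}\geq\sqrt{\lambda\mu}-\frac{1}{2a} \quad&\Longrightarrow\quad \sqrt{\lambda\mu}\leq\frac{2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1}{2a} \quad\Longrightarrow\quad \lambda\leq\frac{(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu}. \end{aligned}$$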

From (25), we obtain

$$\lambda\in \biggl[\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}, \frac {(2a \vert B_{\mu}(x^{\ast}) \vert _{r}+1)^{2}}{4a^{2}\mu} \biggr]. $$

We denote by \(\lambda_{1}\) and \(\lambda_{2}\) the left and right endpoints of the above interval, respectively:

$$\lambda_{1}=\frac{2 \vert B_{\mu}(x^{\ast}) \vert _{r+1}}{a\mu}\quad \text{and}\quad \lambda_{2}= \frac{(2a \vert B_{\mu}(x^{\ast }) \vert _{r}+1)^{2}}{4a^{2}\mu}. $$

A choice of *λ* is

$$\lambda=\textstyle\begin{cases} \lambda_{1} & \text{if } \lambda_{1}\leq\frac{1}{a^{2}\mu }; \\ \lambda_{2} & \text{if } \lambda_{1}>\frac{1}{a^{2}\mu}. \end{cases} $$

Since \(x^{\ast}\) is unknown and \(x^{k}\) is the best available approximation to it, we can take

$$ \lambda=\textstyle\begin{cases} \lambda_{1,k}=\frac{2 \vert B_{\mu}(x^{k}) \vert _{r+1}}{a\mu} & \text{if } \lambda_{1,k}\leq\frac{1}{a^{2}\mu}; \\ \lambda_{2,k}=\frac{(2a \vert B_{\mu}(x^{k}) \vert _{r}+1)^{2}}{4a^{2}\mu} & \text{if } \lambda_{1,k}>\frac{1}{a^{2}\mu}, \end{cases} $$

(26)

in the *k*th iteration. That is, (26) can be used to automatically adjust the value of the regularization parameter \(\lambda >0\) during iteration.
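Rule (26) translates into a few lines of code. In this sketch, `B` is assumed to hold the components of \(B_{\mu}(x^{k})\) and `r` the target sparsity with \(1\leq r<n\); the function name `choose_lambda` is illustrative.

```python
def choose_lambda(B, r, a, mu):
    """Select lambda by rule (26) from B = B_mu(x^k) and the target sparsity r."""
    # Magnitudes in non-increasing order, matching the ordering used in the text.
    mags = sorted((abs(v) for v in B), reverse=True)
    lam1 = 2.0 * mags[r] / (a * mu)                                # (r+1)-th largest
    lam2 = (2.0 * a * mags[r - 1] + 1.0) ** 2 / (4.0 * a**2 * mu)  # r-th largest
    return lam1 if lam1 <= 1.0 / (a**2 * mu) else lam2

# Arbitrary illustrative inputs (not from the paper).
lam_small = choose_lambda([0.1, -3.0, 0.2, 2.0], r=2, a=1.0, mu=0.5)  # lam1 branch
lam_large = choose_lambda([0.1, -3.0, 1.0, 2.0], r=2, a=1.0, mu=0.5)  # lam2 branch
```

Sorting costs \(O(n\log n)\) per iteration; a partial selection of the two needed order statistics would also work if \(n\) is large.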

### Remark 1

Notice that (26) is valid for any \(\mu\) satisfying \(0<\mu \leq \Vert F(x^{k}) \Vert _{2}^{-2}\). In general, we can take \(\mu=\mu _{k}=\frac{1-\epsilon}{ \Vert F(x^{k}) \Vert _{2}^{2}}\) with any small \(\epsilon\in(0,1)\). In particular, the threshold value is \(t_{a,\lambda\mu}^{\ast}=\frac{\lambda\mu}{2}a\) when \(\lambda=\lambda_{1,k}\), and \(t_{a,\lambda\mu}^{\ast}=\sqrt{\lambda\mu}-\frac{1}{2a}\) when \(\lambda=\lambda_{2,k}\).

### 3.3 Iterative fraction thresholding algorithm (IFTA)

Based on the thresholding representation (23) and the analysis in Sect. 3.2, the proposed iterative fraction thresholding algorithm (IFTA) for the regularization problem \((QP_{a}^{\lambda})\) is described in Algorithm 1.
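Algorithm 1 itself is not reproduced in this section, so the following is only a sketch assembled from (23), (26) and Remark 1, not the authors' listing. Several choices are assumptions: the closed form of the scalar thresholding `g` (Definition 1 is outside this section), a fixed iteration count in place of a stopping rule, and the Frobenius norm of \(F(x^{k})\) used in place of the spectral norm. Because the Frobenius norm is an upper bound on the spectral norm, the resulting \(\mu_{k}\) still satisfies \(0<\mu_{k}\leq\Vert F(x^{k})\Vert_{2}^{-2}\).

```python
import math

def matvec(A, v):
    return [sum(p * q for p, q in zip(row, v)) for row in A]

def g(t, a, lam):
    """Assumed scalar fraction thresholding for (x - t)^2 + lam*a|x|/(1+a|x|)."""
    t_star = lam * a / 2.0 if lam <= 1.0 / a**2 else math.sqrt(lam) - 0.5 / a
    if abs(t) <= t_star:
        return 0.0
    c = 1.0 + a * abs(t)
    phi = math.acos(max(-1.0, min(1.0, 27.0 * lam * a**2 / (4.0 * c**3) - 1.0)))
    y = c / 3.0 * (1.0 + 2.0 * math.cos(phi / 3.0 - math.pi / 3.0))
    return math.copysign((y - 1.0) / a, t)

def ifta(F, b, x0, a, r, eps=0.01, iters=200):
    """Hedged IFTA sketch; F maps a vector to a nonzero matrix (list of rows), 1 <= r < n."""
    x = list(x0)
    for _ in range(iters):
        Fx = F(x)
        # Assumption: Frobenius norm replaces the spectral norm of Remark 1.
        mu = (1.0 - eps) / sum(v * v for row in Fx for v in row)
        resid = [bi - fi for bi, fi in zip(b, matvec(Fx, x))]
        grad = matvec([list(col) for col in zip(*Fx)], resid)   # F^T (b - F x)
        B = [xi + mu * gi for xi, gi in zip(x, grad)]           # B_mu(x^k)
        # Rule (26) for lambda, based on the sorted magnitudes of B_mu(x^k).
        mags = sorted((abs(v) for v in B), reverse=True)
        lam1 = 2.0 * mags[r] / (a * mu)
        lam2 = (2.0 * a * mags[r - 1] + 1.0) ** 2 / (4.0 * a**2 * mu)
        lam = lam1 if lam1 <= 1.0 / (a**2 * mu) else lam2
        # Iteration (23): threshold with parameter lambda * mu.
        x = [g(t, a, lam * mu) for t in B]
    return x
```

A call such as `ifta(lambda x: [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]], [1.0, 0.0], [0.0, 0.0, 0.0], a=1.0, r=1)` runs the sketch on a small linear toy problem (a constant \(F\)), which is convenient for experimenting with `a`, `r` and `eps`.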

### Remark 2

The convergence of IFTA is not established theoretically in this paper; it is left for future work.