
Efficiency of orthogonal super greedy algorithm under the restricted isometry property

Abstract

We investigate the efficiency of the orthogonal super greedy algorithm (OSGA) for sparse recovery and approximation under the restricted isometry property (RIP). We first show that, under RIP conditions on the measurement matrix Φ and on the minimum magnitude of the nonzero coordinates of the signal, for an \(l_{2}\) bounded or \(l_{\infty }\) bounded noise vector e, with explicit stopping rules, OSGA can recover the support of an arbitrary K-sparse signal x from \(y=\varPhi x +e\) in at most K steps. Then, we investigate the error performance of OSGA in m-term approximation with regard to dictionaries satisfying the RIP in a separable Hilbert space. We establish a Lebesgue-type inequality for OSGA. Based on this inequality, we obtain the optimal rate of convergence for the sparse class induced by such dictionaries.

Introduction

Recovery and approximation by sparse linear combinations of elements from a fixed redundant family are frequently utilized in many application areas, such as image and signal processing, PDE solvers, and statistical learning; see [1]. In general, these problems are NP-hard. It is well known that greedy-type algorithms are efficient approaches to solving them; see [2,3,4,5]. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. OGA is a simple yet powerful algorithm for highly nonlinear sparse approximation that has attracted a large amount of research over its history; see [3, 6, 7] and the references therein.

In this paper, we consider the orthogonal super greedy algorithm (OSGA), which is more efficient than OGA from the viewpoint of computational complexity. The performance of OSGA in greedy approximation and signal recovery was analyzed in [8,9,10,11,12]. We further study the efficiency of OSGA from two aspects.

In Sect. 2, we study the efficiency of OSGA in recovering an N-dimensional sparse signal from linear measurements. This topic is also known in the literature as compressed sensing (CS) [13,14,15]. We show that OSGA can recover a K-sparse signal from noisy measurements in at most K steps. We remark that in the field of signal processing OSGA is also known as orthogonal multi matching pursuit (OMMP) [4]. For the reader’s convenience, we will use the term OMMP instead of OSGA in Sect. 2.

In Sect. 3, we study the error performance of OSGA in a general context. We investigate the efficiency of OSGA for m-term approximation with regard to dictionaries in a real separable Hilbert space H. Assuming that the dictionary satisfies the RIP condition, we establish a Lebesgue-type inequality for OSGA which bounds the error of OSGA by the best m-term approximation error. Based on this inequality, we derive the sharp convergence rate of OSGA on the sparse class induced by dictionaries satisfying the RIP condition.

The efficiency of the OMMP in CS

In this section, we analyze the efficiency of the OMMP in compressed sensing.

We consider the following model. Suppose that \(x\in \mathbb{R}^{N}\) is an unknown N-dimensional signal and we wish to recover it from M noisy measurements y given by inner products with fixed vectors, that is,

$$ y = \varPhi x+e, $$
(2.1)

where Φ is a known \(M\times N \) measurement matrix with \(M\ll N\) and \(e\in \mathbb{R}^{M}\) is a vector of measurement errors. The error vector e can be zero, bounded, or Gaussian noise.

For \(x=(x_{j})_{j=1}^{N}\in \mathbb{R}^{N}\), define

$$ \Vert x \Vert _{p}:= \Biggl(\sum _{i=1}^{N} \vert x_{i} \vert ^{p} \Biggr)^{1/p}, \quad 0< p< \infty , $$

and

$$ \Vert x \Vert _{\infty }:=\sup _{1\leq i\leq N}\bigl\{ \vert x_{i} \vert \bigr\} . $$

The signal \(x\in \mathbb{R}^{N}\) is said to be K-sparse if \(\|x\|_{0}:=|\textrm{supp}(x)|=|\{i: x_{i}\neq 0\}|\leq K< N\). We study the performance of OMMP for the recovery of the support of the K-sparse signal x under model (2.1).

These algorithms recover the K-sparse signal by iteratively constructing the support of it by some greedy principles.

The OMMP is a stepwise forward selection procedure that is easy to implement and has been widely used for signal recovery. Now let us recall the definition of OMMP(s) in an algorithmic way (see Algorithm 1).

Algorithm 1 (figure): Orthogonal Multi Matching Pursuit (OMMP(s))
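In lieu of the displayed algorithm, the following minimal NumPy sketch captures the OMMP(s) iteration: select the s columns most correlated with the residual, then orthogonally project onto all selected columns. The function name, the `tol` argument, and the tie-breaking order are our own choices, not the paper's notation.

```python
import numpy as np

def ommp(y, Phi, s, max_iter, tol=0.0):
    """OMMP(s) sketch: at each step select the s indices whose columns are
    most correlated with the residual, then project y onto the span of all
    columns selected so far."""
    N = Phi.shape[1]
    support = []                  # Lambda^k: indices selected so far
    r = y.astype(float).copy()    # residual r^0 = y
    coef = np.array([])
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        corr = np.abs(Phi.T @ r)          # |<r^k, phi_i>|
        corr[support] = -np.inf           # never reselect an index
        support.extend(np.argsort(corr)[-s:].tolist())
        # orthogonal projection step: least squares on the selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat, support

# Toy sanity check with an orthonormal matrix, where recovery is immediate
Phi = np.eye(8)
x = np.zeros(8); x[[1, 4, 6]] = [3.0, -2.0, 1.0]
x_hat, support = ommp(Phi @ x, Phi, s=2, max_iter=3)
```

With an orthonormal Φ the correlations are exactly the residual entries, so the true support is picked up within the first iterations and the extra selected indices receive zero coefficients.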

OMMP(s) generalizes the orthogonal greedy algorithm (OGA) in the sense that it selects s indices at each iteration instead of one; it can therefore recover a sparse signal in fewer steps and at lower computational cost. To investigate the efficiency of OMMP in CS, we use the restricted isometry property (RIP) of Φ, which ensures the stable recovery of x from noisy measurements. This property was introduced by Candès and Tao [16, 17] as follows.

A matrix Φ is said to satisfy RIP of order K if there exists a constant \(\delta \in (0,1)\) such that, for all K-sparse vectors x,

$$ (1-\delta ) \Vert x \Vert _{2}^{2}\leq \Vert \varPhi x \Vert _{2}^{2}\leq (1+\delta ) \Vert x \Vert _{2}^{2}. $$
(2.2)

In particular, the minimum of all δ satisfying (2.2) is referred to as an isometry constant \(\delta _{K}\).
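For very small instances, \(\delta _{K}\) can be computed by brute force directly from definition (2.2): it is the largest deviation from 1 of the eigenvalues of the Gram matrices \(\varPhi _{T}^{\ast }\varPhi _{T}\). The function name below is ours; smaller supports need not be enumerated because of the monotonicity property (2.10).

```python
import itertools
import numpy as np

def rip_constant(Phi, K):
    """Brute-force the order-K isometry constant delta_K: the largest
    deviation from 1 of the eigenvalues of Phi_T^* Phi_T over all supports
    T with |T| = K.  Exponential cost, so only feasible for tiny N."""
    N = Phi.shape[1]
    delta = 0.0
    for T in itertools.combinations(range(N), K):
        eig = np.linalg.eigvalsh(Phi[:, list(T)].T @ Phi[:, list(T)])
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

# An orthonormal matrix is a perfect isometry, so delta_K = 0 ...
assert rip_constant(np.eye(4), 2) < 1e-12
# ... while two unit columns at 45 degrees give delta_2 = 1/sqrt(2)
Phi = np.array([[1.0, 1.0 / np.sqrt(2)], [0.0, 1.0 / np.sqrt(2)]])
assert abs(rip_constant(Phi, 2) - 1.0 / np.sqrt(2)) < 1e-9
```

Certifying the RIP for realistic sizes is computationally hard, which is why RIP conditions are usually verified probabilistically for random matrices.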

We recall some results on the efficiency of OMMP based on the RIP. These results concern upper bound estimates for the order-sK RIP constant \(\delta _{sK}\). In the noiseless case, Liu and Temlyakov [8] proved that \(\delta _{sK}<\frac{\sqrt{s}}{(2+\sqrt{2}) \sqrt{K}}\) is a sufficient condition for OMMP(s) to successfully recover every K-sparse signal in at most K iterations. Then, Wang, Kwon, and Shim [10] relaxed the condition to \(\delta _{sK}<\frac{ \sqrt{s}}{3\sqrt{s}+\sqrt{K}}\). The condition was further relaxed to \(\delta _{sK}<\frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}\) by Satpathi, Das, and Chakraborty [18]. In [18] the authors also pointed out the possibility of extending the results to the noisy case. On the other hand, Dan Wei [9] defined the restricted orthogonality constant \(\theta _{K,K'}\) and proved that \(\delta _{sK-s+1}+\sqrt{\frac{K}{s}}\theta _{sK-s+1,s}<1\) is a sufficient condition guaranteeing the perfect recovery of K-sparse signals by OMMP. She showed that this result implies the result of [18]. Moreover, in [9] the performance of OMMP for support recovery from noisy measurements was also discussed.

Motivated by the above works, we investigate OMMP in the general setting where noise is present, under RIP-based conditions. We prove that if the sampling matrix Φ satisfies the RIP of order sK with the isometry constant

$$ \delta _{sK}< 1+\frac{1}{2}\sqrt{\frac{K}{s}}- \frac{1}{2}\sqrt{ \frac{K}{s}+4\sqrt{\frac{K}{s}}}, $$

then OMMP can recover every K-sparse signal in at most K steps under both \(l_{2}\) bounded and \(l_{\infty }\) bounded noise. It is easy to check that

$$ \frac{\sqrt{s}}{2\sqrt{s}+\sqrt{K}}< 1+\frac{1}{2}\sqrt{ \frac{K}{s}}-\frac{1}{2} \sqrt{\frac{K}{s}+4\sqrt{\frac{K}{s}}}. $$

So our results improve and extend the results in [18].
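This comparison of the two thresholds can also be verified numerically; the function names below are ours, and both quantities depend only on the ratio K/s.

```python
import numpy as np

def bound_prior(s, K):
    # RIP threshold of Satpathi, Das, and Chakraborty [18]
    return np.sqrt(s) / (2.0 * np.sqrt(s) + np.sqrt(K))

def bound_new(s, K):
    # RIP threshold proved in the present paper
    r = K / s
    return 1.0 + 0.5 * np.sqrt(r) - 0.5 * np.sqrt(r + 4.0 * np.sqrt(r))

# The new threshold is strictly larger (i.e., easier to satisfy) on a grid
for s in (1, 2, 4, 8):
    for K in (s, 2 * s, 4 * s, 16 * s):
        assert bound_prior(s, K) < bound_new(s, K)
```

For example, with \(s=K\) the prior threshold is \(1/3\approx 0.333\) while the new one is \(3/2-\sqrt{5}/2\approx 0.382\).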

When dealing with noisy measurements, one of the key components of OMMP is the stopping rule, which depends on the structure of the noise. In general, there are several natural stopping rules for OMMP.

(1) Stop after a fixed number of iterations K.

(2) For the \(l_{2}\) bounded noise case: the error vector e is bounded in the \(l_{2}\) norm, \(\|e\|_{2}\leq b_{2}\); we set the stopping rule as \(\|r^{k}\|_{2}\leq b_{2}\).

(3) For the \(l_{\infty }\) bounded noise case: the error vector e is bounded in the sense that \(\|\varPhi ^{\ast }e\|_{\infty }\leq b _{\infty }\); we set the stopping rule as

$$ \bigl\Vert \varPhi ^{\ast }r^{k} \bigr\Vert _{\infty } \leq \biggl(1+\frac{\sqrt{sK}\delta _{sK}}{1-\delta _{sK}} \biggr)b_{\infty }, $$

where \(\varPhi ^{\ast }\) is the transpose of Φ.

Since we mainly consider \(l_{2}\) bounded and \(l_{\infty }\) bounded noise, we use stopping rules (2) and (3), respectively. In Sect. 2.1, some notation is introduced and some consequences of the restricted isometry property are presented. In Sect. 2.2, we give RIP-based sufficient conditions for OMMP(s) in the \(l_{2}\) bounded noise case by using stopping rule (2). In Sect. 2.3, we establish the results in the \(l_{\infty }\) bounded noise case by using stopping rule (3). In Sect. 2.4, we give RIP-based sufficient conditions for OMMP(s) to succeed with high probability in the Gaussian noise case.

Preliminary

The following notation will be used in this section. For a signal \(x\in \mathbb{R}^{N}\), let \(\varOmega =\{1,\ldots , N\}\) denote the index set of x and \(T=\mathrm{supp}(x)=\{i: x_{i}\neq 0\}\) its support.

Suppose that \(\phi _{1},\phi _{2},\ldots,\phi _{N}\) are the columns of the matrix Φ. We always assume \(\|\phi _{i}\|_{2}=1\), \(1\leq i\leq N\). For a subset Λ of Ω, let \(\varPhi _{\varLambda }\) denote the matrix Φ restricted to the columns indexed by Λ. We define \(x_{\varLambda }\) analogously for a vector \(x\in \mathbb{R}^{N}\). Thus

$$ \varPhi _{\varLambda }x=\varPhi x_{\varLambda }=\varPhi _{\varLambda }x_{\varLambda }. $$
(2.3)

We recall some useful consequences of the RIP.

Lemma 2.1

([5])

Suppose that Γ and Λ are two disjoint sets of indices. If the matrix Φ satisfies the RIP of order \(|\varGamma \cup \varLambda |\) with isometry constant \(\delta _{|\varGamma \cup \varLambda |}\), then for any \(x\in \mathbb{R}^{| \varLambda |}\) we have

$$ \bigl\Vert \varPhi _{\varGamma }^{\ast }\varPhi _{\varLambda }x \bigr\Vert _{2} \leq \delta _{ \vert \varGamma \cup \varLambda \vert } \Vert x \Vert _{2} $$
(2.4)

and

$$ \bigl\Vert \bigl(\varPhi _{\varLambda }^{\ast }\varPhi _{\varLambda }\bigr)^{-1}x \bigr\Vert _{2} \leq \frac{1}{(1- \delta _{ \vert \varLambda \vert })} \Vert x \Vert _{2}. $$
(2.5)

Lemma 2.2

([6])

Let \(x\in \mathbb{R}^{N}\) be a K-sparse vector. Suppose that the matrix Φ satisfies the RIP of order K with isometry constant \(\delta _{K}\). Then, for \(\varLambda \subset \varOmega \),

$$ (1-\delta _{K}) \Vert x_{T \setminus \varLambda } \Vert _{2}\leq \bigl\Vert \varPhi _{T \setminus \varLambda }^{\ast }\bigl(I- \varPhi _{\varLambda }\varPhi _{\varLambda } ^{\dagger }\bigr)\varPhi _{T \setminus \varLambda }x_{T \setminus \varLambda } \bigr\Vert _{2} \leq (1+\delta _{K}) \Vert x_{T \setminus \varLambda } \Vert _{2}, $$
(2.6)

where \(\varPhi ^{\dagger }=(\varPhi ^{\ast }\varPhi )^{-1}\varPhi ^{\ast }\) is the Moore–Penrose pseudo-inverse of Φ.

The residual vector after k steps can be written as

$$\begin{aligned} r^{k} =&y-\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger }y=(I-P_{k})y \\ =& (I-P_{k}) (\varPhi x+e) \\ =& (I-P_{k})\varPhi x+(I-P_{k})e \\ =&r_{1}^{k}+r_{2}^{k}, \end{aligned}$$

where \(P_{k}=\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger }\) denotes the projection onto the linear space spanned by \(\{\phi _{i}, i\in \varLambda ^{k}\}\), \(r_{1}^{k}=(I-P_{k})\varPhi x \) is the signal part of the residual, and \(r_{2}^{k}=(I-P_{k})e\) is the noise part of the residual.
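This decomposition of the residual can be checked numerically; the dimensions, the index set \(\varLambda ^{k}\), and the noise level below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 10, 20
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm columns

Lam = [2, 5, 11]                              # a sample index set Lambda^k
P = Phi[:, Lam] @ np.linalg.pinv(Phi[:, Lam]) # P_k = Phi_Lam Phi_Lam^dagger

x = np.zeros(N); x[[2, 7, 15]] = [1.0, -2.0, 0.5]
e = 0.01 * rng.standard_normal(M)
y = Phi @ x + e

r = y - P @ y                                  # r^k = (I - P_k) y
r1 = Phi @ x - P @ (Phi @ x)                   # signal part r_1^k
r2 = e - P @ e                                 # noise part r_2^k

assert np.allclose(r, r1 + r2)                 # r^k = r_1^k + r_2^k
assert np.allclose(P @ Phi[:, Lam], Phi[:, Lam])  # P_k fixes span{phi_i}
```

The second assertion reflects the fact, used repeatedly below, that \((I-P_{k})\varPhi _{\varLambda ^{k}}=0\).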

Denote:

$$ T_{k}=\varLambda ^{k}\cap T,\qquad A_{k}=T\setminus\varLambda ^{k},\qquad B_{k}=T \cup \varLambda ^{k}. $$

Suppose that \(S_{k}\subseteq \varOmega \setminus B_{k}\), \(|S_{k}|=s\), such that

$$ \min _{i\in S_{k}} \bigl\vert \bigl\langle r^{k},\phi _{i} \bigr\rangle \bigr\vert \geq \max _{i\in (\varOmega \setminus B_{k})\setminus S_{k}} \bigl\vert \bigl\langle r ^{k},\phi _{i} \bigr\rangle \bigr\vert . $$

The difference between OMMP(s) and the standard orthogonal matching pursuit (OMP) is that at each step of OMMP(s), s elements are simultaneously selected from the dictionary instead of one. For a sufficient condition guaranteeing that OMMP(s) selects at least one correct index at each iteration, we need a lower bound on \(M_{A_{k}}-m_{S_{k}}\). The following lemma, which plays an important role in our analysis, provides such a bound.

Lemma 2.3

Let \(m_{S_{k}}:=\min_{i\in S_{k}} \{ |\langle r_{1}^{k},\phi _{i} \rangle |\}\) and \(M_{A_{k}}:=\max_{i\in A_{k}} \{|\langle r_{1}^{k},\phi _{i} \rangle |\}\). Assume \(A_{k}\neq \emptyset \). Then there hold

$$ m_{S_{k}}\leq \frac{\delta _{sK}}{\sqrt{s}(1-\delta _{sK})} \Vert x_{A_{k}} \Vert _{2} $$
(2.7)

and

$$ M_{A_{k}}\geq \frac{1-\delta _{sK}}{\sqrt{ \vert A_{k} \vert }} \Vert x_{A_{k}} \Vert _{2}. $$
(2.8)

Proof

Using the fact \((I-P_{k})\varPhi _{\varLambda ^{k}}=\varPhi _{\varLambda ^{k}}- \varPhi _{\varLambda ^{k}}(\varPhi _{\varLambda ^{k}}^{\ast }\varPhi _{\varLambda ^{k}})^{-1} \varPhi _{\varLambda ^{k}}^{\ast }\varPhi _{\varLambda ^{k}}=0\), we express

$$ r_{1}^{k}=(I-P_{k})\varPhi x $$

as

$$\begin{aligned} r_{1}^{k} =& (I-P_{k})\varPhi _{T} x_{T}=(I-P_{k}) (\varPhi _{T_{k}} x_{T_{k}}+ \varPhi _{A_{k}} x_{A_{k}} ) \\ =&(I-P_{k})\varPhi _{A_{k}} x_{A_{k}}=\bigl(I-\varPhi _{\varLambda ^{k}} \varPhi _{\varLambda ^{k}}^{\dagger }\bigr)\varPhi _{A_{k}} x_{A_{k}} \\ =& \varPhi _{A_{k}}x_{A_{k}}-\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{ \dagger }\varPhi _{A_{k}}x_{A_{k}} . \end{aligned}$$
(2.9)

Since \(\varLambda ^{k}\cap A_{k}=\emptyset \), applying (2.5) and (2.4) of Lemma 2.1, and the fact that for any two integers \(K\leq K'\) it holds

$$ \delta _{K}\leq \delta _{K'}, $$
(2.10)

it follows that

$$\begin{aligned} \bigl\Vert \varPhi _{\varLambda ^{k}}^{\dagger }\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2} =& \bigl\Vert \bigl( \varPhi _{\varLambda ^{k}}^{\ast } \varPhi _{\varLambda ^{k}}\bigr)^{-1}\varPhi _{\varLambda ^{k}} ^{\ast }\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2} \\ \leq &\frac{1}{(1-\delta _{sk})} \bigl\Vert \varPhi _{\varLambda ^{k}}^{\ast }\varPhi _{A _{k}}x_{A_{k}} \bigr\Vert _{2} \\ \leq &\frac{\delta _{sk+ \vert A_{k} \vert }}{(1-\delta _{sk})} \Vert x_{A_{k}} \Vert _{2} \leq \frac{\delta _{sK}}{(1-\delta _{sK})} \Vert x_{A_{k}} \Vert _{2} . \end{aligned}$$
(2.11)

By (2.9), (2.11), and Lemma 2.1, we have

$$\begin{aligned} \bigl\Vert \varPhi _{S_{k}}^{\ast }r_{1}^{k} \bigr\Vert _{2} =& \bigl\Vert \varPhi _{S_{k}}^{\ast } \bigl( \varPhi _{A_{k}}x_{A_{k}}-\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger } \varPhi _{A_{k}}x_{A_{k}}\bigr) \bigr\Vert _{2} \\ \leq & \bigl\Vert \varPhi _{S_{k}}^{\ast }\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2}+ \bigl\Vert \varPhi _{S _{k}}^{\ast }\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger } \varPhi _{A _{k}}x_{A_{k}} \bigr\Vert _{2} \\ \leq &\delta _{s+ \vert A_{k} \vert } \Vert x_{A_{k}} \Vert _{2}+ \delta _{s(k+1)} \bigl\Vert \varPhi _{\varLambda ^{k}}^{\dagger }\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2} \\ \leq &\delta _{sK} \Vert x_{A_{k}} \Vert _{2}+ \delta _{sK}\frac{\delta _{sK}}{(1- \delta _{sK})} \Vert x_{A_{k}} \Vert _{2} \\ = & \frac{\delta _{sK}}{(1-\delta _{sK})} \Vert x_{A_{k}} \Vert _{2} . \end{aligned}$$
(2.12)

Thus

$$ m_{S_{k}} \leq \frac{1}{\sqrt{s}} \bigl\Vert \varPhi _{S_{k}}^{\ast }r_{1}^{k} \bigr\Vert _{2} \leq \frac{\delta _{sK}}{\sqrt{s}(1-\delta _{sK})} \Vert x_{A_{k}} \Vert _{2}. $$

Using (2.9) and Lemma 2.2, we obtain

$$\begin{aligned} \bigl\Vert \varPhi _{A_{k}}^{\ast }r_{1}^{k} \bigr\Vert _{2} =& \bigl\Vert \varPhi _{A_{k}}^{\ast }\bigl(I- \varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{\dagger }\bigr)\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2} \\ =& \bigl\Vert \varPhi _{T\setminus \varLambda ^{k}}^{\ast }\bigl(I-\varPhi _{\varLambda ^{k}} \varPhi _{\varLambda ^{k}}^{\dagger }\bigr)\varPhi _{T\setminus \varLambda ^{k}}x_{T\setminus \varLambda ^{k}} \bigr\Vert _{2} \\ \geq &(1-\delta _{sK}) \Vert x_{T\setminus \varLambda ^{k}} \Vert _{2}=(1-\delta _{sK}) \Vert x_{A_{k}} \Vert _{2}. \end{aligned}$$
(2.13)

Hence

$$ M_{A_{k}}\geq \frac{1}{\sqrt{ \vert A_{k} \vert }} \bigl\Vert \varPhi _{A_{k}}^{\ast }r_{1} ^{k} \bigr\Vert _{2} \geq \frac{(1-\delta _{sK})}{\sqrt{ \vert A_{k} \vert }} \Vert x_{A_{k}} \Vert _{2}. $$

Therefore Lemma 2.3 is proved. □

\(l_{2}\) bounded noise

We shall discuss the performance of OMMP(s) in the \(l_{2}\) bounded noise case. That is, the error vector e is bounded in the \(l_{2}\) norm, \(\|e\|_{2}\leq b_{2}\), for some constant \(b_{2}>0\). With a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the K-sparse signal x, OMMP(s) can recover the support of x in at most K iterations.

Theorem 2.1

Suppose that \(\|e\|_{2}\leq b_{2}\) and Φ satisfies the RIP of order sK with the isometry constant \(\delta :=\delta _{sK}<1+ \frac{1}{2}\sqrt{\frac{K}{s}}-\frac{1}{2}\sqrt{\frac{K}{s}+4\sqrt{ \frac{K}{s}}}\). Then OMMP(s) with the stopping rule \(\|r^{k}\|_{2} \leq b_{2}\) recovers the support of any K-sparse signal \(x\in \mathbb{R}^{N}\) from \(y=\varPhi x +e\) provided that all the nonzero components \(x_{i}\) satisfy

$$ \vert x_{i} \vert >\frac{2b_{2}}{(1-\delta )-\sqrt{\frac{K}{s}}\frac{\delta }{1- \delta }}. $$
(2.14)

Proof

Denote \(E_{k}=\max_{i\in \varOmega } \{|\langle r_{2}^{k},\phi _{i} \rangle |\}\). Recalling the definition of \(m_{S_{k}}\) and \(M_{A_{k}}\) in Lemma 2.3, there hold

$$ \min _{i\in S_{k}} \bigl\{ \bigl\vert \bigl\langle r^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} \leq \min _{i\in S_{k}} \bigl\{ \bigl\vert \bigl\langle r_{1}^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} + \max _{i\in \varOmega } \bigl\{ \bigl\vert \bigl\langle r_{2}^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} =m _{S_{k}}+E_{k} $$

and

$$ \max _{i\in A_{k}} \bigl\{ \bigl\vert \bigl\langle r^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} \geq \max _{i\in A_{k}} \bigl\{ \bigl\vert \bigl\langle r_{1}^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} - \max _{i\in \varOmega } \bigl\{ \bigl\vert \bigl\langle r_{2}^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} =M _{A_{k}}-E_{k}. $$

It is clear that in order for OMMP(s) to select at least one correct index at this step, it suffices to have

$$ \max _{i\in A_{k}} \bigl\{ \bigl\vert \bigl\langle r^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} > \min _{i\in S_{k}} \bigl\{ \bigl\vert \bigl\langle r^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} . $$

A sufficient condition to guarantee OMMP(s) to select at least one correct index at each iteration until all correct indices are selected is

$$ M_{A_{k}}-E_{k}>m_{S_{k}}+E_{k}. $$

That is,

$$ 2E_{k}< M_{A_{k}}- m_{S_{k}} $$
(2.15)

holds for \(k=0,1,\ldots ,K-1\). We first provide an upper bound for \(E_{k}\). The assumption \(\|e\|_{2}\leq b_{2}\) yields that

$$ E_{k}=\max _{i\in \varOmega } \bigl\{ \bigl\vert \bigl\langle r_{2}^{k},\phi _{i} \bigr\rangle \bigr\vert \bigr\} \leq \max _{i\in \varOmega } \Vert \phi _{i} \Vert _{2} \bigl\Vert r_{2}^{k} \bigr\Vert _{2}= \bigl\Vert (I-P_{k})e \bigr\Vert _{2}\leq \Vert e \Vert _{2}\leq b_{2}. $$
(2.16)

Next we give a lower bound of \(M_{A_{k}}- m_{S_{k}}\). Using Lemma 2.3 and assumption (2.14), we have

$$\begin{aligned} M_{A_{k}}- m_{S_{k}} \geq & \biggl(\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }}- \frac{ \delta }{\sqrt{s}(1-\delta )} \biggr) \Vert x_{A_{k}} \Vert _{2} \\ \geq & \biggl(\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }}-\frac{\delta }{ \sqrt{s}(1-\delta )} \biggr)\sqrt{ \vert A_{k} \vert } \frac{2b_{2}}{(1-\delta )-\sqrt{\frac{K}{s}}\frac{\delta }{1-\delta }} \\ >& 2b_{2}. \end{aligned}$$

This means that OMMP(s) selects at least one correct index at the \((k+1)\)th iteration, and hence OMMP(s) can recover the support of x in at most K iterations. It remains to show that, under the stopping rule \(\|r^{k}\|_{2}\leq b_{2}\), OMMP(s) stops exactly when all the correct indices have been selected.

First, assume that \(A_{k}=\emptyset \), then \(T\subseteq \varLambda ^{k}\) and \((I-P_{k})\varPhi x=0\). Thus

$$ \bigl\Vert r^{k} \bigr\Vert _{2}= \bigl\Vert (I-P_{k})\varPhi x+(I-P_{k})e \bigr\Vert _{2}= \bigl\Vert (I-P_{k})e \bigr\Vert _{2} \leq \Vert e \Vert _{2}\leq b_{2}. $$

So when all the K correct indices are selected, the \(l_{2}\) norm of the residual is at most \(b_{2}\), and hence OMMP(s) stops. Now we show that OMMP(s) does not stop early. Suppose that OMMP(s) has run k steps for some \(k< K\). We will verify that \(\|r^{k}\|_{2}>b_{2}\), so OMMP(s) does not stop at the current step.

Secondly, assume that \(A_{k}\neq \emptyset \), then

$$\begin{aligned} \bigl\Vert r^{k} \bigr\Vert _{2} =& \bigl\Vert (I-P_{k})\varPhi x+(I-P_{k})e \bigr\Vert _{2} \\ \geq & \bigl\Vert (I-P_{k})\varPhi x \bigr\Vert _{2}- \bigl\Vert (I-P_{k})e \bigr\Vert _{2} \\ =& \bigl\Vert \varPhi _{A_{k}}x_{A_{k}}-\varPhi _{\varLambda ^{k}}\varPhi _{\varLambda ^{k}}^{ \dagger }\varPhi _{A_{k}}x_{A_{k}} \bigr\Vert _{2}- \bigl\Vert (I-P_{k})e \bigr\Vert _{2} \\ =& \bigl\Vert \varPhi _{B_{k}}\overline{x}^{(k)} \bigr\Vert _{2}- \bigl\Vert (I-P_{k})e \bigr\Vert _{2} \\ \geq & \bigl\Vert \varPhi _{B_{k}}\overline{x}^{(k)} \bigr\Vert _{2}- \Vert e \Vert _{2} \\ \geq &\sqrt{1-\delta _{ \vert B_{k} \vert }} \bigl\Vert \overline{x}^{(k)} \bigr\Vert _{2}-b_{2} \\ \geq &\sqrt{1-\delta } \Vert x_{A_{k}} \Vert _{2}-b_{2} \\ \geq &\sqrt{1-\delta }\sqrt{ \vert A_{k} \vert } \frac{2b_{2}}{(1-\delta )-\sqrt{ \frac{K}{s}}\frac{\delta }{1-\delta }}-b_{2} \\ >&b_{2}, \end{aligned}$$

where \(\overline{x}^{(k)}= (x_{A_{k}},\ -\varPhi _{\varLambda ^{k}} ^{\dagger }\varPhi _{A_{k}}x_{A_{k}} )^{\ast }\), and to derive the third inequality, we use the RIP condition. Hence OMMP(s) does not stop early, and the theorem is proved. □
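As a small deterministic illustration of the recovery procedure and stopping rule (2): the matrix below is hand-built for transparency (an identity block plus one extra unit column), and we do not certify that its RIP constant satisfies the hypothesis of Theorem 2.1; the point is only to trace the iteration and the stopping behavior.

```python
import numpy as np

M, K, s = 8, 3, 2
extra = np.zeros(M); extra[:2] = 1.0 / np.sqrt(2)   # (e0 + e1)/sqrt(2)
Phi = np.hstack([np.eye(M), extra[:, None]])        # 8 x 9, unit columns

x = np.zeros(9); x[[1, 4, 6]] = [3.0, -2.0, 1.0]    # K-sparse signal
e = np.full(M, 0.01)                                # deterministic noise
b2 = np.linalg.norm(e)                              # so ||e||_2 <= b2
y = Phi @ x + e

support, r = [], y.copy()
for _ in range(K):                                  # at most K iterations
    if np.linalg.norm(r) <= b2:                     # stopping rule (2)
        break
    corr = np.abs(Phi.T @ r)
    corr[support] = -np.inf
    support.extend(np.argsort(corr)[-s:].tolist())  # pick s new indices
    coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    r = y - Phi[:, support] @ coef                  # orthogonal projection

assert {1, 4, 6} <= set(support)                    # true support found
assert np.linalg.norm(r) <= b2                      # rule (2) triggered
```

In this instance the coherent extra column is selected in the first step alongside a correct index, yet the algorithm still captures the whole support within K iterations, consistent with the "at least one correct index per step" argument.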

\(l_{\infty }\) bounded noise

We now give the RIP based sufficient conditions for OMMP(s) with \(l_{\infty }\) bounded noise. Our result is the following theorem.

Theorem 2.2

Suppose that \(\|\varPhi ^{\ast }e\|_{\infty }\leq b_{\infty }\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1. Then OMMP(s) with the stopping rule

$$ \bigl\Vert \varPhi ^{\ast }r^{k} \bigr\Vert _{\infty } \leq \biggl(1+\frac{\sqrt{sK}\delta }{1- \delta }\biggr)b_{\infty } $$

finds the support of any K-sparse signal x if all the nonzero coefficients \(x_{i}\) satisfy

$$ \vert x_{i} \vert >\frac{2(1+\frac{\sqrt{sK}\delta }{1-\delta })b_{\infty }}{(1- \delta )-\sqrt{\frac{K}{s}}\frac{\delta }{1-\delta }}. $$
(2.17)

Proof

Denote \(E_{k}=\|\varPhi ^{\ast }(I-P_{k})e\|_{\infty }=\max_{i\in \varOmega } |\langle r_{2}^{k},\phi _{i} \rangle |\). We first prove that

$$ 2E_{k}< M_{A_{k}}- m_{S_{k}} $$
(2.18)

holds for \(k=0,1,\ldots ,K-1\), where \(m_{S_{k}}\) and \(M_{A_{k}}\) are defined as in Lemma 2.3. It follows from the proof of Theorem 2.1 that inequality (2.18) ensures that OMMP recovers the true support of x in at most K steps.

We first provide an upper bound of \(E_{k}\). The assumption \(\|\varPhi ^{\ast }e\|_{\infty }\leq b_{\infty }\) and the fact \(\varPhi _{\varLambda ^{k}}^{\ast }(I-P_{k})e=0\) yield that

$$\begin{aligned} E_{k} =&\max _{i\in \varOmega } \bigl\vert \bigl\langle r_{2}^{k}, \phi _{i} \bigr\rangle \bigr\vert = \max _{i\in \varOmega \setminus \varLambda ^{k}} \bigl\vert \phi _{i}^{\ast }(I-P _{k})e \bigr\vert \\ \leq & \bigl\Vert \varPhi _{\varOmega \setminus \varLambda ^{k}}^{\ast }e \bigr\Vert _{\infty }+ \bigl\Vert \varPhi _{\varOmega \setminus \varLambda ^{k}}^{\ast }P_{k} e \bigr\Vert _{\infty } \\ \leq & b_{\infty }+\max _{i\in \varOmega \setminus \varLambda ^{k}} \bigl\Vert \phi _{i}^{\ast } \varPhi _{\varLambda ^{k}}\bigl(\varPhi _{\varLambda ^{k}}^{\ast } \varPhi _{\varLambda ^{k}} \bigr)^{-1}\varPhi _{\varLambda ^{k}}^{\ast }e \bigr\Vert _{2} \\ \leq & b_{\infty }+\delta \bigl\Vert \bigl(\varPhi _{\varLambda ^{k}}^{\ast } \varPhi _{\varLambda ^{k}}\bigr)^{-1}\varPhi _{\varLambda ^{k}}^{\ast }e \bigr\Vert _{2} \\ \leq & b_{\infty }+\frac{\delta }{1-\delta } \bigl\Vert \varPhi _{\varLambda ^{k}} ^{\ast }e \bigr\Vert _{2} \\ \leq & b_{\infty }+\frac{\delta }{1-\delta } \sqrt{sk}b_{\infty } \\ \leq & \biggl(1 +\frac{\delta \sqrt{sK}}{1-\delta } \biggr)b_{ \infty }. \end{aligned}$$
(2.19)

Here, to derive the third and fourth inequalities, we have used (2.4) and (2.5) of Lemma 2.1 respectively.

On the other hand, we can obtain a lower bound of \(M_{A_{k}}- m_{S _{k}}\). By using Lemma 2.3 and assumption (2.17), we have

$$\begin{aligned} M_{A_{k}}- m_{S_{k}} \geq & \biggl(\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }}- \frac{ \delta }{\sqrt{s}(1-\delta )} \biggr) \Vert x_{A_{k}} \Vert _{2} \\ \geq & \biggl(\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }}-\frac{\delta }{ \sqrt{s}(1-\delta )} \biggr)\sqrt{ \vert A_{k} \vert } \frac{2(1+\frac{ \sqrt{sK}\delta }{1-\delta })b_{\infty }}{(1-\delta )-\sqrt{ \frac{K}{s}}\frac{\delta }{1-\delta }} \\ >& 2 \biggl(1+\frac{\sqrt{sK}\delta }{1-\delta } \biggr)b_{\infty }. \end{aligned}$$
(2.20)

The bounds (2.19) and (2.20) imply that (2.18) holds for \(k=0,1,\ldots ,K-1\). It remains to show that OMMP(s) stops exactly when all the correct indices are selected under the stopping rule \(\|\varPhi ^{\ast }r^{k}\|_{\infty }\leq (1+\frac{\sqrt{sK}\delta _{sK}}{1- \delta _{sK}})b_{\infty }\).

First, assume that \(A_{k}=\emptyset \), then \(T\subseteq \varLambda ^{k}\) and \((I-P_{k})\varPhi x=0\). Thus

$$ \bigl\Vert \varPhi ^{\ast }r^{k} \bigr\Vert _{\infty }= \bigl\Vert \varPhi ^{\ast }(I-P_{k}) (\varPhi x+e) \bigr\Vert _{\infty }= \bigl\Vert \varPhi ^{\ast }(I-P_{k})e \bigr\Vert _{\infty }\leq \biggl(1 +\frac{ \delta \sqrt{sK}}{1-\delta } \biggr)b_{\infty }, $$

where the last inequality is deduced from (2.19). Hence, OMMP(s) stops when all the correct indices are selected.

Secondly, assume that \(A_{k}\neq \emptyset \), then

$$\begin{aligned} \bigl\Vert \varPhi ^{\ast }r^{k} \bigr\Vert _{\infty } =& \bigl\Vert \varPhi ^{\ast }(I-P_{k}) (\varPhi x+e) \bigr\Vert _{\infty } \\ \geq & \bigl\Vert \varPhi ^{\ast }(I-P_{k})\varPhi x \bigr\Vert _{\infty }- \bigl\Vert \varPhi ^{\ast }(I-P _{k})e \bigr\Vert _{\infty } \\ \geq & \max _{i\in A_{k}} \bigl\vert \bigl\langle (I-P_{k})\varPhi x,\phi _{i} \bigr\rangle \bigr\vert - \bigl\Vert \varPhi ^{\ast }(I-P_{k})e \bigr\Vert _{\infty } \\ \geq &\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }} \Vert x_{A_{k}} \Vert - \biggl(1 + \frac{ \delta \sqrt{sK}}{1-\delta } \biggr)b_{\infty } \\ \geq &\frac{1-\delta }{\sqrt{ \vert A_{k} \vert }}\sqrt{ \vert A_{k} \vert } \frac{2(1+\frac{ \sqrt{sK}\delta }{1-\delta })b_{\infty }}{(1-\delta )- \sqrt{ \frac{K}{s}}\frac{\delta }{1-\delta }}- \biggl(1 +\frac{\delta \sqrt{sK}}{1-\delta } \biggr)b_{\infty } \\ =&\frac{(1-\delta )2(1+\frac{\sqrt{sK}\delta }{1-\delta })b_{ \infty }}{(1-\delta )- \sqrt{\frac{K}{s}}\frac{\delta }{1-\delta }}- \biggl(1 +\frac{\delta \sqrt{sK}}{1-\delta } \biggr)b_{\infty } \\ >& \biggl(1 +\frac{\delta \sqrt{sK}}{1-\delta } \biggr)b_{\infty }. \end{aligned}$$

Here, to derive the third and fourth inequalities, we have used (2.8) of Lemma 2.3 and assumption (2.17), respectively. Hence OMMP(s) does not stop early, and the theorem is proved. □

Gaussian noise

As an application of the above results for the \(l_{2}\) and \(l_{\infty }\) bounded noise cases, we discuss the performance of OMMP(s) in recovering K-sparse signals under Gaussian noise. Gaussian noise is of particular interest in statistics since it is often the most realistic model of real noise when the noise source is complex, cf. [19, 20].

The motivation for applying the results on the \(l_{2}\) bounded and \(l_{\infty }\) bounded noise cases to the Gaussian noise case comes from the following results of Cai, Xu, and Zhang [21], which show that Gaussian noise e is essentially bounded in both the \(l_{2}\) and \(l_{\infty }\) norms. They showed that if e is zero-mean white Gaussian noise with covariance \(\sigma ^{2}I_{M\times M}\), that is, \(e\sim N(0,\sigma ^{2}I_{M\times M})\), then there hold

$$ P \bigl(e\in \bigl\{ e: \Vert e \Vert _{2}\leq \sigma \sqrt{M+2 \sqrt{M \log M}} \bigr\} \bigr)\geq 1-\frac{1}{M} $$

and

$$ P \bigl(e\in \bigl\{ e: \bigl\Vert \varPhi ^{\ast }e \bigr\Vert _{\infty }\leq \sigma \sqrt{2 \log N} \bigr\} \bigr)\geq 1- \frac{1}{2\sqrt{\pi \log N}}. $$
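The first of these probability bounds can be checked by simulation; the dimension, noise level, and number of trials below are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
M, sigma, trials = 64, 1.0, 2000
thresh = sigma * np.sqrt(M + 2.0 * np.sqrt(M * np.log(M)))

# Empirical frequency of the l2 event over many Gaussian noise draws
norms = np.linalg.norm(sigma * rng.standard_normal((trials, M)), axis=1)
assert np.mean(norms <= thresh) >= 1.0 - 1.0 / M
```

For M = 64 the threshold is about \(9.8\sigma \) while \(\|e\|_{2}\) concentrates near \(8\sigma \), so the event fails only rarely, comfortably within the stated \(1/M\) allowance.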

Combining the above results with those of Theorem 2.1 and Theorem 2.2, we immediately obtain the following theorems.

Theorem 2.3

Suppose that \(e\sim N(0,\sigma ^{2}I_{M\times M})\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1 and nonzero components \(x_{i}\) satisfy

$$ \vert x_{i} \vert >\frac{2\sigma \sqrt{M+2\sqrt{M\log M}}}{(1-\delta )-\sqrt{ \frac{K}{s}}\frac{\delta }{1-\delta }}. $$

Then OMMP(s) with the stopping rule \(\|r^{k}\|_{2}\leq \sigma \sqrt{M+2\sqrt{M \log M}}\) finds the support of any K-sparse signal \(x\in \mathbb{R} ^{N}\) from \(y=\varPhi x +e\) with probability at least \(1-\frac{1}{M}\).

Theorem 2.4

Suppose that \(e\sim N(0,\sigma ^{2}I_{M\times M})\) and Φ satisfies the RIP of order sK with the same isometry constant as in Theorem 2.1 and nonzero components \(x_{i}\) satisfy

$$ \vert x_{i} \vert >\frac{2(1+\frac{\sqrt{sK}\delta }{1-\delta })\sigma \sqrt{2 \log N}}{(1-\delta )-\sqrt{\frac{K}{s}}\frac{\delta }{1-\delta }}. $$

Then OMMP(s) with the stopping rule \(\|\varPhi ^{\ast }r^{k}\|_{\infty }\leq (1+\frac{\sqrt{sK}\delta }{1-\delta })\sigma \sqrt{2\log N}\) finds the support of any K-sparse signal \(x\in \mathbb{R}^{N}\) from \(y=\varPhi x +e\) with probability at least \(1-\frac{1}{2\sqrt{\pi \log N}}\).

Theorem 2.3 and Theorem 2.4 show that, with a suitable stopping rule and a reasonable condition on the minimum magnitude of the nonzero coordinates of the K-sparse signal x, OMMP(s) can recover the support of x in at most K iterations with high probability from measurements with Gaussian noise.

Error performance of the OSGA

In this section, we study the error performance of the OSGA in the general context. Let H be a real separable Hilbert space with an inner product \(\langle \cdot , \cdot \rangle \) and the norm \(\|x\|:=\langle x , x\rangle ^{1/2}\) for all \(x\in \mathit{H}\). Recall that a set \(\mathcal{D}\) from H is called a dictionary if

$$ \mathcal{D}=\{\phi _{i},i\in I\}\subset \mathit{H},\quad \mbox{and} \quad \overline{\operatorname{span}} {\mathcal{D}} = \mathit{H}. $$

Without loss of generality we shall assume that the dictionary \(\mathcal{D}\) is normalized, i.e.,

$$ \Vert \phi _{i} \Vert =1,\quad i\in I. $$

The standard measure of approximation power of a dictionary is the error of the best m-term approximation. Given a dictionary \(\mathcal{D}\), we define the m-sparse class as

$$ \varSigma _{m}:=\varSigma _{m}(\mathcal{D}):= \biggl\{ \sum _{j\in \varLambda }c _{j}\phi _{j} : c_{j}\in \mathbb{R},\phi _{j}\in \mathcal{D},\sharp ( \varLambda ) = m \biggr\} . $$

The error of best m-term approximation to a function \(f\in H \) from the dictionary \(\mathcal{D}\) is defined as

$$ \sigma _{m}(f):=\sigma _{m}(f, \mathcal{D}):=\inf _{ g\in \varSigma _{m} } \Vert f-g \Vert . $$
(3.1)
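For general dictionaries, evaluating (3.1) exactly is intractable, but for an orthonormal dictionary the best m-term approximation simply keeps the m largest coefficients. A minimal sketch under that orthonormality assumption (the function name and toy coefficients are ours):

```python
import numpy as np

def sigma_m_orthonormal(coeffs, m):
    """sigma_m(f, D) when D is an orthonormal basis and coeffs are the
    expansion coefficients of f: the l2 norm of all but the m largest."""
    if m == 0:
        return float(np.linalg.norm(coeffs))
    tail = np.sort(np.abs(coeffs))[:-m]      # drop the m largest magnitudes
    return float(np.linalg.norm(tail))

coeffs = np.array([5.0, -3.0, 2.0, 1.0, 0.5, 0.25])
assert np.isclose(sigma_m_orthonormal(coeffs, 2),
                  np.linalg.norm([2.0, 1.0, 0.5, 0.25]))
assert sigma_m_orthonormal(coeffs, 6) == 0.0
```

For redundant dictionaries no such closed form exists, which is why Lebesgue-type inequalities comparing a computable algorithm to \(\sigma _{m}(f)\) are valuable.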

Now we will use OSGA(s) to generate an m-term approximant \(f_{m}\) and estimate the error \(\|f-f_{m}\|\) in terms of \(\sigma _{m}(f)\).

So let us first recall the definition of the OSGA(s) (see Algorithm 2).

Algorithm 2 (figure): Orthogonal Super Greedy Algorithm (OSGA(s))

To investigate the error performance of OSGA, one needs to make some assumptions on the dictionary \(\mathcal{D}\). A simple and useful assumption involves the coherence μ of a dictionary \(\mathcal{D}\), defined by

$$ \mu :=\mu (\mathcal{D}):=\sup _{\phi ,\psi \in {\mathcal{D}},\phi \neq \psi } \bigl\vert \langle \phi , \psi \rangle \bigr\vert . $$

Dictionaries with coherence μ are called μ-coherent. It was proved in [8] that, for μ-coherent dictionaries, OSGA provides the same (in the sense of order) upper bound on the error over the sparse class as OGA. In this paper, we improve this result under a weaker assumption on the dictionary. This assumption is the RIP, which was formulated in a finite-dimensional context in Sect. 2. Here we need the definition of the RIP in terms of a dictionary, rather than a measurement matrix, in an infinite-dimensional context:

A dictionary \(\mathcal{D}\) is said to satisfy RIP of order k if there exists a constant \(\delta \in (0,1)\) such that, for any subset \(J\subset I\) with \(\sharp (J)\leq k\) and any scalars \(a_{i}\), \(i\in J\), the following inequalities hold:

$$ (1-\delta ) \biggl(\sum _{i\in J} \vert a_{i} \vert ^{2}\biggr)\leq \biggl\Vert \sum _{i \in J} a_{i}\phi _{i} \biggr\Vert ^{2}\leq (1+\delta ) \biggl(\sum _{i\in J} \vert a _{i} \vert ^{2}\biggr). $$
(3.2)

As before, the minimum of all δ satisfying (3.2) is referred to as the isometry constant \(\delta _{k}\).

It is known from [22] that if the dictionary \(\mathcal{D}\) has coherence μ, then it satisfies RIP of order k for \(k\leq \mu ^{-1}+1\) with isometry constant \(\delta _{k}\leq \mu (k-1)\), but not vice versa.
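For a finite dictionary, \(\delta _{k}\) can be computed for small k by brute force: since \(\|\sum_{i\in J}a_{i}\phi _{i}\|^{2}=a^{T}G_{J}a\) for the Gram submatrix \(G_{J}\), the smallest δ in (3.2) is the largest deviation of an eigenvalue of \(G_{J}\) from 1. A sketch, also illustrating the bound \(\delta _{k}\leq \mu (k-1)\) (the function name and the two-atom example are ours):

```python
import numpy as np
from itertools import combinations

def isometry_constant(Phi, k):
    """Brute-force delta_k for the unit-norm columns of Phi: the largest
    |lambda - 1| over eigenvalues of k x k Gram submatrices.
    (Supports of size exactly k suffice, by eigenvalue interlacing.)"""
    delta = 0.0
    for J in combinations(range(Phi.shape[1]), k):
        G = Phi[:, list(J)].T @ Phi[:, list(J)]
        lam = np.linalg.eigvalsh(G)
        delta = max(delta, abs(lam[0] - 1.0), abs(lam[-1] - 1.0))
    return delta

# Two unit-norm atoms with coherence mu = 1/2: the Gram eigenvalues are
# 1/2 and 3/2, so delta_2 = 1/2 = mu(k - 1) and the bound is attained.
Phi = np.array([[1.0, 0.5], [0.0, np.sqrt(3.0) / 2.0]])
print(isometry_constant(Phi, 2))   # approximately 0.5
```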

Recently, under the RIP assumption, the authors of [7] proved the almost optimality of OGA by establishing a corresponding Lebesgue-type inequality, which quantifies the efficiency of OGA for individual elements. It is natural to ask whether a Lebesgue-type inequality can also be established for OSGA(s). We give a positive answer to this question: we prove the following inequality for OSGA(s) when the dictionary \(\mathcal{D}\) satisfies the RIP. To the best of our knowledge, this is the first Lebesgue-type inequality obtained for super greedy type algorithms.

Theorem 3.1

Given parameter \(s\in \mathbb{N}\), there exist fixed constants \(A:=88\), \(\delta ^{\ast }:=1/10\), \(C:=8\) such that the following holds for all \(n\geq 0\): if \(\mathcal{D}\) is a dictionary in a Hilbert space H satisfying RIP of order \((As+1)n\) with isometry constant \(\delta _{(As+1)n}\leq \delta ^{\ast }\), then for any target function \(f\in \mathit{H}\), we have

$$ \Vert r_{An} \Vert = \Vert f-f_{An} \Vert \leq C \sigma _{n}(f,\mathcal{D}). $$

We remark that the values of A, \(\delta ^{\ast }\), C for which Theorem 3.1 holds are coupled. For example, it is possible to obtain a smaller value of C at the price of a larger value of A or a smaller value of \(\delta ^{\ast }\).

Based on this inequality, we will derive the convergence rate of OSGA on the sparse class induced by dictionaries satisfying RIP conditions.

The rest of this section is organized as follows. Sections 3.1 and 3.2 are devoted to the proof of Theorem 3.1. In Sect. 3.3, as an application of Theorem 3.1, we study the rate of convergence of OSGA on the sparse class induced by RIP dictionaries.

Reduction of the residual

To prove Theorem 3.1, we establish the following lemma (Lemma 3.1), which quantifies the reduction of the residuals generated by OSGA(s) under the RIP condition. For \(s=1\), this lemma was proved in [7]; to handle the case \(s>1\), we need some new techniques. In what follows, we denote by

$$ \varLambda _{k}:=\bigcup_{i=1}^{k} I_{i}:=\{i_{1},\ldots ,i_{sk}\} $$

the set of indices selected after k iterations of the OSGA(s) applied to the target element \(f\in \mathit{H}\).

Lemma 3.1

Let \((f_{k})_{k\geq 0}\) be the sequence of approximations generated by the OSGA(s) applied to f, and let \(g=\sum_{i\in T}z_{i}\phi _{i}\), \(\sharp (T)\leq n\). Then, if T is not contained in \(\varLambda _{k}\), we have

$$ \Vert r_{k+1} \Vert ^{2}\leq \Vert r_{k} \Vert ^{2}-\frac{1- \delta _{\sharp (T\cup \varLambda _{k})}}{(1+\delta _{s})\sharp (T\backslash \varLambda _{k})} \bigl( \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2} \bigr). $$
(3.3)

Lemma 3.1 quantifies the reduction of \(\|r_{k}\|\) at each iteration provided that T is not contained in \(\varLambda _{k}\) and that \(\|r_{k}\| \geq \|f-g\|\). Note that in the case when \(T\subseteq \varLambda _{k}\), we have \(\|r_{k}\| \leq \|f-g\|\).
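Inequality (3.3) can be checked directly in the simplest setting \(s=1\), \(\delta =0\) (an orthonormal dictionary), where OSGA(1) = OGA removes the largest remaining coefficient at each step. A numerical sketch of this special case (the coefficient vector is an arbitrary example of ours):

```python
import numpy as np

# Orthonormal dictionary, s = 1, all isometry constants delta = 0:
# OGA picks the largest remaining |c_i| and ||r_{k+1}||^2 = ||r_k||^2 - c_i^2.
c = np.array([4.0, 3.0, 2.0, 1.0, 0.5])   # coefficients of f
T = {0, 1, 2}                              # support of g, the best 3-term approx
fg2 = float(np.sum(c[3:] ** 2))            # ||f - g||^2
Lam, r2 = set(), float(np.sum(c ** 2))     # Lambda_0, ||r_0||^2
for k in range(3):
    i = max(set(range(len(c))) - Lam, key=lambda j: abs(c[j]))
    r2_next = r2 - c[i] ** 2
    bound = r2 - (r2 - fg2) / len(T - Lam)    # right-hand side of (3.3)
    assert r2_next <= bound + 1e-12
    Lam.add(i)
    r2 = r2_next
print("inequality (3.3) holds at every step")
```

After three steps \(T\subseteq \varLambda _{3}\) and \(\|r_{3}\|^{2}=\|f-g\|^{2}\), consistent with the remark above.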

Proof

We may assume that \(\|r_{k}\|\geq \|f-g\|\), otherwise inequality (3.3) is trivially satisfied. Denote

$$ F_{k+1}:=\operatorname{span}(\phi _{i},i\in I_{k+1}). $$

Then \(H_{k+1}\) is a direct sum of \(H_{k}\) and \(F_{k+1}\). Therefore,

$$\begin{aligned} r_{k+1}=f-f_{k+1} =&r_{k}+f_{k}-f_{k+1} \\ =&r_{k}+f_{k}-P_{H_{k+1}}(f) \\ =&r_{k}+f_{k}-P_{H_{k+1}}(r_{k}+f_{k}) \\ =&r_{k}-P_{H_{k+1}}(r_{k}). \end{aligned}$$

It is clear that \(F_{k+1}\subset H_{k+1}\) implies

$$\begin{aligned} \Vert r_{k+1} \Vert ^{2} =& \bigl\Vert r_{k}-P_{H_{k+1}}(r_{k}) \bigr\Vert ^{2} \\ \leq & \bigl\Vert r_{k}-P_{F_{k+1}}(r_{k}) \bigr\Vert ^{2} \\ =& \Vert r_{k} \Vert ^{2}- \bigl\Vert P_{F_{k+1}}(r_{k}) \bigr\Vert ^{2}. \end{aligned}$$
(3.4)

Now we estimate \(\|P_{F_{k+1}}(r_{k})\|\). Since the dictionary \(\mathcal{D}\) satisfies RIP of order s with isometry constant \(\delta _{s}\), we have

$$ \biggl\Vert \sum _{i\in I_{k+1}} a_{i} \phi _{i} \biggr\Vert ^{2}\leq (1+\delta _{s}) \biggl( \sum _{i\in I_{k+1}} \vert a_{i} \vert ^{2}\biggr). $$
(3.5)

Thus

$$\begin{aligned} \bigl\Vert P_{F_{k+1}}(r_{k}) \bigr\Vert =&\sup _{\psi \in F_{k+1}, \Vert \psi \Vert \leq 1} \bigl\vert \bigl\langle P_{F_{k+1}}(r_{k}),\psi \bigr\rangle \bigr\vert \\ =&\sup _{(c_{i})_{i\in I_{k+1}}, \Vert \sum _{i\in I_{k+1}} c_{i}\phi _{i} \Vert \leq 1} \biggl\vert \sum_{i\in I_{k+1}} \langle r_{k},\phi _{i}\rangle c_{i} \biggr\vert \\ \geq &\sup _{(c_{i})_{i\in I_{k+1}},\sum _{i\in I_{k+1}} \vert c_{i} \vert ^{2} \leq (1+\delta _{s})^{-1}} \biggl\vert \sum _{i\in I_{k+1}}\langle r_{k},\phi _{i} \rangle c_{i} \biggr\vert \\ =&(1+\delta _{s})^{-1/2}\biggl(\sum _{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i} \rangle \bigr\vert ^{2}\biggr)^{1/2}. \end{aligned}$$
(3.6)

Using inequality (3.6), we can continue to estimate (3.4) as

$$\begin{aligned} \Vert r_{k+1} \Vert ^{2} \leq & \Vert r_{k} \Vert ^{2}-(1+\delta _{s})^{-1}\biggl( \sum _{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i}\rangle \bigr\vert ^{2}\biggr) \\ \leq & \Vert r_{k} \Vert ^{2}-(1+\delta _{s})^{-1}\Bigl(\max_{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i}\rangle \bigr\vert ^{2} \Bigr). \end{aligned}$$

Therefore to prove inequality (3.3), it suffices to prove that

$$ (1+\delta _{s})^{-1}\Bigl(\max _{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i}\rangle \bigr\vert ^{2}\Bigr)\geq \frac{1-\delta _{\sharp (T\cup \varLambda _{k})}}{(1+\delta _{s})\sharp (T\backslash \varLambda _{k})} \bigl( \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2}\bigr). $$
(3.7)

To prove this, we first note that

$$\begin{aligned} 2\sqrt{ \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2}} \Vert g-f_{k} \Vert \leq & \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2}+ \Vert g-f_{k} \Vert ^{2} \\ =& \Vert r_{k} \Vert ^{2}- \Vert g-f_{k}-r_{k} \Vert ^{2}+ \Vert g-f_{k} \Vert ^{2} \\ \leq & 2 \bigl\vert \langle g-f_{k},r_{k}\rangle \bigr\vert =2 \bigl\vert \langle g,r_{k}\rangle \bigr\vert . \end{aligned}$$

This inequality can be written as

$$ \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2} \leq \frac{ \vert \langle g,r_{k}\rangle \vert ^{2}}{ \Vert g-f_{k} \Vert ^{2}}. $$
(3.8)

If we write \(f_{k}=\sum_{i\in \varLambda _{k}}c_{i}^{k}\phi _{i}\), \(\delta :=\delta _{\sharp (T\cup \varLambda _{k})}\), then, by the RIP, the denominator of the right-hand side of inequality (3.8) satisfies

$$\begin{aligned} \Vert g-f_{k} \Vert ^{2} =& \biggl\Vert \sum _{i\in T}z_{i}\phi _{i}-\sum _{i\in \varLambda _{k}}c _{i}^{k}\phi _{i} \biggr\Vert ^{2} \\ =& \biggl\Vert \sum_{i\in T\backslash \varLambda _{k}}z_{i}\phi _{i}+ \sum_{i\in T \cap \varLambda _{k}}\bigl(z_{i}-c_{i}^{k} \bigr)\phi _{i}+ \sum_{i\in \varLambda _{k}\backslash T } \bigl(-c_{i}^{k}\bigr)\phi _{i} \biggr\Vert ^{2} \\ \geq & (1-\delta ) \biggl(\sum_{i\in T\backslash \varLambda _{k}} \vert z_{i} \vert ^{2}+ \sum_{i\in T \cap \varLambda _{k}} \bigl\vert z_{i}-c_{i}^{k} \bigr\vert ^{2}+ \sum_{i\in \varLambda _{k}\backslash T } \bigl\vert c_{i}^{k} \bigr\vert ^{2}\biggr) \\ \geq &(1-\delta ) \biggl(\sum_{i\in T\backslash \varLambda _{k}} \vert z_{i} \vert ^{2}\biggr). \end{aligned}$$

On the other hand, the numerator of the right-hand side of inequality (3.8) satisfies

$$\begin{aligned} \bigl\vert \langle g,r_{k}\rangle \bigr\vert ^{2} =& \biggl\vert \biggl\langle \sum_{i\in T}z_{i} \phi _{i},r _{k}\biggr\rangle \biggr\vert ^{2} \\ =& \biggl\vert \biggl\langle \sum_{i\in T\backslash \varLambda _{k}}z_{i} \phi _{i},r_{k} \biggr\rangle \biggr\vert ^{2} \\ \leq & \biggl(\sum_{i\in T\backslash \varLambda _{k}} \vert z_{i} \vert \bigl\vert \langle \phi _{i},r _{k}\rangle \bigr\vert \biggr)^{2} \\ \leq & \max_{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i}\rangle \bigr\vert ^{2}\biggl( \sum _{i\in T\backslash \varLambda _{k}} \vert z_{i} \vert \biggr)^{2} \\ \leq & \max_{i\in I_{k+1}} \bigl\vert \langle r_{k},\phi _{i}\rangle \bigr\vert ^{2}\sharp (T\backslash \varLambda _{k})\sum_{i\in T\backslash \varLambda _{k}} \vert z_{i} \vert ^{2}. \end{aligned}$$

Therefore we obtain

$$ \Vert r_{k} \Vert ^{2}- \Vert f-g \Vert ^{2} \leq \frac{\sharp (T\backslash \varLambda _{k}) (\max_{i\in I_{k+1}} \vert \langle r_{k},\phi _{i}\rangle \vert ^{2}) }{1- \delta _{\sharp (T\cup \varLambda _{k})}}, $$

which implies (3.7). Thus we complete the proof of Lemma 3.1. □

The following proposition is an immediate consequence of Lemma 3.1.

Proposition 3.1

Assume that, for a given integer \(A>0\) and \(\delta ^{\ast }<1\), a dictionary \(\mathcal{D}\) satisfies RIP of order \((As+1)n\) with \(\delta _{(As+1)n}\leq \delta ^{\ast }\). If \(g=\sum_{i\in T} z_{i}\phi _{i}\), \(\sharp (T)\leq n\), then for any triple of non-negative integers \((j,m,L)\) such that \(\sharp (T\cup \varLambda _{j})\leq m\) and \(j+mL \leq (As+1)n\), we have

$$ \Vert r_{j+mL} \Vert ^{2}\leq e^{-\frac{1-\delta ^{\ast }}{1+\delta ^{\ast }}L} \Vert r_{j} \Vert ^{2}+ \Vert f-g \Vert ^{2}. $$
(3.9)

Proof

By Lemma 3.1, if \(g=\sum_{i\in T} z_{i}\phi _{i}\), \(\sharp (T) \leq n\), then for any triple of non-negative integers \((j,m,L)\) such that \(\sharp (T\cup \varLambda _{j})\leq m\) and \(j+mL\leq (As+1)n\), we have

$$\begin{aligned} \Vert r_{j+mL} \Vert ^{2}- \Vert f-g \Vert ^{2} \leq &\biggl(1-\frac{1-\delta ^{\ast }}{(1+ \delta ^{\ast })m}\biggr)^{mL}\max \bigl\{ 0, \Vert r_{j} \Vert ^{2}- \Vert f-g \Vert ^{2}\bigr\} \\ \leq & e^{-\frac{1-\delta ^{\ast }}{1+\delta ^{\ast }}L}\max \bigl\{ 0, \Vert r _{j} \Vert ^{2}- \Vert f-g \Vert ^{2}\bigr\} , \end{aligned}$$

where we have used the fact that \(\sharp (T\cup \varLambda _{l})\leq m\) for all \(l\geq j\). This implies inequality (3.9). Thus we complete the proof of Proposition 3.1. □

Lebesgue-type inequality for OSGA

We are now in a position to prove Theorem 3.1. Fix \(f\in \mathit{H}\). We first observe that the assertion of the theorem can be derived from the following lemma.

Lemma 3.2

If \(0\leq k< n\) satisfies

$$ \Vert r_{Ak} \Vert \leq 2\sigma _{k}(f), \quad \sigma _{k}(f) >4\sigma _{n}(f), $$
(3.10)

then there exists \(k<\tilde{k}\leq n\) such that

$$ \Vert r_{A\tilde{k}} \Vert \leq 2\sigma _{\tilde{k}}(f),\quad \sigma _{\tilde{k}}(f) \leq 4\sigma _{n}(f). $$
(3.11)

Indeed, assuming that Lemma 3.2 holds, we complete the proof of Theorem 3.1 as follows. We let \(\tilde{k}\) be the largest integer in \(\{0,1,\ldots ,n\}\) for which \(\|r_{A\tilde{k}}\|\leq 2\sigma _{ \tilde{k}}(f)\). Since

$$ \Vert r_{0} \Vert =\sigma _{0}= \Vert f \Vert , $$

such a \(\tilde{k}\) exists. If \(\tilde{k}=n\), then \(\|r_{An}\|\leq 2\sigma _{n}(f)\) directly. If \(\tilde{k}< n \), then by Lemma 3.2 and the maximality of \(\tilde{k}\) we must have \(\sigma _{\tilde{k}}(f)\leq 4\sigma _{n}(f)\), and therefore

$$ \Vert r_{An} \Vert \leq \Vert r_{A\tilde{k}} \Vert \leq 2\sigma _{\tilde{k}}(f)\leq 8 \sigma _{n}(f). $$
(3.12)

We are therefore left with proving Lemma 3.2. For this, we fix

$$ \delta ^{\ast }=\frac{1}{10}, $$
(3.13)

and \(0\leq k< n\) such that (3.10) holds. Let \(k< K\leq n\) be the first integer such that \(\sigma _{k} >4\sigma _{K}\). By the definition of \(\sigma _{K}(f)\), for any \(B>1\), there exists \(g=\sum_{j\in T} z _{j}\psi _{j}\), \(\sharp (T)=K\) such that

$$ \Vert f-g \Vert \leq B\sigma _{K}(f). $$

The significance of K is that on the one hand

$$ \Vert f-g \Vert \leq B\sigma _{K}(f)< \frac{B}{4}\sigma _{k}(f), $$
(3.14)

while on the other hand

$$ \sigma _{k}(f)\leq 4\sigma _{K-1}(f). $$
(3.15)

To apply Proposition 3.1 for the above g and \(j=Ak\), we need to bound \(\sharp (T\cup \varLambda _{Ak})\) with A yet to be specified. To this end, we write \(K=k+M\), with \(M>0\), and notice that if \(S\subset T\) is any set with \(\sharp (S)=M\) and \(g_{S}:=\sum_{j\in S} z _{j}\psi _{j}\), then

$$\begin{aligned} \Vert g_{S} \Vert \geq & \bigl\Vert f-(g-g_{S}) \bigr\Vert - \Vert f-g \Vert \\ \geq &\sigma _{k}(f)-B\sigma _{K}(f) \\ \geq & \biggl(1-\frac{B}{4} \biggr)\sigma _{k}(f), \end{aligned}$$
(3.16)

where we have used the fact that \(g-g_{S}\in \varSigma _{k}\). Using the RIP, we obtain the following lower bound for the coefficients of g: for any set \(S\subset T\) with \(\sharp (S)=M\),

$$ \biggl(1-\frac{B}{4} \biggr)^{2}\sigma ^{2}_{k}(f)\leq \Vert g_{S} \Vert ^{2} \leq \bigl(1+\delta ^{\ast }\bigr)\sum _{j\in S} \vert z_{j} \vert ^{2}= \frac{11}{10}\sum_{j \in S} \vert z_{j} \vert ^{2}. $$
(3.17)

Take for S the set \(S_{g}\) of the M smallest coefficients of g and note that then, for any more general \(S\subset T\) with \(\sharp (S) \geq M\), one has

$$ \biggl(\sum_{j\in S} \vert z_{j} \vert ^{2}\biggr)\bigg/\biggl(\sum_{j\in S_{g}} \vert z_{j} \vert ^{2}\biggr)\geq \sharp (S)/ M, $$

and hence

$$ \frac{10}{11} \biggl(1-\frac{B}{4} \biggr)^{2}\frac{\sharp (S)}{M} \sigma ^{2}_{k}(f) \leq \sum_{j\in S} \vert z_{j} \vert ^{2}. $$
(3.18)

Now we consider the particular set \(S:=T\backslash \varLambda _{Ak}\) satisfying \(\sharp (S)\geq M\). Combining the above bound with the RIP, we obtain

$$\begin{aligned} \frac{10}{11} \biggl(1-\frac{B}{4} \biggr)^{2} \frac{\sharp (S)}{M} \sigma ^{2}_{k}(f) \bigl(1-\delta ^{\ast }\bigr) \leq & \Vert g_{S} \Vert ^{2} \leq \Vert g-f _{Ak} \Vert ^{2} \\ \leq &\bigl( \Vert g-f \Vert + \Vert r_{Ak} \Vert \bigr)^{2} \\ \leq &\bigl(B\sigma _{K}(f)+2\sigma _{k}(f) \bigr)^{2} \\ \leq & \biggl(\frac{B}{4}+2 \biggr)^{2}\sigma ^{2}_{k}(f). \end{aligned}$$

Since \(\delta ^{\ast }=\frac{1}{10}\) this gives the bound

$$ \sharp (T\backslash \varLambda _{Ak})\leq \frac{11}{9} \frac{(\frac{B}{4}+2)^{2}}{(1- \frac{B}{4})^{2}}M\leq 11M, $$
(3.19)

where the second inequality is obtained by taking B sufficiently close to 1.

We proceed now to verifying Lemma 3.2 with \(\tilde{k}=K-1\) when \(K-1> k\) and with \(\tilde{k}=k+1\) otherwise. In the first case we can use the estimate of Proposition 3.1 with \(j=Ak\), in combination with (3.15), to handle the term \(\|r_{Ak}\|\) in (3.9). When \(K=k+1\), however, we cannot bound \(\|r_{Ak}\|\) directly in terms of \(\sigma _{l}(f)\) for some \(l>k\). Accordingly, we use Proposition 3.1 in different ways in the two cases.

In the case where \(K-1> k\), i.e., \(M \geq 2\), we apply (3.9) with \(j=Ak\), \(m=11M\), and \(L=4\). Indeed \(Ak+Lm=Ak+44M\leq An\) holds for \(k+ M\leq n\) whenever \(A\geq 44\). Moreover, notice that for such A

$$\begin{aligned} A(K-1) =&Ak+A(M-1) \\ \geq &Ak+\frac{1}{2}AM \\ =&Ak+\frac{Am}{22} \\ =&Ak+Lm, \end{aligned}$$

whenever

$$ A\geq 88. $$
(3.20)

This gives

$$\begin{aligned} \Vert r_{A(K-1)} \Vert ^{2} \leq & \Vert r_{Ak+mL} \Vert ^{2} \\ \leq &e^{-\frac{36}{11}} \Vert r_{Ak} \Vert ^{2}+ \Vert f-g \Vert ^{2} \\ \leq & e^{-\frac{36}{11}} 4\sigma _{k}^{2}(f)+B^{2} \sigma _{K}^{2}(f) \\ \leq & e^{-\frac{36}{11}} 64\sigma _{K-1}^{2}(f)+B^{2} \sigma _{K-1} ^{2}(f) \\ \leq & 4\sigma _{K-1}^{2}(f), \end{aligned}$$

where we have used (3.15) in the fourth inequality, and the last inequality follows by taking B sufficiently close to 1. We thus obtain (3.11) for the value \(\tilde{k}=K-1>k\).

In the case where \(K-1= k\), i.e., \(M =1\), we apply (3.9) with \(j=Ak\), \(m=11\), and \(L=8\). From (3.19), we know that \(\sharp (T \backslash \varLambda _{Ak})\leq 11\) and \(An\geq A(k+1)\geq Ak+mL\) for A satisfying (3.20). This yields

$$\begin{aligned} \Vert r_{A(k+1)} \Vert ^{2} \leq & \Vert r_{Ak+mL} \Vert ^{2} \\ \leq &e^{-\frac{72}{11}} \Vert r_{Ak} \Vert ^{2}+ \Vert f-g \Vert ^{2} \\ \leq & e^{-\frac{72}{11}} 4\sigma _{k}^{2}(f)+B^{2} \sigma _{k+1}^{2}(f) \\ \leq & \biggl(4 e^{-\frac{72}{11}} +\frac{B^{2}}{16} \biggr)\sigma _{k}^{2}(f). \end{aligned}$$

This implies that \(\varLambda _{A(k+1)}\) contains T. In fact, if it missed one of the indices \(i\in T\), then we derive from the RIP condition

$$\begin{aligned} \bigl(1-\delta ^{\ast }\bigr) \vert z_{i} \vert ^{2} \leq & \Vert g-f_{A(k+1)} \Vert ^{2} \\ \leq &\bigl( \Vert f-g \Vert + \Vert r_{A(k+1)} \Vert \bigr)^{2} \\ \leq & \biggl(B\sigma _{K}(f)+\sqrt{4 e^{-\frac{72}{11}} + \frac{B^{2}}{16}}\sigma _{k}(f)\biggr)^{2} \\ \leq & \biggl(\frac{B}{4}+\sqrt{4 e^{-\frac{72}{11}} + \frac{B^{2}}{16}} \biggr)\sigma ^{2}_{k}(f). \end{aligned}$$

On the other hand, we know from (3.18) that

$$ \frac{10}{11} \biggl(1-\frac{B}{4} \biggr)^{2}\sigma ^{2}_{k}(f)\leq \vert z _{i} \vert ^{2}, $$
(3.21)

which for B sufficiently close to 1 is a contradiction since

$$ \frac{10}{11} \biggl(1-\frac{B}{4} \biggr)^{2} > \frac{10}{9} \biggl(\frac{B}{4}+\sqrt{4 e^{-\frac{72}{11}} + \frac{B^{2}}{16}} \biggr). $$

This implies that \(\|r_{A(k+1)}\|\leq \sigma _{k+1}(f)\), and therefore inequality (3.11) holds for the value \(\tilde{k}=k+1\). This verifies Lemma 3.2 and hence completes the proof of Theorem 3.1.

Convergence rate on sparse class

As an application of Theorem 3.1, we will derive the convergence rate of OSGA on the sparse class induced by a dictionary. First, we recall the definition of this class. For a general dictionary \(\mathcal{D}\) in H, we define the class of functions

$$ A_{1}^{0}(\mathcal{D}):= \biggl\{ f\in H:f=\sum _{i\in \varLambda }c_{i}(f)\phi _{i}, \phi _{i}\in \mathcal{D}, \sharp \varLambda < \infty , \sum _{i\in \varLambda } \bigl\vert c_{i}(f) \bigr\vert \leq 1 \biggr\} $$

and we define \(A_{1}(\mathcal{D})\) to be the closure of \(A_{1}^{0}( \mathcal{D})\) in H. It is well known that the class \(A_{1}( \mathcal{D})\) plays an important role in the study of greedy approximation with respect to dictionaries. In [23], DeVore and Temlyakov proved that, for an arbitrary dictionary \(\mathcal{D}\) in H, the OGA provides, after m iterations, an approximation of \(f\in A_{1}(\mathcal{D})\) with the following upper bound of the convergence rate:

$$ r_{m}^{\mathrm{OGA}}(f)\leq m^{-1/2}. $$

Note that the rate \(m^{-1/2}\) is sharp, since when \(\mathcal{D}\) is an orthonormal basis of H it is easy to find \(f_{0}\in A_{1}( \mathcal{D})\) such that

$$ r_{m}^{\mathrm{OGA}}(f_{0})=c\cdot m^{-1/2}. $$
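The rate and its sharpness are easy to see numerically: for an orthonormal basis and \(f_{0}=\sum_{i=1}^{N} N^{-1}\phi _{i}\in A_{1}(\mathcal{D})\), OGA removes one coefficient per step, so \(\|f_{0}-f_{m}\|^{2}=(N-m)/N^{2}\). A small check (N = 100 is an arbitrary choice of ours):

```python
import numpy as np

# f0 has N equal coefficients 1/N, so the A_1 coefficient sum is 1 and
# OGA leaves N - m coefficients after m steps: r_m = sqrt(N - m) / N.
N = 100
r = [np.sqrt((N - m) / N**2) for m in range(1, N)]

# Upper bound r_m <= m^{-1/2} of DeVore-Temlyakov:
assert all(r[m - 1] <= m ** (-0.5) for m in range(1, N))

# Sharpness: at m = N/2 the ratio r_m / m^{-1/2} equals 1/2, so the
# rate m^{-1/2} cannot be improved in order.
m = N // 2
print(r[m - 1] / m ** (-0.5))   # approximately 0.5
```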

Unlike for OGA, to obtain the same convergence rate on \(A_{1}(\mathcal{D})\) for OSGA, one must make extra assumptions on the dictionary \(\mathcal{D}\). One such property, μ-coherence, yields the same rate of convergence. Assuming that the dictionary \(\mathcal{D}\) is μ-coherent, Liu and Temlyakov [8] proved the following theorem.

Theorem 3.2

Let \(\mathcal{D}\) be a dictionary in a Hilbert space H with coherence parameter μ. Then, for \(s\leq (2\mu )^{-1}\), OSGA(s) provides, after m iterations, an approximation of \(f\in A_{1}( \mathcal{D})\) with the following upper bound on the error:

$$ \Vert r_{m} \Vert ^{2}= \Vert f-f_{m} \Vert ^{2} \leq 40.5(sm)^{-1}. $$

Now we present our result. As an immediate consequence of Theorem 3.1, we get the following theorem.

Theorem 3.3

Given parameter \(s\in \mathbb{N}\), for all \(n\geq 0\), if \(\mathcal{D}\) is a dictionary in a Hilbert space H satisfying RIP of order \((88s+1)n\) with isometry constant \(\delta _{(88s+1)n}\leq 1/10\), then OSGA(s) provides, after 88n iterations, an approximation of \(f\in A_{1}(\mathcal{D})\) with the following error bound:

$$ \Vert r_{88n} \Vert = \Vert f-f_{88n} \Vert \leq Cn^{-\frac{1}{2}}. $$

Proof

As discussed above, the results of DeVore and Temlyakov [23] imply the following estimate for the best n-term approximation error of functions \(f\in A_{1}(\mathcal{D})\):

$$ \sigma _{n}(f,\mathcal{D})\leq n^{-\frac{1}{2}},\quad n = 1,2,\ldots. $$
(3.22)

Combining (3.22) with Theorem 3.1, we complete the proof of Theorem 3.3. □

References

  1. DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)


  2. Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proc. 27th Asilomar Conference on Signals, Systems and Computers (1993)


  3. Lin, J.H., Li, S.: Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 165, 20–40 (2013)


  4. Liu, E., Temlyakov, V.N.: Super greedy type algorithm. Adv. Comput. Math. 37, 493–504 (2012)


  5. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26, 301–321 (2009)


  6. Cai, T., Wang, L.: Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 57, 4680–4688 (2011)


  7. Cohen, A., Dahmen, W., DeVore, R.A.: Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 45, 113–127 (2017)


  8. Liu, E., Temlyakov, V.N.: The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 58, 2040–2047 (2012)


  9. Wei, D.: Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 57, 2179–2188 (2014)


  10. Wang, J., Kwon, S., Shim, B.: Generalized orthogonal matching pursuit. IEEE Trans. Signal Process. 60, 6202–6216 (2012)


  11. Wei, X.J., Ye, P.X.: Estimates of restricted isometry constant in super greedy algorithms. Int. J. Future Gener. Commun. Netw. 8, 137–144 (2015)


  12. Yi, S., Li, B., Pan, W.L., Li, J.: Analysis of generalised orthogonal matching pursuit using restricted isometry constant. Electron. Lett. 14, 1020–1022 (2014)


  13. Baraniuk, R.G.: Compressive sensing. IEEE Signal Process. Mag. 24, 118–121 (2007)


  14. Candes, E.J.: Compressive sampling. In: Int. Congress of Mathematics, vol. 3, pp. 1433–1452 (2006)


  15. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006)


  16. Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51, 4203–4215 (2005)


  17. Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52, 5406–5425 (2005)


  18. Satpathi, S., Das, R., Chakraborty, M.: Improving the bound on the rip constant in generalized orthogonal matching pursuit. IEEE Signal Process. Lett. 20, 1074–1077 (2013)


  19. Candes, E.J., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35, 2313–2351 (2007)


  20. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–451 (2004)


  21. Cai, T., Xu, G., Zhang, J.: On recovery of sparse signals via \(l_{1}\) minimization. IEEE Trans. Inf. Theory 55, 3388–3397 (2009)


  22. Tropp, J.A.: Greedy is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50, 2231–2242 (2004)


  23. DeVore, R.A., Temlyakov, V.N.: Some remarks on greedy algorithms. Adv. Comput. Math. 5, 173–187 (1996)



Acknowledgements

The authors thank all of the referenced authors.

Availability of data and materials

Not applicable.

Funding

This work is supported by the National Natural Science Foundation of China [Grant Nos. 11701411, 11271199, 11671213, 11401247].

Author information


Contributions

The authors contributed equally and significantly in writing this paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiujie Wei.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Wei, X., Ye, P. Efficiency of orthogonal super greedy algorithm under the restricted isometry property. J Inequal Appl 2019, 124 (2019). https://doi.org/10.1186/s13660-019-2075-x



Keywords

  • Orthogonal super greedy algorithm
  • Orthogonal multi matching pursuit
  • Restricted isometry property
  • Compressed sensing
  • m-term approximation
  • Lebesgue-type inequality