# Random sampling and approximation of signals with bounded derivatives

## Abstract

Approximation of analog signals from noisy samples is a fundamental but difficult problem. This paper addresses the problem of approximating functions in $$H_{\gamma , \varOmega }$$ from randomly chosen samples, where

$$H_{\gamma , \varOmega }= \bigl\{ f \mid f\mbox{ is continuous on } \overline{\varOmega }, \mbox{and } \|D f\|_{L_{\infty }(\varOmega )} \le \gamma \|f\|_{L_{\infty }(\varOmega ) } \bigr\} .$$

We are concerned with the probability that functions in $$H_{\gamma , \varOmega }$$ can be approximated from the noisy samples stably and how they can be approximated.

By calculating the upper bound of the covering number of a subset of $$H_{\gamma , \varOmega }$$ and using the uniform law of large numbers, we conclude that functions in $$H_{\gamma , \varOmega }$$ can be recovered stably with overwhelming probability, provided that the sampling noise satisfies some mild conditions and the sample size is sufficiently large. Furthermore, an $$\ell _{\infty }$$-regularized least squares model is proposed to approximate functions from noisy samples. The alternating direction method of multipliers (ADMM) algorithm is then applied to solve the model. Finally, numerical experiments are presented and discussed to illustrate the efficiency of our method.

## Introduction

In modern digital processing of signals such as sound, images or video, one always works with a discretized version of the original analog function. How the analog function can be recovered from its samples, and whether the reconstruction is stable, are therefore fundamental problems in sampling theory. For example, the Shannon–Whittaker sampling theorem states that each $$f\in L_{2}(\mathbb{R})$$ with $$\operatorname{supp} (\hat{f})\subseteq [-\frac{1}{2},\frac{1}{2}]$$ can be completely recovered from its samples $$\{f(j): j\in \mathbb{Z} \}$$ by the formula $$f(x)=\sum_{j\in \mathbb{Z}}f(j) \frac{\sin \pi (x-j)}{ \pi (x-j)}$$, where the series converges uniformly on $$\mathbb{R}$$ and also in $$L_{2}(\mathbb{R})$$.

To characterize the conditions under which it is possible to recover particular classes of functions from the sampling points stably, we introduce the following sampling inequalities:

$$m_{p}\|f\|_{L_{p}(\mathbb{R}^{d})}^{p}\le \sum _{x_{j}\in X} \bigl\vert f(x_{j}) \bigr\vert ^{p} \le M_{p}\|f\|_{L_{p}(\mathbb{R}^{d})}^{p} \quad \forall f\in V,$$
(1.1)

where $$m_{p}$$ and $$M_{p}$$ are positive constants independent of f and $$1\le p <\infty$$. The set $$X=\{x_{j}:j\in J\}$$ is said to be a stable set of sampling for function class $$V\subseteq L_{p}(\mathbb{R}^{d})$$ if the inequalities (1.1) hold (see e.g. [2, 3]). The sampling inequalities (1.1) imply that a small perturbation of f causes only a small change of sampled values $$\{f(x_{j})\}$$, and vice versa. This means that the sampling is a stable process and the reconstruction of functions in V from X is continuous. For the Shannon–Whittaker theorem, since the functions $$\{\frac{ \sin \pi (\cdot -j)}{\pi (\cdot -j)} \}_{j \in \mathbb{Z} }$$ form an orthonormal basis for the space of bandlimited functions with highest cycle frequency $$\frac{1}{2}$$, we have $$\|f\|_{L_{2}(\mathbb{R})}^{2} = \sum_{j\in \mathbb{Z}}|f(j)|^{2}$$ and $$m_{2}=M_{2}=1$$.
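As a concrete illustration of the Shannon–Whittaker formula and of the equality $$m_{2}=M_{2}=1$$, the following sketch (our own example; the test signal and truncation range are illustrative choices, not part of the paper's analysis) reconstructs a bandlimited signal from a truncated cardinal series and checks that the sampled energy matches the $$L_{2}$$ norm:

```python
import numpy as np

# Illustrative sketch: truncated Shannon-Whittaker reconstruction of a
# bandlimited signal, plus a check of the sampled-energy identity.
def sinc_reconstruct(samples, j_idx, x):
    """Truncated cardinal series: sum_j f(j) * sinc(x - j)."""
    # np.sinc uses the normalized convention sin(pi t) / (pi t)
    return sum(fj * np.sinc(x - j) for fj, j in zip(samples, j_idx))

f = lambda x: np.sinc(x - 0.3)         # bandlimited: supp(fhat) in [-1/2, 1/2]

j_idx = np.arange(-200, 201)           # integer sampling points
samples = f(j_idx.astype(float))

x = np.linspace(-3.0, 3.0, 61)
err = np.max(np.abs(sinc_reconstruct(samples, j_idx, x) - f(x)))

energy = np.sum(samples**2)            # approximates ||f||_{L2}^2 = 1
print(err < 1e-2, abs(energy - 1.0) < 1e-2)  # True True
```

The residual error comes only from truncating the series to $$|j|\le 200$$; the untruncated series reproduces f exactly.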

In the past years, there has been a considerable body of research on developing reconstruction algorithms and characterizing the conditions for stable sets of sampling for various function classes. For instance, Aldroubi and Gröchenig investigated nonuniform sampling and reconstruction in shift-invariant spaces. Xian and Li [4, 5] studied the sampling set conditions and applications of weighted finitely generated shift-invariant spaces. Sun and Zhou, and Sun, characterized local sampling in spline subspaces and shift-invariant spaces, respectively.

This paper addresses the problem of sampling and approximation of functions with bounded derivatives. We define the following class of functions on $$\varOmega \subset \mathbb{R}^{d}$$:

$$H_{\gamma , \varOmega }:= \bigl\{ f \mid f\mbox{ is continuous on } \overline{ \varOmega }, \mbox{and } \|D f\|_{L_{\infty }(\varOmega )} \le \gamma \|f\|_{L_{\infty }(\varOmega ) } \bigr\} ,$$
(1.2)

where $$|Df|=|D_{x_{1}}f|+ |D_{x_{2}}f|+ \cdots +|D_{x_{d}}f|$$, and $$D_{x_{i}}=\frac{\partial }{\partial {x_{i}}}$$ denotes the weak derivative of order 1. Continuity of f at the boundary of Ω means that f can be continuously extended to Ω̅. The parameter γ characterizes the degree of oscillation of functions in $$H_{\gamma , \varOmega }$$. For simplicity, throughout this paper we assume that $$\varOmega =(0, 1)^{d}$$. Sampling in $$H_{\gamma , \varOmega }$$ is an appropriate model for many applications, in particular in signal and image processing. Moreover, by Bernstein's inequality, if $$f\in L_{2}(\mathbb{R})$$ and $$\operatorname{supp} (\hat{f})\subseteq [-\frac{\gamma }{4\pi },\frac{ \gamma }{4\pi }]$$, then $$f \in H_{\gamma , \mathbb{R}}$$. In other words, this space of bandlimited functions is contained in $$H_{\gamma , \mathbb{R}}$$.
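For a concrete $$d=1$$ illustration of the class $$H_{\gamma , \varOmega }$$ (our own example, not part of the analysis below), the following sketch checks numerically that $$f(x)=\sin (\gamma x)$$ satisfies $$\|Df\|_{L_{\infty }(\varOmega )} \le \gamma \|f\|_{L_{\infty }(\varOmega )}$$, in fact with equality:

```python
import numpy as np

# Numerical membership check for H_{gamma,Omega} with d = 1 (illustrative):
# f(x) = sin(gamma x) saturates the bound ||Df|| <= gamma ||f||.
gamma = 4.0
f  = lambda x: np.sin(gamma * x)          # candidate signal on Omega = (0, 1)
df = lambda x: gamma * np.cos(gamma * x)  # its (weak) derivative

x = np.linspace(0.0, 1.0, 100001)         # dense grid on the closure of Omega
sup_f  = np.max(np.abs(f(x)))             # ~ ||f||_{L_infty(Omega)}
sup_df = np.max(np.abs(df(x)))            # ~ ||Df||_{L_infty(Omega)}

print(sup_df <= gamma * sup_f + 1e-9)     # True: f lies in H_{gamma,Omega}
```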

Random sampling approaches have been used in a variety of fields, such as statistical learning theory, compressed sensing, image processing, and many others. Recently, Bass and Gröchenig studied the stability problem of random sampling in the space of bandlimited functions. Yang et al. [14, 15] and Führ et al. discussed the stability conditions and applications of random sampling in shift-invariant spaces.

The purpose of this paper is to investigate random sampling of functions in $$H_{\gamma , \varOmega }$$. Let $$\{(x_{j}, y_{j})\}_{j=1}^{n}$$ be the sampling of $$f\in H_{\gamma , \varOmega }$$, where $$\{x_{j}\}$$ is uniformly drawn from Ω, and $$y_{j}=f(x_{j}) +\epsilon _{j}$$ with $$\epsilon _{j}$$ being a random noise. We consider the probability that $$\{(x_{j}, y_{j})\}_{j=1}^{n}$$ is a stable set of sampling for $$H_{\gamma , \varOmega }$$, and how f can be approximated from these samples. By estimating the upper bound of the capacity of $$H_{\gamma , \varOmega }$$ and applying the uniform law of large numbers of the sampling values, we conclude that with overwhelming probability, the sampling inequalities (1.1) hold uniformly for all functions in $$H_{\gamma , \varOmega }$$ when the sampling noise satisfies some mild conditions. Furthermore, an $$\ell _{\infty }$$-regularized least squares model is proposed, and the corresponding numerical algorithm is discussed.

The rest of this paper is structured as follows. Section 2 introduces some mathematical notation and the covering number of $$H_{\gamma , \varOmega }$$. Section 3 characterizes the stability properties of random sampling in $$H_{\gamma , \varOmega }$$. Section 4.1 presents an optimization model and the corresponding numerical algorithm to approximate $$f\in H_{\gamma , \varOmega }$$ from its noisy samples. Section 4.2 discusses some numerical experiments. Finally, Sect. 5 concludes and discusses future directions.

## Preliminaries

We first introduce some notation. Let $$\mathbb{N}$$ denote the set of positive integers. As usual, for $$x\in \mathbb{R}$$, let $$\lfloor x \rfloor$$ denote the largest integer smaller than or equal to x, and $$\lceil x \rceil$$ denote the smallest integer greater than or equal to x. For $$x=(x_{1}, \ldots , x_{d})$$, $$y=(y _{1}, \ldots , y_{d})\in \mathbb{R}^{d}$$, let $$|x-y|:=\max_{1\le i \le d} \{|x_{i} -y_{i}| \}$$.

For two sets A and B, the Cartesian product $$A \times B$$ is the set of all ordered pairs defined by $$A\times B=\{ (a,b)\mid a\in A \mbox{ and } b\in B \}$$. Similarly, it can be generalized to an m-ary Cartesian product over m sets.

Let $$\mathcal{M}$$ be a Lebesgue measurable set. For $$1 \leq p < \infty$$, we denote by $$L_{p}(\mathcal{M})$$ the Banach space of all measurable functions f such that

$$\|f\|_{L_{p}(\mathcal{M})}:= \biggl( \int _{\mathcal{M}} \bigl\vert f(x) \bigr\vert ^{p} \,dx \biggr) ^{1/p} < \infty \quad \mbox{for } 1 \leq p < \infty ,$$

and $$\|f\|_{{L_{\infty }(\mathcal{M})}}$$ is the essential supremum of $$|f|$$ on $$\mathcal{M}$$.

We can similarly define $$\ell _{p}=\ell _{p}(\mathbb{Z}^{d})$$ the Banach space of all sequences $$a=(a_{k})_{k \in \mathbb{Z}^{d}}$$ such that $$\|a\|_{\ell _{p}} < \infty$$, where

$$\|a\|_{\ell _{p}}:= \biggl(\sum_{k \in \mathbb{Z}^{d}}|a_{k}|^{p} \biggr)^{1/p} \quad \mbox{for } 1 \leq p < \infty ,$$

and $$\|a\|_{\ell _{\infty }}$$ is the supremum of $$|a_{k}|$$ over $$\mathbb{Z}^{d}$$.

The first main result of this paper is an estimate of the probability with which the sampling inequalities (1.1) hold uniformly for all functions in $$H_{\gamma , \varOmega }$$. Similar problems are classical in statistical learning theory. The most powerful tool used there is the capacity of the involved function set together with the uniform law of large numbers [17,18,19]. Our analysis follows this strategy.

Since the covering number is a convenient and powerful tool for metric spaces, we use it to characterize the capacity of function sets. Let S be a metric space. For any $$\eta > 0$$, the covering number $$\mathcal{N}(S,\eta )$$ is the minimal number of balls of radius η needed to cover S. When S is compact, $$\mathcal{N}(S, \eta )$$ is finite for any given $$\eta > 0$$.

It is not hard to verify that $$f\in H_{\gamma , \varOmega }$$ satisfies the inequalities (1.1) if and only if $$f/\|f\|_{L_{\infty }( \varOmega )}$$ does. So we consider the subset

$$H_{\gamma , \varOmega }^{\ast }:= \bigl\{ f \in H_{\gamma , \varOmega } \mid \Vert f \Vert _{L_{\infty }(\varOmega ) } = 1 \bigr\} .$$
(2.1)

The following proposition gives an upper bound for the covering number of $$H_{\gamma , \varOmega }^{\ast }$$, and the proof follows the line of argument in [9, Proposition 5.4] where $$d=1$$.

### Proposition 2.1

Let $$H_{\gamma , \varOmega }^{\ast }$$ be defined by (2.1). For any $$\eta >0$$, the covering number of $$H_{\gamma , \varOmega }^{ \ast }$$ with respect to $$\|\cdot \|_{L_{\infty }}$$ satisfies

$$\mathcal{N}\bigl(H_{\gamma , \varOmega }^{\ast }, \eta \bigr)\le \exp \biggl( \ln 2 + \frac{2}{\eta } +\biggl( \frac{4 \gamma }{\eta }\biggr) ^{d} \ln 3 \biggr).$$

### Proof

We first define a set of regular grid points in Ω with equal distance $$\delta =\frac{\eta }{4\gamma }$$. Let $$\mathrm{I}=\{1, 2, \ldots , \lfloor \frac{1}{\delta } \rfloor \}$$ and $$i:=(i_{1}, i_{2}, \ldots , i_{d}) \in {\mathrm{I}}^{d}$$. For each $$i_{\ell } \in \mathrm{I}$$, set $$x_{i_{\ell }}={i_{\ell }} \delta$$. Then one can check that the point set

$$X:=\bigl\{ x_{i}=(x_{i_{1}}, x_{i_{2}}, \ldots , x_{i_{d}} ) \mid x_{i _{\ell }}={i_{\ell }} \delta , i \in \mathrm{I}^{d} \bigr\}$$

is a regular grid in Ω with equal distance δ, which is also a δ-net of Ω.

For $$f \in H_{\gamma , \varOmega }^{\ast }$$, we have $$\|D f\|_{L_{\infty }} \le \gamma \|f\|_{L_{\infty }} \le \gamma$$. Moreover, by the continuity of f, for any $$x, y\in \varOmega$$, we have $$-1\le f(x) \le 1$$ and $$|f(x)-f(y)| \le \gamma |x-y|$$. It follows that

$$(\nu _{i} -1) \frac{\eta }{2} \le f(x_{i}) \le \nu _{i} \frac{\eta }{2}, \quad \forall x_{i} \in X,$$

for some $$\nu _{i} \in \mathrm{J}:=\{ -\lceil \frac{2}{\eta } \rceil +1, - \lceil \frac{2}{\eta } \rceil +2, \ldots , \lceil \frac{2}{\eta } \rceil \}$$. For each

$$\nu =(\nu _{(1,1, \ldots , 1)},\nu _{(2,1, \ldots , 1)}, \ldots , \nu _{i}, \ldots , \nu _{(\lfloor \frac{1}{\delta } \rfloor , \lfloor \frac{1}{\delta } \rfloor , \ldots , \lfloor \frac{1}{\delta } \rfloor ) }) \in {\mathrm{J}} ^{|{\mathrm{I}}^{d} |},$$

define

$$H_{\nu }:= \biggl\{ f\in H_{\gamma , \varOmega }^{\ast } \Bigm| (\nu _{i} -1) \frac{\eta }{2} \le f(x_{i}) \le \nu _{i} \frac{\eta }{2}, \forall i \in {\mathrm{I}}^{d} \biggr\} .$$

Then we obtain

$$H_{\gamma , \varOmega }^{\ast } \subseteq \bigcup_{\nu \in {\mathrm{J}}^{|{\mathrm{I}}^{d} |}} H_{\nu }.$$

Besides, for every $$f, g \in H_{\nu }$$, there exist $$\tilde{i} \in {\mathrm{I}}^{d}$$ and $$x_{\tilde{i}} \in X$$ such that

\begin{aligned} \max_{x \in \varOmega } \bigl\vert f(x)-g(x) \bigr\vert &= \max _{ \vert x-x_{\tilde{i}} \vert \le \delta } \bigl\vert f(x)-g(x) \bigr\vert \\ & \le \max_{ \vert x-x_{\tilde{i}} \vert \le \delta } \bigl\vert f(x)-f(x_{\tilde{i}}) \bigr\vert + \bigl\vert g(x)-g(x _{\tilde{i}}) \bigr\vert + \bigl\vert f(x_{\tilde{i}})-g(x_{\tilde{i}}) \bigr\vert \\ & \le 2 \gamma \delta +\frac{\eta }{2}=\eta . \end{aligned}

Hence, we conclude that the diameter of $$H_{\nu } \subseteq H_{\gamma , \varOmega }^{\ast }$$ is at most η, and

$$\bigl\{ H_{\nu }: \nu \in {\mathrm{J}}^{|{\mathrm{I}}^{d} |} \bigr\}$$

is an η-covering of $$H_{\gamma , \varOmega }^{\ast }$$.

In the following, we count the number of sets $$\{ H_{\nu } \}$$, $$\nu \in {\mathrm{J}}^{|{\mathrm{I}}^{d} |}$$. Let $$f\in H_{\nu }$$. For every fixed $$i=(i_{1}, i_{2}, \ldots ,i_{\ell }, \ldots , i_{d}) \in {\mathrm{I}}^{d}$$ and $$x_{i} \in X$$, we have

$$(\nu _{i} -1) \frac{\eta }{2} \le f(x_{i}) \le \nu _{i} \frac{\eta }{2}.$$

Similarly, for $$x_{\overline{i}} \in X$$ with $$\overline{i}=(i_{1}, i _{2}, \ldots ,i_{\ell }+1, \ldots , i_{d}) \in {\mathrm{I}}^{d}$$, a δ-neighboring point of $$x_{i}$$, we have

$$(\nu _{\overline{i}} -1) \frac{\eta }{2} \le f(x_{\overline{i}}) \le \nu _{\overline{i}} \frac{\eta }{2}.$$

It follows that

$$( \nu _{i} - \nu _{\overline{i}} -1) \frac{\eta }{2} \le f(x_{i}) - f(x _{\overline{i}}) \le ( \nu _{i} - \nu _{\overline{i}} +1) \frac{\eta }{2} .$$
(2.2)

Since $$f\in H_{\nu } \subseteq H_{\gamma , \varOmega }^{\ast }$$ and $$|x_{i} -x_{\overline{i}}| = \delta$$, we have

$$\bigl\vert f(x_{i}) - f(x_{\overline{i}}) \bigr\vert \le \gamma \delta =\frac{\eta }{4}.$$
(2.3)

By (2.2) and (2.3), we obtain $$|\nu _{i} -\nu _{ \overline{i}}| \frac{\eta }{2} - \frac{\eta }{2} \le \frac{\eta }{4}$$. Hence,

$$\nu _{\overline{i}} \in \{\nu _{i}, \nu _{i} +1, \nu _{i} -1\} .$$

In addition, since $$\nu _{(1,1, \ldots , 1)}$$ has at most $$2 \lceil \frac{2}{ \eta } \rceil$$ possible values, the number of nonempty sets $$\{ H_{\nu } \}$$, $$\nu \in {\mathrm{J}}^{|{\mathrm{I}}^{d} |}$$ is at most

\begin{aligned} 2 \biggl\lceil \frac{2}{\eta } \biggr\rceil 3^{|{\mathrm{I}}^{d} |} \le& 2\biggl( \frac{2}{ \eta } +1\biggr) 3^{ \lfloor \frac{1}{\delta } \rfloor ^{d}} \\ \le& \exp \biggl( \ln \biggl( 2 \biggl( \frac{2}{\eta } +1\biggr) \biggr) + \biggl\lfloor \frac{1}{\delta } \biggr\rfloor ^{d} \ln 3 \biggr) \\ \le& \exp \biggl( \ln \biggl( 2\biggl( \frac{2}{\eta } +1\biggr) \biggr) + \biggl\lfloor \frac{4 \gamma }{\eta } \biggr\rfloor ^{d} \ln 3 \biggr). \end{aligned}

Moreover, since $$\ln (1+t) \le t$$ for $$t\ge 0$$, we have

$$\exp \biggl( \ln \biggl( 2\biggl( \frac{2}{\eta } +1\biggr) \biggr) + \biggl\lfloor \frac{4 \gamma }{ \eta } \biggr\rfloor ^{d} \ln 3 \biggr) \le \exp \biggl( \ln 2 + \frac{2}{\eta } +\biggl( \frac{4 \gamma }{\eta }\biggr) ^{d} \ln 3 \biggr).$$

This completes the proof of Proposition 2.1. □

## Stability of random sampling

Let $$X=\{x_{j}: j \in \mathbb{N}\}$$ be a sequence of independent and identically distributed (i.i.d.) random variables, each of which is uniformly drawn from Ω, and $$y_{j}=f(x_{j}) +\epsilon _{j}$$ with $$\epsilon _{j}$$ being a random noise. In this section, we investigate the probability that any $$f\in H_{\gamma , \varOmega }$$ can be recovered from its samples $$\{(x_{j}, y_{j})\}_{j=1}^{n}$$ stably. We will prove that if the random noise satisfies some mild conditions, then, with overwhelming probability, the sampling inequalities (1.1) hold uniformly for all functions in $$H_{\gamma , \varOmega }$$ with (noisy) sampling values.

For every fixed $$f \in H_{\gamma , \varOmega }$$, we define the random variable

$$X_{j}(f):= \bigl\vert f(x_{j}) \bigr\vert ^{p}- \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx,$$
(3.1)

where $$\{x_{j} \}$$ is uniformly drawn from Ω. Then one can check that the random variables $$\{X_{j}(f): j\in \mathbb{N}\}$$ are independent with expectation $$\mathbb{E}(X_{j}(f))=0$$. Besides, for the variance of $$X_{j}(f)$$, we have

\begin{aligned} \operatorname{Var} \bigl(X_{j}(f)\bigr) &=\mathbb{E} \bigl(X_{j}(f)^{2}\bigr)-\bigl(\mathbb{E} \bigl(X_{j}(f)\bigr)\bigr)^{2} \\ &=\mathbb{E}\bigl( \bigl\vert f(x_{j}) \bigr\vert ^{2p} \bigr)- \biggl( \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p}\,dx \biggr) ^{2} \\ &\le \mathbb{E}\bigl( \bigl\vert f(x_{j}) \bigr\vert ^{2p}\bigr) \\ &= \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{2p}\,dx \\ &\le \|f\|_{L_{\infty }(\varOmega )}^{2p}. \end{aligned}

The following Bernstein inequality plays an important role in probability theory; it bounds the probability that the empirical mean of independent random variables deviates from its expectation.

### Lemma 3.1


Let $$\xi _{1}, \xi _{2}, \ldots , \xi _{n}$$ be independent random variables. Assume that $$\mathbb{E} (\xi _{j})=0$$, $$\operatorname{Var} (\xi _{j}) \le \sigma ^{2}$$ and $$|\xi _{j}|\le K$$ almost surely for all j. Then, for any $$\lambda \ge 0$$,

$$\operatorname{Prob} \Biggl( \Biggl\vert \frac{1}{n} \sum _{j=1}^{n} \xi _{j} \Biggr\vert \ge \lambda \Biggr)\le 2\exp \biggl(-\frac{n {\lambda }^{2}}{2\sigma ^{2}+\frac{2}{3} K \lambda } \biggr).$$
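Lemma 3.1 can be sanity-checked by simulation. The sketch below (our own setup: centered uniform variables on $$[-1,1]$$, so $$K=1$$ and $$\sigma ^{2}=1/3$$) compares the empirical deviation probability with Bernstein's bound:

```python
import numpy as np

# Monte Carlo sanity check of Bernstein's inequality (illustrative setup:
# i.i.d. uniform variables on [-1, 1], so K = 1 and sigma^2 = 1/3).
rng = np.random.default_rng(0)

n, trials, lam = 200, 20000, 0.15
K, sigma2 = 1.0, 1.0 / 3.0

xi = rng.uniform(-1.0, 1.0, size=(trials, n))
means = xi.mean(axis=1)
empirical = np.mean(np.abs(means) >= lam)   # empirical deviation probability

bound = 2.0 * np.exp(-n * lam**2 / (2 * sigma2 + (2.0 / 3.0) * K * lam))
print(empirical <= bound)  # True: the empirical rate sits below the bound
```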

### Lemma 3.2

Let $$\{x_{j}:j=1,2, \ldots , n \}$$ be a sequence of i.i.d. random variables that are uniformly drawn from $$\varOmega =(0, 1)^{d}$$. Let $$H_{\gamma , \varOmega }^{\ast }$$ be defined by (2.1), and $$X_{j}(f)$$ be given by (3.1). Then, for any $$\lambda \ge 0$$ and $$n\in \mathbb{N}$$,

$$\operatorname{Prob} \Biggl(\sup_{f\in H_{\gamma , \varOmega }^{\ast } } \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr)\le 2 \mathcal{N} \biggl(H_{\gamma , \varOmega }^{\ast }, \frac{\lambda }{2p} \biggr) \exp \biggl( - \frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr).$$

### Proof

Let $$\{f_{\ell }\}_{\ell =1, \ldots , L}$$, where $$L=\mathcal{N} (H _{\gamma , \varOmega }^{\ast }, \frac{\lambda }{2p} )$$, be a sequence in $$H_{\gamma , \varOmega }^{\ast }$$ such that $$H_{\gamma , \varOmega }^{ \ast }$$ can be covered by the $$L_{\infty }$$ balls centered at $$f_{\ell }$$ with radius $$\frac{\lambda }{2p}$$. For each fixed $$f_{\ell } \in H_{\gamma , \varOmega }^{\ast }$$, since $$\|f_{\ell }\| _{L_{\infty }(\varOmega )} = 1$$, we have $$\operatorname{Var} (X_{j}(f_{\ell })) \le 1$$ and $$|X_{j}(f_{\ell })| \le 1$$. By Lemma 3.1, we obtain

$$\operatorname{Prob} \Biggl( \Biggl\vert \frac{1}{n} \sum _{j=1}^{n} X_{j}(f_{\ell }) \Biggr\vert \ge \lambda \Biggr)\le 2\exp \biggl(-\frac{n {\lambda }^{2}}{2+\frac{2}{3} \lambda } \biggr).$$
(3.2)

For any given $$f\in H_{\gamma , \varOmega }^{\ast }$$, there exists some $$\ell \in \{1, 2, \ldots , L\}$$ such that $$\|f-f_{\ell }\|_{L_{\infty }} \le \frac{\lambda }{2p}$$. Thus, by the mean value theorem,

\begin{aligned} & \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) -\frac{1}{n} \sum_{j=1}^{n} X _{j}(f_{\ell }) \Biggr\vert \\ &\quad = \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} \bigl( \bigl\vert f(x_{j}) \bigr\vert ^{p} - \bigl\vert f_{\ell }(x_{j}) \bigr\vert ^{p} \bigr) \Biggr\vert \\ &\quad \le p\bigl(\max \bigl\{ \Vert f \Vert _{L_{\infty }(\varOmega )}, \Vert f_{\ell } \Vert _{L_{\infty }(\varOmega )} \bigr\} \bigr)^{p-1} \Vert f-f_{\ell } \Vert _{L_{\infty }(\varOmega )} \\ &\quad \le p \Vert f-f_{\ell } \Vert _{L_{\infty }(\varOmega )} \\ &\quad \le \frac{\lambda }{2}. \end{aligned}

Combining this with (3.2), we conclude that, for each fixed $$\ell$$ with $$1\le \ell \le L$$,

\begin{aligned} & \operatorname{Prob} \Biggl\{ \sup_{ \{f: \|f-f_{\ell }\|_{L_{\infty }} \le \frac{\lambda }{2p} \}} \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr\} \\ &\quad \le \operatorname{Prob} \Biggl\{ \Biggl\vert \frac{1}{n} \sum _{j=1}^{n} X_{j}(f_{ \ell }) \Biggr\vert \ge \lambda - \frac{\lambda }{2} \Biggr\} \\ &\quad \le 2 \exp \biggl( -\frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr) . \end{aligned}

Besides, since

$$H_{\gamma , \varOmega }^{\ast } \subseteq \bigcup_{\ell =1}^{L} \biggl\{ f: \|f-f _{\ell } \|_{L_{\infty }} \le \frac{\lambda }{2p} \biggr\} ,$$

we obtain

$$\operatorname{Prob} \Biggl(\sup_{f\in H_{\gamma , \varOmega }^{\ast } } \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr)\le \sum _{\ell =1} ^{L} \operatorname{Prob} \Biggl\{ \sup _{ \{f: \|f-f_{\ell }\|_{L_{\infty }} \le \frac{\lambda }{2p} \}} \Biggl\vert \frac{1}{n} \sum _{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr\} .$$

Therefore, noting that $$L=\mathcal{N} (H_{\gamma , \varOmega }^{\ast }, \frac{ \lambda }{2p} )$$, we conclude that

$$\operatorname{Prob} \Biggl(\sup_{f\in H_{\gamma , \varOmega }^{\ast } } \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr)\le 2 \mathcal{N} \biggl(H_{\gamma , \varOmega }^{\ast }, \frac{\lambda }{2p} \biggr) \exp \biggl( - \frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr) .$$

□

### Theorem 3.3

Let $$H_{\gamma , \varOmega }$$ be defined by (1.2). Assume that $$\{x_{j}: j=1, 2, \ldots , n\}$$ is a sequence of i.i.d. random variables that are uniformly drawn from Ω. Then, for any $$0< \lambda <\frac{1}{(p+1)^{d} \gamma ^{d}}$$, the following sampling inequalities:

$$n \bigl(1- (p+1)^{d} \gamma ^{d} \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \le \sum_{j=1}^{n} \bigl\vert f(x_{j}) \bigr\vert ^{p} \le n \bigl(1+ (p+1)^{d} \gamma ^{d} \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx$$
(3.3)

hold uniformly for all functions in $$H_{\gamma , \varOmega }$$ with probability at least

$$1- 2 \exp \biggl( \ln 2 + \frac{4p}{\lambda } +\biggl( \frac{8p \gamma }{ \lambda }\biggr) ^{d} \ln 3 -\frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr).$$

### Proof

Obviously, every $$f\in H_{\gamma , \varOmega }$$ satisfies (3.3) if and only if $$f/\|f\|_{L_{\infty }}$$ does. Thus, we may assume that $$\|f\|_{L_{\infty }}=1$$, i.e., $$f \in H_{\gamma , \varOmega }^{\ast }$$.

Let $$X_{j}(f)$$ be defined by (3.1). One can check that the event

$$\mathcal{E}=\Biggl\{ \sup_{f\in H_{\gamma , \varOmega }^{\ast } } \Biggl\vert \frac{1}{n} \sum_{j=1}^{n} X_{j}(f) \Biggr\vert \ge \lambda \Biggr\}$$

is the complement of

$$\tilde{\mathcal{E}}=\Biggl\{ n \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx- \lambda n \le \sum_{j=1}^{n} \bigl\vert f(x_{j}) \bigr\vert ^{p}\le n \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx+ \lambda n, \forall f\in H_{\gamma , \varOmega }^{\ast } \Biggr\} .$$

For an arbitrary $$f\in H_{\gamma , \varOmega }^{\ast }$$, there exists an $$x^{\star } \in \overline{\varOmega }$$ such that $$|f(x^{\star })|= \|f\| _{L_{\infty }}=1$$. Without loss of generality, we assume that $$f(x^{\star })=1$$. Then, for $$x\in \varOmega ^{\ast }$$, where $$\varOmega ^{ \ast }:=\{x\in \varOmega : |x-x^{\star } |\le \frac{1}{\gamma } \}$$, we have

$$f\bigl(x^{\star }\bigr)-f(x) \le \bigl\vert f(x)- f\bigl(x^{\star } \bigr) \bigr\vert \le \gamma \bigl\vert x-x^{\star } \bigr\vert \le 1.$$

It follows that

$$f(x) \ge 1- \gamma \bigl\vert x-x^{\star } \bigr\vert \ge 0 \quad \mbox{for } x \in \varOmega ^{\ast }$$

and

$$\int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \ge \int _{\varOmega ^{\ast }} \bigl(1- \gamma \bigl\vert x-x^{\star } \bigr\vert \bigr)^{p} \,dx \ge \frac{1}{(p+1)^{d} \gamma ^{d}} .$$

Therefore, the event

\begin{aligned} \bar{\mathcal{E}} =& \Biggl\{ n \bigl(1- (p+1)^{d} \gamma ^{d} \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \le \sum_{j=1}^{n} \bigl\vert f(x_{j}) \bigr\vert ^{p} \le n \bigl(1+ (p+1)^{d} \gamma ^{d} \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx, \\ &{} \forall f\in H_{\gamma , \varOmega } ^{\ast } \Biggr\} \end{aligned}

contains the event $$\tilde{\mathcal{E}}$$.

Thus, by Lemma 3.2 and Proposition 2.1, the inequalities (3.3) hold uniformly for all functions in $$H_{\gamma , \varOmega }$$ with probability

\begin{aligned} \operatorname{Prob} (\bar{\mathcal{E}}) \ge& \operatorname{Prob} (\tilde{ \mathcal{E}})=1- \operatorname{Prob} ({\mathcal{E}}) \\ \ge& 1- 2 \exp \biggl( \ln 2 + \frac{4p}{ \lambda } +\biggl( \frac{8p \gamma }{\lambda }\biggr) ^{d} \ln 3 - \frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr) . \end{aligned}

□
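As an empirical illustration of Theorem 3.3 (our own toy example with $$d=1$$, $$p=2$$; the function and parameters are illustrative choices), one can draw uniform samples and check that the sampling inequalities (3.3) hold for a fixed $$f\in H_{\gamma , \varOmega }$$:

```python
import numpy as np

# Illustrative check of (3.3) for one f in H_{gamma,Omega}, d = 1, p = 2.
rng = np.random.default_rng(1)

gamma, p, lam, n = 2.0, 2, 0.1, 2000
f = lambda x: np.sin(gamma * x)         # lies in H_{gamma,(0,1)}

x = rng.uniform(0.0, 1.0, n)            # i.i.d. uniform draws from Omega
S = np.sum(np.abs(f(x))**p)             # empirical sum of |f(x_j)|^p

t = np.linspace(0.0, 1.0, 100001)       # fine grid for the integral
integral = np.mean(np.abs(f(t))**p)     # approximates int_Omega |f|^p dx

c = (p + 1) * gamma * lam               # (p+1)^d gamma^d lambda with d = 1
lower, upper = n * (1 - c) * integral, n * (1 + c) * integral
print(lower <= S <= upper)  # True (and with overwhelming probability)
```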

### Corollary 3.4

Under the same conditions as in Theorem 3.3, let $$y_{j}=f(x _{j}) +\epsilon _{j}$$, $$j=1, 2, \ldots , n$$, be the samples of f. Suppose that the noise terms $$\{\epsilon _{j}\}$$ are independent with $$\mathbb{E}(|\epsilon _{j}|^{p})=\sigma ^{p}$$ and $$| |\epsilon _{j}|^{p} - \sigma ^{p} | \le M \|f\|_{L_{p}}^{p}$$ for all j. In addition, we assume that $$\frac{\sigma ^{p} }{\|f\|_{L_{p}}^{p} } \le \rho \ll 1$$. Then, for any $$0< \lambda <\frac{2^{1-p} -\rho }{2^{1-p} (p+1)^{d} \gamma ^{d} +1}$$, the inequalities

\begin{aligned} & n \bigl(2^{1-p}-\rho - \bigl( 2^{1-p} (p+1)^{d} \gamma ^{d} +1 \bigr) \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \\ &\quad \le \sum_{j=1}^{n} \bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \\ &\quad \le 2^{p-1} n \bigl(1+\rho + \bigl( (p+1)^{d} \gamma ^{d}+1 \bigr) \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \end{aligned}
(3.4)

hold uniformly for all functions in $$H_{\gamma , \varOmega }$$ with probability at least

$$\biggl( 1-2 \exp \biggl(-\frac{n \lambda ^{2} }{2 M^{2}}\biggr) \biggr) \biggl( 1- 2 \exp \biggl( \ln 2 + \frac{4p}{\lambda } +\biggl( \frac{8p \gamma }{\lambda }\biggr) ^{d} \ln 3 -\frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr) \biggr).$$

### Proof

One can check that every $$f\in H_{\gamma , \varOmega }$$ satisfies the inequalities of (3.4) if and only if $$f/\|f\|_{L_{p}}$$ does. Thus, we assume that $$\|f\|_{L_{p}}=1$$. By Hoeffding’s inequality, we have

$$\operatorname{Prob} \Biggl\{ \Biggl\vert \frac{1}{n} \sum _{j=1}^{n} |\epsilon _{j} |^{p} - \sigma ^{p} \Biggr\vert \ge \lambda \Biggr\} \le 2 \exp \biggl(- \frac{n \lambda ^{2}}{2 M^{2}}\biggr).$$

So, with probability $$1-2 \exp (-\frac{n \lambda ^{2} }{2 M^{2}})$$,

$$\frac{1}{n} \sum_{j=1}^{n} |\epsilon _{j} |^{p} \le \sigma ^{p}+ \lambda \|f \|_{L_{p}}^{p}.$$

For $$1\le p <\infty$$, since $$t^{p}$$ is a convex function of t on $$[0, + \infty )$$, by Jensen’s inequality, we have

$$\bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \le \bigl( \bigl\vert f(x_{j}) \bigr\vert + \vert \epsilon _{j} \vert \bigr)^{p} \le 2^{p-1} \bigl( \bigl\vert f(x_{j}) \bigr\vert ^{p} + \vert \epsilon _{j} \vert ^{p} \bigr)$$

and

$$\bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \ge 2^{1-p} \bigl\vert f(x_{j}) \bigr\vert ^{p} - \vert \epsilon _{j} \vert ^{p},$$ which follows by applying the first inequality to the decomposition $$f(x_{j}) = (f(x_{j})+\epsilon _{j}) - \epsilon _{j}$$.

Hence, with the same probability, we have

$$\sum_{j=1}^{n} \bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \le 2^{p-1} \sum_{j=1} ^{n} \bigl\vert f(x_{j}) \bigr\vert ^{p} + 2^{p-1} n \bigl( \sigma ^{p}+ \lambda \|f\|_{L_{p}} ^{p}\bigr)$$

and

$$\sum_{j=1}^{n} \bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \ge 2^{1-p} \sum_{j=1} ^{n} \bigl\vert f(x_{j}) \bigr\vert ^{p} - n\bigl( \sigma ^{p}+ \lambda \|f\|_{L_{p}}^{p}\bigr).$$

Combining this with Theorem 3.3, we conclude that

\begin{aligned} & n \bigl(2^{1-p}-\rho - \bigl( 2^{1-p} (p+1)^{d} \gamma ^{d} +1 \bigr) \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \\ &\quad \le \sum_{j=1}^{n} \bigl\vert f(x_{j})+ \epsilon _{j} \bigr\vert ^{p} \\ &\quad \le 2^{p-1} n \bigl(1+\rho + \bigl( (p+1)^{d} \gamma ^{d}+1 \bigr) \lambda \bigr) \int _{\varOmega } \bigl\vert f(x) \bigr\vert ^{p} \,dx \end{aligned}

holds with probability at least

$$\biggl( 1-2 \exp \biggl(-\frac{n \lambda ^{2} }{2 M^{2}}\biggr) \biggr) \biggl( 1- 2 \exp \biggl( \ln 2 + \frac{4p}{\lambda } +\biggl( \frac{8p \gamma }{\lambda }\biggr) ^{d} \ln 3 -\frac{ 3n \lambda ^{2}}{24+ 4 \lambda } \biggr) \biggr).$$

□
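The two elementary convexity bounds used in the proof above can be spot-checked numerically (an illustrative sketch with $$p=3$$ and random inputs of our own choosing):

```python
import numpy as np

# Spot check (illustrative, p = 3) of the two bounds used in the proof:
# |a+b|^p <= 2^{p-1}(|a|^p + |b|^p)  and  |a+b|^p >= 2^{1-p}|a|^p - |b|^p.
rng = np.random.default_rng(2)
p = 3
a, b = rng.normal(size=1000), rng.normal(size=1000)

upper_ok = np.all(np.abs(a + b)**p <= 2**(p - 1) * (np.abs(a)**p + np.abs(b)**p))
lower_ok = np.all(np.abs(a + b)**p >= 2**(1 - p) * np.abs(a)**p - np.abs(b)**p)
print(bool(upper_ok and lower_ok))  # True
```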

We remark that ρ in Corollary 3.4 is connected with the signal-to-noise ratio (SNR) of f: the signal can be recovered stably from its noisy samples only if the noise level is relatively small.

## Approximation algorithm and numerical examples

### Approximation model and algorithm

In this subsection, we consider how to approximate $$f\in H_{\gamma , \varOmega }$$ from its noisy samples $$\{(x_{j}, y_{j})\}_{j=1}^{n}$$, where $$y_{j}=f(x_{j})+\epsilon _{j}$$ and $$\epsilon _{j}$$ is random noise. The main idea is to seek an approximant $$f^{\ast }$$ by solving the following optimization problem:

$$\min_{g\in V}\sum_{i=1}^{n} \bigl(g(x_{i})-y_{i}\bigr)^{2} + \varGamma (g),$$
(4.1)

where the first term fits $$g(x_{i})$$ to the data $$y_{i}$$, and the second term, $$\varGamma (g)$$, is a regularization term.

For the function space V in (4.1), there are several choices, such as the Sobolev space, reproducing kernel Hilbert space, polynomial space or a principal shift-invariant space $$S^{h}(\phi ,\varOmega )$$, which is defined by

$$S^{h}(\phi , \varOmega )=\biggl\{ \sum_{\alpha \in I} \mathbf{u} (\alpha )\phi \biggl(\frac{ \cdot }{h} -\alpha \biggr): \mathbf{u} ( \alpha )\in \mathbb{R} \biggr\} ,$$

with

$$I=\biggl\{ \alpha \in \mathbb{Z}^{d}: \operatorname{supp} \phi \biggl( \frac{\cdot }{h}- \alpha \biggr)\cap \varOmega \ne \emptyset \biggr\} .$$

Here, the function ϕ is called the generator of $$S^{h}(\phi , \varOmega )$$, and $$h>0$$ is a scaling parameter that controls the refinement of the space. A compactly supported ϕ is preferred since it leads to sparse matrices. In addition, $$S^{h}(\phi ,\varOmega )$$ provides a good approximation to smooth functions if ϕ satisfies certain conditions (see e.g. [22,23,24]).
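To make the construction concrete, the following sketch (our own example with $$d=1$$ and the linear B-spline as generator ϕ; all sizes are illustrative) assembles the collocation matrix $$\mathbf{A}_{jk}=\phi (x_{j}/h-\alpha _{k})$$ that appears later in (4.3):

```python
import numpy as np

# Sketch (illustrative, d = 1): assemble A_{jk} = phi(x_j / h - alpha_k)
# for S^h(phi, Omega) with the linear B-spline (hat function) generator.
def hat(t):
    """Linear B-spline: phi(t) = max(1 - |t|, 0), supported on [-1, 1]."""
    return np.maximum(1.0 - np.abs(t), 0.0)

def collocation_matrix(x, h):
    # index set I: shifts alpha whose support can intersect Omega = (0, 1)
    alphas = np.arange(-1, int(np.ceil(1.0 / h)) + 2)
    return hat(x[:, None] / h - alphas[None, :]), alphas

x = np.array([0.1, 0.25, 0.6, 0.9])       # sample locations in Omega
A, alphas = collocation_matrix(x, h=0.25)

# hat translates form a partition of unity, so each row of A sums to 1
print(A.shape, bool(np.allclose(A.sum(axis=1), 1.0)))  # (4, 7) True
```

The sparsity of A here reflects the compact support of ϕ: each row has at most two nonzero entries.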

The regularization term in (4.1) is chosen so that the derivative of $$f^{\ast }$$ is controlled and the sampling noise is reduced. For $$g=\sum_{\alpha \in I}\mathbf{u} (\alpha )\phi (\frac{ \cdot }{h} -\alpha )$$, we take $$\varGamma (g)= \|\operatorname{diag}( \boldsymbol{\lambda })\mathcal{W}\mathbf{u}\|_{\ell _{\infty }}$$, where $$\mathcal{W}$$ is the discrete wavelet frame transform, and $$\operatorname{diag}( \boldsymbol{\lambda })$$ is a diagonal matrix built from the vector λ that scales the different wavelet channels. The advantage of using wavelet frames here is that the model admits fast algorithms, and wavelet frames can be regarded as certain discretizations of general differential operators. Moreover, with properly chosen parameters $$\operatorname{diag}( \boldsymbol{\lambda })$$, we have $$\|D g\|_{L_{\infty }(\varOmega )} \sim \| \operatorname{diag}(\boldsymbol{\lambda })\mathcal{W}\mathbf{u}\|_{\ell _{\infty }}$$.

In summary, we determine the approximating function by minimizing

$$\sum_{j=1}^{n} \biggl(\sum _{\alpha \in I} \mathbf{u} (\alpha ) \phi \biggl( \frac{x _{j}}{h}-\alpha \biggr)-y_{j}\biggr)^{2} + \bigl\Vert {\operatorname{diag}}(\boldsymbol{\lambda }) {\mathcal{W}} {\mathbf{u}} \bigr\Vert _{\ell _{\infty } },$$
(4.2)

where u are the coefficients to solve. Finally, let

$$f^{\ast }=\sum_{\alpha \in I} { \mathbf{u}}^{\ast }( \alpha ) \phi \biggl( \frac{ \cdot }{h} -\alpha \biggr)$$

with $$\mathbf{u}^{\ast }$$ being the minimizer of (4.2).

Next, we show how to numerically solve (4.2), which can be written in the following matrix-vector form:

$$\min_{\mathbf{u}\in \mathbb{R}^{m}} \Vert \mathbf{A} \mathbf{u}- \mathbf{y} \Vert _{\ell _{2}}^{2} + \bigl\Vert \operatorname{diag}(\boldsymbol{\lambda })\mathcal{W} \mathbf{u} \bigr\Vert _{\ell _{\infty }} ,$$
(4.3)

where $$\mathbf{y}=[y_{1}, y_{2}, \ldots , y_{n} ]^{T}$$, $$\mathbf{A} _{jk}=\phi (x_{j}/h-\alpha _{k})$$, $$X=\{x_{1},\ldots , x_{n}\}$$ and $$I=\{\alpha _{1},\ldots , \alpha _{m}\}$$. This is equivalent to

$$\min_{\mathbf{u}, \mathbf{d}} \Vert \mathbf{A}\mathbf{u}- \mathbf{y} \Vert _{ \ell _{2}}^{2} + \bigl\Vert \operatorname{diag}(\boldsymbol{\lambda }) \mathbf{d} \bigr\Vert _{ \ell _{\infty }} \quad \mbox{subject to } \mathbf{d}=\mathcal{W} \mathbf{u}.$$
(4.4)

Then the alternating direction method of multipliers (ADMM) [12, 25] can be applied to solve (4.4) as follows:

$$\mathbf{u}^{i+1} = \mathop {\mathrm {arg\,min}}_{\mathbf{u}} \Vert \mathbf{A}\mathbf{u}-\mathbf{y} \Vert _{\ell _{2}}^{2} + \frac{\mu }{2} \bigl\Vert {\mathcal{W}}\mathbf{u}-{\mathbf{d}}^{i}+{\mathbf{b}}^{i} \bigr\Vert _{\ell _{2}}^{2},$$
(4.5)

$$\mathbf{d}^{i+1} = \mathop {\mathrm {arg\,min}}_{\mathbf{d}} \bigl\Vert \operatorname{diag}(\boldsymbol{\lambda })\mathbf{d} \bigr\Vert _{\ell _{\infty }} + \frac{\mu }{2} \bigl\Vert \mathbf{d}-\bigl({\mathcal{W}}\mathbf{u}^{i+1}+{\mathbf{b}}^{i}\bigr) \bigr\Vert _{\ell _{2}}^{2},$$
(4.6)

$$\mathbf{b}^{i+1} = {\mathbf{b}}^{i}+{\mathcal{W}}\mathbf{u}^{i+1}-\mathbf{d}^{i+1},$$
(4.7)

with initial guesses $$\mathbf{u}^{0}$$, $${\mathbf{d}}^{0}$$, $${\mathbf{b}}^{0}$$ and a penalty parameter μ.

The quadratic subproblem (4.5) has the first-order optimality condition

$$\bigl(2\mathbf{A}^{T}\mathbf{A}+\mu \mathbf{I}\bigr) \mathbf{u}=2\mathbf{A}^{T} \mathbf{y} +\mu \mathcal{W}^{T} \bigl({\mathbf{d}}^{i}-{\mathbf{b}}^{i}\bigr).$$
(4.8)
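As an illustration, the system (4.8) can be solved matrix-free with SciPy's conjugate gradient routine. The sketch below uses random stand-ins for the sampling matrix $$\mathbf{A}$$ and for $$\mathcal{W}^{T}({\mathbf{d}}^{i}-{\mathbf{b}}^{i})$$, since the actual operators depend on the chosen spline and frame; all sizes are hypothetical.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Hypothetical small instance standing in for A_jk = phi(x_j/h - alpha_k).
rng = np.random.default_rng(0)
n, m = 200, 50
A = rng.standard_normal((n, m))   # sampling matrix
y = rng.standard_normal(n)        # noisy sample values
mu = 1.0
w = rng.standard_normal(m)        # stands in for W^T(d^i - b^i)

# Matrix-free application of (2 A^T A + mu I); A^T A is never formed.
op = LinearOperator((m, m), matvec=lambda u: 2.0 * A.T @ (A @ u) + mu * u)
rhs = 2.0 * A.T @ y + mu * w
u, info = cg(op, rhs)
residual = np.linalg.norm(2.0 * A.T @ (A @ u) + mu * u - rhs)
print(info, residual)   # info == 0 signals convergence
```

Because the shifted matrix $$2\mathbf{A}^{T}\mathbf{A}+\mu \mathbf{I}$$ is always positive definite, CG converges regardless of whether $$\mathbf{A}$$ has full column rank.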

The system (4.8) is sparse and positive definite, and we use the conjugate gradient (CG) method to solve it. The solution to (4.6) is given by the proximal operator of $$\| \cdot \|_{\ell _{\infty }}$$, i.e.,

$$\mathbf{prox}_{\frac{\lambda }{\mu } \Vert \cdot \Vert _{\ell _{\infty }} } \bigl( {\mathcal{W}} \mathbf{u}^{i+1} + \mathbf{b}^{i}\bigr) =\mathop {\mathrm {arg\,min}}_{ \mathbf{d}} \Vert \mathbf{d} \Vert _{\ell _{\infty }} + \frac{\mu }{2 \lambda } \bigl\Vert \mathbf{d}-\bigl({ \mathcal{W}} \mathbf{u}^{i+1} +\mathbf{b}^{i}\bigr) \bigr\Vert ^{2} _{\ell _{2}}.$$

By the Moreau decomposition,

$$\mathbf{prox}_{\frac{\lambda }{\mu } \| \cdot \|_{\ell _{\infty }} } \bigl( {\mathcal{W}} \mathbf{u}^{i+1} + \mathbf{b}^{i}\bigr) = {\mathcal{W}} \mathbf{u}^{i+1} + \mathbf{b}^{i}- \mathbf{proj}_{B_{1}(0, \frac{ \lambda }{\mu })} \bigl({\mathcal{W}} \mathbf{u}^{i+1} +\mathbf{b}^{i}\bigr),$$

where $$\mathbf{proj}_{B_{1}(0, \frac{\lambda }{\mu })}$$ is the projection to the $$\ell _{1}$$-norm ball given by

$$\mathbf{proj}_{B_{1}(0, \frac{\lambda }{\mu })} (\mathbf{t})= \mathop {\mathrm {arg\,min}}_{\|\mathbf{v}\|_{\ell _{1}} \le \frac{\lambda }{\mu } } \| \mathbf{v} -\mathbf{t} \|_{\ell _{2}}^{2}.$$
(4.9)

The exact solution of (4.9) can be computed in at most $$O(N)$$ time [27], where N is the dimension of the space.
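A minimal sketch of the projection (4.9) and the resulting $$\ell _{\infty }$$ proximal operator. For brevity it uses the simple sort-based projection, which runs in $$O(N\log N)$$; the selection-based algorithm of [27] attains expected $$O(N)$$.

```python
import numpy as np

def proj_l1_ball(t, radius):
    """Euclidean projection onto {v : ||v||_1 <= radius}, i.e. problem (4.9).
    Sort-based O(N log N) variant of the l1-ball projection."""
    if np.abs(t).sum() <= radius:
        return t.copy()
    a = np.sort(np.abs(t))[::-1]              # magnitudes, descending
    cssv = np.cumsum(a) - radius
    k = np.arange(1, a.size + 1)
    rho = np.nonzero(a * k > cssv)[0][-1]     # last index kept above the threshold
    theta = cssv[rho] / (rho + 1.0)           # soft-threshold level
    return np.sign(t) * np.maximum(np.abs(t) - theta, 0.0)

def prox_linf(x, tau):
    # prox_{tau ||.||_inf}(x) = x - proj_{B_1(0, tau)}(x) by the Moreau decomposition.
    return x - proj_l1_ball(x, tau)

x = np.array([3.0, -1.0, 0.5])
p = prox_linf(x, 1.0)   # the largest-magnitude entry is pulled down
print(p)
```

Note that the prox only shrinks the largest entries: here the entry 3.0 is reduced while the smaller ones are untouched, reflecting that the subdifferential of $$\|\cdot \|_{\ell _{\infty }}$$ is supported on the maximal coordinates.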

Note that the numerical computation of $$\mathcal{W}^{T}({\mathbf{d}} ^{i}-{\mathbf{b}}^{i})$$ in (4.8) is carried out by the fast wavelet frame algorithm, similarly to (4.6) and (4.7); see [25, 28] for detailed discussions. Finally, the proposed algorithm is summarized in Algorithm 1.
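The whole iteration can be sketched compactly as follows, under the simplifying assumptions that $$\mathcal{W}$$ is the identity and $$\operatorname{diag}(\boldsymbol{\lambda })$$ is a scalar multiple `lam` of the identity; the actual algorithm applies the fast wavelet frame transform instead, and all sizes below are hypothetical.

```python
import numpy as np

def proj_l1_ball(t, radius):
    # Euclidean projection onto the l1-ball of the given radius (sort-based).
    if np.abs(t).sum() <= radius:
        return t.copy()
    a = np.sort(np.abs(t))[::-1]
    cssv = np.cumsum(a) - radius
    rho = np.nonzero(a * np.arange(1, a.size + 1) > cssv)[0][-1]
    theta = cssv[rho] / (rho + 1.0)
    return np.sign(t) * np.maximum(np.abs(t) - theta, 0.0)

def admm_linf(A, y, lam=0.1, mu=1.0, iters=300):
    """ADMM sketch for min_u ||A u - y||_2^2 + lam * ||u||_inf, i.e. model (4.3)
    with W = I and diag(lambda) = lam * I -- a simplification of Algorithm 1,
    which uses the fast wavelet frame transform for W."""
    m = A.shape[1]
    u, d, b = np.zeros(m), np.zeros(m), np.zeros(m)
    H = 2.0 * A.T @ A + mu * np.eye(m)   # system matrix of (4.8)
    Aty = 2.0 * A.T @ y
    for _ in range(iters):
        u = np.linalg.solve(H, Aty + mu * (d - b))   # u-subproblem via (4.8)
        v = u + b                                    # W u^{i+1} + b^i with W = I
        d = v - proj_l1_ball(v, lam / mu)            # d-subproblem via Moreau
        b = b + u - d                                # multiplier update
    return u

rng = np.random.default_rng(1)
A = rng.standard_normal((80, 20))
u_true = rng.standard_normal(20)
y = A @ u_true + 0.01 * rng.standard_normal(80)
u_hat = admm_linf(A, y)
print(np.linalg.norm(u_hat - u_true))   # close to u_true for mild regularization
```

For small dense instances a direct solve of (4.8) is cheapest; for large sparse systems the direct solve would be replaced by the CG iteration discussed above.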

### Numerical examples and discussions

In this subsection, we demonstrate the efficiency of the proposed approach (4.2) by approximating two functions in $$H_{\gamma , \varOmega }$$ from noisy samples.

In the first example, we approximate the well-known Franke function [29], which is defined as follows:

\begin{aligned} \operatorname{franke}(x_{1}, x_{2}) =& \frac{3}{4}e^{-((9x_{1}-2)^{2} + (9x _{2}-2)^{2})/4} + \frac{3}{4}e^{-((9x_{1}+1)^{2})/49 - (9x_{2}+1)/10} \\ &{} + \frac{1}{2}e^{-((9x_{1}-7)^{2} + (9x_{2}-3)^{2})/4} - \frac{1}{5}e^{-(9x_{1}-4)^{2} - (9x_{2}-7)^{2}} . \end{aligned}
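For reference, this definition transcribes directly into code; the noisy-sampling setup of the experiment is reproduced as well, where reading $$N(0, 0.01)$$ as variance 0.01 (standard deviation 0.1) is an assumption.

```python
import numpy as np

def franke(x1, x2):
    # Direct transcription of the Franke function defined above.
    return (0.75 * np.exp(-((9*x1 - 2)**2 + (9*x2 - 2)**2) / 4)
            + 0.75 * np.exp(-(9*x1 + 1)**2 / 49 - (9*x2 + 1) / 10)
            + 0.5  * np.exp(-((9*x1 - 7)**2 + (9*x2 - 3)**2) / 4)
            - 0.2  * np.exp(-(9*x1 - 4)**2 - (9*x2 - 7)**2))

# 1000 random samples from (0,1)^2 with Gaussian noise of variance 0.01.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))
y = franke(X[:, 0], X[:, 1]) + rng.normal(0.0, 0.1, size=1000)
```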

We randomly sample $$\{\mathbf{x}_{j}\}_{j=1}^{1000}$$ from $$\varOmega =(0, 1)^{2}$$ and take the sampling values $$y_{j}=\operatorname{franke}(\mathbf{x} _{j})+\epsilon _{j}$$, where $$\epsilon _{j}$$ is a random variable drawn from the normal distribution $$N(0, 0.01)$$. We set the scaling parameter $$h=1/180$$ and take the tensor product of the cubic B-spline

$$B_{4}(x)= \textstyle\begin{cases} x^{3}/6 & \text{if } 0\le x < 1, \\ (-3x^{3}+12x^{2}-12x+4)/6 & \text{if } 1\le x < 2, \\ (3x^{3}-24x^{2}+60x-44)/6 & \text{if } 2\le x < 3, \\ (4-x)^{3}/6 & \text{if } 3\le x < 4, \\ 0 & \text{else}, \end{cases}$$

as the generator ϕ, together with its associated wavelet tight frame masks $$h_{1}=[1/16,-1/4,3/8,-1/4, 1/16]$$, $$h_{2}=[-1/8,1/4,0, -1/4, 1/8]$$, $$h_{3}=[\sqrt{6}/16,0,-\sqrt{6}/8,0,\sqrt{6}/16]$$ and $$h_{4}=[-1/8,-1/4, 0,1/4,1/8]$$.
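The piecewise formula for $$B_{4}$$ can be transcribed and sanity-checked directly; the sketch below verifies its central value and the partition-of-unity property of its integer shifts, which is what makes it a convenient generator.

```python
import numpy as np

def B4(x):
    # Cubic B-spline supported on [0, 4], transcribing the piecewise formula above.
    x = np.asarray(x, dtype=float)
    return np.where((0 <= x) & (x < 1), x**3 / 6,
           np.where((1 <= x) & (x < 2), (-3*x**3 + 12*x**2 - 12*x + 4) / 6,
           np.where((2 <= x) & (x < 3), (3*x**3 - 24*x**2 + 60*x - 44) / 6,
           np.where((3 <= x) & (x < 4), (4 - x)**3 / 6, 0.0))))

# Integer shifts of B4 sum to 1 (partition of unity): on [0, 1] exactly the
# four shifts B4(x), B4(x+1), B4(x+2), B4(x+3) are active.
x = np.linspace(0.0, 1.0, 11)
pou = sum(B4(x + k) for k in range(4))
print(float(B4(2.0)), np.allclose(pou, 1.0))   # center value 2/3; sum is 1
```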

Figure 1(a) illustrates the approximation obtained by Algorithm 1, whereas Fig. 1(b) shows the original Franke function. It can be seen that the proposed model (4.2) approximates the function well.

In the second example, we approximate

$$f(x_{1},x_{2})=\sin (2\pi x_{1})^{2} \cos (2\pi x_{2}).$$
(4.10)

Proceeding as in the first example, we randomly sample $$\{\mathbf{x}_{j}\}_{j=1}^{1000}$$ from $$\varOmega =(0, 1)^{2}$$ and take the sampling values $$y_{j}=f(\mathbf{x}_{j})+\epsilon _{j}$$ with $$\epsilon _{j}$$ drawn from $$N(0, 0.01)$$. We use the same generator ϕ and parameter h. The approximation is depicted in Fig. 2(a), and the original function $$f(x_{1},x_{2})$$ in Fig. 2(b).

## Conclusion and future work

In this paper, we investigated random sampling of functions with bounded derivatives. We considered the probability that functions in $$H_{\gamma , \varOmega }$$ can be recovered from noisy samples stably and how they can be approximated.

For the stability problem, we first estimated the capacity of $$H_{\gamma , \varOmega }$$. By using the uniform law of large numbers, we concluded that functions in $$H_{\gamma , \varOmega }$$ can be recovered stably with overwhelming probability when the sampling noise satisfies some mild conditions. Then we proposed an $$\ell _{\infty }$$-regularized least squares model in order to control the fluctuation of functions and suppress noise. The ADMM algorithm was applied to solve the optimization model. Finally, experiments on some function approximation tasks indicate the efficiency of the proposed approach.

In future work, it is of interest to present an error analysis for different kinds of denoising schemes when they are applied to reconstructing functions from random sampling.

## References

1. Lapidoth, A.: A Foundation in Digital Communication, 2nd edn. Cambridge University Press, Cambridge (2017)

2. Aldroubi, A., Gröchenig, K.: Nonuniform sampling and reconstruction in shift-invariant spaces. SIAM Rev. 43(4), 585–620 (2001)

3. Xian, J.: Average sampling and reconstruction in a reproducing kernel subspace of homogeneous type space. Math. Nachr. 287(8–9), 1042–1056 (2014)

4. Xian, J., Li, S.: Sampling set conditions in weighted finitely generated shift-invariant spaces and their applications. Appl. Comput. Harmon. Anal. 23(2), 171–180 (2007)

5. Xian, J., Li, S.: Improved sampling and reconstruction in spline subspaces. Acta Math. Appl. Sin. Engl. Ser. 32(2), 447–460 (2016)

6. Sun, W., Zhou, X.: Characterization of local sampling sequences for spline subspaces. Adv. Comput. Math. 30(2), 153–175 (2009)

7. Sun, Q.: Local reconstruction for sampling in shift-invariant spaces. Adv. Comput. Math. 32(3), 335–352 (2010)

8. Pinsky, M.A.: Introduction to Fourier Analysis and Wavelets, vol. 102. Am. Math. Soc., Providence (2008)

9. Cucker, F., Zhou, D.-X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge (2007)

10. Candès, E.J.: Compressive sampling. In: Proceedings of the International Congress of Mathematicians, vol. 3, pp. 1433–1452 (2006)

11. Piazzo, L.: Image estimation in the presence of irregular sampling, noise, and pointing jitter. IEEE Trans. Image Process. 28(2), 713–722 (2019)

12. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

13. Bass, R.F., Gröchenig, K.: Random sampling of bandlimited functions. Isr. J. Math. 177(1), 1–28 (2010)

14. Yang, J., Wei, W.: Random sampling in shift-invariant spaces. J. Math. Anal. Appl. 398(1), 26–34 (2013)

15. Yang, J.: Random sampling and reconstruction in multiply generated shift-invariant spaces. Anal. Appl. 17(2), 323–347 (2019)

16. Führ, H., Xian, J.: Relevant sampling in finitely generated shift-invariant spaces. J. Approx. Theory 240, 1–15 (2019)

17. Wu, Q., Ying, Y., Zhou, D.-X.: Learning rates of least-square regularized regression. Found. Comput. Math. 6(2), 171–192 (2006)

18. Cai, J.F., Shen, Z., Ye, G.B.: Approximation of frame based missing data recovery. Appl. Comput. Harmon. Anal. 31(2), 185–204 (2011)

19. Lin, S., Guo, X., Zhou, D.-X.: Distributed learning with regularized least squares. J. Mach. Learn. Res. 18(1), 3202–3232 (2017)

20. Bennett, G.: Probability inequalities for the sum of independent random variables. J. Am. Stat. Assoc. 57(297), 33–45 (1962)

21. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

22. Jia, R.Q.: Approximation with scaled shift-invariant spaces by means of quasi-projection operators. J. Approx. Theory 131(1), 30–46 (2004)

23. Dong, B., Shen, Z.: Pseudo-splines, wavelets and framelets. Appl. Comput. Harmon. Anal. 22(1), 78–104 (2007)

24. Johnson, M.J., Shen, Z., Xu, Y.H.: Scattered data reconstruction by regularization in B-spline and associated wavelet spaces. J. Approx. Theory 159(2), 197–223 (2009)

25. Cai, J.F., Osher, S., Shen, Z.: Split Bregman methods and frame based image restoration. Multiscale Model. Simul. 8(2), 337–369 (2009)

26. Yang, J., Stahl, D., Shen, Z.: An analysis of wavelet frame based scattered data reconstruction. Appl. Comput. Harmon. Anal. 42(3), 480–507 (2017)

27. Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the $$\ell _{1}$$-ball for learning in high dimensions. In: Proceedings of the 25th International Conference on Machine Learning, pp. 272–279. ACM, New York (2008)

28. Dong, B., Shen, Z.: MRA Based Wavelet Frames and Applications. IAS Lecture Notes Series. Summer Program on The Mathematics of Image Processing, Park City Mathematics Institute (2010)

29. Franke, R.: A critical comparison of some methods for interpolation of scattered data. Technical report, Naval Postgraduate School, Monterey, CA (1979)

### Acknowledgements

The authors thank Prof. Qiyu Sun at University of Central Florida and Prof. Jun Xian at Sun Yat-Sen University for valuable discussions on the sampling of signals with bounded derivatives.

## Funding

This work was partially supported by the National Natural Science Foundation (Grant No. 11771120) and the Fundamental Research Funds for the Central Universities 2018B19614, China.

## Author information

### Contributions

All authors contributed equally to the manuscript and approved the final manuscript.

### Corresponding author

Correspondence to Jianbin Yang.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests. 