
A note on hypercircle inequality for data error with \(l^{1}\) norm

Abstract

In our previous work, we extended the hypercircle inequality (HI) to situations where the data error is known. Furthermore, the most recent result was applied to the problem of learning a function value in a reproducing kernel Hilbert space. Specifically, a computational experiment on the method of hypercircle, where the data error is measured with the \(l^{p}\) norm \((1< p \leq \infty )\), was compared to the regularization method, which is a standard method for the learning problem. Despite this progress, there remains a significant aspect to consider: data error measured with the \(l^{1}\) norm. In this paper, we not only explore the hypercircle inequality for data error measured with the \(l^{1}\) norm, but also provide an unexpected application of the hypercircle inequality for only one data error to the \(l^{\infty}\) minimization problem, which is the dual problem in this case.

Introduction

Many inequalities are known in connection with approximation problems. In 1947 the hypercircle inequality was applied to boundary value problems in mathematical physics [11]. In 1959, Golomb and Weinberger demonstrated the relevance of the hypercircle inequality (HI) to a large class of numerical approximation problems [5]. At present the method of hypercircle, which has a long history in applied mathematics, has received attention from mathematicians in several directions [4, 12]. In 2011, Khompurngson and Micchelli [6] described HI and its potential application to kernel-based learning when the data is known exactly and then extended it to the situation where the data error is known (Hide). Furthermore, the most recent result was applied to the problem of learning a function value in a reproducing kernel Hilbert space. Specifically, a computational experiment on the method of hypercircle, where the data error is measured with the \(l^{p}\) norm \((1< p \leq \infty )\), was compared to the regularization method, which is a standard method for the learning problem [6, 8]. We continued this line of research by presenting a full analysis of the hypercircle inequality for data error (Hide) measured with the \(l^{\infty}\) norm [10]. Despite this progress, there remains a significant aspect to consider: data error measured with the \(l^{1}\) norm. In this paper, we not only explore the hypercircle inequality for data error measured with the \(l^{1}\) norm, but also provide an unexpected application of the hypercircle inequality for only one data error to the \(l^{\infty}\) minimization problem, which is the dual problem in this case.

We are specifically interested in a detailed analysis of the hypercircle inequality for data error (Hide) measured with the \(l^{\infty}\) norm [10]. Let \(X= \{ x_{j}: j \in \mathbb{N}_{n}\}\) be a set of \(\mathit{{linearly\ independent}}\) vectors in a real Hilbert space H with inner product \(\langle \cdot,\cdot \rangle \) and norm \(\Vert \cdot \Vert \), where \(\mathbb{N}_{n} =\{ 1,2,\ldots,n\}\). The Gram matrix of the vectors in X is

$$\begin{aligned} G= \bigl( \langle x_{i}, x_{j} \rangle: i, j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

We define the linear operator \(L: H \longrightarrow \mathbb{R}^{n}\) as

$$\begin{aligned} L x = \bigl(\langle x,x_{j} \rangle: j \in \mathbb{N}_{n} \bigr),\quad x\in H. \end{aligned}$$

Consequently, the adjoint map \(L^{T}: \mathbb{R}^{n} \longrightarrow H\) is given as

$$\begin{aligned} L^{T} a = \sum_{j \in \mathbb{N}_{n}} a_{j} x_{j},\quad a \in \mathbb{R}^{n}. \end{aligned}$$

It is well known that for any \(d \in \mathbb{R}^{n}\), there is a unique vector \(x(d) \in M\) such that

$$\begin{aligned} x(d):= L^{T} \bigl(G^{-1}d \bigr):= \arg \min \bigl\{ \Vert x \Vert : x \in H, L(x) = d \bigr\} , \end{aligned}$$
(1)

where M is the n-dimensional subspace of H spanned by the vectors in X; see, for example, [9]. We start with \(I \subseteq \mathbb{N}_{n}\) that contains m elements \((m< n)\), and we set \(J = \mathbb{N}_{n} \setminus I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we use the notations \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{m}\) and \(e_{{J}}= (e_{i}: i \in J) \in \mathbb{R}^{n-m}\). We define the set

$$\begin{aligned} \mathbb{E}_{\infty} = \bigl\{ e: e \in \mathbb{R}^{n}: e_{{I}}=0, \vert \!\vert \! \vert e _{{J}} \vert \!\vert \!\vert _{ \infty} \leq \varepsilon \bigr\} , \end{aligned}$$

where ε is some positive number. For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d|\mathbb{E}_{\infty}):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L(x)- d \in \mathbb{E}_{\infty} \bigr\} . \end{aligned}$$
(2)

Given \(x_{0} \in H\), our main goal here is to estimate \(\langle x, x_{0} \rangle \) for \(x \in \mathcal{H}(d|\mathbb{E}_{\infty})\). According to the midpoint algorithm, we define

$$\begin{aligned} I(x_{0}, d|\mathbb{E}_{\infty}) = \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty}) \bigr\} . \end{aligned}$$

We point out that \(I(x_{0}, d|\mathbb{E}_{\infty}) \) is a closed bounded subset in \(\mathbb{R}\). Therefore we obtain that

$$\begin{aligned} I(x_{0}, d|\mathbb{E}_{\infty}) = \bigl[ m_{-}(x_{0},d| \mathbb{E}_{\infty}), m_{+}(x_{0},d| \mathbb{E}_{\infty}) \bigr], \end{aligned}$$

where \(m_{-}(x_{0},d|\mathbb{E}_{\infty}) = \min \{\langle x,x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty})\}\) and \(m_{+}(x_{0},d|\mathbb{E}_{\infty}) = \max \{\langle x,x_{0} \rangle: x \in \mathcal{H} (d|\mathbb{E}_{\infty})\}\). Hence the best estimator is the midpoint of this interval. According to our previous work [10], we give a formula for the right-hand endpoint.
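The claim that the midpoint is the best estimator can already be seen for a plain interval: the midpoint minimizes the worst-case estimation error over the interval. A minimal numerical sketch, with arbitrary illustrative endpoints:

```python
import numpy as np

# Why the midpoint of the uncertainty interval is the best estimator: for any
# interval [a, b], the worst-case error max(|a - s|, |b - s|) of an estimate s
# is minimized at s = (a + b)/2, with optimal value (b - a)/2.
# The endpoints below are arbitrary illustrative numbers.
a, b = -0.3, 1.1
worst = lambda s: max(abs(a - s), abs(b - s))   # worst case over v in [a, b]

grid = np.linspace(a, b, 1001)                  # coarse grid search
best = min(grid, key=worst)

assert abs(best - 0.5 * (a + b)) < 1e-3
assert np.isclose(worst(0.5 * (a + b)), 0.5 * (b - a))
```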

Theorem 1.1

If \(x_{0} \notin M \) and \(\mathcal{H}(d|\mathbb{E}_{\infty}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{\infty})= \min \bigl\{ \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{J}} \vert \!\vert \!\vert _{1} + (d,c): c \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$
(3)

Therefore the midpoint of the uncertainty \(I(x_{0},d|\mathbb{E}_{\infty}) \) is given by

$$\begin{aligned} m(x_{0},d|\mathbb{E}_{\infty})= \frac{m_{+} (x_{0},d|\mathbb{E}_{\infty})- m_{+} (x_{0},-d|\mathbb{E}_{\infty})}{2}. \end{aligned}$$
(4)

Furthermore, we described every solution to the error bound problem (3) that is required to find the midpoint of the uncertainty interval. Specifically, the result was applied to the problem of learning the value of a function in the Hardy space of square-integrable functions on the unit circle, which has a well-known reproducing kernel. These formulas allow us to give the right-hand endpoint \(m_{+}(x_{0},d|\mathbb{E}_{\infty})\) explicitly when the data error is known. We conjecture that the results of this case extend appropriately to the case of data error measured with the \(l^{1}\) norm, which is our motivation for studying this subject.
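The basic objects introduced above, the operator L, the Gram matrix G, and the minimum-norm interpolant x(d) of equation (1), can be sketched numerically. This is a minimal illustration with \(H = \mathbb{R}^{5}\) and made-up vectors, not data from the paper:

```python
import numpy as np

# Minimal sketch of the minimum-norm interpolant (1), x(d) = L^T(G^{-1} d),
# with H = R^5 and three linearly independent x_j (illustrative choices).
X = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],   # rows are x_1, x_2, x_3
              [0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 1.0, 0.0, 0.0]])
G = X @ X.T                                 # Gram matrix (<x_i, x_j>)
d = np.array([1.0, -0.5, 2.0])              # prescribed data L(x) = d

x_d = X.T @ np.linalg.solve(G, d)           # x(d) = L^T(G^{-1} d)
assert np.allclose(X @ x_d, d)              # x(d) interpolates the data

# Any other interpolant x(d) + v with v in M^perp has strictly larger norm.
v = np.array([0.0, 0.0, 0.0, 1.0, -1.0])    # v is orthogonal to every x_j
assert np.allclose(X @ v, 0)
assert np.linalg.norm(x_d) < np.linalg.norm(x_d + v)
```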

The paper is organized as follows. In Sect. 2, we provide basic concepts of the particular case of the hypercircle inequality for only one data error. Specifically, we provide an explicit solution of a dual problem, which we need for the main results. In Sect. 3, we solve the problem of the hypercircle inequality for data error measured with the \(l^{1}\) norm. The main result in this section is Theorem 3.3, which establishes the solution of the \(l^{\infty}\) minimization problem, the dual problem in this case. Finally, we provide an example of a learning problem in the Hardy space of square-integrable functions on the unit circle and report on numerical experiments with the proposed methods.

Hypercircle inequality for only one data error

In this section, we describe HI for only one data error and its potential relevance to kernel-based learning. Given a set \(I \subseteq \mathbb{N}_{n}\) that contains \(n-1 \) elements, we assume that \(\mathbf{j} \notin I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we also use the notation \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{n-1}\). For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d, \varepsilon ):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L_{{I}}(x) = d_{{I}}, \bigl\vert \langle x, x_{{\mathbf{j}}} \rangle - d_{{ \mathbf{j}}} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(5)

Let \(x_{0} \in H\). Our purpose here is to find the best estimator for \(\langle x,x_{0} \rangle \) knowing that \(\Vert x \Vert \leq 1\),

$$\langle x,x_{i} \rangle = d_{i}\quad \text{for all }i \in I \quad\text{and}\quad \langle x,x_{{\mathbf{j}}} \rangle =d_{{\mathbf{j}}}+e,\quad \text{where } \vert e \vert \leq \varepsilon. $$

According to our previous work [7], we point out that \(\mathcal{H}(d, \varepsilon ) \) is weakly sequentially compact in the weak topology on H. It follows that \(I(x_{0}, d, \varepsilon ):= \{ \langle x,x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \} \) fills out a closed bounded interval in \(\mathbb{R}\). Clearly, the midpoint of the uncertainty interval is the best estimator for \(\langle x,x_{0} \rangle \) when \(x \in \mathcal{H}(d, \varepsilon )\). Therefore the hypercircle inequality for partially corrupted data is as follows.

Theorem 2.1

If \(x_{0} \in H\) and \(\mathcal{H}(d, \varepsilon ) \neq \emptyset \), then there is \(e_{0} \in \mathbb{R}\) such that \(|e_{0}| \leq \varepsilon \) and for any \(x \in \mathcal{H}(d, \varepsilon )\),

$$\begin{aligned} \bigl\vert \bigl\langle x(d+e_{0}), x_{0} \bigr\rangle - \langle x,x_{0} \rangle \bigr\vert \leq \frac{1}{2} \bigl(m_{+}(x_{0},d, \varepsilon ) - m_{-}(x_{0},d, \varepsilon ) \bigr), \end{aligned}$$

where \(x(d+e_{0}) = L^{T} (G^{-1}(d+e_{0}) ) \in \mathcal{H}(d, \varepsilon )\),

$$\begin{aligned} m_{+}(x_{0},d, \varepsilon ):= \max \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} , \end{aligned}$$
(6)

and

$$\begin{aligned} m_{-}(x_{0},d, \varepsilon ):= \min \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} . \end{aligned}$$
(7)

For the particular case \(\varepsilon =0\), the hypercircle inequality provides an explicit bound as follows.

Theorem 2.2

If \(x \in \mathcal{H}(d)\) and \(x_{0} \in H\), then

$$\begin{aligned} \bigl\vert \bigl\langle x(d), x_{0} \bigr\rangle - \langle x,x_{0} \rangle \bigr\vert \leq \operatorname{dist}(x_{0},M) \sqrt{1 - \bigl\Vert x(d) \bigr\Vert ^{2}}, \end{aligned}$$
(8)

where \(\operatorname{dist} (x_{0},M):= \min \{ \Vert x_{0} - y \Vert : y \in M \}\).

The inequality above guarantees that the approximant \(x(d)\), the closest point to the origin of the hyperplane \(\{x: x \in H, L(x)=d\}\), estimates \(\langle x,x_{0} \rangle \) to within the stated bound. Moreover, the approximant is independent of the vector \(x_{0}\). For detailed proofs, see [2].
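A minimal numerical sketch of inequality (8), with \(H=\mathbb{R}^{6}\) and illustrative vectors (not from the paper), verifying the bound over a family of members of the hypercircle:

```python
import numpy as np

# Sketch of the hypercircle inequality (8): for every x with ||x|| <= 1 and
# L(x) = d, the estimate <x(d), x0> is within dist(x0, M) sqrt(1 - ||x(d)||^2)
# of <x, x0>. All concrete vectors are illustrative choices.
X = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 1.0, 0.0, 0.0, 0.0]])
G = X @ X.T
d = np.array([0.1, -0.2, 0.15])
x0 = np.array([0.3, -1.0, 0.4, 2.0, -0.7, 1.1])

x_d = X.T @ np.linalg.solve(G, d)           # minimum-norm interpolant x(d)
assert np.linalg.norm(x_d) < 1              # the hypercircle has many points

P = X.T @ np.linalg.solve(G, X @ x0)        # orthogonal projection of x0 onto M
dist = np.linalg.norm(x0 - P)               # dist(x0, M)
bound = dist * np.sqrt(1 - x_d @ x_d)

# Members of the hypercircle: x = x(d) + t*u with u a unit vector in M^perp
# and t^2 <= 1 - ||x(d)||^2, so that ||x|| <= 1 and L(x) = d.
u = np.array([0.0, 0.0, 0.0, 1.0, 1.0, -2.0])
u /= np.linalg.norm(u)
for t in np.linspace(-1, 1, 11) * np.sqrt(1 - x_d @ x_d):
    x = x_d + t * u
    assert np.linalg.norm(x) <= 1 + 1e-12
    assert abs(x_d @ x0 - x @ x0) <= bound + 1e-12
```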

The right-hand endpoint of the uncertainty interval may be obtained from the following duality result. To this end, we define the function \(V:\mathbb{R}^{n} \rightarrow \mathbb{R} \) for each \(c \in \mathbb{R}^{n} \) by

$$\begin{aligned} V(c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{{\mathbf{j}}} \vert + (d,c). \end{aligned}$$

Theorem 2.3

If \(x_{0} \notin M \) and \(\mathcal{H}(d, \varepsilon ) \) contains more than one element, then

$$\begin{aligned} m_{+}(x_{0},d, \varepsilon )= \min \bigl\{ V(c): c \in \mathbb{R}^{n} \bigr\} , \end{aligned}$$
(9)

and the right-hand side of equation (9) has a unique solution.

Proof

See [7]. □

To state the midpoint of the uncertainty interval, we point out the following fact. The left-hand endpoint of the interval satisfies

$$\begin{aligned} -m_{+}(x_{0},-d, \varepsilon ) = m_{-}(x_{0},d, \varepsilon ):= \min \bigl\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d, \varepsilon ) \bigr\} . \end{aligned}$$

The midpoint is given by

$$\begin{aligned} m(x_{0},d, \varepsilon )= \frac{m_{+} (x_{0},d, \varepsilon )- m_{+} (x_{0},-d, \varepsilon )}{2}. \end{aligned}$$
(10)
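In the exact-data case \(\varepsilon = 0\) the midpoint formula can be checked in closed form: by Theorem 2.2, \(m_{\pm}(x_{0},d) = \langle x(d),x_{0}\rangle \pm \operatorname{dist}(x_{0},M)\sqrt{1-\Vert x(d)\Vert ^{2}}\), and since \(x(-d)=-x(d)\), the midpoint collapses to \(\langle x(d),x_{0}\rangle \). A numerical sketch with illustrative vectors:

```python
import numpy as np

# Check of the midpoint identity in the exact-data case ε = 0: since
# x(-d) = -x(d) and ||x(-d)|| = ||x(d)||, the midpoint
# (m_+(x0, d) - m_+(x0, -d))/2 equals <x(d), x0>. Vectors are illustrative.
X = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 1.0, 0.0, 0.0]])
G = X @ X.T
d = np.array([0.1, -0.2, 0.15])
x0 = np.array([0.3, -1.0, 0.4, 2.0, -0.7])

x_d = X.T @ np.linalg.solve(G, d)
P = X.T @ np.linalg.solve(G, X @ x0)
dist = np.linalg.norm(x0 - P)

def m_plus(dd):
    """Right-hand endpoint for exact data dd, per Theorem 2.2."""
    x_dd = X.T @ np.linalg.solve(G, dd)
    return x_dd @ x0 + dist * np.sqrt(1 - x_dd @ x_dd)

midpoint = 0.5 * (m_plus(d) - m_plus(-d))
assert np.isclose(midpoint, x_d @ x0)       # the sqrt terms cancel
```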

In the remainder of this section, we provide an explicit solution to (9).

Theorem 2.4

If \(x_{0} \notin M \) and \(\mathcal{H}(d, \varepsilon )\) contains more than one element, then we have:

1. \(\frac{x_{0}}{ \Vert x_{0} \Vert } \in \mathcal{H}(d, \varepsilon )\) if and only if \(m_{+} (x_{0},d, \varepsilon )= \Vert x_{0} \Vert \),

2. \(x_{+}(d_{{I}}) \in \mathcal{H}(d, \varepsilon )\) if and only if

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon ) = \bigl\langle x(d_{{I}}), x_{0} \bigr\rangle + \operatorname{dist}(x_{0},M_{{I}}) \sqrt{1 - \bigl\Vert x(d_{{I}}) \bigr\Vert ^{2}}, \end{aligned}$$

where the vector \(x_{+}(d_{{I}}):= \arg \max \{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d_{{I}})\} \),

3. \(\frac{x_{0}}{ \Vert x_{0} \Vert }, x_{+}(d_{{I}}) \notin \mathcal{H}(d, \varepsilon )\) if and only if

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon )= \max \bigl\{ \bigl\langle x_{+}(d + \varepsilon \mathbf{e} ), x_{0} \bigr\rangle , \bigl\langle x_{+}(d - \varepsilon \mathbf{e}), x_{0} \bigr\rangle \bigr\} , \end{aligned}$$

where the vector \(\mathbf{e} \in \mathbb{R}^{n}\) with \(\mathbf{e}_{{I}} = 0\) and \(|\mathbf{e}_{{\mathbf{j}}}| = 1 \).

Proof

According to our hypotheses, there is a unique minimizer \(c^{*} \in \mathbb{R}^{n}\) of the right-hand side of equation (9).

(1) The proof directly follows from [7], that is, we can state that if \(x_{0} \notin M\), then

$$\begin{aligned} 0 = \arg \min \bigl\{ V (c): c \in \mathbb{R}^{n} \bigr\} \end{aligned}$$

if and only if \(\frac{x_{0}}{ \Vert x_{0} \Vert } \in \mathcal{H}(d, \varepsilon )\).

(2) Again from [7] it follows that \(c^{*} = \arg \min \{ V (c): c \in \mathbb{R}^{n}\}\) with \(c^{*}_{{\mathbf{j}}}=0\) if and only if

$$\begin{aligned} x_{+}(d_{{I}}) \in \mathcal{H}(d, \varepsilon ). \end{aligned}$$

By the hypercircle inequality and (8) we obtain that

$$\begin{aligned} \bigl\langle x_{+}(d_{{I}}), x_{0} \bigr\rangle = \bigl\langle x(d_{{I}}), x_{0} \bigr\rangle + \operatorname{dist}(x_{0},M_{{I}}) \sqrt{1 - \bigl\Vert x(d_{{I}}) \bigr\Vert ^{2}}. \end{aligned}$$

(3) Under our hypotheses and [7], the minimizer \(c^{*} \in \mathbb{R}^{n}\) of the function V is unique, and \(c^{*}_{{\mathbf{j}}} \neq 0\). Computing the gradient of V yields

$$\begin{aligned} -L \biggl( \frac{x_{0} - L^{T}c^{*}}{ \Vert x_{0} - L^{T}c^{*} \Vert } \biggr) + \varepsilon \operatorname{sgn} \bigl(c^{*}_{{\mathbf{j}}} \bigr)\mathbf{e} + d =0, \end{aligned}$$
(11)

which confirms that

$$L_{I} \bigl(x_{+}(d, \varepsilon ) \bigr) = d_{I} \text{ and } \bigl\langle x_{+}(d, \varepsilon ), x_{{\mathbf{j}}} \bigr\rangle = d_{{ \mathbf{j}}} + \operatorname{sgn} \bigl(c^{*}_{{\mathbf{j}}} \bigr)\varepsilon, $$

where the vector \(x_{+}(d, \varepsilon ) \) is given by

$$\begin{aligned} x_{+}(d, \varepsilon ):= \frac{x_{0} - L^{T}c^{*}}{ \Vert x_{0} - L^{T}c^{*} \Vert } \in \mathcal{H}(d + \varepsilon \mathbf{e} ) \cup \mathcal{H}(d - \varepsilon \mathbf{e} ). \end{aligned}$$

Therefore we obtain that

$$\begin{aligned} m_{+} (x_{0},d, \varepsilon )= \max \bigl\{ \bigl\langle x_{+}(d + \varepsilon \mathbf{e} ), x_{0} \bigr\rangle , \bigl\langle x_{+}(d - \varepsilon \mathbf{e}), x_{0} \bigr\rangle \bigr\} . \end{aligned}$$

 □
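The case analysis above can be illustrated numerically. With one corrupted datum, the endpoint \(m_{+}(x_{0},d,\varepsilon )\) is the maximum over \(e' \in [-\varepsilon, \varepsilon ]\) of the exact-data endpoint \(m_{+}(x_{0}, d + e'\mathbf{e})\) from Theorem 2.2; Theorem 2.4 locates where that maximum is attained. The sketch below simply approximates it by a grid search over \(e'\); all concrete vectors are illustrative choices, not from the paper:

```python
import numpy as np

# Grid-search illustration of Theorem 2.4: the one-error endpoint is the
# maximum of the exact-data endpoint m_+(x0, d + e' e) over e' in [-ε, ε].
X = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 1.0, 0.0, 0.0, 0.0]])
G = X @ X.T
x0 = np.array([0.3, -1.0, 0.4, 2.0, -0.7, 1.1])
P = X.T @ np.linalg.solve(G, X @ x0)          # projection of x0 onto M
dist = np.linalg.norm(x0 - P)                 # dist(x0, M)

def m_plus_exact(dd):
    """Right-hand endpoint for exact data dd, from Theorem 2.2."""
    x_dd = X.T @ np.linalg.solve(G, dd)
    return x_dd @ x0 + dist * np.sqrt(max(0.0, 1 - x_dd @ x_dd))

d = np.array([0.1, -0.2, 0.15])
eps = 0.05
e = np.array([0.0, 0.0, 1.0])                 # the error sits in the last datum

m_plus = max(m_plus_exact(d + s * e) for s in np.linspace(-eps, eps, 41))
# enlarging the error budget can only enlarge the right-hand endpoint
assert m_plus >= m_plus_exact(d) - 1e-12
```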

We end this section by discussing a concrete example of the hypercircle inequality for only one data error for function estimation in a reproducing kernel Hilbert space. Specifically, we report on a new numerical experiment in a reproducing kernel Hilbert space by using the available material from HI and our recent results. A real-valued function \(K(t,s)\) of t and s in \(\mathcal{T}\) is called a reproducing kernel of H if the following property is satisfied for all \(t \in \mathcal{T}\) and \(f \in H\):

$$\begin{aligned} f(t) = \langle K_{t}, f \rangle, \end{aligned}$$
(12)

where \(K_{t}\) is the function defined for \(s \in \mathcal{T}\) as \(K_{t}(s)=K(t,s)\). Moreover, for any kernel K, there is a unique RKHS with K as its reproducing kernel [1]. In our example, we choose the Gaussian kernel on \(\mathbb{R}\), that is,

$$\begin{aligned} K(s, t) = e^{-\frac{(s - t)^{2}}{10}},\quad s, t \in \mathbb{R}. \end{aligned}$$

The computational steps are organized in the following way. Let \(T=\{t_{j}:j\in \mathbb{N}_{n}\}\) be points of increasing order in \(\mathbb{R}\). Consequently, we have a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) in H, where

$$\begin{aligned} K_{t_{j}}(t):=e^{-\frac{(t_{j} - t)^{2}}{10}},\quad j\in \mathbb{N}_{n}, t \in \mathbb{R}. \end{aligned}$$

Thus the vectors \(\{ x_{j}: j\in \mathbb{N}_{n}\}\) appearing above are identified with the functions \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\). Therefore the Gram matrix of \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) is given by

$$\begin{aligned} G(t_{1},\ldots,t_{n}):= \bigl( K(t_{i},t_{j}): i,j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

In our experiment, we choose the exact function

$$\begin{aligned} g(t) = -0.15K_{0.5}(t) + 0.05 K_{0.85}(t) - 0.25 K_{-0.5}(t) \end{aligned}$$

and compute the vector \(d=\{ g(t_{j}): j\in \mathbb{N}_{12}\}\) as shown in Fig. 1.

Figure 1: Exact function

Given \(t_{0} = 3\), we want to estimate \(f(3) = \langle K_{t_{0}}, f \rangle \) knowing that \(\Vert f \Vert _{K} \leq \rho \) and \(f(t_{i}) = \langle K_{t_{i}}, f \rangle = d_{i}\) for all \(i \in \mathbb{N}_{12}\). In addition, we assume that one data value is missing, namely \(g(0)\). Therefore we approximate \(f(0)\) by \(f_{d_{I}}(0) =6.5768973 \), which is obtained from the hypercircle inequality, Theorem 2.2, whereas the exact value is \(g(0) = 6.576978\). Next, we wish to estimate \(f(3)=\langle f,K_{t_{0}} \rangle \) knowing that \(f(t_{j}) = d_{j}\) for all \(j \in \mathbb{N}_{12}\) and \(f(0) = 6.5768973 +e\), where \(|e| \leq \varepsilon \). Clearly, our data set contains both accurate and inaccurate data; specifically, there is only one data error in this case. By Theorem 2.4 we easily see that \(f_{d_{I}} \in \mathcal{H}(d, \varepsilon )\). Thus the best estimate of \(f(3)\) is \(f_{d_{I}}(3) = 3.137912\), whereas the exact value is \(g(3) = 3.1395855\).
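The computational steps above can be sketched as follows: Gaussian kernel sections, their Gram matrix, exact data \(d_{j} = g(t_{j})\), and the minimum-norm estimate of f at a new point. The node set T is an illustrative choice (the paper does not list its 12 nodes), and a coarser set is used here to keep the Gram matrix well conditioned, so the numbers differ from the paper's:

```python
import numpy as np

# Sketch of the experiment: Gram matrix of Gaussian kernel sections, data
# d_j = g(t_j), and the estimate f_d(t0) = (K(t0, t_j))_j · G^{-1} d.
# The nodes T are an assumed, illustrative choice.
K = lambda s, t: np.exp(-(s - t) ** 2 / 10)

T = np.linspace(-3.0, 3.0, 5)                 # assumed nodes t_1 < ... < t_n
G = K(T[:, None], T[None, :])                 # Gram matrix G(t_1, ..., t_n)
g = lambda t: -0.15 * K(0.5, t) + 0.05 * K(0.85, t) - 0.25 * K(-0.5, t)
d = g(T)                                      # exact data d_j = g(t_j)

coef = np.linalg.solve(G, d)                  # coefficients G^{-1} d of x(d)
estimate = K(3.0, T) @ coef                   # estimate of f(3)

assert np.allclose(G @ coef, d, atol=1e-6)    # x(d) interpolates the data
assert abs(estimate) < 1.0                    # |f_d(3)| <= ||x(d)||, a small norm
```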

Hypercircle inequality for data error measured with \(l^{1}\) norm

In the previous section, we provided basic concepts of the particular case of the hypercircle inequality for only one data error. For our purpose, we restrict our attention to the study of the hypercircle inequality for partially corrupted data with the \(l^{1}\) norm. We start with \(I \subseteq \mathbb{N}_{n}\) that contains m elements \((m< n)\) and set \(J = \mathbb{N}_{n} \setminus I\). For each \(e = ( e_{1},\ldots,e_{n}) \in \mathbb{R}^{n}\), we again use the notations \(e_{{I}} = (e_{i}: i \in I) \in \mathbb{R}^{m}\) and \(e_{{J}}= (e_{i}: i \in J) \in \mathbb{R}^{n-m}\). We define \(\mathbb{E}_{1} =\{ e: e \in \mathbb{R}^{n}: e_{{I}}=0, \vert \!\vert \!\vert e _{{J}} \vert \!\vert \!\vert _{1} \leq \varepsilon \}\), where ε is some positive number. For each \(d \in \mathbb{R}^{n}\), we define the partial hyperellipse

$$\begin{aligned} \mathcal{H}(d|\mathbb{E}_{1}):= \bigl\{ x: x\in H, \Vert x \Vert \leq 1, L(x)- d \in \mathbb{E}_{1} \bigr\} . \end{aligned}$$
(13)

As we said earlier, it follows that \(\mathcal{H}(d|\mathbb{E}_{1})\) is weakly sequentially compact in the weak topology on H and \(I(x_{0},d|\mathbb{E}_{1}):=\{ \langle x, x_{0} \rangle: x \in \mathcal{H}(d|\mathbb{E}_{1})\}\) is a closed bounded interval in \(\mathbb{R}\). Again, the midpoint of the uncertainty interval is the best estimator for \(\langle x, x_{0} \rangle \) when \(x \in \mathcal{H}(d|\mathbb{E}_{1})\). Therefore the midpoint of the uncertainty \(I(x_{0},d|\mathbb{E}_{1}) \) is given by

$$\begin{aligned} m(x_{0},d|\mathbb{E}_{1})= \frac{m_{+} (x_{0},d|\mathbb{E}_{1})- m_{+} (x_{0},-d|\mathbb{E}_{1})}{2}. \end{aligned}$$
(14)

We easily see that the data set contains both accurate and inaccurate data. In the same manner, we provide the duality formula to obtain the right-hand endpoint of the uncertainty interval \(I(x_{0},d|\mathbb{E}_{1})\). To this end, let us define the convex function \(\mathbb{V}: \mathbb{R}^{n} \longrightarrow \mathbb{R}\) by

$$\begin{aligned} \mathbb{V}(c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} + (d,c), \quad c \in \mathbb{R}^{n}. \end{aligned}$$
(15)
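As a quick sanity check on this dual objective, a minimal numerical sketch (with \(H = \mathbb{R}^{6}\) and illustrative vectors, not data from the paper) confirms that \(\mathbb{V}\) is convex, which underlies the duality in Theorem 3.1 below:

```python
import numpy as np

# Sketch of the dual objective (15): V(c) = ||x0 - L^T c|| + ε |||c_J|||_∞ + (d, c).
# Here H = R^6, I = {1, 2} holds the accurate data and J = {3, 4} the corrupted
# data; all vectors are illustrative. We check midpoint convexity numerically.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 6))               # x_1, ..., x_4 in H = R^6
x0 = rng.standard_normal(6)
d = rng.standard_normal(4)
eps = 0.3
J = [2, 3]                                    # 0-based indices of J = {3, 4}

def V(c):
    return (np.linalg.norm(x0 - X.T @ c)      # ||x0 - L^T(c)||
            + eps * np.max(np.abs(c[J]))      # ε |||c_J|||_∞
            + d @ c)                          # (d, c)

for _ in range(100):
    c1, c2 = rng.standard_normal(4), rng.standard_normal(4)
    assert V(0.5 * (c1 + c2)) <= 0.5 * (V(c1) + V(c2)) + 1e-12
```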

Theorem 3.1

If \(x_{0} \notin M \) and \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{1})= \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} , \end{aligned}$$
(16)

and the right-hand side of equation (16) has a unique solution. Moreover, \(x_{+}(d_{I}) \in \mathcal{H}(d|\mathbb{E}_{1})\) if and only if \(c^{*}_{J} = 0 \), where \(c^{*} = \arg \min \{ \mathbb{V}(c): c \in \mathbb{R}^{n}\} \).

Proof

See [10]. □

We begin our main result of this section by providing a useful observation. To this end, let us introduce the following notations. For each \(j\in \mathbf{J}\), define the function \(\mathbb{V}_{j}: \mathbb{R}^{n} \rightarrow \mathbb{R} \) by

$$\begin{aligned} \mathbb{V}_{j}( c):= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{j} \vert + (c, d),\quad c \in \mathbb{R}^{n}. \end{aligned}$$
(17)

Clearly, we see that the duality formula (17) corresponds to the hyperellipse with only one data error

$$\begin{aligned} \mathcal{H}_{j}(d, \varepsilon ) = \bigl\{ x: x \in B, L_{{ \mathbb{N}_{n} \setminus \{j\}}}(x) =d_{{\mathbb{N}_{n} \setminus \{j \}}}, \bigl\vert \langle x, x_{j} \rangle - d_{j} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(18)

By Theorem 2.4, if \(x_{0} \notin M \) and \(\mathcal{H}_{j}(d, \varepsilon )\) contains more than one element, then there is a unique \(a^{*} \in \mathbb{R}^{n}\) such that

$$\begin{aligned} \mathbb{V}_{j} \bigl( a^{*} \bigr)= \min \bigl\{ \mathbb{V}_{j}( a):a \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$

We can now state the first result.

Theorem 3.2

If \(x_{0} \notin M\), both \(\mathcal{H}(d|\mathbb{E}_{1})\) and \(\mathcal{H}_{j}(d, \varepsilon ) \) contain more than one point, and \(a^{*} = \arg \min \{ \mathbb{V}_{j}(a): a \in \mathbb{R}^{n}\}\) satisfies \(\vert \!\vert \!\vert a^{*}_{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} = |a^{*}_{j} |\), then

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{j}( a ): a \in \mathbb{R}^{n} \bigr\} . \end{aligned}$$
(19)

Proof

For each \(c \in \mathbb{R}^{n}\), we observe that

$$\begin{aligned} \mathbb{V}_{j}( c) &= \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert c_{j} \vert + (c, d) \\ &\leq \bigl\Vert x_{0} - L^{T}(c) \bigr\Vert + \varepsilon \vert \!\vert \!\vert c_{{\mathbf{J}}} \vert \!\vert \!\vert _{ \infty} + (c, d) \\ &= \mathbb{V}( c), \end{aligned}$$

which means that \(\min \{\mathbb{V}_{j}(c ): c \in \mathbb{R}^{n} \} \leq \min \{ \mathbb{V}(c): c \in \mathbb{R}^{n}\}\).

According to our assumption, we obtain that

$$\begin{aligned} \mathbb{V}_{j} \bigl( a^{*} \bigr) &= \bigl\Vert x_{0} - L^{T} \bigl(a^{*} \bigr) \bigr\Vert + \varepsilon \bigl\vert a^{*}_{j} \bigr\vert + \bigl(a^{*}, d \bigr) \end{aligned}$$
(20)
$$\begin{aligned} &= \bigl\Vert x_{0} - L^{T} \bigl(a^{*} \bigr) \bigr\Vert + \varepsilon \vert \!\vert \!\vert a^{*}_{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} + \bigl(a^{*}, d \bigr), \end{aligned}$$
(21)

so that \(\min \{\mathbb{V}_{j}(a): a \in \mathbb{R}^{n}\} = \mathbb{V}_{j}(a^{*}) = \mathbb{V}(a^{*}) \geq \min \{\mathbb{V}(c): c \in \mathbb{R}^{n}\}\). Combining this with the reverse inequality completes the proof. □
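The pointwise inequality \(\mathbb{V}_{j} \leq \mathbb{V}\) used in the proof can be checked numerically; the setup below uses illustrative random vectors, not data from the paper:

```python
import numpy as np

# Numerical check of the pointwise bound behind Theorem 3.2: V_j(c) <= V(c)
# for every c, because |c_j| <= |||c_J|||_∞ whenever j is in J.
# H = R^6 with four data functionals; all vectors illustrative.
rng = np.random.default_rng(4)
X = rng.standard_normal((4, 6))
x0 = rng.standard_normal(6)
d = rng.standard_normal(4)
eps = 0.3
J = [2, 3]                                    # 0-based indices of J

V  = lambda c: np.linalg.norm(x0 - X.T @ c) + eps * np.max(np.abs(c[J])) + d @ c
Vj = lambda c, j: np.linalg.norm(x0 - X.T @ c) + eps * abs(c[j]) + d @ c

for _ in range(200):
    c = rng.standard_normal(4)
    for j in J:
        assert Vj(c, j) <= V(c) + 1e-12
```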

To study the general case, let us introduce the following notations. We first denote the set

$$\begin{aligned} \Lambda _{\infty} = \bigl\{ \lambda: \lambda \in \mathbb{R}^{n}, \lambda _{{ \mathbf{I}}} = 0, \vert \!\vert \!\vert \lambda _{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} \leq 1 \bigr\} . \end{aligned}$$

For each \(\lambda \in \Lambda _{\infty}\), we denote the set of linearly independent vectors

$$\begin{aligned} X \bigl(\lambda ^{j} \bigr)= \{ x_{i}: i \in I\} \cup \bigl\{ \mathbf{x} \bigl( \lambda ^{j} \bigr) \bigr\} \end{aligned}$$

in H, where the vector

$$\begin{aligned} \mathbf{x} \bigl(\lambda ^{j} \bigr) = x_{j} + \sum _{i\in \mathbf{J}\backslash \{j\} } \lambda _{i} x_{i}. \end{aligned}$$

Consequently, we denote by \(M(X(\lambda ^{j})) \) the \((m+1)\)-dimensional linear subspace of H spanned by the vectors in \(X(\lambda ^{j})\). From now on, we denote by \(G( X(\lambda ^{j})) \) the Gram matrix of the vectors in \(X(\lambda ^{j})\), which is symmetric and positive definite. The vector \(\mathbf{d}(\lambda ^{j}) \in \mathbb{R}^{m+1}\) has the components

$$\mathbf{d} \bigl(\lambda ^{j} \bigr)_{i} = d_{i} \quad\text{for }i \in I\text{ and } \mathbf{d} \bigl(\lambda ^{j} \bigr)_{m+1} = d_{j} + \sum _{{i \in \mathbf{J} \backslash \{j\} }} \lambda _{i} d_{i}. $$

Therefore we obtain the following partial hyperellipse with constant \(\mathbf{d}(\lambda ^{j}) \):

$$\begin{aligned} \mathcal{H} \bigl(\mathbf{d} \bigl(\lambda ^{j} \bigr), \varepsilon \bigr) = \bigl\{ x: x \in B, L_{I}(x) = d_{I}, \bigl\vert \bigl\langle x,\mathbf{x} \bigl(\lambda ^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\lambda ^{j} \bigr)_{m+1} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$
(22)

Next, the partial hyperellipse with only one data error (22) corresponds to a duality formula for the right-hand endpoint of the uncertainty interval, \(m_{+}(x_{0},\mathbf{d}(\lambda ^{j}), \varepsilon )\), as follows. For all \(j\in \mathbf{J} \) and \(\lambda \in \Lambda _{\infty}\), we define the function \(\mathbb{V}_{j}(\cdot | \lambda ): \mathbb{R}^{m+1} \rightarrow \mathbb{R} \) by

$$\begin{aligned} \mathbb{V}_{j}( c| \lambda ):= \bigl\Vert x_{0} - L^{T}_{I}(c_{I}) - c_{m+1} \bigl( \mathbf{x} \bigl(\lambda ^{j} \bigr) \bigr) \bigr\Vert + \varepsilon \vert c_{m+1} \vert + \bigl(c, \mathbf{d} \bigl( \lambda ^{j} \bigr) \bigr),\quad c \in \mathbb{R}^{m+1}. \end{aligned}$$
(23)

Theorem 3.3

If \(x_{0} \notin M\), \(\frac{x_{0}}{ \Vert x_{0} \Vert } \notin \mathcal{H}(d|\mathbb{E}_{1})\), and \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then there are \(\hat{\lambda} \in \Lambda _{\infty} \) and \(j \in \mathbf{J}\) such that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{ \mathbf{j}}( c|\hat{\lambda} ): c \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$
(24)

Proof

According to our assumptions, the minimization problem (16) has a unique solution \(c^{*}\). Since \(\frac{x_{0}}{ \Vert x_{0} \Vert } \notin \mathcal{H}(d|\mathbb{E}_{1})\), the vector \(c^{*} \neq 0 \). We may then choose \(\mathbf{j} \in \mathbf{J} \) such that \(\vert \!\vert \!\vert c^{*}_{{\mathbf{J}}} \vert \!\vert \!\vert _{\infty} = |c^{*}_{\mathbf{j}}|\). Consequently, there is \(\hat{\lambda} \in \Lambda _{\infty}\) such that \(c^{*}_{i} = \hat{\lambda}_{i} c^{*}_{\mathbf{j}}\) for all \(i \in \mathbf{J} \backslash \{\mathbf{j} \} \). Therefore we obtain that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} &= \bigl\Vert x_{0} - L^{T}_{I} \bigl(c^{*}_{I} \bigr) - c^{*}_{\mathbf{j}} \bigl(\mathbf{x} \bigl(\hat{ \lambda}^{j} \bigr) \bigr) \bigr\Vert + \varepsilon \bigl\vert c^{*}_{ \mathbf{j}} \bigr\vert + \bigl(c^{*}, \mathbf{d} \bigl(\hat{\lambda}^{j} \bigr) \bigr) \\ &=\min \bigl\{ \mathbb{V}_{\mathbf{j}}( a| \hat{\lambda} ): a \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$
(25)

 □

Computing the gradient of \(\mathbb{V}_{\mathbf{j}}( \cdot | \hat{\lambda} ) \), the minimizer \(c^{*}_{I \cup \{ \mathbf{j} \}} = a^{*} \in \mathbb{R}^{m+1}\) is the unique solution of the nonlinear equations

$$\begin{aligned} L_{{\mathbf{I} }} \bigl(x_{+} \bigl(\mathbf{d} \bigl(\hat{ \lambda}^{j} \bigr), \varepsilon \bigr) \bigr) = d_{I}, \end{aligned}$$

and

$$\begin{aligned} \bigl\langle x_{+} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr),\mathbf{x} \bigl( \hat{\lambda} ^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\hat{\lambda} ^{j} \bigr)_{m+1} = \operatorname{sgn} \bigl(c^{*}_{ \mathbf{j}} \bigr)\varepsilon, \end{aligned}$$

where the vector \(x_{+}(\mathbf{d}(\hat{\lambda}^{j}), \varepsilon ) \) is given by

$$\begin{aligned} x_{+} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr):= \frac{x_{0} - L^{T}_{I}(a^{*}_{I}) - a^{*}_{\mathbf{j}}\mathbf{x}(\hat{\lambda}^{j}) }{ \Vert x_{0} - L^{T}_{I}(a^{*}_{I}) - a^{*}_{\mathbf{j}}\mathbf{x}(\hat{\lambda}^{j}) \Vert } \in \mathcal{H} \bigl(\mathbf{d} \bigl( \hat{ \lambda}^{j} \bigr), \varepsilon \bigr), \end{aligned}$$

and the partial hyperellipse with the constant \(\mathbf{d}(\hat{\lambda}^{j})\) is

$$\begin{aligned} \mathcal{H} \bigl(\mathbf{d} \bigl(\hat{\lambda}^{j} \bigr), \varepsilon \bigr) = \bigl\{ x: x \in B, L_{I}(x) = d_{I}, \bigl\vert \bigl\langle x,\mathbf{x} \bigl(\hat{\lambda}^{j} \bigr) \bigr\rangle - \mathbf{d} \bigl(\hat{\lambda}^{j} \bigr)_{m+1} \bigr\vert \leq \varepsilon \bigr\} . \end{aligned}$$

Let us introduce the following notation: for each \(\lambda \in \Lambda _{\infty}\), we define

$$\begin{aligned} W(\lambda ):= \min \bigl\{ m_{i}(\lambda ): i \in \mathbf{J} \bigr\} , \end{aligned}$$

where \(m_{i}(\lambda ) = \min \{ \mathbb{V}_{i}( c| \lambda ): c \in \mathbb{R}^{m+1} \}\).

Theorem 3.4

If \(\mathcal{H}(d|\mathbb{E}_{1}) \) contains more than one point, then

$$\begin{aligned} m_{+}(x_{0},d|\mathbb{E}_{1} ) = \min \bigl\{ W( \lambda ): \lambda \in \Lambda _{\infty} \bigr\} . \end{aligned}$$
(26)

Proof

For each \(\lambda \in \Lambda _{\infty}\), we see that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq \min \bigl\{ \mathbb{V}_{i}( a|\lambda ): a \in \mathbb{R}^{m+1} \bigr\} \end{aligned}$$

for all \(i \in \mathbf{J}\), that is, for each \(\lambda \in \Lambda _{\infty}\),

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq W( \lambda ). \end{aligned}$$

Consequently, we obtain that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} \leq \inf \bigl\{ W( \lambda ): \lambda \in \Lambda _{\infty} \bigr\} . \end{aligned}$$

According to Theorem 3.3, there are \(\hat{\lambda} \in \Lambda _{\infty}\) and \(\mathbf{j} \in \mathbf{J} \) such that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ \mathbb{V}_{ \mathbf{j}}( c|\hat{\lambda} ): c \in \mathbb{R}^{m+1} \bigr\} . \end{aligned}$$

Therefore we can conclude that

$$\begin{aligned} \min \bigl\{ \mathbb{V}(c): c \in \mathbb{R}^{n} \bigr\} = \min \bigl\{ W(\lambda ): \lambda \in \Lambda _{\infty} \bigr\} . \end{aligned}$$

 □

We end this section by extending these results to the optimal estimation of any number of features. Let us define the function \(W:\mathcal{H}(d|\mathbb{E}_{1}) \rightarrow \mathbb{R}^{k}\) for \(x \in \mathcal{H}(d|\mathbb{E}_{1})\) by

$$\begin{aligned} Wx = \bigl(\langle x,x_{-k + j}\rangle: j \in \mathbb{N}_{k} \bigr). \end{aligned}$$
(27)

In the case of estimating a single feature, the uncertainty set is an interval. For multiple features, the uncertainty set is a bounded set in a finite-dimensional space. Consequently, the corresponding uncertainty set is given as

$$\begin{aligned} U(d|\mathbb{E}_{1}):= \bigl\{ Wx: x \in \mathcal{H}(d| \mathbb{E}_{1}) \bigr\} . \end{aligned}$$
(28)

It is easy to check that \(U(d|\mathbb{E}_{1})\) is a convex compact subset of \(\mathbb{R}^{k}\). To get the best estimator, we need to find the center and radius of \(U(d|\mathbb{E}_{1})\). We recall the Chebyshev radius and center. For this purpose, we choose the \(l^{\infty}\) norm \(\vert \!\vert \!\vert \cdot \vert \!\vert \!\vert _{\infty} \) on \(\mathbb{R}^{k}\) and define the radius of \(U(d|\mathbb{E}_{1})\) as

$$\begin{aligned} r_{\infty} \bigl(U(d|\mathbb{E}_{1}) \bigr):= \inf _{y \in \mathbb{R}^{k}} \sup_{u \in U(d|\mathbb{E}_{1})} \vert \!\vert \!\vert u - y \vert \!\vert \!\vert _{\infty}. \end{aligned}$$

We denote its center as \(m_{\infty} \in \mathbb{R}^{k}\). In the theorem below, we show that the \(l^{\infty}\) center of the set \(U(d|\mathbb{E}_{1})\) is given by the vector

$$\begin{aligned} m_{\infty}:= \bigl( m(x_{-k+j},d|\mathbb{E}_{1}): j \in \mathbb{N}_{k} \bigr), \end{aligned}$$

where \(m(x_{-k +j},d|\mathbb{E}_{1})\) is the center of the interval \(I(x_{-k+j},d|\mathbb{E}_{1})\) for all \(j \in \mathbb{N}_{k}\).

Theorem 3.5

If \(\mathcal{H}(d|\mathbb{E}_{1}) \neq \emptyset \), then the \(l^{\infty}\) center of the uncertainty set is \(m_{\infty} = ( m(x_{-k+j}, d|\mathbb{E}_{1}): j \in \mathbb{N}_{k})\), and its radius is given by

$$\begin{aligned} r_{\infty} \bigl(U(d|\mathbb{E}_{1}) \bigr) = \max \bigl\{ r \bigl(I(x_{-k+j},d|\mathbb{E}_{1}) \bigr): j \in \mathbb{N}_{k} \bigr\} . \end{aligned}$$

Proof

This follows by the same method as in [6]. □
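Theorem 3.5 reduces the multi-feature problem to componentwise interval data: the \(l^{\infty}\) center collects the interval midpoints, and the radius is the largest interval half-width. A minimal numerical sketch, with hypothetical interval endpoints standing in for the intervals \(I(x_{-k+j},d|\mathbb{E}_{1})\):

```python
import numpy as np

# Hypothetical endpoints of the intervals I(x_{-k+j}, d|E_1), j = 1, 2, 3
# (illustrative numbers only; the paper computes these from the data).
a = np.array([-1.0, 0.2, -0.5])   # left endpoints
b = np.array([ 0.4, 0.8,  0.7])   # right endpoints

# l-infinity Chebyshev center: the vector of interval midpoints (Theorem 3.5).
m_inf = (a + b) / 2.0

# l-infinity Chebyshev radius: the largest interval half-width.
r_inf = np.max((b - a) / 2.0)
```

Here the center and radius are exactly the quantities \(m_{\infty}\) and \(r_{\infty}(U(d|\mathbb{E}_{1}))\) of Theorem 3.5, specialized to these sample intervals.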

Finally, we present some results of a numerical experiment on estimating multiple features of a vector in the partial hyperellipse \(\mathcal{H}(d|\mathbb{E}_{1})\). For our computational experiments, we choose the Hardy space of square-integrable functions on the unit circle, with reproducing kernel

$$\begin{aligned} K(z,\zeta ) = \frac{1}{1-\overline{\zeta}z},\quad z, \zeta \in \Delta, \end{aligned}$$

where \(\Delta:=\{ z: |z| < 1\}\) [3]. Specifically, let \(H^{2}( \Delta ) \) be the set of all functions analytic in the unit disc Δ with norm

$$\begin{aligned} \Vert f \Vert = \sup_{\substack{0< r< 1}} \biggl(\frac{1}{2 \pi} \int _{0}^{2\pi} \bigl\vert f \bigl(re^{i \theta} \bigr) \bigr\vert ^{2} d\theta \biggr)^{\frac{1}{2}}. \end{aligned}$$
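This norm can be checked numerically for a kernel section: by the Fourier expansion \(K_{s}(z)=\sum_{n\geq 0}s^{n}z^{n}\), the circle mean of \(|K_{s}|^{2}\) at radius \(r\) equals \(1/(1-s^{2}r^{2})\), which increases to \(\Vert K_{s}\Vert ^{2}=K(s,s)=1/(1-s^{2})\) as \(r \to 1\). A sketch (the point \(s\) and the radius \(r\) are arbitrary illustrative choices):

```python
import numpy as np

s, r = 0.5, 0.999            # a point s in (-1, 1) and a radius r < 1
N = 4096                     # uniform grid on the circle
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

# Circle mean of |K_s(r e^{i theta})|^2 with K_s(z) = 1/(1 - s z);
# for a smooth periodic integrand, the uniform-grid mean is spectrally accurate.
vals = np.abs(1.0 / (1.0 - s * r * np.exp(1j * theta))) ** 2
mean_sq = vals.mean()

# Exact circle mean 1/(1 - s^2 r^2), which tends to ||K_s||^2 = 1/(1 - s^2).
exact = 1.0 / (1.0 - (s * r) ** 2)
```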

Now let \(T=\{t_{j}:j\in \mathbb{N}_{n}\}\) be points of \((-1,1)\), listed in increasing order. Consequently, we have a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) in \(H^{2}( \Delta )\), where

$$\begin{aligned} K_{t_{j}}(t):=\frac{1}{1-t_{j} t},\quad j\in \mathbb{N}_{n}, t \in \Delta. \end{aligned}$$

Thus the vectors \(\{ x_{j}: j\in \mathbb{N}_{n}\}\) appearing above are identified with the functions \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\). Therefore the Gram matrix of \(\{ K_{t_{j}}: j\in \mathbb{N}_{n}\}\) is given by

$$\begin{aligned} G(t_{1},\ldots,t_{n}):= \bigl( K(t_{i},t_{j}): i,j \in \mathbb{N}_{n} \bigr). \end{aligned}$$

Next, we recall the Cauchy determinant defined for \(\{ t_{j}: j \in \mathbb{N}_{n}\}\) and \(\{ s_{j}: j \in \mathbb{N}_{n}\}\) as

$$\begin{aligned} \operatorname{det} \biggl(\frac{1}{ 1- t_{i} s_{j}} \biggr)_{i,j \in \mathbb{N}_{n}} = \frac{ \prod_{ 1 \leq j < i \leq n} (t_{j} - t_{i})(s_{j} - s_{i})}{ \prod_{i,j \in \mathbb{N}_{n}} ( 1-t_{i} s_{j})}; \end{aligned}$$
(29)

see, for example, [2]. From this formula we obtain that

$$\begin{aligned} \operatorname{det} G(t_{1},\ldots,t_{n}) = \frac{ \prod_{1\leq i < j \leq n}(t_{i} - t_{j})^{2}}{ \prod_{i,j \in \mathbb{N}_{n}}(1-t_{i}t_{j})}. \end{aligned}$$
(30)
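Identity (30) is easy to confirm numerically; a sketch using the six nodes of the experiment below (the node choice is ours, for illustration only):

```python
import numpy as np

t = np.array([-0.9, -0.6, -0.3, 0.3, 0.6, 0.9])  # interpolation nodes
n = len(t)

# Gram matrix G(t_1, ..., t_n) with entries K(t_i, t_j) = 1/(1 - t_i t_j).
G = 1.0 / (1.0 - np.outer(t, t))
det_G = np.linalg.det(G)

# Right-hand side of (30): prod_{i<j}(t_i - t_j)^2 / prod_{i,j}(1 - t_i t_j).
num = np.prod([(t[i] - t[j]) ** 2 for i in range(n) for j in range(i + 1, n)])
den = np.prod([1.0 - t[i] * t[j] for i in range(n) for j in range(n)])
cauchy = num / den
```

The two values agree up to floating-point error (Cauchy matrices are ill-conditioned, so only a modest relative tolerance should be expected).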

In our case, for any \(t_{0} \in (-1,1) \) and \(t_{0} \notin T:= \{ t_{j}: j \in \mathbb{N}_{n}\}\), we obtain that

$$\begin{aligned} \operatorname{dist} \bigl( K_{t_{0}}, \operatorname{span} \{K_{t_{j}}: j\in \mathbb{N}_{n} \} \bigr) = \frac{ \vert B(t_{0}) \vert }{\sqrt{1-t_{0}^{2}}}, \end{aligned}$$
(31)

where B is the rational function defined for \(t \in \mathbb{C}\setminus \{ t^{-1}_{j}: j \in \mathbb{N}_{n}\}\) by

$$\begin{aligned} B(t):= \prod_{j \in \mathbb{N}_{n}} \frac{t-t_{j}}{1-tt_{j}}, \end{aligned}$$
(32)
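Formula (31) can likewise be verified against the standard RKHS projection formula \(\operatorname{dist}^{2}(K_{t_{0}}, \operatorname{span}\{K_{t_{j}}\}) = K(t_{0},t_{0}) - g^{\top}G^{-1}g\), where \(g_{i}=K(t_{0},t_{i})\). A sketch (nodes and \(t_{0}\) chosen for illustration):

```python
import numpy as np

t = np.array([-0.9, -0.6, -0.3, 0.3, 0.6, 0.9])  # the set T
t0 = 0.4                                         # a point of (-1, 1) outside T

# Projection formula: dist^2 = K(t0, t0) - g^T G^{-1} g with g_i = K(t0, t_i).
G = 1.0 / (1.0 - np.outer(t, t))
g = 1.0 / (1.0 - t0 * t)
dist = np.sqrt(1.0 / (1.0 - t0 ** 2) - g @ np.linalg.solve(G, g))

# Closed form (31)-(32): |B(t0)| / sqrt(1 - t0^2), B the product in (32).
B = np.prod((t0 - t) / (1.0 - t0 * t))
closed = abs(B) / np.sqrt(1.0 - t0 ** 2)
```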

and the vector \(x_{0}\) appearing previously is identified with the function \(K_{t_{0}}\). We organize the computational steps as follows. We choose a finite set of linearly independent elements \(\{ K_{t_{j}}: j\in \mathbb{N}_{6}\}\) in \(H^{2}( \Delta )\) with

$$\begin{aligned} t_{1}=-0.9,\qquad t_{2} = -0.6,\qquad t_{3} = -0.3,\qquad t_{4} = 0.3,\qquad t_{5} = 0.6 \quad\text{and} \quad t_{6} = 0.9. \end{aligned}$$

We choose the exact function

$$\begin{aligned} g(t) = -0.15K_{0.5}(t) + 0.05 K_{0.85}(t) - 0.25 K_{-0.5}(t) \end{aligned}$$

and compute the vector \(d=( g(t_{j}): j\in \mathbb{N}_{6})\). By definition (12), the linear operator \(L: H^{2}( \Delta ) \longrightarrow \mathbb{R}^{6}\) is defined for \(f \in H^{2}( \Delta )\) as follows:

$$\begin{aligned} Lf:= \bigl( f(t_{i}): i \in \mathbb{N}_{6} \bigr) = \bigl( \langle f, K_{t_{i}} \rangle: i \in \mathbb{N}_{6} \bigr). \end{aligned}$$
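The data vector can be reproduced directly from the kernel expansion of g; a straightforward sketch:

```python
import numpy as np

def K(s, t):
    # Kernel section K_s(t) = 1/(1 - s t) restricted to (-1, 1).
    return 1.0 / (1.0 - s * t)

nodes = np.array([-0.9, -0.6, -0.3, 0.3, 0.6, 0.9])  # t_1, ..., t_6

def g(t):
    # The exact function chosen in the experiment.
    return -0.15 * K(0.5, t) + 0.05 * K(0.85, t) - 0.25 * K(-0.5, t)

# d = (g(t_j) : j in N_6) = Lg, since g(t_j) = <g, K_{t_j}>.
d = g(nodes)
```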

In our experiment, we choose

$$\begin{aligned} t_{-2}=-0.4,\qquad t_{-1} =0,\qquad t_{0} = 0.4, \end{aligned}$$

and we wish to estimate

$$\begin{aligned} Wf= \bigl( \langle f,K_{t_{-3+j}} \rangle: j \in \mathbb{N}_{3} \bigr) = \bigl( f(t_{-3+j}): j \in \mathbb{N}_{3} \bigr) \end{aligned}$$

when we know that

$$\begin{aligned} f(t_{j}) = d_{j}\quad\text{for all } j \in \mathbb{N}_{6} \setminus \{ 3, 4\}\quad\text{and}\quad \bigl\vert f(t_{3}) - d_{3} \bigr\vert + \bigl\vert f(t_{4}) - d_{4} \bigr\vert \leq \varepsilon = 0.1. \end{aligned}$$

According to Theorem 3.2, the functions \(\mathbb{V}_{1}\) and \(\mathbb{V}_{2}\) become

$$\begin{aligned} \mathbb{V}_{1}( c| \lambda ) = {}&\bigl\Vert K_{t_{0}} - (c_{1}K_{t_{1}} +c_{2} K_{t_{2}} + c_{3} K_{t_{5}} + c_{4} K_{t_{6}}) - c_{5}(K_{t_{3}} + \lambda K_{t_{4}} ) \bigr\Vert \\ &{}+ \varepsilon \vert c_{5} \vert + (c_{I},d_{I}) + c_{5}(d_{3} + \lambda d_{4} ) \end{aligned}$$

and

$$\begin{aligned} \mathbb{V}_{2}( c| \lambda ) ={}& \bigl\Vert K_{t_{0}} - (c_{1}K_{t_{1}} +c_{2} K_{t_{2}} + c_{3} K_{t_{5}} + c_{4} K_{t_{6}}) - c_{5}(K_{t_{4}} + \lambda K_{t_{3}} ) \bigr\Vert + \varepsilon \vert c_{5} \vert + (c_{I},d_{I}) \\ &{}+ c_{5}(d_{4} + \lambda d_{3} ). \end{aligned}$$
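Since every element involved lies in the span of finitely many kernel sections, the \(H^{2}\) norm in \(\mathbb{V}_{1}\) reduces to a quadratic form in the Gram matrix, so \(\mathbb{V}_{1}(c|\lambda )\) can be evaluated pointwise and then minimized by any standard routine. A sketch under our reading of the indices in \(\mathbb{V}_{1}\) (with \(d_{I}=(d_{1},d_{2},d_{5},d_{6})\)); the feature point \(t_{0}=0.4\) and the test arguments are illustrative:

```python
import numpy as np

t = np.array([-0.9, -0.6, -0.3, 0.3, 0.6, 0.9])    # t_1, ..., t_6
d = (-0.15 / (1.0 - 0.5 * t) + 0.05 / (1.0 - 0.85 * t)
     - 0.25 / (1.0 + 0.5 * t))                     # d_j = g(t_j)
eps, t0 = 0.1, 0.4                                 # error bound, feature point

def h_norm(s, a):
    # H^2 norm of sum_i a_i K_{s_i} via the Gram matrix (1/(1 - s_i s_j)).
    G = 1.0 / (1.0 - np.outer(s, s))
    return np.sqrt(a @ G @ a)

def V1(c, lam):
    # Norm term: K_{t0} - (c1 K_{t1} + c2 K_{t2} + c3 K_{t5} + c4 K_{t6})
    #            - c5 (K_{t3} + lam K_{t4}).
    s = np.array([t0, t[0], t[1], t[4], t[5], t[2], t[3]])
    a = np.array([1.0, -c[0], -c[1], -c[2], -c[3], -c[4], -lam * c[4]])
    # Linear terms: eps |c5| + (c_I, d_I) + c5 (d_3 + lam d_4).
    lin = (eps * abs(c[4])
           + c[0] * d[0] + c[1] * d[1] + c[2] * d[4] + c[3] * d[5]
           + c[4] * (d[2] + lam * d[3]))
    return h_norm(s, a) + lin
```

At \(c=0\) the linear terms vanish and \(\mathbb{V}_{1}(0|\lambda )=\Vert K_{t_{0}}\Vert =1/\sqrt{1-t_{0}^{2}}\), a convenient sanity check before handing `V1` to a minimizer.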

In this computation, we found that \(f^{+}_{d_{{I}}} \notin \mathcal{H}(d|\mathbb{E}_{1})\). To obtain the minimum of W, we must compare the values of \(m_{1}(t_{-3+j}, \lambda )\) and \(m_{2}(t_{-3+j}, \lambda )\), where

$$\begin{aligned} m_{1}(t_{-3+j}, \lambda ) = \max \bigl\{ f^{+}_{d(\lambda ^{1}) + \varepsilon \mathbf{e}}(t_{-3+j}), f^{+}_{d(\lambda ^{1}) - \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} \end{aligned}$$

and

$$\begin{aligned} m_{2}(t_{-3+j}, \lambda ) = \max \bigl\{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-3+j}), f^{+}_{d(\lambda ^{2}) - \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} , \end{aligned}$$

that is,

$$\begin{aligned} m_{j}(t_{-3+j}, \lambda ) &= \max \bigl\{ f^{+}_{d(\lambda ^{j}) \pm \varepsilon \mathbf{e}}(t_{-3+j}) \bigr\} \\ &= \max \bigl\{ f_{d(\lambda ^{j}) \pm \varepsilon \mathbf{e}}(t_{-3+j}) + \operatorname{dist} \bigl(K_{t_{-3+j}},M \bigl(\lambda ^{j} \bigr) \bigr) \sqrt{1 - \Vert f_{d(\lambda ^{j}) \pm \varepsilon \mathbf{e}} \Vert ^{2}} \bigr\} . \end{aligned}$$

To obtain the right-hand endpoint, we need to find the minimum of W defined for \(\lambda \in [-1,1]\). As explained earlier, the midpoint algorithm requires us to find numerically the minimum of the function \(\mathbb{V}\) for d and −d, that is, we compute \(v_{\pm}:= \min \{ \mathbb{V}(c,\pm d): c \in \mathbb{R}^{n}\} \), and then our midpoint estimator is given by \(\frac{v_{+} - v_{-}}{2} \). The result of this computation is shown in Table 1.

Table 1 Optimal value

Furthermore, we see that \(m_{+}(t_{-2},d|\mathbb{E}_{1}) = \min \{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-2}): \lambda \in [-1, 1]\} \). To obtain \(m_{+}(t_{-2},-d|\mathbb{E}_{1}) \), we then plot \(m_{1}\) and \(m_{2}\) as functions of λ for \(t_{-2} = -0.4\), as shown in Fig. 2.

Figure 2

\(m_{i}(-0.4, \lambda )\) for −d

Similarly, we find that

$$\begin{aligned} m_{+}(t_{-1},d|\mathbb{E}_{1}) = \min \bigl\{ f^{+}_{d(\lambda ^{2}) + \varepsilon \mathbf{e}}(t_{-1}): \lambda \in [-1, 1] \bigr\} \end{aligned}$$

and

$$\begin{aligned} m_{+}(t_{-1},-d|\mathbb{E}_{1}) = \min \bigl\{ f^{+}_{-d(\lambda ^{1}) + \varepsilon \mathbf{e}}(t_{-1}): \lambda \in [-1, 1] \bigr\} . \end{aligned}$$

For the case \(t_{0} = 0.4\), we plot \(m_{1}\) and \(m_{2}\) as functions of λ for d and −d as shown in Figs. 3 and 4, respectively.

Figure 3

\(m_{i}(0.4, \lambda )\) for d

Figure 4

\(m_{i}(0.4, \lambda )\) for −d

Conclusions

In this paper, we described an unexpected application of the hypercircle inequality for only one data error to the \(l^{\infty}\) minimization problem (16). In two different circumstances, we applied what we have learned from recent results to the problem of learning the value of a function in an RKHS, which can be beneficial in practice.

Availability of data and materials

Not applicable.

References

  1. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68, 337–404 (1950)


  2. Davis, P.J.: Interpolation and Approximation. Dover, New York (1975)


  3. Duren, P.: Theory of \(H^{p}\) Space, 2nd edn. Dover, New York (2000)


  4. Garcia, A.G., Portal, A.: Hypercircle inequalities and sampling theory. Appl. Anal. 82, 1111–1125 (2003)


  5. Golomb, M., Weinberger, H.F.: Optimal approximation and error bounds, on numerical approximation. In: Langer, R.E. (ed.) Proceedings of a Symposium, Madison, April 21–23, 1958, pp. 117–190. The University of Wisconsin Press, Madison (1959). Publication No. 1 of the Mathematics Research Center, U.S. Army, the University of Wisconsin


  6. Khompurngson, K., Micchelli, C.A.: Hide. Jaen J. Approx. 3, 87–115 (2011)


  7. Khompurngson, K., Novaprateep, B.: Hypercircle inequality for partially-corrupted data. Ann. Funct. Anal. 6, 95–108 (2015)


  8. Khompurngson, K., Novaprateep, B., Lenbury, Y.: Learning the value of a function from inaccurate data. East-West J. Math. Special Vol., 128–138 (2010)

  9. Micchelli, C.A., Rivlin, T.J.: A survey of optimal recovery. In: Micchelli, C.A., Rivlin, T.J. (eds.) Optimal Estimation in Approximation Theory. Plenum, New York (1977)


  10. Nammanee, K., Khompurngson, K.: Hypercircle inequality for data error measured with \(l^{\infty}\) norm. Jaen J. Approx. 11, 151–167 (2019)


  11. Synge, J.L.: The method of the hypercircle in function-space for boundary-value problems. Proc. R. Soc. Lond. Ser. A, Math. Phys. Sci. 191, 447–467 (1947)


  12. Valentin, R.A.: The use of the hypercircle inequality in deriving a class of numerical approximation rules for analytic functions. Math. Comput. 22, 110–117 (1968)



Acknowledgements

The authors would like to thank the referee for his/her comments and suggestions on the manuscript. This work was supported by PMU (B05F630094) and University of Phayao, Thailand.

Funding

PMU (B05F630094) and University of Phayao, Thailand.

Author information


Contributions

KK developed the theoretical part and performed the analytic calculations and numerical simulations. Both authors contributed to the final version of the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Kannika Khompungson.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Khompungson, K., Nammanee, K. A note on hypercircle inequality for data error with \(l^{1}\) norm. J Inequal Appl 2022, 87 (2022). https://doi.org/10.1186/s13660-022-02824-x


MSC

  • 54A05

Keywords

  • Hypercircle inequality
  • Convex optimization
  • Noise data