A note on hypercircle inequality for data error with $l^1$ norm

In our previous work, we extended the hypercircle inequality (HI) to situations where there is known data error. The most recent result was applied to the problem of learning a function value in a reproducing kernel Hilbert space. Specifically, a computational experiment compared the method of hypercircle, where the data error is measured with the $l^p$ norm ($1 < p \le \infty$), with the regularization method, which is a standard method for the learning problem. Despite this progress, the case of data error measured with the $l^1$ norm remains to be considered. In this paper, we not only explore the hypercircle inequality for data error measured with the $l^1$ norm, but also provide an unexpected application of the hypercircle inequality for only one data error to the $l^\infty$ minimization problem, which is the dual problem in this case.


Introduction
Many inequalities are known in connection with the approximation problem. In 1947, the hypercircle inequality was applied to boundary value problems in mathematical physics [11]. In 1959, Golomb and Weinberger demonstrated the relevance of the hypercircle inequality (HI) to a large class of numerical approximation problems [5]. The method of hypercircle, which has a long history in applied mathematics, continues to receive attention from mathematicians in several directions [4,12]. In 2011, Khompurngson and Micchelli [6] described HI and its potential application to kernel-based learning when the data is known exactly and then extended it to the situation where there is known data error (Hide). The most recent result was applied to the problem of learning a function value in a reproducing kernel Hilbert space. Specifically, a computational experiment compared the method of hypercircle, when the data error is measured with the $l^p$ norm ($1 < p \le \infty$), with the regularization method, which is a standard method for the learning problem [6,8]. We continued our research on this topic by presenting a full analysis of the hypercircle inequality for data error (Hide) measured with the $l^\infty$ norm [10]. Despite this progress, the case of data error measured with the $l^1$ norm remains to be considered. In this paper, we not only explore the hypercircle inequality for data error measured with the $l^1$ norm, but also provide an unexpected application of the hypercircle inequality for only one data error to the $l^\infty$ minimization problem, which is the dual problem in this case.
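Recently, we gave a detailed analysis of the hypercircle inequality for data error (Hide) measured with the $l^\infty$ norm [10]. Let $X = \{x_j : j \in N_n\}$ be a set of linearly independent vectors in a real Hilbert space $H$ with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, where $N_n = \{1, 2, \ldots, n\}$. The Gram matrix of the vectors in $X$ is

$$G = (\langle x_i, x_j \rangle : i, j \in N_n).$$

We define the linear operator $L : H \to \mathbb{R}^n$ as

$$Lx = (\langle x, x_j \rangle : j \in N_n), \quad x \in H.$$

Consequently, the adjoint map $L^T : \mathbb{R}^n \to H$ is given as

$$L^T a = \sum_{j \in N_n} a_j x_j, \quad a \in \mathbb{R}^n.$$

It is well known that for any $d \in \mathbb{R}^n$, there is a unique vector $x(d) \in M$ such that

$$x(d) = \arg\min\{\|x\| : x \in H,\ Lx = d\},$$

where $M$ is the $n$-dimensional subspace of $H$ spanned by the vectors in $X$; see, for example, [9].

We start with $I \subseteq N_n$ that contains $m$ elements ($m < n$) and write $J := N_n \setminus I$. For each $e = (e_1, \ldots, e_n) \in \mathbb{R}^n$, we also use the notations $e_I = (e_i : i \in I) \in \mathbb{R}^m$ and $e_J = (e_i : i \in J) \in \mathbb{R}^{n-m}$. We define the set

$$E_\infty = \{e \in \mathbb{R}^n : e_I = 0,\ |e_J|_\infty \le \varepsilon\},$$

where $\varepsilon$ is some positive number. For each $d \in \mathbb{R}^n$, we define the partial hyperellipse

$$H(d|E_\infty) := \{x \in H : \|x\| \le 1,\ Lx - d \in E_\infty\}.$$

Given $x_0 \in H$, our main goal here is to estimate $\langle x, x_0 \rangle$ for $x \in H(d|E_\infty)$. According to the midpoint algorithm, we define

$$I(x_0, d|E_\infty) := \{\langle x, x_0 \rangle : x \in H(d|E_\infty)\}.$$

We point out that $I(x_0, d|E_\infty)$ is a closed bounded subset of $\mathbb{R}$. Therefore we obtain that

$$I(x_0, d|E_\infty) = [m_-(x_0, d|E_\infty),\ m_+(x_0, d|E_\infty)].$$

Hence the best estimator is the midpoint of this interval. According to our previous work [10], we give a formula for the right-hand endpoint.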
Therefore the midpoint of the uncertainty interval $I(x_0, d|E_\infty)$ is given by

$$m(x_0, d|E_\infty) = \frac{m_-(x_0, d|E_\infty) + m_+(x_0, d|E_\infty)}{2},$$

where $m_-$ and $m_+$ denote the left-hand and right-hand endpoints of the interval. Furthermore, we describe every solution to the error bound problem (3) that is required to find the midpoint of the uncertainty interval. Specifically, the result is applied to the problem of learning the value of a function in the Hardy space of square-integrable functions on the unit circle, which has a well-known reproducing kernel. These formulas allow us to give explicitly the right-hand endpoint $m_+(x_0, d|E_\infty)$ when only the data error is known. We conjecture that the results of this case extend appropriately to the case of data error measured with the $l^1$ norm, which is our motivation to study this subject.

The paper is organized as follows. In Sect. 2, we provide basic concepts of the particular case of the hypercircle inequality for only one data error. Specifically, we provide an explicit solution of a dual problem, which we need for the main results. In Sect. 3, we solve the problem of the hypercircle inequality for data error measured with the $l^1$ norm. The main result in this section is Theorem 3.3, which establishes the solution for the $l^\infty$ minimization problem, which is the dual problem in this case. Finally, we provide an example of a learning problem in the Hardy space of square-integrable functions on the unit circle and report on numerical experiments with the proposed methods.

Hypercircle inequality for only one data error
In this section, we describe HI for only one data error and its potential relevance to kernel-based learning. Given a set $I \subseteq N_n$ that contains $n - 1$ elements, we assume that $j \notin I$. For each $e = (e_1, \ldots, e_n) \in \mathbb{R}^n$, we also use the notation $e_I = (e_i : i \in I) \in \mathbb{R}^{n-1}$. For each $d \in \mathbb{R}^n$, we define the partial hyperellipse

$$H(d, \varepsilon) := \{x \in H : \|x\| \le 1,\ \langle x, x_i \rangle = d_i \text{ for } i \in I,\ |\langle x, x_j \rangle - d_j| \le \varepsilon\}.$$

Let $x_0 \in H$. Our purpose here is to find the best estimator for $\langle x, x_0 \rangle$ knowing that $x \in H(d, \varepsilon)$. According to our previous work [7], $H(d, \varepsilon)$ is weakly sequentially compact in the weak topology on $H$. It follows that $I(x_0, d, \varepsilon) := \{\langle x, x_0 \rangle : x \in H(d, \varepsilon)\}$ fills out a closed bounded interval in $\mathbb{R}$. Clearly, the midpoint of the uncertainty interval is the best estimator for $\langle x, x_0 \rangle$ when $x \in H(d, \varepsilon)$. The hypercircle inequality for partially corrupted data then reads as follows.
Theorem 2.1 If $x_0 \in H$ and $H(d, \varepsilon) \neq \emptyset$, then there is $e_0 \in \mathbb{R}$ such that $|e_0| \le \varepsilon$ and for any $x \in H(d, \varepsilon)$, and

For the particular case $\varepsilon = 0$, let us provide an explicit HI bound and a hypercircle inequality as follows.
The inequality above guarantees the existence of a best approximation, which is obtained from the point of a hyperplane closest to the origin. Moreover, this vector is independent of the vector $x_0$. For the detailed proofs, see [2].
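A more complete description of the right-hand endpoint of the uncertainty interval may be obtained from the following results. To this end, we define the function $V : \mathbb{R}^n \to \mathbb{R}$ by

$$V(c) := \|x_0 - L^T c\| + \varepsilon |c_j| + (d, c), \quad c \in \mathbb{R}^n,$$

where $(d, c)$ denotes the Euclidean inner product on $\mathbb{R}^n$.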

Theorem 2.3 If $x_0 \notin M$ and $H(d, \varepsilon)$ contains more than one element, then

$$m_+(x_0, d, \varepsilon) = \min\{V(c) : c \in \mathbb{R}^n\}, \tag{9}$$

and the right-hand side of equation (9) has a unique solution.
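To state the midpoint of the uncertainty interval, we point out the following fact. We begin with the left-hand endpoint of the interval,

$$m_-(x_0, d, \varepsilon) = -m_+(x_0, -d, \varepsilon).$$

The midpoint is given by

$$m(x_0, d, \varepsilon) = \frac{m_+(x_0, d, \varepsilon) - m_+(x_0, -d, \varepsilon)}{2}.$$

In the remainder of this section, we provide an explicit solution to (9).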

Theorem 2.4 If $x_0 \notin M$ and $H(d, \varepsilon)$ contains more than one element, then we have: where the vector $e \in \mathbb{R}^n$ satisfies $e_I = 0$ and $|e_j| = 1$.
Proof According to our hypotheses, the minimizer $c^* \in \mathbb{R}^n$ is the unique solution of the right-hand side of equation (9).
(1) The proof follows directly from [7], that is, we can state that if By the hypercircle inequality and (8) we obtain that (3) Under our hypotheses and [7], the minimizer $c^* \in \mathbb{R}^n$ of the function $V$ is unique, and $c^*_j \neq 0$. Computing the gradient of $V$ yields which confirms that where the vector $x_+(d, \varepsilon)$ is given by Therefore we obtain that

We end this section by discussing a concrete example of the hypercircle inequality for only one data error for function estimation in a reproducing kernel Hilbert space (RKHS). Specifically, we report on a new numerical experiment in a reproducing kernel Hilbert space, in which point evaluation is given by

$$f(t) = \langle f, K_t \rangle,$$

where $K_t$ is the function defined for $s \in T$ as $K_t(s) = K(t, s)$. Moreover, for any kernel $K$, there is a unique RKHS with $K$ as its reproducing kernel [1]. In our example, we choose the Gaussian kernel on $\mathbb{R}$, that is,

The computational steps are organized in the following way. Let $T = \{t_j : j \in N_n\}$ be points of increasing order in $\mathbb{R}$. Consequently, we have a finite set of linearly independent elements $\{K_{t_j} : j \in N_n\}$ in $H$. Thus the vectors $\{x_j : j \in N_n\}$ appearing above are identified with the functions $\{K_{t_j} : j \in N_n\}$. Therefore the Gram matrix of $\{K_{t_j} : j \in N_n\}$ is given by

$$G = (K(t_i, t_j) : i, j \in N_n).$$

In our experiment, we choose the exact function $g(t) = -0.15 K_{0.5}(t) + 0.05 K_{0.85}(t) - 0.25 K_{-0.5}(t)$ and compute the vector $d = \{g(t_j) : j \in N_{12}\}$ as shown in Fig. 1.
In addition, we assume that one data value is missing, namely $g(0)$. Therefore we approximate $f(0)$ by $f_{d_I}(0) = 6.5768973$, which is obtained from the hypercircle inequality, Theorem 2.2, whereas the exact value is $g(0) = 6.576978$. Next, we wish to estimate $f(3) = \langle f, K_{t_0} \rangle$ knowing that $f(t_j) = d_j$ for all $j \in N_{12}$ and $f(0) = 6.5768973 + e$, where $|e| \le \varepsilon$. Thus our data set contains both accurate and inaccurate data, with only one data error in this case. By Theorem 2.4, we easily see that $f_{d_I} \in H(d, \varepsilon)$. Thus the best estimate of $f(3)$ is $f_{d_I}(3) = 3.137912$, whereas the exact value is $g(3) = 3.1395855$.
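The exact-data case ($\varepsilon = 0$) of the computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the kernel width (`exp(-(s - t)^2)`), the sample points `nodes`, and the evaluation point `t0` are all assumptions, since the source does not state them. The sketch builds the Gram matrix, forms the minimal-norm interpolant $x(d) = L^T G^{-1} d$, and compares the estimation error with the classical hypercircle bound $\mathrm{dist}(K_{t_0}, M)\sqrt{\|g\|^2 - \|x(d)\|^2}$, written for a ball of radius $\|g\|$ rather than the unit ball.

```python
import math

def gaussian_kernel(s, t):
    # Assumed form of the Gaussian kernel; the source does not give its width.
    return math.exp(-(s - t) ** 2)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Target g = sum_k a_k K_{s_k}, with the coefficients quoted in the text.
centers = [0.5, 0.85, -0.5]
coeffs = [-0.15, 0.05, -0.25]

def g(t):
    return sum(a * gaussian_kernel(s, t) for a, s in zip(coeffs, centers))

nodes = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]  # hypothetical sample points t_j
t0 = 0.0                                   # hypothetical evaluation point

G = [[gaussian_kernel(ti, tj) for tj in nodes] for ti in nodes]  # Gram matrix
d = [g(t) for t in nodes]                                        # exact data
k0 = [gaussian_kernel(t0, tj) for tj in nodes]                   # L K_{t0}

c = solve(G, d)            # x(d) = sum_j c_j K_{t_j}
estimate = dot(c, k0)      # <x(d), K_{t0}>, the hypercircle estimator
norm_xd_sq = dot(c, d)     # ||x(d)||^2 = d^T G^{-1} d
dist_sq = gaussian_kernel(t0, t0) - dot(solve(G, k0), k0)  # dist(K_{t0}, M)^2
norm_g_sq = sum(a * b * gaussian_kernel(s, u)
                for a, s in zip(coeffs, centers)
                for b, u in zip(coeffs, centers))          # ||g||^2

bound = math.sqrt(max(dist_sq, 0.0)) * math.sqrt(max(norm_g_sq - norm_xd_sq, 0.0))
error = abs(estimate - g(t0))
print(f"estimate={estimate:.6f}  exact={g(t0):.6f}  error={error:.2e}  bound={bound:.2e}")
```

The bound holds by the Cauchy-Schwarz inequality, because $g - x(d)$ is orthogonal to $M$; the printed error should therefore never exceed the printed bound, up to rounding.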

Hypercircle inequality for data error measured with l 1 norm
In the previous section, we provided basic concepts of the particular case of the hypercircle inequality for only one data error. We now restrict our attention to the hypercircle inequality for partially corrupted data measured with the $l^1$ norm. We start with $I \subseteq N_n$ that contains $m$ elements ($m < n$) and write $J := N_n \setminus I$. For each $e = (e_1, \ldots, e_n) \in \mathbb{R}^n$, we also use the notations $e_I = (e_i : i \in I) \in \mathbb{R}^m$ and $e_J = (e_i : i \in J) \in \mathbb{R}^{n-m}$. We define $E_1 = \{e \in \mathbb{R}^n : e_I = 0,\ |e_J|_1 \le \varepsilon\}$, where $\varepsilon$ is some positive number. For each $d \in \mathbb{R}^n$, we define the partial hyperellipse

$$H(d|E_1) := \{x \in H : \|x\| \le 1,\ Lx - d \in E_1\}.$$

As we said earlier, it follows that $H(d|E_1)$ is weakly sequentially compact in the weak topology on $H$ and $I(x_0, d|E_1) := \{\langle x, x_0 \rangle : x \in H(d|E_1)\}$ is a closed bounded interval in $\mathbb{R}$. Again, the midpoint of the uncertainty interval is the best estimator for $\langle x, x_0 \rangle$ when $x \in H(d|E_1)$. Therefore the midpoint of the uncertainty interval $I(x_0, d|E_1)$ is given by

$$m(x_0, d|E_1) = \frac{m_-(x_0, d|E_1) + m_+(x_0, d|E_1)}{2},$$

where $m_-$ and $m_+$ denote the left-hand and right-hand endpoints of the interval. Note that the data set contains both accurate and inaccurate data. In the same manner, we provide the duality formula to obtain the right-hand endpoint of the uncertainty interval $I(x_0, d|E_1)$. To this end, let us define the convex function $V : \mathbb{R}^n \to \mathbb{R}$ by

$$V(c) := \|x_0 - L^T c\| + \varepsilon |c_J|_\infty + (d, c), \quad c \in \mathbb{R}^n,$$

where $(d, c)$ denotes the Euclidean inner product on $\mathbb{R}^n$.

Proof See [10].
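The reason the $l^\infty$ norm appears in the dual problem can be seen from a short weak-duality computation. The following is a sketch under the assumption that the dual function has the standard form $V(c) = \|x_0 - L^T c\| + \varepsilon |c_J|_\infty + (d, c)$; it shows only the inequality $\langle x, x_0 \rangle \le V(c)$, not the equality proved in [10]. Take any $x \in H(d|E_1)$, write $Lx = d + e$ with $e \in E_1$, and let $c \in \mathbb{R}^n$ be arbitrary.

```latex
\begin{align*}
\langle x, x_0 \rangle
  &= \langle x, x_0 - L^T c \rangle + \langle x, L^T c \rangle \\
  &= \langle x, x_0 - L^T c \rangle + (Lx, c) \\
  &= \langle x, x_0 - L^T c \rangle + (d, c) + (e_J, c_J)
     && \text{since } e_I = 0 \\
  &\le \|x_0 - L^T c\| + (d, c) + |e_J|_1 \, |c_J|_\infty
     && \text{Cauchy--Schwarz with } \|x\| \le 1, \text{ and H\"older} \\
  &\le \|x_0 - L^T c\| + (d, c) + \varepsilon \, |c_J|_\infty .
\end{align*}
```

Taking the supremum over $x$ and the infimum over $c$ gives $m_+(x_0, d|E_1) \le \min\{V(c) : c \in \mathbb{R}^n\}$, so an $l^1$ constraint on the data error pairs with an $l^\infty$ term in the dual.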
We begin our main result of this section with a useful observation. To this end, let us introduce the following notation. For each $j \in J$, define the function $V_j : \mathbb{R}^n \to \mathbb{R}$ by

$$V_j(c) := \|x_0 - L^T c\| + \varepsilon |c_j| + (d, c), \quad c \in \mathbb{R}^n.$$

Clearly, we see that the duality formula (17) corresponds to the hyperellipse with only one data error By Theorem 2.4, if $x_0 \notin M$ and $H_j(d, \varepsilon)$ contains more than one element, then there is a unique $a^* \in \mathbb{R}^n$ such that We can now state the first result.
Proof For each $c \in \mathbb{R}^n$, we observe that $V_j(c) \le V(c)$, which means that $\min\{V_j(c) : c \in \mathbb{R}^n\} \le \min\{V(c) : c \in \mathbb{R}^n\}$. According to our assumption, we obtain that which completes the proof.
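The inequality $\min V_j \le \min V$ and the duality between the $l^1$ error constraint and the $l^\infty$ term can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' experiment: it takes $H = \mathbb{R}^3$ with $x_1, x_2, x_0$ the standard basis vectors, both data values inexact ($I = \emptyset$), hypothetical data `D = (0.5, 0.3)` and `EPS = 0.2`, and it assumes the dual functions have the form $V(c) = \|x_0 - L^T c\| + \varepsilon|c_J|_\infty + (d, c)$ and $V_j(c) = \|x_0 - L^T c\| + \varepsilon|c_j| + (d, c)$. The minimization uses a nested golden-section search, which is a generic choice for small convex problems and not a method taken from the paper.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0
D = (0.5, 0.3)   # hypothetical data vector d
EPS = 0.2        # hypothetical l^1 error budget

def golden_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi]; returns (argmin, min)."""
    a, b = lo, hi
    x1 = b - INV_PHI * (b - a)
    x2 = a + INV_PHI * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 <= f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - INV_PHI * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + INV_PHI * (b - a)
            f2 = f(x2)
    x = 0.5 * (a + b)
    return x, f(x)

def make_V(j=None):
    """Return V (j is None) or V_j (j in {0, 1}) as a function of (c1, c2)."""
    def V(c1, c2):
        c = (c1, c2)
        penalty = max(abs(c1), abs(c2)) if j is None else abs(c[j])
        # ||x0 - L^T c|| = sqrt(1 + c1^2 + c2^2) for the standard basis of R^3.
        return math.sqrt(1.0 + c1 * c1 + c2 * c2) + EPS * penalty + D[0] * c1 + D[1] * c2
    return V

def minimize_2d(V, box=5.0):
    """Nested golden-section search; valid because V is jointly convex."""
    def inner(c1):
        return golden_min(lambda c2: V(c1, c2), -box, box)[1]
    return golden_min(inner, -box, box)[1]

def primal_max(steps=400):
    """Grid search for max <x, x0> over ||x|| <= 1 and |Lx - d|_1 <= EPS."""
    best = -math.inf
    for i in range(steps + 1):
        e1 = -EPS + 2.0 * EPS * i / steps
        for k in range(steps + 1):
            e2 = -EPS + 2.0 * EPS * k / steps
            if abs(e1) + abs(e2) <= EPS + 1e-12:
                r = 1.0 - (D[0] + e1) ** 2 - (D[1] + e2) ** 2
                if r >= 0.0:
                    best = max(best, math.sqrt(r))
    return best

v_dual = minimize_2d(make_V())                      # min V
v_one = [minimize_2d(make_V(j)) for j in (0, 1)]    # min V_j for each j
v_primal = primal_max()                             # right-hand endpoint m_+
print(v_primal, v_dual, v_one)
```

In this toy problem the primal maximum and the dual minimum agree, and each single-error value stays below the $l^1$ value, with one of them attaining it.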
To study the general case, let us introduce the following notation. We first denote the set For each $\lambda$ in this set, we denote the set of linearly independent vectors Consequently, we denote by $M(X(\lambda_j))$ the $(m+1)$-dimensional linear subspace of $H$ spanned by the vectors in $X(\lambda_j)$. From now on, we denote by $G(X(\lambda_j))$ the Gram matrix of the vectors in $X(\lambda_j)$, which is symmetric and positive definite. The vector $d(\lambda_j) \in \mathbb{R}^{m+1}$ has the components Therefore we obtain the following partial hyperellipse with constant $d(\lambda_j)$: Next, this partial hyperellipse with only one data error, as in (22), corresponds to a duality formula for the right-hand endpoint of the uncertainty interval, $m_+(x_0, d(\lambda_j), \varepsilon)$, as follows. For all $j \in J$ and each such $\lambda$, we define the function $V_j(\cdot|\lambda)$ : , and $H(d|E_1)$ contains more than one point, then there are such a $\lambda$ and $j \in J$ such that

Proof According to our assumption, we can conclude that the right-hand side of equation (24) has a unique solution. Since $x_0/\|x_0\| \notin H(d|E_1)$, the vector $c^* \neq 0$. We then assume that $|c^*_J|_\infty = |c^*_j|$ for some $j \in J$. Consequently, there is such a $\lambda$ with $c^*_i = \lambda_i c^*_j$ for all $i \in J \setminus \{j\}$. Therefore we obtain that Computing the gradient of $V_j(\cdot|\lambda)$, the minimizer $c^*_{I \cup \{j\}} = a^* \in \mathbb{R}^{m+1}$ is the unique solution of the nonlinear equations where the vector $x_+(d(\lambda_j), \varepsilon)$ is given by $\in H(d(\lambda_j), \varepsilon)$, and the partial hyperellipse with the constant $d(\lambda_j)$ is

To this end, let us introduce the following set: For each such $\lambda$, we define where $m_i(\lambda) = \min\{V_i(c|\lambda) : c \in \mathbb{R}^{m+1}\}$.

Theorem 3.4 If $H(d|E_1)$ contains more than one point, then

Proof For each $\lambda$ in the set introduced above, we see that Consequently, we obtain that According to Theorem 3.2, there are such a $\lambda$ and $j \in J$ such that Therefore we can conclude that

We end this section by extending these results to estimate optimally any number of features. Let us define the function $W$ : In the case of estimating a single feature, the uncertainty set is an interval. For multiple features, the uncertainty set is a bounded set in a finite-dimensional space. Consequently, the corresponding uncertainty set is given as It is easy to check that $U(d|E_1)$ is a convex compact subset of $\mathbb{R}^k$. To get the best estimator, we need to find the center and radius of $U(d|E_1)$. We recall the Chebyshev radius and center. For this purpose, we choose the $l^\infty$ norm $|\cdot|_\infty$ on $\mathbb{R}^k$ and define the radius of $U(d|E_1)$ as We denote its center as $m_\infty \in \mathbb{R}^k$. In the theorem below, we show that the $l^\infty$ center of the set $U(d|E_1)$ is given by the vector $m_\infty = (m(x_{-k+j}, d|E_1) : j \in N_k)$, where $m(x_{-k+j}, d|E_1)$ is the center of the interval $I(x_{-k+j}, d|E_1)$ for all $j \in N_k$.

Theorem 3.5 If $H(d|E_1) \neq \emptyset$, then the $l^\infty$ center of the uncertainty set is $m_\infty = (m(x_{-k+j}, d|E_1) : j \in N_k)$, and its radius is given by

Proof This follows by the same method as in [6].
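The coordinatewise structure of the $l^\infty$ center can be made concrete with a small sketch. It assumes the endpoints of each single-feature uncertainty interval are already available; the function name `chebyshev_center_linf` and the sample intervals are hypothetical. For the $l^\infty$ norm, the Chebyshev center of a bounded set is the vector of midpoints of its coordinate projections, and the radius is the largest half-length, because $\sup_u \max_j |u_j - y_j| = \max_j \sup_u |u_j - y_j|$.

```python
def chebyshev_center_linf(intervals):
    """l-infinity Chebyshev center and radius from coordinate intervals.

    intervals: list of (lo, hi) pairs, one per feature, giving the projection
    of the uncertainty set onto that coordinate.
    """
    center = [(lo + hi) / 2.0 for lo, hi in intervals]
    radius = max((hi - lo) / 2.0 for lo, hi in intervals)
    return center, radius

# Hypothetical uncertainty intervals for k = 3 features.
center, radius = chebyshev_center_linf([(0.0, 1.0), (-2.0, 2.0), (0.5, 0.75)])
print(center, radius)
```

Each feature can therefore be handled by the single-feature midpoint computation, and only the radius couples the features through a maximum.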
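Finally, we present some results of a numerical experiment on estimating multiple features of a vector in the partial hyperellipse $H(d|E_1)$. For our computational experiments, we choose the Hardy space of square-integrable functions on the unit circle with reproducing kernel

$$K(z, \zeta) = \frac{1}{1 - \bar{\zeta} z}, \quad z, \zeta \in \Delta,$$

where $\Delta := \{z : |z| \le 1\}$ [3]. Specifically, let $H^2(\Delta)$ be the set of all functions analytic in the unit disc with norm Let $T = \{t_j : j \in N_n\}$ be points of increasing order in $(-1, 1)$. Consequently, we have a finite set of linearly independent elements $\{K_{t_j} : j \in N_n\}$ in $H$, where

$$K_{t_j}(t) := \frac{1}{1 - t_j t}, \quad t \in \Delta.$$

Thus the vectors $\{x_j : j \in N_n\}$ appearing above are identified with the functions $\{K_{t_j} : j \in N_n\}$. Therefore the Gram matrix of $\{K_{t_j} : j \in N_n\}$ is given by

$$G(t_1, \ldots, t_n) = \left(\frac{1}{1 - t_i t_j} : i, j \in N_n\right).$$

Next, we recall the Cauchy determinant defined for $\{t_j : j \in N_n\}$ and $\{s_j : j \in N_n\}$ as

$$\det\left(\frac{1}{1 - t_i s_j}\right)_{i, j \in N_n} = \frac{\prod_{1 \le j < i \le n} (t_j - t_i)(s_j - s_i)}{\prod_{i, j \in N_n} (1 - t_i s_j)};$$

see, for example, [2]. From this formula we obtain that

$$\det G(t_1, \ldots, t_n) = \frac{\prod_{1 \le j < i \le n} (t_j - t_i)^2}{\prod_{i, j \in N_n} (1 - t_i t_j)}.$$

In our case, for any $t_0 \in (-1, 1)$ with $t_0 \notin T := \{t_j : j \in N_n\}$, we obtain that

$$\operatorname{dist}(K_{t_0}, M) = \frac{|B(t_0)|}{\sqrt{1 - t_0^2}},$$

where $B$ is the rational function defined for $t \in \mathbb{C} \setminus \{t_j^{-1} : j \in N_n\}$ by

$$B(t) := \prod_{j \in N_n} \frac{t - t_j}{1 - t\, t_j},$$

and the vector $x_0$ appearing previously is identified with the function $K_{t_0}$. We organize the computational steps as follows. We choose a finite set of linearly independent elements $\{K_{t_j} : j \in N_6\}$ in $H$ with $Lf := (f(t_i) : i \in N_6) = (\langle f, K_{t_i} \rangle : i \in N_6)$.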
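The distance formula for the Hardy kernel can be checked against a direct Gram-matrix computation. The sketch below is illustrative only: the sample points `nodes` and the evaluation point `t0` are hypothetical, and the closed form $|B(t_0)|/\sqrt{1 - t_0^2}$ is the one that follows from the Cauchy determinant for the kernel $1/(1 - st)$ at real points. The direct route uses the standard identity $\mathrm{dist}^2(K_{t_0}, M) = K(t_0, t_0) - k_0^T G^{-1} k_0$, where $k_0 = (K(t_0, t_j) : j \in N_n)$.

```python
import math

def hardy_kernel(s, t):
    # Hardy-space reproducing kernel restricted to real points in (-1, 1).
    return 1.0 / (1.0 - s * t)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def dist_via_gram(t0, nodes):
    """dist(K_t0, M) from the Gram matrix: sqrt(K(t0,t0) - k0^T G^{-1} k0)."""
    G = [[hardy_kernel(ti, tj) for tj in nodes] for ti in nodes]
    k0 = [hardy_kernel(t0, tj) for tj in nodes]
    z = solve(G, k0)
    sq = hardy_kernel(t0, t0) - sum(zi * ki for zi, ki in zip(z, k0))
    return math.sqrt(max(sq, 0.0))

def dist_via_blaschke(t0, nodes):
    """dist(K_t0, M) from the closed form |B(t0)| / sqrt(1 - t0^2)."""
    B = 1.0
    for tj in nodes:
        B *= (t0 - tj) / (1.0 - t0 * tj)
    return abs(B) / math.sqrt(1.0 - t0 * t0)

nodes = [-0.6, -0.2, 0.3, 0.7]   # hypothetical points t_j in (-1, 1)
t0 = 0.5                         # hypothetical evaluation point, not a node

dist_gram = dist_via_gram(t0, nodes)
dist_blaschke = dist_via_blaschke(t0, nodes)
print(dist_gram, dist_blaschke)
```

The closed form avoids solving a linear system with the Gram matrix, which becomes ill-conditioned as the points cluster or approach the boundary of the interval.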
To obtain the right-hand endpoint, we need to find the minimum of $W$ defined for $\lambda \in [-1, 1]$. As explained earlier, the midpoint algorithm requires us to find numerically the minimum of the function $V$ for $d$ and $-d$, that is, we compute $v_\pm := \min\{V(c, \pm d) : c \in \mathbb{R}^n\}$; see Table 1.
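Once the two minima $v_+$ and $v_-$ are available, the last step of the midpoint algorithm is elementary. The sketch below assumes the relations $m_+ = v_+$ and $m_- = -v_-$, which follow from the symmetry $x \mapsto -x$ of the constraint set when $d$ is replaced by $-d$; the function name `uncertainty_interval` and the sample values are hypothetical.

```python
def uncertainty_interval(v_plus, v_minus):
    """Endpoints, midpoint and radius of the uncertainty interval.

    v_plus  = min V(c, d)   (right-hand endpoint for the data d)
    v_minus = min V(c, -d)  (right-hand endpoint for the data -d)
    """
    m_plus = v_plus
    m_minus = -v_minus
    midpoint = 0.5 * (v_plus - v_minus)   # best estimator
    radius = 0.5 * (v_plus + v_minus)     # worst-case error of the estimator
    return m_minus, m_plus, midpoint, radius

# Hypothetical minima, chosen only to illustrate the arithmetic.
interval = uncertainty_interval(0.75, 0.25)
print(interval)
```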
Similarly, we find that

Conclusions
In this paper, we described an unexpected application of the hypercircle inequality for only one data error to the $l^\infty$ minimization problem (16). In two different settings, we applied these results to the problem of learning the value of a function in an RKHS, which can be beneficial in practice.