An axiomatic integral and a multivariate mean value theorem

In order to investigate minimal sufficient conditions for an abstract integral to belong to the convex hull of the integrand, we propose a system of axioms under which it happens. If the integrand is a continuous $R^n$-valued function over a path connected topological space, we prove that any such integral can be represented as a convex combination of values of the integrand in at most $n$ points, which yields an ultimate multivariate mean value theorem.


Introduction and motivation
The basic integral mean value theorem states that for a function X which is continuous on the interval [a, b], there exists a point t ∈ (a, b) such that

$$\frac{1}{b-a}\int_a^b X(s)\,ds = X(t). \tag{1}$$

To show that the assumption of continuity is crucial for the validity of this theorem, we can take the interval [−1, 1] and define X(s) = −1 for s ∈ [−1, 0) and X(s) = 1 for s ∈ [0, 1]. The normalized integral of X over [−1, 1] equals 0, whereas X takes only the values −1 and 1. Hence here we do not have a single point t ∈ (−1, 1) for (1) to be satisfied. However, we can achieve a similar result with a convex combination of values of X at two points: with a = −1 and b = 1,

$$\frac{1}{b-a}\int_a^b X(s)\,ds = \frac{1}{2} X(t_1) + \frac{1}{2} X(t_2), \qquad t_1 \in (-1, 0),\ t_2 \in (0, 1).$$
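The example with the step function can be checked numerically. The following Python sketch is only an illustration: the helper name `normalized_integral`, the midpoint rule, and the sample points −0.5 and 0.5 are our own choices and do not come from the source.

```python
def X(s):
    """Step function from the example: -1 on [-1, 0), +1 on [0, 1]."""
    return -1.0 if s < 0 else 1.0


def normalized_integral(f, a, b, m=10_000):
    """Midpoint-rule approximation of (1/(b-a)) * integral of f over [a, b]."""
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) for i in range(m)) * h / (b - a)


# Normalized integral of X over [-1, 1].
mean = normalized_integral(X, -1.0, 1.0)

# Values attained by X on a grid of [-1, 1]: only -1 and +1, never the mean.
values = {X(-1.0 + 2.0 * i / 1000) for i in range(1001)}

# Convex combination of values at one point of (-1, 0) and one point of (0, 1).
two_point = 0.5 * X(-0.5) + 0.5 * X(0.5)

print(mean, sorted(values), two_point)
```

The computed mean is 0, a value that X does not attain, while the two-point convex combination reproduces it.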
It turns out that this difference of one extra point for non-continuous functions persists in a much more general setting, for a very broad class of integrals and in higher dimensions. This is the topic of this article.
In the multivariate case, with X ∈ R^n, n ≥ 1, there is an old and seemingly forgotten mean value theorem by Kowalewski [6,7] as follows.
Theorem B [4]. Let C : s → X(s), s ∈ I, be a continuous curve in R^n, where I ⊂ R is an interval, and let K be the convex hull of the curve C. Then each v ∈ K can be represented as a convex combination of n or fewer points of the curve C.
In this way, a more general mean value theorem for n-dimensional functions is obtained directly from the fact that the normalized integral belongs to the convex hull of the set of values of the integrand. This fact is proved in the following theorem for the Lebesgue integral with respect to a probability measure.
Theorem C [4, Lemma 1]. Let (S, F, µ) be a probability space, and let X_i : S → R, i = 1, …, n, be µ-integrable functions. Let X(t) = (X_1(t), …, X_n(t)) for every t ∈ S. Then

$$\int_S X(t)\,d\mu(t) \in K,$$

where K is the convex hull of the set X(S) = {X(t) | t ∈ S} ⊂ R^n.
Finally, the main result of [4] reads as follows.
Theorem D [4, Theorem 2]. For an interval I ⊆ R, let µ be a finite positive measure on the Borel sigma-field of I. Let X_k, k = 1, …, n, n ≥ 1, be continuous functions on I, integrable on I with respect to the measure µ. Then there exist points t_1, …, t_n in I, and non-negative numbers λ_1, …, λ_n, with λ_1 + ··· + λ_n = 1, such that

$$\int_I X_k(t)\,d\mu(t) = \mu(I)\sum_{i=1}^n \lambda_i X_k(t_i), \qquad k = 1, \dots, n. \tag{4}$$

Let us note that without the continuity assumption we may still use Carathéodory's convex hull theorem, which would yield (4) with n + 1 points t_i and the same number of λ_i's. This shows that the example at the beginning of the text describes the general situation in R^n well.
In this paper we give a more general theorem of the type (4), tracing the steps of the proof in [4] in a much more general context. In Section 3 we show that a result like (4) holds if the integral over I is replaced with a general linear functional on some function space, under a system of axioms, whereas the interval I can be replaced by a topological space which is path connected.
To reach this goal, we need to extend Theorem C. In Section 2 we show that Theorem C holds for any linear functional which satisfies a condition slightly stronger than positivity, and where the functions X_k are defined over an arbitrary nonempty set.
Applications of such a general mean value theorem are numerous, and we do not discuss particular applications in this paper. Let us just mention that, as shown in [4], theorems of this type can be considered as an aid in constructing quadrature rules or their approximate versions; see also the recent paper [3] for another application related to integrals. Another advantage of the approach presented in this paper is that the results apply to different kinds of integrals, treated as linear functionals over some space of functions.

Axioms and their consequences
We start with an arbitrary nonempty set S with an algebra F (which may be a sigma algebra as well) of its subsets. Therefore, F contains S, and if a set A is the result of finitely many set operations over sets in F, then A ∈ F. Let S be a family of functions X : S → R which satisfies the following conditions:

C1 If X_1, X_2 ∈ S, then aX_1 + bX_2 ∈ S for all a, b ∈ R.
C2 For B ∈ F, the indicator function I_B(·) belongs to S.
C3 For X ∈ S and any interval J, the set {s ∈ S : X(s) ∈ J} is in F.
C4 For X ∈ S such that X(s) ≥ 0 for all s ∈ S, it holds that X · I_{X∈J} ∈ S for any interval J.
Note that from C1 and C2 it follows that all constants are in S. Let us also note that functions in S are not assumed to be bounded.
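Let E be a functional defined on S and taking values in R such that the following axioms hold.

A1 E 1 = 1, where 1 denotes the constant function I_S.
A2 E (aX_1 + bX_2) = a E X_1 + b E X_2 for all X_1, X_2 ∈ S and all a, b ∈ R.

Now we may define a set function P as

$$P(B) := E(I_B), \qquad B \in F, \tag{5}$$

and consider yet another condition related to P:

C5 If X ∈ S and P(N) = 0, then X · I_N ∈ S, and E (X · I_N) = 0.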
The last axiom that we propose is

A3 For X ∈ S, if E X = 0 then either P(X = 0) = 1 or there exist s_1, s_2 ∈ S such that X(s_1)X(s_2) < 0,

or equivalently (see Lemma 2.3 below)

A3′ For X ∈ S, if X(s) ≥ 0 for all s ∈ S and E X = 0, then P(X = 0) = 1.
Finally, we extend the functional E to act on functions with values in R^n. Let X = (X_1, …, X_n) be a function from S to R^n, where we assume that X_i, i = 1, …, n, satisfy the axioms and conditions above; then we define

$$E X := (E X_1, \dots, E X_n).$$

The central result of this section is Theorem 2.10, where we show that under A1-A2-A3, E X belongs to the convex hull of X(S).
A similar axiomatic approach is applied in Daniell's integral, and there are other axiomatic systems in the literature for different purposes, as in [2] for Riemann integrals in connection with evaluating the length of a curve, general means in [8], finitely additive probabilities (FAPs) in [10], and applications in a recent article [11]. The system of axioms applied in this article differs from others in the literature in the conditions that allow non-absolute integrals, as well as in Axiom A3, which is a slightly stronger condition than the usual positivity. The reason for introducing this system of axioms is that it provides conditions under which E X belongs to the convex hull of X(S) (Theorem 2.10), independently of the kind of integral that is considered. Now we are going to derive some additional properties as consequences of the axioms.
2.1 Lemma. Under the system of axioms A1-A2-A3 or A1-A2-A3′, assuming also conditions C1-C2, the set function P defined on F with (5) is a finitely additive probability on (S, F).
Proof. Since I_S(s) = 1 for all s ∈ S, we have that P(S) = 1. For disjoint sets B_1, …, B_m ∈ F, using A2 we get additivity:

$$P\Big(\bigcup_{i=1}^m B_i\Big) = E\Big(\sum_{i=1}^m I_{B_i}\Big) = \sum_{i=1}^m E(I_{B_i}) = \sum_{i=1}^m P(B_i).$$

Let us now show that P(B) ≥ 0 for all B ∈ F. Indeed, suppose that P(B) = −ε for some ε > 0. This implies (by A1 and A2) that E (I_B + ε) = 0; on the other hand, I_B(s) + ε > 0 for all s ∈ S, which contradicts both A3 and A3′, and this ends the proof.
From now on, the quintuplet (S, F , S, E , P ) will be assumed to be as defined above in the framework of axioms A1-A2-A3 and conditions C1 − C5 (if not specified differently). The letter X will be reserved for elements of S, and P is a set function derived from E as in (5).

2.3 Lemma.
Assuming that A1 and A2 hold, the axioms A3 and A3′ are equivalent. Proof. In Lemma 2.1 we already proved that P is a FAP under either A3 or A3′, so we may use that property in both parts of the present proof.
Due to the equivalence established in Lemma 2.3, in the rest of the article we refer to A3 as being either A3 or A3′.
2.4 Remark. Suppose that we have a quintuplet (S, F, S, E, P) that satisfies conditions C1-C5 and axioms A1-A2, where P is defined with (5). Next, for each X ∈ S we define the function X* := X|_{S*}, that is, X restricted to S*, and let S* be the space of all those mappings. We define a linear functional E* on S* as

$$E^* X^* := E(X \cdot I_{S^*}),$$

and the set function P* as

$$P^*(B) := E^*(I_B), \qquad B \in F^*.$$

In this way we get a new quintuplet (S*, F*, S*, E*, P*), and it is not difficult to see that the new quintuplet inherits conditions C1-C5 and axioms A1-A2, as well as A3 if it is satisfied by the original quintuplet.
In the next lemma we prove Markov's inequality from the axioms.
2.5 Lemma. Let X ∈ S and X(s) ≥ 0 for all s ∈ S. Then, for every ε > 0,

$$P(X > \varepsilon) \le \frac{E X}{\varepsilon}.$$

Proof. Since X ≥ 0, we use C4 to conclude that E X = E (X · I_{0≤X≤ε}) + E (X · I_{X>ε}).
Proof. Let X ∈ S be such that X ≥ 0 and E X = 0. We need to show that P(X > 0) = 0. By Markov's inequality we have that P(X > ε) = 0 for any ε > 0, and so, using countable additivity, we get

$$P(X > 0) = \lim_{m \to \infty} P(X > 1/m) = 0,$$

as desired.
2.7 Remark. Consider the case where P is a countably additive probability on (S, F), F is a sigma algebra, and E X = ∫_S X(s) dP(s). Axioms A1-A2 and conditions C1-C5 are clearly satisfied, and Markov's inequality can be proved from properties of the integral, so by Lemma 2.6, the axiom A3 also holds.
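Remark 2.7 can be illustrated on a finite set, where every probability is countably additive. The sketch below assumes a hypothetical four-point space with weights chosen by us; it defines E as a weighted sum, derives P from E through indicator functions as in (5), and checks Markov's inequality in the form P(X > ε) ≤ E X/ε for a few values of ε. None of the names or numbers come from the source.

```python
from fractions import Fraction

# Hypothetical finite model (our assumption): S = {0, 1, 2, 3} with these weights.
WEIGHTS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]


def E(X):
    """Linear functional E X = sum over s in S of weight(s) * X(s)."""
    return sum(w * X(s) for s, w in enumerate(WEIGHTS))


def P(event):
    """Set function derived from E through the indicator of the event."""
    return E(lambda s: 1 if event(s) else 0)


def X(s):
    """A non-negative test function on S, taking the values 0, 1, 4, 9."""
    return Fraction(s * s)


def markov_holds(eps):
    """Check P(X > eps) <= E X / eps for one value of eps > 0."""
    return P(lambda s: X(s) > eps) <= E(X) / eps


checks = [markov_holds(Fraction(e)) for e in (Fraction(1, 2), 1, 2, 5)]
total_mass = P(lambda s: True)   # normalization: P(S) = E(1)
zero_case = E(lambda s: 0)       # E of the zero function

print(E(X), checks, total_mass, zero_case)
```

Exact rational arithmetic is used so that the comparisons are not affected by rounding.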
Let us now recall some facts about FAPs. A probability P which is defined on an algebra F of subsets of the set S is purely finitely additive if ν ≡ 0 is the only countably additive measure with the property that ν(B) ≤ P(B) for all B ∈ F. A purely finitely additive probability P is strongly finitely additive (SFAP) if there exist countably many disjoint sets H_1, H_2, … ∈ F such that ⋃_i H_i = S and P(H_i) = 0 for all i.
For every probability P on F there exist a countably additive probability P_c and a purely finitely additive probability P_d such that P = λP_c + (1 − λ)P_d for some λ ∈ [0, 1]. This decomposition is unique (except for λ = 0 or λ = 1, when it is trivially non-unique).

2.8 Lemma.
Assuming axioms A1-A2, conditions C1-C5 and positivity, if P is an SFAP, then Axiom A3 is not satisfied.
2.9 Example. Let S = [0, +∞) and let P be the probability defined by the non-principal ultrafilter of Banach limit as s → +∞. Let X(s) = e^{−s}. Then X ≥ 0 and E X = 0, but P(X = 0) = 0. In this case the convex hull is K(X) = (0, 1], and E X ∉ K(X).
2.10 Theorem. Let (S, F , S, E , P ) be a quintuplet as defined above, and let X = (X 1 , . . . , X n ), where X i ∈ S for all i. Assuming that axioms A1-A2-A3 and conditions C1-C5 hold, E X belongs to the convex hull of the set X(S) = {X(t) | t ∈ S} ⊂ R n .
Proof. Without loss of generality we may assume that E X_i = 0 for all i (otherwise, if E X_i = c_i, we can consider X_i − c_i, for which E (X_i − c_i) = 0). Let K denote the convex hull of the set X(S) ⊂ R^n. We now prove that 0 ∈ K by induction on n. Let n = 1. By A3, E X = 0 implies that either X(s) = 0 for some s ∈ S or there are s_1, s_2 ∈ S such that X(s_1) > 0 and X(s_2) < 0. In both cases it follows that 0 ∈ K. Now assume that the statement of the theorem is valid for all dimensions from 1 to n − 1 and for all quintuplets (S, F, S, E, P) that satisfy the conditions mentioned in the statement of the theorem. Let now X be a vector function with values in R^n.
If every hyperplane π that contains 0 has the property that the set X(S) has a non-empty intersection with both of the two open half-spaces with π as boundary, then 0 ∈ K (see [9] for details). Otherwise, suppose that

$$L(t) := \sum_{k=1}^n a_k X_k(t) \ge 0 \quad \text{for all } t \in S, \tag{8}$$

with some real numbers a_1, …, a_n such that a_1^2 + ··· + a_n^2 > 0. By linearity (A2) we have that E L = 0, which, by A3′, is possible together with (8) only if L(t) = 0 for all t ∈ S \ N, where P(N) = 0. Assuming that a_n ≠ 0, we find that

$$X_n(s) = -\sum_{k=1}^{n-1} \frac{a_k}{a_n} X_k(s) \quad \text{for every } s \in S \setminus N. \tag{9}$$
In other words, a separating hyperplane exists only if there exists a linear relation among the n given functions with probability one. In order to eliminate X_n and to reduce the system to n − 1 functions, we consider the functions X*_i(s) = X_i(s) on the restricted domain S* = S \ N (i = 1, …, n − 1) and the corresponding functional E*. Let K* be the convex hull of X*(S*) ⊂ R^{n−1}. By the hereditary property (Remark 2.4), we have that E*(X*_i) = E (X_i · I_{S\N}) = 0, by C5 and linearity. Note that K* ⊂ K. By the induction assumption, the statement of the theorem holds for dimension n − 1, and so

$$\sum_{j=1}^m \lambda_j X_i(t_j) = 0, \qquad i = 1, \dots, n-1, \tag{10}$$

with some t_1, …, t_m ∈ S* ⊂ S and some λ_j ≥ 0 with λ_1 + ··· + λ_m = 1 (here we use the fact that X*_i(s) = X_i(s) for s ∈ S*). Finally, using (9) and (10) we find that also

$$\sum_{j=1}^m \lambda_j X_n(t_j) = 0,$$

and so the statement of the theorem holds for dimension n.
2.11 Remark. Theorem 2.10 provides sufficient conditions for E X to belong to the convex hull of X(S). However, by inspection of the proof, we can see that Axiom A3 is also necessary, assuming A1-A2 and conditions C1-C5.
Now, as a corollary to Theorem 2.10, using Carathéodory's theorem on the representation of the convex hull in finite dimension, we get the following result.
2.12 Theorem. Assume that axioms A1-A2-A3 and conditions C1-C5 hold on (S, F, S, E, P). Let X = (X_1, …, X_n), where X_i ∈ S for all i. Then there are points t_0, …, t_n ∈ S and a discrete probability law given by probabilities λ_0, …, λ_n so that

$$E X_i = \sum_{j=0}^n \lambda_j X_i(t_j), \qquad i = 1, \dots, n. \tag{12}$$

Theorem 2.12 is the most general mean value theorem for the axiomatic integral. Due to Remark 2.7, the statement of this theorem applies with E X_i = ∫_S X_i(s) dµ(s), where µ is a countably additive probability measure on (S, F), F is a sigma algebra, and X_i are (S, F) − (R, B) measurable and integrable functions (B is the Borel sigma field on R).
In the next section we consider the case of continuous functions X i .

Mean value theorem for continuous multivariate mappings
3.1 Definition. A path from a point a to point b in a topological space S is a continuous mapping f : [0, 1] → S such that f (0) = a and f (1) = b. A space S is path connected if for any two points a, b ∈ S there exists a path that connects them.
Let us remark that any topological vector space is path connected. A path that connects points a and b is given by f(λ) = λb + (1 − λ)a, λ ∈ [0, 1]. The same is true for a convex subset S of any topological vector space.
The following result is a generalization of Theorem B.
3.2 Theorem. Let X : t → X(t), t ∈ S, be a continuous function defined on a path connected topological space S with values in R^n, and let K be the convex hull of the set X(S). Then each v ∈ K can be represented as a convex combination of n or fewer points of the set X(S).
Proof. By Carathéodory's theorem, any v ∈ K can be represented as a convex combination of at most n + 1 points of the set X(S). Without loss of generality, assume that v = 0. Therefore, there exist t_j ∈ S and v_j ≥ 0, 0 ≤ j ≤ n, such that t_i ≠ t_j for i ≠ j, v_0 + ··· + v_n = 1, and

$$\sum_{j=0}^n v_j X(t_j) = 0. \tag{13}$$

We may also assume that the n + 1 points X(t_j) do not belong to one hyperplane in R^n (in particular, X(t_i) ≠ X(t_j) for i ≠ j) and that the numbers v_j are all positive; otherwise, at least one term from (13) can be eliminated. Now we apply the following reasoning. Denote by p_j(x), 1 ≤ j ≤ n, the coordinates of the vector x ∈ R^n with respect to the coordinate system with the origin at 0 and with the basis consisting of the vectors X(t_j), j = 1, …, n (that is, x = Σ_{j=1}^n p_j(x) X(t_j)). Then from (13) we find that p_j(X(t_0)) = −v_j/v_0 < 0, j = 1, …, n, i.e. the coordinates of the vector X(t_0) are negative. The coordinates of the vectors X(t_j), j = 1, 2, …, n, are non-negative: p_j(X(t_j)) = 1 and p_k(X(t_j)) = 0 for k ≠ j.
Now consider a path t = t(λ), λ ∈ [0, 1], which connects the points t_0 and t_1, so that t(0) = t_0 and t(1) = t_1. The functions f_j(λ) := p_j(X(t(λ))) are continuous as mappings from [0, 1] to R, and f_j(0) < 0 for all j = 1, …, n, whereas f_1(1) = 1 and f_j(1) = 0 for j > 1. Therefore, for each of the functions f_j there exists at least one point λ ∈ (0, 1] such that f_j(λ) = 0. Since the set N = ⋃_{j=1}^n f_j^{−1}({0}) is a closed and non-empty subset of [0, 1] which does not contain 0, there exists λ_0 = min N, and λ_0 > 0. Let t̄ := t(λ_0). From this construction it follows that there exists (at least one) k such that p_k(X(t̄)) = 0 and p_j(X(t̄)) ≤ 0 for j ≠ k. Hence,

$$X(\bar t) - \sum_{j \ne k} p_j(X(\bar t))\, X(t_j) = 0, \qquad -p_j(X(\bar t)) \ge 0,$$

and, after dividing by 1 − Σ_{j≠k} p_j(X(t̄)) > 0, it follows that v = 0 is a convex combination of the points X(t̄) and X(t_j), j = 1, …, n, j ≠ k.
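The reduction step in the proof can be traced numerically for n = 2. The sketch below is our own illustration under stated assumptions: the curve is the unit circle X(t) = (cos t, sin t), the three starting points are t_0 = 0, t_1 = 2π/3, t_2 = 4π/3 with equal weights 1/3, the path from t_0 to t_1 is the straight segment in the parameter, and the first zero λ_0 is located by a grid scan followed by bisection (a numerical shortcut that could miss a zero on a coarser grid). All function names are hypothetical.

```python
import math


def X(t):
    """Continuous curve in R^2 (the unit circle); an illustrative choice."""
    return (math.cos(t), math.sin(t))


def coords(x, b1, b2):
    """Coordinates (p1, p2) of x in the basis b1, b2, via Cramer's rule."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    p1 = (x[0] * b2[1] - x[1] * b2[0]) / det
    p2 = (b1[0] * x[1] - b1[1] * x[0]) / det
    return (p1, p2)


def reduce_to_two_points(t0, t1, t2, steps=1000, iters=60):
    """Given 0 = v0 X(t0) + v1 X(t1) + v2 X(t2) with all v_j > 0, return
    (t_bar, t_keep, w_bar, w_keep) with 0 = w_bar X(t_bar) + w_keep X(t_keep)."""
    b1, b2 = X(t1), X(t2)

    def path(lam):
        return t0 + lam * (t1 - t0)          # path from t0 to t1

    def g(lam):
        return max(coords(X(path(lam)), b1, b2))   # max_j f_j(lam)

    # Scan for the first grid point where some coordinate is no longer negative.
    lo, hi = 0.0, 1.0
    for i in range(1, steps + 1):
        if g(i / steps) >= 0.0:
            lo, hi = (i - 1) / steps, i / steps
            break
    # Bisection keeps the invariant g(lo) < 0 <= g(hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    t_bar = path(hi)
    p = coords(X(t_bar), b1, b2)
    k = 0 if abs(p[0]) <= abs(p[1]) else 1   # index of the vanishing coordinate
    j = 1 - k                                # index of the point that is kept
    w_bar = 1.0 / (1.0 - p[j])               # weights after normalization
    return t_bar, (t1, t2)[j], w_bar, 1.0 - w_bar


T0, T1, T2 = 0.0, 2 * math.pi / 3, 4 * math.pi / 3
t_bar, t_keep, w_bar, w_keep = reduce_to_two_points(T0, T1, T2)
residual = [w_bar * a + w_keep * c for a, c in zip(X(t_bar), X(t_keep))]
print(t_bar, t_keep, w_bar, w_keep, residual)
```

For this choice the procedure stops at a parameter t̄ close to π/3 and keeps t_2 = 4π/3, so that 0 is represented as the midpoint of two antipodal points of the circle.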
As a direct corollary to Theorems 2.10 and 3.2, we have the following mean value theorem for continuous multivariate mappings.
3.3 Theorem. Let (S, F, S, E, P) be a quintuplet as defined in Section 2, where S is a path connected topological space. Under conditions C1-C5 and axioms A1-A3, let X_i, i = 1, …, n, be continuous functions from the family S. Then there exist points t_1, …, t_n in S, and non-negative numbers λ_1, …, λ_n, with λ_1 + ··· + λ_n = 1, such that

$$E X_k = \sum_{i=1}^n \lambda_i X_k(t_i), \qquad k = 1, \dots, n.$$

3.4 Remark. Since Theorem 3.2 is independent of the axioms and conditions of Section 2, the statement of Theorem 3.3 holds whenever Theorem 2.12 holds. In particular, it holds whenever E X_i = ∫_S X_i(s) dµ(s), where µ is a countably additive probability measure.
In fact, all that Theorem 3.3 says is that we can save one point in the representation (12) of Theorem 2.12. Although this may not look very significant, in some applications it makes a difference. For example, Karamata's representation for covariance in [5], based on Kowalewski's original result for n = 2, depends strongly on having two points, and nothing similar can be derived with three points.

Some particular cases and open problems
4.1 Riemann and Lebesgue integral on R^d. For d = 1, let −∞ < a < b < +∞ and let X be a Riemann integrable function on [a, b]. Define

$$E X = \frac{1}{b-a}\int_a^b X(s)\,ds.$$

In terms of the previous notation, S = [a, b], and S is the family of Riemann integrable functions on S. A natural choice for the algebra F of sets related to the Riemann integral is the algebra of intervals in [a, b], which can be defined as the collection of all subintervals of [a, b] (including singletons and the empty set) and their finite unions. The corresponding probability is then defined as follows: for disjoint intervals J_1, …, J_m ⊂ [a, b],

$$P\Big(\bigcup_{i=1}^m J_i\Big) = \frac{1}{b-a}\sum_{i=1}^m \lambda(J_i),$$

where λ(J_i) is the length of J_i. This is the Jordan probability measure on [a, b], and it is well known that, even for continuous bounded functions, the set {s ∈ S : X(s) ≤ c}, where c is a real number, does not necessarily belong to F (this was probably first shown by an example in [1]). This fact makes it impossible to use our system of axioms directly, because condition C3 does not hold. Nevertheless, we can proceed by noticing that the algebra F as described above is a sub-algebra of the Borel sigma algebra B, generated by the open sets in [a, b]; since the Riemann and Lebesgue integrals coincide if the integrand is Riemann integrable, we can work with the Lebesgue integral instead.
In more common notation, let us consider the general case of a functional E based on the Lebesgue integral on R^d, d ≥ 1:

$$E f = \frac{1}{V(D)}\int_D f(s_1, \dots, s_d)\,ds_1 \cdots ds_d,$$

or, in shorthand,

$$E f = \frac{1}{V(D)}\int_D f(x)\,d\lambda(x), \tag{15}$$

where λ is the Lebesgue measure on R^d restricted to D, and V(D) = λ(D) > 0, where D is a convex (or, more generally, path connected) subset of R^d. The underlying probability measure in our construction of Section 2 is P(·) = λ(·)/V(D). Let f_1, …, f_n be continuous functions D → R such that E f_i, as defined in (15), exists for each i. Then, by Theorem 3.3, there are points x_1, …, x_n ∈ D and non-negative numbers λ_1, …, λ_n with λ_1 + ··· + λ_n = 1 such that

$$\begin{aligned}
\frac{1}{V(D)}\int_D f_1(s_1, \dots, s_d)\,ds_1 \cdots ds_d &= \lambda_1 f_1(x_1) + \cdots + \lambda_n f_1(x_n),\\
&\ \ \vdots\\
\frac{1}{V(D)}\int_D f_n(s_1, \dots, s_d)\,ds_1 \cdots ds_d &= \lambda_1 f_n(x_1) + \cdots + \lambda_n f_n(x_n).
\end{aligned}$$

In words, this result shows that for an arbitrary system of n integrals with continuous integrands, there exists an exact quadrature rule with n points in D, with coefficients λ_i which are the same for all integrals. Note that the x_i are d-dimensional points, so in fact here we have dn scalar parameters.
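As a concrete instance of this statement (our own example, not taken from the source), take d = 1, D = [0, 1], n = 2, f_1(s) = s and f_2(s) = s^2, so that E f_1 = 1/2 and E f_2 = 1/3. The points x_1 = 1/2 − 1/(2√3) and x_2 = 1/2 + 1/(2√3) with weights λ_1 = λ_2 = 1/2, which are the nodes and weights of the two-point Gauss-Legendre rule transported to [0, 1], satisfy both equations. The sketch below checks this in floating point; all variable names are hypothetical.

```python
import math

# Integrands f_1(s) = s and f_2(s) = s^2 on D = [0, 1] (our illustrative choice).
f = [lambda s: s, lambda s: s * s]

# Exact normalized integrals over [0, 1]: E f_1 = 1/2, E f_2 = 1/3.
exact = [1 / 2, 1 / 3]

# Two points in D and common weights lambda_1 = lambda_2 = 1/2.
nodes = [0.5 - 1 / (2 * math.sqrt(3)), 0.5 + 1 / (2 * math.sqrt(3))]
lam = [0.5, 0.5]

# Right-hand sides lambda_1 f_i(x_1) + lambda_2 f_i(x_2), for i = 1, 2.
rule = [sum(l * fi(x) for l, x in zip(lam, nodes)) for fi in f]

# Absolute differences between the integrals and the two-point rule.
errors = [abs(r - e) for r, e in zip(rule, exact)]
print(rule, errors)
```

The same pair of points and the same weights serve both integrals, which is the feature emphasized in the text.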
4.2 Integrals with respect to countably additive probability measure. As already noted, Theorem 3.3 holds for all integrals based on countably additive probability measure.
Suppose that S is a path connected topological space, and let µ be a countably additive probability measure on (S, F), where F is the sigma algebra of Borel subsets of S. Let f_1, …, f_n be continuous mappings from S to R, and suppose that for some x_j ∈ C[0, T] and λ_j ≥ 0 with Σ_j λ_j = 1.

Open question.
We showed in Section 2 that Axiom A3 does not hold for (integrals based on) SFAPs, so by Remark 2.11, the mean value theorem does not hold in this case. It would be of interest to describe classes of finitely additive probabilities for which Axiom A3 holds or does not hold, in terms of some structural properties of measures.