Skip to main content

Upper and lower bounds for the Bregman divergence


In this paper we study upper and lower bounds on the Bregman divergence \(\Delta_{\mathcal {F}}^{\xi }(y,x):=\mathcal {F}(y)-\mathcal {F}(x)- \langle \xi , y-x \rangle\) for some convex functional \(\mathcal {F}\) on a normed space \(\mathcal {X}\), with subgradient \(\xi \in\partial \mathcal {F}(x)\). We give a considerably simpler new proof of the inequalities by Xu and Roach for the special case \(\mathcal {F}(x)= \Vert x \Vert ^{p}\), \(p>1\). The results can be transferred to more general functions as well.


In recent times the Bregman divergence (or Bregman distance) \(\Delta _{\mathcal {F}}^{x^{*}}(y,x)\), introduced by Bregman in [1], has been used as a generalized distance measure in various branches of applied mathematics, for example optimization, inverse problems, statistics and computational mathematics, especially machine learning. To get an overview over the Bregman divergence and its possible applications in optimization and inverse problems we refer to [2,3,4]. In particular the Bregman divergence has been used for various algorithms in numerical analysis and also for convergence analysis of numerical methods and algorithms.

Especially when doing convergence analysis it is often crucial to have lower and upper bounds on the Bregman divergence in terms of norms. In [5] the authors prove upper and lower bounds for expressions

$$ \Vert x+y \Vert ^{p}- \Vert x \Vert ^{p}-p \bigl\langle j_{p} (x),y \bigr\rangle =: \Delta_{\mathcal {F}}^{j_{p}(x)}(x+y,x), $$

where \(j_{p}:\mathcal {X}\to \mathcal {X}^{*}\) is a duality mapping, under certain assumptions on the Banach space \(\mathcal {X}\). As it turns out that (1) is the Bregman divergence corresponding to the functional \(\mathcal {F}= \Vert \cdot \Vert ^{p}\) these results have been used since then in many papers working with the Bregman divergence. However, from the proofs of [5] it seems difficult to transfer the results to other functions \(\mathcal {F}\). Thus we develop in this work a simple framework to find such bounds and in fact can apply it to give a short new proof of the results from [5] for \(\mathcal {F}(x)= \Vert x \Vert ^{p}\), \(p>1\).

Our approach is as follows: Proving upper bounds is rather simple if one sufficiently understands the smoothness of \(\mathcal {F}\) as the Bregman divergence is basically a linearization error and linearization errors are related to differentiability by definition. In particular we will show that one can obtain upper bounds for the Bregman divergence corresponding to \(\mathcal {F}=\phi( \Vert \cdot \Vert )\), if \(\phi: \mathbb {R}\to \mathbb {R}\) is convex and sufficiently smooth.

Regarding lower bounds we will make use of \(\mathcal {F}^{*}\), the convex conjugate of \(\mathcal {F}\). Actually it can be shown that lower bounds for \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\) correspond to upper bounds for \(\Delta_{\mathcal {F}^{*}}^{x}(y^{*},x^{*})\). Note that this idea is not at all new. Already in [6, 7] this kind of connection between \(\mathcal {F}\) and \(\mathcal {F}^{*}\) was discussed in depth. So again one can just make use of the smoothness of \(\mathcal {F}^{*}\) to conclude lower bounds for \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\). One might argue that convex conjugates can be rather complicated functions and expecting differentiability is too optimistic. This is true to some extent, but actually reasonable lower bounds on \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\) already imply differentiability of \(\mathcal {F}^{*}\) at \(x^{*}\) (see [6, Theorem 2.1]). So if one has any hope of finding lower bounds then one might as well work with the convex conjugate.

One reason why our proof is simpler than the proof from [5] is that they did it the other way round. They firstly proved lower bounds with quite some effort and then used the convex conjugate to show upper bounds.

We will focus mainly on asymptotic bounds for \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\) as \(\Vert x-y \Vert \to0\). It is the more interesting case for applications as for example in convergence analysis one will be interested in the Bregman divergence of \(x_{n}\) and x, where \(x_{n}\to x\). Also theoretical it is the more challenging case, since for \(\Vert x-y \Vert \to\infty\) the Bregman divergence \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\) will mostly depend on the behavior of \(\mathcal {F}(y)\) as y tends to infinity and it should be easy to find lower and upper bounds. In particular we will show at the end of the paper, how one can deduce uniform bounds for all \(x,y\in \mathcal {X}\) from the asymptotic bounds for the case \(\mathcal {F}= \Vert \cdot \Vert ^{p}\).

The paper consists of 4 sections. In Sect. 2 we recall some basis notions of convex analysis. In Sect. 3 we define moduli of smoothness and convexity corresponding to a general functional \(\mathcal {F}\) and develop some properties of them. Finally, in Sect. 4 we then use the theory from Sect. 3 on the functional \(\mathcal {F}=\frac {1}{p} \Vert \cdot \Vert ^{p}\) for \(p>1\) and find lower and upper bounds for the corresponding Bregman divergence given by the smoothness, respectively, the convexity of the space \(\mathcal {X}\) as shown in [5].

Tools from convex analysis

In this work \(\mathcal {X}\) will always be a real Banach space, with \(\dim \mathcal {X}\ge2\), \(\mathcal {X}^{*}\) denotes its dual space, \(S_{\mathcal {X}}=\{x\in \mathcal {X}: \Vert x \Vert =1\} \) the unit sphere and \(\mathcal {F}: \mathcal {X}\to \overline {\mathbb {R}}:=\mathbb {R}\cup\{\infty\}\) some function. We will need some basic concepts from convex analysis, so we shortly recall them in this chapter.

\(x^{*}\in \mathcal {X}^{*}\) is called a subgradient of a convex function \(\mathcal {F}: \mathcal {X}\to \overline {\mathbb {R}}\) at \(x\in \mathcal {X}\) if \(\mathcal {F}(x)\) is finite and

$$ \mathcal {F}(y)\ge \mathcal {F}(x) + \bigl\langle x^{*}, y-x \bigr\rangle , $$

for all \(y\in \mathcal {X}\). The set of all subgradients of \(\mathcal {F}\) at x is called the subdifferential of \(\mathcal {F}\) at x and denoted by \(\partial \mathcal {F}(x)\). The convex conjugate \(\mathcal {F}^{*}:\mathcal {X}^{*} \to \overline {\mathbb {R}}\) of \(\mathcal {F}\) is defined by

$$ \mathcal {F}^{*}\bigl(x^{*}\bigr)=\sup_{x\in \mathcal {X}} \bigl[ \bigl\langle x^{*},x \bigr\rangle -\mathcal {F}(x) \bigr]. $$

From these two definitions one can directly conclude the following generalized Young (in)equality. For all \(x\in \mathcal {X}\), \(x^{*}\in \mathcal {X}^{*}\) we have

$$ \mathcal {F}(x)+\mathcal {F}^{*}\bigl(x^{*}\bigr)\ge \bigl\langle x^{*},x \bigr\rangle . $$

Equality holds true if and only if \(x^{*}\in\partial \mathcal {F}(x)\). Further we have

$$ \mathcal {F}(x)\ge \mathcal {F}^{**}(x):=\sup_{x^{*}\in \mathcal {X}^{*}} \bigl[ \bigl\langle x^{*},x \bigr\rangle -\mathcal {F}^{*}\bigl(x^{*}\bigr) \bigr], \quad x\in X, $$

where equality holds if and only if \(\mathcal {F}\) is convex and lower-semicontinuous. Finally, we define the object of interest of this work. For \(\mathcal {F}(x)<\infty\) and \(x^{*}\in\partial \mathcal {F}(x)\) the Bregman divergence \(\Delta_{\mathcal {F}}^{x^{*}}(y,x)\) is given by

$$ \Delta_{\mathcal {F}}^{x^{*}}(y,x)=\mathcal {F}(y)-\mathcal {F}(x)- \bigl\langle x^{*}, y-x \bigr\rangle \ge0, $$

for all \(y\in \mathcal {X}\). We will be especially interested in functionals \(\mathcal {F}(x)=\frac {1}{p} \Vert x \Vert ^{p}\) for some \(p\ge1\) and need to understand their subdifferentials, so finally we have the following. For some \(p\ge1\) the set-valued mapping \(J_{p}:\mathcal {X}\to2^{X^{*}}\) given by

$$ J_{p}(x)= \bigl\{ x^{*}\in \mathcal {X}^{*}: \bigl\langle x^{*},x \bigr\rangle = \bigl\Vert x^{*} \bigr\Vert \Vert x \Vert , \bigl\Vert x^{*} \bigr\Vert = \Vert x \Vert ^{p-1} \bigr\} $$

is called the duality mapping with respect to p of \(\mathcal {X}\). The sets \(J_{p}(x)\) are always non-empty. A mapping \(j_{p}:\mathcal {X}\to \mathcal {X}^{*}\) is called selection of \(J_{p}\) if \(j_{p}(x)\in J_{p}(x)\) for all \(x\in \mathcal {X}\). If \(\mathcal {F}(x)=\frac{1}{p} \Vert x \Vert ^{p}\), then the theorem of Asplund [8, Theorem 1] yields

$$ \partial \mathcal {F}(x)=J_{p}(x). $$

Moduli of smoothness and convexity

Finding upper bounds for (1) is related to the smoothness of the norm of \(\mathcal {X}\) whereas lower bounds are related to convexity. Thus it is necessary to understand the moduli of smoothness and convexity of the space \(\mathcal {X}\) and we shortly recall their definitions (see e.g. [9]):

Definition 3.1

The modulus of convexity \(\delta_{\mathcal {X}}\colon[0,2]\to[0,1]\) of the space \(\mathcal {X}\) is defined by

$$ \delta_{\mathcal {X}}(\varepsilon):=\inf\bigl\{ 1- \Vert y+\tilde{y} \Vert /2:y,\tilde {y}\in S_{\mathcal {X}}, \Vert y-\tilde{y} \Vert =\varepsilon \bigr\} . $$

The modulus of smoothness \(\rho_{\mathcal {X}}\colon[0,\infty)\to[0,\infty)\) of \(\mathcal {X}\) is defined by

$$ \rho_{\mathcal {X}}(\tau):= \sup\bigl\{ \bigl( \Vert x+\tau y \Vert + \Vert x-\tau y \Vert \bigr)/2-1:x,y\in S_{\mathcal {X}}\bigr\} . $$

The space \(\mathcal {X}\) is called uniformly convex if \(\delta_{\mathcal {X}}(\varepsilon)>0\) for every \(\varepsilon>0\). It is called uniformly smooth if \(\lim_{\tau\to0} \rho_{\mathcal {X}}(\tau)/\tau=0\). The space \(\mathcal {X}\) is called r-convex (or convex of power type r) if there exists a constant \(K>0\) such that \(\delta_{\mathcal {X}}(\varepsilon)\ge K\varepsilon^{r}\) for all \(\varepsilon\in [0,2]\). Similarly, it is called s-smooth (or smooth of power type s) if \(\rho_{\mathcal {X}}(\tau)\le K\tau^{s}\) for all \(\tau>0\).

These two moduli have a well-developed theory, which has been known in the literature for a long time and we will not discuss all their properties. However, for our proofs we will need the following properties.

Lemma 3.2

  1. 1.

    We have for \(\tau_{1}\le\tau_{2}\) that \(\rho_{\mathcal {X}}(\tau_{1}) /\tau_{1}\le \rho_{\mathcal {X}}(\tau_{2})/\tau_{2}\).

  2. 2.

    We see for all \(\overline {\tau }>0\) that there exists a constant \(C_{\overline {\tau }}\) such that for all Banach spaces \(\mathcal {X}\) we have

    $$ \rho_{\mathcal {X}}(\tau)\ge\bigl(1+\tau^{2}\bigr)^{\frac{1}{2}}-1\ge C_{\overline {\tau }}\tau^{2},\quad \tau\le \overline {\tau }. $$
  3. 3.

    If \(\delta_{\mathcal {X}}\) is extended byon \(\mathbb {R}\setminus[0,2]\) then \((2\delta_{\mathcal {X}})^{*}=2\rho_{\mathcal {X}^{*}}\).

  4. 4.

    There exists a convex function f such that \(\delta_{\mathcal {X}}(\tau /2)\le f(\tau)\le\delta_{\mathcal {X}}(\tau)\). In particular we have \(\delta_{\mathcal {X}}^{**}(\tau)\ge\delta_{\mathcal {X}}(\tau/2)\).


All statements follow from [9, Ch. 1.e]. The function f in the last statement can be chosen as the Orlicz function \(f(\tau):= (\rho_{\mathcal {X}^{*}} )(\tau/2)\) by [9, Lemmata 1.e.6, 1.e.7]. □

For our purposes it will be more natural to introduce new definitions of the moduli of smoothness and convexity related to functionals instead of spaces.

Definition 3.3

Let \(\mathcal {F}\colon \mathcal {X}\to \overline {\mathbb {R}}\) be some arbitrary function, \(x\in \mathcal {X}\), \(\mathcal {F}(x)<\infty\) and \(\xi \in \mathcal {X}^{*}\). Define the linearization error functional \(\Delta_{\mathcal {F}}^{\xi }(y,x)\) by

$$ \Delta_{\mathcal {F}}^{\xi }(y,x)=\mathcal {F}(y)-\mathcal {F}(x)- \langle \xi , y-x \rangle. $$

The modulus of smoothness \(\rho_{\mathcal {F},x}^{\xi }\colon[0,\infty)\to [0,\infty]\) of \(\mathcal {F}\) in x with respect to ξ is defined by

$$ \rho_{\mathcal {F},x}^{\xi }(\tau):=\sup_{y\in S_{\mathcal {X}}} \bigl\vert \mathcal {F}(x+\tau y)-\mathcal {F}(x)- \langle \xi ,\tau y \rangle \bigr\vert =\sup _{ \Vert x-y \Vert =\tau} \bigl\vert \Delta_{\mathcal {F}}^{\xi }(y,x) \bigr\vert . $$

The modulus of convexity \(\delta_{\mathcal {F},x}^{\xi }\colon[0,\infty)\to [0,\infty]\) of \(\mathcal {F}\) in x with respect to ξ is defined by

$$ \delta_{\mathcal {F},x}^{\xi }(\tau):=\inf_{ \Vert x-y \Vert =\tau} \bigl\vert \Delta_{\mathcal {F}}^{\xi }(y,x) \bigr\vert . $$

\(\mathcal {F}\) is called r-convex (or convex of power type r) in x (w.r.t. ξ) if there exist \(K,\overline {\tau }>0\) such that \(\delta_{\mathcal {F},x}^{\xi }(\tau)\ge K\tau^{r}\) for all \(0<\tau\le \overline {\tau }\). Similarly, it is called s-smooth (or smooth of power type s) in x (w.r.t. ξ) if \(\rho_{\mathcal {F},x}^{\xi }(\tau)\le K\tau^{s}\) for all \(0<\tau\le \overline {\tau }\).

The quantities \(\rho_{\mathcal {F},x}^{\xi }\), \(\delta_{\mathcal {F},x}^{\xi }\) give us a reformulation of our basic problem: We want to find upper bounds for \(\rho_{\mathcal {F},x}^{\xi }(\tau)\) and lower bounds for \(\delta_{\mathcal {F},x}^{\xi }(\tau)\). Before we show some properties of these functions we should state some simple facts for their interpretation.

Remark 3.4

We will mostly consider convex functions \(\mathcal {F}\) with \(\xi \in\partial \mathcal {F}(x)\) so that the linearization error functional is a Bregman divergence and one can neglect the absolute value.

\(\mathcal {F}\) is Fréchet-differentiable in x if and only if there exists \(\xi \in \mathcal {X}^{*}\), such that \(\rho_{\mathcal {F},x}^{\xi }(\tau)/\tau\to0\) as \(\tau\to0\). \(\mathcal {F}\) being s-smooth in x, with \(s\in (1,2]\) then can be seen as a stronger form of differentiability, comparable to fractional derivatives; however, \(\mathcal {F}\) being 2-smooth is not equivalent to twice differentiability but rather to the notion of strong smoothness.

If there exists a selection \(j:\mathcal {X}\to \mathcal {X}^{*}\) of the subdifferential of \(\mathcal {F}\), i.e. for every x exists \(j(x)\in\partial \mathcal {F}(x)\), then this implies already that \(\mathcal {F}\) is convex. \(\delta_{\mathcal {F},x}^{j(x)}(\tau)>0\) for all \(x,\tau\) implies strict convexity. As before r-convexity is an even stronger notion of convexity and 2-convexity is connected to strong convexity. In [10] the modulus of local (or total) convexity of \(\mathcal {F}\), \(\nu_{\mathcal {F}}(x,\tau)\), was introduced and is basically given by \(\delta_{\mathcal {F},x}^{\xi }(\tau)\) just that \(\langle \xi ,y-x \rangle\) is replaced by the right hand side derivative of \(\mathcal {F}\) at x in direction \(y-x\). If \(\mathcal {F}\) is convex and Gâteaux-differentiable then \(\nu_{\mathcal {F}}(x,\tau )\) coincides with \(\delta_{\mathcal {F},x}^{\xi }(\tau)\), where \(\xi =\mathcal {F}'(x)\). The modulus of total convexity has been studied in several papers, see e.g. [11]. There exist further definitions of moduli of convexity and smoothness related to functions (e.g. [4, 12]), but giving a complete overview over all such definitions goes beyond the scope of this work.

It turns out that for functionals \(\mathcal {F}\) that originate from the norm of \(\mathcal {X}\) the moduli of the space and of the functions are closely related.

Proposition 3.5

Let \(\mathcal {F}= \Vert \cdot \Vert _{\mathcal {X}}\) and for all \(x\in \mathcal {X}\) let \(\xi _{x}\in \partial \mathcal {F}(x)\) be arbitrary. We have

$$ \rho\le\sup_{x\in S_{\mathcal {X}}}\rho_{\mathcal {F},x}^{\xi _{x}}\le2\rho. $$


If we replace y by −y in the definition of \(\rho_{\mathcal {F},x}^{\xi _{x}}\) we see

$$\begin{aligned} 2 \rho(\tau)&= \sup \bigl\{ \mathcal {F}(x+\tau y)+\mathcal {F}(x-\tau y)-2\mathcal {F}(x) + \langle \xi _{x},y-y \rangle: x,y\in S_{\mathcal {X}}\bigr\} \\ &\le2\sup_{x\in S_{\mathcal {X}}}\rho_{\mathcal {F},x}^{\xi _{x}}(\tau) \end{aligned}$$

and for all \(x,y\in S_{\mathcal {X}}\) we have by the definition of the subdifferential and as \(\mathcal {F}(x)= \Vert x \Vert _{\mathcal {X}}=1\)

$$ \mathcal {F}(x+\tau y)-\mathcal {F}(x)- \langle \xi _{x},\tau y \rangle\le \mathcal {F}(x+\tau y)+\mathcal {F}(x- \tau y)-2\le2 \rho(\tau). $$


So this already gives us an upper bound for \(\rho_{ \Vert \cdot \Vert _{\mathcal {X}},x}^{\xi }(\tau)\) if \(x\in S_{\mathcal {X}}\), \(\xi \in\partial \mathcal {F}(x)\). For generalizing this to all \(x\in \mathcal {X}\) we use the following.

Proposition 3.6

If the functional \(\mathcal {F}\) is positively q-homogeneous then we have, for all \(x\in \mathcal {X}\setminus\{0\}\), \(\xi \in \mathcal {X}^{*}\),

$$ \Vert x \Vert ^{q}\delta_{\mathcal {F},x/ \Vert x \Vert }^{\xi / \Vert x \Vert ^{q-1}} \biggl( \frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr)\le \bigl\vert \Delta_{\mathcal {F}}^{\xi }(y,x) \bigr\vert \le \Vert x \Vert ^{q}\rho_{\mathcal {F},x/ \Vert x \Vert }^{\xi / \Vert x \Vert ^{q-1}} \biggl(\frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr) $$

and \(\xi / \Vert x \Vert ^{q-1}\in\partial \mathcal {F}(x/ \Vert x \Vert )\) if and only if \(\xi \in\partial \mathcal {F}(x)\).


If \(\mathcal {F}\) is positively q-homogeneous we have

$$ \bigl\vert \Delta_{\mathcal {F}}^{\xi }(y,x) \bigr\vert = \Vert x \Vert ^{q} \biggl\vert \Delta_{\mathcal {F}}^{\xi / \Vert x \Vert ^{q-1}} \biggl( \frac{y}{ \Vert x \Vert },\frac{x}{ \Vert x \Vert } \biggr) \biggr\vert , $$

so that the first claim follows from Definition 3.3. The second claim follows from multiplying (2) either by \(\Vert x \Vert ^{q}\) or \(\Vert x \Vert ^{-q}\). □

For convex functions \(\mathcal {F}\) one can show that both moduli are nondecreasing.

Proposition 3.7

Let \(\mathcal {F}\) be convex, \(x\in \mathcal {X}\) and \(\xi \in\partial \mathcal {F}(x)\). Then for \(\lambda\ge1\) one has

$$ \rho_{\mathcal {F},x}^{\xi }(\lambda\tau)\ge\lambda\rho_{\mathcal {F},x}^{\xi }( \tau), \qquad \delta_{\mathcal {F},x}^{\xi }(\lambda\tau)\ge\lambda \delta_{\mathcal {F},x}^{\xi }(\tau). $$

In particular \(\delta_{\mathcal {F},x}^{\xi }\), \(\rho_{\mathcal {F},x}^{\xi }\) are nondecreasing.


The idea is the same, as in [10, Sect. 2.4]. Let \(\lambda \ge1\). For all \(y\in \mathcal {X}\), \(\Vert y-x \Vert =\tau\) one can define \(y_{\lambda}=\lambda y+(1-\lambda)x\), so \(\Vert y_{\lambda}-x \Vert =\lambda \tau\). As \(\lambda\ge1\) we get by the convexity of \(\mathcal {F}\)

$$ \mathcal {F}\bigl(\lambda y+(1-\lambda)x\bigr)\ge\lambda \mathcal {F}(y)+(1-\lambda)\mathcal {F}(x) $$

(multiply by \(\lambda^{-1}\) and replace \(y=\lambda^{-1}y+(1-\lambda ^{-1})x\) to see the equivalence to the usual definition of convexity) and thus

$$ \frac{1}{\lambda}\Delta_{\mathcal {F}}^{\xi }(y_{\lambda},x)= \frac{1}{\lambda } \bigl(\mathcal {F}\bigl(\lambda y+(1-\lambda)x\bigr)-\mathcal {F}(x) \bigr)- \langle \xi ,y-x \rangle\ge \Delta_{\mathcal {F}}^{\xi }(y,x) . $$

So for all \(y\in \mathcal {X}\), \(\Vert y-x \Vert =\tau\) we find

$$ \Delta_{\mathcal {F}}^{\xi }(y,x)\le\frac{1}{\lambda} \rho_{\mathcal {F},x}^{\xi }(\lambda \tau), $$

which gives the first inequality. Similarly for all \(y\in \mathcal {X}\), \(\Vert y-x \Vert =\lambda\tau\) one can define \(\tilde{y}_{\lambda}=\frac{1}{\lambda}y+(1-\frac{1}{\lambda})x\), then \(\Vert \tilde{y}_{\lambda}-x \Vert =\tau\) and again convexity of \(\mathcal {F}\) can be used to show \(\Delta_{\mathcal {F}}^{\xi }(y,x)\ge\lambda\Delta_{\mathcal {F}}^{\xi }(\tilde{y}_{\lambda},x)\), which yields the other inequality. □

We also have a chain rule.

Proposition 3.8

Let \(f\colon \mathbb {R}\to \mathbb {R}\) and \(x\in \mathcal {X}\), \(\xi \in \mathcal {X}^{*}\), \(t\in \mathbb {R}\) be such that \(\rho_{f,\mathcal {F}(x)}^{t}\) is nondecreasing. Then for all \(\tau\ge0\) we have

$$ \rho_{f\circ \mathcal {F},x}^{t\xi }(\tau)\le \vert t \vert \rho_{\mathcal {F},x}^{\xi }(\tau )+\rho_{f,\mathcal {F}(x)}^{t} \bigl( \Vert \xi \Vert \tau+\rho_{\mathcal {F},x}^{\xi }(\tau) \bigr). $$


Let \(s=\mathcal {F}(x)\) and define functions R, r by

$$\begin{aligned}& \mathcal {F}(x+y)-\mathcal {F}(x) = \langle \xi ,y \rangle+R(y) \quad \forall y\in \mathcal {X}, \\& f(s+h)-f(s) =th+r(h) \quad \forall h\in \mathbb {R}. \end{aligned}$$

Then we have for \(\tau>0\) and \(y\in S_{\mathcal {X}}\)

$$\begin{aligned} f\circ \mathcal {F}(x+\tau y)-f\circ \mathcal {F}(x)&=t \bigl( \langle \xi ,\tau y \rangle+R(y) \bigr)+r \bigl( \langle \xi ,\tau y \rangle+R(\tau y) \bigr) \\ &= \langle t\xi ,\tau y \rangle+tR(\tau y)+r \bigl( \langle \xi ,\tau y \rangle+R( \tau y) \bigr). \end{aligned}$$

Now the claim follows from \(R(\tau y)\le\rho_{\mathcal {F},x}^{\xi }(\tau)\) and \(r(h)\le\rho_{f,\mathcal {F}(x)}^{t}( \vert h \vert )\) together with the assumption that \(\rho_{f,\mathcal {F}(x)}^{t}\) is a nondecreasing function. □

Propositions 3.5, 3.7 and 3.8 are already sufficient to find upper bounds on \(\rho_{\mathcal {F},x}^{\xi }\) for \(\mathcal {F}=f ( \Vert x \Vert _{\mathcal {X}})\) if f is convex and if we sufficiently understand the smoothness of f and of the space \(\mathcal {X}\). Regarding lower bounds the following proposition will be our key instrument.

Proposition 3.9

Let \(\mathcal {F}\) convex and x be such there exists \(\xi \in\partial \mathcal {F}(x)\). We have

$$ \bigl(\delta_{\mathcal {F},x}^{\xi }\bigr)^{*}= \rho_{\mathcal {F}^{*},\xi }^{x} . $$

Further we see that \(\mathcal {F}\) is p-convex in x w.r.t. ξ if and only if \(\mathcal {F}^{*}\) is \(p'\)-smooth in ξ w.r.t. x.


We have

$$\begin{aligned} \rho_{\mathcal {F}^{*},\xi }^{x}(\tau)&=\sup_{y^{*}\in S_{\mathcal {X}^{*}}} \bigl[\mathcal {F}^{*} \bigl(\xi +\tau y^{*}\bigr)-\mathcal {F}^{*}(\xi )- \bigl\langle \tau y^{*},x \bigr\rangle \bigr] \\ &=\sup_{y^{*}\in S_{\mathcal {X}^{*}}}\sup_{y\in \mathcal {X}} \bigl[ \bigl\langle \xi +\tau y^{*},y \bigr\rangle -\mathcal {F}(y)-\mathcal {F}^{*}(\xi )- \bigl\langle \tau y^{*},x \bigr\rangle \bigr] \\ &=\sup_{y\in \mathcal {X}} \bigl[ \langle \xi ,y \rangle-\mathcal {F}(y)-\mathcal {F}^{*}(\xi )+\tau \Vert y-x \Vert \bigr]. \end{aligned}$$

By Young’s equality (3) we then have

$$\begin{aligned} \rho_{\mathcal {F}^{*},\xi }^{x}(\tau) &=\sup_{y\in \mathcal {X}} \bigl[ \mathcal {F}(x)-\mathcal {F}(y) + \langle \xi ,y-x \rangle+\tau \Vert y-x \Vert \bigr] \\ &=\sup_{\varepsilon\in \mathbb {R}_{0}^{+}}\sup_{y\in \mathcal {X}, \Vert y-x \Vert =\varepsilon} \bigl[\varepsilon \tau-\Delta_{\mathcal {F}}^{\xi }(y,x) \bigr]= \bigl(\delta_{\mathcal {F},x}^{\xi }\bigr)^{*}(\tau). \end{aligned}$$

The second statement follows from (6), which gives

$$ \rho_{\mathcal {F}^{*},\xi }^{x}= \bigl(\delta_{\mathcal {F},x}^{\xi }\bigr)^{*}, \qquad \delta _{\mathcal {F},x}^{\xi }\ge \bigl( \delta_{\mathcal {F},x}^{\xi }\bigr)^{**}= \bigl( \rho_{\mathcal {F}^{*},\xi }^{x} \bigr)^{*} $$

and the fact that by Proposition 3.7 we have for \(\tau>\overline {\tau }\) \(\rho_{\mathcal {F},x}^{\xi }(\tau)\ge\tau\rho_{\mathcal {F},x}^{\xi }(\overline {\tau })/\overline {\tau }\), \(\delta_{\mathcal {F},x}^{\xi }(\tau)\ge\tau\delta_{\mathcal {F},x}^{\xi }(\overline {\tau })/\overline {\tau }\), so that in particular

$$\begin{aligned}& \bigl(\delta_{\mathcal {F},x}^{\xi }\bigr)^{*}\bigl(\tau^{*}\bigr)=\sup _{0\le\tau\le \overline {\tau }} \bigl[ \tau^{*}\tau-\delta_{\mathcal {F},x}^{\xi }( \tau) \bigr], \quad \text{for } \tau^{*}\le\delta_{\mathcal {F},x}^{\xi }( \overline {\tau })/\overline {\tau }, \\& \bigl(\rho_{\mathcal {F},x}^{\xi }\bigr)^{*}\bigl(\tau^{*}\bigr)=\sup _{0\le\tau\le \overline {\tau }} \bigl[ \tau^{*}\tau-\rho_{\mathcal {F},x}^{\xi }( \tau) \bigr], \quad \text{for } \tau^{*}\le\rho_{\mathcal {F},x}^{\xi }( \overline {\tau })/\overline {\tau }. \end{aligned}$$

Thus one can just put in the corresponding lower or upper bound and calculate the maximum, which completes the proof. □

Application to norm powers

In this section we will consider \(\mathcal {F}=\frac{1}{p} \Vert \cdot \Vert ^{p}\) for some \(p>1\) and use the theory from the last chapter to reproduce the main results from [5]. Note that in the light of Proposition 3.6 it is sufficient to understand \(\delta _{\mathcal {F},x}^{j_{p}(x)}\) and \(\rho_{\mathcal {F},x}^{j_{p}(x)}\) for \(x\in S_{\mathcal {X}}\).

Theorem 4.1

For some fixed \(p>1\) let \(\mathcal {F}=\frac{1}{p} \Vert \cdot \Vert ^{p}\).

  1. 1.

    For all \(\overline {\tau }>0\) exists a constant \(C_{\overline {\tau },p}>0\), such that for \(x\in S_{\mathcal {X}}\) and \(\tau\le \overline {\tau }\) we have

    $$ \rho_{\mathcal {F},x}^{j_{p}(x)}(\tau)\le C_{\overline {\tau },p}\rho_{\mathcal {X}} ( \tau). $$
  2. 2.

    If we have for \(\overline {\tau }>0\), \(\tau\le \overline {\tau }\) and all \(x\in S_{\mathcal {X}}\)

    $$ \rho_{\mathcal {F},x}^{j_{p}(x)}(\tau)\le\phi(\tau), $$


    $$ \rho_{\mathcal {X}}(\tau)\le p^{1/p-1}\phi(\tau)+C_{\overline {\tau }} \tau^{2}, $$

    for \(\tau\le \overline {\tau }\). In particular if \(\phi\colon \mathbb {R}^{+}\to \mathbb {R}^{+}\) fulfills \(\lim_{\tau\to0}\phi(\tau)/\tau=0\), then \(\mathcal {X}\) is uniformly smooth.

  3. 3.

    Let \(\frac{1}{p}+\frac{1}{p'}=1\). For all \(x\in S_{\mathcal {X}}\), \(\overline {\tau }>0\) we have

    $$ \delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge C_{\overline {\tau },p'} \delta_{\mathcal {X}} (\tau /C_{\overline {\tau },p'}), \quad \tau\le\frac{p'-1}{2} \overline {\tau }, $$

    where \(C_{\overline {\tau },p'}\) is the constant from 1.

  4. 4.

    If there exists \(\overline {\tau }>0\) such that we have for all \(x\in S_{\mathcal {X}}\) and \(\tau\le \overline {\tau }\)

    $$ \delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge\phi(\tau), $$

    where \(\phi\colon \mathbb {R}^{+}\to \mathbb {R}^{+}\) is nondecreasing and \(\phi(\tau )>0\) for \(\tau>0\), then \(\mathcal {X}\) is uniformly convex.


Claim 1: Note that \(\mathcal {F}=f\circ \Vert \cdot \Vert \), with \(f(t)=\frac{1}{p}t^{p}\), which is convex, thus \(\rho _{f,\mathcal {F}(x)}^{1}\) is nondecreasing by Proposition 3.7, so Proposition 3.8 gives

$$ \rho_{\mathcal {F},x}^{j_{p}(x)}(\tau)\le \rho_{ \Vert \cdot \Vert ,x}^{j_{p}(x)}( \tau)+\rho_{f,\mathcal {F}(x)}^{1} \bigl(\tau+\rho_{ \Vert \cdot \Vert ,x}^{j_{p}(x)}( \tau ) \bigr). $$

We have by Taylor’s theorem

$$\rho_{f,1}^{1}(\tau)=\sup_{\sigma\in\{-1,+1\}} \frac{p-1}{2}\tau ^{2}+ r(\sigma\tau)\tau^{2}\le C \tau^{2}, \quad \text{for } \tau\le 3\overline {\tau }, $$

where the inequality holds for a constant depending on p and τ̅ as \(\rho_{f,1}^{1}\) is always finite and so is the remainder r. We have \(j_{p}(x)\in\partial \Vert \cdot \Vert (x)\) for \(x\in S_{\mathcal {X}}\), so by Proposition 3.5 we have \(\rho_{ \Vert \cdot \Vert ,x}^{j_{p}(x)}(\tau)\le2\rho_{\mathcal {X}}(\tau )\) and one can easily see that \(\rho_{\mathcal {X}}(\tau)\le\tau\). So we have

$$ \rho_{\mathcal {F},x}^{j_{p}(x)}(\tau)\le2\rho_{\mathcal {X}}(\tau)+ 9 C \tau^{2}\le (2+9C/C_{\overline {\tau }})\rho_{\mathcal {X}}(\tau),\quad \tau \le \overline {\tau }, $$

where the second inequality follows from item 2 of Lemma 3.2.

Claim 2: Note that \(\Vert \cdot \Vert =f^{-1}\circ \mathcal {F}\) and \(f^{-1}(t)= (pt )^{\frac {1}{p}}\) is concave, thus \(-f^{-1}\) is convex and it is differentiable, so \(-1\in\partial (-f^{-1} ) (\frac{1}{p} )\) and by Proposition 3.7 \(\rho_{f^{-1},1/p}^{1}=\rho _{-f^{-1},1/p}^{-1}\) is nondecreasing. Then Proposition 3.8 gives for all \(x\in S_{\mathcal {X}}\)

$$ \rho_{ \Vert \cdot \Vert ,x}^{j_{p}(x)}(\tau)\le\rho_{\mathcal {F},x}^{j_{p}(x)}(\tau )+ \rho_{f^{-1},1/p}^{1} \bigl(\tau+\rho_{\mathcal {F},x}^{j_{p}(x)}( \tau ) \bigr)\le\phi(\tau)+C_{\overline {\tau }}\tau^{2}, $$

where the second inequality follows by Taylor’s theorem as above and the fact that by Claim 1 we always have \(\rho _{\mathcal {F},x}^{j_{p}(x)}(\tau)\le C\tau\) for some \(C>0\). Thus Proposition 3.5 yields the assertion.

Claim 3: First of all note that \(\mathcal {F}^{*}(t)=\frac{1}{p'}t^{p'}\), with \(\frac{1}{p}+\frac{1}{p'}=1\). We have

$$ \delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge \bigl(\delta_{\mathcal {F},x}^{j_{p}(x)} \bigr)^{**}(\tau)= \bigl(\rho_{\mathcal {F}^{*},j_{p}(x)}^{x} \bigr)^{*}( \tau) =\sup_{r\ge0} \bigl[\tau r- \rho_{\mathcal {F}^{*},j_{p}(x)}^{x}(r) \bigr]. $$

By Claim 1 we have for all \(x\in S_{\mathcal {X}}\) \(\rho _{\mathcal {F}^{*},j_{p}(x)}^{x}(r)\le C_{\overline {\tau },p'}\rho_{\mathcal {X}^{*}}(r)\) for all \(0< r<\overline {\tau }\). We are only interested in the case \(\tau\to0\) so let \(\tau\le C_{\overline {\tau },p'}\rho_{\mathcal {X}^{*}}(\overline {\tau })/\overline {\tau }\), where \(\rho_{\mathcal {X}^{*}}(\overline {\tau })/\overline {\tau }>0\) by Lemma 3.2, 2. Then by Lemma 3.2, 1. we have \(\tau r\le C_{\overline {\tau },p'}\rho_{\mathcal {X}^{*}}(r)\) for \(r\ge \overline {\tau }\) and thus find

$$ \sup_{0\le r} \bigl[\tau r- \rho_{\mathcal {F}^{*},j_{p}(x)}^{x}(r) \bigr]\ge \sup_{0\le r\le \overline {\tau }} \bigl[\tau r- C_{\overline {\tau },p'} \rho_{\mathcal {X}^{*}}(r) \bigr]= (C\rho_{\mathcal {X}^{*}} )^{*}(\tau). $$

So we have by Lemmas 3.2, 3 and 4

$$ \delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge (C_{\overline {\tau },p'} \rho_{\mathcal {X}} )^{*}(\tau)=\frac{C_{\overline {\tau },p'}}{2} (2\delta_{\mathcal {X}} )^{**} \biggl(\frac{2\tau}{C_{\overline {\tau },p'}} \biggr) \ge C_{\overline {\tau },p'} ( \delta_{\mathcal {X}} ) \biggl(\frac{\tau }{C_{\overline {\tau },p'}} \biggr). $$

Finally, note that \(C_{\overline {\tau },p'}\ge\frac{p'-1}{2}C^{-1}_{\overline {\tau }}\), with \(C_{\overline {\tau }}\) from item 2 of Lemma 3.2 and thus \(C_{\overline {\tau },p'}\rho_{\mathcal {X}^{*}}(\overline {\tau })/ \overline {\tau }\ge\frac {p'-1}{2}\overline {\tau }\).

Claim 4: By assumption we have \(\delta _{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge\phi(\tau)\) for \(\tau\le \overline {\tau }\) and by Proposition 3.7 we have for \(\tau>\overline {\tau }\) \(\delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge\tau \delta_{\mathcal {F},x}^{j_{p}(x)}(\overline {\tau })/\overline {\tau }\) and thus \(\delta_{\mathcal {F},x}^{j_{p}(x)}(\tau)\ge\tilde{\phi}(\tau)\) with

$$ \tilde{\phi}(\tau):= \textstyle\begin{cases} \phi(\tau), &\tau\le \overline {\tau }, \\ \tau\phi(\overline {\tau })/\overline {\tau }, &\tau> \overline {\tau }. \end{cases} $$

So by Proposition 3.9 we have for all \(x^{*}\in S_{\mathcal {X}^{*}}\)

$$ \rho_{\mathcal {F}^{*},x^{*}}^{j_{p}^{*}(x^{*})}(\tau)= \bigl(\delta_{\mathcal {F},j_{p}^{*}(x^{*})}^{x^{*}} \bigr)^{*}(\tau)\le\tilde{\phi}^{*}(\tau). $$

Now just observe that for \(\tau<\phi(\overline {\tau })/\overline {\tau }\) we have

$$ \frac{\tilde{\phi}^{*}(\tau)}{\tau}=\sup_{0\le t} \biggl[ t-\frac {\tilde{\phi}(t)}{\tau} \biggr]=\sup_{0\le t\le \overline {\tau }} \biggl[ t-\frac{\phi(t)}{\tau} \biggr]\to0, \quad \tau\to0, $$

as ϕ is nondecreasing. So by part 2 of the theorem we see that \(\mathcal {X}^{*}\) is uniformly smooth from which it follows that \(\mathcal {X}\) is uniformly convex [9, Prop. 1.e.2]. □

Remark 4.2

One can see from the above proof that in the asymptotic case \(\overline {\tau }\to0\) one can choose the constant \(C_{\overline {\tau },p}\) such that

$$ C_{\overline {\tau },p}\to \textstyle\begin{cases} 2, &\mathcal {X}\text{ is not 2-smooth}, \\ 1+p, &\mathcal {X}\text{ is 2-smooth}. \end{cases} $$

These constants are not sharp for every space \(\mathcal {X}\), but at least in the asymptotic case the constants are much simpler than the ones given in [5]. For best known constants with respect to \(L^{p}\) spaces we refer to [13] and [14].

The above theorem combined with Proposition 3.6 gives us upper and lower bounds on the Bregman divergence for \(\Vert x-y \Vert \le \overline {\tau }\Vert x \Vert \). However, as for large \(\Vert x-y \Vert \) the Bregman divergence will be dominated by the term \(\frac{1}{p} \Vert y \Vert ^{p}\) it is not difficult to also find bounds that hold for all \(x,y\in X\). Further one can additionally conclude bounds for the symmetric Bregman divergence,

$$ \Delta_{\mathcal {F}}^{\mathrm{sym}}(x,y):=\Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)+ \Delta _{\mathcal {F}}^{j_{p}(y)}(x,y)= \bigl\langle j_{p}(x)-j_{p}(y),x-y \bigr\rangle , $$

from our theorem. These two claims are shown in the following two propositions.

Proposition 4.3

For some fixed \(p>1\) let \(\mathcal {F}=\frac{1}{p} \Vert \cdot \Vert ^{p}\) and let \(\phi\colon \mathbb {R}^{+}\to \mathbb {R}^{+}\) be nondecreasing. Let \(V=\mathcal {X}\setminus\{ 0\} \times \mathcal {X}\) and define the statements:

$$\begin{aligned} &\exists C,c>0\ \forall(x,y)\in V, \Vert x-y \Vert \le c \Vert x \Vert \mbox{:} \quad \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\le C \Vert x \Vert ^{p}\phi \biggl(\frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr) , \end{aligned}$$
$$\begin{aligned} &\exists C>0\ \forall(x,y)\in V \mbox{:}\quad \Delta_{\mathcal {F}}^{\mathrm{sym}}(x,y)\le C \max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \phi \biggl(\frac{2 \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr) , \end{aligned}$$
$$\begin{aligned} &\exists C>0\ \forall(x,y)\in V \mbox{:}\quad \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\le C \max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \phi \biggl(\frac{2 \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr). \end{aligned}$$

Then (a) (b) (c). Obviously one also has (c) {(a) with ϕ replaced by \(\phi(2\cdot )\)}.


We only show that (a) implies (b) as (b) (c) follows trivially. Without loss of generality let \(c\le1\). First of all assume \(\frac{ \Vert x-y \Vert }{ \Vert x \Vert }>c\). Then by

$$ \frac{ \Vert x-y \Vert }{ \Vert x \Vert }\frac{ \Vert x \Vert }{ \Vert y \Vert }=\frac{ \Vert x-y \Vert }{ \Vert y \Vert }\ge\frac{ \Vert y \Vert - \Vert x \Vert }{ \Vert y \Vert } \ge1- \frac{ \Vert x \Vert }{ \Vert y \Vert } $$

one can see that regardless of whether we have \(\Vert x \Vert / \Vert y \Vert >1/2\) or \(\Vert x \Vert / \Vert y \Vert \le 1/2\) one always has \(\frac{2 \Vert x-y \Vert }{ \Vert y \Vert }>c\). So by

$$\begin{aligned} \Delta_{\mathcal {F}}^{\mathrm{sym}}(x,y)&= \bigl\langle j_{p}(x)-j_{p}(y),x-y \bigr\rangle \le \Vert x \Vert ^{p}+ \Vert y \Vert ^{p}+ \Vert x \Vert ^{p-1} \Vert y \Vert + \Vert y \Vert ^{p-1} \Vert x \Vert \\ &\le4\max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \end{aligned}$$

we find that

$$\Delta_{\mathcal {F}}^{\mathrm{sym}}(x,y)\le\frac{4}{\phi (c )}\max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \phi \biggl( \frac{2 \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr). $$

Now consider the case \(\Vert x-y \Vert / \Vert x \Vert \le c\le 1\). In this range we can conclude (b) from (a) as \(\Vert y \Vert \le2 \Vert x \Vert \), so that

$$\phi \biggl(\frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr)\le \phi \biggl(\frac{2 \Vert x-y \Vert }{ \Vert y \Vert } \biggr). $$


Proposition 4.4

For some fixed \(p>1\) let \(\mathcal {F}=\frac{1}{p} \Vert \cdot \Vert ^{p}\), let \(\phi \colon \mathbb {R}^{+}\to \mathbb {R}^{+}\) be nondecreasing and \(\phi(\tau)>0\) for \(\tau>0\). Let \(V=\mathcal {X}\setminus\{0\} \times \mathcal {X}\) and define the statements:

$$\begin{aligned} &\exists C,c>0\ \forall (x,y)\in V, \Vert x-y \Vert \le c \Vert x \Vert \mbox{:}\quad \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge C \Vert x \Vert ^{p}\phi \biggl(\frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr) , \end{aligned}$$
$$\begin{aligned} &\exists C>0\ \forall(x,y)\in V \mbox{:}\quad \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge C \max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \phi \biggl(\frac{ \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr) , \end{aligned}$$
$$\begin{aligned} &\exists C>0\ \forall(x,y)\in V \mbox{:}\quad \Delta_{\mathcal {F}}^{\mathrm{sym}}(x,y)\ge C \max\bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \phi \biggl(\frac{ \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr). \end{aligned}$$

Then (d) (e) (f).


The proof is very similar to the previous proof so we just sketch it. We look at three different cases. By Proposition 3.7 we know that \(\delta_{\mathcal {F},x}^{j_{p}(x)}\) is nondecreasing, so (d) gives also for \(\Vert x-y \Vert / \Vert x \Vert \ge c\) \(\Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge C \Vert x \Vert ^{p}\phi(c)\) and thus

$$ \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge \textstyle\begin{cases} C \Vert x \Vert ^{p}\phi (\frac{ \Vert x-y \Vert }{ \Vert x \Vert } ), & \frac { \Vert x-y \Vert }{ \Vert x \Vert } \le c, \\ C \Vert x \Vert ^{p}\phi(c), &c \le\frac{ \Vert x-y \Vert }{ \Vert x \Vert }< N, \\ C_{p,\phi,N} \Vert y \Vert ^{p}\phi (\frac{ \Vert x-y \Vert }{ \Vert y \Vert } ), &N \le\frac{ \Vert x-y \Vert }{ \Vert x \Vert }, \end{cases} $$

for sufficiently large \(N>3\), where the last line follows from the fact that \(\frac{ \Vert x-y \Vert }{ \Vert x \Vert }\ge N\) implies \(\Vert y \Vert \ge(N-1) \Vert x \Vert \), which implies \(\frac{ \Vert x-y \Vert }{ \Vert y \Vert }\ge1-(N-1)^{-1}\ge1/2\) and thus it suffices to see that the Bregman divergence will be dominated by \(\Vert y \Vert ^{p}\), for sufficiently large N. To conclude (e) one then basically has to redefine the constants. Equation (f) follows trivially. □

To conclude this chapter we combine the results and summarize the most important inequalities.

Corollary 4.5

Let \(\mathcal {X}\) be a Banach space and \(\mathcal {F}(x)=\frac{1}{p} \Vert x \Vert ^{p}\) for \(p>1\). Then there exist constants \(C_{1},C_{2}>0\) such that for all \(x,y\in \mathcal {X}\) we have

$$ \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\le C_{1} \max \bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \rho_{\mathcal {X}}\biggl(\frac{2 \Vert x-y \Vert }{\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr) $$


$$ \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge C_{2} \max \bigl\{ \Vert x \Vert , \Vert y \Vert \bigr\} ^{p} \delta _{\mathcal {X}}\biggl(\frac{ \Vert x-y \Vert }{3\max\{ \Vert x \Vert , \Vert y \Vert \}} \biggr). $$

If the space \(\mathcal {X}\) is s-smooth, then there exists \(C>0\) and for all \(\overline {\tau }>0\) also \(C_{\overline {\tau }}>0\) such that

$$ \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\le \textstyle\begin{cases} C \Vert x-y \Vert ^{s}, & p=s, \\ C_{\overline {\tau }} \Vert x \Vert ^{p-s} \Vert x-y \Vert ^{s},\quad \textit{for }\frac { \Vert x-y \Vert }{ \Vert x \Vert }\le \overline {\tau }, & p\neq s. \end{cases} $$

If the space \(\mathcal {X}\) is r-convex, then there exists \(\tilde{C}>0\) and for all \(\overline {\tau }>0\) also \(\tilde{C}_{\overline {\tau }}>0\) such that

$$ \Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\ge \textstyle\begin{cases} \tilde{C} \Vert x-y \Vert ^{r}, & p=r, \\ \tilde{C}_{\overline {\tau }} \Vert x \Vert ^{p-r} \Vert x-y \Vert ^{r},\quad \textit{for }\frac{ \Vert x-y \Vert }{ \Vert x \Vert }\le \overline {\tau }, & p\neq r. \end{cases} $$


By Proposition 3.6 we have

$$ \Vert x \Vert ^{p}\delta_{\mathcal {F},x/ \Vert x \Vert }^{j_{p}(x)/ \Vert x \Vert ^{p-1}} \biggl( \frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr)\le\Delta_{\mathcal {F}}^{j_{p}(x)}(y,x)\le \Vert x \Vert ^{p}\rho _{\mathcal {F},x/ \Vert x \Vert }^{j_{p}(x)/ \Vert x \Vert ^{p-1}} \biggl(\frac{ \Vert x-y \Vert }{ \Vert x \Vert } \biggr). $$

Thus item 1 of Theorem 4.1 and the s-smoothness show the bound (9) for \(x\in \mathcal {X}\) and y such that \(\Vert x-y \Vert \le \overline {\tau }\Vert x \Vert \). Similarly item 3 of Theorem 4.1 and the r-convexity show (10) for \(x\in \mathcal {X}\) and y such that \(\Vert x-y \Vert \le\frac {p'-1}{2}\overline {\tau }\Vert x \Vert \). As this holds true for all \(\overline {\tau }>0\) one can just replace \(\tilde{\overline {\tau }}=\frac{2\overline {\tau }}{p'-1}\). Apply Proposition 4.3 and Proposition 4.4 to conclude from this the uniform bounds for all \(x,y\in \mathcal {X}\). □


  1. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

    MathSciNet  Article  Google Scholar 

  2. Burger, M.: Bregman distances in inverse problems and partial differential equations. In: Advances in Mathematical Modeling, Optimization and Optimal Control. Springer Optimization and Its Applications, pp. 3–33. Springer, Cham (2016)

    Chapter  Google Scholar 

  3. Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Numerical Mathematics and Scientific Computation. Oxford University Press, New York (1997)

    MATH  Google Scholar 

  4. Reem, D., Reich, S., Pierro, A.D.: Re-examination of Bregman functions and new properties of their divergences. Optimization, 1–70 (2018)

  5. Xu, Z.-B., Roach, G.F.: Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces. J. Math. Anal. Appl. 157(1), 189–210 (1991)

    MathSciNet  Article  Google Scholar 

  6. Zǎlinescu, C.: On uniformly convex functions. J. Math. Anal. Appl. 95(2), 344–374 (1983)

    MathSciNet  Article  Google Scholar 

  7. Azé, D., Penot, J.-P.: Uniformly convex and uniformly smooth convex functions. Ann. Fac. Sci. Toulouse Math. 4(4), 705–730 (1995)

    MathSciNet  Article  Google Scholar 

  8. Asplund, E.: Positivity of duality mappings. Bull. Am. Math. Soc. 73(2), 200–203 (1967)

    MathSciNet  Article  Google Scholar 

  9. Lindenstrauss, J., Tzafriri, L.: Classical Banach Spaces II: Function Spaces. Ergebnisse der Mathematik und Ihrer Grenzgebiete, vol. 97. Springer, Berlin (1979)

    Book  Google Scholar 

  10. Butnariu, D., Censor, Y., Reich, S.: Iterative averaging of entropic projections for solving stochastic convex feasibility problems. Comput. Optim. Appl. 8(1), 21–39 (1997)

    MathSciNet  Article  Google Scholar 

  11. Butnariu, D., Reich, S., Zaslavski, A.J.: There are many totally convex functions. J. Convex Anal. 13(3–4), 623 (2006)

    MathSciNet  MATH  Google Scholar 

  12. Borwein, J., Guirao, A.J., Hájek, P., Vanderwerff, J.: Uniformly convex functions on Banach spaces. Proc. Am. Math. Soc. 137(3), 1081–1091 (2008)

    MathSciNet  Article  Google Scholar 

  13. Xu, Z.-B.: Characteristic inequalities of \({L}^{p}\) spaces and their applications. Acta Math. Sin. 32, 209–218 (1989) (Chinese)

    MATH  Google Scholar 

  14. Xu, Z.-B., Zhang, Z.-S.: Another set of characteristic inequalities of \({L}^{p}\) spaces. Acta Math. Sin. 37, 433–439 (1989) (Chinese)

    Google Scholar 

Download references


I thank my supervisor Thorsten Hohage for many helpful comments. Financial support by Deutsche Forschungsgemeinschaft through grant CRC 755, project C09, and RTG 2088 is gratefully acknowledged.

Availability of data and materials

Not applicable.


Financial support given by Deutsche Forschungsgemeinschaft through grants CRC 755 and RTG 2088.

Author information

Authors and Affiliations



There is only one author. The author read and approved the final manuscript.

Corresponding author

Correspondence to Benjamin Sprung.

Ethics declarations

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Sprung, B. Upper and lower bounds for the Bregman divergence. J Inequal Appl 2019, 4 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Bregman divergence
  • Bregman distance
  • Uniform smoothness
  • Uniform convexity
  • Total convexity