Wasserstein bounds in CLT of approximative MCE and MLE of the drift parameter for Ornstein-Uhlenbeck processes observed at high frequency

This paper deals with the rate of convergence in the central limit theorem for estimators of the drift coefficient, denoted $\theta$, of the Ornstein-Uhlenbeck process $X := \{X_t, t \geq 0\}$ observed at high frequency. We provide an approximate minimum contrast estimator and an approximate maximum likelihood estimator of $\theta$, namely

$$\widetilde{\theta}_n := \frac{1}{\frac{2}{n}\sum_{i=1}^{n} X_{t_i}^2} \qquad \text{and} \qquad \widehat{\theta}_n := -\frac{\sum_{i=1}^{n} X_{t_{i-1}}\bigl(X_{t_i} - X_{t_{i-1}}\bigr)}{\Delta_n \sum_{i=1}^{n} X_{t_{i-1}}^2},$$

respectively, where $t_i = i\Delta_n$, $i = 0, 1, \ldots, n$, $\Delta_n \to 0$. We provide Wasserstein bounds in the central limit theorem for $\widetilde{\theta}_n$ and $\widehat{\theta}_n$.


Introduction
Let $X := \{X_t, t \geq 0\}$ be the Ornstein-Uhlenbeck (OU) process driven by a Brownian motion $\{W_t, t \geq 0\}$. More precisely, $X$ is the solution of the following linear stochastic differential equation

$$X_0 = 0; \qquad dX_t = -\theta X_t\, dt + dW_t, \quad t \geq 0, \tag{1.1}$$

where $\theta > 0$ is an unknown parameter. The drift parametric estimation for the OU process (1.1) has been widely studied in the literature. Several methods can be used to estimate the parameter $\theta$ in (1.1), such as maximum likelihood estimation, least squares estimation and minimum contrast estimation; we refer to the monographs [14, 15]. While there is an extensive literature on the asymptotic distribution of estimators of $\theta$ based on discrete observations of $X$, only a few works have been dedicated to the rates of weak convergence of the distributions of the estimators to the standard normal distribution. From a practical point of view, in parametric inference it is more realistic and interesting to consider asymptotic estimation for (1.1) based on discrete observations. Thus, let us assume that the process $X$ given in (1.1) is observed equidistantly in time with the step size $\Delta_n$: $t_i = i\Delta_n$, $i = 0, \ldots, n$, and $T = n\Delta_n$ denotes the length of the "observation window". Here we are concerned with the approximate minimum contrast estimator (AMCE) and the approximate maximum likelihood estimator (AMLE), which are discrete versions of the minimum contrast estimator (MCE) and the maximum likelihood estimator (MLE) defined, respectively, as follows:

$$\widetilde{\theta}_T := \frac{1}{\frac{2}{T}\int_0^T X_s^2\, ds}, \qquad \widehat{\theta}_T := -\frac{\int_0^T X_s\, dX_s}{\int_0^T X_s^2\, ds}.$$

Recall that, for two random variables $X$ and $Y$, the Wasserstein metric is given by

$$d_W(X, Y) := \sup_{f \in \mathrm{Lip}(1)} \bigl|E[f(X)] - E[f(Y)]\bigr|,$$

where $\mathrm{Lip}(1)$ is the set of all Lipschitz functions with Lipschitz constant 1.

This project was funded by the Kuwait Foundation for the Advancement of Sciences (KFAS) under project code PR18-16SM-04.
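To make the observation scheme concrete, the following sketch (not part of the paper; all parameter values are illustrative) simulates the OU process on the grid $t_i = i\Delta_n$ via its exact Gaussian transition and evaluates the discrete estimators $\widetilde{\theta}_n$ and $\widehat{\theta}_n$ defined above:

```python
import numpy as np

def simulate_ou(theta, n, delta, rng):
    """Simulate X_{t_i}, t_i = i*delta, i = 0..n, with X_0 = 0,
    using the exact AR(1) transition of the OU process."""
    a = np.exp(-theta * delta)              # autoregression coefficient e^{-theta*delta}
    sd = np.sqrt((1 - a**2) / (2 * theta))  # conditional standard deviation
    x = np.zeros(n + 1)
    noise = rng.standard_normal(n) * sd
    for i in range(n):
        x[i + 1] = a * x[i] + noise[i]
    return x

def amce(x):
    """AMCE: 1 / ((2/n) * sum_{i=1}^n X_{t_i}^2)."""
    n = len(x) - 1
    return 1.0 / (2.0 / n * np.sum(x[1:] ** 2))

def amle(x, delta):
    """AMLE: -sum X_{t_{i-1}}(X_{t_i}-X_{t_{i-1}}) / (delta * sum X_{t_{i-1}}^2)."""
    incr = np.diff(x)
    return -np.sum(x[:-1] * incr) / (delta * np.sum(x[:-1] ** 2))

rng = np.random.default_rng(0)
theta = 1.0
n, delta = 200_000, 0.01     # high-frequency regime: T = n*delta = 2000
x = simulate_ou(theta, n, delta, rng)
print(amce(x), amle(x, delta))   # both should be close to theta = 1
```

In the regime $\Delta_n \to 0$, $T \to \infty$ both estimates concentrate around the true $\theta$, which is the consistency statement made precise below.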
Rates of convergence in the central limit theorem for the MCE $\widetilde{\theta}_T$ and the MLE $\widehat{\theta}_T$ under the Kolmogorov and Wasserstein distances have been studied: there exist $c, C > 0$ depending only on $\theta$ that bound these distances, where $N \sim \mathcal{N}(0, 1)$ denotes a standard normal random variable. The purpose of this manuscript is to derive upper bounds in the Wasserstein distance for the rates of convergence of the distributions of the AMCE $\widetilde{\theta}_n$ and the AMLE $\widehat{\theta}_n$. These estimators are unbiased, and we show that they are consistent and admit a central limit theorem as $\Delta_n \to 0$ and $T \to \infty$. Moreover, we bound the rate of convergence to the normal distribution in the Wasserstein distance.
Note that the papers [3] and [5] provided explicit upper bounds in the Kolmogorov distance for the rates of convergence of the distributions of $\widetilde{\theta}_n$ and $\widehat{\theta}_n$, respectively. On the other hand, [8] provided Wasserstein bounds in the central limit theorem for $\widetilde{\theta}_n$. Let us describe what has been proved in this direction:

- Theorem 2.1 in [3] shows that there exists $C > 0$ depending on $\theta$ such that the bound (1.2) holds.
- Theorem 2.3 in [5] proves that there exists $C > 0$ depending on $\theta$ such that the bound (1.3) holds.
- Theorem 5.4 in [8] establishes that there exists $C > 0$ depending on $\theta$ such that the bound (1.4) holds.

The aim of the present paper is to provide new explicit bounds for the rate of convergence in the CLT of the estimators $\widetilde{\theta}_n$ and $\widehat{\theta}_n$ under the Wasserstein metric: there exists a constant $C > 0$ such that, for all $n \geq 1$ and $T > 0$, the bounds (1.5) and (1.6) hold; see Theorem 3.6 and Theorem 4.1, respectively.
Remark 1.2. The estimates (1.5) and (1.6) show that we have improved the bounds on the error of normal approximation for $\widetilde{\theta}_n$ and $\widehat{\theta}_n$: the bounds obtained in (1.5) and (1.6) are sharper than those in (1.2), (1.3) and (1.4).
To finish this introduction, we describe the general structure of this paper. Section 2 contains preliminaries presenting the tools needed from analysis on Wiener space, including Wiener chaos calculus and Malliavin calculus. Upper bounds for the rates of convergence of the distributions of the AMCE $\widetilde{\theta}_n$ and the AMLE $\widehat{\theta}_n$ are provided in Sections 3 and 4, respectively.

Preliminaries
This section gives a brief overview of some useful facts from the Malliavin calculus on Wiener space. Some of the results presented here are essential for the proofs in the present paper. For our purposes we focus on the special cases relevant for our setting and omit the general high-level theory. We direct the interested reader to [18, Chapter 1] and [16, Chapter 2].
Fix $(\Omega, \mathcal{F}, P)$ for the Wiener space of a standard Wiener process $W = (W_t)_{t \geq 0}$. The first step is to identify the general centered Gaussian process $(Z_t)_{t \geq 0}$ with an isonormal Gaussian process $X = \{X(h), h \in H\}$ for some Hilbert space $H$. Recall that for such processes $X$ we have, for every $g, h \in H$, $E[X(g)X(h)] = \langle g, h \rangle_H$. One can define $H$ as the closure of the real-valued step functions on $[0, \infty)$ with respect to the inner product $\langle 1_{[0,s]}, 1_{[0,t]} \rangle_H = E[Z_s Z_t]$. The next step involves the multiple Wiener-Itô integrals. The formal definition involves the concepts of Malliavin derivative and divergence; we refer the reader to [18, Chapter 1] and [16, Chapter 2]. For our purposes we define the multiple Wiener-Itô integral $I_p$ via the Hermite polynomials $H_p$. In particular, for $h \in H$ with $\|h\|_H = 1$ and any $p \geq 1$,

$$I_p(h^{\otimes p}) = H_p(X(h)). \tag{2.1}$$

For $p = 1$ and $p = 2$ we have

$$I_1(h) = X(h), \qquad I_2(h \,\widetilde{\otimes}\, g) = X(h)X(g) - \langle h, g \rangle_H. \tag{2.2}$$

Note also that $I_0$ can be taken to be the identity operator.
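As a quick numerical illustration of the identity $I_p(h^{\otimes p}) = H_p(X(h))$ and of the orthogonality of distinct chaoses (a sketch, not part of the paper): for a standard Gaussian $Z$, the probabilists' Hermite polynomials satisfy $E[H_p(Z)H_q(Z)] = p!\,\delta_{pq}$, which Monte Carlo reproduces:

```python
import numpy as np
from math import factorial

def hermite_prob(p, z):
    """Probabilists' Hermite polynomial H_p via the recurrence
    H_0 = 1, H_1 = z, H_{k+1}(z) = z*H_k(z) - k*H_{k-1}(z)."""
    h_prev, h = np.ones_like(z), z
    if p == 0:
        return h_prev
    for k in range(1, p):
        h_prev, h = h, z * h - k * h_prev
    return h

rng = np.random.default_rng(42)
z = rng.standard_normal(2_000_000)

# E[H_p(Z) H_q(Z)] should equal p! when p == q, and 0 otherwise.
for p, q in [(1, 1), (2, 2), (3, 3), (1, 2), (2, 3)]:
    est = np.mean(hermite_prob(p, z) * hermite_prob(q, z))
    exact = factorial(p) if p == q else 0.0
    print(p, q, round(est, 3), exact)
```

The diagonal values $1!, 2!, 3!$ are exactly the isometry constants $p!$ appearing in the isometry property below.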
• Some notation for Hilbert spaces. Let $H$ be a Hilbert space. Given an integer $q \geq 2$, the Hilbert spaces $H^{\otimes q}$ and $H^{\odot q}$ denote the $q$th tensor product and the $q$th symmetric tensor product of $H$. If $f \in H^{\otimes q}$ is given by $f = \sum_{j_1, \ldots, j_q} a(j_1, \ldots, j_q)\, e_{j_1} \otimes \cdots \otimes e_{j_q}$, where $(e_j)_{j \geq 1}$ is an orthonormal basis of $H$, then the symmetrization $\widetilde{f}$ is given by

$$\widetilde{f} = \frac{1}{q!} \sum_{\sigma} \sum_{j_1, \ldots, j_q} a(j_1, \ldots, j_q)\, e_{j_{\sigma(1)}} \otimes \cdots \otimes e_{j_{\sigma(q)}},$$

where the first sum runs over all permutations $\sigma$ of $\{1, \ldots, q\}$. Then $\widetilde{f}$ is an element of $H^{\odot q}$. We also make use of the concept of contraction: for $f \in H^{\odot p}$, $g \in H^{\odot q}$ and $0 \leq r \leq p \wedge q$, the $r$th contraction $f \otimes_r g$ is the element of $H^{\otimes (p+q-2r)}$ obtained by pairing $r$ arguments of $f$ with $r$ arguments of $g$ (which arguments are paired is immaterial by symmetry).
• Isometry property of integrals [16, Proposition 2.7.5]. Fix integers $p, q \geq 1$ as well as $f \in H^{\odot p}$ and $g \in H^{\odot q}$. Then $E[I_p(f)I_q(g)] = p!\,\langle f, g \rangle_{H^{\otimes p}}$ if $p = q$, and $E[I_p(f)I_q(g)] = 0$ otherwise.
• Product formula [16, Proposition 2.7.10]. Let $p, q \geq 1$. If $f \in H^{\odot p}$ and $g \in H^{\odot q}$, then

$$I_p(f) I_q(g) = \sum_{r=0}^{p \wedge q} r! \binom{p}{r} \binom{q}{r} I_{p+q-2r}\bigl(f \,\widetilde{\otimes}_r\, g\bigr). \tag{2.5}$$

• Hypercontractivity in Wiener chaos. For every $q \geq 1$, $\mathcal{H}_q$ denotes the $q$th Wiener chaos of $W$, defined as the closed linear subspace of $L^2(\Omega)$ generated by the random variables $\{H_q(W(h)), h \in H, \|h\|_H = 1\}$, where $H_q$ is the $q$th Hermite polynomial. For any $F \in \oplus_{l=1}^{q} \mathcal{H}_l$ (i.e. in a fixed finite sum of Wiener chaoses) and any $p \geq 2$, we have

$$\bigl(E[|F|^p]\bigr)^{1/p} \leq c_{p,q} \bigl(E[F^2]\bigr)^{1/2}. \tag{2.6}$$

It should be noted that the constants $c_{p,q}$ above are known with some precision when $F$ is a single chaos term: indeed, by [16, Corollary 2.8.14], $c_{p,q} = (p-1)^{q/2}$.
• Optimal fourth moment theorem. Let $N$ denote the standard normal law. Consider a sequence $X = (X_n)$ with $X_n \in \mathcal{H}_q$, such that $E[X_n] = 0$ and $\mathrm{Var}[X_n] = 1$, and assume $X_n$ converges in distribution to a normal law, which is equivalent to $\lim_n E[X_n^4] = 3$. Then we have the optimal estimate for the total variation distance $d_{TV}(X_n, N)$, known as the optimal fourth moment theorem, proved in [17]. This optimal estimate also holds for the Wasserstein distance $d_W(X_n, N)$, see [8, Remark 2.2], as follows: there exist two constants $c, C > 0$, depending only on the sequence $X$ but not on $n$, such that

$$c \max\bigl(|E[X_n^3]|,\, E[X_n^4] - 3\bigr) \leq d_W(X_n, N) \leq C \max\bigl(|E[X_n^3]|,\, E[X_n^4] - 3\bigr).$$

Moreover, we recall that the third and fourth cumulants are respectively

$$\kappa_3(X) = E[X^3] - 3E[X^2]E[X] + 2(E[X])^3,$$
$$\kappa_4(X) = E[X^4] - 4E[X^3]E[X] - 3(E[X^2])^2 + 12E[X^2](E[X])^2 - 6(E[X])^4.$$

In particular, when $E[X] = 0$, we have $\kappa_3(X) = E[X^3]$ and $\kappa_4(X) = E[X^4] - 3(E[X^2])^2$. If $g \in H^{\odot 2}$, then the third and fourth cumulants of $I_2(g)$ satisfy the following (see (6.2) and (6.6) in [2], respectively):

$$\kappa_3(I_2(g)) = 8\,\langle g, g \otimes_1 g \rangle_{H^{\otimes 2}} \tag{2.8}$$

and

$$\kappa_4(I_2(g)) = 48\,\|g \otimes_1 g\|_{H^{\otimes 2}}^2. \tag{2.9}$$
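The cumulant identities can be sanity-checked on the simplest second-chaos element (an illustrative computation, not from the paper): for $X = I_2(h^{\otimes 2}) = Z^2 - 1$ with $\|h\|_H = 1$ and $Z$ standard Gaussian, the formulas above give $E[X^2] = 2$, $\kappa_3(X) = 8$ and $\kappa_4(X) = 48$, since $g \otimes_1 g = g$ when $g = h^{\otimes 2}$:

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(4_000_000)
x = z**2 - 1                     # X = I_2(h⊗h) = H_2(Z), a pure second-chaos variable

m2 = np.mean(x**2)               # second moment: exact value 2! = 2
k3 = np.mean(x**3)               # third cumulant (E[X] = 0): exact value 8
k4 = np.mean(x**4) - 3 * m2**2   # fourth cumulant: exact value 48
print(m2, k3, k4)
```

Since $\kappa_4(X) = 48 > 0$, this $X$ is far from Gaussian, consistent with the role of $E[X_n^4] - 3$ as the distance to normality in the fourth moment theorem.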
Throughout the paper, $N$ denotes a standard normal random variable, and $C$ denotes a generic positive constant (possibly depending on $\theta$, but on nothing else), which may change from line to line.

Approximate minimum contrast estimator
In this section we prove the consistency of, and provide upper bounds in the Wasserstein distance on the rate of normal convergence of, an approximate minimum contrast estimator of the drift parameter $\theta$ of the Ornstein-Uhlenbeck process $X := \{X_t, t \geq 0\}$ driven by a Brownian motion $\{W_t, t \geq 0\}$, defined as the solution of the linear stochastic differential equation

$$X_0 = 0; \qquad dX_t = -\theta X_t\, dt + dW_t, \quad t \geq 0, \tag{3.1}$$

where $\theta > 0$ is an unknown parameter. Since (3.1) is linear, it is immediate to see that its solution can be expressed explicitly as

$$X_t = \int_0^t e^{-\theta(t-s)}\, dW_s. \tag{3.2}$$

Moreover,

$$Z_t := \int_{-\infty}^{t} e^{-\theta(t-s)}\, dW_s, \quad t \geq 0, \tag{3.3}$$

is a stationary Gaussian process, see [6, 9]. Furthermore,

$$X_t = Z_t - e^{-\theta t} Z_0, \quad t \geq 0. \tag{3.4}$$

Since $Z := \{Z_t, t \geq 0\}$ is a continuous centered stationary Gaussian process, it can be represented as a Wiener-Itô (multiple) integral $Z_t = I_1(1_{[0,t]})$ for every $t \geq 0$, as in (2.1). Let $\rho(r) = E(Z_r Z_0)$ denote the covariance of $Z$ for every $r \geq 0$. It is easy to show that

$$\rho(r) = \frac{e^{-\theta r}}{2\theta}, \quad r \geq 0.$$

In particular, $\rho(0) = \frac{1}{2\theta}$. Moreover, we set $\rho(r) := \rho(-r)$ for all $r < 0$. Our goal is to estimate $\theta$ based on the discrete observations of $X$, using the approximate minimum contrast estimator

$$\widetilde{\theta}_n := g\bigl(f_n(X)\bigr), \tag{3.5}$$

where $g(x) := \frac{1}{2x}$, $t_i = i\Delta_n$, $i = 0, \ldots, n$, $\Delta_n \to 0$ and $T = n\Delta_n$, whereas $f_n(X)$, $n \geq 1$, is given by

$$f_n(X) := \frac{1}{n} \sum_{i=1}^{n} X_{t_i}^2. \tag{3.6}$$

In order to analyze the estimator $\widetilde{\theta}_n$ of $\theta$ based on discrete high-frequency data in time of $X$, we first estimate the limiting variance $\rho(0) = \frac{1}{2\theta}$ by the estimator $f_n(X)$, given by (3.6). Let us introduce $F_n(Z) := \sqrt{T}\bigl(f_n(Z) - \frac{1}{2\theta}\bigr)$, where $f_n(Z) := \frac{1}{n}\sum_{i=1}^{n} Z_{t_i}^2$. According to (2.2), $F_n(Z)$ can be written as

$$F_n(Z) = \frac{\sqrt{T}}{n} \sum_{i=1}^{n} I_2\bigl(1_{[0,t_i]}^{\otimes 2}\bigr).$$

By (3.4), straightforward calculation leads to the following technical lemma.
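As an illustrative numerical check (not from the paper) of the stationary covariance $\rho(r) = e^{-\theta|r|}/(2\theta)$: simulate a long stationary OU path, started from its invariant law $\mathcal{N}(0, \frac{1}{2\theta})$, and compare empirical lag covariances with $\rho$ (parameter values chosen for illustration):

```python
import numpy as np

theta, delta, n = 1.0, 0.01, 1_000_000    # observation window T = n*delta = 10000
rng = np.random.default_rng(1)
a = np.exp(-theta * delta)                # AR(1) coefficient
sd = np.sqrt((1 - a**2) / (2 * theta))    # conditional standard deviation

z = np.empty(n)
z[0] = rng.standard_normal() / np.sqrt(2 * theta)   # stationary initial law N(0, 1/(2θ))
eps = rng.standard_normal(n - 1) * sd
for i in range(1, n):
    z[i] = a * z[i - 1] + eps[i - 1]

rho = lambda r: np.exp(-theta * abs(r)) / (2 * theta)   # model covariance
for lag in [0, 100, 200]:                 # time lags r = 0.0, 1.0, 2.0
    emp = np.mean(z[: n - lag] * z[lag:]) if lag else np.mean(z * z)
    print(lag * delta, emp, rho(lag * delta))
```

In particular, the empirical variance approaches $\rho(0) = \frac{1}{2\theta}$, the quantity estimated by $f_n(X)$ in (3.6).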
Lemma 3.1. Let $X$ and $Z$ be the processes given in (3.2) and (3.3), respectively. Then there exists $C > 0$ depending only on $\theta$ such that for every $p \geq 1$ and for all $n \in \mathbb{N}$,

Lemma 3.2. There exists $C > 0$ depending only on $\theta$ such that, for large $n$,

Proof. Using the well-known Wick formula, we have

This implies

and, as $n \to \infty$,

Therefore the desired result is obtained.
Lemma 3.3. There exists $C > 0$ depending only on $\theta$ such that for all $n \geq 1$,

Consequently, we can write

Combining this with (2.8) and (3.7), we get

On the other hand,

where we used a change of variables and then applied the Brascamp-Lieb inequality given by Lemma 2.1. Therefore the proof of (3.17) is complete.
Proof. Using (3.5), it is sufficient to prove that the results of the theorem are satisfied for the estimator $f_n(X)$ of $\frac{1}{2\theta}$. The weak consistency of $f_n(X)$ is an immediate consequence of (3.10). If $n\Delta_n^\eta \to 0$ for some $1 < \eta < 2$, the strong consistency of $f_n(X)$ has been proved in [10, Theorem 11]. Now, suppose that $n\Delta_n^\eta \to \infty$ for some $\eta > 1$. It follows from (3.10) that

Combining this with the hypercontractivity property (2.6) and [13, Lemma 2.1], which is a well-known direct consequence of the Borel-Cantelli lemma, we obtain $f_n(X) \to \frac{1}{2\theta}$ almost surely.
Theorem 3.6. There exists $C > 0$ depending only on $\theta$ such that for all $n \geq 1$,

Proof. Recall that by definition $\theta = g\bigl(\frac{1}{2\theta}\bigr)$. We have

for some random point $\zeta_n$ between $f_n(X)$ and $\frac{1}{2\theta}$. Thus, we can write

Therefore,

where we have used that

The second term in the inequality above is bounded in Theorem 3.4. By Hölder's inequality and the hypercontractivity property (2.6), for $p, q > 1$ with $1/p + 1/q = 1$,

for some constant $C > 0$ depending on $p$. Consequently, for every $p \geq 1$,

To establish (3.23) it remains to show that $E[|\zeta_n|^{-3p}] < \infty$ for some $p \geq 1$. Using the monotonicity of $x \mapsto x^{-3}$ and the fact that $\zeta_n$ lies between $f_n(X)$ and $\frac{1}{2\theta}$, it is enough to show that $E[|f_n(X)|^{-3p}] < \infty$ for some $p \geq 1$. This follows as an application of the technical result [8, Proposition 6.3].
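For concreteness, the Taylor-expansion step behind the proof can be written out as follows (a reconstruction of the standard second-order expansion, using only $g(x) = \frac{1}{2x}$ and the notation above):

```latex
% Second-order Taylor expansion of g(x) = 1/(2x) around 1/(2\theta):
% g'(x) = -1/(2x^2),  g''(x) = 1/x^3,  so  g'(1/(2\theta)) = -2\theta^2.
\widetilde{\theta}_n - \theta
  = g\bigl(f_n(X)\bigr) - g\Bigl(\tfrac{1}{2\theta}\Bigr)
  = -2\theta^{2}\Bigl(f_n(X) - \tfrac{1}{2\theta}\Bigr)
    + \frac{1}{2\zeta_n^{3}}\Bigl(f_n(X) - \tfrac{1}{2\theta}\Bigr)^{2}
```

for some random point $\zeta_n$ between $f_n(X)$ and $\frac{1}{2\theta}$. The linear term drives the central limit theorem, while the quadratic remainder is where the negative moment bound $E[|\zeta_n|^{-3p}] < \infty$ is needed.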

Approximate maximum likelihood estimator
The maximum likelihood estimator for $\theta$ based on continuous observations of the process $X$ given by (3.1) is defined by

$$\widehat{\theta}_T := -\frac{\int_0^T X_s\, dX_s}{\int_0^T X_s^2\, ds}. \tag{4.1}$$

Here we want to study the asymptotic distribution of a discrete version of (4.1). Then, we assume that the process $X$ given in (3.1) is observed equidistantly in time with the step size $\Delta_n$: $t_i = i\Delta_n$, $i = 0, \ldots, n$, and $T = n\Delta_n$ denotes the length of the "observation window". Let us consider the following discrete version of $\widehat{\theta}_T$:

$$\widehat{\theta}_n := -\frac{\sum_{i=1}^{n} X_{t_{i-1}}\bigl(X_{t_i} - X_{t_{i-1}}\bigr)}{\Delta_n \sum_{i=1}^{n} X_{t_{i-1}}^2}. \tag{4.2}$$

Note that [7] and [11], respectively, proved the weak and strong consistency of the estimator $\widehat{\theta}_n$ as $T \to \infty$ and $\Delta_n \to 0$.
Let $X$ be the process given by (3.1), and let us introduce the following sequences

where

and where $f_n(X)$ is given by (3.6). Next, since $\zeta_{t_{i-1}}$ and $\zeta_{t_i} - \zeta_{t_{i-1}}$ are independent, we have

On the other hand, the computation involves terms of the form $e^{-\theta(t_i + t_{i-1} + t_j + t_{j-1} + t_k + t_{k-1} + t_l + t_{l-1})}$ and $e^{-4\theta(t_i + t_{i-1})}$ multiplying moments of $\zeta$.
Theorem 4.1. There exists a constant $C > 0$ such that, for all $n \geq 1$ and $T > 0$,

Remark 1.1. Note that in [3, Theorem 2.1], [5, Theorem 2.3] and [8, Theorem 5.4], the asymptotic normality of the distributions of $\widetilde{\theta}_n$ and $\widehat{\theta}_n$ requires $n\Delta_n^2 = \frac{T^2}{n} \to 0$ and $T \to \infty$. However, Theorem 3.6 and Theorem 4.1, which are stated and proved below, show that the asymptotic normality of the distributions of $\widetilde{\theta}_n$ and $\widehat{\theta}_n$, respectively, only requires $\Delta_n = \frac{T}{n} \to 0$ and $T \to \infty$.