On the boundary properties of Bernstein estimators on the simplex

In this paper, we study the asymptotic properties (bias, variance, mean squared error) of Bernstein estimators for cumulative distribution functions and density functions near and on the boundary of the $d$-dimensional simplex. Our results generalize those found by Leblanc (2012), who treated the case $d = 1$, and complement the results from Ouimet (2021) in the interior of the simplex. Since the "edges" of the $d$-dimensional simplex have dimensions going from $0$ (vertices) up to $d - 1$ (facets) and our kernel function is multinomial, the asymptotic expressions for the bias, variance and mean squared error are not straightforward extensions of one-dimensional asymptotics, as they would be for the product-type estimators studied by almost all past authors in the context of Bernstein estimators or asymmetric kernel estimators. This point makes the mathematical analysis much more interesting.


Introduction
The $d$-dimensional (unit) simplex and its interior are defined by

$S_d := \{x \in [0,1]^d : \|x\|_1 \le 1\}$ and $\mathrm{Int}(S_d) := \{x \in (0,1)^d : \|x\|_1 < 1\}$,

where $\|x\|_1 := \sum_{i=1}^d |x_i|$. Precisely, for $m, n \in \mathbb{N}$, let

$F_{n,m}(x) := \sum_{k \in \mathbb{N}_0^d \cap m S_d} F_n(k/m) \, P_{k,m}(x), \quad x \in S_d, \qquad (1)$

where $F_n$ denotes the empirical c.d.f. of the sample, the observations $X_1, X_2, \dots, X_n$ are assumed to be independent and $F$ distributed, and the weights are the following probabilities from the $\mathrm{Multinomial}(m, x)$ distribution:

$P_{k,m}(x) := \frac{m!}{(m - \|k\|_1)! \, \prod_{i=1}^d k_i!} \, (1 - \|x\|_1)^{m - \|k\|_1} \prod_{i=1}^d x_i^{k_i}.$

It should be noted that the c.d.f. estimator in (1) only makes sense here if the observations' support is contained in a hyperrectangle inside the unit simplex. If the observations have full support on the unit simplex, then the c.d.f. estimator would take values in $(0, 1)$ on the interior of the unit hypercube.
For a density $f$ supported on $S_d$, we define the Bernstein density estimator of $f$ by

$f_{n,m}(x) := m^d \sum_{k \in \mathbb{N}_0^d \cap (m-1) S_d} \bigg(\frac{1}{n} \sum_{i=1}^n \mathbb{1}_{(k/m, \, (k+1)/m]}(X_i)\bigg) P_{k,m-1}(x), \quad x \in S_d, \qquad (2)$

where $m^d$ is a scaling factor equal to the inverse of the volume of the hypercube $(k/m, (k+1)/m]$. As pointed out in Ouimet (2021a), the estimator $f_{n,m}$ is not a proper density but integrates to $1$ asymptotically, and it can be written as a finite mixture of Dirichlet densities (in the one-dimensional case, finite beta mixtures are studied, for example, in McLachlan and Peel (2000)).
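To make definitions (1) and (2) concrete, here is a small pure-Python sketch of the two estimators. The function names (`multinomial_weight`, `bernstein_cdf`, `bernstein_density`) are ours, and this is a naive brute-force illustration rather than an efficient implementation:

```python
from itertools import product
from math import factorial, prod

def multinomial_weight(k, m, x):
    """P_{k,m}(x): probability of the count vector k under Multinomial(m, x)."""
    s_k, s_x = sum(k), sum(x)
    coeff = factorial(m) // (factorial(m - s_k) * prod(factorial(ki) for ki in k))
    return coeff * prod(xi ** ki for xi, ki in zip(x, k)) * (1.0 - s_x) ** (m - s_k)

def simplex_lattice(m, d):
    """All k in N_0^d with k_1 + ... + k_d <= m."""
    return (k for k in product(range(m + 1), repeat=d) if sum(k) <= m)

def bernstein_cdf(data, m, x):
    """Estimator (1): the empirical c.d.f. F_n evaluated on the grid {k/m},
    smoothed by the Multinomial(m, x) weights."""
    n, d = len(data), len(x)
    def F_n(t):
        return sum(all(oi <= ti for oi, ti in zip(obs, t)) for obs in data) / n
    return sum(F_n(tuple(ki / m for ki in k)) * multinomial_weight(k, m, x)
               for k in simplex_lattice(m, d))

def bernstein_density(data, m, x):
    """Estimator (2): m^d times the empirical mass of each box (k/m, (k+1)/m],
    smoothed by the Multinomial(m - 1, x) weights."""
    n, d = len(data), len(x)
    total = 0.0
    for k in simplex_lattice(m - 1, d):
        mass = sum(all(ki / m < oi <= (ki + 1) / m for ki, oi in zip(k, obs))
                   for obs in data) / n
        total += mass * multinomial_weight(k, m - 1, x)
    return m ** d * total
```

For $d = 1$ the weights reduce to binomial probabilities and `bernstein_density` is Vitale's estimator; the lattice sum has $O(m^d)$ terms, so this version is only practical for small $d$ and $m$.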
Review of the literature and motivation

Vitale (1975) was the first author to consider Bernstein density estimation on the compact interval $[0,1]$, namely (2) with $d = 1$. He computed the asymptotics of the bias, variance and mean squared error (MSE) at each point where the second derivative of $f$ exists (also assuming that $f$ is bounded everywhere). His proof rests on careful Taylor expansions for the density terms inside the bulk of the binomial distribution, while concentration bounds are applied to show that the contributions coming from outside the bulk are negligible. The optimal rate with respect to the MSE is achieved when $m \asymp n^{2/5}$ and shown to be $O_x(n^{-4/5})$ for $x \in (0,1)$. For $x \in \{0,1\}$ (at the boundary), he found that the MSE is $O_x(n^{-3/5})$ using $m \asymp n^{2/5}$, but this choice of $m$ turns out to be suboptimal; the rate $O_x(n^{-2/3})$ can be achieved with $m \asymp n^{1/3}$. These results were generalized by Stadtmüller (1980, 1981) to target densities with support of the form $[0,1]$, $[0,\infty)$ and $(-\infty,\infty)$, where the binomial weights of the density estimator were replaced by more general convolutions. Tenbusch (1994) was the first author to consider Bernstein estimation in the multivariate context. Assuming that $f$ is twice continuously differentiable, he derived asymptotic expressions for the bias, variance and MSE of the density estimators on the two-dimensional unit simplex and the unit square $[0,1]^2$, and also proved their uniform strong consistency and asymptotic normality. He showed that the optimal rate of the MSE is $O_x(n^{-2/3})$ for interior points when $d = 2$, achieved when $m \asymp n^{1/3}$. On the "edges" of dimension 1 (i.e., excluding the corners), he showed that the optimal rate is $O_x(n^{-4/7})$, achieved when $m \asymp n^{2/7}$. On the "edges" of dimension 0 (i.e., the three corners of the 2-dimensional simplex), he showed that the optimal rate is $O_x(n^{-1/2})$, achieved when $m \asymp n^{1/4}$.
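For intuition, Vitale's two rates follow from a standard bias-variance balancing argument, a sketch of which is given below, assuming (as discussed next) that the bias is of order $m^{-1}$ everywhere while the variance is of order $n^{-1} m^{1/2}$ in the interior and $n^{-1} m$ at the boundary:

```latex
% Interior of [0,1]: squared bias ~ m^{-2}, variance ~ n^{-1} m^{1/2}
\mathrm{MSE}(m) \asymp m^{-2} + n^{-1} m^{1/2}
  \;\Longrightarrow\; m_{\mathrm{opt}} \asymp n^{2/5}, \qquad
  \mathrm{MSE}(m_{\mathrm{opt}}) \asymp n^{-4/5};
% Boundary points x in {0,1}: squared bias ~ m^{-2}, variance ~ n^{-1} m
\mathrm{MSE}(m) \asymp m^{-2} + n^{-1} m
  \;\Longrightarrow\; m_{\mathrm{opt}} \asymp n^{1/3}, \qquad
  \mathrm{MSE}(m_{\mathrm{opt}}) \asymp n^{-2/3},
% while the interior choice m ~ n^{2/5} gives only
% n^{-1} m = n^{-3/5} at the boundary.
```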
As we will see, this deterioration of the MSE near the boundary is solely due to an increase of the variance of the density estimator, while the bias remains asymptotically uniform and of order O(m −1 ). This is in contrast with traditional kernel estimators, where the opposite trade-off happens: the bias increases near the boundary while the variance remains asymptotically the same. For both classes of estimators, the rates of the MSE might be the same on the boundary but for different reasons.
It should be noted that this variance increase near and on the boundary was previously observed by other authors for product-type asymmetric kernel estimators (meaning that the kernel is a product of one-dimensional kernels) on the geometrically simpler domains $[0,1]^d$ and $[0,\infty)^d$; see, e.g., (Bouezmarni and Rombouts, 2010, page 142 and (7a)), (Igarashi and Kakizawa, 2020, pages 50 and 52) and (Kakizawa, 2022, Corollary 1). Leblanc (2012b) also observed this phenomenon in one dimension and provided explicit asymptotic expressions for the bias, variance and MSE of the classical Bernstein density and c.d.f. estimators near and on the boundary of $[0,1]$. Our main goal in this paper is to extend Leblanc's results to the $d$-dimensional simplex and complete the theoretical analysis initiated by Ouimet (2021a) on the asymptotic properties of Bernstein density and c.d.f. estimators on the simplex (with multinomial weights).
In Leblanc (2012b), Leblanc extended the results of Vitale (1975) by studying the asymptotics of the bias and variance not only for points on the boundary, but also near the boundary (i.e., for $x = \lambda/m$, where $\lambda \ge 0$ remains fixed as $n, m \to \infty$). In particular, he showed that the leading term of the variance can be written using modified Bessel functions of the first kind. He also proved similar results for the Bernstein c.d.f. estimator. As mentioned above, our goal in this paper is to generalize the results of Leblanc (2012b) to all $d \ge 1$. The results will help us understand how the dimension of the "edge" we are closest to influences the variance and the MSE of the density estimator and the c.d.f. estimator.
Since the "edges" of the $d$-dimensional simplex have dimensions going from $0$ (vertices) up to $d - 1$ (facets) and our kernel function is multinomial, the asymptotic expressions for the bias, variance and MSE are not straightforward extensions of the one-dimensional expressions, as they would be for product-type estimators such as the ones in (Bouezmarni and Rombouts, 2010, page 142 and (7a)), (Igarashi and Kakizawa, 2020, pages 50 and 52) and (Kakizawa, 2022, Corollary 1). This is why the results in the present paper are of mathematical interest. Apart from the present paper, the only article dealing with boundary results for multidimensional non-product-type Bernstein estimators or asymmetric kernel estimators that we are aware of is Corollary 2 of Kakizawa (2022), where the variance of a multivariate elliptical-based Birnbaum–Saunders kernel density estimator is computed near and on the boundary.

Contribution
Our theoretical contribution is to find asymptotic expressions for the bias, variance and MSE of the Bernstein c.d.f. and density estimators, defined respectively in (1) and (2), near and on the boundary of the $d$-dimensional simplex. We also deduce the asymptotically optimal choice of the bandwidth parameter $m$ from the expressions for the MSE, which can be used in practice to implement a plug-in selection method. All these results generalize those found in Leblanc (2012b) for the unit interval and complement those found in Ouimet (2021a) in the interior of the simplex. Our rates of convergence for the MSE are in line with those recently found in Ouimet and Tolosana-Delgado (2022) for Dirichlet kernel estimators. In both cases, the general rule is that the variable smoothing built into Bernstein estimators and asymmetric kernel estimators yields an asymptotically smaller bias near the boundary, at the cost of an increase in variance, compared to traditional multivariate kernel estimators.
Under the assumption that the target density is twice continuously differentiable, we find in particular that the variance is $O_x(n^{-1} m^{d/2})$ in the interior of the simplex, and that it gets multiplied by a factor $m^{1/2}$ every time we move close to the boundary in one of the $d$ dimensions. If we are near an edge of dimension $d - |\mathcal{J}|$ (see Section 4 for the definition of $\mathcal{J}$), then the variance is $O_x(n^{-1} m^{(d + |\mathcal{J}|)/2})$. Additional smoothness conditions on the partial derivatives of the target density can improve those rates; see, e.g., Corollary 2 for more details on this point.
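Taking the uniform $O(m^{-1})$ bias together with this variance order, the same balancing argument as in one dimension gives the optimal bandwidth and MSE rate near an edge of dimension $d - |\mathcal{J}|$; a sketch, which is consistent with Tenbusch's rates for $d = 2$ quoted above:

```latex
\mathrm{MSE}(m) \asymp m^{-2} + n^{-1} m^{(d + |\mathcal{J}|)/2}
  \;\Longrightarrow\;
  m_{\mathrm{opt}} \asymp n^{2/(d + |\mathcal{J}| + 4)}, \qquad
  \mathrm{MSE}(m_{\mathrm{opt}}) \asymp n^{-4/(d + |\mathcal{J}| + 4)}.
% Check (d = 2): |J| = 0, 1, 2 gives the rates n^{-2/3}, n^{-4/7}, n^{-1/2},
% achieved with m ~ n^{1/3}, n^{2/7}, n^{1/4}, matching Tenbusch (1994).
```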
In contrast to other methods of boundary bias reduction (such as the reflection method or boundary kernels; see, e.g., Scott (2015, Chapter 6)), this property is built-in for Bernstein estimators, which makes them one of the easiest to use in the class of estimators that are asymptotically unbiased near (and on) the boundary. Bernstein estimators are also non-negative everywhere on their domain, which is not the case for many estimators corrected for boundary bias; this is another reason for their desirability. Also, as an anonymous referee pointed out, the fact that the bandwidth parameter $m$ is an integer makes the optimisation step easier to implement (for most bandwidth selection criteria, such as least-squares cross-validation, likelihood cross-validation, etc.) than for estimators where the bandwidth parameter $h$ is a real number. Bandwidth selection methods and their consistency will be investigated thoroughly in upcoming work.

Outline
In Section 4 and Section 5, we state our results for the density estimator and the c.d.f. estimator, respectively. The proofs are given in Section 6 and Section 7. Some technical lemmas and tools are gathered in Appendix A.

Notation
Throughout the paper, the notation $u = O(v)$ means that $\limsup |u/v| < C < \infty$ as $m$ or $n$ tends to infinity, depending on the context. The positive constant $C$ can depend on the target c.d.f. $F$, the target density $f$ or the dimension $d$, but on no other variable unless explicitly written as a subscript. One common occurrence is a local dependence of the asymptotics on a given point $x$ of the simplex, in which case we write $u = O_x(v)$. In a similar fashion, the notation $u = o(v)$ means that $\lim |u/v| = 0$ as $m$ or $n$ tends to infinity, and the same rule applies for the subscript. The symbol $\mathcal{D}$ over an arrow '$\longrightarrow$' will denote convergence in law (or distribution). We will use the shorthand $[d] := \{1, 2, \dots, d\}$ in several places. The functions $I_0$ and $I_1$ will denote the modified Bessel functions of the first kind of order $0$ and $1$, respectively. For any vector $v \in \mathbb{R}^d$ and any subset of indices $\mathcal{J} \subseteq [d]$, we write $v_{\mathcal{J}} := (v_i)_{i \in \mathcal{J}}$, with the conventions $\sum_{\emptyset} := 0$ and $\prod_{\emptyset} := 1$. Finally, the bandwidth parameter $m = m(n)$ is always implicitly a function of the number of observations, the only exceptions being in Lemmas 1, 3, 5, 6 and the related proofs.
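As a reminder, the standard series representations of these two Bessel functions are:

```latex
I_0(z) = \sum_{j=0}^{\infty} \frac{(z/2)^{2j}}{(j!)^2},
\qquad
I_1(z) = \sum_{j=0}^{\infty} \frac{(z/2)^{2j+1}}{j!\,(j+1)!},
\qquad z \ge 0.
```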
Results for the density estimator $f_{n,m}$

For every result stated in this section, we will make the following assumption:

• $f \in C^2(\mathrm{Int}(S_d))$ and there exists an open set $U \subseteq \mathbb{R}^d$ that contains $S_d$, and an extension $f_{\mathrm{ext}} : U \to \mathbb{R}$, such that $f_{\mathrm{ext}} \equiv f$ on $S_d$ and $f_{\mathrm{ext}} \in C^2(U)$. (5)
Remark 1. If $f \in C^2(\mathbb{R}^d)$, then we can just take $f_{\mathrm{ext}} = f$. However, if $f$ is discontinuous at some point on the boundary of $S_d$, then, at that point, the partial derivatives of $f$ will technically refer to the partial derivatives of $f_{\mathrm{ext}}$.
In the first lemma, we obtain a general expression for the bias of the density estimator.
By considering points $x \in S_d$ that are close to the boundary in some components (see the subset of indices $\mathcal{J} \subseteq [d]$ below), we get the bias of the density estimator near the boundary.
where $x_{\mathcal{J}}$ and $\lambda_{\mathcal{J}}$ are defined in (4).

Next, we obtain a general expression for the variance of the density estimator.
By combining Lemma 2 and the technical estimate in Lemma 5, we get the asymptotics of the variance of the density estimator near the boundary.
Theorem 2 (Variance of $f_{n,m}(x)$ near the boundary of $S_d$). Assume (5). For any $x \in S_d$ such that

By combining Theorem 1 and Theorem 2, we get the asymptotics of the mean squared error of the density estimator near and on the boundary. In particular, the optimal bandwidth parameter $m$ will depend on the number of components of $x$ that are close to the boundary.
Corollary 1 (MSE of $f_{n,m}(x)$ near the boundary of $S_d$). Assume (5). For any $x \in S_d$ such that

(9). If the quantity inside the big bracket in (10) is non-zero, the asymptotically optimal choice of $m$, with respect to the MSE, is

(4)) in which case we have, as $n \to \infty$,

By imposing further conditions on the partial derivatives of $f$, we can remove terms from the bias in Theorem 1 and obtain another expression for the mean squared error of the density estimator near and on the boundary, together with the corresponding optimal bandwidth parameter $m$ when $\mathcal{J} \neq [d]$.
Corollary 2 (MSE of $f_{n,m}(x)$ near the boundary of $S_d$). Assume (5) and also (in particular, the first bracket in (7) is zero). Then, for any $x \in S_d$ such that

where $x_{\mathcal{J}}$, $x_{[d] \setminus \mathcal{J}}$ and $\lambda_{\mathcal{J}}$ are defined in (4), $v_{\mathcal{J}}(x)$ is defined in (9), and where . (12), the asymptotically optimal choice of $m$, with respect to the MSE, is (11), the optimal rate of the MSE near any vertex of the simplex is of the same order as in the interior of the simplex (Corollary 1 with $\mathcal{J} = \emptyset$), without the smoothness conditions.
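The variance inflation quantified above is easy to observe numerically. Below is a small Monte Carlo sketch for $d = 1$ with a Uniform$(0,1)$ target (all helper names and settings are ours, chosen only for illustration): by the stated orders, the variance of the density estimator should be roughly $m/n$ at the vertex $x = 0$ versus roughly $m^{1/2}/n$ at the interior point $x = 1/2$, i.e., about $m^{1/2}$ times larger at the boundary.

```python
import random
from math import comb

def bern_density_1d(data, m, x):
    # d = 1 version of estimator (2): m * (empirical box masses), smoothed
    # by the Binomial(m - 1, x) weights.
    n = len(data)
    counts = [0] * m
    for v in data:
        # box (k/m, (k+1)/m]; for continuous data, int(v * m) is a.s. correct
        counts[min(int(v * m), m - 1)] += 1
    return m * sum((c / n) * comb(m - 1, k) * x ** k * (1 - x) ** (m - 1 - k)
                   for k, c in enumerate(counts))

def sample_variance(vals):
    mu = sum(vals) / len(vals)
    return sum((v - mu) ** 2 for v in vals) / (len(vals) - 1)

random.seed(1)
n, m, reps = 300, 25, 300
at_zero, at_half = [], []
for _ in range(reps):
    data = [random.random() for _ in range(n)]
    at_zero.append(bern_density_1d(data, m, 0.0))
    at_half.append(bern_density_1d(data, m, 0.5))

var_boundary = sample_variance(at_zero)   # theory: order m / n
var_interior = sample_variance(at_half)   # theory: order m^{1/2} / n
```

With these (arbitrary) settings, the empirical ratio `var_boundary / var_interior` comes out several times larger than 1, in line with the extra factor $m^{1/2}$.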

Remark 3. In order to optimize $m$ when $\mathcal{J} = [d]$ in Corollary 2, we would need a more precise expression for the bias than the one in Theorem 1, obtained by assuming more regularity conditions on $f$ than we did in (5).

Results for the c.d.f. estimator $F_{n,m}$
For every result stated in this section, we will make the following assumption:

• $F$ is three-times continuously differentiable on $S_d$. (13)
Below, we obtain a general expression for the bias of the c.d.f. estimator on the simplex, and then near the boundary.
Lemma 3 (Bias of $F_{n,m}(x)$ on $S_d$). Assume (13). Then, uniformly for $x \in S_d$, we have, as $m \to \infty$,

Theorem 3 (Bias of $F_{n,m}(x)$ near the boundary of $S_d$). Assume (13). For any $x \in S_d$ such that

we have, as $n \to \infty$,

In the case where $x_i = 0$ for some $i \in \mathcal{J} \neq \emptyset$, notice that $\mathrm{Bias}(F_{n,m}(x)) = 0$ because $F_{n,m}(x) = 0$ almost surely.
Next, we obtain a general expression for the variance of the c.d.f. estimator on the simplex.
Lemma 4 (Variance of $F_{n,m}(x)$ on $S_d$). Assume (13). Then, uniformly for $x \in S_d$, we have, as $n \to \infty$,

By combining Lemma 6 and Lemma 4, we get the asymptotics of the variance of the c.d.f. estimator near the boundary.
Theorem 4 (Variance of $F_{n,m}(x)$ near the boundary of $S_d$). Assume (13). For any $x \in S_d$ such that

In the case where $x_i = 0$ for some $i \in \mathcal{J} \neq \emptyset$, notice that $\mathrm{Var}(F_{n,m}(x)) = 0$ because $F_{n,m}(x) = 0$ almost surely.
By combining Theorem 3 and Theorem 4, we get the asymptotics of the mean squared error of the c.d.f. estimator near the boundary.
Corollary 3 (MSE of $F_{n,m}(x)$ near the boundary of $S_d$). Assume (13). For any $x \in S_d$ such that

(14) and $V_{\mathcal{J}}(x)$ is defined in (15). In the case where $x_i = 0$ for some $i \in \mathcal{J} \neq \emptyset$, notice that $\mathrm{MSE}(F_{n,m}(x)) = 0$ because $F_{n,m}(x) = 0$ almost surely. Furthermore, as pointed out by Leblanc (2012b, p. 2772) for $d = 1$, there is no optimal $m$ with respect to the MSE when $\mathcal{J} \neq \emptyset$; this is also true here. The remaining case $\mathcal{J} = \emptyset$ (i.e., when $x$ is far from the boundary in every component) was already treated in Corollary 1 of Ouimet (2021a).

Proof of Lemma 1
Take $\delta_n \searrow 0$ slowly enough as $n \to \infty$ (for example, $\delta_n \ge m^{-1/4}$) that standard concentration bounds for the binomial distribution yield, for some appropriate constants $c_d, C_d > 0$,

For any $k$ such that $\|k/m - x\|_1 \le \delta_n$, we can use Taylor expansions and our assumption that $f$ is twice continuously differentiable to obtain

If we multiply the last expression by $P_{k,m-1}(x)$ and sum over $k \in \mathbb{N}_0^d \cap (m-1) S_d$, then the joint moments from Lemma 7 and Jensen's inequality yield

The conclusion follows.
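For $d = 1$, the size of the discarded tail in this bulk/tail split can be checked directly. A numerical illustration with the choice $\delta_n = m^{-1/4}$ mentioned above (the helper name is ours):

```python
from math import comb

def binomial_tail_mass(m, x, delta):
    """Total Binomial(m, x) probability on the set {k : |k/m - x| > delta}."""
    return sum(comb(m, k) * x ** k * (1 - x) ** (m - k)
               for k in range(m + 1) if abs(k / m - x) > delta)

m, x = 400, 0.3
delta = m ** (-0.25)   # the choice delta_n >= m^{-1/4} from the proof
tail = binomial_tail_mass(m, x, delta)
# Hoeffding's inequality already bounds this by 2 * exp(-2 * m * delta**2),
# which equals 2 * exp(-40) here, so the tail contribution is negligible.
```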

Proof of Theorem 1
Take $x$ as in the statement of the theorem. Using the notation from (6), we have

and

Then, from (17), (18), (19) and Lemma 1, we can easily deduce the conclusion.

Proof of Lemma 2
By the independence of the observations $X_1, X_2, \dots, X_n$, we have

From Lemma 1, we already know that $\mathbb{E}[f_{n,m}(x)] = f(x) + O(m^{-1})$, uniformly for $x \in S_d$. We can also expand the integral using a Taylor expansion:

The Cauchy–Schwarz inequality yields

By putting (20), (21) and (22) together, we get the conclusion.

Proof of Theorem 2
By Lemma 2, Lemma 5 and (26) in Lemma 7, we have

Using a Taylor expansion of $f$, the conclusion follows.

Proof of Lemma 3
Take $\delta_n \searrow 0$ slowly enough as $n \to \infty$ that the contribution coming from points $k/m$ outside the bulk of the multinomial distribution (i.e., $\|k/m - x\|_1 > \delta_n$) is negligible, exactly as we did in (16). Then, for any $k$ such that $\|k/m - x\|_1 \le \delta_n$, we can use a Taylor expansion and our assumption that $F$ is three-times continuously differentiable to obtain

If we multiply the last expression by $P_{k,m}(x)$, sum over $k \in \mathbb{N}_0^d \cap m S_d$, and then take the expectation on both sides, we get

We apply Hölder's inequality on the error term to get the conclusion.

Proof of Theorem 3
Take $x$ as in the statement of the theorem. For all $i, j \in [d]$, note that

$(1 + o_{\lambda_{\mathcal{J}}}(1))$,

By Lemma 3 and the second and fourth moments in Lemma 7, we deduce

This ends the proof.

Proof of Lemma 4
To estimate the variance of $F_{n,m}(x)$, note that

By the independence of the observations $X_1, X_2, \dots, X_n$, we get

Using the expansion in (23) and Lemma 3, the above is

By the second moment expression in Lemma 7, we get the conclusion.

Proof of Theorem 4
By Lemma 4, Lemma 6 and (26) in Lemma 7, we have

$\mathrm{Var}(F_{n,m}(x)) = n^{-1} m^{-1}$

Now, using the fact that

we get

$\mathrm{Var}(F_{n,m}(x)) = n^{-1} m^{-1}$

This ends the proof.

A Technical lemmas
The first lemma is a generalization of Lemma 3 in Ouimet (2021a). It is used in the proof of Theorem 2.
where $I_0$ is defined in (3) and $\psi_{[d] \setminus \mathcal{J}}$ is defined in (4).
The crucial tool for the proof is a local limit theorem for the multinomial distribution in Arenbaev (1976), which combines the multivariate Poisson approximation and the multivariate normal approximation, depending on which components of $x \in S_d$ are close to the boundary.
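The Poisson side of this dichotomy is easy to illustrate numerically for $d = 1$: if $x = \lambda/m$ with $\lambda$ fixed, the binomial weights converge to Poisson$(\lambda)$ probabilities (this is why Poisson-type quantities such as $I_0$ appear in the boundary variance). A quick total-variation check under these assumptions (helper names ours; Le Cam's inequality bounds the distance by $m (\lambda/m)^2 = \lambda^2/m$):

```python
from math import comb, exp

def binom_pmf(m, p, k):
    return comb(m, k) * p ** k * (1 - p) ** (m - k)

def poisson_pmfs(lam, kmax):
    """Poisson(lam) probabilities for k = 0, ..., kmax (computed iteratively
    to avoid huge factorials)."""
    out, pmf = [], exp(-lam)
    for k in range(kmax + 1):
        out.append(pmf)
        pmf *= lam / (k + 1)
    return out

# Near the vertex 0, take x = lambda/m with lambda fixed:
# Binomial(m, lambda/m) is close to Poisson(lambda) in total variation.
m, lam = 200, 2.0
pois = poisson_pmfs(lam, m)
tv = 0.5 * sum(abs(binom_pmf(m, lam / m, k) - pois[k]) for k in range(m + 1))
```

Here $\lambda^2/m = 0.02$, and the computed `tv` indeed falls below that bound; as $m$ grows with $\lambda$ fixed, `tv` vanishes.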

Conflict of interest
The author declares no conflict of interest.

Funding information
The author acknowledges (past) support of a postdoctoral fellowship from the NSERC (PDF) and a postdoctoral fellowship supplement from the FRQNT (B3X). The author is currently supported by a CRM-Simons postdoctoral fellowship from the Centre de recherches mathématiques and the Simons Foundation.