Counterexamples to the classical central limit theorem for triplewise independent random variables having a common arbitrary margin

We present a general methodology to construct triplewise independent sequences of random variables having a common but arbitrary marginal distribution $F$ (satisfying very mild conditions). For two specific sequences, we obtain in closed form the asymptotic distribution of the sample mean. It is non-Gaussian (and depends on the specific choice of $F$). This allows us to illustrate the extent of the 'failure' of the classical central limit theorem (CLT) under triplewise independence. Our methodology is simple and can also be used to create, for any integer $K$, new $K$-tuplewise independent sequences that are not mutually independent. For $K \geq 4$, it appears that the sequences created using our methodology do verify a CLT, and we explain heuristically why this is the case.


Introduction
Independence is a fundamental concept in probability. When speaking of 'independence', one generally means mutual independence, as opposed to pairwise independence or, more generally, '$K$-tuplewise independence' ($K \geq 2$). Recall that a collection of random variables (defined on the same probability space) are mutually independent, or just independent, if they are $K$-tuplewise independent for all integers $K \geq 2$.
While mutual independence implies $K$-tuplewise independence (for any $K$), the converse is not true. For the case $K = 2$ ('pairwise independence'), several counterexamples can be found in the literature; see, e.g., Avanzi et al. [1] for a recent survey. Such examples show that, even if a pairwise independent sequence is strictly stationary, a CLT need not hold. Weakley [23] extended earlier constructions by allowing the r.v.s in the sequence to have any symmetric distribution (with finite variance). Takeuchi [21] showed that a level of tuplewise independence growing linearly with the sample size is not even sufficient for a CLT to hold. In those examples, however, the asymptotic distribution of the sample mean is not given explicitly, so one cannot judge to what extent it departs from normality.
Kantorovitz [17] does provide an example of a triplewise independent sequence for which the standardized sample mean converges to a 'misbehaved' distribution, namely that of $Z_1 \cdot Z_2$, where $Z_1$ and $Z_2$ are independent $\mathcal{N}(0, 1)$ r.v.s, but this is achieved for a very specific choice of margin, namely the Bernoulli distribution.
In Section 2, we present a methodology, borrowing elements from graph theory, to construct new sequences of triplewise independent and identically distributed (denoted hereafter t.i.i.d.) r.v.s whose common marginal distribution can be chosen arbitrarily (under very mild conditions). In Section 3, we provide a necessary and sufficient condition for a CLT to hold for such sequences.
In Section 4, we provide what we believe to be the first two examples of triplewise independent sequences with arbitrary margins for which the asymptotic distribution of the standardized sample mean is explicitly known and non-Gaussian. These two limiting distributions depend on the choice of the margin and have heavier tails than a Gaussian. This allows us to assess how far from the Gaussian distribution one can get under triplewise independence alone. This work thus highlights why mutual independence is so fundamental for the classical CLT to hold.
Lastly, in Section 5, we explain how our methodology can easily be extended to create new $K$-tuplewise independent sequences (which are not mutually independent) for any integer $K$. While such sequences are interesting in themselves, it appears that for $K \geq 4$ they do verify a CLT, and we explain heuristically why this is the case. Although this is not the focus of the paper, we note that these sequences could prove useful to benchmark the performance of multivariate independence tests, many of which have been proposed in recent years; see, e.g., Fan et al. [12].

Construction of triplewise independent sequences
In this section, we present a general methodology to construct sequences $\{X_j\}_{j \geq 1}$ of t.i.i.d. r.v.s having a common (but arbitrary) marginal distribution $F$ satisfying the following condition (Condition 1): $F$ has finite variance and, for a r.v. $X \sim F$, there exists a Borel set $A$ such that $P(X \in A) = \ell^{-1}$ for some integer $\ell \geq 2$.
We begin our construction of the sequence $\{X_j\}_{j \geq 1}$ by letting $F$ be a distribution satisfying Condition 1, with mean and variance denoted by $\mu$ and $\sigma^2$, respectively. For a r.v. $X \sim F$, let $A$ be any Borel set such that
$$P(X \in A) = \ell^{-1}, \quad \text{for some integer } \ell \geq 2. \tag{2.1}$$
Our construction relies on a sequence of simple graphs $\{G_m\}_{m \geq 1}$ with two properties:
1. The girth of $G_m$ is 4 (or larger), for all $m$;
2. The number of edges of $G_m$ grows to infinity as $m \to \infty$.
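To make the construction concrete, here is a minimal simulation sketch. The choice of the complete bipartite graph $K_{m,m}$ (which has girth 4 and $m^2$ edges, so it satisfies both properties above) and all function names are ours; the edge indicators play the role of the 1's assigned to edges whose endpoints carry equal labels.

```python
import random

def edge_indicators(ell, m, rng):
    """Assign i.i.d. uniform labels on {0, ..., ell-1} to the vertices of
    the complete bipartite graph K_{m,m} (girth 4) and return, for each
    edge (L_i, R_j), the indicator that its two endpoints have equal labels."""
    left = [rng.randrange(ell) for _ in range(m)]
    right = [rng.randrange(ell) for _ in range(m)]
    return [int(a == b) for a in left for b in right]  # index i*m + j

rng = random.Random(0)
ell, m, reps = 2, 4, 20000

# Empirically check triplewise independence for three edges that share
# vertices: (L0, R0), (L0, R1), (L1, R0).
p1 = p2 = p3 = p123 = 0.0
for _ in range(reps):
    d = edge_indicators(ell, m, rng)
    d1, d2, d3 = d[0], d[1], d[m]
    p1 += d1; p2 += d2; p3 += d3
    p123 += d1 * d2 * d3
p1, p2, p3, p123 = p1 / reps, p2 / reps, p3 / reps, p123 / reps
# Each indicator is Bernoulli(1/ell), and the triple product factorizes:
# p123 should be close to p1 * p2 * p3 = ell**(-3).
```

(The three chosen edges span four vertices linked by three equality constraints forming a tree, so the joint probability is exactly $\ell^{-3}$; mutual independence fails for larger edge sets containing a 4-cycle.)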

Main result
We now state our main result, which links the asymptotic distribution of the standardized mean of the sequence $\{X_j\}_{1 \leq j \leq n}$ to that of the quantity defined in (2.5). This result holds for any growing sequence of simple graphs $\{G_m\}_{m \geq 1}$ of girth at least 4 (as defined previously). Specific examples are given in the next section.
where $N \sim \mathcal{N}(0, 1)$ and $\gamma$ is a constant determined by $F$ and the set $A$.

Remark 3.1. If $\gamma \neq 0$ and $\widetilde{\Xi}_m$ is asymptotically non-Gaussian (this happens for certain graphs $\{G_m\}_{m \geq 1}$; see the next section for examples), then $S_n$ is asymptotically non-Gaussian. Note that the restriction $\gamma \neq 0$ is not stringent, as it includes all distributions (in Condition 1) with a non-atomic part. Indeed, if $X \sim F$ has a non-atomic part, then $F$ has a non-atomic part on at least one side of $E[X]$. Without loss of generality, assume that the non-atomic part is on $(E[X], \infty)$; then we can find an integer $\ell \geq 2$ and a Borel set $A_0$ such that $P(X \in A) = \ell^{-1}$ with $A = (E[X], \infty) \cap A_0$. By construction, this yields $\gamma \neq 0$. The restriction $\gamma \neq 0$ also includes almost all discrete distributions with at least one weight of the form $\ell^{-1}$; see Remark 2 in Avanzi et al. [1] for a formal argument. Also note that, depending on $F$, many choices for $A$ (with possibly different values of $\ell$) could be available.
Remark 3.2. If the margin $F$ satisfies Condition 1, and if $\gamma = 0$ or $\widetilde{\Xi}_m$ is asymptotically Gaussian, then our construction provides new triplewise independent (but not mutually independent) sequences which do satisfy a CLT (regardless of which graphs $\{G_m\}_{m \geq 1}$ are used).
Proof of Theorem 3.1. We prove (3.2) by obtaining the limit of the characteristic function of $S_n$ and then invoking Lévy's continuity theorem. Namely, we show the pointwise convergence (3.3) of this characteristic function, for all $t \in \mathbb{R}$. Recall the notation defined in (2.6), note that $\Xi_m = \#\{i : D_i = 1\}$, and recall (2.9). With the notation $t_n := t/\sqrt{n}$, the mutual independence between the $Y_i$'s, the $Z_i$'s and the $D_i$'s yields, for all $t \in \mathbb{R}$, the four-factor decomposition (3.4) of the characteristic function. For the second factor in (3.4), the classical CLT yields its limit, where in the last equality we use (2.9). For the third factor in (3.4), the quantity inside the bracket converges to 1 by the CLT; the claim then follows from an elementary bound. For the fourth factor in (3.4), we note that $\Xi_m/m - \ell^{-1} \overset{P}{\longrightarrow} 0$ (by the law of large numbers for pairwise independent r.v.s); the continuous mapping theorem then gives its convergence in probability. Since the relevant sequence of conditional expectations is uniformly integrable (each term being bounded by 1 in modulus), Theorem 25.12 in [4] shows that the convergence also holds in mean, which proves (3.3). The conclusion follows.

Examples
In Theorem 3.1, whether the standardized sample mean $S_n$ is asymptotically Gaussian depends on the 'connectivity' of the chosen graphs $\{G_m\}_{m \geq 1}$. In particular, it appears that having graphs of bounded diameter is a necessary (albeit not sufficient) condition for $S_n$ to be asymptotically non-Gaussian. To make this point explicit, we present two specific examples for which we obtain the (non-Gaussian) asymptotic distribution of $\widetilde{\Xi}_m$ (via Theorem 3.1, this also provides the asymptotic distribution of $S_n$). We then present a third example where the limiting distribution is Gaussian. In the first example, the limiting law involves a standardized VG$(\ell - 1, 0, 1, 0)$ distribution, where VG denotes the variance-gamma distribution (see Definition A.1).
Proof. First, note that $v(G_m) = 2m$ and $e_m = m^2$. Define, for $u \in \{1, 2, \ldots, \ell\}$, the counts $N_u^{(1)}$ and $N_u^{(2)}$ of vertices carrying the label $u$ on each of the two sides of the graph; the count vectors $N^{(1)}$ and $N^{(2)}$ are independent. Importantly, if $N^{(1)}$ and $N^{(2)}$ are known, then the number of 1's among the $D_i$'s, denoted by $\Xi_m$ throughout, can be deduced from simple calculations, where $\mathbb{1}_B$ denotes the indicator function of the set $B$. It is well known that the covariance matrix of the first $\ell - 1$ components of a Multinomial count vector with equal probabilities $\ell^{-1}$ has a simple explicit form. By the classical multivariate CLT and Definition A.1 in Appendix A, we get the result. Next, we illustrate what the asymptotic distribution of $S_n$ (the standardized sample mean) looks like in this example. By Theorem 3.1, $S_n$ converges in law to a r.v., denoted $S^{(\ell)}$ below.
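The mechanics of this proof can be mimicked numerically. The sketch below assumes the underlying graphs are the complete bipartite graphs $K_{m,m}$ and takes $\ell = 2$, in which case $\Xi_m$ is a simple function of the two label counts and its standardized version approaches the law of a product of two independent standard normals (a VG$(1, 0, 1, 0)$ law); all helper names are ours.

```python
import random
import statistics

def standardized_xi(m, rng):
    """K_{m,m} with ell = 2: the number of 'equal-endpoint' edges is
    Xi = N1*N2 + (m - N1)*(m - N2), where N1, N2 ~ Binomial(m, 1/2)
    count the 1-labels on each side of the graph."""
    n1 = bin(rng.getrandbits(m)).count("1")
    n2 = bin(rng.getrandbits(m)).count("1")
    xi = n1 * n2 + (m - n1) * (m - n2)
    # Xi - m^2/2 = 2*(N1 - m/2)*(N2 - m/2), so dividing by m/2 gives
    # exactly the product of the two standardized binomial counts.
    return (xi - m**2 / 2) / (m / 2)

rng = random.Random(1)
sample = [standardized_xi(400, rng) for _ in range(200000)]
mean = statistics.fmean(sample)
var = statistics.pvariance(sample)
kurt = statistics.fmean(x**4 for x in sample) / var**2
# The product Z1*Z2 of two independent standard normals has mean 0,
# variance 1 and kurtosis E[Z^4]^2 = 9, i.e. much heavier tails than
# the Gaussian value 3.
```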
Hence, when $\ell \geq 2$ is fixed, $\gamma$ completely determines the shape of $S^{(\ell)}$; $\gamma$ close to 0 means that $S^{(\ell)}$ is close to a standard Gaussian, while $\gamma$ close to $\pm 1$ means that $S^{(\ell)}$ is close to a standardized VG$(\ell - 1, 0, 1, 0)$. Figure 4.2 (where $\ell = 2$ and $\gamma$ varies) illustrates this shift from a Gaussian distribution towards a VG$(\ell - 1, 0, 1, 0)$ distribution. On the other hand, regardless of $\gamma$, if $\ell$ increases then $S^{(\ell)}$ gets closer to a $\mathcal{N}(0, 1)$. This is illustrated in Figure 4.3 (where $\gamma = 0.99$ and $\ell$ varies). It is clear from these figures that triplewise independence can be a very poor substitute for mutual independence as an assumption in the classical CLT. Lastly, the first moments of $S^{(\ell)}$ (obtained with simple calculations in Mathematica) show that an upper bound on the kurtosis of $S^{(\ell)}$ is $6/(\ell - 1) + 3$, which implies that the limiting r.v. $S^{(\ell)}$ can be substantially more heavy-tailed than the standard Gaussian distribution (as is also seen in Figure 4.2).
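The kurtosis value $6/(\ell - 1) + 3$ can be checked by simulation, using the representation of VG$(r, 0, 1, 0)$ as a sum of $r$ independent products of standard normal pairs (a consequence of Theorem 1 in [13]); a sketch:

```python
import random
import statistics

def vg_sample(r, rng):
    """VG(r, 0, 1, 0) represented as a sum of r independent products of
    pairs of standard normals (cf. Theorem 1 in [13])."""
    return sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(r))

rng = random.Random(2)
ell = 3                  # margin parameter; the VG shape is r = ell - 1
r = ell - 1
# Each product has variance 1, so dividing by sqrt(r) standardizes the sum.
sample = [vg_sample(r, rng) / r**0.5 for _ in range(300000)]
var = statistics.pvariance(sample)
kurt = statistics.fmean(x**4 for x in sample) / var**2
# Kurtosis of the standardized VG(r, 0, 1, 0): 3 + 6/r, i.e. 6 for
# ell = 3, matching the bound 6/(ell - 1) + 3 stated in the text.
```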

Third example
Within our construction, a CLT can also hold. As a 'positive example', we consider here the sequence of $d$-hypercube graphs, which have $2^d$ vertices and $d \, 2^{d-1}$ edges. Despite being 'highly connected', these graphs do induce a Gaussian limit for the standardized sums. We will prove below that $\widetilde{\Xi}_d$ is asymptotically Gaussian, which implies that $S_n$ is as well. We use a decomposition of $\widetilde{\Xi}_d$ into components $\widetilde{\Xi}_d^{(k)}$. By symmetry of the construction, the case $k = 1$ is trivial (i.e., $E[\widetilde{\Xi}_d^{(1)}] = 0$). Therefore, assume that $k \geq 2$.
Consider any instance $\omega \in \Omega$ of the values of the Bernoulli r.v.s on the vertices of the $d$-hypercube satisfying the stated bound, where $C > 0$ is a universal constant.
(b) This follows by the weak law of large numbers for weakly correlated r.v.s with finite variance, together with the corresponding bound on $\mathrm{Var}(\widetilde{\Xi}_d)$. Indeed, if the Bernoulli r.v.s $\varepsilon_0$ and $\varepsilon_{2k+1}$ are equal in Figure 4.5 (this is represented by $B = 1$ in (4.4), which has probability 1/2), then for each of the $k$ 4-cycles in the graph, the sum of the 1's on the left, top and right edges is 3 with probability 1/4 and 1 with probability 3/4. By the independence of the Bernoulli r.v.s on the top-left and top-right corners of the 4-cycles, we can thus represent the sum of the $k$ 'sums of 1's' just described by $k + 2W$, where $W \sim \mathrm{Binomial}(k, 1/4)$. We get $1 + k + 2W$ by including the '1' for the bottom edge $(\varepsilon_0, \varepsilon_{2k+1})$, which we count only once since this edge is common to all the 4-cycles. Similarly, if $\varepsilon_0$ and $\varepsilon_{2k+1}$ are not equal in Figure 4.5 (this is represented by $B = 0$ in (4.4), which has probability 1/2), then for each of the $k$ 4-cycles, the sum of the 1's on the left, top and right edges is 2 with probability 3/4 and 0 with probability 1/4. By the same independence argument, the sum of the $k$ 'sums of 1's' can be represented by $2(k - W)$, since $k - W \sim \mathrm{Binomial}(k, 3/4)$. Combining the cases $B = 1$ and $B = 0$ yields the representation (4.4).
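As a numerical sanity check of the hypercube construction (with $\ell = 2$; a sketch with our own helper names), note that pairwise independence of the edge indicators already pins down the first two moments of the edge count exactly, since all covariances vanish:

```python
import random
import statistics

def hypercube_xi(d, rng):
    """Assign i.i.d. Bernoulli(1/2) labels to the 2^d vertices of the
    d-hypercube and count the edges whose endpoints carry equal labels."""
    labels = [rng.randrange(2) for _ in range(1 << d)]
    xi = 0
    for v in range(1 << d):
        for k in range(d):
            u = v ^ (1 << k)        # flip bit k: neighbour along axis k
            if u > v and labels[u] == labels[v]:
                xi += 1             # count each edge once (u > v)
    return xi

rng = random.Random(3)
d = 6
edges = d * 2 ** (d - 1)            # 192 edges for d = 6
sample = [hypercube_xi(d, rng) for _ in range(4000)]
mean = statistics.fmean(sample)
var = statistics.pvariance(sample)
# Girth 4 gives pairwise independent Bernoulli(1/2) edge indicators, so
# E[Xi] = edges/2 and Var(Xi) = edges/4 hold exactly; the standardized
# count is (asymptotically in d) Gaussian, as proved in this section.
```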

The general case $K \geq 4$
One can easily adapt the methodology presented in this paper to build new sequences of $K$-tuplewise independent random variables (with an arbitrary margin $F$). Indeed, all one needs to do is find a growing sequence of simple graphs of girth $K + 1 \geq 5$ and then, as before, put i.i.d. discrete uniforms on the vertices and assign 1's to edges for which the r.v.s on the adjacent vertices are equal. A girth of $K + 1$ guarantees $K$-tuplewise independence of the sequences hence created. An arbitrary margin $F$ can be obtained as before by defining the sequences as in (2.7), and then creating the final sequence as in (2.8).
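For instance, taking the Petersen graph (our choice of a simple graph of girth 5, which gives $K = 4$), one can check the $4$-tuplewise independence of the edge indicators empirically; a sketch:

```python
import random

# Petersen graph: outer 5-cycle (vertices 0-4), inner pentagram (5-9),
# and five spokes; girth 5, so the 15 edge indicators below should be
# 4-tuplewise independent.
EDGES = ([(i, (i + 1) % 5) for i in range(5)]             # outer cycle
         + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram
         + [(i, 5 + i) for i in range(5)])                # spokes

def indicators(ell, rng):
    """i.i.d. uniform labels on the 10 vertices; equal-endpoint indicators."""
    v = [rng.randrange(ell) for _ in range(10)]
    return [int(v[a] == v[b]) for a, b in EDGES]

rng = random.Random(4)
ell, reps = 2, 40000
# Four edges sharing vertices: (0,1), (1,2), (0,5), (1,6).
idx = [EDGES.index(e) for e in [(0, 1), (1, 2), (0, 5), (1, 6)]]
p = [0.0] * 4
p_all = 0.0
for _ in range(reps):
    d = indicators(ell, rng)
    vals = [d[i] for i in idx]
    for j in range(4):
        p[j] += vals[j]
    p_all += vals[0] * vals[1] * vals[2] * vals[3]
p = [x / reps for x in p]
p_all /= reps
# 4-tuplewise independence: p_all should be close to
# p[0]*p[1]*p[2]*p[3] = ell**(-4).
```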
Whether or not sequences created this way will satisfy a CLT is a different (and difficult) question. In [2], the author constructs explicitly an infinite collection of simple connected regular graphs of girth 6 and diameter 3, which we denote by $G_q$, where the index $q$ runs over the prime powers. These graphs are obtained as the incidence graphs of projective planes of order $q$. For any given prime power $q$, the graph $G_q$ is $(q+1)$-regular and has $2(q^2 + q + 1)$ vertices. In particular, it is a $(q+1, 6)$-cage because the number of vertices achieves the Moore (lower) bound; see, e.g., Biggs [3, Chapter 23]. This extremely uncommon sequence of graphs would be the perfect candidate for our construction to display a limiting non-Gaussian law for the normalized sum. Indeed, in addition to having a minimal number of vertices, these graphs also have a constant (and finite) diameter, which means that we do not have strong mixing of the binary random variables assigned to the edges (strong mixing is the most common assumption for a CLT with dependent random variables; see, e.g., Rosenblatt [20]). However, even in this context where the edges' dependence is, in a sense, maximized (because of the constant diameter and the minimal number of vertices), our simulations show that we cannot reject the hypothesis of a Gaussian limit for $S_n$. We applied a battery of standard normality tests with $q = 2^6$ (which corresponds to a sample of size $n = (q+1)(q^2+q+1) = 270{,}465$) and 5,000 samples. For the interested reader, the code is provided in Appendix B. Note that (5.1) cannot be satisfied by regular graphs of girth $\geq 5$; see, e.g., Biggs [3, Proposition 23.1]. This dichotomy in the statistics context (and its link to graph theory) seems to be a completely new and promising observation.
Remark 5.2. In contrast to the sequence of graphs in our first example (Section 4.1), the sequence of hypercube graphs in our third example (Section 4.3) does not satisfy (5.1). The property (5.1) in a sense measures the connectivity of the graphs, and therefore the level of dependence between the r.v.s assigned to the edges in our construction. Since (5.1) cannot be satisfied for $K \geq 4$ when the underlying graphs are regular, the third example reinforces our intuition that, for $K \geq 4$, the standardized sums (and thus $S_n$) will always converge to a Gaussian random variable.

A The variance-gamma distribution
Definition A.1. The variance-gamma distribution with parameters $r > 0$, $\theta \in \mathbb{R}$, $\sigma > 0$, $\mu \in \mathbb{R}$ has a density function expressed in terms of $K_\nu$, the modified Bessel function of the second kind of order $\nu$. If a random variable $Y$ has this distribution, then we write $Y \sim \mathrm{VG}(r, \theta, \sigma^2, \mu)$.
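For concreteness, the density can be evaluated numerically. The sketch below assumes the parameterization of [13], under which VG$(1, 0, 1, 0)$ is the law of a product of two independent standard normals, with density $K_0(|x|)/\pi$; the quadrature scheme and function names are ours, and the Bessel routine is only meant for small orders and moderate arguments.

```python
import math

def bessel_k(nu, x, t_max=25.0, n=2000):
    """Modified Bessel function of the second kind, via the integral
    representation K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt
    (trapezoidal rule; adequate for small nu and moderate x > 0)."""
    h = t_max / n
    total = 0.5 * (math.exp(-x)
                   + math.exp(-x * math.cosh(t_max)) * math.cosh(nu * t_max))
    for i in range(1, n):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def vg_density(x, r, theta, sigma, mu):
    """Variance-gamma density, in the parameterization of [13] (assumed)."""
    y = abs(x - mu)
    s = math.sqrt(theta**2 + sigma**2)
    nu = (r - 1) / 2
    return (math.exp(theta * (x - mu) / sigma**2)
            / (sigma * math.sqrt(math.pi) * math.gamma(r / 2))
            * (y / (2 * s)) ** nu
            * bessel_k(nu, s * y / sigma**2))

# Sanity check: VG(1, 0, 1, 0) is the law of Z1*Z2 with density
# K_0(|x|)/pi, so the total mass over the real line should be ~1
# (midpoint rule on (0, 12], doubled by symmetry).
h = 0.01
mass = 2 * sum(vg_density((i + 0.5) * h, 1, 0.0, 1.0, 0.0) * h
               for i in range(1200))
```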
We have the following result, which is a consequence (for example) of Theorem 1 in [13].