Copula modeling for discrete random vectors

Copulas have now become ubiquitous statistical tools for describing, analysing and modelling dependence between random variables. Sklar's theorem, "the fundamental theorem of copulas", makes a clear distinction between the continuous case and the discrete case, though. In particular, the copula of a discrete random vector is not fully identifiable, which causes serious inconsistencies. In spite of this, downplaying statements may be found in the related literature, where copula methods are used for modelling dependence between discrete variables. This paper calls for reconsidering the soundness of copula modelling for discrete data. It suggests a more fundamental construction which allows copula ideas to carry over smoothly to the discrete case. It is, in fact, an attempt at rejuvenating some century-old ideas of Udny Yule, who mentioned a similar construction long before copulas came into fashion.


1. Introduction
In Yule [53], one can read: "Two association tables that are not directly comparable owing to the different proportions of A's and B's in the data from which the tables were compiled may be rendered directly comparable by multiplying the frequencies in rows and columns by appropriate factors, [...] reducing the original tables to some arbitrarily selected standard form" (p. 588). The standard form that he recommends is the table whose margins have been made uniform. Likewise, in their extensive study of association coefficients in (2 × 2)-contingency tables, Goodman and Kruskal [21, p. 747] mentioned transforming all marginals to 1/2 for facilitating interpretation. Later, Mosteller [36] developed: "We might instead think of a contingency table as having a basic nucleus which describes its association and think of all tables formed by multiplying elements in rows and columns by positive numbers as forming an equivalence class - a class of tables with the same degree of association" (p. 4). And: "we might especially arrange the table to have uniform margins on each side in the case of a two-way table so as to get a clearer look at the association that is actually occurring" (p. 6).
If one identifies two-way contingency tables with bivariate discrete distributions, then the above ideas have much in common with copulas: one tries to capture the dependence structure between the two variables apart from the marginal distributions, by making these uniform, hence uninformative. The observation is notable, as it has been known at least since Marshall [32] that the notion of copula fits poorly in the discrete framework. Here 'copula' refers to the classical definition [7, Definition 1.3.1] which, in the bivariate case, reads:

Definition 1.1 (Copula). A (bivariate) copula is a function C : I² → [0, 1], where I² = [0, 1]², such that C(u, 0) = C(0, v) = 0, C(u, 1) = u and C(1, v) = v for all u, v ∈ [0, 1], and C(u₂, v₂) − C(u₁, v₂) − C(u₂, v₁) + C(u₁, v₁) ≥ 0 for all 0 ≤ u₁ ≤ u₂ ≤ 1 and 0 ≤ v₁ ≤ v₂ ≤ 1.

Such copulas naturally arise in statistical modelling through the celebrated Sklar's theorem [50]:

Theorem 1.1 (Sklar). Let F_XY be the distribution function of a bivariate random vector (X, Y), with marginal distribution functions F_X and F_Y. Then there exists a copula C such that, for all (x, y) ∈ R²,
F_XY(x, y) = C(F_X(x), F_Y(y)). (1.1)
If F_X and F_Y are continuous, then C is unique; otherwise C is uniquely determined on Ran F_X × Ran F_Y only. Conversely, for any univariate distribution functions F_X and F_Y and any copula C, the function F_XY defined by (1.1) is a valid bivariate distribution function with marginals F_X and F_Y.
The popularity of copulas for dependence modelling largely follows from quotes like 'Copulas allow us to separate the effect of dependence from the effects of the marginal distributions'. Clearly, if C is unique, then it unequivocally characterises how the two marginals F_X and F_Y interlock to produce the joint behaviour of (X, Y), while staying ignorant of what those marginals are. For instance, if X and Y are independent (X ⊥⊥ Y) and if C is unique, then from (1.1) C must be the 'independence copula' (or 'product copula')
Π(u, v) = uv, (1.2)
and this regardless of F_X and F_Y. It is often overlooked that the situation is this appealing only in the case of continuous margins. When there is no one-to-one correspondence between the joint distribution F_XY and the copula C, i.e., for X and/or Y discrete, the above argument falls apart.
Instrumental to copula ideas is the vector (F_X(X), F_Y(Y)). If X and Y are both continuous, then, through the 'Probability Integral Transform' (hereafter: PIT), F_X(X) and F_Y(Y) have uniform distributions U[0, 1], and the copula C is their joint distribution. Clearly one can plug any increasing transformations of X and/or Y into the PIT with the same output. Hence copulas are invariant under increasing transformations of the margins [37, Theorem 2.4.3], that is, 'margin-free'. Any copula-based dependence measure, such as Kendall's or Spearman's correlation [37, Chapter 5], is then margin-free as well.
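As a quick numerical illustration of the PIT and of its margin-freeness (a simulation sketch; all names below are ours), F_X(X) is uniform for a continuous X, and it is pointwise unchanged, not merely equal in law, under an increasing transformation of X:

```python
import math
import random

random.seed(7)
n = 50_000

# X ~ Exp(1), so F_X(x) = 1 - exp(-x); the PIT says F_X(X) ~ U[0,1]
xs = [-math.log(1.0 - random.random()) for _ in range(n)]
pit = [1.0 - math.exp(-x) for x in xs]
mean_pit = sum(pit) / n                       # should be close to 1/2

# An increasing transformation T(x) = x**3 leaves the PIT output unchanged
# pointwise: F_{T(X)}(T(x)) = P(X**3 <= x**3) = F_X(x).
ts = [x ** 3 for x in xs]

def F_T(t):
    """cdf of X**3 when X ~ Exp(1)."""
    return 1.0 - math.exp(-t ** (1.0 / 3.0))

pit_t = [F_T(t) for t in ts]                  # identical to pit, up to rounding
```

This is why any copula-based quantity, computed from (F_X(X), F_Y(Y)) alone, cannot depend on the margins in the continuous case.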
Now, in the case of X and/or Y discrete, Ran F_X and/or Ran F_Y are just countable subsets of [0, 1]. The distributions of F_X(X) and/or F_Y(Y) are thus not U[0, 1], and their joint distribution cannot be a copula as described by Definition 1.1. It is actually a subcopula, i.e., a function satisfying the main structural properties of copulas but whose domain is only a strict subset of I² containing 0 and 1 [37, Definition 2.2.1]. Any such subcopula can be extended into a copula [37, Lemma 2.3.5]: the gaps in I² \ (Ran F_X × Ran F_Y) can be filled in a way preserving the properties of copulas; however, there are uncountably many ways of doing so, and C in (1.1) is not identifiable.
Such unidentifiability does cause serious inconsistencies, which [18] systematically investigated following preliminary warnings in [32], and it questions the soundness of copula modelling for discrete data. This is discussed in Section 2, while a further invitation to call current practice into question is offered in Section 3.
In particular, we claim that the concept of 'copula' should be given a more fundamental meaning, not limited to Definition 1.1 but agreeing with it in the continuous case. The key is to apprehend copulas from a different perspective, a detailed discussion of which is given in Section 4.
Then we propose a construction which, rejuvenating Yule's, Goodman and Kruskal's and Mosteller's conceptions, allows a seamless extension of the main ideas of copula theory to the discrete case; first in the case of a bivariate Bernoulli distribution (Section 5), and then gradually generalising it to bivariate discrete distributions with finite support (Section 6) and finally with infinite support (Section 8). Along the way, we bridge the gap between the copula literature and methods of contingency table analysis. In particular, the role of the odds ratio is reinforced and Yule's colligation coefficient establishes itself as the appropriate dependence parameter in a discrete copula-like setting; algebraic and geometric representations of contingency tables are leveraged; parallels between discrete copula modelling and the problem of matrix scaling are drawn; and a novel visual display of bivariate discrete distributions ('confetti plot') is proposed. New parametric models of dependence between discrete random variables are also introduced in Section 7.

2. Copulas on discrete distributions
Most reasons which make copula modelling attractive and effective in the continuous case break down in the discrete case: "everything that can go wrong, will go wrong" [10, p. 641]. A major downside is that copulas lose their margin-free nature when applied to discrete random vectors, whereas the whole copula methodology came into being in the first place for exploiting the benefits of margin-freeness [47]. In particular, in the discrete case, copula-based measures of dependence (e.g. Kendall's or Spearman's) are margin-dependent [18, Proposition 2.3; 32, Section 4.2]. Worse, a given copula model per se may or may not be intrinsically meaningful, or even compatible with some marginals [12, 54]. All in all, the case of discrete random vectors seems indeed misaligned with the very essence of the whole copula methodology.
A telling example is the following. Let X ∼ Bern(π_X), Y ∼ Bern(π_Y) for two probabilities π_X, π_Y ∈ (0, 1), and X ⊥⊥ Y (independent). Then, for reconstructing the bivariate Bernoulli F_XY it is enough to plug in (1.1) any copula C such that
C(1 − π_X, 1 − π_Y) = (1 − π_X)(1 − π_Y), (2.1)
as is seen by inspection. Indeed Sklar's theorem states that C is only identifiable on Ran F_X × Ran F_Y = {0, 1 − π_X, 1} × {0, 1 − π_Y, 1}, but given that C is fixed by trivial constraints along the sides of I², only what happens at (1 − π_X, 1 − π_Y) may reflect (in)dependence. The product copula (1.2) naturally fulfils (2.1), but so does a wide spectrum of other copulas of miscellaneous shapes whose only common trait is to pass through the point (1 − π_X, 1 − π_Y, (1 − π_X)(1 − π_Y)) ∈ (0, 1)³; see the Appendix for some illustration. Any conclusion drawn from such a copula-based bivariate Bernoulli model is highly questionable, as its central element C may interchangeably characterise independence or dependence of drastically different strength and nature, yet compatible with (2.1).
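To make this non-identifiability tangible, the following sketch (plain Python; all helper names are ours, not from the paper) builds a convex mixture of the two Fréchet-Hoeffding bounds M(u, v) = min(u, v) and W(u, v) = max(u + v − 1, 0), a copula blending perfect positive and perfect negative dependence, with the mixture weight chosen so that (2.1) holds. Plugged into (1.1), this strongly dependent copula reproduces the independent bivariate Bernoulli exactly:

```python
def M(u, v):
    """Comonotone (upper Fréchet-Hoeffding) copula."""
    return min(u, v)

def W(u, v):
    """Countermonotone (lower Fréchet-Hoeffding) copula."""
    return max(u + v - 1.0, 0.0)

def mixture_through(pi_x, pi_y):
    """Mixture (1-a)M + aW with weight a chosen so that (2.1) holds."""
    u0, v0 = 1.0 - pi_x, 1.0 - pi_y
    a = (M(u0, v0) - u0 * v0) / (M(u0, v0) - W(u0, v0))
    return lambda u, v: (1.0 - a) * M(u, v) + a * W(u, v)

def bernoulli_pmf_from_copula(C, pi_x, pi_y):
    """Joint pmf of (X, Y) obtained by plugging C into Sklar's formula (1.1)."""
    u0, v0 = 1.0 - pi_x, 1.0 - pi_y
    p00 = C(u0, v0)
    return {(0, 0): p00, (0, 1): u0 - p00,
            (1, 0): v0 - p00, (1, 1): 1.0 - u0 - v0 + p00}

pi_x, pi_y = 0.3, 0.6
C = mixture_through(pi_x, pi_y)
p = bernoulli_pmf_from_copula(C, pi_x, pi_y)
# The pmf induced by this mixture of perfect-dependence copulas is exactly
# the one of two independent Bernoullis:
independent = {(x, y): (pi_x if x else 1 - pi_x) * (pi_y if y else 1 - pi_y)
               for x in (0, 1) for y in (0, 1)}
```

Away from the single informative point, C differs markedly from the product copula, yet the Bernoulli model cannot tell them apart.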
It is true that, if one takes two discrete variables X and Y and binds them together through a copula C that we have picked, the 'Conversely' part of Sklar's theorem guarantees that one has built a valid bivariate discrete distribution F_XY with the right marginals. But there is no special link between C and F_XY. For instance, [12] explains how the bivariate Bernoulli distribution on which [18, Example 13] fitted an FGM copula could have been obtained all the same from a Plackett or an Ali-Mikhail-Haq copula, or from the reader's 'peculiar favourite copula family' [12, p. 128], making it futile to mention any specific copula model at all in this case.

3. Transformations of the margins to uniforms
The root of all trouble is that it is not true that F_X(X) ∼ U[0, 1] for X discrete. Yet, the U[0, 1]-distribution of F_X(X) and F_Y(Y) in the continuous case is clearly what prompted Definition 1.1, and the widespread belief that copula methods are based on transformations of the margins into uniforms. Thus the very foundation of copula theory seems unfit for the discrete framework. Nonetheless, in order to make the discrete case forcibly fit into the classical copula framework, a common practice has been to 'jitter' the original discrete variables with some uniform random noise. The so-created artificial continuous random vector has a unique copula, known as the checkerboard copula. Arguably, the checkerboard copula retains some of the dependence structure of the original discrete vector [6, 18, 38, 45, 46], and is a valid copula extension of the underlying subcopula [11, 19, 20]. However, it is just a particular choice, and not always the most natural one, among all the copulas satisfying (1.1), and by itself does not solve any of the problems exposed above [45, Remark 1.5(a)].
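The jittering mechanism can be sketched as follows (assuming the common choice X* = X + U with U ~ U[0, 1) independent of X; variable names are ours): the PIT fails on the Bernoulli variable itself, whose transform takes only two values, but holds for the jittered version:

```python
import random

random.seed(42)
pi = 0.3                     # X ~ Bern(pi)
n = 100_000

def F_X(x):
    """cdf of Bern(pi): a step function, so F_X(X) takes only 2 values."""
    return 0.0 if x < 0 else (1 - pi if x < 1 else 1.0)

def F_star(t):
    """cdf of the jittered X* = X + U, U ~ U[0,1): piecewise linear."""
    if t < 0:
        return 0.0
    if t < 1:
        return t * (1 - pi)
    if t < 2:
        return (1 - pi) + (t - 1) * pi
    return 1.0

xs = [1 if random.random() < pi else 0 for _ in range(n)]
pit_discrete = [F_X(x) for x in xs]                       # not uniform at all
pit_jittered = [F_star(x + random.random()) for x in xs]  # ~ U[0,1)

mean_j = sum(pit_jittered) / n                            # close to 1/2
var_j = sum((v - mean_j) ** 2 for v in pit_jittered) / n  # close to 1/12
```

The continuity so obtained is entirely artificial: it is the noise U, not the data, that fills the gaps of the subcopula.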
[35, Section 4] explicitly asked: 'Why does one transform the marginals to a uniform distribution?', and failing to find any compelling mathematical answer (among other things) led him to reject the idea of copulas altogether. Yet, it has been widely acknowledged since then [10], but even long before [24, p. 69], that the choice of transforming the margins to uniforms is driven by convenience only. Now, given that transforming to uniform is precisely the stumbling block of copula methods for discrete variables, one may sensibly ask: why stick to an inessential choice initially made for convenience only, if it is no longer convenient at all in the situation of interest? Sklar's theorem establishes that the valuable information for understanding the joint behaviour of (X, Y), in particular their dependence, can be captured by some incidental copula C evaluated on Ran F_X × Ran F_Y. A naive interpretation of this puts C in the foreground, urging us to 'guess' what it might be; whereas it is actually of no importance, making the value of such guesswork unclear. In fact, there is no reason to extend the unique subcopula of a discrete vector to a copula, and any justifiable analysis of the underlying dependence structure should be undertaken at the subcopula level, or an equivalent one.
Consider again the bivariate Bernoulli case. What Sklar's theorem fundamentally says is that the whole dependence structure can be described by one single number; see the lines following (2.1). Naturally this agrees with any basic analysis of a (2 × 2)-contingency table, where it is well known that the dependence/association in the table is captured by a single degree of freedom; cf. the χ²-test. It is not clear what the benefit would be of playing on a whole bivariate function C for modelling or characterising that dependence, knowing that one single number defines it entirely. [8] argued that this number should be the odds ratio because it is 'margin-free' (he did not use that term, though; but see his Corollary 2). [41] calls this 'variation-independence' between the marginal parameters (π_X, π_Y) and the odds ratio ω, and further shows (his Theorem 6.3) that any such margin-free dependence parameter must be a one-to-one function of ω. In Section 5, it will indeed be shown that there is a one-to-one correspondence between the odds ratio and the value C(1 − π_X, 1 − π_Y) singled out by Sklar's theorem in this situation. It is also easy (Section 5.6) to show that, given π_X and π_Y, the full bivariate pmf can be reconstructed from the value of ω only. Hence the marginal distributions, coupled with the margin-free dependence parameter ω, unequivocally define the bivariate distribution of interest. Clearly, the single number ω entirely fulfils what we would like the role of a copula to be, while by no means being related to Definition 1.1.
Transformation to uniform marginals is thus clearly not a necessary step for making sense of the main ideas behind copula modelling. Indeed, in Section 4, an alternative perspective on copulas is given, not relying explicitly on PIT. Avoiding PIT allows the concept to be readily adapted to the discrete case as well, while keeping all the pleasant properties of usual copula modelling.

4. Copulas as equivalence classes of dependence
Let (X, Y) be a continuous vector with distribution F_XY. For simplicity, assume that X and Y are both supported on [0, 1] (without loss of generality: one can imagine that we observe X ∈ R and Y ∈ R on the inverse logit scale, and copulas are invariant to monotonic transformations of the margins in any case) and that F_XY admits a density f_XY with marginal densities f_X and f_Y on the unit square I². Let F = {f : I² → R such that f ≥ 0 and ∫_{I²} f = 1}, the set of all bivariate probability densities on I², and S the set of all differentiable strictly increasing functions from [0, 1] onto [0, 1]. See that (S, ∘), where ∘ denotes function composition, is a group, and so is (S × S, ∘.), where ∘. denotes componentwise composition: (Φ₁, Ψ₁) ∘. (Φ₂, Ψ₂) = (Φ₁ ∘ Φ₂, Ψ₁ ∘ Ψ₂). For any (Φ, Ψ) ∈ S × S, define g_{Φ,Ψ} : F → F as
g_{Φ,Ψ}(f)(u, v) = f(Φ⁻¹(u), Ψ⁻¹(v)) / [Φ′(Φ⁻¹(u)) Ψ′(Ψ⁻¹(v))], (u, v) ∈ I². (4.1)
Clearly g_{Φ,Ψ}(f_XY) is the joint density of (Φ(X), Ψ(Y)), i.e., the version of f_XY whose marginal distributions have been individually distorted by Φ and Ψ. This defines a group action of (S × S, ∘.) on F, whose orbits [f] = {g_{Φ,Ψ}(f) : (Φ, Ψ) ∈ S × S} partition F into equivalence classes. The class [f_XY] contains all those 'marginally distorted' densities. What these densities have in common, compared to other classes, can then only be the constituent of f_XY 'between' the margins, that is, what 'glues' them together. This is exactly the definition of 'dependence': 'the information on the law of a random vector which remains to be determined once the marginal laws have been specified' [52]. Each equivalence class in F/∼ is thus representative of a certain dependence structure. The elements of F/∼ are really those which deserve the name 'copula', as they genuinely are the links ('copulae' in Latin) which cement marginals inside bivariate densities. To avoid confusion with Definition 1.1, though, we will call the element [f] ∈ F/∼ the 'nucleus' of f, to align with [36]; see Section 1. A nucleus [f] is, in some sense, a bivariate density which has been entirely stripped of its marginals. Now, the abstract concept of a bivariate density with no marginals is difficult to visualise.
Hence, for understanding the inner dependence structure of the vector (X, Y), one may want to exhibit a simple re-embodiment of [f_XY] into a proper density by pasting on it some default marginals. If one targets uniform marginals, then, by the PIT, this is the element g_{F_X,F_Y}(f_XY), that is,
c(u, v) = f_XY(F_X⁻¹(u), F_Y⁻¹(v)) / [f_X(F_X⁻¹(u)) f_Y(F_Y⁻¹(v))], (u, v) ∈ I²,
in which we recognise the density c of the copula C of F_XY. Clearly the choice of uniform margins for re-embodying [f_XY] into a proper density is totally arbitrary. It seems just sensible, for interpretation and visualisation purposes, to keep things as uncomplicated as possible. That said, uniforms and/or the PIT do not play any role when defining the concept of nucleus, which is really what copulas are all about. The construction can thus be adapted mutatis mutandis to discrete distributions, as detailed below.

5. The Bernoulli copula

5.1. The bivariate Bernoulli distribution
Consider again the case of two Bernoulli random variables X ∼ Bern(π_X) and Y ∼ Bern(π_Y) sharing (potentially) some dependence. The corresponding bivariate Bernoulli distribution, say p, is typically presented under the form of a (2 × 2)-table,

        Y = 0      Y = 1
X = 0   p00        p01     | 1 − π_X
X = 1   p10        p11     | π_X
        1 − π_Y    π_Y                  (5.1)

where p_xy = P(X = x, Y = y), x, y ∈ {0, 1}, and we assume 0 < π_X, π_Y < 1 (ruling out any degenerate table). Define P_{2×2} the set of all such bivariate Bernoulli probability mass functions, where each p ∈ P_{2×2} is identified to the matrix
p = ( p00 p01 ; p10 p11 ). (5.2)
Now, as Σ_{x,y} p_xy = 1, one can actually identify P_{2×2} with the 3-dimensional simplex, here a regular tetrahedron whose vertices are the degenerate distributions d00, d01, d10 and d11; see Figure 5.1. We will call this tetrahedron the Bernoulli tetrahedron. Note that, as we assume 0 < π_X, π_Y < 1, the 4 vertices and the edges d00d01, d10d11, d00d10 and d01d11 are not admissible elements of P_{2×2}.

5.2. Marginal transformations
Mimicking Section 4, we look to isolate the constituent of p which remains invariant to 'monotonic distortion' of the margins. In the continuous case, by such distortion was meant the vector (Φ(X), Ψ(Y)), whose density (4.1) can take miscellaneous shapes. Here, this 'transformation trick' does not work: for X ∼ Bern(π_X), Φ(X) remains the same two-point distribution (1 − π_X, π_X) (only the 'labels' change), and the same for Ψ(Y). Now, one can see (4.1) from a more basic perspective, considering the group action g_{Φ,Ψ} as just a mechanism reassigning the initial probability mass differently over I². Under the effect of g_{Φ,Ψ}, the point (u, v), initially assigned the probability f_XY(u, v) du dv, passes its mass on according to the distorted density f*_XY = g_{Φ,Ψ}(f_XY) given by (4.1): in a sense, we regard this as the effect of a transport map taking the initial probability measure onto another one, as opposed to the initial interpretation as the resulting probability measure after transformation of the random variables. Note that the normalisation by Φ′(Φ⁻¹(u)) Ψ′(Ψ⁻¹(v)) just guarantees ∫ f*_XY = 1. This more fundamental interpretation of (4.1) carries over to the Bernoulli framework.
Indeed, for some ϕ > 0, define a distorted distribution for X as Bern(π*_X), where π*_X = ϕπ_X / (1 − π_X + ϕπ_X). Clearly, for ϕ > 1, π*_X > π_X: some of the probability initially assigned to X = 0 has been transferred to the next value, X = 1; and reversely for ϕ < 1. Like above, the factor 1/(1 − π_X + ϕπ_X) is just a normalisation, guaranteeing π*_X ∈ (0, 1) for all ϕ > 0. When the margin Y is similarly distorted, the initial joint probability distribution is re-assigned through table (5.1) by a similar process of transferring probability weight between adjacent cells. The organisation of the cells, in particular their order along each margin, is not altered: the marginal distortions are monotonic in that sense. In effect, the distorted table is obtained by multiplying the rows and columns of (5.2) by positive values (and renormalising). This totally concords with what [53] and [36] urged; see Section 1.
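A minimal sketch of these marginal distortions (function names are ours): the univariate distortion moves π_X monotonically in ϕ, and the joint row/column scaling of the table leaves the odds ratio untouched:

```python
def distort_margin(pi, phi):
    """Distorted Bern(pi): probability of the value 1 scaled by phi, renormalised."""
    return phi * pi / (1.0 - pi + phi * pi)

def distort(p, phi, psi):
    """Multiply row x=1 by phi and column y=1 by psi, then renormalise.

    p is the 2x2 table [[p00, p01], [p10, p11]] of (5.2)."""
    q = [[p[0][0],       p[0][1] * psi],
         [p[1][0] * phi, p[1][1] * phi * psi]]
    s = sum(sum(row) for row in q)
    return [[v / s for v in row] for row in q]

def odds_ratio(p):
    return p[0][0] * p[1][1] / (p[0][1] * p[1][0])

p = [[0.35, 0.05], [0.25, 0.35]]     # some bivariate Bernoulli pmf
q = distort(p, 2.0, 0.5)             # a 'marginally distorted' version of p
```

All distorted versions q share the same odds ratio as p: this is the invariant that the group action cannot touch.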
Specifically, define D^(1)_{2×2} the set of all diagonal matrices whose entry (1,1) is equal to 1 and whose entry (2,2) is positive, and for any ϕ, ψ > 0, set
Φ = diag(1, ϕ) ∈ D^(1)_{2×2} and Ψ = diag(1, ψ) ∈ D^(1)_{2×2}. (5.3)
Equipped with the matrix multiplication ·, (D^(1)_{2×2}, ·) is a group, and so is (D^(1)_{2×2} × D^(1)_{2×2}, ·.), where ·. is componentwise matrix multiplication: (Φ₁, Ψ₁) ·. (Φ₂, Ψ₂) = (Φ₁ · Φ₂, Ψ₁ · Ψ₂). Similarly to Section 4, the map
g_{Φ,Ψ}(p) = ΦpΨ / s(ΦpΨ), where s(A) denotes the sum of the entries of A,
is a group action of (D^(1)_{2×2} × D^(1)_{2×2}, ·.) on P_{2×2}. Any p ∈ P_{2×2} induces an orbit [p] = {g_{Φ,Ψ}(p) : (Φ, Ψ) ∈ D^(1)_{2×2} × D^(1)_{2×2}}, and P_{2×2}/∼ is the set of all those equivalence classes. Any [p] ∈ P_{2×2}/∼ is a class of bivariate Bernoulli distributions (5.2) sharing the same 'core' structure once we strip them of their margins.

5.3. Bernoulli nucleus and Bernoulli copula probability mass function
For any distribution p ∈ P_{2×2}, define the odds ratio ω(p) = p00 p11 / (p10 p01). For any Φ = diag(1, ϕ) and Ψ = diag(1, ψ), the normalisation cancels in the ratio and
ω(g_{Φ,Ψ}(p)) = (p00 · ϕψ p11) / (ψ p01 · ϕ p10) = ω(p), (5.4)
and in fact p ∼ p* ⇐⇒ ω(p) = ω(p*), establishing that the elements of P_{2×2}/∼ are again classes of equivalent dependence. We call [p] the nucleus of p, which contains the full information about how the Bernoulli marginals are glued together inside p.
In the Bernoulli tetrahedron, the sets of distributions p ∈ P_{2×2} sharing common odds ratios ω ∈ (0, ∞) are doubly-ruled surfaces corresponding to sections of hyperboloids of one sheet [15, Section 3]. E.g., Figure 5.1 shows the surface corresponding to ω = 1, that is, all bivariate Bernoulli distributions for which X ⊥⊥ Y.
For interpretation purposes, it may be insightful to define a representative of [p], that is, a particular 'simple' bivariate Bernoulli distribution with odds ratio ω(p). Again, a natural choice is the element of [p] with uniform margins, as [21] suggested (Section 1). Simple algebra reveals that, for p ∈ P_{2×2} such that ω(p) = ω ≥ 0, there is a unique element in [p] with Bernoulli(1/2)-margins, which is
p̄ = 1/(2(1 + √ω)) ( √ω 1 ; 1 √ω ). (5.5)
This representative is akin to the copula density in the continuous case, hence we call p̄ the Bernoulli copula probability mass function (copula pmf). Note that the values in (5.5) were mentioned in [4, 11.2-14], while a similar 'copula' was investigated in [51].
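The closed form (5.5) is easy to check numerically (a sketch; function names are ours): the table below has uniform Bernoulli(1/2) margins and odds ratio ω:

```python
import math

def bernoulli_copula_pmf(omega):
    """Unique uniform-margin element of the nucleus, cf. (5.5)."""
    r = math.sqrt(omega)
    c = 1.0 / (2.0 * (1.0 + r))
    return [[r * c, c], [c, r * c]]

pbar = bernoulli_copula_pmf(4.0)
row_sums = [sum(row) for row in pbar]
col_sums = [pbar[0][0] + pbar[1][0], pbar[0][1] + pbar[1][1]]
or_pbar = pbar[0][0] * pbar[1][1] / (pbar[0][1] * pbar[1][0])
```

At ω = 1 all four entries equal 1/4, the uniform pmf on {0, 1}², in line with the independence copula pmf below.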
In the Bernoulli tetrahedron, all p ∈ P_{2×2} with the same marginal distributions must lie on a straight line orthogonal to the edges d00d11 and d01d10 [15, Section 4]. Denote w = (1/2)( 0 1 ; 1 0 ) and m = (1/2)( 1 0 ; 0 1 ) the limiting copula pmfs (5.5) for ω = 0 and ω → ∞, that is, the midpoints of those two edges. Given that X ⊥⊥ Y ⇐⇒ ω = 1, the independence Bernoulli copula pmf is evidently
π = (1/4)( 1 1 ; 1 1 ),
and clearly X ⊥⊥ Y ⇐⇒ p̄ = π. This can be contrasted with the observation made in Section 2 that a continuous copula C gluing two independent Bernoullis as in (1.1) need not be the independence copula.

5.4. Structural zeros
The limit values ω = 0 and ω = ∞ occur when (at least) one of the entries of (5.2) is zero: a 'structural zero'. The value ω = 0 arises in distributions like
(i) ( 0 × ; × 0 ),  (ii) ( 0 × ; × × ) or ( × × ; × 0 ), (5.7)
where ×'s represent non-zero elements. In the terminology of [29], case (i) corresponds to absolute association, whereas cases (ii) correspond to complete association. Clearly (i) represents 'perfect dependence' (negative, in this case), but it is not that clear for (ii) as there is no one-to-one correspondence between X and Y. Therefore, the fact that the odds ratio ω = 0 and the corresponding copula pmf w do not distinguish between (i) and (ii) may appear puzzling. Yet it is easily seen that 'absolute association' (i) is only possible if π_X + π_Y = 1. Whenever π_X + π_Y ≠ 1, any sense of 'perfect dependence' automatically translates into 'complete association'. Hence the dependence is actually as strong as can be in both cases (i) and (ii) given the margins. Being a marginal feature, the distinction between (i) and (ii) must be ignored by the copula pmf.
In the Bernoulli tetrahedron, the distributions (i) constitute the edge d01d10, of which w is the midpoint, while the distributions (ii) lie on the two faces adjacent to that edge: one can approach w arbitrarily closely while staying on either of those two faces. In these critical cases (ii), it is thus necessary to extend the nuclei of such distributions p to their closure Cl([p]) for them to include the corresponding copula pmf w. Further characterisation will be given in more generality in Section 6. The case ω = ∞ and p̄ = m is treated in perfect analogy.

5.5. Yule's colligation coefficient
Suppose that (U, V) is a bivariate Bernoulli vector with joint pmf (5.5) for some ω ≥ 0. One can check that Pearson's correlation between U and V is
Υ = (√ω − 1)/(√ω + 1),
which is exactly Yule's 'colligation coefficient' [53, pp. 592-593]. Hence Υ can be regarded as the Bernoulli analogue of Spearman's ρ, in the sense that it is Pearson's correlation computed after copula transformation.
In fact, as √ω = (1 + Υ)/(1 − Υ), the copula pmf p̄ (5.5) can be written under the even simpler form
p̄ = (1/4)( 1 + Υ  1 − Υ ; 1 − Υ  1 + Υ ).
The effect of Υ on p̄ is thus linear in nature. The value of Υ acts as a ruler along the segment wm in Figure 5.1: from Υ = −1 at w to Υ = 1 at m, via Υ = 0 at π.
Kendall's τ corrected for the occurrence of ties, i.e. 'τ_b' [18, Definition 3], is, for the bivariate Bernoulli case,
τ_b = (p00 p11 − p01 p10) / √(π_X(1 − π_X) π_Y(1 − π_Y)).
This, computed on the copula pmf (5.5), reduces down to τ_b = Υ again. The above observations suggest that Yule's Υ is a very natural, if not the canonical, dependence parameter in the (2 × 2)-table framework. This is noteworthy as, although it was originally Yule's preferred association measure [53, p. 592], this coefficient has by now largely passed into oblivion.
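A small numerical check of the above (function names are ours): under the uniform-margin pmf (5.5), Pearson's correlation coincides with Υ = (√ω − 1)/(√ω + 1), and √ω = (1 + Υ)/(1 − Υ) inverts the relation:

```python
import math

def copula_pmf(omega):
    """Bernoulli copula pmf (5.5) as a dict {(u, v): probability}."""
    r = math.sqrt(omega)
    c = 1.0 / (2.0 * (1.0 + r))
    return {(0, 0): r * c, (0, 1): c, (1, 0): c, (1, 1): r * c}

def pearson(pmf):
    """Pearson's correlation of a bivariate pmf on {0,1}^2."""
    eu = sum(u * w for (u, _), w in pmf.items())
    ev = sum(v * w for (_, v), w in pmf.items())
    euv = sum(u * v * w for (u, v), w in pmf.items())
    vu = sum(u * u * w for (u, _), w in pmf.items()) - eu ** 2
    vv = sum(v * v * w for (_, v), w in pmf.items()) - ev ** 2
    return (euv - eu * ev) / math.sqrt(vu * vv)

def colligation(omega):
    """Yule's colligation coefficient as a function of the odds ratio."""
    r = math.sqrt(omega)
    return (r - 1.0) / (r + 1.0)

omega = 9.0
ups = colligation(omega)          # = (3 - 1)/(3 + 1) = 1/2
```

The agreement is exact, not asymptotic: Υ is Pearson's correlation computed after the copula transformation.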

5.6. Construction of arbitrary bivariate Bernoulli distributions with given copula pmf
Analogously to the 'Conversely' part of Theorem 1.1, one can wonder if it is always possible to construct a bivariate Bernoulli distribution p whose marginals are Bern(π_X) and Bern(π_Y) (0 < π_X, π_Y < 1) and whose dependence structure is prescribed by a certain Bernoulli copula (5.5). The answer is affirmative; see Figure 5.1. Solving ω(p) = ω under the marginal constraints leads, for ω ≠ 1, to a quadratic equation in p11, whose only admissible root is
p11 = [ (π_X + π_Y)(ω − 1) + 1 − √( ((π_X + π_Y)(ω − 1) + 1)² − 4ω(ω − 1)π_X π_Y ) ] / (2(ω − 1)), (5.9)
while p11 = π_X π_Y for ω = 1. The other values follow by substitution. In particular, p00 = 1 − π_X − π_Y + p11. As p11 in (5.9) is an increasing function of ω, so is p00. At the same time, 'the value C(1 − π_X, 1 − π_Y) singled out by Sklar's theorem', described below (3.1), is precisely p00, establishing the one-to-one correspondence between C(1 − π_X, 1 − π_Y) and ω. Characterising the dependence in a bivariate Bernoulli vector by ω, or any monotonic function thereof, is thus totally consistent with Sklar's theorem.
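The reconstruction can be sketched as follows (plain Python; the root selection follows the quadratic derived above, and function names are ours):

```python
import math

def p11_from_omega(pi_x, pi_y, omega):
    """Admissible root of the quadratic fixing p11 given margins and odds ratio."""
    if omega == 1.0:
        return pi_x * pi_y                     # independence
    s = (pi_x + pi_y) * (omega - 1.0) + 1.0
    disc = s * s - 4.0 * omega * (omega - 1.0) * pi_x * pi_y
    return (s - math.sqrt(disc)) / (2.0 * (omega - 1.0))

def pmf_from_margins_and_omega(pi_x, pi_y, omega):
    """Full bivariate Bernoulli pmf: margins + odds ratio determine everything."""
    p11 = p11_from_omega(pi_x, pi_y, omega)
    return {(0, 0): 1.0 - pi_x - pi_y + p11, (0, 1): pi_y - p11,
            (1, 0): pi_x - p11, (1, 1): p11}

p = pmf_from_margins_and_omega(0.3, 0.6, 4.0)
```

The same minus-sign root is the admissible one on both sides of ω = 1, and p11 (hence p00) is increasing in ω, as claimed.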
For ω = 0, we must have either p00 = 0 or p11 = 0 (or both). By obvious substitution, one gets
( 0  1−π_X ; 1−π_Y  0 ),  ( 0  1−π_X ; 1−π_Y  π_X+π_Y−1 )  or  ( 1−π_X−π_Y  π_Y ; π_X  0 ), (5.10)
if π_X + π_Y = 1, π_X + π_Y > 1 or π_X + π_Y < 1, respectively. These distributions correspond to the lower Fréchet-Hoeffding bound given the margins.

6. Bivariate discrete distributions with finite support

Consider now a bivariate discrete random vector (X, Y), where X takes its values in S_X = {0, 1, . . . , R−1} and Y in S_Y = {0, 1, . . . , S−1}, for some integers R, S ≥ 2. Let p be its joint probability mass function, defined by p_xy = P(X = x, Y = y), (x, y) ∈ S_X × S_Y, and p_X = (p_{0•}, p_{1•}, . . . , p_{R−1,•}) and p_Y = (p_{•0}, p_{•1}, . . . , p_{•,S−1}) its marginal distributions: p_{x•} = Σ_{y∈S_Y} p_xy = P(X = x) and p_{•y} = Σ_{x∈S_X} p_xy = P(Y = y). Let P_{R×S} be the set of all such bivariate discrete distributions p with p_{x•} > 0 ∀x ∈ S_X and p_{•y} > 0 ∀y ∈ S_Y, identified to the (R × S)-matrices (p_xy). Any such distribution can be regarded as a point in the (RS − 1)-dimensional simplex [13].

6.1. Odds ratio matrix
As Ran F_X × Ran F_Y here consists of (R − 1)(S − 1) informative locations (i.e., strictly inside the unit square), Sklar's theorem implies that one must be able to entirely describe the inner dependence structure of p by (R − 1)(S − 1) parameters, naturally in agreement with the usual breakdown of degrees of freedom in comparable (R × S)-contingency tables. Those (R − 1)(S − 1) parameters can be a family of odds ratios [2]: gather the local odds ratios
ω_xy = (p_xy p_{x+1,y+1}) / (p_{x+1,y} p_{x,y+1}), x ∈ {0, . . . , R−2}, y ∈ {0, . . . , S−2}, (6.1)
into the ((R−1) × (S−1))-matrix Ω(p) = (ω_xy). [41, p. 119] spells out that those odds ratios are margin-free ('variation-independent' of the marginal parameters). [1, p. 55] stressed that 'given the marginals, the odds ratios determine the cell probabilities'; in other words, the full distribution can be entirely reconstructed by coupling the marginal distributions and the margin-free set of odds ratios. Again, those entirely fulfil here the desired role of classical copulas, rendering explicit resort to the latter purposeless.
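A sketch of the computation of Ω(p) and of its margin-freeness (function names are ours): the (R−1)(S−1) local odds ratios are unchanged when rows and columns are rescaled:

```python
def odds_ratio_matrix(p):
    """Local odds ratios w[x][y] = p[x][y] p[x+1][y+1] / (p[x+1][y] p[x][y+1])."""
    R, S = len(p), len(p[0])
    return [[p[x][y] * p[x + 1][y + 1] / (p[x + 1][y] * p[x][y + 1])
             for y in range(S - 1)] for x in range(R - 1)]

def scale(p, row_factors, col_factors):
    """Multiply rows and columns by positive factors, then renormalise."""
    q = [[p[x][y] * row_factors[x] * col_factors[y]
          for y in range(len(p[0]))] for x in range(len(p))]
    s = sum(sum(row) for row in q)
    return [[v / s for v in row] for row in q]

p = [[0.10, 0.05, 0.05],
     [0.05, 0.20, 0.10],
     [0.05, 0.15, 0.25]]
q = scale(p, [1.0, 2.0, 0.5], [1.0, 3.0, 0.7])   # margins change, Omega does not
```

Any larger rectangle odds ratio is a product of local ones, so the matrix Ω(p) indeed carries all (R − 1)(S − 1) dependence parameters.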
Remark 6.1. Although they were ruled out in (5.2)-(5.4) when assuming 0 < π_X, π_Y < 1, cases of 0/0 may arise in (6.1). Then the corresponding entry of Ω(p) may be left undefined. Admitting some slight lack of rigour, we identify two odds ratio matrices whose well-defined entries are all equal, i.e., an undefined entry in one matrix is assumed to be equal to whatever the corresponding entry may be in the other.

6.2. Marginal transformations, nucleus and copula probability mass function
Define D^(1)_{Q×Q} the set of all diagonal (Q × Q)-matrices whose entry (1,1) is equal to 1 and whose other diagonal entries are positive. Similarly to (5.3), for any Φ ∈ D^(1)_{R×R} and Ψ ∈ D^(1)_{S×S}, let
g_{Φ,Ψ}(p) = ΦpΨ / s(ΦpΨ), (6.3)
where s(A) denotes the sum of the entries of A. The matrix Φ multiplies the rows of p and the matrix Ψ multiplies the columns of p: this is akin to the 'marginal distortions' of Section 5.2. This defines a group action on P_{R×S}, which induces the orbits
[p] = {g_{Φ,Ψ}(p) : Φ ∈ D^(1)_{R×R}, Ψ ∈ D^(1)_{S×S}}. (6.4)
Free from any sense of marginal distributions, the orbits [p] must again be equivalence classes of dependence. Indeed, for any two p, p* ∈ P_{R×S}, p ∼ p* ⇒ Ω(p) = Ω(p*), as all odds ratios are preserved by g_{Φ,Ψ}, exactly as in (5.4). This holds true for any undefined elements of Ω(p) as well, as g_{Φ,Ψ} leaves the zeros of p unaffected. Hence [p] will again be called the nucleus of the discrete pmf p. If all entries of Ω(p) are defined and positive, then Ω(p) = Ω(p*) ⇒ [p] = [p*]. Like in Remark 5.1, though, one may find two p₁, p₂ ∈ P_{R×S} with Ω(p₁) = Ω(p₂) but [p₁] ≠ [p₂] when Supp(p₁) ≠ Supp(p₂), that is, when p₁ and p₂ show a different pattern of structural zeros. Again, the preponderant role of structural zeros in the dependence structure appears clearly. Now one may wish to single out the member of [p] with uniform marginals for embodying the dependence pattern of p by a simple element of the class. That one will be called the 'copula pmf' of p, leading to the following definition of a discrete copula, the obvious analogue of Definition 1.1.

Definition 6.1. The copula pmf of p ∈ P_{R×S}, whenever it exists, is the unique element p̄ of Cl([p]) with uniform margins: p̄_{x•} = 1/R for all x ∈ S_X and p̄_{•y} = 1/S for all y ∈ S_Y.

Remark 6.2. A very similar definition is given in [30], who investigated such 'discrete copulas' in the case R = S. The 'copula pmf' here coincides essentially with the bistochastic matrix of their Proposition 2. See also [33, 34, 39] and [7, Section 3.1.1].

6.3. Existence and uniqueness of the copula pmf
Defining the copula pmf of p as the member of [p] with uniform margins raises the question of the existence and uniqueness of such an element in [p]. This question is linked to the algebraic problem of 'matrix scaling': given a nonnegative matrix A, can we find diagonal matrices D₁ and D₂ such that D₁AD₂ is doubly stochastic? [48] showed that the answer is affirmative if A is a positive square matrix, a result later generalised to nonnegative and/or non-square matrices; see [25] for a review. This allows a simple necessary and sufficient criterion for the existence and uniqueness of the copula pmf of a given p ∈ P_{R×S} to be formulated.
When all p_xy's are positive in p, it can directly be deduced from [48, 49] that the copula pmf exists and is unique. Hence the non-trivial case is again when structural zeros are present in p, their layout being the essential feature. Define the support Supp(p) = {(x, y) ∈ S_X × S_Y : p_xy > 0}, and let N(p) be the collection of 'null blocks' of p, that is, of the sets ν_X × ν_Y, with ν_X ⊆ S_X and ν_Y ⊆ S_Y, such that p_xy = 0 for all (x, y) ∈ ν_X × ν_Y. Also define the closure of the orbit [p] as
Cl([p]) = { lim_{k→∞} g_{Φ_k,Ψ_k}(p) : Φ_k ∈ D^(1)_{R×R}, Ψ_k ∈ D^(1)_{S×S}, k ∈ N }. (6.5)
From the matrix scaling results [25, 48, 49], what matters is how prominent the null blocks are:

Theorem 6.1. Let p ∈ P_{R×S}. (a) If |ν_X|/R + |ν_Y|/S < 1 for all null blocks ν_X × ν_Y ∈ N(p), then there exists a unique p̄ ∈ [p] with uniform margins. (b) If |ν_X|/R + |ν_Y|/S ≤ 1 for all null blocks, with equality for some of them, then there exists a unique p̄ ∈ Cl([p]) with uniform margins. (c) If |ν_X|/R + |ν_Y|/S > 1 for some null block, then no element of Cl([p]) has uniform margins. In case (b), two sub-cases arise depending on whether (i) or not (ii) p can be made block-diagonal by permutations of its rows and columns.
Case (a) is the 'easy' case, which covers all (R × S)-distributions with no structural zeros (N(p) = ∅), but not only: p is allowed to have structural zeros (N(p) ≠ ∅), provided those are not too 'prominent' in the sense just specified. The unique p̄ is obviously the copula pmf of p. From (6.4), Supp(p̄) = Supp(p), that is, the pattern of structural zeros (if any) is the same in p and p̄.
Case (b) is the critical case. In case (b (i)), the matrix p can be made block-diagonal by some permutations of its rows and columns. Then, each sub-block of non-zero elements of p can be dealt with separately when adjusting the margins, and it remains possible to write p̄ under the form (6.3), that is, p̄ ∈ [p], and Supp(p̄) = Supp(p). In the Bernoulli case, this corresponds to 'absolute association' ((i) in (5.7)).
By contrast, in case (b (ii)), the matrix p cannot be made block-diagonal. For complying with the uniform-margins constraint, new zeros must be created in p̄, which must therefore be a limit point of [p] in the sense (6.5). Then p̄ ∈ Cl([p]) and Supp(p̄) ⊂ Supp(p). The new zeros are created on the complementary block ν_X^c × ν_Y^c. But it holds true that Ω(p̄) = Ω(p) (in the sense of Remark 6.1) and p̄ has the required uniform margins, hence p̄ is again the unique copula pmf of p. In the Bernoulli case, this corresponds to 'complete association' ((ii) in (5.7)).
Finally, case (c) establishes the non-existence of a copula pmf when the structural zeros form a bulky subset of p. As the zeros cannot be turned into positive values by (6.3) and are thus frozen, there do not remain sufficiently many degrees of freedom for adjusting the marginals. Pragmatically, the dependence between X and Y is then so overly dictated by the structural zeros that an approach based on odds ratios is pointless.
The above observations allow us to state:

Corollary 6.1 (Existence and uniqueness of the copula pmf). The bivariate discrete distribution p ∈ P_{R×S} admits a unique copula pmf p̄ if and only if |ν_X|/R + |ν_Y|/S ≤ 1 for all (ν_X × ν_Y) ∈ N(p). By definition, the copula pmf p̄ has discrete uniform margins, is such that Ω(p̄) = Ω(p) (in the sense of Remark 6.1), and Supp(p̄) ⊆ Supp(p).
Finally, the following result provides an interesting characterisation of that copula pmf. Proof. See [5], Theorems 3.2 and 3.3.
The copula pmf p̃ is thus the bivariate discrete distribution with uniform margins which is closest to the initial p in terms of the Kullback-Leibler divergence. Not only does this provide a quantitative characterisation of the copula pmf p̃ (as opposed to the rather qualitative definition based on the concept of 'nucleus' in Section 6.2), it also allows a clear parallel to be drawn between the proposed discrete copula and its continuous counterpart. Indeed, it is known that, under mild conditions, a similar characterisation of the copula of a continuous vector exists in terms of I-projection [42,44].

6.4. Iterated proportional fitting procedure
Unlike in the Bernoulli case (5.5), the copula pmf p̃ is usually not available in closed form in the general (R × S)-case. (Specific models do lead to closed-form copula pmf's, though; see Section 7.) However, one can easily extract p̃ from any p ∈ P_{R×S} by iterated proportional fitting (IPF). This consists of alternately normalising the rows and columns of p to have uniform margins. In effect, it reproduces (6.5), alternately left- and right-multiplying the current version of p by diagonal matrices Φ_k and Ψ_k, which leaves all odds ratios unaffected. Hence IPF fits perfectly in our framework: the output of any iteration remains in [p]. The convergence of IPF was investigated in [14,26] and [43]; see also [4, Section 3.6] and [41, Section 12.2]. The IPF seeded at any p ∈ P_{R×S} indeed converges to its copula pmf p̃ provided that it exists, i.e. under the condition of Corollary 6.1. The convergence of the IPF procedure is geometric in cases (a) and (b)(i), and arithmetic in case (b)(ii) [5]. The R package mipfp [3] implements the IPF.
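The alternating normalisation just described takes only a few lines of code. Below is a minimal pure-Python sketch (the function name, tolerance and example table are illustrative; the mipfp R package [3] is a full implementation):

```python
def ipf_uniform(p, tol=1e-12, max_iter=10000):
    """Iterated proportional fitting of an R x S pmf (list of lists)
    towards uniform margins: rows are rescaled to sum to 1/R, then
    columns to 1/S, repeatedly.  Each rescaling multiplies the table
    by a diagonal matrix, so all odds ratios are preserved."""
    R, S = len(p), len(p[0])
    p = [row[:] for row in p]
    for _ in range(max_iter):
        for u in range(R):                      # rows -> 1/R
            s = sum(p[u])
            p[u] = [x / (R * s) for x in p[u]]
        for v in range(S):                      # columns -> 1/S
            s = sum(p[u][v] for u in range(R))
            for u in range(R):
                p[u][v] /= S * s
        # columns are now exact; stop when rows are uniform too
        if max(abs(sum(row) - 1.0 / R) for row in p) < tol:
            break
    return p

# Example: the copula pmf of an arbitrary strictly positive 2 x 3 table
p = [[0.10, 0.25, 0.15],
     [0.05, 0.15, 0.30]]
p_cop = ipf_uniform(p)
```

For a strictly positive seed (case (a)), the loop converges geometrically, as stated above.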

6.5. Construction of arbitrary bivariate discrete distributions with given copula pmf
Similarly to Section 5.6, one may want to construct an (R × S)-discrete distribution with particular marginal distributions p_X and p_Y and a dependence structure driven by a given copula pmf p̃. The existence and uniqueness of such a distribution is (partially) established by the following result, analogous to Theorem 6.1. This result guarantees the existence of the requested distribution p in 'easy' cases: no zeros in p̃, or zeros not lying on rows and columns carrying large target marginal weights, or block-diagonal copula pmf p̃. It does not say that such a p does not exist in the other cases. In fact, such a distribution may exist, as evidenced by (5.10) in the Bernoulli case. However, reconstructing p then is not achieved through the transformation (6.3), as some zeros of p̃ must be turned back into positive probabilities; hence p ≁ p̃, as p belongs to an orbit of which p̃ is only a limit point. Existence and uniqueness of p in those cases remains an open question; a geometric perspective similar to Remark 5.2 suggests positive conclusions, however.
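In the 'easy' case of a strictly positive copula pmf, the construction is the same alternating scaling as before, simply targeting p_X and p_Y instead of uniform margins. A sketch, with illustrative names and an arbitrary 3 × 3 example:

```python
def ipf_to_margins(cop, p_x, p_y, n_iter=2000):
    """Rescale rows and columns of the copula pmf `cop` so that the
    row margins match p_x and the column margins match p_y.  The
    odds ratios of `cop`, i.e. its dependence structure, are kept."""
    R, S = len(cop), len(cop[0])
    p = [row[:] for row in cop]
    for _ in range(n_iter):
        for u in range(R):                      # rows -> p_x
            s = sum(p[u])
            p[u] = [x * p_x[u] / s for x in p[u]]
        for v in range(S):                      # columns -> p_y
            s = sum(p[u][v] for u in range(R))
            for u in range(R):
                p[u][v] *= p_y[v] / s
    return p

# A 3 x 3 copula pmf (all margins equal 1/3), coupled with two
# unequal target marginals
cop = [[0.2, 0.1, 1 / 30],
       [0.1, 2 / 15, 0.1],
       [1 / 30, 0.1, 0.2]]
p_new = ipf_to_margins(cop, p_x=[0.5, 0.3, 0.2], p_y=[0.1, 0.3, 0.6])
```

The resulting p_new has exactly the requested margins while sharing all its odds ratios with cop.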

6.6. Yule's coefficient
By analogy with Section 5.5, one can define a margin-free measure of overall concordance in p as Pearson's correlation coefficient computed on the copula pmf. We call such a coefficient Yule's coefficient Υ; it can again be regarded as the discrete analogue of Spearman's ρ. Suppose that U is discrete uniform on {1/(R + 1), 2/(R + 1), . . . , R/(R + 1)}, V is discrete uniform on {1/(S + 1), 2/(S + 1), . . . , S/(S + 1)}, and that their joint pmf is given by the copula pmf p̃. It can then be checked that Pearson's correlation between U and V is Yule's coefficient Υ (6.6), with extreme values reached at the Fréchet bounds m and w (6.7), analogous to (5.6). Note that any p ∈ P_{R×R} represented by a diagonal matrix is easily seen to admit m or w as copula pmf. Those fall into case (b)(i) of Theorem 6.1 and correspond to (positive or negative) 'absolute association'; cf. Section 5.4. There also exist non-diagonal distributions p ∈ P_{R×R}, belonging to case (b)(ii) of Theorem 6.1, which admit m or w as copula pmf as well. Those would be akin to 'complete association'. As in the Bernoulli case, the distinction between 'absolute' and 'complete' association is only a marginal feature which must be ignored by the copula. For instance, positive 'absolute association', i.e. probability weight concentrated on the main diagonal of p, is only possible if p_X ≡ p_Y. By contrast, dependence as strong as can be between unequal discrete marginals must turn into 'complete association'. As a result, Υ = ±1 without distinction between 'absolute' and 'complete' association.
Now, when R ≠ S, |Υ| cannot reach 1. Indeed, if X and Y do not take the same number of values, the associated copula pmf can never approach either of the diagonal forms m or w, for the obvious dimensional reason, preventing any sense of 'perfect dependence'. In that case, the maximum value attained by |Υ| occurs when the copula pmf is the one associated with the Fréchet bounds in the class of (R × S)-bivariate discrete distributions with uniform margins [16, 'Exemple I']. It is also clear that, like Spearman's ρ, Υ only detects monotonic dependence ('concordance') between X and Y. In particular, for max(R, S) > 2, Υ can be 0 even when X and Y are not independent. Genuine measures of dependence ∆, in the sense that ∆ = 0 ⇐⇒ X ⊥⊥ Y, may be defined along the same lines as in [17].
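Computationally, Yule's coefficient is a plain Pearson correlation evaluated on the copula pmf; since correlation is invariant under affine rescaling, the ranks 1, . . . , R and 1, . . . , S can be used in place of the values u/(R + 1) and v/(S + 1). A minimal sketch (function name illustrative):

```python
def yule_coefficient(cop):
    """Pearson's correlation computed on a copula pmf `cop` (a table
    with uniform margins).  Ranks 1..R and 1..S are used as values;
    correlation is unchanged by this affine relabelling."""
    R, S = len(cop), len(cop[0])
    cells = [(u, v) for u in range(R) for v in range(S)]
    mu_u = sum((u + 1) * cop[u][v] for u, v in cells)
    mu_v = sum((v + 1) * cop[u][v] for u, v in cells)
    var_u = sum((u + 1 - mu_u) ** 2 * cop[u][v] for u, v in cells)
    var_v = sum((v + 1 - mu_v) ** 2 * cop[u][v] for u, v in cells)
    cov = sum((u + 1 - mu_u) * (v + 1 - mu_v) * cop[u][v] for u, v in cells)
    return cov / (var_u * var_v) ** 0.5

# Independence copula pmf (Upsilon = 0) and a diagonal 'm'-type
# copula pmf (Upsilon = +1) in dimension 3
pi3 = [[1 / 9] * 3 for _ in range(3)]
m3 = [[1 / 3 if u == v else 0.0 for v in range(3)] for u in range(3)]
```

The anti-diagonal analogue of m3 returns Υ = −1, matching the Fréchet lower bound w.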
Example 6.1. The copula pmf being, by definition, margin-free, its construction only requires a sense of order for the 'values' of X and Y. In particular, if X and/or Y are ordinal random variables, then it remains meaningful to construct their copula pmf in order to understand their dependence. This is illustrated here through data on congenital sex organ malformations cross-classified by maternal alcohol consumption, from a study described in [22] and shown as a confetti plot in Figure 6.1. The adverse effect of maternal alcohol consumption on the risk of congenital malformation appears clearly, and is quantified by a positive value of Yule's coefficient Υ.
Entirely margin-free, such a copula-based measure of association between two ordinal random variables does not rely on assigning scores to each category, as is otherwise necessary [28, Section 2.3]: the 'labels' X ∈ {0, 1} and Y ∈ {0, 1, 2, 3, 4} in (6.8) have no impact whatsoever. This seems desirable; as [21, p. 740] noted: "We feel that the use of arbitrary scores to motivate measures is infrequently appropriate."

7. Parametric discrete copulas
Paralleling the continuous case, one can construct parametric models of copula pmf's. In fact, any parametric continuous copula readily gives rise to a discrete copula pmf of any dimension (R × S), as described in Section 7.1. One may also think of specific discrete copulas originating from particular bivariate discrete distributions, such as the Binomial copula (Section 7.2) or the truncated Geometric copula (Section 7.3).

7.1. Discrete versions of classical continuous copulas
Then, as C has uniform margins on I, it follows that, for any u, ∑_{v=0}^{S−1} p_uv = 1/R and, for any v, ∑_{u=0}^{R−1} p_uv = 1/S. Hence the (R × S)-discrete distribution p = [p_uv]_{u=0,...,R−1; v=0,...,S−1} is a copula pmf as defined by Definition 6.1.
For simple parametric continuous copulas C, such p can be written in closed form. For instance, it can be checked that the (R × S)-discrete version of the FGM copula is, for θ ∈ [−1, 1], p_uv = (1/(RS)) (1 + θ (1 − (2u + 1)/R)(1 − (2v + 1)/S)) (7.2). For θ = 0 in (7.2), one finds p_uv = 1/(RS) for all (u, v), which is the (R × S)-independence copula pmf π (7.3).

Remark 7.1. The discrete copula pmf's derived from a continuous one through (7.1) are obtained by overlaying C on the regular mesh {0, 1/R, . . . , (R − 1)/R, 1} × {0, 1/S, . . . , (S − 1)/S, 1} over the unit square I. The discretisation is thus carried out 'in the copula world', keeping all margins uniform. The so-produced discrete copula pmf's p can then be used in a second stage for modelling dependence and/or constructing new bivariate discrete distributions with prescribed marginals. The idea of two distinct building blocks, the marginals on one side and the dependence/copula on the other, is maintained. By contrast, when writing (1.1) for a bivariate discrete distribution F_XY with a certain continuous copula C, the discretisation is achieved by overlaying C on the mesh Ran F_X × Ran F_Y set by the margins of F_XY. So the discretised copula is defined by the margins, which explains why dependence and marginal distributions can never be separated in such models. The two approaches coincide for a continuous vector (X, Y), though, as then the mesh Ran F_X × Ran F_Y fills out the whole unit square I, akin to a 'continuous regular mesh'.
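The discretisation (7.1) is simply a second-order difference of C over the cells of the regular mesh. A sketch with the FGM copula (function names illustrative):

```python
def discretise_copula(C, R, S):
    """(R x S) discrete copula pmf from a continuous copula C: the
    probability mass of C over each cell of the regular mesh
    {0, 1/R, ..., 1} x {0, 1/S, ..., 1}, i.e. a double difference."""
    return [[C((u + 1) / R, (v + 1) / S) - C(u / R, (v + 1) / S)
             - C((u + 1) / R, v / S) + C(u / R, v / S)
             for v in range(S)] for u in range(R)]

def fgm(theta):
    """Farlie-Gumbel-Morgenstern copula, theta in [-1, 1]."""
    return lambda u, v: u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

# A (4 x 3) discrete FGM copula pmf; its margins are uniform
# (1/4 on rows, 1/3 on columns) because C itself has uniform margins
p_disc = discretise_copula(fgm(0.5), 4, 3)
```

The row sums telescope to C((u + 1)/R, 1) − C(u/R, 1) = 1/R, which is why uniform margins are automatic.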

7.2. The Binomial copula
Let (X_1, Y_1), . . . , (X_n, Y_n) be independent copies of a bivariate Bernoulli random vector with pmf (5.1). [31, Section 3] defined the bivariate Binomial as the distribution of the vector (X, Y) = (∑_{i=1}^n X_i, ∑_{i=1}^n Y_i). Then, it can be checked that the odds ratios (6.1) depend only on ω, the odds ratio of the initial bivariate Bernoulli (3.1). For n fixed, the dependence structure of a bivariate Binomial is thus driven by the single parameter ω, and the corresponding Binomial(n)-copula, which is an ((n + 1) × (n + 1))-discrete distribution with uniform margins, is a one-parameter model.
For instance, if n = 2, the bivariate Binomial distribution and its odds-ratio matrix (6.2) can be written explicitly. Through some algebra one can make the margins of p uniform through (6.3), and one obtains the copula pmf in closed form for ω ≠ 1. For ω = 1, of course, p̃ = π, the (3 × 3)-independence copula pmf (7.3). One can also check that, for ω = 0 and ω = ∞, p̃ = w and p̃ = m respectively, the Fréchet lower and upper bounds (6.7) in dimension 3. Yule's coefficient (6.6) for this copula pmf admits a closed form as well, with Υ = −1 for ω = 0 and Υ = 1 for ω = ∞. This family of Binomial copulas is thus comprehensive, as it allows all values of Yule's coefficient from −1 to 1.
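The n = 2 construction is easy to reproduce numerically: the bivariate Binomial pmf follows by n-fold convolution of the bivariate Bernoulli table, and its copula pmf then follows by the IPF procedure discussed earlier (or the mipfp R package [3]). A sketch with an arbitrary Bernoulli table (names and values illustrative):

```python
def bivariate_binomial(pmf2x2, n):
    """Exact pmf of (X, Y) = (sum of X_i, sum of Y_i) over n iid draws
    from a bivariate Bernoulli with 2 x 2 pmf [[p00, p01], [p10, p11]]
    (first index: X value, second: Y value), by n-fold convolution."""
    p = [[1.0]]                           # pmf of the empty sum
    for _ in range(n):
        R = len(p)
        q = [[0.0] * (R + 1) for _ in range(R + 1)]
        for x in range(R):
            for y in range(R):
                for dx in (0, 1):
                    for dy in (0, 1):
                        q[x + dx][y + dy] += p[x][y] * pmf2x2[dx][dy]
        p = q
    return p

# A bivariate Bernoulli with odds ratio w = (0.3*0.4)/(0.2*0.1) = 6
bern = [[0.3, 0.2],
        [0.1, 0.4]]
binom2 = bivariate_binomial(bern, 2)      # a (3 x 3) distribution
```

Applying IPF to binom2 then yields the Binomial(2)-copula, a 3 × 3 table with uniform 1/3 margins.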

7.3. The truncated Geometric copula
Let (X_1, Y_1), (X_2, Y_2), . . . again be a sequence of independent replications from the bivariate Bernoulli distribution (5.1). [31, Section 6] defined the bivariate Geometric distribution as the distribution of the vector (X, Y), where X is the number of 0's before the first 1 in the sequence X_1, X_2, . . ., and Y the number of 0's before the first 1 in the sequence Y_1, Y_2, . . .. Truncating X and Y at N − 1, for some N ≥ 1, yields the pmf of (X̃_N, Ỹ_N). Denote by p̃_N the corresponding matrix in P_{N×N}. For every integer N, it follows from Corollary 6.1 that this bivariate discrete distribution admits a unique copula pmf. Letting N tend to infinity, the dependence structure of a discrete bivariate vector supported on N × N can thus be represented by a unique continuous copula. Yet, this unique copula is not any of the copulas C satisfying (1.1).
Indeed, analogously to Remark 7.1, (1.1) reconstructs the bivariate discrete distribution F_XY by overlaying a copula C on the mesh Ran F_X × Ran F_Y over I. Such a copula is not unique and is indissociable from the margins. By contrast, the above construction singles out one unique copula which represents the 'core' of F_XY in the spirit of the marginal transformations described in Section 6.2. It is independent of the margins, as it is a representation of all the odds ratios ω_xy (6.1) for (x, y) ∈ N_+ × N_+. The bivariate discrete distribution F_XY can thus be broken down into its marginal distributions on the one hand, and its unique copula on the other, as in the continuous case. A difference is that, here, the combination of the copula and the marginals is not carried out by (1.1), but by a continuous version of IPF [25, Sections 6.3 and 6.4, and references therein].

8.1. The Geometric copula
Consider the truncated Geometric distribution p̃_N given by (7.5), and set both Bernoulli margins to 1/2. For any N ≥ 1, one gets a copula pmf p̃_N ∈ C_{N×N} involving only one parameter ω (like (7.7)). As N → ∞, those discrete pmf's turn into a continuous distribution with uniform margins, as pictured in Figure 8.1 (left: a value ω > 1; right: a value ω < 1). For ω > 1, this copula admits a singularity along the main diagonal of the unit square I. This is reminiscent of the Marshall-Olkin copula [37, Section 3.1.1], a link which could have been expected here given that the Marshall-Olkin bivariate Exponential distribution is the limiting version of the bivariate Geometric distribution introduced above [31, Section 6]. The Geometric copula, however, remains a representative of the inner dependence structure of the discrete vector (X, Y) whose pmf is (7.4), and is not the Marshall-Olkin copula, as appears clearly when ω < 1. Then, the limiting Geometric copula density is identically zero on the main diagonal of I, forming some sort of 'inverse singularity'.

8.2. The Poisson copula
Let Z_1 ∼ P(λ_1), Z_2 ∼ P(λ_2) and Z_0 ∼ P(λ_0) be three independent Poisson random variables, with λ_1, λ_2 > 0 and λ_0 ≥ 0. Then define (X, Y) = (Z_1 + Z_0, Z_2 + Z_0) (8.1), whose joint pmf follows for (x, y) ∈ N × N, with X ∼ P(λ_1 + λ_0) and Y ∼ P(λ_2 + λ_0). The odds ratios (6.1) reduce to expressions involving only the parameter ω := λ_0/(λ_1 λ_2); the dependence structure of such a bivariate Poisson vector is thus driven by ω alone. If the bivariate Poisson distribution is understood as a limiting version of a bivariate Binomial [31, Section 4], then this ω would indeed be akin to the odds ratio of the constituting initial bivariate Bernoulli distribution. Acting as in the previous section, one can first truncate X and Y at N − 1, for obtaining discrete copula pmf's, and then let N tend to infinity for obtaining the Poisson copula densities shown in Figure 8.2. Like any bivariate discrete distribution built on such an idea of 'trivariate reduction' (8.1), the components X and Y of a bivariate Poisson vector can only show positive association. How to construct bivariate discrete distributions with Poisson marginals showing negative association has long been a challenging problem. For instance, [23] noted: "we have been unable to discover explicitly in the literature any examples of bivariate Poisson distributions in which the correlation is negative." [40] proposed a classical copula construction based on (1.1), thus subject to caution following the discussion in Section 2, in particular the impossibility of ever separating margins and dependence.
By contrast, one can couple any two Poisson distributions with any continuous copula through IPF (Sections 6.4-6.5). Figure 8.3 shows confetti plots of three bivariate discrete distributions with Poisson marginals and negative association, coupled through (a) a Clayton copula with negative θ; (b) a Gaussian copula with negative ρ; and (c) a Geometric copula with ω < 1 (Figure 8.1). This illustrates that the proposed discrete copula approach shares with its continuous counterpart the same flexibility for constructing 'new' bivariate distributions with arbitrary marginals and dependence structure.
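Such a negative coupling is straightforward to reproduce. The sketch below truncates two Poisson(2) margins (λ = 2 and N = 12 are arbitrary illustrative choices) and couples them through a discretised FGM copula with θ < 0; the FGM family replaces the Clayton and Gaussian copulas of Figure 8.3 here merely because it has a simple closed form, and all names are illustrative:

```python
import math

def fgm(theta):
    """FGM copula; theta in [-1, 0) yields (weak) negative dependence."""
    return lambda u, v: u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def copula_pmf(C, N):
    """N x N discretisation of C on the regular mesh (uniform margins)."""
    return [[C((u + 1) / N, (v + 1) / N) - C(u / N, (v + 1) / N)
             - C((u + 1) / N, v / N) + C(u / N, v / N)
             for v in range(N)] for u in range(N)]

def poisson_trunc(lam, N):
    """Poisson(lam) pmf truncated and renormalised to {0, ..., N-1}."""
    w = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(N)]
    t = sum(w)
    return [x / t for x in w]

def ipf(p, px, py, n_iter=1000):
    """Adjust the margins of p to px (rows) and py (columns) by IPF;
    the odds ratios of p, i.e. its dependence structure, are kept."""
    p = [row[:] for row in p]
    n = len(p)
    for _ in range(n_iter):
        for u in range(n):
            s = sum(p[u])
            p[u] = [x * px[u] / s for x in p[u]]
        for v in range(n):
            s = sum(p[u][v] for u in range(n))
            for u in range(n):
                p[u][v] *= py[v] / s
    return p

# Two truncated-Poisson(2) margins, negatively coupled through FGM
N = 12
marg = poisson_trunc(2.0, N)
p_neg = ipf(copula_pmf(fgm(-0.9), N), marg, marg)
```

All local odds ratios of the seed are below 1 and are preserved by IPF, so the resulting table shows negative association despite its Poisson-type margins.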

Concluding remarks
The classical definition of a copula (Definition 1.1) stems implicitly but manifestly from the Probability Integral Transform. Hence it is fundamentally grounded in the continuous framework, and it is hardly surprising that classical copula ideas lead to many inconsistencies when applied to discrete random vectors. What may appear surprising is that a large part of the previous literature in the field has tried to make such an inherently continuous concept forcibly fit the discrete case as well, in spite of those inconsistencies.
In this paper it is argued that the very essence of a copula should not be imprisoned in Definition 1.1. Fundamentally, a copula is akin to an equivalence class of distributions sharing the same dependence structure. Defining such equivalence classes, called nuclei, does not require resorting to the PIT and hence carries over smoothly to the discrete case. This paper describes that 'discrete copula' construction. All the pleasant properties of copulas for modelling dependence are maintained in the presented discrete framework, such as the margin-freeness of anything copula-based and the flexibility in constructing bivariate distributions with arbitrary marginals and dependence structure.
Existence and uniqueness of the copula probability mass function, the analogue of the copula density in the continuous case, are established under mild conditions. The ideas are first introduced in the bivariate Bernoulli case, then generalised to distributions supported on {1, 2, . . . , R} × {1, 2, . . . , S}, for finite R and S, and finally to bivariate distributions supported on N × N. Interestingly, the dependence structure of such an (N × N)-supported distribution may still be captured by a classical continuous copula, and that copula is unique. However, that copula is not to be understood through Sklar's theorem (1.1). The construction gives rise to new continuous copulas, such as the Geometric copula and the Poisson copula, representing the dependence structure of bivariate Geometric and bivariate Poisson distributions, respectively. Purely discrete copulas are also introduced, such as the Binomial copula.
Finally, we note that it is straightforward to generalise the bivariate concepts expounded in this paper to higher dimensions. Definition 6.1 (discrete copula) can be formulated in terms of a multi-dimensional array with uniform margins in every dimension. Higher-order odds ratios keep their valuable properties, in particular margin-freeness [41, Section 6.2]. The matrix scaling problem is well studied for higher-dimensional arrays as well [25, Section 6.1, and references therein], and the multi-dimensional IPF algorithm is implemented in the R mipfp package [3].

Acknowledgement
The author would like to thank the Editor, the Associate Editor and two anonymous referees for their helpful suggestions, which greatly helped to improve the quality of the paper.

A Appendix
Let C_1 and C_2 be two copulas such that C_1 ≺ Π ≺ C_2 (concordance ordering; that is, C_1(u, v) ≤ uv ≤ C_2(u, v), ∀(u, v) ∈ I). Consider the mixture copula defined as C_{α,β} = α C_1 + (1 − α − β) Π + β C_2, for α, β ∈ [0, 1] with α + β ≤ 1. Call ξ = (C_2(1 − π_X, 1 − π_Y) − (1 − π_X)(1 − π_Y)) / (C_2(1 − π_X, 1 − π_Y) − C_1(1 − π_X, 1 − π_Y)) and assume ξ ∈ (0, 1). Then it can be checked that, for any α ≤ ξ and β = ((1 − ξ)/ξ) α, the copula C_{α,β} is such that C_{α,β}(1 − π_X, 1 − π_Y) = (1 − π_X)(1 − π_Y), (A.1) and thus satisfies Sklar's theorem for two independent Bernoulli random variables X ∼ Bern(π_X) and Y ∼ Bern(π_Y). For any C_1 and C_2, the independence copula Π corresponds to (α, β) = (0, 0), but there exist infinitely many other copulas, of various natures and shapes, which satisfy (A.1). For example, for π_X = π_Y = 1/2, the 6 copula densities shown in Figure 9.1 are equally valid for representing the independence between X and Y. Yet, their appearances - and consequently any qualitative or quantitative assessment of the underlying dependence based on them - are dramatically different.
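This non-uniqueness can be checked numerically, e.g. with C_1 = W and C_2 = M, the Fréchet bounds, which do satisfy C_1 ≺ Π ≺ C_2. The value ξ below is taken as the ratio (C_2(a, b) − ab)/(C_2(a, b) − C_1(a, b)) at (a, b) = (1 − π_X, 1 − π_Y), a natural reading of the (garbled) definition in the source; the whole sketch and its names are illustrative.

```python
def W(u, v):   # Frechet lower bound (countermonotonicity)
    return max(u + v - 1.0, 0.0)

def M(u, v):   # Frechet upper bound (comonotonicity)
    return min(u, v)

def mixture(alpha, beta):
    """C_{alpha,beta} = alpha*C1 + (1 - alpha - beta)*Pi + beta*C2,
    here with C1 = W and C2 = M."""
    return lambda u, v: (alpha * W(u, v)
                         + (1.0 - alpha - beta) * u * v
                         + beta * M(u, v))

pi_x = pi_y = 0.5
a, b = 1.0 - pi_x, 1.0 - pi_y
xi = (M(a, b) - a * b) / (M(a, b) - W(a, b))   # equals 0.5 here
# every alpha <= xi with beta = (1 - xi)/xi * alpha satisfies (A.1),
# i.e. is 'a copula of' two independent Bernoulli(1/2) variables
for alpha in (0.0, 0.1, 0.3, 0.5):
    beta = (1.0 - xi) / xi * alpha
    assert abs(mixture(alpha, beta)(a, b) - a * b) < 1e-12
```

Infinitely many distinct (α, β) pairs thus reproduce the single constraint (A.1), while the resulting copulas differ everywhere else on I.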