
Consistency of mixture models with a prior on the number of components

Jeffrey W. Miller
From the journal Dependence Modeling

Abstract

This article establishes general conditions for posterior consistency of Bayesian finite mixture models with a prior on the number of components. That is, we provide sufficient conditions under which the posterior concentrates on neighborhoods of the true parameter values when the data are generated from a finite mixture over the assumed family of component distributions. Specifically, we establish almost sure consistency for the number of components, the mixture weights, and the component parameters, up to a permutation of the component labels. The approach taken here is based on Doob’s theorem, which has the advantage of holding under extraordinarily general conditions, and the disadvantage of only guaranteeing consistency at a set of parameter values that has probability one under the prior. However, we show that in fact, for commonly used choices of prior, this yields consistency at Lebesgue-almost all parameter values, which is satisfactory for most practical purposes. We aim to formulate the results in a way that maximizes clarity, generality, and ease of use.

1 Introduction

Many theoretical advances have been made in establishing posterior consistency and contraction rates for density estimation when using nonparametric mixture models (see [8] and many references therein) or finite mixture models with a prior on the number of components [12,24]. Elegant results have also been provided showing posterior consistency and contraction rates for estimation of the discrete mixing distribution [19] when using either class of models, as well as consistency for the number of components [9].

Meanwhile, it has long been known that Doob’s theorem [4] can be used to prove almost sure consistency for the number of components as well as the mixture weights and the component parameters, up to a permutation [20]. Interestingly, in contrast to the modern theory mentioned previously, a Doob-type result can be extraordinarily general, holding under very minimal conditions. Doob’s theorem has been criticized for only guaranteeing consistency on a set of probability one under the prior, and thus, a poorly chosen prior can lead to a useless result [22]; however, for many models, this is a straw man argument since a well-chosen prior can lead to a consistency guarantee at Lebesgue-almost all parameter values.

While the result of Nobile [20] was prescient and general, it has some disadvantages. First, Nobile [20] assumes some conditions that are not needed, specifically, (i) that there is a sigma-finite measure $\mu$ such that for all $v$, the component distribution $F_v$ has a density $f_v$ with respect to $\mu$, (ii) that $v \mapsto f_v(x)$ is continuous for all $x$, and (iii) that the parameters are mapped into an identifiable space via a somewhat complicated algorithm. Furthermore, it is difficult to use Nobile [20] as a reference since the exposition is quite technical and requires significant effort to unpack.

In this article, we present a Doob-type consistency result for mixtures, with the goal of maximizing clarity, generality, and ease of use. Our result generalizes the work of Nobile [20] in that we do not require conditions (i)–(iii). We formulate the result directly in terms of the original parameter space (rather than a transformed space as done by Nobile [20]), reflecting the way these models are used in practice. Furthermore, we provide conditions under which consistency holds almost everywhere with respect to Lebesgue measure, rather than just almost everywhere with respect to the prior as done by Nobile [20].

Compared to the modern theory, the limitation of a Doob-type result is that, for any given true parameter value, the theorem cannot tell us whether it is in the measure zero set where consistency may fail. Another important caveat is that the data are required to be generated from the assumed class of finite mixture models. Most consistency results are based on an assumption of model correctness, and the result we present is no different in that respect. However, unfortunately, the posterior on the number of components in a mixture model is especially sensitive to model misspecification [2,14], so any inferences about the number of components should be viewed with extreme skepticism. On the other hand, Miller and Harrison [15,16] show that popular nonparametric mixture models (such as Dirichlet process mixtures) are not even consistent for the number of components when the component family is correctly specified – and this lack of consistency is an even more fundamental concern than sensitivity to misspecification. Thus, although finite mixture models are rarely – if ever – exactly correct, having a consistency guarantee at least provides an assurance that the methodology is coherent.

In practice, mixture models with a prior on the number of components often provide useful insights into heterogeneous data, and, as the saying goes, “all models are wrong but some are useful” [1]. Mixtures are extensively used in a wide range of applications, and modern algorithms facilitate posterior inference when placing a prior on the number of components; see Miller and Harrison [17] and references therein. Thus, it is important to characterize the theoretical properties of these models as generally as possible.

The article is organized as follows. In Section 2, we describe the class of models under consideration and introduce the conditions to be assumed. In particular, we state conditions on the component distributions (Condition 2.1) and prior (Condition 2.2) enabling Lebesgue-almost everywhere consistency, and we provide common examples satisfying these conditions. In Section 3, we state our main results, and Section 4 contains the proofs.

2 Model

Let $(F_v : v \in V)$ be a family of probability measures on $\mathcal{X}$, where $V \subseteq \mathbb{R}^D$ is measurable and $\mathcal{X}$ is a Borel measurable subset of a complete separable metric space, equipped with the Borel sigma-algebra. For all $d$, we give $\mathbb{R}^d$ the Euclidean topology and the resulting Borel sigma-algebra. For $k \in \{1, 2, \ldots\}$, define $\Delta_k := \{ w \in (0,1)^k : \sum_{i=1}^k w_i = 1 \} \subseteq \mathbb{R}^k$. For $w \in \Delta_k$ and $v \in V^k$, define a probability measure

(1) $P_{w,v} = \sum_{i=1}^k w_i F_{v_i}$

on $\mathcal{X}$. Thus, $P_{w,v}$ is the mixture with weights $w_i$ and component parameters $v_i$.

Let $\pi$, $D_k$, and $G_k$ be probability measures on $\{1, 2, \ldots\}$, $\Delta_k$, and $V^k$, respectively. Consider the following model:

(2)
$$
\begin{aligned}
K &\sim \pi && \text{(number of components)} \\
W \mid K = k &\sim D_k, \quad \text{where } W = (W_1, \ldots, W_k) && \text{(mixture weights)} \\
V \mid K = k &\sim G_k, \quad \text{where } V = (V_1, \ldots, V_k) && \text{(component parameters)} \\
X_1, \ldots, X_n \mid W, V &\overset{\text{iid}}{\sim} P_{W,V} && \text{(observed data)}
\end{aligned}
$$

We use uppercase letters to denote random variables, such as $K$, and lowercase to denote particular values, such as $k$.
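To make the generative process in equation (2) concrete, here is a minimal simulation sketch under one illustrative choice of $\pi$, $D_k$, $G_k$, and $(F_v)$ — a shifted Poisson prior on $K$, a symmetric Dirichlet on the weights, i.i.d. normal component parameters, and unit-variance normal components. These specific choices are our own for illustration and are not prescribed by the article.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, rng):
    """Draw (k, w, v, x) from model (2) under one concrete instantiation."""
    k = 1 + rng.poisson(1.0)           # K ~ pi, here a shifted Poisson on {1, 2, ...}
    w = rng.dirichlet(np.ones(k))      # W | K = k ~ Dirichlet(1, ..., 1) on Delta_k
    v = rng.normal(0.0, 5.0, size=k)   # V_1, ..., V_k | K = k ~ N(0, 25) i.i.d.
    z = rng.choice(k, size=n, p=w)     # latent component assignments
    x = rng.normal(v[z], 1.0)          # X_1, ..., X_n | W, V ~ P_{W,V}, with F_v = N(v, 1)
    return k, w, v, x

k, w, v, x = simulate(n=1000, rng=rng)
print(k, np.round(w, 3), np.round(v, 2))
```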

2.1 Conditions

Condition 2.1

(Family of component distributions).

  1. For all measurable $A \subseteq \mathcal{X}$, the function $v \mapsto F_v(A)$ is measurable on $V$.

  2. (Finite mixture identifiability) For all $k, k' \in \{1, 2, \ldots\}$, $w \in \Delta_k$, $w' \in \Delta_{k'}$, $v \in V^k$, and $v' \in V^{k'}$, if $P_{w,v} = P_{w',v'}$, then $\sum_{i=1}^k w_i \delta_{v_i} = \sum_{i=1}^{k'} w'_i \delta_{v'_i}$.

Here, $\delta_x$ denotes the unit point mass at $x$. Roughly, Condition 2.1(1) says that $(F_v : v \in V)$ is a measurable family, and Condition 2.1(2) says that the discrete mixing distribution $\sum_{i=1}^k w_i \delta_{v_i}$ is uniquely determined by $P_{w,v}$. Condition 2.1(2) is a standard definition of finite mixture identifiability [25]. Let $S_k$ denote the set of permutations of $\{1, \ldots, k\}$.
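The reason consistency can only hold up to a permutation is that $P_{w,v}$ is unchanged by relabeling the components. A small numerical check of this invariance (our own illustration, assuming NumPy/SciPy and the univariate normal family $F_v = \mathrm{N}(v, 1)$):

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, w, v):
    """Density of P_{w,v} with components F_v = N(v, 1)."""
    return sum(wi * norm.pdf(x, loc=vi) for wi, vi in zip(w, v))

w = np.array([0.2, 0.3, 0.5])
v = np.array([-1.0, 0.0, 2.0])
perm = [2, 0, 1]  # a permutation sigma in S_3

x = np.linspace(-5.0, 5.0, 101)
print(np.allclose(mixture_pdf(x, w, v), mixture_pdf(x, w[perm], v[perm])))  # True
```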

Condition 2.2

(Prior). Under the model in equation (2), for all $k \in \{1, 2, \ldots\}$,

  1. $P(K = k) > 0$,

  2. for all measurable $A \subseteq \Delta_k$, if $P(W \in A \mid K = k) = 0$, then $\{ w_{1:k-1} : w \in A \}$ has Lebesgue measure zero,

  3. for all measurable $A \subseteq V^k$, if $\sum_{\sigma \in S_k} P(V_\sigma \in A \mid K = k) = 0$, then $A$ has Lebesgue measure zero,

  4. $P(V_i = V_j \mid K = k) = 0$ for all $1 \le i < j \le k$.

Here, $w_{1:k-1} = (w_1, \ldots, w_{k-1})$ and $V_\sigma = (V_{\sigma_1}, \ldots, V_{\sigma_k})$. Roughly, Conditions 2.2(1)–(3) state that the prior gives positive mass to all $k$ and to all sets of nonzero Lebesgue measure, for some permutation of the component labels. Condition 2.2(4) states that the component parameters are distinct with prior probability 1. Note that we do not assume that $D_k$ and $G_k$ have densities with respect to Lebesgue measure.

2.2 Examples

The conditions in Section 2.1 hold for many commonly used mixture models.

2.2.1 Family of component distributions

For the component distributions $F_v$, there are many commonly used choices that satisfy Condition 2.1(2), including the multivariate normal [26] and, more generally, many elliptical families, such as multivariate t distributions [10]. Several discrete families, such as the Poisson, geometric, negative binomial, and many other power-series distributions, also satisfy Condition 2.1(2) [23]. In each of these cases, Condition 2.1(1) can be easily verified using Folland [7, Theorem 2.37].

2.2.2 Prior

For the prior on the mixture weights, Condition 2.2(2) is satisfied by choosing $W \mid K = k \sim \mathrm{Dirichlet}(\alpha_{k1}, \ldots, \alpha_{kk})$ for any $\alpha_{k1}, \ldots, \alpha_{kk} > 0$, since this has a density with respect to $(k-1)$-dimensional Lebesgue measure $dw_1 \cdots dw_{k-1}$ and this density is strictly positive on $\Delta_k$. More generally, for the same reason, Condition 2.2(2) is satisfied if, given $K = k$, $W$ is defined as follows: let $Z_i \sim \mathrm{Beta}(a_{ki}, b_{ki})$ independently for $i \in \{1, \ldots, k-1\}$, where $a_{ki}, b_{ki} > 0$, then set $W_i = Z_i \prod_{j=1}^{i-1} (1 - Z_j)$ for $i \in \{1, \ldots, k-1\}$ and $W_k = 1 - \sum_{i=1}^{k-1} W_i$; this is called the generalized Dirichlet distribution [3,11].
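As a sketch of the stick-breaking construction just described (our own illustration, not code from the article), the following draws a generalized Dirichlet weight vector:

```python
import numpy as np

def generalized_dirichlet(a, b, rng):
    """Draw W via stick-breaking: W_i = Z_i * prod_{j<i} (1 - Z_j), Z_i ~ Beta(a_i, b_i)."""
    z = rng.beta(a, b)                                         # k-1 independent Betas
    stick = np.concatenate(([1.0], np.cumprod(1.0 - z)[:-1]))  # remaining stick lengths
    w = np.append(z * stick, 0.0)
    w[-1] = 1.0 - w[:-1].sum()                                 # W_k takes the leftover mass
    return w

rng = np.random.default_rng(0)
w = generalized_dirichlet(a=np.ones(4), b=np.ones(4), rng=rng)  # k = 5 components
print(np.round(w, 3), w.sum())  # strictly positive weights summing to 1
```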

For the prior on the component parameters, perhaps the most common situation is that, given $K = k$, the parameters $V_1, \ldots, V_k$ are i.i.d. from some distribution $G_0$; in this case, Conditions 2.2(3) and 2.2(4) are satisfied if $G_0$ has a density with respect to Lebesgue measure and this density is strictly positive on $V$ except for a set of Lebesgue measure zero. A more interesting example is the case of repulsive mixtures, which use a non-independent prior on component parameters to favor well-separated mixture components. For instance, Petralia et al. [21] propose defining $G_k$ to have a density (with respect to Lebesgue measure) proportional to $h(v) \prod_{i=1}^k g_0(v_i)$, where $g_0$ is a probability density on $V$ and $h : V^k \to \mathbb{R}$ is either $h(v) = \prod_{1 \le i < j \le k} \rho(\lVert v_i - v_j \rVert)$ or $h(v) = \min_{1 \le i < j \le k} \rho(\lVert v_i - v_j \rVert)$, where $\rho : [0, \infty) \to \mathbb{R}$ is a strictly increasing, bounded function with $\rho(0) = 0$. Then Conditions 2.2(3) and 2.2(4) are satisfied as long as $g_0$ is strictly positive on $V$ except for a set of Lebesgue measure zero. This holds not only when $v_i$ consists of location parameters but also, in general, for any form of component parameter, such as both location and scale parameters.
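For concreteness, here is a minimal sketch of the (unnormalized) repulsive log-density described above, with our own illustrative choices $\rho(t) = 1 - e^{-t^2/\tau}$ (strictly increasing, bounded, $\rho(0) = 0$) and a standard normal $g_0$; neither choice is prescribed by Petralia et al. [21].

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

def repulsive_log_density(v, log_g0, rho):
    """Unnormalized log of h(v) * prod_i g0(v_i), with h(v) = prod_{i<j} rho(||v_i - v_j||)."""
    log_h = sum(np.log(rho(np.linalg.norm(vi - vj))) for vi, vj in combinations(v, 2))
    return log_h + sum(log_g0(vi) for vi in v)

tau = 1.0
rho = lambda t: 1.0 - np.exp(-t**2 / tau)                  # illustrative repulsion function
log_g0 = lambda x: multivariate_normal.logpdf(x, mean=np.zeros(2))

v_far = [np.zeros(2), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
v_near = [np.zeros(2), np.array([0.1, 0.0]), np.array([0.0, 0.1])]
# well-separated parameters receive higher prior log-density
print(repulsive_log_density(v_far, log_g0, rho) > repulsive_log_density(v_near, log_g0, rho))  # True
```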

3 Main results

We show that for any model as in equation (2) satisfying Conditions 2.1 and 2.2, the posterior is consistent for $k$, $w$, and $v$, up to a permutation of the component labels, except on a set of Lebesgue measure zero. More generally, if only Conditions 2.1 and 2.2(4) are satisfied, then the result holds except on a set of prior measure zero.

Define $\Theta_k := \Delta_k \times V^k$ and $\Theta := \bigcup_{k=1}^\infty \Theta_k$, noting that $\Theta_1, \Theta_2, \ldots$ are disjoint sets. Thus, for any $\theta \in \Theta$, we have $\theta = (w, v)$ for some unique $w \in \Delta_k$, $v \in V^k$, and $k \in \{1, 2, \ldots\}$; let $k(\theta)$ denote this value of $k$. In terms of $\theta$, the data distribution is $P_\theta := P_{w,v}$, where $P_{w,v}$ is defined in equation (1).

We define a metric on $\Theta$ as follows: for $\theta, \theta' \in \Theta$, let

(3) $d_\Theta(\theta, \theta') = \begin{cases} \min\{\lVert \theta - \theta' \rVert, 1\} & \text{if } k(\theta) = k(\theta'), \\ 1 & \text{otherwise}, \end{cases}$

where $\lVert \cdot \rVert$ is the Euclidean norm on $\Delta_k \times V^k \subseteq \mathbb{R}^{k + kD}$. Propositions A.1 and A.2 show that $d_\Theta$ is indeed a metric and that $\Theta$ is a Borel measurable subset of a complete separable metric space; we give $\Theta$ the resulting Borel sigma-algebra. Recall that $S_k$ denotes the set of permutations of $\{1, \ldots, k\}$. For $\sigma \in S_k$ and $\theta \in \Theta_k$, let $\theta[\sigma]$ denote the transformation of $\theta$ obtained by permuting the component labels, that is, if $\theta = (w, v)$, then $\theta[\sigma] := (w_\sigma, v_\sigma)$, where $w_\sigma = (w_{\sigma_1}, \ldots, w_{\sigma_k})$ and $v_\sigma = (v_{\sigma_1}, \ldots, v_{\sigma_k})$. For $\theta_0 \in \Theta_k$ and $\varepsilon > 0$, define

(4) $\tilde{B}(\theta_0, \varepsilon) = \bigcup_{\sigma \in S_k} \{ \theta \in \Theta : d_\Theta(\theta, \theta_0[\sigma]) < \varepsilon \}.$
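The following sketch (our own illustration) computes the metric in equation (3) and tests membership in the neighborhood $\tilde{B}(\theta_0, \varepsilon)$ of equation (4) by brute force over permutations:

```python
import numpy as np
from itertools import permutations

def d_theta(theta1, theta2):
    """Metric (3): truncated Euclidean distance when k(theta) = k(theta'), else 1."""
    (w1, v1), (w2, v2) = theta1, theta2
    if len(w1) != len(w2):
        return 1.0
    diff = np.concatenate([w1 - w2, (v1 - v2).ravel()])
    return min(np.linalg.norm(diff), 1.0)

def in_B_tilde(theta, theta0, eps):
    """Membership in B~(theta0, eps) from (4): within eps of some relabeling of theta0."""
    w0, v0 = theta0
    return any(d_theta(theta, (w0[list(s)], v0[list(s)])) < eps
               for s in permutations(range(len(w0))))

theta0 = (np.array([0.3, 0.7]), np.array([[0.0], [2.0]]))
theta = (np.array([0.69, 0.31]), np.array([[2.01], [0.02]]))  # labels swapped, perturbed
print(in_B_tilde(theta, theta0, eps=0.1))  # True
```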

Consider the model in equation (2) and define the random variable $\theta := (W, V)$.

Theorem 3.1

Assume Conditions 2.1 and 2.2(4) hold. There exists $\Theta^* \subseteq \Theta$ such that $P(\theta \in \Theta^*) = 1$ and for all $\theta_0 \in \Theta^*$, if $X_1, X_2, \ldots \overset{\text{iid}}{\sim} P_{\theta_0}$, then for all $\varepsilon > 0$,

(5) $\lim_{n \to \infty} P(\theta \in \tilde{B}(\theta_0, \varepsilon) \mid X_1, \ldots, X_n) = 1 \quad \text{a.s.} \ [P_{\theta_0}]$

and

(6) $\lim_{n \to \infty} P(K = k(\theta_0) \mid X_1, \ldots, X_n) = 1 \quad \text{a.s.} \ [P_{\theta_0}].$

Here, the conditional probabilities are under the assumed model in equation (2); note that $\theta \mid X_1, \ldots, X_n$ has a regular conditional distribution by Durrett [6, Theorems 1.4.12 and 4.1.6]. Now, define a measure $\lambda$ on $\Theta$ as follows. Let $\lambda_{V^k}$ denote Lebesgue measure on $V^k$, and let $\lambda_{\Delta_k}$ denote the measure on $\Delta_k$ such that, for all measurable $A \subseteq \Delta_k$, $\lambda_{\Delta_k}(A)$ equals the Lebesgue measure of $\{ w_{1:k-1} : w \in A \} \subseteq \mathbb{R}^{k-1}$. Define $\lambda(A) := \sum_{k=1}^\infty (\lambda_{\Delta_k} \times \lambda_{V^k})(A \cap \Theta_k)$ for all measurable $A \subseteq \Theta$. In essence, $\lambda$ can be thought of as Lebesgue measure on $\Theta$.

Theorem 3.2

If Conditions 2.1 and 2.2 hold, then the set $\Theta^*$ in Theorem 3.1 can be chosen such that $\lambda(\Theta \setminus \Theta^*) = 0$.

In other words, for $\lambda$-almost all values of $\theta_0$ in $\Theta$, if $X_1, X_2, \ldots \overset{\text{iid}}{\sim} P_{\theta_0}$, then for all $\varepsilon > 0$, equations (5) and (6) hold $P_{\theta_0}$-almost surely.

4 Proofs

Proof of Theorem 3.1

The basic idea of the proof is to use Doob's theorem on posterior consistency [4,13]. However, Doob's theorem cannot be directly applied since it requires identifiability, and while we assume identifiability of $\sum_{i=1}^k w_i \delta_{v_i}$ in Condition 2.1(2), this does not imply identifiability of $(w, v)$ due to (a) the invariance of $P_{w,v}$ with respect to permutations of the component labels and (b) the existence of points in $\Theta$ where $v_i = v_j$ for some $i \ne j$. To handle this, we consider a certain restricted parameter space on which identifiability holds for $(w, v)$, apply Doob's theorem to a collapsed model on this restricted space, and then show that this implies the claimed result on all of $\Theta$.

Identifiability constraints. We constrain the component parameters as follows to obtain identifiability of $(w, v)$. Putting the dictionary order (also known as lexicographic order) on elements of $V \subseteq \mathbb{R}^D$, define

$$V^k_\prec := \{ (v_1, \ldots, v_k) \in V^k : v_1 \prec \cdots \prec v_k \} \subseteq \mathbb{R}^{kD}.$$

Here, $v_i \prec v_j$ denotes that $v_i$ precedes $v_j$ in dictionary order and $v_i \ne v_j$. There is nothing particularly special about using the dictionary order here, aside from it being a well-known total order on multivariate spaces that produces order-constrained sets $V^k_\prec$ that are Borel measurable. Define $\tilde{\Theta}_k := \Delta_k \times V^k_\prec$ and $\tilde{\Theta} := \bigcup_{k=1}^\infty \tilde{\Theta}_k$. Then, $\tilde{\Theta}$ is a Borel measurable subset of a complete separable metric space under the metric $d_\Theta$ as defined in equation (3); this follows from Propositions A.1 and A.2 by taking $X_k = \mathbb{R}^{k + kD}$, $d_k(x, y) = \lVert x - y \rVert$ for $x, y \in X_k$, and $A_k = \tilde{\Theta}_k$ for $k \in \{1, 2, \ldots\}$.
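The following sketch (our own illustration) relabels components into dictionary order, which is essentially the map $T$ defined in the next paragraph; it assumes the $v_i$'s are distinct, which holds with prior probability one by Condition 2.2(4).

```python
import numpy as np

def to_ordered(w, v):
    """Relabel components so that the rows of v satisfy v_1 < ... < v_k in dictionary order."""
    v = np.atleast_2d(v)
    order = np.lexsort(v.T[::-1])  # primary key: first coordinate; then second; etc.
    return w[order], v[order]

w = np.array([0.5, 0.2, 0.3])
v = np.array([[1.0, 0.0], [-1.0, 2.0], [1.0, -3.0]])
w_sorted, v_sorted = to_ordered(w, v)
print(v_sorted)  # rows ordered (-1, 2) < (1, -3) < (1, 0), weights carried along
print(w_sorted)  # [0.2, 0.3, 0.5]
```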

Collapsed model. For $\theta \in \Theta_k$, define $T(\theta) = \theta[\sigma]$, where $\sigma \in S_k$ is chosen such that $\theta[\sigma] \in \tilde{\Theta}_k$ if possible, and otherwise $T(\theta) = \theta$. Then, $P(T(\theta) \in \tilde{\Theta}) = 1$, since by Condition 2.2(4), the subset of $V^k$ where two or more $v_i$'s coincide has prior probability zero. Denoting $B[\sigma] = \{ \theta[\sigma] : \theta \in B \}$, note that by the definition of $T$, for all $B \subseteq \tilde{\Theta}_k$,

(7) $T^{-1}(B) = \{ \theta \in \Theta : T(\theta) \in B \} = \bigcup_{\sigma \in S_k} B[\sigma].$

Letting $\tilde{Q}$ denote the distribution of $T(\theta)$, restricted to $\tilde{\Theta}$, we have

(8) $T(\theta) \sim \tilde{Q}, \qquad X_1, \ldots, X_n \mid T(\theta) \overset{\text{iid}}{\sim} P_{T(\theta)}$

by Dudley [5, Theorem 10.2.1], since $P_\theta = P_{T(\theta)}$ and for all measurable $A \subseteq \mathcal{X}^n$ and $B \subseteq \tilde{\Theta}$, $P(X_{1:n} \in A, \, T(\theta) \in B) = P(X_{1:n} \in A, \, \theta \in T^{-1}(B)) = \int_B P_\theta^{(n)}(A) \, d\tilde{Q}(\theta)$, where $X_{1:n} = (X_1, \ldots, X_n)$; measurability of $\theta \mapsto P_\theta^{(n)}(A)$ for $A \subseteq \mathcal{X}^n$ follows from measurability of $\theta \mapsto P_\theta(A)$ for $A \subseteq \mathcal{X}$ (shown below at equation (9)) along with Miller [13, Lemma 5.2]. We refer to equation (8) as the collapsed model.

Applying Doob's theorem. We show that the collapsed model in equation (8) satisfies the conditions of Doob's theorem [13]. First, we check identifiability. Let $\theta, \theta' \in \tilde{\Theta}$ such that $P_\theta = P_{\theta'}$. By Condition 2.1(2), $\sum_{i=1}^k w_i \delta_{v_i} = \sum_{i=1}^{k'} w'_i \delta_{v'_i}$, where $\theta = (w, v)$, $\theta' = (w', v')$, $k = k(\theta)$, and $k' = k(\theta')$. By the definition of $\tilde{\Theta}$, $v_1, \ldots, v_k$ are all distinct, $v'_1, \ldots, v'_{k'}$ are all distinct, $w_1, \ldots, w_k > 0$, and $w'_1, \ldots, w'_{k'} > 0$. This implies that $k = k'$, $w = w'_\sigma$, and $v = v'_\sigma$ for some $\sigma \in S_k$. Furthermore, because $v_1 \prec \cdots \prec v_k$ and $v'_1 \prec \cdots \prec v'_k$ by the definition of $\tilde{\Theta}$, it must be the case that $\sigma$ is the identity permutation, so $w = w'$ and $v = v'$, that is, $\theta = \theta'$. Therefore, $\theta = (w, v)$ is identifiable on the restricted space $\tilde{\Theta}$.

Next, we check measurability. Let $A \subseteq \mathcal{X}$ be measurable. Then, for any $k \in \{1, 2, \ldots\}$,

(9) $\theta \mapsto P_\theta(A) = \sum_{i=1}^k w_i F_{v_i}(A)$

is measurable as a function on $\Theta_k = \Delta_k \times V^k$, since the projections $(w, v) \mapsto w_i$ and $(w, v) \mapsto v_i$ are measurable, and $v_i \mapsto F_{v_i}(A)$ is measurable on $V$ by Condition 2.1(1). Therefore, $\theta \mapsto P_\theta(A)$ is measurable as a function on $\tilde{\Theta}_k = \Delta_k \times V^k_\prec \subseteq \Delta_k \times V^k$. It follows that it is measurable as a function on $\tilde{\Theta}$ (since the pre-image of a measurable subset of $\mathbb{R}$ is a union of measurable subsets of $\tilde{\Theta}_1, \tilde{\Theta}_2, \ldots$, respectively, and is thus measurable by Proposition A.2).

Thus, by Doob's theorem [13], there exists $\tilde{\Theta}^* \subseteq \tilde{\Theta}$ such that $P(T(\theta) \in \tilde{\Theta}^*) = 1$ and the collapsed model is consistent at all $T(\theta_0) \in \tilde{\Theta}^*$; that is, for any neighborhood $B \subseteq \tilde{\Theta}$ of $T(\theta_0)$, we have $P(T(\theta) \in B \mid X_{1:n}) \to 1$ a.s. $[P_{T(\theta_0)}]$, where $X_{1:n} = (X_1, \ldots, X_n)$. Define $\Theta^*$ to be the set of all points in $\Theta$ that can be obtained by permuting the mixture components of a point in $\tilde{\Theta}^*$, that is, $\Theta^* := \bigcup_{k=1}^\infty \bigcup_{\sigma \in S_k} (\tilde{\Theta}^* \cap \tilde{\Theta}_k)[\sigma]$. Then, by equation (7),

$$P(\theta \in \Theta^*) = P(T(\theta) \in \tilde{\Theta}^*) = 1.$$

Putting the pieces together. Let $\theta_0 \in \Theta^*$ and define $k_0 = k(\theta_0)$. Let $X_1, X_2, \ldots \overset{\text{iid}}{\sim} P_{\theta_0}$, let $\varepsilon \in (0, 1)$, and define $B := \{ \theta \in \tilde{\Theta} : d_\Theta(\theta, T(\theta_0)) < \varepsilon \} \subseteq \tilde{\Theta}_{k_0}$. Referring to equation (4), observe that $\bigcup_{\sigma \in S_{k_0}} B[\sigma] \subseteq \tilde{B}(\theta_0, \varepsilon)$. Hence, by equation (7),

(10) $P(\theta \in \tilde{B}(\theta_0, \varepsilon) \mid X_{1:n}) \ge P\bigl(\theta \in \bigcup_{\sigma \in S_{k_0}} B[\sigma] \bigm| X_{1:n}\bigr) = P(T(\theta) \in B \mid X_{1:n}) \xrightarrow[n \to \infty]{\text{a.s.}} 1,$

where $X_{1:n} = (X_1, \ldots, X_n)$, since $P_{\theta_0} = P_{T(\theta_0)}$ and the collapsed model is consistent at $T(\theta_0) \in \tilde{\Theta}^*$. This proves equation (5). Equation (6) follows directly from equation (10), since $\varepsilon < 1$ implies $\tilde{B}(\theta_0, \varepsilon) \subseteq \Theta_{k_0}$, and therefore,

$$P(K = k_0 \mid X_{1:n}) = P(\theta \in \Theta_{k_0} \mid X_{1:n}) \ge P(\theta \in \tilde{B}(\theta_0, \varepsilon) \mid X_{1:n}) \xrightarrow[n \to \infty]{\text{a.s.}} 1. \qquad \Box$$

Proof of Theorem 3.2

Define $\Theta^*$ as in the proof of Theorem 3.1. Since $P(\theta \in \Theta^*) = 1$,

$$0 = P(\theta \in \Theta \setminus \Theta^*) = \sum_{k=1}^\infty P(\theta \in \Theta_k \setminus \Theta^* \mid K = k) \, P(K = k).$$

Since $P(K = k) > 0$ for all $k$ by Condition 2.2(1), $P(\theta \in \Theta_k \setminus \Theta^* \mid K = k) = 0$ for all $k$.

For $\sigma \in S_k$, let $D_k^\sigma$ and $G_k^\sigma$ denote the conditional distributions of $W_\sigma$ and $V_\sigma$, respectively, given $K = k$ under the model. Note that for all $\sigma \in S_k$, $(\Theta_k \setminus \Theta^*)[\sigma] = \Theta_k \setminus \Theta^*$. Thus,

(11) $(D_k^\sigma \times G_k^\sigma)(\Theta_k \setminus \Theta^*) = (D_k \times G_k)(\Theta_k \setminus \Theta^*) = P(\theta \in \Theta_k \setminus \Theta^* \mid K = k) = 0.$

Note that $\lambda_{\Delta_k}$ is invariant under permutations $\sigma \in S_k$, since by Folland [7, Theorem 2.47], Lebesgue measure $dw_1 \cdots dw_{k-1}$ on $\{ w_{1:k-1} \in (0,1)^{k-1} : \sum_{i=1}^{k-1} w_i < 1 \}$ is invariant under transformations of the form $g(w_{1:k-1}) = (w_{\sigma_1}, \ldots, w_{\sigma_{k-1}})$, where $w_k = 1 - \sum_{i=1}^{k-1} w_i$, because the Jacobian determinant is $\pm 1$. Conditions 2.2(2) and 2.2(3) say that $\lambda_{\Delta_k} \ll D_k$ and $\lambda_{V^k} \ll \sum_{\sigma \in S_k} G_k^\sigma$, respectively, where $\ll$ denotes absolute continuity. Thus, letting $\lambda_{\Delta_k}^\sigma$ denote the pushforward of $\lambda_{\Delta_k}$ under $w \mapsto w_\sigma$ (which equals $\lambda_{\Delta_k}$ by invariance), by Folland [7, Exercise 3.2.12],

(12) $\lambda_{\Delta_k} \times \lambda_{V^k} \ll \lambda_{\Delta_k} \times \sum_{\sigma \in S_k} G_k^\sigma = \sum_{\sigma \in S_k} \lambda_{\Delta_k}^\sigma \times G_k^\sigma \ll \sum_{\sigma \in S_k} D_k^\sigma \times G_k^\sigma.$

By equation (11), $(D_k^\sigma \times G_k^\sigma)(\Theta_k \setminus \Theta^*) = 0$ for all $\sigma \in S_k$, and thus, $(\lambda_{\Delta_k} \times \lambda_{V^k})(\Theta_k \setminus \Theta^*) = 0$ by equation (12). Therefore, $\lambda(\Theta \setminus \Theta^*) = \sum_{k=1}^\infty (\lambda_{\Delta_k} \times \lambda_{V^k})(\Theta_k \setminus \Theta^*) = 0$.□

5 Conclusion

There are several directions that could be pursued in future work. First, it is straightforward to generalize to cases in which the true parameter is known to be in a subset of the space and the prior is restricted accordingly – for instance, if it is known that the number of components is less than some maximal number. A related generalization would be to handle partially identified mixtures, that is, to show consistency in cases where the mixtures are only identifiable up to a certain maximum number of components, such as mixtures of binomials [25, Proposition 4].

Another interesting direction would be to handle overfitted mixtures, in which the true distribution is a finite mixture from the assumed family, but the model uses a fixed number of components that is greater than the true value. An extension to establish consistency for the emission distributions of a hidden Markov model would also be interesting, if possible.

Finally, perhaps the biggest weakness of the Doob-type result is the unknown measure zero set on which consistency may fail. By adding regularity conditions, it might be possible to eliminate this limitation (i.e., to fully characterize the measure zero set) while still retaining broad generality, using a proof technique that augments Doob’s theorem.

Acknowledgments

Thanks to Matthew Harrison for helpful comments on an early version of this manuscript.

  1. Funding information: The author states that there is no funding involved.

  2. Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The author states that there is no conflict of interest.

  4. Data availability statement: Data sharing is not applicable to this article as no datasets were generated or analyzed during this study.

Appendix A Supporting results

Proposition A.1

If $X_1, X_2, \ldots$ is a sequence of disjoint, complete separable metric spaces with metrics $d_1, d_2, \ldots$, respectively, then $X = \bigcup_{i=1}^\infty X_i$ is a complete separable metric space under the metric

$$d(x, y) = \begin{cases} \min\{d_i(x, y), 1\} & \text{if } x, y \in X_i \text{ for some } i, \\ 1 & \text{if } x \in X_i, \ y \in X_j, \text{ and } i \ne j, \end{cases}$$

and the topology induced by this metric coincides with the disjoint union topology.

The disjoint union topology is the smallest topology that contains all the open sets of all the $X_i$'s. Equivalently, it is the topology consisting of all unions of the form $\bigcup_{i=1}^\infty A_i$, where $A_i$ is open in $X_i$ for $i \in \{1, 2, \ldots\}$.

Proof

First, we show that $d$ is a metric on $X$. It is easy to see that $d(x, y) = d(y, x)$, $d(x, y) \ge 0$, and $d(x, y) = 0 \iff x = y$. To prove the triangle inequality, let $x, y, z \in X$ and suppose $x \in X_i$, $y \in X_j$, and $z \in X_k$. Using the fact that $\bar{d}_i(x, y) := \min\{d_i(x, y), 1\}$ is a metric [18, Theorem 20.1], it is simple to check that $d(x, y) \le d(x, z) + d(z, y)$ in each of the following cases: (1) $i = j = k$, (2) $i = j \ne k$, and (3) $i \ne j$.

Next, we show that $X$ is complete under $d$. Let $x_1, x_2, \ldots \in X$ be a Cauchy sequence. Choose $N$ such that for all $n, m \ge N$, $d(x_n, x_m) \le 1/2$. Suppose $i$ is the index such that $x_N \in X_i$. Then, $x_n \in X_i$ for all $n \ge N$, and $d(x_n, x_m) = d_i(x_n, x_m)$ for all $n, m \ge N$. Thus, $(x_N, x_{N+1}, \ldots)$ is a Cauchy sequence in $X_i$ under $d_i$, so it converges (under $d_i$) to some $x \in X_i$ since $X_i$ is complete. Hence, it also converges to $x$ under $d$. Therefore, $X$ is complete.

Furthermore, $X$ is separable, since if $C_i \subseteq X_i$ is a countable dense subset of $X_i$ under $d_i$, then it is also dense in $X_i$ under $d$, so $\bigcup_{i=1}^\infty C_i$ is a countable dense subset of $X$ under $d$.

Finally, $d$ induces the disjoint union topology on $X$, since the collection of open balls

$$\{ B_\varepsilon(x) : \varepsilon \in (0, 1), \ x \in X_i, \ i = 1, 2, \ldots \},$$

where $B_\varepsilon(x) = \{ y \in X : d(x, y) < \varepsilon \}$, is a base for both the disjoint union topology and the $d$-metric topology.□

Proposition A.2

Suppose $X_1, X_2, \ldots$, and $X$ are defined as in Proposition A.1. If $A_1, A_2, \ldots$ are Borel measurable subsets of $X_1, X_2, \ldots$, respectively, then $\bigcup_{i=1}^\infty A_i$ is a Borel measurable subset of $X$.

Proof

For a topological space $Y$, let $\mathcal{T}_Y$ denote its topology and let $\mathcal{B}_Y = \sigma(\mathcal{T}_Y)$ denote its Borel sigma-algebra. Since $\mathcal{T}_{X_i} \subseteq \mathcal{T}_X$ (by the definition of the disjoint union topology), we have $\mathcal{B}_{X_i} \subseteq \mathcal{B}_X$, and therefore, $A_i \in \mathcal{B}_{X_i} \subseteq \mathcal{B}_X$ for all $i = 1, 2, \ldots$. Hence, $\bigcup_{i=1}^\infty A_i \in \mathcal{B}_X$.□

References

[1] Box, G. E. (1979). Robustness in the strategy of scientific model building. In: Robustness in statistics (pp. 201–236). Cambridge, MA: Elsevier Inc. doi:10.1016/B978-0-12-438150-6.50018-2

[2] Cai, D., Campbell, T., & Broderick, T. (2021). Finite mixture models do not reliably learn the number of components. In: International Conference on Machine Learning, PMLR (pp. 1158–1169).

[3] Connor, R. J., & Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal of the American Statistical Association, 64(325), 194–206. doi:10.1080/01621459.1969.10500963

[4] Doob, J. L. (1949). Application of the theory of martingales. In: Actes du Colloque International Le Calcul des Probabilités et ses applications (Lyon, 28 Juin – 3 Juillet, 1948) (pp. 23–27). Paris: CNRS.

[5] Dudley, R. M. (2002). Real analysis and probability. Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511755347

[6] Durrett, R. (1996). Probability: Theory and examples (Second edition). Belmont, CA: Wadsworth Publishing Company.

[7] Folland, G. B. (2013). Real analysis: Modern techniques and their applications. New York, NY: John Wiley & Sons.

[8] Ghosal, S., & Van der Vaart, A. (2017). Fundamentals of nonparametric Bayesian inference. Cambridge, UK: Cambridge University Press. doi:10.1017/9781139029834

[9] Guha, A., Ho, N., & Nguyen, X. (2021). On posterior contraction of parameters and interpretability in Bayesian mixture modeling. Bernoulli, 27(4), 2159–2188. doi:10.3150/20-BEJ1275

[10] Holzmann, H., Munk, A., & Gneiting, T. (2006). Identifiability of finite mixtures of elliptical distributions. Scandinavian Journal of Statistics, 33(4), 753–763. doi:10.1111/j.1467-9469.2006.00505.x

[11] Ishwaran, H., & James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453), 161–173. doi:10.1198/016214501750332758

[12] Kruijer, W., Rousseau, J., & Van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electronic Journal of Statistics, 4, 1225–1257. doi:10.1214/10-EJS584

[13] Miller, J. W. (2018). A detailed treatment of Doob's theorem. arXiv:1801.03122.

[14] Miller, J. W., & Dunson, D. B. (2018). Robust Bayesian inference via coarsening. Journal of the American Statistical Association, 114, 1113–1125. doi:10.1080/01621459.2018.1469995

[15] Miller, J. W., & Harrison, M. T. (2013). A simple example of Dirichlet process mixture inconsistency for the number of components. Advances in Neural Information Processing Systems, 26.

[16] Miller, J. W., & Harrison, M. T. (2014). Inconsistency of Pitman-Yor process mixtures for the number of components. Journal of Machine Learning Research, 15(1), 3333–3370.

[17] Miller, J. W., & Harrison, M. T. (2018). Mixture models with a prior on the number of components. Journal of the American Statistical Association, 113(521), 340–356. doi:10.1080/01621459.2016.1255636

[18] Munkres, J. R. (2000). Topology (Second edition). Upper Saddle River: Prentice Hall.

[19] Nguyen, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. The Annals of Statistics, 41(1), 370–400. doi:10.1214/12-AOS1065

[20] Nobile, A. (1994). Bayesian analysis of finite mixture distributions. (PhD thesis), Department of Statistics, Carnegie Mellon University, Pittsburgh, PA.

[21] Petralia, F., Rao, V., & Dunson, D. (2012). Repulsive mixtures. Advances in Neural Information Processing Systems, 25.

[22] Roeder, K., & Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. Journal of the American Statistical Association, 92(439), 894–902. doi:10.1080/01621459.1997.10474044

[23] Sapatinas, T. (1995). Identifiability of mixtures of power-series distributions and related characterizations. Annals of the Institute of Statistical Mathematics, 47(3), 447–459. doi:10.1007/BF00773394

[24] Shen, W., Tokdar, S. T., & Ghosal, S. (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika, 100(3), 623–640. doi:10.1093/biomet/ast015

[25] Teicher, H. (1963). Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34, 1265–1269. doi:10.1214/aoms/1177703862

[26] Yakowitz, S. J., & Spragins, J. D. (1968). On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39(1), 209–214. doi:10.1214/aoms/1177698520

Received: 2022-05-06
Revised: 2022-08-14
Accepted: 2022-10-05
Published Online: 2023-03-09

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
