Bearing in mind that the best variational approximation targeting the joint posterior is the Dirichlet distribution in (7), an obvious choice for $$\mathcal{F}$$ would be (a subset of) the Dirichlet family of distributions. However, it will prove useful to also consider a broader family, namely the generalized Dirichlet family of distributions (Wong, 1998, 2010). The VB solution (7) can be expressed as a generalized Dirichlet distribution:

$$q(\theta)=\mathcal{D}(\gamma_{1},\,\dots,\,\gamma_{K})\equiv \mathcal{G}\mathcal{D}(\gamma_{1},\,\dots,\,\gamma_{K-1};\,\gamma_{1}^{+},\,\dots,\,\gamma_{K-1}^{+}),\qquad (15)$$

where $$\gamma_{\ell}^{+}:=\sum_{j=\ell+1}^{K}\gamma_{j},\quad \ell=1,\,\dots,\,K-1.$$
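As a small illustration of this definition, the sketch below computes the tail sums $$\gamma_{\ell}^{+}$$, read as the sum of $$\gamma_{j}$$ over $$j>\ell$$. The function name `tail_sums` and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def tail_sums(gamma):
    """Return [gamma_1^+, ..., gamma_{K-1}^+], where gamma_l^+ = sum of gamma_j for j > l."""
    K = len(gamma)
    plus = [0.0] * (K - 1)
    running = 0.0
    # Walk backwards so each tail sum reuses the previous one.
    for l in range(K - 1, 0, -1):      # l = K-1, ..., 1 (1-based index of gamma_l^+)
        running += gamma[l]            # gamma[l] is gamma_{l+1} in 1-based notation
        plus[l - 1] = running
    return plus


gamma = [2.0, 3.0, 5.0]                # illustrative Dirichlet parameters (assumed)
gamma_plus = tail_sums(gamma)          # gamma_1^+ = 3 + 5, gamma_2^+ = 5
print(gamma_plus)
```

With these values, $$\gamma_{1}+\gamma_{1}^{+}$$ equals the total $$\sum_{j}\gamma_{j}$$, which is the normalizing constant of the Dirichlet means.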

Next we define two different parameterizations of $$\mathcal{F}$$ in Equation (14), denoted by $$\mathcal{F}_{\mathcal{D}}$$ and $$\mathcal{F}_{\mathcal{G}\mathcal{D}}$$, with $$\mathcal{F}_{\mathcal{D}}\subset \mathcal{F}_{\mathcal{G}\mathcal{D}}.$$ This is done by considering transformations of the parameters $$(\gamma,\,\gamma^{+})\to (\tilde{\gamma},\,\tilde{\gamma}^{+}).$$ To simplify the optimization we keep the same mean as the original VB distribution (7). This is reasonable because, according to our simulation study (as well as many others not reported here), this distribution is quite accurate in estimating the posterior means.

The first transformation is based on only one variable $$\delta \in \mathbb{R},$$ that is,

$$(\tilde{\gamma}_{k},\,\tilde{\gamma}_{k}^{+})=(e^{\delta}\gamma_{k},\,e^{\delta}\gamma_{k}^{+}),\quad k=1,\,\dots,\,K-1.$$

Note that this transformation implies that

$$\begin{array}{c}\tilde{\gamma}_{k+1}+\tilde{\gamma}_{k+1}^{+}=e^{\delta}\gamma_{k+1}+e^{\delta}\gamma_{k+1}^{+}=e^{\delta}(\gamma_{k+1}+\gamma_{k+1}^{+})=e^{\delta}\gamma_{k}^{+}\\ =\tilde{\gamma}_{k}^{+},\quad \forall k=1,\,\dots,\,K-1;\end{array}$$

hence, for all *δ* the resulting distribution remains a member of the following subset of the Dirichlet family:

$$\mathcal{F}_{\mathcal{D}}:=\{\mathcal{D}(e^{\delta}\gamma_{1},\,\dots,\,e^{\delta}\gamma_{K}):\delta \in \mathbb{R}\}.\qquad (16)$$
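To see what the single parameter *δ* controls, the following sketch evaluates the standard Dirichlet moment formulas for a few members of (16). It is an illustration under assumed values, not the authors' implementation: the names `dirichlet_mean_var` and `scale_common` and the vector `gamma` are hypothetical. The means stay fixed while the variances shrink as *δ* grows, since the total concentration is multiplied by $$e^{\delta}$$.

```python
import math


def dirichlet_mean_var(alpha):
    """Marginal means and variances of a Dirichlet distribution with parameters alpha."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    var = [m * (1.0 - m) / (a0 + 1.0) for m in mean]
    return mean, var


def scale_common(gamma, delta):
    """Parameters of the member of F_D indexed by delta: every gamma_k times exp(delta)."""
    return [math.exp(delta) * g for g in gamma]


gamma = [2.0, 3.0, 5.0]                # illustrative values (assumed, not from the paper)
for delta in (-1.0, 0.0, 1.0):
    mean, var = dirichlet_mean_var(scale_common(gamma, delta))
    # Means are identical across delta; variances decrease as delta increases.
    print(delta, [round(m, 3) for m in mean], [round(v, 4) for v in var])
```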

The second transformation relaxes the restriction of remaining inside the Dirichlet family. Now $$\boldsymbol{\delta}=(\delta_{1},\,\dots,\,\delta_{K-1})$$ and

$$(\tilde{\gamma}_{k},\,\tilde{\gamma}_{k}^{+})=(e^{\delta_{k}}\gamma_{k},\,e^{\delta_{k}}\gamma_{k}^{+}),\quad k=1,\,\dots,\,K-1\qquad (17)$$

resulting in the following subset of the generalized Dirichlet family:

$$\mathcal{F}_{\mathcal{G}\mathcal{D}}:=\{\mathcal{G}\mathcal{D}(e^{\delta_{1}}\gamma_{1},\,\dots,\,e^{\delta_{K-1}}\gamma_{K-1};\,e^{\delta_{1}}\gamma_{1}^{+},\,\dots,\,e^{\delta_{K-1}}\gamma_{K-1}^{+}):\delta_{k}\in \mathbb{R},\,k=1,\,\dots,\,K-1\}.\qquad (18)$$
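The difference between the two families can be checked numerically. A generalized Dirichlet $$\mathcal{G}\mathcal{D}(a_{1},\dots,a_{K-1};\,b_{1},\dots,b_{K-1})$$ is an ordinary Dirichlet when $$b_{k}=a_{k+1}+b_{k+1}$$ for $$k=1,\dots,K-2$$. The sketch below applies the componentwise scaling with a common and with a non-common *δ* and tests that condition. All names (`tail_sums`, `transform`, `is_dirichlet`) and all numbers are hypothetical, chosen for illustration only.

```python
import math


def tail_sums(gamma):
    """gamma_l^+ = sum of gamma_j for j > l, for l = 1, ..., K-1."""
    return [sum(gamma[l + 1:]) for l in range(len(gamma) - 1)]


def transform(gamma, delta):
    """Componentwise scaling: (a_k, b_k) = (exp(delta_k) gamma_k, exp(delta_k) gamma_k^+)."""
    plus = tail_sums(gamma)
    a = [math.exp(d) * g for d, g in zip(delta, gamma)]   # uses gamma_1..gamma_{K-1}
    b = [math.exp(d) * p for d, p in zip(delta, plus)]
    return a, b


def is_dirichlet(a, b, rel_tol=1e-9):
    """True if b_k == a_{k+1} + b_{k+1} for every k = 1, ..., K-2."""
    return all(
        math.isclose(b[k], a[k + 1] + b[k + 1], rel_tol=rel_tol)
        for k in range(len(a) - 1)
    )


gamma = [2.0, 3.0, 5.0, 4.0]                           # K = 4, illustrative values
print(is_dirichlet(*transform(gamma, [0.5, 0.5, 0.5])))   # common delta: stays Dirichlet
print(is_dirichlet(*transform(gamma, [0.5, -0.3, 0.1])))  # distinct deltas: leaves it
```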

Note that the number of parameters in (16) and (18) equals one and *K*–1, respectively. Moreover, for all $$f\in \mathcal{F}_{\mathcal{G}\mathcal{D}}$$ it holds that

$$\mathbb{E}\theta_{k}=\frac{\tilde{\gamma}_{k}}{\tilde{\gamma}_{k}+\tilde{\gamma}_{k}^{+}}\prod_{j=1}^{k-1}\frac{\tilde{\gamma}_{j}^{+}}{\tilde{\gamma}_{j}+\tilde{\gamma}_{j}^{+}}=\frac{e^{\delta_{k}}\gamma_{k}}{e^{\delta_{k}}\gamma_{k}+e^{\delta_{k}}\gamma_{k}^{+}}\prod_{j=1}^{k-1}\frac{e^{\delta_{j}}\gamma_{j}^{+}}{e^{\delta_{j}}\gamma_{j}+e^{\delta_{j}}\gamma_{j}^{+}}=\frac{\gamma_{k}}{\gamma_{1}+\gamma_{1}^{+}}=\frac{\gamma_{k}}{\sum_{j=1}^{K}\gamma_{j}},$$

∀*k*=1, …, *K*, while the same remains true for $$f\in \mathcal{F}_{\mathcal{D}}$$ as well, since $$\mathcal{F}_{\mathcal{D}}\subset \mathcal{F}_{\mathcal{G}\mathcal{D}}.$$ Consequently, both families (16) and (18) contain only distributions having the same means as the distribution *q*(**θ**) in (7). Finally, notice that when *K*=2 both parameterizations coincide, since in that case a generalized Dirichlet distribution reduces to a Dirichlet. To maximize (14) under parameterizations (16) or (18), the following stochastic approximation algorithm was implemented.
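Before turning to the algorithm, the mean-preservation property stated above can be checked numerically. The sketch below is not the stochastic approximation algorithm; it only evaluates the generalized Dirichlet mean, $$\mathbb{E}\theta_{k}=\frac{a_{k}}{a_{k}+b_{k}}\prod_{j<k}\frac{b_{j}}{a_{j}+b_{j}}$$ with $$\mathbb{E}\theta_{K}=\prod_{j<K}\frac{b_{j}}{a_{j}+b_{j}}$$, for an arbitrary choice of $$\boldsymbol{\delta}$$. The function name `gd_mean` and all numerical values are assumptions made for illustration.

```python
import math


def gd_mean(a, b):
    """Means of theta_1..theta_K under GD(a; b), where len(a) == len(b) == K - 1."""
    means = []
    rest = 1.0                          # running product of b_j / (a_j + b_j) for j < k
    for ak, bk in zip(a, b):
        means.append(rest * ak / (ak + bk))
        rest *= bk / (ak + bk)
    means.append(rest)                  # theta_K takes the remaining mass
    return means


gamma = [2.0, 3.0, 5.0]                 # illustrative VB parameters (assumed)
gamma_plus = [sum(gamma[l + 1:]) for l in range(len(gamma) - 1)]
delta = [0.7, -1.2]                     # arbitrary, one delta_k per k = 1..K-1

# Componentwise scaling of (gamma_k, gamma_k^+) by exp(delta_k).
a = [math.exp(d) * g for d, g in zip(delta, gamma)]
b = [math.exp(d) * p for d, p in zip(delta, gamma_plus)]

target = [g / sum(gamma) for g in gamma]
print(gd_mean(a, b), target)            # the two lists agree up to rounding error
```

Each factor $$e^{\delta_{k}}$$ cancels within its own ratio, which is why the means do not depend on $$\boldsymbol{\delta}$$ while higher moments do.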

Corresponding author: Panagiotis Papastamoulis, University of Manchester, Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK, e-mail:

Published Online: 2014-01-10

Published in Print: 2014-04-01

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 13, Issue 2, Pages 203–216, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, DOI: https://doi.org/10.1515/sagmb-2013-0054, January 2014

©2014 by Walter de Gruyter Berlin/Boston. This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY-NC-ND 4.0)