Bearing in mind that the best variational approximation targeting the joint posterior is the Dirichlet distribution in (7), an obvious choice for $$\mathcal{F}$$ would be (a subset of) the Dirichlet family of distributions. However, it will prove useful to consider an even broader family as well, namely the generalized Dirichlet family of distributions (Wong, 1998, 2010). The VB solution (7) can be expressed as a generalized Dirichlet distribution:

$$q(\theta)=\mathcal{D}(\gamma_{1},\,\dots,\,\gamma_{K})\equiv \mathcal{G}\mathcal{D}(\gamma_{1},\,\dots,\,\gamma_{K-1};\,\gamma_{1}^{+},\,\dots,\,\gamma_{K-1}^{+}),\qquad (15)$$

where $$\gamma_{\ell}^{+}:=\sum_{j=\ell+1}^{K}\gamma_{j},\quad \ell=1,\,\dots,\,K-1.$$
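The tail sums that turn a Dirichlet parameter vector into the generalized Dirichlet form of (15) can be sketched in a few lines of NumPy. This is only an illustration: the function name `tail_sums` and the numerical values of `gamma` are hypothetical and do not come from the paper.

```python
import numpy as np

def tail_sums(gamma):
    """Return gamma_plus with gamma_plus[l] = sum of gamma[j] over j > l, for l = 1..K-1."""
    gamma = np.asarray(gamma, dtype=float)
    # Reversed cumulative sum gives sums over j >= l; dropping the first
    # entry shifts these to sums over j > l and leaves K-1 values.
    return np.cumsum(gamma[::-1])[::-1][1:]

gamma = np.array([2.0, 3.0, 5.0])   # hypothetical Dirichlet parameters, K = 3
gd_first = gamma[:-1]               # gamma_1, ..., gamma_{K-1}
gd_second = tail_sums(gamma)        # gamma_1^+, ..., gamma_{K-1}^+
print(gd_first, gd_second)
```

For this example the second parameter block is (8, 5), that is, $$\gamma_{1}^{+}=\gamma_{2}+\gamma_{3}$$ and $$\gamma_{2}^{+}=\gamma_{3}.$$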

Next we define two different parameterizations of $$\mathcal{F}$$ in Equation (14), denoted by $$\mathcal{F}_{\mathcal{D}}$$ and $$\mathcal{F}_{\mathcal{G}\mathcal{D}},$$ with $$\mathcal{F}_{\mathcal{D}}\subset \mathcal{F}_{\mathcal{G}\mathcal{D}}.$$ This is done by considering transformations of the parameters $$(\gamma,\,\gamma^{+})\to (\tilde{\gamma},\,\tilde{\gamma}^{+}).$$ To simplify the optimization we keep the same mean as the original VB distribution (7). This is reasonable because, according to our simulation study (as well as many others not reported here), this distribution is quite accurate in estimating the posterior means.

The first transformation is based on only one variable $$\delta \in \mathbb{R},$$ that is,

$$(\tilde{\gamma}_{k},\,\tilde{\gamma}_{k}^{+})=(e^{\delta}\gamma_{k},\,e^{\delta}\gamma_{k}^{+}),\quad k=1,\,\dots,\,K-1.$$

Note here that this transformation implies that

$$\begin{array}{c}\tilde{\gamma}_{k+1}+\tilde{\gamma}_{k+1}^{+}=e^{\delta}\gamma_{k+1}+e^{\delta}\gamma_{k+1}^{+}=e^{\delta}(\gamma_{k+1}+\gamma_{k+1}^{+})=e^{\delta}\gamma_{k}^{+}\\ =\tilde{\gamma}_{k}^{+},\quad \forall k=1,\,\dots,\,K-1\end{array}$$

hence, for every *δ*, the resulting distribution remains in the following subset of the Dirichlet family:

$$\mathcal{F}_{\mathcal{D}}:=\{\mathcal{D}(e^{\delta}\gamma_{1},\,\dots,\,e^{\delta}\gamma_{K}):\delta\in\mathbb{R}\}.\qquad (16)$$
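The closure property behind (16) can be checked numerically. The sketch below assumes the usual characterization that a generalized Dirichlet with parameters (a; b) is an ordinary Dirichlet when each b_k equals a_{k+1} + b_{k+1} for consecutive indices. The helper names `gd_params` and `is_dirichlet`, the value of `delta`, and the parameter values are all hypothetical.

```python
import numpy as np

def gd_params(gamma):
    """Map Dirichlet parameters to the generalized Dirichlet form (a; b) of (15)."""
    gamma = np.asarray(gamma, dtype=float)
    a = gamma[:-1]                            # gamma_1, ..., gamma_{K-1}
    b = np.cumsum(gamma[::-1])[::-1][1:]      # gamma_1^+, ..., gamma_{K-1}^+
    return a, b

def is_dirichlet(a, b):
    """Check b_k = a_{k+1} + b_{k+1} for consecutive k (up to floating-point tolerance)."""
    return bool(np.allclose(b[:-1], a[1:] + b[1:]))

gamma = np.array([2.0, 3.0, 5.0, 1.5])        # hypothetical VB parameters, K = 4
a, b = gd_params(gamma)

delta = -0.7                                  # a single scalar, as in the first transformation
a_tilde, b_tilde = np.exp(delta) * a, np.exp(delta) * b
stays_dirichlet = is_dirichlet(a_tilde, b_tilde)
print(stays_dirichlet)
```

Because the common factor multiplies both sides of the condition, the check passes for any real `delta`; only the overall concentration of the distribution changes.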

The second transformation relaxes the restriction of remaining inside the Dirichlet family. Now $$\boldsymbol{\delta}=(\delta_{1},\,\dots,\,\delta_{K-1}),$$

$$(\tilde{\gamma}_{k},\,\tilde{\gamma}_{k}^{+})=(e^{\delta_{k}}\gamma_{k},\,e^{\delta_{k}}\gamma_{k}^{+}),\quad k=1,\,\dots,\,K-1\qquad (17)$$

resulting in the following subset of the generalized Dirichlet family:

$$\mathcal{F}_{\mathcal{G}\mathcal{D}}:=\{\mathcal{G}\mathcal{D}(e^{\delta_{1}}\gamma_{1},\,\dots,\,e^{\delta_{K-1}}\gamma_{K-1};\,e^{\delta_{1}}\gamma_{1}^{+},\,\dots,\,e^{\delta_{K-1}}\gamma_{K-1}^{+}):\delta_{k}\in\mathbb{R},\,k=1,\,\dots,\,K-1\}.\qquad (18)$$
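To illustrate how the component-wise scaling in (17) leaves the Dirichlet family, the following sketch applies it with unequal and with equal exponents. It again assumes the condition b_k = a_{k+1} + b_{k+1} as the test for being a Dirichlet. The names `transform` and `is_dirichlet` and all numerical values are hypothetical.

```python
import numpy as np

gamma = np.array([2.0, 3.0, 5.0, 1.5])        # hypothetical VB parameters, K = 4
a = gamma[:-1]                                # gamma_1, ..., gamma_{K-1}
b = np.cumsum(gamma[::-1])[::-1][1:]          # gamma_1^+, ..., gamma_{K-1}^+

def transform(a, b, delta):
    """Scale each pair (a_k, b_k) by exp(delta_k), following (17)."""
    scale = np.exp(np.asarray(delta, dtype=float))
    return scale * a, scale * b

def is_dirichlet(a, b):
    """Check b_k = a_{k+1} + b_{k+1} for consecutive k (up to floating-point tolerance)."""
    return bool(np.allclose(b[:-1], a[1:] + b[1:]))

a_gd, b_gd = transform(a, b, [0.3, -0.5, 0.1])   # unequal delta_k
a_d, b_d = transform(a, b, [0.3, 0.3, 0.3])      # equal delta_k
print(is_dirichlet(a_gd, b_gd), is_dirichlet(a_d, b_d))
```

With unequal exponents the condition fails, so the result is a proper generalized Dirichlet; with equal exponents the transformation coincides with the single-parameter case and the result is again a Dirichlet.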

Note that the number of parameters in (16) and (18) equals one and *K*–1, respectively. Moreover, for all $$f\in \mathcal{F}_{\mathcal{G}\mathcal{D}}$$ it holds that

$$\mathbb{E}\theta_{k}=\frac{\tilde{\gamma}_{k}}{\tilde{\gamma}_{k}+\tilde{\gamma}_{k}^{+}}\prod_{j=1}^{k-1}\frac{\tilde{\gamma}_{j}^{+}}{\tilde{\gamma}_{j}+\tilde{\gamma}_{j}^{+}}=\frac{e^{\delta_{k}}\gamma_{k}}{e^{\delta_{k}}\gamma_{k}+e^{\delta_{k}}\gamma_{k}^{+}}\prod_{j=1}^{k-1}\frac{e^{\delta_{j}}\gamma_{j}^{+}}{e^{\delta_{j}}\gamma_{j}+e^{\delta_{j}}\gamma_{j}^{+}}=\frac{\gamma_{k}}{\gamma_{1}+\gamma_{1}^{+}}=\frac{\gamma_{k}}{\sum_{j=1}^{K}\gamma_{j}},$$

∀*k*=1, …, *K*, while the same remains true for $$f\in \mathcal{F}_{\mathcal{D}}$$ as well, since $$\mathcal{F}_{\mathcal{D}}\subset \mathcal{F}_{\mathcal{G}\mathcal{D}}.$$ Consequently, both families (16) and (18) contain distributions having the same means as the distribution *q*(**θ**) in (7). Finally, notice that when *K*=2 both parameterizations coincide, since in that case a generalized Dirichlet distribution reduces to a Dirichlet. To maximize (14) under parameterizations (16) or (18), the following stochastic approximation algorithm was implemented.
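As a side check of the mean-preservation property stated above (this is not the stochastic approximation algorithm itself), the sketch below evaluates the generalized Dirichlet mean before and after a component-wise scaling. It assumes the standard mean formula of the generalized Dirichlet distribution (Wong, 1998), in which the k-th mean is a_k/(a_k+b_k) times the product of b_j/(a_j+b_j) over j < k. The function name `gd_mean` and all numerical values are hypothetical.

```python
import numpy as np

def gd_mean(a, b):
    """Mean vector (length K) of a generalized Dirichlet GD(a; b) with K-1 parameter pairs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    frac = a / (a + b)                        # a_k / (a_k + b_k)
    rest = b / (a + b)                        # b_k / (a_k + b_k)
    # Product of rest[j] over j < k; the empty product for k = 1 is 1.
    stick = np.concatenate(([1.0], np.cumprod(rest)[:-1]))
    head = frac * stick                       # means of theta_1, ..., theta_{K-1}
    return np.append(head, 1.0 - head.sum())  # theta_K takes the remaining mass

gamma = np.array([2.0, 3.0, 5.0])             # hypothetical VB parameters, K = 3
a = gamma[:-1]
b = np.cumsum(gamma[::-1])[::-1][1:]          # tail sums gamma_k^+

delta = np.array([1.2, -0.4])                 # arbitrary component-wise exponents
mean_original = gd_mean(a, b)
mean_scaled = gd_mean(np.exp(delta) * a, np.exp(delta) * b)
print(mean_original, mean_scaled)
```

Both calls return the normalized vector gamma / sum(gamma), here (0.2, 0.3, 0.5), because each factor e^{δ_k} cancels within its own ratio. The scaling therefore changes the spread of the distribution but not its mean.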
