In this section, we describe a mean-field inference algorithm to approximate the posterior distribution of the attack and defense coefficients, as well as the home coefficient, which we need to predict the outcomes of the tournament games.

Variational inference provides an alternative to Markov chain Monte Carlo (MCMC) methods as a general source of approximation methods for inference in probabilistic models (Jordan et al. 1999). Variational algorithms turn inference into a non-convex optimization problem, but they are in general computationally less demanding than MCMC methods and do not suffer from limitations involving the mixing of the Markov chains. In a general variational inference scenario, we have a set of hidden variables Φ whose posterior distribution given the observations **y** is intractable. In order to approximate the posterior $$p(\Phi|y,\mathscr{H}),$$ where $$\mathscr{H}$$ denotes the set of hyperparameters of the model, we first define a parametrized family of distributions over the hidden variables, *q*(Φ), and then fit its parameters to find a distribution that is close to the true posterior. Closeness is measured in terms of the Kullback-Leibler (KL) divergence between both distributions, *D*_{KL}(*q*||*p*). The computation of the KL divergence is intractable, but fortunately, minimizing *D*_{KL}(*q*||*p*) is equivalent to maximizing the so-called evidence lower bound (ELBO) $$\mathcal{L},$$ since

$$\begin{array}{rl}\log p(y|\mathscr{H}) &= \mathbb{E}[\log p(y,\Phi|\mathscr{H})]+H[q]+D_{KL}(q||p)\\ &\ge \mathbb{E}[\log p(y,\Phi|\mathscr{H})]+H[q]\triangleq \mathcal{L},\end{array}\qquad\text{(3)}$$

where the expectations above are taken with respect to the variational distribution *q*(Φ), and H[*q*] denotes the entropy of the distribution *q*(Φ).
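The decomposition in Eq. 3 can be checked numerically on a small example. The sketch below is only an illustration and is not part of the model in this paper: it assumes a hypothetical hidden variable with three states and a single observation, so that the evidence and the exact posterior can be computed by enumeration. All numbers are made up.

```python
import math

# Hypothetical toy model: one hidden variable with three states.
prior = [0.5, 0.3, 0.2]          # p(phi)
likelihood = [0.10, 0.40, 0.70]  # p(y | phi) for the single observed y

joint = [p * l for p, l in zip(prior, likelihood)]  # p(y, phi)
evidence = sum(joint)                               # p(y)
posterior = [j / evidence for j in joint]           # p(phi | y)


def elbo(q):
    """Return E_q[log p(y, phi)] + H[q] for a discrete distribution q."""
    expected_log_joint = sum(qi * math.log(ji) for qi, ji in zip(q, joint) if qi > 0)
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return expected_log_joint + entropy


def kl(q, p):
    """Return the KL divergence D_KL(q || p) between discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


# An arbitrary variational distribution that differs from the posterior.
q = [0.2, 0.3, 0.5]
log_evidence = math.log(evidence)

# Eq. 3: log p(y) = ELBO + KL, so this gap should vanish up to rounding.
gap = log_evidence - (elbo(q) + kl(q, posterior))
print(f"log evidence = {log_evidence:.6f}, ELBO = {elbo(q):.6f}, gap = {gap:.2e}")
```

Because the KL term is non-negative, the ELBO never exceeds the log evidence, and the bound is tight exactly when *q* equals the posterior.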

Typical variational inference methods maximize the ELBO $$\mathcal{L}$$ by coordinate ascent, iteratively optimizing each variational parameter. A closed-form expression for the corresponding updates can be easily found for conditionally conjugate variables, i.e., variables whose complete conditional is in the exponential family. We refer the reader to Ghahramani and Beal (2001) and Hoffman et al. (2013) for further details. In order to obtain a conditionally conjugate model, and following Dunson and Herring (2005), Gopalan et al. (2013, 2014) and Zhou et al. (2012), we augment the representation by defining for each game the auxiliary latent variables

$$\begin{array}{ll}z_{m,k}^{H1}\sim \text{Poisson}(\gamma\,\alpha_{h(m),k}\,\beta_{a(m),k}), & z_{m,k}^{H2}\sim \text{Poisson}(\gamma\,\eta_{\ell(h(m)),k}\,\rho_{\ell(a(m)),k}),\\ z_{m,k}^{A1}\sim \text{Poisson}(\alpha_{a(m),k}\,\beta_{h(m),k}), & z_{m,k}^{A2}\sim \text{Poisson}(\eta_{\ell(a(m)),k}\,\rho_{\ell(h(m)),k}),\end{array}\qquad\text{(4)}$$

so that the observations for the home and away scores can be, respectively, expressed as

$$y_{m}^{H}=\sum_{k=1}^{K_{1}} z_{m,k}^{H1}+\sum_{k=1}^{K_{2}} z_{m,k}^{H2},\quad\text{and}\quad y_{m}^{A}=\sum_{k=1}^{K_{1}} z_{m,k}^{A1}+\sum_{k=1}^{K_{2}} z_{m,k}^{A2},\qquad\text{(5)}$$

due to the additive property of Poisson random variables. Thus, the auxiliary variables preserve the marginal Poisson distribution of the observations. Furthermore, the complete conditional distribution over the auxiliary variables, given the observations and the rest of the latent variables, is multinomial. Using the auxiliary variables, and denoting **α**={**α**_{t}}, **β**={**β**_{t}}, **η**={**η**_{l}}, **ρ**={**ρ**_{l}} and $$z=\{z_{m,k}^{H1},\,z_{m,k}^{H2},\,z_{m,k}^{A1},\,z_{m,k}^{A2}\},$$ the joint distribution over the hidden variables can be written as

$$\begin{array}{l}p(\alpha,\beta,\eta,\rho,\gamma,z|\mathscr{H})=\prod_{t=1}^{T}\prod_{k=1}^{K_{1}} p(\alpha_{t,k}|s_{\alpha},r_{\alpha})\,p(\beta_{t,k}|s_{\beta},r_{\beta})\\ \quad\times\, p(\gamma|s_{\gamma},r_{\gamma})\prod_{l=1}^{L}\prod_{k=1}^{K_{2}} p(\eta_{l,k}|s_{\eta},r_{\eta})\,p(\rho_{l,k}|s_{\rho},r_{\rho})\\ \quad\times \prod_{m=1}^{M}\prod_{k=1}^{K_{1}} p(z_{m,k}^{H1}|\gamma,\alpha_{h(m),k},\beta_{a(m),k})\,p(z_{m,k}^{A1}|\alpha_{a(m),k},\beta_{h(m),k})\\ \quad\times \prod_{m=1}^{M}\prod_{k=1}^{K_{2}} p(z_{m,k}^{H2}|\gamma,\eta_{\ell(h(m)),k},\rho_{\ell(a(m)),k})\,p(z_{m,k}^{A2}|\eta_{\ell(a(m)),k},\rho_{\ell(h(m)),k}),\end{array}\qquad\text{(6)}$$

and the observations are generated according to Eq. 5. In mean-field inference, the posterior distribution is approximated with a completely factorized variational distribution, i.e., *q* is chosen as

$$\begin{array}{l}q(\alpha,\beta,\eta,\rho,\gamma,z)=q(\gamma)\prod_{t=1}^{T}\prod_{k=1}^{K_{1}} q(\alpha_{t,k})\,q(\beta_{t,k})\\ \quad\times \prod_{l=1}^{L}\prod_{k=1}^{K_{2}} q(\eta_{l,k})\,q(\rho_{l,k})\prod_{m=1}^{M} q(z_{m}^{H})\,q(z_{m}^{A}),\end{array}\qquad\text{(7)}$$

where $$z_{m}^{H}$$ is the vector containing the variables $$\{z_{m,k}^{H1},\,z_{m,k}^{H2}\}$$ for game *m* (and similarly for $$z_{m}^{A}$$ and $$\{z_{m,k}^{A1},\,z_{m,k}^{A2}\}$$). For conciseness, we have omitted the dependency on the variational parameters in Eq. 7. We set the variational distribution for each variable to be in the same exponential family as the corresponding complete conditional, which yields

$$\begin{array}{l}q(\gamma)=\text{Gamma}(\gamma|\gamma^{shp},\gamma^{rte}),\\ q(\alpha_{t,k})=\text{Gamma}(\alpha_{t,k}|\alpha_{t,k}^{shp},\alpha_{t,k}^{rte}),\\ q(\beta_{t,k})=\text{Gamma}(\beta_{t,k}|\beta_{t,k}^{shp},\beta_{t,k}^{rte}),\\ q(\eta_{l,k})=\text{Gamma}(\eta_{l,k}|\eta_{l,k}^{shp},\eta_{l,k}^{rte}),\\ q(\rho_{l,k})=\text{Gamma}(\rho_{l,k}|\rho_{l,k}^{shp},\rho_{l,k}^{rte}),\\ q(z_{m}^{H})=\text{Multinomial}(z_{m}^{H}|y_{m}^{H},\varphi_{m}^{H}),\\ q(z_{m}^{A})=\text{Multinomial}(z_{m}^{A}|y_{m}^{A},\varphi_{m}^{A}).\end{array}\qquad\text{(8)}$$
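The augmentation in Eqs. 4 and 5 and the multinomial factors in Eq. 8 rest on two standard facts about independent Poisson variables: their sum is Poisson with the summed rate, and, conditioned on that sum, the individual counts follow a multinomial (a binomial in the two-component case) with probabilities proportional to the rates. The sketch below checks both facts by exact enumeration for a hypothetical two-component case; the rates and the score are made-up values, not estimates from the model.

```python
import math


def poisson_pmf(n, rate):
    """Probability that a Poisson(rate) variable equals n."""
    return math.exp(-rate) * rate**n / math.factorial(n)


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


rate_1, rate_2 = 0.9, 0.4  # hypothetical component rates
y = 3                      # hypothetical observed score

# Additive property: P(z1 + z2 = y) by explicit convolution of the two pmfs,
# compared with a single Poisson whose rate is the sum of the rates.
conv = sum(poisson_pmf(k, rate_1) * poisson_pmf(y - k, rate_2) for k in range(y + 1))
direct = poisson_pmf(y, rate_1 + rate_2)

# Complete conditional: P(z1 = k | z1 + z2 = y) compared with a binomial whose
# success probability is the first component's share of the total rate.
share = rate_1 / (rate_1 + rate_2)
conditional = [
    poisson_pmf(k, rate_1) * poisson_pmf(y - k, rate_2) / conv for k in range(y + 1)
]
binomial = [binomial_pmf(k, y, share) for k in range(y + 1)]

print(f"convolution = {conv:.6f}, Poisson(sum of rates) = {direct:.6f}")
```

With more than two components the same argument gives a multinomial over all *K*_{1}+*K*_{2} auxiliary counts, which is why each score's auxiliary vector is constrained to sum to the observed score.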

Then, the set of variational parameters is composed of the shape and rate for each gamma distribution, as well as the probability vectors $${\varphi}_{m}^{H}$$ and $${\varphi}_{m}^{A}$$ for the multinomial distributions. Note that $${\varphi}_{m}^{H}$$ and $${\varphi}_{m}^{A}$$ are both (*K*_{1}+*K*_{2})-dimensional vectors. To minimize the KL divergence and obtain an approximation of the posterior, we apply a coordinate ascent algorithm (the update equations of the variational parameters are given in Appendix A).
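As an illustration of how the multinomial parameters interact with the gamma factors, the following sketch computes $${\varphi}_{m}^{H}$$ for a single game. It relies on the generic mean-field result that the optimal multinomial probabilities are proportional to the exponentiated expected log rates of the Poisson components in Eq. 4, together with the identity E[log *x*] = ψ(shape) − log(rate) for a gamma variable. It is not a transcription of the updates in Appendix A, which are not reproduced here. The helper names, the values of the variational parameters, and the choice *K*_{1}=2, *K*_{2}=1 are all hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0, via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift the argument up until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def expected_log_gamma(shape, rate):
    """E[log x] for x ~ Gamma(shape, rate)."""
    return digamma(shape) - math.log(rate)


def multinomial_probs(expected_log_rates):
    """Normalize exp(expected log rates) into a probability vector."""
    largest = max(expected_log_rates)  # subtract the max for numerical stability
    weights = [math.exp(v - largest) for v in expected_log_rates]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical (shape, rate) pairs for one game, with K1 = 2 and K2 = 1.
alpha_home = [(2.0, 1.0), (1.5, 2.0)]  # attack coefficients of the home team
beta_away = [(1.0, 1.0), (3.0, 2.0)]   # defense coefficients of the away team
eta_home_league = [(2.5, 2.0)]         # league-level attack coefficient
rho_away_league = [(1.2, 1.5)]         # league-level defense coefficient
gamma_q = (4.0, 3.0)                   # home coefficient

e_log_gamma = expected_log_gamma(*gamma_q)
team_terms = [
    expected_log_gamma(*a) + expected_log_gamma(*b)
    for a, b in zip(alpha_home, beta_away)
]
league_terms = [
    expected_log_gamma(*e) + expected_log_gamma(*r)
    for e, r in zip(eta_home_league, rho_away_league)
]

# Every home component shares the factor gamma, so it cancels on normalization.
phi_home = multinomial_probs([e_log_gamma + v for v in team_terms + league_terms])
phi_home_without_gamma = multinomial_probs(team_terms + league_terms)

print([round(p, 4) for p in phi_home])
```

The resulting vector has *K*_{1}+*K*_{2} entries that sum to one. The away-score vector $${\varphi}_{m}^{A}$$ would be obtained in the same way with the roles of the home and away teams exchanged and without the home coefficient.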