Let $y_{ij}$ denote the value of the $j$-th variable (i.e. player on the ballot) for the $i$-th observation (i.e. voter), where $i = 1, 2, \dots, N$ and $j = 1, 2, \dots, J$. Each variable is only partially observed, because $y_{ij}$ is known only if voter $i$ released their ballot. We set $y_{ij} = 1$ if the $i$-th voter voted for the $j$-th player and $y_{ij} = 0$ otherwise. We then define $\mathbf{y}_{.j}$ to be the column vector of 0's, 1's and missing values for the $j$-th player, with observed and missing parts $\mathbf{y}_{.j}^{obs}$ and $\mathbf{y}_{.j}^{mis}$, respectively. Thus $\mathbf{Y}^{obs} = (\mathbf{y}_{.1}^{obs}, \mathbf{y}_{.2}^{obs}, \dots, \mathbf{y}_{.J}^{obs})$, $\mathbf{Y}^{mis} = (\mathbf{y}_{.1}^{mis}, \mathbf{y}_{.2}^{mis}, \dots, \mathbf{y}_{.J}^{mis})$ and $\mathbf{Y} = (\mathbf{Y}^{obs}, \mathbf{Y}^{mis})$, where $\mathbf{Y}^{obs}$ is an $n_{obs} \times J$ matrix, $\mathbf{Y}^{mis}$ is an $n_{mis} \times J$ matrix, and $n_{obs} + n_{mis} = N$.
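To make the notation concrete, here is a toy sketch in Python with NumPy. The data are hypothetical, not from the study. Private ballots are stored as rows of `np.nan`. Because a ballot is either released in full or not at all, missingness is by row, and keeping only the public rows gives $\mathbf{Y}^{obs}$.

```python
import numpy as np

# Hypothetical toy data: N = 5 voters (rows), J = 4 players (columns).
# A row of np.nan marks a private (unreleased) ballot.
Y = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [np.nan] * 4,
    [1, 1, 0, 1],
    [np.nan] * 4,
])

# Missingness is per ballot, so a row is either fully observed or fully missing.
private = np.isnan(Y).all(axis=1)
Y_obs = Y[~private].astype(int)  # the n_obs x J matrix of public ballots
n_obs, J = Y_obs.shape
n_mis = int(private.sum())
assert n_obs + n_mis == Y.shape[0]  # n_obs + n_mis = N
print(n_obs, n_mis, J)  # 3 2 4
```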

Fully conditional specification (FCS; Van Buuren et al., 2006) is a multivariate imputation approach that specifies a model for the $j$-th variable conditional on the remaining $J - 1$ variables and imputes the missing values of the $j$-th variable from this model. This step is repeated for each of the $J$ variables in turn, so that every missing value is imputed.
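As a rough illustration of FCS for binary columns, the Python/NumPy sketch below works in a loop over the variables. For each incompletely observed column, it fits a logistic regression on all the other columns and redraws that column's missing entries from the fitted probabilities.

The function names `fit_logistic` and `fcs_impute` are hypothetical, and so are the choices of a Newton-Raphson fit with a ridge penalty and of starting values drawn at each column's observed rate. The sketch is not the MICE implementation. It also plugs in point estimates of the coefficients rather than additionally drawing them to reflect their uncertainty, as proper multiple imputation would.

```python
import numpy as np

def fit_logistic(X, y, ridge=1.0, n_iter=25):
    """Logistic regression fitted by Newton-Raphson with a ridge penalty
    (applied to all coefficients, including the intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p) - ridge * beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fcs_impute(Y, n_cycles=10, seed=0):
    """Chained-equations imputation of a 0/1 matrix whose missing entries are np.nan."""
    rng = np.random.default_rng(seed)
    Y = np.array(Y, dtype=float)  # work on a float copy of the input
    miss = np.isnan(Y)
    # Starting values: fill each column's gaps at its observed rate of 1's.
    for j in range(Y.shape[1]):
        k = miss[:, j].sum()
        Y[miss[:, j], j] = rng.random(k) < np.nanmean(Y[:, j])
    for _ in range(n_cycles):
        for j in range(Y.shape[1]):
            if not miss[:, j].any():
                continue
            # Model column j given the current values of the other J - 1 columns.
            X = np.column_stack([np.ones(len(Y)), np.delete(Y, j, axis=1)])
            obs = ~miss[:, j]
            beta = fit_logistic(X[obs], Y[obs, j])
            # Redraw the missing entries of column j from the fitted probabilities.
            p = 1.0 / (1.0 + np.exp(-X[~obs] @ beta))
            Y[~obs, j] = rng.random(p.size) < p
    return Y.astype(int)
```

Observed entries are never overwritten. Only cells that were `np.nan` in the input are filled and then refreshed on each cycle.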

In this setting, we assume that the public ballots and the private ballots share the same covariance structure, which we believe to be a reasonable assumption here. Further, the imputations are subject to two restrictions: (1) each ballot contains at most ten votes, and (2) the public and imputed ballots together must reproduce the total number of votes received by each player. Formally,

$$\mathbf{Y}^{mis}\mathbf{1}_{J} \le 10\,\mathbf{1}_{n_{mis}}$$

and

$$\mathbf{1}_{n_{obs}}^{\prime}\mathbf{Y}^{obs} + \mathbf{1}_{n_{mis}}^{\prime}\mathbf{Y}^{mis} = V$$

where $\mathbf{1}_{J}$, $\mathbf{1}_{n_{obs}}$ and $\mathbf{1}_{n_{mis}}$ are column vectors of 1's of length $J$, $n_{obs}$ and $n_{mis}$, respectively, the inequality holds elementwise, $V = (v_1, v_2, \dots, v_J)$, and $v_j = (\mathbf{y}_{.j}^{obs})^{\prime}\mathbf{1}_{n_{obs}} + (\mathbf{y}_{.j}^{mis})^{\prime}\mathbf{1}_{n_{mis}}$ is the total number of votes received by player $j$, for $j = 1, 2, \dots, J$. We are interested in drawing imputations from

$$P\left(\mathbf{Y}^{mis} \,\middle|\, \mathbf{Y}^{obs},\ \mathbf{Y}^{mis}\mathbf{1}_{J} \le 10\,\mathbf{1}_{n_{mis}},\ \mathbf{1}_{n_{mis}}^{\prime}\mathbf{Y}^{mis} = V - \mathbf{1}_{n_{obs}}^{\prime}\mathbf{Y}^{obs}\right)$$
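The two restrictions can be checked directly on any candidate $\mathbf{Y}^{mis}$. In the minimal sketch below, the function `satisfies_restrictions` and the toy numbers are hypothetical. The sketch also computes $V - \mathbf{1}_{n_{obs}}^{\prime}\mathbf{Y}^{obs}$, which is the number of votes each player must still receive from the private ballots.

```python
import numpy as np

def satisfies_restrictions(Y_obs, Y_mis, V, max_votes=10):
    """Return True if the candidate imputation Y_mis meets both restrictions."""
    # Restriction 1: every imputed ballot has at most max_votes votes.
    per_ballot_ok = bool(np.all(Y_mis.sum(axis=1) <= max_votes))
    # Restriction 2: public plus imputed votes reproduce each player's total.
    totals_ok = bool(np.array_equal(Y_obs.sum(axis=0) + Y_mis.sum(axis=0), V))
    return per_ballot_ok and totals_ok

# Hypothetical toy example with J = 3 players.
Y_obs = np.array([[1, 0, 1],
                  [0, 1, 1]])
Y_mis = np.array([[1, 1, 0]])
V = np.array([2, 2, 2])

print(V - Y_obs.sum(axis=0))                                 # [1 1 0]
print(satisfies_restrictions(Y_obs, Y_mis, V))               # True
print(satisfies_restrictions(Y_obs, Y_mis, V, max_votes=1))  # False
```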

These restrictions were incorporated by first generating 100,000 synthetic ballots as candidate imputations for the rows of $\mathbf{Y}^{mis}$. Synthetic ballots that violated the restriction $\mathbf{Y}^{mis}\mathbf{1}_{J} \le 10\,\mathbf{1}_{n_{mis}}$ (i.e. ballots with more than ten votes) were removed from the candidate pool. Each missing ballot was then imputed with a ballot sampled at random from the remaining candidates.

This alone, however, does not satisfy the second restriction, $\mathbf{1}_{n_{mis}}^{\prime}\mathbf{Y}^{mis} = V - \mathbf{1}_{n_{obs}}^{\prime}\mathbf{Y}^{obs}$, so an iterative algorithm was added to enforce it. The algorithm iterates through the players. For player $j$, imputed ballots are resampled from the candidate pool (all of which satisfy the first restriction) until player $j$'s total over the public and imputed ballots equals $v_j$. Resampling a whole ballot can change the totals of players already adjusted, so this pass over $j = 1, 2, \dots, J$ is repeated until $\mathbf{1}_{n_{obs}}^{\prime}\mathbf{Y}^{obs} + \mathbf{1}_{n_{mis}}^{\prime}\mathbf{Y}^{mis} = V$. The resulting imputation contains ballots with ten or fewer votes, reproduces each player's vote total, and maintains the covariance structure of the observed public ballots.
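The text does not specify exactly which imputed ballot is resampled at each step. The Python sketch below therefore fills in that detail with assumptions of its own.

- The candidate pool is passed in as an array, for example draws from the FCS model; generating it is not shown.
- To move player $j$'s total by one vote, the sketch picks a random imputed ballot whose vote for player $j$ has the wrong value.
- It replaces that ballot with a pool candidate that has the needed vote, preferring one that agrees with the replaced ballot on every other player.

The function `constrained_impute` and its parameters are hypothetical.

```python
import numpy as np

def constrained_impute(Y_obs, pool, n_mis, V, max_votes=10, max_sweeps=100, seed=0):
    """Sketch: impute n_mis private ballots (rows) from a pool of candidate
    ballots so every ballot has <= max_votes votes and the column totals of
    public plus imputed ballots equal V. Returns None if sweeps run out."""
    rng = np.random.default_rng(seed)
    # Restriction 1: discard candidates with more than max_votes votes.
    pool = pool[pool.sum(axis=1) <= max_votes]
    if len(pool) == 0:
        raise ValueError("no candidate ballot satisfies the vote limit")
    target = V - Y_obs.sum(axis=0)  # votes each player still needs
    if np.any(target < 0) or np.any(target > n_mis):
        raise ValueError("vote totals cannot be met by n_mis ballots")
    # Initial imputation: one randomly sampled candidate per private ballot.
    Y_mis = pool[rng.integers(len(pool), size=n_mis)].copy()
    for _ in range(max_sweeps):
        for j in range(Y_mis.shape[1]):
            # Restriction 2 for player j: resample until its total matches.
            while (gap := target[j] - Y_mis[:, j].sum()) != 0:
                need = 1 if gap > 0 else 0  # value y_ij must take to close the gap
                i = rng.choice(np.flatnonzero(Y_mis[:, j] != need))
                wanted = Y_mis[i].copy()
                wanted[j] = need
                # Assumption of this sketch: prefer a candidate that differs
                # from ballot i only at player j; otherwise take any candidate
                # whose vote for player j equals `need`.
                exact = np.flatnonzero((pool == wanted).all(axis=1))
                loose = np.flatnonzero(pool[:, j] == need)
                choices = exact if exact.size else loose
                if choices.size == 0:
                    raise ValueError(f"no candidate can adjust player {j}")
                Y_mis[i] = pool[rng.choice(choices)]
        if np.array_equal(Y_mis.sum(axis=0), target):
            return Y_mis  # both restrictions hold
    return None
```

Each replacement moves player $j$'s total exactly one vote toward its target, so the inner loop always terminates. When every single-vote change needed is available in the pool, only column $j$ changes and one sweep is enough. When the fallback to any candidate is used, which is closer to plain random resampling, earlier players can drift and more sweeps may be needed. A return value of `None` signals that `max_sweeps` ran out.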

The imputation algorithm described above was implemented in R (R Development Core Team, 2007) using the function “mice” from the package MICE (Van Buuren and Oudshoorn, 2007). By default, the “mice” function imputes binary data using logistic regression, and this default was used to impute the unobserved voting data in this study.
