In order to express the model likelihood, we first need the distribution of the response variables given the latent variables. For each subject *i* we observe the sequence **b**_{i} = (*b*_{i1}, …, *b*_{il_i})′; we also observe **y**_{i,obs}, which corresponds to all or part of the sequence **y**_{i} = (*y*_{i1}, …, *y*_{il_i})′. In particular, if all elements of **b**_{i} are equal to 0, then **y**_{i,obs} and **y**_{i} coincide; if some elements of **b**_{i} are equal to 1 or 2, then **y**_{i,obs} is a subvector of **y**_{i}.

Based on the assumptions formulated in the previous section, the distribution of interest has the following density function:

$$f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}} \mid U_i = u) = \left[ \prod_{l=1}^{l_i} p(b_{il} \mid U_i = u) \right] \left[ \prod_{l=1:\, b_{il}=0}^{l_i} \phi(y_{il} \mid U_i = u) \right], \quad u = 1, \dots, k,$$

where *p*(*b*_{il}|*U*_{i} = *u*) is defined in (1), the second product extends over all observed elements of **y**_{i}, and *ϕ*(*y*_{il}|*U*_{i} = *u*) denotes the density of the normal distribution defined according to assumption (2). As in a standard finite mixture model, the density of the *manifest distribution* may be obtained as

$$f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}}) = \sum_{u=1}^{k} \pi_{iu}\, f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}} \mid U_i = u).$$

This is the basis for the model log-likelihood, which has expression

$$\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \log f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}}),$$

where **θ** is a vector containing all model parameters, that is, **β**_{u}, **γ**_{1u}, **γ**_{2u}, **δ**_{u}, for *u* = 1, …, *k*, and *σ*^{2}.
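The manifest density and the resulting log-likelihood above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: it assumes a balanced design in which every runner has `L` scheduled laps, that the status probabilities *p*(*b*_{il}|*U*_{i} = *u*) are supplied precomputed in `Pb`, and that the lap means *μ*_{lu} are supplied in `Mu` (all array names are illustrative).

```python
import numpy as np

def normal_pdf(y, mu, sigma2):
    # density of N(mu, sigma2) evaluated at y
    return np.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def log_likelihood(B, Y, pi, Pb, Mu, sigma2):
    """Manifest log-likelihood l(theta), balanced-design sketch.

    B  : (n, L) lap-status indicators b_il (0 = regularly completed lap)
    Y  : (n, L) lap times; entries with B != 0 are ignored
    pi : (n, k) prior class probabilities pi_iu
    Pb : (k, n, L) probabilities p(b_il | U_i = u), already evaluated
    Mu : (k, L) lap-specific normal means mu_lu
    """
    n, L = B.shape
    k = pi.shape[1]
    ll = 0.0
    for i in range(n):
        obs = B[i] == 0                      # laps entering the normal part
        manifest = 0.0
        for u in range(k):
            # f(b_i, y_{i,obs} | U_i = u): status probs times normal densities
            comp = np.prod(Pb[u, i]) * np.prod(
                normal_pdf(Y[i, obs], Mu[u, obs], sigma2))
            manifest += pi[i, u] * comp      # sum_u pi_iu f(. | U_i = u)
        ll += np.log(manifest)
    return ll
```

In practice one would accumulate log-densities rather than raw products to avoid numerical underflow with long sequences; the raw form is kept here to mirror the formulas.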

In order to maximize ℓ(**θ**) with respect to **θ**, we rely on the Expectation-Maximization (EM) algorithm (Dempster, Laird, and Rubin 1977). This algorithm has been used extensively for fitting mixture models (see McLachlan and Krishnan 1997; McLachlan and Peel 2000; Fraley and Raftery 2002) in the maximum likelihood framework.

The EM algorithm is based on alternating the following two steps until convergence in the target function:

**E-step**: it consists of computing the conditional expected value, given the observed data and the current value of the parameters, of the *complete data* log-likelihood, which is defined as follows:

$$\ell^{*}(\boldsymbol{\theta}) = \sum_{i=1}^{n} \sum_{u=1}^{k} z_{iu} \log\left[ \pi_{iu}\, f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}} \mid U_i = u) \right].$$

In the above expression, *z*_{iu} is an indicator variable equal to 1 if subject *i* belongs to cluster *u* (i.e. *U*_{i} = *u*), and to 0 otherwise.

**M-step**: the expected value resulting from the E-step is maximized with respect to *θ* and, in this way, this parameter vector is updated.

In practice, the E-step reduces to computing the (conditional) expected value of each indicator variable *z*_{iu}, denoted by $\hat{z}_{iu}$, by the following simple rule based on the current value of the parameters:

$$\hat{z}_{iu} = \frac{\pi_{iu}\, f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}} \mid U_i = u)}{f(\mathbf{b}_i, \mathbf{y}_{i,\mathrm{obs}})}.$$
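This rule is a direct application of Bayes' theorem and vectorizes naturally. A minimal sketch, assuming the component densities *f*(**b**_{i}, **y**_{i,obs}|*U*_{i} = *u*) have already been evaluated into an (n, k) array (names are illustrative):

```python
import numpy as np

def e_step(pi, comp_dens):
    """Posterior membership probabilities z_hat_iu.

    pi        : (n, k) prior class probabilities pi_iu
    comp_dens : (n, k) component densities f(b_i, y_{i,obs} | U_i = u)
    """
    num = pi * comp_dens                         # pi_iu * f(. | U_i = u)
    # dividing by the row sum divides by the manifest density f(b_i, y_{i,obs})
    return num / num.sum(axis=1, keepdims=True)
```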

Regarding the M-step, we can use explicit solutions for the parameter vectors **β**_{u} and for the common variance *σ*^{2}:

$$\begin{array}{c} \boldsymbol{\beta}_u = \left( \displaystyle\sum_{i=1}^{n} \hat{z}_{iu} \sum_{l=1:\, b_{il}=0}^{l_i} \mathbf{x}_l \mathbf{x}_l' \right)^{-1} \displaystyle\sum_{i=1}^{n} \hat{z}_{iu} \sum_{l=1:\, b_{il}=0}^{l_i} y_{il}\, \mathbf{x}_l, \quad u = 1, \dots, k, \\[2ex] \sigma^2 = \dfrac{\sum_{i=1}^{n} \sum_{u=1}^{k} \hat{z}_{iu} \sum_{l=1:\, b_{il}=0}^{l_i} (y_{il} - \mu_{lu})^2}{\sum_{i=1}^{n} o_i}, \end{array}$$

where *o*_{i} is the dimension of **y**_{i,obs}, that is, the number of regularly completed laps by runner *i*. On the other hand, updating the remaining parameters **γ**_{1u} and **γ**_{2u} in (1) requires an iterative algorithm of Newton-Raphson type. This algorithm is nonetheless simple, since the objective function being maximized has the same form as that of a standard multinomial logit model with weights, fitted by maximum likelihood. The same Newton-Raphson algorithm is also applied to update the parameters **δ**_{u} in (3), which affect the distribution of each latent variable *U*_{i} on the basis of the individual covariates. When the probabilities *π*_{iu} are assumed equal across subjects (i.e. *π*_{iu} = *π*_{u}), the maximization of the expected complete-data log-likelihood with respect to the *π*_{u} probabilities has the explicit solution

$$\pi_u = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_{iu}, \quad u = 1, \dots, k.$$
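The closed-form part of the M-step (the updates for **β**_{u}, *σ*^{2}, and, in the covariate-free case, *π*_{u}) can be sketched as follows. This is an illustrative NumPy version under the same balanced-design assumption as before, with a lap-level design matrix `X` whose rows are the **x**_{l}; the update for **β**_{u} is a ẑ-weighted least-squares fit restricted to laps with *b*_{il} = 0.

```python
import numpy as np

def m_step_closed_form(B, Y, X, z_hat):
    """Closed-form M-step updates for beta_u, sigma^2 and pi_u.

    B, Y  : (n, L) lap statuses and lap times
    X     : (L, p) lap-level design matrix (rows x_l)
    z_hat : (n, k) posterior probabilities from the E-step
    """
    n, L = B.shape
    k = z_hat.shape[1]
    p = X.shape[1]
    obs = B == 0                               # observed laps; o_i = obs[i].sum()
    beta = np.zeros((k, p))
    for u in range(k):
        A = np.zeros((p, p))
        c = np.zeros(p)
        for i in range(n):
            Xi = X[obs[i]]                     # rows x_l with b_il = 0
            A += z_hat[i, u] * Xi.T @ Xi       # sum_i z_iu sum_l x_l x_l'
            c += z_hat[i, u] * Xi.T @ Y[i, obs[i]]
        beta[u] = np.linalg.solve(A, c)        # weighted least squares
    # residual sum of squares over clusters, weighted by z_hat
    rss = 0.0
    for u in range(k):
        Mu = X @ beta[u]                       # mu_lu for each lap l
        for i in range(n):
            rss += z_hat[i, u] * np.sum((Y[i, obs[i]] - Mu[obs[i]]) ** 2)
    sigma2 = rss / obs.sum()                   # divide by sum_i o_i
    pi = z_hat.mean(axis=0)                    # covariate-free case pi_u
    return beta, sigma2, pi
```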

As for any other iterative algorithm, it is important that the EM algorithm described above is suitably initialized; this amounts to guessing starting values for the parameters in **θ**. We suggest using both a simple deterministic rule, which provides sensible values for these parameters, and a random rule, which allows us to properly explore the parameter space. For instance, under the first rule we set the starting values of the mass probabilities *π*_{iu} to 1/*k* for *u* = 1, …, *k*, which is equivalent to fixing the same size for all clusters. The corresponding random rule instead draws each *π*_{iu} from a uniform distribution between 0 and 1 and then normalizes these values.
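The two initialization rules for the mass probabilities can be sketched as (function and argument names are illustrative):

```python
import numpy as np

def init_pi(n, k, rule="deterministic", rng=None):
    """Starting values for the class probabilities pi_iu, shape (n, k).

    rule = "deterministic": pi_iu = 1/k (same size for all clusters)
    rule = "random"       : uniform draws on (0, 1), normalized per row
    """
    if rule == "deterministic":
        return np.full((n, k), 1.0 / k)
    rng = np.random.default_rng() if rng is None else rng
    raw = rng.uniform(size=(n, k))
    return raw / raw.sum(axis=1, keepdims=True)  # each row sums to 1
```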

We recall that trying different starting values for the EM algorithm is important to deal with the multimodality of the likelihood function that may arise in finite mixture models; combining deterministic and random initialization rules is an effective strategy in this regard.
