The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.


A Universal Approximate Cross-Validation Criterion for Regular Risk Functions

Daniel Commenges
  • Corresponding author
  • INSERM, ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Bordeaux F-33000, France
  • ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Universite de Bordeaux, Bordeaux F-33000, France
/ Cécile Proust-Lima
  • INSERM, ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Bordeaux F-33000, France
  • ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Universite de Bordeaux, Bordeaux F-33000, France
/ Cécilia Samieri
  • INSERM, ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Bordeaux F-33000, France
  • ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Universite de Bordeaux, Bordeaux F-33000, France
/ Benoit Liquet
  • INSERM, ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Bordeaux F-33000, France
  • ISPED, Centre INSERM U-897-Epidemiologie-Biostatistique, Universite de Bordeaux, Bordeaux F-33000, France
  • School of Mathematics and Physics, The University of Queensland, St Lucia, Brisbane, Queensland 4066, Australia
Published Online: 2015-04-03 | DOI: https://doi.org/10.1515/ijb-2015-0004


Selection of estimators is an essential task in modeling. A general framework is that the estimators of a distribution are obtained by minimizing a function (the estimating function) and assessed using another function (the assessment function). A classical case is that both functions estimate an information risk (specifically cross-entropy); this corresponds to using maximum likelihood estimators and assessing them by the Akaike information criterion (AIC). In more general cases, the assessment risk can be estimated by leave-one-out cross-validation. Since leave-one-out cross-validation is computationally very demanding, we propose in this paper a universal approximate cross-validation criterion under regularity conditions (UACVR). This criterion can be adapted to different types of estimators, including penalized likelihood and maximum a posteriori estimators, and also to different assessment risk functions, including information risk functions and the continuous ranked probability score (CRPS). UACVR reduces to the Takeuchi information criterion (TIC) when cross-entropy is the risk for both estimation and assessment. We provide the asymptotic distributions of UACVR and of a difference of UACVR values for two estimators. We validate UACVR using simulations and provide an illustration on real data, both in the psychometric context, where estimators of the distributions of ordered categorical data derived from threshold models are compared with those derived from models based on continuous approximations.

Keywords: AIC; cross-entropy; cross-validation; estimator choice; Kullback–Leibler risk; model selection; ordered categorical observations; psychometric tests

1 Introduction

Selecting estimators is an essential step in modeling, and Akaike information criterion (AIC) [1] has been widely used for this purpose. AIC allows selecting maximum likelihood estimators (MLE) based on parametric models that are not too badly specified. More general criteria have been developed, in particular the Takeuchi information criterion (TIC) [2] and the general information criterion (GIC) [3]. A related criterion in the field of neural networks is the network information criterion (NIC) [4]. Two other well-known criteria are the Bayesian information criterion (BIC) and the deviance information criterion (DIC); both use Bayesian arguments and are not directly related to the present paper. A good reference book for information criteria is by Konishi and Kitagawa [5].

Likelihood cross-validation (LCV) has also been widely used for comparing parametric models. Stone [6] heuristically established that LCV was asymptotically identical to AIC. LCV, however, is more flexible in that it can be applied to other estimators than MLEs, for instance, to penalized likelihood estimators: see Golub et al. [7] and Wahba [8].

Cross-validation can also be applied to other assessment risks than Kullback–Leibler risk. The leave-one-out cross-validation is the most natural and one of the most efficient [9, 10], but it is also the most computationally demanding so that approximation formulas have been derived. Approximate cross-validation formulas have been developed for penalized splines [11, 12] or penalized likelihood [13, 14]. Commenges et al. [15] derived an approximate cross-validation criterion in the context of prognosis.

In the present paper we consider the following general framework: estimators of the true density function are defined as minimizing an estimating function; the estimating function itself can be viewed as an estimator of a risk, which we call an “estimating risk.” Typically there is a model, that is a family of densities for the variable $Y$, $(g_\theta)_{\theta\in\Theta}$, $\Theta\subset\mathbb{R}^p$, and the estimator is chosen by minimizing the estimating function. The estimators of the true density are then assessed using an “assessment risk,” which allows choosing between different available estimators. The most conventional case is when the estimating risk is $E[-\log g_\theta(Y)]$, which is estimated by minus the normalized log-likelihood, and the assessment risk of the obtained estimator $g_{\hat\theta}$ is $E[-\log g_{\hat\theta}(Y)]$, which can be estimated by cross-validation or, in the parametric case, by the normalized AIC: $\mathrm{AIC}/(2n)$. These information risks are very appealing but there are cases where other risks are relevant. As an example, the MLE could be assessed by the continuous ranked probability score (CRPS) [16]: this is detailed in Section 4.4. Another example is the estimation of the distribution of ordinal data through an approximation using models for continuous data. Models for ordinal variables that can take a large number of values are rather cumbersome; it is convenient to treat these data as continuous, using an estimating risk adapted to continuous data. However, if we wish to compare the obtained estimator to that obtained by a model for ordinal data, the assessment risk must still take into account that the data are really ordinal. Such an assessment risk can be estimated by cross-validation; cross-validation has good properties but is computationally very demanding. The main aim of this paper is to find an approximation of leave-one-out cross-validation that is valid for any estimating and assessment risks satisfying regularity conditions, which will be detailed. This will be applied to the ordinal data example.

Section 2 presents the framework, the cross-validation criterion and its approximation. It is universal in the sense that it can be applied to any estimating and assessment risks satisfying regularity conditions. We denote the approximate criterion by UACVR (U for universal, A for approximate, CV for cross-validation and R for regularity). In Section 3 the asymptotic distributions of UACVR and of a difference of two UACVR values are given. Section 4 shows how UACVR specializes to particular cases: TIC appears as a special case when cross-entropy is used for defining both estimating and assessment risks, and AIC follows if the models are close to being well specified; other important cases, where estimating and assessment risks are defined in a less symmetric way, are also given. Section 5 presents a simulation study. Section 6 presents an illustration of the use of UACVR for comparing estimators derived from threshold models and estimators obtained by continuous approximations in the case of ordered categorical data with repeated measurements; these data are psychometric scores from a large study on cognitive aging. Section 7 concludes.

2 The universal cross-validation criterion and its approximation

2.1 The estimating risk and its estimation by an estimating function

Suppose that a sample of independent and identically distributed (i.i.d.) variables $O_n=(Y_i, i=1,\dots,n)$ is available. Based on $O_n$, an estimator $g_{\hat\theta}$ (where $\hat\theta$ is short for $\hat\theta_n$) of the probability density function $f$ of the true distribution can be chosen in a model, that is a family of distributions $(g_\theta)_{\theta\in\Theta}$, $\Theta\subset\mathbb{R}^p$. The main rules for designing estimators of $\theta$ can be thought of as minimizing an estimating risk. The estimating risk $\Phi(\theta)$ is defined as the expectation under the true distribution of a loss function $\phi(\theta,Y_i)$: $\Phi(\theta)=E\{\phi(\theta,Y_i)\}$. We would like to choose $g_{\theta_0}$ where $\theta_0=\arg\min_\theta\Phi(\theta)$. For making consistent estimation possible, it is natural to require that whenever the model is well specified, the risk is minimized by the true distribution. Precisely, saying that the model is well specified amounts to saying that there is a value $\theta^*$ such that $g_{\theta^*}=f$. Then we require that $\theta^*=\arg\min_\theta\Phi(\theta)$; moreover we will require that this minimum is unique. This is related to the concept of strictly proper scores [16]. In the scoring rule literature, the problem is formulated in terms of reward rather than loss; it is possible to establish a correspondence between the two theories by considering that minus a loss is a reward, and of course while one tries to minimize the expected loss, one tries to maximize the expected reward.

We cannot compute the estimating risk, but a natural estimator of the estimating risk is the estimating function $\Phi_{O_n}(\theta)=n^{-1}\sum_{i=1}^n\phi(\theta,Y_i)$. The estimator $\hat\theta$ defined as minimizing $\Phi_{O_n}(\theta)$ is called an M-estimator. By the law of large numbers, $\Phi_{O_n}(\theta)$ converges in probability toward $\Phi(\theta)=E\{\phi(\theta,Y_i)\}$. Under some conditions given in Van der Vaart [17] (see, e.g. Theorem 5.7), $\hat\theta$ converges in probability toward $\theta_0$. A simple set of sufficient conditions is that $\Theta$ is compact, $\Phi(\theta)$ is continuous and has a unique minimizer, and $\phi(\theta,y)$ is continuous for every $y$.

Example 1: If we take as loss function $\phi(\theta,Y_i)=[Y_i-E_{g_\theta}(Y_i)]^2$, the estimating risk is $\Phi(\theta)=E[Y_i-E_{g_\theta}(Y_i)]^2$; the estimating function is $\Phi_{O_n}(\theta)=n^{-1}\sum_{i=1}^n[Y_i-E_{g_\theta}(Y_i)]^2$ and $\hat\theta$ is the least-squares estimator.

Example 2: If we take as loss function $\phi(\theta,Y_i)=-\log g_\theta(Y_i)$, the estimating risk is $\Phi(\theta)=E[-\log g_\theta(Y_i)]$, which is the cross-entropy of $g_\theta$ with respect to $f$; the estimating function is $\Phi_{O_n}(\theta)=-n^{-1}\sum_{i=1}^n\log g_\theta(Y_i)$ and $\hat\theta$ is the MLE.
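As a hedged illustration of Examples 1 and 2, the sketch below uses a hypothetical one-parameter model, $g_\theta=N(\theta,1)$, and minimizes each estimating function numerically. The simulated data, the helper `golden_min` and the search interval are assumptions made for this example only. In this particular model both M-estimators coincide with the sample mean, which gives a simple check.

```python
import math
import random

def golden_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

rng = random.Random(0)
y = [rng.gauss(1.0, 1.0) for _ in range(200)]  # hypothetical sample

def squared_loss(theta, yi):
    # Example 1: E_{g_theta}(Y) = theta in the N(theta, 1) model
    return (yi - theta) ** 2

def neg_log_density(theta, yi):
    # Example 2: -log g_theta(yi) for the N(theta, 1) density
    return 0.5 * math.log(2 * math.pi) + 0.5 * (yi - theta) ** 2

def estimating_function(loss, theta):
    # Phi_{O_n}(theta) = n^{-1} sum_i phi(theta, Y_i)
    return sum(loss(theta, yi) for yi in y) / len(y)

theta_ls = golden_min(lambda t: estimating_function(squared_loss, t), -10, 10)
theta_ml = golden_min(lambda t: estimating_function(neg_log_density, t), -10, 10)
sample_mean = sum(y) / len(y)
```

Here `theta_ls` and `theta_ml` should both agree with `sample_mean` up to the tolerance of the search; in richer models the two losses would generally give different estimators.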

2.2 The assessment risk and its estimation by cross-validation

When several estimators are available, we wish to assess their performance by estimating an assessment risk. Estimators with small assessment risks will be preferred. For constructing the risk of an estimator $g_{\hat\theta}$ we may use a loss function $\psi(g_{\hat\theta},Y)$. The assessment risk is the expectation under $f$ of $\psi(g_{\hat\theta},Y)$, where both $Y$ and $g_{\hat\theta}$ are random:
$$\Psi(g_{\hat\theta})=E\{\psi(g_{\hat\theta},Y)\}. \tag{1}$$
The problem is to estimate the assessment risk (without knowing the true density $f$). A natural, albeit naive, estimator is
$$\Psi_{O_n}(g_{\hat\theta})=n^{-1}\sum_{i=1}^n\psi(g_{\hat\theta},Y_i). \tag{2}$$
However, $\Psi_{O_n}(g_{\hat\theta})$ is not completely satisfying because it does not take into account that $g_{\hat\theta}$ depends on the observations; as a result $\Psi_{O_n}(g_{\hat\theta})$ underestimates $\Psi(g_{\hat\theta})$ (the well-known overoptimism bias).

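If another sample $O'_n=(Y'_i, i=1,\dots,n)$, independent of $O_n$ and identically distributed, were available, a natural estimator of the assessment risk would be $\Psi_{O'_n}(g_{\hat\theta})=n^{-1}\sum_{i=1}^n\psi(g_{\hat\theta},Y'_i)$. We call $\Psi_{O'_n}(g_{\hat\theta})$ the “oracle estimator.” This is an unbiased estimator of the assessment risk but cannot be computed based on $O_n$. Its variance is $\mathrm{var}\,\Psi_{O'_n}(g_{\hat\theta})=n^{-1}\mathrm{var}\{\psi(g_{\hat\theta},Y'_i)\mid\hat\theta\}$, which tends toward $n^{-1}\kappa^2$, where $\kappa^2$ is the variance of $\psi(g_{\theta_0},Y_i)$.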

A pseudo-oracle estimator of the assessment risk is often used by practitioners who split their original sample into a training and a validation sample. However, this practice leads to a loss of efficiency since only half of the data is used for computing the estimator $g_{\hat\theta}$ and half of the data also for estimating its assessment risk. Cross-validation estimators of the assessment risk make a more efficient use of the information. In particular the leave-one-out cross-validation criterion is
$$\mathrm{CV}(g_{\hat\theta})=n^{-1}\sum_{i=1}^n\psi(g_{\hat\theta_{-i}},Y_i),$$
where $\hat\theta_{-i}=\arg\min_\theta\Phi_{O_n|i}(\theta)$ and $\Phi_{O_n|i}(\theta)=\frac{1}{n-1}\sum_{j\ne i}\phi(\theta,Y_j)$. $\mathrm{CV}(g_{\hat\theta})$ does nearly as well as if another sample $O'_n$ were available, in terms of both bias and variance. Indeed it can immediately be seen that $E\{\mathrm{CV}(g_{\hat\theta})\}=\Psi(g_{\hat\theta_{n-1}})$. We shall see in Section 3 that the asymptotic variance of the approximate cross-validation criterion $\mathrm{UACVR}(g_{\hat\theta})$ is precisely $n^{-1}\kappa^2$, the same as that of the oracle estimator.
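The overoptimism of the naive estimator and the leave-one-out remedy can be seen in a deliberately simple setting. The sketch below assumes a squared-error loss for both estimation and assessment and a model with a single mean parameter, so that each leave-one-out fit is just the mean of the remaining observations; the simulated data are hypothetical.

```python
import random

rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical sample
n = len(y)

def psi(theta, yi):
    # assessment loss: squared error of the fitted mean
    return (yi - theta) ** 2

theta_hat = sum(y) / n  # minimizer of the estimating function on the full sample

# naive estimator: the same data are used to fit and to assess
naive = sum(psi(theta_hat, yi) for yi in y) / n

# leave-one-out: refit without observation i, then assess on observation i
cv = 0.0
for yi in y:
    theta_minus_i = (sum(y) - yi) / (n - 1)
    cv += psi(theta_minus_i, yi)
cv /= n
```

In this special case `cv` equals `naive * (n / (n - 1)) ** 2`, so the naive estimate is always the smaller of the two. For models without a closed-form refit, the loop would require n separate optimizations, which is the cost the approximation of Section 2.3 is meant to avoid.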

For comparing two estimators, the difference of assessment risks is relevant. This can be estimated by the difference of cross-validation estimates of the assessment risks.

2.3 The universal approximate cross-validation criterion

The leave-one-out cross-validation criterion may be computationally demanding since it is necessary to run the optimization algorithm $n$ times to find the $\hat\theta_{-i}$, $i=1,\dots,n$. For this reason an approximate formula is very useful. In this section we propose a universal approximation of the cross-validation criterion (UACVR) for regular loss functions $\phi$ and $\psi$.

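Definition 1 (Universal approximation of the cross-validation)
$$\mathrm{UACVR}(g_{\hat\theta})=\Psi_{O_n}(g_{\hat\theta})+\mathrm{Trace}(H_{\Phi_{O_n}}^{-1}K), \tag{3}$$
where $H_{\Phi_{O_n}}=\left.\frac{\partial^2\Phi_{O_n}}{\partial\theta^2}\right|_{\hat\theta}$ and $K=n^{-1}\sum_{i=1}^n\hat v_i\hat d_i^T$, with
$$\hat v_i=\left.\frac{\partial\psi(g_\theta,Y_i)}{\partial\theta}\right|_{\hat\theta}\quad\text{and}\quad\hat d_i=\frac{1}{n-1}\left.\frac{\partial\phi(\theta,Y_i)}{\partial\theta}\right|_{\hat\theta}.$$

The leading term in eq. (3) is the naive estimator of $\Psi(g_{\hat\theta})$ defined in eq. (2), while the second term is a correction accounting for parameter estimation. This correction term involves $H_{\Phi_{O_n}}$, the Hessian of the estimating function, and $\hat v_i$ and $\hat d_i$, which are the gradients of the individual assessment and estimating losses (up to the multiplicative constant $1/(n-1)$ for the latter).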

Under regularity assumptions on $\phi(\cdot,\cdot)$ and $\psi(\cdot,\cdot)$, the leave-one-out cross-validation criterion differs from UACVR by an asymptotically negligible term in $o_p(n^{-1})$, which makes UACVR a good approximation for relatively large $n$, precisely when leave-one-out cross-validation becomes computationally too demanding. The regularity conditions are detailed in the Appendix and are essentially: A1: $\Phi(\theta)$ has a unique minimizer; A2: $\phi(\theta,y)$ is thrice differentiable; A3: $\psi(\theta,y)$ is twice differentiable.

Theorem 1 Under assumptions A1, A2, A3, we have
$$\mathrm{CV}(g_{\hat\theta})=\Psi_{O_n}(g_{\hat\theta})+\mathrm{Trace}(H_{\Phi_{O_n}}^{-1}K)+o_p(n^{-1}). \tag{4}$$

UACVR applies only to regular parametric problems. Thus it does not apply to non- or semi-parametric estimators and more generally to singular problems as treated by Watanabe [18]. Also, some assessment functions do not satisfy the regularity assumptions: for instance, a non-parametric estimator of the area under the ROC curve can be used for assessing the discriminating ability of an estimator, and this is not continuous in the parameter $\theta$. Nevertheless, UACVR may be useful in various important contexts as detailed in Section 4, including penalized likelihood estimators approximated on a spline basis, which is a way to avoid strong parametric assumptions.
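To make eq. (3) and eq. (4) concrete, here is a minimal sketch for a scalar parameter, under assumptions that are ours rather than the paper's: the model is $N(\theta,1)$, the estimating loss is minus the log-density, and the assessment loss is the CRPS, for which a closed form exists in the normal case. Derivatives are taken by finite differences, and the exact leave-one-out criterion is available because the MLE without observation $i$ is the mean of the other observations. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def phi(theta, y):
    # estimating loss: -log density of N(theta, 1)
    return 0.5 * math.log(2 * math.pi) + 0.5 * (y - theta) ** 2

def psi(theta, y):
    # assessment loss: closed-form CRPS of N(theta, 1) at y
    z = y - theta
    return z * (2 * STD_NORMAL.cdf(z) - 1) + 2 * STD_NORMAL.pdf(z) - 1 / math.sqrt(math.pi)

def d1(f, theta, y, h=1e-5):
    # central finite difference for the derivative with respect to theta
    return (f(theta + h, y) - f(theta - h, y)) / (2 * h)

rng = random.Random(2)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical sample
n = len(ys)

theta_hat = sum(ys) / n  # MLE of theta in this model

def Phi_On(theta):
    # estimating function: average estimating loss over the sample
    return sum(phi(theta, y) for y in ys) / n

# Hessian of the estimating function by a second-order finite difference
h2 = 1e-4
H = (Phi_On(theta_hat + h2) - 2 * Phi_On(theta_hat) + Phi_On(theta_hat - h2)) / h2 ** 2

naive = sum(psi(theta_hat, y) for y in ys) / n
K = sum(d1(psi, theta_hat, y) * d1(phi, theta_hat, y) / (n - 1) for y in ys) / n
uacvr = naive + K / H  # Trace(H^{-1} K) reduces to K / H for a scalar parameter

# exact leave-one-out: the MLE without observation i is the mean of the others
cv = sum(psi((sum(ys) - y) / (n - 1), y) for y in ys) / n
```

With n = 200 the gap between `uacvr` and `cv` should be far smaller than the correction term itself, in line with the $o_p(n^{-1})$ remainder. For a vector parameter, `H` and `K` become matrices and the division is replaced by the trace of a matrix product.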

3 Asymptotic distribution and tracking interval

3.1 Asymptotic distribution of UACVR

Commenges et al. [19], using results of Vuong [20], studied the asymptotic distribution of a difference of normalized AICs as an estimator of a difference of Kullback–Leibler risks: the normalized AIC is defined as $\frac{1}{2n}\mathrm{AIC}$. Here similar arguments are applied to study the asymptotic distribution of UACVR and of a difference of two UACVR values. By the continuous mapping theorem, the asymptotic distribution of $\mathrm{UACVR}(g_{\hat\theta})$ is the same as that of $\Psi_{O_n}(g_{\theta_0})$. Since the latter quantity is a mean, it immediately follows by the central limit theorem that
$$n^{1/2}\{\mathrm{UACVR}(g_{\hat\theta})-\Psi(g_{\theta_0})\}\xrightarrow{D}N(0,\kappa^2), \tag{5}$$
where $\kappa^2=\mathrm{var}\,\psi(g_{\theta_0},Y)$ and var stands for the variance under the true distribution. We can also write:
$$n^{1/2}\{\mathrm{UACVR}(g_{\hat\theta})-\Psi(g_{\hat\theta})\}\xrightarrow{D}N(0,\kappa^2), \tag{6}$$
and $\kappa^2$ can be estimated by the empirical variance of $\psi(g_{\hat\theta},Y_i)$, $i=1,\dots,n$.

3.2 Asymptotic distribution of a difference between UACVR values of two estimators

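If two estimators $g_{\hat\theta}$ and $h_{\hat\gamma}$ are available, we would like to know which is the better according to the chosen assessment risk. Thus, we have to estimate the difference of their assessment risks: $\Delta_\psi(g_{\hat\theta},h_{\hat\gamma})=\Psi(g_{\hat\theta})-\Psi(h_{\hat\gamma})$. The obvious estimator is: $\mathrm{DUACVR}(g_{\hat\theta},h_{\hat\gamma})=\mathrm{UACVR}(g_{\hat\theta})-\mathrm{UACVR}(h_{\hat\gamma})$. We focus on the case where $g_{\theta_0}\ne h_{\gamma_0}$. We obtain in that case, using the same arguments as above:
$$n^{1/2}\{\mathrm{DUACVR}(g_{\hat\theta_n},h_{\hat\gamma_n})-\Delta_\psi(g_{\hat\theta_n},h_{\hat\gamma_n})\}\xrightarrow{D}N(0,\omega^2), \tag{7}$$
where $\omega^2=\mathrm{var}\{\psi(g_{\theta_0},Y)-\psi(h_{\gamma_0},Y)\}$, and this can be estimated by the empirical variance of $\psi(g_{\hat\theta},Y_i)-\psi(h_{\hat\gamma},Y_i)$.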

Based on the same type of results, Commenges et al. [19] proposed to construct a “tracking interval” for a difference of normalized AIC values. The tracking interval is a kind of confidence interval for the difference of risks. Because the variability of estimators of a difference of risks is rather large in general, it is useful to have an interval estimate rather than just a point estimate. However, in the conventional theory of point and interval estimation, the target parameter is fixed; here, it changes with $n$. Thus, we have a moving target: hence the name of tracking interval. Some simulations in Commenges et al. [19] showed that the variance of the difference of AIC was correctly estimated and the corresponding tracking interval had good coverage properties. The same idea can be applied in the more general case treated here. The tracking interval is given by $(A_n,B_n)$, where $A_n=\mathrm{DUACVR}(g_{\hat\theta_n},h_{\hat\gamma_n})-z_{\alpha/2}n^{-1/2}\hat\omega_n$ and $B_n=\mathrm{DUACVR}(g_{\hat\theta_n},h_{\hat\gamma_n})+z_{\alpha/2}n^{-1/2}\hat\omega_n$, where $z_u$ is the $u$th quantile of the standard normal variable.
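A tracking interval is straightforward to compute once the per-observation assessment losses of the two fitted estimators are available. The sketch below is a minimal illustration with hypothetical loss values; the function name `tracking_interval` is ours, the critical value is assumed to be the positive normal quantile of order $1-\alpha/2$ (so that $A_n<B_n$), and the sample standard deviation is used as one possible reading of "empirical variance".

```python
import math
from statistics import NormalDist, mean, stdev

def tracking_interval(duacvr, losses_g, losses_h, alpha=0.05):
    """Interval (A_n, B_n) around a difference of UACVR values."""
    diffs = [a - b for a, b in zip(losses_g, losses_h)]
    n = len(diffs)
    omega_hat = stdev(diffs)  # empirical s.d. of psi(g, Y_i) - psi(h, Y_i)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # assumed: positive critical value
    half_width = z * omega_hat / math.sqrt(n)
    return duacvr - half_width, duacvr + half_width

# hypothetical per-observation assessment losses for two fitted estimators
losses_g = [1.0, 2.0, 3.0, 4.0]
losses_h = [0.5, 1.0, 2.5, 3.0]
# stand-in for DUACVR: difference of naive terms, correction terms omitted
d = mean(losses_g) - mean(losses_h)
A_n, B_n = tracking_interval(d, losses_g, losses_h)
```

In a real application `d` would be the difference of the two UACVR values, including their correction terms; only the half-width depends on the paired losses.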

Note that $\omega$ is in general much lower than $\kappa$. This has been shown by Commenges et al. [13] for the expected cross-entropy assessment risk and comes from the fact that $\psi(g_{\hat\theta},Y_i)$ and $\psi(h_{\hat\gamma},Y_i)$ are often positively correlated.

4 Particular cases of UACVR

In this section we give seven frameworks in which UACVR applies (a non-exhaustive list).

4.1 MLEs and information assessment risk: TIC and AIC

Suppose we take: $\phi(\theta,Y_i)=\psi(g_\theta,Y_i)=-\log g_\theta(Y_i)$. Then, the estimating function is minus the normalized log-likelihood. It estimates the estimating risk, here the cross-entropy [21] of $g_\theta$ with respect to the true density $f$: $E\{-\log g_\theta(Y)\}=H(f)+\mathrm{KL}(g_\theta;f)$, where $H(f)=E\{-\log f(Y)\}$ is the entropy of $f$ and $\mathrm{KL}(g_\theta;f)=E\left\{\log\frac{f(Y)}{g_\theta(Y)}\right\}$ the Kullback–Leibler divergence of $g_\theta$ relative to $f$. The assessment risk is here the expected cross-entropy:
$$\mathrm{ECE}(g_{\hat\theta})=E[E\{-\log g_{\hat\theta}(Y)\mid O_n\}]=H(f)+\mathrm{EKL}(g_{\hat\theta};f), \tag{8}$$
where $\mathrm{EKL}(g_{\hat\theta};f)=E\left\{\log\frac{f(Y)}{g_{\hat\theta}(Y)}\right\}$ is the expected Kullback–Leibler risk. It differs from the conventional Kullback–Leibler risk defined for a fixed density because it is applied here to an estimator: it was mentioned by Hall [22] under the name of “expected Kullback–Leibler loss.” So, although the loss functions for estimation and assessment are the same, there is an asymmetry in that the estimating risk is a cross-entropy while, because $g_{\hat\theta}$ is random, the assessment risk is an expected cross-entropy.

In that case the leading term of eq. (3) is minus the maximized (normalized) log-likelihood. We also have that $\hat v_i$ is minus the individual score and $\hat d_i=\frac{1}{n-1}\hat v_i$, so that UACVR is identical to a normalized version of TIC [5]. If the model is well specified, $nK$ tends in probability toward $I(\theta_0)$. The Hessian $H_{\Phi_{O_n}}$ also tends toward $I(\theta_0)$, so that $n$ times the correction term tends toward $p$, the number of parameters. Thus, if the model is not too badly specified, TIC is approximately equal to AIC. We have $\mathrm{UACVR}=\frac{1}{2n}\mathrm{TIC}\approx\frac{1}{2n}\mathrm{AIC}$, and this estimates the expected cross-entropy of the estimator, $\mathrm{ECE}(g_{\hat\theta})$. In practice, Burnham and Anderson [23] do not recommend the use of TIC if $n$ is small because of the variability of the correction term. On the other hand, Konishi and Kitagawa [5] show (see their Table 3.3) that the correction terms can be rather different when the models are misspecified.
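The reduction to TIC can be written out explicitly. The following is a sketch that only uses the definitions of eq. (3); the symbols $\ell(\hat\theta)=\sum_{i=1}^n\log g_{\hat\theta}(Y_i)$, $s_i$, $\hat I$ and $\hat J$ are notation introduced here for convenience, and the form of TIC used in the last line is the usual one, $-2\ell(\hat\theta)+2\,\mathrm{Trace}(\hat I^{-1}\hat J)$.

```latex
\begin{align*}
s_i &= \left.\frac{\partial \log g_\theta(Y_i)}{\partial\theta}\right|_{\hat\theta}, \qquad
\hat v_i = -s_i, \qquad \hat d_i = -\frac{s_i}{n-1},\\
K &= \frac{1}{n}\sum_{i=1}^n \hat v_i \hat d_i^{T}
   = \frac{1}{n-1}\,\hat J, \qquad \hat J = \frac{1}{n}\sum_{i=1}^n s_i s_i^{T},\\
H_{\Phi_{O_n}} &= -\frac{1}{n}\sum_{i=1}^n
   \left.\frac{\partial^2 \log g_\theta(Y_i)}{\partial\theta\,\partial\theta^{T}}\right|_{\hat\theta} = \hat I,\\
\mathrm{UACVR}(g_{\hat\theta}) &= -\frac{\ell(\hat\theta)}{n}
   + \frac{1}{n-1}\,\mathrm{Trace}(\hat I^{-1}\hat J)
   \approx \frac{1}{2n}\Big\{-2\ell(\hat\theta) + 2\,\mathrm{Trace}(\hat I^{-1}\hat J)\Big\}
   = \frac{1}{2n}\,\mathrm{TIC}.
\end{align*}
```

The only approximation is the replacement of $1/(n-1)$ by $1/n$. When the model is well specified, $\hat J$ and $\hat I$ estimate the same matrix, the trace is close to $p$, and the expression becomes $\{-2\ell(\hat\theta)+2p\}/(2n)=\mathrm{AIC}/(2n)$.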

4.2 M-estimators and information assessment risk: GIC

Konishi and Kitagawa [3] have generalized TIC and AIC to cases where $g_{\hat\theta}$ is an M-estimator. The criterion they proposed, obtained by correcting the bias of the log-likelihood, is the GIC. GIC is also a special case of UACVR, obtained when the assessment risk is the expected cross-entropy. They apply GIC in particular to penalized likelihood estimators. Thus UACVR, like GIC, can be applied to maximum a posteriori, maximum penalized likelihood and hierarchical likelihood estimators.

4.3 Restricted AIC

Liquet and Commenges [24] have proposed a modification of AIC and LCV for the case where estimators are based on the full information while they are assessed on a smaller (more targeted) information. More specifically, the estimator is based on the sample $O_n=(Y_i, i=1,\dots,n)$ but the assessment risk is based on a random variable $Z$ which is a coarsened version of $Y$. For instance $Z$ is a dichotomization of $Y$: $Z=1_{\{Y>l\}}$. For this case, the restricted AIC (RAIC) was derived both by direct approximation of the risk and by approximation of the LCV. RAIC is a particular case of UACVR for the case: $\phi(\theta,Y_i)=-\log g_\theta(Y_i)$ and $\psi(g_\theta,Y_i)=-\log g_\theta(Z_i)$.

4.4 Estimators assessment by CRPS

Gneiting and Raftery [16] studied scoring rules and particularly the CRPS. Its negative, which can be used as a loss function, is defined as
$$\mathrm{CRPS}(G(\cdot,\theta),Y)=\int_{-\infty}^{+\infty}\{G(u,\theta)-1_{\{u\ge Y\}}\}^2\,du,$$
where $G(\cdot,\theta)$ is the cumulative distribution function (c.d.f.) of a distribution in the model. The risk is a Cramér–von Mises-type distance: $d(G,G^*)=\int\{G(u)-G^*(u)\}^2\,du$, where $G^*$ denotes the true c.d.f. In some cases, it may be interesting to assess MLEs using this assessment risk rather than the logarithmic loss, which may be too sensitive to low values of the density. UACVR can be used for estimating this risk. In that case, the leading term of UACVR is $n^{-1}\sum_{i=1}^n\mathrm{CRPS}(G(\cdot,\hat\theta),Y_i)$; for the correcting term, $H_{\Phi_{O_n}}$ is the Hessian of the log-likelihood (since $\hat\theta$ is the MLE) and $K$ must be computed with
$$\hat v_i=\left.\frac{\partial\psi}{\partial\theta}\right|_{\hat\theta}=2\int_{-\infty}^{+\infty}\{G(u,\hat\theta)-1_{\{u\ge Y_i\}}\}\left.\frac{\partial G(u,\theta)}{\partial\theta}\right|_{\hat\theta}du;$$
$\hat d_i$ is the individual score (gradient of the individual log-likelihood) divided by $n-1$. Thus the computation of $\hat v_i$, for each $i$, involves the computation of $p$ simple integrals, which can be done numerically.
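The integral defining the CRPS loss can be evaluated numerically, as suggested above. The sketch below assumes a normal c.d.f. for $G(\cdot,\hat\theta)$ with hypothetical parameter values, splits the integral at $Y$ to avoid the jump of the indicator, and compares the result with the closed form known for the normal distribution. The function names and the integration settings are illustrative choices.

```python
import math
from statistics import NormalDist

def crps_numeric(dist, y, half_range=12.0, steps=20000):
    """Midpoint-rule approximation of the integral of {G(u) - 1[u >= y]}^2."""
    lo = min(dist.mean, y) - half_range * dist.stdev
    hi = max(dist.mean, y) + half_range * dist.stdev

    def integrate(f, a, b):
        h = (b - a) / steps
        return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

    left = integrate(lambda u: dist.cdf(u) ** 2, lo, y)         # region u < y
    right = integrate(lambda u: (dist.cdf(u) - 1) ** 2, y, hi)  # region u >= y
    return left + right

def crps_normal(dist, y):
    """Closed form for a normal distribution (loss orientation)."""
    z = (y - dist.mean) / dist.stdev
    std = NormalDist()
    return dist.stdev * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

model = NormalDist(mu=0.3, sigma=1.5)  # hypothetical fitted G(., theta_hat)
value_numeric = crps_numeric(model, y=1.0)
value_closed = crps_normal(model, y=1.0)
```

The same quadrature could be reused for the $p$ integrals in $\hat v_i$ by replacing the squared term with the product of $G(u,\hat\theta)-1_{\{u\ge Y_i\}}$ and the derivative of $G$ with respect to each parameter.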

4.5 Estimators assessment by Brier score

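The Brier score [25] can be used to assess estimators of the distribution of categorical variables, say $Y$, taking values $1,\dots,m$. Consider a model for this distribution: we write $g_\theta(j)=P(Y=j)$. The Brier score is defined as $\sum_{j=1}^m(\delta_{Y,j}-g_\theta(j))^2$, where $\delta$ is the Kronecker symbol ($\delta_{Y,j}=1$ if $Y=j$, zero otherwise). Assume that we estimate $\theta$ by maximum likelihood and use the Brier score for assessment. In this case, the leading term of UACVR is $n^{-1}\sum_{i=1}^n\sum_{j=1}^m(\delta_{Y_i,j}-g_{\hat\theta}(j))^2$; for the correcting term, $H_{\Phi_{O_n}}$ is the Hessian of the log-likelihood (since $\hat\theta$ is the MLE) and $K$ must be computed with
$$\hat v_i=\left.\frac{\partial\psi}{\partial\theta}\right|_{\hat\theta}=-2\left.\frac{\partial g_\theta}{\partial\theta}\right|_{\hat\theta}(Y_i)+2\sum_{j=1}^m g_{\hat\theta}(j)\left.\frac{\partial g_\theta}{\partial\theta}\right|_{\hat\theta}(j);$$
$\hat d_i$ is the individual score (gradient of the individual log-likelihood) divided by $n-1$.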
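The gradient of the Brier loss can be checked against a finite difference. The sketch below assumes a hypothetical one-parameter model with three categories (a Binomial(2, θ) distribution shifted to the values 1, 2, 3); the model, the evaluation point and the function names are illustrative only.

```python
def probs(theta):
    # hypothetical model: Binomial(2, theta) shifted to categories 1, 2, 3
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def dprobs(theta):
    # derivatives of the three probabilities with respect to theta
    return [-2 * (1 - theta), 2 - 4 * theta, 2 * theta]

def brier(theta, y):
    # sum_j (delta_{y,j} - g_theta(j))^2
    g = probs(theta)
    return sum(((1.0 if y == j + 1 else 0.0) - g[j]) ** 2 for j in range(3))

def brier_gradient(theta, y):
    # v_i = -2 dg(Y_i)/dtheta + 2 sum_j g(j) dg(j)/dtheta
    g, dg = probs(theta), dprobs(theta)
    return -2 * dg[y - 1] + 2 * sum(gj * dgj for gj, dgj in zip(g, dg))

theta_hat, y_i, h = 0.4, 2, 1e-6
numeric = (brier(theta_hat + h, y_i) - brier(theta_hat - h, y_i)) / (2 * h)
analytic = brier_gradient(theta_hat, y_i)
```

Agreement between `numeric` and `analytic` confirms the sign of the first term, which is easy to lose when the formula is transcribed.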

4.6 Conditional AIC

A referee suggested that UACVR might be useful for selecting random effect models based on conditional assessment functions, that is when the target is the density conditional on random effects. A conditional Akaike criterion was proposed by Vaida and Blanchard [26]; Greven and Kneib [27] proposed a correction taking into account uncertainty on the covariance matrix of the random effects; Braun et al. [28] proposed a predictive cross-validation criterion. UACVR could directly apply to this case by considering that the assessment loss is $-\log g_\theta(Y\mid\hat b)$, where $b$ is the random effect and $\hat b$ its estimator. Since $\hat b$ is a function of $\theta$ and $Y$, the assessment loss can indeed be written $\psi(\theta,y)$. For computing UACVR, the main task would be here to compute the gradient $\frac{\partial\psi(\theta,Y_i)}{\partial\theta}$, not forgetting the dependence of $\hat b$ on $\theta$. This could be easily done by numerical differentiation.

4.7 Estimators based on continuous approximation of categorical data

Assume $Y$ is an ordered categorical variable taking values $l=0,1,\dots,L$. Here for simplicity we consider that $Y$ is univariate. Several models are available for this type of variable. Cumulative probit models, further called “threshold link models,” assume that $Y_i=l$ if a latent variable $\Lambda_i$ takes values in the interval $(c_l,c_{l+1})$ for $l=0,\dots,L$, with $c_0=-\infty$ and $c_{L+1}=+\infty$:
$$Y_i=\sum_{l=0}^{L}1_{\{\Lambda_i\in(c_l,c_{l+1})\}}\,l. \tag{9}$$
$\Lambda_i$ itself can be modeled as a noisy linear form of explanatory variables $\Lambda_i=\beta x_i+\varepsilon_i$, where $\varepsilon_i$ has a normal distribution of mean zero and variance $\sigma^2$, and where $x_i$ are explanatory variables. The parameters are $\theta=(c_1,\dots,c_L,\beta,\sigma)$. For identifiability one must add some constraints, for instance $\sigma=1$ and a null intercept in the linear model for $\Lambda_i$. An estimator of the distribution can be obtained by maximum likelihood, leading to $g_{\hat\theta}$. The assessment risk can be $\mathrm{ECE}(g_{\hat\theta})$. Note that since $Y$ is discrete, the densities are defined with respect to a counting measure, that is, $g_{\hat\theta}(l)$ defines the probability that $Y=l$.

One may also make a continuous approximation, which leads to simpler computations and may be more parsimonious, especially if $Y$ is multivariate as in the illustration of Section 6. For example we can consider the model $Y_i=\beta x_i+\varepsilon_i$. Maximizing the likelihood of this model for observations of $Y_i$ leads to a probability measure specified by the density $h^c_{\hat\gamma}$. This is however a density relative to Lebesgue measure. This probability measure gives zero probability to $\{Y_i=l\}$ for all $l$, and this yields an infinite value for ECE (meaning strong rejection of this estimator). However, from $h^c$ a natural estimator of $f$ can be constructed by gathering at $l$ the mass around $l$: $h_{\hat\gamma}(l)=\int_{l-1/2}^{l+1/2}h^c_{\hat\gamma}(u)\,du$ for $l=1,\dots,L-1$, and $h_{\hat\gamma}(0)=\int_{-\infty}^{1/2}h^c_{\hat\gamma}(u)\,du$, $h_{\hat\gamma}(L)=\int_{L-1/2}^{+\infty}h^c_{\hat\gamma}(u)\,du$. UACVR can be computed for this estimator for estimating its ECE. The leading term of $\mathrm{UACVR}(h_{\hat\gamma})$ can be interpreted as minus the normalized log-likelihood obtained by this estimator with respect to the counting measure. For the correcting term we need the Hessian of the log-likelihood of $h^c_{\hat\gamma}$ and we have to compute $\hat v_i=\left.\frac{\partial\psi(h_\gamma,Y_i)}{\partial\gamma}\right|_{\hat\gamma}$. For instance if $Y_i=l$ for $l=1,\dots,L-1$ we have
$$\hat v_i=-\frac{\int_{l-1/2}^{l+1/2}\left.\frac{\partial h^c_\gamma}{\partial\gamma}\right|_{\hat\gamma}(u)\,du}{\int_{l-1/2}^{l+1/2}h^c_{\hat\gamma}(u)\,du}.$$
Since the denominator is the probability under $h^c_{\hat\gamma}$ that $Y\in(l-1/2,l+1/2)$, $\hat v_i$ can be interpreted as the conditional expectation (under $h^c_{\hat\gamma}$) of the gradient of the individual loss, that is, minus the individual score. Thus if $h^c_{\hat\gamma}$ does not vary much on $(l-1/2,l+1/2)$, $\hat v_i$ is close to $(n-1)\hat d_i$. Using the same arguments as in Section 4.1 we obtain that UACVR is close to correcting by the number of parameters as in AIC; such a criterion, which we call AICd, was proposed by Proust-Lima et al. [29], and this is likely to be a good approximation if the number of modalities of $Y$ is large.
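The construction of the discrete estimator from the continuous density is easy to express in code. The sketch below assumes a normal continuous density without covariates and with hypothetical parameter values; `discretize` is an illustrative name. It returns the probabilities $h_{\hat\gamma}(l)$ and the corresponding assessment losses $-\log h_{\hat\gamma}(l)$ with respect to the counting measure.

```python
import math
from statistics import NormalDist

def discretize(cont, L):
    """Probabilities h(l), l = 0..L, obtained by gathering the mass around l."""
    probs = []
    for l in range(L + 1):
        lower = cont.cdf(l - 0.5) if l > 0 else 0.0  # integral starts at -infinity for l = 0
        upper = cont.cdf(l + 0.5) if l < L else 1.0  # integral ends at +infinity for l = L
        probs.append(upper - lower)
    return probs

cont = NormalDist(mu=2.1, sigma=1.3)  # hypothetical fitted continuous density
h = discretize(cont, L=4)
loss = [-math.log(p) for p in h]  # assessment loss psi(h, l) = -log h(l)
```

Averaging `loss[Y_i]` over the observations would give the leading term of UACVR for this estimator; with covariates, the mean of `cont` would depend on $x_i$.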

5 Simulation: choice of estimators for ordered categorical data

5.1 Design

We conducted a simulation study to illustrate the use of UACVR for comparing estimators derived from threshold link models and estimators obtained by a linear continuous approximation in the case of ordered categorical data (see Section 4.7). The aim was to assess the performance of UACVR as an estimator of ECE defined in eq. (8), and to compare it to the normalized naive AIC criterion (noted AIC) and the normalized AIC criterion computed on the counting measure (noted AICd). Performances of these criteria were studied in the case where the number of modalities (L+1) of the response variable Y is small (Section 5.2.1) and when it is large (Section 5.2.2).

5.1.1 True distributions

For all the simulations, the data came from a cumulative probit model where the relationship between $Y_i$ and $\Lambda_i$ is as in eq. (9) and the linear form of $\Lambda_i$ is specified by
$$\Lambda_i=\beta_1X_{i1}+\beta_2X_{i2}+\varepsilon_i;\quad i=1,\dots,n, \tag{10}$$
where $\varepsilon_i$ and the two explanatory variables $X_{i1}$ and $X_{i2}$ were generated from independent standard normal distributions. In order not to disadvantage the linear continuous approximation compared to the threshold link model, the parameters $c_1,\dots,c_L$ were chosen as the solution of the following equations:
$$\begin{cases}P(\Lambda_i<c_1)=P(\Lambda_i>c_L)\\ P(\Lambda_i<c_1)=P(c_1<\Lambda_i<c_2)\\ c_{i+1}=c_i+m\quad\text{with}\quad m=(c_L-c_1)/(L-1)\end{cases}$$
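One possible way to solve these equations numerically is sketched below; it is not the authors' procedure, which is not described. Since $\Lambda_i$ is a centered normal variable with variance $\beta_1^2+\beta_2^2+1$, the first equation gives $c_L=-c_1$, and the second reduces to a one-dimensional root-finding problem in $c_1$, solved here by bisection. The function name and the bracketing interval are assumptions.

```python
import math
from statistics import NormalDist

def thresholds(beta1, beta2, L):
    """Equally spaced c_1..c_L solving the two probability constraints (L >= 2)."""
    # Lambda = beta1*X1 + beta2*X2 + eps with independent N(0, 1) terms
    lam = NormalDist(0.0, math.sqrt(beta1 ** 2 + beta2 ** 2 + 1.0))

    def spacing(c1):
        # symmetry of Lambda gives c_L = -c_1, hence m = -2 c_1 / (L - 1)
        return -2.0 * c1 / (L - 1)

    def gap(c1):
        # P(Lambda < c_1) - P(c_1 < Lambda < c_2)
        return 2.0 * lam.cdf(c1) - lam.cdf(c1 + spacing(c1))

    lo, hi = -8.0 * lam.stdev, 0.0  # gap is negative far in the tail, positive at 0
    for _ in range(200):            # bisection on c_1
        mid = (lo + hi) / 2.0
        if gap(mid) <= 0:
            lo = mid
        else:
            hi = mid
    c1 = (lo + hi) / 2.0
    return [c1 + k * spacing(c1) for k in range(L)], lam

c, lam = thresholds(beta1=1.05, beta2=1.85, L=4)
```

The call uses the values of $\beta_1$ and $\beta_2$ reported for the case $L+1=5$; the returned list contains $c_1,\dots,c_L$ and `lam` is the distribution of $\Lambda_i$, which can be used to verify the constraints.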

5.1.2 The different models

For each generated sample, we fitted the cumulative probit model as previously defined, and a linear model assuming a linear continuous approximation of the response variable $Y$, $Y_i=\gamma_0+\gamma_1X_{i1}+\gamma_2X_{i2}+\varepsilon_i$, with $\varepsilon_i$ being independent zero-mean normal variables with variance $\tau^2$. Both models were fitted by maximum likelihood using a Fortran program, which was checked by comparing its results with those obtained by the R package lcmm [30].

Samples of $n=300$, $500$ and $3{,}000$ subjects were generated. For all simulations, $N=10{,}000$ samples were generated. The true assessment risk, ECE, which is available only in a simulation study, was computed by a Monte Carlo approach: for each sample $O_n^j$ we computed $g_{\hat\theta(j)}$; we generated a large number $M=100{,}000$ of observations $Y_k$ independent of $O_n^j$, $j=1,\dots,N$; we estimated ECE by the global mean $-\frac{1}{NM}\sum_{j=1}^N\sum_{k=1}^M\log g_{\hat\theta(j)}(Y_k)$.
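The Monte Carlo computation of ECE can be illustrated on a toy case where the answer is known. The sketch below is not the simulation of this section: it assumes a hypothetical $N(\theta,1)$ model fitted to $N(0,1)$ data, uses much smaller values of $N$ and $M$, and draws fresh validation observations for each sample. In that toy case ECE equals the entropy of $N(0,1)$ plus $1/(2n)$.

```python
import math
import random

rng = random.Random(3)
n, N, M = 20, 200, 200  # deliberately small; the study above uses far larger N and M

def neg_log_density(theta, y):
    # -log g_theta(y) for the hypothetical model N(theta, 1)
    return 0.5 * math.log(2 * math.pi) + 0.5 * (y - theta) ** 2

total = 0.0
for _ in range(N):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # sample j from the true N(0, 1)
    theta_hat = sum(sample) / n                        # MLE computed on sample j
    fresh = [rng.gauss(0.0, 1.0) for _ in range(M)]    # new observations Y_k
    total += sum(neg_log_density(theta_hat, y) for y in fresh)
ece_mc = total / (N * M)

# known value in this toy case: entropy of N(0, 1) plus expected KL = 1 / (2n)
ece_exact = 0.5 * math.log(2 * math.pi * math.e) + 1 / (2 * n)
```

The two values should agree up to Monte Carlo error, which shrinks as $N$ and $M$ grow.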

5.2 Results of the simulation

5.2.1 Small number of modalities

We consider here the case where the number of modalities of $Y$ is relatively small ($L+1=5$). In this simulation, we fixed $\beta_1=1.05$, $\beta_2=1.85$. In Table 1, we present, for different sample sizes $n$, the results for the different empirical criteria AIC, AICd and UACVR, which can be compared with ECE. For any sample size, the cumulative probit model provided a better ECE than the linear model (positive difference). It appeared that UACVR had a very small bias for all the sample sizes (of order $10^{-3}$). The two other criteria AIC and AICd were also in favor of a threshold model. However, as expected, the naive normalized AIC did not correctly estimate ECE due to the wrong probability measure (Lebesgue measure instead of a counting measure). We note that the criterion AICd estimated ECE relatively well, with a small bias around $10^{-2}$ and $10^{-3}$. All the criteria were in agreement with ECE for the choice of the model.

Table 1: Performance of the criteria for a small number of modalities (L + 1 = 5) and different sample sizes.

Table 2: Performance of the criteria for a large number of modalities (L + 1 = 20) and different sample sizes.

Table 3: Performance of the 95% tracking interval in both situations (L + 1 = 5 and L + 1 = 20) and for the different sample sizes (n = 300, 500 and 3,000).

5.2.2 Large number of modalities

We consider here the case where the number of modalities of $Y$ is relatively large ($L+1=20$). In this simulation, we fixed $\beta_1=0.15$, $\beta_2=0.85$. The results of this simulation are presented in Table 2. For any sample size, the linear model provided a better ECE than the threshold model (negative difference). It appeared that UACVR had a small bias for all the sample sizes (of order $10^{-3}$ and $10^{-4}$). The AICd criterion gave results similar to those of the UACVR criterion, while the AIC criterion failed to find the best estimator (positive difference).

5.2.3 Coverage of tracking intervals

Finally we looked at the coverage of the tracking intervals and the percentage of cases where 0 was inside the tracking interval. The results are given in Table 3. The coverage rates appear to be too large. We checked that the distributions of UACVR were approximately normal. We found however that the estimated standard deviations were too large by a factor varying from 1.2 to 1.8 for small and large numbers of modalities respectively, but we were unable to find the reason for this overestimation. Nevertheless, the estimate gives the order of magnitude of the variability of UACVR.

For a small number of modalities, 0 was always outside the tracking interval, leading to an unequivocal choice. For a large number of modalities, the percentage increased with $n$. This may seem paradoxical but illustrates well the difference between a tracking interval and a confidence interval. What happens is that the misspecification risk of the linear model is rather large for a small number of modalities and is very small for a large number of modalities. Thus, in the latter case, the global risk is driven by the statistical risk. The statistical risk decreases with $n$, so that the difference of risks, which is the target, decreases with $n$, becoming very small for $n=3{,}000$; in this case the two models are nearly equivalent and there is no reason to prefer one over the other according to the chosen risk.

6 Illustration on the choice of estimators for psychometric tests

In epidemiological studies, cognition is measured by psychometric tests, which usually consist of the sum of items measuring one or several cognitive domains. A common example is the Mini-Mental State Examination (MMSE) score [31], computed as the sum of 30 binary items evaluating memory, calculation, orientation in space and time, language, and word recognition; for this reason it is called a “sumscore” and ranges from 0 to 30. Although psychometric tests are in essence ordered categorical data, they are most often analyzed as continuous data: they usually have a large number of different levels and, especially in longitudinal studies, models for categorical data are numerically complex. Recently, Proust-Lima et al. [29] defined a latent process mixed model to analyze repeated measures of discrete outcomes, involving either a threshold link model or an approximation of it using continuous parameterized increasing functions. Models assuming either categorical data (using the threshold model) or continuous data (using continuous functions) were compared using an AICd, computed with respect to the counting measure. In this illustration, we use UACVR to compare such latent process mixed models, assuming either continuous or ordered categorical data, when applied to the repeated measures of the MMSE and its calculation subscore in a large sample from a French prospective cohort study.

6.1 Latent process mixed models

In brief, the latent process mixed model assumes that a latent process $(\Lambda_i(t))_{t \ge 0}$ underlies the repeated measures of the observed variable $Y_{ij}$ for subject $i$ ($i = 1, \ldots, n$) and occasion $j$ ($j = 1, \ldots, n_i$). The latent process $\Lambda_i(t)$ is defined as a standard linear mixed model: $\Lambda_i(t) = X_i(t)^T \beta + Z_i(t)^T b_i$ for $t \ge 0$, where $X_i(t)$ and $Z_i(t)$ are distinct vectors of time-dependent covariates associated, respectively, with the vector of fixed effects $\beta$ and the vector of random effects $b_i$ ($b_i \sim N(\mu, D)$). We further assume that $b_{i0}$, the first component of $b_i$ that usually represents the random intercept, is $N(0,1)$ for identifiability; except for the variance of $b_{i0}$, $D$ is an unstructured variance matrix.

A measurement model links the latent process with the observed repeated measures through intermediary variables which are noisy versions of the latent process at time $t_{ij}$: $\Lambda_{ij} = \Lambda_i(t_{ij}) + \varepsilon_{ij}$, where the $\varepsilon_{ij}$’s are i.i.d. normal variables with zero expectation. For ordered categorical data, a standard threshold link model as defined in eq. (9) (Section 4.7) for the univariate case is well adapted, leading to a cumulative probit mixed model. For continuous data, the link has been modeled as $H(Y_{ij}; \eta) = \Lambda_{ij}$, where $H(\cdot\,; \eta)$ is a monotonic increasing transformation. Three families of such transformations are considered: (i) $H(y; \eta) = \frac{h(y; \eta_1, \eta_2) - \eta_3}{\eta_4}$, where $h(\cdot\,; \eta_1, \eta_2)$ is the beta c.d.f. with parameters $(\eta_1, \eta_2)$; (ii) $H(y; \eta) = \eta_1 + \sum_{l=2}^{m+2} \eta_l B_l^I(y)$, where $(B_l^I)_{l=2,\ldots,m+2}$ is a basis of quadratic I-splines with $m$ nodes; (iii) $H(y; \eta) = \frac{y - \eta_1}{\eta_2}$, which gives the standard linear mixed model.

Latent process mixed models are estimated within the maximum likelihood framework using the lcmm function of the lcmm R package [30]. When continuous data are assumed, the likelihood can be computed analytically using the Jacobian of H [32]. In contrast, when ordered categorical data are assumed, the integration over the random effects has to be done numerically [29].

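UACVR is computed from the (negative, normalized) log-likelihood $\Psi_{O_n}$ obtained for the MLEs $\hat\theta$ with respect to the counting measure:

$$\begin{aligned}
\Psi_{O_n}(\hat\theta) &= -n^{-1} \sum_{i=1}^{n} \log \int_{-\infty}^{+\infty} \prod_{j=1}^{n_i} P(Y_{ij} \mid b_i)\, f_b(b_i)\, db_i \\
&= -n^{-1} \sum_{i=1}^{n} \log \int_{-\infty}^{+\infty} \prod_{j=1}^{n_i} \prod_{l=0}^{L} P(Y_{ij} = l \mid b_i)^{1\{Y_{ij} = l\}}\, f_b(b_i)\, db_i \\
&= -n^{-1} \sum_{i=1}^{n} \log \int_{-\infty}^{+\infty} \prod_{j=1}^{n_i} \prod_{l=0}^{L} P(c_l \le \Lambda_{ij} < c_{l+1} \mid b_i)^{1\{Y_{ij} = l\}}\, f_b(b_i)\, db_i,
\end{aligned} \tag{11}$$

where $c_0 = -\infty$, $c_{L+1} = +\infty$, and either $c_l$ ($l = 1, \ldots, L$) are the estimated thresholds when a threshold model is considered, or $c_l = H(l - \tfrac{1}{2}; \hat\eta)$ ($l = 1, \ldots, L$) when monotonic increasing families of transformations are used. We also need to compute $\hat v_i$ in the same way as in Section 4.7. The integral is approximated by Gaussian quadrature.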
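As an illustration of how such a criterion can be evaluated, here is a minimal sketch for a deliberately simplified model: a single random intercept $b_i \sim N(0,1)$, residual standard deviation `sigma`, and a 5-point Gauss-Hermite rule. All of these are assumptions made for the example (the application in this paper involves three random effects), every function name is hypothetical, and the loss is taken as minus the normalized log-likelihood. This is not the lcmm implementation.

```python
import math


def normal_cdf(x):
    """Standard normal c.d.f.; math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gauss_hermite_5():
    """Nodes and weights of the 5-point Gauss-Hermite rule for E[f(B)], B ~ N(0, 1)."""
    r1 = math.sqrt(5.0 - math.sqrt(10.0))
    r2 = math.sqrt(5.0 + math.sqrt(10.0))
    nodes = [-r2, -r1, 0.0, r1, r2]
    # Weight at node x: 5! / (5^2 * He_4(x)^2), with He_4(x) = x^4 - 6 x^2 + 3.
    weights = [120.0 / (25.0 * (x**4 - 6.0 * x**2 + 3.0) ** 2) for x in nodes]
    return nodes, weights


def linear_link_thresholds(L, eta1, eta2):
    """c_l = H(l - 1/2; eta), l = 1..L, for the linear link H(y) = (y - eta1) / eta2."""
    return [(l - 0.5 - eta1) / eta2 for l in range(1, L + 1)]


def category_prob(y, mean, b, thresholds, sigma):
    """P(c_y <= Lambda_ij < c_{y+1} | b_i = b), with Lambda_ij ~ N(mean + b, sigma^2)."""
    c = [-math.inf] + list(thresholds) + [math.inf]
    upper = normal_cdf((c[y + 1] - mean - b) / sigma)
    lower = normal_cdf((c[y] - mean - b) / sigma)
    return upper - lower


def subject_likelihood(ys, means, thresholds, sigma):
    """Marginal likelihood of one subject: integral over b_i done by quadrature."""
    nodes, weights = gauss_hermite_5()
    total = 0.0
    for b, w in zip(nodes, weights):
        prod = 1.0
        for y, m in zip(ys, means):
            prod *= category_prob(y, m, b, thresholds, sigma)
        total += w * prod
    return total


def psi_on(sample, thresholds, sigma):
    """Minus the normalized log-likelihood over n independent subjects."""
    n = len(sample)
    return -sum(
        math.log(subject_likelihood(ys, means, thresholds, sigma))
        for ys, means in sample
    ) / n


# Toy data (made up): levels 0..4, two subjects with 3 and 2 visits.
# Each subject is (observed levels, fixed-effect means X_i(t_ij)^T beta).
thresholds = linear_link_thresholds(4, eta1=2.0, eta2=1.0)
sample = [([0, 1, 1], [0.0, 0.1, 0.2]), ([3, 4], [0.0, -0.1])]
print(psi_on(sample, thresholds, sigma=1.0))
```

With estimated thresholds instead of `linear_link_thresholds`, the same functions would evaluate the criterion for a threshold model, which is what makes the two kinds of estimators comparable on the counting-measure scale.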

6.2 Application: categorical psychometric tests

Data come from the French prospective cohort study PAQUID initiated in 1988 to study normal and pathological aging [33]. Subjects included in the cohort were 65 and older at initial visit and were followed up to 10 times with a visit at 1, 3, 5, 8, 10, 13, 15, 17 and 20 years after the initial visit. At each visit, a battery of psychometric tests including the MMSE was completed. In the present analysis, all the subjects free of dementia at the 1-year visit and who had at least one MMSE measure during the whole follow-up were included: this resulted in a sample size of 2,914 subjects. Data from baseline were removed to avoid modeling the first-passing effect. The observed distributions of the MMSE sumscore and of its calculation subscore are displayed in Figure 1.

Figure 1: Distributions of MMSE sumscore and MMSE calculation subscore in the PAQUID sample (n = 2,914). Data were pooled from all available visits for a total of 10,846 observations.

The trajectory of the latent process was modeled as an individual quadratic function of age with correlated random effects for intercept, slope and quadratic slope ($Z_i(t)^T = (1, \mathrm{age}_i(t), \mathrm{age}_i^2(t))$), and an adjustment for the binary covariates educational level (EL = 1 if the subject graduated from primary school) and gender (SEX = 1 if the subject is a man) plus their interactions with age and quadratic age (so that $X_i(t)^T = Z_i(t)^T \otimes (1, \mathrm{EL}_i, \mathrm{SEX}_i)$). For the MMSE sumscore, in addition to the threshold link, the linear, beta c.d.f. and I-splines (with five equidistant nodes) continuous link functions were considered. For the calculation subscore, in addition to the threshold link, only the linear link was considered.
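The covariate structure described above (intercept, age and squared age, each taken alone and in interaction with EL and SEX) can be sketched as follows. The function name, the ordering of the nine fixed-effect terms and the use of raw age are assumptions made for illustration; in practice age would typically be centered or rescaled.

```python
def design_vectors(age, el, sex):
    """Return (Z, X) for one visit.

    Z = (1, age, age^2) carries the random effects.
    X crosses each component of Z with (1, EL, SEX), giving main effects
    plus interactions with age and squared age (9 fixed-effect terms).
    """
    z = [1.0, age, age ** 2]
    adjustment = [1.0, float(el), float(sex)]
    x = [zk * a for zk in z for a in adjustment]
    return z, x


# Hypothetical visit: age coded as 2.0, primary-school graduate (EL = 1), woman (SEX = 0).
z, x = design_vectors(2.0, el=1, sex=0)
print(z)
print(x)
```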

6.3 Results

Table 4 gives the assessment criteria for estimators based on the different models, and Table 5 provides the differences in UACVR or AICd and their 95% tracking interval. For the MMSE sumscore, the mixed model assuming the standard linear transformation yielded a clearly worse UACVR than the other models, which account for nonlinear relationships with the underlying latent process. The model involving a beta c.d.f. gave a risk similar to that of the model involving the less parsimonious I-splines transformation (DUACVR=0.0070, with 0 in the 95% tracking interval). Finally, the mixed model considering a threshold link model, which is numerically demanding (because of a three-dimensional integral in the likelihood), gave the best assessment risk but remained relatively close to the simpler models assuming a beta c.d.f. (DUACVR=0.0200) or an I-splines transformation (DUACVR=0.0270). For the interpretation of these values, Commenges et al. [19] suggested qualifying values of order $10^{-1}$, $10^{-2}$ and $10^{-3}$ as “large,” “moderate” and “small,” respectively; moreover, for multivariate observations, it was suggested to divide by the total number of observations rather than by the number of independent observations. With this correction (which amounts to dividing the current values by a factor of 3.7=10,846/2,914), the differences between the linear model and the other models can be qualified as “large,” and the differences between the threshold model and both the beta c.d.f. and I-splines models are between “moderate” and “small.” Of course, this gives only an idea of the difference of risks between estimators; a more intuitive and reliable interpretation scale is still to be found.

Figure 2 displays the estimated link functions in (A) and the predicted mean trajectories of the latent process according to educational level in (B) from the models involving either a linear, a beta c.d.f., an I-splines or a threshold link function. The estimated link functions and the predicted trajectories of the latent process are very close when assuming a beta c.d.f., I-splines or threshold link function, but they differ greatly when assuming a linear link.
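The per-observation rescaling mentioned above is simple arithmetic, sketched here with the DUACVR values quoted in the text. The dictionary labels are only descriptive, and the values are treated as absolute differences.

```python
n_subjects = 2914        # independent units (subjects)
n_observations = 10846   # total number of MMSE measures

# Factor used to go from a per-subject to a per-observation scale.
factor = n_observations / n_subjects

# DUACVR values quoted in the text, taken in absolute value.
duacvr = {
    "threshold vs beta c.d.f.": 0.0200,
    "threshold vs I-splines": 0.0270,
    "beta c.d.f. vs I-splines": 0.0070,
}

rescaled = {name: value / factor for name, value in duacvr.items()}

print(f"factor = {factor:.2f}")
for name, value in rescaled.items():
    print(f"{name}: {value:.4f} per observation")
```

All three rescaled differences fall between $10^{-3}$ and $10^{-2}$, that is, between the “small” and “moderate” orders of magnitude.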

Figure 2: (A) Estimated inverse link functions between MMSE sumscores and the underlying latent process and (B) predicted trajectories of the latent process of a woman according to educational level (with EL+ and EL– for, respectively, validated or non-validated primary school diploma) in latent process mixed models assuming either linear, beta c.d.f., I-splines or threshold link functions (PAQUID sample, n = 2,914); the trajectories for the latter three transformations are indistinguishable.

For the calculation subscore, the standard linear mixed model again gave a clearly higher risk than the mixed model assuming a threshold link model (DUACVR(linear,thresholds)=0.452, 95% tracking interval: [0.413,0.492]).
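As a rough reading aid, one can back out the standard error implied by this interval. The sketch below assumes that the tracking interval has the symmetric form estimate ± 1.96 × standard error; this form is an assumption of the example, not something restated in this section.

```python
lower, upper = 0.413, 0.492   # reported 95% tracking interval
z = 1.96                      # normal quantile assumed for a 95% interval

midpoint = (lower + upper) / 2          # should be close to the reported DUACVR
implied_se = (upper - lower) / (2 * z)  # half-width divided by the quantile

print(f"midpoint = {midpoint:.4f}")
print(f"implied standard error = {implied_se:.4f}")
print(f"difference / standard error = {midpoint / implied_se:.1f}")
```

Under that assumption, the difference is many standard errors away from 0, which is why the choice between the two models is unequivocal here.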

Table 4: Number of parameters (p), naive normalized AIC (AIC), AICd, and UACVR for latent process mixed models involving different transformations H and applied to either the MMSE sumscore or its calculation subscore.

Table 5: Difference of AICd (DAICd), difference of two UACVR values (DUACVR) and its 95% tracking interval between latent process mixed models involving different transformations H1 and H2, and applied to either the MMSE sumscore or its calculation subscore.

7 Conclusion

We have proposed a universal approximate formula for leave-one-out cross-validation under regularity conditions: it is universal in the sense that it applies to any pair of estimating and assessment risks that can be correctly estimated from the observations. UACVR is often a very good approximation of leave-one-out cross-validation, which itself does nearly as well as an “oracle estimator” of the assessment risk, which would be computable if we could assess the estimator on an independent replica of the sample. Another asset is that UACVR does not need the assumption that the models are well specified, and non-nested models can be compared. The result is in principle restricted to parametric models but extends to smooth semi- or non-parametric ones through spline representation of penalized likelihood estimators. The approximate formula not only allows fast computation, because the model is fitted only once, but also allows the asymptotic distribution to be derived.

Estimating this distribution is important since the variability of UACVR, like that of any criterion used for estimator choice, may be large. Fortunately, as noted in Section 3, the variability of a difference of UACVR values between two estimators is smaller, although it remains non-negligible. A simple formula allows these variances to be estimated and so-called tracking intervals to be constructed; our simulation study shows, however, that the coverage of these tracking intervals is too large, due to an overestimation of the variances. It remains an open question why this happened here while in other contexts [15, 19] the coverage rates were correct, and whether a correction for this overestimation can be found; nevertheless, the estimates have the correct order of magnitude and the tracking intervals may be useful.

In this paper, UACVR has been applied to the choice between estimators of the distribution of longitudinal categorical data based on cumulative probit mixed models or on mixed models based on a continuous approximation. It has been shown that the naive AIC can be misleading, while a procedure called AICd (which had not yet been validated) yields results very close to those of UACVR, although the latter is slightly better. Both quantities can be computed in the lcmm R package.

Appendix: Proof of Theorem 1

Under Assumptions A1–A3 below, we have formula (4).

In the proof, we apply the $O_p$ and $o_p$ notation to vectors and matrices. Saying that a matrix $H$ is $O_p(1)$ means that all its elements are $O_p(1)$. The proof is partly heuristic in that, at the end, we need an assumption to obtain that a mean of $n$ remainder terms of order $O_p(n^{-2})$ is itself an $O_p(n^{-2})$ term, or at least an $o_p(n^{-1})$ term.

We assume:

  • A1 $\theta_0$ is the unique minimizer of $\Phi(\theta)$ and the M-estimator $\hat\theta$ is consistent for $\theta_0$.

  • A2 $\phi(\theta, y)$ is thrice differentiable for every $y$ and the third derivative is dominated by a fixed function in a neighborhood of $\theta_0$.

  • A3 $\psi(\theta, y)$ is twice differentiable for every $y$ and the second derivative is dominated by a fixed function in a neighborhood of $\theta_0$.

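The proof is as follows. Assumption A2 is the essential assumption in the so-called classical conditions [17] for obtaining that $\sqrt{n}(\hat\theta - \theta_0)$ has an asymptotic normal distribution. It implies that $\hat\theta_{-i} - \hat\theta = O_p(n^{-1/2})$. A Taylor expansion of $\frac{\partial \Phi_{O_n|i}}{\partial \theta}\big|_{\hat\theta_{-i}}$ around $\hat\theta$ yields (the left-hand side vanishes because $\hat\theta_{-i}$ minimizes $\Phi_{O_n|i}$)

$$0 = \frac{\partial \Phi_{O_n|i}}{\partial \theta}\Big|_{\hat\theta} + H_{\Phi_{O_n|i}} (\hat\theta_{-i} - \hat\theta) + R_{n1},$$

where $H_{\Phi_{O_n|i}} = \frac{\partial^2 \Phi_{O_n|i}}{\partial \theta^2}\big|_{\hat\theta}$ and $R_{n1}$ is a quadratic form of $\hat\theta_{-i} - \hat\theta$ involving third derivatives of $\Phi_{O_n|i}$ taken at a point $\tilde\theta_n$ such that $\|\tilde\theta_n - \hat\theta\| \le \|\hat\theta_{-i} - \hat\theta\|$. Thus $\|\tilde\theta_n - \hat\theta\|$ is also an $O_p(n^{-1/2})$. Under Assumption A2 and using Lemma 2.12 of Van der Vaart [17], $R_{n1}$ is an $O_p(n^{-1})$. Assumptions A1 and A2 imply that $I(\theta) = \frac{\partial^2 \Phi}{\partial \theta^2}\big|_{\theta}$ exists and is invertible in a neighborhood of $\theta_0$. By the strong law of large numbers, $H_{\Phi_{O_n}} = \frac{\partial^2 \Phi_{O_n}}{\partial \theta^2}\big|_{\hat\theta}$ and $H_{\Phi_{O_n|i}} = \frac{\partial^2 \Phi_{O_n|i}}{\partial \theta^2}\big|_{\hat\theta}$ converge toward $I(\theta_0)$ and thus are invertible for sufficiently large $n$. It also follows that both these matrices and their inverses are $O_p(1)$. Thus, from the above development we obtain

$$\hat\theta_{-i} - \hat\theta = -H_{\Phi_{O_n|i}}^{-1} \frac{\partial \Phi_{O_n|i}}{\partial \theta}\Big|_{\hat\theta} + R_n,$$

where $R_n = -H_{\Phi_{O_n|i}}^{-1} R_{n1}$ is an $O_p(n^{-1})$.

By definition of $\Phi_{O_n}(\theta)$ we have the relation

$$n \Phi_{O_n}(\theta) = (n-1) \Phi_{O_n|i}(\theta) + \phi(\theta, Y_i). \tag{12}$$

Taking derivatives of the terms of this equation and evaluating them at $\hat\theta$, we find $0 = (n-1) \frac{\partial \Phi_{O_n|i}}{\partial \theta}\big|_{\hat\theta} + \frac{\partial \phi(\theta, Y_i)}{\partial \theta}\big|_{\hat\theta}$, and we obtain that $\frac{\partial \Phi_{O_n|i}}{\partial \theta}\big|_{\hat\theta} = -\hat d_i$.

Hence we have

$$\hat\theta_{-i} - \hat\theta = H_{\Phi_{O_n|i}}^{-1} \hat d_i + R_n. \tag{13}$$

Note that this implies that $\hat\theta_{-i} - \hat\theta = O_p(n^{-1})$ because $H_{\Phi_{O_n|i}}^{-1} = O_p(1)$ and both $\hat d_i$ and $R_n$ are $O_p(n^{-1})$. But this in turn implies that $R_n$ is in fact an $O_p(n^{-2})$ (as a quadratic form of $O_p(n^{-1})$ terms).

Now we show that $H_{\Phi_{O_n|i}}$ can be replaced by $H_{\Phi_{O_n}} = \frac{\partial^2 \Phi_{O_n}}{\partial \theta^2}\big|_{\hat\theta}$ in eq. (13). Differentiating eq. (12) twice, we obtain $H_{\Phi_{O_n}} = \frac{n-1}{n} H_{\Phi_{O_n|i}} + \frac{1}{n} H_{\phi_i}$, where $H_{\phi_i} = \frac{\partial^2 \phi(\theta, Y_i)}{\partial \theta^2}\big|_{\hat\theta}$; since the last term is an $O_p(n^{-1})$, we can write $H_{\Phi_{O_n|i}} = H_{\Phi_{O_n}} + O_p(n^{-1})$. Equation (13) can be written $H_{\Phi_{O_n|i}} (\hat\theta_{-i} - \hat\theta) = \hat d_i + O_p(n^{-2})$ or, replacing $H_{\Phi_{O_n|i}}$ by $H_{\Phi_{O_n}} + O_p(n^{-1})$, $H_{\Phi_{O_n}} (\hat\theta_{-i} - \hat\theta) = \hat d_i + O_p(n^{-1}) (\hat\theta_{-i} - \hat\theta) + O_p(n^{-2})$. Using the fact that $\hat\theta_{-i} - \hat\theta = O_p(n^{-1})$, we obtain

$$\hat\theta_{-i} - \hat\theta = H_{\Phi_{O_n}}^{-1} \hat d_i + O_p(n^{-2}). \tag{14}$$

Expanding now the assessment loss function for $\hat\theta_{-i}$ around $\hat\theta$ yields (using Assumption A3)

$$\psi(g^{\hat\theta_{-i}}, Y_i) = \psi(g^{\hat\theta}, Y_i) + (\hat\theta_{-i} - \hat\theta)^T \hat v_i + O_p(n^{-2}).$$

Replacing $\hat\theta_{-i} - \hat\theta$ in this equation by its approximation in eq. (14), we obtain

$$\psi(g^{\hat\theta_{-i}}, Y_i) = \psi(g^{\hat\theta}, Y_i) + \hat d_i^T H_{\Phi_{O_n}}^{-1} \hat v_i + O_p(n^{-2}).$$

Taking the mean of the left-hand terms of these equations yields $CV(g^{\hat\theta})$. Taking the mean of the terms on the right-hand side gives an expansion whose error term is the mean of $n$ error terms in $O_p(n^{-2})$. Because the number of error terms to consider increases with $n$, it is not true in general that such a mean preserves the order of the error terms. It is true under some boundedness conditions on the expectations of these terms. At this stage the proof is heuristic: we assume conditions such that the mean of these $O_p(n^{-2})$ terms is also an $O_p(n^{-2})$, or at least an $o_p(n^{-1})$. When this holds, we obtain the announced result given in formula (4).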
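The expansion can be checked numerically on a toy case where exact leave-one-out estimators are available in closed form. The sketch below is an illustration under stated assumptions, not the general implementation: a scalar parameter, a penalized estimating loss $\phi(\theta, y) = (y-\theta)^2/2 + \lambda\theta^2/2$, a squared-error assessment loss $\psi(\theta, y) = (y-\theta)^2/2$, and $\hat d_i$ taken as $(n-1)^{-1}\,\partial\phi(\theta, Y_i)/\partial\theta$ at $\hat\theta$, which is what the relation derived from eq. (12) implies. The data are simulated and all names are hypothetical.

```python
import random


def psi(theta, y):
    """Assessment loss: half squared error."""
    return 0.5 * (y - theta) ** 2


def loo_exact_and_approx(y, lam):
    n = len(y)
    ybar = sum(y) / n
    # Minimizer of Phi_On(theta) = mean of phi(theta, y_i).
    theta_hat = ybar / (1.0 + lam)
    hessian = 1.0 + lam  # second derivative of Phi_On, constant here

    in_sample = sum(psi(theta_hat, yi) for yi in y) / n

    # Exact leave-one-out: refit without observation i (closed form).
    exact = 0.0
    for yi in y:
        theta_minus_i = ((n * ybar - yi) / (n - 1)) / (1.0 + lam)
        exact += psi(theta_minus_i, yi)
    exact /= n

    # Approximation: in-sample risk + mean of d_i * H^{-1} * v_i.
    correction = 0.0
    for yi in y:
        d_i = ((theta_hat - yi) + lam * theta_hat) / (n - 1)  # scaled gradient of phi
        v_i = theta_hat - yi                                  # gradient of psi
        correction += d_i * v_i / hessian
    correction /= n
    return in_sample, exact, in_sample + correction


random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(200)]
in_sample, exact_cv, approx_cv = loo_exact_and_approx(data, lam=0.5)
print(f"in-sample risk     = {in_sample:.6f}")
print(f"exact LOO-CV       = {exact_cv:.6f}")
print(f"approximate LOO-CV = {approx_cv:.6f}")
```

In this quadratic example the first-order expansion of $\hat\theta_{-i} - \hat\theta$ is exact, so the only discrepancy between the two cross-validation values comes from the second-order term of the expansion of $\psi$, which is of order $n^{-2}$.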


References

  • 1. Akaike H. Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csáki F, editors. Proc. of the 2nd Int. Symp. on Information Theory. Budapest: Akadémiai Kiadó, 1973:267–81.

  • 2. Takeuchi K. Distributions of information statistics and criteria for adequacy of models. Math Sci 1976;153:12–18.

  • 3. Konishi S, Kitagawa G. Generalised information criteria in model selection. Biometrika 1996;83:875–90.

  • 4. Murata N, Yoshizawa S, Amari S-I. Network information criterion-determining the number of hidden units for an artificial neural network model. Neural Networks IEEE Trans 1994;5:865–72.

  • 5. Konishi S, Kitagawa G. Information criteria and statistical modeling. New York: Springer Series in Statistics, 2008.

  • 6. Stone M. Cross-validatory choice and assessment of statistical predictions (with discussion). J R Stat Soc B 1974;36:111–47.

  • 7. Golub G, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979;21:215–23.

  • 8. Wahba G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann Stat 1985;13:1378–402.

  • 9. Van Der Laan M, Dudoit S, Keles S. Asymptotic optimality of likelihood-based cross-validation. Stat Appl Genet Mol Biol 2004;3:1036.

  • 10. Xu G, Huang JZ. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation. Ann Stat 2012;40:3003–30.

  • 11. Gu C, Xiang D. Cross-validating non-Gaussian data. J Comput Graphical Stat 2001;10:581–91.

  • 12. Xiang D, Wahba G. A generalized approximate cross validation for smoothing splines with non-Gaussian data. Stat Sin 1996;6:675–92.

  • 13. Commenges D, Joly P, Gegout-Petit A, Liquet B. Choice between semi-parametric estimators of Markov and non-Markov multi-state models from generally coarsened observations. Scand J Stat 2007;34:33–52.

  • 14. O’Sullivan F. A statistical perspective on ill-posed inverse problems. Stat Sci 1986;1:502–18.

  • 15. Commenges D, Liquet B, Proust-Lima C. Choice of prognostic estimators in joint models by estimating differences of expected conditional Kullback-Leibler risks. Biometrics 2012;68:380–7.

  • 16. Gneiting T, Raftery A. Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc 2007;102:359–78.

  • 17. Van der Vaart A. Asymptotic statistics. Cambridge: Cambridge University Press, 2000.

  • 18. Watanabe S. Algebraic geometry and statistical learning theory. Vol. 25. Cambridge: Cambridge University Press, 2009.

  • 19. Commenges D, Sayyareh A, Letenneur L, Guedj J, Bar-Hen A. Estimating a difference of Kullback-Leibler risks using a normalized difference of AIC. Ann Appl Stat 2008;2:1123–42.

  • 20. Vuong Q. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 1989;57:307–33.

  • 21. Cover T, Thomas J. Elements of information theory. New York: John Wiley and Sons, 1991:542.

  • 22. Hall P. On Kullback-Leibler loss and density estimation. Ann Stat 1987;15:1491–519.

  • 23. Burnham KP, Anderson DR. Model selection and multimodel inference: a practical information-theoretic approach. 2nd ed. New York: Springer-Verlag, 2002.

  • 24. Liquet B, Commenges D. Choice of estimators based on different observations: modified AIC and LCV criteria. Scand J Stat 2011;38:268–87.

  • 25. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950;78:1–3.

  • 26. Vaida F, Blanchard S. Conditional Akaike information for mixed-effects models. Biometrika 2005;92:351–70.

  • 27. Greven S, Kneib T. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 2010;97:773–89.

  • 28. Braun J, Held L, Ledergerber B. Predictive cross-validation for the choice of linear mixed-effects models with application to data from the Swiss HIV cohort study. Biometrics 2012;68:53–61.

  • 29. Proust-Lima C, Amieva H, Jacqmin-Gadda H. Analysis of multivariate mixed longitudinal data: a flexible latent process approach. Br J Math Stat Psychol 2012;66:470–87.

  • 30. Proust-Lima C, Philipps V, Diakite A, Liquet B. LCMM: Estimation of extended mixed models using latent classes and latent processes. R package version 1.6.6, 2014.

  • 31. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975;12:189–98.

  • 32. Proust C, Jacqmin-Gadda H, Taylor JM, Ganiayre J, Commenges D. A nonlinear model with latent process for cognitive evolution using multivariate longitudinal data. Biometrics 2006;62:1014–24.

  • 33. Letenneur L, Commenges D, Dartigues JF, Barberger-Gateau P. Incidence of dementia and Alzheimer’s disease in elderly community residents of South-Western France. Int J Epidemiol 1994;23:1256–61.


    About the article

    Published Online: 2015-04-03

    Published in Print: 2015-05-01

    Citation Information: The International Journal of Biostatistics, ISSN (Online) 1557-4679, ISSN (Print) 2194-573X, DOI: https://doi.org/10.1515/ijb-2015-0004.

    © 2015 Walter de Gruyter GmbH, Berlin/Boston.
