Published by De Gruyter, October 12, 2020

Parametric models for combined failure time data from an incident cohort study and a prevalent cohort study with follow-up

James McVittie, David Wolfson, David Stephens, Vittorio Addona and David Buckeridge


A classical problem in survival analysis is to estimate the failure time distribution from right-censored observations obtained from an incident cohort study. Frequently, however, failure time data comprise two independent samples, one from an incident cohort study and the other from a prevalent cohort study with follow-up, which is known to produce length-biased observed failure times. There are drawbacks to each of these two types of study when viewed separately. We address two main questions here: (i) Can our statistical inference be enhanced by combining data from an incident cohort study with data from a prevalent cohort study with follow-up? (ii) What statistical methods are appropriate for these combined data? The theory we develop to address these questions is based on a parametrically defined failure time distribution and is supported by simulations. We apply our methods to estimate the duration of hospital stays.
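To illustrate how the two samples can share one parametric likelihood, here is a minimal sketch. It is not the authors' implementation. It assumes an exponential failure time distribution, independent right censoring, and a stationary incidence process, so that each prevalent-cohort subject contributes the usual censored-data term divided by the mean failure time. The function names and the toy data are hypothetical.

```python
import math


def combined_log_likelihood(rate, incident, prevalent):
    """Log-likelihood of an exponential rate for combined cohort data.

    incident:  list of (time, event) pairs from the incident cohort.
    prevalent: list of (total_time, event) pairs from the prevalent cohort,
               where total_time = time from onset to recruitment plus the
               observed follow-up time (assumed length-biased sampling).
    event is 1 for an observed failure and 0 for a right-censored time.
    """
    ll = 0.0
    for t, d in incident:
        # Standard right-censored term: f(t)^d * S(t)^(1 - d).
        ll += d * math.log(rate) - rate * t
    for t, d in prevalent:
        # Length-biased term: f(t)^d * S(t)^(1 - d) / mean, with mean = 1 / rate.
        ll += d * math.log(rate) - rate * t + math.log(rate)
    return ll


def combined_rate_mle(incident, prevalent):
    """Closed-form maximizer of combined_log_likelihood for the exponential model."""
    events = sum(d for _, d in incident) + sum(d for _, d in prevalent)
    total_time = sum(t for t, _ in incident) + sum(t for t, _ in prevalent)
    # Each prevalent subject adds one extra log(rate) through the 1 / mean factor.
    return (events + len(prevalent)) / total_time


# Hypothetical toy data, for illustration only.
incident_data = [(2.0, 1), (3.0, 0)]
prevalent_data = [(5.0, 1), (10.0, 0)]
rate_hat = combined_rate_mle(incident_data, prevalent_data)
```

In this sketch the prevalent cohort influences the estimate twice: through its observed times and through the extra normalizing term. Other parametric families would generally require numerical maximization rather than a closed form.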

Corresponding author: James McVittie, McGill University, Mathematics and Statistics, 805 Sherbrooke Street West, Montreal, Quebec, Canada, E-mail:

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: The first author was supported by a Natural Sciences and Engineering Research Council of Canada PGSD-3 award. David Stephens acknowledges the support of a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.



Supplementary material

The online version of this article offers supplementary material.

Received: 2020-04-01
Accepted: 2020-09-29
Published Online: 2020-10-12

© 2020 Walter de Gruyter GmbH, Berlin/Boston
