
Dependence Modeling

Ed. by Puccetti, Giovanni

1 Issue per year


Open Access

A joint regression modeling framework for analyzing bivariate binary data in R

Giampiero Marra
  • Corresponding author
  • Department of Statistical Science, University College London, Gower Street, London WC1E 6BT, UK
Rosalba Radice
  • Department of Economics, Mathematics and Statistics, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
Published Online: 2017-12-07 | DOI: https://doi.org/10.1515/demo-2017-0016


We discuss some of the features of the R add-on package GJRM, which implements a flexible joint modeling framework for fitting a number of multivariate response regression models under various sampling schemes. In particular, we focus on the case in which the user wishes to fit bivariate binary regression models in the presence of several forms of selection bias. The framework allows for Gaussian and non-Gaussian dependencies through the use of copulae, and for the association and mean parameters to depend on flexible functions of covariates. We describe some of the methodological details underpinning the bivariate binary models implemented in the package and illustrate them by fitting interpretable models of different complexity on three datasets.
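To make the copula idea concrete, here is a minimal Python sketch of how two binary margins can be linked through a copula. It is not the GJRM implementation. It assumes probit margins and a Clayton copula, chosen only because the Clayton copula has a closed form and (for theta > 0) captures positive association only. The linear predictors are fixed numbers rather than estimated functions of covariates. All function names are hypothetical. The joint probability P(Y1 = 1, Y2 = 1) is C(p1, p2; theta), and the other three cells follow from the margins. The second likelihood is the standard binary sample-selection form, in which Y2 is observed only when the selection indicator Y1 equals 1.

```python
import math


def probit_inverse_link(eta: float) -> float:
    """Standard normal CDF Phi(eta): maps a linear predictor to P(Y = 1)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def clayton(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v; theta); this sketch only allows theta > 0."""
    if theta <= 0.0:
        raise ValueError("theta must be positive in this sketch")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def cell_probabilities(eta1: float, eta2: float, theta: float) -> dict:
    """Joint probabilities of (Y1, Y2) implied by two probit margins and a copula."""
    p1 = probit_inverse_link(eta1)  # P(Y1 = 1)
    p2 = probit_inverse_link(eta2)  # P(Y2 = 1)
    p11 = clayton(p1, p2, theta)    # P(Y1 = 1, Y2 = 1) = C(p1, p2)
    # The Frechet bounds on any copula guarantee that all four cells are >= 0.
    return {
        (1, 1): p11,
        (1, 0): p1 - p11,
        (0, 1): p2 - p11,
        (0, 0): 1.0 - p1 - p2 + p11,
    }


def bivariate_loglik(y1, y2, eta1, eta2, theta):
    """Log-likelihood when both binary responses are observed for every unit."""
    return sum(
        math.log(cell_probabilities(e1, e2, theta)[(a, b)])
        for a, b, e1, e2 in zip(y1, y2, eta1, eta2)
    )


def selection_loglik(y1, y2, eta1, eta2, theta):
    """Log-likelihood for a binary sample selection setting: y1 is the
    selection indicator and y2 is used only when y1 == 1 (any placeholder
    value may be passed for unselected units; it is ignored)."""
    total = 0.0
    for a, b, e1, e2 in zip(y1, y2, eta1, eta2):
        if a == 0:
            # Unselected unit: only P(Y1 = 0) = 1 - p1 contributes.
            total += math.log(1.0 - probit_inverse_link(e1))
        else:
            # Selected unit: joint probability of selection and observed outcome.
            total += math.log(cell_probabilities(e1, e2, theta)[(1, b)])
    return total


if __name__ == "__main__":
    print(cell_probabilities(0.0, 0.0, theta=1.0))
    print(bivariate_loglik([1, 0], [1, 0], [0.0, 0.0], [0.0, 0.0], 1.0))
    print(selection_loglik([0, 1], [0, 1], [0.0, 0.0], [0.0, 0.0], 1.0))
```

In the framework described above, the linear predictors and, optionally, the association parameter would depend on flexible functions of covariates and be estimated simultaneously. The sketch fixes them as plain numbers so that the structure of the two likelihoods stays visible.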

Keywords: binary data; copula; confounding; joint model; penalized smoother; selection bias; R; simultaneous parameter estimation

MSC 2010: 62H99; 62J02



About the article

Received: 2017-09-22

Accepted: 2017-10-18

Published Online: 2017-12-07

Published in Print: 2017-12-20

Citation Information: Dependence Modeling, Volume 5, Issue 1, Pages 268–294, ISSN (Online) 2300-2298, DOI: https://doi.org/10.1515/demo-2017-0016.


© 2017. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
