Jump to ContentJump to Main Navigation

The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

2 Issues per year


IMPACT FACTOR 2013: 0.948

SCImago Journal Rank (SJR): 1.039
Source Normalized Impact per Paper (SNIP): 0.746

Mathematical Citation Quotient 2013: 0.04

Free Access

Bivariate Zero-Inflated Regression for Count Data: A Bayesian Approach with Application to Plant Counts

Anandamayee Majumdar1 / Corinna Gries2

1Arizona State University

2University of Wisconsin-Madison

Citation Information: The International Journal of Biostatistics. Volume 6, Issue 1, ISSN (Online) 1557-4679, DOI: 10.2202/1557-4679.1229, August 2010

Publication History

Published Online:
2010-08-13

Lately, bivariate zero-inflated (BZI) regression models have been used in many instances in the medical sciences to model excess zeros. Examples include the BZI Poisson (BZIP), BZI negative binomial (BZINB) models, etc. Such formulations vary in the basic modeling aspect and use the EM algorithm (Dempster, Laird and Rubin, 1977) for parameter estimation. A different modeling formulation in the Bayesian context is given by Dagne (2004). We extend the modeling to a more general setting for multivariate ZIP models for count data with excess zeros as proposed by Li, Lu, Park, Kim, Brinkley and Peterson (1999), focusing on a particular bivariate regression formulation. For the basic formulation in the case of bivariate data, we assume that Xi are (latent) independent Poisson random variables with parameters ? i, i = 0, 1, 2. A bi-variate count vector (Y1, Y2) response follows a mixture of four distributions; p0 stands for the mixing probability of a point mass distribution at (0, 0); p1, the mixing probability that Y2 = 0, while Y1 = X0 + X1; p2, the mixing probability that Y1 = 0 while Y2 = X0 + X2; and finally (1 - p0 - p1 - p2), the mixing probability that Yi = Xi + X0, i = 1, 2. The choice of the parameters {pi, ? i, i = 0, 1, 2} ensures that the marginal distributions of Yi are zero inflated Poisson (? 0 + ? i). All the parameters thus introduced are allowed to depend on co-variates through canonical link generalized linear models (McCullagh and Nelder, 1989). This flexibility allows for a range of real-life applications, especially in the medical and biological fields, where the counts are bivariate in nature (with strong association between the processes) and where there are excess of zeros in one or both processes. Our contribution in this paper is to employ a fully Bayesian approach consolidating the work of Dagne (2004) and Li et al. (1999) generalizing the modeling and sampling-based methods described by Ghosh, Mukhopadhyay and Lu (2006) to estimate the parameters and obtain posterior credible intervals both in the case where co-variates are not available as well as in the case where they are. In this context, we provide explicit data augmentation techniques that lend themselves to easier implementation of the Gibbs sampler by giving rise to well-known and closed-form posterior distributions in the bivariate ZIP case. We then use simulations to explore the effectiveness of this estimation using the Bayesian BZIP procedure, comparing the performance to the Bayesian and classical ZIP approaches. Finally, we demonstrate the methodology based on bivariate plant count data with excess zeros that was collected on plots in the Phoenix metropolitan area and compare the results with independent ZIP regression models fitted to both processes.

Keywords: Bayesian inference; regression model; bivariate zero-inflated Poisson; data augmentation; generalized linear model; multivariate zero-inflated Poisson

Comments (0)

Please log in or register to comment.