## Abstract

*Purpose*: Observational studies designed to investigate the safety of a therapy in a postmarketing setting typically aim to examine rare and non-acute adverse effects in a population that is not restricted to the particular patient subgroups for which the therapy, typically a drug, was originally approved. Large healthcare databases and, in particular, rich electronic medical record (EMR) databases are well suited to the conduct of these safety studies, since they can provide detailed longitudinal information on drug exposure, confounders, and outcomes for large and representative samples of patients who are considered for treatment in clinical settings. Analytic efforts to draw valid causal inferences in such studies face three challenges: (1) formally defining the effect measures that address the safety question of interest; (2) developing analytic protocols to estimate such effects with causal methodologies that can properly address the problems of time-dependent confounding and selection bias due to informative censoring; and (3) practically implementing such protocols in a large clinical/medical database setting. In this article, we describe an effort to address these challenges with marginal structural modeling based on inverse probability weighting with data reduction and super learning.

*Methods*: We describe the principles, motivation, and implementation of an analytic protocol applied in a safety study of the possible effects of exposure to oral bisphosphonate therapy on the risk of non-elective hospitalization for atrial fibrillation or atrial flutter among older women, based on EMR data from the Kaiser Permanente Northern California integrated health care delivery system. Adhering to guidelines put forward by Hernán (Epidemiology 2011;22:290-1), we start by framing the safety research question as one that could be directly addressed by a sequence of ideal randomized experiments before describing the estimation approach that we implemented to emulate inference from such trials using observational data.

*Results*: This report underscores the substantial computational burden involved in applying the current R implementation of super learning to large data sets. While computing time and memory requirements did not permit aggressive estimator selection with super learning, this analysis demonstrates the applicability of simplified versions of super learning based on select sets of candidate learners, avoiding complete reliance on an arbitrary selection of parametric models for confounding and selection bias adjustment. Results do not raise concern over the safety of one-year exposure to bisphosphonates but may suggest residual bias, possibly due to unmeasured confounders or to insufficient parametric adjustment for observed confounders with the candidate learners selected.

*Conclusions*: Adjustment for time-dependent confounding and selection bias based on the ad hoc inverse probability weighting approach described in this report may provide a feasible alternative to extended Cox modeling or to the point-treatment analytic approaches (e.g., based on propensity score matching) that are often adopted in safety research with large data sets. Alternative algorithms are needed to permit the routine and more aggressive application of super learning with large data sets.
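To make the core estimation idea concrete, the following is a minimal sketch, not the authors' protocol, of stabilized inverse-probability-of-treatment weighting combined with a super-learner-style stacked propensity model. It uses simulated point-treatment data (three baseline confounders, a binary exposure, a binary outcome with no true treatment effect) and scikit-learn's `StackingClassifier` as a stand-in for the R SuperLearner library described in the abstract; all variable names and learner choices here are illustrative assumptions.

```python
# Hedged illustration: stabilized IPW with a stacked ("super learner"-style)
# propensity model on simulated data. Not the study's actual analytic protocol.
import numpy as np
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
L = rng.normal(size=(n, 3))                              # baseline confounders
p_treat = 1 / (1 + np.exp(-(0.5 * L[:, 0] - 0.3 * L[:, 1])))
A = rng.binomial(1, p_treat)                             # binary exposure
p_out = 1 / (1 + np.exp(-(-2 + 0.4 * L[:, 0])))          # outcome ignores A:
Y = rng.binomial(1, p_out)                               # true null effect

# Propensity score via a small library of candidate learners, combined by a
# cross-validated meta-learner (the basic idea behind super learning).
prop_model = StackingClassifier(
    estimators=[("glm", LogisticRegression()),
                ("gbm", GradientBoostingClassifier(n_estimators=50))],
    final_estimator=LogisticRegression(),
    cv=5)
prop_model.fit(L, A)
ps = np.clip(prop_model.predict_proba(L)[:, 1], 0.01, 0.99)  # truncate weights

# Stabilized inverse-probability-of-treatment weights (mean should be near 1).
pA = A.mean()
w = np.where(A == 1, pA / ps, (1 - pA) / (1 - ps))

# Marginal structural model: IP-weighted logistic regression of Y on A alone.
msm = LogisticRegression()
msm.fit(A.reshape(-1, 1), Y, sample_weight=w)
print("MSM log-odds ratio for treatment:", msm.coef_[0][0])
```

In a longitudinal safety analysis such as the one described above, the weights would instead be cumulative products of time-varying treatment and censoring probabilities; this point-treatment version only shows the weighting mechanics.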
