Measured and unmeasured confounding present challenges in non-experimental, e.g. pharmacoepidemiologic research. To control for large numbers of measured confounders, summary scores are increasingly used. The propensity score (PS), defined as the conditional probability of treatment given a set of measured covariates, has become the most widely used summary score for confounding control [1, 2]. An alternative summary score to the PS is the prognostic score, also known as the disease risk score (DRS) . Unlike the PS which models covariate associations with treatment, the DRS models the probability or rate of disease occurrence absent of exposure. In a recent paper, Hansen  formalized the theoretical framework for the prognostic score or DRS. Formally, a DRS is defined as any scalar or multi-dimensional function that, when conditioned on, induces independence between measured covariates and the potential outcome under control (discussed further in Section 3) . Although applications of the DRS have been limited compared to the PS, use of DRSs in medical studies has increased in recent years. A number of recent studies have demonstrated the application of DRSs for confounding control in both simulated and substantive data [3–9].
While both PSs and DRSs control for measured confounders, unmeasured confounding continues to be fundamental obstacle in pharmacoepidemiology and non-experimental studies in general. In the presence of unmeasured confounding, it has been shown that controlling for variables that do not affect the outcome except through treatment (instrumental variables) amplifies bias caused by unmeasured confounders [10–15]. Pearl  further explains that bias amplification is not just a function of controlling for instruments but also occurs when controlling for any variable that affects treatment, including measured confounders. Controlling for measured confounders, however, removes confounding bias due to the measured confounders in addition to increasing bias caused by unmeasured confounders.
Given the potential for bias amplification, PS and DRS models that exclude instrumental variables are desirable in terms of reducing bias caused by unmeasured confounders. Because bias amplification is also a function of controlling for measured confounders, Pearl  suggests that researchers should consider the cost when controlling for measured confounders that have a strong effect on treatment but only a weak effect on the outcome (near instruments). For studies involving large numbers of covariates, however, identifying instrumental variables and evaluating the cost of controlling for near instruments can be challenging. Pharmacoepidemiologic and medical studies utilizing automated databases often involve large numbers of potential covariates that have not been selected with a specific research question in mind and where a multitude of factors other than the prognosis strongly influence treatment decisions (e.g. marketing, formularies, and physician preference) . In these settings, reducing bias amplification through automated or knowledge driven variable selection strategies can be difficult.
In this paper, we discuss ways in which researchers can estimate DRSs to potentially reduce bias amplification in situations where it is difficult to identify instrumental variables or evaluate the cost of controlling for near instruments. In Section 2, we use causal diagrams and path analysis to review the process of bias amplification. We show how bias amplification results from controlling indirect correlations that are induced between predictors of treatment and unmeasured confounders when conditioning on treatment. In Section 3, we review the balancing properties of DRSs and discuss why traditional estimation strategies for the DRS cause bias amplification in the presence of unmeasured confounders. In Section 4, we then discuss alternative out-of-sample estimation strategies for the DRS within large medical claims databases. We show that under certain assumptions, researchers can reduce bias amplification by estimating the DRS within historical data prior to treatment introduction or by estimating the DRS in populations with reduced treatment prevalence. In Section 5, we compare DRS estimation strategies when comparing the COX-2 inhibitor celecoxib to non-selective NSAIDS in reducing gastrointestinal complications. In Section 6, we conclude and discuss limitations of the proposed estimation strategies for the DRS.
2 Bias amplification
2.1 Setup and notation
For illustrative purposes, consider the causal diagram described in Figure 1 where the nodes represent random variables and the arrows represent causal effects. Figure 1 describes a causal structure consisting of a treatment (T), an instrumental variable (Z), an unmeasured confounder (U), and an outcome (Y). Causal associations between variables are shown by paths in which all arrows point forward. For example, the path represents the direct causal effect of U on Y and the path represents the indirect causal effect of U on Y. For simplicity, we will assume that T is a linear function in Z and U and that Y is linear in U and T.
Under assumptions of linearity, Wright  developed a method of path analysis to determine the correlation between two variables. This method initially labeled the arrows between variables with correlation coefficients, since they are the same in both directions. However, to assess causal effects in terms of how much one variable will change when another is changed by a given amount, regression coefficients need to be used.
Lunt et al.  and Pearl  explain that Wright’s method can be modified slightly to determine the relation between two variables in terms of regression coefficients. It has been shown that the value of a path between two variables is the product of the individual segments in the path and that the overall association between two variables is determined by the sum of all paths between the variables [17–19]. So in this case, if we were to perform a regression of Y on U (a hypothetical regression as U is unobserved), the coefficient for U would be equal to , where represents the value of the direct causal effect of U on Y, and represents the value of the indirect causal effect of U on Y through T (Figure 1).
If we wished to calculate the direct effect of U on Y, we would need to block the indirect path . Paths can be blocked either by controlling for a variable on the path or by having a “collider” on the path. For example, the path is blocked by the collider at T. If we were to regress Y on U and T, the coefficient for U would be .
Regression coefficients can also represent non-causal effects. For example, in Figure 1, we could regress T on U and the regression coefficient would be . However, we could also regress U on T. It is important to note that this regression coefficient will not be equal to , since its units will be whatever units U is measured in, whereas represents a change in the prevalence of treatment when U changes by one unit. For this reason, we let the regression coefficient represent the value of the path labeled , but in the reverse direction.
Causal diagrams enable us to identify and control for confounding effects. For example, in Figure 1 there are two paths from T to Y: and . The second path does not represent a causal effect since all of the arrows do not point in the forward direction. The coefficient represents the effect of T on Y through both of these paths. This coefficient does not represent the causal effect of T on Y, since it is confounded by U. If we block the path by controlling for U (e.g. including it in the regression equation), only the direct causal path from T to Y is left open, and the regression coefficient for T will represent the causal effect. Pearl [12, 19] and Lunt et al.  provide detailed discussions on path analysis and additional examples of Wright’s method in a structural equation setting.
2.2 Bias amplification from a causal diagram perspective
Figure 1 illustrates that the instrumental variable, Z, and the unmeasured confounder, U, are marginally independent since the path between Z and U is blocked by the collider, T. Once we condition on T, which we do when we estimate the treatment effect, we induce a correlation between Z and U which is represented by the dashed arrows between Z and U in Figure 2(a). The dashed lines illustrate that this association is induced and non-causal. The two arrows between Z and U are used to illustrate that this induced association represents two pathways: (1) the indirect path in the direction from Z to U () and (2) the indirect path in the direction from U to Z (). The value of the path is represented by the term and can be interpreted as the regression coefficient for Z in a regression of U on Z and T. Similarly, the value of the path is represented by the term and is interpreted as the regression coefficient of U in a regression of Z on U and T.
There are now four paths from treatment, T, to the outcome, Y: (1) the direct path , (2) the indirect path (confounding by U), (3) the indirect path (confounding by U and Z, created by the correlation between U and Z that occurs when conditioning on T), and (4) the indirect path (which also represents confounding by U and Z, created by the correlation between U and Z after conditioning on T). Were we to regress Y on T, the coefficient for T would represent the sum effect of these four paths. Formally, (1) (2) (3)
Each of the terms in eq. (3) represents one of the four pathways from T to Y when we regress Y on T. The first term, , is the direct effect of T on Y and represents the unconfounded treatment effect. The second term in eq. (3), , represents the value of the path . The third term, , represents the value of the path . The last term, , represents the value of the path . The sum of the last three terms in eq. (3) represents the bias in the treatment effect when not controlling for the instrument, Z, and can be expressed as (4)
After controlling for Z, all of the pathways between T and Y that occur through Z are blocked (Figure 2(b)). There are now only two paths between treatment and the outcome: (1) the direct path and (2) the confounding path . A regression of Y on T and Z can then be calculated as the sum of the values of these two pathways. (5) (6) (7)In eq. (7), represents the value of the direct effect of T on Y and the term represents the value of the indirect path . The bias in the treatment effect after controlling for the instrument, Z, can then be expressed as (8)
Pearl  shows that , where is the correlation between Z and T. Since , with equality only if . In other words, not controlling for the instrument, Z, reduces the overall bias in the treatment effect that is caused by the unmeasured confounder, U. When we do not condition on Z, we allow for an induced correlation to occur between Z and U upon conditioning on T. This induced correlation results in additional confounding (an induced confounding) that is of smaller magnitude and in the opposite direction than the confounding caused by U alone. Therefore, the overall confounding in the treatment effect is less than the confounding caused only by U. Controlling for Z eliminates the induced confounding and returns the total confounding to the value it would have been in the absence of the instrument.
Pearl  provides an intuitive way of describing this effect which is discussed by Myers et al. . According to Pearl, not conditioning on an instrumental variable allows the instrument to account for part of the change or variation in the treatment variable. This reduces the amount of variation in the treatment variable that is explained by the unmeasured confounder, thereby reducing the total amount of residual confounding caused by the unmeasured confounder.
While we have described bias amplification in terms of controlling for an instrumental variable, the same arguments apply when controlling for any variable affecting treatment (e.g. measured confounders). Controlling for a measured confounder, X, will remove the confounding due to X, but also eliminate the induced reduction in confounding by U. This may increase or decrease the overall confounding, depending on the relative strengths of these two confounding effects. For a detailed discussion on bias amplification, we refer the reader to Pearl . The purpose for this discussion is to emphasize and illustrate that bias amplification results from controlling induced correlations that occur between instrumental variables (or measured confounders) and unmeasured confounders upon conditioning on treatment. Viewing bias amplification from this perspective helps to discuss ways in which DRSs can be estimated to avoid controlling these induced correlations while simultaneously controlling for the direct effects of measured confounders on the outcome.
3 Balancing properties of DRSs and bias amplification when conditioning on vs
3.1 Covariate balance when conditioning on prognostic scores
In medical research investigators are often interested in comparing a treatment to a comparator therapy or no treatment. Following Rubin’s description of the counterfactual framework [20, 21], let represent the potential response had the individual received treatment and the potential response had the individual remained untreated. In practice, only one of the potential outcomes is observed. Let Y represent the observed outcome and T a dichotomous treatment. Further, let X represent a set of measured baseline covariates.
Hansen  shows that a prognostic score, or DRS, acts as a “prognostic balancing” score in that measured covariates are independent of the potential outcome under no treatment upon conditioning on the DRS. Formally, a function of X, , is a DRS if it satisfies , where denotes the independence of random variables and denotes conditional on. Prognostic balance differs from “propensity balance” since conditioning on the PS renders covariates independent of treatment assignment, not the outcome. Hansen explains that if the conditional expectation of given X follows a generalized linear model, then one possible DRS is the function . The DRS can also be a multi-dimensional function. For example, if follows a linear model where the variance of is nonconstant in X, then the function and the variance function together constitute a DRS . Hansen further shows that under the assumption (i.e. treatment assignment is weakly ignorable or no unmeasured confounding), subclassification on a DRS, , is sufficient to satisfy and allow for identification of the treatment effect in the treated.
3.2 Bias amplification when conditioning on
Consider the causal scenario illustrated in Figure 3(a) where T represents a treatment, X a measured confounder, U an unmeasured confounder, and Y the outcome. For simplicity, we will keep the same linearity and distributional assumptions discussed previously when describing Figure 1 with the exception that we will relax the linearity assumption between X and Y and assume (9)where represents a non-linear function in X (e.g. ). The direct effect of X on Y can then be represented by as shown in Figure 3(a).
The reason for not assuming linearity between Y and X is because if the DRS is a linear function of only one covariate, X, then each DRS value could be mapped to a unique value of X and conditioning on the DRS would essentially be equivalent to conditioning on X itself. When the DRS is non-linear in X or a function of two or more covariates, however, this is generally not the case. While the previous discussion on bias amplification assumed a linear structural equation framework, Pearl [12, 22] extends the discussion of bias amplification to non-linear models and shows that similar arguments for describing bias amplification can apply in many non-linear settings.
When discussing confounding control through DRSs, it is helpful to illustrate prognostic balance using DAGs that include the potential outcome under no treatment, . Richardson and Robins  have formalized the unification of DAGs with counterfactual theory and potential outcomes. Figure 3(b) shows the causal relations between the covariates X, U, and T with . Because is defined as the potential outcome had everyone remained untreated, the observed treatment variable, T, does not affect . Because we have assumed that Y is linear in T, the direct effects of X and U on are the same as those for the observed outcome, Y (Figure 3).
For this causal scenario, one possible DRS for the measured covariate X is the function . It is straightforward to show that is equivalent to modeling the direct relationship between X and which, in this case, is represented by the function in eq. (9). Therefore, conditioning on blocks the direct pathway between X and as illustrated in Figure 4(a): it can be thought of as blocking the arrow labeled from the figure, resulting in being independent of X (i.e. ).
If we condition on T, we again induce a correlation between U and X that is not modeled in the DRS function, . So, if we regress on T and , there are three pathways between T and : (1) the path , (2) the path , and (3) the path (Figure 4(b)). Paths (2) and (3) partially counter the effect of confounding by path (1) as discussed previously. Therefore, the bias in the estimated treatment effect when conditioning on is reduced compared to the bias that would result had we controlled for X directly since conditioning on X blocks paths (2) and (3) while conditioning on does not (Figure 4(b)).
3.3 Bias amplification when conditioning on
In practice, is only observed for the untreated population, and the DRS must be estimated indirectly using the observed outcome Y. This has primarily been done in one of two ways. The first is to fit a regression model to the untreated within the study cohort and then use this model to predict the disease risk for all individuals within the full cohort. The second is to fit a regression model to the full cohort (i.e. both treated and untreated) as a function of baseline covariates and treatment and then estimate the disease risk for each individual after setting treatment status to untreated [3, 5–7, 9, 24].
Both of these estimation strategies attempt to estimate indirectly by estimating the function . If there were no unmeasured confounding, then it is easy to show that . In the presence of unmeasured confounding, however, and does not have the properties of a prognostic score (or DRS) as defined by Hansen . Further, in the presence of unmeasured confounding, conditioning on results in bias amplification. This is because the function conditions (or restricts) on and, therefore, the regression coefficient for X in this function will not only represent the direct effect of X on Y but also the indirect correlation that is induced between X and Y through U when conditioning on . Consequently, conditioning on not only controls for the confounding due to X but also controls for the induced correlation between X and Y through U and, therefore, removes the reduction in confounding in U that is caused by this induced correlation. Hence, the total confounding effect in this case is (eq. 8) which is greater than the overall confounding had we conditioned on .
3.4 Simulation illustrating bias amplification when conditioning on vs
For illustrative purposes, we simulated a causal structure consisting of a dichotomous treatment, T, a standard normal measured confounder, X, a standard normal unmeasured confounder, U, and a normally distributed outcome, Y. The conditional expectation of treatment and outcome were simulated according to eqs (10) and (11). We varied the strength of the effect of the measured confounder, X, on treatment () while holding the other parameters constant. (10) (11)
The parameter values in eqs (10) and (11) were chosen simply for illustrative purposes, and other values could be used. In addition to the scenarios described in eqs (10) and (11) where U acts as an unmeasured confounder, we also conducted simulations where U had no effect on T or Y (no unmeasured confounding) to illustrate that traditional estimation strategies for the DRS can successfully control for measured confounders and result in unbiased effect estimates in the absence of unmeasured confounding.
We estimated the treatment effect after conditioning on and again after conditioning on . Because the data were simulated, was known by design. We estimated by fitting an outcome model within the untreated cohort and then used this model to predict disease risk for each individual within the full cohort. For comparison, we also estimated the treatment effect after adjusting for X directly within a traditional outcome regression model.
Results in Table 1 show the bias in the unadjusted treatment effect as well as the bias after adjusting for X directly in an outcome regression model, or adjusting for X by conditioning on or . When there is no unmeasured confounding and X does not affect treatment (OR = 1.0), both the unadjusted and the adjusted effect estimates are unbiased as expected since there is no confounding due to either U or X. When there is no unmeasured confounding and X has an effect on treatment (X is a measured confounder), adjusting for X directly in an outcome regression model, or conditioning on either or , successfully controls for the confounding caused by X, resulting in approximately unbiased effect estimates (Table 1).
When U acts as an unmeasured confounder and X has no effect on T (OR = 1), the unadjusted and adjusted bias is 0.44. This bias represents the magnitude of confounding due to the unmeasured confounder, U. As the strength of the effect of X on T increases, the unadjusted bias also increases as expected (Table 1). When we condition on , we control for the confounding that is caused by X without controlling the reduction in confounding in U that is due to the induced correlation between X and U as discussed previously. This reduction in confounding that is caused by the induced correlation between X and U is illustrated by the reduction in bias for when X affects treatment (Table 1).
When U acts as an unmeasured confounder and we condition on or control for X directly in an outcome regression model, we remove the confounding that is caused by X, but we also control the reduction in confounding in U that is due to the indirect correlation between X and U as discussed previously. Therefore, the overall confounding in the estimated treatment effect is the same had this indirect correlation not existed at all (i.e. X did not affect T to begin with). While bias amplification is the common term used to describe this event, this process can be thought of as a bias reduction that occurs when not controlling indirect correlations that are induced between predictors of treatment and unmeasured confounders.
We use this example to emphasize two aspects of bias amplification: (1) IVs and measured confounders can reduce the magnitude of bias caused by unmeasured confounders when the induced correlations between predictors of treatment and unmeasured confounders are not controlled and (2) prognostic scores (or DRSs) can control for the confounding effects of measured confounders while at the same time reduce bias amplification.
4 Estimation strategies for the DRS to reduce bias amplification
4.1 Challenges with same-sample estimation
Under the assumption of no unmeasured confounding, the traditional DRS estimation strategies discussed previously can result in accurate estimates for the DRS. Even in settings with no unmeasured confounding, however, implementing these strategies in practice can be challenging due to limitations when modeling the DRS within the study cohort. Fitting the DRS to the full cohort benefits from increased sample size, but can be particularly sensitive to model misspecification due to the additional assumptions required for correctly modeling the relation between the treatment and outcome . Hansen  explains that even small misspecifications in the specified DRS model can result in the treatment effect being associated with the estimated scores. This non-ancillarity in the estimated DRSs can introduce additional bias when used for confounding control [3, 9].
Fitting the DRS only within the untreated cohort, however, increases the potential for overfitting the model which can result in misspecified DRSs [3, 7]. In particular, if the DRS is estimated from the untreated or control group within the study data, this estimated DRS model will be overfit in this referent group, leading to over-estimation of risk in the control group for higher risk patients and under-estimation of risk in the control group for lower risk patients [3, 7]. Since the DRS is often and appropriately used for stratified treatment comparisons, such within study estimation can introduce bias in these comparisons. While bias due to overfitting the DRS model may be small in some settings (e.g. the bias due to overfitting the DRS model in the simulations described in Section 3.4 was very small which is likely due to the very large sample size used in the simulations), Hansen  describes scenarios and gives an example involving smaller sample sizes where overfitting the DRS model can potentially be problematic. Because factors affecting disease risk are likely to be constant over time and across populations, recent strategies that use out-of-sample estimation have been proposed to potentially overcome these shortcomings.
Both Hansen  and Glynn et al.  have proposed that disease risk can be accurately estimated from either a separate population, or the same population but with historical data from a period prior to the current study period. Hansen explains that estimating the DRS within an alternate sample of controls can potentially avoid the complications of overfitting that can occur when using same-sample estimation. Glynn et al.  suggest that estimating the DRS within historical data prior to treatment introduction can be particularly advantageous in pharmacoepidemiologic studies that use large administrative healthcare databases to evaluate newly introduced treatments and evolving drug therapies. Estimation with historical data from the same data collection process increases the comparability of subject surveillance and variable definitions between the DRS model development and application samples and limits overfitting associated with the application of a prediction model to the same data used in its development.
4.2 Estimating disease risk within historical data prior to treatment introduction
A potential advantage of estimating the DRS within historical data prior to treatment introduction that has not been discussed is the potential to reduce bias amplification in the presence of unmeasured confounders. Recall that bias amplification results from controlling indirect associations between predictors of treatment and unmeasured confounders that are induced upon conditioning on treatment. When using historical data prior to treatment introduction to estimate the DRS, we can avoid controlling these induced associations since we avoid having to condition on treatment when estimating the DRS.
For example, consider the same causal structure discussed previously with the exception that treatment has not been introduced (Figure 5(a)). Assuming the effects of covariates on the observed outcome are constant across populations, it is straightforward to show that is equal to , where represents the observed outcome within the historical population prior to treatment introduction. Therefore, conditioning on within the original study cohort where treatment is present satisfies by blocking the direct effect of X on without controlling the indirect association between X and through U (Figure 5(b)).
4.3 Estimating disease risk in populations with reduced treatment prevalence
Although the theoretical aspects of using historical data to estimate the DRS are promising, the implementation of this strategy is limited to situations where historical data are available. For studies conducted within administrative healthcare databases, this strategy can further be challenging when covariate assessments and coding practices change over time. To address these complications we propose an alternative DRS estimation strategy for studies conducted within large administrative databases: estimate the DRS in populations where the treatment prevalence is reduced compared to the original study population.
In pharmacoepidemiologic and medical studies, many restrictions are often placed on study participation to increase the comparability of treatment groups. For example, study participation is often restricted to new-users of a given treatment (or treatments) after a specified washout period to reduce possible biases caused by healthy users . The new-user design and other restriction criteria are necessary to reduce bias when using non-experimental study designs to evaluate medical treatments. However, many restrictions placed on study participation when estimating the treatment effect may be unnecessarily restrictive when estimating the DRS. If the effects of covariates on disease risk are stable across populations, then individuals from outside the defined study cohort can potentially be used when estimating the DRS. Including external data increases the volume of outcome events which can be advantageous when attempting to model rare outcomes as a complex function of large sets of covariates. Further, including external data when estimating the DRS can potentially allow researchers to reduce bias amplification by lowering the prevalence of treatment in the population where the DRS is estimated.
The strength of induced correlations between predictors of treatment and unmeasured confounders that occur when conditioning on treatment is a function of the prevalence of treatment within the population. Reducing the prevalence of treatment reduces the strength of these induced correlations (see Appendix for a detailed explanation). For example, consider two populations with similar causal structures with the exception that the prevalence of treatment is lower in one of the populations (Figure 6(a) and 6(b)). Let and represent the observed outcome and treatment in the population with reduced treatment prevalence. The induced correlation between X and U that occurs when conditioning on treatment will be weaker within the population with reduced treatment prevalence compared to the original cohort (Figure 6(a) and 6(b)).
If we estimate the DRS within the population with reduced treatment prevalence, we will be estimating a function that is equal to . The regression coefficient for X in this function represents the direct effect of X on as well as the indirect association between X and that occurs through U when conditioning on . This coefficient correctly represents the direct effect of X on Y within the original study cohort since the effect of X on the outcome is constant across populations. However, this coefficient also underrepresents the strength of the induced association that X has with Y through U in the original population since the induced correlation between X and U is weaker in the population with reduced treatment prevalence where the DRS was estimated. Therefore, conditioning on within the original study population controls for the confounding caused by X, but only partially controls the induced correlation between X and U within the original study cohort. This correlation, in turn, reduces the magnitude of confounding caused by U as discussed previously.
4.4 Simulation study: a simple illustrative example
To illustrate, we simulated the same population described in eqs (10) and (11) which has a baseline treatment prevalence of 50%. In this example, we held the effect of X on T constant at an OR. We estimated the treatment effect within this population after conditioning on , where was again estimated by fitting an outcome model within the untreated cohort and then used to predict disease risk for each individual within the full cohort. We then simulated four populations similar to the original population with the exception that we reduced the prevalence of treatment. We estimated the treatment effect within the original study cohort (simulated population with baseline treatment prevalence of 50%) after conditioning on DRSs that were estimated within the populations with reduced treatment prevalence.
Results in Table 2 illustrate that bias amplification can be reduced by estimating the DRS in a population where the prevalence of treatment is reduced compared to the original study cohort. Results further indicate that this reduction in bias is a function of the baseline prevalence of treatment in the population where the DRS is estimated. This simple example is not intended to be reflective of any practical settings, but is used to illustrate that conditioning on DRSs that are estimated in populations with reduced treatment prevalence only partially controls the induced confounding pathway (i.e. the induced correlation) that occurs between predictors of treatment and unmeasured confounders upon conditioning on treatment. This, in turn, reduces the overall confounding in the estimated treatment effect.
4.5 Challenges when comparing treatment to alternative therapies
In the previous discussion and simulated examples, we compared treated to untreated individuals to simplify the discussion. While comparing treated populations to no treatment or placebo is a question of interest in many studies, comparing treatments to alternative therapies (comparative effectiveness) is often more relevant for clinical decision making in pharmacoepidemiology. Comparing treatment to alternative therapies has many advantages with respect to avoiding the potential for confounding bias, but presents additional challenges when attempting to reduce bias amplification through the proposed DRS estimation strategies. This is because we can no longer avoid conditioning on treatment when estimating the DRS.
For example, if represents a comparator treatment and the potential outcome under , then the DRS is defined as not , where represents the baseline disease risk had individuals remained untreated (i.e. did not receive or ). Conditioning on induces prognostic balance for by satisfying , but does not necessarily satisfy . Conditioning on when comparing alternative treatment therapies, therefore, requires the additional assumption that treatment effect heterogeneity for the comparator treatment is similar across populations that have the same distribution of baseline disease risk.
Because the DRS does not balance covariates across treatment groups, two groups that have the same distribution of baseline disease risk (i.e. ) do not necessarily imply that those treatment groups will have the same distribution of disease risk under the comparator treatment (i.e. ). Differences can arise because different covariate distributions across the two treatment groups can potentially result in differences in treatment effect heterogeneity for the comparator treatment across the two populations. If there is no treatment effect heterogeneity, or it is reasonable to assume that treatment effect heterogeneity for the comparator treatment is similar across populations with the same distribution of baseline disease risk, then treatment groups with the same distribution of baseline disease risk will also have the same distribution of disease risk under the comparator treatment. Under this assumption, the proposed estimation strategies for baseline disease risk (i.e. ) can potentially reduce bias amplification when comparing alternative treatment therapies while at the same time control for the confounding effects of measured confounders.
5 Empirical example: COX-2 inhibitors vs NSAIDs
We compared new-users of the COX-2 inhibitor celecoxib to new-users of non-selective NSAIDs (ibuprofen or diclofenac) in reducing gastrointestinal (GI) complications over a 60-day follow-up period. New-users were defined as individuals who initiated a celecoxib or NSAID between the years 2006 and 2008 after having no prescription of these medications during a 6-month washout period. GI complication was defined as a hospitalization for a GI hemorrhage or peptic ulcer disease or an outpatient visit for a GI hemorrhage. Analyses were performed in the MarketScan Commercial Claims and Encounters and Medicare Supplementary and Coordination of Benefit database (Truven Healthcare, Inc.), a large US employer-based insurance claims database. The database contains patient billing information for in- and outpatient procedures and diagnoses, pharmacy medication dispensing, and enrollment information for enrolled employees, spouses, dependents, and retirees. We included all individuals who were continuously enrolled in MarketScan for at least 12 months prior to drug initiation. All demographic and clinical covariates were defined during the 6 months prior to drug initiation and are described in Table 3.
We chose this example for a few reasons. First, observational studies comparing COX-2 inhibitors to NSAIDs using claims data are known to contain some degree of unmeasured confounding due to a lack of information on important confounding factors (e.g. smoking status, body mass index, alcohol consumption, etc.) [16, 25]. Further, previous studies have shown strong confounding by indication when comparing COX-2 inhibitors to NSAIDs in preventing GI complication [26, 27]. The likelihood for confounding by indication in this setting is supported by strong differences in the distribution of baseline covariates across treatment groups shown in Table 3. Finally, we can use results from clinical trials as a benchmark when evaluating the performance of various methods for DRS estimation [28, 29].
The effect of initiating celecoxib vs NSAIDs was estimated using a Cox proportional hazards model after stratifying on the estimated DRS. We also estimated the treatment effect after stratifying on the estimated PS for comparison. We estimated the DRS in two different populations. We first estimated the DRS (DRS1) within the comparator group of NSAID new-users (ibuprofen or diclofenac). We also estimated the DRS (DRS2) in a population of untreated individuals (celecoxib and NSAID non-users). Because surveillance between non-initiators and new-users can differ, we identified a group of healthcare-seeking individuals who had an outpatient physicians visit after a 6-month washout period where no celecoxib or NSAID was used. We required that individuals have an outpatient physicians visit in order to minimize potential differences in covariate surveillance between non-initiators and new-users. For each of the DRS models, we calculated the c-statistic and Hosmer–Lemeshow test statistic within a validation set of controls (i.e. a subset of NSAID new-users) to evaluate the discrimination and calibration of the fitted DRS model. Individuals were censored if they lost any part of coverage during follow-up.
PS stratification and both estimation methods for the DRS gave similar results in terms of the estimated hazard ratio (Table 4). Although both PS and DRS stratification moved the effect estimate in the expected direction, none of these methods resulted in effect estimates that are consistent with clinical trials which have shown COX-2 inhibitors to have a significant protective effect against GI complications compared to non-selective NSAIDs [28, 29]. These discrepancies indicate the existence of strong unmeasured confounding. In this example, we did not see an improvement in a reduction of bias amplification when estimating the DRS within an untreated population where the prevalence of both treatments (celecoxib and NSAIDs) were reduced. Estimating the DRS within NSAID new-users (DRS1) resulted in better model fit with improved discrimination (c-statistic) and calibration (Hosmer–Lemeshow test) compared to estimating the DRS within an untreated population (DRS2).
We have given a concise overview of bias amplification and have discussed how bias amplification results from controlling induced correlations that occur between predictors of treatment and unmeasured confounders when conditioning on treatment. We have further discussed ways in which bias amplification can potentially be reduced using alternative estimation strategies for the DRS. Estimating the DRS in an outside population using either historical or contemporary data allows researchers to potentially reduce bias amplification caused by both instrumental variables and measured confounders. In theory, alternative estimation strategies for the DRS provide a way to avoid conditioning on measured instrumental variables and may reduce the cost of controlling for measured confounders that have strong effects on treatment and weak effects on the outcome.
Although estimating the DRS in alternative populations has the potential to reduce bias amplification, applying these estimation strategies within administrative claims databases has limitations as illustrated in the empirical example comparing new-users of the COX-2 inhibitor celecoxib to new-users of non-selective NSAIDs. As discussed previously, estimating the DRS in untreated individuals outside of the defined study cohort has the potential for covariate and outcome misclassification due to differential surveillance of individuals within the database. Further, estimating the DRS within populations outside the study cohort requires the assumption that the effects of baseline covariates on the outcome are not modified in the population where disease risk is estimated. Estimation of disease risk in a lower risk population (non-initiators) can lead to problems with extrapolation of the DRS and possibly non-linear associations with risk. Additional research is needed to evaluate the performance of DRSs when using external data to estimate the DRS in settings specific to pharmacoepidemiology and large database research.
This work was supported by a grant (R01 AG023178) from the National Institute on Aging. The authors also thank the Editor and reviewers for their helpful comments.
Consider the causal scenario described in Figure 6(a) where is the regression coefficient of X in a regression of U on X and T. Let represent the same regression coefficient, but within a population with reduced treatment prevalence compared to the original study population (Figure 6(b)).
In the above equations, and represent the correlation between U and X and the correlation between U and X after conditioning on T, respectively. The term is the standard deviation of U conditional on T. Other terms can be interpreted similarly. Since approaches 0 as the prevalence of treatment approaches 0, it follows that .
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55. [Crossref]
Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariate methods. J Clin Epidemiol 2006;59:437–47. [PubMed]
Arbogast PG, Ray WA. Performance of disease risk scores, propensity scores, and traditional multivariable outcome regression in the presence of multiple confounders. Am J Epidemiol 2011;174:613–20. [Web of Science] [Crossref] [PubMed]
Tadrous M, Gagne JJ, Stürmer T, Cadarette SM. Disease risk score as a confounder summary method: systematic review and recommendations. Pharmacoepidemiol Drug Saf 2013;22:122–9. [Web of Science] [PubMed] [Crossref]
Leacy FP, Stuart EA. On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: a simulation study. Stat Med 2013 [Epub ahead of print]. [Web of Science] [PubMed]
Bhattacharya J, Vogt WB. Do instrumental variables belong in propensity scores? NBER technical working paper no. 343. National Bureau of Economic Research, Cambridge, MA, 2007.
Wooldridge J. Should instrumental variables be used as matching variables? Technical report. East Lansing, MI: Michigan State University, 2009.
Pearl J. On a class of bias-amplifying variables that endanger effect estimates. In: Proceedings of the twenty-sixth conference on uncertainty in artificial intelligence, AUAI, Corvallis, OR, 2010:417–24.
Myers JA, Rassen JA, Gagne JJ, Huybrechts KF, Schneeweiss S, Rothman KJ, et al. Effects of adjusting for instrumental variables on bias and precision of effect estimates. Am J Epidemiol 2011;174:1213–22. [PubMed] [Crossref] [Web of Science]
Brookhart MA, Stürmer S, Glynn RJ, Rassen J, Schneeweiss S. Confounding control in healthcare database research: challenges and potential approaches. Med Care 2010;48:S114–20. [Crossref] [PubMed] [Web of Science]
Wright S. Correlation and causation. J Agric Res 1921;20:557–85.
Lunt M, Avorn J, Glynn RJ, Rothman KJ, Stürmer T. Propensity score calibration in the absence of surrogacy. AJE 2012;175:1294–302. [Crossref]
Pearl J. Linear models: a useful microscope for causal analysis. J Causal Inference 2013;1:155–70.
Rubin D. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol1974;66:688–701. [Crossref]
Pearl J. Causality: models, reasoning, and inference, 2nd ed. New York, NY: Cambridge University Press, 2009.
Richardson TS, Robins JM. Single world intervention graphs (SWIGS): a unification of the counterfactual and graphical approaches to causality. Working paper number 128. Center for Statistics and the Social Sciences, 2013.
Miettinen OS. Stratification by a multivariate confounder score. Am J Epidemiol 1976;104:609–20. [PubMed]
Brookhart MA, Wang PS, Solomon DH, Schneeweiss S. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology 2006;17:268–75. [Crossref] [PubMed]
Patino FG, Allison J, Olivieri J, Mudano A, Juarez L, Person S, et al. The effects of physician specialty and patient comorbidities on the use and discontinuation of coxibs. Arthritis Rheum 2003;49:293–9. [Crossref] [PubMed]
Silverstein FE, Faich G, Goldstein JL, Simon LS, Pincus T, Whelton A, et al. Gastrointestinal toxicity with celecoxib vs nonsteroidal anti-inflammatory drugs for osteoarthritis and rheumatoid arthritis: the CLASS study: a randomized controlled trial. Celecoxib long-term arthritis safety study. J Am Med Assoc 2000;284:1247–55. [Crossref]
Bombardier C, Laine L, Reicin A, Shapiro D, Burgos-Vargas R, Davis B, et al. Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. VIGOR study group. N Engl J Med 2000;343:1520–8. [Crossref]