Abstract:
While standard metaanalysis pools the results from randomized trials that compare two treatments, network metaanalysis aggregates the results of randomized trials comparing a wider variety of treatment options. However, it is unclear whether the aggregation of effect estimates across heterogeneous populations will be consistent for a meaningful parameter when not all treatments are evaluated on each population. Drawing from counterfactual theory and the causal inference framework, we define the population of interest in a network metaanalysis and define the target parameter under a series of nonparametric structural assumptions. This allows us to determine the requirements for identifiability of this parameter, enabling a description of the conditions under which network metaanalysis is appropriate and when it might mislead decision making. We then adapt several modeling strategies from the causal inference literature to obtain consistent estimation of the intervention-specific mean outcome and model-independent contrasts between treatments. Finally, we perform a reanalysis of a systematic review to compare the efficacy of antibiotics on suspected or confirmed methicillin-resistant Staphylococcus aureus in hospitalized patients.
1 Introduction
While individual studies are rarely used to inform scientific or medical decision making [1], multiple sources of evidence may be aggregated in order to offer more generalizable and precise comparisons between treatments [2, 3, 4, 5]. Metaanalysis, which is the statistical synthesis of multiple study results, is often considered the highest form of quantitative evidence due to its ability to combine all relevant information in the scientific literature. However, because of such issues as effect heterogeneity across study populations and methodology that does not necessarily account for all sources of bias, the status of metaanalysis as the “gold standard” of medical knowledge has been questioned [6].
Standard metaanalysis compares two treatments of interest (or, for instance, an active treatment and placebo). When many treatments for a common condition are tested and made available over time, the medical literature may then contain multiple randomized controlled trials (RCTs) with various treatment comparisons on potentially different populations. Without additional guidance, clinicians and patients are left to informally synthesize information in the available studies in order to determine an optimal treatment decision. A network metaanalysis statistically aggregates the results from the relevant RCTs in order to obtain an estimate of the contrast between each pair of treatments. In particular, this type of analysis can produce estimates of contrasts even when no RCT directly compared the two treatments of interest.
Each RCT in the network may be performed on populations that differ in terms of their baseline characteristics. These population-specific variables may affect the average response to treatment so that, in order to combine inference involving the means, it might be beneficial to control for such variables [7]. Furthermore, it has been noted that if these characteristics not only differentially affect response to treatment but also affect the initial study design choice of which treatments to compare, then these variables may confound the overall effect estimate [6, 8]. As an example, Jansen et al. [8] suggest that the baseline severity of patients recruited into a study can be related to the type of treatments investigated in the study and also affect the average outcome at the end of the study. As we demonstrate in this paper, such “study-level confounding” must be adjusted for in order to obtain consistent estimation of average treatment effects.
In this paper, we consider the setting where individual patient data are not available so that the observed data are limited to average covariate and outcome values in addition to study-level information (which we refer to as “aggregate” or study-level data). We begin by describing past parametric approaches to network metaanalysis where the parameter of interest is dependent on the model specification and where the absence of effect heterogeneity is often required a priori. Using the counterfactual framework, we propose a novel definition of a marginal and model-independent causal parameter of interest in network metaanalysis and delineate the assumptions required to estimate this parameter in the presence of measured study-level confounders. We are then able to clarify conditions under which a network metaanalysis is appropriate and when it might mislead decision making regardless of the estimation method used. We describe several marginal estimation methods adapted from the single-study causal inference setting, including a doubly robust and semiparametric locally efficient Targeted Maximum Likelihood Estimator, and then compare these methods in a simulation study. Finally, we perform a reanalysis of the systematic review by Bally et al. [9] to compare the efficacy of antibiotics on suspected or confirmed methicillin-resistant Staphylococcus aureus (MRSA) in hospitalized patients.
2 The observed data
Each RCT is assumed to randomly sample subjects from a wider population, called a superpopulation. Within the RCT, randomization assigns subjects to two or more groups, each one receiving a treatment. These groups are often referred to as treatment arms. Due to randomization and random sampling, each group is a representative sample from the superpopulation. Therefore, each arm can be thought of as a distinct study on the same superpopulation. The superpopulations targeted by the RCTs may differ in terms of their characteristics due to, for example, each trial’s physical and temporal location, the individual inclusion and exclusion criteria, and the recruitment sample size targets. Therefore, if effect heterogeneity exists (i.e., if the relative treatment effects at the subject level depend on baseline covariate values), one would not expect the average relative treatment effects to necessarily be equal across superpopulations.
More formally, the superpopulation is the conceptual group of essentially infinite size from which the study sample is selected [10]. A measure of some outcome (
Let
Because we are interested in summarizing effects across multiple superpopulations, we are arguably attempting to estimate effects in a metapopulation that contains the individual superpopulations from each study. For the purpose of this paper, we define the metapopulation as the union of possible study superpopulations and define our parameters of interest with respect to this metapopulation. In particular, we assume that the individual
3 Past approaches to network metaanalysis
Standard approaches in network metaanalysis where only aggregate data are observed place a hierarchical model on either the studyspecific contrasts (e. g. the difference in means,
The effect targeted in a hierarchical model depends on the contrast type chosen and the parametrization of the model, and may or may not correspond to a marginal effect as we define further on. For binary outcomes, due to the noncollapsibility of the logistic regression model [19] in particular, adjustment for covariates in such a model changes the true value of the “effect” parameter being estimated. This type of modeling strategy may therefore be biased for the estimation of a marginal effect. Even in linear models, the inclusion of treatment interactions with covariates can also bias the value of the coefficient of treatment relative to the marginal effect. Zhang et al. [12] and Zhang et al. [20] take a missing data perspective and model the arm-specific outcomes using a Bayesian hierarchical model to estimate marginal parameters. While neither approach has yet been extended to incorporate covariates, the former paper assumes that treatments are applied to studies at random while the latter allows for estimation in a not-at-random context by explicitly specifying the unobservable selection mechanism.
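The noncollapsibility mentioned above can be seen in a small numerical sketch. The risks below are hypothetical values chosen for illustration (they do not come from any study discussed here): treatment is randomized within two equally sized covariate strata, so the covariate is not a confounder, yet the marginal odds ratio differs from the common conditional odds ratio.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# Hypothetical outcome risks in two equally sized covariate strata.
# Treatment is randomized, so the covariate does not confound the contrast.
risk = {
    "stratum_1": {"control": 0.2, "treated": 0.5},
    "stratum_2": {"control": 0.5, "treated": 0.8},
}

# Conditional (stratum-specific) odds ratios.
conditional_or = {
    s: odds(r["treated"]) / odds(r["control"]) for s, r in risk.items()
}

# Marginal risks average over the two strata with equal weight.
marginal_treated = sum(r["treated"] for r in risk.values()) / len(risk)
marginal_control = sum(r["control"] for r in risk.values()) / len(risk)
marginal_or = odds(marginal_treated) / odds(marginal_control)

print(conditional_or)         # both strata have an odds ratio of about 4
print(round(marginal_or, 2))  # the marginal odds ratio is smaller (about 3.45)
```

The conditional and marginal odds ratios answer different questions, which is why a covariate-adjusted logistic coefficient should not be read as an estimate of a marginal effect.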
While adjustment for covariates is rare in practice, Jansen et al. [8] introduced the notion of adapting Pearl’s causal directed acyclic graphs (DAGs) to this setting [21] in order to assist in covariate selection. As a general rule, Jansen et al. [8] advocate for the adjustment of all modifiers of the relative treatment effects across comparisons. They also discourage adjustment for covariates that are not effect modifiers because such adjustment may induce bias in the metaanalysis.
4 The counterfactual approach
Let
For two treatments,
The patient sample in any given study arm may not be representative of the metapopulation, for which the effect of interest is defined. In addition, because treatment was not randomly allocated across different RCTs, the collection of mean outcomes observed under a given treatment
4.1 A causal directed acyclic graph (DAG) for network metaanalysis
Similar to Alonso et al. [22], we assume that heterogeneity in the different superpopulations targeted in the individual RCTs implies that each RCT estimates a different causal effect. Like Zhang et al. [12], we take an “armbased” approach to the problem. Like Jansen et al. [8], we draw a causal DAG in order to conceptualize the relationship between treatment, study results, and populationspecific characteristics. We arbitrarily choose to intervene on the arm labeled
Many of the assumptions presented in detail in Section 4.3 are drawn explicitly using the studylevel DAGs in Figure 1(a). The nodes of the DAG represent variables measured at the level of the RCT and the arrows between them represent the effect of the parent on the child node. For example, the absence of an arrow from
Figure 1:
The sample size node
Causal DAGs can be used as a tool to identify which variables must be controlled for in the metaanalysis in order to estimate the treatmentspecific metapopulation mean outcome. Depending on some underlying statistical assumptions that we will investigate in detail in the following sections, these DAGs may simplify to Figure 1(b). This happens because we can ignore the mediation path through
Note that the recommendations based on this DAG differ from those of Jansen et al. [8], who say that the analysis must adjust exclusively for effect modifiers. The assumptions that we list in Section 4.3 are explicitly required in the steps we take in Section 4.2 in order to obtain identifiability of the metaanalysis parameter of interest.
4.2 The G-formula and nonparametric identifiability
Suppose we observe the aggregate data
4.2.1 The observed data generation
At the study design stage for RCT
The second stage operates at the individual level once subjects are recruited and randomly assigned treatment. Suppose each subject
Let
Assuming no interference between arms and that the distribution of
The probability density function
where
4.2.2 The counterfactual distribution
Define an intervention as the assignment of treatment strategy
where
4.2.3 Identifiability for conditionally independent $\bar{Y}$ and $S$
Suppose we have that
Identifiability without assuming this structural independence is possible, and we describe the additional causal assumptions required for this setting in Appendix A.2.
4.2.4 Identifiability for binary outcomes
If the original study outcomes are binary (such that
4.3 Assumptions
For convenience, here we list the assumptions needed for the identification of
No interference. The use of the above counterfactual notation presupposes that the treatment assigned to one study does not affect the counterfactual outcome of another study [25]. A secondary level of interference within an individual study involves the treatment in one study arm affecting the outcomes in another study arm. This means that the estimates
Unconfoundedness. (Weak) unconfoundedness [26] is required for the identification of
Consistency. The consistency assumption in this context states that the counterfactual mean of a study arm under a given treatment is the same as the observed result. With notation, this is equivalent to stating that
Positivity. Finally, we need to evaluate both theoretical and practical positivity. Theoretical positivity is the assumption that, conditional only on variables required for unconfoundedness, all studies had a positive probability of being assigned each treatment under investigation. Practical positivity is the condition that for every level of the characteristics
It is important to note that treatment comparisons are based on the same
It is furthermore important to note that the positivity assumption is not the same as requiring that all studies could have realistically been assigned each treatment. In particular, certain treatments may not have been available when some older trials were carried out. If year of study is not required to unconfound the analysis, then the unconditional probability may still be nonzero.
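Practical positivity can be screened before any model is fitted. The sketch below is only an illustration under assumed inputs: the arm list, the treatment labels, and the covariate levels "early" and "late" are hypothetical, and the helper `positivity_gaps` is not part of any library. It lists every combination of treatment and covariate level that has no supporting study arm.

```python
from collections import Counter

# Hypothetical study arms: (treatment label, study-level covariate level).
arms = [
    ("A", "early"), ("B", "early"),
    ("A", "late"), ("B", "late"), ("C", "late"),
]

def positivity_gaps(arms):
    """Return the (treatment, level) pairs that no arm supports."""
    counts = Counter(arms)
    treatments = sorted({t for t, _ in arms})
    levels = sorted({w for _, w in arms})
    return [(t, w) for t in treatments for w in levels if counts[(t, w)] == 0]

gaps = positivity_gaps(arms)
print(gaps)  # [('C', 'early')]
```

In this toy example, treatment C is never observed in the early period. This matters only if period is needed for unconfoundedness; if it is not, the unconditional probability of receiving C is nonzero and the gap is harmless.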
5 Estimation of the treatmentspecific metapopulation mean outcome
5.1 G-Computation
G-Computation procedures based on the G-formula in Section 4.2 can be used to estimate the target parameter. Here we define a simple procedure resulting from the data requirement that the sample mean and standard deviation are independent within a study arm. This procedure allows for simple frequentist estimation of the mean effect of treatment.
This procedure requires estimates for the conditional expectation
Because
A model for the regression on
The standard error for the G-Computation estimate is usually computed through nonparametric bootstrap methods [29]. Bootstrap resampling must be done by resampling studies, rather than arms, similar to what is done in a study with clustering [30].
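The following is a minimal sketch of G-Computation on aggregate data with a study-level bootstrap, written under several assumptions that are ours and not taken from the text: the outcome model is an unweighted linear regression of the arm means on treatment indicators and a single study-level covariate, predictions are averaged over studies with equal weight, and all function names and toy data are hypothetical. The key point it illustrates is that whole studies are resampled, so the arms of a trial always stay together.

```python
import numpy as np

def design(treat, w, n_treat):
    """Intercept, treatment dummies (reference level 0), and the covariate."""
    dummies = (treat[:, None] == np.arange(1, n_treat)[None, :]).astype(float)
    return np.column_stack([np.ones(len(treat)), dummies, w])

def gcomp(study, treat, w, ybar, n_treat):
    """Predicted mean outcome under each treatment, averaged over studies."""
    beta, *_ = np.linalg.lstsq(design(treat, w, n_treat), ybar, rcond=None)
    _, first = np.unique(study, return_index=True)  # one row per study
    w_study = w[first]
    return np.array([
        (design(np.full(len(w_study), a), w_study, n_treat) @ beta).mean()
        for a in range(n_treat)
    ])

def study_bootstrap_se(study, treat, w, ybar, n_treat, n_boot=200, seed=0):
    """Bootstrap SE that resamples studies, keeping each study's arms together."""
    rng = np.random.default_rng(seed)
    ids = np.unique(study)
    rows_by_study = {s: np.flatnonzero(study == s) for s in ids}
    draws = []
    for _ in range(n_boot):
        sampled = rng.choice(ids, size=len(ids), replace=True)
        rows = np.concatenate([rows_by_study[s] for s in sampled])
        # Relabel so that a study drawn twice counts as two separate studies.
        new_ids = np.repeat(np.arange(len(sampled)),
                            [len(rows_by_study[s]) for s in sampled])
        draws.append(gcomp(new_ids, treat[rows], w[rows], ybar[rows], n_treat))
    return np.std(draws, axis=0, ddof=1)

# Hypothetical noise-free data: six two-arm studies, three treatments (0, 1, 2).
study = np.repeat(np.arange(6), 2)
treat = np.array([0, 1, 0, 1, 0, 2, 0, 2, 1, 2, 1, 2])
w = np.repeat(np.array([0.0, 1.0, 0.0, 1.0, 0.5, 1.5]), 2)
effect = np.array([0.0, 0.5, 1.0])
ybar = 1.0 + effect[treat] + 2.0 * w

estimate = gcomp(study, treat, w, ybar, n_treat=3)
se = study_bootstrap_se(study, treat, w, ybar, n_treat=3, n_boot=50)
print(estimate, se)
```

With so few studies, a bootstrap sample can miss a treatment entirely; the least-squares fit then returns a minimum-norm solution and the prediction for that treatment is an extrapolation, which is one reason the bootstrap can be unstable for rare treatments.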
5.2 Inverse probability of treatment weighting
Likelihood methods, such as G-Computation, require correct parametric specification of the outcome model, which may be difficult to achieve. An alternative approach is to utilize propensity score methods, which require the estimation of a model for the treatment received by the arm. For a given treatment type
Despite the small sample size in standard network metaanalysis, one might attempt inverse probability of treatment weighting (IPTW) for the estimation of the marginal parameter. Let
Intuitively, this estimator takes a mean of
The consistency of this estimator can be shown as follows.
5.3 Targeted minimum loss-based estimation
Targeted Minimum Loss-based Estimation (TMLE) [31, 32] is a framework for the construction of semiparametric estimators generally applied to the estimation of causal quantities. The TMLE procedure is carried out by first fitting a model for the expected value of the arm-based means,
This TMLE is consistent under correct specification of the propensity score model or the model for the expected value of the mean outcome (the property of double robustness). If both of these models are correct, then TMLE is asymptotically efficient in the class of regular, asymptotically linear estimators in the semiparametric model space [32]. More details and a proof of consistency are included in Appendix A.3.
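To make the weighting and targeting ideas of Sections 5.2 and 5.3 concrete, the sketch below computes a normalised IPTW mean and a one-step TMLE for a single treatment from arm-level inputs. It is an illustration under stated assumptions rather than the procedure of Appendix A.3: the propensity scores and initial outcome predictions are taken as given, the TMLE uses a linear fluctuation with squared-error loss (a simplification we substitute here), arms are averaged with equal weight, and every name and number is hypothetical.

```python
import numpy as np

def iptw_mean(is_a, g_a, ybar):
    """Normalised inverse-probability-weighted mean of the arm outcomes."""
    weights = is_a / g_a
    return np.sum(weights * ybar) / np.sum(weights)

def tmle_mean(is_a, g_a, ybar, q_obs, q_a):
    """One-step TMLE of the mean under treatment a (linear fluctuation).

    q_obs: initial predictions at each arm's observed treatment.
    q_a:   initial predictions with every arm set to treatment a.
    """
    h_obs = is_a / g_a                    # clever covariate at observed treatment
    eps = np.sum(h_obs * (ybar - q_obs)) / np.sum(h_obs ** 2)
    q_a_star = q_a + eps / g_a            # clever covariate evaluated at A = a
    q_obs_star = q_obs + eps * h_obs      # updated fit at the observed treatment
    return q_a_star.mean(), q_obs_star, h_obs

# Hypothetical arm-level inputs for one treatment of interest.
is_a = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])    # arm received treatment a?
g_a = np.array([0.5, 0.5, 0.25, 0.25, 0.8, 0.8])   # assumed propensity scores
ybar = np.array([2.0, 1.0, 3.0, 1.5, 2.5, 1.2])    # observed arm means
q_obs = np.full(6, 2.0)                            # deliberately crude initial fit
q_a = np.full(6, 2.0)

iptw = iptw_mean(is_a, g_a, ybar)
tmle, q_obs_star, h_obs = tmle_mean(is_a, g_a, ybar, q_obs, q_a)
print(iptw, tmle)
```

After the update, the weighted residuals `h_obs * (ybar - q_obs_star)` sum to zero, which is the score equation that gives the targeted estimator its double robustness. Small propensity scores inflate both the weights and the clever covariate, which is consistent with the sensitivity to rare treatments noted elsewhere in the paper.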
6 Simulation study
In this section we demonstrate that we can obtain consistent estimation of the target parameter
While the proposed estimators do not restrict the number of study arms, we fix all simulated studies to have exactly two treatment arms for simplicity. We are interested in estimating the mean outcome of the metapopulation under treatment for each of four treatments of interest. For each study
Table: data-generating mechanism for the simulation study (variable names only).
Study design stage: number of arms; study-level covariate; treatments; sample size (study recruitment).
Within-study stage: subject-level covariate; subject-level outcome.
Observed data: study-level information.
The sample statistics from each study arm are calculated by taking the mean and standard deviation of
We tested the three methods described in the text (G-Computation, IPTW and TMLE) for
The unadjusted estimator was greatly biased for the first and third contrasts, indicating that those two contrasts were highly confounded by the simulated study-level covariate. The correctly specified G-Computation estimator had the lowest bias throughout, the smallest standard errors, and near-optimal confidence interval coverage. This is to be expected as G-Computation is a function of maximum likelihood parameter estimates with correct parametric specification of the necessary component of the likelihood (namely, the conditional mean of the outcome). However, with an incorrectly specified outcome model, the estimator was biased, which caused the coverage to suffer for the third contrast.
IPTW was the most biased estimator and also had the largest variance. The bias largely dissipated when the sample size was increased to
TMLE with correct outcome model specification had bias comparable to G-Computation but slightly higher for

Estimator  Bias (15)  Bias (50)  SEMC (15)  SEMC (50)  SEBS (% coverage) (15)  SEBS (% coverage) (50)
Correctly specified models
Contrast 1
GComp  0  0  0.04  0.02  0.04(91)  0.02(94)
IPTW  40  9  0.57  0.46  0.61(89)  0.41(92)
TMLE  4  0  0.27  0.03  0.44(96)  0.05(92)
Contrast 2
GComp  1  0  0.04  0.02  0.04(91)  0.02(95)
IPTW  -2  0  0.10  0.04  0.25(99)  0.05(97)
TMLE  -1  -1  0.17  0.04  0.29(96)  0.04(93)
Contrast 3
GComp  0  0  0.04  0.02  0.04(89)  0.02(93)
IPTW  81  3  0.76  0.74  0.62(63)  0.80(68)
TMLE  -9  0  0.81  0.11  0.74(94)  0.21(95)
Misspecified outcome model
Contrast 1
No adjustment  101  103  0.65  0.35  0.61(75)  0.34(52)
GComp  2  12  0.20  0.13  0.24(98)  0.11(94)
TMLE  -8  -2  0.33  0.09  0.46(97)  0.11(96)
Contrast 2
No adjustment  5  -7  0.37  0.20  0.38(92)  0.20(93)
GComp  -1  7  0.20  0.12  0.18(99)  0.11(95)
TMLE  0  0  0.15  0.05  0.28(99)  0.05(96)
Contrast 3
No adjustment  126  125  0.69  0.38  0.61(51)  0.36(18)
GComp  36  33  0.53  0.29  0.48(88)  0.24(80)
TMLE  44  -24  0.86  0.38  0.75(87)  0.36(75)
7 Application: Antibiotic use on methicillin-resistant Staphylococcus aureus infection
We illustrate this causal inference approach and the adapted estimation methods in network metaanalysis with an example from infectious disease research. An increase in MRSA has spurred investigation of the comparative efficacy of different antibiotic treatment options. While the antibiotic vancomycin has been the standard treatment for decades, treatment failures have been noted in patients with serious infections [37]. Interest therefore lies in whether alternative antibiotics are as effective as the standard. Bally et al. [9] performed a systematic review and Bayesian network metaanalyses of RCTs of parenteral antibiotics used for treating hospitalized adults with complicated skin and soft-tissue infections (cSSTIs) and hospital-acquired or ventilator-associated pneumonia.
We consider the target population of interest to be the population of clinical trial participants with suspected or confirmed MRSA cSSTIs or pneumonia, with corresponding studies published until May 2012. The site of infection and confirmation of MRSA represent important differences in the entrance criteria of the various studies. In total, 24 studies were found. Patients were randomized based on suspicion of MRSA in all but three studies, for which the protocol specified confirmation of the presence of MRSA at baseline. Of the 24 studies, 14 enrolled subjects with cSSTIs, 7 enrolled subjects with hospital-acquired or ventilator-associated pneumonia, and 3 allowed for either indication. The original network metaanalysis of Bally et al. [9] analyzed each infection site separately and therefore obtained stratified estimates. Based on the theory we developed, we can account for the potentially different treatment effects in each subpopulation by controlling for subpopulation type as a covariate in the analysis. By doing so, we ask a higher-level yet still clinically interesting question: “Are the alternative therapies as effective as the standard antibiotic for the treatment of suspected or confirmed MRSA?” Because infection site, MRSA confirmation, and study year can potentially affect the choice of investigated therapies and the outcomes, these three covariates (labeled
The outcome of interest is clinical test of cure for all subjects who received at least one dose of treatment (a standard measure in infectious disease research). Four papers evaluated the outcome only on a subset of patients selected post-randomization; as this does not conform to our definition of the RCT-specific parameter of interest, we considered these outcomes missing. For our analysis, we chose to compare vancomycin with the two most prevalent alternatives: telavancin and linezolid. In total, 47 study arms evaluated one of these three treatments and 36 had an observed outcome. Of the remaining treatments, tigecycline, daptomycin, and ceftaroline were each evaluated in three study arms, and a regimen of quinupristin/dalfopristin was evaluated in one arm. All of this information is available in the data extraction table (Table 3).
Publication  Events  Ni  Ai  StudyID  Year  Infection  Confirmed MRSA at baseline
Katz et al., 2008  42  48  vancomycin  1  2007  cSSTI  0 
36  48  daptomycin  1  2007  cSSTI  0  
Arbeit et al., 2004  162  266  vancomycin  2  2001  cSSTI  0 
165  264  daptomycin  2  2001  cSSTI  0  
235  292  vancomycin  3  2000  cSSTI  0  
217  270  daptomycin  3  2000  cSSTI  0  
Breedt et al., 2005  216  250  vancomycin  4  2003  cSSTI  0 
212  253  tigecycline  4  2003  cSSTI  0  
Sacchidanand et al., 2005  196  255  vancomycin  5  2003  cSSTI  0 
203  268  tigecycline  5  2003  cSSTI  0  
Stryjewski et al., 2008  307  429  vancomycin  6  2006  cSSTI  0 
309  426  telavancin  6  2006  cSSTI  0  
360  489  vancomycin  7  2006  cSSTI  0  
348  472  telavancin  7  2006  cSSTI  0  
Stryjewski et al., 2006  81  95  vancomycin  8  2004  cSSTI  0 
82  100  telavancin  8  2004  cSSTI  0  
Corey et al., 2010  297  347  vancomycin  9  2007  cSSTI  0 
304  351  ceftaroline  9  2007  cSSTI  0  
Wilcox et al., 2010  289  338  vancomycin  10  2007  cSSTI  0 
291  342  ceftaroline  10  2007  cSSTI  0  
Talbot et al., 2007  26  32  vancomycin  11  2005  cSSTI  0 
59  67  ceftaroline  11  2005  cSSTI  0  
Weigelt et al., 2005  402  573  vancomycin  12  2003  cSSTI  0 
439  583  linezolid  12  2003  cSSTI  0  
Stevens et al., 2002  54  87  vancomycin  13  1999  cSSTI  0 
64  99  linezolid  13  1999  cSSTI  0  
16  32  vancomycin  14  1999  pneumonia  0  
20  39  linezolid  14  1999  pneumonia  0  
Wunderink et al., 2003  128  302  vancomycin  15  2000  pneumonia  0 
135  321  linezolid  15  2000  pneumonia  0  
Rubenstein et al., 2001  73  192  vancomycin  16  1999  pneumonia  0 
85  203  linezolid  16  1999  pneumonia  0  
Rubenstein et al., 2011  221  374  vancomycin  17  2007  pneumonia  0 
214  372  telavancin  17  2007  pneumonia  0  
228  380  vancomycin  18  2007  pneumonia  0  
227  377  telavancin  18  2007  pneumonia  0  
Fagon et al., 2000  67  148  vancomycin  19  1996  pneumonia  0 
65  150  quinupristin/dalfopristin  19  1996  pneumonia  0  
Lin et al., 2008  NA  33  linezolid  20  2005  cSSTI  0 
NA  29  vancomycin  20  2005  cSSTI  0  
NA  38  linezolid  21  2005  pneumonia  0  
NA  40  vancomycin  21  2005  pneumonia  0  
Kohno et al., 2007  NA  51  linezolid  22  2004  cSSTI  0 
NA  26  vancomycin  22  2004  cSSTI  0  
NA  31  linezolid  23  2004  pneumonia  0  
NA  17  vancomycin  23  2004  pneumonia  0  
Florescu et al., 2008  NA  70  tigecycline  24  2005  cSSTI  0 
NA  23  vancomycin  24  2005  cSSTI  0  
Itani et al., 2010  223  276  linezolid  25  2007  cSSTI  1 
196  266  vancomycin  25  2007  cSSTI  1  
Wunderink et al., 2008  NA  30  linezolid  26  2005  pneumonia  1 
NA  20  vancomycin  26  2005  pneumonia  1  
Wunderink et al., 2012  102  186  linezolid  27  2010  pneumonia  1 
92  205  vancomycin  27  2010  pneumonia  1 
We ran four methods to obtain estimates of the counterfactual relative risk of both contrasts with the comparator vancomycin. The methods are 1) a ratio of the unadjusted mean outcomes using all available arms (called “No Adjust”), 2) a random effects regression for the arm-specific study outcomes using a log-link and a study-specific intercept (“RE Arm”), 3) G-Computation where a random effects logistic regression weighted by the inverse standard errors is used to predict the conditional mean outcomes, and 4) TMLE with a weighted logistic random effects model for the outcome and LASSO-penalized logistic regressions (to handle the sparse data) for the propensity score and a missing data model, fit using the R library glmnet [38]. The missing outcomes required that the TMLE algorithm include fitting a model to estimate the probability of a missing outcome in each study; the TMLE update step was therefore modified to use a product of the propensity score and the probability of observing the outcome in place of
The results of the network metaanalysis are presented graphically in Figure 2 (and numerically in the Appendix Figure 4). We also included the results of the studies that contrasted the two treatments directly. For the comparison of telavancin versus vancomycin, all estimators include the null in the confidence interval. The random effects regression and G-Computation produce estimates of the relative risk close to one, indicating near equivalence of treatments, while the point estimate of TMLE was further from the null (in the direction of the superiority of vancomycin). Notably, the confidence interval for the TMLE in the first contrast is much wider than the others. The unadjusted method produced a point estimate in the direction of the superiority of telavancin, demonstrating that the correction for study-level confounding impacted the analysis. For the comparison of linezolid versus vancomycin, the random effects regression, G-Computation and TMLE agree on the superiority of linezolid. The original study by Bally et al. [9] also found some suggestion of a superior effect of linezolid compared to vancomycin, but for both subpopulations the confidence intervals were large and spanned the null.
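As an illustration of what a direct contrast looks like, the sketch below computes the within-study risk ratio of clinical cure for the three cSSTI trials of telavancin versus vancomycin, using the event counts and arm sizes listed in Table 3 for study IDs 6 to 8. This is a simple crude ratio per trial, shown only as an example; it is an assumption on our part that it resembles the direct estimates plotted in Figure 2, and the helper `risk_ratio` is hypothetical.

```python
# (events, total) per arm, copied from Table 3 for study IDs 6 to 8.
trials = {
    6: {"telavancin": (309, 426), "vancomycin": (307, 429)},
    7: {"telavancin": (348, 472), "vancomycin": (360, 489)},
    8: {"telavancin": (82, 100), "vancomycin": (81, 95)},
}

def risk_ratio(arm, comparator):
    """Ratio of cure proportions: arm versus comparator."""
    (e1, n1), (e0, n0) = arm, comparator
    return (e1 / n1) / (e0 / n0)

direct_rr = {
    sid: risk_ratio(t["telavancin"], t["vancomycin"]) for sid, t in trials.items()
}
for sid, rr in direct_rr.items():
    print(sid, round(rr, 3))
```

Each of these within-trial ratios is protected by randomization, but each refers to its own superpopulation; the adjusted estimators aim instead at a single metapopulation contrast.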
Figure 2:
We can also easily obtain estimates of the contrast between telavancin and linezolid. The G-Computation and TMLE produce risk ratios for clinical success of 0.94 (
If we are to interpret the summary statistics as estimates of the relative causal effects of antibiotic choice on successful treatment, the causal assumptions in Section 4.3 need to be satisfied. Each of the studies evaluated the clinical efficacy of the treatments, which is defined on patients who had received at least one dose of the study drug. Because randomized treatment was first-line therapy (administered intravenously in-hospital) and the success of treatment was determined clinically, each trial estimated the relative effect under full adherence.
No interference: No interference is credible in this case because all subjects were already suspected or confirmed to have MRSA upon entry to the study. Therefore, the choice of treatment in the other arm would not affect existing infections or the success of treatment.
Unconfoundedness: The unconfoundedness assumption depends on whether study year, infection type, and MRSA confirmation status were sufficient to control for confounding at the study level. This assumption could be violated if prognostic demographic variables were involved in the study design stage. However, prognostic markers such as diabetes and peripheral vascular disease (for cSSTI) and mechanical ventilation, APACHE II score, clinical markers of severity, and presence of organ dysfunction (for pneumonia) are unlikely to determine the choice of initial therapy [39, 40].
Consistency: The dosage regimens varied somewhat across studies but were all considered to be at therapeutic levels. However, the length of time to the evaluation time point for each treatment type varied within and between studies (e.g., 7–14 days for telavancin versus 12–28 days for linezolid). If this corresponds to meaningfully different treatment durations (and/or periods of time elapsed before evaluation), this would indicate different definitions of interventions across studies, and thus a violation of the consistency assumption.
Positivity: All subjects in the studies were indicated to receive any of the treatments evaluated.
8 Summary
In this paper, we nonparametrically define the parameter of interest in a network metaanalysis with direct and indirect comparisons using the counterfactual framework often employed in causal inference. This definition of the parameter of interest is model-independent and is interpretable on what we define as a metapopulation, the union of all superpopulations. Such an approach allows for a straightforward description of what is being estimated, which is accessible even without an understanding of the estimation methods being used. In particular, we can interpret the marginal effects defined in this paper as the relative mean outcome had all subjects in the metapopulation been assigned to each treatment versus another. If a specific population is of interest and not represented by the metapopulation, with some conditions it may also be possible to more generally transport effect estimation, as described by Bareinboim and Pearl [41].
We have presented a set of conditions under which identifiability of the parameter of interest is possible. Identifiability allows for a clear description of when the parameter of interest can and cannot be estimated. For instance, the noninterference requirement casts doubt on the synthesis of studies that allow for treatment switching, crossover, or group contamination. The assumptions that we made allowed for the simplification of the relevant components of the observed data likelihood so that arm-based inference is possible.
One might alternatively specify the RCT-estimated contrast as the “outcome of interest” (rather than use the arm-specific outcome as we did). However, under this alternative, the propensity score would then be defined as the probability of a trial directly contrasting a given treatment pair. For standard network metaanalysis sample sizes, this would most often produce practical positivity problems, indicating the need for extrapolation using the outcome model (and thereby creating estimators that are very sensitive to model misspecification). In particular, two treatments that had never been directly compared would have no data support in this model.
If all treatments are selected completely at random into studies (or if only two treatments have ever been available to compare) then a standard unadjusted analysis using those arms assigned the desired treatments would be consistent. If we weaken this assumption and replace it with conditional exchangeability, then the estimators introduced in this paper are appropriate in that they allow for the adjustment of study-level covariates.
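The difference between the two scenarios can be shown with a small deterministic example. All numbers below are hypothetical: arm means depend on a binary study-level covariate, treatment "B" is studied more often where the covariate equals 1, and the true difference between "B" and "A" is 1 in every stratum. The unadjusted comparison is then distorted, while a stratified standardization (a nonparametric form of G-Computation, used here for simplicity) recovers the true contrast.

```python
from collections import defaultdict

# Hypothetical arms: (treatment, binary study-level covariate, arm mean outcome).
# Arm means equal 10 + 5 * w, plus 1 for treatment "B": the true B - A contrast is 1.
arms = ([("A", 0, 10.0)] * 3 + [("A", 1, 15.0)] * 1 +
        [("B", 0, 11.0)] * 1 + [("B", 1, 16.0)] * 3)

def unadjusted_mean(arms, a):
    """Plain average of the arm means among arms that received treatment a."""
    ys = [y for t, _, y in arms if t == a]
    return sum(ys) / len(ys)

def standardized_mean(arms, a):
    """Average stratum-specific means over the covariate distribution of all arms."""
    by_level = defaultdict(list)
    for t, w, y in arms:
        if t == a:
            by_level[w].append(y)
    levels = [w for _, w, _ in arms]
    return sum(sum(by_level[w]) / len(by_level[w]) for w in levels) / len(levels)

unadjusted = unadjusted_mean(arms, "B") - unadjusted_mean(arms, "A")
adjusted = standardized_mean(arms, "B") - standardized_mean(arms, "A")
print(unadjusted, adjusted)  # 3.5 1.0
```

Had both treatments been spread evenly over the covariate levels, the two calculations would agree, which is the completely-at-random case described above.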
Our methods also allow for wider inclusion criteria of studies in a systematic review. It is often the case that systematic reviews will exclude studies because they do not evaluate the exact desired clinical endpoint. Using our proposed methods, we can avoid selection bias due to studies excluded only for this reason. To do so, we would artificially censor the outcomes of studies that do not estimate the desired outcome type of interest. The censored outcomes of these studies might then be considered “missing at random” conditional on the study baseline information, which should still be included in the analysis (both in the propensity score model and the missing data model).
For the analysis of continuous individual-level outcomes, we assumed independence between the sample mean and standard deviation within each study arm. While we chose to present our identifiability argument under this assumption, it is not ultimately necessary. However, it is not straightforward to propose a valid Monte Carlo or Bayesian estimation approach to the setting with dependent sample means and standard deviations. In some cases, it may be possible to transform the individual-level data to remove the skew, but this relies on access to each study’s raw data, in which case an individual patient data analysis would be preferable.
In the simulation study, we show that certain estimators adopted from the causal inference literature can produce valid estimates of effect contrasts under the identifiability conditions described. In particular, G-Computation and TMLE might lend themselves well to network metaanalysis, which is characterized by small sample sizes and low prevalence for certain treatments. IPTW was seen to be sensitive to rare treatment assignment, and G-Computation and TMLE were seen to be somewhat sensitive to model misspecification. Some general benefits of TMLE are that it is doubly robust and can incorporate nonparametric (or machine learning) estimation of the propensity score and outcome model, which can help avoid bias from model misspecification [32]. More methods development and investigation are needed to address extremely rare treatments and how (or whether) TMLE can be adapted to be robust in this setting.
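The contrast between these estimators can be sketched in a few lines of Python for the simplest possible case: a single binary study-level covariate W, fully stratified (saturated) nuisance estimates, and one record per study arm. The records, variable names, and functions below are hypothetical and chosen for illustration; this is not the implementation used in the simulation study, and TMLE is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical arm-level records: (study-level covariate W, treatment A, arm outcome Y).
records = [
    (0, "VAN", 0.60), (0, "VAN", 0.64), (0, "LIN", 0.70), (0, "LIN", 0.74),
    (1, "VAN", 0.50), (1, "VAN", 0.54), (1, "VAN", 0.52), (1, "LIN", 0.66),
]


def stratum_sizes(records):
    """Number of records in each stratum of W."""
    sizes = defaultdict(int)
    for w, _, _ in records:
        sizes[w] += 1
    return sizes


def g_computation(records, a):
    """Average the stratum-specific mean outcome under treatment a over the W distribution."""
    n = len(records)
    total = 0.0
    for w, n_w in stratum_sizes(records).items():
        ys = [y for (wi, ai, y) in records if wi == w and ai == a]
        if not ys:
            # A parametric outcome model would have to extrapolate here.
            raise ValueError(f"no arm with treatment {a!r} when W={w!r}")
        total += (n_w / n) * (sum(ys) / len(ys))
    return total


def iptw(records, a):
    """Weight each arm assigned a by 1 / P(A = a | W), estimated by stratum proportions."""
    n = len(records)
    sizes = stratum_sizes(records)
    n_aw = defaultdict(int)
    for w, ai, _ in records:
        if ai == a:
            n_aw[w] += 1
    for w in sizes:
        if n_aw[w] == 0:
            # Positivity violation: the weight would be infinite.
            raise ValueError(f"no arm with treatment {a!r} when W={w!r}")
    return sum(y / (n_aw[w] / sizes[w]) for (w, ai, y) in records if ai == a) / n


for treatment in ("VAN", "LIN"):
    print(treatment, round(g_computation(records, treatment), 3), round(iptw(records, treatment), 3))
```

With saturated estimates the two estimators coincide (approximately 0.57 for VAN and 0.69 for LIN in this toy data set). The single LIN arm with W = 1 nevertheless receives a weight of 1/0.25 = 4, which hints at how rare treatment assignment can destabilize IPTW, while G-computation rests entirely on the outcome estimate for that sparse cell.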
The application we presented compared the results of random effects regression, G-Computation, and TMLE in a network metaanalysis of the relative efficacy of treatment options for MRSA infection. Random effects regression and G-Computation produced narrow confidence intervals relative to the direct contrasts of the individual RCTs, though TMLE did so for only one of the comparisons investigated. In contrast to the analysis in the original article, which used unadjusted contrast-based hierarchical Bayesian modeling on the separate subpopulations of infection types, our analyses concluded that there is evidence to support the superiority of linezolid over vancomycin. We also noted the poor stability of IPTW in this example and generally do not recommend this estimator when the data support for certain treatment levels is sparse. Finally, using this data example, we demonstrated how the causal assumptions should be listed and critiqued in order to stimulate discussion about the appropriateness of causal interpretations in specific contexts.
The framework we present formally assumes that we are restricting our analyses to studies evaluating a common parameter type. If there was only partial adherence in the RCTs, our framework does not allow for the mixing of intent-to-treat parameter estimates with adherence-adjusted parameter estimates. (Estimation of the adherence-adjusted parameters in RCTs is described in [42].) The same restriction applies to the results of observational studies if the parameter type estimated in the observational study is not the same as in the clinical trials. Specifically, treatment adherence and outcome need to be defined identically across studies, and all studies whose endpoints are included must estimate the same mean treatment-specific counterfactual outcome. Although it is common practice to include different parameter types in a metaanalysis, our formalization of the target parameter reveals that a causal interpretation of the resulting effect estimate may be quite challenging.
In addition to the issues we describe, there are many other concerns about aggregating study results in various settings. For instance, one might question the independence between RCTs happening close in time, or the systematic review inclusion criteria. We believe our framework provides additional structure to the ongoing discussion about the validity of network metaanalysis and will help stimulate solutions to the remaining challenges.
References
1. Slavin RE. Best evidence synthesis. An intelligent alternative to metaanalysis. J Clin Epidemiol 1995;48:9–18.
2. Lumley T. Network metaanalysis for indirect treatment comparisons. Stat Med 2002;21:2313–2324.
3. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004;23:3105–3124.
4. Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: Combining direct and indirect evidence. BMJ 2005;331:897–900.
5. Salanti G, Higgins JPT, Ades AE, Ioannidis JPA. Evaluation of networks of randomized trials. Stat Methods Med Res 2008;17:279–301.
6. Berlin JA, Golub RM. Metaanalysis as evidence: Building a better pyramid. J Am Med Assoc 2014;312:603–606.
7. Salanti G, Marinho V, Higgins JPT. A case study of multiple-treatments metaanalysis demonstrates that covariates should be considered. J Clin Epidemiol 2009;62:857–864.
8. Jansen JP, Schmid CH, Salanti G. Directed acyclic graphs can help understand bias in indirect and mixed treatment comparisons. J Clin Epidemiol 2012;65:798–807.
9. Bally M, Dendukuri N, Sinclair A, Ahern SP, Poisson M, Brophy J. A network metaanalysis of antibiotics for treatment of hospitalised patients with suspected or proven meticillin-resistant Staphylococcus aureus infection. Int J Antimicrob Agents 2012;40:479–495.
10. Robins JM. Confidence intervals for causal parameters. Stat Med 1988;7:773–785.
11. Dias S, Sutton AJ, Ades AE, Welton NJ. A generalized linear modeling framework for pairwise and network metaanalysis of randomized controlled trials. Med Decis Making 2013a;33:607–617.
12. Zhang J, Carlin BP, Neaton JD, Soon GG, Nie L, Kane R, et al. Network metaanalysis of randomized clinical trials: Reporting the proper summaries. Clin Trials 2014;11:246–262.
13. Cope S, Zhang J, Saletan S, Smiechowski B, Jansen JP, Schmid P. A process for assessing the feasibility of a network metaanalysis: A case study of everolimus in combination with hormonal therapy versus chemotherapy for advanced breast cancer. BMC Med 2014;12:93.
14. Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc 2006;101:447–459.
15. Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 3: Heterogeneity – subgroups, metaregression, bias, and bias-adjustment. Med Decis Making 2013b;33:618–640.
16. Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, et al. Indirect treatment comparison/network metaanalysis study questionnaire to assess relevance and credibility to inform health care decision making: An ISPOR-AMCP-NPC good practice task force report. Value in Health 2014;17:157–173.
17. Welton NJ, Soares MO, Palmer S, Ades AE, Harrison D, Shankar-Hari M, et al. Accounting for heterogeneity in relative treatment effects for use in cost-effectiveness models and value-of-information analyses. Med Decis Making 2015;35:608–621.
18. Hong H, Chu H, Zhang J, Carlin BP. Rejoinder to the discussion of “A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons,” by S. Dias and A.E. Ades. Res Synth Methods 2016;7:29–33.
19. Gail MH, Wieand S, Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 1984;71:431–444.
20. Zhang J, Chu H, Hong H, Virnig BA, Carlin BP. Bayesian hierarchical models for network metaanalysis incorporating nonignorable missingness. Stat Methods Med Res 2015; doi:10.1177/0962280215596185.
21. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press: New York, NY, 2009.
22. Alonso A, Van der Elst W, Molenberghs G, Buyse M, Burzykowski T. On the relationship between the causal-inference and metaanalytic paradigms for the validation of surrogate endpoints. Biometrics 2015;71:15–24.
23. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect. Math Model 1986;7:1393–1512.
24. Ferguson TS. A Course in Large Sample Theory. Texts in Statistical Science. Chapman & Hall/CRC: London, UK, 1996.
25. Rubin DB. Randomization analysis of experimental data: The Fisher randomization test comment. J Am Stat Assoc 1980;75:591–593.
26. Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika 2000;87:706–710.
27. Cole SR, Frangakis CE. The consistency statement in causal inference: A definition or an assumption? Epidemiology 2009;20:3–5.
28. VanderWeele TJ, Hernán MA. Causal inference under multiple versions of treatment. Journal of Causal Inference 2013;1:1–20.
29. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: Demonstration of a causal inference technique. Am J Epidemiol 2011;173:731–738.
30. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Monographs on Statistics and Applied Probability. Chapman & Hall/CRC: Boca Raton, FL, 1994.
31. van der Laan MJ, Rubin D. Targeted maximum likelihood learning. Int J Biostat 2006;2:Article 11.
32. van der Laan MJ, Rose S. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. Springer: New York, NY, 2011.
33. Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Int J Biostat 2010;6:Article 26.
34. Cole SR, Hernán MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol 2008;168:656–664.
35. Schnitzer ME, Moodie EEM, Platt RW. Targeted maximum likelihood estimation for marginal time-dependent treatment effects under density misspecification. Biostatistics 2013;14:1–14.
36. Porter KE, Gruber S, van der Laan MJ, Sekhon JS. The relative performance of targeted maximum likelihood estimators. Int J Biostat 2011;7:1–34.
37. Liu C, Bayer A, Cosgrove SE, Daum RS, Fridkin SK, Gorwitz RJ, et al. Clinical practice guidelines by the Infectious Diseases Society of America for the treatment of methicillin-resistant Staphylococcus aureus infections in adults and children. Clin Infect Dis 2011;52:e18–e55.
38. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22. http://www.jstatsoft.org/v33/i01/.
39. Lipsky BA, Itani KM, Weigelt JA, Joseph W, Paap CM, Reisman A, et al. The role of diabetes mellitus in the treatment of skin and skin structure infections caused by methicillin-resistant Staphylococcus aureus: Results from three randomized controlled trials. Int J Infect Dis 2011;15:e140–e146.
40. Niederman MS. Hospital-acquired pneumonia, health care-associated pneumonia, ventilator-associated pneumonia, and ventilator-associated tracheobronchitis: Definitions and challenges in trial design. Clin Infect Dis 2010;51(Suppl 1):S12–S17.
41. Bareinboim E, Pearl J. Meta-transportability of causal effects: A formal approach. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, 2013.
42. Hernán MA, Hernández-Díaz S. Beyond the intention to treat in comparative effectiveness research. Clin Trials 2012;9:48–55.
43. Tsiatis AA. Semiparametric Theory and Missing Data. Springer Series in Statistics. Springer: New York, NY, 2006.
44. van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer Series in Statistics. Springer Verlag: New York, NY, 2003.
A Appendix
A.1 Proof of identifiability under structural independence
The joint counterfactual distribution can be decomposed as
Let
where the integral for
because we assume that the study size has no effect on the individuallevel outcome. It follows that
A.2 Identifiability without assuming structural independence
It may not be plausible to assume conditional independence between
The target parameter can be estimated as a multiple integral over each
Since each component of this density is estimable from the data, we have identifiability of the target parameter in this case as well.
A.3 Efficiency and Consistency of TMLE
The local semiparametric efficiency and estimation consistency of the TMLE we describe can be derived very similarly to the standard observational data setting (with a single categorical exposure variable) for the estimation of the average treatment effect [32]. To give more insight into how this extends to the network metaanalysis case, we present some additional details and a proof of double robustness.
The efficient influence function for parameter of interest
Note that the TMLE update step produces values of
so that it follows that the TMLE is a locally efficient estimator [43, 44]. Specifically, the logistic regression update step with single covariate
First suppose that for increasing values of
Now suppose that
Therefore, if either of the models for
A.4 Data extraction information and numerical results for the example of antibiotic use on methicillin-resistant Staphylococcus aureus infection
Table 3 presents the full study list from the systematic review of Bally et al. [9] and the data that we used in the analysis in Section 7. Table 4 presents the numerical results that we obtained from our analyses, corresponding with Figure 2. The full reference list is below.
             TEL vs VAN                     LIN vs VAN                     TEL vs LIN
Method       Est    SE     95% CI           Est    SE     95% CI           Est    SE     95% CI
No Adjust    1.04   0.028  (0.99, 1.10)     0.92   0.027  (0.87, 0.97)     1.13   0.045  (1.05, 1.22)
RE Arm       1.00   0.010  (0.98, 1.02)     1.08   0.012  (1.05, 1.10)     0.92   0.014  (0.89, 0.95)
GComp (RE)   1.00   0.003  (1.00, 1.00)     1.06   0.006  (1.06, 1.09)     0.94   0.005  (0.92, 0.94)
TMLE (RE)    0.89   0.106  (0.75, 1.19)     1.05   0.012  (1.03, 1.07)     0.85   0.102  (0.71, 1.12)
References for the MRSA application
Arbeit R. D., Maki D., Tally F. P., Campanaro E., Eisenstein B. I. and Daptomycin 9801 and 9901 Investigators. The safety and efficacy of daptomycin for the treatment of complicated skin and skin-structure infections. Clinical Infectious Diseases 2004;38:1673–1681.
Breedt J., Teras J., Gardovskis J., Maritz F. J., Vaasna T., Ross D. P., Gioud-Paquet M., Dartois N., Ellis-Grosse E. J., Loh E. and Tigecycline 305 cSSSI Study Group. Safety and efficacy of tigecycline in treatment of skin and skin structure infections: Results of a double-blind phase 3 comparison study with vancomycin-aztreonam. Antimicrobial Agents and Chemotherapy 2005;49:4658–4666.
Corey G. R., Wilcox M. H., Talbot G. H., Thye D., Friedland D., Baculik T. and CANVAS 1 investigators. CANVAS 1: The first phase III, randomized, double-blind study evaluating ceftaroline fosamil for the treatment of patients with complicated skin and skin structure infections. Journal of Antimicrobial Chemotherapy 2010;65 Suppl 4:iv41–51.
Fagon J., Patrick H., Haas D. W., Torres A., Gibert C., Cheadle W. G., Falcone R. E., Anholm J. D., Paganin F., Fabian T. C., Lilienthal F. Treatment of gram-positive nosocomial pneumonia. Prospective randomized comparison of quinupristin/dalfopristin versus vancomycin. Nosocomial Pneumonia Group. American Journal of Respiratory and Critical Care Medicine 2000;161:753–762.
Florescu I., Beuran M., Dimov R., Razbadauskas A., Bochan M., Fichev G., Dukart G., Babinchak T., Cooper C. A., Ellis-Grosse E. J., Dartois N., Gandjini H. and 307 Study Group. Efficacy and safety of tigecycline compared with vancomycin or linezolid for treatment of serious infections with methicillin-resistant Staphylococcus aureus or vancomycin-resistant enterococci: A phase 3, multicentre, double-blind, randomized study. Journal of Antimicrobial Chemotherapy 2008;62 Suppl 1:i17–28.
Itani K. M., Dryden M. S., Bhattacharyya H., Kunkel M. J., Baruch A. M., Weigelt J. A. Efficacy and safety of linezolid versus vancomycin for the treatment of complicated skin and soft-tissue infections proven to be caused by methicillin-resistant Staphylococcus aureus. American Journal of Surgery 2010;199:804–816.
Katz D. E., Lindfield K. C., Steenbergen J. N., Benziger D. P., Blackerby K. J., Knapp A. G., Martone W. J. A pilot study of high-dose short duration daptomycin for the treatment of patients with complicated skin and skin structure infections caused by gram-positive bacteria. International Journal of Clinical Practice 2008;62:1455–1464.
Rubinstein E., Cammarata S., Oliphant T., Wunderink R. and Linezolid Nosocomial Pneumonia Study Group. Linezolid (PNU-100766) versus vancomycin in the treatment of hospitalized patients with nosocomial pneumonia: A randomized, double-blind, multicenter study. Clinical Infectious Diseases 2001;32:402–412.
Rubinstein E., Lalani T., Corey G. R., Kanafani Z. A., Nannini E. C., Rocha M. G., Rahav G., Niederman M. S., Kollef M. H., Shorr A. F., Lee P. C., Lentnek A. L., Luna C. M., Fagon J. Y., Torres A., Kitt M. M., Genter F. C., Barriere S. L., Friedland H. D., Stryjewski M. E. and ATTAIN Study Group. Telavancin versus vancomycin for hospital-acquired pneumonia due to gram-positive pathogens. Clinical Infectious Diseases 2011;52:31–40.
Sacchidanand S., Penn R. L., Embil J. M., Campos M. E., Curcio D., Ellis-Grosse E., Loh E., Rose G. Efficacy and safety of tigecycline monotherapy compared with vancomycin plus aztreonam in patients with complicated skin and skin structure infections: Results from a phase 3, randomized, double-blind trial. International Journal of Infectious Diseases 2005;9:251–261.
Stryjewski M. E., Chu V. H., O’Riordan W. D., Warren B. L., Dunbar L. M., Young D. M., Vallée M., Fowler V. G. Jr., Morganroth J., Barriere S., Kitt M. M., Corey G. R. and FAST 2 Investigator Group. Telavancin versus standard therapy for treatment of complicated skin and skin structure infections caused by gram-positive bacteria: FAST 2 study. Antimicrobial Agents and Chemotherapy 2006;50:862–867.
Stryjewski M. E., Graham D. R., Wilson S. E., O’Riordan W., Young D., Lentnek A., Ross D. P., Fowler V. G., Hopkins A., Friedland H. D., Barriere S. L., Kitt M. M., Corey G. R. and Assessment of Telavancin in Complicated Skin and Skin-Structure Infections Study. Telavancin versus vancomycin for the treatment of complicated skin and skin-structure infections caused by gram-positive organisms. Clinical Infectious Diseases 2008;46:1683–1693.
Talbot G. H., Thye D., Das A., Ge Y. Phase 2 study of ceftaroline versus standard therapy in treatment of complicated skin and skin structure infections. Antimicrobial Agents and Chemotherapy 2007;51:3612–3616.
Wilcox M. H., Corey G. R., Talbot G. H., Thye D., Friedland D., Baculik T. and CANVAS 2 investigators. CANVAS 2: The second phase III, randomized, double-blind study evaluating ceftaroline fosamil for the treatment of patients with complicated skin and skin structure infections. Journal of Antimicrobial Chemotherapy 2010;65 Suppl 4:iv53–65.
Wunderink R. G., Cammarata S. K., Oliphant T. H., Kollef M. H. and Linezolid Nosocomial Pneumonia Study Group. Continuation of a randomized, double-blind, multicenter study of linezolid versus vancomycin in the treatment of patients with nosocomial pneumonia. Clinical Therapeutics 2003;25:980–992.
Wunderink R. G., Mendelson M. H., Somero M. S., Fabian T. C., May A. K., Bhattacharyya H., Leeper K. V. J., Solomkin J. S. Early microbiological response to linezolid vs vancomycin in ventilator-associated pneumonia due to methicillin-resistant Staphylococcus aureus. Chest 2008;134:1200–1207.
Wunderink R. G., Niederman M. S., Kollef M. H., Shorr A. F., Kunkel M. J., Baruch A., McGee W. T., Reisman A., Chastre J. Linezolid in methicillin-resistant Staphylococcus aureus nosocomial pneumonia: A randomized, controlled study. Clinical Infectious Diseases 2012;54:621–629.
©2016 by De Gruyter
This article is distributed under the terms of the Creative Commons Attribution NonCommercial License, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.