In causal inference, a variety of causal effect estimands have been studied, including the sample, uncensored, target, conditional, optimal subpopulation, and optimal weighted average treatment effects. Ad hoc methods have been developed for each estimand based on inverse probability weighting (IPW) and on outcome regression modeling, but these may be sensitive to model misspecification, practical violations of positivity, or both. The contribution of this article is twofold. First, we formulate the generalized average treatment effect (GATE) to unify these causal estimands as well as their IPW estimates. Second, we develop a method based on Kernel optimal matching (KOM) to optimally estimate GATE and to find the GATE most easily estimable by KOM, which we term the Kernel optimal weighted average treatment effect. KOM provides uniform control on the conditional mean squared error of a weighted estimator over a class of models while simultaneously controlling for precision. We study its theoretical properties and evaluate its comparative performance in a simulation study. We illustrate the use of KOM for GATE estimation in two case studies: comparing spine surgical interventions and studying the effect of peer support on people living with HIV.
One of the primary goals of causal inference is to estimate the average causal effect of a treatment or intervention on an outcome under study. A common causal estimand of interest is the sample average treatment effect (SATE), which is the average effect of a treatment on an outcome among all individuals in the sample. Often, however, we may be interested in other averages. For example, Buchanan et al. and Stuart consider the target average treatment effect (TATE) on a population or sample distinct from the study sample and propose the use of inverse probability of sampling weights. Similarly, if outcome data are only available for some units, Cain and Cole and Robins and Finkelstein propose the use of inverse probability of censoring weights to generalize the results to the whole sample. Other estimands of interest focus on particular subgroups of the sample, such as the sample average treatment effect on the treated (SATT), the conditional average treatment effect [5,6], and the complete-case SATE. In particular, Crump et al. propose the optimal SATE (OSATE) and, as in ref. , the optimal weighted average treatment effect (OWATE), in which the average treatment effect is restricted by or weighted by overlap in covariate distributions to make the estimation easier.
Ad hoc methods, such as those based on inverse probability weighting (IPW) [10,11,12,13] and outcome regression modeling, have been widely used to estimate these causal estimands. However, due to their sensitivity to model misspecification, these methods may lead to biased estimates. In addition, IPW-based methods depend heavily on the positivity assumption, and practical violations of this assumption lead to extreme weights and high variance [14,15,16,17]. In Section S2.1 in the Supplementary Material, we thoroughly discuss these issues, some of the related work to overcome them, and alternative methodologies to estimate the aforementioned causal estimands.
In this article, we start by presenting a general causal estimand, the generalized average treatment effect (GATE), which unifies all the causal estimands previously presented and motivates the formulation of new ones. We then present and apply Kernel optimal matching (KOM) [18,19] to optimally estimate GATE. KOM provides weights that simultaneously mitigate the possible effect of model misspecification and control for possible practical positivity violations. We do that by minimizing the worst-case conditional mean squared error (CMSE) of the weighted estimator in estimating GATE over the space of weights. The proposed methodology has several attractive characteristics. First, KOM can be used to optimally estimate a variety of well-known causal estimands, as well as to find new ones such as the Kernel optimal weighted average treatment effect (KOWATE). In Section 3.3, we show that various causal estimands can be easily estimated by simply modifying the optimization problem formulation we give for KOM, which is fed to an off-the-shelf solver. Second, minimizing the worst-case CMSE of the weighted estimator leads to better accuracy, precision, and total error. We show this in our simulation study in Section 4. Third, by optimally balancing covariates, KOM mitigates the effect of possible model misspecification. In Section 4, we show that both the absolute bias and the root mean squared error (RMSE) of the weighted estimator that uses KOM weights are consistently lower across levels of misspecification. Fourth, the weights are obtained by using off-the-shelf solvers for convex-quadratic optimization. Finally, KOM is implemented in an open source R package.
In Section 2, we introduce notation, specify assumptions, and define GATE, the estimand of interest, and its weighted estimator. We then introduce KOM for GATE, describe its theoretical properties, and present some practical guidelines on its use (Section 3). In Section 4, we present the results of a simulation study aimed at comparing the performance of KOM with IPW, overlap weights, truncated weights, and outcome regression modeling with respect to absolute bias and RMSE across levels of practical positivity violations and levels of misspecification. In Section 5, we apply KOM to the evaluation of the effect of spine surgical interventions on the Oswestry disability index (ODI) among patients with lumbar stenosis or lumbar spondylolisthesis, and to the evaluation of the effect of peer support on CD4 cell count in two target populations of healthier patients, using real-world data. We conclude with some remarks in Section 6.
2 Generalized average treatment effect
We consider an observational study consisting of units, drawn independently and identically distributed (iid). Using the potential outcome framework , for each unit , we let be the potential outcome of the treatment . We let be the observed confounders. We consider three exclusive and exhaustive subsets of the units: (i) units treated with , for whom we observe ; (ii) units treated with , for whom we observe ; and (iii) unlabeled units, for whom we do not observe any outcome. We let , be the labeled units. We set , the indicator of being treated with , , the indicator of being in the labeled sample, and the indicator of being unlabeled. Let , for . We define the generalized average treatment effect (GATE) as a weighted average difference between the conditional expectation of the potential outcome of both the labeled and unlabeled units:
where is chosen to target the estimand of interest and may depend on (see Assumption 2.5). For instance, when , we target the SATE, and when , we target the TATE. Moreover, by setting equal to the overlap weights [9,21] and the truncated weights , we target the OWATE and OSATE, respectively. We provide examples of causal estimands in the first two columns of Table 1. Note that since we assume iid data, SATE, TATE, OWATE, and OSATE are also within of their population-level averages. In other cases, since can be any function of (and ), a clear population-level quantity may be less immediately apparent.
Notes: , is the propensity score, is the probability of being in the sample, , , and .
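To make the role of the targeting weights concrete, the following minimal Python sketch computes them from a vector of propensity scores for three of the estimands named above. The function name `target_weights`, the argument `alpha`, and its default value are hypothetical. The specific forms used here (a constant one for SATE, the product of the propensity score and its complement for the overlap weights, and an indicator that the propensity score lies strictly between `alpha` and `1 - alpha` for the truncated weights) are assumptions based on the standard definitions in the overlap-weighting and trimming literature, since the displayed formulas are not reproduced in this text.

```python
import numpy as np

def target_weights(phi, estimand, alpha=0.1):
    """Return illustrative targeting weights for a given estimand.

    phi      : array of propensity scores in (0, 1)
    estimand : "SATE", "OWATE", or "OSATE"
    alpha    : truncation threshold for OSATE (hypothetical default)
    """
    phi = np.asarray(phi, dtype=float)
    if estimand == "SATE":
        # every unit in the sample counts equally
        return np.ones_like(phi)
    if estimand == "OWATE":
        # overlap weights: largest where treatment is most uncertain
        return phi * (1.0 - phi)
    if estimand == "OSATE":
        # truncated weights: keep only units with non-extreme propensities
        return ((phi > alpha) & (phi < 1.0 - alpha)).astype(float)
    raise ValueError(f"unknown estimand: {estimand}")

phi = np.array([0.05, 0.30, 0.50, 0.95])
v_sate = target_weights(phi, "SATE")
v_owate = target_weights(phi, "OWATE")
v_osate = target_weights(phi, "OSATE", alpha=0.1)
```

In this toy input, the two units with extreme propensities receive small overlap weights and are dropped entirely by the truncated weights, which is the sense in which these estimands emphasize regions of good overlap.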
To estimate GATE in equation (2.1), we propose to use the following weighted estimator:
For instance, the usual IPW estimator for SATE is given by plugging in , where is the propensity score. In Section 2.1, we provide an analogous generalized formulation of the weights that make unbiased for GATE for any . Notice that the value of for is irrelevant to . Nonetheless, for consistent dimensions, we will use to denote the set of all weights, where simply for may be arbitrary (e.g., zero in above).
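As an illustration of the weighted estimator and of the usual IPW plug-in for SATE, the sketch below may help. It assumes that all units are labeled, that the estimator is the difference between the weighted outcome sums of the two treatment arms divided by the sample size, and that the IPW weight is the reciprocal of the probability of the treatment actually received. The function names are hypothetical and the data are made up for the example.

```python
import numpy as np

def weighted_estimator(y, t, w):
    """Weighted treated-outcome sum minus weighted control-outcome sum, over n."""
    y, t, w = (np.asarray(a, dtype=float) for a in (y, t, w))
    n = len(y)
    return np.sum(w * t * y) / n - np.sum(w * (1.0 - t) * y) / n

def ipw_weights_sate(t, phi):
    """Inverse probability of the treatment actually received (assumed form)."""
    t = np.asarray(t, dtype=float)
    phi = np.asarray(phi, dtype=float)
    return t / phi + (1.0 - t) / (1.0 - phi)

# toy data: two treated and two control units, constant propensity 0.5
y = np.array([3.0, 5.0, 1.0, 2.0])
t = np.array([1, 1, 0, 0])
phi = np.full(4, 0.5)

w = ipw_weights_sate(t, phi)
tau_hat = weighted_estimator(y, t, w)
```

With a constant propensity of 0.5, every weight equals 2 and the estimate reduces to the difference in arm means, 4.0 minus 1.5.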
2.1 Identification of GATE by weighting
In this section, we provide a general formulation of the weights that make unbiased for GATE for any . To do so, we impose the assumptions of consistency, noninterference, ignorable treatment assignment, and ignorable sample assignment. Consistency states that the observed outcome corresponds to the potential outcome of the treatment applied to that unit, and noninterference reflects the fact that units' potential outcomes are not affected by how the treatment or intervention has been allocated. Consistency together with noninterference is also known as the stable unit treatment value assumption (SUTVA). Ignorable treatment assignment (also called unconfoundedness, no unmeasured confounding, or exchangeability) states that the potential outcome, , is conditionally independent of the treatment assignment mechanism given covariates. Similarly, ignorable sample assignment states that the potential outcome, , is conditionally independent of the sampling assignment mechanism, i.e., being labeled, given covariates. Thus, while the 1-treated, 0-treated, and unlabeled units can differ systematically in their distribution of (despite units being marginally iid), we assume that, conditioned on , the distributions of the potential outcomes are the same. Treatment and selection overlap state that the propensity of being treated and of being selected conditional on confounders is bounded away from zero and one and away from zero, respectively, ensuring we can always (eventually) find representative units in both treatment groups [20,22]. We formalize these assumptions as follows.
(Ignorable treatment assignment)
(Treatment overlap) The propensity score is bounded away from 0 and 1 for almost every .
(Selection overlap) The sampling probability is bounded away from 0 for almost every .
Letting , we additionally assume,
(Honest weights) and are -measurable.
Assumption 2.5 requires and to be functions of . We can easily relax this assumption to only require that and are independent of all else given , that is, that any residual randomness is independent of units' idiosyncrasies as they pertain to outcomes, and our results easily extend. However, all of our examples actually have and being functions of , which simplifies the presentation.
In the next lemma, we define the generalized IPW weights, , and show that , the weighted estimator in equation (2.2) weighted by , is unbiased for GATE.
Define the generalized IPW weights
This is a well-known result for SATE  and TATE [22,1]. If we assume appropriate bounds on the norms of and the variances of , it is easy to additionally see that has diminishing variance and is therefore also consistent. We show examples of generalized IPW, , in Table 1. Note that when is a deterministic function of , as is usually the case, just means plug in into this function except change to and to . Notice that does not in fact appear in Table 1; this is because it is only relevant when depends on treatment assignment for unlabeled units, which, while permissible in our framework, is an uncommon choice. When does not depend on for unlabeled units, it does not need to be observed or even exist as it appears nowhere else. Note also that we do not need overlap conditions on . Moreover, we do not actually need to know or estimate any of in our method, which we present in the next section.
In the next section, we introduce KOM for estimating GATE, which, instead of plugging estimated propensities into the weighted estimator, provides weights that minimize the CMSE of for GATE. By doing so, the proposed methodology minimizes the bias with respect to GATE while simultaneously controlling precision. We further consider simultaneously choosing to minimize the worst-case CMSE to obtain KOWATE.
3 Kernel optimal matching for estimating GATE
In this section, we start by decomposing the CMSE of the weighted estimator, , in equation (2.2). We show that this CMSE can be decomposed in terms of (a) the imbalances between the conditional expectations of the potential outcomes among the treated and the control and (b) a variance term (Section 3.1). Since the CMSE depends on some unknown functions (conditional expectations), in Section 3.2, we guard against all possible realizations of the unknown functions by considering the worst-case CMSE of . In Section 3.3, we embed these in reproducing kernel Hilbert spaces (RKHS) and use quadratic programming to minimize the corresponding worst-case CMSE and find optimal weights.
3.1 Decomposing the CMSE of
Recall that, in Section 2, we defined . Further define and , for , and . We then define, for each function , the -moment imbalance between the weighted -treated labeled sample and the -weighted total sample,
where is equal to 1 if and 0 otherwise. In the following theorem, we show that the CMSE of can be decomposed into the squares of such imbalances in the conditional expectations of the potential outcomes plus a variance term.
In the next section, we show how to find weights that minimize equation (3.1). The main challenge in this task is that the functions , on which this quantity depends, are unknown.
3.2 Worst-case CMSE
To overcome the issue that we do not know the -functions on which the CMSE of depends, we guard against any possible realization of the unknown functions. Specifically, since the CMSE of scales quadratically homogeneously with and , we consider its magnitude with respect to that of and . We therefore need to define a magnitude. We choose the following,
where are extended seminorms on functions from the space of confounders to the space of outcomes. We discuss a specific choice of such extended seminorms in Theorem 3.2. Given this magnitude, we can define the worst-case squared bias as follows:
is the worst-case imbalance in the -moment between the weighted -treated group and the -weighted sample over all functions in the unit ball of .
There are many possible ways to choose the seminorm . RKHS norms are one rich family of choices that offers a flexible and general modeling framework, encompassing both parametric families and nonparametric universal approximators. As we will see, RKHS norms also give rise to an optimization problem that is amenable both to theoretical analysis and to computational solution, with the choice of a particular RKHS only affecting a matrix parameter fed into the solver.
Given a positive semidefinite (PSD) kernel , if we choose the corresponding RKHS (a Hilbert space of functions with continuous evaluations, which is associated with the reproducing kernel ) to specify the norm, we can show that the worst-case imbalance can be expressed as a convex-quadratic function in .
Let be the RKHS norm associated with the PSD kernel . Define the matrix as and note that it is positive semidefinite by definition. Then,
where is the diagonal matrix with in its diagonal entry, and is the diagonal matrix with in its diagonal entry.
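The following sketch illustrates how a worst-case imbalance over the unit ball of an RKHS can be evaluated as a quadratic form in the weights. It is only an assumed reading of the result above, because the displayed formula is not reproduced in this text: we take the squared imbalance for one treatment arm to be the quadratic form of the Gram matrix in the difference between the arm-masked weights and the targeting weights, divided by the squared sample size. The Gaussian kernel, its bandwidth, and all function and variable names are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on the rows of x (PSD by construction)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def worst_case_imbalance_sq(K, w, v, in_arm):
    """Assumed form: (I_t w - v)^T K (I_t w - v) / n^2 for one treatment arm.

    in_arm is a 0/1 vector playing the role of the diagonal indicator matrix.
    """
    d = np.asarray(in_arm, dtype=float) * w - v
    return float(d @ K @ d) / len(v) ** 2

# toy example: six units, treated units concentrated at larger covariate values
x = np.linspace(-1.0, 1.0, 6)
t = np.array([0, 0, 1, 0, 1, 1])
v = np.ones(6)            # targeting weights of one for every unit
w = np.full(6, 2.0)       # uniform weights summing to n within each arm
K = gaussian_gram(x)

b1_sq = worst_case_imbalance_sq(K, w, v, t == 1)
b0_sq = worst_case_imbalance_sq(K, w, v, t == 0)
```

Under this reading, the imbalance vanishes when the masked weights reproduce the targeting weights exactly, and the choice of kernel enters only through the matrix `K`.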
Based on Theorem 3.2, letting the RKHS given by the kernel specify the norm, both the worst-case bias and the worst-case CMSE of are convex-quadratic functions in . Specifically, we define the worst-case CMSE as follows:
Note that there is freedom to scale this objective. In particular, represent a ratio between variance and norm of rather than variance bounds in the sense that for any hypothetical norm bound , we have that the worst-case CMSE over -functions with squared norms bounded by and over residual variances bounded by is just a scaling of the aforementioned objective: . Since we will minimize this objective, it does not matter what positive constant we scale it by; all that matters is the trade-off rate between the bias and variance terms. For simplicity, we use within-treatment-group equal variance weights, . More generally, we can use any positive definite matrix to penalize the variances as . In the next section, we show how to minimize the worst-case CMSE of in estimating GATE, , by using off-the-shelf solvers for quadratic optimization.
3.3 Minimizing the worst-case CMSE
In the previous two sections, we showed that the CMSE of in estimating GATE can be decomposed into a squared bias plus a variance. We also showed that, since the bias depends on unknown conditional expectations, by guarding against any possible realizations of these unknown functions, embedded in an RKHS given by the kernel , the worst-case CMSE of can be expressed as a convex-quadratic function in . Here, we use quadratic programming to obtain the weights that minimize the worst-case CMSE of . When interested in estimating, for example, SATE or TATE, the set of weights is fixed, i.e., all are given, known scalars. We show the corresponding convex-quadratic optimization problem when the set of weights is fixed in the next section. In addition, given the flexibility of the proposed methodology, we can also let be variable and let it be chosen by the solver in such a way that the worst-case CMSE of is minimized. We show this in Section 3.3.2.
Let . When is fixed, we propose to use weights obtained by solving the following optimization problem:
where are interpreted as penalization parameters that control the trade-off between bias and variance. When equal zero, we obtain weights that yield minimal bias. When , we obtain uniform weights. (If we have estimates of heteroskedastic conditional variance, we can also easily use unit-specific weights.) We discuss how to tune this hyperparameter in Section S3.2 of the Supplementary Material. As shown in Theorem 3.2, using an RKHS norm, we can show that the optimization problem (3.3) reduces to the following linearly-constrained convex-quadratic optimization problem:
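A rough numerical sketch of a problem of this kind, for fixed targeting weights, is given below. It is not the authors' implementation, which uses an off-the-shelf quadratic programming solver. Here a plain projected gradient method in NumPy is swapped in so that the example is self-contained. The objective (a kernel quadratic form in the difference between masked weights and targeting weights, plus a ridge-type penalty on the weights), the constraints (nonnegative weights summing within each arm to the total of the targeting weights), the Gaussian kernel, and every name in the code are assumptions made for illustration.

```python
import numpy as np

def project_scaled_simplex(z, total):
    """Euclidean projection of z onto {w >= 0, sum(w) = total}."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, len(z) + 1)
    keep = u - css / k > 0
    theta = css[keep][-1] / k[keep][-1]
    return np.maximum(z - theta, 0.0)

def objective(K, t, v, w, lam):
    """Assumed worst-case CMSE proxy: bias quadratic form plus variance penalty."""
    val = 0.0
    for arm in (0, 1):
        mask = (t == arm).astype(float)
        d = mask * w - v
        val += d @ K @ d + lam * np.sum((mask * w) ** 2)
    return val / len(v) ** 2

def kom_like_weights(K, t, v, lam, n_iter=2000):
    """Projected gradient, arm by arm, starting from uniform weights."""
    w = np.zeros(len(v))
    total = v.sum()
    for arm in (0, 1):
        idx = np.flatnonzero(t == arm)
        K_aa = K[np.ix_(idx, idx)]
        b = K[idx, :] @ v                      # linear term from the cross product with v
        step = 1.0 / (2.0 * (np.linalg.eigvalsh(K_aa)[-1] + lam))
        w_a = np.full(len(idx), total / len(idx))
        for _ in range(n_iter):
            grad = 2.0 * (K_aa @ w_a - b) + 2.0 * lam * w_a
            w_a = project_scaled_simplex(w_a - step * grad, total)
        w[idx] = w_a
    return w

# toy data: treatment is more likely at large covariate values
n = 30
x = np.linspace(-2.0, 2.0, n)
t = ((np.arange(n) % 3 == 0) | (x > 1.0)).astype(int)
v = np.ones(n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)
lam = 0.1

w = kom_like_weights(K, t, v, lam)
w_uniform = np.where(t == 1, n / t.sum(), n / (n - t.sum()))
```

In this sketch, a larger `lam` pulls the solution toward the uniform weights used as the starting point, while a smaller `lam` lets the weights concentrate on units that best reduce the kernel imbalance, which mirrors the bias and variance trade-off described above.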
We can also let be variable. Instead of being given set values, we are given a feasible set . We assume that and that is a polytope (expressed by linear constraints). To simultaneously find the GATE, subject to , that is most easily estimable and the weights to estimate this GATE, we propose to solve the following optimization problem:
When is a singleton, this optimization problem is the same as that in equation (3.3). Again, we can show that the optimization problem (3.5) reduces to a linearly-constrained convex-quadratic optimization problem:
The solution to the optimization problem (3.6) provides both weights that define a GATE of interest and the weights to estimate it. The weights are chosen to allow for minimal CMSE. That is, they focus on the subpopulation on which the average effect is easiest to estimate by KOM. We discuss this further below.
When we use , we term the resulting GATE estimand Kernel optimal weighted ATE (KOWATE). We can also construct other causal estimands by choosing different . For instance, we may restrict to an unweighted subsample as in the OSATE of by choosing , where is a chosen subsample size. We refer to this as Kernel optimal SATE (KOSATE). Table 2 summarizes these causal estimands. KOWATE and KOSATE can be seen as extensions of OWATE and OSATE, where the average treatment effect is weighted by or restricted by overlap in covariate distributions to make the estimation easier. It is worth noting that other causal estimands can be easily constructed by plugging the set of overlap or truncated weights of [8,9] as fixed in the optimization problem.
|SATE||Fixed ( )||from (3.3)|
|SATT||Fixed ( )||from (3.3)|
|TATE||Fixed ( )||from (3.3)|
|OWATE||Fixed ( )||from (3.3)|
What target populations are KOWATE and KOSATE choosing? The idea is to pick the subpopulation that is easiest to estimate by KOM. This subpopulation will emphasize areas with better overlap, where overlap is characterized in terms of worst-case moment imbalances as defined by the kernels, rather than in terms of (unknown) propensity scores.
We illustrate this in a simple simulated example described in Figure 1. Specifically, Figure 1 shows scatterplots between two confounders, one on the vertical axis and one on the horizontal axis, weighted by the weights , obtained when targeting SATE (first column of Figure 1), KOSATE (second top panel), KOWATE (third top panel), OSATE (second bottom panel), and OWATE (third bottom panel). The histograms on the top and right axes represent the distributions of the confounders across treated (dark-gray) and control (light-gray). The data were generated to exhibit practical positivity violations, and we provide more details on the data generation in the simulation section (Section 4).
When targeting SATE, we consider a fixed that is equal to 1 for all units in the sample. On the other hand, all of KOSATE, OSATE, KOWATE, and OWATE focus on the area of confounders with high overlap. In practice, we find that this translates to better performance, as shown in Table 3 in our case study. KOSATE and OSATE do this while restricting to either including or excluding samples, as can be seen by the two point sizes in Figure 1. KOWATE and OWATE consider a range of weights, as can be seen by the variable point sizes. Visually, the weights that define the GATE for KOSATE and OSATE are similar as they both focus on the area of overlap; the same holds for KOWATE and OWATE. The differences are that KOWATE and KOSATE guard against possible misspecification of propensity models and that they target the CMSE of the estimator itself, rather than the asymptotic variance, and therefore they account for the desired precision of the KOM estimate that will be applied. We provide a deeper study of this in Section S3.1 of the Supplementary Material, where we consider the effects of misspecification as well as how the weights differ between the methods.
|(SE)||1.33 (3.98)||2.54 (2.56)||3.03 (2.38)||5.09* (2.31)|
* indicates statistical significance at the 0.05 level.
In this section, we study the consistency of the proposed weighted estimator with respect to the true causal estimand GATE (for both fixed and variables).
The aforementioned theorem shows that for any GATE estimand, under appropriate assumptions, the KOM estimate is root- consistent.
The assumption about the kernel can be automatically satisfied by using a bounded kernel, such as the Gaussian or Matérn kernels. The assumption of requires no model misspecification. We can relax this assumption if we use a universal kernel, such as the Gaussian kernel, but the rate may deteriorate from to as we need to include a vanishing approximation term. For brevity, we omit the details.
To apply Theorem 3.3 to the case of variable , note that the solution of problem (3.5) is a function of and is therefore honest (satisfies Assumption 2.5) and that, given these , the solution to problem (3.3) is exactly the same as that in problem (3.5), as it can just be viewed as a nested minimization problem (once in and once in ). So to apply Theorem 3.3 to the case of variable weights, we need only guarantee that (we already constrain ). We can either take that as an assumption, or we can enforce it in the construction of by including a bound on each for some . In practice, we find that this is not necessary.
In this section, we present the results of a simulation study aimed at comparing KOM with IPW, overlap weights, truncated IPW, and outcome regression modeling in estimating GATE with respect to absolute bias and root MSE, across levels of practical positivity violations and across levels of misspecification. In summary, KOM showed a consistently low absolute bias and RMSE across all of the considered scenarios.
We considered a sample size of . We computed the potential outcomes from the following models: , , where . We computed the observed outcome as , where , , , . We generated and and considered a convex combination between the correct variables and the misspecified variables , and . The true GATE was computed as . We considered for all units in the sample. We considered several , namely, (a) KOWATE; (b) KOSATE, where we set equal to the number of units chosen by OSATE; (c) SATE; (d) OWATE; and (e) OSATE with . For KOWATE and KOSATE, we considered the KOM weights given by problem (3.5). For SATE, we considered several estimates: (a) KOM as in the optimization problem (3.3); (b) IPW; and (c) outcome regression modeling. For OWATE and OSATE, we used the estimated propensity to define . To estimate OSATE, we computed the set of truncated weights setting . We used a product of polynomial-degree-2 kernels for KOM. We modeled the propensity score, , by using a polynomial-degree-4 logistic regression and we used a polynomial-degree-4 regression model for outcome regression modeling. We evaluated the performance of the proposed method across levels of practical positivity violation and misspecification as described in Sections 4.1.1 and 4.1.2, respectively. In practice, the solver sometimes fails because the quadratic objective is numerically non-PSD (despite being PSD in theory). We found that this can be fixed by adding to the quadratic objective matrix to inflate its spectrum slightly. If this also failed when using a product of polynomial-degree-2 kernels, we reran the same optimization problem using a product of polynomial-degree-2 and degree-1 kernels. If the latter also failed, we used a product of polynomial-degree-1 kernels. We estimated the estimand of interest by plugging the set of obtained weights into the weighted estimator .
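The numerical fix mentioned above, inflating the spectrum of a matrix that is PSD in theory but not in floating point, can be sketched as follows. The size of the diagonal shift `eps` is a hypothetical value chosen for this example and is not taken from the article. The function name is likewise hypothetical.

```python
import numpy as np

def inflate_spectrum(Q, eps=1e-8):
    """Symmetrize Q and add eps times the identity (eps is a hypothetical value)."""
    Q = 0.5 * (Q + Q.T)
    return Q + eps * np.eye(Q.shape[0])

# a rank-deficient PSD matrix, perturbed so that its smallest eigenvalue
# is slightly negative, mimicking round-off in a Gram-type matrix
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
Q_bad = A - 1e-10 * np.eye(2)

Q_ok = inflate_spectrum(Q_bad, eps=1e-8)

min_before = np.linalg.eigvalsh(Q_bad)[0]   # eigvalsh returns ascending eigenvalues
min_after = np.linalg.eigvalsh(Q_ok)[0]
```

Adding a multiple of the identity shifts every eigenvalue by the same amount, so a shift slightly larger than the magnitude of the most negative eigenvalue is enough to make the matrix acceptable to a convex-quadratic solver while changing the objective only marginally.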
We used scikit-learn (through the R package reticulate) to tune the hyperparameters and the R interface of Gurobi to obtain the set of KOM weights. More specifically, within scikit-learn, we used the GaussianProcessRegressor class with the kernel as the parameter (as described in Section S3.2 of the Supplementary Material). Other parameters, such as alpha and optimizer, were left at their default values. More details can be found in the scikit-learn user guide (https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html). We also included a simple difference in means (Crude) estimator to give a sense of the scale of improvement in absolute bias and RMSE.
4.1.1 Estimating GATE across levels of practical positivity violation
To evaluate the performance of the proposed methodology across levels of practical positivity violation, we let vary between 0.1 and 1. We considered 10 levels. We refer to as weak practical positivity violation, to as moderate, and to as strong. In our simulation scenario, the propensity score, , ranged between 0.37 and 0.63 under weak violation, between 0.07 and 0.92 under moderate violation, and between 0.007 and 0.993 under strong violation (average of min/max of estimated propensities over simulations under no misspecification).
4.1.2 Estimating GATE across levels of misspecification
We also evaluated the performance of the proposed methodology across levels of misspecification. To do so, we used the variables instead of and considered three levels, , which we refer to as correct specification (which is also overparametrized because we use the polynomial models previously described in all scenarios), , which we refer to as moderate misspecification, and , which we refer to as strong misspecification.
In this section, we discuss the results of our simulation study. In summary, KOM outperformed IPW, overlap, truncated weights, and outcome regression modeling with respect to absolute bias and RMSE in estimating GATE across levels of practical positivity violation under both moderate and strong misspecification.
4.2.1 Results across levels of practical positivity violations and model misspecification
Reference presented KOM for SATE. The authors showed that KOM outperformed IPW, truncated IPW, propensity score matching, regression adjustment, CBPS, and SBW with respect to bias and MSE across most of the considered levels of practical positivity violation and considered scenarios. In addition, the authors showed that KOM for SATE outperformed the other methods especially under strong practical positivity violation. Figure 2 shows the absolute bias (left panels) and RMSE (right panels) of SATE estimated by using KOM (KOM-SATE; solid-black), KOSATE estimated by using KOM (solid-dark-gray), KOWATE estimated by using KOM (solid-light-gray), SATE estimated by using IPW (long-dashed-black), OSATE estimated by using truncated weights (long-dashed-dark-gray), OWATE estimated by using overlap weights (long-dashed-light-gray), SATE estimated by using outcome regression modeling (OM; dotted-black), and a simple mean difference (Crude; dotted-light-gray), with estimated optimal (i.e., ). The top panels of Figure 2 show absolute bias and RMSE across levels of practical positivity violations under strong misspecification, while the middle and bottom panels show them under moderate misspecification and correct specification, respectively. In summary, KOM showed a consistently low absolute bias and RMSE across all the considered scenarios, outperforming the other methods especially under strong practical positivity violation and strong misspecification (top-right panel). KOM matched the performance of the other methods with respect to absolute bias and RMSE under weak levels of practical positivity violations and correct model specification. We obtained similar results when (Figure S4). IPW exhibited extremely high absolute bias and RMSE under moderate to strong practical positivity violations and moderate to strong model misspecification.
Under moderate and strong misspecification, OM resulted in absolute bias and RMSE so high across all levels of practical positivity violations that the results fall outside the plot region in the top and middle panels of Figures 2 and 4. In Figure 1, we showed that the weights obtained by using KOSATE and KOWATE select a population where there is more overlap compared with that of SATE. In our simulation results, this can be seen in the right bottom panel of Figure 2, where the RMSE obtained by using SATE is higher than that of KOWATE and KOSATE with similar absolute bias. To further analyze this behaviour, we evaluated the distribution of the maximum value of the weights obtained using SATE, KOWATE, and KOSATE. While SATE obtained a median maximum weight of around 27, KOWATE and KOSATE obtained values of 7 and 9, respectively. Consequently, the standard error computed from simulations for SATE, KOWATE, and KOSATE was around 0.29, 0.12, and 0.13, respectively (Figure S11 shows the distribution of the maximum value of the weights computed by using SATE, KOWATE, and KOSATE). We provide additional simulation results and some considerations about standard error estimation and coverage in Sections S4.1 and S4.2, respectively, in the Supplementary Material.
5 Application case studies
In this section, we present an empirical application of the proposed methodology. We apply KOM in the evaluation of two spine surgical interventions. In addition, in a second empirical application presented in Section S5 of the Supplementary Material, we apply KOM in the evaluation of peer support on CD4 cell count at 12 months after trial recruitment among patients affected by HIV, in two target populations where patients were healthier compared to those of the trial population.
5.1 The effect of fusion-plus-laminectomy on ODI
In this section, we apply KOM in the evaluation of two spine surgical interventions, laminectomy alone versus fusion-plus-laminectomy, on the Oswestry Disability Index (ODI), among patients with lumbar stenosis or lumbar spondylolisthesis. Briefly, lumbar stenosis is caused by the narrowing of the space around the spinal cord in the lumbar spine. Lumbar spondylolisthesis is caused by the slippage of one vertebra on another. These pathologies lead to low back and leg pain, ultimately limiting the quality of life of the patients affected by them. When these pathologies can no longer be controlled by medications or physical therapy, surgical interventions may be needed. Typically, patients with lumbar stenosis are treated with laminectomy alone, while those with lumbar spondylolisthesis are treated with fusion-plus-laminectomy [23,25,26]. In addition, laminectomy alone is performed on patients with leg pain, while fusion-plus-laminectomy is performed on patients with mechanical back pain. This surgical practice leads to a practical positivity violation.
Unlike other medical areas, where randomized controlled trials are the gold standard for evaluating interventions, randomized controlled trials are rarely used to evaluate surgical interventions. This is due to practical and methodological issues. Lately, a number of large real-world observational datasets have collected information about surgical interventions and outcomes. However, these datasets are purely observational and confounding must be carefully taken into account. Furthermore, the assumption of correct model specification is hardly ever met. To overcome these challenges, in this section, we evaluate the effect of fusion-plus-laminectomy on ODI by estimating SATE, KOWATE, and KOSATE using KOM.
5.1.1 Study population
We used data from a single-institutional subset of the Spine QOD registry . QOD was launched in 2012 with the goal of evaluating the effectiveness of spine surgery interventions on the improvement of quality of life, pain, and disability. The registry contains clinical and demographic information as well as patient-reported outcomes. We restrict our study to patients who had their first spine surgery intervention, i.e., primary surgery. Demographic and clinical information was collected at the time of the patient interview, which took place before the surgical intervention. The outcome under study, ODI, was collected at 3-month follow-up. The study subset was composed of 313 patients. Two hundred forty-nine (79%) received laminectomy alone and 64 (21%) received fusion-plus-laminectomy. We identified the following variables as potential confounders: biological sex (female vs male), lumbar stenosis (yes vs no), lumbar spondylolisthesis (yes vs no), back pain (score from 0 to 10), leg pain (score from 0 to 10), activity at home (yes vs no), and activity outside home (yes vs no). As previously described, spine surgical practice may lead to a practical violation of the positivity assumption. For example, in our subset, less than 1% of patients with low-to-moderate leg pain were treated with fusion-plus-laminectomy.
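A practical positivity violation of the kind described above can be screened for by tabulating the share of treated patients within strata of a confounder. The following is a minimal Python sketch under stated assumptions: the function names, the stratum labels, the 5% threshold, and the toy data are all hypothetical and are not taken from the registry or from the original analysis.

```python
from collections import defaultdict


def treated_share_by_stratum(strata, treated):
    """Fraction of treated units within each stratum (parallel sequences)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for s, t in zip(strata, treated):
        counts[s][0] += int(t)
        counts[s][1] += 1
    return {s: n_t / n for s, (n_t, n) in counts.items()}


def flag_positivity(shares, eps=0.05):
    """Strata whose treated share lies within eps of 0 or 1 (eps is illustrative)."""
    return sorted(s for s, p in shares.items() if p < eps or p > 1 - eps)


# Toy data (hypothetical, not the QOD registry): no treated units in "low".
strata = ["low"] * 50 + ["high"] * 50
treated = [0] * 50 + [1] * 20 + [0] * 30

shares = treated_share_by_stratum(strata, treated)
flagged = flag_positivity(shares)
print(shares)
print(flagged)
```

Strata flagged in this way are those where one treatment arm is nearly absent, which is where inverse probability weights tend to become extreme.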
5.1.2 Models setup
We estimate SATE by solving optimization problem (3.3) with , and KOWATE and KOSATE by solving optimization problem (3.5), where we set and , respectively. We obtained by summing truncated weights obtained by using a logistic regression model and setting . Once the set of weights was obtained, we plugged them into a weighted ordinary least squares estimator. We used scikit-learn (through the R package reticulate) to tune the hyperparameters and the R interface of Gurobi to obtain the set of KOM weights. We computed robust (sandwich) standard errors in each case [29,12,30]. We used the R function lm for estimating SATE, KOWATE, and KOSATE and the R package sandwich to estimate robust standard errors.
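The final step, plugging a given set of weights into a weighted least squares fit and computing a robust standard error, can be illustrated with a short Python sketch. It is not the authors' R code. It assumes that the regression contains only an intercept and a binary treatment indicator, in which case the weighted least squares coefficient equals the difference in weighted group means and the HC0 sandwich variance has a closed form. The function name and the toy data are hypothetical, and the sketch treats the weights as fixed.

```python
import math


def weighted_effect_with_robust_se(y, t, w):
    """Weighted difference in means with an HC0 sandwich standard error.

    Equivalent to weighted least squares of y on (1, t) for binary t.
    Assumes the weights w are fixed, i.e. not estimated from the data.
    """
    treated = [(yi, wi) for yi, ti, wi in zip(y, t, w) if ti == 1]
    control = [(yi, wi) for yi, ti, wi in zip(y, t, w) if ti == 0]

    def mean_and_variance(group):
        s = sum(wi for _, wi in group)                 # sum of weights
        mu = sum(wi * yi for yi, wi in group) / s      # weighted mean
        meat = sum((wi * (yi - mu)) ** 2 for yi, wi in group)
        return mu, meat / s ** 2                       # HC0 variance of mu

    mu1, v1 = mean_and_variance(treated)
    mu0, v0 = mean_and_variance(control)
    return mu1 - mu0, math.sqrt(v1 + v0)


# Toy data (hypothetical): two treated and two control units, unit weights.
y = [3.0, 5.0, 1.0, 3.0]
t = [1, 1, 0, 0]
w = [1.0, 1.0, 1.0, 1.0]

effect, se = weighted_effect_with_robust_se(y, t, w)
print(effect, se)
```

With covariates in the regression, the closed form no longer applies and a general sandwich estimator, such as the one provided by the R package sandwich, would be needed.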
5.1.3 Results
In this section, we present the results of our analysis. Previous randomized trials showed no statistically significant difference between laminectomy alone and fusion-plus-laminectomy on ODI [31,32]. The proposed methodology consistently showed results similar to those of [31,32]. Specifically, Table 3 shows point estimates and standard errors with respect to SATE, KOWATE, and KOSATE. While the unadjusted method, i.e., the naive method regressing the outcome on the treatment only, shows a significant effect of fusion-plus-laminectomy on ODI, adjusted estimates from SATE, KOWATE, and KOSATE show no statistically significant effect. Standard errors are lower for KOWATE and KOSATE compared to SATE. Figure 3 shows the covariate balance with respect to SATE (top panel), KOSATE (middle panel), and KOWATE (lower panel). The black dots show the level of balance after weighting, while the light-gray dots show the unadjusted balance. KOWATE provides the lowest covariate balance compared with SATE and KOSATE. Finally, on the basis of the results obtained by applying KOM, we conclude that fusion-plus-laminectomy has no statistically significant effect on ODI.
5.1.4 What populations are KOWATE and KOSATE choosing?
As discussed in Section 3.3.2, by changing the set of weights , we change the target causal estimand considered and consequently the target population under study. In this section, based on the setting described in the previous section, we describe what population KOWATE and KOSATE are targeting. Table 4 shows the distributions of the confounders considered for analysis, in the study population (SATE), and in the target populations obtained when using the weights from KOWATE and KOSATE. Sample sizes for the KOWATE and KOSATE populations are 313 and 249, respectively. In other words, KOSATE discarded 64 participants from the study population. The minimum and maximum values of the weights chosen by KOWATE among the 64 participants discarded by KOSATE are 0.000001 and 0.01, respectively. This suggests that both KOSATE and KOWATE de-emphasize similar participants. In addition, using standard logistic regression models for IPW, the minimum and maximum estimated propensities of getting treated among these 64 discarded participants are 0.00001 and 0.99998 (with the mean equal to 0.07812), suggesting that both KOWATE and KOSATE discard participants for whom there is a lack of overlap. Consequently, in both target populations obtained by using KOWATE or KOSATE, participants without physical activity outside and inside the home are down-weighted. These participants would also be discarded if using truncated IPW weights at a reasonable truncation level. The mean and standard deviation (SD) of the continuous confounders considered (leg and back pain) do not seem to change significantly across populations. For instance, while the mean and SD of leg pain in the study population are 6.98 (2.87), in the KOWATE and KOSATE populations, they are 6.88 (2.64) and 7.12 (2.51), respectively. Similar results are obtained for lumbar stenosis and gender.
Finally, KOWATE up-weights participants with lumbar spondylolisthesis compared with those of the study population and KOSATE.
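Table 4: Distributions of the confounders in the study population (SATE) and in the target populations obtained when using the weights from KOWATE and KOSATE
|Confounder||Study population (SATE)||KOWATE||KOSATE|
|Binary confounders||N (%)||N (%)||N (%)|
|Lumbar spondylolisthesis (Y)||53.0 (16.9)||83.7 (26.7)||43.0 (17.3)|
|Lumbar stenosis (Y)||183.0 (58.5)||155.3 (49.6)||133.0 (53.4)|
|Physical activity outside home (N)||47.0 (15.0)||16.7 (5.3)||26.0 (10.4)|
|Physical activity inside home (N)||42.0 (13.4)||14.2 (4.5)||12.0 (4.8)|
|Gender (M)||188.0 (60.1)||192.6 (61.5)||152.0 (61.0)|
|Continuous confounders||Mean (SD)||Mean (SD)||Mean (SD)|
|Leg pain||6.98 (2.87)||6.88 (2.64)||7.12 (2.51)|
|Back pain||6.01 (3.29)||5.70 (3.17)||6.03 (3.12)|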
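Summaries like those in Table 4 describe a weighted population. The Python sketch below shows one plausible convention for computing them: a weighted count is the sum of the weights of the participants with the attribute, the percentage is taken relative to the sum of all weights, and the SD is the square root of the weighted variance around the weighted mean. The exact definitions used for Table 4 are not stated in the text, so these formulas, the function names, and the toy data are assumptions.

```python
import math


def weighted_count_and_percent(indicator, w):
    """Weighted count of units with the attribute and its share of total weight."""
    total = sum(w)
    count = sum(wi for wi, xi in zip(w, indicator) if xi)
    return count, 100.0 * count / total


def weighted_mean_sd(x, w):
    """Weighted mean and SD (variance normalized by the total weight)."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / total
    return mean, math.sqrt(var)


# Toy data (hypothetical): the third unit receives zero weight,
# so it is effectively discarded from the target population.
w = [2.0, 1.0, 0.0, 1.0]
has_condition = [1, 0, 1, 0]
pain_score = [2.0, 4.0, 6.0, 8.0]

count, percent = weighted_count_and_percent(has_condition, w)
mean, sd = weighted_mean_sd(pain_score, w)
print(count, percent, mean, sd)
```

With weights that are all 0 or 1, these formulas reduce to ordinary counts, percentages, means, and SDs in the retained subsample, which matches the integer counts reported in the KOSATE column.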
In this article, we presented a general causal estimand, GATE, that unifies previously proposed causal estimands, such as SATE, OWATE, OSATE, and TATE, among others, and motivates the formulation of new ones. We also presented and applied KOM to optimally estimate GATE. KOM directly and optimally controls both bias and variance, which mitigates possible model misspecification while controlling precision. In addition, by easily modifying the optimization problem that is fed to an off-the-shelf solver, the proposed method effectively targets different causal estimands of interest. Furthermore, by automatically learning the structure of the data, KOM makes it possible to balance linear, nonlinear, additive, and nonadditive covariate relationships. One future direction may be to extend KOM for GATE to the longitudinal setting with time-dependent confounders, extending the work of  to more general estimands. Another future direction may be to extend the analysis (specifically, Lemma 2.1 and Theorem 3.3) to the stratified setting where we condition on and/or . Yet another future direction may be to extend other approaches that aim for robustness to misspecification to the problem of estimating GATE [34,35,36, among others], for example, by deriving a Neyman-orthogonal score for GATE by orthogonalizing the identification formula in Lemma 2.1 and applying .
One issue with KOWATE and KOSATE is their interpretation. Here, we provide a rationale for the consideration of KOWATE and KOSATE, especially in the presence of a lack of overlap. First, similar to truncated IPW and overlap weights, the target population obtained by KOWATE and KOSATE is clinically relevant. This is because it highlights the portion of the sample where the treatment is actually applied . For instance, in the study population described in Section 5, less than 1% of patients with low-to-moderate leg pain were treated with fusion-plus-laminectomy. This would suggest that it would be more clinically relevant to target a population where subjects with low-to-moderate leg pain and those who receive fusion-plus-laminectomy were not included. This is particularly important when emulating a target randomized trial, where propensity scores close to 0 or 1 would suggest that important inclusion or exclusion criteria may not have been followed . Second, in the case of a homogeneous treatment effect, as in our simulation setting, the true effect of SATE, KOWATE, and KOSATE is the same. For instance, assuming that the effect of fusion-plus-laminectomy is homogeneous, a claim that it has no effect on ODI can be interpreted as that for SATE, the average effect of a treatment on an outcome among all individuals in the sample. Similar reasoning can be applied to the interpretation of confidence intervals. However, in the presence of a lack of overlap, KOWATE and KOSATE may be helpful in providing more precise estimates (as shown in Section S4.4 of the Supplementary Material). In the case of heterogeneous effects, the problem of large weights is exacerbated, and, consequently, conditional KOWATE and KOSATE could be used. Third, although this does not solve the issue of interpretation, deviation of estimands from SATE is common in statistical data analysis practice. For instance, many matching algorithms exclude subjects depending on some tuning parameters . Another widely used technique, weight trimming/truncation, also alters the estimand. While these deviations from SATE are made intentionally to deal with a lack of overlap, weight truncation, for instance, may introduce bias. Similar to other recently proposed successful techniques, such as overlap weights, the proposed estimands and methodology provide a less ad hoc way to deal with these issues.
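To make the comparison with truncation and overlap weights concrete, the Python sketch below computes three standard weighting schemes from a given propensity score: plain IPW, IPW with weights capped at a threshold, and overlap weights as commonly defined in the literature (1 − e for treated units and e for controls). It does not compute KOM weights, which require solving the optimization problems of Section 3. The function names, the cap of 20, and the toy propensities are illustrative assumptions.

```python
def ipw_weights(e, t):
    """Inverse probability weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1.0 / ei if ti == 1 else 1.0 / (1.0 - ei) for ei, ti in zip(e, t)]


def truncated_ipw_weights(e, t, cap=20.0):
    """IPW weights capped at `cap` (the cap value is illustrative)."""
    return [min(wi, cap) for wi in ipw_weights(e, t)]


def overlap_weights(e, t):
    """Overlap weights: 1 - e for treated, e for controls; always in [0, 1]."""
    return [1.0 - ei if ti == 1 else ei for ei, ti in zip(e, t)]


# Toy propensities (hypothetical): two units with almost no overlap.
e = [0.01, 0.5, 0.99]
t = [1, 0, 0]

w_ipw = ipw_weights(e, t)
w_trunc = truncated_ipw_weights(e, t)
w_overlap = overlap_weights(e, t)
print(max(w_ipw), max(w_trunc), max(w_overlap))
```

In this toy example, the units with propensities near 0 or 1 dominate the plain IPW weights, are held at the cap under truncation, and receive bounded weights under the overlap scheme. Each scheme therefore targets a different weighted population.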
The Authors thank Mattias Larsson, Nguyen Thi Kim Chuc, Do Duy Cuong and Vu Van Tam for providing access to the HIV dataset.
Funding information: This article is based upon work supported by the National Science Foundation under Grants Nos. 1656996 and 1740822.
Conflict of interest: Authors state no conflict of interest.
Buchanan AL, Hudgens MG, Cole SR, Mollan KR, Sax PE, Daar ES, et al. Generalizing evidence from randomized trials using inverse probability of sampling weights. J R Statist Soc A (Statist Soc). 2018;181(4):1193–209. doi:10.1111/rssa.12357.
Cain LE, Cole SR. Inverse probability-of-censoring weights for the correction of time-varying noncompliance in the effect of randomized highly active antiretroviral therapy on incident AIDS or death. Statist Med. 2009;28(12):1725–38. doi:10.1002/sim.3585.
Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics. 2000;56(3):779–88. doi:10.1111/j.0006-341X.2000.00779.x.
Cai T, Tian L, Wong PH, Wei L. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2010;12(2):270–82. doi:10.1093/biostatistics/kxq060.
Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Statist Assoc. 1994;89(427):846–66. doi:10.1080/01621459.1994.10476818.
Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Statistical models in epidemiology, the environment, and clinical trials. New York, NY: Springer; 2000. p. 95–133. doi:10.1007/978-1-4612-1284-3_2.
Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statist Med. 2004;23(19):2937–60. doi:10.1002/sim.7231.
Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Am Statist Assoc. 1995;90(429):106–21. doi:10.1080/01621459.1995.10476493.
Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models. J Am Statist Assoc. 1999;94(448):1096–120. doi:10.1080/01621459.1999.10473862.
Robins J, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statist Sci. 2007;22(4):544–59. doi:10.1214/07-STS227D.
Kang JD, Schafer JL. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist Sci. 2007;22(4):523–39.
Kallus N. Generalized Optimal Matching Methods for Causal Inference. 2016. arXiv:1612.08321.
Kallus N, Pennicooke B, Santacatterina M. More Robust Estimation of Sample Average Treatment Effects Using Kernel Optimal Matching in an Observational Study of Spine Surgical Interventions. 2018. arXiv:1811.04274.
Stuart EA, Cole SR, Bradshaw CP, Leaf PJ. The use of propensity scores to assess the generalizability of results from randomized trials. J R Statist Soc A (Statist Soc). 2011;174(2):369–86. doi:10.1111/j.1467-985X.2010.00673.x.
Resnick DK, Watters WC, Mummaneni PV, Dailey AT, Choudhri TF, Eck JC, et al. Guideline update for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 10: lumbar fusion for stenosis without spondylolisthesis. J Neurosurgery: Spine. 2014;21(1):62–6. doi:10.3171/2014.4.SPINE14275.
Waterman BR, Belmont Jr PJ, Schoenfeld AJ. Low back pain in the United States: incidence and risk factors for presentation in the emergency setting. Spine J. 2012;12(1):63–70. doi:10.1016/j.spinee.2011.09.002.
Eck JC, Sharan A, Ghogawala Z, Resnick DK, Watters III WC, Mummaneni PV, et al. Guideline update for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 7: lumbar fusion for intractable low-back pain without stenosis or spondylolisthesis. J Neurosurgery: Spine. 2014;21(1):42–7. doi:10.3171/2014.4.SPINE14270.
Raad M, Donaldson CJ, El Dafrawy MH, Sciubba DM, Riley III LH, Neuman BJ, et al. Trends in isolated lumbar spinal stenosis surgery among working US adults aged 40–64 years, 2010–2014. J Neurosurgery: Spine. 2018;29(2):169–75. doi:10.3171/2018.1.SPINE17964.
NeuroPoint Alliance I. QOD spine surgery registry; 2018. http://www.neuropoint.org/registries/qod-spine/.
Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. J Am Statist Assoc. 2001;96(454):440–8. doi:10.1198/016214501753168154.
Försth P, Ólafsson G, Carlsson T, Frost A, Borgström F, Fritzell P, et al. A randomized, controlled trial of fusion surgery for lumbar spinal stenosis. New England J Med. 2016;374(15):1413–23. doi:10.1056/NEJMoa1513721.
Ghogawala Z, Dziura J, Butler WE, Dai F, Terrin N, Magge SN, et al. Laminectomy plus fusion versus laminectomy alone for lumbar spondylolisthesis. New England J Med. 2016;374(15):1424–34. doi:10.1056/NEJMoa1508788.
Kallus N, Santacatterina M. Optimal balancing of time-dependent confounders for marginal structural models. 2018 June. arXiv e-prints. arXiv:1806.01083.
Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, et al. Double/debiased machine learning for treatment and structural parameters. Econometrics J. 2018;21(1):C1–C68. doi:10.1111/ectj.12097.
Hazlett C. Kernel balancing: a flexible non-parametric weighting procedure for estimating causal effects. 2018. Available at SSRN 2746753.
Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64. doi:10.1093/aje/kwv254.
© 2022 Nathan Kallus and Michele Santacatterina, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.