In the causal adjustment setting, variable selection techniques based only on the outcome or only on the treatment allocation model can result in the omission of confounders and hence may lead to bias, or the inclusion of spurious variables and hence cause variance inflation, in estimation of the treatment effect. We propose a variable selection method using a penalized objective function that is based on both the outcome and treatment assignment models. The proposed method facilitates confounder selection in high-dimensional settings. We show that under some mild conditions our method attains the oracle property. The selected variables are used to form a doubly robust regression estimator of the treatment effect. Using the proposed method we analyze a set of data on economic growth and study the effect of life expectancy as a measure of population health on the average growth rate of gross domestic product per capita.
1 Introduction

In the analysis of observational data, when attempting to establish the magnitude of the causal effect of a treatment (or exposure) in the presence of confounding, the practitioner is faced with modeling decisions that facilitate estimation. If one takes the parametric approach, at least one of two statistical models must be proposed: (i) the conditional mean model, which models the expected outcome as a function of predictors, and (ii) the treatment allocation model, which describes the mechanism via which treatment is allocated to (or, at least, received by) individuals in the study, again as a function of the predictors [1, 2].
Predictors that appear in both mechanisms (i) and (ii) are termed confounders, and their omission from models (i) and (ii) is typically regarded as a serious error, as it leads to inconsistent estimators of the treatment effect. Thus practitioners usually adopt a conservative approach and attempt to ensure that they do not omit confounders by fitting a richly parameterized treatment allocation model. The conservative approach, however, can lead to predictors of treatment allocation only – and not outcome – being included in the treatment allocation model. The inclusion of such “spurious” variables in model (ii) is usually regarded as harmless. However, the typical price of this conservatism is inflation of the variance of the effect estimator [3, 4]. Moreover, it has been shown that inclusion of variables that are only predictors of treatment may cause bias [7, 8]. This problem also applies to the conditional mean model, but is in practice less problematic there, as practitioners seem to be more concerned with bias removal and are therefore more likely to introduce spurious variables into model (ii). Little formal guidance as to how the practitioner should act in this setting has been provided. In this paper, we refer to variables that only predict the treatment and are not associated with the outcome as treatment predictors.
It is plausible that judicious variable selection may lead to appreciable efficiency gains, and several approaches with this aim have been proposed [10, 11]. However, confounder selection methods based on either just the treatment assignment model or just the outcome model may fail to account for non-ignorable confounders which barely predict the treatment or the outcome, respectively [12, 13]; in this manuscript, we use the term weak confounder for these variables. It has been shown that confounder selection procedures based on AIC and BIC can be sub-optimal, motivating a method based on the focused information criterion (FIC) which targets the treatment effect by minimizing a prediction mean squared error (a related cross-validation method has also been proposed). Another approach is the Super Learner estimator, which is computed by selecting a candidate from a set of estimators obtained from different models using a cross-validation risk [17, 18].
Bayesian adjustment for confounding (BAC) is a parametric variable selection approach that specifies a prior distribution over a set of possible models, including a dependence parameter representing the prior odds of including a variable in the outcome model given that the same variable is in the propensity score model. If we know a priori that a predictor of treatment is in fact a confounder, then this parameter can be fixed accordingly [13, 21]. A decision-theoretic approach to confounder selection that can handle high-dimensional cases has also been proposed: a Bayesian regression model is fit, and a set of candidate models is formed using the posterior credible region of the regression parameters. A sparse model is then found by penalizing models that do not include confounders. This method is conservative in the sense that it may include treatment predictors, which can inflate the variance of the treatment effect estimator. Also, tuning the penalty function can be challenging. Doubly robust procedures have also been proposed in which variables are selected using both the outcome and the treatment assignment models [12, 24].
Asymptotically, penalizing the conditional outcome model, given treatment and covariates, is known to be a valid variable selection strategy for causal effect estimation; however, for small to moderate sample sizes, it may omit weak confounders. The objective of this manuscript is to improve the small-sample performance of the outcome penalization strategy while maintaining its asymptotic performance (Table 2). We present a covariate selection procedure which facilitates the estimation of the treatment effect in high-dimensional settings. Specifically, we propose a penalized objective function which considers both covariate-treatment and covariate-outcome associations and has the ability to select even weak confounders. This objective function is used to identify the set of non-ignorable confounders and predictors of outcome; the resulting parameter estimates do not have any causal interpretation. We derive the asymptotic properties of the procedure and show that under some mild conditions the estimators have oracle properties (specifically, they are consistent and asymptotically normally distributed). We utilize the selected covariates to estimate the causal effect of interest using a doubly robust estimator.
2 Preliminaries & notation
In standard notation, let denote the (potential) outcome arising from treatment , and let denote the treatment received. We consider for illustration the case of binary treatment. The observed outcome, , is defined as . We restrict attention here to the situation where each predictor can be classified into one of three types, and to single time-point studies. We consider
treatment predictors (), which are related to treatment and not to outcome.
confounders (), which are related to both outcome and treatment.
outcome predictors (), which are related to outcome and not to treatment;
see the directed acyclic graph (DAG) in Figure 1.
In addition, as is usual, we will make the assumption of no unmeasured confounders, that is, that treatment received and potential outcome to treatment , , are independent, given the measured confounders, i.e., . In any practical situation, to facilitate causal inference, the analyst must make an assessment as to the structural nature of the relationships between the variables encoded by the DAG in Figure 1.
2.1 The propensity score for binary treatments
The propensity score, , for binary treatment is defined as , where is a -dimensional vector of (all) covariates. In its random variable form, the propensity score can be shown to be the coarsest function of the covariates that exhibits the balancing property, that is, . As a consequence, the causal effect can be computed by iterated expectation
Remark 1: Inclusion of covariates that are related only to the outcome in the propensity score model increases the covariance between the fitted and , and decreases the variance of the estimated causal effect, in line with existing results.
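To make the propensity score concrete, the following sketch simulates covariates, fits a logistic propensity model by Newton-Raphson, and recovers the fitted scores. The sample size, coefficients, and design are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
alpha = np.array([1.0, -0.5, 0.0])            # hypothetical true coefficients
ps_true = 1.0 / (1.0 + np.exp(-(X @ alpha)))  # true propensity scores
D = rng.binomial(1, ps_true)                  # treatment assignment

# Fit the logistic propensity model by Newton-Raphson.
b = np.zeros(p)
for _ in range(25):
    e = 1.0 / (1.0 + np.exp(-(X @ b)))
    W = e * (1.0 - e)                 # logistic variance weights
    grad = X.T @ (D - e)              # score vector
    hess = X.T @ (X * W[:, None])     # observed information
    b += np.linalg.solve(hess, grad)

ps_hat = 1.0 / (1.0 + np.exp(-(X @ b)))   # fitted propensity scores
```

Conditioning on `ps_hat` (rather than all of `X`) is what the balancing property licenses.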
2.2 Penalized estimation
In a given parametric model, if is a -dimensional regression coefficient, is a penalty function and is the negative log-likelihood, the maximum penalized likelihood (MPL) estimator is defined as
MPL estimators are shrinkage estimators, and as such they typically have more finite-sample bias, though less variance, than their unpenalized analogues. Commonly used penalty functions include the LASSO, SCAD, the Elastic Net, and hard thresholding.
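For instance, under an orthonormal design the LASSO penalty reduces to componentwise soft-thresholding of the least squares estimate; a minimal sketch with made-up coefficients:

```python
import numpy as np

def soft_threshold(z, lam):
    """LASSO solution under an orthonormal design: shrink each
    least squares coefficient toward zero by lam, truncating at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

beta_ols = np.array([3.0, -0.4, 0.05, -2.0])   # illustrative OLS estimates
beta_lasso = soft_threshold(beta_ols, 0.5)
# the two small coefficients are set exactly to zero,
# the large ones are shrunk by 0.5
```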
The remainder of this paper is organized as follows. Section 3 presents our two step variable selection and estimation procedure; we establish its theoretical properties. The performance of the proposed method is studied via simulation in Section 4. We analyze a real data set in Section 5, and Section 6 contains concluding remarks. All proofs are given in Appendix A of the Supplementary Materials.
3 Penalization and treatment effect estimation
In this section, we present our proposed method for estimating the treatment effect in high-dimensional cases. We separate the covariate selection and treatment effect estimation procedures. First, we form a penalized objective function which is used to identify the important covariates, and establish the theoretical properties of the resulting estimators (i.e., the minimizers of the penalized objective function). Note that because of the special characteristics of this function, the estimators do not have any causal interpretation and are used only to prioritize variables. Second, treatment effect estimation is performed using a doubly robust estimator with the selected covariates. We use a simple model structure to illustrate the methodology, and assume that a random sample of observations of outcome, exposure and covariates is available.
In order to present the method, we initially assume that columns of are orthogonal; this simplifying assumption is relaxed in Appendix C of the Supplementary Materials.
3.1 Penalized objective function
Consider the following linear outcome model under a binary exposure
where is a standardized covariate vector, and is standard normal residual error. Assuming a logit model for the propensity score, we have
Let denote the corresponding design matrix. In the formulation, and denote vectors of parameters and is the treatment effect.
First, we form an objective function, , in which the coefficients and are replaced by a single common vector of coefficients that is proportional to a weighted sum of and , where denotes the componentwise absolute value. This will guarantee that the objective function satisfies the following condition:
Condition: An element, , of the minimizer of converges to zero as if the corresponding covariate is associated with neither the treatment nor the outcome.
In standard linear regression where we regress on a vector of covariates (with no treatment variable) that are presumed orthogonal, an example of an objective function that estimates is
yielding the estimating equation and estimate
We have used the fact that is an identity matrix due to standardization. It is clear that as , Similarly, in a binary exposure setting, can be estimated using the objective function based on minus the log-likelihood for a logistic regression. Now, we combine these two objective functions and show that the resulting objective function has some interesting features. First, we obtain the least squares (or ridge) estimate of by regressing on and the vector of covariates . Because of the inclusion of spurious and treatment predictors in the model, is not efficient but is consistent for under the no unmeasured confounders assumption and a correctly specified outcome model. Define . Let
where is a positive constant. Under orthogonality of and standardization, we have
Note that for each , the parameter corresponding to is the same in both models. Now, we show that in this function the absolute values play a critical role in satisfying the condition. Let and be the least squares estimate of the parameters in the outcome model. By convexity of , must be the unique solution to , which implies that
This equation shows that (i.e. the condition is satisfied) if . Note that implies . Let be the maximum likelihood estimate of the parameters in the treatment model. Then, using the first two terms of a Taylor expansion, we have , , and thus
Hence, is a weighted sum of and . The constant controls the contribution of the components and to the estimator. For example, for , ; while for , . Thus, gives flexibility to our objective function such that, as it decreases to zero, the proposed estimate converges to . See Section 3.2 for a more detailed discussion.
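The weighting idea can be caricatured numerically as follows. The normalized form below is one plausible choice for illustration, not the paper's exact formula, and all coefficient values are hypothetical:

```python
import numpy as np

def combined_coef(beta_abs, alpha_abs, c):
    """Hypothetical weighted combination of outcome-model (beta) and
    treatment-model (alpha) coefficient magnitudes; c controls the
    contribution of the treatment model (c = 0 ignores it entirely)."""
    return (beta_abs + c * alpha_abs) / (1.0 + c)

beta_abs  = np.array([0.8, 0.02, 0.0])   # |outcome-model coefficients|
alpha_abs = np.array([0.0, 1.5, 0.9])    # |treatment-model coefficients|

# Covariate 1 is a weak confounder: tiny outcome signal, strong treatment signal.
weak_conf    = combined_coef(beta_abs, alpha_abs, c=1.0)[1]
outcome_only = combined_coef(beta_abs, alpha_abs, c=0.0)[1]
```

With `c > 0` the weak confounder's combined coefficient stays well away from zero, whereas an outcome-only criterion would shrink it away.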
Figure 2 visually presents how behaves, for a fixed , when converges to zero as the sample size increases, i.e., as the th variable becomes insignificant. Specifically, this figure presents a case where there is just one covariate (i.e., ) and the coefficients of this covariate in the outcome and treatment models are and , respectively, where is the sample size. As expected, the component of corresponding to this covariate does not converge to zero as the sample size increases. The same behavior would be observed if as and is a non-zero constant.
Therefore, penalizing objective function (2) results in selecting covariates that are either related to outcome (Type-III) or treatment (Type-I). However, this is against our goal of keeping variables that are either predictors of the outcome or non-ignorable confounders and excluding treatment predictors (Type-I). To deal with this problem, we present a weighted lasso penalty function that is tailored specifically for causal inference variable selection:
where and are the least squares (or ridge) and maximum likelihood estimates of the parameters in the outcome and treatment models, respectively, and is a tuning parameter. Thus, the proposed modified penalized objective function is given by
We refer to as penalized modified objective function estimators (PMOE). The magnitude of the penalty on each parameter is proportional to its contribution to the outcome and treatment models. Note that as , our penalty function imposes a heavier penalty on the th parameter while accounting for the covariate-treatment association. For example, when a covariate barely predicts both the outcome and the treatment, the proposed penalty function imposes a stronger penalty on the parameter than when a covariate barely predicts the outcome but is strongly related to the treatment. This is an important feature of the proposed penalty function which allows such weak confounders to be selected in small samples. The proposed penalized estimator asymptotically selects the same covariates as the adaptive lasso on the outcome model [25, 32]. Thus, the proposed method improves the small-sample performance of the outcome model penalization strategy while maintaining its asymptotic properties.
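The spirit of the weighted penalty can be sketched as follows. The weight function below is a hypothetical stand-in, chosen only to show the qualitative behavior: a covariate strongly related to the treatment receives a lighter penalty than one weakly related to both models.

```python
import numpy as np

def pmoe_style_weights(beta_hat, alpha_hat, eps=1e-8):
    """Hypothetical adaptive-lasso-style weights: the penalty on
    coefficient j shrinks when covariate j predicts the outcome
    (beta_hat) or the treatment (alpha_hat), so a weak confounder
    that strongly predicts treatment is penalized less."""
    return 1.0 / (np.abs(beta_hat) + np.abs(alpha_hat) + eps)

beta_hat  = np.array([0.05, 0.05])   # both covariates barely predict the outcome
alpha_hat = np.array([0.01, 2.00])   # only the second strongly predicts treatment
w = pmoe_style_weights(beta_hat, alpha_hat)
# w[1] << w[0]: the weak confounder (index 1) is penalized far less
```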
3.2 The role of in PMOE
The constant reflects the investigator's beliefs about the importance of including variables that are weakly (strongly) related to the outcome and strongly (weakly) related to the treatment. For smaller values of , variables that are weakly related to the outcome but strongly related to the treatment contribute more to and have a greater chance of being selected. Also, when , the treatment mechanism makes no contribution to the objective function (2). In that case, our procedure performs similarly to outcome-model-based variable selection, which may exclude non-ignorable confounders that are weakly related to the outcome. Investigators often have no a priori notion of how to weigh the importance of different types of confounders. In Section 3.4, a method that helps the investigator choose this tuning parameter properly is presented. Our simulation studies shed more light on the role of in our procedure.
3.3 Main theorem
Suppose is the true parameter value of the -dimensional vector of parameters, where contains those elements of that are in fact zero, so that the corresponding predictors are not confounders; denotes the true number of predictors present in the model (exact sparsity assumption). Let be the vector of estimators corresponding to (4). The next theorem establishes the sparsity and asymptotic normality of the proposed penalized estimators under the following two conditions:
 (a) Let
We assume that converges in distribution to a multivariate normal .
 (b) Let
where is a positive definite matrix and is the identity matrix.
(Oracle properties) Suppose conditions (a) & (b) are fulfilled, further and . Then
where is the true vector of non-zero coefficients. Also, and are corresponding elements of and , respectively.
3.4 Choosing the tuning parameters
We select the tuning parameter using the Generalized Cross Validation (GCV) method. Given the covariates selected for a specific value of , we first regress on and calculate the residual sum of squares (RSS) of this model. Then
where is the effective number of parameters and . The selected tuning parameter is defined by .
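A generic GCV score of this form can be computed as below; the residual sum of squares and the effective degrees of freedom are supplied by the user, and the candidate fits are illustrative:

```python
import numpy as np

def gcv(y, y_hat, df):
    """Generalized cross-validation score: residual sum of squares
    inflated by the effective number of parameters df."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return rss / (n * (1.0 - df / n) ** 2)

# Compare two hypothetical candidate fits and keep the smaller score.
y = np.arange(10.0)
score_good = gcv(y, y + 0.1, df=3)   # accurate fit, 3 effective parameters
score_bad  = gcv(y, y + 1.0, df=2)   # poor fit, 2 effective parameters
best = min(score_good, score_bad)
```

The tuning parameter minimizing this score over a grid is the one retained.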
3.5 Estimation of the causal effect
For treatment effect estimation, we fit the following model using the set of covariates selected in the previous step; note that a user may want to use other causal adjustment models such as inverse probability weighting or propensity score matching, and the selection approach can be used for these procedures also.
Our model is a slight modification of the conventional propensity score regression approach of , and specifies
where , is a function of covariates and is the propensity score. The quantity is used in place of ; if is used, the fitted model may result in a biased estimator for since may be incorrectly specified. By defining in this way, we restore for where is the number of selected variables (if is correctly specified), as is the (fitted) expected value of , and hence , where . Therefore, misspecification of will not result in an inconsistent estimator of provided that is correctly specified. Moreover, it has been shown that the proposed estimator attains the semiparametric efficiency bound when both models are correctly specified.
In general, this model results in a doubly robust estimator: it yields a consistent estimator of the treatment effect if either the propensity score model or the conditional mean model (5) is correctly specified, and is the most efficient estimator when both are correctly specified. For additional details on the related asymptotic and finite-sample behavior, see [39, 40, 41].
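A generic augmented-IPW estimator, a standard doubly robust construction rather than the paper's exact propensity score regression, illustrates the double robustness property; the simulated data and nuisance models are assumptions for the demo:

```python
import numpy as np

def aipw_ate(y, d, ps, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of the average treatment
    effect: consistent if either the propensity scores ps or the
    outcome regressions mu1/mu0 are correctly specified."""
    t1 = mu1 + d * (y - mu1) / ps
    t0 = mu0 + (1 - d) * (y - mu0) / (1 - ps)
    return float(np.mean(t1 - t0))

# Toy check with both nuisance models correct (true effect = 2).
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
ps = np.full(n, 0.5)                        # known randomization probability
d = rng.binomial(1, ps)
y = 2.0 * d + x + 0.1 * rng.normal(size=n)
ate = aipw_ate(y, d, ps, mu1=2.0 + x, mu0=x)
```

Replacing `mu1`/`mu0` with a wrong outcome model (while keeping `ps` correct), or vice versa, still yields a consistent estimate, which is the doubly robust guarantee.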
Remark 2: The importance of the doubly robust estimator is that, provided the postulated outcome and treatment models identify the true non-zero coefficients in each model, the proposed estimation procedure will consistently estimate the treatment effect given that at least one of the propensity score or outcome models is correctly specified. Assuming linear working models, a sufficient but not necessary condition for selecting non-ignorable confounders is the linearity of the true models in their parameters.
The model chosen for estimation of the treatment effect is data dependent. Owing to the inherent uncertainty in the selected model, statistical inference about the treatment effect is a form of “post-selection inference”. Hence, inference about the treatment effect obtained in the estimation step needs to be done with caution. Under certain regularity conditions, weak consistency of the post-selection estimator has been established [44, 45].
The post-selection inference issue implies that the standard errors of obtained by fitting a linear model (5) that includes the selected variables are, in general, underestimated, and thus the standard confidence intervals will undercover. To mitigate this problem, we propose to approximate the standard errors of our estimator using a thresholded bootstrap. Specifically, we bootstrap the sample and, in each bootstrap replicate, force the components of the penalized estimator to zero whenever they are close to zero and estimate the treatment effect using the selected covariates. Although our simulation results suggest that the thresholded bootstrap method produces valid confidence intervals, developing the theoretical properties of the method merits further investigation that is beyond the scope of this paper.
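The thresholded bootstrap can be sketched generically as follows; `select_fn` and `effect_fn` are user-supplied stand-ins for the penalized selection and effect estimation steps, illustrated here with a trivial mean estimator:

```python
import numpy as np

def thresholded_bootstrap_se(data, select_fn, effect_fn, tol=1e-3, B=200, seed=0):
    """Thresholded bootstrap sketch: on each resample, rerun the
    penalized selection step, force near-zero coefficients to zero,
    and re-estimate the effect using the selected set."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    effects = []
    for _ in range(B):
        boot = data[rng.integers(0, n, size=n)]   # resample rows with replacement
        coef = select_fn(boot)
        coef[np.abs(coef) < tol] = 0.0            # threshold near-zero estimates
        effects.append(effect_fn(boot, coef != 0.0))
    return float(np.std(effects, ddof=1))

# Toy illustration: the "effect" is just a sample mean.
rng = np.random.default_rng(1)
data = rng.normal(loc=1.0, scale=1.0, size=(400, 1))
se = thresholded_bootstrap_se(
    data,
    select_fn=lambda b: np.array([b.mean()]),
    effect_fn=lambda b, keep: float(b.mean()),
)
```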
3.6 Summary of the procedure
Step 1: Estimate the vector of parameters , where is defined in (4).
Step 2: Using the covariates with , estimate the propensity score .
Step 3: Define a random variable and fit the outcome model . The vector of parameters is estimated using ordinary least squares. For simplicity, we assume a linear working model for . The design matrix includes the subset of variables with .
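The final estimation step can be illustrated with simulated data in which the true propensity score is taken as known and the true treatment effect is 1.5. The outcome regression below includes the treatment, the propensity score, and the selected covariate, as a simplified stand-in for model (5); all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                       # a selected confounder
ps = 1.0 / (1.0 + np.exp(-x))                # true propensity score (assumed known)
d = rng.binomial(1, ps)
y = 1.5 * d + 2.0 * x + rng.normal(size=n)   # true treatment effect = 1.5

# Outcome regression on [intercept, treatment, propensity score, covariate].
Z = np.column_stack([np.ones(n), d, ps, x])
theta = np.linalg.lstsq(Z, y, rcond=None)[0]
ate_hat = float(theta[1])                    # coefficient on the treatment
```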
4 Simulation studies
In this section, we study the performance of our proposed variable selection method using simulated data. We compare our results with the BAC method, the Bayesian credible region (Cred. Reg.) approach, the collaborative targeted maximum likelihood estimator (CTMLE), a doubly robust approach (Belloni), and the outcome-penalized estimator (Y-fit), in which we regress the outcome against the treatment and covariates but only penalize the covariates. Our simulation includes a scenario in which there is a weak confounder that is strongly related to the treatment but weakly to the outcome. We consider linear working models for throughout this section.
S.D: empirical standard error; Covg: coverage of confidence intervals; CTMLE: collaborative TMLE; Belloni: doubly robust approach; Cred. Reg.: Bayesian credible region; BAC: Bayesian adjustment for confounding; Y-fit: penalized outcome model via the LASSO; PMOE: proposed method; Oracle: true model.
We generate 500 data sets of sizes 300 and 500 from the following two models:
where in both models
and has a for . Note that in the second scenario, is considered a weak confounder. Results are summarized in Table 1; the Y-fit row refers to the estimator obtained by penalizing the outcome model using the LASSO penalty, and the Oracle row refers to the estimator obtained by including in the propensity score and the outcome model (5). We studied the performance of PMOE for different values of the constant , as well as the optimal value chosen by the GCV criterion. In the first scenario there is no weak confounder, and the Y-fit row is omitted since the results are similar to the PMOE row. Bayesian adjustment for confounding (BAC) has been implemented using the R package BEAU, and the Bayesian credible region (Cred. Reg.) using the R package BayesPen with a flat prior. The doubly robust variable selection method, i.e., Belloni, is implemented using the R package hdm. Finally, we implemented CTMLE.
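A generative scheme of the same flavor as scenario 2 can be sketched as follows: a pure treatment predictor, a weak confounder (strong for treatment, weak for outcome), and a pure outcome predictor, with all coefficient values illustrative rather than the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 10
X = rng.normal(size=(n, p))

# Hypothetical coefficients mimicking the three covariate types:
# X0: treatment predictor, X1: weak confounder (strong for treatment,
# weak for outcome), X2: outcome predictor; the rest are noise.
alpha = np.zeros(p); alpha[0], alpha[1] = 1.0, 1.0   # treatment model
beta  = np.zeros(p); beta[1],  beta[2]  = 0.2, 1.0   # outcome model

ps = 1.0 / (1.0 + np.exp(-(X @ alpha)))
D = rng.binomial(1, ps)
Y = 1.0 * D + X @ beta + rng.normal(size=n)          # true effect = 1.0
```

An outcome-only penalty would tend to drop X1 (coefficient 0.2) at this sample size, which is exactly the failure mode the weak-confounder scenario probes.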
Figure 3 shows the proportion of times that our algorithm selects a certain value of the tuning parameter . In scenario 1, where there is no weak confounder, the algorithm tends to select higher values of , which means the procedure adaptively selects the important covariates mostly based on the outcome model; in scenario 2, where is a non-ignorable weak confounder, the algorithm puts more weight on the treatment model by selecting smaller values of .
The variance of the BAC estimator is inflated due to the inclusion of spurious variables that are not related to the outcome. The PMOE and Cred. Reg. estimators, however, are unbiased and have smaller variance. In fact, PMOE has the lowest variance of the three methods regardless of the value of . In the second scenario, the Y-fit estimator is biased because it under-selects the confounder . Also, the bias of the proposed PMOE estimator increases with , which is expected because as increases the proposed method behaves more like outcome-based variable selection methods such as Y-fit. In general, doubly robust procedures, e.g., CTMLE and Belloni, are sensitive to the positivity assumption. Specifically, when the estimated propensity score based on the selected variables is close to the boundaries, such methods lead to unstable estimates and possibly undercovered confidence intervals. Both CTMLE and Belloni produce confidence intervals that undercover, and CTMLE appears to be affected more seriously by this issue. The main drawback of Belloni’s method is that it fails to eliminate variables that only predict the treatment from the set of selected variables. As our results in Table 1 show, this deficiency inflates the standard errors of the corresponding estimates. The tuned PMOE estimator outperforms all the other estimators and performs similarly to the oracle estimator, which confirms that our method successfully tunes .
Number of coefficients that are correctly or incorrectly set to zero. Cred. Reg.: Bayesian credible region; BAC: Bayesian adjustment for confounding; Y-fit: penalized outcome model via the LASSO; PMOE: proposed method.
Table 2 presents the average number of coefficients set to zero correctly and incorrectly under the second scenario. There are four non-zero coefficients in our generative model, so the number in the correct column should be 96 and in the incorrect column should be 0. The table shows that Cred. Reg. and BAC are somewhat conservative and include some variables that should not be included, which can be the source of the variance inflation observed in Table 1. This is mostly due to the inclusion of variables that are predictors in the treatment model but have no association with the outcome. Also, increasing in PMOE increases the chance of setting the coefficient of to zero. The Y-fit row shows that this method sets a nonzero coefficient to zero (i.e., the coefficient of ), which explains the bias in Table 1. The tuned PMOE estimator has satisfactory performance in ignoring covariates that are only predictors of the treatment mechanism while selecting all the non-ignorable confounders. This, in fact, highlights the importance of our proposed method. We could not obtain the information needed for Table 2 from the available R code for CTMLE and Belloni’s method, and thus they are omitted.
Simulation studies presented in Appendix D of the Supplementary Materials examine the performance of our covariate selection and estimation procedure when the covariates are non-orthogonal. Web Tables 2 and 3 in Appendix D show that the proposed method still outperforms the other methods. Moreover, in Appendix B of the Supplementary Materials, we study cases in which the number of covariates is larger than the sample size (). We also investigate cases where either the working model for the propensity score or the outcome model is misspecified. Our results show that the proposed method performs well, and outperforms Y-fit (Web Table 1).
5 Application: the effect of life expectancy on economic growth
In this section, we use the proposed method to analyze the cross-country economic growth data reported by  and compare our findings with similar studies on the same data set. For illustration purposes, we focus on a subset of the data which includes 88 countries and 35 variables. We are interested in selecting non-ignorable variables which confound the effect of life expectancy (exposure variable), as a measure of population health, on the average growth rate of gross domestic product per capita in 1960-1996 (outcome).
The causal (or, at least, unconfounded) effect of life expectancy on economic growth is controversial: some studies find no evidence that increasing life expectancy raises economic growth, while others suggest it might have a positive effect. We dichotomize life expectancy at the observed median, which is 50 years. Hence, the exposure variable D = 1 if life expectancy is below 50 years in that country and 0 otherwise.
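The dichotomization step amounts to thresholding at the observed median (50 years in our data; the values below are illustrative):

```python
import numpy as np

life_exp = np.array([42.0, 48.5, 50.0, 55.1, 63.2, 47.9])  # illustrative values
median = float(np.median(life_exp))
D = (life_exp < median).astype(int)   # D = 1 if below the observed median
```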
[Table 3: variables selected by at least one method — Air Distance to Big Cities, Fraction of Catholics, Population Density 1960, East Asian Dummy, Initial Income (Log GDP in 1960), Public Education Spending Share, Nominal Government Share, Land Area Near Navigable Water, Fraction GDP in Mining, Real Exchange Rate Distortions, Latin American Dummy, Landlocked Country Dummy, Oil Producing Country Dummy]
Cred. Reg.: Bayesian credible region; BAC: Bayesian adjustment for confounding; Y-fit: penalized outcome model via the LASSO; PMOE: proposed method.
We applied the method described in Section 3 to select important confounders. In our analysis, Y-fit refers to the case where just the outcome model is penalized using the LASSO to select the significant covariates. We also implement BAC and the Credible region method with a flat prior, as well as Belloni’s method and CTMLE. Our procedure selects as the optimal value for the tuning parameter. The estimates obtained by PMOE, i.e., 0.617 with CI: (1.27,1.361), and CTMLE, i.e., 0.719 with CI: (0.512,0.927), are fairly close. However, the standard error of the latter is 70% smaller than that of the former, which raises concern about the validity of the corresponding confidence intervals (see Table 1). Y-fit and Belloni lead to the smallest and largest effects, respectively. Based on our simulation studies, we conjecture that this is because Y-fit may ignore some non-ignorable confounders and Belloni’s method may include some treatment predictors in the model. The BAC estimate, i.e., 0.524 with CI: (0.228,1.286), is also close to the PMOE estimate. The Credible region method leads to the second highest effect estimate, 0.820 with CI: (0.203,1.341).
Table 3 presents the list of variables that are selected by at least one of the methods. For brevity, we focus on Y-fit, BAC, Cred. Reg., and PMOE. The proposed method selects 12 or 9 variables depending on the value of , while Y-fit, BAC and Cred. Reg. select 7, 6 and 11 variables, respectively. Y-fit and PMOE are susceptible to under-selecting some non-ignorable confounders that barely predict the outcome. Specifically, our results suggest that Population Density 1960 and Initial Income are such non-ignorable confounders, which are known to be important confounders in the economics literature. Table 4 shows that ignoring these variables leads to substantially different treatment effect estimates. Moreover, because there is evidence that variables such as Oil Producing Country, Land Area Near Navigable Water, and Nominal Government Share may confound the effect of life expectancy on economic growth [48, 49, 51, 52, 53, 54, 55, 56], the PMOE variant that retains these variables seems to select a more reasonable set of covariates. Despite the small discrepancies in the selected variables between BAC, Cred. Reg. and PMOE, these methods lead to the same conclusion and suggest that while life expectancy has a positive effect on the average growth rate of gross domestic product per capita in 1960-1996, the effect is not statistically significant.
To gain insight into the effect of the parameter on the selection results, we plot the estimated coefficients for different values of at given tuning parameter values. In Figure 4, the blue solid (dark dashed) lines correspond to coefficients whose estimated value is (not) greater than 0.05 for the entire range of . Displays (a)–(d) correspond to tuning parameter values , 0.0001, 0.001, and 0.01, respectively. Figure 4 shows that, for different values of , the selected set of covariates may vary slightly. For example, in Figure 4(c), where a moderate penalty is imposed, the proposed method suggests adding Colony (the dashed line) to the selected set for larger values of . Also, as the penalty becomes stronger, the estimated coefficients become more stable across values of (Figure 4(d)). This is because, for larger values of , the penalty function plays a more dominant role in the variable selection than .
Table 4 also reports the standard errors of the estimated effect of life expectancy using the different penalization methods. The standard error of the PMOE estimator is approximated using the thresholded bootstrap described in Section 3.5: we bootstrap the sample and, in each bootstrap replicate, force the components of the penalized estimator to zero whenever they are close to zero, then estimate the treatment effect using the selected covariates. Although more investigation is required to validate the asymptotic properties of this method, we use these standard errors only to shed light on the behavior of the penalized estimators. For example, the estimators corresponding to PMOE and Y-fit have the lowest standard errors, which may reflect the possibility of under-selecting important covariates. Our results suggest that although the effect of life expectancy is positive, it is unlikely to be significant, which is consistent with earlier analyses of these data.
ATE: average treatment effect; CTMLE: collaborative TMLE; Belloni: doubly robust approach; Cred. Reg.: Bayesian credible region; BAC: Bayesian adjustment for confounding; Y-fit: penalized outcome model via the lasso; PMOE: proposed method; CI: confidence interval.
We have established a two-step procedure for estimating an unconfounded treatment effect in high-dimensional settings. First, we address sparsity by penalizing a modified objective function that considers both covariate–outcome and covariate–treatment associations. Second, the selected variables are used to form a doubly robust regression estimator of the treatment effect by incorporating the propensity score into the conditional expectation of the outcome. The selected covariates may also be used with causal techniques other than the proposed regression method. The proposed method may further be used to identify candidate instrumental variables [57, 58, 59, 60]: one can first penalize the treatment model and record the selected variables, then apply the proposed method, and identify as candidate instruments those variables selected by the treatment model but not by the proposed method.
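The screening idea in the last sentence amounts to a set difference between two fitted support sets. A toy sketch, in which the variable names and the two selection lists are purely hypothetical:

```python
def iv_candidates(selected_by_treatment_model, selected_by_joint_method):
    """Candidate instruments: variables the penalized treatment model selects
    but the joint outcome-and-treatment penalized objective does not."""
    return sorted(set(selected_by_treatment_model) - set(selected_by_joint_method))

# hypothetical selections for illustration only
treatment_only = ["initial_income", "pop_density_1960", "trade_policy_index"]
joint = ["initial_income", "pop_density_1960"]
print(iv_candidates(treatment_only, joint))  # prints ['trade_policy_index']
```

Variables surviving this screen predict treatment but appear unrelated to the outcome given the other covariates, which is necessary (though not sufficient) for a valid instrument.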
As described in Section 3.5, any covariate selection procedure that involves the outcome variable affects subsequent inference on the selected coefficients [61, 62, 63, 64, 65], because the selected model is itself stochastic and this randomness needs to be accounted for. A method has been proposed to produce valid confidence intervals for the coefficients of the selected model in the post-selection context. In our setting, although we do not penalize the treatment effect, the randomness of the selected model affects inference about the causal effect parameter through confounding. Moreover, note that the oracle property of penalized regression estimators is a pointwise asymptotic feature and does not necessarily hold uniformly over the parameter space [67, 68]. In this manuscript, we assume that the parameter dimension is fixed while the number of observations tends to infinity. One important extension of our work is to generalize the framework to settings where the parameter dimension grows with the sample size. Analyzing the convergence of the estimated parameter vector in this more general setting requires an adaptation of the restricted eigenvalue condition or the restricted isometry property.
Funding statement: This research was partly supported by the Natural Sciences and Engineering Research Council of Canada through Discovery Grants to Masoud Asgharian (NSERC RGPIN 217398-13).
4. Schisterman EF, Cole S, Platt RW. Overadjustment bias and unnecessary adjustment in epidemiologic studies. Epidemiology 2009;20:488. doi:10.1097/EDE.0b013e3181a819a1.
7. Patrick AR, Schneeweiss S, Brookhart MA, Glynn RJ, Rothman KJ, Avorn J, Stürmer T. The implications of propensity score variable selection strategies in pharmacoepidemiology: an empirical illustration. Pharmacoepidemiol Drug Saf 2011;20:551–559. doi:10.1002/pds.2098.
8. Pearl J. On a class of bias-amplifying variables that endanger effect estimates. arXiv preprint arXiv:1203.3503, 2012.
10. Brookhart MA, Schneeweiss S, Rothman KJ, Glynn RJ, Avorn J, Stürmer T. Variable selection for propensity score models. Am J Epidemiol 2006a;163:1149–1156. doi:10.1093/aje/kwj149.
11. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009;20:512. doi:10.1097/EDE.0b013e3181a663cc.
15. Brookhart MA, van der Laan MJ. A semiparametric model selection criterion with applications to the marginal structural model. Comput Stat Data Anal 2006b;50:475–498. doi:10.1016/j.csda.2004.08.013.
17. Sinisi S, Polley E, Petersen M, Rhee S, van der Laan M. Super learning: an application to the prediction of HIV-1 drug resistance. Stat Appl Genet Mol Biol 2007;6:7. doi:10.2202/1544-6115.1240.
18. van der Laan M, Dudoit S, van der Vaart A. The cross-validated adaptive epsilon-net estimator. UC Berkeley Division of Biostatistics Working Paper Series, 2004:142.
20. Wang C, Dominici F, Parmigiani G, Zigler CM. Accounting for uncertainty in confounder and effect modifier selection when estimating average causal effects in generalized linear models. Biometrics 2015. doi:10.1111/biom.12315.
21. Zigler CM, Watts K, Yeh RW, Wang Y, Coull BA, Dominici F. Model feedback in Bayesian propensity score estimation. Biometrics 2013. doi:10.1111/j.1541-0420.2012.01830.x.
23. Lin W, Feng R, Li H. Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics. J Am Stat Assoc 2015;110:270–288. doi:10.1080/01621459.2014.908125.
35. Davidian M, Tsiatis A, Leon S. Semiparametric estimation of treatment effect in a pretest–posttest study with missing data. Stat Sci 2005;20:261. doi:10.1214/088342305000000151.
36. Schafer JL, Kang JDY. Discussion of "Semiparametric estimation of treatment effect in a pretest–posttest study with missing data" by M. Davidian et al. Stat Sci 2005;20:292–295. doi:10.1214/088342305000000151.
38. Tsiatis AA. Semiparametric theory and missing data. Springer Verlag, 2006.
39. Kang J, Schafer J. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci 2007;22:523–539. doi:10.1214/07-STS227.
42. Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association Section on Bayesian Statistical Science, 1999, 2000:6–10.
43. Zhao P, Yu B. On model selection consistency of lasso. J Mach Learn Res 2006;7:2541–2563.
45. Zhang J, Jeng XJ, Liu H. Some two-step procedures for variable selection in high-dimensional linear regression. arXiv preprint arXiv:0810.1644, 2008.
49. Acemoglu D, Johnson S. Disease and development: the effect of life expectancy on economic growth. Technical report, National Bureau of Economic Research, 2006. doi:10.3386/w12269.
50. Husain MJ. Alternative estimates of the effect of the increase of life expectancy on economic growth. Economics Bulletin 2012;32:3025–3035.
52. Eicher TS, Papageorgiou C, Raftery AE. Default priors and predictive performance in Bayesian model averaging, with application to growth determinants. J Appl Econometrics 2011;26:30–55. doi:10.1002/jae.1112.
55. Ley E, Steel MF. On the effect of prior assumptions in Bayesian model averaging with applications to growth regression. J Appl Econometrics 2009b;24:651–674. doi:10.1002/jae.1057.
56. Magnus JR, Powell O, Prüfer P. A comparison of two model averaging techniques with an application to growth empirics. J Econometrics 2010;154:139–153. doi:10.1016/j.jeconom.2009.07.004.
57. Angrist JD, Imbens GW. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat Assoc 1995;90:431–442. doi:10.1080/01621459.1995.10476535.
59. Kang H, Cai TT, Small DS. Robust confidence intervals for causal effects with possibly invalid instruments. arXiv preprint arXiv:1504.03718, 2015.
60. Kang H, Zhang A, Cai TT, Small DS. Instrumental variables estimation with some invalid instruments and its application to Mendelian randomization. J Am Stat Assoc 2016;111:132–144. doi:10.1080/01621459.2014.994705.
63. Taylor J, Lockhart R, Tibshirani RJ, Tibshirani R. Exact post-selection inference for forward stepwise and least angle regression. arXiv preprint arXiv:1401.3889, 2014.
65. Tibshirani R, Taylor J, Lockhart R, Tibshirani R. Exact post-selection inference for sequential regression procedures. arXiv preprint arXiv:1401.3889, 2014.
69. Negahban S, Ravikumar PD, Wainwright MJ, Yu B. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In: NIPS, 2009:1348–1356. doi:10.1214/12-STS400.
© 2018 Walter de Gruyter GmbH, Berlin/Boston
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.