# Abstract

Often the research interest in causal inference is on the regression causal effect, which is the mean difference in the potential outcomes conditional on the covariates. In this paper, we use sufficient dimension reduction to estimate a lower dimensional linear combination of the covariates that is sufficient to model the regression causal effect. Compared with the existing applications of sufficient dimension reduction in causal inference, our approaches are more efficient in reducing the dimensionality of covariates, and avoid estimating the individual outcome regressions. The proposed approaches can be used in three ways to assist in modeling the regression causal effect: to conduct variable selection, to improve the estimation accuracy, and to detect the heterogeneity. Their usefulness is illustrated by both simulation studies and a real data example.

## 1 Introduction

Causal inference has been widely applied for decades to draw cause-and-effect conclusions based on observational studies, in which treatments are assigned to observations in a non-random fashion. In many cases, theories in causal inference are developed under the potential outcome framework [1]. That is, when the treatment assignment is binary, i. e., either treated or untreated, the outcome variable in the hypothetical complete data set has two components *X* be a set of covariates recorded in the study that collects subject’s personal information. The regression causal effect, defined by

Let *T* be the treatment assignment with support {0,1}. Since for each subject, only *X* includes all the potential confounders; that is,

under which the distribution of *X* is identical whether or not restricted to a specific treatment group. As (1) applies to the individual outcomes

Because the regression causal effect is only a part of the two-dimensional function *X*, and

Then *η* must be estimated in the first step. However, this estimation would not be needed had we known that the regression causal effect is a constant, in which case we could estimate

In addition to fitting the regression function, another common interest about the regression causal effect is to find a pre-assumed low-dimensional transformation of the covariates, denoted by

where *f* is a free and unknown function and the error term *ϵ* satisfies *X* to estimate the regression causal effect, data visualization is possible and the number of parameters is reduced in the subsequent modeling, both of which enhance the accuracy and interpretability of the estimation. Third, hypothesis testing on whether

In the literature, estimation of

When *X*, e. g.

In this paper, we use sufficient dimension reduction to propose a new and model-free estimator of *X*, is given a priori. Compared with Luo et al. [5], the estimator is directly built on the regression causal effect rather than the individual outcome regressions, so it is more efficient in reducing dimensionality. In practice, it is applicable when the propensity score is easy to tackle, and particularly useful when the individual outcome regressions are complex and need to be fitted nonparametrically. In addition, it can be slightly modified to perform variable selection, and detect the heterogeneity of the regression causal effect. For simplicity, we assume *X* to be continuous and have zero mean throughout the theoretical development.

## 2 Sufficient dimension reduction

Sufficient dimension reduction is a family of methods that aims to reduce the dimension of covariates prior to subsequent modeling. When the regression of a response variable *W* on the *p*-dimensional covariates *X* is of interest, it assumes the existence of

where *f* is a free and unknown function and *ϵ* satisfies *β* that satisfies (4) and any matrix *A* of full row-rank, *f* is adjusted accordingly, one needs to consider the linear space spanned by the columns of *β* with minimal dimension for identifiable parametrization. Cook and Li [14] showed that under fairly general conditions on *X*, which we assume throughout the article, such space with minimal dimension is unique. This space, commonly called the central mean subspace and denoted by

Existing methods for estimating the central mean subspace include ordinary least squares [15], principal Hessian directions [16], iterative Hessian transformations [14], minimal average variance estimation [17], and other semiparametric methods [18], [19]. A method is called Fisher-consistent if it recovers a subspace of *X* and *X*:

where *β* spans *β*, then the condition is equivalent to an elliptical distribution of *X*. Principal Hessian directions first estimates *X*:

where *β* spans *β*, then these conditions together require *X* to have a joint normal distribution. These conditions also hold approximately when the dimension of *X* is relatively large [20], so they are not considered restrictive in applications. Using the sample moments to estimate the corresponding population moments, both ordinary least squares and principal Hessian directions are easy to implement.
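As an illustration of the sample-moment implementation, the following is a minimal sketch in Python with NumPy. It uses the textbook forms of the two candidates, namely Σ⁻¹Cov(X, W) for ordinary least squares and Σ⁻¹E[(W − EW)(X − EX)(X − EX)ᵀ]Σ⁻¹ for principal Hessian directions; these forms, the function names, and the simulated data are assumptions of the sketch rather than a transcription of the displays above.

```python
import numpy as np

def ols_direction(X, W):
    """OLS candidate direction: Sigma^{-1} Cov(X, W), from sample moments."""
    Xc = X - X.mean(axis=0)
    Wc = W - W.mean()
    sigma = Xc.T @ Xc / len(W)
    return np.linalg.solve(sigma, Xc.T @ Wc / len(W))

def phd_matrix(X, W):
    """pHd candidate matrix: Sigma^{-1} E[(W - EW) Xc Xc^T] Sigma^{-1}."""
    Xc = X - X.mean(axis=0)
    Wc = W - W.mean()
    sigma_inv = np.linalg.inv(Xc.T @ Xc / len(W))
    moment = (Xc * Wc[:, None]).T @ Xc / len(W)
    return sigma_inv @ moment @ sigma_inv

def leading_directions(M, d):
    """Eigenvectors of a symmetric matrix with the d largest |eigenvalues|."""
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(-np.abs(vals))
    return vecs[:, order[:d]]

# Hypothetical single-index data with standard normal covariates.
rng = np.random.default_rng(0)
n, p = 5000, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
W_monotone = np.exp(0.5 * X @ beta) + 0.1 * rng.standard_normal(n)
W_symmetric = (X @ beta) ** 2 + 0.1 * rng.standard_normal(n)

b_ols = ols_direction(X, W_monotone)
b_ols /= np.linalg.norm(b_ols)
b_phd = leading_directions(phd_matrix(X, W_symmetric), 1)[:, 0]

# Absolute cosine between each estimate and the true direction.
cos_ols = abs(b_ols @ beta)
cos_phd = abs(b_phd @ beta)
```

The two simulated responses are chosen to play to each method's strength: the first-moment candidate targets the monotone link, while the second-moment candidate targets the symmetric link, for which the covariance between X and W vanishes.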

Because ordinary least squares is built upon

A separate issue in sufficient dimension reduction is to estimate the dimension *d* of

## 3 The central causal effect subspace

Before going into further detail, we introduce some notation and regularity conditions. Let *R*, let *R*. We denote the propensity score by

For any real vector *v*, denote its Euclidean norm by *v* as a matrix with one column whenever needed. For any matrix *β*, let *β*, *β*, and for any index set *A*, let *β* indexed by *A* and *β*. When *β* is a square matrix, let *β*. We use **0** to denote the origin of a real space of arbitrary dimension, if no ambiguity is caused. For any two linear subspaces

When

In the literature, multiple papers have studied the application of sufficient dimension reduction in causal inference. Ghosh [24] applied it to the propensity score and introduced *η* measurable of *p*-dimensional but

Because it is the regression causal effect, rather than the outcome regressions or the propensity score, that serves as the primary interest in causal inference, it must be

Because

which is observable from the data and can substitute

This observation is the key to our theoretical development. The proof of (9) is straightforward and is omitted. Its weaker version

## Theorem 1.

*For**defined in (**8**), we have**.*

The proof of this theorem can be found in Appendix A.1. In practice, the propensity score is unknown and needs to be estimated. Throughout the article, we assume that a consistent estimator

For ease of asymptotic study, we additionally assume

(C1) Assume *ϕ* and *h* are known functions such that *ϕ* is continuously twice differentiable.

This asymptotic linearity assumption holds for the logistic regression, the covariate balancing propensity score, and the super learner, etc., so it is fairly general. The case where (C1) is violated will be discussed in § 8.
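To make the role of the estimated propensity score concrete, here is a minimal sketch, assuming a logistic propensity model fitted by Newton-Raphson and the familiar inverse-probability-weighted transformed outcome Y(T − π)/{π(1 − π)}, whose conditional mean given *X* equals the regression causal effect under unconfoundedness. Whether this matches the exact form of (8) cannot be confirmed from the text here, so the transformation, the helper names, and the simulated model are all assumptions of the sketch.

```python
import numpy as np

def fit_logistic(X, T, n_iter=25):
    """Fit P(T=1 | X) = expit(a + X b) by Newton-Raphson; return fitted probabilities."""
    Z = np.column_stack([np.ones(len(T)), X])
    coef = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-Z @ coef))
        grad = Z.T @ (T - prob)
        hess = (Z * (prob * (1 - prob))[:, None]).T @ Z
        coef += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Z @ coef))

def transformed_outcome(Y, T, prob):
    """Inverse-probability-weighted outcome (assumed form, see lead-in)."""
    return Y * (T - prob) / (prob * (1 - prob))

# Hypothetical data: the effect is linear, the untreated outcome is not.
rng = np.random.default_rng(1)
n = 20000
X = rng.standard_normal((n, 3))
true_prob = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
T = rng.binomial(1, true_prob)
effect = 1.0 + X[:, 1]
Y = np.sin(X[:, 0]) + X[:, 2] + rng.standard_normal(n) + T * effect

prob_hat = fit_logistic(X, T)
Y_star = transformed_outcome(Y, T, prob_hat)

# Regressing the transformed outcome on X targets the effect directly,
# without modelling the individual outcome regressions.
Z = np.column_stack([np.ones(n), X])
coef_star, *_ = np.linalg.lstsq(Z, Y_star, rcond=None)
```

In this simulated model the coefficients in `coef_star` should be close to (1, 0, 1, 0), the coefficients of the assumed linear effect, even though the untreated outcome depends nonlinearly on the first covariate.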

Using

Step 0. Let

Step 1. Estimate

Step 2. Estimate

Step 3. Let

In Step 3, one needs to determine the rank of

We next develop the asymptotic normality of the matrix estimator

## Theorem 2.

*Suppose the unconfoundedness assumption (**1**), the common support condition (**7**), and (C1) hold, and the sample observations are independent. As**, we have*

*where*Γ

*is a block diagonal matrix with diagonal blocks*

*and*

*, and*

*Here,*

*and*

*.*

The proof of this theorem can be found in Appendix A.2. As mentioned in § 1, an estimator of the central causal effect subspace can be used in three ways: to perform variable selection for the regression causal effect, to improve the estimation accuracy of the regression causal effect, and to detect the heterogeneity of the regression causal effect. We next discuss these in detail.
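The moment-based steps of this section are only partially legible here, so the following is a loosely hedged sketch of one way to ensemble a first-moment and a second-moment candidate computed from a transformed outcome. The combination rule `b bᵀ + H Hᵀ`, the use of a known propensity score, the fixed dimension `d = 2`, and the simulated model are all assumptions of this sketch and are not taken from Steps 0 to 3.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20000, 6
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-0.5 * X[:, 3]))   # propensity score, treated as known here
T = rng.binomial(1, prob)
effect = X[:, 0] + X[:, 1] ** 2               # depends on X only through its first two components
Y = X[:, 2] + rng.standard_normal(n) + T * effect

# Transformed outcome whose conditional mean given X is the effect (assumed form).
Y_star = Y * (T - prob) / (prob * (1 - prob))

Xc = X - X.mean(axis=0)
Yc = Y_star - Y_star.mean()
sigma_inv = np.linalg.inv(Xc.T @ Xc / n)

# First-moment (OLS-type) and second-moment (pHd-type) candidates.
b_ols = sigma_inv @ (Xc.T @ Yc) / n
H_phd = sigma_inv @ ((Xc * Yc[:, None]).T @ Xc / n) @ sigma_inv

# Ensemble the candidates into one positive semi-definite matrix
# and keep its d leading eigenvectors as the estimated basis.
M = np.outer(b_ols, b_ols) + H_phd @ H_phd.T
vals, vecs = np.linalg.eigh(M)
vals, vecs = vals[::-1], vecs[:, ::-1]
d = 2
B_hat = vecs[:, :d]

# Frobenius distance between estimated and true projection matrices.
B_true = np.eye(p)[:, :2]
dist = np.linalg.norm(B_hat @ B_hat.T - B_true @ B_true.T)
```

In this design the first-moment candidate picks up the linear component and the second-moment candidate picks up the quadratic component, which is why neither alone would recover both directions.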

## 4 A sparse estimation

To enhance the interpretability of the central causal effect subspace and its estimator, we now conduct sparse sufficient dimension reduction. That is, in addition to (4) with

This assumption is commonly adopted in variable selection. It means that all the components of *β* of *β*. Compared with variable selection, sparse sufficient dimension reduction further tells by which linear combination the active set affects the regression causal effect. Thus, it provides researchers with additional information, and is more efficient in reducing dimensionality.

To incorporate the sparsity structure into the ensemble moment-based estimator, we follow Chen, Zou, and Cook [29] to convert the eigen-decomposition of

Step 4. Let *S* stands for sparsity, be a minimizer of

subject to

In (13), *θ* is the tuning parameter that determines how sparse the resulting estimate is. Chen et al. [29] suggested selecting *θ* by minimizing a Bayesian information criterion, which we slightly modify to be

where *θ* and

To minimize (13), Chen et al. [29] suggested an algorithm that alternates between the local quadratic approximation and spectral decomposition until convergence. Chen et al. [29] also showed the selection consistency and the oracle property of the resulting sparse estimator. These properties carry over directly to our setting; the proof is omitted.

## Theorem 3.

*Let**be the oracle estimator with**fixed at zero and**estimated by Steps 1 – 3 using**as the covariates. If Assumptions (**1**), (**7**), (**12**), and (C1) hold, and θ in (**13**) satisfies that**and**as**, then we have**,**, and**.*

By Theorem 3, the sparse ensemble moment-based estimator consistently selects the active set for the regression causal effect, and is asymptotically as accurate as the oracle estimator. A simulation study (see simulation study 1 of Supplementary Material) showed that, in finite samples, it consistently outperforms the ordinary ensemble moment-based estimator when the sparsity assumption (12) holds. This differs from the commonly observed phenomenon in variable selection, where sparse estimators are always inferior to their ordinary counterparts because of larger finite-sample bias. However, it is reasonable in the sufficient dimension reduction scenario, as the estimation accuracy is not measured for the coefficients of individual covariates in the active set, but rather for the entire central mean subspace, which would be improved if the estimation error associated with the irrelevant covariates is removed.

The sparse ensemble moment-based estimator inherits the advantage that it avoids fitting the individual outcome regressions. This is shared by the variable selection procedure in Tian et al. [6], not by the others mentioned in § 1.

## 5 Estimation of the regression causal effect

Equation (9) suggests estimating the regression causal effect by equivalently estimating *X*. When the central causal effect subspace is zero-dimensional, this amounts to averaging

As an illustration, we next use local linear regression to estimate the regression causal effect. Based on the scatter plot of the reduced covariates and this estimate, one may also adopt appropriate parametric models to further improve the estimation. Following Step 4 in § 4, we have

Step 5. For any

Here, *ℓ*. The sparse ensemble moment-based estimator

In the literature, a common strategy to estimate the regression causal effect is to treat it as the difference between the individual outcome regression functions, and estimate the latter within each treatment group. Unfortunately, this strategy cannot be carried over if we use the reduced covariates from the central causal effect subspace in place of *X*. The reason is that these reduced covariates may not be sufficient to predict the individual outcomes, so the un-confoundedness assumption (1) would fail and the individual outcome regression would not be estimable by the observed data. For example, in Model (2) where *T*, which is fully unspecified.
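The local linear smoothing used in this section can be sketched as follows. This is a generic local linear regression with a Gaussian product kernel and a single hand-picked bandwidth; the kernel, the bandwidth, and the function name are assumptions of the sketch, since the details of Step 5 are not recoverable here. In the setting of this section, `U` would hold the reduced covariates and `y` the observable surrogate outcome.

```python
import numpy as np

def local_linear(U, y, U0, h):
    """Local linear estimate of E[y | U = u0] for each row u0 of U0.

    U  : (n, d) array of reduced covariates
    y  : (n,) array of responses
    U0 : (m, d) array of evaluation points
    h  : bandwidth of the Gaussian kernel
    """
    fitted = np.empty(len(U0))
    for k, u0 in enumerate(U0):
        diff = U - u0
        w = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))
        Z = np.column_stack([np.ones(len(U)), diff])
        ZtW = Z.T * w                      # kernel-weighted design
        coef = np.linalg.solve(ZtW @ Z, ZtW @ y)
        fitted[k] = coef[0]                # intercept = fitted value at u0
    return fitted

# Hypothetical one-dimensional example.
rng = np.random.default_rng(3)
U = rng.uniform(-2, 2, size=(2000, 1))
y = np.sin(U[:, 0]) + 0.1 * rng.standard_normal(2000)
U0 = np.array([[-1.0], [0.0], [1.0]])
fit = local_linear(U, y, U0, h=0.3)
max_err = np.max(np.abs(fit - np.sin(U0[:, 0])))

# A local linear fit reproduces an exactly linear signal.
fit_lin = local_linear(U, 2.0 + 3.0 * U[:, 0], U0, h=0.3)
```

Because the smoothing is carried out on the low-dimensional reduced covariates rather than on all of *X*, the effective local sample size stays reasonable even for moderate n.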

## 6 Detecting heterogeneous causal effect

Based on the asymptotic normality result in Theorem 2, all the aforementioned order-determination methods in § 2 can be applied to detect whether the central causal effect subspace is zero-dimensional, which corresponds to a homogeneous regression causal effect. As an example, we employ the hypothesis testing procedure proposed in Bura and Yang [21].

The test is based on the observation that when

in distribution, where *d* is nonzero, the test statistic in (16) is stochastically larger and diverges to infinity in probability. Thus, for a pre-specified significance level *α*, we reject the null hypothesis, i.e., a homogeneous regression causal effect, if

To estimate the *X*, and estimate Λ by bootstrap re-sampling and using the bootstrap sample covariance matrix of

Given the estimates `R`.

Compared with the parametric and nonparametric tests in Crump et al. [11], our test enjoys the advantages of both: it can detect a

## 7 Simulation studies

We use the simulated models to illustrate the effectiveness of the sparse ensemble moment-based estimator in estimating the regression causal effect and in variable selection, and that of the proposed

### 7.1 Estimating the regression causal effect

We consider the following four models. In each model, the treatment assignment *T* is generated independently of the outcomes conditional on the propensity score, so the un-confoundedness assumption (1) holds.

Model 1.

Model 2.

Model 3.

Model 4.

For *X*, under independent Bernoulli distribution with mean equal to 0.5, and generate the other components under independent standard normal distributions; in Model 2, we generate the components of *X* from independent uniform distribution on (−2,2); in Models 3 and 4, we generate *X* under

In Models 1 and 2, the regression causal effect is linear, although the individual outcome regression functions are more complex in Model 2 with non-monotone structure. Theoretically, the proposed methods will perform consistently in both models, while all the existing methods that rely on individual outcome regressions are expected to be competent for Model 1. In Models 3 and 4, both the regression causal effect and the outcome regression functions are nonlinear. In conjunction with the various propensity scores, these models represent a variety of cases in practice.

From the dimension reduction point of view, the ensemble space

As mentioned in § 4, we use the sparse ensemble moment-based estimator in estimating

To measure the overall deviation of an estimator

where

### Table 1

| Model | Oracle | S-ENS | LZG | GCL | GCQ | RF | GEL | GEQ | WML | WMQ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 6.6 | 6.6 | 8.9 | 8.8 | 11.4 | 29.0 | 9.9 | 12.4 | 9.9 | 12.4 |
| | (4.4) | (4.4) | (3.8) | (3.8) | (3.1) | (3.3) | (2.3) | (2.2) | (2.3) | (2.3) |
| 2 | 8.9 | 16.8 | 15.5 | 52.6 | 14.1 | 55.7 | 16.6 | 15.9 | 16.7 | 15.9 |
| | (2.7) | (4.1) | (2.3) | (5.8) | (0.8) | (2.5) | (3.7) | (2.8) | (3.6) | (2.8) |
| 3 | 18.7 | 20.9 | 21.2 | 57.3 | 19.5 | 28.2 | 71.6 | 19.6 | 71.5 | 19.6 |
| | (4.9) | (6.9) | (5.3) | (6.5) | (3.4) | (2.2) | (6.8) | (3.1) | (6.8) | (3.2) |
| 4 | 19.5 | 19.9 | 53.8 | 125.6 | 87.7 | 96.4 | 125 | 87.6 | 125 | 87.6 |
| | (3.2) | (2.9) | (3.5) | (5) | (6.6) | (12.7) | (5.2) | (6.4) | (5.2) | (6.3) |

From Table 1, the linear G-computation is consistent only in Model 1. It is slightly improved by the linear G-estimation, as the latter correctly specifies both the regression causal effect and the propensity score in Models 1 and 2. The quadratic G-computation is consistent in Models 1–3, as is the quadratic G-estimation. The dynamic weighted least squares approaches perform almost identically to the G-estimations, which conforms to the results in Wallace and Moodie [9]. All these methods fail in Model 4, where they mis-specify parametric models on the regression causal effect. On the other hand, the random forest estimator, which is nonparametric, lacks effectiveness in most models due to the limited sample size.

By contrast, both the proposed estimator and the semiparametric G-computation are consistent in all the models. In particular, the proposed estimator outperforms the linear G-computation in Model 1, for which the latter adopts parsimonious and appropriate parametric models. This is not surprising, as the proposed estimator uses additional sparsity structure in the model. Compared with the semiparametric G-computation, the proposed estimator is substantially superior in Model 4, where the outcome regression functions are complex. Referring to the discussion in § 5, this conforms to our theoretical expectation. Compared with the oracle estimator, the proposed estimator is less effective in Model 2, indicating that the cost of estimating the central causal effect subspace can be non-negligible.

### 7.2 Variable selection

We now examine the variable selection consistency of the sparse ensemble moment-based estimator, by evaluating its true positive rate and false positive rate of selecting the active set of covariates, when applied to Models 1–4.
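For concreteness, the two rates can be computed as in the following sketch. It assumes that a covariate counts as selected when its row in the estimated sparse basis matrix is nonzero; that convention, the tolerance, and the function names are assumptions made for illustration.

```python
import numpy as np

def selected_rows(beta_hat, tol=1e-8):
    """Indices of covariates whose row in the basis matrix is nonzero."""
    row_norms = np.linalg.norm(beta_hat, axis=1)
    return [int(i) for i in np.flatnonzero(row_norms > tol)]

def selection_rates(selected, active, p):
    """True positive rate and false positive rate of a selected set.

    selected : indices chosen by the estimator
    active   : indices of the true active set
    p        : total number of covariates
    """
    selected, active = set(selected), set(active)
    inactive = set(range(p)) - active
    tpr = len(selected & active) / len(active)
    fpr = len(selected & inactive) / len(inactive) if inactive else 0.0
    return tpr, fpr

# Hypothetical estimate with five covariates and a one-dimensional basis:
# covariates 0 and 1 are truly active, covariate 4 is a false positive.
beta_hat = np.array([[0.7], [0.7], [0.0], [0.0], [0.1]])
tpr, fpr = selection_rates(selected_rows(beta_hat), active=[0, 1], p=5)
```

In a simulation, these rates would be averaged over the replications for each model.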

Persson et al. [13] proposed a variable selection procedure that estimates two active sets, each for an individual outcome regression. Naturally, the union of the two sets can be treated as an estimate of the active set for the regression causal effect, although their estimate can contain redundant covariates, for example, in Models 3 and 4. The difference Lasso approach [4] and the virtual twins method [10] first impute the missing outcomes using nonparametric techniques and then use the imputed

### Table 2

| Model | S-ENS TPR | S-ENS FPR | VT TPR | VT FPR | dLasso TPR | dLasso FPR | PHWD TPR | PHWD FPR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.000 | 0.000 | 1.000 | 0.007 | 1.000 | 0.388 | 1.000 | 0.433 |
| 2 | 1.000 | 0.008 | 1.000 | 0.000 | 1.000 | 0.485 | 1.000 | 0.434 |
| 3 | 1.000 | 0.099 | 1.000 | 0.185 | 0.530 | 0.290 | 1.000 | 0.443 |
| 4 | 1.000 | 0.000 | 1.000 | 0.000 | 0.702 | 0.306 | 1.000 | 0.361 |

Due to the use of the Lasso method, the difference Lasso approach favors a linear regression causal effect. This is supported by the results in Table 2, which show that the method is ineffective in Models 3 and 4. The approach based on Persson et al. [13] has a desirable sensitivity in all the models, but a worrisome specificity by its nature. By contrast, both the virtual twins method and the sparse ensemble moment-based estimator consistently select the exact active set in all the models, with the former slightly outperformed by the latter in Model 3.

### 7.3 Testing the heterogeneity

We now evaluate the proposed *X* in Model 1 changed to be normally distributed as in the other two models. In addition, to examine the actual significance level of the test, we simulate the following three models that have homogeneous regression causal effect.

Model 5.

Model 6.

Model 7.

The distribution of

We perform the proposed

As mentioned in § 1, Crump et al. [11] proposed both a normal test and a

### Table 3

| Test | Model 1 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
| --- | --- | --- | --- | --- | --- | --- |
| SDR | 93.3 | 93.1 | 99.9 | 98.7 | 96.7 | 99.7 |
| r-Normal | 100 | 58.2 | 100 | 89.1 | 23.9 | 8.8 |
| | 100 | 53.8 | 100 | 92.5 | 27.2 | 10.0 |
| Normal | 100 | 79.4 | 100 | 91.3 | 4.7 | 4.0 |
| | 100 | 76.5 | 100 | 94.5 | 5.2 | 5.4 |

From Table 3, the proposed

To give a closer look at the performance of the tests, in Figure 1, we draw the box-plot of p-values for each model and each test, except for the normal tests as they perform similarly to Crump et al.’s

### Figure 1

### 7.4 Data analysis

We analyzed the data from the Health Evaluation and Linkage to Primary Care study, publicly available with the approval of the Institutional Review Board of Boston University Medical Center and the Department of Health and Human Services. The data set contains 453 patients recruited from a detoxification unit, who possibly spent at least one night on the street or in a shelter within six months before entering the study, in which case the patient is marked as homeless. Our interest is to estimate the causal effect of the homeless experience on patients’ physical health condition, measured when entering the study by the SF-36 physical component score, with higher scores indicating better functioning.

To make the un-confoundedness assumption plausible, we included all the covariates collected in the data that do not have many missing values, some of which were transformed to favor the linearity condition (5) and the constant variance condition (6). They are: age at baseline, a scale indicating depressive symptoms, the square root of the number of friends, the square of a total score of inventory of drug use consequences, the square root of a sex risk score, gender, the average and the maximum number of drinks consumed per day in the past month, and the number of times hospitalized for medical problems. All nine covariates were standardized to have zero mean and unit variance.

By applying the proposed test, we detected that the regression causal effect is heterogeneous with p-value 0.04. The sequential tests [21] further suggested that

Thus, gender is the dominating factor for the causal effect of homeless experience, and the number of friends and the sex risk are also influential. Cross-validation [17], [5] showed that both

To illustrate the sufficiency and effectiveness of the univariate reduced covariate, we generated a pseudo

### Figure 2

## 8 Discussion

When a set of appropriate parametric models is not available for the propensity score

In these cases, it is easily seen that the ensemble moment-based estimator is still consistent but its asymptotic normality (10) fails, for which the inferential results must be adjusted. For the variable selection consistency, the order of the tuning parameter *θ* in Theorem 3 needs to be adjusted according to the convergence order of

As mentioned earlier, an alternative that avoids estimating the propensity score is to estimate the regression causal effect as the difference between the outcome regression functions, the latter being estimated semiparametrically [5]. Following the literature, one may think of constructing a doubly-robust estimator that combines the two approaches. However, in contrast to the case where the parameter of interest is the average causal effect and both the propensity score and the outcome regression functions are estimated parametrically, such an estimator will always inherit the drawback of the outcome regression-based estimator, i.e., the estimation of the nuisance functional parameters mentioned in § 1 and the redundant directions in

**Funding source:** Social Sciences and Humanities Research Council of Canada

**Award Identifier / Grant number:** 430-2016-00163

**Funding source:** Natural Sciences and Engineering Research Council of Canada

**Award Identifier / Grant number:** RGPIN-2017-04064

**Funding statement:** Dr. Zhu’s research was supported by Award Number 430-2016-00163 from the Social Sciences and Humanities Research Council and by Grant Number RGPIN-2017-04064 from the Natural Sciences and Engineering Research Council of Canada.

## Appendix A

### A.1 Proof of Theorem 1

### A.2 Proof of Theorem 2

### Proof.

We first show the asymptotic normality of

which we denote by

Because

and that by the central limit theorem,

### References

1. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701. doi:10.1037/h0037350

2. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7:1393–512. doi:10.1016/0270-0255(86)90088-6

3. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol. 2011;173:731–8. doi:10.1093/aje/kwq472

4. Ghosh D, Zhu Y, Coffman DL. Penalized regression procedures for variable selection in the potential outcomes framework. Stat Med. 2015;34:1645–58. doi:10.1002/sim.6433

5. Luo W, Zhu Y, Ghosh D. On estimating regression-based causal effects using sufficient dimension reduction. Biometrika. 2017;104:51–65. doi:10.1093/biomet/asw068

6. Tian L, Alizadeh AA, Gentles AJ, Tibshirani R. A simple method for estimating interactions between a treatment and a large number of covariates. J Am Stat Assoc. 2014;109:1517–32. doi:10.1080/01621459.2014.951443

7. Abrevaya J, Hsu Y-C, Lieli RP. Estimating conditional average treatment effects. J Bus Econ Stat. 2015;33:485–505. doi:10.1080/07350015.2014.975555

8. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Proceedings of the Second Seattle Symposium in Biostatistics. Springer; 2004. p. 189–326. doi:10.1007/978-1-4419-9076-1_11

9. Wallace MP, Moodie EE. Doubly-robust dynamic treatment regimen estimation via weighted least squares. Biometrics. 2015;71:636–44. doi:10.1111/biom.12306

10. Foster JC, Taylor JMG, Ruberg SJ. Subgroup identification from randomized clinical trial data. Stat Med. 2011;30:2867–80. doi:10.1002/sim.4322

11. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Nonparametric tests for treatment effect heterogeneity. Rev Econ Stat. 2008;90:389–405. doi:10.3386/t0324

12. Imai K, Ratkovic M. Estimating treatment effect heterogeneity in randomized program evaluation. Ann Appl Stat. 2013;7:443–70. doi:10.1214/12-AOAS593

13. Persson E, Häggström J, Waernbaum I, de Luna X. Data-driven algorithms for dimension reduction in causal inference. Comput Stat Data Anal. 2017;105:280–92. doi:10.1016/j.csda.2016.08.012

14. Cook RD, Li B. Dimension reduction for conditional mean in regression. Ann Stat. 2002;30:455–74. doi:10.1214/aos/1021379861

15. Li K-C, Duan N. Regression analysis under link violation. Ann Stat. 1989:1009–52. doi:10.1214/aos/1176347254

16. Li K-C. On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J Am Stat Assoc. 1992;87:1025–39. doi:10.1080/01621459.1992.10476258

17. Xia Y, Tong H, Li WK, Zhu L-X. An adaptive estimation of dimension reduction space. J R Stat Soc, Ser B, Stat Methodol. 2002;64:363–410. doi:10.1142/9789812836281_0023

18. Luo W, Li B, Yin X. On efficient dimension reduction with respect to a statistical functional of interest. Ann Stat. 2014;42:382–412. doi:10.1214/13-AOS1195

19. Ma Y, Zhu L. On estimation efficiency of the central mean subspace. J R Stat Soc, Ser B, Stat Methodol. 2014;76:885–901. doi:10.1111/rssb.12044

20. Hall P, Li K-C. On almost linearity of low dimensional projections from high dimensional data. Ann Stat. 1993;47:867–89. doi:10.1214/aos/1176349155

21. Bura E, Yang J. Dimension estimation in sufficient dimension reduction: a unifying approach. J Multivar Anal. 2011;102:130–42. doi:10.1016/j.jmva.2010.08.007

22. Zhu L, Miao B, Peng H. On sliced inverse regression with high-dimensional covariates. J Am Stat Assoc. 2006;101:630–42. doi:10.1198/016214505000001285

23. Luo W, Li B. Combining eigenvalues and variation of eigenvectors for order determination. Biometrika. 2016;103:875–87. doi:10.1093/biomet/asw051

24. Ghosh D. Propensity score modelling in observational studies using dimension reduction methods. Stat Probab Lett. 2011;81:813–20. doi:10.1016/j.spl.2011.03.002

25. Hu Z, Follmann DA, Wang N. Estimation of mean response via the effective balancing score. Biometrika. 2014;101:613–24. doi:10.1093/biomet/asu022

26. Huang M-Y, Chan KCG. Joint sufficient dimension reduction and estimation of conditional and average treatment effects. Biometrika. 2017;104:583–96. doi:10.1093/biomet/asx028

27. Imai K, Ratkovic M. Covariate balancing propensity score. J R Stat Soc B. 2014;76:243–63. doi:10.1111/rssb.12027

28. van der Laan MJ, Polley EC, Hubbard AE. Super learner. Stat Appl Genet Mol Biol. 2007;6:1–21. doi:10.2202/1544-6115.1309

29. Chen X, Zou C, Cook R. Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann Stat. 2010;6:3696–723.

## Supplemental Material

The online version of this article offers supplementary material (https://doi.org/10.1515/jci-2018-0015).

**Received:** 2018-06-04

**Revised:** 2018-09-26

**Accepted:** 2018-10-02

**Published Online:** 2018-10-19

**Published in Print:** 2019-04-26

© 2019 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.