# Abstract

Recent approaches in causal inference have proposed estimating average causal effects that are local to some subpopulation, often for reasons of efficiency. These inferential targets are sometimes data-adaptive, in that they are dependent on the empirical distribution of the data. In this short note, we show that if researchers are willing to adapt the inferential target on the basis of efficiency, then extraordinary gains in precision can potentially be obtained. Specifically, when causal effects are heterogeneous, any asymptotically normal and root-

## 1 Introduction

When causal effects are heterogeneous, inferences depend on the population for which causal effects are estimated. Although population average causal effects have traditionally been the inferential targets, recent results have focused on estimating average causal effects that are *local* to some subpopulation for reasons of efficiency. These approaches include trimming observations based on the distribution of the propensity score [1], using regression adjustment to estimate reweighted causal effects [2, 3, 4], or implementing calipers for propensity-score matching [5, 6]. In some cases, the target parameter depends on the empirical distribution of the data, including cases where the researcher is explicitly conducting inference on, e.g., the average treatment effect among the treated conditional on the observed covariate distribution [7], or other causal sample functionals [8, 9], without revision to the estimator being used.

These approaches privilege efficiency in estimation over targeting population average causal effects, and often allow for the target to be defined on the basis of the observed data. We provide an example of how these approaches, taken to their extreme, can provide extraordinary gains in statistical certainty. We consider the case of a *data-adaptive* target parameter [10] that is allowed to vary with the data depending on which subpopulation’s local average causal effect is best estimated. When treatment effects are heterogeneous, adaptively changing the target parameter on the basis of efficiency yields an unusual result: if the population average causal effect can be consistently estimated with a root-

## 2 Results

Consider a full data probability distribution

(Effect heterogeneity).

Assumption 1 is equivalent to assuming that causal effects are not constant across observations in the distribution

We do not observe the full data probability distribution

*An estimator*
*is root*-
*consistent and asymptotically normal for*
*if*
*, for some*

We now define the target parameter,

*Let the target parameter*

*where, as in Assumption 1*,

The target parameter adapts naturally to the closest value in an interval surrounding
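The idea of a target that moves to the closest value in an interval can be illustrated numerically. The sketch below is only a toy simulation under loud assumptions, not the paper's construction: the interval half-width `c`, the population value `tau0`, the normal distribution of unit-level effects, and the use of a plain sample mean of simulated effects as the "estimator" are all hypothetical choices (unit-level effects are not observable in real data). It clips the estimate to an interval around `tau0` and records how often the estimate and the clipped target coincide exactly.

```python
import random
import statistics


def clip(x, lo, hi):
    """Return the point in [lo, hi] closest to x."""
    return max(lo, min(hi, x))


def share_on_target(n, c=0.25, reps=500, seed=0):
    """Fraction of simulated samples in which the estimate equals its clipped target.

    All quantities here are hypothetical stand-ins chosen for illustration.
    """
    rng = random.Random(seed)
    tau0 = 1.0  # assumed population average effect
    hits = 0
    for _ in range(reps):
        # Heterogeneous unit-level effects with mean tau0 (assumed normal).
        effects = [rng.gauss(tau0, 2.0) for _ in range(n)]
        # Stand-in for a root-n consistent estimator: the sample mean.
        estimate = statistics.fmean(effects)
        # Assumed data-adaptive target: closest value in [tau0 - c, tau0 + c].
        target = clip(estimate, tau0 - c, tau0 + c)
        hits += (estimate == target)
    return hits / reps


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, share_on_target(n))
```

Because the estimate concentrates around `tau0` at a root-n rate while the interval has fixed width, the share of samples in which the estimate and the clipped target coincide should rise toward one as `n` grows in this toy setting.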

*There exists a nonnegative weighting associated with each empirical distribution*
*, such that across all*

A proof of Proposition 1 follows directly from the fact that a weighted mean can obtain any value in the interval defined by the infimum and supremum of its distribution’s support. Proposition 1 asserts that across all realizations, the target parameter
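The weighted-mean fact used in the proof can be checked on a small example. The following is a minimal sketch for a finite list of hypothetical unit-level effects (the function names and the numbers are illustrative assumptions, not taken from the paper): it puts weight only on the smallest and largest values and mixes them so that the weighted mean hits any requested value between them.

```python
def weights_for_target(values, target):
    """Return nonnegative weights summing to 1 whose weighted mean of `values` is `target`.

    Works for any target between min(values) and max(values).
    """
    lo, hi = min(values), max(values)
    if not lo <= target <= hi:
        raise ValueError("target must lie between min(values) and max(values)")
    weights = [0.0] * len(values)
    if hi == lo:
        # Degenerate case: all values are equal, so any weighting works.
        weights[0] = 1.0
        return weights
    # Mix the two extreme values; every other unit gets zero weight.
    lam = (target - lo) / (hi - lo)
    weights[values.index(lo)] = 1.0 - lam
    weights[values.index(hi)] = lam
    return weights


def weighted_mean(values, weights):
    """Weighted mean of `values` with the given nonnegative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    effects = [-1.0, 0.5, 2.0, 3.0]  # hypothetical heterogeneous effects
    w = weights_for_target(effects, 1.2)
    print(w, weighted_mean(effects, w))
```

Putting all the weight on two extreme units is only one of many valid weightings; it is used here because it makes the argument transparent. With a finite list the endpoints are attained, whereas the statement in the text is phrased in terms of the infimum and supremum of the support.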

However, mirroring results on other data-adaptive parameters under random sampling, including the sample average causal effect, the target parameter

*Suppose that*
*is a root*-
*consistent and asymptotically normal estimator of*
*. Then*

A proof of Proposition 2 follows by noting that

We now turn to our primary result, proving the superefficiency of

*Suppose that Assumption 1 holds and that*
*is a root*-
*consistent and asymptotically normal estimator of*
*. Then*

Decompose

In short, Proposition 3 demonstrates that the probability that

*Suppose that*
*. Then*

A proof of Corollary 1 follows by noting that
_{G}[τ]=ℝ, then the value that any estimator
_{G}[τ]=ℝ^{+} and

Our results can be generalized to stronger claims straightforwardly. When a regularity condition is imposed on the rate of convergence of

*Suppose that Assumption 1 holds and*
*obeys*
*, where*
*Then*

We will show that the mean square error of

Since

## 3 Discussion

Our results highlight the additional certainty obtained by data-adaptively choosing the population for which average causal effects are measured on the basis of efficiency. It is well known that efficiency gains may be obtained through data-adaptive inference. But the extent to which the researcher can benefit from such practice has been understated. Under treatment effect heterogeneity – a precondition for locality to be a concern – all root-

There is of course a cost to this superefficiency: the target parameter is likely not of intrinsic interest. This issue is not unique to our setting, and other methods that change the inferential target based on efficiency concerns may be subject to the same critique. As Crump et al. ([1], p. 188) note, “external validity may be lost by changing the focus to average treatment effects for a subset of the original sample.” This is exacerbated in our setting by the researcher’s lack of knowledge about the characteristics of the subpopulation under study. Our result represents an extreme case of privileging efficiency over targeting population average causal effects. However, our results provide insight into a potential pathology of data-adaptivity driven purely by efficiency concerns: the gains in statistical certainty may be essentially unbounded without further restrictions. We hope that future work in the domain of efficiency theory for data-adaptive parameters will consider classes of restrictions that would exclude the case considered here.

# Acknowledgement

The author thanks Don Green, Cyrus Samii, Jas Sekhon, Mark van der Laan, and two anonymous reviewers for helpful comments. The author expresses particular gratitude to Jas Sekhon for suggesting a parsimonious proof strategy for Proposition 3 and to an anonymous reviewer for inspiring Corollary 1. All remaining errors are the author’s responsibility.

### References

1. Crump RK, Hotz VJ, Imbens GW, Mitnik OA. Dealing with limited overlap in estimation of average treatment effects. Biometrika 2009.

2. Humphreys M. Bounds on least squares estimates of causal effects in the presence of heterogeneous assignment probabilities. Manuscript, Columbia University, 2009.

3. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist’s companion. Princeton, NJ: Princeton University Press, 2009.

4. Aronow PM, Samii C. Does regression produce representative estimates of causal effects? Am J Pol Sci 2016;60(1):250–267.

5. Austin PC. Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharm Stat 2011;10(2):150–161.

6. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985;39(1):33–38.

7. Abadie A, Imbens G. Simple and bias-corrected matching estimators for average treatment effects. NBER technical working paper no. 283, 2002.

8. Aronow PM, Green DP, Lee DK. Sharp bounds on the variance in randomized experiments. Ann Stat 2014;42(3):850–871.

9. Balzer LB, Petersen ML, van der Laan MJ. Targeted estimation and inference for the sample average treatment effect. Berkeley, CA: Bepress, 2015.

10. van der Laan MJ, Hubbard AE, Pajouh SK. Statistical inference for data adaptive target parameters. Princeton, NJ: Bepress, 2013.

**Published Online:** 2016-11-11

**Published in Print:** 2016-9-1

©2016 by De Gruyter

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.