
# The International Journal of Biostatistics

Ed. by Chambaz, Antoine / Hubbard, Alan E. / van der Laan, Mark J.

IMPACT FACTOR 2018: 1.309

CiteScore 2018: 1.11

SCImago Journal Rank (SJR) 2018: 1.325
Source Normalized Impact per Paper (SNIP) 2018: 0.715

Mathematical Citation Quotient (MCQ) 2018: 0.03

Online ISSN: 1557-4679
Volume 12, Issue 1

# Optimal Individualized Treatments in Resource-Limited Settings

Alexander R. Luedtke
• Corresponding author
• Division of Biostatistics, University of California, 101 Haviland Hall, Berkeley, California 94720–7358, USA
/ Mark J. van der Laan
• Division of Biostatistics, University of California, 101 Haviland Hall, Berkeley, California 94720–7358, USA
Published Online: 2016-05-26 | DOI: https://doi.org/10.1515/ijb-2015-0007

## Abstract

An individualized treatment rule (ITR) is a treatment rule which assigns treatments to individuals based on (a subset of) their measured covariates. An optimal ITR is the ITR which maximizes the population mean outcome. Previous works in this area have assumed that treatment is an unlimited resource so that the entire population can be treated if this strategy maximizes the population mean outcome. We consider optimal ITRs in settings where the treatment resource is limited so that there is a maximum proportion of the population which can be treated. We give a general closed-form expression for an optimal stochastic ITR in this resource-limited setting, and a closed-form expression for the optimal deterministic ITR under an additional assumption. We also present an estimator of the mean outcome under the optimal stochastic ITR in a large semiparametric model that at most places restrictions on the probability of treatment assignment given covariates. We give conditions under which our estimator is efficient among all regular and asymptotically linear estimators. All of our results are supported by simulations.

## 1 Introduction

Suppose one wishes to maximize the population mean of some outcome using some binary point treatment, where for each individual clinicians have access to (some subset of) measured baseline covariates. Such a treatment strategy is termed an individualized treatment regime (ITR), and the (counterfactual) population mean outcome under an ITR is referred to as the value of an ITR. The ITR which maximizes the value is referred to as the optimal ITR or the optimal rule. There has been much recent work on this problem in the case where treatment is an unlimited resource (see Murphy [1] and Robins [2] for early works on the topic, and Chakraborty and Moodie [3] for a recent overview). It has been shown that the optimal treatment in this context is given by checking the sign of the average treatment effect conditional on (some subset of) the baseline covariates, also known as the blip function [2].

The optimal ITR assigns treatment to individuals in a given stratum of covariates if treatment is on average beneficial in that stratum, and withholds treatment from that stratum otherwise. If treatment is even slightly beneficial in every stratum, then this strategy treats the entire population. There are many realistic situations in which such a strategy, or any strategy that treats a large proportion of the population, is infeasible due to limits on the total amount of the treatment resource. In a discussion of Murphy [1], Arjas observed that resource constraints may render optimal ITRs of little practical use when the treatment of interest is a social or educational program, though no solution to the constrained problem was given [4].

The mathematical modeling literature has considered the resource allocation problem to a greater extent. Lasry et al. [5] developed a model to allocate the annual CDC budget for HIV prevention programs to subpopulations which would benefit most from such an intervention. Tao et al. [6] consider a mathematical model to optimally allocate screening procedures for sexually transmitted diseases subject to a cost constraint. Though Tao et al. do not frame the problem as a statistical estimation problem, they end up confronting optimization challenges similar to those that we will face. In particular, they confront the (weakly) NP-hard knapsack problem from the combinatorial optimization literature [7, 8]. We will avoid most of the challenges associated with this problem by primarily focusing on stochastic treatment rules, which reduce the problem to the easier fractional knapsack problem [9, 8]. Stochastic ITRs allow the treatment to rely on some external stochastic mechanism for individuals in a particular stratum of covariates.

We consider a resource constraint under which there is a maximum proportion of the population which can be treated. We primarily focus on evaluating the public health impact of an optimal resource-constrained (R-C) ITR via its value. The value function has been shown to be of interest in several previous works (see, e.g., Zhang et al. [10], van der Laan and Luedtke [11], Goldberg et al. [12]). Despite the general interest in this quantity, estimating it is challenging for unconstrained deterministic regimes at so-called exceptional laws, i.e. probability distributions at which the blip function is zero in some positive probability strata of covariates [2]; a slightly more general assumption is given in Luedtke and van der Laan [13]. Chakraborty et al. [14] showed that one can develop confidence intervals for this parameter using the m-out-of-n bootstrap, though these confidence intervals shrink at a slower than root-n rate. Luedtke and van der Laan [13] showed that root-n rate confidence intervals can be developed for this quantity under reasonable conditions in the large semiparametric model which at most places restrictions on the treatment mechanism.

We develop a root-n rate estimator for the optimal R-C value and corresponding confidence intervals in this same large semiparametric model. We show that our estimator is efficient among all regular and asymptotically linear estimators under certain conditions. When the baseline covariates are continuous and the resource constraint is active, i.e. when the optimal R-C value is less than the optimal unconstrained value, these conditions are far more reasonable than the non-exceptional law assumption needed for regular estimation of the optimal unconstrained value.

We now give a brief outline of the paper. Section 2 defines the statistical estimation problem of interest, gives an expression for the optimal deterministic rule under a condition, and gives a general expression for the optimal stochastic rule. Section 3 presents our estimator of the optimal R-C value. Section 4 presents conditions under which the optimal R-C value is pathwise differentiable, and gives an explicit expression for the canonical gradient under these conditions. Section 5 describes the properties of our estimator, including how to develop confidence intervals for the optimal R-C value. Section 6 presents our simulation methods. Section 7 presents our simulation results. Section 8 closes with a discussion and areas of future research. All proofs are given in the Appendix.

## 2 Optimal R-C rule and value

Suppose we observe n independent and identically distributed (i.i.d.) draws from a single time point data structure $(W,A,Y)\sim P_0$, where the vector of covariates $W$ has support $\mathcal{W}$, the treatment $A$ has support $\{0,1\}$, and the outcome $Y$ has support in the closed unit interval. Our statistical model is nonparametric, beyond possible knowledge of the treatment mechanism, i.e. the probability of treatment given covariates. Little generality is lost with the bound on $Y$, given that any continuous outcome bounded in $[b,c]$ can be rescaled to the unit interval with the linear transformation $y \mapsto (y-b)/(c-b)$. Suppose that treatment resources are limited so that at most a $\kappa \in (0,1)$ proportion of the population can receive the treatment $A=1$. Let $V$ be some function of $W$, and denote the support of $V$ by $\mathcal{V}$. A deterministic treatment rule $\tilde d$ takes as input a function of the covariates $v \in \mathcal{V}$ and outputs a binary treatment decision $\tilde d(v)$. The stochastic treatment rules considered in this work are maps from $\mathcal{U} \times \mathcal{V}$ to $\{0,1\}$, where $\mathcal{U}$ is the support of some random variable $U \sim P_U$. If $d$ is a stochastic rule and $u \in \mathcal{U}$ is fixed, then $d(u,\cdot)$ represents a deterministic treatment rule. Throughout this work we will let $U$ be drawn independently of all draws from $P_0$.

For a distribution $P$, let $\bar Q_P(a,w) \triangleq E_P[Y \mid A=a, W=w]$. For notational convenience, we let $\bar Q_0 \triangleq \bar Q_{P_0}$. Let $\tilde d$ be a deterministic treatment regime. For a distribution $P$, let $\tilde\Psi_{\tilde d}(P) \triangleq E_P\left[\bar Q_P(\tilde d(V), W)\right]$ represent the value of $\tilde d$. Under causal assumptions, this quantity is equal to the counterfactual mean outcome if, possibly contrary to fact, the rule $\tilde d$ were implemented in the population [15, 16]. The optimal R-C deterministic regime at $P$ is defined as the deterministic regime $\tilde d$ which solves the optimization problem
$$\text{Maximize } \tilde\Psi_{\tilde d}(P) \quad \text{subject to } E_P[\tilde d(V)] \le \kappa. \tag{1}$$

For a stochastic regime $d$, let $\Psi_d(P) \triangleq E_{P_U}\left[\tilde\Psi_{d(U,\cdot)}(P)\right]$ represent the value of $d$. Under causal assumptions, this quantity is equal to the counterfactual mean outcome if, possibly contrary to fact, the stochastic rule $d$ were implemented in the population (see Ref. [17] for a similar identification result). The optimal R-C stochastic regime at $P$ is defined as the stochastic treatment regime $d$ which solves the optimization problem
$$\text{Maximize } \Psi_d(P) \quad \text{subject to } E_{P_U \times P}[d(U,V)] \le \kappa. \tag{2}$$
We call the optimal value under an R-C stochastic regime $\Psi(P)$. Because any deterministic regime can be written as a stochastic regime which does not rely on the stochastic mechanism $U$, we have that $\Psi(P) \ge \tilde\Psi(P)$, where $\tilde\Psi(P)$ denotes the maximum in eq. (1). The constraint $E_{P_U \times P}[d(U,V)] \le \kappa$ above is primarily meant to represent a clinical setting where each patient arrives at the clinic with covariate summary measure $V$, a value of $U$ is drawn from $P_U$ for this patient, and treatment is then assigned according to $d(U,V)$. By Fubini's theorem, this corresponds to rewriting the above constraint as $E_P E_{P_U}[d(U,V)] \le \kappa$.
Nonetheless, this constraint also represents the case where a single value of $U=u$ is drawn for the entire population, and each individual is treated according to the deterministic regime $d(u,\cdot)$, i.e. $E_{P_U} E_P[d(U,V)] \le \kappa$. This case appears less interesting because, for a fixed $u$, there is no guarantee that $E_P[d(u,V)] \le \kappa$.

For a distribution $P$, define the blip function as
$$\bar Q_{b,P}(v) \triangleq E_P\left[\bar Q_P(1,W) - \bar Q_P(0,W) \mid V=v\right].$$
Let $S_P$ represent the survival function of $\bar Q_{b,P}(V)$, i.e. $S_P : \tau \mapsto Pr_P\left(\bar Q_{b,P}(V) > \tau\right)$. Let
$$\eta_P \triangleq \inf\{\tau : S_P(\tau) \le \kappa\}, \qquad \tau_P \triangleq \max\{\eta_P, 0\}. \tag{3}$$
For notational convenience we let $\bar Q_{b,0} \triangleq \bar Q_{b,P_0}$, $S_0 \triangleq S_{P_0}$, $\eta_0 \triangleq \eta_{P_0}$, and $\tau_0 \triangleq \tau_{P_0}$.
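To make eq. (3) concrete, the following sketch computes the empirical analogues of $S_P$, $\eta_P$, and $\tau_P$ from a hypothetical sample of estimated blip values (the numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical estimated blip values Q_b(v_i) for six individuals
blip = np.array([0.30, 0.25, 0.10, 0.05, -0.02, -0.10])
kappa = 0.5  # at most half the population may be treated

# Empirical survival function S(tau) = Pr(Q_b(V) > tau)
def S(tau):
    return np.mean(blip > tau)

# eta = inf{tau : S(tau) <= kappa}; the empirical S only changes at
# observed blip values, so the infimum is attained at one of them
eta = min(t for t in blip if S(t) <= kappa)
tau = max(eta, 0.0)          # eq. (3): tau = max(eta, 0)

print(eta, tau)              # here eta = tau = 0.05
```

With these numbers, exactly half the sample has a blip above $\tau$, so treating that group exhausts the budget $\kappa$.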

Define the deterministic treatment rule $\tilde d_P : v \mapsto I\left(\bar Q_{b,P}(v) > \tau_P\right)$, and for notational convenience let $\tilde d_0 \triangleq \tilde d_{P_0}$. We have the following result.

If $Pr_P\left(\bar Q_{b,P}(V) = \tau_P\right) = 0$, then $\tilde d_P$ is an optimal deterministic rule satisfying the resource constraint, i.e. $\tilde\Psi_{\tilde d_P}(P)$ attains the maximum described in eq. (1).

One can in fact show that $\tilde d_P$ is the $P$-almost surely unique optimal deterministic regime under the stated condition. We do not treat the case where $Pr_P\left(\bar Q_{b,P}(V) = \tau_P\right) > 0$ for deterministic regimes, since in this case (1) is a more challenging problem: for discrete $V$ with a positive treatment effect in all strata, eq. (1) is a special case of the 0–1 knapsack problem, which is NP-hard, though it is considered one of the easier problems in this class [7, 8]. In the knapsack problem, one has a collection of items, each with a value and a weight. Given a knapsack which can only carry a limited weight, the objective is to choose which items to bring so as to maximize the value of the items in the knapsack while respecting the weight restriction. Considering the optimization problem over stochastic rather than deterministic regimes yields a fractional knapsack problem, which is known to be solvable in polynomial time [9, 8]. The fractional knapsack problem differs from the 0–1 knapsack problem in that one can pack partial items, with the value of a partial item proportional to the fraction of the item packed.
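The contrast between the two problems is visible in how easily the fractional version is solved: Dantzig's greedy algorithm [9] packs items in decreasing order of value per unit weight, splitting at most one item. A minimal sketch (the item values and weights are illustrative, not from the paper):

```python
def fractional_knapsack(items, capacity):
    """Greedy solution to the fractional knapsack problem: sort by
    value-to-weight ratio and pack greedily, taking a fraction of the
    first item that no longer fits. items is a list of (value, weight)."""
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total = 0.0
    for value, weight in items:
        if capacity <= 0:
            break
        take = min(weight, capacity)     # partial item allowed
        total += value * (take / weight)
        capacity -= take
    return total

# Capacity 50: the greedy rule packs items 1 and 2 fully and
# two thirds of item 3, for a total value of approximately 240
result = fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50)
print(result)
```

The 0–1 version of the same instance would require a search over subsets (or dynamic programming), which is exactly the difficulty that restricting attention to stochastic rules avoids.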

Define the stochastic treatment rule $d_P$ by its distribution with respect to a random variable drawn from $P_U$:
$$Pr_{P_U}\left(d_P(U,v)=1\right) = \begin{cases} \dfrac{\kappa - S_P(\tau_P)}{Pr_P\left(\bar Q_{b,P}(V) = \tau_P\right)}, & \text{if } \bar Q_{b,P}(v) = \tau_P \text{ and } \tau_P > 0, \\[4pt] I\left(\bar Q_{b,P}(v) > \tau_P\right), & \text{otherwise.} \end{cases}$$
We will let $d_0 \triangleq d_{P_0}$. Note that $\tilde d_P(V)$ and $d_P(U,V)$ are $P_U \times P$ almost surely equal if $Pr_P\left(\bar Q_{b,P}(V) = \tau_P\right) = 0$ or if $\tau_P \le 0$, and thus have the same value in these settings. It is easy to show that
$$E_{P_U \times P}\left[d_P(U,V)\right] = \kappa \quad \text{if } \tau_P > 0. \tag{4}$$
The following theorem establishes the optimality of the stochastic rule $d_P$ in a resource-limited setting.

The maximum in eq. (2) is attained at $d={d}_{P}$, i.e. ${d}_{P}$ is an optimal stochastic rule.

Note that the above theorem does not claim that ${d}_{P}$ is the unique optimal stochastic regime. For discrete V, the above theorem is an immediate consequence of the discussion of the knapsack problem in Dantzig [9].
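When $\bar Q_{b,P}(V)$ has an atom exactly at $\tau_P$, the boundary stratum is treated with just enough probability to exhaust the remaining budget, which is what makes eq. (4) hold. A small numerical sketch with hypothetical blip values (ours, for illustration only):

```python
import numpy as np

# Hypothetical blip values with a positive-probability stratum at the threshold
blip = np.array([0.4, 0.3, 0.2, 0.2, 0.2, 0.1])
kappa = 0.5

S = lambda t: np.mean(blip > t)              # survival function of the blip
eta = min(t for t in blip if S(t) <= kappa)  # here eta = 0.2
tau = max(eta, 0.0)
atom = np.mean(blip == tau)                  # Pr(Q_b(V) = tau) = 1/2 here

# Treatment probability on the boundary stratum, chosen so that the
# expected treated proportion is exactly kappa
p_boundary = (kappa - S(tau)) / atom if tau > 0 and atom > 0 else 0.0

def pr_treat(b):
    if b > tau:
        return 1.0
    if b == tau and tau > 0:
        return p_boundary
    return 0.0

expected_treated = np.mean([pr_treat(b) for b in blip])
assert abs(expected_treated - kappa) < 1e-9  # eq. (4): budget exactly met
```

Everyone with a blip strictly above $\tau$ is treated; the three individuals sitting exactly at $\tau$ are each treated with probability $1/3$, so the overall treated proportion is exactly $\kappa$.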

In this paper we focus on the value of the optimal stochastic rule. Nonetheless, the techniques that we present in this paper will only yield valid inference in the case where the data are generated according to a distribution $P_0$ for which $Pr_0\left(\bar Q_{b,0}(V) = \tau_0\right) = 0$. This is analogous to assuming a non-exceptional law in settings where resources are not limited [13, 2], though we note that for continuous covariates $V$ this assumption is much more likely to hold if $\tau_0 > 0$: it seems unlikely that the treatment effect in some positive probability stratum of covariates will concentrate on an arbitrary value $\tau_0$ determined by the constraint $\kappa$. Nonetheless, one could deal with situations where $Pr_0\left(\bar Q_{b,0}(V) = \tau_0\right) > 0$ using martingale-based online estimation techniques similar to those presented in Luedtke and van der Laan [13].

## 3 Estimating the optimal R-C value

We now present an estimation strategy for the optimal R-C value. The upcoming sections justify this strategy and suggest that it will perform well for a wide variety of data generating distributions. The estimation strategy proceeds as follows:

1. Obtain estimates $\bar Q_n$, $\bar Q_{b,n}$, and $g_n$ of $\bar Q_0$, $\bar Q_{b,0}$, and $g_0$ using any desired estimation strategy which respects the fact that $Y$ is bounded in the unit interval.

2. Estimate the marginal distributions of $W$ and $V$ with the corresponding empirical distributions.

3. Estimate $S_0$ with the plug-in estimator $S_n : \tau \mapsto \frac{1}{n}\sum_{i=1}^n I\left(\bar Q_{b,n}(v_i) > \tau\right)$.

4. Estimate $\eta_0$ with the plug-in estimator $\eta_n \triangleq \inf\{\tau : S_n(\tau) \le \kappa\}$.

5. Estimate $\tau_0$ with the plug-in estimator $\tau_n \triangleq \max\{\eta_n, 0\}$.

6. Estimate $d_0$ with the plug-in estimator $d_n$ with distribution
$$Pr_{P_U}\left(d_n(U,v)=1\right) = \begin{cases} \dfrac{\kappa - S_n(\tau_n)}{\frac{1}{n}\sum_{i=1}^n I\left(\bar Q_{b,n}(v_i) = \tau_n\right)}, & \text{if } \bar Q_{b,n}(v) = \tau_n \text{ and } \tau_n > 0, \\[4pt] I\left(\bar Q_{b,n}(v) > \tau_n\right), & \text{otherwise.} \end{cases}$$

7. Run a TMLE for the parameter $\Psi_{d_n}(P_0)$:

   (a) For $a \in \{0,1\}$, define $H(a,w) \triangleq \frac{Pr_{P_U}\left(d_n(U,v)=a\right)}{g_n(a \mid w)}$. Run a univariate logistic regression using:
   
   Outcome: $(y_i : i=1,\dots,n)$
   
   Offset: $\left(\text{logit}\, \bar Q_n(a_i, w_i) : i=1,\dots,n\right)$
   
   Covariate: $\left(H(a_i, w_i) : i=1,\dots,n\right)$
   
   Let $\epsilon_n$ represent the estimate of the coefficient for the covariate, i.e.
   $$\epsilon_n \triangleq \underset{\epsilon \in \mathbb{R}}{\text{argmax}}\; \frac{1}{n}\sum_{i=1}^n \left[ y_i \log \bar Q_n^{\epsilon}(a_i, w_i) + (1-y_i) \log\left(1 - \bar Q_n^{\epsilon}(a_i, w_i)\right) \right],$$
   where $\bar Q_n^{\epsilon}(a,w) \triangleq \text{logit}^{-1}\left(\text{logit}\, \bar Q_n(a,w) + \epsilon H(a,w)\right)$.

   (b) Define $\bar Q_n^{\ast} \triangleq \bar Q_n^{\epsilon_n}$.

   (c) Estimate $\Psi_{d_n}(P_0)$ using the plug-in estimator given by
   $$\Psi_{d_n}\left(P_n^{\ast}\right) \triangleq \frac{1}{n}\sum_{i=1}^n \sum_{a=0}^{1} \bar Q_n^{\ast}(a, w_i)\, Pr_{P_U}\left(d_n(U, v_i) = a\right).$$

We use ${\mathrm{\Psi }}_{{d}_{n}}\left({P}_{n}^{\ast }\right)$ as our estimate of $\mathrm{\Psi }\left({P}_{0}\right)$. We will denote this estimator $\stackrel{ˆ}{\mathrm{\Psi }}$, where we have defined $\stackrel{ˆ}{\mathrm{\Psi }}$ so that $\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)={\mathrm{\Psi }}_{{d}_{n}}\left({P}_{n}^{\ast }\right)$. Note that we have used a TMLE for the data dependent parameter ${\mathrm{\Psi }}_{{d}_{n}}\left({P}_{0}\right)$, which represents the value under a stochastic intervention ${d}_{n}$. Nonetheless, we assume that $P{r}_{{P}_{0}}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)={\mathrm{\tau }}_{0}\right)=0$ for many of the results pertaining to our estimator $\stackrel{ˆ}{\mathrm{\Psi }}$, i.e. we assume that the optimal R-C rule is deterministic. We view estimating the value under a stochastic rather than deterministic intervention as worthwhile because one can give conditions under which the above estimator is (root-n) consistent for $\mathrm{\Psi }\left({P}_{0}\right)$ at all laws ${P}_{0}$, even if non-negligible bias invalidates standard Wald-type confidence intervals for the parameter of interest at laws ${P}_{0}$ for which $P{r}_{{P}_{0}}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)={\mathrm{\tau }}_{0}\right)>0$.
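As a concrete illustration, the following sketch runs steps 1–7 on simulated data, with a simple parametric working model standing in for the data-adaptive estimators recommended below. The data generating distribution, the working model, and all variable names are ours, not the paper's; step 2 is implicit, since the empirical distribution of $W$ enters through the sample averages.

```python
import numpy as np
from scipy.special import expit, logit
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(0)
n, kappa = 2000, 0.3

# Simulated data: one covariate W, randomized treatment (g0 = 1/2), binary Y
W = rng.uniform(-1, 1, n)
A = rng.binomial(1, 0.5, n)
Y = rng.binomial(1, expit(0.2 - 0.2 * W + 0.5 * A * W))

# Step 1: fit a working logistic model for Qbar_0(a, w) by maximum likelihood
def nll(beta, X, y):
    p = np.clip(expit(X @ beta), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

design = lambda a, w: np.column_stack([np.ones_like(w), a, w, a * w])
beta = minimize(nll, np.zeros(4), args=(design(A, W), Y)).x
Qbar = lambda a, w: expit(design(a, w) @ beta)

g_n = np.full(n, 0.5)                                  # known g_0(1 | w)
blip = Qbar(np.ones(n), W) - Qbar(np.zeros(n), W)      # Qbar_{b,n}(v_i)

# Steps 3-5: plug-in S_n, eta_n, tau_n
S = lambda t: np.mean(blip > t)
eta = min(t for t in blip if S(t) <= kappa)
tau = max(eta, 0.0)

# Step 6: the fitted blips are continuous, so there is no atom at tau
d = (blip > tau).astype(float)                         # Pr(d_n(U, v_i) = 1)

# Step 7a: clever covariate and one-dimensional logistic fluctuation
H = np.where(A == 1, d / g_n, (1 - d) / (1 - g_n))
offset = logit(np.clip(Qbar(A, W), 1e-9, 1 - 1e-9))

def fluct_nll(eps):
    q = np.clip(expit(offset + eps * H), 1e-9, 1 - 1e-9)
    return -np.mean(Y * np.log(q) + (1 - Y) * np.log(1 - q))

eps = minimize_scalar(fluct_nll, bounds=(-5, 5), method="bounded").x

# Steps 7b-c: targeted plug-in estimate of Psi_{d_n}(P_0)
Q1 = expit(logit(np.clip(Qbar(np.ones(n), W), 1e-9, 1 - 1e-9)) + eps * d / g_n)
Q0 = expit(logit(np.clip(Qbar(np.zeros(n), W), 1e-9, 1 - 1e-9)) + eps * (1 - d) / (1 - g_n))
psi = np.mean(d * Q1 + (1 - d) * Q0)
print(round(np.mean(d), 3), round(psi, 3))  # treated proportion <= kappa; value estimate
```

The one-parameter fluctuation is fit by directly maximizing the logistic log-likelihood with an offset, which avoids depending on a regression routine that supports offsets.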

We will use ${P}_{n}^{\ast }$ to denote any distribution for which ${\stackrel{ˉ}{Q}}_{{P}_{n}^{\ast }}={\stackrel{ˉ}{Q}}_{n}^{\ast }$, ${g}_{{P}_{n}^{\ast }}={g}_{n}$, and ${P}_{n}^{\ast }$ has the marginal empirical distribution of W for the marginal distribution of W. We note that such a distribution ${P}_{n}^{\ast }$ exists provided that ${\stackrel{ˉ}{Q}}_{n}^{\ast }$ and ${g}_{n}$ fall in the parameter spaces of $P\phantom{\rule{thinmathspace}{0ex}}↦\phantom{\rule{thinmathspace}{0ex}}{\stackrel{ˉ}{Q}}_{P}\left(W\right)$ and $P\phantom{\rule{thinmathspace}{0ex}}↦\phantom{\rule{thinmathspace}{0ex}}{g}_{P}$, respectively.

In practice we recommend estimating ${\stackrel{ˉ}{Q}}_{0}$ and ${\stackrel{ˉ}{Q}}_{b,0}$ using an ensemble method such as super-learning to make an optimal bias-variance trade-off (or, more generally, minimize cross-validated risk) between a mix of parametric models and data adaptive regression algorithms [18, 19]. If the treatment mechanism ${g}_{0}$ is unknown then we recommend using similar data adaptive approaches to obtain the estimate ${g}_{n}$. If ${g}_{0}$ is known (as in a randomized controlled trial without missingness), then one can either take ${g}_{n}={g}_{0}$ or estimate ${g}_{0}$ using a correctly specified parametric model, which we expect to increase the efficiency of estimators when the ${\stackrel{ˉ}{Q}}_{0}$ part of the likelihood is misspecified [20, 21].

There is typically little downside to using data adaptive approaches to estimate the needed portions of the likelihood, though we do give a formal empirical process condition in Section 5.1 which describes exactly how data adaptive these estimators can be. If one is concerned about the data adaptivity of the estimators of the needed portions of the likelihood, then one can consider a cross-validated TMLE approach such as that presented in van der Laan and Luedtke [20] or an online one-step estimator as that presented in Luedtke and van der Laan [13]. These two approaches make no restrictions on the data adaptivity of the estimators of ${\stackrel{ˉ}{Q}}_{0}$, ${\stackrel{ˉ}{Q}}_{b,0}$, or ${g}_{0}$.

We now outline the main results of this paper, which hold under appropriate consistency and regularity conditions.

• Asymptotic linearity of $\stackrel{ˆ}{\mathrm{\Psi }}$:

$\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)-\mathrm{\Psi }\left({P}_{0}\right)=\frac{1}{n}\sum _{i=1}^{n}{D}_{0}\left({O}_{i}\right)+{o}_{{P}_{0}}\left({n}^{-1/2}\right),$

with ${D}_{0}$ a known function of ${P}_{0}$.

• $\stackrel{ˆ}{\mathrm{\Psi }}$ is an asymptotically efficient estimate of $\mathrm{\Psi }\left({P}_{0}\right)$.

• One can obtain a consistent estimate ${\sigma}_n^2$ of the variance of $D_0(O)$. An asymptotically valid 95% confidence interval for $\Psi(P_0)$ is given by $\hat\Psi(P_n) \pm 1.96\,\sigma_n/\sqrt{n}$.

The upcoming sections give the consistency and regularity conditions which imply the above results.
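The third bullet is the usual Wald-type construction. A minimal sketch, assuming one already has the point estimate and estimated influence-curve values $D_n(O_i)$ (both hypothetical numbers below):

```python
import numpy as np

def wald_ci(psi_hat, ic_values, z=1.96):
    """Wald-type 95% CI: psi_hat +/- z * sigma_n / sqrt(n), where
    sigma_n^2 is the sample variance of the influence-curve values."""
    n = len(ic_values)
    sigma_n = np.std(ic_values, ddof=1)
    half = z * sigma_n / np.sqrt(n)
    return psi_hat - half, psi_hat + half

# Illustrative: psi_hat = 0.62 with hypothetical influence-curve values
rng = np.random.default_rng(1)
lo, hi = wald_ci(0.62, rng.normal(0.0, 0.5, size=400))
print(round(lo, 3), round(hi, 3))  # interval centered at 0.62
```

Because the interval's half-width scales as $\sigma_n/\sqrt{n}$, it shrinks at the root-n rate promised by the asymptotic linearity display above.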

## 4 Canonical gradient of the optimal R-C value

The pathwise derivative of $\Psi$ will provide a key ingredient for analyzing the asymptotic properties of our estimator. We refer the reader to Pfanzagl [22] and Bickel et al. [23] for an overview of the crucial role that the pathwise derivative plays in semiparametric efficiency theory. We remind the reader that an estimator $\hat\Phi$ is an asymptotically linear estimator of a parameter $\Phi(P_0)$ with influence curve $IC_{P_0}$ provided that
$$\hat\Phi(P_n) - \Phi(P_0) = \frac{1}{n}\sum_{i=1}^n IC_{P_0}(O_i) + o_{P_0}\left(n^{-1/2}\right).$$
If $\Phi$ is pathwise differentiable with canonical gradient $IC_{P_0}$, then $\hat\Phi$ is regular and asymptotically linear (RAL) and asymptotically efficient (minimum variance) among all such RAL estimators of $\Phi(P_0)$ [23, 22].

For $o \in \mathcal{O}$, a deterministic rule $\tilde d$, and a real number $\tau$, define
$$D_1(\tilde d, P)(o) \triangleq \frac{I\left(a = \tilde d(v)\right)}{g_P(a \mid w)}\left(y - \bar Q_P(a,w)\right), \qquad D_2(\tilde d, P)(o) \triangleq \bar Q_P\left(\tilde d(v), w\right) - E_P\left[\bar Q_P\left(\tilde d(V), W\right)\right],$$
where $g_P(a \mid w) \triangleq Pr_P(A = a \mid W = w)$. We will let $g_0 \triangleq g_{P_0}$. We note that $D_1(\tilde d, P) + D_2(\tilde d, P)$ is the efficient influence curve of the parameter $\tilde\Psi_{\tilde d}(P)$.

Let $d$ be some stochastic rule. The canonical gradient of $\Psi_d$ is given by
$$IC_d(P)(o) \triangleq E_{P_U}\left[D_1\left(d(U,\cdot), P\right)(o) + D_2\left(d(U,\cdot), P\right)(o)\right].$$
Define
$$D(d, \tau, P)(o) \triangleq IC_d(P)(o) - \tau\left(E_{P_U}[d(U,v)] - \kappa\right).$$
For ease of reference, let $D_0 \triangleq D(d_0, \tau_0, P_0)$. The upcoming theorem makes use of the following assumptions.

• C1. ${g}_{0}$ satisfies the positivity assumption: $P{r}_{0}\left(0<{g}_{0}\left(1|W\right)<1\right)=1$.

• C2. $\bar Q_{b,0}(V)$ has density $f_0$ at $\eta_0$, and $0 < f_0(\eta_0) < \infty$.

• C3. ${S}_{0}$ is continuous in a neighborhood of ${\mathrm{\eta }}_{0}$.

• C4. $P{r}_{0}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)=\mathrm{\tau }\right)=0$ for all $\mathrm{\tau }$ in a neighborhood of ${\mathrm{\tau }}_{0}$.

We now present the canonical gradient of the optimal R-C value.

Suppose C1 through C4. Then $\mathrm{\Psi }$ is pathwise differentiable at ${P}_{0}$ with canonical gradient ${D}_{0}$.

Note that C3 implies that $Pr_0\left(\bar Q_{b,0}(V) = \tau_0\right) = 0$. Thus $d_0$ is (almost surely) deterministic and the expectation over $P_U$ in the definition of $D_0$ is superfluous. Nonetheless, this representation will prove useful when we seek to show that our estimator solves the empirical estimating equation defined by an estimate of $D(d_0, \tau_0, P_0)$.

When the resource constraint is active, i.e. ${\mathrm{\tau }}_{0}>0$, the above theorem shows that $\mathrm{\Psi }$ has an additional component relative to the optimal value parameter when no resource constraints are present [11]. The additional component is ${\mathrm{\tau }}_{0}×\left({E}_{{P}_{U}}\left[{d}_{0}\left(U,v\right)\right]-\mathrm{\kappa }\right)$, and is the portion of the derivative that relies on the fact that ${d}_{0}$ is estimated and falls on the edge of the parameter space. We note that it is possible that the variance of ${D}_{0}\left(O\right)$ is greater than the variance of $I{C}_{{d}_{0}}\left({P}_{0}\right)\left(O\right)$. If ${\mathrm{\tau }}_{0}=0$ then these two variances are the same, so suppose ${\mathrm{\tau }}_{0}>0$. Then, provided that $P{r}_{0}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)={\mathrm{\tau }}_{0}\right)=0$, we have that $$\mathrm{Var}_{P_0}\left(D_0(O)\right)-\mathrm{Var}_{P_0}\left(IC_{d_0}(P_0)\right)=\tau_0\,\kappa(1-\kappa)\left(\tau_0-2E_{P_0}\!\left[\bar{Q}_0(1,W)\,\middle|\,\tilde{d}_0(V)=1\right]+2E_{P_0}\!\left[\bar{Q}_0(0,W)\,\middle|\,\tilde{d}_0(V)=0\right]\right).$$

For any $\mathrm{\kappa }\in \left(0,1\right)$, it is possible to exhibit a distribution ${P}_{0}$ which satisfies the conditions of Theorem 3 and for which $Va{r}_{{P}_{0}}\left({D}_{0}\left(O\right)\right)>Va{r}_{{P}_{0}}\left(I{C}_{{d}_{0}}\left({P}_{0}\right)\left(O\right)\right)$. Perhaps more surprisingly, it is also possible to exhibit a distribution ${P}_{0}$ which satisfies the conditions of Theorem 3 and for which $Va{r}_{{P}_{0}}\left({D}_{0}\left(O\right)\right)<Va{r}_{{P}_{0}}\left(I{C}_{{d}_{0}}\left({P}_{0}\right)\left(O\right)\right)$. We omit further discussion here because the focus of this work is on estimating the value from the optimization problem (2), rather than on how this procedure relates to the estimation of other parameters.

## 5 Results about the proposed estimator

We now show that $\stackrel{ˆ}{\mathrm{\Psi }}$ is an asymptotically linear estimator for $\mathrm{\Psi }\left({P}_{0}\right)$ with influence curve ${D}_{0}$ provided our estimates of the needed parts of ${P}_{0}$ satisfy consistency and regularity conditions. Our theoretical results are presented in Section 5.1, and the conditions of our main theorem are discussed in Section 5.2.

## 5.1 Inference for $\mathrm{\Psi }\left({P}_{0}\right)$

For any distributions P and ${P}_{0}$ satisfying positivity, stochastic intervention d, and real number $\mathrm{\tau }$, define the following second-order remainder terms: ${R}_{10}\left(d,P\right)\stackrel{\mathrm{\Delta }}{=}{E}_{{P}_{U}×{P}_{0}}\left[\left(1-\frac{{g}_{0}\left(d|W\right)}{g\left(d|W\right)}\right)\left({\stackrel{ˉ}{Q}}_{P}\left(d,W\right)-{\stackrel{ˉ}{Q}}_{0}\left(d,W\right)\right)\right]$ and ${R}_{20}\left(d\right)\stackrel{\mathrm{\Delta }}{=}{E}_{{P}_{U}×{P}_{0}}\left[\left(d-{d}_{0}\right)\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)-{\mathrm{\tau }}_{0}\right)\right].$

Above, the dependence of d and ${d}_{0}$ on $\left(U,V\right)$ is suppressed in the notation. Let ${R}_{0}\left(d,P\right)\stackrel{\mathrm{\Delta }}{=}{R}_{10}\left(d,P\right)+{R}_{20}\left(d\right)$. The upcoming theorem will make use of the following assumptions.

• 1. ${g}_{0}$ satisfies the strong positivity assumption: $P{r}_{0}\left(\mathrm{\delta }<{g}_{0}\left(1|W\right)<1-\mathrm{\delta }\right)=1$ for some $\mathrm{\delta }>0$.

• 2. ${g}_{n}$ satisfies the strong positivity assumption for a fixed $\mathrm{\delta }>0$ with probability approaching 1: there exists some $\mathrm{\delta }>0$ such that, with probability approaching 1, $P{r}_{0}\left(\mathrm{\delta }<{g}_{n}\left(1|W\right)<1-\mathrm{\delta }\right)=1$.

• 3. ${R}_{0}\left({d}_{n},{P}_{n}^{\ast }\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$.

• 4. ${E}_{{P}_{0}}\left[{\left(D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)\left(O\right)-{D}_{0}\left(O\right)\right)}^{2}\right]={o}_{{P}_{0}}\left(1\right)$.

• 5. $D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)$ belongs to a ${P}_{0}$-Donsker class ${D}$ with probability approaching 1.

• 6. $\frac{1}{n}{\sum }_{i=1}^{n}D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)\left({O}_{i}\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$.

We note that the ${\mathrm{\tau }}_{0}$ in the final condition above only enters the expression in the sum as a multiplicative constant in front of ${E}_{{P}_{U}}\left[{d}_{n}\left(U,{v}_{i}\right)\right]-\mathrm{\kappa }$.

Theorem 4 ($\stackrel{ˆ}{\mathrm{\Psi }}$ is asymptotically linear). Suppose C2) through 6). Then $\stackrel{ˆ}{\mathrm{\Psi }}$ is a RAL estimator of $\mathrm{\Psi }\left({P}_{0}\right)$ with influence curve ${D}_{0}$, i.e. $\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)-\mathrm{\Psi }\left({P}_{0}\right)=\frac{1}{n}\sum _{i=1}^{n}{D}_{0}\left({O}_{i}\right)+{o}_{{P}_{0}}\left({n}^{-1/2}\right).$

Further, $\stackrel{ˆ}{\mathrm{\Psi }}$ is efficient among all such RAL estimators of $\mathrm{\Psi }\left({P}_{0}\right)$.

Let ${\mathrm{\sigma }}_{0}^{2}\stackrel{\mathrm{\Delta }}{=}Va{r}_{{P}_{0}}\left({D}_{0}\right)$. By the central limit theorem, $\sqrt{n}\left(\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)-\mathrm{\Psi }\left({P}_{0}\right)\right)$ converges in distribution to a $N\left(0,{\mathrm{\sigma }}_{0}^{2}\right)$ distribution. Let ${\mathrm{\sigma }}_{n}^{2}\stackrel{\mathrm{\Delta }}{=}\frac{1}{n}{\sum }_{i=1}^{n}D{\left({d}_{n},{\mathrm{\tau }}_{n},{P}_{n}^{\ast }\right)\left({O}_{i}\right)}^{2}$ be an estimate of ${\mathrm{\sigma }}_{0}^{2}$. The following lemma gives sufficient conditions for the consistency of ${\mathrm{\tau }}_{n}$ for ${\mathrm{\tau }}_{0}$.

Lemma 5 (Consistency of ${\mathrm{\tau }}_{n}$). Suppose C2) and C3). Also suppose ${\stackrel{ˉ}{Q}}_{b,n}$ is consistent for ${\stackrel{ˉ}{Q}}_{b,0}$ in ${L}^{1}\left({P}_{0}\right)$ and that the estimate ${\stackrel{ˉ}{Q}}_{b,n}$ belongs to a ${P}_{0}$-Glivenko–Cantelli class with probability approaching 1. Then ${\mathrm{\tau }}_{n}\to {\mathrm{\tau }}_{0}$ in probability.

It is easy to verify that conditions similar to those of Theorem 4, combined with the convergence of ${\mathrm{\tau }}_{n}$ to ${\mathrm{\tau }}_{0}$ as considered in the above lemma, imply that ${\mathrm{\sigma }}_{n}\to {\mathrm{\sigma }}_{0}$ in probability. Under these conditions, an asymptotically valid two-sided $1-\mathrm{\alpha }$ confidence interval is given by $\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)±{z}_{1-\mathrm{\alpha }/2}\frac{{\mathrm{\sigma }}_{n}}{\sqrt{n}},$where ${z}_{1-\mathrm{\alpha }/2}$ denotes the $1-\mathrm{\alpha }/2$ quantile of a $N\left(0,1\right)$ random variable.
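As an illustrative sketch (names ours, not the authors' R code), the variance estimate and Wald interval above can be computed from the estimated influence curve values using only numpy and the standard library normal quantile:

```python
import numpy as np
from statistics import NormalDist

def wald_ci(psi_hat, D_vals, alpha=0.05):
    """Two-sided 1 - alpha Wald confidence interval for the optimal R-C value.

    D_vals holds the estimated influence curve values D(d_n, tau_n, P_n*)(O_i);
    sigma_n^2 is their mean square, matching the estimator defined above.
    """
    n = len(D_vals)
    sigma_n = np.sqrt(np.mean(np.square(D_vals)))  # sigma_n, plug-in scale estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)        # 1 - alpha/2 standard normal quantile
    half_width = z * sigma_n / np.sqrt(n)
    return psi_hat - half_width, psi_hat + half_width
```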

## 5.2 Discussion of conditions of Theorem 4

Conditions C2) and C3). These are standard conditions used when attempting to estimate the $\mathrm{\kappa }$-quantile ${\mathrm{\eta }}_{0}$, defined in eq. (3). Provided good estimation of ${\stackrel{ˉ}{Q}}_{b,0}$, these conditions ensure that gathering a large amount of data will enable one to get a good estimate of the $\mathrm{\kappa }$-quantile of the random variable ${\stackrel{ˉ}{Q}}_{b,0}$. See Lemma 5 for an indication of what is meant by “good estimation” of ${\stackrel{ˉ}{Q}}_{b,0}$. It seems reasonable to expect that these conditions will hold when V contains continuous random variables and ${\mathrm{\eta }}_{0}\ne 0$, since we are essentially assuming that ${\stackrel{ˉ}{Q}}_{b,0}$ is not degenerate at the arbitrary (determined by $\mathrm{\kappa }$) point ${\mathrm{\eta }}_{0}$.

Condition C4). If ${\mathrm{\tau }}_{0}>0$, then C4) is implied by C3). If ${\mathrm{\tau }}_{0}=0$, then C4) is akin to assuming a non-exceptional law, i.e. that the probability that there is no treatment effect in a stratum of V is zero. Because ${\mathrm{\tau }}_{0}$ is not known from the outset, we require something slightly stronger, namely that, for any specific small value, the probability that the treatment effect in a stratum of V equals that value is zero. Note that this condition does not prohibit the treatment effect from being small, e.g. $P{r}_{0}\left(|{\stackrel{ˉ}{Q}}_{b,0}\left(V\right)|<\mathrm{\tau }\right)>0$ for all $\mathrm{\tau }>0$, but rather it prohibits the existence of a sequence ${\mathrm{\tau }}_{m}↓0$ with the property that $P{r}_{0}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)={\mathrm{\tau }}_{m}\right)>0$ infinitely often. Thus this condition does not really seem any stronger than assuming a non-exceptional law. If one is concerned about such exceptional laws, then we suggest adapting the methods in [13] to the R-C setting.

Condition 1. This condition assumes that individuals in each stratum of covariates have a reasonable (at least $\mathrm{\delta }>0$) probability of receiving each level of treatment.

Condition 2. This condition requires that our estimates of ${g}_{0}$ respect the fact that each stratum of covariates has a reasonable probability of receiving each level of treatment.

Condition 3. This condition is satisfied if ${R}_{10}\left({d}_{n},{P}_{n}^{\ast }\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$ and ${R}_{20}\left({d}_{n}\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$. The term ${R}_{10}\left({d}_{n},{P}_{n}^{\ast }\right)$ takes the form of a typical double robust, second-order term: it is small if either ${g}_{0}$ or ${\stackrel{ˉ}{Q}}_{0}$ is estimated well, and one might hope that ${R}_{10}\left({d}_{n},{P}_{n}^{\ast }\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$ if both ${g}_{0}$ and ${\stackrel{ˉ}{Q}}_{0}$ are estimated well. One can upper bound this remainder by the product of the ${L}^{2}\left({P}_{0}\right)$ rates of convergence of these two quantities using the Cauchy–Schwarz inequality. If ${g}_{0}$ is known, then one can take ${g}_{n}={g}_{0}$ and this term is zero.

Ensuring that ${R}_{20}\left({d}_{n}\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$ requires a little more work but will still prove to be a reasonable condition. We will use the following margin assumption for some $\mathrm{\alpha }>0$: $$P{r}_{0}\left(0<|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}|\le t\right)\lesssim {t}^{\mathrm{\alpha }}\quad \text{for all } t>0,$$(5)

where ‘‘$\lesssim$’’ denotes less than or equal to up to a multiplicative constant. This margin assumption is analogous to that used in Audibert and Tsybakov [24]. The following result relates the rate of convergence of ${R}_{20}\left({d}_{n}\right)$ to the rate at which ${\stackrel{ˉ}{Q}}_{b,n}-{\mathrm{\tau }}_{n}$ converges to ${\stackrel{ˉ}{Q}}_{b,\phantom{\rule{thinmathspace}{0ex}}0}-{\mathrm{\tau }}_{0}$.

If eq. (5) holds for some $\mathrm{\alpha }>0$, then

• 1. $|{R}_{20}\left({d}_{n}\right)|\lesssim {∥\left({\stackrel{ˉ}{Q}}_{b,n}-{\mathrm{\tau }}_{n}\right)-\left({\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right)∥}_{2,{P}_{0}}^{2\left(1+\mathrm{\alpha }\right)/\left(2+\mathrm{\alpha }\right)}$

• 2. $|{R}_{20}\left({d}_{n}\right)|\lesssim {∥\left({\stackrel{ˉ}{Q}}_{b,n}-{\mathrm{\tau }}_{n}\right)-\left({\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right)∥}_{\mathrm{\infty },{P}_{0}}^{1+\mathrm{\alpha }}$.

The above is similar to Lemma 5.2 in Audibert and Tsybakov [24], and a similar result was proved in the context of optimal ITRs without resource constraints in Luedtke and van der Laan [13]. If ${S}_{0}$ has a finite derivative at ${\mathrm{\tau }}_{0}$, as is given by C2), then one can take $\mathrm{\alpha }=1$. The above theorem then implies that ${R}_{20}\left({d}_{n}\right)={o}_{{P}_{0}}\left({n}^{-1/2}\right)$ if either ${∥\left({\stackrel{ˉ}{Q}}_{b,n}-{\mathrm{\tau }}_{n}\right)-\left({\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right)∥}_{2,{P}_{0}}$ is ${o}_{{P}_{0}}\left({n}^{-3/8}\right)$ or ${∥\left({\stackrel{ˉ}{Q}}_{b,n}-{\mathrm{\tau }}_{n}\right)-\left({\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right)∥}_{\mathrm{\infty },{P}_{0}}$ is ${o}_{{P}_{0}}\left({n}^{-1/4}\right)$.
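The rate arithmetic behind this claim is short: with $\alpha = 1$ the exponents in the two bounds become $2(1+\alpha)/(2+\alpha) = 4/3$ and $1+\alpha = 2$, so

```latex
\|(\bar{Q}_{b,n}-\tau_n)-(\bar{Q}_{b,0}-\tau_0)\|_{2,P_0}=o_{P_0}(n^{-3/8})
  \;\Longrightarrow\; |R_{20}(d_n)| = o_{P_0}\big((n^{-3/8})^{4/3}\big) = o_{P_0}(n^{-1/2}), \\
\|(\bar{Q}_{b,n}-\tau_n)-(\bar{Q}_{b,0}-\tau_0)\|_{\infty,P_0}=o_{P_0}(n^{-1/4})
  \;\Longrightarrow\; |R_{20}(d_n)| = o_{P_0}\big((n^{-1/4})^{2}\big) = o_{P_0}(n^{-1/2}).
```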

Condition 4. This is a mild consistency condition which is implied by the ${L}^{2}\left({P}_{0}\right)$ consistency of ${d}_{n}$, ${g}_{n}$, and ${\stackrel{ˉ}{Q}}_{n}^{\ast }$ for ${d}_{0}$, ${g}_{0}$, and ${\stackrel{ˉ}{Q}}_{0}$. We note that the consistency of the initial (unfluctuated) estimate ${\stackrel{ˉ}{Q}}_{n}$ for ${\stackrel{ˉ}{Q}}_{0}$ will imply the consistency of ${\stackrel{ˉ}{Q}}_{n}^{\ast }$ for ${\stackrel{ˉ}{Q}}_{0}$ given Condition 2, since in this case ${\mathrm{\epsilon }}_{n}\to 0$ in probability, and thus ${∥{\stackrel{ˉ}{Q}}_{n}^{\ast }-{\stackrel{ˉ}{Q}}_{n}∥}_{2,{P}_{0}}\to 0$ in probability.

Condition 5. This condition places restrictions on how data adaptive the estimators of ${d}_{0}$, ${g}_{0}$, and ${\stackrel{ˉ}{Q}}_{0}$ can be. We refer the reader to Section 2.10 of van der Vaart and Wellner [25] for conditions under which the estimates of ${d}_{0}$, ${g}_{0}$, and ${\stackrel{ˉ}{Q}}_{0}$ belonging to Donsker classes implies that $D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)$ belongs to a Donsker class. We note that this condition was avoided for estimating the value function using a cross-validated TMLE in van der Laan and Luedtke [20] and using an online estimator of the value function in Luedtke and van der Laan [13], and using either technique will allow one to avoid the condition here as well.

Condition 6. Using the notation $Pf=\int f\left(o\right)dP\left(o\right)$ for any distribution P and function $f:{O}\to \mathbb{R}$, we have that $${P}_{n}D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)={P}_{n}{D}_{1}\left({d}_{n},{P}_{n}^{\ast }\right)+{P}_{n}{D}_{2}\left({d}_{n},{P}_{n}^{\ast }\right)-{\mathrm{\tau }}_{0}\left(\frac{1}{n}\sum _{i=1}^{n}{E}_{{P}_{U}}\left[{d}_{n}\left(U,{v}_{i}\right)\right]-\mathrm{\kappa }\right).$$

The first term is zero by the fluctuation step of the TMLE algorithm and the second term on the right is zero because ${P}_{n}^{\ast }$ uses the empirical distribution of W for the marginal distribution of W. If ${\mathrm{\tau }}_{0}=0$ then clearly the third term is zero, so suppose ${\mathrm{\tau }}_{0}>0$. Combining eq. (4) and the fact that ${d}_{n}$ is a substitution estimator shows that the third term is 0 with probability approaching 1 provided that ${\mathrm{\tau }}_{n}>0$ with probability approaching 1. This will of course occur if ${\mathrm{\tau }}_{n}\to {\mathrm{\tau }}_{0}>0$ in probability, for which Lemma 5 gives sufficient conditions.

## 6 Simulation methods

We simulated i.i.d. draws from two data generating distributions at sample sizes 100, 200, and 1,000. For each sample size and distribution we considered resource constraints $\mathrm{\kappa }=0.1$ and $\mathrm{\kappa }=0.9$. We ran 2,000 Monte Carlo draws of each simulation setting. All simulations were run in R [26].

We first present the two data generating distributions considered, and then present the estimation strategies used.

## 6.1.1 Simulation 1

Our first data generating distribution is identical to the single time point simulation considered in van der Laan and Luedtke [20] and Luedtke and van der Laan [18]. The outcome is binary and the baseline covariate vector $W=\left({W}_{1},...,{W}_{4}\right)$ is four dimensional for this distribution, with $$W_1, W_2, W_3, W_4 \overset{\text{i.i.d.}}{\sim} N(0,1), \qquad A \mid W \sim \text{Bernoulli}(1/2),$$ $$\operatorname{logit}\left(E_{P_0}[Y \mid A, W, H=0]\right) = 1 - W_1^2 + 3W_2 + A\left(5W_3^2 - 4.45\right),$$ $$\operatorname{logit}\left(E_{P_0}[Y \mid A, W, H=1]\right) = -0.5 - W_3 + 2W_1W_2 + A\left(3|W_2| - 1.5\right),$$ where H is an unobserved $\text{Bernoulli}(1/2)$ variable independent of $\left(A,W\right)$. For this distribution ${E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(0,W\right)\right]\approx {E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(1,W\right)\right]\approx 0.464$.

We consider two choices for V, namely $V={W}_{3}$ and $V=\left({W}_{1},...,{W}_{4}\right)$. We obtained approximations of the optimal R-C value for this data generating distribution using ${10}^{7}$ Monte Carlo draws. When $\mathrm{\kappa }=0.1$, $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.493$ for $V={W}_{3}$ and $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.511$ for $V=\left({W}_{1},...,{W}_{4}\right)$. When $\mathrm{\kappa }=0.9$, $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.536$ for $V={W}_{3}$ and $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.563$ for $V=\left({W}_{1},...,{W}_{4}\right)$. We note that the resource constraint is not active (${\mathrm{\tau }}_{0}=0$) when $\mathrm{\kappa }=0.9$ for either choice of V.
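As an illustration, the generating mechanism of Simulation 1 can be sketched as follows (a transcription of the description above, not the authors' R code; function and variable names are ours):

```python
import numpy as np

def expit(x):
    """Inverse of the logit function."""
    return 1 / (1 + np.exp(-x))

def simulate_sim1(n, rng=None):
    """Draw n observations (W, A, Y) from the Simulation 1 distribution."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((n, 4))      # W_1, ..., W_4 iid N(0, 1)
    A = rng.binomial(1, 0.5, n)          # A | W ~ Bernoulli(1/2)
    H = rng.binomial(1, 0.5, n)          # unobserved mixture indicator
    logit0 = 1 - W[:, 0]**2 + 3 * W[:, 1] + A * (5 * W[:, 2]**2 - 4.45)
    logit1 = -0.5 - W[:, 2] + 2 * W[:, 0] * W[:, 1] + A * (3 * np.abs(W[:, 1]) - 1.5)
    p = np.where(H == 0, expit(logit0), expit(logit1))
    Y = rng.binomial(1, p)               # binary outcome
    return W, A, Y
```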

## 6.1.2 Simulation 2

Our second data generating distribution is very similar to one of the distributions considered in Luedtke and van der Laan [13], though it has been modified so that the treatment effect is positive for all values of the covariate. The data are generated as follows: $W \sim \text{Uniform}(-1,1)$, $A \mid W \sim \text{Bernoulli}(1/2)$, and $Y \mid A,W \sim \text{Bernoulli}\left(\bar{Q}_0(A,W)\right)$, where for $\tilde{W} \stackrel{\Delta}{=} W + 5/6$ we define $$\bar{Q}_0(A,W) - \frac{6}{10} \stackrel{\Delta}{=} \begin{cases} 0, & \text{if } A=1 \text{ and } -1/2 \le W \le 1/3, \\ -\tilde{W}^3 + \tilde{W}^2 - \frac{1}{3}\tilde{W} + \frac{1}{27}, & \text{if } A=1 \text{ and } W < -1/2, \\ -W^3 + W^2 - \frac{1}{3}W + \frac{1}{27}, & \text{if } A=1 \text{ and } W > 1/3, \\ -\frac{3}{10}, & \text{otherwise.} \end{cases}$$ For this distribution ${E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(0,W\right)\right]=0.3$ and ${E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(1,W\right)\right]\approx 0.583$.

We use $V=W$. This simulation is an example of a case where ${\stackrel{ˉ}{Q}}_{b,0}\left(V\right)>0$ almost surely, so any constraint on resources will reduce the optimal value from its unconstrained value of 0.583. In particular, we have that $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.337$ when $\mathrm{\kappa }=0.1$ and $\mathrm{\Psi }\left({P}_{0}\right)\approx 0.572$ when $\mathrm{\kappa }=0.9$.
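The stated arm means can be checked numerically. Below is our transcription of the piecewise regression (names ours; note that each cubic branch is $-(x-1/3)^3$ in factored form, with $x=\tilde{W}$ or $x=W$), followed by a Monte Carlo check:

```python
import numpy as np

def Qbar0_sim2(A, W):
    """Outcome regression Qbar_0(A, W) for Simulation 2, transcribed from the
    piecewise definition above."""
    Wt = W + 5 / 6                                          # W-tilde
    dev = np.full_like(W, -3 / 10)                          # "otherwise" branch
    dev = np.where((A == 1) & (W >= -0.5) & (W <= 1 / 3), 0.0, dev)
    dev = np.where((A == 1) & (W < -0.5),
                   -Wt**3 + Wt**2 - Wt / 3 + 1 / 27, dev)
    dev = np.where((A == 1) & (W > 1 / 3),
                   -W**3 + W**2 - W / 3 + 1 / 27, dev)
    return dev + 6 / 10                                     # Qbar_0 = 6/10 + deviation

# Monte Carlo check of the stated arm means (approximately 0.3 and 0.583):
rng = np.random.default_rng(1)
W = rng.uniform(-1, 1, 200_000)
mean0 = Qbar0_sim2(np.zeros_like(W), W).mean()
mean1 = Qbar0_sim2(np.ones_like(W), W).mean()
```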

## 6.2 Estimating nuisance functions

We treated ${g}_{0}$ as known in both simulations and let ${g}_{n}={g}_{0}$. We estimated ${\stackrel{ˉ}{Q}}_{0}$ using the super-learner algorithm with the quasi-log-likelihood loss function (family = binomial) and a candidate library of data adaptive (SL.gam and SL.nnet) and parametric algorithms (SL.bayesglm, SL.glm, SL.glm.interaction, SL.mean, SL.step, SL.step.interaction, and SL.step.forward). We refer the reader to Table 2 in the technical report Luedtke and van der Laan [18] for a brief description of these algorithms. We estimated ${\stackrel{ˉ}{Q}}_{b,0}$ by running a super-learner with the squared error loss function and the same candidate algorithms, using W to predict the pseudo-outcome $\stackrel{˜}{Y}\stackrel{\mathrm{\Delta }}{=}\frac{2A-1}{{g}_{0}\left(A|W\right)}\left(Y-{\stackrel{ˉ}{Y}}_{n}\right)+{\stackrel{ˉ}{Y}}_{n}$, where ${\stackrel{ˉ}{Y}}_{n}$ denotes the sample mean of Y over the n observations. See Luedtke and van der Laan [18] for a justification of this estimation scheme.
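A minimal sketch of this pseudo-outcome transformation (names ours): since the weight $(2A-1)/g_0(A|W)$ has conditional mean zero given W, the conditional mean of $\stackrel{˜}{Y}$ given W equals $\bar{Q}_{b,0}(W)$ shifted by the constant $\bar{Y}_n$, so a squared-error regression of $\stackrel{˜}{Y}$ on W targets the treatment-effect contrast.

```python
import numpy as np

def pseudo_outcome(y, a, g1):
    """Pseudo-outcome Y~ = (2A-1)/g0(A|W) * (Y - Ybar_n) + Ybar_n.

    Regressing this on W with squared error loss targets the contrast
    Qbar_{b,0}(W) = Qbar_0(1,W) - Qbar_0(0,W), up to the additive constant
    Ybar_n; g1[i] is g_0(1 | w_i).
    """
    g_a = np.where(a == 1, g1, 1 - g1)   # g_0(a_i | w_i)
    ybar = y.mean()                      # sample mean of Y, used for centering
    return (2 * a - 1) / g_a * (y - ybar) + ybar
```

The centering around $\bar{Y}_n$ leaves the targeted contrast unchanged and is intended to stabilize the inverse-weighted term; see [18] for the formal justification.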

Once we had our estimates ${\stackrel{ˉ}{Q}}_{n}$, ${\stackrel{ˉ}{Q}}_{b,n}$, and ${g}_{n}$ we proceeded with the estimation strategy described in Section 3.

## 6.3 Evaluating performance

We used three methods to evaluate our proposed approach. First, we looked at the coverage of two-sided 95% confidence intervals for the optimal R-C value. Second, we report the average confidence interval widths. Finally, we looked at the power of the $\mathrm{\alpha }=0.025$ level test of ${H}_{0}:\mathrm{\Psi }\left({P}_{0}\right)={\mathrm{\mu }}_{0}$ against ${H}_{1}:\mathrm{\Psi }\left({P}_{0}\right)>{\mathrm{\mu }}_{0}$, where ${\mathrm{\mu }}_{0}\stackrel{\mathrm{\Delta }}{=}{E}_{0}\left[{\stackrel{ˉ}{Q}}_{0}\left(0,W\right)\right]$ is treated as a known quantity. Under causal assumptions, ${\mathrm{\mu }}_{0}$ can be identified with the counterfactual quantity representing the population mean outcome if, possibly contrary to fact, no one receives treatment. If treatment is not currently being given in the population, one could substitute the population mean outcome (if known) for ${\mathrm{\mu }}_{0}$. Our test of significance consisted of checking whether the lower bound of the two-sided 95% confidence interval is greater than ${\mathrm{\mu }}_{0}$. If an estimator of $\mathrm{\Psi }\left({P}_{0}\right)$ has low power in testing ${H}_{0}$ against ${H}_{1}$, then clearly the estimator will have little practical value.

## 7 Simulation results

Figure 1:

Coverage of two-sided 95% confidence intervals. As expected, coverage increases with sample size. The coverage tends to be better for $\mathrm{\kappa }=0.1$ than for $\mathrm{\kappa }=0.9$, though the estimator performed well at the largest sample size (1,000) for all simulations and choices of $\mathrm{\kappa }$. Error bars indicate 95% confidence intervals to account for uncertainty from the finite number of Monte Carlo draws.

The proposed estimation strategy performed well overall. Figure 1 demonstrates the coverage of 95% confidence intervals for the optimal R-C value. All methods performed well at all sample sizes for the highly constrained setting where $\mathrm{\kappa }=0.1$. The results were more mixed for the resource constraint $\mathrm{\kappa }=0.9$. All methods performed well at the largest sample size considered. This supports our theoretical results, which were all asymptotic in nature. For Simulation 1, in which the resource constraint was not active for either choice of V, the coverage dropped off at lower sample sizes. Coverage was approximately 90% at the two smaller sample sizes for $V={W}_{3}$, which may be expected for such an asymptotic method. For the more complex problem of estimating the optimal value when $V={W}_{1},...,{W}_{4}$, the coverage was somewhat lower (80% when $n=100$ and 84% when $n=200$). In Simulation 2, the coverage was better ($>$91%) for the smaller sample sizes. We note that the resource constraint was still active (${\mathrm{\tau }}_{0}>0$) when $\mathrm{\kappa }=0.9$ for this simulation, and also that the estimation problem is easier because the baseline covariate was univariate.

We report the average confidence interval widths across the 2,000 Monte Carlo draws. For $n=100$, average confidence interval widths were between 0.25 and 0.26 across all simulations and choices of $\mathrm{\kappa }$. For $n=200$, all average confidence interval widths were between 0.17 and 0.18. For $n=1000$, all average confidence interval widths were approximately 0.08. We note that the usefulness of such confidence intervals varies across simulations and choices of $\mathrm{\kappa }$. When $V={W}_{3}$ and $\mathrm{\kappa }=0.1$ in Simulation 1, the optimal R-C value is approximately 0.493, versus a baseline value ${\mathrm{\mu }}_{0}={E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(0,W\right)\right]$ of approximately 0.464. Thus here the confidence interval would give the investigator little information, even at a sample size of 1,000. In Simulation 2 with $\mathrm{\kappa }=0.9$, on the other hand, the optimal R-C value is approximately 0.572, versus a baseline value of ${\mathrm{\mu }}_{0}\approx 0.3$. Thus here all confidence intervals would likely be informative for investigators, even those made for data sets of size 100.

Figure 2:

Power of the $\mathrm{\alpha }=0.025$ level test of ${H}_{0}:\mathrm{\Psi }\left({P}_{0}\right)={\mathrm{\mu }}_{0}$ against ${H}_{1}:\mathrm{\Psi }\left({P}_{0}\right)>{\mathrm{\mu }}_{0}$, where ${\mathrm{\mu }}_{0}={E}_{{P}_{0}}\left[{\stackrel{ˉ}{Q}}_{0}\left(0,W\right)\right]$ is treated as known. Power increases with sample size and $\mathrm{\kappa }$. Error bars indicate 95% confidence intervals to account for uncertainty from the finite number of Monte Carlo draws.

Figure 2 gives the power of the $\mathrm{\alpha }=0.025$ level test of ${H}_{0}:\mathrm{\Psi }\left({P}_{0}\right)={\mathrm{\mu }}_{0}$ against the alternative ${H}_{1}:\mathrm{\Psi }\left({P}_{0}\right)>{\mathrm{\mu }}_{0}$. Overall our method appears to have reasonable power in this statistical test. We see that power increases with sample size, the key property of consistent statistical tests. We also see that power increases with $\mathrm{\kappa }$, which is unsurprising given that Y is binary and ${g}_{0}\left(a|w\right)$ is $1/2$ for all $a,w$. We note that power will not always increase with $\mathrm{\kappa }$, for example if ${P}_{0}$ is such that ${g}_{0}\left(1|w\right)$ is very small for individuals with covariate w who are treated at $\mathrm{\kappa }=0.9$ but not at $\mathrm{\kappa }=0.1$. This observation is not meant as a criticism of the estimation scheme that we have presented, because we assume that $\mathrm{\kappa }$ will be chosen to reflect real resource constraints, rather than to maximize the power for a test ${{H}_{0}}^{\prime }:\mathrm{\Psi }\left({P}_{0}\right)={\mathrm{\mu }}^{\prime }$ versus ${{H}_{1}}^{\prime }:\mathrm{\Psi }\left({P}_{0}\right)>{\mathrm{\mu }}^{\prime }$ for some fixed ${\mathrm{\mu }}^{\prime }$.

We also implemented an estimating equation based estimator for the optimal R-C value and found that the two methods performed similarly. We would recommend using the TMLE in practice because it has been shown to be robust to near positivity violations in a wide variety of settings [27]. We note that ${g}_{0}\left(1|w\right)=1/2$ for all w in both of our simulations, so no near positivity violations occurred. We do not consider the estimating equation approach any further here because the focus of this work is on the optimization problem (2), rather than on comparing different estimation frameworks.

## 8 Discussion and future work

We have considered the problem of estimating the optimal resource-constrained value. Under causal assumptions, this parameter can be identified with the maximum attainable population mean outcome under individualized treatment rules which rely on some summary of measured covariates, subject to the constraint that a maximum proportion $\mathrm{\kappa }$ of the population can be treated. We also provided an explicit expression for an optimal stochastic rule under the resource constraint.

We derived the canonical gradient of the optimal R-C value under the key assumption that the treatment effect is not exactly equal to ${\mathrm{\tau }}_{0}$ on a stratum of covariates that occurs with positive probability. The canonical gradient plays a key role in developing asymptotically linear estimators. We found that, when the resource constraint is active, i.e. when ${\mathrm{\tau }}_{0}>0$, the canonical gradient of the optimal R-C value has an additional component compared to the canonical gradient of the optimal unconstrained value.

We presented a targeted minimum loss-based estimator for the optimal R-C value. This estimator was designed to solve the empirical mean of an estimate of the canonical gradient. This quickly yielded conditions under which our estimator is RAL, and efficient among all such RAL estimators. All of these results rely on the condition that the treatment effect is not exactly equal to ${\mathrm{\tau }}_{0}$ for positive probability strata of covariates. This assumption is more plausible than the typical non-exceptional law assumption when the covariates are continuous and the constraint is active, because it may be unlikely that the treatment effect concentrates on an arbitrary (determined by $\mathrm{\kappa }$) value ${\mathrm{\tau }}_{0}>0$. We note that this pseudo-non-exceptional law assumption implies that the optimal stochastic rule is almost surely equal to the optimal deterministic rule. Though we have not presented formal theorems here, it is not difficult to derive conditions under which our estimator of the optimal value under a R-C stochastic rule is root-n consistent even when the treatment effect is equal to ${\mathrm{\tau }}_{0}$ with positive probability, though the bias will be non-negligible (it converges to zero at the same root-n rate as the standard error). One could use an analogue of the variance-stabilized online estimator presented in Luedtke and van der Laan [13] to obtain inference for the optimal R-C value in this setting.

Our simulations confirmed our theoretical findings. We found that coverage improved with sample size, with near-nominal coverage at the largest sample size considered. This is not surprising given that most of our analytic results were asymptotic, though we note that the method also performed well at the smaller sample sizes considered. The confidence intervals were informatively tight when one considered the difference between the optimal R-C value and the value under no treatment. Further simulations are needed to fully understand the behavior of this method in practice.

Some resource constraints encountered in practice may not be of the form ${E}_{{P}_{U}×{P}_{0}}\left[d\left(U,V\right)\right]\le \mathrm{\kappa }$. For example, the cost of distributing the treatment to people may vary based on the values of the covariates. For simplicity assume $V=W$. If $c:{W}\to \left[0,\mathrm{\infty }\right)$ is a cost function, then this constraint may take the form ${E}_{{P}_{U}×{P}_{0}}\left[c\left(W\right)d\left(U,W\right)\right]\le \mathrm{\kappa }$. If ${\mathrm{\tau }}_{0}=0$, then an optimal stochastic rule under such a constraint takes the form $\left(u,w\right)↦I\left({\stackrel{ˉ}{Q}}_{b,0}\left(w\right)>0\right)$. If ${\mathrm{\tau }}_{0}>0$, then an optimal stochastic rule under such a constraint takes the form $\left(u,w\right)↦I\left({\stackrel{ˉ}{Q}}_{b,0}\left(w\right)>{\mathrm{\tau }}_{0}c\left(w\right)\right)$ for w for which ${\stackrel{ˉ}{Q}}_{b,0}\left(w\right)\ne {\mathrm{\tau }}_{0}c\left(w\right)$ or $c\left(w\right)=0$, and randomly distributes the remaining resources uniformly among all remaining w. We leave further consideration of this more general resource constraint problem to future work.
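The deterministic part of this cost-constrained rule can be sketched as follows (illustrative names, not from the paper; the uniform randomization over the boundary set where the contrast exactly equals the scaled cost is omitted):

```python
import numpy as np

def cost_constrained_rule(Qb_w, cost_w, tau0):
    """Treat w when the treatment-effect contrast exceeds the scaled cost.

    Qb_w  : array of Qbar_{b,0}(w) values
    cost_w: array of costs c(w) >= 0
    tau0  : Lagrange multiplier (tau0 = 0 means the constraint is inactive)
    The randomization among boundary points {w : Qb_w == tau0 * c(w)} described
    above is omitted from this sketch.
    """
    return (Qb_w > tau0 * cost_w).astype(int)
```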

In this work our primary focus has been on estimating the optimal value under a resource constraint, rather than the optimal rule under a resource constraint. Nonetheless, our estimation procedure yields an estimate ${d}_{n}$ of the optimal R-C rule. It would be interesting to analyze ${d}_{n}$ further in future work, both to understand how well this estimator performs and to determine whether there are better estimators that more directly frame the estimation challenge as a (weighted) classification problem [28, 29]. Note that ${d}_{n}$ is not guaranteed to satisfy the constraint, i.e. it is quite possible that ${E}_{{P}_{U}×{P}_{0}}\left[{d}_{n}\left(U,V\right)\right]>\mathrm{\kappa }$, though concentration inequalities suggest that one can give conditions under which ${E}_{{P}_{U}×{P}_{0}}\left[{d}_{n}\left(U,V\right)\right]-\mathrm{\kappa }$ is small with probability approaching 1. One could also seek an optimal rule estimate ${d}_{n}^{\prime }$ which satisfies, with probability at least $1-\mathrm{\delta }$ for some user-defined $\mathrm{\delta }>0$, ${E}_{{P}_{U}×{P}_{0}}\left[{d}_{n}^{\prime }\left(U,V\right)\right]\le \mathrm{\kappa }$.
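To illustrate the kind of concentration-based correction hinted at above, the sketch below shrinks the budget by a Hoeffding-style margin before estimating the rule; the `conservative_kappa` helper and the $\sqrt{log\left(1\mathit{/}\mathrm{\delta }\right)\mathit{/}\left(2n\right)}$ rate are illustrative assumptions, not results from the paper.

```python
import math

def conservative_kappa(kappa, n, delta):
    """Heuristic Hoeffding-style shrinkage: if the realized treated
    proportion concentrates around its mean at rate
    sqrt(log(1/delta) / (2n)), then targeting the shrunken budget kappa_n
    keeps the true constraint satisfied with probability >= 1 - delta."""
    return max(kappa - math.sqrt(math.log(1.0 / delta) / (2.0 * n)), 0.0)

# e.g. with n = 10,000 observations and delta = 0.05, a budget of
# kappa = 0.25 shrinks only slightly:
print(round(conservative_kappa(0.25, 10_000, 0.05), 4))  # → 0.2378
```

The margin vanishes at rate ${n}^{-1\mathit{/}2}$, so the cost of the guarantee becomes negligible in large samples; in very small samples the shrunken budget can hit zero, reflecting that no nontrivial high-confidence guarantee is available.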

Further work is needed to generalize this work to the multiple time point setting. Before generalizing the procedure, one must know exactly what form the multiple time point constraint takes. For example, it may be the case that only a $\mathrm{\kappa }$ proportion of the population can be treated at each time point, or it may be the case that treatment can only be administered at a $\mathrm{\kappa }$ proportion of patient-time point pairs. Regardless of which constraint one chooses, it seems that the nice recursive structure encountered in Q-learning may not hold for multiple time point R-C problems. While useful for computational considerations, being able to express the optimal rule using approximate dynamic programming is not necessary for the existence of a good optimal rule estimator, especially when the number of time points is small. If the computational complexity of the procedure is a major concern, it may be beneficial to frame the multiple time point learning problem as a single optimization problem, using smooth surrogates for indicator functions as Zhao et al. [30] do when they introduce simultaneous outcome weighted learning (SOWL). One would then need to appropriately account for the fact that the empirical resource constraint may only be approximately satisfied.

We have not considered the ethical issues associated with allocating limited resources to a population. The debate over the appropriate means of distributing limited treatment resources to a population is ongoing (see, e.g., Brock and Wikler [31], Macklin and Cowan [32], and Singh [33] for examples in the treatment of HIV/AIDS). Clearly any investigator needs to consider the ethical issues associated with a given resource allocation scheme. Our method is optimal in a particular utilitarian sense (maximizing the population mean of an outcome of interest) and yields a treatment strategy which treats the individuals who are expected to benefit most from treatment in terms of that outcome. One must be careful to ensure that the outcome of interest truly captures the most important public health implications. Unlike in unconstrained individualized medicine, inappropriately prescribing treatment to a stratum also has implications for individuals outside of that stratum, namely for the individuals who do not receive treatment due to its lack of availability. We leave further ethical considerations to experts on the matter. It will be interesting to see whether there are settings in which one can transform the outcome or add constraints to the optimization problem so that the statistical problem considered in this paper adheres to the ethical guidelines of those settings.

We have sought to generalize previous works on estimating the value of an optimal individualized treatment regime to the case where treatment is a limited resource, i.e. where it is not possible to treat the entire population. This work should allow for the application of optimal personalized treatment strategies to many new problems of interest.

## References

• 1. Murphy SA. Optimal dynamic treatment regimes. J Roy Stat Soc Ser B 2003;65:331–66.

• 2. Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P, editors. Proceedings of the Second Seattle Symposium in Biostatistics, vol. 179, 2004:189–326.

• 3. Chakraborty B, Moodie EE. Statistical methods for dynamic treatment regimes. Berlin, Heidelberg, New York: Springer, 2013.

• 4. Arjas E, Jennison C, Dawid AP, Cox DR, Senn S, Cowell RG, et al. Optimal dynamic treatment regimes – discussion on the paper by Murphy. J Roy Stat Soc Ser B 2003;65:355–66.

• 5. Lasry A, Sansom SL, Hicks KA, Uzunangelov V. A model for allocating CDC's HIV prevention resources in the United States. Health Care Manag Sci 2011;14:115–24.

• 6. Tao G, Zhao K, Gift T, Qiu F, Chen G. Using a resource allocation model to guide better local sexually transmitted disease control and prevention programs. Oper Res Health Care 2012;1:23–9.

• 7. Karp RM. Reducibility among combinatorial problems. New York, Berlin, Heidelberg: Springer, 1972.

• 8. Korte B, Vygen J. Combinatorial optimization, 5th ed. Berlin, Heidelberg, New York: Springer, 2012.

• 9. Dantzig GB. Discrete-variable extremum problems. Oper Res 1957;5:266–88.

• 10. Zhang B, Tsiatis A, Davidian M, Zhang M, Laber E. A robust method for estimating optimal treatment regimes. Biometrics 2012;68:1010–18.

• 11. van der Laan MJ, Luedtke AR. Targeted learning of an optimal dynamic treatment, and statistical inference for its mean outcome. Technical Report 329. Available at: http://www.bepress.com/ucbbiostat/, Division of Biostatistics, University of California, Berkeley, 2014a.

• 12. Goldberg Y, Song R, Zeng D, Kosorok MR. Comment on dynamic treatment regimes: technical challenges and applications. Electron J Stat 2014;8:1290–300.

• 13. Luedtke AR, van der Laan MJ. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Technical Report 332. Available at: http://biostats.bepress.com/ucbbiostat/paper332/, Division of Biostatistics, University of California, Berkeley, submitted to Annals of Statistics, 2014b.

• 14. Chakraborty B, Laber EB, Zhao Y-Q. Inference about the expected performance of a data-driven dynamic treatment regime. Clin Trials 2014;11:408–17.

• 15. Pearl J. Causality: models, reasoning and inference, 2nd ed. New York: Cambridge University Press, 2009.

• 16. Robins JM. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Comput Math Appl 1987;14:139S–161S.

• 17. Díaz I, van der Laan MJ. Population intervention causal effects based on stochastic interventions. Biometrics 2012;68:541–9.

• 18. Luedtke AR, van der Laan MJ. Super-learning of an optimal dynamic treatment rule. Technical Report 326. Available at: http://www.bepress.com/ucbbiostat/, Division of Biostatistics, University of California, Berkeley, under review at JCI, 2014a.

• 19. van der Laan MJ, Polley E, Hubbard A. Super learner. Stat Appl Genet Mol Biol 2007;6: Article 25.

• 20. van der Laan MJ, Luedtke AR. Targeted learning of the mean outcome under an optimal dynamic treatment rule. J Causal Inference 2014b (ahead of print).

• 21. van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York, Berlin, Heidelberg: Springer, 2003.

• 22. Pfanzagl J. Estimation in semiparametric models. Berlin, Heidelberg, New York: Springer, 1990.

• 23. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Baltimore: Johns Hopkins University Press, 1993.

• 24. Audibert JY, Tsybakov AB. Fast learning rates for plug-in classifiers. Ann Statist 2007;35:608–33.

• 25. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Berlin, Heidelberg, New York: Springer, 1996.

• 26. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2014. Available at: http://www.r-project.org/.

• 27. van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer, 2011.

• 28. Rubin DB, van der Laan MJ. Statistical issues and limitations in personalized medicine research with clinical trials. Int J Biostat 2012;8: Article 18.

• 29. Zhao Y, Zeng D, Rush A, Kosorok M. Estimating individual treatment rules using outcome weighted learning. J Am Stat Assoc 2012;107:1106–18.

• 30. Zhao YQ, Zeng D, Laber EB, Kosorok MR. New statistical learning methods for estimating optimal dynamic treatment regimes. J Am Stat Assoc 2014.

• 31. Brock DW, Wikler D. Ethical challenges in long-term funding for HIV/AIDS. Health Aff 2009;28:1666–76.

• 32. Macklin R, Cowan E. Given financial constraints, it would be unethical to divert antiretroviral drugs from treatment to prevention. Health Aff 2012;31:1537–44.

• 33. Singh JA. Antiretroviral resource allocation for HIV prevention. AIDS 2013;27:863–5.

## Proofs for Section 2

We first state a simple lemma.

Lemma 6. For a distribution P and a stochastic rule d, we have the following representation for ${\mathrm{\Psi }}_{d}$: ${\mathrm{\Psi }}_{d}\left(P\right)\stackrel{\mathrm{\Delta }}{=}{E}_{{P}_{U}×P}\left[d\left(U,V\right){\stackrel{ˉ}{Q}}_{b,P}\left(V\right)\right]+{E}_{P}\left[{\stackrel{ˉ}{Q}}_{P}\left(0,W\right)\right].$

Proof of Lemma 6. We have that ${\mathrm{\Psi }}_{d}\left(P\right)={E}_{{P}_{U}×P}\left[d\left(U,V\right){\stackrel{ˉ}{Q}}_{P}\left(1,W\right)\right]+{E}_{{P}_{U}×P}\left[\left(1-d\left(U,V\right)\right){\stackrel{ˉ}{Q}}_{P}\left(0,W\right)\right]$ $={E}_{{P}_{U}×P}\left[d\left(U,V\right)\left({\stackrel{ˉ}{Q}}_{P}\left(1,W\right)-{\stackrel{ˉ}{Q}}_{P}\left(0,W\right)\right)\right]+{E}_{P}\left[{\stackrel{ˉ}{Q}}_{P}\left(0,W\right)\right]$ $={E}_{{P}_{U}×P}\left[d\left(U,V\right){\stackrel{ˉ}{Q}}_{b,P}\left(V\right)\right]+{E}_{P}\left[{\stackrel{ˉ}{Q}}_{P}\left(0,W\right)\right],$where the final equality holds by the law of total expectation.□
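Because this representation rests on a per-observation algebraic identity, $d\phantom{\rule{thinmathspace}{0ex}}{\stackrel{ˉ}{Q}}_{P}\left(1,W\right)+\left(1-d\right){\stackrel{ˉ}{Q}}_{P}\left(0,W\right)=d\phantom{\rule{thinmathspace}{0ex}}{\stackrel{ˉ}{Q}}_{b,P}\left(V\right)+{\stackrel{ˉ}{Q}}_{P}\left(0,W\right)$, it can be checked numerically. The Python sketch below uses hypothetical outcome regressions and a simulated stochastic rule, purely as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
W = rng.normal(size=n)
Qbar1 = 1.0 / (1.0 + np.exp(-W))          # hypothetical E[Y | A=1, W]
Qbar0 = 1.0 / (1.0 + np.exp(W - 1.0))     # hypothetical E[Y | A=0, W]
d = rng.binomial(1, 0.3 + 0.4 * (W > 0))  # draws from a stochastic rule d(U, W)

# direct form of Psi_d(P)
lhs = np.mean(d * Qbar1 + (1 - d) * Qbar0)
# representation from the lemma: E[d * Qbar_b] + E[Qbar(0, W)]
rhs = np.mean(d * (Qbar1 - Qbar0)) + np.mean(Qbar0)
assert abs(lhs - rhs) < 1e-9
```

The two means agree to machine precision because the identity holds draw by draw, not merely in expectation.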

Proof of Theorem 1. This result will be a consequence of Theorem 2. If $P{r}_{P}\left({\stackrel{ˉ}{Q}}_{b,0}\left(V\right)={\mathrm{\tau }}_{P}\right)=0$, then ${d}_{P}\left(U,V\right)$ is ${P}_{U}×P$ almost surely equal to ${\stackrel{˜}{d}}_{P}\left(V\right)$, and thus ${\stackrel{˜}{\mathrm{\Psi }}}_{{\stackrel{˜}{d}}_{P}}\left(P\right)={\mathrm{\Psi }}_{{d}_{P}}\left(P\right)$. Thus $\left(u,v\right)↦{\stackrel{˜}{d}}_{P}\left(v\right)$ is an optimal stochastic regime. Because the class of deterministic regimes is a subset of the class of stochastic regimes, ${\stackrel{˜}{d}}_{P}$ is an optimal deterministic regime.□

Proof of Theorem 2. Let d be some stochastic treatment rule which satisfies the resource constraint. For $\left(b,c\right)\in \left\{0,1{\right\}}^{2}$, define ${B}_{bc}\stackrel{\mathrm{\Delta }}{=}\left\{\left(u,v\right):{d}_{P}\left(u,v\right)=b,d\left(u,v\right)=c\right\}$. Note that ${\mathrm{\Psi }}_{{d}_{P}}\left(P\right)-{\mathrm{\Psi }}_{d}\left(P\right)={E}_{{P}_{U}×P}\left[\left({d}_{P}\left(U,V\right)-d\left(U,V\right)\right){\stackrel{ˉ}{Q}}_{b,0}\left(V\right)\right]$ $={E}_{{P}_{U}×P}\left[{\stackrel{ˉ}{Q}}_{b,0}\left(V\right)I\left(\left(U,V\right)\in {B}_{10}\right)\right]-{E}_{{P}_{U}×P}\left[{\stackrel{ˉ}{Q}}_{b,0}\left(V\right)I\left(\left(U,V\right)\in {B}_{01}\right)\right]$(6)

The ${\stackrel{ˉ}{Q}}_{b,0}\left(V\right)$ in the first term in eq. (6) can be lower bounded by ${\mathrm{\tau }}_{P}$, and in the second term can be upper bounded by ${\mathrm{\tau }}_{P}$. Thus, ${\mathrm{\Psi }}_{{d}_{P}}\left(P\right)-{\mathrm{\Psi }}_{d}\left(P\right)\ge {\mathrm{\tau }}_{P}\left[P{r}_{{P}_{U}×P}\left(\left(U,V\right)\in {B}_{10}\right)-P{r}_{{P}_{U}×P}\left(\left(U,V\right)\in {B}_{01}\right)\right]$ $={\mathrm{\tau }}_{P}\left[P{r}_{{P}_{U}×P}\left(\left(U,V\right)\in {B}_{10}\cup {B}_{11}\right)-P{r}_{{P}_{U}×P}\left(\left(U,V\right)\in {B}_{01}\cup {B}_{11}\right)\right]$ $={\tau }_{P}\left({E}_{{P}_{U}×P}\left[{d}_{P}\left(U,V\right)\right]-{E}_{{P}_{U}×P}\left[d\left(U,V\right)\right]\right).$

If ${\mathrm{\tau }}_{P}=0$ then the final line is zero. Otherwise, ${E}_{{P}_{U}×P}\left[{d}_{P}\left(U,V\right)\right]=\mathrm{\kappa }$ by eq. (4). Because d satisfies the resource constraint, ${E}_{{P}_{U}×P}\left[d\left(U,V\right)\right]\le \mathrm{\kappa }$ and thus the final line above is at least zero. Thus ${\mathrm{\Psi }}_{{d}_{P}}\left(P\right)-{\mathrm{\Psi }}_{d}\left(P\right)\ge 0$ for all ${\mathrm{\tau }}_{P}$. Because d was arbitrary, ${d}_{P}$ is an optimal stochastic rule.□
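The optimality argument above can also be illustrated numerically: among rules that respect the budget, thresholding the treatment effect at the empirical analogue of ${\mathrm{\tau }}_{P}$ dominates any other constrained rule. The sketch below compares the thresholded rule against a competitor that treats a random $\mathrm{\kappa }$-fraction; all quantities are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
blip = rng.normal(0.1, 1.0, size=n)   # hypothetical treatment effects Qbar_{b,0}(V)
kappa = 0.3                           # budget: treat at most 30% of the population

# empirical analogue of tau_P: threshold at the (1 - kappa) quantile, floored at 0
tau = max(float(np.quantile(blip, 1 - kappa)), 0.0)
d_opt = (blip > tau).astype(float)

# a competitor that also respects the budget: treat a random kappa-fraction
d_alt = (rng.uniform(size=n) < kappa).astype(float)

def value_gain(d):
    """Mean outcome gain over never treating, E[d * Qbar_b]."""
    return float(np.mean(d * blip))

assert d_opt.mean() <= kappa + 1e-3
assert value_gain(d_opt) >= value_gain(d_alt)
```

Both rules spend the same budget, but the thresholded rule concentrates treatment on the largest effects, mirroring the bound ${\mathrm{\Psi }}_{{d}_{P}}\left(P\right)-{\mathrm{\Psi }}_{d}\left(P\right)\ge 0$ derived above.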

## Proofs for Section 4

Proof of Theorem 3. The pathwise derivative of $\mathrm{\Psi }\left(Q\right)$ is defined as $\frac{d}{d\mathrm{\epsilon }}\mathrm{\Psi }\left(Q\left(\mathrm{\epsilon }\right)\right){|}_{\mathrm{\epsilon }=0}$ along paths $\left\{{P}_{\mathrm{\epsilon }}:\mathrm{\epsilon }\right\}\subset {M}$. In particular, these paths are chosen so that $d{Q}_{W,\mathrm{\epsilon }}=\left(1+\mathrm{\epsilon }{H}_{W}\left(W\right)\right)d{Q}_{W}$, where $E{H}_{W}\left(W\right)=0$ and ${C}_{W}\stackrel{\mathrm{\Delta }}{=}\underset{w}{sup}|{H}_{W}\left(w\right)|<\mathrm{\infty }$; and $d{Q}_{Y,\mathrm{\epsilon }}\left(Y|A,W\right)=\left(1+\mathrm{\epsilon }{H}_{Y}\left(Y|A,W\right)\right)d{Q}_{Y}\left(Y|A,W\right)$, where $E\left({H}_{Y}|A,W\right)=0$ and $\underset{w,a,y}{sup}|{H}_{Y}\left(y|a,w\right)|<\mathrm{\infty }$.

The parameter $\mathrm{\Psi }$ is not sensitive to fluctuations of ${g}_{0}\left(a|w\right)=P{r}_{0}\left(a|w\right)$, and thus we do not need to fluctuate this portion of the likelihood. Let ${\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{\stackrel{ˉ}{Q}}_{b,{P}_{\mathrm{\epsilon }}}$, ${\stackrel{ˉ}{Q}}_{\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{\stackrel{ˉ}{Q}}_{{P}_{\mathrm{\epsilon }}}$, ${d}_{\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{d}_{{P}_{\mathrm{\epsilon }}}$, ${\mathrm{\eta }}_{\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{\mathrm{\eta }}_{{P}_{\mathrm{\epsilon }}}$, ${\mathrm{\tau }}_{\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{\mathrm{\tau }}_{{P}_{\mathrm{\epsilon }}}$, and ${S}_{\mathrm{\epsilon }}\stackrel{\mathrm{\Delta }}{=}{S}_{{P}_{\mathrm{\epsilon }}}$. First note that ${\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\left(v\right)={\stackrel{ˉ}{Q}}_{b,0}\left(v\right)+\mathrm{\epsilon }{h}_{\mathrm{\epsilon }}\left(v\right)$(7)for an ${h}_{\mathrm{\epsilon }}$ with $\underset{|\mathrm{\epsilon }|<1}{sup}\underset{v}{sup}|{h}_{\mathrm{\epsilon }}\left(v\right)|\stackrel{\mathrm{\Delta }}{=}{C}_{1}<\mathrm{\infty }.$(8)Note that C4) implies that ${d}_{0}$ is (almost surely) deterministic, i.e. ${d}_{0}\left(U,\cdot \right)$ is almost surely a fixed function. Let ${\stackrel{˜}{d}}_{0}$ represent the deterministic rule $v↦I\left({\stackrel{ˉ}{Q}}_{b,0}\left(v\right)>{\mathrm{\tau }}_{0}\right)$ to which ${d}_{0}\left(u,\cdot \right)$ is (almost surely) equal for all u. 
By Lemma 1, $\begin{array}{l}\Psi \left({P}_{\epsilon }\right)-\Psi \left({P}_{0}\right)={\int }_{w}\left({E}_{{P}_{U}}\left[{d}_{\epsilon }\left(U,V\right)\right]-{\stackrel{˜}{d}}_{0}\left(V\right)\right){\overline{Q}}_{b,\epsilon }d{Q}_{W,\epsilon }\\ +{\int }_{w}{\stackrel{˜}{d}}_{0}\left(V\right)\left({\overline{Q}}_{b,\epsilon }d{Q}_{W,\epsilon }-{\overline{Q}}_{b,0}d{Q}_{W,0}\right)\\ +{E}_{{P}_{\epsilon }}{\overline{Q}}_{\epsilon }\left(0,W\right)-{E}_{{P}_{0}}{\overline{Q}}_{0}\left(0,W\right)\\ ={\int }_{w}\left({E}_{{P}_{U}}\left[{d}_{\epsilon }\left(U,V\right)\right]-{\stackrel{˜}{d}}_{0}\left(V\right)\right)\left({\overline{Q}}_{b,\epsilon }-{\tau }_{0}\right)d{Q}_{W,\epsilon }\\ +{\tau }_{0}{\int }_{w}\left({E}_{{P}_{U}}\left[{d}_{\epsilon }\left(U,V\right)\right]d{Q}_{W,\epsilon }-{\stackrel{˜}{d}}_{0}\left(V\right)d{Q}_{W,0}\right)\\ -{\tau }_{0}{\int }_{w}{\stackrel{˜}{d}}_{0}\left(V\right)\left(d{Q}_{W,\epsilon }-d{Q}_{W,0}\right)\\ +{\Psi }_{{d}_{0}}\left({P}_{\epsilon }\right)-{\Psi }_{{d}_{0}}\left({P}_{0}\right).\end{array}$(9)Dividing the fourth term by $\mathrm{\epsilon }$ and taking the limit as $\mathrm{\epsilon }\to 0$ gives the pathwise derivative of the mean outcome under the rule that treats ${d}_{0}$ as known. The third term can be written as $-\mathrm{\epsilon }{\mathrm{\tau }}_{0}{\int }_{w}{\stackrel{˜}{d}}_{0}\left(V\right){H}_{W}d{Q}_{W,0}$, and thus the pathwise derivative of this term is $-{\int }_{w}{\mathrm{\tau }}_{0}{\stackrel{˜}{d}}_{0}\left(V\right){H}_{W}d{Q}_{W,0}$. If ${\mathrm{\tau }}_{0}>0$, then ${E}_{{P}_{U}×{P}_{0}}\left[{\stackrel{˜}{d}}_{0}\left(V\right)\right]=\mathrm{\kappa }$. The pathwise derivative of this term is zero if ${\mathrm{\tau }}_{0}=0$. 
Thus, for all ${\mathrm{\tau }}_{0}$, $\underset{\mathrm{\epsilon }\to 0}{lim}-\frac{1}{\mathrm{\epsilon }}{\mathrm{\tau }}_{0}{\int }_{w}{\stackrel{˜}{d}}_{0}\left(V\right)\left(d{Q}_{W,\mathrm{\epsilon }}-d{Q}_{W,0}\right)={\int }_{w}\left(-{\mathrm{\tau }}_{0}\left({\stackrel{˜}{d}}_{0}\left(v\right)-\mathrm{\kappa }\right)\right){H}_{W}\left(w\right)d{Q}_{W,0}\left(w\right).$Thus the third term in eq. (9) generates the $v\phantom{\rule{thinmathspace}{0ex}}↦-{\mathrm{\tau }}_{0}\left({\stackrel{˜}{d}}_{0}\left(v\right)-\mathrm{\kappa }\right)$ portion of the canonical gradient, or equivalently $v\phantom{\rule{thinmathspace}{0ex}}↦-{\mathrm{\tau }}_{0}\left({E}_{{P}_{U}}\left[{d}_{0}\left(U,v\right)\right]-\mathrm{\kappa }\right)$. The remainder of this proof is used to show that the first two terms in eq. (9) are $o\left(\mathrm{\epsilon }\right)$.

Step 1: ${\mathrm{\eta }}_{\mathrm{\epsilon }}\to {\mathrm{\eta }}_{0}$.

We refer the reader to eq. (3) for the definition of the quantile $P↦{\mathrm{\eta }}_{P}$. The convergence ${\mathrm{\eta }}_{\mathrm{\epsilon }}\to {\mathrm{\eta }}_{0}$ is a consequence of the continuity of ${S}_{0}$ in a neighborhood of ${\mathrm{\eta }}_{0}$. For $\mathrm{\gamma }>0$, $|{\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}|>\mathrm{\gamma }$ implies that ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)\le \mathrm{\kappa }$ or ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)>\mathrm{\kappa }.$(10)

For positive constants ${C}_{1}$ and ${C}_{W}$, ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)\ge \left(1-{C}_{W}|\mathrm{\epsilon }|\right)P{r}_{0}\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}>{\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)\ge \left(1-{C}_{W}|\mathrm{\epsilon }|\right){S}_{0}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }+{C}_{1}|\mathrm{\epsilon }|\right).$

Fix $\mathrm{\gamma }>0$ small enough so that ${S}_{0}$ is continuous at ${\mathrm{\eta }}_{0}-\mathrm{\gamma }$. In this case we have that ${S}_{0}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }+{C}_{1}|\mathrm{\epsilon }|\right)\to {S}_{0}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)$ as $\mathrm{\epsilon }\to 0$. By the infimum in the definition of ${\mathrm{\eta }}_{0}$, we know that ${S}_{0}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)>\mathrm{\kappa }$. Thus ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)>\mathrm{\kappa }$ for all $|\mathrm{\epsilon }|$ small enough.

Similarly, ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)\le \left(1+{C}_{W}|\mathrm{\epsilon }|\right)\text{\hspace{0.17em}}{S}_{0}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }-{C}_{1}|\mathrm{\epsilon }|\right)$. Fix $\mathrm{\gamma }>0$ small enough so that ${S}_{0}$ is continuous at ${\mathrm{\eta }}_{0}+\mathrm{\gamma }$. Then ${S}_{0}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }-{C}_{1}|\mathrm{\epsilon }|\right)\to {S}_{0}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)$ as $\mathrm{\epsilon }\to 0$. Condition C2) implies the uniqueness of the $\mathrm{\kappa }$-quantile of ${\stackrel{ˉ}{Q}}_{b,0}$, and thus that ${S}_{0}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)<\mathrm{\kappa }$. It follows that ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)<\mathrm{\kappa }$ for all $|\mathrm{\epsilon }|$ small enough.

Combining ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}-\mathrm{\gamma }\right)>\mathrm{\kappa }$ and ${S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{0}+\mathrm{\gamma }\right)<\mathrm{\kappa }$ for all $\mathrm{\epsilon }$ close to zero with eq. (10) shows that ${\mathrm{\eta }}_{\mathrm{\epsilon }}\to {\mathrm{\eta }}_{0}$ as $\mathrm{\epsilon }\to 0$.

Step 2: Second term of eq. (9) is 0 eventually.

If ${\mathrm{\tau }}_{0}=0$ then the result is immediate, so suppose ${\mathrm{\tau }}_{0}>0$. By the previous step, ${\mathrm{\eta }}_{\mathrm{\epsilon }}\to {\mathrm{\eta }}_{0}$, which implies that ${\mathrm{\tau }}_{\mathrm{\epsilon }}\to {\mathrm{\tau }}_{0}>0$ by the continuity of the max function. It follows that ${\mathrm{\tau }}_{\mathrm{\epsilon }}>0$ for all $|\mathrm{\epsilon }|$ small enough. By eq. (4), $P{r}_{{P}_{U}×{P}_{\mathrm{\epsilon }}}\left({d}_{\mathrm{\epsilon }}\left(U,V\right)=1\right)=\mathrm{\kappa }$ for all sufficiently small $|\mathrm{\epsilon }|$, and $P{r}_{0}\left({\stackrel{˜}{d}}_{0}\left(V\right)=1\right)=\mathrm{\kappa }$. Thus the second term of eq. (9) is 0 for all $|\mathrm{\epsilon }|$ small enough.

Step 3: ${\mathrm{\tau }}_{\mathrm{\epsilon }}-{\mathrm{\tau }}_{0}=O\left(\mathrm{\epsilon }\right)$.

Note that $\mathrm{\kappa }<{S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-|\mathrm{\epsilon }|\right)\le \left(1+{C}_{W}|\mathrm{\epsilon }|\right){S}_{0}\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-\left(1+{C}_{1}\right)|\mathrm{\epsilon }|\right)$. A Taylor expansion of ${S}_{0}$ about ${\mathrm{\eta }}_{0}$ shows that $\mathrm{\kappa }<\left(1+{C}_{W}|\mathrm{\epsilon }|\right)\left({S}_{0}\left({\mathrm{\eta }}_{0}\right)+\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}-\left(1+{C}_{1}\right)|\mathrm{\epsilon }|\right)\left(-{f}_{0}\left({\mathrm{\eta }}_{0}\right)+o\left(1\right)\right)\right)$ $=\mathrm{\kappa }+\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}-\left(1+{C}_{1}\right)|\mathrm{\epsilon }|\right)\left(-{f}_{0}\left({\mathrm{\eta }}_{0}\right)+o\left(1\right)\right)+O\left(\mathrm{\epsilon }\right)$ $=\mathrm{\kappa }-\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}\right){f}_{0}\left({\mathrm{\eta }}_{0}\right)+o\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}\right)+O\left(\mathrm{\epsilon }\right).$(11)The fact that ${f}_{0}\left({\mathrm{\eta }}_{0}\right)\in \left(0,\mathrm{\infty }\right)$ shows that ${\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}$ is bounded above by some $O\left(\mathrm{\epsilon }\right)$ sequence. Similarly, $\mathrm{\kappa }\ge {S}_{\mathrm{\epsilon }}\left({\mathrm{\eta }}_{\mathrm{\epsilon }}+|\mathrm{\epsilon }|\right)\ge \left(1-{C}_{W}|\mathrm{\epsilon }|\right){S}_{0}\left({\mathrm{\eta }}_{\mathrm{\epsilon }}+\left(1+{C}_{1}\right)|\mathrm{\epsilon }|\right)$. 
Hence, $\mathrm{\kappa }\ge \left(1-{C}_{W}|\mathrm{\epsilon }|\right)\left({S}_{0}\left({\mathrm{\eta }}_{0}\right)+\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}+\left(1+{C}_{1}\right)|\mathrm{\epsilon }|\right)\left(-{f}_{0}\left({\mathrm{\eta }}_{0}\right)+o\left(1\right)\right)\right)$ $=\mathrm{\kappa }-\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}\right){f}_{0}\left({\mathrm{\eta }}_{0}\right)+o\left({\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}\right)+O\left(\mathrm{\epsilon }\right).$

It follows that ${\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}$ is bounded below by some $O\left(\mathrm{\epsilon }\right)$ sequence. Combining these two bounds shows that ${\mathrm{\eta }}_{\mathrm{\epsilon }}-{\mathrm{\eta }}_{0}=O\left(\mathrm{\epsilon }\right)$, which immediately implies that ${\mathrm{\tau }}_{\mathrm{\epsilon }}-{\mathrm{\tau }}_{0}=max\left\{O\left(\mathrm{\epsilon }\right),0\right\}=O\left(\mathrm{\epsilon }\right)$.

Step 4: First term of eq. (9) is $o\left(\mathrm{\epsilon }\right)$.

By eq. (7) and Step 3, we know that ${\stackrel{ˉ}{Q}}_{b,0}\left(V\right)-{\mathrm{\tau }}_{0}-O\left(\mathrm{\epsilon }\right)\le {\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\left(V\right)-{\mathrm{\tau }}_{\mathrm{\epsilon }}\le {\stackrel{ˉ}{Q}}_{b,0}\left(V\right)-{\mathrm{\tau }}_{0}+O\left(\mathrm{\epsilon }\right),$ where the two $O\left(\mathrm{\epsilon }\right)$ terms denote (possibly different) nonnegative $O\left(\mathrm{\epsilon }\right)$ sequences.

By C4), it follows that there exists some $\mathrm{\delta }>0$ such that ${sup}_{|\mathrm{\epsilon }|<\mathrm{\delta }}P{r}_{0}\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\left(V\right)={\mathrm{\tau }}_{\mathrm{\epsilon }}\right)=0$. By the absolute continuity of ${Q}_{W,\mathrm{\epsilon }}$ with respect to ${Q}_{W,0}$, ${sup}_{|\mathrm{\epsilon }|<\mathrm{\delta }}P{r}_{{P}_{\mathrm{\epsilon }}}\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\left(V\right)={\mathrm{\tau }}_{\mathrm{\epsilon }}\right)=0$. It follows that, for all small enough $|\mathrm{\epsilon }|$ and almost all u, ${d}_{\mathrm{\epsilon }}\left(u,v\right)=I\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}\left(v\right)>{\mathrm{\tau }}_{\mathrm{\epsilon }}\right)$. Hence, $\begin{array}{rl}{\int }_{w}& \left({E}_{{P}_{U}}\left[{d}_{\mathrm{\epsilon }}\left(U,V\right)\right]-{d}_{0}\left(V\right)\right)\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}-{\mathrm{\tau }}_{0}\right)d{Q}_{W,\mathrm{\epsilon }}\\ & =\left|{\int }_{w}\left(I\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}>{\mathrm{\tau }}_{\mathrm{\epsilon }}\right)-I\left({\stackrel{ˉ}{Q}}_{b,0}>{\mathrm{\tau }}_{0}\right)\right)\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}-{\mathrm{\tau }}_{0}\right)d{Q}_{W,\mathrm{\epsilon }}\right|\\ & \le {\int }_{w}\left|I\left({\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}>{\mathrm{\tau }}_{\mathrm{\epsilon }}\right)-I\left({\stackrel{ˉ}{Q}}_{b,0}>{\mathrm{\tau }}_{0}\right)\right|\left(\left|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right|+{C}_{1}|\mathrm{\epsilon }|\right)d{Q}_{W,\mathrm{\epsilon }}\\ & \le {\int }_{w}I\left(|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}|\le |{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}-{\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}+{\mathrm{\tau }}_{\mathrm{\epsilon }}|\right)\left(\left|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right|+{C}_{1}|\mathrm{\epsilon }|\right)d{Q}_{W,\mathrm{\epsilon }}\\ & ={\int }_{w}I\left(0<|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}|\le 
|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}-{\stackrel{ˉ}{Q}}_{b,\mathrm{\epsilon }}+{\mathrm{\tau }}_{\mathrm{\epsilon }}|\right)\left(\left|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}\right|+{C}_{1}|\mathrm{\epsilon }|\right)d{Q}_{W,\mathrm{\epsilon }}\\ & \le O\left(\mathrm{\epsilon }\right){\int }_{w}I\left(0<|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}|\le O\left(\mathrm{\epsilon }\right)\right)d{Q}_{W,\mathrm{\epsilon }}\\ & \le O\left(\mathrm{\epsilon }\right)\left(1+{C}_{W}|\mathrm{\epsilon }|\right)P{r}_{0}\left(0<|{\stackrel{ˉ}{Q}}_{b,0}-{\mathrm{\tau }}_{0}|\le O\left(\mathrm{\epsilon }\right)\right),\end{array}$ where the penultimate inequality holds by Step 3 and eq. (7). The last line above is $o\left(\mathrm{\epsilon }\right)$ because $Pr\left(0<|X|\le \mathrm{\delta }\right)\to 0$ as $\mathrm{\delta }\to 0$ for any random variable X. Thus dividing the left-hand side above by $\mathrm{\epsilon }$ and taking the limit as $\mathrm{\epsilon }\to 0$ yields zero.□

## Proofs for Section 5

We give the following lemma before proving Theorem 4.

Lemma 12. Let ${P}_{0}$ and P be distributions which satisfy the positivity assumption and for which Y is bounded in probability. Let d be some stochastic treatment rule and $\mathrm{\tau }$ be some real number. We have that ${\mathrm{\Psi }}_{d}\left(P\right)-\mathrm{\Psi }\left({P}_{0}\right)=-{E}_{{P}_{0}}\left[D\left(d,{\mathrm{\tau }}_{0},P\right)\left(O\right)\right]+{R}_{0}\left(d,P\right)$.

Proof of Lemma 12. Note that ${\mathrm{\Psi }}_{d}\left(P\right)-\mathrm{\Psi }\left({P}_{0}\right)+{E}_{{P}_{0}}\left[D\left(d,{\mathrm{\tau }}_{0},P\right)\left(O\right)\right]$ $=\phantom{\rule{thickmathspace}{0ex}}{\mathrm{\Psi }}_{d}\left(P\right)-{\mathrm{\Psi }}_{d}\left({P}_{0}\right)+\sum _{j=1}^{2}{E}_{{P}_{U}×{P}_{0}}\left[{D}_{j}\left(d\left(U,\cdot \right),P\right)\left(O\right)\right]$ $+{\mathrm{\Psi }}_{d}\left({P}_{0}\right)-{\mathrm{\Psi }}_{{d}_{0}}\left({P}_{0}\right)-{\mathrm{\tau }}_{0}{E}_{{P}_{U}×{P}_{0}}\left[d\left(U,V\right)-\mathrm{\kappa }\right].$

Standard calculations show that the first term on the right is equal to ${R}_{10}\left(d,P\right)$ [21]. If ${\mathrm{\tau }}_{0}>0$, then eq. (4) shows that ${\mathrm{\tau }}_{0}{E}_{{P}_{U}×{P}_{0}}\left[d-\mathrm{\kappa }\right]={\mathrm{\tau }}_{0}{E}_{{P}_{U}×{P}_{0}}\left[d-{d}_{0}\right]$. If ${\mathrm{\tau }}_{0}=0$, then obviously ${\mathrm{\tau }}_{0}{E}_{{P}_{U}×{P}_{0}}\left[d-\mathrm{\kappa }\right]={\mathrm{\tau }}_{0}{E}_{{P}_{U}×{P}_{0}}\left[d-{d}_{0}\right]$. Lemma 1 shows that ${\mathrm{\Psi }}_{d}\left({P}_{0}\right)-{\mathrm{\Psi }}_{{d}_{0}}\left({P}_{0}\right)={E}_{{P}_{U}×{P}_{0}}\left[\left(d-{d}_{0}\right){\stackrel{ˉ}{Q}}_{b,0}\right]$. Thus the second line above is equal to ${R}_{20}\left(d\right)$.□

Proof of Theorem 4. We make use of empirical process theory notation in this proof so that $Pf={E}_{P}\left[f\left(O\right)\right]$ for a distribution P and function f. We have that $\stackrel{ˆ}{\mathrm{\Psi }}\left({P}_{n}\right)-\mathrm{\Psi }\left({P}_{0}\right)$ $=-{P}_{0}D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)+{R}_{0}\left({d}_{n},{P}_{n}^{\ast }\right)$(12) $=\left({P}_{n}-{P}_{0}\right)D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)+{R}_{0}\left({d}_{n},{P}_{n}^{\ast }\right)+{o}_{{P}_{0}}\left({n}^{-1/2}\right)$(13) $=\left({P}_{n}-{P}_{0}\right){D}_{0}+\left({P}_{n}-{P}_{0}\right)\left(D\left({d}_{n},{\mathrm{\tau }}_{0},{P}_{n}^{\ast }\right)-{D}_{0}\right)+{R}_{0}\left({d}_{n},{P}_{n}^{\ast }\right).$The middle term on the last line is ${o}_{{P}_{0}}\left({n}^{-1/2}\right)$ by conditions 1), 2), 4), and 5) [25], and the third term is ${o}_{{P}_{0}}\left({n}^{-1/2}\right)$ by condition 3). This yields the asymptotic linearity result. Proposition 1 in Section 3.3 of Bickel et al. [23] yields the claim about regularity and asymptotic efficiency when conditions C2), C3), C4), and 1) hold (see Theorem 3).□

Proof of Lemma 5. We will show that ${\mathrm{\eta }}_{n}\to {\mathrm{\eta }}_{0}$ in probability, and then the consistency of ${\mathrm{\tau }}_{n}$ follows by the continuous mapping theorem. By C3), there exists an open interval N containing ${\mathrm{\eta }}_{0}$ on which ${S}_{0}$ is continuous. Fix $\mathrm{\eta }\in N$. Because ${\stackrel{ˉ}{Q}}_{b,n}$ belongs to a Glivenko-Cantelli class with probability approaching 1, we have that $\begin{array}{l}|{S}_{n}\left(\eta \right)-{S}_{0}\left(\eta \right)|=|{P}_{n}I\left({\overline{Q}}_{b,n}>\eta \right)-{P}_{0}I\left({\overline{Q}}_{b,0}>\eta \right)|\\ \le |{P}_{0}\left(I\left({\overline{Q}}_{b,n}>\eta \right)-I\left({\overline{Q}}_{b,0}>\eta \right)\right)|+|\left({P}_{n}-{P}_{0}\right)I\left({\overline{Q}}_{b,n}>\eta \right)|\\ \le \underset{\triangleq {T}_{n}\left(\eta \right)}{\underbrace{|{P}_{0}\left(I\left({\overline{Q}}_{b,n}>\eta \right)-I\left({\overline{Q}}_{b,0}>\eta \right)\right)|}}+{o}_{{P}_{0}}\left(1\right),\end{array}$(14)where we use the notation $Pf={E}_{P}\left[f\left(O\right)\right]$ for any distribution P and function f. Let ${Z}_{n}\left(\mathrm{\eta }\right)\left(w\right)\stackrel{\mathrm{\Delta }}{=}{\left(I\left({\stackrel{ˉ}{Q}}_{b,n}\left(w\right)>\mathrm{\eta }\right)-I\left({\stackrel{ˉ}{Q}}_{b,0}\left(w\right)>\mathrm{\eta }\right)\right)}^{2}$. 
The following display holds for all $q > 0$:
$$T_n(\eta) \le P_0 Z_n(\eta)$$
$$= P_0 Z_n(\eta) I(|\bar{Q}_{b,0} - \eta| > q) + P_0 Z_n(\eta) I(|\bar{Q}_{b,0} - \eta| \le q)$$
$$= P_0 Z_n(\eta) I(|\bar{Q}_{b,0} - \eta| > q) + P_0 Z_n(\eta) I(0 < |\bar{Q}_{b,0} - \eta| \le q) \qquad (15)$$
$$\le P_0 Z_n(\eta) I(|\bar{Q}_{b,n} - \bar{Q}_{b,0}| > q) + P_0 Z_n(\eta) I(0 < |\bar{Q}_{b,0} - \eta| \le q) \qquad (16)$$
$$\le Pr_0(|\bar{Q}_{b,n} - \bar{Q}_{b,0}| > q) + Pr_0(0 < |\bar{Q}_{b,0} - \eta| \le q)$$
$$\le \frac{P_0 |\bar{Q}_{b,n} - \bar{Q}_{b,0}|}{q} + Pr_0(0 < |\bar{Q}_{b,0} - \eta| \le q).$$
Above, eq. (15) holds because C3) implies that $Pr_0(\bar{Q}_{b,0} = \eta) = 0$, eq. (16) holds because $Z_n(\eta) = 1$ implies that $|\bar{Q}_{b,n} - \bar{Q}_{b,0}| \ge |\bar{Q}_{b,0} - \eta|$, and the final inequality holds by Markov's inequality.
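To spell out the implication invoked for eq. (16) (our own elaboration of the step stated in the proof):
$$Z_n(\eta) = 1 \;\Longrightarrow\; |\bar{Q}_{b,n} - \bar{Q}_{b,0}| \ge |\bar{Q}_{b,0} - \eta|.$$
Indeed, $Z_n(\eta) = 1$ means that exactly one of $\bar{Q}_{b,n} > \eta$ and $\bar{Q}_{b,0} > \eta$ holds. If $\bar{Q}_{b,0} > \eta \ge \bar{Q}_{b,n}$, then $|\bar{Q}_{b,n} - \bar{Q}_{b,0}| = \bar{Q}_{b,0} - \bar{Q}_{b,n} \ge \bar{Q}_{b,0} - \eta = |\bar{Q}_{b,0} - \eta|$; the case $\bar{Q}_{b,n} > \eta \ge \bar{Q}_{b,0}$ is symmetric. In particular, on the event $\{Z_n(\eta) = 1,\, |\bar{Q}_{b,0} - \eta| > q\}$ we must have $|\bar{Q}_{b,n} - \bar{Q}_{b,0}| > q$, which is exactly the indicator replacement made in eq. (16).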
The lemma assumes that $E_{P_0}|\bar{Q}_{b,n} - \bar{Q}_{b,0}| = o_{P_0}(1)$, and thus we can choose a sequence $q_n \downarrow 0$ such that
$$T_n(\eta) \le Pr_0(0 < |\bar{Q}_{b,0} - \eta| \le q_n) + o_{P_0}(1).$$
To see that the first term on the right is $o(1)$, note that $Pr_0(\bar{Q}_{b,0} = \eta) = 0$ combined with the continuity of $S_0$ on $N$ yields that, for $n$ large enough,
$$Pr_0(0 < |\bar{Q}_{b,0} - \eta| \le q_n) = S_0(\eta - q_n) - S_0(\eta + q_n).$$
The right-hand side is $o(1)$, and thus $T_n(\eta) = o_{P_0}(1)$. Plugging this into eq. (14) shows that $S_n(\eta) \to S_0(\eta)$ in probability. Recall that $\eta \in N$ was arbitrary.

Fix $\gamma > 0$. For $\gamma$ small enough, $\eta_0 - \gamma$ and $\eta_0 + \gamma$ are contained in $N$. Thus $S_n(\eta_0 - \gamma) \to S_0(\eta_0 - \gamma)$ and $S_n(\eta_0 + \gamma) \to S_0(\eta_0 + \gamma)$ in probability. Further, $S_0(\eta_0 - \gamma) > \kappa$ by the definition of $\eta_0$, and $S_0(\eta_0 + \gamma) < \kappa$ by condition C2). It follows that, with probability approaching 1, $S_n(\eta_0 - \gamma) > \kappa$ and $S_n(\eta_0 + \gamma) < \kappa$. But $|\eta_n - \eta_0| > \gamma$ implies that $S_n(\eta_0 - \gamma) \le \kappa$ or $S_n(\eta_0 + \gamma) > \kappa$, and thus $|\eta_n - \eta_0| \le \gamma$ with probability approaching 1. Thus $\eta_n \to \eta_0$ in probability, and $\tau_n \to \tau_0$ by the continuous mapping theorem.□
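As a numerical illustration of Lemma 5, the following sketch (our own construction, not from the paper) takes the true blip values $\bar{Q}_{b,0}(W) \sim N(0,1)$, so that $\eta_0$ is the $(1-\kappa)$-quantile of a standard normal, perturbs them with shrinking noise to mimic an $L^1$-consistent estimate $\bar{Q}_{b,n}$, and computes $\eta_n$ as the empirical $(1-\kappa)$-quantile, i.e. roughly the smallest $\eta$ with $S_n(\eta) \le \kappa$. The noise rate $n^{-1/4}$ is an arbitrary choice for the demonstration.

```python
import random
from statistics import NormalDist

random.seed(0)
kappa = 0.25  # at most a proportion kappa of the population may be treated
eta_0 = NormalDist().inv_cdf(1 - kappa)  # true threshold, about 0.6745

def eta_hat(qbar_vals, kappa):
    # Roughly the smallest eta with S_n(eta) = (1/n) * #{i : qbar_i > eta} <= kappa,
    # i.e. the empirical (1 - kappa)-quantile of the estimated blips.
    s = sorted(qbar_vals)
    return s[int((1 - kappa) * len(s))]

errors = []
for n in [100, 1_000, 10_000, 100_000]:
    qbar_0 = [random.gauss(0, 1) for _ in range(n)]              # Qbar_{b,0}(W_i)
    qbar_n = [q + random.gauss(0, n ** -0.25) for q in qbar_0]   # noisy estimate
    errors.append(abs(eta_hat(qbar_n, kappa) - eta_0))
print(errors)
```

The printed errors should shrink toward 0 as $n$ grows, consistent with $\eta_n \to \eta_0$ in probability.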

Proof of Theorem 6. This proof mirrors the proof of Lemma 5.2 in Audibert and Tsybakov [24]. It is also quite similar to the proof of Theorem 7 in Luedtke and van der Laan [13], though the proof in that working technical report concerns optimal rules without resource constraints, and contains several typographical errors that will be corrected in the final version.

Define $B_n$ to be the function $v \mapsto \bar{Q}_{b,n}(v) - \tau_n$ and $B_0$ to be the function $v \mapsto \bar{Q}_{b,0}(v) - \tau_0$. Below we omit the dependence of $B_n$, $B_0$ on $V$ in the notation, and of $d_n$, $d_0$ on $U$ and $V$. For any $t > 0$, we have that
$$|R_{20}(d_n)| \le E_{P_U \times P_0}\left[|(d_n - d_0) B_0|\right]$$
$$= E_{P_U \times P_0}\left[I(d_n \ne d_0) |B_0|\right]$$
$$= E_{P_U \times P_0}\left[I(d_n \ne d_0) |B_0| I(0 < |B_0| \le t)\right] + E_{P_U \times P_0}\left[I(d_n \ne d_0) |B_0| I(|B_0| > t)\right]$$
$$\le E_{P_0}\left[|B_n - B_0| I(0 < |B_0| \le t)\right] + E_{P_0}\left[|B_n - B_0| I(|B_n - B_0| > t)\right]$$
$$\le \|B_n - B_0\|_{2,P_0} \, Pr_0(0 < |B_0| \le t)^{1/2} + \frac{\|B_n - B_0\|_{2,P_0}^2}{t}$$
$$\le \|B_n - B_0\|_{2,P_0} \, C_0^{1/2} t^{\alpha/2} + \frac{\|B_n - B_0\|_{2,P_0}^2}{t},$$
where the second inequality holds because $d_n \ne d_0$ implies that $|B_n - B_0| \ge |B_0|$ when $|B_0| > 0$, the third inequality holds by the Cauchy-Schwarz and Markov inequalities, and the $C_0$ on the final line is the constant implied by eq. (5). The first result follows by plugging $t = \|B_n - B_0\|_{2,P_0}^{2/(2+\alpha)}$ into the upper bound above. We also have that
$$|R_{20}(d_n)| \le E_{P_U \times P_0}\left[I(d_n \ne d_0) |B_0|\right]$$
$$\le E_{P_0}\left[I(0 < |B_0| \le |B_n - B_0|) |B_0|\right]$$
$$\le E_{P_0}\left[I(0 < |B_0| \le \|B_n - B_0\|_{\infty,P_0}) |B_0|\right]$$
$$\le \|B_n - B_0\|_{\infty,P_0} \, Pr_0(0 < |B_0| \le \|B_n - B_0\|_{\infty,P_0}).$$
By eq. (5), it follows that $|R_{20}(d_n)| \lesssim \|B_n - B_0\|_{\infty,P_0}^{1+\alpha}$.□
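The rate in the final bound can be checked numerically in a toy example (our own construction; the uniform blip and the choice $\alpha = 1$ are assumptions made purely for illustration). With $B_0(V) \sim \mathrm{Unif}(-1/2, 1/2)$, the margin condition of eq. (5), $Pr_0(0 < |B_0| \le t) \lesssim t^{\alpha}$, holds with $\alpha = 1$, and shifting $B_0$ by a constant $\delta$ gives $\|B_n - B_0\|_{\infty, P_0} = \delta$:

```python
import random

random.seed(1)

# Illustrative setup (not from the paper): B_0(V) ~ Unif(-1/2, 1/2),
# so Pr(0 < |B_0| <= t) = 2t for small t, i.e. margin exponent alpha = 1.
alpha = 1.0
n = 200_000
b0 = [random.uniform(-0.5, 0.5) for _ in range(n)]

ratios = []
for delta in [0.2, 0.1, 0.05, 0.025]:
    # Perturbed blip B_n = B_0 + delta, with sup-norm error exactly delta.
    # Rules are d = I(B > 0); R_20(d_n) = E[(d_n - d_0) B_0].
    r20 = sum(b for b in b0 if (b + delta > 0) != (b > 0)) / n
    # Theorem 6 bound: |R_20(d_n)| <~ ||B_n - B_0||_inf^(1 + alpha)
    ratios.append(abs(r20) / delta ** (1 + alpha))
print(ratios)
```

Here the exact value of $|R_{20}(d_n)|$ is $\delta^2/2$, so each ratio hovers near $1/2$ for every $\delta$: the regret indeed scales as $\|B_n - B_0\|_{\infty}^{1+\alpha}$, staying bounded rather than growing as $\delta \downarrow 0$.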


Published in Print: 2016-05-01

Funding: This research was supported by NIH grant R01 AI074345-06. AL was supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program.

Citation Information: The International Journal of Biostatistics, Volume 12, Issue 1, Pages 283–303, ISSN (Online) 1557-4679, ISSN (Print) 2194-573X.

