For the general framework, consider a study of a random sample of $n$ individuals from a population, for each of whom we can measure a vector of covariates ${X}_{i}$, which we assume takes finitely many, although possibly many, levels. Each individual can be assigned a standard treatment $t=0$, in which case we would measure a potential outcome ${Y}_{i}(t=0)$, or a new treatment $t=1$, in which case we would measure a potential outcome ${Y}_{i}(t=1)$ [6]. The actual assignment ${\text{Treat}}_{i}\in \{0,1\}$ is made at random, that is, ${\text{Treat}}_{i}$ is independent of $({Y}_{i}(0),{Y}_{i}(1),{X}_{i})$, and then the outcome ${Y}_{i}:={Y}_{i}({\text{Treat}}_{i})$ corresponding to the actual assignment is observed. Based on the information of the study, the overall population average potential outcome $E\{{Y}_{i}(t)\}$ can be estimated without further assumptions by the sample analogue of $E({Y}_{i}\mid {\text{Treat}}_{i}=t)$, namely the average of the observed outcomes among those assigned ${\text{Treat}}_{i}=t$.
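As a minimal illustration of the sample analogue just described, the sketch below computes the arm-specific average outcomes and their difference. The function name `arm_means` and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def arm_means(y, treat):
    """Average observed outcome in each arm: the sample analogue of E{Y_i(t)}
    under random assignment."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat)
    return {t: y[treat == t].mean() for t in (0, 1)}

# Hypothetical toy data: three controls followed by three treated individuals.
y = np.array([3.0, 5.0, 4.0, 8.0, 6.0, 10.0])
treat = np.array([0, 0, 0, 1, 1, 1])

means = arm_means(y, treat)
ate_hat = means[1] - means[0]  # estimated overall average effect
print(means, ate_hat)
```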

Even if the new treatment is the best (on average, or for a particular patient, Zhang et al. [2]), its effect may be small and its administration associated with burden or adverse effects. Then, for subsequent clinical practice, physicians may wish to only give the new treatment to patients for whom the above study suggests the effect is large enough. To do this, for example, in the psychiatric trial we discuss in Section 2.2, the physicians wanted to characterize a subgroup of patients based on covariates, for whom the treatment effect is, on average, greater than a chosen clinically important value, say ${\text{eff}}_{\text{min}}$. Taking here the absolute difference as the causal effect of interest, the physicians’ goal is as follows:

$\begin{array}{rl}& \text{find a group of patients},\,\begin{array}{c}\text{highly}\\ \text{benefited}\end{array},\,\text{that maximizes the proportion}\ \text{pr}\{{X}_{i}\in \begin{array}{c}\text{highly}\\ \text{benefited}\end{array}\},\\ & \text{subject to having large average effect,}\ E\{{Y}_{i}(1)-{Y}_{i}(0)\mid {X}_{i}\in \begin{array}{c}\text{highly}\\ \text{benefited}\end{array}\}\ge {\text{eff}}_{\text{min}}.\end{array}$ (1)

If it is possible to estimate well the conditional $\text{effect}\,({X}_{i}):=E\{{Y}_{i}(1)-{Y}_{i}(0)\mid {X}_{i}\}$ for all ${X}_{i}$ without further assumptions, then the goal eq. (1) is easily addressable. To see this, consider, for any indicator function $in({X}_{i})$, the quantity $\text{effect}\,\{in({X}_{i})=1\}:=E\{{Y}_{i}(1)-{Y}_{i}(0)\mid in({X}_{i})=1\}$. We prove the following result in the Appendix.

**Result 1.** Among all indicator functions $in({X}_{i})$ such that $\text{effect}\,\{in({X}_{i})=1\}\ge {\text{eff}}_{\text{min}}$, the indicator that maximizes the size $\text{pr}\{in({X}_{i})=1\}$ is of the form

${in}_{0}({X}_{i}):=1\ \text{if and only if}\ \text{effect}\,({X}_{i})\ge k$

where $k$ is a constant determined by $\text{effect}\,\{{in}_{0}({X}_{i})=1\}={\text{eff}}_{\text{min}}$, provided that such a $k$ exists.

In other words, the largest group $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}$ satisfying eq. (1) is $\{x:{in}_{0}(x)=1\}$. It is obtained by including patients in the group in order from the largest down to the smallest values of the conditional $\text{effect}\,({X}_{i})$, and stopping when including the covariate level with the next smaller value of $\text{effect}\,({X}_{i})$ in $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}$ would first produce an average effect $E\{{Y}_{i}(1)-{Y}_{i}(0)\mid {X}_{i}\in \begin{array}{c}\text{highly}\\ \text{benefited}\end{array}\}$ smaller than ${\text{eff}}_{\text{min}}$.
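The construction in Result 1 can be sketched as follows for the idealized case in which the conditional effect and the population probability of every covariate level are known. The function name `largest_benefited_group`, the four covariate levels, and their effects are hypothetical. The sketch adds levels one at a time, so when the constant $k$ of Result 1 does not exist exactly, the group's average effect may exceed ${\text{eff}}_{\text{min}}$ rather than equal it.

```python
import numpy as np

def largest_benefited_group(effect, prob, eff_min):
    """Greedy construction with known conditional effects.

    effect[j]: conditional effect at covariate level j
    prob[j]:   population probability of covariate level j
    Returns the indices of the included levels (possibly empty).
    """
    effect = np.asarray(effect, dtype=float)
    prob = np.asarray(prob, dtype=float)
    order = np.argsort(-effect, kind="stable")  # largest effect first
    cum_prob = np.cumsum(prob[order])
    # Average effect within the group made of the first m levels in this order.
    cum_avg = np.cumsum(effect[order] * prob[order]) / cum_prob
    ok = cum_avg >= eff_min
    if not ok.any():
        return np.array([], dtype=int)
    # cum_avg is non-increasing, so the last qualifying prefix is the largest group.
    m = np.nonzero(ok)[0].max() + 1
    return order[:m]

# Hypothetical example: four equally likely covariate levels.
effect = [4.0, 1.0, 3.0, 0.0]
prob = [0.25, 0.25, 0.25, 0.25]
group = largest_benefited_group(effect, prob, eff_min=2.5)
print(sorted(group.tolist()))
```

Because the levels are visited in decreasing order of effect, the running average can only decrease, which is why the last prefix that still meets the constraint is the largest one.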

More realistically, when the levels of ${X}_{i}$ are many, the conditional effects are not estimable without further assumptions, and the above direct approach is not feasible. An existing approach [5] mirrors the theoretical approach using a working model (see Figure 1, first two columns). Specifically, in a first stage the existing approach fits a parametric working model (which may not be correct), $\text{pr}\{{Y}_{i}(t)\mid {X}_{i},\beta \}$, which by random assignment equals $\text{pr}({Y}_{i}\mid {X}_{i},{\text{Treat}}_{i}=t,\beta )$, using the MLE ${\hat{\beta}}^{mle}$ or a solution to another standard estimating equation. Based on this fit, the approach obtains an initial, model-based estimate of the effect $E({Y}_{i}\mid {X}_{i},{\text{Treat}}_{i}=1)-E({Y}_{i}\mid {X}_{i},{\text{Treat}}_{i}=0)$ using

$\begin{array}{rl}{\text{effect}}^{model}({X}_{i},{\hat{\beta}}^{mle}):=& E({Y}_{i}\mid {X}_{i},{\text{Treat}}_{i}=1,{\hat{\beta}}^{mle})\\ & -E({Y}_{i}\mid {X}_{i},{\text{Treat}}_{i}=0,{\hat{\beta}}^{mle}).\end{array}$ (2)

This approach can attempt to approximate goal eq. (1) by mimicking the theoretical solution given above, as follows: first, sort the covariates by the values of the estimated effects, ${\text{effect}}^{model}({X}_{i},{\hat{\beta}}^{mle})$; then, start creating the set $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})$ by cumulating ${X}_{i}$ from larger to smaller values of ${\text{effect}}^{model}({X}_{i},{\hat{\beta}}^{mle})$; and close the set $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})$ when the empirical (non-parametric) estimated effect (difference in sample averages of treated minus control) in that set would stop being $\ge {\text{eff}}_{\text{min}}$. This gives

$\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})=\underset{\text{over all values}\ e}{\text{the largest-fraction set}}\,\{{X}_{i}:{\text{effect}}^{model}({X}_{i},{\hat{\beta}}^{mle})\ge e\}$ (3)

such that the empirical treatment effect in the set is at least ${\text{eff}}_{\text{min}}$. By largest-fraction set we mean a set that has the largest probability based on the empirical distribution of ${X}_{i}$ in the study.
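The two-stage procedure leading to eq. (3) can be sketched as below. This is an illustration under stated assumptions, not the implementation of [5]: the working model is assumed to be a linear model with a treatment-by-covariate interaction, fit by least squares (the MLE under a Gaussian working model); the covariate is a single scalar; and all function names and the noiseless toy data are hypothetical. The search follows eq. (3) by scanning every threshold $e$ and keeping the largest set whose empirical effect is at least ${\text{eff}}_{\text{min}}$.

```python
import numpy as np

def fit_working_model(x, treat, y):
    """Stage 1: least-squares fit of the assumed working model
    E(Y | X, Treat) = b0 + b1*X + b2*Treat + b3*Treat*X."""
    design = np.column_stack([np.ones_like(x), x, treat, treat * x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def model_effect(x, beta):
    """Model-based effect as in eq. (2): prediction under Treat=1 minus Treat=0."""
    return beta[2] + beta[3] * x

def empirical_effect(y, treat, members):
    """Treated-minus-control difference in sample means within a set;
    NaN if either arm is empty in the set."""
    y1 = y[members & (treat == 1)]
    y0 = y[members & (treat == 0)]
    if y1.size == 0 or y0.size == 0:
        return np.nan
    return y1.mean() - y0.mean()

def highly_benefited(x, treat, y, eff_min):
    """Stage 2: largest set {effect_model >= e} whose empirical effect is >= eff_min."""
    eff = model_effect(x, fit_working_model(x, treat, y))
    for e in np.unique(eff):  # ascending thresholds, so sets shrink as e grows
        members = eff >= e
        emp = empirical_effect(y, treat, members)
        if not np.isnan(emp) and emp >= eff_min:
            return members, e  # first qualifying set is the largest one
    return np.zeros(x.shape[0], dtype=bool), None

# Hypothetical noiseless data in which the linear working model is incorrect:
# the true effect at covariate level x is x**2.
x = np.repeat([0.0, 1.0, 2.0, 3.0], 4)
treat = np.tile([0, 1], 8)
y = treat * x**2

members, threshold = highly_benefited(x, treat, y, eff_min=4.0)
emp = empirical_effect(y, treat, members)
print(int(members.sum()), emp)
```

In this toy example the working model is wrong, yet the selected set still has an empirical effect of at least `eff_min`, because the second stage checks the constraint with the non-parametric difference in means rather than with the model.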

Figure 1: Schematic representation of the theoretical solution, the existing approach, and the proposed approach, for a given ${\text{eff}}_{\text{min}}$.

A useful property of this approach, resulting from the empirical estimation at the second stage, is that the effect among the estimated highly benefited set in eq. (3) is approximately the desired clinical effect ${\text{eff}}_{\text{min}}$, even if the working model is incorrect. Specifically, [5] show that, allowing for the working model to be incorrect, the estimator ${\hat{\beta}}^{mle}$ will converge to a value, say ${\bar{\beta}}^{mle}$, and the set $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})$ will converge to

$\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\bar{\beta}}^{mle})=\underset{\text{over}\ e}{\text{the largest-probability set}}\,\{{X}_{i}:{\text{effect}}^{model}({X}_{i},{\bar{\beta}}^{mle})\ge e\}$

such that the effect within the set is at least ${\text{eff}}_{\text{min}}$. Therefore, the empirical $\widehat{\text{effect}}\,\{\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})\}$, defined as the difference between the empirical average outcomes of those in the highly benefited set assigned $\text{Treat}=1$ and those assigned $\text{Treat}=0$, converges to at least the nominal effect ${\text{eff}}_{\text{min}}$. The above assumes that ${\text{effect}}^{model}({X}_{i},{\bar{\beta}}^{mle})$ is not constant in ${X}_{i}$; if it is, then the convergence may not hold, for example, because the sets may be empty.

For a trial with small to moderate sample size, the set of patients $\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}({\hat{\beta}}^{mle})$ may have a true effect that is smaller than the limit. For this reason, we can use a modified set ${\begin{array}{c}\text{highly}\\ \text{benefited}\end{array}}^{calib}({\hat{\beta}}^{mle})$, which uses a resampling method to calibrate its effect to the nominal ${\text{eff}}_{\text{min}}$ (Appendix B).

A problem with the above approach, however, is that it still uses the estimate (e.g., MLE) of the working model *as if the model were correct*. In Section 3, we show that, by using a different estimation of the same working model, a different highly benefited group can be identified, which can be much larger than the one identified by the existing approach. First, however, we illustrate the existing approach using data from the Citalopram for Agitation in Alzheimer Disease Study (CitAD) [1].
