# Abstract

Providing terminally ill patients with access to experimental treatments, as allowed by recent “right to try” laws and “expanded access” programs, poses a variety of ethical questions. While practitioners and investigators may assume it is impossible to learn the effects of these treatments without randomized trials, this paper describes a simple tool to estimate the effects of these experimental treatments on those who take them, despite the problem of selection into treatment, and without assumptions about the selection process. The key assumption is that the average outcome, such as survival, would remain stable over time in the absence of the new treatment. Such an assumption is unprovable, but can often be credibly judged by reference to historical data and by experts familiar with the disease and its treatment. Further, where this assumption may be violated, the result can be adjusted to account for a hypothesized change in the non-treatment outcome, or to conduct a sensitivity analysis. The method is simple to understand and implement, requiring just four numbers to form a point estimate. Such an approach can be used not only to learn which experimental treatments are promising, but also to warn us when treatments are actually harmful – especially when they might otherwise appear to be beneficial, as illustrated by example here. While this note focuses on experimental medical treatments as a motivating case, more generally this approach can be employed where a new treatment becomes available or has a large increase in uptake, where selection bias is a concern, and where an assumption on the change in average non-treatment outcome over time can credibly be imposed.

## 1 Introduction

On 30th May 2018, the United States established a federal “right to try” law, allowing terminally ill patients to access experimental medical treatments that have cleared Phase 1 testing but have not yet been approved by the Food and Drug Administration (FDA).^{[1]} Such laws extend pre-existing methods of gaining access to unapproved treatments through “compassionate use” or “expanded access” programs, while sidestepping FDA petition and oversight procedures. Numerous ethical objections have been made to these laws, pointing to risks of dangerous side-effects that could shorten or worsen lives, raising false hopes among vulnerable patients, enabling quackery, undermining the FDA, and imposing a financial burden on desperate patients and their families.

Can we – and should we – learn anything about the efficacy and safety of drugs from those taking such experimental treatments? The first reaction of clinicians and statisticians alike may be an emphatic “no”: without randomization into treatment and control groups, individuals will self-select into treatment, thus little can be learned from the observational results. Indeed, one criticism of these laws has been that they “will only make it more difficult to know if medication is effective or safe” [1].

Yet, failing to carefully assess the effects of these treatments would both forgo the opportunity to learn which are promising and, perhaps worse, could greatly amplify their harm. Clinicians and investigators will surely make inferences about the effects of new treatments by comparing, at least casually, those who receive them and those who do not. Estimates based on comparisons of this type are not just biased, they are dangerous. As illustrated below, through the effects of self-selection, treatments that are actually harmful may appear to be beneficial in such naive comparisons, perversely encouraging more patients to take them.

We thus have a responsibility to learn the benefits and harms of such treatments, and to avoid the worst errors that naive comparisons would generate. Fortunately, a very simple technique outlined here allows valid estimates of treatment effects in circumstances where individuals self-select into treatment in unknown ways. Of course, such inference is not free. The critical assumption required is that, in the absence of the new treatment of interest, the average outcome would remain stable or change by a specified amount across two time periods (one before the treatment is available, and one after). While not true in every case, this assumption is straightforward to understand and can often be assessed by experts familiar with treatment of the disease in question.

While this note takes experimental medical treatments as an illustration and motivation, the method described here applies to a wide variety of circumstances where a treatment becomes newly available, individuals or units opt into taking that treatment, and the assumption of stability in average non-treatment outcomes over time is reasonable.^{[2]} The method may also be useful even in cases where a randomized trial is possible or has been conducted, but that trial had strict eligibility criteria or low willingness to consent to randomization, resulting in an estimate that may not generalize well to the population that would actually elect to take the treatment.

## 2 Proposed approach

Perhaps surprisingly, we can make inferences about how those who take a new treatment benefit from it even when patients select into taking treatment in unknown ways. Consider person *i*, with some observed outcome $Y_i$, but also two potential outcomes – the outcome she would have experienced had she been treated, $Y_i(1)$, and the outcome she would have experienced had she not been treated, $Y_i(0)$; only one of the two is ever observed.

We next consider how the average outcome people would experience without the treatment, $E[Y(0)]$, may change over time. Let $T_i \in \{0, 1\}$ indicate the time period in which person *i* is observed, where only in the later period ($T=1$) the new treatment (*D*) is available, and let $D_i \in \{0, 1\}$ indicate whether she takes it. The change over time in the average non-treatment outcome is then

$$\delta = E[Y(0) \mid T=1] - E[Y(0) \mid T=0].$$

The core assumption required is on the value of *δ*. For simplicity of exposition, and because it is likely the primary use case, we consider first the assumption that $\delta = 0$, which we term the *stability of the average non-treatment outcome* assumption. However, alternative assumed values of *δ* can be employed. Within this setting, besides the practical requirement that at least some eligible individuals take the new treatment, only this assumption on *δ* is required to identify the average treatment effect among the treated. No assumption is required on the treatment assignment mechanism.

Showing this identification result is straightforward: we know the average non-treatment outcome among the whole group in period one, because it is equal to the mean observed (non-treatment) outcome in period zero (shifted by *δ* if non-zero),

$$E[Y \mid T=0] + \delta = E[Y(0) \mid T=1] = E[Y(0) \mid D=1, T=1]\Pr(D=1 \mid T=1) + E[Y \mid D=0, T=1]\Pr(D=0 \mid T=1), \tag{1}$$

where the final term uses the fact that the untreated in period one reveal their non-treatment outcomes. We can re-arrange this to identify the average non-treatment outcome among the treated,

$$E[Y(0) \mid D=1, T=1] = \frac{E[Y \mid T=0] + \delta - E[Y \mid D=0, T=1]\Pr(D=0 \mid T=1)}{\Pr(D=1 \mid T=1)}. \tag{2}$$

Finally, the Average Treatment Effect on the Treated (ATT) is the difference between the treatment and non-treatment potential outcomes, taken solely among the treated, i.e.

$$\mathrm{ATT} = E[Y(1) - Y(0) \mid D=1, T=1] = E[Y \mid D=1, T=1] - E[Y(0) \mid D=1, T=1], \tag{3}$$

in which the first term is directly observed and the second is identified as shown above.

When the stability assumption is maintained, *δ* is simply set to zero in these expressions; alternative fixed values, or a range of values for sensitivity analysis, can be substituted instead (see *Discussion*).
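For concreteness, the identification result reduces to simple arithmetic on four observable quantities. A minimal sketch in Python (the function and argument names are illustrative, not from the original text):

```python
def att_estimate(y_pre, y_treated, y_untreated, p_treated, delta=0.0):
    """Point estimate of the ATT from four observed quantities.

    y_pre       -- mean outcome in period 0, before the treatment exists
    y_treated   -- mean observed outcome among the treated in period 1
    y_untreated -- mean observed outcome among the untreated in period 1
    p_treated   -- proportion treated in period 1 (must be positive)
    delta       -- assumed change in the average non-treatment outcome
    """
    if not 0 < p_treated <= 1:
        raise ValueError("p_treated must lie in (0, 1]")
    # Rearranging  E[Y|T=0] + delta =
    #   E[Y(0)|D=1,T=1] * p  +  E[Y|D=0,T=1] * (1 - p)
    # yields the counterfactual non-treatment mean among the treated:
    y0_treated = (y_pre + delta - y_untreated * (1 - p_treated)) / p_treated
    return y_treated - y0_treated
```

For instance, `att_estimate(0.50, 0.60, 0.40, 0.30)` returns roughly −0.13, the ATT under the stability assumption for those inputs.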

### 2.1 Extensions

A number of extensions are possible. First, investigators could use this tool to examine “what-if” scenarios: if clinicians have beliefs about the four quantities required here or evidence from past cases, we can compute ATT estimates to understand the underlying effect implied by those beliefs. Such an analysis is informal and only as good as the guesses that are used as data. However, it correctly produces an ATT estimate subject to those guesses, avoiding the errors that result from naive comparisons that may otherwise be made. An illustration of such errors is given below.

Second, the assumption that $\delta = 0$ can be replaced with any other assumed value of *δ*, reflecting an anticipated trend in the average non-treatment outcome, or with a range of such values, producing a sensitivity analysis that reports the ATT implied by each.
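Such a sensitivity analysis is a short loop over candidate *δ* values. A sketch using hypothetical inputs (these numbers are invented for illustration, not drawn from any study):

```python
# Hypothetical inputs: pre-period mean outcome 0.55; in the later period,
# 25% take the treatment, with means 0.65 (treated) and 0.50 (untreated).
y_pre, p_treated = 0.55, 0.25
y_treated, y_untreated = 0.65, 0.50

results = {}
for delta in (-0.05, 0.0, 0.05):
    # Counterfactual non-treatment mean among the treated under this delta
    y0_treated = (y_pre + delta - y_untreated * (1 - p_treated)) / p_treated
    results[delta] = y_treated - y0_treated

for delta, att in results.items():
    print(f"delta = {delta:+.2f} -> ATT = {att:+.3f}")
```

Reporting the estimate across such a grid makes explicit how the conclusion depends on the assumed trend in the non-treatment outcome.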

## 3 Illustration: A harmful drug that appears to be beneficial

A simple illustration can demonstrate the method and how it can avoid the most dangerous errors that may arise due to direct comparisons that practitioners may be tempted to make. Suppose an aggressive form of cancer, once diagnosed at a certain stage, has only a 50 % one-year survival rate as measured over a period from 2005 to 2010 (the pre-treatment period, $T=0$). An experimental treatment then becomes available, and in the subsequent period ($T=1$), 30 % of similar patients elect to take it. Among those who take the treatment, one-year survival is 60 %; among the 70 % who do not, it is only 40 %. A naive comparison of 60 % to 40 % suggests a substantial benefit. Applying the method under the stability assumption instead yields an estimated ATT of $0.60 - 0.733 = -0.133$: the treatment appears to have *reduced* one-year survival among those who took it by roughly 13 percentage points.

To verify this result while reinforcing the intuition: first, under the stability assumption, if we could see how everybody in the later group fares in the absence of the treatment, we know that (up to finite sample error) 50 % would survive at one year. Among the 70 % who did not take the treatment, only 40 % survive at one year. This “drop” in survival rate among the non-treated signals that it must have been those who were worse off who chose not to take the treatment. Consequently, those who do take the treatment must have had higher (non-treatment) survival in order to bring this 40 % up to the required 50 % for the whole group. Specifically, the 30 % of the group who took the treatment must have had an average non-treatment survival of *x* in the equation $0.30\,x + 0.70 \times 0.40 = 0.50$, which solves to $x \approx 0.733$.

The observed survival rate of 60 % under treatment thus no longer appears favorable compared to the 73 % who would have survived without treatment in this group, yielding an estimated ATT of $0.60 - 0.733 = -0.133$. Repeating the estimate over a range of values of *δ* will characterize what we would conclude for any given assumption of *δ*.
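The arithmetic of this illustration can be checked in a few lines (variable names are mine, chosen for readability):

```python
p_treated   = 0.30  # share of the later-period group taking the treatment
y_pre       = 0.50  # one-year survival in 2005-2010, before the treatment
y_treated   = 0.60  # observed survival among the treated
y_untreated = 0.40  # observed survival among the untreated

naive = y_treated - y_untreated  # naive comparison: +0.20, looks beneficial

# Under stability, the whole later group would average 0.50 without treatment,
# so the treated subgroup's counterfactual survival y0 must satisfy
#   0.30 * y0 + 0.70 * 0.40 = 0.50
y0_treated = (y_pre - y_untreated * (1 - p_treated)) / p_treated

att = y_treated - y0_treated  # roughly -0.13: the treatment is harmful
print(f"naive: {naive:+.2f}  counterfactual: {y0_treated:.3f}  ATT: {att:+.3f}")
```

The sign reversal between `naive` and `att` is exactly the danger described above: self-selection of healthier patients into treatment masks a harmful effect.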

## 4 Discussion

While motivated here by the problem of experimental medical treatments, this approach is quite general in its applicability to situations in which new treatments become available and individuals self-select to receive them. The main concern investigators must keep in mind when choosing to apply this method is whether they can justify the stability assumption, or employ some value of *δ* other than zero. Fortunately, the stability of non-treatment outcomes is a straightforward assumption to understand, and can be well evaluated by experts in many cases. It is most plausible, first, if little has changed in the way the disease is treated over the time in question. This includes any *other* treatments that might be administered or withdrawn due to taking the new treatment in question. Second, a more subtle concern lies in compositional changes in the population who acquires the disease, such as changes in population health or competing risks, as these could also drive changes in the average non-treatment outcome. That said, if such compositional shifts do occur, they are likely to be slow-moving and so may be possible to rule out as problematic in the short-run. While no test can prove that the stability assumption (or any other assumption on *δ*) holds, investigators can check the stability of average outcomes over the course of many years prior to the introduction of the new treatment, which can boost the credibility that it remains stable thereafter. Altogether, if the outcome has been stable for many years prior to the introduction of the treatment in question, and there are no known changes in the use of other treatments or sudden changes in the composition of the group with the disease, then a strong case can be made for the stability of the average non-treatment outcome.

Pragmatically, only four numbers are required to form a point estimate: the estimated average outcome prior to the treatment’s availability ($E[Y \mid T=0]$), the average observed outcomes among the treated and the untreated in the later period ($E[Y \mid D=1, T=1]$ and $E[Y \mid D=0, T=1]$), and the proportion taking the treatment ($\Pr(D=1 \mid T=1)$).

### 4.1 Comparisons to related methods

One alternative approach worth mentioning but less often applicable would be possible when we have a disease such that the prognosis in the absence of any new treatment is virtually certain, and thus the non-treatment outcome one would normally learn from a control group is already known. Suppose that nearly all individuals with a certain cancer at a certain stage die within one year (and those whose cancers do remit, if any, show no signs of their potential for remittance until it happens and thus would not have any basis for self-selecting into treatment). If a group – selected by any means – takes a new medication and has a 50 % survival rate at one year, then the improvement can reasonably be attributed to the new treatment, as we believe we know how those individuals would have fared under non-treatment, despite the absence of a control group. While possibly workable in some scenarios, such an approach is limited to cases where the outcome is nearly certain. By contrast, the approach here is more general, and recognizes that when outcomes are uncertain (such as a 50 % one-year survival rate), there is non-trivial scope for self-selection.^{[3]}

Second, this method may bear a resemblance to the Difference-in-Difference (DID) approach, but can operate in circumstances where DID is not possible, and provides a relaxation of DID in cases where it is possible. To conduct DID, we need either to measure each unit before and after (some are exposed to) treatment, or we must be able to place individuals into larger groupings that persist over time (such as states), with treatment being assigned at the level of those larger units at time $T=1$. The present method requires neither: the individuals observed in the two periods may differ entirely, and we need not know who in the earlier period would have taken the treatment had it been available.

The method is thus particularly useful where DID is not possible. However, in arrangements where DID is possible (such as panel data), it provides an “adjustable” version of DID that allows prescribed deviations from the parallel trends assumption. Specifically, DID requires the parallel trends assumption, whereas the present method requires only an assumption on the aggregate shift *δ*. To understand the connection, consider that there are two ways to support a particular assumed value of *δ*. The trend in average non-treatment outcomes over time could be different for the would-be-treated group and the would-be-control group, with the average of these trends (weighted by their population proportions) amounting to *δ*. Alternatively, we may propose a given *δ* by assuming that both the would-be-treated and would-be-control groups changed by exactly *δ*, in turn ensuring that the weighted average is also *δ*. This more restrictive claim is precisely the parallel trends assumption. Further, if we do assume parallel trends, this means we can learn the appropriate *δ* from the change over time in the control group alone. Setting *δ* to the estimated change in the control group then recovers the DID estimate, while the weaker assumption requires only that the two groups’ trends amount to *δ* through their weighted average.
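To make the DID connection concrete, a quick numerical check (using made-up panel group means, not taken from this paper) shows that setting *δ* equal to the control group's observed change reproduces the DID estimate:

```python
# Hypothetical panel means: treated group (t0, t1), control group (c0, c1),
# with a fraction p of the population in the would-be-treated group.
p = 0.30
t0, t1 = 0.45, 0.62   # treated-group means before/after
c0, c1 = 0.50, 0.53   # control-group means before/after

did = (t1 - t0) - (c1 - c0)   # classic difference-in-differences estimate

# The present method, with delta set to the control group's change:
delta = c1 - c0
y_pre = p * t0 + (1 - p) * c0                       # whole-group mean, period 0
y0_treated = (y_pre + delta - c1 * (1 - p)) / p     # counterfactual treated mean
att = t1 - y0_treated

print(did, att)  # the two agree (up to floating point)
```

The equality is algebraic: substituting $E[Y \mid T=0] = p\,t_0 + (1-p)\,c_0$ and $\delta = c_1 - c_0$ into the identification formula collapses to $t_1 - (t_0 + c_1 - c_0)$, the DID estimate.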

Third, and perhaps most illuminating, in the case when $\delta = 0$, the method is equivalent to an instrumental variable analysis in which the time period serves as the instrument.^{[4]} Accordingly, the required assumptions for this method (under the stability assumption) can be represented by the causal diagram in Figure 1. The absence of a direct arrow between *T* and *Y* in this graph encodes the exclusion restriction corresponding to the “stability of the average non-treatment outcome” assumption.

**Figure 1:** Causal diagram in which the time period *T* serves as an instrument for treatment *D*: *T* affects the outcome *Y* only through *D*.

While I am not aware of any empirical or theoretical work describing the identification logic used here, the equivalence to using time as an instrument connects to an emerging set of medical studies in which the uptake of new treatments increases dramatically over time [5], [6], [7], [8], [9].^{[5]} The approach described here helps clarify the identification assumptions required of such work and how they can be judged. In addition, when identification depends upon an assumption on *δ* as described here, covariate adjustment procedures become unnecessary, or require explicit justification in terms of identification. Rather, simpler analyses using Equation 3, together with discussions of plausible values of *δ*, are called for. Further, sensitivity analysis based on a range of these *δ* values can be a valuable addition to any such work.

### 4.2 Representational limitations of RCTs

Even when RCTs are possible, investigators may worry about two important representational limitations. For example, if only a small fraction of people with a given disease are eligible or willing to consent, then how might this group be different from the ultimate target group who will use the treatment once approved? Clinical designs that allow partial self-selection in an effort to address these concerns include “comprehensive cohort studies” [11] and “patient preference trials” [12]. In comprehensive cohort studies, it is proposed that those patients who refuse randomization be allowed to instead join the study but with the treatment of their preference. The randomized arms are compared as in any experiment. The outcomes (as well as the pre-treatment characteristics) of those in the preference groups are intended to improve our understanding of generalizability, but it is unclear how to make reliable use of the information provided by these preference groups given confounding biases. In later work, “patient preference trials” encapsulate a variety of research designs in which patients’ preferences are elicited, some individuals are randomized, and some receive a treatment of their choosing. In the recent proposal of [13], treatment preferences are elicited from all individuals, who are randomized into one group that will have their treatment assigned at random, and one that can choose its own treatment. This design allows for sharp-bound identification and sensitivity analysis for the average causal effects among those who would choose a given treatment.

The present method has the primary benefit of sidestepping the need for randomization, still required by the above designs. However, the allowance for self-selection alters the estimand in ways that may be preferable or complementary to an RCT, depending on the goals of the study. First, in retrospective work, the ATT identified here may be the ideal quantity of interest if we would like to know what effect a past treatment actually had. Second, if our goals are more prospective, the ATT from this method may say more about the potential effects of a new treatment in the clinical population likely to take it, if the RCT was highly restrictive in its eligibility criteria, or suffered low consent rates. On the other hand, the ATT may not be ideal to inform policy making decisions – such as promoting a new treatment to become a first line therapy – if the group likely to take the treatment under such a policy differs widely from those who have elected to take it during the study period.

### 4.3 Conclusions

In summary, this note describes a simple identification procedure allowing estimation of the ATT regardless of self-selection into treatment. The simplest assumption – stability of average non-treatment outcomes ($\delta = 0$) – can be replaced with other assumed *δ* values, or a range of them for sensitivity analysis. In contrast to DID, the method works when different individuals are present in the two time periods, without any indication of who in the earlier time period would have been exposed to treatment had they been observed in the latter time period. In the $\delta = 0$ case, it is moreover equivalent to an instrumental variable analysis in which the time period serves as the instrument.

This method is broadly applicable where treatments become newly available or popular, and an assumption on the stability of average non-treatment outcomes can be credibly made. Returning to the motivating case of “right to try” and other access to experimental medical treatments, the availability of this method does not change the deep and difficult set of ethical questions that must be answered about when and whether an experimental treatment should be made available. Rather, given the current laws, we must consider our ethical responsibility to learn what we can from such treatment regimes – not only to determine which therapies are promising for further trials, but to more quickly protect against harmful ones.

# Acknowledgment

The author thanks Darin Christensen, Erin Hartman, Chris Tausanovitch, Mark Handcock, Jeff Lewis, Aaron Rudkin, Maya Petersen, Arash Naeim, Onyebuchi Arah, Dean Knox, Ami Wulf, Paasha Mahdavi, and members of the 2018 Southern California Methods workshop for valuable feedback and discussion.

### References

1. Hamblin J. The disingenuousness of right to try. The Atlantic. 2018.

2. Neyman J, Dabrowska D, Speed T. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Stat Sci. 1923 [1990];5(4):465–72.

3. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91(434):444–55. doi:10.3386/t0136.

4. Pearl J. Causality. Cambridge University Press; 2009. doi:10.1017/CBO9780511803161.

5. Johnston K, Gustafson P, Levy A, Grootendorst P. Use of instrumental variables in the analysis of generalized linear models in the presence of unmeasured confounding with applications to epidemiological research. Stat Med. 2008;27(9):1539–56. doi:10.1002/sim.3036.

6. Cain LE, Cole SR, Greenland S, Brown TT, Chmiel JS, Kingsley L, Detels R. Effect of highly active antiretroviral therapy on incident AIDS using calendar period as an instrumental variable. Am J Epidemiol. 2009;169(9):1124–32. doi:10.1093/aje/kwp002.

7. Shetty KD, Vogt WB, Bhattacharya J. Hormone replacement therapy and cardiovascular health in the United States. Med Care. 2009;600–5. doi:10.1097/MLR.0b013e31818bfe9b.

8. Mack CD, Brookhart MA, Glynn RJ, Meyer AM, Carpenter WR, Sandler RS, Stürmer T. Comparative effectiveness of oxaliplatin vs. 5-fluorouracil in older adults: an instrumental variable analysis. Epidemiology (Camb, Mass). 2015;26(5):690. doi:10.1097/EDE.0000000000000355.

9. Gokhale M, Buse JB, DeFilippo Mack C, Jonsson Funk M, Lund J, Simpson RJ, Stürmer T. Calendar time as an instrumental variable in assessing the risk of heart failure with antihyperglycemic drugs. Pharmacoepidemiol Drug Saf. 2018;27(8):857–66. doi:10.1002/pds.4578.

10. Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19(6):537–54. doi:10.1002/pds.1908.

11. Olschewski M, Scheurlen H. Comprehensive cohort study: an alternative to randomized consent design in a breast preservation trial. Methods Inf Med. 1985;24(03):131–4. doi:10.1055/s-0038-1635365.

12. Brewin CR, Bradley C. Patient preferences and randomised clinical trials. BMJ. 1989;299(6694):313. doi:10.1136/bmj.299.6694.313.

13. Knox D, Yamamoto T, Baum MA, Berinsky A. Design, identification, and sensitivity analysis for patient preference trials. Working paper. 2014.

**Received:** 2018-07-08

**Accepted:** 2018-11-09

**Published Online:** 2018-12-06

**Published in Print:** 2019-04-26

© 2019 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.