# Abstract

The design and analysis of active-control trials to evaluate experimental HIV pre-exposure prophylaxis (PrEP) agents pose serious statistical challenges. We recently proposed a new outcome measure, the averted infections ratio (AIR) – the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The main aim of the current paper is to examine the mathematical connection between AIR and the HIV incidence rate ratio, the standard outcome measure. We also consider the sample size implications of the choice of primary outcome measure and explore the connection between effectiveness and efficacy under a simplified model of adherence.

## 1 Introduction

Late-phase trials of experimental HIV pre-exposure prophylaxis (PrEP) agents are currently designed as active-control trials, with oral TDF-FTC, currently the only drug licensed for this indication, constituting the control regimen. In the absence of a validated surrogate, the primary endpoint is an incident HIV infection (Cutrell et al. 2017; Janes et al. 2019). The standard primary outcome measure, following the approach used in earlier placebo-controlled trials, is the HIV incidence rate ratio comparing the experimental and control groups (Cutrell et al. 2017; Donnell et al. 2013). In a recent paper, we pointed out serious difficulties in the interpretation of this measure and described an alternative measure of effectiveness, the averted infections ratio (AIR), based on the concept of averted infections (Dunn et al. 2018). The AIR is interpreted as the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The measure is simple to interpret, has direct clinical and public health relevance, and is a natural preservation-of-effect metric for assessing statistical non-inferiority.

The main aim of this paper is to examine the connection between AIR and the rate ratio in more mathematical detail, and to explain how the AIR allows a reduction in sample size for the same level of statistical power. We also point out a curious feature of the AIR concerning effectiveness and efficacy under a simplified model of adherence.

## 2 Statistical Formulation

Currently, most studies of experimental PrEP agents are designed as non-inferiority trials, where the primary aim is to show that HIV incidence is not unacceptably higher with the experimental agent than with TDF-FTC. This is formally judged by whether the observed confidence limit (lower or upper, as appropriate) for the primary outcome measure exceeds a pre-defined non-inferiority margin. A “preservation of effect” argument is often used as a basis for this margin, i.e. the aim is to show that the experimental agent preserves a minimum fraction of the effect of TDF-FTC relative to placebo or no treatment (Snapinn and Jiang 2008).

Denote the experimental and control groups by the subscripts E and C, respectively. We also consider a hypothetical placebo group denoted by the subscript P. Let λ (subscripted by E, C, or P) denote the population-average HIV incidence rate (used interchangeably for both the parameter and the estimator), and let Δ denote the non-inferiority margin.

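The standard analytical approach considers inference on a log incidence scale, the “natural” parameterisation for the exponential family of distributions (Nelder 1998). Non-inferiority is demonstrated if it is shown (probabilistically) that

$$\log(\lambda_P) - \log(\lambda_E) > \Delta\,[\log(\lambda_P) - \log(\lambda_C)] \tag{1}$$

or, equivalently (given $\lambda_C < \lambda_P$),

$$\frac{\log(\lambda_P) - \log(\lambda_E)}{\log(\lambda_P) - \log(\lambda_C)} > \Delta \tag{2}$$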

Denote the expression on the LHS of eq. (2) as STD (shorthand for Standard). Although this formulation is not conventional, it facilitates a comparison with the AIR. The latter is defined by (Dunn et al. 2018):

$$\mathrm{AIR} = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} \tag{3}$$

The superficial similarity of eqs. (2) and (3) masks a key difference: AIR is essentially a (standardised) rate difference measure and STD essentially a (standardised) rate ratio measure.

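Specifying $\lambda_P$, the incidence rate that would have been observed in the absence of an intervention, is often not practicable. Another tack is to make inferences via the assumed effectiveness of the control group agent:

$$\theta_C = 1 - \frac{\lambda_C}{\lambda_P} \tag{4}$$

Re-arranging eq. (4) and substituting in eqs. (2) and (3), STD can be expressed as

$$\mathrm{STD} = 1 + \frac{\log(\lambda_E/\lambda_C)}{\log(1-\theta_C)} \tag{5}$$

and AIR expressed as

$$\mathrm{AIR} = \frac{1 - (1-\theta_C)\,\lambda_E/\lambda_C}{\theta_C} \tag{6}$$

Eq. (6) reveals an interesting point. Although AIR was formulated as a rate difference based measure, when estimated via $\theta_C$ it is a function of the rate ratio $\lambda_E/\lambda_C$.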
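The relationship between the two parameterisations can be checked numerically. The sketch below assumes that AIR is $(\lambda_P - \lambda_E)/(\lambda_P - \lambda_C)$, that STD is the same ratio with each rate replaced by its logarithm, and that control effectiveness is $\theta_C = 1 - \lambda_C/\lambda_P$. The function names and the incidence rates are hypothetical, chosen only for illustration.

```python
import math

def air_from_rates(lam_e, lam_c, lam_p):
    """AIR as a standardised rate difference (assumed definition)."""
    return (lam_p - lam_e) / (lam_p - lam_c)

def std_from_rates(lam_e, lam_c, lam_p):
    """STD: the same ratio on the log incidence scale (assumed definition)."""
    return (math.log(lam_p) - math.log(lam_e)) / (math.log(lam_p) - math.log(lam_c))

def air_from_ratio(rate_ratio, theta_c):
    """AIR written in terms of the rate ratio and control effectiveness."""
    return (1.0 - (1.0 - theta_c) * rate_ratio) / theta_c

def std_from_ratio(rate_ratio, theta_c):
    """STD written in terms of the rate ratio and control effectiveness."""
    return 1.0 + math.log(rate_ratio) / math.log(1.0 - theta_c)

# Hypothetical incidence rates per 100 person-years (illustrative only).
lam_p, lam_c, lam_e = 5.0, 2.0, 2.5
theta_c = 1.0 - lam_c / lam_p      # control effectiveness, 0.6 here
rate_ratio = lam_e / lam_c         # 1.25 here

print(air_from_rates(lam_e, lam_c, lam_p), air_from_ratio(rate_ratio, theta_c))
print(std_from_rates(lam_e, lam_c, lam_p), std_from_ratio(rate_ratio, theta_c))
```

Each pair of printed values should agree, which illustrates that once $\theta_C$ is fixed both measures depend on the trial data only through the rate ratio.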

## 3 Comparison of AIR and STD

We exemplify the difference between AIR and STD using a hypothetical two-arm active-control trial. The following conditions are fixed: (a) equal follow-up in the control and experimental arms; (b) 40 HIV endpoints in the control arm; (c) control agent effectiveness of 60 % (relative to placebo). The number of HIV endpoints in the experimental arm is allowed to vary between 20 and 70 (rate ratio of 0.50 to 1.75).

Figure 1 shows the relationship between AIR and STD and the number of HIV endpoints in the experimental arm. Both AIR and STD are equal to one when 40 HIV endpoints are also observed in the experimental arm (i.e. the two agents are equally effective). Both measures are greater than 1 when there are fewer than 40 endpoints in the experimental arm (i.e. it is more effective than the control agent), although the AIR is less than STD. Conversely, the AIR is greater than STD when there are more than 40 endpoints in the experimental arm (i.e. it is less effective than the control agent).
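As a worked check of this pattern at the two ends of the range, assume $\mathrm{AIR} = [1 - (1-\theta_C)\,RR]/\theta_C$ and $\mathrm{STD} = 1 + \ln(RR)/\ln(1-\theta_C)$, where $RR$ is the ratio of endpoint counts (valid under equal follow-up) and $\theta_C = 0.6$:

```latex
\begin{aligned}
\text{20 endpoints } (RR = 0.50):\quad
  & \mathrm{AIR} = \frac{1 - 0.4 \times 0.50}{0.6} \approx 1.33,
  & \mathrm{STD} &= 1 + \frac{\ln 0.50}{\ln 0.4} \approx 1.76 \\
\text{70 endpoints } (RR = 1.75):\quad
  & \mathrm{AIR} = \frac{1 - 0.4 \times 1.75}{0.6} = 0.50,
  & \mathrm{STD} &= 1 + \frac{\ln 1.75}{\ln 0.4} \approx 0.39
\end{aligned}
```

With fewer endpoints than the control arm the AIR lies below STD, and with more endpoints it lies above, in line with the description of Figure 1.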

### Figure 1:

The lower confidence limits, upon which the assessment of non-inferiority is based, are more pertinent than the point estimates. The lower 5 % confidence limits are represented by dotted lines in Figure 1, along with a grey horizontal line representing a non-inferiority margin of 50 %. Focussing on where these lines intersect, non-inferiority is seen to be demonstrated by AIR if there are 49 or fewer HIV endpoints in the experimental arm, and by STD if there are 44 or fewer. That is, if between 45 and 49 HIV endpoints are observed then non-inferiority is demonstrated by AIR but not by STD. This implies that greater statistical power can be achieved by using the AIR rather than the rate ratio. The following section looks at this algebraically.
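These cut-offs can be reproduced with a short calculation. The sketch below rests on assumptions not stated in the text: a Wald interval for the log rate ratio with standard error $\sqrt{1/d_E + 1/d_C}$ (where $d_E$ and $d_C$ are the endpoint counts), a one-sided 95 % normal quantile, and the criterion that the lower limit of AIR or STD must exceed the margin, re-expressed as an upper bound on the log rate ratio. All function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

# One-sided 95% standard normal quantile (about 1.645).
Z = NormalDist().inv_cdf(0.95)

def upper_limit_log_rr(events_e, events_c, z=Z):
    """Upper confidence limit for log(rate ratio), assuming equal follow-up
    and a Wald interval with SE = sqrt(1/events_e + 1/events_c)."""
    se = math.sqrt(1 / events_e + 1 / events_c)
    return math.log(events_e / events_c) + z * se

def max_endpoints(log_rr_bound, events_c=40, candidates=range(20, 71)):
    """Largest experimental-arm endpoint count whose upper limit stays below the bound."""
    passing = [e for e in candidates if upper_limit_log_rr(e, events_c) < log_rr_bound]
    return max(passing)

theta_c = 0.60   # assumed control agent effectiveness
margin = 0.50    # non-inferiority margin on the AIR / STD scale

# AIR lower limit > margin  <=>  upper limit of RR < (1 - margin*theta_c) / (1 - theta_c)
air_bound = math.log((1 - margin * theta_c) / (1 - theta_c))
# STD lower limit > margin  <=>  upper limit of log RR < -(1 - margin) * log(1 - theta_c)
std_bound = -(1 - margin) * math.log(1 - theta_c)

air_max = max_endpoints(air_bound)
std_max = max_endpoints(std_bound)
print(air_max, std_max)  # 49 44
```

Under these assumptions the calculation returns the same thresholds as read from Figure 1, which suggests, but does not confirm, that the figure was based on a similar interval.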

## 4 Implications for Sample Size by Using the AIR Rather than Rate Ratio

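From eq. (5), non-inferiority is demonstrated by the rate ratio if the upper confidence limit for $\log(\lambda_E/\lambda_C)$ is less than $-(1-\Delta)\log(1-\theta_C)$.

As inference regarding non-inferiority is based on $\log(\lambda_E/\lambda_C)$, the approximate sample size is inversely proportional to $[-(1-\Delta)\log(1-\theta_C) - \log(\lambda_E/\lambda_C)]^2$, where $\lambda_E/\lambda_C$ now denotes the assumed true rate ratio.

Similarly, under the AIR approach, the approximate sample size is inversely proportional to $[\log\{(1-\Delta\theta_C)/(1-\theta_C)\} - \log(\lambda_E/\lambda_C)]^2$.

Thus the ratio of sample sizes (more precisely, the required person-years follow-up) under the standard statistical approach compared with AIR is given by:

$$\frac{[\log\{(1-\Delta\theta_C)/(1-\theta_C)\} - \log(\lambda_E/\lambda_C)]^2}{[-(1-\Delta)\log(1-\theta_C) - \log(\lambda_E/\lambda_C)]^2}$$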

In some studies (e.g. HPTN-083 trial comparing injectable cabotegravir versus oral TDF-FTC) the experimental agent is assumed to be more effective than the control agent for the purposes of sample size calculation. However, power is often evaluated (e.g. DISCOVER trial comparing oral TAF-FTC versus oral TDF-FTC) assuming the experimental and control agents are equally effective ($\lambda_E = \lambda_C$).

Figure 2 shows the percentage reduction in sample size (a simple transformation of the ratio) plotted against *assumed* values for the *true* value of $\theta_C$.
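The size of the reduction can be explored with a minimal sketch. It assumes that required person-years are inversely proportional to the squared distance between the non-inferiority bound on the log rate ratio scale and the assumed true log rate ratio, with all other factors common to the two approaches and therefore cancelling. It also assumes that the margin is the minimum fraction of the control effect to be preserved. The function names are hypothetical.

```python
import math

def std_bound(theta_c, margin):
    """Upper bound on log(rate ratio) implied by STD > margin (assumed form)."""
    return -(1 - margin) * math.log(1 - theta_c)

def air_bound(theta_c, margin):
    """Upper bound on log(rate ratio) implied by AIR > margin (assumed form)."""
    return math.log((1 - margin * theta_c) / (1 - theta_c))

def sample_size_ratio(theta_c, margin, true_log_rr=0.0):
    """Person-years needed under STD divided by person-years needed under AIR.
    true_log_rr = 0 corresponds to equally effective agents."""
    return ((air_bound(theta_c, margin) - true_log_rr)
            / (std_bound(theta_c, margin) - true_log_rr)) ** 2

def percent_reduction(theta_c, margin, true_log_rr=0.0):
    """Percentage saving in person-years from using AIR instead of STD."""
    return 100.0 * (1.0 - 1.0 / sample_size_ratio(theta_c, margin, true_log_rr))

# Illustrative grid of assumed control effectiveness values, 50 % margin.
for theta_c in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(theta_c, round(percent_reduction(theta_c, 0.5), 1))
```

For a control effectiveness of 60 % and a margin of 50 %, equally effective agents give a reduction of about 33 % under these assumptions. Passing a negative `true_log_rr` corresponds to assuming that the experimental agent is more effective than the control.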

### Figure 2:

## 5 Efficacy and Effectiveness

There is a key distinction between efficacy and effectiveness (Sommer and Zeger 1991; Dai et al. 2013; Wilder-Smith et al. 2017). Efficacy, the measure of key interest to regulators, is the effect of the intervention under idealised conditions, including taking the drug precisely as prescribed. Effectiveness, of more relevance to public health decision makers, is the effect of the intervention in real-life clinical practice, allowing for imperfect adherence. Our paper thus far has implicitly referred to effectiveness (θ); in this section, we consider the relationship between effectiveness and efficacy for the different effect measures.

For simplicity, we assume that effectiveness is a function of efficacy and adherence only. A meaningful definition of adherence is not straightforward, particularly for “on demand” regimens, since the presence of drug during periods without risky sex is irrelevant (Molina et al. 2015). Again simplifying, in a binary manner, we consider that for each sex act involving exposure to HIV there is: (a) a probability P that there are protective PrEP concentrations, which multiply the risk of acquiring infection by a factor (1 − ψ); (b) a probability (1 − P) that PrEP concentrations are wholly inadequate and confer *zero* protection against infection. By definition, ψ denotes PrEP efficacy. Our approach has parallels with that of Dai et al., who also invoke a binary division in a counterfactual framework, but splitting participants (rather than sex acts) into compliers and non-compliers (Dai et al. 2013). In a more empirical approach, Hanscom et al. define adherence as the proportion of active-arm participants with detectable levels of PrEP (Hanscom et al. 2018). More realistic, but less tractable, models would allow the level of protection to be a continuous function of PrEP drug concentrations (Anderson et al. 2012).

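In our framework, again using the subscripts E, C, and P to denote the experimental, control, and hypothetical placebo groups:

$$\lambda_E = \lambda_P\,[P_E(1-\psi_E) + (1-P_E)] = \lambda_P\,(1 - P_E\,\psi_E)$$

In terms of effectiveness,

$$\theta_E = 1 - \frac{\lambda_E}{\lambda_P} = P_E\,\psi_E$$

and similarly for $\theta_C$.

Manipulating eq. (3),

$$\mathrm{AIR} = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} = \frac{\theta_E}{\theta_C} = \frac{P_E\,\psi_E}{P_C\,\psi_C}$$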

As we pointed out previously, the AIR can be expressed as the effectiveness of the experimental agent divided by the effectiveness of the active-control agent (Dunn et al. 2018). Surprisingly, it is also equal to the ratio of the respective efficacies if $P_E = P_C$ (equal adherence in the two groups).

In contrast, both the rate ratio and rate difference depend on the level of adherence, even if this is equal between the two groups.

These measures are drawn closer to the null values of one and zero, respectively, the higher the level of non-adherence. Figure 3 shows how the various measures relate to adherence (as quantified by the parameter P described above) when
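The behaviour of the three measures under the binary adherence model can be illustrated with a minimal sketch. It assumes that incidence in each active arm is $\lambda_P(1 - P\psi)$, which follows from mixing protected and unprotected exposures as described above. The efficacy values, the placebo incidence, and all names are hypothetical, chosen only for illustration, and are not the values used for Figure 3.

```python
def incidence(lam_p, adherence, efficacy):
    """Incidence when a fraction `adherence` of exposures is protected with
    relative risk (1 - efficacy) and the remainder has no protection."""
    return lam_p * (1.0 - adherence * efficacy)

def effect_measures(adherence, psi_e, psi_c, lam_p):
    """AIR, rate ratio and rate difference when adherence is equal in both arms."""
    lam_e = incidence(lam_p, adherence, psi_e)
    lam_c = incidence(lam_p, adherence, psi_c)
    return {
        "air": (lam_p - lam_e) / (lam_p - lam_c),
        "rate_ratio": lam_e / lam_c,
        "rate_difference": lam_e - lam_c,
    }

# Hypothetical inputs: efficacies of 0.8 (experimental) and 0.9 (control),
# placebo incidence of 5 per 100 person-years.
PSI_E, PSI_C, LAM_P = 0.8, 0.9, 5.0

for adherence in (0.25, 0.50, 0.75, 1.00):
    m = effect_measures(adherence, PSI_E, PSI_C, LAM_P)
    print(adherence, round(m["air"], 3), round(m["rate_ratio"], 3),
          round(m["rate_difference"], 3))
```

In this sketch the AIR stays at the efficacy ratio $\psi_E/\psi_C$ for every adherence level, whereas the rate ratio moves towards one and the rate difference towards zero as adherence falls.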

### Figure 3:

## 6 Discussion

In this short note we provide more mathematical detail about the AIR than we presented in our original exposition (Dunn et al. 2018). Apart from ease of interpretation, its adoption allows the use of smaller sample sizes compared with basing inference on the rate ratio. These savings are substantial, typically between 30 % and 40 % for plausible assumptions about the effectiveness of the active control agent. The reduction in sample size may seem like statistical sleight-of-hand, merely being a consequence of using a less stringent non-inferiority margin for the rate ratio. Our counter-argument is that the AIR is a more meaningful scale for assessing preservation of effect. Work in progress on sample size calculations based on the hypothetical placebo incidence (rather than the effectiveness of the control agent) suggests an even greater advantage in using the AIR.

Another interpretational advantage of the AIR is that it reflects preservation of effect for both effectiveness and efficacy for two agents with the same adherence. We stress that this interpretation only applies under our postulated model, which includes simplistic assumptions. Further research is required to test the robustness of this conclusion under different model formulations. Irrespective of the intended analytical approach, there are compelling reasons to attempt to measure adherence within a trial, including generating knowledge to allow development of more realistic causal models and to examine the plausibility of prior assumptions about the effectiveness of the control agent (Dai et al. 2013; Hanscom et al. 2018).

Finally, although the AIR was developed in the context of non-inferiority trials, this does not preclude its application to superiority trials. Indeed, we would argue that it offers the same interpretational advantages over the rate ratio in this setting as well. The gains in statistical power will also hold given that the problems of sample size calculation for non-inferiority and superiority trials are essentially symmetrical (Dunn, Copas, and Brocklehurst 2018).

**Funding statement:** DTD was supported by the UK Medical Research Council (MRC_UU_12023/23) during preparation of and outside the submitted work. DVG was supported by US National Institutes of Health grants (R03 AI120819, R03 AI122908, R01 AI143357).

### References

Anderson, P. L., D. V. Glidden, A. Liu, S. Buchbinder, J. R. Lama, J. V. Guanira, V. McMahan, et al. 2012. “Emtricitabine-tenofovir Concentrations and Pre-exposure Prophylaxis Efficacy in Men Who Have Sex with Men.” *Science Translational Medicine* 4 (151). doi:10.1126/scitranslmed.3004006. PMID: 22972843.

Cutrell, A., D. Donnell, D. T. Dunn, D. V. Glidden, A. Grobler, B. Hanscom, B. S. Stancil, R. D. Meyer, R. Wang, and R. L. Cuffe. 2017. “HIV Prevention Trial Design in an Era of Effective Pre-exposure Prophylaxis.” *HIV Clinical Trials* 18 (5–6): 177–88. doi:10.1080/15284336.2017.1379676.

Dai, J. Y., P. B. Gilbert, J. P. Hughes, and E. R. Brown. 2013. “Estimating the Efficacy of Preexposure Prophylaxis for HIV Prevention among Participants with a Threshold Level of Drug Concentration.” *American Journal of Epidemiology* 177 (3): 256–63. doi:10.1093/aje/kws324. PMID: 23302152.

Donnell, D., J. P. Hughes, L. Wang, Y. Q. Chen, and T. R. Fleming. 2013. “Study Design Considerations for Evaluating Efficacy of Systemic Preexposure Prophylaxis Interventions.” *Journal of Acquired Immune Deficiency Syndromes* 63: S130–S143. doi:10.1097/QAI.0b013e3182986fac.

Dunn, D. T., A. J. Copas, and P. Brocklehurst. 2018. “Superiority and Non-inferiority: Two Sides of the Same Coin?” *Trials* 19 (1): 499. doi:10.1186/s13063-018-2885-z. PMID: 30223881.

Dunn, D. T., D. V. Glidden, O. T. Stirrup, and S. McCormack. 2018. “The Averted Infections Ratio: A Novel Measure of Effectiveness of Experimental HIV Pre-exposure Prophylaxis Agents.” *Lancet HIV* 5 (6): e329–e334. doi:10.1016/S2352-3018(18)30045-6.

Efron, B. 1977. “The Efficiency of Cox’s Likelihood Function for Censored Data.” *Journal of the American Statistical Association* 72: 557–65. doi:10.1080/01621459.1977.10480613.

Hanscom, B., J. P. Hughes, B. D. Williamson, and D. Donnell. 2018 (October). “Adaptive Non-inferiority Margins under Observable Non-constancy.” *Statistical Methods in Medical Research*. doi:10.1177/0962280218801134. PMID: 30293490.

Janes, Holly, Deborah Donnell, Peter B. Gilbert, Elizabeth R. Brown, and Martha Nason. 2019. “Taking Stock of the Present and Looking Ahead: Envisioning Challenges in the Design of Future HIV Prevention Efficacy Trials.” *Lancet HIV* [Epub ahead of print]. doi:10.1016/s2352-3018(19)30133-x. PMID: 31078451.

Molina, J. M., C. Capitant, B. Spire, G. Pialoux, L. Cotte, I. Charreau, C. Tremblay, et al. 2015. “On-Demand Preexposure Prophylaxis in Men at High Risk for HIV-1 Infection.” *The New England Journal of Medicine* 373 (23): 2237–46. doi:10.1056/NEJMoa1506273. PMID: 26624850.

Nelder, J. A. 1998. “A Large Class of Models Derived from Generalized Linear Models.” *Statistics in Medicine* 17 (23): 2747–53. doi:10.1002/(SICI)1097-0258(19981215)17:23<2747::AID-SIM40>3.0.CO;2-I. PMID: 9881420.

Snapinn, S., and Q. Jiang. 2008. “Preservation of Effect and the Regulatory Approval of New Treatments on the Basis of Non-inferiority Trials.” *Statistics in Medicine* 27 (3): 382–91. doi:10.1002/sim.3073. PMID: 17914712.

Sommer, A., and S. L. Zeger. 1991. “On Estimating Efficacy from Clinical Trials.” *Statistics in Medicine* 10: 45–52. doi:10.1002/sim.4780100110. PMID: 2006355.

Wilder-Smith, A., I. Longini, P. L. Zuber, T. Barnighausen, W. J. Edmunds, N. Dean, V. M. Spicher, M. R. Benissa, and B. D. Gessner. 2017. “The Public Health Value of Vaccines beyond Efficacy: Methods, Measures and Outcomes.” *BMC Medicine* 15 (1): 138. doi:10.1186/s12916-017-0911-8. PMID: 28743299.

Zhu, Haiyuan. 2016. “Sample Size Calculation for Comparing Two Poisson or Negative Binomial Rates in Noninferiority or Equivalence Trials.” *Statistics in Biopharmaceutical Research* 9 (1): 107–15. doi:10.1080/19466315.2016.1225594.

**Received:** 2019-02-15

**Revised:** 2019-06-04

**Accepted:** 2019-06-12

**Published Online:** 2019-07-12

© 2019 Dunn and Glidden, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.