Abstract
The design and analysis of active-control trials to evaluate experimental HIV pre-exposure prophylaxis (PrEP) agents pose serious statistical challenges. We recently proposed a new outcome measure, the averted infections ratio (AIR) – the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The main aim of the current paper is to examine the mathematical connection between AIR and the HIV incidence rate ratio, the standard outcome measure. We also consider the sample size implications of the choice of primary outcome measure and explore the connection between effectiveness and efficacy under a simplified model of adherence.
1 Introduction
Late-phase trials of experimental HIV pre-exposure prophylaxis (PrEP) agents are currently designed as active-control trials, with oral TDF-FTC, currently the only drug licensed for this indication, constituting the control regimen. In the absence of a validated surrogate, the primary endpoint is an incident HIV infection (Cutrell et al. 2017; Janes et al. 2019). The standard primary outcome measure, following the approach used in earlier placebo-controlled trials, is the HIV incidence rate ratio comparing the experimental and control groups (Cutrell et al. 2017; Donnell et al. 2013). In a recent paper, we pointed out serious difficulties in the interpretation of this measure and described an alternative measure of effectiveness, the averted infections ratio (AIR), based on the concept of averted infections (Dunn et al. 2018). The AIR is interpreted as the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The measure is simple to interpret, has direct clinical and public health relevance, and is a natural preservation-of-effect metric for assessing statistical non-inferiority.
The main aim of this paper is to examine the connection between AIR and the rate ratio in more mathematical detail, and to explain how the AIR allows a reduction in sample size for the same level of statistical power. We also point out a curious feature of the AIR concerning effectiveness and efficacy under a simplified model of adherence.
2 Statistical Formulation
Currently, most studies of experimental PrEP agents are designed as non-inferiority trials, where the primary aim is to show that HIV incidence is not unacceptably higher with the experimental agent than with TDF-FTC. This is formally judged by whether the observed confidence limit (lower or upper, as appropriate) for the primary outcome measure exceeds a pre-defined non-inferiority margin. A “preservation of effect” argument is often used as a basis for this margin, i. e. the aim is to show that the experimental agent preserves a minimum fraction of the effect of TDF-FTC relative to placebo or no treatment (Snapinn and Jiang 2008).
Denote the experimental and control groups by the subscripts E and C, respectively. We also consider a hypothetical placebo group denoted by the subscript P. Let λ (subscripted by E, C, or P) denote the population-average HIV incidence rate (used interchangeably for both the parameter and the estimator), and let Δ denote the non-inferiority margin.
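The standard analytical approach considers inference on a log incidence scale, the “natural” parameterisation for the exponential family of distributions (Nelder 1998). Non-inferiority is demonstrated if it is shown (probabilistically) that
$$\log \lambda_P - \log \lambda_E > \Delta \, (\log \lambda_P - \log \lambda_C) \tag{1}$$
or, equivalently (given that the control agent is effective, so that $\lambda_P > \lambda_C$), that
$$\frac{\log \lambda_P - \log \lambda_E}{\log \lambda_P - \log \lambda_C} > \Delta \tag{2}$$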
Denote the expression on the LHS of eq. (2) as STD (shorthand for Standard). Although this formulation is not conventional, it facilitates a comparison with the AIR. The latter is defined by (Dunn et al. 2018):
$$\mathrm{AIR} = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} \tag{3}$$
The superficial similarity of eqs. (2) and (3) masks a key difference: AIR is essentially a (standardised) rate difference measure and STD essentially a (standardised) rate ratio measure.
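Specifying the effectiveness of the control agent (relative to placebo) as
$$\theta_C = 1 - \frac{\lambda_C}{\lambda_P} \tag{4}$$
Re-arranging eq. (4) and substituting in eqs. (2) and (3), STD can be expressed as
$$\mathrm{STD} = 1 + \frac{\log(\lambda_E / \lambda_C)}{\log(1 - \theta_C)} \tag{5}$$
and AIR expressed as
$$\mathrm{AIR} = 1 - \frac{1 - \theta_C}{\theta_C} \left( \frac{\lambda_E}{\lambda_C} - 1 \right) \tag{6}$$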
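Eq. (6) reveals an interesting point. Although AIR was formulated as a rate difference based measure, when estimated via eq. (6) for a given value of the control effectiveness $\theta_C$, it depends on the trial data only through the rate ratio $\lambda_E / \lambda_C$.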
3 Comparison of AIR and STD
We exemplify the difference between AIR and STD using a hypothetical two-arm active-control trial. The following conditions are fixed: (a) equal follow-up in the control and experimental arms; (b) 40 HIV endpoints in the control arm; (c) control agent effectiveness of 60 % (relative to placebo). The number of HIV endpoints in the experimental arm is allowed to vary between 20 and 70 (rate ratio of 0.50 to 1.75).
Figure 1 shows the relationship between AIR and STD and the number of HIV endpoints in the experimental arm. Both AIR and STD are equal to one when 40 HIV endpoints are also observed in the experimental arm (i. e. the two agents are equally effective). Both measures are greater than 1 when there are fewer than 40 endpoints in the experimental arm (i. e. it is more effective than the control agent), although the AIR is less than STD. Conversely, the AIR is greater than STD when there are more than 40 endpoints in the experimental arm (i. e. it is less effective than the control agent).
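As a worked illustration of these point estimates (a sketch only, assuming AIR is the ratio of rate differences $(\lambda_P - \lambda_E)/(\lambda_P - \lambda_C)$ and STD the corresponding ratio of log-rate differences), note that with equal follow-up the endpoint counts can stand in for the rates, and a control effectiveness of 60 % implies $40/(1 - 0.6) = 100$ endpoints in a hypothetical placebo arm. The choices of 30 and 50 experimental-arm endpoints are ours, for illustration:

```latex
\begin{align*}
\text{30 endpoints:}\quad
  \mathrm{AIR} &= \frac{100 - 30}{100 - 40} \approx 1.17, &
  \mathrm{STD} &= \frac{\log 100 - \log 30}{\log 100 - \log 40} \approx 1.31 \\[4pt]
\text{50 endpoints:}\quad
  \mathrm{AIR} &= \frac{100 - 50}{100 - 40} \approx 0.83, &
  \mathrm{STD} &= \frac{\log 100 - \log 50}{\log 100 - \log 40} \approx 0.76
\end{align*}
```

In both cases the AIR lies closer to one than STD, in line with the pattern described for Figure 1.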

Figure 1: Comparison of point estimates and lower 5 % confidence limit for AIR and STD estimators. Analysis based on a hypothetical two-arm active-control trial (details specified in Section 3).
The lower confidence limits, upon which the assessment of non-inferiority is based, are more pertinent than the point estimates. The lower 5 % confidence limits are represented by dotted lines in Figure 1, along with a grey horizontal line representing a non-inferiority margin of 50 %. Focussing on where these lines intersect, non-inferiority is seen to be demonstrated by AIR if there are 49 or fewer HIV endpoints in the experimental arm, and by STD if there are 44 or fewer. That is, if between 45 and 49 HIV endpoints are observed then non-inferiority is demonstrated by AIR but not by STD. This implies that greater statistical power can be achieved by using the AIR rather than the rate ratio. The following section looks at this algebraically.
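The thresholds quoted above can be checked with a short Python sketch. It is not the authors' code: it assumes a Wald interval for the log rate ratio with standard error sqrt(1/d_E + 1/d_C), a one-sided 5 % limit, and a control effectiveness fixed at 60 % without sampling error. The function names are hypothetical.

```python
import math
from statistics import NormalDist

THETA_C = 0.60   # control effectiveness relative to placebo (Section 3)
D_C = 40         # HIV endpoints in the control arm (Section 3)
MARGIN = 0.50    # non-inferiority margin
Z = NormalDist().inv_cdf(0.95)  # quantile for a one-sided 5 % limit


def lower_limits(d_e, d_c=D_C, theta_c=THETA_C):
    """Return the lower 5 % limits (AIR, STD) for d_e experimental-arm endpoints."""
    log_rr = math.log(d_e / d_c)        # equal follow-up: counts stand in for rates
    se = math.sqrt(1 / d_e + 1 / d_c)   # assumed Wald standard error of the log rate ratio
    rr_upper = math.exp(log_rr + Z * se)
    # A high rate ratio means a low AIR/STD, so the upper limit of the
    # rate ratio maps to the lower limit of each measure.
    air_lower = 1 - (1 - theta_c) / theta_c * (rr_upper - 1)
    std_lower = 1 + math.log(rr_upper) / math.log(1 - theta_c)
    return air_lower, std_lower


def max_endpoints(index):
    """Largest endpoint count in 20..70 whose lower limit exceeds the margin."""
    return max(d for d in range(20, 71) if lower_limits(d)[index] > MARGIN)


print("AIR:", max_endpoints(0), "STD:", max_endpoints(1))
```

Under these assumptions the sketch gives 49 endpoints for AIR and 44 for STD, matching the values read from Figure 1.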
4 Implications for Sample Size by Using the AIR Rather than Rate Ratio
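From eq. (5), non-inferiority is demonstrated by the rate ratio if the upper confidence limit for $\log(\lambda_E / \lambda_C)$ is less than $-(1 - \Delta) \log(1 - \theta_C)$.
As inference regarding non-inferiority is based on $\log(\lambda_E / \lambda_C)$, the approximate sample size is inversely proportional to $[-(1 - \Delta) \log(1 - \theta_C) - \log(\lambda_E / \lambda_C)]^2$.
Similarly, under the AIR approach, eq. (6) implies that non-inferiority is demonstrated if the upper confidence limit for $\log(\lambda_E / \lambda_C)$ is less than $\log\{1 + (1 - \Delta)\theta_C / (1 - \theta_C)\}$, so the approximate sample size is inversely proportional to $[\log\{1 + (1 - \Delta)\theta_C / (1 - \theta_C)\} - \log(\lambda_E / \lambda_C)]^2$.
Thus the ratio of sample sizes (more precisely, the required person-years follow-up) under the standard statistical approach compared with AIR is given by:
$$\left[ \frac{\log\{1 + (1 - \Delta)\theta_C / (1 - \theta_C)\} - \log(\lambda_E / \lambda_C)}{-(1 - \Delta) \log(1 - \theta_C) - \log(\lambda_E / \lambda_C)} \right]^2$$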
In some studies (e. g. HPTN-083 trial comparing injectable cabotegravir versus oral TDF-FTC) the experimental agent is assumed to be more effective than the control agent for the purposes of sample size calculation. However, power is often evaluated (e. g. DISCOVER trial comparing oral TAF-FTC versus oral TDF-FTC) assuming the experimental and control agents are equally effective (i. e. $\lambda_E = \lambda_C$).
Figure 2 shows the percentage reduction in sample size (a simple transformation of the ratio) plotted against the effectiveness of the control drug, for different values of the rate ratio.

Figure 2: Percentage reduction in sample size achieved by using AIR rather than rate ratio as primary effect measure, according to control drug effectiveness and rate ratio (experimental to control arms). All input parameters assumed equal. Non-inferiority margin (Δ) = 0.5.
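The sample size comparison in this section can be sketched numerically. The Python function below is an illustration under stated assumptions, not the authors' calculation: both approaches base inference on the log rate ratio with the same variance, so required person-years scale with the inverse square of the distance between the non-inferiority bound and the assumed log rate ratio. The bounds are those implied by requiring STD > Δ and AIR > Δ for a fixed control effectiveness; the function name and argument names are hypothetical.

```python
import math


def sample_size_reduction(theta_c, rate_ratio=1.0, delta=0.5):
    """Fractional reduction in person-years from using AIR instead of STD.

    theta_c    : assumed control effectiveness relative to placebo
    rate_ratio : assumed true rate ratio (experimental to control)
    delta      : non-inferiority margin (fraction of effect preserved)
    """
    log_rr = math.log(rate_ratio)
    # Bound on the log rate ratio implied by STD > delta
    bound_std = -(1 - delta) * math.log(1 - theta_c)
    # Bound on the log rate ratio implied by AIR > delta
    bound_air = math.log(1 + (1 - delta) * theta_c / (1 - theta_c))
    # Person-years under STD divided by person-years under AIR
    ratio = ((bound_air - log_rr) / (bound_std - log_rr)) ** 2
    return 1 - 1 / ratio


for theta_c in (0.5, 0.6, 0.7):
    print(f"theta_C = {theta_c:.1f}: {100 * sample_size_reduction(theta_c):.0f} % reduction")
```

With equally effective agents and Δ = 0.5, this sketch gives a reduction of roughly one third when the control effectiveness is 60 %, and the reduction grows as the assumed control effectiveness increases.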
5 Efficacy and Effectiveness
There is a key distinction between efficacy and effectiveness (Sommer and Zeger 1991; Dai et al. 2013; Wilder-Smith et al. 2017). Efficacy, the measure of key interest to regulators, is the effect of the intervention under idealised conditions, including taking the drug precisely as prescribed. Effectiveness, of more relevance to public health decision makers, is the effect of the intervention in real-life clinical practice, allowing for imperfect adherence. Our paper thus far has implicitly referred to effectiveness (θ); in this section, we consider the relationship between effectiveness and efficacy for the different effect measures.
For simplicity, we assume that effectiveness is a function of efficacy and adherence only. A meaningful definition of adherence is not straightforward, particularly for “on demand” regimens, since the presence of drug during periods without risky sex is irrelevant (Molina et al. 2015). Simplifying again, we treat adherence as binary and consider that for each sex act involving exposure to HIV there is: (a) a probability P that there are protective PrEP concentrations, which multiply the risk of acquiring infection by a factor (1 − ψ); (b) a probability (1 − P) that PrEP concentrations are wholly inadequate and confer zero protection against infection. By definition, ψ denotes PrEP efficacy. Our approach has parallels with that of Dai et al., who also invoke a binary division in a counterfactual framework, but split participants (rather than sex acts) into compliers and non-compliers (Dai et al. 2013). In a more empirical approach, Hanscom et al. define adherence as the proportion of active-arm participants with detectable levels of PrEP (Hanscom et al. 2018). More realistic, but less tractable, models would allow the level of protection to be a continuous function of PrEP drug concentrations (Anderson et al. 2012).
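In our framework, again using the subscripts E, C, and P to denote the experimental, control, and hypothetical placebo groups:
$$\lambda_E = \lambda_P (1 - P_E \psi_E), \qquad \lambda_C = \lambda_P (1 - P_C \psi_C)$$
In terms of effectiveness,
$$\theta_E = 1 - \frac{\lambda_E}{\lambda_P} = P_E \psi_E$$
and similarly for $\theta_C$.
Manipulating eq. (3),
$$\mathrm{AIR} = \frac{\theta_E}{\theta_C} = \frac{P_E \psi_E}{P_C \psi_C}$$
As we pointed out previously, the AIR can be expressed as the effectiveness of the experimental agent divided by the effectiveness of the active-control agent (Dunn et al. 2018). Surprisingly, it is also equal to the ratio of the respective efficacies if adherence is the same for the two agents ($P_E = P_C$).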
In contrast, both the rate ratio and rate difference depend on the level of adherence, even if this is equal between the two groups.
These measures are drawn closer to the null values of one and zero, respectively, the higher the level of non-adherence. Figure 3 shows how the various measures relate to adherence (as quantified by the parameter P described above) when adherence is equal in the two groups, under the assumptions listed in the figure footnote.

Figure 3: Point estimates of AIR, rate ratio, and rate difference as a function of adherence under a simplified model (see Section 5). Adherence represents the probability of protective PrEP concentrations during each sex act involving exposure to HIV. Footnote. Assumptions: control drug effectiveness = 80 %, experimental drug effectiveness = 60 %, placebo incidence = 5 per 100 PY.
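The behaviour of the three measures under the simplified adherence model can be illustrated with a minimal Python sketch. It assumes that each arm's incidence is the placebo incidence multiplied by (1 − Pψ), with a common adherence P in both arms, and it treats the 80 % and 60 % values in the figure footnote as the efficacies ψ of the control and experimental agents (our reading, since effectiveness and efficacy coincide at full adherence). The function name is hypothetical.

```python
def measures(p, psi_e=0.60, psi_c=0.80, lam_p=5.0):
    """Return (AIR, rate ratio, rate difference) for common adherence p (0 < p <= 1).

    psi_e, psi_c : assumed efficacies of the experimental and control agents
    lam_p        : placebo incidence per 100 person-years
    """
    lam_e = lam_p * (1 - p * psi_e)   # experimental-arm incidence
    lam_c = lam_p * (1 - p * psi_c)   # control-arm incidence
    air = (lam_p - lam_e) / (lam_p - lam_c)
    return air, lam_e / lam_c, lam_e - lam_c


for p in (0.25, 0.50, 0.75, 1.00):
    air, rr, rd = measures(p)
    print(f"P = {p:.2f}: AIR = {air:.2f}, rate ratio = {rr:.2f}, rate difference = {rd:.2f}")
```

In this sketch the AIR stays at ψ_E/ψ_C = 0.75 for every value of P, whereas the rate ratio and rate difference move towards one and zero as P falls.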
6 Discussion
In this short note we provide more mathematical detail about the AIR than we presented in our original exposition (Dunn et al. 2018). Apart from ease of interpretation, its adoption allows the use of smaller sample sizes compared with basing inference on the rate ratio. These savings are substantial, typically between 30 % and 40 % for plausible assumptions about the effectiveness of the active control agent. The reduction in sample size may seem like statistical sleight-of-hand, merely being a consequence of using a less stringent non-inferiority margin for the rate ratio. Our counter-argument is that the AIR is a more meaningful scale for assessing preservation of effect. Work in progress on sample size calculations based on the hypothetical placebo incidence (rather than the effectiveness of the control agent) suggests an even greater advantage in using the AIR.
Another interpretational advantage of the AIR is that it reflects preservation of effect for both effectiveness and efficacy for two agents with the same adherence. We stress that this interpretation only applies under our postulated model, which includes simplistic assumptions. Further research is required to test the robustness of this conclusion under different model formulations. Irrespective of the intended analytical approach, there are compelling reasons to attempt to measure adherence within a trial, including generating knowledge to allow development of more realistic causal models and to examine the plausibility of prior assumptions about the effectiveness of the control agent (Dai et al. 2013; Hanscom et al. 2018).
Finally, although the AIR was developed in the context of non-inferiority trials, this does not preclude its application to superiority trials. Indeed, we would argue that it offers the same interpretational advantages over the rate ratio in this setting as well. The gains in statistical power will also hold given that the problems of sample size calculation for non-inferiority and superiority trials are essentially symmetrical (Dunn, Copas, and Brocklehurst 2018).
Funding statement: DTD was supported by the UK Medical Research Council (MRC_UU_12023/23) during preparation of and outside the submitted work. DVG was supported by US National Institutes of Health grants (R03 AI120819, R03 AI122908, R01 AI143357).
References
Anderson, P. L., D. V. Glidden, A. Liu, S. Buchbinder, J. R. Lama, J. V. Guanira, V. McMahan, et al. 2012. “Emtricitabine-tenofovir Concentrations and Pre-exposure Prophylaxis Efficacy in Men Who Have Sex with Men.” Science Translational Medicine 4 (151). https://doi.org/10.1126/scitranslmed.3004006.
Cutrell, A., D. Donnell, D. T. Dunn, D. V. Glidden, A. Grobler, B. Hanscom, B. S. Stancil, R. D. Meyer, R. Wang, and R. L. Cuffe. 2017. “HIV Prevention Trial Design in an Era of Effective Pre-exposure Prophylaxis.” HIV Clinical Trials 18 (5–6): 177–88. https://doi.org/10.1080/15284336.2017.1379676.
Dai, J. Y., P. B. Gilbert, J. P. Hughes, and E. R. Brown. 2013. “Estimating the Efficacy of Preexposure Prophylaxis for HIV Prevention among Participants with a Threshold Level of Drug Concentration.” American Journal of Epidemiology 177 (3): 256–63. https://doi.org/10.1093/aje/kws324.
Donnell, D., J. P. Hughes, L. Wang, Y. Q. Chen, and T. R. Fleming. 2013. “Study Design Considerations for Evaluating Efficacy of Systemic Preexposure Prophylaxis Interventions.” Journal of Acquired Immune Deficiency Syndromes 63: S130–S143. https://doi.org/10.1097/QAI.0b013e3182986fac.
Dunn, D. T., A. J. Copas, and P. Brocklehurst. 2018. “Superiority and Non-inferiority: Two Sides of the Same Coin?” Trials 19 (1): 499. https://doi.org/10.1186/s13063-018-2885-z.
Dunn, D. T., D. V. Glidden, O. T. Stirrup, and S. McCormack. 2018. “The Averted Infections Ratio: A Novel Measure of Effectiveness of Experimental HIV Pre-exposure Prophylaxis Agents.” Lancet HIV 5 (6): e329–e334. https://doi.org/10.1016/S2352-3018(18)30045-6.
Efron, B. 1977. “The Efficiency of Cox’s Likelihood Function for Censored Data.” Journal of the American Statistical Association 72: 557–65. https://doi.org/10.1080/01621459.1977.10480613.
Hanscom, B., J. P. Hughes, B. D. Williamson, and D. Donnell. 2018 (October). “Adaptive Non-inferiority Margins under Observable Non-constancy.” Statistical Methods in Medical Research. https://doi.org/10.1177/0962280218801134.
Janes, Holly, Deborah Donnell, Peter B. Gilbert, Elizabeth R. Brown, and Martha Nason. 2019. “Taking Stock of the Present and Looking Ahead: Envisioning Challenges in the Design of Future HIV Prevention Efficacy Trials.” Lancet HIV. [Epub ahead of print]. https://doi.org/10.1016/s2352-3018(19)30133-x.
Molina, J. M., C. Capitant, B. Spire, G. Pialoux, L. Cotte, I. Charreau, C. Tremblay, et al. 2015. “On-Demand Preexposure Prophylaxis in Men at High Risk for HIV-1 Infection.” The New England Journal of Medicine 373 (23): 2237–46. https://doi.org/10.1056/NEJMoa1506273.
Nelder, J. A. 1998. “A Large Class of Models Derived from Generalized Linear Models.” Statistics in Medicine 17 (23): 2747–53. https://doi.org/10.1002/(SICI)1097-0258(19981215)17:23<2747::AID-SIM40>3.0.CO;2-I.
Snapinn, S., and Q. Jiang. 2008. “Preservation of Effect and the Regulatory Approval of New Treatments on the Basis of Non-inferiority Trials.” Statistics in Medicine 27 (3): 382–91. https://doi.org/10.1002/sim.3073.
Sommer, A., and S. L. Zeger. 1991. “On Estimating Efficacy from Clinical Trials.” Statistics in Medicine 10: 45–52. https://doi.org/10.1002/sim.4780100110.
Wilder-Smith, A., I. Longini, P. L. Zuber, T. Barnighausen, W. J. Edmunds, N. Dean, V. M. Spicher, M. R. Benissa, and B. D. Gessner. 2017. “The Public Health Value of Vaccines beyond Efficacy: Methods, Measures and Outcomes.” BMC Medicine 15 (1): 138. https://doi.org/10.1186/s12916-017-0911-8.
Zhu, Haiyuan. 2016. “Sample Size Calculation for Comparing Two Poisson or Negative Binomial Rates in Noninferiority or Equivalence Trials.” Statistics in Biopharmaceutical Research 9 (1): 107–15. https://doi.org/10.1080/19466315.2016.1225594.
© 2019 Dunn and Glidden, published by De Gruyter
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.