BY-NC-ND 4.0 license Open Access Published by De Gruyter July 12, 2019

The Connection between the Averted Infections Ratio and the Rate Ratio in Active-control Trials of Pre-exposure Prophylaxis Agents

David T. Dunn and David V. Glidden

Abstract

The design and analysis of active-control trials to evaluate experimental HIV pre-exposure prophylaxis (PrEP) agents pose serious statistical challenges. We recently proposed a new outcome measure, the averted infections ratio (AIR) – the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The main aim of the current paper is to examine the mathematical connection between AIR and the HIV incidence rate ratio, the standard outcome measure. We also consider the sample size implications of the choice of primary outcome measure and explore the connection between effectiveness and efficacy under a simplified model of adherence.

1 Introduction

Late-phase trials of experimental HIV pre-exposure prophylaxis (PrEP) agents are now designed as active-control trials, with oral TDF-FTC, currently the only drug licensed for this indication, as the control regimen. In the absence of a validated surrogate, the primary endpoint is incident HIV infection (Cutrell et al. 2017; Janes et al. 2019). The standard primary outcome measure, following the approach used in earlier placebo-controlled trials, is the HIV incidence rate ratio comparing the experimental and control groups (Cutrell et al. 2017; Donnell et al. 2013). In a recent paper, we pointed out serious difficulties in the interpretation of this measure and described an alternative measure of effectiveness, the averted infections ratio (AIR), based on the concept of averted infections (Dunn et al. 2018). The AIR is interpreted as the proportion of infections that would be averted by using the experimental agent rather than the control agent (compared to no intervention). The measure is simple to interpret, has direct clinical and public health relevance, and is a natural preservation-of-effect metric for assessing statistical non-inferiority.

The main aim of this paper is to examine the connection between AIR and the rate ratio in more mathematical detail, and to explain how the AIR allows a reduction in sample size for the same level of statistical power. We also point out a curious feature of the AIR concerning effectiveness and efficacy under a simplified model of adherence.

2 Statistical Formulation

Currently, most studies of experimental PrEP agents are designed as non-inferiority trials, where the primary aim is to show that HIV incidence is not unacceptably higher with the experimental agent than with TDF-FTC. This is formally judged by whether the observed confidence limit (lower or upper, as appropriate) for the primary outcome measure lies on the favourable side of a pre-defined non-inferiority margin. A “preservation of effect” argument is often used as a basis for this margin, i. e. the aim is to show that the experimental agent preserves a minimum fraction of the effect of TDF-FTC relative to placebo or no treatment (Snapinn and Jiang 2008).

Denote the experimental and control groups by the subscripts E and C, respectively. We also consider a hypothetical placebo group denoted by the subscript P. Let λ (subscripted by E, C, or P) denote the population-average HIV incidence rate (used interchangeably for both the parameter and the estimator), and let Δ denote the non-inferiority margin.

The standard analytical approach considers inference on a log incidence scale, the “natural” parameterisation for the exponential family of distributions (Nelder 1998). Non-inferiority is demonstrated if it is shown (probabilistically) that

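$$\log(\lambda_E) - \log(\lambda_C) < (1-\Delta)\left[\log(\lambda_P) - \log(\lambda_C)\right] \tag{1}$$

or, equivalently (rearranging and dividing by log(λP) − log(λC), which is positive when the control agent is effective),

$$\frac{\log(\lambda_P) - \log(\lambda_E)}{\log(\lambda_P) - \log(\lambda_C)} > \Delta \tag{2}$$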

Denote the expression on the LHS of eq. (2) as STD (shorthand for Standard). Although this formulation is not conventional, it facilitates a comparison with the AIR. The latter is defined by (Dunn et al. 2018):

$$\text{AIR} = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} \tag{3}$$

The superficial similarity of eqs. (2) and (3) masks a key difference: AIR is essentially a (standardised) rate difference measure and STD essentially a (standardised) rate ratio measure.
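As a small numerical illustration of eqs. (2) and (3), the Python sketch below computes both measures from incidence rates. The rates are hypothetical (per 100 person-years) and the function names are illustrative only. Both measures equal 1 when λE = λC and 0 when λE = λP, but they can differ noticeably in between.

```python
import math


def std_measure(lam_p: float, lam_e: float, lam_c: float) -> float:
    """STD, the left-hand side of eq. (2): a ratio of differences in log incidence."""
    return (math.log(lam_p) - math.log(lam_e)) / (math.log(lam_p) - math.log(lam_c))


def air_measure(lam_p: float, lam_e: float, lam_c: float) -> float:
    """AIR, eq. (3): a ratio of differences in incidence (infections averted)."""
    return (lam_p - lam_e) / (lam_p - lam_c)


# Hypothetical rates per 100 person-years: no intervention, control, experimental.
lam_p, lam_c, lam_e = 5.0, 2.0, 2.5
print(f"AIR = {air_measure(lam_p, lam_e, lam_c):.4f}")  # AIR = 0.8333
print(f"STD = {std_measure(lam_p, lam_e, lam_c):.4f}")  # STD = 0.7565
```

Here the experimental agent averts 2.5 of the 3 infections per 100 person-years averted by the control agent, so AIR ≈ 0.83, whereas STD ≈ 0.76 because it compares the same rates on the log scale.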

Specifying λP, the incidence rate that would have been observed in the absence of an intervention, is often not practicable. Another tack is to make inferences via the assumed effectiveness of the control group agent:

$$\theta_C = 1 - \lambda_C/\lambda_P \tag{4}$$

Re-arranging eq. (4) and substituting in eqs. (2) and (3), STD can be expressed as

$$\text{STD} = 1 - \frac{\log(\lambda_C/\lambda_E)}{\log(1-\theta_C)} \tag{5}$$

and AIR expressed as

$$\text{AIR} = \frac{1 - (\lambda_E/\lambda_C)(1-\theta_C)}{\theta_C} \tag{6}$$
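For completeness, the intermediate steps are short. Eq. (4) gives λP = λC/(1 − θC); substituting this into eqs. (2) and (3), and for the AIR multiplying numerator and denominator by (1 − θC)/λC, gives:

```latex
\begin{align*}
\lambda_P &= \frac{\lambda_C}{1-\theta_C} && \text{(from eq. 4)}\\[4pt]
\text{STD} &= \frac{\log\lambda_P - \log\lambda_E}{\log\lambda_P - \log\lambda_C}
            = \frac{\log(\lambda_C/\lambda_E) - \log(1-\theta_C)}{-\log(1-\theta_C)}
            = 1 - \frac{\log(\lambda_C/\lambda_E)}{\log(1-\theta_C)}\\[4pt]
\text{AIR} &= \frac{\lambda_C/(1-\theta_C) - \lambda_E}{\lambda_C/(1-\theta_C) - \lambda_C}
            = \frac{1 - (\lambda_E/\lambda_C)(1-\theta_C)}{\theta_C}
\end{align*}
```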

Eq. (6) reveals an interesting point. Although AIR was formulated as a rate-difference-based measure, when estimated via θC it becomes a linear function of the rate ratio (experimental group relative to control group) observed in the trial. The remainder of this paper considers inference based on θC rather than λP. From this perspective, there is also no need to assume the constant incidence rates implied by eqs. (2), (3), and (4). The only underlying assumption is a constant hazard ratio, which can be estimated by Cox regression models with little or no loss in statistical efficiency (Efron 1977).
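The linearity of eq. (6) in the observed rate ratio, and the non-linearity of eq. (5), can be checked numerically. A minimal Python sketch, assuming θC = 0.6 (the value used in Section 3) and a few illustrative rate ratios; the function names are hypothetical:

```python
import math


def std_from_theta(rate_ratio: float, theta_c: float) -> float:
    """STD via eq. (5); rate_ratio is lambda_E / lambda_C observed in the trial."""
    return 1.0 - math.log(1.0 / rate_ratio) / math.log(1.0 - theta_c)


def air_from_theta(rate_ratio: float, theta_c: float) -> float:
    """AIR via eq. (6): linear in rate_ratio, with slope -(1 - theta_c) / theta_c."""
    return (1.0 - rate_ratio * (1.0 - theta_c)) / theta_c


theta_c = 0.6  # assumed effectiveness of the control agent
for rr in (0.5, 1.0, 1.5, 2.0):
    print(f"RR = {rr:.1f}: AIR = {air_from_theta(rr, theta_c):.4f}, "
          f"STD = {std_from_theta(rr, theta_c):.4f}")
```

Equal steps in the rate ratio produce equal steps in AIR (here a fall of 1/3 per 0.5 increase in the rate ratio), but unequal steps in STD.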

3 Comparison of AIR and STD

We illustrate the difference between AIR and STD using a hypothetical two-arm active-control trial. The following conditions are fixed: (a) equal follow-up in the control and experimental arms; (b) 40 HIV endpoints in the control arm; and (c) control agent effectiveness of 60 % (relative to placebo). The number of HIV endpoints in the experimental arm is allowed to vary between 20 and 70 (rate ratio of 0.50 to 1.75).

Figure 1 shows AIR and STD as functions of the number of HIV endpoints in the experimental arm. Both AIR and STD equal one when 40 HIV endpoints are also observed in the experimental arm (i. e. the two agents are equally effective). Both measures are greater than 1 when there are fewer than 40 endpoints in the experimental arm (i. e. it is more effective than the control agent), although AIR is smaller than STD. Conversely, AIR is greater than STD when there are more than 40 endpoints in the experimental arm (i. e. it is less effective than the control agent).
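Because follow-up is equal in the two arms, the observed rate ratio is simply the ratio of endpoint counts, d_E/40. The Python sketch below (the names `D_C`, `THETA_C`, `air` and `std` are illustrative) tabulates the point estimates plotted in Figure 1 from eqs. (5) and (6) under the Section 3 assumptions.

```python
import math

D_C = 40        # HIV endpoints in the control arm (Section 3)
THETA_C = 0.6   # assumed effectiveness of the control agent (Section 3)


def air(rr: float) -> float:
    """AIR from eq. (6)."""
    return (1 - rr * (1 - THETA_C)) / THETA_C


def std(rr: float) -> float:
    """STD from eq. (5)."""
    return 1 - math.log(1 / rr) / math.log(1 - THETA_C)


for d_e in range(20, 71, 10):
    rr = d_e / D_C  # equal follow-up, so the rate ratio is the ratio of counts
    print(f"d_E = {d_e:2d}  RR = {rr:.2f}  AIR = {air(rr):.3f}  STD = {std(rr):.3f}")
```

For d_E below 40 the AIR column is the smaller of the two; above 40 it is the larger, matching the ordering described above.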

Figure 1: Comparison of point estimates and lower 5 % confidence limit for AIR and STD estimators. Analysis based on a hypothetical two-arm active-control trial (details specified in Section 3).


The lower confidence limits, upon which the assessment of non-inferiority is based, are more pertinent than the point estimates. The lower 5 % confidence limits are represented by dotted lines in Figure 1, along with a grey horizontal line representing a non-inferiority margin of 50 %. Focussing on where these lines intersect, non-inferiority is demonstrated by AIR if there are 49 or fewer HIV endpoints in the experimental arm, and by STD if there are 44 or fewer. That is, if between 45 and 49 HIV endpoints are observed, non-inferiority is demonstrated by AIR but not by STD. This implies that greater statistical power can be achieved by using the AIR rather than the rate ratio. The following section examines this algebraically.
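The confidence-limit method is not spelled out above. The Python sketch below assumes a Wald interval for the log rate ratio with standard error √(1/d_E + 1/d_C) and a one-sided 95 % limit (z ≈ 1.645); under these assumptions it reproduces the thresholds of 49 and 44 endpoints. Because both AIR and STD decrease as the rate ratio increases, their lower limits follow from the upper limit of the rate ratio. The helper `lower_limits` is hypothetical.

```python
import math
from statistics import NormalDist

D_C = 40        # control-arm endpoints (Section 3)
THETA_C = 0.6   # assumed control-agent effectiveness (Section 3)
DELTA = 0.5     # non-inferiority margin
Z = NormalDist().inv_cdf(0.95)  # assumed one-sided 95 % limit, about 1.645


def air(rr: float) -> float:
    return (1 - rr * (1 - THETA_C)) / THETA_C            # eq. (6)


def std(rr: float) -> float:
    return 1 - math.log(1 / rr) / math.log(1 - THETA_C)  # eq. (5)


def lower_limits(d_e: int, d_c: int = D_C, z: float = Z) -> tuple:
    """Lower limits of (AIR, STD) from an assumed Wald interval on log(rate ratio).

    Both measures decrease in the rate ratio, so the *upper* limit of the
    rate ratio maps to the *lower* limit of each measure.
    """
    log_rr = math.log(d_e / d_c)
    se = math.sqrt(1 / d_e + 1 / d_c)
    rr_upper = math.exp(log_rr + z * se)
    return air(rr_upper), std(rr_upper)


max_air = max(d for d in range(20, 71) if lower_limits(d)[0] > DELTA)
max_std = max(d for d in range(20, 71) if lower_limits(d)[1] > DELTA)
print(max_air, max_std)
```

With a two-sided 95 % interval (z ≈ 1.96) instead, both thresholds would be lower, so the quoted values appear to correspond to one-sided 95 % limits.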

4 Implications for Sample Size by Using the AIR Rather than Rate Ratio

From eq. (5), non-inferiority is demonstrated by the rate ratio if the upper confidence limit for log(λE/λC) is less than (Δ − 1) log(1 − θC). Similarly, from eq. (6), non-inferiority is demonstrated by the AIR if the upper confidence limit for log(λE/λC) is less than log(1 − θCΔ) − log(1 − θC). It could be questioned whether it is valid to use the same value of Δ for two different metrics, but it is not obvious why one should demand a higher or lower preservation of effect with one metric than with the other.
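The two margins can be compared directly on the rate-ratio scale. A minimal Python sketch (function names are illustrative) shows that, for Δ = 0.5, the AIR criterion always tolerates a larger rate ratio. This holds generally for 0 < θC < 1 and 0 < Δ < 1 as a consequence of Bernoulli's inequality, 1 − θCΔ > (1 − θC)^Δ.

```python
import math


def margin_std(theta_c: float, delta: float) -> float:
    """Bound on log(lambda_E / lambda_C) implied by STD > delta (from eq. 5)."""
    return (delta - 1) * math.log(1 - theta_c)


def margin_air(theta_c: float, delta: float) -> float:
    """Bound on log(lambda_E / lambda_C) implied by AIR > delta (from eq. 6)."""
    return math.log(1 - theta_c * delta) - math.log(1 - theta_c)


DELTA = 0.5
for theta_c in (0.5, 0.6, 0.7, 0.8):
    # Report the margins on the rate-ratio scale for readability.
    rr_std = math.exp(margin_std(theta_c, DELTA))
    rr_air = math.exp(margin_air(theta_c, DELTA))
    print(f"theta_C = {theta_c}: STD margin RR < {rr_std:.3f}, AIR margin RR < {rr_air:.3f}")
```

For θC = 0.6, as in Section 3, the rate-ratio margins are about 1.581 (STD) and 1.750 (AIR).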

As inference regarding non-inferiority is based on log(λE/λC) in both cases, the two approaches differ only in terms of the non-inferiority margin (on the rate ratio scale). Under the standard statistical approach, the approximate sample size to demonstrate non-inferiority, for a specified power and confidence level (and assuming a 1:1 allocation ratio), can be shown to be inversely proportional to (Zhu 2016):

$$\left[(\Delta-1)\log(1-\theta_C) - \log(\lambda_E/\lambda_C)\right]^2 \tag{7}$$

Similarly, under the AIR approach, the approximate sample size is inversely proportional to

$$\left[\log(1-\theta_C\Delta) - \log(1-\theta_C) - \log(\lambda_E/\lambda_C)\right]^2 \tag{8}$$

Thus the ratio of sample sizes (more precisely, of the required person-years of follow-up) under the standard statistical approach compared with AIR is given by:

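$$\left[\frac{\log(1-\theta_C\Delta) - \log(1-\theta_C) - \log(\lambda_E/\lambda_C)}{(\Delta-1)\log(1-\theta_C) - \log(\lambda_E/\lambda_C)}\right]^2 \tag{9}$$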
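The sample size formulas are cited from Zhu (2016) rather than reproduced here. A minimal Python sketch, assuming the simplest variant (a one-sided Wald test on the log rate ratio with Poisson counts and equal follow-up per arm), shows where the inverse-square dependence in eqs. (7) and (8) comes from and how eq. (9) arises as their ratio. The significance level, power, incidence, and function names are all hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def margin_std(theta_c: float, delta: float) -> float:
    """Margin on the log rate-ratio scale implied by STD > delta (eq. 5)."""
    return (delta - 1) * math.log(1 - theta_c)


def margin_air(theta_c: float, delta: float) -> float:
    """Margin on the log rate-ratio scale implied by AIR > delta (eq. 6)."""
    return math.log(1 - theta_c * delta) - math.log(1 - theta_c)


def person_years_per_arm(lam_c: float, rate_ratio: float, margin: float,
                         alpha: float = 0.025, power: float = 0.9) -> float:
    """Approximate follow-up per arm (1:1 allocation) for a one-sided Wald test of
    H0: log(lambda_E / lambda_C) >= margin, assuming Poisson event counts.

    Var(log RR_hat) is approximately (1/lam_e + 1/lam_c) / T for T person-years
    per arm, so T is inversely proportional to (margin - log RR)^2.
    """
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    lam_e = rate_ratio * lam_c
    return z ** 2 * (1 / lam_e + 1 / lam_c) / (margin - math.log(rate_ratio)) ** 2


# Hypothetical inputs: control incidence 1 per 100 PY, equally effective agents,
# theta_C = 0.6 and Delta = 0.5.
t_std = person_years_per_arm(0.01, 1.0, margin_std(0.6, 0.5))
t_air = person_years_per_arm(0.01, 1.0, margin_air(0.6, 0.5))
print(f"STD: {t_std:,.0f} PY per arm; AIR: {t_air:,.0f} PY per arm; ratio {t_std / t_air:.3f}")
```

Changing `lam_c` (and hence λP, for fixed θC) changes both absolute numbers but leaves their ratio unchanged.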

In some studies (e. g. the HPTN-083 trial comparing injectable cabotegravir with oral TDF-FTC), the experimental agent is assumed to be more effective than the control agent for the purposes of sample size calculation. However, power is often evaluated (e. g. in the DISCOVER trial comparing oral TAF-FTC with oral TDF-FTC) assuming that the experimental and control agents are equally effective (λE/λC = 1). In this case eq. (9) simplifies to

$$\left[\frac{\log(1-\theta_C\Delta) - \log(1-\theta_C)}{(\Delta-1)\log(1-\theta_C)}\right]^2 \tag{10}$$

Figure 2 shows the percentage reduction in sample size, 100 × (1 − 1/ratio), plotted against θC in the range 0.5–0.8, for Δ = 0.5 (the commonly accepted value for the non-inferiority margin). When λE/λC = 1, the reduction ranges between 27 % (θC = 0.5) and 46 % (θC = 0.8). The advantage of using the AIR is diminished when λE/λC < 1. For example, when λE/λC = 0.7, the reduction ranges between 15 % (θC = 0.5) and 36 % (θC = 0.8). It is emphasised that these are assumed values for λE/λC, which pertain to the sample size calculation. The actual gain in power by using the AIR depends on the true value of λE/λC, which becomes apparent only once the trial is conducted. We note from eq. (9) that the relative sample sizes are independent of λP, the underlying HIV incidence rate in the study population, although the absolute sample sizes depend on this parameter.
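The percentages quoted above can be recomputed from eq. (9). In the Python sketch below (function names are illustrative), the percentage reduction is taken as 100 × (1 − 1/ratio). With Δ = 0.5 it returns about 27 % and 46 % at θC = 0.5 and 0.8 when λE/λC = 1, and about 15 % and 36 % when λE/λC = 0.7.

```python
import math


def sample_size_ratio(theta_c: float, delta: float, rate_ratio: float = 1.0) -> float:
    """Eq. (9): follow-up needed with the rate ratio (STD) divided by that with AIR."""
    log_rr = math.log(rate_ratio)
    numerator = math.log(1 - theta_c * delta) - math.log(1 - theta_c) - log_rr
    denominator = (delta - 1) * math.log(1 - theta_c) - log_rr
    return (numerator / denominator) ** 2


def pct_reduction(theta_c: float, delta: float, rate_ratio: float = 1.0) -> float:
    """Percentage reduction in follow-up from using AIR: 100 * (1 - 1 / ratio)."""
    return 100 * (1 - 1 / sample_size_ratio(theta_c, delta, rate_ratio))


for rr in (1.0, 0.7):
    row = [round(pct_reduction(t, 0.5, rr)) for t in (0.5, 0.6, 0.7, 0.8)]
    print(f"lambda_E/lambda_C = {rr}: {row}")
```

The intermediate values at λE/λC = 1 (about 33 % at θC = 0.6 and 39 % at θC = 0.7) are in line with the 30 % to 40 % range mentioned in the Discussion.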

Figure 2: Percentage reduction in sample size achieved by using AIR rather than rate ratio as primary effect measure, according to control drug effectiveness and rate ratio (experimental to control arms). All input parameters assumed equal. Non-inferiority margin (Δ) = 0.5.


5 Efficacy and Effectiveness

There is a key distinction between efficacy and effectiveness (Sommer and Zeger 1991; Dai et al. 2013; Wilder-Smith et al. 2017). Efficacy, the measure of key interest to regulators, is the effect of the intervention under idealised conditions, including taking the drug precisely as prescribed. Effectiveness, of more relevance to public health decision makers, is the effect of the intervention in real-life clinical practice, allowing for imperfect adherence. Our paper thus far has implicitly referred to effectiveness (θ); in this section, we consider the relationship between effectiveness and efficacy for the different effect measures.

For simplicity, we assume that effectiveness is a function of efficacy and adherence only. A meaningful definition of adherence is not straightforward, particularly for “on demand” regimens, since the presence of drug during periods without risky sex is irrelevant (Molina et al. 2015). Again simplifying, in a binary manner, we consider that for each sex act involving exposure to HIV there is: (a) a probability P that there are protective PrEP concentrations, which multiply the risk of acquiring infection by a factor (1 − ψ); and (b) a probability (1 − P) that PrEP concentrations are wholly inadequate and confer no protection against infection. By definition, ψ denotes PrEP efficacy. Our approach has parallels with that of Dai et al., who also invoke a binary division in a counterfactual framework, but split participants (rather than sex acts) into compliers and non-compliers (Dai et al. 2013). In a more empirical approach, Hanscom et al. define adherence as the proportion of active-arm participants with detectable levels of PrEP (Hanscom et al. 2018). More realistic, but less tractable, models would allow the level of protection to be a continuous function of PrEP drug concentrations (Anderson et al. 2012).

In our framework, again using the subscripts E, C, and P to denote the experimental, control, and hypothetical placebo groups:

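$$\lambda_C = (1-P_C)\lambda_P + P_C(1-\psi_C)\lambda_P = \lambda_P(1 - P_C\psi_C) \tag{11}$$
$$\lambda_E = (1-P_E)\lambda_P + P_E(1-\psi_E)\lambda_P = \lambda_P(1 - P_E\psi_E) \tag{12}$$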

In terms of effectiveness,

$$\theta_C = 1 - \frac{\lambda_C}{\lambda_P} = 1 - \frac{\lambda_P(1 - P_C\psi_C)}{\lambda_P} = P_C\psi_C$$

and similarly for θE.
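Eqs. (11) and (12), and the identity θ = Pψ, can be written out directly. A short Python sketch with hypothetical parameter values (the function names are illustrative):

```python
def incidence(lam_p: float, p: float, psi: float) -> float:
    """Eqs. (11)-(12): a fraction P of exposures is protected with efficacy psi."""
    return (1 - p) * lam_p + p * (1 - psi) * lam_p


def effectiveness(lam_p: float, p: float, psi: float) -> float:
    """theta = 1 - lambda / lambda_P, which the model reduces to P * psi."""
    return 1 - incidence(lam_p, p, psi) / lam_p


# Hypothetical values: 5 infections per 100 PY without PrEP, efficacy 0.8,
# protective drug levels during half of exposures.
print(f"{incidence(0.05, 0.5, 0.8):.4f} {effectiveness(0.05, 0.5, 0.8):.4f}")
```

With these inputs the incidence is 3 per 100 PY and the effectiveness is 0.4 = 0.5 × 0.8, regardless of the placebo incidence chosen.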

Manipulating eq. (3),

$$\text{AIR} = \frac{P_E\psi_E}{P_C\psi_C} = \frac{\theta_E}{\theta_C} \tag{13}$$

As we pointed out previously, the AIR can be expressed as the effectiveness of the experimental agent divided by the effectiveness of the active-control agent (Dunn et al. 2018). Surprisingly, it is also equal to the ratio of the respective efficacies if PE = PC. Thus, if adherence is the same for the two agents being compared (which may well be the case for trials evaluating similar oral formulations, such as the DISCOVER trial), the AIR can be interpreted in terms of either effectiveness or efficacy, regardless of the level of adherence in the trial. This does not mean that the level of adherence achieved is irrelevant, however, since it affects the precision of the estimate (higher adherence, more precision).
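The invariance of the AIR under equal adherence, and its dependence on adherence when PE ≠ PC, can be illustrated with the same binary model. The efficacies and adherence levels in this Python sketch are hypothetical, as are the function names.

```python
def incidence(lam_p: float, p: float, psi: float) -> float:
    """Binary adherence model, eqs. (11)-(12): lambda = lambda_P * (1 - P * psi)."""
    return lam_p * (1 - p * psi)


def air(lam_p: float, p_e: float, psi_e: float, p_c: float, psi_c: float) -> float:
    """AIR from eq. (3) with model-based incidences (requires p_c * psi_c > 0)."""
    lam_e = incidence(lam_p, p_e, psi_e)
    lam_c = incidence(lam_p, p_c, psi_c)
    return (lam_p - lam_e) / (lam_p - lam_c)


PSI_E, PSI_C = 0.6, 0.8  # hypothetical efficacies

# Equal adherence: AIR = psi_E / psi_C = 0.75 for every P.
for p in (0.2, 0.5, 0.9):
    print(f"P = {p}: AIR = {air(0.05, p, PSI_E, p, PSI_C):.3f}")

# Unequal adherence: AIR = (0.9 * 0.6) / (0.6 * 0.8) = 1.125, the ratio of effectivenesses.
print(f"P_E = 0.9, P_C = 0.6: AIR = {air(0.05, 0.9, PSI_E, 0.6, PSI_C):.3f}")
```

In the unequal case the AIR exceeds 1 even though the experimental agent has the lower efficacy, because its higher adherence more than compensates.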

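In contrast, both the rate ratio and the rate difference depend on the level of adherence, even if this is equal between the two groups:

$$\frac{\lambda_E}{\lambda_C} = \frac{1 - P_E\psi_E}{1 - P_C\psi_C}; \qquad \lambda_E - \lambda_C = \lambda_P\,(P_C\psi_C - P_E\psi_E) \tag{14}$$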

These measures are drawn closer to their null values of one and zero, respectively, the higher the level of non-adherence. Figure 3 shows how the various measures relate to adherence (as quantified by the parameter P described above, taken to be equal in the two groups) when ψC = 0.8, ψE = 0.6 (i. e. θC = 0.8 and θE = 0.6 under full adherence, P = 1), and λP = 0.05. As predicted by the algebra, the value of AIR is invariant.
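The pattern in Figure 3 can be tabulated with the binary adherence model. This Python sketch assumes, as the algebra above requires for AIR to be constant, that P is the same in both arms, and treats 0.8 and 0.6 as the control and experimental efficacies (which equal the effectivenesses at full adherence, P = 1), with λP = 0.05 as in Figure 3. The helper names are illustrative.

```python
LAM_P = 0.05              # placebo incidence, 5 per 100 PY (Figure 3)
PSI_C, PSI_E = 0.8, 0.6   # control and experimental efficacy (effectiveness at P = 1)


def incidence(lam_p: float, p: float, psi: float) -> float:
    """Binary adherence model, eqs. (11)-(12)."""
    return lam_p * (1 - p * psi)


def measures(p: float) -> tuple:
    """AIR, rate ratio and rate difference (eq. 14) at common adherence P > 0."""
    lam_c = incidence(LAM_P, p, PSI_C)
    lam_e = incidence(LAM_P, p, PSI_E)
    return (LAM_P - lam_e) / (LAM_P - lam_c), lam_e / lam_c, lam_e - lam_c


for p in (0.25, 0.5, 0.75, 1.0):
    air_val, rr_val, rd_val = measures(p)
    print(f"P = {p:.2f}  AIR = {air_val:.3f}  RR = {rr_val:.3f}  "
          f"RD = {rd_val * 100:.2f} per 100 PY")
```

As P falls toward 0, the rate ratio approaches 1 and the rate difference approaches 0, while AIR stays at 0.75 = 0.6/0.8.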

Figure 3: Point estimates of AIR, rate ratio, and rate difference as a function of adherence under a simplified model (see Section 5). Adherence represents the probability of protective PrEP concentrations during each sex act involving exposure to HIV. Assumptions: control drug effectiveness = 80 %, experimental drug effectiveness = 60 % (under full adherence), placebo incidence = 5 per 100 PY.


6 Discussion

In this short note we provide more mathematical detail about the AIR than we presented in our original exposition (Dunn et al. 2018). Apart from ease of interpretation, its adoption allows the use of smaller sample sizes compared with basing inference on the rate ratio. These savings are substantial, typically between 30 % and 40 % for plausible assumptions about the effectiveness of the active control agent. The reduction in sample size may seem like statistical sleight-of-hand, merely being a consequence of using a less stringent non-inferiority margin for the rate ratio. Our counter-argument is that the AIR is a more meaningful scale for assessing preservation of effect. Work in progress on sample size calculations based on the hypothetical placebo incidence (rather than the effectiveness of the control agent) suggests an even greater advantage in using the AIR.

Another interpretational advantage of the AIR is that it reflects preservation of effect for both effectiveness and efficacy for two agents with the same adherence. We stress that this interpretation only applies under our postulated model, which includes simplistic assumptions. Further research is required to test the robustness of this conclusion under different model formulations. Irrespective of the intended analytical approach, there are compelling reasons to attempt to measure adherence within a trial, including generating knowledge to allow development of more realistic causal models and to examine the plausibility of prior assumptions about the effectiveness of the control agent (Dai et al. 2013; Hanscom et al. 2018).

Finally, although the AIR was developed in the context of non-inferiority trials, this does not preclude its application to superiority trials. Indeed, we would argue that it offers the same interpretational advantages over the rate ratio in this setting as well. The gains in statistical power will also hold given that the problems of sample size calculation for non-inferiority and superiority trials are essentially symmetrical (Dunn, Copas, and Brocklehurst 2018).

Funding statement: DTD was supported by the UK Medical Research Council (MRC_UU_12023/23) during preparation of and outside the submitted work. DVG was supported by US National Institutes of Health grants (R03 AI120819, R03 AI122908, R01 AI143357).

References

Anderson, P. L., D. V. Glidden, A. Liu, S. Buchbinder, J. R. Lama, J. V. Guanira, V. McMahan, et al. 2012. “Emtricitabine-tenofovir Concentrations and Pre-exposure Prophylaxis Efficacy in Men Who Have Sex with Men.” Science Translational Medicine 4 (151). doi:10.1126/scitranslmed.3004006. PMID: 22972843.

Cutrell, A., D. Donnell, D. T. Dunn, D. V. Glidden, A. Grobler, B. Hanscom, B. S. Stancil, R. D. Meyer, R. Wang, and R. L. Cuffe. 2017. “HIV Prevention Trial Design in an Era of Effective Pre-exposure Prophylaxis.” HIV Clinical Trials 18 (5–6): 177–88. doi:10.1080/15284336.2017.1379676.

Dai, J. Y., P. B. Gilbert, J. P. Hughes, and E. R. Brown. 2013. “Estimating the Efficacy of Preexposure Prophylaxis for HIV Prevention among Participants with a Threshold Level of Drug Concentration.” American Journal of Epidemiology 177 (3): 256–63. doi:10.1093/aje/kws324. PMID: 23302152.

Donnell, D., J. P. Hughes, L. Wang, Y. Q. Chen, and T. R. Fleming. 2013. “Study Design Considerations for Evaluating Efficacy of Systemic Preexposure Prophylaxis Interventions.” Journal of Acquired Immune Deficiency Syndromes 63: S130–S143. doi:10.1097/QAI.0b013e3182986fac.

Dunn, D. T., A. J. Copas, and P. Brocklehurst. 2018. “Superiority and Non-inferiority: Two Sides of the Same Coin?” Trials 19 (1): 499. doi:10.1186/s13063-018-2885-z. PMID: 30223881.

Dunn, D. T., D. V. Glidden, O. T. Stirrup, and S. McCormack. 2018. “The Averted Infections Ratio: A Novel Measure of Effectiveness of Experimental HIV Pre-exposure Prophylaxis Agents.” Lancet HIV 5 (6): e329–e334. doi:10.1016/S2352-3018(18)30045-6.

Efron, B. 1977. “The Efficiency of Cox’s Likelihood Function for Censored Data.” Journal of the American Statistical Association 72: 557–65. doi:10.1080/01621459.1977.10480613.

Hanscom, B., J. P. Hughes, B. D. Williamson, and D. Donnell. 2018 (October). “Adaptive Non-inferiority Margins under Observable Non-constancy.” Statistical Methods in Medical Research. doi:10.1177/0962280218801134. PMID: 30293490.

Janes, H., D. Donnell, P. B. Gilbert, E. R. Brown, and M. Nason. 2019. “Taking Stock of the Present and Looking Ahead: Envisioning Challenges in the Design of Future HIV Prevention Efficacy Trials.” Lancet HIV [Epub ahead of print]. doi:10.1016/s2352-3018(19)30133-x. PMID: 31078451.

Molina, J. M., C. Capitant, B. Spire, G. Pialoux, L. Cotte, I. Charreau, C. Tremblay, et al. 2015. “On-Demand Preexposure Prophylaxis in Men at High Risk for HIV-1 Infection.” The New England Journal of Medicine 373 (23): 2237–46. doi:10.1056/NEJMoa1506273. PMID: 26624850.

Nelder, J. A. 1998. “A Large Class of Models Derived from Generalized Linear Models.” Statistics in Medicine 17 (23): 2747–53. doi:10.1002/(SICI)1097-0258(19981215)17:23<2747::AID-SIM40>3.0.CO;2-I. PMID: 9881420.

Snapinn, S., and Q. Jiang. 2008. “Preservation of Effect and the Regulatory Approval of New Treatments on the Basis of Non-inferiority Trials.” Statistics in Medicine 27 (3): 382–91. doi:10.1002/sim.3073. PMID: 17914712.

Sommer, A., and S. L. Zeger. 1991. “On Estimating Efficacy from Clinical Trials.” Statistics in Medicine 10: 45–52. doi:10.1002/sim.4780100110. PMID: 2006355.

Wilder-Smith, A., I. Longini, P. L. Zuber, T. Barnighausen, W. J. Edmunds, N. Dean, V. M. Spicher, M. R. Benissa, and B. D. Gessner. 2017. “The Public Health Value of Vaccines beyond Efficacy: Methods, Measures and Outcomes.” BMC Medicine 15 (1): 138. doi:10.1186/s12916-017-0911-8. PMID: 28743299.

Zhu, H. 2016. “Sample Size Calculation for Comparing Two Poisson or Negative Binomial Rates in Noninferiority or Equivalence Trials.” Statistics in Biopharmaceutical Research 9 (1): 107–15. doi:10.1080/19466315.2016.1225594.

Received: 2019-02-15
Revised: 2019-06-04
Accepted: 2019-06-12
Published Online: 2019-07-12

© 2019 Dunn and Glidden, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.