Confidence limits for the averted infections ratio estimated via the counterfactual placebo incidence rate

Abstract
Objectives: The averted infections ratio (AIR) is a novel measure for quantifying the preservation-of-effect in active-control non-inferiority clinical trials with a time-to-event outcome. In the main formulation, the AIR requires an estimate of the counterfactual placebo incidence rate. We describe two approaches for calculating confidence limits for the AIR given a point estimate of this parameter: a closed-form solution based on a Taylor series expansion (delta method) and an iterative method based on the profile-likelihood.
Methods: For each approach, exact coverage probabilities for the lower and upper confidence limits were computed over a grid of values of (1) the true value of the AIR, (2) the expected number of counterfactual events, and (3) the effectiveness of the active-control treatment.
Results: Focussing on the lower confidence limit, which determines whether non-inferiority can be declared, the coverage achieved by the delta method is either less than or greater than the nominal coverage, depending on the true value of the AIR. In contrast, the coverage achieved by the profile-likelihood method is consistently accurate.
Conclusions: The profile-likelihood method is preferred because of better coverage properties, but the simpler delta method is valid when the experimental treatment is no less effective than the control treatment. A complementary Bayesian approach, which can be applied when the counterfactual incidence rate can be represented as a prior distribution, is also outlined.


Introduction
In a series of papers we have considered the analysis of active-control non-inferiority trials with a time-to-event outcome in the context of HIV prevention trials (Dunn and Glidden 2019; Dunn et al. 2018; Glidden, Stirrup, and Dunn 2020). Our key conclusion is that the standard metric used in such trials, the rate ratio comparing experimental and control arms, is misleading. We further argued that clinically meaningful inference requires estimation or specification of one of two unobserved parameters: (a) the event rate that would have been observed in trial subjects if they had received no treatment (counterfactual placebo arm) or (b) the effectiveness of the control arm relative to the counterfactual placebo arm. With this information, in combination with the observed incidence rates in the control and experimental arms, we can estimate a measure called the averted infections ratio (AIR). The AIR is interpreted as the proportion of events that would be averted by use of the experimental treatment compared with the control treatment. In the context of non-inferiority trials, it is a natural criterion for assessing the degree to which the experimental treatment preserves the effect of the control treatment relative to no treatment ("preservation-of-effect") (Ghosh et al. 2011; Pigeot et al. 2003; Snapinn and Jiang 2008). Non-inferiority trials using this approach typically aim to demonstrate at least 50% preservation-of-effect, although this value is context specific and higher values may be warranted (Pigeot et al. 2003; Ghosh et al. 2011). In this paper, we consider the derivation of confidence limits for the AIR when it is estimated via the counterfactual placebo incidence.

Notation and statistical formulation
Denote the hypothetical placebo, control, and experimental arms by the subscripts P, C, and E, respectively. We observe $F_C$ person-years of follow-up in the control arm and $F_E$ person-years of follow-up in the experimental arm. Let $X_C$ and $X_E$ be the random variables denoting the number of observed events, where we assume that $X_C \sim \mathrm{Poi}(F_C \lambda_C)$ and $X_E \sim \mathrm{Poi}(F_E \lambda_E)$. Let $\lambda_P$ represent the counterfactual placebo incidence. The averted infections ratio is defined as

$$\Psi = \frac{\lambda_P - \lambda_E}{\lambda_P - \lambda_C} \tag{1}$$

Alternatively, Ψ can be expressed in terms of the counterfactual control arm effectiveness ($\theta_C = 1 - \lambda_C/\lambda_P$) rather than $\lambda_P$:

$$\Psi = \frac{1 - (1 - \theta_C)\,\lambda_E/\lambda_C}{\theta_C}$$

In this formulation, Ψ is a linear function of the rate ratio $\lambda_E/\lambda_C$, and confidence limits for Ψ can be obtained by direct transformation of confidence limits for the rate ratio. As the latter problem has been extensively studied (Graham, Mengersen, and Morton 2003; Li, Tang, and Wong 2014; Price and Bonett 2000; Sahai and Khurshid 1993) we focus on formulation (1).
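As an illustrative sketch, the equivalence of the two formulations can be checked numerically. The function names and the input values are hypothetical; the incidence rates of the placebo, control, and experimental arms are written as lam_p, lam_c, and lam_e, and the control arm effectiveness as theta_c.

```python
def air_from_placebo_rate(x_e, f_e, x_c, f_c, lam_p):
    """AIR point estimate given a counterfactual placebo incidence lam_p."""
    lam_e = x_e / f_e  # observed incidence, experimental arm
    lam_c = x_c / f_c  # observed incidence, control arm
    return (lam_p - lam_e) / (lam_p - lam_c)


def air_from_control_effectiveness(x_e, f_e, x_c, f_c, theta_c):
    """AIR point estimate given the control arm effectiveness theta_c."""
    rate_ratio = (x_e / f_e) / (x_c / f_c)
    return (1.0 - (1.0 - theta_c) * rate_ratio) / theta_c


# Hypothetical trial: 30 and 25 events over 1000 person-years per arm,
# with an assumed placebo incidence of 0.05 events per person-year.
x_e, f_e, x_c, f_c, lam_p = 30, 1000.0, 25, 1000.0, 0.05
theta_c = 1.0 - (x_c / f_c) / lam_p  # effectiveness implied by lam_p

psi_1 = air_from_placebo_rate(x_e, f_e, x_c, f_c, lam_p)
psi_2 = air_from_control_effectiveness(x_e, f_e, x_c, f_c, theta_c)
print(psi_1, psi_2)
```

Both functions return the same value when theta_c is the effectiveness implied by lam_p, which is the sense in which the two formulations are interchangeable.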

Inference conditional on counterfactual incidence
This section considers the derivation of confidence limits for the AIR when considering a single, pre-specified value of $\lambda_P$. This allows exploration of how the confidence limits (and point estimates) vary over a range of plausible values of $\lambda_P$, which can be highly informative (Glidden, Stirrup, and Dunn 2020).

Delta method
We first apply a log transformation to the AIR, a natural procedure for any statistic that is a ratio of two variables. From Eq.
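A minimal sketch of a delta-method interval on the log scale is given below. It assumes the usual first-order variance, Var(log AIR) ≈ Var(lam_e)/(lam_p − lam_e)² + Var(lam_c)/(lam_p − lam_c)², with the variance of each observed rate estimated by X/F² under the Poisson model. That this matches the approximation referred to as Eq. (2) is an assumption, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def air_delta_limits(x_e, f_e, x_c, f_c, lam_p, alpha=0.05):
    """One-sided lower and upper delta-method limits for the AIR.

    Assumes both observed rates lie below lam_p, so that log(AIR) is defined.
    """
    lam_e, lam_c = x_e / f_e, x_c / f_c
    if lam_e >= lam_p or lam_c >= lam_p:
        raise ValueError("observed rates must be below lam_p")
    psi_hat = (lam_p - lam_e) / (lam_p - lam_c)
    # First-order (delta method) variance of log(psi_hat); lam_p is fixed.
    var_log = ((x_e / f_e**2) / (lam_p - lam_e) ** 2
               + (x_c / f_c**2) / (lam_p - lam_c) ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha)
    half_width = z * math.sqrt(var_log)
    return psi_hat * math.exp(-half_width), psi_hat * math.exp(half_width)


# Hypothetical inputs: 30 and 25 events over 1000 person-years per arm.
lower, upper = air_delta_limits(30, 1000.0, 25, 1000.0, lam_p=0.05)
print(lower, upper)
```

Because the interval is built on the log scale, the two limits are symmetric about the point estimate in the multiplicative sense (their product equals the squared point estimate).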

Profile-likelihood method
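The log-likelihood under a Poisson model is

$$\ell(\lambda_C, \lambda_E) = X_C \log(F_C \lambda_C) - F_C \lambda_C + X_E \log(F_E \lambda_E) - F_E \lambda_E + \text{constant} \tag{3}$$

We can express (3) in terms of Ψ via Eq. (1), noting that a nuisance parameter (either $\lambda_C$, or $\lambda_E$, or a function of $\lambda_C$ and $\lambda_E$) is also involved. Denoting this arbitrary nuisance parameter by $\eta$, the profile-likelihood confidence region for Ψ is defined by the set of values (Cole, Chu, and Greenland 2014)

$$\left\{ \Psi : 2\left[ \ell(\hat{\Psi}, \hat{\eta}) - \max_{\eta}\, \ell(\Psi, \eta) \right] \le \chi^2_{1,\,1-2\alpha} \right\} \tag{4}$$

where $\ell(\hat{\Psi}, \hat{\eta})$ is the unconstrained maximised log-likelihood and $\chi^2_{1,\,1-2\alpha}$ is the $(1-2\alpha)$ quantile of the chi-squared distribution with one degree of freedom, so that each one-sided limit has nominal coverage $1-\alpha$.
An alternative approach is to parameterise the problem in terms of $\lambda_C$ and $\lambda_E$ rather than Ψ and $\eta$. We therefore maximise (3) subject to the constraint implied by Eq. (1) for a specified value $\Psi^*$. Re-arranging,

$$\lambda_E - \Psi^* \lambda_C - (1 - \Psi^*)\lambda_P = 0$$

Introducing a Lagrange multiplier ($\kappa$), we maximise

$$\ell(\lambda_C, \lambda_E) + \kappa\left[\lambda_E - \Psi^* \lambda_C - (1 - \Psi^*)\lambda_P\right] \tag{5}$$

Differentiating (5) with respect to $\kappa$, $\lambda_E$, and $\lambda_C$ results in a set of three equations:

$$\lambda_E - \Psi^* \lambda_C - (1 - \Psi^*)\lambda_P = 0, \qquad \frac{X_E}{\lambda_E} - F_E + \kappa = 0, \qquad \frac{X_C}{\lambda_C} - F_C - \kappa \Psi^* = 0$$

noting that $\Psi^*$ and $\lambda_P$ are constants. Using the method of elimination,

$$\frac{X_C}{\lambda_C} - F_C + \Psi^*\left(\frac{X_E}{\Psi^* \lambda_C + (1 - \Psi^*)\lambda_P} - F_E\right) = 0$$

which, after clearing denominators, is a quadratic in $\lambda_C$. The roots of the function implied by (4) were found using the uniroot function in R (version 4.0.2), which implements Brent's root-finding algorithm, a combination of bisection, secant, and inverse quadratic interpolation steps (code in Appendix).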
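The procedure can be sketched in Python as follows. This is not the R code of the Appendix: it is a minimal illustration that assumes positive event counts, solves the constrained maximisation in closed form (the stationarity condition reduces to a quadratic in the control rate), and replaces uniroot with plain bisection. All function names are hypothetical, and the critical value assumes one-sided limits at level alpha.

```python
import math
from statistics import NormalDist


def loglik(lam_c, lam_e, x_e, f_e, x_c, f_c):
    """Poisson log-likelihood, omitting terms that do not involve the rates."""
    return (x_c * math.log(lam_c) - f_c * lam_c
            + x_e * math.log(lam_e) - f_e * lam_e)


def constrained_mle(psi, x_e, f_e, x_c, f_c, lam_p):
    """Rates maximising the log-likelihood subject to AIR = psi (psi > 0)."""
    a = (1.0 - psi) * lam_p                  # constraint: lam_e = a + psi * lam_c
    qa = psi * (f_c + psi * f_e)
    qb = psi * (x_c + x_e) - a * (f_c + psi * f_e)
    qc = a * x_c
    root = math.sqrt(max(qb * qb + 4.0 * qa * qc, 0.0))
    # Larger root of qa*lam^2 - qb*lam - qc = 0, in a cancellation-safe form.
    lam_c = (qb + root) / (2.0 * qa) if qb >= 0 else 2.0 * qc / (root - qb)
    return lam_c, a + psi * lam_c


def profile_deviance(psi, x_e, f_e, x_c, f_c, lam_p):
    """Twice the drop in log-likelihood from the unconstrained maximum."""
    ll_max = loglik(x_c / f_c, x_e / f_e, x_e, f_e, x_c, f_c)
    lam_c, lam_e = constrained_mle(psi, x_e, f_e, x_c, f_c, lam_p)
    return 2.0 * (ll_max - loglik(lam_c, lam_e, x_e, f_e, x_c, f_c))


def bisect(func, lo, hi, tol=1e-10, max_iter=200):
    """Bisection root finder; requires a sign change between lo and hi."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def air_profile_limits(x_e, f_e, x_c, f_c, lam_p, alpha=0.05):
    """One-sided lower and upper profile-likelihood limits for the AIR."""
    crit = NormalDist().inv_cdf(1.0 - alpha) ** 2   # chi-squared(1) quantile
    psi_hat = (lam_p - x_e / f_e) / (lam_p - x_c / f_c)

    def excess(psi):
        return profile_deviance(psi, x_e, f_e, x_c, f_c, lam_p) - crit

    lower = bisect(excess, 1e-6, psi_hat)   # search restricted to psi > 0
    hi = psi_hat
    for _ in range(60):                      # expand until the root is bracketed
        hi *= 1.5
        if excess(hi) > 0:
            break
    else:
        raise ValueError("upper limit not bracketed")
    return lower, bisect(excess, psi_hat, hi)


# Hypothetical inputs: 30 and 25 events over 1000 person-years per arm.
pl_lower, pl_upper = air_profile_limits(30, 1000.0, 25, 1000.0, lam_p=0.05)
print(pl_lower, pl_upper)
```

When a count is zero, the 0.5 adjustment described under coverage probabilities could be applied to the inputs before calling these functions.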

Unconditional inference
In addition to exploring how the AIR varies over a range of values of the counterfactual incidence, we may wish to integrate over this parameter to obtain the unconditional distribution of the AIR. Bayesian inference provides a natural framework for this problem. Here we consider the case where trial investigators are able to specify a simple prior distribution for the counterfactual incidence, although more sophisticated approaches which incorporate external information are also possible (Glidden, Stirrup, and Dunn 2020).
Assume that the prior for $\lambda_P$ can be specified as a Gamma distribution based on background knowledge.
For $\lambda_E$ and $\lambda_C$, we use weakly informative Gamma(0.5, 0.001) priors (shape and rate parameters); this approximates the Jeffreys prior (Gelman et al. 1995), and also corresponds to adding 0.5 to the observed number of events as discussed in Section 6. As the Gamma distribution is the conjugate prior for the Poisson model, the posterior distributions for $\lambda_E$ and $\lambda_C$ are Gamma($X_E$ + 0.5, $F_E$ + 0.001) and Gamma($X_C$ + 0.5, $F_C$ + 0.001), respectively (Gelman et al. 1995). We generate samples from the distributions of $\lambda_P$, $\lambda_E$, and $\lambda_C$ to derive the posterior distribution for the AIR using Eq. (1).
The main application of the AIR is in non-inferiority trials, where it is reasonable to assume that $\lambda_C < \lambda_P$ since the effectiveness of the control drug will already have been established. Further, the AIR is uninterpretable if $\lambda_C > \lambda_P$ as this would imply there was no yardstick against which the experimental drug could be compared (nothing to preserve). In most realistic applications it is also reasonable to assume that $\lambda_E < \lambda_P$ as the experimental drug will have been selected as having some biological activity. It is therefore reasonable to require the sampled values (denoted by an asterisk) to satisfy

$$\lambda_C^* < \lambda_P^* \quad \text{and} \quad \lambda_E^* < \lambda_P^* \tag{6}$$

and to re-sample when this constraint is violated. There are three possible re-sampling strategies: (a) re-sample $\lambda_P^*$ only; (b) re-sample […]. The best strategy is not obvious, and all are explored in the example in Section 7.
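The sampling scheme can be sketched as follows for the strategy in which only the placebo rate is re-sampled. This is a minimal illustration using Python's standard library. It assumes that the constraint requires the sampled placebo rate to exceed both sampled arm rates, and that the prior is supplied as shape and rate parameters. The function name, the seed, and the prior values in the usage lines are hypothetical.

```python
import random
import statistics


def sample_air_posterior(x_e, f_e, x_c, f_c, prior_shape, prior_rate,
                         n_draws=2000, seed=1):
    """Posterior draws of the AIR, re-sampling the placebo rate only.

    Returns the draws and the number of initial draws that violated the
    assumed constraint lam_p > max(lam_e, lam_c).
    """
    rng = random.Random(seed)
    draws, n_resampled = [], 0
    for _ in range(n_draws):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam_e = rng.gammavariate(x_e + 0.5, 1.0 / (f_e + 0.001))
        lam_c = rng.gammavariate(x_c + 0.5, 1.0 / (f_c + 0.001))
        lam_p = rng.gammavariate(prior_shape, 1.0 / prior_rate)
        if lam_p <= max(lam_e, lam_c):
            n_resampled += 1
            while lam_p <= max(lam_e, lam_c):
                lam_p = rng.gammavariate(prior_shape, 1.0 / prior_rate)
        draws.append((lam_p - lam_e) / (lam_p - lam_c))
    return draws, n_resampled


# Hypothetical usage: 32 and 33 events over roughly 4900 person-years per
# arm, with a prior for the placebo rate of mean 10 / 500 = 0.02.
air_draws, n_bad = sample_air_posterior(32, 4926.0, 33, 4896.0,
                                        prior_shape=10.0, prior_rate=500.0)
ordered = sorted(air_draws)
median = statistics.median(ordered)
interval_90 = (ordered[int(0.05 * len(ordered))],
               ordered[int(0.95 * len(ordered))])
print(median, interval_90, n_bad)
```

The inner loop may run for a long time if the prior places very little mass above the observed rates, which is itself a signal that the prior and the data are in conflict.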

Three arm trials with a placebo arm
Trials are occasionally designed with a placebo arm in addition to the control and experimental arms, thereby providing a direct estimate of $\lambda_P$ (Ghosh et al. 2011). The Taylor series approximation (Eq. (2)) requires an additional term to reflect the uncertainty in the estimate of $\lambda_P$ (Eq. (7)). The additional term is generally much smaller than the first two terms and, in expectation, (7) tends towards (2) when $\lambda_E = \lambda_C$. This leads to a paradoxical finding, namely that the sample size of the placebo group appears to be irrelevant when this equality is assumed (as is commonly the case when designing non-inferiority trials). This paradox is explained by the fact that Ψ = 1 when $\lambda_E = \lambda_C$ regardless of the value of $\lambda_P$. However, the placebo group needs to be large enough to ensure that the estimate $\hat{\lambda}_P$ is sufficiently stable. An interesting, unresolved question is the optimal relative sample size allocation to the three arms. We further note the profile-likelihood approach (Section 3.2) could, in principle, be extended to three arm trials.
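One way to see why the additional term vanishes is a first-order expansion of the log AIR in all three estimated rates. The expression below is a sketch under the assumption that the three arms are independent Poisson samples, with each rate estimated by X/F and its variance by X/F²; it may differ in presentation from Eq. (7).

```latex
\[
\log\hat{\Psi} = \log(\hat{\lambda}_P - \hat{\lambda}_E) - \log(\hat{\lambda}_P - \hat{\lambda}_C),
\qquad
\frac{\partial \log\Psi}{\partial \lambda_P}
  = \frac{1}{\lambda_P - \lambda_E} - \frac{1}{\lambda_P - \lambda_C}
\]
\[
\widehat{\operatorname{Var}}(\log\hat{\Psi}) \approx
  \frac{\widehat{\operatorname{Var}}(\hat{\lambda}_E)}{(\hat{\lambda}_P - \hat{\lambda}_E)^2}
  + \frac{\widehat{\operatorname{Var}}(\hat{\lambda}_C)}{(\hat{\lambda}_P - \hat{\lambda}_C)^2}
  + \widehat{\operatorname{Var}}(\hat{\lambda}_P)
    \left( \frac{1}{\hat{\lambda}_P - \hat{\lambda}_E}
         - \frac{1}{\hat{\lambda}_P - \hat{\lambda}_C} \right)^{2}
\]
```

The derivative with respect to the placebo rate is exactly zero when the experimental and control rates are equal, so under this sketch the third term contributes nothing at that point however small the placebo arm is.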

Coverage probabilities
Methods
Exact coverage probabilities for the lower and upper confidence limits (at nominal coverage probabilities of $1-\alpha$, for $\alpha$ = 0.025, 0.05) were computed using the delta method and profile-likelihood method described in Section 3. For the purposes of exposition we assume $F_C = F_E = 1$, so that $\lambda_C$ and $\lambda_E$ can be considered as the expected number of events, and $\lambda_P$ the expected number of counterfactual events, in each of the two trial arms. The following parameters were examined over a grid of values: Ψ = 0.5(0.1)1.0; $\lambda_P$ = 40(20)100; $\theta_C$ = 0.6(0.1)0.9. Exact coverage probabilities were computed by

$$\sum_{x_C=0}^{\infty} \sum_{x_E=0}^{\infty} I(x_C, x_E)\, \Pr(X_C = x_C \mid \lambda_C)\, \Pr(X_E = x_E \mid \lambda_E)$$

where $\lambda_C = \lambda_P(1 - \theta_C)$, $\lambda_E = \lambda_P(1 - \Psi\theta_C)$, and $I(x_C, x_E)$ equals 1 if the lower (upper) confidence limit is less (greater) than Ψ, otherwise equals 0. The log-likelihood is undefined when either $X_C = 0$ or $X_E = 0$. However, in contrast with the rate ratio, this is a highly informative outcome in terms of the AIR (even $X_C = 0$, $X_E = 0$). To avoid this problem, $X_C$ and $X_E$ were replaced by $X_C + 0.5$ and $X_E + 0.5$ before applying the methods of Section 3.2. For consistency, this adjustment was also applied for confidence limits determined by the delta method. The addition of 0.5 resulted in improved coverage estimates under both approaches, as has previously been reported for the rate ratio (Price and Bonett 2000).
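The enumeration can be sketched as follows for the delta-method lower limit. The infinite sums are truncated at a count far into the Poisson tails, and the first-order variance on the log scale is assumed for the delta method. Outcomes in which an adjusted count is not below the placebo expectation leave the log-scale limit undefined; treating them as covered is an assumption made for this illustration only. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_pmf(x, mean):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))


def delta_lower_limit(x_e, x_c, lam_p, z):
    """Delta-method lower limit with F_C = F_E = 1 and 0.5 added to counts."""
    lam_e, lam_c = x_e + 0.5, x_c + 0.5
    if lam_e >= lam_p or lam_c >= lam_p:
        return -math.inf  # assumption: undefined limits are counted as covered
    psi_hat = (lam_p - lam_e) / (lam_p - lam_c)
    se = math.sqrt(lam_e / (lam_p - lam_e) ** 2 + lam_c / (lam_p - lam_c) ** 2)
    return psi_hat * math.exp(-z * se)


def exact_lower_coverage(psi, lam_p, theta_c, alpha=0.05):
    """Probability that the lower limit falls below the true AIR."""
    lam_c = lam_p * (1.0 - theta_c)
    lam_e = lam_p * (1.0 - psi * theta_c)
    z = NormalDist().inv_cdf(1.0 - alpha)
    x_max = int(lam_p + 12.0 * math.sqrt(lam_p)) + 20  # truncation point
    pmf_c = [poisson_pmf(x, lam_c) for x in range(x_max + 1)]
    pmf_e = [poisson_pmf(x, lam_e) for x in range(x_max + 1)]
    coverage = 0.0
    for x_c, p_c in enumerate(pmf_c):
        for x_e, p_e in enumerate(pmf_e):
            if delta_lower_limit(x_e, x_c, lam_p, z) < psi:
                coverage += p_c * p_e
    return coverage


cov = exact_lower_coverage(psi=0.8, lam_p=40.0, theta_c=0.6)
print(round(cov, 4))
```

The same loop applies to any other limit, such as a profile-likelihood limit, by swapping the function that is compared with the true AIR.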

Results
The complete set of coverage probabilities for the lower and upper confidence limits is given in the Appendix. However, the lower confidence limit is of primary interest since this is the comparator for the non-inferiority margin. Also, the upper limit of Ψ may be severely constrained for large values of $\theta_C$. Ψ can be expressed as $\theta_E/\theta_C$, where $\theta_E = 1 - \lambda_E/\lambda_P$ cannot exceed 1, so that, for example, Ψ ≤ 1.25 if $\theta_C$ = 0.8, Ψ ≤ 1.11 if $\theta_C$ = 0.9. Figure 1 shows coverage probabilities using the delta method for the lower one-tailed $\alpha$ = 0.05 confidence limit (similar patterns were observed for $\alpha$ = 0.025). Coverage is generally too low for Ψ = 0.5-0.8, is reasonably accurate for Ψ = 0.9, and is too high for Ψ = 1.0. This pattern is explained by a negative correlation between the empirical AIR and its estimated standard error, conditional on the true AIR (Ψ). Conditional on Ψ, coverage is higher the larger the value of the control arm effectiveness ($\theta_C$), except for Ψ = 1.0 when differences are minor. As expected, actual and nominal coverage are closer the larger the value of $\lambda_P$, although convergence is slow with material discrepancies even for $\lambda_P$ = 100. Coverage probabilities for the upper confidence limit were consistently and substantially too high (Appendix), particularly for lower values of Ψ. Table 1 shows coverage probabilities for the profile-likelihood-based lower confidence limit for $\lambda_P$ = 40 and $\alpha$ = 0.05. Coverage was close to the nominal value of 0.95 (range 0.9468-0.9615) for all combinations of Ψ and $\theta_C$; as expected, correspondence was even closer at higher values of $\lambda_P$ (not shown). Coverage for the profile-likelihood-based upper confidence limit was also highly accurate, in contrast to the delta method (Appendix). The results of these analyses support the routine use of profile-likelihood-based confidence limits, although the delta method is valid in a conservative sense (i.e. actual coverage exceeds nominal coverage) if the true AIR is approximately 0.9 or higher.
This is reflected in larger values for the lower confidence limit using the delta method (Appendix).

Example
The BRIEF TB/A5279 study was a randomised, non-inferiority trial that compared two regimens for the prevention of active tuberculosis in HIV-infected patients who were living in areas of high tuberculosis prevalence or who had evidence of latent tuberculosis infection (Swindells et al. 2019). The reference regimen was 9 months of daily isoniazid alone (9-month arm) and the experimental regimen was 1 month of daily rifapentine plus isoniazid (1-month arm). The incidence of the primary endpoint (diagnosis of tuberculosis, or death from tuberculosis or unknown cause) was similar in the 1-month arm (32 endpoints, 4,926 person-years follow-up (PYFU), incidence rate 0.65 per 100 PYFU) and 9-month arm (33 endpoints, 4,896 PYFU, incidence rate 0.67 per 100 PYFU). The primary metric was the rate difference rather than the rate ratio, which is generally used in HIV prevention research. Non-inferiority was declared by the investigators because the upper 97.5% confidence limit of 0.30 per 100 PYFU was less than the pre-specified margin of 1.25 per 100 PYFU. However, this conclusion is questionable as the authors did not take the counterfactual placebo incidence into account. Notably, the observed incidence in the 9-month arm was markedly lower than the incidence rate assumed for the purposes of sample size calculation (2 per 100 PYFU). Figure 2 shows the lower 5% and upper 95% confidence limits for the AIR as a function of the counterfactual incidence, computed using the delta and profile-likelihood methods. Consistent with results of Section 6.2, the delta method yields narrower confidence intervals. The figure also reveals the sensitive relationship between the lower confidence limit and the assumed counterfactual incidence, underlining the importance of obtaining as much information as possible about this parameter.
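The dependence on the assumed counterfactual incidence can be illustrated with the reported event counts and follow-up. The sketch below computes the AIR point estimate and a delta-method lower limit over a few assumed placebo rates. It assumes the first-order variance on the log scale and applies no 0.5 adjustment, so the values need not match Figure 2. The function names and the grid of rates are hypothetical.

```python
import math
from statistics import NormalDist

X_E, F_E = 32, 4926.0  # 1-month arm: endpoints, person-years of follow-up
X_C, F_C = 33, 4896.0  # 9-month arm: endpoints, person-years of follow-up


def rate_per_100(events, follow_up):
    """Incidence rate per 100 person-years of follow-up."""
    return 100.0 * events / follow_up


def air_with_lower_limit(lam_p, alpha=0.05):
    """AIR point estimate and one-sided delta-method lower limit.

    lam_p is the assumed placebo incidence per person-year and must exceed
    both observed rates.
    """
    lam_e, lam_c = X_E / F_E, X_C / F_C
    psi_hat = (lam_p - lam_e) / (lam_p - lam_c)
    var_log = ((X_E / F_E**2) / (lam_p - lam_e) ** 2
               + (X_C / F_C**2) / (lam_p - lam_c) ** 2)
    z = NormalDist().inv_cdf(1.0 - alpha)
    return psi_hat, psi_hat * math.exp(-z * math.sqrt(var_log))


print(round(rate_per_100(X_E, F_E), 2), round(rate_per_100(X_C, F_C), 2))
for per_100 in (1.0, 1.5, 2.0, 3.0):  # assumed placebo rates per 100 PYFU
    psi_hat, lower = air_with_lower_limit(per_100 / 100.0)
    print(per_100, round(psi_hat, 3), round(lower, 3))
```

The point estimate stays close to 1 across the grid, whereas the lower limit rises as the assumed placebo incidence increases.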
Figure 3 shows the results of a Bayesian analysis (10,000 simulations) under two different priors for the counterfactual incidence: Gamma(10,0.001) and Gamma(10,0.002) in the shape and scale parameterisation, corresponding to mean incidence rates of 1 and 2 per 100 PYFU, respectively. Without expert knowledge, we emphasise that this is an illustrative rather than a definitive analysis. The lower incidence rate is broadly consistent with the overall ∼30% efficacy of tuberculosis prophylaxis in HIV-infected patients (Ross et al. 2021); the higher value is the rate that the investigators postulated for the control regimen (post hoc, a substantial over-estimate).
For the low incidence scenario, 22.2% of initial simulations had to be re-sampled because of violation of Eq. (6). The posterior median (90% credibility interval) AIR was 1.038 (0.347, 3.627) under re-sampling strategy (a), 1.033 (0.373, 3.228) under strategy (b), and 1.031 (0.357, 3.281) under strategy (c). For the high incidence scenario, only 0.6% of initial simulations had to be re-sampled. The posterior median (90% credibility interval) AIR under re-sampling strategy (a) was 1.009 (0.760, 1.370). The values under the other re-sampling strategies were almost identical (all within ±0.002). In general, our preference is to re-sample $\lambda_P^*$ only (strategy (a)) since, lacking empirical data, $\lambda_P$ is the most uncertain parameter. Figure 3B shows the posterior distributions for the AIR under this strategy, and highlights that inference is much tighter under the high incidence scenario.

Summary
We have described two approaches for calculating confidence limits for the AIR given a pre-specified value of the counterfactual incidence: a closed-form solution based on a Taylor series expansion (delta method), and an iterative method based on the profile-likelihood, for which R code is provided. The profile-likelihood method is preferred because of better coverage properties, but the delta method is valid when the experimental treatment is no less effective than the control treatment. The difference between the two methods is minimal when the counterfactual incidence is much larger than the observed incidence in both the control and experimental arms. We also describe a simple Bayesian approach when the counterfactual incidence rate can be represented as a simple prior distribution. However, more precise inference can be achieved by harnessing other data which inform the prior distribution (Glidden, Stirrup, and Dunn 2020).