Abstract
Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects’ predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer’s disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.
Acknowledgments
We are grateful to Xinkun Nie and Oleg Sofrygin for enlightening conversations and to Rachael C. Aikens for feedback on a draft of this article. Data collection and sharing for this project was funded in part by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. 
Data collection and sharing for this project was funded in part by the University of California, San Diego Alzheimer’s Disease Cooperative Study (ADCS) (National Institute on Aging Grant Number U19AG010483).
- Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
- Research funding: None declared.
- Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Appendix A. Mathematical results
Throughout, we assume sufficient regularity conditions for the asymptotic normality of M-estimators to hold. The details are given in Chapter 5 (Theorem 5.23) of van der Vaart [49].
Lemma A.1
(Rosenblum). The influence function for the linear regression treatment effect estimator we describe in Section 3 is ψ = ψ 1 − ψ 0 where
and
This follows from results in Robins et al. [50]. An accessible presentation for the case of generalized linear models is given in Rosenblum and van der Laan [51].
Definition A.1
(Difference-in-means). The “difference-in-means” (or “unadjusted”) estimator of τ = μ 1 − μ 0 is
Note that throughout the appendix we omit the subscript n on estimators. E.g. τ Δ is shorthand for τ Δ,n and our asymptotic statements refer to the sequence of estimators as n becomes large.
Lemma A.2
The difference-in-means estimator has asymptotic variance given by
where
Proof
This fact is well-known. One proof follows the outline of 7 below taking Z ⊤ = [1, W]. □
Definition A.2
(ANCOVA I). The “ANCOVA I” estimator of τ = μ 1 − μ 0 (denoted
Definition A.3
(ANCOVA II). The “ANCOVA II” estimator of τ = μ 1 − μ 0 (denoted
The following two Theorems A.3 and A.4 are mild generalizations of or follow closely from results stated in Leon et al. [24] and Yang and Tsiatis [16]. Details are provided here for the reader’s convenience.
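As a concrete companion to Definitions A.1 to A.3, the following is a minimal Python sketch of the three point estimators on simulated data. It is an illustration under stated assumptions, not the authors' implementation: the function names, the simulated data-generating process, and the choice to center the covariates so that the coefficient on W can be read off as the marginal effect in the interacted model are all assumptions introduced here.

```python
import numpy as np


def difference_in_means(y, w):
    """Unadjusted estimate: mean treated outcome minus mean control outcome."""
    return y[w == 1].mean() - y[w == 0].mean()


def ancova_1(y, w, x):
    """ANCOVA I: regress Y on [1, W, X]; the coefficient on W is the estimate."""
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), w, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def ancova_2(y, w, x):
    """ANCOVA II: add W-by-X interactions, with X centered at its sample mean
    so that the coefficient on W estimates the marginal treatment effect."""
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    xc = x - x.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), w, xc, w[:, None] * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Hypothetical simulated trial with a true treatment effect of 0.5.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
w = rng.binomial(1, 0.5, size=n)
y = 1.0 + 2.0 * x + 0.5 * w + rng.normal(scale=0.5, size=n)

estimates = {
    "difference-in-means": difference_in_means(y, w),
    "ANCOVA I": ancova_1(y, w, x),
    "ANCOVA II": ancova_2(y, w, x),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

All three estimators target the same τ; they differ only in how much outcome variance the regression removes before the arms are compared, which is what the theorems in this appendix quantify.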
Theorem A.3
The ANCOVA I estimator is asymptotically unbiased for τ = μ 1 − μ 0 and has asymptotic variance given by
where
Proof
We begin by applying Lemma A.1. Minimization of the expected log-likelihood shows that
where
Where
It is known that all regular and asymptotically linear estimators of the treatment effect have an influence function of this form with h(X) dependent on the choice of estimator [24, 26].
By the theory of influence functions, our estimator has a limiting distribution [26]
The asymptotic variance of
The covariance of the two terms involves the expectations
where we have introduced ξ * = π 1 ξ 0 + π 0 ξ 1. Assembling obtains the desired result. □
Corollary A.3.1
When X ∈ R (a single covariate), a consistent estimate of the sampling variance
where
Proof
This follows from the definitions and Slutsky’s theorem. □
Corollary A.3.2
If either π 0 = π 1 or ξ 0 = ξ 1, then
Theorem A.4
The ANCOVA II estimator is asymptotically unbiased for τ = μ 1 − μ 0 and has asymptotic variance given by
Proof
Arguments similar to those in Theorem A.3 show that the influence function for the GLM marginal effect estimator with this specification is identical to Eq. (12) except that ξ = π 0 ξ 0 + π 1 ξ 1 is replaced by ξ * = π 1 ξ 0 + π 0 ξ 1. Specifically ψ II = ψ 1,II − ψ 0,II with
The result follows from proceeding along the outline of Theorem A.3. □
Corollary A.4.1
When X ∈ R (a single covariate), a consistent estimate of the sampling variance
Corollary A.4.2
Adding covariates to the ANCOVA II estimator can only decrease its asymptotic variance.
Proof
Consider using covariates X with variance Σ x and covariance with Y w of ξ w,x versus a set of covariates [X, M] (
The denominator must be positive because
Theorem A.5
ANCOVA II is a more efficient estimator than ANCOVA I or difference-in-means. ANCOVA I may or may not be more efficient than difference-in-means (unless π 0 = π 1 = 0.5 or ξ 0 = ξ 1, in which case it is as efficient as ANCOVA II). In a slight abuse of notation,
Proof
Lemma A.6
Consider using the ANCOVA II estimator with an arbitrary (multivariate) transformation of the covariates f(X) in place of the raw covariates X. Among all fixed transformations f(X), the transformation
Consider replacing X in the interacted linear model (ANCOVA II) with an arbitrary fixed (possibly multivariate) function of the covariates f(X). By Eq. (23) and our definitions of ξ * and V the influence function for this estimator is ψ = ψ 1 − ψ 0 with
where
The result is precisely the efficient influence function for the treatment effect [24, 26]. It is known that no regular and asymptotically linear (RAL) estimator (which essentially all practical and reasonable estimators are) can be more efficient than any estimator with this influence function.
Corollary A.6.1
Presume a constant treatment effect: μ 1(X) = μ 0(X) + τ. Then the ANCOVA II analysis that uses μ 0(X) in the role of X has the lowest possible asymptotic variance among all regular and asymptotically linear estimators with access to the covariates X.
Proof
μ 1(X) = μ 0(X) + τ implies
which is the same as the efficient influence function when μ 1(X) = μ 0(X) + τ. □
Corollary A.6.2
Corollary A.6.1 also holds when the ANCOVA II estimator is replaced by the ANCOVA I estimator.
Proof
Theorem A.5 establishes that ANCOVA I is as efficient as ANCOVA II when
The following lemma is required for the proof that follows it.
Lemma A.7
Let
Proof
The final convergence holds by our assumption that
Taking advantage of the fact that |f|, |f n | ≤ b are bounded we can make similar arguments to show that
Corollary A.7.1
Let
Proof
Let
Now note
as desired. □
Theorem A.8
Presume X has compact support and there is a constant treatment effect: μ 1(X) = μ 0(X) + τ with |μ 0(x)| < b bounded. Let m(x) be a (random) function learned from the external data (Y′, X′) n′ such that |m(x)| < b is also bounded and
Proof
Define our estimator of interest as the ANCOVA II estimator that uses the learned model m(X) in place of the covariates X if m(X) is not numerically constant up to some machine precision and otherwise as the difference-in-means estimator. Denote this estimator
Showing
where
Let
To wit, consider the difference
where we’ve abbreviated
And show it converges to 0. Recalling that m itself is random (depends on the external data (X′, Y′)), but independent of the trial data (X, W, Y), note that we can treat m(⋅) as if it were a fixed function and B as a fixed constant if we condition on the external data. After conditioning, the quantity inside the parentheses is IID and has mean zero because its μ 0(X) − m(X)B and
where we’ve used the fact that the summands are IID to pass the variance through the sum and effectively gain the 1/n required to cancel the n. The same argument shows that the equivalent for the second term in Eq. (35) is
To complete the proof we invoke Corollary A.7.1 in combination with our assumptions |m(x)| < b, |μ 0(x)| < b and
Corollary A.8.1
Theorem A.8 also holds for the ANCOVA I estimator.
Proof
In the case of a constant treatment effect ANCOVA I and ANCOVA II have the same asymptotic variance (Theorem A.5). The result follows immediately. □
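The estimator analyzed in the proof of Theorem A.8 (ANCOVA II on the learned score m(X), falling back to the difference in means when m(X) is numerically constant) can be sketched as follows. This is a hedged illustration only: the cubic polynomial standing in for the prognostic model, the quadratic control-arm mean `mu0`, the tolerance `tol`, and all function names are assumptions made for the example and do not come from the article.

```python
import numpy as np


def fit_prognostic_model(x_hist, y_hist, degree=3):
    """Stand-in prognostic model: polynomial fit to historical control data."""
    coefs = np.polyfit(x_hist, y_hist, deg=degree)
    return lambda x: np.polyval(coefs, x)


def prognostic_adjusted_effect(y, w, x, m, tol=1e-12):
    """ANCOVA II on the prognostic score m(X); difference in means when the
    score is numerically constant (the fallback used in the proof)."""
    score = m(x)
    if score.max() - score.min() < tol:
        return y[w == 1].mean() - y[w == 0].mean()
    sc = score - score.mean()  # center so the W coefficient is the effect
    design = np.column_stack([np.ones(len(y)), w, sc, w * sc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
tau = 0.5  # constant treatment effect, as presumed in Theorem A.8


def mu0(x):
    """Hypothetical nonlinear control-arm conditional mean."""
    return x ** 2


# External (historical) control data, independent of the trial data.
x_hist = rng.normal(size=5000)
y_hist = mu0(x_hist) + rng.normal(scale=0.5, size=5000)
m = fit_prognostic_model(x_hist, y_hist)

unadjusted, adjusted = [], []
for _ in range(300):  # repeated simulated trials of 200 subjects each
    x = rng.normal(size=200)
    w = rng.binomial(1, 0.5, size=200)
    y = mu0(x) + tau * w + rng.normal(scale=0.5, size=200)
    unadjusted.append(y[w == 1].mean() - y[w == 0].mean())
    adjusted.append(prognostic_adjusted_effect(y, w, x, m))

var_unadjusted = np.var(unadjusted)
var_adjusted = np.var(adjusted)
print(f"empirical variance, unadjusted: {var_unadjusted:.4f}")
print(f"empirical variance, prognostic adjustment: {var_adjusted:.4f}")
```

Because the model is trained only on external data, it is a fixed function from the trial's point of view, which mirrors the conditioning argument used in the proof.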
Appendix B. Estimating σ_w² and ρ_w for power calculations
One method for obtaining estimates for the marginal potential outcome variances (
The control-arm marginal outcome variance
The correlation ρ 0 between M″ and Y″ can be estimated by
which is the usual sample correlation coefficient. These values may be inflated (
The corresponding values for the treatment arm can rarely be estimated from data because treatment-arm data for the experimental treatment is likely to be scarce or unavailable. It is therefore prudent to assume
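The control-arm quantities discussed in this appendix can be computed along the following lines. This is a rough sketch under assumptions, not the article's procedure: `y_holdout` and `m_holdout` are hypothetical names for held-out historical outcomes Y″ and the corresponding prognostic scores M″, the data are simulated, and the sample-size factor 1 − ρ² is the classical ANCOVA design factor (see Borm et al. [21]), used here under the additional assumptions of 1:1 randomization and equal variances and correlations in both arms.

```python
import numpy as np


def control_arm_inputs(y_holdout, m_holdout):
    """Sample variance of Y'' and sample correlation between M'' and Y''."""
    sigma0_sq = np.var(y_holdout, ddof=1)
    rho0 = np.corrcoef(m_holdout, y_holdout)[0, 1]
    return sigma0_sq, rho0


def sample_size_ratio(rho):
    """Classical ANCOVA design factor: adjusted / unadjusted sample size.

    Assumes 1:1 randomization and the same outcome variance and
    score-outcome correlation in both arms (assumptions of this sketch).
    """
    return 1.0 - rho ** 2


# Hypothetical held-out data: the score explains about 25% of the variance.
rng = np.random.default_rng(2)
m_holdout = rng.normal(size=20000)
y_holdout = m_holdout + rng.normal(scale=np.sqrt(3.0), size=20000)

sigma0_sq, rho0 = control_arm_inputs(y_holdout, m_holdout)
ratio = sample_size_ratio(rho0)
print(f"sigma0^2 = {sigma0_sq:.2f}, rho0 = {rho0:.2f}, size ratio = {ratio:.2f}")
```

Under these assumptions, a correlation of about 0.5 corresponds to a sample size roughly 25% smaller than for the unadjusted analysis at the same power. Using a deliberately conservative (smaller) ρ in the calculation guards against optimism in the held-out estimate.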
Appendix C. Additional simulation results
Here we detail a full set of simulation results using additional specifications for the regression estimators (Figure 1). “Covariates” indicates whether the raw covariates were adjusted for. “Prognostic score” indicates whether any prognostic score was used, and, if so, whether it was estimated from a training dataset or whether the true value was used. “Interactions” specifies whether treatment × (covariates and/or prognostic score) interactions were used. “SE” indicates the standard deviation of the mean squared error.
Scenario | Covariates | Prognostic score | Interactions | MSE | SE |
---|---|---|---|---|---|
Baseline | False | None | True | 7.64 × 10⁻² | 1.08 × 10⁻³ |
Baseline | False | None | False | 7.64 × 10⁻² | 1.08 × 10⁻³ |
Baseline | False | Estimated | True | 1.76 × 10⁻² | 2.46 × 10⁻⁴ |
Baseline | False | Estimated | False | 1.75 × 10⁻² | 2.45 × 10⁻⁴ |
Baseline | False | Oracle | True | 7.69 × 10⁻³ | 1.09 × 10⁻⁴ |
Baseline | False | Oracle | False | 7.69 × 10⁻³ | 1.09 × 10⁻⁴ |
Baseline | True | None | True | 5.07 × 10⁻² | 7.18 × 10⁻⁴ |
Baseline | True | None | False | 5.04 × 10⁻² | 7.14 × 10⁻⁴ |
Baseline | True | Estimated | True | 1.74 × 10⁻² | 2.46 × 10⁻⁴ |
Baseline | True | Estimated | False | 1.73 × 10⁻² | 2.44 × 10⁻⁴ |
Baseline | True | Oracle | True | 7.85 × 10⁻³ | 1.11 × 10⁻⁴ |
Baseline | True | Oracle | False | 7.85 × 10⁻³ | 1.11 × 10⁻⁴ |
Surrogate | False | None | True | 7.47 × 10⁻² | 1.05 × 10⁻³ |
Surrogate | False | None | False | 7.47 × 10⁻² | 1.05 × 10⁻³ |
Surrogate | False | Estimated | True | 4.05 × 10⁻² | 5.69 × 10⁻⁴ |
Surrogate | False | Estimated | False | 4.03 × 10⁻² | 5.66 × 10⁻⁴ |
Surrogate | False | Oracle | True | 8.25 × 10⁻³ | 1.18 × 10⁻⁴ |
Surrogate | False | Oracle | False | 8.24 × 10⁻³ | 1.18 × 10⁻⁴ |
Surrogate | True | None | True | 5.03 × 10⁻² | 7.09 × 10⁻⁴ |
Surrogate | True | None | False | 5.00 × 10⁻² | 7.04 × 10⁻⁴ |
Surrogate | True | Estimated | True | 3.75 × 10⁻² | 5.27 × 10⁻⁴ |
Surrogate | True | Estimated | False | 3.72 × 10⁻² | 5.23 × 10⁻⁴ |
Surrogate | True | Oracle | True | 8.41 × 10⁻³ | 1.20 × 10⁻⁴ |
Surrogate | True | Oracle | False | 8.41 × 10⁻³ | 1.20 × 10⁻⁴ |
Shifted | False | None | True | 7.65 × 10⁻² | 1.10 × 10⁻³ |
Shifted | False | None | False | 7.65 × 10⁻² | 1.10 × 10⁻³ |
Shifted | False | Estimated | True | 6.79 × 10⁻² | 9.62 × 10⁻⁴ |
Shifted | False | Estimated | False | 6.79 × 10⁻² | 9.62 × 10⁻⁴ |
Shifted | False | Oracle | True | 8.20 × 10⁻³ | 1.15 × 10⁻⁴ |
Shifted | False | Oracle | False | 8.20 × 10⁻³ | 1.15 × 10⁻⁴ |
Shifted | True | None | True | 5.03 × 10⁻² | 7.11 × 10⁻⁴ |
Shifted | True | None | False | 5.00 × 10⁻² | 7.05 × 10⁻⁴ |
Shifted | True | Estimated | True | 4.91 × 10⁻² | 6.97 × 10⁻⁴ |
Shifted | True | Estimated | False | 4.86 × 10⁻² | 6.90 × 10⁻⁴ |
Shifted | True | Oracle | True | 8.34 × 10⁻³ | 1.17 × 10⁻⁴ |
Shifted | True | Oracle | False | 8.34 × 10⁻³ | 1.17 × 10⁻⁴ |
Strong | False | None | True | 7.73 × 10⁻² | 1.08 × 10⁻³ |
Strong | False | None | False | 7.73 × 10⁻² | 1.08 × 10⁻³ |
Strong | False | Estimated | True | 1.85 × 10⁻² | 2.65 × 10⁻⁴ |
Strong | False | Estimated | False | 1.85 × 10⁻² | 2.64 × 10⁻⁴ |
Strong | False | Oracle | True | 8.16 × 10⁻³ | 1.16 × 10⁻⁴ |
Strong | False | Oracle | False | 8.16 × 10⁻³ | 1.16 × 10⁻⁴ |
Strong | True | None | True | 5.14 × 10⁻² | 7.18 × 10⁻⁴ |
Strong | True | None | False | 5.11 × 10⁻² | 7.13 × 10⁻⁴ |
Strong | True | Estimated | True | 1.84 × 10⁻² | 2.62 × 10⁻⁴ |
Strong | True | Estimated | False | 1.82 × 10⁻² | 2.59 × 10⁻⁴ |
Strong | True | Oracle | True | 8.33 × 10⁻³ | 1.18 × 10⁻⁴ |
Strong | True | Oracle | False | 8.32 × 10⁻³ | 1.18 × 10⁻⁴ |
Linear | False | None | True | 3.49 × 10⁻² | 4.83 × 10⁻⁴ |
Linear | False | None | False | 3.49 × 10⁻² | 4.83 × 10⁻⁴ |
Linear | False | Estimated | True | 9.64 × 10⁻³ | 1.38 × 10⁻⁴ |
Linear | False | Estimated | False | 9.64 × 10⁻³ | 1.38 × 10⁻⁴ |
Linear | False | Oracle | True | 8.20 × 10⁻³ | 1.16 × 10⁻⁴ |
Linear | False | Oracle | False | 8.20 × 10⁻³ | 1.16 × 10⁻⁴ |
Linear | True | None | True | 8.37 × 10⁻³ | 1.18 × 10⁻⁴ |
Linear | True | None | False | 8.37 × 10⁻³ | 1.18 × 10⁻⁴ |
Linear | True | Estimated | True | 8.39 × 10⁻³ | 1.19 × 10⁻⁴ |
Linear | True | Estimated | False | 8.39 × 10⁻³ | 1.19 × 10⁻⁴ |
Linear | True | Oracle | True | 8.37 × 10⁻³ | 1.18 × 10⁻⁴ |
Linear | True | Oracle | False | 8.37 × 10⁻³ | 1.18 × 10⁻⁴ |
Heterogeneous | False | None | True | 5.54 × 10⁻² | 7.76 × 10⁻⁴ |
Heterogeneous | False | None | False | 5.54 × 10⁻² | 7.76 × 10⁻⁴ |
Heterogeneous | False | Estimated | True | 2.30 × 10⁻² | 3.23 × 10⁻⁴ |
Heterogeneous | False | Estimated | False | 2.32 × 10⁻² | 3.25 × 10⁻⁴ |
Heterogeneous | False | Oracle | True | 2.29 × 10⁻² | 3.20 × 10⁻⁴ |
Heterogeneous | False | Oracle | False | 2.32 × 10⁻² | 3.24 × 10⁻⁴ |
Heterogeneous | True | None | True | 2.99 × 10⁻² | 4.30 × 10⁻⁴ |
Heterogeneous | True | None | False | 2.98 × 10⁻² | 4.29 × 10⁻⁴ |
Heterogeneous | True | Estimated | True | 2.13 × 10⁻² | 3.01 × 10⁻⁴ |
Heterogeneous | True | Estimated | False | 2.19 × 10⁻² | 3.08 × 10⁻⁴ |
Heterogeneous | True | Oracle | True | 1.89 × 10⁻² | 2.69 × 10⁻⁴ |
Heterogeneous | True | Oracle | False | 1.98 × 10⁻² | 2.81 × 10⁻⁴ |

Visualization of the simulation results presented in tabular form above.
Appendix D. Covariates in the empirical demonstration dataset
Baseline covariates in the DHA study and ADNI/CPAD historical training data.
Covariate | Description |
---|---|
AChEI or memantine usage | Whether a subject is using a class of symptomatic Alzheimer’s drugs |
ADAS commands | Assesses the subject’s ability to follow commands |
ADAS comprehension | Assesses the subject’s ability to understand spoken language |
ADAS construction | Assesses the subject’s ability to draw basic figures |
ADAS ideational | Assesses the subject’s ability to carry out a basic task |
ADAS naming | Assesses the subject’s ability to name common objects |
ADAS orientation | Assesses the subject’s knowledge of time and place |
ADAS remember instructions | Assesses the subject’s ability to remember test instructions |
ADAS spoken language | Assesses the subject’s ability to speak clearly |
ADAS word finding | Assesses the subject’s word finding in speech |
ADAS word recall | Assesses the subject’s ability to recall a list of words |
ADAS word recognition | Assesses the subject’s ability to remember and identify words |
Age | Subject age at baseline |
ApoE e4 Allele count | The number of ApoE e4 alleles a subject has (0, 1, or 2) |
CDR community | Assesses the subject’s engagement in community activities |
CDR home and hobbies | Assesses the subject’s engagement in home and personal activities |
CDR judgement | Assesses the subject’s judgement skills |
CDR memory | Assesses the subject’s memory |
CDR orientation | Assesses the subject’s knowledge of time and place |
CDR personal care | Assesses the subject’s ability to care for themselves |
Diastolic blood pressure | The diastolic blood pressure of a subject |
Education (Years) | The number of years of education of a subject |
Heart rate | The resting heart rate of a subject |
Height | The height of a subject |
Indicator for clinical trial | 1 if the subject is in an RCT, 0 if not |
MMSE attention and calculation | Assesses the subject’s attention and calculation skills |
MMSE language | Assesses the subject’s language skills |
MMSE orientation | Assesses the subject’s knowledge of place and time |
MMSE recall | Assesses the subject’s ability to remember prompts |
MMSE registration | Assesses the subject’s ability to repeat prompts |
Region: Europe | 1 if the subject lives in Europe, 0 otherwise |
Region: Northern America | 1 if the subject lives in the US or Canada, 0 otherwise |
Region: Other | 1 if the subject lives outside of Europe/US/Canada, 0 otherwise |
Serious adverse events | The number of serious adverse events reported |
Sex | 1 if female, 0 if male |
Systolic blood pressure | The systolic blood pressure of a subject |
Weight | The weight of a subject |
References
1. Maldonado, G, Greenland, S. Estimating causal effects. Int J Epidemiol 2002;31:422–9. https://doi.org/10.1093/ije/31.2.422.
2. Sox, HC, Goodman, SN. The methods of comparative effectiveness research. Publ Health 2012;33:425–45. https://doi.org/10.1146/annurev-publhealth-031811-124610.
3. Overhage, JM, Ryan, PB, Schuemie, MJ, Stang, PE. Desideratum for evidence based epidemiology. Drug Saf 2013;36:5–14. https://doi.org/10.1007/s40264-013-0102-2.
4. Hannan, EL. Randomized clinical trials and observational studies guidelines for assessing respective strengths and limitations. JACC Cardiovasc Interv 2008;1:211–7. https://doi.org/10.1016/j.jcin.2008.01.008.
5. Kopp-Schneider, A, Calderazzo, S, Wiesenfarth, M. Power gains by using external information in clinical trials are typically not possible when requiring strict type I error control. Biom J 2020;62:361–74. https://doi.org/10.1002/bimj.201800395.
6. Ibrahim, JG, Chen, M-H, Gwon, Y, Chen, F. The power prior: theory and applications. Stat Med 2015;34:3724–49. https://doi.org/10.1002/sim.6728.
7. Lim, J, Walley, R, Yuan, J, Liu, J, Dabral, A, Best, N. Minimizing patient burden through the use of historical subject-level data in innovative confirmatory clinical trials. TIRS 2018;52:546–59. https://doi.org/10.1177/2168479018778282.
8. Baker, SG, Lindeman, KS. Rethinking historical controls. Biostatistics 2001;2:383–96. https://doi.org/10.1093/biostatistics/2.4.383.
9. Ghadessi, M, Tang, R, Zhou, J, Liu, R, Wang, C, Toyoizumi, K, et al. A roadmap to using historical controls in clinical trials – by drug information association adaptive design scientific working group (DIA-ADSWG). Orphanet J Rare Dis 2020;15:69. https://doi.org/10.1186/s13023-020-1332-x.
10. Hansen, BB. The prognostic analogue of the propensity score. Biometrika 2008;95:481–8. https://doi.org/10.1093/biomet/asn004.
11. Aikens, RC, Greaves, D, Baiocchi, M. A pilot design for observational studies: using abundant data thoughtfully. Stat Med 2020;39:4821–40. https://doi.org/10.1002/sim.8754.
12. Wyss, R, Lunt, M, Brookhart, MA, Glynn, RJ, Stürmer, T. Reducing bias amplification in the presence of unmeasured confounding through out-of-sample estimation strategies for the disease risk score. J Causal Inference 2014;2:131–46. https://doi.org/10.1515/jci-2014-0009.
13. Lin, W. Agnostic notes on regression adjustments to experimental data: reexamining Freedman’s critique. Ann Appl Stat 2013;7:295–318. https://doi.org/10.1214/12-aoas583.
14. Kahan, BC, Jairath, V, J Doré, C, Morris, TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 2014;15:139. https://doi.org/10.1186/1745-6215-15-139.
15. Raab, GM, Day, S, Sales, J. How to select covariates to include in the analysis of a clinical trial. Contr Clin Trials 2000;21:330–42. https://doi.org/10.1016/s0197-2456(00)00061-1.
16. Yang, L, Tsiatis, AA. Efficiency study of estimators for a treatment effect in a pretest–posttest trial. Am Statistician 2001;55:314–21. https://doi.org/10.1198/000313001753272466.
17. Committee for Medicinal Products for Human Use. Guideline on adjustment for baseline covariates in clinical trials. London: European Medicines Agency; 2015.
18. Cooney, MT, Dudina, AL, Graham, IM. Value and limitations of existing scores for the assessment of cardiovascular risk: a review for clinicians. J Am Coll Cardiol 2009;54:1209–27. https://doi.org/10.1016/j.jacc.2009.07.020.
19. Austin, SR, Wong, Y-N, Uzzo, RG, Beck, JR, Egleston, BL. Why summary comorbidity measures such as the Charlson comorbidity index and Elixhauser score work. Medical Care 2015;53:e65–72. https://doi.org/10.1097/mlr.0b013e318297429c.
20. Ambrosius, WT, Sink, KM, Foy, CG, Berlowitz, DR, Cheung, AK, Cushman, WC, et al., The SPRINT Study Research Group. The design and rationale of a multicenter clinical trial comparing two strategies for control of systolic blood pressure: the systolic blood pressure intervention trial (SPRINT). Clin Trials 2014;11:532–46. https://doi.org/10.1177/1740774514537404.
21. Borm, GF, Fransen, J, Lemmens, WAJG. A simple sample size formula for analysis of covariance in randomized clinical trials. J Clin Epidemiol 2007;60:1234–8. https://doi.org/10.1016/j.jclinepi.2007.02.006.
22. Rubin, DB. Causal inference using potential outcomes. J Am Stat Assoc 2005;100:322–31. https://doi.org/10.1198/016214504000001880.
23. Wang, B, Ogburn, EL, Rosenblum, M. Analysis of covariance in randomized trials: more precision and valid confidence intervals, without model assumptions. Biometrics 2019;75:1391–400. https://doi.org/10.1111/biom.13062.
24. Leon, S, Tsiatis, AA, Davidian, M. Semiparametric estimation of treatment effect in a pretest–posttest study. Biometrics 2003;59:1046–55. https://doi.org/10.1111/j.0006-341x.2003.00120.x.
25. Aronow, PM, Miller, BT. Foundations of agnostic statistics. New York: Cambridge University Press; 2019:286–7 pp. https://doi.org/10.1017/9781316831762.010.
26. Tsiatis, A. Semiparametric theory and missing data. New York: Springer Science & Business Media; 2007.
27. Luo, Y, Spindler, M. High-dimensional L2 boosting: rate of convergence. 2016 arXiv.
28. Belloni, A, Chernozhukov, V. Least squares after model selection in high-dimensional sparse models. Bernoulli 2013;19:521–47. https://doi.org/10.3150/11-bej410.
29. Farrell, MH, Liang, T, Misra, S. Deep neural networks for estimation and inference. 2018 arXiv. https://doi.org/10.3982/ECTA16901.
30. Syrgkanis, V, Zampetakis, M. Estimation and inference with trees and forests in high dimensions. 2020 arXiv.
31. Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, et al. Scikit-learn: machine learning in Python. 2012 arXiv.
32. Quinn, JF, Raman, R, Thomas, RG, Yurko-Mauro, K, Nelson, EB, Van Dyck, C, et al. Docosahexaenoic acid supplementation and cognitive decline in Alzheimer disease: a randomized trial. J Am Med Assoc 2010;304:1903–11. https://doi.org/10.1001/jama.2010.1510.
33. Coon, KD, Myers, AJ, Craig, DW, Webster, JA, Pearson, JV, Lince, DH, et al. A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer’s disease. J Clin Psychiatr 2007;68:613–8. https://doi.org/10.4088/jcp.v68n0419.
34. Rosen, WG, Mohs, RC, Davis, KL. A new rating scale for Alzheimer’s disease. Am J Psychiatr 1984;141:1356–64. https://doi.org/10.1176/ajp.141.11.1356.
35. Galasko, D, Bennett, D, Sano, M, Ernesto, C, Thomas, R, Grundman, M, et al. An inventory to assess activities of daily living for clinical trials in Alzheimer’s disease. The Alzheimer’s disease cooperative study. Alzheimer Dis Assoc Disord 1997;11:S33–9. https://doi.org/10.1097/00002093-199700112-00005.
36. Morris, JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology 1993;43:2412–4. https://doi.org/10.1212/wnl.43.11.2412-a.
37. Neville, J, Kopko, S, Broadbent, S, Avilés, E, Stafford, R, Solinsky, CM, et al., Coalition Against Major Diseases. Development of a unified clinical trial database for Alzheimer’s disease. Alzheimer’s Dementia 2015;11:1212–21. https://doi.org/10.1016/j.jalz.2014.11.005.
38. Romero, K, Mars, M, Frank, D, Anthony, M, Neville, J, Kirby, L, et al. The coalition against major diseases: developing tools for an integrated drug development process for Alzheimer’s and Parkinson’s diseases. Clin Pharmacol Ther 2009;86:365–7. https://doi.org/10.1038/clpt.2009.165.
39. Chernozhukov, V, Chetverikov, D, Demirer, M, Duflo, E, Hansen, C, Newey, W, et al. Double/debiased machine learning for treatment and structural parameters. Econom J 2018;21:C1–68. https://doi.org/10.1111/ectj.12097.
40. Wager, S, Du, W, Taylor, J, Tibshirani, RJ. High-dimensional regression adjustments in randomized experiments. Proc Natl Acad Sci Unit States Am 2016;113:12673–8. https://doi.org/10.1073/pnas.1614732113.
41. Rothe, C. Flexible covariate adjustments in randomized experiments. Working Paper; 2018.
42. Dankar, FK, El Emam, K. The application of differential privacy to health data. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops on – EDBT-ICDT ’12; 2012. pp. 158–66. https://doi.org/10.1145/2320765.2320816.
43. Brisimi, TS, Chen, R, Mela, T, Olshevsky, A, Paschalidis, IC, Shi, W. Federated learning of predictive models from federated electronic health records. Int J Med Inf 2018;112:59–67. https://doi.org/10.1016/j.ijmedinf.2018.01.007.
44. Fisher, CK, Smith, AM, Walsh, JR, Coalition Against Major Diseases. Machine learning for comprehensive forecasting of Alzheimer’s disease progression. Sci Rep 2019;9:13622. https://doi.org/10.1038/s41598-019-49656-2.
45. Rajkomar, A, Oren, E, Chen, K, Dai, AM, Hajaj, N, Hardt, M, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine 2018;1:18. https://doi.org/10.1038/s41746-018-0029-1.
46. LeCun, Y, Bengio, Y, Hinton, G. Deep learning. Nature 2015;521:436. https://doi.org/10.1038/nature14539.
47. Miotto, R, Wang, F, Wang, S, Jiang, X, Dudley, JT. Deep learning for healthcare: review, opportunities and challenges. Briefings Bioinf 2018;19:1236–46. https://doi.org/10.1093/bib/bbx044.
48. Dubois, S, Romano, N, Jung, K, Shah, N, Kale, D. The effectiveness of transfer learning in electronic health records data. In: Workshop Track - ICLR; 2017.
49. van der Vaart, AW. Asymptotic statistics. Cambridge: Cambridge University Press; 2000.
50. Robins, JM, Rotnitzky, A, Zhao, LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 1994;89:846. https://doi.org/10.2307/2290910.
51. Rosenblum, M, van der Laan, MJ. Simple, efficient estimators of treatment effects in randomized trials using generalized linear models to leverage baseline variables. Int J Biostat 2010;6:13. https://doi.org/10.2202/1557-4679.1138.
52. Freedman, DA. On regression adjustments to experimental data. Adv Appl Math 2008;40:180–93. https://doi.org/10.1016/j.aam.2006.12.003.
53. Long, JS, Ervin, LH. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician 2012;54:217–24. https://doi.org/10.1080/00031305.2000.10474549.
© 2021 Walter de Gruyter GmbH, Berlin/Boston