Abstract
Estimating causal effects from randomized experiments is central to clinical research. Reducing the statistical uncertainty in these analyses is an important objective for statisticians. Registries, prior trials, and health records constitute a growing compendium of historical data on patients under standard-of-care that may be exploitable to this end. However, most methods for historical borrowing achieve reductions in variance by sacrificing strict type I error rate control. Here, we propose a use of historical data that exploits linear covariate adjustment to improve the efficiency of trial analyses without incurring bias. Specifically, we train a prognostic model on the historical data, then estimate the treatment effect using a linear regression while adjusting for the trial subjects’ predicted outcomes (their prognostic scores). We prove that, under certain conditions, this prognostic covariate adjustment procedure attains the minimum variance possible among a large class of estimators. When those conditions are not met, prognostic covariate adjustment is still more efficient than raw covariate adjustment and the gain in efficiency is proportional to a measure of the predictive accuracy of the prognostic model above and beyond the linear relationship with the raw covariates. We demonstrate the approach using simulations and a reanalysis of an Alzheimer’s disease clinical trial and observe meaningful reductions in mean-squared error and the estimated variance. Lastly, we provide a simplified formula for asymptotic variance that enables power calculations that account for these gains. Sample size reductions between 10% and 30% are attainable when using prognostic models that explain a clinically realistic percentage of the outcome variance.
Acknowledgments
We are grateful to Xinkun Nie and Oleg Sofrygin for enlightening conversations and to Rachael C. Aikens for feedback on a draft of this article. Data collection and sharing for this project was funded in part by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Data collection and sharing for this project was funded in part by the University of California, San Diego Alzheimer’s Disease Cooperative Study (ADCS) (National Institute on Aging Grant Number U19AG010483).

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

Research funding: None declared.

Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
Appendix A. Mathematical results
Throughout, we assume enough regularity conditions for the asymptotic normality of M-estimators to hold. The details are found in Chapter 5 (Theorem 5.23) of van der Vaart [49].
Lemma A.1
(Rosenblum). The influence function for the linear regression treatment effect estimator we describe in Section 3 is ψ = ψ_{1} − ψ_{0} where
and
This follows from results in Robins et al. [50]. An accessible presentation for the case of generalized linear models is given in Rosenblum and van der Laan [51].
Definition A.1
(Difference-in-means). The “difference-in-means” (or “unadjusted”) estimator of τ = μ_{1} − μ_{0} is
Note that throughout the appendix we omit the subscript n on estimators. For example, τ_{Δ} is shorthand for τ_{Δ,n}, and our asymptotic statements refer to the sequence of estimators as n becomes large.
Lemma A.2
The difference-in-means estimator has asymptotic variance given by
where
Proof
This fact is well known. One proof follows the outline of the proof of Theorem A.3 below, taking Z^{⊤} = [1, W]. □
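For concreteness, the estimator of Definition A.1 and the limit in Lemma A.2 can be written in their standard textbook forms. The notation is an assumption wherever the surrounding text does not fix it: $W_i \in \{0, 1\}$ is the treatment indicator, $\pi_w = P(W = w)$ is the allocation probability, and $\sigma_w^2 = \mathrm{Var}(Y_w)$ is the marginal variance of the potential outcome in arm $w$.

```latex
\hat{\tau}_{\Delta}
  = \frac{\sum_{i=1}^{n} W_i Y_i}{\sum_{i=1}^{n} W_i}
  - \frac{\sum_{i=1}^{n} (1 - W_i)\, Y_i}{\sum_{i=1}^{n} (1 - W_i)},
\qquad
\sqrt{n}\,\bigl(\hat{\tau}_{\Delta} - \tau\bigr)
  \xrightarrow{\;d\;}
  \mathcal{N}\!\left(0,\; V_{\Delta}\right),
\quad
V_{\Delta} = \frac{\sigma_0^2}{\pi_0} + \frac{\sigma_1^2}{\pi_1}.
```

With equal allocation ($\pi_0 = \pi_1 = 1/2$) and a common variance $\sigma^2$, this reduces to $V_{\Delta} = 4\sigma^2$.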
Definition A.2
(ANCOVA I). The “ANCOVA I” estimator of τ = μ_{1} − μ_{0} (denoted
Definition A.3
(ANCOVA II). The “ANCOVA II” estimator of τ = μ_{1} − μ_{0} (denoted
Theorems A.3 and A.4 below are mild generalizations of, or follow closely from, results stated in Leon et al. [24] and Yang and Tsiatis [16]. Details are provided here for the reader’s convenience.
Theorem A.3
The ANCOVA I estimator is asymptotically unbiased for τ = μ_{1} − μ_{0} and has asymptotic variance given by
where
Proof
We begin by applying Lemma A.1. Minimization of the expected log-likelihood shows that
where
Where
It is known that all regular and asymptotically linear estimators of the treatment effect have an influence function of this form with h(X) dependent on the choice of estimator [24, 26].
By the theory of influence functions, our estimator has a limiting distribution [26]
The asymptotic variance of
The covariance of the two terms involves the expectations
where we have introduced ξ_{*} = π_{1}ξ_{0} + π_{0}ξ_{1}. Assembling these terms gives the desired result. □
Corollary A.3.1
When X ∈ R (a single covariate), a consistent estimate of the sampling variance
where
Proof
This follows from the definitions and Slutsky’s theorem. □
Corollary A.3.2
If either π_{0} = π_{1} or ξ_{0} = ξ_{1}, then
Theorem A.4
The ANCOVA II estimator is asymptotically unbiased for τ = μ_{1} − μ_{0} and has asymptotic variance given by
Proof
Arguments similar to those in Theorem A.3 show that the influence function for the GLM marginal effect estimator with this specification is identical to Eq. (12) except that ξ = π_{0}ξ_{0} + π_{1}ξ_{1} is replaced by ξ_{*} = π_{1}ξ_{0} + π_{0}ξ_{1}. Specifically, ψ_{II} = ψ_{1,II} − ψ_{0,II} with
The result follows from proceeding along the outline of Theorem A.3. □
Corollary A.4.1
When X ∈ R (a single covariate), a consistent estimate of the sampling variance
Corollary A.4.2
Adding covariates to the ANCOVA II estimator can only decrease its asymptotic variance.
Proof
Consider using covariates X with variance Σ_{x} and covariance with Y_{w} of ξ_{w,x} versus a set of covariates [X, M] (
The denominator must be positive because
Theorem A.5
ANCOVA II is a more efficient estimator than ANCOVA I or difference-in-means. ANCOVA I may or may not be more efficient than difference-in-means (unless π_{0} = π_{1} = 0.5 or ξ_{0} = ξ_{1}, in which case it is as efficient as ANCOVA II). In a slight abuse of notation,
Proof
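One way to see the ordering is sketched below. It assumes that the asymptotic variances of Theorems A.3 and A.4 take their standard forms, written here as $V_{I}$ and $V_{II}$ (labels introduced for this sketch), with $V_{\Delta}$ the difference-in-means variance of Lemma A.2, $\Sigma = \mathrm{Var}(X)$ positive definite, $\xi_w = \mathrm{Cov}(X, Y_w)$, $\xi = \pi_0 \xi_0 + \pi_1 \xi_1$ and $\xi_{*} = \pi_1 \xi_0 + \pi_0 \xi_1$.

```latex
\begin{aligned}
V_{I}  &= V_{\Delta}
          + \frac{1}{\pi_0 \pi_1}\,\xi^{\top} \Sigma^{-1} \xi
          - \frac{2}{\pi_0 \pi_1}\,\xi_{*}^{\top} \Sigma^{-1} \xi, \\
V_{II} &= V_{\Delta}
          - \frac{1}{\pi_0 \pi_1}\,\xi_{*}^{\top} \Sigma^{-1} \xi_{*}, \\[4pt]
V_{\Delta} - V_{II}
       &= \frac{1}{\pi_0 \pi_1}\,\xi_{*}^{\top} \Sigma^{-1} \xi_{*} \;\ge\; 0, \\
V_{I} - V_{II}
       &= \frac{1}{\pi_0 \pi_1}\,(\xi - \xi_{*})^{\top} \Sigma^{-1} (\xi - \xi_{*})
        = \frac{(\pi_0 - \pi_1)^2}{\pi_0 \pi_1}\,
          (\xi_0 - \xi_1)^{\top} \Sigma^{-1} (\xi_0 - \xi_1) \;\ge\; 0, \\
V_{I} - V_{\Delta}
       &= \frac{1}{\pi_0 \pi_1}\,\xi^{\top} \Sigma^{-1} (\xi - 2\,\xi_{*}).
\end{aligned}
```

The fourth line uses $\xi - \xi_{*} = (\pi_0 - \pi_1)(\xi_0 - \xi_1)$, so $V_{I} = V_{II}$ when $\pi_0 = \pi_1$ or $\xi_0 = \xi_1$. The last line has no fixed sign, which is why ANCOVA I may be more or less efficient than difference-in-means.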
Lemma A.6
Consider using the ANCOVA II estimator with an arbitrary (multivariate) transformation of the covariates f(X) in place of the raw covariates X. Among all fixed transformations f(X), the transformation
Consider replacing X in the interacted linear model (ANCOVA II) with an arbitrary fixed (possibly multivariate) function of the covariates f(X). By Eq. (23) and our definitions of ξ_{*} and V, the influence function for this estimator is ψ = ψ_{1} − ψ_{0} with
where
The result is precisely the efficient influence function for the treatment effect [24, 26]. It is known that no regular and asymptotically linear (RAL) estimator (which essentially all practical and reasonable estimators are) can be more efficient than any estimator with this influence function.
Corollary A.6.1
Presume a constant treatment effect: μ_{1}(X) = μ_{0}(X) + τ. Then the ANCOVA II analysis that uses μ_{0}(X) in the role of X has the lowest possible asymptotic variance among all regular and asymptotically linear estimators with access to the covariates X.
Proof
μ_{1}(X) = μ_{0}(X) + τ implies
which is the same as the efficient influence function when μ_{1}(X) = μ_{0}(X) + τ. □
Corollary A.6.2
Corollary A.6.1 also holds when the ANCOVA II estimator is replaced by the ANCOVA I estimator.
Proof
Theorem A.5 establishes that ANCOVA I is as efficient as ANCOVA II when
The following lemma is required for the proof that follows it.
Lemma A.7
Let
Proof
The final convergence holds by our assumption that
Because f and f_{n} are bounded (f, f_{n} ≤ b), we can make similar arguments to show that
Corollary A.7.1
Let
Proof
Let
Now note
as desired. □
Theorem A.8
Presume X has compact support and there is a constant treatment effect: μ_{1}(X) = μ_{0}(X) + τ with μ_{0}(x) < b bounded. Let m(x) be a (random) function learned from the external data (Y′, X′)_{n′} such that m(x) < b is also bounded and
Proof
Define our estimator of interest as the ANCOVA II estimator that uses the learned model m(X) in place of the covariates X if m(X) is not numerically constant up to some machine precision, and otherwise as the difference-in-means estimator. Denote this estimator
Showing
where
Let
To wit, consider the difference
where we’ve abbreviated
and show it converges to 0. Recalling that m itself is random (depends on the external data (X′, Y′)), but independent of the trial data (X, W, Y), note that we can treat m(⋅) as if it were a fixed function and B as a fixed constant if we condition on the external data. After conditioning, the quantity inside the parentheses is IID and has mean zero because its μ_{0}(X) − m(X)B and
where we’ve used the fact that the summands are IID to pass the variance through the sum and effectively gain the 1/n required to cancel the n. The same argument shows that the equivalent for the second term in Eq. (35) is
To complete the proof we invoke Corollary A.7.1 in combination with our assumptions m(x) < b, μ_{0}(x) < b and
Corollary A.8.1
Theorem A.8 also holds for the ANCOVA I estimator.
Proof
In the case of a constant treatment effect, ANCOVA I and ANCOVA II have the same asymptotic variance (Theorem A.5). The result follows immediately. □
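The estimator defined in the proof of Theorem A.8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `prognostic_ancova2`, `tol`, and the toy `model` callables are hypothetical, and the prognostic model is taken as already trained on the external data and passed in as a function. With a single covariate m(X), the interacted linear model is equivalent to fitting a separate simple regression in each arm and contrasting the two fitted lines at the pooled mean score, which is what the code does.

```python
from statistics import fmean


def _fit_line(m, y):
    """Least-squares slope and intercept of y on a single score m."""
    m_bar, y_bar = fmean(m), fmean(y)
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    sxy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if the score does not vary
    return slope, y_bar - slope * m_bar


def prognostic_ancova2(y, w, x, model, tol=1e-12):
    """ANCOVA II on the prognostic score model(x).

    y: outcomes, w: 0/1 treatment indicators, x: covariate records,
    model: callable mapping one covariate record to a predicted outcome
    (assumed to have been trained on historical data only).
    Falls back to difference-in-means when the score is numerically constant.
    """
    scores = [model(xi) for xi in x]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    if max(scores) - min(scores) <= tol:
        return fmean(y1) - fmean(y0)
    m1 = [mi for mi, wi in zip(scores, w) if wi == 1]
    m0 = [mi for mi, wi in zip(scores, w) if wi == 0]
    b1, a1 = _fit_line(m1, y1)
    b0, a0 = _fit_line(m0, y0)
    m_bar = fmean(scores)  # evaluate both arm-specific lines at the pooled mean
    return (a1 + b1 * m_bar) - (a0 + b0 * m_bar)


# Toy data: outcome is exactly 2 * x plus a treatment effect of 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [0, 1, 0, 1, 0, 1]
y = [2.0 * xi + 3.0 * wi for xi, wi in zip(x, w)]
adjusted = prognostic_ancova2(y, w, x, model=lambda xi: xi)
unadjusted = prognostic_ancova2(y, w, x, model=lambda xi: 1.0)  # constant score
```

In this toy example the treated units happen to have larger covariate values, so the unadjusted contrast is 5 while the score-adjusted estimate recovers the effect of 3. Because the model is fit on external data only, it enters the trial analysis as a fixed function of baseline covariates.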
Appendix B. Estimating σ_{w}^{2} and ρ_{w} for power calculations
One method for obtaining estimates for the marginal potential outcome variances (
The control-arm marginal outcome variance
The correlation ρ _{0} between M″ and Y″ can be estimated by
which is the usual sample correlation coefficient. These values may be inflated (
The corresponding values for the treatment arm can rarely be estimated from data because treatment-arm data for the experimental treatment is likely to be scarce or unavailable. It is therefore prudent to assume
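A minimal sketch of the plug-in step described in this appendix, under stated assumptions. All function names are hypothetical; `m_hist` stands for the prognostic scores M″ and `y_hist` for the outcomes Y″ in a held-out historical sample. The expression in `asymptotic_variance` is the standard single-covariate ANCOVA II variance, assumed here rather than quoted from the text, and `required_n` uses the usual two-sided normal approximation. The treatment-arm values are explicit arguments so that the analyst has to choose them deliberately.

```python
from math import ceil, sqrt
from statistics import NormalDist, fmean


def sample_variance(y_hist):
    """Usual unbiased sample variance, used as the estimate of sigma_0^2."""
    y_bar = fmean(y_hist)
    return sum((yi - y_bar) ** 2 for yi in y_hist) / (len(y_hist) - 1)


def sample_correlation(m_hist, y_hist):
    """Usual sample correlation between prognostic scores and outcomes."""
    m_bar, y_bar = fmean(m_hist), fmean(y_hist)
    sxy = sum((mi - m_bar) * (yi - y_bar) for mi, yi in zip(m_hist, y_hist))
    sxx = sum((mi - m_bar) ** 2 for mi in m_hist)
    syy = sum((yi - y_bar) ** 2 for yi in y_hist)
    return sxy / sqrt(sxx * syy)


def asymptotic_variance(sigma0, sigma1, rho0, rho1, pi1=0.5):
    """Assumed form: difference-in-means variance minus the adjustment gain."""
    pi0 = 1.0 - pi1
    unadjusted = sigma0 ** 2 / pi0 + sigma1 ** 2 / pi1
    gain = (pi1 * rho0 * sigma0 + pi0 * rho1 * sigma1) ** 2 / (pi0 * pi1)
    return unadjusted - gain


def required_n(tau, variance, alpha=0.05, power=0.8):
    """Total sample size to detect effect tau with a two-sided level-alpha test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(variance * (z / tau) ** 2)


# Illustrative values only: unit variances, correlation 0.5 in both arms.
v_adjusted = asymptotic_variance(1.0, 1.0, 0.5, 0.5)    # 3.0
v_unadjusted = asymptotic_variance(1.0, 1.0, 0.0, 0.0)  # 4.0
```

With equal allocation, equal variances, and a common correlation ρ, the assumed variance is 4σ²(1 − ρ²), so the relative sample size reduction is ρ², the share of outcome variance explained by the prognostic score.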
Appendix C. Additional simulation results
Here we detail a full set of simulation results using additional specifications for the regression estimators (Figure 1). “Covariates” indicates whether the raw covariates were adjusted for. “Prognostic score” indicates whether any prognostic score was used and, if so, whether it was estimated from a training dataset (“Estimated”) or whether the true value was used (“Oracle”). “Interaction” specifies whether treatment × (covariates and/or prognostic score) interactions were used. “SE” indicates the standard error of the mean squared error estimate.
Scenario  Covariates  Prognostic score  Interaction  MSE  SE 

Baseline  False  None  True  7.64 × 10^{−2}  1.08 × 10^{−3} 
Baseline  False  None  False  7.64 × 10^{−2}  1.08 × 10^{−3} 
Baseline  False  Estimated  True  1.76 × 10^{−2}  2.46 × 10^{−4} 
Baseline  False  Estimated  False  1.75 × 10^{−2}  2.45 × 10^{−4} 
Baseline  False  Oracle  True  7.69 × 10^{−3}  1.09 × 10^{−4} 
Baseline  False  Oracle  False  7.69 × 10^{−3}  1.09 × 10^{−4} 
Baseline  True  None  True  5.07 × 10^{−2}  7.18 × 10^{−4} 
Baseline  True  None  False  5.04 × 10^{−2}  7.14 × 10^{−4} 
Baseline  True  Estimated  True  1.74 × 10^{−2}  2.46 × 10^{−4} 
Baseline  True  Estimated  False  1.73 × 10^{−2}  2.44 × 10^{−4} 
Baseline  True  Oracle  True  7.85 × 10^{−3}  1.11 × 10^{−4} 
Baseline  True  Oracle  False  7.85 × 10^{−3}  1.11 × 10^{−4} 
Surrogate  False  None  True  7.47 × 10^{−2}  1.05 × 10^{−3} 
Surrogate  False  None  False  7.47 × 10^{−2}  1.05 × 10^{−3} 
Surrogate  False  Estimated  True  4.05 × 10^{−2}  5.69 × 10^{−4} 
Surrogate  False  Estimated  False  4.03 × 10^{−2}  5.66 × 10^{−4} 
Surrogate  False  Oracle  True  8.25 × 10^{−3}  1.18 × 10^{−4} 
Surrogate  False  Oracle  False  8.24 × 10^{−3}  1.18 × 10^{−4} 
Surrogate  True  None  True  5.03 × 10^{−2}  7.09 × 10^{−4} 
Surrogate  True  None  False  5.00 × 10^{−2}  7.04 × 10^{−4} 
Surrogate  True  Estimated  True  3.75 × 10^{−2}  5.27 × 10^{−4} 
Surrogate  True  Estimated  False  3.72 × 10^{−2}  5.23 × 10^{−4} 
Surrogate  True  Oracle  True  8.41 × 10^{−3}  1.20 × 10^{−4} 
Surrogate  True  Oracle  False  8.41 × 10^{−3}  1.20 × 10^{−4} 
Shifted  False  None  True  7.65 × 10^{−2}  1.10 × 10^{−3} 
Shifted  False  None  False  7.65 × 10^{−2}  1.10 × 10^{−3} 
Shifted  False  Estimated  True  6.79 × 10^{−2}  9.62 × 10^{−4} 
Shifted  False  Estimated  False  6.79 × 10^{−2}  9.62 × 10^{−4} 
Shifted  False  Oracle  True  8.20 × 10^{−3}  1.15 × 10^{−4} 
Shifted  False  Oracle  False  8.20 × 10^{−3}  1.15 × 10^{−4} 
Shifted  True  None  True  5.03 × 10^{−2}  7.11 × 10^{−4} 
Shifted  True  None  False  5.00 × 10^{−2}  7.05 × 10^{−4} 
Shifted  True  Estimated  True  4.91 × 10^{−2}  6.97 × 10^{−4} 
Shifted  True  Estimated  False  4.86 × 10^{−2}  6.90 × 10^{−4} 
Shifted  True  Oracle  True  8.34 × 10^{−3}  1.17 × 10^{−4} 
Shifted  True  Oracle  False  8.34 × 10^{−3}  1.17 × 10^{−4} 
Strong  False  None  True  7.73 × 10^{−2}  1.08 × 10^{−3} 
Strong  False  None  False  7.73 × 10^{−2}  1.08 × 10^{−3} 
Strong  False  Estimated  True  1.85 × 10^{−2}  2.65 × 10^{−4} 
Strong  False  Estimated  False  1.85 × 10^{−2}  2.64 × 10^{−4} 
Strong  False  Oracle  True  8.16 × 10^{−3}  1.16 × 10^{−4} 
Strong  False  Oracle  False  8.16 × 10^{−3}  1.16 × 10^{−4} 
Strong  True  None  True  5.14 × 10^{−2}  7.18 × 10^{−4} 
Strong  True  None  False  5.11 × 10^{−2}  7.13 × 10^{−4} 
Strong  True  Estimated  True  1.84 × 10^{−2}  2.62 × 10^{−4} 
Strong  True  Estimated  False  1.82 × 10^{−2}  2.59 × 10^{−4} 
Strong  True  Oracle  True  8.33 × 10^{−3}  1.18 × 10^{−4} 
Strong  True  Oracle  False  8.32 × 10^{−3}  1.18 × 10^{−4} 
Linear  False  None  True  3.49 × 10^{−2}  4.83 × 10^{−4} 
Linear  False  None  False  3.49 × 10^{−2}  4.83 × 10^{−4} 
Linear  False  Estimated  True  9.64 × 10^{−3}  1.38 × 10^{−4} 
Linear  False  Estimated  False  9.64 × 10^{−3}  1.38 × 10^{−4} 
Linear  False  Oracle  True  8.20 × 10^{−3}  1.16 × 10^{−4} 
Linear  False  Oracle  False  8.20 × 10^{−3}  1.16 × 10^{−4} 
Linear  True  None  True  8.37 × 10^{−3}  1.18 × 10^{−4} 
Linear  True  None  False  8.37 × 10^{−3}  1.18 × 10^{−4} 
Linear  True  Estimated  True  8.39 × 10^{−3}  1.19 × 10^{−4} 
Linear  True  Estimated  False  8.39 × 10^{−3}  1.19 × 10^{−4} 
Linear  True  Oracle  True  8.37 × 10^{−3}  1.18 × 10^{−4} 
Linear  True  Oracle  False  8.37 × 10^{−3}  1.18 × 10^{−4} 
Heterogeneous  False  None  True  5.54 × 10^{−2}  7.76 × 10^{−4} 
Heterogeneous  False  None  False  5.54 × 10^{−2}  7.76 × 10^{−4} 
Heterogeneous  False  Estimated  True  2.30 × 10^{−2}  3.23 × 10^{−4} 
Heterogeneous  False  Estimated  False  2.32 × 10^{−2}  3.25 × 10^{−4} 
Heterogeneous  False  Oracle  True  2.29 × 10^{−2}  3.20 × 10^{−4} 
Heterogeneous  False  Oracle  False  2.32 × 10^{−2}  3.24 × 10^{−4} 
Heterogeneous  True  None  True  2.99 × 10^{−2}  4.30 × 10^{−4} 
Heterogeneous  True  None  False  2.98 × 10^{−2}  4.29 × 10^{−4} 
Heterogeneous  True  Estimated  True  2.13 × 10^{−2}  3.01 × 10^{−4} 
Heterogeneous  True  Estimated  False  2.19 × 10^{−2}  3.08 × 10^{−4} 
Heterogeneous  True  Oracle  True  1.89 × 10^{−2}  2.69 × 10^{−4} 
Heterogeneous  True  Oracle  False  1.98 × 10^{−2}  2.81 × 10^{−4} 
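The two summary columns of the table can be computed from replicate-level estimates as sketched below. The sketch assumes that SE is the Monte Carlo standard error of the MSE, that is, the sample standard deviation of the squared errors divided by the square root of the number of replicates. The function name and arguments are illustrative and not taken from the simulation code.

```python
from math import sqrt
from statistics import fmean, stdev


def mse_with_se(estimates, truth):
    """Return (MSE, Monte Carlo standard error of the MSE).

    estimates: treatment effect estimates, one per simulated trial.
    truth: the true treatment effect used to generate the data.
    """
    squared_errors = [(est - truth) ** 2 for est in estimates]
    mse = fmean(squared_errors)
    se = stdev(squared_errors) / sqrt(len(squared_errors))
    return mse, se


# Two replicates with squared errors 0 and 4: MSE is 2, SE is sqrt(8) / sqrt(2) = 2.
mse, se = mse_with_se([2.0, 4.0], truth=2.0)
```

For approximately normal, unbiased estimates, the squared errors have standard deviation close to √2 times the MSE, so the ratio SE/MSE is roughly √(2/R) for R replicates.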
Appendix D. Covariates in the empirical demonstration dataset
Covariate  Description 

AChEI or memantine usage  Whether a subject is using a class of symptomatic Alzheimer’s drugs 
ADAS commands  Assesses the subject’s ability to follow commands 
ADAS comprehension  Assesses the subject’s ability to understand spoken language 
ADAS construction  Assesses the subject’s ability to draw basic figures 
ADAS ideational  Assesses the subject’s ability to carry out a basic task 
ADAS naming  Assesses the subject’s ability to name common objects 
ADAS orientation  Assesses the subject’s knowledge of time and place 
ADAS remember instructions  Assesses the subject’s ability to remember test instructions 
ADAS spoken language  Assesses the subject’s ability to speak clearly 
ADAS word finding  Assesses the subject’s word finding in speech 
ADAS word recall  Assesses the subject’s ability to recall a list of words 
ADAS word recognition  Assesses the subject’s ability to remember and identify words 
Age  Subject age at baseline 
ApoE e4 Allele count  The number of ApoE e4 alleles a subject has (0, 1, or 2) 
CDR community  Assesses the subject’s engagement in community activities 
CDR home and hobbies  Assesses the subject’s engagement in home and personal activities 
CDR judgement  Assesses the subject’s judgement skills 
CDR memory  Assesses the subject’s memory 
CDR orientation  Assesses the subject’s knowledge of time and place 
CDR personal care  Assesses the subject’s ability to care for themselves 
Diastolic blood pressure  The diastolic blood pressure of a subject 
Education (Years)  The number of years of education of a subject 
Heart rate  The resting heart rate of a subject 
Height  The height of a subject 
Indicator for clinical trial  1 if the subject is in an RCT, 0 if not 
MMSE attention and calculation  Assesses the subject’s attention and calculation skills 
MMSE language  Assesses the subject’s language skills 
MMSE orientation  Assesses the subject’s knowledge of place and time 
MMSE recall  Assesses the subject’s ability to remember prompts 
MMSE registration  Assesses the subject’s ability to repeat prompts 
Region: Europe  1 if the subject lives in Europe, 0 otherwise 
Region: Northern America  1 if the subject lives in the US or Canada, 0 otherwise 
Region: Other  1 if the subject lives outside of Europe/US/Canada, 0 otherwise 
Serious adverse events  The number of serious adverse events reported 
Sex  1 if female, 0 if male 
Systolic blood pressure  The systolic blood pressure of a subject 
Weight  The weight of a subject 
References
1. Maldonado, G, Greenland, S. Estimating causal effects. Int J Epidemiol 2002;31:422–9. https://doi.org/10.1093/ije/31.2.422.
2. Sox, HC, Goodman, SN. The methods of comparative effectiveness research. Publ Health 2012;33:425–45. https://doi.org/10.1146/annurev-publhealth-031811-124610.
3. Overhage, JM, Ryan, PB, Schuemie, MJ, Stang, PE. Desideratum for evidence based epidemiology. Drug Saf 2013;36:5–14. https://doi.org/10.1007/s40264-013-0102-2.
4. Hannan, EL. Randomized clinical trials and observational studies guidelines for assessing respective strengths and limitations. JACC Cardiovasc Interv 2008;1:211–7. https://doi.org/10.1016/j.jcin.2008.01.008.
5. Kopp-Schneider, A, Calderazzo, S, Wiesenfarth, M. Power gains by using external information in clinical trials are typically not possible when requiring strict type I error control. Biom J 2020;62:361–74. https://doi.org/10.1002/bimj.201800395.
6. Ibrahim, JG, Chen, MH, Gwon, Y, Chen, F. The power prior: theory and applications. Stat Med 2015;34:3724–49. https://doi.org/10.1002/sim.6728.
7. Lim, J, Walley, R, Yuan, J, Liu, J, Dabral, A, Best, N. Minimizing patient burden through the use of historical subject-level data in innovative confirmatory clinical trials. TIRS 2018;52:546–59. https://doi.org/10.1177/2168479018778282.
8. Baker, SG, Lindeman, KS. Rethinking historical controls. Biostatistics 2001;2:383–96. https://doi.org/10.1093/biostatistics/2.4.383.
9. Ghadessi, M, Tang, R, Zhou, J, Liu, R, Wang, C, Toyoizumi, K, et al. A roadmap to using historical controls in clinical trials – by drug information association adaptive design scientific working group (DIA-ADSWG). Orphanet J Rare Dis 2020;15:69. https://doi.org/10.1186/s13023-020-1332-x.
10. Hansen, BB. The prognostic analogue of the propensity score. Biometrika 2008;95:481–8. https://doi.org/10.1093/biomet/asn004.
11. Aikens, RC, Greaves, D, Baiocchi, M. A pilot design for observational studies: using abundant data thoughtfully. Stat Med 2020;39:4821–40. https://doi.org/10.1002/sim.8754.
12. Wyss, R, Lunt, M, Brookhart, MA, Glynn, RJ, Stürmer, T. Reducing bias amplification in the presence of unmeasured confounding through out-of-sample estimation strategies for the disease risk score. J Causal Inference 2014;2:131–46. https://doi.org/10.1515/jci-2014-0009.
13. Lin, W. Agnostic notes on regression adjustments to experimental data: reexamining Freedman’s critique. Ann Appl Stat 2013;7:295–318. https://doi.org/10.1214/12-aoas583.
14. Kahan, BC, Jairath, V, Doré, CJ, Morris, TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials 2014;15:139. https://doi.org/10.1186/1745-6215-15-139.
15. Raab, GM, Day, S, Sales, J. How to select covariates to include in the analysis of a clinical trial. Contr Clin Trials 2000;21:330–42. https://doi.org/10.1016/s0197-2456(00)00061-1.
16. Yang, L, Tsiatis, AA. Efficiency study of estimators for a treatment effect in a pretest–posttest trial. Am Statistician 2001;55:314–21. https://doi.org/10.1198/000313001753272466.
17. Committee for Medicinal Products for Human Use. Guideline on adjustment for baseline covariates in clinical trials. London: European Medicines Agency; 2015.
18. Cooney, MT, Dudina, AL, Graham, IM. Value and limitations of existing scores for the assessment of cardiovascular risk: a review for clinicians. J Am Coll Cardiol 2009;54:1209–27. https://doi.org/10.1016/j.jacc.2009.07.020.
19. Austin, SR, Wong, YN, Uzzo, RG, Beck, JR, Egleston, BL. Why summary comorbidity measures such as the Charlson comorbidity index and Elixhauser score work. Medical Care 2015;53:e65–72. https://doi.org/10.1097/mlr.0b013e318297429c.
20. Ambrosius, WT, Sink, KM, Foy, CG, Berlowitz, DR, Cheung, AK, Cushman, WC, et al., The SPRINT Study Research Group. The design and rationale of a multicenter clinical trial comparing two strategies for control of systolic blood pressure: the systolic blood pressure intervention trial (SPRINT). Clin Trials 2014;11:532–46. https://doi.org/10.1177/1740774514537404.
21. Borm, GF, Fransen, J, Lemmens, WAJG. A simple sample size formula for analysis of covariance in randomized clinical trials. J Clin Epidemiol 2007;60:1234–8. https://doi.org/10.1016/j.jclinepi.2007.02.006.
22. Rubin, DB. Causal inference using potential outcomes. J Am Stat Assoc 2005;100:322–31. https://doi.org/10.1198/016214504000001880.
23. Wang, B, Ogburn, EL, Rosenblum, M. Analysis of covariance in randomized trials: more precision and valid confidence intervals, without model assumptions. Biometrics 2019;75:1391–400. https://doi.org/10.1111/biom.13062.
24. Leon, S, Tsiatis, AA, Davidian, M. Semiparametric estimation of treatment effect in a pretest–posttest study. Biometrics 2003;59:1046–55. https://doi.org/10.1111/j.0006-341x.2003.00120.x.
25. Aronow, PM, Miller, BT. Foundations of agnostic statistics. New York: Cambridge University Press; 2019:286–7 pp. https://doi.org/10.1017/9781316831762.010.
26. Tsiatis, A. Semiparametric theory and missing data. New York: Springer Science & Business Media; 2007.
27. Luo, Y, Spindler, M. High-dimensional L2 boosting: rate of convergence. 2016 arXiv.
28. Belloni, A, Chernozhukov, V. Least squares after model selection in high-dimensional sparse models. Bernoulli 2013;19:521–47. https://doi.org/10.3150/11-bej410.
29. Farrell, MH, Liang, T, Misra, S. Deep neural networks for estimation and inference. 2018 arXiv. https://doi.org/10.3982/ECTA16901.
30. Syrgkanis, V, Zampetakis, M. Estimation and inference with trees and forests in high dimensions. 2020 arXiv.
31. Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, et al. Scikit-learn: machine learning in Python. 2012 arXiv.
32. Quinn, JF, Raman, R, Thomas, RG, Yurko-Mauro, K, Nelson, EB, Van Dyck, C, et al. Docosahexaenoic acid supplementation and cognitive decline in Alzheimer disease: a randomized trial. J Am Med Assoc 2010;304:1903–11. https://doi.org/10.1001/jama.2010.1510.
33. Coon, KD, Myers, AJ, Craig, DW, Webster, JA, Pearson, JV, Lince, DH, et al. A high-density whole-genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer’s disease. J Clin Psychiatr 2007;68:613–8. https://doi.org/10.4088/jcp.v68n0419.
34. Rosen, WG, Mohs, RC, Davis, KL. A new rating scale for Alzheimer’s disease. Am J Psychiatr 1984;141:1356–64. https://doi.org/10.1176/ajp.141.11.1356.
35. Galasko, D, Bennett, D, Sano, M, Ernesto, C, Thomas, R, Grundman, M, et al. An inventory to assess activities of daily living for clinical trials in Alzheimer’s disease. The Alzheimer’s disease cooperative study. Alzheimer Dis Assoc Disord 1997;11:S33–9. https://doi.org/10.1097/00002093-199700112-00005.
36. Morris, JC. The clinical dementia rating (CDR): current version and scoring rules. Neurology 1993;43:2412–4. https://doi.org/10.1212/wnl.43.11.2412-a.
37. Neville, J, Kopko, S, Broadbent, S, Avilés, E, Stafford, R, Solinsky, CM, et al., Coalition Against Major Diseases. Development of a unified clinical trial database for Alzheimer’s disease. Alzheimer’s Dementia 2015;11:1212–21. https://doi.org/10.1016/j.jalz.2014.11.005.
38. Romero, K, Mars, M, Frank, D, Anthony, M, Neville, J, Kirby, L, et al. The coalition against major diseases: developing tools for an integrated drug development process for Alzheimer’s and Parkinson’s diseases. Clin Pharmacol Ther 2009;86:365–7. https://doi.org/10.1038/clpt.2009.165.
39. Chernozhukov, V, Chetverikov, D, Demirer, M, Duflo, E, Hansen, C, Newey, W, et al. Double/debiased machine learning for treatment and structural parameters. Econom J 2018;21:C1–68. https://doi.org/10.1111/ectj.12097.
40. Wager, S, Du, W, Taylor, J, Tibshirani, RJ. High-dimensional regression adjustments in randomized experiments. Proc Natl Acad Sci Unit States Am 2016;113:12673–8. https://doi.org/10.1073/pnas.1614732113.
41. Rothe, C. Flexible covariate adjustments in randomized experiments, Working Paper; 2018.
42. Dankar, FK, El Emam, K. The application of differential privacy to health data. In: Proceedings of the 2012 Joint EDBT/ICDT Workshops on – EDBT-ICDT ’12; 2012. pp. 158–66. https://doi.org/10.1145/2320765.2320816.
43. Brisimi, TS, Chen, R, Mela, T, Olshevsky, A, Paschalidis, IC, Shi, W. Federated learning of predictive models from federated electronic health records. Int J Med Inf 2018;112:59–67. https://doi.org/10.1016/j.ijmedinf.2018.01.007.
44. Coalition Against Major Diseases, Organiza, Abbott, Alliance for Aging Research, Alzheimer’s Association, Alzheimer’s Foundation of America, AstraZeneca Pharmaceuticals LP, Bristol-Myers Squibb Company, Critical Path Institute, CHDI Foundation Inc, Eli Lilly and Company, F Hoffmann-La Roche Ltd, Forest Research Institute, Genentech Inc, GlaxoSmithKline, Johnson & Johnson, National Health Council, Novartis Pharmaceuticals Corporation, Parkinson’s Action Network, Parkinson’s Disease Foundation, Pfizer Inc, sanofi-aventis Collaborating, Fisher, CK, Smith, AM, Walsh, JR. Machine learning for comprehensive forecasting of Alzheimer’s disease progression. Sci Rep 2019;9:13622. https://doi.org/10.1038/s41598-019-49656-2.
45. Rajkomar, A, Oren, E, Chen, K, Dai, AM, Hajaj, N, Hardt, M, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine 2018;1:18. https://doi.org/10.1038/s41746-018-0029-1.
46. LeCun, Y, Bengio, Y, Hinton, G. Deep learning. Nature 2015;521:436. https://doi.org/10.1038/nature14539.
47. Miotto, R, Wang, F, Wang, S, Jiang, X, Dudley, JT. Deep learning for healthcare: review, opportunities and challenges. Briefings Bioinf 2018;19:1236–46. https://doi.org/10.1093/bib/bbx044.
48. Dubois, S, Romano, N, Jung, K, Shah, N, Kale, D. The effectiveness of transfer learning in electronic health records data. In: Workshop Track - ICLR; 2017.
49. van der Vaart, AW. Asymptotic statistics. Cambridge: Cambridge University Press; 2000.
50. Robins, JM, Rotnitzky, A, Zhao, LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 1994;89:846. https://doi.org/10.2307/2290910.
51. Rosenblum, M, van der Laan, MJ. Simple, efficient estimators of treatment effects in randomized trials using generalized linear models to leverage baseline variables. Int J Biostat 2010;6:13. https://doi.org/10.2202/1557-4679.1138.
52. Freedman, DA. On regression adjustments to experimental data. Adv Appl Math 2008;40:180–93. https://doi.org/10.1016/j.aam.2006.12.003.
53. Long, JS, Ervin, LH. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician 2012;54:217–24. https://doi.org/10.1080/00031305.2000.10474549.
© 2021 Walter de Gruyter GmbH, Berlin/Boston