Polygenic scores for psychiatric disease: from research tool to clinical application

Abstract
Propensity to psychiatric disease involves the contribution of multiple genetic variants with small individual effects (i. e., polygenicity). This contribution can be summarized using polygenic scores (PGSs). The present article discusses the methodological foundations of PGS calculation, together with the limitations and caveats of their use. Furthermore, we show that PGSs have become a standard tool in psychiatric research for using genetic information to address the complexities of mental disorders. PGSs also have the potential for translation into clinical practice. Although PGSs alone do not allow reliable disease prediction, they have major potential value in terms of risk stratification, the identification of disorder subtypes, functional investigations, and case selection for experimental models. However, given the stigma associated with mental illness and the limited availability of effective interventions, risk prediction for common psychiatric disorders must be approached with particular caution, especially in the non-regulated consumer context.


Introduction
Propensity to psychiatric disease and related quantitative traits involves the contribution of multiple genetic variants with small individual effects (i. e., polygenicity). This contribution can be summarized using polygenic scores (PGSs). PGSs have thus become a standard tool in psychiatric research. PGSs also have the potential for translation into clinical practice. Here, we provide an overview of the conceptual foundations of PGSs, examples of their suc-

What are polygenic scores?
Genome-wide association studies (GWAS) have identified a large number of common single nucleotide polymorphisms (SNPs) that influence risk for a variety of psychiatric diseases, including schizophrenia (SCZ) [1,2], bipolar disorder (BD) [3], and major depressive disorder (MDD) [4,5]. The significantly associated variants, together with many other associated variants that do not (yet) fulfill the strict significance criteria applied in GWAS, constitute the polygenic component of psychiatric disorders and related quantitative traits. The individual polygenic load of these variants can be summarized using PGSs.
How are PGSs generated? In general, PGSs use information, e. g., effect sizes and P-values, from a large GWAS training dataset to predict phenotypic outcomes in an independent test dataset. For the calculation of PGSs, the allele count (the number of copies of the effect allele) of each variant that is associated with disease risk or a quantitative trait in GWAS is multiplied by the effect size of the respective SNP (Fig. 1). For quantitative traits, this effect size is the linear regression β coefficient. For case/control analyses, the effect size is the natural logarithm of the odds ratio. The weighted allele counts of all associated variants are then summed to compute the PGS. In general, genotype information from microarrays, which typically test for 500,000 to one million SNPs, is enhanced by imputation. Here, the genotype probabilities of millions of variants are estimated using haplotype reference panels in order to capture the common genetic variants of a given population more comprehensively. For imputed SNPs, the allelic dosage, i. e., the expected number of copies of the minor allele, is used instead of the allele count in the formula shown in Fig. 1. The allelic dosage is calculated as $\text{dosage} = 2 \times P(\text{homozygous}) + 1 \times P(\text{heterozygous})$, where both probabilities refer to the minor allele.
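The weighted-sum calculation described above, including the dosage formula for imputed SNPs, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The SNP identifiers and odds ratios are hypothetical. Each weight is assumed to refer to the minor allele, i. e., the same allele that is counted or dosed in the target individual; in real data, allele alignment between training and test data must be checked explicitly.

```python
import math


def dosage(p_het: float, p_hom_minor: float) -> float:
    """Expected number of minor-allele copies from imputed genotype probabilities."""
    return 2.0 * p_hom_minor + 1.0 * p_het


def polygenic_score(dosages: dict, weights: dict) -> float:
    """Sum of weight x allele count (or dosage) over SNPs present in both inputs.

    Assumption: each weight refers to the same allele (here, the minor allele)
    that is counted in the target individual.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical case/control summary statistics: odds ratio per copy of the minor allele.
odds_ratios = {"rs0001": 1.10, "rs0002": 0.95, "rs0003": 1.05}
weights = {snp: math.log(or_) for snp, or_ in odds_ratios.items()}  # ln(OR) as weights

# One hypothetical individual: rs0001 directly genotyped, the other two imputed.
individual = {
    "rs0001": 2.0,
    "rs0002": dosage(p_het=0.6, p_hom_minor=0.3),  # 2*0.3 + 0.6 = 1.2
    "rs0003": dosage(p_het=0.1, p_hom_minor=0.0),  # 0.1
}
score = polygenic_score(individual, weights)
```

For a quantitative trait, the same function applies with the regression β coefficients used directly as weights.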
PGSs are thus a quantitative measure of the additive genetic burden for a particular disease or trait and can be used to assess individual genetic load. A practical application of a simple PGS is the prediction of eye color using only six SNPs [6]. In the case of the biologically much more complex and heterogeneous psychiatric diseases, however, many more variants contribute to disease development and, therefore, must be incorporated in order to render the PGS meaningful. SNPs are typically selected based on their GWAS association strength, i. e., P-values. In GWAS, the P-value that a variant attains depends on the statistical power, and is thus influenced by the effect size, the sample size, and the allele frequency. It can therefore be assumed that many of the variants that fail to reach formal genome-wide significance ($P < 5 \times 10^{-8}$) in a GWAS would surpass the significance threshold if larger sample sizes were used. In fact, the phenotypic variance explained by psychiatric PGSs increases when more than just the genome-wide significant variants are included.
Fig. 1 Polygenic scores (PGSs) are typically calculated by summing the weighted allele counts of independent (i. e., uncorrelated) single nucleotide polymorphisms (SNPs) found to be associated with a trait or disorder in a genome-wide association study (GWAS). For quantitative traits, the weights are the linear regression effect sizes, i. e., the β coefficients. For case/control phenotypes, the weights are the natural logarithm of the odds ratio, ln(OR).
In most cases, therefore, the P-value threshold for the inclusion of variants into PGSs is determined empirically. A range of PGSs is calculated, each incorporating variants below a different P-value threshold [2,7]. The association of each PGS with the disease or trait is tested in an independent cohort, and the PGS at the P-value threshold with the highest predictive capability is selected. Overfitting means generating prediction models that fit the test dataset but do not replicate in other data. To avoid it, the P-value threshold should be selected either via (nested) cross-validation or in a third, independent validation dataset. Research has shown that for most psychiatric disorders, thresholds of P < 0.01 or P < 0.05 have the best signal-to-noise ratio for disease prediction [1][2][3][4][5]. Some tools, e. g., PRSice [8], calculate PGSs at thousands of different thresholds and then select the optimal PGS automatically. However, unless carefully applied, such approaches may increase the risk of overfitting and complicate effective correction for multiple testing.
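A minimal sketch of P-value thresholding on simulated data follows. All data are hypothetical and simulated. The threshold grid, effect sizes, and sample size are assumptions chosen for illustration, and the R² of a simple linear association stands in for "predictive capability". The code performs only the selection step in a validation set. As described above, the performance of the chosen score should then be reported in a further independent test set, because the R² at the selected threshold is optimistically biased.

```python
import math
import random


def pgs(dosage_row, betas, pvals, threshold):
    """PGS built only from SNPs whose training-GWAS P-value is below the threshold."""
    return sum(b * d for b, d, p in zip(betas, dosage_row, pvals) if p < threshold)


def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def select_threshold(betas, pvals, val_dosages, val_pheno, thresholds):
    """Score the validation set at each threshold; return best threshold and all R^2 values."""
    r2 = {}
    for t in thresholds:
        scores = [pgs(row, betas, pvals, t) for row in val_dosages]
        r2[t] = r_squared(scores, val_pheno)
    return max(r2, key=r2.get), r2


# --- Hypothetical simulated data (not real GWAS results) ---
rng = random.Random(1)
n_snps, n_causal, se, maf = 200, 20, 0.02, 0.3
true_beta = [0.15 if j < n_causal else 0.0 for j in range(n_snps)]
gwas_beta = [b + rng.gauss(0.0, se) for b in true_beta]              # noisy training estimates
gwas_p = [math.erfc(abs(b / se) / math.sqrt(2)) for b in gwas_beta]  # two-sided z-test P-values

val_dosages = [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
               for _ in range(500)]
val_pheno = [sum(b * g for b, g in zip(true_beta, row)) + rng.gauss(0.0, 1.0)
             for row in val_dosages]

thresholds = [5e-8, 1e-4, 0.01, 0.05, 0.5, 1.0]
best, r2_by_threshold = select_threshold(gwas_beta, gwas_p, val_dosages, val_pheno, thresholds)
```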
A second issue to be considered in the selection of variants for PGS calculation is linkage disequilibrium (LD). GWAS typically analyze 7-9 million variants, most of which are imputed on the basis of LD rather than directly genotyped. However, the PGS should be shaped by independently associated genetic loci and not by correlated markers. Moreover, studies differ in the choice of microarray types and imputation reference panels, such as those provided by the 1000 Genomes Project and the Haplotype Reference Consortium. Thus, even with imputed data, the training and test datasets often display only a partial overlap of variants.
Typically, these issues of LD and SNP matching are addressed using LD clumping. For each locus, the variant in the test dataset that shows the highest correlation with the top-associated SNP from the training data is determined and used to calculate the PGS. Variants in LD with the top SNP are discarded. As with the choice of the P-value threshold, the parameters used for LD clumping (e. g., the LD cutoff and the window size) can be determined empirically. However, problems arise when the ethnic background, and thus the LD pattern, differs between the training and test data [9]. Moreover, standard LD clumping does not perform well in regions showing long-range, complex LD, e. g., the major histocompatibility complex (MHC) region on chromosome 6. To circumvent this issue, PGS calculation often includes only the single top-associated SNP within the MHC region. This type of approach has been used by the Psychiatric Genomics Consortium, e. g., for SCZ PGSs [2].
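Greedy clumping can be illustrated as follows. This sketch assumes that SNP matching has already been done, so only variants present in both training and test data are passed in. It also assumes that pairwise r² values come from an ancestry-matched reference panel. The r² cutoff of 0.1 and the 250 kb window are illustrative assumptions. Dedicated tools implement the same idea far more efficiently.

```python
def clump(snps, r2, r2_max=0.1, window_bp=250_000):
    """Greedy LD clumping (illustrative sketch).

    snps: list of (snp_id, chrom, pos, pval), already restricted to variants
          available in both the training and the test data.
    r2:   function (id_a, id_b) -> LD r^2 from an ancestry-matched reference panel.
    Returns the retained index SNPs, most significant first.
    """
    ordered = sorted(snps, key=lambda s: s[3])  # ascending P-value
    kept, removed = [], set()
    for snp_id, chrom, pos, _ in ordered:
        if snp_id in removed:
            continue  # already tagged by a more significant index SNP
        kept.append(snp_id)
        for other_id, o_chrom, o_pos, _ in ordered:
            if (other_id != snp_id and other_id not in removed
                    and other_id not in kept
                    and o_chrom == chrom and abs(o_pos - pos) <= window_bp
                    and r2(snp_id, other_id) > r2_max):
                removed.add(other_id)  # correlated marker, discarded
    return kept


# Hypothetical example: rsB is in strong LD with the more significant rsA.
snps = [("rsA", "1", 100_000, 1e-10), ("rsB", "1", 150_000, 1e-6),
        ("rsC", "1", 900_000, 1e-4), ("rsD", "1", 120_000, 1e-5)]
ld = {frozenset(("rsA", "rsB")): 0.8, frozenset(("rsA", "rsD")): 0.05,
      frozenset(("rsB", "rsD")): 0.3}
index_snps = clump(snps, lambda a, b: ld.get(frozenset((a, b)), 0.0))
```

In this toy example, rsB is discarded because it is highly correlated with rsA. rsD is kept because its LD with rsA is weak, and rsC is kept because it lies outside the window.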
Several algorithms have been proposed to improve the selection and weighting of variants for PGSs, e. g., LDpred and PRS-CS [10,11]. These algorithms, which are usually based on a Bayesian regression framework, model LD directly by shrinking the GWAS effect sizes according to the LD between variants. Their prediction performance typically surpasses that of PGSs calculated by P-value thresholding and LD clumping. Methods have also been developed to combine data from several GWAS into a single PGS, an approach that is particularly useful for analyses of psychiatric disorders that show a strong genetic correlation [12,13].
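The sketch below shows what shrinking effect sizes according to LD means. It implements only the simplest, infinitesimal special case of LDpred (often called LDpred-inf). It is not PRS-CS, which uses a different, continuous shrinkage prior. The sketch assumes marginal effects estimated on standardized genotypes, a known SNP heritability h2, a total SNP number M, and an LD (correlation) matrix D for one window. All numbers in the example are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ldpred_inf(beta_marginal, ld_window, n_gwas, h2_snp, m_snps):
    """Posterior mean effects under an infinitesimal (Gaussian) prior for one LD window:

        beta_post = (D + (M / (N * h2)) * I)^-1 * beta_marginal

    Assumes effects on standardized genotypes and D from a matching reference panel.
    """
    shrink = m_snps / (n_gwas * h2_snp)
    k = len(ld_window)
    a = [[ld_window[i][j] + (shrink if i == j else 0.0) for j in range(k)] for i in range(k)]
    return solve(a, beta_marginal)


# Hypothetical window: two SNPs in strong LD (r = 0.9) with similar marginal effects.
D = [[1.0, 0.9], [0.9, 1.0]]
beta_hat = [0.020, 0.019]
beta_post = ldpred_inf(beta_hat, D, n_gwas=100_000, h2_snp=0.2, m_snps=1_000_000)
```

Without LD (D equal to the identity matrix), each effect is simply divided by 1 + M/(N h2). With LD, the joint solve also redistributes the signal between correlated SNPs instead of counting it twice.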

Limitations of polygenic scores
Two critical restrictions of classical PGSs, i. e., their dependence on P-value thresholds and LD clumping, can be circumvented using advanced tools such as PRS-CS. However, not all GWAS are suitable for PGS calculation. First, training GWAS should be as large as possible to ensure that the effect sizes in the GWAS sample are a reasonable estimate of the true effect sizes in the population. Second, training and test samples must be independent, i. e., must not overlap, to ensure unbiased effect sizes [14]. Since most training GWAS data are generated by large international consortia that combine most of the available cohorts, obtaining independent test data can be challenging.
PGSs typically aggregate hundreds or thousands of variants with small individual effect sizes and thus approximately follow a normal distribution at the population level (Fig. 2). This distribution has important consequences for the sensitivity and specificity of PGS-based predictions.
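A simulation can check why a sum of many small effects is approximately normal (central limit theorem). The genetic architecture used here is a hypothetical assumption, not an estimate for any disorder: 500 independent SNPs with random allele frequencies and small random weights.

```python
import random
import statistics

# Hypothetical genetic architecture: 500 independent SNPs with small random effects.
rng = random.Random(42)
n_people, n_snps = 2000, 500
freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
weights = [rng.gauss(0.0, 0.02) for _ in range(n_snps)]


def simulate_pgs():
    """PGS of one simulated person; genotypes drawn under Hardy-Weinberg equilibrium."""
    return sum(w * ((rng.random() < f) + (rng.random() < f)) for w, f in zip(weights, freqs))


scores = [simulate_pgs() for _ in range(n_people)]
mu, sd = statistics.fmean(scores), statistics.stdev(scores)
z = [(s - mu) / sd for s in scores]
# Under normality, roughly 95 % of standardized scores fall within +/-1.96.
share_within = sum(abs(v) <= 1.96 for v in z) / n_people
```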
The assessment of mean PGS differences between cases and controls often generates highly significant P-values. Nevertheless, the absolute differences in mean PGS between groups are small, and the respective PGS distributions overlap substantially. Therefore, PGSs only have substantial predictive value for individuals in the top and bottom percentiles of the distribution [15]. Moreover, absolute PGS values have no objective meaning. Instead, they should be interpreted in comparison with the PGSs of individuals with known case/control status. Systematic differences in allele frequencies and LD patterns between training and test subjects impair PGS-based predictions. Training and test datasets must thus be matched in terms of ethnic background and, ideally, also in terms of genotyping technology and imputation method. Even differences in demographic characteristics and socio-economic status between training GWAS and test subjects can affect the prediction results [16]. These requirements limit the application of PGSs for individual prediction: an out-of-context PGS value for any single individual cannot be interpreted.
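Absolute values are meaningless on their own. A new individual's score is therefore usually expressed relative to a reference distribution computed with the same SNPs, weights, genotyping, and imputation pipeline. The following is a minimal sketch, assuming a small hypothetical set of ancestry-matched reference controls.

```python
from statistics import fmean, stdev


def standardize(score, reference_scores):
    """Z-score of one PGS relative to a reference cohort scored with the identical pipeline."""
    return (score - fmean(reference_scores)) / stdev(reference_scores)


def empirical_percentile(score, reference_scores):
    """Percentage of reference individuals with a lower PGS."""
    return 100.0 * sum(r < score for r in reference_scores) / len(reference_scores)


# Hypothetical raw PGSs of ancestry-matched reference controls.
reference = [0.10, 0.12, 0.08, 0.11, 0.09]
z_new = standardize(0.13, reference)
pct_new = empirical_percentile(0.105, reference)
```

Real reference cohorts need to be far larger than this toy list for percentiles in the tails to be stable.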
PGSs assume a simplified genetic model of disease and only incorporate the additive effects of common variants, typically with minor allele frequencies above 1 %. Therefore, they do not account for non-additive inheritance models, epistasis, or rare variants. The latter include copy number variants (CNVs) and the many rare point mutations in the genome that confer risk but have not yet been identified. These limitations are likely to contribute to the observation that current PGSs for psychiatric disorders explain only a small proportion of phenotypic variance. On the liability scale, the proportions of phenotypic variance explained by published PGSs were 6-7 % for SCZ, 4 % for BD, and 2-3 % for MDD [1][2][3][4][5]. These low values and the small areas under the receiver operating characteristic (ROC) curves of the prediction models imply that PGSs are not yet suitable for the reliable prediction of a future psychiatric disorder.
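A simple assumption makes the gap between highly significant group differences and weak individual prediction concrete. Suppose case and control PGSs follow normal distributions with equal variance whose means differ by d standard deviations. Under that assumption, the area under the ROC curve (AUC) equals Φ(d/√2), where Φ is the standard normal cumulative distribution function. The effect size d = 0.2 and the group size below are hypothetical values, not estimates for any disorder.

```python
import math
from statistics import NormalDist


def auc_from_d(d):
    """AUC if case and control PGSs are normal, equal-variance, with means d SDs apart."""
    return NormalDist().cdf(d / math.sqrt(2))


def two_sample_p(d, n_per_group):
    """Two-sided z-test P-value for the mean difference d (unit variance, equal group sizes)."""
    z = d / math.sqrt(2 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))


def top_tail_enrichment(d, control_quantile=0.9):
    """Share of cases above the controls' quantile, relative to the share of controls there."""
    t = NormalDist().inv_cdf(control_quantile)
    return (1 - NormalDist(mu=d).cdf(t)) / (1 - control_quantile)


# Hypothetical effect size: case/control mean difference of 0.2 SD, 10,000 per group.
print(two_sample_p(0.2, 10_000))  # P-value far below 5e-8
print(auc_from_d(0.2))            # AUC only slightly above 0.5
```

The last function reflects the observation that PGSs are most informative in the tails. With d = 0.5, cases are roughly 2.2 times as frequent as controls above the controls' 90th percentile, although the overall AUC remains modest.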
The specific variants analyzed in a GWAS explain only a part of the broad-sense heritability of any given disorder or quantitative trait. In general, this SNP-based heritability constitutes an upper limit for the predictive capability of the respective PGS. Interestingly, for body height, another highly polygenic human trait, the complete estimated heritability may be explained when both common and rare variants are included in the analysis [17]. It is thus expected that prediction models for psychiatric disorders will improve when rare variants are incorporated into the respective PGSs.
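An approximation widely used in genomic prediction, commonly attributed to Daetwyler et al. (2008), shows how training sample size and SNP-based heritability bound predictive capability: R² = h²_SNP / (1 + M_e / (N h²_SNP)). Here N is the training GWAS sample size and M_e is the number of effectively independent loci. The approximation assumes a quantitative trait and equally small effects, and the parameter values below are hypothetical. As N grows, R² approaches h²_SNP but never exceeds it.

```python
def expected_r2(h2_snp, n_train, m_eff):
    """Approximate expected variance explained by a PGS:

        R^2 = h2 / (1 + M_e / (N * h2))

    Assumes a quantitative trait, M_e effectively independent loci of similar
    small effect, and training/test samples from the same population.
    """
    return h2_snp / (1 + m_eff / (n_train * h2_snp))


# Hypothetical SNP heritability (0.2) and number of effective loci (60,000).
for n in (10_000, 100_000, 1_000_000, 10_000_000):
    print(n, round(expected_r2(h2_snp=0.2, n_train=n, m_eff=60_000), 4))
```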
Finally, PGSs can only be applied reliably to populations of the same ancestry as those in which the training GWAS were conducted [18]. This limitation poses a problem for psychiatric research in non-European populations, for which large-scale GWAS suitable for use as training data are lacking. In terms of future psychiatric clinical practice, the inability to calculate meaningful PGSs in non-European populations may aggravate existing disparities in access to optimal medical care [19].

Polygenic scores as a powerful research tool
The PGS approach is a highly effective tool in medical research. For example, PGSs can be used to investigate whether disease subtypes have differing underlying genetic burdens. As anticipated, research into psychiatric diseases, including SCZ, BD, and MDD, has identified higher PGSs in more severe disease subtypes [3,20,21]. PGSs can also be used to determine whether, and if so to what extent, different diseases have common genetic causes. Cross-disorder psychiatric studies have shown that the genetic overlap between schizophrenia and affective disorders, in particular, is very large (genetic correlation of approx. 0.34 with MDD and 0.70 with BD). They have also shown that certain SNPs contribute to the development of psychiatric diseases across all diagnoses [22,23]. Combined analyses of subtypes and genetic effects across different diagnoses have shown that in BD cases, SCZ PGSs were particularly high in patients diagnosed with BD type I, in patients presenting with psychotic features, and in early-onset cases. In contrast, BD type II patients showed higher MDD PGSs [3,20]. In SCZ cases, a higher BD PGS was associated with manic symptoms [24].
PGSs have also been used to demonstrate that psychiatric multiplex families, a phenomenon traditionally thought to be caused by rare variants of large effect, can display an increased load of common genetic risk variants (e. g., [25]). This finding does not, of course, exclude an effect of rare variants in these families. However, it suggests that familial aggregation is unlikely to be caused by rare variants alone.
Interestingly, PGSs have also been used to study resilience to disease, e. g., to the development of schizophrenia [26]. The study of genetic factors that counteract an individual's risk may be particularly helpful in elucidating mechanisms that can be modulated to reverse pathophysiological processes.
To generate a functional understanding of psychiatric disease, the analysis of endophenotypes is considered helpful, and a large number of endophenotypes have been proposed, e. g., quantitative neurocognitive and neurophysiological measures and structural and functional neuroimaging data [27]. In this context, the application of PGSs allows the relationship between proposed endophenotypes and disease to be redefined on an etiological level [28][29][30][31].
PGSs can also be used to provide biological insights at the gene or pathway level [32]. For example, investigations of the association between PGSs and gene expression in individual tissues and at specific time points can improve our understanding of the molecular etiology of pathological processes [33,34].
Finally, PGSs offer new possibilities for the functional investigation of disease mechanisms in experimental models. Particularly in experimental approaches that are restricted to a limited number of participants, the random selection of patients and controls with varying genetic burdens can obliterate group-level differences. For the investigation of induced pluripotent stem cells, for example, this problem has been addressed by selecting cell donors on the basis of PGS stratification [35].
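Selecting individuals from the extremes of the PGS distribution, as in the donor selection described above, amounts to a simple ranking step. The sketch below shows one hypothetical way to do this; it is not the procedure of the cited study. The IDs and scores are invented. In practice, all scores must come from the same pipeline and ancestry group.

```python
def select_extreme_groups(pgs_by_id, n_per_group):
    """Return (lowest, highest): IDs of the n lowest- and n highest-PGS individuals."""
    if n_per_group < 1 or 2 * n_per_group > len(pgs_by_id):
        raise ValueError("need n_per_group >= 1 and two non-overlapping groups")
    ranked = sorted(pgs_by_id, key=pgs_by_id.get)  # ascending PGS
    return ranked[:n_per_group], ranked[len(ranked) - n_per_group:]


# Hypothetical standardized PGSs of candidate donors.
candidates = {"P1": -1.2, "P2": 0.3, "P3": 2.1, "P4": -0.4, "P5": 1.5, "P6": -2.0}
low_group, high_group = select_extreme_groups(candidates, n_per_group=2)
```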

Paths to clinical translation in psychiatry
At present, the power of PGSs to predict the future development of a psychiatric disorder is very limited [36]. Predictions are expected to improve with larger training datasets of 100,000 or more cases, such as those targeted by the Psychiatric Genomics Consortium [37]. However, rare genetic variants and non-genetic factors also make a significant contribution to the development of psychiatric disease, so PGSs alone will not allow reliable prediction. Nevertheless, clinical medicine provides compelling examples in which the identification of individuals with an increased disease risk (i. e., risk stratification), followed by targeted screening and prevention measures, led to a substantial reduction in morbidity and mortality. In principle, PGSs may be used for risk stratification of this nature and for improving procedures already established in the clinical setting [15,38,39]. In the case of psychiatric disease, no preventive measures have yet been established in clinical practice that could be routinely applied to individuals with increased disease risk, e. g., those with a positive family history. It is even possible that communication of the individual risk for a psychiatric disorder may result in a self-fulfilling prophecy [40]. Therefore, the effects of risk stratification and intervention strategies should first be demonstrated in studies involving comprehensive risk-benefit analyses.
The first use of PGSs in psychiatric diagnostics might be considered for carriers of CNVs that are associated with an increased risk for specific disorders, for which odds ratios of up to 50 have been reported [41]. Since the polygenic background influences the penetrance of psychiatric disease in CNV carriers [42], PGSs could improve individual risk estimates. Experience from clinical genetics shows that the parents of children carrying CNVs generally wish to be informed about the potential phenotypic spectrum of their children.
Another potential application of PGSs in future psychiatric clinical practice is the identification of more homogeneous patient subgroups [20,43,44]. The heterogeneity of psychiatric diseases reflects, at least in part, the fact that current clinical diagnoses are not based on an etiological foundation (see the accompanying articles). In principle, the establishment of PGSs for specific biological pathways or networks corresponding to individual disease etiology would allow the identification of patients with a pathway-specific risk load. This would provide the foundation for an etiology-based, stratified approach to the diagnosis and clinical management of psychiatric disease.
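In its simplest form, a pathway-specific PGS restricts the genome-wide weighted sum to SNPs assigned to the genes of one pathway. The sketch below assumes a precomputed SNP-to-gene assignment and precomputed gene sets. The gene names, pathways, and weights are hypothetical placeholders. Published pathway-based PGS methods may assign SNPs to genes and weight variants differently.

```python
def pathway_pgs(dosages, weights, snp_to_gene, pathway_genes):
    """PGS restricted to SNPs assigned to genes of one pathway."""
    return sum(w * dosages[snp] for snp, w in weights.items()
               if snp in dosages and snp_to_gene.get(snp) in pathway_genes)


# Hypothetical inputs: weights, SNP-to-gene assignments and pathway gene sets.
weights = {"rs1": 0.10, "rs2": 0.05, "rs3": 0.20, "rs4": -0.05}
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C"}  # rs4: intergenic
pathways = {"pathway_1": {"GENE_A", "GENE_B"}, "pathway_2": {"GENE_C"}}
person = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 2}

# One score per pathway gives an individual "risk profile" across pathways.
profile = {name: pathway_pgs(person, weights, snp_to_gene, genes)
           for name, genes in pathways.items()}
```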
Efforts are also underway to establish PGSs for drug response and propensity to side effects for currently available pharmacological psychiatric therapies (e. g., by the International Consortium on Lithium Genetics, ConLiGen), as has been done previously for other medical disorders [45]. However, large samples will be required to establish meaningful prediction models.
When developing PGSs, a critical ethical consideration must be their potential application within the non-regulated consumer context [46]. For cardiovascular disorders, e. g., coronary artery disease, mobile applications that calculate PGSs and provide lifestyle and dietary recommendations already exist [47]. This approach can facilitate the prevention of cardiovascular disease via a positive influence on risk behavior.
Only weeks after the publication of a GWAS on same-sex behavior, PGSs derived from this study were misused by a commercial direct-to-consumer genetic testing provider to predict homosexuality, although they explained less than 1 % of the variance of the examined phenotype. This led to a backlash from the lesbian, gay, bisexual, and transgender community [48]. For psychiatric disease, the misuse of PGSs developed within the research context could have serious ethical consequences, given the high degree of stigma that is still associated with mental illness worldwide.

Conclusions for research and clinical practice
- Polygenic scores (PGSs) are an effective tool for the quantification of the polygenic contribution to psychiatric disorders at the individual level.
- In psychiatric research, PGSs are now applied on many levels, from obtaining insights into biological mechanisms to understanding the relationship between different diseases.
- Although the generation of PGSs requires large datasets (training sample), PGSs can be applied to much smaller datasets in subsequent studies (test sample).
- The application of PGSs for disease prediction remains limited, since, in addition to common genetic variants, rare variants and non-genetic factors also play an important role.
- Potential future clinical applications include the identification of etiology-based disease entities for the development of new therapeutic strategies and the prediction of drug responses and side effects for currently available pharmacological therapies.
Conflict of interest: Till Andlauer and Markus Nöthen state that there are no conflicts of interest.
Patients' rights and animal protection statement: This article does not contain any studies with human or animal subjects.