New insights from genetic studies of eczema

Abstract Genome-wide association studies (GWAS) provided fundamental insight into the genetic determinants of complex allergic diseases. For eczema, 58 susceptibility loci were reported. Protein-changing variants were associated with eczema at genome-wide significance at 12 loci. The majority of risk variants were, however, located in non-coding, regulatory regions of the genome. Prioritized target genes were enriched in pathways of the immune response and of epithelial barrier function. Interestingly, a large overlap in the genetic architecture underlying different allergic diseases was identified, pointing to common pathomechanisms for eczema, asthma, hay fever, and food allergy. Here, we review the most recent findings from GWAS for eczema, including the role of rare variants and genetic heterogeneity in ethnically diverse populations. In addition, we provide an overview of genes underlying Mendelian disorders featuring eczematous skin inflammation.


Introduction
Eczema (atopic dermatitis) is a chronic inflammatory skin disease with a lifetime prevalence of up to 20 % in westernized countries. It usually develops in infancy and often precedes or co-occurs with other allergic phenotypes, such as food allergy, asthma, and hay fever. Eczema is characterized by chronic relapsing skin inflammation of flexural areas of the body, severe itch, and an aberrant immune response to environmental allergens. A defective skin barrier is a hallmark of eczema, manifesting clinically as marked dryness, sensitivity to physical/chemical irritation, and increased sensitivity to bacterial and viral infections.
Eczema is a multifactorial, complex disease. The increasing prevalence over the last decades clearly pointed to environmental factors as important modifiers of disease risk. On the other hand, family and twin studies indicated that up to 80 % of eczema susceptibility can be attributed to heritable factors, making the genome the target of interest for deciphering the molecular mechanisms underlying eczema. While loss-of-function (LOF) mutations in the epidermal barrier gene filaggrin (FLG) identified 16 years ago are still the strongest genetic risk factors for eczema, genome-wide approaches unraveled a number of involved pathways and contributed a plethora of candidate genes including some promising targets for therapeutic approaches.
In this article, we summarize the latest findings on the genetics of eczema. We review studies investigating the role of common and rare variants and of genetic determinants shared between different allergic diseases. Finally, we discuss the influence of the ethnic background which may contribute to differences in eczema phenotype and prevalence.

Searching for the causal eczema variants
In order to identify susceptibility genes in complex diseases such as eczema, different genome-wide strategies were applied. Whole genome linkage studies investigated allele sharing among affected relatives. However, their requirement for either extended pedigrees or large numbers of nuclear families or affected sib pairs as well as the lack of precision in mapping complex disease genes limited their feasibility and widespread use.
The advent of array technology enabled the genotyping of up to several million single nucleotide polymorphisms (SNPs) per individual in a single experiment and quickly replaced linkage studies with genome-wide association studies (GWAS). GWAS compare the allele frequencies of SNPs in affected cases and unaffected control individuals. Due to the case-control design, study participants are much easier to recruit. Moreover, the high density of markers and the linkage disequilibrium between them enabled the mapping of susceptibility loci to much smaller chromosomal intervals.
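The case-control comparison of allele frequencies can be illustrated with a minimal sketch of an allelic chi-square test for a single SNP. The function name, the allele counts, and the use of 5e-8 as the conventional genome-wide significance threshold are assumptions made for illustration. Published GWAS typically use regression models with covariates rather than this simple test.

```python
import math

# Conventional genome-wide significance threshold (assumed here; not stated in the text).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic chi-square test (1 d.f.) and odds ratio for one SNP.

    All arguments are allele counts, i.e. two alleles per diploid individual.
    Returns (chi-square statistic, p-value, odds ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 d.f.:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p_value, odds_ratio = allelic_association(300, 1700, 200, 1800)
is_genome_wide_significant = p_value < GENOME_WIDE_SIGNIFICANCE
```

With these made-up counts the risk allele is more frequent in cases (15 % vs. 10 %), yet the p-value does not reach the assumed threshold. This illustrates why large sample sizes are needed.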
Large study populations recruited in nationwide efforts, such as the UKB, FinnGen, Estonian Biobank, or BioBank Japan (BBJ), are of great value for genetic studies, but the quality of the phenotype information on eczema may vary. While in FinnGen, Estonian Biobank, or BBJ codes of the International Statistical Classification of Diseases, 10th Revision (ICD-10) were available for eczema, the UKB provided this information only for a small subset of individuals. In the UKB, two data fields were related to eczema: a questionnaire-based report of a diagnosis of "rhinitis/hay fever or eczema ever" and a self-report of "eczema or dermatitis". However, neither data field captured eczema with the required accuracy; asking adults for "rhinitis/hay fever or eczema" would preferentially identify individuals with allergic rhinitis since it is a much more common disorder. Moreover, the trait "eczema or dermatitis" would include a large number of individuals with unspecified dermatitis. An acceptable solution to this problem was to use the overlap of both definitions, which would yield fewer cases but enrich for eczema [9,16]. Accordingly, the multi-ethnic eczema GWAS, meta-analyzing results from BBJ (n = 2,597 cases) with data on "rhinitis/hay fever or eczema" from UKB (n = 25,685 cases), should be regarded as a study on allergic disease rather than eczema. All susceptibility loci reported as newly associated with eczema in this meta-analysis [14] were found in a previous GWAS on any allergic disease [16] and were therefore excluded from Table 2. Likewise, two GWAS reporting results on eczema but using either the "rhinitis/hay fever or eczema" or the "eczema or dermatitis" definition in the UKB were not included here [17,18].
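The stricter case definition described above, using only individuals captured by both UKB data fields, amounts to a set intersection. The following minimal sketch uses hypothetical participant identifiers and a hypothetical function name. It does not reproduce the actual UKB field codes or the pipeline used in the cited studies.

```python
def strict_eczema_cases(hay_fever_or_eczema, eczema_or_dermatitis):
    """Return the participants flagged by BOTH case definitions."""
    return set(hay_fever_or_eczema) & set(eczema_or_dermatitis)


# Hypothetical participant IDs, for illustration only.
questionnaire_cases = {"P01", "P02", "P03", "P04"}  # "rhinitis/hay fever or eczema ever"
self_report_cases = {"P03", "P04", "P05"}           # "eczema or dermatitis"

overlap = strict_eczema_cases(questionnaire_cases, self_report_cases)
```

The intersection is smaller than either input set, which mirrors the trade-off noted in the text: fewer cases, but enriched for eczema.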

Studying rare variants in eczema: explaining the missing heritability?
In recent years, more and more studies indicated a substantial role of rare variants in complex diseases [19,20]. Their identification on the genome-wide level is, however, challenging. Since the penetrance of individual variants in complex diseases is incomplete, large numbers of samples need to be genotyped in order to find statistically significant differences between cases and controls. In addition, most rare variants are not covered by the commercially available genotyping arrays. Therefore, whole exome sequencing (WES) or whole genome sequencing (WGS) are the methods of choice for rare variant detection. Since genome-wide sequencing studies are cost-intensive, they were applied to a limited number of severely affected individuals or family members. WES performed in eczema patients with ichthyosis vulgaris from Ethiopia [21], in individuals with eczema and high IgE levels from Japan [22], and in adult-onset eczema patients from Korea [23] identified rare nonsense and missense mutations in a number of epidermal differentiation complex (EDC) genes, in ADAM33, DSG4, GTF2H5, EVPL, NLRP1, SPINK5, and CYP27A1. However, due to limited sample size and power (n < 50 cases), none of the associations were statistically significant. To understand the genetics underlying human traits more completely, there are increasing efforts to sequence large study populations: the UKB recently released whole exomes of 454,787 UK individuals. The analyses of rare coding variants in "rhinitis/hay fever or eczema" and in "eczema or dermatitis" both identified 2 LOF mutations in FLG at genome-wide significance [24]. Apart from larger study populations, the use of a strict definition of a severe phenotype or severely affected families may improve the power of rare variant studies.
Recently, an alternative, imputation-based approach for studying rare variants in eczema was successfully applied. Since genotyping arrays contain a restricted number of SNPs, imputation is used to estimate missing variants not included on the array from known haplotypes of a reference population. With the Haplotype Reference Consortium panel (HRC) comprising almost 65,000 haplotypes of 32,488 sequenced individuals [25], for the first time a reference set of decent size was available to impute rare variants with high accuracy. In a meta-analysis of 21 study populations including 20,016 eczema patients and 380,433 controls, overall 48 independent variants from 38 loci were associated with eczema, of which 11 contained low frequency/rare variants [9]. A similar approach using large population-specific reference panels for imputation was used in a meta-analysis of Estonian Biobank, FinnGen, and UKB reporting 30 loci including 4 rare variants [8]. Finally, genotyping 2,000 eczema patients and 15,000 controls with the Immunochip, a custom genotyping array that was specifically designed for studying chronic inflammatory, immune-related diseases and that included 200,000 common and rare variants from 186 candidate loci, identified low frequency variants in the docking protein 2 gene (DOK2) [26].
Imputation-based studies require a minimum number of copies of the rare allele in the reference data set; they cannot identify individual mutations involved in the disease. Future WES and WGS studies will close this gap in the detection of disease-causing rare variants. In eczema, variants with a minor allele frequency below 5 % were estimated to account for 23 % of the SNP-based heritability (77 % was explained by common variants), supporting a substantial role for rare and low-frequency variants in eczema risk [9].

Functional assessment of eczema-associated variants
The majority of variants associated with eczema in GWAS are located in intergenic regions for which a functional link to eczema is often difficult to obtain. Pinpointing the culprit variant(s) at a risk locus, identifying their target gene(s) and experimentally elucidating their role in the disease process remain major challenges. By integrating functional evidence from transcriptomic, proteomic and epigenomic data, including systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues, Mountjoy et al. have recently fine-mapped >133,000 GWAS loci and made the prioritized target genes publicly available [27].
To identify variants with a potential functional impact, we screened the GWAS catalog for eczema-associated variants altering the protein sequence or having a high deleteriousness score (Table 3). Apart from the well-established FLG LOF mutations, missense variants were reported at genome-wide significance in RUNX3, SLC9A4, IL6R, IL13, DUSP1, NOTCH4, DOK2, NLRP10, TESPA, NFILZ, and TNFRSF6B. Non-coding variants with a high deleteriousness score were located in KIAA0391, TNXB, EGR2, and AFF1 and between PBX2/GPSM3, CCD80/RP11-572C15.3, and ZNF365/ADO. Interestingly, 14 out of 23 variants likely to be harmful had a minor allele frequency < 5 % (Table 3), supporting a role for low frequency and rare variants in eczema.
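The screening logic described above can be sketched as a simple filter: keep variants that are protein-changing or exceed a deleteriousness cutoff, then count how many are below a minor allele frequency of 5 %. The CADD cutoff of 12 is taken from the note to Table 3. The record layout, field names, consequence labels, and example variants are hypothetical and do not reproduce the GWAS catalog schema.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str           # hypothetical identifier
    consequence: str    # e.g. "missense", "nonsense", "intergenic"
    cadd: float         # deleteriousness (CADD) score
    maf: float          # minor allele frequency, 0..0.5


PROTEIN_CHANGING = {"missense", "nonsense"}
CADD_THRESHOLD = 12.0   # cutoff given in the note to Table 3


def potentially_functional(variant):
    """Protein-changing variant, or non-coding variant with a high CADD score."""
    return variant.consequence in PROTEIN_CHANGING or variant.cadd > CADD_THRESHOLD


def low_frequency_share(variants, maf_cutoff=0.05):
    """Return (number with MAF below the cutoff, number selected as potentially functional)."""
    selected = [v for v in variants if potentially_functional(v)]
    low_frequency = [v for v in selected if v.maf < maf_cutoff]
    return len(low_frequency), len(selected)


# Made-up example records, for illustration only.
example_variants = [
    Variant("rsA", "missense", 5.0, 0.02),
    Variant("rsB", "intergenic", 15.0, 0.30),
    Variant("rsC", "intronic", 3.0, 0.01),
    Variant("rsD", "nonsense", 35.0, 0.04),
]
n_low, n_selected = low_frequency_share(example_variants)
```

Applied to the real data behind Table 3, the same two counts would correspond to the 14 and 23 reported in the text.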
In eczema, we and others have found a significant enrichment of immune-related pathways involved in cytokine signaling, T-cell differentiation and activation as well as enrichment of genes related to the cornified cell envelope in keratinocytes. Interestingly, stratifying the eczema-associated variants according to allele frequency revealed that rare/low-frequency SNPs were more involved in epidermal barrier function whereas common SNPs pointed to immune functions [9].

Eczema genes from monogenic diseases
A number of eczema-associated genes were identified through rare monogenic diseases, in which eczematous skin inflammation occurs as a symptom of a more complex syndrome. The most prominent discovery was the identification of LOF mutations in the epidermal barrier gene filaggrin (FLG). They were first identified as causal variants in families with ichthyosis vulgaris [28], an autosomal-dominant skin disease characterized by dry, scaly skin and often accompanied by eczema. Shortly after, the same mutations were reported to be associated with eczema [29]. Additional eczema risk genes were previously reported for monogenic diseases, including SPINK5 underlying autosomal-recessive Netherton syndrome, CARD11 for autosomal-dominant immunodeficiency 11B, and STAT3 for autosomal-dominant hyper-IgE recurrent infection syndrome. A systematic search for "eczema" or "atopic dermatitis" in OMIM yielded 18 genes mutated in Mendelian disorders commonly presenting with eczema-like skin lesions. Recurrent additional features were immunodeficiency, inflammation, other allergic disorders, and elevated total IgE as well as cutaneous and skeletal abnormalities (Table 4). Some of the underlying genes, such as FLG, IL6R, CARD11, and IL2RA, were also reported in GWAS on eczema; others were located in or near eczema susceptibility loci (Table 2). Future studies may identify additional overlaps in the genetics of eczema and rare monogenic diseases.

Filaggrin: are there more eczema risk genes in the epidermal differentiation complex?
In eczema, LOF mutations in FLG show semi-dominant inheritance with incomplete penetrance. With an odds ratio of 3 they are the strongest known risk factor for eczema. Almost all GWAS replicated the FLG locus within the EDC [30], a 1.6-Mb region on chromosome 1q21.3 that contains over 60 genes, most of which are expressed during differentiation of the skin or of mucosal tissues. GWAS usually detect multiple common variants significantly associated with eczema in that region, and it is tempting to assume that additional susceptibility genes for eczema reside within the EDC. However, a recent GWAS with data on low frequency and rare variants was able to condition the results on the 3 main LOF variants in Europeans; all significant association signals within the EDC were eliminated [9], indicating that they were due to FLG mutations. The effect of FLG mutations on eczema might be even larger. A systematic evaluation of all FLG LOF mutations reported in the gnomAD sequence database, which includes the sequences of 141,456 individuals from populations world-wide [31], revealed that the 4 most common European mutations covered 80 % of the LOF alleles in that population [32]. A total of 276 different mutations accounted for the remaining 20 % of LOF alleles, which could not be imputed due to their very low allele frequencies. In non-European populations, an additional 320 LOF variants in FLG, most of them rare, were reported. Thus, sequencing in populations of diverse ancestries will be required to uncover additional rare eczema risk variants, not only in the EDC.

Susceptibility loci shared between eczema and other allergic diseases
Eczema patients often suffer from additional allergic diseases, such as asthma, hay fever, or food allergy, pointing to a common genetic background underlying different allergic disorders. To identify shared risk variants, GWAS on any allergic disease were performed, enabling large increases in sample size [16,33-35]. The largest meta-GWAS on allergic disease compared 180,129 cases with asthma and/or hay fever and/or eczema with 180,709 non-allergic controls from 13 study populations of European ancestry [16]. A total of 99 genomic loci with 136 independently associated genetic variants were identified. Among them, only 5 genomic loci showed evidence for a disease-specific effect in patients with only eczema, only asthma, or only hay fever. Interestingly, at all 5 loci, allele frequencies of the risk alleles were significantly higher (FLG, IL2RA) or lower (IL1RL2/IL18R1, WDR36/CAMK4, GSDMB) in eczema, distinguishing this allergic disorder of the skin from the allergic airways diseases asthma and hay fever. On the other hand, eczema and food allergy, which often co-occur in infancy, seem to share the majority of susceptibility loci. All variants identified in a GWAS on childhood food allergy [36] were located in loci which were also associated with eczema.
Summarizing the results, the majority of eczema loci were also involved in other allergic diseases. The eczema definition as well as the inclusion or exclusion of other allergic traits are important factors influencing GWAS results. Eczema-specific loci exist and the corresponding genes might be involved in epidermal barrier-specific functions.

The role of the ethnic background: similarities and differences
Heterogeneity of eczema between ethnicities has been described [37]. In East Asian populations, eczema is characterized by immune-dysregulation and epidermal barrier features resembling a phenotype between eczema and psoriasis in patients of European ancestry. African Americans seem to have a greater lesional infiltration of dendritic cells and decreased expression of Th1- and Th17-related markers compared with European Americans. In addition, a higher prevalence of eczema and a less prominent role of FLG LOF mutations were observed. GWAS pointed to similarities of the genetic architecture underlying eczema in different ethnic groups but also identified differences which may explain phenotypic heterogeneity.
Of the 58 loci reported here (Table 2), 35 were detected only in studies of European ancestry, 8 were specific for East Asian populations, and 15 were reported in both ethnicities or only in multi-ethnic studies. The excess of European-specific loci may, however, be attributed to the increased power of these studies due to much larger sample sizes of the respective GWAS; the maximum number of cases was 22,474 in European studies vs. 4,296 in East Asian studies. Apart from ancestry-specific susceptibility loci, genetic diversity is evident on the variant level. Since allele frequencies as well as haplotype structure can vary between populations, different lead SNPs identified at a susceptibility locus in Europeans and Asians, respectively, may be in linkage disequilibrium with the same causal variant. Moreover, the causal variants themselves may be population-specific. In this regard, a comprehensive analysis of all 600 FLG LOF mutations present in the gnomAD database [31] yielded valuable insights [32]. While the mutation spectrum was similar among related populations (e.g., Finnish and non-Finnish European), there was a significant difference between more distant populations. The most common LOF variants in the East Asian population were almost absent in Europeans and vice versa. Interestingly, there was no overlap of the most common LOF mutations between the South Asian population and the East Asian population. The allele frequency of all FLG mutations combined ranged from 1.4 % in the Latino/Admixed American to 7.6 % in the East Asian population [32], which indicated an ancestry-dependent contribution of FLG LOF mutations in the development of eczema.
To date, genetic research on eczema has mainly been conducted in populations of European and East Asian ancestry. Since ethnic heterogeneity in terms of the eczema phenotype and the underlying genotypes exists, lack of studies on ethnically diverse populations, such as African, South Asian, and Latino populations, is a major drawback as emphasized recently in an article on polygenic risk scores [38]. Future efforts therefore need to be more inclusive regarding ethnicities. This will increase our global knowledge of eczema genetics and may provide benefit to a large proportion of the world population which is not adequately represented in research yet.

Conclusion
In recent years, GWAS significantly contributed to the identification of genetic loci involved in the development of eczema and shed light on a remarkable overlap of the genetic architecture underlying different allergic diseases. However, the functional impact of the majority of eczema-associated regulatory variants still needs to be resolved. Integration of growing sets of tissue-specific or single-cell-specific transcriptomic, proteomic, and epigenomic data will facilitate the elucidation of underlying disease mechanisms. Moreover, genome-wide sequencing studies will yield new insights into more rare and potentially functional variants.

Studies from the GWAS catalog (December 2022) reporting associations with the traits eczema, atopic eczema or atopic dermatitis are shown. 2 For multi-ethnic studies, the sample size of each population is indicated; E, European; NE, non-European; A, East Asian. 3 The study populations participating in these studies were partially overlapping. The majority of cases was subsequently included in the EAGLE consortium. 4 The Immunochip contains 200,000 common and rare variants from 186 candidate loci involved in inflammatory, immune-related diseases. 5 In UKB, cases were defined based on the life-time presence of rhinitis/hay fever or eczema. Accordingly, results from the multi-ethnic analysis were excluded from Table 2.
GWAS loci were defined based on a distance > 1 Mb between neighbouring genome-wide significant SNPs in the GWAS catalog.

Appendix
2 If multiple studies reported SNP association at a locus, the positions of the first and of the last lead SNP are indicated. 3 Associations reported in the GWAS catalog (December 2022) for the traits eczema, atopic eczema or atopic dermatitis are shown. For each locus identified in a study, the lead SNP is indicated. Lead SNPs are ordered according to their genomic positions. 4 Association results of the multi-ethnic analysis of Tanaka et al. were excluded due to the eczema definition used for UKB.

5 Mapped genes according to the GWAS catalog using the physical position of a SNP are reported for the first and last lead SNP at a locus. For intergenic SNPs the neighbouring genes are listed.
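The locus definition used in the table notes, where neighbouring genome-wide significant SNPs more than 1 Mb apart belong to separate loci, can be sketched as a single pass over position-sorted SNPs. The function names and the example coordinates are hypothetical and serve only to illustrate the grouping rule.

```python
def group_into_loci(snps, max_gap=1_000_000):
    """Group (chromosome, position) pairs into loci.

    Two neighbouring SNPs on the same chromosome fall into the same locus
    if they are at most `max_gap` base pairs apart; a larger distance
    (or a change of chromosome) starts a new locus.
    """
    loci = []
    for chrom, pos in sorted(snps):
        if loci:
            last_chrom, last_pos = loci[-1][-1]
            if chrom == last_chrom and pos - last_pos <= max_gap:
                loci[-1].append((chrom, pos))
                continue
        loci.append([(chrom, pos)])
    return loci


def locus_bounds(locus):
    """Positions of the first and of the last lead SNP at a locus."""
    return locus[0][1], locus[-1][1]


# Hypothetical lead SNP coordinates, for illustration only.
lead_snps = [
    ("1", 152_500_000),
    ("1", 152_000_000),
    ("1", 154_000_000),
    ("11", 76_000_000),
]
loci = group_into_loci(lead_snps)
bounds = [locus_bounds(locus) for locus in loci]
```

In this example the two SNPs on chromosome 1 that lie 0.5 Mb apart form one locus, while the third, 1.5 Mb further along, starts a new one.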

Table 1: Study characteristics of the genome-wide association studies on eczema.

Table 3: Eczema-associated variants with potential functional impact.
Genome-wide significant nonsense/missense variants and variants with a high CADD score (>12) from the GWAS catalog are listed. Rare variants identified in the studies of Grosche et al. [9] and Mucha et al. [26] were added. 2 Allele frequency (AF) of the major allele in the European population of 1000 Genomes is reported. 3 Combined Annotation Dependent Depletion (CADD) score from model GRCh37-v1.4.

Disease Inheritance MIM Number Gene name (symbol) Locus Symptoms
Onset of recurrent infections and inflammatory features, such as vasculitis and eczema in infancy or early childhood, thrombocytopenia, increased serum IgE, IgA, or IgM, leukocytosis, and increased eosinophils. Recurrent viral, fungal, and bacterial infections in infancy, lymphadenopathy, and variable autoimmune features, such as autoimmune enteropathy and eczematous skin lesions. Early-onset skin lesions, including eczematous dermatitis, infectious abscesses, and warts, recurrent respiratory infections or allergies, and chronic persistent bacterial and viral infections. Some patients may have gastrointestinal involvement, including inflammatory bowel disease, EBV+ smooth muscle tumors, and esophagitis. Severe diarrhea due to enteropathy, type 1 diabetes mellitus, and dermatitis in infancy. Other features may include hypothyroidism, autoimmune hemolytic anemia, thrombocytopenia, lymphadenopathy, hepatitis, and nephritis. Most patients with eczema and increased serum IgE.