Single-cell sequencing: promises and challenges for human genetics

Abstract Over the last decade, single-cell sequencing has transformed many fields. It has enabled the unbiased molecular phenotyping of even whole organisms with unprecedented cellular resolution. In the field of human genetics, where the phenotypic consequences of genetic and epigenetic alterations are of central concern, this transformative technology promises to functionally annotate every region in the human genome and all possible variants within them at a massive scale. In this review aimed at the clinicians in human genetics, we describe the current status of the field of single-cell sequencing and its role for human genetics, including how the technology works as well as how it is being applied to characterize and monitor diseases, to develop human cell atlases, and to annotate the genome.


Introduction
The advent of next-generation sequencing (NGS) technologies has made the screening of patients and the discovery of new variants routine; however, deciphering the impact of the uncovered genomic alterations remains the central challenge of human genetics. Over 60 % of patients with rare diseases of probable genetic etiology leave the clinic without a diagnosis even after whole-genome sequencing, according to the pilot report from the UK 100,000 Genomes Project [1]. Improving this necessitates multifaceted efforts: thoroughly characterizing the "normal" or "wild-type" state of tissues as a benchmark for diseased states, obtaining a complete picture of genotype-phenotype relationships across variants, and developing technologies that facilitate clinical translation. Advances in all three areas will be decisive in developing effective diagnostic and therapeutic regimes.
One technology that has attracted an enormous amount of attention lately across many fields is single-cell sequencing (sc-seq). While sequencing of tissues (hereafter referred to as bulk-seq) has been a remarkable method to characterize the average profile of a tissue in health and in disease, it becomes insensitive to a phenotype when a disease affects only a subpopulation of cells in an organ or tissue, which can lead to flawed conclusions. Sc-seq technologies provide more granular information about the different cell types within a tissue, thus increasing not only the resolution of the data but also the statistical power when benchmarking a diseased state against a normal state. That is, it is theoretically possible to compare sequencing data from patients to a "wild-type" sample to determine changes in gene expression as well as in cell composition, even at early stages of disease progression. This is just one of the reasons why sc-seq was named the method of the year in 2013 (Nature Methods) and breakthrough of the year in 2018 (Science), has been highlighted for early detection of diseases [2], and draws over 8700 publications per year ("single-cell" on PubMed). The creation of such "wild-type" benchmarks, the so-called cell atlases, has been one of the major achievements of the field. What cell atlases are and how they can be used is discussed towards the end of this review. Sc-seq as an unbiased and high-resolution phenotyping method has also been paired with multiplexed gene editing methods, such as pooled CRISPR and saturation genome editing. This combined genotype-phenotype screening allows annotation of hundreds to thousands of genomic regions in a single experiment, which is also briefly discussed.
This review gives an introduction to the field of single-cell genomics with a specific focus on its use for human geneticists. We describe the current state of sc-seq technology and how it applies to human genetics. First, we briefly outline the experimental and analytical workflows. Then we review the sc-seq applications most relevant to the field of human genetics and discuss how the toolbox comprising various flavors of sc-seq is being integrated into the field of medical genetics. In the interest of brevity, we have restricted the scope of the review based on the definition of the technology and the biological focus of the studies. We define sc-seq as technologies that provide cellular resolution, use sequencing as a read-out strategy (cf. hybridization probes), and probe cellular nucleic acids (cf. proteome sequencing). We limit our discussion to studies that focus on human tissues and diseases relevant to human genetics, except for the section on the applications under development. For more focused coverage of these topics, we suggest the following reviews [3][4][5][6].

Experimental and analytical workflows
As the name implies, sc-seq comprises the extraction of the nucleic acids of interest from individual cells or nuclei (hereafter collectively referred to as cells for simplicity), followed by sequencing and data analysis (Fig. 1). Here we provide a quick summary of the modalities of sc-seq as well as the experimental and analytical workflows, which have been elaborated and critically evaluated in many recent reviews [4,[7][8][9][10].
In its most basic form, there are three main modalities of sc-seq in which the cellular nucleic acids are sequenced: sc-genome-seq, sc-epigenome-seq, and sc-transcriptome-seq. Each of these modalities offers complementary information that is uniquely suited to solving niche challenges in human genetics. Sc-genome-seq is suited for applications such as identifying genotypes (albeit with sparse coverage when compared to bulk-genome-seq), elucidating mechanisms that lead to somatic mutations [11], constructing developmental lineages, and prenatal testing [12]. On the other hand, sc-epigenome-seq and sc-transcriptome-seq offer the possibility to phenotype cells and also to elucidate the mechanistic pathways leading from the genotype or environmental cues to a pathological state. More recently, multi-ome or multimodal sequencing has enabled the simultaneous profiling of combinations of these modalities, for example to link regulatory elements to gene expression profiles and to deduce gene regulatory networks [13,14].
The experimental workflow of sc-seq resembles bulk-seq for the most part, including PCR steps, enzymatic fragmentation, and end-repair, as well as sample-index and sequencing-adapter ligation. However, two major steps differentiate sc-seq from bulk-seq: dissociating the tissue into cells and cellular barcoding (Fig. 1A, B). Dissociating the sample into a single-cell suspension gives reagents access to individual cells and thereby enables the extraction of molecules of interest without mixing the molecular contents between cells. This step requires painstaking optimization, since the tissue type, whether it is fresh or frozen, and the biological age of the sample all affect the protocol. Follow-up fluorescence-activated cell sorting (FACS) may be required to filter out cell debris. Often nucleus sequencing is preferred over cell sequencing due to the ease of dissociation.
Cellular barcoding is in principle similar to sample indexing, a prevailing method for cost reduction in NGS, where molecules (or sequencing libraries) from multiple samples are multiplexed using unique oligonucleotide indices prior to pooled sequencing (Fig. 1C). The resulting data are bioinformatically demultiplexed based on the known indices. In cellular barcoding, unique oligonucleotides instead tag the molecules with cellular identities. But unlike sample indexing, where indices refer directly back to particular metadata of the sample (e. g., treatment vs. control group, or the experimental replicate number), cellular barcoding occurs at random and the identities of the cells (e. g., cell type) need to be bioinformatically determined. Several technologies have been developed for cellular barcoding. Each employs a different strategy to append oligonucleotide barcodes to molecules within each cell. These include micro/nano-wells (Fig. 1B) [15,16] that isolate cells in individual reaction chambers, microfluidics-based droplets [17,18], and split-pool methods that use combinatorial barcoding to achieve this goal without cellular isolation [19]. Kits that simplify most of the steps from cellular barcoding to generation of sequencing libraries are commercially available from BD Rhapsody, 10x Genomics, TakaraBio, MissionBio, and Standard Biotools, among others. A comparison of some of these technologies can be found in several reviews [10,20,21].
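The demultiplexing step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical read layout in which the first eight bases of each read are the cellular barcode; the barcode length, the read strings, and the function name `demultiplex` are illustrative assumptions and do not correspond to any particular commercial kit.

```python
from collections import defaultdict

# Assumption for illustration: the first BARCODE_LEN bases of a read are the
# cellular barcode and the remainder is the captured molecule. Real kits
# define their own read layouts.
BARCODE_LEN = 8


def demultiplex(reads, known_barcodes):
    """Group read inserts by cellular barcode; unknown barcodes are set aside."""
    per_cell = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        if barcode in known_barcodes:
            per_cell[barcode].append(insert)
        else:
            unassigned.append(read)
    return dict(per_cell), unassigned


# Toy reads (synthetic sequences, not real data).
reads = [
    "AAAACCCC" + "GATTACA",
    "GGGGTTTT" + "CATCATC",
    "AAAACCCC" + "TTGACCA",
    "NNNNNNNN" + "ACGTACG",  # barcode not in the known list
]
per_cell, unassigned = demultiplex(reads, {"AAAACCCC", "GGGGTTTT"})
print(per_cell)
print(len(unassigned), "read(s) could not be assigned to a cell")
```

Note that the barcodes only separate the reads by cell of origin; as stated above, what kind of cell each barcode represents still has to be inferred from the data.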
The resulting sequence datasets are generally large and complex. Their large size results from the fact that most recent sc-seq methods readily sequence thousands and up to millions of cells [23]. The datasets are challenging because of artifacts arising from the limited amount of nucleic acid molecules within each cell (e. g., dropout events, as discussed in the Technological limitations section). Apart from additional correction and filtering measures to account for these artifacts, and the need to demultiplex the sequencing reads using the cellular barcodes to generate an omics profile per cell, the main analytical workflow also resembles that of bulk-seq. The genomic features, i. e., variants in sc-genome-seq, gene expression counts in sc-transcriptome-seq, or read counts in sc-epigenome-seq, are extracted from the raw data by alignment to the reference genome. The extracted features are segregated per cell and tabulated into a cell × feature matrix. At this point in the analytical workflow, the cell names in this matrix are alphanumeric strings corresponding to the oligonucleotide barcodes and do not contain any interpretable metadata (e. g., cell type or cellular genotype).
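As a rough illustration of the cell × feature matrix, the following Python sketch tabulates toy (barcode, gene) assignments into per-cell counts. The barcodes, the gene names, and the helper `build_count_matrix` are hypothetical; production pipelines additionally perform alignment, filtering, and artifact correction, none of which is shown here.

```python
from collections import Counter


def build_count_matrix(assignments):
    """Tabulate (cell barcode, gene) pairs into a cell x feature count matrix."""
    counts = Counter(assignments)
    cells = sorted({cell for cell, _ in assignments})
    genes = sorted({gene for _, gene in assignments})
    # One row per cell barcode, one column per gene; missing pairs count as 0.
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


# Each tuple: (cellular barcode, gene the aligned read was assigned to).
# Synthetic example data.
assignments = [
    ("AAAACCCC", "GENE_A"),
    ("AAAACCCC", "GENE_A"),
    ("AAAACCCC", "GENE_B"),
    ("GGGGTTTT", "GENE_B"),
]
cells, genes, matrix = build_count_matrix(assignments)
for cell, row in zip(cells, matrix):
    print(cell, dict(zip(genes, row)))
```

The row names are still bare barcodes, which is why the subsequent clustering and annotation steps are needed to attach interpretable metadata.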
The next steps of the workflow are generally aimed at inferring this metadata from the data by means of (hierarchical) clustering of the cells (Fig. 2A) and the identification of differential features between clusters. Cell type annotation, i. e., the identification of the cell type corresponding to each cluster, is usually performed manually. Thanks to the recent boom in the abundance of public cell atlases, automated methods are also becoming available [24]. However, the accuracy and sensitivity of annotations for smaller and previously undescribed cell populations may still be limited, making manual curation a necessary step. From this point on, the downstream analyses vary significantly based on project goals, such as identifying mutational landscapes and differentially expressed genes, calculating cellular compositions, or establishing gene regulatory networks stratified either by sample groups or by the identified cell clusters. Several bioinformatic methods have been developed to handle all aspects of the analysis. Some of these include Cellranger, Seurat, Scanpy, etc., for sc-transcriptome-seq analysis [25], chromVar, Signac, scABC, etc., for sc-epigenome-seq analysis [26], and Monovar, SCIΦ, etc., for sc-genome-seq analysis [27]. Nevertheless, sc-seq data continue to be vulnerable to subjective analysis, and their analysis is best left to experienced bioinformaticians. Automated and web-based interactive tools are becoming available, which will make sc-seq more accessible to a diagnostics setting [28,29].
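To make the annotation step more concrete, here is a minimal sketch of marker-based cluster annotation in Python. It assumes that clustering has already produced a mean expression value per gene and cluster; the marker lists, the expression values, the threshold `min_score`, and the function `annotate_clusters` are illustrative assumptions and not the procedure of any of the tools named above.

```python
def annotate_clusters(cluster_means, markers, min_score=1.0):
    """Label each cluster with the cell type whose markers score highest on average."""
    labels = {}
    for cluster, means in cluster_means.items():
        scores = {
            cell_type: sum(means.get(gene, 0.0) for gene in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        best = max(scores, key=scores.get)
        # Clusters without a convincing match are flagged for manual curation.
        labels[cluster] = best if scores[best] >= min_score else "unassigned"
    return labels


# Illustrative marker genes and synthetic mean expression values per cluster.
markers = {"neuron": ["SNAP25", "RBFOX3"], "microglia": ["CX3CR1", "P2RY12"]}
cluster_means = {
    "cluster_0": {"SNAP25": 4.0, "RBFOX3": 3.0, "CX3CR1": 0.1},
    "cluster_1": {"CX3CR1": 2.5, "P2RY12": 3.5, "SNAP25": 0.2},
    "cluster_2": {"SNAP25": 0.3, "P2RY12": 0.2},
}
print(annotate_clusters(cluster_means, markers))
```

The "unassigned" outcome mirrors the point made above: small or previously undescribed populations may not match any reference profile and then require manual curation.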

Applications of sc-seq technologies
In this section, we review the literature on applying sc-seq to human genetics. This section is divided into three parts: (i) applications in disease characterization, (ii) applications to aid diagnosis or therapy, and (iii) applications under active development.

i. Applications in disease characterization -Cellular phenotyping and deciphering molecular mechanisms
Sc-seq is suited to characterizing diseases and uncovering the molecular mechanisms behind the pathology. This is especially the case when the pathological state affects a subset of cells (e. g., is cell type-specific) or when it affects multiple organs or tissues because of pleiotropic genes. Given the single-cell resolution, it also enables distinguishing co-occurring mutations from mutually exclusive ones, which is not possible in bulk-seq [12]. These advantages have led to the application of sc-seq for the characterization of various diseases, including infectious diseases such as HIV, tuberculosis, influenza, COVID-19, etc., which have already been reviewed recently [12,30,31]. A large proportion of applications to non-infectious diseases focuses on cancer, namely the characterization of cellular heterogeneity, gene pathways, clonal evolution, etc. [32][33][34][35][36]. Sc-seq has also been used to characterize complex genetic diseases [37] such as heart diseases [38], Diamond-Blackfan anemia [39], and autism [40]; autoimmune conditions such as lupus [41], multiple sclerosis [42], and rheumatoid arthritis [43]; respiratory illnesses such as asthma [44][45][46]; and tissue degenerative conditions such as aging [47,48], age-related ocular diseases [49][50][51], Alzheimer's disease [52,53], Parkinson's disease [54][55][56], and ALS [57]. These efforts have also resulted in publicly accessible databases similar to wild-type cell atlases, providing an easy portal to query the expression patterns of genes of interest (e. g., scREAD for Alzheimer's disease [52]). To a lesser extent, sc-seq has also been used to characterize monogenic and chromosomal disorders, which are summarized in Table 1. To portray the power of sc-seq for disease characterization and molecular phenotyping of a mutation, we describe two of these studies in Boxes 1 and 2.
Box 1. Trisomy. Autosomal aneuploidy, such as trisomy 21 and 18, is associated with decreased cellular proliferation, congenital defects, intellectual disability, and shortened life expectancy. In the case of trisomy 21 (Down syndrome), it also leads to impaired memory [58] as well as a higher predisposition for Alzheimer's disease, with its clinical hallmark of plaques [59]. The etiology of the syndrome is, however, not fully known. Palmer et al. [60] applied sc-transcriptome-seq to cerebral cortices from 29 age-, sex-, and quality-matched Down syndrome and control brains to characterize the differential cellular constitution as well as isoform-specific expression profiles. One of the primary observations in this study was the imbalance between the numbers of inhibitory and excitatory neurons in the cortex, an observation reported previously in mouse models. This imbalance was seen in all examined brains, but was limited to the interneurons developing from the caudal (as opposed to medial) ganglionic eminence. Contrary to naive expectation and in congruence with previous investigations, the expression of only nine genes correlated with the trisomy (i. e., fold expression change > 1.5), and most of the affected genes are not located on chromosome 21. Moreover, the misexpression in the Down syndrome group was cell type-dependent, with microglia being the most affected cell type. Additionally, signatures of aging were found in the microglia from young Down syndrome samples. The Down syndrome microglia also overexpressed components of C1q and ADGRG1, which are implicated in overactive synapse pruning and in memory loss. In short, the authors identified cellular processes that could mediate the phenotypic consequences of Down syndrome, which would have gone undetected by bulk-transcriptome-seq or other comparable methods.
Table 1 (partial rows):
Umbilical cord blood; altered cell populations and misregulated pathways [74].
Rett syndrome; transcriptome; gene expression in mosaic Rett syndrome; occipital cortex; the upregulated genes in Rett syndrome are correlated with the extent of DNA methylation [75].

Box 2. Rett syndrome. X-linked genetic disorders, such as Rett syndrome, which is caused by mutations in the MECP2 gene, or Fragile X syndrome, which is caused by repeat expansion within the FMR1 gene [61], result in neurodevelopmental disorders. In female patients the manifestation of the disorder is generally milder as a result of mosaicism, where the affected tissues are composed of cells expressing either the normal or the non-functional gene due to the random inactivation of the alleles. Bulk/tissue-level sequencing methods are insensitive to the phenotypic characterization in these cases because they lack the cellular resolution to tease apart somatic mosaicism. Sc-seq offers unique advantages. Not only does it offer single-cell resolution to characterize the cellular phenotypes, it also offers the possibility to compare the phenotypes between cells expressing the functional or the mutant MECP2 obtained from the same individual, thus avoiding batch effects and biological variability when comparing Rett samples and age-matched controls. Moreover, since the cells with an altered transcriptome in mosaic diseases are surrounded by an otherwise "normally functioning" environment, confounding inter-cellular factors affecting the cells are also reduced.
The nuclear protein MECP2 is suggested to act as a transcriptional repressor by recruiting repressor elements to methylated DNA in a cell type-specific manner, a function which is impaired in the mutated MECP2 lacking the transcriptional repressor domain. As a result of the mosaicism, the neural circuits in female individuals with Rett syndrome are composed of normal as well as diseased neurons. Many of the previous investigations on the function of MECP2 were carried out on hemizygous male mice, where all the cells are affected, limiting the conclusions for mosaic states. Renthal et al. [75] addressed this by applying sc-transcriptome-seq as well as clever genotyping methods to establish genotype-phenotype relationships.
However, one of the technical challenges in addressing a genotype-phenotype problem is that sc-transcriptome-seq technologies are not ideal for genotyping, for several reasons: (1) they only detect variants in the transcribed regions; (2) most technologies (except for Smart-seq or other bespoke methods [60]) are designed for counting transcripts as opposed to genotyping and therefore do not cover the full length of the transcript; and (3) they only capture a small fraction of the cellular transcriptome (usually 2000-3000 transcripts per cell). The authors therefore identified the cells expressing the wild-type allele or the mutant allele by taking advantage of allele-specific SNPs that were maintained in cis with the mutant MECP2 gene. As a result, they were able to identify ∼3000 dysregulated genes in excitatory neurons and ∼200 genes in vasoactive intestinal peptide interneurons. By taking advantage of published single-cell methylation data from the human cortex [76], they were able to conclude that MECP2 in humans indeed represses highly methylated long genes in wild-type but not in MECP2 mutant neurons, thus providing mechanistic insight into this disease.
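The idea of sorting cells from one mosaic individual by the allele they express (Box 2) can be sketched as follows. This is a simplified, hypothetical illustration and not the procedure used in the cited study: the thresholds `min_reads` and `min_fraction`, the function `call_cell_allele`, and the read counts are all assumptions chosen for the example.

```python
def call_cell_allele(wt_reads, mut_reads, min_reads=2, min_fraction=0.9):
    """Assign a cell to the allele it expresses, or leave it uncalled."""
    total = wt_reads + mut_reads
    if total < min_reads:
        return "uncalled"  # too few informative reads (sparse coverage)
    if wt_reads / total >= min_fraction:
        return "wild-type"
    if mut_reads / total >= min_fraction:
        return "mutant"
    return "ambiguous"  # mixed signal; such cells would need closer inspection


# Synthetic per-cell counts of reads covering a SNP linked in cis to the gene:
# (reads with the wild-type-linked allele, reads with the mutant-linked allele)
snp_counts = {
    "cell_1": (5, 0),
    "cell_2": (0, 3),
    "cell_3": (1, 0),
    "cell_4": (2, 2),
}
calls = {cell: call_cell_allele(wt, mut) for cell, (wt, mut) in snp_counts.items()}
print(calls)
```

Because only a fraction of each cell's transcripts is captured, many cells can be expected to remain uncalled under any such scheme; the called cells can then be compared within the same individual.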

ii. Pre-clinical applications to aid diagnostics and therapy
The ability to detect rare cellular subpopulations within a tissue biopsy has been the long-held promise of sc-seq technologies for early disease diagnosis, enabling timely therapeutic interventions (Fig. 2B).But, unlike the rapidly accelerating literature on such prospective applications of sc-seq for disease diagnosis [77], its explicit use in the clinic is lagging behind.Currently there are a disproportionately large number of reviews highlighting the promise of sc-seq and its diverse use case scenarios for diagnostics, therapeutic monitoring, and personalized medicine, especially in the context of cancer [78][79][80][81][82][83][84].While several hurdles, as discussed in the Technological limitations section, prevent the direct application of sc-seq in the clinic, its usage, e. g., in clinical trials, to assess the (in)effectiveness of a new therapy and to delineate the underlying mechanisms has seen some interest [85][86][87].
One of the primary applications of the technology in a therapeutic setting has been in assessing and understanding the response (or resistance) to cancer therapy.For example, Kim et al. [88] used sc-seq on triple-negative breast cancer biopsies/excised samples to address an unresolved question, i. e., whether resistance to neoadjuvant chemotherapy was caused by generation of new mutations during the therapy or the selection of rare, pre-existing clones, owing to intra-tumor heterogeneity.By the combined use of sc-genome-seq and sc-transcriptome-seq the authors found out that while the chemotherapy caused the adaptive selection of pre-existing mutations, the selected cells underwent transcriptional reprogramming in response to the therapy.
Mochizuki et al. [89] report interim results from a Phase I clinical trial utilizing chimeric antigen receptor T-cell (CAR-T) therapy for gliomas. They find correlations between immunosuppressive myeloid populations and the response to therapy based on sc-transcriptome-seq of cerebrospinal fluid biopsies. In this context, it is worth highlighting that the use of circulating cells, such as circulating tumor cells and cells in the cerebrospinal fluid, in combination with sc-seq is being recognized as a promising strategy for diagnosis and therapeutic monitoring [83]. Indeed, tools specifically designed to enrich cells from liquid patient biopsies are also being developed [90].

iii. Applications under active development
Most aspects of sc-seq are yet to be standardized for routine clinical use. One driver of the steady and continuous development is the effort to build upon the basic idea of sequencing every cell by introducing new capabilities. These include the simultaneous recording of spatial characteristics with "sci-Space" (see Box 3), longitudinal sequencing of the transcriptome in live cells with "Live-seq" [91], and the recording of information related to cell physiology with "Patch-seq" [92]. Another direction in which the technology is currently developing is to address fundamental and longstanding questions in biology, such as annotating the non-coding genome or unraveling the complexity of human development. These challenges are being addressed using novel approaches utilizing sc-seq, such as "pooled CRISPR screening" [93] and "cell lineage tracing" [94,95]. While not all of these breakthroughs are of immediate relevance to human genetics, we believe some of the outcomes of this global effort merit a discussion for this readership, even if their benefits in the clinic may take years to come to fruition. These include the construction of open-access cell atlases, the unbiased annotation of functional elements in the genome, and the high-throughput phenotyping of variants. We limit our discussion below to the cell atlases and the phenotyping of variants, since the annotation of functional elements in the genome was recently reviewed elsewhere [10].
Box 3. A word on spatial transcriptomics. One of the drawbacks of sc-seq is the loss of spatial information when dissociating the tissue into cells. This loss is most noticeable when an entire organism (e. g., whole embryos [23,96] or zebrafish [97]) or a large tissue (e. g., brain [98]) is sequenced, where the spatial context is at least as important as the cell type information because of abundant cell types such as mesenchymal or epithelial cells. Spatial transcriptomics is a related sequencing methodology that, until recently, prioritized preserving the spatial coordinates over the cellular identity of the sequenced molecules [99]. Methods have been developed to integrate, experimentally [100] and bioinformatically [5,101], the sc-seq data with spatial data, which can help delineate how tissue-level phenotypes form by collective cellular functions.

iii (a) Human and mouse cell atlases
Creating publicly accessible cell atlases of organisms based on sc-seq has been one focus area of the developmental research field (Fig. 3A). In a cell atlas, cells are classified and catalogued based on their expression profiles and epigenetic marks. Atlases are an especially valuable resource for developmental and disease research, as they provide an open and peer-reviewed benchmark against which to compare diseased cellular states as well as to screen gene expression patterns or epigenetic modifications at a particular cell type, developmental time, or tissue (Fig. 3A). An example of this was discussed earlier: the work on Rett syndrome (Box 2). The international effort of building such cell atlases has seen contributions from individual labs, sometimes in the form of one enormous single-cell experiment, as well as from international consortia, such as the Human Cell Atlas (https://www.humancellatlas.org/), which collaborate to combine different sc-seq experiments to scan the whole body. The outcome has not only been extensive knowledge on how cells work in an orchestrated fashion to develop a functional organism; it has also laid the groundwork to interpret diseased cellular states, which from a clinical perspective can aid early disease detection and therapy (Fig. 2). Consider a case in which the clinician aims to determine the phenotypic consequences of a deleterious frame-shift mutation found in a patient with a rare disease, where the expression pattern of the gene is still unknown. This no longer requires painstaking expression profiling (e. g., in situ hybridization assays). Instead, the information is readily available through web interfaces to atlases such as https://www.cambridgecellatlas.org/ or https://descartes.brotmanbaty.org/ (Fig. 3B). Whole-embryo atlases of other model organisms such as mice are also available, including atlases covering embryonic development between E9.5 and E13.5 [23], atlases with spatial information [102], and atlases featuring pleiotropic mutations [103]. Indeed, new cell atlases are being published on a daily basis, and it will take a concerted effort to collect, host, organize, and present this information in a useful, comparable, and seamless fashion to maximize their clinical benefits [104].
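The atlas-based filtering of candidate genes portrayed in Fig. 3B can be sketched in a few lines of Python. The gene names, the nested dictionary standing in for an atlas, the expression values, the threshold `min_expression`, and the function `prioritize_by_atlas` are synthetic placeholders; actual atlases are queried through their own web interfaces or data downloads.

```python
def prioritize_by_atlas(candidates, atlas, tissue, min_expression=1.0):
    """Keep candidate genes expressed in any cell type of the affected tissue."""
    ranked = []
    for gene in candidates:
        by_cell_type = atlas.get(gene, {}).get(tissue, {})
        peak = max(by_cell_type.values(), default=0.0)
        if peak >= min_expression:
            ranked.append((gene, peak))
    # Highest peak expression in the affected tissue first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Synthetic stand-in for atlas data: gene -> tissue -> cell type -> expression.
atlas = {
    "GeneA": {"kidney": {"podocyte": 0.0, "tubule": 0.1}},
    "GeneB": {"brain": {"neuron": 5.0}},
    "GeneC": {"kidney": {"podocyte": 0.2, "tubule": 6.3}},
}
print(prioritize_by_atlas(["GeneA", "GeneB", "GeneC"], atlas, "kidney"))
```

In this toy example only the hypothetical Gene C passes the filter, matching the scenario in which the candidate expressed in the affected organ is prioritized.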

iii (b) Multiplexed, high-throughput phenotyping of variants
Current pipelines for the diagnosis of rare genetic diseases critically rely on functionally annotated variants. However, only about 30 % of the known variants in ClinVar have been definitively classified as pathogenic or not, with nearly half the variants classified as variants of uncertain significance (VUS). Indeed, the lack or uncertainty of such annotations has been held responsible for 10 % of the cases that remain undiagnosed after whole-genome sequencing [1]. With whole-genome sequencing being envisaged as standard care and an increasing number of un-annotated variants being identified in coding and non-coding regions, there is an urgent need for high-throughput genotype-phenotype screening technologies that can establish (cf. bioinformatically predict) the deleteriousness and phenotypic consequences of variants. The advent of approaches such as saturation genome editing (see accompanying article by Findlay et al. in this issue) has enabled the high-throughput generation and screening of thousands of coding and non-coding variants, as opposed to relying on population screens to identify variants, where the rate of discovery of rare variants is inherently limited by mutation rates and selective pressures. These approaches have also been comprehensively reviewed elsewhere [93]. However, the throughput of genotyping has not yet been matched in phenotyping, which is mostly limited to specific functional screens by means of guide RNA representation in the population [105], application of selective pressure, or FACS [106]. Sc-seq promises to provide this tool, offering unbiased, multiplexed, and high-throughput phenotyping capabilities. It has already found applications in the annotation of genomic regions [107][108][109][110]. The combination of genome editing approaches with sc-seq technologies will eventually enable testing of all observed variants of a patient in one multiplexed experiment. This will have an immediate impact on human genetics and help to establish genotype-phenotype relationships for variants across the entire human genome at scale.

Technological limitations
Sc-seq shares some of the fundamental challenges of bulk-seq in capturing and amplifying the nucleic acids from the samples, leading to PCR amplification biases, dropout events, and allelic imbalance, except that these biases are exaggerated in sc-seq due to the limited nucleic acid content of individual cells. Bioinformatic quality control tools currently represent the primary strategy used to tackle such experimental artifacts. For example, tools such as Scrublet [111] have been developed to detect doublet cells in sc-seq so that they can be filtered out, as they may otherwise corrupt the data. Appropriate extraction and handling of the samples are also vital, since the quality of the chromatin and the RNA is known to have a direct influence on the data quality [112]. Current sc-seq methods also face a real trade-off between sequencing coverage and the number of cells sequenced [10]. Commercially available kits such as the 10x Genomics toolkits help sequence 20,000 cells and detect a few thousand genes per cell, running the risk that smaller cell populations or phenotypes with subtle gene expression changes are left undetected. However, advanced bioinformatic tools can help overcome some of these experimental limitations, as demonstrated by us and others by detecting even minor changes in both gene expression and cell type compositions in mutant mouse embryos [103].
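The trade-off between coverage and cell number, and the risk of missing small populations, can be made tangible with a back-of-the-envelope calculation. The sketch below assumes that cells are sampled independently (a binomial model) and that a fixed read budget is spread evenly over all cells; both are simplifying assumptions, and the numbers used (a population frequency of 0.1 %, a budget of 400 million reads, a minimum of ten cells) are hypothetical.

```python
from math import comb


def prob_at_least(k, n_cells, frequency):
    """P(at least k cells from a population of the given frequency among n_cells)."""
    p_fewer = sum(
        comb(n_cells, i) * frequency**i * (1 - frequency) ** (n_cells - i)
        for i in range(k)
    )
    return 1 - p_fewer


def reads_per_cell(total_reads, n_cells):
    """Average sequencing depth per cell for a fixed read budget."""
    return total_reads / n_cells


# Hypothetical scenario: 20,000 cells, a population making up 0.1 % of the tissue.
p = prob_at_least(10, 20_000, 0.001)
print(f"P(at least 10 cells of the rare population) = {p:.3f}")
print(f"Average reads per cell: {reads_per_cell(400_000_000, 20_000):.0f}")
```

Sequencing more cells raises the chance of capturing a rare population but lowers the depth per cell for the same budget, which is the trade-off noted above. The direct summation is intended for small values of `k` only.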

Conclusions
In basic research, sc-seq technologies have been widely established as a toolbox to query developmental processes and disease mechanisms with unprecedented sensitivity and granularity. Despite advances at breakneck speed over the last decade, the technology is in many ways still nascent. That is, the data created and conclusions drawn do not yet suffice as a "one-experiment proof" and continue to require reinforcement by additional validation. There are indications of increased transfer of sc-seq to a number of fields, including human genetics, with tens of genetic diseases already characterized with this technology (Table 1). With sequencing and library preparation costs rapidly dropping, the protocols being standardized [113], and bioinformatic tools becoming more accessible, the barriers to translation are rapidly vanishing. The ultimate promise of the technology for the field of human genetics is to offer the means for massively parallel functional variant testing in vitro and at some stage also in vivo. While we are not there yet, we expect the technology to be ripe for adoption within the present decade. Education will ultimately play a key role in realizing this, and we hope this review has contributed towards informing this readership of human geneticists about the current state of this recent and booming technology.
Research funding: M.S. is a DZHK principal investigator and is supported by grants from the Deutsche Forschungsgemeinschaft (DFG) (SP1532/3-1, SP1532/4-1, and SP1532/5-1) and the Deutsches Zentrum für Luft- und Raumfahrt (DLR 01GM1925). Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission. Competing interests: Authors state no conflict of interest. Informed consent: Not applicable. Ethical approval: Not applicable.

Figure 1: The experimental workflow of sc-seq. (A) The workflow starts with the dissociation of the biopsy samples into a cellular or nuclear suspension. (B) This is followed by cellular barcoding using droplet- or well-based technologies that enable pooled sequencing of the molecules from all the cells. (C) The final step is sequencing. Although long-read sequencing is rarely used in current analysis workflows, it has potential for identification of splice variants in the case of sc-transcriptome-seq or structural variants in the case of sc-genome-seq as well as to reduce sequencing costs [22]. Adapted from [10].

Figure 2: Analysis of sc-seq data for early disease detection. (A) The sc-seq data obtained from a brain biopsy can be visualized in the form of a UMAP embedding, where each cell is represented by a dot. Cells (dots) with similar phenotypes (transcriptome or epigenetic marks) cluster together, and the clusters are assigned to cell types based on prior knowledge. (B) Clusters in A might contain cells undergoing transitions, such as differentiation or the cell cycle, which can be visualized by means of trajectory analysis. Here, cells that deviate from established healthy trajectories may aid in early disease detection. Based on [10] and [2]. Note: Synthesized data.

Figure 3: Wild-type atlases of human tissues and their applications in human genetics diagnostics. (A) Single-cell atlases of many human organs and tissues are publicly available, some of which are highlighted here. Blue and red colors indicate the modality of sc-seq included in the dataset. Detailed information related to these atlases can be found in Supplementary Table 1. (B) Schematic of a use case scenario for the diagnosis of polycystic kidney disease, depicting the identification of candidate genes using a standard exome diagnostics workflow. This gene list can be further filtered based on expression analysis in the relevant (e. g., whole embryo or kidney) publicly available cell atlases. In the portrayed scenario, Gene C, expressed in the affected organ (brown arrowhead), will be prioritized for diagnostics. Note: Synthesized data.

Table 1: A list of sc-seq-based studies to phenotype and elucidate monogenic and chromosomal disorders.