Fast forward evolution in real time: the rapid spread of SARS-CoV-2 variant of concern lineage B.1.1.7 in Saxony-Anhalt over a period of 5 months

Objectives: Random mutations and recombination are the main sources of genetic diversity in SARS-CoV-2, and mutations in the receptor binding motif (RBM) of the SARS-CoV-2 spike (S) protein carry a high potential for the emergence of new putative variants under investigation (VUI) or variants of concern (VOC). Monitoring the circulating SARS-CoV-2 lineages is therefore important, and we established whole genome sequencing (WGS) of circulating SARS-CoV-2 lineages in order to set up a regional SARS-CoV-2 surveillance program. Methods: We performed WGS on SARS-CoV-2-positive specimens collected over a period of 5 months at three different sites: a tertiary university hospital and two private laboratories within an urban area of eastern Germany. Specimens were obtained both from patients with COVID-19 and from outpatients without any clinical signs or symptoms. Results: Viral WGS of 364 respiratory specimens with positive SARS-CoV-2 RT-PCR identified 16 different SARS-CoV-2 lineages. The majority of the sequences (252/364, 69%) were assigned to the VOC Alpha (B.1.1.7). This variant first appeared in our samples in February and quickly became the dominant variant. All SNP-PCR results were confirmed by WGS. Other VOCs found in our cohort were Beta (B.1.351, n=2) and Delta (B.1.617.2, n=1). Conclusions: Lineage analysis revealed 16 different virus lineages among the 364 respiratory samples analyzed by WGS. The number of distinct lineages decreased dramatically over time, leaving only a few, in particular the VOC Alpha (B.1.1.7). Closer inspection of point mutations revealed several distinct mutations in the viral spike protein that have been reported to increase affinity for the human ACE2 receptor or to enable immune escape.
Within a study period of only 5 months, SARS-CoV-2 lineage B.1.1.7 became the dominant lineage in our study population. This study emphasizes the benefit of SARS-CoV-2 testing by WGS. The increasing use of WGS to sequence the entire SARS-CoV-2 genome will reveal additional VUIs and VOCs with the potential to evade the immune system and will thus be a promising tool for mining epidemiologically relevant information. SARS-CoV-2 lineage monitoring using WGS is an important surveillance tool for the early detection of emerging lineages of concern.


Introduction
As in other RNA viruses, random mutations and recombination are the main sources of genetic diversity in SARS-CoV-2 [1]. Because this can give rise to new, potentially dangerous mutations, whole genome sequencing (WGS) of SARS-CoV-2 has been identified as an important surveillance technology [2]. Viral entry into human cells is mediated by the spike protein, encoded by ORF S [3]. Mutations in the SARS-CoV-2 spike receptor binding motif may give rise to new variants of concern (VOC) [4]. For example, the spike point mutation D614G led to a substantial increase in the infectivity of SARS-CoV-2, and viruses carrying this mutation became dominant in early 2020 [5,6]. Another spike mutation that markedly increased infectivity is the N501Y substitution, which is present in variants of concern such as B.1.1.7 and B.1.351 [7]. In this study, we established a surveillance program for circulating SARS-CoV-2 lineages, in particular the VOCs Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1) and Delta (B.1.617.2), in the area of Halle (Saale), located in the German federal state of Saxony-Anhalt. We sequenced the viral genomes of samples collected between January and May 2021 by tiled amplicon sequencing. While the lineage distribution was highly diverse at the beginning of the collection period, VOC B.1.1.7 first appeared in February and then rapidly became the dominant form of the virus in our samples.

Materials and methods
Sample collection
Respiratory specimens were collected from COVID-19 patients, and routine nasopharyngeal swabs were collected from patients without any clinical signs or symptoms, in a tertiary university hospital within an urban area of eastern Germany. Real-time reverse transcription PCR (RT-PCR) was performed on patients' respiratory samples to determine the viral genome copy number according to our internal standard operating procedures. For SARS-CoV-2-positive samples, a single nucleotide polymorphism (SNP) discrimination assay was used to detect the N501Y and del69/70 variants. Only SARS-CoV-2-positive samples with a Ct value <25 were used for whole genome sequencing.
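The sample triage described above (Ct cutoff plus SNP pre-screen) can be sketched in Python as follows. This is a hypothetical illustration, not the laboratory's software: the `Sample` record, its field names and the marker-interpretation rules are assumptions. The rules rely on the general observation that B.1.1.7 carries both N501Y and del69/70, whereas B.1.351 and P.1 carry N501Y without del69/70; del69/70 also occurs in other lineages, so the SNP result is only a preliminary call that WGS has to confirm.

```python
from dataclasses import dataclass

# Hypothetical record for one RT-PCR-positive specimen (field names are illustrative).
@dataclass
class Sample:
    sample_id: str
    ct: float        # cycle threshold (Ct) of the diagnostic RT-PCR
    n501y: bool      # SNP assay: N501Y allele detected
    del69_70: bool   # SNP assay: deletion H69/V70 detected

CT_CUTOFF = 25.0  # from the text: only samples with Ct < 25 were sequenced

def eligible_for_wgs(sample: Sample) -> bool:
    """True if the sample passes the Ct cutoff for whole genome sequencing."""
    return sample.ct < CT_CUTOFF

def snp_assay_call(sample: Sample) -> str:
    """Preliminary interpretation of the two SNP markers (assumed rules)."""
    if sample.n501y and sample.del69_70:
        return "suspected B.1.1.7"
    if sample.n501y:
        return "N501Y without del69/70 (e.g. B.1.351 or P.1)"
    if sample.del69_70:
        return "del69/70 without N501Y"
    return "neither marker detected"

# Hypothetical toy specimens.
samples = [
    Sample("S1", 18.2, True, True),
    Sample("S2", 27.5, False, False),
    Sample("S3", 22.0, True, False),
]
to_sequence = [s.sample_id for s in samples if eligible_for_wgs(s)]
```

Here `to_sequence` keeps S1 and S3; S2 is excluded because its Ct value is above the cutoff. Keeping the cutoff as a named constant makes it easy to audit against the protocol.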

NGS library preparation, sequencing and data processing
Sequencing libraries were constructed from total RNA extracts using the Illumina COVIDSeq Test kit (Illumina, San Diego, CA, US). Briefly, first-strand cDNA was synthesized and served as the template for two distinct PCR reactions, which used primer sets generating amplicons that together cover the whole SARS-CoV-2 genome. The amplicons were then tagged with adapter sequences, followed by a further PCR that added index sequences. A pooled library of all samples was sequenced on an Illumina NextSeq 1000 (Illumina, San Diego, CA, US) in either single-end 100 nt or paired-end 50 nt mode.
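Tiled amplicon schemes are commonly split into two PCR reactions because neighbouring amplicons overlap, and overlapping primer pairs in the same reaction would favour short products spanning only the overlap; alternating amplicons between two pools keeps each pool free of overlaps while the union still tiles the genome. The sketch below illustrates this general principle, which is assumed here to underlie the kit's two reactions. The coordinates are hypothetical and are not the COVIDSeq primer scheme; the function names are illustrative.

```python
from typing import List, Tuple

GENOME_LENGTH = 29_903  # length of the reference genome NC_045512.2

Amplicon = Tuple[int, int]  # (start, end), 1-based and inclusive

def assign_pools(amplicons: List[Amplicon]) -> Tuple[List[Amplicon], List[Amplicon]]:
    """Alternate position-sorted amplicons between two pools so that
    overlapping neighbours are amplified in separate PCR reactions."""
    ordered = sorted(amplicons)
    return ordered[0::2], ordered[1::2]

def has_overlap(pool: List[Amplicon]) -> bool:
    """True if any two amplicons in the pool overlap.
    After sorting by start, checking adjacent pairs is sufficient."""
    ordered = sorted(pool)
    return any(prev_end >= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))

def coverage_gaps(amplicons: List[Amplicon], length: int = GENOME_LENGTH) -> List[Amplicon]:
    """Return genome intervals that no amplicon covers."""
    gaps: List[Amplicon] = []
    next_uncovered = 1
    for start, end in sorted(amplicons):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= length:
        gaps.append((next_uncovered, length))
    return gaps

# Toy scheme on a 1,100 nt "genome" (hypothetical coordinates).
scheme = [(1, 400), (300, 700), (600, 1000)]
pool_1, pool_2 = assign_pools(scheme)
```

In the toy scheme, each pool is overlap-free, but the three amplicons leave positions 1001 to 1100 uncovered. Note that strict alternation only guarantees overlap-free pools when each amplicon overlaps just its immediate neighbours, which is how tiled schemes are usually designed.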
Publicly available SARS-CoV-2 genome sequences from all of Germany collected between January and May 2021 were obtained from GISAID (gisaid.org; [13]). Only complete genome sequences not flagged as low coverage were used. Lineage assignments were obtained as described above for our own samples.
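The filter applied to the public genomes can be expressed as a simple predicate. The record type and its field names below are hypothetical stand-ins, not GISAID's actual metadata columns, and the sketch assumes complete collection dates: partial dates such as "2021-05" would not compare correctly as strings.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical, simplified metadata record for a downloaded genome.
@dataclass
class GenomeRecord:
    accession: str
    country: str
    collection_date: str  # full ISO date "YYYY-MM-DD"
    is_complete: bool
    low_coverage: bool

def select_records(records: Iterable[GenomeRecord], country: str = "Germany",
                   start: str = "2021-01-01", end: str = "2021-05-31") -> List[GenomeRecord]:
    """Keep complete, not low-coverage genomes from one country and date range.
    Full ISO dates compare correctly as plain strings."""
    return [r for r in records
            if r.country == country
            and start <= r.collection_date <= end
            and r.is_complete
            and not r.low_coverage]

# Hypothetical toy records.
records = [
    GenomeRecord("rec1", "Germany", "2021-02-14", True, False),
    GenomeRecord("rec2", "Germany", "2021-03-02", True, True),    # low coverage
    GenomeRecord("rec3", "Germany", "2020-12-30", True, False),   # outside period
    GenomeRecord("rec4", "Austria", "2021-04-01", True, False),   # other country
]
kept = [r.accession for r in select_records(records)]
```

Only `rec1` passes all four conditions in this toy example.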

Variant distribution
In total, 364 respiratory specimens from different patients with positive SARS-CoV-2 RT-PCR were collected between January and May 2021 and subsequently sequenced. Patients were either hospitalized because of respiratory symptoms due to COVID-19 or were outpatients who were swabbed for surveillance purposes. The sequenced samples comprised 16 different SARS-CoV-2 lineages. During January and February 2021, we observed a rather diverse mixture of SARS-CoV-2 lineages among our samples (Figure 1A and B). In February 2021, we observed the variant of concern (VOC) B.1.1.7, also referred to as Alpha, for the first time. Strikingly, the fraction of this variant increased tremendously, and B.1.1.7 quickly became the dominant lineage in our samples (Figure 1C-F). This rise of B.1.1.7 was also evident in SARS-CoV-2 genome sequences from all of Germany that were collected in the same period and submitted to GISAID (gisaid.org). However, B.1.1.7 appears to have arrived earlier in other regions of Germany, since this variant had already started to become dominant nationwide in February 2021 (Supplemental Material, Figure S1). All samples classified as B.
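The monthly lineage fractions and the point at which B.1.1.7 became dominant can be computed from per-sample lineage calls as sketched below. This is an illustration under stated assumptions: "dominant" is taken to mean more than 50% of a month's samples, the function names are hypothetical, and the input is toy data, not the study's samples. Running the same function on the local and the national series and comparing the returned months would quantify the time lag described above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

def monthly_fractions(calls: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """calls: (month "YYYY-MM", lineage) per sample.
    Returns month -> {lineage: fraction of that month's samples}."""
    by_month: Dict[str, Counter] = defaultdict(Counter)
    for month, lineage in calls:
        by_month[month][lineage] += 1
    result: Dict[str, Dict[str, float]] = {}
    for month, counts in sorted(by_month.items()):
        total = sum(counts.values())
        result[month] = {lin: n / total for lin, n in counts.items()}
    return result

def first_dominant_month(fractions: Dict[str, Dict[str, float]],
                         lineage: str = "B.1.1.7",
                         threshold: float = 0.5) -> Optional[str]:
    """First month in which `lineage` makes up more than `threshold` of samples."""
    for month in sorted(fractions):
        if fractions[month].get(lineage, 0.0) > threshold:
            return month
    return None

# Hypothetical toy data, not the study's samples.
calls = [("2021-01", "B.1.177"), ("2021-01", "B.1.160"),
         ("2021-02", "B.1.1.7"), ("2021-02", "B.1.177"),
         ("2021-03", "B.1.1.7"), ("2021-03", "B.1.1.7"), ("2021-03", "B.1.258")]
fractions = monthly_fractions(calls)
```

In the toy data, B.1.1.7 reaches exactly 50% in February, which does not exceed the threshold, so `first_dominant_month(fractions)` returns "2021-03".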

Mutation analysis
Mutations were determined by pairwise sequence alignment (PSA) between the SARS-CoV-2 reference genome (RefSeq NC_045512.2) and the respective consensus sequences. The median number of single nucleotide variations (SNVs) detected per consensus sequence increased over time from 20 in January to 39 in May 2021. This increase coincides with the rise of B.1.1.7; however, the numbers also increased slightly when only B.1.1.7 genome sequences were considered, indicating ongoing evolution of the virus (Figure 2A).
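A minimal sketch of SNV counting from a pairwise alignment, assuming the alignment has already been computed (the aligner itself is not shown) and is given as two gapped strings of equal length. Gaps and ambiguous bases such as N are skipped, which is an assumption about how uncovered positions were treated; the function names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

NUCLEOTIDES = set("ACGT")

def call_snvs(ref_aln: str, cons_aln: str) -> List[Tuple[int, str, str]]:
    """Return (reference position, reference base, consensus base) for every SNV
    in a pairwise alignment; gaps and ambiguous bases (e.g. N) are skipped."""
    if len(ref_aln) != len(cons_aln):
        raise ValueError("aligned sequences must have equal length")
    snvs: List[Tuple[int, str, str]] = []
    ref_pos = 0
    for r, c in zip(ref_aln.upper(), cons_aln.upper()):
        if r != "-":
            ref_pos += 1  # 1-based coordinate on the reference; insertions do not advance it
        if r in NUCLEOTIDES and c in NUCLEOTIDES and r != c:
            snvs.append((ref_pos, r, c))
    return snvs

def median_snvs_per_month(samples: Dict[str, Tuple[str, str, str]]) -> Dict[str, float]:
    """samples: sample id -> (month, ref_aln, cons_aln).
    Returns month -> median number of SNVs per consensus sequence."""
    counts: Dict[str, List[int]] = {}
    for month, ref_aln, cons_aln in samples.values():
        counts.setdefault(month, []).append(len(call_snvs(ref_aln, cons_aln)))
    return {month: median(values) for month, values in sorted(counts.items())}
```

For example, `call_snvs("ACGT-ACGT", "ATGTTACNT")` reports only the C-to-T change at reference position 2: the inserted T is skipped because the reference has a gap there, and the N is skipped as ambiguous.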
By far the most common SNVs were C-to-U transitions (42% of all SNVs, Figure 2B). This agrees with previous reports and might indicate the activity of RNA-editing enzymes of the human host, such as APOBEC or ADAR proteins [14,15].
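A substitution spectrum such as the one in Figure 2B can be tallied from SNV calls as sketched below. Two assumptions are made: SNVs are pooled across all consensus sequences before computing fractions, and consensus sequences are written in the DNA alphabet, so a C-to-T call corresponds to C-to-U in the RNA genome. The function names and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SNV = Tuple[int, str, str]  # (reference position, reference base, alternative base)

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def substitution_spectrum(snvs: Iterable[SNV]) -> Dict[str, float]:
    """Fraction of each ref>alt substitution type across all SNVs.
    T is reported as U because the viral genome is RNA."""
    to_rna = str.maketrans("T", "U")
    counts = Counter(f"{ref}>{alt}".translate(to_rna) for _, ref, alt in snvs)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.most_common()}

def transition_fraction(snvs: Iterable[SNV]) -> float:
    """Fraction of SNVs that are transitions (0.0 for an empty input)."""
    snvs = list(snvs)
    if not snvs:
        return 0.0
    return sum((ref, alt) in TRANSITIONS for _, ref, alt in snvs) / len(snvs)

# Hypothetical pooled SNV calls.
pooled = [(101, "C", "T"), (202, "C", "T"), (303, "G", "T"), (404, "A", "G")]
spectrum = substitution_spectrum(pooled)
```

In this toy input, C>U accounts for half of all SNVs and three of the four changes are transitions.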
We found several mutations in the spike protein (S) that have either been reported to increase the affinity for the human ACE2 receptor or are suspected to confer immune-evasive potential on the virus.
The SNV causing the S:D614G substitution was found in all 364 samples. This mutation has been reported to increase infectivity, and viruses harboring it rapidly became the most prevalent form [6].
The S:N501Y mutation was found in all but one of the samples classified as B.1.1.7 (n=251) and in both samples classified as B.1.351, but in no other sample. This mutation has also been reported to increase infectivity [7] and is characteristic of the VOCs B.1.1.7 and B.1.351.
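Translating an SNV position into a spike substitution such as D614G only requires the start coordinate of the S open reading frame and the standard genetic code. The sketch below assumes the NCBI annotation of NC_045512.2, in which the S gene spans positions 21,563 to 25,384. The caller supplies the reference codon, which in practice would be read from the reference sequence; the codons in the examples are given for illustration, and the function names are hypothetical.

```python
from typing import Tuple

# Spike (S) ORF coordinates in RefSeq NC_045512.2 (1-based, inclusive).
S_START, S_END = 21_563, 25_384

# Standard genetic code, indexed by first/second/third base in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def spike_codon(pos: int) -> Tuple[int, int]:
    """Map a genome position to (spike residue number, position in codon 0-2)."""
    if not S_START <= pos <= S_END:
        raise ValueError(f"position {pos} is outside the S gene")
    offset = pos - S_START
    return offset // 3 + 1, offset % 3

def annotate_spike_snv(pos: int, ref_codon: str, alt_base: str) -> str:
    """Name the amino-acid change (e.g. 'D614G') caused by an SNV at `pos`,
    given the reference codon that contains `pos`."""
    if len(ref_codon) != 3:
        raise ValueError("ref_codon must have length 3")
    residue, frame = spike_codon(pos)
    alt_codon = ref_codon[:frame] + alt_base + ref_codon[frame + 1:]
    return f"{CODON_TABLE[ref_codon]}{residue}{CODON_TABLE[alt_codon]}"
```

With these coordinates, an A-to-G change at position 23,403 falls into the middle base of codon 614 and, for a GAT codon, yields `annotate_spike_snv(23403, "GAT", "G") == "D614G"`; likewise, A23063T in the first base of codon 501 turns AAT into TAT (N501Y).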
Another notable mutation, S:S477N, was found exclusively in the 11 samples classified as B.1.160 or B.1.160.31. This mutation has been associated with resistance to neutralisation by multiple monoclonal antibodies [16].
Another point mutation in the receptor binding domain (RBD) of the spike protein, N439K, has been reported to enhance binding affinity to the human ACE2 receptor [17]. Among other SARS-CoV-2 variants, this mutation occurs in lineage B.1.258, and we found it exclusively in the three samples classified as B.1.258. The S:L452R point mutation was found in two samples, one classified as C.36 and the other as B.1.617.2. This mutation has been reported to strengthen binding affinity and to enable escape from neutralising antibodies [18].
In the C.36 sample, we also found another previously reported spike SNV, Q677H. So far, it has not been shown to alter virulence; however, owing to its location close to the S1/S2 furin cleavage site, it has been speculated to influence viral properties [19].
Two different reported substitutions of glutamate 484 of the spike protein were found: E484K and E484Q. The former was found exclusively in the two B.1.351 samples, whereas the latter was found in one sample classified as B.1.1.7 and in another classified as B.1.177.86. Both mutations have been described to reduce antibody neutralisation [20,21]. Table 1 summarizes the findings on mutations of the spike protein.
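Statements such as "found exclusively in" or "in all but one sample" follow from cross-tabulating mutations against lineages. A minimal sketch with hypothetical function names and toy input (not the study's data): a mutation is lineage-exclusive when its inner dictionary has a single key, and `carrier_fraction` gives the share of a lineage's samples that carry it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SampleCall = Tuple[str, Set[str]]  # (lineage, spike substitutions found in the sample)

def mutation_by_lineage(samples: List[SampleCall]) -> Dict[str, Dict[str, int]]:
    """Return mutation -> {lineage: number of samples carrying the mutation}."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for lineage, mutations in samples:
        for mutation in mutations:
            table[mutation][lineage] += 1
    return {mutation: dict(counts) for mutation, counts in table.items()}

def carrier_fraction(samples: List[SampleCall], mutation: str, lineage: str) -> float:
    """Fraction of samples of `lineage` that carry `mutation`."""
    in_lineage = [mutations for lin, mutations in samples if lin == lineage]
    if not in_lineage:
        raise ValueError(f"no samples of lineage {lineage}")
    return sum(mutation in mutations for mutations in in_lineage) / len(in_lineage)

# Hypothetical toy calls, not the study's data.
samples = [
    ("B.1.1.7", {"D614G", "N501Y"}),
    ("B.1.1.7", {"D614G", "N501Y", "E484Q"}),
    ("B.1.351", {"D614G", "N501Y", "E484K"}),
    ("B.1.258", {"D614G", "N439K"}),
]
table = mutation_by_lineage(samples)
```

In this toy set, E484K is exclusive to B.1.351, while N501Y is shared by B.1.1.7 and B.1.351.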

Discussion
As reported in previous studies, there is a high need for SARS-CoV-2 lineage detection to inform further COVID-19 infection control measures. This was shown in particular in the initial SARS-CoV-2 outbreak report from Hamburg [22], but also in various other outbreak publications, e.g. from the largest German meat processing plant, located in North Rhine-Westphalia [23], and from the district of Heinsberg in North Rhine-Westphalia [24], as well as in seroprevalence studies [25].
Applying WGS allowed us to infer virus lineages directly instead of testing for only a few variants of interest via single nucleotide polymorphism (SNP) PCR [26]. SARS-CoV-2 WGS was able to determine the whole set of circulating SARS-CoV-2 lineages in the local region of interest. Furthermore, WGS allowed us to identify and investigate all mutations in the viral consensus sequences, not only those used to define lineages. This provided deeper insight into the evolution of SARS-CoV-2 and the distribution of individual mutations. SARS-CoV-2 WGS from the urban area in eastern Germany studied here showed a highly variable lineage distribution that changed from a rather diverse population to one highly dominated by the VOC Alpha (B.1.1.7) within only a few months. This development resembled that of virus lineages in Germany as a whole during the same period, as reflected by publicly available data from GISAID. However, the spread of B.1.1.7 in our area lagged behind the national data, suggesting that this variant reached other regions of the country earlier. This is in line with a previous report in which SARS-CoV-2 spread was first seen in western and southern regions of Germany, where early mobility restrictions were implemented [27]. Mobility reductions following lockdown measures were not homogeneous across Germany; mobility was reduced more strongly in the western and southern states than in the eastern states (e.g. Saxony-Anhalt). The VOC Gamma (P.1) was not detected in our study; however, Beta (B.1.351) and Delta (B.1.617.2) were detected in two patient samples and one patient sample, respectively.
Thus, our study also underlines the need for regional surveillance programs in order to react quickly to changes in virus lineage distribution, since differences in virus genotype can affect viral load, support continuous shedding of the virus, promote the spread of SARS-CoV-2 and might affect the effectiveness of vaccination strategies against SARS-CoV-2.
Acknowledgments: The authors would like to thank Anke Cachay, Monique Dreher, Anastassia Teuber, Gabriele Unger and Nancy Wurg (Department of Laboratory Medicine, Unit III, Molecular Diagnostic Section, Halle University Hospital) for specimen handling, SARS-CoV-2 RNA isolation/inactivation and SNP-PCR, as well as Michael Böttcher and Ghanem El Kassem (Charles Tanford Protein Centre, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany) for cooperation with the Illumina MiniSeq. We further acknowledge the authors and the originating and submitting laboratories of the sequences from GISAID's hCoV-19 Database (see Supplemental Material, Table S1).