Mutations in the SARS CoV-2 spike protein may cause functional changes in the protein quaternary structure [SARS CoV-2 spike proteinindeki

Objectives: This study aimed to model the changes result-ing from mutations in surface (spike/S) glycoproteins, which play a key role in the entry of the severe acute respiratory syndrome coronavirus-2 (SARS CoV-2) into host cells, in a protein quaternary structure and to evaluate their possible effects on the functional structure. Methods: Genome sequence information of SARS CoV-2-infected patients located in Turkey was obtained from the GISAID EpiCoV database. Structural analysis of spike proteins was done using bioinformatics tools (MAFFT, PSIPRED, ProMod3, PyMoL and DynOmics). Results: We identified 76 Thr>Ile mutations in the N -ter-minal domain; 468 Ile>Val mutations in the receptor binding site and 614 Asp>Gly, 679 Asn>Lys, 771 Ala>Val and 772 Val>Ile mutations in the S1 subunit. It has been observed thatthemutations,exceptthoseofresidues771and772,may cause signi ﬁ cant conformational, topological and electrostatic changes in a protein quaternary structure. It has been determined that the mutations in the receptor binding site transform the protein structure into a formation that can mask the binding site and affect receptor af ﬁ nity. Conclusions: It has been considered that SARS CoV-2 S glycoprotein mutations may cause changes in a protein functional structure that can affect the severity of disease.


Introduction
Coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus-2 (SARS CoV-2) has affected more than five million people and caused deaths of more than 340 thousand people in a short period of 6 months [1,2]. The COVID-19 agent is a positive-polarity, single-strand RNA virus [3]. Although there remain controversies about the origin of the virus and its intermediate host, it has been considered that the origin of the virus may be bats owing to high genetic similarities [4,5]. The genome size of the virus is 29.9 kb in total, comprising 12 protein-coding sequences [6]. These proteins play a variety of functional roles, from replication of the viral genome to virulence severity. The one with the most critical role among them is the surface (spike/S) glycoprotein structure that binds to the human angiotensin-converting enzyme 2 (hACE2) receptor and initiates the entry of the virus into the host cell [7,8]. Coronavirus S glycoproteins exhibit a homotrimeric structure with each monomer comprising two subunits (S1 and S2) [9]. Several independent studies on the SARS CoV-2 S glycoprotein structure have been recently published, but the molecular mechanism of interaction between the S glycoprotein receptor binding site and hACE2 remains unclear [10].
Although there is no definitive treatment for COVID-19, some drugs (such as remdesivir, Ebola; chloroquine, malaria and favipiravir, influenza) that are indicated for different diseases are used in the treatment of disease [11]. In the event that three-dimensional structures of virus proteins are not fully known, traditional studies remain limited. Because the discovery of a new molecule for the treatment of the disease requires a long time, the replacement of existing drugs can provide promising data [12,13]. At this stage, supporting conventional studies, which require a long time and large financial resources, with computational studies will contribute to reaching the target in a short time. In addition, due to the high mutation potential of RNA virus genomes, the risk of drug resistance is an important issue to be considered in the studies based on the genetic and protein structure [13,14]. Non-static drug target-validation of molecules to be developed for structural proteins-is one of the most significant challenges in fighting a disease. Computational studies as well as modeling of mutational changes in a protein quaternary structure in a very short time can provide important data for pharmaceutical drug studies.
In our study, the changes in the protein structure of mutations in the sequence encoding S glycoprotein from the Turkish SARS CoV-2 isolates, which is a significant drug target, were modeled. The results that may be ascertained by mutations in the functional structure of the protein have been evaluated in terms of changes in virulence effect and risk of drug resistance.

Sequence data and mutation analysis
Nucleotide sequence information of 61 isolates from Turkey were obtained from GISAID EpiCoV database [15]. S glycoprotein nucleotide sequences were transformed into protein sequences with MegaX software [16]. Reference spike glycoprotein accession code is YP_009724390 [17]. Protein sequence information of 61 isolates were aligned with the MAFFT (v7.463) multiple sequence alignment program FFT-NS-2 algorithm [18]. The scoring matrix BLOSUM 80 was chosen for the amino acid sequences [19]. Gap opening penalty was used 1.53. The mutated residues were analyzed with MegaX bioinformatic workbench [16].

Homology modeling and structural analysis
Three-dimensional model of wild and mutant spike proteins was generated by the method of homology modeling using Swiss-Model [20]. 6vxx (protein data bank code) was selected as template. Models are built based on the target-template alignment using ProMod3 [21]. MolProbity and QMEAN (Qualitative Model Energy Analysis) were used for structural validation and model quality (covalent geometry, torsion angle, optimized hydrogen placement and whole atom contact analysis) of wild and mutant spike proteins [22,23]. Physiochemical properties of wild and mutant spike proteins were estimated by Prot-Param tool from ExPASy portal [24]. Secondary structure component (random coils, beta strands alpha helices) of spike protein were identified by using PSIPRED web server [25]. Protein structure imaging and comparison was performed with PyMoL (Ver2.3.4 Schrödinger).
TM-score and RMSD (root mean square deviation) values were calculated with i-Tasser web service to detect topological and structural differences between wild and mutant S proteins [26]. The changes caused by mutations in the dynamic structure of proteins were simulated in elastic network models (ENM). After mutation, the movement and dynamic changes of molecules in the S glycoprotein structure were analyzed by DynOmics. By comparing the experimentally crystallographic data with the wild and mutant model data obtained in this study, both the collective motion of the proteins and the mean square fluctuation of each residue were calculated by Gausian network model (GNM) and anisotropic network model (ANM) [27].

Results
A total of six mutations have been determined in the S glycoproteins from the Turkish SARS CoV-2 isolates. We identified 76 Thr>Ile mutations in the N-terminal domain; 468 Ile>Val mutations in the receptor binding site and 614 Asp>Gly, 679 Asn>Lys, 771 Ala>Val and 772 Val>Ile mutations in the S1 subunit. Mutant S glycoprotein sequence information was established in the MegaX software using the identified mutation data. Secondary structure prediction for wild type and mutant S glycoprotein was performed using the PSIPRED workbench. Furthermore, 65 strands, 20 helices and 84 coils were identified in the wild type glycoproteins, whereas 62 strands, 25 helices and 87 coils were identified in the mutant glycoproteins. The wild type and mutant protein models for SARS CoV-2 S glycoprotein were created using the ProMod3 (ver3.0.0). The coverage value and sequence identity ratio of pattern 6vxx in the wild type structure, which is used for the homotrimeric SARS CoV-2 S glycoprotein model, were 0.94 and 99.50%, respectively. The coverage value and sequence identity ratio in the mutant structure were 0.94 and 99.08%, respectively. The quality of the model was evaluated using the QMEAN and MolProbity software. The MolProbity scores in the mutant and wild type structures were 1.39 and 1.42, respectively. The MolProbity score of the model used as a pattern was 2.8. The fact that the MolProbity score in the model is lower than that in the model used as a pattern shows that it is better than the average structures at this resolution [28]. For the wild type protein sequences, the GMQE score was 0.75, the QMEAN score was −2.07, the clash score was 1.57, the Ramachandran favored score was 91.5% and the Ramachandran outliers score was 1.82%. For the mutant protein sequences, the clash score was 1.34, the Ramachandran Favored score was 91.69%, the Ramachandran Outliers was 1.52%, the GMQE score was 0.74 and the QMEAN score was −2.13. Ramachandran plots indicate that the generated protein models have acceptable polypeptide backbone phi (φ) and psi (Ψ) torsion angles for alpha helix and beta strand regions [29]. The quality scores show that the models are obtained within appropriate limits in terms of model quality.
It was found that Thr>Ile mutation in S glycoprotein residue 76 may result in changes in the protein conformation and tertiary structure ( Figure 1). It was observed that 76 Thr>Ile mutations caused changes in the region between 68 Ile and 166 Cys. Furthermore, residues 154-156 (coils in the wild type) transformed into a shorter strand structure in the mutant type ( Figure 2). We observed that Ile>Val mutation in the area where SARS CoV-2 S glycoprotein interacts with hACE2 receptor can result in changes in the tertiary structure between residues 468-490, which exhibit coil structural properties (Figure 3). In the wild type, the distance between 456 Phe and 482 Gly was 23.9 angstroms (Å), whereas in the mutant type, this distance was found to be 17.2 Å, which is closer to the major protein backbone by 6.7 Å. In the same region, the distance  between 476 Gly and 490 Phe in the wild and mutant types were found to be 4.4 Å and 13.8 Å, respectively ( Figure 4). It has been observed that 614 Asp > Gly mutations in S glycoprotein may cause conformational changes between residues 620 and 641 ( Figure 5). It was observed that the structure between residues 623 and 638 with a hairpin appearance in the wild type ( Figure 5A) varied in the mutant protein ( Figure 5B). The distance between residues 623 and 638 in the wild type and mutant proteins was found to be 8 and 10.9 Å, respectively. It has been determined that S glycoprotein 771 Ala>Val and 772 Val>Ile mutations did not exhibit conformational deterioration in these positions,   which may cause a change in the functional structure ( Figure 6). It was observed that the mutations caused changes in the electrostatic structure of S glycoprotein (Figure 7). Between the wild and mutant S proteins, the RMSD and TM-score value were found to be 0.62 Å and 0.975, respectively. Gaussian network model (GNM) and anisotropic network model (ANM) were calculated with DynOmics to analyze the changes in protein dynamic structure ( Figure 9). The B-factor correlation between the experimental crystallographic structure and the wild protein was 0.67, and 0.62 with the mutant protein.

Discussion
Higher infection and mortality rates associated with SARS CoV-2 compared with other pathogen coronaviruses prioritise the clarification of its molecular organisation and pathogenesis.
The present study revealed that the mutation in S glycoproteins in the Turkish SARS CoV-2 isolates may cause changes in the protein quaternary structure, topology and conformation (RMSD 0.62 and TM-score 0.975). S glycoprotein is known to play an important role in the entry of the virus into target cells, cellular transmission and infection severity. The virus, which binds to the host cell receptors with the S1 subunit of S glycoproteins, initiates the membrane fusion process required for the passage of the virion into the host cell via S2 subunit activity [30][31][32]. Two conformational conditions are observed in the S glycoprotein receptor binding site. One of them is down-formation, in which the receptor binding site is hidden/masked, whereas the other is upformation, in which the receptor binding site is accessible and exhibits a less stable structure [33][34][35]. In this study, it was observed that the region that exhibits an upformation-like structure turned into a down-formationlike structure because of the changes caused by 468 Ile>Val mutation between residues 468 and 490 in the region in which the S glycoprotein is bound to hACE2 ( Figure 8). The region with formational change (between residues 468 and 490) covers 8 of 14 residues in which the S glycoprotein interacts with the hACE2 receptor [36]. It is known that the mutations in the receptor binding site have an impact on the S glycoprotein-receptor affinity [37][38][39]. The data obtained in this study point to the structure evolving into a down-formation-like structure. This can result in a decrease in the S glycoprotein receptor affinity, viral infection severity and transmission. Moreover, the data obtained from the study may be associated with the fact that the basic reproduction   number (R0) in Turkey is decreased from double digit numbers to less than -1- [40].
In the present study, it was observed that 614 Asp>Gly and 76 Thr>Ile mutations may cause conformational changes in the region between residues 620 and 641 and between 68 Ile and 166 Cys residues in the S1 subunit of S glycoprotein, respectively. It was observed that residues 154-156 (coils in the wild type) transformed into a shorter strand structure in the mutant type. Following the binding of the S1 subunit to the host cell receptor, large-scale conformational changes occur in its secondary, tertiary and quaternary structures. These changes are intended to ensure the transition of the S2 subunit to a stable conformation following the membrane fusion [34,35]. The conformational changes following 614 Asp>Gly and 76 Thr>Ile mutations are considered to affect the mechanism of such multilayer exchange and the membrane fusion process in the S1 subunit.
The S glycoprotein dynamic network models indicated a decrease in molecular fluctuation in the mutant protein compared to the wild protein ( Figure 9). It is known that the molecular fluctuation frequency is closely related to the functional properties of biomolecules [41,42].
Theoretical fluctuation data of static molecular structures contribute to high accuracy results in the analysis of collective movements and dynamic changes of large biomolecules [43]. The reduction in molecular fluctuation in the receptor binding region of the mutant S glycoprotein can be evaluated from two aspects. Firstly, decreased molecular motion can result in a decrease in ACE2 affinity. Secondly, reduced molecular motion can enable the S-ACE2 complex to transform into a more stable and rigid structure after binding. After mutation, considering the transformation of the receptor binding site into an upformation-like structure, molecular stability and rigid structure may adversely affect the effective coupling of S glycoprotein and ACE2 and the fusion process.
In conclusion, the mutations in the Turkish SARS CoV-2 isolates may result in changes in the conformational and topological structure of S glycoprotein. It is considered that conformational changes in the receptor binding site may result in a decrease in the receptor affinity. The data obtained in our study provide a structural framework to understand the possible effects of changes caused by S glycoprotein mutations in a protein quaternary structure on the functional properties.