Understanding the impacts of self-shuffling approach on structure and function of shuffled endoglucanase enzyme via MD simulations

Objective: We identify the impacts of structural differences on functionality of EG3_S2 endoglucanase enzyme with MD studies. The results of previous experimental studies have been explained in details with computational approach. The objective of this study is to explain the functional differences between shuffled enzyme (EG3_S2) and its native counterpart (EG3_nat) from Trichoderma reseei , via Molecular Dynamics approach. Materials and methods: For this purpose, we performed MD simulations along 30 ns at three different reaction temperatures collected as NpT ensemble, and then monitored the backbone motion, flexibilities of residues, and intramolecular interactions of EG3_S2 and EG3_nat enzymes. Results: According to MD results, we conclude that EG3_S2 and EG3_nat enzymes have unique RMSD patterns, e.g. RMSD pattern of EG3_S2 is more dynamic than that of EG3_nat at all temperatures. In addition to this dynamic-ity, EG3_S2 establishes more salt bridge interactions than EG3_nat. Conclusion: By taking these results into an account with the preservation of catalytic Glu residues in a proper manner, we explain the structural basis of differences between shuffled and native enzyme via molecular dynamic studies. distributed non-randomly. PROVE tests the models’ statistical Z-score deviations from good quality structures in Protein Data Bank (PDB) database [32] by calculating the volumes of atoms in macromolecules. PROCHECK exam-ines the stereochemical qualities of models by generating Ramachandran plots and by comparing the folding states of protein models with the crystal structures in PDB database to validate them statistically. WHATCHECK, derived from some validation algorithms of WHATIF software [33], analyses many stereochemical parameters for models.


Introduction
Among various biological molecules, enzymes are considered as the most remarkable products of natural evolution due to being involved in almost every cycle of life domain. Naturally occurred enzymes are mostly limited to adapt the harsh conditions of biotechnological applications since the needs and conditions of natural reactions are different than that of biotechnologically related ones. Hence, it creates a need for the invention of new methods to generate better catalysts than natural ones [1,2]. Among many existed methods, DNA shuffling is considered as one of promising approach, enabling us to perform directed evolution of enzymes in laboratory environment [3,4], in terms of generation of novel enzymes [2,[5][6][7][8][9].
Cellulase enzymes are able to convert available cellulose and lingo-cellulose to biomass, and this particular ability makes them more attractive and promising to solve the problem about available green energy source problem in all over the world [10]. They have great potential to be used in several fields such as food, brewery and wine, animal feed, textile and laundry, biofuel, and pulp and paper industry [11,12]. They are divided into two broad categories as endoglucanase (EG) and cellobiohydrolase (CB) in terms of mode of actions [13].
In our previous study, we performed self-shuffling approach with Cel12A (EG3) gene belonging to a fungal GH family 12 enzymes [14], being considered as one of five endoglucanase produced by Trichoderma reesei [13,15]. During the self-shuffling approach, 84, 240 and 419 bp, and 80, 220 and 420 bp DNA fragments of EG3_nat reassembled to end up with EG3_S1 and EG3_S2 enzymes [2]. Due to the newly established structure and the full conservation of catalytic resides, we reported EG3_S2 as a promising catalyst than its native counterpart with enhanced enzymatic and thermal stability, and altered kinetic properties [2]. The basis of this study is to elucidate how EG3_S2 is more promising catalyst than its native counterpart from the structural point of view.

Materials and methods
Through I-TASSER, the secondary structure of EG3_S2 enzyme was modeled [16,17]. In crystal structure, several residues in N-terminus of EG3_nat were also missing in PDB ID: 1H8V [14], and this missing part was also modeled through I-TASSER [16,17]. Among generated several models, the best one was selected based on C-scores [16]. Followed by the solvation and ionization of enzymes in Visual Molecular dynamics (VMD) [18], these systems were subjected to model refinement through 2 ns equilibration at 298 K, collected as NPT, after performing 20,000-step energy-minimization with Greedy Algorithm. At the end of refinements, refined models were subjected to 10,000-step energy-minimization with Greedy Algorithm, and then it was followed by 1 ns equilibration at 298 K, collected as NPT ensemble. Thirty nanoseconds production simulations were performed at three different temperatures, 313 K, 323 K and 33 K, with 2-fs time step. All simulations were repeated twice using different velocity seeds and the average results were reported for all cases, e.g. RMSD (backbone), RMSF, the evolution of salt bridges, and the radius of gyration.
All molecular dynamics simulations were performed using the NAMD [19] program with CHARMM 36 all-atom force field [20,21] including the correction maps [22]. Water was described as the TIP3P model [23]. During the NPT simulations, the pressure was kept constant at 1 atm using Langevin pressure coupling and periodic boundary conditions were applied at all dimensions (x, y, z). To calculate the electrostatic interactions, the particle-mesh Ewald (PME) method was used [24,25]. Through VMD, the visualization and analysis of MD trajectories, e.g. RMSD (backbone), RMSF, salt bridge interactions and radius of gyration, were performed [18].
The validation of completed EG3_nat and EG3_S2 structures were realized after modelling in I-TASSER (crude models), after 10,000-step energy minimization and after equilibration for 1 ns, separately, via SAVES v5.0 (http:// servicesn.mbi.ucla.edu/SAVES/) by employing the results of VERIFY 3D [26,27], ERRAT [28], PROVE [29], PROCHECK [30] and WHATCHECK [31]. VERIFY 3D compares a protein model with 3D profile of its amino acid sequences that was generated by using good quality structures. ERRAT does a statistical comparison for the non-bonded interactions between different atom types that were claimed as distributed non-randomly. PROVE tests the models' statistical Z-score deviations from good quality structures in Protein Data Bank (PDB) database [32] by calculating the volumes of atoms in macromolecules. PROCHECK examines the stereochemical qualities of models by generating Ramachandran plots and by comparing the folding states of protein models with the crystal structures in PDB database to validate them statistically. WHATCHECK, derived from some validation algorithms of WHATIF software [33], analyses many stereochemical parameters for models.

Results and discussion
In our previous study, we stated that performing self-shuffling with EG3 (Cel12A) gene has resulted in formation of novel enzyme, EG3_S2, with altered enzymatic and kinetic properties with respect to its native counterpart [2]. Here, we aim to elucidate the structural basis of these particular differentiations that basically determine the kinetic and enzymatic properties.
First, the generated protein models were examined as mentioned in method section and the details for model verifications are provided in Table S1. These results show that VERIFY 3D and PROVE gave a good result for only EG3_nat (pdb id: 1H8V). For ERRAT and PROCHECK results, the further steps from generation to equilibration are required. Of note, there were only a very little percentage of residues in disallowed region of Ramachandran plot for all models (see Table S1). Although none of them was perfect, all models were good enough for work based on WHATCHECK results. Since these analyses are mostly based on the statistical data derived from the good quality structures in PDB database, we obtain worse scores for EG3_S2 compare to EG3_nat. By taking the adequately good quality reports for the structures of EG3_nat and EG3_S2, especially Ramachandran plot analysis as the most sensitive method for validity of a model [34], we make a decision to carry on this study for further.
Upon self-shuffling approach, we obtain EG3_S2 enzyme with novel structural reorganization. For instance, there are reported 2 α-helices and 15 β-sheets in the crystal structures of EG3_nat [14] but 3 α-helices and 16 β-sheets in EG3_S2 enzyme [2]. As in line with expectations, these particular variations result in altered functionality and stability in EG3_S2 enzyme with respect to its counterpart [2], and now we report different RMSD patterns for EG3_S2 and EG3_nat enzymes in Figure 1. As displayed in Figure 1, more dynamic RMSD pattern has been captured for EG3_S2 rather than its native counterpart at all temperatures. Actually for EG3_S2 shuffled enzyme, increase in running temperatures with 10 K intervals does not lead any drastic change in its RMSD pattern. This Understanding the impacts of self-shuffling approach on structure and function fact proves us that EG3_S2 enzyme could easily adopt changes in temperature and hence more dynamic backbone response is captured along 30 ns. Even EG3_S2 and EG3_nat shared 65% identity in the level of amino acid sequence as we stated, the self-shuffling approach has resulted in formation of novel endoglucanase enzyme [2] and this n particular novelty corresponds to the unique RMSD patterns (Figure 1). Since we have already known that thermal stability of EG3_S2 is better than its native counterpart based on our previous experimental study; this particular increase in RMSD pattern is not about the dissociation or unfolding of EG3_S2 but it is about to the way of adopting temperature in a different way. Basically, the full or partial unfolding of enzyme structures are generally associated with an instant peak in RMSD patterns, which means ~2-3 or more fold increase in RMSD pattern within a very shorter time period ~0.02-0.08 ns [35], and RMSD patterns belonging to EG3_S2 could not be accounted as structural unfolding. Moreover, we saved the configurations of EG3_S2 along the MD trajectories, specifically at 10 th ns, 20 th ns and 30 th ns, at different temperatures and we checked the structural integrity of EG3_S2 by performing TM-alignment. As provided in Supplementary Figures 7 and 8, the structural integrity of EG3_S2 are preserved along the trajectories at all running temperatures by obtaining TM-scores as being smaller than 5 Å. Herein, we conclude that the particular increase in RMSD pattern of EG3_S2 is not due to the loss of structural integrity as we previously suggested. As already mentioned in the method section, MD simulations have been carried out with the X-ray structure of EG3_nat with PDB ID: 1H8V [14], and 3D model of EG3_S2 generated by I-TASSER. Thus, we draw a conclusion such that running MD simulations in these conditions would be a reason for this particular difference that we clearly observe in RMSD patterns ( Figure 1).
For native counterpart (EG3_nat) of shuffled enzyme, more flattered RMSD patterns are monitored along 30 ns (Figure 1). For both 313 K and 323 K, more or less same RMSD patterns are captured. Upon increase in temperature from 323 K to 333 K, RMSD values are almost doubled in a gradual manner.
Then, we focus on the RMSF patterns of EG3_S2 and EG3_nat enzymes along 30 ns. Here, we measure the RMSF values of Cα-atoms in backbone rather than RMSF measurements including side-chain atoms. As in line with RMSD patterns of shuffled and native enzymes (Figure 1), the higher RMSF results, especially observed in the first 100 amino acids, are reported for EG3_S2 rather than native enzyme and it also supports the fact of more dynamic behavior of shuffled enzyme compared to EG3_nat. This particular RMSF pattern is also in line with the RMSD pattern displayed in Figure 1. Again for RMSF patterns, we suggest that running EG3_S2 as a modeled protein structure but EG3_nat as a crystal structure could lead this particular difference in the fluctuations of residues.
Since temperature intervals (10/K) are not so high; we do not observe a clear segregation in RMSF patterns of both shuffled and native enzymes but we clearly identify higher mobility in EG3_S2, accounted as its novel structural property. In Figure 2, the most interesting feature is captured in RMSF pattern of native enzyme as indicated with circle. Between Tyr76-Asn107 residues, the flexibilities increase almost up to ~6-fold at 323 K compared to those at 313 K and 333 K. Here, it is crucial to point out that those RMSF values are back to those at 313 K upon increasing temperature from 323 K to 333 K, and it seems to be transient. In Figure S1, we provide the detailed structural information for Tyr76-Asn107 regions as this particular region is composed mainly β-sheet structures and is not covered the active site. Especially, Glu132 is so far from this particular region (see Figure S1). As stated before, the higher RMSF values are reported for EG3_S2 as its novel structural properties as being different that native one. Through the experimental studies, we have already revealed that there is than enzymatic activity of shuffled enzyme compared to its native counterpart [2]. By using modeled protein structure of the shuffled enzyme and crystal structure of native enzyme, we measured the flexibilities of particular residues, in close contact with catalytic ones, and concluded that there is a partial increase in those of EG3_S2 compared to native one [2]. This particular RMSF pattern also confirms the reported results. Previously, we explained this particular increase in EG3_S2 from the thermodynamic point of view as the flexibility of residues modulates the affinity and specificity of biomolecules through underlying energy landscapes [36]. This particular increase in flexibilities of residues results in a decrease in binding affinity of biomolecules since the binding process with increased flexibility might be associated with disorder in bound state of protein [37]. Indeed, binding affinity of protein is directly lowered via the contributions of increased flexibilities to the entropy of protein [38].
As a next step, we perform the multiple alignments of protein sequences belonging to shuffled and native sequences ( Figure S2). The great variations in the amino acid level have been created between EG3_S2 and EG3_nat as provided in Figure S2. Of course, this particular variation has played a role in intramolecular interactions, e.g. salt bridge formations. Compared to EG3_S2, less salt bridge interactions are counted in EG3_nat and this fact could provide us a basis for improved thermal stability in EG3_S2 with respect to its native counterpart, which has been already demonstrated in our previous work [2]. Along 30 ns MD simulations, there exist only three salt bridge interactions in EG3_nat at all three running temperatures, e.g. Asp142-Lys139, Asp187-Lys189 and Asp197-Arg196. Upon increase in temperature from 323 K to 333 K, one more salt bridge interaction exists as Asp63-Lys2 in N-terminus of enzyme.
In Figure 3, we provide the patterns of salt bridge interactions in EG3_nat at 313 K, 323 K and 333 K by indicating the closeness of these particular interactions to catalytic Glu residues. Specifically, Asp142-Lys139 and Asp197-Arg196 interactions are more stable compared to Asp187-Lys189, which is actually more solvent exposed compared to Asp142-Lys139 and Asp197-Arg196. We only report fluctuations for Asp187-Lys189 interactions upon increasing temperature. With VMD, we search for residues in close contact with catalytic Glu132 and Glu216, ~5 Å, and conclude that there is no close contact residues to catalytic Glu, which is involved in any salt-bridge interactions.
As clearly displayed in Figure 3, all these listed salt bridge interactions are not closely located to catalytic site, Glu132 and Glu216. As previously mentioned, one more salt bridge interaction exists with increase in temperature to 333 K, Asp63-Lys2, in N-terminus of EG3_nat. Due to increase in provided kinetic energy to system, the conformational changes have been experienced in enzyme structure and hence residues having a potential to form salt bridge interactions, Asp63-Lys2, is come into close contact. As in line with expectation, this particular Asp63-Lys2 interaction is not so stable over 30 ns trajectory ( Figure S3). It becomes so strong in specific time periods, e.g. 0 th -3 rd ns, 5 th -21 th and 24 th -25 th ns time periods in the range of ~3 Å. But, the higher fluctuated peaks are introduced over the rest of trajectories as being weaker, in the range of ~8-10 Å, and it displays a tendency to become weaker through the end of trajectory ( Figure S3). In fact, the location of salt bridge interactions contributes to stability of enzyme structures in different manners, e.g. salt bridge interactions on surface contribute less than 1 kcal/ mol but those on buried part of enzyme structure contribute more than 4 kcal/mol to stability of enzyme structures [39,40]. If we consider both location and strength pattern of Asp63-Lys2 interaction, we could draw a conclusion that Asp63-Lys2 interaction does not contribute more to the stability of EG3_nat at 333 K. As mentioned above, the self-shuffling approach of EG3 gene leads to generation of EG3_S2 with novel structural properties (Figure 1). The distributions of existed salt bridge interactions in EG3_S2 are also confirmed this fact as almost ~10-fold more salt bridge interactions are existed at EG3_S2 on comparison to native structure for each temperature. Among these many, we just focus on Glu77-Lys76, Glu77-Lys84, Glu243-Arg242, Glu278-Arg253 and Glu335-Lys331, which are consistently existed at all temperatures. As displayed in Figure 4, the patterns of all these interactions are dynamic instead of being so stable, for instance Asp197-Arg196 interaction of EG3_nat ( Figure 3). These particular patterns observed in salt bridge interactions of EG3_S2 could also verify its RMSD pattern, see Figure 1. Only for Glu278-Arg253 interaction of EG3_S2, we monitor the loosing of this interaction depending on increasing temperature.
As being different than EG3_nat, both catalytic Glu residues involve in salt bridge formations; Glu224-Lys263 at 333 K, Glu224-Arg237 at 313 K and 323 K, and Glu308-Arg237 at 323 K and 333 K. In Figure 5, the evolutions of these salt bridge interactions are displayed by indicating their closeness to catalytic site of EG3_S2. Upon providing more kinetic energy to system (333 K), the conformational changes have been undergone by EG3_S2 enzyme and it results in movement of Lys263 close to Glu224 that leads to the formations of salt bridge interaction in middle-strength ( Figure 5). Up to 5 th ns of MD trajectory, this particular Glu224-Lys263 interaction is not so strong but then it reaches to ~4 Å average distance between 5 th and 25 th ns time periods. The similar patterns in evolution of Glu308-Arg237 and Glu224-Arg237 interactions are also reported in Figure 5. During the first 10 th ns of MD trajectory, Glu308-Arg237 (323 K and 333 K) and Glu224-Arg237 (313 K and 323 K) interactions are not  so strong but then these particular residues come closer and the stabilized and strong salt bridge interactions are established up to the end of whole trajectory and to 25 th ns, respectively. For Glu308-Arg237 and Glu224-Arg237 interactions, we observe temperature-dependent loosening in thee through the whole trajectory in Glu308-Arg237 and from 10 th ns to the end of trajectory in Glu224-Arg237. To sum up, these particular interactions existed in active site of EG3_S2 have demonstrated us that the active site of EG3_S2 becomes compacts through the enhancement of intramolecular interactions, even increase in reaction temperature. Actually, these particular interactions could help the preservations of catalytic residues in a proper manner required for effective chemical reaction, and thus we report EG3_S2 in our previous work as a better catalysis than its native counterpart with altered k cat /k m ratio, ~6.5fold [2].
In fact, the adaptation of enzymes to higher temperatures is a complex issue to define, and it has been related to the Coulomb interactions, e.g. ion pairs, salt bridges and hydrogen bonds, and a set of cooperative effects, e.g. the minimization of surface energy, the hydration of nonpolar surface groups, the burying of hydrophobic residues, the optimization of core packing, weak protein-protein and protein-solvent interactions [41][42][43]. Reporting higher thermal stability for EG3_S2 than native in our previous work suggests that these Coulomb interactions and listed cooperative effects are arranged in a better way in EG3_S2 than its native counterpart. The differences in RMSD patterns of EG3_S2 and EG3_nat, and the evolution of existed salt bridge interactions have confirmed this fact.
Lastly, we report the radius of gyration (R g ) measurement results belonging to EG3_S2 and EG3_nat enzymes along 30 ns MD trajectories. Through R g measurement, we implicitly assess the compactness of protein structure as the lowest R g value corresponds to the tightest protein conformation. The results in Figure 6 tell us that EG3_nat displays the tightest conformation even increase in provided kinetic energy of system. These particular R g patterns of EG3_S2 and EG3_nat are in line with RMSD results (Figure 1) that EG3_S2 displays more dynamic pattern, accounted as a unique structural property, with respect to its native counterpart. Here, we suggest that the tendency of EG3_S2 to deviate from its centered mass is actually due to the variations in amino acid sequences between EG3_ S2 and EG3_nat ( Figure S2), and their 3D distributions instead of being partially unfold. The level of variations in amino acid sequences is too high to draw a conclusion for unfolding process by taking R g results into an account.
In sum up, we describe the differences in structural properties of EG3_S2 and EG3_nat enzymes along 30 ns MD simulations. We have already known from our experimental study that self-shuffling approach leads to the formation of novel endoglucanase enzyme, EG3_S2, in terms of enzymatic and kinetic properties. MD simulations help us to clearly elucidate how EG3_S2 is a better catalysis than its native counterpart from the structural point of view. Once again, this study demonstrates us how computational approaches are strong to support experimental results by filling the gaps from other perspectives.