In silico approach to understand epigenetics of POTEE in ovarian cancer

Abstract Ovarian cancer is the third leading cause of cancer-related deaths in India. Epigenetic mechanisms seemingly play an important role in ovarian cancer. This paper highlights the crucial epigenetic changes that occur in POTEE, which is hypomethylated in ovarian cancer. We used the POTEE paralog mRNA sequence to identify major motifs and performed motif enrichment analysis. We identified six motifs of varying lengths, of which only three, CTTCCAGCAGATGTGGATCA, GGAACTGCC, and CGCCACATGCAGGC, were most likely to be present in the nucleotide sequence of POTEE. Through enrichment and occurrence analyses, we identified the best-match motif as CTTCCAGCAGATGT. Since there is no experimentally verified structure of the POTEE paralog, we predicted the POTEE structure using an automated workflow for template-based modeling that relies on a deep neural network. Additionally, to validate our predicted model, we used the AlphaFold-predicted POTEE structure and observed that the stretch of residues 237-958 had a very high per-residue confidence. Furthermore, the stability of the predicted POTEE model was evaluated using replica exchange molecular dynamics simulation for 50 ns. Our network-based epigenetic analysis identified only 10 highly significant, direct, and physical associators of POTEE. Our findings aim to provide new insights into the POTEE paralog.


Introduction
Ovarian cancer is a slow, silent killer that causes deaths among females every year [1][2][3]. Ovarian cancer that arises from the epithelial cells of the oviduct (fallopian tube) is very common in females. There are five types of ovarian cancer, arising from the epithelial cells (high-grade/low-grade serous), oviduct, endometrioid (endometrium), mucinous (cervical glands), and clear cell tumors (vaginal rests) [3,4]. The World Health Organization (WHO) reports that ovarian cancer is detected in females in their 60s [5]. The disease remains a major challenge for clinicians because its initial screening and diagnosis are not specific. Effective biomarkers are lacking, and thus no person-centric treatment strategy is available. General evidence suggests that age, family history, genetics, and environmental factors are responsible for causing ovarian cancer [2].
Epigenetics is a term commonly encountered in connection with cancers and many other diseases. It is a sub-field of molecular biochemistry that studies heritable changes in physical characteristics (phenotypes), gene expression, and gene activity that occur without any change in the original DNA template sequence [6][7][8]. DNA methylation and histone modifications are the two most widely studied epigenetic mechanisms [9][10][11][12][13]. Epigenetic modifications involve mutual interactions between DNA methylation, histone modification, and microRNA (miRNA) expression that channel and maintain gene expression during cancer formation [14]. Myriad epigenetic mechanisms have also been observed to trigger the development of ovarian cancer [2]. Prostate ovary testis embryo expression (POTE) is a cancer testis antigen (CTA) family of 14 paralogs that have been classified, according to phylogenetic evidence, into three main groups: group I (POTEA), group II (POTEB1, B2, B3, C & D), and group III (POTEE, F, I, J, KP, M) [15][16][17][18].
The authors of [17] state that POTE groups I and II showed testis-specific behavior in normal tissues, where they were expressed as cancer-testis antigens (CTAs); however, POTE group III was observed in many normal tissues, pointing to its non-CTA nature. In another study [19], the research group found that the cancer testis antigen (CTA) family POTE (prostate ovary testis embryo expression) is linked to ovarian cancer through global hypomethylation of L1 and 5′ CpG hypomethylation. They suggested that POTEs C, E, and F are highly dominant in high-grade serous epithelial ovarian cancer (HGSOC), combined with hypomethylation at 5′ promoter regions, which they deduced from patient-matched samples. While examining decitabine treatment and DNA methyltransferase (DNMT) knockout cell lines, they validated that DNA methylation functions as a suppressor of POTE expression, while epigenetic drug treatment targeting histone deacetylases (HDACs) and histone methyltransferases (HMTs) along with decitabine improved POTE expression. Also, Wang et al. [20] screened the POTEE paralog, a group III member of the POTE family, and suggested its clinical importance as an identifier for non-small cell lung cancer (NSCLC). With these studies in hand, we aim to characterize the lesser-known POTEE paralog in ovarian cancer using an exploratory in silico pipeline. The few literature sources cited above show that POTEE is hypomethylated (over-expressed) in ovarian cancer; our study validates this using an in silico analysis that combines genomic, structural, electrostatic, and epigenetics-based network approaches. The genomic analyses hint at a correlation between CTCF-based motifs and the POTEE paralog.
We also predict the structure of POTEE using deep neural network (DNN)-based homology modelling and then compare it with the existing Swiss-Model and AlphaFold POTEE models to check the per-residue confidence. We also check the energetic and electrostatic stability of our predicted model using replica exchange molecular dynamics (REMD). Finally, to establish why the POTEE paralog shows an epigenetic nature in ovarian cancer, we adopt a network-based epigenetic approach that lists the highly significant, direct, and physical associators of POTEE.

Genomic analysis -motif identification, enrichment, comparison & occurrences
We used the CTCFBSDB software [21,22] for motif identification. Note that no statistical analyses were incorporated in the current study. CTCFBSDB is a web-based tool that predicts CTCF-binding sites (CTCFBSs); CTCF uses different combinations of its zinc fingers to bind divergent DNA sequences. The web tool holds an array of identified core motifs for CTCFBS sequences, and the motifs are represented as position weight matrices (PWMs). In total, six PWMs are used to represent the CTCFBS sequences that have been identified and included in the web tool repository. The EMBL_M1 and EMBL_M2 motifs were identified by Schmidt et al. [23], the Ren_20 motif was first reported by Kim et al. [24], and the LM2, LM7, and LM23 motifs were first identified by Xie et al. [25]. The web tool uses the STORM program and each of the six PWMs to report the best single match in the user's query sequence. The CentriMo software of the MEME Suite [26] was used for motif enrichment.
TomTom, available in the MEME Suite [27], was used to compare the identified and enriched motifs. Find Individual Motif Occurrences (FIMO) was used to identify motif occurrences in the POTEE mRNA sequence [28]. We used the threshold score for selecting the best CTCF-based motifs, and q- and p-values for selecting motif occurrences.
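The occurrence search can be illustrated with a minimal sketch. FIMO scores every window of the sequence against a position weight matrix and reports p- and q-values; the simplified stand-in below only locates exact copies of a motif string on both strands and is not the FIMO algorithm. The function names and the toy sequence are hypothetical and are not taken from the study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_occurrences(sequence: str, motif: str) -> list:
    """Locate exact motif matches on both strands.

    Returns a sorted list of (start, strand) tuples, where start is the
    1-based position on the forward strand. Overlapping hits are reported.
    Assumption: exact string matching stands in for PWM scoring.
    """
    sequence = sequence.upper().replace("U", "T")  # accept mRNA input
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = sequence.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence containing GGAACTGCC on both strands.
    toy = "AAGGAACTGCCTTGGCAGTTCCAA"
    print(find_occurrences(toy, "GGAACTGCC"))
```

A PWM-based scan would replace the exact comparison with a per-window log-odds score and a significance threshold, which is what allows near-matches to be reported.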

Structural alignment: DNN-based POTEE model aligned to Swiss-Model POTEE model
We aligned our predicted POTEE model to the existing Swiss-Model [30] POTEE model (ID: Q6S8J3) to identify the regions of the protein sequence (UniProt ID: Q6S8J3) common to both. The alignment of the two structures was carried out in PyMOL [31]. Similarity index (%), coverage, and TM score were the basic parameters used for selecting the template from which the POTEE structure was built.

Replica exchange molecular dynamic simulation (REMD) and electrostatic analyses
The modeled POTEE structure was submitted to replica exchange molecular dynamics (REMD) in the NAMD-VMD software [32] for 50 ns. The CHARMM22 parameter force field (par_all22_prot_cmap.inp) was used to compute the forces and energies (https://www.ks.uiuc.edu/Training/Tutorials/namd/namd-tutorial-unix-html/node25.html). The minimum and maximum temperatures were obtained from the temperature predictor for parallel tempering simulations, a web server that generates temperature sets for REMD simulations [33]. The retrieved temperature string was 300.00, 318.87, 338.60, 359.16, 380.76, 400.00 (300-400 K) for 20 replicas. The model was minimized using the conjugate gradient (CG) algorithm [34]. The desired replica exchange acceptance ratio was tuned to be greater than 0.2, and exchanges between neighboring replicas were checked every 10 ps. A total of 20,000 replica exchanges were obtained by the end of the simulation. The integration time step for the production run was set to 0.002 ps. The simulation time was set to 50 ns, of which the first 10 ns was used for equilibration and the remaining 40 ns for all additional analyses. Solvation was carried out in a rhombic dodecahedron box in which the shortest distance between the POTEE model and the edge of the box was kept at 1 nm, resulting in 50,000 interacting particles in the entire system. The system was neutralized at 0.15 M NaCl concentration to maintain its overall charge. Electrostatic interactions were computed for each of the above-mentioned steps using the particle-mesh Ewald (PME) method with a 1.2 nm cut-off. A cut-off of 1.2 nm was also applied to Lennard-Jones (LJ) interactions.
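The exchange step at the heart of REMD can be sketched as follows. This is an illustrative sketch only: the geometric spacing is a common heuristic and is not the algorithm of the temperature predictor web server [33], and the energies in the example are made-up numbers. The acceptance rule is the standard Metropolis criterion for swapping two replicas, P = min(1, exp[(β_i − β_j)(E_i − E_j)]), with β = 1/(k_B T).

```python
import math

# Boltzmann constant in kcal/(mol*K), the energy unit used by NAMD.
K_B = 0.0019872041


def geometric_ladder(t_min: float, t_max: float, n_replicas: int) -> list:
    """Temperatures spaced by a constant ratio between t_min and t_max.

    Assumption: geometric spacing is a generic heuristic, used here only
    to illustrate how a temperature set for REMD can be laid out.
    """
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j are potential energies (kcal/mol) of the configurations
    currently held at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


if __name__ == "__main__":
    ladder = geometric_ladder(300.0, 400.0, 6)
    print([round(t, 2) for t in ladder])
    # Hypothetical energies for two neighbouring replicas.
    print(exchange_probability(-120.0, -100.0, ladder[0], ladder[1]))
```

Closer temperature spacing increases the energy overlap between neighbouring replicas and therefore the acceptance ratio, which is why the number of replicas and the temperature range are chosen together.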
The molecular mechanics generalized Born surface area (MM-GBSA) approach was used to calculate the binding free energy (ΔG) over the simulation time, using the adaptive Poisson-Boltzmann solver (APBS) plugin installed directly in PyMOL (https://pymolwiki.org/index.php/APBS_Electrostatics_Plugin). Moreover, the Bluues software [35] was used for electrostatic calculations and surface potential computations. The SCFbio ROG web tool [36] was used to calculate the radius of gyration (ROG). The Protein Frustratometer 2 web server [37] was used to compute the energy landscape and dynamics. For the REMD analysis, we relied on the RMSD, accuracy scores, MolProbity score, GBSE, total energy, and ROG as the crucial parameters for assessing the refined POTEE model.

Network analysis
We used the protein sequence of the POTEE paralog (UniProt ID: Q6S8J3) to identify protein interactors using ConsensusPathDB [38]. Network associators were selected based on their distance to the POTEE paralog in the network: the closest neighbours were retained, and this distance was the main parameter for selecting and sorting the significant network associators.

Motif identification
To identify the motifs present in the POTEE mRNA sequence, we used the CTCFBSDB software, which is based on the CCCTC-binding factor (CTCF), a conserved transcription regulator found in almost every organism, from the fruit fly to human beings. CTCF binds to various DNA sequences with the aid of its 11 zinc fingers, depending mainly on the biological context. Its binding sites represent diverged DNA sequences and play a crucial part in the control of gene expression. Recent studies suggest that CTCF is involved in genomic imprinting and X-chromosome inactivation [21,22], two major epigenetic mechanisms. We submitted the mRNA sequence of Homo sapiens POTEE (accession ID: NM_001083538.3) to CTCFBSDB and identified six essential motifs. Only three motifs, namely (i) CTTCCAGCAGATGTGGATCA (score 13.1291), (ii) GGAACTGCC (score 12.1573), and (iii) CGCCACATGCAGGC (score 8.44245), had the maximum likelihood of being present in the nucleotide sequence of POTEE, as these hits matched various other motifs present in repositories such as JASPAR 2020 [39]. Table 1 lists the identified motifs along with their confidence scores in detail.

Motif enrichment and occurrence analyses
CentriMo reported the nucleotide frequencies in the POTEE mRNA sequence: 0.2534 for the A-T pair and 0.2466 for the C-G pair. Figure 1 shows the predicted motif probability graph, plotting the distance of the best site from the sequence center.
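As a rough illustration of how such paired frequencies can arise, the sketch below computes strand-symmetrized nucleotide frequencies, in which A and T share one value and C and G share another so that the four values sum to 1. That the reported 0.2534 and 0.2466 are strand-symmetrized in this way is an assumption made here for illustration (2 × 0.2534 + 2 × 0.2466 = 1); the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def symmetric_background(sequence: str) -> dict:
    """Strand-symmetrized nucleotide frequencies.

    Counts on the given strand are averaged with those of the reverse
    complement, so freq(A) == freq(T) and freq(C) == freq(G).
    Characters other than A, C, G, T (after U -> T) are ignored.
    """
    sequence = sequence.upper().replace("U", "T")
    counts = Counter(base for base in sequence if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    at = (counts["A"] + counts["T"]) / (2 * total)
    cg = (counts["C"] + counts["G"]) / (2 * total)
    return {"A": at, "C": cg, "G": cg, "T": at}


if __name__ == "__main__":
    # Hypothetical toy sequence, not the POTEE mRNA.
    print(symmetric_background("AATTTCGG"))
```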
The three best-scoring motifs were submitted to TomTom for motif comparison. For motif enrichment purposes, the Pearson correlation coefficient was used to score the motifs. All these motifs were perfect matches to other motifs in the human and mouse genomes. For the M1 motif, 25 motifs were perfect matches, while for the M2 motif only 6 perfect matches were obtained. For the M5 motif, 13 matches were predicted. The comparison used several databases: in JASPAR2018_CORE_vertebrates_non-redundant, 579 motifs were screened and only 23 were matched; in the uniprobe_mouse database, 386 motifs were screened, resulting in only 3 matches; and in the jolma2013 database, 843 motifs were screened, resulting in 15 perfect matches to our query motifs M1, M2, and M5. Based on the E- and p-values, we selected only the best-scoring motif matches for each of the three motifs M1, M2, and M5. These motifs matched zinc finger factors, DNA-binding domains, and transcription factors (TFs) present in both the human and the mouse genome. The results of the enrichment analyses are given in Supplementary Table S1. To find the motif occurrences in the mRNA sequence of POTEE, we used the FIMO software. We set the organism to Homo sapiens and selected the UCSC database (hg38). There were 64,488 motif occurrences with a p-value less than 0.0001. The best-match motif was identified as CTTCCAGCAGATGT, which has a width of 14. Table 2 lists the top 20 motif occurrences computed for the POTEE mRNA sequence.

Deep neural network-based structure prediction
A tertiary model of the POTEE paralog has already been developed using the 1yvn.1.A PDB template and can be accessed and downloaded from Swiss-Model [30]. The model is a theoretical one, as there is no experimentally validated crystal structure. The Swiss-Model POTEE structure covers only the actin-like domain and not the complete protein. We used the TopModel tool to predict the POTEE structure, as it has an embedded automated workflow for template-based modeling (TBM) that uses deep neural network (DNN) learning to improve template selection, thus producing robust models with good overall quality, coverage, and similarity index relative to the template models. Figure 2 shows the two structures of the POTEE paralog: (a) the Swiss-Model structure (UniProt ID: Q6S8J3) and (b) our deep neural network-predicted model built with TopModel. There is a substantial difference between the two models. Our model was predicted based on the best-matching PDB template, 6I4D_A. In the predicted POTEE model, blue-colored residues have low predicted error, indicating high modeled quality, while red-colored residues have high predicted error, indicating poor modeled quality. Out of 50 templates, the DNN-based approach selected only the 5 best templates for POTEE tertiary structure prediction. Table 3 lists the top 5 templates along with the coverage, similarity index, overall quality, and TM score, which are calculated by various neural networks that use information about the threading energy, structural similarity, and model quality predictions. Figure 3 shows the multiple sequence alignment of the templates selected by the DNN for model prediction, along with the number of conserved residues.
The predicted model of POTEE has a single chain A spanning residues 1-960. Figure 4 shows the modeled structure of the POTEE paralog with the residues that form helices, parallel and anti-parallel strands, and loop regions. From this, it is evident that our predicted model captures the major domains and motifs of the POTEE paralog. The main functional regions start at residue 248 and end at residue 960, while loops are formed by residues 1-246.
To validate our predicted model, we used the AlphaFold [40] predicted POTEE model (https://alphafold.ebi.ac.uk/entry/Q6S8J3). We downloaded the AlphaFold model for POTEE, which starts at residue 1 and ends at residue 1072. We aligned our predicted model to the AlphaFold model to check how well our model had been built (refer to Figure 5). We observed a perfect alignment of the two structures over residues 237-958, which corresponds to the fact that this segment of our predicted model has a very high per-residue confidence, i.e., >90. In contrast, residues 1-236 and the stretch of residues 959-1071 did not align well, which corresponds to a poor per-residue confidence score, i.e., <50.
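The per-residue confidence check can be sketched in a few lines. AlphaFold model files in PDB format store the per-residue confidence (pLDDT) in the B-factor column, so contiguous high-confidence stretches can be read directly from the ATOM records. The sketch below assumes standard fixed-column PDB formatting and reads only C-alpha atoms; the function names and the file name in the usage example are hypothetical, and this is not the procedure used in the study.

```python
def ca_plddt(pdb_lines) -> dict:
    """Map residue number -> pLDDT, read from the B-factor of CA atoms.

    Assumes fixed-column PDB ATOM records: atom name in columns 13-16,
    residue number in columns 23-26, B-factor in columns 61-66.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confident_segments(scores: dict, threshold: float = 90.0) -> list:
    """Return (first, last) residue ranges whose pLDDT is >= threshold."""
    segments = []
    start = prev = None
    for res in sorted(scores):
        if scores[res] >= threshold:
            if start is None:
                start = res
            elif res != prev + 1:  # numbering gap closes the segment
                segments.append((start, prev))
                start = res
            prev = res
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


if __name__ == "__main__":
    # Hypothetical local file name for a downloaded AlphaFold model.
    with open("AF-Q6S8J3-F1-model.pdb") as handle:
        print(confident_segments(ca_plddt(handle), threshold=90.0))
```

Lowering the threshold to 50 and inverting the comparison would list the low-confidence stretches in the same way.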

Structural alignment
To see how our model differs from the existing actin structure of the POTEE paralog available in Swiss-Model, we aligned the two structures: our deep neural network (DNN)-based predicted model of POTEE and the actin region of POTEE available in the Swiss-Model repository (Q6S8J3). Figure 6 shows the two aligned structures. Our predicted model of POTEE and the Swiss-Model actin POTEE region, which spans residues 705-1075, aligned mainly at five segments. Residues 6-12 of the predicted POTEE model aligned perfectly to residues 706-712; residues 26-28, 66-76, and 81-140 aligned perfectly to residues 726-728, 767-777, and 781-841 of the Swiss-Model POTEE structure. The longest aligned portion spanned residues 179-250 in our predicted model of POTEE and residues 879-949 of the Swiss-Model POTEE structure.

Replica exchange molecular dynamic simulation (REMD) and electrostatic analyses
The POTEE model was noticeably refined, with a stable overall energy, a good overall root-mean-square deviation (RMSD) score, and minimal steric clashes. Table 4 provides the important parameters and temperature ranges for the REMD simulation run of our modeled POTEE paralog. Table 5 describes the temperatures, energies, and exchange probability rate. RMSD analysis reveals many residue-level disturbances in the POTEE structure during the simulation, which indicate its stability by confirming equilibration [41]. A greater disturbance between trajectories was noted, which in turn affected the RMSD of the replicas. The RMSD values were recorded at 10 ns, as the main loops, helices, and beta strands were present in this region, suggesting major changes in the refined POTEE structure compared with the modeled one. The accuracy score describes the changes in the backbone between the original structure and the refined structure. After the molecular dynamics simulations, the accuracy of the refined POTEE model is better than that of the modeled POTEE structure (refer to Table 6). The MolProbity score assesses atom-atom contacts in tertiary structures, looking for clashes that may arise from MD simulation problems within the structure, together with the dihedral angles. Usually, MolProbity scores lie in the range of 1-2 Å. Our results indicate that the refined POTEE (MolProbity score = 2.69) has been aligned better and has fewer clashes than the originally submitted modeled POTEE (MolProbity score = 2.32).
The radius of gyration (ROG) of a tertiary model is the root-mean-square distance of all atoms from the center of mass of the model [42]. The ROG is lower for the refined POTEE model (21.01 ± 1.00) than for the original modeled POTEE (21.86 ± 1.78) (refer to Table 6). Figure 7 shows the RMSF plot, detailing the residue regions with higher and lower fluctuation peaks. The higher fluctuations were mainly observed in the highly coiled loop regions spanning residues 1-246, while lower peaks were obtained in the helical and beta-stranded regions.
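The definition of the radius of gyration given above translates directly into code. The following is a minimal sketch, assuming atomic coordinates are supplied as (x, y, z) tuples and that masses default to 1 when none are given; the function name is hypothetical and the SCFbio ROG web tool [36] may differ in its details.

```python
import math


def radius_of_gyration(coords, masses=None) -> float:
    """Mass-weighted root-mean-square distance of atoms from the center of mass.

    coords: sequence of (x, y, z) tuples; masses: optional sequence of
    atomic masses of the same length (defaults to equal weights).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component per axis.
    center = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


if __name__ == "__main__":
    # Four unit-mass points on a circle of radius 1 around the origin.
    square = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    print(radius_of_gyration(square))
```

The result is in the same length unit as the input coordinates, so a smaller value for the refined model indicates a more compact structure.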
After MD simulation, the energy landscape differs, reflecting the refinement and further alterations of our modeled POTEE structure. Macromolecular frustration is used to infer the functional dynamics and behavior of protein structures. The more frustrated regions a protein structure has, the more functional and binding cavities it contains. Figure 7 shows the combined, minimal, maximal, and neutral frustrations of the refined POTEE structure along with the density of frustration at various residues computed for 5 Å spheres. Maximal frustrations were present in residues 1-337, which mainly consist of loops, while minimal frustrations were observed in residues 340-855, which form the helices and beta-strands of the POTEE tertiary model. The contact map visualization (refer to Figure 8) also confirms the maximal and minimal frustrations in the initial and ending residues of the POTEE structure.
It is important to check how biomolecules associate with each other under various environments. That is where electrostatics plays a pivotal role in protein structural analyses. The adaptive Poisson-Boltzmann solver (APBS) provides solutions to the equations of continuum electrostatics for large biomolecules [43].
Our study reveals that the refined POTEE structure had an APBS range between −203.276 and 199.204, while the original modeled POTEE structure ranged between −119.164 and 84.203. The molecular mechanics generalized Born surface area continuum solvation (MM-GBSA) results indicate that, after MD simulation, the POTEE structure has become more stable, with fewer steric clashes, and is electrostatically stable. Figure 9 shows the MM-GBSA calculations in the form of an APBS map in PyMOL [33] for both the modeled and refined POTEE structures.
Another study reveals that POTEE, ApoA1, and HPX genes get upregulated in breast cancer and could be seen as a potential novel biomarker for the same [46], whereas, Wang et al. (2015) [20] discern that POTEE is hypomethylated in non-small cell lung cancer (NSCLC) and is associated with TNM NSCLC patients. All these recent studies suggest that POTEE paralog gets epigenetically activated in different cancers, however, there is no significant data available to prove its epigenetic association in terms of network-based epigenetic interactor analyses.
From the literature evidence, we know that POTEE is epigenetically regulated in cancers, but we do not know why it is epigenetically triggered. It is therefore necessary to analyze the POTEE sequence and to understand its significant associators and their behavior in different cancers. Using a network-based epigenetic analysis, we identified more than 200 direct and indirect, inter-related, physical, and text-backed associators linked to POTEE. However, we selected only the 10 highly significant, direct, and physical associators that had a confidence score of ≥5.0: RELA, HMOX2, EZH2, p-10Y-ERBB3-1, WDR1, ERRFI1, PRG2, FMR1, DEFA6-(?-100), and cytf_human. Figure 10 shows the network of these 10 associators and POTEE. To make this clearer, we applied the k-means clustering algorithm to group the closest and most similar associators of the POTEE paralog. Two distinguishable groups were formed: group A (demarcated in blue, see Figure 9) comprised RELA, HMOX2, EZH2, p-10Y-ERBB3-1, WDR1, and ERRFI1, whereas group B (demarcated in orange) consisted of PRG2, FMR1, DEFA6-(?-100), and cytf_human. Table 7 provides a brief description of the 10 identified interactors.
On manually text mining the literature, we found that 8 of these 10 associators were epigenetically modified and regulated in different diseases. Group A associators showed a stronger epigenetic link than group B interactors. RELA shows an increased methylation level that is significant in the progression of breast cancer [47], HMOX2 shows increased hypomethylation in endometriosis [48], while EZH2 mediates the histone modification H3K27me3 and causes several cancers [49]. ERRFI1 is epigenetically downregulated in neuroblastoma tumors [50], and WDR1 is overexpressed in non-small cell lung cancer (NSCLC) [51], whereas p10Y-ERBB3-1 is discerned to have shown histone      [54]. This evidence suggests that, since the majority of the network associators of POTEE are epigenetically activated in many cancers, it is quite natural for the POTEE paralog to be over-expressed and epigenetically regulated in ovarian cancer too. Moreover, various experimental studies [19,20,[44][45][46][47][48][49][50][51][52][53][54] describe its epigenetic dynamics in different diseases.

Discussion
The cancer testis antigen (CTA) family member prostate ovary testis embryo expression (POTE) is a class of genes that play a pivotal role in many diseases, especially cancers. Because of limited literature, there is no experimental or derived structure of the POTEE paralog. The lack of genomic information also makes it crucial to derive information that can serve as a lead for identifying and understanding the epigenetic trigger that leads to ovarian cancer in females. Using an exploratory approach, we identified six main matching motifs in the mRNA sequence of the POTEE paralog. Of these, three motifs, (i) CTTCCAGCAGATGTGGATCA (score 13.1291), (ii) GGAACTGCC (score 12.1573), and (iii) CGCCACATGCAGGC (score 8.44245), are the most probable candidates to be present in the nucleotide sequence of POTEE, as they matched other motifs already present in established and validated repositories. The nucleotide frequencies computed for the POTEE mRNA sequence were 0.2534 for the A-T pair and 0.2466 for the C-G pair. These motifs were perfect matches to various motifs present in the human and mouse genomes. Moreover, they matched zinc finger factors, DNA-binding domains, and transcription factors (TFs) present in both the human and mouse genomes. There were 64,488 motif occurrences with a p-value less than 0.0001. The best-match motif was identified as CTTCCAGCAGATGT. To predict the tertiary structure, instead of adopting the traditional approach, we used the template-based modeling (TBM) method, which relies on deep neural network (DNN) learning. There is a significant difference between the Swiss-Model POTEE structure and our DNN-based POTEE model. The predicted model was built using the best-matching PDB template, 6I4D_A. It has a single chain A spanning residues 1-960 and captures the major domains and motifs of the POTEE paralog.
The main functional regions start at residue 248 and end at residue 960, while loops are formed by residues 1-246. After structural alignment, our predicted model of POTEE and the Swiss-Model actin POTEE region, which spans residues 705-1075, aligned mainly at five segments. Residues 6-12 of the predicted POTEE model aligned perfectly to residues 706-712; residues 26-28, 66-76, and 81-140 aligned perfectly to residues 726-728, 767-777, and 781-841 of the Swiss-Model POTEE structure. The longest aligned portion spanned residues 179-250 in our predicted model of POTEE and residues 879-949 of the Swiss-Model POTEE structure. To validate our predicted model, we used the AlphaFold [40] predicted POTEE model and aligned the two structures to check how well our model had been built. We found a perfect alignment of the two structures over residues 237-958, which corresponds to the fact that this stretch of our predicted model has a very high per-residue confidence, i.e., >90. In contrast, residues 1-236 and the stretch of residues 959-1071 did not align well, which corresponds to a poor per-residue confidence score, i.e., <50.
After REMD, the POTEE model was refined, with a stable overall energy, a good overall root-mean-square deviation (RMSD) score, and fewer steric clashes. RMSD analysis reveals many disturbances in the POTEE structure during the simulation, which indicate its stability by confirming equilibration. A greater disturbance between trajectories was noted in the RMSD of the replicas. The accuracy of the refined POTEE model is better than that of the modeled POTEE structure (refer to Table 6, second column). Our results indicate that the refined POTEE (MolProbity score = 2.69) has been aligned better and has fewer clashes than the originally submitted modeled POTEE (MolProbity score = 2.32). The higher fluctuations were mainly observed in the highly coiled loop regions spanning residues 1-246, while lower peaks were obtained in the helical and beta-stranded regions. The molecular mechanics generalized Born surface area continuum solvation (MM-GBSA) results indicate that, after MD simulation, the POTEE structure has become more stable, with fewer steric clashes, and is electrostatically stable.
The network-based epigenetic analysis identified only 10 highly significant, direct, and physical associators with a confidence score of ≥5.0, namely RELA, HMOX2, EZH2, p-10Y-ERBB3-1, WDR1, ERRFI1, PRG2, FMR1, DEFA6-(?-100), and cytf_human. Since the majority of the network associators of POTEE are reported in the literature to be epigenetically activated in many cancers, it is quite natural for the POTEE paralog to be over-expressed and epigenetically regulated in ovarian cancer too. Additionally, we conclude that, although only a few studies have shown that POTEE is hypomethylated (over-expressed) in ovarian cancer, our study validates this using an in silico analysis that combines genomic, structural, electrostatic, and epigenetics-based network approaches.

Conclusions
Based on an exhaustive and exploratory analysis, we conclude that the POTEE paralog has motifs that match zinc finger factors, DNA-binding domains, and transcription factors (TFs) ubiquitous in both the human and mouse genomes. The best-match motif was identified as CTTCCAGCAGATGT. A few studies have reported on the transcription factor BORIS (Brother of Regulator of Imprinted Sites), which is paralogous to the well-characterized, highly conserved, multivalent 11-zinc-finger factor CTCF but differs at the N and C termini. BORIS and POTE both belong to a cancer testis antigen (CTA) family, and a few studies show that BORIS directly regulates CTA gene expression [55][56][57][58]. Additionally, the BORIS/CTCF mRNA expression ratio is linked with DNA hypomethylation in cancers. Our genomic analysis thus suggests that the CTCF motif identified in the POTEE mRNA sequence could be a useful lead in understanding why POTEE is hypomethylated in ovarian cancer. The predicted model was built using a deep learning-based homology modelling approach with the best-matching PDB template, 6I4D_A, and has a single chain A spanning residues 1-960 that encapsulates domains, motifs, and loops. The main functional regions start at residue 248 and end at residue 960, while loops are formed by residues 1-246. To validate our predicted POTEE model, we used the AlphaFold POTEE structure. We observed a perfect alignment of the two structures (the predicted POTEE model and the AlphaFold POTEE model) over residues 237-958, indicating a high per-residue confidence (>90) for our predicted model.
The molecular dynamics simulations and related analyses, such as molecular mechanics generalized Born surface area continuum solvation (MM-GBSA), indicate that the POTEE structure has become more stable, with fewer steric clashes, and is electrostatically stable. The higher fluctuations were mainly observed in the highly coiled loop regions spanning residues 1-246, while lower peaks were obtained in the helical and beta-stranded regions. There are 10 highly significant, direct, and physical associators with a confidence score of ≥5.0, namely RELA, HMOX2, EZH2, p-10Y-ERBB3-1, WDR1, ERRFI1, PRG2, FMR1, DEFA6-(?-100), and cytf_human, and the majority of these network associators of POTEE are epigenetically activated in many cancers. Thus, it is quite natural for the POTEE paralog to be over-expressed and epigenetically regulated in cancer as well, and specifically in ovarian cancer. It can therefore be seen as a positive prognostic indicator for diagnosing ovarian cancer in its early stages.