On the identification of potential novel therapeutic targets for spinocerebellar ataxia type 1 (SCA1) neurodegenerative disease using EvoPPI3

Abstract EvoPPI (http://evoppi.i3s.up.pt), a meta-database for protein-protein interactions (PPI), has been upgraded (EvoPPI3) to accept new types of data, namely, PPI from patients, cell lines, and animal models, as well as data from gene modifier experiments, for nine neurodegenerative polyglutamine (polyQ) diseases caused by an abnormal expansion of the polyQ tract. The integration of the different types of data allows users to easily compare them, as here shown for Ataxin-1, the polyQ protein involved in spinocerebellar ataxia type 1 (SCA1) disease. Using all available datasets and the data here obtained for Drosophila melanogaster wt and exp Ataxin-1 mutants (also available at EvoPPI3), we show that, in humans, the Ataxin-1 network is much larger than previously thought (380 interactors), with at least 909 interactors. The functional profiling of the newly identified interactors is similar to the ones already reported in the main PPI databases. 16 out of 909 interactors are putative novel SCA1 therapeutic targets, and all but one are already being studied in the context of this disease. The 16 proteins are mainly involved in binding and catalytic activity (mainly kinase activity), functional features already thought to be important in the SCA1 disease.


Introduction
Knowledge of the protein-protein interaction (PPI) network (the so-called interactome) is essential to elucidate the complex molecular relationships in living systems, and thus understand biological functions at the cellular and systems levels. Such knowledge is needed to understand the link between genotypes and phenotypes, and thus to identify the molecular basis of disease and possible therapeutic targets. Systematic mapping of PPIs, by probing interactions between proteins of interest, has been performed using high-throughput methods such as the yeast two-hybrid (Y2H) system, protein-fragment complementation assays, affinity purification and mass spectrometry [1]. For a given pair of proteins, PPIs can also be addressed using methods such as X-ray crystallography, NMR spectroscopy, fluorescence resonance energy transfer (FRET), and surface plasmon resonance (SPR) [2]. These interactions are compiled in primary databases such as BioGRID [3,4], CCSB [5], DroID [6], FlyBase [7], HIPPIE [8], HitPredict [9], HomoMINT [10], INstruct [11], Interactome3D [12], Mentha [13], MINT [14], and PINA v2 [15], all included in EvoPPI [16,17], a meta-database for PPIs. Although these are literature-curated databases, false positive interactions may still be reported, mainly due to the experimental limitations of the methods used to characterize PPIs. Moreover, protein interactions observed in vitro may never be observed in vivo because the proteins are present in different tissues or subcellular locations, the genes encoding such proteins are expressed at different moments in time, the in vivo interactions are transient, proteins show post-translational modifications, or the physiological conditions are unmatched.
Given that many different criteria can be used to decide on the inclusion of a given PPI in a database, it is not surprising that the degree of overlap between different PPI databases is limited, and that all of them report an exclusive set of information [16]. Since there is no clear consensus on how to identify false positive interactions, in EvoPPI users can select the databases to search, compare the PPIs reported in each database, and decide which ones to trust. In EvoPPI, all interactions, irrespective of the source database, are reported as GeneID pairs. Without EvoPPI, such comparisons would be difficult to make since the primary databases use different formats (gene identifiers [BioGRID], UniProtKB accession numbers [MINT], and gene names [CCSB], for instance). Since the last major EvoPPI update (EvoPPI2), results can be viewed and downloaded either pooled or by database.
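The value of reporting every interaction as a GeneID pair can be illustrated with a minimal sketch. It is not EvoPPI code: the function names, the lookup tables, and all identifiers other than GeneID 6310 (ATXN1) are hypothetical, and the sketch assumes that each source identifier maps to a single GeneID.

```python
from itertools import combinations

def as_geneid_pairs(records, to_geneid):
    """Convert (id_a, id_b) records to unordered GeneID pairs.

    `to_geneid` is a hypothetical lookup table from a source-specific
    identifier (gene name, accession number, ...) to a GeneID.
    Records with an unmapped identifier are skipped.
    """
    pairs = set()
    for a, b in records:
        if a in to_geneid and b in to_geneid:
            pairs.add(frozenset((to_geneid[a], to_geneid[b])))
    return pairs

def pairwise_overlap(databases):
    """Number of shared interactions for every pair of databases."""
    return {
        (x, y): len(databases[x] & databases[y])
        for x, y in combinations(sorted(databases), 2)
    }

# Toy data: all identifiers are made up, except GeneID 6310 (ATXN1).
names = {"ATXN1": 6310, "GENE_A": 1001, "GENE_B": 1002}
accessions = {"ACC_1": 6310, "ACC_A": 1001, "ACC_B": 1002}

databases = {
    "by_name": as_geneid_pairs([("ATXN1", "GENE_A"), ("GENE_B", "ATXN1")], names),
    "by_accession": as_geneid_pairs([("ACC_A", "ACC_1")], accessions),
}
shared = pairwise_overlap(databases)
```

Storing each pair as a `frozenset` makes the comparison independent of the order in which a source lists the two partners.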
The main databases rely on published research, and thus are biased towards proteins screened more frequently, such as those involved in disease. Moreover, both in humans and in animal model species, PPI networks are likely very large, and thus the available data likely covers only a small fraction of the full network. Therefore, from the very start, EvoPPI also aimed at making the most of the existing data, under the assumption that protein networks, and thus interologs (conserved interactions between pairs of proteins which have interacting homologs in another organism), are conserved between distantly related species, as seems to be the case [18]. While the first EvoPPI version made available a BLAST option that can be customized (the number of descriptions, minimum expect value, minimum length of alignment block, and the minimum identity [in percentage] can be defined by the user), in EvoPPI2, for humans and the model species Mus musculus, Caenorhabditis elegans, and Drosophila melanogaster, pre-computed predicted interactomes are available under the category "Predicted Interactomes". Since EvoPPI2, for the establishment of gene orthologies, the user can select DIOPT and/or Ensembl. Within this update (database version 2022.04.v2), the set of model species that are considered is expanded to include the zebrafish Danio rerio. Proteomic analyses of model species mutants expressing a human protein of interest can also be very informative, and have been widely used in the context of neurodegenerative diseases [19][20][21][22][23][24][25][26]. The assumption is that the human orthologs of the genes encoding the model species proteins that interact with the human protein of interest encode proteins that will natively interact with the human protein. A further assumption is that the model species proteins that interact with the human protein will also interact with the model species protein encoded by the ortholog of the gene encoding the human protein.
Nevertheless, since these are cross-species observations, such data are not included in the main PPI databases. Therefore, we have compiled from the literature such data for nine neurodegenerative polyglutamine (polyQ) diseases caused by an abnormal expansion of the polyQ tract, namely six spinocerebellar ataxias (SCA types 1, 2, 6, 7, and 17, and Machado-Joseph disease [MJD/SCA3]), Huntington's disease (HD), dentatorubral pallidoluysian atrophy (DRPLA), and spinal and bulbar muscular atrophy, X-linked 1 (SMAX1/SBMA). These data are now available at EvoPPI3.
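The interolog assumption described above can be sketched in a few lines. This is an illustrative sketch only, not the EvoPPI implementation: the function name is hypothetical, the orthology table stands in for a DIOPT or Ensembl mapping, and all GeneIDs except 6310 (human ATXN1) are made up.

```python
def predict_interologs(model_pairs, orthologs):
    """Map model-species interactions to human GeneID pairs.

    `model_pairs` holds (GeneID, GeneID) interactions observed in the
    model species; `orthologs` maps a model-species GeneID to the list
    of its human orthologs (one-to-many mappings are allowed).
    """
    predicted = set()
    for a, b in model_pairs:
        for human_a in orthologs.get(a, ()):
            for human_b in orthologs.get(b, ()):
                predicted.add(frozenset((human_a, human_b)))
    return predicted

# Toy data: 101 and 102 are hypothetical model-species GeneIDs;
# 9001 and 9002 are hypothetical human GeneIDs.
model_pairs = [(101, 102), (101, 103)]
orthologs = {101: [6310], 102: [9001, 9002]}  # 103 has no human ortholog

human_prediction = predict_interologs(model_pairs, orthologs)
```

Interactions involving a gene without a human ortholog (103 in the toy data) yield no prediction, which is one reason why predicted interactomes depend on the orthology source that is selected.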
It seems reasonable to assume that a mutant protein could show novel protein interactions when compared to the wild-type form. This is the reason why data coming from patients or disease model species usually do not end up in primary PPI databases. This is not necessarily the case for all diseases. For instance, for the neurodegenerative polyglutamine (polyQ) diseases mentioned above, the pathogenic process seems to be driven by changes in protein binding strength that lead to the dysregulation of the protein network, and not by novel interactions [16][27][28][29][30][31]. Therefore, here, for the nine proteins involved in these neurodegenerative polyQ diseases, namely Ataxin-1, 2, 6, 7, 17, and 3, huntingtin (HTT), atrophin-1 (ATN1), and androgen receptor (AR), we have compiled from the literature PPI data that is now available at EvoPPI3. Under the assumption that the PPI networks are at least partially conserved between species, and that the homologous genes that perform similar functions can be identified without error, such data can be used to predict and test interactions not yet observed. These datasets must, however, be used with caution, since in most published studies no strategy was implemented to reduce the number of false-positive and false-negative occurrences (see for instance [32]).
Cross-species genetic screens (modifier screens) using different methodologies [33] can be very informative as well [34][35][36], and have been recurrently used in neurodegenerative diseases (see for instance [37,38]). Indeed, human disease genes are largely conserved between the human and mouse genomes (99.5%; [39]), and for D. melanogaster and C. elegans it has been reported that 75% and 83%, respectively, of human disease-related loci have homologs in these species [40,41]. Therefore, we also compiled such data from the literature, which is now available at EvoPPI3. In this case, the data does not necessarily represent direct physical interactions of the disease-causing proteins, although, for an unknown fraction, such a possibility cannot be excluded.
We also report the interactome of mutant Ataxin-1 flies expressing the wild-type (wt) and expanded (exp, associated with disease) human forms (available at the Bloomington Drosophila Stock Center), using co-immunoprecipitation followed by mass spectrometry analyses. The proteins present in both wt and exp mutant ATXN1 flies, and detected in at least three out of the four wt samples and three out of the four exp samples, are included in EvoPPI3 under the Predicted interactomes PolyQ_models22 category, with the name Curated Homo sapiens ATXN1 D. melanogaster.
Since functional insight into protein interaction networks is often obtained via functional enrichment analyses and pathway annotation, in the new EvoPPI3 release presented here, data can also be exported as GeneID lists, in a format compatible with PantherDB (http://pantherdb.org). Moreover, in order to facilitate comparisons using Venn diagrams, results can now also be exported as single-column GeneID lists.
As an example of the utility of the new features of EvoPPI3, here, we show how it is possible to use them to easily identify novel putative therapeutic targets for spinocerebellar ataxia type 1 (SCA1) neurodegenerative disease. By performing comparative analyses using the different EvoPPI3 datasets, we identify 575 novel putative Ataxin-1 interactors. Of the 909 Ataxin-1 interactors here identified, 16 are putative therapeutic targets.

Mutant Ataxin-1 fly interactome
Transgenic flies expressing the wild-type (wt; 39,738 and 39,739) and expanded (exp; 39,740 and 33,818) human Ataxin-1 forms were obtained from the Bloomington Drosophila Stock Center. As a negative control we used the white fly strain (5905), which was used to construct the above-mentioned ATXN1 lines. To express the human protein, crosses with the GMR-GAL4 driver line were performed. For each mutant fly strain, we performed two biological replicates. For each sample, 400 flies were frozen in liquid nitrogen and then decapitated by vigorously vortexing twice for 15 s. As described by Emery [48], the heads of the flies were ground after adding 1.25 μL of extraction buffer per fly (500 μL): 400.6 μL TBS (50 mM TRIS; 150 mM NaCl, pH 7.5); 5% glycerol; 10 mM EDTA; 0.1% Triton; 5 μL protease inhibitor cocktail (Complete™, Mini, EDTA-free Protease Inhibitor Cocktail from Roche containing 1 mM dithiothreitol [DTT]; 0.5 mM phenylmethylsulfonyl fluoride; 20 mg/mL aprotinin, 5 mg/mL leupeptin, and 5 mg/mL pepstatin A). After centrifugation for 10 min at 12,000 g at 4 °C, the supernatant was collected. Protein concentration was measured with a NanoDrop Microvolume Spectrophotometer.
For co-immunoprecipitation, the protocol of the Protein G Mag Sepharose kit (GE HealthCare) and the 11NQ anti-Ataxin-1 antibody (NeuroMab) [49] were used. Briefly, we first washed the beads (calibration) with 500 μL of TBS buffer. Then, the antibody was diluted in the binding buffer (10:500) and the mix was added to the beads, which were incubated and resuspended on a benchtop shaker at −4 °C for two hours. The beads were then washed with 500 μL of TBS buffer, the fly head protein extract was added to the beads, and a new cycle of incubation and resuspension on a benchtop shaker at −4 °C for two hours was performed. Afterwards, the unbound liquid fraction was removed. Three washing steps were then performed with TBS buffer. Before the last washing step, the solution with the beads was transferred to a new microtube, since some unbound material could still be present. Subsequently, elution was performed with 50 μL of the elution buffer (2% SDS) and the samples were heated at 95 °C for 5 min. Mass spectrometry analyses were then performed using the Hybrid Quadrupole-Orbitrap mass spectrometer (Q-Exactive, Thermo Scientific). Proteome Discoverer and BLAST [50] were used for protein identification.

Protein profiles of different regions of the brain
Spatial profiling of the human brain for the basal ganglia, cerebral cortex, midbrain, and thalamus (where Ataxin-1 is present) was retrieved from The Human Protein Atlas (https://www.proteinatlas.org/humanproteome/brain/human+brain). These brain regions are relevant for SCA1 [51].

Database structure
The EvoPPI3 database structure (version 2022.04.v2) tries to highlight the possible problems associated with the use of the different datasets, while at the same time attempting to provide easy access to different types of data. As such, there are two main categories: "Interactomes" and "Predicted Interactomes". The datasets that are found under the "Interactomes" category fall into three main subcategories or "collections": (i) Databases: within-species protein interactions reported in the main PPI databases, namely BioGRID [3,4], CCSB [5], DroID [6], FlyBase [7], HIPPIE [8], HitPredict [9], HomoMINT [10], INstruct [11], Interactome3D [12], Mentha [13], MINT [14], and PINA v2 [15]. EvoPPI3 can thus be used as an aggregator of the main PPI databases, by selecting the datasets under the "Interactomes/Databases" subsection only. EvoPPI3 allows users to retrieve direct (level 1) interactions as well as interactions up to level 3 (proteins that interact with proteins that interact with proteins that interact with the query). By default, only the datasets under Interactomes/Databases and Predicted Interactomes/Databases can be searched. In order to use the other datasets, users must click on the corresponding checkboxes (Figure 1), thus acknowledging that they are aware of the possible problems associated with the remaining datasets (see above).
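The notion of interaction levels can be made concrete with a breadth-first search over GeneID pairs. The sketch below is an illustration under stated assumptions, not the EvoPPI3 implementation: the function name is hypothetical, each protein is assigned the smallest level at which it is reached (how EvoPPI3 reports a protein reachable at several levels is not specified here), and all GeneIDs other than 6310 (ATXN1) are toy values.

```python
from collections import deque

def interactors_up_to_level(pairs, query, max_level=3):
    """Return {GeneID: level} for proteins within `max_level` steps of `query`."""
    # Build an undirected adjacency map from the GeneID pairs.
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    levels = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if levels[node] == max_level:
            continue  # do not expand beyond the requested level
        for neighbour in adjacency.get(node, ()):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)

    del levels[query]  # the query itself is not an interactor
    return levels

# Toy chain: 6310 - 1 - 2 - 3 - 4 (only 6310 is a real GeneID here).
toy_pairs = [(6310, 1), (1, 2), (2, 3), (3, 4)]
levels = interactors_up_to_level(toy_pairs, 6310)
```

In the toy chain, protein 4 lies four steps away from the query and is therefore excluded from a level 3 search.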

Data output
EvoPPI3 allows the visualization of the results either as a chart (by pushing the "Show chart" button) or as a table (the default option). Permalinks to the results can be generated, in order to easily share results. EvoPPI3 allows the download of all protein isoforms associated with the retrieved results in FASTA format ("Download FASTA"), ideal for users wanting to perform 3D protein inferences and protein docking analyses, as well as the download of GeneID pairs in CSV format ("Download CSV"). The user has the option to download all data or the data obtained from a single dataset. In the version reported here, two new output formats are provided, namely: (i) a format intended to facilitate biological process/cellular component/molecular function/pathways/protein function enrichment statistical tests by giving the option to export the data in the format "GeneID:number1, GeneID:number2, ..." compatible with the one used by PantherDB (http://pantherdb.org); and (ii) a single-column GeneID list (without repetitions and that includes the query gene) that is ideal for a visual overall representation of the data using Venn diagrams, such as the ones that can be obtained at https://bioinformatics.psb.ugent.be/webtools/Venn/.
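For readers who want to reproduce the two list formats from a downloaded CSV of GeneID pairs, a minimal sketch follows. The function names are hypothetical, and the ordering and exact whitespace of the real EvoPPI3 exports are assumptions; only the general shapes ("GeneID:number1, GeneID:number2, ..." and one GeneID per line, without repetitions and including the query) are taken from the description above.

```python
def to_panther_list(gene_ids):
    """Format GeneIDs as 'GeneID:1, GeneID:2, ...' without repetitions."""
    return ", ".join(f"GeneID:{g}" for g in sorted(set(gene_ids)))

def to_plain_column(pairs, query):
    """One GeneID per line, without repetitions, including the query gene."""
    gene_ids = {query}
    for a, b in pairs:
        gene_ids.update((a, b))
    return "\n".join(str(g) for g in sorted(gene_ids))

# Toy pairs: 6310 is ATXN1 and 1400 is the modifier GeneID cited in the
# text; 2 is a made-up GeneID, and the pairs themselves are illustrative.
toy_pairs = [(6310, 1400), (6310, 2)]
panther_text = to_panther_list([6310, 1400, 6310])
plain_text = to_plain_column(toy_pairs, 6310)
```

Sorting is used here only to make the output deterministic.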

Biological example
In this section we provide an example of the inferences that can be easily made using the EvoPPI3 version here reported, namely the identification of novel therapeutic targets for spinocerebellar ataxia type 1 (SCA1) neurodegenerative disease, that is caused by the expansion of the polyglutamine repeat of Ataxin-1 protein that is encoded by the ATXN1 gene (GeneID 6310). In order to achieve this goal two tasks must be performed: (i) the identification of the Ataxin-1 protein network; and (ii) the comparison of the identified network with the available gene modifier data.
As shown in Reboiro-Jato et al. [17], the H. sapiens Ataxin-1 PPI network reported in the main databases, although large (380 interactions), is still incomplete, since the predicted interactome of H. sapiens based on M. musculus is much larger (1243 and 592 interactions when considering the DIOPT or Ensembl M. musculus -H. sapiens orthologies, respectively) than the one reported in humans. Therefore, using the new EvoPPI3 datasets, we try to get a better estimate of the size and composition of the Ataxin-1 PPI network. For simplicity, we analyse first the use of the data from H. sapiens (the main databases and the H. sapiens [PolyQ_22] datasets) and M. musculus (the H. sapiens predicted interactome based on M. musculus main databases) (only those genes that are in common between the DIOPT and Ensembl orthology predictions) and the H. sapiens predicted interactome based on the M. musculus PolyQ_models_22 (only those genes that are in common between the DIOPT and Ensembl orthology predictions), and then the data from H. sapiens (the main databases and the H. sapiens [PolyQ_22] datasets) and D. melanogaster (Curated H. sapiens ATXN1 D. melanogaster [PolyQ_models_22] only those genes that are in common between the DIOPT and Ensembl orthology predictions). In order to easily obtain only those genes that are in common between the DIOPT and Ensembl orthology predictions, the search results for the two datasets were exported separately, using the new "Unique GeneIDs list (plain)" EvoPPI3 button, and the intersection of the two lists obtained using the https://bioinformatics.psb.ugent.be/webtools/Venn/ website. Figure 2 shows the observed and predicted (based on M. musculus data) Ataxin-1 H. sapiens PPI network. It should be noted that there are 46 human proteins that are reported as being human Ataxin-1 interactors that are not reported in any other dataset. Unless the Ataxin-1 network is still very incomplete, these 46 interactors may represent false positives. 
This is a possibility since 14 (30%) out of these 46 interactors are available in just one (out of 27) databases. Nevertheless, eight proteins are assigned as human Ataxin-1 interactors in more than seven databases, and thus different curators have included them as true human Ataxin-1 interactors. When comparing the functional classification (using PantherDB; http://pantherdb.org) of these 46 Ataxin-1 interactors with the remaining 334 Ataxin-1 interactors reported in the main databases, there are significant differences between the two datasets (Sign test; P < 0.05; Figure 3). Therefore, they may indeed be false positives, and thus in the following analyses we no longer consider them.
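The consensus step used throughout this section (keeping only genes supported by both the DIOPT and the Ensembl orthology predictions) was performed with a Venn diagram website, but it can also be sketched offline. The function names are hypothetical, the input is assumed to be the text of two single-column GeneID exports, and the GeneIDs other than 6310 are made up.

```python
def parse_geneid_list(text):
    """Parse a one-GeneID-per-line export into a set of integers."""
    return {int(line) for line in text.splitlines() if line.strip()}

def consensus_interactors(diopt_text, ensembl_text, query):
    """GeneIDs present in both exports, excluding the query gene itself."""
    diopt = parse_geneid_list(diopt_text)
    ensembl = parse_geneid_list(ensembl_text)
    return (diopt & ensembl) - {query}

# Toy exports; both include the query gene (6310), as the plain lists do.
diopt_export = "6310\n1001\n1002\n"
ensembl_export = "6310\n1002\n1003\n\n"

consensus = consensus_interactors(diopt_export, ensembl_export, query=6310)
```

The set differences `diopt - ensembl` and `ensembl - diopt` would give the predictions supported by a single orthology source.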
There are 611 H. sapiens genes that are predicted to encode Ataxin-1 interactors, when using the M. musculus data reported in the main databases and, as a criterion, only those genes that are in common between the DIOPT and Ensembl orthology predictions (Figure 2). Of these, 46 (7.5%) are already reported in the main PPI databases. Moreover, 444 (72.7%) out of the 611 predicted Ataxin-1 interactors are reported as Ataxin-1 interactors in the Interactomes/H. sapiens (PolyQ_22) database, which is based on studies performed in SCA1 patients and on human and mammalian derived cell line data (Table 1). There are 674 H. sapiens genes that are predicted to encode Ataxin-1 interactors (only those genes that are in common between the DIOPT and Ensembl orthology predictions were considered), when using the M. musculus Predicted Interactomes/H. sapiens Mus Musculus (PolyQ-models_22) datasets. Of these, 47 (7.0%) are already reported in the main human PPI databases. Moreover, 486 (72.1%) out of the 674 predicted interactors are also present in the Interactomes/H. sapiens (PolyQ_22) database, with 53 predicted Ataxin-1 interactors not identified before. Therefore, with these comparisons we have identified a set of 452 well-supported new human Ataxin-1 interactors, thus increasing the number of interactors to 786. By using the new "Unique GeneIDs (Panther)" EvoPPI3 button, and the PantherDB (http://pantherdb.org) database, it can be easily shown that the 452 new putative human Ataxin-1 interactors show a functional class characterization similar to that of the 334 interactors reported in the main PPI databases that are likely not false positives (Sign test; P > 0.05; Figure 3). It should be noted that there is no data for the D. rerio orthologs (GeneIDs 5,654,841 and 557,340) of the human Ataxin-1.
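The Sign test used to compare functional profiles can be sketched as follows. This is one possible reading, stated as an assumption: each functional class contributes one paired observation (its fraction in each of the two gene sets), ties are dropped, and a two-sided exact binomial p-value is computed. The function names and the class fractions are hypothetical and do not reproduce Figure 3.

```python
from math import comb

def sign_test_p(n_pos, n_neg):
    """Two-sided exact sign test p-value from counts of + and - signs."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def compare_profiles(profile_a, profile_b):
    """Sign test on per-class fractions of two functional profiles."""
    classes = set(profile_a) | set(profile_b)
    diffs = [profile_a.get(c, 0.0) - profile_b.get(c, 0.0) for c in classes]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)  # ties (d == 0) are ignored
    return sign_test_p(n_pos, n_neg)

# Hypothetical class fractions for two gene sets.
known = {"binding": 0.40, "catalytic activity": 0.30,
         "transporter": 0.10, "other": 0.20}
novel = {"binding": 0.38, "catalytic activity": 0.33,
         "transporter": 0.12, "other": 0.17}

p_value = compare_profiles(known, novel)
```

With two positive and two negative differences the toy profiles are indistinguishable by this test, whereas nine signs in one direction out of ten would give a p-value of about 0.021.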
Although PPI data from non-vertebrate species can also be used to infer the full H. sapiens Ataxin-1 network, when using only genes that are in common between the DIOPT and Ensembl orthology predictions, no results are generated based on the available C. elegans datasets. Moreover, only five H. sapiens Ataxin-1 interactors are predicted based on the D. melanogaster data (only those genes that are in common between the DIOPT and Ensembl orthology predictions were considered). Therefore, here, we obtained the proteomic profile of the human Ataxin-1 in D. melanogaster using two mutants expressing the human wild-type (wt) ATXN1 form and two mutants expressing the human expanded (exp) ATXN1 form (see Material and Methods). No proteins were identified in the negative control. Considering only the proteins identified in at least three out of the four wt samples, and three out of four exp samples, and that are in common between the wt and exp samples, we identify 173 fly proteins that can interact with the human Ataxin-1 (Supplementary Table 1). These are available in EvoPPI3 […] Figure 4). This is in agreement with the conserved functional interactions with the same group of binding partners observed in transgenic flies with the human Ataxin-1 [53][54][55][56]. Therefore, using the data from D. melanogaster wt Ataxin-1 mutants, we could validate 123 new human Ataxin-1 interactors. These 123 human putative Ataxin-1 interactors show a similar functional class characterization to those 334 interactions reported in the main PPI databases and that are likely not false positives (using a Sign test P > 0.05; Figure 3).
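The reproducibility filter applied to the fly samples (detection in at least three out of four wt samples and three out of four exp samples, and presence in both groups) can be written as a short sketch. The function names and the protein labels are hypothetical; the thresholds are the ones stated above.

```python
from collections import Counter

def reproducible(samples, min_hits=3):
    """Proteins detected in at least `min_hits` of the given samples."""
    counts = Counter(protein for sample in samples for protein in set(sample))
    return {protein for protein, hits in counts.items() if hits >= min_hits}

def shared_interactors(wt_samples, exp_samples, min_hits=3):
    """Proteins reproducibly detected in both the wt and the exp samples."""
    return reproducible(wt_samples, min_hits) & reproducible(exp_samples, min_hits)

# Toy detection lists for four wt and four exp samples.
wt = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
exp = [{"a", "c"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

kept = shared_interactors(wt, exp)
```

In the toy data only protein "a" passes both filters: "b" is reproducible in wt only and "c" in exp only.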
Our final estimate of the human Ataxin-1 network is thus 909 interactors (Supplementary Table 2). Nevertheless, it could be argued that, since many Ataxin-1 interactors were identified in vitro, such interactions may not be observed in vivo, since the proteins could have different spatio-temporal distributions within human cells. While the temporal issue is difficult to address at present, the spatial aspect can be addressed by looking for the presence of these proteins in brain tissues that are relevant in SCA1 [51], namely basal ganglia, cerebral cortex, midbrain, and thalamus, according to the Human Protein Atlas (https://www.proteinatlas.org/humanproteome/brain/human+brain; Figure 5). The vast majority (92% [836]) of the Ataxin-1 interactors are expressed in these brain tissues. Only for 8% of the Ataxin-1 network do we find evidence for an unexpected spatial expression (22 are restricted to a particular brain tissue, and 45 are not present in these tissues [… melanogaster wt and exp Ataxin-1 mutants, respectively]). Therefore, most of the Ataxin-1 interactions identified here may also be relevant in vivo. […] Table 2) can now be compared with the gene modifier data available at EvoPPI3 to identify those genes that worsen or ameliorate a given SCA1-related phenotype, and thus may be novel therapeutic targets. In humans, for Ataxin-1, only one gene modifier (GeneID: 1400; Table 2) is reported in the Interactomes/Modifiers_22/H. sapiens (Modifiers_22) dataset […] Table 2). Gene modifier data is also available for D. melanogaster Ataxin-1 mutants (Table 1). Of the 87 proteins (only those genes that are in common between the DIOPT and Ensembl orthology predictions were considered), 15 are Ataxin-1 interactors (Table 2). According to PantherDB these 16 proteins are involved in binding and catalytic activity, mainly kinase activity (Table 2). The role of Ataxin-1 in RNA metabolism has been previously addressed [57].
Kinase activity is also known to play an important role in normal Purkinje neuron function, and altered activity has been suggested to have a role in cerebellar ataxias [58,59]. These proteins should be considered as novel genetic SCA1 modulators, and explored as novel therapeutic targets.
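The final comparison (network members that are also gene modifiers) and the running totals reported in this section can be checked with a short sketch. The function name and the toy network are hypothetical; GeneID 1400 and all counts are the ones quoted in the text.

```python
def candidate_targets(network, modifiers):
    """GeneIDs that are both network members and reported gene modifiers."""
    return set(network) & set(modifiers)

# Toy example: 1400 is the human modifier GeneID cited in the text;
# the other GeneIDs are made up.
toy_network = {1400, 2001, 2002}
toy_modifiers = {1400, 3001}
toy_targets = candidate_targets(toy_network, toy_modifiers)

# Tally of the counts quoted in the text.
main_db_supported = 380 - 46               # main-database interactors kept
after_mouse = main_db_supported + 452      # adding mouse-supported interactors
final_network = after_mouse + 123          # adding fly-validated interactors
novel_interactors = final_network - main_db_supported
putative_targets = 1 + 15                  # human modifier + fly-derived modifiers
expressed_fraction = round(100 * 836 / final_network)
```

The tally reproduces the figures given in the text: 334, 786, and 909 interactors, 575 novel interactors, 16 putative targets, and 92% of the network expressed in the brain tissues considered.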

Conclusions
Using the novel databases (including new data obtained here for D. melanogaster wt and exp Ataxin-1 mutants, available at EvoPPI3), as well as the new export options, the size of the human Ataxin-1 network was increased from 380 up to 909 interactors. Of these, 16 have been reported to worsen or ameliorate phenotypes associated with the SCA1 disease, and are thus putatively novel therapeutic targets. All but one are already being studied in the context of SCA1, although only six out of these 16 human proteins are present in the main PPI databases. One of them is already being studied as a therapeutic target (Table 2). These proteins are mainly involved in binding and catalytic activity, mainly kinase activity, functional features already thought to be important in the SCA1 disease.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Research funding: This research was financed by the National Funds through FCT-Fundação para a Ciência e a Tecnologia, I.P., under the project UIDB/04293/2020. This work was also partially supported by the Conselleria de Cultura, Educación e Universidade (Xunta de Galicia) under the scope of the strategic funding ED431C 2022/03-GRC Competitive Reference Group and by Ministerio de Universidades (Gobierno de España) through a "María Zambrano" contract (Hugo López-Fernández).

Conflict of interest statement:
Authors state no conflict of interest.