Study of NAD-interacting proteins highlights the extent of NAD regulatory roles in the cell and its potential as a therapeutic target

Abstract
Nicotinamide adenine dinucleotide (NAD) is essential for the normal physiology of the cell, and its levels are strictly regulated to prevent pathological conditions. NAD functions as a coenzyme in redox reactions, as a substrate of regulatory proteins, and as a mediator of protein–protein interactions. The main objectives of this study were to identify NAD-binding and NAD-interacting proteins, to uncover novel proteins and functions that could be regulated by this metabolite, and to assess whether the cancer-associated proteins among them were potential therapeutic targets. Using multiple experimental databases, we defined a dataset of proteins that directly interact with NAD, the NAD-binding proteins (NADBPs) dataset, and a dataset of proteins that interact with NADBPs, the NAD-protein–protein interactions (NAD-PPIs) dataset. Pathway enrichment analysis revealed that NADBPs participate in several metabolic pathways, while NAD-PPIs are mostly involved in signalling pathways. These include disease-related pathways, namely those of three major neurodegenerative disorders: Alzheimer's disease, Huntington's disease, and Parkinson's disease. We then analysed the complete human proteome to select potential NADBPs. TRPC3 and isoforms of diacylglycerol (DAG) kinases, which are involved in calcium signalling, were identified as new NADBPs. Finally, we identified potential therapeutic targets that interact with NAD and have regulatory and signalling functions in cancer and neurodegenerative diseases.


Introduction
Nicotinamide adenine dinucleotide (NAD) is a crucial metabolite in the cell, best known as a cofactor in the oxidation-reduction reactions that produce energy in the form of ATP, in which it alternates between its oxidized (NAD+) and reduced (NADH) forms. By transferring electrons between reactions, NAD participates in a multitude of metabolic processes that are key to the normal physiology of the cell, including glycolysis, the citric acid cycle, fatty acid beta-oxidation and mitochondrial electron transport. Additionally, NAD is a substrate for proteins involved in cell survival, DNA damage repair, calcium signalling and transcription regulation. NAD-dependent enzymes include sirtuins (SIRTs) [1], poly- and mono-(ADP-ribose) polymerases (PARPs and MARTs) [2,3], and cyclic ADP-ribose hydrolases such as CD38 [4]. Cellular NAD levels are maintained by a balance between NAD production and consumption; this balance does not take into account the interconversion between NAD+/NADH and NADP+/NADPH.
More recently, NAD was recognised as a direct modulator of protein–protein interactions (PPIs) through its binding to the NUDIX domain [5]. NUDIX hydrolases, named for their substrates (nucleoside diphosphates linked to a variable moiety X), share a conserved 23-amino-acid signature motif and have catalytic activity on nucleotides. Through this activity, many NUDIX proteins contribute to cellular homeostasis by clearing deleterious compounds from the cell. Others regulate the concentrations of metabolites such as NAD, NADP and ADP-ribose, and others remove the 5′-cap from RNA, controlling mRNA stability and gene expression. Nevertheless, several NUDIX enzymes remain uncharacterized [6][7][8].
Binding of NAD to the NUDIX homology domain (NHD) of the Deleted in Breast Cancer 1 (DBC1) protein prevents DBC1 from interacting with PARP1 [5]; the DBC1-PARP1 interaction inhibits the normal function of PARP1 in DNA damage repair. DBC1 also regulates the activity of several other proteins, such as the transcription factor p53; the androgen and estrogen receptors (AR and ER), which are involved in hormone signalling; BRCA1, another DNA damage repair protein; and epigenetic regulators such as the NAD-dependent deacetylase SIRT1 and HDAC3 [9].
The PARP catalytic domain is an example of a conserved protein domain shared by all members of the PARP family; it carries out their main function, the transfer of the ADP-ribose moiety from their substrate (NAD) to the carboxylate groups of aspartate and glutamate residues [10].
In this study, we aimed to characterize the NAD interactome, given the multitude of cellular functions of NAD and the relevance of NAD metabolism in normal and pathological conditions. Considering the role of NAD in regulating PPIs, we focused on NAD-binding proteins and their interactions. Multiple experimental databases were surveyed to define an NAD-binding dataset, which was characterized through pathway enrichment analysis and protein structural domain analysis. The full human proteome was then screened, and a selection of potential NAD-binding proteins was analysed further. As previously reported [11], we identified new proteins that potentially interact with NAD. Here, we describe the NADBPs dataset in detail, predict the NAD-interacting residues of known NADBPs to serve as a reference, and further analyse the NUDIX-containing proteins. We also uncover NADBPs that are cancer-associated and potential drug targets. In addition, we performed molecular docking to predict NAD binding to potential NADBPs.

NAD-related proteins dataset
To study the proteins potentially related to NAD, an NAD-related dataset was defined from a keyword search for "NAD", which retrieved all proteins with "NAD" in the protein name or in any descriptive field, such as protein family name, gene description, function or ontology classification. All matching human reviewed proteins from UniProt (https://www.uniprot.org/) [12] and from the IMEX Consortium database (http://www.imexconsortium.org/) [13] were considered.
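The keyword criterion can be sketched as a simple filter over protein records. This is a minimal illustration that assumes each record is a dictionary of text fields; the field names and toy records are hypothetical, and the sketch does not reproduce the UniProt or IMEX search engines. Note that a plain substring match also catches "NADH" and "NADP", which fits the broad "NAD-related" definition.

```python
from typing import Dict, Iterable, List

# Hypothetical record layout: field name -> free text (accession, name, family, GO terms, ...)
Record = Dict[str, str]


def is_nad_related(record: Record, keyword: str = "NAD") -> bool:
    """True if the keyword appears (case-sensitively) in any text field of the record."""
    return any(keyword in str(value) for value in record.values())


def nad_related(records: Iterable[Record]) -> List[str]:
    """Return the accessions of all records that mention the keyword in any field."""
    return [r["accession"] for r in records if is_nad_related(r)]


if __name__ == "__main__":
    # Toy records for illustration only (not real database entries).
    toy = [
        {"accession": "P1", "name": "Example dehydrogenase", "function": "Uses NAD+ as cofactor"},
        {"accession": "P2", "name": "Example kinase", "function": "Phosphorylates substrates"},
        {"accession": "P3", "name": "NADH oxidase-like protein", "function": ""},
    ]
    print(nad_related(toy))  # ['P1', 'P3']
```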

Gene ontology (GO) analysis of the protein datasets
GO analysis was performed in PANTHER (http://pantherdb.org/) [16] using an overrepresentation test (Fisher's exact test with false discovery rate (FDR) correction) on the PANTHER Pathways annotation dataset (version 13.0). The NAD-binding, NAD-related and NAD-PPIs datasets were analysed.
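The statistics behind an overrepresentation test can be sketched in a few lines. This is a generic illustration, not PANTHER's implementation: it assumes a one-sided Fisher's exact test (the hypergeometric upper tail) and Benjamini-Hochberg FDR correction, and PANTHER's own handling of the reference list and test sidedness may differ. Fold enrichment, as reported in the Results, is the observed fraction of list genes in a pathway divided by the expected fraction in the reference.

```python
from math import comb
from typing import List


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail), P(X >= k).

    k: list genes annotated to the pathway; n: genes in the list;
    K: reference genes annotated to the pathway; N: genes in the reference.
    """
    hi = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction of list genes in the pathway over the expected (reference) fraction."""
    return (k / n) / (K / N)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, with 2 pathway genes among 4 reference genes and a list of 2 genes that both fall in the pathway, `overrepresentation_p(2, 2, 2, 4)` is 1/6.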

Identification of putative NAD-binding proteins
The most frequent protein domains and protein families within the NADBPs dataset were identified [11]. All 20,303 human reviewed proteins in the UniProt database were used as the reference dataset, and the 50,588 unreviewed proteins as the test dataset. Proteins with at least one of the most frequent NADBP domains were retrieved from both datasets. The genes/proteins found exclusively in the test dataset of unreviewed proteins were analysed further, using NADbinder (http://crdd.osdd.net/raghava/nadbinder/) [17] to predict the number of NAD-interacting residues and the STRING database (https://string-db.org/) v. 11 [15] to obtain the interactions of each protein.

Molecular docking
To evaluate the potential binding of NAD to the top target, automated in silico molecular docking was performed with the SwissDock web server (http://www.swissdock.ch), as described by Grosdidier and collaborators [18]. The NAD ligand was taken from the ZINC database (https://zinc.docking.org/; ID ZINC8214766), and the 3D structures of the top target were retrieved from the AlphaFold database (https://alphafold.ebi.ac.uk/) [19].

Cancer associated proteins and potential drug targets
Proteins from the NADBPs dataset were compared with catalogues of protein-coding genes from the subproteomes of the Human Protein Atlas (https://www.proteinatlas.org/) [20]. Specifically, we considered the cancer proteome, a list of 569 mutated proteins strongly implicated in cancer as defined by the Catalogue Of Somatic Mutations In Cancer (COSMIC), and the druggable proteome, a list of 754 proteins directly targeted by an FDA-approved drug. Currently, approximately four thousand protein-coding genes in the UniProt database have experimental evidence of involvement in disease conditions, including cancer and neurological, systemic and cardiovascular diseases. Of these, 1326 proteins annotated in the Human Protein Atlas as potential drug targets were also considered; they belong to known drug-target protein classes, such as enzymes, transporters, receptors and ion channels, but are not yet targets of FDA-approved or experimental drugs in the DrugBank database.
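The comparison with the Human Protein Atlas catalogues amounts to set intersections. A minimal sketch, assuming all lists have already been matched on a common identifier such as the gene symbol; the category names and the toy gene sets in the test are hypothetical.

```python
from typing import Dict, Set


def classify_nadbps(nadbps: Set[str], cancer: Set[str], druggable: Set[str],
                    potential_targets: Set[str]) -> Dict[str, Set[str]]:
    """Intersect an NADBP gene set with cancer, druggable and potential-target gene lists."""
    return {
        # NADBPs in the list of potential (not yet drugged) targets
        "potential_target": nadbps & potential_targets,
        # ...of which are also cancer-mutated
        "potential_target_and_cancer": nadbps & potential_targets & cancer,
        # NADBPs targeted by an FDA-approved drug and implicated in cancer
        "approved_target_and_cancer": nadbps & druggable & cancer,
    }
```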

NAD-binding and NAD-related proteins differ in their predominant cellular roles
After collecting data from six different databases, we obtained an NADBPs dataset of 439 proteins (Figure 1 and Appendix A). The NAD metabolite was found under different forms and names, and both oxidized and reduced forms were included. The highest numbers of interactions with NAD were found in STITCH, DrugBank and the Human Metabolome Database.
Analysis of the 439 NADBPs showed that around 80% of these proteins were enzymes, most of them involved in metabolite interconversion. The major protein classes were dehydrogenases (92 proteins), over 30 of which were NADH dehydrogenases, and oxidoreductases (55 proteins), but several others were identified, as shown in Figure 2. More than one hundred proteins were mitochondrial isoforms of enzymes that participate in the chain of reactions responsible for ATP production. In addition to enzymes that use NAD as a cofactor in redox reactions, we also found all PARPs and all SIRTs, which use NAD as a substrate. In terms of molecular function, a small number of proteins with regulatory or transporter activity were also found.
The NAD-related datasets included all proteins potentially related to NAD, either by protein name or by any descriptive field. For the NAD-related dataset, we obtained 456 proteins from UniProt and 1907 from IMEX; of the resulting 2125 unique proteins, only 238 were common to both sources. We then identified 279 of these 2125 proteins that were also present in the NADBPs dataset, leaving a total of 1846 NAD-related proteins that do not bind NAD directly.
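The counts above follow from inclusion-exclusion. Writing U and I for the UniProt and IMEX keyword hits and B for the NADBPs dataset (symbols introduced here only for illustration):

```latex
\begin{aligned}
|U \cup I| &= |U| + |I| - |U \cap I| = 456 + 1907 - 238 = 2125,\\
|(U \cup I) \setminus B| &= |U \cup I| - |(U \cup I) \cap B| = 2125 - 279 = 1846.
\end{aligned}
```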
We performed a GO analysis on the 439 proteins of the NADBPs dataset (Table 1) and on the 1846 proteins of the NAD-related dataset, to compare the pathways enriched in each. Only two pathways were common to both datasets: glycolysis and the FAS signalling pathway. We found 31 pathways specific to the NADBPs dataset that were not enriched in the NAD-related dataset, including pathways related to the biosynthesis or metabolism of nucleic acids, carbohydrates and amino acids.
In the NAD-related dataset, we found 36 pathways that did not appear in the NADBPs dataset, mostly related to signalling. The highest fold enrichment values were found for the pentose phosphate pathway (fold enrichment of 10.1, the highest), JAK/STAT signalling, and four pathways related to p53 signalling. Of note, disease-related pathways emerged in the NAD-related dataset, such as those of Alzheimer's, Huntington's and Parkinson's diseases. Signalling pathways related to angiogenesis, inflammation and apoptosis, which are disease-related mechanisms, were also identified.

Proteins that interact with NADBPs comprise about half of the human proteome
We then studied the NAD-protein–protein interactions (NAD-PPIs), i.e., the proteins that interact with NADBPs. Using the 439 proteins from the NADBPs dataset, we obtained 9823 protein pairs from the STRING database, corresponding to 7815 unique gene name identifiers; 19,682 pairs from the BIOGRID database, corresponding to 6479 unique gene IDs; and 5594 pairs from IMEX, corresponding to 3301 unique IDs. Each type of ID retrieved from each database was mapped to reviewed UniProtKB IDs (using automatic tools, or manually for IDs the tools could not map), and duplicated entries were removed; these arose mainly from alternative gene or protein names, or from disease names associated with those genes. From STRING, 7533 proteins were successfully mapped and 75 elements remained unmapped. From BIOGRID, 5752 proteins were mapped and 54 remained unmapped. Most of these unmapped IDs were pseudogenes. From IMEX, 2500 proteins were mapped and 90 elements remained unmapped. We also found 40 ChEBI IDs, which were looked up in the ChEBI database but excluded from further analysis, since they correspond to chemical compounds that interact with NADBPs rather than to protein–protein interactions. The proteins common to the three sources of PPIs were identified, and the merged list comprised 10,020 unique proteins involved in PPIs with NADBPs.
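The mapping and deduplication step can be sketched as follows. This is a minimal sketch that assumes an identifier-to-accession table has already been obtained (for example, exported from an ID mapping service); the table, the identifiers and the ChEBI prefix check are illustrative assumptions, and the manual curation described above is not modelled.

```python
from typing import Dict, Iterable, Set, Tuple


def map_to_uniprot(raw_ids: Iterable[str],
                   id_map: Dict[str, str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Map database identifiers to reviewed UniProtKB accessions.

    Returns (mapped accessions, unmapped identifiers, excluded ChEBI identifiers).
    Aliases that map to the same accession collapse into a single entry.
    """
    mapped: Set[str] = set()
    unmapped: Set[str] = set()
    chebi: Set[str] = set()
    for raw in raw_ids:
        if raw.upper().startswith("CHEBI:"):  # small molecules, not protein partners
            chebi.add(raw)
        elif raw in id_map:
            mapped.add(id_map[raw])
        else:
            unmapped.add(raw)  # e.g. pseudogenes with no reviewed entry
    return mapped, unmapped, chebi
```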
Because this represents about half of the human proteins annotated so far (20,379 reviewed human proteins in the most recent version of the UniProt Knowledgebase at the time, UniProtKB 2020_06 [21]), only the 1368 proteins common to all three databases (Figure 3) were analysed further. This restricted the analysis to the best-supported interactions.
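The distinction between the merged list (10,020 proteins) and the proteins shared by all three sources (1368 proteins) is the difference between a union and an intersection. A minimal sketch, assuming each database contributes a set of mapped accessions; the toy sets in the test are hypothetical, and the default proteome size is the UniProtKB 2020_06 count quoted above.

```python
from typing import Dict, Set, Tuple


def combine_sources(sources: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (union, intersection) of the interaction partners from each database."""
    sets = list(sources.values())
    union = set().union(*sets)
    common = set.intersection(*sets) if sets else set()
    return union, common


def proteome_fraction(n_proteins: int, proteome_size: int = 20379) -> float:
    """Fraction of the reviewed human proteome covered by n_proteins."""
    return n_proteins / proteome_size
```

With the counts from the text, `proteome_fraction(10020)` is about 0.49, which is the "about half of the human proteome" in the section title.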
A GO analysis was performed on the 1368 proteins of the NAD-PPIs dataset and compared with the results from the NADBPs dataset described above (Table 1). As with the NAD-related dataset, the NAD-PPIs dataset was enriched in several signalling pathways compared with the NADBPs dataset. The pathways with the highest numbers of genes (over 50) were related to hormone receptor signalling, namely for gonadotropin and for the gastrointestinal peptide hormones cholecystokinin and gastrin, followed by the Wnt signalling and angiogenesis pathways. Several other pathways were related to hormone or growth factor signalling, and disease pathways also emerged, namely those of three major neurodegenerative diseases: Alzheimer's, Huntington's and Parkinson's.

Overview of NADBPs protein structural domains
Protein domain analysis was performed on the 439 NADBPs using the Pfam database, and all matches with an expectation value (E-value) below 1 were selected (the highest retained E-value was 0.88). For each protein, the results list the top hit domain and the number of hits. Across the 439 proteins, 1101 matches were made, corresponding to 412 different domains. Two proteins (NDUFA11 and GPAT2) had no identified domain, and in the remaining 437 proteins, 222 different domains were identified as top hits. More than half of the proteins (56%, 247 proteins) belonged to the FAD/NAD(P)-binding Rossmann fold superfamily, and 27% belonged to the ankyrin repeat superfamily. We then selected the 15 most common domains, which appeared in more than 10 proteins (Table 2). Five different ankyrin repeats were among these top domains; others were the short-chain dehydrogenase, the aldehyde dehydrogenase family, cytochrome P450 and the poly(ADP-ribose) polymerase (PARP) catalytic domain.
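The filtering and counting steps can be sketched over a flat table of domain matches. This is a minimal sketch under assumptions: the hit layout is hypothetical, "top hit" is taken here as the match with the lowest E-value (Pfam output can also be ranked by bit score), and the frequency cut-off is a parameter, since the exact threshold (more than ten or at least ten proteins) should follow Table 2.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit layout: (protein, Pfam domain, E-value)
Hit = Tuple[str, str, float]


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1.0) -> List[Hit]:
    """Keep domain matches with an E-value strictly below the threshold."""
    return [h for h in hits if h[2] < max_evalue]


def top_hit_per_protein(hits: Iterable[Hit]) -> Dict[str, str]:
    """Domain with the lowest E-value for each protein."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        protein, _, evalue = hit
        if protein not in best or evalue < best[protein][2]:
            best[protein] = hit
    return {protein: hit[1] for protein, hit in best.items()}


def frequent_domains(hits: Iterable[Hit], min_proteins: int = 10) -> List[Tuple[str, int]]:
    """Domains found in at least `min_proteins` distinct proteins, most frequent first."""
    pairs = {(protein, domain) for protein, domain, _ in hits}  # count each protein once per domain
    counts = Counter(domain for _, domain in pairs)
    return [(d, c) for d, c in counts.most_common() if c >= min_proteins]
```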
We also identified 65 proteins with one of the 43 domains containing the term "NAD" in their names or descriptions. Eighteen proteins contain one of six distinct "NAD binding" domains, namely the D-isomer specific 2-hydroxyacid dehydrogenase, 3-hydroxyacyl-CoA dehydrogenase, lactate/malate dehydrogenase, malic enzyme, UDP-glucose/GDP-mannose dehydrogenase family and 6-phosphogluconate dehydrogenase NAD binding domains. The NUDIX domain was found in only two proteins from the NAD-binding dataset, NUDT12 and NUDT7.

Identification of 13 new NAD-binding proteins based on protein domains
We searched the unreviewed proteins of the full human proteome (test dataset) for the 15 domains identified in ten or more proteins from the NADBPs dataset (Table 2) and obtained 901 protein sequences. After removing all protein fragments and duplicates, 255 proteins remained, corresponding to 204 unique genes. The same approach applied to the reference dataset yielded 474 genes. Because our aim was to identify uncharacterized proteins, the 195 of the 204 genes that were also identified in the reference dataset were excluded, and 8 genes remained, corresponding to 13 protein sequences found uniquely in the test dataset (Table 3). Among the 13 proteins, there were five isoforms of diacylglycerol (DAG) kinase, four encoded by the DGKI gene and one by the DGKZ gene. There were two further kinase isoforms, of leucine-rich repeat serine/threonine-protein kinase 1, encoded by the LRRK1 gene. There were also two genes related to membrane transport: the sodium/hydrogen exchanger 9B2 (SLC9B2) and TRPC3, represented by two isoforms of a short transient receptor potential channel. A smaller isoform of POTEB, a member of the ankyrin family, was also found. Of note, POTEB was the only protein with two of the 15 domains simultaneously (Ank_2 and Ank_5). Finally, there were two proteins resulting from readthrough of neighbouring genes: CYP3A7-CYP3A51P, belonging to a subfamily of cytochrome P450, and FPGT-TNNI3K, from the neighbouring fucose-1-phosphate guanylyltransferase (FPGT) and TNNI3 interacting kinase (TNNI3K) genes.
To evaluate whether NAD might affect the interactions of these proteins, we searched for the interactions of each. DGKI and SLC9B2 had no reported interactions, nor did the proteins resulting from the two readthrough events. LRRK1 had the highest number of interactions, followed by TRPC3.

Number of NAD interacting residues in new and known NADBPs
The 13 identified proteins were further analysed with NADbinder (Table 3), which predicts NAD-interacting residues from the protein sequence rather than from the protein structure. The highest number of NAD-interacting residues, 33, was found in the longest isoform of TRPC3 (793 amino acids), followed by the longest isoform of DGKI (1078 amino acids), with 31 residues. All five DAG kinase isoforms had more than 20 predicted NAD-interacting residues, as did the two readthrough proteins. A positive correlation was observed between amino acid length and the number of NAD-interacting residues identified.
To serve as a reference, six proteins known to be involved in NAD metabolism were also scanned for NAD-interacting residues: two enzymes that consume NAD intracellularly (PARP1 and SIRT1), two that consume NAD extracellularly (CD38 and CD73), and two that participate in NAD biosynthesis (NAMPT and NAPRT). Among these sequences, there was no significant correlation between the number of NAD-interacting residues and protein length. According to the NADbinder results, CD73 had the highest number of NAD-interacting residues (51), and the remaining NAD-consuming enzymes had between 37 and 43, more than any of the 13 proteins studied above. NAMPT and NAPRT do not interact directly with NAD, yet presented 39 and 18 NAD-interacting residues, respectively. However, they bind nicotinamide and nicotinic acid, respectively, which are structurally similar NAD precursors, and catalyse the first steps of their conversion into NAD.

NUDIX-containing proteins in the NADBPs dataset
Because the NUDIX domain has been described to interact directly with NAD [5], the two NUDIX hydrolases in the NADBPs dataset, NUDT7 and NUDT12, were also studied. In NUDT7, 21 NAD-interacting residues were identified, whereas in NUDT12 only six were detected. We also searched for their interactors, considering only experimentally validated physical interactions, and found three proteins that interact with NUDT7 and three that interact with NUDT12 (Figure 4).

Identification of NADBPs as potential new drug targets mutated in cancer
We found 122 NAD-binding proteins that are potential drug targets. Two of them, fumarate hydratase (FH) and 5′-nucleotidase, cytosolic II (NT5C2), also belong to the set of cancer-mutated genes. Additionally, three other NADBPs whose mutations are implicated in cancer were found in the catalogue of FDA-approved drug targets: 5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase (ATIC), androgen receptor (AR) and isocitrate dehydrogenase 2 (IDH2).

Discussion
NAD binds to a large number of different proteins to perform a diversity of functions within the cell. In these interactions, NAD can (1) act as an enzymatic cofactor in redox reactions, (2) be degraded by NAD-dependent enzymes, and (3) mediate protein–protein interactions, thereby regulating several cellular processes. Our aim of identifying potential NAD-binding proteins led us to a global analysis of the NAD interactome. We integrated data from various sources to assemble a large dataset of proteins that were already known to interact with the NAD molecule or were in some way related to NAD functions. We functionally characterized the protein datasets through gene ontology and protein structural domain analyses.
Through the analysis of enriched pathways, based on gene ontology annotations, we found that NADBPs are involved in a diversity of cellular pathways. The comparison with the NAD-related and NAD-PPIs datasets emphasised that NADBPs are central to basic metabolism and biosynthetic processes. Nonetheless, essential metabolic pathways such as glycolysis and the TCA cycle, and signalling pathways mediated by GABA or dopamine receptors, were found in all datasets. Conversely, the proteins that participate in NAD-PPIs are involved in signalling pathways ranging from development and apoptosis to general immune and hormone responses, including many disease pathways, illustrating the breadth of action of this small molecule.
Analysis of protein structural domains showed that ankyrin repeats were the most frequent, with some proteins presenting more than one ankyrin repeat in their structures. The ankyrin domain is very frequent across the human proteome, as it mediates protein–protein interactions [22] and regulates the function of other proteins [23]. Consistent with this high frequency, 448 proteins in the unreviewed dataset obtained here from the full human proteome (based on the UniProt database) have at least one of the five ankyrin repeats.
In addition to protein structural domains, the number of NAD-interacting residues was considered, given that the direct binding of NAD at specific sites of a protein ultimately determines its action [17]. NAD binding to the NUDIX homology domain of DBC1 regulates its action on PARP1 by preventing the interaction between the two proteins [5]. No more than 10 residues conserved across several species were identified within the NUDIX domain. Given the presence of a specific domain with a fold favourable to interaction with a small molecule, only a small number of residues might be responsible for the actual interaction. The identification of NAD-interacting residues within the sequences of known NADBPs revealed that, while some NAD-consuming enzymes had around 40 residues, the two NUDIX domain-containing proteins had fewer (21 and 6).
NUDT12, an NAD-capped RNA hydrolase also known as a deNADding enzyme, has a role directly associated with NAD, and it interacts with bleomycin hydrolase (BLMH) through its ankyrin repeats [24]. The known role of the peptidase BLMH is to cleave the anticancer drug bleomycin, reducing its intracellular levels, but its primary biological function remains unknown.
Among the potential new NAD-binding proteins identified in our study, TRPC3 (UniProt ID J3QTB0) had an ankyrin repeat domain and the highest number of NAD-interacting residues. Molecular docking revealed a potential NAD-binding site on TRPC3: of the 31 clusters of docking poses obtained, 26 were located at the same site, including those with the best scores and lowest estimated energies (Figure 5). The corresponding reviewed TRPC3 protein (UniProt ID Q13507), with 836 amino acids, is longer than the two isoforms detected here. Its known interactors are mostly involved in signal transduction, response to stress, anatomical structure development and transport processes, many of them related to calcium transport and signalling, such as the inositol trisphosphate (IP3) receptors ITPR1 and ITPR3 and the sodium/calcium exchanger SLC8A1.
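A statement such as "26 of 31 clusters at the same site" can be made reproducible by counting the clusters whose centroids lie close to the best-scoring one. This is a hypothetical sketch: it assumes each docking cluster has been summarised as a centroid in ångströms plus an estimated binding free energy (SwissDock reports an estimated ΔG per cluster), and the 5 Å radius is an arbitrary illustrative choice rather than the criterion used for Figure 5.

```python
from math import dist  # Euclidean distance, Python 3.8+
from typing import List, Tuple

# Hypothetical cluster summary: (cluster id, centroid (x, y, z) in angstrom, estimated deltaG in kcal/mol)
Cluster = Tuple[int, Tuple[float, float, float], float]


def colocated_with_best(clusters: List[Cluster], radius: float = 5.0) -> List[int]:
    """IDs of clusters whose centroid lies within `radius` of the lowest-energy cluster's centroid."""
    best = min(clusters, key=lambda c: c[2])  # most negative estimated deltaG
    return [cid for cid, centre, _ in clusters if dist(centre, best[1]) <= radius]
```

The fraction `len(colocated_with_best(clusters)) / len(clusters)` then summarises how consistently the poses converge on one site.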
TRPC3 is a member of the transient receptor potential (TRP) channel family, which regulates intracellular calcium concentration [25], and is directly activated by lipids, specifically diacylglycerol (DAG). Together with IP3, DAG is produced by the hydrolysis of the membrane phospholipid phosphatidylinositol 4,5-bisphosphate (PIP2), catalysed by phospholipase C (PLC) enzymes. PLC gamma enzymes are key components of intracellular signalling, and some PLCG1 functions have been associated with a specific protein domain through which PLCG1 interacts directly with TRPC3, regulating calcium entry [26]. The role of PLC gamma enzymes in disease development has been explored very recently [27]. Of note, PLCG1 was also found in our NAD-PPIs dataset, showing that it binds other NADBPs, and several unreviewed isoforms of DAG kinases were identified in this study as potential NADBPs.
Both NAD-dependent and calcium-dependent signalling are essential in the cell, and their dysregulation is therefore often associated with disease. In particular, the role of NAD as a regulator of calcium channels has recently been reviewed in the context of cancer treatment research [28], in which calcium channels are emerging as potential targets for anticancer therapy. Beyond cancer, TRP channels, particularly the TRPC3 group, regulate neuronal functions and are involved in various neurological and psychiatric disorders [29].
Interestingly, only one ion channel was identified in the primary NADBPs dataset in our study: transient receptor potential cation channel subfamily M member 2 (TRPM2). Although no NUDIX domain was identified in TRPM2 in the Pfam domain analysis, the presence of NUDIX domains in the structure of TRPM2 has been described in the literature and associated with its conformational changes and gating functions [30]. The activation of TRPM2 by NAD has been documented for over two decades and is one example of the relation between NAD and calcium metabolism [31].
Finally, we investigated whether some of the NADBPs were potential therapeutic drug targets. We found FH and NT5C2, which are directly involved in NAD-related reactions: the former participates in the TCA cycle and the latter in NAD metabolism, specifically catalysing the hydrolysis of NMN into NR or of NAMN into NAR. Both enzymes are altered in cancer and are also associated with neurological diseases [32][33][34]. In addition, ATIC, AR and IDH2 from the NADBPs dataset are already used as therapeutic targets. ATIC participates in purine biosynthesis, where it catalyses the last two steps of the pathway [35]. IDH2 is the mitochondrial isoform of the isocitrate dehydrogenase family of enzymes; it depends on NADP+ and a divalent metal ion (Mg2+ or Mn2+) to perform the oxidative decarboxylation of isocitrate, one of the steps of the TCA cycle. Alterations in these enzymes can therefore have an important impact on metabolism. IDH1 and IDH2 mutations have been described in different types of cancer, including glioblastoma, and are being targeted in acute myeloid leukaemia [36,37]. The androgen receptor acts as a transcription factor: when activated by androgens, it binds to target genes and directly regulates the transcription of a large number of genes. SIRT1, an NAD-dependent deacetylase, regulates AR activity, linking NAD metabolism to ligand-induced hormone signalling [38]. Aberrant expression of AR contributes to the progression of prostate cancer, making this protein a recognized therapeutic target in this context [39]. Alterations in AR have also been associated with neurological diseases, from developmental deficiencies to neurodegenerative disorders [40].

Conclusions
In conclusion, this global study of the NAD interactome identified new potential NAD-binding proteins, including TRPC3 and several isoforms of DAG kinases, which are involved in calcium signalling. NADBPs participate in several metabolic pathways and signalling processes in the cell, while proteins interacting with NADBPs (NAD-PPIs) are mostly involved in signalling pathways. Furthermore, we identified NADBPs that are known drug targets (ATIC, AR and IDH2), as well as potential new drug targets in cancer (FH and NT5C2).