Differentially methylated CpG sites associated with the high-risk group of prostate cancer

Abstract
Prostate cancer (PC) is one of the most common and socially significant oncological diseases among men. Bioinformatic analysis of omics data makes it possible to identify molecular genetic changes associated with disease development, as well as markers of prognosis and response to therapy. Alterations in DNA methylation and histone modification profiles occur widely in malignant tumors. In this study, we analyzed changes in DNA methylation in three comparisons of PC patients based on data from The Cancer Genome Atlas project (TCGA, https://portal.gdc.cancer.gov): (1) high- versus intermediate-risk of tumor progression, (2) favorable versus unfavorable prognosis within the high-risk group, and (3) TMPRSS2-ERG-positive (tumors with the TMPRSS2-ERG fusion transcript) versus TMPRSS2-ERG-negative cases within the high-risk group. We found eight CpG sites (cg07548607, cg13533340, cg16643088, cg18467168, cg23324953, cg23753247, cg25773620, and cg27148952) hypermethylated in the high-risk group compared with the intermediate-risk group of PC. Seven differentially methylated CpG sites (cg00063748, cg06834698, cg18607127, cg25273707, cg01704198, cg02067712, and cg02157224) were associated with unfavorable prognosis within the high-risk group. Six CpG sites (cg01138171, cg14060519, cg19570244, cg24492886, cg25605277, and cg26228280) were hypomethylated in TMPRSS2-ERG-positive PC compared with TMPRSS2-ERG-negative tumors within the high-risk group. The CpG sites were localized predominantly in regulatory genome regions belonging to promoters of the following genes: ARHGEF4, C6orf141, C8orf86, CLASP2, CSRNP1, GDA, GSX1, IQSEC1, MYOF, OR10A3, PLCD1, PLEC1, PRDM16, PTAFR, RP11-844P9.2, SCYL3, VPS13D, WT1, and ZSWIM2. For these genes, differential expression and its correlation with CpG site methylation (β-value level) were also analyzed. In addition, STK33 and PLCD1 have been reported to show similar changes in colorectal cancer. For the CSRNP1, ARHGEF4, and WT1 genes, dysregulated expression has been reported in lung, liver, pancreatic, and androgen-independent prostate cancers. A potential impact of altered methylation on the mRNA level was found for the CSRNP1, STK33, PLCD1, ARHGEF4, WT1, SCYL3, and VPS13D genes. The above CpG sites could be considered as potential prognostic markers of the high-risk group of PC.


Introduction
Prostate cancer (PC; MeSH D011471) is a common malignant neoplasm in men worldwide [1]. Currently, to predict the course of PC, patients are stratified into risk groups based on the following criteria: pathological stage of the tumor (pT), prostate-specific antigen (PSA) level before surgery, and Gleason score [2]. However, these criteria often fail to reflect an aggressive tumor phenotype. One way to address this problem is to study the molecular genetic characteristics of the tumor using modern approaches. Bioinformatic analysis of omics datasets (genome, transcriptome, and methylome) makes it possible to identify molecular changes that are associated with the tendency of a tumor to disseminate or that predict the time from radical prostatectomy to disease progression.
Epigenetic changes occur in all types of malignant tumors and include perturbation of both DNA methylation and histone modification patterns [3,4]. These changes can be associated with various clinical and pathological characteristics and, in some cases, allow conclusions to be drawn about the prognosis [3]. Aberrant CpG methylation has been found in various malignant tumors, even at early stages [4]. However, it is necessary to clearly distinguish between aberrant methylation of promoter regions and global hyper- or hypomethylation throughout the genome, including intergenic and intronic regions. Hypermethylation of CpG islands can contribute to genetic instability and enhance cell growth, proliferation, and invasion [4]. In PC, global DNA hypomethylation is almost always associated with the late stages of the disease and is usually found in metastatic tissues [5].
The most commonly described change in the methylation pattern in PC concerns the promoter of the GSTP1 gene [6], which is involved in DNA repair [7]. Its hypermethylation was detected in 90% of PC samples and in 50% of hyperplasia cases prone to malignancy [8]. The GSTP1 [9], APC [10], RASSF1A [11], RARB [3], CCND2 [12], EphA5 [13], and PTGS2 [14] genes have been found to be hypermethylated in PC compared with adjacent normal prostate tissues. Promoter DNA methylation of GSTP1 [15], RARB [16], RASSF1 [17], and APC [18] has been widely studied as a non-invasive marker for early PC diagnosis. Detection of a hypermethylated GSTP1 promoter in blood or urine is associated with the presence of PC [17]. Tumors carrying a mutation in the IDH1 gene, which account for 1% of all PC cases, also have an increased level of DNA methylation [19].
In some cases, subgroups of malignant tumors display the so-called CpG island methylator phenotype (CIMP), which is characterized by intense hypermethylation of gene promoter regions and is associated with an unfavorable prognosis in colorectal cancer [20,21]. The existence of CIMP was first demonstrated in colorectal cancer and was later shown in bladder, breast, endometrial, gastric, hepatocellular, and lung cancers, as well as in gliomas [21]. The presence of the TMPRSS2-ERG fusion transcript defines one of the most common molecular subtypes of PC and has been considered a marker of unfavorable prognosis in PC [19]. CIMP has not been found in PC; however, a higher overall genome methylation level was shown in TMPRSS2-ERG-negative cases of PC [22]. Distinct methylation clusters have been reported among TMPRSS2-ERG-positive samples; moreover, one-third of TMPRSS2-ERG-positive PC samples formed a hypermethylated cluster [19]. However, the association of aberrant DNA methylation with PC prognosis remains unclear [23].
This study aims to identify differentially methylated CpG sites associated with the high-risk group of PC, including unfavorable prognosis within the group and the TMPRSS2-ERG molecular subtype, based on data from The Cancer Genome Atlas (TCGA) project.

Methods
The analysis of differential CpG methylation was carried out in the R statistical environment (v.3.5.2) [25]. The BiSeq package (v.1.22.0) [26] was used to compare β-values between groups. The Mann-Whitney test, β-regression, and logistic regression modeling were applied. We considered CpG sites (Illumina CpG IDs, cg#) with p-value <0.05 in all three tests as differentially methylated. To retrieve the CpG sites that best discriminate between two patient groups, the fold change (log2 FC) and Δβ-value between the comparison groups were calculated. Spearman's rank correlation analysis (standard "cor.test" function) of the detected CpG sites with the high-risk group was performed. CpG sites were annotated (genomic position, gene name, promoter or enhancer) using the Ensembl [27] and GeneHancer [28] databases, the UCSC browser [29], and annotatr (v.1.8.0) [30]. When selecting top-ranked CpG sites, preference was given to those located in regulatory genomic regions (promoters or enhancers).
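The selection rule described above (p-value <0.05 in all three tests) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which was run in R with BiSeq. The function names (`average_ranks`, `mann_whitney_p`, `passes_all_tests`) and the β-values are hypothetical. The Mann-Whitney p-value is computed with a normal approximation with tie correction and no continuity correction, which is an assumption of this sketch. The β-regression and logistic regression p-values are treated as given inputs.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0  # all observations identical: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def passes_all_tests(p_values, alpha=0.05):
    """A CpG site is kept only if every test gives p < alpha."""
    return all(p < alpha for p in p_values)


# Hypothetical beta-values of one CpG site in two patient groups.
high_risk = [0.82, 0.79, 0.85, 0.88, 0.81, 0.84, 0.86, 0.80]
intermediate_risk = [0.61, 0.66, 0.58, 0.70, 0.63, 0.65, 0.60, 0.68]

p_mw = mann_whitney_p(high_risk, intermediate_risk)
# The other two p-values are placeholders standing in for the
# beta-regression and logistic regression results.
print(p_mw, passes_all_tests([p_mw, 0.01, 0.03]))
```

Requiring significance in all three tests is a conservative intersection rule: a site that is significant in only one or two of the tests is discarded.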
Differential expression analysis was carried out on the same samples using the edgeR package (v.3.24.3) [31]. The count matrix was normalized with the trimmed mean of M-values (TMM) method; the quasi-likelihood (QLF), Fisher's exact (ET), and Mann-Whitney tests were applied to detect differences between the comparison groups. In addition, changes in gene expression level between the comparison groups (log2 FC) and the overall gene expression level in the cohort (log2 CPM) were calculated. Spearman's rank correlation analysis (standard "cor.test" function) of the identified CpG sites with the expression level of their genes was performed. Differentially expressed genes were annotated with the biomaRt package (v.2.38.0) [32,33].
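The correlation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study, which relied on R's "cor.test" and edgeR. The helper names (`log2_cpm`, `spearman_rho`) and all numbers are hypothetical. The log2 CPM here uses raw library sizes and a small prior count of 0.5, which is a simplification and not edgeR's TMM-normalized calculation. Spearman's coefficient is computed as the Pearson correlation of average ranks.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def log2_cpm(count, library_size, prior=0.5):
    """Simplified log2 counts-per-million (no TMM scaling; assumption)."""
    return math.log2((count + prior) / library_size * 1e6)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical data for one CpG site and its gene across six samples.
beta_values = [0.12, 0.25, 0.31, 0.47, 0.58, 0.66]
read_counts = [950, 800, 640, 500, 350, 220]
library_sizes = [21e6, 19e6, 20e6, 22e6, 18e6, 20e6]

expression = [log2_cpm(c, n) for c, n in zip(read_counts, library_sizes)]
rho = spearman_rho(beta_values, expression)
print(rho)  # a negative value means higher methylation goes with lower expression
```

Because the coefficient depends only on ranks, it is unaffected by the log transformation of the expression values; the transformation matters only for reporting the expression level itself.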

[Table: Criteria | Parameter | High risk, n | Intermediate risk, n]

Differentially methylated CpG sites associated with the unfavorable prognosis in the high-risk group of PC
We identified seven differentially methylated CpG sites (p-value ≤0.05) in the unfavorable prognosis group of PC compared with the favorable one: cg00063748, cg06834698, cg18607127, cg25273707, cg01704198, cg02067712, and cg02157224. Among them, the cg01704198 and cg02067712 sites were hypermethylated (FC >1; Δβ-value >0), whereas the other CpG sites were hypomethylated (FC <1; Δβ-value <0) (Figure 1b). Six of the identified CpG sites were localized in the promoter regions of the PRDM16, OR10A3, RP11-844P9.2, CLASP2, GSX1, and C8orf86 genes; the cg25273707 CpG site belonged to a transcription factor (TF)-binding region (Table 2) [27-30]. Differential expression analysis revealed no significant expression changes of the above genes between the unfavorable and favorable prognosis groups within the high-risk group of PC (Table 3).
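The direction criteria used above (FC >1 with Δβ-value >0 for hypermethylation, FC <1 with Δβ-value <0 for hypomethylation) can be written out as a short Python sketch. The function name `methylation_shift` and the β-values are hypothetical. Computing FC as the ratio of group mean β-values is an assumption of this illustration, since the exact formula is not specified in the text.

```python
import math
from statistics import mean


def methylation_shift(beta_case, beta_reference):
    """Summarize the methylation difference of one CpG site between groups.

    Assumption: FC is the ratio of the group mean beta-values, and
    delta-beta is their difference (case minus reference).
    """
    mean_case = mean(beta_case)
    mean_ref = mean(beta_reference)
    if mean_case <= 0 or mean_ref <= 0:
        raise ValueError("group mean beta-values must be positive")
    delta_beta = mean_case - mean_ref
    fold_change = mean_case / mean_ref
    if delta_beta > 0:
        direction = "hypermethylated"
    elif delta_beta < 0:
        direction = "hypomethylated"
    else:
        direction = "unchanged"
    return {
        "delta_beta": delta_beta,
        "fold_change": fold_change,
        "log2_fc": math.log2(fold_change),
        "direction": direction,
    }


# Hypothetical beta-values: unfavorable (case) vs favorable (reference) prognosis.
unfavorable = [0.72, 0.68, 0.75, 0.70]
favorable = [0.55, 0.60, 0.52, 0.58]
print(methylation_shift(unfavorable, favorable))
```

With positive group means, FC >1 and Δβ-value >0 always agree in sign under this definition, so the two criteria act as one direction rule while carrying different information about effect size (relative versus absolute change).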

Discussion
DNA methylation is one of the main mechanisms of gene expression regulation. In normal adult somatic cells, oncogene silencing is maintained by promoter methylation, whereas the promoters of tumor suppressor genes remain unmethylated [4]. Altered DNA methylation leads to the deregulation of gene expression patterns and disruption of crucial cellular processes, such as DNA repair, cell adhesion, cell cycle control, and apoptosis, contributing to the development of cancer [4,60]. Cancer-associated genome-wide hypomethylation occurs more often than hypomethylation of individual genes [60]. At the same time, promoter hypermethylation of individual genes is frequently observed in carcinogenesis [60]. In this study, we found both hypermethylation and hypomethylation of CpG sites of individual genes associated with the high-risk group of PC. The identified genes have not been previously reported as oncogenes or tumor suppressor genes.
Comparison of the high- and intermediate-risk groups of PC revealed eight hypermethylated CpG sites in promoters of different genes. Decreased expression was found for only three of the eight genes (CSRNP1, STK33, and PLCD1). For these genes, we observed a negative correlation between CpG site methylation status (β-value level) and expression changes. Spearman's rank correlation coefficients were statistically significant but low. Thus, hypermethylation of these CpG sites tends to be associated with decreased gene expression, although the association is weak. The hypermethylation of the other identified CpG sites was not associated with expression alterations of the corresponding genes. Notably, aberrant methylation of the STK33 and PLCD1 genes has been observed in other cancers. In particular, frequent promoter hypermethylation of the PLCD1 gene was shown to be associated with its downregulation in breast [38], gastric [39], and colorectal cancers [40], as well as in chronic myeloid leukemia [41]. In colorectal cancer, PLCD1 promoter hypermethylation and decreased expression were correlated with tumor progression [42]. Hypermethylation of the STK33 gene promoter was associated with progression of colorectal [34,35] and head and neck cancers [36]; no data on altered expression of this gene were previously reported. For the IQSEC1 gene, we did not observe a significant expression change correlated with the CpG methylation status. However, hypermethylation of the IQSEC1 gene promoter and its downregulation were reported in lung cancer [37]. The methylation status of CSRNP1 has not been studied previously; however, expression of the gene was decreased in hepatocellular [44] and lung cancers [45], correlating with tumor progression.
Seven differentially methylated CpG sites were found in the comparison of the favorable and unfavorable prognosis groups within the high-risk group of PC. Additional differential expression analysis of the genes containing the identified CpG sites revealed no significant expression changes. Therefore, aberrant methylation of the identified CpG sites does not appear to influence expression of the corresponding genes. Two genes (PRDM16 and CLASP2) have been previously shown to be involved in cancer. Promoter hypermethylation and downregulated expression of the PRDM16 gene were observed in lung cancer [49]. In gastric cancer, decreased PRDM16 expression was associated with an unfavorable prognosis [50]. The methylation status of the CLASP2 gene has not been studied; however, upregulation of the gene was detected in bladder cancer [51].
The analysis of TMPRSS2-ERG-positive tumors within the high-risk group of PC revealed six hypomethylated CpG sites in different genes, among which significant upregulation was observed for ARHGEF4, WT1, SCYL3, and VPS13D. Expression changes in these genes were negatively correlated with the β-value levels of the identified CpG sites. Thus, hypomethylation of the cg01138171, cg19570244, cg25605277, and cg26228280 CpG sites can potentially upregulate the expression of the corresponding genes. In the literature, there are no data on the methylation status of the identified genes. However, the ARHGEF4 and WT1 genes were characterized by increased expression in pancreatic [52,53] and prostate cancers [57,58], which correlated with unfavorable prognosis and poor survival of patients.
As in our study, the STK33 and PLCD1 genes showed similar methylation changes and expression signatures in colorectal cancer, indicating a potential effect of methylation on their expression. For the CSRNP1, ARHGEF4, and WT1 genes, altered expression has been reported in lung, liver, pancreatic, and androgen-independent prostate cancers. However, methylation or expression changes in SCYL3 and VPS13D have not been reported in any cancer. Thus, we found differential methylation of several CpG sites associated with the high-risk group of PC. Furthermore, aberrant methylation was related to individual CpG sites located predominantly in gene promoter regions. CSRNP1, STK33, PLCD1, ARHGEF4, WT1, SCYL3, and VPS13D were also characterized by significant changes in mRNA levels that were negatively correlated with the methylation status of the identified CpG sites. The identified CpG sites could be considered as potential prognostic markers of the high-risk group of PC.