Superenhancer–transcription factor regulatory network in malignant tumors

Abstract Objective This study aims to identify superenhancer (SE)–transcriptional factor (TF) regulatory network related to eight common malignant tumors based on ChIP-seq data modified by histone H3K27ac in the enhancer region of the SRA database. Methods H3K27ac ChIP-seq data of eight common malignant tumor samples were downloaded from the SRA database and subjected to comparison with the human reference genome hg19. TFs regulated by SEs were screened with HOMER software. Core regulatory circuitry (CRC) in malignant tumor samples was defined through CRCmapper software and validated by RNA-seq data in TCGA. The findings were substantiated in bladder cancer cell experiments. Results Different malignant tumors could be distinguished through the H3K27ac signal. After SE identification in eight common malignant tumor samples, 35 SE-regulated genes were defined as malignant tumor-specific. SE-regulated specific TFs effectively distinguished the types of malignant tumors. Finally, we obtained 60 CRC TFs, and SMAD3 exhibited a strong H3K27ac signal in eight common malignant tumor samples. In vitro experimental data verified the presence of a SE–TF regulatory network in bladder cancer, and SE–TF regulatory network enhanced the malignant phenotype of bladder cancer cells. Conclusion The SE–TF regulatory network with SMAD3 as the core TF may participate in the carcinogenesis of malignant tumors.


Introduction
The superenhancer (SE) regulates gene expression and is characterized by high-density epigenetic modifications associated with transcription factors (TFs), and cofactors [1]. The abnormal gene transcription mediated by SEs is essential for maintaining the characteristics of tumor cells [2]. Tumor cells significantly promote the expression of various oncogenes by assembling SE, thereby enhancing their proliferation, invasion, and metastasis [3]. Therefore, the identification and analysis of core SEs and TFs in malignant tumors are of great value in tumor research [4].
Mechanisms of abnormal SE formation in malignant tumors have been demonstrated to exhibit a profound influence on both molecular pathogenesis and clinical management [5]. The transcriptional element, for example, bromodomain-containing protein 4 (BRD4) and cyclin-dependent kinase 7 (CDK7), has been reportedly the treatment target due to tumor-specific SEs [6]. Myeloid cell leukemia-1 gene (MCL1) and B-cell leukemia/lymphoma-xl (BCL-xl) are cell apoptosis regulators and express aberrantly in cancer, which plays an important role in chemoresistance [7]. By chromatin immunoprecipitation and sequencing (ChIP-seq) processing, Oldridge et al. found changed polymorphism within one SE element of LIM domain only 1 (LMO1) significantly affected neuroblastoma susceptibility by different binding of gamma-aminobutyrate (GATA) TF or regulation of LMO1 expression, which resulted in oncogenic dependency of neuroblastoma [8]. Lysine (K)-specific methyltransferase 2D (KMT2D) deficiency was reported to impair SEs to form glycolytic vulnerability in lung cancer, promote tumorigenesis in mice and upregulate protumorigenic progression, including glycolysis [9]. Oncogenic homeobox B8 (HOXB8) was driven by MYC-regulated SEs and enhanced colorectal cancer invasiveness through BTB and CNC homology 1 (BACH1) [10]. Increased expression of PPP1R15A and CDK7 is positively associated with undesirable clinical prognosis in anaplastic thyroid carcinoma. CDK7 and PPP1R15A are considered potential biomarkers and therapeutic targets for anaplastic thyroid carcinoma [11].
Although increasing reports have shown some SEs and TFs are correlated with malignancy, the common and specific SEs and TFs in various tumors, as well as complex regulatory networks, have largely been unknown. Furthermore, a majority of studies are only involved in one type of malignant tumor. Histone H3 Lys27 acetylation (H3K27ac) CHIP-seq and RNA-seq are recently hot tools for studying DNA, RNA, and protein [12,13]. Our study aims to identify and analyze core SEs and TFs in various malignant tumors such as liver cancer, lung cancer, and breast cancer based on H3K27ac ChIP-seq data in the sequence read archive (SRA) database as well as RNA-seq data in The Tumor Genome Atlas (TCGA) database.

Data acquisition
The SRA database (https://www.ncbi.nlm.nih.gov/sra) is a database used to store the original data of second-generation sequencing. The ChIP-seq data of the differential histone modification regions of eight common malignant tumor samples were all from the SRA database. The TCGA project is a joint project initiated by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) in 2006, which included 39 types of malignant tumors investigated from the very first glioblastoma multiforme to the present, involving 29 types of malignant tumor organs, and more than 10,000 tumor samples. The RNA-seq data used in the analysis were all from the TCGA database.
Informed consent: Informed consent was not applicable for this study.
Ethical approval: Ethics committee approval was not applicable for this study.

Data preprocessing
Sample sequencing volume and quality were first evaluated and the Cutadapt software (https://cutadapt.readthedocs. io/en/stable/) was adopted to remove the joint and low-quality bases, followed by data quality control using the FastQC software (https://www.bioinformatics.babraham. ac.uk/projects/fastqc/). Clean reads were aligned to the human reference genome hg19 using the Bowtie2 software (ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/ Homo_sapiens/UCSC/hg19/Homo_sapiens_UCSC_hg19. tar.gz). Unique alignment data were extracted from the obtained SAM files. The BAM files were deduplicated, and the data of the same cell line were merged and sorted using the Samtools [14]. With the input data as control, MACS2 software (https://pypi.org/project/MACS2/) [15] was applied to retrieve the significant H3K27ac peaks (p < 1 × 10 −9 was considered significant). Significant H3K27ac peaks with a distance longer than 2.5 kb from the TSS sites were extracted as the enhancer. The obtained enhancers were extended upstream and downstream by 5 kb from the middle point of the corresponding peaks. Next, the enhancer sequence was segmented into 100 bp bin reads and aligned with the BED files converted from the BAM files generated before. The average number of peaks aligned to all the corresponding bins of an enhancer was regarded as the H3K27ac signal of the enhancer. bam-Coverage in the deepTools [16] was applied to convert the processed BAM file of each sample into a corresponding Bigwig file, and the Integrated Genomics Viewer (IGV) was uploaded for visualization.

Histone H3K27ac modification analysis
The human reference genome hg19 was divided into small fragments with a length of 2 kb, and the number of reads of each sample mapped to each small fragment was calculated and normalized by the sequencing depth of the sample. The normalized H3K27ac signal files of the small fragments were then integrated into a matrix file according to tumor type and the correlation was calculated by an R algorithm.

Prediction of SEs in eight common malignant tumor samples
The SEs and TEs in all samples were calculated using the ROSE algorithm [6,17] combined with the significant H3K27ac peak found by MACS2. The SEs found in different cell lines of the same malignant tumor were merged through the merge module of Bedtools software [18], and then the frequency of occurrence of each merged SE in different cell lines was calculated. In each malignant tumor, the SEs appearing in at least two cell lines were selected for subsequent analysis.

Differential expression analysis of SEs and TEs
The SEs that appeared in at least two cell lines were annotated with HOMER software [19]. Subsequently, the SE-regulated target genes in each malignant tumor sample were extracted for functional enrichment analysis with DAVID software [20,21]. These genes were merged using the merge module of the Bedtools software, and the frequency of all SEs presented in eight common malignant tumors was calculated. The number of different SE-regulated target genes in eight common malignant tumor samples was counted and the conservative SE-regulated target gene and the malignant tumor-specific SE-regulated target gene were defined in sequence. Afterward, a malignant tumor conservative SE-regulated target gene and a malignant tumor-specific SE-regulated target gene were selected and displayed (H3K27ac signal) using IGV. Finally, expression profile data in the TCGA database was used for verification.

Identification of conservative and malignant tumor-specific TFs
Single significant H3K27ac peaks included in the SEs that appeared more than twice in eight common malignant tumors were first extracted. The nucleosome-free regions (NFRs) were then extracted from the bed file converted from the sorted BAM file with HOMER. The single significant H3K27ac peaks were overlapped with the extracted NFRs to identify the NFRs located on the SEs, and the results were saved as NFR bed files. The bed files were used as input to find the TFs regulated by SEs in corresponding malignant tumor samples with HOMER, and the p-value and motif graphs of the corresponding TFs were generated.

Screening of the SE-TE regulatory network in eight common malignant tumors
Core regulatory circuitry (CRC), namely the SE-TE regulatory network, mainly refers to the regulatory loop composed of SEs and core TFs in cells [22]. Generally, the expression of a core TF gene was not only regulated by the corresponding SEs but also regulated by binding of the SEs with the TF itself. CRCmapper software [22] was applied to define the core TF regulatory circuits in each malignant tumor sample. The TFs with the highest score in the SE-TF regulatory network in each malignant tumor sample were counted and collated according to the frequency of their appearance in each malignant tumor sample. For further validation, the expression of all TF coding genes was displayed according to the tumor expression profile data in the TCGA database, and then a CRC TF that appeared several times in malignant tumor core TFs, as well as a specific TF that only appeared in one specific malignant tumor sample, was selected to separately display the H3K27ac signal of the SEs near the two loci by IGV. T24 cells in the logarithmic phase were trypsinized, seeded into a 6-well plate at a density of 1 × 10 5 cells per well, and cultured for 24 h. Upon reaching about 75% confluence, the cells were transfected using the Lipofectamine 2000 reagent (Invitrogen Inc., Carlsbad, CA, USA) with short hairpin RNA plasmids targeting NC (sh-NC), SMAD3 (sh-SMAD3), ETS1 (sh-ETS1), and HOXB2 (sh-HOXB2), and overexpression plasmids of NC (pcDNA3.1), SMAD3 (pcDNA3.1-SMAD3), ETS1 (pcDNA3.1-ETS1), and HOXB2 (pcDNA3.1-HOXB2), as well as sh-NC + pcDNA3.1, sh-SMAD3 + pcDNA3.1, sh-SMAD3 + pcDNA3.1-ETS1, sh-SMAD3 + pcDNA3.1-HOXB2, and sh-SMAD3 + pcDNA3.1-ETS1 + pcDNA3.1-HOXB2 plasmids. After 48 h of transfection, subsequent experiments were carried out, and each experiment was repeated 3 times. The concentration of plasmids used was 50 ng/mL. shRNA or overexpression plasmids were provided by Shanghai GenePharma Co., Ltd (Shanghai, China).

RNA isolation and quantitation
Total RNA was extracted from cells using the TRIzol reagent (16096020, Thermo Fisher Scientific). For mRNA analysis, the extracted RNA was reversely transcribed into cDNA using a reverse transcription kit (RR047A, Takara, Tokyo, Japan). RT-qPCR was conducted using an SYBR® Premix Ex TaqTM II kit (DRR081, Takara) on an ABI 7500 instrument (Applied Biosystems, Foster City, CA, USA), with three repeated wells for each sample. GAPDH served as the internal reference, and the fold changes were calculated using the method of 2 −ΔΔCt . The primer sequences are shown in Table S2.

Transwell assay
Transwell chambers (8 μm pore size; Corning Incorporated, Corning, NY, USA) in 24-well plates were used for in vitro cell migration and invasion detection. In brief, 600 μL of 20% FBS-containing RPMI-1640 medium was preadded to the 8 μm pore-size Matrigel-coated Transwell chambers and the Matrigel-free Transwell chambers and equilibrated at 37°C for 1 h. T24 cells transfected for 48 h were resuspended in RPMI-1640 medium containing 10% FBS, and 100 μL of the cell suspension containing 1 × 10 9 cells/L was added to the upper chamber, and cultured at 37°C with 5% CO 2 for 24 h. The Transwell chamber was removed and the cells in the inner layer of the Transwell chamber were wiped with a cotton swab. After washing with PBS, cells were fixed 4% methanol, and stained with 0.1% crystal violet before counting; they were photographed under an inverted microscope (Olympus IX73, Olympus Optical Co., Ltd, Tokyo, Japan) in five randomly selected fields of view, with three replicates for each specimen. The differences between the groups were analyzed and the histogram of migration and invasion was plotted.
2.13 Cell count kit-8 (CCK-8) assay T24 cell proliferation was evaluated using a CCK-8 kit (K1018, Apexbio, USA). Cells (1 × 10 4 cells per well, 100 μL/well) were plated in a 96-well plate, and 10 μL of CCK-8 solution was added at each time point (0, 24, 48, 72, and 96 h) to incubate the cells at 37°C for 2 h. Next, the optical density (OD) value was measured at 450 nm with a microplate reader, and the obtained data were displayed in curve graphs.

Statistical analysis
All data were analyzed using SPSS 21.0 statistical software (IBM Corp. Armonk, NY, USA). The measurement data were described as mean ± standard deviation. Data obeying normal distribution and homogeneity of variance between two groups were compared by unpaired t-test. Differences among multiple groups were statistically analyzed using one-way analysis of variance (ANOVA) or repeated measures ANOVA, followed by Tukey's post hoc tests with corrections for multiple comparisons. A value of p < 0.05 was statistically significant.

Data information of cell lines in eight common malignant tumor samples in the SRA database
ChIP-seq data of differential histone modification regions in eight common malignant tumor cell lines were downloaded from the SRA database. All cell lines did not receive special treatment, and the data of different cell lines for each malignant tumor were merged and 71 types of cell lines were obtained. The data information of eight common malignant tumor cell lines in the SRA database is shown in Table S3.

Difference in the H3K27ac signal intensity in the enhancer region of eight common malignant tumor cell lines
Analysis of the histone H3K27ac modification changes in the enhancer region of each malignant tumor cell line by the MACS2 software indicated a large number of significant peaks in each malignant tumor cell line (Figure 1a). In addition, the Python script results showed the presence of a large number of enhancers in each malignant tumor cell line, which was consistent with the significant H3K27ac peak trend ( Figure 1b) Figure 1c). Furthermore, we displayed the H3K27ac signals on the neutrophil cytosolic factor 2 (NCF2) gene locus by IGV in the eight selected malignant tumor cell lines, as representative; and we found that H3K27ac signals were highly enriched in the enhancer regions (Figure 1d).  (Figure 2).

Screening of the number of SEs in eight common malignant tumor samples
SEs are a large cluster of transcriptionally active enhancers that drive the expression of genes that control cell identity [23]. To identify the SEs in all the eight common malignant tumors, we applied a method consisting of three steps. The first step was to determine the active enhancer sites, which are described in Figure 1. The next step was to stitch the enhancers we obtained by their distribution and distance between each other. Within the genome range, if the distance between two enhancer annotations were within 12.5 kb, then they were merged into a single entity called the stitched enhancer. Finally, the threshold between the SEs and the TEs was determined. We sorted the stitched enhancers and the remaining single enhancers according to the intensity of the H3K27ac signal level (from low to high) and drafted a graph of the ranking result. On this graph of H3K27ac signal density vs density ranking, we identified the tangent point by a tangent line with a slope of 1, which was considered as the threshold. That is, the points with higher density (to the right and above the tangent point) were SEs, while the rest were TEs (Figure 3a). The significant H3K27ac peak found by the ROSE algorithm combined with the screening results of MACS2 software was used to calculate the SEs and TEs in all samples. The SEs found in different cell lines of the same type of malignant tumor were merged using the merge module of Bedtools software, and the SEs that appeared in at least 2 cell lines in the eight common malignant tumor samples were calculated. The results indicated that eight common malignant tumor samples had a large number of SEs, and colorectal malignant tumor samples had the largest number (2500) (Figure 3b). To further confirm that our prediction of SEs was correct, we selected the PADI1 gene, which was reported to be regulated by an adjacent estrogen receptor alpha-dependent SE [24] and visualized by IGV. The results showed that, except for the COR-L311, HepG2, and KYSE140 cell lines, other cell lines all had strong PADI1 gene enhancer region H3K27ac signal expression, suggesting SE expression ( Figure 3c). re c ta l c a n c e r L iv e r c a n c e r K id n e y c a n c e r G a s tr ic c a n c e r B la re c ta l c a n c e r L iv e r c a n c e r K id n e y c a n c e r G a s tr ic c a n c e r B la

Analysis of conservative and malignant tumor-specific target genes regulated by SEs in eight common malignant tumor samples
Functional enrichment analysis of the SE-regulated genes in malignant tumor samples through DAVID software revealed that these genes were significantly enriched in pathways including cell junction organization, cell junction assembly, and cell-cell junction organization (Figure 4a and b). We collated all SE-regulated genes in eight common malignant tumor samples and obtained 5,923 target genes (Figure 4c and d). As shown in Table S4, there were 2154 SE-regulated genes (conservative target genes) occurring more than 6 times in eight common malignant tumor samples, while 35 genes (specific target genes) occurring only once. We selected the malignant tumor-specific SE-regulated gene A2M and the conservative SE-regulated gene ABALON as representatives and displayed the H3K27ac signal distribution on the two loci by IGV in the eight common malignant tumor cell lines. The results showed that H3K27ac signal peaks on A2M were only observed in a few malignant tumor cell lines (Figure 4e), while the peaks on ABALON was clearly observed in eight common malignant tumor cell lines (Figure 4f). Analysis of the SE-regulated target gene expression data in the eight common malignant tumor samples from the TCGA database revealed that the results were consistent with the H3K27ac ChIP-seq results (Figure 4g).  Figure 2: A heat map of the correlation analysis of the H3K27ac signals in different malignant tumor cell lines.

Identification of malignant tumorspecific and conservative TFs regulated by SEs in eight common malignant tumor samples
The TFs regulated by SEs in each malignant tumor sample were identified using HOMER software, and the top 10 TFs with the smallest p-value were screened. The results displayed that some TFs existed in multiple malignant tumor samples, and some TFs only in specific malignant tumor samples (Figure 5a). The differential expression of all TFs in each malignant tumor sample is shown in Figure 5b. The TF occurring more than 4 times was defined as a conservative TF and that occurring only once was defined as a specific TF. Based on TCGA RNAseq data of eight common malignant tumor samples, we calculated the expression of all the identified TFs in each malignant tumor type (Figure 5c). Finally, we selected a conservative TF kruppel-like factor 5 (KLF5) and

Screening of the SE-TF regulatory network in eight common malignant tumor cell lines
The screening of the SE-TF regulatory network was divided into three steps: the first step was to find all the TFs regulated by the SEs; the second step was to find the TFs that could bind to the SEs to regulate the expression of their own genes from the TFs regulated by SEs, which is the so-called autoregulated; the third step was to merge all autoregulated TFs. These TFs could regulate the expression of each other as well as themselves, and they together formed a SE-TF regulatory network (Figure 6a).
Prediction results of the core TFs in eight common malignant tumor cell lines by CRCmapper software are shown in Table S5. The core TFs with the highest scores in each malignant tumor sample were merged, with a total of 60 core TFs obtained (Figure 6b). To verify the prediction, we analyzed TCGA RNA-seq data of eight common malignant tumor samples and found consistent results with those of CRCmapper software (Figure 6c). We further validated the results by selecting TFs SMAD3 and NR5A1 and employed IGV to analyze the H3K27ac signal in the gene enhancer region. The results showed that SMAD3 had strong H3K27ac signals in eight common malignant tumor samples, while the NR5A1 only had H3K27ac signals in liver cancer samples (Figure 6d and e).  HNF1B  HNF1A  NR5A2  INSM1  DLX2  POU2F3  HOXA7  EOMES  GLI2  FOXD2  FOXL1  GLIS3  ERG  NFATC1  IKZF2  RUNX2  BHLHE40  JUNB  SOX9  SREBF1  FOSL2  HIF1A  MYC  ELF3  HES1  SOX13  FOXP1  RREB1  E2F6  KLF7  PRDM1  RUNX1  NR3C1  MAF  MEF2A  TBX2  FOXC1  HOXB2  KLF16  KLF13  TCF7L2  SMAD3  ETV6  TGIF1  ELF1  KLF4  NR4A1  NFIL3  ETS1  MAX  MEF2D  ASCL2  SOX2  EHF  FOXA1  FOXQ1  The above results indicated that the SE-TF regulatory network with the TF SMAD3 as the core may be involved in the occurrence and development of a variety of malignant tumors.

Validation of the identified SE-TF regulatory network in bladder cancer cells
Next, we aimed to verify the H3K27ac signal of the TF SMAD3 in tumor samples by culturing eight kinds of malignant tumor cell lines. The results of ChIP-PCR showed SMAD3 had a strong H3K27ac signal in eight common malignant tumors (Figure 7a), suggesting that the SE-TF regulatory network with SMAD3 as the core may be involved in the occurrence and development of a variety of malignant tumors. Further, we took bladder cancer as an example for verification. The core TFs of bladder cancer were SMAD3, ETS1, and HOXB2. The results of RT-qPCR showed (Figure 7b) these TFs were expressed at an average level in the bladder cancer cell line T24. T24 cells were transfected with shRNA and overexpression plasmids targeting these TFs, the efficiency of which was confirmed by RT-qPCR and western blot analysis (Figure 7c and d).
In order to achieve the best silencing efficiency, the siRNA-pool was transfected into T24 cells where the expression of TFs was determined by RT-qPCR and western blot analysis. The results presented that silencing the expression of any TFs would result in a decrease in the expression of other TFs, and meanwhile, overexpression of any TFs would also lead to an increase of other TFs (Figure 7e and f). In addition, silencing any TFs and overexpressing other TFs simultaneously in T24 cells could partially reverse the above results (Figure 7g and h). These results indicated that SMAD3, ETS1, and HOXB2 in bladder cancer interacted with each other to construct a SE-TF regulatory network.
For further verification of the direct transcriptional regulation between these TFs, we used ChIP-PCR to detect the binding of each TF to the promoter regions of the other two TFs. The results displayed that each TF could bind to the promoter regions other than itself (Figure 7i), indicating that all TFs were connected to each other to form a network. The above results demonstrated the presence of a SE-TF regulatory network in bladder cancer.

SE-TF regulatory network enhances the malignant phenotype of bladder cancer cells
Finally, we attempted to elucidate the effect of the SE-TF regulatory network on the function of T24 cells. The results of Transwell and CCK-8 assays showed that overexpression of TFs stimulated the migration, invasion of T24 cells, accompanied by enhanced proliferation while silencing of TFs resulted in opposite results (Figure 8a). Furthermore, weakening of the malignant phenotype of T24 cells caused by silencing of SMAD3 could be partially reversed by overexpression of other one or two TFs (Figure 8b). The aforementioned findings indicated the promoting effect of the SE-TF regulatory network on the malignant phenotype of bladder cancer cells.

Discussion
SEs and TFs are associated with many genes involved in cancer pathogenesis [5,25]. Primary oncogene promoters of tumor cells are controlled by SE, rendering selective activation of transcription [26]. Mechanisms of abnormal SE formation in malignant tumors have been demonstrated to exhibit profound influence on both molecular pathogenesis and clinical management [5]. Our study is to identify and validate core SEs and TFs in eight common malignant tumors based on H3K27ac ChIP-seq in the SRA database as well as RNA-seq data in the TCGA database, followed by verifications in bladder cancer cell experiments. First, we obtained H3K27ac and Input ChIP-seq sequencing data of eight common malignant tumor cell lines including gastric cancer through the SRA database and downloaded FPKM data of all RNA-seq from the TCGA database for verification. Similarly, other research works also downloaded H3K27ac and Input ChIP-seq sequencing data of tumor cell lines from the SRA database [27][28][29]. Unique to our result, FPKM data of all RNA-seq were downloaded from the TCGA database for verification; ChIP-seq sequencing data were not specially processed and from the same malignant tumor cell lines before merging. This study targets eight common malignant tumors, whereas the former reports only refer to a kind of malignant tumor [3,30]. Then, MACS2 software was used to find the H3K27ac peak in each malignant tumor cell line. Taking the significant H3K27ac peak at more than 2.5 kb from the TSS site as the enhancer, we found that abundant enhancers existed in each malignant tumor cell line, which was consistent with the tendency of the significant H3K27ac peak. The NCF2 gene was selected and IGV was used to show the H3K27ac signal on the enhancer of eight common malignant tumors. Generally, Bowtie2 is used for comparison in the preliminary processing of ChIP-seq sequencing data, and the only comparison data are obtained by Samtools to  deduplicate [31][32][33]. Differently, this study merged data from the same tumor cell line and identified the significant H3K27ac peak through the MACS2 software. The significant H3K27ac peak at more than 2.5 kb from the TSS site was used as the enhancer. Then the significant H3K27ac peak was compared with the identified enhancer, and finally, all data analyzed were displayed as a whole. Next, we found that H3K27ac signals of same malignant tumor cell line could be clustered together by analyzing the correlation of H3K27ac signals, indicating that H3K27ac pattern is different and the H3K27ac signal can distinguish malignant tumors. We used the ROSE algorithm to predict SEs. A single enhancer entity within the range of 12.5 kb was merged and SEs that occurred in at least two cell lines were calculated so that the range was diminished and the stability of the prediction results was improved.
In addition, we examined all SE-regulated genes in eight common malignant tumors. Thirty-five genes that occurred more than six times in all malignant tumors were defined as conservative SE-regulated genes, and 2,154 genes were specific for each malignant tumor sample. The data analyzed in this study were only for SEs that occurred in at least two cell lines, and the specific and conservative SEs were defined differently. The defined SEs were verified by IGV, and the SE-regulated genes in all malignant tumors were obtained from the TCGA database. Finally, the expression data in eight common malignant tumors were compared with the analysis results of H3K27ac ChIP-seq data. Generally, the core TFs that bound to SEs in each malignant tumor sample were found with HOMER software [34][35][36]. Importantly, the TFs that appeared more than four times in malignant tumors were defined as malignant tumor-conservative TFs, while TFs that appeared only once were defined as malignant tumor-specific TFs. Based on TCGA RNA-seq data, we calculated the expression of all TFs in each type of malignant tumor, which can be distinguished according to the defined specific and conservative TFs. Conservative TF KLF5 and specific TF POU2F2 were chosen to exhibit the expression of all malignant tumors through a beeswarm package, which also could distinguish malignant tumor-specific TFs from conservative TFs. Finally, CRC TFs were predicted in eight common malignant tumor cell lines through the CRCmapper software, which is the same as other reports [37,38]. This study combined with the expression profile data of patients in the TCGA database, and also showed some of the identified CRC TFs near SE H3K27ac signal through the IGV. Furthermore, the in vitro experimental data demonstrated the presence of a SE-TF regulatory network in bladder cancer, and the SE-TF regulatory network enhanced the malignant phenotype of bladder cancer cells.
In conclusion, we integrated the H3K27ac ChIP-seq of eight common malignant tumor cell lines and RNA-seq from cancer patients in the TCGA database and identified core SEs and TFs. In all malignant tumor samples, there were 35 SE-regulated genes that occurred more than six times, while 2,154 SE-regulated genes occurred only once, which were defined as conservative SE-regulated genes and specific SE-regulated genes, respectively. We found the core TFs bound to SEs in each malignant tumor sample with HOMER. The TFs that appeared more than four times in tumor samples were defined as malignant tumor-conservative TFs including Fral, KLF5, and NFY, while TFs that appeared only once were defined as specific TFs such as POU2F2. Finally, eight common malignant tumor cell lines including 76NF2V and 786-M1A were selected to predict CRC TFs with CRCmapper software. A total of 60 CRC TFs were obtained, among which SMAD3 were present in five types of common malignant tumors, while 46 CRC TFs including NR5A1 were only present in one type of malignant tumor. Taken together, our study provides new ideas for the research of these malignant tumors and experimental validations in cancer cells. However, identified conservative and specific SEs and TFs need to be verified by basic experiments, which might indicate a promising method to improve malignant tumor therapy. Table S1: Primer sequences for ChIP-PCR Gene Sequence SMAD3 (human) Forward: 5ʹ-CTCCTGTCTTGCCCCACTTT-3ʹ Reverse: 5ʹ-GGTTGGACTCGCAGCAAGTA-3ʹ ETS1 (human)