Potential prognostic markers and significant lncRNA–mRNA co-expression pairs in laryngeal squamous cell carcinoma

Abstract lncRNA–mRNA co-expression pairs and prognostic markers related to the development of laryngeal squamous cell carcinoma (LSCC) were investigated. The lncRNA and mRNA expression data of LSCC in GSE84957 and RNA-seq data of 112 LSCC samples from TCGA database were used. Differentially expressed genes (DEGs) and lncRNAs (DE-lncRNAs) between LSCC and para-cancer tissues were identified. Co-expression analysis of DEGs and DE-lncRNA was conducted. Protein–protein interaction network for co-expressed DEGs of top 25 DE-lncRNA was constructed, followed by survival analysis for key nodes in co-expression network. Finally, expressions of several DE-lncRNAs and DEGs were verified using qRT-PCR. The lncRNA–mRNA network showed that ANKRD20A5P, C21orf15, CYP4F35P, LOC_I2_011146, XLOC_006053, XLOC_I2_003881, and LOC100506027 were highlighted in network. Some DEGs, including FUT7, PADI1, PPL, ARHGAP40, MUC21, and CEACAM1, were co-expressed with above lncRNAs. Survival analysis showed that PLOD1, GLT25D1, and KIF22 were significantly associated with prognosis. qRT-PCR results showed that the expressions of MUC21, CEACAM1, FUT7, PADI1, PPL, ARHGAP40, ANKRD20A5P, C21orf15, CYP4F35P, XLOC_I2_003881, LOC_I2_011146, and XLOC_006053 were downregulated, whereas the expression of LOC100506027 was upregulated in LSCC tissues. PLOD1, GLT25D1, and KIF22 may be potential prognostic markers in the development of LSCC. C21orf15-MUC21/CEACAM1/FUT7/PADI1/PPL/ARHGAP40 are potential lncRNA–mRNA pairs that play significant roles in the development of LSCC.


Introduction
Squamous cell carcinoma of the head and neck is the 6th most common malignancy worldwide with nearly 177,000 new cases in 2018 [1]. Laryngeal squamous cell carcinoma (LSCC) is the second common malignant tumor of the head and neck, comprising 96% of all laryngeal cancers [2]. It has been reported that the mortality rates and crude incidence of laryngeal cancer in China from 2008 to 2012 are 1.01/100,000 and 1.22/100,000, respectively, higher in men than in women [3]. Smoking and alcohol consumption, virus infection, and air pollution are considered as main factors inducing LSCC [4]. Although significant advances in LSCC detection and treatment have been made, the 5-year survival rate and prognosis of LSCC are still poor [5,6]. Thus, it is of great importance to clarify the molecular mechanisms of LSCC to establish more effective biomarkers or appropriate treatment targets.
In the last two decades, the molecular biomarkers and relative regulatory mechanisms of LSCC have been widely investigated [7]. Numerous long noncoding RNAs (lncNRAs) are closely associated with the development of some cancers [8]. For example, lncRNA SNHG1 is overexpressed in LSCC tissues, which is involved in the proliferation and metastasis of LSCC [9]. It is reported that some lncRNAs cooperate with nearby protein coding genes to constitute "lncRNA-mRNA pairs" that affect their function [10]. For instance, Kong et al. [11] indicated that lncRNA FOXC1-FOXCUT pair might be involved in oral squamous cell carcinoma progression. Yang et al. [12] reported that TCONS_00010232, ENST00000564977, and ENST00000420168 might affect CASP3 and FOXQ1 expression in HPV-18 positive cervical cancer cell. Zhou et al. [13] found several lncRNA-mRNA pairs, such as lncRNA-LMO1-2-RIC3 and lncRNA-MCL1-ADAMTSL4, which might play vital roles in the progression of hypopharyngeal squamous cell carcinoma. Besides, Feng et al. [14] suggested that lncRNA NR_027340-ITGB1, lncRNA MIR31HG-HIF1A, and lncRNA SOX2-OT-DDIT4 were important for advanced LSCC. However, the previous studies about lncRNA-mRNA pairs were not enough to elucidate the molecular mechanisms of LSCC development.
In the current study, the lncRNA and mRNA data of GSE84957 and the RNA-seq data of 112 LSCC samples from the cancer genome atlas (TCGA) database were used for the analysis. Differentially expressed genes (DEGs) and differentially expressed lncRNAs (DE-lncRNAs) between LSCC tissues and adjacent normal tissues were identified. Subsequently, co-expression analysis of DEGs and DE-lncRNA was conducted. Protein-protein interaction (PPI) prediction for top 25 DE-lncRNA co-expressed DEGs was performed, followed by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for lncRNA. After that, transcription factor (TF) and microRNA (miRNA) prediction and functional enrichment analysis of co-expressed DEG and survival analysis for key nodes in co-expression network were conducted. Finally, the expressions of several DE-lncRNAs and DEGs in paired samples of LSCC and adjacent tissues were verified using quantitative real-time-PCR (qRT-PCR). We aimed to find significant lncRNA-mRNA pairs and important prognostic genes in the development of LSCC and then tried to elucidate its molecular mechanisms.

Data source
The lncRNA and mRNA expression profiles of LSCC were all analyzed in this study. The lncRNA and mRNA dataset GSE84957 involving 9 pairs of primary Stage IV LSCC tissues and adjacent normal tissues were also downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/) database. The expression data of this dataset were generated from the platform of GPL17843 Agilent-042818 Human lncRNA Microarray 8_24_v2.
In addition, the clinical data and RNA-seq data of 112 LSCC samples were achieved from the cancer genome atlas (TCGA) database. In brief, the clinical data and RNA-seq data of TCGA-head and neck squamous cell carcinoma (TCGA-HNSC) were downloaded from UCSC Genome Browser. According to the clinical information, the samples with tumor location at larynx were selected.

Data preprocessing and identification of DEGs and DE-lncRNAs
After obtaining the raw data of lncRNA-mRNA, the data were preprocessed with linear models for microarray data (limma) software [15], including background correction, data normalization, and concentration prediction. Following data annotation, when several probes were matched to one gene entry, the final expression value was calculated by the mean of these probes. The DEGs analysis between the tumor and control samples was conducted using Bayes test and the p values were revised by Benjamini/Hochberg (BH) method. The DEGs and DE-lncRNAs were screened, and |log 2 fold-change (FC)| > 1 and adjusted p value <0.05 were deemed as significantly thresholds. The information of protein coding gene (V32) provided by Gencode (https://www.gencodegenes.org/) database [16] was applied to annotate the RNA-seq data of LSCC samples from TCGA into mRNA and lncRNA expression matrixes for following analysis.

Co-expression analysis of DEGs and DE-lncRNA
The expression matrixes data of DEGs and DE-lncRNAs identified from GSE84957 dataset were extracted to conduct the pearson correlation analysis. The pearson correlation coefficient (r) between each DEGs and DE-lncRNA was calculated. Then, DE-lncRNA-DEG pairs with r > 0.9 and p value <0.05 were selected, among which the pairs of top 25 expression changed DE-lncRNAs and their coexpression DEG was considered as important for following analysis.

Protein-protein interaction (PPI) prediction for top 25 DE-lncRNA co-expressed DEGs
The STRING database (http://string-db.org/) provides the functional partnerships and interactions between proteins for more than 2000 organisms [18]. The PPIs pairs between proteins edited by DEGs from the above significant correlated top 25 DE-lncRNA-DEGs co-expression pairs were analyzed using STRING (version 10.0) with setting PPI score as 0.4. Afterwards, the PPI network construction was conducted using Cytoscape software (version 3.2.0, http://www.cytoscape.org/) [19].

Kyoto Encyclopedia of Genes and
Genomes (KEGG) pathway enrichment analysis for lncRNA KEGG database, as a resource for deciphering genome and pathways, reveals biological interpretation of genes in molecular datasets [20]. lncRNA-enriched pathway was predicted based on functional pathways of each lncRNA co-expressed mRNA using the clusterprofiler package [21] in R software (Version 3.14.0, http://bioconductor.org/ packages/3.2/bioc/html/clusterProfiler.html). BH-adjusted p value <0.05 and count >1 were used to present the significantly enriched KEGG pathways.

Transcription factor (TF) prediction and functional enrichment analysis for co-expressed DEGs
TFs are major trans-acting factors in transcriptional regulation, which is crucial to investigate the regulatory circuitry underlying complex traits. TRRUST is a database of reference TF-target regulatory interactions in humans based on literature curation, which conducted sentencebased text mining and prioritized the candidate sentences for the cost-effective literature curation [22]. TF was predicted for co-expressed DEGs using TRRUST v2 (https:// www.grnpedia.org/trrust/). TF-target genes network was constructed using Cytoscape. Gee ontology (GO)-biological process (BP), GOcellular component (CC), GO-molecular function (MF), and KEGG pathway enrichment analyses were performed using R package clusterProfiler v 3.14.0. KEGG pathways and GO terms with adjusted p < 0.05 (method: BH) were screened, and the top 10 pathways/terms were presented using bubble chart.

microRNAs (miRNAs) prediction and functional enrichment analysis
miRNAs are noncoding small endogenous RNAs which mediate posttranscriptional gene regulation, which are reported to implicate in various biological processes, such as cell proliferation and apoptosis, disease development, and angiogenesis [23]. Therefore, we further predicted miRNAs for the co-expressed DEGs using Webgestalt (http://www.webgestalt.org/option.php) database. p value <0.05 was used to select miRNA-target interactions. miRNAstarget genes network was visualized using Cytoscape. Functional enrichment analyses were performed using R package clusterProfiler v 3.14.0, and BH-adjusted p < 0.05 was used to show significant enriched terms.

Survival analysis for key genes in co-expression network
The expression values of all genes and prognosis and survival information were extracted from TCGA database. The genes were divided into low or high expression group based on the median expression in all samples using R package Survival [24] (Version: 2.42-6 https://cran.rproject.org/web/packages/survival/index.html). Genes with p < 0.05 in survival analysis were considered as prognosis significantly related genes. In addition, Kaplan-Meier (K-M) survival curves were plotted. Furthermore, clinical information was analyzed based on progress-free survival (PFS) provided by TCGA.

qRT-PCR analysis
In total, five paired LSCC and adjacent nonneoplastic tissues samples were collected from five LSCC patients who underwent surgery in Otolaryngology Department of Gulou Hospital affiliated to Nanjing Medical College, and these tissue samples were then used in this study. The characteristics of patients included in the study are listed in Table A1. Total RNA from frozen tissue (50-100 mg) homogenized in 1 mL TRIZOL reagent was extracted by TRIzol reagent (9109, Takara, Japan) according to the manufacturer's protocol. Then, qRT-PCR was conducted to verify the expressions of several DEGs and DE-lncRNAs identified in this study. mRNA was reversed transcribed to cDNA using primeScript RT Master MIX (RR036A, Takara), and reverse transcription reaction for miRNA was conducted with PrimeScript II RTase 1st Strand cDNA Synthesis Kit (6210A, Takara). Subsequently, amplification was conducted using Power SYBR Green PCR Master Mix (A25742, Thermo) with the reaction conditions as following: 50°C for 3 min, 95°C for 3 min, and 40 cycles of 95°C for 10 s and 60°C for 30 s. GAPDH was applied as internal controls for mRNAs. Table 1 lists the primer sequences of genes. The 2 −ΔΔCt method was applied to calculate relative expression of genes.
Informed consent: Informed consent has been obtained from all individuals included in this study.
Ethical approval: The research related to human use has been complied with all the relevant national regulations, institutional policies, and in accordance with the tenets of the Helsinki Declaration, and has been approved by the Ethics Committee of the Gulou Hospital affiliated to Nanjing Medical College.

Statistics analysis
All the data were presented as mean ± standard deviation. Statistics analysis was performed using Graphpad prism 5 (Graphpad Software, San Diego, CA), and the express values between groups were compared using Student's t-test. p < 0.05 was deemed statistically significant.

Identification of DEGs and DE-lncRNAs
Under the cut-off of |log 2 FC| > 1 and adjusted p value <0.05, a total of 1,149 DEGs (including 783 up-and 366 downregulated DEGs) and 142 DE-lncRNAs (including 74 up-and 68 downregulated DE-lncRNAs) were identified across LSCC tissues and normal tissues samples. The results of heatmaps showed that these DEGs and DE-lncRNAs could clearly distinguish the LSCC samples from normal samples, which verified DEGs and DE-lncRNAs were credible and could be used for following analysis (Figure 1).

Co-expression analysis of top 25 DE-lncRNA and DEGs
According to the given threshold, a total of 338 coexpressed regulation pairs between top 25 DE-lncRNA and DEGs (involving 17 DE-lncRNA and 145 DEGs) were identified. PPI prediction was performed for these 145 DEGs, of which 174 interaction pairs were predicted for 82 DEGs. Then, lncRNA-mRNA network (Figure 2, Table S1) was constructed by integrating these relations. It showed that seven significant downregulated DE-lncRNAs with lowest log 2 FC values (ANKRD20A5P, C21orf15, CYP4F35P,

KEGG pathway enrichment analysis for DE-lncRNA
KEGG pathway enrichment analysis for lncRNAs in the lncRNA-mRNA network was performed based on each lncRNA co-expressed mRNA ( Figure A1). It could be seen that there were similarities and differences on the involved KEGG pathways of these lncRNAs. For example, XLOC_l2_003881 and XLOC_007734 were significantly related to chemical carcinogenesis, drug metabolismcytochrome P450 and serotonergic synapse, etc. While LOC100505813 and DNAPTP3 were associated with ECM− receptor interaction, platelet activation, and focal adhesion, LOC100652832 was implicated in proteasome.

TF prediction and functional enrichment analysis for co-expressed DEGs
After TF prediction for 145 DEGs, 75 TF-mRNA pairs were obtained, which included 22 TFs (e.g., SP1, NFKB1, RELA, and JUN) and 27 DEGs (e.g., upregulated COL1A1, MMP11, PTHLH, and KRT14; downregulated PPL and CEACAM1) ( Figure 3a). As presented in Figure 3b, DEGs in TF-mRNA network were markedly enriched in amoebiasis, ECM−receptor interaction, protein digestion and absorption, staphylococcus aureus infection, and AGE-RAGE signaling pathway in diabetic complications. Furthermore, the evidently enriched GO-BP terms included skin development, cornification, and keratinization; GO-CC terms included intermediate filament, intermediate filament cytoskeleton, and collagen-containing extracellular matrix; while GO-MF terms included extracellular matrix structural constituent and structural constituent of cytoskeleton.

Survival analysis
Survival analyses were conducted for one lncRNA (HCG22) and all the above mRNA nodes. The results showed that PLOD1 (p = 0.016), GLT25D1 (also named COLGALT1, p = 0.034), and KIF22 (p = 0.032) were significantly associated with prognosis (Figure 5a-c). The expression values of these three genes in GSE84957 were presented as box plot (Figure 5d).

Discussion
In the current study, lncRNA and mRNA expression profiles of LSCC were comprehensively analyzed to find significant lncRNA-mRNA pairs and important prognostic genes for LSCC. The lncRNA-mRNA network showed that top downregulated ANKRD20A5P, C21orf15, CYP4F35P, XLOC_I2_011146, XLOC_006053, and XLOC_I2_003881 and one of top 3 upregulated LOC100506027 were highlighted in network. Furthermore, some DEGs, such as FUT7, PADI1, PPL, ARHGAP40, MUC21, and CEACAM1, were co-expressed with these above lncRNAs. Survival analysis showed that PLOD1, GLT25D1 (COLGALT1), and KIF22 were significantly associated with prognosis of LSCC. In addition, the qRT-PCR results suggested that the expressions of MUC21, CEACAM1, FUT7, PADI1, PPL, ARHGAP40, ANKRD20A5P, C21orf15, CYP4F35P, XLOC_I2_003881, XLOC_I2_011146, and XLOC_ 006053 were significantly downregulated, whereas the expression of LOC100506027 was significantly upregulated in LSCC tissues compared with that in para-cancer tissues.
It was reported that PLOD1 is a potential prognostic marker in gastrointestinal cancer [25]. Yamada et al. [26] suggested that aberrant expressed PLOD1 was related to pathogenesis of bladder cancer, and it might be a potential prognostic marker for this cancer. PLOD1 can promote cell migration and growth in osteosarcoma [27].
Suppression of KIF22 inhibits cancer cell proliferation through delaying mitotic exit [28]. Zhang et al. [29] indicated that KIF22 was associated with clinical outcome and tumor progression in prostate cancer. KIF22 is involved in the migration and proliferation of gastric cancer cells through MAPK-ERK pathways [30]. As previously reported, COLGALT1 is involved in the progression of mammary tumor metastases [31]. Wang et al. [32] indicated that  COLGALT2 played role in the proliferation of osteosarcoma. Not too much previous studies reported the roles of these three genes in LSCC. Combined with our present survival analysis results, we inferred that PLOD1, GLT25D1 (COLGALT1), and KIF22 might be potential prognostic markers for LSCC development.
Our qRT-PCR results showed that the expression of MUC21, CEACAM1, FUT7, PADI1, PPL, and ARHGAP40 was downregulated in LSCC tissues compared with that in para-cancer tissues. MUC21, as a member of the mucin family, may play a protective role against external stimuli in mucus layer on mucosal surfaces [33]. There is growing evidence that mucin families are responsible for epithelial carcinomas, especially LSCC [33]. Yuan et al. have reported that MUC21 is associated with differentiation and carcinogenesis of squamous epithelial di [34]. Nair et al. have predicted the downregulation of MUC21 in LSCC tumors via gene expression profile analysis [35], which is consistent with our result. Some studies showed that CEACAM1 played roles in tumorigenesis. The loss of expression and genetic alteration of the CEACAM1 may be an early event for colorectal cancers development [36]. CEACAM1 is related to oral tumors progression [37]. Importantly, Lucarini et al. [38] demonstrated that CEACAM1 was involved in LSCC progression and might be a potential therapeutic target for LSCC. There were no researches about the roles of FUT7, PADI1, PPL, and ARHGAP40 in LSCC, but the roles of these genes or the related genes in other cancers were reported. For example, lower expression of PPL is related to cancer-specific survival and pathological stage in urothelial carcinoma of the urinary bladder [39]. Cui et al. [40] demonstrated that overexpression of exogenous FUT7 contributed to migration and adhesion of cell line MDAMB-231 of breast cancer. PADI2 inhibits proliferation of colon cancer cells [41] and can be used as a potential marker for breast cancer [42]. Downregulated ARHGAP10 inhibits tumorigenicity of ovarian cancer cells [43]. ARHGAP17 plays tumor suppressive role in colon cancer via Wnt/β-Catenin Signaling [44]. Thus, MUC21, CEACAM1, FUT7, PADI1, PPL, and ARHGAP40 may be associated with the development of LSCC.
Chromosome 21 open reading frame 15 (C21orf15) is a lncRNA located in the juxtacentromeric region of human chromosome 21 with domain of spliced expressed sequence tags AJ003450 [45]. It has been reported that C21orf15 is predicted to be upregulated in metastatic prostate cancer [46], whereas our RT-PCR result showed that C21orf15 was downregulated in LSCC tissue. However, few studies reported the function of C21orf15. Combined with our present study that C21orf15 was co-expressed with MUC21, CEACAM1, FUT7, PADI1, PPL, and ARHGAP40, we inferred that C21orf15-MUC21/CEACAM1/FUT7/PADI1/ PPL/ARHGAP40 were lncRNA-mRNA pairs that were involved in LSCC development. That is to say, C21orf15 may affect LSCC development by modulating the expression of MUC21/CEACAM1/FUT7/PADI1/PPL/ARHGAP40. Lastly, there are no previous researches that studied the functions of ANKRD20A5P, CYP4F35P, XLOC_I2_003881, XLOC_I2_011146, XLOC_006053, and LOC100506027. Further researches are needed to clarify the function of these lncRNA in LSCC. Besides, the co-expression relationships of 7 lncRNAs and these genes were needed to be verified by experiments in future.
Funding information: This work was supported by The Project of Invigorating Health Care through Science, Technology and Education (No. ZDXKB2016015), Nanjing Medical Science and Technique Development Foundation (No. QRX17052). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions: Conception and design of the research: J.W.; acquisition of data: D.L.; analysis and interpretation of data: Y.G.; statistical analysis: H.Z.; obtaining funding: J.W.; drafting the manuscript: X.Q. and X.S.; revision of manuscript for important intellectual content: X.S. and X.Q. All authors read and approved the final manuscript.

Conflict of interest:
The authors state no conflict of interest.
Data availability statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.