Screening and bioinformatics analysis of key biomarkers in acute myocardial infarction

Acute myocardial infarction (AMI) is the most severe manifestation of coronary artery disease. Considerable efforts have been made to elucidate its etiology and pathology, but the genetic factors that play a decisive role in the occurrence of AMI are still unclear. To determine the molecular mechanism of the occurrence and development of AMI, four microarray datasets, namely, GSE29111, GSE48060, GSE66360, and GSE97320, were downloaded from the Gene Expression Omnibus (GEO) database. We analyzed the four GEO datasets to obtain the differentially expressed genes (DEGs) between patients with AMI and non-AMI controls and then performed gene ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, and protein-protein interaction (PPI) network analysis. A total of 41 DEGs were identified, including 39 upregulated genes and 2 downregulated genes. The enriched functions and pathways of the DEGs included the inflammatory response, neutrophil chemotaxis, immune response, extracellular space, positive regulation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) transcription factor activity, response to lipopolysaccharide, receptor for advanced glycation end products (RAGE) receptor binding, innate immune response, defense response to bacterium, and receptor activity. The cytoHubba plug-in in Cytoscape was used to select the most significant hub genes from the PPI network. Ten hub genes were identified, and GO enrichment analysis revealed that these genes were mainly enriched in inflammatory response, neutrophil chemotaxis, immune response, RAGE receptor binding, and extracellular region. In conclusion, this study integrated four datasets and used bioinformatics methods to analyze the gene chips of AMI samples and control samples and identified DEGs that may be involved in the occurrence and development of AMI. The study provides reliable molecular biomarkers for AMI screening, diagnosis, and prognosis.


Introduction
Acute myocardial infarction (AMI) is myocardial necrosis due to acute and persistent coronary ischemia. AMI remains a leading cause of morbidity and mortality worldwide, despite dramatic improvements in coronary intervention technology and drugs in the past several decades [1].
Accumulating evidence has demonstrated that atherosclerosis is the pathological basis of coronary heart disease (CHD). Lipid metabolism and inflammatory responses play critical roles in plaque rupture, which leads to the occurrence and development of AMI [2]. However, known risk factors explain only 50-60% of the causes of CHD (including AMI), and the remaining 40-50% are attributed to genetic factors [3]. At present, most scholars believe that the genetic contribution to CHD may reflect the cumulative effect of multiple genes with small individual effects. The genetic factors that play a decisive role in the occurrence of CHD are still unclear. Traditional assessments of cardiovascular risk factors (including hypertension, diabetes, and smoking) are essential for the prevention and prognosis prediction of AMI [4], but they still cannot sufficiently predict the risk of recurrent events. Some molecular markers, such as brain natriuretic peptide, C-reactive protein, and other serum inflammatory markers, have gained increasing attention for the prediction of AMI, but current research shows that their predictive ability is limited; more sensitive markers that explain the initial disturbance and the underlying mechanisms of AMI are urgently needed [5]. With the development of bioinformatics, a large amount of data related to genomics, transcriptomics, proteomics, and metabolomics has been produced. Through the combination of bioinformatics analysis and computer science, these data can be analyzed to study the interrelation between multiple genes. Previous studies have also demonstrated that tissue-based microarray gene expression profiles in a variety of cardiac diseases can help to distinguish ischemic from nonischemic causes [6]. We can also analyze the relevant gene chips of patients with AMI to identify the differentially expressed genes and provide ideas for in-depth research on the genetic and molecular mechanisms of myocardial infarction and on early diagnosis.
However, the analysis of a single gene chip may yield false-positive results, which can undermine the credibility of the conclusions. Therefore, in this study, we analyzed four Gene Expression Omnibus (GEO) datasets to obtain the differentially expressed genes between patients with AMI and non-AMI controls and then performed gene ontology (GO) enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, protein-protein interaction (PPI) network analysis, and a systematic analysis of the relationships among these genes and their regulatory network. We hope to determine the molecular mechanism of the occurrence and development of AMI, thus providing a reliable diagnostic marker for early diagnosis.

Differential expression analysis
Microarray data: The GEO database is a gene expression database created and maintained by the National Center for Biotechnology Information (NCBI). It contains high-throughput gene expression data submitted by research institutions around the world [7]. We downloaded the four datasets GSE29111 [8], GSE48060 [9], GSE66360 [10], and GSE97320 [11] from the GEO database. The probe annotation information of the gene chips of these four datasets comes from the same platform, GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). GSE29111 includes 36 AMI samples and 16 unstable angina samples. GSE48060 incorporates 31 AMI samples and 21 normal samples. GSE66360 consists of 49 AMI samples and 50 normal samples. GSE97320 contains 3 AMI samples and 3 normal samples. A total of 119 AMI and 90 control samples were included. Detailed information about these datasets is listed in Table 1.
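The sample totals quoted above can be re-derived with a short tally. This is a minimal sketch in Python; the dictionary layout and variable names are illustrative only, and the unstable angina samples of GSE29111 are counted as controls, which is what the stated totals imply.

```python
# Sample counts per dataset as reported in the text: (AMI, control).
# GSE29111 controls are unstable angina samples; the others are normal samples.
SAMPLES = {
    "GSE29111": (36, 16),
    "GSE48060": (31, 21),
    "GSE66360": (49, 50),
    "GSE97320": (3, 3),
}

total_ami = sum(ami for ami, _ in SAMPLES.values())
total_control = sum(control for _, control in SAMPLES.values())

print(total_ami, total_control, total_ami + total_control)  # 119 90 209
```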
Data processing: Because they share the same platform, the raw data of the four datasets were merged into one dataset, and batch effects were then corrected using the R package sva [12]. The "affy" package in R (version 3.4.2, http://www.R-project.org/) [13] was used to preprocess the raw data, including background correction, normalization of the expression matrix, and imputation of missing values. Probes were matched against the latest annotation file downloaded from the NCBI gene database, and probes without a match were deleted. If a gene corresponded to multiple probes, the probe with the lowest corrected p-value was selected. Quantile normalization was performed with the "limma" package [14] before identifying the differentially expressed genes (DEGs).
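To make the quantile normalization step concrete, the following is a minimal Python sketch of the generic algorithm. It is not the limma implementation used in the study: the function name and the toy values are hypothetical, and tied values are not averaged as a production implementation typically would.

```python
def quantile_normalize(columns):
    """Force every column (one per chip) onto the same value distribution.

    `columns` is a list of equal-length lists of expression values.
    Simplification: ties within a column are broken by position.
    """
    n_rows = len(columns[0])
    # Row indices of each column, ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    rank_means = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for order in orders:
        new_col = [0.0] * n_rows
        for rank, row in enumerate(order):
            new_col[row] = rank_means[rank]
        normalized.append(new_col)
    return normalized


# Two toy chips with three probes each (made-up values).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every chip contains the same set of values, only assigned to different probes according to each probe's rank within its chip.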
Identification of DEGs: First, gene expression data were extracted through probe annotation and log2-transformed. Then, the limma package in R was used to identify the DEGs between AMI samples and control samples in the gene chip data. The screening criteria for DEGs were set to |log2(fold change)| (|log2 FC|) > 1 and adjusted p-value (adj. p-value) < 0.05. Finally, hierarchical clustering analysis was performed on the DEGs of AMI samples and control samples.
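The screening criteria can be sketched as a simple filter. The Python sketch below rests on several assumptions that are not stated in the text: log2 FC and raw p-values are taken as already computed (limma's moderated statistics are not reproduced), the p-value adjustment is Benjamini-Hochberg, and the fold-change cutoff is applied to the absolute value, since both upregulated and downregulated genes are reported. Gene names and numbers are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(records, lfc_cutoff=1.0, alpha=0.05):
    """Split (gene, log2_fc, p_value) records into up- and downregulated DEGs."""
    adjusted = benjamini_hochberg([p for _, _, p in records])
    up, down = [], []
    for (gene, log2_fc, _), adj_p in zip(records, adjusted):
        if adj_p < alpha and abs(log2_fc) > lfc_cutoff:
            (up if log2_fc > 0 else down).append(gene)
    return up, down


# Hypothetical records: (gene, log2 FC, raw p-value).
records = [
    ("GENE_A", 1.8, 0.001),
    ("GENE_B", -1.3, 0.002),
    ("GENE_C", 0.4, 0.03),   # significant but below the fold-change cutoff
    ("GENE_D", 0.1, 0.5),
]
print(select_degs(records))  # (['GENE_A'], ['GENE_B'])
```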

GO enrichment and KEGG pathway analyses
GO enrichment analysis [15] and KEGG [16] pathway analysis were performed by using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [17] online tool. GO enrichment analyses annotate and clas-

PPI network construction
A PPI network of the identified DEGs was constructed by using the STRING online database [18]. The cutoff was set to a confidence score higher than 0.4. The PPI network was visualized using Cytoscape software [19].

Recognition of hub genes
To select hub genes from these DEGs, we used the cytoHubba [20] plug-in in Cytoscape software, which can explore important hubs in an interactome network by several topological algorithms. Three of those algorithms, namely, the density of maximum neighborhood component (DMNC), maximal clique centrality (MCC), and maximum neighborhood component (MNC), were used to identify the hub genes. We used Venn diagrams to find mutual genes from the results of the three algorithms.
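As an illustration of how such topological scores work, the sketch below computes MNC and DMNC for a node of a small undirected graph in Python. The definitions are assumptions based on the published description of cytoHubba and are not given in the text: MNC is taken as the number of nodes in the largest connected component of the subgraph induced by a node's neighbours, and DMNC as the number of edges in that component divided by its node count raised to the power 1.7. MCC, which requires enumerating maximal cliques, is omitted. Node names are placeholders.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from an iterable of (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def largest_neighbor_component(graph, node):
    """Largest connected component of the subgraph induced by node's neighbours."""
    unseen = set(graph.get(node, ()))
    best = set()
    while unseen:
        start = unseen.pop()
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            # Only edges between neighbours of `node` are followed.
            for nxt in graph[current] & unseen:
                unseen.discard(nxt)
                component.add(nxt)
                stack.append(nxt)
        if len(component) > len(best):
            best = component
    return best


def mnc(graph, node):
    return len(largest_neighbor_component(graph, node))


def dmnc(graph, node, epsilon=1.7):
    component = largest_neighbor_component(graph, node)
    if not component:
        return 0.0
    # Each edge inside the component is counted from both endpoints.
    edge_count = sum(len(graph[u] & component) for u in component) // 2
    return edge_count / len(component) ** epsilon


# Toy network: A's neighbours are B, C, D; only B and C are linked to each other.
toy = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")])
print(mnc(toy, "A"), dmnc(toy, "A"))
```

In the toy network, node A scores an MNC of 2 because B and C form the largest connected block among its neighbours, while D is isolated from them.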

Controlling the quality and identification of DEGs
R software was used to analyze all the raw data. First, data quality was evaluated. The residual, grayscale, weight, and residual sign plots showed a uniform color distribution. The relative logarithmic expression (RLE) and normalized unscaled standard error (NUSE) boxplots showed that the chip expression values were close to each other; RLE did not deviate significantly from 0, and NUSE did not deviate significantly from 1, indicating consistent quality across chips (Figures S1 and S2). The RNA degradation plot showed that RNA degradation started from the 5' end, and no abnormal RNA degradation occurred (Figure S3). The slope of the curve was between 0° and 45°, indicating a moderate degradation rate. Together, these results indicate that the dataset met the requirements for data analysis. The data before normalization are shown in Figure 1a, and the data after normalization are shown in Figure 1b. Figure 1b shows that the distribution of the processed chip data values is more consistent. Finally, after comparing AMI samples with control samples, we found 41 DEGs, including 39 upregulated genes and 2 downregulated genes. These DEGs are shown in a volcano plot (Figure 2), in which p < 0.05 is used as the criterion for defining significant differences. Red dots indicate upregulated DEGs, while green dots indicate downregulated DEGs. The heatmap (Figure 3) shows the hierarchical clustering of DEGs. The clusters of DEGs are displayed on the left side of the heatmap, and the clusters of the samples are displayed above the heatmap.
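The RLE check described above can be illustrated with a small Python sketch. It assumes the usual definition, in which each log-scale expression value is compared with the median of the same probe across all chips; the function name and the toy values are hypothetical, and NUSE is not covered because it needs probe-level model standard errors.

```python
from statistics import median


def rle_medians(log_expression):
    """Median relative log expression per chip.

    `log_expression` maps a chip name to its list of log2 expression
    values, with probes in the same order on every chip.
    """
    chips = list(log_expression)
    n_probes = len(log_expression[chips[0]])
    # Reference level of each probe: its median across all chips.
    probe_medians = [
        median(log_expression[chip][i] for chip in chips) for i in range(n_probes)
    ]
    return {
        chip: median(
            log_expression[chip][i] - probe_medians[i] for i in range(n_probes)
        )
        for chip in chips
    }


# Made-up values: chip_c is shifted upward by one log2 unit.
toy = {
    "chip_a": [1.0, 2.0, 3.0],
    "chip_b": [1.0, 2.0, 3.0],
    "chip_c": [2.0, 3.0, 4.0],
}
print(rle_medians(toy))  # {'chip_a': 0.0, 'chip_b': 0.0, 'chip_c': 1.0}
```

A chip whose median RLE sits clearly away from 0, such as chip_c here, would be flagged for closer inspection.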

Functional enrichment analysis of the DEGs
We used DAVID to perform functional analysis of these DEGs. Figure 4 shows the top ten significant GO terms. As shown in Figure 5 and Table 2, the DEGs between AMI and control samples were mainly enriched in inflammatory response, neutrophil chemotaxis, immune response, extracellular space, positive regulation of nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) transcription factor activity, response to lipopolysaccharide, receptor for advanced glycation end products (RAGE) receptor binding, innate immune response, defense response to bacterium, and receptor activity.
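The idea behind term enrichment can be sketched as an over-representation test. The Python sketch below uses a plain hypergeometric upper tail; this is an assumption chosen for illustration and is not DAVID's exact statistic, which uses a modified Fisher exact test. The function name and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: number of background genes
    annotated:  background genes carrying the term
    selected:   number of DEGs tested
    overlap:    DEGs carrying the term
    """
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, selected)


# Toy numbers: 10 background genes, 3 carry the term, 4 DEGs, 2 of them annotated.
print(enrichment_p_value(10, 3, 4, 2))
```

For the toy numbers the probability is 70/210, about 0.33, so an overlap of two genes would not count as enrichment in such a small background.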

Pathway analyses
Ranked by p-value, the top ten KEGG pathways for the DEGs are shown in Figure 6 and Table 3: the hematopoietic cell lineage pathway (p = 0.002882095), cytokine-cytokine receptor interaction pathway (p = 0.007920517), malaria pathway (p = 0.011431864), legionellosis (p = 0.013770733), nucleotide-binding oligomerization domain (NOD)-like receptor signaling pathway (p = 0.014759858), tuberculosis (p = 0.020316804), complement and coagulation cascades (p = 0.021902284), rheumatoid arthritis (p = 0.034396554), amoebiasis (p = 0.048229511), and tumor necrosis factor (TNF) signaling pathway (p = 0.049049747). All these pathways may be related to the development of AMI. Among them, the cytokine-cytokine receptor interaction pathway involves the largest number of DEGs (five), and the hematopoietic cell lineage pathway was the most significant according to the p-value.

Construction of the PPI network
We used the STRING database to investigate the PPI network of the 41 DEGs (39 upregulated genes and 2 downregulated genes). The resulting PPI network is shown in Figure 7.

Selection of hub genes from the PPI network
We used the cytoHubba plug-in in Cytoscape to select the most significant hub genes from the PPI network. Among the algorithms in the cytoHubba plug-in, we chose the three most commonly used algorithms, namely, DMNC, MNC, and MCC. For each algorithm, we selected the top fifteen hub genes (Figure 8a-c). Finally, we used a Venn diagram to intersect the three top-fifteen lists, which yielded ten mutual hub genes (Figure 8d). The names, abbreviations, and molecular functions (MFs) of these ten hub genes are shown in Table 4. Functional analysis of these hub genes was performed with DAVID, and the results are shown in Table 5.
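The Venn-diagram step amounts to intersecting the top-ranked gene sets of the three algorithms. A minimal Python sketch follows; the scores and gene names are made up, the function names are hypothetical, and a cutoff of 2 is used instead of 15 to keep the toy example short.

```python
def top_k(scores, k):
    """Names of the k highest-scoring genes in a {gene: score} mapping."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {gene for gene, _ in ranked[:k]}


def mutual_hub_genes(score_tables, k=15):
    """Genes that appear in the top k of every scoring algorithm."""
    return set.intersection(*(top_k(scores, k) for scores in score_tables))


# Hypothetical scores from three algorithms for three placeholder genes.
dmnc_scores = {"g1": 0.9, "g2": 0.8, "g3": 0.1}
mnc_scores = {"g1": 5, "g2": 1, "g3": 4}
mcc_scores = {"g1": 10, "g2": 2, "g3": 9}

print(mutual_hub_genes([dmnc_scores, mnc_scores, mcc_scores], k=2))  # {'g1'}
```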

Discussion
AMI is the most severe manifestation of coronary artery disease [1]. Numerous previous studies have confirmed that atherosclerosis is the primary etiological factor of AMI and that the inflammatory response plays a critical role in the development of AMI [21]. Although traditional AMI risk factors such as smoking and diabetes have some reference value for disease prevention and prognosis, their specificity is limited [22]. Identifying new biomarkers is therefore conducive to early screening and diagnosis of AMI. In this study, bioinformatics methods were used to screen GEO chip data and analyze the DEGs in the peripheral blood samples of patients with AMI. Compared with the control group, 41 genes in peripheral blood samples of patients with AMI were differentially expressed, of which 2 were downregulated and 39 were upregulated. GO functional enrichment analysis of the DEGs and construction of a PPI network were then conducted to explore their functions. The DEGs were mainly enriched in inflammatory response, neutrophil chemotaxis, immune response, extracellular space, and positive regulation of NF-κB transcription factor activity. Previous studies have reported that the inflammatory response plays a critical role in the occurrence and development of AMI [21,23]. An elevated neutrophil count is an important marker of the inflammatory response. At the same time, neutrophils are an essential part of innate immunity. Neutrophils are recruited to coronary plaques and infarcted myocardium and mediate tissue damage by releasing matrix-degrading enzymes and reactive oxygen species (ROS) [24]. The extracellular space is an irregular and interconnected narrow space between cells. As the microenvironment in which brain cells and neural networks live and work, the extracellular space affects the activity of brain cells. It is involved in the occurrence and development of various neurological diseases [25]. Reports of connections between the extracellular space and heart disease are rare. Nevertheless, extracellular vesicles, which originate from diverse subcellular compartments, are released into the extracellular space and act as regulators of the transfer of biological information [26]. Circulating extracellular vesicles can be diagnostic and prognostic biomarkers for primary and secondary prevention of cardiovascular diseases [27]. In addition, recent studies have suggested that NF-κB is a crucial nuclear transcription factor that mediates the release of inflammatory factors and plays a vital role in the pathophysiological process of AMI [28]. Overall, the above research results are consistent with our analysis.
Interleukin-1 (IL-1) plays an essential role in the signaling, activation, and regulation of immune cells, and mediates T and B cell activation, proliferation and differentiation, and the inflammatory response [37,38]. The IL-1 system comprises two subtypes of IL-1, namely, IL-1α and IL-1β; two receptor forms, that is, interleukin-1 receptor type 1 (IL1R1) and interleukin-1 receptor type 2 (IL1R2); an interleukin-1 receptor accessory protein (IL1RAP); and an interleukin-1 receptor antagonist (IL1RN). With the help of IL1RAP, IL-1 can form a transmembrane complex with IL1R1, and the IL-1/IL1R1/IL1RAP complex recruits IL-1 receptor-associated kinase (IRAK1) and activates transcription regulators. This further activates NF-κB, mitogen-activated protein kinase, mammalian target of rapamycin, and other signal transduction pathways, ultimately initiating the IL-1-mediated cellular response. Unlike IL1R1, IL1R2 acts as an endogenous inhibitor that prevents IL-1 from binding to IL1R1 and inhibits IL-1 signal transduction [39,40]. IL1R2 is highly expressed in several cancer types and is regarded as a novel biomarker for early diagnosis and as a therapeutic target in tumors [41-44].
However, the role of IL1R2 in coronary disease is reported to be controversial and needs further investigation. Elevated IL1R2 prevents homocysteine-induced apoptosis and inflammatory injury in coronary atherosclerosis by inhibiting activation of the inflammasome signaling pathway [45]. In another study, the expression level of IL1R2 on monocytes/macrophages decreased during atherosclerosis and vascular injury. Possible reasons include acetylated low-density lipoprotein inhibiting the expression of IL1R2 in macrophages, resulting in uncontrolled IL-1-dependent inflammation [46]. However, published bioinformatics analyses have shown that IL1R2 is highly expressed in patients with AMI [47,48]. IL1RN inhibits the activity of IL-1 by binding to the receptor IL1R1 and preventing its signal transduction with the coreceptor IL1RAP. Under advanced pathological conditions, IL1RN inhibits cell apoptosis and/or necrosis by noncompetitively inhibiting the activities of caspase-8 and caspase-9 [49-51]. According to reports, IL1RN can reduce the myocardial infarct size and inhibit myocardial cell apoptosis by blocking the inflammatory response during myocardial ischemia-reperfusion injury and may benefit myocardial remodeling in AMI [52]. Anakinra, a recombinant IL1RN, potently suppresses markers of inflammation in non-ST elevation acute coronary syndromes [53].
Nucleotide-binding oligomerization domain-like receptor protein 3 (NLRP3) may function as an inducer of apoptosis. The NLRP3 inflammasome is a multimeric protein complex composed of NLRP3, apoptosis-associated speck-like protein, and pro-caspase-1 [54]. It has been reported that in AMI and cardiac ischemia, the formation of NLRP3 inflammasomes is increasingly triggered, which in turn damages the function and viability of myocardial cells [55,56]. Moreover, NLRP3 inflammasome inhibitors can effectively reduce the amount of myocardial necrosis during AMI [57-59]. These results indicate that the NLRP3 inflammasome has potential as a biomarker of AMI [60] and as a promising therapeutic target in the treatment of AMI [61].
Chemokine (C-X-C motif) ligand 2 (CXCL2), a member of the CXC chemokine family, is produced by activated monocytes and neutrophils and expressed at the site of inflammation. In the inflammatory response, CXCL2 acts as a powerful neutrophil chemoattractant that recruits neutrophils and promotes their adhesion. Recent studies have shown that inflammation is an essential pathological process in the development of AMI, and CXCL2 plays a vital role in the development of AMI [62]. CXCL2 is highly expressed in the infarcted and marginal regions of myocardial infarction [63]. CXCL2 overexpression induces the rapid accumulation of neutrophils at the injury site. The accumulated neutrophils release proteolytic enzymes that directly damage the surrounding cells, causing ROS damage and leading to the formation of myocardial infarction [64]. Increased expression of CXCL2 in myocardial infarction aggravates the acute inflammatory response after myocardial injury and promotes heart rupture. In contrast, suppression of CXCL2 expression can regulate neutrophil recruitment to the injured myocardium and cardiac fibroblasts and relieve myocardial injury [65]. In the KEGG pathway analysis, we found that CXCL2 was enriched in cytokine-cytokine receptor interaction, the NOD-like receptor signaling pathway, and the TNF signaling pathway. Thus, CXCL2 may play a significant role in myocardial infarction through the above pathways and could be a potential therapeutic target.

Table 4: Functional roles of the ten hub genes
No. | Gene symbol | Full name | Molecular function
1 | S100A8 | S100 calcium binding protein A8 | A calcium- and zinc-binding protein which plays a prominent role in the regulation of inflammatory processes and immune response
2 | S100A9 | S100 calcium binding protein A9 |
3 | S100A12 | S100 calcium binding protein A12 |
4 | NLRP3 | NLR family, pyrin domain containing protein 3 | May function as an inducer of apoptosis
Dendritic cell-associated C-type lectin 1 (Dectin-1, also named CLEC7A) is an innate immune pattern recognition receptor that recognizes β-glucan in fungal cell walls. In a recent study, researchers demonstrated that upregulation of Dectin-1 could promote myocardial injury, inflammation, and fibrosis and thereby damage myocardial function. Dectin-1 can also induce activation of the NF-κB/NLRP3 axis in infarcted myocardium [66]. Dectin-1 promotes myocardial ischemia-reperfusion injury by regulating macrophage polarization and neutrophil infiltration [67]. These studies revealed a new link between Dectin-1 in cardiac muscle cells and cardiac remodeling, suggesting that targeting the Dectin-1/NF-κB/NLRP3 signaling pathway may be a promising treatment for AMI.
Chronic inflammation plays an essential role in the onset of atherosclerosis and related cardiovascular diseases. The immune response is a vital initiating factor of the inflammatory response, and the complement system is an important component of innate immunity and participates in the development of atherosclerosis [68]. The complement system is composed of more than 30 plasma proteins. Complement activation generates active fragments that act as proinflammatory mediators (including C3a, C4a, and C5a). These fragments can stimulate cell degranulation and the release of histamine and other vasoactive substances that enhance vascular permeability and stimulate visceral smooth muscle contraction. C5a is the most active of these fragments. C5a exerts its functions by binding to and activating its receptor (C5a receptor; C5aR). C5a can also regulate the expression of inflammatory mediators, promote the chemotaxis and accumulation of neutrophils, induce neutrophil degranulation, and cause inflammatory damage to vascular endothelial cells [69]. The C5a-C5aR1 axis seems to be involved in the occurrence of atherosclerosis and CHD [70]. Experimental studies have shown that when the C5a/C5aR1 axis is attenuated, the infarct size and inflammation decrease [71]. In an in vivo AMI mouse model, the lack of C5aR1 on circulating leukocytes led to a reduction in infarct size and improved clinical outcomes [72].
Triggering receptor expressed on myeloid cells 1 (TREM-1) is a vital signaling receptor expressed on neutrophils and monocytes and plays an important role in systemic infection. Current evidence suggests that TREM-1 may be a useful biomarker for predicting neonatal sepsis and for the clinical diagnosis and prognosis of sepsis [73] and other types of acute and chronic inflammation-related diseases [74]. Inflammation is one of the most important contributing factors in CHD. TREM-1 has been found to be involved in the pathogenesis of acute and chronic cardiovascular diseases, such as AMI and atherosclerosis, and the TREM-1 pathway shows promise as a potential target for understanding and managing cardiovascular disease [75].
Additionally, we conducted hierarchical clustering analysis of the hub genes. The results show that these hub genes can distinguish myocardial infarction samples from non-myocardial infarction samples and may become diagnostic biomarkers. Studies have shown that neopterin is an independent predictor of all-cause and cardiovascular mortality. Serum neopterin concentration is associated with adverse cardiovascular events in patients with chronic stable angina pectoris, and this correlation is closely related to the severity of CHD [76]. Neopterin may therefore be a useful marker of a patient's risk of future coronary events [77]. The main mechanism of AMI in patients with CHD is the rupture of unstable atherosclerotic plaque mediated by inflammatory cells, followed by intra-plaque or subplaque bleeding leading to thrombosis and lumen occlusion. As an inflammatory marker of macrophage immune activation, the neopterin level can reflect changes in plaque stability, so it has good predictive value for the occurrence and prognosis of myocardial infarction in patients with CHD. Alteration of S100A8/S100A9, IL1RN, NLRP3, CXCL2, and C5AR1 is involved in inflammation and atherosclerosis, which constitute the crucial mechanisms of AMI formation, indicating that these genes may play essential roles in the progression and prognosis of AMI by regulating the concentration of neopterin.
It is worth noting that other articles have studied DEGs in AMI. However, the results of those articles are somewhat different from ours. Compared with other AMI studies, this study performed a complete bioinformatics analysis of the DEGs of AMI through statistical methods. We were able to provide more reliable results for the following reasons: (1) The four selected datasets are all based on the same platform, so they can be combined into one dataset for normalized analysis, which increases the credibility of the results. (2) The raw data were processed with background correction and missing value imputation to ensure that the information was valid and reliable. (3) The sample sizes of previous articles are relatively small. We selected four datasets with a total sample size of 209, so the selected DEGs are likely to be more accurate. Of course, it is also important to verify these results in subsequent experiments.
In summary, our study used bioinformatics methods to analyze the gene chips of a large-sample myocardial infarction population and control population and identified DEGs that may be involved in the occurrence and development of AMI. The study provides reliable molecular biomarkers for AMI screening, diagnosis, and prognosis; it also provides a basis for exploring new therapeutic targets for AMI. Compared with other studies on AMI, the innovation and advantage of this study lie in the joint analysis of multiple chips from the same platform, with a larger sample size. However, the lack of experimental verification is a limitation of this study. Thus, to verify the current results, further studies are needed to elucidate the biological function of these genes in AMI.