Sickle cell disease (SCD) is one of the most common life-threatening genetic blood disorders and is characterized by intermittent vaso-occlusive events and chronic hemolytic anemia. The 2015 Global Burden of Disease reports estimate that the prevalence of sickle cell trait exceeds 400 million cases globally and that the disease results in more than 100,000 deaths. SCD generally involves a mutation of the hemoglobin-beta gene. A previous study presented a global map of gene variation associated with SCD by genome-wide association analysis. Moreover, with the development of analytical techniques, more than one hundred blood and urine biomarkers have been shown to be involved in the pathogenesis of SCD. However, most known biomarkers offer limited clinical value, and few provide useful prognostic information for managing the condition.
High-throughput genome-wide association studies have led to a paradigm shift in the way that investigators explore complex diseases. Genome-wide analyses have discovered several genes influencing the likelihood of developing SCD [4, 6, 7]. Since most biological processes arise from the integrated activities of many genes, interpreting the consequences at the pathway level helps to explain how gene perturbations account for disease. Thus, the characterization of pathway changes is imperative for understanding the molecular mechanisms of SCD. Existing pathway algorithms have been classified into three categories: over-representation analysis, functional class scoring, and pathway topology-based approaches. However, most existing pathway techniques focus on identifying disturbed pathways in a specific disease condition and ignore the possibility that pathway aberrance may occur in an individual subject. Individualized pathway analysis is conducive to personalized interpretation of disease data. Recently, an individualized pathway analysis method was proposed to identify disturbed pathways in disease. This strategy calculates the individualized pathway aberrance score (iPAS) of one disease sample by comparing it with accumulated normal samples, making it possible to interpret disease data in a personalized or customized way.
In this study, we employed the iPAS method to identify disturbed pathways by quantifying the individual pathway aberrance compared with accumulated normal samples. Specifically, this method systematically provided a series of analysis steps: gene expression data collection and preprocessing, pathway data recruitment and preprocessing, gene-level statistics, pathway-level statistics, and significant disturbed pathway analysis. Under this framework, we expected to identify the disturbed pathways in SCD and to further understand the underlying mechanism of SCD.
2.1 Gene expression data
Gene expression data of SCD were retrieved from the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under the accession number E-GEOD-35007. The gene expression data were obtained from 250 SCD patients and 61 age-matched controls using the Illumina HumanHT-12 V4.0 beadchip platform. These control samples were defined as accumulated control samples, referred to as “nRef” hereinafter. The detailed sample characteristics were presented in the previous study. All raw data and the annotations were obtained from the manufacturer’s documents, and the probes were re-annotated to gene symbols. Finally, a total of 31,426 gene symbols were obtained for subsequent analysis.
2.2 Pathway data
In this study, all human biological pathways were retrieved from the Reactome pathway database (http://www.reactome.org/). Pathways with a very large number of genes are too complex to be understood by human experts. Thus, pathways with gene size > 100 were removed from our study. Then, by intersecting genes between pathway data and gene expression data, we obtained a total of 1,022 pathways (covering 4,928 genes) for subsequent analysis.
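The two filtering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_pathways`, the toy pathway names, and the decision to drop pathways left with no measured genes are all assumptions introduced here.

```python
def filter_pathways(pathways, expressed_genes, max_size=100):
    """Drop pathways with more than max_size genes, then keep only measured genes."""
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in pathways.items():
        unique_genes = set(genes)
        if len(unique_genes) > max_size:
            continue  # pathway considered too large to interpret
        overlap = unique_genes & expressed
        if overlap:  # assumption: pathways with no measured gene are discarded
            kept[name] = sorted(overlap)
    return kept


# Toy input; pathway names and most gene symbols are hypothetical.
pathways = {
    "small_pathway": ["HMBS", "ALAS2", "UROD"],
    "no_overlap": ["GENE_X"],
    "huge_pathway": [f"G{i}" for i in range(150)],
}
expressed_genes = ["HMBS", "ALAS2", "G1", "G2"]

filtered = filter_pathways(pathways, expressed_genes)
print(filtered)
```

Applying the size filter before the intersection follows the order given in the text; reversing the order could retain large pathways whose measured subset happens to be small.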
2.3 Individualized pathway analysis
2.3.1 Data preprocessing
Standard pretreatment was conducted to control the quality of the gene expression data. For normal genes, background correction and normalization were implemented with the RMA algorithm and a quantile-based algorithm to eliminate the effect of nonspecific hybridization [11, 12]. Then, Micro Array Suite 5.0 was applied to revise the perfect match and mismatch values, and the medianpolish method was used to summarize the expression values.
2.3.2 Gene-level statistics
First, we calculated the average expression value and standard deviation of each gene in normal conditions. Then, the gene expression value in each individual disease sample was standardized using the average and standard deviation of nRef as reference. For each gene i in the disease group, we calculated the standardized expression value as follows:
$$z_i = \frac{g_{Di} - \mathrm{mean}(g_{nRef})}{\mathrm{stdev}(g_{nRef})}$$
where $g_{Di}$ is the expression value of gene i in one SCD sample, $\mathrm{mean}(g_{nRef})$ is the average expression value of gene i in nRef, and $\mathrm{stdev}(g_{nRef})$ is the standard deviation of gene i in nRef.
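As a rough illustration of this standardization, the sketch below computes the gene-level statistic for one disease sample against a small reference set. The function name `gene_level_z`, the dictionary layout, and the use of the sample (n − 1) standard deviation are assumptions made here; the source does not state which estimator was used.

```python
from statistics import mean, stdev


def gene_level_z(disease_sample, nref_samples):
    """Standardize one disease sample against accumulated normal samples (nRef)."""
    z = {}
    for gene, value in disease_sample.items():
        reference = [sample[gene] for sample in nref_samples]
        mu = mean(reference)
        sd = stdev(reference)  # sample standard deviation; zero variance would fail below
        z[gene] = (value - mu) / sd
    return z


# Toy data: three reference samples and one patient, two hypothetical genes.
nref = [{"A": 8.0, "B": 5.0}, {"A": 10.0, "B": 7.0}, {"A": 12.0, "B": 6.0}]
patient = {"A": 14.0, "B": 6.0}

z = gene_level_z(patient, nref)
print(z)
```

Here gene A sits two reference standard deviations above the nRef mean, while gene B equals the nRef mean and therefore scores zero.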
2.3.3 Pathway-level statistics
It has been indicated that the Average Z method performed best in highlighting pathway aberrance and in further revealing clinical importance. Thus, we employed the Average Z method to evaluate the pathway aberrance of individual samples in this study. A vector Z = (z1, z2, …, zn) represents the expression status of a pathway, where zi is the standardized expression of the i-th gene and n is the number of genes belonging to the given pathway. The iPAS of one pathway was defined as follows:
$$\mathrm{iPAS} = \frac{1}{n}\sum_{i=1}^{n} z_i$$
An iPAS matrix was thereby obtained, holding one value for each pathway in each individual sample. The mean of the iPAS values of each pathway was defined as the pathway aberrance level of each disease sample.
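The Average Z score amounts to a plain mean of the standardized gene values within a pathway. The following sketch assumes the z-scores are already available as a dictionary; the function name `ipas_average_z` and the rule of skipping pathway genes that were not measured are assumptions of this example.

```python
def ipas_average_z(z_scores, pathway_genes):
    """Average Z: mean standardized expression of the measured genes in a pathway."""
    values = [z_scores[g] for g in pathway_genes if g in z_scores]
    if not values:
        raise ValueError("none of the pathway genes were measured")
    return sum(values) / len(values)


# Toy z-scores for one disease sample; gene and pathway names are hypothetical.
z_scores = {"A": 2.0, "B": 0.0, "C": -1.0, "D": 4.0}
pathways = {"P1": ["A", "B"], "P2": ["C", "D", "E"]}

scores = {name: ipas_average_z(z_scores, genes) for name, genes in pathways.items()}
print(scores)
```

Repeating this for every disease sample gives one row per sample and one column per pathway, which is the matrix used in the downstream significance analysis.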
2.3.4 Disturbed pathways analysis
After obtaining the pathway aberrance of all pathways in individual SCD samples, we performed a significance analysis to identify the disturbed pathways in SCD. In this study, the Wilcoxon test was used to generate the pathway statistics, and the false discovery rate (FDR) was used to adjust the p-values. Pathways with p-value < 0.01 were considered disturbed pathways in SCD. In addition, the top ten disturbed pathways ranked by p-value were selected for a clustering analysis.
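A compact way to picture this step is a per-pathway rank test followed by a multiple-testing adjustment. The sketch below is an assumption-laden illustration: it uses the unpaired Wilcoxon rank-sum form with a normal approximation and no tie correction, and the Benjamini-Hochberg procedure for FDR, neither of which is confirmed by the source as the exact variant used. The function names are hypothetical.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # tied values share their average rank
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy pathway scores for five disease and five reference samples.
disease = [2.1, 2.5, 3.0, 2.8, 3.3]
normal = [0.1, -0.2, 0.3, 0.0, 0.2]
p_shifted = rank_sum_p(disease, normal)
p_identical = rank_sum_p(disease, disease)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_shifted, p_identical, adjusted)
```

With hundreds of samples per group the normal approximation is usually adequate; for small groups an exact test from a statistics library would be the safer choice.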
2.4 Changed percentage of disturbed pathways
To further validate the disturbed pathways identified by the iPAS method, we calculated the changed percentage of each pathway across all SCD samples. To achieve this, we first determined the distribution of each pathway statistic in normal and disease samples to establish a baseline. Then, statistical analysis was conducted on the disturbed pathways (p-value < 0.01) in the disease group to obtain the changed percentage of each pathway across all SCD cases. Pathways that were disturbed in > 80% of SCD cases were extracted for further analysis.
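The text does not spell out the rule used to call a pathway "changed" in a single sample. As one possible reading, the sketch below counts a disease sample as changed when its pathway score falls outside mean ± 1.96 standard deviations of the nRef scores. That cutoff, the function name `changed_percentage`, and the toy values are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def changed_percentage(disease_scores, normal_scores, k=1.96):
    """Percentage of disease samples whose score lies outside the nRef band."""
    mu = mean(normal_scores)
    sd = stdev(normal_scores)
    lower, upper = mu - k * sd, mu + k * sd  # hypothetical "normal" band
    changed = sum(1 for s in disease_scores if s < lower or s > upper)
    return 100.0 * changed / len(disease_scores)


# Toy pathway scores for five reference and five disease samples.
normal = [-0.2, 0.1, 0.0, 0.2, -0.1]
disease = [1.5, 0.05, 2.0, -1.2, 1.1]

pct = changed_percentage(disease, normal)
print(pct)
```

Under this reading, a pathway would pass the 80% criterion only if more than four in five patients fall outside the reference band.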
2.5 Differentially expressed genes (DEGs) based pathway analysis
As a validation step, we performed a DEG-based pathway analysis. To achieve this, DEGs between SCD patients and controls were identified with the Linear Models for Microarray Data (LIMMA) package, and the p-values were adjusted by FDR. Genes meeting the criteria of |log(FoldChange)| > 2 and p-value < 0.01 were considered DEGs between SCD patients and controls. Then we performed a pathway enrichment analysis based on the Reactome pathway database, using the online Database for Annotation, Visualization and Integrated Discovery (DAVID). Pathways with p-value < 0.01 were considered significant.
The conducted research is not related to either human or animal use.
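The DEG selection rule can be illustrated with a small filter over per-gene results. This sketch assumes the log fold changes and adjusted p-values have already been produced by a differential expression tool; the function name `select_degs`, the gene labels, and the numbers are hypothetical, and the base of the logarithm is not stated in the source.

```python
def select_degs(results, lfc_cutoff=2.0, p_cutoff=0.01):
    """Split genes passing both thresholds into up- and down-regulated lists."""
    up, down = [], []
    for gene, (log_fc, p_adj) in results.items():
        if abs(log_fc) > lfc_cutoff and p_adj < p_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Toy results: gene -> (log fold change, adjusted p-value); all values hypothetical.
results = {
    "gene_1": (2.6, 0.0004),   # passes, up-regulated
    "gene_2": (-2.3, 0.002),   # passes, down-regulated
    "gene_3": (3.1, 0.2),      # large change but not significant
    "gene_4": (1.2, 0.0001),   # significant but change too small
}

up, down = select_degs(results)
print(up, down)
```

Requiring both a large effect and a small p-value is deliberately strict, which is consistent with the short DEG list reported in the results.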
3.1 Identification of disturbed pathways
In the present study, the 61 normal controls in the gene expression profile E-GEOD-35007 were defined as nRef (reference) for the 250 SCD patients. Quantile normalization was performed on disease genes to evaluate their gene-level statistics using the accumulated normal data. Meanwhile, a total of 1,022 pathways were obtained from the Reactome pathway database. We extracted the gene-level statistics of all genes in each pathway and took their mean as the pathway-level statistic of that pathway. According to the iPAS method based on the Average Z measure, we obtained the pathway aberrance scores of individual SCD samples. The p-value of each pathway was calculated by applying the Wilcoxon test to the pathway-level statistics. Under the criterion of p-value < 0.01, a total of 618 disturbed pathways were identified in SCD compared with the normal condition. The top ten disturbed pathways are shown in Table 1. A clustering analysis was conducted based on the top ten disturbed pathways, with the resulting heatmap illustrated in Figure 1. Moreover, we calculated the classification performance of the top ten disturbed pathways for all samples. Ideally, all samples should be classified into two major clusters. Our results showed that the top ten disturbed pathways could separate SCD patients from normal controls with an accuracy of 0.89. The most significant disturbed pathway was metabolism of porphyrins (p = 8.31E-27).
3.2 Changed percentage of disturbed pathways
To further validate the altered pathways in SCD, we calculated the changed percentage of each disturbed pathway across the 250 SCD samples. A total of 6 disturbed pathways changed in more than 80% of SCD individuals (Table 2). Activation of PUMA and translocation to mitochondria was the most frequently affected pathway, with changes occurring in 223 disease individuals (89.2%), and metabolism of porphyrins changed in 220 disease individuals (88.0%). Of these 6 disturbed pathways, 4 were among the top ten disturbed pathways in SCD.
3.3 DEGs based pathway analysis
After preprocessing of the gene expression profile, the LIMMA package was employed to calculate differential expression values. Under the criteria of |logFoldChange| > 2 and p-value < 0.01, we obtained 46 DEGs in SCD, including 2 down-regulated genes and 44 up-regulated genes. A pathway enrichment analysis was then performed on these DEGs using DAVID. Under the threshold of p-value < 0.01, only one significant pathway, heme biosynthesis (p = 2.0E-04), was identified, and it was also a disturbed pathway identified by the iPAS method.
3.4 Gene level statistics of DEGs in disturbed pathways
After quantile normalization of genes in all cases, we obtained the gene-level statistics of each gene separately. The gene-level statistics of the DEGs were then examined, as shown in Figure 2. The expression levels of most DEGs were higher in the disease condition than in the normal condition. This suggests that the differences at the gene level may lead to the aberrance of pathways in SCD compared with nRef.
In the current study, we employed the Average Z-based iPAS method to quantify individual pathway aberrance compared with accumulated normal samples, and identified several disturbed pathways in SCD that may contribute to a better understanding of the mechanism of SCD. Using the iPAS method, we screened out a total of 618 disturbed pathways between normal and disease conditions. Further analysis showed that 6 of them changed in more than 80% of SCD individuals, including metabolism of porphyrins and heme biosynthesis. Furthermore, traditional DEG-based pathway enrichment analysis was conducted to validate the new method. By functional enrichment analysis of DEGs, we obtained only one significant pathway in SCD, heme biosynthesis, which was also one of the disturbed pathways identified by the iPAS method. Relative to traditional pathway enrichment analysis, the iPAS method uses the Average Z measure to quantify pathway aberrance across individual samples, yielding richer results and broader applicability.
Heme biosynthesis was the only disturbed pathway identified by both the iPAS method and DEG-based pathway enrichment analysis, implying a crucial role in the pathogenesis of SCD. Heme, a complex of protoporphyrin IX with iron, is a prosthetic group of various hemoproteins and an essential cofactor in many biological processes [19, 20]. It is well known that heme biosynthesis is one of the most important metabolic pathways in mammals, and defective heme biosynthesis gives rise to severe metabolic disorders, such as erythropoietic porphyria and sideroblastic anemia. SCD is a group of genetic blood disorders, and patients with SCD possess sickle hemoglobin, an abnormal form of the oxygen-transport protein in red blood cells [22, 23]. With the increase of oxidative stress, heme biosynthesis was markedly increased in circulating endothelial cells in SCD.
Relative to traditional pathway analysis, the iPAS method identified more disturbed pathways, such as metabolism of porphyrins, activation of PUMA and translocation to mitochondria, and pre-NOTCH expression and processing. Metabolism of porphyrins was the most significant disturbed pathway and changed in 88.0% of SCD subjects. It is well known that hemolysis of red blood cells is an inherent characteristic of SCD and results in the release of porphyrins and their metabolites [25, 26]. Porphyrins comprise a portion of hemoglobin, the major constituent of human red blood cells. Disorders of porphyrin metabolism might be closely associated with the pathophysiology of SCD. A previous patent proposed a method for clinically screening SCD by detecting porphyrins and porphyrin metabolites in human dentition. Activation of PUMA and translocation to mitochondria changed in 89.2% of SCD samples. PUMA, also known as p53 upregulated modulator of apoptosis, is a p53-dependent pro-apoptotic protein, and activated PUMA interacts with Bcl-2 family members to signal apoptosis to the mitochondria. SCD is a disease of hypoxia because of insufficient numbers of erythrocytes for oxygen delivery. Hypoxia and oxidative stress can induce p53-dependent cell apoptosis. Recent findings indicated that PUMA participates in hypoxia-triggered cell apoptosis through the mitochondrial pathway. Forster et al. found that PUMA was significantly differentially expressed, with over 2-fold changes, in erythroid progenitor cells in β-thalassaemia. The disturbed pathway pre-NOTCH expression and processing changed in 82.8% of SCD subjects. Notch is a binary switch for cell-fate decisions mediated by cell-cell interactions, and aberrant Notch signal transduction is associated with cancer and other human diseases. Pre-NOTCH is the nascent form of the Notch precursor. Notch proteins are expressed in hematopoietic cells and have been indicated to play a fundamental role in regulating the induction of hematopoietic stem cells and lineage cell-fate decisions [34, 35].
The disturbed pathways identified by this new method might play important roles in the pathogenesis of SCD. Further studies should be performed to explore the specific mechanisms linking these disturbed pathways to SCD development.
In conclusion, the iPAS method is an applicable strategy for exploring disturbed pathways based on gene expression data. Using the iPAS method, we identified several disturbed pathways in SCD, such as metabolism of porphyrins and heme biosynthesis. These pathways might play significant roles in the pathogenesis of SCD and could be considered potential predictive and prognostic markers for SCD.
This work was supported by the Key Project of Research and Development of Shandong Province, China (2015GSF118131).
GBD 2015 Disease and Injury Incidence and Prevalence Collaborators, Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015, Lancet, 2016, 388, 1545-1602
GBD 2015 Mortality and Causes of Death Collaborators, Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the Global Burden of Disease Study 2015, Lancet, 2016, 388, 1459-1544
Quinlan J., Idaghdour Y., Goulet J.P., Gbeha E., de Malliard T., Bruat V., et al., Genomic architecture of sickle cell disease in West African children, Front Genet, 2014, 5, 26
Raghavachari N., Xu X., Harris A., Villagra J., Logun C., Barb J., et al., Amplified expression profiling of platelet transcriptome reveals changes in arginine metabolic pathways in patients with sickle cell disease, Circulation, 2007, 115, 1551-1562
Chang Milbauer L., Wei P., Enenstein J., Jiang A., Hillery C.A., Scott J.P., et al., Genetic endothelial systems biology of sickle stroke risk, Blood, 2008, 111, 3872-3879
Glazko G.V., Emmert-Streib F., Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets, Bioinformatics, 2009, 25, 2348-2354
Ahn T., Lee E., Huh N., Park T., Personalized identification of altered pathways in cancer using accumulated normal tissue data, Bioinformatics, 2014, 30, i422-429
Irizarry R.A., Bolstad B.M., Collin F., Cope L.M., Hobbs B., Speed T.P., Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Res, 2003, 31, e15
Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P., A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics, 2003, 19, 185-193
Bolstad B., affy: Built-in Processing Methods, 2013
Smyth G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments, Stat Appl Genet Mol Biol, 2004, 3, 3
Alvord G., Roayaei J., Stephens R., Baseler M.W., Lane H.C., Lempicki R.A., The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists, Genome Biol, 2007, 8, 183
Stegenga K., Burks L.M., Using photovoice to explore the unique life perspectives of youth with sickle cell disease: a pilot study, J Pediatr Oncol Nurs, 2013, 30, 269-274
Nath K.A., Grande J.P., Haggard J.J., Croatt A.J., Katusic Z.S., Solovey A., et al., Oxidative stress and induction of heme oxygenase-1 in the kidney in sickle cell disease, Am J Pathol, 2001, 158, 893-903
Richard P.A., Method of screening for sickle cell disease by detection of porphyrins and porphyrin metabolites in human dentition, 1980, Patent No. 4236526
Li Y., Liu X., Rong F., PUMA mediates the apoptotic signal of hypoxia/reoxygenation in cardiomyocytes through mitochondrial pathway, Shock, 2011, 35, 579-584
Forster L., McCooke J., Bellgard M., Joske D., Finlayson J., Ghassemifar R., Differential gene expression analysis in early and late erythroid progenitor cells in beta-thalassaemia, Br J Haematol, 2015, 170, 257-267
About the article
Published Online: 2017-12-29
Conflict of interest: Authors state no conflict of interest.
Citation Information: Open Life Sciences, Volume 12, Issue 1, Pages 418–424, ISSN (Online) 2391-5412, DOI: https://doi.org/10.1515/biol-2017-0049.
© 2017 Chun-Juan Lu et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (BY-NC-ND 4.0).