Open Medicine

formerly Central European Journal of Medicine

Editor-in-Chief: Darzynkiewicz, Zbigniew

Open Access
Volume 14, Issue 1



Prediction of recurrence-associated death from localized prostate cancer with a Charlson Comorbidity Index–reinforced machine learning model

Yi-Ting Lin
  • Department of Urology, St. Joseph Hospital, Yunlin County, 63241, Taiwan
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
/ Michael Tian-Shyug Lee
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
/ Yen-Chun Huang
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
/ Chih-Kuang Liu
  • Department of Urology, St. Joseph Hospital, Yunlin County, 63241, Taiwan
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
/ Yi-Tien Li
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
/ Mingchih Chen
  • Corresponding author
  • Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
Published Online: 2019-08-14 | DOI: https://doi.org/10.1515/med-2019-0067


Research has failed to resolve the dilemma experienced by localized prostate cancer patients who must choose between radical prostatectomy (RP) and external beam radiotherapy (RT). Because the Charlson Comorbidity Index (CCI) is a measurable factor that affects survival events, this research seeks to validate the potential of the CCI to improve the accuracy of various prediction models. Thus, we employed the Cox proportional hazard model and machine learning methods, including random forest (RF) and support vector machine (SVM), to model the data of medical records in the National Health Insurance Research Database (NHIRD). In total, 8581 individuals were enrolled, of whom 4879 had received RP and 3702 had received RT. Patients in the RT group were older and exhibited higher CCI scores and higher incidences of some CCI items. Moderate-to-severe liver disease, dementia, congestive heart failure, chronic pulmonary disease, and cerebrovascular disease all increase the risk of overall death in the Cox hazard model. The CCI-reinforced SVM and RF models are 85.18% and 81.76% accurate, respectively, whereas the SVM and RF models without the use of the CCI are relatively less accurate, at 75.81% and 74.83%, respectively. Therefore, CCI and some of its items are useful predictors of overall and prostate-cancer-specific survival and could constitute valuable features for machine-learning modeling.

1 Introduction

Localized prostate cancer is one of the most common male cancers and has led to unavoidable cancer death and impairment of quality of life [1, 2]. Current evidence has already demonstrated that great controversy exists regarding the treatment options for localized prostate cancer. Radical prostatectomy (RP) is associated with the lowest cancer-specific mortality in observational studies [3, 4], but selection bias exists. The selection bias means that patients with low comorbidity tend to undergo RP, whereas patients with high comorbidity are recommended to receive external beam radiotherapy (RT) [3, 4]. Randomized trials have demonstrated no significant differences between patients undergoing RP and patients undergoing RT in overall and cancer-specific survival. In addition, no differences are exhibited in long-term quality of life (QoL) for patients undergoing RP versus those undergoing RT [4]. Currently, the shared decision-making process has been made part of standard practice in medical decision management for localized prostate cancer. However, patients who must choose their treatments encounter a dilemma when they have information regarding only the reported statistical cancer outcomes and posttreatment QoL but are unable to take personal characteristics into consideration [3, 4]. Integrating cancer characteristics, comorbidity, and cancerous outcomes into a machine-learning model, and obtaining a predictive result might represent a solution to the dilemma.

Machine-learning modeling could be employed as a powerful tool for addressing the problem of treatment-related decisions for patients with prostate cancer. In the past 20 years, an increasing number of new, powerful algorithms and computer science advances have made the modeling of big medical data possible. The machine learning models formulated using big medical data enable individualized predictions of clinical outcomes. Several instances of this have already occurred in the areas of oncology outcome prediction. Algorithms, such as support vector machine (SVM), random forest (RF), artificial neural network, and decision tree algorithms, have been applied for modeling with acceptable accuracy [5]. Kourou et al. indicated that a small sample size represents the most common research limitation for applications of machine learning in past decades [5]. Furthermore, after reviewing the literature thoroughly, they also concluded that the dataset quality and careful feature selection are also crucial for effective machine learning and accurate prediction. Relatively few references currently utilize machine learning algorithms for prostate cancer research [6, 7]. Two studies have employed machine learning to maximize the detection accuracy of prostate cancer. The absence of research focusing on machine learning predictions of localized prostate cancer outcomes may be a result of the long survival times following treatment. The longer the duration of disease-free survival is, the greater the chances are that non-prostate-cancerous factors, such as comorbidity and accidents, constitute the primary determinants of survival. In addition to cancerous characteristics, therefore, this research investigates the effects of non-prostate-cancerous factors for modeling long-survival cancers.

The Charlson Comorbidity Index (CCI) [8] serves as a valuable, measurable indicator for improving prediction accuracy with machine learning models for long life expectancy cancers. The CCI has been recognized as influential in relation to the clinical outcomes for localized prostate cancer. Park et al. declared CCI to be a prognostic factor for RP outcomes [9]. Lee et al. demonstrated that CCI is a major prognostic factor for long-term survival after RP [10]. Later, Rajan et al. denied the effect of comorbidity on cancer-specific survival in prostate cancer patients [11]. However, they all found comorbidity to be a reliable predictor for overall survival. Comorbidity has also been proven to represent a strong predictor of overall survival for patients treated with radiotherapy. Jespersen CG et al. demonstrated that the choice of treatments for localized prostate cancer is affected by comorbidity [12]. Therefore, CCI is not only a factor that influences survival but is also an indicator for treatment options.

Taiwan’s National Health Insurance Research Database (NHIRD), created in 1998, is an excellent resource for medical research because enrollment in Taiwanese national health insurance is compulsory. The medical expense records of more than 99% of Taiwan’s population are included in the NHIRD. The NHIRD currently provides 78 openly accessible databases, including 59 health databases, 5 social databases, and 14 welfare databases. The first 14 databases contain health insurance–related data files relating to ambulatory patients, inpatients, pharmacy information and medical care orders, and cause-of-death data. As part of the NHIRD, the Taiwan Cancer Registry (TCR) was created by the Department of Health to comprehensively collect cancer epidemiological data from 1996 onwards. The TCR has become an indispensable resource for cancer research in Taiwan. Using the NHIRD, Wang’s work surveyed second primary malignancy risk after radiotherapy for patients with rectal cancer [13]. Wei et al. used the NHIRD to show that a common comorbidity, chronic kidney disease, affects the survival and mortality risk of patients with lung cancer [14]. A type of Chinese herb was shown with this database to improve the survival of patients with lung cancer [15]. Yang et al. found, using a Cox regression model, that the CCI is a good predictor for NHIRD patients with lung cancer [16]. Subsequently, NHIRD-based findings have been reported for breast cancer [17, 18] and hepatoma [19]. NHIRD-based studies of prostate cancer have also been published [20, 21, 22]. However, these prostate cancer studies have not focused on cancer recurrence or death. The big data of the NHIRD have also been analyzed by various researchers using machine learning modeling. Hu et al. used the NHIRD to conduct machine learning predictions regarding return visits for pediatric patients to the emergency department [23]. Wang et al. utilized machine learning classifiers for the prediction of brain metastasis of lung cancer for patients from the NHIRD [24]. Some research has aimed to validate NHIRD comorbidity data, such as acute myocardial infarction or stroke, and death data. High consistency among these data has been observed [25, 26, 27]. Thus, data derived from the NHIRD should be well suited to the machine learning modeling of prostate cancer survival.

To our knowledge, no machine learning models have been able to predict localized prostate cancer outcomes. Although CCI has been proven to represent a significant prognostic factor for treatment outcome of localized prostate cancer, few studies have taken the CCI into account when building machine learning models. This study focused mainly on building a CCI-reinforced machine learning model for the prediction of different treatment outcomes. Thus, the purposes of this study were 1) to use data extracted from the NHIRD to compare the cancer outcomes for patients with localized prostate cancer who have undergone RP versus patients with localized prostate cancer who have undergone RT, 2) to analyze the factors that influence cancer-specific and overall survival, and 3) to observe the effects of the CCI on the improvement of prediction ability (and thus to determine whether the CCI represents a significant factor in cancer outcomes).

2 Materials and methods

2.1 Search for target population in database

This research was a retrospective study targeting patients with localized prostate cancer who had undergone RP or RT. Both ICD-9 (185) and ICD-10 (C61) codes were used to identify patients with prostate cancer in the NHIRD from 2008 to 2015. To identify clinical T1N0M0 and T2N0M0 patients, the TCR database was incorporated to obtain information on the initial clinical stages of patients. Next, the radical prostatectomy procedure codes (79403B, 79410B) and the order codes for external beam radiation therapy (36015B, 36002B, 36005B, 36012B, 36019B) were used to search for patients who had received RP or RT as their initial definite treatment. Individuals with incomplete data on, for example, clinical stage, histological grading, or radiotherapy start date were excluded. Patients whose recurrence statuses were recorded as “uncertain” or whose radiation modality was marked “unknown or other than external beam radiation therapy” in the TCR database were also excluded. The flow chart for the study population retrieved from the NHIRD appears in Figure 1. The demographic statistical comparison between the RP group and the RT group is shown in Table 1.

Figure 1

Flow chart of subject selection

This figure shows the whole procedure for establishing our target population. The dataset of the target population was extracted from the outpatient expense file, hospitalization expense file, TCR file, and death cause file of the NHIRD.

Table 1

Demographic features among treatment groups.

2.2 Data collection and definition of variables

Variables were extracted from the medical expense records and Cancer Registry records of every patient in our target population. These variables can be categorized into cancerous characteristic variables, comorbidity variables, and cancerous outcome variables. Cancerous characteristic variables include age, clinical T stage, histological grade, initial definite treatment, and duration of follow-up. The cancer data were obtained from the TCR file. Comorbidity variables [28] record the presence or absence of the following conditions: myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, mild liver disease, diabetes without chronic complication, diabetes with chronic complication, hemiplegia or paraplegia, renal disease, any malignancy (including lymphoma and leukemia but excluding malignant neoplasm of skin), moderate or severe liver disease, metastatic solid tumor, and AIDS/HIV, together with the CCI itself. We obtained the status of every comorbidity item contained in the CCI by screening ICD-9 codes in the outpatient department and hospitalization expense databases. The cancerous outcome variables, including overall survival time and cancer-specific survival time, were retrieved from the cause-of-death file. These data were analyzed and modeled as described in the following sections.
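The way presence/absence flags combine into a single CCI score can be sketched as follows. This is a minimal Python illustration, not the authors' SAS extraction code. It assumes the original Charlson weights (1, 2, 3, or 6 points per condition), which agree with the score of 3 for moderate-to-severe liver disease mentioned in the Discussion. The condition names are hypothetical labels, not NHIRD field names, and the sketch does not apply any rule for counting only the more severe of two related conditions.

```python
# Assumed weights: the original Charlson index. Keys are hypothetical labels.
CCI_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complication": 1,
    "diabetes_with_complication": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 6,
}


def cci_score(conditions):
    """Sum the weights of the conditions recorded as present for one patient."""
    present = set(conditions)  # a condition counts once, however often coded
    unknown = present - set(CCI_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown condition(s): {sorted(unknown)}")
    return sum(CCI_WEIGHTS[c] for c in present)


def cci_level(score):
    """Map a score to the seven strata used in the Cox model (0 to 5, 6+)."""
    return "CCI6+" if score >= 6 else f"CCI{score}"


patient = ["dementia", "moderate_or_severe_liver_disease"]
score = cci_score(patient)
print(score, cci_level(score))
```

A patient with no recorded comorbidity scores 0 and falls in the reference stratum CCI0.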

The demographic and cancer characteristics, comorbidities, and cancerous outcomes of the treatment groups extracted from the NHIRD are listed and compared in Table 1. The chi-square test was used for categorical variables, and the independent t test was used for numeric variables.

2.3 Establish significant causal factors for cancerous outcomes with the Cox Hazard regression model

We stratified all patients according to their T stage, grade, initial definite treatment, and CCI, and we compared the cancer-specific survival and overall survival using the Kaplan–Meier method (Figure 2). The significant causal factors were identified for recurrence-free survival, disease specific survival, and overall survival time using a Cox hazard regression model (Table 2).
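For readers unfamiliar with the Kaplan–Meier method, the estimator can be sketched in a few lines of Python. This is an illustration only: the study itself used the R packages survival and survminer, and the function name and toy data below are hypothetical. At each time with at least one death, the running survival probability is multiplied by (1 - deaths / number at risk); censored patients leave the risk set without causing a step.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] for each distinct time at which a death occurred.

    times  -- follow-up duration of each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (not from the study): five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3), round(1 - s, 3))  # time, survival, accumulated mortality
```

The accumulated mortality plotted in Figure 2 corresponds to 1 - S(t).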

Table 2

Hazard ratios of features in Cox proportional Hazard model

Figure 2

Accumulated mortality events curve, stratified by initial definite treatment, grade, stage, and years

Mortality events are significantly higher in high grade, RT group.

Grade 1: Gleason score 2~5; Grade 2: Gleason score 6,7; Grade 3: Gleason score 8~10

*: Statistically significant, p<0.05

2.4 Comparison of machine learning models with and without CCI

Machine learning models, including RF and SVM, were adopted for the prediction of cancer outcomes. To test whether the CCI enhances the predictive ability of a machine learning model, we built the models in two stages. In the first stage, models were trained with cancerous characteristic factors and follow-up duration as inputs and cancerous outcomes as outputs. The second stage differed only in that comorbidity variables were added to the input variables of stage one, as seen in Table 3. All algorithms were run with their default parameter settings. To compensate for the imbalanced classes of cancer outcomes, three new datasets were constructed with class ratios of 1:1, 1:2, and 1:3 [29]. Five-fold cross-validation was then applied to these three datasets. The mean accuracy, mean AUC, and mean kappa value are reported in Table 4.
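The downsampling and cross-validation steps can be sketched as below. This is a hedged Python illustration, not the authors' R code. The counts of 413 deaths among 8581 patients are taken from the Results. The function names, the fixed random seed, and the use of simple random sampling without stratification are assumptions made for the sketch.

```python
import random


def downsample(minority, majority, ratio, seed=0):
    """Keep all minority cases plus `ratio` times as many random majority cases."""
    rng = random.Random(seed)
    n = min(len(majority), ratio * len(minority))
    return list(minority) + rng.sample(list(majority), n)


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# Placeholder records: 413 deaths and 8581 - 413 survivors, as in the Results.
deaths = [("death", i) for i in range(413)]
survivors = [("alive", i) for i in range(8581 - 413)]

for ratio in (1, 2, 3):
    data = downsample(deaths, survivors, ratio)
    sizes = [len(test) for _, test in k_fold_indices(len(data))]
    print(f"1:{ratio}", len(data), sizes)
```

Each downsampled dataset keeps every death and draws 413, 826, or 1239 survivors, so the three datasets contain 826, 1239, and 1652 records.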

Table 3

List of variables used in machine learning modeling for overall survival prediction

Table 4

Predictive ability of machine learning models

2.5 Software

SAS version 9.4 was used for data extraction and data preprocessing from the NHIRD and for Cox regression analysis. R version 3.5.1, run in RStudio, was used to implement the machine learning models. Several R packages were used, including randomForest for RF modeling, e1071 for SVM modeling, pROC and caret for calculating accuracy and kappa, and survival and survminer for Kaplan–Meier analysis.

3 Results

In total, the records of 8581 patients were included in the study: 4879 who had received radical prostatectomies (the RP group) and 3702 who had received radiotherapy (the RT group). As seen in Table 1, RP group members received fewer diagnoses of congestive heart failure (171 (3.50%) in RP vs 214 (5.78%) in RT, p < 0.0001), peripheral vascular disease (117 (2.40%) vs 115 (3.11%), p = 0.0451), chronic pulmonary disease (629 (12.89%) vs 593 (16.02%), p < 0.001), cerebrovascular disease (371 (7.60%) vs 410 (11.08%), p < 0.0001), hemiplegia (27 (0.55%) vs 38 (1.03%), p = 0.0123), dementia (47 (0.96%) vs 97 (2.62%), p < 0.0001), and any malignancy (256 (5.25%) vs 270 (7.29%)). The RP group also had lower CCI scores (1.36 vs 1.42, p < 0.0001), fewer overall deaths (88 (1.80%) vs 325 (8.78%), p < 0.0001), and fewer prostate cancer–specific deaths (24 (0.49%) vs 79 (2.13%), p < 0.0001). However, exceptions also occurred. In the RP group, more patients had ICD-9 codes for ulcer disease (839 (17.20%) vs 544 (14.69%)), diabetes mellitus without complications (1079 (22.12%) vs 657 (17.75%), p < 0.0001), and metastatic solid tumor (130 (2.66%) vs 51 (1.38%), p < 0.0001). Our data also revealed that the RP and RT groups had equal incidences of any severity of liver disease, diabetes mellitus with complications, ulcer disease, and connective tissue disease, as well as equal follow-up times.

The Cox proportional hazard model was established, as shown in Table 2. The CCI was stratified into seven levels equivalent to CCIs of 0, 1, 2, 3, 4, 5, and 6 or more. Compared with patients with CCI0, patients with CCI1, CCI2, CCI3, CCI4, CCI5, and CCI6+ had a statistically significantly higher risk of overall death, with hazard ratios of 1.461, 1.816, 2.015, 2.455, 2.108, and 2.767, respectively. Grade, initial definite treatment, and age also played significant roles in the risk of overall death. Among CCI items, moderate-to-severe liver disease, dementia, congestive heart failure, chronic pulmonary disease, and cerebrovascular disease were significantly associated with a higher risk of overall death. By contrast, the risk of prostate cancer–specific death in patients classified as CCI1, CCI2, CCI3, CCI4, CCI5, and CCI6+ did not differ significantly from that in patients classified as CCI0. Grade and initial definite treatment were significantly associated with a higher risk of prostate cancer–specific mortality. Metastatic solid tumor, followed by moderate-to-severe renal disease, increased the risk of prostate cancer–specific death. The accumulated survival event curves stratified by age, initial definite treatment, grade, and CCI were plotted. The Kaplan–Meier analysis revealed that the RP group, low-grade groups, and low-CCI groups were statistically significantly associated with fewer overall mortality events (Figure 2).

Because the CCI and its items were highly correlated with overall mortality rather than with prostate cancer–specific mortality, we created models with and without the CCI and its related items to predict overall deaths instead of prostate cancer–specific deaths. Our dataset contained 413 overall deaths, so we downsampled the much larger group of patients who survived until the end of the follow-up period to 413, 826, and 1239 patients, according to the previously designed ratios of 1:1, 1:2, and 1:3. The CCI-reinforced SVM model with a 1:1 sampling ratio yielded the best accuracy and kappa (0.8518, 0.6315) for the prediction of overall death. SVM without CCI yielded a less accurate model with a smaller kappa value (0.7581, 0.5164). The same phenomenon was observed for the RF model. The CCI-reinforced RF model had higher accuracy and kappa (0.80, 0.60) than the RF model without CCI (0.7483, 0.4946). Changing the downsampling ratio for the survivor group among 1:1, 1:2, and 1:3 did not significantly affect the accuracy or kappa (Table 4).

4 Discussion

Patient selection for RP or RT is an art for urologists and patients. The flow charts in the 2010 NCCN clinical practice guidelines for prostate cancer [30], which most urologists in Taiwan follow, recommend RP for localized prostate cancer patients with life expectancies of more than 10 years. This is because the procedural risks associated with RP are higher than those associated with radiotherapy. RP also yields a better cancerous outcome in intermediate- and high-risk localized prostate cancer than RT does. For patients with life expectancies of less than 10 years and with lower clinical T stages and histological grading, RT might be sufficient to protect them from mortality caused by prostate cancer. In Taiwan, the cutoff age for a 10-year life expectancy was 76 years during the study period. The average ages of the two groups in our dataset (RP: 65.79 years, RT: 74.10 years) are consistent with the NCCN recommendation, as seen in Table 1. More young patients personally chose or were advised by physicians to undergo radical prostatectomies. In addition, we found that patients with fewer comorbidities tended to undergo radical prostatectomies rather than radiotherapy (Table 1). This finding is compatible with Jespersen’s work, in which the choice of treatment for localized prostate cancer is demonstrated to be affected by comorbidity [12]. This is reasonable because RP is a major surgery, and its success is highly dependent on patients’ general conditions. Systemic diseases yield more perioperative complications and prolong hospital stay [31]. Our findings furthermore revealed that the RT group had less favorable prostate cancer–specific mortality. This finding is consistent with Abdollah’s work and the subsequent systematic review and meta-analysis. This might result from radiation-insensitive tumor cells being spared during radiotherapy, which would not happen in the case of RP [32, 33]. In the RP group in our dataset, more patients had diabetes mellitus without end organ damage compared with other comorbidities. This may have been due to routine blood sugar tests performed according to clinical practice guidelines for preoperative blood tests [34]. Routine blood glucose checkups may have led to exhaustive detection and even overdiagnosis of diabetes mellitus.

Our data from the NHIRD of Taiwan revealed that CCI and associated items are critical risk factors for the overall survival of localized prostate cancer patients but not for prostate cancer–specific survival. These findings are consistent with the relevant literature. Matthes et al. surveyed the effects of comorbidities in patients with prostate cancer. They found that CCI (CCI1: HR: 2.07 (1.51-2.85) and CCI2+: 2.34 (1.59-2.34)) was associated with overall mortality rather than prostate cancer–specific mortality [35]. The increase in the overall mortality risk was also aligned with our results. Thomas’ study demonstrated the same conclusions in patients receiving RP [36]. The findings of Jae et al. also supported this assertion in patients with RP [37]. In line with our results, Matthes claimed that CCI played a more significant role than age in overall mortality risks [38]. Cancela and colleagues presented contradictory findings that age remains a major predictor in the untreated localized prostate cancer population [39]. To our knowledge, much literature addresses the measurement of the risks for every single CCI item. We established a Cox proportional hazard model to assess the effects of all CCI items and observed severe liver disease and dementia to be the most important factors associated with overall mortality. Minor factors, including congestive heart failure, chronic pulmonary disease, and cerebrovascular disease, were also identified. Because the score for moderate-to-severe liver disease is 3, moderate-to-severe liver disease has a higher correlation with overall mortality. Chen et al. studied individuals with dementia in Taiwan and found that they usually have multiple comorbidities and are poorly cared for [40]. This might be the reason why dementia is relevant. In clinical practice, advising patients and their families to take dementia into consideration during their decision-making processes might be advisable.

Our data suggested that the CCI and its items can reinforce the machine learning classifier for survival prediction of localized prostate cancer. As a good predictor of overall death, CCI and its items could be extracted from the NHIRD and successfully modeled with RF and SVM algorithms. The highest accuracy of our model is 85.18%, with a kappa value of 0.6315. Fleiss defined the quality of a classifier as fair to good for kappa values between 0.4 and 0.75 [41]. Currently, researchers rarely model prostate cancer survival with machine learning algorithms. This study might constitute a useful reference for other research. Imbalanced classification is fairly common in most real-world cases, and the class ratio in our dataset was 1:20. Downsampling is a well-known and widely used technique for solving imbalance problems [29]. In this study, we followed the recommendation of this handbook to use ratios of 1:1, 1:2, and 1:3, and we observed that accuracy did not change significantly. However, the kappa value and sensitivity decreased as the sampling ratio increased from 1:1 to 1:3. This occurred because false negative predictions increased as the sampling ratio increased. Consequently, although the 1:3 RF model with CCI yielded the highest accuracy among RF models (82.62%), we still regarded the 1:1 RF model with CCI as the best RF model because it has both high accuracy and high kappa. Our dataset suggested that SVM models with CCI were superior to RF models with CCI. However, we still cannot conclude with certainty that SVM is a superior algorithm to RF. The advantage of RF is that it is more interpretable because the RF algorithm enables the measurement of variable importance. This is helpful for variable selection and improved modeling.
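The relationship between accuracy and kappa under class imbalance can be made concrete with a small worked example. The Python sketch below computes accuracy and Cohen's kappa from a binary confusion matrix. The two confusion matrices are invented for illustration and are not results from this study. They show how a 1:3 sample with more false negatives can keep the same accuracy as a 1:1 sample while its kappa falls.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    if expected == 1.0:
        return observed, 0.0  # degenerate case: only one class present
    return observed, (observed - expected) / (1.0 - expected)


# Illustrative 1:1 sample: 50 deaths, 50 survivors.
acc_balanced, kappa_balanced = accuracy_and_kappa(tp=40, fn=10, fp=5, tn=45)

# Illustrative 1:3 sample: 50 deaths, 150 survivors, more false negatives.
acc_skewed, kappa_skewed = accuracy_and_kappa(tp=25, fn=25, fp=5, tn=145)

print(round(acc_balanced, 3), round(kappa_balanced, 3))
print(round(acc_skewed, 3), round(kappa_skewed, 3))
```

Both toy matrices give an accuracy of 0.85, but kappa is about 0.70 for the balanced sample and about 0.54 for the 1:3 sample, because chance agreement is higher when one class dominates.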

In this research, we successfully identified the risk factors for prostate cancer death and all deaths among patients with localized prostate cancer in Taiwan. The research findings could serve as a valuable reference for epidemiological research in the east Asian prostate cancer population, whose clinical courses and epidemiological features differ from Caucasian and Black populations’ [42, 43, 44]. The model we established could be helpful clinically during the shared decision-making process. The nature of compulsory enrollment in Taiwan’s National Health Insurance enabled the comprehensive compilation of medical expense records and ensured that NHIRD constitutes a high-quality database in terms of data integrity. Integrity is an essential property for a database to assure consistency and statistical accuracy [45]. To maintain dataset integrity, we strive to avoid unnecessary sample dropouts. Therefore, a new method was designed to model our dataset with RF and SVM algorithms.

Because follow-up time affects how long survival events can be observed, the follow-up time was included among the inputs for the machine learning models. We have validated this new method of machine learning modeling for survival prediction using our dataset for localized prostate cancer, a cancer well known to be associated with long survival. The longer the expected survival is, the longer the follow-up time required for observing survival events. Numerous researchers use fixed follow-up times, for example, 3 years or 5 years, for cancers with shorter target event-free survival [46, 47, 48, 49, 50]. However, no definite conclusions may be drawn regarding what follow-up time is adequate for observing localized prostate cancer survival: this might be 10 years, 20 years, or even longer. Such follow-up times are currently too long for most databases to support complete studies. Using this right-censoring method, patients with inadequate follow-up time can be retained, whereas they would be dropped under a fixed follow-up time, such as 3 or 5 years, for observing survival events. Retaining the information provided by records with inadequate follow-up helps maintain the integrity of our database and facilitates unbiased analysis and interpretation. In this study, we showed the method to be feasible. The accuracy of our machine learning model was up to 80%. In clinical practice, we can assign follow-up times (t) of 1 year, 2 years, and so on to predict survival.

According to our findings in this study, we strongly suggest that researchers take CCI and associated items into account when developing new models for prostate cancer decision aids in the future. Furthermore, our machine learning models, which have satisfactory accuracy, can serve as a decision aid during the shared decision-making process [51]. Physicians and patients could use our model to predict survival under different choices of definite treatment for prostate cancer. However, we cannot recommend such use until the psychological impact on patients of receiving such machine-learning suggestions has been evaluated. Before our comorbidity-reinforced model can be added to the standard shared decision-making process [51] in clinical practice, we suggest that trials be conducted to assess the psychological and emotional impacts on patients caused by machine learning suggestions. After these results are available, we might be able to design a new shared decision-making process reinforced by a machine-learning decision aid.

This research has three limitations. First, prostate-specific antigen (PSA) data are unavailable in the NHIRD. The pretreatment PSA level and the elevation of the PSA level after definite treatment of localized prostate cancer are associated with higher risks of prostate cancer–specific death [52]. Williams et al. also demonstrated that both pretreatment and posttreatment PSA levels were associated with the overall mortality of localized prostate cancer [53]. The lack of PSA data represents a major loss of crucial information on the extent of prostate cancer and compromises the establishment of the Cox proportional hazard model and the machine learning models. However, little evidence addresses the association between the PSA level and noncancerous mortality in prostate cancer. In our dataset, noncancerous mortality accounts for as much as 75.06% of the overall mortality. This high proportion of noncancerous death would attenuate the effects of the PSA data deficiency. Furthermore, our machine learning model achieves an accuracy of up to 81%, even in the absence of PSA data. This supports the significant role of CCI and its items.

The second limitation is that no uniform procedure was available for surveying patients’ comorbidities, a problem common to retrospective datasets. The lack of a standard procedure may have caused some preexisting comorbidities to be missed and the CCI to be underestimated. However, this bias can be partially reduced by routine preoperative evaluation, which is commonly performed in most Taiwanese medical centers and can reveal some occult comorbidities. Although many machine learning models exist for prediction, they have seldom been used in clinical practice [54]. Future studies might survey the effects of various machine learning models in clinical practice.

The third limitation is the lack of records of self-paid treatments in the NHIRD, a limitation that is difficult to overcome. Some brachytherapy, intensity-modulated radiotherapy, image-guided radiotherapy, robotic surgery, and laparoscopic RP procedures are partially or fully paid for by the patients themselves. The NHIRD provides no information on the exact RT or RP method that a patient underwent. This limitation introduces bias into model building and survival prediction. However, National Health Insurance has begun to record a code for every self-paid treatment, so this problem should diminish in the future.

5 Conclusions

The CCI and its items are powerful predictors of cancerous outcomes. Including the CCI and its items enhances the predictive power of machine learning models built with the RF and SVM algorithms. After adjustment by Cox regression, the choice of treatment is not significantly associated with cancerous outcomes. Comorbidity-reinforced machine learning models achieved acceptably high accuracy in outcome prediction and could therefore serve as an individualized decision aid during shared medical decision-making for localized prostate cancer, once the psychological and emotional impact of the model’s suggestions has been evaluated.



  • radical prostatectomy
  • external beam radiotherapy
  • Charlson comorbidity index
  • random forest
  • support vector machine
  • National Health Insurance Database
  • quality of life
  • Taiwan Cancer Registry


  • [1] Lehto US, Ojanen M, Vakeva A, Dyba T, Aromaa A, Kellokumpu-Lehtinen P. Early quality-of-life and psychological predictors of disease-free time and survival in localized prostate cancer. Qual Life Res. 2019;28(3):677-686.

  • [2] Adam S, Feller A, Rohrmann S, Arndt V. Health-related quality of life among long-term (≥5 years) prostate cancer survivors by primary intervention: a systematic review. Health Qual Life Outcomes. 2018;16(1):22.

  • [3] Serrell EC, Pitts D, Hayn M, Beaule L, Hansen MH, Sammon JD. Review of the comparative effectiveness of radical prostatectomy, radiation therapy, or expectant management of localized prostate cancer in registry data. Urol Oncol. 2018;36(4):183-192.

  • [4] Wallis CJ, Glaser A, Hu JC, Huland H, Lawrentschuk N, Moon D, et al. Survival and complications following surgery and radiation for localized prostate cancer: an international collaborative review. European urology. 2018;73(1):11-20.

  • [5] Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Computational and structural biotechnology journal. 2015;13:8-17.

  • [6] Alkadi R, Taher F, El-baz A, Werghi NJ. A Deep Learning-Based Approach for the Detection and Localization of Prostate Cancer in T2 Magnetic Resonance Images. J Digit Imaging. 2018 Nov 30. doi: 10.1007/s10278-018-0160-1. [Epub ahead of print]

  • [7] Wang G, Teoh JY-C, Choi K-S. Diagnosis of prostate cancer in a Chinese population by using machine learning methods. Conf Proc IEEE, Eng Med Biol Soc. 2018;2018:1-4.

  • [8] Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of Chronic Diseases. 1987;40(5):373-383.

  • [9] Koppie TM, Serio AM, Vickers AJ, Vora K, Dalbagni G, Donat SM, et al. Age‐adjusted Charlson comorbidity score is associated with treatment decisions and clinical outcomes for patients undergoing radical cystectomy for bladder cancer. Cancer: Interdisciplinary International Journal of the American Cancer Society. 2008;112(11):2384-2392.

  • [10] Lee JY, Lee DH, Cho NH, Rha KH, Choi YD, Hong SJ, et al. Charlson comorbidity index is an important prognostic factor for long-term survival outcomes in Korean men with prostate cancer after radical prostatectomy. Yonsei medical journal. 2014;55(2):316-323.

  • [11] Rajan P, Sooriakumaran P, Nyberg T, Akre O, Carlsson S, Egevad L, et al. Effect of Comorbidity on Prostate Cancer–Specific Mortality: A Prospective Observational Study. Journal of Clinical Oncology. 2017;35(31):3566.

  • [12] Jespersen CG, Nørgaard M, Jacobsen JB, Borre MJ. Patient comorbidity is associated with conservative treatment of localized prostate cancer. Scandinavian journal of urology. 2015;49(5):366-370.

  • [13] Wang T-H, Liu C-J, Chao T-F, Chen T-J, Hu Y-W. Second primary malignancy risk after radiotherapy in rectal cancer survivors. World journal of gastroenterology. 2018;24(40):4586-4595.

  • [14] Wei Y-F, Chen J-Y, Lee H-S, Wu J-T, Hsu C-K, Hsu Y-C. Association of chronic kidney disease with mortality risk in patients with lung cancer: a nationwide Taiwan population-based cohort study. BMJ Open. 2018 Jan 24;8(1):e019661. doi: 10.1136/bmjopen-2017-019661

  • [15] Wang C-Y, Huang H-S, Su Y-C, Tu C-Y, Hsia T-C, Huang S-T. Conventional treatment integrated with Chinese herbal medicine improves the survival rate of patients with advanced non-small cell lung cancer. Complementary therapies in medicine. 2018;40:29-36.

  • [16] Yang C-C, Fong Y, Lin L-C, Que J, Ting W-C, Chang C-L, et al. The age-adjusted Charlson comorbidity index is a better predictor of survival in operated lung cancer patients than the Charlson and Elixhauser comorbidity indices. European Journal of Cardio-Thoracic Surgery. 2017;53(1):235-240.

  • [17] Ding D-C, Chen W, Wang J-H, Lin S-Z. Association between polycystic ovarian syndrome and endometrial, ovarian, and breast cancer: A population-based cohort study in Taiwan. Medicine. 2018;97(39).

  • [18] Hung SC, Liao KF, Hung HC, Lin CL, Lee PC, Hung SJ, et al. Tamoxifen use correlates with increased risk of hip fractures in older women with breast cancer: A case–control study in Taiwan. Geriatr Gerontol Int. 2019;19(1):56-60. doi: 10.1111/ggi.13568

  • [19] Tsai W-C, Kung P-T, Wang Y-H, Kuo W-Y, Li Y-H. Influence of the time interval from diagnosis to treatment on survival for early-stage liver cancer. PloS one. 2018 Jun 22;13(6):e0199532. doi: 10.1371/journal.pone.0199532

  • [20] Kao HH, Kao LT, Li IH, Pan KT, Shih JH, Chou YC, et al. Androgen Deprivation Therapy Use Increases the Risk of Heart Failure in Patients With Prostate Cancer: A Population‐Based Cohort Study. J Clin Pharmacol. 2019;59(3):335-43. doi: 10.1002/jcph.1332

  • [21] Kao LT, Xirasagar S, Lin HC, Huang CY. Association Between Pioglitazone Use and Prostate Cancer: A Population‐Based Case‐Control Study in the Han Population. J Clin Pharmacol. 2019;59(3):344-49. doi: 10.1002/jcph.1326

  • [22] Jhan J-H, Yeh H-C, Chang Y-H, Guu S-J, Wu W-J, Chou Y-H, et al. New-onset diabetes after androgen-deprivation therapy for prostate cancer: A nationwide propensity score-matched four-year longitudinal cohort study. J Diabetes Complications. 2018;32(7):688-692. doi: 10.1016/j.jdiacomp.2018.03.007

  • [23] Hu Y-H, Tai C-T, Chen SC-C, Lee H-W, Sung S-F. Predicting return visits to the emergency department for pediatric patients: Applying supervised learning techniques to the Taiwan National Health Insurance Research Database. Computer methods and programs in biomedicine. 2017;144:105-112.

  • [24] Wang K-J, Adrian AM, Chen K-H, Wang K-M. A hybrid classifier combining borderline-SMOTE with AIRS algorithm for estimating brain metastasis from lung cancer: A case study in Taiwan. Computer methods and programs in biomedicine. 2015;119(2):63-76.

  • [25] Cheng CL, Lee CH, Chen PS, Li YH, Lin SJ, Yang YH. Validation of acute myocardial infarction cases in the National Health Insurance Research Database in Taiwan. J Epidemiol. 2014;24(6):500-507.

  • [26] Cheng CL, Chien HC, Lee CH, Lin SJ, Yang YH. Validity of in-hospital mortality data among patients with acute myocardial infarction or stroke in National Health Insurance Research Database in Taiwan. Int J Cardiol. 2015;201:96-101.

  • [27] Cheng CL, Kao YH, Lin SJ, Lee CH, Lai ML. Validation of the National Health Insurance Research Database with ischemic stroke cases in Taiwan. Pharmacoepidemiol Drug Saf. 2011;20(3):236-242.

  • [28] Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. Journal of Chronic Diseases. 1987;40(5):373-383.

  • [29] Chawla NV. Data mining for imbalanced datasets: An overview. In: Data mining and knowledge discovery handbook. Springer; 2010. p. 875-886.

  • [30] Mohler JL. The 2010 NCCN clinical practice guidelines in oncology on prostate cancer. Harborside Press, LLC; 2010.

  • [31] Potretzke AM, Kim EH, Knight BA, Anderson BG, Park AM, Figenshau RS, et al. Patient comorbidity predicts hospital length of stay after robot-assisted prostatectomy. Journal of robotic surgery. 2016;10(2):151-156.

  • [32] Abdollah F, Schmitges J, Sun M, Jeldres C, Tian Z, Briganti A, et al. Comparison of mortality outcomes after radical prostatectomy versus radiotherapy in patients with localized prostate cancer: a population‐based analysis. International Journal of Urology. 2012;19(9):836-844.

  • [33] Petrelli F, Vavassori I, Coinu A, Borgonovo K, Sarti E, Barni S. Radical prostatectomy or radiotherapy in high-risk prostate cancer: a systematic review and metaanalysis. Clin Genitourin Cancer. 2014;12(4):215-224.

  • [34] Guidance N. Routine preoperative tests for elective surgery. BJU International. 2018;121(1):12-16.

  • [35] Matthes KL, Limam M, Pestoni G, Held L, Korol D, Rohrmann S, et al. Impact of comorbidities at diagnosis on prostate cancer treatment and survival. Journal of cancer research and clinical oncology. 2018;144(4):707-715.

  • [36] Guzzo TJ, Dluzniewski P, Orosco R, Platz EA, Partin AW, Han MJ. Prediction of mortality after radical prostatectomy by Charlson comorbidity index. Urology. 2010;76(3):553-557.

  • [37] Koppie TM, Serio AM, Vickers AJ, Vora K, Dalbagni G, Donat SM, et al. Age‐adjusted Charlson comorbidity score is associated with treatment decisions and clinical outcomes for patients undergoing radical cystectomy for bladder cancer. Cancer: Interdisciplinary International Journal of the American Cancer Society. 2008;112(11):2384-2392.

  • [38] Matthes KL, Limam M, Pestoni G, Held L, Korol D, Rohrmann S, et al. Impact of comorbidities at diagnosis on prostate cancer treatment and survival. Journal of cancer research and clinical oncology. 2018;144(4):707-715.

  • [39] de Camargo Cancela M, Comber H, Sharp L. Age remains the major predictor of curative treatment non-receipt for localised prostate cancer: a population-based study. British Journal of Cancer. 2013;109(1):272.

  • [40] Chen T-B, Yiao S-Y, Sun Y, Lee H-J, Yang S-C, Chiu M-J, et al. Comorbidity and dementia: a nationwide survey in Taiwan. PLoS One. 2017;12(4):e0175475.

  • [41] Fleiss J. Statistical methods for rates and proportions. 2nd edition. New York: John Wiley; 1981. ISBN 0-471-26370-2.

  • [42] Chao GF, Krishna N, Aizer AA, Dalela D, Hanske J, Li H, et al. Asian Americans and prostate cancer: A nationwide population-based analysis. Urol Oncol. 2016;34(5):233.e7-15. doi: 10.1016/j.urolonc.2015.11.013

  • [43] Jeong IG, Dajani D, Verghese M, Hwang J, Cho YM, Hong JH, et al. Differences in the aggressiveness of prostate cancer among Korean, Caucasian, and African American men: A retrospective cohort study of radical prostatectomy. Urol Oncol. 2016;34(1):3.e9-14. doi: 10.1016/j.urolonc.2015.08.004

  • [44] Taitt H. Global Trends and Prostate Cancer: A Review of Incidence, Detection, and Mortality as Influenced by Race, Ethnicity, and Geographic Location. American journal of men’s health. 2018;12(6):1807-1823.

  • [45] Ravetz J. Integrity must underpin quality of statistics. Nature. 2018;553(7688):281.

  • [46] Facciorusso A, Di Maso M, Serviddio G, Vendemiale G, Spada C, Costamagna G, et al. Factors associated with recurrence of advanced colorectal adenoma after endoscopic resection. Clinical Gastroenterology and Hepatology. 2016;14(8):1148-54.e4.

  • [47] Kim W, Kim KS, Park RW. Nomogram of Naive Bayesian Model for Recurrence Prediction of Breast Cancer. Healthcare informatics research. 2016;22(2):89-94.

  • [48] Ramkumar C, Buturovic L, Malpani S, Kumar Attuluri A, Basavaraj C, Prakash C, et al. Development of a Novel Proteomic Risk-Classifier for Prognostication of Patients With Early-Stage Hormone Receptor–Positive Breast Cancer. Biomarker insights. 2018;13:1177271918789100. doi: 10.1177/1177271918789100

  • [49] Shinagare AB, Balthazar P, Ip IK, Lacson R, Liu J, Ramaiya N, et al. High-Grade Serous Ovarian Cancer: Use of Machine Learning to Predict Abdominopelvic Recurrence on CT on the Basis of Serial Cancer Antigen 125 Levels. J Am Coll Radiol. 2018;15(8):1133-1138. doi: 10.1016/j.jacr.2018.04.008

  • [50] Takada M, Sugimoto M, Masuda N, Iwata H, Kuroi K, Yamashiro H, et al. Prediction of postoperative disease-free survival and brain metastasis for HER2-positive breast cancer patients treated with neoadjuvant chemotherapy plus trastuzumab using a machine learning algorithm. Breast cancer research and treatment. 2018;172(3):611-618.

  • [51] Elwyn G, O’Connor A, Stacey D, Volk R, Edwards A, Coulter A. The International Patient Decision Aid Standards (IPDAS) Collaboration. Developing a quality criteria framework for patient decision aids: online international Delphi consensus process. British Medical Journal. 2006;13:417-422.

  • [52] Van den Broeck T, van den Bergh RC, Arfi N, Gross T, Moris L, Briers E, et al. Prognostic Value of Biochemical Recurrence Following Treatment with Curative Intent for Prostate Cancer: A Systematic Review. European urology. 2019;75(6):967-987.

  • [53] Williams SG, Duchesne GM, Millar JL, Pratt GR. Both pretreatment prostate-specific antigen level and posttreatment biochemical failure are independent predictors of overall survival after radiotherapy for prostate cancer. International Journal of Radiation Oncology, Biology, Physics. 2004;60(4):1082-1087.

  • [54] Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Computational and structural biotechnology journal. 2015;13:8-17.

About the article

Received: 2019-02-27

Accepted: 2019-06-06

Published Online: 2019-08-14

Conflict of interest statement: Authors state no conflict of interest.

Citation Information: Open Medicine, Volume 14, Issue 1, Pages 593–606, ISSN (Online) 2391-5463, DOI: https://doi.org/10.1515/med-2019-0067.

© 2019 Yi-Ting Lin et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 Public License.
