Prediction of Recurrence-associated Death from Localized Prostate Cancer with a Charlson Comorbidity Index–reinforced Machine Learning Model

Abstract Research has failed to resolve the dilemma experienced by localized prostate cancer patients who must choose between radical prostatectomy (RP) and external beam radiotherapy (RT). Because the Charlson Comorbidity Index (CCI) is a measurable factor that affects survival events, this research seeks to validate the potential of the CCI to improve the accuracy of various prediction models. Thus, we employed the Cox proportional hazard model and machine learning methods, including random forest (RF) and support vector machine (SVM), to model the data of medical records in the National Health Insurance Research Database (NHIRD). In total, 8581 individuals were enrolled, of whom 4879 had received RP and 3702 had received RT. Patients in the RT group were older and exhibited higher CCI scores and higher incidences of some CCI items. Moderate-to-severe liver disease, dementia, congestive heart failure, chronic pulmonary disease, and cerebrovascular disease all increase the risk of overall death in the Cox hazard model. The CCI-reinforced SVM and RF models are 85.18% and 81.76% accurate, respectively, whereas the SVM and RF models without the use of the CCI are relatively less accurate, at 75.81% and 74.83%, respectively. Therefore, CCI and some of its items are useful predictors of overall and prostate-cancer-specific survival and could constitute valuable features for machine-learning modeling.


Introduction
Localized prostate cancer is one of the most common male cancers and has led to unavoidable cancer death and impairment of quality of life [1,2]. Current evidence has already demonstrated that great controversy exists regarding the treatment options for localized prostate cancer. Radical prostatectomy (RP) is associated with the lowest cancer-specific mortality in observational studies [3,4], but selection bias exists. The selection bias means that patients with low comorbidity tend to undergo RP, whereas patients with high comorbidity are recommended to receive external beam radiotherapy (RT) [3,4]. Randomized trials have demonstrated no significant differences between patients undergoing RP and patients undergoing RT in overall and cancer-specific survival. In addition, no differences are exhibited in long-term quality of life (QoL) for patients undergoing RP versus those undergoing RT [4]. Currently, the shared decision-making process has been made part of standard practice in medical decision management for localized prostate cancer. However, patients who must choose their treatments encounter a dilemma when they have information regarding only the reported statistical cancer outcomes and posttreatment QoL but are unable to take personal characteristics into consideration [3,4]. Integrating cancer characteristics, comorbidity, and cancerous outcomes into a machine-learning model, and obtaining a predictive result might represent a solution to the dilemma.
Machine-learning modeling could be employed as a powerful tool for addressing the problem of treatment-related decisions for patients with prostate cancer. In the past 20 years, an increasing number of new, powerful algorithms and computer science advances have made the modeling of big medical data possible. The machine learning models formulated using big medical data enable individualized predictions of clinical outcomes. Several instances of this have already occurred in the area of oncology outcome prediction. Algorithms such as support vector machine (SVM), random forest (RF), artificial neural network, and decision tree algorithms have been applied for modeling with acceptable accuracy [5]. Kourou et al. indicated that a small sample size has been the most common research limitation for applications of machine learning in past decades [5]. Furthermore, after reviewing the literature thoroughly, they concluded that dataset quality and careful feature selection are also crucial for effective machine learning and accurate prediction. Relatively few studies currently utilize machine learning algorithms for prostate cancer research [6,7]. Two studies have employed machine learning to maximize the detection accuracy of prostate cancer. The absence of research focusing on machine learning predictions of localized prostate cancer outcomes may be a result of the long survival times following treatment. The longer the duration of disease-free survival is, the greater the chances are that non-prostate-cancerous factors, such as comorbidity and accidents, constitute the primary determinants of survival. In addition to cancerous characteristics, therefore, this research investigates the effects of non-prostate-cancerous factors for modeling long-survival cancers.
The Charlson Comorbidity Index (CCI) [8] serves as a valuable, measurable indicator for improving prediction accuracy with machine learning models for long life expectancy cancers. The CCI has been recognized as influential in relation to the clinical outcomes for localized prostate cancer. Park et al. declared CCI to be a prognostic factor for RP outcomes [9]. Lee et al. demonstrated that CCI is a major prognostic factor for long-term survival after RP [10]. Later, Rajan et al. denied the effect of comorbidity on cancer-specific survival in prostate cancer patients [11]. However, they all found comorbidity to be a reliable predictor for overall survival. Comorbidity has also been proven to represent a strong predictor of overall survival for patients treated with radiotherapy. Jespersen CG et al. demonstrated that the choice of treatments for localized prostate cancer is affected by comorbidity [12]. Therefore, CCI is not only a factor that influences survival but is also an indicator for treatment options.
Taiwan's National Health Insurance Research Database (NHIRD), created in 1998, is an excellent resource for medical research because enrollment in Taiwanese national health insurance is compulsory. The medical expense records of more than 99% of Taiwan's population are included in the NHIRD. The NHIRD currently provides 78 openly accessible databases, including 59 health databases, 5 social databases, and 14 welfare databases. The first to 14th databases contain health insurance-related data files relating to ambulatory patients, inpatients, pharmacy information and medical care orders, and cause-of-death data. As part of the NHIRD, the Taiwan Cancer Registry (TCR) was created by the Department of Health to comprehensively collect cancer epidemiological data from 1996 onward. The TCR has become an indispensable resource for Taiwan cancer research. Using the NHIRD, Wang's work surveyed second primary malignancy risk after radiotherapy for patients with rectal cancer [13]. Wei et al. used the NHIRD to show that a common comorbidity, chronic kidney disease, affects the mortality risk of lung cancer patients [14]. A type of Chinese herb was demonstrated to improve the survival of lung cancer with this database [15]. Yang et al. found that CCI is a good predictor for NHIRD patients with lung cancer using a Cox regression model [16]. Subsequently, good results have been reported in NHIRD-related literature relating to breast cancer [17,18] and hepatoma [19]. Prostate cancer-related studies using the NHIRD have also been published [20][21][22]. However, these studies related to prostate cancer have not focused on cancerous recurrence or death. The big data of the NHIRD have also been analyzed by various researchers using machine learning modeling. Hu et al. used the NHIRD to conduct machine learning predictions regarding return visits for pediatric patients to the emergency department [23]. Wang et al. utilized machine learning classifiers for the prediction of brain metastasis of lung cancer for patients from the NHIRD [24]. Some research has aimed to validate comorbidity data, such as acute myocardial infarction or stroke, and death data. High consistency among these data has been observed [25][26][27]. Thus, data derived from the NHIRD should be well suited to the machine learning modeling of prostate cancer survival.
To our knowledge, no machine learning models have been able to predict localized prostate cancer outcomes. Although CCI has been proven to represent a significant prognostic factor for treatment outcome of localized prostate cancer, few studies have taken the CCI into account when building machine learning models. This study focused mainly on building a CCI-reinforced machine learning model for the prediction of different treatment outcomes. Thus, the purposes of this study were 1) to use data extracted from the NHIRD to compare the cancer outcomes for patients with localized prostate cancer who have undergone RP versus patients with localized prostate cancer who have undergone RT, 2) to analyze the factors that influence cancer-specific and overall survival, and 3) to observe the effects of the CCI on the improvement of prediction ability (and thus to determine whether the CCI represents a significant factor in cancer outcomes).

Search for target population in database
This research was a retrospective study targeting patients with localized prostate cancer who had undergone RP or RT. Both ICD-9 (185) and ICD-10 (C61) codes were used to identify patients with prostate cancer in the NHIRD from 2008 to 2015. To identify clinical T1N0M0 and T2N0M0 patients, the TCR database was incorporated to obtain information on the initial clinical stages of patients. The next step was to use the radical prostatectomy procedure codes (79403B, 79410B) and the order codes for external beam radiation therapy (36015B, 36002B, 36005B, 36012B, 36019B) to search for patients who had received RP or RT as their initial definite treatment. Individuals with incomplete data on, for example, clinical stage, histological grading, or radiotherapy start date were excluded. Patients whose recurrence statuses were recorded as "uncertain" or whose radiation modality was marked "unknown or other than external beam radiation therapy" in the TCR database were also excluded. The flow chart for the study population retrieved from the NHIRD appears in Figure 1. The demographic statistical comparison between the RP group and the RT group appears in Table 1.

Data collection and definition of variables
Various meaningful variables were extracted from the medical expense records and Cancer Registry records. The comorbidity variables [28] were compiled by recording the presence or absence of the following conditions: myocardial infarction, congestive heart failure, peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, mild liver disease, diabetes without chronic complication, diabetes with chronic complication, hemiplegia or paraplegia, renal disease, any malignancy (including lymphoma and leukemia, except malignant neoplasm of skin), moderate or severe liver disease, metastatic solid tumor, and AIDS/HIV, together with the CCI itself. We obtained the status of every comorbidity item contained in the CCI by screening ICD-9 codes in the outpatient department and hospitalization expense databases. The cancerous outcome variables, including overall survival time and cancer-specific survival time, were retrieved from the cause-of-death file. The aforementioned data were analyzed and modeled as described in the following sections.
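The scoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights are those of the original Charlson index, which we assume (the text does not confirm it) match the coding used here; the condition names and the functions `cci_score` and `cci_level` are hypothetical; and no hierarchy rule (for example, counting only the more severe of two liver disease entries) is applied.

```python
# Weights of the original Charlson index (assumed, not confirmed by the text).
# Keys are hypothetical names for the 16 conditions listed in this section.
CCI_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complication": 1,
    "diabetes_with_complication": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids_hiv": 6,
}


def cci_score(present_conditions):
    """Sum the weights of the conditions recorded as present for one patient."""
    conditions = set(present_conditions)
    unknown = conditions - set(CCI_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return sum(CCI_WEIGHTS[name] for name in conditions)


def cci_level(score):
    """Map a score to one of seven strata: CCI0 ... CCI5, CCI6+."""
    return "CCI6+" if score >= 6 else f"CCI{score}"


patient = ["dementia", "moderate_or_severe_liver_disease"]
print(cci_score(patient), cci_level(cci_score(patient)))
```

Each presence/absence flag can be fed to a model as its own feature, with the summed score as one additional feature.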
The demographic and cancer characteristics, comorbidities, and cancerous outcomes of the treatment groups extracted from the NHIRD are listed and compared in Table 1. The chi-square test was used for categorical variables, and the independent t test was used for numeric variables.

Establish significant causal factors for cancerous outcomes with the Cox Hazard regression model
We stratified all patients according to their T stage, grade, initial definite treatment, and CCI, and we compared the cancer-specific survival and overall survival using the Kaplan-Meier method (Figure 2). The significant causal factors for recurrence-free survival, disease-specific survival, and overall survival time were identified using a Cox hazard regression model (Table 2).
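For readers unfamiliar with the Kaplan-Meier method, a minimal product-limit estimator is sketched below in Python. It is not the implementation used in the study, which relied on the R survival package; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time of each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Toy example: five patients, follow-up in years.
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1]):
    print(t, round(s, 3))
```

Computing one such curve per stratum (for example, per CCI level) gives the stratified curves that are then compared between groups.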

Comparison of machine learning models with and without CCI
Machine learning models, including RF and SVM, were adopted for the prediction of cancer outcomes. To test whether the CCI would enhance the predictive ability of a machine learning model, we created the models in two stages. In the first stage, models were trained with cancerous characteristic factors and follow-up durations as the inputs and with cancerous outcomes as the outputs. The only difference in the second stage was the addition of comorbidity variables to the input variables of stage one, as seen in Table 3. All algorithm parameters were left at their default settings. To address the imbalanced classes of cancer outcomes, three new data sets were constructed with class ratios of 1:1, 1:2, and 1:3 [29]. Five-fold cross validation was then applied to these three datasets. The mean accuracy, mean AUC, and mean kappa value appear in Table 4. The CCI-reinforced and CCI-absent RF and SVM models were built, and we evaluated model performance with accuracy, sensitivity, specificity, and kappa.
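The resampling and validation steps can be illustrated with a short Python sketch. It assumes simple random downsampling of the survivor class and plain (unstratified) five-fold splits; the study itself used R packages, so the function names `downsample_survivors` and `k_fold_splits` and the integer patient identifiers are hypothetical. The class sizes in the usage example (413 deaths among 8581 patients) are taken from the Results.

```python
import random


def downsample_survivors(death_ids, survivor_ids, ratio, seed=0):
    """Keep every death and draw ratio * len(death_ids) survivors at random."""
    rng = random.Random(seed)
    n_keep = min(len(survivor_ids), ratio * len(death_ids))
    return list(death_ids) + rng.sample(list(survivor_ids), n_keep)


def k_fold_splits(ids, k=5, seed=0):
    """Yield (train_ids, test_ids) pairs for k-fold cross validation."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


deaths = list(range(413))            # minority class
survivors = list(range(413, 8581))   # majority class
for ratio in (1, 2, 3):
    sample = downsample_survivors(deaths, survivors, ratio)
    n_survivors = len(sample) - len(deaths)
    fold_sizes = [len(test) for _, test in k_fold_splits(sample)]
    print(f"1:{ratio}", n_survivors, fold_sizes)
```

A model would be trained on each `train` list and scored on the matching `test` list, and the five scores averaged.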

Software
SAS version 9.4 was used for data extraction and preprocessing from the NHIRD and for Cox regression analysis. R version 3.5.1, run in RStudio, was used as the platform to implement the machine learning models. Several R packages were used, including randomForest for RF modeling, e1071 for SVM modeling, pROC and caret for calculating accuracy and kappa, and survival and survminer for Kaplan-Meier analysis.

Results
In total, 8581 patients' records were included in the study: 4879 who had received radical prostatectomies (the RP group) and 3702 who had received radiotherapy (the RT group). The comparison between the two groups appears in Table 1, and the Cox regression results appear in Table 2. The CCI was stratified into seven levels equivalent to CCIs of 0, 1, 2, 3, 4, 5, and 6 or more. Compared with patients with CCI0, patients with CCI1, CCI2, CCI3, CCI4, CCI5, and CCI6+ had statistically significantly higher risks of overall death, with hazard ratios of 1.461, 1.816, 2.015, 2.455, 2.108, and 2.767, respectively. Grade, initial definite treatment, and age also played significant roles in the risk of overall death. Among CCI items, moderate-to-severe liver disease together with dementia, congestive heart failure, chronic pulmonary disease, and cerebrovascular disease were significantly associated with higher overall risk of death. Conversely, the risks of prostate cancer-specific death for patients classified as CCI1, CCI2, CCI3, CCI4, CCI5, and CCI6+ did not differ significantly from those of patients classified as CCI0. Grade and initial definite treatment were significantly associated with higher risk of prostate cancer-specific mortality. Metastatic solid tumors, followed by moderate-to-severe renal disease, increased the risk of prostate cancer-specific death. The accumulated survival event curves stratified by age, initial definite treatment, grade, and CCI were plotted. The Kaplan-Meier analysis revealed that the RP group, low grade groups, and low CCI groups were statistically significantly associated with fewer overall mortality events (Figure 2).
Because the CCI and its items were highly correlated with overall mortality rather than with prostate cancer-specific mortality, we created models with and without the CCI and the related items to predict overall death instead of prostate cancer-specific death. The number of overall deaths in our dataset was 413, so we downsampled the relatively larger group of patients who survived until the end of the follow-up period to 413, 826, and 1239 patients according to the previously designed ratios of 1:1, 1:2, and 1:3. The CCI-reinforced SVM model with a 1:1 sampling ratio yielded the best accuracy and kappa (0.8518, 0.6315) for prediction of overall death. SVM without CCI yielded a less accurate model with a smaller kappa value (0.7581, 0.5164). The same phenomenon was observed in the case of the RF model. The CCI-reinforced RF model had higher accuracy and kappa (0.80, 0.60) than the RF model without CCI had (0.7483, 0.4946). The different downsampling ratios for the survival group (1:1, 1:2, and 1:3) did not affect the accuracy or kappa significantly (Table 4).
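Accuracy and kappa values such as those reported above are derived from a confusion matrix. The Python sketch below shows the standard definitions of accuracy, sensitivity, specificity, and Cohen's kappa for a binary outcome; the function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp: deaths predicted as deaths       fn: deaths predicted as survivors
    fp: survivors predicted as deaths    tn: survivors predicted as survivors
    Assumes both classes are present and agreement by chance is below 1.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column marginals.
    p_death = ((tp + fp) / total) * ((tp + fn) / total)
    p_alive = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_death + p_alive
    kappa = (accuracy - expected) / (1.0 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for a balanced (1:1) test set of 100 patients.
for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
    print(name, round(value, 4))
```

Because kappa subtracts the agreement expected by chance, it is less flattering than accuracy when one class dominates, which is why both are worth reporting when sampling ratios vary.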

Discussion
Patient selection for RP or RT is an art for urologists and patients. The NCCN clinical practice guidelines for prostate cancer 2010 [30], the popular clinical guidelines followed by most urologists in Taiwan, recommend RP for localized prostate cancer patients with life expectancies of more than 10 years. This is because the procedural risks associated with RP are higher than those for radiotherapy. RP also yields a better cancerous outcome in intermediate- and high-risk localized prostate cancer than RT does. For patients with life expectancies of less than 10 years and with lower clinical T stages and histological grading, RT might be sufficient to protect them from mortality caused by prostate cancer. In Taiwan, the cutoff age for a 10-year life expectancy was 76 years during the study period. The average ages of the two groups in our dataset (RP: 65.79 years; RT: 74.10 years) are consistent with the NCCN recommendation, as seen in Table 1. More young patients personally chose or were advised by physicians to undergo radical prostatectomies. In addition, we found that patients with fewer comorbidities tended to undergo radical prostatectomies rather than radiotherapy (Table 1). This finding is compatible with Jespersen's work, in which the choice of treatment for localized prostate cancer is demonstrated to be affected by comorbidity [12]. This is reasonable because RP is a major surgery, and its success is highly dependent on patients' general conditions. Systemic diseases yield more perioperative complications and prolong hospital stay [31]. Our findings furthermore revealed that the RT group had less favorable prostate cancer-specific mortality. This finding is consistent with Abdollah's work and the subsequent systematic review and meta-analysis. This might result from radiation-insensitive tumor cells being spared during radiotherapy, which would not happen in the case of RP [32,33].
In the RP group in our dataset, more patients had diabetes mellitus without end organ damage compared with other comorbidities. This may have been due to routine blood sugar tests according to clinical practice guidelines for preoperative blood tests [34]. Routine blood glucose checkup led to exhaustive detection and even overdiagnosis of diabetes mellitus.
Our data from the NHIRD of Taiwan revealed that CCI and associated items are critical risk factors for the overall survival of localized prostate cancer patients but not for prostate cancer-specific survival. These findings are consistent with the relevant literature. Matthes et al. surveyed the effects of comorbidities for prostate cancer victims. They found that CCI (CCI1: HR: 2.07 (1.51-2.85) and CCI2+: 2.34 (1.59-2.34)) were associated with overall mortality rather than prostate cancer-specific mortality [35]. The increase in the overall mortality risk was also aligned with our results. Thomas' study demonstrated the same conclusions in patients receiving RP [36]. The findings of Jae et al. also supported this assertion in patients with RP [37]. In line with our results, Matthes claimed that CCI played a more significant role than age in overall mortality risks [38]. Cancela and colleagues presented contradictory findings that age remains a major predictor in the untreated localized prostate cancer population [39]. To our knowledge, much literature addresses the measurement of the risks for every single CCI item. We established a Cox proportional hazard model to assess the effects of all CCI items and observed severe liver disease and dementia to be the most important factors associated with overall mortality. Minor factors, including congestive heart failure, chronic pulmonary disease, cerebrovascular disease, were also discovered. Because the score for moderate-to-severe liver disease is 3, moderate-to-severe liver disease has a higher correlation with overall mortality. Chen et al. studied the population for individuals with dementia in Taiwan and found them to usually have multiple comorbidities and to be poorly cared for [40]. This might be the reason why dementia is relevant. In clinical practice, informing patients and their families to take dementia into consideration during their decision-making processes might be advisable.
Our data suggested that the CCI and its items can reinforce the machine learning classifier for survival prediction of localized prostate cancer. As a good predictor of overall death, CCI and its items could be extracted from the NHIRD and successfully modeled with RF and SVM algorithms. The highest accuracy rate of our model is 85.18%, with the kappa value 0.6315. Fleiss defined the quality of classifier as good to fair with kappa values between 0.4 and 0.75 [41]. Currently, researchers only relatively rarely model prostate cancer survival with machine learning algorithms. This study might constitute a useful reference for other research. Imbalanced classification is fairly common in most real-world cases, and our dataset used a ratio of 1:20. Downsampling is a well-known and widely used technique for solving imbalance problems [29]. In this study, we followed the recommendations of this handbook that the ratios of 1:1, 1:2, and 1:3 be used, and we observed that accuracy did not change significantly. However, the kappa value and sensitivity decreased as the ratio of sampling increased from 1:1 to 1:3. This occurred because the false negative prediction increased as the ratio of sampling increased. Subsequently, even the 1:3 RF model with CCI yielded a maximum accuracy of up to 82.62% among RF models, but we still regarded the 1:1 RF model with CCI as the best model among RF models because it has both high accuracy and kappa. Our dataset suggested that SVM models with CCI were superior to RF models with CCI. However, we still cannot conclude with certainty that SVM is a superior algorithm to RF. The advantage of RF is that RF is more interpretable because the RF algorithm enables the measuring of variables' importance. This is helpful for variable selection and improved modeling.
In this research, we successfully identified the risk factors for prostate cancer death and all deaths among patients with localized prostate cancer in Taiwan. The research findings could serve as a valuable reference for epidemiological research in the east Asian prostate cancer population, whose clinical courses and epidemiological features differ from Caucasian and Black populations' [42][43][44]. The model we established could be helpful clinically during the shared decision-making process. The nature of compulsory enrollment in Taiwan's National Health Insurance enabled the comprehensive compilation of medical expense records and ensured that NHIRD constitutes a high-quality database in terms of data integrity. Integrity is an essential property for a database to assure consistency and statistical accuracy [45]. To maintain dataset integrity, we strive to avoid unnecessary sample dropouts. Therefore, a new method was designed to model our dataset with RF and SVM algorithms.
Because follow-up time can affect the duration of survival event observation, the follow-up time was included among the inputs for machine learning models. We have validated this new method of machine learning modeling for survival prediction using our dataset for localized prostate cancer, a cancer well-known to be associated with long survival. The longer the expected survival is, the longer follow-up time is required for observing survival events. Numerous researchers use fixed follow-up times for some cancers with shorter target event-free survival, for example, 3 years or 5 years [46][47][48][49][50]. However, no definite conclusions may be drawn regarding what follow-up time for observation of localized prostate cancer survival is adequate: this might be 10 years, 20 years, or even longer. Such follow-up times are too long for most databases to accomplish complete studies, currently. Using this right-censoring method, patients who have inadequate follow-up time can be retained while compared with the fixed follow-up time, such as 3 or 5 years, for observing survival events. Retaining the information provided by the inadequate follow-up records is helpful to maintain the integrity of our database for facilitating unbiased analysis and interpretation. In this study, we proved the method used to be feasible. The accuracy of our machine learning model was up to 80%. In clinical practice, we can assign follow-up times (t) as 1 year, 2 years, and so on to predict survival.
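The idea of treating follow-up time as an ordinary model input can be sketched as follows in Python. This is an illustrative assumption about how such rows might be laid out, not the study's actual preprocessing; the function names `training_row` and `query_rows` and the feature encoding are hypothetical.

```python
def training_row(cancer_features, comorbidity_flags, follow_up_years, died):
    """Build one (inputs, label) pair; every patient is kept, however short
    the follow-up, because the follow-up time itself is one of the inputs."""
    inputs = list(cancer_features) + list(comorbidity_flags) + [follow_up_years]
    return inputs, int(died)


def query_rows(cancer_features, comorbidity_flags, horizons=(1, 2, 3, 5)):
    """Build one input row per assigned follow-up time t for a new patient."""
    base = list(cancer_features) + list(comorbidity_flags)
    return [base + [t] for t in horizons]


# Hypothetical encoding: [T stage, grade, treatment (0 = RP, 1 = RT)] + CCI flags.
x, y = training_row([2, 3, 0], [0, 1, 0], follow_up_years=4.5, died=False)
print(x, y)
for row in query_rows([2, 3, 1], [0, 1, 0]):
    print(row)
```

A trained classifier would then be asked for a prediction on each query row, giving one predicted outcome per assigned value of t.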
According to our findings in this study, we strongly suggest that researchers take the CCI and associated items into account when developing new models for prostate cancer decision aids in the future. Furthermore, our machine learning models, with satisfactory accuracy, can serve as a decision aid during the shared decision-making process [51]. It is helpful for physicians and patients to use our model to predict survival under different choices of definite treatment for prostate cancer. However, we cannot make suggestions until the psychological impacts on patients of receiving such machine-learning suggestions have been evaluated further. Before our comorbidity-reinforced model can be added to the standard shared decision-making process [51] in clinical practice, we suggest that trials be conducted to assess the psychological and emotional impacts on patients caused by machine learning suggestions. After these results are available, we might be able to design a new shared decision-making process reinforced by a machine-learning decision aid.
This research has three limitations. First, prostate-specific antigen (PSA) data are unavailable in the NHIRD. The pretreatment PSA level and the elevation of the PSA level after definite treatment of localized prostate cancer are associated with higher risks of prostate cancer-specific death [52]. Williams et al. also demonstrated that both pretreatment and posttreatment PSA levels were associated with the overall mortality of localized prostate cancer [53]. The lack of PSA data represents a major loss of crucial information on the extent of prostate cancer and could compromise the establishment of the Cox proportional hazard model and machine learning models. However, little evidence addresses the association between the PSA level and noncancerous mortality of prostate cancer. In our dataset, the proportion of noncancerous mortality is as high as 75.06% of the overall mortality. The high proportion of noncancerous death in our dataset would attenuate the effects of the PSA data deficiency. Furthermore, our machine learning model achieves accuracy of up to 81%, even in the absence of PSA data. This supports the significant role of the CCI and its items.
The second limitation is that no uniform procedure was available for surveying the patients' comorbidities, a problem that often exists in retrospective datasets. The lack of standard procedures resulted in the neglect of some preexisting comorbidities and the underestimation of the CCI. However, this bias can be partially diminished by routine preoperative evaluation, which is commonly performed in most Taiwanese medical centers and provides indications regarding some occult comorbidities. Although many machine learning models exist for prediction, they have seldom been used in clinical practice [54]. Future studies might aim at surveying the effects of various machine learning models in clinical practice.
The third limitation is the lack of records regarding self-paid treatments in the NHIRD database. Overcoming this limitation is difficult. Some brachytherapy, intensity-modulated radiotherapy, image-guided radiotherapy, robotic surgery, or laparoscopic RP are partially or fully paid for by patients themselves. The NHIRD database provides no information regarding the exact methods for the RT or RPs that patients undergo. This limitation causes bias for model building and survival prediction. However, National Health Insurance has begun to implement the policy of recording the code for every self-paid treatment. The respective problem will no longer exist in the future.

Conclusions
The CCI and its items are powerful predictors for cancerous outcomes. Using the CCI and its items for building machine learning models would enhance the predictive power of machine learning models utilizing RF and SVM algorithms. The various treatment choices are statistically insignificant after adjustment by Cox regression to cancerous outcomes. The outcome prediction of comorbidity-reinforced machine learning models has achieved acceptably high accuracy and thus could serve as an individualized decision aid during the shared medical decision-making process for localized prostate cancer after evaluation of psychological and emotional impacts caused by suggestions yielded with the model.