BY 4.0 license Open Access Published by De Gruyter February 6, 2023

A new machine-learning-based prediction of survival in patients with end-stage liver disease

  • Sebastian Gibb, Thomas Berg, Adam Herber, Berend Isermann and Thorsten Kaiser



The shortage of grafts for liver transplantation requires risk stratification and adequate allocation rules. This study aims to improve the model for end-stage liver disease (MELD) score for 90-day mortality prediction with the help of different machine-learning algorithms.


We retrospectively analyzed the clinical and laboratory data of 654 patients who were recruited during the evaluation process for liver transplantation at University Hospital Leipzig. After comparing 13 different machine-learning algorithms in a nested cross-validation setting and selecting the best performing one, we built a new model to predict 90-day mortality in patients with end-stage liver disease.


Penalized regression algorithms yielded the highest prediction performance in our machine-learning algorithm benchmark. In favor of a simpler model, we chose the least absolute shrinkage and selection operator (lasso) regression. Besides the classical MELD variables international normalized ratio (INR) and bilirubin, the lasso regression selected cystatin C over creatinine, as well as IL-6, total protein, and cholinesterase. The new model offers improved discrimination and calibration over MELD, MELD with sodium (MELD-Na), MELD 3.0, and the MELD-Plus7 risk score.


We provide a new machine-learning-based model of end-stage liver disease that incorporates synthesis and inflammatory markers and may improve the classical MELD score for 90-day survival prediction.


Liver cirrhosis is the terminal result of the fibrotic remodeling of liver tissue due to chronic damage. This end stage of liver disease is usually irreversible, and the only available therapy is liver transplantation. However, the shortage of grafts for transplantation from deceased donors requires risk stratification and precise and fair allocation rules. The allocation of liver transplantation in most countries is based on disease severity determined by the model for end-stage liver disease (MELD) [1], [2], [3]. The MELD score estimates patients’ 3-month mortality risk based on laboratory results, namely bilirubin, creatinine, and the international normalized ratio (INR). The MELD score has been extended by the sodium level (MELD-Na score) because this was found to be an important additional risk factor in liver cirrhosis [3, 4]. Recently, MELD 3.0 was introduced, adding terms for female sex and albumin as well as interactions between bilirubin and sodium and between albumin and creatinine [5]. The MELD was initially developed to predict the survival of patients undergoing transjugular intrahepatic portosystemic shunts. It was subsequently revalidated to predict mortality risk in patients awaiting a liver transplantation. Although the MELD score should be an objective allocation score, creatinine and INR are highly dependent on the laboratory methods used [6, 7]. In addition, women are disadvantaged in MELD-Na [8]. As a result, patients with identical disease states can have very different MELD scores and thus receive different priority levels on the liver transplantation waiting list. Furthermore, the MELD score often underestimates the mortality risk in some conditions, for example in acute-on-chronic liver failure [9].

There have been some attempts to use the data extracted from more than 300,000 electronic medical records from two hospitals in the United States to improve the MELD score [10]. The derived MELD-Plus7 and MELD-Plus9 risk scores add albumin, white blood cell count, total cholesterol, age, and length of stay to the MELD-Na variables. Despite their published prediction improvement, the MELD-Plus scores are not yet used for transplant allocation.

As shown by the MELD 3.0 and MELD-Plus risk scores, better predictive scores often need more variables and are more complicated. To reduce the risk of overlooking or incorrectly calculating and interpreting the results, clinical decision support systems may be used and could improve patient safety. The research project on digital laboratory medicine (AMPEL) develops a clinical decision support system based on laboratory diagnostics that should support clinical practitioners in interpreting laboratory results and taking the necessary medical interventions [11].

This study aims to find clinical and laboratory values that improve on the risk stratification for liver transplantation of the classical MELD, MELD-Na, MELD 3.0, and MELD-Plus scores and that might be implemented as part of the AMPEL clinical decision support system.

Materials and methods

Study population

In a retrospective cohort study, we followed 778 consecutive patients, who were recruited during the evaluation process for liver transplantation at University Hospital Leipzig from November 2012 to June 2015. For each patient, we recorded 44 variables, including age, sex, etiology of liver disease, complications as listed in Table 1, and 25 laboratory measurements.

Table 1:

Baseline characteristics.

Group Characteristic Female, n=240 Male, n=414 Overall, n=654
General Follow-up time, days 181 (17, 411) 198 (54, 378) 191 (38, 384)
Age 56 (51, 64) 58 (52, 63) 58 (52, 64)
LTx 20 (8.3%) 40 (9.7%) 60 (9.2%)
Laboratory measurements Total bilirubin, µmol/L 21 (10, 49) 26 (14, 51) 24 (13, 50)
Cystatin C, mg/L 1.31 (1.04, 1.80) 1.32 (1.08, 1.76) 1.32 (1.07, 1.79)
(Missing) 2 5 7
Creatinine, µmol/L 73 (58, 91) 85 (71, 110) 80 (65, 104)
INR 1.20 (1.06, 1.50) 1.22 (1.10, 1.44) 1.21 (1.09, 1.46)
(Missing) 1 1 2
Sodium, mmol/L 138.8 (135.4, 141.0) 138.0 (135.0, 140.1) 138.1 (135.2, 140.4)
(Missing) 5 4 9
WBC, 10^9/L 6.7 (4.7, 8.8) 6.2 (4.6, 7.8) 6.3 (4.7, 8.3)
(Missing) 10 13 23
IL-6, pg/mL 10 (5, 36) 14 (6, 47) 13 (6, 42)
(Missing) 27 29 56
Albumin, g/L 41 (35, 46) 38 (33, 43) 39 (34, 45)
(Missing) 2 4 6
Total protein, g/L 71 (63, 76) 71 (65, 76) 71 (65, 76)
(Missing) 2 4 6
Cholesterol, mmol/L 4.64 (3.41, 5.56) 4.28 (3.34, 5.15) 4.40 (3.34, 5.31)
ALAT, µkat/L 0.29 (0.21, 0.42) 0.36 (0.23, 0.56) 0.33 (0.22, 0.48)
(Missing) 2 8 10
ASAT, µkat/L 0.68 (0.49, 1.05) 0.81 (0.61, 1.23) 0.77 (0.55, 1.16)
(Missing) 3 5 8
MELD MELD score 10 (7, 18) 12 (9, 17) 11 (8, 17)
(Missing) 2 1 3
MELD-Na score 10 (7, 19) 12 (9, 20) 11 (8, 20)
(Missing) 5 4 9
MELD category
[6,9] 111 (47%) 146 (36%) 257 (40%)
[10,20) 71 (30%) 163 (40%) 234 (36%)
[20,30) 28 (12%) 77 (19%) 105 (16%)
[30,40) 21 (8.9%) 17 (4.1%) 38 (5.9%)
[40,52) 4 (1.7%) 7 (1.7%) 11 (1.7%)
(Missing) 5 4 9
MELD 3.0 score 12 (8, 20) 12 (9, 19) 12 (8, 19)
(Missing) 8 8 16
Entities Cirrhosis 203 (85%) 391 (94%) 594 (91%)
(Missing) 1 0 1
ALF 6 (2.5%) 2 (0.5%) 8 (1.2%)
CLI 30 (12%) 21 (5.1%) 51 (7.8%)
Etiology (a) Ethyltoxic 114 (48%) 297 (72%) 411 (63%)
HBV 4 (1.7%) 16 (3.9%) 20 (3.1%)
HCV 17 (7.1%) 28 (6.8%) 45 (6.9%)
AIH 22 (9.2%) 9 (2.2%) 31 (4.7%)
PBC 17 (7.1%) 0 (0%) 17 (2.6%)
PSC 5 (2.1%) 11 (2.7%) 16 (2.4%)
NASH 17 (7.1%) 31 (7.5%) 48 (7.3%)
Cryptogenic 36 (15%) 32 (7.7%) 68 (10%)
Complications (a) Dialysis 14 (5.9%) 19 (4.6%) 33 (5.1%)
(Missing) 1 1 2
GIB 58 (24%) 104 (25%) 162 (25%)
HCC 18 (7.5%) 103 (25%) 121 (19%)
SBP 31 (13%) 60 (15%) 91 (14%)
(Missing) 0 1 1
Mortality Within 7 days 16 (7.2%) 18 (4.5%) 34 (5.5%)
Within 30 days 25 (12%) 31 (8.1%) 56 (9.4%)
Within 90 days 32 (16%) 49 (13%) 81 (14%)
Within 365 days 39 (20%) 61 (18%) 100 (18%)
  1. Values are given as median (lower quartile, upper quartile) or n (percent). (a) More than one per patient possible; AIH, autoimmune hepatitis; ALAT, alanine aminotransferase; ALF, acute liver failure; ASAT, aspartate aminotransferase; CLI, chronic liver insufficiency; GIB, gastrointestinal bleeding; HBV, hepatitis B virus; HCV, hepatitis C virus; IL-6, interleukin 6; INR, international normalized ratio; LTx, liver transplantation; NASH, non-alcoholic steatohepatitis; PBC, primary biliary cirrhosis; PSC, primary sclerosing cholangitis; SBP, spontaneous bacterial peritonitis; WBC, white blood cell count.

We excluded 124 patients from our analysis (Figure 1). One was younger than 18 years, 24 had had a liver transplant before, and in 99 cases the follow-up data were missing or invalid.

Figure 1: 
Flowchart showing inclusion and exclusion of records.

The Ethics Committee at the Leipzig University Faculty of Medicine approved the retrospective usage of the data for our study (reference number: 039/14ff).

MELD scores

MELD and MELD-Na were calculated as described in [3] using the following formulas:

$$\mathrm{MELD} = 10 \times \left( 0.957 \ln(\text{creatinine [mg/dL]}) + 0.378 \ln(\text{bilirubin [mg/dL]}) + 1.120 \ln(\mathrm{INR}) + 0.643 \right)$$

Creatinine and bilirubin values lower than 1.0 mg/dL were set to 1.0 mg/dL, and INR values lower than 1.0 were set to 1.0. The maximum allowed creatinine was 4.0 mg/dL. If the patients received dialysis, creatinine was set to 4.0 mg/dL. For MELD-Na, the initial MELD (MELD_i) was calculated as above and recalculated if greater than 11 using:

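$$\text{MELD-Na} = \mathrm{MELD}_i + 1.32 \times \left( 137 - \mathrm{Na\ [mmol/L]} \right) - \left( 0.033 \times \mathrm{MELD}_i \times \left( 137 - \mathrm{Na\ [mmol/L]} \right) \right)$$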

Serum sodium values lower than 125 mmol/L or higher than 137 mmol/L were set to 125 mmol/L and 137 mmol/L, respectively.
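The two formulas and their bounds can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the code used in the study (which was written in R); the function names are hypothetical, inputs are assumed to be in mg/dL and mmol/L, and no rounding is applied because the text does not specify any.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def meld(creatinine_mg_dl, bilirubin_mg_dl, inr, dialysis=False):
    """Unrounded MELD score using the bounds described in the text."""
    # Patients on dialysis are assigned the maximum creatinine of 4.0 mg/dL.
    creatinine = 4.0 if dialysis else clamp(creatinine_mg_dl, 1.0, 4.0)
    bilirubin = max(bilirubin_mg_dl, 1.0)  # lower bound 1.0 mg/dL
    inr = max(inr, 1.0)                    # lower bound 1.0
    return 10.0 * (
        0.957 * math.log(creatinine)
        + 0.378 * math.log(bilirubin)
        + 1.120 * math.log(inr)
        + 0.643
    )


def meld_na(creatinine_mg_dl, bilirubin_mg_dl, inr, sodium_mmol_l, dialysis=False):
    """Unrounded MELD-Na: the sodium term is applied only if MELD > 11."""
    meld_i = meld(creatinine_mg_dl, bilirubin_mg_dl, inr, dialysis)
    if meld_i <= 11.0:
        return meld_i
    sodium = clamp(sodium_mmol_l, 125.0, 137.0)
    deficit = 137.0 - sodium
    return meld_i + 1.32 * deficit - 0.033 * meld_i * deficit
```

With all inputs at or below their lower bounds, `meld` returns the intercept term 10 × 0.643 = 6.43, which is why the lowest MELD category starts at 6.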

MELD 3.0 was calculated as described in [5]:

$$\begin{aligned}\mathrm{MELD\ 3.0} ={} & 1.33\ (\text{if female}) + 4.56 \ln(\text{bilirubin [mg/dL]}) + 0.82 \left( 137 - \mathrm{Na\ [mmol/L]} \right) \\ & - 0.24 \left( 137 - \mathrm{Na\ [mmol/L]} \right) \ln(\text{bilirubin [mg/dL]}) + 9.09 \ln(\mathrm{INR}) + 11.14 \ln(\text{creatinine [mg/dL]}) \\ & + 1.85 \left( 3.5 - \text{albumin [g/dL]} \right) - 1.83 \left( 3.5 - \text{albumin [g/dL]} \right) \ln(\text{creatinine [mg/dL]}) + 6\end{aligned}$$

In contrast to MELD-Na, the maximum allowed creatinine was set to 3.0 mg/dL.
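A minimal Python sketch of the MELD 3.0 formula follows. The text only states the creatinine cap of 3.0 mg/dL; the lower bounds of 1.0 for creatinine, bilirubin, and INR and the sodium range of 125–137 mmol/L are carried over here from the MELD and MELD-Na description as an assumption, and the albumin range of 1.5–3.5 g/dL is likewise an assumption that is not given in the text. The function name `meld3` is hypothetical.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def meld3(female, bilirubin_mg_dl, sodium_mmol_l, inr, creatinine_mg_dl, albumin_g_dl):
    """Unrounded MELD 3.0 score (bounds other than the creatinine cap are assumed)."""
    bilirubin = max(bilirubin_mg_dl, 1.0)          # assumed lower bound
    inr = max(inr, 1.0)                            # assumed lower bound
    creatinine = clamp(creatinine_mg_dl, 1.0, 3.0) # cap of 3.0 mg/dL is from the text
    sodium = clamp(sodium_mmol_l, 125.0, 137.0)    # assumed, as for MELD-Na
    albumin = clamp(albumin_g_dl, 1.5, 3.5)        # assumed bounds
    na_deficit = 137.0 - sodium
    alb_deficit = 3.5 - albumin
    return (
        (1.33 if female else 0.0)
        + 4.56 * math.log(bilirubin)
        + 0.82 * na_deficit
        - 0.24 * na_deficit * math.log(bilirubin)
        + 9.09 * math.log(inr)
        + 11.14 * math.log(creatinine)
        + 1.85 * alb_deficit
        - 1.83 * alb_deficit * math.log(creatinine)
        + 6.0
    )
```

Note that albumin enters in g/dL, whereas Table 1 reports it in g/L, so values from the table would need to be divided by 10 first.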

The MELD-Plus7 risk score was calculated as described in [10]:

$$\begin{aligned}L ={} & 8.53499496 + 2.59679650 \log_{10}(1 + \text{creatinine [mg/dL]}) + 2.06503238 \log_{10}(1 + \text{bilirubin [mg/dL]}) \\ & + 2.99724802 \log_{10}(1 + \mathrm{INR}) - 6.47834101 \log_{10}(1 + \text{sodium [mmol/L]}) - 6.34990436 \log_{10}(1 + \text{albumin [g/dL]}) \\ & + 1.92811726 \log_{10}(1 + \text{wbc [th/cumm]}) + 0.04070442 \times \text{age [years]}\end{aligned}$$

$$\text{MELD-Plus} = \frac{\exp(L)}{1 + \exp(L)}$$

The MELD-Plus9 score was ignored because the length of stay information was not available for our cohort, and the model itself was not superior to MELD-Plus7 in its original publication [10].
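The MELD-Plus7 risk score above is a logistic model, which the following Python sketch makes explicit. The function name is hypothetical and the units are assumed to be those given in the formula (mg/dL, mmol/L, g/dL, thousand cells per cubic millimeter, years).

```python
import math


def meld_plus7(creatinine_mg_dl, bilirubin_mg_dl, inr, sodium_mmol_l,
               albumin_g_dl, wbc_th_cumm, age_years):
    """MELD-Plus7 risk (a probability between 0 and 1) from the published coefficients."""
    linear = (
        8.53499496
        + 2.59679650 * math.log10(1.0 + creatinine_mg_dl)
        + 2.06503238 * math.log10(1.0 + bilirubin_mg_dl)
        + 2.99724802 * math.log10(1.0 + inr)
        - 6.47834101 * math.log10(1.0 + sodium_mmol_l)
        - 6.34990436 * math.log10(1.0 + albumin_g_dl)
        + 1.92811726 * math.log10(1.0 + wbc_th_cumm)
        + 0.04070442 * age_years
    )
    # Logistic transformation of the linear predictor L.
    return math.exp(linear) / (1.0 + math.exp(linear))
```

Higher sodium and albumin lower the predicted risk (negative coefficients), whereas all other variables raise it.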

Data processing and machine-learning

All data processing, statistical, and machine-learning analyses were performed using R version 4.2.0 [12]. Prior to the analyses, all laboratory values were zlog transformed to approximate a normal distribution as described in [13] and implemented in [14].

$$\mathrm{zlog} = \left( \log(x) - \frac{\log(\mathrm{LL}) + \log(\mathrm{UL})}{2} \right) \times \frac{3.92}{\log(\mathrm{UL}) - \log(\mathrm{LL})}$$

where LL and UL are the lower and upper age- and sex-related reference limits of the corresponding laboratory diagnostic, respectively. Missing laboratory measurements were replaced with a zlog value of zero, which corresponds to the center of the reference interval. In a similar manner, where information about complications or comorbidities was missing, we treated them as not present.
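The transformation and the handling of missing values can be illustrated with a short Python sketch. It is not the zlog R package [14] used in the study; the function name and the example reference limits in the usage note are hypothetical.

```python
import math


def zlog(x, lower_limit, upper_limit):
    """zlog transformation of a laboratory value given its reference limits.

    Missing values (None or NaN) are mapped to 0, as described in the text.
    """
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return 0.0
    log_ll = math.log(lower_limit)
    log_ul = math.log(upper_limit)
    return (math.log(x) - (log_ll + log_ul) / 2.0) * 3.92 / (log_ul - log_ll)
```

By construction, a value at the lower reference limit maps to -1.96 and a value at the upper limit maps to +1.96. For example, with hypothetical limits of 10 and 40, `zlog(10, 10, 40)` gives -1.96 and `zlog(40, 10, 40)` gives 1.96.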

We compared 13 different statistical and machine-learning algorithms using the mlr3 framework [15], [16], [17], [18]. The algorithms used were designed for analysis of survival data: Cox proportional hazards regression model [19], [20], [21]; penalized regressions, namely, two different implementations of the lasso (least absolute shrinkage and selection operator) regression [22], [23], [24], [25], [26], two different implementations of the ridge regression [23], [24], [25], [26], and the elastic net regression [23, 24]; two different random forest implementations [27], [28], [29], [30]; extreme gradient boosting [31, 32]; two different support vector machines [33, 34]; and two neural networks [35], [36], [37].

We used nested resampling to avoid feature- and model-selection bias to obtain reliable metrics to choose the best model. We thus applied a 5-fold inner and 3-fold outer cross-validation that was repeated 10 times. We searched the hyperparameter space using a grid search with a resolution of 10. The models were evaluated using the concordance index, which is a measure of rank correlation between predicted and observed events [38]. Subsequently, we selected the model with the highest median concordance index. If multiple models were comparable, we chose the simpler (more penalized) one.
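To make the evaluation metric concrete, the following Python sketch computes a concordance index for right-censored data. It assumes a Harrell-type definition (pairs are comparable when the patient with the shorter time had an event, ties in predicted risk count one half, and ties in time are skipped); the exact variant computed by the mlr3 framework in the study is not stated here, so this is an illustration rather than a reimplementation.

```python
def concordance_index(times, events, risks):
    """Harrell-type concordance index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the patient died at that time, 0 if censored
    risks:  predicted risk scores (higher means earlier death expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1 means that the predicted risks order all comparable pairs correctly, 0.5 corresponds to random ordering, and 0 to a perfectly reversed ordering.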

After selecting the final model, we trained and validated it as described in [39]. Briefly, we developed our model using a 3-fold cross-validation that was repeated 10 times and finally predicted survival at 90 days. The predictions were sorted and grouped into equal-sized intervals so that there were 50 patients in each interval. For each group, the mean predicted survival probability and its difference from the Kaplan–Meier estimate were calculated. The same procedure was repeated for 200 bootstrap samples. Subsequently, the differences between predicted survival and Kaplan–Meier estimates of the bootstrap samples were averaged and added to the original model difference to obtain a bias-corrected estimate of the difference between predicted and observed survival.
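The grouping step of this calibration procedure can be sketched as follows. The Python code sorts the predicted 90-day survival probabilities, forms groups of 50 patients, and compares the mean prediction of each group with its Kaplan–Meier estimate. The bootstrap bias correction is deliberately omitted, the function names are hypothetical, and in this sketch the last group may contain fewer than 50 patients.

```python
def kaplan_meier_at(times, events, t):
    """Kaplan-Meier survival estimate S(t) for right-censored data."""
    survival = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == u)
        survival *= 1.0 - deaths / at_risk
    return survival


def calibration_groups(pred_survival, times, events, horizon=90, group_size=50):
    """Return (mean predicted, Kaplan-Meier observed, difference) per group."""
    order = sorted(range(len(pred_survival)), key=lambda i: pred_survival[i])
    rows = []
    for start in range(0, len(order), group_size):
        idx = order[start:start + group_size]
        mean_pred = sum(pred_survival[i] for i in idx) / len(idx)
        observed = kaplan_meier_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((mean_pred, observed, mean_pred - observed))
    return rows
```

A well-calibrated model yields differences close to zero in every group; positive differences indicate that survival is overestimated.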

The final model was compared against the MELD, MELD-Na, MELD 3.0, and MELD-Plus7 scores using the area under the time-dependent receiver operating characteristic curve (AUROC) based on the nonparametric inverse probability of censoring weighting (IPCW) estimator [40, 41]. Testing was applied as described in [40]. A p-value <0.05 was considered to indicate a statistically significant difference.

Furthermore, we determined the variable importance by counting how often each variable was chosen during the feature selection in the bootstrap process described above and by the logrank value returned by one random forest implementation [28].

Summary tables were built using the gtsummary package [42, 43].


Baseline characteristics

In this study, we analyzed a cohort of 654 patients evaluated for liver transplantation (Table 1). Of these, 414 (63%) were men. The median (IQR) follow-up time was 191 (38, 384) days. Within the follow-up time, 60 (9.2%) patients received a liver transplant and were censored for the analysis beyond this time point. The median (IQR) age of the cohort was 58 (52, 64) years.

Most of the patients presented with cirrhosis (594 [91%]). The remaining patients had acute liver failure (8 [1.2%]) or chronic liver insufficiency (51 [7.8%]). At evaluation time, the median (IQR) MELD, MELD-Na, and MELD 3.0 scores were 11 (8, 17), 11 (8, 20), and 12 (8, 19), respectively. Within 90 days, 81 (14%) patients had died. In nearly two thirds of all patients (411 [63%]), alcohol was the main cause of their end-stage liver disease. The remaining patients had viral hepatitis (hepatitis B, 20 [3.1%]; hepatitis C, 45 [6.9%]), autoimmune hepatitis (31 [4.7%]), primary sclerosing cholangitis (16 [2.4%]), primary biliary cirrhosis (17 [2.6%]), non-alcoholic steatohepatitis (48 [7.3%]), or cryptogenic liver disease of unknown cause (68 [10%]). The most common complications were gastrointestinal bleeding in 162 (25%) patients, followed by hepatocellular carcinoma (HCC) in 121 (19%) and spontaneous bacterial peritonitis (SBP) in 91 (14%) patients. Just 33 (5.1%) patients required renal dialysis.

MELD scores

We divided our cohort into 5 MELD-Na-score risk categories, with 257, 234, 105, 38, and 11 patients, respectively. As shown in Table 2, the observed 90-day mortality was 1.2–2.2 times the predicted mortality [44] in all but the lowest category. Only in the group with the lowest MELD-Na scores was the observed mortality much lower than predicted.

Table 2:

Observed vs. MELD-expected 90-day mortality.

MELD category Observed deaths, n Expected deaths, n SMR Observed mortality, % Expected mortality, %
[6, 9] 1 5.8 0.2 0.5 2.8
[10, 20) 9 6.9 1.3 4.5 3.7
[20, 30) 37 17.0 2.2 40.9 22.0
[30, 40) 23 16.2 1.4 87.5 64.6
[40, 52) 6 5.1 1.2 100.0 84.4
  1. Observed vs. MELD-Na-expected 90-day mortality. MELD-Na mortality values are taken from [44] and the mean value was calculated for each MELD-Na category. All patients censored before day 90 were ignored for the calculation of the MELD-Na-expected deaths. SMR, standardized mortality ratio=observed deaths/expected deaths.
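The standardized mortality ratios in Table 2 follow directly from the observed and expected death counts. The short Python sketch below recomputes them from the table values; the variable and function names are illustrative only.

```python
# (MELD-Na category, observed deaths, expected deaths) taken from Table 2
TABLE_2 = [
    ("6-9", 1, 5.8),
    ("10-19", 9, 6.9),
    ("20-29", 37, 17.0),
    ("30-39", 23, 16.2),
    ("40-52", 6, 5.1),
]


def standardized_mortality_ratio(observed_deaths, expected_deaths):
    """SMR = observed deaths / expected deaths."""
    return observed_deaths / expected_deaths


smr_by_category = {
    category: round(standardized_mortality_ratio(observed, expected), 1)
    for category, observed, expected in TABLE_2
}
```

An SMR above 1 means that more patients died than the MELD-Na-based expectation; this is the case in every category except the lowest one.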


The penalized regression and random forest models achieved similarly high median performance in the benchmark (see Supplementary Material). In favor of simpler models, we chose a lasso-based approach for the final model development. The classical Cox model and the data- and computation-intensive algorithms (extreme gradient boosting, support vector machines, and neural networks) performed worse.

The resulting AMPEL model of end-stage liver disease (AMELD) uses 6 predictors. Besides the classical MELD variables INR and (direct) bilirubin, the lasso regression selected cystatin C over creatinine, as well as IL-6, total protein, and cholinesterase (Table 3).

Table 3:

AMPEL MELD model coefficients.

IL-6 0.193
Total protein −0.081
INR 0.067
Cholinesterase −0.037
Cystatin C 0.026
Direct bilirubin 0.010

The 90-day survival estimate of a given patient can be calculated as shown below:

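$$S = S_{0,90}^{\exp(LP)} = 0.9661^{\exp\left( 0.1929\,\mathrm{zlog}(\text{IL-6}) - 0.0805\,\mathrm{zlog}(\text{Total protein}) + 0.067\,\mathrm{zlog}(\mathrm{INR}) - 0.037\,\mathrm{zlog}(\text{Cholinesterase}) + 0.026\,\mathrm{zlog}(\text{Cystatin C}) + 0.0098\,\mathrm{zlog}(\text{Direct bilirubin}) \right)}$$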

where $S_{0,90}$ is the baseline survival estimate at 90 days from Table 4 and LP is the linear predictor formed from the coefficients in Table 3.

Table 4:

Underlying survival function of AMPEL MELD.

t, days 2 7 30 90 180 365
S0,t 0.9967 0.9912 0.9816 0.9661 0.9546 0.9489
  1. t, time in days; S0, baseline survival estimate.
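As a worked illustration, the following Python sketch combines the coefficients from Table 3 with the baseline survival values from Table 4. It is not the authors' ameld R package; the names are hypothetical, the coefficients are the rounded table values (so results may differ slightly from the original implementation), and the inputs are assumed to be zlog-transformed already. Missing predictors default to a zlog of zero, in line with the imputation described in the methods.

```python
import math

# Coefficients from Table 3 (rounded to three decimals).
COEFFICIENTS = {
    "il6": 0.193,
    "total_protein": -0.081,
    "inr": 0.067,
    "cholinesterase": -0.037,
    "cystatin_c": 0.026,
    "direct_bilirubin": 0.010,
}

# Baseline survival S0,t from Table 4, keyed by time in days.
BASELINE_SURVIVAL = {2: 0.9967, 7: 0.9912, 30: 0.9816, 90: 0.9661, 180: 0.9546, 365: 0.9489}


def ameld_survival(zlog_values, days=90):
    """Predicted survival probability S = S0,t ** exp(LP)."""
    linear_predictor = sum(
        coefficient * zlog_values.get(name, 0.0)
        for name, coefficient in COEFFICIENTS.items()
    )
    return BASELINE_SURVIVAL[days] ** math.exp(linear_predictor)
```

A patient whose six laboratory values all lie at the center of their reference intervals (all zlog values zero) has a predicted 90-day survival equal to the baseline of 0.9661. Raising the IL-6 zlog value lowers the predicted survival, whereas higher total protein or cholinesterase raises it.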

Model comparison

AMELD offers the best AUROC, 0.961 (95% CI, 0.942–0.980), of all competitors (Figure 2). However, after correction for multiple testing, the difference from the second-best model, MELD 3.0 (AUROC 0.936 [95% CI, 0.910–0.963]), is not statistically significant (p=0.15; see Table 5 for the complete comparison).

Figure 2: 
Receiver operating characteristic (ROC) curve. Area under the time-dependent ROC curve (AUC) based on the nonparametric inverse probability of censoring weighting estimate (IPCW) for AMELD, MELD, MELD-Na, MELD 3.0, and MELD-Plus7, as described in [40].

Table 5:

Test comparison between AUROC for MELD 3.0 and AMELD.

t=2 t=7 t=30 t=90 t=180 t=365
Non-adjusted p-values 0.27 0.15 0.33 0.03 0.02 0.33
Adjusted p-values 0.75 0.52 0.84 0.15 0.08 0.84
  1. Test comparison between AUROC for MELD 3.0 and AMELD according to [40] at days 2, 7, 30, 90, 180, and 365; p-values are shown as non-adjusted and adjusted for multiple testing.

Figure 3 depicts the calibration curve for AMELD and the competing MELD scores. While the prediction for low-risk patients is similar to other scores, AMELD predicts the medium-risk patients more accurately.

Figure 3: 
Calibration of AMELD in comparison to MELD, MELD-Na, MELD 3.0, and MELD-Plus7. The points mark the mean predicted survival of 50 patients per interval against their observed mean survival. The error bars correspond to the 95% confidence interval of the survival estimate.

Variable importance

The features selected for AMELD (IL-6, total protein, INR, cholinesterase, cystatin C, and direct bilirubin) are the 6 highest-ranked features in the bootstrap process. In the random forest logrank ranking, they are among the 12 highest-ranked of 42 features. Sodium and albumin, introduced by MELD-Na and MELD 3.0, rank 18th and 12th in the bootstrap and 20th and 5th in the random forest variable importance ranking, respectively (see Supplement for the complete rankings).


In this retrospective analysis of 654 consecutive patients who were recruited during the evaluation process for liver transplantation at University Hospital Leipzig, we developed a new model to predict 90-day survival probability. Our new model, AMELD, incorporates synthesis and inflammatory markers and may improve the prediction over previously published MELD scores.

Although the Organ Procurement and Transplantation Network recommends the use of MELD-Na over MELD, sodium added little information in our cohort. Neither in the bootstrap selection nor in the random forest variable importance did sodium gain substantial weight (rank 18 of 42 and rank 20 of 42, respectively; see Supplementary Material). In addition, sodium levels can be affected by diuretic therapy, and the discrimination performance of MELD-Na was reported in [10] to be not significantly different from that of the classical MELD. Consequently, sodium is not a component of our model.

The albumin included in MELD 3.0 is part of total protein, which ranks consistently higher in both variable importance measurements (rank 12 for albumin vs. 2 for total protein in the bootstrap selection and 5 vs. 1 in the random forest variable importance; see Supplementary Material). Using total protein instead of albumin as a synthesis marker offers additional information about globulins and especially coagulation factors that are incompletely captured by INR.

It has been shown that women receive lower median MELD-Na scores and are therefore less likely to receive an urgent listing for liver transplantation [8]. MELD 3.0 tries to compensate for this by adding 1.33 to the score for female sex. To address sex differences in laboratory measurements, we applied the zlog transformation [13] in our study. This transformation also makes laboratory measurements comparable across different clinical laboratories and methods and reduces the influence of age- and sex-related differences, which could improve the objectivity of the allocation process. Because the zlog-transformed laboratory measurements are already approximately standard normally scaled, we can avoid sample-dependent scaling prior to model development. Both properties should increase the external applicability of our approach.

Our model, AMELD, uses 6 predictors. Three of these—INR, (direct) bilirubin, and cystatin C—are identical or similar to the laboratory measurements utilized in the classical MELD. The direct bilirubin selected by our approach and the total bilirubin in MELD are highly correlated (Spearman correlation: 0.96) and therefore interchangeable.

In our feature selection process, cystatin C was chosen instead of creatinine. This is in line with recent studies that show a better prediction of mortality for cystatin C in unselected and critically ill patients [45, 46].

Furthermore, AMELD extends MELD by the additional synthesis parameters total protein and cholinesterase and, most importantly, the inflammatory marker IL-6.

The latter is an example where results from machine-learning can aid in finding clinically meaningful patterns and guide further research. Systemic inflammatory response syndrome is an independent risk factor for poor outcomes in patients with cirrhosis [47]. However, none of MELD, MELD-Na, or MELD 3.0 takes inflammation into account. By contrast, IL-6 is the most important variable in our model. Even if IL-6 is excluded, the algorithms select CRP as one of the most important variables, which highlights the important role of inflammation in the prognosis of patients with end-stage liver disease. We and other researchers have already shown that adding inflammatory markers, namely IL-6 or CRP, to the classical MELD score increases the accuracy of the 90-day mortality prediction [48, 49].

Besides laboratory measurements, we investigated 17 other items of clinical data, but none were chosen in our model-selection process. Kartoun et al. also used the lasso approach and did not select any clinical data [10]. In retrospective analyses in particular, the clinical data are often not detailed enough to provide valuable information.

In our benchmark of 13 different machine-learning algorithms, we have seen that the more complex algorithms, such as boosting, support vector machines, and neural networks, perform worse than simpler ones. This may be due to additive effects that favor regression models or due to our relatively small sample size. It has been shown that in machine-learning, more than 200 events per predictor variable are required to achieve good prediction performance [50].


Our study has several limitations. Principally, it is a single-center, retrospective analysis lacking an external validation cohort. The data were collected during the evaluation process for liver transplantation at University Hospital Leipzig. There was no controlled follow-up process. We excluded all patients who had already undergone liver transplantation, which could artificially lower the disease severity of the cohort. Furthermore, most of our patients suffered from ethyltoxic liver cirrhosis (63%), which may limit the transferability of our results to other etiologies.

With a median MELD-Na score of 11, our cohort was healthier than those of previous studies, which reported scores of 14–18.5 [4, 10, 44]. However, the 90-day mortality of 14% (81 patients) was comparable to the rates seen in earlier studies, which ranged from 6 to 30% [1, 4, 10]. Although our cohort was healthier, with an intermediate mortality rate, the MELD-based prediction underestimates the observed mortality by a factor of 1.2–2.2 in most risk categories (Table 2). Unfortunately, our sample size was low in the high-risk patient category, so the prediction performance of AMELD for very ill patients was poor, although still better than that of the MELD-based scores.


In conclusion, we have developed a new model to predict the 90-day survival of patients during the evaluation process for liver transplantation. Our model, AMELD, extends the classical MELD predictors INR, (direct) bilirubin, and cystatin C (instead of creatinine) by using the synthesis parameters total protein and cholinesterase and, most importantly, the inflammatory marker IL-6. Using these 6 laboratory measurements, AMELD may improve the prediction over previously published MELD scores. For wider adoption, a prospective multi-center study for model recalibration and validation is needed.

Corresponding author: Sebastian Gibb, Anesthesiology and Intensive Care Medicine, University Hospital Greifswald, Greifswald, Germany; and Institute of Laboratory Medicine, Clinical Chemistry and Molecular Diagnostics, University Hospital Leipzig, Leipzig, Germany

Funding source: Sächsische Aufbaubank

Award Identifier / Grant number: RL eHealthSax 2017/18 grant number 100331796


We thank the whole AMPEL team for their support, Stefan Kemnitz for his technical support regarding the use of the high-performance computing (HPC) cluster at the University of Greifswald and Bernd Klaus for helpful discussions about survival analysis and proofreading the manuscript. The calculations were conducted using the HPC cluster of the University of Greifswald. In 2018, the average carbon efficiency of the German power grid was 0.471 kgCO2eq/kWh [51]. A single run of the complete calculation pipeline took roughly 96 h of computation on an Intel(R) Xeon(R) E5-2650 v4 and Intel(R) Xeon(R) Gold 6240 CPU (thermal design power: 105 and 150 W). The total estimated emissions were at least 6.77 kgCO2eq (ignoring development, test runs, etc.). This is equivalent to 44 km driven by an average car [52].

  1. Research funding: The study was supported by the AMPEL project. This project is co-financed through public funds according to the budget decided by the Saxon State Parliament under RL eHealthSax 2017/18 grant number 100331796. The funder provided support in the form of salaries for author SG but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The publication of the manuscript was funded by the Open Access Publishing Fund of Leipzig University supported by the German Research Foundation within the program Open Access Publication Funding.

  2. Author contributions: Funding was acquired by TK. The study was conceptualized by SG and TK. The data were acquired by TK. The data were analyzed and processed by SG. Visualization was produced by SG. The original drafts of the manuscript were written by SG and TK. Drafts were edited and reviewed by SG, TB, AH, BI, and TK. The project was supervised by TK. All authors have accepted responsibility for the entire content of this manuscript and have approved its submission.

  3. Competing interests: TB received grants or research support from Abbvie, BMS, Gilead, MSD/Merck, Humedics, Intercept, Merz, Norgine, Novartis, Orphalan, and Sequana Medical. TB received honoraria or consultation fees/advisory board fees from Abbvie, Alexion, Bayer, Gilead, GSK, Eisai, Enyo Pharma, Falk Foundation, HepaRegeniX GmbH, Humedics, Intercept, Ipsen, Janssen, MSD/Merck, Novartis, Orphalan, Roche, Sequana Medical, SIRTEX, SOBI, and Shionogi. TB participated in a company-sponsored speaker’s bureau for Abbvie, Alexion, Bayer, Gilead, Eisai, Intercept, Ipsen, Janssen, MedUpdate GmbH, MSD/Merck, Novartis, Orphalan, Sequana Medical, SIRTEX, and SOBI. All other authors state no conflict of interest.

  4. Informed consent: Informed consent was obtained from all individuals included in this study.

  5. Ethical approval: Research involving human subjects complied with all relevant national regulations and institutional policies, as well as the tenets of the Helsinki Declaration (as revised in 2013), and was approved by the Ethics Committee at the Leipzig University Faculty of Medicine (reference number: 039/14ff).

  6. Data and software availability: The complete code and a reproducible analysis are publicly available. In addition, all data are available in the ameld R package for further investigation.


1. Malinchoc, M, Kamath, PS, Gordon, FD, Peine, CJ, Rank, J, Borg, PCJ. A model to predict poor survival in patients undergoing transjugular intrahepatic portosystemic shunts. Hepatology 2000;31:864–71. in Google Scholar PubMed

2. Wiesner, R, Edwards, E, Freeman, R, Harper, A, Kim, R, Kamath, P, et al.. Model for end-stage liver disease (MELD) and allocation of donor livers. Gastroenterology 2003;124:91–6. in Google Scholar PubMed

3. Organ Procurement and Transplantation Network. Policies; 2021. Available from: [Accessed 01 Aug 2021].Search in Google Scholar

4. Kim, WR, Biggins, SW, Kremers, WK, Wiesner, RH, Kamath, PS, Benson, JT, et al.. Hyponatremia and mortality among patients on the liver-transplant waiting list. N Engl J Med 2008;359:1018–26. in Google Scholar PubMed PubMed Central

5. Kim, WR, Mannalithara, A, Heimbach, JK, Kamath, PS, Asrani, SK, Biggins, SW, et al.. MELD 3.0: the model for end-stage liver disease updated for the Modern Era. Gastroenterology 2021;161:1887–95.e4. in Google Scholar PubMed PubMed Central

6. Trotter, JF, Brimhall, B, Arjal, R, Phillips, C. Specific laboratory methodologies achieve higher model for endstage liver disease (MELD) scores for patients listed for liver transplantation. Liver Transplant 2004;10:995–1000. in Google Scholar PubMed

7. Cholongitas, E, Marelli, L, Kerry, A, Senzolo, M, Goodier, DW, Nair, D, et al.. Different methods of creatinine measurement significantly affect MELD scores. Liver Transplant 2007;13:523–9. in Google Scholar PubMed

8. Sealock, JM, Ziogas, IA, Zhao, Z, Ye, F, Alexopoulos, SP, Matsuoka, L, et al.. Proposing a sex-adjusted sodium-adjusted MELD score for liver transplant allocation. JAMA Surg 2022;157:618. in Google Scholar PubMed PubMed Central

9. Hernaez, R, Liu, Y, Kramer, JR, Rana, A, El-Serag, HB, Kanwal, F. Model for end-stage liver disease-sodium underestimates 90-day mortality risk in patients with acute-on-chronic liver failure. J Hepatol 2020;73:1425–33. in Google Scholar PubMed

10. Kartoun, U, Corey, KE, Simon, TG, Zheng, H, Aggarwal, R, Ng, K, et al. The MELD-plus: a generalizable prediction risk score in cirrhosis. PLoS One 2017;12:e0186301.

11. Eckelt, F, Remmler, J, Kister, T, Wernsdorfer, M, Richter, H, Federbusch, M, et al. Verbesserte Patientensicherheit durch “clinical decision support systems” in der Labormedizin. Der Internist 2020;61:452–9.

12. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2022. Available from:

13. Hoffmann, G, Klawonn, F, Lichtinghagen, R, Orth, M. Der zlog-Wert als Basis für die Standardisierung von Laborwerten. J Lab Med 2017;41:23–32.

14. Gibb, S. zlog: Z(log) transformation for laboratory measurements; 2021. Available from: [Accessed 01 Aug 2021].

15. Lang, M, Binder, M, Richter, J, Schratz, P, Pfisterer, F, Coors, S, et al. Mlr3: a modern object-oriented machine learning framework in R. J Open Source Softw 2019;4:1903.

16. Lang, M, Bischl, B, Richter, J, Schratz, P, Binder, M. Mlr3: machine learning in R – next generation; 2022. Available from:

17. Sonabend, R, Király, FJ, Bender, A, Bischl, B, Lang, M. Mlr3proba: an R package for machine learning in survival analysis. Bioinformatics 2021;37:2789–91.

18. Sonabend, R, Kiraly, F, Lang, M. Mlr3proba: Probabilistic supervised learning for mlr3; 2022. Available from: [Accessed 01 Aug 2021].

19. Cox, DR. Regression models and life-tables. J R Stat Soc Series B 1972;34:187–202.

20. Therneau, TM, Grambsch, PM. Modeling survival data: extending the Cox model. New York: Springer; 2000. https://doi.org/10.1007/978-1-4757-3294-8

21. Therneau, TM. Survival: survival analysis; 2022. Available from:

22. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat Med 1997;16:385–95. https://doi.org/10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3

23. Simon, N, Friedman, J, Hastie, T, Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Software 2011;39:1–13.

24. Friedman, J, Hastie, T, Tibshirani, R, Narasimhan, B, Tay, K, Simon, N, et al. Glmnet: lasso and elastic-net regularized generalized linear models; 2022. Available from: [Accessed 01 Aug 2022].

25. Goeman, JJ. L1 penalized estimation in the Cox proportional hazards model. Biom J 2010;52:70–84.

26. Goeman, J, Meijer, R, Chaturvedi, N, Lueder, M. Penalized: L1 (lasso and fused lasso) and L2 (ridge) penalized estimation in GLMs and in the Cox model; 2022. Available from:

27. Wright, MN, Ziegler, A. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Software 2017;77:1–17.

28. Wright, MN, Wager, S, Probst, P. Ranger: a fast implementation of random forests; 2021. Available from:

29. Ishwaran, H, Kogalur, UB, Blackstone, EH, Lauer, MS. Random survival forests. Ann Appl Stat 2008;2:841–60.

30. Ishwaran, H, Kogalur, UB. RandomForestSRC: fast unified random forests for survival, regression, and classification (RF-SRC); 2020. Available from:

31. Chen, T, Guestrin, C. XGBoost: a scalable tree boosting system. In: Proc 22nd ACM SIGKDD Int Conf Knowl Disc Data Mining; 2016:785–94 pp. https://doi.org/10.1145/2939672.2939785

32. Chen, T, He, T, Benesty, M, Khotilovich, V, Tang, Y, Cho, H, et al. Xgboost: extreme gradient boosting; 2022. Available from:

33. Van Belle, V, Pelckmans, K, Van Huffel, S, Suykens, JAK. Support vector methods for survival analysis: a comparison between ranking and regression approaches. Artif Intell Med 2011;53:107–18.

34. Fouodo, CJK. Survivalsvm: survival support vector analysis; 2018. Available from:

35. Katzman, JL, Shaham, U, Cloninger, A, Bates, J, Jiang, T, Kluger, Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:24.

36. Kvamme, H, Borgan, Ø, Scheel, I. Time-to-event prediction with neural networks and Cox regression. J Mach Learn Res 2019;20:1–30.

37. Sonabend, R. Survival models: models for survival analysis; 2022. Available from:

38. Harrell, FE. Evaluating the yield of medical tests. JAMA 1982;247:2543.

39. Harrell, FE, Lee, KL, Mark, DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361–87. https://doi.org/10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4

40. Blanche, P, Dartigues, J-F, Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat Med 2013;32:5381–97.

41. Blanche, P. TimeROC: Time-dependent ROC curve and AUC for censored survival data; 2019. Available from:

42. Sjoberg, DD, Whiting, K, Curry, M, Lavery, JA, Larmarange, J. Reproducible summary tables with the gtsummary package. R J 2021;13:570–80.

43. Sjoberg, DD, Curry, M, Larmarange, J, Lavery, J, Whiting, K, Zabor, EC. Gtsummary: presentation-ready data summary and analytic result tables; 2022. Available from:

44. Vanderwerken, DN, Wood, NL, Segev, DL, Gentry, SE. The precise relationship between model for end-stage liver disease and survival without a liver transplant. Hepatology 2021;74:950–60.

45. Helmersson-Karlqvist, J, Ärnlöv, J, Larsson, A. Cystatin C-based glomerular filtration rate associates more closely with mortality than creatinine-based or combined glomerular filtration rate equations in unselected patients. Eur J Prev Cardiol 2016;23:1649–57.

46. Helmersson-Karlqvist, J, Lipcsey, M, Ärnlöv, J, Bell, M, Ravn, B, Dardashti, A, et al. Cystatin C predicts long term mortality better than creatinine in a nationwide study of intensive care patients. Sci Rep 2021;11:1–9.

47. Thabut, D, Massard, J, Gangloff, A, Carbonell, N, Francoz, C, Nguyen-Khac, E, et al. Model for end-stage liver disease score and systemic inflammatory response are major prognostic factors in patients with cirrhosis and acute functional renal failure. Hepatology 2007;46:1872–82.

48. Remmler, J, Schneider, C, Treuner-Kaueroff, T, Bartels, M, Seehofer, D, Scholz, M, et al. Increased level of interleukin 6 associates with increased 90-day and 1-year mortality in patients with end-stage liver disease. Clin Gastroenterol Hepatol 2017;16:730–7.

49. Cervoni, J-P, Amorós, À, Bañares, R, Montero, JL, Soriano, G, Weil, D, et al. Prognostic value of C-reactive protein in cirrhosis. Eur J Gastroenterol Hepatol 2016;28:1028–34.

50. van der Ploeg, T, Austin, PC, Steyerberg, EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol 2014;14:1–13.

51. Icha, P, Lauf, T, Kuhs, G. Entwicklung der spezifischen Kohlendioxid-Emissionen des deutschen Strommix in den Jahren 1990–2020. Climate Change. Umweltbundesamt; 2021, vol: 45: 38 p. Available from: [Accessed 03 Mar 2022].

52. Allekotte, M, Bergk, F, Biemann, K, Deregowski, C, Knörr, W, Althaus, H-J, et al. Ökologische Bewertung von Verkehrsarten. Umweltbundesamt; 2021: 214–5 pp. Available from: [Accessed 10 Mar 2022].

Supplementary Material

The online version of this article offers supplementary material.

Received: 2022-08-10
Accepted: 2022-12-20
Published Online: 2023-02-06
Published in Print: 2023-02-23

© 2023 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
