Noninvasive diagnosis of AIH/PBC overlap syndrome based on prediction models

Abstract Autoimmune liver diseases (AILDs) are life-threatening chronic liver diseases that mainly include autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and AIH-PBC overlap syndrome (OS), which are difficult to distinguish clinically at early stages. This study aimed to establish a model for the noninvasive diagnosis of AIH/PBC OS. A total of 201 AILD patients who underwent liver biopsy between January 2011 and December 2020 were included in this retrospective study. Serological factors significantly associated with OS were identified by univariate analysis. Two multivariate models based on these factors were constructed to predict the diagnosis of AIH/PBC OS, using logistic regression and random forest analysis. Immunoglobulins G and M were important in both models. In the logistic regression model, anti-Sp100, anti-Ro-52, anti-SSA, or antinuclear antibody (ANA) positivity was a risk factor for OS. In the random forest model, activated partial thromboplastin time and α-fetoprotein level were important. In distinguishing PBC from OS, the sensitivity and specificity were 0.889 and 0.727 for the logistic regression model and 0.944 and 0.818 for the random forest model. In conclusion, we established two predictive models for the noninvasive diagnosis of AIH/PBC OS, and they showed better performance than the Paris criteria for the definition of AIH/PBC OS.


Introduction
Autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis are the main categories of autoimmune liver diseases (AILDs), which are caused by an anomalous response of the immune system to self-antigens on hepatocytes or bile ducts. AIH and PBC have different mechanisms and clinical manifestations. In AIH, autoimmune injury mainly affects the hepatocytes, leading to interface hepatitis on liver histology, and the disease is characterized by high levels of immunoglobulin G (IgG) or γ-globulin and by positive serum autoantibodies [1]. PBC is characterized by circulating anti-mitochondrial antibodies (AMAs), chronic cholestasis, and autoimmunity directed at the small intrahepatic bile ducts, and liver biopsy typically shows non-purulent granulomatous destructive cholangitis [2,3]. About 2-19% of patients share features of both PBC and AIH, a condition called PBC-AIH overlap syndrome (OS) [4][5][6][7]. OS may represent an important and unrecognized cause of resistance to ursodeoxycholic acid in patients with PBC [6]. If treated inappropriately, OS progresses rapidly to liver cirrhosis and even liver failure, which requires liver transplantation [8]. Therefore, early diagnosis and proper treatment of OS are extremely important.
The sensitivity and specificity of the Paris criteria for OS were reported to be 92% and 97%, respectively [9]. However, patients with less severe forms of AIH-PBC OS may not be captured by the Paris criteria [10]. The gold standard for diagnosing AILDs is still liver pathology, which can be used to evaluate severity and prognosis and to determine treatment options [11,12]. However, liver biopsy has some disadvantages. First, about one quarter of patients have already developed cirrhosis at diagnosis [13]; these patients have poor blood clotting and therefore a greater risk of complications. Second, sampling errors occur in needle liver biopsies of AILDs because lesions vary in extent. Finally, OS should not be over-diagnosed, to avoid exposing PBC patients to steroid side effects [14,15].
Therefore, a noninvasive method that supplements biopsy is urgently needed to differentiate PBC from OS, and it should be safe, convenient, accurate, and effective. Such an approach can integrate numerous clinical parameters and provide a reliable picture of liver pathology features and prognosis. Artificial intelligence models are well suited to this challenge and have already been used to predict the effects of different treatment options [16][17][18][19].
This study aimed to develop and validate a machine learning-based prediction model for the diagnosis of AIH/PBC OS. We used the AILD patient cohort to select valuable parameters and to weigh their importance.

Patient and public involvement
It was not appropriate or possible to involve patients or the public in the design, conduct, reporting, or dissemination plans of our research.
This study was approved by the Ethics Committee of Xiangya Hospital of Central South University (Changsha, China, Approval ID: 20341), and all patients provided informed consent. A total of 201 patients with AILDs admitted to Xiangya Hospital between 2011 and 2020 were retrospectively analyzed.
Patients with AIH were selected based on the simplified criteria of the International Autoimmune Hepatitis Group [20]. These simplified diagnostic criteria for AIH have been validated in different countries, including China [21][22][23]. PBC patients were classified according to the Paris criteria [10]. PBC-AIH OS was strictly defined by the association of PBC and AIH, either simultaneously or consecutively [9]. In each patient, the absence of biliary obstruction was assessed by ultrasonography, hepatitis virus serology was checked, and ceruloplasmin testing was negative. None had excessive alcohol consumption (all <20 g/day), and there was no evidence of exposure to agents toxic to the liver or bile ducts. To ensure accurate classification of AILDs, all patients had pathological results from liver biopsy. Among the 201 patients with AILDs, 65 were excluded according to the following exclusion criteria: age <18 years (n = 2), viral hepatitis (n = 52), drug-induced liver injury (n = 3), liver cancer (n = 2), blood disease (n = 1), and incomplete information (n = 5).

Collection of clinical and pathology data
Information on each patient was obtained, including history, symptoms, clinical findings, and data from laboratory or other diagnostic investigations. The variables recorded were age and sex; clinical symptoms (pruritus, jaundice, and fatigue); red blood cell (RBC), white blood cell (WBC), and platelet (PLT) counts; lymphocyte count (L), eosinophil count (E), basophil count (B), and monocyte count (M); aspartate aminotransferase (AST), alkaline phosphatase (ALP), and γ-glutamyl transferase; international normalized ratio (INR) and activated partial thromboplastin time (APTT); IgG, immunoglobulin A (IgA), and immunoglobulin M (IgM); and auto-antibody tests. All auto-antibody tests were performed using Western blot-based antibody detection kits (Oumeng Diagnostics Ltd., Germany).

Statistical analysis
Continuous variables were described as median and interquartile range (IQR), and categorical variables were described as absolute frequencies and percentages. SPSS 25.0 was used for statistical analysis. The chi-squared test or Fisher's exact test was used to compare categorical data. For normally distributed variables, Student's t-test was used to compare the mean values of two groups. For non-normally distributed variables, the Kruskal-Wallis test or Mann-Whitney U test was used. All P-values were two-sided, and P-values <0.05 were considered statistically significant.
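As an illustration of the categorical comparison described above, the following minimal sketch computes a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. It is not the SPSS procedure used in the study; the function name `fisher_exact_two_sided` and the example counts are hypothetical and chosen only for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(p, 1.0)


# Hypothetical counts: antibody-positive / antibody-negative patients in two groups.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # prints 0.4857
```

With these made-up counts the P-value is well above 0.05, so the difference would not be considered statistically significant under the threshold stated above.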

Model establishment
A random forest model was constructed to distinguish PBC from OS using the statistical language R (version 3.3) with the packages randomForest, pROC, and rms. The random forest creates multiple training sets for decision trees: each tree is built on a bootstrap sample drawn randomly from the original dataset using the classification and regression tree method, with the decrease in Gini impurity as the splitting criterion [24]. Of the final cohort, 70% were selected for training and the remaining 30% for validation. A 10-fold cross-validation strategy was used during training. We set 500 decision trees in the forest, with 5 variables considered for each decision tree. The maximum depth of the trees was set to 3 [25]. The confidence intervals for the area under the receiver-operating characteristic curve (AUC) and the other evaluation criteria were constructed using bootstrapping. The model whose AUC was closest to the average was considered the best trained model, and it was then verified in the testing dataset. We selected the AUC, accuracy, sensitivity, and specificity to evaluate model performance. The input variables were ranked by relative importance in diagnosing PBC based on the mean decrease in accuracy and the mean decrease in the Gini coefficient [26]. We also developed a logistic regression model using the same training and testing data.
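To make the bootstrapped confidence interval for the AUC concrete, here is a minimal Python sketch using only the standard library. It is not the R/pROC code used in the study: the function names, the percentile method, the number of resamples, and the scores and labels are all assumptions for illustration (label 1 is taken to mean OS and 0 to mean PBC).

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


# Hypothetical predicted probabilities of OS and true labels (1 = OS, 0 = PBC).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
point_estimate = auc(scores, labels)
lower, upper = bootstrap_auc_ci(scores, labels)
print(round(point_estimate, 3), round(lower, 3), round(upper, 3))
```

The pairwise definition of the AUC is used here because it is easy to check by hand: 8 of the 9 positive-negative pairs are ranked correctly, giving a point estimate of about 0.889.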

Demographics and baseline features
The medical records of 201 AILD patients with liver biopsy were reviewed. Finally, 136 patients were enrolled in this study and divided into the PBC group (35, 25.74%), the AIH group (44, 32.35%), and the OS group (57, 41.91%) (Figure 1). We found no difference in the distribution of age or sex among the three groups. The general characteristics of the study participants are summarized in Table 1.
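The cohort arithmetic can be checked in a few lines. The sketch below simply recomputes the number of enrolled patients and the group percentages from the counts reported in the text; the variable names are illustrative.

```python
# Counts taken from the exclusion criteria and group sizes reported in the text.
screened = 201
exclusions = {
    "age <18 years": 2,
    "viral hepatitis": 52,
    "drug-induced liver injury": 3,
    "liver cancer": 2,
    "blood disease": 1,
    "incomplete information": 5,
}
groups = {"PBC": 35, "AIH": 44, "OS": 57}

total_excluded = sum(exclusions.values())
enrolled = screened - total_excluded

# Share of each diagnostic group among enrolled patients, in percent.
shares = {name: round(100 * n / enrolled, 2) for name, n in groups.items()}

print(total_excluded, enrolled)  # prints 65 136
print(shares)
```

The recomputed values agree with the reported figures: 65 exclusions, 136 enrolled patients, and group shares of 25.74%, 32.35%, and 41.91%.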

Comparison between AIH and other liver diseases
To explore the differences between AIH and other AILDs, we performed single-factor and multi-factor comparisons. The blood indexes of the study subjects were compared across the three groups. Variables with P-values below 0.05 in the univariate analysis were entered into the logistic regression analysis. The AUC between AIH and PBC and between AIH and OS was 1.
Figure 1: Brief scheme of the study design.
Table note: Data are median (IQR) or n (%). A P-value of less than 0.05 is considered statistically significant among the three groups. *The P-value represents the comparison between AIH, PBC, and OS within the test panel.

Construction of model to discriminate PBC and OS
To construct a prediction model to discriminate PBC from OS, we first ranked the variables used to build the model in order of importance (Figure 2a). For the random forest model, after more than 50 replications, we chose the model whose receiver-operating characteristic (ROC) curve in the selected training and testing cohort gave an AUC of 0.93 (95% CI: 0.91-0.99), the closest to the average. According to the mean decrease in the Gini value, the five strongest predictor variables for differentiating PBC from OS were α-fetoprotein (AFP), APTT, globulin (GLB), IgG, and IgM (Figure 2b). After validation in the remaining 30% of samples, the random forest model predicted the outcome with a sensitivity of 0.944 and a specificity of 0.818. The effects of these five variables on the outcome of diagnosing PBC are shown in Figure 3.
We also established a prediction model using binary logistic regression, which had a sensitivity of 0.889 and a specificity of 0.727 when tested on the same data. The AUC for this model was 0.78 (95% CI: 0.63-0.86), which was lower than that of the random forest model (Figure 4).
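The Gini-based importance ranking rests on how much a split reduces Gini impurity. The following minimal Python sketch shows that calculation for a single hypothetical split; it is not the randomForest implementation, and the labels and the split itself are invented for illustration. In a forest, such decreases are accumulated per variable over all splits and trees to give that variable's importance.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical split of 10 patients on some threshold of one predictor.
left = ["OS"] * 4 + ["PBC"]    # patients above the threshold
right = ["OS"] + ["PBC"] * 4   # patients at or below the threshold
parent = left + right

decrease = gini_decrease(parent, left, right)
print(round(gini(parent), 2), round(decrease, 2))  # prints 0.5 0.18
```

Here the parent node is perfectly mixed (impurity 0.5) and each child has impurity 0.32, so the split lowers the impurity by 0.18.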
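For readers who want to see how sensitivity and specificity follow from a confusion matrix, a minimal Python sketch is given below. The confusion-matrix counts are hypothetical: the source does not report them, nor does it state which class was treated as positive. The counts assume OS is the positive class and a test set of 18 OS and 11 PBC patients, chosen only because they reproduce the reported rounded values.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were detected
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study (assumed: OS = positive class).
random_forest = sensitivity_specificity(tp=17, fn=1, tn=9, fp=2)
logistic = sensitivity_specificity(tp=16, fn=2, tn=8, fp=3)

print([round(v, 3) for v in random_forest])  # prints [0.944, 0.818]
print([round(v, 3) for v in logistic])       # prints [0.889, 0.727]
```

Under these assumed counts, the gap between the two models amounts to one additional OS case detected and one fewer PBC case misclassified, which shows how sensitive these metrics are to individual patients in a small test set.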

Discussion
In this retrospective study, we conducted single-factor and multi-factor analyses of patients with AILDs. When AIH was compared with PBC and OS, many variables showed significant differences, indicating that AIH is easy to distinguish from the other two. However, it is difficult to distinguish PBC from OS because the two diseases lack unique clinical characteristics [6]. In fact, there is no international consensus on the definition of OS in daily practice, especially at the histological level. The percentage of OS patients in our cohort was higher than usual, which may be related to the fact that liver biopsy is not necessary for the diagnosis of PBC alone [11]. Moreover, we excluded some patients with comorbidities, which may have introduced selection bias. For AIH and OS, liver biopsy is necessary, and the early detection rate of OS has increased because of liver biopsy [12]. Therefore, we established two prediction models to reduce unnecessary procedures and enable effective early diagnosis. The results showed that the models differentiated PBC from OS well.
In our study, it is important to note that IgG was elevated to a greater degree in OS than in PBC. From a clinical point of view, hypergammaglobulinemia is one of the prominent clinical characteristics of AIH patients, and a decrease in IgG is an important marker of disease control [27]. The combination of anti-double-stranded DNA (anti-dsDNA), elevated alanine aminotransferase (ALT), and elevated IgG levels should prompt a clinician to consider a diagnosis of OS [28]. We found a similar result in the random forest model: OS was associated with higher levels of the immunoglobulins IgG and IgM. Many autoantibodies have been detected in AILDs, and AMAs are recognized as specific to PBC [29]. However, in our study, we found no difference in AMA between PBC and OS, which may be related to the fact that OS has clinical traits of PBC. AMAs are not associated with disease progression, whereas ANAs are related to disease severity and clinical outcome and are markers of poor prognosis. In particular, ANAs are detected in up to 50% of PBC patients. Two immunofluorescence patterns are considered PBC specific: the multiple nuclear dot pattern for antigens such as Sp100 and promyelocytic leukemia protein, and the rim-like/membranous pattern for antigens such as gp210, nucleoporin p62, and the lamin B receptor [30,31]. In our study, the prevalence of ANA was 100% in AIH and 80% in PBC patients. It was reported that Sp100 had a sensitivity of 40% and a specificity of 97.3% for PBC [32]. OS also has the characteristics of PBC, and Sp100 and SSA may be potential autoantibodies for differentiating PBC from OS. In fact, SSA and anticentromere antibody (ACA) are helpful for the diagnosis of PBC with Sjögren's syndrome (SS), and SSA and ACA are recognized as serological markers in AMA-negative PBC patients [33]. In our study, OS patients were more likely to have SS.
Anti-dsDNA and anti-p53 have been suggested to be potential autoantibodies for identifying patients with OS [27,34].
Furthermore, gp210 antibodies in PBC are associated with poor prognosis [35]. One study found a significantly higher frequency of anti-gp210 in patients with OS than in patients with PBC [36], suggesting that OS has a worse prognosis. However, we did not detect significantly higher frequencies of gp210, dsDNA, or p53 antibodies in OS patients than in PBC patients.
The logistic regression model showed that the combination of splenomegaly with Sp100, SSA, and IgG levels was able to differentiate patients with OS from those with PBC. In the random forest model, APTT and AFP level were among the top five factors. It has been reported that prothrombin time (PT) and APTT can be used as predictors of bleeding risk due to impaired liver synthesis and reduced procoagulant factors in cirrhosis [37]. However, another study showed that APTT abnormalities were poorly associated with bleeding [38]. Therefore, the importance of APTT is controversial and requires follow-up verification. In general, the sensitivity and specificity of the random forest model were better than those of the logistic regression model, which is likely due to the strong generalization power of random forests [39].
Recently, Wang et al. developed a model based on limited sociodemographic and clinical parameters from routine health checkups to identify individuals at high risk for AILD [40]. In contrast, our model has several advantages: it is noninvasive; it integrates multiple unrelated variables simultaneously and evaluates the weight of each; and it can be refined continuously as the database grows and new sensitive variables are discovered. More importantly, our model can be used to distinguish PBC from OS. Similarly, Zhang et al. developed a scoring classification based on selected histologic features of AIH and PBC and modified biochemical and immunologic characteristics; it showed high sensitivity and specificity for the diagnosis of OS and may be better than current OS scoring systems at detecting mild forms of OS [41].
It should be noted that our study has some limitations. First, we did not include a healthy control cohort for comparison. Second, our sample size is relatively small. Third, to improve the sensitivity and specificity of our model, we focused on the factors in the importance ranking shown in Figure 2 and did not include other significant biomarkers or demographic factors, such as those used in a previous study [40]. Further studies are needed to explore novel and sensitive parameters. As more data become available, machine learning will gain power and become a promising approach to distinguishing PBC from OS.
In conclusion, we constructed two models with sufficient accuracy to distinguish PBC from OS based on readily available parameters, identifying patients who would probably benefit from early treatment. If adopted in clinical practice, these models may help patients receive early and effective treatment and reduce surveillance liver biopsies in the future.
Author contributions: Xiaomei Zhang conceived the study. Kailing Wang and Yong Li contributed to the conception and design of the study. All the authors contributed to the collection of data and the analysis and interpretation of the data. Kailing Wang and Xiaomei Zhang contributed to the drafting of the article, and the other authors contributed to the critical revision of the article for important intellectual content. Xiaomei Zhang supervised this study. All the authors approved the final draft of the article.