Prediction of LDL in hypertriglyceridemic subjects using an innovative ensemble machine learning technique

Objectives: Determining low-density lipoprotein (LDL) is a costly and time-consuming operation, yet a triglyceride value above 400 mg/dL (TG>400) always requires LDL measurement. A fast, accurate LDL prediction can therefore be valuable to experts; however, a prediction with a high error margin is clinically unusable. Our objective is to predict the LDL value and level with an error below the total allowable error rate (% TEa). Methods: Our present work used 6,392 laboratory records to predict patient LDL values using state-of-the-art artificial intelligence methods. The designed model, p-LDL-M, predicts the LDL value and class with an overall average test score of 98.70 %, using a custom, hyperparameter-tuned ensemble machine learning algorithm. Results: The results show that using our innovative p-LDL-M is advisable for subjects with critical TG>400. Analysis proved that our model benefits from the Hopkins and Friedewald equations normally used for TG≤400: the test score of p-LDL-M trained only on TG>400 data is 7.72 % inferior to that of the same p-LDL-M trained with Hopkins- and Friedewald-supported data. In addition, the test score of NIH-Equ-2 for TG>400 is far inferior to the p-LDL-M prediction results. Conclusions: Obtaining an accurate and fast LDL value and level forecast for people with TG>400 using our innovative p-LDL-M is highly recommendable.


Introduction
Clinical status in patients at cardiovascular risk depends on their low-density lipoprotein (LDL) levels [1]. Therefore, LDL levels are one of the most important target parameters in cholesterol-lowering treatment regimens worldwide. In biochemistry laboratory practice, LDL is usually a member of a group of tests called the lipid profile. Other tests in this panel are total cholesterol (TC), triglycerides (TG), and high-density lipoprotein (HDL). Beta quantification is the most commonly used ultracentrifugation-based method for LDL measurement [2]. Although it is the gold standard, new measurement methods have been developed because this method is time-consuming and expensive. Many manufacturers have developed calibrated kits, validated against the gold-standard method, that are compatible with auto-analyzers. The auto-analyzer-supported test methods are based on homogeneous immunoassays. However, medical laboratories are units that adopt the principle of the most cost-effective operation for fast and accurate test results. Therefore, scientists have started to develop new laboratory tests and methods, especially due to increasing health expenditures. The Friedewald equation (LDL=TC − HDL − TG/5) has been widely used in clinical practice for several decades; it calculates the LDL value from the patient's TC, HDL, and TG test results [3]. However, other studies have shown that LDL calculation using the Friedewald formula is inappropriate in cases with TG>400 mg/dL (High-TG) [4]. Consequently, new formulae for more accurate LDL calculation have been developed, and their results are reflected in routine laboratory reports. In particular, for patients with low LDL levels (LDL<70 mg/dL), the Martin-Hopkins equation (MH-E) (LDL=TC − HDL − TG/novel factor) has been considered the most successful [5,6]. Later, in 2020, Sampson M et al.
planned a study in normolipidemic and/or hypertriglyceridemic patients and developed the NIH equation 2 (NIH-Equ-2): LDL=TC/0.948 − HDL/0.971 − (TG/8.56 + [TG × NonHDL]/2,140 − TG²/16,100) − 9.44. Although NIH-Equ-2 has the best accuracy in the TG 400-800 mg/dL range, it is far from the results of direct measurement (Direct LDL), as it has an error of up to 30 mg/dL at a TG of 800 mg/dL [7]. Analysis of the above three formulae shows different superiorities over each other concerning the TG and LDL-level relationship. Nevertheless, NIH-Equ-2 is usually accepted as more successful than the Friedewald equation and MH-E in the TG 400-800 mg/dL range.
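For reference, the Friedewald and Sampson (NIH-Equ-2) calculations quoted above can be written directly in code. This is a minimal sketch: inputs are assumed to be in mg/dL, and the Martin-Hopkins equation is omitted because its "novel factor" comes from a published lookup table rather than a closed formula.

```python
def friedewald_ldl(tc, hdl, tg):
    """Friedewald equation: LDL = TC - HDL - TG/5 (mg/dL, intended for TG <= 400)."""
    return tc - hdl - tg / 5.0

def sampson_ldl(tc, hdl, tg):
    """Sampson / NIH equation 2 (mg/dL, intended for TG up to 800)."""
    non_hdl = tc - hdl  # non-HDL cholesterol
    return (tc / 0.948 - hdl / 0.971
            - (tg / 8.56 + tg * non_hdl / 2140.0 - tg ** 2 / 16100.0)
            - 9.44)
```

For example, with TC=200, HDL=50, TG=150 mg/dL, `friedewald_ldl` returns 120.0 and `sampson_ldl` about 123.4.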
Currently, the LDL levels of patients with TG≤400 mg/dL (Low-TG) are calculated using the Friedewald formula in routine Turkish biochemistry laboratories, while in High-TG patients the result is obtained by direct LDL measurement. Recently, machine learning (ML) and deep learning algorithms of artificial intelligence have started to take the place of formulae in medical science. The ML studies in the literature target patients with low TG. Naturally, the performances of ML-supported LDL calculation studies are compared mainly with the Friedewald equation and MH-E [8][9][10][11][12][13]. However, to our knowledge, there is currently no ML study targeting LDL levels in High-TG patients. Very recently, ensemble ML techniques have been applied in many health-related works, including diabetes prediction [14,15].
This study aims to develop a model, using the ensemble ML technique, that predicts the patient's LDL value and level classification from age, gender, and non-LDL test results for High-TG cases. The accuracy of our designed model and of NIH-Equ-2 will be evaluated by comparison with Direct LDL. The goal is not to exceed the total allowable error rate (% TEa) in predicting the patient's LDL value and level.

Study design
Our present research was conducted at Dr. Suat Seren Chest Diseases and Thoracic Surgery Education and Research Hospital (Hospital). The medical participants of the research requested LDL value and level prediction software from the Computer Engineering members of Dokuz Eylul and Istanbul Universities. A total of 109,591 patient records (High-TG: 6,404 + Low-TG: 103,187) collected at the Biochemistry Laboratory of the Hospital were provided. Records missing total cholesterol/triglyceride/HDL/LDL results, records beyond the linearity limit of the specific assays, records containing zero or negative values, records of patients <18 years of age, and records lacking numerical values were excluded from the study. The remaining 6,392 High-TG records, out of a total of 106,390 records (All-TG), were loaded into Python software (Wilmington, Delaware, USA) and divided into a training subset (80 % of All-TG) and a test subset (20 % of All-TG), as in typical ML testing set-ups. Low-TG patient results were used only for training the algorithm; our study results apply only to High-TG patients. The TC, HDL, LDL, and TG records were obtained by measurements made using Beckman Coulter (Brea, CA) AU-640 and Roche Cobas (Mannheim, Germany) c702 auto-analyzers.
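The exclusion rules above can be sketched as a simple record filter. The field names (`tc`, `tg`, `hdl`, `ldl`, `age`) are illustrative placeholders, not the study's actual database schema:

```python
def is_eligible(rec):
    """Apply the study's exclusion rules to one lab record (illustrative fields)."""
    for key in ("tc", "tg", "hdl", "ldl"):
        value = rec.get(key)
        # Exclude records missing a lipid result or lacking a numerical value.
        if not isinstance(value, (int, float)):
            return False
        # Exclude zero or negative valued results.
        if value <= 0:
            return False
    # Exclude patients under 18 years of age.
    return isinstance(rec.get("age"), (int, float)) and rec["age"] >= 18

records = [
    {"tc": 240, "tg": 450, "hdl": 38, "ldl": 150, "age": 52},   # kept
    {"tc": 240, "tg": 450, "hdl": 38, "ldl": None, "age": 52},  # missing LDL
    {"tc": 240, "tg": 450, "hdl": 38, "ldl": 150, "age": 16},   # under 18
]
eligible = [r for r in records if is_eligible(r)]
```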
A detailed literature search revealed the previous algorithms used in LDL prediction [8,10,11]. Our data analysis method was based on an investigation using state-of-the-art ensemble ML algorithms, a subset of artificial intelligence. The candidate algorithms were tree-type algorithms that can detect linear or non-linear relationships between independent and dependent variables. Tests were conducted on the pre-conditioned data, and the best prediction performers were chosen for inclusion in the final prediction model. Performances were measured by comparing the predicted LDL value/class to the actual measured value/class.

Study population/subjects
Our study population consisted of the laboratory test results of 109,591 blood samples obtained at the Hospital's biochemistry laboratory between January 2010 and December 2022. The standard lipid profile data collected from the laboratory database included the patient's TC, TG, HDL, and same-day measured LDL values. The baseline characteristics of our study population (High-TG) are shown in Table 1. While 3,431 subjects were male, the remaining 2,961 were female. The average ages of the males and females were 49.72 and 54.07 years, respectively. The measured average Direct LDL was 149.76 ± 45.28 mg/dL.
The Standards for Reporting Diagnostic Accuracy diagram is shown in Figure 1. The diagram reports the flow of the subjects throughout the study. The total number of subjects was divided into two sets for statistical and ML analysis: a training set comprising 80 % of the subjects and a test set comprising the remaining 20 %. The first set was used for training the designed ML model, and the second set was used for testing the performance of the designed model. The training set was divided into two groups. The first group (n=105,111) contained LDL data with TG≤400 mg/dL and LDL values calculated with MH-E. The second group (n=5,113) contained Direct LDL data with TG>400 mg/dL. In the p-LDL-M {1} model, the training set contained only the second group, while in the p-LDL-M {2} model, both groups were included. The distribution of the test set of 1,279 subjects according to their LDL levels is shown in Figure 1. The classification follows the 2019 ESC/EAS Guidelines for the management of dyslipidaemias [16]. The most undesired error in LDL-level classification is placing a patient into a lower LDL level than the actual one (under-classification). Therefore, under-classification is a very critical consideration in new model designs.
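A class-assignment helper makes the under-classification notion concrete. The text confirms Class 1 (0-54 mg/dL) and Class 5 (116-189 mg/dL); the intermediate cut-points below are assumptions based on the 2019 ESC/EAS bands and should be checked against the guideline:

```python
# Assumed upper bounds (mg/dL) per class; only 0-54 (Class 1) and
# 116-189 (Class 5) are stated explicitly in the text.
LDL_CLASS_UPPER_BOUNDS = [(55, 1), (70, 2), (100, 3), (116, 4), (190, 5)]

def ldl_class(ldl_mg_dl):
    """Map an LDL value to its treatment class (1-6)."""
    for upper, cls in LDL_CLASS_UPPER_BOUNDS:
        if ldl_mg_dl < upper:
            return cls
    return 6

def is_under_classified(predicted, actual):
    """Flag the clinically critical error: predicted class below the actual one."""
    return ldl_class(predicted) < ldl_class(actual)
```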

Lipid profile testing
All lipid profile parameters were measured on the automatic chemistry analyzers Beckman Coulter AU 640 and Roche Cobas c702 of the Hospital's biochemistry laboratory. Beckman auto-analyzers were used for lipid profile measurements in our Hospital until 2017, and Roche auto-analyzers afterward. Each patient's first test results were included in the study; duplicate measurements were excluded. TC was measured using the enzymatic cholesterol esterase/oxidase method, while TG was measured by the enzymatic glycerol phosphate oxidase method. HDL was measured by direct homogeneous assays without precipitation. LDL was measured by a direct homogeneous assay that uses a selective protective agent to separate LDL from chylomicrons, HDL, and VLDL, and is then quantified by the cholesterol esterase/oxidase method. The allowable % TEa for the LDL test for these measurement principles is 11.9 % [17]. All tests were performed using Beckman Coulter AU and Roche Cobas reagents and calibrators, with Bio-Rad internal quality control material.

Statistical analysis
Assuming Direct LDL was the most accurate, it was compared to the corresponding predicted and calculated LDL values. Statistical analysis was performed using IBM® SPSS® Statistics 26 for Windows®.

Demirci et al.: Prediction of LDL in hypertriglyceridemic subjects
A paired t-test was used for the comparison of means. The Pearson correlation test was performed to assess the correlation of Direct LDL with the results of the designed ML model and NIH-Equ-2. A p-value <0.05 was taken as statistically significant. Bland-Altman plots were used to assess systematic bias across different Direct LDL concentrations; in these plots, the differences among the methods were plotted against the Direct LDL measurements.
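The quantities behind a Bland-Altman plot are simple to compute. A minimal sketch of the bias and 95 % limits of agreement (mean difference ± 1.96 SD) for two paired series:

```python
from statistics import mean, stdev

def bland_altman_limits(reference, predicted):
    """Return (bias, lower, upper): mean difference and 95 % limits of
    agreement between two paired measurement series."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 95 % limits of agreement
    return bias, bias - spread, bias + spread
```

In the study's setting, `reference` would be the Direct LDL values and `predicted` the model or formula output, with `diffs` plotted against the reference.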
We also assessed the performance of the designed model and NIH-Equ-2 in classifying LDL levels as per the 2019 ESC/EAS Guidelines. A subject was taken as correctly classified if the model or NIH-Equ-2 placed it into the same treatment class as Direct LDL. The percentage of subjects classified correctly into the appropriate treatment classes was calculated and compared. Cohen's kappa score was used to check the accuracy of the designed model and NIH-Equ-2 classifications. The kappa result can be interpreted as follows: values ≤0 indicate no agreement, 0.01-0.20 none to slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect agreement [18].
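In practice Cohen's kappa can be obtained from `sklearn.metrics.cohen_kappa_score`; a dependency-free sketch makes the chance correction explicit:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters assign the same class at random.
    expected = sum(count_a[c] * count_b.get(c, 0) for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Under the bands quoted above, a kappa of 0.951 falls in "almost perfect" and 0.394 in "fair" agreement.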
A universal method of model performance measurement is the statistical calculation of accuracy, precision, and recall. The parameters used in these calculations are defined as follows:
- True Positive (TP): the number of cases in which the subject's LDL class is correctly identified.
- False Positive (FP): the number of cases in which the subject's LDL class is incorrectly identified.
- True Negative (TN): the number of cases in which a subject outside an LDL class is correctly identified as such. Not applicable in our setting.
- False Negative (FN): the number of cases in which a subject outside an LDL class (1-6) is incorrectly identified.
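With TN not applicable, the per-class metrics reduce to precision = TP/(TP+FP) and recall = TP/(TP+FN). A minimal one-vs-rest sketch over predicted and actual class lists:

```python
def per_class_precision_recall(actual, predicted, classes):
    """Per-class precision and recall from paired class labels.
    TN is not used, matching the study's one-vs-rest counting."""
    metrics = {}
    for c in classes:
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        fp = sum(a != c and p == c for a, p in zip(actual, predicted))
        fn = sum(a == c and p != c for a, p in zip(actual, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall)
    return metrics
```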

Machine learning analysis
In our study, Python version 3.9 was used as the programming language. The Pandas library (version 1.4.4) was used for data processing and analysis. The NumPy library (version 1.21.5), which supports large multi-dimensional arrays and provides high-level mathematical functions for working with them, was also used. The Sklearn library (version 1.0.2) was used to train the machine learning models. The SHAP library (version 0.42.1) was used for feature-importance analysis. Lastly, the XGBoost library (version 1.5.0) was used to train and test the XGBoost algorithm.
In this part, the first step was the determination of the Pearson and Spearman correlation coefficient matrices. Sometimes, due to outliers, these two matrices disagree on the strength of the correlation between an independent and a dependent variable [19]. Therefore, both were included in the first step of the statistical investigation. In the present study, there was no disagreement in correlation strength between the two matrices, only minor differences of less than 1.000. The results and implications of the matrices are in Figure 2, which summarizes the relationships between the different parameters involved in our research. A correlation higher than 0.7 indicates a strong relationship, and a correlation higher than 0.85 indicates a very strong relationship [19].
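In the study's stack these matrices come from pandas (`df.corr(method="pearson")` and `df.corr(method="spearman")`). A dependency-free sketch shows why the two can disagree under outliers: Spearman is just Pearson on rank-transformed data, so it is insensitive to an outlier's magnitude (ties are ignored here for brevity):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson computed on ranks (no tie handling)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))
```

For a monotone series with one large outlier, e.g. x=[1, 2, 3, 100] against y=[1, 2, 3, 4], Spearman stays at 1.0 while Pearson drops well below it.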
The SHapley Additive exPlanations (SHAP) graph can determine each feature's significance in our proposed model. SHAP values provide the contribution of each feature to every individual prediction: they quantify the impact of each feature on the model's predictions by considering all possible feature combinations. The SHAP graph and the Pearson correlation results of our model are provided in the discussion section.
The second step in the ensemble ML analysis was the regression investigation of the High-TG subset. Seven ML algorithms (Random Forest, Decision Tree, XGBoost, Linear Regression, KNN, AdaBoost, SVR) were used to test the linearity of the pre-processed data. After verifying the linearity of the High-TG dataset, the LDL values of the patients with Low-TG were calculated using the MH-E and Friedewald equations. Since preliminary analysis showed that MH-E performed better in predicting LDL, the Low-TG LDL values were calculated with MH-E. A new dataset, All-TG, was formed by combining the Low-TG subset with the High-TG dataset, and its linearity was also verified.
In the third step, the test scores of the seven individual ML algorithms were determined for the High-TG subset and the All-TG dataset. The "bucket of models" is an ensemble technique that tests candidate models to choose the best performer for the problem. Omitting intermediate results, Random Forest and XGBoost were the best performers in both datasets, taking first and second place, with test scores around 1 % apart.
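The "bucket of models" step can be sketched with scikit-learn. This is not the study's code: the synthetic data below merely stands in for the lipid dataset, and XGBoost, AdaBoost, and SVR are left out of the bucket for brevity:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: "LDL" built from TC-, HDL-, and TG-like columns plus noise.
rng = np.random.default_rng(0)
X = rng.uniform([120, 20, 400], [350, 90, 800], size=(500, 3))
y = X[:, 0] - X[:, 1] - X[:, 2] / 8 + rng.normal(0, 5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Bucket of models: fit each candidate, score on the held-out test set (R^2),
# and keep the best performer.
candidates = {
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
    "LinearRegression": LinearRegression(),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in candidates.items()}
best = max(scores, key=scores.get)
```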
A machine learning model was designed in the next step, as shown in Figure 3. The model has been named "p-LDL-M" because it predicts the LDL value using artificial intelligence rather than measuring it. For the High-TG dataset, p-LDL-M was tested using both the Random Forest and XGBoost algorithms (Table 2: p-LDL-M {1}). After observing the XGBoost algorithm's slightly better performance (around 1 %), the designed model was finalized by hyperparameter tuning of the XGBoost algorithm. Next, p-LDL-M prediction was conducted using the All-TG dataset (Table 2: p-LDL-M {2}). In the final step, the predicted LDL values were placed in their LDL-level classes, separately for p-LDL-M {1} and p-LDL-M {2}, as shown in Figure 3.
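Hyperparameter tuning of a gradient-boosting regressor can be sketched with `GridSearchCV`. This sketch uses scikit-learn's `GradientBoostingRegressor` as a stand-in (xgboost's `XGBRegressor` accepts the same scikit-learn interface and could be swapped in); the toy data and the parameter grid are illustrative, since the paper does not list its tuned hyperparameters:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the All-TG training set.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(0, 0.1, 300)

# Illustrative grid; each combination is scored by 3-fold cross-validated R^2.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X, y)
```

After fitting, `search.best_params_` holds the winning combination and `search.best_estimator_` the refitted model.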

Basic statistics results
The Pearson correlation results of our study are presented in Figure 2. According to Figure 2, there was a very strong relationship between LDL and TC, between LDL and Non_HDL, and between TC and Non_HDL. It is also clear that NIH-Equ-2 can calculate the LDL value fairly correctly. Surprisingly, the rest of the parameters have moderate or weak relationships.
The basic statistical accuracy, precision, and recall results of our proposed p-LDL-M {2} model classification are presented below for the 1,279 test subjects. First, the p-LDL-M {2} model's TP, FP, TN, and FN findings are given, using Supplementary Table 1. Since our classification was not about identifying a subject's "not belonging to a class", True Negative was considered non-applicable to our results.

Machine learning results
The ML analysis results are shown in Table 3. The first six rows show the test score for predicting the LDL values in each specific LDL-level class only; the last row uses the entire dataset to show the LDL value prediction for all classes. For the High-TG subset, the test score for predicting LDL values was lowest in Class 1 (74.31 %) and highest in Class 4 (95.77 %); the rest of the scores were all above 90 %. Across all classes, the prediction performance was 91.14 %. The first general observation from Table 3 is that p-LDL-M {2} performs better with the All-TG dataset, which the MH-E supported. All test scores were above 94 %, the lowest being 94.86 % in Class 6 and the highest again in Class 4 with 99.75 %. When all 106,390 patient records were used, the test score over all classes was 99.04 %, a performance increase of almost 8 % over p-LDL-M {1}. Accordingly, the p-LDL-M {2} prediction errors were lower for all three error types.
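The three error types reported alongside the test scores (MAE, MSE, and MAPE, per the table footnotes) can be computed with a few lines:

```python
def regression_errors(actual, predicted):
    """Return (MAE, MSE, MAPE %) for paired actual/predicted LDL values."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n        # mean absolute error
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n      # mean squared error
    mape = 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n  # mean absolute % error
    return mae, mse, mape
```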
The LDL-level class prediction results are shown in Figure 1. Compared to the actual distribution, the predicted class distributions of p-LDL-M {2} and p-LDL-M {1} look similar. However, the classification predictions are further detailed in Table 5, where the over- and under-classification errors are shown.

Comparison of model predicted and NIH-Equ-2 results with direct LDL
A comparison of the means of the predicted and calculated LDL values by paired t-test showed that all of the means agreed significantly with the mean of Direct LDL. However, the mean LDL value calculated by NIH-Equ-2 was distinguishably lower than the mean of Direct LDL and of the other models. There was a 12.72 mg/dL difference between the mean of the values calculated by NIH-Equ-2 (135.22 mg/dL) and the mean of the Direct LDL values (147.94 mg/dL). This difference was 8.60 % of the Direct LDL value and smaller than the LDL test TEa (11.9 %).
The Pearson correlation analysis between the predicted, NIH-Equ-2, and Direct LDL results showed very strong correlations. While the correlation of the predicted results was >0.950, the correlation score of NIH-Equ-2 was 0.865 (Table 4).
Analysis showed that the NIH-Equ-2 results were scattered and the R² value was low (R²=0.7483). The p-LDL-M {2} produced the best results, with an R² value close to 1 (R²=0.9905) and low scatter. The p-LDL-M {1} model had slightly higher scatter, with an R² value of 0.9231. The scattering of all competitors can be seen in Figure 4a. In the Bland-Altman plot, most of the measurements lie above the mean line and mostly within the 95 % confidence interval; the values from our model that fall outside the 95 % confidence interval tend to be high (Figure 4b). The receiver operating characteristic (ROC) curves of the predicted six classes in the p-LDL-M {2} model are shown in Figure 4c. The area under the ROC curve (AUC) indicates a model's performance across all possible classification thresholds; a value of more than 0.9 is considered outstanding. Our ROC curves indicate a 98 % AUC in five classes, with Class 3 having the lowest AUC (95 %). Therefore, the average AUC value shows that our proposed model has good prediction accuracy and precision across all classes.
The class of a patient's LDL level is as important as the quantitative LDL value in lipid-lowering therapies. Clinicians apply treatment options ranging from dietary changes and exercise to multi-drug therapies, depending on the patient's LDL-level category. Therefore, the LDL values under study were categorized into classes according to the 2019 ESC/EAS guideline. As shown in Figure 1, Class 1 (0-54 mg/dL) had the fewest cases (10), while Class 5 (116-189 mg/dL) had the most (749).

Discussion
Our study aims at LDL value and level prediction for High-TG people based on measured and calculated values. According to our literature review, our study is the first for High-TG subjects. The most important findings of our study are discussed below.
The mean values of the datasets used are important in giving a general idea about the studied population. When the lipid profiles of the population included in our study were analyzed, the mean TC, TG, and LDL values were found to be high, while the mean HDL values were found to be low, compared with other related ML studies [8][9][10][11]. The dietary habits in our country may explain the differences. However, except for the TG results, the lipid profile values of the National Institutes of Health center in the multicenter study by Sampson M. et al. were similar to ours, while the lipid results from the other four centers were lower than in our study [7].
In previous Low-TG LDL prediction studies, Random Forest has been the most preferred ML algorithm, but XGBoost, Deep Neural Network (DNN), Support Vector Machine (SVM), Linear Regression, and k-Nearest Neighbors (KNN) methods have also been used [8,10]. In our study, the ensemble ML technique identified XGBoost as the preferred algorithm, as it provided the best prediction results.
Our p-LDL-M predictions of patient LDL value and LDL-level class have produced interesting results. Comparison of the LDL value prediction accuracy of p-LDL-M with the Direct LDL measurement results showed that our model's prediction errors did not exceed the total allowable error rate (% TEa); the model obtained the highest test score (99.04 %) and the lowest MAE, MSE, and MAPE results. Interestingly, the best performance was obtained by using All-TG and not by using measured High-TG data only. Applying the Martin-Hopkins equation to Low-TG data and using the calculated LDL values increased the prediction performance by a 7.90 % margin over predictions using High-TG data alone. The p-LDL-M prediction results outperformed the well-known NIH-Equ-2 results by a margin of 8.60 %. In addition, the LDL-level classification results had the best performance and the least scatter when All-TG was used.
The study clearly shows the positive effect of using the Martin-Hopkins equation to calculate LDL values. Furthermore, our study confirmed the following:
- the previously known strong correlations between TC, Non-HDL, and Direct LDL;
- the good performance of NIH-Equ-2 in calculating LDL (Figure 2);
- the success of the Random Forest algorithm in estimating LDL values;
- the linear relationship between TG and LDL.
A thorough statistical analysis of the p-LDL-M results has proven the strong performance of our designed model. The p-LDL-M predictions and the Direct LDL measurements were significantly correlated (0.995), while the correlation of the NIH-Equ-2 results was lower (0.865). The algorithm results of Anudeep PP. and Singh G. et al. were also significantly correlated with the Direct LDL measurements (0.98 and 0.982) [9,10]. These strong correlations indicate that ML algorithms can predict LDL values better than previously developed formulae [20]. Another interesting result was the difference between the Direct LDL mean and the mean of the NIH-Equ-2 results (147.94 ± 45.03 vs. 135.22 ± 41.01 mg/dL). In contrast, the mean values obtained by our model and the other compared methods showed no significant differences. Our p-LDL-M {2} model's superior statistical performance is further supported by the accuracy, precision, and recall values in the results section.
The SHAP graph of our model is shown in the supplementary material. The highest-impact feature is Non_HDL, which is validated by the highest Pearson correlation value, 0.8229, in Figure 2. Triglyceride's negative Pearson correlation value (−0.104) is observed in the minus section of the SHAP graph. The protruding line in the positive section depicts the natural correlation between cholesterol and LDL levels.
The scatter plot of the p-LDL-M results is shown in Figure 4a. While our results' R² value is 0.9905, the R² value of the NIH-Equ-2 formula is 0.7483. The graphs show that our model's results are almost linear, in contrast to the scattered results of NIH-Equ-2, which scatter more with increasing TG levels. Our scatter performance is again in agreement with the low-scatter study of Anudeep PP. et al. Our study also agreed that different formulae can produce negative results, even though their correlation values (r) vary between 0.89 and 0.94 [10].
Numerous epidemiological, Mendelian randomization, and randomized controlled trials have shown that atherosclerotic cardiovascular disease (ASCVD) and LDL have a log-linear relationship. Therefore, according to previous guidelines and committees, lipid-lowering therapies are of great importance for ASCVD-related survival. The value of lipid-lowering therapeutic interventions has been demonstrated by basic science, clinical observations, genetic factors, randomized controlled trials, and epidemiological studies [21][22][23]. It is also known that LDL levels determine the type and dose of cholesterol-lowering therapies to be administered to patients. One study demonstrated that a 1 mmol/L reduction in LDL levels can reduce major cardiovascular events by 20 % [24]. In some cases, the p-LDL-M has overestimated LDL levels by one class (Table 5). However, we believe there is little danger in overestimating LDL levels by one class, because ezetimibe and monoclonal antibody therapies added to statins lower LDL levels further, thus improving cardiovascular outcomes. It has been demonstrated that adding ezetimibe to statin therapy reduced the risk of cardiovascular events without causing side effects or toxic effects, even in individuals with acute coronary syndromes and desirable LDL levels. As a result, the survival rate improved with decreased LDL levels. In addition, there is no study on the harm of adding ezetimibe to statin therapy at medium LDL levels [25,26].
The costs of lipid-lowering therapies to healthcare providers are significant worldwide [27]. Although beta quantification is a reference method, it is unsuitable for routine use due to its high cost and effort [28]. Therefore, beta quantification has been replaced by enzymatic and homogeneous immune methods. After the correlation of LDL with other cholesterol derivatives was demonstrated, direct measurement was in turn supplemented by formulae such as Friedewald, Martin-Hopkins, and NIH-Equ-2. The aforementioned formulae have different advantages over each other: while the Martin-Hopkins and Friedewald formulae are successful at TG levels <400 mg/dL, NIH-Equ-2 can calculate results up to 800 mg/dL. More remarkably, since the Martin-Hopkins formula uses different factors for different TG levels, it can be used even in non-fasting people [29]. Recently, research on eliminating the disadvantages of these formulae has focused on ML LDL prediction models. According to our literature review, our study is the first to predict LDL levels using ML algorithms for cases with TG between 400 and 800 mg/dL. Our work becomes important when clinicians need accurate LDL-level classifications for appropriate treatment [16]. To demonstrate the classification accuracy of our model, we calculated the kappa score of our classification results. The kappa analysis revealed that the kappa score of our p-LDL-M model was significantly higher than that of NIH-Equ-2 (0.951 and 0.394, respectively). According to the kappa score, the p-LDL-M model was in "almost perfect agreement" with the Direct LDL measurements, while NIH-Equ-2 showed only "fair agreement".
It is known that classifying an LDL value into a lower class than the actual one (under-classification) may prevent the patient from receiving the appropriate treatment. On the other hand, putting LDL levels in the correct or a higher class (over-classification) will result in appropriate or more intensive treatment. This situation may only lead to some more patients receiving treatment and some more treatment side effects. However, considering the morbidity and mortality statistics of cardiovascular disease patients due to high LDL levels, intensive hyperlipidemia treatment can positively affect the quality of life of over-classified patients [24,30,31]. Table 5 shows that p-LDL-M {2} had 15 under-classified cases (1.17 %), while p-LDL-M {1} had 42 (3.28 %) and NIH-Equ-2 had 355 (27.7 %). More strikingly, in the p-LDL-M {2} algorithm, none of the patients were misclassified by two classes. Furthermore, while there were only three misclassifications for Classes 1 and 2 in the prediction models, there were 23 in NIH-Equ-2.
In the classes that include patients with more risk factors (Classes 1-3), our model placed some subjects incorrectly in a higher class. As a consequence, such misclassified patients would receive more intensive treatment, which is clinically less critical than inadequate treatment [4]. Our study has the lowest sample size in the first three classes among all studies. We believe that our study's relatively large classification error is due to the low number of available samples and that it will decrease with a larger sample size.
Many publications defend the success of NIH-Equ-2 LDL value estimation [32,33]. It should be noted that results of patients with TG<400 mg/dL were also used in the development of NIH-Equ-2. In addition, NIH-Equ-2 was based on LDL levels estimated by the reference beta-quantification method. Although our model's accuracy was superior to that of NIH-Equ-2, the LDL levels for patients with TG<400 mg/dL in our study were obtained by direct homogeneous immune methods. Moreover, our study used results from two different auto-analyzers, unlike many publications in the literature. Therefore, different accuracy rates may arise because of differing methods, models, and auto-analyzers [34].
Our study has several limitations. First, since our data were obtained retrospectively and beta quantification is not used in routine laboratory practice, the more frequently preferred direct homogeneous immune method was used instead of the reference beta-quantification method. Secondly, the effects of diseases that may affect the lipid profiles could not be evaluated separately due to the study's retrospective design. Thirdly, ethnic sub-categorization could not be made, and since the target TG range was >400 mg/dL, triglyceride values could not be sub-categorized or analyzed in detail. Our study's sample size is lower than that of other studies in the literature because our Hospital is a chest disease hospital specializing in lung diseases. However, we believe the sample size is still sufficient to obtain statistically significant results. Due to the low number of participants, the model must be validated with larger datasets, even if the values produced are considered successful.

Conclusions
In conclusion, the estimation model created by our study gives more accurate results for the target TG values than the NIH-Equ-2 formula, considering the proportional relationship of very low-density lipoprotein (VLDL) with non-HDL cholesterol and TG. After validation in more cases, this model can take a place in the routine lipid profile. Its use as a clinical support system to predict LDL levels in patients with high TG may be a safe, time-efficient, and cost-effective solution for both laboratory professionals and clinicians in the future.

Figure 1 :
Figure 1: Standards for reporting diagnostic accuracy diagram showing the flow of the subjects through the study.

Figure 4 :
Figure 4: Statistical performance characteristics of our model. (a) Scatter plots of correlations between predicted, calculated, and direct LDL. (b) Bland-Altman plot between direct LDL and p-LDL-M {2}. (c) ROC curves of p-LDL-M {2} model class predictions.

Table  :
Characteristics of the study population (n=).
a MAE, mean absolute error. b MSE, mean squared error. c MAPE, mean absolute percentage error.

Table  :
Comparison of LDL predicted by p-LDL-M, and NIH-Equ- with Direct LDL (n=,).
a Fair agreement. b Almost perfect agreement.