CSG compressive strength prediction based on LSTM and interpretable machine learning

Abstract: As a new type of environmentally friendly building material, cemented sand and gravel (CSG) has advantages distinct from those of concrete. Compressive strength is an important mechanical property of CSG. However, it is currently determined mainly through laboratory experiments. For this reason, a deep learning algorithm, the long short-term memory (LSTM) model, was proposed to predict the compressive strength of CSG from four input variables, namely cement content, sand rate, water-binder ratio, and fly ash content, with a total of 114 sample data. Three metrics, the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), were used to evaluate the model's performance, and the predicted results were compared with those of a traditional machine learning algorithm, the random forest (RF) model. Finally, SHapley Additive exPlanations (SHAP) was used to explain the contribution of each input feature of the machine learning model to the prediction results. The results show that the prediction accuracy and reliability of LSTM are higher. The LSTM model has R² = 0.9940, RMSE = 0.1248, and MAE = 0.0960, while the RF model has R² = 0.9147, RMSE = 0.4809, and MAE = 0.4397. The LSTM model can accurately predict CSG compressive strength. Cement content and sand rate contribute more to the predicted results than the other input features.


Introduction
In response to the world development trend and the national call for "carbon neutrality," the construction of environmentally friendly water conservancy projects has become the mainstream trend [1]. Compared with a roller-compacted concrete dam, a cemented sand and gravel dam consumes less cement, greatly simplifies aggregate preparation and mixing facilities, can dispense with temperature control measures, significantly accelerates construction, and significantly reduces project cost. Its material, cemented sand and gravel (CSG), is an economical, safe, green, and low-carbon new building material formed from river bed gravel or local waste materials after mixing, rolling, and vibrating with cementing materials and water [2,3]. As an ultra-poor cementing material, it is similar to concrete but has many differences and advantages. Compared with concrete, the reduction in cement content significantly reduces its hydration heat. The selection of raw materials is very simple: once oversized particles are removed, the aggregate does not need to be sieved. To avoid the destruction of land vegetation to the greatest extent [4], the properties of CSG need to be studied so that the material can be applied and popularized more efficiently.
Among the various properties of CSG, compressive strength is directly related to the safety of the structure and is a necessary condition for evaluating the performance of the structure over its whole life cycle [5]. Therefore, compressive strength is one of the most important properties of CSG. However, CSG is a heterogeneous mixture of complex materials, each component is randomly distributed in the CSG mix, and factors such as cement content, water-binder ratio, and waste composition all affect the compressive strength [6]. Therefore, it is very difficult to accurately predict the compressive strength of CSG with its complex matrix. Scholars have studied the compressive strength of CSG through laboratory tests, statistical analysis, and analysis of influencing factors. Chen et al. [7] established a dataset of CSG mix ratio and 28-day compressive strength and analyzed the distribution law of CSG compressive strength by using skewness, kurtosis, and the single-sample Kolmogorov-Smirnov test. Li et al. [8] conducted an experimental study on the effects of sand rate, water-binder ratio, fly ash, and other parameters on the properties of CSG materials in different mix ratios. Chai et al. [9] studied the influence of fly ash content on the compressive strength of CSG. However, testing generally requires a lot of time and money, so it is necessary to seek a low-cost method with high prediction accuracy to predict CSG performance.
With the development of artificial intelligence, intelligent algorithms have been applied in many fields. Zhou et al. [10] proposed a fire prediction model based on the CatBoost algorithm to predict fire points. Liu et al. [11] proposed a prediction model based on the XGBoost algorithm for pipeline safety assessment. Ilić et al. [12] realized water-quality prediction in five different regions through the Naïve Bayes algorithm. Wang et al. [13] developed an improved back propagation (BP) neural network to predict surface runoff coefficients in different rainfall conditions. Intelligent algorithms are also used to predict the strength of civil engineering materials, mainly concrete. For example, Wu and Zhou [14,15] employed an optimized support vector regression (SVR) model to predict the splitting tensile and compressive strength of sustainable concrete, and the results showed that the optimized model can accurately predict the mechanical properties of concrete. Latif [16] used boosted decision tree regression (BDTR) to predict the compressive strength of concrete and compared it with a support vector machine (SVM). The results show that BDTR has better prediction accuracy, with an R² of 0.86, and can accurately predict the compressive strength of concrete, although this may depend on the adequacy of the inputs in the data set. Ahmad et al. [17] adopted decision tree, bagging regressor, and AdaBoost regressor models to predict the compressive strength of geopolymer concrete, and the results showed that the bagging model had the best prediction accuracy; however, the prediction accuracy could also be compared with that of other machine learning models. Yuan et al. [18] used machine learning methods such as gradient boosting and random forest (RF) to predict the compressive strength and flexural strength of recycled aggregate concrete. The results show that RF has better prediction accuracy than gradient boosting, and it is suggested that environmental characteristics be added as further input variables. Mozumder et al. [19] used SVR to predict the uniaxial compressive strength of fiber reinforced polymer (FRP)-confined concrete. The results show that SVR can be used as an alternative physical tool to predict the strength of FRP-confined concrete. However, research on material properties has focused mainly on concrete, and there is a lack of research on the properties of CSG, especially on the application of intelligent algorithms to its strength.
In addition, most of the above intelligent algorithms are traditional machine learning algorithms, whose prediction ability is limited. Compared with traditional machine learning algorithms, deep learning models with better prediction performance may be a better choice. For example, Liu et al. [20] proposed a convolutional neural network (CNN)-based foreign exchange rate prediction model, and the results show that its long-term prediction accuracy is better than that of artificial neural networks, SVR, and other models. Salinas et al. [21] adopted an autoregressive recurrent neural network (RNN) (DeepAR) to produce probabilistic predictions and showed an accuracy improvement of about 15% compared to the latest methods. Wu et al. [22] combined a deep learning gated recurrent unit network with wavelet packet decomposition for the automatic diagnosis of internal defect signals of concrete structures, and the results showed that the accuracy of the model reached 90.76% and the prediction performance was good. Zhang and Ci [23] used the deep belief network (DBN), and their results suggest that DBN has excellent performance in forecasting and direction. However, selecting too many hidden layers in pursuit of prediction accuracy may cause the vanishing gradient problem [24]. One characteristic of the long short-term memory (LSTM) model is that it can learn long-term dependencies and thereby avoid vanishing gradients [25]. Qiu et al. [26] predicted river water temperature with LSTM and achieved good prediction results. Latif [27] predicted the compressive strength of concrete through LSTM and demonstrated the superiority of the LSTM model.
In view of the multiple advantages of CSG over concrete, the rare application of intelligent algorithms to CSG material properties, the advantages of deep learning, and the advantages and disadvantages of different algorithms, this article uses 114 sets of compressive strength test data to predict the compressive strength of CSG for the first time through the deep learning LSTM model. By comparison with the traditional machine learning RF model, the effectiveness of the LSTM model and the advantages of deep learning over traditional machine learning are verified. This provides a theoretical basis for, and promotes, the practical application of CSG.
2 Experimental design and method

Experimental raw materials and mix ratio
The purpose of this experiment was to measure the compressive strength of CSG under different mix ratios. According to the "Technical Guidelines for Cemented Granular Material Damming" (SL678-2014), the amount of cementing material should not be less than 80 kg·m⁻³, and the amount of cement should not be less than 32 kg·m⁻³. The sum of cement content and fly ash content in this test was between 80 and 110 kg·m⁻³, and the mix ratio is shown in Table 1.
The cement used in the experiment was 425# ordinary Portland cement produced by Henan Yodongda Cement Co., Ltd, with physical and mechanical properties as shown in Table 2; the fly ash was Class FⅡ dry discharge fly ash from Zhengzhou Thermal Power Plant, with properties as shown in Table 3. The mixing water was tap water, and the coarse aggregate and fine aggregate came from the North Ruhe Material Yard in Ruzhou City.

Experimental process

2.2.1 Preparation methods
Because the properties of CSG lie between those of roller-compacted concrete and earth-stone materials, the forming and curing of CSG specimens in this test were carried out according to the "Technical Guidelines for Dam Construction with Cemented Particle Material" (SL678-2014) and the "Test Procedure for Hydraulic Rolled Concrete" (DL/T5433-2009). The preparation process of the CSG specimen in this test is shown in Figure 1. First, the aggregate was screened and then stored in silos according to particle size. The materials were then batched according to the mix ratio and mixed. To improve mixing uniformity, mechanical mixing and manual mixing were combined, and the mixer used was an SJD-60 single horizontal shaft concrete mixer. After mixing, loading, vibrating, and forming were carried out. For loading, a cast iron test mold pre-painted with oil was used. Manual compaction was first carried out no fewer than 25 times, and the mold was then moved to the shaking table. A weight block was placed on the specimen and steadied by hand. The vibration time on the shaking table was strictly controlled. After vibration was completed, the specimen was taken off the table, which completed the forming. After the specimen was covered and left for 48 h, the mold was removed, and finally the specimen was placed in the standard curing room and cured until the test age.

Experimental methods and results
The compressive strength test was carried out according to the "Standard for Test Methods of Physical and Mechanical Properties of Concrete" (GB/T 50081-2019). A WAW-1000 electro-hydraulic servo universal testing machine was used, as shown in Figure 2. The compressive strength test of the CSG specimen was controlled automatically by computer. After inspection, the specimen was placed in the middle of the lower pressure plate of the testing machine, with the bearing surface perpendicular to the top surface during forming. Under continuous and uniform loading, the specimen approached failure and began to deform rapidly until it was completely destroyed, and the failure load was recorded. The CSG specimen is a secondary standard cube specimen of 150 × 150 × 150 mm. There were 114 groups of tests, with three specimens in each group. The compressive strength was tested according to the above standards, and the representative value of CSG compressive strength was determined according to the "Concrete Strength Inspection and Evaluation Standard" (GB50107). The 114 sets of data obtained after the final selection are shown in Figure 3.

LSTM networks
LSTM networks are a special type of RNN proposed by Hochreiter and Schmidhuber in 1997 [28]. An RNN is a loop network in which information is transmitted from the current loop to the next loop. This chain structure makes the RNN a natural architecture for sequence data. However, a regular RNN suffers from the problem of long-term dependencies: as the distance between loops increases, the link of information in the RNN can break. LSTM, by contrast, can alleviate the vanishing and exploding gradient problems that arise when training on long sequences.
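The LSTM is able to learn long-term dependencies owing to the special structure of its repeating module. LSTM uses storage units and gates to control the long-term information stored or retained in the network. As a powerful recurrent neural network model, LSTM can extract long- and short-term correlations in time series, enabling the model to extract data features effectively [29][30][31]. The main structure of LSTM includes a forget gate, an input gate, an update gate, and an output gate. The main equations of the LSTM structure are as follows:

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$

$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$

$$g_t = \tanh\left(W_g [h_{t-1}, x_t] + b_g\right)$$

$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$

$$h_t = o_t \odot \tanh(c_t)$$

where $f_t$, $i_t$, $g_t$, and $o_t$ are the output values of the forget gate, input gate, update gate, and output gate, respectively; $W_f$, $W_i$, $W_g$, and $W_o$ are weight vectors; $b_f$, $b_i$, $b_g$, and $b_o$ are bias vectors; $c_t$ and $\sigma$ are the memory unit and the sigmoid activation function, respectively; and $x_t$ and $h_t$ are the input and the hidden state at time $t$.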
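The gate structure described above can be illustrated with a minimal pure-Python sketch of one LSTM time step. It assumes the common formulation in which every gate acts on the concatenation of the previous hidden state and the current input; the helper names (`affine`, `lstm_step`, `init_params`), the random initial weights, and the toy input values are hypothetical and are not taken from the model trained in this article.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps gate name ('f', 'i', 'g', 'o') to (W, b)."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"][:1], z, params["f"][1])]
    i_t = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]
    g_t = [math.tanh(a) for a in affine(params["g"][0], z, params["g"][1])]
    o_t = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]
    # Memory unit: forget part of the old state, add the gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: gated, squashed view of the memory unit.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical random initialisation, for illustration only."""
    rng = random.Random(seed)

    def gate():
        W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + n_in)]
             for _ in range(n_hidden)]
        return W, [0.0] * n_hidden

    return {name: gate() for name in "figo"}


# Two time steps over four (made-up, already normalised) input features.
params = init_params(n_in=4, n_hidden=3)
h, c = [0.0] * 3, [0.0] * 3
for x in [[0.2, 0.5, 0.1, 0.9], [0.4, 0.3, 0.7, 0.6]]:
    h, c = lstm_step(x, h, c, params)
```

Because the hidden state is an output gate value in (0, 1) multiplied by a tanh, every component of `h` stays strictly inside (-1, 1) regardless of the weights.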

RF
The RF algorithm was proposed by Breiman in 2001 [32]. The core idea of RF is ensemble learning: the model consists of multiple decision trees, each of which is built from a bootstrap sample of the data. During tree construction, a random subset of variables is selected as the candidate variable set at each split, so that each tree is grown on randomly selected features. Finally, the prediction is obtained by majority voting or averaging, depending on the specific problem. Assuming the input data are denoted by $x$ and the prediction value of a single decision tree is $\{H(x, \theta_i)\}$, the final prediction result of the RF model is the average of the prediction results of all decision trees:

$$\bar{H}(x) = \frac{1}{k} \sum_{i=1}^{k} H(x, \theta_i)$$

where $\bar{H}(x)$ is the predicted value of the RF model, $\theta_i$ is a random variable of a single decision tree, $x$ is the characteristic variable, and $k$ is the number of decision trees.
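The averaging step can be sketched in pure Python. The example below is an illustration only: it uses depth-1 regression trees (stumps) instead of full trees, and the function names, the synthetic data, and the settings `k=25` and `n_candidates=2` are assumptions, not the configuration used in this article.

```python
import random


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree using only the candidate features."""
    best = None
    for j in feature_ids:
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, ml, mr)
    if best is None:  # no valid split: fall back to the sample mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, threshold, ml, mr = best
    return lambda row: ml if row[j] <= threshold else mr


def fit_forest(X, y, k=25, n_candidates=2, seed=0):
    """Grow k stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(d), n_candidates)   # random candidate variables
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    """Average of the individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


# Synthetic data with four features (hypothetical values).
rng = random.Random(1)
X = [[rng.random() for _ in range(4)] for _ in range(30)]
y = [3.0 * row[0] - 1.0 * row[1] + 0.1 * rng.random() for row in X]
trees = fit_forest(X, y)
pred = predict_forest(trees, X[0])
```

Since every stump predicts the mean of some subset of the training targets, the averaged prediction always lies between the smallest and largest observed target.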

Interpretable machine learning method
A lack of interpretability is one of the restrictions on using machine learning models, and SHapley Additive exPlanations (SHAP) is a post hoc model interpretation method. Its core idea is to calculate the marginal contribution of features to the model output and then explain the "black box model" at the global and local levels [33]. SHAP is an interpretable machine learning method based on game theory that constructs combinations of different input variables, compares the average change in the model output, and then quantifies the specific contribution of each feature to the model results. The SHAP value of each input variable is the weighted average of the marginal contributions of the variable, which is calculated as follows:

$$\Phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ F(S \cup \{i\}) - F(S) \right]$$

where $\Phi_i$ is the SHAP value of input variable $i$, and a positive (negative) SHAP value indicates that variable $i$ contributes positively (negatively) to the prediction result; $n$ is the number of input variables; $N$ is the complete set of input variables; $S$ is a subset of $N$ that excludes variable $i$; and $F(S)$ is the prediction based on the input $S$.
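For a small number of input variables, the weighted average of marginal contributions can be computed exactly by enumerating all subsets. The sketch below does this for a hypothetical set function `toy_model` with four features; the effect sizes and the interaction term are invented for illustration. In practice, SHAP libraries estimate F(S) from a trained model and background data rather than from a hand-written function.

```python
from itertools import combinations
from math import factorial


def shapley_values(F, n):
    """Exact Shapley values of a set function F over features 0..n-1."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a subset S with |S| = size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (F(S | {i}) - F(S))  # marginal contribution
        phi.append(total)
    return phi


# Hypothetical additive effects plus one interaction between features 0 and 1.
effects = {0: 2.0, 1: -1.0, 2: 0.5, 3: 0.0}


def toy_model(S):
    value = sum(effects[j] for j in S)
    if 0 in S and 1 in S:
        value += 0.4
    return value


phi = shapley_values(toy_model, n=4)
```

The interaction of 0.4 is split equally between features 0 and 1, and the values sum to the difference between the prediction with all features and the prediction with none, which is the efficiency property of Shapley values.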
4 Database establishment

Sample data
The sample data used in this study came from the experimental data on CSG mechanical properties obtained in the preceding experiment, with a total of 114 samples. As an additional data set independent of the training set and the test set, a validation set can be used to evaluate the model and better determine whether it generalizes well. In view of the necessity of the validation set and to further increase the reliability of the model, the sample data were randomly divided into three parts: 77 training samples, 20 validation samples, and 17 test samples, and the modeling was carried out on the basis of these data. The input variables in this article were completely consistent with the test mix ratio and material dosage. They included cement content, sand rate, water-binder ratio, and fly ash content, which involve all raw materials (cement, fly ash, sand, water, and sand gravel) in the test. Although increasing the number of input features in some machine learning algorithms can improve model performance to a certain extent, it usually requires more data [34]. Given the limited amount of data in this article, four input variables were appropriate and allowed the model to achieve very good prediction accuracy, whereas too many input features could easily lead to overfitting and would also increase the computational complexity of the model [35]. The output variable was compressive strength. The description of the overall data in the model is shown in Table 4, which provides the mean, median, standard deviation, sample variance, range, minimum, maximum, sum, and count of the corresponding data. To ensure that all models achieve their best results, it was crucial to identify the optimal model parameters.
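A random 77/20/17 partition of 114 samples can be sketched as follows. The function name `split_indices` and the seed are assumptions made for reproducibility of the example; the article does not state how the random split was implemented.

```python
import random


def split_indices(n_samples, n_train, n_val, seed=42):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(114, n_train=77, n_val=20)
```

Fixing the seed means the same partition can be reused for every model, which keeps a comparison between models on equal footing.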

Data preprocessing and evaluation index
Because the input variables in this study have different units, substituting them directly into the model for learning would affect the final result to some extent, so it is necessary to normalize the data. Normalization retains all the features, converts the parameter values into data between 0 and 1, and converts dimensional expressions into dimensionless expressions, so that the data are numerically comparable and the model has higher accuracy. In this study, min-max standardization was used as the normalization method, expressed as follows:

$$y_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where $y_i$ represents the normalized data, $x_{\min}$ represents the minimum value of the feature, $x_{\max}$ represents the maximum value of the feature, and $x_i$ represents the data before normalization.
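Min-max normalization of a single feature column can be sketched in a few lines. The input values below are hypothetical cement contents, and the guard for a constant column is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Scale a feature column to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant column: the formula would divide by zero, so map to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


cement = [40.0, 50.0, 60.0, 80.0]  # hypothetical values
normalized = min_max_normalize(cement)  # [0.0, 0.25, 0.5, 1.0]
```

A common practice, not stated in the article, is to compute the minimum and maximum on the training set only and reuse them for the validation and test sets.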
To test the performance of the model, this article evaluated it through the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Each index characterizes the performance of the model in its own way. R² is used to check the linear correlation between the experimental value and the predicted value. When 0.8 < R² < 1, the model is considered valid [36]. RMSE is used to evaluate the difference between the experimental value and the predicted value and is the most commonly used index to measure the regression quality of regression trees. MAE is used to assess the average error between the experimental value and the predicted value. For these two indicators, the lower the value, the better the model performance. The formulas for R², RMSE, and MAE are as follows:

$$R^2 = \frac{\left[ \sum_{i=1}^{N} (y_i - \bar{y})(f_i - \bar{f}) \right]^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2 \sum_{i=1}^{N} (f_i - \bar{f})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - f_i)^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - f_i \right|$$

where $N$ is the number of sample data, $y_i$ is the experimental value (MPa), $\bar{y}$ is the average of the experimental values (MPa), $f_i$ is the predicted value of the model regression (MPa), and $\bar{f}$ is the average of the predicted values from the model regression (MPa).
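The three indices can be computed as in the sketch below. Because the symbol definitions include the mean of the predicted values and R² is described as a measure of linear correlation, the sketch assumes the squared-correlation form of R²; this differs from the 1 − SSE/SST form used by some libraries. The sample values are made up.

```python
import math


def r_squared(y, f):
    """Squared Pearson correlation between experimental and predicted values."""
    n = len(y)
    y_bar, f_bar = sum(y) / n, sum(f) / n
    cov = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, f))
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    var_f = sum((fi - f_bar) ** 2 for fi in f)
    return cov ** 2 / (var_y * var_f)


def rmse(y, f):
    """Root mean square error."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / len(y))


def mae(y, f):
    """Mean absolute error."""
    return sum(abs(yi - fi) for yi, fi in zip(y, f)) / len(y)


y_exp = [2.0, 4.0, 6.0, 8.0]    # hypothetical experimental values (MPa)
y_pred = [2.5, 3.5, 6.5, 7.5]   # hypothetical predicted values (MPa)
scores = (r_squared(y_exp, y_pred), rmse(y_exp, y_pred), mae(y_exp, y_pred))
```

Every prediction here is off by exactly 0.5 MPa, so RMSE and MAE both equal 0.5, while R² is 324/340 ≈ 0.953.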

Model prediction and result analysis
The purpose of this study was to evaluate the performance of the LSTM model and the RF model for CSG compressive strength prediction. To analyze and compare the predictive performance of the two models, the experimental values were plotted against the predicted values. The input data set included cement content, sand rate, water-binder ratio, and fly ash content, and the output variable was compressive strength. R², RMSE, and MAE were used as the performance indexes for model evaluation. In this way, the effectiveness of a deep learning algorithm (LSTM) and a machine learning algorithm (RF) in CSG compressive strength prediction was compared.
Both the LSTM model and the RF model successfully predicted CSG compressive strength. Grid search is a widely used tuning technique in machine learning. It evaluates, through cross-validation, the model performance for every combination of parameters within a given range to find the optimal set of parameters. This method is applicable to both machine learning and deep learning and is robust in operation. Because it exhausts all parameter combinations within the given range, it finds the best combination on that grid, which makes it especially suitable for problems with a small parameter space [37,38]. In view of these advantages and the small parameter space of the proposed models, grid search was used to optimize the parameters of the LSTM and RF models.
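The procedure can be sketched generically in pure Python: enumerate every parameter combination, score each one by k-fold cross-validation, and keep the best. The helper names, the fold scheme, and the deliberately trivial "shrunk mean" model with its `shrink` and `offset` parameters are hypothetical stand-ins for the LSTM and RF models tuned in this article.

```python
from itertools import product


def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def grid_search(param_grid, fit, score, X, y, k=5):
    """Return the parameter combination with the lowest mean validation error."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        fold_scores = []
        for train, val in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
        mean_score = sum(fold_scores) / k
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_shrunk_mean(X, y, shrink, offset):
    """Toy model: predict a scaled and shifted training mean."""
    return shrink * sum(y) / len(y) + offset


def mean_abs_error(model, X, y):
    return sum(abs(t - model) for t in y) / len(y)


X = [[float(i)] for i in range(20)]
y = [9.0 + i % 3 for i in range(20)]  # synthetic targets
grid = {"shrink": [0.5, 1.0], "offset": [0.0, 1.0]}
best_params, best_score = grid_search(grid, fit_shrunk_mean, mean_abs_error, X, y)
```

The cost grows with the product of the grid sizes, which is why the approach suits small parameter spaces.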
For the parameter adjustment of the LSTM model, grid search was used to find the optimal parameter structure, and cross-validation was carried out within a given range of the relevant parameters. The optimal parameter structure was obtained as follows: the activation function was "relu," the time step was 2, the number of hidden units was 256, the batch size was 32, the learning rate was 0.001, and the output dimension was 1, corresponding to the single output variable (compressive strength). In addition, the number of training epochs was 250.

The tuning of the RF model mainly involves two parameters: the number of decision trees and the number of leaf nodes [39]. The number of decision trees is often referred to as "the number of trees in a RF," and the number of leaf nodes controls the depth of each decision tree. These two parameters affect the overall performance and running speed of the RF, and a few other parameters, such as "min_samples_split" and "max_feature," also have a certain impact on the model. In the given parameter ranges, the number of decision trees was 0-200 and the number of leaf nodes was 0-10. After cross-validation, the optimal parameter structure was obtained as follows: the number of decision trees was 39, the number of leaf nodes was 1, "min_samples_split" was 2, and "max_feature" was "auto", where "min_samples_split" and "max_feature" took their default values. The model obtained its best performance under this parameter structure.
The LSTM and RF models used the same training set, validation set, and test set to facilitate comparative analysis, and training was carried out under the optimal parameter structure. The prediction results were then analyzed. Figures 5 and 6 show the errors between the predicted values and the experimental values of the LSTM and the RF models, respectively, under different data sets. From the bar charts, it could be seen that the absolute error of the LSTM model's prediction results under the training set and the test set was [...] between the predicted value and the experimental value. In addition, for both the LSTM model and the RF model, the R² values in the training set and the test set were close to each other, indicating that the models were not overfitted, which verified the effectiveness of the models.
Statistical analysis was carried out on the LSTM model and the RF model, and the evaluation indicators are shown in Table 5. The R² value of the LSTM model was 0.9940, closer to 1 than that of the RF model, and its RMSE of 0.1248 and MAE of 0.0960 were both much lower than those of the RF model, indicating that the prediction accuracy of the LSTM model was higher. The results of this study were similar to those of the previous literature. Gao [40] used a deep learning CNN to predict the compressive strength of recycled concrete and compared it with the traditional machine learning BP neural network and SVM. The results showed that CNN had higher prediction accuracy: for the 28-day compressive strength, the training error was 0.25% and the test error was 0.66%. In addition, Latif [27] predicted the compressive strength of concrete through deep learning LSTM and compared it with the traditional machine learning SVM. The results showed that the accuracy of the LSTM model was better than that of SVM, with R² values of 0.98 and 0.78, respectively. Similarly, Chen et al. [41] used LSTM to predict the compressive strength of high-strength concrete and compared it with the traditional SVR. The results showed that the LSTM model had higher accuracy and reliability. Therefore, the results of this study are consistent with those of other studies in the field, and LSTM can be used as a reliable prediction model for CSG.

Interpretability analysis
The above research showed that for CSG with a given mix ratio, the LSTM model can accurately predict its compressive strength. If the predicted compressive strength does not meet expectations, the content of each component of CSG needs to be adjusted repeatedly. However, when the influence and contribution of each input variable to the output are unknown, these attempts are blind and involve a lot of trial and error. Therefore, to enhance the interpretability of the model, this article applied SHAP, an interpretable machine learning method, to study the importance of each input variable to the output as well as the magnitude and sign of its contribution.
As shown in Figure 8, the average SHAP value on the X axis indicates how important each input variable was to the output result. In this study, the input variable that had the greatest influence on the compressive strength of CSG was cement content, followed by sand rate, water-binder ratio, and fly ash content. In addition, the global influence of the input features is illustrated in Figure 9, where each point represents the feature value and SHAP value of one observation in the data set. The X axis represents the SHAP value of each input variable, i.e., the impact of each input variable on compressive strength, and the Y axis represents the importance ranking of the four input variables. Whether the high feature values of a variable fall at positive or negative SHAP values indicates whether that input variable had a positive or negative effect on the output result. It could be clearly seen from Figure 9 that the influence of cement content and fly ash content on compressive strength was positive, i.e., the compressive strength increased with the increase in dosage. On the contrary, the influence of sand rate and water-binder ratio on compressive strength was negative: an increase in either of them led to a decrease in the compressive strength of CSG. This was consistent with the actual law within a certain range, so the interpretation result had a certain credibility.

Conclusion
In this article, deep learning LSTM is used to predict the compressive strength of CSG, and the prediction results of the LSTM model are compared with machine learning RF.
The main conclusions are summarized as follows:
1) The LSTM model can deal well with the complex nonlinear relationship between variables, and the coefficient of determination R² exceeds 0.99 in both the training set and the test set.
2) Compared with the traditional machine learning RF, deep learning LSTM has higher prediction accuracy and better performance. It can be used as a method to predict the compressive strength of CSG. The predicted compressive strength of CSG can be obtained through the LSTM model before the laboratory compression test, which will greatly reduce the time and material costs of laboratory testing and is good for environmental protection.
3) Among the four input variables in this article, cement content and sand rate are the two variables that have the greatest influence on compressive strength.
4) The influence of cement content and fly ash content on compressive strength is positive, and the compressive strength increases with the increase in their content, while the influence of sand rate and water-binder ratio on compressive strength is negative, and their increase will lead to a decrease in the compressive strength of CSG.
In this article, LSTM and RF models are proposed to predict the compressive strength of CSG, and both predict it successfully. LSTM has better accuracy and generalization ability, but there are some limitations. Since LSTM is a recurrent neural network, it requires a lot of computational resources and time, and it is recommended that a more simplified deep learning model be developed in future studies. The performance of RF models is often limited when processing high-dimensional data, and this problem should also be examined in future studies. In addition, evaluating the generalization ability of the model is very important and is one of the challenges faced by the current research field. In future studies, more data sets can be used to evaluate the generalization ability of the model in this article so that it better meets the needs of practical applications. Besides expanding the data set, it is necessary to integrate the prediction model into existing systems or software before practical application and to consider strategies for data updating and model retraining to ensure the effectiveness and sustainability of the model. Furthermore, better methods can be adopted for model interpretation and analysis to determine the limitations and risks of the model.

Figure 1: Preparation process of a CSG specimen.

Figure 2: Compressive strength test of a 150 mm cube CSG specimen.
For the LSTM model, the number of training epochs was 250, which largely determined the fitting and convergence of model training. The model required many iterations to converge. As the number of epochs increased, the number of weight updates increased, and the curve changed from an initial underfitted state to an optimally fitted state and would eventually enter an overfitted state. The iteration of the LSTM model in this article is shown in Figure 4. It could be seen from the figure that both the training set and the validation set began to converge when the number of iterations reached 200 and then entered the optimal fitting state. Therefore, the number of epochs of the LSTM model in this article was set to 250.

Figure 5: Error plots of predicted and experimental values of the LSTM model in different data sets: (a) train set and (b) test set.

Figure 6: Error plots of predicted and experimental values of the RF model in different data sets: (a) train set and (b) test set.

Figure 7: Fitting curves of different models: (a) LSTM model and (b) RF model.

Figure 8: Global importance of the input variables.

Figure 9: Global feature influences of the input features.

Table 1: Mix ratio of CSG

Table 2: Physical and mechanical properties of cement

Table 4: Statistical description of the overall data

Table 5: Model performance comparison