BY 4.0 license Open Access Published by De Gruyter Open Access July 1, 2022

An intelligent approach for reservoir quality evaluation in tight sandstone reservoir using gradient boosting decision tree algorithm

  • Longfei Ma, Hanmin Xiao, Jingwei Tao, Taiyi Zheng and Haiqin Zhang
From the journal Open Geosciences

Abstract

This article studies the identification of tight sandstone reservoir quality using machine learning. The Gradient Boosting Decision Tree (GBDT) algorithm is used to classify reservoir quality. First, based on logging data, core observation, cast thin sections, and reservoir physical property statistics, permeability, porosity, resistivity, shale content, sand-to-ground ratio, and sand thickness were selected as the reservoir evaluation criteria for the area, and the gray correlation method was used to obtain reservoir quality categories and construct the training dataset. The GBDT algorithm was then trained and tested on this dataset. Confusion matrix analysis shows that the recognition accuracy of the GBDT model is 95%. To verify the reliability of the GBDT model, it is compared with four commonly used reservoir prediction methods (Bayesian discriminant method, random forest, support vector machine, and artificial neural network). Finally, the GBDT model is used to identify the reservoir quality of the study area, and the results agree well with the production data. The results show that the GBDT model can become an important tool for rapid, real-time evaluation of tight sandstone reservoirs.

1 Introduction

With the continuous growth of global oil and gas demand, the continuous decline of conventional oil and gas production, and the steady progress of exploration theory and technology, unconventional oil and gas resources represented by tight sandstone reservoirs have gradually become a new hotspot of research, exploration, and development, attracting increasing attention both in China and abroad. In recent years, China’s unconventional oil and gas exploration has made rapid progress and has entered a stage of strategic breakthroughs. At the same time, the industrial and strategic value of tight sandstone reservoirs has attracted growing attention in oil and gas exploration and development [1,2,3,4,5]. Tight sandstone reservoirs are characterized by low porosity and strong heterogeneity and are strongly affected by diagenesis. Against this background of generally ultra-low petrophysical properties, identifying the relatively high-porosity and high-permeability reservoirs has become one of the key problems to be solved in oil and gas exploration and development [6,7,8,9]. China’s tight oil development started late but has progressed rapidly. So far, many large-scale tight oil reserves have been discovered in the Ordos, Songliao, Junggar, and Bohai Bay basins. Among them, the tight oil resources of the Ordos Basin amount to 1.99 billion tons. This billion-ton oil field was discovered in the Chang 7 oil source layer of the Qingcheng Oilfield in the Ordos Basin in September 2019. The average thickness of the formation is about 100 m, and the lithologically stable “Zhangjiatan Shale” is developed at the bottom. Current research on these tight sandstone reservoirs mainly focuses on the basic characterization of the sedimentary environment, diagenesis, and pore throat characteristics [10]. As a result, the evaluation techniques are relatively macroscopic, and the control parameters of high-quality reservoirs are still poorly understood. Consequently, reservoir evaluation is difficult to carry out rationally, which to a large extent hinders the effective development of tight oil and gas. Identifying the reservoir “sweet spot” is the core task of tight oil and gas exploration; it is directly related to the selection of exploration targets and the evaluation of tight oil and gas resources, and it is a prerequisite for the effective development of tight oil and gas. Accurate evaluation of tight oil and gas reservoir quality is therefore a central goal of petroleum engineers.

The traditional reservoir quality evaluation method starts from the controlling factors of the physical properties of tight reservoirs and clarifies the constructive or destructive effects of geological factors such as structure, deposition, and diagenesis on those properties. The quality of a tight reservoir is evaluated by judging the nature and coupling relationships of the geological factors in different areas, and the area where constructive geological factors overlap is taken as the high-quality area of the tight reservoir [11,12,13]. This method analyzes reservoirs from the perspective of the genesis of tight oil, which is of great significance to oil and gas exploration, but it cannot easily give a quantitative standard for reservoir grading. With the development of modern testing technology, some scholars classify reservoir quality by the permeability associated with different pore types, based on the characterization and classification of microscopic pore seepage [14,15]. This method requires a large number of pore tests, such as scanning electron microscopy, thin sections, and mercury intrusion analysis, so it is difficult to apply in areas lacking such data. The intelligent and digital transformation of oil and gas exploration and development has become a hot spot and development trend in the industry, aiming to improve efficiency and quality while reducing costs and risks. Machine learning is developing rapidly and is used across the world, and its application to the classification and evaluation of reservoir quality has become particularly important. By applying supervised learning algorithms, the machine can learn from experience, identify complex associations across various datasets, extract the necessary information, improve the accuracy of reservoir quality evaluation, reduce exploration risks, and determine the break-even point [16]. The gradient boosting decision tree (GBDT) is a machine learning model that can cope with data imbalance. Its principle is to fit regression trees to the residuals between the current prediction and the target value, and then to use the boosting algorithm to reduce the residuals continuously so that the calculated value gradually approximates the target value [17,18,19,20,21]. Because each regression tree treats different residual values differently, the training results are little affected even if there are erroneous points in the sample. GBDT therefore has both a reasonable training process and good robustness [22].

In recent years, artificial intelligence has been widely used in various fields closely related to people’s lives, and scholars in China and abroad have done a great deal of research. Hamid Reza Amedi used artificial neural networks (ANNs) to predict the solubility of hydrogen sulfide at different ion concentrations, temperatures, and pressures [23]. Mohammad-Ali Ahmadi used ANNs to estimate the dissolved calcium carbonate concentration in oilfield brine [24], to estimate permeability and porosity [25], to evaluate chemical oil displacement efficiency in reservoirs [26], and to predict asphaltene precipitation due to natural depletion [27]. On this basis, the unified particle swarm optimization (UPSO) algorithm was used to optimize the neural network model [28] for predicting asphaltene precipitation due to natural depletion [29], calculating the dew point pressure of condensate gas reservoir production fluid [30], monitoring the condensate gas ratio in condensate gas reservoirs [31], and predicting the solubility of hydrogen sulfide in the presence of various ions [32,33]. Seyedeh Raha Moosavi developed RF and Q prediction models that include optimized multilayer perceptron and radial basis function neural networks [34]. Gu Yufeng used a cyclic neural network and the intelligent forecaster CRBM-Ga-PSO-CRBM to predict lithology, and also predicted lithology with the data-driven, ensemble-learning-based model CRBM-LD-AFSA-LightGBM [35,36,37]. Runhai Feng used scaling algorithms to improve the uncertainty analysis of lithofacies classification in machine learning methods [38].

In response to the above problems, this study divides the reservoirs into four categories using a large amount of logging data, core observations, cast thin sections, and reservoir physical property statistics, together with the gray correlation algorithm. The GBDT machine learning algorithm is then used to learn and classify the quality types of tight sandstone reservoirs. Finally, the advantages of the machine learning method are clarified by comparing it with the traditional reservoir quality classification methods. A single-well reservoir classification map of the target area is generated, which provides guidance and methods for the fine exploration and development of tight oil and gas reservoirs in the Yanchang Formation.

2 Geological background

2.1 Detrital composition and rock fabric

As shown in Figure 1, the petrological properties of the Chang 7 tight sandstone reservoir in the study area were evaluated using a large number of cast thin sections and scanning electron microscopy. The rock types of the Chang 7 member in the study area are mainly lithic feldspar sandstone and feldspathic lithic sandstone.

Figure 1

The rock types of the Chang 7 member in the Ordos Basin [39].

Figure 2 shows that the porosity and permeability of the Chang 7 member are moderately correlated, with a correlation coefficient of 0.621, which is typical of a tight reservoir. The porosity is mainly distributed between 8 and 12%, with an average of 9.45%. The permeability is mainly distributed between 0.08 and 0.2 mD, with an average of 0.154 mD. The petrophysical characteristics of the reservoir are therefore generally ultralow porosity and ultralow permeability.

Figure 2

Cross plot of porosity and permeability of the Chang 7 sandstone.

2.2 Diagenetic features

During the tectonic period between the mid–late Triassic and the early Jurassic, the Indosinian movement occurred in China and its surrounding areas and had an extremely important influence on the evolution of the Ordos Basin. In the Chang 7 period, the lake basin subsided strongly. The sedimentary facies of this period were dominated by deep lake and semi-deep lake facies, forming the main oil-generating layer of the Yanchang Formation. Cast thin section and scanning electron microscope observations show that the pore types of the Chang 7 tight reservoir in the study area are mainly primary intergranular pores, dissolution pores, and intercrystalline pores, among which dissolution pores dominate. Later diagenetic modification changed the pore structure of the reservoir. Figure 3 shows enlarged quartz and feldspar cement filling the residual intergranular pores and throats, a dense structure with matrix altered to illite clay between the grains, albite filling the residual pores left by minor debris dissolution, and quartz filling the intergranular pore throats. This later diagenesis reduced the contribution of pores to the reservoir space, so the reservoir generally shows low porosity and low permeability.

Figure 3

Main diagenesis characteristics of the Chang 7 sandstone in Gucheng area. (a) Enlarged quartz and feldspar cemented filling residual intergranular pore throat morphology; (b) the structure is dense, and the hetero-base altered illite clay can be seen between the grains; (c) filling albite in the residual pores of a small amount of debris dissolution; and (d) quartz filling growth in the intergranular pore throat.

3 Data acquisition and research methods

Ten wells were selected for core analysis, mercury injection experiments, oil testing, and other measurements to analyze the physical properties of the reservoir in the study area. Based on the actual geological conditions of the study area, this study selected permeability, porosity, resistivity, shale content, sand-to-ground ratio, and sand thickness as the reservoir evaluation criteria, and defined permeability as the main factor and the rest as sub-factors (Table 1). The gray correlation degree was used to calculate the weight coefficients of the reservoir evaluation parameters. Finally, the comprehensive scoring model and classification standard for the reservoirs in the study area were established.

Table 1

Reservoir evaluation parameters of some key wells in Chang 7 member of Yanchang Formation, Ordos Basin

Well Layer Permeability (mD) Porosity (%) True formation resistivity (Ω m) NEWSAND (%) Ratio of sandstone thickness to formation thickness Sandstone thickness (m)
A1 Chang7 0.0871 7.2900 111.2130 10.5010 0.0749 8.6250
A2 Chang7 0.1060 7.0000 41.0020 23.3490 0.0000 0.0000
A3 Chang7 0.0970 10.1000 38.4590 8.6570 0.0287 3.5000
A4 Chang7 0.2890 12.4000 56.6970 17.1850 0.0441 5.3750
A5 Chang7 0.2890 13.1000 126.5960 11.3410 0.0113 1.3750
A6 Chang7 0.2010 7.0000 57.8090 99.9900 0.0000 0.0000
A7 Chang7 0.1140 10.3000 46.9890 29.6210 0.0000 0.0000
A8 Chang7 0.3080 13.3000 117.4610 10.9350 0.0418 4.8750
A9 Chang7 0.1350 10.3000 37.3950 14.6560 0.0212 2.8750
A10 Chang7 0.1150 9.9000 38.1810 63.6610 0.0120 1.6250
A11 Chang7 0.1720 9.9000 73.3820 7.5340 0.0279 2.8750
A12 Chang7 0.0656 8.2000 46.2730 52.9210 0.0000 0.0000
A13 Chang7 0.0222 5.3000 42.8210 20.3790 0.0351 4.1000

3.1 Gray correlation analysis principle

Gray system theory was proposed by Professor Deng Julong [40,41]. It is mainly used to solve uncertainty problems such as lack of data and poor information [42]. Gray relational analysis is applied here to obtain a comprehensive, multi-factor method of evaluating reservoirs.

  1. Let the reference series and the comparison series be:

    $$X_0 = \{x_1(0), x_2(0), \ldots, x_n(0)\}, \quad X_j = \{x_1(j), x_2(j), \ldots, x_n(j)\}, \quad j = 1, 2, 3, \ldots, m$$

  2. According to the reference series and the comparison series, the following raw data can be formed:

    $$X = \begin{bmatrix} x_1(0) & x_1(1) & x_1(2) & \cdots & x_1(m) \\ x_2(0) & x_2(1) & x_2(2) & \cdots & x_2(m) \\ \vdots & \vdots & \vdots & & \vdots \\ x_n(0) & x_n(1) & x_n(2) & \cdots & x_n(m) \end{bmatrix} \tag{1}$$

  3. Data preprocessing:

    $$x_i' = \frac{x_i}{x_{\max}}, \quad i = 1, 2, \ldots, n \tag{2}$$

  4. Calculate the correlation coefficient

    $$\xi_i(k) = \frac{\Delta(\min) + \rho\,\Delta(\max)}{\Delta_i(k) + \rho\,\Delta(\max)}, \quad k = 1, 2, \ldots, m \tag{3}$$

    where $\Delta_i(k) = |x_0(k) - x_i(k)|$ is the absolute difference between the reference series and the comparison series; $\Delta(\max) = \max_i \max_k \Delta_i(k)$ is the maximum absolute difference between the reference sequence and the comparison sequences; $\Delta(\min) = \min_i \min_k \Delta_i(k)$ is the minimum absolute difference between the reference sequence and the comparison sequences; $\rho$ is the distinguishing coefficient; and $\xi_i(k)$ is the gray correlation coefficient.

  5. Weight coefficient

    The correlation coefficient describes the relation between the reference sequence and a comparison sequence at each individual point. To facilitate an overall comparison, the average of the correlation coefficients is used as the degree of correlation between the reference series and each comparison series. The weight coefficient of each parameter is then its degree of correlation divided by the sum of the degrees of correlation of all parameters. The formula for the degree of correlation is as follows:

    $$\gamma_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k) \tag{4}$$
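The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the value 0.5 for the distinguishing coefficient ρ is an assumption, since the text does not state it. The last part feeds the correlation coefficients listed in Table 2 through equation (4) and normalizes the resulting degrees so that they sum to 1, which can be compared with the weights in Table 3.

```python
def normalize_by_max(series):
    """Equation (2): divide every value by the series maximum."""
    peak = max(series)
    return [v / peak for v in series]


def gray_coefficients(reference, comparisons, rho=0.5):
    """Equation (3) for every comparison series.

    rho is the distinguishing coefficient; 0.5 is an assumed value.
    Assumes at least one comparison series differs from the reference.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, comp)] for comp in comparisons]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in deltas]


def weights_from_coefficients(columns):
    """Equation (4), then normalization so that the weights sum to 1."""
    degrees = [sum(col) / len(col) for col in columns]
    total = sum(degrees)
    return [g / total for g in degrees]


# Correlation coefficients from Table 2, one list per parameter (wells A1..A13).
TABLE2_COLUMNS = [
    [1.0] * 13,  # permeability (reference series)
    [0.6266, 0.7097, 0.5005, 0.9867, 0.9052, 0.7791, 0.5241,
     1.0000, 0.5699, 0.5455, 0.7055, 0.5246, 0.5770],  # porosity
    [0.4014, 0.9891, 0.9727, 0.4943, 0.7615, 0.7351, 0.9371,
     1.0000, 0.7878, 0.9021, 0.8704, 0.7111, 0.6036],  # resistivity
    [0.7147, 0.8010, 0.6610, 0.3675, 0.3506, 0.5617, 0.8577,
     0.3333, 0.6042, 0.6284, 0.4796, 0.5847, 0.7717],  # NEWSAND
    [0.3831, 0.5641, 0.8677, 0.5597, 0.3611, 0.4056, 0.5461,
     0.5021, 0.7416, 0.6767, 0.7054, 0.6765, 0.5291],  # thickness ratio
    [0.3831, 0.5641, 0.8305, 0.5856, 0.3638, 0.4056, 0.5461,
     0.5060, 0.8092, 0.7065, 0.6642, 0.6765, 0.5248],  # sandstone thickness
]

weights = weights_from_coefficients(TABLE2_COLUMNS)
# Compare with Table 3: 0.2367, 0.1630, 0.1851, 0.1405, 0.1369, 0.1378
print([round(w, 4) for w in weights])
```

`normalize_by_max` and `gray_coefficients` are included only to show how equations (2) and (3) fit together; the raw values in Table 1 are not run through them here because the exact preprocessing used to obtain Table 2 is not fully specified in the text.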

3.2 Calculation of correlation degree and weight coefficient

After each parameter was standardized, the gray correlation coefficients between the reference series and the comparison series were calculated by equation (3) (Table 2). According to equation (4), the weight coefficients of the six reservoir evaluation indexes were obtained (Table 3). The comprehensive score of the reservoir is the sum of each parameter value multiplied by its weight coefficient. The calculation formula is as follows:

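$$\begin{aligned} \text{Comprehensive reservoir score} ={}& 0.2367 \times \text{Permeability} + 0.1630 \times \text{Porosity} + 0.1851 \times \text{RT} + 0.1405 \times \text{SH} \\ &+ 0.1369 \times \text{Ratio of sandstone thickness to formation thickness} + 0.1378 \times \text{Sandstone thickness} \end{aligned} \tag{5}$$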

Table 2

Correlation coefficient data of reservoir evaluation parameters of some key wells in Chang 7 member of Yanchang Formation, Ordos Basin

Well Layer Permeability (mD) Porosity (%) True formation resistivity (Ω m) NEWSAND (%) Ratio of sandstone thickness to formation thickness Sandstone thickness (m)
A1 Chang7 1.0000 0.6266 0.4014 0.7147 0.3831 0.3831
A2 Chang7 1.0000 0.7097 0.9891 0.8010 0.5641 0.5641
A3 Chang7 1.0000 0.5005 0.9727 0.6610 0.8677 0.8305
A4 Chang7 1.0000 0.9867 0.4943 0.3675 0.5597 0.5856
A5 Chang7 1.0000 0.9052 0.7615 0.3506 0.3611 0.3638
A6 Chang7 1.0000 0.7791 0.7351 0.5617 0.4056 0.4056
A7 Chang7 1.0000 0.5241 0.9371 0.8577 0.5461 0.5461
A8 Chang7 1.0000 1.0000 1.0000 0.3333 0.5021 0.5060
A9 Chang7 1.0000 0.5699 0.7878 0.6042 0.7416 0.8092
A10 Chang7 1.0000 0.5455 0.9021 0.6284 0.6767 0.7065
A11 Chang7 1.0000 0.7055 0.8704 0.4796 0.7054 0.6642
A12 Chang7 1.0000 0.5246 0.7111 0.5847 0.6765 0.6765
A13 Chang7 1.0000 0.5770 0.6036 0.7717 0.5291 0.5248
Table 3

Weight coefficient data of reservoir evaluation parameters of some key wells in Chang 7 member of Yanchang Formation, Ordos Basin

Evaluation parameters Permeability Porosity True formation resistivity NEWSAND Ratio of sandstone thickness to formation thickness Sandstone thickness
Weight coefficient 0.2367 0.1630 0.1851 0.1405 0.1369 0.1378

According to equation (5), each standardized evaluation index was multiplied by its weight coefficient and the products were summed to obtain the comprehensive score (Table 4). From the evaluation factors in Table 4, an inflection point diagram was drawn, giving the threshold values of four types of reservoirs, namely I, II, III, and IV (Figure 4). Type I reservoirs, with a comprehensive evaluation factor greater than 0.6, are high-quality reservoirs with high porosity and permeability. Type II reservoirs, with a comprehensive evaluation factor between 0.5 and 0.6, have medium porosity and permeability and are of good quality. Type III reservoirs, with a comprehensive evaluation factor between 0.3 and 0.5, have poor porosity and permeability and are of poor quality. Type IV reservoirs, with a comprehensive evaluation factor less than 0.3, have the worst porosity, permeability, and quality, and yield no industrial oil flow.

Table 4

Comprehensive reservoir evaluation factors of some key wells in Chang 7 member of Yanchang Formation in Ordos Basin

Well Layer Permeability (mD) Porosity (%) True formation resistivity (Ω m) NEWSAND (%) Ratio of sandstone thickness to formation thickness Sandstone thickness (m) The evaluation factors
A1 Chang7 1.0000 0.6266 0.4014 0.7147 0.3831 0.3831 0.6210
A2 Chang7 1.0000 0.7097 0.9891 0.8010 0.5641 0.5641 0.2647
A3 Chang7 1.0000 0.5005 0.9727 0.6610 0.8677 0.8305 0.3794
A4 Chang7 1.0000 0.9867 0.4943 0.3675 0.5597 0.5856 0.6539
A5 Chang7 1.0000 0.9052 0.7615 0.3506 0.3611 0.3638 0.6406
A6 Chang7 1.0000 0.7791 0.7351 0.5617 0.4056 0.4056 0.4719
A7 Chang7 1.0000 0.5241 0.9371 0.8577 0.5461 0.5461 0.3295
A8 Chang7 1.0000 1.0000 1.0000 0.3333 0.5021 0.5060 0.7545
A9 Chang7 1.0000 0.5699 0.7878 0.6042 0.7416 0.8092 0.3942
A10 Chang7 1.0000 0.5455 0.9021 0.6284 0.6767 0.7065 0.4073
A11 Chang7 1.0000 0.7055 0.8704 0.4796 0.7054 0.6642 0.4767
A12 Chang7 1.0000 0.5246 0.7111 0.5847 0.6765 0.6765 0.2982
A13 Chang7 1.0000 0.5770 0.6036 0.7717 0.5291 0.5248 0.3078
Figure 4

Boundary of reservoir classification in the study.

Using the above method, the single-well reservoir classification of a well in the study area was obtained (Table 5). The classified reservoir types were consistent with the oil field test results. The same method was used to classify the reservoir types of the remaining wells, and the results served as the training sample data for machine learning.
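As a rough check of equation (5) and the class thresholds, the following sketch recomputes the evaluation factor and reservoir type for four rows of Table 5. The function names are hypothetical, and the handling of scores that fall exactly on a threshold (0.3, 0.5, 0.6) is an assumption, because the text does not say which class a boundary value belongs to.

```python
# Weight coefficients from Table 3, in the order used in equation (5).
WEIGHTS = (0.2367, 0.1630, 0.1851, 0.1405, 0.1369, 0.1378)


def comprehensive_score(normalized_params):
    """Equation (5): weighted sum of the six normalized parameters."""
    return sum(w * v for w, v in zip(WEIGHTS, normalized_params))


def reservoir_type(score):
    """Map a comprehensive score to type I-IV (boundary handling is assumed)."""
    if score > 0.6:
        return "I"
    if score >= 0.5:
        return "II"
    if score >= 0.3:
        return "III"
    return "IV"


# Four rows taken from Table 5: six normalized parameters per row.
rows = [
    (0.5966, 0.8892, 0.6282, 0.0651, 0.6889, 0.6889),
    (0.4731, 0.8436, 0.6477, 0.0719, 0.6889, 0.6889),
    (0.2356, 0.7198, 0.6578, 0.0833, 0.6889, 0.6889),
    (0.0046, 0.0065, 0.6551, 0.1291, 0.0000, 0.0000),
]

for params in rows:
    score = comprehensive_score(params)
    print(f"{score:.4f} {reservoir_type(score)}")
```

The printed scores and types can be compared with the last two columns of the corresponding rows in Table 5.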

Table 5

Comprehensive evaluation and classification of some key wells in Chang 7 member of Yanchang Formation in Ordos Basin

Permeability (mD) Porosity (%) True formation resistivity (Ω m) NEWSAND (%) Ratio of sandstone thickness to formation thickness Sandstone thickness (m) The evaluation factors Reservoir types
0.5966 0.8892 0.6282 0.0651 0.6889 0.6889 0.6008 I
0.6422 0.9041 0.6357 0.0635 0.6889 0.6889 0.6152 I
0.5880 0.8863 0.6429 0.0658 0.6889 0.6889 0.6011 I
0.4731 0.8436 0.6477 0.0719 0.6889 0.6889 0.5687 II
0.3446 0.7848 0.6516 0.0782 0.6889 0.6889 0.5303 II
0.2356 0.7198 0.6578 0.0833 0.6889 0.6889 0.4958 III
0.1645 0.6637 0.6600 0.0901 0.6889 0.6889 0.4712 III
0.1399 0.6398 0.6586 0.0922 0.6889 0.6889 0.4615 III
0.0046 0.0065 0.6551 0.1291 0.0000 0.0000 0.1415 IV
0.0046 0.0065 0.6485 0.1543 0.0000 0.0000 0.1438 IV
0.0046 0.0065 0.6339 0.2558 0.0000 0.0000 0.1554 IV
0.0046 0.0065 0.6091 0.4156 0.0000 0.0000 0.1733 IV
0.0046 0.0065 0.5789 0.5179 0.0000 0.0000 0.1820 IV

3.3 Research methods

3.3.1 GBDT algorithm

The GBDT algorithm performs classification or regression by continuously reducing the residuals generated during training. It has been widely used since it was introduced. Its advantages include high prediction accuracy, the ability to process both continuous and discrete data, support for robust loss functions, and strong robustness to outliers [43]. The training process of the GBDT algorithm is shown in Figure 5. In each iteration, the GBDT algorithm uses a weak learner called the classification and regression tree (CART). Each CART is a simple learner with high bias, low variance, and a depth just sufficient for the task. In the regression problem, each learner in the iterative process is trained on the residual of the previous learner: the negative gradient of the loss function is computed, as in gradient descent, and a regression tree is fitted to it so that the decision model improves continuously [44].

Figure 5

Schematic diagram of training GBDT algorithm.

The main idea of GBDT is to train each new learner in the gradient direction that reduces the error of the previous learner, so that new learners are generated iteratively on the basis of the previous ones. The update formula is as follows [45]:

$$F_{m+1}(x) = F_m(x) + \rho\, h(x), \quad 1 \le m \le M \tag{6}$$

The GBDT algorithm consists of the following four steps.

Step 1: Initialize the learner. The initialization formula is as follows, where $L$ is the loss function, $y_i$ is the target value of sample $i$, and $N$ is the number of samples:

$$F_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho) \tag{7}$$

Step 2: For each iteration $m = 1, 2, \ldots, M$, calculate the negative gradient of the current loss function, which is the fitting target of the regression tree in this iteration:

$$r_{m,i} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x_i) = F_{m-1}(x_i)}, \quad i = 1, 2, 3, \ldots, N \tag{8}$$

Step 3: Fit the base learner $h(x; \alpha)$ to the negative gradient to obtain the optimal base learner of the current iteration:

$$\alpha_m = \arg\min_{\alpha, \beta} \sum_{i=1}^{N} \left(r_{m,i} - \beta\, h(x_i; \alpha)\right)^2 \tag{9}$$

The optimal learning rate ($\rho_m$) is calculated by line search, and the model is updated with the new base learner:

$$F_m(x) = F_{m-1}(x) + \rho_m\, h(x; \alpha_m) \tag{10}$$

If the number of iterations has not reached $M$, repeat Steps 2–3; otherwise proceed to Step 4.

Step 4: Output the final strong learner $F_M(x)$.
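The four steps can be illustrated for the simplest case: squared-error loss and depth-1 regression trees on a single feature. This is a toy sketch, not the model used in the study. The function names are hypothetical, and a fixed learning rate stands in for the line-searched $\rho_m$ of equation (10), which is a deliberate simplification. With the loss $\tfrac{1}{2}(y - F)^2$, the negative gradient in equation (8) is simply the residual $y - F$.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least-squares depth-1 regression tree on one feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss with a fixed step size."""
    f0 = mean(ys)  # Step 1: the constant minimizing squared error is the mean
    predictions = [f0] * len(xs)
    learners = []
    for _ in range(n_rounds):
        # Step 2: negative gradient of 1/2 (y - F)^2 is the residual y - F
        residuals = [y - p for y, p in zip(ys, predictions)]
        # Step 3: fit a base learner to the negative gradient, then update F
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]

    def predict(x):
        # Step 4: the final model is the initial constant plus all scaled learners
        return f0 + learning_rate * sum(stump(x) for stump in learners)

    return predict


# Made-up step-shaped data for illustration only.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]
model = gradient_boost(xs, ys)
print(model(0), model(7))
```

Each round shrinks the remaining residual by a factor of one minus the learning rate on this data, which is why a small learning rate needs many rounds, as in the 300 iterations at a learning rate of 0.01 reported later in the article.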

3.3.2 Machine learning classification evaluation index

Accuracy is a common evaluation index for classification problems, reflecting the ratio of the number of correctly classified samples to the total number of samples [46]. For unbalanced datasets, single indicators such as the recall rate (Recall) and the precision rate (Precision) and comprehensive indicators such as the F-measure are generally used. The calculation formulas are shown in Table 6. We used the confusion matrix to display the predicted and actual classification results visually in matrix form [46]. The confusion matrix is shown in Table 7. True Positive (TP) is the number of positive samples correctly classified as positive; False Positive (FP) is the number of negative samples misclassified as positive; False Negative (FN) is the number of positive samples misclassified as negative; True Negative (TN) is the number of negative samples correctly classified as negative.

Table 6

Calculation formula of machine learning evaluation index

Indicators Formula
Recall Recall = TP/(TP + FN)
Precision Precision = TP/(TP + FP)
F-score F-measure = 2 × Recall × Precision/(Recall + Precision)
Table 7

Confusion matrix

Label = 1 Label = 0
Predict = 1 TP FP
Predict = 0 FN TN
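The formulas in Table 6 translate directly into code. The sketch below is illustrative only: the function names are hypothetical and the 3 × 3 confusion matrix is made-up data. Following the layout of Table 7, rows are taken to be predicted classes and columns actual classes, and each class of a multi-class problem is scored one-vs-rest.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def f_measure(recall_value, precision_value):
    """F-measure = 2 * Recall * Precision / (Recall + Precision)."""
    return 2 * recall_value * precision_value / (recall_value + precision_value)


def one_vs_rest_counts(matrix, k):
    """TP, FP, FN for class k; rows are predicted classes, columns actual classes."""
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                 # predicted k, actually another class
    fn = sum(row[k] for row in matrix) - tp  # actually k, predicted another class
    return tp, fp, fn


# Made-up confusion matrix for three classes (illustration only).
cm = [
    [8, 1, 0],
    [2, 7, 1],
    [0, 2, 9],
]

for k in range(len(cm)):
    tp, fp, fn = one_vs_rest_counts(cm, k)
    r, p = recall(tp, fn), precision(tp, fp)
    print(k, round(r, 3), round(p, 3), round(f_measure(r, p), 3))
```

Applied to a recall of 0.95 and a precision of 0.925, `f_measure` gives approximately 0.937.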

4 Results

4.1 GBDT model construction and data analysis

According to the reservoir evaluation standard of Figure 4, the reservoirs of the study area were divided into four types. Seven logging curves that characterize the physical and electrical properties of the core were taken as input parameters: effective porosity (POR), gamma ray (GR), acoustic (AC), spontaneous potential (SP), resistivity (RT), water saturation (SW), and shale content (SH). The sampling interval was 0.125 m, and a total of 5,515 data points were recorded. The relationship between the data points and reservoir types is shown in Figure 6.

Figure 6

The relationship between data points and reservoir types based on normalized logging curves.

Model construction was divided into the following two steps:

Step 1: Select the feature subset D and divide it into the training set Dtrain and the test set Dtest at a ratio of 8:2 using the train_test_split function.

Step 2: Establish the GBDT model. GridSearchCV was used to perform a grid search with cross-validation to find the most accurate parameters within the specified parameter ranges. With the selected hyperparameters, boosting ran for 300 iterations, the maximum depth of each learner was 3, the minimum number of classification samples was 2, the learning rate was 0.01, and the loss function was the cross-entropy loss. Once the dataset was determined, the GBDT was trained [20,21].
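A minimal sketch of these two steps with scikit-learn is given below. It is not the authors' script. The data are synthetic stand-ins for the seven normalized logging curves and four reservoir types, the parameter grid is a small assumed example that merely contains the reported values, the stratified split and the three-fold cross-validation are assumptions, and mapping "minimum number of classification samples" to `min_samples_split` is a guess. The classifier's default loss is the multi-class log loss (cross-entropy), so it is not set explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 7 features (logging curves), 4 classes (reservoir types).
X, y = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Step 1: 8:2 split into training and test sets (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 2: grid search with cross-validation over an assumed, deliberately small grid.
param_grid = {
    "n_estimators": [300],
    "max_depth": [3],
    "min_samples_split": [2],   # assumed reading of "minimum number of classification samples"
    "learning_rate": [0.01, 0.1],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(search.best_params_)
print(test_accuracy)
print(cm)
```

Because the data here are synthetic, the printed accuracy says nothing about the reservoir dataset; the sketch only shows how the split, the search, and the evaluation fit together.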

4.2 Model training results

The results of GBDT model training and testing are shown in Table 8. The accuracy on the training set and test set was 100 and 95%, and the corresponding F-scores were 1.00 and 0.937, respectively. The test set confusion matrix of the GBDT model (Figure 7) shows that the model predicted Class I, II, III, and IV reservoirs with 90, 92, 90, and 98% accuracy, respectively. The accuracy was thus lowest for Type I and Type III reservoirs. Overall, the model achieved good results, and its prediction accuracy is sufficient to provide reliable results. In addition, Figure 8 shows the original and predicted reservoir quality logging curves for the coring well. The accuracy for thin interbeds of types I and III was slightly worse than that for type IV, which confirms the conclusion of the confusion matrix analysis. It can therefore be inferred that the proposed GBDT algorithm is suitable for tight sandstone reservoir quality evaluation.

Table 8

Model training results

Evaluation Test set of GBDT model Training set of GBDT model
Acc 95% 100%
Recall 95% 100%
Precision 92.5% 100%
F-score 0.937 1.00
Macro F1-score 0.94 1.00
Micro F1-score 0.96 1.00
Weighted F1-score 0.96 1.00
Figure 7

Confusion matrix diagram of GBDT model test set.

Figure 8

Evaluation results of the A88 well.

4.3 Feature importance

When the dataset is large, the amount of calculation required by the classifier increases, which increases the running time. To improve the effectiveness of the analysis, before applying the machine learning algorithm we selected the parameters with the strongest correlation between input data and output data. The relative importance of each feature parameter in the GBDT model is shown in Figure 9. AC was the most important feature (>0.3), followed by POR (0.2), whereas the importance of SP and SH was low (0.027 and 0.052, respectively). When only the five most important parameters (AC, RT, SW, POR, and GR) were used, the discrimination accuracy of the GBDT model was 88%. However, the accuracy for class I, II, and III reservoirs decreased to 67, 57, and 75%, respectively, with the largest decrease, 35 percentage points, in class II (Figure 10). Therefore, the input parameters SP and SH were retained to ensure the prediction accuracy of the GBDT model.
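The feature-selection experiment described above can be sketched with scikit-learn's `feature_importances_` attribute. The data are synthetic, so the curve names attached to the columns are labels only and the resulting ranking has no geological meaning; the helper `fit_and_score` is hypothetical, and default hyperparameters are used for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Curve names from the article, attached to synthetic columns for illustration only.
FEATURES = ["POR", "GR", "AC", "SP", "RT", "SW", "SH"]

X, y = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def fit_and_score(columns):
    """Train a GBDT on the given column indices and return (model, test accuracy)."""
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_train[:, columns], y_train)
    return model, model.score(X_test[:, columns], y_test)


# Train on all seven features and rank them by impurity-based importance.
full_model, full_acc = fit_and_score(list(range(len(FEATURES))))
ranking = sorted(
    zip(FEATURES, full_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")

# Retrain on the five highest-ranked features and compare test accuracy.
top5_cols = [FEATURES.index(name) for name, _ in ranking[:5]]
_, top5_acc = fit_and_score(top5_cols)
print(full_acc, top5_acc)
```

Comparing per-class accuracy, not only overall accuracy, is what reveals the kind of class-specific degradation reported for the five-feature model.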

Figure 9

The feature importance of the datasets.

Figure 10

Confusion matrix plot of the GBDT model with five features.

5 Discussion

5.1 Prediction performance comparison

At present, the common methods of identifying reservoir types from logging curves mainly include Bayesian discriminant analysis and machine learning methods. The following compares the prediction results of typical algorithms of these two kinds with those of the GBDT algorithm proposed in this article.

5.1.1 Bayesian discriminant analysis

This study used Python to analyze the applicability of the Bayesian algorithm to reservoir quality identification from the seven logging curves. The Bayesian discriminant accuracy was 69%, and the Micro F1-score was 70%. The confusion matrix in Figure 11 shows that Bayesian discriminant analysis was relatively ineffective. The recognition accuracy for type II and type III reservoirs was poor; for type II in particular it was only 20%. In contrast, type IV reservoirs were recognized better.

Figure 11

Confusion matrix diagram of Bayesian discriminant analysis.

5.1.2 Machine learning approach

This part discusses the accuracy of the random forest algorithm and the support vector machine for reservoir quality identification.

  1. Random forest

    Random forest is a supervised machine learning algorithm: a classifier that uses multiple decision trees to train on and predict samples. This study used Python to implement a random forest and assess how well it recognizes reservoir quality from the seven logging curves. The discrimination accuracy of the random forest was 87%, and the Micro F1-score was 87%. According to the confusion matrix of the random forest test set in Figure 12, the identification accuracy for class I, class II, and class III reservoirs was 56, 61, and 69%, respectively. In contrast, the recognition accuracy for type IV reservoirs was 97%. Overall, however, the recognition accuracy of the random forest was lower than that of the GBDT model.

  2. Support vector machine

    The support vector machine is currently a popular machine learning model. It is inherently a binary (two-class) classifier, so a suitable multi-class scheme must be constructed when it is applied to multi-classification problems. This study used Python to implement a support vector machine and assess how well it recognizes reservoir quality from the seven logging curves. The discrimination accuracy of the support vector machine was 82%, and the Micro F1-score was 82%. The confusion matrix in Figure 13 shows that the model had the highest recognition accuracy for type IV reservoirs, followed by type I, and that the recognition accuracy for type II reservoirs was 0. Therefore, the support vector machine was not suitable for reservoir quality evaluation based on logging data.

Figure 12

Confusion matrix diagram of random forest discriminant analysis.

Figure 13

Confusion matrix diagram of support vector machine discriminant analysis.
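
Because a support vector machine separates only two classes at a time, a four-class problem needs a multi-class wrapper. One common construction is one-vs-one voting: a binary classifier is trained for every pair of classes, and the class with the most pairwise wins is returned. The sketch below illustrates only the voting scheme; a simple nearest-centre rule stands in for each trained binary SVM, and the class centres and the single input feature are hypothetical. The source does not state which multi-class scheme was used in this study.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["I", "II", "III", "IV"]

# Hypothetical class centres on a single made-up feature (not from the paper).
CENTRES = {"I": 12.0, "II": 9.0, "III": 6.0, "IV": 3.0}

def make_binary_classifier(a, b):
    """Stand-in for a trained two-class SVM: pick whichever centre is closer."""
    def classify(x):
        return a if abs(x - CENTRES[a]) <= abs(x - CENTRES[b]) else b
    return classify

# One binary classifier per unordered pair of classes: 4 * 3 / 2 = 6 in total.
PAIRWISE = {
    (a, b): make_binary_classifier(a, b) for a, b in combinations(CLASSES, 2)
}

def predict_one_vs_one(x):
    """Return the class that wins the most pairwise contests."""
    votes = Counter(clf(x) for clf in PAIRWISE.values())
    return votes.most_common(1)[0][0]

print(len(PAIRWISE))             # 6 pairwise classifiers
print(predict_one_vs_one(10.8))  # I
print(predict_one_vs_one(2.0))   # IV
```

With few samples in a class, every pairwise classifier that involves that class is trained on little data, which is one plausible reason why a minority class such as type II can be missed entirely.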

5.1.3 ANN

The artificial neural network (ANN) is an important computing model in the current artificial intelligence (AI) field. This study used Python to implement an ANN and assess how well it recognizes reservoir quality from the seven logging curves. The discrimination accuracy of the model was 83%, and the Micro F1-score was 83%. The confusion matrix in Figure 14 shows that the recognition accuracy for class I, class II, and class III reservoirs was 56, 47, and 66%, respectively. By comparison, class IV was identified with 94% accuracy. Because of its poor recognition of the minority reservoir types, the ANN was not suitable for reservoir quality evaluation based on logging data.

Figure 14

Confusion matrix diagram of the discriminant analysis of artificial neural network.

The results of the reservoir quality evaluation of Well A88 in the study area using the above methods are shown in Figure 15. The prediction result of GBDT is the closest to the original conclusion and has the smallest error among the methods compared. Therefore, it can be inferred that the GBDT algorithm is the most suitable for the quality evaluation of tight sandstone reservoirs.

According to the geological background of the Chang 7 reservoir of the Yanchang Formation in the Ordos Basin, the rock types of the reservoir are mainly lithic feldspar sandstone and feldspar lithic sandstone, and quartz overgrowths and feldspar cement fill the residual intergranular pores. The intergranular pore throats are tight in shape and structure, altered illite clay can be seen between the grains, a small number of residual pores formed by debris dissolution are filled with albite, and the intergranular pore throats are filled with quartz. The pore types are dominated by primary intergranular pores, dissolution pores, and intercrystalline pores. The contribution of pores to the reservoir space decreases in the later stage of diagenesis, resulting in the generally low porosity and low permeability of the reservoir rock. Reservoirs with these characteristics are mostly type IV, followed in decreasing order by types III, II, and I. When the amount of data for each reservoir type is strongly imbalanced, the accuracy of conventional reservoir identification methods is generally low. The GBDT approach proposed in this article is designed to cope with this imbalance in data volume. It has distinct advantages in dealing with uneven data volume and can be applied well in actual production.

Figure 15

Evaluation results of four models in Well A88.
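
The effect of class imbalance described above can be made concrete with a small calculation. When one reservoir type dominates, a trivial rule that always predicts that type already reaches a high overall accuracy while recognizing none of the minority types. One widely used mitigation, shown here as an assumption and not as the procedure of this study, is to weight each class inversely to its frequency. The label counts below are hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a rule that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Hypothetical label counts (not from the study): type IV dominates.
labels = ["I"] * 5 + ["II"] * 10 + ["III"] * 25 + ["IV"] * 60

baseline = majority_baseline_accuracy(labels)
weights = balanced_class_weights(labels)

print(baseline)  # 0.6, achieved without recognizing any type I, II, or III sample
print(weights)   # rarer classes receive larger weights
```

With these weights, each class contributes the same total weight to a weighted training loss, so errors on type I samples are no longer outweighed by the much larger number of type IV samples.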

5.2 Distribution of the reservoir quality

The above results show that the GBDT model is effective for tight sandstone reservoir evaluation. Therefore, this article evaluates the reservoir quality distribution of the remaining 30 wells. The results were verified against oil field test data, and the relationship between the reservoir quality type and the test oil production (t/d.m) is shown in Figure 16. The figure shows a positive correlation between the reservoir type determined by GBDT and the daily production per meter of a single well, which indicates that the machine learning model selected in this study is effective for identifying tight reservoir quality. Finally, the reservoir quality of the Chang 6 and Chang 7 members in the 51 wells of the target area was identified and statistically analyzed, and the reservoir quality distribution was predicted on the basis of difference prediction and facies control. Figure 17 shows part of an interwell reservoir section. The high-quality reservoirs are distributed in distinct bands in the Chang 7 member. In addition, the Chang 6 reservoir is a key area for the future development of reservoir porosity and permeability.

Figure 16

Relationship between reservoir quality categories and production materials.

Figure 17

Interwell reservoir profile of the study area.
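
A positive relationship between an ordered reservoir class and production per meter, as in Figure 16, can be quantified with a rank correlation. The sketch below computes Spearman's coefficient with the Python standard library only. The well values are hypothetical and are not the field test data, and the coding of quality as 4 for type I down to 1 for type IV is an assumption made for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical wells: quality 4 = type I ... 1 = type IV; production in t/d.m.
quality = [4, 4, 3, 3, 2, 2, 1, 1]
production = [0.9, 0.7, 0.6, 0.65, 0.3, 0.35, 0.1, 0.15]

rho = spearman(quality, production)
print(round(rho, 3))
```

A coefficient close to 1 means that better predicted reservoir types consistently coincide with higher production per meter; a value near 0 would indicate that the predicted types carry little information about productivity.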

6 Conclusion

The geological conditions of the Chang 7 tight reservoir in the Ordos Basin are extremely complex. Through evaluation and development practice in recent years, a theoretical understanding of tight reservoir “sweet spots” and the associated development technology have gradually taken shape.

  1. The accuracy of GBDT reservoir quality identification is 95%. This provides a new technique for accurate and efficient classification of tight reservoir quality, which supports well location deployment and oil and gas reservoir evaluation;

  2. Confusion matrix analysis of conventional reservoir quality identification methods (Bayesian discriminant method, random forest, support vector machine, and ANN) shows that their identification accuracies are 69, 87, 82, and 83%, respectively, which further illustrates the advantages of the GBDT algorithm in the field of reservoir evaluation;

  3. The approach overcomes the limitations of traditional tight reservoir “sweet spot” identification, namely the need to establish interpretation models and carry out complex calculations, and provides a reference plan for the development and large-scale production of complex tight oil and gas reservoirs in the Ordos Basin.

Acknowledgments

The work was supported by the project “Study on characterization technology and seepage mechanism of seepage channel in tight reservoir” (2021DJ2201).

  1. Author contributions: M.L.F.: prepared the manuscript with contributions from all co-authors; X.H.M.: data collection and processing; T.J.W.: developed the model code; Z.T.Y.: revision of the manuscript; Z.H.Q.: modified the article syntax.

  2. Conflict of interest: Authors state no conflict of interest.

References

[1] Li P, Zheng M, Bi H, Wu ST, Wang XR. Pore throat structure and fractal characteristics of tight oil sandstone: A case study in the Ordos Basin, China. J Pet Sci Eng. 2017;149:665–74. doi:10.1016/j.petrol.2016.11.015

[2] Liu Y, Hu W, Cao J, Wang X, Tang Q, Wu H, et al. Diagenetic constraints on the heterogeneity of tight sandstone reservoirs: A case study on the Upper Triassic Xujiahe Formation in the Sichuan Basin, southwest China. Mar Pet Geol. 2018;92:650–69. doi:10.1016/j.marpetgeo.2017.11.027

[3] Wang Q, Chen D, Gao X, Wang F, Li J, Liao W, et al. Microscopic pore structures of tight sandstone reservoirs and their diagenetic controls: A case study of the Upper Triassic Xujiahe Formation of the Western Sichuan Depression, China. Mar Pet Geol. 2020;113:104119. doi:10.1016/j.marpetgeo.2019.104119

[4] Cao BF, Luo XR, Zhang LK, Lei YH, Zhou JS. Petrofacies prediction and 3-D geological model in tight gas sandstone reservoirs by integration of well logs and geostatistical modeling. Mar Pet Geol. 2020;114(C):104202. doi:10.1016/j.marpetgeo.2019.104202

[5] Zhao X, Yang Z, Lin W, Xiong S, Luo Y, Wang Z, et al. Study on pore structures of tight sandstone reservoirs based on nitrogen adsorption, high-pressure mercury intrusion, and rate-controlled mercury intrusion. J Energy Resour Technol. 2019;141(11). doi:10.1115/1.4043695

[6] Lai J, Wang GW, Ran Y, Zhou ZL, Cui YF. Impact of diagenesis on the reservoir quality of tight oil sandstones: The case of Upper Triassic Yanchang Formation Chang 7 oil layers in Ordos Basin, China. J Pet Sci Eng. 2016;145:54–65. doi:10.1016/j.petrol.2016.03.009

[7] Xi K, Cao Y, Haile BG, Zhu R, Jahren J, Bjørlykke K, et al. How does the pore-throat size control the reservoir quality and oiliness of tight sandstones? The case of the Lower Cretaceous Quantou Formation in the southern Songliao Basin, China. Mar Pet Geol. 2016;76:1–15. doi:10.1016/j.marpetgeo.2016.05.001

[8] Wang GW, Chang XH, Yin W, Li Y, Song TT. Impact of diagenesis on reservoir quality and heterogeneity of the Upper Triassic Chang 8 tight oil sandstones in the Zhenjing area, Ordos Basin, China. Mar Pet Geol. 2017;83:84–96. doi:10.1016/j.marpetgeo.2017.03.008

[9] Zheng DY, Pang XQ, Jiang FJ, Liu TS, Shao XH, HY, et al. Characteristics and controlling factors of tight sandstone gas reservoirs in the Upper Paleozoic strata of Linxing area in the Ordos Basin, China. J Nat Gas Sci Eng. 2020;75(C):103135. doi:10.1016/j.jngse.2019.103135

[10] Liang Y, Ren ZL, Wang YL, Shi Z. Characteristics of fluid inclusions and reservoiring phases in the Yanchang Formation of Zichang area, the Ordos Basin. Oil Gas Geol. 2011;32:182–91.

[11] Zhou X, He S, Liu P, Ju YJ. Characteristics and classification of tight oil pore structure in reservoir Chang 6 of Daijiaping area, Ordos Basin. Earth Sci Frontiers. 2016;23(3):253–65.

[12] Sakhaee-Pour A, Steven LB. Effect of pore structure on the producibility of tight-gas sandstones. AAPG Bull. 2014;98(4):663–94. doi:10.1306/08011312078

[13] Yang SY, Zhang JC, Huang WD, Zhang Y, Tang X. ‘Sweet spot’ types of reservoirs and genesis of tight sandstone gas in Kekeya area, Turpan-Hami Basin. Acta Petrolei Sinica. 2013;4(2):272–82.

[14] Zhang H, Zhang R, Yang H, Shou J, Wang J, Liu C, et al. Characterization and evaluation of ultra-deep fracture-pore tight sandstone reservoirs: A case study of Cretaceous Bashijiqike Formation in Kelasu tectonic zone in Kuqa foreland basin, Tarim, NW China. Pet Exploration Dev Online. 2014;41(2):175–84. doi:10.1016/S1876-3804(14)60020-3

[15] Wang JH, Jiang ZX, Zhang YF, Wei XJ, Wang H, Liu SQ. Quantitative evaluation of the reservoir potential and controlling factors of semi-deep lacustrine tempestites in the Eocene Lijin Sag of the Bohai Bay Basin, East China. Mar Pet Geol. 2016;77:262–79. doi:10.1016/j.marpetgeo.2016.05.006

[16] Vikrant AD, Mario RE. Formation lithology classification using scalable gradient boosted decision trees. Computers & Chem Eng. 2019;128:392–404. doi:10.1016/j.compchemeng.2019.06.001

[17] Liao ZJ, Huang Y, Yue X, Lu H, Xuan P, Ju Y. In silico prediction of gamma-aminobutyric acid type-a receptors using novel machine-learning-based SVM and GBDT approaches. BioMed Res Int. 2016;2016:1–12. doi:10.1155/2016/2375268

[18] Yang SY, Wu JP, Du YM, He YQ, Chen X, Meng FL. Ensemble learning for short-term traffic prediction based on gradient boosting machine. J Sens. 2017;78:1–15. doi:10.1155/2017/7074143

[19] Li LJ, Yu Y, Bai SS, Cheng JJ, Chen XY, Eduard L. Towards effective network intrusion detection: a hybrid model integrating gini index and GBDT with PSO. J Sens. 2018;2018:1–9. doi:10.1155/2018/1578314

[20] Liao ZJ, Wan SX, He Y, Zou Q. Classification of small GTPases with hybrid protein features and advanced machine learning techniques. Curr Bioinforma. 2018;13(5):492–500. doi:10.2174/1574893612666171121162552

[21] Zhang CS, Zhang Y, Shi XJ, George A, Fan GJ, Shen XJ. On incremental learning for gradient boosting decision trees. Neural Process Lett. 2019;50(1):957–87. doi:10.1007/s11063-019-09999-3

[22] Gu YF, Zhang DY, Bao ZD, Zhang CH. Lithology prediction of tight sandstone reservoirs using GBDT. Geophys Prog. 2021;36(02):585–94.

[23] Hamid RA, Alireza B, Mohammad AA. Evolving machine learning models to predict hydrogen sulfide solubility in the presence of various ionic liquids. J Mol Liq. 2016;216:411–22. doi:10.1016/j.molliq.2016.01.060

[24] Ahmadi M-A, Bahadori A, Shadizadeh SR. A rigorous model to predict the amount of dissolved calcium carbonate concentration throughout oil field brines: Side effect of pressure and temperature. Fuel. 2015;139:411–22. doi:10.1016/j.fuel.2014.08.044

[25] Mohammad-Ali A, Mohammad RA, Seyed MH, Mohammad E. Connectionist model predicts the porosity and permeability of petroleum reservoirs by means of petro-physical logs: Application of artificial intelligence. J Pet Sci Eng. 2014;123:183–200. doi:10.1016/j.petrol.2014.08.026

[26] Mohammad AA, Shifei D. Developing a robust surrogate model of chemical flooding based on the artificial neural network for enhanced oil recovery implications. Math Probl Eng. 2015;1–9. doi:10.1155/2015/706897

[27] Mohammad AA, Seyed RS. New approach for prediction of asphaltene precipitation due to natural depletion by using evolutionary algorithm concept. Fuel. 2012;102:716–23. doi:10.1016/j.fuel.2012.05.050

[28] Mohammad AA. Neural network based unified particle swarm optimization for prediction of asphaltene precipitation. Fluid Phase Equilibria. 2011;314:46–51. doi:10.1016/j.fluid.2011.10.016

[29] Mohammad AA, Mohammad G. Corrigendum to “Neural network based swarm concept for prediction asphaltene precipitation due to natural depletion” [J Pet Sci Eng. 2012;98–99:40–49]. J Pet Sci Eng. 2013;108:404–4. doi:10.1016/j.petrol.2013.05.006

[30] Mohammad AA, Mohammad E, Arash Y. Robust intelligent tool for estimating dew point pressure in retrograded condensate gas reservoirs: Application of particle swarm optimization. J Pet Sci Eng. 2014;123:7–19. doi:10.1016/j.petrol.2014.05.023

[31] Mohammad AA, Mohammad E, Payam SM, Mohammad MF. Evolving predictive model to determine condensate-to-gas ratio in retrograded condensate gas reservoirs. Fuel. 2014;124:241–57. doi:10.1016/j.fuel.2014.01.073

[32] Mohammad AA, Behzad P, Yahya J, Shahab A, Reza S. Connectionist technique estimates H2S solubility in ionic liquids through a low parameter approach. J Supercrit Fluids. 2015;97:81–7. doi:10.1016/j.supflu.2014.11.009

[33] Ali S, Mohammad AA, Seyed HZ, Alireza B, Ali A, Reza S. Estimating hydrogen sulfide solubility in ionic liquids using a machine learning approach. J Supercrit Fluids. 2014;95:525–34. doi:10.1016/j.supflu.2014.08.011

[34] Seyedeh RM, David AW, Mohammad AA, Abouzar C. ANN-based prediction of laboratory-scale performance of CO2-foam flooding for improving oil recovery. Nat Resour Res. 2019;28(4):1619–37. doi:10.1007/s11053-019-09459-8

[35] Gu YF, Zhang ZM, Zhang DM, Zhu YX, Bao ZD, Zhang DY. Complex lithology prediction using mean impact value, particle swarm optimization, and probabilistic neural network techniques. Acta Geophysica. 2020;68:1–26. doi:10.1007/s11600-020-00504-2

[36] Gu Y, Bao Z, Zhang D. A smart predictor used for lithologies of tight sandstone reservoirs: a case study of member of Chang 4 + 5, Jiyuan Oilfield, Ordos Basin. Pet Sci Technol. 2021;39(7–8):175–95. doi:10.1080/10916466.2021.1881114

[37] Gu Y, Zhang D, Lin Y, Ruan J, Bao Z. Data-driven lithology prediction for tight sandstone reservoirs based on new ensemble learning of conventional logs: A demonstration of a Yanchang member, Ordos Basin. J Pet Sci Eng. 2021;207:207. doi:10.1016/j.petrol.2021.109292

[38] Feng RH. Improving uncertainty analysis in well log classification by machine learning with a scaling algorithm. J Pet Sci Eng. 2020;196:1–23. doi:10.1016/j.petrol.2020.107995

[39] He HN, Zhao WW, Wang HZ, et al. Mechanism of hydrocarbon accumulation formation and main controlling factors in Chang-7 tight oil of Yanchang Formation, southeastern Ordos Basin. Unconventional Oil & Gas. 2019;6(3):33–40.

[40] Deng JL. Control problems of grey systems. Syst Control Lett. 1982;1(5):288–94. doi:10.1016/S0167-6911(82)80025-X

[41] Deng JL. Grey control system. J Huazhong Univ Sci Technol. 1982;3:9–18.

[42] Xu HL, Liu J, Qiao C, Gong LP, Jin CL, Yu MG. Application of gray correlative analysis method to reservoir evaluation of Shuanghe Oilfield. Reserv evaluation Dev. 2015;5(5):17–21.

[43] Ma XL, Ding C, Luan S, Wang Y, Wang Y. Prioritizing influential factors for freeway incident clearance time prediction using the gradient boosting decision trees method. IEEE. 2017;18(9):1–25. doi:10.1109/TITS.2016.2635719

[44] Xia YF, Liu CZ, Li YY, Liu NN. A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring. Expert Syst Appl. 2017;78:225–41. doi:10.1016/j.eswa.2017.02.017

[45] Gu Y, Zhang D, Bao Z. Lithological classification via an improved extreme gradient boosting: A demonstration of the Chang 4 + 5 member, Ordos Basin, Northern China. J Asian Earth Sci. 2021;215:104798. doi:10.1016/j.jseaes.2021.104798

[46] Xu LL, Chi DX. Machine learning classification strategy for imbalanced datasets. Computer Eng Appl. 2020;56(24):12–27.

Received: 2021-07-12
Revised: 2022-01-19
Accepted: 2022-02-21
Published Online: 2022-07-01

© 2022 Longfei Ma et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 11.12.2023 from https://www.degruyter.com/document/doi/10.1515/geo-2022-0354/html