Abstract
In order to reduce the adverse impact of debris flow disasters on engineering construction facilities and social security in the lower reaches of the Yajiang River, this article selected 11 risk assessment factors such as elevation, aspect, profile curvature, Relief, and rainfall to study the occurrence rule of debris flow in this area. The data of disaster factors caused by debris flow points were derived and analyzed in ArcGIS. Then, factor correlation test and factor sensitivity level were established. The coupling model of qualitative mathematical model (analytic Hierarchy Process, AHP), quantitative mathematical model (binary logistic regression [LR]), machine learning model (random forest RF), and certainty factor (CF) were, respectively, used to predict the risk of debris flow disaster in the study area. After comparison, it was found that the CF–LR model had the highest accuracy. The results show that the areas with high debris flow risk are mainly concentrated in the first half of the lower reaches of the Yajiang River and distributed along both sides of the river bank. The annual rainfall range of 600–700 mm is the critical water source saturation value of debris flow in the study area.
1 Introduction
Debris flow is a fluid mixture carrying water, sand, and stones, which commonly erupts in the rainy season and has great explosive power and destructive power [1]. Due to human overuse of nature, arbitrary felling of trees, and frequent extreme weather, mudslides have brought immeasurable economic and property losses worldwide. For example, in Uttarakhand state of India in 2021, a catastrophic mudslide caused more than 200 deaths or missing, destroyed two hydropower projects, and caused damage to several basic hydropower facilities [2]. In 2013, flash floods and landslides caused by heavy rainfall in the northwestern Himalayan state killed more than 6,000 people [3]. According to relevant statistics, from 1975 to 1984, mudslide disasters in 18 provinces, cities, and districts in China caused 2,136 deaths and direct economic losses of 1.6 billion yuan [4]. On August 20, 2010, China Economic Times reporters learned at the press conference about Sichuan Province’s response to the massive mudslide disaster that the direct loss of the province was 6.89 billion yuan [5]. The reconstruction of the project and ecological restoration after the debris flow disaster will also cost a lot of human and material resources (Figure 1).
The southeast region of Tibet, with abundant rainfall and complex terrain, is the worsthit area of mudslides in Tibet. The lower reaches of the Yajiang River are located in Nyingchi City in the southeast of Tibet. The river’s lower reaches have the characteristics of large terrain drop, sufficient water, and so on. The valley is narrow and deep, so it contains huge water energy. Due to its steep terrain and huge torrents, it creates good breeding conditions for mudslides and poses a potential threat to nearby engineering facilities. Once the mudslide disaster occurs near the Yajiang River, it will cause incalculable economic losses to the construction project. The downstream of the Yajiang River traffic is inconvenient, and emergency repair personnel cannot arrive at the scene in time; maintenance work is very difficult. What is more dangerous is that if a debris flow occurs near the Yajiang River, the debris flow will move along with the current and cause major security risks to the ecological environment, wildlife, and residents on both sides. Therefore, taking appropriate and feasible methods to analyze and forecast the debris flow in the lower reaches of the Yajiang River and taking timely protective measures can reduce the loss to the greatest extent.
ArcGIS software can effectively process a variety of complex geological engineering information data and has processing functions such as mask, turn point, reclassification, and superposition of raster images. With the application of “3 S” technology in geological disaster safety analysis, Geographic Information System (GIS) tools become more and more important.
In recent years, many hybrid models have been proposed to study geological hazards and achieved good results. The evaluation method combining a fuzzy mathematical model and analytic hierarchy method not only makes up for its shortcomings but also makes the calculation result more accurate [6]. The application of certainty factor (CF) coupling model with SI or logistic regression (LR) significantly reduces subjectivity and deviation [7]. Adaptive neurofuzzy inference system is combined with grey wolf optimizer and particle swarm optimizer algorithms to manage landslide risk by using outputs of qualitative stepwise weighted assessment ratio analysis and quantitative CF models [8]. These methods can be divided into three categories: qualitative analysis method, quantitative analysis method, and machine learning method. Analytic Hierarchy Process (AHP) is representative of the qualitative analysis model, which is widely used in geological hazard assessment. Its effective prediction time is long, and the research results show that it can accurately predict the disaster risk distribution prediction within 10 years [9,10,11]. However, this requires experienced experts to accurately assign evaluation factors, which is subjective to a certain extent. The representative of the quantitative analysis model is the LR model, which is excellent in the accuracy performance of geological disaster prediction, and the prediction result is a probability between 0 and 1 [12,13,14]. The binary LR model is based on a certain sample calculation model; if the sample is insufficient, it is easy to underfit and affect the accuracy of results. In the case of insufficient samples, the qualitative analysis method is suitable. In recent years, with the rise of machine learning, decision tree, random forest (RF), and convolutional neural network have been widely used in geological disaster prediction. Combined with the research results of several scholars, random forest stands out because of its advantages of high accuracy, low jumbled lines, and fast learning speed [15,16]. Random forest is good at dealing with highdimensional data, which is ineffective in solving regression problems. In addition, for the data with fewer feature numbers, its randomness is not well played and the effect is not good. Each method has its advantages and disadvantages. Selecting the most appropriate research model can achieve the best results. This article selected the most suitable model in this research area by calculating the accuracy of the model.
The CF is a method to manage uncertainty in a rulesbased expert system. It is a model developed by Shortliffe and Buchanan in the mid1970s and widely used in medical diagnosis research in the early stage [17]. Now, it is often used to calculate the internal relationship between factors and then coupled with other models.
The selection and weight of disastercausing factors are the basis for making the susceptibility assessment chart. Considering the special geohydrological and rainfall conditions near Yajiang River, in addition to the common topographic and geological debris flow disastercausing factors, the topographic humidity index (TWI), runoff index (SPI), profile curvature, annual average rainfall, and other hydrologicrelated disastercausing factors are also selected and passed the correlation test. Rainfalltype debris flow is dominant in southeast Tibet. According to the CF values of different annual rainfall levels, different susceptibility conditions can be obtained to study the influence of annual rainfall on the occurrence of debris flow.
Most research literature only selects a mathematical model for research, which will make the results have certain limitations. This article studies the mathematical models with different properties and compares the accuracy of the models to get the most suitable model for research. The parameters of the three types of models are analyzed in depth. In addition, the CF model can be used to quantitatively calculate the influence degree of a disaster factor on debris flow disaster. Even in the case of insufficient data, debris flow disasters can be predicted to the greatest extent. Figure 2 shows the flow chart. Finally, the accuracy of the prone area was verified by satellite image and field survey. The research results can provide theoretical support for disaster prevention and reduction near the lower reaches of the Yajiang River.
2 The study area
2.1 Research scope
The study area is located in the east of Nyingchi City, Tibet Autonomous Region (Figure 3a). The overall elevation trend of Nyingchi City is higher in the north and lower in the west, resulting in higher terrain in the upper half of Yajiang and lower in the lower half of Yajiang (Figure 3b). Nyingchi area is a highfrequency area of debris flow due to its unique hydroclimatic conditions, plateau environment, complex stratigraphic lithology conditions, and vegetation coverage. The Yajiang River flows from Lang County to Nyingchi City and from Metuo County to India. The overall distribution is high in the east and low in the west, with a large elevation span of 5,110–86 m (Figure 3c). It passes through the Brahmaputra Canyon which is the largest and deepest canyon in the world. Steep cliffs on both sides of the canyon, narrow riverbeds, and rapid flow of water, for the occurrence of debris flow, nurtured good conditions.
With the development and utilization of the water resources of the Yajiang River, it is found that natural geological disasters are easy to occur near the basin of the Yajiang River. Once the disasters occur, they will cause serious threats and losses to engineering and construction. In this paper, the width of the two banks of the Yajiang River in southeast Tibet is 2 km in the study area. The study area is a total area of about 4309.5 km^{2}, mudrock flow point 42, roughly the information shown in Table 1; 42 nondebris flow points were randomly selected in the study area, and 84 debris flow research sample banks were obtained to carry out assessment work.
Number  Longitude  Latitude  Number  Longitude  Latitude 

1  93° 23′ 14.355″ E  29° 3′ 10.387″ N  22  93° 35′ 35.847″ E  29° 11′ 24.231″ N 
2  93° 52′ 31.296″ E  29° 7′ 12.359″ N  23  94° 11′ 29.918″ E  29° 14′ 52.277″ N 
3  93° 27′ 10.880″ E  29° 4′ 54.747″ N  24  93° 34′ 0.563″ E  29° 11′ 21.619″ N 
4  93° 47′ 58.458″ E  29° 7′ 6.189″ N  25  94° 17′ 21.341″ E  29° 16′ 43.265″ N 
5  93° 23′ 58.029″ E  29° 5′ 7.855″ N  26  94° 19′ 16.585″ E  29° 18′ 26.958″ N 
6  93° 43′ 27.572″ E  29° 7′ 57.374″ N  27  94° 23′ 53.952″ E  29° 19′ 40.866″ N 
7  93° 27′ 33.153″ E  29° 6′ 34.610″ N  28  94° 21′ 35.080″ E  29° 21′ 10.679″ N 
8  93° 54′ 30.953″ E  29° 9′ 42.993″ N  29  94° 28′ 40.969″ E  29° 23′ 42.580″ N 
9  94° 10′ 30.934″ E  29° 11′ 38.545″ N  30  94° 48′ 17.557″ E  29° 26′ 50.230″ N 
10  93° 39′ 44.594″ E  29° 8′ 42.212″ N  31  94° 44′ 30.664″ E  29° 27′ 17.997″ N 
11  94° 7′ 59.459″ E  29° 11′ 39.668″ N  32  94° 30′ 7.228″ E  29° 26′ 3.776″ N 
12  93° 38′ 23.048″ E  29° 8′ 56.420″ N  33  94° 50′ 37.209″ E  29° 27′ 57.208″ N 
13  93° 46′ 24.887″ E  29° 9′ 57.879″ N  34  94° 31′ 51.521″ E  29° 26′ 32.045″ N 
14  94° 2′ 26.996″ E  29° 11′ 38.939″ N  35  94° 35′ 43.658″ E  29° 28′ 2.739″ N 
15  93° 45′ 33.361″ E  29° 10′ 11.034″ N  36  94° 46′ 14.046″ E  29° 29′ 33.614″ N 
16  94° 13′ 13.543″ E  29° 13′ 2.239″ N  37  94° 42′ 20.511″ E  29° 29′ 16.302″ N 
17  92° 52′ 34.606″ E  29° 5′ 23.009″ N  38  94° 49′ 50.876″ E  29° 30′ 30.332″ N 
18  93° 41′ 23.578″ E  29° 10′ 31.258″ N  39  94° 53′ 45.216″ E  29° 31′ 14.790″ N 
19  93° 31′ 19.109″ E  29° 9′ 53.623″ N  40  94° 53′ 2.991″ E  29° 33′ 14.104″ N 
20  94° 15′ 36.164″ E  29° 14′ 43.894″ N  41  94° 56′ 9.547″ E  29° 34′ 31.865″ N 
21  93° 30′ 48.671″ E  29° 10′ 41.953″ N  42  94° 57′ 0.201″ E  29° 38′ 4.646″ N 
2.2 Disasterinducing factors
2.2.1 Topographic factors
The research area has a special plateau environment. The weathering degree of rocks, climatic conditions, and natural environment are distinctive at different elevations. Slope aspect affects sunshine duration, vegetation growth, and airflow direction. The profile curvature can better reflect the concavity of the overall topography than the slope. It refers to the difference between the highest elevation and the lowest elevation in a specific region, and the formula is:
2.2.2 Meteorological factors
Most of the debris flows in southeast Tibet are rainfall type, and the outbreak time is concentrated in the rainfall season. Because the topographic geology and environmental conditions do not change much in a short time, the average annual rainfall is taken as the main factor in this study.
2.2.3 Geological factors
Lithology is also one of the main disaster factors affecting the occurrence of debris flow. The main rock types in southeast Tibet include diorite, gneiss, phyllite and quartz sandstone, conglomerates, granites, and phyllite [18]. However, due to the dense distribution of debris flow points in the study area, most of them occur in the gneiss area, so the influence of lithology on debris flow cannot be reflected.
After the strong weathering of rocks, broken loose materials or fine particles are produced, and the severely weathered rocks can be transformed into a solid source of debris flow. Weathering products of rocks will remain in the soil and gradually become part of the soil. According to the different original rock types, the content of coarse particles remaining in the soil will also be different [19]. Therefore, according to the proportion of coarse particles in the soil texture, the surrounding loose matter can be known. Soils are mainly divided into silty, sandy, and clay soils. Silty soil is the main product of rock weathering, so the content of silty soil is selected instead of lithology as a geological factor. The earthquake fault zone in southeast Tibet is active, and earthquakes often destroy the mountain structure, damage the ecological environment, and bring about a chain reaction of geological disasters. Earthquake is a kind of natural disaster that cannot be predicted at present [20]. The distance from the fault zone is a potentially important disaster factor.
2.2.4 Hydrological factors
TWI and SPI are hydrological factors commonly used in geological disasters [21,22], quantifying the control of topography on hydrology. The calculation is as follows:
where α is the cumulative upslope area of a drainage basin through a point and tan(Slope) is the angle of the slope at the same point. α high index value indicates a great potential of water accumulated due to low slope angles.
2.2.5 Inducer factors
Land use type is the product of human natural activities and engineering activities. The loose material accumulated in the gully can be directly converted into debris flow fluid material under the action of heavy rainfall. Areas with high land use rate also have more hidden loose solid sources [23].
Vegetation coverage has the function of preventing soil erosion, especially in areas with high rainfall. But in the case of debris flow, vegetation can also become a solid source. To some extent, vegetation coverage has the function of preventing soil erosion, especially in areas with high rainfall. But in the event of a debris flow, vegetation can also become a solid source.
2.3 Data collection
The debris flow points were provided by geological disaster survey data of counties and cities in the Tibet Autonomous Region. In this article, all debris flow events that occurred in the study area from 2001 to 2009 were selected as research objects. Through the random point creation function in Arcgis, 42 nondebris flow points were selected (there was no debris flows near these points for more than a decade).
Geological and geomorphic information such as elevation, slope aspect, and relief will not change greatly in decades, so there is little difference in the results caused by choosing different times. In this article, the data from 2020 are used to predict, and the data from 2000 that did not occur debris flow are used to study.
2.4 Correlation of factors
Based on the data of 42 debris flow points, the correlation analysis of 11 important index data was carried out using SPSS software (Table 2).
NDVI  SPI  TWI  Fault distance  Silt content  Elevation  rainfall  Profile curvature  Aspect  Relief  land utilization  

NDVI  1  0.338  −0.218  0.168  0.105  0.559  0.291  0.231  −0.091  0.473  −0.165 
SPI  0.338  1  0.433  0.024  0.105  0.342  0.03  0.281  0.094  0.514  −0.287 
TWI  −0.218  0.433  1  −0.03  −0.152  −0.387  −0.161  0.015  −0.157  −0.329  0.11 
Fault distance  0.168  0.024  −0.03  1  0.012  −0.004  −0.254  0.182  −0.204  0.015  0.09 
Silt content  0.105  0.105  −0.152  0.012  1  0.147  −0.068  −0.046  −0.059  0.104  −0.081 
Elevation  0.559  0.342  −0.387  −0.004  0.147  1  0.399  0.079  0.144  0.714  −0.186 
Rainfall  0.291  0.03  −0.161  −0.254  −0.068  0.399  1  −0.168  −0.012  0.035  −0.1 
Profile curvature  0.231  0.281  0.015  0.182  −0.046  0.079  −0.168  1  −0.121  0.333  −0.111 
Aspect  −0.091  0.094  −0.157  −0.204  −0.059  0.144  −0.012  −0.121  1  0.097  0.023 
Relief  0.473  0.514  −0.329  0.015  0.104  0.714  0.035  0.333  0.097  1  −0.313 
Land utilization  −0.165  −0.287  0.11  0.09  −0.081  −0.186  −0.1  −0.111  0.023  −0.313  1 
There is a weak positive correlation between elevation and relief, elevation and NDVI, and relief and SPI. The correlation coefficients between the other factors are small, which can be considered as having no correlation. The factors used met the factor selection conditions.
3 Materials and methods
3.1 CF model
The CF model can well calculate the CF values of different categories of the same factor, and the calculation formula is as follows:
where PPa is the occurrence probability of debris flow points in different categories of single factor area, and PPs is the occurrence probability of debris flow points in the total study area. CF values range from −1 to 1, with the closer to 1 the more certain, and the closer to −1 the more uncertain.
The 10 disastercausing factors of 42 debris flow sample points in the study area were classified. The CF value is calculated by the CF model, as shown in Table 3.
Disaster factor  Classification  Category area (km^{2})  Debris flow point  The total number of (%)  CF 

Fault distance  0–2,000  390.1167  3  7.14  −0.2126 
2,000–4,000  787.4118  9  21.43  0.1488  
4,000–6,000  477.3636  6  14.29  0.2268  
6,000–8,000  434.9799  5  11.90  0.1536  
8,000–10,000  640.6857  4  9.52  −0.3616  
10,000–12,000  256.8735  9  21.43  0.7289  
12,000–14,000  99.7353  6  14.29  0.8462  
Elevation  2,900–3,100  657.1017  17  40.48  0.6294 
3,100–3,300  369.9387  7  16.67  0.4897  
3,300–3,500  296.8272  7  16.67  0.5925  
3,500–3,700  261.7623  4  9.52  0.3658  
3,700–3,900  213.7941  5  11.90  0.5890  
3,900–4,100  153.8442  2  4.76  0.2528  
NDVI  0.2–0.4  259.7121  3  7.14  0.1578 
0.4–0.6  486.9963  11  26.19  0.5741  
0.6–0.8  1062.332  18  42.86  0.4290  
0.8–1  2428.728  10  23.81  −0.5799  
Average annual rainfall  600–700  1263.117  3  7.14  −0.7581 
700–800  1779.83  22  52.38  0.2136  
800–900  1069.183  14  33.33  0.2582  
900–1,000  183.0861  3  7.14  0.4092  
Profile curvature  0–5  1071.729  13  30.95  0.1985 
5–10  1509.003  16  38.10  0.0816  
10–15  962.2728  7  16.67  −0.2554  
15–20  471.6315  5  11.90  0.0815  
20–25  197.0577  1  2.38  −0.4817  
Relief  0–10  264.9465  7  16.67  0.6373 
10–20  535.6998  9  21.43  0.4240  
20–30  512.3988  5  11.90  0.0013  
30–40  586.0062  4  9.52  −0.3017  
40–50  641.7855  6  14.29  −0.0411  
50–60  595.4427  2  4.76  −0.6576  
60–70  458.5059  4  9.52  −0.1058  
70–80  301.1166  0  0.00  −1.0000  
80–90  177.4665  3  7.14  0.4276  
90–100  236.2662  2  4.76  −0.1325  
Silt content  10–20  335.1366  3  7.14  −0.0822 
20–30  2598.244  27  64.29  0.0628  
30–40  963.4104  12  28.57  0.2197  
TWI  <0  1262.587  10  23.81  −0.1888 
0–2  1247.879  10  23.81  −0.1792  
2–4  943.4673  5  11.90  −0.4587  
4–6  350.8659  2  4.76  −0.4175  
6–8  141.7248  3  7.14  0.5449  
8–10  173.4723  5  11.90  0.6684  
10–12  130.9842  6  14.29  0.7950  
12–14  28.9854  1  2.38  0.7246  
>14  29.5353  0  0.00  −1.0000  
SPI  <−8  414.8919  5  11.90  0.1932 
−8,−4  872.6598  8  19.05  −0.0599  
−4,0  877.1535  12  28.57  0.2904  
0–4  1930.46  15  35.71  −0.2043  
4–8  203.0436  2  4.76  0.0107  
Aspect  North  532.7046  6  14.29  0.1360 
Northeast  513.9783  3  7.14  −0.4035  
East  484.9236  2  4.76  −0.5792  
Southeast  543.9105  5  11.90  −0.0573  
South  543.357  7  16.67  0.2459  
Southwest  559.8576  5  11.90  −0.0844  
West  539.8083  4  9.52  −0.2415  
Northwest  580.2552  10  23.81  0.4388  
Plain  10.7055  0  0.00  −1.0000  
Land utilization  10–20  213.2136  3  7.14  0.3115 
20–30  2847.981  19  45.24  −0.3165  
30–40  929.457  15  35.71  0.4010  
40–50  244.8666  5  11.9  0.5286 
It can be seen from Table 3 that the fault distance is between 12,000 and 14,000 m; The elevation is between 2,900 and 3,100 m; NDVI ranges 0.4–0.6; annual rainfall in the 900–1,000 mm range; the profile curvature is between 0 and 5; the fluctuation is in the range of 0–10 m; silty soil content in the range of 30–40; TWI ranges from 10 to 12; SPI is in the range of −4 to 0; when the slope direction is northwest, debris flow is most likely to occur.
3.2 CF–AHP
AHP is an evaluation method combining qualitative and quantitative methods, which decomposes the elements related to decisionmaking into levels of objectives, criteria, schemes, etc. [24]. The quantization table of AHP is shown in Table 4. According to the quantization values of 1–9, all disastercausing factors are compared in pairs and assigned, and the judgment matrix is listed. Scholar Wu Chenshuang, who has a certain indepth study on the whole geological disaster of Tibet, was invited to score this time [6], and the opinions of Academician Cui Peng, Professor Tang Chuan, and other relevant personnel with great research achievements in this field were also referred to the study by Liu et al and Guo et al. [25,26]. The score is relatively reliable.
Factor i over factor j  Value of quantization 

Equality  1 
Slightly  3 
Moderately  5 
Strong  7 
Extreme  9 
Two neighbors determine the median value  2, 4, 6, 8 
Reciprocal  a _{ ij } = 1/a _{ ij } 
By multiplying the weight of each factor calculated by AHP by the CF value of the corresponding 11 factors of a grid, the debris flow occurrence weight w of the grid can be obtained. W is a positive number that represents the mudslide, w is a negative representative mudslide that does not occur, and the size of the w represents the influence degree. Finally, 84 debris flow points in the sample database were used to verify the accuracy of the model.
In order to verify the rationality of the scoring of quantified values, all evaluation matrices should be subjected to a consistency test, which can prove that the expert scoring is reasonable. Equation (5) is obtained by converting Aw = λw, which is used to calculate λ _{max}. According to formula (6), CI can be obtained. The smaller the value of CI, the more reliable the weight. According to formula (7), CR can be calculated. When CR <0.1, the matrix passes the consistency test.
where A is the constructed judgment matrix, w is the weight, n is the number of indicators, and RI can be obtained by referring to Table 4.
A1  A2  A3  

A1  1  1/2  1/4 
A2  2  1  1/2 
A3  4  2  1 
The disaster factors are divided into two layers. In the first criterion layer, A1 is the terrain, A2 is the solid source, and A3 is the water source. The matrix is shown in Table 5, CR = 0.058 < 0.1, passing the consistency test. The second criterion layer is divided into three matrices, as shown in Table 6, which are B1 elevation, B2 slope aspect, B3 profile curvature, and B4 relief; C1 fault distance, C2 silt content, C3 NDVI, C4 land utilization; D1 rainfall, D2 TWI, D3 SPI. CR was 0.033, 0.0582, and 0.004, respectively.
B1  B2  B3  B4  

B1  1  2  1/4  1/3 
B2  ½  1  1/6  1/4 
B3  4  6  1  3 
B4  3  4  1/3  1 
C1  C2  C3  C4  

C1  1  2/3  2  1/4 
C2  3/2  1  3  1/2 
C3  ½  1/3  1  1/6 
C4  4  2  6  1 
D1  D2  D3  

D1  1  3  5  
D2  1/3  1  2  
D3  1/5  1/2  1 
3.3 CF–AHP
The binary LR model is an algorithm that expresses or predicts trends with a linear relationship based on statistics. 70% of the sample points, including 30 groups of debris flow points and 30 groups of nondebris flow points, were randomly selected to construct the model. The remaining 30% of debris flow was used to validate the model. The calculated CF value was used to replace the data of 11 disastercausing factors of debris flow and was used as the independent variables of the model. Whether debris flow occurs (“1” means debris flow occurs, “0” means no debris flow occurs) is used as the dependent variable of the model [27,28], and the results are shown in Table 7.
Regression coefficient  Standard error  Chisquare value  Degree of freedom  

Fault distance  −1.61  1.053  2.339  1 
Elevation  6.863  2.783  6.083  1 
NDVI  −1.051  1.458  0.52  1 
Average annual rainfall  −1.779  1.477  0.728  1 
Aspect  −1.26  3.738  4.829  1 
Profile curvature  8.215  1.468  3.186  1 
Relief  2.621  4.151  3.132  1 
Silt content  −7.345  1.65  4.419  1 
TWI  3.468  3.153  4.709  1 
SPI  5.008  2.115  5.608  1 
Land utilization  0.143  1.245  0.013  1 
Constant  −1.856  1.019  3.322  1 
Therefore, the occurrence probability of debris flow in the cell can be expressed as follows:
3.4 CF–RF
RF is a nonparametric statistical technique based on regression or classification of a decision tree set (forest) [29]. The workflow flow chart is shown in Figure 4. The importance of features is obtained by using 70% debris flow sample points. The remaining 30% verifies model accuracy.
4 Results and discussion
4.1 Analysis of calculation results
In the AHP, the weight of the first criterion layer and the second criterion layer are multiplied to obtain the weight value of the disaster factor. The variable coefficients and constants of disaster factors were obtained by binary LR. The characteristic importance of disaster factors is obtained by random forest after data training (Table 8).
LR (W)  AHP  RF  

Fault distance  1.61  0.0462  0.0866 
Elevation  6.863  0.0217  0.2014 
NDVI  1.051  0.0248  0.0628 
Rainfall  1.26  0.2942  0.0552 
Profile curvature  8.215  0.0989  0.0664 
Relief  2.621  0.0480  0.1212 
Silt content  7.345  0.0744  0.0562 
TWI  3.468  0.1351  0.0813 
SPI  6.842  0.0841  0.0545 
Aspect  5.008  0.0129  0.1525 
Land utilization  0.143  0.1599  0.0619 
It can be seen from Figure 5 that elevation, silty soil, profile curvature, and SPI in the CF–LR model have a large weight. In the CF–AHP model, the weight of rainfall, land use, and TWI are larger. In the CF–RF model, elevation, slope aspect, and relief have greater weight.
Although the maximum weight factors of the three models are different, they all show that topography has an important impact on the development of debris flow. The difference in weight factors may be due to the lack of local attention to the debris flow database in the lower reaches of the Yajiang River, resulting in insufficient debris flow sample points and insufficient research data. The lack of sufficient sample data will affect the accuracy of the LR model and the random play of RF. At the same time, AHP is highly subjective, which exaggerates the influence of rainfall on debris flow when assigning factors.
4.2 Comparison of model accuracy
Confusion matrix is a standard way to evaluate the accuracy of a model. Its function is to judge the probability of correct judgment and the probability of wrong judgment. Its main indicators are precision, Recall, specificity, and F1 Score (the closer to 1, the better the effect). It can be seen from Table 9 that the specificity of the RF model is slightly higher than that of the LR model, and other indicators of the LR model are better than other models. Therefore, the LR model is applicable as the research method in the research area at the present stage.
Precision  Sensitivity  Specificity  F1 score  

LR  0.7907  0.8095  0.7857  0.8000 
AHP  0.6182  0.8095  0.5000  0.7010 
RF  0.6552  0.9048  0.5238  0.7600 
Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. The ROC curve of the three methods was verified by the established debris flow database (Figure 6). According to the area under the curve (AUC) (Figure 7), the CF–LR coupling model has the highest accuracy, followed by the CF–RF coupling model, and finally CF–AHP coupling model.
4.3 Susceptibility map of study area
GIS can well collect, analyze, and process complex spatiotemporal data and provide good help in processing geological disaster information. The calculated CF value is used to assign values to the raster graph, as shown in Figure 8. The CF–LR coupling model is used to make the debris flow disaster susceptibility distribution map. The evaluation chart of debris flow susceptibility in the lower reaches of Yajiang River is shown in Figure 9.
As can be seen from Figure 10, the probability of most grid points is close to 0 or 1, with a roughly uniform distribution in the middle. Therefore, the equidistant classification method was used to divide the susceptibility into four categories: low susceptibility area, mild susceptibility area, moderate susceptibility area, and high susceptibility area. It can be seen from Figure 7 that most of the study area belongs to the lowrisk area of debris flow, accounting for 54.57% of the total study area. mild, moderate, and highrisk areas accounted for 6.1, 5.61, and 33.76%, respectively. With the Grand Canyon as the boundary point, the areas prone to debris flow are basically distributed on both sides of the upstream river bank of the Grand Canyon and a small amount of them are distributed downstream.
Through satellite images (https://www.earthol.com/) and field investigation, several places with potential debris flow hazards in highprone areas were found, mainly in the “n” shaped watershed. In addition to the steep terrain, these areas also have fastflowing water and complicated flow direction, and the corrosion and scouring of the water greatly increased the potential debris flow hazards (Figures 11–13).
4.4 Study of different rainfall
By introducing CF values of different annual rainfall into the model, the influence of annual rainfall on the study area is studied. The CF value of grade A rainfall (600–700 mm) is −0.7581, that of grade B rainfall (700–800 mm) is 0.2136, that of grade C rainfall (800–900 mm) is 0.2582, and that of grade D rainfall (900–1,000 mm) is 0.4092. The susceptibility of debris flow at different gears in the study area was obtained through grid calculation and processing, as shown in Figure 8 (Figure 14).
The results show that when the annual rainfall is between 600 and 1,000 mm, the proportion of lowprone areas increases with the increase in annual rainfall, while the proportion of highprone areas decreases with the increase in annual rainfall. The number of debris flows is not much when the rainfall is 600–700 mm, but suddenly increases when the rainfall of Bgrade is 700–800 mm. This may be due to the influence of the geographical environment and geological factors in the study area. When the annual rainfall reaches 600–700 mm, it is at the critical point of debris flow outbreak, and when the annual rainfall exceeds the critical point, debris flow will be triggered. Debris flow takes away effective solid material sources after the eruption, making it difficult to repeat debris flow. However, due to the concave terrain in some areas, drainage is difficult, so more water sources have accumulated. With the increase in rainfall, the accumulated water source can drive the movement of heavier solid materials, which will cause a small amount of debris flow disaster.
5 Conclusions
Based on the establishment of a 1:1 database of debris flow points and nondebris flow samples and the collected information on disaster factors, the evaluation of the three coupling models of CF–AHP, CF–LR, and CF–RF for the study area was calculated. Although there were some differences in the evaluation results, they were all relatively accurate, but the accuracy of the CF–LR model was the highest, reaching 0.884.
Based on the CF model, it can be concluded that the study area is 12,000–14,000 m away from the fault with the elevation of 2,900– 3,100 m; NDVI is 0.4–0.6; profile curvature is 0–5; the fluctuation is 0–10; the silty soil content is 30–40%; TWI at 10–12; when SPI is −4–0 and slope direction is located in the northwest of these geological and geomorphological conditions, debris flow is most likely to occur.
The factor weights calculated by the three methods are biased. The main reasons for this phenomenon are as follows: insufficient research data leads to low accuracy of the model. The subjective factor of AHP is strong; the LR model is prone to underfitting due to insufficient samples. RF is suitable for highlatitude data, and the LR model is more suitable for lowdimension data. But the accuracy of the three coupling methods is more than 0.7, and all of them have good prediction effects.
Under the GIS platform and based on the CF–LR model, the lowprone area accounted for 54.57% of the study area. This indicates that the research area is relatively safe. The proportion of highprone areas was 33.76%, and the remaining mildly prone and mediumprone areas were not large. The danger zone is mainly distributed in the Grand Canyon and the previous river basin, concentrated on the two sides of the river within 2 km.
The CF values of four different kinds of rainfall were brought into the model, and the analysis showed that the annual rainfall threshold of debris flow in the study area was in the range of 600–700 mm, and debris flow erupted intensively when the rainfall reached 700–800 mm. After 800 mm, the proportion of lowprone areas gradually increased, while the proportion of highprone areas gradually decreased.
Research limitations: (1) The establishment of a debris flow database is not sufficient, which affects the accuracy of the model and needs to be improved; (2) this article predicts debris flowprone areas from a macro perspective, considering that the study area is large and sparsely populated, and lacks professional equipment and teams for field investigation; (3) at present, the quantity and type of disaster factors can only be selected based on experience and literature, and there is no detailed classification system. For example, when the number of indicators is large, the AHP is more complex. We will try the BWMMABAC model [30] and Failure mode and effect analysis tool [31] to improve this problem in the future.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (Grant No. U21A20158) and Natural Science Foundation of Tibet Autonomous Region (XZ202201ZY0034G).

Conflict of interest: The authors declare that they have no conflict of interest.
References
[1] Takahashi T. Debris flow. Annu Rev Fluid Mech. 1981;13(1):57–77. 10.1146/annurev.fl.13.010181.000421.Search in Google Scholar
[2] Shugar DH, Jacquemart M, Shean D, Bhushan S, Upadhyay K, Sattar A, et al. A massive rock and ice avalanche caused the 2021 disaster at Chamoli, Indian Himalaya. Science. 2021;373:300–6. 10.1126/science.abh4455.Search in Google Scholar PubMed
[3] Allen SK, Rastner P, Arora M, Huggel C, Stoffel M. Lake outburst and debris flow disaster at Kedarnath, June 2013: Hydrometeorological triggering and topographic predisposition. Landslides. 2016;13(6):1479–91.10.1007/s1034601505843Search in Google Scholar
[4] Hou LG. Review of research on disaster assessment of debris flow. Geol Hazards Environ Prot. 2006;1:45–9 (in Chinese).Search in Google Scholar
[5] Han QH, Ma LN. The economic loss of Sichuan landslide reached 6.89 billion yuan. China Economic Times. 20100823. 8.Search in Google Scholar
[6] Wu C, Guo Y, Su L. Risk assessment of geological disasters in Nyingchi, Tibet. Open Geosci. 2021;13(1):219–32. 10.1515/geo20200208.Search in Google Scholar
[7] Dou J, Tien Bui D P, Yunus A, Jia K, Song X, Revhaug I, et al. Optimization of causative factors for landslide susceptibility evaluation using remote sensing and GIS data in Parts of Niigata, Japan. PLoS One. 2015;10(7):e0133262. 10.1371/journal.pone.0133262.Search in Google Scholar PubMed PubMed Central
[8] Paryani S, Neshat A, Javadi S, Pradhan B. Comparative performance of new hybrid ANFIS models in landslide susceptibility mapping. Nat Hazards. 2020;103:1961–88. 10.1007/s11069020040679.Search in Google Scholar
[9] Liu Y, Eckert CM, Earl C. A review of fuzzy AHP methods for decisionmaking with subjective judgements. Expert Syst Appl. 2020;161:113738. 10.1016/j.eswa.2020.113738.Search in Google Scholar
[10] Zheng Q, Lyu HM, Zhou A, Shen SL. Risk assessment of geohazards along ChengKun railway using fuzzy AHP incorporated into GIS. Geomatics, Nat Hazards Risk. 2021;12(1):1508–31. 10.1080/19475705.2021.1933614.Search in Google Scholar
[11] Zhao H, Yao L, Mei G, Liu T, Ning Y. A fuzzy comprehensive evaluation method based on AHP and entropy for a landslide susceptibility map. Entropy. 2017;19(8):396. 10.3390/e19080396.Search in Google Scholar
[12] Mandal S, Mandal K. Modeling and mapping landslide susceptibility zones using GIS based multivariate binary logistic regression (LR) model in the Rorachu river basin of eastern Sikkim Himalaya, India. Modeling Earth Syst Environ. 2018;4(1):69–88. 10.1007/s4080801804260.Search in Google Scholar
[13] Elkadiri R, Sultan M, Youssef AM, Elbayoumi T, Chase R, Bulkhi AB, et al. A remote sensingbased approach for debrisflow susceptibility assessment using artificial neural networks and logistic regression modeling. IEEE J Sel Top Appl Earth Observations Remote Sens. 2014;7(12):4818–35. 10.1109/jstars.2014.2337273.Search in Google Scholar
[14] Xu W, Yu W, Zhang G. Prediction method of debris flow by logistic model with two types of rainfall: A case study in the Sichuan, China. Nat Hazards. 2011;62(2):733–44. 10.1007/s1106901199880.Search in Google Scholar
[15] Dou J, Yunus AP, Tien Bui D, Merghadi A, Sahana M, Zhu Z, et al. Assessment of advanced random forest and decision tree algorithms for modeling rainfallinduced landslide susceptibility in the IzuOshima Volcanic Island, Japan. Sci Total Environ. 2019;662:332–46. 10.1016/j.scitotenv.2019.01.221.Search in Google Scholar PubMed
[16] Kern AN, Addison P, Oommen T, Salazar SE, Coffman RA. Machine learning based predictive modeling of debris flow probability following wildfire in the intermountain Western United States. Math Geosci. 2017;49(6):717–35. 10.1007/s1100401796812.Search in Google Scholar
[17] Gaag L. A pragmatical view on the certainty factor model. The Netherlands: Utrecht University. 1990. p. 1–8.Search in Google Scholar
[18] Lin RJ, Cao ZX. Analysis on distribution characteristics of debris flow in Xizang Province. Xizang Sci Technol. 2016;4:67–70.Search in Google Scholar
[19] Hosseini S, Lashkaripour GR, Hafezi Moghaddas N, Ghafoori M. Indexing the engineering properties of residual soils in the southern slopes of Mashhad, NE Iran. J Mt Sci. 2020;17(9):2179–202. 10.1007/s1162901957707.Search in Google Scholar
[20] Wei XL, Chen NS, Cheng QG, He N, Deng MF, Tanoli JI. Longterm activiti of earthquakeinduced landslides:A case study from Qionghai Lake Basin, Southwest of China. J Mt Sci. 2014;11(3):607–24.10.1007/s1162901329704Search in Google Scholar
[21] Sorensen R, Seibert J. Effects of DEM resolution on the calculation of topographical indices: TWI and its components. J Hydrol. 2007;347(1–2):79–89.10.1016/j.jhydrol.2007.09.001Search in Google Scholar
[22] Dou J, Tien Bui D, Yunus AP, Jia K, Song X, Revhaug I, et al. Optimization of causative factors for landslide susceptibility evaluation using remote sensing and GIS data in parts of Niigata,Japan. PLoS One. 2015;10(7):e0133262. 10.1371/journal.pone.0133262 Search in Google Scholar PubMed PubMed Central
[23] Ding M, Yang G, Gao Z, Huang T. IOP Conference Series Earth and Environmental Science. Vol. 861; 2021. p. 052017. 10.1088/17551315/861/5/052017.Search in Google Scholar
[24] AlHarbi KMAS. Application of the AHP in project management. Int J Proj Manag. 2001;19(1):19–27.10.1016/S02637863(99)000381Search in Google Scholar
[25] Liu Y, Tang C, Feng Y. The amount of information based on AHP method of geological disaster risk assessment. J Earth Environ. 2013;9(2):173–9. 10.14050/j.carol carroll nki. 16729250.2013.02.015.Search in Google Scholar
[26] Guo X, Cui P, Marchi L, Ge Y. Characteristics of rainfall responsible for debris flows in Wenchuan Earthquake area. Environ Earth Sci. 2017;76(17):596. 10.1007/s126650176940y.Search in Google Scholar
[27] Harrell FE. Binary logistic regression. Regression modeling strategies. Cham: Springer; 2015. p. 219–74.10.1007/9783319194257_10Search in Google Scholar
[28] Tranmer M, Elliot M. Binary logistic regression. Cathie Marsh for census and survey research, paper, 20; 2008.Search in Google Scholar
[29] Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.10.1023/A:1010933404324Search in Google Scholar
[30] Huang G, Xiao L, Pedrycz W, Pamucar D, Zhang G, Martinez L. Design alternative assessment and selection: A novel Zcloud rough numberbased BWMMABAC model. Inf Sci An Int J. 2022;603:149–89.10.1016/j.ins.2022.04.040Search in Google Scholar
[31] Huang G, Xiao L, Pedrycz W, Zhang G, Martinez L, et al. Failure mode and effect analysis using Tspherical fuzzy maximizing deviation and combined comparison solution methods. IEEE Trans Reliab, 2022. 10.1109/TR.2022.3194057.Search in Google Scholar
© 2023 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.