Skip to content
BY 4.0 license Open Access Published by De Gruyter Open Access October 3, 2022

Optimization of interpolation parameters based on statistical experiment

  • Jinming Zhang EMAIL logo
From the journal Open Geosciences

Abstract

Interpolation parameters are the basic elements of an interpolation algorithm in terrain environmental modeling. Different interpolation parameters produce different interpolation precisions, and the differences can be substantial. They are divided into deterministic parameters and uncertainty parameters. The optimization of uncertainty interpolation parameters is studied in this article. First, the methods of cross-validation, correlation analysis, and trend surface analysis are systematically used to identify and determine the optimal interval of interpolation parameters, which is helpful to solve the “black box” problem of parameter selection. Second, the significance degree of different parameters on the interpolation accuracy is given by using analysis of variance, which indicated that the high influence parameters are more helpful to improve the accuracy. This is an innovative conclusion.

1 Introduction

Digital elevation model (DEM) is a mathematically derived representation of the earth’s surface and is an important part of the environment. It is produced by collecting elevation point data and then interpolating those points to a surface [1]. Therefore, as the main method of terrain environment modeling, the interpolation algorithm is one of the main factors affecting the modeling accuracy. To improve the accuracy, many scholars have studied the influence of the interpolation algorithm on the accuracy and the performance effect of the interpolation algorithm under different conditions [2].

Gao used the digital contour data of three terrain units to compare the differences between the weighted average algorithm, minimum curvature algorithm, and Kriging algorithm in terms of the grid size, sampling density, number of nearest neighbors, distance attenuation index, and elevation loss [3]. The experimental results show that when the grid size is large, DEM accuracy is obtained no matter which interpolation algorithm is equivalent. Among them, DEM created by the minimum curvature algorithm is the most sensitive to changes in the grid size. The Kriging algorithm is hardly affected by the number of elevations in the nearest neighborhood, but the weighted average algorithm strictly depends on the number of elevations in the nearest neighborhood. The minimum curvature algorithm is better than the weighted average algorithm and Kriging algorithm when dealing with the uncertainty of terrain due to lack of elevation.

Chaplot et al. evaluated the effects of landform types, the density of original data, and interpolation techniques on the accuracy of DEM generation [4]. The experimental results show that if the variability of spatial structure is not considered, and the sampling density is high enough, there are few significant differences in the existing interpolation algorithms. However, when the sampling density is low, the spatial structure is strong, the coefficient of variation is low, and anisotropy is low, the Kriging algorithm has better estimation accuracy. When the coefficient of variation is low and the spatial structure is low, the regular tension spline algorithm has better estimation accuracy. When the coefficient of variation is high, the spatial structure is strong, and anisotropy is strong, the inverse distance weighted (IDW) algorithm performs better.

Heritage et al. selected five sampling strategies (i.e., Cross-Section, Bar-Outline-Only, Bar-and-Chute-Outline, Bar-and-Chute-Outline-with-Spot, Aerial-LiDAR-Equivalent) and five interpolation algorithms (i.e., IDW, point Kriging, Kriging, minimum curvature, and trigonometric linear division) to quantitatively study their influence on interpolation accuracy [5]. The experimental results show that the trigonometric linear division algorithm and point Kriging algorithm can always provide the best interpolation accuracy for Bar-Outline-Only sampled data. For Aerial-LiDAR-Equivalent sampled data, the interpolation algorithm does not make any difference. For Bar-and-Chute-Outline-with-Spot sampled data, the triangulation linear division algorithm and point Kriging algorithm can always provide better interpolation accuracy.

Habib et al. used deterministic interpolation algorithms such as ANUDEM, IDW, and TIN to investigate the impact of estimation techniques on generating a reliable and accurate DEM suitable for large-scale mapping [6]. The results of the investigations showed that ANUDEM and TIN models are similar and significantly better than those attained from IDW.

The majority of these published works are concentrated on the comparison between different interpolation methods and not on the sensitivity of each method by altering the interpolation parameters. Therefore, there are few studies on how the interpolation parameters of an interpolation algorithm affect accuracy under the same conditions.

Interpolation parameters, including the search mode and interpolation kernel function (IKF), are the basic elements of an interpolation algorithm. Different or even significantly different results are produced when different interpolation parameters are used [7]. However, for most users, it is difficult to select an appropriate interpolation parameter. As a direct result, the spatial interpolation becomes a “black box.” A robust interpolation algorithm should provide comprehensible interpolation parameters or provide the user with as many suggestions regarding the selection of interpolation parameters as possible [8].

Some studies have focused on the error associated with the spatial discretization modes, data sources, and interpolation algorithms [9]. Through experiments, Wang provided the optimal number of search points for various interpolation algorithms [10]. Gao noted that the Kriging algorithm is nearly unaffected by the number of search points, whereas the weighted average algorithms are strictly dependent on the number of search points [3]. Lam noted that a weight factor of 1 or 2 is suitable for IDW [11], whereas Declercq argued that a weight factor of 2 would produce better experimental results [12]. While there is no widely accepted method for determining the smoothing factor in the radial basis function (RBF) algorithm, various methods have been proposed. Hardy [13], Franke [14] and Foley [15] provided an approximate numerical expression method. Rippa proposed an approach that uses a recursive algorithm to search for a parameter that minimizes the global error of the interpolation surface [16]. Aguilar et al. noted that a low smoothing factor should be used in the RBF multiquadric function (MQF) and RBF multilog function (MLF) and that a very large smoothing factor should be used in the RBF natural cubic splines function (NCSF) and RBF thin plate splines function (TPSF) [17]. Yan et al. proposed a novel multi-parameter simultaneous optimization IDW algorithm involving anisotropy, and it can simultaneously optimize the parameters of anisotropy, neighbor size, and distance decay to improve the accuracy of interpolation [18]. Zhu et al. used a moving least squares method to calculate the terrain factors and noted that it is economical to select 10–12 points in the neighborhood [19]. Zhang et al. studied the influence of interpolation parameters on interpolation accuracy; however, the experimental conclusions were largely arbitrary due to the lack of data source selection and the use of research methods [20,21]. Similar research also includes Zhu et al. [22], Zhang et al. [23,24], Sówka et al. [25], and Wu et al. [26].

Most researchers have directly assigned values to the interpolation parameters based on experience. For example, when studying the effects of the geomorphic type, sampling density, and interpolation algorithm on the accuracy of regular grids, Aguilar et al. used RBF and IDW with eight search points [17]. When studying an adaptive IDW algorithm, Lu and Wong defined a “neighborhood” as the local area consisting of five sampling points closest to the interpolation point [27]. Deng et al. proposed a quasigeoid grid network biquadratic interpolation method that takes the eight grid points closest to the interpolation point as the neighborhood [28]. However, selection based on experience inevitably leads to uncertainties in interpolation results or even distorted spatial distribution models, which in turn result in potentially erroneous conclusions [29].

More importantly, Wang et al. proposed the spatial statistics trinity, which describes spatial sampling and inference in terms of population, samples, and inference [30]. First, the overall nature of the research object needs to be considered, such as spatial autocorrelation (SAC), spatial stratified heterogeneity or independence and identically distributed (i.i.d.). Second, different sampling strategies are selected, such as stratified sampling or simple random sampling. Finally, different inference methods are chosen to estimate the unknown data. The theory of spatial statistics trinity is also applicable to the interpolation of terrain. In terms of spatial population, there are various types of terrain, and it is necessary to choose different types to carry out corresponding experiments. In terms of spatial samples, traditional topographic surveying methods follow the process from feature points to contour lines, so the methods of contour sampling and random sampling are need to be considered. In terms of inference method, the SAC of terrain data determines that the kernel function of the interpolation algorithm is a function of distance. Therefore, the discussion would be clear within the framework of the spatial statistics trinity, theoretically and empirically.

Therefore, the main contributions of this work lie in that it focused on the influence of interpolation parameters on accuracy. When choosing a certain algorithm, according to the different interpolation parameters, a more appropriate statistical interval and the degree of significance of different interpolation parameters can be given. In short, the purpose of this study is to provide a suitable interval for the interpolation parameters, to avoid an arbitrary operation and give prompt information as much as possible.

The rest of this article is organized as follows. In Section 2, the meaning of interpolation parameters was identified, and the effects of various interpolation parameters on the accuracy were analyzed. Subsequently, an experimental setup is given. In that, the sparsely distributed discrete sampling points in six regions with various geomorphic types and a Gaussian surface were used as experimental data. The number of search points, search direction, and IKF factor were selected as experimental objects. In Section 3, the statistical intervals for the interpolation parameters were calculated using a series of experimental methods, such as cross-validation, correlation analysis, trend surface analysis, and analysis of variance. Finally, Section 4 concludes the work and gives future directions.

2 Methods

2.1 Interpolation parameters

2.1.1 Search mode

Most interpolation algorithms estimate the elevation of an interpolation point within a local area; specifically, they determine a local area (or neighborhood) centered at the interpolation point and select some sampling points within that neighborhood to complete the interpolation. This process is called the sampling point search mode. Control options in the sampling point search mode include the search shape, search direction, number of search points, search neighborhood radius, and consideration of contour, structure, fracture, or border lines during searching. The search mode determines the selection of sampling points. The number and distribution of the sampling points affect the interpolation accuracy [23].

2.1.2 IKF

The IKF is a mathematical function in the interpolation algorithm that adjusts the influence of the sampling points. It is an important factor affecting interpolation accuracy [31]. All IKFs are explicit or implicit distance functions (Table 1). With respect to geographic implications, an IKF directly or indirectly represents the correlation between two adjacent spatial objects [7]. The IKF is a key factor in an interpolation algorithm and may lead to an increase or decrease in accuracy.

Table 1

IKF of several typical interpolation algorithms

Interpolation method Expression (d for distance) Kernel Function Factor
IDW IDW p(d) = (d + t) −u u for weight factor, t for smoothing factor
RBF MQF φ(d) = (d 2 + c 2)1/2 c for smoothing factor
MLF φ(d) = ln(d 2 + c 2)
TPSF φ(d) = (d 2 + c 2)ln(d 2 + c 2)
NCSF φ(d) = (d 2 + c 2)3/2
OK LINE γ(d) = C 0 + Cd C 0 for nugget, C for sill, a for range
SPHERE γ(d) = C 0 + C(3d/2a − d 3 /2a 2)
EXP γ(d) = C 0 + C(1 − e −d/a )
GAUSS γ(d) = C 0 + C(1 − e −d^2/a^2)

For example, for the OK algorithm, the spatial correlation of the sampling point is expressed by a semivariogram function model. This model provides a quantitative description of the spatial correlation between variables, which directly expresses the spatial variability of topographic relief. As the distance between the sampling point and the interpolation point increases, the variability increases, and the correlation decreases. When the distance increases to a certain extent, the variability gradually becomes stable, and the correlation disappears.

2.1.3 Experimental interpolation parameters

Different interpolation parameters have different performances. Some interpolation parameters will deterministically increase or decrease the interpolation accuracy, which are called the deterministic interpolation parameters. For example, using contour, structure, fracture, or border lines as constraints during interpolation assists in the selection of more reasonably distributed sampling points, thus improving the interpolation accuracy. Selecting sampling points on contour lines other than the one in question leads to a “massive topographic texture” [32]. In addition, there are sudden geomorphic changes between sampling points on opposite sides of the structure, fracture, and border lines; these sampling points inevitably decrease the interpolation accuracy when included in the interpolation [33,34]. Another example is the smoothing factor (t) in IDW, whose primary function is to smooth topography. A nonzero smoothing factor results in a decrease in the elevation of mountaintops, an increase in the elevation of valleys, and ultimately, a decrease in interpolation accuracy.

This article only studies the parameters that have uncertain effects on interpolation accuracy, which are the uncertain interpolation parameters. Because of differences in geomorphic types and sampling point distributions, assigning different values to some interpolation parameters may increase or decrease the interpolation accuracy, for example, the search direction, number of search points (or search radius), weight factor (u) in IDW, the smoothing factor (c) in RBF and the nugget (C 0), sill (C), and range (a) in the semivariogram function model in ordinary kriging (OK). Therefore, these parameters that significantly impact the accuracy of several typical interpolation algorithms were selected as experimental objects.

2.2 Interpolation parameter statistical experiment

2.2.1 Experimental data

Strategies for acquiring sampling point data include sampling along contour lines, regular grid sampling, progressive sampling, selective sampling, and mixed sampling. In practice, a data sampling strategy is selected based on the actual situation [33]. Therefore, without loss of generality, an ASTER GDEM in a 15 km × 15 km experimental area was selected from each of the six regions of representative geomorphic type as basic source data randomly, as shown in Figure 1. The grid resolution of GDEM is 30 m, and the descriptive topographic parameters are summarized in Table 2. At the same time, we sample GDEM along the contour lines, and the discrete sampling points obtained are used as experimental data. Because the impact of source data errors on accuracy is not part of this study, the experimental data were assumed to contain no source errors, which is consistent with the assumptions and requirements of this experiment.

Figure 1 
                     Perspective renderings of different geomorphic types: (a) plain; (b) hill; (c) low mountain; (d) middle mountain; (e) high mountain; and (f) mixed landform.
Figure 1

Perspective renderings of different geomorphic types: (a) plain; (b) hill; (c) low mountain; (d) middle mountain; (e) high mountain; and (f) mixed landform.

Table 2

Major topographic variables

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
Lowest elevation/m 2 33 285 1,235 3,940 575 −93.30
Highest elevation/m 50 354 1,444 2,252 5,769 1,081 108.15
Average slope/(°) 1.27 5.51 13.36 16.63 23.77 9.11 15.19

Meanwhile, to overcome the source error of on-site data, the experiment chooses an ideal mathematical surface – Gaussian surface as the experimental object – and selects discrete sampling points according to the random sampling method of subregions. In this way, these discrete sampling points used as experimental data will not contain any error, and they are the real “true values.”

The formula for Gaussian surface is shown below:

(1) z = A 1 x m 2 e x m 2 y n + 1 2 B 0.2 x m x m 3 y n 5 e x m 2 y n 2 C e x m + 1 2 y n 2 ,

where A = 90, B = 300, C = 9, and m = n = 500; the x and y coordinates of the generated Gaussian surface range between −500 and 500, as shown in Figure 2.

Figure 2 
                     Perspective rendering of Gaussian surface: (a) three-dimensional perspective; (b) two-dimensional contour map; and (c) randomly selected scattered points.
Figure 2

Perspective rendering of Gaussian surface: (a) three-dimensional perspective; (b) two-dimensional contour map; and (c) randomly selected scattered points.

2.2.2 Experimental objects

The relevant interpolation parameters in IDW, RBF, and OK were selected as study objects. The experimental values selected for the interpolation parameters are summarized in Table 3.

Table 3

Interpolation parameters and experimental values

Interpolation algorithm Search direction Number of search points Kernel function factor
IDW IDW D = Unrestricted, Four-direction, Eight-direction P = 1, 2, 4, 8, 16 u = 1, 2, 3, 4, 5
RBF MQF c = 0, 20, 40, 60, 80, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000
MLF
TPSF
NCSF
OK LINE Experimental fit
SPHERE
EXP
GAUSS

The search direction includes the directionally unrestricted search (denoted as “D0”), four-direction search (denoted as “D1”), and eight-direction search (denoted as “D2”). A directionally unrestricted search means that the direction is not taken into consideration and that the search process continues until the algorithm locates the preset number of sampling points. In the four-direction search, the neighborhood is evenly divided into four regions, with the interpolation point at the center. The same number of sampling points is selected in each region. The approach used in the eight-direction search is similar to that in the four-direction search.

The number of search points was set to the number of sampling points (1, 2, 4, 8, and 16) in each direction. A four-direction search with one search point means that there is one sampling point in each direction and that there are a total of four search points. The number of search points in a directionally unrestricted search is the same as that in a four-direction search.

The IKF factor in each selected interpolation algorithm was set to various experimental values. The weight factor (u) in IDW was set to 1, 2, 3, 4, and 5. The smoothing factor (c) in RBF was set to 0, 20, 40, 60, 80, 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1,000. The nugget (C 0), partial sill (C), and range (a) in OK were set to experimentally fitted values.

2.2.3 Experimental process and methods

The flow of the interpolation parameter statistical experiment is shown below.

Step 1: Experimental data acquisition

Data from a 15 km × 15 km area in each of the six regions with different geomorphic types were obtained from 30 m ASTER GDEM and subsequently discretized to use as experimental data. At the same time, the Gaussian surface was selected as another experimental data.

Step 2: Statistical analysis of the global residual root-mean-square error (RMSE) by cross-validation

Cross-validation is an experimental method that is independent of the user and grid size and that enables the user to focus on the interpolation parameters.

Based on the experimental values for the interpolation parameters in the algorithm, the residual at each sampling point was calculated by cross-validation. The global residual RMSES were statistically analyzed. On this basis, the statistical intervals of an interpolation parameter were determined using various analytical methods.

Step 3: Selection of the optimal search direction using the correlation analysis method based on the global residual RMSE

Correlation analysis is the most basic classical statistical method for quantitatively determining the correlation between factors and the quantitative relation between fitting variables from a statistical analysis perspective. The correlation between factors is primarily determined by calculating and testing the correlation coefficient (r). Generally, when r is greater than 0.8, there is a relatively strong linear correlation between the two variables; when r is less than 0.3, there is a relatively weak correlation between the two variables [35].

In this experiment, the correlation between RMSE was calculated for various search directions, and the impact of each search direction on the interpolation accuracy was determined. In this way, an optimal search direction was determined.

Step 4: Validation of the intervals for interpolation parameters using trend surface analysis based on global residual RMSE

Trend surface analysis uses a mathematical function to simulate the spatial distributions of factors and their trends. Trend surface analysis simulates the spatial distribution patterns of factors and determines their spatial trends through fitting with two-dimensional nonlinear functions using the least-squares method and regression analysis [36].

Assuming that there are a series of sampling points ( x i , y i , z i ) ( i = 1 , 2 , , n ) , a regression equation z = f ( x , y ) is fitted to satisfy the following condition:

(2) Q = i n [ z i f ( x i , y i ) ] 2 min ,

where f is a polynomial function. Any function can be approximated using a polynomial function within a specified range because the degree of the polynomial function can be adjusted as needed. The F-test was used in this study to test the goodness of fit of the nonlinear function obtained by fitting, which directly affects the trend surface analysis results.

In this experiment, trend surface analysis was used to analyze the impact of various interpolation parameters on the interpolation accuracy and to determine the optimal ranges of the interpolation parameters from a spatial distribution trend perspective.

Step 5: Investigation of the significant difference between the interpolation parameters using ANOVA based on global residual RMSE

ANOVA is used to test the significance of a difference in the mean between two or more samples, to assess the contribution of changes in various factors to the total variance and to analyze whether control factors at various levels have a significant impact on the results. In ANOVA, the p-value is an index often used to determine the degree of influence of a control factor. For a given confidence level of 0.05, when the calculated p-value is less than or equal to 0.05, the control factor is considered to have a significant impact on the experimental results [37].

In this experiment, the degree of significance of each interpolation parameter on interpolation accuracy was analyzed using ANOVA. The results provide information to effectively improve the interpolation results.

3 Results

Based on the description in Section 2, the statistical value intervals of an interpolation parameter of several typical interpolation algorithms were determined using a series of experimental methods, as shown in Table 4. In this study, based on the characteristics of the interpolation algorithm, these ranges were analyzed and validated to provide better guidance for the application of DEM.

Table 4

The value intervals of interpolation parameters

Interpolation algorithm Number of search points Search direction Kernel function factor Degree of significant
IDW IDW P = 4–8 D = four-direction u = 1–2 WF > NoSP > SD
RBF MQF P = 16–24 D = four-direction c = 1,000 NoSP > SF > SD
MLF P = 16–24 D = four-direction c = 1,000 NoSP > SF > SD
TPSF P >32 D = four-direction c = 0 SF > NoSP > SD
NCSF P >32 D = four-direction c = 0 SF > NoSP > SD
OK LINE P = 8–16 D = Unrestricted Smaller nugget SvF > NoSP > SD
SPHERE P = 8–16
EXP P = 8–16
GAUSS P = 8–16

NoSP, number of search points; SD search direction; WF, weight factor; SF, smoothing factors; SvF, semivariogram function.

These experimental results can be applied to any scientific field related to spatial interpolation, such as hydrology, geology, and earth sciences. They provide the optimal value interval of interpolation parameters for interpolation applications from a statistical point of view. Using the value interval of these parameters can avoid the randomness of the interpolation algorithm to a large extent and ensure a reasonable range of interpolation accuracy. Furthermore, the significant influence of the interpolation parameters indicates that under the same conditions, which parameters may be modified to achieve better interpolation accuracy.

3.1 IDW

The IDW is a precise interpolation algorithm. With this algorithm, the maximum and minimum values generated by interpolation only occur at the sampling points [38].

During the experiment, the values of the number of search points and weight factor were fixed, and the effects of a directionally unrestricted search, a four- and eight-direction search, on the interpolation accuracy were studied. According to the results, r between the RMSE for any two directions with any experimental data is greater than 0.83, as shown in Table 5 (“D1 × D2” signifies the r between the RMSE calculated for four- and eight-direction searches). In particular, the correlation coefficients for the four- and eight-direction searches are greater than 0.99. Therefore, it can be considered that four- and eight-direction searches have similar effects on the interpolation accuracy. Therefore, based on a comprehensive consideration of the interpolation efficiency and the accuracy of the interpolation algorithms, four-direction search should be used as much as possible.

Table 5

Values of r between different search directions (based on IDW)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 × D1 0.959 0.990 0.929 0.952 0.877 0.978 0.9442
D0 × D2 0.958 0.992 0.908 0.940 0.836 0.976 0.9924
D1 × D2 0.998 0.999 0.998 0.999 0.996 0.999 0.9671

Furthermore, the trend surface analysis method was used to study the influence trend of the number of search points and weight factor on the interpolation accuracy from the perspective of a continuous spatial distribution. The number of search points was taken as the x-axis variable, the weight factor was taken as the y-axis variable, and the RMSE was taken as the z-axis variable, as shown in Table 6 (only the experimental results for the four- and eight-direction search are listed). First, according to the experimental results, the coefficient of determination (R 2) of the trend surface function constructed by the number of search points and weight factor is above 0.99. Based on the F-distribution table, F 0.05(9,15) = 2.59, which is lower than the calculated F, suggesting that the cubic polynomial trend surface fitting function is significant at the 0.05 confidence level. Second, the trend surface fitting effect diagram clearly shows that the experimental region changes from plains, hills, mixed, low mountains, then to high mountains, the terrain gradually becomes more complex, and the changes in the number of search points and weight factor gradually focus, showing a certain pattern in Table 6; the result of Gaussian surface also follows this trend. The smaller interpolation error is concentrated between the areas where the number of search points per direction is 1–2 and the weight factor is 1–2. Combining the characteristics of IDW algorithm, as the number of search points increases, the difference in elevation between the sampling points gradually increases, which ultimately results in a decrease in interpolation accuracy. And a relatively small number of search points should be selected. As the weight factor increases, the impact of each sampling point on the weight of the interpolation point increases, which is manifested as a significant increase in the contribution of relatively closer sampling points and a gradual decrease in the contribution of relatively distant sampling points. Therefore, a relatively large weight factor is not ideal. The conclusions from the trend surface analysis are consistent with the previous analysis results.

Table 6

Results of the trend surface analysis between the number of search points, weight factor, and RMSE (based on IDW)

Geomorphic type Item Four-direction search Eight-direction search
Plain Fitting rendering
F 1291.490 849.538
p 0 0
R 2 0.997 0.996
Hill Fitting rendering
F 2548.880 1696.616
p 0 0
R 2 0.999 0.998
Low mountain Fitting rendering
F 1235.521 3203.684
p 0 0
R 2 0.997 0.999
Middle mountain Fitting rendering
F 1620.698 3404.377
p 0 0
R 2 0.998 0.999
High mountain Fitting rendering
F 998.270 2439.738
p 0 0
R 2 0.997 0.999
Mixed Fitting rendering
F 1895.928 3368.690
p 0 0
R 2 0.998 0.999
Gaussian surface Fitting rendering
F 447.368 504.263
p 0 0
R 2 0.996 0.997

More importantly, ANOVA was used to analyze the significance of the weight factor, number of search points, and search direction on interpolation accuracy at the confidence level of 0.05. From Table 7, if the impact of the weight factor and the number of search points on the interpolation accuracy is considered, then the weight factor, the number of search points, and their interaction have a significant impact on the interpolation accuracy (i.e., p < 0.05). If the impact of the number of search points and search direction on the interpolation accuracy is considered, then the number of search points has a significant impact on the interpolation accuracy (taking the “Hill” region as an example, p = 0.017 < 0.05), whereas the search direction and its interaction with the number of search points have no significant impact on interpolation accuracy. If the impact of the weight factor and search direction on interpolation accuracy is considered, then the weight factor has a significant impact on the interpolation accuracy (i.e., p < 0.05), whereas the search direction and its interaction with the weight factor have no significant impact on the interpolation accuracy (also take the “Hill” region as an example, p = 0.892 > 0.05). Sometimes, p of the interaction between the weight factor and search direction is 0 or less than 1, but p of the interaction between the number of search points and search direction is always 1, which indicates that the effect of the number of search points on the interpolation accuracy is less than the impact of the weight factor. Therefore, the results show that of the three parameters (weight factor, number of search points, and search direction), the weight factor has the most significant impact on the interpolation accuracy, followed by the number of search points and search direction. Thus, properly altering the weight factor can significantly improve the interpolation accuracy.

Table 7

Significant impacts of the weight factor, number of search points, and search direction on interpolation accuracy (based on IDW)

Geomorphic type Error sources Sum of squares (SS) Degree-of-freedom (DF) Mean square (MS) F p
Plain NoSP × NoSP 0.264 4 0.066 3.200 0.019
SD × SD 0.086 2 0.043 2.073 0.135
NoSP × SD 0.005 8 0.001 0.028 1.000
NoSP × NoSP 0.264 4 0.066 22.735 0.000
WF × WF 0.957 4 0.239 82.362 0.000
NoSP × WF 0.226 16 0.014 4.866 0.000
WF × WF 0.423 4 0.106 12.725 0.000
SD × SD 0.010 2 0.005 0.597 0.554
WF × SD 0.662 8 0.083 9.965 0.000
Hill NoSP × NoSP 13.018 4 3.255 3.297 0.017
SD × SD 0.955 2 0.478 0.484 0.619
NoSP × SD 0.019 8 0.002 0.002 1.000
NoSP × NoSP 13.018 4 3.255 99.062 0.000
WF × WF 42.936 4 10.734 326.724 0.000
NoSP × WF 15.627 16 0.977 29.728 0.000
WF × WF 13.019 4 3.255 6.806 0.000
SD × SD 0.110 2 0.055 0.115 0.892
WF × SD 31.402 8 3.925 8.208 0.000
Low mountain NoSP × NoSP 42.735 4 10.684 7.036 0.000
SD × SD 8.462 2 4.231 2.786 0.070
NoSP × SD 0.020 8 0.003 0.002 1.000
NoSP × NoSP 42.735 4 10.684 33.362 0.000
WF × WF 31.765 4 7.941 24.798 0.000
NoSP × WF 51.810 16 3.238 10.112 0.000
WF × WF 31.765 4 7.941 5.026 0.002
SD × SD 8.462 2 4.231 2.678 0.077
WF × SD 7.302 8 0.913 0.578 0.792
Middle mountain NoSP × NoSP 53.908 4 13.477 6.839 0.000
SD × SD 8.121 2 4.061 2.061 0.136
NoSP × SD 0.008 8 0.001 0.001 1.000
NoSP × NoSP 53.908 4 13.477 44.625 0.000
WF × WF 48.735 4 12.184 40.343 0.000
NoSP × WF 62.523 16 3.908 12.939 0.000
WF × WF 48.735 4 12.184 6.267 0.000
SD × SD 8.121 2 4.061 2.089 0.133
WF × SD 6.763 8 0.845 0.435 0.895
High mountain NoSP × NoSP 48.557 4 12.139 6.867 0.000
SD × SD 14.014 2 7.007 3.964 0.024
NoSP × SD 0.020 8 0.003 0.001 1.000
NoSP × NoSP 48.557 4 12.139 21.891 0.000
WF × WF 30.141 4 7.535 13.588 0.000
NoSP × WF 62.233 16 3.890 7.014 0.000
WF × WF 30.141 4 7.535 4.070 0.006
SD × SD 14.014 2 7.007 3.785 0.028
WF × SD 13.423 8 1.678 0.906 0.517
Mixed NoSP × NoSP 31.726 4 7.931 5.756 0.001
SD × SD 3.404 2 1.702 1.235 0.298
NoSP × SD 0.008 8 0.001 0.001 1.000
NoSP × NoSP 31.726 4 7.932 66.989 0.000
WF × WF 44.972 4 11.243 94.959 0.000
NoSP × WF 35.198 16 2.200 18.580 0.000
WF × WF 44.972 4 11.243 10.060 0.000
SD × SD 3.404 2 1.702 1.523 0.226
WF × SD 2.386 8 0.298 0.267 0.974
Gaussian surface NoSP × NoSP 81.621 4 20.405 4.799 0.002
SD × SD 11.029 2 5.515 1.297 0.281
NoSP × SD 3.315 8 0.414 0.097 0.999
NoSP × NoSP 81.621 4 20.405 30.535 0.000
WF × WF 109.638 4 27.409 41.016 0.000
NoSP × WF 126.436 16 7.902 11.825 0.000
WF × WF 109.638 4 27.409 7.618 0.000
SD × SD 11.029 2 5.515 1.533 0.224
WF × SD 14.55 8 1.819 0.505 0.848

3.2 RBF

Similarly, RBF is also a precise interpolation algorithm. However, there is a difference between RBF and IDW algorithms. The IDW algorithm is unable to calculate elevations higher or lower than the sampling points, whereas RBF interpolation algorithms are able to do so [39].

During the experiment, the values of the number of search points and smoothing factor were fixed, and the effects of the directionally unrestricted search, four-direction search, and eight-direction search on the interpolation accuracy were studied. According to the experimental results, r with MQF between the RMSES for any two directions is greater than 0.89, as shown in Table 8. In particular, the correlation coefficients for the four- and eight-direction searches are greater than 0.98. MQF and MLF results are similar, with the latter performing better than the former, as shown in Table 9. Therefore, four- and eight-direction searches can be assumed to have little effect on the interpolation accuracy.

Table 8

Values of r between different search directions (based on MQF)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 × D1 0.932 0.995 0.997 0.997 0.996 0.997 0.954
D0 × D2 0.891 0.987 0.992 0.992 0.990 0.993 0.957
D1 × D2 0.989 0.995 0.998 0.997 0.997 0.997 0.999
Table 9

Values of r between different search directions (based on MLF)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 × D1 0.953 0.996 0.998 0.998 0.997 0.998 0.939
D0 × D2 0.926 0.991 0.995 0.995 0.995 0.996 0.952
D1 × D2 0.996 0.999 0.999 0.999 0.999 0.999 0.998

TPSF and NCSF experimental results are similar. However, during the interpolation process, due to the influence of the number of search points and smoothing factor, the interpolation result may include an outlier. This outlier leads to a poor correlation between different search directions and even negative correlations, as shown in Tables 10 and 11.

Table 10

Values of r between different search directions (based on TPSF)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 × D1 0.051 0.418 0.585 0.303 0.200 0.087 0.181
D0 × D2 0.222 0.366 0.050 0.183 0.295 0.132 0.009
D1 × D2 0.013 0.128 0.199 0.576 0.822 0.060 0.150
Table 11

Values of r between different search directions (based on NCSF)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 ×D1 0.010 0.374 0.421 0.086 0.053 0.217 0.101
D0 × D2 −0.062 0.236 0.251 0.107 0.198 0.022 0.030
D1 × D2 −0.033 0.080 0.456 0.218 0.162 0.439 0.024

The differences in RMSE corresponding to the various search directions were analyzed, and the trend of the error differences was determined. The interpolation accuracy when using the four- and eight-direction search is higher than that of the directionally unrestricted search in most cases for MQF and MLF. However, as the smoothing factor and the number of search points increase, the difference in the influence of the search direction on the interpolation accuracy gradually decreases, indicating that the influence of the search direction on the difference in interpolation accuracy is weakened.

For TPSF and NCSF, the interpolation results may include an outlier due to the number of search points and smoothing factors in the interpolation process. If the number of search points and smoothing factors that generate abnormal values are limited, when the smoothing factor is equal to 0, the number of search points is greater than 32, and the interpolation accuracy of the four- and eight-direction search is better than that of the directionally unrestricted search.

The characteristics of the RBF kernel function, which is a typical symmetric distance function, were considered. Using a relatively small number of search directions may lead to lower interpolation accuracy. For MQF, according to the experimental results, as the number of search points increases, most RMSES gradually become stable. As different values of the smoothing factor are considered, the influence of the number of search points on the interpolation accuracy is inconsistent. When the smoothing factor is 0, the number of search points is the main factor affecting the interpolation accuracy, and the relationship between RMSE and the number of search points gradually decreases and tends to become stable. Decreasing the number of search points cannot reflect local terrain features, resulting in a large interpolation error. As the number of search points increases to a certain extent, the interpolation error reaches a steady state. As the number of search points increases, the influence on the interpolation result is not greatly affected. When the smoothing factor is not 0, the influence of the number of search points on the interpolation error is clearly affected by the smoothing factor, and eventually, an abrupt change in error is generated. More importantly, the relationship between the number of search points and RMSE is basically the same in the other experimental regions. Therefore, the optimal value interval is 4–6 search points or more than 16 search points per direction in MQF (a smaller number of search points is more appropriate if considering the interpolation efficiency). MLF results are similar to MQF, although the influence of the number of search points on the interpolation accuracy is more regular for MLF than for MQF. However, the results of TPSF and NCSF are not similar to those of MQF and MLF. When the number of search points is small, TPSF and NCSF produce significant numerical instability. As the smoothing factor increases, this numerical instability disappears. When the number of search points is large (>8 per direction), the numerical instability of TPSF and NCSF disappears, although a good interpolation result is obtained. As the smoothing factor increases, the numerical instability gradually appears again. Therefore, a larger number of search points (>8 per direction) is a better choice for TPSF and NCSF.

Furthermore, the trend surface analysis method was used to study the significance of the number of search points and weight factor on DEM interpolation accuracy from the perspective of a continuous spatial distribution. The number of search points was taken as the x-axis variable, the smoothing factor was taken as the y-axis variable, and RMSE was taken as the z-axis variable. Due to the numerical instability of TPSF and NCSF, only MQF and MLF experimental results were analyzed. First, according to the experimental results, MQF was taken as an example, as shown in Table 12 (only the experimental results for the four- and eight-direction searches are listed). The trend surface function of the number of search points and smoothing factor is significant in different experimental regions, and R 2 is above 0.8. Additionally, based on the F-distribution table, F 0.05(9,15) = 1.96, which is lower than the calculated F, suggesting that the cubic polynomial trend surface fitting function is significant at the 0.05 confidence level. Second, similar to IDW, the trend surface fitting effect diagram clearly shows that the experimental region changes from plains, hills, mixed, low mountains, then to high mountains, the terrain gradually becomes more complex, and the changes in the number of search points and smoothing factors gradually focus, showing a certain pattern in Table 12. There are several parameter regions with a lower interpolation error. For MQF, for example, with 4–6 search points per direction, the smoothing factor is close to zero; with approximately 16 search points per direction, the smoothing factor is close to zero; and with 4–6 search points per direction, the smoothing factor is 900–1,000. These three areas may be the areas where the interpolation parameters are more reasonable.

Table 12

Results of the trend surface analysis between the number of search points, smoothing factor, and RMSE (based on MQF)

Geomorphic type Item Four-direction search Eight-direction search
Plain Fitting rendering
F 52.780 204.012
p 0 0
R 2 0.812 0.944
Hill Fitting rendering
F 60.408 51.831
p 0 0
R 2 0.832 0.809
Low mountain Fitting rendering
F 73.996 58.546
p 0 0
R 2 0.858 0.827
Middle mountain Fitting rendering
F 70.379 59.146
p 0 0
R 2 0.852 0.829
High mountain Fitting rendering
F 73.166 62.198
p 0 0
R 2 0.857 0.836
Mixed Fitting rendering
F 67.734 56.406
p 0 0
R 2 0.847 0.822
Gaussian surface Fitting rendering
F 41.785 238.811
p 0 0
R 2 0.823 0.971

However, even if the interpolation accuracy is low, the display effect of constructing with different interpolation parameters is still quite different. When the number of search points is small and the smoothing factor is small (e.g., Figure 3(a), (b), (e), and (f)), there are some “small pits” and a big difference from the original data, which should be caused by the small number of search points. When the number of search points is small and the smoothing factor is large, or the number of search points is large and the smoothing factor is small, the DEM obtained by modeling is more consistent with the original data, because the number of search points and smoothing factor are coordinated to eliminate their respective shortcomings. Therefore, if two aspects of modeling effect and modeling efficiency are considered reasonable, choosing a smaller number of search points (e.g., 4–6 search points per direction) and a larger smoothing factor (e.g., the smoothing factor is 900–1,000) is a more economical choice.

Figure 3 
                  Renderings of DEM created based on different interpolation parameters: (a) P = 16, c = 0; (b) P = 16, c = 100; (c) P = 16, c = 400; (d) P = 16, c = 1,000; (e) P = 8, c = 0; (f) P = 16, c = 0; (g) P = 32, c = 0; and (h) P = 64, c = 0.
Figure 3

Renderings of DEM created based on different interpolation parameters: (a) P = 16, c = 0; (b) P = 16, c = 100; (c) P = 16, c = 400; (d) P = 16, c = 1,000; (e) P = 8, c = 0; (f) P = 16, c = 0; (g) P = 32, c = 0; and (h) P = 64, c = 0.

These experimental results also appear on the Gaussian surface. From the graphical performance, it is more similar to the experimental results where it is gradually approaching on-site terrain. When the search direction is eight-direction, there is one point in each direction and eight search points in all. At this time, a larger smoothing factor can also achieve good interpolation accuracy. However, it can be predicted that as the number of search points decreases, the interpolation error will gradually increase, gradually approaching the experimental results of the “Plain” terrain. This shows that the experimental results are similar regardless of whether it is an actual terrain or an ideal surface.

For MLF, the experimental results are similar to MQF, and it has the same optimal value interval, as shown in Table 13.

Table 13

Results of the trend surface analysis between the number of search points, smoothing factor, and RMSE (based on MLF)

Geomorphic type Item Four-direction search Eight-direction search
Plain Fitting rendering
F 91.380 101.247
p 0 0
R 2 0.882 0.892
Hill Fitting rendering
F 142.923 91.333
p 0 0
R 2 0.921 0.882
Low mountain Fitting rendering
F 158.316 145.491
p 0 0
R 2 0.928 0.923
Middle mountain Fitting rendering
F 162.745 159.789
p 0 0
R 2 0.930 0.929
High mountain Fitting rendering
F 159.398 162.691
p 0 0
R 2 0.929 0.930
Mixed Fitting rendering
F 165.062 154.167
p 0 0
R 2 0.931 0.927
Gaussian surface Fitting rendering
F 48.229 404.818
p 0 0
R 2 0.870 0.983

Moreover, ANOVA was used to analyze the significance of the smoothing factor, number of search points, and search direction on interpolation accuracy at the confidence level of 0.05. Table 14 shows the results of the analysis of variance based on MLF; Table 15 shows the results of the analysis of variance based on TPSF.

Table 14

Significant effects of the smoothing factor, number of search points, and search direction on interpolation accuracy (based on MLF)

Geomorphic type Error sources Sum of squares (SS) Degree of freedom (DF) Mean square (MS) F p
Plain NoSP × NoSP 0.401 4 0.100 20.033 0.000
SD × SD 0.099 2 0.050 9.961 0.000
NoSP × SD 0.020 8 0.003 0.506 0.851
NoSP × NoSP 0.401 4 0.100 107.329 0.000
SF × SF 0.995 14 0.071 76.151 0.000
NoSP × SF 0.035 56 0.001 0.668 0.958
SF × SF 11.850 14 0.846 17.915 0.000
SD × SD 1.612 2 0.806 17.055 0.000
SF × SD 0.013 28 0.001 0.010 1.000
Hill NoSP × NoSP 0.374 4 0.094 41.736 0.000
SD × SD 0.006 2 0.003 1.233 0.294
NoSP × SD 0.001 8 0.000 0.055 0.100
NoSP × NoSP 0.374 4 0.094 1587.039 0.000
SF × SF 0.419 14 0.030 507.385 0.000
NoSP × SF 0.050 56 0.001 15.046 0.000
SF × SF 0.419 14 0.030 12.626 0.000
SD × SD 0.006 2 0.003 1.167 0.314
SF × SD 0.001 28 0.000 0.012 1.000
Low mountain NoSP × NoSP 0.258 4 0.064 38.210 0.000
SD × SD 0.002 2 0.001 0.593 0.554
NoSP × SD 0.001 8 0.000 0.057 0.100
NoSP × NoSP 0.258 4 0.064 1506.009 0.000
SF × SF 0.245 14 0.018 408.567 0.000
NoSP × SF 0.106 56 0.002 44.136 0.000
SF × SF 0.245 14 0.018 8.596 0.000
SD × SD 0.002 2 0.001 0.492 0.612
SF × SD 0.002 28 0.000 0.033 1.000
Middle mountain NoSP × NoSP 0.161 4 0.040 45.895 0.000
SD × SD 0.001 2 0.000 0.454 0.635
NoSP × SD 0.001 8 0.000 0.065 0.999
NoSP × NoSP 0.161 4 0.040 1931.601 0.000
SF × SF 0.108 14 0.008 370.424 0.000
NoSP × SF 0.074 56 0.001 63.615 0.000
SF × SF 0.108 14 0.008 5.872 0.000
SD × SD 0.001 2 0.000 0.303 0.739
SF × SD 0.001 28 0.000 0.026 1.000
High mountain NoSP × NoSP 0.045 4 0.011 40.079 0.000
SD × SD 0.000 2 0.000 0.444 0.642
NoSP × SD 0.000 8 0.000 0.100 0.999
NoSP × NoSP 0.045 4 0.011 1006.899 0.000
SF × SF 0.028 14 0.002 177.873 0.000
NoSP × SF 0.030 56 0.001 47.820 0.000
SF × SF 0.028 14 0.002 4.713 0.000
SD × SD 0.000 2 0.000 0.296 0.744
SF × SD 0.000 28 0.000 0.038 1.000
Mixed NoSP × NoSP 0.336 4 0.084 55.283 0.000
SD × SD 0.003 2 0.001 0.857 0.426
NoSP × SD 0.001 8 0.000 0.078 0.999
NoSP × NoSP 0.336 4 0.084 2076.245 0.000
SF × SF 0.220 14 0.016 388.034 0.000
NoSP × SF 0.097 56 0.002 42.720 0.000
SF × SF 0.220 14 0.016 6.498 0.000
SD × SD 0.003 2 0.001 0.539 0.584
SF × SD 0.001 28 0.000 0.020 1.000
Gaussian surface NoSP × NoSP 1350.329 4 337.582 37.863 0.000
SD × SD 177.353 2 88.677 9.946 0.000
NoSP × SD 804.4593 8 100.557 11.278 0.000
NoSP × NoSP 1350.329 4 337.582 45.61 0.000
SF × SF 1198.796 14 85.628 11.569 0.000
NoSP × SF 545.1227 56 9.734 1.315 0.098
SF × SF 1198.796 14 85.628 5.525 0.000
SD × SD 177.353 2 88.677 5.721 0.004
SF × SD 38.5286 28 1.376 0.089 1.000
Table 15

Significant effects of the smoothing factor, number of search points, and search direction on interpolation accuracy (based on TPSF)

Geomorphic type Error sources Sum of squares (SS) Degree of freedom (DF) Mean square (MS) F p
Plain NoSP × NoSP 432,000,000 4 108,000,000 1.021 0.398
SD × SD 200,000,000 2 99,962,336 0.944 0.391
NoSP × SD 851,000,000 8 106,000,000 1.005 0.434
NoSP × NoSP 0.493 4 0.123 8.018 0.000
SF × SF 7.183 14 0.513 33.383 0.000
NoSP × SF 2.750 56 0.049 3.195 0.000
SF × SF 7.183 14 0.513 21.608 0.000
SD × SD 0.411 2 0.206 8.653 0.000
SF × SD 0.863 28 0.031 1.299 0.157
Hill NoSP × NoSP 0.078 4 0.020 1.125 0.346
SD × SD 0.001 2 0.001 0.032 0.969
NoSP × SD 0.002 8 0.000 0.016 1.000
NoSP × NoSP 0.078 4 0.020 78.133 0.000
SF × SF 2.528 14 0.181 720.091 0.000
NoSP × SF 1.096 56 0.020 78.068 0.000
SF × SF 2.528 14 0.181 27.050 0.000
SD × SD 0.001 2 0.001 0.083 0.921
SF × SD 0.010 28 0.000 0.051 1.000
Low mountain NoSP × NoSP 0.065 4 0.016 1.767 0.137
SD × SD 0.000 2 0.000 0.000 1.000
NoSP × SD 0.001 8 0.000 0.008 1.000
NoSP × NoSP 0.065 4 0.016 102.842 0.000
SF × SF 1.026 14 0.073 461.103 0.000
NoSP × SF 0.893 56 0.016 100.364 0.000
SF × SF 1.026 14 0.073 13.571 0.000
SD × SD 0.000 2 0.000 0.000 1.000
SF × SD 0.010 28 0.000 0.069 1.000
Middle mountain NoSP × NoSP 0.062 4 0.016 2.239 0.066
SD × SD 0.000 2 0.000 0.029 0.972
NoSP × SD 0.000 8 0.000 0.006 1.000
NoSP × NoSP 0.062 4 0.016 161.912 0.000
SF × SF 0.602 14 0.043 446.292 0.000
NoSP × SF 0.847 56 0.015 157.040 0.000
SF × SF 0.602 14 0.043 8.446 0.000
SD × SD 0.000 2 0.000 0.040 0.961
SF × SD 0.007 28 0.000 0.051 1.000
High mountain NoSP × NoSP 0.034 4 0.009 2.539 0.041
SD × SD 0.000 2 0.000 0.064 0.938
NoSP × SD 0.001 8 0.000 0.020 1.000
NoSP × NoSP 0.034 4 0.009 75.004 0.000
SF × SF 0.230 14 0.016 144.667 0.000
NoSP × SF 0.458 56 0.008 72.098 0.000
SF × SF 0.230 14 0.016 5.883 0.000
SD × SD 0.000 2 0.000 0.077 0.926
SF × SD 0.007 28 0.000 0.083 1.000
Mixed NoSP × NoSP 0.065 4 0.016 1.604 0.175
SD × SD 0.000 2 0.000 0.002 0.998
NoSP × SD 0.000 8 0.000 0.004 1.000
NoSP × NoSP 0.065 4 0.016 125.144 0.000
SF × SF 1.097 14 0.078 604.770 0.000
NoSP × SF 1.007 56 0.018 138.784 0.000
SF × SF 1.097 14 0.078 13.031 0.000
SD × SD 0.000 2 0.000 0.003 0.997
SF × SD 0.009 28 0.000 0.052 1.000
Gaussian surface NoSP × NoSP 27,919,460 4 6,979,865 1.562 0.186
SD × SD 7,012,385 2 3,506,193 0.785 0.458
NoSP × SD 31,250,799 8 3,906,350 0.874 0.539
NoSP × NoSP 27,919,460 4 6,979,865 1.572 0.185
SF × SF 63,952,668 14 456,8048 1.029 0.028
NoSP × SF 246,721,896 56 4,405,748 0.992 0.000
SF × SF 63,952,668 14 4,568,048 1.016 0.039
SD × SD 7,012,385 2 3,506,193 0.780 0.460
SF × SD 124,628,986 28 4,451,035 0.990 0.485

MQF and MLF have similar experimental results. The results of ANOVA in Table 14 show that the search direction has no significant effect on the interpolation accuracy (i.e., p > 0.05), and the number of search points and smoothing factor have a significant impact on the interpolation accuracy (i.e., p < 0.05). Who has a more significant influence? Is it the number of search points or the smoothing factor? If we simply observe the number of search points, the smoothing factors, and their interaction, it seems that you cannot obtain a corresponding conclusion, because their p-values are all equal to 0, indicating that they all have a significant influence. However, if we analyze the significant influence of “the number of search points and search direction” and “the smoothing factor and search direction” on the interpolation accuracy. Take “Hill” as an example, when the search direction has no significant influence on the interpolation accuracy (i.e., p = 0.294 > 0.05 and p = 0.314 > 0.05), the interaction of the number of search points and search direction (p = 0.100 > 0.05) is smoother than the smoothing factor and search direction (p = 1.000 > 0.05), it indicates that the number of search points has a greater suppressing effect on search direction. In other words, the number of search points has a greater significant impact on the interpolation accuracy.

TPSF and NCSF also have similar experimental results. The results of ANOVA in Table 15 show that the search direction has no significant effect on the interpolation accuracy (i.e., p > 0.05), and the number of search points and smoothing factor have a significant impact on the interpolation accuracy (i.e., p < 0.05). In the comparison of the significance of the two, the smoothing factor has a more significant impact. However, the experimental results of Gaussian surfaces show another interesting phenomenon. The influence of the search direction is gradually increasing (“Gaussian Surface” in Table 15). This is because the Gaussian surface uses more uniform and random sampling points, resulting in a more uniform distribution of sampling points involved in the interpolation calculation, and these sampling points are more suitable for the characteristics of the RBF algorithm.

3.3 OK

For the OK algorithm, an experimental semivariogram function model based on the experimental data during interpolation must be developed. In other words, an appropriate semivariogram function model and its fitted values must be selected. If there is no appropriate experimental semivariogram function model for the experimental data, then the optimal interpolation results cannot be obtained [40].

Therefore, the semivariogram function model has the most significant impact on the interpolation accuracy. During the experiment, the experimental semivariogram function models were established for the data of six experimental regions and a Gaussian surface, as shown in Table 16, and the experimental results show that the experimental semivariogram function models obtained by fitting the experimental regions all have high reliability, with R 2 reaching 0.85. However, the fitting effect of the experimental region in “Plain” is poor (R 2 does not exceed 0.6), and the Gaussian model and the linear model cannot be fitted (Figure 4).

Table 16

Fitting results of semivariogram function models in different experimental regions

Model Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
SPHERE C 0 5.664 21.694 2297.577 4402.268 1004.380 11197.577 0
C –99.459 2683.229 62480.137 44680.578 6516.541 275354.167 1,700
a 360485.461 6312.426 5365.413 14202.406 5819.678 20630.065 1,610
R 2 0.593 0.997 0.997 0.991 0.991 0.993 0.990
EXP C 0 5.667 0.000 0.000 3578.595 0.001 9667.751 0
C –340.742 3342.229 72502.576 93466.480 8305.763 822295.633 1,790
a 819334.078 3800.929 2633.719 17288.385 2594.520 38666.164 989
R 2 0.592 0.987 0.988 0.994 0.997 0.994 0.901
GAUSS C 0 / 387.100 0.230 6452.000 1484.000 32248.668 0
C / 2336.000 59636.437 29280.000 5981.000 142170.508 1,700
a / 3107.000 1407.966 4295.000 2648.000 5582.567 733
R 2 / 0.996 0.854 0.972 0.982 0.962 0.965
LINE C 0 / 223.692 6005.491 5525.128 1377.759 13266.720 0
C / 2439.613 57780.437 31071.114 6018.745 139058.733 1,400
a / 4857.852 4033.876 7337.784 4353.208 7279.495 741
R 2 / 0.995 0.986 0.984 0.978 0.991 0.992
Figure 4 
                  Fitting chart of the experimental semivariogram with Gaussian surface: (a) SPHERE; (b) EXP; (c) GAUSS; and (d) LINE.
Figure 4

Fitting chart of the experimental semivariogram with Gaussian surface: (a) SPHERE; (b) EXP; (c) GAUSS; and (d) LINE.

The experimental results also show that the linear model always has stable interpolation results, and as the number of search points increases, the error gradually decreases and becomes stable. The nugget plays a significant role in all semivariogram function models. When the nugget is zero or relatively small, the interpolation results produced by each exponential model are nearly consistent with those produced by the linear model, and the RMSE of the interpolation is relatively small. When the nugget is relatively large, the RMSE of the interpolation is larger. In different experimental regions, the error calculated by the Gaussian model increases nearly linearly as the number of search points increases and is the worst among all semivariogram function models; however, when the nugget value is small, a certain degree of improvement is observed. These results are related to the meaning of the nugget. The term nugget represents the concentrated sampling error of discrete sampling points and the error caused by small-scale changes.

The experimental values for the number of search points and the semivariogram function model were fixed, and the effects of the directionally unrestricted search, four-direction search, and eight-direction search on the interpolation accuracy were studied. According to the experimental results, in addition to the “Plain” experimental region, r between RMSEs for any two directions is greater than 0.97, as shown in Table 17. The experimental results show that the search direction has no significant effect on the interpolation accuracy.

Table 17

Values of r between different search directions (based on OK)

Geomorphic type Plain Hill Low mountain Middle mountain High mountain Mixed Gaussian surface
D0 × D1 0.772 0.997 0.981 0.992 0.990 0.996 1.000
D0 × D2 0.722 0.997 0.975 0.995 0.994 0.998 0.975
D1 × D2 0.971 0.995 0.997 0.999 0.999 0.999 0.975

The differences in RMSE corresponding to the various search directions were analyzed, and the trend of the error differences in interpolation was determined. The experimental results do not provide clear and unbiased conclusions to demonstrate the pros and cons of a directionally unrestricted, four-direction and eight-direction search.

According to the experimental results, the influences of the number of search points on the interpolation accuracy show different performance effects under different semivariogram function models. The influence of the linear semivariogram function model gradually decreases, and for 8–16 search points, a stable state is achieved. The influence of the Gaussian model gradually increases. Therefore, it is more suitable when the number of search points is small. The spherical model and the exponential model show similar performance effects in the different experimental areas; as the number of search points increases, the interpolation error gradually increases or decreases, showing a certain degree of variability. However, these two models have a smaller number of search points in general, i.e., 8–16 points. These conclusions are consistent with the finding that one of the important features of the OK algorithm is its shielding effect [41]. This effect can eliminate the impact of a relatively large number of search points and search direction on the interpolation accuracy. Therefore, it is reasonable to select a relatively small number of search points and use directionally unrestricted search in the OK algorithm.

Finally, the experimental results show that of the three parameters (the semivariogram function models, number of search points, and search direction), as shown in Table 18, the semivariogram function models have the most significant impact on the interpolation accuracy, followed by the number of search points and the search direction.

Table 18

Significant effects of the semivariogram function model, number of search points, and search direction on interpolation accuracy (based on OK)

Geomorphic type Error sources Sum of squares (SS) Degree of freedom (DF) Mean square (MS) F p
Plain SvF × SvF / / / / /
NoSP × NoSP / / / / /
SvF × NoSP / / / / /
SD × SD / / / / /
NoSP × NoSP / / / / /
SD × NoSP / / / / /
SvF × SvF / / / / /
SD × SD / / / / /
SvF × SD / / / / /
Hill SvF × SvF 192.700 3 64.233 24.076 0.000
NoSP × NoSP 6.656 4 1.664 0.624 0.655
SvF × NoSP 32.016 12 2.668 229.167 0.000
SD × SD 0.097 2 0.049 32.2130 0.000
NoSP × NoSP 6.656 4 1.664 1104.701 0.000
SD × NoSP 0.012 8 0.002 0.000 1.000
SvF × SvF 192.700 3 64.233 1248.405 0.000
SD × SD 0.097 2 0.049 0.943 0.440
SvF × SD 0.309 6 0.052 0.064 0.999
Low mountain SvF × SvF 311.247 3 103.749 19.653 0.000
NoSP × NoSP 27.721 4 6.930 1.313 0.320
SvF × NoSP 63.350 12 5.279 110.190 0.000
SD × SD 0.009 2 0.005 0.205 0.819
NoSP × NoSP 0.618 4 0.155 6.858 0.011
SD × NoSP 0.180 8 0.023 0.022 1.000
SvF × SvF 39.523 3 13.175 346.410 0.000
SD × SD 0.009 2 0.005 0.121 0.888
SvF × SD 0.228 6 0.038 0.239 0.961
Middle mountain SvF × SvF 419.053 3 139.684 19.434 0.000
NoSP × NoSP 104.192 4 26.048 3.624 0.037
SvF × NoSP 86.254 12 7.188 27.184 0.000
SD × SD 5.821 2 2.911 139.812 0.000
NoSP × NoSP 104.192 4 26.048 1251.200 0.000
SD × NoSP 0.167 8 0.021 0.002 1.000
SvF × SvF 419.053 3 139.684 237.704 0.000
SD × SD 5.821 2 2.912 4.953 0.054
SvF × SD 3.526 6 0.588 0.147 0.989
High mountain SvF × SvF 39.523 3 13.175 23.618 0.000
NoSP × NoSP 0.618 4 0.155 0.277 0.887
SvF × NoSP 6.694 12 0.558 40.740 0.000
SD × SD 0.009 2 0.005 0.205 0.819
NoSP × NoSP 0.618 4 0.155 6.858 0.011
SD × NoSP 0.180 8 0.023 0.022 1.000
SvF × SvF 39.523 3 13.175 346.410 0.000
SD × SD 0.009 2 0.005 0.121 0.888
SvF × SD 0.228 6 0.038 0.239 0.961
Mixed SvF × SvF 311.247 3 103.749 19.653 0.000
NoSP × NoSP 27.721 4 6.930 1.313 0.320
SvF × NoSP 63.350 12 5.279 110.190 0.000
SD × SD 0.668 2 0.334 48.126 0.000
NoSP × NoSP 27.721 4 6.930 998.689 0.000
SD × NoSP 0.056 8 0.007 0.001 1.000
SvF × SvF 311.247 3 103.749 595.431 0.000
SD × SD 0.668 2 0.334 1.917 0.227
SvF × SD 1.045 6 0.174 0.092 0.997
Gaussian surface SvF × SvF 6,529,088 3 2,176,363 8.823 0.0001
NoSP × NoSP 6,534,044 4 1,633,511 6.622 0.0003
SvF × NoSP 19,628,667 12 1,635,722 6.631 0.0000
SD × SD 775,732 2 387,866 0.52 0.5980
NoSP × NoSP 6,534,044 4 1,633,511 2.191 0.0852
SD × NoSP 1,692,150 8 211,518 0.284 0.9680
SvF × SvF 1,760,564 3 586,854 0.856 0.4706
SD × SD 776,967 2 388,483 0.566 0.5713
SvF × SD 7,098,088 6 1,183,015 1.725 0.1356

4 Discussion

As described in ESRI Using GeoStatistical Analyst, the user uses the exploratory data analysis method to fully mine the interpolation data itself in terms of data collection, data format, data accuracy, spatial distribution and spatial correlation, etc., and coupled with the user’s familiarity and experience with the characteristics of interpolation algorithm; then, the most reliable interpolation result can be obtained. However, the complicated actual situation is often not satisfactory: users of data cannot obtain detailed background information such as collection; at the same time, as the scope of applications expands, actual operators may not have sufficient relevant professional knowledge and experience. All of this has brought certain troubles to the operators and users of interpolation result data.

In this study, the statistical value interval of an interpolation parameter in several typical interpolation algorithms was studied using cross-validation, correlation analysis, trend surface analysis, and ANOVA. In the interpolation process, none of the methods could automatically identify topographic features and select the appropriate interpolation algorithm and corresponding interpolation parameters. It has become a consensus. Therefore, it is important to provide the user with more information on interpolation parameters and eliminate arbitrariness in their selection. Potential abnormal impacts of special sampling methods on some interpolation algorithms were not considered in this experiment (e.g., the maximum values in the interpolation residuals when TPSF and NCSF were used). Therefore, the statistical value interval of the interpolation parameters determined by calculating the global residual RMSE was the optimal choice from the perspectives of stability and robustness. Moreover, as one of the factors affecting interpolation accuracy, the search direction had the least significant impact compared to other factors. The emergence of new data acquisition methods (e.g., interferometric synthetic aperture radar and light detection and ranging) has resulted in a continuous increase in the accuracy and distribution density of acquired sampling points. Therefore, in a practical modeling process, it is possible to place less focus on or even forgo the selection of the search direction.

This article mainly focuses on finding the reasonable value range of the parameters in most cases according to a large number of experiments with different parameters. It is essentially an optimization problem. Therefore, some of the most representative computational intelligence algorithms can also be used [42], such as Monarch Butterfly Optimization [43], Earthworm Optimization Algorithm [44], Elephant Herding Optimization [45] and Moth Search [46] algorithm.

In addition, as mentioned by Wang et al. [30], the spatial sampling is usually performed once and non-randomly, and the accuracy of statistical inference varies with the adopted sampling strategy. Therefore, the accuracy of applying spatial statistics using field data depends on the sampling, including its pattern and density. According to this idea, analyzing and classifying the patterns and densities of local areas, and establishing the relationship between them and the interpolation algorithm, interpolation accuracy, it will be possible to obtain accurate, rather than statistical experimental conclusions. The results will provide better guidance in the application of DEM.

Acknowledgments

The author would like to thank Professor Jinfeng Wang for his fruitful discussions about the theory of spatial statistic trinity. The author would like to express their gratitude to AJE (https://www.aje.cn/) for the expert linguistic services provided. In this study, DPS (Data Processing System developed by Professor Tang Qiyi of Zhejiang University), Matlab 2009, and Surface Mapping Software (www.goldensoftware.com/) were used for experimental analysis. DPS provides correlation analysis and variance analysis methods, Matlab 2009 provides trend surface analysis method, and Surface Mapping Software provides semivariogram fitting methods. The author also would like to express their gratitude for the software mentioned earlier.

  1. Funding information: This research was funded by the National Natural Science Foundation of China, grant number 41371383.

  2. Conflict of interest: The authors declare no conflicts of interest.

  3. Data availability statement: The data used to support the findings of this study are available from the corresponding author upon request.

  4. Computer code availability: DemCreate, the source codes are available for download at the link: https://github.com/VGEZhangJM/DemCreate.git (2022).

References

[1] Salekin S, Burgess J, Morgenroth J, Mason E, Meason D. A comparative study of three non-geostatistical methods for optimising digital elevation model interpolation. ISPRS Int J geo-information. 2018;7(8):300.10.3390/ijgi7080300Search in Google Scholar

[2] Polidori L, El Hage M. Digital elevation model quality assessment methods: A critical review. Remote Sens. 2020;12(21):3522.10.3390/rs12213522Search in Google Scholar

[3] Gao J. Construction of regular grid DEMs from digitized contour lines: A comparative study of three interpolators. Geog Inf Sci. 2001;7(1):8–15.10.1080/10824000109480551Search in Google Scholar

[4] Chaplot V, Darboux F, Bourennane H, Leguédois S, Silvera N, Phachomphon K. Accuracy of interpolation techniques for the derivation of digital elevation models in relation to landform types and data density. Geomorphology. 2006;77(1–2):126–41.10.1016/j.geomorph.2005.12.010Search in Google Scholar

[5] Heritage GL, Milan DJ, Large ARG, Fuller IC. Influence of survey strategy and interpolation model on DEM quality. Geomorphology. 2009;112(3–4):334–44.10.1016/j.geomorph.2009.06.024Search in Google Scholar

[6] Habib M, Alzubi Y, Malkawi A, Awwad M. Impact of interpolation techniques on the accuracy of large-scale digital elevation model. Open Geosci. 2020;12(1):190–202.10.1515/geo-2020-0012Search in Google Scholar

[7] Lu HX. Research on DEM error model. Nanjing: PhD, Nanjing Normal University; 2008.Search in Google Scholar

[8] Hofierka J, Cebecauer T, Šúri M. Optimisation of interpolation parameters using cross-validation. Digital terrain modelling. Berlin: Springer; 2007. p. 67–82.10.1007/978-3-540-36731-4_3Search in Google Scholar

[9] Wilson JP. Environmental applications of digital terrain modeling. USA: Wiley-Blackwell Press; 2018.10.1002/9781118938188Search in Google Scholar

[10] Wang JY. Principles of spatial information system. China: Science Press; 2001.Search in Google Scholar

[11] Lam NSN. Spatial interpolation methods: A review. Am Cartographer. 1983;10(2):129–50.10.1559/152304083783914958Search in Google Scholar

[12] Declercq FAN. Interpolation methods for scattered sample data: Accuracy, spatial patterns, processing time. Cartograp Geog Inf Syst. 1996;23(3):128–44.10.1559/152304096782438882Search in Google Scholar

[13] Hardy RL. Multiquadric equations of topography and other irregular surfaces. J Geophys Res. 1971;76(8):1905–15.10.1029/JB076i008p01905Search in Google Scholar

[14] Franke R. Scattered data interpolation: Tests of some methods. Math Comput. 1982;38(157):181–200.Search in Google Scholar

[15] Foley TA. Interpolation and approximation of 3-D and 4-D scattered data. Comput Math Appl. 1987;13(8):711–40.10.1016/0898-1221(87)90043-5Search in Google Scholar

[16] Rippa S. An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math. 1999;11(2):193–210.10.1023/A:1018975909870Search in Google Scholar

[17] Aguilar FJ, Agüera F, Aguilar MA, Carvajal F. Effects of terrain morphology, sampling density, and interpolation methods on grid DEM accuracy. Photogramm Eng Remote Sens. 2005;71(7):805–16.10.14358/PERS.71.7.805Search in Google Scholar

[18] Yan JB, Wu B, He QH. An anisotropic IDW interpolation method with multiple parameters cooperative optimization. Acta Geodaet Cartograp Sin. 2021;50(5):675–84.Search in Google Scholar

[19] Zhu Y, Liu X, Zhao J, Cao J, Wang X, Li D. Effect of DEM interpolation neighbourhood on terrain factors. ISPRS Int J Geo-Inform. 2019;8(1):30.10.3390/ijgi8010030Search in Google Scholar

[20] Zhang J, Guo L, Zhang X. Effects of interpolation parameters in inverse distance weighted method on DEM accuracy. J Geomat Sci Technol. 2012;29(1):51–6.Search in Google Scholar

[21] Zhang J, You X, Wan G. Effects of interpolation parameters in multi-log radial basis function on DEM accuracy. Geomat Inf Sci Wuhan Univ. 2013;38(5):608–12.10.1109/GeoInformatics.2011.5981017Search in Google Scholar

[22] Zhu Y, Tan S, Min F, Cui X. IDW ionospheric TEC interpolation and accuracy analysis considering latitude and longitude anisotropy. Geomat Inf Sci Wuhan Univ. 2019;44(11):1605–12.Search in Google Scholar

[23] Zhang J, You X, Wan G. Experimental research on optimization of DEM interpolation parameters. Acta Geodaet Cartograph Sin. 2014;43(2):178–85, 192.Search in Google Scholar

[24] Zhang S, Ning H, Gao H, Ye H, Huang Y, Huang Y. Three-dimensional simulation and spatial characteristics of soil organic carbon based on anisotropy in region. Trans Chin Soc Agric Eng. 2016;32(16):115–24.Search in Google Scholar

[25] Sówka I, Pawnuk M, Grzelka A, Pielichowska A. The use of ordinary kriging and inverse distance weighted interpolation to assess the odour impact of a poultry farming. Sci Rev Eng Environ Sci. 2020;29(1):17–26.10.22630/PNIKS.2020.29.1.2Search in Google Scholar

[26] Wu FC, Wang CK, Huang GH. Delineation of gravel-bed clusters via factorial kriging. Geomorphology. 2018;308:161–74.10.1016/j.geomorph.2018.02.013Search in Google Scholar

[27] Lu GY, Wong DW. An adaptive inverse-distance weighting spatial interpolation technique. Comput Geosci. 2008;34(9):1044–55.10.1016/j.cageo.2007.07.010Search in Google Scholar

[28] Deng XS, Guo YK, Hua XH. Quasigeoid grid network biquadratic interpolation method. Acta Geod Cartograph Sin. 2009;38(1):35–40.Search in Google Scholar

[29] Longley PA, Goodchild MF, Maguire DJ, Rhind DW. Geographical information systems. Principles and technical issues. Vol. 1. USA: John Wiley & Sons; 1999.10.4324/9781315736822-21Search in Google Scholar

[30] Wang J, Gao B, Stein A. The spatial statistic trinity: A generic framework for spatial sampling and inference. Environ Model Softw. 2020;134:104835.10.1016/j.envsoft.2020.104835Search in Google Scholar

[31] De Smith MJ, Goodchild MF, Longley P. Geospatial analysis: a comprehensive guide to principles, techniques and software tools. UK: Winchelsea Press; 2007.Search in Google Scholar

[32] Gao J, Xia YJ, You X, Shu G. Virtual reality in terrain environment simulation. China: PLA Publishing House; 1999.Search in Google Scholar

[33] Tang GA, Liu XJ, Lv GN. Principle and method of DEM and geoscience analysis. China: Science Press; 2005.Search in Google Scholar

[34] Li ZL, Zhu Q. Digital elevation model. China: Wuhan University Press; 2003.Search in Google Scholar

[35] Xu JH. Geographical modelling methods. China: Science Press; 2010.Search in Google Scholar

[36] Liu XZ, Zhang A, Li JZ. Geographical methematics methods. China: Science Press; 2010. p. 154–96.Search in Google Scholar

[37] Tang QY. DPS data processing system–experimental design, statistical analysis and data mining. China: Science Press; 2007.Search in Google Scholar

[38] Jones KH. A comparison of algorithms used to compute hill slope as a property of the DEM. Comput Geosci. 1998;24(4):315–23.10.1016/S0098-3004(98)00032-6Search in Google Scholar

[39] Mitášová H, Mitáš L. Interpolation by regularized spline with tension: I. Theory and implementation. Math Geol. 1993;25(6):641–55.10.1007/BF00893171Search in Google Scholar

[40] Zimmerman D, Pavlik C, Ruggles A, Armstrong MP. An experimental comparison of ordinary and universal kriging and inverse distance weighting. Math Geol. 1999;31(4):375–90.10.1023/A:1007586507433Search in Google Scholar

[41] Zhang J. Scale, uncertainty and integration of spatial information. China: Wuhan University Press; 2008.Search in Google Scholar

[42] Zhao N, Gao H, Wen X, Li H. Combination of convolutional neural network and gated recurrent unit for aspect-based sentiment analysis. IEEE Access. 2021;9:15561–9.10.1109/ACCESS.2021.3052937Search in Google Scholar

[43] Feng Y, Deb S, Wang GG, Alavi AH. Monarch butterfly optimization: A comprehensive review. Expert Syst Appl. 2021;168:114418.10.1016/j.eswa.2020.114418Search in Google Scholar

[44] Wang GG, Deb S, Coelho LDS. Earthworm optimization algorithm: a bio-inspired metaheuristic algorithm for global optimization problems. Int J Bio-Inspired Comput. 2018;12(1):1–22.10.1504/IJBIC.2018.093328Search in Google Scholar

[45] Li J, Lei H, Alavi AH, Wang GG. Elephant herding optimization: Variants, hybirds, and applications. Mathematics. 2020;8(9):145.10.3390/math8091415Search in Google Scholar

[46] Wang GG. Moth search algorithm: A bio-inspired metaheuristic algorithm for global optimization problems. Memetic Comput. 2018;10(2):151–64.10.1007/s12293-016-0212-3Search in Google Scholar

Received: 2022-02-07
Revised: 2022-04-17
Accepted: 2022-04-28
Published Online: 2022-10-03

© 2022 Jinming Zhang, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 2.12.2023 from https://www.degruyter.com/document/doi/10.1515/geo-2022-0400/html
Scroll to top button