Systematic bias of selected estimates applied in vertical displacement analysis

: In surveying problems we almost always use unbiased estimators; however, even unbiased estimator might yield biased assessments, which is due to data. In statistics one distinguishes several types of such biases, for example, sampling, systemic or response biases. Considering surveying observation sets, bias from data might result from systematic or gross errors of measurements. If nonrandom errors in an observation set are known, then bias can easily be determined for linear estimates (e.g., least squares estimates). In the case of non-linear estimators, it is not so simple. In this paper we are focused on a vertical displacement analysis and we consider traditional least squares estimate, two M split estimates and two basic robust estimates, namely M-estimate, R-estimate. The main aim of the paper is to assess estimate biases empir-ically by applying Monte Carlo method. The smallest biases are obtained for M- and R-estimates, especially for a high magnitude of a gross error. On the other hand, there are several cases when M split estimates are the best. Such results are acquired when the magnitude of a gross error is moderate or small. The outcomes conﬁrm that bias of M split estimates might vary for diﬀerent point displacements.


Introduction
The most common estimators, which are used in solving geodetic or surveying problems, are unbiased. The mentioned unbiasedness is a feature of an estimator and results from the computation procedure applied. However, even an unbiased estimator might sometimes yield biased assessments. Such biases are not due to the estimation process but usually to the data. There are several types of such biases in statistics, e.g., sampling, systemic (or systematic), response biases, which reflects the origin of a bias (e.g. Kalton and Schuman, 1982;Cortes et al., 2008;Gobindass et al., 2009). In the case of observation sets in surveying or geodesy, the biases of such types might result from unrecognized measurement errors (the errors which are not reflected in functional or stochastic models). Let an observation set be affected by gross or systematic errors, then estimates of parameters might be biased, and the bias itself is called the systematic one. The bias might be assessed either theoretically or empirically. Considering the latter case, one should apply Monte Carlo simulations. Such an approach is advisable and helpful especially when we want to assess the systematic biases in the case of a particular network. Such an approach can also provide valuable information concerning locations of gross (or systematic) errors within the observation set considered and its influence on every parameter. Here, we discuss systematic biases in the case of vertical displacement analysis. We address estimates which can be applied in such a problem, namely a traditional estimator of the least squares method, common estimators which are robust against outliers: basic M-estimator (the Huber method), R-estimator (Hodges-Lehmann weighted estimator). Additionally, we consider two types of M split estimators, namely the squared M split estimator and the absolute M split estimator. Thus, the main aim of the paper is to assess the biases of the chosen linear and non-linear estimators when an observation set is affected by unrecognized errors. We consider several values of gross errors and their different locations (when they affect different observations at different measurement epochs).

Theoretical Foundations
Let us consider a functional model of geodetic observations as follows where: y is an observation vector, A is a known matrix of coefficients, X is an unknown parameter vector,v is a vector of random measurement errors. If all observations are mutually independent and have the same accuracy, then the identity matrix can be taken as a weight matrix. Without loss of generality we can take such an assumption here. The solutions with other weight matrices and regarding to the estimates applied here can be found in (e.g., Yang et al., 2002;Duchnowski, 2013;Wiśniewski et al., 2019). The least squares estimate (LS) of the parameter vector,X LS , is as followŝ Similarly, for most of M-estimators we can writê is a respective weight function,v i is a estimated standardized error of the ith observation. It is worth noting that in the case of M-estimation, the parameter vector is obtained by applying an iterative process which ends when the parameter vector does not change between the succeeding iteration steps. In this paper one can consider the robust M-estimation with the weight function of the Huber method w H (v i ) (e.g., Gui and Zhang, 1998;Baselga, 2007) where: a is a positive constant (usually assumed between 1.5 and 3.5; in this paper a = 2). Another robust estimation method discussed here is R-estimation. Obviously, one can apply the basic R-estimator (Hodges-Lehmann estimator) of the shift (Hodges and Lehmann, 1963;Duchnowski, 2010); however, in this paper we use the Hodges-Lehmann weighted estimator (HLWE) of the shift∆ HLW because geodetic observations might have different accuracy (Duchnowski, 2013). Considering application of such an estimation in vertical deformation analysis, it is necessary to compute coordinates of each point in several independent ways at the first and second measurement epochs, respectively. Then∆ where: medw (•) is a weighted median operator and H I m , H II i are computed point heights at the first and second measurement epochs, respectively; ≤ m ≤ n I and ≤ in II ; n I , n II are numbers of the computed heights at the first or second measurement epochs, respectively. Note that even if the measurements have the same accuracy, the computed heights might differ in accuracy (according to the way of computation).
Let us introduce the last type of estimation considered here, namely M split(q) estimation, which is a development of a classical M-estimation. The linear model of Eq. (1) is split into q competitive models (Wiśniewski, 2009(Wiśniewski, , 2010 where: l = , . .., q; q is the number of the competitive models. The split of functional models causes obtaining q competitive versions of the parameters. In the case of a univariate model, the observation set might be regarded as an unknown mixture of realizations of q random variables which differ from one another in the location parameters.
Here, we assume two measurement epochs, thus, q = 2 and hence In such a situation M split(q) estimation is called M split( ) estimation; here we will call it just M split estimation. In this paper we consider two types of that method, namely the squared M split estimation (SMS) and the absolute M split estimation (AMS).
The iterative process for SMS estimation is called as the traditional iterative process, and it can be written as follows (e.g., Wiśniewski, 2009;Wyszkowska and Duchnowski, 2019) where: dX is an increment to parameter, H is a Hessian matrix, g is a gradient and The respective weight matrices, w ( ) , are obtained by applying the weight functions of SMS estimation However, the traditional iterative process is not useful for AMS estimation. For that method it is recommended to apply the parallel iterative process described as follows (Wyszkowska and Duchnowski, 2019) where: a = and b = or a = and b = . The respective Hessians and gradients are as follows and based on the following weight functions where: d is an assumed small positive constant. Obviously, the traditional iterative process as well as the parallel one should end for such j = k, for which both gradients are equal to 0 (or close enough to 0) andX ( )

Empirical Tests
Let us consider an example simulated levelling network (Fig. 1). Such a network consists of two reference points P , P and three object points A, B, C.
One can assume that all observations are measured at two different epochs. There is an assumption that all observations are independent and their measurement errors are normally distributed, v i ∼ N( , σ ), and the standard deviation σ = mm. All empirical tests are based on applying the Crude Monte Carlo simulations. In our empirical analyses, we always assume 1000 simulations. The Monte Carlo simulations are the basis for obtaining the following bias of the displacement estimates where: ∆Ĥ MC u is a mean value of displacement estimates from all Monte Carlo simulations, ∆Hu is a theoretical value of a displacement, u ∈ {A, B, C}. In the context of the paper such parameters are equal to the objective point displacements, ∆Ĥ A , ∆Ĥ B , ∆Ĥ C , respectively. All simulations were executed in Mathcad 15.0.

Fig. 1. Simulated levelling network
It is also noteworthy that in the case of M split estimation we should assume a starting point. For SMS estimation, X ( ) =X LS is often used in such a context. However, in the multivariate model (e.g., deformation analysis) such an approach might lead to wrong solutions. Hence, we assume X ( ) = for SMS estimation (Wyszkowska and Duchnowski, 2019). As for AMS estimation, it should start from two starting points, thus, let X ( ) = , and let X ( )  , which is due to the fact that we want to investigate if the bias changes with varying point displacements. In this paper, the authors execute the empirical tests of biases for several different variants which differ from each other not only in the number of gross errors in the observation sets but also in the location and magnitude of such errors.
Firstly, it is noteworthy to address the case with no gross errors in observation sets. Obviously, the results obtained, which are not presented here, confirm that all considered estimates are unbiased (all empirical biases are equal to 0). Generally, it would be advisable to compare the obtained biases to such an accuracy to find out significance of the biases. Monte Carlo simulations allow us also to assess the accuracy of the estimates. Table 1 presents the assessed standard deviations, SD(∆Ĥu), of the estimated object point displacements for the chosen ∆H A .

1: ∆H A = mm, 2: ∆H A = mm
It is worth noting that accuracy of M split estimates clearly depends on the magnitude of the point displacement, especially as for SD(∆Ĥ A ), which is consistent with the method property Duchnowski, 2019, 2020). For the other estimates such relationship is just neglectable. For detailed description of assessment accuracy by applying Monte Carlo simulations see (e.g., Duchnowski and Wiśniewski, 2017;Duchnowski, 2019, 2020 Considering Variant II, it is noticeable that the superior results of biases are obtained for the Huber method and HLW estimation. SMS estimates are usually biased in an unacceptable way (in comparison to their accuracies). It is also shown that this estimation method might give small biases of ∆Ĥ A and ∆Ĥ B only for few values of ∆H A . Whereas AMS estimation gives correct results (all biases close to 0) only for negative values of ∆H A or when ∆H A > 40 mm. It is caused by a coincidence of some ∆H A with value of ∆H B = 20 mm and with the gross error (Wyszkowska and Duchnowski, 2020). On the other hand, the biases of ∆Ĥ A and ∆Ĥ B for AMS estimation are of small magnitude.
In Variant III the smallest biases are obtained for the Huber method and AMS estimation. HLW estimation gives results which are a little bit worse (especially for ∆Ĥ A ) but still very close to required values. SMS estimation provides satisfactory outcomes only for several values of ∆H A (for small positive values of this displacement). For the other values the biases of SMS estimates are similar or even bigger than the biases of LS estimates. In the case of Variant IV, the biases of LS, Huber's and HLW estimates are relatively small. What is more, the curves describing biases obtained for SMS and AMS methods are similar in shape to the respective curves in Variant I. However, the maximal and minimal value of the biases mentioned are of a greater magnitude than in the variant in question. In the next variant the same height difference is affected by a gross error, but of a bigger value. The conventional robust estimates can deal with such a big error and their biases are very small. On the other hand, the results of both variants of M split estimation are strongly affected for many values of ∆H A ; however, for several ∆H A the biases of the methods are small and close to 0. Note that the biases in question are often bigger than the biases of LS estimates. One can conclude that biases of the M split estimates strongly depend on the location and magnitude of unrecognized errors.
In the last variant, the most significant differences between the biases obtained are for ∆Ĥ A and ∆Ĥ B . In this context, the smallest biases are acquired for AMS estimation. The results of SMS estimation are also satisfactory; however, a little bit worse. The results which are obtained for the rest of the methods considered here are similar to one another.

Conclusions
Unbiasedness is one of the most important features of estimators. It can be analyzed either theoretically or empirically by applying Monte Carlo simulations. The second approach can provide information for a particular geodetic network, which is advisable from the practical point of view. It is especially important if we analyze biases which result from the data, namely from observation sets that contain outlying observations. The paper addresses the common estimates of the parameters and their systematic biases in vertical displacement analysis. Such type bias usually results from systematic or gross errors. This study shows that all considered estimates are unbiased when an observation set is not disturbed by unrecognized errors. Notwithstanding this, in the case of occurrence of some nonrandom errors in the observation set, one can realize that estimates become biased. It also worth noting that the biases which are obtained for the methods that are not robust against outliers seem significant in comparison with the accuracy of the estimates, especially for the bigger magnitude of the gross error. Such a bias might depend on the location, magnitude and number of gross errors. Generally, the superior results are obtained for Huber's or Hodges-Lehmann weighted estimates, which is especially vivid for large magnitude of a gross error. These both methods are robust against outliers, so it seems natural that the biases of such methods are small. It is noteworthy that unbiasedness is well-known in the case of LS estimates as well as the basic M-estimates and R-estimates (it was proved in theoretical way). Thus, the empirical examples presented here just confirmed such a feature. On the other hand, the unbiasedness of M split estimates has not been investigated yet. For that reason, the results obtained for that estimation method seems the most valuable and novel. Considering the practical example presented here, one can say that there are several specific cases when the superior results are acquired for M split estimation (especially AMS estimates). Such outcomes are obtained especially when the magnitude of a gross error is small or moderate. On the other hand, in some cases the biases of M split estimates are much bigger than the biases of the other methods. For the gross errors of bigger magnitude, the biases of M split estimates also depend on the sign of the gross error (compare results of Variants II and III). What is more, it is pointed out that M split estimation might depend on the values of the point displacements.
The growing value of the point displacement impacts on M split estimate of that point displacement but also on the estimates obtained for the neighboring points. where h I 1 + 2 mm and h II 1 − 2 mm (Variant VI)