Data quality is a concept related with uncertainty  that needs to fit within the perspective of producer (internal) or users (external). Positional accuracy is an internal data quality element of importance due to errors could propagate to derived maps and that has been studied widely in the literature. On the other hand, the elevation attribute is a critical element in any geoscience application from geologic maps to 3D models , specially important in order to generate digital elevation models (DEMs) and subproducts of them. Several factors affects the overall vertical accuracy of DEMs , nevertheless one of the most important is the uncertainty introduced by geoid transformations added to the theoretical model  that produces differences between local and global DEM vertical accuracy (theoretically the same for a specific application).
The term geoid was employed by mathematician J.B. Listing  to define mathematical surface of equipotential surfaces of Earth’s gravitational field. The geoid’s calculation is usually carried out by combining a global geopotential model of gravitational field with terrestrial gravity anomalies measured locally and complemented with local or regional topographic data according the desired accuracy. Geoid height or geoid undulation is enunciated as height above the ellipsoid taken along the normal line to ellipsoid’s surface .
On the other side, the vertical gradient of the gravity anomaly can be related to the horizontal derivatives of the deflection of the vertical and the geoid height (undulation) N and its first and second derivatives . This first relation, however, may be less suited for practical application as inaccuracies of N are greatly amplified by forming the second derivatives.
Different estimation methods of geoid heights models such as least-squares collocation (LSC), artificial neural networks, fuzzy logic and radial basis functions (Multiquadric-MQ, Thin Plate Spline-TPS) have been widely advocated . Some of these methods are analyzed in  and constitute deterministic methods. Geostatistical techniques have not been explored in this context.
Geostatistics is a discipline that deals with the statistical analysis of regionalized variables (spatially distributed). Therefore, applications of geostatistics are wide. Complementarily, in the Global Navigation Satellite System (GNSS)/levelling method, existence of uniformly and homogeneously scattered points on the study area generally increases the quality of the estimation . In this study, geostatistics is applied in order to obtain geoid undulation estimation and error models in the rural area of Guayaquil town, Ecuador.
The stationarity is one of main hypothesis of geostatistics and can be reached with different assumptions. The first order or strict hypothesis that assumes a mean constant translation invariant; the second order which argues that the mean is constant and the variance is finite; and, the intrinsic or weak stationarity that deals with the increments in order to reach a finite variance of these one. In this context, the application of kriging interpolation method is feasible in order to get an error estimation that cannot be obtained with deterministic methods. When the presence of drift is determined, various alternatives can be implemented. One option is carried out the geostatistical analysis in smaller areas, subdividing the original data set . Nevertheless, if the number of samples is little, geostatistical analysis is not feasible due to few pair of points would use to build the variogram function and conclusions from this would not be correct. Another approach is the definition of a function of drift through the deterministic definition of this, which can be explained by a spatial trend or another variable. Once this has been reached, universal kriging or kriging with external drift can be applied.
On the other hand, multiple simulations can be used to quantify heights uncertainty through the analysis and evaluation of statistics related with a distribution of spatial random fields . Nevertheless, this option has not been chosen because of the typical smooth spatial change behavior of geoid undulation which can be better depicted by kriging estimation.
In the present case, study area is located in Ecuador, in the western of Andean chain as shown in Figure 1. This area is mainly buildup of accreted fragments of Late Cretaceous mafic oceanic plateau basement . The lithology varies through the study area in relation with different processes of weathering.
The number of measurements is not enough in order to subdivide the analysis in smaller areas. For this reason, the analysis has been developed with the whole data set.
Another important task is determination of a possible trend that will be analyzed in the next section. After this task, computation of experimental variogram was carried out and anisotropy was also analyzed.
The structural analysis constitutes a critical step in geostatistical modeling. The computation of variogram model tries to find the model that minimizes a cost function measuring the distance between model and experimental variograms . In the univariate case, it is enough that the sills fulfill a positivity constraint and they can be fitted by ordinary least squares (OLS), weighted least squares (WLS) or generalized least squares (GLS) .
Another interesting approach is the estimation of the trend and the variogram of the random residuals from it simultaneously. The current best practice is restricted maximum likelihood (REML) technique .
Intrinsic random functions of order k (IRF-k), which are associated with generalized covariances, also provide a class of non stationary models that are useful to represent the non stationary solutions of stochastic partial differential equations such as found in hydrogeology . It is possible to use this alternative that can be complemented with iterative least squared technique, which provides other interesting approach for variogram fitting. This last method is based on a numerical minimization (deterministic) of a sum of squares and uses specific techniques to face with numerical singularities and convergence problems . The aim of this paper is firstly obtain an uncertainty error estimation and a detailed geoid undulation estimation map in order to avoid additional field work according the fitness for use (engineering design).
2 Materials and Methods
The study area is located in the southern west of Guayaquil city, in the coastal region of Ecuador, within of Guayas basin. The area is a floodplain with low slope. Guayaquil data set includes 299 measurements of heights and geoid undulation determined in the same locations with an average distance of 3 km. Geoid undulations were obtained by second order geometric leveling (tolerance ±8.4 mm ) or trigonometric levelling with reciprocal (tolerance ±3 cm ) to get mean sea level height (where k is the displacement distance) and cover an average area of near 2,400 km2 that includes a variety of landforms.
Ellipsoidal heights were obtained by Global Position System (GPS) measurements from geodesic supplementary control points with differential static method using double frequency GPS receptors (relative accuracy 10mm ±2 ppm).
The geostatistical approach used in this article to get geoid undulation estimations is novel because typically this is developed with a deterministic approach, such as using polynomial surface fitting, least squares collocation (LSC), finite-element methods and artificial neural networks (with less frequency), that do not let to obtain an uncertainty level related with the estimation. A workflow of the methodology is shown in Figure 2.
The geostatistical analysis can be carried out in a lot of commercial software, but using open source software for research purposes is an interesting option. One of the available alternatives is R environment and statistical program . Inside R, there are an important number of libraries related with the spatial data analysis. Nevertheless, a criterion to choose a specific library is the updating level and the quantity and quality of algorithms available. In this sense, interesting options are: a) RGeostats developed by the Geostatistical Team of the Geosciences Research Center of MINES ParisTech , b) geoR developed by Ribeiro Jr. and Diggle P. ; and c) gstat developed by Pebesma E. . As a first step, exploratory data analysis (EDA) (see Section 2.1) was carried out as a mechanism for data quality assurance in spatial (coordinates) and thematic (geoid undulation) components. This step was critical, since any incorrect value alters any subsequent analysis. An important activity constitutes the review of statistical and spatial distribution.
2.0.1 Exploratory data analysis
The EDA includes basic descriptive statistics of geoid undulation.
To confirm the validity of experimental variogram and data intrinsically, an envelope (Figure 3) drawn based on simulations obtained by repeatedly simulating from the fitted model (parameters of spherical covariance) in the data locations . This envelope shows the variability of the empirical variogram. On the other hand, sometimes a preliminary variography on raw data before minimization (drift removal) could be useful .
2.0.2 Trend analysis
In the current study, trend was identified through plots of spatial distribution in two dimensions (postplot) and directional clouds.
here, βi i = 0, 1, 2, 3 are unknown terms, β0 is called the intercept term, ∈ is the fit error and Xn are explanatory variables.This approach can be extended to spatial coordinates with East and North as predictors and trend as explained variable.
Then the equation for linear or first order trend (Eq. 2) will be: (2)
The predicted value (Z(s0)) at location s0 will be again a linear combination of the observed Z(si), i = 1, …, n values as shown in Equation 3: (3)
here ω1 + ω2 + … + ωn = 1.
The same approach is used to model a spatial second order trend and other covariates, the mathematical development for these cases is not detailed in this work. The trend detection and removal is a previous stage to continue with the next steps of prediction process.
In order to review an important set of spatial trend models, a regression graphical method for all subsets was employed. In this method (interactions of predictors), every possible model is inspected . Additionally, the relative weights of predictors were analyzed according the heuristic method proposed by .
On the other hand, Bouguer gravity anomaly is an interesting factor which can be reviewed in the trend modeling. The anomalies map of the study area was obtained from International Gravimetric Bureau .
For an easy visualization, variograms with a set of different trend considerations have been computed as shown in Figure 4.
The upper-left panel shows a variogram calculated from the original data, instead the upper-right panel uses residuals from a linear model. The lower-left panel uses residuals from a quadratic model and the lower-right panel uses residuals from Bouguer gravity anomaly as covariate.
The differences amongst the resulting variograms illustrate the inter-play between the specification of a model for the mean response and the resulting estimated covariance structure.
In order to assess the normality of geoid undulation’s residuals, a Lilliefors (Kolmogorov-Smirnov) test was carried out. The results in this case indicate that by incorporating gravity anomaly into a model for the geoid undulation appears to achieve approximate stationarity of the residuals, because the corresponding variogram reaches a plateau . Another important observation is that the semi-variance decreases significantly between raw variogram and residual variogram of a model adjusting for gravity anomaly as covariate.
2.1 Anisotropy detection
Once the trend has been removed, it is fundamental to detect remainder anisotropy. For this purpose, graphical tools such as variogram surface and directional variograms are used . The original data showed geometric anisotropy that could be solved with a linear transformation of the spatial coordinates that will make the variation isotropic. Nevertheless, analysis of anisotropy of model trend residuals is a tool to define if a linear transformation is necessary.
2.2 Model validation
Two methods has been carried out to validate a model: 1) leave-one out (LOO) re-estimation, whereby one sample at a time is removed from the data set and re-estimated from the remaining data  (typically used when is not available an independent data set). If enough points are available, the effect of the removed point on the model (which was estimated using that point) is minor ; and, 2) jackknife, whereby an independent data set or a subset of the data is removed completely from the data set, and reestimated using the original or remaining data.
3 Results and discussion
In practical terms, the multinormality nature of the data is a requirement to fulfill and one that is not possible to verify, but if the univariate distribution approximates a normal distribution, the working hypothesis is that the data were generated by a multivariate normal distribution . For this reason, it is important to review the data distribution. The Figure 5 shows the histograms of heights and geoid undulations data.
In the present case, the skewness coefficient is 0.52 for geoid undulations and 2.81 for heights. In some cases, for example in environmental variables such as metal concentrations in soils, if the skewness coefficient is between 0.5 and 1, transformation to square roots might normalize the distribution approximately .
The D statistic for Lilliefors (Kolmogorov-Smirnov) test is 0.046335. According  the normality is confirmed with D <= 0.1. It is not possible to compute significance levels because of the presence of spatial dependences in the samples. As mentioned earlier, outliers cause serious distortions in geostatistics . In the current case there are not outliers according the review of box-plot shown in Figure 6 for geoid undulation (A preliminary analysis also was carried out), but for heights, there are an important number of outliers. This last situation is common because there are little peaks in the study area.
Through the analysis of variogram cloud was possible to detect some points which produce relevant differences between point pairs for the calculation of experimental variogram.
Postplot is shown in Figure 7 and directional clouds taking account geoid undulation in Figure 8. According these graphics, apparently there is a systematic increment of geoid undulation from south-west to north-east, then the use of Kriging with external drift constitutes an interesting option to implement.
The regression graphical method for all subsets is shown in Figure 9; The best model is which that includes the terms: North, North2, East2 and the intercept showing highest R2 with minor numbers of variables.
Another criteria to choose the best covariates configuration was the most parsimonious equation as measured by the Akaike Information Criterion (AIC) .
The relative weights for predictors are 22.76% for East2, 38.60% for North and 38.64% for North2.
The final trend equation (Eq. 6) obtained is: (6)
The coefficients were computed with ordinary least squares due to the observation points are approximately regularly spaced. Otherwise, the proper approach is residual maximum likelihood (REML) . It is often more important to use more useful and higher quality data than to use sophisticated statistical methods. In some situations; for example: if the points are extremely clustered, and/or if the sample is ≤ 100, and/or if the measurements are noisy or obtained using non-standard techniques; however, the model needs to be fitted using the most sophisticated technique to avoid making biased predictions .
Regarding Bouguer gravity anomaly, the correlation coefficient between this and geoid undulation for the study area is 0.81. This value shows a strong positive linear relationship between gravity anomaly and geoid undulation, situation that makes physical sense. Incorporating gravity anomaly into a model for the geoid undulation seems to achieve approximate stationarity of the residuals, because the corresponding variogram reaches a plateau . Another important observation is that the semi-variance decreases significantly between raw variogram and residual variogram of a model adjusting using gravity anomaly as covariate.
Once the trend was identified, it was possible to compute the residual or true variogram. In order to compare the trend regression models, some statistics were computed. A summary of these one are shown in Table 1.
For both trend models, the mean error (ME) proves the lack of bias of the estimate, with a value close to 0 .
The experimental variogram was computed with the classical equation (Eq. 7), (where Nh is the number of pairs of points at distance h(lag)), because in this case there are not outliers according the analysis of geoid undulation carried out. When an important number of these is detected, the use of a robust variogram estimators, such as Cressie an Hawkins, Dowd and Genton  is suggested. (7)
The spherical, exponential, gaussian and cubic covariance models were analyzed in order to carry out the variogram fitting. Conceptually, the gaussian model is suitable for slowly-varying variables such as elevations . Nevertheless, in this study, the spherical model shows the smallest residual sum of squares (RSS) (See Table 2) and the sill (0.8) is reached in 12.1 km. The spherical model has been used in other geostatistical analysis to carry out the variogram modelling . The REML estimation was carried out based in the parameters of this model. Figure 10 shows the REML and spherical OLS fittings. Both of them are very similar. The REML parameters are shown in Table 3. The AIC shows that the absence of spatial dependence in this study is not a valid hypothesis (high AIC in nonspatial model).
3.2 Anisotropy evaluation
The Figure 11 shows the variogram surface, computed in gstat package , up to 45 km and with a bin width of 2.5 km. These parameters were selected because the variogram is only valid for a distance of one half of the field size since for larger distances the variogram begins to leave data out of the calculations  and the bin width is related with the spatial distribution of samples. The anisotropy is evident, with maximum continuity in the direction 110° clockwise from north. To confirm this assumption, directional variograms are shown in Figure 12. Notice that the variograms are computed in 100°, 105°, 110° and 115°. The 110° variogram shows the least variation.
As mentioned in Section 3.1, once that the trend was modeled, a variogram surface was drawn from the residuals of trend model. The Figure 13 shows anisotropy with few significant variances so this could be negligible in the kriging procedure.
The result of LOO cross validation method is shown in Figure 14.The errors have a normal distribution, with the major proportion of errors in the range (-0.5; 0.5).
Initially, the direction of maximum continuity was identified in 110° and the distance parameter for this direction was 40,000 m, nevertheless the trend model identified incorporates this particularity as evidenced in the variogram surface of residuals.
The estimation phase is the last one in Kriging which is an interpolation method that provides the best unbiased linear estimates (minimum variance) of point values or block averages . The spatial continuity aspect is included in Kriging by variogram model fitted as mentioned above. The error estimation map of geoid undulation is shown in Figure 15.
Finally, in order to compare the geoid undulation estimations carried out in this study with Earth Gravitational models (EGM) used in Ecuador, a set of 33 validation points (Figure 16) which were measured in an independent field trip with the same methodology that were obtained the sampling points were employed. It is important to notice that these points were not used in the computation of geoid undulation estimation map. The mean square error (MSE) ratios in relation with the estimation map are 300 and 225 for EGM96 (spherical harmonic coefficients complete to degree and order 360) and EGM08 (this model contains the spherical harmonic coefficients of the gravity field and their errors to degree and order 2,160 and partially to 2,190 , respectively.
The results of estimation in validation points are shown in Figure 17.
The classical use of deterministic methods in order to get geoid undulations maps have not let to define the uncertainty of this variable in these estimations. For this reason, the present work uses a geostatistical approach. The advantage of kriging consists of involving all data set in the analysis. Weights applied do not depend only on the distance among the measured points and location of the predictor, but also on the spatial arrangement of measured points . Since the geoid undulation is typically a slowly-varying variable, this study carries out the analysis of trend with emphasis.
The sill parameter of geoid undulation variogram decreases almost 100 times when gravity anomaly is considered as trend. For this reason, Bouguer anomaly is critical in the geoid undulation estimation process.
The statistics computed for the trend models analyzed in this study: a) spatial trend model and b) Bouguer anomaly as external drift are very similar. This situation is explained because the geoid undulation, typically, varies slowly in the geographical space and is related with physical processes such as tectonic structure dynamics .
A comparison of the estimation results with Earth gravitational models (typically used in Ecuador) was carried out in order to compare the performance of the proposal methodology. In comparison with Earth gravitational models: EGM96 and EGM08, the geoid undulation prediction map generated in this study shows an outstanding improvement analyzing the mean square error and the distribution of errors (Figure 17). According the characteristics of kriging method, the distribution of errors is unbiased. In relation with EGM96, the kriging’s range of errors is almost the third part of this one. On the other hand, the EGM08 errors distribution is left-skewed. For this reason, the application of geostatistical techniques in this field represents a practical tool in order to use its products in detailed engineering projects where altimetric accuracy is fundamental.
Future research must be focused on optimal sampling design based in geoid undulations surveys carried out in Ecuador and the subsequent geostatistical analysis. Applications related with tectonic structure analysis must be addressed.
The authors wish to acknowledge to Instituto Geográfico Militar (IGM) of Ecuador for providing data collected in various field trips.
Goodchild, M. F., Devillers, R., and Jeansoulin, R., Fundamentals of spatial data quality, volume 662. John Wiley & Sons, 2006. Google Scholar
Katerji, W., Vario-Model for Estimating and Propagating DEM Vertical Accuracy: Case of Lebanon. PhD thesis, UPM, 2016. Google Scholar
Doganalp, S. and Selvi, H. Z., Local geoid determination in strip area projects by using polynomials, least-squares collocation and radial basis functions. Measurement, 73:429–438, 2015. CrossrefWeb of ScienceGoogle Scholar
Sansn, F. and Sideris, M. G., Geoid determination: theory and methods. Springer Science & Business Media, 2013. Google Scholar
Isaaks, E. H. and Srivastava, R. M., An introduction to applied geostatistics. Oxford University Press, 1989. Google Scholar
Machiels, L., La Roca Magica-Zeolite occurrence and genesis in the Late Cretaceous Cayo arc of Coastal Ecuador. PhD thesis, Department of Earth and Environmental Sciences. Division of Geology. KU LEUVEN, 2010. Google Scholar
Desassis, N. and Renard, D., Automatic variogram modeling by iterative least squares: Univariate and multivariate cases. Mathematical Geosciences, 45(4):453–470, 2013. CrossrefWeb of ScienceGoogle Scholar
Cressie, N. A. and Cassie, N. A., Statistics for spatial data, volume 900. Wiley New York, 1993. Google Scholar
Chiles, J.-P. and Delfiner, P., Geostatistics: modeling spatial uncertainty, volume 497. John Wiley & Sons, 2012. Google Scholar
R Development Core Team., R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2006. ISBN 3-900051-07-0. Google Scholar
Renard, D., Bez, N., Desassis, N., and Beucher, H., RGeostats: The Geostatistical package 10.0.8. MINES ParisTech. Free download from: http://cg.ensmp.fr/rgeostats, 2014.
Ribeiro Jr., P. and Diggle, P., geoR: a package for geostatistical analysis. R-NEWS, 1(2):15–18, 2001. Google Scholar
Hatvani, I., Leuenberger, M., Kohán, B., and Kern, Z., Geostatistical analysis and isoscape of ice core derived water stable isotope records in an Antarctic macro region. Polar Science, 2017. Web of ScienceGoogle Scholar
Faraway, J. J., Linear models with R. CRC Press, 2014. Google Scholar
Kabacoff, R., R in action: data analysis and graphics with R. Manning Publications Co., 2015. Google Scholar
Bonvalot, S., Balmino, G., Briais, A., Kuhn, M., Peyrefitte, A., Vales, N., et al., World gravity map, 636 1:50,000,000 map, eds. Technical report, BGI-CGMW-CNES-IRD, Paris. 637, 2012. Google Scholar
Diggle, P. and Ribeiro, P. J., Model-based geostatistics. Springer Science & Business Media, 2007. Google Scholar
Rossiter, D. G., Geostatistics & open-source statistical computing. Exercise 5: Predicting from point samples, 2014. Google Scholar
Rossi, M. and Deutsch, C., Mineral resource estimation. Springer Science & Business Media, 2013. Google Scholar
Rossiter, D. G., Geostatistics & open-source statistical computing. Lecture 6 - Assessing the quality of spatial predictions, 2014. Google Scholar
Olea, R., Geostatistics for engineers and earth scientists. Springer Science & Business Media, 2012. Google Scholar
Akaike, H., Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer, 1998. Google Scholar
Hengl, T., A practical guide to geostatistical mapping. University of Amsterdam, 2009. Google Scholar
Guagliardi, I., Cicchella, D., De Rosa, R., and Buttafuoco, G., Assessment of lead pollution in topsoils of a southern italy area: Analysis of urban and peri-urban environment. Journal of Environmental Sciences, 33:179–187, 2015. CrossrefWeb of ScienceGoogle Scholar
Fabijánczyk, P., Zawadzki, J., and Wojtkowska, M., Geostatistical study of spatial correlations of lead and zinc concentration in urban reservoir, study case Czerniakowskie lake, Warsaw, Poland. Open Geosciences, 8(1):484–492, 2016. Web of ScienceGoogle Scholar
Armstrong, M., Basic linear geostatistics. Springer Science & Business Media, 1998. Google Scholar
Banerjee, P., Foulger, G., Dabral, C., et al. Geoid undulation modelling and interpretation at Ladak, nw Himalaya using Gps and levelling data. Journal of Geodesy, 73(2):79–86, 1999. CrossrefGoogle Scholar
About the article
Published Online: 2017-06-14
Citation Information: Open Geosciences, Volume 9, Issue 1, Pages 255–265, ISSN (Online) 2391-5447, DOI: https://doi.org/10.1515/geo-2017-0021.
© 2017 E.G. Chicaiza et al.. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License. BY-NC-ND 3.0