Adjustment models for multivariate geodetic time series with vector-autoregressive errors


In this contribution, a vector-autoregressive (VAR) process with multivariate t-distributed random deviations is incorporated into the Gauss-Helmert model (GHM), resulting in an innovative adjustment model. This model is versatile since it allows for a wide range of functional models, unknown forms of auto- and cross-correlations, and outlier patterns. Subsequently, a computationally convenient iteratively reweighted least squares method based on an expectation maximization algorithm is derived in order to estimate the parameters of the functional model, the unknown coefficients of the VAR process, the cofactor matrix, and the degree of freedom of the t-distribution. The proposed method is validated in terms of its estimation bias and convergence behavior by means of a Monte Carlo simulation based on a GHM of a circle in two dimensions. The methodology is applied in two different fields of application within engineering geodesy: In the first scenario, the offset and linear drift of a noisy accelerometer are estimated based on a Gauss-Markov model with VAR and multivariate t-distributed errors, as a special case of the proposed GHM. In the second scenario, real laser tracker measurements with outliers are adjusted to estimate the parameters of a sphere, employing the proposed GHM with VAR and multivariate t-distributed errors. For both scenarios, the estimated parameters of the fitted VAR model and multivariate t-distribution are analyzed for evidence of auto- or cross-correlations and of deviations from a normal distribution regarding the measurement noise.


Introduction
Geodetic observation models of surveyed phenomena often include unknown quantities in terms of parameters to be estimated by adjustment computations. The most general structure of an observation model is described by nonlinear condition equations in which multiple observations and unknown parameters may be linked to each other [1]. This general case of adjustment calculus was called the Gauss-Helmert model (GHM) by Wolf [2]. There exist numerous special cases of this model that have been studied extensively, e. g., the errors-in-variables (EIV) model adjusted by the method of total least squares [3] or by its various generalizations [4,5,6,7]. Another special case of the GHM is the Gauss-Markov model (GMM), in which the observations are still linked to functional model parameters but occur in separate equations [8,9,10]. Such models may also contain variance-covariance information about the measurements; this information defines the so-called stochastic model. Since adjustment calculus has been elaborated on the basis of matrix calculus, variance-covariance information regarding the observables is usually arranged as a matrix, called the "variance-covariance matrix" (VCM).
Since the dimension of the VCM equals the number of observations, the storage and processing of this matrix in the course of the adjustment computations can be challenging, especially because modern sensors such as accelerometers, laser scanners, laser trackers or satellite-based sensors tend to provide huge data sets consisting of millions or even billions of observations even within relatively short periods of time. It is usually not an option to omit the variance-covariance information within the adjustment computations since the standard parameter estimators lose desirable quality characteristics when this information is neglected. For instance, the least-squares estimator in the GMM ceases to be the best linear unbiased estimator (BLUE) when the VCM is dropped in the estimation equation [11]. Furthermore, it is often desirable in geodetic data analysis to obtain a realistic description of the uncertainties or variance-covariance information of the estimated parameters, information that is unavailable when an adequate stochastic model of the observables is missing. The "curse of dimensionality" concerning the VCM is aggravated by the increasingly practiced combination of multiple observation series stemming, e. g., from 3D sensors or from sensor networks. Time series of such measurements, e. g., GPS and laser scanner measurements, frequently have been found to be correlated in such a way that the VCM is fully populated, e. g., [12,13,14,15,16,17].
To alleviate the computational burden or even computational infeasibility in connection with the storage and processing of a huge VCM, an alternative approach to the stochastic modeling of variance-covariance information can be applied. The goal of that approach is to fully decorrelate the observables by transforming the observation model in such a way that the VCM of the transformed observables becomes a diagonal matrix, e. g., the identity matrix [18]. A diagonal VCM can easily be handled within the adjustment procedure [19], e. g., through vectorization of the occurring matrix products. In geodetic applications the correlations of the observables may frequently be described as colored noise [20], in which case the stochastic model can be expressed as an autoregressive (AR) or autoregressive moving average (ARMA) process and the transformation of the observation equations achieved by means of a computationally efficient digital de-correlation filter [21,22]. Such processes have also been estimated by means of total least squares based on an EIV model [23]. To model auto- and cross-correlations of multivariate time series, the aforementioned AR and ARMA processes have been extended to vector-autoregressive (VAR) and vector-autoregressive moving average (VARMA) processes [24]. The model order of these recursive processes can be specified to define how far the correlations reach into the past. Thus, VAR(MA) processes can be used to model quite detailed patterns of auto- and cross-correlations. The innovation of the current contribution is to incorporate VAR processes into the GHM as colored noise models.
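To make the de-correlation filter idea concrete, the following minimal sketch (our own illustration, not code from this paper) applies the inverse filter of a known AR(2) process to a correlated series and recovers the driving white noise; the coefficient values are arbitrary assumptions.

```python
import numpy as np

def ar_filter(e, alpha):
    """De-correlation filter: u_t = e_t - alpha_1 e_{t-1} - ... - alpha_p e_{t-p}.

    Pre-sample values e_0, e_{-1}, ... are conventionally taken as zero.
    """
    e = np.asarray(e, dtype=float)
    u = e.copy()
    for j, a in enumerate(alpha, start=1):
        u[j:] -= a * e[:-j]
    return u

# simulate an AR(2) series from known white noise, then invert the filter
rng = np.random.default_rng(0)
alpha = [0.6, -0.3]
w = rng.standard_normal(500)
e = np.zeros(500)
for t in range(500):
    e[t] = w[t] + sum(a * e[t - 1 - j] for j, a in enumerate(alpha) if t - 1 - j >= 0)

u = ar_filter(e, alpha)   # recovers the injected white noise (up to rounding)
```

After filtering, the transformed observables have a diagonal VCM, so the adjustment can proceed without storing a fully populated matrix.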
The white-noise components of (V)AR or (V)ARMA processes are usually assumed to be normally distributed. However, this assumption may be unrealistic or at least questionable in a practical situation, e. g., due to numerous outliers afflicting the measurements. It may then be safer to replace the normal distribution by a larger family of distributions defined by probability density functions having thicker tails than the "ideal" normal distributions. For instance, the usage of generalized t-distributions and of scaled t-distributions has been proposed in connection with ARMA processes [25]. The scaled, multivariate t-distributions have also been applied in connection with VAR processes within rather simple GMMs [26,27]. A key feature of using multivariate t-distributions is that the associated degree of freedom (df) can be estimated alongside the functional parameters, the VAR coefficients and the scale or cofactor matrix; such a method may be called a self-tuning estimator [28] since it does not involve a tuning constant for classifying the in- and outliers, in contrast to, for instance, Huber's M-estimator [29]. The df provides evidence of deviations of the measurements' actual probability distribution from a normal distribution: For large df the multivariate t-distribution is similar to a multivariate normal distribution, whereas for small df the t-distribution has much thicker tails than the latter, which may indicate the presence of numerous outliers.
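The tail behavior behind this reasoning is easy to quantify with standard distribution functions; the following comparison (our own sketch using SciPy, not part of the paper) contrasts the tail mass beyond four standard units for small and large df with the Gaussian tail.

```python
from scipy import stats

# tail probability P(X > 4) under Student's t for small vs. large df, and the normal
tail_t4 = stats.t.sf(4, df=4)       # heavy tails: noticeably more mass beyond 4
tail_t100 = stats.t.sf(4, df=100)   # already close to the Gaussian tail
tail_norm = stats.norm.sf(4)
```

A df of 4 places orders of magnitude more probability in the far tails than the normal distribution, which is exactly why a small estimated df signals outlier-afflicted data.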
The estimation procedure is implemented as an iteratively reweighted least-squares (IRLS) method. The weights are treated as random variables, which are numerically determined by the expectation step within an expectation maximization (EM) algorithm [30]. The various types of model parameters are estimated group-by-group within the maximization step. To deal with the condition equations within the GHM, constrained EM in the spirit of [31] is applied. Due to the down-weighting of observations with extreme errors, this method can be expected to yield a robust estimator for the GHM. Koch introduced the GHM with t-distributed errors and subsequently extended the model to include variance components [32,33]. This type of GHM is now extended to include a VAR model with multivariate t-distributed errors. This yields an approach complementary to that of modeling a small number of outliers as additive, unknown offsets within the afflicted observations, as done in [34,35]. Therefore, the proposed adjustment model is directed at applications in which rather large numbers of outliers can be expected. To model the auto- and cross-correlations, we do not employ the more general VARMA processes since we found the estimation of the moving-average part by means of an EM algorithm to diverge.
The paper is structured as follows. In Section 2, we specify the multivariate observation model, the correlation model, and the stochastic model, which jointly define the GHM with VAR and multivariate t-distributed errors. In Section 3, we formulate the optimization principle employed to adjust this model, and we derive the normal equations to be solved within the EM algorithm. We show that this algorithm estimates the functional parameters, the VAR coefficients and the parameters defining the multivariate t-distribution essentially via IRLS. The Monte Carlo simulation described in Section 4 serves the purpose of validating the algorithm by showing the biases to be expected in the practical situation of data approximation by a circle based on a GHM with VAR and multivariate t-distributed errors. In Section 5, we demonstrate the application of the algorithm for the adjustment of a real, multivariate accelerometer data set, whose offset and drift are modeled as part of a GMM (viewed as a special case of the GHM) with VAR and multivariate t-distributed errors. In a second applied scenario, real laser tracker measurements with outliers are adjusted to estimate the parameters of a sphere, employing the proposed GHM with VAR and multivariate t-distributed errors. Finally, some limitations and consequently potential extensions of the presented methodology to be explored in the future are outlined in the conclusions and outlook.

Adjustment models for multivariate time series
The purpose of this section is to develop adjustment models suitable for the fitting of multivariate geodetic time series. According to the standard categorization in geodetic adjustment theory these models have two main components: the deterministic model and the stochastic model. The following sub-sections demonstrate some possible structures of these models that have been found useful in geodetic time series analysis. These structures allow for certain generalizations of the standard GMM and GHM by allowing their random deviations to be auto- and cross-correlated through (V)AR processes, while potential deviations from a normal distribution are modeled stochastically by a multivariate t-distribution.

Deterministic part of the Gauss-Helmert model
We suppose that N time series are observed simultaneously without gaps, so that each series consists of n measurements. The measurements can then be doubly indexed in the form ℓ_{k,t} with k = 1, . . . , N and t = 1, . . . , n, where k represents the group index and t a time-related index. We further suppose that the measurements are used to estimate a specified parameter-dependent functional model, which describes spatial objects possibly varying throughout time. In the most general case, each observation is modeled as the sum of an observation-specific location parameter μ_{k,t} and a corresponding random deviation or "error" e_{k,t}, that is,

ℓ_{k,t} = μ_{k,t} + e_{k,t}   (k = 1, . . . , N;  t = 1, . . . , n).   (1)

Each type of quantity in these equations can be vectorized in different ways to obtain more compact notations. The rule will be that all column vectors are symbolized by bold-faced, lower-case Greek letters (in case of unknown parameters) or Roman letters. Firstly, the quantities of one type occurring throughout the time series k = 1, . . . , N may be collected for every time instance t = 1, . . . , n via

ℓ_t = [ℓ_{1,t} ⋯ ℓ_{N,t}]^T,  μ_t = [μ_{1,t} ⋯ μ_{N,t}]^T,  e_t = [e_{1,t} ⋯ e_{N,t}]^T   (2)

to define the multivariate observation time series, for which the observation equations (1) read

ℓ_t = μ_t + e_t   (t = 1, . . . , n).   (3)

Secondly, the observations, location parameters and random deviations can be grouped series-by-series for k = 1, . . . , N according to

ℓ_{k:} = [ℓ_{k,1} ⋯ ℓ_{k,n}]^T,  μ_{k:} = [μ_{k,1} ⋯ μ_{k,n}]^T,  e_{k:} = [e_{k,1} ⋯ e_{k,n}]^T   (4)

to define the observation series

ℓ_{k:} = μ_{k:} + e_{k:}   (k = 1, . . . , N).

Note that the additional colons are used to distinguish the two types of vectors, e. g., the observations ℓ_1 at time instance t = 1 and the observations ℓ_{1:} within time series k = 1. Finally, it will be convenient to combine the vectors indexed by time to the "complete" vectors

ℓ = [ℓ_1^T ⋯ ℓ_n^T]^T,  μ = [μ_1^T ⋯ μ_n^T]^T,  e = [e_1^T ⋯ e_n^T]^T   (6)

so that (3) turns into the single vector equation

ℓ = μ + e.   (7)

Note that the relationship μ = ℓ − e obtained from (7) would be written as l̂ = ℓ + v in the usual notation of the GHM, where l̂ are the (unknown) "adjusted observations".
The choice of terminology ("location parameter") and notation (μ) is motivated by our intention to describe the observations in such a way that maximum likelihood (ML) estimation via an EM algorithm can easily be established. Note also that the vectors ℓ_t, μ_t and e_t can be retrieved from the corresponding complete vectors ℓ, μ and e via

ℓ_t = J_t ℓ,  μ_t = J_t μ,  e_t = J_t e,

where the (N × Nn)-matrix J_t can be written as the block matrix

J_t = [0_N ⋯ 0_N  I_N  0_N ⋯ 0_N],

beginning with t − 1 square zero matrices 0_N of dimension N, followed by the N-dimensional identity matrix I_N, and ending with n − t further zero blocks 0_N. In practice the goal usually is to fit a specific functional model to the given measurements, e. g., a circle. Such a model generally depends on unknown parameters β = [β_1 ⋯ β_m]^T, e. g., the center coordinates and the radius of a circle. To incorporate this additional model structure it is assumed that the functional parameters β and the location parameters μ fulfill the condition equations

h_1(β, μ) = 0,
⋮
h_r(β, μ) = 0,
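The retrieval via the selection matrix J_t can be sketched in a few lines (our own illustration; dimensions as in the text):

```python
import numpy as np

def selection_matrix(t, N, n):
    """J_t = [0_N ... 0_N  I_N  0_N ... 0_N] with I_N in block t (1-based), size N x Nn."""
    Jt = np.zeros((N, N * n))
    Jt[:, (t - 1) * N : t * N] = np.eye(N)
    return Jt

N, n = 2, 5
ell = np.arange(N * n, dtype=float)   # complete vector [ell_1^T ... ell_n^T]^T
ell_blocks = ell.reshape(n, N)        # row t-1 holds ell_t

ell_3 = selection_matrix(3, N, n) @ ell   # extracts ell_t for t = 3
```

In practice one would of course slice the array directly; the explicit matrix merely mirrors the block structure used in the derivations.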
which can be written in vectorized form as

h(β, μ) = 0.   (13)

This equation and (7) form the deterministic part of the GHM, which was defined similarly in [32]. The given, somewhat unusual notation, especially the introduction of location parameters in the setup of the GHM, is more aligned with the method of constrained ML estimation than with the usual method of least squares (LS). The former estimation method will be more suitable than the latter in connection with the estimation of VAR models and the usage of a non-Gaussian pdf.

Deterministic part of the Gauss-Markov model
With certain kinds of adjustment problems each location parameter μ_{k,t} can be expressed directly as the value of a function f_{k,t} depending on the functional parameters β, through the relationship

μ_{k,t} = f_{k,t}(β).   (14)

Then, the observation equations (1) take the form

ℓ_{k,t} = f_{k,t}(β) + e_{k,t}.

Vectorizing the (unknown) function values f_{k,t}(β) similarly as in (2) and (4), these observation equations can be arranged by time as

ℓ_t = f_t(β) + e_t   (t = 1, . . . , n),   (16)

or by time series as

ℓ_{k:} = f_{k:}(β) + e_{k:}   (k = 1, . . . , N),

or collectively, by vectorizing f_t(β) for t = 1, . . . , n as in (6), as

ℓ = f(β) + e.   (18)

This structure of observation equation defines the GMM. Using (14), the condition equations (13) are then simply taken as h(β, μ) = μ − f(β) = 0. In adjustment computations it is more common to state the observation equations in the equivalent form

ℓ + v = f(β),

where the "corrections" v are related to the random deviations e via sign reversal. In view of the additional stochastic model imposed on the observation model later on, the representation (18) is preferred in this paper. In case the function f can be written as a matrix-vector product Xβ, we speak of a "linear model", where X is called the "design matrix"; we then write

ℓ = Xβ + e

instead of (18). Linear models have been studied thoroughly in mathematical statistics and geodesy [36,18]. The parameters β can be estimated without any additional assumptions, e. g., by applying "ordinary LS". However, if the observables are correlated or heteroscedastic, the LS estimator will not be BLUE if the correlations and unequal variances present are not taken into account by incorporating a corresponding stochastic model into the observation model and the estimator.
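As a small illustration of the BLUE remark (our own sketch with hypothetical numbers, not the paper's data), the following compares ordinary LS with weighted LS in a linear GMM with heteroscedastic noise; by construction, the weighted estimator minimizes the weighted residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # design matrix: offset + drift
beta_true = np.array([0.5, 2.0])
sigma = np.where(np.arange(n) < n // 2, 0.1, 1.0)        # heteroscedastic std. deviations
ell = X @ beta_true + sigma * rng.standard_normal(n)

P = np.diag(1.0 / sigma**2)                               # weight matrix (inverse variances)
beta_wls = np.linalg.solve(X.T @ P @ X, X.T @ P @ ell)    # weighted LS estimate
beta_ols = np.linalg.solve(X.T @ X, X.T @ ell)            # ordinary LS ignores the weights

def wssr(b):
    """Weighted sum of squared residuals (ell - X b)^T P (ell - X b)."""
    r = ell - X @ b
    return r @ P @ r
```

Both estimators are unbiased here, but only the weighted one attains minimum variance among linear unbiased estimators when the stochastic model is correct.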

Stochastic modeling

Modeling of correlations using vector-autoregressive processes
The simplest case of a stochastic model occurs for uncorrelated observables with identical variances, so that the VCM Σ is the rescaled identity matrix σ²I, where the variance factor σ² may be a known number or an unknown parameter. Geodesists have frequently pointed out that observables are usually correlated (e. g., [37]) and that it is of great practical importance to take correlations into account in geodetic data analysis (e. g., [38]). In case of correlated measurements the VCM Σ typically is either fully populated or has a band-limited structure (see [21]). Its decomposition therefore yields Σ = σ²Q, where the so-called "cofactor matrix" Q has the same population characteristics as Σ. When a univariate time series is surveyed, the correlations between the various random variables are called "auto-correlations". Oftentimes these can be modeled by means of an AR process

E_t = α_1 E_{t−1} + ⋯ + α_p E_{t−p} + U_t

for some p ∈ ℕ (the "model order" of the AR process) and some α_1, . . . , α_p (the "coefficients" of the AR process).
Here, (U_t)_{t∈ℤ} is a white noise series that recursively generates an auto-correlated time series (E_t)_{t∈ℤ} (see, e. g., [42] for details). In case of multivariate observables, additional correlations between the random variables of different time series, so-called "cross-correlations", may occur. To take both auto- and cross-correlations into account, an error process (E_t)_{t∈ℤ} can be defined to be a VAR(p) process by the equation (cf. Sect. 8.4 in [24])

E_t = A_1 E_{t−1} + ⋯ + A_p E_{t−p} + U_t,   (22)

where (U_t)_{t∈ℤ} is a multivariate form of white noise. The coefficients of this VAR process are now arranged within the (N × N)-matrices A_1, . . . , A_p.
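A VAR(1) recursion of the form (22) can be simulated in a few lines (our own sketch with arbitrary, stable coefficient values) to show that it induces nonzero cross-correlations between the component series:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 2, 20000
A1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])            # VAR(1) coefficient matrix (illustrative values)
U = rng.standard_normal((n, N))        # multivariate white noise
E = np.zeros((n, N))
for t in range(1, n):
    E[t] = A1 @ E[t - 1] + U[t]        # the recursion of eq. (22) with p = 1

# lag-1 cross-correlation between series 1 and the past of series 2
c = np.corrcoef(E[1:, 0], E[:-1, 1])[0, 1]
```

The off-diagonal entries of A_1 couple the two series, so c is clearly nonzero even though the driving noise components are independent.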

Modeling of observational weights using multivariate t-distributions
Another standard decomposition of the VCM Σ is given in terms of the "weight matrix" P by

Σ = σ² P^{−1}.

The main diagonal of the VCM Σ consists of the variances of the individual observables, whereas the main diagonal elements of P are commonly referred to as "weights". The meaning of the weights is twofold in practice: On the one hand, weights may be interpreted as "repetition numbers", i. e., as the number of times that the corresponding observations have occurred. On the other hand, weights can be assigned to observations in order to diminish or increase their relative importance and influence in statistical inference. Abnormal or extreme observations ("outliers"), located in the tails of the probability density function (pdf) defining their probability distribution, are frequently encountered in geodetic data analysis. One standard approach to handling such observations consists in their "down-weighting" by giving them lesser weights than non-outliers ("inliers"). Alternatively, it is possible to carry out statistical tests in order to identify and delete outlying observations (e. g., [39,40,41]). However, the deletion of observations creates data gaps, which prohibit the usage of recursive time series models, such as the VAR processes considered in Sect. 2.2.1. Therefore, we restrict ourselves to the approach based on the down-weighting of outlying observations, which avoids their elimination and consequential data gaps. Since the data sets analyzed nowadays tend to be huge and oftentimes have stochastic properties unknown to the practitioner, it is necessary to have a data-adaptive and computationally efficient procedure for computing weights at hand. Mathematically, such a procedure is based on a function (the "weight function"), which corresponds to a certain pdf of the random deviations.
For the purpose of multivariate modeling including potential cross-correlations, the well-known Student's t-distribution can be generalized to an N-variate t-distribution t(μ, Ψ, ν) with location-parameter vector μ, scale matrix Ψ and df ν, defined by the pdf (cf. [43])

f(x) = Γ((ν + N)/2) / [Γ(ν/2) (νπ)^{N/2} |Ψ|^{1/2}] ⋅ [1 + (x − μ)^T Ψ^{−1} (x − μ)/ν]^{−(ν+N)/2}.

Since the scale matrix Ψ is related to the VCM via Σ = ν/(ν − 2) ⋅ Ψ for df ν > 2, it may be viewed as a "cofactor matrix". Then, a multivariate t-distributed form of white noise of a VAR process can be defined by

U_t ∼ t(0, Ψ, ν)   (t ∈ ℤ), independently,   (26)

(see also [44]). The joint pdf of n such white noise vectors U_1, . . . , U_n is therefore given by

f(u_1, . . . , u_n) = ∏_{t=1}^{n} f(u_t).   (27)

This pdf is more intricate and therefore more difficult to handle in maximum likelihood (ML) estimation than the pdf of a multivariate normal distribution. However, it is well known that a multivariate t-distributed random vector X ∼ t(μ, Ψ, ν) is conditionally normally distributed, X | w ∼ N(μ, Ψ/w), where w is a given realization of a gamma-distributed random variable W ∼ G(ν/2, ν/2) (cf. [43]). Due to this property, the distributional model (26) can be replaced by the assumptions

U_t | W_t = w_t ∼ N(0, Ψ/w_t)   (28)

and

W_t ∼ G(ν/2, ν/2), independently.   (29)

It can be seen in (28) that the white noise vector U_t has the VCM Σ_t = Ψ/w_t, which is a rescaled version of the cofactor matrix Ψ. For a small value w_t, the cofactor matrix Ψ may be viewed as being "inflated"; in the univariate case N = 1, this rescaling approach therefore has been called the "variance inflation model" (e. g., [41]). The idea of this type of outlier model is to assign a relatively large variance to an error which is located in the tails of the pdf, in comparison to an error near the center of the pdf. Under the distributional assumptions (28)-(29), the joint pdf of the white noise vectors U_1, . . . , U_n and the random variables W_1, . . . , W_n assigned to the n measurements can be shown to take the product form

f(u_1, . . . , u_n, w_1, . . . , w_n) = ∏_{t=1}^{n} f(u_t | w_t) f(w_t),   (30)

and (27) defines the corresponding marginal distribution of U_1, . . . , U_n.
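The conditional-normal (normal/gamma mixture) representation can be checked numerically; the following sketch (ours, univariate for simplicity) generates t-distributed samples via X = Z/√W with W ∼ G(ν/2, ν/2) and verifies the implied variance ν/(ν − 2):

```python
import numpy as np

rng = np.random.default_rng(3)
nu, m = 5.0, 100000
# W ~ G(nu/2, nu/2): gamma with shape nu/2 and rate nu/2, i.e. scale 2/nu
w = rng.gamma(shape=nu / 2, scale=2 / nu, size=m)
z = rng.standard_normal(m)
x = z / np.sqrt(w)            # X | w ~ N(0, 1/w)  =>  marginally X ~ t with nu df

var_empirical = x.var()
var_theory = nu / (nu - 2)    # for scale (cofactor) 1, the variance is nu/(nu-2)
```

This mixture view is exactly what makes the EM weights tractable: conditionally on w, everything is Gaussian.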
Although the joint pdf (30) appears to be even more complicated than the marginal pdf defining the multivariate t-distribution, considerable simplifications can be achieved by taking a conditional expectation of the log-likelihood function defined by this pdf. This step, which exploits the fact that the pdf (30) now involves natural exponential functions exp(.) in contrast to (27), corresponds to the "E-step" of an EM algorithm shown in Sect. 3. We first proceed by defining particular instances of the aforementioned log-likelihood functions which include the deterministic model of the GHM or GMM, as well as the VAR process.

Representations of the adjustment models as likelihood functions
A likelihood function L can be defined by a pdf by fixing an element of its domain (the observation space) and by letting its parameters be "variables". The pdf (30) directly involves as parameters the entries of the cofactor matrix Ψ and the df ν. When certain realizations u_1, . . . , u_n of the white-noise vectors are given, they lead to realizations e_1, . . . , e_n of the VAR process (22). These computational relationships can inversely be written as

u_t = e_t − A_1 e_{t−1} − ⋯ − A_p e_{t−p},   (31)

where e_0, . . . , e_{1−p} are conventionally taken to be zero vectors. Moreover, the realizations e_1, . . . , e_n of a VAR process occur within the condition equations (3) of the GHM, which can inversely be written as

e_t = ℓ_t − μ_t.   (32)

Consequently, (31) can be recast as

u_t = A(L)(ℓ_t − μ_t).   (33)

Here, for brevity of expressions, we defined the so-called lag polynomial A(L) = I_N − A_1 L − ⋯ − A_p L^p (see [24], p. 243) based on the lag operator L, whose powers are used to shift the time index t according to the rule L^j Z_t := Z_{t−j}. Thus, the pdf (30) indirectly depends also on the VAR coefficients in the matrices A_1, . . . , A_p and on the location parameters μ_1, . . . , μ_n of the deterministic model in case of a GHM. In the case of the more structured GMM, f_t(β) is substituted for μ_t in (32) and (33) in view of (16), so that the pdf (30) then depends on the functional model parameters β rather than on μ_1, . . . , μ_n.
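In code, applying the lag polynomial A(L) to the residuals is a short loop; the sketch below (our own, with N = 2, p = 1 and arbitrary coefficients) verifies that filtering a simulated VAR series returns the injected white noise:

```python
import numpy as np

def var_filter(E, A):
    """u_t = A(L) e_t = e_t - A_1 e_{t-1} - ... - A_p e_{t-p}, zero pre-sample values."""
    U = E.copy()
    for j, Aj in enumerate(A, start=1):
        U[j:] -= E[:-j] @ Aj.T
    return U

rng = np.random.default_rng(4)
A1 = np.array([[0.5, 0.2],
               [0.1, 0.4]])
W = rng.standard_normal((1000, 2))   # injected white noise
E = np.zeros((1000, 2))
E[0] = W[0]
for t in range(1, 1000):
    E[t] = A1 @ E[t - 1] + W[t]      # VAR(1) recursion, cf. (22)

U = var_filter(E, [A1])              # A(L) de-correlates the residuals
```

The filtered residuals U coincide with W, so evaluating the likelihood (30) on u_t = A(L)(ℓ_t − μ_t) is a cheap, purely recursive operation.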
To fix the dimensions of the parameter space, the kth rows of the VAR coefficient matrices A_1, . . . , A_p are combined to column vectors α_k (k = 1, . . . , N) of length p ⋅ N, which in turn are combined to the single column vector α = [α_1^T ⋯ α_N^T]^T. Similarly, the row-wise vectorization of the inverted cofactor matrix Ψ^{−1} yields a column vector of length N², which is denoted by ψ. Then, the parameters that the pdf for the GHM and the GMM with VAR and multivariate t-distributed errors directly and indirectly depend upon can be grouped according to

θ_GHM = [μ^T  α^T  ψ^T  ν]^T

and

θ_GMM = [β^T  α^T  ψ^T  ν]^T.

Taking the natural logarithm of the pdf (30) with u_t given by (33), the log-likelihood function of this GHM can be obtained, see (81) in the Appendix. The log-likelihood function for the GHM describes only the observation model, so that the constraints h(β, μ) = 0 in (13) need to be provided for in addition when the model parameters θ_GHM are estimated. Thus, the complete parameter vector consists of the groups

θ = [β^T  θ_GHM^T]^T = [β^T  μ^T  α^T  ψ^T  ν]^T.

A similar log-likelihood involving multiple (univariate) AR processes instead of a single (multivariate) VAR process was established in [46]. To obtain the log-likelihood function log L(θ_GMM; ℓ) for the GMM, which was employed in [26], the location parameters μ_t are simply replaced by f_t(β); the aforementioned constraints are then omitted since the location parameters μ are not modeled explicitly.

Estimation
Maximum likelihood estimation of the model parameters θ GHM based on the log-likelihood function (81) requires values for the unobservable data w. Since a stochastic model has been specified for these quantities, such values can be determined as part of an EM algorithm [30].
A similar approach has previously been carried out for numerous models that can be identified as special cases of the current GHM. In all previous instances the EM algorithm was found to be suitable in view of its stable convergence and its computationally simple form as IRLS (see, e. g., [47,48]), where the entries of w play the role of the weights. The iterations are indexed by s in the following.
In the expectation step ("E-step") of the EM or IRLS algorithm, the weights are computed as conditional expectations of the random variables W based on their stochastic model, using the current solution θ̂_GHM^(s). In the subsequent maximization step ("M-step"), a new solution θ̂_GHM^(s+1) is computed by maximizing the conditional expectation of the log-likelihood function (the "Q-function") using the current weights.
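For multivariate t-distributed white noise, this conditional expectation takes the standard closed form w̃_t = (ν + N)/(ν + u_tᵀ Ψ⁻¹ u_t), known from EM estimation of the t-distribution; the following minimal sketch (ours) shows how an outlying residual is down-weighted:

```python
import numpy as np

def estep_weights(U, Psi_inv, nu):
    """E-step weights: w_t = (nu + N) / (nu + u_t^T Psi^{-1} u_t) per residual u_t."""
    N = U.shape[1]
    d2 = np.einsum('ti,ij,tj->t', U, Psi_inv, U)  # squared Mahalanobis distances
    return (nu + N) / (nu + d2)

U = np.array([[0.0, 0.0],    # a central residual
              [5.0, 5.0]])   # an outlying residual
w = estep_weights(U, np.eye(2), nu=4.0)
# w[0] = 6/4 = 1.5, w[1] = 6/54 ≈ 0.11: the outlier is strongly down-weighted
```

The weights then enter the subsequent M-step exactly like observation weights in a classical weighted LS adjustment, which is what makes the procedure an IRLS method.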
The Q-function for the GHM with VAR and multivariate t-distributed errors is given by (82) in the Appendix. Evaluation of the conditional expectation of the random weight variables W yields the computational formula

w̃_t^(s) = (ν̂^(s) + N) / (ν̂^(s) + (û_t^(s))^T (Ψ̂^(s))^{−1} û_t^(s)),

where û_t^(s) = Â^(s)(L)(ℓ_t − μ̂_t^(s)) denotes the de-correlated residuals of iteration step s. The tilde on the conditional expectation w̃_t^(s) is used to distinguish this "theoretical" type of quantity from estimates. To maximize the resulting Q-function (85) under the constraints h(β, μ) = 0, the Lagrangian is defined by augmenting the Q-function with the term λ^T h(β, μ), which involves as additional parameters an (r × 1)-vector λ of unknown Lagrange multipliers. This approach leads to a "constrained EM algorithm" [31]. In the frequently encountered case that h(β, μ) cannot be expressed "linearly", i. e., in the form Xβ + Bμ + m = 0 for some numerically fixed matrices X and B and vector m, linearization is applied as shown for instance in [32]. Here, we define the (m × 1)-vector of "incremental parameters"

Δβ = β − β̂^(s),

the (r × 1)-vector of "misclosures"

m = h(β̂^(s), μ̂^(s)),   (44)

the (r × 1)-vector of "pseudo-misclosures"

m̄ = m + B(ℓ − μ̂^(s)),   (45)

the (r × m)-Jacobi matrix

X = ∂h/∂β^T, evaluated at (β̂^(s), μ̂^(s)),   (46)

and the (r × Nn)-Jacobi matrix

B = ∂h/∂μ^T, evaluated at (β̂^(s), μ̂^(s)).   (47)

Whereas Δβ is unknown, the quantities (44)-(47) are computable with the given parameter estimates of iteration step s. For notational brevity, the dependency of the latter quantities on the iteration step will not be indicated in the following; thus, we write m, m̄, X and B instead of m^(s), m̄^(s), X^(s) and B^(s). Forming now the partial derivatives of the linearized Lagrangian (86) given in the Appendix with respect to the different groups of parameters and setting these equal to zero, one obtains

1. the normal equations (48) for the incremental functional parameters Δβ;
2. the normal equations (49) for the location parameters μ;
3. the normal equations (52) for the VAR coefficient matrices, denoting with e_i = 0 the zero vectors for all i ≤ 0 and letting W^(s) be the n-dimensional diagonal matrix with diagonal entries w̃_1^(s), . . . , w̃_n^(s);
4. the normal equations (54) for the inverse cofactor matrix Ψ^{−1};
5. the normal equation (55) for the df ν (cf. Sect. 5.8.2 in [47]), involving the digamma function dg; and
6.
the normal equations (56) for the Lagrange multipliers λ. Since the location parameters are obtained from (49) in the closed form (57) (supposing that the inverse involved exists), these parameters can be eliminated from (56) via substitution, which results in (58). Together with (48), this gives the normal equation system (59). The parameter groups (λ, Δβ), (A_1, . . . , A_p), Ψ^{−1} and ν are estimated in that order by successively solving (59), (52), (54) and (55). The parameter group μ can be determined through (57) after solving for (λ, Δβ). While the equation systems (59), (52) and (54) are linear, the solution of (55) requires a zero search. This step-wise solution approach defines an ECM algorithm, which extends previously established ECM algorithms in simpler models.
Next, the new solution of the VAR coefficient matrices is computed from (52), based on the new correlated residuals e^(s+1) computed from the model (7) and the corresponding residual matrix, with the usual boundary values e_i^(s+1) = 0 for all i ≤ 0. The current VAR model defines a new de-correlation filter Â^(s+1)(L), by which the correlated residuals e^(s+1) are transformed into the white-noise residuals u^(s+1) in view of (33) and (64). Conditional on these values, equation system (54) gives rise to the solution Ψ̂^(s+1) for the cofactor matrix. Finally, ν̂^(s+1) may be found by a zero search to satisfy (55). It is generally known that the ECM algorithm can be sped up by solving instead equation (68), which is obtained by defining the log-likelihood function directly on the joint pdf (30) of the white-noise series based on the multivariate t-distribution, and setting its partial derivative with respect to ν equal to zero. This modification of the ECM algorithm has been called ECM Either (ECME) [49,50]. The zero search can be performed with mathematical standard software such as the INTLAB library [51] or using a grid search. The estimation of the df concludes the current M-step. The E- and M-steps (in the form of an ECME algorithm) are repeated until certain stop criteria are fulfilled, e. g., a combination of a maximum number of iterations and thresholds for the minimum absolute difference between the current and the previous solution. We defined the threshold ε_ν for the df to be greater than the other threshold since the df ν fluctuates quite strongly and is the parameter most challenging to estimate. In the following, the solution resulting from the final iteration step is indicated by using "hats", that is, β̂, μ̂, etc. This EM algorithm was tested in the Monte Carlo simulation of Sect. 4.1. In the special case of a GMM with VAR and multivariate t-distributed errors, a similar unconstrained algorithm can be derived [26].
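The zero or grid search for the df can be illustrated in a strongly simplified univariate setting (our own sketch, not the paper's multivariate ECME equation): given whitened residuals, the df is chosen to maximize the t log-likelihood over a grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
nu_true = 5.0
x = stats.t.rvs(nu_true, size=100000, random_state=rng)  # stand-in whitened residuals

# grid search for the df: maximize the log-likelihood, location/scale held fixed
grid = np.arange(2.0, 30.0, 0.1)
loglik = np.array([stats.t.logpdf(x, df).sum() for df in grid])
nu_hat = grid[np.argmax(loglik)]
```

A coarse grid followed by a local refinement (or a bracketing zero search on the derivative) is usually sufficient, since the profile likelihood in ν is smooth and unimodal in typical cases.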
The methodology is applied in the real data study of Sect. 4.2 and Sect. 4.3.

Gauss-Helmert model: Adjustment of simulated data of a 2D circle
In the first part of this section, the performance of the EM algorithm derived in the previous Sect. 3 in producing correct estimates is studied within a Monte Carlo (MC) simulation. For this purpose, generated measurements of N = 2 time series are approximated by a 2D circle defined by the nonlinear equation

(μ_{x,t} − c_x)² + (μ_{y,t} − c_y)² − r² = 0.

On the one hand, the vector β of functional parameters consists of the circle center coordinates c_x, c_y and the radius r, whose simulated true values were fixed in advance. On the other hand, the vector μ = [μ_1^T, . . . , μ_n^T]^T of location parameters consists of n measured points, where n was set to 250, 500, 2500 and 5000 in different simulation runs. For greater clarity of notation, we write μ_{x,t}, μ_{y,t} instead of μ_{1,t}, μ_{2,t} in the sequel. The true point coordinates were defined to lie on this circle, and random numbers u_t = [u_{x,t}, u_{y,t}]^T for the white-noise components were generated from the multivariate t-distribution with σ_0² = 0.001², Ψ_x = 1.0, Ψ_y = 4.0 and Ψ_{x,y} = 0.5. The given cofactor matrix reflects the assumption that the entries of U_t are pairwise correlated by the correlation coefficient ρ = 0.25. Simulations were carried out for the df ν = 4 (referred to as "stochastic model A") and ν = 7 ("stochastic model B").
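A sketch of such a simulation setup (ours; the circle parameters below are hypothetical stand-ins, since the true values are not reproduced in this text, and the VAR recursion is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(5)
n, nu = 500, 4.0                              # stochastic model A: nu = 4
sigma0 = 0.001
Psi = np.array([[1.0, 0.5],
                [0.5, 4.0]])                  # cofactor matrix (rho = 0.5/sqrt(4) = 0.25)
cx, cy, r = 2.0, 3.0, 5.0                     # hypothetical circle parameters

phi = np.linspace(0, 2 * np.pi, n, endpoint=False)
mu = np.column_stack([cx + r * np.cos(phi), cy + r * np.sin(phi)])  # true points

# multivariate t white noise via the normal/gamma mixture, scaled by sigma_0
C = np.linalg.cholesky(Psi)
w = rng.gamma(nu / 2, 2 / nu, size=n)
U = sigma0 * (rng.standard_normal((n, 2)) @ C.T) / np.sqrt(w)[:, None]

ell = mu + U                                  # simulated observations
```

Each MC sample regenerates U with a fresh seed; the adjustment algorithm is then run on ℓ and its estimates compared to the known truth.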
In total, 500 MC samples of the white-noise components for stochastic models A and B as well as for the different sample sizes n were randomly generated using MATLAB's mvtrnd routine. The resulting simulated observations in models A and B were then adjusted by means of the EM algorithm. The maximum number of iterations was set to itermax = 50, and the convergence thresholds to ε = 10⁻⁸ and ε_ν = 0.001. The unknown parameters β and Ψ of the simulated observations of model A were additionally estimated using the classical least-squares approach to the GHM ("LS"), without iterative re-weighting and without imposing a VAR model (cf. [18]). The comparison between the results of the LS and EM algorithms demonstrates the effect of assuming a simplified stochastic model.

Analysis of the effect of the simplified stochastic model on the estimation results
The results of this comparison are visualized in Fig. 1 and Fig. 2 by means of box plots. In each box plot, the middle line shows the median. The bottom and top borders of the boxes are the 25th percentile (Q1) and the 75th percentile (Q3), respectively. The distance Q3 − Q1 is the inter-quartile range (IQR). The additional lines below and above the box are the whiskers, defined by Q1 − 1.5 ⋅ IQR and Q3 + 1.5 ⋅ IQR, respectively. The red crosses indicate outliers in the sense that their values are more than 1.5 times the IQR away from Q1 or Q3. The results of Fig. 1 show that the precision of the estimated parameters β̂ increases with an increasing number of observations. Overall, the mean values of the estimated parameters are similar for LS and EM. However, Fig. 1 shows for numbers of observations less than 5000 that LS produces numerous extreme MC radius estimates outside the inter-quartile range and far away from the true radius, whereas all radius values estimated by EM lie within that range. Thus, EM is significantly more accurate than LS as far as the estimation of the radius is concerned. Figure 2 shows the estimated scale parameters Ψ̂ for 500 MC runs by means of EM and LS estimation. The number of outliers (red crosses) resulting from the LS estimates is significantly larger than the number resulting from the EM algorithm. It can be clearly seen that the bias of the EM-estimated scale parameters is much smaller than that of the LS-estimated scale parameters. As the number of observations increases, the bias of the estimated scale parameters decreases and the dispersion becomes smaller. The bias of all estimated parameters can be calculated for each MC run as Δβ = β̂ − β and ΔΨ = Ψ̂ − Ψ. The statistical moments of the calculated biases Δβ and ΔΨ of the simulations with 5000 observations are presented in Table 1. For the functional parameters β, the true value lies within the 95 % confidence interval for all estimates (LS, EM model A and EM model B).
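The box-plot quantities described above (median, quartiles, IQR, whisker fences, 1.5 ⋅ IQR outlier rule) and the per-run bias can be computed directly; a minimal sketch:

```python
import numpy as np

def boxplot_stats(x):
    """Quartiles, IQR, whisker fences and outliers per the 1.5*IQR rule."""
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {"median": med, "q1": q1, "q3": q3, "iqr": iqr,
            "fences": (lower_fence, upper_fence), "outliers": outliers}

def bias(estimates, true_value):
    """Bias of MC estimates relative to the true simulation value."""
    return np.asarray(estimates, dtype=float) - true_value
```

Applying `boxplot_stats` to the MC estimates of each parameter reproduces the elements drawn in the box plots.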
In contrast, the true values of Ψ lie outside the estimated 95 % confidence intervals of the estimated cofactor matrix Ψ̂ when LS is employed. Applying EM, this statement is valid only for the 95 % confidence interval of Ψ̂x under stochastic model A and of Ψ̂y under stochastic model B. Here, the biases of the estimator Ψ̂ with respect to the true simulation parameter values when applying EM are smaller than the corresponding biases when employing LS. The greater deviation concerning the VCM estimated by LS is the result of not taking the correlations and the heavy-tailedness of the noise into account.

Analysis of the influence of the degree of freedom on the estimation results
The influence of the df on the estimation results using the EM algorithm is displayed in Fig. 3 and Fig. 4.

Gauss-Markov model: Adjustment of 3D MEMS accelerometer data
In the second part of this section, the performance of the EM algorithm elaborated in Sect. 3 (which we also refer to as the "VAR-multivariate algorithm" in the following) is studied on real data sets measured by an accelerometer. Cost-effective Micro-Electro-Mechanical-Systems (MEMS) accelerometers have received increasing attention in numerous applications such as vibration analysis of bridge structures, e. g., [52,53,54]. In such applications, it is desirable to select a suitable accelerometer by considering, for instance, measurement and sensitivity ranges, uncertainty of measurements, cost, and sampling rate. [55] proposed a two-step scenario to accomplish the suitability analysis for MEMS accelerometers. Firstly, the calibration of the MEMS accelerometers was carried out for different positions and over different temperature ranges. Thereby, the characteristics of the calibration parameters, such as biases, scale factors and non-orthogonalities between axes, were inspected during different temperature changes. Secondly, a controlled excitation experiment was conducted using a shaker to compare and validate estimated harmonic oscillation parameters, including frequency, amplitude and phase shift, with other MEMS accelerometers as well as with a high-end reference accelerometer. The second step additionally allowed to check the time synchronization between the MEMS accelerometers. Later on, [54] extended the aforementioned two-step scenario to a three-step scenario. In the third step, a static test experiment was conducted over a long period of time, which allows estimating an offset and a drift of the measurements as well as auto-correlations and underlying distributional parameters for each axis individually.
As part of this study, and to make the third step of the aforementioned three-step scenario more comprehensive, the (unknown) auto- as well as cross-correlations and the (unknown) distributional characteristics of the acceleration measurements recorded from three different axes are analyzed by employing VAR models with t-distributed errors. For this purpose, acceleration measurements were recorded from a cost-effective MEMS accelerometer of type BNO055 with an Arduino UNO board and a 9-axis motion shield (produced by Bosch GmbH), which is a so-called NAMS. The measurements were carried out in three different axes over a period of 24 hours (Fig. 5) with a sampling frequency of 0.3789 Hz. Although it is possible to perform the measurements with a higher sampling frequency (e. g., up to 200 Hz), the aforementioned sampling frequency was chosen to speed up the processing.
An observation model consisting of (1) a functional model based on a linear drift with an unknown offset (intercept) and a linear drift coefficient, (2) an auto- and cross-correlation model based on a VAR process with unknown coefficient matrices, and (3) a stochastic model based on the centered and scaled t-distribution with an unknown degree of freedom and an unknown scale factor is set up. The model parameters estimated by means of the VAR-multivariate algorithm are then compared with the estimates resulting from the "AR-univariate algorithm" (see [54] and [56] for details), where each axis of the MEMS accelerometer is modeled by an individual AR process. The static acceleration measurements ℓ1, ..., ℓn (n = 32743) are modeled based on a linear drift, which is described by the model equations ℓt = c0 + c1 xt + et (t = 1, ..., n), where the vector β of unknown functional parameters consists of the offset c0 in [m/s²] and the linear drift coefficient c1 in [m/s³]. The quantities x1, ..., xn represent equidistant time instances and e1, ..., en the random deviations contaminated with colored noise. The preceding observation equations can be written in the form ℓt = Xt β + et, where Xt are the rows of the design matrix, which has full rank.

Table 3: Statistics of the estimated unknown parameters in the functional model based on the linear drift model, the auto- and cross-correlation model based on the VAR process and the stochastic model based on the centered and scaled t-distribution with an unknown degree of freedom and unknown scale factor from the AR-univariate and VAR-multivariate algorithms.

Axis  ĉ0 [m/s²]  ĉ1 [m/s³]   σ̂l      p  AR coeff.     WNT  ν̂      Algorithm
x     −0.3597    6.81e-09    0.0066   1  −6.2927e-04   yes  16.87  univar.
y     0.2811     −6.12e-08   0.0132   1  0.0013        yes  120    univar.
z     9.6746     6.65e-09    0.0141   1  0.0087        yes  120    univar.

Table 3 summarizes the statistics of the estimated parameters from the VAR-multivariate and AR-univariate algorithms based on the GMM. ĉ0 [m/s²] is the estimated offset and ĉ1 [m/s³] the estimated linear drift coefficient. p is the VAR model order in case of the VAR-multivariate algorithm and the AR model order in case of the AR-univariate algorithm. WNT stands for "white-noise test" and indicates the acceptance ("yes") or rejection ("no") of the multivariate portmanteau test introduced by [57] and adapted by [26]. σ̂l is the estimated standard deviation of the white-noise components. In this study, the multivariate portmanteau test is applied to select an adequate VAR model order by testing whether the residuals of the VAR model are uncorrelated. For this purpose, the p-value (pv) is calculated for the VAR model orders p = 1, ..., 10. Beforehand, the significance level is defined as α = 0.05, and the maximum lag is set to 20, similar to the research of [26]. The WNT is accepted if pv > α, which corresponds to the acceptance of the white-noise hypothesis; otherwise it is rejected. To estimate the df of the underlying t-distribution, a zero search based on the reliable interval Newton method as described in [58] is performed using the INTLAB library [51].
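The linear-drift model above amounts to a design matrix with a constant column (offset) and a column of time instances (drift). A least-squares sketch for a single axis with synthetic data (the values are illustrative, not the actual accelerometer recordings):

```python
import numpy as np

fs = 0.3789                    # sampling frequency in Hz, as above
n = 1000                       # synthetic length (the real series has n = 32743)
x = np.arange(n) / fs          # equidistant time instances x_1, ..., x_n

# Synthetic accelerations with offset c0 and tiny drift c1 (illustration only)
rng = np.random.default_rng(0)
c0_true, c1_true = -0.36, 1e-8
ell = c0_true + c1_true * x + 0.005 * rng.standard_normal(n)

# Design matrix with rows X_t = [1, x_t]; it has full rank since the x_t differ
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, ell, rcond=None)
c0_hat, c1_hat = beta_hat
```

The proposed algorithm additionally re-weights these equations and estimates the VAR and t-distribution parameters; this plain LS fit only illustrates the functional part of the model.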
The estimated unknown parameters, including offset and drift, are rather similar for the VAR-multivariate and AR-univariate algorithms, and no significant differences are observable. The df estimate initially was below 2 when applying the VAR-multivariate algorithm, so that the covariance matrix Σ of the white noise is theoretically undefined. We also found some numerical instability in the estimation of the parameters of the scale matrix, so that we set the estimate ν̂ to the small but numerically stable value 2.1 and repeated the algorithm with this fixed df. In contrast, the AR-univariate algorithm produced much larger df estimates (16.87, 120, and 120) for the individual axes. The value 120 is the maximum value of the zero-search interval; if the zero is not found within the search interval, the estimate is automatically set to the threshold 120 (which happened in the present case). This threshold indicates that the fitted t-distribution is already very close to a normal distribution. We may therefore conclude that the combination of a VAR process with a multivariate t-distribution yields quite different results than the three individual AR processes with univariate t-distributions, which ignore the cross-correlations in the noise. The reason why the latter combination leads to increased df estimates needs to be investigated further. The standard deviations of the white-noise components for both the VAR-multivariate and AR-univariate algorithms are given in Table 3. As can be seen, the uncertainties in the x-axis of the NAMS acceleration data are smaller than for the two other axes. Fig. 6 (a) illustrates the estimated p-values for different VAR model orders. The estimated p-values are all above the significance level, and thus the VAR model order 1 is selected. Fig. 7 (b) shows the calculated test values, which are compared with the critical values derived from the cumulative distribution function of the chi-square distribution F_χ²(4² ⋅ (20 − p)) with df = 4² ⋅ (20 − p) [26]. As can be seen, all test values are below the critical values, and thus the WNTs are accepted.
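For reference, a multivariate portmanteau statistic of the Hosking type, compared against the chi-square distribution with N² ⋅ (h − p) degrees of freedom as described above, can be sketched as follows (a textbook formulation, not necessarily the exact variant adapted by [26]):

```python
import numpy as np
from scipy.stats import chi2

def multivariate_portmanteau(u, p, h=20, alpha=0.05):
    """Test VAR residuals u (n x N) for multivariate white noise.

    Q = n^2 * sum_{j=1..h} tr(C_j' C_0^{-1} C_j C_0^{-1}) / (n - j),
    compared with chi-square on N^2 * (h - p) degrees of freedom.
    """
    n, N = u.shape
    u = u - u.mean(axis=0)
    C0 = u.T @ u / n
    C0_inv = np.linalg.inv(C0)
    Q = 0.0
    for j in range(1, h + 1):
        Cj = u[j:].T @ u[:-j] / n       # lag-j autocovariance matrix
        Q += np.trace(Cj.T @ C0_inv @ Cj @ C0_inv) / (n - j)
    Q *= n**2
    df = N**2 * (h - p)
    pv = chi2.sf(Q, df)
    return Q, pv, pv > alpha            # WNT accepted if pv > alpha
```

Running this on the (weighted) white-noise residuals for increasing p reproduces the model-order selection procedure described above.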
The estimated VCM (Σ̂), its corresponding correlation matrix (R̂), and the estimated coefficient matrix according to the definition of the VAR coefficient matrix in (23) were obtained from the VAR-multivariate algorithm. As can be seen, the VAR model coefficients corresponding to the y and z axes are slightly greater than those for the x-axis. In addition, the estimated colored and white-noise residuals obtained from the NAMS acceleration data are illustrated in Fig. 8. The difference between the aforementioned residuals is provided (Fig. 9) to give a better impression of the amplitude of the colored-noise residuals on each axis. As can be seen, the y and z axes of the NAMS accelerometer exhibit stronger colored measurement noise than the x-axis.
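The correlation matrix R̂ reported together with Σ̂ follows from the standard normalization R = D^(−1/2) Σ D^(−1/2), where D holds the diagonal of Σ; a short sketch with a hypothetical VCM (not the estimates from the paper):

```python
import numpy as np

def corr_from_cov(Sigma):
    """Convert a variance-covariance matrix to its correlation matrix."""
    d = np.sqrt(np.diag(Sigma))          # standard deviations
    return Sigma / np.outer(d, d)        # R_ij = Sigma_ij / (d_i * d_j)

# Hypothetical 3x3 VCM for the three accelerometer axes
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, 1.5],
                  [0.5, 1.5, 1.0]])
R = corr_from_cov(Sigma)
```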
The approach proposed in this study helps to select a proper MEMS accelerometer for, e. g., the purpose of vibration analysis of bridge structures. It is applied as the third step of the aforementioned scenario in the suitability analysis of the MEMS accelerometer by providing information about the unknown offset and drift coefficients, the VAR model order and the stochastic model parameters. Hence, a sensor with the smallest possible offset and drift over a long period, as well as the lowest measurement uncertainty, can be selected. In addition, when comparing different MEMS accelerometers, a lower VAR model order reveals less cross-correlation between the axes, which can also be considered an important influencing factor in the selection procedure. Moreover, it is observed that the MEMS accelerometers used in this study have minimal standard deviations in the x-axis compared to the two other axes. Therefore, the sensors should be mounted with their x-axis in the main observation direction during the monitoring of bridge structures.

Gauss-Helmert model: Adjustment of a sphere from continuous laser tracker measurements
In the third part of this section, the performance of the EM algorithm is studied in a real-world application estimating the parameters β = [cx, cy, cz, r]ᵀ of a sphere. The practical relevance of this sphere parameter estimation can be seen in the field of surface fitting applications as well as in referencing applications of surface-based and point-wise measuring sensors, i. e., laser scanner and laser tracker measurements. The 3D sphere is defined by the nonlinear equation (μx,t − cx)² + (μy,t − cy)² + (μz,t − cz)² − r² = 0, which extends the nonlinear equation of the 2D circle in (70) to 3D space by considering the z-component. To illustrate and discuss the adjustment of a sphere, 3D coordinates on the surface of a sphere with known radius were obtained by continuous measurements at 10 Hz using a laser tracker of type Leica AT960-LR. The expanded uncertainty of the x-, y- and z-coordinates is given by U_{x,y,z} = ±15 µm + 6 µm/m (MPE according to ISO 10360-10) while using a Leica Red Ring Reflector 1.5" (RRR) in single measurement mode [60]. Here, the data acquisition is performed in continuous measurement mode at 10 Hz using the software tool Tracker Pilot. In the acquisition process, the RRR was moved randomly over the sphere surface from only one laser tracker setup; thus, only a half-sphere is measured. To capture the full sphere surface, at least two standpoints of the laser tracker are required. In that case, the measurements have to be transformed into a common coordinate system, which leads to the presence of referencing uncertainties besides sensor-related uncertainties. In order to focus on sensor uncertainties only, the sphere is observed from one standpoint only. The distance between the laser tracker and the sphere was approx-
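For the GHM linearization, the sphere condition equation and its partial derivatives with respect to the parameters β and the point coordinates are needed; a minimal sketch (symbols as above, numeric values illustrative):

```python
import numpy as np

def sphere_condition(mu, beta):
    """h(mu_t, beta) = (x-c_x)^2 + (y-c_y)^2 + (z-c_z)^2 - r^2, one value per point."""
    cx, cy, cz, r = beta
    d = mu - np.array([cx, cy, cz])
    return np.sum(d**2, axis=1) - r**2

def sphere_jacobians(mu, beta):
    """Rows of the design matrix A (w.r.t. beta) and of B (w.r.t. the coordinates)."""
    cx, cy, cz, r = beta
    d = mu - np.array([cx, cy, cz])
    A = np.column_stack([-2.0 * d, -2.0 * r * np.ones(len(mu))])  # dh/d(c_x,c_y,c_z,r)
    B = 2.0 * d                                                   # dh/d(mu_t), blockwise
    return A, B

# A point exactly on the unit sphere satisfies the condition equation
beta = np.array([0.0, 0.0, 0.0, 1.0])
mu = np.array([[1.0, 0.0, 0.0]])
h = sphere_condition(mu, beta)
```

These Jacobians supply the A- and B-matrices of one GHM iteration; the proposed EM algorithm additionally handles the VAR and t-distribution parameters.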

Analysis of the measurements
In analogy to the analysis scheme for the simulated 2D circle in Sect. 4.1, the sphere parameters are obtained on the one hand by means of an LS estimation, i. e., a classical GHM, and on the other hand by means of the EM algorithm proposed in this contribution. The thresholds used within the EM algorithm are the same as in the simulation. In this real-world application study, the multivariate portmanteau test is applied to select an adequate VAR model order p by testing whether the estimated residuals ût of the VAR model are uncorrelated (i. e., by checking for the presence of white noise). This VAR model order selection for p = 0, ..., 10 was applied similarly to Sect. 4.2, using the significance level α = 0.05 and a maximum lag of 20, similar to the research of [26]. For the VAR model order p = 0, no multivariate VAR process is estimated; instead, the GHM with stochastically independent t-distributed errors is evaluated. The EM algorithm estimates the unknown parameters β, Ψ, ν and A_j, which yields, depending on the VAR model order, in total 4 + 6 + 1 + 9 ⋅ p parameters to estimate. Before discussing the results of the parameter estimation, we discuss the results of the portmanteau test in Fig. 10. The blue line indicates the critical value for the significance level α = 0.05. The red dots show the test values of the weighted white noise that are larger than the critical value. The green star represents the test value that is smaller than the critical value, which means that the test is passed. It can be seen that the test is passed for order p = 3, while for orders p = 4, ..., 7 the test values are close to the critical value. The test for order p = 0 failed and, due to its significantly larger numerical value, is not shown in Fig. 10. Based on these portmanteau test results, we focus in the further discussion on the results of the LS model and the VAR model orders p = 0 and p = 3.

Discussion of the results
The estimated parameters β̂ for the LS model and the VAR model orders p = 0 and p = 3 shown in Table 4 are almost the same, with the maximum differences of 5 μm being in the range of the uncertainty of the laser tracker used. The small value of the df indicates that the residuals u_t are not normally distributed. Differences between the estimated parameters of the three models can be found in the scale matrix and the VCM of the residuals. Σ̂_uu is equivalent to the classical estimate of the VCM of the estimated white noise u. For the LS model, Σ̂_uu is estimated by means of the estimated residuals ê. Σ̂_uu for VAR model order p = 3 clearly shows how the values on the main diagonal of the VCM of the observations are reduced by means of the VAR process. The fact that the matrices Σ̂_uu for the LS model and the VAR model order p = 0 are almost identical can be seen from the residuals shown in Fig. 11. Figure 11 depicts the estimated colored and white-noise residuals obtained from the laser tracker measurements for VAR model order p = 3. The blue line indicates the colored noise, which shows several significant peaks that can be interpreted as potential outliers. These may stem from the imperfect surface of the sphere or can even be associated with the roughness of the surface. Furthermore, these significant peaks can result from the measurement procedure, since for the data acquisition the RRR is moved individually by hand along the surface. If contact with the surface is briefly lost, this also produces peaks, as can be seen in the blue line. Due to the orientation of the local laser tracker coordinate system, individual peaks in the y-component (middle part of Fig. 11) can be associated with an uplift of the RRR towards the laser tracker. The minor colored-noise level of the z-component also results from the measurement setup, i. e.
the orientation of the local laser tracker coordinate system, and reflects the uncertainty differences between the angular and the distance measurement units. Since the estimated parameters β̂ for the LS model and the VAR model orders p = 0 and p = 3 are almost identical (see Table 4), the depicted colored noise is also nearly identical. For the special case of VAR model order p = 0, the residual estimates ê_t and û_t are equal. This implies the equality of Σ̂_uu for the LS model and the VAR model order p = 0. Furthermore, the residuals in Fig. 11 show an equal distribution of positive and negative peaks (outliers) for the individual coordinate components. Due to the symmetric geometry of the sphere, the positive and negative peaks compensate each other in the estimation of the parameters β.
The red circles in Fig. 11 indicate the white noise and the green dots the weighted white noise, i. e., white-noise values multiplied with their weights ŵ. It can be clearly seen that the VAR process is unable to fully whiten the colored noise. First, substituting the t-distribution for the normal distribution allows reducing the influence of the remaining outliers on the parameter estimation. Furthermore, only with the weighted white noise (colored in green) can the portmanteau test be passed, which is not the case for the unweighted white noise (colored in red). Table 5 shows the estimated VAR coefficients for the model orders p = 1, 2, 3. Judging possible correlations of the colored noise of the x-, y- and z-components, we can confirm correlations based on the VAR coefficients shown in Table 5: since the off-diagonal elements are significantly different from zero, correlations are present. As an example, Â3 shows a coefficient of −0.263 for the x-component and a cross-coupling with the y-component with a value of 0.203. Furthermore, the alternating positive and negative signs of the coefficients indicate negative correlations between the coordinate components.
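In EM estimation with a multivariate t-distribution, weights such as ŵ arise as the conditional expectations w_t = (ν + N)/(ν + δ_t), with δ_t the squared Mahalanobis distance of the white-noise residual u_t, so large residuals receive small weights. A hedged sketch (not the exact implementation of the proposed algorithm):

```python
import numpy as np

def t_em_weights(U, Sigma, nu):
    """EM weights w_t = (nu + N) / (nu + delta_t) for residuals U (n x N)."""
    n, N = U.shape
    Sigma_inv = np.linalg.inv(Sigma)
    # Squared Mahalanobis distances delta_t = u_t' Sigma^{-1} u_t
    delta = np.einsum('ti,ij,tj->t', U, Sigma_inv, U)
    return (nu + N) / (nu + delta)

# An outlying residual is down-weighted relative to a small one
U = np.array([[0.0, 0.0, 0.0],
              [10.0, 0.0, 0.0]])
w = t_em_weights(U, np.eye(3), nu=4.0)
```

This illustrates why the weighted white noise in Fig. 11 is robust against the residual peaks: the weights shrink exactly where the Mahalanobis distance is large.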
Taking a look at the VAR coefficients for the model order p = 1, one can see small off-diagonal elements, which shows that the correlation between the coordinate components is hardly captured. This can also be seen from the significant difference between the test value for p = 1 in Fig. 10 and the critical value. With a VAR model of order p = 2 it first becomes possible to estimate the correlations of the colored noise: on the one hand, the off-diagonal elements for p = 2 in Table 5 are clearly larger compared to p = 1; on the other hand, the test value approaches the critical value, as can be seen in Fig. 10.

Conclusions and outlook
The nonlinear Gauss-Helmert model constitutes the most general adjustment model, which is widely used in geodesy. This model consists of condition equations (which link the observations and the functional parameters occurring in the mathematical model employed to approximate the measurements) and of a stochastic model. The latter is usually defined by a VCM, which tends to be huge when the number of measurements is large. We therefore applied and investigated, in the context of multivariate time series, a more manageable type of stochastic model defined by a vector-autoregressive (VAR) process. This model takes both auto- and cross-correlations into account and can be estimated alongside the functional model parameters. By employing a multivariate t-distribution with a data-adaptable scale matrix and degree of freedom, moderate deviations from the usual assumption of normally distributed noise are possible. To fuse this t-distribution with the VAR process and the constraints, we applied the idea of formulating a generalized Gauss-Helmert model in terms of a likelihood function, which is maximized under the constraints. The proposed model allows for a computationally convenient iteratively reweighted least-squares method based on a constrained expectation maximization algorithm. This methodology extends the previously established approach involving univariate autoregressive models, which ignored potential cross-correlations between the time series. The analysis of the biases by means of Monte Carlo simulations showed that the functional model parameters tend to be highly accurate when the number of observations is increased, whereas a significant bias of the degree of freedom of the multivariate t-distribution persists, in particular for smaller degrees of freedom. The accuracy of the estimated VAR coefficients and of the scale parameters generally improves with larger datasets.
The methodology is also applicable to functional models that take the form of a Gauss-Markov model, as shown by an analysis of real accelerometer data modeled by an offset and a linear drift. For another real-world application, which utilizes continuous measurements of a laser tracker on a sphere, the applicability of the proposed Gauss-Helmert model was also demonstrated. Furthermore, the analyses of the results for this application demonstrated that the estimated VAR process coefficients and the estimated parameters of the multivariate t-distribution allow for detailed evaluations of the auto- and cross-correlations as well as of the distributional characteristics of the measurement noise. Thus, actual deviations of the noise from the expected white-noise and normal-distribution behavior become visible in the course of adjusting real-world datasets. Due to its flexibility, the proposed innovative type of Gauss-Helmert model with VAR and t-distributed errors could be useful in various fields of geodesy, such as engineering geodesy or Earth system modeling, where multivariate, functionally complex and correlated time series are analyzed. Since such datasets often contain gaps, a further extension of the model and the EM algorithm, such that the E-step also yields values for missing measurements, appears useful. It is also conceivable to replace the VAR processes of the current model by other stochastic processes that might better match the correlation pattern of a given set of observables.

Log-likelihood and Q-functions
The joint normal equation system for all of the VAR coefficient matrices A 1 , . . . , A p then becomes

Normal equations for the inverse scale matrix
Substituting (33) into (86) for brevity of expressions and applying the arguments given in Sect. 373 in [18], one obtains