In this work, we describe advanced numerical tools for working with multivariate functions and for the analysis of large data sets. These tools will drastically reduce the required computing time and the storage cost, and, therefore, will allow us to consider much larger data sets or finer meshes. Covariance matrices are crucial in spatio-temporal statistical tasks, but are often very expensive to compute and store, especially in three dimensions. Therefore, we approximate covariance functions by cheap surrogates in a low-rank tensor format. We apply the Tucker and canonical tensor decompositions to a family of Matérn- and Slater-type functions with varying parameters and demonstrate numerically that their approximations exhibit exponentially fast convergence. We prove the exponential convergence of the Tucker and canonical approximations in tensor rank parameters. Several statistical operations are performed in this low-rank tensor format, including evaluating the conditional covariance matrix, spatially averaged estimation variance, computing a quadratic form, determinant, trace, loglikelihood, inverse, and Cholesky decomposition of a large covariance matrix. Low-rank tensor approximations reduce the computing and storage costs essentially. For example, the storage cost is reduced from an exponential to a linear scaling , where d is the spatial dimension, n is the number of mesh points in one direction, and r is the tensor rank. Prerequisites for applicability of the proposed techniques are the assumptions that the data, locations, and measurements lie on a tensor (axes-parallel) grid and that the covariance function depends on a distance, .
Nowadays it is very common to work with large spatial data sets [64, 15, 62, 44, 61, 52], for instance, with satellite data, collected over a very large area (e.g., the data collected by the National Center for Atmospheric Research, USA, https://www.earthsystemgrid.org/). This data can also come from a computer simulator code as a solution of a certain multiparametric equation (e.g., Weather research and Forecasting model, https://www.mmm.ucar.edu/weather-research-and-forecasting-model), it could be also sensor data from multiple sources. Typical operations in spatial statistics, such as evaluating the spatially averaged estimation variance, computing quadratic forms of the conditional covariance matrix, or computing maximum of likelihood function  require high computing power and time. Our motivation for using low-rank tensor techniques is that operations on advanced matrices, such as hierarchical, low-rank and sparse matrices, are limited by their high computational costs, especially in three dimensions and for a large number of observations.
A tensor can be simply defined as a high-order matrix, where multi-indices are used instead of indices (see Section 3 and equation (3.1) for a rigorous definition). One way to obtain a tensor from a vector or matrix is to reshape it. For example, we assume that is a vector. We reshape it and obtain a matrix of size , or a tensor of order 3 of size or a tensor of order 6 of size (6 times). Each element of such a six-dimensional hypercube is described by the multi-index . The obtained tensors contain not only rows and columns, but also slices and fibers [40, 41, 10]. These slices and fibers can be analyzed for linear dependences, super symmetry, or sparsity and may result in a strong data compression. Another difference between tensors and matrices is that a matrix (obtained, for instance, after the discretization of a kernel ) separates a point from a point , whereas the corresponding tensor (depending on tensor format) separates from . This implies that tensors may have not just one rank like a matrix, but many. Therefore, we speak about a tensor rank, but not a matrix rank. In this work, we consider two very common tensor formats: canonical (denoted as CP) and Tucker (see Section 3).
Low-rank tensor methods can be gainfully combined with other data-compression techniques in low dimensions. For example, a three-dimensional function can be approximated as the sum of the tensor products of one-dimensional function. Then the usual matrix techniques can be applied to those one-dimensional functions.
To be more concrete, we consider a relatively wide class of Matérn covariance functions. We demonstrate how to approximate Matérn covariance matrices in a low-rank tensor format, then how to perform typical Kriging and spatial statistics operations in this tensor format. Matérn covariance matrices typically depend on three to five unknown hyper-parameters, such as smoothness, three covariance lengths (in a three-dimensional anisotropic case), and variance. We study the dependences of tensor ranks and approximation errors on these parameters. Splitting the spatial variables via low-rank techniques reduces the computing cost for a matrix-vector product from to FLOPs, where d is the spatial dimension, r is the tensor rank, and n is the number of mesh points along the longest edge of the computational domain. For simplicity, we assume that (e.g., for a time-space problem in three dimensions). Other motivating factors for applying low-rank tensor techniques include the following:
The storage cost is reduced from to or, depending on the tensor format, to , where .
The low-rank tensor technique allows us to compute not only the matrix-vector product, but also the inverse , square root , matrix exponent , , , and a likelihood function.
The low-rank tensor approximation is relatively new, but already a well-studied technique with free software libraries available.
The approximation accuracy is fully controlled by the tensor rank. The full rank gives an exact representation.
General limitations of the tensor technique are the following:
It could be time consuming to compute a low-rank tensor decomposition.
It requires axes-parallel mesh.
Some theoretical estimations exist for functions depending on (although more general functions have a low-rank representation in practice).
During the last few years, there has been great interest in numerical methods for representing and approximating large covariance matrices [44, 54, 56, 51, 1, 2, 43]. Low-rank tensors were previously applied to accelerated Kriging and spatial design by orders of magnitude . The covariance matrix under consideration was assumed to be circulant, and the first column had a low-rank decomposition. Therefore, d-dimensional Fourier was applied to and drastically reduce the storage and the computing cost.
The maximum likelihood estimator was computed for parameter-fitting given Gaussian observations with a Matérn covariance matrix . The presented framework for unstructured observations in two spatial dimensions allowed for an evaluation of the log-likelihood and its gradient with computational complexity . The -matrix techniques [19, 22, 21] provide the efficient data sparse approximation for the differential and integral operators in , . -matrices are very robust for approximating the covariance matrix [38, 56, 2, 26, 1, 4], its inverse , and its Cholesky decomposition [38, 44, 43], but can also be expensive, especially for large n in three dimensions. Namely, the complexity in three dimensions will be , where , , is the rank and C is a large constant which scales exponentially in dimension d, see . Thus, the -matrix techniques scale exponentially in dimension size. Therefore, more efficient methods for fast and efficient matrix linear algebra operations are still needed.
The key idea is to compute a low-rank decomposition not of the covariance function (it could be hard), but of its analytically known spectral density (which could be a much easier object) and then apply the inverse Fourier to the obtained low-rank components. The Fourier transformation of the Matérn covariance function is known analytically as the Hilbert tensor. This Hilbert tensor can be decomposed numerically in a low-rank tensor format. Both the Fourier transformation and its inverse have the canonical (CP) tensor rank-1. Therefore, the inverse Fourier does not change the tensor rank of the argument. By applying the inverse Fourier to the low-rank tensor, we obtain a low-rank approximation of the initial covariance matrix, which can be further used in the Kalman filter update, Karhunen–Loève expansion, Bayesian update, and Kriging.
The structure of the paper is as follows. In Section 2, we list typical tasks from statistics that motivate us to use low-rank tensor techniques and define the Matérn covariance functions and their Fourier transformations. Section 3 is devoted to low-rank tensor decomposition. Sections 3.4, 3.5 and 3.6 contain the main theoretical contribution of this work. We present low-rank tensor techniques and separate radial basis functions using the Laplace transform and the quadrature, give estimations of the approximation error, convergence rate, and the tensor rank, and we also prove the existence of a low-rank approximation of a Matérn function. Section 4 contains another important contribution of this work, namely, the solutions to typical statistical tasks in the low-rank tensor format.
2.1 Problem Settings in Spatial Statistics
Below, we formulate five tasks. These computational tasks are very common and important in statistics. Fast and efficient solution of these tasks will help to solve many real-world problems, such as the weather prediction, moisture modeling, and optimal design in geostatistics.
Task 1: Approximate a Matérn covariance function in a low-rank tensor format.
The covariance function , , , is discretized on a tensor grid with N mesh points, , and . The task is to find the following decomposition into one-dimensional functions:
for some given . Alternatively, we may look for factors such that
Here, the matrices correspond to the one-dimensional covariance functions in the direction μ.
Task 2: Computing of square root of .
The square root of the covariance matrix is needed in order to generate random fields and processes. It is also used in the Kalman filter update.
Task 3: Kriging.
Spatial statistics and Kriging  are used to model the distribution of ore grade, forecast of rainfall intensities, moisture, temperatures, or contaminant. The missing values are interpolated from the known measurements by Kriging [46, 30]. When considering space-time Kriging on fine meshes [66, 14, 28, 9], Kriging may easily exceed the computational power of modern computers. Estimating the variance of Kriging and geostatistical optimal design problems are especially numerically intensive [48, 50, 60].
The Kriging can be defined as follows. Let be the vector of values to be estimated with zero expectation and covariance matrix . Let be the , , vector of measurements. The corresponding cross- and auto-covariance matrices are denoted by and , and sized and , respectively. If the measurements are subject to error, an error covariance matrix is included in . Using this notation, the Kriging estimate is given by .
Task 4: Geostatistical design.
The goal of geostatistical optimal design is to optimize the sampling patterns from which the data values in will be obtained. The objective function that will be minimized is typically a scalar measure of either the conditional covariance matrix or the estimation variance (4.4). The two most common measures for geostatistical optimal design are and :
Task 5: Computing the joint Gaussian log-likelihood function.
We assume that is an available vector of measurements, and is an unknown vector of the parameters of a covariance matrix . The task is to compute the maximum likelihood estimation (MLE), where the log-likelihood function is as follows:
The difficulty here is that each iteration step of a maximization procedure requires the solution of a linear system , the Cholesky decomposition, and the determinant.
In Section 4 we give detailed solutions. We give strict definition of tensors later in Section 3.
2.2 Matérn Covariance and Its Fourier Transform
A low-rank approximation of the covariance function is a key component of the tasks formulated above. Among of the many covariance models available, the Matérn family [45, 25] is widely used in spatial statistics, geostatistics , machine learning , image analysis, weather forecast, moisture modeling, and as the correlation for temperature fields . The work  introduced the Matérn form of spatial correlations into statistics as a flexible parametric class with one parameter determining the smoothness of the underlying spatial random field.
The main idea of this low-rank approximation is shown in Figure 1 and explained in details in Section 3.3. Figure 1 demonstrates two possible ways to find a low-rank tensor approximation of the Matérn covariance function. The first way (marked with “?”) is not so trivial and the second via the Fast Fourier Transform (FFT), low-rank and the inverse FFT (IFFT) is more trivial. We use here the fact that the FT of the Matérn covariance is analytically known and has a known low-rank approximation. The IFFT can be computed numerically and does not change the tensor ranks.
The Matérn covariance function is defined as
where distance , two points in , defines the smoothness of the random field, and denotes the modified Bessel function of order ν. The larger ν, the smoother the random field. The parameter is a spatial range parameter that measures how quickly the correlation of the random field decays with distance, with larger corresponding to a slower decay (keeping ν fixed). When (see ), the Matérn covariance function reduces to the exponential covariance model and describes a rough field. The value corresponds to a Gaussian covariance model, which describes a very smooth field, that is infinitely differentiable. Random fields with a Matérn covariance function are times mean square differentiable. Thus, the hyperparameter ν controls the degree of smoothness.
The d-dimensional Fourier transform of the Matérn kernel, defined in equation (2.2), in is given by 
where is a constant and is the Euclidean distance in . The following tensor approach also applies to the case of anisotropic distance, where , and A is a positive diagonal matrix.
3 Low-Rank Tensor Decompositions
In this section, we review the definitions of the CP and Tucker tensor formats. Then we provide the analytic -based proof of the existence of low-rank tensor approximations of Matérn functions. We investigate numerically the behavior of the Tucker and CP ranks across a wide range of parameters specific to the family of Matérn kernels in equation (2.3). The Tucker tensor format is used in this work for additional rank compression of the CP factors. There are no reliable algorithms to compute CP decomposition, which can be difficult to compute, but there are such algorithms for Tucker decomposition. The Tucker decomposition is only limited with respect to the available memory storage, since the term in grows exponentially with d.
3.1 General Definitions
CP and Tucker rank-structured tensor formats have been applied for the quantitative analysis of correlation in multidimensional experimental data for a long time in chemometrics and signal processing [59, 8]. The Tucker tensor format was introduced in 1966 for tensor decomposition of multidimensional arrays in chemometrics . Though the canonical representation of multivariate functions was introduced as early as in 1927 , only the Tucker tensor format provides a stable algorithm for decomposition of full-size tensors. A mathematical approval of the Tucker decomposition algorithm was presented in papers on higher-order singular value decomposition (HOSVD) and the Tucker ALS algorithm for orthogonal Tucker approximation of higher-order tensors . For higher dimensions, the so-called Matrix Product States (MPS) (see the survey paper ) or the Tensor Train (TT)  decompositions can be applied. However, for three-dimensional applications, the Tucker and CP tensor formats remain the best choices. The fast convergence of the Tucker decomposition was proved and demonstrated numerically for higher-order tensors that arise from the discretization of linear operators and functions in for a class of function-related tensors and Green’s kernels in particular, it was found that the approximation error of the Tucker decomposition decayed exponentially in the Tucker rank [33, 36].
These results inspired the canonical-to-Tucker (C2T) and Tucker-to-canonical (T2C) decompositions for function-related tensors in the case of large input ranks, as well as the multigrid Tucker approximation .
A tensor of order d in a full format is defined as a multidimensional array over a d-tuple index set:
Here, is an element of the linear space
equipped with the Euclidean scalar product , defined as
Tensors with all dimensions having equal size , , are called tensors. The required storage size scales exponentially with the dimension, , which results in the so-called “curse of dimensionality”.
To avoid exponential scaling in the dimension, the rank-structured separable representations (approximations) of the multidimensional tensors can be used. The simplest separable element is given by the rank-1 tensor
with entries which requires only numbers for storage.
The rank-1 canonical tensor is a discrete counterpart of the separable d-variate function, which can be represented as the product of univariate functions
An example of the separable d-variate function is . Then, by discretization of this multivariate function on a tensor grid in a computational box, we obtain a canonical rank-1 tensor.
A tensor in the R-term canonical format is defined by a finite sum of rank-1 tensors (Figure 2, left)
where are normalized vectors, and R is the canonical rank. The storage cost of this parametrization is bounded by dRn. An element of the tensor can be computed as
An alternative (contracted product) notation is used in computer science community:
where , and . An analogous multivariate function can be represented by a sum of univariate functions
For , there are no stable algorithms to compute the canonical rank of a tensor , that is, the minimal number R in representation (3.2), and the respective decomposition with the polynomial cost in d, i.e., the computation of the canonical decomposition is an N-P hard problem .
The Tucker tensor format (Figure 2, right) is suitable for stable numerical decompositions with a fixed truncation threshold. We say that the tensor is represented in the rank- orthogonal Tucker format with the rank parameter if
where represents a set of orthonormal vectors for , and is the Tucker core tensor. The storage cost for the Tucker tensor is bounded by , with . Using the orthogonal side matrices and contracted products, the Tucker tensor decomposition may be presented in the alternative notation
In the case , the orthogonal Tucker decomposition is equivalent to the singular value decomposition (SVD) of a rectangular matrix.
3.2 Tucker Decomposition of Full Format Tensors
We use the following algorithm to compute the Tucker decomposition of the full format tensor. The most time-consuming part of the Tucker algorithm is higher-order singular value decomposition (HOSVD), the computation of the initial guess for matrices using the SVD of the matrix unfolding , (Figure 3), of the original tensor along each mode of a tensor . Figure 3 illustrates the matrix unfolding of the full format tensor (see (3.1)) along the index set .
The second part of the algorithm is the ALS procedure. For every tensor mode, a “single-hole” tensor of reduced size is constructed by the mapping all of the modes of the original tensor except one into the subspaces . Then the subspace for the current mode is updated by SVD of the unfolding of the “single hole” tensor for this mode. This alternates over all modes of the tensor, which are updated at the current iteration of ALS. The final step of the algorithm is computation of the core tensor by using the ultimate mapping matrices from ALS.
The numerical cost of Tucker decomposition for full size tensors is dominated by the initial guess, which is estimated as when all are equal , or for our three-dimensional case. This step restricts the available size of the tensor to be decomposed since, for conventional computers, the three-dimensional case is the limiting case for SVD.
The multigrid Tucker algorithm for full size tensors allows the computational complexity to be linear in the full size of the tensor, , i.e., for three-dimensional tensors . It is computed on a sequence of diadically refined grids and is based on implementing the HOSVD only at the coarsest grid level, . The initial guess for the ALS procedure is computed at each refined level by the interpolation of the dominating Tucker subspaces obtained from the previous coarser grid. In this way, at fine three-dimensional Cartesian grids, we need 29 only storage (to represent the initial full format tensor) to contract with the Tucker side matrices, obtained by the Tucker approximation via ALS on the previous grids.
3.3 Illustration of the Low-Rank Approximation Idea
In this subsection we describe a possible ways to find a low-rank tensor approximation of the Matérn covariance matrix (Figure 1). Let be the d-dimensional Fourier transform, where is its inverse and denotes the Kronecker product. We assume that is known analytically and has a low-rank tensor approximation . Since the Fourier and inverse Fourier transformations do not change the Kronecker tensor rank of the argument , by applying the inverse Fourier, we obtain a low-rank representation of the covariance function by applying the inverse Fourier:
3.4 Sinc Approximation of the Matérn Function
The Sinc method provides a constructive approximation of the multivariate functions in the form of a low-rank canonical representation. It can be also used for the theoretical proof and for the rank estimation. Methods for the separable approximation of the three-dimensional Newton kernel and many other spherically symmetric functions that use the Gaussian sums have been developed since the initial studies in chemical  and mathematical literature [63, 6, 23, 17]. Here, we use a tensor-decomposition approach for lattice-structured interaction potentials . We also recall the grid-based method for a low-rank canonical representation of the spherically symmetric kernel function , where , , by its projection onto the set of piecewise constant basis functions; see  for the case of Newton and Yukawa kernels for .
Following the standard schemes, we introduce the uniform rectangular Cartesian grid with mesh size (we assume that n is even) in the computational domain . Let be a set of tensor-product piecewise constant basis functions
for the 3-tuple index , , with , where . The generating kernel is discretized by its projection onto the basis set in the form of a third-order tensor of size , which is defined entry-wise as
The low-rank canonical decomposition of the third-order tensor is based on applying exponentially convergent -quadratures to the integral representation of the function , , in the form
specified by the weights . Figure 4 illustrates a scheme of the proof of existence of the canonical low-rank tensor approximation. It could be easier to apply the Laplace transform to the Fourier transform of a Matérn covariance matrix than to the Matérn covariance. To approximate the resulting Laplace integral we apply the quadrature. The number of terms in the approximate sum (3.5) is the canonical tensor rank.
In particular, the -quadrature for the Laplace–Gauss transform
can be applied, where the quadrature points and weights are given by
Under the assumption , this quadrature can be proven to provide an exponential convergence rate in M (uniformly in p) for a class of functions that are analytic in a certain strip of the complex plane such that the functions decay polynomially or exponentially on the real axis. The exponential convergence of the -approximation in the number of terms (i.e., the canonical rank ) was analyzed elsewhere [63, 6, 23].
We assume that a representation similar to (3.5) exists for any fixed such that . Then we apply the -quadrature approximation (3.5) and (3.6) to obtain the separable expansion
providing an exponential convergence rate in M:
By combining (3.4) and (3.7), and taking into account the separability of the Gaussian basis functions, we arrive at the low-rank approximation of each entry of the tensor :
Recalling that , we define the vector as
Then the third-order tensor can be approximated by the R-term () canonical representation
where . Given a threshold , M can be chosen as the minimal number such that in the max-norm
The skeleton vectors can be reindex by , (), . The symmetric canonical tensor in (3.8) approximates the three-dimensional symmetric kernel function (), centered at the origin, such that ().
In some applications, the tensor can be given in the canonical tensor format, but with large rank R and discretized on large grids ; thus, computation of the initial of guess in the Tucker-ALS decomposition algorithm becomes intractable. This situation may arise when composing the tensor approximation of complicated kernel functions from simple radial functions that can be represented in the low-rank CP format.
For such cases, the canonical-to-Tucker decomposition algorithm was introduced . It is based on the minimization ALS procedure, similar to the Tucker algorithm, described in Section 3.2 for full-size tensors, but the initial guess is computed by just the SVD of the side matrices , ; see (3.3). This schema is the Reduced HOSVD (RHOSVD), which does not require unfolding of the full tensor.
Another efficient rank-structured representation of the multidimensional tensors is the mixed-tensor format , which combines either the canonical-to-Tucker decomposition with the Tucker-to-canonical decomposition, or standard Tucker decomposition with the canonical-to-Tucker decomposition, in order to produce a canonical tensor from a full-size tensor.
3.5 Laplace Transform of the Covariance Matrix
The integral representations like (3.5) can be derived by the Laplace transform either directly to the Matérn covariance function or to its spectral density (2.3). For example, in the case of the Newton kernel, , and the Laplace–Gauss transform representation takes the form
In this case, , and the sum of (3.8) reduces to , implying that . Therefore, the Laplace transform representation of the Slater function (i.e., exponential covariance) with can be written as
When the Matérn spectral density in (2.3) has an even dimension parameter , and , the Laplace transform
can be applied after substituting in
If , then the Laplace transform is
3.6 Covariance Matrix in Rank-Structured Tensor Format
In what follows, we consider the CP approximation of the radial function in the positive sector, i.e., on the domain (by symmetry, the canonical tensor can be extended to the whole computational domain .) Let the covariance function in (2.2) be represented by the rank-R symmetric CP tensor on an tensor grid, denoted by as described in the previous sections,
with the same skeleton vectors for .
We define the covariance matrix entry-wise by
Using the tensor representation (3.9), we represent the large matrix in the rank-R Kronecker (tensor) format as
where the symmetric Toeplitz matrix , , is defined by its first column, which is specified by the skeleton vectors in the decomposition (3.9).
Figure 5 illustrates eight selected canonical generating vectors from for , , on a grid of size for the Slater function , which defines the corresponding Toeplitz matrices.
A Toeplitz matrix can be multiplied by a vector in operations via complementing it to the circulant matrix. In general, the inverse of a Toeplitz matrix cannot be calculated in the closed form; unlike circulant matrices, which can be diagonalized by the Fourier transform.
In the rest of this subsection, we introduce the numerical scheme, based on certain specific properties of the skeleton matrices in the symmetric rank-R decomposition (3.10) that is capable of rank-structured calculations of the analytic matrix functions . We discuss the most interesting examples of the functions and , where .
Given an symmetric positive definite matrix such that , the matrix-valued function is given as the exponentially fast converging series
where the matrix , acting in the multidimensional index set, allows the low-rank Kronecker tensor decomposition. Then the low-rank tensor approximation of can be computed by the “add-and-compress” scheme, where each term in the series above (3.11) is summed using the rank-truncation algorithm in the corresponding format.
To limit the rank-structured evaluation of to the described framework, we propose the special rank-structured additive splitting of the covariance matrix with the easily invertible dominating part. To that end, we construct the diagonal matrix by assembling all of the diagonal sub-matrices in , (in the following, we simplify the notation by omitting the upper index ):
and modify each Toeplitz matrix by subtracting its diagonal part,
Using the matrices defined above, we introduce the rank additive splitting of , which is defined by the skeleton matrices and since is Kronecker symmetrical,
Hence, we have
Likewise, since is a scaled identity, we obtain
We assume that for in some norm; thus, we can apply the “add-and-compress” scheme described above.
We illustrate the “add-and-compress” computational scheme in the following example. We consider the covariance matrix , obtained by a rank-R approximation of the Slater function with on a grid with sampling points. Figure 6 demonstrates the decay in both the matrix norms (left), and the scaled, preconditioned matrices , (right). We use the scaling factor of . The right figure indicates that the analytic matrix functions can be evaluated by using an exponentially fast convergent power series supported by the “add-and-compress” strategy to control the tensor rank.
In the Kriging calculations (Task 3 above), the low-rank tensor structure in the covariance matrix can be directly utilized if the sampling points in the Kriging algorithm form a smaller tensor sub-grid of the initial tensor grid with . The same argument also applies to the evaluation of conditional covariance. In the general case of “non-tensor” locations of the sampling points, some mixed tensor factorizations could be applied.
3.7 Numerical Illustrations
In what follows, we check some examples of the low-rank Tucker tensor approximation of the p-Slater function , and Matérn kernels with full-grid tensor representation. We demonstrate the fast exponential convergence of the tensor approximation in the Tucker rank. The functions were sampled on the three-dimensional Cartesian grid with , .
For a continuous function , where , and , we introduce the collocation-type function-related tensor of order d:
where are grid collocation points, indexed by ,
which are the nodes of equally spaced subintervals with a mesh size of .
We test the convergence of the error in the relative Frobenius norm with respect to the Tucker rank for p-Slater functions with . The Frobenius norm is computed as
where is the tensor reconstructed from the Tucker rank-r decomposition of .
Figure 7 shows convergence of the Frobenius error with respect to the Tucker rank for the p-Slater function discretized on the three-dimensional Cartesian grid
for different values of the parameter p. These functions are illustrated in Figure 8 for and . Figure 9 shows for a Slater function with the dependence of the Tucker decomposition error in the Frobenius norm (3.13) on the Tucker rank, for the increasing grid parameter n.
Figure 10 shows the convergence with respect to the Tucker rank for the spectral density of Matérn covariance
where and . The Tucker decomposition rank is strongly dependent on the parameter α and weakly depend on the parameter ν. The three-dimensional Matérn functions with the parameters , (left) and , (right) are presented in Figure 11, showing the function at .
These numerical experiments demonstrate the good algebraic separability of the typical multidimensional functions used in spatial statistics, that lead us to apply the low-rank tensor decomposition methods to the multidimensional problems of statistical data analysis.
4 Solutions to Typical Tasks in Low-Rank Tensor Format
In this section, we walk through the solutions to the statistical questions raised in the motivation above, Section 2. We add some lemmas to summarize the new computing and storage costs. Let , measurement vector , , , and . We also introduce the restriction operator P, which consists of only ones and zeros, and pick sub-indices from the whole index set . This operator P has tensor-1 structure, i.e., . An application of this restriction tensor does not change tensor ranks.
Computing Matrix-Vector Product.
Let . If is separable, i.e., , then
If is non-separable, then low-rank tensor properties cannot be employed and either the FFT idea  or the hierarchical matrix technique should be applied instead [38, 21, 19, 22].
The computing cost of the product is reduced from to , where , .
Trace and Diagonal of .
Let . Then
The proof follows from the properties of the Kronecker tensors.
For simplicity, we assume that and .
The computing cost of and is reduced from to . The cost of is reduced from to .
A simple Matlab test for computing on a working station with 128 GB produces the computing times in seconds shown in Table 1.
In what follows, we discuss the computation of trace and diag in the case of Tucker representation of the generating Matérn function on grid and present the corresponding numerical example for . We notice that due to Toeplitz structure of the skeleton matrices of size , the diagonal of the covariance matrix is the weighted Tucker sum of the Kronecker products of scaled identity matrices in . The scaling factor is determined by the value of generating Tucker vector corresponding the origin of the computational box . Let the matrix be composed by using the Tucker tensor approximating the generating Matérn function. Then, as a straightforward consequence of the above remark, we derive the simple representations
where corresponds to the origin in and is the identity matrix in the full tensor space. The grid coordinate of is determined by the multi-index .
Table 2 represents the values of computed by the Tucker approximation to the three-dimensional Slater function on the grids with , and for different Tucker rank parameters .
|Tucker rank r|
Given the rank- Tucker tensor , the complexity for calculation of is estimated by .
Computing Square Root .
Observe that can be computed as in (3.12). An iterative method for computing is presented in .
Linear Solvers in a Low-Rank Tensor Format.
Likely, there is already a good theory for solving linear systems with symmetric and positive definite matrix , in a tensor format. We refer to the overview works [34, 35, 18, 20]. Some particular linear solvers are developed in [13, 2, 24, 36, 12], [11, 36]. We also recommend to use the Tensor Toolbox , which contains routines for CP and Tucker tensor formats.
Let . We assume that there is an iterative method that can be used to solve the linear system in a low-rank tensor format and to find the solution in the form . Then the quadratic form is the following scalar products:
The proof follows from the definition and properties of the tensor and scalar products. If is non-separable, then low-rank tensor properties cannot be employed and the FFT idea  or the hierarchical matrix technique  should be employed.
The computing cost of the quadratic form is the product of the number of required iterations and the cost of one iteration, which is (assuming that the iterative method required only matrix-vector products).
The proof follows from the definitions and properties of the tensor and scalar products.
4.2 Interpolation by Simple Kriging
The three most computationally demanding tasks in Kriging are:
solving an system of equations to obtain the Kriging weights,
obtaining the Kriging estimate by superposing the Kriging weights with the cross-covariance matrix between the measurements and the unknowns,
evaluating the estimation variance as the diagonal of an conditional covariance matrix .
Here, M refers to the number of measured data values, and N refers to the number of estimation points. When optimizing the design of the sampling patterns, the challenge is to evaluate the scalar measures of the conditional covariance matrix (see and in equation (2.1) and Task 4) repeatedly within a high-dimensional and non-linear optimization procedure (e.g., [42, 58]).
The following Kriging formula is well known :
If for some small and Lemma 4.4 holds, then
The proof follows from the definitions and the properties of the tensor and scalar products.
The computing cost of solving the linear system is . Computation of the Kriging coefficients by equation (4.3) costs .
If is non-separable, then the low-rank tensor properties cannot be employed and either the FFT idea  or the hierarchical matrix technique  should be applied.
4.3 Computing Conditional Covariance
Let be the vector of measurements. The conditional covariance matrix is
The associated estimation variance is the diagonal of the conditional covariance matrix :
Let us assume that the measurements are taken at locations that form a subset of the total set of nodes , i.e., . We also assume that the nodes belong to a tensor mesh, i.e., if and , then .
Again, we use low-rank tensor solvers, this time to solve the matrix system . We obtain the solution
Assuming that , we obtain
where is the new rank after a rank-truncation procedure and are new factors. Let
The conditional covariance is
where for and for .
4.4 Example: Separable Covariance Matrices
be the Gaussian covariance function, where and . The function can be written as a tensor product of one-dimensional functions:
After discretization of , we obtain as a rank-1 Kronecker product of the one-dimensional covariance matrices, i.e.,
We note that arbitrary discretization (anisotropy) can occur in any direction.
If d Cholesky decompositions exist, i.e., and , then
where and are also lower- and upper-triangular matrices, respectively.
Lemma 4.8 shows that
the Gaussian covariance function in dimensions can be written as the tensor sum of one-dimensional covariance functions,
its Cholesky factor can be computed via Cholesky factors computed from one-dimensional covariances.
The computational complexity drops from , , to , where n is the number of mesh points in a one-dimensional problem. Further research is required on non-Gaussian covariance functions.
Let . If the inverse matrices , , exist, then
The computational complexity drops from , , to , where n is the number of mesh points in a one-dimensional problem.
We assume here that we have an efficient method to invert (e.g., FFT or hierarchical matrices) with a cost of . If not, then the complexity cost drops from to (usual Gaussian elimination algorithm).
If , , are covariance matrices, then we can compute , where is a separable rank-1 d-dimensional covariance function, as
We check for that
and then apply mathematical induction. ∎
The computational cost drops again from , , to . A similar assumption to Remark 4.10 for computing also holds here.
Let , , and . Using MATLAB on a MacBookPro with 16 GB RAM, the time required set up the matrices ,, and is 11 seconds; it takes 4 seconds to compute , , and . The large matrices and are never constructed (i.e., the Kronecker product is never calculated).
In previous work  we used the hierarchical matrix technique to approximate and its Cholesky factor for in 2 minutes. Here, we combine the hierarchical matrix technique and the Kronecker tensor product. Assuming , we approximate for in minutes.
Let be the same as in equation (4.5), and let Lemma 4.4 hold. Then we apply the property in equation (4.6) to obtain a tensor approximation of the log-likelihood:
Equation (4.7) shows one disadvantage of the Gaussian log-likelihood function in high dimensions. Namely, the log-likelihood grows exponentially with d as .
In this work, we demonstrate that the basic functions and operators used in spatial statistics may be represented using rank-structured tensor formats and that the error of this representation exhibits the exponential decay with respect to the tensor rank. We applied the Tucker and canonical tensor decompositions to a family of Matérn-type and Slater-type functions with varying parameters and demonstrated numerically that their approximations exhibit exponentially fast convergence. A low-rank tensor approximation of the Matérn covariance function and its Fourier transform is considered. We separated the radial basis functions using the Laplace transforms to prove the existence of such low-rank approximations, and applied the quadrature method to estimate the tensor ranks and accuracy.
We also demonstrated how to compute , , the matrix-vector product, Kriging operations, and the geostatistical optimal design in a low-rank tensor format with at a linear cost. For matrix of size , and of tensor rank 1, we were able to compute the Cholesky factorization in 15 seconds. We also computed the Tucker approximation to the three-dimensional Slater function on the grid with points and Tucker ranks . Furthermore, we demonstrated how to compute for . This might be useful in machine learning. In this paper, operations such as computing the Cholesky factorization, inverse, and determinant have been implemented for rank-1 tensors (e.g., the Gaussian covariance has a tensor rank-1). These formulas could be useful for developing successive rank-1 updates in greedy algorithms. Further investigations are needed for the representation of these quantities with the ranks higher than one.
Additionally, in Section 3.7 we studies the influence of the parameters of the Matérn covariance function on the tensor ranks (Figures 7 and 9). We observed (see Figure 10) that the dependence of the parameters of the Matérn covariance function on the tensor ranks is very weak, and the ranks grew slowly. In this paper, we also highlighted that big data statistical problems can be effectively treated by using the special low-rank tensor techniques.
Funding statement: The research reported in this publication was supported by funding from King Abdullah University of Science and Technology (KAUST).
 S. Ambikasaran, J. Y. Li, P. K. Kitanidis and E. Darve, Large-scale stochastic linear inversion using hierarchical matrices, Comput. Geosci. 17 (2013), no. 6, 913–927. 10.1007/s10596-013-9364-0Search in Google Scholar
 J. Ballani and D. Kressner, Sparse inverse covariance estimation with hierarchical matrices, preprint (2015), http://sma.epfl.ch/~anchpcommon/publications/quic_ballani_kressner_2014.pdf. Search in Google Scholar
 C. Bertoglio and B. N. Khoromskij, Low-rank quadrature-based tensor approximation of the Galerkin projected Newton/Yukawa kernels, Comput. Phys. Commun. 183 (2012), no. 4, 904–912. 10.1016/j.cpc.2011.12.016Search in Google Scholar
 S. Börm and J. Garcke, Approximating gaussian processes with -matrices, Proceedings of 18th European Conference on Machine Learning—ECML 2007, Lecture Notes in Artificial Intelligence 4701, Springer, Berlin (2007), 42–53. 10.1007/978-3-540-74958-5_8Search in Google Scholar
 S. F. Boys, G. B. Cook, C. M. Reeves and I. Shavitt, Automatic fundamental calculations of molecular structure, Nature 178 (1956), 1207–1209. 10.1038/1781207a0Search in Google Scholar
 D. Braess, Nonlinear Approximation Theory, Springer Ser. Comput. Math. 7, Springer, Berlin, 1986. 10.1007/978-3-642-61609-9Search in Google Scholar
 J.-P. Chilès and P. Delfiner, Geostatistics, Wiley Ser. Probab. Stat., John Wiley & Sons, New York, 1999. 10.1002/9780470316993Search in Google Scholar
 A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, New York, 2002. 10.1002/0470845899Search in Google Scholar
 S. De Iaco, S. Maggio, M. Palma and D. Posa, Toward an automatic procedure for modeling multivariate space-time data, Comput. Geosci. 41 (2011), 10.1016/j.cageo.2011.08.008. 10.1016/j.cageo.2011.08.008Search in Google Scholar
 L. De Lathauwer, B. De Moor and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278. 10.1137/S0895479896305696Search in Google Scholar
 S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Computation of the response surface in the tensor train data format, preprint (2014), https://arxiv.org/abs/1406.2816. Search in Google Scholar
 S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the tensor train format, SIAM/ASA J. Uncertain. Quantif. 3 (2015), no. 1, 1109–1135. 10.1137/140972536Search in Google Scholar
 S. Dolgov, B. N. Khoromskij and D. Savostyanov, Superfast Fourier transform using QTT approximation, J. Fourier Anal. Appl. 18 (2012), no. 5, 915–953. 10.1007/s00041-012-9227-4Search in Google Scholar
 P. A. Finke, D. J. Brus, M. F. P. Bierkens, T. Hoogland, M. Knotters and F. De Vries, Mapping groundwater dynamics using multiple sources of exhaustive high resolution data, Geoderma 123 (2004), no. 1, 23–39. 10.1016/j.geoderma.2004.01.025Search in Google Scholar
 R. Furrer and M. G. Genton, Aggregation-cokriging for highly multivariate spatial data, Biometrika 98 (2011), no. 3, 615–631. 10.1093/biomet/asr029Search in Google Scholar
 I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Data-sparse approximation to a class of operator-valued functions, Math. Comp. 74 (2005), no. 250, 681–708. 10.1090/S0025-5718-04-01703-XSearch in Google Scholar
 I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Hierarchical tensor-product approximation to the inverse and related operators for high-dimensional elliptic problems, Computing 74 (2005), no. 2, 131–157. 10.1007/s00607-004-0086-ySearch in Google Scholar
 L. Grasedyck, D. Kressner and C. Tobler, A literature survey of low-rank tensor approximation techniques, GAMM-Mitt. 36 (2013), no. 1, 53–78. 10.1002/gamm.201310004Search in Google Scholar
 W. Hackbusch, A sparse matrix arithmetic based on -matrices. I. Introduction to -matrices, Computing 62 (1999), no. 2, 89–108. 10.1007/s006070050015Search in Google Scholar
 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Ser. Comput. Math. 42, Springer, Heidelberg, 2012. 10.1007/978-3-642-28027-6Search in Google Scholar
 W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer Ser. Comput. Math. 49, Springer, Heidelberg, 2015. 10.1007/978-3-662-47324-5Search in Google Scholar
 W. Hackbusch and B. N. Khoromskij, A sparse -matrix arithmetic. II. Application to multi-dimensional problems, Computing 64 (2000), no. 1, 21–47. 10.1007/PL00021408Search in Google Scholar
 W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. I. Separable approximation of multi-variate functions, Computing 76 (2006), no. 3–4, 177–202. 10.1007/s00607-005-0144-0Search in Google Scholar
 W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. II. HKT representation of certain operators, Computing 76 (2006), no. 3–4, 203–225. 10.1007/s00607-005-0145-zSearch in Google Scholar
 M. S. Handcock and M. L. Stein, A Bayesian analysis of Kriging, Technometrics 35 (1993), 403–410. 10.1080/00401706.1993.10485354Search in Google Scholar
 H. Harbrecht, M. Peters and M. Siebenmorgen, Efficient approximation of random fields for numerical applications, Numer. Linear Algebra Appl. 22 (2015), no. 4, 596–617. 10.1002/nla.1976Search in Google Scholar
 J. Håstad, Tensor rank is NP-complete, J. Algorithms 11 (1990), no. 4, 644–654. 10.1007/BFb0035776Search in Google Scholar
 M. R. Haylock, N. Hofstra, A. M. Klein Tank, E. J. Klok, P. D. Jones and M. New, A european daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006, J. Geophys. Res. 113 (2008), 10.1029/2008JD010201. 10.1029/2008JD010201Search in Google Scholar
 F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6 (1927), 164–189. 10.1002/sapm192761164Search in Google Scholar
 A. G. Journel and C. J. Huijbregts, Mining Geostatistics, Academic Press, New York, 1978. Search in Google Scholar
 V. Khoromskaia, Computation of the Hartree–Fock exchange by the tensor-structured methods, Comput. Methods Appl. Math. 10 (2010), no. 2, 204–218. 10.2478/cmam-2010-0012Search in Google Scholar
 V. Khoromskaia and B. N. Khoromskij, Fast tensor method for summation of long-range potentials on 3D lattices with defects, Numer. Linear Algebra Appl. 23 (2016), no. 2, 249–271. 10.1002/nla.2023Search in Google Scholar
 B. N. Khoromskij, Structured rank- decomposition of function-related tensors in , Comput. Methods Appl. Math. 6 (2006), no. 2, 194–220. 10.2478/cmam-2006-0010Search in Google Scholar
 B. N. Khoromskij, Tensors-structured numerical methods in scientific computing: Survey on recent advances, Chemometr. Intell. Laboratory Syst. 110 (2011), no. 1, 1–19. 10.1016/j.chemolab.2011.09.001Search in Google Scholar
 B. N. Khoromskij, Tensor numerical methods for multidimensional PDEs: Theoretical analysis and initial applications, CEMRACS 2013—Modelling and Simulation of Complex Systems: Stochastic and Deterministic Approaches, ESAIM Proc. Surveys 48, EDP Sci., Les Ulis (2015), 1–28. 10.1051/proc/201448001Search in Google Scholar
 B. N. Khoromskij and V. Khoromskaia, Low rank Tucker-type tensor approximation to classical potentials, Cent. Eur. J. Math. 5 (2007), no. 3, 523–550. 10.2478/s11533-007-0018-0Search in Google Scholar
 B. N. Khoromskij and V. Khoromskaia, Multigrid accelerated tensor approximation of function related multidimensional arrays, SIAM J. Sci. Comput. 31 (2009), no. 4, 3002–3026. 10.1137/080730408Search in Google Scholar
 B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Application of hierarchical matrices for computing the Karhunen–Loève expansion, Computing 84 (2009), no. 1–2, 49–67. 10.1007/s00607-008-0018-3Search in Google Scholar
 P. K. Kitanidis, Introduction to Geostatistics, Cambridge University Press, Cambridge, 1997. 10.1017/CBO9780511626166Search in Google Scholar
 T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001), no. 1, 243–255. 10.1137/S0895479800368354Search in Google Scholar
 T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500. 10.1137/07070111XSearch in Google Scholar
 J. B. Kollat, P. M. Reed and J. R. Kasprzyk, A new epsilon-dominance hierarchical bayesian optimization algorithm for large multiobjective monitoring network design problems, Adv. Water Res. 31 (2008), no. 5, 828–845. 10.1016/j.advwatres.2008.01.017Search in Google Scholar
 A. Litvinenko, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, preprint (2017), https://arxiv.org/abs/1709.08625. 10.1016/j.mex.2019.07.001Search in Google Scholar PubMed PubMed Central
 A. Litvinenko, Y. Sun, M. G. Genton and D. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, preprint (2017), https://arxiv.org/abs/1709.04419. 10.1016/j.csda.2019.02.002Search in Google Scholar
 B. Matérn, Spatial Variation, 2nd ed., Lecture Notes in Statist. 36, Springer, Berlin, 1986. 10.1007/978-1-4615-7892-5Search in Google Scholar
 G. Matheron, The Theory of Regionalized Variables and its Applications, Ecole de Mines, Fontainebleau, 1971. Search in Google Scholar
 V. Minden, A. Damle, K. L. Ho and L. Ying, Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations, Multiscale Model. Simul. 15 (2017), no. 4, 1584–1611. 10.1137/17M1116477Search in Google Scholar
 W. G. Müller, Collecting Spatial Data. Optimum Design of Experiments for Random Fields, 3rd ed., Contrib. Statist., Springer, Berlin, 2007. Search in Google Scholar
 G. R. North, J. Wang and M. G. Genton, Correlation models for temperature fields, J. Climate 24 (2011), 5850–5862. 10.1175/2011JCLI4199.1Search in Google Scholar
 W. Nowak, Measures of parameter uncertainty in geostatistical estimation and geostatistical optimal design, Math. Geosci 42 (2010), no. 2, 199–221. 10.1007/s11004-009-9245-1Search in Google Scholar
 W. Nowak and A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: Combining low-rank covariance approximations with FFT-techniques, Math. Geosci. 45 (2013), no. 4, 411–435. 10.1007/s11004-013-9453-6Search in Google Scholar
 D. Nychka, S. Bandyopadhyay, D. Hammerling, F. Lindgren and S. Sain, A multiresolution Gaussian process model for the analysis of large spatial datasets, J. Comput. Graph. Statist. 24 (2015), no. 2, 579–599. 10.1080/10618600.2014.914946Search in Google Scholar
 I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317. 10.1137/090752286Search in Google Scholar
 J. Quiñonero Candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6 (2005), 1939–1959. Search in Google Scholar
 C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adapt. Comput. Mach. Learn., MIT, Cambridge, 2006. 10.7551/mitpress/3206.001.0001Search in Google Scholar
 A. K. Saibaba, S. Ambikasaran, J. Yue Li, P. K. Kitanidis and E. F. Darve, Application of hierarchical matrices to linear inverse problems in geostatistics, Oil Gas Sci. Technol. Rev. IFP Energ. Nouv. 67 (2012), no. 5, 857–875. 10.2516/ogst/2012064Search in Google Scholar
 U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Physics 326 (2011), no. 1, 96–192. 10.1016/j.aop.2010.09.012Search in Google Scholar
 R. Shah and P. Reed, Comparative analysis of multiobjective evolutionary algorithms for random and correlated instances of multiobjective d-dimensional knapsack problems, European J. Oper. Res. 211 (2011), no. 3, 466–479. 10.1016/j.ejor.2011.01.030Search in Google Scholar
 A. K. Smilde, R. Bro and P. Geladi, Multi-Way Analysis with Applications in the Chemical Sciences, Wiley, New York, 2004. 10.1002/0470012110Search in Google Scholar
 G. Spöck and J. Pilz, Spatial sampling design and covariance-robust minimax prediction based on convex design ideas, Stoch. Environmental Res. Risk Assess. 24 (2010), 463–482. 10.1007/s00477-009-0334-ySearch in Google Scholar
 M. L. Stein, J. Chen and M. Anitescu, Difference filter preconditioning for large covariance matrices, SIAM J. Matrix Anal. Appl. 33 (2012), no. 1, 52–72. 10.1137/110834469Search in Google Scholar
 M. L. Stein, Z. Chi and L. J. Welty, Approximating likelihoods for large spatial data sets, J. R. Stat. Soc. Ser. B Stat. Methodol. 66 (2004), no. 2, 275–296. 10.1046/j.1369-7412.2003.05512.xSearch in Google Scholar
 F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer Ser. Comput. Math. 20, Springer, New York, 1993. 10.1007/978-1-4612-2706-9Search in Google Scholar
 Y. Sun and M. L. Stein, Statistically and computationally efficient estimating equations for large spatial datasets, J. Comput. Graph. Statist. 25 (2016), no. 1, 187–208. 10.1080/10618600.2014.975230Search in Google Scholar
 L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311. 10.1007/BF02289464Search in Google Scholar PubMed
 S. M. Wesson and G. G. S. Pegram, Radar rainfall image repair techniques, Hydrol. Earth Syst. Sci. 8 (2004), no. 2, 8220–8234. 10.5194/hess-8-220-2004Search in Google Scholar
© 2018 Walter de Gruyter GmbH, Berlin/Boston