## 1 Introduction

Nowadays it is very common to work with large spatial data sets [64, 15, 62, 44, 61, 52], for instance, satellite data collected over a very large area (e.g., the data collected by the National Center for Atmospheric Research, USA, https://www.earthsystemgrid.org/). Such data can also come from a computer simulation code as the solution of a certain multiparametric equation (e.g., the Weather Research and Forecasting model, https://www.mmm.ucar.edu/weather-research-and-forecasting-model), or from sensors at multiple sources. Typical operations in spatial statistics, such as evaluating the spatially averaged estimation variance, computing quadratic forms of the conditional covariance matrix, or computing the maximum of the likelihood function [62], require high computing power and long run times. Our motivation for using low-rank tensor techniques is that operations on advanced matrix formats, such as hierarchical, low-rank, and sparse matrices, are limited by their high computational costs, especially in three dimensions and for a large number of observations.

A tensor can be simply defined as a high-order matrix,
where multi-indices are used instead of indices (see Section 3 and equation (3.1)
for a rigorous definition). One way to obtain a tensor from a vector or matrix is to reshape it. A tensor can be cut into *slices* and *fibers*
[40, 41, 10].
These slices and fibers can be analyzed for linear dependences, super symmetry, or sparsity
and may allow a strong data compression.
Another difference between tensors and matrices is that a matrix (obtained, for instance,
after the discretization of a kernel) reshaped into a tensor has a *tensor rank*, which differs from its matrix rank.
In this work, we consider
two very common tensor formats: canonical (denoted as CP) and Tucker (see Section 3).

Low-rank tensor methods can be gainfully combined with other data-compression techniques in low dimensions. For example, a three-dimensional function can be approximated as a sum of tensor products of one-dimensional functions. Then the usual matrix techniques can be applied to those one-dimensional functions.
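As a minimal illustration of this separability (the grid, the test function, and all names below are our own choices), a three-dimensional Gaussian discretizes to an exact rank-1 tensor:

```python
import numpy as np

# A separable 3-D function f(x, y, z) = exp(-(x^2 + y^2 + z^2)) factors
# exactly into a product of three univariate functions, so its grid
# discretization is a rank-1 tensor: storage drops from n^3 to 3n values.
n = 64
x = np.linspace(-3.0, 3.0, n)
g = np.exp(-x**2)                      # one shared 1-D factor

full = np.exp(-(x[:, None, None]**2 + x[None, :, None]**2
                + x[None, None, :]**2))          # n^3 entries
rank1 = np.einsum('i,j,k->ijk', g, g, g)         # only 3n values stored

assert np.allclose(full, rank1)
```

Once the one-dimensional factors are available, all further work (differentiation, quadrature, transforms) can be done on vectors of length *n*.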

To be more concrete, we consider a relatively wide class of Matérn covariance functions. We demonstrate
how to approximate Matérn covariance matrices in a low-rank tensor format, then how to perform typical
Kriging and spatial statistics operations in this tensor format. Matérn covariance matrices typically
depend on three to five unknown hyper-parameters,
such as smoothness, three covariance lengths (in a three-dimensional anisotropic case), and variance.
We study the dependences of the tensor ranks and approximation errors on these parameters. Splitting the spatial
variables via low-rank techniques reduces the
computing cost for a matrix-vector product from $\mathcal{O}(n^{2d})$ to $\mathcal{O}(drn^{2})$, where *d* is the spatial dimension, *r* is the tensor rank, and
*n* is the number of mesh points along the longest edge
of the computational domain. For simplicity, we assume that the number of mesh points is the same in each direction. The advantages of the low-rank tensor approach are the following:

- (1)The storage cost is reduced from $\mathcal{O}({n}^{d})$ to $\mathcal{O}(drn)$ or, depending on the tensor format, to $\mathcal{O}(drn+{r}^{d})$, where $d>1$.
- (2)The low-rank tensor technique allows us to compute not only the matrix-vector product, but also the inverse ${\mathbf{C}}^{-1}$, the square root ${\mathbf{C}}^{\frac{1}{2}}$, the matrix exponential $\mathrm{exp}(\mathbf{C})$, $\mathrm{trace}(\mathbf{C})$, $\det(\mathbf{C})$, and the likelihood function.
- (3)The low-rank tensor approximation is a relatively new, but already well-studied technique with free software libraries available.
- (4)The approximation accuracy is fully controlled by the tensor rank. The full rank gives an exact representation.
- (5)Low-rank techniques are either faster than the Fourier transform ($\mathcal{O}(drn)$ vs. $\mathcal{O}({n}^{d}\log{n}^{d})$) or can be efficiently combined with it [51, 13].

General limitations of the tensor technique are the following:

- (a)It could be time consuming to compute a low-rank tensor decomposition.
- (b)It requires an axis-parallel mesh.
- (c)Some theoretical estimates exist only for functions depending on $|x-y|$ (although more general functions have a low-rank representation in practice).

During the last few years, there has been great interest in numerical methods for
representing and approximating large covariance matrices
[44, 54, 56, 51, 1, 2, 43].
Low-rank tensors were previously applied to accelerate Kriging and spatial design by orders of
magnitude [51].
The covariance matrix under consideration was assumed to be circulant, and its
first column had a low-rank decomposition. Therefore, the *d*-dimensional Fourier transform
could be applied to the low-rank components, drastically reducing the storage and the computing cost.

The maximum likelihood estimator was computed for parameter fitting given Gaussian observations
with a Matérn covariance matrix [47]. The presented framework for unstructured
observations in two spatial dimensions allowed for an evaluation of the log-likelihood
and its gradient at a cost that scales favorably in the number of observations *n*. In three dimensions, however, the complexity involves a large constant *C* which scales exponentially
in the dimension *d*; see [22].

The key idea is to compute a low-rank decomposition not of the covariance function (which could be hard), but of its analytically known spectral density (which could be a much easier object), and then to apply the inverse Fourier transform to the obtained low-rank components. The Fourier transform of the Matérn covariance function is known analytically and, after discretization, yields a Hilbert-type tensor. This tensor can be decomposed numerically in a low-rank tensor format. Both the Fourier transform operator and its inverse have canonical (CP) tensor rank 1. Therefore, the inverse Fourier transform does not change the tensor rank of its argument. By applying the inverse Fourier transform to the low-rank tensor, we obtain a low-rank approximation of the initial covariance matrix, which can be further used in the Kalman filter update, Karhunen–Loève expansion, Bayesian update, and Kriging.
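The rank-preservation property of the inverse Fourier transform can be checked numerically. A small sketch (with an arbitrary random rank-1 tensor of our own choosing): the multidimensional inverse FFT of a rank-1 tensor equals the rank-1 tensor of the one-dimensional inverse FFTs of its factors, and the same holds termwise for a rank-*R* sum.

```python
import numpy as np

# The d-dimensional (I)FFT factorizes over the modes, so applying it to
# a rank-1 tensor a (x) b (x) c returns the rank-1 tensor
# ifft(a) (x) ifft(b) (x) ifft(c): the CP rank is not increased.
rng = np.random.default_rng(0)
n = 32
a, b, c = rng.standard_normal((3, n))

T = np.einsum('i,j,k->ijk', a, b, c)             # rank-1 tensor
lhs = np.fft.ifftn(T)                            # 3-D inverse FFT
rhs = np.einsum('i,j,k->ijk',
                np.fft.ifft(a), np.fft.ifft(b), np.fft.ifft(c))

assert np.allclose(lhs, rhs)
```

In practice one therefore applies one-dimensional IFFTs to the skeleton vectors of the spectral density only, never to the full *d*-dimensional array.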

The structure of the paper is as follows.
In Section 2, we list typical tasks from statistics that motivate us to use
low-rank tensor techniques, and we
define the Matérn covariance functions and their Fourier transforms.
Section 3 is devoted to low-rank tensor decompositions.
Sections 3.4, 3.5 and 3.6
contain the main theoretical
contribution of this work: we present low-rank tensor techniques and separate the radial basis
functions using the Laplace transform and
the sinc quadrature. In Section 4, we solve the tasks of Section 2 in a low-rank tensor format, and Section 5 concludes.

## 2 Motivation

### 2.1 Problem Settings in Spatial Statistics

Below, we formulate *five tasks*. These computational tasks are very common and important in statistics.
A fast and efficient solution of these tasks helps to solve many real-world problems, such as weather
prediction, moisture modeling, and optimal design in geostatistics.

### Task 1: Approximate a Matérn covariance function in a low-rank tensor format.

The covariance function *N* mesh points,

for some given

Here, the matrices

### Task 2: Computing the square root of $\mathbf{C}$ .

The square root

### Task 3: Kriging.

Spatial statistics and Kriging [39] are used to model the distribution of ore grades and to forecast rainfall intensities, moisture, temperatures, or contaminant concentrations. The missing values are interpolated from the known measurements by Kriging [46, 30]. When considering space-time Kriging on fine meshes [66, 14, 28, 9], Kriging may easily exceed the computational power of modern computers. Estimating the Kriging variance and solving geostatistical optimal design problems are especially numerically intensive [48, 50, 60].
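As a toy illustration of the operations involved (the covariance model, locations, and data below are invented for the example), a simple-Kriging predictor interpolates measurements `y` at locations `xm` onto a prediction grid `xp`:

```python
import numpy as np

# Simple-Kriging sketch: the predictor at new locations is
# C_pm @ C_mm^{-1} y, with C built here from an exponential covariance.
def cov(a, b, ell=0.3):
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

rng = np.random.default_rng(5)
xm = np.sort(rng.uniform(0.0, 1.0, 15))       # measurement locations
y = np.sin(2.0 * np.pi * xm)                  # observed values
xp = np.linspace(0.0, 1.0, 200)               # prediction grid

weights = np.linalg.solve(cov(xm, xm), y)     # Kriging weights
yp = cov(xp, xm) @ weights                    # Kriging estimate

# the interpolant honours the data at the measurement points
assert np.allclose(cov(xm, xm) @ weights, y)
```

For large measurement sets the linear solve and the cross-covariance product above are exactly the bottlenecks that the tensor techniques of Section 4 address.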

Kriging can be defined as follows.
Let

### Task 4: Geostatistical design.

The goal of geostatistical optimal design is to
optimize the sampling patterns
from which the data values in

where

### Task 5: Computing the joint Gaussian log-likelihood function.

We assume
that

The difficulty here is that each iteration step of a maximization procedure requires the
solution of a linear system

In Section 4, we give detailed solutions to these tasks. A rigorous definition of tensors follows in Section 3.

### 2.2 Matérn Covariance and Its Fourier Transform

A low-rank approximation of the covariance function is a key component of the tasks formulated above. Among the many covariance models available, the Matérn family [45, 25] is widely used in spatial statistics, geostatistics [7], machine learning [4], image analysis, weather forecasting, moisture modeling, and as the correlation for temperature fields [49]. The work [25] introduced the Matérn form of spatial correlations into statistics as a flexible parametric class with one parameter determining the smoothness of the underlying spatial random field.

The main idea of this low-rank approximation is shown in Figure 1 and explained in detail in Section 3.3. Figure 1 demonstrates two possible ways to find a low-rank tensor approximation of the Matérn covariance function. The first way (marked with “?”) is not trivial, whereas the second, via the fast Fourier transform (FFT), a low-rank decomposition, and the inverse FFT (IFFT), is more straightforward. We use here the fact that the Fourier transform of the Matérn covariance is known analytically and has a known low-rank approximation. The IFFT can be computed numerically and does not change the tensor ranks.

The Matérn covariance function is defined as

$$C_{\nu}(r)=\sigma^{2}\frac{2^{1-\nu}}{\Gamma(\nu)}\Bigl(\sqrt{2\nu}\frac{r}{\ell}\Bigr)^{\nu}K_{\nu}\Bigl(\sqrt{2\nu}\frac{r}{\ell}\Bigr),$$

where the distance $r:=\lVert x-y\rVert$, $\sigma^{2}$ is the variance, $\ell$ is the covariance length, $\nu>0$ is the smoothness parameter, and $K_{\nu}$ is the modified Bessel function of the second kind.

The *d*-dimensional Fourier transform of the Matérn covariance function is its spectral density, which, up to a constant factor, has the form $(\alpha^{2}+\boldsymbol{\xi}^{\top}A\boldsymbol{\xi})^{-\nu-d/2}$,

where *A* is a positive diagonal matrix accounting for the anisotropy.

## 3 Low-Rank Tensor Decompositions

In this section, we review the definitions of the CP and Tucker tensor formats.
Then we provide the analytic tools used to construct such decompositions in dimension *d*.

### 3.1 General Definitions

CP and Tucker rank-structured tensor formats have been
applied for the quantitative analysis of correlation in multidimensional experimental data
for a long time in
chemometrics and signal processing [59, 8].
The Tucker tensor format was introduced in 1966 for tensor decomposition of
multidimensional arrays in chemometrics [65].
Though the canonical representation of multivariate functions was
introduced as early as 1927 [29], only the Tucker tensor format provides a
stable algorithm for the decomposition of full-size tensors. A mathematical justification
of the Tucker decomposition algorithm was presented in papers on
higher-order singular value decomposition (HOSVD) and the Tucker
ALS algorithm for orthogonal Tucker approximation of higher-order tensors [10].
For higher dimensions, the so-called Matrix Product States (MPS)
(see the survey paper [57])
or the Tensor Train (TT) [53] decompositions can be applied.
However, for three-dimensional applications, the Tucker and CP
tensor formats remain the best choices.
The fast convergence of the Tucker decomposition was proved and demonstrated numerically for
higher-order tensors that
arise from the discretization of linear operators and functions in $\mathbb{R}^{d}$.
These results inspired the canonical-to-Tucker (C2T) and Tucker-to-canonical (T2C) decompositions for function-related tensors in the case of large input ranks, as well as the multigrid Tucker approximation [37].

A tensor of order *d* in a full format is defined as a multidimensional array
over a *d*-tuple index set:

$$\mathbf{A}=[a_{i_{1},\dots,i_{d}}]\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}},\qquad i_{\ell}\in\{1,\dots,n_{\ell}\},\ \ell=1,\dots,d. \qquad (3.1)$$

Here, the tensor space $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ is
equipped with the Euclidean scalar product

$$\langle\mathbf{A},\mathbf{B}\rangle:=\sum_{i_{1},\dots,i_{d}}a_{i_{1},\dots,i_{d}}\,b_{i_{1},\dots,i_{d}}.$$

Tensors with all dimensions having equal size, $n_{\ell}=n$ for $\ell=1,\dots,d$, require $n^{d}$ storage units, so the storage grows exponentially in the dimension *d*.

To avoid exponential scaling in the dimension, rank-structured separable representations (approximations) of multidimensional tensors can be used. The simplest separable element is given by the rank-1 tensor

$$\mathbf{U}=\mathbf{u}^{(1)}\otimes\cdots\otimes\mathbf{u}^{(d)},\qquad\mathbf{u}^{(\ell)}\in\mathbb{R}^{n_{\ell}},$$

with entries $u_{i_{1},\dots,i_{d}}=u^{(1)}_{i_{1}}\cdots u^{(d)}_{i_{d}}$, which requires only $n_{1}+\cdots+n_{d}$ storage units.

The rank-1 canonical tensor is a discrete counterpart of a separable *d*-variate function,
which can be represented as the product of univariate functions, $f(x_{1},\dots,x_{d})=f_{1}(x_{1})\cdots f_{d}(x_{d})$.

An example of a separable *d*-variate function is $f(x_{1},\dots,x_{d})=e^{x_{1}+\cdots+x_{d}}=e^{x_{1}}\cdots e^{x_{d}}$.

A tensor in the *R*-term canonical format is defined by a finite sum of rank-1 tensors
(Figure 2, left)

$$\mathbf{A}\approx\sum_{k=1}^{R}c_{k}\,\mathbf{u}^{(1)}_{k}\otimes\cdots\otimes\mathbf{u}^{(d)}_{k},\qquad c_{k}\in\mathbb{R}, \qquad (3.2)$$

where the $\mathbf{u}^{(\ell)}_{k}\in\mathbb{R}^{n_{\ell}}$ are normalized vectors and *R* is the canonical rank. The storage cost of this
parametrization is bounded by *dRn*.
An element of $\mathbf{A}$ is then computed as $a_{i_{1},\dots,i_{d}}\approx\sum_{k=1}^{R}c_{k}\,u^{(1)}_{k,i_{1}}\cdots u^{(d)}_{k,i_{d}}$.

An alternative (contracted product) notation is used in the computer science community:

where

For $d\geq 3$, there are no stable algorithms for computing the minimal rank *R* in representation (3.2) and the respective
decomposition at a cost polynomial in *d*; i.e., the computation of the
canonical decomposition is an NP-hard problem [27].
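Nevertheless, once CP factors are given, the format is cheap to store and to evaluate. A small sketch (sizes, factors, and all names are our own choices) contrasting the *dRn* storage with the full $n^d$ array:

```python
import numpy as np

# A rank-R canonical (CP) tensor in d = 3 dimensions is stored via three
# factor matrices U1, U2, U3 of size n x R (cost dRn = 600 numbers here)
# instead of the n^3 = 125000 full array; any entry is a sum of R
# separable products, evaluated in O(R) operations.
rng = np.random.default_rng(1)
n, R = 50, 4
U1, U2, U3 = (rng.standard_normal((n, R)) for _ in range(3))

full = np.einsum('ir,jr,kr->ijk', U1, U2, U3)   # assembled for comparison
i, j, k = 7, 21, 40
entry = np.sum(U1[i] * U2[j] * U3[k])           # O(R) per entry

assert np.isclose(entry, full[i, j, k])
```

The NP-hardness statement concerns *finding* such factors for a given full tensor, not *using* them.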

The Tucker tensor format (Figure 2, right) is suitable for stable numerical
decompositions with a fixed
truncation threshold.
We say that the tensor $\mathbf{A}$ is represented in the rank-$(r_{1},\dots,r_{d})$ Tucker format if

$$\mathbf{A}=\sum_{k_{1}=1}^{r_{1}}\cdots\sum_{k_{d}=1}^{r_{d}}b_{k_{1},\dots,k_{d}}\,\mathbf{u}^{(1)}_{k_{1}}\otimes\cdots\otimes\mathbf{u}^{(d)}_{k_{d}},$$

where $\mathbf{B}=[b_{k_{1},\dots,k_{d}}]\in\mathbb{R}^{r_{1}\times\cdots\times r_{d}}$ is the core tensor and the vectors $\mathbf{u}^{(\ell)}_{k_{\ell}}\in\mathbb{R}^{n_{\ell}}$ are orthonormal.

In the case $r_{\ell}=r$ and $n_{\ell}=n$ for all $\ell$, the storage cost is $drn+r^{d}$.

### 3.2 Tucker Decomposition of Full Format Tensors

We use the following algorithm to compute the Tucker decomposition of the full format tensor.
The most time-consuming
part of the Tucker algorithm is higher-order singular value decomposition (HOSVD), the computation of
the initial guess for matrices

The second part of the algorithm
is the ALS procedure. For every tensor mode, a “single-hole” tensor of reduced size
is constructed by mapping all of the modes
of the original tensor except one into the subspaces spanned by the current Tucker factors.

The numerical cost of the Tucker decomposition for full-size tensors is dominated by
the initial guess (HOSVD), which is estimated as $\mathcal{O}(n^{d+1})$.

The multigrid Tucker algorithm for full-size tensors allows the computational complexity to be
linear in the full size of the tensor, i.e., $\mathcal{O}(n^{d})$ [37].
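A compact sketch of the HOSVD step described above (our own implementation and test tensor; the ALS refinement is omitted): truncated SVDs of the matrix unfoldings give the factor matrices, and contracting them against the tensor yields the core.

```python
import numpy as np

# HOSVD: for each mode, unfold the tensor into a matrix, take the
# leading left singular vectors, then contract to obtain the core.
def hosvd(T, ranks):
    Us = []
    for mode, r in enumerate(ranks):
        # unfold: move `mode` to the front, flatten the other modes
        M = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        Us.append(U[:, :r])
    core = np.einsum('ijk,ia,jb,kc->abc', T, *Us)
    return core, Us

# smooth (hence well-compressible) Hilbert-type test tensor
x = np.linspace(0.1, 1.0, 40)
T = 1.0 / (x[:, None, None] + x[None, :, None] + x[None, None, :])

core, (U1, U2, U3) = hosvd(T, (8, 8, 8))
T8 = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)
rel_err = np.linalg.norm(T - T8) / np.linalg.norm(T)
assert rel_err < 1e-4        # error decays exponentially in the rank
```

The observed error drop with growing rank mirrors the exponential convergence discussed throughout this section.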

### 3.3 Illustration of the Low-Rank Approximation Idea

In this subsection, we describe a possible way to find a low-rank tensor approximation of the
Matérn covariance matrix via its *d*-dimensional Fourier transform: a low-rank decomposition of the analytically known spectral density is computed first, and the inverse FFT is then applied to its low-rank components.

### 3.4 Sinc Approximation of the Matérn Function

The Sinc method provides a constructive approximation
of the multivariate functions in the form of a low-rank canonical representation.
It can also be used for theoretical proofs and for rank estimation.
Methods for the separable approximation of the three-dimensional Newton kernel and many other spherically
symmetric functions that use the Gaussian sums have been developed
since the initial studies in chemical [5] and mathematical
literature [63, 6, 23, 17].
Here, we use a tensor-decomposition approach for lattice-structured interaction
potentials [32].
We also recall the grid-based method for a low-rank canonical
representation of the spherically symmetric kernel function
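A sketch of such a Gaussian-sum approximation (the quadrature parameters `h` and `M` below are our own, untuned choices): the trapezoidal rule applied to an exponentially transformed Laplace-type integral yields a sum of Gaussians, each of which is separable in the spatial coordinates.

```python
import numpy as np

# Sinc-quadrature sketch: the integral
#   1/rho = (2/sqrt(pi)) * int_0^inf exp(-rho^2 t^2) dt
# becomes, under t = exp(s), a doubly infinite integral whose
# trapezoidal sum is a Gaussian sum
#   1/rho ~= sum_k w_k exp(-t_k^2 rho^2),
# i.e. a separable (CP) approximation of the radial kernel 1/|x|.
h, M = 0.2, 100
s = h * np.arange(-M, M + 1)
t = np.exp(s)                          # quadrature points
w = (2.0 / np.sqrt(np.pi)) * h * t     # quadrature weights

rho = np.array([0.5, 1.0, 2.0, 5.0])
approx = (w[:, None] * np.exp(-np.outer(t**2, rho**2))).sum(axis=0)
assert np.max(np.abs(approx - 1.0 / rho)) < 1e-6

# separability in 3-D: exp(-t_k^2 |x|^2) = prod_i exp(-t_k^2 x_i^2)
x = np.array([0.3, -1.2, 0.7])
val = np.sum(w * np.exp(-t**2 * np.dot(x, x)))
assert np.isclose(val, 1.0 / np.linalg.norm(x), atol=1e-6)
```

Each Gaussian term factorizes over the coordinates, so the quadrature directly produces the skeleton vectors of a canonical decomposition.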

Following the standard schemes, we introduce
a uniform $n\times n\times n$ tensor grid (where *n* is even)
in the computational domain

for the 3-tuple index

The low-rank canonical decomposition of the third-order tensor

specified by the weights

In particular, the

can be applied, where the quadrature points

Under the assumption *M* (uniformly in *p*)
for a class of functions

We assume that a representation similar to (3.5) exists
for any fixed

providing an exponential convergence rate in *M*:

By combining (3.4) and (3.7), and taking into account the
separability of the Gaussian basis functions, we arrive at the low-rank
approximation of each entry of the tensor

Recalling that

Then the third-order tensor admits an *R*-term canonical representation,

where *M* can be chosen as the minimal number
such that in the max-norm

The skeleton vectors can be re-indexed by

In some applications, the tensor can be given in the canonical tensor format, but with large
rank *R* and discretized on large grids

For such cases, the canonical-to-Tucker decomposition algorithm was introduced [37].
It is based on an ALS minimization procedure, similar to the Tucker algorithm
described in Section 3.2 for full-size tensors, but the
initial guess is computed by just the
SVD of the side matrices

Another efficient rank-structured representation of the multidimensional tensors is the mixed-tensor format [31], which combines either the canonical-to-Tucker decomposition with the Tucker-to-canonical decomposition, or standard Tucker decomposition with the canonical-to-Tucker decomposition, in order to produce a canonical tensor from a full-size tensor.

### 3.5 Laplace Transform of the Covariance Matrix

Integral representations like (3.5) can be derived by applying the Laplace
transform either directly to the Matérn covariance function or to its spectral density
(2.3).
For example, in the case of the Newton kernel $\frac{1}{\rho}$, $\rho=\lVert x\rVert$, one uses the Laplace–Gauss integral representation

$$\frac{1}{\rho}=\frac{2}{\sqrt{\pi}}\int_{0}^{\infty}e^{-\rho^{2}t^{2}}\,\mathrm{d}t.$$

In this case, a sinc quadrature applied to this integral yields a sum of Gaussians, each of which is separable.

When the Matérn spectral density in (2.3) has an
even dimension parameter

can be applied after substituting

If

### 3.6 Covariance Matrix in Rank-Structured Tensor Format

In what follows, we consider the CP approximation of the radial function by a rank-*R* symmetric CP tensor on an $n\times n\times n$ grid
with the same skeleton vectors

We define the covariance matrix

Using the tensor representation (3.9), we represent the large covariance matrix in the rank-*R* Kronecker (tensor) format as

where the symmetric Toeplitz matrix

Figure 5 illustrates eight selected canonical generating vectors from

A Toeplitz matrix can be multiplied by a vector in $\mathcal{O}(n\log n)$ operations by embedding it into a circulant matrix and applying the FFT.
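A minimal sketch of this fast Toeplitz product (our own implementation of the standard circulant embedding; the dense matrix is formed only to verify the result):

```python
import numpy as np

# Fast Toeplitz matrix-vector product: embed the n x n symmetric
# Toeplitz matrix (given by its first column c) into a 2n circulant,
# which the FFT diagonalizes, giving O(n log n) instead of O(n^2).
def toeplitz_matvec(c, x):
    n = len(c)
    # first column of the circulant embedding: [c, 0, reversed tail]
    col = np.concatenate([c, [0.0], c[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x, len(col)))
    return y[:n].real

n = 256
c = 1.0 / (1.0 + np.arange(n))         # e.g. samples of a covariance row
x = np.random.default_rng(2).standard_normal(n)

T = np.array([[c[abs(i - j)] for j in range(n)] for i in range(n)])
assert np.allclose(toeplitz_matvec(c, x), T @ x)
```

Combined with the Kronecker structure above, this gives a fast matrix-vector product for each Kronecker factor separately.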

In the rest of this subsection, we introduce the numerical scheme, based on certain
specific properties of the skeleton matrices in the symmetric rank-*R* decomposition
(3.10) that is capable of
rank-structured calculations
of the analytic matrix functions

Given a symmetric positive definite matrix such that

where the matrix

To limit the rank-structured evaluation of

and modify each Toeplitz matrix

Using the matrices defined above, we introduce the rank

Hence, we have

Likewise, since

We assume that

We illustrate the “add-and-compress” computational scheme in the following example.
We consider the covariance matrix *R*

In the Kriging calculations (Task 3 above), the low-rank tensor
structure in the covariance matrix

### 3.7 Numerical Illustrations

In what follows, we present some examples of the low-rank Tucker tensor approximation of the *p*-Slater function
For a continuous function *d*:

where

which are the nodes of
equally spaced subintervals with a mesh size of

We test the convergence of the
error in the relative Frobenius norm with respect to the Tucker rank for *p*-Slater
functions with

where *r* decomposition of

Figure 7 shows convergence of the Frobenius error with respect to the
Tucker rank for the *p*-Slater function discretized on the three-dimensional Cartesian grid

for different values of the parameter *p*. These
functions are illustrated in Figure 8 for
*n*.

Figure 10 shows the convergence with respect to the Tucker rank for the spectral density of Matérn covariance

where

These numerical experiments demonstrate the good algebraic separability of typical multidimensional functions used in spatial statistics, which leads us to apply low-rank tensor decomposition methods to multidimensional problems of statistical data analysis.

## 4 Solutions to Typical Tasks in Low-Rank Tensor Format

In this section, we walk through the solutions to the statistical
questions raised in the motivation above, Section 2.
We add some lemmas to summarize the new computing and storage costs.
Let **P** be a sampling matrix, which consists of only ones and zeros
and picks sub-indices. We assume that **P** has a rank-1 tensor structure,
i.e.,

### Computing Matrix-Vector Product.

Let

If

*The computing cost of the product *

### Trace and Diagonal of $\mathbf{C}$ .

Let

The proof follows from the properties of the Kronecker tensors.
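A small numerical check of these Kronecker identities (random factors of our own choosing; the full matrix is formed only for verification):

```python
import numpy as np

# For a Kronecker (tensor) product, trace and diagonal factorize:
#   trace(A (x) B) = trace(A) * trace(B),
#   diag(A (x) B)  = diag(A) (x) diag(B),
# so both cost O(dn) per Kronecker term instead of touching n^d entries.
rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 30, 30))
K = np.kron(A, B)                      # 900 x 900, verification only

assert np.isclose(np.trace(K), np.trace(A) * np.trace(B))
assert np.allclose(np.diag(K), np.kron(np.diag(A), np.diag(B)))
```

For a rank-*R* sum of Kronecker products, the identities are applied termwise.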

*The cost of computing the right-hand sides in (4.1)–(4.2)
is rdn, where *

For simplicity,
we assume that

*The computing cost of *

A simple Matlab test for computing the trace of $\mathbf{C}$ gave the following timings.

Computing time (in seconds) to set up $\mathbf{C}$ and compute its trace:

| | $n=100$ | $n=500$ | $n=1000$ |
| --- | --- | --- | --- |
| $d=1000$ | 3.7 | 67 | 491 |

In what follows, we discuss the computation of trace and diag in the case of Tucker representation of the
generating Matérn function on

where

Table 2 presents the error of the Tucker approximation to this value for various Tucker ranks *r* and grid sizes *n*.

The error of the Tucker approximation versus the Tucker rank *r*:

| $n$ | $r=1$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 129 | 0.386 | 0.20 | 0.12 | 0.07 | 0.04 | 0.017 | 0.002 | 1.2e | 8.4e | 7.5e |
| 257 | 0.386 | 0.20 | 0.12 | 0.073 | 0.046 | 0.029 | 0.017 | 0.007 | 8.0e | 1.4e |
| 513 | 0.386 | 0.20 | 0.12 | 0.073 | 0.047 | 0.031 | 0.020 | 0.0138 | 0.008 | 0.0035 |

Given the rank-

### Computing Square Root ${\mathbf{C}}^{\frac{1}{2}}$ .

Observe that
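For a covariance matrix with Kronecker structure, the square root inherits that structure, since $(\mathbf{C}_1^{1/2}\otimes\mathbf{C}_2^{1/2})^2=\mathbf{C}_1\otimes\mathbf{C}_2$. A sketch with hypothetical one-dimensional covariance factors of our own choosing:

```python
import numpy as np

# If C = C1 (x) C2 with symmetric positive definite factors, then
# C^(1/2) = C1^(1/2) (x) C2^(1/2): the square root is computed from the
# small factors only.
def sqrtm_sym(M):
    lam, Q = np.linalg.eigh(M)         # symmetric eigendecomposition
    return (Q * np.sqrt(lam)) @ Q.T    # Q diag(sqrt(lam)) Q^T

x = np.linspace(0.0, 1.0, 20)
C1 = np.exp(-np.abs(x[:, None] - x[None, :]))          # exponential kernel
C2 = np.exp(-((x[:, None] - x[None, :]) ** 2)) \
     + 1e-10 * np.eye(20)                               # SPD jitter guard

S = np.kron(sqrtm_sym(C1), sqrtm_sym(C2))
assert np.allclose(S @ S, np.kron(C1, C2), atol=1e-8)
```

Only the small eigendecompositions are ever computed; the Kronecker product `S` above is assembled solely to verify the identity.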

### Linear Solvers in a Low-Rank Tensor Format.

Fortunately, there is already a well-developed theory for solving
linear systems

### 4.1 Computing ${\mathbf{z}}^{T}{\mathbf{C}}^{-1}\mathbf{z}$

*Let *

The proof follows from the definition and properties of the tensor and scalar products.
If

*The computing cost of the quadratic form *

The proof follows from the definitions and properties of the tensor and scalar products.
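When $\mathbf{C}=\mathbf{C}_1\otimes\mathbf{C}_2$, the quadratic form can be evaluated without ever forming $\mathbf{C}$; a sketch under this Kronecker assumption (sizes and factors are our own choices; the full matrix appears only for verification):

```python
import numpy as np

# For C = C1 (x) C2 and z = vec(Z) (row-major), the identity
# (C1 (x) C2) vec(Z) = vec(C1 Z C2^T) lets us evaluate z^T C^{-1} z by
# solving only with the small n x n factors.
rng = np.random.default_rng(4)
n = 25
M1, M2 = rng.standard_normal((2, n, n))
C1 = M1 @ M1.T + n * np.eye(n)         # SPD factors
C2 = M2 @ M2.T + n * np.eye(n)
Z = rng.standard_normal((n, n))
z = Z.reshape(-1)                      # row-major vec

# solve (C1 (x) C2) w = z  via  W = C1^{-1} Z C2^{-1}
W = np.linalg.solve(C2, np.linalg.solve(C1, Z).T).T
quad = z @ W.reshape(-1)

C = np.kron(C1, C2)                    # n^2 x n^2, verification only
assert np.isclose(quad, z @ np.linalg.solve(C, z))
```

The two small solves cost $\mathcal{O}(n^{3})$ each, against $\mathcal{O}(n^{6})$ for the assembled system.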

### 4.2 Interpolation by Simple Kriging

The three most computationally demanding tasks in Kriging are:

- (1)solving an $M\times M$ system of equations to obtain the Kriging weights,
- (2)obtaining the $N\times 1$ Kriging estimate by superposing the Kriging weights with the $N\times M$ cross-covariance matrix between the measurements and the unknowns,
- (3)evaluating the $N\times 1$ estimation variance as the diagonal of an $N\times N$ conditional covariance matrix [51].

Here, *M* refers
to the number of measured data values, and *N* refers to the number of estimation
points.
When optimizing the design of the sampling patterns,
the challenge is to evaluate the scalar measures of the

The following Kriging formula is well known [51]:

*If
*

The proof follows from the definitions and the properties of the tensor and scalar products.

*The computing cost of solving the linear system *

If

### 4.3 Computing Conditional Covariance

Let

The associated estimation variance

Let us assume that
the measurements are taken at locations that form a subset of the total
set of nodes

Let

Again, we use
low-rank tensor solvers, this time to solve the matrix system

Assuming that

where

The conditional covariance is

where

### 4.4 Example: Separable Covariance Matrices

Let

be the Gaussian covariance function,
where

After discretization of

We note that arbitrary discretization (anisotropy) can occur in any direction.

*If d Cholesky decompositions exist, i.e., *

*where *

Lemma 4.8 shows that

- (a)the Gaussian covariance function in $d>1$ dimensions can be written as the tensor product of one-dimensional covariance functions,
- (b)its Cholesky factor can be computed via Cholesky factors computed from one-dimensional covariances.

The computational complexity drops
from $\mathcal{O}(n^{3d})$ to $\mathcal{O}(dn^{3})$, where *n*
is the number of mesh points in a one-dimensional problem. Further research is required
on non-Gaussian covariance functions.
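A numerical check of the Kronecker–Cholesky identity underlying Lemma 4.8 (the one-dimensional factors below are our own exponential-kernel examples):

```python
import numpy as np

# Cholesky of a Kronecker product factorizes: if C1 = L1 L1^T and
# C2 = L2 L2^T, then (L1 (x) L2)(L1 (x) L2)^T = C1 (x) C2, so d small
# n x n factorizations (O(dn^3)) replace one factorization of size n^d.
x = np.linspace(0.0, 1.0, 30)
C1 = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)   # 1-D covariance
C2 = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)   # factors

L = np.kron(np.linalg.cholesky(C1), np.linalg.cholesky(C2))
assert np.allclose(L @ L.T, np.kron(C1, C2))
assert np.allclose(L, np.tril(L))   # Kronecker of lower-triangular is lower-triangular
```

The Kronecker product of the small Cholesky factors is itself lower triangular with positive diagonal, hence it is *the* Cholesky factor of the big matrix.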

*Let *

The computational complexity drops from $\mathcal{O}(n^{3d})$ to $\mathcal{O}(dn^{3})$, where *n* is the number of mesh points in a one-dimensional problem.

We assume here that we have an efficient method to invert

*If *

*d*-dimensional covariance function, as

We check the base case and then apply mathematical induction. ∎

The computational cost
drops again from $\mathcal{O}(n^{3d})$ to $\mathcal{O}(dn^{3})$.

Let

In previous work [43] we used the hierarchical matrix technique to
approximate

*Let *

Equation (4.7) shows one disadvantage of the Gaussian log-likelihood function in high dimensions:
the log-likelihood grows exponentially with *d*.
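For completeness, the log-determinant term of the log-likelihood also factorizes under a Kronecker assumption $\mathbf{C}=\mathbf{C}_1\otimes\mathbf{C}_2$ (a sketch with invented one-dimensional factors):

```python
import numpy as np

# The log-determinant in the Gaussian log-likelihood factorizes for
# C = C1 (x) C2 with sizes n1, n2:
#   log det C = n2 * log det C1 + n1 * log det C2,
# avoiding any operation on the (n1*n2)-sized matrix.
x1, x2 = np.linspace(0, 1, 12), np.linspace(0, 1, 17)
C1 = np.exp(-np.abs(x1[:, None] - x1[None, :]))
C2 = np.exp(-np.abs(x2[:, None] - x2[None, :]))

s1 = np.linalg.slogdet(C1)[1]
s2 = np.linalg.slogdet(C2)[1]
logdet = len(x2) * s1 + len(x1) * s2

assert np.isclose(logdet, np.linalg.slogdet(np.kron(C1, C2))[1])
```

Together with the quadratic-form evaluation of Section 4.1, this yields the whole log-likelihood from the small factors only.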

## 5 Conclusion

In this work, we demonstrated that the basic functions and operators used in spatial statistics
may be represented in rank-structured tensor formats and that the error of this
representation exhibits exponential decay with respect to the tensor rank.
We applied the Tucker and canonical tensor decompositions to a family of
Matérn-type and Slater-type functions with varying parameters and demonstrated numerically
that their approximations exhibit exponentially fast convergence.
A low-rank tensor approximation of the Matérn covariance function and its Fourier
transform was considered. We separated the radial basis functions using the Laplace transform
to prove the existence of such low-rank approximations, and applied the sinc quadrature to construct them.

We also demonstrated how to compute

Additionally, in Section 3.7, we studied the influence of the parameters of the Matérn covariance function on the tensor ranks (Figures 7 and 9). We observed (see Figure 10) that the dependence of the tensor ranks on the parameters of the Matérn covariance function is very weak, and the ranks grow slowly. In this paper, we also highlighted that big-data statistical problems can be treated effectively by using special low-rank tensor techniques.

## References

- [1] S. Ambikasaran, J. Y. Li, P. K. Kitanidis and E. Darve, Large-scale stochastic linear inversion using hierarchical matrices, Comput. Geosci. 17 (2013), no. 6, 913–927.

- [2] J. Ballani and D. Kressner, Sparse inverse covariance estimation with hierarchical matrices, preprint (2015), http://sma.epfl.ch/~anchpcommon/publications/quic_ballani_kressner_2014.pdf.

- [3] C. Bertoglio and B. N. Khoromskij, Low-rank quadrature-based tensor approximation of the Galerkin projected Newton/Yukawa kernels, Comput. Phys. Commun. 183 (2012), no. 4, 904–912.

- [4] S. Börm and J. Garcke, Approximating Gaussian processes with $H^{2}$-matrices, Proceedings of 18th European Conference on Machine Learning—ECML 2007, Lecture Notes in Artificial Intelligence 4701, Springer, Berlin (2007), 42–53.

- [5] S. F. Boys, G. B. Cook, C. M. Reeves and I. Shavitt, Automatic fundamental calculations of molecular structure, Nature 178 (1956), 1207–1209.

- [6] D. Braess, Nonlinear Approximation Theory, Springer Ser. Comput. Math. 7, Springer, Berlin, 1986.

- [7] J.-P. Chilès and P. Delfiner, Geostatistics, Wiley Ser. Probab. Stat., John Wiley & Sons, New York, 1999.

- [8] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, New York, 2002.

- [9] S. De Iaco, S. Maggio, M. Palma and D. Posa, Toward an automatic procedure for modeling multivariate space-time data, Comput. Geosci. 41 (2011), 10.1016/j.cageo.2011.08.008.

- [10] L. De Lathauwer, B. De Moor and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278.

- [11] S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Computation of the response surface in the tensor train data format, preprint (2014), https://arxiv.org/abs/1406.2816.

- [12] S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the tensor train format, SIAM/ASA J. Uncertain. Quantif. 3 (2015), no. 1, 1109–1135.

- [13] S. Dolgov, B. N. Khoromskij and D. Savostyanov, Superfast Fourier transform using QTT approximation, J. Fourier Anal. Appl. 18 (2012), no. 5, 915–953.

- [14] P. A. Finke, D. J. Brus, M. F. P. Bierkens, T. Hoogland, M. Knotters and F. De Vries, Mapping groundwater dynamics using multiple sources of exhaustive high resolution data, Geoderma 123 (2004), no. 1, 23–39.

- [15] R. Furrer and M. G. Genton, Aggregation-cokriging for highly multivariate spatial data, Biometrika 98 (2011), no. 3, 615–631.

- [16] I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Data-sparse approximation to a class of operator-valued functions, Math. Comp. 74 (2005), no. 250, 681–708.

- [17] I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Hierarchical tensor-product approximation to the inverse and related operators for high-dimensional elliptic problems, Computing 74 (2005), no. 2, 131–157.

- [18] L. Grasedyck, D. Kressner and C. Tobler, A literature survey of low-rank tensor approximation techniques, GAMM-Mitt. 36 (2013), no. 1, 53–78.

- [19] W. Hackbusch, A sparse matrix arithmetic based on $\mathscr{H}$-matrices. I. Introduction to $\mathscr{H}$-matrices, Computing 62 (1999), no. 2, 89–108.

- [20] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Ser. Comput. Math. 42, Springer, Heidelberg, 2012.

- [21] W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer Ser. Comput. Math. 49, Springer, Heidelberg, 2015.

- [22] W. Hackbusch and B. N. Khoromskij, A sparse $\mathscr{H}$-matrix arithmetic. II. Application to multi-dimensional problems, Computing 64 (2000), no. 1, 21–47.

- [23] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. I. Separable approximation of multi-variate functions, Computing 76 (2006), no. 3–4, 177–202.

- [24] W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. II. HKT representation of certain operators, Computing 76 (2006), no. 3–4, 203–225.

- [25] M. S. Handcock and M. L. Stein, A Bayesian analysis of Kriging, Technometrics 35 (1993), 403–410.

- [26] H. Harbrecht, M. Peters and M. Siebenmorgen, Efficient approximation of random fields for numerical applications, Numer. Linear Algebra Appl. 22 (2015), no. 4, 596–617.

- [28] M. R. Haylock, N. Hofstra, A. M. Klein Tank, E. J. Klok, P. D. Jones and M. New, A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006, J. Geophys. Res. 113 (2008), 10.1029/2008JD010201.

- [29] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6 (1927), 164–189.

- [31] V. Khoromskaia, Computation of the Hartree–Fock exchange by the tensor-structured methods, Comput. Methods Appl. Math. 10 (2010), no. 2, 204–218.

- [32] V. Khoromskaia and B. N. Khoromskij, Fast tensor method for summation of long-range potentials on 3D lattices with defects, Numer. Linear Algebra Appl. 23 (2016), no. 2, 249–271.

- [33] B. N. Khoromskij, Structured rank-$(R_{1},\dots,R_{D})$ decomposition of function-related tensors in $\mathbb{R}^{D}$, Comput. Methods Appl. Math. 6 (2006), no. 2, 194–220.

- [34] B. N. Khoromskij, Tensors-structured numerical methods in scientific computing: Survey on recent advances, Chemometr. Intell. Laboratory Syst. 110 (2011), no. 1, 1–19.

- [35] B. N. Khoromskij, Tensor numerical methods for multidimensional PDEs: Theoretical analysis and initial applications, CEMRACS 2013—Modelling and Simulation of Complex Systems: Stochastic and Deterministic Approaches, ESAIM Proc. Surveys 48, EDP Sci., Les Ulis (2015), 1–28.

- [36] B. N. Khoromskij and V. Khoromskaia, Low rank Tucker-type tensor approximation to classical potentials, Cent. Eur. J. Math. 5 (2007), no. 3, 523–550.

- [37] B. N. Khoromskij and V. Khoromskaia, Multigrid accelerated tensor approximation of function related multidimensional arrays, SIAM J. Sci. Comput. 31 (2009), no. 4, 3002–3026.

- [38] B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Application of hierarchical matrices for computing the Karhunen–Loève expansion, Computing 84 (2009), no. 1–2, 49–67.

- [40] T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001), no. 1, 243–255.

- [41] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500.

- [42] J. B. Kollat, P. M. Reed and J. R. Kasprzyk, A new epsilon-dominance hierarchical Bayesian optimization algorithm for large multiobjective monitoring network design problems, Adv. Water Res. 31 (2008), no. 5, 828–845.

- [43] A. Litvinenko, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, preprint (2017), https://arxiv.org/abs/1709.08625.

- [44] A. Litvinenko, Y. Sun, M. G. Genton and D. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, preprint (2017), https://arxiv.org/abs/1709.04419.

- [46] G. Matheron, The Theory of Regionalized Variables and its Applications, Ecole de Mines, Fontainebleau, 1971.

- [47] V. Minden, A. Damle, K. L. Ho and L. Ying, Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations, Multiscale Model. Simul. 15 (2017), no. 4, 1584–1611.

- [48] W. G. Müller, Collecting Spatial Data. Optimum Design of Experiments for Random Fields, 3rd ed., Contrib. Statist., Springer, Berlin, 2007.

- [49] G. R. North, J. Wang and M. G. Genton, Correlation models for temperature fields, J. Climate 24 (2011), 5850–5862.

- [50] W. Nowak, Measures of parameter uncertainty in geostatistical estimation and geostatistical optimal design, Math. Geosci. 42 (2010), no. 2, 199–221.

- [51] W. Nowak and A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: Combining low-rank covariance approximations with FFT-techniques, Math. Geosci. 45 (2013), no. 4, 411–435.

- [52] D. Nychka, S. Bandyopadhyay, D. Hammerling, F. Lindgren and S. Sain, A multiresolution Gaussian process model for the analysis of large spatial datasets, J. Comput. Graph. Statist. 24 (2015), no. 2, 579–599.

- [53] I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317.

- [54] J. Quiñonero Candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6 (2005), 1939–1959.

- [55] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adapt. Comput. Mach. Learn., MIT, Cambridge, 2006.

- [56] A. K. Saibaba, S. Ambikasaran, J. Yue Li, P. K. Kitanidis and E. F. Darve, Application of hierarchical matrices to linear inverse problems in geostatistics, Oil Gas Sci. Technol. Rev. IFP Energ. Nouv. 67 (2012), no. 5, 857–875.

- [57] U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Physics 326 (2011), no. 1, 96–192.

- [58] R. Shah and P. Reed, Comparative analysis of multiobjective evolutionary algorithms for random and correlated instances of multiobjective d-dimensional knapsack problems, European J. Oper. Res. 211 (2011), no. 3, 466–479.

- [59] A. K. Smilde, R. Bro and P. Geladi, Multi-Way Analysis with Applications in the Chemical Sciences, Wiley, New York, 2004.

- [60] G. Spöck and J. Pilz, Spatial sampling design and covariance-robust minimax prediction based on convex design ideas, Stoch. Environmental Res. Risk Assess. 24 (2010), 463–482.

- [61] M. L. Stein, J. Chen and M. Anitescu, Difference filter preconditioning for large covariance matrices, SIAM J. Matrix Anal. Appl. 33 (2012), no. 1, 52–72.

- [62] M. L. Stein, Z. Chi and L. J. Welty, Approximating likelihoods for large spatial data sets, J. R. Stat. Soc. Ser. B Stat. Methodol. 66 (2004), no. 2, 275–296.

- [63] F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer Ser. Comput. Math. 20, Springer, New York, 1993.

- [64] Y. Sun and M. L. Stein, Statistically and computationally efficient estimating equations for large spatial datasets, J. Comput. Graph. Statist. 25 (2016), no. 1, 187–208.

- [65] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311.

- [66] S. M. Wesson and G. G. S. Pegram, Radar rainfall image repair techniques, Hydrol. Earth Syst. Sci. 8 (2004), no. 2, 8220–8234.