# Computational Methods in Applied Mathematics

Editor-in-Chief: Carstensen, Carsten

Managing Editor: Matus, Piotr

Online ISSN: 1609-9389
Volume 19, Issue 1

# Tucker Tensor Analysis of Matérn Functions in Spatial Statistics

Alexander Litvinenko
/ David Keyes
• Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology, Thuwal-Jeddah, Saudi Arabia
/ Venera Khoromskaia
• Max-Planck Institute for Mathematics in the Sciences, 04103 Leipzig; and Max-Planck Institute for Dynamics of Complex Technical Systems, 39106 Magdeburg, Germany
/ Boris N. Khoromskij
/ Hermann G. Matthies
Published Online: 2018-07-07 | DOI: https://doi.org/10.1515/cmam-2018-0022

## Abstract

In this work, we describe advanced numerical tools for working with multivariate functions and for the analysis of large data sets. These tools drastically reduce the required computing time and the storage cost, and, therefore, allow us to consider much larger data sets or finer meshes. Covariance matrices are crucial in spatio-temporal statistical tasks, but are often very expensive to compute and store, especially in three dimensions. Therefore, we approximate covariance functions by cheap surrogates in a low-rank tensor format. We apply the Tucker and canonical tensor decompositions to a family of Matérn- and Slater-type functions with varying parameters and demonstrate numerically that their approximations exhibit exponentially fast convergence. We prove the exponential convergence of the Tucker and canonical approximations in tensor rank parameters. Several statistical operations are performed in this low-rank tensor format, including evaluating the conditional covariance matrix and the spatially averaged estimation variance, and computing a quadratic form, determinant, trace, log-likelihood, inverse, and Cholesky decomposition of a large covariance matrix. Low-rank tensor approximations substantially reduce the computing and storage costs. For example, the storage cost is reduced from an exponential $\mathcal{𝒪}\left({n}^{d}\right)$ to a linear scaling $\mathcal{𝒪}\left(drn\right)$, where d is the spatial dimension, n is the number of mesh points in one direction, and r is the tensor rank. Prerequisites for applicability of the proposed techniques are the assumptions that the data, locations, and measurements lie on a tensor (axes-parallel) grid and that the covariance function depends on a distance, $\parallel x-y\parallel$.

MSC 2010: 60H15; 60H35; 65N25

## 1 Introduction

Nowadays it is very common to work with large spatial data sets [64, 15, 62, 44, 61, 52], for instance, with satellite data collected over a very large area (e.g., the data collected by the National Center for Atmospheric Research, USA, https://www.earthsystemgrid.org/). Such data can also come from a computer simulation code as the solution of a multiparametric equation (e.g., the Weather Research and Forecasting model, https://www.mmm.ucar.edu/weather-research-and-forecasting-model) or from sensors at multiple sources. Typical operations in spatial statistics, such as evaluating the spatially averaged estimation variance, computing quadratic forms of the conditional covariance matrix, or maximizing the likelihood function [62], require high computing power and time. Our motivation for using low-rank tensor techniques is that operations on advanced matrices, such as hierarchical, low-rank and sparse matrices, are limited by their high computational costs, especially in three dimensions and for a large number of observations.

A tensor can be simply defined as a high-order matrix, where multi-indices are used instead of indices (see Section 3 and equation (3.1) for a rigorous definition). One way to obtain a tensor from a vector or matrix is to reshape it. For example, let $𝐯\in {ℝ}^{{10}^{6}}$ be a vector. Reshaping it gives a matrix of size ${10}^{3}×{10}^{3}$, a tensor of order 3 of size ${10}^{2}×{10}^{2}×{10}^{2}$, or a tensor of order 6 of size $10×\mathrm{\dots }×10$ (6 times). Each element of such a six-dimensional hypercube is described by the multi-index $\alpha =\left({\alpha }_{1},\mathrm{\dots },{\alpha }_{6}\right)$. The obtained tensors contain not only rows and columns, but also slices and fibers [40, 41, 10]. These slices and fibers can be analyzed for linear dependences, supersymmetry, or sparsity, which may result in strong data compression. Another difference between tensors and matrices is that a matrix (obtained, for instance, after the discretization of a kernel $c\left(𝐱,𝐲\right)=c\left(|𝐱-𝐲|\right)$) separates a point $𝐱=\left({x}_{1},\mathrm{\dots },{x}_{d}\right)\in {ℝ}^{d}$ from a point $𝐲=\left({y}_{1},\mathrm{\dots },{y}_{d}\right)\in {ℝ}^{d}$, whereas the corresponding tensor (depending on the tensor format) separates ${x}_{1}-{y}_{1}$ from ${x}_{2}-{y}_{2},{x}_{3}-{y}_{3},\mathrm{\dots },{x}_{d}-{y}_{d}$. This implies that tensors may have not just one rank like a matrix, but many. Therefore, we speak about a tensor rank rather than a matrix rank. In this work, we consider two very common tensor formats: canonical (denoted as CP) and Tucker (see Section 3).
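The reshaping described above can be sketched in a few lines of NumPy. This is only an illustration: the variable names are ours, the vector entries are placeholders, and NumPy uses 0-based multi-indices with row-major ordering.

```python
import numpy as np

# A vector in R^(10^6); the entries are placeholders for illustration.
v = np.arange(10**6, dtype=float)

M = v.reshape(10**3, 10**3)            # matrix of size 10^3 x 10^3
T3 = v.reshape(10**2, 10**2, 10**2)    # tensor of order 3
T6 = v.reshape((10,) * 6)              # tensor of order 6, size 10 x ... x 10

# An entry of the order-6 tensor is addressed by a multi-index
# alpha = (alpha_1, ..., alpha_6) (0-based here).
alpha = (1, 2, 3, 4, 5, 6)
flat = np.ravel_multi_index(alpha, T6.shape)  # position in the original vector
```

All three arrays are views of the same data, so reshaping itself costs no additional storage; compression comes only from a subsequent low-rank decomposition.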

Low-rank tensor methods can be gainfully combined with other data-compression techniques in low dimensions. For example, a three-dimensional function can be approximated as a sum of tensor products of one-dimensional functions. Then the usual matrix techniques can be applied to those one-dimensional functions.

To be more concrete, we consider a relatively wide class of Matérn covariance functions. We demonstrate how to approximate Matérn covariance matrices in a low-rank tensor format, then how to perform typical Kriging and spatial statistics operations in this tensor format. Matérn covariance matrices typically depend on three to five unknown hyper-parameters, such as smoothness, three covariance lengths (in a three-dimensional anisotropic case), and variance. We study the dependences of tensor ranks and approximation errors on these parameters. Splitting the spatial variables via low-rank techniques reduces the computing cost for a matrix-vector product from $\mathcal{𝒪}\left({N}^{2}\right)$ to $\mathcal{𝒪}\left(d{r}^{2}{n}^{2}\right)$ FLOPs, where d is the spatial dimension, r is the tensor rank, and n is the number of mesh points along the longest edge of the computational domain. For simplicity, we assume that $N={n}^{d}$ (e.g., $d=4$ for a time-space problem in three dimensions). Other motivating factors for applying low-rank tensor techniques include the following:

• (1)

The storage cost is reduced from $\mathcal{𝒪}\left({n}^{d}\right)$ to $\mathcal{𝒪}\left(drn\right)$ or, depending on the tensor format, to $\mathcal{𝒪}\left(drn+{r}^{d}\right)$, where $d>1$.

• (2)

The low-rank tensor technique allows us to compute not only the matrix-vector product, but also the inverse ${𝐂}^{-1}$, square root ${𝐂}^{\frac{1}{2}}$, matrix exponent $\mathrm{exp}\left(𝐂\right)$, $\mathrm{trace}\left(𝐂\right)$, $det\left(𝐂\right)$, and a likelihood function.

• (3)

The low-rank tensor approximation is a relatively new but already well-studied technique, with free software libraries available.

• (4)

The approximation accuracy is fully controlled by the tensor rank. The full rank gives an exact representation.

• (5)

Low-rank techniques are either faster than a Fourier transform ($\mathcal{𝒪}\left(drn\right)$ vs. $\mathcal{𝒪}\left({n}^{d}\mathrm{log}{n}^{d}\right)$) or can be efficiently combined with it [51, 13].

General limitations of the tensor technique are the following:

• (a)

Computing a low-rank tensor decomposition can be time-consuming.

• (b)

It requires an axes-parallel mesh.

• (c)

Theoretical estimates exist only for functions depending on $|x-y|$ (although more general functions have a low-rank representation in practice).

During the last few years, there has been great interest in numerical methods for representing and approximating large covariance matrices [44, 54, 56, 51, 1, 2, 43]. Low-rank tensors were previously applied to accelerate Kriging and spatial design by orders of magnitude [51]. The covariance matrix under consideration was assumed to be circulant, and its first column had a low-rank decomposition. Therefore, the d-dimensional Fourier transform could be applied, which drastically reduced the storage and computing costs.

In [47], the maximum likelihood estimator was computed for parameter fitting given Gaussian observations with a Matérn covariance matrix. The presented framework for unstructured observations in two spatial dimensions allowed for an evaluation of the log-likelihood and its gradient with computational complexity $\mathcal{𝒪}\left({n}^{\frac{3}{2}}\right)$. The $\mathcal{ℋ}$-matrix techniques [19, 22, 21] provide efficient data-sparse approximations of differential and integral operators in ${ℝ}^{d}$, $d=1,2,3$. $\mathcal{ℋ}$-matrices are very robust for approximating the covariance matrix [38, 56, 2, 26, 1, 4], its inverse [1], and its Cholesky decomposition [38, 44, 43], but can also be expensive, especially for large n in three dimensions. Namely, the complexity in three dimensions is $C{k}^{d-1}N\mathrm{log}N$, where $N={n}^{d}$, $d=3$, $k\ll n$ is the rank, and C is a large constant that scales exponentially in the dimension d; see [22]. Thus, the $\mathcal{ℋ}$-matrix techniques scale exponentially in the dimension. Therefore, more efficient methods for fast matrix linear algebra operations are still needed.

The key idea is to compute a low-rank decomposition not of the covariance function (which could be hard), but of its analytically known spectral density (which could be a much easier object), and then to apply the inverse Fourier transform to the obtained low-rank components. The Fourier transform of the Matérn covariance function is known analytically as the Hilbert tensor. This Hilbert tensor can be decomposed numerically in a low-rank tensor format. Both the Fourier transform and its inverse have canonical (CP) tensor rank 1. Therefore, the inverse Fourier transform does not change the tensor rank of its argument. By applying the inverse Fourier transform to the low-rank tensor, we obtain a low-rank approximation of the initial covariance matrix, which can be further used in the Kalman filter update, Karhunen–Loève expansion, Bayesian update, and Kriging.

The structure of the paper is as follows. In Section 2, we list typical tasks from statistics that motivate us to use low-rank tensor techniques and define the Matérn covariance functions and their Fourier transformations. Section 3 is devoted to low-rank tensor decomposition. Sections 3.4, 3.5 and 3.6 contain the main theoretical contribution of this work. We present low-rank tensor techniques and separate radial basis functions using the Laplace transform and the $\mathrm{sinc}$ quadrature, give estimations of the approximation error, convergence rate, and the tensor rank, and we also prove the existence of a low-rank approximation of a Matérn function. Section 4 contains another important contribution of this work, namely, the solutions to typical statistical tasks in the low-rank tensor format.

## 2.1 Problem Settings in Spatial Statistics

Below, we formulate five tasks. These computational tasks are very common and important in statistics. A fast and efficient solution of these tasks will help to solve many real-world problems, such as weather prediction, moisture modeling, and optimal design in geostatistics.

## Task 1: Approximate a Matérn covariance function in a low-rank tensor format.

The covariance function $c\left(𝐱,𝐲\right)$, $𝐱=\left({x}_{1},\mathrm{\dots },{x}_{d}\right)$, $𝐲=\left({y}_{1},\mathrm{\dots },{y}_{d}\right)$, is discretized on a tensor grid with N mesh points, $N={n}^{d}$, $d\ge 1$. The task is to find the following decomposition into one-dimensional functions:

$\parallel c\left(𝐱,𝐲\right)-\sum _{i=1}^{r}\prod _{\mu =1}^{d}{c}_{i\mu }\left({x}_{\mu },{y}_{\mu }\right)\parallel \le \epsilon$

for some given $\epsilon >0$. Alternatively, we may look for factors ${𝐂}_{i\mu }$ such that

$\parallel 𝐂-\sum _{i=1}^{r}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }\parallel \le \epsilon .$

Here, the matrices ${𝐂}_{i\mu }$ correspond to the one-dimensional covariance functions ${c}_{i\mu }\left({x}_{\mu },{y}_{\mu }\right)$ in the direction μ.
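As a small sanity check of the Kronecker form above, consider a case where it is exact with $r=1$: the Gaussian covariance $\mathrm{exp}\left(-{\parallel 𝐱-𝐲\parallel }^{2}/{\mathrm{\ell }}^{2}\right)$ factorizes over the coordinates. The sketch below assumes a uniform tensor grid on ${\left[0,1\right]}^{3}$ with row-major ordering of the grid points; the grid size, the length ℓ, and all names are illustrative choices. A Matérn kernel with finite smoothness would need $r>1$ terms and a prescribed tolerance ε.

```python
import numpy as np

n, d, ell = 6, 3, 0.5                  # illustrative grid size, dimension, length
x = np.linspace(0.0, 1.0, n)

# One-dimensional factor: Gaussian covariance on the 1D grid (n x n).
C1 = np.exp(-((x[:, None] - x[None, :]) ** 2) / ell**2)

# Full covariance matrix on the n^d tensor grid (row-major point ordering).
pts = np.stack(np.meshgrid(x, x, x, indexing="ij"), axis=-1).reshape(-1, d)
sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
C_full = np.exp(-sq_dist / ell**2)     # N x N with N = n^d

# Kronecker representation with a single term (r = 1).
C_kron = np.kron(np.kron(C1, C1), C1)

err = np.abs(C_full - C_kron).max()
storage_full, storage_kron = C_full.size, d * C1.size
```

Here the full matrix has ${N}^{2}={n}^{2d}$ entries, while the Kronecker form stores only $d{n}^{2}$ numbers.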

## Task 2: Computing the square root of $𝐂$.

The square root ${𝐂}^{\frac{1}{2}}$ of the covariance matrix $𝐂$ is needed in order to generate random fields and processes. It is also used in the Kalman filter update.

## Task 3: Kriging.

Spatial statistics and Kriging [39] are used to model the distribution of ore grades and to forecast rainfall intensities, moisture, temperatures, or contaminant concentrations. The missing values are interpolated from the known measurements by Kriging [46, 30]. When considering space-time Kriging on fine meshes [66, 14, 28, 9], Kriging may easily exceed the computational power of modern computers. Estimating the variance of Kriging and solving geostatistical optimal design problems are especially numerically intensive [48, 50, 60].

The Kriging can be defined as follows. Let $\stackrel{^}{𝐬}$ be the $N×1$ vector of values to be estimated with zero expectation and covariance matrix ${𝐂}_{ss}$. Let $𝐲$ be the $m×1$, $m\ll N$, vector of measurements. The corresponding cross- and auto-covariance matrices are denoted by ${𝐂}_{sy}$ and ${𝐂}_{yy}$, and sized $N×m$ and $m×m$, respectively. If the measurements are subject to error, an error covariance matrix $𝐑$ is included in ${𝐂}_{yy}$. Using this notation, the Kriging estimate $\stackrel{^}{𝐬}$ is given by $\stackrel{^}{𝐬}={𝐂}_{sy}{𝐂}_{yy}^{-1}𝐲$.
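The Kriging estimate can be illustrated with a dense one-dimensional toy example. The exponential covariance, the locations, and the measurement values below are made up for illustration, and no low-rank format is used; the sketch only shows the formula $\stackrel{^}{𝐬}={𝐂}_{sy}{𝐂}_{yy}^{-1}𝐲$ itself.

```python
import numpy as np

def exp_cov(a, b, ell=0.3):
    """Exponential covariance exp(-|a - b| / ell) between 1D location arrays."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

s_loc = np.linspace(0.0, 1.0, 101)        # N = 101 estimation points
y_loc = np.array([0.1, 0.4, 0.5, 0.9])    # m = 4 measurement locations
y = np.array([1.0, -0.5, 0.2, 0.7])       # measured values (made up)

C_sy = exp_cov(s_loc, y_loc)              # N x m cross-covariance
C_yy = exp_cov(y_loc, y_loc)              # m x m; add R here for noisy data

# s_hat = C_sy C_yy^{-1} y, using a linear solve instead of an explicit inverse.
s_hat = C_sy @ np.linalg.solve(C_yy, y)
```

Without measurement error, the estimate reproduces the measured values at the measurement locations, which is a convenient check of an implementation.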

## Task 4: Geostatistical design.

The goal of geostatistical optimal design is to optimize the sampling patterns from which the data values in $𝐲$ will be obtained. The objective function to be minimized is typically a scalar measure of either the conditional covariance matrix or the estimation variance (4.4). The two most common measures for geostatistical optimal design are ${\phi }_{A}$ and ${\phi }_{C}$:

${\phi }_{A}={N}^{-1}\mathrm{trace}\left[{𝐂}_{ss|y}\right]\mathit{ }\text{and}\mathit{ }{\phi }_{C}={𝐳}^{T}\left({𝐂}_{ss|y}\right)𝐳,$(2.1)

where ${𝐂}_{ss|y}:={𝐂}_{ss}-{𝐂}_{sy}{𝐂}_{yy}^{-1}{𝐂}_{ys}$, see [48, 50].
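The two design measures in (2.1) can be evaluated directly for a small dense example. The sketch below again assumes an exponential covariance in one dimension; the sampling pattern and the vector $𝐳$ (here a uniform averaging vector) are hypothetical choices.

```python
import numpy as np

def exp_cov(a, b, ell=0.3):
    """Exponential covariance exp(-|a - b| / ell) between 1D location arrays."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / ell)

s_loc = np.linspace(0.0, 1.0, 51)         # N = 51 estimation points
y_loc = np.array([0.2, 0.5, 0.8])         # candidate sampling pattern

C_ss = exp_cov(s_loc, s_loc)
C_sy = exp_cov(s_loc, y_loc)
C_yy = exp_cov(y_loc, y_loc)

# Conditional covariance C_ss|y = C_ss - C_sy C_yy^{-1} C_ys.
C_cond = C_ss - C_sy @ np.linalg.solve(C_yy, C_sy.T)

N = s_loc.size
phi_A = np.trace(C_cond) / N              # average conditional variance
z = np.ones(N) / N                        # hypothetical weight vector
phi_C = z @ C_cond @ z                    # quadratic form
```

A design optimizer would repeat this evaluation for many candidate patterns `y_loc`, which is why cheap evaluations of the trace and of quadratic forms matter.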

## Task 5: Computing the joint Gaussian log-likelihood function.

We assume that $𝒛\in {ℝ}^{N}$ is an available vector of measurements, and $𝜽$ is an unknown vector of the parameters of a covariance matrix $𝐂$. The task is to compute the maximum likelihood estimation (MLE), where the log-likelihood function is as follows:

$\mathcal{ℒ}\left(𝜽\right)=-\frac{N}{2}\mathrm{log}\left(2\pi \right)-\frac{1}{2}\mathrm{log}det\left\{𝐂\left(𝜽\right)\right\}-\frac{1}{2}\left({𝐳}^{T}\cdot 𝐂{\left(𝜽\right)}^{-1}𝐳\right).$

The difficulty here is that each iteration step of a maximization procedure requires the solution of a linear system $\mathrm{𝐋𝐯}=𝐳$, the Cholesky decomposition, and the determinant.
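For a dense matrix of moderate size, the three ingredients named above (Cholesky factor, triangular solve, determinant) combine as in the following sketch. The function name, the exponential covariance, and the synthetic data are our assumptions; the code does not use the low-rank format.

```python
import numpy as np

def gaussian_loglik(C, z):
    """Log-likelihood of z ~ N(0, C), computed from the Cholesky factor of C."""
    N = z.size
    L = np.linalg.cholesky(C)                  # C = L L^T
    v = np.linalg.solve(L, z)                  # solve L v = z
    logdet = 2.0 * np.log(np.diag(L)).sum()    # log det C
    return -0.5 * N * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * (v @ v)

# Synthetic example: theta is the covariance length of an exponential kernel.
x = np.linspace(0.0, 1.0, 50)

def cov(ell):
    return np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

rng = np.random.default_rng(0)
z = np.linalg.cholesky(cov(0.2)) @ rng.standard_normal(x.size)

ells = [0.05, 0.1, 0.2, 0.4]
values = [gaussian_loglik(cov(ell), z) for ell in ells]
```

Here `np.linalg.solve` does not exploit the triangular structure of `L`; a dedicated triangular solver would be the natural choice in production code.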

In Section 4 we give detailed solutions. A strict definition of tensors is given in Section 3.

## 2.2 Matérn Covariance and Its Fourier Transform

A low-rank approximation of the covariance function is a key component of the tasks formulated above. Among the many covariance models available, the Matérn family [45, 25] is widely used in spatial statistics, geostatistics [7], machine learning [4], image analysis, weather forecasting, moisture modeling, and as the correlation for temperature fields [49]. The work [25] introduced the Matérn form of spatial correlations into statistics as a flexible parametric class with one parameter determining the smoothness of the underlying spatial random field.

The main idea of this low-rank approximation is shown in Figure 1 and explained in detail in Section 3.3. Figure 1 demonstrates two possible ways to find a low-rank tensor approximation of the Matérn covariance function. The first way (marked with “?”) is not trivial; the second way, via the Fast Fourier Transform (FFT), a low-rank approximation, and the inverse FFT (IFFT), is simpler. We use here the fact that the Fourier transform (FT) of the Matérn covariance is analytically known and has a known low-rank approximation. The IFFT can be computed numerically and does not change the tensor ranks.

Figure 1

Two possible ways to find a low rank tensor approximation of the Matérn covariance matrix ${C}_{\nu ,\mathrm{\ell }}\left(r\right)$.

The Matérn covariance function is defined as

${C}_{\nu ,\mathrm{\ell }}\left(r\right)=\frac{{2}^{1-\nu }}{\mathrm{\Gamma }\left(\nu \right)}{\left(\frac{\sqrt{2\nu }r}{\mathrm{\ell }}\right)}^{\nu }{\mathcal{𝒦}}_{\nu }\left(\frac{\sqrt{2\nu }r}{\mathrm{\ell }}\right),$(2.2)

where the distance is $r:=\parallel x-y\parallel$, $x,y$ are two points in ${ℝ}^{d}$, $\nu >0$ defines the smoothness of the random field, and ${\mathcal{𝒦}}_{\nu }$ denotes the modified Bessel function of the second kind of order ν. The larger ν, the smoother the random field. The parameter $\mathrm{\ell }>0$ is a spatial range parameter that measures how quickly the correlation of the random field decays with distance, with larger $\mathrm{\ell }$ corresponding to a slower decay (keeping ν fixed). When $\nu =\frac{1}{2}$ (see [55]), the Matérn covariance function reduces to the exponential covariance model and describes a rough field. The limit $\nu \to \mathrm{\infty }$ corresponds to a Gaussian covariance model, which describes a very smooth field that is infinitely differentiable. Random fields with a Matérn covariance function are $⌈\nu ⌉-1$ times mean square differentiable. Thus, the hyperparameter ν controls the degree of smoothness.
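Formula (2.2) can be evaluated numerically with SciPy's modified Bessel function of the second kind, `scipy.special.kv`. The function name `matern` and the handling of $r=0$ by the limit value 1 are our choices in this sketch.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(r, nu, ell):
    """Matern covariance (2.2) at distances r >= 0; returns a 1D array."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    z = np.sqrt(2.0 * nu) * r / ell
    out = np.ones_like(z)                  # limit value for r -> 0
    pos = z > 0
    out[pos] = 2.0 ** (1.0 - nu) / gamma(nu) * z[pos] ** nu * kv(nu, z[pos])
    return out

r = np.linspace(0.0, 3.0, 31)
c_rough = matern(r, 0.5, 1.0)              # nu = 1/2: exponential model
c_smooth = matern(r, 2.5, 1.0)             # larger nu: smoother field
```

For $\nu =\frac{1}{2}$ the values coincide with $\mathrm{exp}\left(-r/\mathrm{\ell }\right)$, which gives a simple check of the implementation.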

The d-dimensional Fourier transform ${𝑭}^{d}\left(C\left(r,\nu \right)\right)$ of the Matérn kernel, defined in equation (2.2), in ${ℝ}^{d}$ is given by [45]

$𝑼\left(\xi \right):={𝑭}^{d}\left(C\left(r,\nu \right)\right)=\beta \cdot {\left(1+\frac{{\mathrm{\ell }}^{2}}{2\nu }|\xi {|}^{2}\right)}^{-\nu -\frac{d}{2}},$(2.3)

where $\beta =\beta \left(\nu ,\mathrm{\ell },n\right)$ is a constant and $|\xi |$ is the Euclidean norm of ξ in ${ℝ}^{d}$. The following tensor approach also applies to the case of anisotropic distance, where ${r}^{2}=〈A\left(x-y\right),\left(x-y\right)〉$, and A is a positive diagonal $d×d$ matrix.

## 3 Low-Rank Tensor Decompositions

In this section, we review the definitions of the CP and Tucker tensor formats. Then we provide the analytic $\mathrm{sinc}$-based proof of the existence of low-rank tensor approximations of Matérn functions. We investigate numerically the behavior of the Tucker and CP ranks across a wide range of parameters specific to the family of Matérn kernels in equation (2.3). The Tucker tensor format is used in this work for additional rank compression of the CP factors. The CP decomposition is difficult to compute, and no reliable general algorithm for it is available, whereas reliable algorithms exist for the Tucker decomposition. The Tucker decomposition is limited only by the available memory, since the term ${r}^{d}$ in $\mathcal{𝒪}\left(drn+{r}^{d}\right)$ grows exponentially with d.

## 3.1 General Definitions

CP and Tucker rank-structured tensor formats have long been applied for the quantitative analysis of correlation in multidimensional experimental data in chemometrics and signal processing [59, 8]. The Tucker tensor format was introduced in 1966 for tensor decomposition of multidimensional arrays in chemometrics [65]. Though the canonical representation of multivariate functions was introduced as early as 1927 [29], only the Tucker tensor format provides a stable algorithm for the decomposition of full-size tensors. A mathematical justification of the Tucker decomposition algorithm was presented in papers on the higher-order singular value decomposition (HOSVD) and the Tucker alternating least squares (ALS) algorithm for orthogonal Tucker approximation of higher-order tensors [10]. For higher dimensions, the so-called Matrix Product States (MPS) (see the survey paper [57]) or the Tensor Train (TT) [53] decompositions can be applied. However, for three-dimensional applications, the Tucker and CP tensor formats remain the best choices. The fast convergence of the Tucker decomposition was proved and demonstrated numerically for higher-order tensors that arise from the discretization of linear operators and functions in ${ℝ}^{d}$. For a class of function-related tensors, and Green’s kernels in particular, it was found that the approximation error of the Tucker decomposition decays exponentially in the Tucker rank [33, 36].

These results inspired the canonical-to-Tucker (C2T) and Tucker-to-canonical (T2C) decompositions for function-related tensors in the case of large input ranks, as well as the multigrid Tucker approximation [37].

A tensor of order d in a full format is defined as a multidimensional array over a d-tuple index set:

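$𝐀=\left[{a}_{{i}_{1},\mathrm{\dots },{i}_{d}}\right]\in {ℝ}^{{I}_{1}×\mathrm{\dots }×{I}_{d}},\quad {I}_{\mathrm{\ell }}=\left\{1,\mathrm{\dots },{n}_{\mathrm{\ell }}\right\},\quad \mathrm{\ell }=1,\mathrm{\dots },d.$(3.1)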

Here, $𝐀$ is an element of the linear space

${𝕍}_{n}=\underset{\mathrm{\ell }=1}{\overset{d}{\otimes }}{𝕍}_{\mathrm{\ell }},{𝕍}_{\mathrm{\ell }}={ℝ}^{{I}_{\mathrm{\ell }}}$

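equipped with the Euclidean scalar product $〈\cdot ,\cdot 〉:{𝕍}_{n}×{𝕍}_{n}\to ℝ$, defined as

$〈𝐀,𝐁〉:=\sum _{\left({i}_{1},\mathrm{\dots },{i}_{d}\right)\in {I}_{1}×\mathrm{\dots }×{I}_{d}}{a}_{{i}_{1},\mathrm{\dots },{i}_{d}}{b}_{{i}_{1},\mathrm{\dots },{i}_{d}}\quad \text{for }𝐀,𝐁\in {𝕍}_{n}.$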

Tensors with all dimensions having equal size ${n}_{\mathrm{\ell }}=n$, $\mathrm{\ell }=1,\mathrm{\dots },d$, are called ${n}^{\otimes d}$ tensors. The required storage size scales exponentially with the dimension, ${n}^{d}$, which results in the so-called “curse of dimensionality”.

To avoid exponential scaling in the dimension, the rank-structured separable representations (approximations) of the multidimensional tensors can be used. The simplest separable element is given by the rank-1 tensor

$𝐔={𝐮}^{\left(1\right)}\otimes \mathrm{\dots }\otimes {𝐮}^{\left(d\right)}\in {ℝ}^{{n}_{1}×\mathrm{\dots }×{n}_{d}},$

with entries ${u}_{{i}_{1},\mathrm{\dots },{i}_{d}}={u}_{{i}_{1}}^{\left(1\right)}\mathrm{\cdots }{u}_{{i}_{d}}^{\left(d\right)},$ which requires only ${n}_{1}+\mathrm{\dots }+{n}_{d}$ numbers for storage.

The rank-1 canonical tensor is a discrete counterpart of the separable d-variate function, which can be represented as the product of univariate functions

$f\left({x}_{1},{x}_{2},\mathrm{\dots },{x}_{d}\right)={f}_{1}\left({x}_{1}\right){f}_{2}\left({x}_{2}\right)\mathrm{\dots }{f}_{d}\left({x}_{d}\right).$

An example of a separable function for $d=3$ is $f\left({x}_{1},{x}_{2},{x}_{3}\right)={e}^{\left({x}_{1}+{x}_{2}+{x}_{3}\right)}$. Then, by discretization of this multivariate function on a tensor grid in a computational box, we obtain a canonical rank-1 tensor.
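The rank-1 structure of this example is easy to confirm numerically. In the sketch below, the grid on $\left[0,1\right]$ and its size are arbitrary choices; the full tensor is formed only to check the outer-product representation.

```python
import numpy as np

n = 50
x = np.linspace(0.0, 1.0, n)
u = np.exp(x)                              # one-dimensional factor e^{x}

# Rank-1 tensor as the outer product of three vectors.
U = np.einsum("i,j,k->ijk", u, u, u)

# Direct discretization of f(x1, x2, x3) = exp(x1 + x2 + x3) on the grid.
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
F = np.exp(X1 + X2 + X3)

rel_err = np.abs(U - F).max() / np.abs(F).max()
full_storage = n**3                        # entries of the full tensor
rank1_storage = 3 * n                      # entries of the three factors
```

For $n=50$ this is 125000 numbers in the full format against 150 in the rank-1 format.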

A tensor in the R-term canonical format is defined by a finite sum of rank-1 tensors (Figure 2, left)

${𝐀}_{c}=\sum _{k=1}^{R}{\xi }_{k}{𝐮}_{k}^{\left(1\right)}\otimes \mathrm{\dots }\otimes {𝐮}_{k}^{\left(d\right)},{\xi }_{k}\in ℝ,$(3.2)

where ${𝐮}_{k}^{\left(\mathrm{\ell }\right)}\in {ℝ}^{{n}_{\mathrm{\ell }}}$ are normalized vectors, and R is the canonical rank. The storage cost of this parametrization is bounded by dRn. An element $a\left({i}_{1},\mathrm{\dots },{i}_{d}\right)$ of the tensor $𝐀={\sum }_{i=1}^{R}{\otimes }_{\nu =1}^{d}{u}_{i\nu }$ can be computed as

$a\left({i}_{1},\mathrm{\dots },{i}_{d}\right)=\sum _{\alpha =1}^{R}{u}_{1}\left({i}_{1},\alpha \right){u}_{2}\left({i}_{2},\alpha \right)\mathrm{\dots }{u}_{d}\left({i}_{d},\alpha \right).$
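The entry-wise formula above translates directly into code. In this sketch the factor matrices are random and the sizes are arbitrary; the weights ${\xi }_{k}$ are assumed to be absorbed into the factors, as in the formula. The full tensor is assembled only to verify a single entry.

```python
import numpy as np

rng = np.random.default_rng(0)
n, R = (4, 5, 6), 3
# Factor matrices u_l of size n_l x R; column alpha holds the alpha-th vector.
U = [rng.standard_normal((n_l, R)) for n_l in n]

def cp_entry(U, idx):
    """a(i_1, ..., i_d) = sum over alpha of prod over l of U[l][i_l, alpha]."""
    prod = np.ones(U[0].shape[1])
    for U_l, i_l in zip(U, idx):
        prod = prod * U_l[i_l, :]
    return prod.sum()

# Full tensor for d = 3, used here only for checking.
A = np.einsum("ia,ja,ka->ijk", *U)
value = cp_entry(U, (1, 2, 3))
```

Evaluating one entry costs $\mathcal{O}\left(dR\right)$ operations and never touches the ${n}^{d}$ entries of the full tensor.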

An alternative (contracted product) notation is used in the computer science community:

$𝐀=𝐂{×}_{1}{U}^{\left(1\right)}{×}_{2}{U}^{\left(2\right)}{×}_{3}\mathrm{\cdots }{×}_{d}{U}^{\left(d\right)},$(3.3)

where $𝐂=\mathrm{diag}\left\{{\xi }_{1},\mathrm{\dots },{\xi }_{R}\right\}\in {ℝ}^{{R}^{\otimes d}}$ is the superdiagonal tensor of the weights from (3.2), and ${U}^{\left(\mathrm{\ell }\right)}=\left[{𝐮}_{1}^{\left(\mathrm{\ell }\right)}\mathrm{\dots }{𝐮}_{R}^{\left(\mathrm{\ell }\right)}\right]\in {ℝ}^{{n}_{\mathrm{\ell }}×R}$. An analogous multivariate function can be represented by a sum of products of univariate functions

$f\left({x}_{1},{x}_{2},\mathrm{\dots },{x}_{d}\right)=\sum _{k=1}^{R}{f}_{1,k}\left({x}_{1}\right){f}_{2,k}\left({x}_{2}\right)\mathrm{\dots }{f}_{d,k}\left({x}_{d}\right).$

For $d\ge 3$, there are no stable algorithms with polynomial cost in d that compute the canonical rank of a tensor $𝐀$, that is, the minimal number R in representation (3.2), together with the respective decomposition; computing the canonical decomposition is an NP-hard problem [27].

Figure 2

Canonical (left) and Tucker (right) decompositions of three-dimensional tensors.

The Tucker tensor format (Figure 2, right) is suitable for stable numerical decompositions with a fixed truncation threshold. We say that the tensor $𝐀$ is represented in the rank-$𝐫$ orthogonal Tucker format with the rank parameter $𝐫=\left({r}_{1},\mathrm{\dots },{r}_{d}\right)$ if

$𝐀=\sum _{{\nu }_{1}=1}^{{r}_{1}}\mathrm{\dots }\sum _{{\nu }_{d}=1}^{{r}_{d}}{\beta }_{{\nu }_{1},\mathrm{\dots },{\nu }_{d}}{𝐯}_{{\nu }_{1}}^{\left(1\right)}\otimes \mathrm{\dots }\otimes {𝐯}_{{\nu }_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\mathrm{\dots }\otimes {𝐯}_{{\nu }_{d}}^{\left(d\right)},\mathrm{\ell }=1,\mathrm{\dots },d,$

where ${\left\{{𝐯}_{{\nu }_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\right\}}_{{\nu }_{\mathrm{\ell }}=1}^{{r}_{\mathrm{\ell }}}\in {ℝ}^{{n}_{\mathrm{\ell }}}$ represents a set of orthonormal vectors for $\mathrm{\ell }=1,\mathrm{\dots },d$, and $𝐁=\left[{\beta }_{{\nu }_{1},\mathrm{\dots },{\nu }_{d}}\right]\in {ℝ}^{{r}_{1}×\mathrm{\cdots }×{r}_{d}}$ is the Tucker core tensor. The storage cost for the Tucker tensor is bounded by $drn+{r}^{d}$, with $r=|𝐫|:={\mathrm{max}}_{\mathrm{\ell }}{r}_{\mathrm{\ell }}$. Using the orthogonal side matrices ${V}^{\left(\mathrm{\ell }\right)}=\left[{v}_{1}^{\left(\mathrm{\ell }\right)}\mathrm{\dots }{v}_{{r}_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\right]$ and contracted products, the Tucker tensor decomposition may be presented in the alternative notation

${𝐀}_{\left(𝐫\right)}=𝐁{×}_{1}{V}^{\left(1\right)}{×}_{2}{V}^{\left(2\right)}{×}_{3}\mathrm{\dots }{×}_{d}{V}^{\left(d\right)}.$

In the case $d=2$, the orthogonal Tucker decomposition is equivalent to the singular value decomposition (SVD) of a rectangular matrix.
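The contracted-product form of the Tucker format can be written as a single `einsum` call for $d=3$. The sizes, ranks, and random data below are illustrative; orthonormal side matrices are generated with a QR factorization.

```python
import numpy as np

def tucker_to_full(B, V):
    """Contract a core B (r1 x r2 x r3) with side matrices V[l] (n_l x r_l)."""
    return np.einsum("abc,ia,jb,kc->ijk", B, V[0], V[1], V[2])

rng = np.random.default_rng(1)
n, r = (8, 9, 10), (2, 3, 4)

# Side matrices with orthonormal columns (reduced QR of random matrices).
V = [np.linalg.qr(rng.standard_normal((n_l, r_l)))[0] for n_l, r_l in zip(n, r)]
B = rng.standard_normal(r)                 # Tucker core tensor

A = tucker_to_full(B, V)

# Storage: side matrices plus core, versus the full tensor.
storage_tucker = sum(n_l * r_l for n_l, r_l in zip(n, r)) + B.size
storage_full = A.size
```

Because the columns of each ${V}^{\left(\mathrm{\ell }\right)}$ are orthonormal, contracting the full tensor with the transposed side matrices returns the core, which is how the core is obtained in practice once the side matrices are known.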

## 3.2 Tucker Decomposition of Full Format Tensors

We use the following algorithm to compute the Tucker decomposition of the full format tensor. The most time-consuming part of the Tucker algorithm is higher-order singular value decomposition (HOSVD), the computation of the initial guess for matrices ${V}^{\left(\mathrm{\ell }\right)}$ using the SVD of the matrix unfolding ${A}_{\left(\mathrm{\ell }\right)}$, $\mathrm{\ell }=1,2,3$ (Figure 3), of the original tensor along each mode of a tensor [10]. Figure 3 illustrates the matrix unfolding of the full format tensor $𝐀$ (see (3.1)) along the index set ${I}_{1}=\left\{1,\mathrm{\dots },{n}_{1}\right\}$.

Figure 3

Unfolding of a three-dimensional tensor along the mode ${I}_{\mathrm{\ell }}$ with $\mathrm{\ell }=1$.

The second part of the algorithm is the ALS procedure. For every tensor mode, a “single-hole” tensor of reduced size is constructed by mapping all modes of the original tensor except one into the subspaces ${V}^{\left(\mathrm{\ell }\right)}$. Then the subspace ${V}^{\left(\mathrm{\ell }\right)}$ for the current mode is updated by the SVD of the unfolding of the “single-hole” tensor for this mode. This alternates over all modes of the tensor, which are updated at the current iteration of ALS. The final step of the algorithm is the computation of the core tensor by using the final mapping matrices from ALS.
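The two stages described above (HOSVD initial guess, then ALS sweeps, then the core) can be sketched compactly with NumPy. This is a plain dense illustration with a fixed number of sweeps and hypothetical function names, not the authors' implementation and not the multigrid variant; the test function is a Slater-type function on an arbitrary grid.

```python
import numpy as np

def unfold(A, mode):
    """Mode-`mode` unfolding: rows are indexed by the chosen mode."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def mode_mult(A, M, mode):
    """Contracted product A x_mode M for M of shape (q, A.shape[mode])."""
    return np.moveaxis(np.tensordot(M, A, axes=(1, mode)), 0, mode)

def leading_vectors(mat, r):
    """First r left singular vectors of a matrix."""
    return np.linalg.svd(mat, full_matrices=False)[0][:, :r]

def tucker_als(A, ranks, n_iter=3):
    d = A.ndim
    # HOSVD initial guess from the SVD of each unfolding.
    V = [leading_vectors(unfold(A, l), ranks[l]) for l in range(d)]
    for _ in range(n_iter):
        for l in range(d):
            # "Single-hole" tensor: project every mode except l.
            T = A
            for m in range(d):
                if m != l:
                    T = mode_mult(T, V[m].T, m)
            V[l] = leading_vectors(unfold(T, l), ranks[l])
    core = A
    for l in range(d):
        core = mode_mult(core, V[l].T, l)
    return core, V

def tucker_full(core, V):
    A = core
    for l, V_l in enumerate(V):
        A = mode_mult(A, V_l, l)
    return A

# Slater-type function exp(-||x||) on a 32^3 grid in [-5, 5]^3.
x = np.linspace(-5.0, 5.0, 32)
X1, X2, X3 = np.meshgrid(x, x, x, indexing="ij")
A = np.exp(-np.sqrt(X1**2 + X2**2 + X3**2))
for r in (2, 4, 6, 8):
    core, V = tucker_als(A, (r, r, r))
    err = np.linalg.norm(tucker_full(core, V) - A) / np.linalg.norm(A)
    print(r, err)
```

The printed relative errors give a quick impression of how the approximation improves with the Tucker rank for this function.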

The numerical cost of Tucker decomposition for full size tensors is dominated by the initial guess, which is estimated as $O\left({n}^{d+1}\right)$ when all ${n}_{\mathrm{\ell }}=n$ are equal, or $O\left({n}^{4}\right)$ for our three-dimensional case. This step restricts the available size of the tensor to be decomposed since, for conventional computers, the three-dimensional case ${n}_{\mathrm{\ell }}>{10}^{2}$ is the limiting case for SVD.

The multigrid Tucker algorithm for full size tensors allows the computational complexity to be linear in the full size of the tensor, $O\left({n}^{d}\right)$, i.e., $O\left({n}^{3}\right)$ for three-dimensional tensors [37]. It is computed on a sequence of dyadically refined grids and is based on implementing the HOSVD only at the coarsest grid level, ${n}_{0}\ll n$. The initial guess for the ALS procedure is computed at each refined level by the interpolation of the dominating Tucker subspaces obtained from the previous coarser grid. In this way, at fine three-dimensional Cartesian grids, we need only $O\left({n}^{3}\right)$ storage (to represent the initial full format tensor) to contract with the Tucker side matrices, obtained by the Tucker approximation via ALS on the previous grids.

## 3.3 Illustration of the Low-Rank Approximation Idea

In this subsection, we describe a possible way to find a low-rank tensor approximation of the Matérn covariance matrix $C\left(r,\nu \right)$ (Figure 1). Let ${𝑭}^{d}={\otimes }_{\nu =1}^{d}{𝑭}_{\nu }$ be the d-dimensional Fourier transform, where ${𝑭}^{-d}={\otimes }_{\nu =1}^{d}{𝑭}_{\nu }^{-1}$ is its inverse and $\otimes$ denotes the Kronecker product. We assume that $𝑼\left(\xi \right)={𝑭}^{d}\left(C\left(r,\nu \right)\right)$ is known analytically and has a low-rank tensor approximation $𝐔={\sum }_{i=1}^{r}{\otimes }_{\nu =1}^{d}{𝐮}_{\nu i}$. Since the Fourier and inverse Fourier transforms do not change the Kronecker tensor rank of the argument [51], applying the inverse Fourier transform yields a low-rank representation of the covariance function:

${𝑭}^{-d}\left(𝐔\right)=\left(\underset{\nu =1}{\overset{d}{\otimes }}{𝑭}_{\nu }^{-1}\right)\sum _{i=1}^{r}\left(\underset{\nu =1}{\overset{d}{\otimes }}{𝐮}_{\nu i}\right)=\sum _{i=1}^{r}\underset{\nu =1}{\overset{d}{\otimes }}\left({𝑭}_{\nu }^{-1}\left({𝐮}_{\nu i}\right)\right)=\sum _{i=1}^{r}\underset{\nu =1}{\overset{d}{\otimes }}{\stackrel{~}{𝐮}}_{\nu i}=:C\left(r,\nu \right),$

where ${\stackrel{~}{𝐮}}_{\nu i}:={𝑭}_{\nu }^{-1}\left({𝐮}_{\nu i}\right)$.
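The rank-preservation identity above can be checked numerically with the discrete transforms. The sketch below uses random CP factors of arbitrary size in place of the factors of the spectral density, and compares the three-dimensional inverse FFT of the assembled tensor with the tensor assembled from one-dimensional inverse FFTs of the factors.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 16, 3

# CP factors: one (n x r) matrix per direction; column i holds u_{nu, i}.
Uf = [rng.standard_normal((n, r)) for _ in range(3)]
U_full = np.einsum("ia,ja,ka->ijk", *Uf)

# Inverse FFT applied to each one-dimensional factor (column-wise).
Uf_tilde = [np.fft.ifft(F, axis=0) for F in Uf]
C_lowrank = np.einsum("ia,ja,ka->ijk", *Uf_tilde)

# Reference: three-dimensional inverse FFT of the full tensor.
C_full = np.fft.ifftn(U_full)

err = np.abs(C_full - C_lowrank).max()
```

The factor-wise route needs $dr$ one-dimensional transforms of length n instead of one transform of a tensor with ${n}^{d}$ entries.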

## 3.4 Sinc Approximation of the Matérn Function

The Sinc method provides a constructive approximation of multivariate functions in the form of a low-rank canonical representation. It can also be used for theoretical proofs and for rank estimation. Methods for the separable approximation of the three-dimensional Newton kernel and many other spherically symmetric functions that use Gaussian sums have been developed since the initial studies in the chemical [5] and mathematical literature [63, 6, 23, 17]. Here, we use a tensor-decomposition approach for lattice-structured interaction potentials [32]. We also recall the grid-based method for a low-rank canonical representation of the spherically symmetric kernel function $q\left(\parallel x\parallel \right)$, where $x\in {ℝ}^{d}$, $d=2,3,\mathrm{\dots }$, by its projection onto the set of piecewise constant basis functions; see [3] for the case of Newton and Yukawa kernels for $x\in {ℝ}^{3}$.

Following the standard schemes, we introduce the uniform $n×n×n$ rectangular Cartesian grid ${\mathrm{\Omega }}_{n}$ with mesh size $h=\frac{2b}{n}$ (we assume that n is even) in the computational domain $\mathrm{\Omega }={\left[-b,b\right]}^{3}$. Let $\left\{{\psi }_{𝐢}\right\}$ be a set of tensor-product piecewise constant basis functions

${\psi }_{𝐢}\left(𝐱\right)=\prod _{\mathrm{\ell }=1}^{3}{\psi }_{{i}_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\left({x}_{\mathrm{\ell }}\right)$

for the 3-tuple index $𝐢=\left({i}_{1},{i}_{2},{i}_{3}\right)\in \mathcal{ℐ}$, $\mathcal{ℐ}={I}_{1}×{I}_{2}×{I}_{3}$, with ${i}_{\mathrm{\ell }}\in {I}_{\mathrm{\ell }}=\left\{1,\mathrm{\dots },n\right\}$, where $\mathrm{\ell }=1, 2, 3$. The generating kernel $q\left(\parallel x\parallel \right)$ is discretized by its projection onto the basis set $\left\{{\psi }_{𝐢}\right\}$ in the form of a third-order tensor of size $n×n×n$, which is defined entry-wise as

$𝐐:=\left[{q}_{𝐢}\right]\in {ℝ}^{n×n×n},{q}_{𝐢}={\int }_{{ℝ}^{3}}{\psi }_{𝐢}\left(x\right)q\left(\parallel x\parallel \right)dx.$(3.4)

The low-rank canonical decomposition of the third-order tensor $𝐐$ is based on applying exponentially convergent $sinc$-quadratures to the integral representation of the function $q\left(p\right)$, $p\in ℝ$, in the form

$q\left(p\right)={\int }_{ℝ}{a}_{1}\left(t\right){e}^{-{p}^{2}{a}_{2}\left(t\right)}dt,$

specified by the weights ${a}_{1}\left(t\right),{a}_{2}\left(t\right)>0$. Figure 4 illustrates a scheme of the proof of existence of the canonical low-rank tensor approximation. It could be easier to apply the Laplace transform to the Fourier transform of a Matérn covariance matrix than to the Matérn covariance. To approximate the resulting Laplace integral we apply the $\mathrm{sinc}$ quadrature. The number $\left(2M+1\right)$ of terms in the approximate sum (3.5) is the canonical tensor rank.

Figure 4

Scheme of the proof of existence of low-rank tensor approximation, $r=2M+1$.

In particular, the $sinc$-quadrature for the Laplace–Gauss transform

(3.5)

can be applied, where the quadrature points $\left({t}_{k}\right)$ and weights $\left({a}_{k}\right)$ are given by

(3.6)

Under the assumption $0<{a}_{0}\le |p|<\mathrm{\infty }$, this quadrature can be proven to provide an exponential convergence rate in M (uniformly in p) for a class of functions $a\left(z\right)$ that are analytic in a certain strip $|\Im z|\le D$ of the complex plane such that the functions ${a}_{1}\left(t\right){e}^{-{p}^{2}{a}_{2}\left(t\right)}$ decay polynomially or exponentially on the real axis. The exponential convergence of the $\mathrm{sinc}$-approximation in the number of terms (i.e., the canonical rank $R=2M+1$) was analyzed elsewhere [63, 6, 23].

We assume that a representation similar to (3.5) exists for any fixed $x=\left({x}_{1},{x}_{2},{x}_{3}\right)\in {ℝ}^{3}$ such that $\parallel x\parallel >{a}_{0}>0$. Then we apply the $sinc$-quadrature approximation (3.5) and (3.6) to obtain the separable expansion

$q\left(\parallel x\parallel \right)={\int }_{{ℝ}_{+}}a\left(t\right){e}^{-{t}^{2}{\parallel x\parallel }^{2}}dt\approx \sum _{k=-M}^{M}{a}_{k}{e}^{-{t}_{k}^{2}{\parallel x\parallel }^{2}}=\sum _{k=-M}^{M}{a}_{k}\prod _{\mathrm{\ell }=1}^{3}{e}^{-{t}_{k}^{2}{x}_{\mathrm{\ell }}^{2}},$(3.7)

providing an exponential convergence rate in M:

By combining (3.4) and (3.7), and taking into account the separability of the Gaussian basis functions, we arrive at the low-rank approximation of each entry of the tensor $𝐐$:

${q}_{𝐢}\approx \sum _{k=-M}^{M}{a}_{k}{\int }_{{ℝ}^{3}}{\psi }_{𝐢}\left(x\right){e}^{-{t}_{k}^{2}{\parallel x\parallel }^{2}}dx=\sum _{k=-M}^{M}{a}_{k}\prod _{\mathrm{\ell }=1}^{3}{\int }_{ℝ}{\psi }_{{i}_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\left({x}_{\mathrm{\ell }}\right){e}^{-{t}_{k}^{2}{x}_{\mathrm{\ell }}^{2}}d{x}_{\mathrm{\ell }}.$

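Recalling that ${a}_{k}>0$, we define the vector ${𝐪}_{k}^{\left(\mathrm{\ell }\right)}$ as

${𝐪}_{k}^{\left(\mathrm{\ell }\right)}={a}_{k}^{\frac{1}{3}}{𝐛}^{\left(\mathrm{\ell }\right)}\left({t}_{k}\right)\in {ℝ}^{n},\quad {𝐛}^{\left(\mathrm{\ell }\right)}\left({t}_{k}\right)={\left[{\int }_{ℝ}{\psi }_{{i}_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\left({x}_{\mathrm{\ell }}\right){e}^{-{t}_{k}^{2}{x}_{\mathrm{\ell }}^{2}}d{x}_{\mathrm{\ell }}\right]}_{{i}_{\mathrm{\ell }}=1}^{n}.$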

Then the third-order tensor $𝐐$ can be approximated by the R-term ($R=2M+1$) canonical representation

$𝐐\approx {𝐐}_{R}=\sum _{k=-M}^{M}{a}_{k}\underset{\mathrm{\ell }=1}{\overset{3}{\otimes }}{𝐛}^{\left(\mathrm{\ell }\right)}\left({t}_{k}\right)=\sum _{k=-M}^{M}{𝐪}_{k}^{\left(1\right)}\otimes {𝐪}_{k}^{\left(2\right)}\otimes {𝐪}_{k}^{\left(3\right)}\in {ℝ}^{n×n×n},$(3.8)

where ${𝐪}_{k}^{\left(\mathrm{\ell }\right)}\in {ℝ}^{n}$. Given a threshold $\epsilon >0$, M can be chosen as the minimal number such that in the max-norm

$\parallel 𝐐-{𝐐}_{R}\parallel \le \epsilon \parallel 𝐐\parallel .$

The skeleton vectors can be reindexed by $k↦{k}^{\prime }=k+M+1$, ${𝐪}_{k}^{\left(\mathrm{\ell }\right)}↦{𝐪}_{{k}^{\prime }}^{\left(\mathrm{\ell }\right)}$ (${k}^{\prime }=1,\mathrm{\dots },R=2M+1$), $\mathrm{\ell }=1,2,3$. The symmetric canonical tensor ${𝐐}_{R}\in {ℝ}^{n×n×n}$ in (3.8) approximates the three-dimensional symmetric kernel function $q\left(\parallel x\parallel \right)$ ($x\in \mathrm{\Omega }$), centered at the origin, such that ${𝐪}_{{k}^{\prime }}^{\left(1\right)}={𝐪}_{{k}^{\prime }}^{\left(2\right)}={𝐪}_{{k}^{\prime }}^{\left(3\right)}$ (${k}^{\prime }=1,\mathrm{\dots },R$).
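To make the separable Gaussian-sum construction concrete, the sketch below approximates the Newton kernel $\frac{1}{\parallel x\parallel }$ using the standard identity $\frac{1}{p}=\frac{2}{\sqrt{\pi }}{\int }_{{ℝ}_{+}}{e}^{-{p}^{2}{t}^{2}}dt$. The substitution $t={e}^{u}$, the uniform step `h`, and the values `M=60`, `h=0.25` are assumptions chosen for illustration; they are not the quadrature points and weights of (3.6).

```python
import math

def gaussian_sum_newton(M, h):
    """Nodes t_k and weights a_k with 1/p ~ sum_k a_k * exp(-t_k**2 * p**2).

    Obtained from 1/p = (2/sqrt(pi)) * int_0^inf exp(-p^2 t^2) dt by the
    substitution t = exp(u) and the trapezoidal (sinc) rule at u_k = k*h.
    """
    c = 2.0 / math.sqrt(math.pi)
    nodes = [math.exp(k * h) for k in range(-M, M + 1)]
    weights = [c * h * t for t in nodes]
    return nodes, weights

def newton_approx(x, nodes, weights):
    """Separable evaluation: every term is a product of 1D Gaussians in x_l."""
    total = 0.0
    for t, a in zip(nodes, weights):
        term = a
        for x_l in x:
            term *= math.exp(-(t * x_l) ** 2)
        total += term
    return total

nodes, weights = gaussian_sum_newton(M=60, h=0.25)
x = (1.0, 2.0, 2.0)                      # ||x|| = 3
exact = 1.0 / math.sqrt(sum(v * v for v in x))
rel_err = abs(newton_approx(x, nodes, weights) - exact) / exact
print(len(nodes), rel_err)
```

Each of the $2M+1$ terms factorizes over the coordinates, which is exactly the property that turns the sum into a canonical tensor after discretization.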

In some applications, the tensor can be given in the canonical tensor format, but with large rank R and discretized on large grids $n×n×\mathrm{\dots }×n$; thus, computation of the initial guess in the Tucker-ALS decomposition algorithm becomes intractable. This situation may arise when composing the tensor approximation of complicated kernel functions from simple radial functions that can be represented in the low-rank CP format.

For such cases, the canonical-to-Tucker decomposition algorithm was introduced [37]. It is based on the ALS minimization procedure, similar to the Tucker algorithm described in Section 3.2 for full-size tensors, but the initial guess is computed by just the SVD of the side matrices ${U}^{\left(\mathrm{\ell }\right)}=\left[{𝐮}_{1}^{\left(\mathrm{\ell }\right)}\mathrm{\dots }{𝐮}_{R}^{\left(\mathrm{\ell }\right)}\right]\in {ℝ}^{{n}_{\mathrm{\ell }}×R}$, $\mathrm{\ell }=1,2,3$; see (3.3). This scheme is the reduced HOSVD (RHOSVD), which does not require unfolding of the full tensor.
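The idea of computing Tucker factors from the side matrices alone can be sketched as follows. This is only the SVD-based initial guess, without the subsequent ALS iteration; the function name `rhosvd`, the sizes, and the toy side matrices (built with column rank `r` so that the result is exact) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, R, r = 30, 20, 5  # mode size, canonical rank, target Tucker rank (illustrative)

# Side matrices of a canonical tensor, constructed with column rank r so that
# a rank-(r, r, r) Tucker representation is exact in this toy example.
U = [rng.standard_normal((n, r)) @ rng.standard_normal((r, R)) for _ in range(3)]

def rhosvd(side_matrices, rank):
    """Tucker factors and core obtained from SVDs of the side matrices only."""
    Z = []
    for U_l in side_matrices:
        left, _, _ = np.linalg.svd(U_l, full_matrices=False)
        Z.append(left[:, :rank])              # dominant left singular vectors
    # Project every canonical term onto the orthogonal bases (rank x R matrices).
    P = [Z_l.T @ U_l for Z_l, U_l in zip(Z, side_matrices)]
    core = np.einsum("ak,bk,ck->abc", *P)     # rank x rank x rank core tensor
    return Z, core

Z, core = rhosvd(U, r)

# Comparison with the full tensor, affordable only because n is tiny here.
full = np.einsum("ik,jk,lk->ijl", *U)
tucker = np.einsum("abc,ia,jb,lc->ijl", core, *Z)
rel_err = np.linalg.norm(full - tucker) / np.linalg.norm(full)
print(core.shape, rel_err)
```

The SVDs act on $n×R$ matrices, so the full ${n}^{3}$ array is never unfolded inside `rhosvd`.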

Another efficient rank-structured representation of the multidimensional tensors is the mixed-tensor format [31], which combines either the canonical-to-Tucker decomposition with the Tucker-to-canonical decomposition, or standard Tucker decomposition with the canonical-to-Tucker decomposition, in order to produce a canonical tensor from a full-size tensor.

## 3.5 Laplace Transform of the Covariance Matrix

Integral representations like (3.5) can be derived by applying the Laplace transform either directly to the Matérn covariance function or to its spectral density (2.3). For example, in the case of the Newton kernel, $q\left(p\right)=\frac{1}{p}$, the Laplace–Gauss transform representation takes the form

$\frac{1}{p}=\frac{2}{\sqrt{\pi }}{\int }_{{ℝ}_{+}}{e}^{-{p}^{2}{t}^{2}}dt,\quad p>0.$

In this case, ${𝐪}_{k}^{\left(\mathrm{\ell }\right)}={𝐪}_{-k}^{\left(\mathrm{\ell }\right)}$, and the sum of (3.8) reduces to $k=0,1,\mathrm{\dots },M$, implying that $R=M+1$. Similarly, the Laplace transform representation of the Slater function $q\left(p\right)={e}^{-2\sqrt{\alpha p}}$ (i.e., exponential covariance) with $p={\parallel x\parallel }^{2}$ can be written as

$q\left(p\right)={e}^{-2\sqrt{\alpha p}}=\frac{\sqrt{\alpha }}{\sqrt{\pi }}{\int }_{{ℝ}_{+}}{t}^{-\frac{3}{2}}\mathrm{exp}\left(-\frac{\alpha }{t}-pt\right)dt.$

When the Matérn spectral density in (2.3) has an even dimension parameter $d=2{d}_{1}$, ${d}_{1}=1,2,\mathrm{\dots }$ and $\nu =0,1,2,\mathrm{\dots }$, the Laplace transform

$\frac{\eta !}{{\left(p+a\right)}^{\eta +1}}={\int }_{{ℝ}_{+}}{t}^{\eta }{e}^{-at}{e}^{-pt}dt$

can be applied after substituting $p={\parallel \xi \parallel }^{2}$ in

$q\left(p\right)=\beta {\left(1+\frac{{\mathrm{\ell }}^{2}}{2\nu }p\right)}^{\eta },\eta =-\nu -{d}_{1}.$

If $-\eta =\nu +{d}_{1}=\frac{1}{2},\frac{3}{2},\frac{5}{2},\mathrm{\dots },\frac{2k+1}{2},\mathrm{\dots }$, then the Laplace transform is

$\frac{\left(2\eta \right)!\sqrt{\pi }}{\eta !{4}^{\eta }}\frac{1}{{\left(p+a\right)}^{\eta }\sqrt{p+a}}={\int }_{{ℝ}_{+}}\frac{{t}^{\eta }}{\sqrt{t}}{e}^{-at}{e}^{-pt}dt,\eta \in ℕ.$

## 3.6 Covariance Matrix in Rank-Structured Tensor Format

In what follows, we consider the CP approximation of the radial function $q\left(r\right)$ in the positive sector, i.e., on the domain ${\left[0,b\right]}^{3}$ (by symmetry, the canonical tensor can be extended to the whole computational domain ${\left[-b,b\right]}^{3}$). Let the covariance function $q\left(r\right)={C}_{\nu ,\mathrm{\ell }}\left(r\right)$ in (2.2) be represented by the rank-R symmetric CP tensor on an $n×n×n$ tensor grid, denoted by ${\mathrm{\Omega }}_{n}\subset \mathrm{\Omega }={\left[0,b\right]}^{3}$ as described in the previous sections,

$q\left(r\right)↦ℚ\approx {ℚ}_{R}=\sum _{k=1}^{R}{𝐪}_{k}^{\left(1\right)}\otimes {𝐪}_{k}^{\left(2\right)}\otimes {𝐪}_{k}^{\left(3\right)}\in {ℝ}^{n×n×n}$(3.9)

with the same skeleton vectors ${𝐪}_{k}^{\left(\mathrm{\ell }\right)}\in {ℝ}^{n}$ for $\mathrm{\ell }=1,2,3$.

We define the covariance matrix $𝐂=\left[{c}_{𝐢,𝐣}\right]\in {ℝ}^{𝐧×𝐧}$ entry-wise by

${c}_{𝐢,𝐣}={C}_{\nu ,\mathrm{\ell }}\left(\parallel {x}_{𝐢}-{y}_{𝐣}\parallel \right),𝐢,𝐣\in \mathcal{ℐ}.$

Using the tensor representation (3.9), we represent the large ${n}^{3}×{n}^{3}$ matrix $𝐂$ in the rank-R Kronecker (tensor) format as

$𝐂\approx {𝐂}_{R}=\sum _{k=1}^{R}{𝐐}_{k}^{\left(1\right)}\otimes {𝐐}_{k}^{\left(2\right)}\otimes {𝐐}_{k}^{\left(3\right)},$(3.10)

where the symmetric Toeplitz matrix ${𝐐}_{k}^{\left(\mathrm{\ell }\right)}=\mathrm{Toepl}\left[{𝐪}_{k}^{\left(\mathrm{\ell }\right)}\right]\in {ℝ}^{n×n}$, $\mathrm{\ell }=1,2,3$, is defined by its first column, which is specified by the skeleton vectors ${𝐪}_{k}^{\left(1\right)}={𝐪}_{k}^{\left(2\right)}={𝐪}_{k}^{\left(3\right)}$ in the decomposition (3.9).
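The Kronecker format (3.10) can be verified on a small grid. The sketch below uses a kernel that is exactly a sum of two Gaussians, so that the skeleton vectors are known in closed form; the weights `a`, exponents `t`, grid size, and the helper `toeplitz_sym` are assumptions for illustration only.

```python
import itertools
import numpy as np

def toeplitz_sym(first_col):
    """Symmetric Toeplitz matrix defined by its first column."""
    idx = np.arange(len(first_col))
    return first_col[np.abs(idx[:, None] - idx[None, :])]

n, b = 5, 1.0
grid = np.linspace(0.0, b, n)            # one-dimensional grid on [0, b]

# Two Gaussian terms a_k * exp(-t_k^2 r^2); weights and exponents are made up.
a = np.array([1.0, 0.5])
t = np.array([1.0, 2.0])

# Skeleton vectors q_k = a_k^(1/3) * exp(-t_k^2 x^2), the same in each direction.
skeleton = [a_k ** (1.0 / 3.0) * np.exp(-(t_k * grid) ** 2) for a_k, t_k in zip(a, t)]

# Rank-R Kronecker representation (3.10) with R = 2.
C_R = sum(np.kron(np.kron(Q, Q), Q) for Q in map(toeplitz_sym, skeleton))

# Reference: evaluate the kernel directly at all pairs of grid points.
pts = np.array(list(itertools.product(grid, repeat=3)))
dist2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
C_direct = sum(a_k * np.exp(-t_k ** 2 * dist2) for a_k, t_k in zip(a, t))

max_diff = np.max(np.abs(C_R - C_direct))
print(C_R.shape, max_diff)
```

Storing the skeleton vectors takes $Rn$ numbers, whereas the dense matrix has ${n}^{6}$ entries.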

Figure 5 illustrates eight selected canonical generating vectors from ${𝐪}_{k}^{\left(1\right)}$ for $k=1,\mathrm{\dots },R$, $R=34$, on a grid of size $n=2049$ for the Slater function ${e}^{-{\parallel x\parallel }^{p}}$, which defines the corresponding Toeplitz matrices.

Figure 5

Selected eight canonical vectors from the full set ${𝐪}_{k}^{\left(1\right)}$, $k=1,\mathrm{\dots },R$, see (3.9).

A Toeplitz matrix can be multiplied by a vector in $O\left(n\mathrm{log}n\right)$ operations by embedding it into a circulant matrix. In general, the inverse of a Toeplitz matrix cannot be calculated in closed form, unlike that of a circulant matrix, which can be diagonalized by the Fourier transform.
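The circulant embedding can be sketched in a few lines. The function name `toeplitz_matvec` and the test vector are illustrative; the dense matrix is built only to check the result.

```python
import numpy as np

def toeplitz_matvec(first_col, x):
    """y = T x for the symmetric Toeplitz matrix T with the given first column."""
    n = len(first_col)
    # First column of the (2n-1) x (2n-1) circulant matrix whose leading
    # n x n block is T: [q_0, ..., q_{n-1}, q_{n-1}, ..., q_1].
    c = np.concatenate([first_col, first_col[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n - 1)])
    # A circulant matrix-vector product is a circular convolution.
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n].real

rng = np.random.default_rng(2)
n = 64
q = np.exp(-np.linspace(0.0, 1.0, n))        # samples of an exponential covariance
x = rng.standard_normal(n)

idx = np.arange(n)
T = q[np.abs(idx[:, None] - idx[None, :])]   # dense Toeplitz matrix, for checking only
err = np.max(np.abs(T @ x - toeplitz_matvec(q, x)))
print(err)
```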

In the rest of this subsection, we introduce a numerical scheme, based on specific properties of the skeleton matrices in the symmetric rank-R decomposition (3.10), that is capable of rank-structured calculations of the analytic matrix functions $\mathcal{ℱ}\left({𝐂}_{R}\right)$. We discuss the most interesting examples of the functions ${\mathcal{ℱ}}_{1}\left(𝐂\right)={𝐂}^{-1}$ and ${\mathcal{ℱ}}_{2}\left(𝐂\right)={𝐂}^{\frac{1}{2}}$, where $𝐂={𝐂}_{R}$.

Given a symmetric positive definite matrix $𝐀$ such that $\parallel 𝐀\parallel =q<1$, the matrix-valued function $\mathcal{ℱ}\left(𝐀\right)$ is given as the exponentially fast converging series

$\mathcal{ℱ}\left(𝐀\right)=𝐄+{a}_{1}𝐀+{a}_{2}{𝐀}^{2}+\mathrm{\cdots },$(3.11)

where the matrix $𝐀$, acting in the multidimensional index set, allows the low-rank Kronecker tensor decomposition. Then the low-rank tensor approximation of $\mathcal{ℱ}\left(𝐀\right)$ can be computed by the “add-and-compress” scheme, where each term in the series above (3.11) is summed using the rank-truncation algorithm in the corresponding format.

To limit the rank-structured evaluation of ${\mathcal{ℱ}}_{1}\left(𝐂\right)$ to the described framework, we propose the special rank-structured additive splitting of the covariance matrix $𝐂$ with the easily invertible dominating part. To that end, we construct the diagonal matrix ${𝐐}_{0}^{\left(1\right)}$ by assembling all of the diagonal sub-matrices in ${𝐐}_{k}^{\left(1\right)}$, $k=1,\mathrm{\dots },R$ (in the following, we simplify the notation by omitting the upper index $\left(1\right)$):

${𝐐}_{0}:=\sum _{k=1}^{R}\mathrm{diag}\left({𝐐}_{k}\right),$

and modify each Toeplitz matrix ${𝐐}_{k}$ by subtracting its diagonal part,

${𝐐}_{k}↦{\stackrel{^}{𝐐}}_{k}:={𝐐}_{k}-\mathrm{diag}\left({𝐐}_{k}\right),k=1,\mathrm{\dots },R.$

Using the matrices defined above, we introduce the rank $R+1$ additive splitting of $𝐂$, which is defined by the skeleton matrices ${𝐐}_{0}$ and ${\stackrel{^}{𝐐}}_{k}$ since $𝐂$ is Kronecker symmetrical,

$𝐂={𝐐}_{0}\otimes {𝐐}_{0}\otimes {𝐐}_{0}+\sum _{k=1}^{R}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}\otimes {\stackrel{^}{𝐐}}_{k}^{\left(2\right)}\otimes {\stackrel{^}{𝐐}}_{k}^{\left(3\right)}.$

Hence, we have

${𝐂}^{-1}={𝐐}_{0}^{-1}\otimes {𝐐}_{0}^{-1}\otimes {𝐐}_{0}^{-1}{\left(𝐄+\sum _{k=1}^{R}{𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}\otimes {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(2\right)}\otimes {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(3\right)}\right)}^{-1}.$

Likewise, since ${𝐐}_{0}\otimes {𝐐}_{0}\otimes {𝐐}_{0}$ is a scaled identity, we obtain

${𝐂}^{\frac{1}{2}}={𝐐}_{0}^{\frac{1}{2}}\otimes {𝐐}_{0}^{\frac{1}{2}}\otimes {𝐐}_{0}^{\frac{1}{2}}{\left(𝐄+\sum _{k=1}^{R}{𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}\otimes {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(2\right)}\otimes {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(3\right)}\right)}^{\frac{1}{2}}.$(3.12)

We assume that $\parallel {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}\parallel <1$ for $k=1,\mathrm{\dots },R$ in some norm; thus, we can apply the “add-and-compress” scheme described above.
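The series part of this scheme can be illustrated with dense matrices. The sketch below splits a small Kronecker-structured matrix into its diagonal part and a remainder and sums the Neumann series for the inverse; it deliberately omits the rank truncation ("compress") step, and the toy Toeplitz factor and the number of terms are assumptions.

```python
import numpy as np

def neumann_inverse(B, n_terms):
    """Partial sum E - B + B^2 - ... of the series for (E + B)^{-1}, ||B|| < 1."""
    E = np.eye(B.shape[0])
    total, power = E.copy(), E.copy()
    for _ in range(1, n_terms):
        power = -(power @ B)
        total += power
    return total

n = 6
idx = np.arange(n)
Q = np.exp(-3.0 * np.abs(idx[:, None] - idx[None, :]))  # toy symmetric Toeplitz factor
C = np.kron(Q, Q)                                       # small Kronecker-structured matrix
D = np.diag(np.diag(C))                                 # easily invertible diagonal part
B = np.linalg.solve(D, C - D)                           # preconditioned remainder
norm_B = np.linalg.norm(B, 2)

C_inv = neumann_inverse(B, 30) @ np.linalg.inv(D)       # C^{-1} = (E + B)^{-1} D^{-1}
residual = np.linalg.norm(C_inv @ C - np.eye(n * n))
print(norm_B, residual)
```

In a rank-structured implementation each new power would be formed in the Kronecker format and truncated before it is added to the running sum.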

We illustrate the “add-and-compress” computational scheme in the following example. We consider the covariance matrix ${𝐂}_{R}$, obtained by a rank-R $\mathrm{sinc}$ approximation of the Slater function ${e}^{-{\parallel x\parallel }^{p}}$ with $R=40$ on a grid with $n=1025$ sampling points. Figure 6 demonstrates the decay of the norms of both the matrices ${𝐐}_{k}^{\left(1\right)}$ (left) and the scaled, preconditioned matrices ${𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}$, $k=1,\mathrm{\dots },R$ (right). We use the scaling factor of $\frac{1}{n}$. The right figure indicates that the analytic matrix functions $\mathcal{ℱ}\left({𝐂}_{R}\right)$ can be evaluated by using an exponentially fast convergent power series supported by the “add-and-compress” strategy to control the tensor rank.

Figure 6

Scaled norms $\parallel {𝐐}_{k}^{\left(1\right)}\parallel$ (left) and $\parallel {𝐐}_{0}^{-1}{\stackrel{^}{𝐐}}_{k}^{\left(1\right)}\parallel$ (right) vs. $k=1,\mathrm{\dots },R$.

In the Kriging calculations (Task 3 above), the low-rank tensor structure in the covariance matrix ${𝐂}_{R}$ can be directly utilized if the sampling points in the Kriging algorithm form a smaller ${m}_{1}×{m}_{2}×{m}_{3}$ tensor sub-grid of the initial $n×n×n$ tensor grid ${\mathrm{\Omega }}_{n}$ with ${m}_{\mathrm{\ell }}<n$. The same argument also applies to the evaluation of conditional covariance. In the general case of “non-tensor” locations of the sampling points, some mixed tensor factorizations could be applied.

## 3.7 Numerical Illustrations

In what follows, we check some examples of the low-rank Tucker tensor approximation of the p-Slater function $C\left(x\right)={e}^{-{\parallel x\parallel }^{p}}$, and Matérn kernels with full-grid tensor representation. We demonstrate the fast exponential convergence of the tensor approximation in the Tucker rank. The functions were sampled on the ${n}_{1}×{n}_{2}×{n}_{3}$ three-dimensional Cartesian grid with ${n}_{\mathrm{\ell }}=100$, $\mathrm{\ell }=1,2,3$.

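For a continuous function $q:\mathrm{\Omega }\to ℝ$, where $\mathrm{\Omega }:={\prod }_{\mathrm{\ell }=1}^{d}\left[-{b}_{\mathrm{\ell }},{b}_{\mathrm{\ell }}\right]\subset {ℝ}^{d}$, and $0<{b}_{\mathrm{\ell }}<\mathrm{\infty }$, we introduce the collocation-type function-related tensor of order d:

$𝐐=\left[{q}_{{i}_{1}\mathrm{\dots }{i}_{d}}\right]\in {ℝ}^{{n}_{1}×\mathrm{\dots }×{n}_{d}},\quad {q}_{{i}_{1}\mathrm{\dots }{i}_{d}}=q\left({x}_{{i}_{1}}^{\left(1\right)},\mathrm{\dots },{x}_{{i}_{d}}^{\left(d\right)}\right),$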

where $\left({x}_{{i}_{1}}^{\left(1\right)},\mathrm{\dots },{x}_{{i}_{d}}^{\left(d\right)}\right)\in {ℝ}^{d}$ are grid collocation points, indexed by $\mathcal{ℐ}={I}_{1}×\mathrm{\dots }×{I}_{d}$,

${x}_{{i}_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}=-{b}_{\mathrm{\ell }}+\left({i}_{\mathrm{\ell }}-1\right){h}_{\mathrm{\ell }},{i}_{\mathrm{\ell }}=1,2,\mathrm{\dots },{n}_{\mathrm{\ell }},\mathrm{\ell }=1,\mathrm{\dots },d,$

which are the nodes of equally spaced subintervals with a mesh size of ${h}_{\mathrm{\ell }}=\frac{2{b}_{\mathrm{\ell }}}{{n}_{\mathrm{\ell }}-1}$.

We test the convergence of the error in the relative Frobenius norm with respect to the Tucker rank for p-Slater functions with $p=0.1,0.2,1.9,2.0$. The relative error is computed as

${E}_{FN}=\frac{\parallel 𝐐-{𝐐}_{\left(r\right)}\parallel }{\parallel 𝐐\parallel },$(3.13)

where ${𝐐}_{\left(r\right)}$ is the tensor reconstructed from the Tucker rank-r decomposition of $𝐐$.
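The error measure (3.13) can be reproduced on a small grid. The sketch below uses a truncated HOSVD as a simple stand-in for the Tucker-ALS algorithm, applied to the Slater function with $p=1$; the grid size `n = 32`, the box size `b = 5`, and the function name `hosvd_truncate` are assumptions, so the numbers are not those of the figures.

```python
import numpy as np

def hosvd_truncate(T, r):
    """Rank-(r, r, r) Tucker approximation of a 3D array by truncated HOSVD."""
    factors = []
    for mode in range(3):
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        left, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(left[:, :r])
    core = np.einsum("ijl,ia,jb,lc->abc", T, *factors)
    return np.einsum("abc,ia,jb,lc->ijl", core, *factors)

n, b = 32, 5.0
x = np.linspace(-b, b, n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
Q = np.exp(-np.sqrt(X**2 + Y**2 + Z**2))     # Slater function with p = 1

# Relative Frobenius error (3.13) for several Tucker ranks.
errors = {r: np.linalg.norm(Q - hosvd_truncate(Q, r)) / np.linalg.norm(Q)
          for r in (1, 2, 4, 8)}
for r, e in errors.items():
    print(r, e)
```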

Figure 7

Convergence in the Frobenius error (3.13) with respect to the Tucker rank for the function (3.14) with $p=0.1,0.2,1.9,2.0$ (left); decay of singular values of the weighted Slater function (right).

Figure 8

Cross section of the three-dimensional radial function (3.14) with $p=0.1$ (left) and $p=1.9$ (right) at level $z=0$.

Figure 9

Multigrid Tucker: convergence with respect to Tucker ranks of a Slater function with $p=1$ on a sequence of grids.

Figure 10

Convergence with respect to the Tucker rank of three-dimensional spectral density of Matérn covariance (3.15) with $\alpha =0.1$ (left) and $\alpha =100$ (right).

Figure 7 shows convergence of the Frobenius error with respect to the Tucker rank for the p-Slater function discretized on the three-dimensional Cartesian grid

$C\left(x,y\right)={e}^{-{\parallel x-y\parallel }^{p}}$(3.14)

for different values of the parameter p. These functions are illustrated in Figure 8 for $p=0.1$ and $p=1.9$. Figure 9 shows for a Slater function with $p=1$ the dependence of the Tucker decomposition error in the Frobenius norm (3.13) on the Tucker rank, for the increasing grid parameter n.

Figure 10 shows the convergence with respect to the Tucker rank for the spectral density of Matérn covariance

${f}_{\alpha ,\nu }\left(\rho \right):=\frac{C}{{\left({\alpha }^{2}+{\rho }^{2}\right)}^{\nu +\frac{d}{2}}},$(3.15)

where $\alpha \in \left(0.1,100\right)$ and $d=1,2,3$. The Tucker decomposition rank depends strongly on the parameter α and only weakly on the parameter ν. The three-dimensional Matérn functions with the parameters $\nu =0.4$, $\alpha =0.1$ (left) and $\nu =0.4$, $\alpha =100$ (right) are presented in Figure 11, showing the function at $z=0$.

These numerical experiments demonstrate the good algebraic separability of the typical multidimensional functions used in spatial statistics, which motivates us to apply low-rank tensor decomposition methods to multidimensional problems of statistical data analysis.

Figure 11

The shape of three-dimensional spectral density of Matérn covariance (3.15) with $\alpha =0.1$ (left) and $\alpha =100$ (right).

## 4 Solutions to Typical Tasks in Low-Rank Tensor Format

In this section, we walk through the solutions to the statistical questions raised in the motivation above, Section 2. We add some lemmas to summarize the new computing and storage costs. Let $N={n}^{d}$, measurement vector $𝐳\in {ℝ}^{m}$, ${𝐂}_{ss}\in {ℝ}^{N×N}$, ${𝐂}_{zz}\in {ℝ}^{m×m}$, and ${𝐂}_{sz}\in {ℝ}^{N×m}$. We also introduce the restriction operator P, which consists of only ones and zeros and picks the sub-indices $\left\{{i}_{1},\mathrm{\dots },{i}_{m}\right\}$ from the whole index set $\left\{{i}_{1},\mathrm{\dots },{i}_{N}\right\}$. This operator P has a rank-1 tensor structure, i.e., $P={\otimes }_{\nu =1}^{d}{P}_{\nu }$. An application of this restriction tensor does not change tensor ranks.

## Computing Matrix-Vector Product.

Let $𝐂={\sum }_{i=1}^{r}{\otimes }_{\mu =1}^{d}{𝐂}_{i\mu }$. If $𝐳$ is separable, i.e., $\parallel 𝐳-{\sum }_{j=1}^{{r}_{z}}{\otimes }_{\nu =1}^{d}{𝐳}_{j\nu }\parallel \le \epsilon$, then

$\mathrm{𝐂𝐳}=\sum _{i=1}^{r}\sum _{j=1}^{{r}_{z}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }{𝐳}_{j\mu }.$

If $𝐳$ is non-separable, then low-rank tensor properties cannot be employed and either the FFT idea [51] or the hierarchical matrix technique should be applied instead [38, 21, 19, 22].

#### Lemma 4.1.

The computing cost of the product $\mathrm{𝐂𝐳}$ is reduced from $\mathcal{O}\left({N}^{2}\right)$ to $\mathcal{O}\left(r{r}_{z}d{n}^{2}\right)$, where $N={n}^{d}$, $d\ge 1$.
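The product formula rests on the identity $\left(𝐀\otimes 𝐁\right)\left(𝐱\otimes 𝐲\right)=𝐀𝐱\otimes 𝐁𝐲$. A minimal sketch with random factors follows; the sizes, the helper `kron_all`, and the dense reference are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r, r_z = 6, 3, 2, 3   # illustrative sizes, N = n**d = 216

C_factors = [[rng.standard_normal((n, n)) for _ in range(d)] for _ in range(r)]
z_factors = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r_z)]

def kron_all(items):
    """Kronecker product of a list of matrices or vectors."""
    out = items[0]
    for item in items[1:]:
        out = np.kron(out, item)
    return out

# Low-rank product: r * r_z terms, each built from d small matrix-vector products.
Cz_factors = [[C_mu @ z_mu for C_mu, z_mu in zip(C_i, z_j)]
              for C_i in C_factors for z_j in z_factors]

# Dense reference, formed only to check the identity on this tiny example.
C_full = sum(kron_all(C_i) for C_i in C_factors)
z_full = sum(kron_all(z_j) for z_j in z_factors)
Cz_full = sum(kron_all(term) for term in Cz_factors)
err = np.max(np.abs(C_full @ z_full - Cz_full))
print(len(Cz_factors), err)
```

The result has rank $r{r}_{z}$, so in practice a rank truncation usually follows.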

## Trace and Diagonal of $𝐂$.

Let $𝐂\approx \stackrel{~}{𝐂}={\sum }_{i=1}^{r}{\otimes }_{\mu =1}^{d}{𝐂}_{i\mu }$. Then

$\mathrm{diag}\left(\stackrel{~}{𝐂}\right)=\mathrm{diag}\left(\sum _{i=1}^{r}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }\right)=\sum _{i=1}^{r}\underset{\mu =1}{\overset{d}{\otimes }}\mathrm{diag}\left({𝐂}_{i\mu }\right),$(4.1)

$\mathrm{trace}\left(\stackrel{~}{𝐂}\right)=\mathrm{trace}\left(\sum _{i=1}^{r}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }\right)=\sum _{i=1}^{r}\prod _{\mu =1}^{d}\mathrm{trace}\left({𝐂}_{i\mu }\right).$(4.2)

The proof follows from the properties of the Kronecker tensors.
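Both identities are easy to confirm numerically. The sketch below uses random factor matrices and a dense reference of size ${n}^{d}=64$; all sizes and the helper `kron_all` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, r = 4, 3, 3
factors = [[rng.standard_normal((n, n)) for _ in range(d)] for _ in range(r)]

def kron_all(items):
    """Kronecker product of a list of matrices or vectors."""
    out = items[0]
    for item in items[1:]:
        out = np.kron(out, item)
    return out

# (4.2): the trace of each term is the product of the traces of its factors.
trace_lowrank = sum(np.prod([np.trace(C_mu) for C_mu in term]) for term in factors)

# (4.1): the diagonal of each term is the Kronecker product of the factor diagonals.
diag_lowrank = sum(kron_all([np.diag(C_mu) for C_mu in term]) for term in factors)

C_full = sum(kron_all(term) for term in factors)   # dense check only
print(trace_lowrank - np.trace(C_full))
```

The trace needs only the $rd$ small traces; the diagonal is itself a rank-$r$ tensor of length-$n$ vectors and need not be expanded.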

#### Lemma 4.2.

The cost of computing the right-hand sides in (4.1)–(4.2) is $\mathcal{O}\left(rdn\right)$, where ${𝐂}_{i\mu }\in {ℝ}^{n×n}$.

For simplicity, we assume that ${n}_{1}={n}_{2}=\mathrm{\dots }={n}_{d}=n$ and ${\sum }_{i=1}^{d}{n}_{i}=dn$.

#### Lemma 4.3.

The computing cost of $\mathrm{diag}\left(𝐂\right)$ and $\mathrm{trace}\left(𝐂\right)$ is reduced from $\mathcal{O}\left(N\right)$ to $\mathcal{O}\left(rdn\right)$. The cost of $\mathrm{det}\left(𝐂\right)$ is reduced from $\mathcal{O}\left({N}^{3}\right)$ to $\mathcal{O}\left(d{n}^{3}\right)$.

#### Example 4.1.

A simple Matlab test for computing $\mathrm{trace}\left(𝐂\right)$ on a workstation with 128 GB of RAM produces the computing times in seconds shown in Table 1.

Table 1

Computing time (in seconds) to set up and compute the trace of $\stackrel{~}{𝐂}={\sum }_{j=1}^{r}{\otimes }_{\nu =1}^{d}{𝐂}_{j\nu }$, $r=10$, $\stackrel{~}{𝐂}\in {ℝ}^{N×N}$, where $N={n}^{d}$, $d=1000$ and $n\in \left\{100,500,1000\right\}$. A modern desktop computer with 128 GB RAM was used.

In what follows, we discuss the computation of trace and diag in the case of the Tucker representation of the generating Matérn function on a ${\left(2k+1\right)}^{\otimes d}$ grid and present the corresponding numerical example for $d=3$. We notice that, due to the Toeplitz structure of the skeleton matrices of size $\left(2k+1\right)×\left(2k+1\right)$, the diagonal of the covariance matrix $𝐂$ is the weighted Tucker sum of the Kronecker products of scaled identity matrices in ${ℝ}^{2k+1}$. The scaling factor is determined by the value of the generating Tucker vector ${𝐯}_{{\nu }_{\mathrm{\ell }}}^{\left(\mathrm{\ell }\right)}\left(k+1\right)$ corresponding to the origin of the computational box ${\left[-b,b\right]}^{d}$. Let the matrix $𝐂$ be composed by using the Tucker tensor $𝐀$ approximating the generating Matérn function. Then, as a straightforward consequence of the above remark, we derive the simple representations

$\mathrm{diag}\left(𝐂\right)=𝐀\left({x}_{0}\right){𝐄}^{\left(d\right)},\mathrm{trace}\left(𝐂\right)=𝐀\left({x}_{0}\right){\left(2k+1\right)}^{d},$

where ${x}_{0}$ corresponds to the origin $x=0$ in ${\left[-b,b\right]}^{d}$ and ${𝐄}^{\left(d\right)}$ is the identity matrix in the full tensor space. The grid coordinate of ${x}_{0}$ is determined by the multi-index $\left(k+1,\mathrm{\dots },k+1\right)$.

Table 2 presents the values of $𝐀\left({x}_{0}\right)$ computed by the Tucker approximation to the three-dimensional Slater function ${e}^{-\parallel x\parallel }$ on the ${\left(2k+1\right)}^{\otimes 3}$ grids with $n=2k+1=129,257,513$, and for different Tucker rank parameters $r=1,2,\mathrm{\dots },10$.

Table 2

The error of the Tucker approximation to the value $𝐀$ at the origin (the exact value is equal to 1) versus the Tucker rank and the grid size $n=2k+1$.

Given the rank-$𝐫$ Tucker tensor $𝐀$, the complexity for calculation of $𝐀\left({x}_{0}\right)$ is estimated by $O\left({r}^{d}\right)$.

## Computing Square Root ${𝐂}^{\frac{1}{2}}$.

Observe that ${𝐂}^{\frac{1}{2}}$ can be computed as in (3.12). An iterative method for computing ${𝐂}^{\frac{1}{2}}$ is presented in [16].

## Linear Solvers in a Low-Rank Tensor Format.

There is already a well-developed theory for solving linear systems $\mathrm{𝐂𝐰}=𝐳$ with a symmetric and positive definite matrix $𝐂$ in a tensor format. We refer to the overview works [34, 35, 18, 20]. Some particular linear solvers are developed in [13, 2, 24, 36, 12, 11]. We also recommend using the Tensor Toolbox [41], which contains routines for the CP and Tucker tensor formats.

## 4.1 Computing ${𝐳}^{T}{𝐂}^{-1}𝐳$

#### Lemma 4.4.

Let $\parallel 𝐳-{\sum }_{j=1}^{{r}_{z}}{\otimes }_{\mu =1}^{d}{𝐳}_{j\mu }\parallel \le \epsilon$. We assume that there is an iterative method that can be used to solve the linear system $\mathrm{𝐂𝐰}=𝐳$ in a low-rank tensor format and to find the solution in the form $𝐰={\sum }_{i=1}^{r}{\otimes }_{\mu =1}^{d}{𝐰}_{i\mu }$. Then the quadratic form ${𝐳}^{T}{𝐂}^{-1}𝐳$ is a sum of products of scalar products:

${𝐳}^{T}{𝐂}^{-1}𝐳=\sum _{i=1}^{r}\sum _{j=1}^{{r}_{z}}\prod _{\mu =1}^{d}\left({𝐰}_{i\mu },{𝐳}_{j\mu }\right).$

The proof follows from the definition and properties of the tensor and scalar products. If $𝐳$ is non-separable, then low-rank tensor properties cannot be employed and the FFT idea [51] or the hierarchical matrix technique [38] should be employed.

#### Lemma 4.5.

The computing cost of the quadratic form ${𝐳}^{T}{𝐂}^{-1}𝐳$ is the product of the number of required iterations and the cost of one iteration, which is $\mathcal{O}\left(r{r}_{z}d{m}^{2}\right)$ (assuming that the iterative method requires only matrix-vector products).

The proof follows from the definitions and properties of the tensor and scalar products.
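The scalar-product formula of Lemma 4.4 can be illustrated in a case where the low-rank solution is available without an iterative solver. The sketch below assumes a rank-1 matrix $𝐂={𝐂}_{1}\otimes {𝐂}_{2}\otimes {𝐂}_{3}$ with random symmetric positive definite factors, so that the system decouples into small solves; this special case and all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, r_z = 5, 3, 2

def spd(size):
    """Random symmetric positive definite matrix."""
    A = rng.standard_normal((size, size))
    return A @ A.T + size * np.eye(size)

def kron_all(items):
    """Kronecker product of a list of matrices or vectors."""
    out = items[0]
    for item in items[1:]:
        out = np.kron(out, item)
    return out

C_f = [spd(n) for _ in range(d)]                    # C = C_1 (x) C_2 (x) C_3
z_f = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r_z)]

# For a rank-1 C the system C w = z decouples into d small solves per term of z.
w_f = [[np.linalg.solve(C_mu, z_mu) for C_mu, z_mu in zip(C_f, z_j)] for z_j in z_f]

# Quadratic form as a double sum of products of one-dimensional scalar products.
quad_lowrank = sum(np.prod([w_mu @ z_mu for w_mu, z_mu in zip(w_i, z_j)])
                   for w_i in w_f for z_j in z_f)

# Dense reference on N = n**d = 125 unknowns, for checking only.
z_full = sum(kron_all(z_j) for z_j in z_f)
quad_full = z_full @ np.linalg.solve(kron_all(C_f), z_full)
print(quad_lowrank, quad_full)
```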

## 4.2 Interpolation by Simple Kriging

The three most computationally demanding tasks in Kriging are:

• (1) solving an $M×M$ system of equations to obtain the Kriging weights,

• (2) obtaining the $N×1$ Kriging estimate by superposing the Kriging weights with the $N×M$ cross-covariance matrix between the measurements and the unknowns,

• (3) evaluating the $N×1$ estimation variance as the diagonal of an $N×N$ conditional covariance matrix [51].

Here, M refers to the number of measured data values, and N refers to the number of estimation points. When optimizing the design of the sampling patterns, the challenge is to evaluate the scalar measures of the $N×N$ conditional covariance matrix (see ${\phi }_{A}$ and ${\phi }_{C}$ in equation (2.1) and Task 4) repeatedly within a high-dimensional and non-linear optimization procedure (e.g., [42, 58]).

The following Kriging formula is well known [51]:

$\stackrel{^}{𝐬}={𝐂}_{sz}{𝐂}_{zz}^{-1}𝐳.$

#### Lemma 4.6.

If $\parallel {𝐂}_{sz}-{\sum }_{i=1}^{{r}_{C}}{\otimes }_{\mu =1}^{d}{𝐂}_{i\mu }\parallel \le \epsilon$ for some small $\epsilon \ge 0$ and Lemma 4.4 holds, then

${𝐂}_{sz}{𝐂}_{zz}^{-1}𝐳\approx \sum _{i=1}^{{r}_{z}}\sum _{j=1}^{{r}_{C}}\underset{\nu =1}{\overset{d}{\otimes }}{𝐂}_{i\nu }{𝐰}_{j\nu }.$(4.3)

The proof follows from the definitions and the properties of the tensor and scalar products.

#### Lemma 4.7.

The computing cost of evaluating ${𝐂}_{zz}^{-1}𝐳$, i.e., of solving the linear system ${𝐂}_{zz}𝐰=𝐳$, is $\mathcal{O}\left(\#\mathrm{iters}\cdot {r}_{z}rd{m}^{2}\right)$. Computation of the Kriging coefficients by equation (4.3) costs $\mathcal{O}\left({r}_{z}{r}_{C}dnm\right)+\mathcal{O}\left(\#\mathrm{iters}\cdot {r}_{z}rd{m}^{2}\right)$.

If $𝐳$ is non-separable, then the low-rank tensor properties cannot be employed and either the FFT idea [51] or the hierarchical matrix technique [38] should be applied.

## 4.3 Computing Conditional Covariance

Let $𝐲\in {ℝ}^{m}$ be the vector of measurements. The conditional covariance matrix is

${𝐂}_{ss|y}={𝐂}_{ss}-{𝐂}_{sy}{𝐂}_{yy}^{-1}{𝐂}_{ys}.$

The associated estimation variance ${\stackrel{^}{𝝈}}_{𝐬}$ is the diagonal of the $N×N$ conditional covariance matrix ${𝐂}_{ss|y}$:

${\stackrel{^}{𝝈}}_{𝐬}=\mathrm{diag}\left({𝐂}_{ss|y}\right)=\mathrm{diag}\left({𝐂}_{ss}-{𝐂}_{sy}{𝐂}_{yy}^{-1}{𝐂}_{ys}\right).$(4.4)

Let us assume that the measurements are taken at locations that form a subset of the total set of nodes $\mathcal{ℐ}=\left\{0,\mathrm{\dots },N-1\right\}$, i.e., ${\mathcal{ℐ}}_{\mathcal{ℳ}}=\left\{{i}_{1},\mathrm{\dots },{i}_{m}\right\}\subset \mathcal{ℐ}$. We also assume that the nodes ${\mathcal{ℐ}}_{\mathcal{ℳ}}$ belong to a tensor mesh, i.e., if $\mathcal{ℐ}={\otimes }_{\nu =1}^{d}{I}_{\nu }$ and ${\mathcal{ℐ}}_{\mathcal{ℳ}}={\otimes }_{\nu =1}^{d}{\stackrel{^}{I}}_{\nu }$, then ${\stackrel{^}{I}}_{\nu }\subseteq {I}_{\nu }$.

Let

${𝐂}_{yy}=\sum _{k=1}^{r}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{k\mu }.$

Again, we use low-rank tensor solvers, this time to solve the matrix system ${𝐂}_{yy}𝐖={𝐂}_{ys}$. We obtain the solution

$𝐖={𝐂}_{yy}^{-1}{𝐂}_{ys}=\sum _{j=1}^{{r}_{w}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{j\mu }.$

Assuming that ${𝐂}_{sy}\approx {\sum }_{i=1}^{{r}_{C}}{\otimes }_{\mu =1}^{d}{𝐂}_{i\mu }$, we obtain

${𝐂}_{sy}𝐖={𝐂}_{sy}{𝐂}_{yy}^{-1}{𝐂}_{ys}=\sum _{i=1}^{{r}_{C}}\underset{\nu =1}{\overset{d}{\otimes }}{𝐂}_{i\nu }\sum _{j=1}^{{r}_{w}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{j\mu }=\sum _{i=1}^{{r}_{C}}\sum _{j=1}^{{r}_{w}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }{𝐂}_{j\mu }\approx \sum _{j=1}^{{r}_{0}}\underset{\mu =1}{\overset{d}{\otimes }}{\stackrel{~}{𝐂}}_{j\mu },$

where ${r}_{0}>0$ is the new rank after a rank-truncation procedure and ${\stackrel{~}{𝐂}}_{j\mu }$ are new factors. Let

${𝐂}_{ss}=\sum _{i=1}^{{r}_{s}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }.$

The conditional covariance is

${𝐂}_{ss|y}=\sum _{i=1}^{{r}_{s}}\underset{\mu =1}{\overset{d}{\otimes }}{𝐂}_{i\mu }-\sum _{i=1}^{{r}_{0}}\underset{\mu =1}{\overset{d}{\otimes }}{\stackrel{~}{𝐂}}_{i\mu }=\sum _{i=1}^{{r}_{s}+{r}_{0}}\underset{\mu =1}{\overset{d}{\otimes }}{\stackrel{^}{𝐂}}_{i\mu },$

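where ${\stackrel{^}{𝐂}}_{i\mu }={𝐂}_{i\mu }$ for $1\le i\le {r}_{s}$, while for ${r}_{s}<i\le {r}_{s}+{r}_{0}$ the factors are those of the subtracted term with the minus sign absorbed into a single factor, e.g., ${\stackrel{^}{𝐂}}_{i1}=-{\stackrel{~}{𝐂}}_{\left(i-{r}_{s}\right)1}$ and ${\stackrel{^}{𝐂}}_{i\mu }={\stackrel{~}{𝐂}}_{\left(i-{r}_{s}\right)\mu }$ for $\mu =2,\mathrm{\dots },d$.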
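For a separable covariance and measurements on a tensor sub-grid, every step of this derivation stays in rank-1 form, which gives a compact check of (4.4). The sketch below assumes $d=2$, a Gaussian one-dimensional covariance factor, and the measured node set `I_hat` in each direction; these choices are illustrative.

```python
import itertools
import numpy as np

n = 8
x = np.linspace(0.0, 1.0, n)
C1 = np.exp(-(x[:, None] - x[None, :]) ** 2)   # 1D Gaussian covariance factor
I_hat = [0, 3, 6]                              # measured nodes in each direction

# One-dimensional factor of C_sy C_yy^{-1} C_ys.
C_sy1 = C1[:, I_hat]
C_yy1 = C1[np.ix_(I_hat, I_hat)]
S1 = C_sy1 @ np.linalg.solve(C_yy1, C_sy1.T)

# Estimation variance (4.4) for d = 2: difference of two rank-1 Kronecker terms.
var_lowrank = np.kron(np.diag(C1), np.diag(C1)) - np.kron(np.diag(S1), np.diag(S1))

# Dense reference on the full N = n**2 grid, for checking only.
C_ss = np.kron(C1, C1)
idx = [i1 * n + i2 for i1, i2 in itertools.product(I_hat, repeat=2)]
C_sy = C_ss[:, idx]
C_yy = C_ss[np.ix_(idx, idx)]
var_dense = np.diag(C_ss - C_sy @ np.linalg.solve(C_yy, C_sy.T))
max_diff = np.max(np.abs(var_lowrank - var_dense))
print(max_diff)
```

At the measured nodes the variance vanishes up to rounding, as expected for exact interpolation.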

## 4.4 Example: Separable Covariance Matrices

Let

$cov\left(𝐱,𝐲\right)=\mathrm{exp}\left(-{|𝐱-𝐲|}^{2}\right)$

be the Gaussian covariance function, where $𝐱=\left({x}_{1},\mathrm{\dots },{x}_{d}\right)$ and $𝐲=\left({y}_{1},\mathrm{\dots },{y}_{d}\right)\in \mathcal{𝒟}\subset {ℝ}^{d}$. The function $cov\left(𝐱,𝐲\right)$ can be written as a tensor product of one-dimensional functions:

$cov\left(𝐱,𝐲\right)=\mathrm{exp}\left(-{|{x}_{1}-{y}_{1}|}^{2}\right)\otimes \mathrm{\dots }\otimes \mathrm{exp}\left(-{|{x}_{d}-{y}_{d}|}^{2}\right).$

After discretization of $cov\left(𝐱,𝐲\right)$, we obtain $𝐂$ as a rank-1 Kronecker product of the one-dimensional covariance matrices, i.e.,

$𝐂={𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}.$(4.5)

We note that arbitrary discretization (anisotropy) can occur in any direction.

#### Lemma 4.8.

If d Cholesky decompositions exist, i.e., ${𝐂}_{i}={𝐋}_{i}\cdot {𝐋}_{i}^{T}$ for $i=1,\mathrm{\dots },d$, then

${𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}=\left({𝐋}_{1}{𝐋}_{1}^{T}\right)\otimes \mathrm{\dots }\otimes \left({𝐋}_{d}{𝐋}_{d}^{T}\right)=\left({𝐋}_{1}\otimes \mathrm{\dots }\otimes {𝐋}_{d}\right)\cdot \left({𝐋}_{1}^{T}\otimes \mathrm{\dots }\otimes {𝐋}_{d}^{T}\right)=:𝐋\cdot {𝐋}^{T},$

where $𝐋:={𝐋}_{1}\otimes \mathrm{\dots }\otimes {𝐋}_{d}$ and ${𝐋}^{T}:={𝐋}_{1}^{T}\otimes \mathrm{\dots }\otimes {𝐋}_{d}^{T}$ are also lower- and upper-triangular matrices, respectively.

Lemma 4.8 shows that

• (a)

the Gaussian covariance function in dimensions $d>1$ can be written as the tensor product of one-dimensional covariance functions,

• (b)

its Cholesky factor can be computed via Cholesky factors computed from one-dimensional covariances.

The computational complexity drops from $\mathcal{𝒪}\left(N\mathrm{log}N\right)$, $N={n}^{d}$, to $\mathcal{𝒪}\left(dn\mathrm{log}n\right)$, where n is the number of mesh points in a one-dimensional problem. Further research is required on non-Gaussian covariance functions.
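Lemma 4.8 can be verified on a small example. The sketch below uses NumPy, hypothetical grid sizes, and a unit mesh width chosen only to keep the Gaussian covariance matrices well conditioned; the dense Kronecker product is built purely as a reference and would never be formed in a real computation.

```python
import numpy as np

def gauss_cov_1d(x):
    """1D Gaussian covariance matrix with entries exp(-|x_i - x_j|^2)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2)

# One-dimensional covariance matrices (sizes are illustrative assumptions).
C1 = gauss_cov_1d(np.linspace(0.0, 3.0, 4))
C2 = gauss_cov_1d(np.linspace(0.0, 4.0, 5))

# Cholesky factors of the small matrices only.
L1 = np.linalg.cholesky(C1)
L2 = np.linalg.cholesky(C2)
L_kron = np.kron(L1, L2)

# Reference: Cholesky factor of the full Kronecker product.
L_dense = np.linalg.cholesky(np.kron(C1, C2))

chol_err = np.abs(L_kron - L_dense).max()
is_lower = np.allclose(L_kron, np.tril(L_kron))
print(chol_err, is_lower)
```

The two factors agree because the Cholesky factor with positive diagonal is unique and the Kronecker product of lower-triangular matrices with positive diagonals is again of this form.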

#### Lemma 4.9.

Let $𝐂={𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}$. If the inverse matrices ${𝐂}_{i}^{-1}$, $i=1,\mathrm{\dots },d$, exist, then

${\left({𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}\right)}^{-1}={𝐂}_{1}^{-1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}^{-1}.$

The computational complexity drops from $\mathcal{𝒪}\left(N\mathrm{log}N\right)$, $N={n}^{d}$, to $\mathcal{𝒪}\left(dn\mathrm{log}n\right)$, where n is the number of mesh points in a one-dimensional problem.

#### Remark 4.10.

We assume here that we have an efficient method to invert $𝐂$ (e.g., FFT or hierarchical matrices) with a cost of $\mathcal{𝒪}\left(N\mathrm{log}N\right)$. If not, then the cost drops from $\mathcal{𝒪}\left({N}^{3}\right)$ to $\mathcal{𝒪}\left(d{n}^{3}\right)$ (using standard Gaussian elimination).
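Lemma 4.9 also allows a linear system with $𝐂$ to be solved by working with the small factors only. The following NumPy sketch for $d=2$ is an illustration under stated assumptions: the sizes and the random right-hand side are hypothetical, dense solvers stand in for the FFT or hierarchical-matrix solvers mentioned in Remark 4.10, and the row-major reshaping identity $\left(𝐀\otimes 𝐁\right)\mathrm{vec}\left(𝐗\right)=\mathrm{vec}\left(𝐀𝐗{𝐁}^{T}\right)$ is used.

```python
import numpy as np

def gauss_cov_1d(x):
    """1D Gaussian covariance matrix with entries exp(-|x_i - x_j|^2)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2)

rng = np.random.default_rng(0)
n1, n2 = 4, 6                                   # hypothetical sizes
C1 = gauss_cov_1d(np.arange(n1, dtype=float))
C2 = gauss_cov_1d(np.arange(n2, dtype=float))
b = rng.standard_normal(n1 * n2)

# Lemma 4.9: the inverse of the Kronecker product is the product of inverses.
inv_dense = np.linalg.inv(np.kron(C1, C2))
inv_kron = np.kron(np.linalg.inv(C1), np.linalg.inv(C2))
inv_err = np.abs(inv_dense - inv_kron).max()

# Solve (C1 kron C2) x = b using the factors only (row-major reshape).
B = b.reshape(n1, n2)
X = np.linalg.solve(C1, B)          # applies C1^{-1} from the left
X = np.linalg.solve(C2, X.T).T      # applies C2^{-T} from the right
x_kron = X.ravel()

# Reference solution with the full matrix, formed here only for checking.
x_dense = np.linalg.solve(np.kron(C1, C2), b)
solve_err = np.abs(x_kron - x_dense).max()
print(inv_err, solve_err)
```

Only systems of size ${n}_{1}$ and ${n}_{2}$ are solved in the factored variant, which is the source of the cost reduction stated above.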

#### Lemma 4.11.

If ${𝐂}_{i}\in {ℝ}^{{n}_{i}×{n}_{i}}$, $i=1,\mathrm{\dots },d$, are covariance matrices, then we can compute $\mathrm{log}\,\mathrm{det}\,𝐂$, where $𝐂={𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}$ is a separable rank-1 d-dimensional covariance matrix, as

$\mathrm{log}\,\mathrm{det}\left({𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}\right)=\sum _{j=1}^{d}\mathrm{log}\,\mathrm{det}\,{𝐂}_{j}\prod _{i=1,i\ne j}^{d}{n}_{i}.$(4.6)

#### Proof.

We check for $d=2$ that

$\mathrm{det}\left({𝐂}_{1}\otimes {𝐂}_{2}\right)=\mathrm{det}{\left({𝐂}_{1}\right)}^{{n}_{2}}\cdot \mathrm{det}{\left({𝐂}_{2}\right)}^{{n}_{1}}$

and then apply mathematical induction. ∎

The computational cost drops again from $\mathcal{𝒪}\left(N\mathrm{log}N\right)$, $N={n}^{d}$, to $\mathcal{𝒪}\left(dn\mathrm{log}n\right)$. An assumption analogous to that of Remark 4.10 applies to the computation of $\mathrm{det}\left(𝐂\right)$.
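Formula (4.6) can be checked numerically for $d=3$ with different mesh sizes per direction. In the NumPy sketch below, the function name `logdet_kron`, the sizes, and the unit mesh width are illustrative assumptions; the dense $60×60$ matrix is built only as a reference.

```python
import numpy as np
from functools import reduce

def gauss_cov_1d(x):
    """1D Gaussian covariance matrix with entries exp(-|x_i - x_j|^2)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2)

def logdet_kron(factors):
    """log det of C_1 kron ... kron C_d, computed from the factors as in (4.6)."""
    sizes = [C.shape[0] for C in factors]
    N = int(np.prod(sizes))
    total = 0.0
    for C, n in zip(factors, sizes):
        sign, logabsdet = np.linalg.slogdet(C)
        if sign <= 0:
            raise ValueError("factor is not positive definite")
        total += logabsdet * (N // n)   # N // n is the product of the other n_i
    return total

sizes = (3, 4, 5)                       # hypothetical n_1, n_2, n_3
factors = [gauss_cov_1d(np.arange(n, dtype=float)) for n in sizes]

logdet_factored = logdet_kron(factors)
logdet_dense = np.linalg.slogdet(reduce(np.kron, factors))[1]
logdet_err = abs(logdet_factored - logdet_dense)
print(logdet_factored, logdet_err)
```

Working with `slogdet` on the factors avoids forming the determinant itself, which would under- or overflow quickly for large N.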

#### Example 4.2.

Let $n=6000$, $d=3$, and $N={6000}^{3}$. Using MATLAB on a MacBookPro with 16 GB RAM, the time required to set up the matrices ${𝐂}_{1}$, ${𝐂}_{2}$, and ${𝐂}_{3}$ is 11 seconds; it takes 4 seconds to compute ${𝐋}_{1}$, ${𝐋}_{2}$, and ${𝐋}_{3}$. The large matrices $𝐂$ and $𝐋$ are never constructed (i.e., the Kronecker product is never calculated).

#### Example 4.3.

In previous work [43] we used the hierarchical matrix technique to approximate ${𝐂}_{i}$ and its Cholesky factor ${𝐋}_{i}$ for $n=2\cdot {10}^{6}$ in 2 minutes. Here, we combine the hierarchical matrix technique and the Kronecker tensor product. Assuming $𝐂={𝐂}_{1}\otimes \mathrm{\dots }\otimes {𝐂}_{d}$, we approximate $𝐂$ for $N={\left(2\cdot {10}^{6}\right)}^{d}$ in $2d$ minutes.

#### Lemma 4.12.

Let $𝐂$ be the same as in equation (4.5), and let Lemma 4.4 hold. Then we apply the property in equation (4.6) to obtain a tensor approximation of the log-likelihood:

$\mathcal{ℒ}\approx \stackrel{~}{\mathcal{ℒ}}=-\frac{{\prod }_{\nu =1}^{d}{n}_{\nu }}{2}\mathrm{log}\left(2\pi \right)-\frac{1}{2}\sum _{j=1}^{d}\mathrm{log}\,\mathrm{det}\,{𝐂}_{j}\prod _{i=1,i\ne j}^{d}{n}_{i}-\frac{1}{2}\sum _{i=1}^{r}\sum _{j=1}^{r}\prod _{\nu =1}^{d}\left({𝐮}_{i,\nu }^{T},{𝐮}_{j,\nu }\right).$(4.7)

Equation (4.7) shows one disadvantage of the Gaussian log-likelihood function in high dimensions. Namely, the log-likelihood grows exponentially with d as ${n}^{d}$.
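The structure of (4.7) can be illustrated with a small NumPy sketch. Several assumptions are made here and are not taken from the source: the standard Gaussian log-likelihood normalization with factors of one half is used, the data vector is assumed to have the low-rank form $𝐳={\sum }_{i=1}^{r}{\otimes }_{\nu }{𝐳}_{i,\nu }$, and the vectors ${𝐮}_{i,\nu }$ are taken to be ${𝐋}_{\nu }^{-1}{𝐳}_{i,\nu }$ with ${𝐋}_{\nu }$ the Cholesky factor of ${𝐂}_{\nu }$. Sizes and random data are hypothetical.

```python
import numpy as np
from functools import reduce

def gauss_cov_1d(x):
    """1D Gaussian covariance matrix with entries exp(-|x_i - x_j|^2)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-diff**2)

rng = np.random.default_rng(1)
sizes = (3, 4, 5)                       # hypothetical n_1, n_2, n_3
r = 2                                   # assumed rank of the data vector
d = len(sizes)
N = int(np.prod(sizes))
factors = [gauss_cov_1d(np.arange(n, dtype=float)) for n in sizes]

# Assumed low-rank data: z = sum_i z_{i,1} kron z_{i,2} kron z_{i,3}.
z_terms = [[rng.standard_normal(n) for n in sizes] for _ in range(r)]

# log det C from the factors, as in (4.6).
logdet = sum(np.linalg.slogdet(C)[1] * (N // n) for C, n in zip(factors, sizes))

# u_{i,nu} = L_nu^{-1} z_{i,nu}, so that z^T C^{-1} z = sum_ij prod_nu (u_i, u_j).
chol = [np.linalg.cholesky(C) for C in factors]
u = [[np.linalg.solve(L, zv) for L, zv in zip(chol, term)] for term in z_terms]
quad = 0.0
for i in range(r):
    for j in range(r):
        quad += np.prod([u[i][nu] @ u[j][nu] for nu in range(d)])

loglik_tensor = -0.5 * N * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad

# Dense reference, feasible only because N = 60 here.
C = reduce(np.kron, factors)
z = sum(reduce(np.kron, term) for term in z_terms)
loglik_dense = (-0.5 * N * np.log(2.0 * np.pi)
                - 0.5 * np.linalg.slogdet(C)[1]
                - 0.5 * (z @ np.linalg.solve(C, z)))
loglik_err = abs(loglik_tensor - loglik_dense)
print(loglik_tensor, loglik_err)
```

The factored evaluation touches only matrices of size ${n}_{\nu }$ and vectors of length ${n}_{\nu }$, while the first two terms still scale with $N={\prod }_{\nu }{n}_{\nu }$ in magnitude.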

## 5 Conclusion

In this work, we demonstrated that the basic functions and operators used in spatial statistics may be represented using rank-structured tensor formats and that the error of this representation exhibits exponential decay with respect to the tensor rank. We applied the Tucker and canonical tensor decompositions to a family of Matérn-type and Slater-type functions with varying parameters and demonstrated numerically that their approximations exhibit exponentially fast convergence. A low-rank tensor approximation of the Matérn covariance function and its Fourier transform was considered. We separated the radial basis functions using the Laplace transform to prove the existence of such low-rank approximations, and applied the $\mathrm{sinc}$ quadrature method to estimate the tensor ranks and accuracy.

We also demonstrated how to compute $diag\left(𝐂\right)$, $\mathrm{trace}\left(𝐂\right)$, the matrix-vector product, Kriging operations, and the geostatistical optimal design in a low-rank tensor format at a linear cost. For a matrix $𝐂$ of size $N×N$ with $N={6000}^{3}$ and tensor rank 1, we were able to compute the Cholesky factorization in 15 seconds. We also computed the Tucker approximation to the three-dimensional Slater function ${e}^{-\parallel x\parallel }$ on a grid with ${513}^{3}$ points for Tucker ranks $r=1,2,\mathrm{\dots },10$. Furthermore, we demonstrated how to compute $\mathrm{trace}\left(𝐂\right)$ for $N={n}^{d}={1000}^{1000}$, which might be useful in machine learning. In this paper, operations such as computing the Cholesky factorization, inverse, and determinant have been implemented for rank-1 tensors (e.g., the Gaussian covariance has tensor rank 1). These formulas could be useful for developing successive rank-1 updates in greedy algorithms. Further investigations are needed to represent these quantities for ranks higher than one.

Additionally, in Section 3.7 we studied the influence of the parameters of the Matérn covariance function on the tensor ranks (Figures 7 and 9). We observed (see Figure 10) that the tensor ranks depend only weakly on the parameters of the Matérn covariance function and grow slowly. In this paper, we also highlighted that big data statistical problems can be treated effectively by using special low-rank tensor techniques.

## References

• [1]

S. Ambikasaran, J. Y. Li, P. K. Kitanidis and E. Darve, Large-scale stochastic linear inversion using hierarchical matrices, Comput. Geosci. 17 (2013), no. 6, 913–927.

• [2]

J. Ballani and D. Kressner, Sparse inverse covariance estimation with hierarchical matrices, preprint (2015), http://sma.epfl.ch/~anchpcommon/publications/quic_ballani_kressner_2014.pdf.

• [3]

C. Bertoglio and B. N. Khoromskij, Low-rank quadrature-based tensor approximation of the Galerkin projected Newton/Yukawa kernels, Comput. Phys. Commun. 183 (2012), no. 4, 904–912.

• [4]

S. Börm and J. Garcke, Approximating Gaussian processes with ${H}^{2}$-matrices, Proceedings of 18th European Conference on Machine Learning—ECML 2007, Lecture Notes in Artificial Intelligence 4701, Springer, Berlin (2007), 42–53.

• [5]

S. F. Boys, G. B. Cook, C. M. Reeves and I. Shavitt, Automatic fundamental calculations of molecular structure, Nature 178 (1956), 1207–1209.

• [6]

D. Braess, Nonlinear Approximation Theory, Springer Ser. Comput. Math. 7, Springer, Berlin, 1986.

• [7]

J.-P. Chilès and P. Delfiner, Geostatistics, Wiley Ser. Probab. Stat., John Wiley & Sons, New York, 1999.

• [8]

A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, New York, 2002.

• [9]

S. De Iaco, S. Maggio, M. Palma and D. Posa, Toward an automatic procedure for modeling multivariate space-time data, Comput. Geosci. 41 (2011), 10.1016/j.cageo.2011.08.008.

• [10]

L. De Lathauwer, B. De Moor and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. 21 (2000), no. 4, 1253–1278.

• [11]

S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Computation of the response surface in the tensor train data format, preprint (2014), https://arxiv.org/abs/1406.2816.

• [12]

S. Dolgov, B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the tensor train format, SIAM/ASA J. Uncertain. Quantif. 3 (2015), no. 1, 1109–1135.

• [13]

S. Dolgov, B. N. Khoromskij and D. Savostyanov, Superfast Fourier transform using QTT approximation, J. Fourier Anal. Appl. 18 (2012), no. 5, 915–953.

• [14]

P. A. Finke, D. J. Brus, M. F. P. Bierkens, T. Hoogland, M. Knotters and F. De Vries, Mapping groundwater dynamics using multiple sources of exhaustive high resolution data, Geoderma 123 (2004), no. 1, 23–39.

• [15]

R. Furrer and M. G. Genton, Aggregation-cokriging for highly multivariate spatial data, Biometrika 98 (2011), no. 3, 615–631.

• [16]

I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Data-sparse approximation to a class of operator-valued functions, Math. Comp. 74 (2005), no. 250, 681–708.

• [17]

I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Hierarchical tensor-product approximation to the inverse and related operators for high-dimensional elliptic problems, Computing 74 (2005), no. 2, 131–157.

• [18]

L. Grasedyck, D. Kressner and C. Tobler, A literature survey of low-rank tensor approximation techniques, GAMM-Mitt. 36 (2013), no. 1, 53–78.

• [19]

W. Hackbusch, A sparse matrix arithmetic based on $\mathcal{ℋ}$-matrices. I. Introduction to $\mathcal{ℋ}$-matrices, Computing 62 (1999), no. 2, 89–108.

• [20]

W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Ser. Comput. Math. 42, Springer, Heidelberg, 2012.

• [21]

W. Hackbusch, Hierarchical Matrices: Algorithms and Analysis, Springer Ser. Comput. Math. 49, Springer, Heidelberg, 2015.

• [22]

W. Hackbusch and B. N. Khoromskij, A sparse $\mathcal{ℋ}$-matrix arithmetic. II. Application to multi-dimensional problems, Computing 64 (2000), no. 1, 21–47.

• [23]

W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. I. Separable approximation of multi-variate functions, Computing 76 (2006), no. 3–4, 177–202.

• [24]

W. Hackbusch and B. N. Khoromskij, Low-rank Kronecker-product approximation to multi-dimensional nonlocal operators. II. HKT representation of certain operators, Computing 76 (2006), no. 3–4, 203–225.

• [25]

M. S. Handcock and M. L. Stein, A Bayesian analysis of Kriging, Technometrics 35 (1993), 403–410.

• [26]

H. Harbrecht, M. Peters and M. Siebenmorgen, Efficient approximation of random fields for numerical applications, Numer. Linear Algebra Appl. 22 (2015), no. 4, 596–617.

• [27]

J. Håstad, Tensor rank is NP-complete, J. Algorithms 11 (1990), no. 4, 644–654.

• [28]

M. R. Haylock, N. Hofstra, A. M. Klein Tank, E. J. Klok, P. D. Jones and M. New, A European daily high-resolution gridded data set of surface temperature and precipitation for 1950–2006, J. Geophys. Res. 113 (2008), 10.1029/2008JD010201.

• [29]

F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys. 6 (1927), 164–189.

• [30]

A. G. Journel and C. J. Huijbregts, Mining Geostatistics, Academic Press, New York, 1978.

• [31]

V. Khoromskaia, Computation of the Hartree–Fock exchange by the tensor-structured methods, Comput. Methods Appl. Math. 10 (2010), no. 2, 204–218.

• [32]

V. Khoromskaia and B. N. Khoromskij, Fast tensor method for summation of long-range potentials on 3D lattices with defects, Numer. Linear Algebra Appl. 23 (2016), no. 2, 249–271.

• [33]

B. N. Khoromskij, Structured rank-$\left({R}_{1},\mathrm{\dots },{R}_{D}\right)$ decomposition of function-related tensors in ${ℝ}^{D}$, Comput. Methods Appl. Math. 6 (2006), no. 2, 194–220.

• [34]

B. N. Khoromskij, Tensors-structured numerical methods in scientific computing: Survey on recent advances, Chemometr. Intell. Laboratory Syst. 110 (2011), no. 1, 1–19.

• [35]

B. N. Khoromskij, Tensor numerical methods for multidimensional PDEs: Theoretical analysis and initial applications, CEMRACS 2013—Modelling and Simulation of Complex Systems: Stochastic and Deterministic Approaches, ESAIM Proc. Surveys 48, EDP Sci., Les Ulis (2015), 1–28.

• [36]

B. N. Khoromskij and V. Khoromskaia, Low rank Tucker-type tensor approximation to classical potentials, Cent. Eur. J. Math. 5 (2007), no. 3, 523–550.

• [37]

B. N. Khoromskij and V. Khoromskaia, Multigrid accelerated tensor approximation of function related multidimensional arrays, SIAM J. Sci. Comput. 31 (2009), no. 4, 3002–3026.

• [38]

B. N. Khoromskij, A. Litvinenko and H. G. Matthies, Application of hierarchical matrices for computing the Karhunen–Loève expansion, Computing 84 (2009), no. 1–2, 49–67.

• [39]

P. K. Kitanidis, Introduction to Geostatistics, Cambridge University Press, Cambridge, 1997.

• [40]

T. G. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23 (2001), no. 1, 243–255.

• [41]

T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (2009), no. 3, 455–500.

• [42]

J. B. Kollat, P. M. Reed and J. R. Kasprzyk, A new epsilon-dominance hierarchical Bayesian optimization algorithm for large multiobjective monitoring network design problems, Adv. Water Res. 31 (2008), no. 5, 828–845.

• [43]

A. Litvinenko, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, preprint (2017), https://arxiv.org/abs/1709.08625.

• [44]

A. Litvinenko, Y. Sun, M. G. Genton and D. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, preprint (2017), https://arxiv.org/abs/1709.04419.

• [45]

B. Matérn, Spatial Variation, 2nd ed., Lecture Notes in Statist. 36, Springer, Berlin, 1986.

• [46]

G. Matheron, The Theory of Regionalized Variables and its Applications, Ecole de Mines, Fontainebleau, 1971.

• [47]

V. Minden, A. Damle, K. L. Ho and L. Ying, Fast spatial Gaussian process maximum likelihood estimation via skeletonization factorizations, Multiscale Model. Simul. 15 (2017), no. 4, 1584–1611.

• [48]

W. G. Müller, Collecting Spatial Data. Optimum Design of Experiments for Random Fields, 3rd ed., Contrib. Statist., Springer, Berlin, 2007.

• [49]

G. R. North, J. Wang and M. G. Genton, Correlation models for temperature fields, J. Climate 24 (2011), 5850–5862.

• [50]

W. Nowak, Measures of parameter uncertainty in geostatistical estimation and geostatistical optimal design, Math. Geosci. 42 (2010), no. 2, 199–221.

• [51]

W. Nowak and A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: Combining low-rank covariance approximations with FFT-techniques, Math. Geosci. 45 (2013), no. 4, 411–435.

• [52]

D. Nychka, S. Bandyopadhyay, D. Hammerling, F. Lindgren and S. Sain, A multiresolution Gaussian process model for the analysis of large spatial datasets, J. Comput. Graph. Statist. 24 (2015), no. 2, 579–599.

• [53]

I. V. Oseledets, Tensor-train decomposition, SIAM J. Sci. Comput. 33 (2011), no. 5, 2295–2317.

• [54]

J. Quiñonero Candela and C. E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, J. Mach. Learn. Res. 6 (2005), 1939–1959.

• [55]

C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, Adapt. Comput. Mach. Learn., MIT, Cambridge, 2006.

• [56]

A. K. Saibaba, S. Ambikasaran, J. Yue Li, P. K. Kitanidis and E. F. Darve, Application of hierarchical matrices to linear inverse problems in geostatistics, Oil Gas Sci. Technol. Rev. IFP Energ. Nouv. 67 (2012), no. 5, 857–875.

• [57]

U. Schollwöck, The density-matrix renormalization group in the age of matrix product states, Ann. Physics 326 (2011), no. 1, 96–192.

• [58]

R. Shah and P. Reed, Comparative analysis of multiobjective evolutionary algorithms for random and correlated instances of multiobjective d-dimensional knapsack problems, European J. Oper. Res. 211 (2011), no. 3, 466–479.

• [59]

A. K. Smilde, R. Bro and P. Geladi, Multi-Way Analysis with Applications in the Chemical Sciences, Wiley, New York, 2004.

• [60]

G. Spöck and J. Pilz, Spatial sampling design and covariance-robust minimax prediction based on convex design ideas, Stoch. Environmental Res. Risk Assess. 24 (2010), 463–482.

• [61]

M. L. Stein, J. Chen and M. Anitescu, Difference filter preconditioning for large covariance matrices, SIAM J. Matrix Anal. Appl. 33 (2012), no. 1, 52–72.

• [62]

M. L. Stein, Z. Chi and L. J. Welty, Approximating likelihoods for large spatial data sets, J. R. Stat. Soc. Ser. B Stat. Methodol. 66 (2004), no. 2, 275–296.

• [63]

F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer Ser. Comput. Math. 20, Springer, New York, 1993.

• [64]

Y. Sun and M. L. Stein, Statistically and computationally efficient estimating equations for large spatial datasets, J. Comput. Graph. Statist. 25 (2016), no. 1, 187–208.

• [65]

L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika 31 (1966), 279–311.

• [66]

S. M. Wesson and G. G. S. Pegram, Radar rainfall image repair techniques, Hydrol. Earth Syst. Sci. 8 (2004), no. 2, 8220–8234.

Revised: 2018-03-12

Accepted: 2018-05-02

Published Online: 2018-07-07

Published in Print: 2019-01-01

The research reported in this publication was supported by funding from King Abdullah University of Science and Technology (KAUST).

Citation Information: Computational Methods in Applied Mathematics, Volume 19, Issue 1, Pages 101–122, ISSN (Online) 1609-9389, ISSN (Print) 1609-4840,

© 2018 Walter de Gruyter GmbH, Berlin/Boston.