Boris Khoromskij and Sergey Repin

Rank Structured Approximation Method for Quasi-Periodic Elliptic Problems

De Gruyter | Published online: June 17, 2017

Abstract

We consider an iteration method for solving an elliptic type boundary value problem 𝒜 ⁢ u = f , where a positive definite operator 𝒜 is generated by a quasi-periodic structure with rapidly changing coefficients (a typical period is characterized by a small parameter ϵ). The method is based on using a simpler operator 𝒜 0 (inversion of 𝒜 0 is much simpler than inversion of 𝒜 ), which can be viewed as a preconditioner for 𝒜 . We prove contraction of the iteration method and establish explicit estimates of the contraction factor q. Certainly the value of q depends on the difference between 𝒜 and 𝒜 0 . For typical quasi-periodic structures, we establish simple relations that suggest an optimal 𝒜 0 (in a selected set of “simple” structures) and compute the corresponding contraction factor. Further, this allows us to deduce fully computable two-sided a posteriori estimates able to control numerical solutions on any iteration. The method is especially efficient if the coefficients of 𝒜 admit low-rank representations and if algebraic operations are performed in tensor structured formats. Under moderate assumptions the storage and solution complexity of our approach depends only weakly (merely linear-logarithmically) on the frequency parameter 1 ϵ .

1 Introduction

Problems with periodic and quasi-periodic structures arise in various models in the natural sciences and in technical applications. Quantitative analysis of such problems requires special methods oriented towards their specific features. For perfectly periodic structures, efficient methods have been developed within the framework of homogenization theory (see, e.g., [1, 3, 8] and other literature cited therein). However, classical homogenization methods cover only one class of problems (all cells are self-similar and the number of cells is very large). In this paper, we use a different idea and suggest another modus operandi for the quantitative analysis of boundary value problems with periodic and quasi-periodic coefficients. It generates approximations converging (in the energy space) to the exact solution and provides guaranteed and computable error estimates. The approach is applicable to (see, e.g., Figures 1, 2)

  1. (i)

    periodic structures, in which the number of cells is considerable (e.g., 10 3 – 10 4 ) but not large enough to neglect the error generated by the respective homogenized model;

  2. (ii)

    quasi-periodic structures that contain cells with defects and deformations;

  3. (iii)

    multi-periodic structures where the coefficients reflect the combined effect of several functions with different periodicity.

In general terms, the idea of the method is as follows. We consider the problem 𝒫 :

(1.1) 𝒜 ⁢ u = f , f ∈ V * ,

where V is a reflexive Banach space with the norm ∥ ⋅ ∥ V , V * is the space conjugate to V (the respective duality pairing is denoted by 〈 v * , v 〉 ), and 𝒜 : V → V * is a bounded linear operator. It is assumed that the operator 𝒜 is positive definite and invertible, so that problem (1.1) is well posed. However, 𝒫 is viewed as a very difficult problem because 𝒜 is generated by a complicated physical structure, which may contain a huge amount of details. Therefore, attempts to solve (1.1) numerically by standard methods may lead to enormous expenditures. Similar difficulties arise if we wish to verify the quality of a numerical solution.

Assume that the operator 𝒜 is approximated by a simplified positive definite operator 𝒜 ∘ and that the inversion of 𝒜 ∘ is much simpler than the inversion of 𝒜 . By means of 𝒜 ∘ , we construct an iteration method based on solving a “simple” problem 𝒫 0 : 𝒜 ∘ u ∘ = g . In other words, the method is based on the operation g → 𝒜 ∘ - 1 g . It also includes the operation v → 𝒜 v , which can be performed very efficiently by tensor-type decomposition methods provided that the physical structures generating 𝒜 have low-rank representations. We prove that the iterations generate a sequence of functions converging to the exact solution of (1.1) at a geometric rate. Furthermore, we deduce explicitly computable and guaranteed a posteriori error estimates adapted to this class of problems. They evaluate the accuracy of approximations computed on each step of the iteration algorithm. These estimates also use only the inversion of 𝒜 ∘ and operations of the type v → 𝒜 v . In both the iteration method and the error estimates, inversion of the operator 𝒜 is avoided.

In this paper, we consider one class of problems associated with divergent type elliptic equations where 𝒜 = Q * ⁢ Λ ⁢ Q and 𝒜 ∘ = Q * ⁢ Λ ∘ ⁢ Q . Here Λ : Y → Y is a bounded operator induced by a complicated quasi-periodic structure while Q : V → Y and Q * : Y → V * are conjugate operators, i.e.,

( y , Q ⁢ w ) = 〈 Q * ⁢ y , w 〉   for all  ⁢ y ∈ Y ⁢  and  ⁢ w ∈ V ,

where Y is a Hilbert space with the scalar product ( ⋅ , ⋅ ) and the norm ∥ ⋅ ∥ . The operators Q and Q * are induced by differential operators or certain finite-dimensional approximations of them. Henceforth, it is assumed that f ∈ 𝒱 , where 𝒱 is a Hilbert space with the scalar product ( ⋅ , ⋅ ) 𝒱 . This space is intermediate between V and V * , i.e., V ⊂ 𝒱 ⊂ V * .

The operator 𝒜 ∘ = Q * ⁢ Λ ∘ ⁢ Q contains the operator Λ ∘ generated by a simplified structure. We assume that the operators Λ and Λ ∘ are Hermitian (i.e., ( Λ ⁢ y , z ) = ( y , Λ ⁢ z ) and ( Λ ∘ ⁢ y , z ) = ( y , Λ ∘ ⁢ z ) ) and satisfy the conditions

λ ⊖ ∘ ⁢ ∥ y ∥ 2 ≤ ( Λ ∘ ⁢ y , y ) ≤ λ ⊕ ∘ ⁢ ∥ y ∥ 2   for all  ⁢ y ∈ Y ,
λ ⊖ ⁢ ∥ y ∥ 2 ≤ ( Λ ⁢ y , y ) ≤ λ ⊕ ⁢ ∥ y ∥ 2 , λ ⊖ < λ ⊕ .

Then, the structural operators Λ and Λ ∘ are spectrally equivalent:

(1.2) c 1 ⁢ ( Λ ∘ ⁢ y , y ) ≤ ( Λ ⁢ y , y ) ≤ c 2 ⁢ ( Λ ∘ ⁢ y , y ) ,

where the constants are the minimal and maximal eigenvalues of the generalized spectral problem Λ ⁢ y - μ ⁢ Λ ∘ ⁢ y = 0 . Obviously, they satisfy the estimates c 1 ≥ λ ⊖ / λ ⊕ ∘ and c 2 ≤ λ ⊕ / λ ⊖ ∘ (which may be rather coarse).

Concerning the operator Q, we assume that there exists a positive constant c such that

∥ Q ⁢ w ∥ ≥ c ⁢ ∥ w ∥ V   for all  ⁢ w ∈ V .

Generalized solutions of the problems 𝒫 and 𝒫 0 are defined by the variational identities

(1.3) ( Λ ⁢ Q ⁢ u , Q ⁢ w ) = 〈 f , w 〉   for all  ⁢ w ∈ V ,

and

(1.4) ( Λ ∘ ⁢ Q ⁢ u ∘ , Q ⁢ w ) = 〈 f ~ , w 〉   for all  ⁢ w ∈ V .

In Section 2, we show that a sequence { u k } converging to u in V can be constructed by solving problems (1.4) with specially constructed right-hand sides f ~ k generated by the residual of (1.3). In proving convergence, the key issue is analysis of the spectral radius of the operator

(1.5) 𝔹 ρ := 𝕀 - ρ ⁢ Λ ∘ - 1 ⁢ Λ ,

and the selection of a relaxation parameter ρ that provides the best convergence rate. Moreover, iteration procedures of this type become contracting if the iteration parameter is properly selected. This fact is often used in proving analytical results (see, e.g., [20], where classical results on existence and uniqueness of a variational inequality are established by contraction arguments). These ideas were also used in the construction of various numerical methods (see, e.g., [7]). However, achieving our goals requires more than the mere fact of contraction. We need explicit and realistic estimates of the contraction factor (which are used in the error analysis) and a practical method of finding Λ ∘ with minimal q. The latter task leads to a special optimization problem that defines the most efficient “simplified” operator Λ ∘ among a certain class of “admissible” operators. This question is studied in Section 3. In general, Λ and Λ ∘ can be induced by scalar, vector, and tensor functions. We show that the selection of the optimal structural operator Λ ∘ is reduced to a special interpolation type problem, which is purely algebraic and does not require solving a differential problem (therefore a suitable Λ ∘ can be found a priori). We discuss several examples and suggest the corresponding optimal (or quasi-optimal) Λ ∘ , which guarantees convergence of the iteration sequence with an explicitly known contraction factor.

Now, it is worth discussing the main differences between our approach and the classical homogenization method developed for regular periodic structures. This method operates with a homogenized boundary value problem Q * Λ H Q u H = f , where Λ H is defined by means of an auxiliary problem with periodic boundary conditions in the cell of periodicity. The respective solution u H contains an irremovable (modeling) error depending on the cell diameter ϵ. Moreover, if ϵ tends to zero, then typically u H converges to u only weakly (e.g., in L 2 ). Getting better convergence (e.g., in H 1 ) requires certain corrections, which lead to other (more complicated) boundary value problems in the cell of periodicity. The respective “corrected” solution u H c also contains an error. Typically, the error is proportional to ϵ and can be neglected only if the number of cells is very large. If our method is applied to perfectly periodic structures, then setting Λ ∘ := Λ H is one possible option. In this case, the homogenized operator (defined without correction procedures) is used for a different purpose: construction of a suitable preconditioning operator. The latter operator generates numerical solutions converging to the exact solution in the energy norm (i.e., the method is free from irremovable errors) and can be applied for a rather wide range of ϵ. In addition, the theory suggests other simpler ways of selecting a suitable Λ ∘ . In this context, it is interesting to know whether or not the choice Λ ∘ := Λ H always yields the minimal value of the contraction factor. In Section 3, we briefly discuss this question and present an example showing that the best Λ ∘ may differ from Λ H .

In Section 4, we deduce a posteriori estimates that provide fully computable and guaranteed estimates of the distance to the exact solution u for any numerical approximation u k , h computed for an approximation subspace V h . These estimates are established by combining functional type a posteriori estimates (see [25, 22, 26] and references cited therein) and estimates generated by the contraction property of the iteration method (see [24, 29]).

The second part of the paper is devoted to a fast solution method for the basic iteration problem (2.1). The key idea consists of using tensor-type representations for approximations, which is quite natural if both the coefficients of the respective quasi-periodic structure and the right-hand side admit low-rank tensor-type representations. We notice that the class of structures representable in terms of low-rank formats is much wider than the class of periodic structures covered by the homogenization method. The idea of tensor-type approximations of partial differential equations traces back to [9]. In computational mechanics this method is known as the Kantorovich–Krylov (or extended Kantorovich) method. However, it is rarely used in modern numerical technologies. In part, this is due to restrictions on the shape of the domain imposed by the Kantorovich method. Henceforth, we assume that the domain Ω satisfies these restrictions, i.e., it is a tensor-type domain (e.g., rectangular) or a union of tensor-type domains. This assumption induces certain geometrical limitations. However, they can be bypassed by such methods as coordinate transformation and domain decomposition, which are widely used in modern computational mathematics (e.g., in iso-geometric analysis).

Tensor numerical methods for steady-state and dynamical problems, based on advanced nonlinear tensor approximation algorithms, have been developed over the last ten years. A literature survey on modern tensor numerical methods for multi-dimensional PDEs can be found in [14, 16, 13]. In the context of the problems considered in this paper, we are mainly concerned with another specific feature: a very complicated material structure. In this case, direct application of standard finite element methods suffers from the necessity to account for the huge amount of information encoded in the coefficients (especially in multi-dimensional problems). We show that tensor-type methods allow us to reduce computations to a collection of one-dimensional problems, which can be solved very efficiently using low-rank representations with small storage requirements. Similar ideas are applied for computing a posteriori error estimates.

Section 5 discusses numerical aspects of the method and exposes several examples. Typical behavior of quasi-periodic coefficients is described by oscillation around a constant, modulated oscillation around a given smooth function, or oscillation around a piecewise constant function.

Figure 1: Examples of periodic and modulated periodic coefficients in 1D.

Figure 2: An example of modulated piecewise periodic coefficients in 2D.

Figure 1 (1D case) shows examples of a highly oscillating coefficient (left) and a modulated periodic coefficient (right). Figure 2 (2D case) illustrates a well-separable equation coefficient obtained as a sum of a step-type function and a uniformly oscillating function, namely,

a ⁢ ( x 1 , x 2 ) = C 0 ⁢ sign ⁡ ( x 1 ) + C 1 + sin ⁡ ( π ⁢ ω a ⁢ ( x 1 + x 2 ) ) ,

where C 0 = 5 , C 1 = 6 , a = 10 , and ω = 12 .

We show that specially constructed FEM-type approximations of PDEs with slightly perturbed or regularly modulated periodic coefficients on d-fold n × ⋯ × n tensor grids in ℝ d may lead to discretized algebraic equations with a low Kronecker rank stiffness matrix of size n d × n d , where n = O ( 1 / ϵ ) is proportional to the large frequency parameter 1 / ϵ . In this case, a rank decomposition with respect to the d spatial variables is applied, so that the discrete solution can be calculated in low-rank separable form, which requires a storage size of only O ( d n ) instead of the O ( n d ) = O ( 1 / ϵ d ) complexity mandatory for traditional FEM techniques (the latter quickly becomes a bottleneck for small ϵ > 0 ).

The arising linear system of equations can be solved by a preconditioned iteration with the simple preconditioner Λ ∘ , such that the storage and numerical costs scale almost linearly in the univariate discrete problem size n. Numerical examples in Section 5 demonstrate the stable geometric convergence of the preconditioned CG (PCG) iteration with the preconditioner Λ ∘ and confirm a low-rank approximate separable representation of the solution with respect to the d spatial variables even in the case of complicated quasi-periodic coefficients.

This approach is well suited for applying the quantized-TT (QTT) tensor approximation [15] to functions discretized on large tensor grids of size proportional to the frequency parameter, i.e., n = O ( 1 / ϵ ) , as was demonstrated in the previous paper [17] for the case d = 1 . The use of tensor-structured preconditioned iteration with adaptive QTT rank truncation may lead to logarithmic complexity in the grid size, O ( log p n ) ; see [14, 16, 23] for rank-truncated iterative methods, [11, 12, 4, 10] for various examples of the QTT tensor approximation to lattice structured systems, and [2] for the tensor approximation of complicated functions with multiple cusps in ℝ d .

In Section 6, we conclude with the discussion on further perspectives of the presented approach for 2D and 3D elliptic PDEs with quasi-periodic coefficients.

2 The Iteration Method

Let v ∈ V and ρ ∈ ℝ + . Consider the problem: find u v such that

(2.1) ( Λ ∘ ⁢ Q ⁢ u v , Q ⁢ w ) = ℓ v ∘ ⁢ ( w ) - ρ ⁢ ℓ v ⁢ ( w )   for all  ⁢ w ∈ V ,

where

ℓ v ⁢ ( w ) := ( Λ ⁢ Q ⁢ v , Q ⁢ w ) - 〈 f , w 〉   and   ℓ v ∘ ⁢ ( w ) := ( Λ ∘ ⁢ Q ⁢ v , Q ⁢ w ) .

Obviously, the right-hand side of (2.1) is a bounded linear functional on V, so that this problem has a unique solution u v . Thus, we have a mapping T ρ : V → V , which becomes a contraction if the parameter ρ is properly selected. Indeed, for any v 1 and v 2 in V, we obtain

( Λ ∘ ⁢ Q ⁢ η , Q ⁢ w ) = ( Λ ∘ ⁢ Q ⁢ ζ - ρ ⁢ Λ ⁢ Q ⁢ ζ , Q ⁢ w )   for all  ⁢ w ∈ V ,

where u 1 = T ρ ⁢ v 1 , u 2 = T ρ ⁢ v 2 , ζ := v 1 - v 2 , and η := u 1 - u 2 . Hence

∥ η ∥ ∘ 2 := ( Λ ∘ ⁢ Q ⁢ η , Q ⁢ η ) = ( Λ ∘ ⁢ Q ⁢ ζ , Q ⁢ η ) - ρ ⁢ ( Λ ⁢ Q ⁢ ζ , Q ⁢ η )
= ( Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ η ) - ρ ⁢ ( Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ η ) = ( Q ⁢ ζ - ρ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ η )
(2.2) ≤ ∥ η ∥ ∘ ⁢ ( Λ ∘ ⁢ Q ⁢ ζ - ρ ⁢ Λ ⁢ Q ⁢ ζ , Q ⁢ ζ - ρ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ ) 1 / 2 .

From (2.2) we find that

∥ η ∥ ∘ 2 ≤ ( Λ ∘ ⁢ Q ⁢ ζ , Q ⁢ ζ ) - 2 ⁢ ρ ⁢ ( Λ ⁢ Q ⁢ ζ , Q ⁢ ζ ) + ρ 2 ⁢ ( Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ⁢ Q ⁢ ζ )
= ( Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ ζ ) - 2 ⁢ ρ ⁢ ( Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ ζ ) + ρ 2 ⁢ ( Λ ∘ - 1 ⁢ Λ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ ζ )
= ( ( 𝕀 - 2 ⁢ ρ ⁢ Λ ∘ - 1 ⁢ Λ + ρ 2 ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Λ ∘ - 1 ⁢ Λ ) ⁢ Q ⁢ ζ , Λ ∘ ⁢ Q ⁢ ζ ) = ( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ Q ⁢ ζ , Q ⁢ ζ )
(2.3) ≤ ( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ Q ⁢ ζ , 𝔹 ρ 2 ⁢ Q ⁢ ζ ) 1 / 2 ⁢ ( Λ ∘ ⁢ Q ⁢ ζ , Q ⁢ ζ ) 1 / 2 ,

where 𝔹 ρ is defined by (1.5). If ρ is selected such that

(2.4) ( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ Q ⁢ ζ , Q ⁢ ζ ) ≤ q 2 ⁢ ∥ ζ ∥ ∘ 2   for some  ⁢ q < 1 ,

then (2.3) shows that T ρ is a contractive mapping.

It is not difficult to show that a ρ satisfying (2.4) can always be found. Indeed, in view of (1.2),

( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ Q ⁢ ζ , Q ⁢ ζ ) = ( Λ ∘ ⁢ Q ⁢ ζ , Q ⁢ ζ ) - 2 ⁢ ρ ⁢ ( Λ ⁢ Q ⁢ ζ , Q ⁢ ζ ) + ρ 2 ⁢ ( Λ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Q ⁢ ζ )
(2.5) ≤ ( 1 - 2 ⁢ ρ ⁢ c 1 ) ⁢ ( Λ ∘ ⁢ Q ⁢ ζ , Q ⁢ ζ ) + ρ 2 ⁢ ( Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ⁢ Q ⁢ ζ ) .

Since Λ and Λ ∘ are invertible with trivial kernels, μ and y μ are an eigenvalue and the respective eigenfunction of Λ ⁢ y μ = μ ⁢ Λ ∘ ⁢ y μ if and only if they are an eigenvalue and the eigenfunction of the problem Λ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ y μ = μ ⁢ Λ ⁢ y μ . This means that

c 1 ⁢ ( Λ ⁢ y , y ) ≤ ( Λ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ y , y ) ≤ c 2 ⁢ ( Λ ⁢ y , y ) ≤ c 2 2 ⁢ ( Λ ∘ ⁢ y , y ) .

Hence

( Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ ζ , Λ ⁢ Q ⁢ ζ ) ≤ c 2 2 ⁢ ∥ ζ ∥ ∘ 2

and (2.5) implies

( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ Q ⁢ ζ , Q ⁢ ζ ) ≤ ( 1 - 2 ⁢ ρ ⁢ c 1 + ρ 2 ⁢ c 2 2 ) ⁢ ∥ ζ ∥ ∘ 2 .

The minimum of the expression in round brackets is attained if ρ = ρ * := c 1 / c 2 2 . For ρ = ρ * , we find that

q * 2 := 1 - c 1 2 / c 2 2 ≤ q ^ 2 := 1 - ( λ ⊖ λ ⊖ ∘ ) 2 / ( λ ⊕ λ ⊕ ∘ ) 2 ∈ [ 0 , 1 ) .

Hence T ρ is a contractive mapping with explicitly known contraction factor q * . Well-known results in the theory of fixed points (see, e.g., [29]) yield the following theorem.

Theorem 2.1.

For any u 0 ∈ V and ρ = ρ * , the sequence { u k } ⊂ V of functions satisfying the relation

(2.6) ( Λ ∘ ⁢ Q ⁢ u k , Q ⁢ w ) = ( Λ ∘ ⁢ Q ⁢ u k - 1 , Q ⁢ w ) - ρ ⁢ ( ( Λ ⁢ Q ⁢ u k - 1 , Q ⁢ w ) - 〈 f , w 〉 )   for all  ⁢ w ∈ V

converges to u in V as k → + ∞ , and ∥ u k - u ∥ ∘ ≤ q * k ∥ u 0 - u ∥ ∘ .

Remark 2.2.

From (2.3) we obtain

∥ η ∥ ∘ 2 ≤ 1 λ 0 , min ⁢ ∥ 𝔹 ρ 2 ⁢ Q ⁢ ζ ∥ ⁢ ∥ ζ ∥ ∘ ≤ ∥ 𝔹 ρ 2 ∥ λ 0 , min 2 ⁢ ∥ ζ ∥ ∘ 2 .

This relation yields a simple (but not very sharp) estimate of the contraction factor.

For further analysis, it is convenient to estimate the right-hand side of (2.3) by a different method. Let ∥ 𝔹 ρ ∥ ∘ denote the operator norm

(2.7) ∥ 𝔹 ρ ∥ ∘ := sup y ∈ Y ∥ 𝔹 ρ y ∥ ∘ / ∥ y ∥ ∘ .

Then ∥ 𝔹 ρ ⁢ y ∥ ∘ ≤ ∥ 𝔹 ρ ∥ ∘ ⁢ ∥ y ∥ ∘ and

( Λ ∘ ⁢ 𝔹 ρ 2 ⁢ y , y ) ≤ ∥ 𝔹 ρ ∥ ∘ 2 ⁢ ∥ y ∥ ∘ 2 .

Hence (2.3) yields the estimate

∥ η ∥ ∘ ≤ ∥ 𝔹 ρ ∥ ∘ ⁢ ∥ ζ ∥ ∘ ,

which shows that T ρ is a contraction provided that

(2.8) ∥ 𝔹 ρ ∥ ∘ < 1 .

In applications 𝔹 ρ is a self-adjoint bounded operator acting in a finite-dimensional space, so that verification of this condition amounts to finding a ρ that minimizes the respective spectral radius of 𝔹 ρ (see Section 4).
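
In a finite-dimensional (e.g., finite element) setting, iteration (2.6) takes the matrix form A ∘ u k = A ∘ u k - 1 - ρ ( A u k - 1 - f ) . The following minimal sketch (an illustration only, assuming dense symmetric positive definite matrices A and A0 that stand for the discretized 𝒜 and 𝒜 ∘ ) shows the structure of such a loop: only the simplified operator is inverted, while A enters through matrix-vector products.

```python
import numpy as np

def preconditioned_iteration(A, A0, f, rho, u0, tol=1e-10, max_iter=1000):
    """Matrix form of iteration (2.6):
    u_k = u_{k-1} - rho * A0^{-1} (A u_{k-1} - f).
    Only A0 is inverted; A is applied through matrix-vector products.
    The A0-energy norm of the increment serves as a simple stopping test.
    """
    u = np.asarray(u0, dtype=float).copy()
    for k in range(1, max_iter + 1):
        residual = A @ u - f
        u_new = u - rho * np.linalg.solve(A0, residual)
        incr = u_new - u
        if np.sqrt(incr @ (A0 @ incr)) < tol:
            return u_new, k
        u = u_new
    return u, max_iter
```

Here ρ can be taken, e.g., as ρ * = c 1 / c 2 2 with c 1 , c 2 the extreme eigenvalues of the pencil ( A , A 0 ) (computable, for instance, by scipy.linalg.eigh(A, A0)), or via the simpler bounds stated above.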

3 Selection of Λ ∘

In this section, we discuss how to select Λ ∘ in order to minimize q, which is crucial for two major aspects of quantitative analysis: convergence of the iteration method and guaranteed a posteriori estimates. We assume that V, 𝒱 , and Y are spaces of functions defined in a Lipschitz bounded domain Ω (namely y ⁢ ( x ) ∈ 𝕋 for a.e. x ∈ Ω where 𝕋 may coincide with ℝ , ℝ d , or 𝕄 d × d ) and the operators Λ and Λ ∘ are generated by bounded scalar functions, matrices or tensors. In this case,

( Λ ⁢ y , y ) := ∫ Ω Λ ⁢ ( x ) ⁢ y ⊙ y ⁢ 𝑑 x   and   ( Λ ∘ ⁢ y , y ) := ∫ Ω Λ ∘ ⁢ ( x ) ⁢ y ⊙ y ⁢ 𝑑 x ,

where ⊙ denotes the respective product of scalar, vector, or tensor functions. In view of (2.7) and (2.8), the value of ρ should minimize the quantity sup y ∈ Y ⁡ ( Λ ∘ ⁢ 𝔹 ρ ⁢ y , 𝔹 ρ ⁢ y ) / ( Λ ∘ ⁢ y , y ) . This procedure yields the contraction factor

q 2 = 𝒬 ( Λ , Λ ∘ ) := inf ρ sup y ∈ Y [ ∫ Ω Λ ∘ ( x ) 𝔹 ρ ( x ) y ⊙ 𝔹 ρ ( x ) y 𝑑 x ] / [ ∫ Ω Λ ∘ ( x ) y ⊙ y 𝑑 x ] ,

whose computation is reduced to solving algebraic problems at a.e. x ∈ Ω , i.e.,

𝒬 ( Λ , Λ ∘ ) := inf ρ sup x ∈ Ω sup τ ∈ 𝕋 [ Λ ∘ ( x ) 𝔹 ρ ( x ) τ ⊙ 𝔹 ρ ( x ) τ ] / [ Λ ∘ ( x ) τ ⊙ τ ] .

Let 𝕊 be a certain set of “simple” operators defined a priori (e.g., it can be a finite-dimensional set formed by piecewise constant or polynomial functions). Then, finding the best “simplified” operator amounts to solving the following problem: find Λ ^ ∘ ∈ 𝕊 such that 𝒬 ⁢ ( Λ , Λ ^ ∘ ) is minimal. In other words, the optimal Λ ^ ∘ is defined by the problem

(3.1) inf Λ ∘ ∈ 𝕊 , ρ ∈ ℝ sup x ∈ Ω , τ ∈ 𝕋 [ Λ ∘ ( x ) 𝔹 ρ ( x ) τ ⊙ 𝔹 ρ ( x ) τ ] / [ Λ ∘ ( x ) τ ⊙ τ ] = q 2 .

Notice that (3.1) is an algebraic problem, which should be solved (analytically or numerically) before computations. The respective solution Λ ^ ∘ defines the best operator to be used in the iteration method (2.6) and yields the respective contraction factor. Below we discuss some particular cases, where analysis of this problem generates an optimal (or almost optimal) Λ ∘ .

Problem (3.1) is explicitly solvable if Λ ∘ and Λ have a special structure, namely,

Λ ∘ = a ∘ ⁢ ( x ) ⁢ 𝕀 , Λ = a ⁢ ( x ) ⁢ 𝕀 ,

where 𝕀 is the unit operator and a ∘ ⁢ ( x ) and a ⁢ ( x ) are positive bounded functions defined in Ω. Then,

𝔹 ρ ( x ) = ( 1 - ρ 𝐡 ( x ) ) 𝕀 , 𝐡 ( x ) := a ( x ) / a ∘ ( x )

and

sup τ ∈ 𝕋 [ ( 1 - ρ 𝐡 ( x ) ) 2 τ ⊙ τ ] / | τ | 2 = | 1 - ρ 𝐡 ( x ) | 2   for all x ∈ Ω .

Define 𝐡 ⊖ := min x ∈ Ω ⁡ 𝐡 ⁢ ( x ) and 𝐡 ⊕ := max x ∈ Ω ⁡ 𝐡 ⁢ ( x ) . It is not difficult to show that

sup x ∈ Ω ⁡ | 1 - ρ ⁢ 𝐡 ⁢ ( x ) | = max ⁡ { | 1 - ρ ⁢ 𝐡 ⊖ | , | 1 - ρ ⁢ 𝐡 ⊕ | } .

Minimization with respect to ρ yields the best value ρ * = 2 / ( 𝐡 ⊖ + 𝐡 ⊕ ) and the respective value

(3.2) 𝒬 ( Λ , Λ ∘ ) = ( ( 𝐡 ⊕ - 𝐡 ⊖ ) / ( 𝐡 ⊕ + 𝐡 ⊖ ) ) 2 = ( ( 1 - 𝒥 ( a , a ∘ ) ) / ( 1 + 𝒥 ( a , a ∘ ) ) ) 2 < 1 , 𝒥 ( a , a ∘ ) = 𝐡 ⊖ / 𝐡 ⊕ .

In accordance with (3.1), the identification of the optimal simplified problem is reduced to the problem

(3.3) sup a ∘ ∈ 𝕊 𝒥 ( a , a ∘ ) ,

where 𝕊 is a given set of functions.

We illustrate the above relations by means of several examples.

Example 3.1 (Constant Coefficients).

In the simplest case, we set 𝕊 = P 0 , i.e., a ∘ is a constant a 0 . From (3.3) it follows that q = ( a ⊕ - a ⊖ ) / ( a ⊕ + a ⊖ ) , where a ⊖ := min x ∈ Ω a ( x ) and a ⊕ := max x ∈ Ω a ( x ) . Then ρ * = 2 a 0 / ( a ⊖ + a ⊕ ) and the iteration procedure (2.6) with ρ = ρ * has the form

∫ Ω Q u k ⊙ Q w 𝑑 x = ∫ Ω ( 1 - 2 a / ( a ⊖ + a ⊕ ) ) Q u k - 1 ⊙ Q w 𝑑 x + ( 2 / ( a ⊖ + a ⊕ ) ) ∫ Ω f w 𝑑 x .

From Theorem 2.1, it follows that

∫ Ω | Q ( u k - u ) | 2 𝑑 x ≤ C ( ( a ⊕ - a ⊖ ) / ( a ⊕ + a ⊖ ) ) 2 k .

Example 3.2 (Oscillation Around a Given Function).

Consider a somewhat different example. Let a ⁢ ( x ) be a function oscillating around a certain mean function g ⁢ ( x ) so that

a ⁢ ( x ) g ⁢ ( x ) ∈ [ 1 - ϵ , 1 + ϵ ] , ϵ ∈ ( 0 , 1 ) .

If g is a relatively simple function, then it is natural to set a ∘ ⁢ ( x ) = g ⁢ ( x ) . By (3.2), we find that 𝐡 ⊕ = 1 + ϵ , 𝐡 ⊖ = 1 - ϵ , and q = ϵ . Hence the method is very efficient for small ϵ (i.e., if a oscillates around g with a relatively small amplitude). Figures 1 and 2 illustrate three examples of quasi-periodic coefficients a and respective a ∘ corresponding to the case of oscillation around a constant with smooth modulation, oscillation around a given smooth function, or oscillation around a piecewise constant function.
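
For scalar structures Λ = a ( x ) 𝕀 and Λ ∘ = a ∘ ( x ) 𝕀 , the quantities in (3.2) can be evaluated directly from pointwise samples of a and a ∘ . The snippet below is a small numerical check of this relation; the oscillating coefficient and the mean function used here are hypothetical illustrations, not data from the paper.

```python
import numpy as np

def scalar_contraction_factor(a_vals, a0_vals):
    """Formula (3.2) for Lambda = a(x) I and Lambda0 = a0(x) I:
    h = a / a0, rho* = 2 / (h_min + h_max), q = (h_max - h_min) / (h_max + h_min).
    a_vals, a0_vals: pointwise samples of a and a0 at the same points of Omega.
    """
    h = np.asarray(a_vals) / np.asarray(a0_vals)
    h_min, h_max = h.min(), h.max()
    rho_star = 2.0 / (h_min + h_max)
    q = (h_max - h_min) / (h_max + h_min)
    return rho_star, q

# Example 3.2: a oscillates around g with relative amplitude eps, hence q ~ eps
x = np.linspace(0.0, 1.0, 10001)
g = 2.0 + x                                        # hypothetical smooth mean function
eps = 0.1
a = g * (1.0 + eps * np.sin(2 * np.pi * 100 * x))  # rapidly oscillating coefficient
rho_star, q = scalar_contraction_factor(a, g)      # q is approximately eps
```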

Example 3.3 (Piecewise Constant Coefficients).

Consider a more complicated case, where Ω is divided into N nonoverlapping subdomains Ω i and Λ ∘ ⁢ ( x ) = c i ⁢ 𝕀 if x ∈ Ω i . Define the numbers

a ⊖ ( i ) = min x ∈ Ω i ⁡ a ⁢ ( x ) , a ⊕ ( i ) = max x ∈ Ω i ⁡ a ⁢ ( x ) ,
𝐡 ⊖ = min ⁡ { a ⊖ ( 1 ) c 1 , a ⊖ ( 2 ) c 2 , … , a ⊖ ( N ) c N } , 𝐡 ⊕ = max ⁡ { a ⊕ ( 1 ) c 1 , a ⊕ ( 2 ) c 2 , … , a ⊕ ( N ) c N } .

Since the constants c i are defined up to a common multiplier, we can assume without loss of generality that

(3.4) ∑ i = 1 N λ i = 1 , where λ i = 1 / c i .

In accordance with (3.3), the maximum of 𝒬 ⁢ ( Λ , Λ ∘ ) is attained if

(3.5) min { λ 1 a ⊖ ( 1 ) , λ 2 a ⊖ ( 2 ) , … , λ N a ⊖ ( N ) } / max { λ 1 a ⊕ ( 1 ) , λ 2 a ⊕ ( 2 ) , … , λ N a ⊕ ( N ) } → max ,

where λ i > 0 satisfy (3.4). If N = 2 , then problem (3.5) has a simple solution, which shows that the ratio λ 1 / λ 2 (i.e., c 2 / c 1 ) can be any number in the interval [ ξ 1 , ξ 2 ] , where

ξ 1 = min { a ⊖ ( 2 ) / a ⊖ ( 1 ) , a ⊕ ( 2 ) / a ⊕ ( 1 ) } , ξ 2 = max { a ⊖ ( 2 ) / a ⊖ ( 1 ) , a ⊕ ( 2 ) / a ⊕ ( 1 ) } .

It is interesting to compare these results with those generated by homogenized models in the case of perfectly periodic structures. For this purpose, we consider a simple one-dimensional problem

( a ⁢ u ′ ) ′ - f = 0   in  ⁢ ( 0 , 1 )

with

a ⁢ ( x ) = a ( 1 ) ⁢ ( x )   in  ⁢ Ω 1 = ( 0 , β ) , β ∈ ( 0 , 1 ) ,
a ⁢ ( x ) = a ( 2 ) ⁢ ( x )   in  ⁢ Ω 2 = ( β , 1 ) ,

where a ( 1 ) ( x ) is a perfectly periodic function attaining only two values a ⊖ ( 1 ) and a ⊕ ( 1 ) . The Lebesgue measure of the set where a ( x ) = a ⊕ ( 1 ) is κ 1 | Ω 1 | , κ 1 ∈ ( 0 , 1 ) . Analogously, a ( 2 ) ( x ) is a periodic function attaining only two values a ⊖ ( 2 ) and a ⊕ ( 2 ) . The Lebesgue measure of the set where a ( x ) = a ⊕ ( 2 ) is κ 2 | Ω 2 | , κ 2 ∈ ( 0 , 1 ) . We assume that a ⊖ ( i ) < a ⊕ ( i ) , i = 1 , 2 , and the number of periods is very large. Then, the homogenization method can be successfully applied. The corresponding homogenized problem has the coefficients (see, e.g., [8])

a H ( 1 ) := ( ( 1 / β ) ∫ 0 β ( 1 / a ( 1 ) ( x ) ) 𝑑 x ) - 1   in Ω 1 , a H ( 2 ) := ( ( 1 / ( 1 - β ) ) ∫ β 1 ( 1 / a ( 2 ) ( x ) ) 𝑑 x ) - 1   in Ω 2 .

It is easy to see that

a H ( 1 ) = a ⊖ ( 1 ) a ⊕ ( 1 ) / ( κ 1 a ⊖ ( 1 ) + ( 1 - κ 1 ) a ⊕ ( 1 ) ) ∈ ( a ⊖ ( 1 ) , a ⊕ ( 1 ) ) , a H ( 2 ) = a ⊖ ( 2 ) a ⊕ ( 2 ) / ( κ 2 a ⊖ ( 2 ) + ( 1 - κ 2 ) a ⊕ ( 2 ) ) ∈ ( a ⊖ ( 2 ) , a ⊕ ( 2 ) ) .

Hence

a H ( 2 ) / a H ( 1 ) ∈ ( ξ 1 H , ξ 2 H ) , where ξ 1 H := a ⊖ ( 2 ) / a ⊕ ( 1 ) < min { a ⊖ ( 2 ) / a ⊖ ( 1 ) , a ⊕ ( 2 ) / a ⊕ ( 1 ) } = ξ 1 , ξ 2 H := a ⊕ ( 2 ) / a ⊖ ( 1 ) > ξ 2

and ( ξ 1 , ξ 2 ) ⊂ ( ξ 1 H , ξ 2 H ) . Therefore, the coefficients a H ( 1 ) and a H ( 2 ) may not generate the best piecewise constant a ∘ , i.e., the one that produces the smallest contraction factor in the iteration procedure (2.6).
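
The comparison above is easy to check numerically. The following small computation uses hypothetical two-phase data (all values a ⊖ ( i ) , a ⊕ ( i ) , κ i below are made up for illustration): it evaluates the homogenized coefficients by the harmonic-mean formula and tests whether their ratio falls into the optimal interval [ ξ 1 , ξ 2 ] .

```python
import numpy as np

# hypothetical two-phase data on Omega_1 and Omega_2: extreme values and volume fractions
a1_min, a1_max, kappa1 = 1.0, 4.0, 0.5
a2_min, a2_max, kappa2 = 2.0, 10.0, 0.3

# harmonic means: homogenized coefficients on each subdomain
aH1 = a1_min * a1_max / (kappa1 * a1_min + (1 - kappa1) * a1_max)
aH2 = a2_min * a2_max / (kappa2 * a2_min + (1 - kappa2) * a2_max)

# interval [xi1, xi2] of optimal ratios c2/c1 versus the homogenized ratio aH2/aH1
xi1 = min(a2_min / a1_min, a2_max / a1_max)
xi2 = max(a2_min / a1_min, a2_max / a1_max)
xi1H, xi2H = a2_min / a1_max, a2_max / a1_min

print(aH2 / aH1, (xi1, xi2), (xi1H, xi2H))  # the homogenized ratio may lie outside [xi1, xi2]
```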

4 Error Estimates

4.1 General Estimate

Since T ρ is a contractive mapping, we can use the Ostrowski estimates (see [24, 29, 26]) of the distance between v ∈ V and u (the fixed point). The estimates state that

(4.1) ∥ v - u ∥ ∘ ∈ [ ϵ / ( 1 + q ( ρ ) ) , ϵ / ( 1 - q ( ρ ) ) ] , where ϵ := ∥ T ρ v - v ∥ ∘ .

This estimate cannot be directly applied because v ρ := T ρ ⁢ v is generally unknown (it is the exact solution of a boundary value problem). Instead, we must use a numerical approximation v ~ ρ (in our analysis, we impose no restrictions on the method by which the function v ~ ρ ∈ V was constructed). Thus, the difference η ρ := v - v ~ ρ is a known function and the quantity δ ρ = ∥ η ρ ∥ ∘ is directly computable. It is easy to see that

(4.2) δ ρ - ∥ v ~ ρ - v ρ ∥ ∘ ≤ ∥ v ρ - v ∥ ∘ ≤ δ ρ + ∥ v ~ ρ - v ρ ∥ ∘ .

To deduce a fully computable majorant of the norm ∥ v ~ ρ - v ρ ∥ ∘ we use the method suggested in [25, 26]. First, we rewrite (2.1) in the form

(4.3) ( Λ ∘ ⁢ Q ⁢ v ρ , Q ⁢ w ) = ( Λ ∘ ⁢ Q ⁢ v , Q ⁢ w ) - ρ ⁢ ( ( Λ ⁢ Q ⁢ v , Q ⁢ w ) - 〈 f , w 〉 ) .

For any y ∈ Y and w ∈ V 0 , we have

(4.4) ( Λ ∘ ⁢ Q ⁢ ( v ρ - v ~ ρ ) , Q ⁢ w ) = ( Λ ∘ ⁢ Q ⁢ ( v - v ~ ρ ) , Q ⁢ w ) - ρ ⁢ ( ( Λ ⁢ Q ⁢ v , Q ⁢ w ) - 〈 f , w 〉 )
= ( Λ ∘ ⁢ Q ⁢ ( v - v ~ ρ ) - ρ ⁢ Λ ⁢ Q ⁢ v + y , Q ⁢ w ) - 〈 Q * ⁢ y - ρ ⁢ f , w 〉 .

We estimate the first term on the right-hand side of (4.4) as follows:

( Λ ∘ ⁢ Q ⁢ ( v - v ~ ρ ) - ρ ⁢ Λ ⁢ Q ⁢ v + y , Q ⁢ ( v ρ - v ~ ρ ) ) = ( Q ⁢ ( v - v ~ ρ ) - ρ ⁢ Λ ∘ - 1 ⁢ Λ ⁢ Q ⁢ v + Λ ∘ - 1 ⁢ y , Λ ∘ ⁢ Q ⁢ ( v ρ - v ~ ρ ) )
≤ ( Λ ∘ ⁢ Q ⁢ ( v - v ~ ρ ) + τ y , Q ⁢ ( v - v ~ ρ ) + Λ ∘ - 1 ⁢ τ y ) 1 / 2 ⁢ ∥ v ρ - v ~ ρ ∥ ∘ ,

where τ y := y - ρ ⁢ Λ ⁢ Q ⁢ v . The second term meets the estimate

〈 Q * ⁢ y - ρ ⁢ f , v ρ - v ~ ρ 〉 ≤ ∥ Q * ⁢ y - ρ ⁢ f ∥ ⁢ ∥ v ρ - v ~ ρ ∥ ≤ 1 ( λ ⊖ ∘ ) 1 / 2 ⁢ ∥ Q * ⁢ y - ρ ⁢ f ∥ ⁢ ∥ v ρ - v ~ ρ ∥ ∘ ,

where ∥ w * ∥ = sup w ∈ V ⁡ 〈 w * , w 〉 / ∥ w ∥ ∘ is the dual norm. Hence

(4.5) ∥ v ρ - v ~ ρ ∥ ∘ ≤ ( Λ ∘ Q η ρ + τ y , Q η ρ + Λ ∘ - 1 τ y ) 1 / 2 + 1 λ ⊖ ∘ ∥ Q * y - ρ f ∥ = : M ⊕ ( η ρ , τ y ) .

Notice that

inf y ∈ Y ⁡ M ⊕ ⁢ ( η ρ , τ y ) = ∥ v ρ - v ~ ρ ∥ ∘ .

Indeed, set y = Λ ∘ Q ( v ρ - v ) + ρ Λ Q v (in this case, τ y = Λ ∘ Q ( v ρ - v ) ). In view of (4.3), 〈 Q * y - ρ f , w 〉 = 0 , and the majorant is equal to ∥ v ρ - v ~ ρ ∥ ∘ . Thus, estimate (4.5) has no irremovable gap and a properly selected y yields a sharp upper bound of the error.

Remark 4.1.

It is not difficult to show that the last term of M ⊕ ⁢ ( η ρ , τ y ) can be estimated via an explicitly computable quantity provided that y has the same regularity as the true flux (see [26]). However, in our subsequent analysis Q * ⁢ y - ρ ⁢ f = 0 and these advanced forms of the majorant are not required. In this case, the majorant has a simpler form:

M ⊕ 2 ⁢ ( η ρ , τ y ) = ( Λ ∘ ⁢ Q ⁢ η ρ , Q ⁢ η ρ ) + ( Λ ∘ - 1 ⁢ τ y , τ y ) + 2 ⁢ ( Q ⁢ η ρ , τ y ) .

It is important that the computation of the majorant M ⊕ does not require inversion of the operator Λ associated with a complicated quasi-periodic problem.

Now, (4.1), (4.2), and (4.5) yield the following result.

Theorem 4.2.

The error e = v - u is subject to the estimate

(4.6) ∥ e ∥ ∘ ∈ [ max { 0 , ( δ ρ - M ⊕ ( η ρ , τ y ) ) / ( 1 + q ( ρ ) ) } , ( δ ρ + M ⊕ ( η ρ , τ y ) ) / ( 1 - q ( ρ ) ) ] ,

where M ⊕ is defined by (4.5), τ y := y - ρ ⁢ Λ ⁢ Q ⁢ v , and y is a function in Y.

Remark 4.3.

Here η ρ and δ ρ are directly computable and q ⁢ ( ρ ) is defined in accordance with relations presented in the previous section. Hence the cost of (4.6) is mainly related to the quantity M ⊕ ⁢ ( η ρ , τ y ) , which is an a posteriori error majorant of the functional type. The derivation of such estimates is performed by purely functional methods and does not exploit special features of approximations (e.g., Galerkin orthogonality), numerical method, and exact solution (e.g., extra regularity). Properties of the majorants are well studied (see [25, 26] and the literature cited therein). Numerous tests performed for different boundary value problems have confirmed high practical efficiency of error majorants of the functional type. It was shown that M ⊕ is a guaranteed and efficient majorant of the global error and generates good indicators of local errors if y is replaced by a certain numerical reconstruction of the exact dual solution. There are many different ways to obtain suitable reconstructions with minimal expenditures (concerning this point we refer to [21] where the reader will find a systematic discussion of computational aspects in the context of various boundary value problems). Error majorants of this type can be also used for the evaluation of modeling errors (see [28, 27]).

Usually, the cost of a good estimate (with the efficiency index between 1 and 2) is comparable with the cost of a numerical solution. However, the proportion essentially depends on the numerical method used. For the classical FEM schemes the expenditures are maximal (because this method generates rather coarse approximations of fluxes). For the dual mixed method, finite volume method, isogeometric approximations, and other methods producing locally equilibrated fluxes, the expenditures may be two to three times smaller than for the numerical solution.

4.2 Examples

Now we briefly discuss applications of Theorem 4.2 to problems where Q and Q * are defined by the operators ∇ and - div , respectively, Λ ∘ = a ∘ ( x ) 𝕀 , Λ = a ( x ) 𝕀 , x ∈ Ω , and V = H ̊ 1 ( Ω ) .

d = 1 .

Let Ω = ( 0 , 1 ) . Equation (1.1) has the form ( a ⁢ ( x ) ⁢ u ′ ) ′ - f = 0 . In this case, Q ⁢ w = w ′ , Q * ⁢ y = - y ′ , and (4.3) is reduced to

(4.7) ∫ 0 1 a ∘ ⁢ ( v ρ - v ) ′ ⁢ w ′ ⁢ 𝑑 x + ρ ⁢ ∫ 0 1 ( a ⁢ v ′ ⁢ w ′ + f ⁢ w ) ⁢ 𝑑 x = 0 .

In order to apply Theorem 4.2, we set y = ρ ⁢ ( g ⁢ ( x ) + μ ) , where g ⁢ ( x ) = - ∫ 0 x f ⁢ 𝑑 x and μ is a constant. Then - y ′ - ρ ⁢ f = 0 and τ = ρ ⁢ ( g ⁢ ( x ) + μ ) - ρ ⁢ a ⁢ v ′ = ρ ⁢ ( μ + g - a ⁢ v ′ ) . The best constant μ is defined by minimization of M ⊕ 2 ⁢ ( η ρ , τ ) , which has the form

∫ 0 1 ( a ∘ ⁢ ( η ρ ′ ) 2 + a ∘ - 1 ⁢ ρ 2 ⁢ ( μ + g - a ⁢ v ′ ) 2 + 2 ⁢ η ρ ′ ⁢ ρ ⁢ ( μ + g - a ⁢ v ′ ) ) ⁢ 𝑑 x

Since ∫ 0 1 η ρ ′ ⁢ 𝑑 x = 0 , the problem is reduced to minimization of the second term and the best μ satisfies the equation

∫ 0 1 a ∘ - 1 ⁢ ( μ + g ⁢ ( x ) - a ⁢ v ′ ) ⁢ 𝑑 x = 0 .

Hence

μ = μ ¯ := ( ∫ 0 1 a ∘ - 1 ( a v ′ - g ) 𝑑 x ) / ( ∫ 0 1 a ∘ - 1 𝑑 x ) ,

and (4.6) yields the estimate

(4.8) ∥ e ∥ ∘ ∈ [ max { 0 , ( δ ρ - I ⊕ ( v , v ~ ρ ) ) / ( 1 + q ( ρ ) ) } , ( δ ρ + I ⊕ ( v , v ~ ρ ) ) / ( 1 - q ( ρ ) ) ] ,

where

I ⊕ 2 ⁢ ( v , v ~ ρ ) = ∫ 0 1 a ∘ - 1 ⁢ ( a ∘ ⁢ ( v - v ~ ρ ) ′ + ρ ⁢ ( μ ¯ + g - a ⁢ v ′ ) ) 2 ⁢ 𝑑 x .

Here v and v ~ ρ are two consecutive numerical approximations (e.g., finite element approximations v k , h and v k + 1 , h computed on a mesh 𝒯 h ). Then

η ρ = η k , h := v k , h - v k + 1 , h   and   δ ρ = δ k , h := ∥ v k , h - v k + 1 , h ∥ ∘

are directly computable. Since a ∘ is a “simple” function, the integrals

F 1 = ∫ 0 1 a ∘ - 1 ⁢ 𝑑 x , F 2 = ∫ 0 1 a ∘ - 1 ⁢ g ⁢ 𝑑 x , F 3 = ∫ 0 1 a ∘ ⁢ ( η k , h ′ ) 2 ⁢ 𝑑 x ,
F 4 = ∫ 0 1 a ∘ - 1 ⁢ ( μ ¯ + g ) 2 ⁢ 𝑑 x , F 5 = ∫ 0 1 ( μ ¯ + g ) ⁢ η k , h ′ ⁢ 𝑑 x

are easy to compute. Other integrals

G 1 = ∫ 0 1 a ∘ - 1 ⁢ a ⁢ v k , h ′ ⁢ 𝑑 x , G 2 = ∫ 0 1 a ⁢ v k , h ′ ⁢ η k , h ′ ⁢ 𝑑 x ,
G 3 = ∫ 0 1 ( μ ¯ + g ) ⁢ a ∘ - 1 ⁢ a ⁢ v k , h ′ ⁢ 𝑑 x , G 4 = ∫ 0 1 a ∘ - 1 ⁢ a 2 ⁢ ( v k , h ′ ) 2 ⁢ 𝑑 x

contain a highly oscillating coefficient a multiplied by piecewise polynomial mesh functions. If a and f have low QTT rank tensor representations [15], then the integrals can be efficiently computed by tensor-type methods discussed in [17] (see also Section 5 below). We have

I ⊕ 2 ( v , v ~ ρ ) = F 3 - 2 ρ G 2 + 2 ρ F 5 + ρ 2 ( F 4 - 2 G 3 + G 4 ) =: ε k , h 2 , μ ¯ = ( G 1 - F 2 ) / F 1 .

Notice that the quantity ε k , h is the value of the majorant, where the flux has been selected in the best way. It is not difficult to show that

a ∘ ( v ρ ′ - v ′ ) = ρ ( g + μ ¯ - a v ′ )

and, therefore,

I ⊕ 2 ⁢ ( v , v ~ ρ ) = ∫ 0 1 a ∘ ⁢ ( v ρ ′ - v ~ ρ ′ ) 2 ⁢ 𝑑 x .

In other words, this term coincides with the error of the Galerkin solution related to the simplified boundary value problem (4.7), where v = v k , h . In accordance with Section 3, we set ρ = 2 / ( 𝐡 ⊖ + 𝐡 ⊕ ) and find that q = ( 𝐡 ⊕ - 𝐡 ⊖ ) / ( 𝐡 ⊖ + 𝐡 ⊕ ) . Now (4.8) yields easily computable lower and upper bounds of the error encompassed in v k , h :

(4.9) ( δ k , h - ε k , h ) / ( 1 + q ) ≤ ∥ v k , h - u ∥ ∘ ≤ ( δ k , h + ε k , h ) / ( 1 - q ) .
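
All quantities entering (4.9) can be assembled from one-dimensional integrals. The sketch below is a simplified illustration: it assumes that a, a ∘ , f and the two consecutive iterates are sampled on one fine uniform quadrature grid, and it uses trapezoidal quadrature and crude numerical differentiation in place of exact integration of piecewise polynomials. It evaluates δ k , h , the constant μ ¯ , the majorant ε k , h directly from (4.8), and returns the two-sided bound.

```python
import numpy as np

def two_sided_bound_1d(x, a, a0, f, v, v_next, h_min, h_max):
    """Two-sided estimate (4.9) for the 1D problem on (0, 1).

    x           : fine uniform quadrature grid,
    a, a0, f    : samples of the oscillating coefficient, the simplified
                  coefficient and the right-hand side on x,
    v, v_next   : samples of two consecutive iterates v_{k,h}, v_{k+1,h},
    h_min, h_max: bounds of the ratio a / a0 defining rho and q.
    """
    trapz = lambda y: float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))
    deriv = lambda y: np.gradient(y, x)

    rho = 2.0 / (h_min + h_max)
    q = (h_max - h_min) / (h_max + h_min)

    # g(x) = -int_0^x f dt, computed by a cumulative trapezoidal rule
    g = -np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))

    eta = v - v_next
    delta = np.sqrt(trapz(a0 * deriv(eta) ** 2))           # delta_{k,h}

    mu = trapz((a * deriv(v) - g) / a0) / trapz(1.0 / a0)  # optimal constant mu_bar
    integrand = (a0 * deriv(eta) + rho * (mu + g - a * deriv(v))) ** 2 / a0
    eps = np.sqrt(trapz(integrand))                        # eps_{k,h}, cf. (4.8)

    lower = max(0.0, (delta - eps) / (1.0 + q))
    upper = (delta + eps) / (1.0 - q)
    return lower, upper
```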

Remark 4.4.

It is worth adding comments on convergence properties of the quantities δ k , h and ε k , h entering (4.9). If the mesh 𝒯 h is fixed, then v k , h tends to the Galerkin approximation u h of problem (1.1) on this mesh. This fact follows from Theorem 2.1 applied to the case where the iterations are performed on the respective finite-dimensional space V h (considered as the space V). Then, ∥ v k , h - u h ∥ ∘ ≤ q k ⁢ ∥ v 0 , h - u h ∥ and for any h the term δ k , h tends to zero with the geometric rate. The quantity ε k , h is equal to the error of the Galerkin solution to the simplified problem. It has different asymptotic properties. It mainly depends on 𝒯 h , and for a given mesh it does not tend to zero when k → + ∞ . However, for any given v (which in our example is defined by v k , h ) this term goes to zero if h → 0 provided that the mesh satisfies the standard regularity conditions. The problem with a ∘ is assumed to be much more regular than the problem with a. Therefore, in terms of h the approximation error ε k , h (associated with a ∘ ) will decrease faster than the analogous error in the original problem (e.g., for a ∘ = const , the term ε k , h is proportional to h).

Since both quantities δ k , h and ε k , h are explicitly known, estimate (4.9) (and other analogous estimates) contains very useful information unavailable in the context of purely asymptotic error analysis. Using this information, we can organize the computational process in the best possible way by comparing iteration and discretization errors. In this process, rapidly converging iterations with respect to k should be continued as long as δ k , h > ε k , h . If δ k , h ≈ ε k , h , then further iterations on the mesh 𝒯 h are unable to essentially improve the numerical solution. Instead, we should refine 𝒯 h , project u k , h onto the refined mesh, and use it as a starting point for a new series of iterations generated by problem (4.7). For each step, we compute the right-hand side of (4.9) and stop the process when it becomes smaller than the desired tolerance.
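
This strategy can be organized as a simple loop. The sketch below fixes only the control flow; all problem-specific work is delegated to user-supplied callables, whose names and interfaces are hypothetical placeholders rather than routines defined in the paper.

```python
def adaptive_solve(v, mesh, q, tol, step, estimate, refine, project, max_steps=1000):
    """Iterate on a fixed mesh while the iteration error dominates, refine when
    the discretization error of the simplified problem takes over.
      step(mesh, v)        -> v_next        one iteration of (4.7) / (2.6)
      estimate(mesh, v, w) -> (delta, eps)  the quantities entering (4.9)
      refine(mesh)         -> finer mesh
      project(v, mesh)     -> v projected onto the finer mesh
    """
    for _ in range(max_steps):
        v_next = step(mesh, v)
        delta, eps = estimate(mesh, v, v_next)
        if (delta + eps) / (1.0 - q) < tol:     # guaranteed upper bound in (4.9)
            return v_next, mesh
        if delta <= eps:                        # discretization error dominates:
            mesh = refine(mesh)                 # refine and continue on the new mesh
            v_next = project(v_next, mesh)
        v = v_next
    return v, mesh
```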

d = 2 .

The computation of M ⊕ for 2D problems can also be reduced to the computation of one-dimensional integrals. Certainly, in the multidimensional case the number of integrals is much larger. However, the basic tensor decomposition methods remain the same. Below we briefly discuss them using the paradigm of a simple case where

f = f ( 1 ) ⁢ ( x 1 ) ⁢ f ( 2 ) ⁢ ( x 2 )   and   a = a ( 1 ) ⁢ ( x 1 ) ⁢ a ( 2 ) ⁢ ( x 2 ) .

Assume that approximations are represented in the form of series formed by one-dimensional functions ϕ i ( 1 ) and ϕ j ( 2 ) (which may be supported locally or globally), so that

v = ∑ i = 1 n 1 ∑ j = 1 n 2 γ i ⁢ j ⁢ ϕ i ( 1 ) ⁢ ( x 1 ) ⁢ ϕ j ( 2 ) ⁢ ( x 2 ) , v ~ ρ = ∑ i = 1 n 1 ∑ j = 1 n 2 γ ~ i ⁢ j ⁢ ϕ i ( 1 ) ⁢ ( x 1 ) ⁢ ϕ j ( 2 ) ⁢ ( x 2 ) .

In this case,

∇ ⁡ η ρ = ( ∑ i = 1 n 1 ∑ j = 1 n 2 ς i ⁢ j ⁢ ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ ϕ j ( 2 ) , ∑ i = 1 n 1 ∑ j = 1 n 2 ς i ⁢ j ⁢ ϕ i ( 1 ) ⁢ ∂ ⁡ ϕ j ( 2 ) ∂ ⁡ x 2 ) , where  ⁢ ς i ⁢ j = γ i ⁢ j - γ ~ i ⁢ j .

We define another set of one-dimensional functions W k ( 1 ) ⁢ ( x 1 ) and W l ( 2 ) ⁢ ( x 2 ) , which form the vector function

(4.10) y = Υ 0 + ∑ k = 1 m 1 ∑ l = 1 m 2 σ k ⁢ l ⁢ Υ k ⁢ l , Υ k ⁢ l = { W k ( 1 ) ⁢ ∂ ⁡ W l ( 2 ) ∂ ⁡ x 2 ; - ∂ ⁡ W k ( 1 ) ∂ ⁡ x 1 ⁢ W l ( 2 ) } .

Here Υ 0 is a given function, which can be defined in different ways. In particular, we set

Υ 0 = { W 0 ( 1 ) ⁢ ( x 1 ) ⁢ W 0 ( 2 ) ⁢ ( x 2 ) ; 0 } , W 0 ( 1 ) ⁢ ( x 1 ) = ∫ 0 x 1 f ( 1 ) ⁢ 𝑑 x 1 , W 0 ( 2 ) = - ρ ⁢ f ( 2 ) .

The functions Υ k ⁢ l must satisfy the usual linear independence conditions in order to guarantee unique solvability of the respective approximation problem. For any smooth function w vanishing on ∂ ⁡ Ω , we have

∫ Ω ( Υ 0 ⋅ ∇ ⁡ w - ρ ⁢ f ⁢ w ) ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 = 0   and   ∫ Ω Υ k ⁢ l ⋅ ∇ ⁡ w ⁢ d ⁢ x 1 ⁢ d ⁢ x 2 = 0 .

Thus, ∥ Q * ⁢ y - ρ ⁢ f ∥ = 0 and we can use the simplified form of M ⊕ .

In the simplest case Λ ∘ = a ∘ ⁢ 𝕀 , where a ∘ is a constant. The best y minimizes the quantity

M ⊕ 2 ⁢ ( η ρ , τ ) = ∫ Ω a ∘ ⁢ ∇ ⁡ η ρ ⋅ ∇ ⁡ η ρ ⁢ d ⁢ x + ∫ Ω a ∘ - 1 ⁢ y ⋅ y ⁢ 𝑑 x + ρ 2 ⁢ ∫ Ω a ∘ - 1 ⁢ a 2 ⁢ ∇ ⁡ v ⋅ ∇ ⁡ v ⁢ d ⁢ x
(4.11) - 2 ⁢ ∫ Ω ( ρ ⁢ a ∘ - 1 ⁢ a ⁢ ∇ ⁡ v + ∇ ⁡ η ρ ) ⋅ y ⁢ 𝑑 x + 2 ⁢ ρ ⁢ ∫ Ω a ⁢ ∇ ⁡ η ρ ⋅ ∇ ⁡ η ρ ⁢ d ⁢ x ,

which shows that y must satisfy the relation y = ρ ⁢ a ⁢ ∇ ⁡ v + a ∘ ⁢ ∇ ⁡ η ρ . We select σ k ⁢ l that defines the Galerkin approximation of this function, and we arrive at the system

∑ k = 1 m 1 ∑ l = 1 m 2 σ k ⁢ l ⁢ ∫ Ω Υ k ⁢ l ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 + ∫ Ω Υ 0 ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2
(4.12) = ∑ i = 1 n 1 ∑ j = 1 n 2 ∫ Ω ( ρ ⁢ a ⁢ γ i ⁢ j + a ∘ ⁢ ς i ⁢ j ) ⁢ ( ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ ϕ j ( 2 ) , ϕ i ( 1 ) ⁢ ∂ ⁡ ϕ j ( 2 ) ∂ ⁡ x 2 ) ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 .

Introduce the following matrices:

D ( 1 ) = { D k ⁢ l ( 1 ) } , D k ⁢ l ( 1 ) = ∫ 0 a ∂ ⁡ W k ( 1 ) ∂ ⁡ x 1 ⁢ ∂ ⁡ W l ( 1 ) ∂ ⁡ x 1 ⁢ 𝑑 x 1 , W ( 1 ) = { W k ⁢ l ( 1 ) } , W k ⁢ l ( 1 ) = ∫ 0 a W k ( 1 ) ⁢ W l ( 1 ) ⁢ 𝑑 x 1 ,
D ( 2 ) = { D k ⁢ l ( 2 ) } , D k ⁢ l ( 2 ) = ∫ 0 b ∂ ⁡ W k ( 2 ) ∂ ⁡ x 2 ⁢ ∂ ⁡ W l ( 2 ) ∂ ⁡ x 2 ⁢ 𝑑 x 2 , W ( 2 ) = { W k ⁢ l ( 2 ) } , W k ⁢ l ( 2 ) = ∫ 0 b W k ( 2 ) ⁢ W l ( 2 ) ⁢ 𝑑 x 2 ,
F ( 1 ) = { F i ⁢ k ( 1 ) } , F i ⁢ k ( 1 ) = ∫ 0 a ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ W k ( 1 ) ⁢ 𝑑 x 1 , G ( 1 ) = { G i ⁢ k ( 1 ) } , G i ⁢ k ( 1 ) = ∫ 0 a ϕ i ( 1 ) ⁢ ∂ ⁡ W k ( 1 ) ∂ ⁡ x 1 ⁢ 𝑑 x 1 ,
F ( 2 ) = { F j l ( 2 ) } , F j l ( 2 ) = ∫ 0 b ϕ j ( 2 ) ∂ W l ( 2 ) ∂ x 2 𝑑 x 2 , G ( 2 ) = { G j l ( 2 ) } , G j l ( 2 ) = ∫ 0 b ∂ ϕ j ( 2 ) ∂ x 2 W l ( 2 ) 𝑑 x 2 ,
F ^ ( 1 ) = { F ^ i ⁢ k ( 1 ) } , F ^ i ⁢ k ( 1 ) = ∫ 0 a a 1 ⁢ ( x 1 ) ⁢ ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ W k ( 1 ) ⁢ 𝑑 x 1 , G ^ ( 1 ) = { G ^ i ⁢ k ( 1 ) } , G ^ i ⁢ k ( 1 ) = ∫ 0 a a 1 ⁢ ( x 1 ) ⁢ ϕ i ( 1 ) ⁢ ∂ ⁡ W k ( 1 ) ∂ ⁡ x 1 ⁢ 𝑑 x 1 ,
F ^ ( 2 ) = { F ^ j l ( 2 ) } , F ^ j l ( 2 ) = ∫ 0 b a 2 ( x 2 ) ϕ j ( 2 ) ∂ W l ( 2 ) ∂ x 2 𝑑 x 2 , G ^ ( 2 ) = { G ^ j l ( 2 ) } , G ^ j l ( 2 ) = ∫ 0 b a 2 ( x 2 ) ∂ ϕ j ( 2 ) ∂ x 2 W l ( 2 ) 𝑑 x 2 ,

and vectors

𝐠 ( 1 ) = { g k ( 1 ) } , g k ( 1 ) = ∫ 0 a W 0 ( 1 ) ⁢ W k ( 1 ) ⁢ 𝑑 x 1 , 𝐠 ( 2 ) = { g l ( 2 ) } , g l ( 2 ) = ∫ 0 b W 0 ( 2 ) ⁢ ∂ ⁡ W l ( 2 ) ∂ ⁡ x 2 ⁢ 𝑑 x 2 .

Notice that all coefficients are presented by one-dimensional integrals, which can be efficiently computed with the help of special (tensor-type) methods (see, e.g., [15, 16, 17, 18, 19]).

It is not difficult to see that

Y k ⁢ l ⁢ s ⁢ t := ∫ Ω Υ k ⁢ l ⋅ Υ s ⁢ t ⁢ 𝑑 x = W k ⁢ s ( 1 ) ⁢ D l ⁢ t ( 2 ) + D k ⁢ s ( 1 ) ⁢ W l ⁢ t ( 2 )

and

∫ Ω Υ 0 ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 = ∫ Ω W 0 ( 1 ) ⁢ W s ( 1 ) ⁢ W 0 ( 2 ) ⁢ ∂ ⁡ W t ( 2 ) ∂ ⁡ x 2 ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 = g s ( 1 ) ⁢ g t ( 2 ) ,

where Y = { Y k ⁢ l ⁢ s ⁢ t } is the fourth-order tensor. Hence the left-hand side of the system (4.12) has the form Y ⁢ 𝝈 + 𝐠 ( 1 ) ⊗ 𝐠 ( 2 ) . In the right-hand side we have the term

∫ Ω a ∘ ⁢ ς i ⁢ j ⁢ ( ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ ϕ j ( 2 ) , ϕ i ( 1 ) ⁢ ∂ ⁡ ϕ j ( 2 ) ∂ ⁡ x 2 ) ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 = a ∘ ⁢ H ⁢ 𝝇 ,

where H = { H i ⁢ j ⁢ s ⁢ t } , H s ⁢ t ⁢ i ⁢ j = F i ⁢ s ( 1 ) ⁢ F j ⁢ t ( 2 ) - G i ⁢ s ( 1 ) ⁢ G j ⁢ t ( 2 ) . Another term is

∫ Ω ρ ⁢ a ⁢ γ i ⁢ j ⁢ ( ∂ ⁡ ϕ i ( 1 ) ∂ ⁡ x 1 ⁢ ϕ j ( 2 ) , ϕ i ( 1 ) ⁢ ∂ ⁡ ϕ j ( 2 ) ∂ ⁡ x 2 ) ⋅ Υ s ⁢ t ⁢ 𝑑 x 1 ⁢ 𝑑 x 2 = H ^ ⁢ 𝜸 ,

where H ^ = { H ^ i ⁢ j ⁢ s ⁢ t } , H ^ s ⁢ t ⁢ i ⁢ j = F ^ i ⁢ s ( 1 ) ⁢ F ^ j ⁢ t ( 2 ) - G ^ i ⁢ s ( 1 ) ⁢ G ^ j ⁢ t ( 2 ) .

Now (4.12) implies

𝝈 = Y - 1 ⁢ ( H ^ ⁢ 𝜸 + a ∘ ⁢ H ⁢ 𝝇 - 𝐠 ( 1 ) ⊗ 𝐠 ( 2 ) )

and the value of M ⊕ is obtained by (4.6), (4.10), and (4.11).

5 Low-Rank Solution of the Discrete Equation

We consider the following elliptic diffusion equation with quasi-periodic coefficient a ⁢ ( x ) > 0 (whose oscillations are characterized by the parameter ϵ):

(5.1) 𝒜 u = - div ( a ( x ) ∇ u ) = f ( x ) , x = ( x 1 , … , x d ) ∈ Ω = ( 0 , 1 ) d , u | Γ = 0 ,

where the function f corresponds to the modified right-hand side in problem (4.3). In this case Γ = ∂ ⁡ Ω , Λ = a ⁢ 𝕀 , Q ⁢ w = ∇ ⁡ w , and Q * ⁢ y = - div ⁡ y .

In what follows we assume that f and a admit low-rank representations, i.e.,

f = ∑ i = 1 R f f 1 i ⁢ ( x 1 ) ⁢ f 2 i ⁢ ( x 2 ) , a = ∑ j = 1 R a a 1 j ⁢ ( x 1 ) ⁢ a 2 j ⁢ ( x 2 ) ,

where the parameters R f and R a are called the separation ranks. Then one may assume that the exact FEM solution can be well approximated by u K ( x ) = ∑ j = 1 K u 1 j ( x 1 ) u 2 j ( x 2 ) , where K depends on the separation ranks of f and a. In some cases this important property can be rigorously proven (e.g., for the Laplacian and other closely related operators). Similar low-rank approximations can be observed for the QTT tensor approximations (see [17]). Existence of a low-rank solution means that for some moderate K we have u K ≈ u up to the rank truncation threshold.

First, we sketch the rank-structured computational scheme. In our set of examples the original problem is to find u such that

∫ Ω a ⁢ ( x ) ⁢ ∇ ⁡ u ⋅ ∇ ⁡ w ⁢ d ⁢ x = ∫ Ω f ⁢ w ⁢ 𝑑 x   for all  ⁢ w ∈ V 0 := H 0 1 ⁢ ( Ω ) .

It is approximated by the following Galerkin problem for the low-rank representation u K :

(5.2) ∫ Ω a ⁢ ( x ) ⁢ ∇ ⁡ u K ⋅ ∇ ⁡ w K ⁢ d ⁢ x = ∫ Ω f ⁢ w K ⁢ 𝑑 x   for all  ⁢ w K ∈ V 0 K ,

where V 0 K is a subset of V 0 formed by functions of the type

w K ⁢ ( x ) = ∑ j = 1 K ϕ 1 j ⁢ ( x 1 ) ⁢ ϕ 2 j ⁢ ( x 2 ) .

Therefore, in terms of the general scheme exposed in the introduction, the problem 𝒫 is now problem (5.2) and we solve it by iterations with the help of the simplified (preconditioned) problem

(5.3) ∫ Ω a ∘ ⁢ ( x ) ⁢ ∇ ⁡ u k K ⋅ ∇ ⁡ w K ⁢ d ⁢ x = ∫ Ω f k - 1 ⁢ w K ⁢ 𝑑 x   for all  ⁢ w K ∈ V 0 K ,

where f k - 1 depends on u k - 1 K and a ∘ is a simple function (i.e., it is representable by a sum of separable terms a 1 ∘ ( x 1 ) a 2 ∘ ( x 2 ) with very simple factors).

Given the right-hand side, problem (5.3) is much simpler than the initial problem: the stiffness matrix associated with (5.3) is much easier to compute and has a simpler (low Kronecker rank) form that allows a rank-structured representation of its inverse.

5.1 Kronecker Product Representation of the Stiffness Matrix

Figure 3 illustrates a 2D example of the L × L periodic coefficient with L = 6 , corresponding to the choice ϵ = 1 / L . In this example, the scalar coefficient is represented by the separable function

a ⁢ ( x ) = C + a 1 ⁢ ( x 1 ) ⁢ a 1 ⁢ ( x 2 ) , C > 0 ,

where the generating univariate function a 1 ( x 1 ) has the shape of six uniformly distributed bumps of height 1, as shown in Figure 3, right. Figure 3, left, presents the oscillating part of the 2D coefficient function, which is a 1 ( x 1 ) a 1 ( x 2 ) . Here, the coefficient bumps are placed on a coarse grid of size 8 L × 8 L in such a way that the bumps occupy the 4 × 4 central box in each of the 8 × 8 cells which compose the whole L × L lattice-type decomposition of Ω (we have L = 6 in Figure 3, i.e., the size of the coarse grid is 48 × 48 , while L = 12 in Figures 4 and 5, i.e., the size of the coarse grid is 96 × 96 ). Hence the axis labels 20 , 40 , 60 , 80 refer to this coarse grid in both x 1 and x 2 , which describes the construction of the coefficient in detail.
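
Such a coefficient is easy to generate in separable form. The sketch below (with a hypothetical rectangular bump profile standing in for the bumps shown in Figure 3) builds the univariate factor a 1 on a uniform grid and then the 2D coefficient C + a 1 ( x 1 ) a 1 ( x 2 ) as an outer product.

```python
import numpy as np

def bump_factor(n, L, cells_per_period=8, bump_width=4):
    """Univariate factor a1 sampled at n points of (0, 1): L periods, each carried
    by `cells_per_period` coarse cells, with a bump of height 1 occupying the
    `bump_width` central cells (the rectangular bump shape is an assumption).
    """
    x = (np.arange(n) + 0.5) / n
    cell = np.floor(x * L * cells_per_period).astype(int) % cells_per_period
    lo = (cells_per_period - bump_width) // 2
    return ((cell >= lo) & (cell < lo + bump_width)).astype(float)

n, L, C = 400, 6, 1.0
a1 = bump_factor(n, L)
a2d = C + np.outer(a1, a1)   # a(x1, x2) = C + a1(x1) * a1(x2) on the n x n grid
```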

Figure 3: Example of the 2D periodic oscillating coefficients (left) and the respective 1D factor a 1 ( x 1 ) .

The examples of other possible shapes of the equation coefficient corresponding to the cases (i), (ii) and (iii) specified in Section 1 are presented in Figures 1 and 2.

We apply the FEM Galerkin discretization of equation (5.1) by means of tensor-product piecewise affine basis functions (instead of “linear finite elements”)

{ φ 𝐢 ( x ) := φ i 1 ( x 1 ) ⋯ φ i d ( x d ) } , 𝐢 = ( i 1 , … , i d ) , i ℓ ∈ ℐ ℓ = { 1 , … , n ℓ } , ℓ = 1 , … , d ,

where φ i ℓ are 1D finite element basis functions (say, piecewise linear hat functions).

We associate the univariate basis functions with the uniform grid { ν j } , j = 1 , … , n ℓ , on [ 0 , 1 ] with the mesh size h = 1 / ( n ℓ + 1 ) . In this construction we have N = n 1 n 2 … n d basis functions φ 𝐢 . Notice that the univariate grid size n ℓ is of the order n ℓ = O ( 1 / ϵ ) , so that the total problem size is N = O ( 1 / ϵ d ) .

For ease of exposition we first consider the case d = 2 , and further assume that the scalar diffusion coefficient a ⁢ ( x 1 , x 2 ) can be represented in the form

a ⁢ ( x 1 , x 2 ) = ∑ k = 1 R a k ( 1 ) ⁢ ( x 1 ) ⁢ a k ( 2 ) ⁢ ( x 2 ) > 0

with a small rank parameter R.

The N × N stiffness matrix is constructed by the standard mapping of the multi-index 𝐢 into the N-long univariate index i representing all degrees of freedom. For instance, we use the so-called big-endian convention for d = 3 and d = 2 :

𝐢 ↦ i := i 3 + ( i 2 - 1 ) ⁢ n 3 + ( i 1 - 1 ) ⁢ n 2 ⁢ n 3 , 𝐢 ↦ i := i 2 + ( i 1 - 1 ) ⁢ n 2 ,

respectively. Hence all matrices and vectors are defined on the long index i as usual; however, the special Kronecker structure allows low-storage and low-complexity matrix-vector multiplications when appropriate, i.e., when a vector also admits a low-rank Kronecker form representation. In particular, the basis function φ 𝐢 is designated via the long index, i.e., φ i = φ 𝐢 .

First, we consider the simplest case R = 1 and let d = 2 . We construct the Galerkin stiffness matrix A = [ a i ⁢ j ] ∈ ℝ N × N in the form of a sum of Kronecker products of small “univariate” matrices. Recall that given p 1 × q 1 matrix A and p 2 × q 2 matrix B, their Kronecker product is defined as a p 1 ⁢ p 2 × q 1 ⁢ q 2 matrix C via the block representation

C = A ⊗ B = [ a i ⁢ j ⁢ B ] , i = 1 , … , p 1 , j = 1 , … , q 1 .

We say that the Kronecker rank of the matrix A in the representation above equals 1. Now the elements of the Galerkin stiffness matrix take the form

a i j = 〈 𝒜 φ i , φ j 〉 = ∫ Ω a ( 1 ) ( x 1 ) a ( 2 ) ( x 2 ) ∇ φ i ( x ) ⋅ ∇ φ j ( x ) 𝑑 x
= ∫ 0 1 a ( 1 ) ⁢ ( x 1 ) ⁢ ∂ ⁡ φ i 1 ⁢ ( x 1 ) ∂ ⁡ x 1 ⁢ ∂ ⁡ φ j 1 ⁢ ( x 1 ) ∂ ⁡ x 1 ⁢ 𝑑 x 1 ⁢ ∫ 0 1 a ( 2 ) ⁢ ( x 2 ) ⁢ φ i 2 ⁢ ( x 2 ) ⁢ φ j 2 ⁢ ( x 2 ) ⁢ 𝑑 x 2
+ ∫ 0 1 a ( 1 ) ⁢ ( x 1 ) ⁢ φ i 1 ⁢ ( x 1 ) ⁢ φ j 1 ⁢ ( x 1 ) ⁢ 𝑑 x 1 ⁢ ∫ 0 1 a ( 2 ) ⁢ ( x 2 ) ⁢ ∂ ⁡ φ i 2 ⁢ ( x 2 ) ∂ ⁡ x 2 ⁢ ∂ ⁡ φ j 2 ⁢ ( x 2 ) ∂ ⁡ x 2 ⁢ 𝑑 x 2 ,

which leads to the rank-2 Kronecker product representation

A = [ a i ⁢ j ] = A 1 ⊗ M 2 + M 1 ⊗ A 2 ,

where ⊗ denotes the conventional Kronecker product of matrices. Here A 1 = [ a i 1 ⁢ j 1 ] ∈ ℝ n 1 × n 1 and A 2 = [ a i 2 ⁢ j 2 ] ∈ ℝ n 2 × n 2 denote the univariate stiffness matrices and M 1 = [ m i 1 ⁢ j 1 ] ∈ ℝ n 1 × n 1 and M 2 = [ m i 2 ⁢ j 2 ] ∈ ℝ n 2 × n 2 define the corresponding weighted mass matrices, e.g.,

a i 1 ⁢ j 1 = ∫ 0 1 a ( 1 ) ⁢ ( x 1 ) ⁢ ∂ ⁡ φ i 1 ⁢ ( x 1 ) ∂ ⁡ x 1 ⁢ ∂ ⁡ φ j 1 ⁢ ( x 1 ) ∂ ⁡ x 1 ⁢ 𝑑 x 1 , m i 1 ⁢ j 1 = ∫ 0 1 a ( 1 ) ⁢ ( x 1 ) ⁢ φ i 1 ⁢ ( x 1 ) ⁢ φ j 1 ⁢ ( x 1 ) ⁢ 𝑑 x 1 .

By simple algebraic transformations (e.g., by lumping of the tridiagonal mass matrices, which does not affect the approximation order of the FEM discretization) the matrix A can be simplified to the form

(5.4) A ↦ A = A 1 ⊗ D 2 + D 1 ⊗ A 2 ,

where D 1 , D 2 are the diagonal matrices. The matrix A corresponds to the FEM discretization of the initial elliptic PDE with complicated highly oscillating coefficients.
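
The coefficient factors thus enter only through small univariate matrices. The following sketch (assuming a lowest-order FEM discretization with midpoint sampling of the cellwise coefficient and mass lumping; the concrete factors a ( 1 ) , a ( 2 ) are hypothetical) assembles the weighted 1D stiffness and lumped mass matrices and forms A as the rank-2 Kronecker sum (5.4).

```python
import numpy as np
from scipy.sparse import diags, kron

def fem_1d_matrices(a_cell, h):
    """P1 FEM matrices on a uniform grid with n interior nodes (Dirichlet BC);
    a_cell holds the n+1 cellwise values of one univariate coefficient factor.
    Returns the weighted stiffness matrix and the lumped weighted mass matrix.
    """
    main = (a_cell[:-1] + a_cell[1:]) / h
    off = -a_cell[1:-1] / h
    A1 = diags([off, main, off], [-1, 0, 1], format="csr")
    D1 = diags(h * (a_cell[:-1] + a_cell[1:]) / 2.0, format="csr")
    return A1, D1

n = 199
h = 1.0 / (n + 1)
mid = (np.arange(n + 1) + 0.5) * h                 # cell midpoints
a1_cell = 2.0 + np.cos(2 * np.pi * 10 * mid)       # hypothetical factor a^(1)
a2_cell = 2.0 + np.sin(2 * np.pi * 10 * mid)       # hypothetical factor a^(2)

A1, D1 = fem_1d_matrices(a1_cell, h)
A2, D2 = fem_1d_matrices(a2_cell, h)
A = kron(A1, D2) + kron(D1, A2)                    # rank-2 Kronecker sum (5.4), size n^2 x n^2
```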

The simplest choice of a spectrally equivalent preconditioner A ∘ corresponds to the Laplace operator. In this case the representation in (5.4) simplifies to the discrete Laplacian matrix in the form of a rank-2 Kronecker sum

(5.5) A ↦ A ∘ = A 1 ⊗ I 2 + I 1 ⊗ A 2 ,

where I 1 and I 2 denote the identity matrices of the corresponding size. Here the simple tridiagonal matrices A 1 and A 2 represent the FEM/FDM Laplacian in 1D. This matrix will be used in what follows as a prototype preconditioner for solving the linear system of equations

(5.6) A ⁢ 𝐮 = 𝐟 .

The matrix A is constructed, in general, for the R-term separable coefficient a ( x 1 , x 2 ) with R ≥ 1 , which leads to the rank- 2 R Kronecker sum representation

A = ∑ k = 1 R [ A 1 , k ⊗ D 2 , k + D 1 , k ⊗ A 2 , k ] ,

with matrices of the respective size.
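
One possible realization of the preconditioned iteration for (5.6) is the standard PCG from SciPy with a sparse factorization of A ∘ playing the role of the preconditioner solve, as sketched below. The oscillating coefficient used here is a toy example of Kronecker-sum (rather than product) form, chosen only to keep the snippet self-contained; it is not the coefficient of Section 5.1.

```python
import numpy as np
from scipy.sparse import diags, kron, identity
from scipy.sparse.linalg import cg, LinearOperator, splu

n = 199
h = 1.0 / (n + 1)
mid = (np.arange(n + 1) + 0.5) * h                            # cell midpoints

def stiff_1d(a_cell):
    """1D P1 stiffness matrix with cellwise coefficient (Dirichlet BC)."""
    main = (a_cell[:-1] + a_cell[1:]) / h
    off = -a_cell[1:-1] / h
    return diags([off, main, off], [-1, 0, 1], format="csc")

a_osc = 2.0 + np.cos(2 * np.pi * 20 * mid)                    # hypothetical oscillating factor
I_n = identity(n, format="csc")
A = kron(stiff_1d(a_osc), I_n) + kron(I_n, stiff_1d(a_osc))   # toy stiffness matrix
A0 = kron(stiff_1d(np.full(n + 1, 2.0)), I_n) \
   + kron(I_n, stiff_1d(np.full(n + 1, 2.0)))                 # scaled Laplacian, cf. (5.5)

A0_lu = splu(A0.tocsc())                                      # one sparse factorization of A0
prec = LinearOperator(A.shape, matvec=A0_lu.solve, dtype=np.float64)
u, info = cg(A.tocsr(), np.ones(A.shape[0]), M=prec)          # PCG preconditioned by A0
```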

5.2 On Existence of the Low-Rank Solution

In this paper we discuss the approach based on a low-rank separable ϵ-approximation of the solution to equation (5.6), which is considered as a d-dimensional real-valued array 𝐮 ∈ ℝ n 1 × ⋯ × n d . In general, for the case R > 1 this favorable property is not guaranteed by the low Kronecker rank representation of the Galerkin system matrix A discussed in Section 5.1.

Let R = 1 and d = 2 . The existence of the low-rank approximation to the solution of equation (5.6) with the low-rank right-hand side

𝐟 = ∑ k = 1 R f 𝐟 k ( 1 ) ⊗ 𝐟 k ( 2 ) , 𝐟 k ( ℓ ) ∈ ℝ n ℓ ,

and with the system matrix in the form (5.5) can be justified by plugging the representation (5.5) in the sinc -quadrature approximation to the Laplace integral transform [5]

(5.7) Λ ∘ - 1 = ∫ ℝ + e - t ⁢ Λ ∘ ⁢ 𝑑 t ≈ B M := ∑ k = - M M c k ⁢ e - t k ⁢ Λ ∘ = ∑ k = - M M c k ⁢ e - t k ⁢ A 1 ⊗ e - t k ⁢ A 2 ,

taking into account that the matrices A 1 and A 2 commute with I 1 and I 2 , respectively. Hence equation (5.7) represents an accurate rank- ( 2 M + 1 ) Kronecker product approximation to the inverse preconditioner Λ ∘ - 1 , which can be applied directly to the right-hand side to obtain

𝐮 = Λ ∘ - 1 ⁢ 𝐟 ≈ B M ⁢ 𝐟 = ∑ k = - M M c k ⁢ ∑ m = 1 R f e - t k ⁢ A 1 ⁢ 𝐟 m ( 1 ) ⊗ e - t k ⁢ A 2 ⁢ 𝐟 m ( 2 ) .

The numerical efficiency of the representation (5.7) can be explained by the fact that the quadrature parameters t k , c k can be chosen in such a way that the low Kronecker rank approximation B M converges to Λ ∘ - 1 exponentially fast in M. For example, under the choice t k = e k ⁢ h , c k = h ⁢ t k with h = π / M there holds [5]

∥ Λ ∘ - 1 - B M ∥ ≤ C ⁢ e - β ⁢ M ⁢ ∥ Λ ∘ - 1 ∥ ,

in the Frobenius norm, which means that the approximation error ϵ > 0 can be achieved with the number of terms R B = 2 ⁢ M + 1 of the order of R B = O ⁢ ( | log ⁡ ϵ | 2 ) .
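
A minimal sketch of how (5.7) can be applied to a rank-1 right-hand side is given below. It uses dense matrix exponentials via scipy.linalg.expm for simplicity, takes the quadrature parameters t k = e k h , c k = h t k , h = π / M quoted above, and assumes that A 1 , A 2 are small symmetric positive definite matrices with spectra bounded away from zero; the result is a rank- ( 2 M + 1 ) list of Kronecker factors.

```python
import numpy as np
from scipy.linalg import expm

def sinc_kron_apply_inverse(A1, A2, f1, f2, M=20):
    """Apply the quadrature approximation (5.7) of the inverse to f = f1 (x) f2:
        u ~ sum_{k=-M..M} c_k (e^{-t_k A1} f1) (x) (e^{-t_k A2} f2),
    with t_k = exp(k h), c_k = h t_k, h = pi / M.
    Returns the full vector (assembled only for checking) and the rank-1 factors.
    """
    h = np.pi / M
    factors = []
    for k in range(-M, M + 1):
        tk = np.exp(k * h)
        factors.append((h * tk, expm(-tk * A1) @ f1, expm(-tk * A2) @ f2))
    u = sum(c * np.kron(v1, v2) for c, v1, v2 in factors)
    return u, factors
```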

Figures 4 and 5 demonstrate the singular values of the discrete solution on the n × n grid for n = 95 , 143 and 191, indicating very moderate dependence of the ϵ-rank on the grid size n. As in the case of Figure 3, in Figures 4 and 5 we only represent the oscillating part of the coefficients and omit the small constant C > 0 .

Figure 4: Rank decomposition of the solution for the 12 × 12 periodic coefficient.

Figure 5: Rank decomposition of the solution for the 12 × 12 modulated periodic coefficient.

Further enhancement of the tensor approximation can be based on the application of the quantized-TT (QTT) tensor approximation which has been already applied in [17] to the 1D equations with quasi-periodic coefficients. The power of the QTT approximation method is due to the perfect low-rank decompositions applied to the wide class of function-related tensors [15]. See [17] for a more detailed discussion and a number of numerical examples.

One can apply QTT approximations to problems with quasi-periodic coefficients, which can be described by oscillation with smooth modulation around a constant value, oscillation around a given smooth function, or oscillation around a piecewise constant function, see Figure 1 and examples in [17].

Let the vector 𝐱 ∈ ℂ N , N = 2 L , be obtained by sampling a continuous function f ∈ C [ 0 , 1 ] (or even a piecewise smooth function) on the uniform grid of size N. For the following examples of univariate functions, explicit QTT-rank estimates of the corresponding QTT tensor representations hold uniformly in the vector size N (see [15]):

  1. (A)

    r = 1 for complex exponentials, f ⁢ ( x ) = e i ⁢ ω ⁢ x , ω ∈ ℝ .

  2. (B)

    r = 2 for trigonometric functions, f ⁢ ( x ) = sin ⁡ ω ⁢ x , f ⁢ ( x ) = cos ⁡ ω ⁢ x , ω ∈ ℝ .

  3. (C)

    r ≤ m + 1 for polynomials of degree m.

  4. (D)

    For a function f with QTT-rank r 0 modulated by another function g with QTT-rank r (say, a step-type function, plane wave, or polynomial), the QTT rank of the product f g is bounded by the product of r and r 0 ,

    rank QTT ⁡ ( f ⁢ g ) ≤ rank QTT ⁡ ( f ) ⁢ rank QTT ⁡ ( g ) .

  5. (E)

    Furthermore, the following result holds [11]: the QTT rank for the periodic amplification of a reference function on a unit cell to a rectangular lattice is of the same order as that for the reference function.
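The rank bounds (A)–(C) are easy to observe numerically. The sketch below (an illustration of the standard TT-SVD procedure; the helper name qtt_ranks is ours) reshapes a sampled vector of size $2^L$ into a $2\times 2\times\cdots\times 2$ tensor and reads off the QTT ranks from successive truncated SVDs.

```python
import numpy as np

def qtt_ranks(vec, tol=1e-10):
    """QTT (quantics) ranks of a length-2**L vector via the TT-SVD sweep:
    successive unfoldings of the 2 x 2 x ... x 2 reshaping, truncated at tol."""
    L = int(round(np.log2(vec.size)))
    assert 2 ** L == vec.size
    ranks, r, rest = [], 1, vec.reshape(1, -1)
    for _ in range(L - 1):
        mat = rest.reshape(r * 2, -1)                    # current unfolding
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))          # numerical rank of the unfolding
        ranks.append(r)
        rest = np.diag(s[:r]) @ vt[:r, :]                # carry the remainder to the next core
    return ranks

L, omega = 12, 200.0
x = np.arange(2 ** L) / 2 ** L                           # uniform grid on [0, 1)
print(qtt_ranks(np.exp(1j * omega * x)))                 # case (A): all ranks 1
print(qtt_ranks(np.sin(omega * x)))                      # case (B): ranks <= 2
print(qtt_ranks(x ** 3))                                 # case (C): ranks <= 4 for a cubic
```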

The rank of the QTT tensor representation of the 1D Galerkin FEM matrix in the case of oscillating coefficients was discussed in [4, 17].

5.3 Numerical Test on the Rank Decomposition of u

Figure 6 shows the right-hand side $f_1(x_1,x_2)$ and the corresponding solution of the discretization of equation (5.1) (with the coefficient depicted in Figure 3) on a $400 \times 400$ grid, where

$$f_1(x_1,x_2) = \sin(2x_1)\,\sin(2x_2).$$

The PCG solver for the system of equations (5.6), with the inverse of the discrete Laplacian as the preconditioner, demonstrates robust convergence with rate $q \ll 1$. The next example examines the rank behavior in the singular value decomposition (SVD) of the matrix $\mathbf{u} \in \mathbb{R}^{n_1 \times n_2}$ representing the solution of equation (5.6) for the $12 \times 12$ periodic coefficient shown in Figure 4 (left). Figure 7 presents the analogous behavior of the singular values of the solution in the case of the $8 \times 8$ periodic coefficient.
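The rank studies reported in Figures 4, 5 and 7 amount to inspecting the singular values of the solution reshaped into a matrix. A minimal sketch of this post-processing step (illustrative only; the name eps_rank is ours) reads:

```python
import numpy as np

def eps_rank(u, n1, n2, eps=1e-8):
    """Reshape a computed solution vector into an n1 x n2 matrix and return its
    eps-rank, i.e. the number of singular values above eps times the largest one."""
    s = np.linalg.svd(np.asarray(u).reshape(n1, n2), compute_uv=False)
    return int(np.sum(s > eps * s[0])), s
```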

Figure 6: The right-hand side and solution for the periodic oscillating coefficients shown in Figure 3.

Comparing Figures 4 and 7 indicates that the exponential decay of the approximation error in the rank parameter is stable with respect to the size of the $L \times L$ lattice structure of the coefficient, i.e., the behavior of the singular values remains almost the same for different parameters $\epsilon = 1/L$.

Figure 7: Accuracy of the rank decomposition of the solution vs. rank parameter for the 8 × 8 periodic coefficient and grid size n × n.

Our iterative scheme involves only the matrix-vector multiplication with the stiffness matrix A, which has the small Kronecker rank 2R, and the action of the preconditioner, defined by the approximate inverse of the Laplacian-type matrix. The latter has a low Kronecker rank of order $R_B = O(|\log\varepsilon|^2)$, as shown above.

Given a rank-1 vector $\mathbf{u} = \mathbf{u}_1 \otimes \mathbf{u}_2$, the standard property of Kronecker product matrices,

$$A\mathbf{u} = A_1\mathbf{u}_1 \otimes M_2\mathbf{u}_2 + M_1\mathbf{u}_1 \otimes A_2\mathbf{u}_2,$$

indicates that the matrix-vector multiplication enlarges the initial rank by a factor of 2; the same holds for the action of the preconditioner. Hence each iteration step should be supplemented with a rank truncation procedure, which can be implemented adaptively with respect to a chosen approximation threshold or a fixed bound on the rank parameter.
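A minimal sketch of such a rank-truncated iteration for d = 2 is given below. It is our illustration only: for brevity we show a preconditioned Richardson iteration (the experiments above use PCG), the iterate is stored as a full $n_1 \times n_2$ matrix and truncated by SVD rather than kept in a factored low-rank format, and the Kronecker factors are assumed to be available as dense arrays. A term $A^{(1)} \otimes A^{(2)}$ acts on the iterate in matrix form as $U \mapsto A^{(1)} U (A^{(2)})^{\top}$ (row-major vectorization).

```python
import numpy as np

def truncate(U, tol=1e-8, rmax=50):
    """SVD-based rank truncation of the matrix-form iterate."""
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    r = min(rmax, max(1, int(np.sum(s > tol * s[0]))))
    return (P[:, :r] * s[:r]) @ Qt[:r, :]

def kron_apply(terms, U):
    """Apply sum_m A1_m (x) A2_m to u in matrix form: sum_m A1_m @ U @ A2_m^T."""
    return sum(A1 @ U @ A2.T for A1, A2 in terms)

def truncated_richardson(A_terms, B_terms, F, tau=1.0, iters=50, tol=1e-8):
    """Preconditioned Richardson iteration U <- U + tau * B (F - A U) with rank
    truncation after every step; A_terms, B_terms are lists of Kronecker factor pairs."""
    U = np.zeros_like(F)
    for _ in range(iters):
        R = F - kron_apply(A_terms, U)                       # residual (rank grows)
        U = truncate(U + tau * kron_apply(B_terms, R), tol)  # step followed by truncation
    return U
```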

Remark 5.1.

Notice that for d = 3 the transformed matrix $A_\circ$ takes the form

$$A_\circ = A_1 \otimes I_2 \otimes I_3 + I_1 \otimes A_2 \otimes I_3 + I_1 \otimes I_2 \otimes A_3,$$

i.e., it obeys the d-term Kronecker sum representation. Hence, in the general case of $d \ge 2$ and $R \ge 1$, the Kronecker rank of the matrix $A_\circ$ is given by

$$\operatorname{rank}_{\mathrm{Kron}}(A_\circ) = d\,R.$$
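For illustration (this snippet is ours, not part of the paper), the d = 3 Kronecker sum of Remark 5.1 with R = 1 can be assembled on tiny random 1D factors and its action checked term by term, confirming that only the d one-dimensional factors ever need to be stored:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
A1, A2, A3 = (rng.standard_normal((n, n)) for _ in range(3))   # placeholder 1D factors
I = np.eye(n)

# Full assembly (only for this toy check; never formed in practice).
A_circ = (np.kron(np.kron(A1, I), I)
          + np.kron(np.kron(I, A2), I)
          + np.kron(np.kron(I, I), A3))

# Action on a rank-1 vector u1 (x) u2 (x) u3 using only the 1D factors.
u1, u2, u3 = (rng.standard_normal(n) for _ in range(3))
u = np.kron(np.kron(u1, u2), u3)
Au = (np.kron(np.kron(A1 @ u1, u2), u3)
      + np.kron(np.kron(u1, A2 @ u2), u3)
      + np.kron(np.kron(u1, u2), A3 @ u3))
print(np.allclose(A_circ @ u, Au))   # True: Kronecker rank d*R = 3
```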

6 Conclusions

We present a preconditioned iteration method for solving an elliptic type boundary value problem in $\mathbb{R}^d$ with the operator generated by a quasi-periodic structure with rapidly changing coefficients characterized by a small length parameter ϵ. We use a tensor product FEM discretization that makes it possible to approximate the stiffness matrix A in the form of a low-rank Kronecker sum. The preconditioner $A_\circ$ is constructed on the basis of a certain averaging (homogenization) procedure applied to the initial equation coefficients, such that the inversion of $A_\circ$ is much simpler than the inversion of A. We prove contraction of the iteration method and establish explicit estimates of the contraction factor q < 1. For typical quasi-periodic structures we deduce fully computable two-sided a posteriori estimates that are able to control numerical solutions on any iteration.

We apply the tensor-structured approximation, which is especially efficient if the equation coefficients admit low-rank representations and the algebraic operations are performed in tensor structured formats. Under moderate assumptions the storage and solution complexity of our approach depends only weakly (merely linear-logarithmically) on the frequency parameter 1/ϵ. Numerical tests demonstrate that the FEM solution admits an accurate low-rank separable approximation, which is the basic prerequisite for the application of tensor numerical methods to problems of geometric homogenization.

The approach allows further enhancement based on the quantized-TT (QTT) tensor approximation, which is a topic for future research. Another direction is related to a fully tensor structured implementation of the computable two-sided a posteriori error estimates. An interesting open question is how far the presented approach can be extended to the numerical analysis of elliptic equations with rather unstructured jumping coefficients arising in stochastic homogenization; see, e.g., [6].

Acknowledgements

SR appreciates the support provided by the Max Planck Institute for Mathematics in the Sciences (Leipzig, Germany) during his scientific visit in 2016. The authors are grateful to Dr. V. Khoromskaia (MPI MiS, Leipzig) for her help with the numerical experiments.

References

[1] N. S. Bakhvalov and G. Panasenko, Homogenisation: Averaging Processes in Periodic Media. Mathematical Problems in the Mechanics of Composite Materials, Springer, Berlin, 1989.

[2] P. Benner, V. Khoromskaia and B. N. Khoromskij, Range-separated tensor formats for numerical modeling of many-particle interaction potentials, preprint (2016), http://arxiv.org/abs/1606.09218.

[3] A. Bensoussan, J.-L. Lions and G. Papanicolaou, Asymptotic Analysis for Periodic Structures, North-Holland, Amsterdam, 1978.

[4] S. Dolgov, V. Kazeev and B. N. Khoromskij, The tensor-structured solution of one-dimensional elliptic differential equations with high-dimensional parameters, Preprint 51/2012, MPI MiS, Leipzig, 2012.

[5] I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij, Hierarchical tensor-product approximation to the inverse and related operators in high-dimensional elliptic problems, Computing 74 (2005), 131–157.

[6] A. Gloria and F. Otto, Quantitative estimates on the periodic approximation of the corrector in stochastic homogenization, ESAIM Proc. 48 (2015), 80–97.

[7] R. Glowinski, J.-L. Lions and R. Trémolières, Analyse Numérique des Inéquations Variationnelles, Dunod, Paris, 1976.

[8] V. V. Jikov, S. M. Kozlov and O. A. Oleinik, Homogenization of Differential Operators and Integral Functionals, Springer, Berlin, 1994.

[9] L. V. Kantorovich and V. I. Krylov, Approximate Methods of Higher Analysis, Interscience, New York, 1958.

[10] V. Kazeev, O. Reichmann and C. Schwab, Low-rank tensor structure of linear diffusion operators in the TT and QTT formats, Linear Algebra Appl. 438 (2013), no. 11, 4204–4221.

[11] V. Khoromskaia and B. N. Khoromskij, Grid-based lattice summation of electrostatic potentials by assembled rank-structured tensor approximation, Comput. Phys. Commun. 185 (2014), no. 12, 3162–3174.

[12] V. Khoromskaia and B. N. Khoromskij, Tensor approach to linearized Hartree–Fock equation for lattice-type and periodic systems, preprint (2014), https://arxiv.org/abs/1408.3839.

[13] V. Khoromskaia and B. N. Khoromskij, Tensor numerical methods in quantum chemistry: From Hartree–Fock to excitation energies, Phys. Chem. Chem. Phys. 17 (2015), 31491–31509.

[14] B. N. Khoromskij, Tensor-structured preconditioners and approximate inverse of elliptic operators in ℝ^d, Constr. Approx. 30 (2009), 599–620.

[15] B. N. Khoromskij, O(d log N)-quantics approximation of N-d tensors in high-dimensional numerical modeling, Constr. Approx. 34 (2011), 257–280.

[16] B. N. Khoromskij, Tensors-structured numerical methods in scientific computing: Survey on recent advances, Chemometr. Intell. Lab. Syst. 110 (2012), 1–19.

[17] B. N. Khoromskij and S. Repin, A fast iteration method for solving elliptic problems with quasiperiodic coefficients, Russian J. Numer. Anal. Math. Modelling 30 (2015), no. 6, 329–344.

[18] B. N. Khoromskij, S. Sauter and A. Veit, Fast quadrature techniques for retarded potentials based on TT/QTT tensor approximation, Comput. Methods Appl. Math. 11 (2011), no. 3, 342–362.

[19] B. N. Khoromskij and G. Wittum, Numerical Solution of Elliptic Differential Equations by Reduction to the Interface, Lect. Notes Comput. Sci. Eng. 36, Springer, Berlin, 2004.

[20] J.-L. Lions and G. Stampacchia, Variational inequalities, Comm. Pure Appl. Math. 20 (1967), 493–519.

[21] O. Mali, P. Neittaanmäki and S. Repin, Accuracy Verification Methods. Theory and Algorithms, Springer, New York, 2014.

[22] P. Neittaanmäki and S. Repin, Reliable Methods for Computer Simulation. Error Control and a Posteriori Estimates, Elsevier, Amsterdam, 2004.

[23] I. V. Oseledets and S. V. Dolgov, Solution of linear systems and matrix inversion in the TT-format, SIAM J. Sci. Comput. 34 (2012), no. 5, A2718–A2739.

[24] A. Ostrowski, Les estimations des erreurs a posteriori dans les procédés itératifs, C. R. Acad. Sci. Paris Sér. A–B 275 (1972), A275–A278.

[25] S. Repin, A posteriori error estimation for variational problems with uniformly convex functionals, Math. Comp. 69 (2000), no. 230, 481–500.

[26] S. Repin, A Posteriori Estimates for Partial Differential Equations, Walter de Gruyter, Berlin, 2008.

[27] S. Repin, T. Samrowski and S. Sauter, Combined a posteriori modeling-discretization error estimate for elliptic problems with complicated interfaces, ESAIM Math. Model. Numer. Anal. 46 (2012), no. 6, 1389–1405.

[28] S. Repin, S. Sauter and A. Smolianski, A posteriori estimation of dimension reduction errors for elliptic problems on thin domains, SIAM J. Numer. Anal. 42 (2004), no. 4, 1435–1451.

[29] E. Zeidler, Nonlinear Functional Analysis and Its Applications. I: Fixed-Point Theorems, Springer, New York, 1986.

Received: 2017-2-24
Revised: 2017-4-28
Accepted: 2017-5-16
Published Online: 2017-6-17
Published in Print: 2017-7-1

© 2017 Walter de Gruyter GmbH, Berlin/Boston