Adaptive space–time finite element methods for parabolic optimal control problems

: We present, analyze, and test locally stabilized space–time finite element methods on fully unstructured simplicial space–time meshes for the numerical solution of space–time tracking parabolic optimal con-trol problems with the standard L 2 -regularization. We derive a priori discretization error estimates in terms of the local mesh-sizes for shape-regular meshes. The adaptive version is driven by local residual error indicators, or, alternatively, by local error indicators derived from a new functional a posteriori error estimator. The latter provides a guaranteed upper bound of the error, but is more costly than the residual error indicators. We perform numerical tests for benchmark examples having different features. In particular, we consider a discontinuous target in form of a first expanding and then contracting ball in 3d that is fixed in the 4d space– time cylinder.


Introduction
Let us consider the following space-time tracking optimal control problem: For a given target function y d ∈ L 2 (Q) (desired state) and for some appropriately chosen regularization (cost) parameter ϱ > 0, find the state y ∈ Y and the control u ∈ U minimizing the cost functional subject to the linear parabolic initial-boundary value problem (IBVP): where Q := Ω × (0, T), Σ := ∂Ω × (0, T), Σ 0 := Ω × {0}, T > 0 is the final time, ∂ t denotes the partial time derivative, div x is the spatial divergence operator, ∇ x is the spatial gradient, and the source term u on the right-hand side of the parabolic PDE (1.2) serves as control.The spatial domain Ω ⊂ ℝ d , d = 1, 2, 3, is supposed to be bounded and Lipschitz.The coefficient ν is assumed to be uniformly positive and bounded, i.e., there exist positive constants ν 1 and ν 2 such that 0 < ν 1 ⩽ ν(x, t) ⩽ ν 2 for almost all (x, t) ∈ Q. (1.3) For simplicity, we here consider only scalar coefficients, but it is clear that the scalar coefficient ν can be replaced by a symmetric, and uniformly positive definite and bounded d × d coefficient matrix.
In the standard setting, that was already investigated in the famous book by J. L. Lions [38], the distributed control u is taken from U = L 2 (0, T; L 2 (Ω)) = L 2 (Q), and thus the cost of the control is also measured in the L 2 (Q)-norm that mathematically serves as regularization term in (1.1).Since the state equation (1.2) has a unique solution y ∈ Y := Y 0 = {v ∈ L 2 (0, T; H 1 0 (Ω)) : ∂ t v ∈ L 2 (0, T; H −1 (Ω)), v = 0 on Σ 0 } = {v ∈ W(0, T) : v = 0 on Σ 0 }, one can conclude the existence of a unique control u ∈ U minimizing the cost functional J(Su, u), where S is the solution operator mapping u ∈ U to the unique solution y ∈ Y of (1.2) (see, e.g., [38,56]).There is a huge number of publications devoted to the numerical solution of optimal control problems (1.1)-(1.2) with the standard L 2 -regularization (see, e.g., [6,23,56]).The overwhelming majority of the publications uses some time-stepping method or discontinuous Galerkin method for the time discretization in combination with some space-discretization method like the finite element method (see, e.g., [40,41]).The unique solvability of the optimal control problem can also be established by showing that the optimality system (KKT system) has a unique solution, since these problems are equivalent for quadratic cost functionals with linear constraints.In [37], the Banach-Necas-Babuska (BNB) theorem was applied to the optimality system to show its well-posedness.We refer the reader to [7,Th. 3.6] and [16,Th. 2.6] for the version of the BNB theorem, which was used in [37], and to the original papers [3,44].Furthermore, the discrete inf-sup condition, that does not follow from the inf-sup condition in the infinite-dimensional setting, was established for continuous space-time finite element discretizations on fully unstructured simplicial space-time meshes [37,Lem. 3.4].The discrete inf-sup condition implies stability of the discretization and a priori discretization error estimates (see also [3,4]).In connection with continuous space-time finite element discretizations for parabolic optimal control problems, we would like to mention the publications [19,43].A comprehensive overview of space-time methods for parabolic IBVP can be found in [53].
Besides the L 2 -regularization, other regularization respectively cost terms can be chosen in order to obtain certain desired effects.We here only mention the sparsity techniques where the L 1 term µ‖u‖ L 1 (Q) with the sparsity parameter µ is added to the L 2 -regularization term and directional sparsity techniques [11,21,49], control in measure spaces also leading to locally concentrated controls [10], and the energy regularization where U = L 2 (0, T; H −1 (Ω)) (see [36]).The energy regularization is motivated by applications in electrical engineering where controls u from L 2 (0, T; H −1 (Ω)) are admissible, i.e., controls that are concentrated on spatial hypersurfaces.Furthermore, the observation can be restricted to some subset of the space-time cylinder Q including Σ T := Ω × {0} (observation at the terminal time), or the control can be restricted to some subset of Q including the boundary Σ or some parts of Σ (control via Dirichlet, Neumann, and Robin boundary conditions).
In this paper, we consider locally stabilized space-time finite element methods on fully unstructured simplicial space-time meshes for the numerical solution of the space-time tracking parabolic optimal control problem (1.1)-(1.2) with the standard L 2 -regularization as model problem, although the space-time finite element technique presented in this paper can certainly be applied to other optimal control problems as well.In our former works [32,33], we have successfully applied the locally stabilized space-time finite element methods to the state equation (1.2) with right-hand sides u from L 2 (Q) and with special distributional right-hand sides u from L 2 (0, T; H −1 (Ω)).In particular, we have proposed adaptive space-time finite element schemes based on local error indicators that are derived from the residual error indicator proposed in [52], and Repin's functional error estimator providing a guaranteed upper bound for any admissible approximation [46].In our note [34], we report on the first results for globally stabilized space-time finite element methods applied to the optimal control problem (1.1)-(1.2),but on quasi-uniform meshes characterized by a global mesh-size parameter h.There we used global time-upwind finite element test functions v h + ϑh 2 ∂ t v h and q h − ϑh 2 ∂ t q h for constructing consistent finite element schemes approximating the reduced optimality system.We mention that upwind test functions were introduced by Hughes and Brooks for constructing stable finite element schemes for stationary convection-diffusion problems in [24].This stabilization technique is called SUPG, and was later used by Johnson and Saranen [27] for transient problems; see also [25] for the related Galerkin Least-Squares finite element methods, and [5,31] for more recent papers on these stabilization techniques.In the case of unstructured, but shape-regular meshes naturally produced by adaptive schemes, one should replace the global discretization parameter h by the local mesh-size h K that can be different for every element K from the triangulation T h .This simple replacement of h by h K works well for the state equation alone [32,33], but not for the reduced optimality system.Therefore, in this paper, we introduce a differentiable mesh-density function λ h (x, t) in order to prove coercivity of the mesh-dependent bilinear form a h (⋅, ⋅) that corresponds to the finite element scheme for the reduced optimality system.Coercivity together with Galerkin orthogonality and extended boundedness immediately leads to a best-approximation estimate from which one can derive convergence rate estimates by means of interpolation error estimates under additional regularity assumptions.Now adaptivity can be performed simultaneously in space and time on the basis of local error indicators.We use the residual error indicator that was introduced for the state equation in [52], and that was later used for optimal control problems as well [37].The residual error indicator is computational cheap, and works very well in all our numerical experiments.However, a rigorous proof of reliability and efficiency is still missing even for the state equation (cf.[52]).Using the inf-sup condition for the reduced optimality system proved in [37], we are able to derive rigorously a functional a posteriori error estimator that provides a guaranteed upper bound of the discretization error and local error indicators for adaptivity.Finally, we have to solve one system of finite element equations for defining all space-time unknowns all at once.The system matrix is non-symmetric, but positive definite.We solve this system by means of a parallel version of the flexible General Minimal Residual (FGMRES) method [47], preconditioned by a block-diagonal algebraic multigrid (AMG) preconditioner.The parallelization is relatively easy since it can be done simultaneously in space and time as known from elliptic problems.We emphasize that our stabilized scheme is coercive with respect to a mesh-dependent norm that is stronger than the norm used in the inf-sup condition in [37].
The remainder of the paper is organized as follows.Section 2 introduces the basic notations and states some preliminary results on the solvability and the space-time discretization of the parabolic initialboundary value problem (1.2) that serves as state equation in the optimal control problem studied in this paper.Section 3 states the reduced optimality system characterizing the unique solution of the optimal control problem (1.1)-(1.2).The space-time finite element discretization of the reduced optimality system is derived in Section 4. In Section 5, we establish the coercivity of the bilinear form corresponding the finite element scheme.A priori estimates of the discretization error are derived in Section 6. Section 7 provides the derivation of a new functional a posteriori error estimator, and briefly describes the construction of simultaneous space-time adaptive finite element schemes.The algebraic system corresponding to the finite element scheme and the solver are presented in Section 8. Section 9 is devoted to the presentation and discussion of the numerical results.Finally, we draw some conclusion and give an outlook in Section 10.
The standard weak or variational formulation of the state equation (1.2) reads as follows: with the bilinear form b(⋅, ⋅) : Y × V → ℝ, defined by the identity and the duality product ⟨⋅, ⋅⟩ Q : V * × V → ℝ, and for given u ∈ V * .We mention that the first integral in (2.2) has to be understood as duality pairing as well since ∂ t y, in general, belongs to V * .We also may integrate by parts with respect to time.Then we can include the initial conditions as natural conditions into the variational formulation, and we look for a weak solution y in H vanishing on the top Σ T := Ω × {T} of the space-time cylinder Q (see, e.g., [28]).The standard methods to show existence and uniqueness are Galerkin's method in space and a priori estimates.One can also use the BNB theorem as it was done in [50].Indeed, the bilinear form b(⋅, ⋅) fulfills the following three conditions: (BNB1) boundedness: that are sufficient and necessary for B : Y → V * being an isomorphism, where the operator B is defined by the bilinear form: Therefore, the solution operator S = B −1 is a well-defined bounded linear operator from V * onto Y.Moreover, if u ∈ L 2 (Q), then the unique solution y belongs to the space provided that the coefficient ν fulfills some additional conditions (see [14]).This property is called maximal parabolic regularity.We mentioned that already Ladyzhenskaya proved maximal parabolic regularity for the case ν = 1 in [28].Until now many papers have been published on this topic (see, e.g., [18]).

Optimality system
Eliminating the control u from the optimality system by means of the so-called gradient equation p + ϱu = 0, we get the reduced optimality system.The weak form of the reduced optimality system can be written in the form: Find the state y ∈ Y 0 and the adjoint state p ∈ P T such that holds for v, q ∈ V, where P T := {p ∈ W(0, T) : p = 0 on Σ T }.It is shown in [37,Th. 3.3] by means of the BNB theorem that the variational reduced optimality system (3.1)-(3.2) is well-posed.
In addition to (1.3), we assume that the coefficient function ν(x, t) is of bounded variation in t for almost all x ∈ Ω.Then we can conclude maximal parabolic regularity for the parabolic equations (3.1) and (3.2) because the corresponding right-hand sides − 1 ϱ p and y − y d belong to L 2 (Q) (see [14]).Therefore, ∂ t y and L x y := −div x (ν ∇ x y) as well as ∂ t p and L x p belong to L 2 (Q) too.In this case, the variational reduced optimality system (3.1)-(3.2) can be rewritten in the strong form as coupled forward and backward system of parabolic PDEs: Find holds.Further regularity results for parabolic problems can be found, e.g., in [17,29].Later we need such regularity results for convergence rate estimates.

Space-time finite element discretization
We now use the PDE optimality system (3.3)-(3.4)as starting point for constructing stable and consistent space-time finite element schemes on fully unstructured simplicial triangulations of the space-time cylinder Q.
Thus, let us first consider a decomposition T h = {K} of Q into shape-regular simplicial elements, i.e., Q = ⋃ K∈T h K, and K ∩ K  = ⌀ for all K and K  from T h with K ̸ = K  (see, e.g., [7,16] for more details).Once a shape-regular triangulation is available, we define the space-time finite element spaces consists of all continuous and piecewise polynomial functions (provided that x K (⋅) is affine-linear), where x K (⋅) denotes the map from the reference element K (unit simplex) to the finite element K ∈ T h , and ℙ k ( K) is the space of polynomials of degree k ∈ ℕ := {1, 2, . ..} on the reference element K.
In addition to the assumptions made above, we further assume that ν is piecewise smooth in the sense that div x (ν ∇ x w h )| K ∈ L 2 (K) for all w h from Y 0h or P Th and for all K ∈ T h .Now we multiply the first PDE (3.3) by a upwind test function , and the second one (3.4) by q h − ϑλ 2 h ∂ t q h with q h ∈ P Th , where ϑ is some positive scaling parameter and λ h ∈ W 1 ∞ (Q) is a mesh-density function which we will choose later.Then, integrating over K, integrating by parts in the elliptic terms where the scaling parameter ϑ does not appear, and summing over all K ∈ T h , we arrive at the variational consistency identity with the combined bilinear and linear forms respectively.The continuous and piecewise differentiable mesh-density function λ h (x, t) should be chosen in such a way that the inequalities hold for all (x, t) ∈ K, K ∈ T h , and for all meshes under consideration, where h K = diam(K), and λ 0 , λ 0 and λ 1 are some positive generic constants.We can easily define the mesh density function λ h (x, t) by means of the finite element space S 1 h (Q) = span{φ i : i = 1, 2, . . ., n h }, spanned by the nodal basis functions φ i (x, t), as follows where h i denotes the average of the length of all edges meeting at the vertex (x i , t i ), i.e. the sum of the lengths divided by the number of these edges.We note that λ h (( , where δ i,j denotes Kronecker's symbol.The conformity and the shape regularity of the mesh immediately imply inequalities (4.4).The general discretization parameter h can be chosen as where n h is the number of vertices in T h , N h = dim Y 0h , and M h = dim P Th .We note that h K = O(h) for all K ∈ T h in the case of a (quasi) uniform mesh.
The consistent finite element scheme corresponding to (4.1) now reads as follows: Subtracting (4.5) from (4.1), we immediately obtain the Galerkin orthogonality relation which is crucial for deriving discretization error estimates.

Coercivity and unique solvability
We will now show that the bilinear form a h is coercive on Y 0h × P Th with respect to the norm for some suitably chosen scaling parameter ϑ, with where α is some positive parameter.We choose α = 1 if not stated otherwise.First, we observe that the identity is valid for all (v h , q h ) ∈ Y 0h × P Th .Here we have used the identities and We note that, in the latter identity, the boundary integral vanishes for all finite element functions (v h , q h ) from Y 0h × P Th .The special case that 1. k = 1 and ν = const on K for all K ∈ T h , and 2. ∂ t λ h = 0, e.g., uniform mesh h i = h, as considered in [34], makes the terms with the second-order elliptic operator L x and the last term in (5.2) zero.Then the identity (5.2) gives that α is chosen as 0.5 in the definition of the norm (5.1), i.e., the bilinear form is coercive on Y 0h × P Th with the coercivity constant 1.
In the general case, we have to estimate these terms.Beside Cauchy's and Young's inequalities, we need the generalized Friedrichs' inequality where c F (ν) can obviously be estimated from above by the Friedrichs constant c F,Ω of the spatial domain Ω times (ν 1 ) −1/2 , i.e., c 1 , and the special inverse inequality λ max,K := max where λ max,K can be computed from a small generalized matrix eigenvalue problem for every finite element K ∈ T h (see [32]).The upper bound can be shown by mapping K to the reference element K. Now, using these inequalities, we can proceed to estimate (5.2) from below as follows: where ε and ε are positive parameters from Young's inequalities, , and h = max K∈T h h K .The coercivity constant µ c is positive for properly chosen positive parameters ε, ε, and ϑ.
with a = max K∈T h a K .The coercivity (5.5) of the bilinear form a h on Y 0h × P Th immediately implies that the finite element scheme (4.5) can only have one solution, but, in the finite dimensional case, uniqueness yields existence.Thus, the space-time finite element scheme (4.5) has a unique solution that can be determined via the solution of corresponding algebraic system of finite element equations (see Section 8).
Remark 5.1.In the case of a quasi-uniform mesh, formally defined by the setting h K = h and h i = h, i.e., λ h = h, ∂ t λ h = 0, and, therefore, λ 0 = λ 0 = 1 and λ 1 = 0, the coercivity estimate (5.5) yields the coercivity estimate presented in [34].We note that, for k = 1, L x v h = L x q h = 0 on K ∈ T h for all (v h , q h ) ∈ Y 0h × P Th provided that ν is elementwise constant.

A priori discretization error estimates
In order to derive a priori discretization error estimates, we first need to establish the extended boundedness of the bilinear form and for all y ∈ Y 0h + Y 0 ∩ H L,1 (Q) and p ∈ P Th + P T ∩ H L,1 (Q), where The boundedness constant µ b will be defined below.At first, using integration by parts with respect to t, we get the identity and (v h , q h ) ∈ Y 0h × P Th , and that is the starting point for estimating the right-hand side from above.Indeed, Young's and Cauchy's inequalities, and the generalized Friedrichs inequality (5.3), yield the following estimates: , where we use the shorter notations ‖ ⋅ ‖ D for the L 2norms ‖ ⋅ ‖ L 2 (D) .The following best-approximation error estimate is now an easy consequence of the Galerkin orthogonality (4.6), the coercivity estimate (5.5), the extended boundedness estimate (6.1), and the triangle inequality.
Theorem 6.1.Let us assume that the coefficient ν fulfills the conditions leading to maximal parabolic regularity, i.e., the unique solution (y, p) of the optimality system Furthermore, let ϱ be some fixed positive regularization parameter, and let ϑ be a fixed scaling parameter such that condition (5.6) is satisfied.Then the discretization error estimate holds, where (y h , p h ) ∈ Y 0h × P Th is the unique solution of the finite element optimality system (4.5).
Under additional regularity assumptions imposed on the solution (y, p), the best-approximation error estimate (6.2) yields discretization error estimates in terms of the mesh-sizes.Indeed, choosing v h = I h y ∈ Y 0h and q h = I h p ∈ P Th in the infimum at the right-hand side of (6.2), we can show the following theorem, where I h is the usual nodal finite element interpolation operator for sufficiently smooth solutions y, p , or a quasi-interpolation operator à la Clément [12] and Scott-Zhang [48] for low-regularity solutions y, p ∈ H m+1 (Q) with m > 0 (see also [8]).
Theorem 6.2.Let the assumptions of Theorem 6.1 hold.We further assume that all transformations x K (⋅) from the reference element K to the physical finite element K ∈ T h are affine-linear.Then, in the smooth case, when assuming y, p with s = min{ℓ, k + 1} and some generic positive constants c 1 and c 2 , whereas, in the non-smooth case, when only assuming y, p ∈ H m+1 (Q) for some m such that (d + 1)/2 ⩾ m + 1 > 1, and ν∇ x y, ν∇ x p ∈ (H m (K)) d , we arrive at the estimate and S K := {K  ∈ T h : K ∩ K  ̸ = ⌀} denotes the neighborhood of the simplex K ∈ T h .
Proof.Using standard interpolation respectively quasi-interpolation error estimates (see, e.g., [8,12,22,48] for space interpolation in the case of semi-norms), we can derive the discretization error estimates (6.3) and (6.4) from the best-approximation error estimate (6.2) along the line of the proofs for the state equations given in [32] and [33] for the smooth and non-smooth cases, respectively.Indeed, in the norms of the right-hand side of (6.2), the same (quasi-) interpolation error terms appear as in the corresponding bestapproximation error estimates in [32,33].The only difference consists in the inclusions of the positive meshdensity function λ h or 1/λ h as weights into the norms.Using the local lower and upper bounds (4.4) of the mesh-density function λ h , we can estimate these terms as follows: and where e denotes the (quasi) interpolation errors y − I h y or p − I h p for the state or co-state, respectively.For the further estimation of (6.5) and (6.6), we will use standard nodal interpolation or quasi-interpolation error estimates in dependence of the regularity of the state y or the co-state p.The further estimation of ‖L x e‖ 2 requires a special treatment.Let us now derive the estimation of ‖L x e‖ 2 L 2 (K) = ‖div x (ν∇ x (y − I h y))‖ 2 L 2 (K) for the state y in the low-regularity case y ∈ H m+1 (Q) for some (d + 1)/2 ⩾ m + 1 > 1, and ν∇ x y ∈ (H m (K)) d .Inserting and subtracting the average of the flux q = −ν∇ x y over K ∈ T h and using the inverse inequality ‖div we can estimate the term ‖div x (ν∇ x (y − I h y))‖ 2 L 2 (K) as follows: where I h is a quasi-interpolation operator like mentioned above.The last term can be estimated by means of the local approximation properties of I 0 h and I h .Indeed, inserting and subtracting q in the last term of the above estimate, we obtain with some generic positive constant c, where we have used Poincaré's inequality for estimating the first term.The high-regularity case is straightforward since one can again use the nodal interpolation operator, and one can estimate ‖div x (ν∇ x (y − I h y))‖ 2 L 2 (K) directly.We always assume that the coefficient ν restricted to an element K is sufficiently smooth, but ν can be discontinuous globally.It is clear that the same estimations can be derived for the co-state p.
We mention that the semi-norms in the estimates (6.3) and (6.4) have to be replaced by the corresponding full norms in the case of non-affine linear mappings x K (⋅) from K to K ∈ T h , e.g., if one uses the popular isoparametric elements (k ⩾ 2) or general non-linear mappings in order to represent (or approximate) curved interfaces or boundaries.Regularity results for the solution (y, p) of the optimality system (3.1)-(3.2) can be derived from known regularity results for parabolic initial-boundary value problems [17,28,29] by simple bootstrapping arguments.Here we always assume that the coefficient ν is as smooth as necessary.Such kind of bootstrapping arguments can be found in [19] and [36,Rem. 3.3] for L 2 and energy regularizations, respectively.In the case considered in our paper, we always assume maximal parabolic regularity, i.e., ∂ t y, ∂ x i y, L x y and ∂ t p, ∂ x i p, L x p belong to L 2 (Q).The a priori convergence rate estimates of Theorem 6.2 are based on additional (isotropic) regularity information y, p ∈ H 1+m (Q), but not on anisotropic regularity information.Anisotropic regularity information can be the starting point for anisotropic mesh refinement that can be done better for hexahedral elements than simplicial elements; see, e.g., [35] for adaptive hexahedral space-time finite element approximations to parabolic initial boundary value problems, where we consider a parabolic problem on a 'slit' spatial domain with a solution that is analytic in t and has low spatial regularity due to the elliptic singularity at the top of the slit.
In the case of low-regularity solutions when singularities in the time or/and space derivatives appear, the convergence rate is affected by these singularities, e.g., we observe O(h s−1 ) = O(h m ) instead of O(h) for linear shape functions (k = 1) if (y, p) only belongs to H 1+m (Q), where s = min{1 + m, 2} = 1 + m for 0 < m ⩽ 1.The full rate O(h) can be recovered by mesh grading that was already used for elliptic boundary value problems in early works by Oganesyan and Rukhovets [45]; see also [2] for more recent results, and [13] for spacetime finite element mesh grading.However, in order to use the mesh grading technique, we must know the strengths of the singularities and their positions in space and time.Thus, in practice, one wants to implement an adaptive finite element method (AFEM) that is based on local error indicators and automatic adaptive mesh refinement.Such an adaptive finite element code should do the same job as an a priori mesh grading; see also [30] for adaptive space-time Isogeometric Analysis of parabolic initial boundary value problems.
In many practical applications, such low-regularity solutions can appear due to discontinuous coefficients, re-entrant corners in the computational domain, changing boundary conditions, and missing compatibility of the boundary conditions with the initial conditions.Discontinuous target states y d will lead to steep gradients, in particular, for small regularization parameters ϱ.We note that the target y d can change discontinuously in space and time (cf. the numerical example from Subsection 9.2).Hence, for such kind of non-smooth problems, it makes sense to use adaptive mesh refinements that are guided by local error indicators based on a posteriori error indicators or even estimators.The unstructured space-time approach considered in this paper allows for adaptivity in space and time simultaneously.This is a huge advantage of fully unstructured space-time techniques over the usual time-stepping methods or tensor-product techniques.
In order to drive the adaptive process, we will first use the error indicator of residual type proposed by Steinbach and Yang in [52] for parabolic PDEs; see also [37] for applications to parabolic optimal control problems.In particular, given the finite element state y h and co-state p h , we compute the local indicator where are nothing but the residual and jump terms of the coupled PDE optimality system in the strong form (3.3)-(3.4),respectively.While the residual error indicator is easy to realize and has a low computational cost, we have no rigorous analysis even in the case of the state equation only (cf.[52]).However, on the basis of the continuous inf-sup condition for the reduced optimality system given in [37], we can rigorously derive a functional a posteriori error estimate that provides a guaranteed upper bound and that can be easily localized.Moreover, this functional a posteriori error estimate is of general importance since it can be used to estimate the error between the exact weak solution (y, p) ∈ Y 0 × P T of the reduced variational optimality system (3.1)-(3.2) and any approximation (ỹ, p) belonging to the to the space Y 0 ∩ H 1 (Q) × P T ∩ H 1 (Q).Here we are interested in the finite element approximation (y h , p h ) ∈ Y 0h × P Th computed by our finite element scheme (4.5).The variational optimality system (3.1)-(3.2) can be written in the compact form obtained by adding (3.1) and (3.2).For simplicity, we assume that ν = 1.Thus the bilinear form a(⋅, ⋅) is given by the expression Now, using the inf-sup estimate [37,Lem. 3.1], we can estimate the difference between any approximation (ỹ, p) ∈ Y 0 ∩ H 1 (Q) × P T ∩ H 1 (Q) and the exact solution (y, p) ∈ Y 0 × P T as follows: , and ‖p‖ X T defined analogously.We note that the infsup constant 1/(2 √ 2) from [37] can be replaced by the improved constant 1/ √ 2 (see [51,54]).Now, adding and subtracting arbitrary vector functions τ and σ from regrouping the terms, integrating by parts, and using Cauchy's as well as Friedrichs's inequalities, we obtain the following estimates of the nominator on the right-hand side of (7.2): Combining this estimate with the inf-sup condition (7.2), we arrive at the following guaranteed upper bound that holds for an arbitrary pair In order to estimate the error of our finite element solution, we replace (ỹ, p) by the finite element solution (y h , p h ) of (4.5).We still have to choose appropriate fluxes τ and σ from H(div x , Q).Unfortunately, we cannot simply take the finite element fluxes ∇ x y h and ∇ x p h because they do not belong to H(div x , Q).Instead, we postprocess the finite element fluxes to produce improved fluxes τ (0) h that belong to H(div x , Q).We can then either use the postprocessed fluxes directly and insert them into the majorant M ⊕ , or we can try to improve the initial fluxes even further.In order to do so, we first estimate the squared majorant once more, and obtain a quadratic functional as an upper bound: Using τ (0) h and σ (0) h as initial guess, we minimize the functional M 2 + wrt.τ and σ using a few steps of the (preconditioned) CG method, respectively.Hence we obtain improved fluxes τ (1) h and σ (1) h , which we then insert into the localized functional M + to obtain local error indicators, i.e., that then drive the adaptive process.Note that we cannot compute any efficiency indices with respect to (7.3), since the norm in the dual space L(0, T; H −1 (Ω)) involved in the left-hand side, is not really computable.Since we have the estimates h ) which are used for computing the efficiency indices in Section 9.
Once we have computed the indicators for each K ∈ T h , we determine a set M h of (almost) minimal cardinality such that σ where σ ∈ (0, 1) is an a priori chosen bulk parameter.This marking strategy is called Dörfler marking [15].The elements of the set M h are then refined, where we might have to refine additional elements in order to maintain the conformity and shape-regularity of the mesh; see, e.g., [39] and [55] for a bisection strategy for simplices that works in any dimension.We use this bisection strategy for our numerical experiments in Section 9 as well.

Algebraic system and solvers
Let the finite element spaces Y 0h = span{φ j : j = 1, . . ., N h } and P Th = {ψ m : m = 1, . . ., M h } be spanned by the standard nodal finite element basis.Then each finite element function y h ∈ Y 0h and p h ∈ P Th can the represented in the form Inserting (8.1) into (4.5), and testing with basis functions φ i and ψ n , we obtain one big linear system for determining the unknown coefficient vectors y h = (y j ) j=1,...,N h ∈ ℝ N h and p h = (p m ) m=1,...,M h ∈ ℝ M h at once for the whole space time cylinder, where f h = (ℓ h (0, ψ n )) n=1,...,M h ∈ ℝ M h .The system matrix m,n=1,...,M h i,j=1,...,N h = ( obviously has a 2 × 2 block structure with the blocks K yy , K yp , K py , and K pp defined by the identities for all coefficient vectors y h , v h ∈ ℝ N h and p h , q h ∈ ℝ M h corresponding to the finite element functions y = y h , v = v h ∈ Y 0h and p = p h , q = q h ∈ P Th via the finite element isomorphism; cf.(8.1).Due to (5.5), the system matrix K h as well as the diagonal blocks K yy and K pp are non-symmetric, but positive definite.Hence the linear system (8.2) is solved by the FGMRES method [47], preconditioned by a block-diagonal AMG preconditioner C h = blockdiag(C yy , C pp ), where C yy = K yy (I − M yy ) −1 and C pp = K pp (I − M pp ) −1 with the AMG iteration matrices M yy and M pp resulting from one AMG V-cycles applied to the diagonal blocks of K h using zero initial guesses [20].In the next section, we test the performance of this AMG preconditioned FGMRES as single-grid solver starting with initial guess zero and stopping after reducing the initial residual by the factor 10 −8 , and in the nested iteration setting where the initial guess for the refinement level ℓ + 1 is interpolated from the last iterate at the preceding refinement level ℓ.The nested iteration approach aims at stopping the iteration when the corresponding discretization error is reached at level ℓ + 1 (see, e.g., [20]).

Numerical results
We implemented the space-time finite element scheme (4.5) using MFEM [42], a lightweight C++ library.The block-diagonal AMG preconditioner described in Section 8 was realized by means of BoomerAMG, an AMG implementation provided by the linear solver library hypre [26].Both libraries allow for easy parallelization.We stop the AMG preconditioned FGMRES iterations once the initial residual has been reduced by a factor of 10 −8 .In the nested iteration approach, we stop the iteration when the residual computed from the interpolated coarse grid approximation is reduced by the factor 10 −2 for linear elements, and by 10 −3 for quadratic and cubic elements.
The adaptive bulk parameter is always chosen as σ = 0.25.In the following, we present the numerical results for two benchmark problems, which where already used in [34,37] as test problems, but for d = 2, i.e., Q ⊂ ℝ 3 .Here we show the corresponding numerical results for the 4-dimensional space-time cylinder Q decomposed into shape-regular four-dimensional simplices (pentatopes).

Smooth problem with explicitly known solution
As first benchmark, we consider the space-time domain Q = (0, 1) 4 , i.e., d = 3, a constant diffusion coefficient ν ≡ 1, and the manufactured state, co-state, and control respectively, where a = −(2π 2 + 1)/(2π 2 + 2) and b = 1.The desired state y d is computed accordingly.This solution is highly smooth and has no local features.Hence, we expect to observe the optimal convergence rates predicted by Theorem 6.2.In the left plot of Fig. 1, we present the convergence history of uniform refinements as well as for adaptive refinements using the functional error estimator presented in Section 7, for different polynomial degrees k, and a fixed regularization parameter ϱ = 0.01.Indeed, we always obtain a rate of O(h k ).Note that the 'bumps' in the convergence rates are due to the bisection algorithm, where each uniform refinement only doubles the number of elements as opposed to the usual procedure of uniform refinements that subdivides each pentatope into 16 subpentatopes.In the right plot of Fig. 1, we present a comparison between the efficiency indices of the residual indicator, as well as the functional estimator.Here we observe that the efficiency index of the functional estimator is not influenced by the polynomial degree k, whereas the efficiency index of the residual indicator has only a mild dependence on the polynomial degree k.We already Fig. sketched the numerical procedure to obtain improved fluxes τ (1) h and σ (1) h in Sect.7. The postprocessing of the finite element fluxes ∇ x y h and ∇ x q h is done by means of a nodal averaging procedure, resulting in the initial guesses τ (0) h and σ (0) h for the AMG preconditioned CG method.We stop the PCG solver either after the initial residual has been reduced by a factor of 10 −2 , or after at most 10 CG iterations.The final iterates τ (1) h and σ (1) h are then inserted in the localized functional M + .Additionally, we also performed strong scaling tests for our AMG-preconditioned FGMRES solver (cf.Fig. 2).Here, we plot the solving time (including the setup time of the AMG preconditioner) for three different problem sizes, using linear elements for the left plot, and quadratic elements for the right plot.We can observe that, for linear elements, we obtain almost optimal scaling up to 64 cores, after which the speed up stagnates.This stagnation is due to fact that the problem sizes for each processor are too small, such that the parallel overhead is now the dominating factor.For quadratic elements, the overall speed up is optimal until 128 cores.Then the parallel overhead again affects the speed-up.We note that the size of the problems fitting to a single core can vary greatly depending on the polynomial degree k.Indeed, the biggest problem that we could solve on a single core has 2 084 994 dofs for linear elements, compared to 16 596 226 dofs for quadratic elements.
Finally, we also tested the linear solver in a nested iterations settings.In Table 1, we present the solving times and number of iterations, comparing non-nested and nested iterations, for different polynomial

Discontinuous target function
In order to showcase the adaptive capabilities of a space-time scheme, we once more consider the fourdimensional space-time domain, Q = (0, 1) 4 , and a constant diffusion coefficient ν ≡ 1.However, we now want to approximate the discontinuous desired state that is nothing but a first expanding and then contracting ball, which is however fixed in space-time.Moreover, we use a smaller, fixed regularization parameter ρ = 10 −6 .The discontinuity at the interface between the ball and its surrounding introduces steep gradients for the state and co-state, which require a rather fine local mesh-size in order to resolve the rapid changes properly.Hence, this problem is well suited for using adaptive refinements, driven by the residual error indicator introduced in Section 7. Indeed, the residual indicator leads to mesh refinements concentrated along the hypersurface of discontinuity (see Fig. 4), where we plot the finite element state y h , co-state p h , and control u h on cuts through the space-time mesh T h .These cuts are nothing but unit cubes, which we then further cut by three planes along the three spatial coordinates centered at (0.5, 0.5, 0.5).While we do not have explicit knowledge about the exact solutions, we can take a look at the convergence behavior of the sum of local indicators η(y h , p h ) 2 = ∑ K∈T h η K (y h , p h ) 2 (cf.Fig. 3).
Here we can indeed observe almost optimal rates of O(h k ) in terms of h = #dofs −1/(d+1) .However, in contrast to residual estimators for elliptic problems, where the sum of indicators is indeed proportional to the discretization error in the energy norm (see, e.g., [9]), no such result is available for residual indicators for parabolic problems on totally unstructured space-time decompositions.

Conclusions and outlook
We presented and analyzed new locally stabilized space-time finite element schemes on fully unstructured, but shape regular simplicial decompositions of the space-time cylinder Q for the numerical solution of parabolic optimal control problems.Such meshes typically arise from adaptive mesh refinement driven by local error indicators.The meshes are described by a continuous, elementwise differentiable mesh density function λ h (x, t) that also provides the right scaling of the time-upwind test functions.The resulting spacetime finite element scheme is consistent, coercive, and bounded.The boundedness of the corresponding bilinear form a h (⋅; ⋅) holds for an extended space with respect to the first couple of functions.Beside the finite element functions, this space also contains the solution (y, p) ∈ (Y 0 ∩ H L,1 (Q)) × (P T ∩ H L,1 (Q)) in the maximal parabolic regularity setting.Coercivity, extended boundedness, and Galerkin orthogonality, which results from consistency, immediately yield a best-approximation estimate from which convergence rate estimates follow under additional regularity assumptions.The first adaptive version of our space-time finite element method is based on residual error indicators, which work well in our numerical experiment.However, there is no theory concerning reliability, efficiency, convergence, and optimality in sense of the paper [9].Therefore, we proposed a second version that is based on a new functional error estimator providing a guaranteed upper bound of the error.However, the computation of the local error indicators are more costly than the residual ones.Finally, we have to solve one big system of finite element equations providing the finite element solution of the reduced optimality system all at once.We used a parallel version of an AMG preconditioned FGMRES that shows a excellent parallel performance in our numerical experiments.The construction and analysis of parallel solvers that are not only robust with respect to h and k, but also with respect to small regularization parameters ϱ is certainly a challenging task for future work.Simultaneous adaptivity and parallelization in space and time are big advantages of our really unstructured space-time finite element solver for the reduced optimality system that is forward and backward in time anyway.It is clear that this space-time technique can be applied to other optimal control problems and to other state equations including convection-diffusion problems and PDE system like the Navier-Stokes equations.

Funding:
The authors would like to thank the Austrian Science Fund (FWF) for the financial support under the grant DK W1214-04.

Fig. 3 :Fig. 4 :
Fig. 3: Convergence plot of the sum of local error indicators η 2 (y h , p h ) = ∑ K∈T h η 2 K (y h , p h ) for different polynomial degrees k.y h p h u h

1 :
Convergence rates in the mesh-dependent norm ‖(⋅, ⋅)‖ h (left); Eflciency indices of the residual indicator and the functional estimator wrt. to the error √ϱ‖∇ x (y − y h )‖ 2 Q + ‖∇ x (p − p h )‖ 2 Q (right); for different polynomial degrees k.Both plots share the same line styles.
Fig. 2:Strong scaling results of the block-AMG preconditioned FGMRES for fixed problem sizes; using linear elements (left), and quadratic elements (right).
Solving times and number of iterations for non-nested and nested iterations using 64 cores, where ℓ indicates the number of uniform refinements.The largest problem sizes are 32 278 018, 72 386 050, and 43 427 330 dofs, for the polynomial degrees k = 1, 2, and 3, respectively.Here, we can deduce that the AMG preconditioner works very well in the nested iterations framework, reducing the number of iterations as well as the overall solving time drastically.