Gradient estimates for an orthotropic nonlinear diffusion equation

We consider a quasilinear degenerate parabolic equation driven by the orthotropic $p-$Laplacian. We prove that local weak solutions are locally Lipschitz continuous in the spatial variable, uniformly in time.

1. Introduction 1.1. Aim of the paper. Let Ω ⊂ R N be an open bounded set and I ⊂ R an open bounded interval. We study the gradient regularity of local weak solutions to the following parabolic equation |u xi | p−2 u xi xi , in I × Ω.
Evolution equations of this type have been studied since the 60's of the previous century, especially by the Soviet school, see for example the paper [27] by Vishik. Equation (1.1) also explicitely appears in the monographs [21], [ In this paper, we will focus on the case p ≥ 2. We first observe that (1.1) looks quite similar to the more familiar one (1.2) u t = ∆ p u, in I × Ω, which involves the p−Laplace operator Indeed, both parabolic equations are particular instances of equations of the type u t = div∇F (∇u) with F : R N → R a convex function which satisfies the structural conditions ∇F (z), z ≥ 1 C |z| p and |∇F (z)| ≤ C |z| p−1 , for every z ∈ R N .
Then, the basic regularity theory equally applies to both (1.1) and (1.2). The standard reference in the field is DiBenedetto's monograph [12], where one can find boundedness results for the solution u (see [12, Chapter V]), Hölder continuous estimates for u (see [12, Chapter III]), as well as Harnack inequality for positive solutions (see [12, Chapter VI]). At a technical level, there is no distinction to be made between (1.1) and (1.2).
In contrast, when coming to the regularity of ∇u (i.e. boundedness and continuity), the situation becomes fairly more complicated. Let us start from (1.2). DiBenedetto and Friedman [14] have proved that the gradients of this equation are bounded. This is the starting point to obtain the continuity of the gradients for any We refer again to DiBenedetto's book for a comprehensive collection of results on the subject, notably to [12,Chapter VIII]. Since then, there has been a growing literature concerning the regularity for nonlinear, possibly degenerate or singular, parabolic equations (or systems), the main model of which is given by the evolutionary p−Laplacian equation (1.2). Without any attempt to completeness, we can just mention some classical references [15,13,28,8,9], up to the most recent contributions on the subject, given by [2,18,19], among others. However, none of these results apply to our equation (1.1). Indeed, all of them rely on the fact that the loss of ellipticity of the operator div∇F is restricted to a single point, since the Hessian D 2 F behaves as in the model case (1.2) where the elliptic character is lost only for z = 0. Such a property dramatically breaks down for our equation (1.1). Indeed, in this case, the function F has the following orthotropic structure The Hessian matrix of F now degenerates on an unbounded set, namely the set of those z ∈ R N such that one component z i is 0. As a consequence, the aforementioned references do not provide any regularity results for the gradients of the solutions.
The main goal of the present paper is to prove the L ∞ bound on ∇u for our equation (1.1), thus extending the result by DiBenedetto and Friedman to this more degenerate setting. In order to do this, we will need to adapt to the parabolic setting the machinery that we developed in [3,4,5,6] and [7], for degenerate equations with orthotropic structure. Indeed, the operator N i=1 |u xi | p−2 u xi xi , that we called orthotropic p−Laplacian, is the prominent example of this kind of equations. We also refer to [11] for an approach to this operator, based on viscosity techniques.

Main result.
In this paper, we establish the following regularity result which can be seen as the parabolic counterpart of our previous result [5,Theorem 1.1] for the elliptic case. In the statement below, the notation ∇u refers to the spatial variables, i. e. ∇u = (u x1 , . . . , u xN ).
Main Theorem. Let p > 2 and let u ∈ L p loc (I; W 1,p loc (Ω)) be a local weak solution of (1.1). Then ∇u ∈ L ∞ loc (I × Ω). More precisely, for every parabolic cube Q τ,R (t 0 , x 0 ) := (t 0 − τ, t 0 ) × (x 0 − R, x 0 + R) N ⋐ I × Ω, and every 0 < σ < 1, we have ∇u L ∞ (Qστ,σR (t0,x0)) ≤ C 1 Qτ,R(t0,x0) |∇u| p dt dx for a constant C = C(N, p) > 0. Remark 1.1 (Scalings). We observe that the equation (1.1) is invariant with respect to the "horizontal" and "vertical" scale changes u λ,µ (t, x) = µ u(µ p−2 λ p t, λ x), for every λ, µ > 0. Then it is easily seen that the a priori estimate (1.4) is invariant with respect to these scale changes. We point out that such estimate is the exact analogue of that for the evolutionary p−Laplacian, see [12,Theorem 5.1,Chapter VIII]. Occasionally, in the paper we will work with anisotropic parabolic cubes of the type The choice of cubes of this type could be loosely justified by a dimensional analysis of the equation. Indeed, by considering the quantity u as dimensionless and using the family of scalings for every λ > 0, we get the relation time ∼ (length) p . However, as it is well-known, estimates on cubes of the type Q R are too restrictive when looking at C 0,α estimates for ∇u. Indeed, in light of the so-called intrinsic geometry, it is much more important to work with local estimates on cubes Q τ,R , where the time scale τ is adapted to the solution itself: roughly speaking, we can take τ ∼ R 2 |∇u| p−2 . This explains the importance of having (1.4) with two independent scales R and τ . We refer to [12, Chapter VIII] for a description of the method of intrinsic scalings, where these heuristics are clarified. Remark 1.2 (Case 1 < p ≤ 2). When p = 2, the orthotropic parabolic equation (1.1) boils down to the standard heat equation, for which solutions are well-known to be smooth. For this reason, in our statement we restrict our attention to the case p > 2. However, we point out that by making the choice τ = R 2 and taking the limit as p goes to 2 in (1.4), we formally end up with the classical gradient estimate for solutions of the heat equation In light of the previous remark, in the case p = 2 the relation is now the natural one.
As for the singular case p < 2, this is somehow simpler than its degenerate counterpart. In this case, the local Lipschitz regularity of solutions to (1.1) can be directly inferred from [22,Theorem 1], under the restriction Indeed, the result of [22] covers (among others) the case of parabolic equations of the form under the following assumptions on the convex function F It is not difficult to see that the orthotropic function (1.3) for p < 2 matches both requirements. Indeed, observe that thanks to the fact that p − 2 < 0. Thus in the subquadratic case, the orthotropic structure helps more than it hurts, in a sense.

Remark 1.3 (Anisotropic diffusion)
. We conclude this part by observing that, more generally, one could consider the following parabolic equation in Ω × I, which still has an orthotropic structure. Now we have a whole set of exponents 1 < p 1 ≤ p 2 ≤ . . . p N , one for each coordinate direction. We cite the paper [24], where some global Lipschitz regularity results are proven for solutions of the relevant Cauchy-Dirichlet problem, under appropriate regularity assumptions on the data. We point out that in light of their global nature, for p 1 = · · · = p N = p > 2 such results are not comparable to ours. We also refer to [10] for a sophisticated Harnack inequality for positive local weak solutions, as well as for some further references on the problem. Finally, the very recent paper [16] contains a thorough study of the Cauchy problem in the case p i < 2, together with some regularity results. However, as for the counterpart of our Main Theorem for local solutions of this equation, this is still an open problem, to the best of our knowledge.
1.3. Technical aspects of the proof. The core of the proof of the Main Theorem is an a priori Lipschitz estimate for smooth solutions of the orthotropic parabolic equation, see Proposition 4.1 below. More precisely, we introduce the regularized problem where F ε is a smooth uniformly convex approximation of the orthotropic function (1.3). By the classical regularity theory, the maps u ε are regular enough to justify all the calculations below. The goal is to establish a local uniform Lipschitz estimate on u ε , which does not depend on the regularization parameter ε. Finally, we let ε go to 0 and prove that the family u ε converges to the original solution u. This allows to obtain the Lipschitz estimate for u itself.
In the subsequent part of this section, we emphasize the main difficulties to get such a Lipschitz estimate on u ε . In order to simplify the presentation, we drop the index ε both for F ε and u ε . The strategy is apparently quite classical: we rely on a Moser iterative scheme of reverse Hölder's inequalities, resulting from the interplay between Cacciopoli estimates and the Sobolev embeddings.
To be more specific, we first differentiate the equation with respect to a spatial variable x j , so as to get the equation solved by the j−th component of the gradient. This is given by More generally, the composition of the component u xj with a non-negative convex function h is a subsolution of this equation. Accordingly, the map h(u xj ) satisfies the Caccioppoli inequality which is naturally attached to (1.5) (see Lemma 3.1 below). If I = (T 0 , T 1 ) and τ ∈ (T 0 , T 1 ), this reads as follows: (1.6) Here, the maps χ ∈ C ∞ 0 (I) and η ∈ C ∞ 0 (Ω) are non-negative cut-off functions in the time and space variables, respectively. We have used the following expedient notation: for a. e. τ ∈ I.
When F is a uniformly elliptic integrand, in the sense that one can easily obtain from (1.6) a crucial "unnatural" feature of the subsolution h(u xj ); that is, a sort of reverse Poincaré inequality where the Sobolev norm of h(u xj ) is controlled by the L 2 norm of the subsolution itself. In conjunction with the Sobolev inequality, this is the cornerstone which eventually leads to the classical version of the Moser iterative scheme. It should be noticed that this strategy still works even in the degenerate case, provided the Hessian behaves like as for the evolutionary p−Laplace equation (1.2). It is sufficient to use the "absorption of degeneracy" trick, where the degenerate weight |∇u| p−2 is recombined with the subsolution h(u xj ) by means of simple algebraic manipulations. This still permits to infer from (1.6) a control on the Sobolev norm of a suitable convex function of u xj . This is nowadays a standard technique in the field; for the elliptic case, it goes back to the pioneering works by Ural'tseva [26] and Uhlenbeck [25].
As we explained above, due to the severe degeneracy of D 2 F in our orthotropic situation, it is not possible to follow the same path. In order to rely on such an absorption trick, we have to go through a tour de force and to introduce a new family of weird Caccioppoli inequalities (see Lemma 3.2 below). These are the parabolic counterparts of a corresponding estimate introduced in the elliptic setting in [3] and then fruitfully exploited in [5]. The crucial idea is to mix together the components of the gradient with respect to 2 orthogonal directions. This compensates the lack of ellipticity of D 2 F and allows to rely on the Sobolev embeddings in the iterative scheme. We do not detail these Caccioppoli-type estimates here, but instead explain the main additional difficulties with respect to the elliptic framework.
Let us come back for one instant to the standard Caccioppoli inequality (1.6). It follows from (1.5) by taking ϕ = h h ′ (u xj ) χ η 2 . In particular, the parabolic term is given bÿ Then an integration by parts yields The latter yields the "time slice" term on the left-hand side of (1.6). This way, the time derivative is transferred to χ and one can handle the factor h 2 (u xj ) as in the elliptic framework. For the weird Cacciopoli inequalities, the test function is now ϕ = u xj Φ ′ (u 2 xj ) Ψ(u 2 x k ) χ η 2 , for some 1 ≤ j, k ≤ N . The corresponding parabolic term becomes: In contrast to the previous situation, we cannot perform an integration by parts to get rid of the time derivative on Φ(u 2 xj ), since it would affect the factor Ψ(u 2 x k ). In order to overcome this difficulty, which does not arise in the elliptic setting, we need a new approach, aimed at "symmetrizing" the above quantity containing u xj and u x k .
Basically, we merge together two weird Cacciopoli inequalities, where the spatial variables x j and x k play symmetric roles. More specifically, we insert into (1.5) the test functions: , and then add the two resulting inequalities. The parabolic term is now replaced by the following quantity: This allows to integrate by parts and transfer the time derivative on the test function. It turns out that by a suitable adaptation of the arguments that we used in the elliptic case, one can incorporate this new term in the iterative Moser scheme. This finally leads to the desired local L ∞ estimate on ∇u.

1.4.
Plan of the paper. The paper is organized as follows: after collecting the basic terminology and some preliminaries on Steklov averages in Section 2, we present in Section 3 the proofs of the new Cacciopoli inequalities in the parabolic setting. We detail the iterative Moser scheme in Section 4 and finally establish the Main Theorem in Section 5, by transferring to the original solution u the a priori estimates obtained on the approximating solutions u ε .

Preliminaries
We say that u ∈ L p loc (I; W 1,p loc (Ω)) is a local weak solution of the quasilinear diffusion equation 2.2. Steklov averages. Throughout the paper, we denote by T 0 < T 1 the endpoints of the time interval I.
We shall use some standard properties of the Steklov averages. Let 0 < σ < T 1 − T 0 and ψ ∈ L ∞ (I × Ω) such that ψ is compactly supported in (T 0 , T 1 − σ) × Ω. We extend ψ by 0 on (R \ I) × Ω, so that ψ − σ is well-defined on I × Ω and compactly supported therein. By the Fubini theorem, we have Finally, we can derive from (2.2) the following regularity properties of the Steklov averages: (2) if one further assumes that v ∈ L 1 loc (I; By an obvious change of variables, this yields which gives the desired identity (2.3). In order to prove (2.4), we rely again on (2.2), this time tested with ψ xj in place of ψ, for some 1 ≤ j ≤ N : In the last equality, we have derived under the integral sign the smooth function ψ. Hence, by integrating by parts the last integral and using (2.2) again, one gets from which (2.4) follows.

Energy estimates for a regularized equation
3.1. An approximating equation. We denote by and for every ε ∈ (0, 1), we consider the convex function We consider a local weak solution u ε ∈ L p loc (I; W 1,p loc (Ω)) of the equation (2.1) with the choice This means that u ε verifies Hence, one can rely on the classical regularity theory for quasilinear parabolic equations, see e.g. [ and u ε ∈ L 2 loc (I; W 2,2 loc (Ω)). In the following computations, we delete the index ε both for u and F .

3.2.
An equation for the spatial gradient. In order to establish a Lipschitz bound on our solution u, we need to differentiate (3.2) with respect to the spatial variables x j , 1 ≤ j ≤ N .
3.3. Caccioppoli-type inequalities. As explained in the introduction, the first technical tool in the proof of the Lipschitz bound of u is the following Cacciopoli inequality which provides a W 1,2 estimate on h(u xj ), where h is any smooth convex function.
be two non-negative functions, with χ non-decreasing. Let h : R → R be a C 1 convex non-negative function. Then, for almost every τ ∈ I and every j = 1, ..., N , we have Proof. We first assume that h is a C 2 convex non-negative function. Let ζ ∈ C ∞ 0 (I) and η ∈ C ∞ 0 (Ω). There We insert in (3.5) the test function which has compact support in ( We use the above identity to infer: We then perform an integration by parts with respect to the time variable in the first term: We now want to take the limit as σ goes to 0. Let Ω 1 ⋐ Ω such that η is compactly supported in Ω 1 . Since u xj ∈ L 2 loc (I × Ω), we have lim It then follows from the Dominated Convergence Theorem that Next, by recalling the choice of ϕ above, we have Hence, a similar argument to the one leading to (3.8) Finally, by using that (∇F (∇u)) xj ∈ L 2 loc (I × Ω), we can infer that It follows that lim σ→0 +¨( T0+σ1,T1−σ1)×Ω (∇F (∇u)) xj Up to now, we have thus proved: We now choose ζ as follows. Let χ ∈ C ∞ 0 ((T 0 , T 1 ]) be as in the statement. Given τ ∈ I and δ > 0 such that T 0 < τ < τ + δ < T 1 , we define We then insert in (3.9). Then, for almost every τ ∈ I, we can let δ go to 0 and obtain (3.11) Since χ does not depend on the spatial variable, we have Since the second term is non-negative, by dropping it, we get from (3.11) In order to estimate the last term, we use the Cauchy-Schwarz inequality: A further application of Young inequality leads to In this way, the integral containing ∇u xj can be absorbed in the left hand side. Let us finally observe that we can remove the C 2 assumption on the function h, by a standard approximation argument.
We next establish the key tool for the proof of our main result, namely a Cacciopoli type inequality, where two different partial derivatives u xj and u x k come into play.
Step 1: an identity involving u xj and u x k . We first assume that Φ and Ψ are two C 2 non-decreasing and non-negative convex functions. We fix k, j ∈ {1, . . . , N }. Given 0 < σ 1 < (T 1 − T 0 )/2 and ζ ∈ C ∞ 0 (T 0 + σ 1 , T 1 − σ 1 ), we consider (3.5) with the index j and for every 0 < σ < σ 1 , we insert the test function Symmetrically, we consider (3.5) with the index k and insert the test function The functions ϕ and ϕ are compactly supported in (T 0 + σ 1 , T 1 − σ 1 ) × Ω and belong to L ∞ ((T 0 + σ 1 , ). Thus they are admissible test functions. We observe that and similarly Thus we obtain By summing the two equations obtained from (3.5) as described above, and using the identity (3.12), we get We then perform an integration by parts in the time variable in the first integral, which gives Finally, we let σ go to 0. In the same vein as in the proof of (3.9), the Dominated Convergence Theorem implies that where ϕ 0 = u xj Φ ′ u 2 xj Ψ u 2 x k ζη 2 , and We now choose ζ as in (3.10) and insert it in (3.13). By letting δ go to 0, we obtain for almost every τ ∈ I where ψ 0 and ψ 0 are defined as ϕ 0 and ϕ 0 , except that ζ is now replaced by χ.
Step 2: completion of the proof. We first observe that Taking into account the definition of ψ 0 and ψ 0 , we thus get from (3.14)

(3.15)
We first estimate the two last terms. We use the Cauchy-Schwarz inequality: By the Young inequality, this implies A similar inequality holds true for the last term in (3.15). Hence, (3.16) In the left-hand side of (3.16), in the second term, we drop 2 u 2 xj Φ ′′ (u 2 xj ) which is non-negative. We also drop the whole last term of the left-hand side for the same reason. This yields (3.17) We next estimate the second term of the right-hand side of (3.17), that we denote by We first use the Cauchy-Schwarz inequality to get We then introduce a parameter θ ∈ [0, 1]. By writing Ψ ′ (u 2 one gets by the Cauchy-Schwarz inequality again: We define Then G is a C 1 non-negative convex function. Moreover, by its definition Hence, by the standard Caccioppoli inequality (3.6) with h = G and k in place of j, this yields: By using the Jensen inequality with the concave function y → y 1−θ/2 , we obtain This implies¨( Coming back to (3.18), it follows that Together with (3.17), this yields the desired inequality. Finally, the C 2 assumption on Φ and Ψ can be removed by a standard approximation argument.

Uniform Lipschitz estimate for the regularized equation
This section is devoted to the proof of the following uniform estimate. For simplicity, we will work with anisotropic parabolic cubes of the form Proposition 4.1. There exist α = α(N ) > 2 and C = C(N, p) > 0 such that for every ε > 0 and for every Proof. We will limit ourselves for simplicity to the case N ≥ 3. This allows to use the Sobolev inequality valid for every f ∈ W 1,2 0 (Ω) Here C N is a constant which depends only on N . The case N = 2 follows with minor modifications and we omit the details. The proof is quite involved and for ease of readability, we divide it into several steps.
Step 1: the choices of Φ and Ψ. We apply Lemma 3.2 with the following choices This gives On the product of the two last integrals, we use the Young inequality in the form this gives By the Young inequality again, we can estimate Using that s ≥ 1 in the left-hand side and m θ ≤ m in the right-hand side, we thus obtain This finally implies
We also use that s ℓ + m ℓ + s 2 ℓ m ℓ ≤ 2 (q + 1) 3 . Then the above inequality (4.2) written for s = s ℓ and m = m ℓ with 0 ≤ ℓ ≤ ℓ 0 − 1 gives Observe that we used that 2 s ℓ + 2 m ℓ = 2 (q + 1) on the first term on the right-hand side. By summing from ℓ = 0 up to ℓ = ℓ 0 − 1, one gets For the last term, we apply Lemma 3.1 with the choice We thus get This is turn implies that possibly for a different constant C > 0.
Step 3: weak ellipticity and boundedness of D 2 F . We now use the explicit expression of F . We recall that If follows that for every ξ, λ ∈ R N , we have By inserting these estimates into (4.3), one gets We consider the second term on the left-hand side: observe that by keeping in the sum only the term with i = k and dropping the others, we get When we sum over j = 1, . . . , N the resulting estimate, we thus get which in turn implies up to redefine the constant C > 0. We now add the term (T0,τ )×Ω |u x k | 2q+p χ |∇η| 2 dt dx, on both sides of the above inequality. With some algebraic manipulations, this gives By using the Sobolev inequality in the spatial variable for the second term of the left-hand side, one obtains We now finally sum over k = 1, . . . , N and use the Minkowski inequality for the second term of the left-hand side. This gives We introduce the expedient function We observe that for every s ≥ 0, one has In particular for s = 2, we get U ≤ |∇u| ≤ √ N U. Hence, we get (4.4) Step 4: choice of the cut-off functions. Let (t 0 , x 0 ) ∈ I × Ω and 0 < r < R ≤ 1 such that the cube Q R (x 0 ) = (x 0 − R, x 0 + R) N is compactly contained in Ω. We further require that so that we must have T 0 < t 0 < T 1 and R p < t 0 − T 0 . Let χ : [T 0 , T 1 ] → R be a non-decreasing Lipschitz function such that We recall the notation for the anisotropic parabolic cube With such a choice of χ and η, we use (4.4) twice: • firstly, by dropping the second term in the left-hand side, and taking the supremum in τ over the interval (t 0 − r p , t 0 ); • secondly, by dropping the first term in the left-hand side and taking τ = t 0 .
By summing up the two resulting contributions, this gives sup τ ∈(t0−r p ,t0)ˆ{τ }×Qr(x0) Observe that we also used that (R − r) p ≤ (R − r) 2 , since R ≤ 1 and p > 2. By the Hölder inequality, one haŝ Using (4.5), this implies Step 5: the local L ∞ estimate on ∇u. Take q = q j = 2 j+1 − 1 with j ∈ N. We set We observe that γ j−1 < γ j < δ j and τ j ∈ (0, 1) is defined in a such a way that The estimate (4.6) can be rewritten as By interpolation in L p spaces, we obtain By lengthy but elementary computations, we see that thus the combination of the two previous inequalities leads tö Qr(t0,x0) U γj dt dx ≤ ¨Q r (t0,x0) .
We now use the Young inequality Thus we have proved the estimatë (4.7) Next, we observe that where the last inequality relies on the two facts: R ≤ 1 and p > 2. Hence, using that q j ≥ 1, one gets By summing |Q r (t 0 , x 0 )| on both sides of (4.7), this implies We can now appeal to [17, Lemma 6.1] and absorb the term on the right-hand side containing 1 + U γj , in a standard way. By using the definition of q j , this leads to We want to iterate the previous estimate on a sequence of shrinking parabolic cylinders. We fix two radii 0 < r < R ≤ 1, then we consider the sequence and we apply (4.8) with R j+1 < R j in place of r < R. We introduce the notation By iterating the previous estimate starting from j = 1, we obtain for every n ∈ N \ {0}, possibly for a different constant C > 1. We now take the power 2 −n on both sides: where the last inequality relies on the fact that We thus get Here, we have also used that γ 0 = p + 2 and γ n ∼ 2 n+2 , for n going to ∞. Finally, in order to remove the dependence on the L p+2 norm of the gradient, we use a standard interpolation trick. We write ¨Q R (t0,x0) Inserting this estimate into (4.9), we get where in the last line, we have used that R ≤ 1. By [17, Lemma 6.1] again, we get Since U ≤ |∇u| ≤ √ N U, this completes the proof.
Remark 4.2. An inspection of the proof reveals that the exponent α in (4.1) can be taken to be any number > 2, if N = 2.
In the case N = 2, the constant C blows-up as α tends to 2.

Proof of the Main Theorem
We take u ∈ L p loc (I; W 1,p loc (Ω)) to be a local weak solution of equation ( for every ϕ ∈ C ∞ 0 (I × Ω). We recall that F 0 indicates the convex function Our program is as follows: • we approximate u by solutions u ε of the regularized equation; • we transfer the Lipschitz estimate of Proposition 4.1 from u ε to u; • we use a scaling argument to "rectify" the local estimate and obtain (1.4). We start by recalling that u has the following additional properties: for every subinterval J ⋐ I and every open set O ⋐ Ω, we have (5.2) u t ∈ L p ′ (J; W −1,p ′ (O)) and u ∈ C(J; L 2 (O)).
Here W −1,p ′ (O) is the topological dual space of W 1,p 0 (O) and the latter is the completion of C ∞ 0 (O) with respect to the L p norm of the gradient.
Part 1: convergence of the approximation scheme. We remember the definition (3.1) of F ε for every ε ≥ 0. We then consider the approximating initial boundary value problem parametrized by ε > 0   in Ω.
Thus the previous integral identity can be rewritten aŝ For the second term in (5.5), we can use that for every ξ, ζ ∈ R N , we have for some C = C(N, p) > 0. The first inequality can be found in [20,Section 10]. On the other hand, the convexity of G implies that ∇G is a monotone map and thus ∇G (∇u ε ) , ∇u ε − ∇u ≥ ∇G(∇u), ∇u ε − ∇u .
Then for every ε > 0, we have This estimate guarantees that the family {∇u ε } ε>0 is strongly convergent to ∇u in L p (I × Ω). This is now sufficient to pass to the limit in the uniform Lipschitz estimate (4.1), which will be still valid for u, as well. By a covering argument, this in turn implies ∇u ∈ L ∞ loc (I × Ω), as claimed.