Invariance properties of limiting point processes and applications to clusters of extremes

Motivated by examples from extreme value theory, but without using the theory of regularly varying time series or any assumptions about the marginal distribution, we introduce the general notion of a cluster process as a limiting point process of returns of a certain event in a time series. We explore general invariance properties of cluster processes which are implied by stationarity of the underlying time series. Of particular interest in applications are the cluster size distributions, and we derive general properties and interconnections between the size of an inspected and a typical cluster. While the extremal index commonly used in extreme value theory is often interpreted as the inverse of a "mean cluster size", we point out that this interpretation holds true only for the expected value of the typical cluster size, due to an effect very similar to the inspection paradox in renewal theory.


Motivation
When one looks at a series of sequential random observations, the interest is sometimes restricted to keeping track of the times at which a particular outcome is observed, in contrast to those at which it does not happen. This means that for a stochastic process (X_t)_{t∈Z} on a generic probability space (Ω, A, P) such that X_t : (Ω, A) → (S, S) for some measurable space (S, S), one fixes a set U ∈ S and looks at the resulting process (I_t)_{t∈Z} with

I_t := 1_{{X_t ∈ U}}, t ∈ Z. (1.1)

The process (I_t)_{t∈Z} is then a binary time series (see, e.g., [14], [15]), but could equally well be interpreted as a point process, i.e. a random measure of the form

N(ω) := Σ_{t∈Z} I_t(ω) δ_t, ω ∈ Ω, (1.2)

with values in (N⋆_Z, B(N⋆_Z)), where (N⋆_Z, B(N⋆_Z)) denotes the space of all simple counting measures on Z with its Borel σ-algebra, see [8], Chapter 7. Note that each element N of N⋆_Z is uniquely determined by its support, i.e. the occurrence times of the event U, and so N(ω) in (1.2) is uniquely determined by the random set

C(ω) = C((I_t(ω))_{t∈Z}) := {t ∈ Z : I_t(ω) = 1}. (1.3)

Special cases of (S, S) and U in (1.1) are given by (R, B(R)) and U = (u, ∞), or (R^d, B(R^d)) and U = {x ∈ R^d : ‖x‖ > u}, for some u > 0 and some norm ‖·‖, see Figure 1 for an example.
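For readers who prefer to see the definitions operationally, the passage from the observations to the binary process (1.1) and the random set C in (1.3) can be sketched as follows for the special case U = (u, ∞); the numerical values are made up for illustration and are not data from this note.

```python
# Sketch: build the binary process I_t = 1_{X_t in U} and its support C
# for the special case U = (u, infinity); all values are made up.

def indicator_process(x, u):
    """Return the 0/1 sequence I_t = 1 if x_t > u, else 0."""
    return [1 if xt > u else 0 for xt in x]

def cluster_set(indicators, times):
    """C = {t : I_t = 1}, the occurrence times of the event U."""
    return {t for t, i in zip(times, indicators) if i == 1}

# A short stretch of observations indexed by t = -3, ..., 3:
times = list(range(-3, 4))
x = [0.2, 1.4, 0.6, 1.9, 0.3, 1.1, 0.5]
u = 1.0

I = indicator_process(x, u)
C = cluster_set(I, times)
print(I)          # [0, 1, 0, 1, 0, 1, 0]
print(sorted(C))  # [-2, 0, 2]
```

The binary sequence carries exactly the same information as the set C, mirroring the equivalence of (1.1)-(1.3).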
The resulting so-called exceedance process (I_t)_{t∈Z} plays an important role in extreme value theory and its applications, see [11], [12], [19]. There, one would typically analyze a stationary time series (X_t)_{t∈Z} and look, in the univariate case, at the family of stationary exceedance processes given by

I^n_t := 1_{{X_t > u_n}}, t ∈ Z, (1.4)

for n ∈ N, where (u_n)_{n∈N} is a sequence chosen in a way such that P(X_0 > u_n) → 0 as n → ∞. This approach can easily be generalized to multivariate time series (X_t)_{t∈Z} by looking at the processes I^n_t := 1_{{‖X_t‖ > u_n}} for some norm ‖·‖ instead, but for the remainder of this introduction we will restrict attention to the univariate case for ease of notation.
The resulting sequence of binary processes (I^n_t)_{t∈Z} (or, equivalently, the corresponding point processes from (1.2) or their supports from (1.3)) indicates the times at which events happen which become more and more extreme and therefore increasingly unlikely. Consequently, the finite-dimensional distributions of the processes (I^n_t)_{t∈Z} in (1.4) converge to those of a degenerate process consisting of only zeros.
By conditioning on the event {I^n_0 = 1} as n → ∞, i.e. by looking at the conditional distribution

L((I^n_t)_{t∈Z} | I^n_0 = 1), (1.5)

we can avoid such a degenerate process and instead describe the limiting temporal structure that we see in rare events, given that such a rare event is observed at time t = 0. The analysis of the resulting extremal clusters is one of the main aspects of extreme value theory for time series, starting with [12], [18] and leading to more recent results like, e.g., [3], [4], [9], [13], [22]. Under additional assumptions, the object of interest may in fact be not only the occurrence times of extremal events but their magnitude as well, and it is possible to characterize the limiting process after a suitable, usually linear transformation, which means finding the weak limits of L((X_t/u_n)_{t∈Z} | X_0 > u_n). A typical form of such additional assumptions would be the framework of stationary regularly varying time series (X_t)_{t∈Z} (see [16] for a general introduction), where we restrict our attention here to a univariate, non-negative time series (X_t)_{t∈Z}. We call such a time series regularly varying if the marginal distribution of X_0 has a univariate regularly varying tail with some index α > 0, i.e.
lim_{u→∞} P(X_0 > xu)/P(X_0 > u) = x^{−α} for all x > 0,

and there exists a non-degenerate, so-called tail process (Y_t)_{t∈Z}, see [4], such that

L((X_t/u)_{−m≤t≤n} | X_0 > u) w=⇒ L((Y_t)_{−m≤t≤n}) (1.6)

for all m, n ∈ N as u → ∞, where w=⇒ stands for convergence in distribution. Key properties of the tail process are that

P(Y_0 > y) = y^{−α}, y ≥ 1, and Y_0 is independent of (Y_t/Y_0)_{t∈Z}. (1.7)

Furthermore, the stationarity of (X_t)_{t∈Z} influences the structure of the tail process (Y_t)_{t∈Z}, leading to the so-called "time change formula"

E[f((Y_{t−s})_{t∈Z}) 1_{{Y_{−s} > 1}}] = E[f((Y_t)_{t∈Z}) 1_{{Y_s > 1}}], (1.8)

which holds for all s > 0 and all bounded, measurable functions f : R^Z → R, see [4, Theorem 3.1] and also [16, Theorem 5.3.1]. Thus, for a non-negative, stationary regularly varying time series (X_t)_{t∈Z} with tail process (Y_t)_{t∈Z}, setting I^n_t = 1_{{X_t > u_n}} for a sequence u_n → ∞ implies the existence of a limiting process (I_t)_{t∈Z} in the sense of (1.5) with

L((I_t)_{t∈Z}) = L((1_{{Y_t > 1}})_{t∈Z}). (1.9)

We will see in the following that an equality of the form (1.9) holds in fact under much more general assumptions.
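As a sanity check of such tail-process limits, consider the standard moving-maximum toy model X_t = max(Z_t, Z_{t−1}) with i.i.d. Pareto(α) innovations Z_t (a textbook example, not a model treated in this note): given X_0 > u, in the limit exactly one of Z_{−1}, Z_0 is large, so P(X_1 > u | X_0 > u) → 1/2, i.e. P(Y_1 > 1) = 1/2 for its tail process. The conditional probability is available in closed form and can be evaluated numerically:

```python
# Exact computation for X_t = max(Z_t, Z_{t-1}), Z_t i.i.d. with
# P(Z > u) = u**(-alpha), u >= 1 (a standard moving-maximum example;
# not a model from this note).  With p = P(Z > u):
#   P(X_0 > u)          = 1 - (1 - p)**2 = 2p - p**2
#   P(X_0 > u, X_1 > u) = p + (1 - p) * p**2
#     (either Z_0 > u, or Z_0 <= u but both Z_{-1} and Z_1 exceed u)

def cond_exceedance_prob(u, alpha=1.0):
    p = u ** (-alpha)
    joint = p + (1 - p) * p ** 2
    marginal = 2 * p - p ** 2
    return joint / marginal

for u in (10.0, 100.0, 1000.0):
    print(u, cond_exceedance_prob(u))
# The values decrease towards 1/2 as u grows, matching P(Y_1 > 1) = 1/2
# for the tail process of this example.
```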

Assumptions and Overview
The aim of this note is to take a step back from typical frameworks in extreme value theory, which impose certain assumptions about the marginal distribution of a time series and its standardized exceedances, and to find similar but much more general invariance properties of what we will call cluster processes. While it might be helpful to keep in mind the results from extreme value theory mentioned in the previous paragraph, and in particular the exceedance processes of the form (1.4) as a typical example of such a cluster process, we will derive invariance properties of the limiting distribution in (1.5) under minimal requirements, making use only of the stationarity of the underlying binary processes (I^n_t)_{t∈Z}, n ∈ N, and their conditional convergence, as formalized in the following assumption. We write N = {1, 2, ...} and N_0 = N ∪ {0}.
Assumption 1.1. Let (I_t)_{t∈Z} be a stochastic process such that there exists a family (I^n_t)_{t∈Z}, n ∈ N, of {0, 1}-valued stationary stochastic processes such that P(I^n_0 = 1) > 0 for all n ∈ N and

L((I^n_t)_{−u≤t≤v} | I^n_0 = 1) w=⇒ L((I_t)_{−u≤t≤v}), n → ∞,

for all u, v ∈ N.
Observe that Assumption 1.1 immediately implies that I_0 = 1 almost surely. It should also be noted that the process (I_t)_{t∈Z} is in general not stationary. While Assumption 1.1 allows for L((I^n_t)_{t∈Z}) w=⇒ L((J_t)_{t∈Z}) for some stationary process (J_t)_{t∈Z} with P(J_0 = 1) > 0, the setting in which P(I^n_0 = 1) → 0 would typically lead to more interesting results, as otherwise Theorem 2.1 below is a rather trivial statement and the cluster sizes treated in Sections 3 and 4 are almost surely infinite.
The process (I_t)_{t∈Z} can be interpreted as indicating the times at which (in the limit) an event returns, given that we observe it at time t = 0. If P(I^n_0 = 1) → 0 for n → ∞, then an underlying dependence structure in each process (I^n_t)_{t∈Z} can, via conditioning on {I^n_0 = 1}, prevent the process (I_t)_{t∈Z} from being equal to (1_{{t=0}})_{t∈Z}. Typically, it would be an underlying factor causing the process (I_t)_{t∈Z} to take the value 1 exactly at time t = 0 but possibly also within a certain time span before and after t = 0. However, if we assume that the impact of such an effect fades out as time passes, it is often reasonable to assume that

P(lim_{t→∞} I_t = lim_{t→−∞} I_t = 0) = 1, (1.10)

i.e. that each such episode is almost surely of finite length. Even though we will not assume a priori that (1.10) holds, it motivates us to speak of a cluster of observations, for which we only keep track of the times at which we see an observation belonging to this cluster. Therefore, we will name the process (I_t)_{t∈Z} the cluster process and the random set C(ω) = {t ∈ Z : I_t(ω) = 1} the cluster.
In the following section, we will derive invariance principles for the process (I_t)_{t∈Z}.
The results are related to the phenomenon expressed in the time change formula (1.8) and in (1.9), but hold under Assumption 1.1 only. Next, we will use these principles to characterize the distribution of cluster sizes. To this end, we will introduce the two quantities of the size of an inspected cluster and the size of a typical cluster (see also [22]) in Sections 3 and 4, respectively, and explore how they are related. We then interpret the results in the framework of extreme value theory, explain the relationship to the notion of the extremal index and derive an inequality between expected values of inspected and typical cluster sizes which is similar to the inspection paradox in renewal theory. In the final section, two simple examples illustrate our results.

An invariance principle for the cluster process
The following invariance principle for (I_t)_{t∈Z} follows from the stationarity of the underlying processes (I^n_t)_{t∈Z}.

Theorem 2.1 (Time change formula for (I_t)_{t∈Z}). Let (I_t)_{t∈Z} be as in Assumption 1.1 and let A ⊂ Z with 0 ∈ A. Then,

P(I_t = 1 for all t ∈ A) = P(I_{t−a} = 1 for all t ∈ A)

for all a ∈ A.
Proof. Assume first that |A| < ∞, where |A| denotes the cardinality of the set A. Let a ∈ A. Then, P(I_t = 1 for all t ∈ A) and P(I_{t−a} = 1 for all t ∈ A) are determined by the finite-dimensional distributions of the random vectors (I_t)_{t∈A} and (I_{t−a})_{t∈A}, respectively. We thus get from Assumption 1.1 and from 0, a ∈ A that

P(I_t = 1 for all t ∈ A) = lim_{n→∞} P(I^n_t = 1 for all t ∈ A | I^n_0 = 1) = lim_{n→∞} P(I^n_t = 1 for all t ∈ A) / P(I^n_0 = 1).

By stationarity of all (I^n_t)_{t∈Z} and since 0, a ∈ A, the right-hand side equals

lim_{n→∞} P(I^n_{t−a} = 1 for all t ∈ A) / P(I^n_0 = 1) = lim_{n→∞} P(I^n_{t−a} = 1 for all t ∈ A | I^n_0 = 1) = P(I_{t−a} = 1 for all t ∈ A).

In terms of the random set C in (1.3), the conclusion of Theorem 2.1 can be written as P(A ⊂ C) = P(A − a ⊂ C) for A ⊂ Z such that 0, a ∈ A. This equality foreshadows the following corollary, which is helpful for proving more refined statements about the times t at which the event {I_t = 1} is observed, as will be needed in the next section.

Corollary 2.2. Let (I_t)_{t∈Z} be as in Assumption 1.1.
(a) Let A ⊂ Z with 0 ∈ A and let a ∈ A. Then,

P(I_t = 1 for all t ∈ A, I_t = 0 for all t ∈ Z \ A) = P(I_{t−a} = 1 for all t ∈ A, I_{t−a} = 0 for all t ∈ Z \ A).

(b) Let A ⊂ Z with 0 ∈ A and let t_1, t_2 ∈ Z with t_1 ≤ 0 ≤ t_2. Then, for all a ∈ A ∩ [t_1, t_2],

P(I_t = 1 for all t ∈ A ∩ [t_1, t_2], I_t = 0 for all t ∈ [t_1, t_2] \ A) = P(I_{t−a} = 1 for all t ∈ A ∩ [t_1, t_2], I_{t−a} = 0 for all t ∈ [t_1, t_2] \ A).

Proof. We start by showing (b). Let t_1, t_2 ∈ Z be such that t_1 ≤ 0 ≤ t_2 and let a ∈ A ∩ [t_1, t_2]. By the inclusion-exclusion principle and by applying Theorem 2.1 to each summand in the second step below, we find

P(I_t = 1 for all t ∈ A ∩ [t_1, t_2], I_t = 0 for all t ∈ [t_1, t_2] \ A)
= Σ_{B ⊂ [t_1,t_2] \ A} (−1)^{|B|} P(I_t = 1 for all t ∈ (A ∩ [t_1, t_2]) ∪ B)
= Σ_{B ⊂ [t_1,t_2] \ A} (−1)^{|B|} P(I_{t−a} = 1 for all t ∈ (A ∩ [t_1, t_2]) ∪ B)
= P(I_{t−a} = 1 for all t ∈ A ∩ [t_1, t_2], I_{t−a} = 0 for all t ∈ [t_1, t_2] \ A).

Part (a) follows from part (b) by letting t_1 → −∞, t_2 → ∞ and using continuity from above.
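For intuition, Theorem 2.1 can be verified by hand on a toy limit law (hypothetical, chosen for illustration): suppose the cluster C equals {0, 1} or {−1, 0}, each with probability 1/2, as would arise from a moving-maximum model of order 1. A small sketch:

```python
# Toy cluster process: C = {0, 1} or C = {-1, 0}, each with prob. 1/2
# (an illustrative limit law, not a model from this note).
from fractions import Fraction

law = {frozenset({0, 1}): Fraction(1, 2), frozenset({-1, 0}): Fraction(1, 2)}

def prob_all_ones(A):
    """P(I_t = 1 for all t in A) under the toy law."""
    return sum(p for c, p in law.items() if set(A) <= c)

A = {0, 1}        # a set containing 0
for a in A:       # Theorem 2.1: shifting A by any a in A preserves the probability
    shifted = {t - a for t in A}
    assert prob_all_ones(shifted) == prob_all_ones(A)
print(prob_all_ones(A))  # 1/2
```

With A = {0, 1} and a = 1, the shifted set is {−1, 0}, and both sets indeed receive probability 1/2 under the toy law.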
Remark 2.3. Invariance properties similar to Theorem 2.1 are frequent topics in the theory of point processes and appear in different forms. One can show that Theorem 2.1 implies that the process (I_t)_{t∈Z} is point-stationary in the sense of [23], Chapter 9, and [17]. However, note that our Theorem 2.1 derives its invariance property for a limiting process obtained from a sequence of stationary point processes, in contrast to the aforementioned connections between different forms of stationarity for one underlying process. It is furthermore easy to show that, even if we have not used an assumption about the regular variation of the underlying time series (X_t)_{t∈Z} in the first place, the process (Y_t)_{t∈Z} := (Y · I_t)_{t∈Z} for some Y with P(Y > x) = x^{−α}, x ≥ 1, independent of (I_t)_{t∈Z}, is a valid tail process of a regularly varying time series, see [13]. This allows us to apply in particular the results of [22] to our setting. The concept of "exceedance stationarity" introduced there describes the same invariance property that we see in (I_t)_{t∈Z} if one restricts attention to the special case of a tail process of the form (Y · I_t)_{t∈Z}.
In the following sections we derive some further properties of (I_t)_{t∈Z} from Theorem 2.1.

The inspected cluster size distribution
Without further knowledge of the processes (I^n_t)_{t∈Z}, it is impossible to give more details on the particular form of the cluster process (I_t)_{t∈Z} beyond the invariance principle from the previous section. Yet, as we will see in this section, it is possible to derive more general statements about summary statistics derived from (I_t)_{t∈Z}. Based on the interpretation of (I_t)_{t∈Z} as a cluster of events, the size of such a cluster would typically be of interest. Here, we should pay attention to the fact that (I_t)_{t∈Z} describes a cluster from which we know that it includes the arbitrarily chosen but fixed view point t = 0, and so it makes sense to speak more precisely of an inspected cluster. To measure the size of an inspected cluster we introduce the quantity

S^i := Σ_{t∈Z} I_t = |C|, (3.1)

which we call the inspected cluster size. Note that for the exceedance process as introduced in (1.4), several rather mild dependence conditions on the process (X_t)_{t∈Z} lead to P(S^i < ∞) = 1 (see [12]), but we do not rule out the case P(S^i = ∞) > 0 a priori. Furthermore, given the reference point t = 0, it may be of interest to keep track of how many instances of a cluster happen before and after this reference point, so we introduce

S^i_− := Σ_{t<0} I_t and S^i_+ := Σ_{t>0} I_t, (3.2)

noting that S^i = S^i_− + S^i_+ + 1. We start by showing that the joint distribution of (S^i_+, S^i_−) (and thereby also the distribution of S^i) is already determined by the distribution of S^i_+ or S^i_−, respectively.
Proposition 3.1. Let (I_t)_{t∈Z} be as in Assumption 1.1 and S^i, S^i_+, S^i_− as defined in (3.1)-(3.2). Then, for all k, ℓ ∈ N_0,

P(S^i_− = ℓ, S^i_+ = k) = P(S^i_+ = k + ℓ) − P(S^i_+ = k + ℓ + 1) (3.3)
= P(S^i_− = k + ℓ) − P(S^i_− = k + ℓ + 1). (3.4)

Proof. Let T_i, i ∈ Z, with values in Z ∪ {−∞, +∞}, denote the time at which the |i|-th 1 is observed in (I_t)_{t∈Z} before (if i < 0) or after (if i > 0) time t = 0, with T_i = ±∞ if fewer than |i| such times exist; note that T_0 = 0, since 0 ∈ C. For ℓ, k ∈ N_0, the event {S^i_− = ℓ, S^i_+ = k} is the countable disjoint union of the events {C = A} over all finite sets A ⊂ Z containing 0 with exactly ℓ elements below and k elements above 0. Applying Corollary 2.2(a) with a = min A to each such configuration, and noting that A ↦ A − min A is a bijection onto the configurations with no elements below and k + ℓ elements above 0, we obtain

P(S^i_− = ℓ, S^i_+ = k) = P(S^i_− = 0, S^i_+ = k + ℓ).

An analogous shift argument (with a = T_{−1}) shows that P(S^i_− = ∞, S^i_+ = k) does not depend on k ∈ N_0, and summability over k forces these probabilities to be 0. Summing over ℓ therefore gives

P(S^i_+ = m) = Σ_{j≥m} P(S^i_− = 0, S^i_+ = j), m ∈ N_0,

and taking differences leads to P(S^i_− = 0, S^i_+ = m) = P(S^i_+ = m) − P(S^i_+ = m + 1), which together with the first display proves (3.3). Equation (3.4) can be derived analogously, or simply by noting that if (I_t)_{t∈Z} satisfies Assumption 1.1, then so does (I_{−t})_{t∈Z}.

Proposition 3.1 has a few direct implications for the joint distribution of (S^i_+, S^i_−).

Theorem 3.2. Let (I_t)_{t∈Z} be as in Assumption 1.1.
(a) The law of (S^i_−, S^i_+) is symmetric and for all k, ℓ ∈ N_0, we have

P(S^i_− = ℓ, S^i_+ = k) = P(S^i_± = k + ℓ) − P(S^i_± = k + ℓ + 1), (3.5)

where the variable S^i_± on the right-hand side can represent both S^i_− and S^i_+.
(b) The common probability mass function of S^i_− and S^i_+ is nonincreasing on N_0, and P(S^i_± = ℓ, S^i = ∞) = 0 for all ℓ ∈ N_0. Indeed, for all k ∈ N,

P(S^i_± = k − 1) − P(S^i_± = k) = P(S^i_− = 0, S^i_+ = k − 1) ≥ 0,

where in the last step (3.5) was used. Furthermore, summing (3.5) over k ∈ N_0 for fixed ℓ yields

P(S^i_± = ℓ, S^i < ∞) = P(S^i_± = ℓ) − lim_{k→∞} P(S^i_± = k − 1).

Summability of the probability mass function implies that lim_{k→∞} P(S^i_± = k − 1) = 0 and thus P(S^i_± = ℓ, S^i = ∞) = 0 for all ℓ ∈ N_0, which leads to the desired result.
Taken together, we see that the size of an inspected cluster and the number of observations before and after the inspection point t = 0 can be modelled as a two-step procedure: First, the inspected cluster size is determined according to the distribution of S i .Second, the number of observations before and after the inspection point are either both infinite or determined according to a uniformly distributed random split of the inspected cluster size.
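This two-step description lends itself to a quick numerical consistency check (a sketch under an assumed, arbitrary inspected cluster size distribution; the pmf below is illustrative only): sampling S^i first and then splitting uniformly reproduces exactly the joint law stated in Proposition 3.1.

```python
from fractions import Fraction

# Assumed pmf of the inspected cluster size S^i (arbitrary illustration):
pmf_S = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Step 2: given S^i = s, S^i_- is uniform on {0, ..., s-1} and
# S^i_+ = s - 1 - S^i_-, so the joint pmf is P(S^i = l+k+1)/(l+k+1).
def joint(l, k):
    s = l + k + 1
    return pmf_S.get(s, Fraction(0)) / s

def pmf_Splus(k):
    """Marginal pmf of S^i_+ obtained by summing out S^i_-."""
    return sum(joint(l, k) for l in range(0, max(pmf_S) + 1))

# Check: P(S_- = l, S_+ = k) = P(S_+ = k + l) - P(S_+ = k + l + 1)
for l in range(0, 4):
    for k in range(0, 4):
        assert joint(l, k) == pmf_Splus(k + l) - pmf_Splus(k + l + 1)
print("identity verified")
```

Exact rational arithmetic via `fractions` avoids any floating-point tolerance in the comparison.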

The typical cluster size distribution and the extremal index
Given our definition of the cluster process in Assumption 1.1, it is natural to proceed as in the previous section and analyze the inspected cluster size. It should, however, be noted that the resulting quantities and their distributions may not correspond to the understanding of a cluster as desired in applications, as the conditioning on {I^n_0 = 1} in Assumption 1.1 induces a size-bias similar to the inspection paradox, see [10, Chapter I.4], and [22] for the size-bias observed in tail processes of regularly varying time series.
To illustrate this point, think again of the exceedance processes of (1.4) and assume that X_t stands for the water level at a site protected by a dike. If the water level is extremely high for several days in a row, the dike will be weakened by pressure and moisture and may break. Thus, to construct a safe dike it would be important to know how long, on average, an extremal episode of high water levels lasts. From knowledge of the distribution of the inspected cluster size S^i one could now be tempted to use the expected value E[S^i] as the relevant quantity. It has to be noted, however, that this is the expected value of the overall length of an inspected cluster, i.e. one that we observe at a random time during an extremal episode and thus typically already after some extremal events have happened prior to the inspection. This random choice of inspection time induces a bias towards longer clusters, similar to the inspection or waiting-time paradox, where the size of an inspected epoch of a renewal process stochastically dominates the size of a typical epoch, see [1].
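The size-bias can be quantified in a small computation (a sketch; the typical cluster size pmf below is an arbitrary choice, and the length-proportional reweighting is the classical size-biasing from renewal theory, not a formula taken from this note):

```python
from fractions import Fraction

# Assumed typical cluster size pmf (arbitrary illustration): P(S^t = k)
typical = {1: Fraction(3, 6), 2: Fraction(2, 6), 3: Fraction(1, 6)}

mean_typical = sum(k * p for k, p in typical.items())

# Size-biased pmf: inspecting at a uniformly chosen event favours long
# clusters in proportion to their length (classical inspection paradox).
inspected = {k: k * p / mean_typical for k, p in typical.items()}
mean_inspected = sum(k * p for k, p in inspected.items())

print(mean_typical)    # 5/3
print(mean_inspected)  # 2
assert mean_inspected >= mean_typical
```

The inspected mean exceeds the typical mean, matching the direction of the inequality discussed in this section.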
To avoid this bias and instead make sure that we look at a typical epoch inspected at a fixed point relative to the cluster, the most straightforward way is to inspect the extremal cluster at its beginning, i.e. at a time at which S^i_− = 0. The idea behind this bias-correction is similar to the concept of anchoring maps, as introduced in [2]. It leads us to introducing the typical cluster size distribution given by

L(S^t) := L(S^i | S^i_− = 0), if P(S^i_− = 0) > 0 (4.2)

(otherwise it is not defined), see also [22] for more on the notion of a typical cluster in the context of regularly varying time series. It is worth mentioning that Theorem 3.3 implies that the distribution of S^t is defined if and only if P(S^i < ∞) > 0, and that in this case P(S^t < ∞) = 1. Furthermore, by Theorem 3.2(a), we could have equally well conditioned on {S^i_+ = 0} in (4.2), i.e. on making our observation at the end of a cluster instead of at the beginning, leading to the same distribution of the typical cluster.
Again for the special case of the exceedance processes in (1.4) and under mild assumptions on (X_t)_{t∈Z}, the probability P(S^i_− = 0) of the conditioning event equals the well-known extremal index of the time series. In some situations, additional structure makes S^i_− and S^i_+ independent, and it then follows from Proposition 3.1 that for all k, ℓ ∈ N_0,

P(S^i_− = ℓ) P(S^i_+ = k) = P(S^i_± = k + ℓ) − P(S^i_± = k + ℓ + 1),

which implies that the distribution of S^i_± must be geometric. This makes our findings relevant for statistical inference, as here some structural property of the underlying process allows us to conclude a parametric model for the inspected cluster size.
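One direction of this characterization is easy to verify numerically (a sketch with an arbitrary illustrative parameter): a geometric pmf is consistent with independence of S^i_− and S^i_+ together with the identity of Proposition 3.1.

```python
from fractions import Fraction

p = Fraction(1, 3)  # arbitrary illustrative parameter

def geo(k):
    """P(S^i_± = k) = (1 - p) p^k, k = 0, 1, 2, ..."""
    return (1 - p) * p ** k

# Under independence of S^i_- and S^i_+ with this common marginal, the
# joint pmf factorises and matches P(S_± = k+l) - P(S_± = k+l+1):
#   (1-p) p^l * (1-p) p^k = (1-p) p^(k+l) - (1-p) p^(k+l+1)
for l in range(5):
    for k in range(5):
        assert geo(l) * geo(k) == geo(k + l) - geo(k + l + 1)
print("geometric case consistent")
```

The converse (that only the geometric pmf is compatible) is the Cauchy-type functional equation argument referred to in the text.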

Figure 1: Above: A stretch of a realization of a time series (X_t)_{t∈Z}, with solid dots indicating the observations in the set U = (u, ∞) (shaded area) and empty dots for the remaining non-exceedances. Below: The same stretch of the corresponding realization of the time series (I_t)_{t∈Z}. In this case, the set C contains −1, 0 and 2.
which proves the statement in case |A| < ∞. For general A ⊂ Z and a ∈ A, set A_n := A ∩ [−max(n, |a|), max(n, |a|)], for which |A_n| < ∞, A_n ↑ A as n → ∞, and a ∈ A_n for all n ∈ N, so that

P(I_t = 1 for all t ∈ A) = lim_{n→∞} P(I_t = 1 for all t ∈ A_n) = lim_{n→∞} P(I_{t−a} = 1 for all t ∈ A_n) = P(I_{t−a} = 1 for all t ∈ A)

from the previous case and by continuity from above.