Optimal Algorithms for Computing Average Temperatures

Abstract A numerical algorithm is presented for computing average global temperature (or other quantities of interest such as average precipitation) from measurements taken at speci_ed locations and times. The algorithm is proven to be in a certain sense optimal. The analysis of the optimal algorithm provides a sharp a priori bound on the error between the computed value and the true average global temperature. This a priori bound involves a computable compatibility constant which assesses the quality of the measurements for the chosen model. The optimal algorithm is constructed by solving a convex minimization problem and involves a set of functions selected a priori in relation to the model. It is shown that the solution promotes sparsity and hence utilizes a smaller number of well-chosen data sites than those provided. The algorithm is then applied to canonical data sets and mathematically generic models for the computation of average temperature and average precipitation over given regions and given time intervals. A comparison is provided between the proposed algorithms and existing methods.


Introduction
Computing average temperatures is a particular instance of a common task in data processing, namely that of exploiting measurements made on a function f to estimate a quantity Q(f ) that depends on f , referred to as Quantity of Interest (QoI) below. In the present situation, f = f (x, t) represents the temperature (in degrees Celsius) as a function of the position x on the surface of the earth and of the time t. We single out temperature as a running example, but other atmospheric features such as humidity or precipitation can be treated equally well. As a QoI, we single out average temperatures over the whole earth or a smaller region R during a whole year or a shorter period ∆, obtained as the normalized integral where σ(R) is the surface area of the region and |∆| is the duration of the time period. In going further, we denote by Ω := {(x, t) : x ∈ R, t ∈ ∆} the domain of f , so the QoI also reads Q(f ) = |Ω| − Ω f , where |Ω| = σ(R)|∆| denotes the measure of Ω. Other QoIs such as the temperature Q(f ) = f (ξ , τ) at a speci c location ξ and a speci c time τ can be considered as well.
Temperatures are continuously monitored by numerous weather stations around the globe, providing us with ample data to estimate the QoI. This data takes the form of a vector This data alone is not su cient to guarantee that a QoI such as annual global temperature can be accurately computed. Indeed, without additional knowledge, the temperature function could, for example, oscillate wildly between the data sites, in which case the computed average temperature Q(f ) based on the collected data would be very far from the true average. To forbid such unrealistic scenarios, one imposes additional requirements on the temperature function f , which usually come either from the physical properties of f or from regularity theorems for f as a solution to (system of) partial di erential di erential equations (PDEs).
These additional demands on f are referred to as model class assumptions and are necessary to quantify the uncertainty in our knowledge about f . Unfortunately, in the case of the temperature function, the complexity of the system of PDEs makes regularity theorems hard to exploit. Therefore, the model class assumptions we consider are not built on the smoothness of f , but instead on its approximability. More precisely, our model class is the set of all continuous functions on the sphere (or part of the sphere) that can be approximated, up to accuracy ϵ, by a nite dimensional space V of functions. The choice of this nite dimensional space is essential and should be tailored to the physical properties of f . We believe that the geophysical community has a lot of experience and knowledge about the physical properties of the temperature/precipitation functions and can propose relevant choices of V for these cases. Once the model class is xed, i.e., the linear space V is selected, the problem of computing Q(f ) is well posed since the uncertainty about f is settled, and we can focus on the computation of Q(f ). There are, in general, three main issues related to the actual calculation of the QoI: (i) Is there an optimal accuracy at which one can estimate a QoI? (ii) Are there procedures that achieve this accuracy? (iii) Can one implement these procedures? Such questions form the cornerstone of a mathematical theory called optimal recovery (see [18]). Regarding (ii), for instance, the theory guarantees that there is an optimal algorithm for computing an average temperature Q(f ) that takes the form of a weighted average of the data, i.e., using the approximation While the existence of such an algorithm is settled, the di culty lies in the actual construction of this optimal, or even of a near optimal algorithm, since the data points (x j , t j,i ) are not equidistributed. Some of the available current methods used to estimate average temperatures also follow that path and take the form of a weighted average, even though they are not explicitly stated as such. In essence, they use local approximation-based methods to calculate the weights, by obtaining approximations to the values of the temperature function at equidistributed points on the sphere (see Section 2.1). However, no error analysis or optimality of the proposed techniques is provided which would give valuable information on the closeness (or not) between the computed quantity and the true average. Recently, signi cant advances were made on (i)-(ii)-(iii) in [7], where it was revealed how to explicitly construct optimal algorithms for the estimation of QoIs relative to the model classes mentioned above. The purpose of this note is to present to the geophysical scientists the latest theoretical and algorithmic developments in optimal recovery and inquire their expertise in atmospheric science in an e ort to produce the most realistic model classes (speci cally, choices of linear spaces V) for average global temperature computation with certi able performance. More precisely, we present an optimal algorithm for the computation of the average temperature Q(f ) based on the measurements w = M(f ). The error between the computed QoI and the true average global temperature is proven to be bounded by the product of the error ϵ of the underlying model and a computable compatibility constant µ V which assesses the quality of the measurements for the chosen model (space V). The algorithm nds the weights in (3) by solving a convex minimization problem. The optimization is performed (preferably) during an o ine stage, uses a basis for the space V, and requires only the station locations and not the station readings. The obtained solution utilizes only a subset of the set of station locations involved in the optimization. Thus, only the readings at these stations need to be accessed in the online computation stage. This additional e ciency feature of the proposed algorithm reduces the data transmission time, especially in cases when huge databases are stored in a cloud system and are accessible to multiple scienti c teams.
Finally, we test the algorithm on canonical data sets for the computation of average temperature and average precipitation over given regions and given time intervals. Note that, to provide a fair comparison between the proposed algorithm and existing methods, we use for some of the experiments the same processed data that have been used for the existing methods . As to the model class, we employ the standard (in the case of functions de ned on the sphere) choice of V being the space of spherical harmonics of certain degree or the space of piecewise constant functions. We believe that one can exploit better choices of spaces V (probably well known to atmospheric scientists), and therefore better model classes based on approximibility, which are intrinsically connected to the nature of the temperature or precipitation function f and use our algorithm to dramatically improve on the quality of the obtained results.

The algorithm
We now formalize our approach, in particular by clarifying the notion of optimal algorithms. By an algorithm, we simply mean a mapping taking a measurement vector w = M(f ) ∈ R m as input and returning a number A(w) ∈ R as an output, hopefully close to the true value Q(f ). The error made by this approximation is The performance of the algorithm A is then assessed in a worst-case setting, keeping in mind that, besides the data, the only information one has (or presumes) about f is that it satis es our model class assumptions. The model classes we advocate are built on the approximability properties of f . This usually involves a speci c space of functions that we use for approximation, e.g. polynomials, piecewise polynomials, radial basic functions, wavelets, etc., and thus we assume that we have an n-dimensional space V contained in the space C(Ω) of continuous functions on a domain Ω. That space encapsulates all our a priori knowledge about the temperature function f . It would be ideal but clearly unrealistic for f to be an element of V, since we have only partial knowledge about f . So, the best we can do is to use a space V which can rather well approximate the temperature function f . More precisely, if the distance from f to V in uniform norm is our model class is the set K of all functions that can be approximated by elements from V up to accuracy ϵ > . We next introduce the error of the algorithm A for the elements in this model class K, that is as an indicator of the performance of the algorithm A over the class K. A favorable bound on E(K, A) guarantees that Q(f ) is computed well. Finally, an optimal algorithm A * is one that makes E(K, A) as small as possible, i.e., Note that E * (K) quanti es the optimal performance relative to the class K. It was shown in [7] that the optimal performance E * (K) can be precisely evaluated for the model class (6) and that it decouples as The latter formula reveals that, besides the approximation capability of V, another important part is played by a constant µ V that encapsulates the compatibility of the data sites (x j , t j,i ), j = , . . . , m, i = , . . . , j , with the space V and the quantity of interest Q. A poor compatibility will result in a large µ V , which happens for example when Q(v) is large for some v ∈ V while all measurements v(x j ) are small, see (13). The roles of µ V and ϵ are somewhat competing in the choice of a proper linear space V on which to build the algorithm.
Indeed, we want a space with both good approximation capability, so that ϵ is small, and good compatibility with the data, so that µ V is small. Note that enlarging the space V will have the e ect of decreasing ϵ while increasing µ V . Such a tension is similar to the one encountered in statistical learning when confronted with the problem of over tting the data. Next, once a "favorite" space V is chosen, the results in [7] also put forward the construction of an optimal algorithm A * for the estimation of real-valued linear QoIs such as (1). The execution of the algorithms does not depend on how well the particular choice of V approximates the function of interest f . All this is factored in the error estimate (9), which can be recast as so in the above sense the proposed algorithm is universal. For easy comparison with other algorithms, we describe our construction in the special framework where there is no time dependence, i.e., when f = f (x) depends only on the position x. Existing algorithms for estimating annual temperatures can indeed be unscrambled as producing beforehand an annual temperature f (x j ) at each data site x j , and then computing a weighted average of the f (x j ), see e.g [14] for the general overview. We place ourselves in the same setting where our data consist of measurements f (x j ) which have already been averaged over time. Another instance of the time-independent framework is the case where fτ(x) = f (x, τ) represents an instantaneous temperature at a given time τ.
In the time-independent framework, given data sites (x , . . . , xm) and a basis (v , . . . , vn) for the space V, the optimal algorithm A * for the estimation of a real-valued linear QoI Q rst produces a solution Notice that the optimal weights a * , . . . , a * m depend on the sites x , . . . , xm but not on the data f (x ), . . . , f (xm) at these sites. Therefore, once the weights are computed (preferably o ine), formula (12) can be reused instantly for each data w τ = [f τ (x ), . . . , f τ (xm)] recorded at another time τ (or other time averages).
The algorithm A * is optimal in the sense that E (K(ϵ, V), A * ) = E * (K(ϵ, V)) for all ϵ > . Notice that the knowledge of ϵ > is not necessary to construct the algorithm A * . In addition, the minimization algorithm gives an explicit formula for the compatibility constant µ V , see [7], namely .

Novelty of our work and comparison with other methods
Our approach possesses several compelling features which we discuss next. First, we propose a model class based on the approximability of the temperature function, formally expressed by the choice of a space V.
This space explicitly encodes our general knowledge about the temperature eld (or other phenomenon we are studying). Next, we suggest an optimal algorithm A * , see (11)- (12), for the computation of a QoI for the elements from this model class. The algorithm is universal, because it can be executed for any space V. It is reliable because we provide an a priori error estimate for the di erence between the computed quantity and the true average. It is e cient since it uses a basis for V and the station locations (not the station readings) to select at the end only a small number of station locations whose readings are used for the computation of the global average temperature.
The current methods for the computation of global average temperature could be roughly divided into two main categories: approximation-based methods and statistical methods. The approximation-based methods use the available station data readings to assign values (f k ) M k= of the temperature function at regular grid points (g k ) M k= , see e.g. [10], [11], and [12]. Once this is done, an average over the newly computed values (f k ) M k= is performed. The choice of the grid and the method of assigning these values (usually done via interpolation) implicitly de nes the model class assumptions. In [25], twelve methods of interpolation are reviewed and analyzed. Most of them are various versions of the classical inverse distance weighting (IDW) method, see [23]. Its main idea is that at a grid point g k one assigns the average f k of temperature readings from stations whose distance to g k is smaller than a certain xed threshold, using weights which decay as the distance from g k gets bigger. Various modi cations of this method are used. For example, the reference station method (RSM) proposed in [10] uses all stations whose distance from g k is at most km and the weights decay linearly but with additional preference (i.e. bigger weights) given to stations with longer operating times. Those methods produce non-zero weights for all station locations and the respective algorithms use all available readings.
A whole range of statistical methods, often called optimal interpolation, have also been used for average temperature computation. We refer the reader to [6, chap. 4] for the history and basics of these techniques. They are applied to both the computation of the grid values f k (or the full temperature eld f ) and the direct calculation of the weights. Usually, in those methods, the weights are computed using the variance and covariance structure of the temperature eld (or other phenomenon we are studying). Assumptions about this structure are incorporated in the respective algorithm, re ecting the model assumptions. For example, the method proposed by [26] based on the work of [8] and [15] calculates the weights in (3) directly, see also [24].

Experimental results
In this section, we test our method on several situations encountered in atmospheric science: the estimation of average annual global temperature, the estimation of average seasonal regional temperature, and the estimation of total annual global precipitation. We also give a brief description of the implementation of our algorithm.

. Implementation details
Here, for simplicity, we consider the time-independent framework only and discuss the implementation of problem (11)  Here B is an n × m matrix, where n = dim(V), with entries b ij = v i (x j ), where the v i 's form a basis for the space V, that is V = span{v , . . . , vn}. The vector c := [c , . . . , cn] ∈ R n has entries c i = π S v i dσ, i = , . . . , n, where S is the unit sphere. The above problem can be handled by any o -the-shelf linear solver. We relied on CVX, a package for specifying and solving convex programs, see [1]. In the particular case when V is the space of spherical harmonics of degree at most L, see [19], the entries of B and c are computed as follows: • We pick the usual spherical harmonics {Y k , = , . . . , L, k = − , . . . , } as the basis {v , . . . , vn}, so that n = (L + ) . Among the multiple options to construct the spherical harmonics, we relied on Chebfun, a package for computing with functions, see [4].
• We collect the data sites x , . . . , xm from one of the appropriate databases mentioned below and we express each data location x j in polar coordinates (θ j , ϕ j ). • We evaluate each spherical harmonic at all the data sites to form the matrix B ∈ R n×m with entries We also evaluate the average of each spherical harmonic to form the vector c ∈ R n with entries π π π Y k (θ, ϕ) sin(θ)dθdϕ.
The numerical evaluations are carried out using Chebfun built-in capabilities. All experiments can be duplicated by downloading the reproducible available on S. Foucart's webpage.

. Average annual global temperatures
Our rst and most exhaustive experiment illustrates the application of our algorithm to the estimation of average annual global temperatures. The results are compared with the ones released by agencies such as NOAA National Climatic Data Center and NASA Goddard Institute for Space Studies. In these cases, the time dependence is discarded by considering for each year a function f = f (x) representing an average annual temperature depending only on the position x.
Data provenance: The experiment relies on the following standard data sets: • Raw Land Data (RLD), obtained by merging monthly land-based station temperatures (GHCNM) from [16] and monthly Antarctic land-based station temperatures (SCAR MET-READER) from [21]; • Processed Land Data (PLD), obtained by processing the RLD using GISTEMP steps 0-2, see [9] and [12]; • Gridded Land Data (GLD), using the PLD to assign temperatures for each grid center in the covering of the globe by , equal-area cells, see [10]; • Gridded Sea Data (GSD)¹, downloaded from the GISTEMP's website https://data.giss.nasa.gov/pub/gistemp/SBBX.ERSSTv4.gz and obtained following [13] and [20]. Algorithmic details: Our numerical algorithms require the choice of an approximation space V. We consider the linear spaces of spherical harmonics of degrees L = , L = , and L = , which we denote by SH3, SH6, and SH9, respectively. We also consider the linear spaces of piecewise constant functions on two standard partitions of the globe, namely the coarse and ne partitions used in [10], [11], [12], which we denote by PCC and PCF, respectively. We compare the performance of our algorithms with two standard methods provided by GISTEMP and by [20]. Both of these methods implicitly rely on piecewise constant approximation, however, they di er from our optimal algorithm. We compare these algorithms on two data sets: • Data set 1: This data set consists of the merging of GLD and GSD. This is the standard data set used in GISTEMP. • Data set 2: This data set consists of the merging of PLD and GSD. This set works more closely with raw data, at least on land. We did not have access to raw sea data. Observations: The results using PCC and PCF as well as using SH3, SH6 and SH9 on both data sets 1 and 2 are displayed in Figures 1 and 2. In these gures, temperature anomalies in • C were computed from the 1951-1980 baseline average. They are compared with those computed using GISTEMP and those reported by NOAA over the 1950-2016 time period. The results call for a number of comments: • Our results match those reported by NOAA and NASA tightly for data set 1, because they rely on the exact same data set, and more loosely for data set 2, which is closer to raw data; • Spherical harmonics are surprisingly e ective, considering that dim(V) = (L + ) = for L = (SH9) while dim(V) = , for the ne grid (PCF); • Moving from coarse to ne grid improves the approximation capability ϵ and does not severely deteriorate the compatibility constant µ V (which is also computed by the algorithm), and so produces more accurate results; • Likewise, for spherical harmonics, increasing L improves ϵ and does not severely deteriorate µ V . Note that selecting clustered stations, for example stations on land only, would generate a large constant µ V , since some spherical harmonics appearing in formula (13)    Weather station positioning: According to properties of -minimizers, the solution a * of (11) is sparse, meaning that only n = dim(V) weights among a * , . . . , a * m are nonzero, which implies that only n from the m weather stations are involved in the optimal estimation of average temperatures (recall that the computation of a * does not depend on the station readings but only on the positions of the stations). In our spherical harmonics experiments, we have observed that the n stations selected by the method tend to be evenly spread, see Figure 3, and that clustered stations, as in the land-only data set, tend to produce larger values of µ V , and thus the QoI estimation becomes less reliable in the latter case. When L increases, the value of µ V does not severely deteriorate as long as there are su ciently many stations which are reasonably spread around the globe. . Average seasonal regional temperatures We apply our method to estimate the average temperature in the state of Texas, see Figure 4, over two periods of three months (winter and summer) from 2000 to 2016. Time dependence is now incorporated. Thus, the space V consists of functions of two variables: the position x and the time t. We assume a piecewise constant dependence on x and a piecewise linear dependence on t, with breakpoints every week. The data we used underwent a preprocessing step producing average weekly temperatures at each weather station from daily values acquired from [17]. By contrast to the annual global temperature, we do not need to discard a weather station from the record just because one weekly reading is missing. The results in Figure 4 show excellent agreement with the NOAA regional temperatures using all available stations. The volume of data involved in the optimal estimation is reduced by a factor up to .

. Average annual precipitations for the contiguous US
For our precipitation computations, we use the annual precipitation data for the contiguous US which was extracted from the Global Precipitation Climatology Centre monthly precipitation data set (GPCC) from [22]. Figure 5 shows the results using PCF and compares them with values reported by [2] and downloaded from [5].  [2,5] are also shown.
We observe that PCF performs very well while utilizing only of the data set's grid locations in the contiguous United States, see Figure 6.

Discussion and conclusion
We have presented a template method to estimate diverse QoIs, with a special emphasis on average temperature. The method, whose parameters are computed o ine once and for all, is optimal with respect to the approximation model selected. Ideally, the method should be applied to raw data, where the data manipulation as performed by other methods (such as urban adjustment) should be integrated in the modeling stage, i.e., the choice of the space V. This choice is of course critical for the reliability of the method. It is rather surprising that spherical harmonics provided good results -this would have been conceivable for temperatures in the free troposphere, but on the earth surface they obliterate the important dependence of temperature on latitude, altitude, surface type, etc. We hope that domain experts can chime in and propose relevant choices of V for the model. A good space V should have a small dimension (the space of piecewise constants does not) for at least two reasons: the compatibility constant µ V will remain small and the estimation formula will feature few nonzero weights. The latter is due to the automatic sparsity of -minimizers. It does not suggest eliminating numerous weather stations and acquiring less data, as the nonzero weights depend on the QoI, on V, and on the available stations, but it presents an interesting advantage in terms of data transmission.