Machine Learning for the Relationship of High-Energy Electron Flux between GEO and MEO with Application to Missing Values Imputation for Beidou MEO Data

We consider the problem of building the relationship of high-energy electron flux between Geostationary Earth Orbit (GEO) and Medium Earth Orbit (MEO). A time-series decomposition technique is first applied to the original data, yielding a trend part and a detrended part for both GEO and MEO data. We then predict the MEO trend from GEO data using three machine learning models: Linear Regression (LR), Random Forest (RF), and Multi-Layer Perceptron (MLP). Experiments show that RF achieves the best performance in all scenarios. A feature extraction analysis demonstrates that including lagged features and, where available, ahead features substantially helps the prediction. Finally, we present an application to imputing missing values in MEO data, in which an RF model with selected features handles the trend part while a moving block method handles the detrended part.


Introduction
It is well known that high-energy electrons in the Earth's outer radiation belt are crucial risk factors for satellite internal charging (Gubby and Evans 2002; Horne et al. 2013). This effect can cause significant satellite anomalies, leading to serious loss of service such as communication interruption and degraded navigation precision (Ryden et al. 2008; Singh et al. 2021). One way to avoid this effect is to use radiation-hardened components with aluminum during satellite design, but this can be expensive because of the additional mass and increased launch costs (Horne et al. 2013). Therefore, it is highly desirable to understand the behavioral characteristics of high-energy electrons and to make reliable forecasts and warnings of the radiation environment around our spacecraft. With the increasing number of satellites and their growing importance to our lives, this topic has attracted a large amount of attention over the past decades, resulting in many advances.
Most of these works focus on the Geostationary Earth Orbit (GEO) because of the large number of operational satellites and the wealth of available data. The approaches range from physical models to machine learning models or their combination (Bengtson et al. 2018). Physical models are mainly driven by our understanding of the physical mechanisms of electron generation and acceleration. Typical studies of this kind include Li and Temerin (2001), Millan and Thorne (2007), and Anderson et al. (2015). With the rapid progress of artificial intelligence (Onan et al. 2016b; Onan and Toçoglu 2021; Onan 2021; Onan and Korukoglu 2016), machine learning models have been widely applied to forecasting high-energy electron flux at GEO, yielding state-of-the-art results. Guo et al. (2013) propose an artificial neural network model with the radial basis function, in which lagged values of flux, solar wind parameters, and Ap-index are taken as input, achieving a predictive efficiency (PE) of around 0.82 for years -2010. Shin et al. (2016) develop a different neural network scheme to predict GEO electron flux at a high time resolution, which exhibits an excellent PE of 0.96 for GOES-15 and 0.93 for GOES-13 for 1-hour prediction (for detailed information about GOES, the Geostationary Operational Environmental Satellite, please visit https://www.goes.noaa.gov/). Wei et al. (2018) propose to use a long short-term memory (LSTM) network, a deep learning technique, to predict the 1-day-ahead integral flux, studying various feature combinations; the LSTM method achieved state-of-the-art performance at that time.
Compared to GEO, studies at Medium Earth Orbit (MEO) are relatively rare because observations are scarce, and their accuracy is far below expectation. The methods are mainly physically driven, such as the Radiation Belt Environment (RBE) model (Fok et al. 2008), the Versatile Electron Radiation Belt (VERB) model (Subbotin and Shprits 2009), and SPACE-CAST (Horne et al. 2013). In recent years, thanks to the Van Allen Probes (VAP), some new models have been developed; a typical example is the dynamic linear model using the Kalman filter (Tim et al. 2018). However, VAP-B was already deactivated in 2019 and VAP-A is about to cease operations, which makes it more challenging to study the behavior of high-energy electrons at MEO.
To mitigate this problem, we aim in this paper to build a relationship between GEO and MEO from the currently available data, using machine learning techniques that are becoming increasingly popular in astronomy research (Danilov and Karpov 2018; Boudreaux 2017; Peng and Bai 2019). Specifically, our focus is to create a model that predicts the high-energy electron fluxes detected by Beidou satellites at MEO from the fluxes observed by GOES. Once the model is built, it can be used to generate data well beyond the span of the current Beidou observations, because GOES has a far longer monitoring history. This is important in many applications; for example, such data could support fault analysis for MEO satellites that do not carry high-energy electron sensors. The model is also helpful for data processing; for example, it provides a good way to impute missing values in the Beidou observations, as we demonstrate in Section 4. In addition (and perhaps more importantly), the learned relationship between GEO and MEO could be used to make more accurate forecasts of high-energy electron flux for both GEO and MEO, following the idea of multi-task transfer learning (Jiang 2009; Samala et al. 2017).
To build such a model, we first apply a time-series decomposition technique to simplify the structure of the original data, yielding a trend part and a detrended part for both GEO and MEO data, which makes their relationship clearer. We then try three commonly used machine learning methods: Linear Regression (LR), Random Forest (RF), and Multi-Layer Perceptron (MLP). A feature extraction analysis is conducted, in which we gradually add lagged values and ahead values of the GOES observations to the prediction model to check whether these additional features improve the predictive performance.
The remainder of this paper proceeds as follows. Section 2 describes the source data and introduces our time-series decomposition technique. Section 3 presents the prediction methods and the feature extraction analysis. Section 4 demonstrates an application of the learned model to missing values imputation for Beidou MEO data. Section 5 concludes the paper and gives some discussion.

Description
The high-energy electron flux data used in this study include two parts: 5-min averaged observations from GOES-15 and 1-min observations from a Beidou MEO satellite (Zou et al. 2018). (The Beidou Navigation System consists of three kinds of orbits: GEO, MEO, and IGSO (Inclined Geosynchronous Orbit). The data used here are from one of the MEO satellites that carries electron detectors.) We choose the data ranging from January 2019 to February 2020 for two reasons: first, GOES-15 ceased its service in March 2020; second, the Beidou spacecraft was launched in late 2018 and the earliest data obtained are not in good condition. The GOES-15 data, available from the National Geophysical Data Center, contain two energy channels: >600 keV, denoted by 'E06', and >2 MeV, denoted by 'E2'. The Beidou data include four energy channels, denoted by 'Flux1', 'Flux2', 'Flux3', and 'Flux4', whose details are shown in Table 1.
All chosen data are shown in Figure 1, in which the top and bottom panels show the GOES data and the Beidou data respectively. The GOES data display good completeness and robustness, while the Beidou data contain many missing values and some outliers. To examine the data in more detail, we show five days of observations (from 2019-01-10 to 2019-01-14) in Figure 2. The two data sets show markedly different features, mainly because of the difference in orbit. The GOES data have a one-day seasonality, which is consistent with the one-day orbital period of GEO satellites. The Beidou data show stronger seasonality, in which high-value peaks and near-zero values alternate. This is because the Beidou MEO spacecraft passes through the Earth's radiation belt twice per orbit: the flux is high when the satellite is inside the belt and almost zero when it is outside.

Decomposition
From the analysis in Section 2.1 (as shown in Figure 2), it is quite difficult to build a direct connection between the two data sets, primarily because of the large difference in their spatial characteristics. To solve this problem, we propose to decompose the time series of both data sets to remove the spatial effect.
For the Beidou data, we follow a two-step procedure as shown in Figure 3, where we take Flux1 from 2019-01-10 to 2019-01-14 as an example. The first step is to extract the maximum value of each block from the original observations, where a block is the set of values recorded during one pass through the Earth's radiation belt (see the example marked by the red rectangle in Figure 3). The extracted maximum values constitute the trend of the original data. To make this step robust to outliers, the maximum of each block is taken as the median of the ten largest observations within that block. The second step is to detrend the original data by dividing it by the trend. The division is carried out block by block: all observations within a block are divided by the maximum value of that block. The resulting detrended data are shown in the bottom panel of Figure 3.
For the GEO data, the procedure is more straightforward. We simply take the moving average of the original data with a window size of one day, which gives the trend of the GEO data. We do not discuss how to detrend the GEO data, since only the trend part is required in the subsequent analysis.
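The two-step procedure for the Beidou data can be illustrated with a minimal sketch. It is not the code used in this study: the function name `decompose_blocks`, the `block_ids` labels (one integer per pass through the belt, assumed to have been assigned beforehand), and the handling of empty or zero-valued blocks are assumptions made for illustration.

```python
import numpy as np

def decompose_blocks(flux, block_ids, top_k=10):
    """Split a flux series into a per-block trend and a detrended series.

    flux      : 1-D array of flux observations (NaN marks a missing value)
    block_ids : 1-D array of the same length, one label per belt pass (assumed given)
    top_k     : number of largest values whose median defines the block maximum
    """
    flux = np.asarray(flux, dtype=float)
    block_ids = np.asarray(block_ids)
    labels = np.unique(block_ids)
    trend = np.full(labels.size, np.nan)
    detrended = np.full(flux.shape, np.nan)
    for j, label in enumerate(labels):
        mask = block_ids == label
        values = flux[mask]
        observed = values[~np.isnan(values)]
        if observed.size == 0:
            continue  # block entirely missing: its trend stays NaN
        # Step 1: robust block maximum = median of the top_k largest values
        robust_max = np.median(np.sort(observed)[-top_k:])
        trend[j] = robust_max
        # Step 2: block-wise division by the block maximum
        if robust_max > 0:
            detrended[mask] = values / robust_max
    return labels, trend, detrended
```

With this construction, two blocks that differ only by a scale factor yield identical detrended blocks, which is the property exploited later when imputing the detrended part.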
With the procedure mentioned above, we extract the trend for all energy channels of both GOES-15 and Beidou data. The results are shown in Figure 4, from which we see that the two kinds of data show a similar fluctuation property. Compared to the original data shown in Figures 1 and 2, the relationship between GEO data and MEO data now becomes straightforward, which lays the foundation for us to build their connection.
One problem remaining prior to modeling is the difference in time resolution between the two data sets: the GOES trend is recorded every 5 minutes, while the time resolution of the Beidou trend is about 386.6 minutes (half of the Beidou MEO satellite period). To solve this problem, we downsample the GOES trend with a window size of 386.6 minutes and the mean function, that is, the mean of the values within a window is taken as the representative value of that window. Note that since 386.6 is not an integer multiple of 5, the number of values within a window varies over time, but this does not matter for the mean function (it does matter for some other functions, e.g., 'sum').
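The GEO trend extraction and the downsampling step can be sketched with pandas as follows. This is an illustrative sketch only: the function names are hypothetical, the moving average is taken over a trailing one-day window, and the 386.6-minute windows are assumed to start at the first sample, whereas in practice they would need to be aligned with the Beidou block times.

```python
import numpy as np
import pandas as pd

def geo_trend(series, window="1D"):
    """Moving average over a one-day window (trailing window: an assumption)."""
    return series.rolling(window, min_periods=1).mean()

def downsample_mean(series, window_minutes=386.6):
    """Average all samples that fall into consecutive windows of fixed length."""
    # Minutes elapsed since the first sample (assumed window origin)
    elapsed = np.asarray((series.index - series.index[0]).total_seconds()) / 60.0
    bin_index = np.floor(elapsed / window_minutes).astype(int)
    # The mean is insensitive to the varying number of samples per window
    out = series.groupby(bin_index).mean()
    offsets = np.asarray(out.index, dtype=float) * window_minutes
    out.index = series.index[0] + pd.to_timedelta(offsets, unit="min")
    return out
```

For two days of 5-min data, the windows contain either 77 or 78 samples, which illustrates why the mean is a safer aggregation function than the sum here.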

Methods and Results
In this section, we present our methods for predicting the high-energy electron fluxes of the Beidou data from GOES observations, together with the experimental results. Since the detrended part of the Beidou data (see Figure 3) is mainly determined by the orbit of the Beidou MEO satellite and is not related to GEO observations, we only consider predicting the trend of the Beidou data from the GOES trend in the subsequent analysis.

Problem Formulation and Feature Extraction
One simple idea for our prediction is to find a function $f(\cdot)$ such that

$$Y_t = f(X_t),$$

where $X_t = (\mathrm{E06}, \mathrm{E2})_t$ and $Y_t = (\mathrm{Flux1}, \mathrm{Flux2}, \mathrm{Flux3}, \mathrm{Flux4})_t$ denote the model input and output at time $t$ respectively. This idea is straightforward and correct in theory, but may not yield good performance in practice, for example because of noise or time delay.
To address this, we propose to include two-step lagged features, denoted by $(X_{t-2}, X_{t-1})$, and ahead features, denoted by $(X_{t+1}, X_{t+2})$, in our prediction model. The prediction problem then becomes

$$Y_t = f(X_{t-2}, X_{t-1}, X_t, X_{t+1}, X_{t+2}).$$

Note that the ahead features may not be available in some situations, e.g., when the prediction is made at the current time and future values have not yet arrived. In this scenario, the input features reduce to the combination of lagged values and raw values, i.e., $(X_{t-2}, X_{t-1}, X_t)$.
We will explore through experiments how the inclusion of lagged and ahead features influences the prediction performance in Section 3.3.
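As an illustration of how such input vectors can be assembled, the following sketch stacks shifted copies of the GOES trend column-wise. The function `build_features` and its arguments are hypothetical names, and the sketch assumes the rows of `X` are equally spaced in time with no gaps.

```python
import numpy as np

def build_features(X, lags=(2, 1), aheads=(1, 2)):
    """Stack lagged, raw, and ahead copies of X column-wise.

    X : array of shape (n_samples, n_channels), ordered in time.
    Returns (features, t), where t holds the time indices for which every
    requested lagged and ahead value exists.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    max_lag = max(lags, default=0)
    max_ahead = max(aheads, default=0)
    # Drop the first max_lag and last max_ahead rows, which lack neighbours
    t = np.arange(max_lag, n - max_ahead)
    shifts = [-k for k in lags] + [0] + [k for k in aheads]
    features = np.hstack([X[t + s] for s in shifts])
    return features, t
```

Calling `build_features(X, aheads=())` gives the reduced input $(X_{t-2}, X_{t-1}, X_t)$ for the case in which future values are not yet available; the targets are then taken as `Y[t]`.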

Methods
We consider three commonly-used machine learning methods: Linear Regression (LR), Random Forest (RF), and Multi-Layer Perceptron (MLP).
LR is a method that assumes a linear relationship between features and target, which can be expressed as

$$\hat{y} = b + w^\top x.$$

The goal is to learn parameters $b^*$ and $w^*$ that minimize the errors between predictions and true values, which is usually implemented through Ordinary Least Squares:

$$(b^*, w^*) = \arg\min_{b, w} \sum_{i=1}^{n} \left( y_i - b - w^\top x_i \right)^2.$$

RF is an ensemble method that takes the Decision Tree as its base estimator and uses the bagging (bootstrap aggregating) strategy to combine the results of the base estimators, which usually consists of two steps. The first step is to bootstrap the original data to generate multiple datasets and to train a decision tree model on each of them, yielding multiple predictive results. The second step is to aggregate these results to obtain a more accurate and robust prediction. See Liaw et al. (2002) for more details about random forest models.
MLP is a feed-forward artificial neural network that generates a set of outputs from a set of inputs via a set of hidden units, as illustrated in Figure 5, which shows a simple MLP with two hidden layers. The number of hidden layers and the number of neurons per layer can be chosen flexibly for a specific problem, which makes MLP a powerful tool for a wide range of applications.
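The two bagging steps can be made concrete with a short sketch built on scikit-learn's `DecisionTreeRegressor`. It is meant only to illustrate the idea; the function names and the number of trees are arbitrary choices, and in practice `sklearn.ensemble.RandomForestRegressor` provides the complete method, including optional random feature selection at each split.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=50, seed=0):
    """Step 1: train one decision tree per bootstrap resample of (X, y)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        tree = DecisionTreeRegressor(random_state=int(rng.integers(0, 2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Step 2: aggregate the individual tree predictions by averaging."""
    X = np.asarray(X, dtype=float)
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```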

Experiments
The data used here are the trends of the GOES-15 data and the Beidou MEO data resulting from the procedures in Section 2.2, which range from January 2019 to February 2020 (1580 records in total). After removing the records with missing values (all from the Beidou MEO data), we obtain 1179 valid records. We then take the logarithm of all valid data as our final experimental data; the subsequent assessment is carried out on the logarithmic scale, which is a common step in electron flux data modeling (Wei et al. 2018). For each evaluation, we randomly select 75% of the experimental data as the training set and the remaining 25% as the test set.
Two metrics are used to evaluate the predictive performance: prediction efficiency (PE) and root mean square error (RMSE), which are defined as

$$\mathrm{PE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$$

where $y_i$ and $\hat{y}_i$ denote true and predicted values respectively, $\bar{y}$ is the mean of the true values, and $n$ is the number of test samples. The PE measures how well the predictive model performs compared to a naive method that simply takes the average of the observations as the prediction: zero means the model performs no better than the naive method, while one means perfect prediction. The RMSE reflects the difference between the true values and the values predicted by the model.
In terms of the input features, we consider five different combinations of raw, lagged, and ahead features, which are listed in Table 2. For the implementation of the three methods introduced in Section 3.2, we use scikit-learn (https://scikit-learn.org), a free open source Python machine learning library. The hyper-parameters for RF and MLP are chosen automatically with the so-called 'randomized search cross validation' strategy that is also implemented in scikit-learn.

Table 2. Description of five input feature combinations, denoted by 'R', 'RL1', 'RL12', 'RL12A1', and 'RL12A12', where ✓ and × represent inclusion and exclusion of corresponding features respectively.

Features                       R    RL1    RL12    RL12A1    RL12A12
Raw features (X_t)             ✓    ✓      ✓       ✓         ✓
Lagged features (X_{t-1})      ×    ✓      ✓       ✓         ✓
Lagged features (X_{t-2})      ×    ×      ✓       ✓         ✓
Ahead features (X_{t+1})       ×    ×      ×       ✓         ✓
Ahead features (X_{t+2})       ×    ×      ×       ×         ✓

Before turning to the performance of the methods, we first check the convergence and generalization properties of MLP, since it is a relatively complex model with a large number of parameters and may therefore be at risk of overfitting. Figure 6 shows the PE on a validation set (10% of the training data, set aside at random) against the number of epochs for MLP with input feature combination RL12A12 for the four targets (Flux1, Flux2, Flux3, and Flux4). We choose RL12A12 for this check because it has the largest number of input features and is therefore the case most prone to overfitting. We can see that the model converges and generalizes well, in the sense that the PE stabilizes after about 50 epochs and reaches a decent value on the validation set.
The comparative results of LR, RF, and MLP with the five input feature combinations are shown in Figure 7, which displays the mean PE and RMSE over 20 repeated experiments for each of the four targets (Flux1, Flux2, Flux3, and Flux4). That is, we run 20 experiments for each setting, e.g., (Input features=RL1, Method=RF, Target=Flux4), and each experiment uses a different train/test split, in the sense that the training set is resampled for every new experiment. Across methods, Random Forest substantially outperforms LR and MLP in both PE and RMSE regardless of input features and targets. In terms of the targets, Flux1 is the easiest to predict while Flux4 is the hardest. With respect to the input features, the inclusion of lagged and ahead features clearly improves the predictive performance of all the methods, especially for Flux1 and Flux2. To conclude, the Random Forest method with some lagged and ahead features gives the best results: the PE is greater than 0.75 for all fluxes and almost reaches 0.9 for Flux1, which indicates quite good prediction. Therefore, we consider predicting MEO high-energy electron fluxes from GEO data to be a promising way to address the problem of data scarcity at MEO.
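The evaluation protocol described in this section (PE, RMSE, and repeated random 75%/25% splits) can be sketched as follows. The function names are hypothetical, and the Random Forest settings shown here are illustrative defaults rather than the hyper-parameters selected by randomized search in our experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def prediction_efficiency(y_true, y_pred):
    """PE = 1 - SSE / SST; 0 matches the mean predictor, 1 is perfect."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - sse / sst)

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def repeated_evaluation(X, y, n_repeats=20, test_size=0.25):
    """Mean (PE, RMSE) over repeated random train/test splits."""
    scores = []
    for seed in range(n_repeats):
        # A new random split is drawn for every repetition
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        scores.append((prediction_efficiency(y_te, y_hat), rmse(y_te, y_hat)))
    return np.mean(scores, axis=0)
```

Here `X` would hold one of the feature combinations of Table 2 and `y` the logarithm of one Beidou trend channel.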

Application: Missing Values Imputation for Beidou MEO Data
As shown in the bottom panel of Figure 1, the Beidou MEO data contain many missing values, which poses a challenge for subsequent data analysis and applications. It is therefore necessary to impute these values before going forward. In this section, we introduce an approach to accomplish this goal, in which the Random Forest method presented in Section 3.2 is used to handle the trend part while a moving block method is used for the detrended part (seasonality).
Impute trend part: With the available data, we can learn a Random Forest model that predicts the trend of the MEO fluxes from GEO observations. Since the GEO data are complete, we can feed the GEO data whose corresponding MEO observations are missing into the learned model, which gives the imputed values of the trend part of the MEO data. The original and imputed trends of the Beidou MEO data are shown in Figure 8 for all energy channels. We see that the imputed trend is complete and looks quite reasonable.
Impute detrended part: From Figure 3, we see that the blocks of the detrended data behave quite similarly to one another. Based on this characteristic, we use a simple moving block technique to impute the detrended part of the Beidou MEO data: we randomly choose a block from the available (non-missing) data and take it as the values of the missing block. We repeat this procedure for all missing blocks, which results in complete detrended data. Figure 9 displays the original and imputed seasonality for Flux1 from 2019-02-01 to 2019-02-05, which shows sensible results. To save space, we only show the results for Flux1; the results for Flux2, Flux3, and Flux4 are similar.
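The trend imputation step can be sketched as follows. This is a simplified illustration under stated assumptions: `impute_trend` is a hypothetical name, the base-10 logarithm is assumed for the log transform of the target, the trend values are assumed to be strictly positive, the number of trees is arbitrary, and `geo_features` is assumed to already contain the selected (log-scaled) GEO trend features for every time step.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def impute_trend(geo_features, meo_trend, seed=0):
    """Fill NaN entries of the MEO trend with Random Forest predictions.

    geo_features : array (n_samples, n_features), complete GEO inputs
    meo_trend    : array (n_samples,), positive values with NaN where missing
    """
    geo_features = np.asarray(geo_features, dtype=float)
    meo_trend = np.asarray(meo_trend, dtype=float)
    missing = np.isnan(meo_trend)
    # Train on the records where the MEO trend is observed (log scale assumed)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(geo_features[~missing], np.log10(meo_trend[~missing]))
    imputed = meo_trend.copy()
    if missing.any():
        # Predict the missing records from GEO data and map back to flux units
        imputed[missing] = 10.0 ** model.predict(geo_features[missing])
    return imputed
```

Observed values are left untouched; only the missing entries are replaced by model predictions.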
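The moving block technique for the detrended part can be sketched as below. The representation of the data as a list of per-block arrays, with `None` marking a missing block, is an assumption made for illustration, as is the function name; the sketch also ignores the fact that real blocks may differ slightly in length.

```python
import numpy as np

def impute_missing_blocks(blocks, seed=0):
    """Replace each missing block by a randomly chosen observed block.

    blocks : list whose items are 1-D arrays of detrended values for one
             belt pass, or None when the whole block is missing.
    """
    rng = np.random.default_rng(seed)
    available = [np.asarray(b, dtype=float) for b in blocks if b is not None]
    if not available:
        raise ValueError("no observed block to draw from")
    filled = []
    for b in blocks:
        if b is None:
            # Draw a donor block uniformly at random from the observed ones
            donor = available[rng.integers(len(available))]
            filled.append(donor.copy())
        else:
            filled.append(np.asarray(b, dtype=float))
    return filled
```

Multiplying each filled block by the corresponding imputed trend value then recovers the flux on its original scale.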
The imputed trend times the imputed seasonality results in imputed raw data, as shown in Figure 10, where original and imputed Flux1 from 2019-02-01 to 2019-02-05 are displayed.

Conclusion and Discussion
In this paper, we proposed a machine learning approach that predicts Beidou MEO high-energy electron fluxes from their counterparts observed by the GOES-15 satellite. This proposal provides a way to mitigate the problem of data scarcity at MEO, which could subsequently facilitate the understanding and exploration of the MEO environment. The approach first decomposes the original data into a trend part and a detrended part, then applies a regression technique to predict the trend of the Beidou MEO data from the trend of the GOES-15 data.
In the experiments, we explored three commonly used regression models and found that Random Forest substantially outperforms Linear Regression and Multi-Layer Perceptron, reaching a prediction efficiency (PE) of around 89% for Flux1 with all features. A feature engineering analysis shows that including lagged features and, where available, ahead features in the prediction model helps to improve model performance, increasing the PE by 1% to 10%. We also checked the convergence and generalization properties of MLP to guard against overfitting, and found that it performs well and stabilizes after around 50 epochs. Finally, we illustrated our method with an application to imputing missing values in the Beidou MEO data.
In the current analysis, we focused on predicting MEO values with GEO data, not the other way around, which is mainly because this direction is of greater importance due to data scarcity at MEO that does not exist at GEO. In the future, we would consider both directions, which could enable a more accurate forecast of the high-energy electron flux for both GEO and MEO since the learned relationship between them provides a way for them to borrow information from each other. This could be implemented following the idea of multi-task transfer learning (Samala et al. 2017).
While we only considered three commonly used models, one could explore more methods, use ensemble strategies (Hastie et al. 2009; Onan et al. 2016a; Onan 2018), or even try an automatic machine learning scheme (Zeng and Luo 2017). We also encourage including data from satellites other than GOES-15 to achieve better results. Given that only a small amount of data is available at MEO for now, we did not take deep learning models (Lecun et al. 2015) into account, but it would be advisable to try them in the future as more data become available.