The atmospheric model of neural networks based on the improved Levenberg-Marquardt algorithm

Abstract: Traditional atmospheric models are based on the analysis and fitting of the various factors influencing space atmospheric density. Neural network models do not explicitly analyze the polynomials of each influencing factor in the atmospheric model, but use large data sets for network construction. Two traditional atmospheric model algorithms are analyzed, the main factors affecting the atmospheric model are identified, and an atmospheric model based on neural networks containing the various influencing factors is proposed. According to the simulation error, the Levenberg-Marquardt algorithm is used to iteratively realize rapid correction of the network weights, and the optimal neural network atmospheric model is obtained. The space atmosphere is simulated and calculated with the atmospheric model based on neural networks, and its average error rate is lower than that of traditional atmospheric models such as the DTM2013 model and the MSIS00 model. At the same time, the computational complexity of the atmospheric model based on neural networks is significantly lower than that of the traditional atmospheric models.


Introduction
Recently, with the development of constellation theory and application technology, a variety of low-Earth orbit communication constellations began to be deployed (Xu and Zhang 2018). High-precision orbit calculation of low-Earth orbit satellites has once again become a matter of great concern. For low-orbit satellites, atmospheric drag is one of the main perturbation factors, and the model accuracy of atmospheric density in low Earth orbit directly affects the accuracy of orbit calculation (Hatten et al. 2017). Since the 1960s, the various factors affecting atmospheric models have been summarized, and a variety of empirical models of atmospheric density have been established. There are two main kinds of traditional atmospheric models. In the first category, only the change of atmospheric density with height above the ground is considered, such as the exponential model and the improved Harris-Priester model (Montenbruck and Gill 2001). In the second category, not only the influence of height but also the influences of longitude, latitude, and different periods are taken into account, such as the Jacchia series models (Jacchia 1970), the DTM series models (Berger et al. 1998), and the MSIS series models (Hedin et al. 1977). Through long-term accumulation and analysis of historical measurement data, these models have formed linear or nonlinear polynomials for the different atmospheric factors. These models have relatively stable accuracy and are suitable for global atmospheric prediction. Their common disadvantage is their limited accuracy. The error rates are generally more than 10-15% when the influencing factors are stable (Marcos et al. 1994). The errors are usually about 30% when the various influencing factors are highly disturbed, and the errors are even as high as 100% in some years (Sarah et al. 2002).
With the development of neural networks and data mining technology, researchers have tried to combine data mining and artificial intelligence with atmospheric models, spacecraft orbit calculations, and other technologies to improve the accuracy of spacecraft orbit and atmospheric calculations. Tang et al. (2001) predicted the atmospheric circulation over the Pacific Ocean by using neural networks, and the results were superior to those of linear regression. Zhang et al. (2014) solved the resource scheduling problem of the Beidou constellation by an intelligent distribution algorithm. Cao et al. (2015) improved the accuracy of position and attitude estimation of satellite formations by an intelligent unscented predictive filter. Rosangela et al. (2015) used neural networks to perform data assimilation of an atmospheric circulation model and achieved encouraging results. Jiang et al. (2017) and Jiang (2018) studied the stability of small satellites in the gravitational field of small celestial bodies by intelligent classification. Liao et al. (2018) used spacecraft orbit data as the benchmark to form an inversion error model, modified the space atmosphere model based on the data mining method, and improved the accuracy of the atmosphere model.

Factors affecting the density of space atmosphere
The NRLmsise-00 model (hereinafter abbreviated to the MSIS00 model) proposed by the United States Naval Research Laboratory (NRL) in 2000 (Picone et al. 2002) and the DTM2013 model (Bruinsma 2015) released by ATMOP (Advanced Thermosphere Model and Orbit Prediction project) in 2013 are the two widely used high-precision atmospheric models. Since the vertical distribution of atmospheric temperature determines the vertical distribution of atmospheric density, various atmospheric models use the upper-atmosphere temperature as the representation of atmospheric density (Li 1995). Both atmospheric models calculate the atmospheric density by first calculating the atmospheric temperature. For the DTM2013 model, the atmospheric density at the height Z is:

ρ(Z) = Σ_i ρ_i(120 km) · f_i(Z) · exp(G_i(L))    (1)

where ρ_i(120 km) is the density of atmospheric ingredient i at the height of 120 km. The ingredients include O, O₂, H, He, N, and N₂; 120 km is the lower limit of the DTM models. f_i(Z) = (T₁₂₀/T(Z))^(1+α+γ_i) · exp(−σγ_i ζ), where T₁₂₀ is the absolute temperature at the height of 120 km and T(Z) is the absolute temperature at the height Z. G_i(L) is the correction term of the atmospheric ingredient density based on periodic and non-periodic factors. The non-periodic terms include the latitude term, the solar activity term, and the geomagnetic activity term. The periodic terms include the annual, semiannual, diurnal, and semidiurnal terms. The parameters of G_i(L) include the coefficients a_i, the associated Legendre polynomial coefficients P_n^m based on the solar incidence angle, the solar radiation proxy of the previous day, the 81-day mean of F₃₀, the 3-hour geomagnetic index K_P, the daily mean of K_P, and the day of year d.
For the MSIS00 atmospheric model, the temperature of the atmosphere at heights above 117.2 km follows an exponential profile controlled by ϵ(Z, Z_L), which is determined by the height Z and the latitude B through the latitude-dependent coefficient 3.085462×10⁻⁶ + 2.27×10⁻⁹·cos(2B), with Z_L = 120 km. The low-order spherical harmonic function fitted with the measured atmospheric data in space is a sum of correction terms whose fitting constants P_i are obtained from the parameter table. T₁ is the solar term (the solar radiation proxy used is F₁₀.₇). T₃ is the symmetrical annual term. T₄ is the symmetrical semi-annual term. T₅ is the asymmetrical annual term. T₆ is the asymmetrical semi-annual term. T₇ is the diurnal term. T₈ is the semidiurnal term. T₉ is the geomagnetic term (the geomagnetic index used is A_P). T₁₀ is the terdiurnal term. T₁₁ is the longitude term. T₁₂ is the hybrid term of time and longitude. T₁₃ is the hybrid term of time, longitude, and geomagnetic activity (Picone et al. 2002).
For the space lower than 117.2 km, the temperature is calculated by interpolation at five height nodes (Z₁ = 120 km, Z₂ = 110 km, Z₃ = 100 km, Z₄ = 90 km, Z₅ = 72.5 km). Latitude functions such as ϵ(Z, Z₁) and ϵ(Z₅, Z₁) are used in the temperature function T(Z). The GS(P_i) function is used in the fitting item of the node function T(Z); its terms T₁ through T₁₁ (the solar, symmetrical and asymmetrical annual and semi-annual, diurnal, semidiurnal, terdiurnal, geomagnetic, and longitude terms) are defined as in the region above 117.2 km, while the hybrid terms T₁₂ and T₁₃ are not used. Based on this analysis, the distribution of influence terms in the spherical harmonic functions of the DTM2013 model and the MSIS00 model is completely consistent. In both models, the measurement time (including the annual, semi-annual, diurnal, semidiurnal, and terdiurnal items), longitude, latitude, height, solar radiation proxies, and geomagnetic indices are included. Therefore, the algorithm of both the DTM2013 model and the MSIS00 model is to calculate the atmospheric temperature at a certain height indirectly and then obtain the atmospheric density. The atmospheric temperature at a certain location is a function of parameters such as height, measurement time, longitude, latitude, solar radiation proxy (and its 81-day mean), and geomagnetic index (and its daily mean). The function relating the atmospheric density to the influencing factors is:

ρ_Atmos = f(t, longi, lati, Z, SRP, GI)    (4)

Since the DTM2013 model and the MSIS00 model are both semi-empirical models, the polynomials and coefficients of f(t, longi, lati, Z, SRP, GI) are derived by analyzing and fitting historical measured data.
The process of constructing the atmospheric model from the influencing factors can be transformed into a training process of neural networks if f(t, longi, lati, Z, SRP, GI) is seen as the model of a neural network. A part of the historical measured atmospheric data is selected randomly to form the training data set. The parameters such as time, longitude, latitude, height, solar radiation proxies, and geomagnetic indices in the training set are taken as the input parameters, and the measured atmospheric density is taken as the expected output to train the neural network. The trained neural network is then the atmospheric model based on neural networks. Taking the remaining measured data as the test data set, the input parameters are substituted into the atmospheric model based on neural networks to calculate the simulated atmospheric density. The model error rate of the neural network can be obtained by comparing the measured density with the simulation results. Taking the minimum MAE (mean absolute error) rate as the target, the optimal neural network atmosphere model can be obtained through iterative training and parameter correction by error back propagation.
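The train/evaluate loop described above can be sketched as follows. The data, the toy density law, and the column layout are illustrative assumptions; note that scikit-learn's `MLPRegressor` trains with Adam or L-BFGS rather than the Levenberg-Marquardt algorithm used in this paper, so this only shows the overall fit-and-score procedure.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the fused GOCE data set: 8 normalized inputs
# [time, longitude, latitude, height, SRP, 81-day mean SRP, Kp, daily mean Kp]
X = rng.uniform(size=(2000, 8))
rho = np.exp(-3.0 * X[:, 3]) * (1.0 + 0.1 * X[:, 4])  # toy density law

idx = rng.permutation(len(X))
train, test = idx[:1200], idx[1200:]   # random split into training/test sets

# Iteration limit reduced to 200 here for brevity (the paper uses 1000)
model = MLPRegressor(hidden_layer_sizes=(40, 40, 40), activation="tanh",
                     max_iter=200, random_state=0)
model.fit(X[train], rho[train])

pred = model.predict(X[test])
mae_rate = np.mean(np.abs(pred - rho[test])) / np.mean(rho[test])
print(f"MAE rate: {mae_rate:.3f}")
```

The MAE rate here plays the same role as the model error rate in the text: the mean absolute deviation of the simulated density from the measured density, relative to the mean measured density.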
The solar radiation proxy F₃₀ and the geomagnetic index K_P are used in the DTM2013 model, and the solar radiation proxy F₁₀.₇ and the geomagnetic index A_P are used in the MSIS00 model. There are thus four possible combinations of solar radiation proxies and geomagnetic indices for the atmospheric model based on neural networks. To simplify the problem, the K_P index is matched with the F₃₀ and F₁₀.₇ proxies to complete the atmospheric modeling. Due to the equivalence of the A_P index and the K_P index (Leif 1976), the accuracy and adaptability of a model using the K_P index also apply to a model using the A_P index.

Atmospheric density data sets based on the GOCE measured data and neural networks
GOCE (Gravity field and steady-state Ocean Circulation Explorer) is a low-orbit satellite launched by the European Space Agency (ESA) in 2009, mainly used to detect the Earth's gravity field and ocean circulation. It is equipped with a highly sensitive gravimetric gradiometer capable of making three-dimensional measurements of changes in the Earth's gravity field. The orbital period of GOCE is about 1.6 hours. ESA has released the GOCE measured data as public products for precision atmospheric modeling (Bruinsma et al. 2015).
The measured data of GOCE include measured time, longitude, latitude, height, measured atmospheric density, and so on. These data can be combined with the measured data of solar radiation flux released by the NOAA (National Oceanic and Atmospheric Administration) (Wang et al. 2014) and the measured geomagnetic indices data released by the NGDC (National Geophysical Data Center) to construct two complete data sets of space atmosphere.
The atmospheric data measured by GOCE over the 3 years from November 2009 to November 2012 were fused with the two solar radiation proxies (F₁₀.₇ and F₃₀) and the geomagnetic index K_P. The data are divided into two data sets according to the solar radiation proxy. Data set 1 includes: measured time, longitude, latitude, height, F₁₀.₇ proxy, 81-day mean of F₁₀.₇, 3-hour geomagnetic index K_P, daily mean of K_P, and measured atmospheric density. Data set 2 includes: measured time, longitude, latitude, height, F₃₀ proxy, 81-day mean of F₃₀, 3-hour geomagnetic index K_P, daily mean of K_P, and measured atmospheric density.
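The fusion, cleaning, and normalization of such a data set can be sketched as below. The column names and value ranges are illustrative assumptions standing in for the actual GOCE/NOAA/NGDC records.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy rows standing in for fused GOCE / NOAA / NGDC records; columns follow
# data set 2 of the text (F30-based) but names and scales are illustrative.
cols = ["time", "longitude", "latitude", "height",
        "F30", "F30_81d_mean", "Kp_3h", "Kp_daily_mean", "density"]
scale = np.array([86400, 360, 180, 300, 300, 300, 9, 9, 1e-11])
df = pd.DataFrame(rng.random((100, 9)) * scale, columns=cols)
df.loc[0, "density"] = -1.0          # inject one invalid record

# Drop physically meaningless records (e.g. negative density), as in the paper
df = df[df["density"] > 0]

# Linear min-max normalization of every column
norm = (df - df.min()) / (df.max() - df.min())
print(norm.shape)
```

Keeping the per-column minima and maxima is essential: the same linear transform must be applied to new inputs, and its inverse recovers physical densities from the network output.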
Error back propagation was first proposed by James et al. (1987) of the PDP (Parallel Distributed Processing) group. Neural networks adopt the error back propagation algorithm to adjust the weights of the network, which solved the learning problem of multi-layer neural networks and greatly promoted the development of neural networks. In the neural network, the input layer, the hidden layers, and the output layer are connected, and the neurons within the same layer are isolated. The improved LM (Levenberg-Marquardt) algorithm (Lera and Pinzolas 2002) is adopted as the specific network learning method. The main idea of the LM algorithm is as follows. Suppose the number of neurons in the input layer is M, the number of neurons in the hidden layers is I, and the number of neurons in the output layer is J; the m-th neuron of the input layer is x_m, the i-th neuron of the hidden layer is k_i, and the j-th neuron of the output layer is y_j. The connection weight from x_m to k_i is ω_mi, and the weight from k_i to y_j is ω_ij. The weight matrix IW of the input layer is composed of the ω_mi, and the weight matrix LW of the hidden layers is composed of the ω_ij. For the (n+1)-th iteration:

ω(n+1) = ω(n) − H⁻¹g

where H is the Hessian matrix of the error performance function and g is the gradient. For a twice-differentiable function f(x, y), the Hessian matrix is:

H = [ ∂²f/∂x²   ∂²f/∂x∂y ;  ∂²f/∂y∂x   ∂²f/∂y² ]

The Hessian matrix of a function of more variables is analogous. When the error performance function has the form of a squared error, the Hessian matrix can be approximated as H = JᵀJ and the gradient is g = Jᵀe, where J is the Jacobian matrix containing the first derivatives of the error performance function with respect to the network weights and e is the error vector. The function for correcting the weights of the network in the LM algorithm is:

ω(n+1) = ω(n) − (JᵀJ + μI)⁻¹Jᵀe    (8)

where μ is the damping factor and I is the identity matrix. Because the Jacobian matrix is easier to calculate than the Hessian matrix, the LM algorithm can improve the learning efficiency of neural networks.
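A minimal sketch of the LM update of equation (8) on a toy least-squares problem (the line fit and the damping factor μ = 10⁻³ are illustrative choices, not from the paper):

```python
import numpy as np

def lm_step(w, jacobian, residual, mu):
    """One Levenberg-Marquardt update: w(n+1) = w(n) - (J^T J + mu*I)^-1 J^T e."""
    J = jacobian(w)                  # shape (n_samples, n_weights)
    e = residual(w)                  # shape (n_samples,)
    H = J.T @ J                      # Gauss-Newton approximation of the Hessian
    g = J.T @ e                      # gradient of the squared-error function
    return w - np.linalg.solve(H + mu * np.eye(len(w)), g)

# Toy least-squares problem: fit y = a*x + b to exact points on a line
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
w = np.zeros(2)                                         # [a, b], far from optimum
jac = lambda w: np.column_stack([x, np.ones_like(x)])   # d(residual)/dw
res = lambda w: (w[0] * x + w[1]) - y                   # error vector e

for _ in range(20):
    w = lm_step(w, jac, res, mu=1e-3)
print(w)   # converges toward [2.0, 1.0]
```

In practice μ is adapted per iteration (increased when a step raises the error, decreased when it lowers it), which interpolates between gradient descent and Gauss-Newton behavior.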

Construction of atmospheric model based on neural networks
The steps of constructing the atmospheric model based on neural networks are as follows:

1. Clean and fuse the measured atmospheric data of GOCE, the solar radiation proxy data, and the geomagnetic index data to form an integrated data set that includes measured time, longitude, latitude, height, solar radiation proxy, geomagnetic activity index, and measured atmospheric density. A total of 7,954,820 samples were collected after eliminating erroneous and meaningless records (such as negative measured atmospheric densities and missing geomagnetic index values). Since the value ranges of the different kinds of data vary greatly, the data set is normalized linearly based on the maximum and minimum values of each parameter (Jin et al. 2016).
2. Randomly partition the above data set: 3,000,000 samples are randomly selected to form the training data set, and the remaining 4,954,820 samples form the test data set.
3. Construct the neural network f(t, longi, lati, Z, SRP, GI) = ρ_Atmos. The network for atmospheric density simulation is set to contain 1-N hidden layers, and the number of neurons in a single hidden layer is between 5 and 25, because too many hidden layers and too many neurons in a single hidden layer would increase the computational and application complexity of the network. The training iteration limit is set to 1000. The initial values of the weight matrices IW and LW(n) (n = 1, 2, ..., N) of the input layer and hidden layers are random sequences.
4. Train the neural network with the training data set from step 2. The input parameters of a training sample include time, longitude, latitude, height, F₁₀.₇ (F₃₀), the 81-day mean of F₁₀.₇ (F₃₀), K_P, and the daily mean of K_P. The expected output is the atmospheric density at the corresponding time and position. Carry out linear normalization on the input parameters. Add a constant term to the input parameters, multiply the input sequence by the weight matrix IW, and carry out the Tan-Sigmoid transform on the result to complete the calculation of the input layer. Set the hidden layer counter n = 1.
5. Add a constant term to the input sequence, multiply the transformed input sequence by LW(n), and carry out the Tan-Sigmoid transform to complete the calculation of the n-th hidden layer.
6. If n = N, turn to step 7 for the calculation of the output layer; if n < N, set n = n + 1 and repeat step 5 to complete the calculation of the next hidden layer.
7. Since the transfer function of the output layer is linear, no further transform is required in the output layer. Carry out reverse linear normalization on the calculation results to obtain the simulated atmospheric density of the neural network.
8. Feed the input sequences of the test data set into the trained neural network. Compare the simulation results with the GOCE measured atmospheric density for the same input sequences and calculate the mean absolute error rate. If the mean absolute error rate meets the requirements or the number of iterations has reached the limit, stop the calculation and turn to step 9; otherwise, according to the change of the gradient, update the weights in the matrices IW and LW(n) (n = 1, 2, ..., N) based on equation (8), and return to step 4.
9. Record the weight matrices IW and LW(n) (n = 1, 2, ..., N) of the last iteration and the normalization parameters of the network. The atmospheric model based on neural networks is constructed from these parameters.
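The forward pass of steps 4-7 can be sketched as below. The layer shapes follow one reading of the text (an input layer transform via IW, tanh hidden layers via LW(n), and a final linear layer); the random weights and density bounds are placeholders, not trained values.

```python
import numpy as np

def forward(x, IW, LWs, rho_min, rho_max):
    """Forward pass of the atmospheric network described in steps 4-7.

    x    : normalized input vector (time, longi, lati, Z, SRP, GI terms)
    IW   : input-layer weight matrix, shape (hidden, len(x)+1)
    LWs  : hidden/output weight matrices, each shape (out, in+1)
    The +1 columns absorb the constant (bias) term appended at each layer.
    """
    a = np.tanh(IW @ np.append(x, 1.0))       # input layer, Tan-Sigmoid transform
    for LW in LWs[:-1]:
        a = np.tanh(LW @ np.append(a, 1.0))   # hidden layers
    y = LWs[-1] @ np.append(a, 1.0)           # linear output layer
    return rho_min + y * (rho_max - rho_min)  # reverse the linear normalization

# Illustrative shapes: 8 inputs, hidden layers of 40 neurons, 1 output
rng = np.random.default_rng(1)
IW  = rng.standard_normal((40, 9)) * 0.1
LWs = [rng.standard_normal((40, 41)) * 0.1 for _ in range(2)] + \
      [rng.standard_normal((1, 41)) * 0.1]
rho = forward(rng.random(8), IW, LWs, 1e-12, 1e-10)
print(rho.shape)
```

Appending the constant 1.0 before each matrix multiplication is exactly the "add constant term" of steps 4 and 5: it folds the bias of each layer into its weight matrix.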
The neural networks were set with 1-4 hidden layers and 5-43 neurons and trained with GOCE training data set 1. The curve of the model MAE rate relative to the number of hidden layers and neurons is shown in Figure 1. It can be seen from Figure 1 that the performance of the neural network changes greatly as the number of hidden layers and the number of neurons increase. For a single-hidden-layer network, the MAE rate drops rapidly and converges as the number of neurons increases. In networks with 2 or 3 hidden layers, the MAE rate decreases slowly as the number of neurons increases. In a network with 3 hidden layers and 40 neurons, the minimum MAE rate is about 9.6%. As the number of neurons increases further, the MAE rate gradually increases, and with more hidden layers it rises rapidly. This shows that a network with more than 3 hidden layers is over-fitting; conversely, a network with fewer than 3 hidden layers is under-fitting. Synthesizing the data in the figure, a neural network composed of 3 hidden layers and 40 neurons is optimal.
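The architecture sweep behind Figure 1 can be sketched as a simple grid search. The data, target function, and the reduced grid (1-3 layers, widths 5 and 15) are illustrative; scikit-learn's optimizer also differs from the paper's LM algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((600, 8))
y = np.exp(-3.0 * X[:, 3])            # toy stand-in for normalized density
Xtr, Xte, ytr, yte = X[:400], X[400:], y[:400], y[400:]

# Sweep architectures and record the test MAE rate of each
results = {}
for n_layers in (1, 2, 3):
    for width in (5, 15):
        net = MLPRegressor(hidden_layer_sizes=(width,) * n_layers,
                           activation="tanh", max_iter=300, random_state=0)
        net.fit(Xtr, ytr)
        pred = net.predict(Xte)
        results[(n_layers, width)] = np.mean(np.abs(pred - yte)) / np.mean(yte)

best = min(results, key=results.get)
print("best architecture (layers, width):", best)
```

Selecting the architecture on held-out data, as here, is what exposes the over-fitting of deep/wide networks and the under-fitting of shallow ones described above.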
The MSE (mean square error) variation curve of 1000 training sessions of the atmospheric model based on neural networks is shown in Figure 2.
The training process of the atmospheric model based on a neural network with 3 hidden layers is shown in Figure 2. When the number of iterations is less than 100, the MSE drops rapidly, which shows that the gradient of the network decreases quickly and the learning efficiency is high during this period. When the number of iterations exceeds 700, the MSE gradually converges to a plateau, which shows that the gradient changes very little and the learning efficiency is very low during this period. These data show that it is reasonable to set the upper limit of training iterations to 1000: the optimal performance has been reached, and further training cannot achieve a significant improvement in network performance.

Simulation Results
The MAE rate of the F₃₀-proxy neural network atmospheric model is 9.31%. Figure 3 shows the MAE rate after interval sampling.
The MAE rate of the F₁₀.₇-proxy neural network model is 9.60%. Figure 4 shows the MAE rate after interval sampling.
It can be seen from Figure 3 and Figure 4 that the MAE rates of the two neural network models are very close. For some outlier data with large errors, the MAE rate of the atmospheric model based on the F₁₀.₇ proxy is slightly greater than that of the model based on the F₃₀ proxy. The reason for this phenomenon is that the F₁₀.₇ proxy introduces a greater model error than the F₃₀ proxy when the data variance is larger (Cui et al. 2020).
The atmospheric density sampling of GOCE in March 2012 was selected as an example to analyze the accuracy of the two models in a large space range over a long period. The atmospheric density simulation results of the two neural network models corresponding to the GOCE orbital position were compared with the GOCE measured data. The global atmospheric density data measured by GOCE in March 2012 is shown in Figure 5.
The simulation results of the F₁₀.₇ neural network model for the global atmospheric density in March 2012 are shown in Figure 6.
The simulation results of the F₃₀ neural network model for the global atmospheric density in March 2012 are shown in Figure 7.
For the March 2012 data, the MAE rates of the atmospheric models based on F₁₀.₇ and F₃₀ are 8.18% and 8.07% respectively. It can be seen from Figure 5, Figure 6, and Figure 7 that the two neural network models reconstruct the atmospheric density data at the sampling points. Compared with the measured data, the simulation data of the two neural network atmospheric models are smoother, and the simulation accuracy at local extremum points is insufficient. Between the two models, the simulation of local extremum points by the neural network model based on the F₃₀ proxy is slightly better than that based on the F₁₀.₇ proxy. Since a large amount of the training data is concentrated at non-extreme points, the neural network suffers from no or insufficient training at some extreme points. This is visible in the figures, which show that the atmospheric model based on neural networks generalizes poorly at some extreme points.
To give specific model simulation results, GOCE measured data for each year from 2009 to 2012 were randomly selected. With the time, longitude, latitude, and altitude in the data as input parameters, the accuracy of the two neural network models and the two traditional atmospheric models was compared. The input parameters were fed to the DTM2013 model, the MSIS00 model, and the two neural network atmospheric models respectively. The randomly selected GOCE measured input parameters are shown in Table 1.
The GOCE measured atmospheric density and the atmospheric density simulation results of the four models are shown in Table 2.
The atmospheric density simulation MAE rates of the four models are shown in Table 3.
The statistics of the MAE rates of the above four models are shown in Figure 8. It can be seen from Table 1, Table 2, Table 3, and Figure 8 that, in the simulation of atmospheric density at randomly sampled space locations in different years, the performance of the two neural network models is significantly better than that of the two traditional atmospheric models. The MAE rates of the neural network models are about 1/3 to 1/2 of those of the traditional models. Between the two traditional atmospheric models, the MAE rate of the DTM2013 model is slightly lower than that of the MSIS00 model. Between the two atmospheric models based on neural networks, the MAE rate of the F₃₀ model is slightly lower than that of the F₁₀.₇ model. For the calculation of atmospheric density at the same time and location, the MAE rates of the two traditional atmospheric models show a certain consistency, and the two neural network atmospheric models also show a certain consistency. According to equations (1)-(4), the consistency of the two traditional models stems from the fact that they follow basically the same pattern in calculating atmospheric density. The consistency of the two neural network models stems from the fact that their input parameters and calculation methods are the same; the only difference is the solar radiation proxy used.
It can be seen from equations (1)-(4) that the calculation process of the traditional atmospheric models is as follows: polynomial fitting is carried out for several factors affecting the atmosphere, the fitting results are summed, the atmospheric temperature is calculated from the sum, and finally the corresponding atmospheric density is obtained from the atmospheric temperature. For the atmospheric model based on a neural network, the calculation process is as follows: the factors affecting the atmosphere are multiplied by the neural network weight matrices, and the corresponding atmospheric density is obtained directly. The atmospheric model based on a neural network does not need to evaluate a large number of fitted polynomials of atmospheric factors and does not need to pre-set a large number of atmospheric component coefficients in the algorithm; it only needs the neural network weight matrices, so its calculation process is much simpler than that of the traditional atmospheric models. On an HP ProLiant DL388 Gen10 server, the atmospheric model based on a neural network performs a single computation in about 1.5-3 ms, compared with about 5-7 ms for the traditional atmospheric models. The neural network atmosphere model is thus obviously faster than the traditional atmosphere models. In the orbit calculation of a low-Earth-orbit spacecraft with a height close to that of GOCE, the atmospheric model based on a neural network makes it possible to obtain the high-precision atmospheric density at the spacecraft position in quasi real time by simple matrix multiplication, to calculate the atmospheric drag on the spacecraft, and thereby to improve its orbit determination accuracy.
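The matrix-multiplication cost claim can be probed with a simple timing sketch. The random weights stand in for a trained network of the paper's size; absolute timings depend entirely on the hardware, so no specific figure should be read from this.

```python
import time
import numpy as np

rng = np.random.default_rng(3)
# Placeholder weights with shapes like a 3-hidden-layer, 40-neuron network
IW  = rng.standard_normal((40, 9))
LWs = [rng.standard_normal((40, 41)) for _ in range(2)] + \
      [rng.standard_normal((1, 41))]
x = rng.random(8)

def density(x):
    """Evaluate the network: a handful of matrix-vector products and tanh."""
    a = np.tanh(IW @ np.append(x, 1.0))
    for LW in LWs[:-1]:
        a = np.tanh(LW @ np.append(a, 1.0))
    return LWs[-1] @ np.append(a, 1.0)

t0 = time.perf_counter()
for _ in range(1000):
    density(x)
dt = (time.perf_counter() - t0) / 1000
print(f"{dt * 1e3:.4f} ms per evaluation")
```

The point is qualitative: evaluation cost is a fixed, small number of matrix-vector products, with no per-term polynomial evaluation as in the semi-empirical models.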

Conclusion
A novel atmosphere model based on neural networks has been derived. In this model, the atmospheric density measurement data of GOCE are taken as the basic data. The measured time, longitude, latitude, height, solar radiation proxies, and geomagnetic indices of the measured space are taken as the input parameters, and the atmospheric density at the space position is taken as the label. Comparing the simulation results of the DTM2013 model, the MSIS00 model, and the two neural network models confirms that the atmospheric model based on neural networks increases the calculation accuracy significantly and simplifies the calculation method significantly. Besides, the atmospheric model based on neural networks can be updated rapidly and iteratively with new spatial atmospheric measurement data, which improves the accuracy of the model. The atmospheric model based on neural networks can be applied to orbit determination and orbit prediction of low-Earth-orbit satellites.
The main limitation of the neural network model based on GOCE data is its limited adaptability, because the height of GOCE is relatively fixed (about 250-280 km). Introducing measured data from different space heights in future research, such as data from satellites like CHAMP and GRACE, would widen the range of satellite altitudes to which the model applies. In addition, the values of the solar radiation proxies F₁₀.₇ and F₃₀ are relatively high compared with low-solar-activity years, because 2009-2012 was an active period of the solar cycle, so measured data from different stages of the solar activity cycle should also be added in subsequent studies.

Funding information:
This research received no external funding.