Measuring the Imbalance of Regional Development from Outer Space in China

Abstract We develop a statistical framework that uses satellite night-time-lights data (DN) to augment official GDP measures, and we derive a non-linear substitution relationship between DN and GDP. We then use DN in place of GDP to measure the imbalance of regional development (IRD) in China through a bi-dimensional decomposition based on the population-weighted coefficient of variation. The method lets us analyze the contributions of DN components to within-region and between-region inequality in a single framework. We conclude that the imbalance between regions, rather than within regions, is the main source of IRD for China as a whole.


Introduction
In the past 30 years, China's economy has grown at a speed rarely seen elsewhere in the world; the average growth rate of gross domestic product (GDP) from 2004 to 2016 was about 8.9%. However, even as the miracle of China's economic growth is acknowledged, the authenticity of China's GDP statistics has been questioned. This in turn raises doubts about measures of the imbalance of regional development in China that rely on GDP or its derivative indicators. These doubts have made the authenticity of China's economic growth and the imbalance of regional development important topics of academic discussion.
As an important indicator of the economic status and development level of a country or region, GDP also serves as the core indicator of regional development imbalance in the existing literature, and it is produced under standard accounting methods and a strict statistical system. However, compared with developed countries, the GDP accounting methods and statistical systems of developing countries are generally less advanced [1,2]. Wu [3] pointed out four sources of estimation error in China's GDP: data distortion, inaccurate conversion, underestimation of the service industry, and other errors and sampling errors. Others argue that political pressure on local governments and regional development imbalances are also causes [4,5]. In the face of doubts about GDP estimation errors, economists have sought a more objective indicator to replace GDP. In recent years, the night-time-lights data (DN) released by the National Oceanic and Atmospheric Administration (NOAA) have received extensive attention from the academic community, and many scholars have studied the relationship between DN and GDP in depth, generally concluding that there is a positive linear substitution relationship between them. Henderson, et al. [6] developed a statistical framework to use satellite data on night-time lights to augment official income growth measures, and Xu, et al. [5] applied this method to study the authenticity of China's GDP; they maintained that, although there is a certain deviation, the linear substitution relationship between DN and GDP is significant. Subsequently, scholars in China replaced GDP with DN and launched a series of studies on industry, space, economic development, and so on. For example, Mellander, et al. [7] used night-time light data as a proxy measure for economic activity, Xin, et al. [8] monitored urban expansion using the spatial dimension of night-time light data, and Li, et al. [9] analyzed the relationship between local political chief turnover and economic growth after characterizing economic growth with night-time light data. However, when measuring the relationship between DN and GDP, these studies did not examine the relationship in the actual data and ignored possible nonlinear relations.
In this paper, we focus on the actual substitution relationship between GDP and DN, and especially on nonlinear relationships. Once the specific substitution relationship between the two has been determined, we use DN instead of GDP to examine the imbalance of regional development in China further. Moreover, when measuring regional inequality, two types need to be distinguished (Milanovic [10]; Kanbur and Venables [11]): the first refers to un-weighted variation in per capita income across regions, while the second concerns population-weighted variation in per capita income across regions. It is well known that the square of the coefficient of variation belongs to the generalized entropy class of inequality measures. Akita and Miyata [12] improved the population-weighted variation model for calculating the degree of regional imbalance, and its spatial decomposition is one of the main methods in current research on unbalanced regional development, because it enables us to analyze the contributions of GDP components to within-region and between-region inequalities and to overall regional inequality in a coherent framework. As Qin, et al. [13] showed, population-weighted variation, which integrates income decomposition into the decomposition by space and industry, is a more suitable method for China.
In light of the inverse absolute deviation method of Liu, et al. [14] for measuring imbalance, and drawing on other methods in existing research, we propose an adjusted weighted coefficient of variation (ACV) for the spatial decomposition of regional development imbalance. We then decompose the ACV using DN instead of GDP to explain the current state of regional imbalance in China. The empirical analysis suggests that the new method is more widely applicable and that its results are in line with actual conditions.
The paper is arranged as follows. In Section 2, we verify the nonlinear relationship between GDP and night-time lights data in China. In Section 3, we measure the degree of imbalance in China's regional development. In Section 4, based on the measurement results, we analyze the process and causes using our proposed ACV method and actual data for China from 2004 to 2016, and we give the conclusion.

The Substitution Relationship Between GDP and DN
Existing research on the relationship between night-time-lights data (DN) and GDP is limited to positive, linear relationships. In this part, based on official Chinese GDP data and the night-time-lights data published by NOAA, we examine the specific relationship between GDP and DN from a nonlinear perspective, with both theoretical and empirical verification.

Data of Night-Time-Lights
Satellites from the United States Air Force Defense Meteorological Satellite Program (DMSP) have been circling the earth 14 times per day, recording the intensity of Earth-based lights with their Operational Linescan System (OLS) sensors, since the 1970s, with a digital archive beginning in 1992. The raw data come from NOAA and the National Geophysical Data Center (NGDC) and are open to the public. In processing the data, observations are removed for places experiencing the bright half of the lunar cycle, the summer months when the sun sets late, auroral activity (the northern and southern lights), and forest fires. These restrictions remove intense sources of natural light, leaving mostly man-made lights. Observations where cloud cover obscures the earth's surface are also excluded. Finally, data from all orbits of a given satellite in a given year are averaged over all valid nights to produce a satellite-year dataset (Piet, et al. [15]). These are the datasets that are distributed to the public (Xi and William [16]; Xu, et al. [5]). In this paper, following Henderson, et al. [6], we calculate the night-time lights data for China from NOAA using ArcGIS. Unlike them, we use data from satellites F15, F16 and F18, covering 2004 to 2013. In addition, we average DN over the region's population rather than over the number of grid cells or the geographic area. Before applying DN to measure the degree of imbalance in China and its provinces, it is necessary to verify the correlation between the night-time lights data and the GDP data, to show that GDP can be replaced by DN.

The Hypothesis of Model
In most of the existing literature, the relationship between GDP and lights data is treated as linear. Henderson, et al. [6], for example, model the growth rate of GDP as a linear function of the growth rate of lights brightness, where $y_i$ denotes the growth rate of GDP, $l_i$ the growth rate of lights brightness, $i = 1, \dots, N$ indexes regions, and $u_i$ is the error term. In practice, however, a linear relationship between GDP and lights intensity is not very appropriate. The brightness of the lights reflects a stable state of a growing region, whereas a linear relationship implies an ever-increasing or ever-decreasing relationship, which is inconsistent with the real economy.
Given the actual data and our research purpose, we first test the correlation between GDP and lights brightness using the model in Equation (1), where $y$ denotes per capita GDP, $l$ denotes per capita lights brightness, and $f(\cdot)$ is a nonlinear function reflecting the correlation between GDP and lights brightness. Based on a preliminary nonlinearity test, whose verification is given later, we assume that $f$ is a quadratic function. We also estimated and tested other nonlinear functional forms for the relationship between DN and GDP, such as S-type, power-exponential, cubic, and logarithmic functions. However, the quadratic function fits best, as shown by its smallest residual sum of squares; the corresponding explanation is given later.
Although the lights data are obtained from outer space by satellite, they still contain some measurement error. For example, differences in geographical location, cultural customs, and lifestyles between provinces may result in different lights brightness. To reduce the interference of measurement error with the estimates, we extend the model to Equation (2), where $\eta$ is the unobservable province effect, $\kappa$ is the time effect, $C$ is a vector of control variables, $\gamma$ is the coefficient vector on the control variables, $\mu$ is a random error that is normally distributed with zero mean and constant variance, $\alpha$ and $\beta$ are the parameters of the nonlinear function, and $N$ and $T$ are the numbers of provinces and years, respectively.
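The displayed equations did not survive in this text. One form that is consistent with the symbol definitions above and with the quadratic specification used later is sketched below. The absence of an intercept, the placement of the error term, and the index conventions are assumptions, not the authors' stated equations.

```latex
\begin{align}
  % Equation (1), assumed form: general nonlinear relationship
  y &= f(l) \\
  % Assumed quadratic form of f with parameters \alpha and \beta
  f(l) &= \alpha l + \beta l^{2} \\
  % Equation (2), assumed form: panel model with province and time effects
  y_{it} &= \alpha l_{it} + \beta l_{it}^{2} + \gamma C_{it}
            + \eta_{i} + \kappa_{t} + \mu_{it},
  \quad i = 1, \dots, N, \; t = 1, \dots, T
\end{align}
```

Under this reading, a negative $\beta$ together with a positive $\alpha$ means that GDP rises with lights brightness at a decreasing rate.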

Description of Variables
In the measurement model, Equation (2), the dependent variable is per capita GDP (unit: hundred yuan), and the core independent variable is per capita lights brightness (sum of DN / total population * 100). In choosing other influencing factors we draw on existing research: Zhang [17] included the human development index when studying the dynamic linkage between information and communication technology, Li, et al. [18] used urbanization in assessing spatial and temporal differences in the impacts of factors, Gao, et al. [19] brought ecosystem services into regional development, and so on. We also try some substitution variables: consumption of electricity (Elec), expressed as total electricity consumption, which is highly correlated with lights brightness; and the rate of urbanization (Urb), expressed as the proportion of the urban population in the total population. Finally, during model fitting we added control variables, including labor force (Lab); the rate of employment (Emp), measured as the number of employed persons divided by total population; and capital assets (Cap), expressed as the ratio of fixed asset investment to GDP.
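The variable definitions above can be sketched in a few lines of Python. The function names, the record layout, and every number in the sample record are hypothetical, and the population unit (persons) is an assumption, since the text does not state it.

```python
def per_capita_lights(sum_dn, population):
    """Per capita lights brightness: sum of DN / total population * 100."""
    return sum_dn / population * 100


def urbanization_rate(urban_population, population):
    """Share of the urban population in the total population (Urb)."""
    return urban_population / population


def employment_rate(employed, population):
    """Number of employed persons divided by total population (Emp)."""
    return employed / population


def capital_ratio(fixed_asset_investment, gdp):
    """Ratio of fixed asset investment to GDP (Cap)."""
    return fixed_asset_investment / gdp


# Hypothetical province-year record; all figures are made up for illustration.
record = {
    "sum_dn": 250_000.0,
    "population": 50_000_000,
    "urban_population": 27_500_000,
    "employed": 30_000_000,
    "fixed_asset_investment": 1.2e12,
    "gdp": 3.0e12,
}

variables = {
    "l": per_capita_lights(record["sum_dn"], record["population"]),
    "Urb": urbanization_rate(record["urban_population"], record["population"]),
    "Emp": employment_rate(record["employed"], record["population"]),
    "Cap": capital_ratio(record["fixed_asset_investment"], record["gdp"]),
}
print(variables)
```

Keeping each definition in its own function makes it easy to swap the denominator, for example to the number of grid cells, as in the robustness checks.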
The sample in this paper covers 31 provinces and cities in China (excluding Hong Kong, Macao and Taiwan). The time span is 2004 to 2013, and all the data come from open databases: per capita GDP and population come from the official website of the National Bureau of Statistics; the night-time-lights data come from NOAA (https://ngdc.noaa.gov/eog/opennewwindow); electricity consumption comes from the National Bureau of Statistics, the provincial statistical yearbooks and the Chinese energy statistics yearbooks; and the urbanization rate, capital, and labor data come from the China Statistical Yearbook and the provincial statistical yearbooks. Table 1 gives descriptive statistics for the variables. The available lights data span 1992 to 2013, and many research institutes have identified 2004 as a turning point in the study of regional development imbalance; we therefore selected data from 2004 to 2013, giving a sample size of 310. Because electricity consumption data are missing for Tibet and Xinjiang in some years, that variable has fewer observations than the others. The variance inflation factors (VIF) in the last column of Table 1 are all less than 3, indicating that there is no significant multi-collinearity among the independent variables.

The Test of Model
The form of the nonlinear function $f(l)$ is determined from three data sets on per capita DN and per capita GDP: the national time series for 2004-2013, the cross-section of provinces and cities in 2013, and the panel of provinces and cities in China during 2004-2013. The aim is to find the functional form with the smallest estimation error and the best fit, in order to determine the substitution relationship between DN and GDP and then to study regional development imbalance. A test for nonlinearity is required before a specific nonlinear model is adopted. We fitted the nonlinear relationship in Equation (1) with a variety of methods and display those with relatively good fit in Figure 1. The four panels of Figure 1 lead to consistent conclusions. The upper-left panel shows the correlation between DN and GDP as a curve of fluctuating growth, and the upper-right panel displays the different fitted curves, including a quadratic function, an S-type function and a simple linear function. In all panels of Figure 1, per capita DN is the explanatory variable and per capita GDP is the dependent variable. Comparing all the functions above, we conclude that the linear fit is the worst and that the quadratic fit is slightly better than the S-type fit. This is the main reason why we choose the quadratic model for the nonlinear relationship between the two in this paper.
Table 2 reports the estimation results. Column (1) is the fixed-effects (FE) estimate, controlling for time and province effects, under the linear model; β = 1.143 is significant at the 1% level, indicating a significant positive correlation between DN and GDP. Column (2) is the FE estimate, controlling for time and province effects, under the S-type model; the parameters k = 397.79, α = 1.943 and β = −0.015 are all significant at the 1% level, and the adjusted R-squared is greater than in column (1), indicating a significant nonlinear 'S' relationship between DN and GDP. Column (3) is the FE estimate, controlling for time and province effects, under the quadratic assumption; both α = 1.856 and β = −0.0002 are significant at the 1% level, and the adjusted R-squared is greater than in columns (1) and (2), so there is a significant quadratic relationship between DN and GDP. We therefore use the quadratic relationship in the rest of the paper. Column (4) adds a time trend term; α and β remain significant, although α falls to 1.792 and β changes to −0.0019.
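The model comparison described above, choosing the functional form with the smallest residual sum of squares, can be illustrated with a small pure-Python least-squares fit. This is a minimal sketch on made-up data, not the fixed-effects estimation behind Table 2: the data values, the inclusion of an intercept, and all function names are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def rss(rows, y, coef):
    """Residual sum of squares of a fitted linear-in-parameters model."""
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


# Hypothetical per capita DN (l) and per capita GDP (y); not the paper's data.
l = [10.0, 20.0, 40.0, 80.0, 120.0, 160.0, 200.0]
y = [18.3, 37.4, 73.6, 147.5, 219.5, 292.1, 363.0]

linear_rows = [[1.0, li] for li in l]          # y = a + b*l
quad_rows = [[1.0, li, li * li] for li in l]   # y = a + b*l + c*l^2

rss_linear = rss(linear_rows, y, fit_ols(linear_rows, y))
rss_quad = rss(quad_rows, y, fit_ols(quad_rows, y))
print("RSS linear:", rss_linear, "RSS quadratic:", rss_quad)
```

Because the linear model is nested in the quadratic one, the quadratic fit can never have a larger residual sum of squares on the same data; a comparison against non-nested forms such as the S-type function, or one based on adjusted R-squared as in Table 2, is the more informative check.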
As Berliant and Weiss [20] discuss, real GDP growth in different countries may be spatially autocorrelated at a given time, in which case ordinary least squares (OLS) estimates may be biased; spatial econometric methods can overcome this weakness. We therefore construct a spatial econometric model and report the estimates of the spatial auto-regressive model (SAM) in column (5) of Table 2. Compared with the FE estimates, the SAM estimates show two clear changes: first, α increases and β decreases, reflecting a higher correlation between DN and GDP; second, the R² of the model is greater, reflecting the stronger explanatory power of the SAM model.
In columns (6) and (7) of Table 2, we examine the relationship between GDP and its substitution variables, the consumption of electricity (Elec) and the rate of urbanization (Urb). The estimates show significant positive linear relationships between Elec and GDP and between Urb and GDP, but R² is smaller than in the other models. In column (8) we estimate per capita DN, Elec and Urb jointly; although the coefficient values vary greatly, the results remain statistically significant. Finally, column (9) adds the other control variables; although α decreases and β increases, the results remain highly significant, which further supports the nonlinear relationship between DN and GDP.

The Test of Robustness
First, we replace per capita DN with the mean of lights (sum of DN / number of grid cells) and verify the correlation between the growth rate of mean DN and the growth rate of GDP; the results are shown in columns (1) and (2) of Table 3. In column (1), the coefficient β is 0.013 and significantly positive, indicating that the positive correlation between night-time lights brightness and GDP does not change. The two coefficients of the quadratic form in column (2) have the same signs as in Table 2, which shows that the nonlinear relationship between lights brightness and GDP does not change either. Note: ***, **, and * denote significance at the 1%, 5%, and 10% levels, respectively; the same applies below.
Second, we consider persistent ground-level flames, which may interfere with the accuracy of night-time lights measurement. Henderson, et al. [6] noted that gas flares generated during oil production may register as light and affect the estimates. We therefore remove six of the 31 provinces and cities with large oil production, namely Tianjin, Heilongjiang, Shaanxi, Shandong, Xinjiang and Guangdong, keep the other variables in Equation (2) unchanged, and re-estimate the correlation between per capita DN and per capita GDP. Column (3) of Table 3 shows the linear estimates, which change little compared with Table 2. Likewise, the two coefficients of the quadratic function in column (4) do not differ significantly from those in Table 2, which indicates that interference from gas flares does not have a significant impact on the overall estimates and that our results are stable.
Finally, because we selected data from 2004 to 2013, the DN data we use come from satellites F15, F16, and F18. To eliminate the error caused by differences between satellites, we average the DN data from different satellites in the same year with equal weights. The results are shown in column (5), and the coefficients are all significant at the 1% level.
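The equal-weight averaging across satellites can be sketched as follows. The readings and the `average_by_year` helper are hypothetical; only the satellite names come from the text.

```python
from collections import defaultdict

# Hypothetical annual sum-of-DN readings keyed by (satellite, year).
readings = {
    ("F15", 2004): 100.0,
    ("F16", 2004): 110.0,
    ("F15", 2005): 104.0,
    ("F16", 2005): 112.0,
    ("F18", 2010): 180.0,
}


def average_by_year(readings):
    """Equal-weight mean of the readings from all satellites in each year."""
    by_year = defaultdict(list)
    for (_satellite, year), value in readings.items():
        by_year[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


yearly_dn = average_by_year(readings)
print(yearly_dn)
```

Years observed by a single satellite pass through unchanged, so the same function handles both overlapping and non-overlapping years.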
In general, the tests on China's provincial-level GDP and DN show that, although the coefficient estimates of the quadratic relationship differ somewhat across estimation methods, the differences are not large overall, and all coefficients are significant at the 1% level. There is therefore sufficient evidence of a very significant correlation between DN and GDP, which further indicates that lights brightness can be used as one of the variables for observing economic growth under certain conditions.

Measuring of the Imbalance of Regional Development in China
Because of measurement errors, GDP growth in official statistics may not accurately mirror China's actual economic growth. The authenticity of China's local GDP data is widely questioned because of possible human manipulation, and the deviation between officially calculated GDP and the actual economy may be greater at the provincial level. Most research on the imbalance of regional development (IRD) in China relies on GDP and its derivatives, which makes the measured degree of IRD in China questionable. As shown in the preceding section, there is a significant positive relationship between DN and GDP. With the help of DN, we can measure the actual state of IRD in China more objectively from another angle.
Existing methods for calculating IRD include the Theil index, the coefficient of variation (CV) and the Gini coefficient. In this paper, we measure IRD in China using the CV, which is the most widely used and most representative method in the existing literature. Akita and Miyata [12] proposed a bi-dimensional decomposition of IRD based on the population-weighted coefficient of variation (WCV), which unifies the spatial decomposition of IRD and the decomposition by income source within one systematic framework; we adopt this method to measure IRD, replacing GDP with DN.

Measurement of the Adjusted Coefficient of Variation
In this paper, we use an adjusted population-weighted coefficient of variation (ACV) to measure IRD. Suppose that there are $m$ regions in an economy and that region $i$ contains $h_i$ provinces, so that there are $\sum_{i=1}^{m} h_i$ provinces in the economy as a whole. Let $\bar{x}_{ij}$ be the per capita DN of province $j$ in region $i$, $N_{ij}$ the population of province $j$ in region $i$, $N_i$ the total population of region $i$, $N = \sum_{i=1}^{m} \sum_{j=1}^{h_i} N_{ij}$ the total population of all provinces, $X = \sum_{i=1}^{m} \sum_{j=1}^{h_i} N_{ij} \bar{x}_{ij}$ the total DN of all provinces, $\bar{X}_i = X_i / N_i$ the per capita DN of region $i$, and $\bar{X} = X / N$ the per capita DN of all provinces. Overall inequality in per capita DN among provinces can then be measured by the square of the adjusted population-weighted coefficient of variation (ACV, hereafter), as in Equation (3), where $X = (X_1, X_2, \cdots, X_m)$ and $X_i = (\bar{x}_{i1}, \bar{x}_{i2}, \cdots, \bar{x}_{ih_i})$ are collections of variables.
As is well known, the ACV belongs to the generalized entropy class of measures and can be additively decomposed into within-region and between-region inequality components, as in Equation (4). Decomposing Equation (4) further, the degree of IRD within region $i$ is given by Equation (5). Summing the result of Equation (5) over the $m$ regions gives the within-region component $CV_W$. Finally, the imbalance between the $m$ regions is given by the between-region component $CV_B$.
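The additive decomposition can be illustrated with the standard squared population-weighted coefficient of variation. The sketch below implements that standard measure, not the authors' adjusted ACV, whose exact adjustment cannot be recovered from this text. The region names follow the paper, but the function names and all figures are hypothetical.

```python
def weighted_mean(values, weights):
    """Population-weighted mean of per capita values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def squared_wcv(values, weights):
    """Squared population-weighted coefficient of variation."""
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return variance / mean ** 2


def decompose(regions):
    """Split the overall squared WCV into within- and between-region parts.

    `regions` maps a region name to a list of (per_capita_dn, population)
    pairs, one pair per province.
    """
    all_values = [x for provs in regions.values() for x, _ in provs]
    all_weights = [n for provs in regions.values() for _, n in provs]
    overall_mean = weighted_mean(all_values, all_weights)
    total_pop = sum(all_weights)

    within = 0.0
    between = 0.0
    for provs in regions.values():
        values = [x for x, _ in provs]
        weights = [n for _, n in provs]
        share = sum(weights) / total_pop
        region_mean = weighted_mean(values, weights)
        # Within part: region's own squared WCV, scaled by population share
        # and by the squared ratio of regional to overall mean.
        within += share * (region_mean / overall_mean) ** 2 * squared_wcv(values, weights)
        # Between part: squared WCV of the regional means.
        between += share * (region_mean - overall_mean) ** 2 / overall_mean ** 2
    return squared_wcv(all_values, all_weights), within, between


# Hypothetical per capita DN and populations, for illustration only.
regions = {
    "East": [(8.0, 50.0), (6.0, 80.0)],
    "Central": [(3.0, 60.0), (4.0, 40.0)],
    "West": [(2.0, 30.0), (1.5, 40.0)],
}
overall, within, between = decompose(regions)
print(overall, within, between)
```

The two parts sum to the overall measure, which is what makes a statement such as "the between-region imbalance is a multiple of the within-region imbalance" well defined.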

The Overall Trends of IRD in China
Over the sample period (2004-2013), the degree of IRD in China first declined with fluctuations, then rose sharply and fell sharply soon afterwards, and finally declined slowly and stabilized (Figure 2, left). There are two distinct turning points in this process, in 2009 and 2010. Before 2009, the degree of IRD moved up and down, with the ACV fluctuating between 0.75 and 1.25, so the change was relatively stable. From 2009 to 2010, the degree of IRD rose rapidly for a short time and the ACV reached 2.04; 2010 was also the year in which China's regional economic development policy shifted from developing central cities and special zones to developing large urbanization areas and "group-type urban agglomerations". From 2010 to 2011, there was a short, rapid decline to 1.02; after 2011, the ACV stabilized and fluctuated slightly around 1.01. Overall, the average ACV is 1.11, and China's imbalance of regional development is gradually decreasing and tending to stabilize.
The degree of China's IRD in 2004-2013 calculated from per capita GDP is shown in Figure 2 (right). After a brief rise in 2004, it declines continuously, which would indicate that the effect of China's new regional development strategy after 2004 has remained significant. This is inconsistent with the results calculated from the night-time lights data; considering the actual development of China's regions, we believe that the results from DN are more realistic. After 2010, the degree of IRD stabilizes and fluctuates slightly around 1.4, which is consistent with the conclusions from the lights data. In general, the degree of IRD is gradually decreasing and tends to be stable; however, the decline is smaller than the continuous steady reduction reported in the existing literature, and the fluctuation is reduced.

Spatial Decomposition of the Causes of IRD
The ACV is spatially decomposed using Equation (4) above. Specifically, the imbalance of China's regional development is divided into two parts, within and between four regions: the Eastern region, the Central region, the Western region and the Northeast region. The results are shown in Table 4.
Table 4 The results of spatial decomposition of the causes of IRD (columns: Time; Regional Imbalance; Within Regional Imbalance; Between Regional Imbalance)
It can be seen from Table 4 that the degree of IRD between the four regions is about 1.4 to 3.2 times the sum of the imbalances within the four regions, and that from 2004 to 2013 the between-region IRD closely tracks the overall regional imbalance in China (see Figure 3). After a sharp decline in 2010, IRD is relatively stable. The within-region IRD is uneven across regions, with certain differences, but the regions are broadly consistent, and the absolute degree of imbalance within regions remains smaller than the imbalance between regions. Compared with the imbalance between regions, the within-region effect is secondary. On this basis, we conclude that the imbalance between the four regions is the main reason for the imbalance of regional development in China.

Prediction of the Degree of IRD in China
Since the lights data are only available up to 2013, the measurement of IRD is limited to 2004-2013. Based on the time series characteristics of the national lights data for China, we forecast the lights data for 2014 to 2016, conduct the regional analysis in the same way using Equation (4) above, and then discuss the degree of IRD further. In the test of the time series auto-regressive model, the lag order is 1, and the resulting ACV for regional development imbalance in China is shown in Figure 4.
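A lag-1 autoregressive forecast of this kind can be sketched as an AR(1) model fitted to first differences by least squares and iterated three steps ahead. The series below is made up, and the choices to difference once, to omit an intercept, and to estimate by OLS are assumptions for illustration, not the authors' documented procedure.

```python
def fit_ar1_on_differences(series):
    """OLS slope of d_t on d_{t-1} (no intercept), where d_t = x_t - x_{t-1}."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den


def forecast(series, steps):
    """Iterate the fitted AR(1) on differences forward and rebuild the levels."""
    phi = fit_ar1_on_differences(series)
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff   # next predicted change
        level += diff       # accumulate the change back into the level
        out.append(level)
    return out


# Hypothetical national per capita DN series for 2004-2013 (made-up values).
dn = [1.00, 1.08, 1.15, 1.25, 1.31, 1.40, 1.52, 1.60, 1.66, 1.75]
phi = fit_ar1_on_differences(dn)
predicted = forecast(dn, steps=3)  # stands in for 2014, 2015 and 2016
print(phi, predicted)
```

With a coefficient between 0 and 1, the predicted yearly changes shrink geometrically, so the forecast levels flatten out rather than continuing the recent trend indefinitely.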

Conclusion
Satellite night-time lights data are a useful proxy for economic activity at temporal and geographic scales for which traditional data are of poor quality or unavailable. This paper discusses the nonlinear relationship between night-time-lights data (DN) and GDP and gives a specific quadratic substitution relationship. We then use DN instead of GDP to calculate the imbalance of regional development (IRD) in China, and we present a bi-dimensional decomposition method for IRD based on the adjusted population-weighted coefficient of variation. This enables us to analyze the contributions of DN components to within-region and between-region inequalities, and thus to overall regional inequality, in a coherent framework.
According to our analysis using provincial DN by spatial sector in China, the adjusted population-weighted coefficient of variation is quite useful for examining changes in the determinants of IRD associated with changes in industrial structure and the geographical distribution of economic activities. Having computed the ACV for China from 2004 to 2013, we find that the imbalance of regional development is gradually decreasing and stabilizing; although there was a sharp jump in 2010, the overall trend is declining. We also find that the imbalance between regions, rather than within regions, is the main reason for IRD across the whole country. In this paper, we offer a surrogate indicator, DN, to show the degree of IRD in China in recent years, which provides a new perspective on regional coordinated development and on the imbalance of regional development in China, and we plan to extend this work.
In Figure 4, the degree of IRD is forecast with an ARIMA(1,1,0) model. The ACV values obtained for the three forecast periods are 1.0223, 1.0152 and 1.017, with an average of about 1.018. This is in line with the earlier conclusion that IRD in China is stable after 2011 and fluctuates around 1.01; the ACV calculated from per capita GDP is 1.249, 1.239 and 1.439 for 2014 to 2016, and its trend is also stabilizing. We therefore believe that using DN to calculate IRD with the ACV method is suitable, and that the degree of IRD in China has reached a bottleneck period. Breaking through this stable state requires external forces (such as regional policies, economic changes, and so on) and a focus on macroeconomic policies between regions, supplemented by attention to unbalanced development within regions.