Analysis of modern approaches for the prediction of electric energy consumption

Abstract. This paper reviews modern methods of building mathematical models of power systems and of developing intelligent information systems for monitoring electricity consumption. The main advantages and disadvantages of existing modeling approaches, as well as their applicability to the energy systems of Ukraine and Kazakhstan, are identified. The main factors that affect the dynamics of energy consumption are determined, and a list of the key tasks that must be solved to develop algorithms for predicting electricity demand for various objects, industries and levels is formulated.


Introduction
The creation of innovative intelligent systems for managing energy consumption processes is a vital task for individual objects (institutions), for countries, and for the global economy as a whole. Solving such urgent problems as reducing energy consumption, ensuring energy independence, and reducing greenhouse gas emissions requires identifying adequate methods for analyzing, modeling and forecasting time series of consumption and production of various types of energy, and integrating them with existing information systems for making management decisions across individual enterprises, cities, industries and states. The underdevelopment of theoretical and methodological approaches and of the practical use of forecasting systems and of tools for evaluating the efficiency of electricity use in Kazakhstan and Ukraine underscores the need to create integrated automated energy management systems using modern machine learning methods.
The purpose of this work is to compare modern methods of analysis, modeling and forecasting the consumption of electric energy at the national, sectoral and individual (by facilities) levels, as well as to study the experience of their use in various countries and industries.
In this paper we have used classical statistical "ad hoc" models, advanced ensemble methods and neural networks to predict electric power demand, using the case of a wholesale energy transmission company.
The rest of the paper is organized as follows: Section 2 contains the literature review, Section 3 discusses the comparative analysis of the methods and models, Section 4 contains the application and results, and the conclusions of the study are provided in Section 5.

Literature review
The ubiquity of modern technological devices for measuring the amount of energy consumed has contributed to the development of engineering and statistical analysis methods, which make it possible to effectively plan, predict and monitor the growing load on the power grid.
Over the past decade, research has intensified in the area of forecasting electricity consumption for industrial, municipal, and energy distribution enterprises, housing complexes, business structures, and individual houses [1][2][3][4][5]. This is due to the need to ensure the energy efficiency of buildings, recognized by the International Energy Agency as one of the five conditions for reducing final energy consumption and the associated CO2 emissions [6]. Environmental prerequisites and economic feasibility contributed to the development of national energy-efficient design rules for various types of buildings, which gave impetus to the development of computer software for energy-efficient design of new homes, such as EnergyPlus, DOE-2, eQUEST, IES, ECOTECT, etc. [7].
Maintaining energy efficiency in buildings requires continuous monitoring of energy consumption indicators and identification of the factors that affect them in real time. Most researchers identify weather conditions as the main factors determining the dynamics of demand for electricity. These include: temperature indicators (air, outdoor, dry-bulb, dew-point, wet-bulb, and room temperature); indicators of humidity, pressure, wind speed and direction, cloudiness and solar brightness; and precipitation [8]. Among additional independent factors, the authors use in their models variables of electrical load, heat transfer, or thermal index; calendar variables; size indicators and operational characteristics of buildings; urban infrastructure development; and indicators of living standards and socioeconomic development [8].
For example, to predict the demand for electricity in the residential sector of Chile [4], the authors use average daily energy consumption in kW as the dependent variable, and average daily temperature in degrees Celsius and the daily value of the Chilean unit of account as explanatory variables. To capture calendar effects, the researchers include dummy variables, namely a variable for all Saturdays, a variable for all Sundays, and a variable for holidays in the study interval [4]. It should be noted that the frequency of the time series used in the models is determined by the source and availability of data.
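Calendar dummies of the kind used in [4] are straightforward to construct. A minimal Python sketch (the cited studies use country-specific holiday calendars; the holiday set below is an assumed placeholder for illustration only):

```python
from datetime import date, timedelta

# Assumed placeholder holiday set; a real study would use an official calendar.
HOLIDAYS = {date(2023, 1, 1), date(2023, 12, 25)}

def calendar_dummies(start, n_days):
    """Return rows of (date, is_saturday, is_sunday, is_holiday) for a daily series."""
    rows = []
    for i in range(n_days):
        d = start + timedelta(days=i)
        rows.append((d,
                     int(d.weekday() == 5),   # Saturday indicator
                     int(d.weekday() == 6),   # Sunday indicator
                     int(d in HOLIDAYS)))     # holiday indicator
    return rows

# Sat 23 Dec, Sun 24 Dec, Mon 25 Dec (holiday), Tue 26 Dec 2023
rows = calendar_dummies(date(2023, 12, 23), 4)
```

Such indicator columns are then appended to the regressor matrix alongside temperature and other exogenous variables.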
Thus, hourly series of electricity consumption are presented in [5], and half-hourly data over a one-year interval in [3]. Accordingly, forecasts obtained on such samples can only be short-term, for example for a week. To obtain medium-term and long-term forecasts, models estimated on lower-frequency data (for example, monthly [9]) over a longer time interval (several decades) are used. Real-time forecasting requires the acquisition of instrument data by the minute or by the second.
An analysis of open statistical information on electricity consumption in Ukraine and Kazakhstan [10,11] shows that statistics on gross electricity consumption by all sectors of the economy are available only by year. Indicators of final consumption, taking into account renewable energy sources, in the context of households, industrial sectors, transport, services, agriculture, forestry and fisheries, as well as non-energy consumption, have been available only since 2007. Monthly indicators of gross energy consumption in the country can be obtained from the reports of the relevant ministries [12], and only for the last decade.
A comparative analysis of methodological approaches to the calculation of the energy security indicator revealed a number of weaknesses in the national systems for assessing energy security as part of a country's national security. In particular, shortcomings of the approach to calculating the level of energy security of Ukraine [13] were identified. These include: the limited range of aspects of energy security covered by the assessment, the lack of a base for comparison and of long series of statistical data on energy security indicators, and the slow updating of the threshold values of indicators embedded in the normalization algorithm. In addition to the domestic approach, an analysis was conducted of the methods for assessing the energy security risk index developed by the United States Institute of Energy and the International Energy Agency [14], as well as a comparative analysis of these methods against the domestic approach. The analysis revealed differences in the normalization of individual indicators, in the qualitative characteristics of individual indicators, and in the method for determining the respective weights of each indicator. It was proposed to include indicators such as market volatility, energy intensity indicators, and the state of global and regional fuel stocks in the list of indicators of a country's energy security. To solve the problem of modeling real statistical data of different frequencies, it was proposed to use Mixed-Data Sampling Models (MIDAS) [15] to determine the relationship between possible energy security factors and the energy efficiency of the national economy.
One of the solutions to the problem of a small sample of data to obtain adequate statistically significant results and qualitative forecasts can be the use of panel models that evaluate similar indicators for a group of objects, for example, all educational institutions in the region, regions of the country or countries with similar development parameters.
Thus, article [16] used a panel sample of annual data on the consumption of electricity by residential buildings in the context of Chinese cities to identify the most significant factors of the construction of "green houses". The authors of [17] examine the demand for electricity in the industrial and service sectors of Taiwan, analyzing panel data for 23 industrial sectors and 9 service sectors for the period 1998-2015.
Article [18] assesses the efficiency of electricity consumption for an unbalanced panel of 27 transition countries and 6 European OECD member countries from 1994 to 2007. It can thus be concluded that for countries such as Kazakhstan and Ukraine, models based on panel data are the most suitable.
At the same time, the focus of scientific research in these countries should shift towards modeling electricity demand for individual objects that have the appropriate equipment to record electricity consumption at high frequency, followed by extrapolation of the results to higher levels (industry, regional).
The above approach is presented in detail in the work of Canadian scientists [16], who identified two methods for modeling the demand for electricity in the residential sector: "top down" and "bottom up".
The first approach focuses on identifying key factors and forecasting electricity consumption by housing objects of different levels depending on historical housing data and top-level variables, which include macroeconomic indicators (gross domestic product, unemployment rates and inflation), prices for various types of energy, climatic factors.
The second approach is based on the use of statistical and engineering methodologies for predicting electricity consumption at the regional and national levels by extrapolating the indicators of a representative set of individual houses [19].
It should be noted that engineering models, which describe final energy consumption as a physical phenomenon, are based on physical laws, and do not require historical data on energy consumption, are now rarely used. The rapid increase in data sources and volumes, in processing technologies and in system capacities has shifted scientific interest towards statistical methods.
The variety of statistical models is due both to differences in the data structure (linear and nonlinear; discrete and continuous models), as well as the development of machine learning methods and software tools that implement them. Parametric and non-parametric methods that can be classified into regression, autoregression methods, Fourier models, neural networks, models of fuzzy logic, Wavelet analysis, Bayesian methods are widely used.
The use of parametric methods presupposes information on the underlying data distribution; an incorrectly chosen model therefore risks biased parameter estimates and false conclusions. When the distribution of the data is unknown, non-parametric methods are preferred. A significant drawback and limitation of non-parametric models, which focus more on hypothesis testing than on parameter estimation, is their computational complexity and high demands on software and hardware [4].

The comparative analysis of the methods and models
Modern time series forecasting methods are based mainly on the principle of predicting the future from the past. The peculiarities of energy consumption indicators (multidirectional trends, seasonal and cyclical fluctuations, structural breaks) impose certain requirements on the selection of appropriate methods and models. This paper presents a comparative study of approaches that can be used to make reliable predictions of energy consumption at the macro, micro and sectoral levels, as well as to reveal significant predictors and causal relationships for policy conclusions. An additional point of interest is model selection for energy consumption time series of different data frequencies. The study focuses on classical time series techniques (autoregressions, exponential smoothing models, dynamic regressions), ensemble models and neural networks capable of handling the nonstationarity, heteroscedasticity and serial correlation of unstable short-term data.
Methods for extrapolating past information into the future are constantly being improved in terms of complexity, interpretability and forecast accuracy. In recent decades, scholars' attention has shifted from structural models, based on systems of equations and restrictions on parameters, to special "ad hoc" models that are not theoretically justified. Although statistical techniques based on ordinary least squares (OLS), non-linear least squares (NLS) and maximum likelihood estimation (MLE) are widely used, technological innovation has driven the active development of machine learning forecasting methods [20]. Still, numerous studies [1,4,7,20] report better model fitting but worse forecasting accuracy of these methods compared to statistical models. The researchers [20] state the need for improvement and further development of machine learning models in terms of better interpretability and specification of the uncertainty around point forecasts.

Autoregressive approach
One of the most widely used classical "ad hoc" time series techniques is the autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) model, which applies the Box-Jenkins methodology [21]. These models predict future values of a time series from a linear combination of its previous values and disturbances. The ARIMA model with parameters p (the autoregressive order, or lag of the model), d (the integration or differencing order) and q (the moving average order) fits the equation

∆^d y_t = c + φ_1 ∆^d y_{t−1} + ... + φ_p ∆^d y_{t−p} + θ_1 ε_{t−1} + ... + θ_q ε_{t−q} + ε_t. (1)

Here y_t represents the actual value of the series in period t; ∆^d y_t = (1 − L)^d y_t is the difference operator of order d, applied to remove a stochastic trend; φ_1,...,φ_p and θ_1,...,θ_q are the parameters of the model; ε_t is an error term assumed to be a stationary Gaussian white-noise process with mean zero and constant variance σ². Model (1) can be rewritten using backshift (lag) operator L notation as

φ(L)(1 − L)^d y_t = c + θ(L)ε_t. (2)

A special case of model (1) is the seasonal autoregressive integrated moving average model SARIMA(p, d, q)×(P, D, Q)_s [21]:

Φ(L^s)φ(L)(1 − L^s)^D (1 − L)^d y_t = Θ(L^s)θ(L)ε_t, (3)

where s is the seasonal length, i.e. the number of periods in a season (s = 12 for monthly series); L is the lag operator; (1 − L^s)^D = ∆_s^D is the seasonal difference operator of order D; and Φ, Θ are the seasonal autoregressive and moving average polynomials. An iterative modeling approach implies assessing stationarity and seasonality patterns; identifying the model parameters and estimating them by maximum likelihood or non-linear least squares; and checking the adequacy and prediction accuracy of the model [22].
A common technique to assess the stationarity of a series is the augmented Dickey-Fuller (ADF) test. It estimates model (4) to test the null hypothesis of a unit root against the alternative of stationarity [23]:

∆y_t = α + βt + ρy_{t−1} + δ_1 ∆y_{t−1} + ... + δ_p ∆y_{t−p} + ε_t, (4)

where α is a constant, β is the coefficient of a simple time trend, ρ is the parameter of interest, ∆ is the first-difference operator, δ_i are parameters, and p is the lag order of the autoregressive process. Specification of the ARMA/ARIMA/SARIMA models is commonly facilitated by graphical analysis of the correlograms (the autocorrelation function, ACF, and the partial autocorrelation function, PACF) of the original and differenced series [21]. Selection of the optimal model parameters (p, d, q), (P, D, Q) is justified by minimization of the information criteria (see Appendix 1). The Hyndman-Khandakar algorithm automates this procedure in the auto.arima function of the "forecast" R package [22].
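The order-selection logic that auto.arima automates can be illustrated on a simplified pure-autoregressive case: fit AR(p) by least squares for several candidate orders and keep the order with the lowest AIC. A hedged Python sketch (the paper's own models are estimated in R; NumPy is used here only for the regression algebra, and the simulated AR(2) coefficients are illustrative assumptions):

```python
import numpy as np

def fit_ar(y, p):
    """OLS fit of an AR(p) model; returns (coefficients, residual sum of squares)."""
    y = np.asarray(y, dtype=float)
    # Regressor matrix: intercept plus lags 1..p, aligned with targets y[p:].
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i - 1: len(y) - i - 1] for i in range(p)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ beta) ** 2))
    return beta, rss

def select_ar_order(y, max_p=5):
    """Pick the AR order minimizing AIC = n*log(RSS/n) + 2*(p+1)."""
    best_p, best_aic = 1, np.inf
    for p in range(1, max_p + 1):
        _, rss = fit_ar(y, p)
        n_eff = len(y) - p
        aic = n_eff * np.log(rss / n_eff) + 2 * (p + 1)
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p

# Simulate an AR(2) process with known coefficients 0.5 and -0.3.
rng = np.random.default_rng(0)
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.5 * y[-1] - 0.3 * y[-2] + rng.normal())
best_p = select_ar_order(y)
```

Full ARIMA selection additionally searches over d (via unit-root tests) and the MA order q, but the information-criterion comparison is the same.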
To mitigate the problem of unreliable MLE parameter estimation and to reveal the unobservable state of the series, the Kalman filter algorithm is frequently used for ARIMA state-space models [24].
In the presence of consistent change in the variance over time, the autoregressive conditional heteroscedasticity model (ARCH) [25] or the generalized autoregressive conditional heteroscedasticity model (GARCH) [26] is appropriate. These models predict the future conditional and unconditional variance, presuming stationarity of the series (no trend or seasonal component) [26]:

ε_t = σ_t z_t. (5)

Here the error term ε_t is composed of a stochastic white-noise process z_t and a time-dependent standard deviation σ_t. For ARCH(q), the conditional variance σ²_t is modeled on the lagged squared innovations as

σ²_t = α_0 + α_1 ε²_{t−1} + ... + α_q ε²_{t−q}, (6)

where α_0 > 0 and α_i ≥ 0, i > 0, for all t.
For GARCH(p, q), the series σ²_t is modeled as

σ²_t = α_0 + α_1 ε²_{t−1} + ... + α_q ε²_{t−q} + β_1 σ²_{t−1} + ... + β_p σ²_{t−p}. (7)

Here p and q are nonnegative integers representing the number of lagged conditional variances and lagged squared innovations, respectively.
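A minimal Python sketch of the GARCH(1,1) conditional-variance recursion σ²_t = α_0 + α_1 ε²_{t−1} + β_1 σ²_{t−1}, with assumed parameter values rather than values estimated from data:

```python
import numpy as np

def garch_variance(eps, a0=0.1, a1=0.2, b1=0.7):
    """Conditional variance path of a GARCH(1,1) for a given innovation series."""
    sigma2 = np.empty(len(eps))
    # Start the recursion at the unconditional variance a0 / (1 - a1 - b1).
    sigma2[0] = a0 / (1.0 - a1 - b1)
    for t in range(1, len(eps)):
        sigma2[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(1)
eps = rng.normal(size=200)        # illustrative innovations
sig2 = garch_variance(eps)
```

In practice the parameters (a0, a1, b1) are estimated by maximum likelihood; the recursion above is what any such estimator evaluates internally.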
GARCH models have numerous applications in financial time series analysis, while ARIMA/SARIMAX models fit energy consumption series better owing to their relatively stable dynamics and seasonal characteristics.
Despite the active development of machine learning models, autoregressive methods (ARMA/ARIMA/SARIMA, dynamic regression models, vector autoregressions, VAR, and cointegration models, VEC) are still widely used to predict the electric energy consumption.
The researchers emphasize the improved forecast accuracy of SARIMAX models [1,4,22], which assess not only historical energy consumption data but additional exogenous variables as well. Thus, accounting for holiday and weather effects, changes in legislation, the market situation and demographics may explain significant data variation, giving more reliable predictions. Dynamic regression models, which include external variables and allow the model errors to contain autocorrelation by describing them as an ARIMA process, have shown good results as well [22,27].
A common way to account for causality among nonstationary series and to make structural inferences and policy conclusions is to use vector autoregressive (VAR) and structural vector autoregressive (SVAR) models. They treat the variables of a simultaneous system symmetrically, regressing each endogenous variable on its own lags and the lags of all other variables in a finite-order system [28]. The basic p-lag VAR has the form

Y_t = Π_1 Y_{t−1} + Π_2 Y_{t−2} + ... + Π_p Y_{t−p} + ε_t, (8)

where Y_t = (y_{1t}, y_{2t}, ..., y_{nt})′ is an (n × 1) vector of time series variables; Π_i are (n × n) coefficient matrices; and ε_t is an (n × 1) unobservable zero-mean white-noise vector process with time-invariant covariance matrix Σ. The simplicity of estimation and interpretation of VAR/SVAR models, with impulse response functions and forecast error variance decompositions, made them a good alternative to structural models. The authors of [29] used the VAR approach to empirically prove the existence of bidirectional causality between electricity consumption and GDP in Russia.
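A VAR is estimated equation by equation with least squares, which a short Python sketch can demonstrate for the VAR(1) case (the coefficient matrix and noise scale below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def fit_var1(Y):
    """OLS estimate of Y_t = c + Pi @ Y_{t-1} + eps_t for a (T x n) data matrix."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])   # intercept + first lag
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)        # multi-target OLS
    c, Pi = B[0], B[1:].T                                # unpack intercept and Pi
    return c, Pi

# Simulate a stationary bivariate VAR(1) with a known coefficient matrix.
rng = np.random.default_rng(2)
Pi_true = np.array([[0.5, 0.1],
                    [0.2, 0.4]])
Y = np.zeros((400, 2))
for t in range(1, 400):
    Y[t] = Pi_true @ Y[t - 1] + rng.normal(size=2) * 0.1

c_hat, Pi_hat = fit_var1(Y)
```

With enough observations the OLS estimate Pi_hat recovers the true coefficient matrix closely; impulse responses and variance decompositions are then derived from Pi_hat.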
The drawback of the VAR approach in explaining the long-term dynamics of the series is successfully addressed by vector error correction models (VECM), used to describe cointegration relationships between the variables. The basic VECM has the form [30]

∆Y_t = ΠY_{t−1} + Γ_1 ∆Y_{t−1} + ... + Γ_{p−1} ∆Y_{t−p+1} + CD_t + ε_t, (9)

where ∆Y_t and its lags are differenced I(0) series; Γ_i are short-run coefficient matrices; D_t is a deterministic term with coefficient matrix C; and ΠY_{t−1} contains the cointegrating relations.
Authors [31] employed Johansen cointegration to determine the long run relationship between energy consumption and its determinants for different sectors and to forecast future energy demand using scenario analysis.
Taking into consideration their deep theoretical development, outstanding empirical results, and the simplicity and feasibility of their justification and deployment, autoregressive models are highly recommended for use in experimental studies. It is important to mention that vector autoregressive and cointegration models are suitable mostly for macroeconomic analysis of energy consumption by sectors, regions and sources.

Exponential smoothing approach
Exponential smoothing is a powerful time series forecasting method for univariate data, frequently used as an alternative to the autoregressive approach. This framework has multiple applications in different fields of study due to its flexibility, reliable forecasts and low cost. Proposed in the late 1950s [32], this approach has motivated some of the most successful forecasting methods.
The taxonomy of exponential smoothing models differs depending on the nature of the trend and seasonality. The simple exponential smoothing model, applicable to data with no clear trend or seasonality, produces forecasts as weighted averages of past observations, with weights decaying exponentially with the age of the observations [22]:

ŷ_{t+1|t} = αy_t + α(1 − α)y_{t−1} + α(1 − α)²y_{t−2} + ..., (10)

where 0 ≤ α ≤ 1 is the smoothing parameter.
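The weighted-average form of simple exponential smoothing is equivalent to the recursion l_t = αy_t + (1 − α)l_{t−1}, which a few lines of Python make explicit (α and the toy data are illustrative):

```python
def ses_forecast(y, alpha=0.3):
    """Simple exponential smoothing: update level = alpha*y_t + (1-alpha)*level.

    Returns the final level, which is the one-step-ahead forecast.
    """
    level = y[0]                      # initialize the level at the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

f = ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5)  # -> 12.0
```

Tracing the recursion by hand: level = 10, then 11, 11, and finally 0.5*13 + 0.5*11 = 12, so the one-step-ahead forecast is 12.0.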
The Holt-Winters additive and multiplicative models improve model (10) to account for trend and seasonal patterns [22]. The more advanced state space exponential smoothing models with additive or multiplicative errors contain a measurement equation that describes the observed data and state equations that describe how the unobserved components or states (level, trend, seasonal) change over time [22]. One of the most successful recent advancements in exponential smoothing state space models is the TBATS model, with a Box-Cox transformation, ARMA errors, a trend component, and representation of the seasonal components by Fourier series [33]. This approach produces highly accurate forecasts and handles multiple nested and non-nested seasonalities, although it requires extra computation time, especially for big time series data.
The general representation of the TBATS model includes the measurement (11), level (12), trend (13), seasonal (14) and ARMA error term (15) equations:

y_t^(ω) = l_{t−1} + φb_{t−1} + Σ_{i=1}^{T} s_{t−1}^(i) + d_t, (11)

l_t = l_{t−1} + φb_{t−1} + αd_t, (12)

b_t = (1 − φ)b + φb_{t−1} + βd_t, (13)

s_t^(i) = Σ_{j=1}^{k_i} s_{j,t}^(i), with s_{j,t}^(i) = s_{j,t−1}^(i) cos λ_j^(i) + s*_{j,t−1}^(i) sin λ_j^(i) + γ_1^(i) d_t and s*_{j,t}^(i) = −s_{j,t−1}^(i) sin λ_j^(i) + s*_{j,t−1}^(i) cos λ_j^(i) + γ_2^(i) d_t, (14)

d_t = Σ_{i=1}^{p} φ_i d_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t. (15)

Here y_t^(ω) is the Box-Cox transformed observation at time t with parameter ω; l_t is the local level at time t; b is the long-run trend; b_t is the short-term trend at time t; T is the number of seasonal patterns; s_t^(i) is the i-th seasonal component of the series at time t; d_t is an ARMA(p, q) error process; ε_t is a Gaussian white-noise process with zero mean and constant variance; α, β, γ_1^(i), γ_2^(i) are smoothing parameters; φ is the damping parameter; s_{j,t}^(i) and s*_{j,t}^(i) are the stochastic level of the i-th seasonal component and its growth; k_i is the number of harmonics for the i-th seasonal component; λ_j^(i) = 2πj/m_i, where m_i is the period of the i-th seasonal cycle [33].
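The Fourier representation of seasonality that TBATS relies on can be illustrated in isolation: regressing a seasonal series on the first k harmonic pairs recovers the seasonal pattern. A hedged Python sketch (a static regression on sine/cosine terms, not the full dynamic TBATS state space; the series is synthetic):

```python
import numpy as np

def fourier_features(t, period, k):
    """Design matrix with intercept and the first k Fourier harmonic pairs."""
    cols = [np.ones_like(t, dtype=float)]
    for j in range(1, k + 1):
        lam = 2 * np.pi * j / period
        cols.append(np.cos(lam * t))
        cols.append(np.sin(lam * t))
    return np.column_stack(cols)

t = np.arange(240, dtype=float)
y = 5 + 2 * np.sin(2 * np.pi * t / 12)   # synthetic series with period-12 seasonality
X = fourier_features(t, period=12, k=3)  # 1 + 2*3 = 7 columns
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
```

Because the seasonal signal lies exactly in the span of the first harmonic, the least-squares fit reproduces it; TBATS makes the harmonic coefficients evolve over time instead of keeping them fixed.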
Papers [7,34,35] verified the excellent forecast accuracy and long-term forecasting capability of TBATS and of hybrid models based on it for electric energy demand.
Other useful models based on time series smoothing and decomposition are seasonal and trend decomposition using Loess (STL) and multiple seasonal decomposition (MSTL). They use the local regression nonlinear smoothing algorithm (Loess) for parameter estimation [22].

Machine learning methods
Artificial intelligence methods are becoming increasingly popular in the scientific and business environment [20]. There are numerous applications of machine learning methods in forecasting energy consumption and demand [5,7,8,17,19,35].
Deep learning with artificial neural networks (ANN) is widely used and discussed nowadays. A significant advantage of ANN models is their ability to model non-linear relationships without imposing stationarity restrictions on the parameters. Their shortcomings are the requirement of a large data sample for training and the difficulty of interpreting the "black box" output.
A neural network is organized in layers: a bottom layer of predictors (inputs), a top layer of forecasts (outputs), and intermediate layers containing "hidden neurons" [22]. The frequently used nonlinear autoregressive neural network model NNAR(p, P, k)_m [36] can be described by the equation

y_t = f(y_{t−1}, ..., y_{t−p}, y_{t−m}, ..., y_{t−Pm}) + ε_t, (16)

where p and P represent the lagged autoregressive and seasonal inputs respectively, k is the number of nodes in the hidden layer of the network f, and m is the number of periods in a season. Another ANN model that has shown outstanding forecasting ability is the multi-layer perceptron (MLP), in which each layer of nodes receives inputs from the previous layers. The matrix notation of the MLP model with one hidden layer is

h(x) = G(b^(1) + W^(1)x), o(x) = s(b^(2) + W^(2)h(x)). (17)

Here b^(1), b^(2) are the bias vectors; W^(1), W^(2) are the weight matrices connecting the input vector to the hidden layer and the hidden layer to the output; G and s are activation functions; h(x) forms the hidden layer; o(x) is the output vector.
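The MLP mapping is just two affine transformations with nonlinearities in between. A minimal Python forward-pass sketch (tanh and softmax are assumed choices for the activations G and s; the weights are random placeholders, not trained values):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: h = G(b1 + W1 x), o = s(b2 + W2 h)."""
    h = np.tanh(b1 + W1 @ x)             # G: tanh activation on the hidden layer
    z = b2 + W2 @ h
    return np.exp(z) / np.exp(z).sum()   # s: softmax output (e.g. for classification)

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 outputs
out = mlp_forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2)
```

Training consists of adjusting W1, b1, W2, b2 (typically by backpropagation) so that the outputs match the targets; for a regression output, s would simply be the identity.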
The proposed MLP approach [37] was used to classify residential buildings according to their energy consumption and make corresponding hourly predictions for high and low power consumption buildings.
To sum up, neural network models often provide an ideal approximation of actual data within the training sample, but in the case of insufficient training data they give large forecast errors. A variety of methods are used to improve the predictive qualities of ANNs, including cross-validation, noise reduction, error regularization, error backpropagation, optimized approximation, and SVM algorithms [22].
Currently, scientists offer a range of hybrid models based on two or more traditional machine learning techniques or artificial intelligence methods [7,19,35]. Traditional methods for predicting time series, such as ANN and ARIMA, are complemented by optimization methods: the particle swarm optimization algorithm (PSO), genetic algorithms, the ant colony algorithm, etc. For instance, in paper [8] the authors introduced a hybrid model that combines the ARIMA model, to identify periodicity, seasonality and linearity, with an evolutionary algorithm (EA) for efficiently determining and optimizing the residuals. Researchers [35] developed a hybrid model based on the TBATS and neural network algorithms to forecast electricity load demand.
Recent advances in machine learning refer to ensemble methods, in which several low-accuracy base models ("weak learners") are combined to create a higher-quality predictive model ("strong learner"). The most popular ensemble learning algorithms are bootstrap aggregation (bagging), random forests, extremely randomized trees (extra trees), and boosting [38]. The first three methods are based on simple averaging of the base models, while boosting methods apply iterative optimization algorithms based on decision trees and minimization of a loss function [39]. Boosting algorithms such as gradient boosting [39], XGBoost, AdaBoost and GentleBoost frequently demonstrate state-of-the-art results at Kaggle and other machine learning competitions [40]. The improvement in prediction accuracy of the gradient boosting machine compared to piecewise linear regression and to a random forest algorithm was demonstrated in [38] on the example of energy consumption of commercial buildings.
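The boosting idea, iteratively fitting weak learners to the current residuals, can be shown with regression stumps under squared loss. A simplified Python sketch (a toy stand-in for XGBoost-style gradient boosting, not the library itself; the step-function target is illustrative):

```python
import numpy as np

def best_stump(x, r):
    """Find the single-split threshold minimizing squared error for residuals r."""
    best = (np.inf, None, 0.0, 0.0)
    for thr in np.unique(x)[:-1]:
        left, right = r[x <= thr], r[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]                      # (threshold, left value, right value)

def boost(x, y, n_rounds=200, lr=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the residuals."""
    pred = np.full_like(y, y.mean())     # start from the constant mean prediction
    stumps = []
    for _ in range(n_rounds):
        thr, lval, rval = best_stump(x, y - pred)
        pred = pred + lr * np.where(x <= thr, lval, rval)  # shrunken update
        stumps.append((thr, lval, rval))
    return pred, stumps

x = np.linspace(0, 1, 60)
y = np.where(x > 0.5, 2.0, 0.0)          # a step function the ensemble must learn
pred, _ = boost(x, y)
```

The learning rate shrinks each stump's contribution, so many small corrections accumulate into an accurate fit; real libraries add regularization, subsampling and deeper trees on top of this scheme.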

Application and Results
In this section we use the hourly energy consumption data (2012-2017) of a US wholesale transmission organization [41] to test the prediction accuracy and deployment features of the statistical and machine learning methods described in Section 3. Visual inspection of the electricity consumption time series points to possible sources of data variation, including weather, holidays, and daily, weekly and monthly periodicity (Figure 1). Electricity consumption is expected to be lower during weekends and nights, and higher on holidays and in the summer and winter months. To account for possible multiple seasonality, different exogenous variables are considered: outside air temperature, time of the week, and hour of the day. The models are implemented in the R programming language using the 'forecast' [42], 'segmented' [43], 'xgboost' [44] and 'rnn' [45] packages. For multivariate time series analysis we estimate ARIMA and piecewise linear regression models that include additional input variables. To improve the prediction accuracy of the energy consumption modeling we used the gradient boosting [44] and neural network [45] machine learning algorithms.
Tables 1-2 demonstrate the model and forecast accuracy for training and test sets.
The energy consumption models are ranked by forecast accuracy. In model (20) the exogenous variable day_of_week turns out to be insignificant. According to the Hyndman-Khandakar algorithm [22], the optimal ARIMA model does not include any seasonal parameters. At the same time, the variables Temperature and hour_of_day significantly influence hourly energy consumption.
The dependence of energy consumption on the season of the year (rising in the summer and winter months and slowing in the rest of the year) explains the choice of piecewise linear regression. The effect of temperature on electricity consumption is presented in Figure 2. The gradient boosting [44] and neural network machine learning algorithms used in the study are based on the extraction of influential data features for model training. The main features of the energy consumption time series are hour, day of week, month, quarter, year, day of month, and week of year. Feature importance according to the gradient boosting tree algorithm is presented in Figure 3.
Detailed analysis of the errors of the gradient boosting model revealed the worst prediction accuracy on holidays. Inclusion of a holiday dummy variable (equal to 1 if the day is a holiday and 0 otherwise) improved the forecast accuracy to a MAPE of 5.45%. The estimated neural network model showed the best forecast accuracy using two layers and four input variables that form the most important features of the energy consumption series. At the same time, the empirical analysis revealed that the forecast accuracy of the estimated machine learning methods deteriorated relative to the TBATS and ARIMA models with exogenous variables.
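The MAPE metric reported above is simply the mean absolute percentage deviation of forecasts from actuals; a small Python helper (toy numbers for illustration only):

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

# Errors of 5% on each point give a MAPE of 5%.
err = mape([100.0, 200.0], [95.0, 210.0])  # -> 5.0
```

Being scale-free, MAPE allows models estimated on different subsamples (e.g. with and without the holiday dummy) to be compared directly, though it is undefined when the actual value is zero.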

Conclusions
The paper contains an analytical review of theoretical and practical issues of effective energy management systems based on the analysis of internal (technical, economic, structural, regime) and external (meteorological, environmental, energy, macroeconomic) factors. A comparative assessment of the modeling techniques used to forecast electricity demand is given. Two areas of research have been identified: forecasting electricity consumption based on panel data (by countries, regions, sectors, industries) and by individual objects that have the appropriate equipment to measure consumption at high frequency.
The findings point to an evolving shift from classical regression models to machine learning algorithms. Classical statistical techniques are still used, but mostly within hybrid models designed to reduce the model error or relax the assumptions required for parameter estimation. In this respect, the exponential smoothing model TBATS, the seasonal-trend decomposition model STL and the seasonal autoregressive model SARIMAX top the list of statistical techniques according to the publication review and the empirical assessments.
The empirical analysis proves the extreme importance of long, clean, high-frequency statistical series for highly accurate forecasts of energy consumption. Verification of significant independent variables that explain the variation of energy consumption is another factor that improves the quality of predictions, especially for short data samples.
The increasing popularity of machine learning methods, and of gradient boosting and neural networks in particular, stems from their ability to extract features from the series and include them in the models without specifying parameters, as is required by standard statistical algorithms. The empirical study proved their superiority in terms of forecast accuracy, especially for long samples. Besides, these models are less prone to overfitting and let the user include non-significant variables and parameters without loss in the predictive power of the model [38]. The empirical model evaluation in the RStudio integrated development environment revealed problems associated with the large computation time required by the neural network model. The XGBoost gradient boosting algorithm implemented in [44] substantially decreases this time by applying parallelization. Still, much effort is needed to help the end user interpret these models not only through accuracy metrics, but also by investigating the "black box". Real-time analytical solutions enabling timely detection of energy demand and of its peaks and troughs require further research.