Manuel Baumann, Sara Grundel, Philipp Sauerteig and Karl Worthmann

## Surrogate models in bidirectional optimization of coupled microgrids

De Gruyter Oldenbourg | 2019

# Abstract

The energy transition entails a rapid uptake of renewable energy sources. Besides physical changes within the grid infrastructure, energy storage devices and their smart operation are key measures to master the resulting challenges like, e. g., a highly fluctuating power generation. For the latter, optimization based control has demonstrated its potential on a microgrid level. However, if a network of coupled microgrids is considered, iterative optimization schemes including several communication rounds are typically used. Here, we propose to replace the optimization on the microgrid level by using surrogate models either derived from radial basis functions or neural networks to avoid this iterative procedure. We prove well-posedness of our approach and demonstrate its efficiency by numerical simulations based on real data provided by an Australian grid operator.

## 1 Introduction

The share of renewable energy sources is rapidly increasing, partly due to the growing number of installed devices such as solar panels at household level. Hence, households become prosumers, i. e., power is not only consumed but also produced. Therefore, energy generation and distribution take place in a distributed way. In particular, energy can be transmitted bidirectionally between the grid and the prosumers, which results in a paradigm shift in the grid organization. In addition, prosumers may also possess some kind of energy storage device in order to shape their power demand profiles by either charging or discharging. From the grid operator’s perspective it might be beneficial that charging decisions are not made based on local information only. Instead, taking into account information on the entire grid may improve the system-wide operation, e. g., flattening the overall power demand within the grid in order to facilitate the power supply [1]. In order to achieve this goal, communication is needed. In the future, each household shall be equipped with a smart meter, which yields so-called smart homes. Smart meters collect data and communicate with the grid operator automatically.

A straightforward way to optimally operate the overall system is to formulate one large-scale optimization problem and to solve it in a centralized way, see, e. g., [2]. This approach, however, is hard to realize in practice. One of the disadvantages is that some central node needs the complete information about the grid, which is, e. g., due to data privacy, not desirable. Alternatives are decentralized or distributed optimization algorithms. In [3] the authors propose a decentralized approach to steer energy storage systems in order to avoid over-capacity of pole transformers while maintaining a high charging amount of energy storage systems in low-voltage distribution systems. The other option mentioned above is the use of distributed optimization methods such as distributed dual ascent [4], the Alternating Direction Method of Multipliers (ADMM) [5], or the Augmented Lagrangian based Alternating Direction Inexact Newton method (ALADIN) [6]. These algorithms use a star-shaped communication topology, i. e., each smart home communicates only with the grid operator and does not share any information with its neighbours. Nevertheless, in every iteration each household has to transmit specific (personal) data to the grid operator, see also [7] and [8] for applications of ADMM and ALADIN to electrical networked systems, respectively. In order to exploit the potential of these algorithms they are typically embedded within a Model Predictive Control (MPC) framework. MPC is a state-of-the-art technique to tackle optimal control problems by solving finite-dimensional optimization problems successively, see e. g., [9] for an introduction to MPC and [10], [11] for MPC approaches in electrical networks.

An alternative option to steer the power demand of local agents besides battery control is to schedule so-called controllable loads. These loads can be shifted in time to avoid bottlenecks in the energy supply, see e. g., [12], [13]. There is also large potential in the context of stochastic optimization of smart grids. For weather forecasting methods we refer to [14]. How to integrate electric vehicles into the electricity network under uncertainties is described in [15].

Considering the power networks described so far, it is assumed that exchange of energy within the grid is possible at any time and does not cause any losses or additional costs, which might (approximately) hold for domestic nets, e. g., a town. In this paper, we refer to these grids as microgrids (MGs). In [16], [17], the concept of coupled MGs is used to tackle large-scale problems incorporating several MGs. In the latter, the authors show that even if each single MG is optimally operated, there is still room for improvement if energy can be exchanged among MGs. Therefore, a second optimization problem is solved on a higher grid level in order to optimally exchange energy resulting in a bilevel optimization problem [18].

In [19], the authors propose to replace the distributed optimization routine on the lower grid level by a surrogate model in order to speed-up the calculation and further reduce communication effort. Here, Radial Basis Functions (RBFs) [20] are used to approximate the input-output behaviour of ADMM within the framework of coupled MGs established in [17]. Besides RBFs there are various methods to learn the behaviour of a complex function. Artificial Neural Networks (NNs) are one of the most popular representatives of modern artificial intelligence techniques and are often used in practice due to their success in various application fields, see e. g., the survey article [21]. In [22] the authors forecast loads in a power grid using NNs, whereas in [23] NNs are used in an optimal power flow framework. The main advantage of using surrogates is that communication effort can be reduced.

In this paper, we extend the idea of coupled microgrids established in [17] by proposing an iterative bidirectional optimization routine in order to improve the overall performance. Due to its iterative structure, however, our method comes with a considerable need for communication between smart homes and the grid operator. As a remedy we present two approaches to reduce the communication effort by substituting the optimization on the microgrid level via surrogate models. A main difference compared to [19] lies in the different input-output map that is replaced by the surrogate models, for which we can show that each input uniquely determines an (optimal) output. Furthermore, we also take NNs into account as potential surrogate models and study the performance of the resulting approximations numerically in an MPC framework. Our simulations show that the proposed method approximately recovers the performance based on using ADMM but significantly reduces the communication burden. The effect of applying surrogate models within MPC extends our previous work [19], where a surrogate model based on RBFs was only applied in a static optimization problem.

The paper is structured as follows: In Section 2 we formulate a mathematical model for coupled microgrids that consists of two hierarchy levels, and introduce optimization problems corresponding to each of them. In the subsequent section, we propose an iterative scheme that requires the solution of a distributed optimization problem on the lower level, which is solved using ADMM. In Section 4, we investigate the impact of disturbances w. r. t. the lower-level solution on the performance measured in terms of the upper-level objective function. Based on the results, we propose to replace ADMM by surrogates in order to reduce communication effort and computation time. The performance of the optimization scheme incorporating surrogates is analysed in an MPC framework in Section 5.

## 2 A model for coupled microgrids

We consider a system of coupled microgrids (MGs) and call it a smart grid. Each MG consists of several residential energy systems (agents) coupled through the grid operator, which can be seen as Central Entity (CE). The coupling of the microgrids is done through a network, where some MGs are connected by a transmission line and others are not connected, cf. Figure 1.

### 2.1 Upper-level model: Energy exchange

We assume that we have $\Xi \in \mathbb{N}$ MGs which are partially coupled via transmission lines and can be interpreted as nodes of a non-complete graph, see Figure 1 where $\Xi = 4$. Each MG $\kappa$, for $\kappa \in [1:\Xi]$, consists of $I_\kappa \in \mathbb{N}$ agents modeled in detail in Subsection 2.2. We assume that each MG $\kappa$ has an average power demand $\bar{z}_\kappa(n)$ at time $n$. Given this, we can compute the total power demand $I_\kappa \bar{z}_\kappa$ of an MG. The control goal is to exchange power among the MGs in such a way that a desired quantity $\bar{\zeta}(n)$ is tracked by controlling the residential storage units. We specify $\bar{\zeta}(n)$ in Subsection 2.2, and assume for the moment that this quantity is known to the grid and advantageous for the grid operation. We assume that this desired quantity is independent of $\kappa$, but this is not necessary for the rest of the discussion.

Let $\delta_{\nu\kappa}$ describe the percentage of power $I_\nu \bar{z}_\nu(n)$ that is transferred from MG $\nu$ to MG $\kappa$. We enforce $\delta_{\nu\kappa} = 0$ if there is no transmission line between the two MGs. Otherwise, the power demand of an MG $\kappa$ is given by its own retained total power demand $\delta_{\kappa\kappa} I_\kappa \bar{z}_\kappa$, where $\delta_{\kappa\kappa}$ is the share that remains at the MG, plus the power received from connected MGs, $\sum_{\nu \neq \kappa} \delta_{\nu\kappa} I_\nu \bar{z}_\nu$. For each time step within our prediction horizon of length $N \in \mathbb{N}_{\geq 2}$, we want to match this to the desired power demand of each MG, starting at time $k$ for $N$ time steps, in a least-squares sense. The objective function is thus given by $J : \mathbb{R}^{\Xi N} \times \mathbb{R}^{\Xi \times \Xi \times N} \to \mathbb{R}$,

(1) $(\bar{z}, \delta) \mapsto \sum_{n=k}^{k+N-1} \sum_{\kappa=1}^{\Xi} \Big( \bar{\zeta}(n)\, I_\kappa - \sum_{\nu=1}^{\Xi} \delta_{\nu\kappa}(n)\, \eta_{\nu\kappa} I_\nu \bar{z}_\nu(n) \Big)^2 .$
Here, the vector $\bar{z} = (\bar{z}(k), \ldots, \bar{z}(k+N-1))$ with $\bar{z}(\cdot) \in \mathbb{R}^{\Xi}$ stacks the average power demand per MG and time step, while the matrix $\eta = (\eta_{\nu\kappa})_{\kappa,\nu=1}^{\Xi} \in [0,1]^{\Xi \times \Xi}$ incorporates efficiencies along the transmission lines.

We are interested in minimizing (1) under the following constraints: all exchange rates $\delta_{\nu\kappa}$ are within the interval $[0,1]$, sum up to 1, and power is only transferred in one direction, meaning that either $\delta_{\nu\kappa}$ or $\delta_{\kappa\nu}$ is zero. Moreover, note that at this grid level, the average power demands per MG are known. Following [17], the optimization problem of the upper level is, thus, formulated over the exchange rates $\delta$,

(2a) $\min_{\delta \in [0,1]^{\Xi \times \Xi \times N}} \ J(\bar{z}, \delta)$
(2b) $\text{s.t.} \quad \sum_{\kappa=1}^{\Xi} \delta_{\nu\kappa}(n) = 1 ,$
(2c) $\qquad\ \ \delta_{\nu\kappa}(n) \cdot \delta_{\kappa\nu}(n) = 0 , \ \kappa \neq \nu , \qquad \forall\, \nu, \kappa \in [1:\Xi] , \ n \in [k:k+N-1] ,$
where $\delta_{\nu\kappa}(n)$ denotes the power exchange rate from MG $\nu$ to MG $\kappa$ at time instance $n$. Constraints (2b) and (2c) ensure that the whole energy of each MG is scheduled and that exchanges via transmission lines can only occur in one direction during one time step. We denote the feasible set of (2) by
$\mathbb{D}_\delta = \big\{ \delta \in [0,1]^{\Xi \times \Xi \times N} \,\big|\, \text{(2b) and (2c) hold} \big\} .$
The efficiency of a transmission line does not depend on the direction of the transfer, i. e., the matrix $\eta$ is symmetric. Furthermore, we assume no loss without transport, i. e., $\eta_{\kappa\kappa} = 1$ for all $\kappa \in [1:\Xi]$, in the rest of the paper.
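To make the structure of the upper-level objective (1) concrete, the following sketch evaluates it for given exchange rates. The function name and array layout are our own illustrative choices, not from the paper:

```python
import numpy as np

def upper_level_cost(zbar, delta, eta, I, zeta):
    """Evaluate the upper-level objective (1).

    zbar  : (Xi, N) average power demand per MG and time step
    delta : (Xi, Xi, N) exchange rates, delta[nu, kappa, n]
    eta   : (Xi, Xi) symmetric transmission efficiencies, diagonal equal to 1
    I     : (Xi,) number of agents per MG
    zeta  : (N,) desired average power demand
    """
    Xi, N = zbar.shape
    J = 0.0
    for n in range(N):
        for kappa in range(Xi):
            # power arriving at MG kappa after the exchange at time n
            received = sum(delta[nu, kappa, n] * eta[nu, kappa] * I[nu] * zbar[nu, n]
                           for nu in range(Xi))
            J += (zeta[n] * I[kappa] - received) ** 2
    return J
```

For $\delta$ equal to the identity (no exchange), the cost reduces to the decoupled expression derived in Subsection 2.2.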

### 2.2 Lower-level model: Single microgrid

As we have seen in the previous section, we consider an average power demand at each MG as well as some desired quantity $\bar{\zeta}(n)$. In order to understand these quantities better, we explain the modeling of the MG in detail. The basis of our considerations is the model presented in [24] and its extension [7]. Let each subsystem be equipped with an energy generation device (e. g., roof-top photovoltaic panels) and some storage device (e. g., a battery). Then, the $i$-th system, $i \in [1:I_\kappa]$, in MG $\kappa$ can be described by the discrete-time system dynamics,

(3a) $x_{\kappa i}(n+1) = \alpha_{\kappa i}\, x_{\kappa i}(n) + T \big( \beta_{\kappa i}\, u_{\kappa i}^+(n) + u_{\kappa i}^-(n) \big)$
(3b) $z_{\kappa i}(n) = w_{\kappa i}(n) + u_{\kappa i}^+(n) + \gamma_{\kappa i}\, u_{\kappa i}^-(n) ,$
where $x_{\kappa i}(n)$ and $z_{\kappa i}(n)$ denote the State of Charge (SoC) of the battery and the power demand at time instance $n \in \mathbb{N}_0$, respectively. The latter incorporates the net consumption, $w_{\kappa i}(n) = \ell_{\kappa i}(n) - g_{\kappa i}(n)$, as the difference of load and power generation, cf. Figure 2.

The system can be controlled by charging $u_{\kappa i}^+$ and discharging $u_{\kappa i}^-$ the battery at each time step. The length of a time step in hours is denoted by $T > 0$, e. g., $T = 0.5$ corresponds to a half-hour time window. The constants $\alpha_{\kappa i}, \beta_{\kappa i}, \gamma_{\kappa i} \in (0,1]$ represent efficiencies w. r. t. self discharge and energy conversion. Furthermore, the initial SoC $x_{\kappa i}(k) = \hat{x}_{\kappa i}$, where $k \in \mathbb{N}_0$ denotes the current time instance, is assumed to be known. State and input are subject to the inequality constraints,

(4a) $0 \leq x_{\kappa i}(n) \leq C_{\kappa i}$
(4b) $\underline{u}_{\kappa i} \leq u_{\kappa i}^-(n) \leq 0$
(4c) $0 \leq u_{\kappa i}^+(n) \leq \bar{u}_{\kappa i}$
(4d) $0 \leq \dfrac{u_{\kappa i}^-(n)}{\underline{u}_{\kappa i}} + \dfrac{u_{\kappa i}^+(n)}{\bar{u}_{\kappa i}} \leq 1 .$
Here, $C_{\kappa i} \geq 0$ denotes the battery capacity. The last constraint ensures that the bounds on discharging (4b) and charging (4c) hold even if the battery is both discharged and charged during one time step. Note that the case $C_{\kappa i} = 0$ covers systems that do not have a storage device. Since the future net consumption is not known in advance, it is assumed to be reliably predictable on a short time horizon of $N$, $N \in \mathbb{N}_{\geq 2}$, time steps.
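As an illustration of the dynamics (3), a minimal single-agent roll-out can be sketched as follows. The parameter values are placeholders, not the ones of the case study, and feasibility of the inputs, i. e., constraints (4a)–(4d), is left to the optimizer rather than checked here:

```python
import numpy as np

def simulate_battery(x0, u_plus, u_minus, w, T=0.5, alpha=1.0, beta=0.95, gamma=0.95):
    """Roll out the battery dynamics (3) for one agent over N steps.

    x0      : initial state of charge x(k)
    u_plus  : charging rates u^+(n) >= 0
    u_minus : discharging rates u^-(n) <= 0
    w       : net consumption w(n) = load - generation
    Returns the SoC trajectory x and the power demand z.
    """
    N = len(w)
    x = np.empty(N + 1)
    z = np.empty(N)
    x[0] = x0
    for n in range(N):
        x[n + 1] = alpha * x[n] + T * (beta * u_plus[n] + u_minus[n])  # (3a)
        z[n] = w[n] + u_plus[n] + gamma * u_minus[n]                   # (3b)
    return x, z
```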

For a concise notation we introduce the set $\mathbb{X}_{\kappa i} = [0, C_{\kappa i}]$ of feasible states, the set $\mathbb{U}_{\kappa i} = \{ (u_{\kappa i}^+, u_{\kappa i}^-) \in \mathbb{R}^2 \mid \text{(4b)--(4d) hold} \}$ of feasible control pairs, and the set

$\mathbb{D}_{\kappa i} = \big\{ z_{\kappa i} \in \mathbb{R}^N \,\big|\, z_{\kappa i} = (z_{\kappa i}(k), \ldots, z_{\kappa i}(k+N-1)) , \ \exists\, u_{\kappa i} \in \mathbb{U}_{\kappa i}^N \text{ such that } x_{\kappa i}(k) = \hat{x}_{\kappa i} \text{ and (3) and (4) hold} \big\}$

of feasible outputs over the next $N$ time steps, $i \in [1:I_\kappa]$, $\kappa \in [1:\Xi]$. Referring to the feasible sets of an MG $\kappa$ we use the Cartesian product, e. g., $\mathbb{D}^{(\kappa)} = \mathbb{D}_{\kappa 1} \times \cdots \times \mathbb{D}_{\kappa I_\kappa}$ and $z^{(\kappa)} \in \mathbb{D}^{(\kappa)}$. Note that the sets $\mathbb{D}_{\kappa i}$, $i \in [1:I_\kappa]$, and hence $\mathbb{D}^{(\kappa)}$ are non-empty, compact, and convex.

The output quantity in (3b) is the power demand $z_{\kappa i}$ of an individual agent in MG $\kappa$. The average power demand $\bar{z}_\kappa = \frac{1}{I_\kappa} \sum_{i=1}^{I_\kappa} z_{\kappa i}$ in each MG can then be computed from the individual power demands, and is used as an input to (1). We define $\bar{\zeta}$ in (1) as a stable reference trajectory by averaging over a past time horizon and over all individual residential units of all microgrids, the so-called overall average net consumption as proposed in [7],

(5) $\bar{\zeta}(n) = \dfrac{1}{I \cdot \min\{N, n+1\}} \displaystyle\sum_{j = n - \min\{n, N-1\}}^{n} \sum_{i=1}^{I} w_i(j) ,$

where $I = \sum_{\kappa=1}^{\Xi} I_\kappa$ denotes the total number of agents within the entire smart grid. Due to the averaging, the trajectory $\bar{\zeta}$ exhibits little fluctuation, which is advantageous for the grid operation.
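Formula (5) is a trailing moving average over the joint net consumption; a small sketch with our own indexing, assuming the past net consumption of all agents is available:

```python
import numpy as np

def reference(w_hist, n, N):
    """Overall average net consumption (5) at time n.

    w_hist : (I, T) net consumption of all I agents for times 0..T-1, T > n
    N      : length of the averaging window (the prediction horizon)
    """
    I = w_hist.shape[0]
    width = min(N, n + 1)          # shorter window at the very beginning
    start = n - min(n, N - 1)      # trailing window [start, n]
    return w_hist[:, start:n + 1].sum() / (I * width)
```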

Let us for the moment ignore the coupling described in Subsection 2.1. Then, $\delta(n)$ equals the identity for all $n \in [k:k+N-1]$ and $J$ becomes

$\bar{z} \mapsto \sum_{n=k}^{k+N-1} \sum_{\kappa=1}^{\Xi} \big( \bar{\zeta}(n)\, I_\kappa - I_\kappa \bar{z}_\kappa(n) \big)^2 .$

Therefore, the overall objective $J$ can be decoupled, yielding the local optimization problem
(6) $\min_{z^{(\kappa)} \in \mathbb{D}^{(\kappa)}} g(\bar{z}_\kappa)$
per MG with local objective function $g : \mathbb{R}^N \to \mathbb{R}_{\geq 0}$,
$g(\bar{z}_\kappa) = \| \bar{\zeta} - \bar{z}_\kappa \|_2^2 .$

### 2.3 Fully coupled optimization problem

We are interested in minimizing the function (1). This function, in general, depends on $\delta$ as well as on $\bar{z}$. As seen in the previous section, the average power demand $\bar{z} = \bar{z}(u)$ depends on the control $u$, which we have to choose such that $J$ is minimized. The overall optimization problem can be written as

(7) $\min_{\delta \in \mathbb{D}_\delta ,\ z^{(\kappa)} \in \mathbb{D}^{(\kappa)}} \ J(\bar{z}, \delta) .$

Note that due to constraint (2c) the optimization of $J$ w. r. t. $\delta$ is non-convex. Furthermore, the large scale of the optimization w. r. t. $\bar{z}$ renders a centralized solver expensive. In addition, using a centralized solver assumes the existence of a global entity gathering the information of the whole grid, in particular the personal data of each household, which is undesirable in practice. Hence, solving (7) in a centralized manner is impractical. In the subsequent section we present an approach to tackle (7) by solving the upper and lower-level problems iteratively. Doing so, we avoid a node with full knowledge of the grid and only communicate specific aggregated information among the agents in each iteration.

## 3 Bidirectional optimization

We propose to tackle the optimization problem (7) in a bidirectional way, i. e., we first find an optimal $\bar{z}$ for $\delta$ being the identity and then optimize (7) w. r. t. $\delta$ for fixed $\bar{z}$ in order to find the optimal exchange strategy. This typically already gives a considerable improvement and has also been done, e. g., in [17]. Solving (2) for a fixed $\bar{z}$ is straightforward, and we use a standard Sequential Quadratic Programming (SQP) solver; we refer to [26] for an introduction to SQP methods. In this paper we show how to incorporate the computed energy exchange from the upper level into the lower-level optimization problem in order to improve the overall performance.

### 3.1 Bidirectional optimization scheme

Assume that each MG $\kappa$, $\kappa \in [1:\Xi]$, within the smart grid has already solved its inherent optimization problem (6) and, based on the corresponding solutions $\bar{z}_\kappa$, an energy exchange policy $\delta$ has been computed according to (2). This exchange yields an updated power demand

$\bar{z}_\kappa^+(n) = \dfrac{1}{I_\kappa} \sum_{\nu=1}^{\Xi} \delta_{\nu\kappa}(n)\, \eta_{\nu\kappa} I_\nu \bar{z}_\nu(n) , \quad n \in [k:k+N-1] ,$

and hence the difference $\Delta \bar{z}_\kappa(n) = \bar{z}_\kappa(n) - \bar{z}_\kappa^+(n)$ in power demand for all MGs. We are interested in updating $\bar{z}$ in such a way that our cost function (1),

$J(\bar{z}, \delta) = \sum_{n=k}^{k+N-1} \sum_{\kappa=1}^{\Xi} \Big( \bar{\zeta}(n)\, I_\kappa - \sum_{\nu=1}^{\Xi} \delta_{\nu\kappa}(n)\, \eta_{\nu\kappa} I_\nu \bar{z}_\nu(n) \Big)^2 = \sum_{\kappa=1}^{\Xi} I_\kappa^2 \big\| \bar{\zeta} - \bar{z}_\kappa^+ \big\|_2^2 = \sum_{\kappa=1}^{\Xi} I_\kappa^2 \big\| (\bar{\zeta} + \Delta \bar{z}_\kappa) - \bar{z}_\kappa \big\|_2^2$
is minimized further. One could consider fixing $\delta$ and finding an optimal $\bar{z}$. This, however, leads to an optimization problem that couples all microgrids and does not avoid communication. The trick is to fix not only $\delta$ but also the $\bar{z}$-components of all microgrids but one. This leads to minimizing $I_\kappa^2 \| (\bar{\zeta} + \Delta \bar{z}_\kappa) - \bar{z}_\kappa \|_2^2$ locally in each MG, where $\Delta \bar{z}_\kappa$ is computed using the $\bar{z}_\kappa$’s and $\delta$’s from the previous optimization step. Intuitively, the difference $\Delta \bar{z}_\kappa$, $\kappa \in [1:\Xi]$, can be interpreted as an additional load or generation and, therefore, as a change of the desired power demand profile for MG $\kappa$. This yields the modified lower-level optimization problem
(8) $\min_{z^{(\kappa)} \in \mathbb{D}^{(\kappa)}} g_\kappa(\bar{z}_\kappa) = \| \bar{\zeta}_\kappa^+ - \bar{z}_\kappa \|_2^2 ,$
where $\bar{\zeta}_\kappa^+ = \bar{\zeta} + \Delta \bar{z}_\kappa$. In this formulation the updated reference trajectories $\bar{\zeta}_\kappa^+$, $\kappa \in [1:\Xi]$, differ among the single MGs and depend on a given $\delta$ and the previous $\bar{z}_\kappa$. The newly derived lower-level optimization problem can be solved with ADMM for all microgrids independently and in parallel.

Based on the updated reference value we solve (8) and (2) to improve the battery usage and the energy exchange, and repeat the optimization until some terminal condition is satisfied, e. g., the performance improvement falls below a pre-defined tolerance or a maximal number of iterations is exceeded. This procedure is summarized in Algorithm 1. Note that we only update the reference $\bar{\zeta}$ on the lower level, since the upper-level optimization problem (1) does not change.
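A high-level sketch of this bidirectional iteration follows; the callbacks `solve_lower` and `solve_upper` are hypothetical stand-ins for ADMM per MG and the SQP solver for (2), respectively:

```python
import numpy as np

def bidirectional_scheme(zeta, I, eta, solve_lower, solve_upper,
                         max_iter=10, tol=1e-6):
    """Iterate between the lower level (8) and the upper level (2).

    zeta        : (N,) overall reference trajectory
    I           : (Xi,) number of agents per MG (float array)
    eta         : (Xi, Xi) transmission efficiencies
    solve_lower : callback, (Xi, N) references -> (Xi, N) average demands
    solve_upper : callback, (Xi, N) demands -> (Xi, Xi, N) exchange rates
    """
    Xi = len(I)
    zeta_plus = np.tile(zeta, (Xi, 1))     # initially zeta_kappa^+ = zeta
    zbar_prev = None
    for _ in range(max_iter):
        zbar = solve_lower(zeta_plus)      # lower level, e.g. ADMM per MG
        delta = solve_upper(zbar)          # upper level, problem (2)
        # demand after the exchange: zbar_kappa^+(n)
        zbar_plus = np.einsum('vkn,vk,v,vn->kn', delta, eta, I, zbar) / I[:, None]
        zeta_plus = zeta + (zbar - zbar_plus)   # zeta_kappa^+ = zeta + Delta zbar_kappa
        if zbar_prev is not None and np.max(np.abs(zbar - zbar_prev)) < tol:
            break                          # terminal condition: stagnation
        zbar_prev = zbar
    return zbar, delta
```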

Neither convergence nor the interpretation of a potential limit of Algorithm 1 is clear a priori. Figure 3, however, experimentally shows convergence of the proposed scheme and a continuous improvement of the upper-level performance index. Here, we ran 10 iterations of the optimization scheme and plotted both the objective function values before and after the energy exchange within each iteration. The values stagnate after four iterations indicating that additional iterations do not further improve the overall performance. The next subsection elaborates on how to solve (8) in a fully distributed way using ADMM.

### Remark 1.

The results displayed in Figure 3 and our numerical investigations indicate convergence to the global minimum of (7). However, a formal (and rigorous) derivation of this conjecture is left for future research.

### 3.2 Distributed optimization via ADMM

In this section we briefly discuss how to solve the lower-level optimization problem (6) or (8) using an Alternating Direction Method of Multipliers (ADMM) approach. We consider a single MG and therefore omit the index κ. Since the averaged output quantity appears in the objective function (6) or (8), we need to introduce an auxiliary variable a in order to decouple the lower-level optimization in the following way,

(9a) $\min_{z, a} \ g(\bar{a}) = \| \bar{a} - \bar{\zeta} \|_2^2$
(9b) $\text{s.t.} \quad \dfrac{1}{I} \sum_{i=1}^{I} a_i - \bar{a} = 0 , \quad z_i - a_i = 0 ,$
(9c) $\qquad\ \ z_i \in \mathbb{D}_i \quad \forall\, i \in [1:I] .$
Note that (9c) is a short-hand notation for the battery dynamics (3)–(4), and yields a fully decoupled constraint in the variable $z$. ADMM is an optimization scheme to solve (9) based on the augmented Lagrangian $L_\rho : \mathbb{R}^{IN} \times \mathbb{R}^{IN} \times \mathbb{R}^{IN} \to \mathbb{R}$, for $\rho > 0$,
$L_\rho(z, a, \lambda) = g(\bar{a}) + \sum_{i=1}^{I} \Big( \lambda_i^\top (z_i - a_i) + \dfrac{\rho}{2} \| z_i - a_i \|_2^2 \Big) ,$
in a distributed way, cf. [5]. Following [7], the ADMM algorithm for (9) yields the three-step iteration $\ell \to \ell + 1$,
(10a) $z_i^{\ell+1} = \arg\min_{z_i \in \mathbb{D}_i} \ z_i^\top \lambda_i^\ell + \dfrac{\rho}{2} \| z_i - a_i^\ell \|_2^2$
(10b) $a^{\ell+1} = \arg\min_{a \in \mathbb{R}^{IN}} \ g(\bar{a}) - \sum_{i=1}^{I} \Big( a_i^\top \lambda_i^\ell - \dfrac{\rho}{2} \| z_i^{\ell+1} - a_i \|_2^2 \Big)$
(10c) $\lambda_i^{\ell+1} = \lambda_i^\ell + \rho \big( z_i^{\ell+1} - a_i^{\ell+1} \big)$
until some termination condition is satisfied. Note that (10b) is an unconstrained optimization problem and can be solved explicitly. Problem (10a) can be solved in parallel by each battery in the MG introduced in Section 2.2. Note that scheme (10) assumes communication within the MG; more precisely, each system $i$, $i \in [1:I]$, sends its optimal solution $z_i^{\ell+1}$ to the CE and receives both the updated auxiliary variable $a_i^{\ell+1}$ and the dual variable $\lambda_i^{\ell+1}$ in return. The variant discussed in [7] avoids unnecessary communication overhead by returning a broadcast variable instead, which only incorporates information on the aggregated values.
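A compact sketch of one iteration of (10) for the simplified case where each feasible set $\mathbb{D}_i$ is replaced by a box, so that the $z$-update becomes a projection; the closed form of the $a$-update below is our own derivation from the first-order optimality condition of (10b), not code from the paper:

```python
import numpy as np

def admm_step(z, a, lam, zeta, rho, lo, hi):
    """One iteration of (10) with box sets D_i = [lo_i, hi_i]^N.

    z, a, lam : (I, N) primal, auxiliary, and dual iterates
    zeta      : (N,) reference trajectory, g(abar) = ||abar - zeta||^2
    """
    I = z.shape[0]
    # (10a): unconstrained minimizer a_i - lam_i/rho, projected onto the box
    z_new = np.clip(a - lam / rho, lo[:, None], hi[:, None])
    # (10b): closed-form solution of the unconstrained quadratic problem
    abar = (z_new.mean(0) + lam.mean(0) / rho + 2.0 / (I * rho) * zeta) \
           / (1.0 + 2.0 / (I * rho))
    a_new = z_new + lam / rho - 2.0 / (I * rho) * (abar - zeta)
    # (10c): dual update
    lam_new = lam + rho * (z_new - a_new)
    return z_new, a_new, lam_new
```

Iterating this step drives $z^\ell - a^\ell \to 0$ and, whenever the reference is attainable, $\bar{a}^\ell \to \bar{\zeta}$, in line with Theorem 1 below.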

According to Theorem 3.1 in [7] the optimization scheme (10) converges in the following sense.

### Theorem 1.

Consider Problem (9) with $g$ being strictly convex, closed, and proper, and let the iterates $(z^\ell, a^\ell, \lambda^\ell)$ be computed according to (10). Then the following statements hold true:

1. $(z^\ell - a^\ell)_{\ell \in \mathbb{N}_0}$ converges to zero for $\ell \to \infty$,

2. $(g(\bar{a}^\ell))_{\ell \in \mathbb{N}_0}$ converges to the optimal value $g^\star$ of (9),

3. $(\lambda^\ell)_{\ell \in \mathbb{N}_0}$ converges to the dual optimal solution $\lambda^\star$ of (9).

According to [7] and the references [5, Section 3] and [4, Appendix C] therein, problem (6) fulfils the assumptions of Theorem 1.

## 4 Surrogate models for ADMM

This section is dedicated to surrogate models for the optimization routine (10) within a single MG. For simplicity of notation we omit the index κ.

Due to the distributed structure of ADMM, the residential energy systems do not need to share information with their neighbours but only with the CE, see also the star structure in Figure 2. In each iteration of ADMM, subsystem $i$ has to transmit its solution $z_i$ of (10a) to the CE. The optimization scheme presented in Algorithm 1, however, suggests running ADMM more than once in order to improve the performance w. r. t. the objective function (1). In order to avoid unnecessary communication, we propose to use surrogate models to approximate the optimization routine (10). More precisely, we are interested in a function which approximates $\varphi : \mathbb{R}^N \times \mathbb{R}^I \times \mathbb{R}^N \to \mathbb{R}^N$:

(11) $\varphi(\bar{w}, x(k), \bar{\zeta}) = \bar{z} ,$
for all feasible $(\bar{w}, x(k), \bar{\zeta}) \in \mathbb{R}^N \times \mathbb{X} \times \mathbb{R}^N$. Note that we neither assume knowledge of the local net consumption $w_i$, $i \in [1:I]$, nor of the future SoC $x(n)$, $n > k$.

Figure 4 (top) shows that if the approximation (11) is sufficiently accurate, the impact on the performance of the optimization scheme is negligible. Here, the costs $J(\bar{z}, \delta)$ after optimization are visualized for 48 consecutive time steps (equalling 24 hours of simulation time). In the experiment, we disturbed the ADMM solution in Algorithm 1 by uniformly distributed additive noise, i. e., $\bar{z} + 10^{-p} \cdot d$, where the vector $d \sim \mathcal{U}(-1, 1)$ and $p \in \mathbb{N}_0$ determines the intensity of the disturbance.

Note that (11) might yield approximations to the solution $\bar{z}$ that either degrade the performance w. r. t. (6) compared to the solution $\bar{w}$ associated with $u \equiv 0$, or correspond to an infeasible control $\hat{u} \notin \mathbb{U}$. As a remedy, we propose to apply ADMM once after replacing it by a surrogate in the optimization scheme. More precisely, we first run Algorithm 1 using a surrogate in Step 4(b) until the while loop terminates, and then additionally repeat Steps 4 and 5 using ADMM.

### 4.1 Well-posedness

The following proposition states that, with equality in (11), a proper mapping is defined. For a concise notation we replace the index $\kappa i$ by $i$ here.

### Proposition 1.

Consider $\varphi$ given by (11), where $\bar{z}$ describes the optimal solution of (6) computed via ADMM, i. e., $\bar{z} = \bar{z}(u)$. We assume all hyper-parameters to be fixed, meaning that $\{ T, \alpha_i, \beta_i, \gamma_i, C_i, \bar{u}_i, \underline{u}_i \}$ in (3)–(4) are constant over time for all $i \in [1:I]$. Then $\varphi$ is a mapping, i. e., for all $(\bar{w}, x(k), \bar{\zeta}) \in \mathbb{R}^N \times \mathbb{X} \times \mathbb{R}^N$, there exists a uniquely determined $\bar{z} \in \mathbb{R}^N$ such that $\bar{z}$ is the solution to the optimization problem (6).

### Proof.

First note that ADMM yields the unique solution of (6), see e. g., [5]. Furthermore, there are no constraints on $z_i$, $i \in [1:I]$, and the future SoC can be interpreted as an affine function of the current SoC and the future (dis-)charging rates. Hence, expansion of (3a) and averaging of (3b) yield,

$\min_u \ \| \bar{z}(u) - \bar{\zeta} \|_2^2 , \quad \text{subject to} \quad x_i(k+1+n) = \alpha_i^{n+1} x_i(k) + T \sum_{\ell=k}^{k+n} \alpha_i^{n+k-\ell} \big( \beta_i u_i^+(\ell) + u_i^-(\ell) \big) , \quad x_i(k) = \hat{x}_i \text{ and constraints (4)} , \quad \bar{z}(k+n) = \bar{w}(k+n) + \bar{u}^+(k+n) + \overline{\gamma u^-}(k+n) , \quad n \in [0:N-1] ,$

where $\overline{(\,\cdot\,)}$ denotes the corresponding average value w. r. t. all subsystems, in particular $\bar{z}(n) = \frac{1}{I} \sum_{i=1}^{I} z_i(n)$. This representation of (6) illustrates that the (predicted) average values $\bar{\zeta} = (\bar{\zeta}(k), \ldots, \bar{\zeta}(k+N-1))$ and $\bar{w} = (\bar{w}(k), \ldots, \bar{w}(k+N-1))$ and the current SoCs $(x_i(k))_{i=1}^{I}$ uniquely determine the optimal solution $\bar{z}(u)$ obtained by ADMM.  □

### Remark 2.

Note that $\bar{\zeta} \mapsto \bar{z}$ as introduced in [19] does not define a mapping, since the solution $\bar{z}$ of (6) depends not only on the reference value $\bar{\zeta}$ but also on the future net consumption $\bar{w}$ and the current SoC $x_i(k)$.

### 4.2 Radial basis functions approximation

Radial Basis Functions (RBFs) are used to interpolate functions based on a set of sampling data. We briefly recap some basics on RBFs. For a detailed introduction to theory and application see e. g., [20], for a similar approach where RBFs are used to replace ADMM we refer to [19].

Let $M \in \mathbb{N}$ denote the number of samples. Then, the interpolant of (11) is given as the sum of basis functions $\psi_m : \mathbb{R}^N \times \mathbb{X} \times \mathbb{R}^N \to \mathbb{R}$, $m \in [1:M]$, and a regularization term $q : \mathbb{R}^N \times \mathbb{X} \times \mathbb{R}^N \to \mathbb{R}^N$. More precisely,

(12) $\bar{z} \approx \varphi_{\mathrm{RBF}}(\chi) = \sum_{m=1}^{M} \psi_m(\chi)\, \alpha_m + q(\chi) ,$

where $\chi = (\bar{w}, x(k), \bar{\zeta})$ collects the inputs of Proposition 1, and $\alpha_m \in \mathbb{R}^N$, $m \in [1:M]$. The basis functions are so-called radial basis functions of the form $\psi_m(\chi) = \psi(\| \chi - \chi_m \|)$, where the kernel $\psi$ has its support close to the sampling data $\chi_m$, $m \in [1:M]$. We choose an affine-linear regularization $q(\chi) = \beta_0 + B \chi$. Note that different choices are possible. The remaining parameters $\alpha_m$, $\beta_0$, and $B$ are determined by interpolation conditions, cf. [19], [20].

In Figure 5, a possible fit via RBFs is visualized. Here, we interpolated data from two weeks of optimization (4540 data points), picking every 25th data point as sampling data to train (12). Then, we tested $\varphi_{\mathrm{RBF}}$ on the following day and plotted the fit. Our implementation is based on the Matlab toolbox DACE [27]. Note that the evaluation time of the RBF approximation grows with the number of data points used. Already training (12) with 180 data points and $N = 6$ causes the function evaluation of $\varphi_{\mathrm{RBF}}$ to be expensive, see Table 1. Using more data points would no longer yield an advantage over using ADMM w. r. t. computation time.
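The structure of (12) can be sketched with a Gaussian kernel and, for brevity, a constant regularization term $q(\chi) = \beta_0$ instead of the affine-linear one; the paper's implementation uses the DACE toolbox, so everything below is an illustrative stand-in:

```python
import numpy as np

def fit_rbf(X, Y, eps=1.0):
    """Fit an RBF interpolant of the form (12) with Gaussian kernels.

    X : (M, d) sampling inputs chi_m      Y : (M, p) sampled outputs
    Returns a callable phi with phi(chi_m) = Y[m] (interpolation conditions).
    """
    M = X.shape[0]

    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-eps * d2)

    # interpolation conditions Psi @ alpha + beta0 = Y plus sum(alpha) = 0
    Psi = kernel(X, X)
    A = np.block([[Psi, np.ones((M, 1))], [np.ones((1, M)), np.zeros((1, 1))]])
    rhs = np.vstack([Y, np.zeros((1, Y.shape[1]))])
    sol = np.linalg.solve(A, rhs)
    alpha, beta0 = sol[:M], sol[M]
    return lambda x: kernel(np.atleast_2d(np.asarray(x, float)), X) @ alpha + beta0
```

By construction, the returned interpolant reproduces the training data exactly, which mirrors the interpolation conditions used to determine $\alpha_m$ and $\beta_0$.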

### 4.3 Neural networks approximation

Neural Networks (NNs) are a state-of-the-art method in artificial intelligence frameworks. Based on huge amounts of data ($M \gg 1$) they are able to learn and recognize patterns in complex systems. We consider an NN of $l$ layers as an approximation to the mapping (11), i. e.,

(13) $\bar{z} \approx \varphi_{\mathrm{NN}}(\chi) = \sigma \big( W^{[l]} \cdots \sigma( W^{[2]} \chi + b^{[2]} ) \cdots + b^{[l]} \big) ,$

where $\sigma$ denotes the sigmoid function, and the weights $W^{[l]}$ and biases $b^{[l]}$ are determined during the training phase. Here, the number of neurons at layer $l-1$ and at layer $l$ determine the number of columns and rows of $W^{[l]}$, respectively. Note that a separate neural network is trained for each MG. For an introduction to deep learning and neural networks we refer the reader to [28], [29]. To train the NNs, we used Matlab’s built-in toolbox nftool.

The overall goal of the approximation (13) is to be sufficiently accurate in the sense of the MPC performance shown in Figure 6. Our experiments in Figure 4 (bottom) and Figure 5 show that with only one hidden layer of ten neurons, a satisfying approximation on a 24-hour time window can be achieved if the training set is large enough. Note that NNs benefit from big data; in our case study, however, we trained the NN only on data corresponding to two weeks.
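The forward pass (13) itself is only a few lines; training, i. e., fitting the weights and biases, was done with Matlab's nftool in the paper, so the sketch below merely shows the evaluation of a trained network:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def nn_forward(chi, weights, biases):
    """Evaluate the layered map (13): alternating affine maps and sigmoids.

    chi     : (d,) stacked input vector (w_bar, x(k), zeta_bar)
    weights : list of matrices W[2], ..., W[l]
    biases  : list of vectors  b[2], ..., b[l]
    """
    a = chi
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # one layer of (13)
    return a
```

Since the outermost activation in (13) is also a sigmoid, the outputs lie in $(0,1)$, so the demand data has to be normalized accordingly.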

## 5 Numerical proof-of-concept

Model Predictive Control (MPC) is a method to tackle optimal control problems on an infinite time horizon by solving a series of finite dimensional optimization problems instead, see e. g., [9] for an introduction to non-linear MPC.

### 5.1 Model predictive control (MPC)

Consider the optimal control problem (6). In order to provide an optimal control sequence over an arbitrarily long time horizon we use MPC. To this end, at the current time instance $k \in \mathbb{N}_0$ we assume the future net consumption $(w_i(k), w_i(k+1), \ldots, w_i(k+N-1)) \in \mathbb{R}^N$ to be predicted for all subsystems $i \in [1:I]$. Based on the prediction, Algorithm 1 is executed (per MG) to determine control sequences $u_i$, $i \in [1:I]$, and an exchange strategy $\delta$. Then, only the first instances $u_i(k)$ and $\delta(k)$ are implemented and the time instance is incremented. Algorithm 2 outlines this MPC scheme.
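The receding-horizon logic of Algorithm 2 can be sketched generically; the callbacks `plan` and `apply_first` are hypothetical placeholders for the open-loop optimization and plant step:

```python
def mpc_loop(plan, apply_first, x0, steps):
    """Skeleton of the MPC scheme of Algorithm 2.

    plan(x, k)         : solve the finite-horizon problem, return u(k..k+N-1)
    apply_first(x, u0) : apply only the first control, return the successor state
    """
    x, closed_loop = x0, []
    for k in range(steps):
        u = plan(x, k)               # open-loop optimization over the horizon
        x = apply_first(x, u[0])     # implement only the first instance
        closed_loop.append(u[0])     # the horizon then shifts to k + 1
    return x, closed_loop
```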

Note that Problems (6) or (8) and (2) have to be solved in order to determine $\bar{z}$ and $\delta$ in each MPC iteration. Therefore, the open-loop costs $J(\bar{z}, \delta)$ can be computed in each iteration as well (cf. Figure 4). However, since Step 3 in Algorithm 2 implements only the first instance of the controls computed in Step 2, these costs are not attained. Instead, the stage costs

(14) $\sum_{\kappa=1}^{\Xi} \Big( \bar{\zeta}(k)\, I_\kappa - \sum_{\nu=1}^{\Xi} \delta_{\nu\kappa}(k)\, \eta_{\nu\kappa} I_\nu \bar{z}_\nu(k) \Big)^2$
are realized at each time step $k \in \mathbb{N}_0$ (cf. Figure 6).

### 5.2 Usage of surrogate models in MPC

We compare the performances using ADMM, RBFs, and NNs on the lower level, i. e., in Step 4(b) of Algorithm 1. In all numerical simulations we set $T = 0.5$, $N = 6$, $\Xi = 4$, $I_1 = 50$, and $I_2 = I_3 = I_4 = 10$.[1] The battery parameters were randomly chosen with mean values $C = 0.98$, $\underline{u} = -0.24$, and $\bar{u} = 0.25$. Based on the battery capacities we set $\hat{x}_i = 0.5\, C_i$. In order to incorporate losses along the transmission lines, we used the efficiency matrix,

$\eta = \begin{pmatrix} 1.0 & 0.9 & 0.9 & 0.85 \\ 0.9 & 1.0 & 0.0 & 0.85 \\ 0.9 & 0.0 & 1.0 & 0.0 \\ 0.85 & 0.85 & 0.0 & 1.0 \end{pmatrix} \quad \text{in (1)} .$
For simplicity of the numerical computation, we only replaced the lower-level optimization routine for MG 1 and thus avoided training a separate surrogate model for each MG. We used Matlab for the implementation.

Results for the MPC closed loop can be found in Figure 6 and Table 1. Figure 6 visualizes the closed-loop performance of ADMM (black line) compared to perturbed ADMM and to the two surrogate models. Similar to the open-loop case, small disturbances in ADMM have little impact and RBFs outperform the NN. The first column of Table 1 compares the sum of all MPC closed-loop performances using ADMM, RBFs, and a NN, while column 2 reports the average runtimes of these approaches. Note that when using a surrogate, we call ADMM once per MPC iteration. As elaborated in [7], in each ADMM iteration an N-dimensional vector has to be transmitted twice. Hence, both surrogates reduce the need for communication. Two great advantages of ADMM are that the local optimization (10a) can be parallelized and that the global optimization is independent of the size of the MG. However, a single function evaluation such as (12) or (13) is faster than running the entire ADMM optimization routine.

Table 1

Comparison of the summed MPC closed-loop performance $\sum_{k=0}^{47} \sum_{\kappa=1}^{\Xi} \big( \bar{\zeta}(k)\, I_\kappa - \sum_{\nu=1}^{\Xi} \delta_{\nu\kappa}(k)\, \eta_{\nu\kappa} - I_\kappa\, \bar{z}_\kappa(k) \big)^{2}$ and runtime (per call): ADMM vs. RBFs vs. NNs.

|            | closed-loop cost | runtime [ms] |
|------------|-----------------:|-------------:|
| no control | 12,228           | —            |
| ADMM       | 4,416            | 2.5          |
| RBFs       | 4,529            | 1.2          |
| NNs        | 5,598            | 0.05         |

Note that column 2 of Table 1 ignores the communication between smart homes and the CE which is needed to apply ADMM in practice. Hence, the runtime of ADMM deteriorates when executed in an actual smart grid, while the surrogates do not require additional communication.

In order to improve the performance of the NN, significantly more sampling data would have to be generated to enlarge the training set. To avoid large offline computation times, we chose $N = 6$, i.e., a prediction horizon of three hours, which is rather short compared to [7], [19].

### Remark 3.

We point out two implementation details for solving (2) efficiently. First, note that the optimization (2) can be parallelized in $n$, since there is no coupling. Furthermore, we replace (2c) by

$$\delta_{\kappa\nu}(n) \cdot \delta_{\nu\kappa}(n) \leq \varepsilon$$
for some tolerance $\varepsilon > 0$ to smooth the feasible set.
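This relaxation replaces the non-smooth complementarity condition $\delta_{\kappa\nu}(n)\,\delta_{\nu\kappa}(n) = 0$ (no simultaneous transfer in both directions) by a smooth inequality that standard NLP solvers handle well. A minimal sketch for two coupled MGs; the objective and the transfer target `t` are hypothetical stand-ins, not the paper's problem (2).

```python
import numpy as np
from scipy.optimize import NonlinearConstraint, minimize

eps = 1e-6   # tolerance from Remark 3

# Decision variables x = (d_12, d_21): power sent MG1 -> MG2 and back.
# Hypothetical objective: meet a net-transfer target t while penalizing flows.
t = 0.3
objective = lambda x: (x[0] - x[1] - t) ** 2 + 0.1 * np.sum(x ** 2)

# Relaxed complementarity: d_12 * d_21 <= eps instead of d_12 * d_21 = 0,
# which smooths the feasible set (Remark 3).
relaxed = NonlinearConstraint(lambda x: x[0] * x[1], -np.inf, eps)

res = minimize(objective, x0=np.array([0.1, 0.1]),
               bounds=[(0.0, 1.0), (0.0, 1.0)],
               constraints=[relaxed])
```

At the solution, one of the two flows is (numerically) zero, so the relaxed problem recovers a feasible exchange strategy for the original condition up to the tolerance $\varepsilon$.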

## 6 Conclusions

In this paper we recalled an optimization problem arising in large-scale electrical networks. We proposed an iterative bidirectional optimization scheme to tackle this problem in a distributed way, and showed numerically that a small error on the lower level does not have a noticeable impact on the performance. Based on this observation, we replaced the lower-level optimization by surrogate models using radial basis functions and artificial neural networks. The numerical results show the potential of these surrogates to reduce communication effort and computational time in MPC while preserving the overall performance.

### References

1. S. Parhizi, H. Lotfi, A. Khodaei and S. Bahramirad. State of the Art in Research on Microgrids: A Review. IEEE Access, 3(1):890–925, 2015. DOI: 10.1109/ACCESS.2015.2443119.

2. D. E. Olivares, C. A. Cañizares and M. Kazerani. A centralized optimal energy management system for microgrids. In 2011 IEEE Power and Energy Society General Meeting, pages 1–6, 2011.

3. R. Okubo, S. Yoshizawa, Y. Hayashi, S. Kawano, T. Takano and N. Itaya. Decentralized Charging Control of Battery Energy Storage Systems for Distribution System Asset Management. In 2019 IEEE Milan PowerTech, pages 1–6, 2019.

4. D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Athena Scientific, Belmont, MA, USA, 1989.

5. S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

6. B. Houska, J. Frasch and M. Diehl. An Augmented Lagrangian Based Algorithm for Distributed Nonconvex Optimization. SIAM J. Optim., 26(2):1101–1127, 2016. DOI: 10.1137/140975991.

7. P. Braun, T. Faulwasser, L. Grüne, C. M. Kellett, S. R. Weller and K. Worthmann. Hierarchical distributed ADMM for predictive control with applications in power networks. IFAC J. Syst. Control, 3:10–22, 2018. DOI: 10.1016/j.ifacsc.2018.01.001.

8. A. Engelmann, Y. Jiang, T. Mühlpfordt, B. Houska and T. Faulwasser. Towards Distributed OPF using ALADIN. IEEE Trans. Power Syst., 34(1):584–594, 2019. DOI: 10.1109/TPWRS.2018.2867682.

9. L. Grüne and J. Pannek. Nonlinear Model Predictive Control. Theory and Algorithms. Springer, London, 2nd edition, 2017.

10. M. Khalid and A. V. Savkin. A model predictive control approach to the problem of wind power smoothing with controlled battery storage. Renewable Energy, 35(7):1520–1526, 2010. DOI: 10.1016/j.renene.2009.11.030.

11. A. Parisio, E. Rikos and L. Glielmo. A model predictive control approach to microgrid operation optimization. IEEE Trans. Control Syst. Technol., 22(5):1813–1827, 2014. DOI: 10.1109/TCST.2013.2295737.

12. G. Graditi, M. L. Di Silvestre, R. Gallea and E. R. Sanseverino. Heuristic-Based Shiftable Loads Optimal Management in Smart Micro-Grids. IEEE Trans. Ind. Informat., 11(1):271–280, 2015. DOI: 10.1109/TII.2014.2331000.

13. P. Braun, L. Grüne, C. M. Kellett, S. R. Weller and K. Worthmann. Model Predictive Control of Residential Energy Systems Using Energy Storage and Controllable Loads. In Progress in Industrial Mathematics at ECMI 2014, Mathematics in Industry, 22:617–623, 2016.

14. R. R. Appino, J. Á. G. Ordiano, R. Mikut, T. Faulwasser and V. Hagenmeyer. On the use of probabilistic forecasts in scheduling of renewable energy sources coupled to storages. Applied Energy, 210:1207–1218, 2018. DOI: 10.1016/j.apenergy.2017.08.133.

15. R. R. Appino, M. Muñoz-Ortiz, J. Á. G. Ordiano, R. Mikut, V. Hagenmeyer and T. Faulwasser. Reliable Dispatch of Renewable Generation via Charging of Time-varying PEV Populations. IEEE Trans. Power Syst., 34(2):1558–1568, 2018.

16. R. H. Lasseter. Smart distribution: Coupled microgrids. Proceedings of the IEEE, 99(6):1074–1082, 2011. DOI: 10.1109/JPROC.2011.2114630.

17. P. Braun, P. Sauerteig and K. Worthmann. Distributed optimization based control on the example of microgrids. In M. J. Blondin, P. M. Pardalos and J. S. Sáez, editors, Computational Intelligence and Optimization Methods for Control Engineering, volume 150 of Springer Optimization and Its Applications, pages 173–200. Springer International Publishing, 2019.

18. A. Sinha, P. Malo and K. Deb. A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications. arXiv preprint arXiv:1705.06270, 2017.

19. S. Grundel, P. Sauerteig and K. Worthmann. Surrogate Models for Coupled Microgrids. In I. Faragó, F. Izsák and P. Simon, editors, Progress in Industrial Mathematics at ECMI 2018, volume 30, 1st edition. Springer International Publishing, 2019. DOI: 10.1007/978-3-030-27550-1.

20. M. D. Buhmann. Radial Basis Functions: Theory and Implementations, volume 12. Cambridge University Press, 2003.

21. O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed and H. Arshad. State-of-the-art in artificial neural network applications: A survey. Heliyon, 4(11), 2018.

22. H. Zhang, F. Xu and L. Zhou. Artificial neural network for load forecasting in smart grid. In 2010 International Conference on Machine Learning and Cybernetics, volume 6, pages 3200–3205, 2010.

23. P. Siano, C. Cecati, H. Yu and J. Kolbusz. Real Time Operation of Smart Grids via FCN Networks and Optimal Power Flow. IEEE Trans. Ind. Informat., 8(4):944–952, 2012. DOI: 10.1109/TII.2012.2205391.

24. K. Worthmann, C. M. Kellett, P. Braun, L. Grüne and S. R. Weller. Distributed and Decentralized Control of Residential Energy Systems Incorporating Battery Storage. IEEE Trans. Smart Grid, 6(4):1914–1923, 2015. DOI: 10.1109/TSG.2015.2392081.

25. E. L. Ratnam, S. R. Weller, C. M. Kellett and A. T. Murray. Residential load and rooftop PV generation: an Australian distribution network dataset. Internat. J. Sustain. Energy, 2015.

26. J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2006.

27. S. N. Lophaven, H. B. Nielsen and J. Søndergaard. DACE – A Matlab Kriging Toolbox, Version 2.0. Technical report, 2002.

28. I. Goodfellow, Y. Bengio and A. Courville. Deep Learning. MIT Press, 2016.

29. C. F. Higham and D. J. Higham. Deep Learning: An Introduction for Applied Mathematicians. arXiv preprint arXiv:1801.05894, 2018.