## 1 Introduction

Research on multi-objective optimization (MO) has attracted significant attention from the engineering community since the 1980s; fast computers have made solutions to many complex optimization problems possible. The Vector Evaluated Genetic Algorithm (VEGA) [1] is one of the earliest examples of Multi-Objective Evolutionary Algorithms (MOEAs). More recent developments include NSGA-II [2] and its modified versions, as well as particle swarm based methods [3]. A comprehensive review of problem definitions and non-EA based solution methods may be found in [4].

An increasing number of indicator-based MOEAs have been proposed in recent years. The indicator is used as a fitness measure for a set of Pareto points; by optimizing the indicator function, the MO problem essentially becomes a single-objective optimization problem, as the solver only needs to locate the optimal value of the indicator and update the generation based on it. One of the best-known indicators is the hypervolume [5]; it has been successfully applied to both EAs and surrogate-based algorithms. Despite its unique feature of being strictly monotonic with respect to Pareto improvements [6], it suffers from high computational cost in higher dimensions.

EAs are generally regarded as advantageous for solving MO problems because they are often population based, so multiple solutions can be obtained in a single run. However, solutions to practical problems may be expensive in terms of computational time and effort. In the context of electromagnetic devices, the finite element method is a common design tool; it often takes hours or even days to obtain a single solution, and surrogate model based algorithms are therefore often preferred.

In this study we propose a new indicator-based approach for MO problems, the Localized Probability of Improvement (LPoI). Its implementation requires the predicted mean and mean square error to be available, hence it is not applicable to EAs; for Gaussian based surrogate models (including those relying on kriging), however, it has the advantage of scaling linearly to problems with a higher number of objectives.

## 2 Kriging theory

Modern engineering design often involves deterministic computer simulation; in electromagnetic design, time-consuming finite element models (FEM) are often built to represent the actual devices. Designs are analyzed and optimized before being put into production. In these types of problems, the optimization can be a very time-consuming process owing to the large number of FEM calls needed. Surrogate modeling techniques are therefore often used to reduce the number of expensive FEM simulations.

The kriging model $Y$ consists of a global mean $f$ and a local departure $Z$:

$$Y(x) = f(x) + Z(x) \tag{1}$$

where $x$ is the location of any design site.

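The local departure $Z$ has variance $\sigma^2$ and non-zero covariance. A general exponential correlation function is one of the most commonly used correlation functions, owing to its continuity and flexibility:

$$R(x_i, x_j) = \exp\left(-\sum_{n=1}^{k} \theta_n \left|x_i^{(n)} - x_j^{(n)}\right|^{p_n}\right) \tag{2}$$

where $x_i$ and $x_j$ are a pair of observations, $k$ is the problem dimension, while $\theta_n$ and $p_n$ are hyperparameters controlling the shape of the correlation function.

The mean $\mu$, the variance $\sigma^2$ and the hyperparameters $\theta$ and $p$ are obtained via Maximum Likelihood Estimation (MLE), with the likelihood function given by

$$L = \frac{1}{(2\pi\sigma^2)^{m/2}\,|\mathbf{R}|^{1/2}} \exp\left[-\frac{(y - \mathbf{1}\mu)^{T}\mathbf{R}^{-1}(y - \mathbf{1}\mu)}{2\sigma^2}\right] \tag{3}$$

where $y$ denotes all $m$ observations, $\mathbf{1}$ is a vector of ones and $\mathbf{R}$ is the correlation matrix.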

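The kriging prediction and the predicted mean square error (MSE) at a given location $x$ are given as follows

$$\hat{y}(x) = \hat{\mu} + r^{T}\mathbf{R}^{-1}(y - \mathbf{1}\hat{\mu}) \tag{4}$$

$$\hat{s}^2(x) = \hat{\sigma}^2\left[1 - r^{T}\mathbf{R}^{-1}r + \frac{\left(1 - \mathbf{1}^{T}\mathbf{R}^{-1}r\right)^2}{\mathbf{1}^{T}\mathbf{R}^{-1}\mathbf{1}}\right] \tag{5}$$

where $r$ is the vector of correlations between $x$ and the observed design sites, $\mathbf{1}$ is a vector of ones, and $\hat{\mu}$ and $\hat{\sigma}^2$ are the optimal mean and variance, respectively, obtained by maximizing the likelihood function.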
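The kriging predictor and its MSE can be illustrated with a minimal, dependency-free Python sketch. It assumes ordinary kriging with a constant mean and hyperparameters `theta` and `p` that are already known (the MLE search is omitted). The helper names `corr`, `solve`, `dot` and `krige` are illustrative and not taken from the paper.

```python
import math

def corr(xi, xj, theta, p):
    """General exponential correlation between two design sites."""
    return math.exp(-sum(t * abs(a - b) ** q
                         for a, b, t, q in zip(xi, xj, theta, p)))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def krige(X, y, theta, p, x_new):
    """Return (prediction, MSE) at x_new; hyperparameters are assumed given."""
    m = len(X)
    R = [[corr(X[i], X[j], theta, p) for j in range(m)] for i in range(m)]
    ones = [1.0] * m
    Ri_1 = solve(R, ones)                        # R^{-1} 1
    mu = dot(ones, solve(R, y)) / dot(ones, Ri_1)  # estimated mean
    resid = [yi - mu for yi in y]
    Ri_res = solve(R, resid)                     # R^{-1} (y - 1 mu)
    sigma2 = dot(resid, Ri_res) / m              # estimated variance
    r = [corr(x_new, xi, theta, p) for xi in X]
    Ri_r = solve(R, r)                           # R^{-1} r
    y_hat = mu + dot(r, Ri_res)
    mse = sigma2 * (1.0 - dot(r, Ri_r)
                    + (1.0 - dot(ones, Ri_r)) ** 2 / dot(ones, Ri_1))
    return y_hat, max(mse, 0.0)  # clamp tiny negative round-off

# Illustrative 1-D data (made up for this sketch)
X = [[0.0], [0.5], [1.0]]
y = [1.0, 0.2, 0.7]
theta, p = [5.0], [2.0]
print(krige(X, y, theta, p, [0.5]))   # at a sampled site
print(krige(X, y, theta, p, [0.25]))  # between sampled sites
```

At a sampled site the predictor reproduces the observation and the MSE vanishes, whereas between samples the MSE is positive. This predicted uncertainty is the quantity that a probability-of-improvement measure relies on.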

## 3 Localized probability of improvement

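The probability of improvement ($PoI$) at any location is given by

$$PoI(x) = \Phi\left(\frac{y_t - \hat{y}(x)}{\hat{s}(x)}\right) \tag{6}$$

where $y_t$ is the target of improvement, $\hat{y}$ is the kriging predicted mean at location $x$, $\hat{s}$ is the square root of the mean square error at location $x$ and $\Phi(\cdot)$ is the standard normal cumulative distribution function.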
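As a hedged illustration, the probability of improvement over a target can be evaluated with the Python standard library alone. The function names below are illustrative, and the handling of zero predicted error (as occurs at an already sampled site) is an assumption of this sketch rather than something stated in the text.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(y_target, y_hat, s_hat):
    """Probability that the response at a site falls below y_target.

    y_hat is the predicted mean and s_hat the square root of the predicted MSE.
    """
    if s_hat <= 0.0:
        # Assumption: with no predictive uncertainty the outcome is deterministic.
        return 1.0 if y_hat < y_target else 0.0
    return normal_cdf((y_target - y_hat) / s_hat)

# Prediction equal to the target: improvement is a coin flip.
print(probability_of_improvement(1.0, 1.0, 0.5))
# Prediction one standard deviation below the target.
print(probability_of_improvement(1.0, 0.0, 1.0))
```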

The first improvement target $y_{ext}$ is associated with the minimum value of each individual objective function; the subscript *ext* stands for “extreme value”. The target $y_{ext}^n$ is given by (7) in terms of $y_{min}^n$, the known minimum value of the $n$th objective function, and $p$, the percentage of improvement to be defined; parameter $p$ is discussed later in this section. The corresponding $PoI$ is:

$$PoI_{ext}^n(x) = \Phi\left(\frac{y_{ext}^n - \hat{y}^n(x)}{\hat{s}^n(x)}\right) \tag{8}$$

where $\hat{y}^n$, $\hat{s}^n$, $y_{ext}^n$ and $PoI_{ext}^n$ are the corresponding measures of the $n$th objective function.

There are $n$ values of $PoI$, equal to the number of objectives, because $PoI_{ext}$ is calculated based on the extreme value of each objective function. We consider the maximum potential improvement over all individual objectives, hence

$$PoI_{ext}(x) = \max_n PoI_{ext}^n(x) \tag{9}$$

The second improvement target $y_{int}^n(x)$ is associated with the reference point that is defined based on the location of $x$. The subscript *int* stands for “intermediate”; the target is calculated by (10), where $y_{ref}$ is the calculated reference point.

To determine $y_{ref}$, the algorithm finds the Pareto front for the existing design sites using non-dominated sorting. For each closest set of Pareto points (the number of points is equal to the number of objectives) it calculates the corresponding reference point. Each coordinate of the reference point is equal to the maximum value of the coordinates of these Pareto points in the same dimension. The coordinate of the corresponding reference point in the $n$th dimension, $Ref^n(x)$, is given by:

$$Ref^n(x) = \max\left(Y^n\right) \tag{11}$$

where $Y^n$ is the collection of the $n$th objective values for all of the points in that Pareto set.

Taking a bi-objective problem as an example, assume the reference point $y_{ref}$ is to be determined for Pareto points $P_1$ and $P_2$. The coordinates of $P_1$ and $P_2$ are denoted by $[P_1.x^1, P_1.x^2]$ and $[P_2.x^1, P_2.x^2]$, respectively. Note that $x^n$ is the $n$th objective value at the location in the search space associated with $P$. The $x^1$ and $x^2$ coordinates (in the objective space) of the reference point are thus described as follows

$$y_{ref}^1 = \max\left(P_1.x^1,\, P_2.x^1\right) \tag{12}$$

$$y_{ref}^2 = \max\left(P_1.x^2,\, P_2.x^2\right) \tag{13}$$
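The reference point construction can be sketched in a few lines of Python. The sketch assumes minimization of all objectives. For the bi-objective case it also assumes that a "closest set" of Pareto points means two neighbouring points once the front is sorted by the first objective; the text does not spell this out. All function names and the sample data are illustrative.

```python
def dominates(a, b):
    """True if a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the non-dominated points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def reference_point(pareto_set):
    """Coordinate-wise maximum over a set of Pareto points."""
    return [max(coords) for coords in zip(*pareto_set)]

def adjacent_reference_points(front):
    """Bi-objective case: one reference point per pair of neighbouring points."""
    ordered = sorted(front)
    return [reference_point(pair) for pair in zip(ordered, ordered[1:])]

# Made-up objective values for five sampled design sites
sites = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
front = pareto_front(sites)
print(front)
print(adjacent_reference_points(front))
```

Each reference point is dominated by the Pareto points that define it, so an improvement target derived from it lies in the gap between neighbouring Pareto points.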

The corresponding $PoI$ for the second improvement target, $PoI_{ref}^n$, is given by (14), where $y_{ref}^n$, $\hat{y}^n$, $\hat{s}^n$, $PoI_{ref}^n$ and $PoI_{ext}^n$ are the corresponding measures of the $n$th objective function.

There are also $n$ values of $PoI$ for the second improvement target. However, unlike the first improvement target, the second one uses a localized target. Therefore, we consider the minimum potential improvement over all individual objectives, and hence

$$LPoI_{ref}(x) = \min_n PoI_{ref}^n(x) \tag{15}$$

The $LPoI$ for any given point is the maximum of these two probability of improvement measures, given by

$$LPoI(x) = \max\left(PoI_{ext}(x),\, LPoI_{ref}(x)\right) \tag{16}$$

The term $PoI_{ext}$, as described by (9), is included because the minimum of each individual objective function is always present in the Pareto front; thus the $PoI$ at each location $x$, over the optimal target of that function, is always considered. This term also contributes to the diversification of the Pareto front.
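The way the two measures are combined at a single candidate location can be sketched as follows. The per-objective targets `y_ext` and `y_int` are taken as inputs, because their defining formulas are not reproduced here. The function names are illustrative, and a positive predicted error is assumed for every objective.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lpoi_at(y_hat, s_hat, y_ext, y_int):
    """Localized probability of improvement at one candidate location.

    All arguments are lists with one entry per objective:
    predicted means, predicted errors (assumed > 0), extreme targets
    and intermediate (localized) targets.
    """
    poi_ext = [normal_cdf((t - m) / s) for t, m, s in zip(y_ext, y_hat, s_hat)]
    poi_ref = [normal_cdf((t - m) / s) for t, m, s in zip(y_int, y_hat, s_hat)]
    # Maximum over objectives for the extreme target,
    # minimum over objectives for the localized target,
    # then the larger of the two.
    return max(max(poi_ext), min(poi_ref))

# Made-up predictions for a two-objective problem
print(lpoi_at([1.0, 2.0], [1.0, 1.0], y_ext=[1.0, 0.0], y_int=[2.0, 3.0]))
print(lpoi_at([1.0, 2.0], [1.0, 1.0], y_ext=[1.0, 0.0], y_int=[0.0, 0.0]))
```

In the first call the localized term dominates; in the second the extreme-value term does, which shows how either target can drive the choice of the next sample.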

Furthermore, $LPoI_{ref}$, as described by (15), can be treated as a maximum of the minimum potential improvements to a local target. This term helps to improve the Pareto front both towards the origin and in the direction of the objective value. It contributes to the diversification of the Pareto front, while the max-min method also contributes to the uniformity of the Pareto front.

To obtain the next infill sampling point, the algorithm finds the location *x* associated with the maximum *LPoI* measure in the objective space.

The parameter *p* – as seen in (7) and (10) – is associated with the magnitude of target improvement; it controls the convergence rate of the algorithm. A smaller amount of improvement will guide the solver towards existing Pareto points, while a larger value will encourage the exploration of the design space. It is crucial to use a proper *p*, since too small a value may lead to a false Pareto front, while a large value may result in a slow convergence rate or zero probability of improvement at all unknown sites. Thus it is advisable to dynamically adjust the value while monitoring the convergence.

The parameter $p$ is adjusted dynamically in this paper. First, the initial improvement target percentage $p_{initial}$ is defined, and then the parameter $p$ is calculated as given by (17), where $LPoI_{prev}$ is the complete set of $LPoI$ values at the previous iteration.

The next infill point is taken at the location with the maximum *LPoI*. As sampling proceeds, the remaining localized probability of improvement is therefore driven down and the solver converges towards the Pareto front. When the design space is well explored, or *p* is especially small, the solver will converge towards the existing Pareto front; at this stage, it is common for the *LPoI* to be equal, or come close, to 1 at multiple unknown sites (extremely likely to improve over the target point). In order to obtain a uniformly distributed Pareto front, the algorithm then selects, among these candidates, the one with the largest Euclidean distance to the existing Pareto points as the next infill sampling point. For this reason, the maximum value of *LPoI* can be capped at a value between 0.9 and 1 for faster exploitation of the existing Pareto front without degrading the overall performance.
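The capped selection rule can be sketched as below. Several details are assumptions of this sketch: the distance is measured in the objective space using each candidate's predicted objective values, "distance to existing Pareto points" is taken as the distance to the nearest one, and the cap value of 0.95 is simply one choice within the stated range. The names are illustrative.

```python
import math

def select_infill(candidates, lpoi_values, pareto_points, cap=0.95):
    """Pick the next infill point.

    candidates    : list of (design_site, predicted_objectives) pairs
    lpoi_values   : LPoI of each candidate
    pareto_points : objective vectors of the current Pareto front
    cap           : LPoI values above this are treated as equal
    """
    capped = [min(v, cap) for v in lpoi_values]
    best = max(capped)
    tied = [c for c, v in zip(candidates, capped) if v == best]

    def distance_to_front(candidate):
        # Distance from the predicted objectives to the nearest Pareto point
        return min(math.dist(candidate[1], q) for q in pareto_points)

    # Among equally promising candidates, prefer the most isolated one
    return max(tied, key=distance_to_front)

# Made-up candidates: (label, predicted objective values)
cands = [("a", (1.0, 1.0)), ("b", (3.0, 3.0)), ("c", (0.0, 0.0))]
front = [(1.0, 1.2), (0.5, 2.0)]
print(select_infill(cands, [0.99, 0.97, 0.50], front, cap=0.95))
print(select_infill(cands, [0.99, 0.97, 0.50], front, cap=1.0))
```

With the cap at 0.95, candidates "a" and "b" are treated as equally promising and the more distant "b" is chosen; without the cap, "a" wins on its marginally higher value even though it sits next to an existing Pareto point.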

## 4 Test examples

The top graph in Figure 1 shows the kriging model (solid line) after 45 iterations, with the red crosses plotted at the true Pareto front, while the bottom plot shows the proposed indicator value for the unknown sites. As can be seen, the algorithm has correctly converged to all four Pareto point clusters in the search space, and thus further sampling will lead to more exploitation of the Pareto front. The sampled design sites in the objective space are plotted in Figure 2, where the red dots indicate the location of the true Pareto front. The improvement directions imposed by the two improvement targets are also illustrated in Figure 2: the yellow arrows show the improvement direction for the first improvement target, and the blue arrows show that for the second.

## 5 Solving the new TEAM problem

*z* is given by

The problem is specified as follows: given the current density *J* within the coil and the prescribed flux density, find the optimal distribution of radii *r*(*z*), −*d* ≤ *z* ≤ *d*, that yields the prescribed flux density *B*_{0}(*z*).

An initial arrangement of turns was given in the extended paper of [7]; the width *w* and the height *h* of each turn are 1 mm and 1.5 mm, respectively.

The model consists of 20 turns connected in series, symmetrically distributed, hence there are 10 radii which need to be optimized (the main objective *f*_{1}). Two additional objectives were proposed to complement the first objective *f*_{1}. The three objectives *f*_{1}, *f*_{2} and *f*_{3} may be described as follows:

*f*_{1}: find the optimal distribution of *r*, so that the discrepancy between the prescribed flux density *B*_{0} and the actual induction field *B* is minimized;

*f*_{2}: minimize the sensitivity function;

*f*_{3}: minimize the power loss related function.

Mathematically the three objective functions are expressed as

where *B*^{+} = *B*(*r*(*ξ*_{l} + *Δξ*), *z*_{q}), *B*^{−} = *B*(*r*(*ξ*_{l} − *Δξ*), *z*_{q}), *l* = 1, *n*_{t} and *q* = 1, *n*_{p}, with *Δξ* = 0.5 mm. At this stage it was suggested to consider only two objectives at a time, *f*_{1} and either *f*_{2} or *f*_{3}.

The optimization results are illustrated by Figs. 5 and 6, where objectives 2 and 3 are plotted against objective 1, respectively. The globally optimal points A and B, selected by their closeness to the respective utopia points, correspond to the radii distributions [11.4, 8.6, 9.1, 12.1, 8.9, 8.3, 7.0, 6.4, 6.8, 5.9] and [7.2, 10.6, 7.2, 6.6, 9.0, 5.2, 9.2, 5.0, 5.4, 6.9], respectively.
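Selecting a single compromise solution by closeness to the utopia point can be sketched as follows. The sketch assumes that the utopia point is the vector of per-objective minima over the Pareto front and that closeness is the plain Euclidean distance without any normalization of the objectives; neither detail is specified in the text. The front used here is made up and is not the TEAM problem data.

```python
import math

def utopia_point(front):
    """Vector of the best (minimum) value of each objective over the front."""
    return [min(values) for values in zip(*front)]

def closest_to_utopia(front):
    """Pareto point with the smallest Euclidean distance to the utopia point."""
    target = utopia_point(front)
    return min(front, key=lambda point: math.dist(point, target))

# Made-up bi-objective Pareto front
front = [(0.0, 4.0), (1.0, 1.0), (4.0, 0.0)]
print(utopia_point(front))
print(closest_to_utopia(front))
```

If the objectives differ greatly in scale, normalizing each objective before measuring the distance would be a natural refinement.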

## 6 Conclusion

A novel approach to kriging-based multi-objective optimization, relying on the Localized Probability of Improvement, is put forward. For illustration purposes a bi-objective test problem is provided, as well as the recently introduced TEAM benchmark problem. It is shown that the proposed method efficiently addresses both the diversification and the uniformity of the Pareto solution, is computationally efficient and scales linearly to a higher number of objectives.

## References

- [1] Schaffer J.D., Multiple objective optimization with vector evaluated genetic algorithms, Proceedings of the 1st International Conference on Genetic Algorithms, 1985, 93-100
- [2] Deb K., Agrawal S., Pratap A., Meyarivan T., A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II, in Schoenauer M. et al. (eds), Parallel Problem Solving from Nature PPSN VI, 2000
- [3] Parsopoulos K.E., Vrahatis M.N., Particle swarm optimization method in multiobjective problems, SAC’02 Proceedings of the 2002 ACM Symposium on Applied Computing, 2002, 603-607
- [4] Marler R.T., Arora J.S., Survey of multi-objective optimization methods for engineering, Structural and Multidisciplinary Optimization, 2004, 26, 6, 369
- [5] Zitzler E., Thiele L., Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach, IEEE Transactions on Evolutionary Computation, 1999, 3, 4, 257-271
- [6] Knowles J., Corne D., On metrics for comparing nondominated sets, Evolutionary Computation, Proceedings of CEC’02, Honolulu, 2002, 711-716
- [7] Barba P.D., Mognaschi M.E., Song X., Lowther D.A., Sykulski J.K., A benchmark TEAM problem for multiobjective Pareto optimization of electromagnetic devices, IEEE Transactions on Magnetics, 2017, PP, 99