Artificial neural networks for inverse design of resonant nanophotonic components with oscillatory loss landscapes

Abstract Machine learning offers the potential to revolutionize the inverse design of complex nanophotonic components. Here, we propose a novel variant of this formalism specifically suited for the design of resonant nanophotonic components. Typically, the first step of an inverse design process based on machine learning is training a neural network to approximate the non-linear mapping from a set of input parameters to a given optical system’s features. The second step starts from the desired features, e.g. a transmission spectrum, and propagates back through the trained network to find the optimal input parameters. For resonant systems, this second step corresponds to a gradient descent in a highly oscillatory loss landscape. As a result, the algorithm often converges into a local minimum. We significantly improve this method’s efficiency by adding the Fourier transform of the desired spectrum to the optimization procedure. We demonstrate our method by retrieving the optimal design parameters for desired transmission and reflection spectra of Fabry–Pérot resonators and Bragg reflectors, two canonical optical components whose functionality is based on wave interference. Our results can be extended to the optimization of more complex nanophotonic components interacting with structured incident fields.


Introduction
Inverse design of optical components is the process of calculating the material properties that yield a specific optical response. Popular computational methods for inverse design are evolutionary algorithms [1] and density topology optimization [2][3][4].
A prevalent method in density topology optimization is the adjoint method. This technique provides a way to efficiently compute gradients of a loss function with respect to design parameters, as was demonstrated in the design of demultiplexers that separate light of different wavelengths [5,6]. A similar method underlies the design of tunable metasurfaces [7] and nonlinear nanophotonic devices [8]. Recently, Angeris et al. [9] found a technique to calculate the bounds on the performance of these optical structures.
An interesting alternative formalism was introduced by Peurifoy et al. in 2018 [10]. The authors applied deep learning to the inverse design of multi-layered nanoparticles in two different steps. First, a neural network is trained to predict the particles' scattering cross-section as a function of each shell's thickness. This network is trained using a supervised learning algorithm. Second, this trained network is used to perform gradient descent on the design parameters. In other words, after the first training step, all weights and biases of the network are fixed, and only the input parameters can be updated to reduce the distance between the predicted and the desired scattering cross-section. So et al. improved this technique for the inverse design of nanoparticles in 2019 [11] by adapting the loss function of the neural network to learn the thickness as well as the material of each layer. More advanced neural network architectures, such as tandem networks [12], Generative Adversarial Networks (GANs) [13][14][15][16] and autoencoders [17,18], have also been successfully used in the field of inverse design for optical components. In addition, deep learning has found use in various vibrant areas of nanophotonics, including integrated photonics [19], holography [20], diamond photonics [21], plasmonic nanoparticles [22] and topological photonics [23]. More examples can be found in recent reviews [24,25].
A problem arises when applying these inverse design methods to optical systems with strong resonances. These resonances are often based on the interference of many partial waves. A highly oscillatory loss function originates because of the periodicity of the phase difference between the interfering waves. The gradient-based inverse design thus often gets stuck in local minima, far away from the global optimum. In this contribution, we propose a solution to this problem, based on Fourier analysis.

Predicting the transmission spectra of simple resonators
We first illustrate our technique using a simple one-dimensional Fabry-Pérot resonator: a thin slice of a dielectric medium. When light hits the resonator, it is reflected several times inside the medium. The interference of these partial waves heavily influences the reflected and transmitted light intensities. This leads to a non-trivial dependence of the transmission and reflection spectra on the wavelength of the incident light and the physical parameters of the resonator, as illustrated in Figure 1(A). Four parameters determine the interference of the partial waves: the wavelength of the incoming light λ, the angle of incidence θ, the index of refraction n and the width of the resonator l. The wavelength-dependent transmission is then given by

T(λ) = 1 / [1 + F sin²(δ₀/(2λ))],   (1)

where F(θ, n) = 4R/(1 − R)² is the coefficient of finesse, and δ₀(θ, n, l) = 4πnl cos(θ_mat) is a parameter that encodes the phase difference between the different partial waves that are transmitted. Finally, the parameter θ_mat is the transmission angle inside the resonator, determined by the incident angle and the index of refraction, in agreement with Snell's law.
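With these definitions, the transmission of Eq. (1) can be evaluated directly. A minimal sketch in Python; the reflectance, refractive index, thickness and wavelength range chosen here are illustrative, not values from the paper:

```python
import numpy as np

def fabry_perot_transmission(lam, F, delta0):
    """Airy transmission of a lossless Fabry-Perot resonator, Eq. (1):
    T(lambda) = 1 / (1 + F * sin^2(delta0 / (2 * lambda)))."""
    return 1.0 / (1.0 + F * np.sin(delta0 / (2.0 * lam)) ** 2)

# Illustrative parameters: surface reflectance R = 0.6, n = 1.5,
# l = 1 um, normal incidence (cos(theta_mat) = 1).
R = 0.6
F = 4 * R / (1 - R) ** 2                 # coefficient of finesse
delta0 = 4 * np.pi * 1.5 * 1e-6          # 4*pi*n*l*cos(theta_mat)
lam = np.linspace(400e-9, 800e-9, 200)   # 200 sampled wavelengths
T = fabry_perot_transmission(lam, F, delta0)
```

Since sin² is bounded by one, T stays between 1/(1 + F) and 1, oscillating faster for larger δ₀.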
We train a neural network to predict the spectral transmission T(λ) from given values of F and δ₀. The network consists of an input layer with two nodes, a number of hidden layers with adjustable weights and non-linear activation functions, and an output layer with 200 nodes, representing the transmission spectrum sampled at different wavelengths. During the training procedure, a batch of values of F and δ₀, all normalized between −1 and 1, is provided as input to the network. The network performs a non-linear mapping from this input to an output in the form of a transmission spectrum; the mean squared error (MSE) between this output and the actual transmission spectrum is the loss function. Subsequently, the gradient of this loss function is propagated back through the network to update the values of the weights. This procedure gradually improves the spectral predictions of the network. After a number of epochs (in each epoch, the entire training data set is used once), the predictions of the network closely match the actual spectra. This is visualized in Figure 1.
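The supervised training set for this step can be generated directly from Eq. (1). A sketch of the data generation, where the physical parameter ranges and the normalization bounds are assumptions for illustration rather than the ranges used in the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def make_dataset(n_samples, n_wavelengths=200):
    """Sample (F, delta0) pairs, normalize them to [-1, 1] as network
    inputs, and compute the corresponding transmission spectra as targets."""
    lam = np.linspace(400e-9, 800e-9, n_wavelengths)
    # Assumed physical ranges (illustrative only).
    F_lo, F_hi = 1.0, 100.0
    d_lo, d_hi = 1.0e-5, 2.0e-5
    F = rng.uniform(F_lo, F_hi, n_samples)
    d0 = rng.uniform(d_lo, d_hi, n_samples)
    # Network inputs, normalized between -1 and 1.
    x = np.stack([2 * (F - F_lo) / (F_hi - F_lo) - 1,
                  2 * (d0 - d_lo) / (d_hi - d_lo) - 1], axis=1)
    # Targets: spectra sampled at 200 wavelengths, Eq. (1).
    y = 1.0 / (1.0 + F[:, None] * np.sin(d0[:, None] / (2.0 * lam[None, :])) ** 2)
    return x, y

x_train, y_train = make_dataset(n_samples=1000)
```

Each row of `x_train` is one normalized input pair and the matching row of `y_train` is the 200-point spectrum the network learns to reproduce.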
We performed a search over network architectures and activation functions. The best performing neural network had six hidden layers, 200 nodes in each hidden layer, and the swish activation function. The full details of the metasearch and the resulting network are given in the Supplementary material. The final network model obtains a test mean absolute error (MAE) of 0.485 ± 0.060%. Some examples of predicted spectra T(λ) during training are shown in Figure 1(C)-(F). The target spectrum or ground truth, calculated using Eq. (1), is shown in orange; the predictions during different phases of the training are shown in blue. The spectra with sharp peaks are somewhat harder to predict. This can be expected, since we sample the transmission at 200 points. A denser sampling of the transmission should lead to better predictions for transmissions with sharper peaks.

Inverse design in an oscillatory loss landscape
In the second step of the inverse design procedure, we harness the fully trained network and the fact that its structure allows us to perform a gradient descent from output to input in a straightforward way. Our goal is to find the material parameters that yield the desired transmission spectrum T(λ). The input values are initialized with random values, representing a first guess of the material parameters F and δ₀. Unlike before, only these input values are updated at this point. The weights of the network remain fixed throughout this step. The neural network is used to compute the transmission spectrum from its input values, and subsequently the MSE between the obtained and desired spectrum yields the loss function

L_MSE = (1/N) Σ_i [T_pred(λ_i) − T(λ_i)]²,   (2)

where T_pred(λ_i) is the transmission predicted by the network and N = 200 is the number of sampled wavelengths. The gradient of this loss function is propagated back to the input nodes, where the values are adjusted in a direction that minimizes this loss function. Over time, the input values thus converge to the material parameters whose predicted transmission spectrum is most similar to the desired transmission T(λ).
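This input-only gradient descent can be sketched with the analytic formula standing in for the trained network. In the actual method the gradient of Eq. (2) comes from backpropagation through the fixed network; here a finite-difference gradient is used instead, and the normalized parameter ranges, starting point, learning rate and step counts are illustrative assumptions:

```python
import numpy as np

lam = np.linspace(400e-9, 800e-9, 200)

def forward(x):
    """Stand-in for the trained network: maps normalized inputs in [-1, 1]
    to a transmission spectrum via the analytic formula."""
    F = 50.5 + 49.5 * x[0]        # assumed range: F in [1, 100]
    d0 = 1.5e-5 + 0.5e-5 * x[1]   # assumed range: delta0 in [1e-5, 2e-5] m
    return 1.0 / (1.0 + F * np.sin(d0 / (2.0 * lam)) ** 2)

def mse(x, target):
    return np.mean((forward(x) - target) ** 2)

def descend_inputs(target, x0, lr=5e-3, steps=500, eps=1e-6):
    """Gradient descent on the inputs only; the 'network' stays fixed."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = np.zeros(2)
        for i in range(2):
            d = np.zeros(2)
            d[i] = eps
            # Central finite difference of the spectral MSE.
            g[i] = (mse(x + d, target) - mse(x - d, target)) / (2 * eps)
        x = np.clip(x - lr * g, -1.0, 1.0)
    return x

x_true = np.array([-0.5, 0.2])
target = forward(x_true)
x0 = x_true + np.array([0.2, 0.05])   # initial guess inside the global basin
x_hat = descend_inputs(target, x0)
```

Starting inside the global basin, the loss decreases steadily; starting further away in the δ₀ direction, the same descent gets trapped in a local minimum, which is exactly the failure mode discussed next.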
We repeatedly encountered one specific problem in our experiments: convergence to one of the local minima in the oscillatory loss landscape. As a result, without carefully setting the initial values of F and δ₀, the global minimum is often not found. This is illustrated in Figure 2 and in the left panel of Figure 3. Figure 2(A) depicts the trained network, while Figure 2(B) shows the desired transmission in orange along with the transmission spectra obtained for two different sets of initial values of F and δ₀ in blue and red. Figure 2(C) shows the mean squared error (MSE) between the desired and obtained transmissions during training. The orange line indicates the loss for predictions with the correct parameters F and δ₀. The other panels show the evolution of the parameters F and δ₀; here, the orange line marks the true parameters from which the desired transmission T(λ) was computed. We distinguish two cases: one where inverse design works and converges to the global minimum, and one where the gradient descent gets stuck in a local minimum. In the latter case, the obtained and desired transmissions do not look alike, and the loss is considerably higher than the one obtained for the correct parameters F and δ₀. In the former case, we choose an initial value sufficiently close to the true δ₀, such that the gradient descent does converge to the global optimum. The spectra are then indeed very similar (orange and blue lines), and the loss is as low as it would be for the true parameters.
In Figure 3(A), we see that the loss landscape has strong oscillations along the direction of δ₀, which cause the local minima. We can illustrate this further by looking at cross-sections of the loss landscape. For a section at constant δ₀, the MSE loss is perfectly suited for gradient descent in the direction of F: there is only one minimum to which we can converge. The story is different for the parameter δ₀ (Figure 3(B)-(C)). The MSE loss shows sharp local minima, from which it is difficult to escape.
In the specific case of oscillatory loss landscapes, as found in the inverse design of resonant components, we propose a more efficient solution, grounded in physical arguments: the key idea is to take the Fourier transform of the desired spectrum T(λ) into account. Since δ₀ is linked to the number of oscillations in the transmission spectrum, meaningful information about δ₀ is adequately captured by the Fourier transform of the transmission spectrum. Therefore, we introduce a second loss function, the MSE on the Fourier power spectrum of the target function T(λ),

L_Fourier = (1/M) Σ_k [P_pred(k) − P(k)]²,   (3)

where P(k) and P_pred(k) are the normalized Fourier power spectra of the desired and predicted transmissions, and M is the number of frequency bins considered. We show the loss landscape of this function in Figure 3(D)-(F). Here, we only consider the first 10 frequency bins (M = 10), since the transmission spectra do not oscillate more than 10 times in the region of interest. To extract the information on the relative power in the different frequencies, we normalize the total power spectrum to one. The relative power in the different frequencies then depends only minimally on the oscillation amplitude. This implies that the normalized power spectrum is insensitive to the parameter F, which controls the oscillation amplitude. It also means that every cross-section of the loss landscape along δ₀ with constant F will be approximately the same. We see in Figure 3(D) that this is indeed the case.
We can use this fact to initialize δ₀ sufficiently close to the true value to find the global minimum more quickly in the gradient descent algorithm. Two sections, at F = 0 and at the true parameter value F = −0.932, are plotted in Figure 3(E)-(F). These sections reach their global minimum at similar values of δ₀. Thus, we can compute this value of δ₀ without knowing the true value of F beforehand: simply computing the minimum of the section of the Fourier loss at a chosen value of F, e.g. F = 0, yields the desired result.
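A sketch of this δ₀-initialization step, with the analytic Airy formula standing in for the trained network. The parameter values, candidate grid, and the fixed F at which the section is taken are illustrative assumptions; only the mechanism (match normalized Fourier power in the first 10 bins, take the section's minimum) follows the text:

```python
import numpy as np

lam = np.linspace(400e-9, 800e-9, 200)

def spectrum(F, d0):
    return 1.0 / (1.0 + F * np.sin(d0 / (2.0 * lam)) ** 2)

def norm_power(T, n_bins=10):
    """Fourier power in the first 10 frequency bins, normalized to one."""
    p = np.abs(np.fft.rfft(T - T.mean())[:n_bins]) ** 2
    return p / p.sum()

def fourier_loss(F, d0, target_power):
    """MSE between normalized Fourier power spectra, Eq. (3)."""
    return np.mean((norm_power(spectrum(F, d0)) - target_power) ** 2)

def init_delta0(target, F_section, candidates):
    """delta0 at the minimum of the Fourier-loss section at fixed F."""
    target_power = norm_power(target)
    losses = [fourier_loss(F_section, d, target_power) for d in candidates]
    return candidates[int(np.argmin(losses))]

F_true, d0_true = 25.0, 1.62e-5
target = spectrum(F_true, d0_true)
candidates = np.linspace(1.0e-5, 2.0e-5, 401)
# The section is nearly independent of F, so a mid-range guess suffices.
d0_init = init_delta0(target, F_section=50.0, candidates=candidates)
```

Because the normalized power spectrum is almost insensitive to F, the minimum found at the guessed F lands close to the true δ₀ even though F_true is unknown.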
Unfortunately, the assumption that the sections of the MSE on the Fourier power spectra are equal for all values of F is not entirely valid, as we already see in Figure 3. To mitigate this problem, we re-initialize δ₀ after a number of iterations of gradient descent, when we are closer to the true parameter F. In our algorithm, we make this update every 100 epochs. For values of F that are closer to the true parameter, the sections are more similar to the section through the global minimum. This way, the initial value of δ₀ gets closer to the true parameter every time we re-initialize.
In the end, we are not interested in matching the Fourier transforms of the spectra, but merely in matching the spectra themselves. Moreover, the Fourier power spectrum used so far does not take information on the phase of the different frequencies into account. Our goal is therefore better achieved by periodically re-initializing δ₀ with a combined loss function: the sum of the MSE on the spectra and the MSE on the Fourier power spectra of the spectra,

L = L_MSE + L_Fourier.   (4)

Our algorithm thus alternately minimizes two loss functions. It minimizes the MSE on the spectrum through gradient descent, yielding better values for F and δ₀. We avoid getting stuck in local minima during this process by periodically minimizing the second, combined loss function L given above. The latter minimization consists of a straightforward computation of the value of δ₀ corresponding to a minimum of the loss at the current value of F. The other device parameter F is initialized at 0.0. To verify whether a different initialization would influence the results, we repeated the inverse design of one transmission from the test set for 100 different initializations of F; all simulations converged to the optimal solution. In the following, we explain how this method considerably improves the efficiency of the search in the parameter space.
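The full alternating procedure can be sketched end-to-end, with the analytic formula standing in for the trained network and finite differences standing in for backpropagation. Parameter ranges, learning rates, step counts and the candidate grid are all illustrative assumptions; the structure (gradient descent on L_MSE, with δ₀ re-initialized from the combined loss L every 100 steps) follows the algorithm described above:

```python
import numpy as np

lam = np.linspace(400e-9, 800e-9, 200)
candidates = np.linspace(1.0e-5, 2.0e-5, 401)   # assumed delta0 grid

def spectrum(F, d0):
    return 1.0 / (1.0 + F * np.sin(d0 / (2.0 * lam)) ** 2)

def norm_power(T, n_bins=10):
    p = np.abs(np.fft.rfft(T - T.mean())[:n_bins]) ** 2
    return p / p.sum()

def spectral_mse(F, d0, target):
    return np.mean((spectrum(F, d0) - target) ** 2)

def combined_loss(F, d0, target):
    """L = L_MSE + L_Fourier, used for the periodic re-initialization."""
    return spectral_mse(F, d0, target) + np.mean(
        (norm_power(spectrum(F, d0)) - norm_power(target)) ** 2)

def reinit_delta0(F, target):
    """Minimize the combined loss over the delta0 grid at the current F."""
    losses = [combined_loss(F, d, target) for d in candidates]
    return candidates[int(np.argmin(losses))]

def inverse_design(target, steps=600, period=100):
    F = 50.0                                  # mid-range initialization for F
    d0 = reinit_delta0(F, target)             # Fourier-informed start for delta0
    for step in range(1, steps + 1):
        if step % period == 0:
            d0 = reinit_delta0(F, target)     # periodic re-initialization
        # Finite-difference gradients of the spectral MSE (hand-tuned steps).
        eF, ed = 1e-4, 1e-11
        gF = (spectral_mse(F + eF, d0, target)
              - spectral_mse(F - eF, d0, target)) / (2 * eF)
        gd = (spectral_mse(F, d0 + ed, target)
              - spectral_mse(F, d0 - ed, target)) / (2 * ed)
        F -= 1e3 * gF                          # per-parameter learning rates
        d0 -= 1e-13 * gd
    return F, d0

F_true, d0_true = 25.0, 1.62e-5
target = spectrum(F_true, d0_true)
F_hat, d0_hat = inverse_design(target)
```

The Fourier-based re-initialization keeps δ₀ locked in the global basin, so the gradient descent only has to refine F, following an almost straight path to the optimum.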

Comparison with other search algorithms
Several solutions exist to overcome the premature convergence on local minima. One solution consists of increasing the stochastic features of the gradient descent. In this way, updates to higher loss regions can occur, allowing the search algorithm to escape from the local minima. Another solution consists of abandoning the neural network gradient descent altogether: we could use the trained network only in the forward direction, to compute the loss for any given set of input parameters, and then combine this with a stochastic local search algorithm such as simulated annealing. Both these approaches potentially visit a vast area of the parameter space before finding the global minimum, thus decreasing the inverse design procedure's efficiency.
To validate our algorithm, we compare it to a classic local search algorithm in the form of simulated annealing (SA) [26]. We use SA to minimize the loss function obtained from the trained neural network's predictions. For a fair comparison, we use both the classic SA and the more advanced dual annealing (DA) algorithm [27]. The DA algorithm combines a larger neighbourhood function with an additional local search and a restart mechanism, resulting in a highly performant search process.
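As an illustrative baseline, SciPy's implementation of dual annealing can minimize the same kind of spectral MSE directly over the two parameters. The forward model and the bounds below are assumptions for illustration; in the paper, the loss is evaluated through the trained network:

```python
import numpy as np
from scipy.optimize import dual_annealing

lam = np.linspace(400e-9, 800e-9, 200)

def spectrum(F, d0):
    return 1.0 / (1.0 + F * np.sin(d0 / (2.0 * lam)) ** 2)

F_true, d0_true = 25.0, 1.62e-5
target = spectrum(F_true, d0_true)

def loss(params):
    """Spectral MSE as a black-box objective for the annealer."""
    return np.mean((spectrum(params[0], params[1]) - target) ** 2)

result = dual_annealing(loss,
                        bounds=[(1.0, 100.0), (1.0e-5, 2.0e-5)],
                        maxiter=500, seed=42)
F_hat, d0_hat = result.x
```

DA typically does reach the global minimum of this oscillatory objective, but only after sampling a large part of the parameter space, which is the efficiency gap quantified below.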
We evaluate our inverse design algorithm and the two simulated annealing algorithms on 1000 transmission spectra from the Fabry-Pérot test set. The metric is the mean absolute error (MAE) between the obtained and desired spectra. We consider a global minimum to be found when the MAE is less than 0.01.
Without fine-tuning of the hyperparameters, both our algorithm and DA reached the global minimum in 95% of the cases, while the SA algorithm never found the global minimum. The actual improvement, however, lies in the number of steps taken before the global minimum is reached: DA covers a large part of the parameter space, while our approach by design follows a simple path, almost straight to the global minimum. The path taken by the DA algorithm is 84 times longer than the path of our optimization algorithm. This is further illustrated in the Supplementary material.

More general resonant devices
To test whether our method works for problems more complex than the Fabry-Pérot resonator, we also applied it to Bragg reflectors. A Bragg reflector consists of two alternating layers of materials with different refractive indices. The interference of many internal reflections again leads to a non-trivial wavelength-dependent reflection spectrum R(λ). Now four parameters determine the spectra: the difference Δn between the two refractive indices of the materials, the mean refractive index n of the whole structure, the total length Λ of a unit of two alternating layers, and the number p of these stacked units. The underlying analytical formula for R(λ) is given in the Supplementary material.
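As an illustrative stand-in for that formula, R(λ) of such a stack can be computed with the standard characteristic-matrix (transfer-matrix) method. The parametrization below follows the text, but taking each of the two layers to be Λ/2 thick, along with the incident and substrate indices and the numerical values, are assumptions made for this sketch:

```python
import numpy as np

def bragg_reflection(lam, dn, n_mean, Lam, p, n_in=1.0, n_sub=1.5):
    """Reflection spectrum R(lambda) of p pairs of alternating lossless
    layers (n_mean +/- dn/2), each Lam/2 thick, at normal incidence,
    via the characteristic-matrix method."""
    n_hi, n_lo = n_mean + dn / 2, n_mean - dn / 2
    R = np.empty_like(lam)
    for i, lm in enumerate(lam):
        M = np.eye(2, dtype=complex)
        for _ in range(p):
            for n_j in (n_hi, n_lo):
                delta = 2 * np.pi * n_j * (Lam / 2) / lm   # layer phase thickness
                Mj = np.array([[np.cos(delta), 1j * np.sin(delta) / n_j],
                               [1j * n_j * np.sin(delta), np.cos(delta)]])
                M = M @ Mj
        B, C = M @ np.array([1.0, n_sub])
        r = (n_in * B - C) / (n_in * B + C)
        R[i] = np.abs(r) ** 2
    return R

# Illustrative stack: n_hi = 2.3, n_lo = 1.45, 8 pairs, with Lam chosen so
# that one pair is a half-wave of the mean index at 600 nm (stopband center).
lam0 = 600e-9
n_mean, dn, p = 1.875, 0.85, 8
Lam = lam0 / (2 * n_mean)
lam = np.linspace(400e-9, 800e-9, 200)
R = bragg_reflection(lam, dn, n_mean, Lam, p)
```

With this index contrast and pair count, the spectrum shows the expected high-reflectance stopband around 600 nm flanked by oscillatory sidelobes.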
We trained a neural network to learn the mapping from the four resonator parameters to the reflection R(λ). Then we evaluated our inverse design approach on three spectra from the test set. We used the second loss function to initialize the four parameters, re-initializing every 300 epochs for Λ and Δn and every 600 epochs for n and p. The results of these experiments are shown in Figure 4(A)-(C). These figures show that our inverse design is capable of scaling to a problem with four parameters. In the Supplementary material, we also show that the performance of the inverse design is not influenced by the number of stacked layers p. The dual annealing algorithm is also able to tackle this problem well; however, it takes five times longer to converge, illustrating our algorithm's superior efficiency.
To show that we can also approximate arbitrary reflection spectra with our inverse design method, we borrowed a famous example of a one-dimensional curve from the novel "Le petit prince" by Antoine de Saint-Exupéry. We made a piecewise approximation of an elephant's shape and performed inverse design on this arbitrary "reflection" spectrum. The result, shown in Figure 4(D), illustrates that our inverse design method is capable of approximating a broad range of reflection spectra.

Conclusion
Finding the global minimum in a parameter space where there are many local minima is a significant problem in inverse design. In this work, we show that, in the context of resonant photonic components, this problem can be addressed by constructing a combined loss function that also fits the Fourier representation of the target function. Physically, this idea is grounded in the fact that the Fourier representation contains information about the target function's oscillatory behaviour. For many resonant photonic components, these oscillations straightforwardly correspond to specific parameters in the input space. We demonstrated our technique with Fabry-Pérot and Bragg resonators and quantified the superior search efficiency compared to other optimization methods. Our technique will be particularly valuable for inverse design of optical components based on many nanophotonic resonators, such as coupled or multi-layered resonators [28][29][30], lasers [31], dispersion-engineered metasurfaces [32] and (nonlinear) resonators in the context of integrated optics [33,34]. Finally, because this method can incorporate several parameters of the incident light, we think it will be interesting to study and optimize complex materials interacting with structured light beams [35][36][37][38][39].