Iterative optimization of photonic crystal nanocavity designs by using deep neural networks

Devices based on two-dimensional photonic-crystal (2D-PC) nanocavities, which are defined by their air hole patterns, usually require a high quality (Q) factor to achieve high performance. We demonstrate that hole patterns with very high Q factors can be found efficiently by an iterative procedure consisting of: machine learning of the relation between the hole pattern and the corresponding Q factor, and new dataset generation based on the regression function obtained by the machine learning. First, a dataset comprising randomly generated cavity structures and their first-principles Q factors is prepared. Then a deep neural network is trained on the initial dataset to obtain a regression function that approximately predicts the Q factors from the structural parameters. Several candidates for higher Q factors are chosen by searching the parameter space using the regression function. After adding these new structures and their first-principles Q factors to the training dataset, the above process is repeated. As an example, a standard silicon-based L3 cavity is optimized by this method. A cavity design with a high Q factor exceeding 11 million is found within 101 iteration steps and a total of 8070 cavity structures. This theoretical Q factor is more than twice the previously reported record values for cavity designs found by an evolutionary algorithm and by the leaky mode visualization method. We find that structures with higher Q factors can be detected within fewer iteration steps by exploring not only the parameter space near the current highest-Q structure but also regions distant from the present dataset.


INTRODUCTION
Photonic nanocavities based on artificial defects in two-dimensional (2D) photonic-crystal (PC) slabs [1-11] have received significant attention as structures that enable preservation of photons for extended times in small modal volumes. 2D-PC slab cavities are usually defined by defects in the triangular air hole lattice of the PC. For example, cavities can be defined by a defect consisting of three missing air holes (the so-called L3 cavity), a single missing hole (H0 cavity), or a line defect with a modulation of the lattice constants (heterostructure cavity). Photons of the cavity modes are confined in such nanocavities in the in-plane and vertical directions by Bragg reflection due to the air hole pattern of the 2D PC and total internal reflection due to the refractive index contrast between the PC slab and the surrounding air or cladding layers, respectively. We note that the in-plane reflection is usually almost perfect while the vertical reflection is only partial [2]. Thus, the total spectral intensity of the wavevector components that do not fulfill the total internal reflection condition, i.e., the leaky components, determines the cavity's quality (Q) factor [12]. So far, various methods of optimizing cavity designs with respect to the Q factor have been proposed and demonstrated [2-5,12-19]. Among them, the Gaussian envelope approaches [2,3], the leaky position visualization approach [17], and the analytic inverse problem approaches [13,14] utilize the knowledge of the physics of photon confinement mentioned above. For instance, the analytic inverse problem approaches are based on approximations that relate the cavities' structural parameters to the mode fields, and thus allow us to explicitly determine an optimized cavity geometry with fewer leaky components [13,14].
This type of approach is very useful for optimizing specific structural parameters, but its targets are limited because suitable analytical expressions are only available for certain cavity types. On the other hand, the Gaussian envelope and leaky position visualization approaches improve cavity designs based on the differences between the mode field calculated for the actual structure and an ideal mode field, which is artificially generated and has a minimum of leaky components [2,3,17]. The comparison of these fields enables the identification of spatial positions where leakage of photons occurs. However, since these approaches cannot predict the optimized structure, the modifications required for a reduction of leakage have to be identified manually by trial and error. While these approaches are useful in early optimization stages, they cannot utilize the large degree of freedom that is inherent to the 2D geometry of the air hole pattern. Reports on the optimization of 2D-PC nanocavity designs by these approaches have so far considered only up to nine structural parameters (e.g., symmetric displacements of certain holes) [2,3,17], because it is difficult to manually locate better air hole patterns in the high-dimensional parameter space spanned by the positions of all individual air holes. Obviously, more systematic and automated methods of exploring high-dimensional parameter spaces are required to fully utilize the potential of 2D-PC nanocavities.
Minkov et al. utilized a genetic algorithm to explore the parameter space of the 2D-PC air hole pattern, and succeeded in tuning up to 11 parameters to find better-suited nanocavity structures without using physical knowledge of the leaky components [15,16,18]. However, this approach requires a relatively large number of randomly generated sample cavity structures and their calculated Q factors: they reported that 100 cycles × 80 individuals = 8000 sample cavities (300 cycles × 120 individuals = 36000 sample cavities) were required to optimize five (seven) parameters of the L3 (H0) cavity [16]. The relatively large number of required sample cavities is considered to be a consequence of the genetic algorithm, which basically utilizes only the good cavities among the sample cavities generated in each cycle. Recently, we proposed an approach based on deep learning, demonstrating the optimization of 27 parameters of a heterostructure cavity using a training dataset consisting of 1000 randomly generated air hole patterns and their calculated Q factors [19]. In [19], we trained a neural network (NN) on the sample dataset to obtain an approximate function of the Q factor with respect to the structural parameters. This regression function was then employed to detect new cavity structures that are likely to exhibit higher Q factors. The important point is that not only high-Q structures but also moderate- or low-Q structures can be useful when searching for new cavity geometries with higher Q factors (since both improve the accuracy of the regression function developed by the NN), although high-Q sample cavity structures are of course more helpful. However, one problem of this approach is that structures with Q factors much higher than that of the base cavity design are rarely generated during the preparation of the training dataset. Therefore, the accuracy of the regression function in the region of parameter space corresponding to extremely high Q factors is low.
In this report, we propose an iterative optimization method to overcome this problem: the candidate structures for higher Q factors identified by the regression function at the present iteration step are added to the training dataset for the next step. The new dataset is used to derive an improved regression function. To increase the diversity of the new candidates, several different candidate-selection constraints are defined and their combinations are used to efficiently explore the parameter space. In order to avoid strong influences of initial discoveries, one constraint is that a new candidate should lie in a region of parameter space distant from the structures that have already been analyzed. Additionally, we employ several NNs that learn the dataset in different orders, resulting in different regression functions. With these we can partly account for the uncertainty of the prediction by a NN. By repeating the optimization cycles, cavity structures that are important for the detection of high-Q cavity structures are automatically accumulated in the dataset. To demonstrate this, we optimize the design of a silicon (Si) L3 cavity via 25 parameters. We are able to detect a structure with a maximum Q factor of 1.1×10^7 by generating a total of 8070 sample structures within 101 iterations. This theoretical Q factor is more than two times larger than the Q factors of Si-based L3 cavity structures found by the genetic algorithm [16] and leaky mode visualization approaches [17].

FRAMEWORK
In this section we explain the procedures of the proposed iterative optimization method, which contains the preparation, learning, structure search, validation, and dataset update phases. The latter four phases are repeated to iteratively improve the regression function developed in the learning phase and the subsequent structure search. The general design of the preparation and learning phases can be found in [19]. First of all, we assume that the type of 2D-PC cavity to be optimized is known (in Section 3 we choose the L3 cavity). Next, the preparation phase consisting of the following three procedures (I) to (III) has to be implemented:
I. Select the structural parameters of the base cavity (such as air hole positions and radii) that should be considered for optimization. Generate many sample cavity structures by randomly varying the selected parameters within a certain meaningful range.
II. Calculate the Q factors of the sample cavities generated in (I) by a first principles method to obtain the training dataset consisting of the sample cavity structures and the corresponding Q factors.
III. Prepare deep NNs that have input nodes corresponding to the structural parameters selected in (I) and have a single output node corresponding to the Q factor.
The learning phase is described by the following procedure:
IV. Train the deep NNs prepared in (III) to learn the relation between the structure and the Q factor, using the dataset prepared in (I) and (II) (for the first round only) or the updated dataset obtained in (VII) (for the following rounds). Let each deep NN learn the dataset in a different order so that they acquire different approximation functions of Q.
The structure search phase consists of
V. Starting from a randomly chosen initial cavity structure, gradually change the structural parameters using the gradient (in the parameter space) of the approximated Q factor predicted by a trained deep NN. By this process, one new candidate structure with a potentially higher Q factor is located. Various candidate structures are prepared by using different deep NNs and by applying different constraints (described later).
The validation phase is straightforward:
VI. Determine the accurate Q factors of the candidate structures by a first-principles calculation.
After the learning, structure search, and validation phases, the training dataset is updated and the next iteration cycle is carried out as follows:
VII. Add the sets of the structures obtained in (V) and the Q factors calculated in (VI) to the training dataset.
By repeating the procedures (IV)-(VII), the sample cavities that are important for locating high-Q structures are automatically accumulated, because both correct and wrong predictions constitute important information for the development of an improved regression function. Figure 1 briefly illustrates the concept of the approach for optimization explained above.
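The iteration cycle (IV)-(VII) can be sketched as a single loop. All function bodies below are toy stand-ins: fit_regressor, gradient_search, and fdtd_q_factor are hypothetical placeholders for the NN training, the gradient-based candidate search of step (V), and the 3D-FDTD calculation, respectively.

```python
import math, random

def fit_regressor(ordered_samples):
    # stand-in for training one deep NN on the dataset in a given order
    mean_log_q = sum(lq for _, lq in ordered_samples) / len(ordered_samples)
    return lambda x: mean_log_q

def gradient_search(net, dataset, rng):
    # stand-in for step (V): return one new candidate structure (25 parameters)
    return [rng.uniform(-0.1, 0.1) for _ in range(25)]

def fdtd_q_factor(x, rng):
    # stand-in for the first-principles Q calculation of step (VI)
    return 10 ** rng.uniform(3, 5)

def optimization_cycle(dataset, n_nets=10, n_candidates=70, seed=0):
    rng = random.Random(seed)
    nets = []
    for s in range(n_nets):                    # (IV) train several NNs,
        order = list(dataset)                  # each fed in a different order
        random.Random(s).shuffle(order)
        nets.append(fit_regressor(order))
    for _ in range(n_candidates):              # (V) candidate generation
        x = gradient_search(rng.choice(nets), dataset, rng)
        q = fdtd_q_factor(x, rng)              # (VI) validation
        dataset.append((x, math.log10(q)))     # (VII) dataset update
    return dataset
```

Each call to optimization_cycle grows the dataset by the 70 validated candidates, mirroring how the real procedure accumulates sample cavities round by round.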
[Figure 1: Concept of the iterative optimization. (I-II) Dataset preparation. (IV) NNs are trained on the dataset to obtain a regression function QNN(x) that approximates Q(x). Problem: find the structure x that maximizes Q(x). The value of Q(x) for a structure can be computed by a first-principles method, but at large cost. QNN is only used to locate new structures via its gradient; its values are not explicitly discussed in this work.]

OPTIMIZATION OF THE CAVITY DESIGN FOR A SI-BASED L3 NANOCAVITY
In this section, we demonstrate the optimization of the cavity design of an L3 cavity made of Si by the proposed iterative optimization. The results are useful for device development and also provide a benchmark for the optimization performance of the present algorithm. The numbers given can be compared with those of previously reported methods [16,17], because the Si-based L3 nanocavity is a standard 2D-PC nanocavity. Figure 2 shows the basic structure of the L3 nanocavity considered here, where the lattice constant is a, the radius of each air hole is 0.25a, the thickness of the slab is 0.5366a, and the refractive index of the slab material (Si) is n = 3.46. These values were chosen considering the standard dimensions of fabricated nanocavities (a = 410 nm, t = 220 nm) operating at optical communication wavelengths [9,20], and the refractive index of Si at these wavelengths. The radii of the air holes are the same as those used in [16], and the slab thickness is similar to that in [16] (0.55a). The color plot in Fig. 2 shows the y-component of the electric field distribution (Ey) of the fundamental mode. The distribution was calculated for the base cavity structure by a first-principles method [the three-dimensional finite-difference time-domain (3D-FDTD) method]; the resulting Q factor of the base structure is 7160, and the modal volume Vcav of the mode is 0.61 (λ/n)^3, i.e., 0.61 cubic wavelengths in the material. The displacements of the 50 air holes inside the red square are the structural parameters that are used to optimize the cavity design with respect to Q.

Preparation phase
[Step (I)]: The positions of the 50 air holes within the area of 11a × 5 rows (indicated by the red square in Fig. 2) are the structural parameters used to optimize the cavity design with respect to the Q factor, because most of the electric field intensity of the mode is concentrated in this area [19]. Each sample cavity structure (labelled by index i) is defined by the base structure and a set of 2D displacement vectors {d_1, d_2, …, d_50}, where d_h = (d_hx, d_hy) defines the displacement of the h-th air hole in the x-y plane and h enumerates all air holes selected for structural optimization (from 1 to 50 in the present case). The parameter-space vector x_i of structure i, as defined in Fig. 1, is a single column vector of the form (d_1x, d_1y, d_2x, …, d_50y)^T and contains the displacements corresponding to the single set {d_1, d_2, …, d_50}. Although we have 100 degrees of freedom in the 2D displacements of the 50 air holes, the actual number of degrees of freedom in the present analysis is 25, because we impose mirror symmetries with respect to the central x and y axes to obtain high Q factors [12].
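The symmetry constraint of step (I) and the random sampling of step (II) can be sketched as follows; the hole coordinates, helper names, and quadrant bookkeeping are illustrative assumptions, not the authors' code. A hole at (x, y) displaced by (dx, dy) forces its mirror partners to move by (-dx, dy) at (-x, y), (dx, -dy) at (x, -y), and (-dx, -dy) at (-x, -y).

```python
import random

def symmetrize(rep_holes, free_disp):
    """Map displacements of representative holes (x > 0, y > 0) onto all
    four mirror images so both mirror symmetries are preserved. Holes lying
    on a symmetry axis would need the perpendicular displacement component
    fixed to zero (not shown)."""
    disp = {}
    for (x, y), (dx, dy) in zip(rep_holes, free_disp):
        for sx in (1, -1):
            for sy in (1, -1):
                # sign flips keep the pattern mirror-symmetric in x and y
                disp[(sx * x, sy * y)] = (sx * dx, sy * dy)
    return disp

def random_structure(n_free, a=1.0, rng=random):
    """Draw the free parameters uniformly from [-0.1a, 0.1a] (step II)."""
    return [rng.uniform(-0.1 * a, 0.1 * a) for _ in range(n_free)]
```

With 25 independent parameters drawn this way, symmetrize expands them to the full set of hole displacements that defines one sample cavity.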
[Step (II)]: Random displacements are applied to all air holes in the x- and y-directions in such a way that the mirror symmetries of the structure are maintained and the displacement components follow a uniform distribution between -0.1a and 0.1a. The appropriate magnitude of the fluctuation has been determined in previous manual optimizations of L3 cavities [2,17]. In this demonstration, we initially prepare n = 1000 random nanocavity structures (the whole set is denoted by {x_i}) using the above displacement restrictions, and calculate their Q factors using the 3D-FDTD method. The obtained set of Q values, {Q_i}, exhibits a distribution between 10^3 and 10^5, with an average of 6700. Because the first-principles Q values of the initial set are spread over two orders of magnitude, and this spread should increase in the subsequent optimization cycles, we employ log10(Q_i) as the target of the machine learning. As a result, the initial training dataset consists of the structural parameters {x_i} and the corresponding targets {log10(Q_i)}.

Learning phase
[Step (IV)]: For this phase, we employ a conventional loss function L consisting of two terms: the squared difference between the output of the NN and the teacher data (i.e., log10(Q_i)), and the summation of the squared connection weights in the network (weight decay method [24]), where the latter is used to avoid overfitting:

L = [log10(QNN(x_i)) - log10(Q_i)]^2 + λ Σ_k w_k^2.  (1)

For the hyperparameter λ we use 0.00333, determined by the (10-fold) cross-validation method. In the training process, we randomly select one set {x_i, log10(Q_i)} (i ≤ j) from the training dataset, where j is the number of samples in the present dataset as defined in Fig. 1, and change the internal parameters of the NN to reduce L using the back-propagation method [25]. Here, the actual output of the NN is referred to as log10(QNN(x_i)), where QNN is an approximation of the Q factor. We apply the momentum optimization method to speed up convergence [26], where the learning rate and the momentum decay rate are set to 1.0×10^-4 and 0.9, respectively. The random selection of one structure and the subsequent reduction of L by the back-propagation method are repeated 5×10^4 times. Ten separate NNs are trained by the same method, but with different orders of data feeding. Therefore, after the training, each NN has acquired different internal parameters, which widens the diversity of the candidate structures generated in the following step (V).
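A minimal sketch of this update rule (squared error on log10(Q) plus an L2 weight-decay term, minimized by stochastic gradient descent with momentum): a linear model stands in for the deep NN, the hyperparameters follow the text (λ = 0.00333, learning rate 1.0×10^-4, momentum 0.9), and the toy data are our own.

```python
import random

def train(samples, n_params, lam=0.00333, lr=1e-4, mu=0.9, steps=50000):
    """samples: list of (x, log10_q) pairs; returns weights and bias of a
    linear stand-in model fitted with SGD + momentum + weight decay."""
    w = [0.0] * n_params
    b = 0.0
    vw, vb = [0.0] * n_params, 0.0
    rng = random.Random(0)
    for _ in range(steps):
        x, log_q = rng.choice(samples)            # pick one sample at random
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - log_q
        for h in range(n_params):
            g = 2 * err * x[h] + 2 * lam * w[h]   # data term + weight decay
            vw[h] = mu * vw[h] - lr * g           # momentum update
            w[h] += vw[h]
        vb = mu * vb - lr * 2 * err               # bias carries no decay term
        b += vb
    return w, b
```

For a real deep NN the per-weight gradient would come from back-propagation, but the momentum and weight-decay bookkeeping is exactly the same.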

Structure search phase
[Step (V)]: Several candidate structures (here, we use m = 70) with potentially higher Q factors are generated using the gradient method. For this we define the loss function

L' = [log10(QNN(x)) - log10(Q_target)]^2 + L_art,  (2)

and calculate the gradient of L' with respect to x (i.e., ∇_x L') using the back-propagation method [25], where Q_target is set to a very high value (here, we use 1.0×10^8). Starting from a randomly generated initial structure defined in the parameter space by x_k (k > j), we incrementally change the structure to reduce the loss L' (i.e., x_k ← x_k + Δx, where Δx is a set of incremental hole displacements calculated from ∇_x L' evaluated at x_k, based on the momentum method [26]). The artificial loss term L_art in Eq. (2) is used to constrain the structural parameter space that is explored during the optimization, and different conditions are used to obtain different candidate structures. We designed the following three types of artificial losses, where λ' is a control parameter.
(A) Squared distance from the base structure or the best structure in the previous rounds:

L_art = λ' |x - x_best|^2,  (3)

where x_best refers to the sample structure with the highest Q in the previous rounds (i.e., the highest Q among Q_1, …, Q_j). (In the first round, x_best is set to zero, because the base structure has no displacements.) This artificial loss is designed to explore the parameter space in the vicinity of the best structure found in the previous rounds.
(B) Squared distance from a randomly generated initial structure x_k:

L_art = λ' |x - x_k|^2.  (4)

This artificial loss stochastically forces the exploration of unknown parameter space. A structure with a higher Q that is not predictable from the training dataset may be found by chance using this artificial loss.
(C) Sum of the inverses of the distances from all the structures in the training dataset:

L_art = λ' Σ_{i=1}^{j} 1/|x - x_i|.  (5)

This artificial loss increases as the parameter-space vector x of the structure being optimized approaches the location of a known structure x_i (i ≤ j). This restriction forces the exploration of unknown parameter space more strictly than (B).
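The gradient-based candidate search can be sketched as follows. A smooth toy function stands in for log10(QNN(x)), its gradient is taken by finite differences instead of back-propagation, and only the inverse-distance loss (C) is included; all names and constants here are illustrative, not the authors' settings.

```python
def dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def search(log_qnn, x0, known, target=8.0, lam=1e-3, lr=0.01, mu=0.9, steps=300):
    """Minimize L' = (log_qnn(x) - target)^2 + lam * sum_i 1/|x - x_i|
    by gradient descent with momentum, starting from x0."""
    x = list(x0)
    v = [0.0] * len(x)
    eps = 1e-5
    def loss(z):
        l = (log_qnn(z) - target) ** 2           # pull toward Q_target
        l += lam * sum(1.0 / (dist2(z, xi) ** 0.5 + 1e-9)
                       for xi in known)          # artificial loss (C)
        return l
    for _ in range(steps):
        for h in range(len(x)):
            zp = list(x); zp[h] += eps
            zm = list(x); zm[h] -= eps
            g = (loss(zp) - loss(zm)) / (2 * eps)  # finite-difference gradient
            v[h] = mu * v[h] - lr * g              # momentum update
            x[h] += v[h]
    return x
```

The repulsive (C) term keeps the search away from known samples while the main term climbs toward the predicted high-Q region.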
For the present demonstration, we designed and investigated the following three strategies of candidate generation: in strategy (A), all m = 70 candidates per round are generated using artificial loss (A); in strategy (A+B), half of the candidates are generated with loss (A) and half with loss (B); in strategy (A+C), half are generated with loss (A) and half with loss (C).
Figure 3 shows the highest Q factors of the additional 70 sample structures generated in each iteration cycle as a function of the size of the training dataset used. The corresponding QNN values are not discussed in the following, because the regression function is only employed to identify structures with potentially higher Q factors (via the gradient method). The results for strategies (A), (A+B), and (A+C) are shown by the blue, orange, and green curves, respectively. We find that the highest Q achieved in each round increases overall with further iterations, although some fluctuations exist. The highest Q factors of the structures detected within 100 iterations of cavity design optimization are 5.75×10^6, 9.12×10^6, and 1.10×10^7 for strategies (A), (A+B), and (A+C), respectively. Figure 4 plots the inter-structure distance between the best structure in the present round and the best structure in the previous rounds in terms of the parameter-space vector x, indicating how large the modifications in each round of optimization are. It can be confirmed that the inter-structure distances tend to decrease as the optimization proceeds. The inter-structure distances for strategy (A+C) are almost always larger than those for the other strategies, and those for strategy (A+B) are larger than those for (A) only in the early stages (< 4000 samples). The air hole displacements of the structures with the highest Q factors found during the 100 optimization cycles with the three strategies are shown in Fig. 5, together with the distribution of Ey and the modal volume Vcav of the cavity mode. It is interesting to note that the displacements of the best cavities for the three strategies are significantly different.
The modal volumes Vcav of the optimized cavities are 0.73, 0.68, and 0.74 (λ/n)^3 for strategies (A), (A+B), and (A+C), respectively, which are slightly larger than that of the base structure (0.61 (λ/n)^3, Fig. 2).

Performance of the three strategies
First, we compare the results of the iterative optimization proposed in this report with those of the previously reported NN-based optimization method [19], which corresponds to the results obtained in the first round. After the first round of optimization, 1070 sample cavities had been accumulated, and the highest Q factors were 3.09×10^5, 2.44×10^5, and 1.57×10^5 for strategies (A), (A+B), and (A+C), respectively. The improvements relative to the Q of the base structure (7160, Fig. 2) are about 43, 34, and 22 times for (A), (A+B), and (A+C), respectively. The iterative optimization method was able to detect structures with much higher Q factors: the highest Q factors found by the final, 101st round are 5.75×10^6, 9.12×10^6, and 1.10×10^7, and thus the improvement ratios are about 800, 1270, and 1540 for (A), (A+B), and (A+C), respectively. This demonstrates that the proposed method is very effective compared to the previous method, because the computation costs for the first-principles calculations increased only by a factor of 8 (from 1070 to 8070 sample cavities). In the following we compare the three strategies with respect to the best Q and the computation costs. The best Q factors found with strategies (A+B) and (A+C) are about 1.6 and 1.9 times larger than that of (A), respectively. The differences in the improvement ratios originate from the differences in the methods of generating candidate cavity structures. In strategy (A), candidates for high-Q structures are generated by exploring the parameter space following the gradient of QNN predicted by the trained NN while keeping the distance from the best structure in the previous rounds small. This constraint is controlled by the parameter λ', which was varied from 3 to 1×10^-5 to generate 7 different candidates. Candidates with higher Q factors are frequently generated, but the candidate structures are generated according to past experience (although some randomness is introduced by the initial structures).
Therefore, the risk of getting stuck in a local maximum during the repetition of the optimizations is relatively large. The rapid decrease of the inter-structure distance for this strategy shown in Fig. 4 supports this interpretation.
In strategy (A+B), half of the candidates are generated according to past experience, and half are generated by exploring the parameter space near randomly generated initial structures. The latter approach is expected to add diversity to the generated candidates and the training dataset. It is important to note that the latter half is not just random generation; here, candidates are explored based on experience (the gradient of QNN) while the space to be explored is intentionally limited. This can prevent getting stuck in already known local maxima. This explanation is supported by the larger inter-structure distances for this strategy compared to those for strategy (A) in the early stages of optimization (Fig. 4; < 4000 samples). The advantage of this strategy seems to decrease as the number of iteration cycles increases, as shown in Figs. 3 and 4. We explain this with the higher probability of an overlap between a randomly generated initial structure and some sample in the training dataset after many iterations, which reduces the diversity of the generated candidates. Nevertheless, this strategy detected a structure with a Q factor 1.6 times larger than the best structure found with strategy (A). Note that the computation cost for this strategy is the same as that for (A), because the evaluation cost of the artificial loss (B) [Eq. (4)] equals that of the artificial loss (A) [Eq. (3)].
In strategy (A+C), half of the candidates are generated according to past experience, and half are generated by exploring the parameter space according to the gradient of QNN while avoiding the space near the already known structures in the training dataset. Therefore, unknown parameter space is explored more explicitly than with artificial loss (B). The maximum Q factor found with this strategy is 20% larger than that detected with strategy (A+B), as shown in Figs. 3 and 5. Moreover, the tendency to detect significantly higher Q factors in the next iteration step is still not saturated even after 100 optimization cycles (Fig. 3; the slope of the green curve is still relatively steep). This is in contrast to the case of (A+B), and means that strategy (A+C) can avoid local maxima more effectively. The much larger inter-structure distances of this strategy shown in Fig. 4 support this interpretation. The drawback is the increase in the computation cost: as can be seen from Eq. (5), the evaluation cost of the artificial loss (C) scales with the number of samples in the training dataset (N), while the other terms in the loss function [Eqs. (2)-(4)] do not scale with N. However, the evaluation cost of the artificial loss (C) scales much more slowly than that of the Bayesian optimization discussed later.

Comparison with other optimization methods
Here, we compare the L3 cavity optimization performance of our proposed method with other state-of-the-art optimization methods as a benchmark. The genetic-algorithm-based method was used in [16] to optimize 5 parameters of the Si-based L3 cavity and enabled the detection of a structure with a Q of 4.2 million by using ~8000 sample cavities. Reference [17] optimized 9 parameters of the L3 cavity using a leaky component visualization method, and found a structure with a Q of 5.3 million by using 200 sample cavities. In comparison, our proposed method was used to optimize 25 parameters of the L3 cavity, and we detected a structure with a Q of 11.0 million by using 8070 sample cavities generated with strategy (A+C). The maximum Q detected by the proposed method is more than 2.6 times larger than that found in [16], while the numbers of sample cavities used are almost the same.
Compared to the leaky component visualization method, the Q obtained with the present method is more than two times larger. Although the number of sample cavities used in [17] is only about 200, these sample structures had to be explored manually, which usually consumes similar time and more effort compared to the proposed automated method. Therefore, our proposed method has provided a structure that is more optimized than those of the two previous methods, while the computation costs are similar. We consider that the higher optimization efficiency of the proposed method has two origins: a training dataset that contains all experience accumulated during the whole calculation, and the aggressive search of unknown parameter space, which results in the generation of candidate structures that are useful for the optimization.

Comparison with Bayesian optimization
Finally, we compare the proposed method with the well-known Bayesian optimization in the context of generic optimization methods. Bayesian optimization is a powerful tool for optimizing a black-box function that is expensive to evaluate [27,28]. In this method, an approximate function (usually a so-called Gaussian process [28]) that predicts not only the mean but also the variance (uncertainty) of the values of the black-box function is generated from the present dataset (the training dataset in our case). Then, an acquisition function that evaluates the probability of obtaining better values is prepared based on the predicted mean and uncertainty. As a new observation point (= candidate), the point with the highest value of the acquisition function is searched for in the parameter space, and the value of the black-box function at this observation point is calculated and added to the dataset. This procedure is iterated many times. We note that our proposed method follows almost the same procedure. As explained in Section 2, our framework uses a NN to construct an approximation function QNN(x) of the black-box function Q(x). The artificial loss (C) roughly evaluates the inverse of the uncertainty, and the use of several NNs trained with different data feeding orders also corresponds to an evaluation of the uncertainty of the predicted values.
However, there are important differences between Bayesian optimization and our proposed optimization method. One is the computation cost of the search for candidate structures in the high-dimensional parameter space (this applies to the so-called normal Bayesian optimization only), and the other is the computation cost of learning a large-scale training dataset (this also applies to other types of Bayesian optimization). Concerning the former difference, normal Bayesian optimization usually uses direct search methods that do not rely on derivatives [28], and therefore a sufficient search of a high-dimensional parameter space is impossible from the viewpoint of computation cost (a practical parameter space is usually limited to fewer than 10 dimensions) [28-30]. The gradient method starting from random initial points would be useful for the search in high-dimensional space, but it is difficult to implement because, owing to the characteristics of the kernel function, the gradient of the acquisition function in normal Bayesian optimization tends to be zero over a wider region of the parameter space as its dimension increases [29]. Bayesian optimization with the elastic Gaussian process [29] can overcome this issue, but the computation cost of learning large-scale datasets remains high, as discussed later. The random embedding method [30] can also treat high-dimensional parameter spaces, but only under the rather restrictive assumption that the number of important dimensions is very small. In contrast, our method is able to effectively utilize the gradient method in high-dimensional space because the gradient of the loss function does not vanish, owing to the ReLU nonlinear layers in the NNs and the properly designed artificial loss terms. Therefore, a parallel trial of gradient-based searches starting from many randomly generated initial points works well in our case.
Regarding the relationship between the scale of the training dataset (N) and the computation cost of the training, the cost of Bayesian optimization scales with N^3 because the inverse of an N × N matrix has to be calculated for the training [28]. (Ref. [31] utilized a deep neural network with a Bayesian linear regressor in the last hidden layer to resolve this issue. However, the maximum feasible dimension of the parameter space is still limited because direct (parallel) search is utilized [31].) Fortunately, the training cost of a NN scales only linearly with N, and thus a large-scale dataset can be employed to increase the precision of the candidate search, which is especially important for optimization in a high-dimensional parameter space. In addition, our method employs the gradient method starting from random initial points, which enables efficient exploration of a high-dimensional space. In total, the proposed approach benefits from the characteristics of NN-based regression, which enables training on a large-scale dataset and search in a high-dimensional parameter space, while adopting the policy of Bayesian optimization (i.e., both the mean and the variance of the prediction are taken into account).
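The N^3 scaling discussed above can be made concrete with a toy sketch: naive Gaussian-process regression requires solving an N × N kernel system (an O(N^3) operation, here done by plain Gaussian elimination), whereas one NN training epoch touches each of the N samples once. The RBF kernel, noise level, and function names below are our own illustrative choices, not from the cited references.

```python
import math

def rbf(a, b, ell=0.5):
    # squared-exponential (RBF) kernel, an illustrative choice
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * ell ** 2))

def solve(A, b):
    # plain Gaussian elimination with partial pivoting: O(N^3) work
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(X, y, x_star, noise=1e-6):
    """Posterior mean of a naive GP: solve (K + noise*I) alpha = y, then
    take the kernel-weighted sum. The solve dominates the cost as N grows."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]                       # N x N kernel matrix
    alpha = solve(K, list(y))                     # the O(N^3) step
    return sum(alpha[i] * rbf(X[i], x_star) for i in range(n))
```

Every new prediction round rebuilds and re-solves the N × N system, which is why GP-based Bayesian optimization becomes expensive for the dataset sizes (thousands of samples) used in this work.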

CONCLUSION
We proposed and demonstrated a new approach for optimizing 2D-PC nanocavity designs, which have large degrees of structural freedom. This approach comprises the repetition of the following four steps: training NNs to learn the relationship between cavity structure and Q factor using the present dataset, generation of candidate structures using the trained NNs, calculation of their Q factors, and finally adding the new structures and Q factors to the dataset. The key point of this approach is to generate a variety of candidate structures in order to avoid getting stuck in a local maximum in the high-dimensional parameter space. For this purpose, we prepared several NNs and trained them with different data feeding orders. In addition, we designed three artificial loss terms and used them to generate candidate structures by employing the regression function provided by a trained NN. It was demonstrated that the artificial loss term that increases near the known structures in the dataset works most efficiently to increase the speed of generating structures with higher Q factors: this method generated an optimized Si-based L3 nanocavity structure with a Q factor of 11 million (here, 25 parameters were fine-tuned using 101 iterations and a total of 8070 sample cavities). This Q factor is more than 2 times larger than the Q factors obtained by previously reported methods, while the computation costs and efforts are similar. We also compared our method with Bayesian optimization in the context of generic optimization methods. The proposed approach is effective not only for the optimization of 2D-PC nanocavity designs but also for generic optimization problems in high-dimensional parameter spaces.