For deep learning, we first collect a dataset of 10,150 silver antennae with six representative shapes (circle, square, cross, bow-tie, H-shaped, and V-shaped). Each entry in the dataset consists of a reflection spectrum with 200 spectral points and its corresponding cross-sectional structural design as a 64×64 pixel image. A coarse mesh of 64 cells is used along both the *x*- and *y*-directions to keep the calculation simple. The cross-sectional structural designs are prepared as images covering a physical domain of 500 nm×500 nm. The 30-nm-thick antenna is placed on a 50-nm MgF_{2} spacer, a 200-nm silver reflector, and a silicon substrate (Figure 1). To obtain the reflection spectrum of each structure, a finite-difference time-domain (FDTD) electromagnetic simulation is performed using the commercial program Lumerical FDTD Solutions. The simulation covers the spectral range *f*=250–500 THz, from which 200 spectral points are extracted. Periodic boundary conditions with a periodicity of 500 nm are applied along the *x*- and *y*-directions, and perfectly matched layer (PML) boundary conditions along the *z*-direction. In each simulation, *y*-polarized light is normally incident on the antenna. The current deep-learning setting solves the structural design problem for a fixed physical domain and a fixed wavelength range; designing structures with a different periodicity or wavelength range would require additional data collection and training.

Figure 1: Schematic of data preparation for deep learning.

Each entry in the dataset is composed of the reflection spectrum obtained from FDTD simulation and its corresponding cross-sectional structural design.
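Under these conventions, each dataset entry is a (spectrum, design image) pair. A minimal PyTorch `Dataset` sketch is given below; the class and tensor names (`AntennaDataset`, `spectra`, `designs`) are assumptions for illustration, not identifiers from the paper's code.

```python
import torch
from torch.utils.data import Dataset


class AntennaDataset(Dataset):
    """Pairs each 200-point reflection spectrum with its 64x64 design image.

    The paper's dataset holds 10,150 such pairs over six antenna shapes;
    here `spectra` and `designs` are any pre-loaded tensors of that form.
    """

    def __init__(self, spectra, designs):
        assert spectra.shape[1] == 200          # spectral points over f = 250-500 THz
        assert designs.shape[-2:] == (64, 64)   # 64x64 mesh over 500 nm x 500 nm
        self.spectra = spectra
        self.designs = designs

    def __len__(self):
        return len(self.spectra)

    def __getitem__(self, idx):
        # The spectrum conditions the generator; the design is the target image.
        return self.spectra[idx], self.designs[idx]
```

A `DataLoader` wrapped around this class then yields the (spectrum, image) batches used for training.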

As a next step, we implement a deep-learning algorithm using the PyTorch framework. Artificial intelligence has recently revolutionized the field of computer vision [17], [18]. The convolutional neural network (CNN) [18], [19], inspired by the natural visual perception mechanism of the human brain, is among the most widely used techniques. A CNN uses convolution operators to extract features from the input data, which are usually images; this greatly increases the efficiency of image recognition, because each channel extracts important features of the images. Meanwhile, the development of the generative adversarial network (GAN) has driven major progress in computer vision [20]. A GAN is composed of a generator network (GN) that generates images and a discriminator network (DN) that distinguishes the generated images from real ones. The GN is trained to generate authentic-looking images that deceive the DN, while the DN is trained not to be deceived by the GN. The two networks compete at every training step; ultimately, this competition leads to mutual improvement, so that the GN can generate more realistic, higher-quality images than it could if trained alone. The deep convolutional GAN (DCGAN) combines the ideas of the CNN and the GAN to provide a very stable Nash equilibrium solution [16]. We employ the conditional DCGAN (cDCGAN) algorithm [21], where the condition in our case is the input reflection spectrum.

The cDCGAN architecture used to design nanophotonic structures is presented in Figure 2. The cDCGAN is composed of two networks: a GN that generates structural cross-sectional images, and a DN that distinguishes the images generated by the GN from a user-given group of target designs. The GN consists of transposed convolutional layers with 1024, 512, 256, 128, and 1 channels, respectively; the DN is a CNN with four layers. The GN takes as input both a 100×1 random noise vector (*z*) and the 200×1 input spectrum, and outputs a probability distribution function (PDF) of the antenna generated from the random noise; the input spectrum guides the GN to generate a PDF with the desired optical properties. In contrast, the DN takes as input a structural image, either from the user-provided group of target designs (*x*) or a generated PDF image GN(*z*), and discriminates GN(*z*) from the target designs. The two networks are trained simultaneously and competitively: the GN is trained to generate authentic-looking structural designs that deceive the DN, and the DN is trained to distinguish target designs from those generated by the GN. Mathematically, the GN and DN are trained to minimize and maximize, respectively, the objective function:

Figure 2: Schematic of the cDCGAN architecture to suggest the designs of structures.

The GN is composed of transposed CNN layers that generate the structural images, and the DN is a conventional CNN that distinguishes target structural designs from the generated designs. Each layer introduces a nonlinear activation function (ReLU, Tanh, Leaky ReLU, or Sigmoid) according to the guidelines of Radford et al. [16].

$$\min_{\text{GN}}\,\max_{\text{DN}}\; l(\text{DN},\text{GN}) = E_{x\sim P_{\text{data}}(x)}[\log\,\text{DN}(x)] + E_{z\sim P_{z}(z)}[\log(1-\text{DN}(\text{GN}(z)))],$$(1)

where DN(*x*) represents the probability that a structural image comes from the target design group (*x*), and DN(GN(*z*)) represents the corresponding probability for a design GN(*z*) generated by the GN. The DN is trained to maximize the expectation values $E_{x\sim P_{\text{data}}(x)}[\log\,\text{DN}(x)]$ for an image coming from the target designs and $E_{z\sim P_{z}(z)}[\log(1-\text{DN}(\text{GN}(z)))]$ for an image generated by the GN. Conversely, the GN is trained to minimize the second expectation value, thereby deceiving the DN. This adversarial training allows the GN to generate high-quality structural images.
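The two networks described above can be sketched in PyTorch as follows. The channel widths (1024→512→256→128→1 for the GN, four convolutional layers for the DN), the 100-dimensional noise, the 200-point spectrum condition, and the Radford-style activations follow the text; the initial linear projection, kernel sizes, and strides are assumptions typical of DCGANs, not values stated in the paper.

```python
import torch
import torch.nn as nn


class GN(nn.Module):
    """Conditional generator: (noise z, spectrum) -> 64x64 antenna PDF."""

    def __init__(self):
        super().__init__()
        # Assumed projection of the concatenated (100 + 200)-dim input to a
        # 1024-channel 4x4 feature map, then four transposed convolutions.
        self.project = nn.Linear(100 + 200, 1024 * 4 * 4)

        def up(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU())

        self.net = nn.Sequential(
            up(1024, 512),                                       # 4x4  -> 8x8
            up(512, 256),                                        # 8x8  -> 16x16
            up(256, 128),                                        # 16x16 -> 32x32
            nn.ConvTranspose2d(128, 1, 4, stride=2, padding=1),  # 32x32 -> 64x64
            nn.Sigmoid())                                        # per-pixel PDF

    def forward(self, z, spectrum):
        h = self.project(torch.cat([z, spectrum], dim=1))
        return self.net(h.view(-1, 1024, 4, 4))


class DN(nn.Module):
    """Four-layer CNN discriminator with Leaky ReLU and a Sigmoid output."""

    def __init__(self):
        super().__init__()

        def down(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2))

        self.net = nn.Sequential(
            down(1, 128), down(128, 256), down(256, 512), down(512, 1024),
            nn.Flatten(), nn.Linear(1024 * 4 * 4, 1), nn.Sigmoid())

    def forward(self, x):
        # Probability that x comes from the target design group.
        return self.net(x)
```

In a full cDCGAN the DN would typically also receive the spectrum condition (e.g. as extra input channels); that detail is omitted here for brevity.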

In addition to adversarial training, we further modify the loss function of the GN [22], [23] in the cDCGAN to fit our problem:

$$l_{\text{GN}}=(1-\rho)\times l_{\text{GN,design}}+\rho\times l_{\text{GN,adv}},$$(2)

where *l*_{GN,design} is the design loss, *l*_{GN,adv} is the adversarial loss defined in Eq. (1), and *ρ* is the weight of the adversarial loss. The design loss is introduced to explicitly guide the GN toward accurate structural images. It directly measures the quantitative difference between the two probability distributions of the target design (*x*_{i}) and the generated design $({\widehat{x}}_{i})$ using the binary cross-entropy criterion:

$$l_{\text{GN,design}}=-\left(x_{i}\log\,\sigma(\widehat{x}_{i})+(1-x_{i})\log(1-\sigma(\widehat{x}_{i}))\right),$$(3)

where *σ* is the Sigmoid function.

We optimized *ρ* so that the GN generates high-quality, realistic designs. For a low *ρ*, no competition effect can be expected, whereas a high *ρ* can cause confusion in the learning process. Therefore, *ρ*=0.5 was chosen to maximize the ability of the GN to produce convincing structural designs. During each training step, the network weights are optimized to learn the mapping between the input spectrum and the PDF (see Supporting Information for details of the deep-learning procedure and network optimization).
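Equations (2) and (3) combine into a single generator loss. The sketch below assumes the GN outputs the PDF *σ*(x̂) directly and that the adversarial term is the log(1 − DN(GN(*z*))) expression from Eq. (1); the function and argument names are illustrative, not from the paper's code.

```python
import torch
import torch.nn.functional as F


def gn_loss(dn_on_fake, generated_pdf, target_design, rho=0.5):
    """Combined generator loss of Eq. (2).

    dn_on_fake    -- DN(GN(z)), probabilities in (0, 1)
    generated_pdf -- the PDF sigma(x_hat) output by the GN, values in (0, 1)
    target_design -- the binary target design x
    """
    # Eq. (3): binary cross-entropy between x_i and sigma(x_hat_i)
    l_design = F.binary_cross_entropy(generated_pdf, target_design)
    # Adversarial term of Eq. (1) that the GN minimizes: log(1 - DN(GN(z)))
    l_adv = torch.log(1.0 - dn_on_fake + 1e-8).mean()
    # Eq. (2): rho = 0.5 balances design fidelity against the competition term
    return (1.0 - rho) * l_design + rho * l_adv
```

Setting `rho=0` recovers the pure design loss of Eq. (3); `rho=1` recovers purely adversarial training.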

After training, the cDCGAN suggests designs as a 64×64 pixel PDF *p*(*i*, *j*), which represents the probability that a silver antenna exists at location (*i*, *j*). To reduce the PDF to a binary image representing the existence of an antenna at each location, we employ a post-processing step based on Otsu's method [24]. This method determines the binary threshold *t* that minimizes the intra-class variance ${\sigma}_{\omega}^{2}$ of the black and white pixels:

$$\sigma_{\omega}^{2}(t)=\omega_{0}(t)\,\sigma_{0}^{2}(t)+\omega_{1}(t)\,\sigma_{1}^{2}(t),$$(4)

where *ω*_{0} and *ω*_{1} are the weights given by the probabilities of the two classes separated by *t*, ${\sigma}_{0}^{2}$ is the variance of the black pixels, and ${\sigma}_{1}^{2}$ is the variance of the white pixels. In summary, for a given reflection spectrum, the cDCGAN produces a PDF, which is then converted to a binary design image in the post-processing step. At each training step, 2000 validation samples are used to validate the trained network. The average loss on the validation set converged to 5.564×10^{−3} after 1000 training steps. On a single GTX 1080-Ti GPU, training the network for one epoch requires about 4 min; once trained, however, the network can generate a design for a desired spectrum within 3 s.
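The Otsu post-processing step of Eq. (4) can be sketched from scratch as an exhaustive search over candidate thresholds; the histogram bin count is an assumption, and in practice `skimage.filters.threshold_otsu` computes the same quantity.

```python
import numpy as np


def otsu_threshold(pdf, bins=256):
    """Return the threshold t minimizing the intra-class variance of Eq. (4).

    `pdf` is the generator's 64x64 probability map with values in [0, 1].
    """
    hist, edges = np.histogram(pdf.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()                    # probability mass per bin
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = edges[1], np.inf
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()    # class weights omega_0, omega_1
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0
        mu1 = (p[k:] * centers[k:]).sum() / w1
        var0 = (p[:k] * (centers[:k] - mu0) ** 2).sum() / w0   # sigma_0^2
        var1 = (p[k:] * (centers[k:] - mu1) ** 2).sum() / w1   # sigma_1^2
        within = w0 * var0 + w1 * var1       # Eq. (4)
        if within < best_var:
            best_var, best_t = within, edges[k]
    return best_t


# Binary design image: 1 where a silver antenna pixel exists.
# design = (pdf > otsu_threshold(pdf)).astype(np.uint8)
```

For a bimodal PDF (antenna vs. background pixels) the threshold lands between the two modes, which is exactly the behavior the post-processing step relies on.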
