Diffractive Interconnects: All-Optical Permutation Operation Using Diffractive Networks

Permutation matrices form an important computational building block frequently used in various fields including, e.g., communications, information security, and data processing. Optical implementation of permutation operators with a relatively large number of input-output interconnections based on power-efficient, fast, and compact platforms is highly desirable. Here, we present diffractive optical networks engineered through deep learning to all-optically perform permutation operations that can scale to hundreds of thousands of interconnections between an input and an output field-of-view, using passive transmissive layers that are individually structured at the wavelength scale. Our findings indicate that the capacity of a diffractive optical network to approximate a given permutation operation increases in proportion to the number of diffractive layers and trainable transmission elements in the system. Such deeper diffractive network designs can pose practical challenges in terms of the physical alignment and output diffraction efficiency of the system. We addressed these challenges by designing misalignment-tolerant diffractive networks that can all-optically perform arbitrarily selected permutation operations, and experimentally demonstrated, for the first time, a diffractive permutation network operating in the THz part of the spectrum. Diffractive permutation networks might find various applications in, e.g., security, image encryption, and data processing, along with telecommunications; especially with the carrier frequencies in wireless communications approaching THz bands, the presented diffractive permutation networks can potentially serve as channel routing and interconnection panels in wireless networks.


Introduction
Permutation is one of the basic computational operations that has played a key role in numerous areas of engineering, e.g., computing [1], communications [2], encryption [3], data storage [4], remote sensing [5], and data processing [6]. Historically, electronic integrated circuits have been the established implementation medium for the permutation operation and other space-variant linear transformations, while research on optical computing has mainly focused on using the Fourier transform approximation of thin lenses, covering various applications in space-invariant transformations, e.g., convolution/correlation. On the other hand, as photonic switching devices and optical waveguide technology have become mainstream communication tools in high-end applications, e.g., fiber optic networks, supercomputers, and data centers, various approaches have been developed toward all-optical implementation of the permutation operation and other space-variant transformations based on, e.g., Mach-Zehnder interferometers [7], optical switches [8], photonic crystals [9], holographically recorded optical elements [10][11][12], off-axis lenslet arrays [13,14], and arrays of periodic grating-microlens doublets [15]. The development of compact, low-power optical permutation and interconnection devices can have a significant impact on next-generation communication systems, e.g., 6G networks [16,17], as well as other applications such as optical data storage [18] and image-encrypting cameras [19][20][21].
With the widespread availability of high-end graphics processing units (GPUs) and the massively growing amounts of data, the past decade has witnessed major advances in deep learning, which now dominates digital information processing for various engineering applications including, e.g., image segmentation and classification [22][23][24][25] and natural language processing [26,27], among others [28]. The statistical inference and function approximation capabilities of deep neural networks have also been exploited to produce state-of-the-art performance for computational inverse problems in many imaging and sensing applications including, e.g., microscopy [29][30][31][32][33][34][35], quantitative phase imaging [36][37][38][39][40][41], and others [42][43][44][45][46][47][48][49][50][51]. Beyond these data processing tasks, deep learning can also provide task-specific solutions to challenging inverse optical design problems for numerous applications including nanophotonics [52,53], metamaterials [54], imaging, and sensing [55][56][57][58][59][60]. However, as the success and the applications of deep learning grow further, the electronic parallel computing platforms, e.g., GPUs, hosting deep neural networks and other machine learning algorithms have started to impose limitations due to their power- and bandwidth-hungry operation. Moreover, the pace of the advances in the computational capacity of integrated circuits has fallen behind the exponential increase predicted by Moore's law [61]. These factors have fueled a tremendous amount of effort toward the development of optical machine learning schemes and other photonic computing devices that can partially reduce the computational burden on electronics, leading to power-efficient, massively parallel, high-speed machine learning systems.
While most of the arising optical computing techniques rely on integrated photonic devices and systems compatible with the integrated waveguide technology [62][63][64][65][66][67][68], an alternative option toward exploiting photons for machine learning and the related computing tasks is to use complex modulation media and free-space light propagation and diffraction, which is particularly suitable for visual computing applications where the information is already carried by the optical waves (e.g., of a scene or target object) in free space [69].
Motivated by these pressing needs, Diffractive Deep Neural Networks (D²NN) [70,71] have emerged as an optical machine learning framework that utilizes deep learning to engineer light-matter interactions over a series of diffractive surfaces so that a desired statistical inference or deterministic computing task is realized all-optically as the light waves propagate through structured surfaces. According to this framework, the physical parameters determining the phase and/or amplitude of light over each independently controllable unit, i.e., the "diffractive neuron," are updated through the stochastic gradient descent and error-backpropagation algorithms based on a loss function tailored specifically for a given task. The weights of the connections between the diffractive neurons/features on successive layers, on the other hand, are dictated by the light diffraction in free space. Once the deep learning-based training, which is a one-time effort, is completed using a computer, the resulting transmissive/reflective diffractive layers are fabricated using, e.g., lithography or 3D printing, to physically form the diffractive network that completes a given inference or computational task at the speed of light using entirely passive modulation surfaces, offering a task-specific, power-efficient, and fast optical machine learning platform.
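The forward model of such a network can be sketched numerically. The snippet below is a minimal, illustrative implementation (function names and all parameter values are our own assumptions, not the paper's design values): thin phase-only layers are cascaded, with free-space propagation between them modeled by the scalar angular-spectrum method.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, distance):
    """Propagate a 2D complex field over `distance` of free space using the
    angular-spectrum method (scalar diffraction); evanescent components,
    for which fx^2 + fy^2 > 1/wavelength^2, are suppressed."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel_size)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))   # longitudinal wavenumber
    H = np.exp(1j * kz * distance) * (arg > 0)       # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

def d2nn_forward(input_field, phase_layers, wavelength, pixel_size, dz):
    """Cascade thin phase-only layers separated by dz; the detector at the
    output plane measures intensity, I = |U|^2."""
    u = angular_spectrum_propagate(input_field, wavelength, pixel_size, dz)
    for phi in phase_layers:
        u = u * np.exp(1j * phi)   # thin, phase-only modulation at one layer
        u = angular_spectrum_propagate(u, wavelength, pixel_size, dz)
    return np.abs(u) ** 2
```

In an actual D²NN design, the phase maps `phi` would be the trainable parameters, optimized by error backpropagation through this differentiable forward model.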
Based on the D²NN framework, here we demonstrate diffractive optical network designs that were trained to all-optically perform a given permutation operation between the optical intensities at the input and output fields-of-view, capable of handling hundreds of thousands of interconnections with diffraction-limited resolution. We quantified the success of the presented diffractive optical networks in approximating a given, randomly selected permutation operation as a function of the number of diffractive neurons and transmissive layers used in the diffractive network design. We also laid the foundations toward practical implementations of diffractive permutation networks by investigating the impact of various physical error sources, e.g., lateral and axial misalignments and unwanted in-plane layer rotations, on the quality/accuracy of the optically realized interconnection weights and the permutation operation. Moreover, we showed that diffractive optical permutation networks can be trained to be resilient against possible misalignments as well as imperfections in the diffractive layer fabrication and assembly. Finally, we present the first proof-of-concept experimental demonstration of diffractive permutation networks by all-optically achieving a permutation matrix of size 25 × 25, effectively realizing 625 interconnections based on 3D-printed diffractive layers operating in the THz part of the spectrum.
The presented diffractive optical permutation networks can readily find applications in THz-band communication systems serving as communication channel patch panels; furthermore, the underlying methods and design principles can be broadly extended to operate at other parts of the electromagnetic spectrum, including the visible and IR wavelengths, by scaling each diffractive feature size proportional to the wavelength of light [72], and can be used for image encryption in security cameras [21] and optical data storage systems, among other applications [73][74][75][76].

Figure 1 illustrates the presented free-space permutation interconnect concept designed around diffractive optical networks using the D²NN framework. As shown in Figure 1, the presented permutation interconnect scheme does not use any standard optical components such as lenses, and instead relies on a series of passive, phase-only diffractive surfaces. Due to the passive nature of these layers, the diffractive optical network shown in Figure 1 does not consume any power except for the illumination light, providing a power-efficient permutation operation in a compact footprint of ∼600λ along the longitudinal axis, which could be further squeezed as needed. The 5-layer diffractive optical permutation network design shown in Figure 1 was trained through supervised learning to all-optically realize a desired permutation operation, P, between the light intensity signals at the input and output fields-of-view (FOVs), each with N_i = N_o = 400 (20 × 20) individual pixels of size 2λ × 2λ. Stated differently, this permutation operation controls a total of N_i N_o = 0.16 million optical intensity connections.
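For concreteness, the target operation is an ordinary permutation matrix acting on the flattened intensity pixels; below is a minimal NumPy sketch of generating and applying such a 400-channel P (the seed and variable names are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_pixels = 400                      # 20 x 20 pixels at the input/output FOV
perm = rng.permutation(n_pixels)    # randomly selected routing of channels

# P is a 400 x 400 binary matrix with exactly one nonzero entry per row and
# per column; it activates 400 links out of N_i*N_o = 160K candidate ones.
P = np.zeros((n_pixels, n_pixels))
P[np.arange(n_pixels), perm] = 1.0

i_in = rng.uniform(size=n_pixels)   # flattened input intensity pattern
i_out = P @ i_in                    # ideal (error-free) permuted output
```

Because P has exactly one nonzero entry per row and column, it is orthogonal (P Pᵀ = I), so the inverse permutation is simply Pᵀ and the operation only reorders, never mixes, the intensity values.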

Results
The supervised nature of the training process of the diffractive permutation network necessitates the use of a set of input-output signal pairs (examples that satisfy P) to compute a penalty term and the associated gradient-based updates with respect to the physical parameters of each diffractive neuron at every iteration. We set the optical signal of interest at the input and the output of the diffractive permutation scheme to be the light intensity, and as a result, the deep learning-based evolution of the presented diffractive permutation network shown in Figure 1 was driven by the mean-squared error (MSE) (see Methods Section) between the ground-truth and the all-optically synthesized output intensity patterns at a given iteration. Since this loss function acts only on the light intensity, the diffractive optical network can enjoy an output phase-freedom in synthesizing the corresponding transformed optical intensity patterns within the output field-of-view. The light intensity, I, is related to the complex-valued field, U, through a nonlinear operation, I = |U|². If a pair of input-output complex fields exists for a given diffractive network, i.e., {U_in, U_out} and {U′_in, U′_out}, then the input field U″_in = U_in + U′_in will create the output field U″_out = U_out + U′_out; the corresponding intensities, however, do not add up, since in general |U_out + U′_out|² ≠ |U_out|² + |U′_out|². This makes it challenging (in terms of the data generalization capability) to design diffractive optical networks for achieving a general-purpose intensity-to-intensity transformation such as a permutation operation. To overcome this generalization challenge, we trained our diffractive permutation networks using ∼4.7 million randomly generated input/output intensity patterns that satisfy the desired P, instead of a standard benchmark image dataset (see the Methods).
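This superposition argument can be checked numerically with a random matrix standing in for the (linear) field-to-field response of a diffractive network; the term that breaks intensity superposition is exactly the interference cross term between the two output fields:

```python
import numpy as np

rng = np.random.default_rng(2)
# A random complex matrix stands in for any linear field-to-field system.
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
u1 = rng.normal(size=8) + 1j * rng.normal(size=8)   # first input field
u2 = rng.normal(size=8) + 1j * rng.normal(size=8)   # second input field

# Complex fields obey superposition through the linear system...
fields_superpose = np.allclose(A @ (u1 + u2), A @ u1 + A @ u2)

# ...but intensities do not, because of the nonlinearity I = |U|^2,
# which introduces the interference term 2*Re{(A u1) * conj(A u2)}.
i_sum = np.abs(A @ (u1 + u2)) ** 2
intensities_superpose = np.allclose(
    i_sum, np.abs(A @ u1) ** 2 + np.abs(A @ u2) ** 2
)
```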
After the training phase, we blindly tested each diffractive permutation network with test inputs that were never used during the training. Figure 2 illustrates 6 different randomly generated blind testing inputs along with the corresponding all-optically permuted output light intensities. In the first two randomly generated input patterns shown in Figure 2A, there is light coming out of all the input pixels/apertures at different levels of intensity. In the next two test input patterns shown in Figure 2A, on the other hand, nearly half of the input apertures have nonzero light intensity, and finally, the last two test inputs contain only 10 and 11 pixels/apertures with light propagating toward the 1st layer of the diffractive permutation network. When tested on 20K randomly generated blind testing input intensity patterns with different sparsity levels, the 5-layer diffractive permutation network shown in Figure 1 achieves 18.61 dB peak signal-to-noise ratio (PSNR), closely matching the ideal output response. For this randomly generated 20K testing data, Figure 2B also shows the distribution of PSNR as a function of the number of input pixels with nonzero light intensity, which reveals that the diffractive permutation network can permute relatively sparser inputs with a higher output image quality, reaching a PSNR of 25.82 dB.
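The PSNR figures quoted here follow the standard definition; a minimal sketch, assuming the intensity patterns are normalized to a peak value of 1:

```python
import numpy as np

def psnr(target, output, peak=1.0):
    """Peak signal-to-noise ratio, in dB, between a ground-truth intensity
    pattern and the all-optically synthesized one."""
    mse = np.mean((np.asarray(target) - np.asarray(output)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```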
In addition to randomly generated blind testing inputs, we further tested the diffractive permutation network shown in Figure 1 on 18.75K EMNIST images; note that this diffractive network was trained only using randomly generated input/output intensity patterns that satisfy P, and the EMNIST images constitute not only a blind testing set but also a significant deviation from the statistical distribution of the training images. The input field-of-view contains the permuted EMNIST images (i.e., after applying P⁻¹), and the diffractive network inverts that permutation by all-optically performing P to recover the original images at the output plane (see Figure 2). The performance of the diffractive permutation network was quantified based on both PSNR and the Structural Similarity Index Measure (SSIM).
Figure 2: (A) The 5-layer diffractive permutation network shown in Figure 1 was tested on two different datasets. The first blind testing dataset contains 20K randomly generated inputs; 6 examples from this randomly created testing data are shown here, demonstrating input-output intensity pairs with low, moderate, and high signal sparsity levels. Beyond successfully permuting randomly generated intensity patterns, the performance of the diffractive permutation network was also quantified using permuted EMNIST images. None of these test samples were used in the training phase. (B) Output intensity image PSNR with respect to the ground-truth intensity patterns as a function of the input signal sparsity in the randomly generated test dataset. (C) Same as (B), except for EMNIST test images.

With N_L = 40K diffractive neurons on each layer, the 5-layer diffractive permutation network shown in Figure 1 provides 19.18 dB and 0.85 for the PSNR and SSIM metrics, respectively, demonstrating the generalization capability of the diffractive network to new types of input image data never seen during the training phase.

Impact of the number of diffractive layers and features
Next, we investigate the performance of diffractive permutation networks as a function of the number of diffractive neurons trained within the system. Toward this goal, in addition to the 5-layer design shown in Figure 1 that has a total of N = 200K trainable diffractive features, we trained diffractive permutation networks consisting of 4, 3, and 2 modulation surfaces. The physical design parameters such as the size/width of the diffractive surfaces, layer-to-layer distances, and the extent of the input and output fields-of-view were kept identical to those in the 5-layer network design. In other words, these new diffractive networks are designed and trained exactly in the same way as the previous 5-layer network, except they contain fewer diffractive layers. Figure 3A and B provides a quantitative comparison between these 4 diffractive permutation networks. While Figure 3A illustrates the mean PSNR and SSIM values achieved by each diffractive network for recovering EMNIST images, Figure 3B demonstrates the MSE between the desired permutation operation and its optically realized version (P_D²NN) as a function of the number of diffractive layers utilized in these designs (see Supplementary Figure S1 and the Methods Section on the estimation of P_D²NN). According to the permutation operator error shown in Figure 3B, the performance of the system improves drastically with each additional diffractive layer up to the 4-layer design, which represents a critical point in the sense that the inclusion of a 5th diffractive surface brings a relatively small improvement. The reason behind this behavior is the fact that the number of diffractive features, N, in the 4-layer diffractive permutation network matches the space-bandwidth product set by our input and output FOVs, i.e., N_i N_o = 400 × 400 = 160K.
In other words, Figure 3B reveals that when the number of phase-only diffractive modulation units N matches or exceeds N_i N_o, the diffractive optical network can achieve a given linear transformation between the input and output intensities with a very low error, i.e., P_D²NN ≈ P; for example, the MSE between P_D²NN and P in the case of the 4-layer design was found to be 6.63 × 10⁻⁵. For N < N_i N_o, the error between P_D²NN and P increases accordingly, as shown in Figure 3B.
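Although the estimation procedure for P_D²NN is detailed in the Methods Section and Supplementary Figure S1, its spirit can be illustrated with a simple probing scheme: feed the system one-hot input intensity patterns and record each output as one column of the estimated matrix. The helper below is a hypothetical sketch, not the paper's exact procedure, and assumes the system responds approximately linearly in intensity for such single-pixel inputs:

```python
import numpy as np

def estimate_transform(system, n_in, n_out):
    """Estimate the intensity transformation matrix of a black-box optical
    system: column j of the estimate is the output intensity pattern obtained
    when unit intensity is placed on input pixel j alone."""
    P_est = np.zeros((n_out, n_in))
    for j in range(n_in):
        probe = np.zeros(n_in)
        probe[j] = 1.0              # one-hot input intensity pattern
        P_est[:, j] = system(probe)
    return P_est
```

For a network that closely approximates a permutation, the off-diagonal (crosstalk) entries of the estimate should be near zero, and the deviation from the target P quantifies the transformation error reported above.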
The benefit of having N ≥ N_i N_o is further revealed in the increased generalization capability of the diffractive network, as shown in Figure 3A. Since the EMNIST images were not used during the training, they represent completely new types of input intensity patterns for the presented diffractive optical networks. The SSIM (PSNR) values achieved by the 4-layer diffractive network are 0.75 (16.41 dB) for the optical recovery of the permuted EMNIST images. These numbers are significantly higher compared to the performance of the 3-layer and 2-layer diffractive designs, which attain SSIM (PSNR) values of 0.46 (12.91 dB) and 0.30 (12.08 dB) for the same task; furthermore, the 5-layer diffractive network design shown in Figure 1 outperforms the others by achieving 0.85 (19.18 dB) for the same performance metrics. The visual comparison of the input-output intensity patterns depicted in Figure 3C further supports this conclusion, where the noise due to the crosstalk between interconnection channels decreases with the number of diffractive layers in the system.

Vaccination of diffractive permutation networks
With a sufficiently large number of phase-only diffractive neurons/features, diffractive networks can optically realize permutation operations with, e.g., 0.16 million channels between the input and output pixels, as shown in Figure 3. In fact, the number of interconnects that can be optically implemented through diffractive networks can go far beyond 0.16 million, given that the size/width of the diffractive surfaces and the number of diffractive layers can be increased further depending on the fabrication technology and the optomechanical constraints of the system. In addition, as the number of diffractive layers increases in a diffractive network architecture, its forward model can better generalize to new, unseen data, as shown in Figure 3A.
On the other hand, deeper diffractive optical network designs are more susceptible to misalignments that are caused by the limitations of the optomechanical assembly and/or the fabrication technology that is utilized. It was shown that diffractive optical networks trained for statistical inference tasks, e.g., all-optical object classification, can be vaccinated against misalignments and other physical error sources when the factors creating these nonideal conditions are incorporated into the training forward model, an approach termed vaccinated D²NNs (v-D²NNs) [77]. Specifically, v-D²NN expands on the original D²NN framework by modeling possible error sources as random variables and integrating them into the training model, so that the deep learning-based evolution of the diffractive surfaces is guided toward solutions that are resilient to nonideal physical conditions and/or fabrication errors. Toward practical applications of diffractive permutation networks, we quantified the impact of optomechanical errors and applied the v-D²NN framework to devise robust solutions that can achieve a given interconnect operation despite fabrication tolerances.
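In a vaccinated training loop, each iteration draws a fresh random pose for every diffractive layer before the forward pass, so the optimization sees a distribution of misaligned systems rather than a single ideal one. A minimal sketch of such a sampler follows; the uniform ranges and their scaling with the vaccination level are illustrative assumptions, not the paper's exact values:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_misalignment(eps, wavelength=1.0):
    """Sample one random layer pose for a vaccinated training iteration.
    `eps` is the vaccination level; displacements scale with the wavelength
    and the in-plane rotation is in degrees."""
    dx, dy = rng.uniform(-eps, eps, size=2) * wavelength   # lateral shifts
    dz = rng.uniform(-eps, eps) * wavelength               # axial shift
    theta = rng.uniform(-eps, eps)                         # in-plane rotation
    return dx, dy, dz, theta
```

During training, the sampled (dx, dy, dz, theta) would be applied to each layer's position and orientation in the differentiable forward model before the loss is computed, so the learned surfaces remain accurate over the whole sampled range rather than at a single nominal geometry.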
In our numerical study depicted in Figure 4, we considered 4 different misalignment components for each diffractive layer: the 3D misalignment vector (lateral and axial displacements) of the lth diffractive layer, and its in-plane rotation around the optical axis. The axial (z) location and the in-plane orientation of each layer can randomly change within the ranges of (−12λ, 12λ) and (−2°, 2°), respectively (see the Methods Section for more details).
To better highlight the impact of these misalignments and demonstrate the efficacy of the v-D²NN framework, we trained a new nonvaccinated, i.e., Δ_tr = 0, diffractive permutation network that can all-optically realize a given permutation matrix, P, representing 10K intensity interconnections between 100 input and 100 output pixels of size 4λ × 4λ. The error-free training model of this diffractive network with Δ_tr = 0 implicitly assumes that, when the resulting diffractive network is fabricated, the system conditions will exactly match the ideal settings regarding the 3D locations of the layers and their in-plane orientations. With an architecture identical to the one shown in Figure 1, containing N = 200K ≫ N_i N_o diffractive neurons, this diffractive network can all-optically approximate the permutation matrix, P, with an MSE of 1.45 × 10⁻⁶ in the absence of any misalignment errors, i.e., Δ_test = 0 (see the green curve in Figure 4B). However, when there is some discrepancy between the training and testing conditions, i.e., Δ_test > 0, the optically implemented forward transformation, P_D²NN, starts to deviate from the desired operation P. For instance, at Δ_test = 0.125, the transformation error, ‖P_D²NN − P‖, can be computed as 3.1 × 10⁻³. This negative impact of the physical misalignments on the performance of a nonvaccinated diffractive network can also be seen in Figure 4A. Unlike the nonvaccinated design, the vaccinated diffractive permutation networks can maintain their approximation capacity and accuracy under erroneous testing conditions, as shown in Figure 4A and B. For instance, the SSIM value of 0.49 attained by the nonvaccinated diffractive network for the misalignment uncertainty set by Δ_test = 0.125 increases to 0.88 in the case of a diffractive permutation network trained with Δ_tr = 0.25 (red curve in Figure 4A).
The difference between the image recovery performances of the vaccinated and the nonvaccinated diffractive network designs grows further as the misalignment levels increase during the blind testing. Supplementary Figure S2 further illustrates the error maps between P and P_D²NN realized by the nonvaccinated and vaccinated diffractive permutation networks at different misalignment levels.
The compromise for this misalignment robustness comes in the form of a reduction in the peak performance. While the nonvaccinated diffractive network can solely focus on realizing the given permutation operation with the highest quality and approximation accuracy, the vaccinated diffractive network designs partially allocate their degrees-of-freedom to building up resilience against physical misalignments. For example, while the peak SSIM achieved by the nonvaccinated diffractive network is 0.99, it is 0.88 for the diffractive permutation network vaccinated with Δ_tr = 0.25. The key difference, on the other hand, is that the better performance of the nonvaccinated diffractive network is sensitive to physical implementation errors, while the vaccinated diffractive permutation networks can realize the desired input-output interconnects over a larger range of fabrication errors or tolerances. A comparison between the diffractive layer patterns of the nonvaccinated and vaccinated diffractive permutation networks shown in Figure 4C and D, respectively, also reveals that the vaccination strategy results in smoother light modulation patterns; in other words, the material thickness values over the neighboring diffractive neurons partially lose their independence and become correlated, causing a reduction in the number of independent degrees-of-freedom in the system.

Experimental demonstration of a diffractive permutation network
To experimentally demonstrate the success of the presented diffractive permutation interconnects, we designed a 3-layer diffractive permutation network achieving a desired (randomly generated) intensity shuffling operation, vaccinated during training against the misalignment components described in the previous section. In addition to these misalignment components, we also vaccinated this diffractive network model against unwanted material thickness variations that could arise due to the limited lateral and axial resolution of our 3D printer (see the Methods Section for details). To compensate for the reduction in the degrees-of-freedom due to the vaccination scheme, the number of phase-only diffractive features in the permutation network was selected to be N_L = 10K diffractive neurons per layer. Therefore, each diffractive layer shown in Figure 5A contains 100 × 100 phase-only diffractive neurons of size ∼0.67λ × 0.67λ.
Compared to the diffractive surfaces shown in Figures 1-4, the layers of our experimental system were set to be 2-times smaller in both the x and y directions to keep the layer-to-layer distances smaller while maintaining the level of optical connectivity between the successive diffractive surfaces (see Figure 5B). Figure 5C and D illustrates the 3D-printed diffractive permutation network and the schematic of our experimental setup (see the Methods Section for details). Figure 6A illustrates the targeted, randomly generated 25 × 25 permutation matrix (P) and the numerically predicted P_D²NN, along with the absolute difference map between these two matrices. According to the numerical forward model of the trained diffractive network shown in Figure 5, the transformation error between P and P_D²NN, i.e., ‖P_D²NN − P‖, is equal to 5.99 × 10⁻⁴ under error-free conditions, i.e., Δ_test = 0. Furthermore, the forward model of the trained diffractive permutation network shown in Figure 5 provides 17.87 dB PSNR on average for the test letters "U," "C," "L," and "A," as depicted in Figure 6B. A visual comparison between the numerically predicted and the experimentally measured output images of these 4 input letters (which were never seen by the network before) demonstrates the accuracy of the forward training and testing models as well as the success of the presented diffractive permutation network design. Interestingly, the PSNR of the experimentally measured images was observed to be higher, 19.54 dB, compared to the numerically predicted value, 17.87 dB. Our numerical study reported in Figure 4 suggests that this can be explained based on the vaccination range used during the training and the amount of physical error present during the system testing.
For instance, the SSIM value achieved by the vaccinated diffractive network trained with Δ_tr = 0.5 (yellow curve) at relatively lower physical misalignment levels, e.g., Δ_test = 0.125, is higher compared to its performance under the ideal conditions, i.e., Δ_test = 0.0, as depicted in Figure 4A.

Discussion
Beyond optomechanical error sources and fabrication tolerances, another factor that might potentially hinder the utilization of diffractive permutation networks in practical applications is the output diffraction efficiency. For instance, the diffraction efficiency of the 5-layer network shown in Figure 1 is ∼0.004%, which might be very low for some applications. This can, however, be significantly increased by using an additional loss term penalizing the poor diffraction efficiency of the network (see the Methods Section for details). Supplementary Figure S3 demonstrates 5-layer diffractive networks that are designed to optically realize 0.16 million interconnections between the input and output FOVs with increased diffraction efficiencies, and compares their performance in terms of the SSIM values achieved. The training of these diffractive network models is based on a loss function in the form of a linear combination of two different penalty terms, where L_e is the diffraction-efficiency-related penalty term promoting power-efficient solutions (see the Methods Section). As a general trend, the diffraction efficiency of the underlying diffractive network model increases as a function of the weight (η) of the efficiency penalty term in the loss function. However, since the number of diffractive neurons, and hence the degrees-of-freedom, in these diffractive network models is very close to N_i N_o, the diffraction efficiency either does not improve beyond a certain value, or the evolution of the diffractive layers starts to solely focus on the efficiency instead of the desired permutation operation, resulting in low-performance designs. This unstable behavior can be observed specifically for 0.235 < η < 0.24.
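The combined training objective described above can be written compactly as follows; here η denotes the weight of the efficiency penalty (our symbol choice), and the form of L_e shown is a representative choice rather than the paper's exact definition, which is given in its Methods Section:

```latex
\mathcal{L}' = \mathcal{L}_{\mathrm{MSE}} + \eta \, \mathcal{L}_{e},
\qquad
\mathcal{L}_{e} = 1 - \frac{\sum_{\text{output FOV}} I_{\mathrm{out}}}
                           {\sum_{\text{input FOV}} I_{\mathrm{in}}},
```

so that increasing η trades permutation fidelity (the MSE term) for a larger fraction of the input power delivered to the output field-of-view.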
On the other hand, as in the case of the vaccinated diffractive network models, if the diffractive network architecture contains N ≫ N_i N_o diffractive neurons, then this instability vanishes, providing significant improvements in the output diffraction efficiency without sacrificing the performance of the all-optical permutation operation. For instance, the 3D-printed diffractive permutation network depicted in Figure 6 was trained based on L′ with η = 0.15, and it provides 2.45% output diffraction efficiency, despite the fact that 89.37% of the incident power at the input plane is lost due to the absorption of the 3D printing material. With weakly absorbing transparent materials used as part of the diffractive network fabrication, a significantly larger output efficiency can be achieved. Also note that, although we solely focused on diffractive network designs composed of dielectric optical modulation surfaces, in principle, some of these layers can be replaced with metasurfaces/metamaterials. While the use of metasurfaces can provide some additional degrees of freedom, including, for example, the engineering of dispersion, there are some challenges to overcome in realizing metamaterial-based diffractive networks. In the presence of fabrication errors and imperfections, the scattered light fields might deviate from the predictions of the numerical forward model. These nonideal waves generated by a metasurface would then excite unwanted diffraction modes over the subsequent layers, generating an "avalanche" within the diffractive volume and accumulating substantial field errors, especially for deeper network designs with L ≥ 2. In addition, the physical models of phase and/or amplitude modulation of meta-atoms are, in general, valid for waves covering a relatively small numerical aperture (NA).
As a result, a high NA diffractive network design that utilizes the entire bandwidth of the propagation medium (NA = 1, for air) would be challenging, as the modulation response of the meta-units might deviate from their ideal small angle responses, introducing errors to the forward model. Although it is possible to restrict a diffractive network design to work with a lower NA, it would increase the overall footprint of the system and reduce the space-bandwidth product that can be processed by the diffractive network.
These challenges, in general, are negligible for dielectric diffractive networks composed of λ/2 features on a substrate, as also highlighted by the close match between the numerically predicted images and their experimentally measured counterparts shown in Figure 6 and our former work [21,70,74,76,77]. In our optical forward model, the diffractive layers are assumed to be thin modulation surfaces, i.e., there is only a single scattering event converting an incident wave field to an outgoing one at each diffractive layer. Practically, though, there are additional scattered fields that are ignored in our model, especially when there is a substantial material thickness variation between adjacent pixels. However, in our designs, we do not observe sharp material thickness transitions between neighboring pixels. This is mainly due to the nature of our training process. Specifically, the presented diffractive networks are trained through error-backpropagation, which computes the variable updates by taking the gradient of the loss function with respect to the material thickness over each diffractive unit. In such a process, it is highly unlikely that the gradients of two adjacent pixels (λ/2 apart from each other) deviate significantly from each other, which effectively causes a smoothing effect on the diffractive surface height profiles as they are being optimized through deep learning. This smoothing behavior is even more pronounced in the vaccinated diffractive network designs due to the random lateral translation of the diffractive layers as part of the training forward model. Therefore, the impact of side scattering or field shadowing due to height discontinuities across a given diffractive layer design is negligible. In addition, the back-reflected waves can also be ignored, as these are, in general, weak processes unless they are specifically enhanced using, e.g., metamaterials or other special structures.
Therefore, the optical forward model of dielectric diffractive networks can be accurately represented within scalar diffraction theory, without the need for vectorial modeling of the light fields or the consideration of weaker multi-reflections. Finally, the evanescent waves and the vectorial fields associated with them can be entirely ignored, since each successive diffractive layer is axially positioned > λ away from the previous layer.
In summary, we showed that diffractive networks can optically implement intensity permutation operations between their input and output apertures using phase-only light modulation surfaces with N ≥ N_i N_o diffractive neurons. Due to the nonlinear nature of the intensity operation, it is crucial to use training input intensity patterns with different levels of sparsity to prevent any data-specific overfitting during the training phase.
Diffractive permutation networks with N > N_i N_o demonstrate increased generalization capability, synthesizing more accurate outputs with ‖P_D2NN − P‖ ≈ 0. By using N > N_i N_o, one can also design misalignment- and fabrication-error-insensitive, power-efficient diffractive permutation networks, which could play a major role in practical applications, e.g., 6G wireless networks, computational cameras, etc. Although this study demonstrated diffractive optical networks realizing permutation operations with 0.16 million interconnects, with N_i = N_o = 20 × 20, these systems are highly scalable to even larger N_i, N_o combinations depending on the availability of training computer hardware. Since the training of a diffractive optical network is a one-time effort, one can use a computing platform with a significantly larger random-access memory (RAM) to design much bigger diffractive networks. Alternatively, the forward training model of a diffractive network can also be distributed among multiple GPUs for parallel computing with increased memory capacity, paving the way to significantly larger permutation operations to be implemented all-optically. Finally, the incorporation of dynamic spatial light modulators to replace some of the diffractive layers in a given design can be used to reconfigure, on demand, the all-optically performed diffractive transformation.

Experimental setup
According to the schematic diagram of our experimental setup shown in Figure 5D, the THz wave incident on the input FOV of the diffractive network was generated using a horn antenna attached to a WR2.2 modular amplifier/multiplier chain (AMC) from Virginia Diode Inc. (VDI). A 10 dBm RF input signal at 11.111 GHz (f_RF1) at the input of the AMC was multiplied 36 times to generate continuous-wave (CW) radiation at 0.4 THz, corresponding to a wavelength of ∼0.75 mm. The output of the AMC was modulated with a 1 kHz square wave to resolve low-noise output data through lock-in detection. Since we did not use any collimating optics in our setup, the distance between the input plane of the 3D-printed diffractive optical network and the exit aperture of the horn antenna was set to ∼60 cm, approximating a uniform plane wave over the 40λ × 40λ input FOV.
At the output plane of the diffractive optical network, the diffracted THz light was collected using a single-pixel Mixer/AMC from Virginia Diode Inc. (VDI). During the measurements, the detector received a 10 dBm sinusoidal signal at 11.083 GHz serving as a local oscillator for mixing, yielding a down-converted signal at 1 GHz. The 40λ × 40λ output FOV was scanned by placing the single-pixel detector on an XY stage built by combining two linear motorized stages (Thorlabs NRT100). At each scan location, the down-converted signal from the single-pixel detector was fed to low-noise amplifiers (Mini-Circuits ZRL-1150-LN+) with a gain of 80 dB and a 1 GHz (±10 MHz) bandpass filter (KL Electronics 3C40-1000/T10-O/O) that rejects the noise components from unwanted frequency bands. Following the amplification and filtering, the measured signal passed through a tunable attenuator (HP 8495B) and a low-noise power detector (Mini-Circuits ZX47-60). Finally, the output voltage value was generated by a lock-in amplifier (Stanford Research SR830). The modulation signal was used as the reference signal for the lock-in amplifier, and accordingly, we performed a calibration to convert the lock-in amplifier readings at each scan location to a linear scale. During our experiments, the scanning step size at the output plane was set to ∼λ in the x and y directions. The smallest pixel of the experimentally targeted permutation grid, i.e., the desired resolution of the diffractive permutation operation, was taken as 8λ × 8λ during the training, corresponding to 5 × 5 discrete input and output signals. Therefore, the output signal measured for each input object was integrated over a region of 8λ × 8λ per pixel, resulting in the measured images shown in Figure 6B.
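The block integration step above (forming coarse output pixels from the fine scan grid) can be sketched as follows; this is an illustrative NumPy helper, not the authors' processing script, and the 40-sample/8-sample sizes in the usage example simply mirror the scan and pixel dimensions quoted in the text.

```python
import numpy as np

def integrate_blocks(scan, block):
    """Integrate a finely scanned output intensity map over non-overlapping
    block x block regions, mimicking how the 8-wavelength-wide output pixels
    are formed from the ~1-wavelength scan grid."""
    n = scan.shape[0] // block  # number of coarse pixels per axis
    # Crop to a multiple of the block size, then sum within each block.
    return scan[:n * block, :n * block].reshape(n, block, n, block).sum(axis=(1, 3))

# Usage: a 40 x 40 scan integrated over 8 x 8 regions gives a 5 x 5 image.
coarse = integrate_blocks(np.ones((40, 40)), 8)
```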
A 3D printer (Objet30 Pro, Stratasys Ltd.) was used to fabricate the layers of the diffractive permutation network shown in Figure 5C, as well as the layer holders. The active modulation area of each 3D-printed diffractive layer was 5 × 5 cm (∼66.66λ × ∼66.66λ), containing 100 × 100, i.e., 10K, diffractive neurons.
These modulation surfaces were printed as insets surrounded by a uniform slab of printing material with a thickness of 2.5 mm and the total size of each printed layer including these uniform regions was 6.2 × 6.2 cm. Following the 3D printing, these additional surrounding regions were coated with aluminum to block the propagation of the light over these areas minimizing the contamination of the output signal with unwanted scattered light.

Training forward model of diffractive permutation networks
Optical forward model:
The material thickness, h, was selected as the physical parameter controlling the complex-valued transmittance of the diffractive layers of our design. Based on the complex-valued refractive index of the diffractive material, ñ = n + jκ, the corresponding transmission coefficient of a diffractive neuron located on the lth layer at a coordinate of (x_q, y_q, z_l) is defined as,

t(x_q, y_q, z_l) = exp(−2πκh/λ) exp(j2π(n − n_m)h/λ),   (1)

where n_m = 1 denotes the refractive index of the propagation medium (air) between the layers. The real and imaginary parts of the refractive index of the 3D printing material were measured experimentally using a THz spectroscopy system as n = 1.7227 and κ = 0.031 at 0.4 THz.
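This thickness-to-transmittance relation of Eq. (1) can be sketched in a few lines of NumPy; the constant names and the function itself are ours, chosen to match the measured material parameters quoted above.

```python
import numpy as np

WAVELENGTH = 0.75e-3        # ~0.4 THz, in meters
N_REAL, KAPPA = 1.7227, 0.031  # measured refractive index of the printing material
N_MEDIUM = 1.0              # air between the layers

def neuron_transmission(h):
    """Complex transmission coefficient of a diffractive neuron of thickness h:
    the amplitude is set by absorption (kappa), the phase by the optical path
    difference of the material relative to air, as in Eq. (1)."""
    amplitude = np.exp(-2 * np.pi * KAPPA * h / WAVELENGTH)
    phase = 2 * np.pi * (N_REAL - N_MEDIUM) * h / WAVELENGTH
    return amplitude * np.exp(1j * phase)
```

A zero-thickness neuron is fully transparent (t = 1), while thicker neurons attenuate and delay the wave.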
The optical forward model of the presented diffractive networks relies on the Rayleigh-Sommerfeld theory of scalar diffraction to represent the propagation of light waves between successive layers. According to this diffraction formulation, free space can be interpreted as a linear, shift-invariant operator with the impulse response,

w(x, y, z) = (z/r²)(1/(2πr) + 1/(jλ)) exp(j2πr/λ),   (2)

where r = √(x² + y² + z²). Based on Eq. (2), the qth diffractive neuron on the lth layer, at (x_q, y_q, z_l), can be interpreted as the source of a secondary wave generating the field at (x, y, z) in the form of,

w_q^l(x, y, z) = ((z − z_l)/(r_q^l)²)(1/(2πr_q^l) + 1/(jλ)) exp(j2πr_q^l/λ).   (3)

The parameter r_q^l in Eq. (3) is expressed as √((x − x_q)² + (y − y_q)² + (z − z_l)²). When each diffractive neuron on layer l generates the field described by Eq. (3), the light field incident on the pth diffractive neuron on the (l + 1)th layer at (x_p, y_p, z_l+1) is the linear superposition of all the secondary waves generated by the previous layer l, i.e., Σ_q A_q^l w_q^l(x_p, y_p, z_l+1), where A_q^l is the complex amplitude of the wave field right after the qth neuron of the lth layer. This field is modulated by the multiplicative complex-valued transmittance of the diffractive unit at (x_p, y_p, z_l+1), creating the modulated amplitude

A_p^(l+1) = t(x_p, y_p, z_l+1) Σ_q A_q^l w_q^l(x_p, y_p, z_l+1),   (4)

from which a new secondary wave is generated. These successive modulation and secondary-wave generation steps continue until the waves propagating through the diffractive network reach the output plane. Although the forward optical model described by Eqs. (1)-(4) is given over a continuous 3D coordinate system, during our deep learning-based training of the presented diffractive permutation networks, all the wave fields and the modulation surfaces were represented by their discrete counterparts with a spatial sampling of ∼0.67λ along both the x and y axes, which is also equal to the size of a diffractive neuron. In the diffractive network architectures presented in Figures 1-4 and Supplementary Figure S3, the layer-to-layer distances were taken as 120λ.
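The secondary-wave superposition of Eqs. (2)-(4) can be sketched as a direct summation over sampled neurons; this is an illustrative toy (function names and the 16 × 16 grid are ours, and an FFT-based convolution would be used in practice for the full-size networks).

```python
import numpy as np

WAVELENGTH = 0.75e-3       # ~0.4 THz
DX = 0.67 * WAVELENGTH     # diffractive neuron size / sampling pitch

def rs_kernel(x, y, z):
    """Rayleigh-Sommerfeld secondary-wave kernel w(x, y, z) of Eq. (2)."""
    r = np.sqrt(x**2 + y**2 + z**2)
    return (z / r**2) * (1.0 / (2 * np.pi * r) + 1.0 / (1j * WAVELENGTH)) \
        * np.exp(1j * 2 * np.pi * r / WAVELENGTH)

def propagate(field, z):
    """Propagate a sampled complex field by an axial distance z as the
    superposition of secondary waves from every neuron (Eqs. (3)-(4)),
    up to the discretization area factor DX**2."""
    n = field.shape[0]
    coords = (np.arange(n) - n // 2) * DX
    xs, ys = np.meshgrid(coords, coords, indexing="ij")
    out = np.zeros((n, n), dtype=complex)
    for q in range(n):
        for p in range(n):
            a = field[q, p]
            if a != 0:  # sum secondary waves only from illuminated neurons
                out += a * rs_kernel(xs - coords[q], ys - coords[p], z)
    return out * DX**2
```

For a single on-axis point source, the propagated field is strongest directly above the source and decays toward the edges of the grid, as expected from the obliquity and 1/r factors in Eq. (2).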

Physical architecture of the diffractive permutation networks:
The axial distance between the first diffractive layer and the input FOV was set to 53.3λ, which is also equal to the axial distance from the last diffractive layer to the output plane, preserving the symmetry of the system along the longitudinal axis. In the case of our experimentally validated diffractive design (Figure 5), on the other hand, the active modulation surface of the fabricated diffractive layers extends 66.7λ in both the x and y directions; accordingly, the layer-to-layer distances were taken as 60λ while the remaining distances were kept at 53.3λ. During the deep learning-based training of all of these diffractive permutation networks, while the light fields, the diffractive layers, and the impulse response of free space were all sampled at a rate of ∼0.67λ, the spatial grid/pixel size of a given desired permutation operation was set to be larger. Specifically, the permutation pixel size was taken as 2λ × 2λ for the diffractive networks shown in Figures 1-3. On the other hand, the input and output pixel size was chosen as 4λ × 4λ for the vaccinated and nonvaccinated diffractive permutation networks shown in Figure 4, and finally, the pixel size was set to 8λ × 8λ for the fabricated diffractive permutation network model depicted in Figure 5.
To train the presented diffractive permutation networks, a structural loss function, ℒ, in the form of the mean squared error (MSE) was used:

ℒ = (1/N_o) Σ_s (σ I_out[s] − (P I_in)[s])².   (5)

In Eq. (5), I_in[s] and I_out[s] denote the lexicographically ordered, vectorized counterparts of the input intensity pattern, i.e., vec(I_in[q, p]), and the output intensity pattern, i.e., vec(I_out[q, p]), and P represents the desired permutation matrix to be performed all-optically. As depicted in Eq. (5), the output intensity pattern I_out[s] or I_out[q, p] was scaled by a constant σ that was calculated at each training iteration as the least-squares-optimal scale,

σ = (Σ_s I_out[s] (P I_in)[s]) / (Σ_s I_out[s]²).   (6)

Note that the presented diffractive permutation networks preserve the relative intensity levels. Stated differently, our training forward model aims to keep the intensity levels over the output and input pixels the same up to a single multiplicative constant, σ.
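The scale-invariant loss of Eqs. (5) and (6) can be sketched as follows, assuming σ is the least-squares-optimal constant; the function name and the vectorized interface are ours.

```python
import numpy as np

def scaled_mse_loss(i_out, target):
    """Structural MSE loss of Eq. (5) between a vectorized diffractive output
    i_out and the desired permuted intensities (target = P @ i_in), after
    scaling the output by the least-squares-optimal constant sigma (Eq. (6))."""
    sigma = float(np.dot(i_out, target) / np.dot(i_out, i_out))
    loss = np.mean((sigma * i_out - target) ** 2)
    return loss, sigma
```

Because of the σ scaling, an output that matches the target up to a global intensity factor incurs zero loss, which is exactly the "relative intensity" behavior described above.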
To improve the diffraction efficiency of the diffractive permutation networks, we defined another loss function, ℒ′, that is a linear combination of two penalty terms, ℒ′ = ℒ + α ℒ_e, where ℒ corresponds to the structural loss defined in Eq. (5) and ℒ_e is a penalty term that promotes higher diffraction efficiency at the output of the diffractive network, defined as ℒ_e = e^(−η), where the diffraction efficiency η is the ratio of the total optical power within the output FOV to the total power at the input FOV. The diffractive permutation networks presented in Figures 1-4 were trained based on ℒ′ with α = 0; the experimentally demonstrated diffractive permutation network model, however, was trained with α = 0.15, resulting in an output diffraction efficiency of 2.45%, which includes the material absorption losses. Supplementary Figure S3 further demonstrates the diffraction efficiency and the SSIM values provided by various diffractive permutation network models trained with different α values.
The supervised deep learning-based training of the presented diffractive permutation networks evaluates the loss function ℒ′ for a batch of randomly generated input patterns, computes the mean gradient, and updates the learnable, auxiliary variables, h_a, that determine the material thickness over each diffractive neuron, h, through the relation

h = (h_m/2)(sin(h_a) + 1) + h_b,

where h_m and h_b denote the maximum modulation thickness and the base material thickness, respectively. For all the diffractive permutation networks presented in Figures 1-4 and Supplementary Figure S3, h_m was taken as 2λ. In the design of the 3D-printed diffractive permutation network, however, h_m was set to 1.66λ to restrict the material thickness contrast between neighboring diffractive features. The value of h_b was taken as 0.66λ for all the presented designs, including the fabricated diffractive network.
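The mapping from an unconstrained auxiliary variable to a bounded physical thickness can be sketched as below; the sinusoidal squashing is one common differentiable choice for this kind of relation and is our assumption here, with h_m and h_b given in units of the wavelength.

```python
import numpy as np

H_M = 2.0   # maximum modulation thickness (in wavelengths); 1.66 for the 3D-printed design
H_B = 0.66  # base material thickness (in wavelengths)

def thickness(h_a):
    """Map an unconstrained, trainable auxiliary variable h_a to a physical
    material thickness h confined to [H_B, H_B + H_M], via a differentiable
    sinusoidal squashing (assumed form of the relation in the text)."""
    return 0.5 * H_M * (np.sin(h_a) + 1.0) + H_B
```

Since sin(·) is bounded, gradient updates on h_a can never drive the thickness outside the fabricable range.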

Computation of P_D2NN, optical transformation errors and performance quality metrics:
For a given diffractive permutation network design trained to optically implement a permutation matrix P of size N_i × N_o, there are two different ways to compute the permutation operation predicted by its numerical forward model. The first is to propagate N different randomly generated, independent inputs with N ≥ N_i N_o and solve a linear system of equations to reveal the entries of P_D2NN. Alternatively, each pixel at the input FOV can be turned on sequentially, and the output intensity pattern synthesized by the diffractive optical permutation network in response to each pixel provides one unique column of P_D2NN. These two procedures, in general, result in two different P_D2NN matrices that closely resemble each other. We opted to use the latter procedure due to its simplicity: each input pixel is turned on one at a time, and the corresponding output intensity pattern, after vectorization, forms a column of P_D2NN. Following the calculation of P_D2NN predicted by the forward model of a trained diffractive permutation network, it was scaled with a multiplicative constant, σ_P, to account for the optical losses:

σ_P = (Σ_(m,n) P[m,n] P_D2NN[m,n]) / (Σ_(m,n) P_D2NN[m,n]²).

The all-optical transformation error can then be computed as ‖P − σ_P P_D2NN‖². Denoting the lexicographically ordered, vectorized version of a 2D input intensity pattern by I_in[s], the ground-truth output intensity is given by P I_in[s]. The PSNR between this ground-truth vector and the output vector synthesized by the forward optical operation of a given, trained diffractive network, I_out[s], can be calculated as

PSNR = 10 log₁₀(1 / ((1/N_o) Σ_s (σ I_out[s] − (P I_in)[s])²)),

where σ is the multiplicative constant defined in Eq. (6). The SSIM values were calculated using the built-in function in TensorFlow, i.e., tf.image.ssim, where the two inputs were the 2D versions of P I_in[s] and I_out[s], representing the ground-truth image and the permuted, all-optical output signal, respectively.
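The column-by-column procedure described above can be sketched as follows; here `forward` is a stand-in for the (simulated) optical forward model of the diffractive network, and the helper name is ours.

```python
import numpy as np

def estimate_permutation(forward, n_i):
    """Estimate P_D2NN column by column: turn on one input pixel at a time,
    record the (vectorized) output intensity pattern produced by `forward`,
    and stack the responses as columns of the estimated matrix."""
    cols = []
    for s in range(n_i):
        e = np.zeros(n_i)
        e[s] = 1.0                 # a single input pixel turned on
        cols.append(forward(e))    # vectorized output intensity
    return np.stack(cols, axis=1)

# Usage: for an ideal linear forward model, the procedure recovers P exactly.
P = np.eye(4)[::-1]               # a toy 4 x 4 permutation (row reversal)
P_est = estimate_permutation(lambda v: P @ v, 4)
```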
All the parameters of tf.image.ssim were set to their default values, except that the size of the Gaussian filter was set to 5 × 5 instead of 11 × 11, and the width of the Gaussian filter was set to 0.75.

Vaccination of diffractive permutation networks
The v-D2NN framework aims to design diffractive optical networks that are resilient against physical error sources, e.g., misalignments, by modeling these factors as random variables and incorporating them into the forward training model. In the training forward model of the vaccinated diffractive networks shown in Figure 4, four physical error components were modeled, representing the misalignment of each diffractive layer with respect to its ideal location and orientation/angle. The first three components represent the statistical variations in the location of each diffractive layer in 3D space. Let the ideal location of a diffractive layer, l, be denoted by the vector X_l = (x_l, y_l, z_l); at each training iteration i, the v-D2NN framework perturbs X_l with a random displacement vector,

D_(l,i) = (d_(x,l,i), d_(y,l,i), d_(z,l,i)),   (12)

where d_(x,l,i) ∼ U(−Δ_x, Δ_x), d_(y,l,i) ∼ U(−Δ_y, Δ_y) and d_(z,l,i) ∼ U(−Δ_z, Δ_z) are independent, uniformly distributed random variables. During the training, for each batch of input images, the 3D displacement vector D_(l,i) is updated and, accordingly, the location of the layer l is set to X_(l,i) = X_l + D_(l,i), building up robustness to physical misalignments.
Beyond the displacement of the diffractive layers, the physical forward model of a diffractive network is also susceptible to variations in the orientation of the diffractive layers. Ideally, one should include all three rotational components, i.e., yaw, pitch, and roll; in this study, however, we only considered the yaw component, since in our experimental systems the pitch and roll can be controlled with high precision. The random angle representing the rotation of a diffractive layer l around the optical axis was defined as θ_(l,i) ∼ U(−Δ_θ, Δ_θ). With the three shift components depicted in Eq. (12) and the statistical yaw variation modeled through θ_(l,i), the vaccinated diffractive networks shown in Figure 4 were trained to build resilience against these four misalignment components. The values of Δ_x, Δ_y, Δ_z, and Δ_θ determining the misalignment tolerance range were defined as a function of a common misalignment parameter.
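The per-iteration sampling of the four misalignment components can be sketched as a small NumPy helper; the function and argument names are ours, and in the actual training loop one such realization would be drawn per layer per batch.

```python
import numpy as np

def sample_misalignment(dx, dy, dz, dtheta, rng=None):
    """Draw one random misalignment realization for a diffractive layer, in the
    spirit of the v-D2NN training loop: three shifts (the displacement vector
    D of Eq. (12)) and a yaw angle about the optical axis, each uniformly
    distributed over its tolerance range (+/- dx, dy, dz, dtheta)."""
    rng = np.random.default_rng() if rng is None else rng
    shift = rng.uniform(low=[-dx, -dy, -dz], high=[dx, dy, dz])
    yaw = float(rng.uniform(-dtheta, dtheta))
    return shift, yaw
```

Re-sampling these variables for every batch exposes the network to a different physical realization of the system at each update, which is what builds the misalignment tolerance.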

Training details:
The deep learning-based training of the diffractive permutation networks was implemented using Python (v3.6.5) and TensorFlow (v1.15.0, Google Inc.). The backpropagation updates were calculated using the Adam optimizer [78], and its parameters were taken as the default values in TensorFlow and kept identical in each model. The learning rates of the diffractive optical networks were set to be 0.001. The training batch size was taken as 75 during the deep learning-based training of the presented diffractive permutation networks. For the training of the diffractive permutation networks, we generated ∼4.7 million random intensity patterns, providing us 93,750 iterations/error-backpropagation updates per epoch. The training of a 5-layer diffractive permutation network with 40K diffractive neurons per layer for 5 epochs using this randomly created training dataset takes approximately 4 days using a computer with a GeForce GTX 1080 Ti Graphical Processing Unit (GPU, Nvidia Inc.) and Intel ® Core™ i7-8700 Central Processing Unit (CPU, Intel Inc.) with 64 GB of RAM, running Windows 10 operating system (Microsoft).
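The random training patterns with varying sparsity levels mentioned above can be generated along the following lines; this is an illustrative sketch (the sparsity range and generator are our assumptions, not the authors' exact dataset code).

```python
import numpy as np

def random_intensity_pattern(n=20, sparsity=None, rng=None):
    """Generate one random n x n input intensity pattern. `sparsity` is the
    approximate fraction of pixels forced to zero; drawing it at random for
    every pattern exposes the network to inputs with different sparsity
    levels during training, helping to avoid data-specific overfitting."""
    rng = np.random.default_rng() if rng is None else rng
    if sparsity is None:
        sparsity = rng.uniform(0.0, 0.9)  # vary sparsity across the dataset
    pattern = rng.uniform(0.0, 1.0, size=(n, n))
    mask = rng.uniform(size=(n, n)) >= sparsity  # keep ~(1 - sparsity) pixels
    return pattern * mask
```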