Monocular depth sensing using metalens

: 3-D depth sensing is essential for many applications ranging from consumer electronics to robotics. Passive depth sensing techniques based on a double-helix (DH) point-spread-function (PSF) feature high depth estimation precision, minimal power consumption, and reduced system complexity compared to active sensing methods. Here, we propose and experimentally implemented a polarization-multiplexed DH metalens designed using an autonomous direct search algorithm, which utilizes two contra-rotating DH PSFs encoded in orthogonal polarization states to enable monocular depth perception. Using a reconstruction algorithm that we developed, concurrent depth calculation and scene reconstruction with minimum distortion and high resolution in all three dimensions were demonstrated.


Introduction
Conventional optical imaging systems map 3-D scene to a flat image plane at the cost of losing depth information.The missing knowledge of object distances, however, is crucial to a variety of applications spanning autonomous driving, object recognition, gesture control, virtual/augmented reality, etc.Multiple active depth sensing mechanisms have convolution of the scene with the DH PSF, and thus the depth information can only be estimated with prior knowledge of the original object.Colburn et al. coupled a DH metasurface with an extended depth-of-focus (EDOF) metasurface in a binocular setting to resolve this ambiguity [41].Multiplexing presents a way to combine the two metasurfaces into one aperture to realize monocular depth estimation (MDE) [42].Along this line, MDE was recently demonstrated with a decoupled pair of conjugate single-helix PSFs [43].
In this paper, we demonstrate a polarizationmultiplexed DH metasurface design for MDE using a single metasurface.Two DH PSFs with opposite rotating directions are each encoded with a linear polarization.Importantly, the focal point rotation angles of the two PSFs always add up to 90 • , a feature that allows computationally efficient and unambiguous reconstruction of both the depth and image.As one specific example, we experimentally implemented the design at 635 nm wavelength within the depth range of 45-212 mm and rotation angles of up to 80 • .This concept is generically applicable to other wavelengths and depth ranges.

Design principle of metalens with contra-rotating DH PSFs
The DH phase mask is constructed based on superposition of Laguerre-Gaussian modes [44][45][46], it modifies the wavefront emitted from a point-source to generates two foci on the image plane, where orientation of the line connecting the foci depends on the point source distance.Rotation rate and depth range are determined by the choice of the Laguerre-Gaussian mode set.In previous implementations, a block-iterative weighted projection algorithm was utilized to optimize the DH phase mask and suppress the focal spot sidelobes, thereby improving depth estimation accuracy [41,46,47].However, the optimization procedure is a time-consuming empirical process which requires constant human intervention.Here, we use a direct search (DS) algorithm [48][49][50][51][52] to optimize the DH phase mask.the relationship between rotation angle and distance using the same notations as in Ref. [41], with V 1 and  0 being free parameters to control the rotation rate and depth range.
With a DH phase mask, an image formed through the metalens is the convolution of the DH PSF and the object.Without prior knowledge of the object, it is not possible to extract the true object information from a single-shot image.We eliminate this ambiguity by multiplexing two DH phase masks with opposite rotation directions into a single metalens.The two phase masks share an identical layout albeit with their x and y coordinates swapped, and therefore only one phase mask design is required.After acquiring two images corresponding to the two polarization states, a series of deconvolution operation is performed on them assuming DH PSFs of varying rotation angles.Since the two images capture the same object, their deconvolved outcome should be identical provided that a correct rotation angle is used.Therefore, by identifying the deconvolved image pair with maximum similarity, the rotation angle and hence object depth can be unambiguously determined.More details of the depth retrieval algorithm are discussed in Section 4. Further improvement of the design is possible leveraging the rise of end-to-end optimization framework in recent years, which provides an alternative approach to optimize both the meta-optical frontend and the reconstruction algorithm [53][54][55].
An approach similar to that in Ref. [56] is employed to design polarization-multiplexed meta-atoms.The metaatom structure is schematically depicted in Figure 1(a), comprising a 450 nm thick rectangular amorphous silicon nanopost sitting on a fused silica substrate.The geometries of the nano-posts are designed according to the polarization directions of the incident light.The meta-atom pitch is fixed at 300 nm.The finite-difference time-domain (FDTD) method is used to analyze the meta-atom responses.The phase delay and transmittance of the meta-atoms under x-polarized incident light are shown in Figure 1(b) and (c).Data pertinent to y-polarized light can be trivially obtained by swapping the x and y coordinates.The phase difference between two polarization states is shown in Figure 1 [57] containing 16 meta-atom structures of different lateral dimensions is chosen to allow independent control of the metalens' phase profiles in both polarization states.The meta-atom dimensions are listed in the Appendix.

Metalens fabrication and PSF characterization
As a proof of concept, we designed a metalens with 1 mm aperture size and 5 mm focal length.Parameters in Eq. ( 4) are taken as  0 = 125,  0 = 180 • and V 1 = 2, identical to those in Ref. [41].The DS optimization was performed with N = 41 in Eq. (1), i.e. with 41 evenly spaced discrete rotation angles at a step size of 2 • .The designed metalens has a rotation angle up to 80 • , corresponding to a depth sensing range of 45-212 mm.The so-designed metalens was fabricated through electron-beam lithography followed by reactive-ion etching.The PSF of the fabricated metalens was first characterized.Figure 3 compares the simulated and measured PSFs with different rotation angles.In the simulation, Kirchhoff diffraction integral [58] is used to transform the near-field wavefront after exiting the metasurface to the intensity distribution on the image plane.During the PSF measurement, a monochrome micro-LED display (Jade Bird Display 5000DPI AMuLED Panel) is placed in front of the metalens at different distances, and a 40 μm diameter circular spot is displayed to emulate a point object.A telescope assembly is placed between the DH metalens and an image sensor (Arducam MT9J001) with a calibrated magnification of 5. A polarizer is mounted in front of the sensor to control the polarization state of the incident light.As seen from Figure 3, excellent agreement is obtained between our design and experiment throughout the entire depth range.

Concurrent imaging and depth mapping demonstration
The experimental setup consists of the micro-LED display projecting a '+'-shaped pattern shown in Figure 4(a) and placed at varying distances.The images captured by the DH metalens are shown in Figure 4(b) and (c) for the two polarization states, respectively.To extract depth information and reconstruct the scene, these images are deconvolved using a set of PSF pairs with different rotation angles.Since the two phase masks corresponding to orthogonal polarizations are linked via a reflection transformation, the sum of rotation angles of every PSF pair must equal 90 • .We use Wiener deconvolution given in Eq. ( 5), where H is the Fourier transform of the image formed by the lens, G gives the Fourier transform of DH PSF (i.e., a cosine function), F denotes the Fourier transform of the deconvolved image, and SNR = 0.1 is the signal-to-noise ratio, which is an intrinsic parameter of the image sensor.
Three pairs of deconvolved images in two polarization states are shown in Figure 4(d) and (e), each assuming a different rotation angle.We then computed the image correlation map between the image pair using Eq. ( 6), where 'corr' stands for the image correlation map, and h 1 and h 2 represent the deconvolved image pair.We further define the similarity parameter as the maximum value within the image correlation map, and Figure 4(f) plots the parameter as a function of the rotation angle.Since the pair of images depict the same object, the similarity curve should reach maximum when the rotation angles of the DH PSFs used in the deconvolution are correct.The object depth can then be inferred according to the correct rotation angles.
The protocol described above was applied to depth estimation of objects placed at different distances, and the measured depth values are shown as red dots in Figure 4(g).The analytical expression of Eq. ( 4) that our lens design is based on is also plotted as a solid line, showing excellent agreement.
To further characterize the lateral spatial resolution of the DH metalens, we replaced the '+' pattern on the micro-LED display with a standard USAF resolution chart.As an example, the captured images under x and y polarized light are shown in Figure 5(a) and (b) for an object distance of 5.5 mm.The same deconvolution algorithm was performed to reconstruct the scene shown in Figure 5(c).The modulation transfer function (MTF) at different spatial frequencies was obtained from the image contrast of the reconstructed resolution chart.In addition to the direct MTF measurement, we also evaluated the MTF from Fourier transform of the measured DH PSFs.Both sets of results are plotted in Figure 5(d) with excellent agreement, which assures accuracy of our reconstruction algorithm.
Lastly, real world imaging was demonstrated using letters of 'M', 'I', and 'T' printed on card boards.The letters were displaced at different distances as shown in   The aforementioned algorithm was implemented to infer the depth of the objects, with the caveat that oblique incidence onto the metalens (which leads to additional phase delay) was accounted for and corrected following procedures outlined in Ref. [41].The extracted depths are shown in Figure 6(d), which agrees well with the ground truth.The average error of depth estimation is 2.7%.

Conclusions
We demonstrated a monocular metalens design capable of perform both depth sensing and scene reconstruction concurrently.The design leverage polarization-multiplexing to encode two phase masks, each generating an optimized DH PSF.Our design ensures that rotation angles of the two contra-rotating PSFs are always complementary, which enables unambiguous depth perception without prior knowledge of the scene.Compared to other depth sensing methods, our MDE approach features a single-aperture, compact footprint, high depth and lateral resolution, and passive operation.These advantages foresee vast potential applications of our technology in areas such as microscopy, medical imaging, virtue/augmented reality, automotive/robotic sensing and beyond.

Appendix A: Polarization-multiplexed meta-atom design
The meta-atom dimensions and their phase/transmittance responses are listed in Table 1.

Figure 1 :
Figure 1: Polarization-multiplexed meta-atom design.(a) Illustration of the meta-atom structure.(b) Phase delay and (c) transmittance of the meta-atoms with x-polarized incident light.(d) Phase delay difference between the two polarization states.

Figure 2
Figure 2 presents optical microscope and scanning electron microscope (SEM) images of the fabricated metalens, showing excellent uniformity and pattern fidelity.The PSF of the fabricated metalens was first characterized.Figure3compares the simulated and measured PSFs with different rotation angles.In the simulation, Kirchhoff diffraction integral[58] is used to transform the near-field wavefront after exiting the metasurface to the intensity distribution on the image plane.During the PSF measurement, a monochrome micro-LED display (Jade Bird Display 5000DPI AMuLED Panel) is placed in front of the metalens

Figure 3 :
Figure 3: Metalens PSF.(a) Simulation and (b) experimental measurement of PSFs for different source distances in the x polarization state.(c) Simulation and (d) experimental measurement of PSFs in the y polarization state.(scale bar: 20 μm).

Figure 4 :
Figure 4: Experimental demonstration of image deconvolution to enable concurrent depth mapping and scene reconstruction.(a) A '+' pattern on the micro-LED display emulates an object.(b, c) Images in the (b) x-polarization state and (c) y-polarization state with different object distances.(d) Deconvolved images of the object at 5.5 cm distance using DH PSF rotation angles of 90 • , 110 • , and 130 • (left to right), respectively.(e) Deconvolved images of the object at 5.5 cm distance with DH PSF rotation angles of 0 • , −20 • , and −40 • (left to right), respectively.(f) Similarity of image pairs deconvolved using different DH PSF rotation angles.The maxmium point corresponds to the correct rotation angle.(g) Object depth estimation based on analytical expression (solid line) and experimental measurement (red dots).(Scale bar: 40 μm).

Figure 5 :
Figure 5: Imaging of the USAF target for evaluating the lateral resolution.Experimentally captured images in the (a) x-polarization and (b) y-polarization states.(c) Reconstructed image of the USAF resolution target pattern.(d) Measured MTF of the metalens at an object distance of 5.5 mm.The red dots correspond to MTF measured from the USAF target and the blue dots are MTF calculated from experimentally measured PSFs via Fourier transform.The solid line gives MTF inferred from PSFs simulated by diffraction integral.(Scale bar: 80 μm).

Figure 6 :
Figure 6: Experimental demonstration of depth sensing.(a) Photos of printed letters of 'M', 'I', and 'T', each placed at a different distance.(b, c) Captured images in (b) the x-polarization state and (c) the y-polarization state.(d) Inferred object distances (red dots) compared to the ground truth (solid line).(Scale bar: 80 μm).

Figure 6 (
Figure6(a).An LED light source with a center wavelength of 625 nm and 20 nm full-width-at-half-maximum (FWHM) spectral bandwidth was used to illuminate the scene, and a filter with 635 nm center wavelength and 10 nm FWHM bandwidth was placed in front of the image sensor to reject out-of-band light.The images recorded in the two polarization states are shown in Figure6(b) and (c).The aforementioned algorithm was implemented to infer the depth of the objects, with the caveat that oblique incidence onto the metalens (which leads to additional phase delay) was accounted for and corrected following procedures outlined in Ref.[41].The extracted depths are shown in Figure6(d), which agrees well with the ground truth.The average error of depth estimation is 2.7%.