Single-Photon 3D Imaging with Equi-Depth Photon Histograms
Kaustubh Sadekar, David Maier, Atul Ingle
ECCV 2024
Single-photon cameras present a promising avenue for high-resolution 3D imaging. They have ultra-high sensitivity, down to individual photons, and can record photon arrival times with extremely high (sub-nanosecond) resolution. Single-photon 3D cameras (SPCs) estimate the round-trip time of a laser pulse by forming equi-width (EW) histograms of detected photon timestamps over multiple laser cycles. Acquiring and transferring such EW histograms requires high bandwidth and in-pixel memory, making SPCs less attractive for 3D-perception applications in resource-constrained settings such as mobile devices and AR/VR headsets. Here we propose a new 3D sensing technique based on equi-depth (ED) histograms. ED histograms are a more concise representation of timestamp data than EW histograms and reduce the bandwidth requirement. Moreover, to reduce the in-pixel memory requirement, we propose a lightweight algorithm to estimate ED histograms in an online fashion without explicitly storing the photon timestamps. This algorithm is amenable to future in-pixel implementations. We propose algorithms that process ED histograms to perform 3D computer-vision tasks of estimating scene distance maps and performing visual odometry under challenging conditions such as high ambient light. Our work paves the way towards lower bandwidth and reduced in-pixel memory requirements for SPCs, making them attractive for resource-constrained 3D vision applications.
Our proposed DeePEDH pipeline enables SPCs to be used in resource-constrained applications. SPCs generate large amounts of raw timestamp data, requiring high in-pixel memory and causing a data bottleneck. (a–d) Conventional 3D SPCs resort to low-resolution equi-width (EW) histograms, resulting in poor distance resolution. (e–g) DeePEDH uses a more efficient on-sensor compression scheme through equi-depth (ED) histograms combined with a deep neural network distance map estimator. (h) The proposed method provides accurate high-resolution distance maps with 10–100× lower bandwidth. (i–k) DeePEDH paves the way for various computer vision tasks to be performed on resource-constrained devices using SPCs for 3D sensing.
Equi-width vs. equi-depth histograms for peaky data. (a) Most bins of an 8-bin EW histogram are wasted on background photons; a single bin B2 gives a coarse estimate of the true peak. (b) An ED histogram captures this “peaky” transient distribution better with a cluster of narrower bins around the true peak.
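The contrast between the two histogram types can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian laser return on top of uniform background photons, and it computes ED boundaries in batch from stored timestamps using quantiles (the online algorithm described in the abstract avoids storing timestamps). All function names and parameter values are hypothetical.

```python
import numpy as np

def simulate_timestamps(n_signal, n_background, peak, sigma, t_max, seed=0):
    """Assumed toy model: Gaussian laser return plus uniform ambient photons."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(peak, sigma, n_signal)
    background = rng.uniform(0.0, t_max, n_background)
    t = np.concatenate([signal, background])
    return t[(t >= 0.0) & (t <= t_max)]

def ew_edges(t_max, n_bins):
    """Equi-width: bin boundaries evenly spaced over the time range."""
    return np.linspace(0.0, t_max, n_bins + 1)

def ed_edges(timestamps, n_bins):
    """Equi-depth: boundaries at quantiles, so bins hold roughly equal counts."""
    return np.quantile(timestamps, np.linspace(0.0, 1.0, n_bins + 1))

T_MAX = 100.0  # time range in arbitrary units (assumption)
ts = simulate_timestamps(2000, 2000, peak=30.0, sigma=0.5, t_max=T_MAX)

ew = ew_edges(T_MAX, 8)
ed = ed_edges(ts, 8)
ew_counts, _ = np.histogram(ts, bins=ew)
ed_counts, _ = np.histogram(ts, bins=ed)

print("EW bin widths:", np.diff(ew))
print("EW counts:    ", ew_counts)
print("ED bin widths:", np.round(np.diff(ed), 2))
print("ED counts:    ", ed_counts)
```

In this toy setting every EW bin has the same width and most of them collect only background photons, whereas the ED boundaries crowd together near the simulated peak and the per-bin counts stay roughly equal.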
A binner element used for tracking ED histogram boundaries. (a) A median-finding binner splits the incident photons into early (En) and late (Ln) photons relative to the control value (CV) Cn, which is the binner's current estimate of the median. In this case, there are more early photons than late ones, hence the step size Sn is negative to move the CV left. (b) Our optimized stepping strategy uses a sequence of step sizes to achieve faster convergence and lower variance compared to the earlier fixed-stepping binner design.
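The median-tracking idea in this caption can be sketched in a few lines. This is a hedged illustration only: the update rule below moves the control value by one step in the direction of the majority of photons in each laser cycle, and the decaying step schedule is an assumption chosen for the example, not the optimized sequence from the paper. The simulation model and all names are hypothetical.

```python
import numpy as np

def track_median(cycles, c0, steps):
    """Online median tracking without storing timestamps.

    cycles: iterable of 1D arrays, the photon timestamps seen in each laser cycle.
    c0:     initial control value (CV).
    steps:  positive step sizes, one per cycle.
    """
    c = float(c0)
    for photons, step in zip(cycles, steps):
        early = int(np.sum(photons < c))  # photons arriving before the CV
        late = int(np.sum(photons > c))   # photons arriving after the CV
        # More early photons -> negative step (CV moves left), and vice versa.
        c += step * float(np.sign(late - early))
    return c

def simulate_cycles(n_cycles, photons_per_cycle, peak, sigma, t_max, seed=0):
    """Assumed toy model: each photon is signal or background with equal odds."""
    rng = np.random.default_rng(seed)
    shape = (n_cycles, photons_per_cycle)
    is_signal = rng.random(shape) < 0.5
    signal = rng.normal(peak, sigma, shape)
    background = rng.uniform(0.0, t_max, shape)
    return np.where(is_signal, signal, background)

N_CYCLES = 5000
cycles = simulate_cycles(N_CYCLES, 3, peak=30.0, sigma=1.0, t_max=100.0)
true_median = float(np.median(cycles))

# Fixed stepping: the same step size in every cycle.
fixed_steps = np.full(N_CYCLES, 0.5)
# Hypothetical decaying schedule: large early steps, small late steps.
decay_steps = np.maximum(8.0 / np.sqrt(1.0 + np.arange(N_CYCLES)), 0.05)

c_fixed = track_median(cycles, c0=50.0, steps=fixed_steps)
c_decay = track_median(cycles, c0=50.0, steps=decay_steps)

print("true median:", round(true_median, 2))
print("fixed steps:", round(c_fixed, 2))
print("decay steps:", round(c_decay, 2))
```

The tracker keeps only the control value and two counters per cycle, which is what makes this style of update plausible for in-pixel use. A schedule with large early steps and small late steps trades fast initial movement for lower jitter once the CV is near the median.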
Comparison of distance map reconstructions on NYUv2 test images. The CSPH algorithm, HEDH (narrowest bin), and PEDH (narrowest bin) methods produce noisy distance maps in darker regions and at scene points farther from the camera. The 32-bin EWH suffers from quantization artifacts. The proposed DNN-based DeePEDH method learns the spatiotemporal correlations in the PEDH output to provide both high spatial and distance resolution even in regions where other methods fail.
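A narrowest-bin estimate of the kind named in this caption can be sketched as follows. This is an illustrative assumption about how such a baseline might work, not the paper's exact estimator: it takes the midpoint of the narrowest ED bin as the round-trip time and converts it to distance with d = c·t/2. The boundary values and function name are made up for the example, with timestamps assumed to be in nanoseconds.

```python
import numpy as np

C_M_PER_NS = 0.299792458  # speed of light in metres per nanosecond

def narrowest_bin_distance(ed_boundaries_ns):
    """Estimate distance from ED histogram boundaries (in ns).

    Assumption: the narrowest bin sits on the laser return, and its
    midpoint approximates the round-trip time of flight.
    """
    edges = np.asarray(ed_boundaries_ns, dtype=float)
    widths = np.diff(edges)
    k = int(np.argmin(widths))                  # index of the narrowest bin
    tof_ns = 0.5 * (edges[k] + edges[k + 1])    # midpoint of that bin
    return C_M_PER_NS * tof_ns / 2.0            # halve for the round trip

# Hypothetical 8-bin ED boundaries clustered around a return near 30 ns.
edges = [0.0, 25.0, 29.6, 29.9, 30.3, 30.8, 50.0, 75.0, 100.0]
distance_m = narrowest_bin_distance(edges)
print(f"estimated distance: {distance_m:.2f} m")
```

Because this estimate relies on a single bin per pixel, it is sensitive to noise in the boundary positions, which is consistent with the noisy maps the caption reports for narrowest-bin methods in dark or distant regions.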
DeePEDH SPCs enable better camera tracking and 3D reconstruction. (a) The camera motion trajectory reconstructed from DeePEDH distance maps (green) closely tracks the ground truth (green) and provides >10× lower RMSE than 32-bin EWH (blue). (b) 3D surface reconstruction obtained using a 32-bin EWH suffers from severe quantization artifacts. The proposed 32-bin DeePEDH method generates high-quality 3D reconstructions, both qualitatively and in terms of quantitative metrics. (c) Semantic segmentation results for a pre-trained CEN network [42] using distance estimates from 32-bin EWH and 32-bin DeePEDH. As the distance maps from DeePEDH are significantly closer to the ground truth, the segmentation results are better than those obtained using distance images from 32-bin EWH.
This project is supported in part by funding from the US National Science Foundation (NSF Grant # ECCS-2138471).