🤖 AI Summary
End-to-end co-optimization of optical–electronic convolutional neural networks (CNNs) suffers from prohibitive simulation costs and an intractably large parameter space. Method: We propose a two-stage co-design framework: first train a purely electronic CNN, then directly optimize the optical front-end, modeled as a metasurface array, by fitting it to the network's convolutional kernels. Decoupling the optical and electronic optimization avoids the instability of joint training and drastically reduces computational and memory overhead. Contribution/Results: The key contribution is the tight integration of physically realizable metasurface modeling with data-driven kernel optimization, enabling efficient co-design of optical front-ends and electronic back-ends. On monocular depth estimation, the method achieves twice the accuracy of end-to-end approaches under identical training resources.
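A minimal sketch of the second stage, under stated assumptions: the class name `MetasurfacePSF`, the toy Fourier-optics model, and all hyperparameters below are illustrative placeholders rather than the authors' simulator or code. The idea it demonstrates is the one described above: freeze a trained electronic CNN, then optimize a differentiable metasurface model so its simulated point-spread functions (PSFs) reproduce that CNN's first-layer kernels.

```python
# Hypothetical Stage-2 sketch: fit a differentiable metasurface model so that its
# simulated PSFs match the first-layer kernels of an already-trained electronic CNN.
# The optics model below is a simplified stand-in, not the paper's simulator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetasurfacePSF(nn.Module):
    """Toy differentiable optics: a learnable phase profile per kernel -> intensity PSF.
    A real metasurface simulator would use full wave propagation and fabrication
    constraints; a single FFT-based step stands in for that physics here."""
    def __init__(self, num_kernels: int, aperture: int = 32, kernel_size: int = 7):
        super().__init__()
        # Small random init so gradients are not stuck at a symmetric point.
        self.phase = nn.Parameter(0.1 * torch.randn(num_kernels, aperture, aperture))
        self.kernel_size = kernel_size

    def forward(self) -> torch.Tensor:
        pupil = torch.exp(1j * self.phase)                       # unit-amplitude pupil
        psf = torch.fft.fftshift(torch.fft.fft2(pupil), dim=(-2, -1)).abs() ** 2
        psf = psf / psf.sum(dim=(-2, -1), keepdim=True)          # energy-normalized
        c, h = psf.shape[-1] // 2, self.kernel_size // 2
        return psf[..., c - h:c + h + 1, c - h:c + h + 1]        # crop to kernel size

# Stage 1 (assumed already done): a trained electronic CNN. A freshly initialized
# Conv2d stands in for its first layer purely so the sketch runs end to end.
conv1 = nn.Conv2d(1, 16, kernel_size=7, bias=False)
target_kernels = conv1.weight.detach().squeeze(1)                # (16, 7, 7)

optics = MetasurfacePSF(num_kernels=target_kernels.shape[0])
opt = torch.optim.Adam(optics.parameters(), lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    # Note: intensity PSFs are non-negative, while trained kernels are signed; a
    # practical design must handle the sign (e.g. by differencing two PSFs),
    # which is omitted here for brevity.
    loss = F.mse_loss(optics(), target_kernels)
    loss.backward()
    opt.step()
```

Only the small optics model is optimized in this stage, which is what keeps the memory and compute cost far below differentiating through the optics and the full CNN jointly.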
📝 Abstract
Opto-electronic neural networks pair optical front-ends with electronic back-ends to enable fast, energy-efficient vision. Conventional end-to-end optimization of both modules, however, is limited by costly optical simulations and a large parameter space. We introduce a two-stage strategy for designing opto-electronic convolutional neural networks (CNNs): first, train a standard electronic CNN; then realize the optical front-end, implemented as a metasurface array, by directly optimizing it to reproduce the CNN's first convolutional layer. Compared with end-to-end optimization, this approach reduces computational and memory demands by hundreds of times and improves training stability. On monocular depth estimation, the two-stage design achieves twice the accuracy of end-to-end training under the same training time and resource constraints.
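For completeness, a hedged usage-level sketch of how the two pieces compose at inference: the function and variable names (`optical_frontend`, `electronic_backend`, `fitted_psfs`) are hypothetical, and the convolution below only emulates what the metasurface array would perform in free space before the sensor.

```python
# Hypothetical inference-time composition: the metasurface performs the first
# convolution optically; the electronic back-end processes the sensor readout.
import torch
import torch.nn.functional as F

def optical_frontend(scene: torch.Tensor, psfs: torch.Tensor) -> torch.Tensor:
    # scene: (B, 1, H, W); psfs: (C, k, k). In hardware this step happens in
    # optics; here it is emulated so the full pipeline can be tested in software.
    return F.conv2d(scene, psfs.unsqueeze(1), padding=psfs.shape[-1] // 2)

# features = optical_frontend(scene, fitted_psfs)   # sensor measurement, (B, C, H, W)
# depth    = electronic_backend(features)           # remaining trained CNN layers
```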