🤖 AI Summary
This work addresses the irreversible loss of task-critical information that occurs when conventional spatial or temporal downsampling is used to meet bandwidth, latency, and power constraints. The authors propose the first task-driven, real-time foveated imaging system that allocates the pixel budget dynamically at capture time through a dual-stream sensor architecture: high-resolution sampling is concentrated on task-relevant regions while a low-resolution stream preserves global context. Formulating gaze control as a sensor attention policy-learning problem, they use reinforcement learning to close the loop between perception and acquisition. In simulation across multiple perception tasks, the approach significantly outperforms baselines operating at the same pixel bandwidth, and the authors demonstrate practical feasibility by capturing real-world video on a 200-megapixel dual-stream sensor.
📝 Abstract
Ultra-high-resolution image sensors offer the potential to capture fine spatial details critical for many visual perception tasks, but acquiring and processing all pixels at full resolution is often infeasible under realistic bandwidth, latency, and power constraints. Existing approaches address this challenge through acquisition strategies such as spatial or temporal downsampling, which irrevocably discard information before task relevance can be assessed. In this work, we introduce a real-time, predictive, and task-aware foveated imaging system that operates directly at image acquisition time. Leveraging emerging dual-stream sensor architectures, our method dynamically allocates limited pixel bandwidth to task-relevant regions of interest while maintaining a low-resolution global context. We formulate foveated acquisition as a sensor attention policy-learning problem, in which past observations guide actions that determine future measurements, closing the perception-acquisition loop. Through extensive simulation across multiple perception tasks, we demonstrate that our approach achieves high task performance under strict pixel budgets and significantly outperforms relevant baselines operating at the same bandwidth. We further validate our system on a 200-megapixel dual-stream sensor, capturing real-world videos under realistic bandwidth and latency constraints, demonstrating the practical feasibility of task-driven, acquisition-time foveated imaging.
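The acquisition loop described in the abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`capture_dual_stream`, `next_roi_center`, `pixels_read`), the strided subsampling used for the context stream, and the square foveal crop are all hypothetical, and the learned sensor attention policy is replaced by a trivial "look at the brightest context pixel" heuristic purely to show how past observations can steer the next measurement.

```python
import numpy as np


def capture_dual_stream(frame, roi_center, roi_size, context_stride):
    """Simulate one dual-stream readout from a full-resolution 2D frame.

    Hypothetical model: the context stream is a strided subsample of the
    whole field of view, and the foveal stream is a full-resolution crop.
    """
    h, w = frame.shape
    # Low-resolution global context: keep every `context_stride`-th pixel.
    context = frame[::context_stride, ::context_stride]
    # High-resolution fovea: square crop, clamped so it stays inside the frame.
    half = roi_size // 2
    top = int(np.clip(roi_center[0] - half, 0, h - roi_size))
    left = int(np.clip(roi_center[1] - half, 0, w - roi_size))
    fovea = frame[top:top + roi_size, left:left + roi_size]
    return context, fovea, (top, left)


def next_roi_center(context, context_stride):
    """Stand-in for the learned policy (assumption): pick the brightest
    context pixel and map it back to full-resolution coordinates."""
    r, c = np.unravel_index(np.argmax(context), context.shape)
    return int(r) * context_stride, int(c) * context_stride


def pixels_read(context, fovea):
    """Total number of pixels transferred for one dual-stream readout."""
    return context.size + fovea.size


# Toy scene: a dark 256x256 frame with one bright 8x8 target.
frame = np.zeros((256, 256), dtype=np.float32)
frame[200:208, 40:48] = 1.0

# Step 1: the fovea starts at the frame center and misses the target.
context, fovea, _ = capture_dual_stream(frame, (128, 128), 32, 8)

# Step 2: the context from step 1 decides where the next fovea goes.
center = next_roi_center(context, 8)
context, fovea, origin = capture_dual_stream(frame, center, 32, 8)

# Each readout transfers 32*32 + 32*32 = 2048 of the 65536 frame pixels.
budget = pixels_read(context, fovea)
```

In this toy setup the second readout places the fovea over the target while transferring about 3% of the full frame. A learned policy would replace `next_roi_center` and would be trained against the downstream task rather than against brightness.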