🤖 AI Summary
This work addresses the difficulty of modeling complex geometry and illumination in mobile computational photography. We propose compact, self-regularized neural field representations that jointly reconstruct scene geometry and lighting effects directly from raw smartphone measurements, such as RAW images and signals from other on-board sensors. Our method uses a coordinate-based network architecture and requires no labeled data, pretrained models, or hand-crafted preprocessing pipelines; instead, it solves the imaging inverse problem end-to-end via stochastic gradient descent. On three core tasks (depth estimation, layer decomposition, and image stitching), our approach outperforms existing state-of-the-art methods. The results indicate that neural fields can provide high-fidelity, implicit scene models from data captured in the wild with mobile devices, supporting their practical use in real-world mobile photography.
📝 Abstract
Over the past two decades, mobile imaging has experienced a profound transformation, with cell phones rapidly eclipsing all other forms of digital photography in popularity. Today's cell phones are equipped with a diverse range of imaging technologies (laser depth ranging, multi-focal camera arrays, and split-pixel sensors) alongside non-visual sensors such as gyroscopes, accelerometers, and magnetometers. This, combined with on-board integrated chips for image and signal processing, makes the cell phone a versatile, pocket-sized computational imaging platform. In parallel, recent years have shown how neural fields, small neural networks trained to map continuous spatial input coordinates to output signals, enable the reconstruction of complex scenes without explicit data representations such as pixel arrays or point clouds. In this thesis, I demonstrate how carefully designed neural field models can compactly represent complex geometry and lighting effects, enabling applications such as depth estimation, layer separation, and image stitching directly from mobile photography data collected in the wild. These methods outperform state-of-the-art approaches without relying on complex pre-processing steps, labeled ground truth data, or machine learning priors. Instead, they leverage well-constructed, self-regularized models that tackle challenging inverse problems through stochastic gradient descent, fitting directly to raw measurements from a smartphone.
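To make the idea of a neural field concrete, the following is a minimal sketch and not one of the models developed in the thesis. It assumes a toy setting in which a one-dimensional synthetic signal stands in for raw sensor measurements, and it fits a small coordinate-based network to noisy samples of that signal with plain stochastic gradient descent. The names `encode`, `TinyField`, `signal`, and `grid_mse`, as well as the Fourier encoding, layer sizes, noise level, and learning rate, are illustrative assumptions.

```python
import math
import random


def encode(x, num_freqs=4):
    """Fourier features of a scalar coordinate x in [0, 1] (assumed encoding)."""
    feats = []
    for k in range(num_freqs):
        w = math.pi * (2 ** k)
        feats += [math.sin(w * x), math.cos(w * x)]
    return feats


class TinyField:
    """One-hidden-layer tanh MLP mapping encoded coordinates to a scalar."""

    def __init__(self, in_dim, hidden, rng):
        s1, s2 = 1 / math.sqrt(in_dim), 1 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-s2, s2) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, feats):
        h = [math.tanh(sum(w * f for w, f in zip(row, feats)) + b)
             for row, b in zip(self.w1, self.b1)]
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2, h

    def sgd_step(self, feats, target, lr):
        """One SGD update on the loss 0.5 * (prediction - target)**2."""
        y, h = self.forward(feats)
        err = y - target
        for j, hj in enumerate(h):
            # Gradient at the hidden pre-activation, using w2 before its update.
            g = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * g
            row = self.w1[j]
            for i, f in enumerate(feats):
                row[i] -= lr * g * f
        self.b2 -= lr * err


def signal(x):
    """Synthetic ground truth standing in for the scene (an assumption)."""
    return math.sin(2 * math.pi * x) + 0.3 * math.cos(4 * math.pi * x)


def grid_mse(field, n=200):
    """Mean squared error against the clean signal on a regular grid."""
    total = 0.0
    for i in range(n):
        x = i / (n - 1)
        y, _ = field.forward(encode(x))
        total += (y - signal(x)) ** 2
    return total / n


rng = random.Random(0)
field = TinyField(in_dim=8, hidden=16, rng=rng)
initial_mse = grid_mse(field)

for _ in range(5000):
    x = rng.random()                          # random continuous coordinate
    noisy = signal(x) + rng.gauss(0.0, 0.05)  # noisy "raw measurement"
    field.sgd_step(encode(x), noisy, lr=0.02)

final_mse = grid_mse(field)
print(f"MSE before fitting: {initial_mse:.4f}, after: {final_mse:.4f}")
```

After fitting, the network weights are the only stored representation of the signal: it can be queried at any continuous coordinate, with no pixel array or point cloud involved. The thesis applies the same principle to much richer models of geometry and lighting, which this toy example does not attempt to reproduce.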