🤖 AI Summary
To address the high computational complexity and low inference efficiency that Transformers suffer in light field (LF) image super-resolution because of their self-attention mechanism, this paper proposes $L^2$FMamba, a lightweight state space network for LF super-resolution built around the LF-VSSM block. LF-VSSM models long-range dependencies progressively: spatial correlations within each sub-aperture image (intra-view), spatial-angular correlations between sub-aperture images (inter-view), and spatial-angular correlations between LF pixels. By combining LF geometric priors with this progressive feature extraction, the network substantially reduces parameter count and FLOPs. Extensive experiments on multiple benchmark datasets show that $L^2$FMamba surpasses existing state-of-the-art methods in PSNR and SSIM while reducing model size and computational cost; notably, it achieves a 42% speedup in inference time.
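The efficiency argument rests on how cost scales with the number of tokens L. As a rough illustration, not the paper's implementation, the sketch below runs a diagonal, time-invariant linear state space recurrence (Mamba's selective scan additionally makes the parameters input-dependent, which this sketch omits) and counts its work against the L x L score matrix of self-attention. The function names and the parameter shapes (D channels, N state dimensions) are assumptions chosen for illustration.

```python
import numpy as np


def linear_ssm_scan(x, a, b, c):
    """Run a diagonal, time-invariant state space recurrence over a sequence.

    x: (L, D) token sequence; a, b, c: (D, N) per-channel diagonal parameters.
    For every channel d and state index n:
        h[t] = a * h[t-1] + b * x[t]
        y[t] = sum_n c * h[t]
    A single pass over the tokens, so the cost grows linearly with L.
    Simplification: a, b, c are fixed here, not input-dependent as in Mamba.
    """
    L, D = x.shape
    h = np.zeros_like(a, dtype=float)      # hidden state, shape (D, N)
    y = np.empty((L, D))
    for t in range(L):
        h = a * h + b * x[t][:, None]      # broadcast the token over the state dim
        y[t] = (c * h).sum(axis=1)         # read out one value per channel
    return y


def attention_score_macs(L, D):
    """Multiply-accumulates for Q @ K^T in one attention head: quadratic in L."""
    return L * L * D


def ssm_state_updates(L, D, N):
    """Elementwise state updates performed by linear_ssm_scan: linear in L."""
    return L * D * N
```

For example, with 5x5 views of 32x32 patches treated as one token per pixel (example values), L = 25,600; with D = 64 and N = 16 the ratio of attention-score work to scan work is L / N = 1,600, and that gap widens as the light field grows.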
📝 Abstract
Transformers bring significantly improved performance to the light field image super-resolution task due to their long-range dependency modeling capability. However, the inherently high computational complexity of their core self-attention mechanism, which grows quadratically with the number of tokens, increasingly hinders their further development for this task. To address this issue, we first introduce the LF-VSSM block, a novel module inspired by progressive feature extraction, to efficiently capture critical long-range spatial-angular dependencies in light field images. LF-VSSM successively extracts spatial features within sub-aperture images, spatial-angular features between sub-aperture images, and spatial-angular features between light field image pixels. On this basis, we propose a lightweight network, $L^2$FMamba (Lightweight Light Field Mamba), which integrates the LF-VSSM block to exploit light field features for super-resolution while overcoming the computational challenges of Transformer-based approaches. Extensive experiments on multiple light field datasets demonstrate that our method reduces both the number of parameters and the computational complexity while achieving superior super-resolution performance and faster inference.
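The abstract does not say how LF-VSSM orders tokens at each of its three stages. A minimal sketch under stated assumptions: the light field is stored as a (U, V, H, W, C) array, and the three stages are read as three flattenings that a 1-D scan could consume. One is a spatial sequence per sub-aperture image, one places the sub-aperture images one after another, and one is a macro-pixel order in which all views of the same pixel are adjacent. The function names and exact orderings are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def lf_scan_orders(lf):
    """Flatten a light field into three token orderings (hypothetical).

    lf: array of shape (U, V, H, W, C); (U, V) index the view (angular),
    (H, W) the pixel inside a sub-aperture image (spatial), C the channels.
    Returns:
      intra: (U*V, H*W, C)  one spatial sequence per sub-aperture image
      inter: (U*V*H*W, C)   all sub-aperture images scanned one after another
      pixel: (H*W*U*V, C)   macro-pixel order: the U*V views of a pixel are adjacent
    """
    U, V, H, W, C = lf.shape
    intra = lf.reshape(U * V, H * W, C)
    inter = lf.reshape(U * V * H * W, C)
    # Move the angular axes inside the spatial ones before flattening.
    pixel = lf.transpose(2, 3, 0, 1, 4).reshape(H * W * U * V, C)
    return intra, inter, pixel


def from_pixel_order(seq, U, V, H, W):
    """Invert the macro-pixel ordering back to a (U, V, H, W, C) light field."""
    C = seq.shape[-1]
    return seq.reshape(H, W, U, V, C).transpose(2, 3, 0, 1, 4)
```

In the intra ordering the U*V views act as a batch dimension, so each scan only sees one view; the inter and pixel orderings are what would let a linear-time scan mix information across views, and the inverse reshape restores the light field layout after each stage.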