🤖 AI Summary
Existing light field (LF) super-resolution methods model all sub-aperture images (SAIs) uniformly, which entangles parallax information and incurs computational redundancy. To address this, we propose the Multi-scale Disparity Transformer (MDT), the first disparity-aware "divide-and-conquer" Transformer architecture for LF SR. MDT explicitly disentangles parallax through a multi-branch structure in which a novel Disparity Self-Attention (DSA) mechanism models each disparity range separately. Combined with SAI-wise collaborative modeling and multi-scale feature fusion, MDT forms the backbone of the lightweight LF-MDTNet. On 2× and 4× LF SR tasks, LF-MDTNet achieves PSNR gains of 0.37 dB and 0.41 dB over state-of-the-art methods, respectively, while using 23% fewer parameters and running 1.8× faster at inference. The approach thus advances accuracy, efficiency, and interpretability by enabling explicit parallax-aware representation learning in LF super-resolution.
📝 Abstract
This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR) that addresses the computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images in conventional methods. MDT features a multi-branch structure in which each branch applies independent disparity self-attention (DSA) to a specific disparity range, effectively reducing computational complexity and disentangling disparities. Building on this architecture, we present LF-MDTNet, an efficient LFSR network. Experimental results demonstrate that LF-MDTNet outperforms existing state-of-the-art methods by 0.37 dB and 0.41 dB PSNR at the 2× and 4× scales, achieving superior performance with fewer parameters and higher speed.
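To make the divide-and-conquer idea concrete, here is a minimal PyTorch sketch of a multi-branch attention block. It illustrates the mechanism described above rather than reproducing the authors' code: the branch count, the use of average pooling so that each branch's attention covers a different disparity range, and all names and hyperparameters (`DisparitySelfAttention`, `MultiScaleDisparityTransformer`, `dim`, `scales`) are assumptions made for this sketch.

```python
# Hypothetical sketch of multi-branch disparity self-attention
# (illustrative only, not the authors' released implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisparitySelfAttention(nn.Module):
    """One branch: self-attention at a single spatial scale.

    Assumption: downsampling by `scale` lets a fixed token neighbourhood
    span a proportionally larger disparity range.
    """

    def __init__(self, dim: int, num_heads: int = 4, scale: int = 1):
        super().__init__()
        self.scale = scale
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) fused sub-aperture features
        b, c, h, w = x.shape
        y = F.avg_pool2d(x, self.scale) if self.scale > 1 else x
        hb, wb = y.shape[-2:]
        tokens = self.norm(y.flatten(2).transpose(1, 2))   # (B, hb*wb, C)
        out, _ = self.attn(tokens, tokens, tokens)         # attention at this scale
        out = out.transpose(1, 2).reshape(b, c, hb, wb)
        if self.scale > 1:                                 # restore full resolution
            out = F.interpolate(out, size=(h, w), mode="bilinear",
                                align_corners=False)
        return out


class MultiScaleDisparityTransformer(nn.Module):
    """Multi-branch block: each branch targets a distinct disparity range."""

    def __init__(self, dim: int = 32, scales=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            DisparitySelfAttention(dim, scale=s) for s in scales
        )
        # Fuse the per-range branch outputs back to `dim` channels.
        self.fuse = nn.Conv2d(dim * len(scales), dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the block easy to stack and train.
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


if __name__ == "__main__":
    feats = torch.randn(1, 32, 32, 32)       # toy sub-aperture feature map
    block = MultiScaleDisparityTransformer(dim=32)
    print(block(feats).shape)                 # torch.Size([1, 32, 32, 32])
```

In the paper itself, attention would operate over sub-aperture or epipolar features rather than a single 2D map, with each branch tuned to the light field's disparity statistics; the sketch only shows how running attention per branch at different scales separates disparity ranges before fusion, which is the "divide-and-conquer" structure the summary describes.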