SANR: Scene-Aware Neural Representation for Light Field Image Compression with Rate-Distortion Optimization

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
The high dimensionality of light field images renders conventional compression methods inefficient, while existing neural representation approaches predominantly rely on implicit coordinate mappings, lacking explicit scene-structure modeling and end-to-end rate-distortion optimization. To address these limitations, this paper proposes a scene-aware neural compression framework comprising three key innovations: (1) a hierarchical scene modeling module that explicitly captures multi-scale 3D scene structure; (2) the first integration of entropy-constrained, quantization-aware training (QAT) into light field neural representations, enabling end-to-end joint rate-distortion optimization; and (3) a hybrid architecture combining implicit neural representation with multi-scale latent coding. Evaluated on standard benchmarks, the proposed method achieves an average bitrate reduction of 65.62% over HEVC while attaining state-of-the-art rate-distortion performance.

📝 Abstract
Light field images capture multi-view scene information and play a crucial role in 3D scene reconstruction. However, their high-dimensional nature results in enormous data volumes, posing a significant challenge for efficient compression in practical storage and transmission scenarios. Although neural representation-based methods have shown promise in light field image compression, most approaches rely on direct coordinate-to-pixel mapping through implicit neural representation (INR), often neglecting the explicit modeling of scene structure. Moreover, they typically lack end-to-end rate-distortion optimization, limiting their compression efficiency. To address these limitations, we propose SANR, a Scene-Aware Neural Representation framework for light field image compression with end-to-end rate-distortion optimization. For scene awareness, SANR introduces a hierarchical scene modeling block that leverages multi-scale latent codes to capture intrinsic scene structures, thereby reducing the information gap between INR input coordinates and the target light field image. From a compression perspective, SANR is the first to incorporate entropy-constrained quantization-aware training (QAT) into neural representation-based light field image compression, enabling end-to-end rate-distortion optimization. Extensive experimental results demonstrate that SANR significantly outperforms state-of-the-art techniques in rate-distortion performance, achieving a 65.62% BD-rate saving against HEVC.
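The direct coordinate-to-pixel mapping that the abstract contrasts against can be illustrated with a toy implicit neural representation: a small MLP that takes 4D light-field coordinates (u, v, x, y) and returns an RGB value. This is a minimal numpy sketch under assumed components, not the paper's architecture; `TinyINR`, the positional-encoding parameters, and the random weights are all hypothetical.

```python
import numpy as np

def positional_encoding(coords, num_freqs=4):
    """Lift raw 4D coordinates (u, v, x, y) to sinusoidal features,
    a common input transform for INRs (num_freqs is a toy choice)."""
    feats = [coords]
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * coords))
        feats.append(np.cos((2.0 ** k) * np.pi * coords))
    return np.concatenate(feats, axis=-1)

class TinyINR:
    """Two-layer MLP mapping encoded coordinates to RGB; a toy stand-in
    for the implicit coordinate-to-pixel branch described above."""
    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 1 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, coords):
        h = np.maximum(positional_encoding(coords) @ self.w1 + self.b1, 0.0)  # ReLU
        return 1.0 / (1.0 + np.exp(-(h @ self.w2 + self.b2)))  # sigmoid -> RGB in (0, 1)

# Query one sub-aperture view: fix the angular pair (u, v), sweep spatial (x, y).
xy = np.stack(np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8),
                          indexing="ij"), axis=-1).reshape(-1, 2)
coords4d = np.concatenate([np.full((64, 2), 0.5), xy], axis=-1)  # (u, v, x, y)
model = TinyINR(in_dim=4 * (1 + 2 * 4))  # 4 coords, each with 4 sin/cos pairs
rgb = model(coords4d)  # shape (64, 3)
```

SANR's point is that this mapping alone carries no explicit scene structure; the hierarchical latent codes are meant to close that gap.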
Problem

Research questions and friction points this paper is trying to address.

Compressing high-dimensional light field images efficiently
Addressing lack of scene structure modeling in compression
Incorporating rate-distortion optimization in neural representation compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scene-aware hierarchical modeling captures intrinsic structures
End-to-end rate-distortion optimization with entropy constraints
Quantization-aware training incorporated in neural representation compression
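The entropy-constrained rate-distortion idea in the bullets above can be sketched as one forward evaluation of L = D + λ·R: quantize the latent code, decode it, measure distortion, and estimate the rate under a factorized entropy model. This is a generic learned-compression sketch under assumed components (a toy linear decoder and a discretized logistic entropy model), not SANR's actual training loop; in real QAT, gradients would flow through a straight-through or additive-noise surrogate for the rounding step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rate_bits(y_hat, mu=0.0, scale=1.0):
    """Bits to code integer symbols under a discretized logistic model:
    P(y) = sigma((y + 0.5 - mu)/s) - sigma((y - 0.5 - mu)/s)."""
    p = sigmoid((y_hat + 0.5 - mu) / scale) - sigmoid((y_hat - 0.5 - mu) / scale)
    return -np.log2(np.maximum(p, 1e-9)).sum()

def rd_objective(latent, target, decode, lam=0.01):
    """One forward pass of the rate-distortion objective L = D + lam * R."""
    y_hat = np.round(latent)            # hard quantization (QAT surrogate omitted)
    recon = decode(y_hat)               # decoder network (toy linear here)
    D = np.mean((recon - target) ** 2)  # distortion: MSE
    R = rate_bits(y_hat)                # estimated rate in bits
    return D + lam * R, D, R

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (16, 48))        # hypothetical linear decoder weights
latent = rng.normal(0, 2.0, 16)         # latent code to be quantized and coded
target = rng.normal(0, 1.0, 48)         # signal to reconstruct
loss, D, R = rd_objective(latent, target, lambda y: y @ W)
```

Sweeping λ trades bitrate against reconstruction quality, which is what end-to-end rate-distortion optimization tunes jointly rather than as separate quantization and coding stages.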
Gai Zhang
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Xinfeng Zhang
Fuxi AI Lab, NetEase Inc.
Vision-Language Models, Multimodal
Lv Tang
University of Alberta. Former researcher @ UCAS/Nanjing University
Computer Vision, MLLM, Video Compression, Image Segmentation
Hongyu An
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Li Zhang
Advanced Video Group (AVG), Bytedance Inc., San Diego, CA 92122 USA
Qingming Huang
University of the Chinese Academy of Sciences
Multimedia Analysis and Retrieval, Image and Video Processing, Pattern Recognition, Computer Vision, Video Coding