EHVC: Efficient Hierarchical Reference and Quality Structure for Neural Video Coding

📅 2025-09-04
🤖 AI Summary
Existing neural video codecs (NVCs) suffer from a mismatch between their reference structure and their hierarchical quality structure, which limits coding efficiency. EHVC, an efficient hierarchical neural video codec, aligns the two with three components: (1) a hierarchical multi-reference scheme, drawn from traditional codec design, that matches the reference structure to the quality hierarchy; (2) a lookahead strategy that gives the encoder context from future frames to enhance the quality structure; and (3) a layer-wise quality scale combined with random quality training to keep the quality structure stable at inference. EHVC is trained end-to-end and combines classical coding principles with learned components. In the reported experiments on standard benchmark datasets, EHVC outperforms state-of-the-art NVCs, with an average 18.7% BD-rate reduction and gains in both PSNR and MS-SSIM.

📝 Abstract
Neural video codecs (NVCs), leveraging the power of end-to-end learning, have demonstrated remarkable coding efficiency improvements over traditional video codecs. Recent research has begun to pay attention to the quality structures in NVCs, optimizing them by introducing explicit hierarchical designs. However, less attention has been paid to the reference structure design, which fundamentally should be aligned with the hierarchical quality structure. In addition, there is still significant room for further optimization of the hierarchical quality structure. To address these challenges in NVCs, we propose EHVC, an efficient hierarchical neural video codec featuring three key innovations: (1) a hierarchical multi-reference scheme that draws on traditional video codec design to align reference and quality structures, thereby addressing the reference-quality mismatch; (2) a lookahead strategy to utilize an encoder-side context from future frames to enhance the quality structure; (3) a layer-wise quality scale with random quality training strategy to stabilize quality structures during inference. With these improvements, EHVC achieves significantly superior performance to the state-of-the-art NVCs. Code will be released in: https://github.com/bytedance/NEVC.
Problem

Research questions and friction points this paper is trying to address.

Addressing reference and quality structure mismatch in neural video codecs
Optimizing hierarchical quality structure for improved coding efficiency
Enhancing reference structure alignment with quality hierarchy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical multi-reference scheme aligning reference and quality structures
Lookahead strategy using encoder-side context from future frames
Layer-wise quality scale with random quality training strategy
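
To make the idea of aligning reference and quality structures concrete, here is a minimal sketch. It is not the EHVC implementation: the three-layer pattern with a period of 4 frames, the window size, and the names `layer_of` and `pick_references` are all assumptions for illustration. Each frame gets a quality layer, and references are chosen from recent frames, preferring higher-quality layers first and closer frames second.

```python
from typing import List


def layer_of(frame_idx: int, period: int = 4) -> int:
    """Return a hypothetical quality layer for a frame (0 = highest quality).

    Assumed pattern (not from the paper): every `period`-th frame is layer 0,
    frames halfway between them are layer 1, all remaining frames are layer 2.
    """
    if frame_idx % period == 0:
        return 0
    if frame_idx % (period // 2) == 0:
        return 1
    return 2


def pick_references(frame_idx: int, max_refs: int = 2, period: int = 4) -> List[int]:
    """Choose up to `max_refs` earlier frames from the last `period` frames.

    Candidates are ranked by quality layer first (lower index = better)
    and by temporal distance second (closer = better).
    """
    start = max(0, frame_idx - period)
    candidates = range(start, frame_idx)
    ranked = sorted(candidates, key=lambda i: (layer_of(i, period), frame_idx - i))
    return ranked[:max_refs]


if __name__ == "__main__":
    # Print frame index, assigned layer, and chosen references for 8 frames.
    for idx in range(8):
        print(idx, layer_of(idx), pick_references(idx))
```

With this ranking, a low-quality frame such as frame 3 references frames 0 and 2 rather than its immediate low-quality neighbour, frame 1. That is the kind of reference-quality alignment the list above describes, shown here only under the stated assumptions.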
Junqi Liao
University of Science and Technology of China
video coding, reinforcement learning
Yaojun Wu
Bytedance Inc.
Data compression, Deep learning
Chaoyi Lin
Bytedance China, Hangzhou, Zhejiang Province, China
Zhipin Deng
Bytedance China, Beijing, China
Li Li
University of Science and Technology of China, Hefei, Anhui Province, China
Dong Liu
University of Science and Technology of China, Hefei, Anhui Province, China
Xiaoyan Sun
Microsoft Research Asia
Image/Video Coding, Multimedia Processing, Computer Vision