Bi-Directional Deep Contextual Video Compression

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deep video compression methods model motion inadequately and under-exploit temporal context in B-frame coding, which leads to suboptimal bit allocation and lower compression efficiency. To address this, we propose DCVC-B, a deep B-frame coding framework built around a bi-directional motion difference context propagation mechanism. It jointly designs a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model, complemented by a training strategy based on a hierarchical quality structure across the GOP. The method integrates bidirectional optical flow estimation, multi-scale temporal modeling, and autoregressive entropy coding. Experiments show that, under random access conditions, DCVC-B achieves an average BD-Rate reduction of 26.6% relative to the H.265/HEVC reference software (HM), and that it outperforms the H.266/VVC reference software on certain test datasets. This work improves the rate-distortion performance of deep learning-based B-frame coding.
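
The summary refers to a hierarchical quality structure across a GOP but does not spell it out. As a rough illustration only, the sketch below enumerates the generic dyadic hierarchical B-frame arrangement commonly used in random-access coding: each B-frame is predicted from one past and one future reference, and frames coded later sit in higher temporal layers. The function name, the power-of-two GOP assumption, and the breadth-first ordering are assumptions made for this example, not details taken from DCVC-B.

```python
def hierarchical_b_order(gop_size):
    """List (frame, temporal_layer, (past_ref, future_ref)) in coding order.

    Hypothetical helper: frames 0 and gop_size are assumed to be already
    coded, and every frame in between is a bi-directionally predicted B-frame.
    """
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("this sketch assumes a power-of-two GOP size")

    order = []
    intervals = [(0, gop_size)]  # each interval is bounded by two coded frames
    layer = 1
    while intervals:
        next_intervals = []
        for left, right in intervals:
            if right - left < 2:
                continue  # no frame left between these two references
            mid = (left + right) // 2
            order.append((mid, layer, (left, right)))
            next_intervals += [(left, mid), (mid, right)]
        intervals = next_intervals
        layer += 1
    return order


for frame, layer, refs in hierarchical_b_order(8):
    print(f"frame {frame}: layer {layer}, references {refs}")
# frame order: 4, 2, 6, 1, 3, 5, 7
```

In schemes of this kind, lower layers are referenced by more frames, so they are usually given higher quality (more bits) than higher layers. How DCVC-B weights the layers during training is not stated in this summary.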

📝 Abstract
Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B, to improve the compression performance of deep B-frame coding. Our scheme has three key innovations. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding, which significantly reduces the bit cost of bi-directional motions. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model, to make better use of the multi-scale temporal contexts. Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures (GOP). Experimental results show that our DCVC-B achieves an average reduction of 26.6% in BD-Rate compared to the reference software for H.265/HEVC under random access conditions. Remarkably, it surpasses the performance of the H.266/VVC reference software on certain test datasets under the same configuration. We anticipate that our work can provide valuable insights and bring deep B-frame coding to the next level.
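
The headline result is stated as a BD-Rate reduction. For readers unfamiliar with the metric, here is a minimal sketch of the classic Bjøntegaard delta-rate calculation: fit log-bitrate as a cubic polynomial of PSNR for each codec, average the gap between the two fits over the shared quality range, and convert that gap to a percentage. The function name `bd_rate` and the sample numbers are made up for illustration. The abstract does not say which BD-Rate variant (cubic fit or piecewise interpolation) or which quality metric the authors used, so this is not a reproduction of their evaluation.

```python
import numpy as np


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Bjontegaard delta rate in percent; negative means the test codec saves bits."""
    log_anchor = np.log(np.asarray(anchor_rates, dtype=float))
    log_test = np.log(np.asarray(test_rates, dtype=float))
    anchor_psnr = np.asarray(anchor_psnr, dtype=float)
    test_psnr = np.asarray(test_psnr, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    fit_anchor = np.polyfit(anchor_psnr, log_anchor, 3)
    fit_test = np.polyfit(test_psnr, log_test, 3)

    # Compare only over the quality range that both curves cover.
    lo = max(anchor_psnr.min(), test_psnr.min())
    hi = min(anchor_psnr.max(), test_psnr.max())
    if hi <= lo:
        raise ValueError("the two curves have no overlapping quality range")

    def mean_log_rate(coeffs):
        # Average of the fitted log-rate over [lo, hi] via its antiderivative.
        antiderivative = np.polyint(coeffs)
        area = np.polyval(antiderivative, hi) - np.polyval(antiderivative, lo)
        return area / (hi - lo)

    avg_diff = mean_log_rate(fit_test) - mean_log_rate(fit_anchor)
    return (np.exp(avg_diff) - 1.0) * 100.0


# Illustrative numbers only (kbps, dB); not taken from the paper.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
psnr_points = [32.0, 35.0, 38.0, 40.0]
test_rates = [0.8 * r for r in anchor_rates]
print(round(bd_rate(anchor_rates, psnr_points, test_rates, psnr_points), 2))
```

A test codec that needs 20% fewer bits at every quality point yields a BD-Rate of approximately -20. Read the same way, the reported figure means DCVC-B needs on average 26.6% fewer bits than HM at equal quality.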
Problem

Research questions and friction points this paper is trying to address.

Deep Video Compression
B-Frame Encoding
Temporal Information Utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-Directional Deep Contextual Video Compression
Efficient Motion Representation
Bit Allocation Strategy
Xihua Sheng
University of Science and Technology of China->City University of Hong Kong
Video coding, Image coding, Point Cloud coding
Li Li
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Dong Liu
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China
Shiqi Wang
Department of Computer Science, City University of Hong Kong, Hong Kong, China