Bi-Directional Deep Contextual Video Compression

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deep video compression methods suffer from inadequate motion modeling and insufficient exploitation of temporal context in B-frame coding, leading to suboptimal bit allocation and reduced compression efficiency. To address this, we propose DCVC-B, a deep B-frame coding framework built around a bi-directional motion difference context propagation mechanism. It jointly designs a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model, complemented by a training strategy based on a hierarchical quality structure across the GOP. The method integrates bidirectional optical flow estimation, multi-scale temporal modeling, and autoregressive entropy coding. Experiments demonstrate that, under random-access conditions, DCVC-B achieves an average BD-rate reduction of 26.6% over the H.265/HEVC reference software (HM), and that it surpasses the H.266/VVC reference software on certain test datasets under the same configuration. This work advances the rate-distortion performance of deep learning-based B-frame coding.

📝 Abstract
Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B, to improve the compression performance of deep B-frame coding. Our scheme has three key innovations. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding, which significantly reduces the bit cost of bi-directional motions. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model to make better use of the multi-scale temporal contexts. Third, we propose a hierarchical quality structure-based training strategy, leading to effective bit allocation across large groups of pictures (GOPs). Experimental results show that DCVC-B achieves an average BD-Rate reduction of 26.6% compared to the reference software for H.265/HEVC under random access conditions. Remarkably, it surpasses the performance of the H.266/VVC reference software on certain test datasets under the same configuration. We anticipate that our work can provide valuable insights and bring deep B-frame coding to the next level.
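
To make the idea of a hierarchical structure across a GOP concrete, the following is a minimal sketch of a dyadic hierarchical-B coding order, as commonly used in random-access configurations. It is not taken from the paper: the function name `hierarchical_b_order`, the layer numbering, and the layer-by-layer ordering are illustrative assumptions, and the sketch does not model the quality levels or loss weights used in DCVC-B training.

```python
from typing import List, Tuple


def hierarchical_b_order(gop_size: int) -> List[Tuple[int, int, int, int]]:
    """Return (frame, past_ref, future_ref, layer) tuples in coding order.

    Hypothetical helper for illustration only. Frames 0 and gop_size are
    assumed to be already coded (as I- or P-frames); every frame in
    between is a B-frame that references one past and one future frame.
    """
    # The dyadic split below only works for power-of-two GOP sizes.
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two and at least 2")

    order = []
    step, layer = gop_size, 1
    while step > 1:
        half = step // 2
        # Each segment [lo, lo + step] contributes its midpoint frame,
        # whose two references were coded in an earlier layer.
        for lo in range(0, gop_size, step):
            order.append((lo + half, lo, lo + step, layer))
        step, layer = half, layer + 1
    return order


if __name__ == "__main__":
    for frame, past, future, layer in hierarchical_b_order(8):
        print(f"frame {frame}: refs ({past}, {future}), layer {layer}")
```

For a GOP of 8, this visits the B-frames in the order 4, 2, 6, 1, 3, 5, 7. Frames in lower layers serve as references for more of the remaining frames, which is why hierarchical schemes usually spend more bits on them; how DCVC-B sets the per-layer quality is described in the paper itself, not here.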
Problem

Research questions and friction points this paper is trying to address.

Deep Video Compression
B-Frame Encoding
Temporal Information Utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-Directional Deep Contextual Video Compression
Efficient Motion Representation
Bit Allocation Strategy
Xihua Sheng
University of Science and Technology of China → City University of Hong Kong
Video coding, Image coding, Point Cloud coding
Li Li
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Dong Liu
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei 230027, China
Shiqi Wang
Department of Computer Science, City University of Hong Kong, Hong Kong, China