Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment

📅 2025-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learned video compression methods face a fundamental trade-off in motion estimation and compensation (ME/MC): separate-transform frameworks achieve strong rate-distortion (R-D) performance but suffer from severe error propagation, while unified-transform frameworks eliminate error propagation yet degrade ME/MC accuracy because intra-frame and inter-frame coding share latent representations. This work proposes a unified-transform framework that keeps the error-propagation-free property while recovering ME/MC accuracy. Its core contributions are: (1) dual-domain progressive temporal alignment, which performs coarse pixel-level alignment with optical flow and then fine-grained latent-domain alignment over multiple reference frames using a flow-guided deformable transformer; and (2) quality-conditioned mixture-of-experts (QCMoE) quantization, which adapts quantization steps per pixel to enable continuous bit-rate control. Experiments show competitive R-D performance against state-of-the-art codecs while eliminating error propagation and supporting consistent, flexible rate control.

📝 Abstract
Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation for motion estimation and compensation (ME/MC). The separate-transform framework employs distinct transforms for intra-frame and inter-frame compression to yield impressive rate-distortion (R-D) performance but causes evident error propagation, while the unified-transform framework eliminates error propagation via shared transforms but is inferior in ME/MC in shared latent domains. To address this limitation, in this paper, we propose a novel unified-transform framework with dual-domain progressive temporal alignment and quality-conditioned mixture-of-experts (QCMoE) quantization to enable quality-consistent and error-propagation-free streaming for learned video compression. Specifically, we propose dual-domain progressive temporal alignment for ME/MC that leverages coarse pixel-domain alignment and refined latent-domain alignment to significantly enhance temporal context modeling in a coarse-to-fine fashion. The coarse pixel-domain alignment efficiently handles simple motion patterns with optical flow estimated from a single reference frame, while the refined latent-domain alignment develops a Flow-Guided Deformable Transformer (FGDT) over latents from multiple reference frames to achieve long-term motion refinement (LTMR) for complex motion patterns. Furthermore, we design a QCMoE module for continuous bit-rate adaptation that dynamically assigns different experts to adjust quantization steps per pixel based on target quality and content, rather than relying on a single quantization step. QCMoE allows continuous and consistent rate control with appealing R-D performance. Experimental results show that the proposed method achieves competitive R-D performance compared with state-of-the-art methods, while successfully eliminating error propagation.
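The QCMoE idea described above — a gate that selects a per-element quantization step from a bank of experts, conditioned on content and on a target-quality signal — can be sketched roughly as follows. All names, the expert step values, and the linear gate are illustrative assumptions for this sketch, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bank of expert quantization steps, coarse to fine.
expert_steps = np.array([1.0, 0.5, 0.25, 0.125])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def qcmoe_quantize(latent, quality, w_feat, w_q):
    """Pick a per-element quantization step as a gate-weighted mixture of
    expert steps, conditioned on content (here, the latent value itself)
    and on a scalar target quality. Illustrative only."""
    logits = latent[..., None] * w_feat + quality * w_q  # (..., n_experts)
    gate = softmax(logits)                               # soft expert weights
    step = gate @ expert_steps                           # per-element step
    return np.round(latent / step) * step, step

latent = rng.normal(size=(4, 4))
w_feat = rng.normal(size=4) * 0.1           # weak content dependence
w_q = np.array([0.0, 1.0, 2.0, 3.0])        # higher quality favors finer experts

rec_lo, step_lo = qcmoe_quantize(latent, 0.0, w_feat, w_q)
rec_hi, step_hi = qcmoe_quantize(latent, 3.0, w_feat, w_q)
# Raising the quality condition shifts the gate toward finer steps,
# so step_hi is smaller on average than step_lo.
```

Because the step is a smooth function of the quality condition, sweeping `quality` yields the continuous bit-rate adaptation the abstract describes, rather than a discrete set of quantization levels.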
Problem

Research questions and friction points this paper is trying to address.

Addresses error propagation in learned video compression frameworks
Enhances motion estimation with dual-domain progressive temporal alignment
Enables quality-consistent streaming via adaptive quantization with QCMoE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-domain progressive temporal alignment for motion estimation
Flow-Guided Deformable Transformer for long-term motion refinement
Quality-conditioned mixture-of-expert module for adaptive quantization
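The coarse stage of the dual-domain alignment amounts to backward-warping a reference frame with a dense optical flow field. A minimal bilinear-warp sketch (the `warp` helper and the constant test flow are illustrative stand-ins, not the paper's flow network):

```python
import numpy as np

def warp(ref, flow):
    """Backward-warp a reference frame with a dense optical flow field
    using bilinear sampling (the coarse pixel-domain alignment step).
    ref: (H, W) array; flow: (H, W, 2) array of (dx, dy) per pixel."""
    H, W = ref.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, W - 1)   # sampling coordinates
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0                    # bilinear weights
    top = ref[y0, x0] * (1 - wx) + ref[y0, x1] * wx
    bot = ref[y1, x0] * (1 - wx) + ref[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# A constant flow of (+1, 0) samples one pixel to the right, so the
# warped frame reproduces the reference shifted left by one column.
ref = np.arange(16.0).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
warped = warp(ref, flow)
```

The FGDT stage then refines this result in latent space: instead of one flow-given offset per position, deformable attention samples several learned offsets around the flow prediction across multiple reference latents, which is what handles the complex motion the single-flow warp misses.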