🤖 AI Summary
To address the prohibitive memory growth with frame count in implicit neural representation (INR)-based video compression—a barrier to edge deployment—this paper proposes a timeline-based autoregressive modeling paradigm. Long videos are partitioned into adjustable-length clips, each assigned a dedicated INR model instance; cross-clip parameter initialization and joint training enable effective parameter sharing and seamless temporal modeling. This is the first work to reformulate INR video compression from an autoregressive perspective. Experiments on multiple benchmarks demonstrate substantial improvements over state-of-the-art INR compression methods: up to 62% reduction in memory footprint, PSNR gains of 1.8–3.2 dB, and support for low-latency, flexible video editing and on-device inference.
📝 Abstract
Implicit Neural Representations (INRs) have demonstrated significant potential in video compression by representing videos as neural networks. However, as the number of frames increases, the memory consumption for training and inference grows substantially, posing challenges in resource-constrained scenarios. Inspired by the success of traditional video compression frameworks, which process video frame by frame and can efficiently compress long videos, we adopt this modeling strategy for INRs to decrease memory consumption, while aiming to unify the two frameworks from the perspective of timeline-based autoregressive modeling. In this work, we present a novel understanding of INR models from an autoregressive (AR) perspective and introduce a Unified AutoRegressive Framework for memory-efficient Neural Video Compression (UAR-NVC). UAR-NVC integrates timeline-based and INR-based neural video compression under a unified autoregressive paradigm. It partitions videos into several clips and processes each clip with a separate INR model instance, leveraging the advantages of both compression frameworks while allowing seamless adaptation to either form. To further reduce temporal redundancy between clips, we design two modules that optimize the initialization, training, and compression of these model parameters. UAR-NVC supports adjustable latency by varying the clip length. Extensive experimental results demonstrate that UAR-NVC, through its flexible clip-length setting, adapts to resource-constrained environments and significantly improves performance over various baseline models.
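The clip-wise autoregressive scheme described above—one model instance per clip, with each clip's parameters warm-started from the previous clip's—can be sketched in miniature. The toy below is illustrative only, not the paper's method: it fits a 1-D signal (standing in for a video) with a tiny linear Fourier-feature "INR" per clip, and all names (`compress_video`, `fit_clip`, the feature embedding) are hypothetical. The point it demonstrates is structural: only one clip's model is in memory at a time, and cross-clip initialization carries information forward along the timeline.

```python
import numpy as np

def features(t):
    # Fixed Fourier-feature embedding of a scalar timestamp (a stand-in
    # for a real INR's positional encoding).
    freqs = np.arange(1, 5)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def fit_clip(values, times, w_init, steps=500, lr=0.05):
    # Fit one clip's tiny "INR" by gradient descent on MSE, starting
    # from w_init (the previous clip's weights -> cross-clip init).
    w = w_init.copy()
    X = np.stack([features(t) for t in times])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - values) / len(values)
        w -= lr * grad
    return w

def compress_video(signal, clip_len):
    # Autoregressive clip-wise fitting: partition the timeline into
    # clips, give each clip its own model instance, and warm-start each
    # model from the previous clip's parameters. Memory per step is
    # bounded by one clip, not the whole sequence.
    n = len(signal)
    times = np.linspace(0, 2 * np.pi, n)
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=8)  # initial weights for clip 0
    models, recon = [], np.empty(n)
    for start in range(0, n, clip_len):
        sl = slice(start, min(start + clip_len, n))
        w = fit_clip(signal[sl], times[sl], w_init=w)
        models.append(w.copy())  # this clip's "compressed" parameters
        recon[sl] = np.stack([features(t) for t in times[sl]]) @ w
    return models, recon
```

A real system would replace the linear model with a per-clip neural network and entropy-code the stored parameters; varying `clip_len` trades latency and memory against cross-clip redundancy, mirroring the adjustable-latency knob in the abstract.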