Emerging Standards for Machine-to-Machine Video Coding

📅 2025-12-10

🤖 AI Summary

Problem: Conventional pixel-level video coding, designed for human visual perception, is bandwidth intensive, scales poorly, and risks leaking raw images when video is streamed for remote machine inference. Method: This paper examines two efforts in the Moving Picture Experts Group (MPEG) that redesign the pipeline for machine-to-machine communication: Video Coding for Machines (VCM), which applies task-aware coding tools in the pixel domain, and Feature Coding for Machines (FCM), which compresses intermediate neural features to reduce bitrate, preserve privacy, and support compute offload. Contribution/Results: FCM maintains accuracy close to edge inference while significantly reducing bitrate. With H.26X codecs used as the inner codec in FCM, H.265/HEVC performs almost identically to H.266/VVC across machine vision tasks (average BD-Rate increase of 1.39% when HEVC replaces VVC), whereas H.264/AVC is markedly less efficient (average BD-Rate increase of 32.28% relative to VVC). For the tracking task, HEVC even outperforms VVC (BD-Rate: −1.81%) and the penalty for AVC shrinks to 8.79%, indicating that hardware for already deployed codecs can be reused for machine-to-machine communication.

📝 Abstract

Machines are increasingly becoming the primary consumers of visual data, yet most deployments of machine-to-machine systems still rely on remote inference where pixel-based video is streamed using codecs optimized for human perception. Consequently, this paradigm is bandwidth intensive, scales poorly, and exposes raw images to third parties. Recent efforts in the Moving Picture Experts Group (MPEG) redesigned the pipeline for machine-to-machine communication: Video Coding for Machines (VCM) is designed to apply task-aware coding tools in the pixel domain, and Feature Coding for Machines (FCM) is designed to compress intermediate neural features to reduce bitrate, preserve privacy, and support compute offload. Experiments show that FCM is capable of maintaining accuracy close to edge inference while significantly reducing bitrate. Additional analysis of H.26X codecs used as inner codecs in FCM reveals that H.265/High Efficiency Video Coding (HEVC) and H.266/Versatile Video Coding (VVC) achieve almost identical machine task performance, with an average BD-Rate increase of 1.39% when VVC is replaced with HEVC. In contrast, H.264/Advanced Video Coding (AVC) yields an average BD-Rate increase of 32.28% compared to VVC. However, for the tracking task, the impact of codec choice is minimal: relative to VVC, HEVC achieves a BD-Rate of -1.81% (that is, it outperforms VVC), while AVC yields a BD-Rate of 8.79%. This indicates that existing hardware for already deployed codecs can support machine-to-machine communication without degrading performance.
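
The BD-Rate figures quoted above summarize the average bitrate change of one codec relative to an anchor at equal quality. A minimal sketch of the commonly used Bjøntegaard computation follows. It assumes that "quality" is a machine-task metric such as detection accuracy and that each codec is measured at four rate points. The function name `bd_rate` and all numbers are hypothetical, and the paper's own evaluation tooling may differ in interpolation method and other details.

```python
import numpy as np

def bd_rate(rate_anchor, qual_anchor, rate_test, qual_test):
    """Average bitrate change (%) of the test codec vs. the anchor at equal quality.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    qa = np.asarray(qual_anchor, dtype=float)
    qt = np.asarray(qual_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Fit log(rate) as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Compare only over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("quality ranges of the two curves do not overlap")

    # Average log-rate of each curve over [lo, hi] via the antiderivative.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    return float((np.exp(avg_t - avg_a) - 1.0) * 100.0)

# Hypothetical rate (kbps) and task-accuracy points, for illustration only.
anchor_rate = [100.0, 200.0, 400.0, 800.0]
accuracy = [0.60, 0.68, 0.74, 0.78]
test_rate = [r * 1.1 for r in anchor_rate]  # test codec spends 10% more bits
print(f"BD-Rate: {bd_rate(anchor_rate, accuracy, test_rate, accuracy):+.2f}%")
```

Under this sign convention, a test codec that needs 10% more bits at every quality point yields a BD-Rate of about +10%, so a positive value (such as 1.39% for HEVC or 32.28% for AVC against VVC) means the test codec is less efficient than the anchor, and a negative value (such as -1.81% for HEVC on tracking) means it is more efficient.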

Problem

Research questions and friction points this paper is trying to address.

Optimizing video coding for machine consumption, not human perception

Reducing bandwidth and preserving privacy in machine-to-machine systems

Evaluating codec performance for machine tasks to maintain accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

FCM compresses neural features to reduce bitrate and preserve privacy

VCM applies task-aware coding tools in the pixel domain

Existing HEVC hardware can serve as the FCM inner codec with near-VVC machine-task performance (average BD-Rate increase of 1.39%)
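
To illustrate how intermediate neural features could be handed to a conventional inner codec such as HEVC or VVC, the sketch below quantizes a feature tensor and tiles its channels into a single 2D frame. This is an illustrative assumption, not the FCM design itself: the function names `pack_features` and `unpack_features`, the min-max quantization, the 10-bit depth, and the near-square tiling layout are all hypothetical, and the actual encoding step with an inner codec is omitted.

```python
import numpy as np

def pack_features(features, bit_depth=10):
    """Quantize a (C, H, W) float tensor and tile its channels into one 2D frame."""
    c, h, w = features.shape
    f_min, f_max = float(features.min()), float(features.max())
    # Min-max scaling to the integer range of the chosen bit depth (assumed scheme).
    scale = (2 ** bit_depth - 1) / (f_max - f_min) if f_max > f_min else 1.0
    quantized = np.round((features - f_min) * scale).astype(np.uint16)

    # Arrange channels on a near-square grid so the result resembles a video frame.
    cols = int(np.ceil(np.sqrt(c)))
    rows = int(np.ceil(c / cols))
    frame = np.zeros((rows * h, cols * w), dtype=np.uint16)
    for i in range(c):
        r, col = divmod(i, cols)
        frame[r * h:(r + 1) * h, col * w:(col + 1) * w] = quantized[i]

    # Side information the receiver needs to invert the packing.
    meta = {"shape": (c, h, w), "min": f_min, "scale": scale, "cols": cols}
    return frame, meta

def unpack_features(frame, meta):
    """Invert pack_features: cut the frame into channels and dequantize."""
    c, h, w = meta["shape"]
    cols = meta["cols"]
    out = np.empty((c, h, w), dtype=np.float64)
    for i in range(c):
        r, col = divmod(i, cols)
        out[i] = frame[r * h:(r + 1) * h, col * w:(col + 1) * w]
    return out / meta["scale"] + meta["min"]

# Hypothetical feature tensor standing in for an intermediate network output.
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4, 5))
frame, meta = pack_features(feats)
restored = unpack_features(frame, meta)
print(frame.shape, float(np.abs(restored - feats).max()))
```

In a pipeline of this shape, the edge device would run the first part of the network, send the packed frame through the inner codec, and let the receiver unpack the features and finish inference, so raw pixels never leave the device. Without the lossy inner codec in between, the only error in this sketch is the quantization step, which is bounded by half a quantization level.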

🔎 Similar Papers

When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding