🤖 AI Summary
This work addresses the high latency and decoding overhead inherent in large-scale video privacy detection. The authors propose ComPrivDet, a compressed-domain detection method that reuses inference results from I-frames and exploits inter-frame redundancy to decide, for each P- and B-frame, whether detection can be skipped or should be refined with a lightweight detector. This frame-type-aware processing reduces computational cost while largely preserving accuracy. Experimental results show that the method maintains 99.75% and 96.83% detection accuracy for private faces and license plates, respectively, while skipping over 80% of inferences. Compared to existing compressed-domain detection methods, it averages 9.84% higher accuracy and 75.95% lower latency.
📝 Abstract
As the Internet of Things (IoT) becomes deeply embedded in daily life, users are increasingly concerned about privacy leakage, especially from video data. Since frame-by-frame protection in large-scale video analytics (e.g., smart communities) introduces significant latency, a more efficient solution is to selectively protect frames containing privacy objects (e.g., faces). Existing object detectors require fully decoded videos or per-frame processing in compressed videos, leading to decoding overhead or reduced accuracy. Therefore, we propose ComPrivDet, an efficient method for detecting privacy objects in compressed video by reusing I-frame inference results. By identifying the presence of new objects through compressed-domain cues, ComPrivDet either skips P- and B-frame detections or efficiently refines them with a lightweight detector. ComPrivDet maintains 99.75% accuracy in private face detection and 96.83% in private license plate detection while skipping over 80% of inferences. It averages 9.84% higher accuracy with 75.95% lower latency than existing compressed-domain detection methods.
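The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which compressed-domain cues are used or how they are thresholded, so the `new_object_cue` score, the `cue_threshold` value, and both detector callables are hypothetical placeholders. The sketch only shows the three-way decision: run the full detector on I-frames, reuse the previous result on P- and B-frames with no sign of new objects, and otherwise call a lightweight detector to refine the result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Detections = List[str]  # simplified: labels only, no bounding boxes


@dataclass
class Frame:
    index: int
    frame_type: str        # "I", "P", or "B"
    new_object_cue: float  # ASSUMED score in [0, 1] derived from compressed-domain data


def detect_privacy_objects(
    frames: Sequence[Frame],
    full_detector: Callable[[Frame], Detections],
    light_detector: Callable[[Frame, Detections], Detections],
    cue_threshold: float = 0.5,  # ASSUMED value, not taken from the paper
) -> Tuple[List[Detections], Dict[str, int]]:
    """Return per-frame detections and a count of each decision taken."""
    results: List[Detections] = []
    stats = {"full": 0, "light": 0, "skipped": 0}
    last: Detections = []

    for frame in frames:
        if frame.frame_type == "I":
            # I-frames are always processed by the full detector.
            last = full_detector(frame)
            stats["full"] += 1
        elif frame.new_object_cue >= cue_threshold:
            # The cue suggests a new object: refine with the lightweight detector.
            last = light_detector(frame, last)
            stats["light"] += 1
        else:
            # No sign of new objects: skip inference and reuse the last result.
            stats["skipped"] += 1
        results.append(list(last))

    return results, stats


if __name__ == "__main__":
    demo = [Frame(0, "I", 0.0), Frame(1, "P", 0.1), Frame(2, "B", 0.8)]
    out, counts = detect_privacy_objects(
        demo,
        full_detector=lambda f: ["face"],
        light_detector=lambda f, prev: prev + ["license_plate"],
    )
    print(out, counts)
```

In this sketch the fraction of skipped frames is simply `stats["skipped"] / len(frames)`; how closely that tracks the reported skip rate depends entirely on the real cue and threshold, which are not specified here.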