Tiny Collaborative Inference for Occlusion-Robust Object Detection

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses object detection on ultra-low-power edge devices, where accuracy is degraded by occlusion and deployment is limited by memory, compute, and multi-device communication overhead. The authors build a lightweight collaborative inference pipeline from an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantization, targeting devices with less than 1 MB of SRAM. Under the tested occlusion settings, decision-level fusion with Weighted Boxes Fusion (WBF) outperforms feature-level fusion, and host-free multi-board collaboration is shown to be feasible on two Coral Dev Board Micro units. In asymmetric occlusion scenarios, WBF improves mAP by up to +0.2736, and three-view fusion raises the gain to as much as +0.3827 at a cost of approximately 1.3 KB per exchange. In a real-world deployment, fused output covers 29.8% more frames than a single board alone, while communication energy remains small relative to inference energy.

📝 Abstract

Small edge devices such as IoT surveillance nodes and search-and-rescue (SAR) platforms are increasingly expected to run computer vision locally. On ultra-low-end hardware, however, object detection is limited by available memory and compute, by communication costs when several devices cooperate, and by the loss of accuracy caused by occlusion. We study occlusion-robust object detection on devices with less than 1 MB of SRAM by combining an MCUNet backbone, a YOLOv2 detection head, and TensorFlow Lite quantisation. We evaluate two collaborative inference strategies: feature-level fusion, which concatenates intermediate feature maps, and decision-level fusion via Weighted Boxes Fusion (WBF). Under the tested occlusion settings, WBF outperforms feature-level fusion and gives gains of up to +0.2736 mAP in asymmetric occlusion scenarios. Extending fusion to three views improves accuracy further (up to +0.3827 mAP) while adding communication overhead (approximately 1.3 KB per exchange). The hardware experiments start with a host-assisted USB-relay baseline and then move to a Wi-Fi peer-to-peer deployment on two Coral Dev Board Micro units, where WBF runs on-device and communication energy remains small relative to inference energy. In a representative 301.9 s autonomous session comprising 108 frames, fused output is observed on 61 frames compared with 47 for Board 2 alone, a relative frame-level coverage gain of +29.8% (14 additional frames). We also include a small exploratory decentralised federated learning (DFL) feasibility note, but do not treat it as a main result because performance remains limited under non-IID local data. The results support decision-level fusion as a viable option for improving occlusion robustness in small-scale edge object detection, including host-free multi-board operation on ultra-low-end hardware.
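
To illustrate the decision-level fusion described in the abstract, here is a minimal Python sketch of the general Weighted Boxes Fusion idea. It is not the authors' on-device implementation. It assumes that detections from every view have already been mapped into one shared, normalised coordinate frame, which the abstract does not specify. The names `fuse_boxes`, `iou`, and `iou_thr`, the threshold value, and the sample detections are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed normalised
Detection = Tuple[Box, float, int]        # (box, confidence, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(views: List[List[Detection]], iou_thr: float = 0.55) -> List[Detection]:
    """Simplified WBF: cluster same-class boxes by IoU, then average them."""
    n_views = len(views)
    # Process all detections from all views in order of decreasing confidence.
    dets = sorted((d for v in views for d in v), key=lambda d: d[1], reverse=True)
    clusters = []  # each cluster holds its label, member boxes, and fused box
    for box, score, label in dets:
        best, best_iou = None, iou_thr
        for c in clusters:
            if c["label"] == label:
                overlap = iou(c["box"], box)
                if overlap > best_iou:
                    best, best_iou = c, overlap
        if best is None:
            clusters.append({"label": label, "members": [(box, score)], "box": box})
        else:
            best["members"].append((box, score))
            total = sum(s for _, s in best["members"])
            # Fused coordinates are the confidence-weighted mean of the members.
            best["box"] = tuple(
                sum(b[i] * s for b, s in best["members"]) / total for i in range(4)
            )
    fused = []
    for c in clusters:
        scores = [s for _, s in c["members"]]
        # Mean confidence, scaled down when fewer views agree on the object.
        conf = sum(scores) / len(scores) * min(len(scores), n_views) / n_views
        fused.append((c["box"], conf, c["label"]))
    return fused


# Hypothetical detections: both views see object 0, only view B sees object 1.
view_a = [((0.10, 0.10, 0.50, 0.50), 0.9, 0)]
view_b = [((0.12, 0.10, 0.52, 0.50), 0.6, 0), ((0.70, 0.70, 0.90, 0.90), 0.8, 1)]
fused = fuse_boxes([view_a, view_b])
for box, conf, label in fused:
    print(label, [round(v, 3) for v in box], round(conf, 3))
```

In this sketch, an object seen by only one view is kept with a reduced confidence rather than discarded. That is one plausible way for fusion to recover detections that are occluded from another viewpoint. Because only boxes, scores, and labels are exchanged, the per-exchange payload stays small compared with sending feature maps.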

Problem

Research questions and friction points this paper is trying to address.

occlusion-robust object detection

edge devices

collaborative inference

ultra-low-end hardware

YOLOv2

Innovation

Methods, ideas, or system contributions that make the work stand out.

collaborative inference

occlusion-robust detection

decision-level fusion