Feature Fusion and Knowledge-Distilled Multi-Modal Multi-Target Detection

📅 2025-05-31
🤖 AI Summary
Problem: Multi-target detection (MTD) on resource-constrained embedded devices for surveillance and defense applications must cope with heterogeneous RGB–thermal dual-modal inputs and high computational overhead. Method: The authors propose a lightweight dual-modal detection framework that fuses RGB and thermal features and trains a compact student model to inherit knowledge from a larger teacher model. Training is formulated as a posterior probability optimization task and solved with a multi-stage knowledge distillation pipeline driven by a composite loss function. Contribution/Results: The student model achieves approximately 95% of the teacher’s mAP while reducing inference time by approximately 50%. Knowledge distillation is also used to improve domain adaptation, and the reduced inference cost makes the model better suited to practical deployment on embedded platforms.

📝 Abstract
In the surveillance and defense domain, multi-target detection and classification (MTD) is considered essential yet challenging due to heterogeneous inputs from diverse data sources and the computational complexity of algorithms designed for resource-constrained embedded devices, particularly for AI-based solutions. To address these challenges, we propose a feature fusion and knowledge-distilled framework for multi-modal MTD that leverages data fusion to enhance accuracy and employs knowledge distillation for improved domain adaptation. Specifically, our approach utilizes both RGB and thermal image inputs within a novel fusion-based multi-modal model, coupled with a distillation training pipeline. We formulate the problem as a posterior probability optimization task, which is solved through a multi-stage training pipeline supported by a composite loss function. This loss function effectively transfers knowledge from a teacher model to a student model. Experimental results demonstrate that our student model achieves approximately 95% of the teacher model's mean Average Precision while reducing inference time by approximately 50%, underscoring its suitability for practical MTD deployment scenarios.
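The abstract does not spell out the terms of the composite loss. As a rough illustration of how a loss of this kind can transfer knowledge from a teacher to a student, the sketch below uses the generic temperature-scaled distillation formulation: a weighted sum of a hard-label cross-entropy term and a teacher-to-student KL term. The function names, the weight `alpha`, and the `temperature` parameter are assumptions made for this example, not the paper's actual loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class index."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def composite_distillation_loss(student_logits, teacher_logits, label,
                                alpha=0.5, temperature=2.0):
    """Hypothetical composite loss: alpha * hard term + (1 - alpha) * T^2 * soft term.

    The hard term compares the student with the ground-truth label; the soft
    term pulls the student's softened distribution toward the teacher's.
    """
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

When the student's logits equal the teacher's, the soft term vanishes and only the weighted hard-label term remains, which is a quick sanity check for an implementation of this form.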
Problem

Research questions and friction points this paper is trying to address.

Enhance multi-target detection accuracy with feature fusion
Reduce computational complexity for resource-constrained devices
Improve domain adaptation via knowledge distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature fusion for multi-modal inputs
Knowledge distillation for domain adaptation
Composite loss for teacher-to-student knowledge transfer
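The summary does not state which fusion operator the model uses. As a minimal sketch of feature-level fusion for two modalities, the example below concatenates an RGB feature vector with a thermal feature vector and applies a linear projection, which is one common choice. The function name `fuse_features` and the `weights` and `bias` parameters are hypothetical and chosen only for illustration.

```python
def fuse_features(rgb_feat, thermal_feat, weights, bias):
    """Concatenate two per-location feature vectors and project them linearly.

    weights has one row per output channel; each row spans the concatenated
    RGB + thermal channels. This is an assumed fusion scheme, not the paper's.
    """
    joint = list(rgb_feat) + list(thermal_feat)
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have the same number of rows")
    if any(len(row) != len(joint) for row in weights):
        raise ValueError("each weight row must match the concatenated length")
    return [sum(w * x for w, x in zip(row, joint)) + b
            for row, b in zip(weights, bias)]


# Usage: average matching RGB and thermal channels via the projection weights.
fused = fuse_features(
    rgb_feat=[1.0, 2.0],
    thermal_feat=[3.0, 4.0],
    weights=[[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.5, 0.0, 0.5]],
    bias=[0.0, 0.0],
)
```

In a trained detector the projection weights would be learned, so the network can decide per channel how much to rely on each modality.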
Ngoc Tuyen Do
School of Information and Communications, Hanoi University of Science and Technology
Tri Nhu Do
Assistant Professor, Polytechnique Montréal
Telecom, Wireless, Detection, AI/ML