FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

📅 2026-06-09
🤖 AI Summary
This work addresses knowledge distillation from RGB-pretrained models to infrared images, which is hindered by fundamental differences in imaging physics. The authors propose a frequency-decoupled distillation framework, motivated by the observation that cross-modal feature discrepancies are unevenly distributed across frequency bands: on average, high-frequency divergence is 2.4 times larger than low-frequency divergence. Building on this insight, they introduce a band-adaptive asymmetric distillation strategy. A strict MSE loss on low-frequency components preserves shared structural information, while a relaxed log-MSE loss (weighted at 0.1) on high-frequency components tolerates texture discrepancies and still provides edge guidance. Combining frequency decomposition, this hybrid loss, and spectral analysis of Transformer features, the method achieves 64.1 mAP50 on KAIST, 2.4 points above the DINOv2 baseline. The learned representation also transfers to another dataset (FLIR ADAS, +2.1 mAP50), another task (MFNet semantic segmentation, +1.85 mIoU), and another backbone (ResNet-50, +1.0 mAP50).
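The band-adaptive loss can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Only the 0.1 weight on the high-frequency term comes from the abstract. The radial FFT low-pass mask, the `cutoff` value of 0.25, and the reading of "log-MSE" as log(1 + MSE) are all hypothetical choices, as are the names `split_bands` and `freq_kd_loss`. Transformer patch tokens are assumed to have already been reshaped into a (C, H, W) grid.

```python
import numpy as np


def split_bands(feat: np.ndarray, cutoff: float = 0.25) -> tuple[np.ndarray, np.ndarray]:
    """Split a (..., H, W) feature map into low- and high-frequency parts.

    Hypothetical decomposition: a radial low-pass mask on the 2-D FFT of the
    last two (spatial) axes; `cutoff` is a fraction of the Nyquist frequency.
    """
    h, w = feat.shape[-2:]
    radius = np.hypot(np.fft.fftfreq(h)[:, None], np.fft.fftfreq(w)[None, :])
    low_mask = radius <= cutoff * 0.5  # 0.5 cycles/pixel is the Nyquist frequency
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = feat - low  # residual, so low + high reconstructs feat (up to rounding)
    return low, high


def freq_kd_loss(student: np.ndarray, teacher: np.ndarray,
                 cutoff: float = 0.25, high_weight: float = 0.1) -> float:
    """Asymmetric band-wise distillation loss (illustrative sketch only)."""
    s_low, s_high = split_bands(student, cutoff)
    t_low, t_high = split_bands(teacher, cutoff)
    # Strict supervision on the shared structure (shape, layout): plain MSE.
    low_term = np.mean((s_low - t_low) ** 2)
    # Relaxed supervision on texture and edges; log(1 + MSE) is an ASSUMED
    # reading of "log-MSE", chosen because it grows sub-linearly with the error.
    high_term = np.log1p(np.mean((s_high - t_high) ** 2))
    return float(low_term + high_weight * high_term)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_feat = rng.standard_normal((64, 16, 16))  # e.g. RGB-teacher tokens on a 16x16 grid
    student_feat = teacher_feat + 0.5 * rng.standard_normal((64, 16, 16))  # IR-student features
    print(freq_kd_loss(student_feat, teacher_feat))
```

Because the high-band term grows only logarithmically and is down-weighted, a large texture mismatch between RGB and IR features costs much less than an equal mismatch in the low band. For actual training the same operations would have to run in an autodiff framework; PyTorch, for example, offers `torch.fft.fft2` and `torch.fft.ifft2`.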
📝 Abstract
Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics. We investigate the spectral structure of the RGB-IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency. The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4× on average across all analysed transformer layers. On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A
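The abstract's spectral divergence analysis reports a high-to-low divergence ratio averaged over transformer layers. The abstract does not state the divergence metric, so the sketch below assumes a relative spectral L2 distance per band, computed directly on 2-D FFT coefficients and normalised by the RGB feature energy in that band. It also reuses a hypothetical radial cutoff. `band_masks`, `band_divergence` and `high_low_ratio` are illustrative names, not the paper's code. Each layer's features are expected as (N, C, H, W) arrays of paired RGB and IR samples.

```python
import numpy as np


def band_masks(h: int, w: int, cutoff: float = 0.25):
    """Boolean (H, W) masks for the low and high bands of an FFT grid.

    Hypothetical radial split; `cutoff` is a fraction of the Nyquist frequency.
    """
    radius = np.hypot(np.fft.fftfreq(h)[:, None], np.fft.fftfreq(w)[None, :])
    low = radius <= cutoff * 0.5
    return low, ~low


def band_divergence(f_rgb: np.ndarray, f_ir: np.ndarray, mask: np.ndarray,
                    eps: float = 1e-12) -> float:
    """Relative spectral L2 divergence inside one band (ASSUMED metric).

    f_rgb, f_ir: (C, H, W) features of one paired RGB/IR sample.
    """
    s_rgb = np.fft.fft2(f_rgb, axes=(-2, -1))
    s_ir = np.fft.fft2(f_ir, axes=(-2, -1))
    num = np.sum(np.abs(s_rgb - s_ir) ** 2 * mask)  # mask broadcasts over channels
    den = np.sum(np.abs(s_rgb) ** 2 * mask) + eps
    return float(num / den)


def high_low_ratio(layers_rgb, layers_ir, cutoff: float = 0.25) -> float:
    """Mean over layers of (mean high-band divergence / mean low-band divergence).

    layers_rgb, layers_ir: sequences over layers of (N, C, H, W) arrays.
    """
    ratios = []
    for f_rgb, f_ir in zip(layers_rgb, layers_ir):
        low, high = band_masks(*f_rgb.shape[-2:], cutoff)
        d_low = np.mean([band_divergence(a, b, low) for a, b in zip(f_rgb, f_ir)])
        d_high = np.mean([band_divergence(a, b, high) for a, b in zip(f_rgb, f_ir)])
        ratios.append(d_high / d_low)
    return float(np.mean(ratios))
```

A ratio above 1 means RGB and IR features disagree more in the high-frequency band than in the low-frequency band. Under the paper's own metric, the reported average is 2.4 on 500 paired samples.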
Problem

Research questions and friction points this paper is trying to address.

infrared object detection
knowledge distillation
cross-modal transfer
modality gap
frequency analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Decoupled Knowledge Distillation
Cross-Modal Transfer
Infrared Object Detection
Spectral Divergence Analysis
Asymmetric Supervision