🤖 AI Summary
To address the insufficient efficiency and robustness of image transmission for cooperative perception (CP) among networked mobile agents in dynamic scenarios, this paper proposes a task-driven Synesthesia of Machines (SoM)-based MIMO communication system. The method integrates SoM into a digital MIMO joint source-channel coding (JSCC) framework, combining feature pyramid extraction with closed-loop channel modeling to enable task-oriented semantic image compression and transmission. The end-to-end trainable architecture unifies deep learning-based JSCC, MIMO physical-layer communication, and task-aware optimization. Experimental results show that, under identical communication overhead, the proposed approach achieves average mAP improvements of 6.30 and 10.48 over two baseline JSCC methods, respectively, improving CP performance, particularly in low-SNR environments.
📝 Abstract
To support cooperative perception (CP) of networked mobile agents in dynamic scenarios, the efficient and robust transmission of sensory data is a critical challenge. Deep learning-based joint source-channel coding (JSCC) has demonstrated promising results for image transmission under adverse channel conditions, outperforming traditional rule-based codecs. While recent works have explored combining JSCC with the widely adopted multiple-input multiple-output (MIMO) technology, these approaches remain limited to the discrete-time analog transmission (DTAT) model and simple tasks. Given the limited performance of existing MIMO JSCC schemes in supporting complex CP tasks for networked mobile agents with digital MIMO communication systems, this paper presents a Synesthesia of Machines (SoM)-based task-driven MIMO system for image transmission, referred to as SoM-MIMO. By leveraging the structural properties of the feature pyramid for perceptual tasks and the channel properties of the closed-loop MIMO communication system, SoM-MIMO enables efficient and robust digital MIMO transmission of images. Experimental results show that, compared with two JSCC baseline schemes, our approach achieves average mAP improvements of 6.30 and 10.48 across all SNR levels while maintaining identical communication overhead.