🤖 AI Summary
This work addresses the incompatibility of existing deep joint source-channel coding methods with discrete semantic tokens from foundation models and the susceptibility of fixed constellations to catastrophic errors under channel noise. To overcome these limitations, the paper proposes Semantic Token Codebook Communication (STCC), a unified joint source-channel coding framework tailored for discrete semantic tokens. STCC employs a residual MLP encoder and a triple-loss objective to align the channel topology with the semantic embedding space, thereby learning geometrically structured constellations that turn noise perturbations into semantically or structurally plausible deviations. Experimental results show that STCC significantly outperforms conventional approaches at low signal-to-noise ratios and improves semantic robustness without requiring modifications to the receiver.
📝 Abstract
Deep Joint Source-Channel Coding (JSCC) has emerged as a promising paradigm for overcoming the "cliff effect" in wireless communications. However, existing Deep JSCC frameworks operate directly on raw analog data such as image pixels rather than on the discrete semantic tokens that foundation models require. Moreover, traditional systems employ fixed, hand-designed constellations that treat all tokens equally, leading to catastrophic random errors under channel noise. This paper proposes Semantic Token Codebook Communication (STCC), a unified source-channel coding framework for semantic tokens, designed to transmit the discrete semantic tokens of foundation models over noisy channels. The core of STCC is the Semantic Token Codec (STC). The STC accepts discrete tokens as input, which maintains compatibility with foundation models, and employs a residual multilayer perceptron (MLP) encoder that learns geometrically structured constellations optimized with a triple-loss objective. This learned mapping forces the channel topology to align with the semantic embedding space, so that channel noise produces topological errors rather than random corruption. This phenomenon is characterized theoretically and empirically as "Semantic Drift" in symbolic modalities and "Structural Distortion" in perceptual modalities, in which errors shift predictions to semantically or structurally similar tokens. Extensive experiments demonstrate that STCC significantly outperforms traditional systems in low-SNR regimes, effectively converting channel noise into semantic variations without requiring receiver-side modification.
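The token-to-constellation pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The vocabulary size, layer widths, 2-D symbol dimension, AWGN channel model, and nearest-neighbour receiver are all assumptions chosen for illustration, and the weights are random rather than trained. The triple-loss objective is not specified in the abstract, so it is omitted here.

```python
import numpy as np

def residual_mlp(x, w1, w2):
    """One residual block: x + relu(x @ w1) @ w2."""
    return x + np.maximum(x @ w1, 0.0) @ w2

def build_constellation(embeddings, w1, w2, w_out):
    """Map each token embedding to a channel symbol, scaled to unit average power."""
    points = residual_mlp(embeddings, w1, w2) @ w_out
    power = np.mean(np.sum(points ** 2, axis=1))
    return points / np.sqrt(power)

def awgn(symbols, snr_db, rng):
    """Add white Gaussian noise for a given SNR (average symbol power assumed 1)."""
    noise_var = 10.0 ** (-snr_db / 10.0)       # total noise power per symbol
    dim = symbols.shape[1]
    noise = rng.normal(0.0, np.sqrt(noise_var / dim), size=symbols.shape)
    return symbols + noise

def nearest_token(received, constellation):
    """Decode each received symbol to the index of the closest constellation point."""
    diff = received[:, None, :] - constellation[None, :, :]
    return np.argmin(np.sum(diff ** 2, axis=2), axis=1)

# Hypothetical sizes: 16 tokens, 8-dim embeddings, 2-D channel symbols.
rng = np.random.default_rng(0)
vocab, emb_dim, hidden, sym_dim = 16, 8, 32, 2
embeddings = rng.normal(size=(vocab, emb_dim))   # stand-in for foundation-model embeddings
w1 = rng.normal(scale=0.1, size=(emb_dim, hidden))
w2 = rng.normal(scale=0.1, size=(hidden, emb_dim))
w_out = rng.normal(size=(emb_dim, sym_dim))

constellation = build_constellation(embeddings, w1, w2, w_out)
tokens = rng.integers(0, vocab, size=1000)
received = awgn(constellation[tokens], snr_db=5.0, rng=rng)
decoded = nearest_token(received, constellation)
token_error_rate = float(np.mean(decoded != tokens))
```

In this sketch a decoding error always lands on a token whose constellation point is geometrically close to the transmitted one. If training placed semantically similar tokens close together in the constellation, as the abstract describes, such an error would replace a token with a semantically related one rather than an arbitrary one.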