ToDMA: Large Model-Driven Token-Domain Multiple Access for Semantic Communications

📅 2025-05-16
🤖 AI Summary
This paper addresses the token collisions, high latency, and perceptual distortion that arise when many users transmit tokens concurrently in semantic communication. It proposes a token-domain non-orthogonal multiple access (NOMA) mechanism in which a large number of devices share a token codebook for source coding and a modulation codebook for channel coding. The receiver combines compressed-sensing-based token detection, clustering of channel state information (CSI) across time slots, and masked-token prediction with a pre-trained multimodal large language model (MLLM), so that collisions are resolved at the semantic level. The work is presented as the first end-to-end semantic multiple access paradigm that requires no orthogonal resource partitioning. Experiments on text and image tasks show significantly lower end-to-end latency than orthogonal schemes, a 27.4% reduction in distortion rate, and better perceptual quality (measured by metrics including LPIPS and FID) than existing non-orthogonal methods.

📝 Abstract
Token communications (TokCom) is an emerging generative semantic communication concept that reduces transmission rates by using context and multimodal large language model (MLLM)-based token processing, with tokens serving as universal semantic units across modalities. In this paper, we propose a semantic multiple access scheme in the token domain, referred to as token domain multiple access (ToDMA), where a large number of devices share a token codebook and a modulation codebook for source and channel coding, respectively. Specifically, each transmitter first tokenizes its source signal and modulates each token to a codeword. At the receiver, compressed sensing is employed first to detect active tokens and the corresponding channel state information (CSI) from the superposed signals. Then, the source token sequences are reconstructed by clustering the token-associated CSI across multiple time slots. In case of token collisions, some active tokens cannot be assigned and some positions in the reconstructed token sequences are empty. We propose to use pre-trained MLLMs to leverage the context, predict masked tokens, and thus mitigate token collisions. Simulation results demonstrate the effectiveness of the proposed ToDMA framework for both text and image transmission tasks, achieving significantly lower latency compared to context-unaware orthogonal communication schemes, while also delivering superior distortion and perceptual quality compared to state-of-the-art context-unaware non-orthogonal communication methods.
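
To make the detection step in the abstract concrete, the following is a minimal sketch of a single time slot, not the paper's algorithm. It assumes a random Gaussian modulation codebook, noiseless reception, and orthogonal matching pursuit (OMP) as the compressed-sensing solver; the sizes `Q` and `L`, the token IDs, and the channel gains are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

Q, L = 256, 96  # hypothetical sizes: token vocabulary and codeword length (L < Q)

# Shared modulation codebook: one unit-norm complex codeword per token
# (a random Gaussian codebook is an assumption made for this sketch).
codebook = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))
codebook /= np.linalg.norm(codebook, axis=0)


def omp(A, y, max_atoms, tol=1e-8):
    """Orthogonal matching pursuit: greedily pick the columns of A that explain y."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(A.conj().T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same codeword twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the current support gives the per-token gains.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    return support, coeffs


# Three devices each send one token in this slot over their own channel.
tokens = [5, 17, 42]
gains = np.array([1.0 + 0.0j, -0.3 + 0.9j, 0.2 - 0.7j])

# The receiver only observes the superposition of the modulated tokens.
y = codebook[:, tokens] @ gains

support, coeffs = omp(codebook, y, max_atoms=8)
estimated_csi = dict(zip(support, coeffs))  # token id -> estimated channel gain
print(sorted(support))
```

In this toy setting the recovered support is the set of active tokens and each coefficient is the channel gain attached to that token. If two devices sent the same token, the receiver would see a single entry whose coefficient is the sum of their gains, which is the collision case the abstract describes.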
Problem

Research questions and friction points this paper is trying to address.

Proposes token domain multiple access (ToDMA) for semantic communications
Uses MLLMs to mitigate token collisions in shared codebooks
Reduces latency and improves quality in multimodal token transmission
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-domain multiple access for semantic communications
Compressed sensing detects active tokens and CSI
Pre-trained MLLMs predict masked tokens to mitigate collisions
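
The sequence-reconstruction step that sits between detection and MLLM prediction can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's method: it assumes each device's channel gain stays constant across slots, uses a simple greedy distance-threshold clustering, and the names `reconstruct_sequences`, `radius`, and `MASK` are hypothetical. Positions left as `MASK` are the ones a pre-trained MLLM would be asked to fill in from context; that prediction step is not implemented here.

```python
import numpy as np

MASK = None  # placeholder for a position lost to a token collision


def reconstruct_sequences(detections, num_devices, radius=0.2):
    """Group detected (token, csi) pairs by CSI and rebuild one sequence per device.

    detections[t] is the list of (token_id, csi) pairs detected in slot t.
    """
    centers, members = [], []  # running centroids and their (slot, token, csi) members
    for t, slot in enumerate(detections):
        for token, csi in slot:
            dists = [abs(csi - c) for c in centers]
            if dists and min(dists) < radius:
                k = int(np.argmin(dists))
                members[k].append((t, token, csi))
                centers[k] = np.mean([m[2] for m in members[k]])
            else:
                # No existing cluster is close enough: start a new one.
                centers.append(csi)
                members.append([(t, token, csi)])
    # Keep the num_devices largest clusters; the rest are treated as collision outliers.
    order = sorted(range(len(members)), key=lambda k: len(members[k]), reverse=True)
    sequences = []
    for k in order[:num_devices]:
        seq = [MASK] * len(detections)
        for t, token, _ in members[k]:
            seq[t] = token
        sequences.append(seq)
    return sequences


# Two devices with distinct, slot-invariant channel gains (hypothetical values).
gains = [1.0 + 0.0j, -0.3 + 0.9j]
sent = [[3, 7, 7, 1], [5, 7, 2, 4]]  # both devices send token 7 in slot 1

detections = []
for t in range(4):
    slot = {}
    for dev, seq in enumerate(sent):
        # Devices sending the same token superpose, so their gains add up.
        slot[seq[t]] = slot.get(seq[t], 0) + gains[dev]
    detections.append(list(slot.items()))

sequences = reconstruct_sequences(detections, num_devices=2)
print(sequences)  # [[3, None, 7, 1], [5, None, 2, 4]]
```

In slot 1 the collided token carries the summed gain, which matches neither device's cluster, so the token stays unassigned and both sequences keep an empty position there.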