Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion

📅 2026-05-28

🤖 AI Summary
This work addresses the limited generalization of current brain–computer interface (BCI) models, which are typically task-specific and lack cross-task modeling capabilities. The authors propose Mind-Omni, which they describe as the first unified multitask framework to integrate seven distinct brain–vision–language encoding and decoding tasks. Central to this approach is a novel Brain Tokenizer that discretizes continuous neural signals into tokens, so that brain, vision, and language data can interact at the token level in a shared semantic space and be generated from one another through a discrete diffusion mechanism. The framework relies on multitask co-training and a newly curated Brain Question Answering (BQA) instruction-tuning dataset. Mind-Omni sets a new state of the art among unified multitask frameworks, with some results matching or surpassing those of larger task-specific models, which the authors take as evidence for the value of unified modeling and multitask synergy in BCI systems.
📝 Abstract
Modeling the interplay between external stimuli and internal neural representations is a pivotal research area for Brain-Computer Interfaces (BCIs). A major limitation of prior work is the prevailing paradigm of specialized, single-task models, which curtails versatility and neglects inter-task synergies. To address this, we propose Mind-Omni, the first versatile framework that unifies seven distinct encoding and decoding tasks through a discrete diffusion paradigm. At its core is a novel Brain Tokenizer that transforms heterogeneous, continuous brain signals into standardized, discrete tokens. This enables direct, token-level interactions for mutual understanding and generation between any two or more modalities within a shared semantic space. To unlock advanced reasoning capabilities, we further curate a specialized Brain Question Answering (BQA) instruction-tuning dataset. Our model not only establishes a new state-of-the-art among multi-task unified frameworks but also provides strong evidence for multi-task synergy. By demonstrating performance competitive with, and at times superior to, larger specialized models, our work offers a powerful new paradigm for neural modeling and paves the way for foundation models of neural activity. The code is publicly available at https://github.com/ReedOnePeck/Mind-Omni.
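As a rough illustration of the two ideas named in the abstract, the sketch below turns continuous vectors into discrete tokens and then corrupts those tokens by masking. It is not the authors' implementation: nearest-neighbour codebook lookup as the tokenizer and absorbing-state masking as the diffusion forward step are both assumptions, and the names `quantize`, `mask_tokens`, and `MASK` are hypothetical.

```python
import random

MASK = -1  # hypothetical id for an absorbing [MASK] state (assumption)


def quantize(signal, codebook):
    """Map each continuous vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in signal:
        # Squared Euclidean distance from this vector to every codebook entry.
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def mask_tokens(tokens, mask_prob, rng):
    """Forward corruption: independently replace each token with MASK."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


# Toy data: three 2-D "signal" vectors and a three-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
signal = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

tokens = quantize(signal, codebook)  # [0, 1, 2]
noisy = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(0))
print(tokens, noisy)
```

In a masked discrete diffusion setup of this kind, a model would be trained to recover the original tokens from the corrupted sequence; once every modality is expressed as tokens, the same objective can in principle be shared across brain, image, and text data.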
Problem

Research questions and friction points this paper is trying to address.

Brain-Computer Interfaces
multi-task modeling
neural representations
unified framework
cross-modal interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion
Brain Tokenizer
multi-task synergy
brain-vision-language modeling
foundation models
Yizhuo Lu
Institute of Automation, Chinese Academy of Sciences
artificial intelligence, neural encoding and decoding
Changde Du
Institute of Automation, Chinese Academy of Sciences
machine learning, computer vision, computational neuroscience, brain-computer interface (BCI), artificial intelligence
Qingyu Shi
Peking University
computer vision, diffusion, multimodal
Hang Chen
University of Science and Technology of China
Audio-Visual Speech Enhancement, Audio-Visual Speech Recognition
Jie Peng
Renmin University of China
Liuyun Jiang
School of Future Technology, University of Chinese Academy of Sciences, Beijing, China
Shuangchen Zhao
NeuBCI Lab, State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Zhongguancun Academy, Beijing, China
Huiguang He
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence, medical image processing, Brain Computer Interface