PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address information decay and semantic fragmentation in large language models (LLMs) during long-context understanding, this work draws inspiration from biological working memory and cortical modularity and proposes a novel architecture, PaceLLM, featuring: (1) the Persistent Activity (PA) Mechanism, which emulates sustained prefrontal neuronal firing to enable dynamic reuse of critical hidden states; and (2) Cortical Expert (CE) Clustering, which performs semantic-driven, task-adaptive clustering of feed-forward network (FFN) weights to mitigate semantic fragmentation. The method is fully compatible with existing LLMs and supports activation-level memory banking and cross-token dependency modeling. Experiments demonstrate consistent improvements: +6% on LongBench multi-document QA, +12.5–17.5% on Infinite-Bench, and robust retrieval on Needle-In-A-Haystack tests at context lengths up to 200K tokens. The approach significantly enhances long-range contextual modeling while improving interpretability.
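The summary describes the PA mechanism only at a high level, so below is a minimal PyTorch sketch of what an activation-level memory bank for FFN hidden states could look like. Everything concrete in it (the `ActivationMemoryBank` name, the saliency threshold, the cosine-similarity top-k retrieval, and the blending weight `alpha`) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

class ActivationMemoryBank:
    """Illustrative activation-level memory bank: stores salient FFN hidden
    states and blends the most similar stored entries back into the current
    activation, so earlier context keeps influencing later tokens."""

    def __init__(self, capacity: int = 512, top_k: int = 4):
        self.capacity = capacity
        self.top_k = top_k
        self.bank = []  # detached hidden-state vectors, FIFO order

    def update(self, hidden: torch.Tensor, saliency: torch.Tensor,
               threshold: float = 0.5) -> None:
        # Keep only activations whose saliency score exceeds the threshold.
        for h, s in zip(hidden, saliency):
            if s.item() > threshold:
                self.bank.append(h.detach())
        self.bank = self.bank[-self.capacity:]  # evict oldest entries

    def retrieve(self, query: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
        # Blend the query with its nearest stored activations (cosine similarity).
        if not self.bank:
            return query
        mem = torch.stack(self.bank)                            # (N, d)
        sims = F.cosine_similarity(query.unsqueeze(0), mem, dim=-1)  # (N,)
        k = min(self.top_k, mem.size(0))
        recalled = mem[sims.topk(k).indices].mean(dim=0)        # (d,)
        return (1 - alpha) * query + alpha * recalled
```

A host model would call `update` on each layer's FFN output and `retrieve` before passing activations onward, which is what gives earlier context a persistent route into later tokens.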

📝 Abstract
While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and by unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves a 6% improvement on LongBench's Multi-document QA and 12.5–17.5% gains on Infinite-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to existing methods; it generalizes to any model, enhancing long-context performance and interpretability without structural overhauls.
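As a companion to the PA sketch above, here is a minimal illustration of the CE clustering idea: grouping FFN intermediate neurons into expert modules by clustering their weight vectors. The use of spherical k-means, cosine assignment, and the `num_experts` parameter are assumptions for illustration; the paper's actual semantic-driven, task-adaptive clustering criterion may differ.

```python
import torch
import torch.nn.functional as F

def cluster_ffn_experts(w_up: torch.Tensor, num_experts: int = 8,
                        n_iters: int = 20) -> torch.Tensor:
    """Assign each FFN intermediate neuron (one row of the up-projection
    weight, shape (d_ff, d_model)) to one of `num_experts` clusters via
    spherical k-means. Returns a cluster id per neuron."""
    x = F.normalize(w_up, dim=-1)                    # unit-norm neuron vectors
    # Initialize centroids with randomly chosen neurons.
    centroids = x[torch.randperm(x.size(0))[:num_experts]].clone()
    for _ in range(n_iters):
        assign = (x @ centroids.T).argmax(dim=-1)    # nearest centroid by cosine
        for e in range(num_experts):
            members = x[assign == e]
            if members.numel() > 0:                  # skip empty clusters
                centroids[e] = F.normalize(members.mean(dim=0), dim=0)
    return (x @ centroids.T).argmax(dim=-1)          # final cluster id per neuron
```

The resulting cluster ids could then be used to permute and block-partition the up-projection (and the matching columns of the down-projection) so that each block acts as one semantic "expert" module.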
Problem

Research questions and friction points this paper aims to address.

Addresses information decay in LLMs due to transient neural activations
Reduces semantic fragmentation from unstructured FFN weights in LLMs
Enhances long-context understanding and performance in language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persistent Activity (PA) Mechanism mimics the sustained firing of PFC neurons via an activation-level memory bank
Cortical Expert (CE) Clustering reorganizes FFN weights into semantic expert modules
Enhances long-context performance and interpretability without structural overhauls (see the integration sketch after this list)
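Because both mechanisms operate on FFN activations and weights rather than the attention stack, they can in principle be attached to a pretrained model from the outside. The sketch below is a speculative example rather than the authors' code: it wires the memory bank from the PA sketch into one decoder layer of a LLaMA-style Hugging Face model via a PyTorch forward hook. The attribute path `model.model.layers[...].mlp`, the norm-based saliency proxy, and the per-token retrieval loop are all assumptions.

```python
import torch

def attach_memory_bank(model, bank, layer_idx: int = -1):
    """Route one decoder layer's FFN output through an ActivationMemoryBank
    (see the PA sketch above) without modifying the model's weights."""
    ffn = model.model.layers[layer_idx].mlp  # LLaMA-style attribute path (assumed)

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output  # (B, T, d)
        b, t, d = hidden.shape
        flat = hidden.reshape(-1, d)
        # Crude saliency proxy: relative activation norm (assumption).
        norms = flat.norm(dim=-1)
        bank.update(flat, norms / norms.max().clamp(min=1e-6))
        blended = torch.stack([bank.retrieve(h) for h in flat])
        return blended.reshape(b, t, d)  # returned value replaces the FFN output

    return ffn.register_forward_hook(hook)
```

Calling `attach_memory_bank(model, ActivationMemoryBank())` returns a hook handle; `handle.remove()` restores the original behavior, which is what makes this kind of integration non-invasive.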
Authors

Kangcong Li (Fudan University)
Peng Ye (Shanghai Artificial Intelligence Laboratory)
Chongjun Tu (Fudan University)
Lin Zhang (School of Information Science and Technology, Fudan University)
Chunfeng Song (Shanghai Artificial Intelligence Laboratory)
Jiamin Wu (Shanghai Artificial Intelligence Laboratory)
Tao Yang (School of Information Science and Technology, Fudan University)
Qihao Zheng (Shanghai Artificial Intelligence Laboratory)
Tao Chen (School of Information Science and Technology, Fudan University)