PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding

📅 2025-06-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address information decay and semantic fragmentation in large language models (LLMs) during long-context understanding, this work draws on biological working memory and cortical modularity. It proposes PaceLLM, which has two components: (1) a Persistent Activity (PA) mechanism, which emulates sustained prefrontal neuronal firing by keeping an activation-level memory bank from which critical feed-forward network (FFN) states are retrieved, reused, and updated; and (2) Cortical Expert (CE) Clustering, which reorganizes FFN weights into task-adaptive semantic modules to establish cross-token dependencies and mitigate fragmentation. The method is compatible with existing LLMs and requires no structural overhaul. Reported results are a 6% improvement on LongBench multi-document QA, 12.5–17.5% gains on Infinite-Bench tasks, and a measurable context length extended to 200K tokens in Needle-In-A-Haystack (NIAH) tests. The authors also report improved interpretability.

📝 Abstract

While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves a 6% improvement on LongBench's Multi-document QA and 12.5-17.5% performance gains on Infinite-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to other approaches. Moreover, it can be generalized to any model, enhancing long-context performance and interpretability without structural overhauls.
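The retrieve, reuse, and update cycle of an activation-level memory bank can be illustrated with a small sketch. This is not the authors' implementation: the class name `ActivationMemoryBank`, the cosine-similarity lookup, the `threshold` and `alpha` parameters, and the first-in-first-out eviction are all assumptions chosen only to make the idea concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ActivationMemoryBank:
    """Hypothetical memory bank for FFN activations (illustrative only)."""

    def __init__(self, capacity=4, threshold=0.8, alpha=0.5):
        self.capacity = capacity    # assumed: maximum number of stored states
        self.threshold = threshold  # assumed: similarity needed to reuse a state
        self.alpha = alpha          # assumed: weight given to the stored state
        self.slots = []

    def step(self, activation):
        # Retrieve: find the stored state most similar to the current one.
        best_index, best_sim = -1, -1.0
        for i, slot in enumerate(self.slots):
            sim = cosine(activation, slot)
            if sim > best_sim:
                best_index, best_sim = i, sim

        if best_index >= 0 and best_sim >= self.threshold:
            # Reuse: blend the stored state with the current activation.
            stored = self.slots[best_index]
            fused = [self.alpha * s + (1 - self.alpha) * a
                     for s, a in zip(stored, activation)]
            # Update: write the blended state back into the bank.
            self.slots[best_index] = fused
            return fused

        # No close match: store the new state, evicting the oldest if full.
        self.slots.append(list(activation))
        if len(self.slots) > self.capacity:
            self.slots.pop(0)
        return list(activation)


bank = ActivationMemoryBank(capacity=2, threshold=0.7)
first = bank.step([1.0, 0.0])   # stored as a new slot
second = bank.step([0.8, 0.6])  # similar enough to be blended with the first
```

In this sketch a state persists only while later activations keep matching it, which is one simple way to approximate sustained activity; how the paper scores, selects, and refreshes states is not specified in the abstract.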

Problem

Research questions and friction points this paper is trying to address.

Addresses information decay in LLMs due to transient neural activations

Reduces semantic fragmentation from unstructured FFN weights in LLMs

Enhances long-context understanding and performance in language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Persistent Activity Mechanism mimics PFC neurons

Cortical Expert Clustering reorganizes FFN weights

Enhances long-context performance without structural changes
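One way to picture reorganizing FFN weights into modules without changing the architecture is to cluster the hidden neurons and reorder them so that each cluster is contiguous. The sketch below is a toy illustration under stated assumptions, not the paper's procedure: it uses plain k-means on the rows of a hypothetical input-projection matrix `w_in`, seeds the centroids with the first `k` rows, and the function names are invented for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(rows, k, iters=10):
    """Plain k-means; centroids are seeded with the first k rows (assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in rows
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_ffn_neurons(w_in, k):
    """Return a neuron ordering that groups similar neurons, plus their labels.

    w_in is a list of rows, one weight vector per hidden neuron.
    """
    labels = kmeans_labels(w_in, k)
    order = sorted(range(len(w_in)), key=lambda i: (labels[i], i))
    return order, [labels[i] for i in order]


# Four hypothetical neurons: two respond to feature 0, two to feature 1.
w_in = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
order, grouped_labels = cluster_ffn_neurons(w_in, k=2)
```

Applying the same permutation to the rows of the input projection and the columns of the output projection leaves a standard FFN's output unchanged, because the activation function acts on each neuron independently. That property is why a regrouping of this kind needs no structural change; whether PaceLLM clusters on weights alone or also uses activation statistics is not stated in this summary.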
