Free(): Learning to Forget in Malloc-Only Reasoning Models

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of large language models in long-context reasoning due to the accumulation of redundant information. The authors propose Free()LM, the first framework to incorporate an endogenous “forgetting” mechanism during inference, thereby breaking from the conventional paradigm of strictly additive context retention. Free()LM employs a plug-and-play LoRA-based Free-Module that dynamically switches between reasoning and cleanup modes, continuously evaluating and pruning low-importance context tokens to maintain a compact, low-noise reasoning state. Experiments demonstrate consistent performance gains across models ranging from 8B to 685B parameters, with an average improvement of 3.3% over state-of-the-art baselines. Notably, Free()LM establishes a new SOTA on IMOanswerBench and restores accuracy to 50% on long-context tasks where Qwen3-235B-A22B completely fails.

📝 Abstract
Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike, with no mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact, noise-free state. Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, on long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
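The abstract's reason-then-clean cycle can be pictured as a loop over context chunks: each reasoning step appends a chunk, and each cleaning step drops chunks judged unimportant. The sketch below is a hypothetical illustration only; the chunk representation, the `importance` scorer (played in Free()LM by the LoRA-based Free-Module), and the threshold are all assumptions, not the authors' implementation.

```python
def importance(chunk: str) -> int:
    # Placeholder scorer: Free()LM uses a learned Free-Module for this;
    # here we crudely approximate importance by word count.
    return len(chunk.split())

def free_lm_step(context: list[str], new_chunk: str, keep_threshold: int = 3) -> list[str]:
    """One reason-then-clean iteration over a list of context chunks."""
    # Reasoning mode: append the newly generated thinking step.
    context = context + [new_chunk]
    # Cleaning mode: prune chunks whose importance falls below the
    # threshold, keeping the reasoning state compact and low-noise.
    return [c for c in context if importance(c) >= keep_threshold]

context = ["let x be the unknown quantity"]
context = free_lm_step(context, "try substituting x = 2; it fails")  # kept
context = free_lm_step(context, "hmm")                               # pruned
print(context)
```

In contrast, a "malloc-only" model corresponds to skipping the cleaning step entirely, so every chunk, including dead ends like `"hmm"`, stays in context forever.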
Problem

Research questions and friction points this paper is trying to address.

reasoning models
malloc-only
forgetting
context pruning
thinking tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-forgetting
reasoning models
context pruning
LoRA adapter
malloc-only architecture