Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

📅 2026-06-02
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of current large language models: they struggle to convert short-term in-context knowledge into long-term parametric memory and cannot learn continually. Inspired by human sleep, the authors propose a "Sleep" paradigm with two stages: memory consolidation and "Dreaming." In the consolidation stage, called Knowledge Seeding, the memories of a smaller model are distilled upward into a larger network, using a Generalized Distillation process that combines on-policy distillation with reinforcement learning (RL)-based imitation learning. In the Dreaming stage, the model uses RL to generate a curriculum of synthetic data, rehearsing new knowledge and refining existing capabilities without human supervision. Experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
📝 Abstract
The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to learn continually and to effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a "Sleep" paradigm that allows models to learn continually, distill their fragile short-term memories into stable long-term knowledge through replay, and recursively improve themselves through a "Dreaming" process. In more detail, sleep consists of two stages: (1) Memory Consolidation: an upward distillation process, called Knowledge Seeding, where the memories of a smaller self are distilled into a larger network to provide more capacity while preserving the knowledge. As a proof of concept, we present a new Generalized Distillation process for Knowledge Seeding (i.e., the combination of on-policy distillation with Reinforcement Learning (RL)-based imitation learning); (2) Dreaming: a self-improvement phase, where the model uses RL to generate a curriculum of synthetic data to rehearse new knowledge and refine existing capabilities without human supervision. Our experiments on long-horizon, continual learning, knowledge incorporation, and few-shot generalization tasks support the importance of the sleep stage.
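The abstract does not give the Knowledge Seeding objective, so the following is only a minimal sketch of one common way on-policy distillation is formulated: the student (here, the larger network) samples a sequence, and a per-token reverse KL divergence against the teacher (here, the smaller self) is averaged over that sequence. The function names `reverse_kl` and `on_policy_distillation_loss`, the choice of reverse KL, and the toy distributions are all assumptions for illustration, not the authors' method.

```python
import math
from typing import Sequence


def reverse_kl(student: Sequence[float], teacher: Sequence[float]) -> float:
    """KL(student || teacher) for two categorical distributions.

    Assumes both are over the same vocabulary and that the teacher assigns
    non-zero probability wherever the student does (hypothetical helper).
    """
    return sum(s * math.log(s / t) for s, t in zip(student, teacher) if s > 0)


def on_policy_distillation_loss(
    student_steps: Sequence[Sequence[float]],
    teacher_steps: Sequence[Sequence[float]],
) -> float:
    """Average per-token reverse KL along one student-sampled sequence.

    student_steps[i] and teacher_steps[i] are the next-token distributions
    of each model at position i of the sampled sequence.
    """
    if not student_steps or len(student_steps) != len(teacher_steps):
        raise ValueError("need two non-empty sequences of equal length")
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)


# Toy example: the student disagrees with the teacher only at the first token.
student = [[0.9, 0.1], [0.5, 0.5]]
teacher = [[0.5, 0.5], [0.5, 0.5]]
loss = on_policy_distillation_loss(student, teacher)
```

In this sketch the loss is zero only when the student matches the teacher at every sampled position, so minimizing it pulls the larger network toward the smaller self's behavior on the student's own samples. The RL-based imitation component and the Dreaming stage are not modeled here.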
Problem

Research questions and friction points this paper is trying to address.

continual learning
memory consolidation
language models
long-term knowledge
in-context learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sleep paradigm
Memory Consolidation
Knowledge Seeding
Dreaming
Continual Learning