A Novel Self-Evolution Framework for Large Language Models

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing large language models (LLMs) are constrained by static pretraining knowledge; while post-training techniques such as retrieval-augmented generation or preference optimization improve user alignment, they rarely enhance domain-specific cognitive capabilities. To address this, we propose DPSE, a two-stage self-evolving framework: Stage I strengthens domain grounding via supervised fine-tuning on expert-curated data to boost professional cognition; Stage II applies frequency-aware preference optimization that jointly models topic coherence and user satisfaction. A multi-dimensional interactive review module enables topic-aware, preference-guided data augmentation, feeding a closed-loop self-evolutionary learning pipeline. Experiments demonstrate that DPSE significantly outperforms baselines, including supervised fine-tuning, direct preference optimization, and memory-augmented methods, on both general NLP benchmarks and long-horizon dialogue tasks. To our knowledge, DPSE is the first approach to simultaneously advance domain expertise and user alignment.

📝 Abstract
The capabilities of Large Language Models (LLMs) are constrained by pre-training, so researchers have turned to post-training optimization. Existing post-training strategies, such as memory-based retrieval or preference optimization, improve user alignment yet fail to enhance the model's domain cognition. To bridge this gap, we propose a novel Dual-Phase Self-Evolution (DPSE) framework that jointly optimizes user preference adaptation and domain-specific competence. DPSE introduces a Censor module to extract multi-dimensional interaction signals and estimate satisfaction scores, which guide structured data expansion via topic-aware and preference-driven strategies. These expanded datasets support a two-stage fine-tuning pipeline: supervised domain grounding followed by frequency-aware preference optimization. Experiments across general NLP benchmarks and long-term dialogue tasks demonstrate that DPSE consistently outperforms Supervised Fine-Tuning, Preference Optimization, and Memory-Augmented baselines. Ablation studies validate the contribution of each module. Our framework thus provides an autonomous path toward the continual self-evolution of LLMs.
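To make the abstract's closed loop concrete, here is a minimal sketch of how a Censor-style module might turn interaction signals into satisfaction scores that gate data expansion. The signal fields, scoring weights, and threshold below are hypothetical placeholders for illustration only; the paper does not specify this implementation.

```python
# Illustrative sketch of the DPSE closed loop described in the abstract.
# The interaction signals, scoring weights, and threshold are assumptions,
# not the paper's actual Censor implementation.
from dataclasses import dataclass

@dataclass
class Interaction:
    topic: str
    user_turns: int         # assumed signal: length of engagement
    explicit_rating: float  # assumed signal: user feedback in [0, 1]

def censor_score(x: Interaction) -> float:
    """Combine multi-dimensional signals into a satisfaction estimate (assumed weighting)."""
    engagement = min(x.user_turns / 10, 1.0)
    return 0.5 * engagement + 0.5 * x.explicit_rating

def expand_data(logs, threshold=0.7):
    """Topic-aware, preference-driven expansion: route examples to the two stages."""
    sft_set, pref_set = [], []
    for x in logs:
        s = censor_score(x)
        if s >= threshold:
            sft_set.append((x.topic, s))   # Stage I: supervised domain grounding
        pref_set.append((x.topic, s))      # Stage II: preference optimization signal
    return sft_set, pref_set
```

The point of the sketch is the routing: high-satisfaction interactions feed supervised domain grounding, while all scored interactions carry preference signal for the second stage, closing the self-evolution loop.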
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' domain cognition beyond pre-training limits
Jointly optimize user preference and domain competence
Enable autonomous continual self-evolution for LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Phase Self-Evolution framework for LLMs
Censor module extracts multi-dimensional interaction signals
Two-stage fine-tuning pipeline enhances domain and preference
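The "frequency-aware" aspect of the second fine-tuning stage can be pictured as topic re-weighting. The inverse-frequency scheme below is an assumption for illustration (the paper's actual weighting is not reproduced here): rare topics are up-weighted so that frequent topics do not dominate the preference loss.

```python
# Hypothetical sketch of frequency-aware weighting for preference optimization.
# Assumes an inverse-frequency scheme normalized to mean 1.0; the paper's
# actual formulation may differ.
from collections import Counter

def frequency_weights(topics):
    """Map each topic to an inverse-frequency weight with mean 1.0 over examples."""
    counts = Counter(topics)
    total = len(topics)
    raw = {t: total / c for t, c in counts.items()}
    mean = sum(raw[t] for t in topics) / total
    return {t: w / mean for t, w in raw.items()}
```

These per-topic weights would then scale each example's contribution to the preference-optimization objective.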