Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

📅 2026-06-03
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of existing lifelong learning agents: they rely on static parameters and discrete skill retrieval during inference, so they struggle to continuously internalize feedback at test time the way humans do. To overcome this, the authors propose LifeSkill, a two-stage reinforcement learning framework for online lifelong learning. First, a verifier-guided unsupervised skill discovery mechanism extracts effective skills. Then, during testing, the framework converts skill-conditioned trajectories into reward signals that drive online policy updates, without requiring experience replay. On LifelongAgentBench, LifeSkill outperforms current baselines by an average of 7 absolute percentage points, marking the first demonstration of continuous capability internalization during inference.
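The online update described above can be pictured with a toy loop. This is a minimal sketch under stated assumptions, not LifeSkill's implementation. The LLM policy is replaced by a softmax over four discrete actions. A "skill" is modeled as an additive logit bias standing in for a skill placed in the prompt. The verifier is a 0/1 check, and the update is plain REINFORCE with a group-mean baseline. All names (`ToyPolicy`, `online_internalization`, `skill_for`) are hypothetical. The sketch illustrates two properties. First, rollouts are generated with the skill in context, but the gradient lands on shared parameters, so the skill-free policy improves as well. Second, each task's trajectories are used for one update and then dropped rather than replayed.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicy:
    """Hypothetical stand-in for the LLM policy: a softmax over a few discrete actions."""

    def __init__(self, n_actions, lr=0.5):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def probs(self, skill_bias=None):
        # A "skill" is modeled as an additive logit bias (assumption), standing in
        # for a skill description placed in the prompt. None means no skill.
        bias = skill_bias if skill_bias is not None else [0.0] * len(self.logits)
        return softmax([x + b for x, b in zip(self.logits, bias)])

    def sample(self, rng, skill_bias=None):
        return rng.choices(range(len(self.logits)), weights=self.probs(skill_bias))[0]

    def reinforce_update(self, actions, advantages, skill_bias=None):
        # One REINFORCE step for the skill-conditioned policy on the SHARED logits:
        # d log p(a | skill) / d logit_i = 1[i == a] - p_i
        p = self.probs(skill_bias)
        grads = [0.0] * len(self.logits)
        for a, adv in zip(actions, advantages):
            for i in range(len(self.logits)):
                grads[i] += adv * ((1.0 if i == a else 0.0) - p[i])
        self.logits = [x + self.lr * g for x, g in zip(self.logits, grads)]


def online_internalization(policy, task_stream, verifier, skill_for, k=4, seed=0):
    """Visit each task once and update immediately; no replay buffer is kept."""
    rng = random.Random(seed)
    for task in task_stream:
        bias = skill_for(task)
        actions = [policy.sample(rng, bias) for _ in range(k)]
        rewards = [verifier(task, a) for a in actions]
        baseline = sum(rewards) / k  # group-mean baseline (assumption)
        policy.reinforce_update(actions, [r - baseline for r in rewards], bias)
        # actions and rewards are overwritten on the next task: nothing is stored for replay.
    return policy


TARGET = 2  # toy ground truth: action 2 solves every task
policy = ToyPolicy(n_actions=4)
before = policy.probs()[TARGET]  # skill-free policy before any test-time updates
online_internalization(
    policy,
    task_stream=range(200),
    verifier=lambda task, a: 1.0 if a == TARGET else 0.0,
    skill_for=lambda task: [0.0, 0.0, 1.0, 0.0],  # skill nudges toward the solution
)
after = policy.probs()[TARGET]  # measured WITHOUT the skill bias
print(f"skill-free p(correct): {before:.2f} -> {after:.2f}")
```

`after` is read from the policy with no skill bias. A rise from 0.25 is therefore the toy analogue of moving a skill into parameters instead of retrieving it into context.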
๐Ÿ“ Abstract
Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically rely on retrieving discrete skills or past experiences while keeping their parameters static during inference. This prevents them from continuously internalizing test-time feedback as human learners do. To bridge this gap, we propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents. Specifically, we design Verifier-Guided Skill Learning, which addresses the lack of direct supervision for skill extraction. It rewards each candidate skill according to the average verifier success of multiple policy rollouts conditioned on that skill, which encourages the model to generate skills that actually help solve tasks rather than skills that merely sound plausible. Furthermore, we introduce Online Skill Internalization, which continuously improves the policy model during test-time interaction by transforming skill-conditioned trajectories into reward signals. This lets the agent internalize reasoning capabilities directly into its parameters, avoiding the context bloat of experience retrieval. Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points over existing lifelong agent baselines.
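The verifier-guided skill reward can be sketched as a Monte Carlo average. The sketch assumes a binary verifier and a toy rollout policy that follows a skill's suggested action 90% of the time and otherwise acts uniformly at random. `skill_reward`, `score_candidates`, `toy_rollout`, and `binary_verifier` are hypothetical names. In the paper, the rollouts come from the LLM policy and the resulting rewards train the skill generator; that training step is not shown here. A skill whose advice is wrong scores near zero even if its text reads well, which is the behavior the reward is meant to encourage.

```python
import random


def skill_reward(skill, task, rollout, verifier, k, rng):
    """Mean verifier success over k rollouts conditioned on one candidate skill."""
    successes = [verifier(task, rollout(task, skill, rng)) for _ in range(k)]
    return sum(successes) / k


def score_candidates(candidates, task, rollout, verifier, k=16, seed=0):
    """Map each candidate skill name to its estimated reward."""
    rng = random.Random(seed)
    return {
        name: skill_reward(skill, task, rollout, verifier, k, rng)
        for name, skill in candidates.items()
    }


def toy_rollout(task, skill_hint, rng, n_actions=4):
    # Hypothetical policy: follows the skill's suggested action 90% of the time,
    # otherwise picks an action uniformly at random.
    if rng.random() < 0.9:
        return skill_hint
    return rng.randrange(n_actions)


def binary_verifier(task, action):
    # Toy task: the task value *is* the correct action; reward 1.0 only on success.
    return 1.0 if action == task else 0.0


# Two candidate skills for a task whose solution is action 2.
candidates = {"useful": 2, "plausible_but_wrong": 0}
scores = score_candidates(candidates, task=2, rollout=toy_rollout, verifier=binary_verifier)
print(scores)
```

Averaging over several rollouts, rather than scoring a single one, reduces the variance of the reward. This matters because one lucky or unlucky rollout would otherwise decide a skill's score.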
Problem

Research questions and friction points this paper is trying to address.

lifelong learning
online learning
test-time adaptation
skill internalization
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Learning
Skill Internalization
Verifier-Guided Learning
Online Lifelong Learning
LLM Agents