CLaaS: Continual learning as a service for sample efficient online learning

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses continual learning for large language model agents deployed in dynamic environments, where distribution shift requires ongoing adaptation without catastrophic forgetting. The authors propose CLaaS, an online continual learning system exposed through a chat API that combines experience replay with asynchronous parameter updates. CLaaS stores rollouts in a replay buffer and reuses them for gradient updates, which raises sample efficiency when each scenario can be sampled only once. On an adversarial task, parametric updates yield better forward transfer and less forgetting than in-context learning, and replay proves critical for sample efficiency.

📝 Abstract

Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, adaptation can be performed from accumulated agent experiences and retain prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scenario, as real-world environments cannot be trivially reset. To this end, we investigate an experiential and online continual learning setting in which agents learn from a stream of scenarios. We propose continual learning as-a-service (CLaaS), a system which enables agents to improve during deployment, abstracted behind a chat API. To increase sample efficiency, CLaaS stores rollouts in an experience replay buffer for gradient reuse during asynchronous training. We evaluate CLaaS on an adversarial task, demonstrating that parametric updates lead to superior forward transfer and less forgetting than in-context learning, with replay being a critical choice for sample efficiency.
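The replay idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Rollout` schema, the `ReplayBuffer` class, the FIFO eviction policy, uniform sampling, and the `run_stream` driver are all hypothetical choices made for illustration, and the loop runs synchronously rather than asynchronously to keep the example short. It only shows how a rollout that can be collected once per scenario may still contribute to several training batches.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """One interaction trajectory (hypothetical schema, not from the paper)."""
    scenario_id: int
    messages: tuple  # chat turns as (role, content) pairs
    reward: float


class ReplayBuffer:
    """Fixed-capacity FIFO buffer (assumed policy): oldest rollouts are evicted first."""

    def __init__(self, capacity: int, seed: int = 0):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, rollout: Rollout) -> None:
        self._items.append(rollout)

    def sample(self, batch_size: int) -> list:
        # No duplicates within a batch, but a rollout may reappear in later batches.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def run_stream(num_scenarios=6, capacity=4, batch_size=2, updates_per_scenario=2):
    """Visit each scenario once, then draw several training batches from the buffer."""
    buffer = ReplayBuffer(capacity)
    uses = {}  # scenario_id -> number of batches the rollout appeared in
    for sid in range(num_scenarios):
        # Each scenario yields exactly one rollout; the environment is not reset.
        turns = (("user", f"task {sid}"), ("assistant", "..."))
        buffer.add(Rollout(sid, turns, reward=0.0))
        for _ in range(updates_per_scenario):
            for rollout in buffer.sample(batch_size):
                uses[rollout.scenario_id] = uses.get(rollout.scenario_id, 0) + 1
            # A real system would compute a loss on the batch and update parameters here.
    return buffer, uses


if __name__ == "__main__":
    buffer, uses = run_stream()
    print(len(buffer), uses)
```

In this sketch the number of gradient batches is decoupled from the number of scenarios: six one-shot rollouts feed more than six sampled training examples, which is the sense in which replay improves sample efficiency.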

Problem

Research questions and friction points this paper is trying to address.

continual learning

distribution shift

sample efficiency

online learning

experience replay

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

experience replay

sample efficiency