VerificAgent: Integrating Expert Knowledge and Fact-Checked Memory for Robust Domain-Specific Task Planning

📅 2025-06-03

🤖 AI Summary

Computer-Using Agents (CUAs) deployed in professional productivity software suffer from hallucinated learnings and performance degradation caused by unbounded memory accumulation. Method: We propose a Controllable Continual Memory Enhancement framework featuring three synergistic mechanisms: (1) expert-knowledge-guided memory seed initialization; (2) trajectory-level memory refinement via interactive feedback; and (3) a human-in-the-loop factual verification loop before deployment. The framework integrates domain knowledge injection, interaction-trajectory-driven memory updates, and human-agent collaborative validation. Contribution/Results: Evaluated end-to-end on OSWorld productivity tasks, our approach achieves a 111.1% relative improvement in task success rate over a baseline CUA, without additional fine-tuning. It suppresses cumulative memory errors, improving planning reliability and trustworthiness.

📝 Abstract

Continual memory augmentation allows computer-use agents (CUAs) to learn from past interactions and refine their task-solving strategies over time. However, unchecked memory accumulation can introduce spurious or hallucinated "learnings" that degrade agent performance, particularly in domain-specific workflows such as productivity software. We present a novel framework, VerificAgent, that effectively manages memory for CUAs through (1) an expert-curated seed of domain knowledge, (2) iterative, trajectory-based memory refinement during training, and (3) a post-hoc fact-checking pass by human experts to sanitize accumulated memory before deployment. On OSWorld productivity tasks, VerificAgent achieves a 111.1% relative improvement in success rate over a baseline CUA without any additional fine-tuning.
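To make the "relative improvement" figure concrete, here is a minimal sketch of how such a percentage is computed from two success rates. The function name and the example rates are hypothetical and chosen only for illustration; the abstract does not state the underlying baseline or final success rates.

```python
def relative_improvement(baseline_rate: float, new_rate: float) -> float:
    """Return the relative improvement of new_rate over baseline_rate, in percent."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate * 100.0


# Hypothetical rates, NOT taken from the paper: a success rate that slightly
# more than doubles yields a relative improvement just above 100%.
example = relative_improvement(0.09, 0.19)
print(f"{example:.1f}%")
```

A relative improvement above 100% means the success rate more than doubled; it says nothing by itself about the absolute success rate.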

Problem

Research questions and friction points this paper is trying to address.

Manages unchecked memory accumulation that degrades agent performance

Integrates expert knowledge for robust domain-specific task planning

Ensures fact-checked memory refinement to improve success rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-curated seed of domain knowledge

Iterative trajectory-based memory refinement

Post-hoc fact-checking by human experts
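The three components above can be sketched as a small memory pipeline. This is an illustrative outline under stated assumptions, not the authors' implementation: the names `MemoryEntry`, `AgentMemory`, `seed_memory`, `refine_from_trajectory`, and `fact_check` are hypothetical, and the learning extractor and the expert reviewer are passed in as plain callables because the summary does not describe how either is realized.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class MemoryEntry:
    text: str
    source: str  # "expert_seed" or "trajectory" (hypothetical labels)
    verified: bool = False


@dataclass
class AgentMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, text: str, source: str, verified: bool = False) -> None:
        # Skip verbatim duplicates so repeated learnings are stored once.
        if any(e.text == text for e in self.entries):
            return
        self.entries.append(MemoryEntry(text, source, verified))


def seed_memory(expert_tips: Iterable[str]) -> AgentMemory:
    """Step 1: initialize memory from expert-curated domain knowledge."""
    memory = AgentMemory()
    for tip in expert_tips:
        memory.add(tip, source="expert_seed", verified=True)
    return memory


def refine_from_trajectory(
    memory: AgentMemory,
    trajectory: List[str],
    extract_learnings: Callable[[List[str]], List[str]],
) -> None:
    """Step 2: add unverified learnings extracted from one training trajectory."""
    for learning in extract_learnings(trajectory):
        memory.add(learning, source="trajectory")


def fact_check(
    memory: AgentMemory, expert_approves: Callable[[str], bool]
) -> AgentMemory:
    """Step 3: keep only entries that are already verified or pass expert review."""
    kept = AgentMemory()
    for entry in memory.entries:
        if entry.verified or expert_approves(entry.text):
            kept.entries.append(MemoryEntry(entry.text, entry.source, True))
    return kept
```

In this sketch, a deployed agent would read only from the memory returned by `fact_check`, so a spurious learning picked up during training never reaches deployment unless a reviewer approves it.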
