AI Summary
This work addresses two limitations of current software engineering agents: they lack effective cross-task memory, which leads to poor experience reuse and repeated errors, and existing memory approaches lack a task-agnostic utility metric. The authors propose a closed-loop memory-augmentation framework that defines memory utility by its validated, measurable impact on downstream task performance. Utility defined this way serves both as a task-agnostic evaluation benchmark and as an optimization signal that requires no manual annotation. Applied to large language model-driven agents, the framework enables dynamic memory selection and reuse in both single-episode and cross-episode settings. Experimental results show absolute gains of up to 5.25% in task success rate and 4.63% in resolve efficiency, together with a reduction in computational cost of at least 9.79%.
Abstract
Large language models (LLMs) have enabled powerful software engineering (SE) agents capable of navigating complex codebases and resolving real-world issues. However, these agents remain fundamentally episodic: they fail to retain, refine, and reuse experiences across tasks, repeatedly reconstructing context from scratch and reproducing similar mistakes. Even memory-equipped agents lack a principled, task-agnostic notion of \textit{memory utility}, which makes their memory difficult to evaluate rigorously or to generalize across agents and settings. To tackle these limitations, we introduce \ours, a closed-loop framework for memory augmentation in SE agents. \ours grounds memory utility in \textit{validated downstream impact}, establishing utility as both a task-agnostic \textbf{evaluation benchmark} and an annotation-free \textbf{optimization signal}. Through complementary evaluation on \textit{single-episode} and \textit{cross-episode} memory augmentation, results demonstrate that \ours consistently improves SE agents across settings, achieving absolute gains of up to $\uparrow5.25\%$ in success rate and $\uparrow4.63\%$ in resolve efficiency, while substantially reducing computational cost by $\geq9.79\%$. Our project page: \href{https://xhguo7.github.io/MemOp/}{https://xhguo7.github.io/MemOp/}.