🤖 AI Summary
This work addresses a key limitation of memory-augmented large language models: they struggle to validate and update their memories under dynamic environmental drift, typically relying on external evaluators or on model introspection, approaches that frequently fail in real-world applications. To overcome this limitation, the authors propose GLOVE (Global Verifier), an unsupervised memory validation and calibration framework that actively probes for inconsistencies between retrieved memories and new observations, eliminating the need for ground-truth labels or strong model introspection. The core innovation is a verification mechanism based on relative truthfulness, which removes the dependence on supervised signals and self-reflection and enables autonomous memory evolution in non-stationary environments. Evaluated on web navigation, planning, and control tasks with environmental drift, GLOVE significantly improves agent success rates, demonstrating robustness and strong generalization.
📝 Abstract
Most existing memory-enhanced Large Language Model (LLM) approaches implicitly assume that memory validity can be established either through external evaluators that provide task-specific success signals or through internal model cognition, such as reflection, when editing memory entries. However, these assumptions often break down in practical environments subject to dynamic drift. We propose the Global Verifier (GLOVE), a framework that introduces a new design dimension for LLM memory systems by establishing a relative notion of truth. Through active probing that detects inconsistencies between retrieved memories and fresh observations, GLOVE realigns memory with the environment, verifying and updating entries without access to ground-truth supervision or strong reliance on model introspection. We evaluate GLOVE on diverse benchmarks spanning web navigation, planning, and control, augmented with controlled environmental drifts that introduce non-stationarity beyond the original benchmark settings. Our results show that GLOVE substantially improves agent success rates, suggesting a robust pathway toward cognitive agents capable of self-evolution.
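To make the probe-verify-update idea concrete, here is a minimal toy sketch of the kind of loop the abstract describes: retrieve a memory, actively probe the environment for a fresh observation, and on mismatch let the newer observation win (a relative notion of truth, with no ground-truth labels). All names here (`MemoryStore`, `probe`, `verify_and_realign`) are illustrative assumptions, not the paper's actual interfaces.

```python
# Illustrative sketch, NOT the authors' implementation: a minimal
# probe-verify-update loop in the spirit of GLOVE. Interfaces and names
# are assumptions for illustration only.

class MemoryStore:
    """Toy key-value memory: maps an observation key to a remembered value."""
    def __init__(self):
        self.entries = {}

    def retrieve(self, key):
        return self.entries.get(key)

    def update(self, key, value):
        self.entries[key] = value


def probe(environment, key):
    """Actively query the environment for a fresh observation of `key`."""
    return environment(key)


def verify_and_realign(memory, environment, key):
    """Compare a retrieved memory against a fresh probe; on mismatch,
    overwrite the stale entry (relative truth: the newer observation wins,
    no ground-truth supervision needed).
    Returns True if the memory was consistent, False if it was realigned."""
    remembered = memory.retrieve(key)
    observed = probe(environment, key)
    if remembered == observed:
        return True
    memory.update(key, observed)  # realign memory with the environment
    return False


# Simulate environmental drift: the environment's answer has changed
# since the memory entry was written.
memory = MemoryStore()
memory.update("login_button", "top-right corner")  # stale memory
env = lambda key: "bottom banner"                  # drifted environment

consistent = verify_and_realign(memory, env, "login_button")
print(consistent)                        # False: inconsistency detected
print(memory.retrieve("login_button"))   # memory now matches the environment
```

A second verification pass on the same key would now return `True`, since the memory has been realigned; in a real agent this loop would run over retrieved entries as the agent interacts with a drifting environment.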