🤖 AI Summary
Existing interactive text-to-SQL agents struggle to reuse historical experience adaptively across multi-turn interactions: their memory mechanisms either rely on static similarity heuristics or learn from sparse final outcomes and retrieve at a single decision horizon. This work proposes MERIT, a dynamic multi-horizon memory retrieval framework with two levels of long-term memory: episode-level memory provides global strategic guidance, while turn-level memory supports local, state-conditioned decisions. Both levels use retrieval policies optimized with reinforcement learning, and a lightweight Process Reward Model supplies dense proxy rewards for turn-level memory selection. On BIRD-Interact, MERIT achieves a higher success rate than no-memory, static-retrieval, and dynamic-retrieval baselines while reducing average interaction turns, and it shows positive cross-benchmark transfer on Spider2-Snow without benchmark-specific tuning.
📝 Abstract
Interactive text-to-SQL agents solve database tasks through multi-turn interactions involving schema exploration, query execution, feedback interpretation, and decision revision. Long-term memory helps agents reuse past experiences, but existing retrieval methods remain limited. Static methods rely on fixed similarity heuristics that do not optimize downstream utility, while dynamic methods often learn from sparse final outcomes and retrieve memories at a single decision horizon. This is insufficient when memory usefulness changes across interaction stages, since memories useful for initial planning may differ from those needed for local, state-conditioned execution. We propose MERIT, a dynamic multi-horizon memory retrieval framework. MERIT maintains episode-level memory for global strategic guidance and turn-level memory for local decision support. Both levels use learned retrieval policies optimized with reinforcement learning. To train turn-level retrieval despite limited intermediate supervision, MERIT uses a lightweight Process Reward Model to provide dense proxy rewards for local memory selection. Experiments on BIRD-Interact show that MERIT outperforms no-memory, static-retrieval, and dynamic-retrieval baselines in success rate while reducing average interaction turns. Transfer results on Spider2-Snow further show positive cross-benchmark transfer without benchmark-specific tuning. These results suggest that multi-horizon retrieval improves experience reuse in interactive text-to-SQL agents.
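The two-horizon idea described above can be illustrated with a toy sketch. This is not MERIT's implementation: the `Memory` and `RetrievalPolicy` classes, the token-overlap feature, the single learnable weight, and the `proxy_reward` stand-in for the Process Reward Model are all assumptions made for illustration. It only shows the general shape: one policy retrieves once per episode and is updated from the final outcome, while a second retrieves per turn and is updated from a dense proxy reward.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Memory:
    key: str      # text describing the situation the experience came from
    content: str  # the stored advice or trajectory summary


def overlap(a: str, b: str) -> float:
    """Jaccard token overlap; a crude stand-in for a learned encoder."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class RetrievalPolicy:
    """Softmax policy over candidate memories with one learnable weight."""
    weight: float = 1.0
    lr: float = 0.5

    def probs(self, query, memories):
        logits = [self.weight * overlap(query, m.key) for m in memories]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, query, memories, rng):
        p = self.probs(query, memories)
        return rng.choices(range(len(memories)), weights=p, k=1)[0]

    def update(self, query, memories, idx, reward):
        """REINFORCE step: weight += lr * reward * d log pi(idx) / d weight."""
        p = self.probs(query, memories)
        feats = [overlap(query, m.key) for m in memories]
        grad = feats[idx] - sum(pi * f for pi, f in zip(p, feats))
        self.weight += self.lr * reward * grad


def proxy_reward(state: str, memory: Memory) -> float:
    """Hypothetical stand-in for a process reward model (assumption)."""
    return overlap(state, memory.key)


episode_memories = [
    Memory("join orders customers", "Inspect foreign keys before writing joins."),
    Memory("date filter", "Check the date column format first."),
]
turn_memories = [
    Memory("syntax error near group by", "List non-aggregated columns in GROUP BY."),
    Memory("empty result set", "Relax the WHERE clause and re-run."),
]

rng = random.Random(0)
episode_policy = RetrievalPolicy()  # global horizon: queried once per task
turn_policy = RetrievalPolicy()     # local horizon: queried at every turn

task = "join orders with customers"
g = episode_policy.sample(task, episode_memories, rng)

state = "error near group by"
t = turn_policy.sample(state, turn_memories, rng)
# Dense update: the turn-level policy learns from a per-turn proxy signal.
turn_policy.update(state, turn_memories, t, proxy_reward(state, turn_memories[t]))

# Sparse update: the episode-level policy learns from the final outcome
# (1.0 here pretends the task succeeded).
episode_policy.update(task, episode_memories, g, reward=1.0)
```

Keeping two separate policies lets each horizon see a different query (the task description versus the current interaction state) and a different reward signal. The sketch assumes each memory pool is non-empty; a real system would also need a learned encoder, baselines or other variance reduction, and an actual reward model.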