🤖 AI Summary
Interactive agents trained only on task reward can reach high scores without explicitly representing why their actions succeed, which makes their behavior brittle, hard to diagnose, and slow to adapt when environment dynamics change. This work proposes the Explicit Symbolic Behavioral Model (ESBM), which represents behavior through typed predicates, weighted rules, bounded options, and a mechanism memory. Rather than reserving questions and world-understanding tests for post-training evaluation, it brings adaptive questioning and active world-model probing into the training loop: after each rollout, score failures, QA errors, and transition-prediction errors are converted into constraints for local edits to the model. Candidate models are then selected by a multi-criterion rule that jointly evaluates task score, answerability, and active world-model consistency. Under the tested Atari-style protocols, ESBM learns high-scoring policies while producing explicit answers and executable mechanism predictions, suggesting that adaptive questions can serve both as training pressure and as a reusable benchmark for mechanistic understanding.
📝 Abstract
Interactive agents trained only against task return can achieve high scores while failing to represent the mechanisms that make their actions succeed. This makes brittle behavior difficult to diagnose and limits adaptation when environment dynamics change. Existing LLM reflection and policy-code repair can revise behavior from failed trajectories, but questions and world-understanding tests are usually used only after training. We introduce an Explicit Symbolic Behavioral Model (ESBM), a trainable behavioral model that couples task performance with evidence-grounded question answering and executable mechanism prediction. An ESBM represents behavior through typed predicates, weighted rules, bounded options and mechanism memory; the mechanism layer predicts symbolic events, object changes, rewards and terminal consequences under action interventions. After each rollout, adaptive questions and active world-model probes convert score failures, QA errors and transition-prediction errors into constraints for local ESBM edits. Candidate models are selected by a multi-criterion rule that jointly evaluates task score, answerability and active world-model consistency. Under the tested Atari-style protocols, ESBM learns high-scoring policies while producing explicit answers and executable mechanism predictions, indicating that adaptive questions can serve as both training pressure and reusable benchmarks for mechanistic policy learning in this setting.
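To make two ideas from the abstract concrete, weighted rules over predicates and multi-criterion candidate selection, here is a minimal Python sketch. It is illustrative only and not the authors' implementation. The abstract does not specify the rule format or the selection rule, so the names `Rule`, `choose_action`, `Candidate`, `dominates`, and `select` are hypothetical. The weight-summing vote, the Pareto filter, and the max-min tie-break are likewise assumptions chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical weighted rule: if all conditions hold, vote for an action."""
    conditions: frozenset  # names of predicates that must be true
    action: str
    weight: float


def choose_action(rules, facts, default="NOOP"):
    """Sum the weights of all firing rules per action and return the top action."""
    votes = {}
    for rule in rules:
        if rule.conditions <= facts:  # every condition is among the true facts
            votes[rule.action] = votes.get(rule.action, 0.0) + rule.weight
    if not votes:
        return default
    return max(votes, key=votes.get)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-candidate scores, each assumed normalized to [0, 1]."""
    name: str
    task_score: float     # normalized task return
    answerability: float  # fraction of adaptive questions answered correctly
    consistency: float    # fraction of world-model probes predicted correctly


def _criteria(c):
    return (c.task_score, c.answerability, c.consistency)


def dominates(a, b):
    """True if a is at least as good as b on every criterion and differs somewhere."""
    av, bv = _criteria(a), _criteria(b)
    return all(x >= y for x, y in zip(av, bv)) and av != bv


def select(candidates):
    """Assumed rule: drop dominated candidates, then maximize the weakest criterion."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return max(front, key=lambda c: min(_criteria(c)))


# Usage with made-up predicates and made-up scores.
rules = [
    Rule(frozenset({"ball_left_of_paddle"}), "LEFT", 1.0),
    Rule(frozenset({"ball_left_of_paddle", "ball_falling"}), "LEFT", 0.5),
    Rule(frozenset({"ball_falling"}), "RIGHT", 1.2),
]
facts = {"ball_left_of_paddle", "ball_falling"}
action = choose_action(rules, facts)

candidates = [
    Candidate("score_only", 0.9, 0.4, 0.5),
    Candidate("balanced", 0.8, 0.7, 0.75),
    Candidate("dominated", 0.7, 0.6, 0.7),
]
best = select(candidates)
```

In this toy run, the two `LEFT` rules together outvote the single `RIGHT` rule, and `select` prefers the candidate whose weakest criterion is strongest over the one with the highest raw task score. That is one way a joint rule could keep a high-scoring but poorly explained model from winning. A weighted sum or per-criterion thresholds would be equally plausible readings of the abstract.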