AI Summary
This study investigates how exploration and exploitation shape the geometric structure of internal predictive representations in intelligent agents. Building upon a predictive coding framework, we develop an online reinforcement learning agent that navigates a tree-structured maze, employing an information-gain-driven exploration policy to modulate experience collection and dynamically update its predictive models of state transitions and rewards. Through geometric analysis of learned representations, we find that exploratory behavior induces more structured and orderly latent representations that faithfully preserve the underlying environmental topology, whereas excessive exploitation leads to disorganized representations. Strikingly, this pattern is consistently observed in both artificial agents and real mice, highlighting the critical role of exploration in constructing generalizable internal models of the environment.
Abstract
Active sensing links behavior and learning through an action-perception loop: actions determine the observations used to update internal predictive models of perception, which subsequently guide the next actions. Predictive-coding frameworks provide a natural way to model this process, since internal representations are continuously updated to predict future observations. Here, we ask how exploratory and exploitative behavioral strategies shape these internal predictive representations. We build an online learning agent in a tree-like maze with a controllable parameter regulating the balance between exploratory and exploitative regimes. The agent updates a predictive-coding-based perception model from experience generated by its own behavior. The model predicts both future maze states and reward probability, allowing the agent to select actions either by expected information gain during exploration or by predicted reward during exploitation. We show that the resulting internal predictive representations depend strongly on the agent's behavioral regime. Exploratory agents develop representations that are more spatially organized and better preserve the structure of maze transitions in latent space. In contrast, exploitative agents learn less organized representations. We then train this predictive model on natural trajectories of water-deprived mice navigating the same maze and compare the resulting representations with those learned from agent trajectories. More exploratory mice show representational geometries that closely match those of exploratory agents, whereas mice with more restricted visitation patterns resemble reward-driven, exploitative agents. Together, these findings suggest that exploration enables predictive models to form generalized internal representations by organizing latent space around both spatial location and transition context in artificial agents and animals.
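The action-selection rule described in the abstract can be illustrated with a minimal sketch. It assumes the controllable parameter acts as a per-step probability of using the exploratory criterion; the abstract does not state how the balance is implemented, so the name `explore_weight`, the function `select_action`, and the example scores are all hypothetical.

```python
import random

def select_action(info_gain, reward_prob, explore_weight, rng):
    """Return the index of the chosen action.

    info_gain, reward_prob: per-action scores that the predictive model
        is assumed to supply (expected information gain, predicted reward).
    explore_weight: hypothetical parameter in [0, 1], the probability of
        using the information-gain criterion on this step.
    """
    # Assumption: a coin flip per step decides which criterion is used.
    if rng.random() < explore_weight:
        scores = info_gain      # exploratory regime
    else:
        scores = reward_prob    # exploitative regime
    # Greedy choice under the selected criterion.
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
gain = [0.1, 0.7, 0.2]       # illustrative values only
reward = [0.9, 0.05, 0.05]   # illustrative values only
explore_choice = select_action(gain, reward, 1.0, rng)   # purely exploratory
exploit_choice = select_action(gain, reward, 0.0, rng)   # purely exploitative
```

With `explore_weight` at 1.0 the sketch always follows information gain, and at 0.0 it always follows predicted reward; intermediate values mix the two regimes.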