Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets

📅 2025-03-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses offline representation learning in Exogenous Block MDPs (Ex-BMDPs), where high-dimensional observations contain temporally correlated exogenous noise and no action labels are available. We identify a setting in which action-free representation learning becomes tractable: video data without action annotations, collected from multiple agents that follow different policies. Our method, CRAFT (Comparison-based Representations from Action-Free Trajectories), compares trajectories across these policies. Controllable features evolve differently depending on the policy, while exogenous noise evolves the same way regardless of which agent is acting, so the comparison separates the two. CRAFT is designed to be sample-efficient; we provide theoretical guarantees for its performance and demonstrate its feasibility on a toy example. This addresses theoretical limitations that prior work identified for action-free learning in Ex-BMDPs and offers a foundation for practical, video-driven representation learning for control.
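As background for the setting described above, the Ex-BMDP model is commonly written as below. The symbols are assumed for illustration (the generic formulation used in the Ex-BMDP literature, not necessarily this paper's notation): the latent state splits into a controllable part $s_t$ and an exogenous part $e_t$, only $s_t$ responds to the action $a_t$, and every observation $x_t$ is emitted from both parts. The last line assumes, for simplicity, a policy $\pi$ that depends only on $s_t$.

$$
\begin{aligned}
x_t &\sim q(\cdot \mid s_t, e_t), \\
s_{t+1} &\sim T(\cdot \mid s_t, a_t), \qquad e_{t+1} \sim T_e(\cdot \mid e_t), \\
P^{\pi}(s_{t+1} \mid s_t) &= \textstyle\sum_{a} \pi(a \mid s_t)\, T(s_{t+1} \mid s_t, a).
\end{aligned}
$$

The block assumption says each observation $x_t$ determines $(s_t, e_t)$ uniquely. Once actions are hidden, the only trace of the controllable dynamics in the data is the policy-averaged kernel $P^{\pi}$, which changes from one agent to another, whereas $T_e$ is identical for every agent. That asymmetry is what comparing datasets from differing policies can exploit.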

📝 Abstract

While sequential decision-making environments often involve high-dimensional observations, not all features of these observations are relevant for control. In particular, the observation space may capture factors of the environment which are not controllable by the agent, but which add complexity to the observation space. The need to ignore these "noise" features in order to operate in a tractably small state space poses a challenge for efficient policy learning. Due to the abundance of video data available in many such environments, task-independent representation learning from action-free offline data offers an attractive solution. However, recent work has highlighted theoretical limitations in action-free learning under the Exogenous Block MDP (Ex-BMDP) model, where temporally correlated noise features are present in the observations. To address these limitations, we identify a realistic setting where representation learning in Ex-BMDPs becomes tractable: when action-free video data from multiple agents with differing policies are available. Concretely, this paper introduces CRAFT (Comparison-based Representations from Action-Free Trajectories), a sample-efficient algorithm leveraging differences in controllable feature dynamics across agents to learn representations. We provide theoretical guarantees for CRAFT's performance and demonstrate its feasibility on a toy example, offering a foundation for practical methods in similar settings.
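The core intuition in the abstract, that controllable features change their dynamics when the policy changes while exogenous noise does not, can be illustrated with a minimal toy sketch. Everything here is a hypothetical illustration, not CRAFT itself: the 5-value features, the lazy random-walk noise, the policies' right-move probabilities, and the helper names `rollout` and `tv_gap` are all assumptions. It also assumes the observation is already split into separate coordinates, which a real Ex-BMDP does not provide.

```python
import random
from collections import Counter, defaultdict

N = 5  # hypothetical number of values each toy feature can take


def step_controllable(pos: int, action: int) -> int:
    """Action-driven dynamics: move one cell left (-1) or right (+1), clipped to [0, N-1]."""
    return min(max(pos + action, 0), N - 1)


def step_exogenous(z: int, rng: random.Random) -> int:
    """Policy-independent, temporally correlated noise: a lazy random walk clipped to [0, N-1]."""
    return min(max(z + rng.choice([-1, 0, 0, 1]), 0), N - 1)


def rollout(p_right: float, steps: int, seed: int) -> list[tuple[int, int]]:
    """Record one agent's action-free trajectory of observations (pos, z).

    p_right is the (hypothetical) probability that this agent's policy moves right.
    Actions are sampled internally but never stored, mimicking action-free video.
    """
    rng = random.Random(seed)
    pos, z = 0, 0
    obs = [(pos, z)]
    for _ in range(steps):
        action = 1 if rng.random() < p_right else -1
        pos = step_controllable(pos, action)
        z = step_exogenous(z, rng)
        obs.append((pos, z))
    return obs


def transition_counts(obs, feature: int):
    """Count next-value occurrences for one observation coordinate, keyed by its current value."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(obs, obs[1:]):
        counts[prev[feature]][nxt[feature]] += 1
    return counts


def tv_gap(obs_a, obs_b, feature: int) -> float:
    """Largest total-variation distance, over current values seen in both datasets,
    between the two datasets' empirical next-value distributions for one feature."""
    ta, tb = transition_counts(obs_a, feature), transition_counts(obs_b, feature)
    gap = 0.0
    for v in set(ta) & set(tb):
        na, nb = sum(ta[v].values()), sum(tb[v].values())
        support = set(ta[v]) | set(tb[v])
        tv = 0.5 * sum(abs(ta[v][w] / na - tb[v][w] / nb) for w in support)
        gap = max(gap, tv)
    return gap


if __name__ == "__main__":
    # Two agents with different (hypothetical) policies; neither dataset contains actions.
    a = rollout(p_right=0.7, steps=20_000, seed=0)
    b = rollout(p_right=0.3, steps=20_000, seed=1)
    for name, feature in (("position", 0), ("noise", 1)):
        print(name, round(tv_gap(a, b, feature), 3))
```

Running it should print a large gap for the position coordinate and a gap near zero for the noise coordinate: the noise follows the same dynamics under both agents, while the position's dynamics shift with the policy. A real Ex-BMDP observation mixes both kinds of factors inside high-dimensional pixels, which is why a method in this setting has to learn a representation rather than compare raw coordinates; the per-coordinate test above only works because the toy observation is already factored.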

Problem

Learning representations from action-free offline data in Ex-BMDPs

Addressing temporally correlated noise features in observations

Leveraging diverse agent policies to identify controllable feature dynamics

Innovation

Leverages diverse datasets from multiple agents

Uses comparison-based representation learning

Focuses on controllable feature dynamics

🔎 Similar Papers

Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

2024-02-22 | arXiv.org | Citations: 2
