Data Augmentation for Instruction Following Policies via Trajectory Segmentation

📅 2025-02-25
🤖 AI Summary
Instruction-following agents generalize poorly when annotated trajectory data is scarce. To address this, the paper proposes Play Segmentation (PS), a probabilistic model that automatically extracts instruction-labelled segments from large unlabeled datasets of gameplay or simulated play trajectories. PS segments long trajectories without assuming that segments have similar lengths. It is trained only on individual, short instruction segments from a small annotated dataset. The method works in a semi-supervised setting. The extracted segments are added to the annotated data, and the combined dataset is used to train an instruction-following policy by imitation learning. Experiments were run in a game environment and in a simulated robotic gripper setting. Adding PS segments raises policy performance to the level of a policy trained on twice as much human-annotated data. Adding randomly sampled segments instead degrades performance. These results support PS as a practical way to scale instruction grounding with little supervision.

📝 Abstract
The scalability of instructable agents in robotics or gaming is often hindered by limited data that pairs instructions with agent trajectories. However, large datasets of unannotated trajectories containing sequences of various agent behaviour (play trajectories) are often available. In a semi-supervised setup, we explore methods to extract labelled segments from play trajectories. The goal is to augment a small annotated dataset of instruction-trajectory pairs to improve the performance of an instruction-following policy trained downstream via imitation learning. Recent video segmentation methods can extract labelled segments effectively, but only when segment lengths vary little. To remove this constraint on segment length, we propose Play Segmentation (PS), a probabilistic model that finds maximum-likelihood segmentations of extended subsegments while being trained only on individual instruction segments. Our results in a game environment and a simulated robotic gripper setting underscore the importance of segmentation. Randomly sampled segments diminish performance. By contrast, incorporating labelled segments from PS improves policy performance to the level of a policy trained on twice the amount of labelled data.
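The abstract does not spell out how PS searches over segmentations. A standard way to find a maximum-likelihood segmentation into variable-length segments is a semi-Markov (Viterbi-style) dynamic program. The Python sketch below shows that generic recursion under stated assumptions. `score(start, end, label)` is a hypothetical stand-in for a learned model: it returns the log-likelihood that steps `start` to `end` form one segment carrying instruction `label`. The names `max_likelihood_segmentation`, `min_len`, `max_len` and `toy_score` are illustrative only. This is not the authors' implementation.

```python
from __future__ import annotations

import math
from typing import Callable, Optional, Sequence

# Hypothetical scorer signature (assumption, not the paper's API): log-likelihood
# that steps [start, end) form one segment carrying the given instruction label.
SegmentScorer = Callable[[int, int, str], float]


def max_likelihood_segmentation(
    num_steps: int,
    labels: Sequence[str],
    score: SegmentScorer,
    min_len: int = 1,
    max_len: Optional[int] = None,
) -> tuple[float, list[tuple[int, int, str]]]:
    """Split steps [0, num_steps) into contiguous labelled segments that
    maximise the summed segment log-likelihood (semi-Markov Viterbi)."""
    if max_len is None:
        max_len = num_steps
    # best[t]: highest log-likelihood of any segmentation of the prefix [0, t)
    best = [-math.inf] * (num_steps + 1)
    back: list[Optional[tuple[int, str]]] = [None] * (num_steps + 1)
    best[0] = 0.0
    for end in range(1, num_steps + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == -math.inf:
                continue  # prefix [0, start) cannot be segmented
            for label in labels:
                candidate = best[start] + score(start, end, label)
                if candidate > best[end]:
                    best[end] = candidate
                    back[end] = (start, label)
    if best[num_steps] == -math.inf:
        raise ValueError("no segmentation satisfies the length constraints")
    # Follow back-pointers from the end of the trajectory to recover segments.
    segments = []
    end = num_steps
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return best[num_steps], segments


# Toy usage: actions "L"/"R"; each instruction predicts its own action with p=0.9.
actions = "LLLRR"


def toy_score(start: int, end: int, label: str) -> float:
    wanted = "L" if label == "go left" else "R"
    step_ll = sum(math.log(0.9 if a == wanted else 0.1) for a in actions[start:end])
    return step_ll + math.log(0.5)  # per-segment penalty against over-splitting


total, segs = max_likelihood_segmentation(len(actions), ["go left", "go right"], toy_score)
print(segs)  # [(0, 3, 'go left'), (3, 5, 'go right')]
```

The recursion makes O(T · L · K) score calls for T steps, maximum segment length L and K candidate instructions. No fixed segment length is assumed. The per-segment log(0.5) term in the toy scorer penalizes segmentations that split a trajectory into many tiny pieces.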
Problem

Scarce data pairing instructions with agent trajectories.
Extract labelled segments from unannotated play trajectories.
Improve an instruction-following policy with the augmented dataset.
Innovation

Semi-supervised trajectory segmentation for data augmentation
Play Segmentation model for probabilistic subsegment extraction
Improved policy performance via imitation learning with augmented data
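The items above describe using extracted segments to augment the annotated dataset before imitation learning. The following is a minimal sketch of that augmentation step under two assumptions. First, each training example is an `(instruction, trajectory_slice)` pair. Second, `segmenter` is any callable that returns `(start, end, instruction)` triples, for instance a wrapper around a PS-style model. `augment_dataset` and `toy_segmenter` are hypothetical names. The toy segmenter simply cuts wherever the action changes and is not Play Segmentation.

```python
from __future__ import annotations

from typing import Callable, Sequence


def augment_dataset(
    annotated: list[tuple[str, list]],
    play_trajectories: Sequence[Sequence],
    segmenter: Callable[[Sequence], list[tuple[int, int, str]]],
) -> list[tuple[str, list]]:
    """Return the annotated pairs plus labelled segments extracted from play data.

    `segmenter` is a hypothetical callable returning (start, end, instruction)
    triples for one play trajectory.
    """
    augmented = list(annotated)  # copy so the caller's dataset is not mutated
    for traj in play_trajectories:
        for start, end, instruction in segmenter(traj):
            augmented.append((instruction, list(traj[start:end])))
    return augmented


def toy_segmenter(traj: Sequence) -> list[tuple[int, int, str]]:
    # Hypothetical stand-in: cut wherever the action changes and
    # name each segment after the repeated action.
    segments, start = [], 0
    for t in range(1, len(traj) + 1):
        if t == len(traj) or traj[t] != traj[start]:
            segments.append((start, t, f"repeat {traj[start]}"))
            start = t
    return segments


annotated = [("repeat L", ["L", "L"])]
data = augment_dataset(annotated, [["L", "L", "R", "R"]], toy_segmenter)
print(len(data))  # 3
```

The returned list would then serve as the training set for the imitation-learning policy. Any weighting between human-annotated and extracted examples is a further design choice that this sketch does not cover.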