🤖 AI Summary
Existing vision-language task planning methods suffer from degraded reasoning quality on dynamic, long-horizon tasks, primarily because extended reasoning processes are insufficiently modeled and trained. To address this, we propose a Structured Preference Optimization (SPO) framework that jointly scores candidate plans along three dimensions: task relevance, visual grounding, and historical consistency. We further introduce a curriculum-based long-horizon training paradigm that progressively strengthens temporal reasoning. As key contributions, we design the first structured preference evaluation mechanism and establish ExtendaBench, the first benchmark spanning four levels of temporal granularity (ultra-short, short, medium, and long; 1,509 tasks). Experiments demonstrate substantial improvements: on VirtualHome, Goal Completion Rate (GCR) and Success Rate (SR) increase by 5.98% and 4.68%, respectively; on Habitat 2.0, gains reach 3.30% (GCR) and 2.11% (SR), consistently outperforming state-of-the-art methods.
📝 Abstract
Existing methods for vision-language task planning excel in short-horizon tasks but often fall short in complex, long-horizon planning within dynamic environments. These challenges primarily arise from the difficulty of effectively training models to produce high-quality reasoning processes for long-horizon tasks. To address this, we propose Structured Preference Optimization (SPO), which aims to enhance reasoning and action selection in long-horizon task planning through structured preference evaluation and optimized training strategies. Specifically, SPO introduces: 1) Preference-Based Scoring and Optimization, which systematically evaluates reasoning chains based on task relevance, visual grounding, and historical consistency; and 2) Curriculum-Guided Training, where the model progressively adapts from simple to complex tasks, improving its generalization ability in long-horizon scenarios and enhancing reasoning robustness. To advance research in vision-language long-horizon task planning, we introduce ExtendaBench, a comprehensive benchmark covering 1,509 tasks across VirtualHome and Habitat 2.0, categorized into ultra-short, short, medium, and long tasks. Experimental results demonstrate that SPO significantly improves reasoning quality and final decision accuracy, outperforming prior methods on long-horizon tasks and underscoring the effectiveness of preference-driven optimization in vision-language task planning. Specifically, SPO achieves a +5.98% goal completion rate (GCR) and +4.68% success rate (SR) improvement on VirtualHome, and a +3.30% GCR and +2.11% SR improvement on Habitat 2.0, over the best-performing baselines.