Ask When It Pays: Cost-Aware Open-Ended Interaction for Instance Goal Navigation

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in instance goal navigation: linguistic descriptions of the target are ambiguous, and existing interactive methods do not distinguish query types by their information gain and interaction cost. The task is formulated as a cost-sensitive uncertainty-reduction problem in which the agent queries only when the expected information gain exceeds the querying cost. The authors propose a question-type taxonomy grounded in information gain, derive data-driven cost weights, and establish the first benchmark for evaluating interaction efficiency, along with a novel Weighted Success Rate metric. Experimental results demonstrate that combining multimodal large language models and zero-shot reasoning with this cost-sensitive strategy significantly improves navigation success rates while reducing query costs.
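The "ask only when the gain exceeds the cost" rule can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the agent keeps a belief over candidate instances, that each question type splits the candidates into groups the oracle's answer can tell apart, and that gain and cost are compared by simple subtraction. The names `expected_gain`, `choose_question`, `"attribute"`, and `"route"` and the cost values are hypothetical.

```python
import math
from typing import Dict, List, Optional


def entropy(probs: List[float]) -> float:
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_gain(prior: List[float], partition: List[List[int]]) -> float:
    """Expected entropy reduction if the answer reveals which group of
    candidate indices contains the target (assumed oracle model)."""
    h_post = 0.0
    for group in partition:
        mass = sum(prior[i] for i in group)
        if mass > 0:
            h_post += mass * entropy([prior[i] / mass for i in group])
    return entropy(prior) - h_post


def choose_question(
    prior: List[float],
    questions: Dict[str, List[List[int]]],
    costs: Dict[str, float],
) -> Optional[str]:
    """Return the question type with the largest positive (gain - cost),
    or None when no question is worth asking."""
    best, best_net = None, 0.0
    for name, partition in questions.items():
        net = expected_gain(prior, partition) - costs[name]
        if net > best_net:
            best, best_net = name, net
    return best


# Hypothetical setup: four look-alike candidates, two question types.
questions = {
    "attribute": [[0, 1], [2, 3]],     # halves the candidate set
    "route": [[0], [1], [2], [3]],     # pins down the target directly
}
costs = {"attribute": 0.3, "route": 1.5}  # illustrative, not data-derived

print(choose_question([0.25] * 4, questions, costs))
print(choose_question([1.0, 0.0, 0.0, 0.0], questions, costs))
```

With a uniform belief, the cheap attribute question wins even though the route question carries more information. Once the belief is already certain, no question is asked.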

📝 Abstract

Instance Goal Navigation (IGN) requires an embodied agent to find a specific object instance among distractors from an underspecified natural-language description. Such ambiguity often cannot be resolved from perception and language alone, making interaction with an oracle a natural mechanism for disambiguation. Prior interactive methods allow oracle queries but treat lightweight clarification and route-level guidance alike, letting agents boost success rate through repeated high-information questions rather than by resolving the underlying ambiguity efficiently. We recast interactive IGN as a cost-sensitive uncertainty-reduction problem, where the agent should ask the question whose answer provides the largest reduction in navigation uncertainty relative to its penalty. To this end, we apply an information-gain analysis to existing navigation corpora to identify which cues reduce navigation uncertainty, yielding a compact set of question types and data-derived weights. However, existing interactive navigation benchmarks do not model the cost of different question types or evaluate how efficiently agents use interaction, making them unsuitable for studying cost-sensitive interaction. Based on this taxonomy, we construct a benchmark for diagnosing interaction behavior and efficiency, together with a Weighted Success Rate metric that penalizes each query by its derived cost. We further propose a zero-shot MLLM navigator that selectively queries at each decision step only when the expected uncertainty reduction justifies the interaction cost.
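The abstract does not give the formula for the Weighted Success Rate, so the sketch below shows only one plausible form, assumed for illustration: each successful episode scores 1 minus the summed cost of its queries, clipped at zero, and failed episodes score zero. The function name, the question-type labels, and the cost values are all hypothetical.

```python
from typing import Dict, List, Tuple

# An episode is (succeeded, list of question types the agent asked).
Episode = Tuple[bool, List[str]]


def weighted_success_rate(episodes: List[Episode], costs: Dict[str, float]) -> float:
    """Assumed form of a cost-penalized success rate: mean over episodes of
    success * max(0, 1 - total query cost). Not the paper's definition."""
    if not episodes:
        return 0.0
    total = 0.0
    for success, asked in episodes:
        if success:
            penalty = sum(costs[q] for q in asked)
            total += max(0.0, 1.0 - penalty)
    return total / len(episodes)


# Hypothetical per-question costs and a small batch of episodes.
costs = {"attribute": 0.1, "route": 0.6}
episodes = [
    (True, []),                  # success without asking: full credit
    (True, ["attribute"]),       # success after one cheap question
    (True, ["route", "route"]),  # success bought with costly guidance
    (False, ["attribute"]),      # failure: no credit
]
print(weighted_success_rate(episodes, costs))
```

Under this assumed form, an agent that succeeds only by repeatedly asking for route-level guidance earns little or no credit, which is the behavior the abstract says the metric is meant to penalize.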

Problem

Research questions and friction points this paper is trying to address.

Instance Goal Navigation

cost-aware interaction

uncertainty reduction

interactive navigation

question efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

cost-sensitive interaction

instance goal navigation

information gain