🤖 AI Summary
Existing embodied AI methods for object search rely on dated perception models, neglect temporal aggregation of observations, and transfer directly from ground-truth perception during training to noisy perception at test time, which leaves them overconfident in the perceived state. This paper introduces a plug-and-play approach that uses calibrated semantic perception probabilities and their uncertainty in both temporal aggregation and the "found" decision, i.e., the decision that the target object has been found. The resulting methods integrate directly with pretrained models from a wide family of existing search approaches, are compatible with diverse semantic perception models and navigation policies, and require no additional training. Extensive evaluations across perception models and policies confirm the importance of calibrated uncertainties for both aggregation and found decisions. Code and trained models are publicly available.
📝 Abstract
Embodied AI has made significant progress in acting in unexplored environments. However, work on tasks such as object search has largely focused on efficient policy learning. In this work, we identify several gaps in current search methods: they largely rely on dated perception models, neglect temporal aggregation, and transfer directly from ground truth to noisy perception at test time, without accounting for the resulting overconfidence in the perceived state. We address these problems through calibrated perception probabilities and by accounting for uncertainty in both the aggregation and the found decisions (deciding that the target has been found), thereby adapting the models to sequential tasks. The resulting methods can be integrated directly with pretrained models across a wide family of existing search approaches at no additional training cost. We perform extensive evaluations of aggregation methods across different semantic perception models and policies, confirming the importance of calibrated uncertainties in both the aggregation and the found decisions. We make the code and trained models available at https://semantic-search.cs.uni-freiburg.de.
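The abstract does not spell out the calibration, aggregation, or found-decision procedures. Purely as an illustration of the general idea, the following Python sketch assumes temperature scaling as the calibration method, a per-map-cell running mean of calibrated class probabilities as the temporal aggregation, and a simple threshold rule for the found decision. All names (`fit_temperature`, `CellBelief`, `found_decision`) and the thresholds `p_min` and `min_obs` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_temperature(logits: np.ndarray, labels: np.ndarray,
                    grid: np.ndarray = np.linspace(0.5, 10.0, 96)) -> float:
    """Assumed calibration step: grid-search the temperature that minimizes
    negative log-likelihood on held-out (logits, labels) pairs."""
    best_t, best_nll = 1.0, np.inf
    idx = np.arange(len(labels))
    for t in grid:
        p = softmax(logits, t)
        nll = -np.mean(np.log(p[idx, labels] + 1e-12))
        if nll < best_nll:
            best_t, best_nll = float(t), nll
    return best_t


class CellBelief:
    """Hypothetical per-cell aggregate: running mean of calibrated class probabilities."""

    def __init__(self, num_classes: int):
        self.prob_sum = np.zeros(num_classes)
        self.count = 0

    def update(self, probs: np.ndarray) -> None:
        # Accumulate one frame's calibrated probabilities for this cell.
        self.prob_sum += probs
        self.count += 1

    def mean(self) -> np.ndarray:
        return self.prob_sum / max(self.count, 1)


def found_decision(belief: CellBelief, target: int,
                   p_min: float = 0.8, min_obs: int = 3) -> bool:
    """Hypothetical rule: declare 'found' only after at least min_obs
    observations and an aggregated target probability of at least p_min."""
    return belief.count >= min_obs and bool(belief.mean()[target] >= p_min)


if __name__ == "__main__":
    # Overconfident toy detector: large logits, but only 7 of 10 predictions correct.
    preds = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
    labels = np.array([0, 1, 2, 0, 1, 2, 0, 2, 0, 1])
    logits = 10.0 * np.eye(3)[preds]
    T = fit_temperature(logits, labels)
    print("fitted temperature:", T)

    # One confident detection of class 1 is diluted by two uninformative frames.
    cell = CellBelief(num_classes=3)
    for frame_logits in (np.array([0.0, 8.0, 0.0]), np.zeros(3), np.zeros(3)):
        cell.update(softmax(frame_logits, T))
    print("found:", found_decision(cell, target=1))
```

A fitted temperature above 1 softens the overconfident toy detector, so a single spurious high-confidence frame no longer suffices to trigger a found decision. Averaging is only one possible aggregation rule; the abstract reports comparisons of several aggregation methods without naming them, so this choice should be read as illustrative rather than as the paper's method.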