Belief Consistency Between Foundation-Model Evidence and Geometric Perception in Persistent Robotic Maps

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two gaps in persistent mapping: semantic outputs from foundation models are fused without reliability calibration, and nothing detects when they conflict with geometric perception. The paper proposes an update operator built around a conflict-aware belief consistency mechanism. It combines a per-class calibrated commit gate with a per-event conflict-drop window, so that foundation-model claims contradicted by the geometric channel at the moment of the claim are not committed to the map. The method is evaluated on KITTI-360 and ScanNet with both an oracle geometric channel (panoptic ground truth) and an off-the-shelf segmenter (Mask2Former). On KITTI-360 it reaches 99.7% commit precision for the car class and a mean per-class IoU of 0.522, against 43.9% and 0.180 for a calibration-only operator, and it retains more compositional true positives at higher precision than a monolithic compositional vision-language model (VLM) prompt.
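
The two headline metrics can be made concrete with a small sketch. The paper's exact definitions are not given here, so the following assumes that a committed map is a mapping from element IDs to labels, that commit precision is the fraction of elements committed as a class that truly belong to it, and that uncommitted elements count as misses for IoU. The function names `commit_precision` and `mean_per_class_iou` are hypothetical.

```python
from typing import Dict, Sequence


def commit_precision(committed: Dict[int, str], truth: Dict[int, str], cls: str) -> float:
    """Fraction of elements committed as `cls` whose ground-truth label is `cls`.

    Assumed definition; returns NaN when nothing was committed as `cls`.
    """
    hits = [truth.get(elem) == cls for elem, label in committed.items() if label == cls]
    return sum(hits) / len(hits) if hits else float("nan")


def mean_per_class_iou(committed: Dict[int, str], truth: Dict[int, str],
                       classes: Sequence[str]) -> float:
    """Mean over classes of |pred ∩ gt| / |pred ∪ gt|, skipping classes absent from both."""
    ious = []
    for cls in classes:
        pred = {elem for elem, label in committed.items() if label == cls}
        gt = {elem for elem, label in truth.items() if label == cls}
        union = pred | gt
        if union:
            ious.append(len(pred & gt) / len(union))
    return sum(ious) / len(ious) if ious else float("nan")


# Toy data, invented for illustration only (not from the paper).
toy_committed = {1: "car", 2: "car", 3: "road"}
toy_truth = {1: "car", 2: "road", 3: "road", 4: "car"}
car_precision = commit_precision(toy_committed, toy_truth, "car")
toy_miou = mean_per_class_iou(toy_committed, toy_truth, ["car", "road"])
```

In the toy data, one of the two elements committed as car is correct, and element 4 is a car that was never committed, which lowers the car IoU without affecting commit precision. This is why an operator that refuses doubtful claims can show very high commit precision alongside a more moderate IoU.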

📝 Abstract

Persistent maps used by autonomous robots increasingly fuse two channels that describe the same scene: a geometric perception stack whose assertions are well characterized, and a foundation-model channel whose semantic claims carry no calibrated reliability. Contemporary mapping systems integrate the two by treating the foundation-model channel as an additional voter in a per-element posterior, without calibrating its per-class reliability and without machinery to flag when the two channels contradict each other at a given moment. We propose an update operator with two cooperating mechanisms: a per-class calibrated commit gate, and a per-event conflict-drop window that refuses to commit foundation-model claims contradicted by the geometric channel at the moment of the claim. We evaluate on KITTI-360 and ScanNet, with an oracle geometric channel (panoptic ground truth) and an off-the-shelf online semantic segmenter (Mask2Former) to demonstrate real-world performance. The operator produces substantially more accurate committed maps (on KITTI-360, car commit precision is 99.7% vs. 43.9% for the calibration-only operator; mean per-class IoU is 0.522 vs. 0.180) and retains more compositional true positives at higher precision than a monolithic compositional VLM prompt. The framework operates at deployment quality across both oracle and off-the-shelf-segmenter geometric channels, and is invariant under foundation-model substitution.
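
The two mechanisms of the update operator can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class names, the threshold and window values, and the calibration rule (raw score scaled by a per-class reliability factor) are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Claim:
    element_id: int    # map element the foundation model is labelling
    label: str         # class asserted by the foundation model
    score: float       # raw foundation-model confidence in [0, 1]
    timestamp: float   # time of the claim, in seconds


@dataclass
class GeometricObservation:
    element_id: int
    label: str
    timestamp: float


@dataclass
class ConflictAwareOperator:
    class_reliability: Dict[str, float]  # assumed per-class calibration factors
    commit_threshold: float = 0.9        # assumed value
    conflict_window_s: float = 0.5       # assumed value
    committed: Dict[int, str] = field(default_factory=dict)

    def update(self, claim: Claim, geometry: List[GeometricObservation]) -> bool:
        """Commit the claim only if it passes both gates; return whether it was committed."""
        # Gate 1: per-class calibrated commit gate (assumed form: score * reliability).
        calibrated = claim.score * self.class_reliability.get(claim.label, 0.0)
        if calibrated < self.commit_threshold:
            return False
        # Gate 2: per-event conflict-drop window. Drop the claim if the geometric
        # channel asserts a different label for the same element near the claim time.
        for obs in geometry:
            same_element = obs.element_id == claim.element_id
            in_window = abs(obs.timestamp - claim.timestamp) <= self.conflict_window_s
            if same_element and in_window and obs.label != claim.label:
                return False
        self.committed[claim.element_id] = claim.label
        return True


# Toy usage with invented values.
op = ConflictAwareOperator(class_reliability={"car": 0.98, "pole": 0.5})
geometry = [GeometricObservation(1, "car", 10.0), GeometricObservation(2, "building", 10.1)]
agreed = op.update(Claim(1, "car", 0.95, 10.05), geometry)      # passes both gates
conflicted = op.update(Claim(2, "car", 0.95, 10.05), geometry)  # geometry says "building"
unreliable = op.update(Claim(3, "pole", 0.99, 10.0), geometry)  # fails the calibrated gate
```

The design point the sketch tries to show is that the two gates are independent: a confident claim for a reliable class can still be dropped if the geometric channel contradicts it at that moment, and an uncontradicted claim can still be dropped if its class is poorly calibrated.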

Problem

Research questions and friction points this paper is trying to address.

belief consistency

foundation model

geometric perception

persistent mapping

semantic reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

belief consistency

foundation model

geometric perception