Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing 4D memory systems: they treat semantic descriptions generated by vision-language models (VLMs) as inherently reliable, overlooking the noise and inconsistency that arise across viewpoints. The authors propose UQ-DAAAM, a spatial-semantic memory system built around an object-level semantic uncertainty score that measures how much the VLM captions of an object disagree across views. UQ-DAAAM uses this score to actively refine uncertain objects under a fixed query budget, selecting high-quality views and fusing the resulting multi-view captions into a single object description. The authors also derive probabilistic guarantees showing that higher-quality candidate views are more likely to reduce uncertainty. On the OC-NaVQA benchmark, the method achieves substantially larger uncertainty reduction and better spatio-temporal question answering performance than the baselines.

📝 Abstract

Long-horizon robot operation requires spatio-temporal memory to record the environment state and recall it for downstream reasoning. Scene graphs and retrieval-augmented systems ground VLM descriptions to persistent 3D entities with rich semantic descriptions. However, VLM captions are noisy and viewpoint-inconsistent, and existing systems treat them as an oracle with no mechanism to detect unreliable stored descriptions. We introduce object-level semantic uncertainty for multi-view VLM memory: a score that measures object-centric cross-view semantic scatter of captions and identifies semantically unresolved objects. We then integrate these uncertainty scores into an advanced spatial-semantic memory system that we dub UQ-DAAAM. UQ-DAAAM uses this score to actively refine uncertain objects under a fixed query budget by selecting high-quality views and fusing the resulting multi-view captions into a single object description. We also derive probabilistic guarantees showing that higher-quality candidate views (as selected by our approach) are more likely to reduce uncertainty. Our experiments show that uncertainty quantification can make embodied 4D memory systems more reliable and more effective. In particular, on the OC-NaVQA benchmark, UQ-DAAAM achieves substantially larger uncertainty reduction and better spatio-temporal question answering performance than baselines.
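To make the idea of "cross-view semantic scatter" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the formula, so the score below is simply the mean cosine distance between each caption embedding and the normalized mean embedding for the object. The names `semantic_scatter` and `select_for_refinement` are hypothetical, the toy 2D vectors stand in for real caption embeddings, and the budgeted step just ranks objects by score; the paper's view-quality selection and caption fusion are not modeled.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def semantic_scatter(embeddings):
    """Assumed stand-in for an object-level uncertainty score.

    Returns the mean cosine distance between each caption embedding
    and the normalized mean embedding. 0.0 means all views agree.
    """
    if not embeddings:
        raise ValueError("need at least one caption embedding")
    unit = [_normalize(v) for v in embeddings]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    mean_norm = math.sqrt(sum(x * x for x in mean))
    if mean_norm == 0.0:
        # Embeddings cancel out exactly: treat as maximally scattered.
        return 1.0
    centroid = [x / mean_norm for x in mean]
    distances = [1.0 - sum(a * b for a, b in zip(v, centroid)) for v in unit]
    return sum(distances) / len(distances)


def select_for_refinement(object_embeddings, budget):
    """Pick the `budget` most uncertain objects (hypothetical policy)."""
    scores = {
        obj: semantic_scatter(embs) for obj, embs in object_embeddings.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget], scores


# Toy memory: object id -> one embedding per viewpoint caption.
memory = {
    "mug": [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],  # views agree
    "chair": [[1.0, 0.0], [0.0, 1.0]],            # views disagree
    "lamp": [[1.0, 0.0], [0.9, 0.1]],             # mild disagreement
}

selected, scores = select_for_refinement(memory, budget=1)
print(selected, scores)
```

With these toy vectors, "mug" scores 0 because its captions are identical, and "chair" is the one object selected for refinement. In a real system the embeddings would come from a text encoder applied to the stored captions, and each refinement would cost one VLM query against the budget.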

Problem

Research questions and friction points this paper is trying to address.

spatio-temporal memory

uncertainty quantification

vision-language models

semantic inconsistency

embodied AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty quantification

spatio-temporal memory

multi-view VLM