Adaptive Dual Uncertainty Optimization: Boosting Monocular 3D Object Detection under Test-Time Shifts

📅 2025-08-28
🤖 AI Summary
Monocular 3D object detection (M3OD) suffers significant performance degradation under real-world domain shifts due to coupled semantic uncertainty (e.g., category ambiguity) and geometric uncertainty (e.g., unstable 3D localization). To address this, we propose the first test-time adaptation (TTA) framework explicitly designed for dual uncertainty. Our method comprises three key components: (1) an unsupervised focal loss formulated in convex form to enable uncertainty-aware, gradient-stable optimization; (2) a semantics-aware normal field constraint that jointly enforces semantic confidence and geometric structural consistency; and (3) a dual-branch collaborative learning mechanism establishing a semantic–geometric complementary optimization loop. Extensive experiments across multiple benchmarks and cross-domain settings demonstrate substantial improvements in detection accuracy and 3D localization stability, with superior generalization over existing TTA approaches.

📝 Abstract
Accurate monocular 3D object detection (M3OD) is pivotal for safety-critical applications like autonomous driving, yet its reliability deteriorates significantly under real-world domain shifts caused by environmental or sensor variations. To address these shifts, Test-Time Adaptation (TTA) methods have emerged, enabling models to adapt to target distributions during inference. While prior TTA approaches recognize the positive correlation between low uncertainty and high generalization ability, they fail to address the dual uncertainty inherent to M3OD: semantic uncertainty (ambiguous class predictions) and geometric uncertainty (unstable spatial localization). To bridge this gap, we propose Dual Uncertainty Optimization (DUO), the first TTA framework designed to jointly minimize both uncertainties for robust M3OD. Through a convex optimization lens, we introduce an innovative convex structure of the focal loss and further derive a novel unsupervised version, enabling label-agnostic uncertainty weighting and balanced learning for high-uncertainty objects. In parallel, we design a semantic-aware normal field constraint that preserves geometric coherence in regions with clear semantic cues, reducing uncertainty from the unstable 3D representation. This dual-branch mechanism forms a complementary loop: enhanced spatial perception improves semantic classification, and robust semantic predictions further refine spatial understanding. Extensive experiments demonstrate the superiority of DUO over existing methods across various datasets and domain shift types.
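
The abstract builds its semantic branch on the focal loss but does not reproduce the convex reformulation or its unsupervised derivation, so the sketch below only illustrates the general idea under stated assumptions. It shows the standard supervised focal loss, -(1 - p_t)^γ log p_t. It also shows a hypothetical label-free variant that replaces the unknown label with an expectation over the model's own class probabilities. The name `unsupervised_focal_loss` is introduced here for illustration, and the function is not DUO's loss. With γ = 0 the variant reduces to the Shannon entropy of the prediction, the objective minimized by entropy-based TTA methods such as Tent.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Standard (supervised) focal loss for one prediction: -(1 - p_t)^gamma * log(p_t).

    The factor (1 - p_t)^gamma shrinks the loss of confidently correct predictions.
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def unsupervised_focal_loss(probs: Sequence[float], gamma: float = 2.0) -> float:
    """HYPOTHETICAL label-free variant (not the paper's convex form).

    The unknown label is replaced by an expectation over the model's own
    prediction: sum_c p_c * FL(p, c). With gamma = 0 this is exactly the
    Shannon entropy of `probs`.
    """
    return sum(p * focal_loss(probs, c, gamma) for c, p in enumerate(probs))


if __name__ == "__main__":
    uncertain = softmax([0.2, 0.1, 0.0])  # nearly uniform: high semantic uncertainty
    confident = softmax([5.0, 0.0, 0.0])  # peaked: low semantic uncertainty
    print(unsupervised_focal_loss(uncertain), unsupervised_focal_loss(confident))
```

Running the module prints a larger value for the near-uniform prediction than for the peaked one. A label-free TTA objective of this kind would push that value down at inference time without any ground-truth labels.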
Problem

Addressing semantic and geometric uncertainty in monocular 3D detection
Adapting models to real-world domain shifts during inference
Improving detection reliability under environmental and sensor variations
Innovation

Dual Uncertainty Optimization minimizes semantic and geometric uncertainties
Convex focal loss enables unsupervised uncertainty weighting
Semantic-aware normal field constraint preserves geometric coherence
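
The semantic-aware normal field constraint is described only qualitatively, so the sketch below is an assumed interpretation rather than DUO's actual loss. It derives per-pixel surface normals from a depth map by finite differences. It then penalizes disagreement between neighboring normals, but only where a semantic confidence map reaches a threshold, so geometry is regularized only in regions with clear semantic cues. `depth_to_normals`, `normal_coherence_loss`, the unit pixel spacing, and the 0.5 threshold are all hypothetical choices introduced here. A real M3OD pipeline would first back-project depth with the camera intrinsics.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def depth_to_normals(depth: List[List[float]]) -> List[List[Vec3]]:
    """Unit normals of the surface z = depth[y][x], treating pixels as unit-spaced.

    Uses forward differences; the last row/column falls back to a backward
    difference. (Simplification: no camera intrinsics are applied.)
    """
    h, w = len(depth), len(depth[0])
    normals: List[List[Vec3]] = []
    for y in range(h):
        row: List[Vec3] = []
        for x in range(w):
            x2 = x + 1 if x + 1 < w else x - 1
            y2 = y + 1 if y + 1 < h else y - 1
            dzdx = (depth[y][x2] - depth[y][x]) / (x2 - x)
            dzdy = (depth[y2][x] - depth[y][x]) / (y2 - y)
            nx, ny, nz = -dzdx, -dzdy, 1.0
            norm = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / norm, ny / norm, nz / norm))
        normals.append(row)
    return normals


def normal_coherence_loss(
    normals: List[List[Vec3]],
    confidence: List[List[float]],
    threshold: float = 0.5,
) -> float:
    """HYPOTHETICAL semantics-aware constraint (not the paper's formula).

    Averages (1 - cosine similarity) over right/down neighbor pairs, counting
    a pair only when both pixels have semantic confidence >= threshold.
    """
    h, w = len(normals), len(normals[0])
    total, pairs = 0.0, 0
    for y in range(h):
        for x in range(w):
            if confidence[y][x] < threshold:
                continue  # semantically ambiguous pixel: leave its geometry free
            for yy, xx in ((y, x + 1), (y + 1, x)):
                if yy < h and xx < w and confidence[yy][xx] >= threshold:
                    a, b = normals[y][x], normals[yy][xx]
                    cos_sim = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
                    total += 1.0 - cos_sim
                    pairs += 1
    return total / pairs if pairs else 0.0
```

For a planar depth map every normal is identical, so the loss is zero. A crease in the depth raises it, unless the crease pixels fall below the confidence threshold and are excluded from the constraint.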
Zixuan Hu
School of Computer Science, Peking University, Beijing, China
Dongxiao Li
School of Computer Science, Peking University, Beijing, China
Xinzhu Ma
Associate Professor, Beihang University
deep learning, computer vision, 3D scene understanding, ai4science
Shixiang Tang
The Chinese University of Hong Kong, Hong Kong, China
Xiaotong Li
Peking University
Multimodal LLM, Foundation Model, Transfer Learning
Wenhan Yang
Ph.D. student in Computer Science, University of California, Los Angeles
Self-supervised Learning, Model Robustness
Ling-Yu Duan
School of Computer Science, Peking University, Beijing, China; Peng Cheng Laboratory, Shenzhen, China