Capturing Fine-Grained Alignments Improves 3D Affordance Detection

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D point cloud affordance detection methods rely on coarse-grained, embedding-level cosine similarity. This fails to capture fine-grained semantic alignment between point clouds and textual affordance descriptions, which limits how accurately interactive regions can be localized. To address this, we propose LM-AD, a language-model-guided affordance detection framework featuring an Affordance Query Module (AQM). AQM integrates a pretrained language model with a 3D point cloud encoder via cross-modal attention. This enables point-level alignment between points and words and explicitly models fine-grained associations between object surface geometry and functional verbs (e.g., “grasp”, “press”). Evaluated on the 3D AffordanceNet benchmark, LM-AD achieves significant improvements in detection accuracy and mean Intersection-over-Union (mIoU), outperforming state-of-the-art methods by an average of 4.2% mIoU. To our knowledge, it is the first method to enable precise, semantics-driven localization of functional regions in 3D point clouds.
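The point-word alignment via cross-modal attention described above can be illustrated with plain scaled dot-product attention, using points as queries and word tokens as keys and values. This is a minimal sketch under stated assumptions, not the authors' AQM. The learned projections, the pretrained language model and the point encoder are all replaced by hand-made 2-D vectors, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def point_word_attention(points: List[Vector], words: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical helper: each point (query) attends over word tokens (keys/values).

    Learned projections W_q, W_k, W_v are omitted (identity) for brevity.
    Returns (attn, attended): attn[i][t] is how strongly point i attends to word t,
    attended[i] is the word context vector gathered by point i.
    """
    d = len(points[0])
    scale = 1.0 / math.sqrt(d)
    attn: List[Vector] = []
    attended: List[Vector] = []
    for p in points:
        # Similarity of this point to every word token, scaled by 1/sqrt(d).
        logits = [scale * sum(a * b for a, b in zip(p, w)) for w in words]
        weights = softmax(logits)
        attn.append(weights)
        # Attention-weighted sum of word vectors.
        ctx = [sum(wt * w[k] for wt, w in zip(weights, words)) for k in range(d)]
        attended.append(ctx)
    return attn, attended


def point_scores(points: List[Vector], attended: List[Vector]) -> List[float]:
    """Hypothetical per-point affordance probability: sigmoid of point . context."""
    return [
        1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(p, c))))
        for p, c in zip(points, attended)
    ]


# Toy data: two word tokens (e.g. stand-ins for "grasp" and "press") and three points.
words = [[4.0, 0.0], [0.0, 4.0]]
points = [[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]]

attn, attended = point_word_attention(points, words)
scores = point_scores(points, attended)
print([round(s, 3) for s in scores])  # → [1.0, 1.0, 0.0]
```

Unlike a single pooled text vector, the attention matrix `attn` keeps a separate weight per (point, word) pair, so different surface regions can latch onto different words.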

📝 Abstract

In this work, we address the challenge of affordance detection in 3D point clouds, a task that requires effectively capturing fine-grained alignments between point clouds and text. Existing methods often struggle to model such alignments, resulting in limited performance on standard benchmarks. A key limitation of these approaches is their reliance on simple cosine similarity between point cloud and text embeddings, which lacks the expressiveness needed for fine-grained reasoning. To address this limitation, we propose LM-AD, a novel method for affordance detection in 3D point clouds. As its core component, we introduce the Affordance Query Module (AQM), which efficiently captures fine-grained alignment between point clouds and text by leveraging a pretrained language model. We demonstrate that our method outperforms existing approaches in both accuracy and mean Intersection over Union (mIoU) on the 3D AffordanceNet dataset.
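The abstract's central criticism is that prior methods score each point by plain cosine similarity against one text embedding. A minimal sketch of that baseline follows, assuming per-point features and a single pooled sentence embedding of the same dimension. The function names and the 0.5 threshold are hypothetical and not taken from any specific prior method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (eps avoids divide-by-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def cosine_affordance_scores(point_feats: List[List[float]], text_emb: List[float]) -> List[float]:
    """Baseline scoring: every point is compared against the SAME pooled text vector."""
    return [cosine(p, text_emb) for p in point_feats]


def to_mask(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Binarize scores into a per-point affordance mask (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


points = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy per-point features
text = [1.0, 0.0]                              # toy pooled embedding of the affordance text
scores = cosine_affordance_scores(points, text)
print(to_mask(scores))  # → [1, 0, 1]
```

Because the whole description is collapsed into `text` before scoring, individual words such as "grasp" or "press" cannot be matched to different surface regions, which is the expressiveness gap the abstract points to.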

Problem

Research questions and friction points this paper is trying to address.

Capturing fine-grained alignments between 3D point clouds and text

Improving affordance detection accuracy and mIoU

Overcoming the limitations of cosine similarity for point-cloud-text alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a pretrained language model for point-text alignment

Introduces Affordance Query Module (AQM)

Improves accuracy and mean Intersection over Union on 3D AffordanceNet
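Both summaries report gains in mean Intersection over Union (mIoU). For readers unfamiliar with the metric, here is a minimal sketch of a generic per-affordance mIoU over binary point masks. The 3D AffordanceNet evaluation protocol (for example, how continuous affordance scores are thresholded) may differ, and the helper names and the empty-mask convention are assumptions.

```python
from typing import Dict, List


def binary_iou(pred: List[int], gt: List[int]) -> float:
    """IoU between two binary per-point masks of equal length."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Convention (an assumption): two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Dict[str, List[int]], gts: Dict[str, List[int]]) -> float:
    """Average the per-affordance IoU over every affordance label present in gts."""
    return sum(binary_iou(preds[name], gts[name]) for name in gts) / len(gts)


preds = {"grasp": [1, 1, 0, 0], "press": [0, 0, 1, 0]}
gts = {"grasp": [1, 0, 0, 0], "press": [0, 0, 1, 1]}
print(mean_iou(preds, gts))  # → 0.5
```

Each label here has one overlapping point out of two in the union (IoU 0.5), so the mean is also 0.5.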

🔎 Similar Papers

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation