🤖 AI Summary
Existing point-cloud-based methods for 3D affordance reasoning suffer from poor generalization, coordinate sensitivity, and insufficient robustness due to point cloud sparsity. Method: This work introduces 3DAffordSplat, the first large-scale affordance dataset tailored for 3D Gaussian Splatting (3DGS), comprising 23,677 Gaussian instances spanning 21 object categories and 18 affordance types, filling a critical data gap for affordance learning under this representation. We propose AffordSplatNet, featuring a cross-modal structure alignment module that fuses point-cloud and 3DGS representations via geometric consistency priors, augmented with structural consistency constraints and multi-modal supervision. Results: Our method achieves significant improvements over state-of-the-art approaches on both seen and unseen objects, substantially enhancing recognition accuracy and cross-scene generalization. This work establishes a new benchmark for fine-grained affordance understanding in embodied intelligence empowered by 3DGS.
📝 Abstract
3D affordance reasoning is essential for associating human instructions with the functional regions of 3D objects, facilitating precise, task-oriented manipulations in embodied AI. However, current methods, which predominantly depend on sparse 3D point clouds, exhibit limited generalizability and robustness due to their sensitivity to coordinate variations and the inherent sparsity of the data. By contrast, 3D Gaussian Splatting (3DGS) delivers high-fidelity, real-time rendering with minimal computational overhead by representing scenes as dense, continuous distributions. This positions 3DGS as a highly effective approach for capturing fine-grained affordance details and improving recognition accuracy. Nevertheless, its full potential remains largely untapped due to the absence of large-scale, 3DGS-specific affordance datasets. To overcome these limitations, we present 3DAffordSplat, the first large-scale, multi-modal dataset tailored for 3DGS-based affordance reasoning. This dataset includes 23,677 Gaussian instances, 8,354 point cloud instances, and 6,631 manually annotated affordance labels, encompassing 21 object categories and 18 affordance types. Building upon this dataset, we introduce AffordSplatNet, a novel model specifically designed for affordance reasoning using 3DGS representations. AffordSplatNet features an innovative cross-modal structure alignment module that exploits structural consistency priors to align 3D point cloud and 3DGS representations, resulting in enhanced affordance recognition accuracy. Extensive experiments demonstrate that the 3DAffordSplat dataset significantly advances affordance learning within the 3DGS domain, while AffordSplatNet consistently outperforms existing methods across both seen and unseen settings, highlighting its robust generalization capabilities.
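To make the cross-modal alignment idea concrete, here is a minimal, hypothetical sketch of how a geometric consistency prior could pair the two representations: each point-cloud point is matched to its nearest Gaussian center, and a loss penalizes feature disagreement across the matched pairs. The function name, feature shapes, and cosine-distance choice are illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

def structural_alignment_loss(pc_xyz, pc_feat, gs_xyz, gs_feat):
    """Illustrative cross-modal alignment objective (assumed, not the
    paper's exact formulation): pair each point with its nearest Gaussian
    center (geometric consistency prior), then penalize cosine
    disagreement between the paired point and Gaussian features."""
    # Pairwise squared distances between N points and M Gaussian centers.
    d2 = ((pc_xyz[:, None, :] - gs_xyz[None, :, :]) ** 2).sum(axis=-1)  # (N, M)
    nn = d2.argmin(axis=1)  # index of the nearest Gaussian for each point
    # L2-normalize both feature sets, then use cosine distance per pair.
    a = pc_feat / np.linalg.norm(pc_feat, axis=1, keepdims=True)
    b = gs_feat[nn] / np.linalg.norm(gs_feat[nn], axis=1, keepdims=True)
    return float(np.mean(1.0 - (a * b).sum(axis=1)))

# Toy check: identical geometry and features should yield (near-)zero loss.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(32, 3))
feat = rng.normal(size=(32, 16))
loss = structural_alignment_loss(xyz, feat, xyz, feat)
```

In a full model the features would come from learned point-cloud and Gaussian encoders and the loss would be one term in the training objective alongside the affordance supervision.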