CRAFT-E: A Neuro-Symbolic Framework for Embodied Affordance Grounding

📅 2025-12-03
🤖 AI Summary
Enabling assistive robots to understand object functionality in unstructured environments requires robust alignment between linguistic action queries and physically feasible object interactions. Method: we propose a modular neuro-symbolic framework built on a verb–attribute–object knowledge graph that explicitly incorporates grasp feasibility into functional reasoning. The framework jointly leverages vision-language alignment, energy-based grasp inference, and functional compatibility modeling, while symbolic reasoning generates interpretable, stepwise inference paths for fine-grained diagnostics and customizable decision-making. Contribution/Results: the approach achieves state-of-the-art performance across static-scene understanding, ImageNet-based functional retrieval, and real-world robotic manipulation tasks covering 20 verbs and 39 objects, while remaining robust to perceptual noise and providing transparent, component-level diagnostics.

📝 Abstract
Assistive robots operating in unstructured environments must understand not only what objects are, but what they can be used for. This requires grounding language-based action queries to objects that both afford the requested function and can be physically retrieved. Existing approaches often rely on black-box models or fixed affordance labels, limiting transparency, controllability, and reliability for human-facing applications. We introduce CRAFT-E, a modular neuro-symbolic framework that composes a structured verb-property-object knowledge graph with visual-language alignment and energy-based grasp reasoning. The system generates interpretable grounding paths that expose the factors influencing object selection and incorporates grasp feasibility as an integral part of affordance inference. We further construct a benchmark dataset with unified annotations for verb-object compatibility, segmentation, and grasp candidates, and deploy the full pipeline on a physical robot. CRAFT-E achieves competitive performance in static scenes, ImageNet-based functional retrieval, and real-world trials involving 20 verbs and 39 objects. The framework remains robust under perceptual noise and provides transparent, component-level diagnostics. By coupling symbolic reasoning with embodied perception, CRAFT-E offers an interpretable and customizable alternative to end-to-end models for affordance-grounded object selection, supporting trustworthy decision-making in assistive robotic systems.
Problem

Research questions and friction points this paper is trying to address.

Grounding language action queries to functional objects in unstructured environments.
Overcoming limitations of black-box models for transparent, controllable assistive robotics.
Integrating grasp feasibility into interpretable affordance inference for object selection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-symbolic framework combines a knowledge graph with vision-language alignment.
Integrates grasp feasibility reasoning into the affordance inference process.
Generates interpretable grounding paths for transparent object selection.
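The score composition sketched above — symbolic verb–object compatibility, vision-language alignment, and grasp feasibility fused into a single interpretable ranking — can be illustrated as follows. This is a minimal hypothetical sketch, not the paper's implementation: the function names, the logistic mapping from grasp energy to feasibility, the multiplicative fusion rule, and all toy scores are assumptions made for illustration.

```python
import math

def grasp_feasibility(energy):
    """Map a grasp energy (lower = easier) to a [0, 1] feasibility score.
    The logistic squashing here is an assumed stand-in for the paper's
    energy-based grasp inference."""
    return 1.0 / (1.0 + math.exp(energy))

def ground_query(verb, candidates):
    """Rank candidate objects for an action verb, keeping a per-component
    trace so each factor in the decision stays inspectable."""
    ranked = []
    for obj in candidates:
        components = {
            "kg_compat": obj["kg_compat"],    # verb-attribute-object path score
            "vl_align": obj["vl_align"],      # vision-language similarity
            "graspable": grasp_feasibility(obj["grasp_energy"]),
        }
        # Multiplicative fusion (assumed): a near-zero component, e.g. an
        # ungraspable object, effectively vetoes the candidate.
        score = 1.0
        for value in components.values():
            score *= value
        ranked.append({"name": obj["name"], "score": score, "trace": components})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)

# Toy candidates for the query "drink"; all numbers are illustrative.
candidates = [
    {"name": "mug",   "kg_compat": 0.9, "vl_align": 0.8, "grasp_energy": -1.0},
    {"name": "plate", "kg_compat": 0.4, "vl_align": 0.7, "grasp_energy": 2.0},
]
result = ground_query("drink", candidates)
```

The per-object `trace` dictionary is what makes the selection diagnosable: when the robot picks the wrong object, one can see whether the symbolic compatibility, the visual alignment, or the grasp feasibility was the deciding factor.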
Zhou Chen
CSSE Department, Auburn University, Auburn, AL 36849, USA.
Joe Lin
University of California, Los Angeles
Computer Vision
Carson Bulgin
CSSE Department, Auburn University, Auburn, AL 36849, USA.
Sathyanarayanan N. Aakur
Assistant Professor, Auburn University
Event Understanding · Visual Commonsense · Metagenome Analysis