🤖 AI Summary
This paper addresses visual functional affordance localization—the task of identifying objects capable of supporting specific actions (e.g., “cutting”) in fully unannotated scenes. We propose a neuro-symbolic reasoning framework that integrates ConceptNet commonsense knowledge, language model–derived semantic priors, and CLIP-based visual representations into an energy-driven iterative inference loop, enabling explicit alignment between symbolic logic and perceptual features. A differentiable energy function jointly optimizes visual evidence and commonsense constraints, ensuring transparent, goal-directed reasoning. Evaluated in multi-object, zero-label settings, our method significantly improves localization accuracy while providing traceable, interpretable decision rationales. Our key contribution is the first end-to-end integration of structured commonsense knowledge into visual reasoning, which achieves both high performance and intrinsic interpretability and thereby advances trustworthy embodied intelligence.
📝 Abstract
We introduce CRAFT, a neuro-symbolic framework for interpretable affordance grounding, which identifies the objects in a scene that enable a given action (e.g., "cut"). CRAFT integrates structured commonsense priors from ConceptNet and language models with visual evidence from CLIP, using an energy-based reasoning loop to refine predictions iteratively. This process grounds symbolic structure in perceptual evidence, yielding transparent, goal-driven decisions. Experiments in multi-object, label-free settings demonstrate that CRAFT improves accuracy while enhancing interpretability, providing a step toward robust and trustworthy scene understanding.
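To make the energy-driven loop concrete, below is a minimal, self-contained PyTorch sketch of this style of inference. Everything in it is illustrative rather than taken from the paper: the scores are toy values standing in for CLIP image–text similarities and ConceptNet-derived priors, and the energy terms, weights (`lambda_sym`, `sparsity`), and optimizer are assumptions, not CRAFT's actual formulation.

```python
import torch

# Toy per-object evidence for the action "cut" over three detected objects.
# In a real system these scores would come from CLIP image-text similarity
# and a ConceptNet / language-model prior; the values below are made up.
visual_score = torch.tensor([0.2, 0.9, 0.4])       # CLIP-style visual evidence
commonsense_prior = torch.tensor([0.1, 0.8, 0.3])  # e.g. strength of (object, CapableOf, "cut")

# Soft per-object selection variable, refined by gradient descent on an
# energy that rewards agreement with both visual and symbolic evidence.
logits = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.5)

lambda_sym = 1.0  # weight of the commonsense term (assumed hyperparameter)
sparsity = 1.0    # penalty that discourages selecting weakly supported objects

for _ in range(200):
    opt.zero_grad()
    w = torch.sigmoid(logits)  # soft affordance assignment per object
    energy = (-(w * visual_score).sum()
              - lambda_sym * (w * commonsense_prior).sum()
              + sparsity * w.sum())
    energy.backward()
    opt.step()

print(torch.sigmoid(logits))  # only object 1, with strong support from both terms, survives
```

The design point the sketch illustrates is that both evidence sources enter a single differentiable objective, so the final selection can be traced back to the relative contribution of the visual and commonsense terms.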