Affordance2Action: Task-Conditioned Scene-level Affordance Grounding for Real-Time Manipulation

📅 2026-06-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing approaches struggle to localize task-relevant functional regions in cluttered real-world scenes from task instructions, and existing benchmarks do not support complex mappings such as one-to-many task-to-region correspondences. This work proposes a task-conditioned, scene-level affordance grounding framework and introduces a real-scene affordance benchmark that covers both single-region and multi-region instruction mappings. To make annotation efficient while keeping quality high, the authors design the A2A-AffordGen pipeline, which integrates language-model filtering, interactive segmentation, mask refinement, and human verification. Models trained on this data outperform baseline methods (general-purpose segmentation models, vision-language models, and affordance distillation approaches) in task-level localization accuracy and provide useful spatial priors for downstream manipulation tasks.
📝 Abstract
Task-conditioned manipulation requires grounding instructions to task-relevant functional parts rather than object categories. This setting is scene-dependent and often one-to-many in cluttered scenes: the same object may afford different interactions across tasks, while a single task may correspond to either one functional region or multiple valid functional regions, depending on the scene layout. Existing affordance datasets and benchmarks remain misaligned with this setting, as they typically focus on grasping or object-level affordances, rely on synthetic scenes, or assume a single instruction-region correspondence. We present Affordance2Action (A2A), a benchmark-centered learning framework for scene-level, task-conditioned part affordance grounding. At its core is A2A-Bench, a manipulation-oriented benchmark that covers both single-region and multi-region instruction correspondences in everyday scenes, with the latter highlighting the ambiguity and diversity of affordance grounding in realistic multi-object environments. To construct it at scale, we build A2A-AffordGen, an agent-assisted annotation pipeline that combines language-model filtering, interactive part segmentation, instance-level mask-out refinement, task-reasoning instruction generation, and human verification. A2A-Bench's supervision further supports diverse downstream applications, with real-time affordance grounding and affordance-conditioned manipulation policies as two representative examples. Experiments show that A2A exposes substantial gaps in generic segmentation, VLM-based grounding, and affordance distillation baselines, while improving task-level localization and providing useful spatial priors for downstream manipulation. All datasets and code will be publicly released to promote open research.
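The one-to-many setting described in the abstract can be made concrete with a small sketch. Everything below is an illustrative assumption rather than the A2A-Bench format or its evaluation protocol: the `AffordanceSample` record, its field names, and the two scoring conventions (IoU against the union of all valid regions versus IoU against the best-matching single region) are hypothetical, chosen only to show why multi-region instructions need an explicit scoring choice.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class AffordanceSample:
    """Hypothetical record: one instruction with one or more valid region masks."""
    instruction: str
    region_masks: list  # list of boolean HxW arrays, one per valid functional region

    @property
    def is_multi_region(self) -> bool:
        return len(self.region_masks) > 1

    def union_mask(self) -> np.ndarray:
        # Pixel-wise OR over all valid regions.
        return np.logical_or.reduce(self.region_masks)


def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, target).sum() / union)


def union_iou(pred: np.ndarray, sample: AffordanceSample) -> float:
    """Assumed convention A: the prediction should cover every valid region."""
    return mask_iou(pred, sample.union_mask())


def best_region_iou(pred: np.ndarray, sample: AffordanceSample) -> float:
    """Assumed convention B: matching any single valid region is enough."""
    return max(mask_iou(pred, m) for m in sample.region_masks)


# Toy 4x4 scene with two valid "handle" regions for the same instruction.
left = np.zeros((4, 4), dtype=bool)
left[0:2, 0] = True
right = np.zeros((4, 4), dtype=bool)
right[0:2, 3] = True
sample = AffordanceSample("open the cabinet", [left, right])

pred = left.copy()  # a prediction that finds only the left handle
score_union = union_iou(pred, sample)
score_best = best_region_iou(pred, sample)
```

In this toy case the same prediction scores 0.5 under the union convention and 1.0 under the best-region convention, which is why a benchmark with multi-region instructions has to state which notion of correctness it uses.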
Problem

Research questions and friction points this paper is trying to address.

affordance grounding
task-conditioned manipulation
scene-level affordance
functional regions
instruction-region correspondence
Innovation

Methods, ideas, or system contributions that make the work stand out.

task-conditioned affordance
scene-level grounding
multi-region correspondence
affordance benchmark
real-time manipulation
Litao Liu
Department of Computer Science, Rutgers University-New Brunswick
Yifan Han
Department of Computer Science, Rutgers University-New Brunswick
Pengfei Yi
Department of Computer Science, Rutgers University-New Brunswick
Wenbo Yu
Department of Computer Science, Rutgers University-New Brunswick
Hanqing Wang
HUST ➡ Shanghai AI lab ➡ HKUST(gz)
MLLM, Embodied AI, World Model, VLA
Haoran Du
Department of Computer Science, Rutgers University-New Brunswick
Enze Yuan
Department of Computer Science, Rutgers University-New Brunswick
Zilin Yuan
Department of Computer Science, Rutgers University-New Brunswick
Ruiding Feng
Department of Computer Science, Rutgers University-New Brunswick
Michael Liu
Department of Computer Science, Rutgers University-New Brunswick
Qi Zhang
Shanghai AI Laboratory
Jingjin Yu
Associate Professor of Comp. Sci., Rutgers Univ. at New Brunswick, Roboticist
Algorithmic Foundations for Robotics