"Set It Up!": Functional Object Arrangement with Compositional Generative Models

📅 2024-05-20
🏛️ Robotics: Science and Systems
📈 Citations: 5
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to interpret under-specified high-level instructions (e.g., “set up a dining table for two”) and generate functional object arrangements. It proposes SetItUp, a few-shot framework that uses a graph of abstract spatial relationships among objects as an intermediate representation. Large language models (LLMs) propose these relationships for a novel scene as constraints, and a library of diffusion models associated with the relationships is composed to find object poses that satisfy them. The arrangement rules for each scene type are learned from a small number of training examples together with a human-crafted program sketch. Evaluated on study desk, dining table, and coffee table scenes, the approach generates more physically plausible, functional, and aesthetically pleasing arrangements than existing models.

📝 Abstract
This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages large language models (LLMs) to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.
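The two-stage decomposition described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the learned per-relation diffusion models are swapped for hand-written quadratic energy functions, and composition is approximated by summing those energies and minimizing the sum with plain gradient descent. The `Relation` class, the `left_of` and `right_of` energies, the `GAP` spacing, and the `ground` function are all hypothetical names introduced for illustration, and the relation graph is written by hand where SetItUp would have an LLM propose it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # (x, y) on the table plane; orientation omitted for brevity


@dataclass(frozen=True)
class Relation:
    """One edge of the abstract relation graph, e.g. left_of(fork, plate)."""
    name: str
    subject: str
    target: str


GAP = 0.15  # assumed spacing in metres, chosen only for illustration


def left_of(a: Pose, b: Pose) -> float:
    # Zero when `a` sits exactly GAP to the left of `b` at the same y.
    return (a[0] - (b[0] - GAP)) ** 2 + (a[1] - b[1]) ** 2


def right_of(a: Pose, b: Pose) -> float:
    # Zero when `a` sits exactly GAP to the right of `b` at the same y.
    return (a[0] - (b[0] + GAP)) ** 2 + (a[1] - b[1]) ** 2


# Hypothetical stand-in for a library of per-relation models.
ENERGIES: Dict[str, Callable[[Pose, Pose], float]] = {
    "left_of": left_of,
    "right_of": right_of,
}


def total_energy(poses: Dict[str, Pose], relations: List[Relation]) -> float:
    # Composition step: every relation contributes one additive term.
    return sum(ENERGIES[r.name](poses[r.subject], poses[r.target]) for r in relations)


def ground(relations: List[Relation], fixed: Dict[str, Pose],
           steps: int = 500, lr: float = 0.1, eps: float = 1e-5) -> Dict[str, Pose]:
    """Find poses for the free objects by gradient descent on the summed energy."""
    names = {r.subject for r in relations} | {r.target for r in relations}
    poses = {n: fixed.get(n, (0.0, 0.0)) for n in names}
    free = sorted(names - set(fixed))
    for _ in range(steps):
        grads = {}
        for n in free:
            grad = []
            for axis in range(2):
                hi, lo = list(poses[n]), list(poses[n])
                hi[axis] += eps
                lo[axis] -= eps
                e_hi = total_energy({**poses, n: (hi[0], hi[1])}, relations)
                e_lo = total_energy({**poses, n: (lo[0], lo[1])}, relations)
                grad.append((e_hi - e_lo) / (2 * eps))  # central difference
            grads[n] = grad
        for n, g in grads.items():
            x, y = poses[n]
            poses[n] = (x - lr * g[0], y - lr * g[1])
    return poses


# Hand-written relation graph for one place setting; the plate is held fixed.
graph = [
    Relation("left_of", "fork", "plate"),
    Relation("right_of", "knife", "plate"),
    Relation("right_of", "spoon", "knife"),
]
poses = ground(graph, fixed={"plate": (0.0, 0.0)})
for name in sorted(poses):
    print(name, poses[name])
```

The knife appears in two relations, so its final position has to satisfy both terms at once. That is the point of composing per-relation models: a new scene only needs a new graph, not a new model.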
Problem

Research questions and friction points this paper is trying to address.

Developing robots to understand under-specified functional object arrangement instructions
Learning arrangement rules from limited data and human-crafted program sketches
Generating physically plausible and aesthetically pleasing object arrangements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to propose abstract spatial relationships
Composes diffusion models for constraint satisfaction
Learns arrangement patterns from limited data