PhyScene3D: Physically Consistent Interactive 3D Tabletop Scene Generation

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of generating interactive 3D tabletop scenes with dense object hierarchies and irregular affordances while avoiding the physical collisions and violations common in existing methods. The authors propose a human-like construction paradigm in which a Cognitive Topological Reasoning Chain (CTRC) drives anchor-guided, sequential scene synthesis. To enhance physical plausibility, they introduce a Physics-Aware Denoising Alignment (PADA) mechanism. By integrating 3D axis-aligned bounding box (AABB) layouts, a differentiable signed distance field (SDF), test-time optimization (TTO), and end-to-end training, the method preserves semantic intent while improving physical consistency. Experiments show a 40% reduction in scene-wise collision rate relative to the human-annotated training data, with both semantic accuracy and physical validity surpassing current state-of-the-art approaches.
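The headline result is a scene-wise collision rate. The exact collision test used by the authors is not given here, so the following is only a minimal sketch under an assumption: every object is represented by its AABB, and a scene counts as colliding when any pair of boxes interpenetrates (resting contact, where faces merely touch, is allowed). The names `aabbs_overlap`, `scene_has_collision`, and `scene_collision_rate` are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

# Assumed representation: an AABB as (min_corner, max_corner), each an (x, y, z) tuple.
AABB = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def aabbs_overlap(a: AABB, b: AABB, tol: float = 1e-6) -> bool:
    """True if the boxes overlap by more than `tol` along all three axes.

    Boxes that only share a face (e.g. a cup resting on a plate) do not count.
    """
    (amin, amax), (bmin, bmax) = a, b
    return all(min(amax[i], bmax[i]) - max(amin[i], bmin[i]) > tol for i in range(3))


def scene_has_collision(boxes: Sequence[AABB]) -> bool:
    """A scene collides if any pair of its objects interpenetrates."""
    return any(aabbs_overlap(a, b) for a, b in combinations(boxes, 2))


def scene_collision_rate(scenes: Sequence[Sequence[AABB]]) -> float:
    """Fraction of scenes that contain at least one colliding pair."""
    if not scenes:
        return 0.0
    return sum(scene_has_collision(s) for s in scenes) / len(scenes)


plate = ((0.0, 0.0, 0.0), (0.2, 0.2, 0.02))
cup_resting = ((0.05, 0.05, 0.02), (0.12, 0.12, 0.12))  # bottom face on the plate's top face
cup_sunk = ((0.05, 0.05, 0.01), (0.12, 0.12, 0.11))     # 1 cm inside the plate
print(scene_collision_rate([[plate, cup_resting], [plate, cup_sunk]]))  # 0.5
```

Under this metric, a 40% reduction relative to the training data means the generated scenes' rate is 0.6 times the rate measured on the human-annotated scenes. AABB overlap is a coarse proxy: it can flag objects whose actual meshes do not touch, so a simulator-based or mesh-based check would be stricter.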

📝 Abstract

Generating physically consistent 3D tabletop scenes is a fundamental yet underexplored problem for interactive and generalist robotic learning. The challenge stems from dense object hierarchies and irregular affordances. Here, an interactive scene denotes a physically valid, collision-free environment directly loadable into physics simulators. Existing methods, ranging from decoupled symbolic solvers to end-to-end regression models, often suffer from error propagation or overfitting to noisy supervision containing widespread physical violations. To address these limitations, we introduce PhyScene3D, a framework that reformulates generation as a Human-Mimetic Constructive Process. The proposed Cognitive Topological Reasoning Chain (CTRC) factorizes scene synthesis into a sequential, anchor-conditioned process. It employs a 3D AABB-based placement scheme that imposes a strong structural inductive bias. To address imperfect supervision and physical infeasibility, we introduce Physics-Aware Denoising Alignment (PADA). It integrates a differentiable Signed Distance Field (SDF) with Test-Time Optimization (TTO) to project generated scenes onto a physics-feasible manifold while preserving semantic intent. Experiments demonstrate that PhyScene3D outperforms state-of-the-art approaches in both semantic accuracy and physical validity, achieving a 40% reduction in scene-wise collision rate relative to the human-annotated training data.
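The abstract describes CTRC as factorizing synthesis into a sequential, anchor-conditioned process over 3D AABBs. The sketch below illustrates only that factorization, under stated assumptions: the chain is already given as an ordered list of steps, every step uses a single "rest on top of" relation, and each object is placed at a normalized (u, v) position on its anchor's top face. `Box`, `PlacementStep`, `place_on_anchor`, and `build_scene` are hypothetical names, not the authors' API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned bounding box (AABB) given by its min and max corners."""
    lo: Vec3
    hi: Vec3

    @property
    def size(self) -> Vec3:
        return tuple(h - l for l, h in zip(self.lo, self.hi))


@dataclass
class PlacementStep:
    """Hypothetical chain step: rest object `name` on top of an earlier `anchor`."""
    name: str
    size: Vec3               # (dx, dy, dz) extent of the object's AABB
    anchor: str              # an already-placed object, or "table"
    uv: Tuple[float, float]  # normalized position on the anchor's top face


def place_on_anchor(anchor: Box, size: Vec3, uv: Tuple[float, float]) -> Box:
    """Rest a box of `size` on `anchor`, keeping its footprint inside the anchor's."""
    xy = []
    for axis in range(2):  # x, then y
        free = anchor.size[axis] - size[axis]
        if free < 0:
            raise ValueError("object footprint is larger than its anchor")
        u = min(max(uv[axis], 0.0), 1.0)  # clamp to the anchor's top face
        xy.append(anchor.lo[axis] + u * free)
    z = anchor.hi[2]  # bottom face touches the anchor's top face: no gap, no overlap
    return Box((xy[0], xy[1], z), (xy[0] + size[0], xy[1] + size[1], z + size[2]))


def build_scene(table: Box, steps: List[PlacementStep]) -> Dict[str, Box]:
    """Place objects one at a time; each step may only use anchors placed earlier."""
    scene = {"table": table}
    for step in steps:
        if step.anchor not in scene:
            raise KeyError(f"anchor {step.anchor!r} must be placed before {step.name!r}")
        scene[step.name] = place_on_anchor(scene[step.anchor], step.size, step.uv)
    return scene


table = Box((0.0, 0.0, 0.0), (1.0, 0.6, 0.75))
scene = build_scene(table, [
    PlacementStep("tray", (0.4, 0.3, 0.02), "table", (0.5, 0.5)),
    PlacementStep("mug", (0.08, 0.08, 0.10), "tray", (0.0, 1.0)),  # a corner of the tray
])
print(scene["mug"])
```

Conditioning each object on an already-placed anchor is what makes support relations hold by construction: the mug's bottom face always lands on the tray's top face. The sketch does not stop siblings on the same anchor from overlapping, which is the kind of residual violation a separate physics-aware correction step has to handle.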

Problem

Research questions and friction points this paper is trying to address.

physically consistent

3D tabletop scene generation

interactive scene

collision-free

physics simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Physically Consistent Scene Generation

Cognitive Topological Reasoning Chain

Physics-Aware Denoising Alignment
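According to the abstract, PADA combines a differentiable SDF with test-time optimization to project generated scenes onto a physics-feasible manifold while preserving semantic intent. The sketch below illustrates only that SDF-plus-TTO projection idea, not the denoising-time alignment or the authors' geometry. It assumes objects are AABBs, uses the exact box-box signed distance (the box SDF of the Minkowski sum evaluated at the center offset), and treats "preserving semantic intent" as a squared-distance penalty to the generated centers. `box_sdf_and_grad`, `project_to_feasible`, and the weights `lam` and `lr` are hypothetical.

```python
import numpy as np


def box_sdf_and_grad(p, half):
    """Signed distance from point p to an origin-centred box with half-extents `half`,
    and its gradient w.r.t. p (valid almost everywhere)."""
    q = np.abs(p) - half
    sign = np.where(p >= 0, 1.0, -1.0)
    outside = np.maximum(q, 0.0)
    norm = np.linalg.norm(outside)
    if norm > 0:  # outside: Euclidean distance to the box surface
        return float(norm), sign * outside / norm
    k = int(np.argmax(q))  # inside or on the surface: distance to the nearest face (<= 0)
    grad = np.zeros_like(p, dtype=float)
    grad[k] = sign[k]
    return float(q[k]), grad


def project_to_feasible(centers, half_extents, fixed=None, lam=0.01, lr=0.25, steps=200):
    """Hypothetical PADA-style test-time optimisation over AABB centres.

    Minimises  sum_{i<j} relu(-sdf_ij)^2 + lam * ||c - c0||^2,
    where sdf_ij is the signed distance between boxes i and j. The second term
    keeps the layout close to the generated one (a stand-in for semantic intent).
    """
    c0 = np.asarray(centers, dtype=float)
    h = np.asarray(half_extents, dtype=float)
    c = c0.copy()
    fixed = np.zeros(len(c), dtype=bool) if fixed is None else np.asarray(fixed, dtype=bool)
    for _ in range(steps):
        grad = 2.0 * lam * (c - c0)
        for i in range(len(c)):
            for j in range(i + 1, len(c)):
                d, g = box_sdf_and_grad(c[i] - c[j], h[i] + h[j])
                if d < 0:  # penetration: d/dc_i relu(-d)^2 = 2 * d * g
                    grad[i] += 2.0 * d * g
                    grad[j] -= 2.0 * d * g
        grad[fixed] = 0.0  # e.g. keep the table or a support surface in place
        c -= lr * grad
    return c


# A cup (index 1) generated 1 cm too low, sinking into a fixed plate (index 0).
centers = [[0.10, 0.10, 0.01], [0.085, 0.085, 0.06]]
halves = [[0.10, 0.10, 0.01], [0.035, 0.035, 0.05]]
out = project_to_feasible(centers, halves, fixed=[True, False])
print(out[1])  # cup lifted by just under 1 cm along z; x and y unchanged
```

Because the loss balances the penetration penalty against the pull toward the generated layout, a small overlap of roughly `lam / (1 + lam)` times the original penetration remains at convergence; lowering `lam` favors physical feasibility, raising it favors staying close to what the generator produced.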