3D-CoS: A New 3D Reconstruction Paradigm Based on VLM Code Synthesis

📅 2026-06-09
🤖 AI Summary
Existing 3D reconstruction methods rely on low-level representations such as point clouds, meshes, or NeRFs, which hinder procedural control and precise editing. This work proposes a new paradigm, 3D Code Synthesis (3D-CoS), in which 3D assets are represented as executable Blender code. It combines vision-language models (VLMs) with blueprint planning, retrieval-augmented generation (RAG) over Blender API documentation, few-shot geometric exemplars, and a component-level agent workflow to support structured, programmatic 3D generation and localized text-driven editing. In the targeted editing evaluation, code-based edits achieve stronger fidelity in edited regions and better preserve unedited regions than a point-cloud-based editing baseline, supporting code as a representation for controllable 3D modeling.
📝 Abstract
Most recent 3D reconstruction and editing systems operate on implicit and explicit representations such as NeRF, point clouds, or meshes. While these representations enable high-fidelity rendering, they are fundamentally low-level and hard to control programmatically. In contrast, we propose and systematically evaluate a new 3D reconstruction paradigm, 3D Code Synthesis (3D-CoS), where 3D assets are constructed as executable Blender code, a programmatic and interpretable medium. To assess how well current VLMs can use code to represent 3D objects, we evaluate representative open-source and closed-source VLMs in code-based reconstruction under a unified protocol. We further introduce a suite of structured code-synthesis workflows, including blueprint-based planning, Retrieval-Augmented Generation (RAG) over Blender API documentation, few-shot geometric demonstrations, and a component-level Agent workflow for part-wise code generation. To demonstrate the unique advantages of this representation, we further evaluate localized text-driven modifications and compare our code-based edits with a point-cloud-based 3D editing baseline. Our study shows that code as a 3D representation offers strong controllability and locality, yielding stronger edit fidelity and better preservation of unedited regions in our targeted editing evaluation. Our work also analyzes the potential of this paradigm, delineates the current capability frontier of VLMs for programmatic 3D modeling, and highlights code synthesis as a promising direction for editable 3D reconstruction.
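The locality argument in the abstract can be pictured with a small sketch. This is not the paper's implementation: `Component`, `CodeAsset`, and `leg` are hypothetical names chosen for illustration, and the table geometry is made up. The emitted strings call `bpy.ops.mesh.primitive_cube_add` and `bpy.ops.mesh.primitive_cylinder_add`, so the generated script would only execute inside Blender, while the sketch itself runs in plain Python.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One named part of an asset, stored as a snippet of Blender Python."""
    name: str
    code: str


@dataclass
class CodeAsset:
    """Hypothetical container: an asset is an ordered list of components."""
    components: list[Component] = field(default_factory=list)

    def to_script(self) -> str:
        """Join all parts into one script that could be run inside Blender."""
        parts = ["import bpy"]
        for c in self.components:
            parts.append(f"# --- {c.name} ---\n{c.code}")
        return "\n\n".join(parts) + "\n"

    def edit(self, name: str, new_code: str) -> CodeAsset:
        """Return a copy in which only the named component is replaced."""
        if all(c.name != name for c in self.components):
            raise KeyError(name)
        return CodeAsset([
            Component(c.name, new_code if c.name == name else c.code)
            for c in self.components
        ])


def leg(x: float, y: float, height: float = 0.7) -> str:
    """Emit Blender code for one cylindrical leg standing on the ground."""
    return (
        "bpy.ops.mesh.primitive_cylinder_add("
        f"radius=0.04, depth={height}, location=({x}, {y}, {height / 2}))"
    )


# Illustrative asset: a table made of a flattened cube and four legs.
table = CodeAsset([
    Component(
        "top",
        "bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, 0.75))\n"
        "bpy.context.active_object.scale = (1.0, 0.6, 0.05)",
    ),
    Component("leg_front_left", leg(-0.45, -0.25)),
    Component("leg_front_right", leg(0.45, -0.25)),
    Component("leg_back_left", leg(-0.45, 0.25)),
    Component("leg_back_right", leg(0.45, 0.25)),
])

# A localized edit ("make the tabletop thicker") rewrites one component only.
thicker = table.edit(
    "top",
    "bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, 0.8))\n"
    "bpy.context.active_object.scale = (1.0, 0.6, 0.1)",
)

print(thicker.to_script())
```

Because the edit replaces a single named snippet, the code for every other part is carried over unchanged. That is the sense in which a code representation can keep unedited regions intact by construction, whereas a point-cloud edit has no such part boundaries to rely on.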
Problem

Research questions and friction points this paper is trying to address.

3D reconstruction
programmable control
3D representation
editability
code synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D code synthesis
programmable 3D representation
vision-language models
retrieval-augmented generation
editable 3D reconstruction