CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing

📅 2025-02-06
🤖 AI Summary
This paper addresses text-based editing of existing CAD models, that is, modifying a model's 3D geometric structure according to a natural-language instruction. The authors present CAD-Editor, which they describe as the first framework for this task. Methodologically, the paper introduces a two-stage "locate-then-infill" paradigm and an automated pipeline for synthesizing (original model, instruction, edited model) triplets. In this pipeline, design variation models generate pairs of original and edited CAD models, and Large Vision-Language Models (LVLMs) summarize the differences within each pair into an editing instruction. Large Language Models (LLMs) serve as the backbone for both the locating and the infilling stage. Experiments show that CAD-Editor outperforms the compared methods both quantitatively and qualitatively. This fills a gap left by prior work, which either lacks text-based control or ignores existing CAD models as constraints.
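The summary describes the data synthesis pipeline only in outline. The Python sketch below shows the shape of that loop under stated assumptions. A CAD model is represented as a list of tokens. The design variation model and the LVLM are passed in as plain callables (`vary` and `describe`, hypothetical names, not the paper's API). Pairs that come back unchanged or with an empty instruction are discarded. That filter is an illustrative assumption and is not something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical representation: a CAD model serialized as a list of tokens.
CADSeq = List[str]


@dataclass
class Triplet:
    original: CADSeq
    instruction: str
    edited: CADSeq


def synthesize(
    originals: Iterable[CADSeq],
    vary: Callable[[CADSeq], CADSeq],           # stand-in for a design variation model
    describe: Callable[[CADSeq, CADSeq], str],  # stand-in for an LVLM that summarizes differences
) -> List[Triplet]:
    """Build (original, instruction, edited) triplets from unpaired CAD models."""
    triplets: List[Triplet] = []
    for original in originals:
        edited = vary(original)
        if edited == original:
            continue  # assumption: an unchanged pair yields no useful instruction
        instruction = describe(original, edited).strip()
        if instruction:  # assumption: drop pairs the describer could not summarize
            triplets.append(Triplet(original, instruction, edited))
    return triplets


if __name__ == "__main__":
    # Toy stand-ins for illustration only; they are not real models.
    def toy_vary(seq: CADSeq) -> CADSeq:
        return ["arc" if tok == "circle" else tok for tok in seq]

    def toy_describe(before: CADSeq, after: CADSeq) -> str:
        return "Replace the circle with an arc."

    data = synthesize([["line", "circle", "extrude"], ["line", "extrude"]], toy_vary, toy_describe)
    print(len(data))  # 1: the second model is unchanged by toy_vary and is skipped
```

Passing the two models in as callables keeps the loop independent of any particular variation model or LVLM. A real pipeline would replace the toy stand-ins with model calls.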

📝 Abstract
Computer-Aided Design (CAD) is indispensable across various industries. Text-based CAD editing, which automates the modification of CAD models based on textual instructions, holds great potential but remains underexplored. Existing methods primarily focus on one of two tasks. Design variation generation lacks support for text-based control, and text-based CAD generation neglects existing CAD models as constraints. We introduce CAD-Editor, the first framework for text-based CAD editing. Training requires triplet data (original model, instruction, edited model) with accurate correspondence, which is difficult to obtain, so we propose an automated data synthesis pipeline. This pipeline uses design variation models to generate pairs of original and edited CAD models and employs Large Vision-Language Models (LVLMs) to summarize their differences into editing instructions. To tackle the composite nature of text-based CAD editing, we propose a locate-then-infill framework. It decomposes the task into two focused sub-tasks: locating the regions that require modification, and infilling these regions with appropriate edits. Large Language Models (LLMs) serve as the backbone for both sub-tasks, leveraging their capabilities in natural language understanding and CAD knowledge. Experiments show that CAD-Editor achieves superior performance both quantitatively and qualitatively.
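The abstract describes the two sub-tasks only at a high level. The Python sketch below illustrates the decomposition under explicit assumptions:

- A CAD model is serialized as a flat token sequence.
- Locating means replacing each region to edit with one mask token (the hypothetical `MASK`).
- Infilling means supplying the tokens for each mask.

Supervision for both stages has to come from an (original, edited) pair, so the sketch derives locate and infill targets with a sequence diff from Python's `difflib`. This is one plausible construction, not necessarily the one CAD-Editor uses. In CAD-Editor itself, both stages are performed by LLMs rather than by a diff.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

MASK = "<mask>"  # hypothetical placeholder token for a region to be edited


def locate_targets(original: List[str], edited: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Derive a masked sequence (locate target) and infill spans from an edit pair."""
    masked: List[str] = []
    spans: List[List[str]] = []
    matcher = SequenceMatcher(a=original, b=edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            masked.extend(original[i1:i2])  # unchanged region is copied as-is
        else:  # "replace", "delete" or "insert": mark the region for editing
            masked.append(MASK)
            spans.append(edited[j1:j2])  # an empty span means the region is deleted
    return masked, spans


def infill(masked: List[str], spans: List[List[str]]) -> List[str]:
    """Replace each MASK, in order, with its span to recover the edited sequence."""
    filled: List[str] = []
    remaining = iter(spans)
    for tok in masked:
        if tok == MASK:
            filled.extend(next(remaining))
        else:
            filled.append(tok)
    return filled


if __name__ == "__main__":
    original = ["line", "line", "circle", "extrude"]
    edited = ["line", "line", "arc", "arc", "extrude"]
    masked, spans = locate_targets(original, edited)
    print(masked)  # ['line', 'line', '<mask>', 'extrude']
    print(spans)   # [['arc', 'arc']]
    assert infill(masked, spans) == edited
```

Collapsing each changed region into a single mask lets the infill stage decide how many tokens to produce. As a result, one mask can stand for a deletion (empty span), a replacement, or an insertion.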
Problem

Research questions and friction points this paper is trying to address.

Automates CAD model editing via text
Generates training data automatically
Decomposes editing into locate and infill tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data synthesis pipeline
Locate-then-infill framework
Large Language Model (LLM) backbone
Yu Yuan (University of Science and Technology of China)
Shizhao Sun (Microsoft)
Qi Liu (University of Science and Technology of China)
Jiang Bian (Microsoft Research Asia)