UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CAD

📅 2026-06-03

🤖 AI Summary
Existing CAD research typically addresses individual tasks in isolation, and no unified benchmark exists for multi-modal, multi-task learning. This work introduces UniCAD, a comprehensive multi-modal benchmark covering point-to-CAD reconstruction, text- and image-to-CAD generation, and CAD question answering. It also proposes UniCAD-MLLM, an end-to-end multi-modal large language model that accepts text, image, sketch, and point cloud inputs within a single framework and handles all of these tasks. On both the new UniCAD benchmark and the established Fusion360 benchmark, UniCAD-MLLM is reported to outperform existing task-specific and multi-task baselines, achieving state-of-the-art results on all evaluated tasks.
📝 Abstract
Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.
Problem

Research questions and friction points this paper is trying to address.

Computer-Aided Design
multi-modal learning
multi-task learning
unified benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal learning
multi-task CAD
unified benchmark
large language model
3D reconstruction