C3VD-DEFCOL: A Deformable Colonoscopy Dataset with Time-Resolved 3D Ground Truth and Realistic Appearance

📅 2026-06-05

🤖 AI Summary

This work addresses the scarcity of colonoscopy datasets that combine realistic mucosal appearance with dense, temporally continuous 3D ground truth. This gap hinders research on 3D reconstruction under non-rigid deformation. We present the first synthetic colonoscopy dataset with controllable deformation. Peristaltic waves and centerline motion are simulated on C3VD/C3VDv2 colon meshes along real camera trajectories, and an LTX-2.3-based model, conditioned primarily on rendered depth, translates the simulated frames into realistic RGB video while preserving the underlying geometry. The dataset includes 110 videos spanning 11 distinct colon geometries and three levels of peristaltic severity. Each video is accompanied by time-stamped 3D meshes and per-frame ground truth (depth, surface normals, optical flow, and camera poses). Controllable peristalsis is thereby introduced as a new evaluation dimension. Experiments show that pose estimation error increases with deformation severity, establishing a reproducible quantitative benchmark for deformable endoscopic 3D reconstruction, with the goal of narrowing the domain gap between synthetic and clinical data.
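The summary does not spell out how peristalsis is parameterized. As a hypothetical sketch only, the snippet below models a peristaltic wave as a traveling sinusoidal radial contraction on an idealized straight tube whose centerline is the z-axis. The severity amplitudes, wavelength, frequency, and tube dimensions are illustrative assumptions, not values from the paper, and a real C3VD mesh would first need its vertices projected onto a curved centerline.

```python
import math

# Hypothetical severity levels (assumption, not from the paper):
# fraction of the rest radius removed at the crest of the contraction wave.
SEVERITY_AMPLITUDE = {"mild": 0.1, "moderate": 0.25, "severe": 0.4}


def radial_scale(s, t, amplitude, wavelength=40.0, frequency=0.2):
    """Scale factor for the rest radius at centerline position s (mm), time t (s).

    A sinusoidal wave travels toward +s at speed wavelength * frequency.
    The factor stays in [1 - amplitude, 1], so the wall only contracts.
    """
    phase = 2.0 * math.pi * (s / wavelength - frequency * t)
    return 1.0 - amplitude * 0.5 * (1.0 + math.sin(phase))


def deform_vertices(vertices, t, severity="moderate"):
    """Apply the wave to vertices of a tube whose centerline is the z-axis."""
    amplitude = SEVERITY_AMPLITUDE[severity]
    deformed = []
    for x, y, z in vertices:
        k = radial_scale(z, t, amplitude)
        deformed.append((k * x, k * y, z))  # contract radially, keep z fixed
    return deformed


def tube_vertices(radius=15.0, length=100.0, n_rings=50, n_around=16):
    """Rest-pose vertices of a straight cylinder, a stand-in for a colon mesh."""
    verts = []
    for i in range(n_rings):
        z = length * i / (n_rings - 1)
        for j in range(n_around):
            theta = 2.0 * math.pi * j / n_around
            verts.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return verts
```

Evaluating `deform_vertices(rest, t, severity)` at each frame time (for example `t = k / 30` for a hypothetical 30 fps clip) yields a sequence of time-stamped meshes, which is the kind of per-frame geometry from which depth, normals, and flow could then be rendered.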

📝 Abstract

3D reconstruction could improve colonoscopy by estimating mucosal coverage and alerting clinicians to missed regions during screening. However, algorithm development is limited because no existing dataset provides both a realistic in vivo appearance and dense, time-resolved 3D ground truth, especially under non-rigid deformation. We present C3VD-DEFCOL, a framework and dataset for evaluating deformable colonoscopy reconstruction with paired geometry and realistic texture. Starting from C3VD/C3VDv2 colon meshes and camera trajectories, we generate controlled deformations of the colon surface, including peristaltic waves and centerline motion, and render per-frame depth, surface normals, optical flow, camera poses, and time-stamped 3D meshes. We then use the rendered geometry, primarily depth, to condition an LTX-2.3-based sim-to-real translation model that produces RGB clips with in vivo-like mucosal color, texture, vasculature, and specular appearance while preserving the underlying 3D scene structure. The resulting dataset contains 110 videos from 11 unique colon mesh geometries, with varying camera trajectories, appearances, and parameterized deformation regimes, including three peristaltic severity levels that serve as controlled evaluation axes. We evaluate the generated videos using appearance realism, geometric consistency, and temporal consistency metrics, and use the paired ground truth to benchmark the downstream task of pose estimation in deformable 3D reconstruction. Our experiments show that pose estimation error increases with deformation severity, providing a controlled stress test that is not possible with existing in vivo datasets. Overall, C3VD-DEFCOL is designed as a reproducible, quantitative evaluation platform for testing deformable 3D reconstruction algorithms, with the goal of reducing the domain gap between synthetic datasets and in vivo colonoscopy.
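The abstract reports that pose estimation error grows with deformation severity but, in this summary, does not name the exact metric. A minimal sketch, assuming an absolute-trajectory-error-style RMSE over camera centers, is shown below. The function names are hypothetical, and for brevity the sketch removes only the translation offset between trajectories, whereas standard ATE also solves for rotation (and scale, for monocular methods), typically with Umeyama alignment.

```python
import math
from statistics import mean


def ate_rmse_translation_only(estimated, ground_truth):
    """RMSE between camera centers after aligning the trajectory centroids.

    estimated, ground_truth: equal-length lists of (x, y, z) camera centers.
    Simplification (assumption): only translation is removed; a full ATE
    would also estimate a rotation and, for monocular methods, a scale.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    n = len(estimated)
    ce = [sum(p[i] for p in estimated) / n for i in range(3)]
    cg = [sum(p[i] for p in ground_truth) / n for i in range(3)]
    sq = 0.0
    for pe, pg in zip(estimated, ground_truth):
        # Compare centroid-relative positions so a global offset costs nothing.
        sq += sum(((pe[i] - ce[i]) - (pg[i] - cg[i])) ** 2 for i in range(3))
    return math.sqrt(sq / n)


def error_by_severity(results):
    """Mean error per severity label.

    results: iterable of (severity_label, estimated, ground_truth), one per video.
    """
    grouped = {}
    for severity, est, gt in results:
        grouped.setdefault(severity, []).append(ate_rmse_translation_only(est, gt))
    return {severity: mean(errors) for severity, errors in grouped.items()}
```

Grouping per-video errors by severity label is what lets the peristaltic levels act as a stress test: the same metric is compared across the controlled severity settings.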

Problem

Research questions and friction points this paper is trying to address.

colonoscopy

3D reconstruction

non-rigid deformation

ground truth

sim-to-real

Innovation

Methods, ideas, or system contributions that make the work stand out.

deformable 3D reconstruction

sim-to-real translation

time-resolved 3D ground truth