Trans$^2$-CBCT: A Dual-Transformer Framework for Sparse-View CBCT Reconstruction

📅 2025-06-20
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Sparse-view cone-beam CT (CBCT) suffers from severe undersampling, which produces prominent artifacts and incomplete spatial coverage and limits its utility in low-dose, rapid imaging. To address this, we propose Trans$^2$-CBCT, a dual-Transformer collaborative reconstruction framework. Its backbone integrates multi-scale CNN and Transformer features (Trans-CBCT), while a neighborhood-aware point-cloud Transformer enforces voxel-level consistency. Key innovations include 3D positional encoding, k-nearest-neighbor attention, a lightweight attenuation-prediction head, and a multi-view feature-query mechanism. Evaluated on LUNA16 with six views, Trans-CBCT achieves a PSNR of 34.21 dB (+1.17 dB) and an SSIM of 0.9231 (+0.0163) over baseline methods; the point-Transformer refinement yields further gains of +0.63 dB PSNR and +0.0117 SSIM, substantially improving reconstruction quality and geometric fidelity.

πŸ“ Abstract
Cone-beam computed tomography (CBCT) using only a few X-ray projection views enables faster scans with lower radiation dose, but the resulting severe under-sampling causes strong artifacts and poor spatial coverage. We address these challenges in a unified framework. First, we replace conventional UNet/ResNet encoders with TransUNet, a hybrid CNN-Transformer model. Convolutional layers capture local details, while self-attention layers enhance global context. We adapt TransUNet to CBCT by combining multi-scale features, querying view-specific features per 3D point, and adding a lightweight attenuation-prediction head. This yields Trans-CBCT, which surpasses prior baselines by 1.17 dB PSNR and 0.0163 SSIM on the LUNA16 dataset with six views. Second, we introduce a neighbor-aware Point Transformer to enforce volumetric coherence. This module uses 3D positional encoding and attention over k-nearest neighbors to improve spatial consistency. The resulting model, Trans$^2$-CBCT, provides an additional gain of 0.63 dB PSNR and 0.0117 SSIM. Experiments on LUNA16 and ToothFairy show consistent gains from six to ten views, validating the effectiveness of combining CNN-Transformer features with point-based geometry reasoning for sparse-view CBCT reconstruction.
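The abstract's "querying view-specific features per 3D point" step can be illustrated with a minimal NumPy sketch: each 3D point is projected into every view via that view's projection matrix, the view's 2D feature map is bilinearly sampled at the projected location, and the per-view samples are averaged. This is an assumed, simplified version for illustration (the function name, shapes, and mean aggregation are not taken from the paper):

```python
import numpy as np

def query_view_features(points, feat_maps, proj_mats):
    """For each 3D point, project into every view, bilinearly sample
    that view's 2D feature map, and average the samples across views.

    points:    (N, 3) world coordinates
    feat_maps: (V, C, H, W) per-view feature maps
    proj_mats: (V, 3, 4) per-view projection matrices
    returns:   (N, C) aggregated per-point features
    """
    V, C, H, W = feat_maps.shape
    N = points.shape[0]
    homog = np.concatenate([points, np.ones((N, 1))], axis=1)  # (N, 4)
    out = np.zeros((N, C))
    for v in range(V):
        uvw = homog @ proj_mats[v].T            # (N, 3) homogeneous pixels
        u = np.clip(uvw[:, 0] / uvw[:, 2], 0, W - 1)
        r = np.clip(uvw[:, 1] / uvw[:, 2], 0, H - 1)
        # bilinear interpolation between the four surrounding pixels
        u0, r0 = np.floor(u).astype(int), np.floor(r).astype(int)
        u1, r1 = np.minimum(u0 + 1, W - 1), np.minimum(r0 + 1, H - 1)
        du, dr = u - u0, r - r0
        fm = feat_maps[v]                       # (C, H, W)
        sample = (fm[:, r0, u0] * (1 - du) * (1 - dr)
                  + fm[:, r0, u1] * du * (1 - dr)
                  + fm[:, r1, u0] * (1 - du) * dr
                  + fm[:, r1, u1] * du * dr)    # (C, N)
        out += sample.T
    return out / V
```

In the paper's pipeline these sampled features would come from the TransUNet encoder and feed the attenuation-prediction head; here the aggregation is a plain mean purely for clarity.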
Problem

Research questions and friction points this paper is trying to address.

Reconstructing sparse-view CBCT with severe artifacts
Enhancing global context and local details in CBCT
Improving spatial consistency in volumetric CBCT reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN-Transformer model for CBCT reconstruction
Neighbor-aware Point Transformer for volumetric coherence
Multi-scale features with view-specific queries
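The neighbor-aware Point Transformer idea above (attention restricted to each point's k nearest neighbors, with the relative 3D offset encoded and injected into keys and values) can be sketched in NumPy. This is a hedged illustration under assumed shapes and a simple linear positional encoding, not the authors' architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knn_attention(xyz, feats, k, Wq, Wk, Wv, Wp):
    """Attention over each point's k nearest neighbors, with a linear
    encoding of the relative 3D offset added to keys and values.

    xyz:   (N, 3) point coordinates
    feats: (N, C) per-point features
    Wq, Wk, Wv: (C, C) query/key/value projections
    Wp:    (3, C) positional encoding of relative offsets (assumed linear)
    """
    N, C = feats.shape
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)  # (N, N) sq. dists
    nn = np.argsort(d2, axis=1)[:, :k]                       # (N, k), incl. self
    q = feats @ Wq
    kmat = feats @ Wk
    v = feats @ Wv
    rel = xyz[nn] - xyz[:, None, :]                          # (N, k, 3) offsets
    pos = rel @ Wp                                           # (N, k, C)
    scores = (q[:, None, :] * (kmat[nn] + pos)).sum(-1) / np.sqrt(C)
    attn = softmax(scores, axis=1)                           # (N, k) weights
    return (attn[:, :, None] * (v[nn] + pos)).sum(axis=1)    # (N, C)
```

Restricting attention to k neighbors keeps the cost linear in the number of points while the 3D offsets give the module the geometric awareness that enforces volumetric coherence.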