🤖 AI Summary
Training diffusion models on high-dimensional medical imaging data (e.g., 3D MRI/CT) incurs prohibitive GPU memory consumption and energy cost under single-GPU settings. To address this, we propose a memory-efficient architecture that integrates reversible U-Net with reversible attention mechanisms, enabling fully invertible feature transformations and attention computations. Coupled with memory-optimized gradient computation strategies, our design decouples peak memory usage from data dimensionality. Experiments on BraTS2020 demonstrate a 15% reduction in peak GPU memory, substantial energy savings during training, and state-of-the-art image reconstruction quality. This work constitutes the first systematic incorporation of reversible design principles into diffusion-based 3D medical image generation frameworks, establishing a novel paradigm for efficient generative modeling under resource-constrained conditions.
📝 Abstract
Diffusion models have recently gained state of the art performance on many image generation tasks. However, most models require significant computational resources to achieve this. This becomes apparent in the application of medical image synthesis due to the 3D nature of medical datasets like CT-scans, MRIs, electron microscope, etc. In this paper we propose a novel architecture for a single GPU memory-efficient training for diffusion models for high dimensional medical datasets. The proposed model is built by using an invertible UNet architecture with invertible attention modules. This leads to the following two contributions: 1. denoising diffusion models and thus enabling memory usage to be independent of the dimensionality of the dataset, and 2. reducing the energy usage during training. While this new model can be applied to a multitude of image generation tasks, we showcase its memory-efficiency on the 3D BraTS2020 dataset leading to up to 15% decrease in peak memory consumption during training with comparable results to SOTA while maintaining the image quality.