Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification

📅 2024-09-25

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical video and 3D volumetric sequence classification suffers from data scarcity and high annotation costs. Existing diffusion-based generative augmentation methods also offer weak semantic and temporal controllability and little quality control over noisy synthetic samples. To address these challenges, this paper proposes a controllable generative augmentation framework. Its core contributions are: (1) a multimodal conditional guidance mechanism for controllable sequence generation, enabling precise customization along both semantic and temporal dimensions; (2) a spatio-temporal consistency enhancement module that preserves structural coherence across video frames and volumetric slices; and (3) a dual-level (semantic and sequential) noise filtering mechanism with fine- and coarse-grained quality assessment to eliminate spurious samples. Built on a diffusion model, the framework yields significant performance gains across three medical datasets, eleven classifiers, and three training paradigms, with particular improvements in high-risk patient identification and out-of-distribution generalization.
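
The summary does not say how the multimodal conditions are fused during sampling. A widely used way to steer a diffusion model with several conditions at once is composable classifier-free guidance: each condition's noise prediction is pushed away from the unconditional prediction with its own weight, eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond). The sketch below assumes that scheme purely for illustration. The helper `compose_guidance`, the condition names `text` and `motion`, and the weights are hypothetical and are not taken from Ctrl-GenAug.

```python
from typing import Dict, List

Vector = List[float]


def compose_guidance(
    eps_uncond: Vector,
    eps_cond: Dict[str, Vector],
    weights: Dict[str, float],
) -> Vector:
    """Hypothetical helper: composable classifier-free guidance.

    Computes eps_uncond + sum_i w_i * (eps_i - eps_uncond). This is an
    assumed fusion scheme, not necessarily the one Ctrl-GenAug uses.
    """
    if set(eps_cond) != set(weights):
        raise ValueError("every condition needs exactly one guidance weight")
    guided = list(eps_uncond)
    for name, eps_c in eps_cond.items():
        if len(eps_c) != len(eps_uncond):
            raise ValueError(f"shape mismatch for condition {name!r}")
        w = weights[name]
        for k, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            # Move the estimate along the direction this condition adds.
            guided[k] += w * (c - u)
    return guided


# Toy usage with made-up 3-element "noise predictions".
eps_u = [0.0, 0.0, 0.0]
eps_c = {"text": [1.0, 0.0, 0.0], "motion": [0.0, 2.0, 0.0]}
print(compose_guidance(eps_u, eps_c, {"text": 2.0, "motion": 0.5}))
# → [2.0, 1.0, 0.0]
```

Setting a condition's weight to 0 removes its influence, and with a single condition the formula reduces to ordinary classifier-free guidance, which is why per-condition weights are a natural knob for semantic versus temporal control.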

📝 Abstract

In the medical field, the limited availability of large-scale datasets and labor-intensive annotation processes hinder the performance of deep models. Diffusion-based generative augmentation approaches offer a promising solution to this issue and have proven effective in advancing downstream medical recognition tasks. Nevertheless, existing works lack sufficient semantic and sequential steerability for challenging video/3D sequence generation and neglect quality control of noisy synthesized samples, resulting in unreliable synthetic databases and severely limiting the performance of downstream tasks. In this work, we present Ctrl-GenAug, a novel and general generative augmentation framework that enables highly customized sequence synthesis at both the semantic and sequential levels and suppresses incorrectly synthesized samples, to aid medical sequence classification. Specifically, we first design a multimodal conditions-guided sequence generator for controllably synthesizing diagnosis-promotive samples. A sequential augmentation module is integrated to enhance the temporal/stereoscopic coherence of generated samples. Then, we propose a noisy synthetic data filter to suppress unreliable cases at semantic and sequential levels. Extensive experiments on 3 medical datasets, using 11 networks trained under 3 paradigms, comprehensively analyze the effectiveness and generality of Ctrl-GenAug, particularly in underrepresented high-risk populations and out-of-domain conditions.
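
As a rough illustration of filtering at two levels, the sketch below keeps a synthetic sequence only if (a) a classifier trained on real data assigns the intended label a probability above a threshold (semantic level) and (b) embeddings of consecutive frames or slices stay similar (sequential level). Both criteria, the thresholds, and the names `cosine`, `sequence_coherence`, and `keep_sample` are assumptions chosen for illustration; the paper's actual filter may score samples quite differently.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def sequence_coherence(frame_feats: List[Sequence[float]]) -> float:
    """Mean cosine similarity of adjacent frame/slice embeddings (assumed metric)."""
    if len(frame_feats) < 2:
        return 1.0
    sims = [cosine(a, b) for a, b in zip(frame_feats, frame_feats[1:])]
    return sum(sims) / len(sims)


def keep_sample(
    class_probs: Sequence[float],
    target: int,
    frame_feats: List[Sequence[float]],
    min_conf: float = 0.8,
    min_coherence: float = 0.9,
) -> bool:
    """Hypothetical dual-level filter: both checks must pass to keep a sample."""
    # Semantic level: does a real-data classifier agree with the intended label?
    semantic_ok = class_probs[target] >= min_conf
    # Sequential level: do neighbouring frames/slices change smoothly?
    sequential_ok = sequence_coherence(frame_feats) >= min_coherence
    return semantic_ok and sequential_ok


# Toy 2-D "embeddings" standing in for per-frame encoder features.
smooth = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
jumpy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(keep_sample([0.05, 0.95], 1, smooth))  # True: confident and coherent
print(keep_sample([0.05, 0.95], 1, jumpy))   # False: frames flicker
print(keep_sample([0.6, 0.4], 1, smooth))    # False: wrong semantics
```

Requiring both checks means a sample that looks right frame by frame but flickers over time is dropped, as is a temporally smooth sample whose content no longer matches its label.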

Problem

Research questions and friction points this paper is trying to address.

Limited medical datasets hinder deep model performance

Existing methods lack control in sequence generation

Noisy synthetic data reduces downstream task reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal conditions-guided sequence generator

Sequential augmentation module for coherence

Noisy synthetic data filter for reliability
