🤖 AI Summary
Semantic label noise and a shortage of high-quality annotations undermine model robustness in medical visual question answering (Med-VQA). To address this, the paper introduces the first Med-VQA noisy-label benchmark and proposes DiN, a diffusion-based framework. DiN adapts the diffusion generative paradigm to VQA: an Answer Diffuser generates answers in a coarse-to-fine manner, guided by conditional information, and a Noisy Label Refinement module corrects labels dynamically. The method combines multimodal feature fusion, conditional embedding-based generation, and a robust loss function. Experiments show improved noise robustness: DiN achieves state-of-the-art performance across multiple Med-VQA datasets, with an average accuracy gain of 7.2% over prior methods, and remains stable under high-noise conditions.
📝 Abstract
Medical Visual Question Answering (Med-VQA) systems aid the interpretation of medical images, which contain critical clinical information. However, the challenge of noisy labels and limited high-quality datasets remains underexplored. To address this, we establish the first benchmark for noisy labels in Med-VQA by simulating human mislabeling with semantically designed noise types. More importantly, we introduce the DiN framework, which leverages a diffusion model to handle noisy labels in Med-VQA. Unlike the dominant classification-based VQA approaches that directly predict answers, our Answer Diffuser (AD) module employs a coarse-to-fine process, refining answer candidates with a diffusion model for improved accuracy. The Answer Condition Generator (ACG) further enhances this process by generating task-specific conditional information, integrating answer embeddings with fused image-question features. To handle label noise, our Noisy Label Refinement (NLR) module introduces a robust loss function and dynamic answer adjustment, further boosting the performance of the AD module.
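The abstract does not spell out how the semantically designed noise is generated. As a rough illustration only, the sketch below shows one way such noise could be simulated: a fixed fraction of answers is swapped for a different answer from the same semantic group, mimicking a plausible human mislabel. The function name `inject_semantic_noise`, the answer groups, and the noise rate are all hypothetical and are not taken from the paper.

```python
import random


def inject_semantic_noise(labels, groups, noise_rate, seed=0):
    """Swap a fraction of labels for another answer in the same semantic group.

    Hypothetical sketch: the grouping and sampling scheme are assumptions,
    not the benchmark's actual noise types.
    """
    rng = random.Random(seed)

    # Map each answer to the other members of its (assumed) semantic group.
    alternatives = {}
    for group in groups:
        for answer in group:
            alternatives[answer] = [a for a in group if a != answer]

    # Only labels with at least one in-group alternative can be corrupted.
    eligible = [i for i, y in enumerate(labels) if alternatives.get(y)]
    n_flip = min(len(eligible), round(noise_rate * len(labels)))
    flipped = set(rng.sample(eligible, n_flip))

    noisy = [
        rng.choice(alternatives[y]) if i in flipped else y
        for i, y in enumerate(labels)
    ]
    return noisy, sorted(flipped)


# Toy usage with made-up answers and groups.
labels = ["left lung", "right lung", "yes", "no", "mri"]
groups = [["left lung", "right lung"], ["yes", "no"]]
noisy, flipped = inject_semantic_noise(labels, groups, noise_rate=0.4)
```

With a noise rate of 0.4 on five labels, two labels are replaced by an in-group alternative; "mri" has no group here and is never changed. A real benchmark would need clinically motivated groupings rather than this toy list.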