NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high storage and transmission costs of multimodal medical imaging in resource-constrained clinical settings, this paper proposes a semantics-guided, graph-structured compression framework. It repurposes cross-modal attention scores from a fine-tuned generative vision-language model (VLM) to identify diagnostically salient regions in chest X-rays and prune the rest, converting the retained regions into visual graphs; these are then fused with report-derived knowledge graphs to construct a unified, persistent, task-agnostic multimodal graph data asset. Evaluated on MIMIC-CXR and CheXpert Plus, the method achieves compression rates of 93.4%–97.7% while maintaining strong pneumonia detection performance (AUC: 0.88–0.95), significantly outperforming raw-data baselines. Its core contribution is the first use of a generative VLM's cross-attention mechanism for interpretable structural pruning, enabling deep, graph-level integration of imaging and textual clinical knowledge.
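The attention-guided pruning step described above can be sketched roughly as follows: score each image patch by the cross-attention it receives from report tokens, keep only the top fraction, and connect the survivors into a small spatial graph. This is a minimal NumPy illustration, not the paper's implementation; the attention scores, 14×14 patch grid, and 5% keep ratio are all hypothetical stand-ins.

```python
import numpy as np

def prune_patches(attn_scores: np.ndarray, keep_ratio: float = 0.05) -> np.ndarray:
    """Keep the patches with the highest cross-attention mass.

    attn_scores: shape (num_patches,), mean attention each image patch
                 receives from the report tokens (hypothetical input).
    Returns the indices of retained patches, sorted ascending.
    """
    k = max(1, int(round(keep_ratio * attn_scores.size)))
    kept = np.argsort(attn_scores)[-k:]  # indices of the top-k salient patches
    return np.sort(kept)

def build_visual_graph(kept: np.ndarray, grid: int) -> list:
    """Connect retained patches that are 4-neighbours on the patch grid."""
    kept_set = set(kept.tolist())
    edges = []
    for p in kept:
        r, c = divmod(int(p), grid)
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
            if r + dr < grid and c + dc < grid:
                q = (r + dr) * grid + (c + dc)
                if q in kept_set:
                    edges.append((int(p), q))
    return edges

# Example: a 14x14 patch grid (196 patches), keeping ~5% of them.
rng = np.random.default_rng(0)
scores = rng.random(196)                 # placeholder for real attention scores
kept = prune_patches(scores, keep_ratio=0.05)
edges = build_visual_graph(kept, grid=14)
print(len(kept), round(1 - len(kept) / scores.size, 3))  # nodes kept, compression rate
```

With a 5% keep ratio the nominal compression rate is about 95%, in the same range as the 93.4%–97.7% the paper reports, though the paper's actual rate comes from its full pipeline, not this toy calculation.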

📝 Abstract
The rapid growth of multimodal medical imaging data presents significant storage and transmission challenges, particularly in resource-constrained clinical settings. We propose NEURAL, a novel framework that addresses this through semantics-guided data compression. Our approach repurposes cross-attention scores between an image and its radiological report, taken from a fine-tuned generative vision-language model, to structurally prune chest X-rays, preserving only diagnostically critical regions. This process transforms the image into a highly compressed graph representation, which is then fused with a knowledge graph derived from the clinical report to create a universal data structure that simplifies downstream modeling. Validated on the MIMIC-CXR and CheXpert Plus datasets for pneumonia detection, NEURAL achieves a 93.4%–97.7% reduction in image data size while maintaining high diagnostic performance (0.88–0.95 AUC), outperforming baseline models that use uncompressed data. By creating a persistent, task-agnostic data asset, NEURAL resolves the trade-off between data size and clinical utility, enabling efficient workflows and teleradiology without sacrificing performance. Our NEURAL code is available at https://github.com/basiralab/NEURAL.
Problem

Research questions and friction points this paper is trying to address.

Compress multimodal medical imaging data efficiently
Prune chest X-rays while preserving diagnostically critical regions
Maintain high diagnostic performance with reduced data size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantics-guided data compression for medical images
Attention-guided pruning preserves critical diagnostic regions
Graph-based fusion of visual and clinical report data
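The last innovation listed, graph-based fusion of the visual and report-derived graphs, could be sketched as merging the two edge sets into one typed graph and adding cross-modal links from each report entity to the image patches it attends to. This is purely illustrative: the function names, node labelling scheme, and entity-to-patch linking rule are assumptions, not the paper's actual fusion procedure.

```python
def fuse_graphs(visual_edges, report_edges, entity_to_patches):
    """Merge a pruned visual graph and a report knowledge graph.

    visual_edges:      (patch_id, patch_id) pairs from the pruned image.
    report_edges:      (entity, entity) pairs from the clinical report.
    entity_to_patches: entity -> list of patch ids it attends to
                       (a hypothetical grounding signal).
    Returns a (nodes, edges) pair with modality-tagged nodes and typed edges.
    """
    nodes, edges = set(), []
    for u, v in visual_edges:                     # intra-image structure
        nodes |= {("img", u), ("img", v)}
        edges.append((("img", u), ("img", v), "spatial"))
    for u, v in report_edges:                     # intra-report structure
        nodes |= {("txt", u), ("txt", v)}
        edges.append((("txt", u), ("txt", v), "semantic"))
    for entity, patches in entity_to_patches.items():
        for p in patches:                         # cross-modal grounding links
            edges.append((("txt", entity), ("img", p), "grounds"))
    return nodes, edges

# Toy example: two image edges, one report relation, one grounded entity.
nodes, edges = fuse_graphs(
    visual_edges=[(3, 4), (4, 18)],
    report_edges=[("pneumonia", "right lower lobe")],
    entity_to_patches={"pneumonia": [4, 18]},
)
print(len(nodes), len(edges))
```

Tagging nodes with their modality keeps patch ids and entity names from colliding, while typed edges let a downstream graph model distinguish spatial, semantic, and cross-modal relations, which is one plausible way the "unified, task-agnostic" property could be realised.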