Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?

📅 2025-07-25

🤖 AI Summary

To address the distribution shifts that arise when multi-source data are pooled for medical image segmentation, this work replaces the conventional i.i.d. assumption with an exchangeability assumption across datasets and uses it to build a causally inspired framework for controlling foreground–background feature discrepancies across network layers. Methodologically, it combines deep feature disentanglement with causal inference to explicitly model and constrain the divergence between foreground and background representations at every layer, with the aim of mitigating the performance degradation that data addition can cause. Evaluated on five histopathology and ultrasound datasets (including a newly curated ultrasound dataset) and three mainstream architectures, the method achieves state-of-the-art segmentation performance under distribution shift, and qualitative results show sharper boundaries and more precise anatomical detail than prominent baselines. Key contributions are: (i) the first application of the exchangeability assumption to data pooling in medical image segmentation; and (ii) an interpretable, intervenable causal mechanism for regulating feature discrepancies across layers.
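The summary says the method constrains foreground and background representations at every layer, but it does not state the exact objective. The sketch below is one hypothetical form such a term could take, not the authors' loss: at each layer it averages the per-pixel features inside and outside the mask into a foreground and a background prototype, scores the two prototypes with cosine similarity, and averages the scores over layers, so lower values mean better separation. The function names, the prototype-plus-cosine formulation, and the assumption that each mask has already been resized to its layer's resolution are all illustrative choices.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of a cross-layer foreground/background separation term.
# Assumptions (not from the paper): prototypes are masked means, separation is
# scored with cosine similarity, and each mask is pre-resized to its layer.

Vector = List[float]


def masked_mean(features: Sequence[Vector], mask: Sequence[int], target: int) -> Vector:
    """Average the per-pixel feature vectors whose mask label equals `target`."""
    selected = [f for f, m in zip(features, mask) if m == target]
    if not selected:
        raise ValueError(f"no pixels with label {target} at this layer")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def cross_layer_fg_bg_discrepancy(
    layers: Sequence[Sequence[Vector]], masks: Sequence[Sequence[int]]
) -> float:
    """Mean foreground/background prototype similarity over all layers.

    `layers[i]` holds the flattened per-pixel features of layer i and
    `masks[i]` the matching binary labels (1 = foreground, 0 = background).
    Lower values mean the two prototypes are further apart at every depth.
    """
    if not layers:
        raise ValueError("need at least one layer")
    if len(layers) != len(masks):
        raise ValueError("need exactly one mask per layer")
    sims = []
    for features, mask in zip(layers, masks):
        fg = masked_mean(features, mask, 1)
        bg = masked_mean(features, mask, 0)
        sims.append(cosine_similarity(fg, bg))
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Two toy layers with four pixels each and 2-D features.
    toy_layers = [
        [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],  # fg and bg well separated
        [[1.0, 1.0]] * 4,  # fg and bg indistinguishable
    ]
    toy_masks = [[1, 1, 0, 0], [1, 0, 1, 0]]
    print(cross_layer_fg_bg_discrepancy(toy_layers, toy_masks))  # roughly 0.5
```

In a real training loop this term would be written with tensors in a deep learning framework so that gradients flow back through every layer; plain Python lists are used here only to keep the example self-contained. Whether the discrepancy should be pushed down, bounded, or matched to a target is a design decision the summary does not settle.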

📝 Abstract

Data scarcity is a major challenge in medical imaging, particularly for deep learning models. While data pooling (combining datasets from multiple sources) and data addition (adding more data from a new dataset) have been shown to enhance model performance, they are not without complications. Specifically, increasing the size of the training dataset through pooling or addition can induce distributional shifts that degrade downstream model performance, a phenomenon known as the "Data Addition Dilemma". While the traditional i.i.d. assumption may not hold in multi-source contexts, assuming exchangeability across datasets provides a more practical framework for data pooling. In this work, we investigate medical image segmentation under these conditions, drawing insights from causal frameworks to propose a method for controlling foreground-background feature discrepancies across all layers of deep networks. This approach improves feature representations, which are crucial in data-addition scenarios. Our method achieves state-of-the-art segmentation performance on histopathology and ultrasound images across five datasets, including a novel ultrasound dataset that we have curated and contributed. Qualitative results demonstrate more refined and accurate segmentation maps compared to prominent baselines across three model architectures. The code will be available on GitHub.
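The abstract's central distinction is between i.i.d. samples and exchangeable ones. A sequence is exchangeable when every reordering of it has the same probability; i.i.d. sequences are always exchangeable, but the converse does not hold. The toy example below (an illustration, not taken from the paper) makes this concrete with a hypothetical pooled dataset of two images from source A and one from source B, visited in a uniformly random order as a shuffled training pool would be. It enumerates every ordering exactly with `fractions.Fraction`, confirms that the sequence of source labels is exchangeable, and shows that consecutive draws are still dependent, so they are not i.i.d.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

# Toy illustration (not from the paper): a hypothetical pooled dataset with
# two images from source "A" and one from source "B", visited in a uniformly
# random order, as a shuffled training pool would be.
pool = ["A", "A", "B"]


def sequence_distribution(items):
    """Exact probability of each source-label sequence under a uniform shuffle."""
    orders = list(permutations(range(len(items))))
    counts = Counter(tuple(items[i] for i in order) for order in orders)
    return {seq: Fraction(c, len(orders)) for seq, c in counts.items()}


def is_exchangeable(dist):
    """True if every reordering of a label sequence has the same probability."""
    for seq, p in dist.items():
        for perm in set(permutations(seq)):
            if dist.get(perm, Fraction(0)) != p:
                return False
    return True


dist = sequence_distribution(pool)
p_first_b = sum(p for s, p in dist.items() if s[0] == "B")
p_second_b = sum(p for s, p in dist.items() if s[1] == "B")
p_both_b = sum(p for s, p in dist.items() if s[0] == "B" and s[1] == "B")

print(is_exchangeable(dist))             # True: order does not matter
print(p_first_b, p_second_b)             # 1/3 1/3: identical marginals
print(p_both_b, p_first_b * p_second_b)  # 0 1/9: joint != product, so not independent
```

Here the dependence comes from sampling without replacement, which is only one way a sequence can be exchangeable without being i.i.d.; the example shows the difference between the two assumptions rather than modeling the paper's multi-source setting.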

Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in medical image segmentation

Handling distribution shifts from multi-source data pooling

Improving feature representation for better segmentation accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Exchangeability assumption for multi-source data pooling

Causally inspired control of foreground-background feature discrepancies across all network layers

Improved segmentation in data-scarce medical imaging
