🤖 AI Summary
This study addresses the limitations of existing breast cancer recurrence prediction models, which often rely on a single data type and thus fail to capture the full clinical complexity. To overcome this, the authors propose a novel approach that integrates multimodal clinical data—both structured and unstructured—including treatment records, pathology reports, and clinical notes. Key tumor characteristics are extracted using rule-based regular expressions, and a priority-based conflict resolution mechanism is introduced to reconcile inconsistent information across sources. The enriched feature set is then leveraged by multiple machine learning models for recurrence risk prediction. Experimental results demonstrate that this multimodal fusion strategy consistently outperforms unimodal approaches across all evaluated models, significantly improving predictive accuracy and underscoring the value of integrating heterogeneous clinical data for precise prognostic assessment.
📝 Abstract
Breast cancer recurrence, a leading cause of long-term mortality among survivors, requires timely and accurate risk assessment to guide follow-up care and treatment planning. Traditional predictive models, often limited to either structured or unstructured data alone, struggle to capture the full clinical context. This study examines the impact of integrating multi-modal clinical data, including treatment records, pathology reports, and clinician notes, on recurrence prediction. By integrating a rule-based regular expression extraction mechanism with a rigorous precedence-based conflict reconciliation strategy, our approach effectively recovers definitive tumor characteristics from free-text pathology narratives to augment structured records. We also benchmark performance against commonly used feature sets from prior breast cancer studies to assess the added value of multi-modal integration. Single-source and multi-modal inputs are evaluated across a range of machine learning models. Results show that multi-modal integration consistently improves predictive accuracy compared to single-modal methods.