🤖 AI Summary
Problem: Predicting chemotherapy treatment failure remains a critical clinical challenge. Method: Using real-world oncology electronic health record (EHR) data, this study builds lightweight, interpretable models that predict treatment failure for clinical decision support. Focusing on the five primary cancer types with the most frequent treatment failures, it engineers feature vectors from clinical notes, diagnoses, and medication records. A three-axis design exploration framework, which weighs performance, complexity, and explainability, leads to the selection of boosted random forests. Results: The selected model reaches a baseline accuracy of 80% and an F1 score of 75% with reduced model complexity, which makes it more interpretable to and usable by oncologists. Such predictions could help reduce the patient burden and wasted healthcare resources associated with ineffective chemotherapy regimens.
📝 Abstract
Cancer patients may undergo lengthy and painful chemotherapy treatments comprising several successive regimens or plans. Treatment inefficacy and other adverse events can lead to the discontinuation (or failure) of these plans, or to premature changes to them, which causes significant physical, financial, and emotional toxicity for patients and their families. In this work, we build treatment failure models based on the Real World Evidence (RWE) gathered from patient profiles in our oncology EMR/EHR system. We also describe our feature engineering pipeline, our experimental methods, and the insights about treatment failures obtained from the trained models. We report findings on the five primary cancer types with the most frequent treatment failures (or discontinuations), for which we build novel feature vectors from the clinical notes, diagnoses, and medications available in our oncology EMR. Following a novel design exploration framework with three axes (performance, complexity, and explainability), we select boosted random forests because they provide a baseline accuracy of 80% and an F1 score of 75% with reduced model complexity, making them more interpretable to and usable by oncologists.
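To make the two reported metrics concrete, the following minimal sketch shows how accuracy and F1 are derived from confusion-matrix counts for a binary "treatment failed" label. The function name `accuracy_and_f1` and the counts are hypothetical and chosen only so that the arithmetic lands on 80% and 75%; they are not the study's data, and the abstract does not state how F1 was averaged or which class was treated as positive.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for the positive class from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Precision: share of predicted failures that were real failures.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of real failures that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts (NOT from the paper): 100 treatment plans,
# 40 of which actually failed.
acc, f1 = accuracy_and_f1(tp=30, fp=10, fn=10, tn=50)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")  # accuracy=0.80, F1=0.75
```

In this illustration accuracy exceeds F1 because the 50 correctly predicted non-failures raise accuracy but do not enter the F1 calculation. This is one reason to report both numbers when failures are the minority class.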