Boosted Random Forests for Predicting Treatment Failure of Chemotherapy Regimens

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Background: Predicting chemotherapy treatment failure remains a critical clinical challenge. Method: Using real-world evidence from an oncology electronic medical/health record (EMR/EHR) system, this study builds interpretable treatment failure models intended to support clinical decision-making. Focusing on the five primary cancer types with the most frequent treatment failures (discontinuations), we engineer feature vectors from three sources: clinical notes, diagnoses, and medications. Candidate models are compared with a novel design exploration framework along three axes (performance, complexity, and explainability), and boosted random forests are selected. Results: The selected model reaches a baseline accuracy of 80% and an F1 score of 75% with reduced model complexity, which makes it more interpretable to and usable by oncologists. The work aims to reduce the physical, financial, and emotional burden that ineffective or prematurely changed chemotherapy regimens place on patients and their families.
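Accuracy and F1 measure different things, so it helps to see how a single confusion matrix yields both figures. The counts below are hypothetical (not the paper's data) and were chosen only to show that an 80% accuracy and a 75% F1 score can arise together; label 1 marks a failed plan.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 = treatment failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    """Accuracy counts every correct call; F1 balances precision and recall on class 1 only."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical cohort of 100 plans (NOT the paper's data): 30 TP, 10 FP, 10 FN, 50 TN.
y_true = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 50
y_pred = [1] * 30 + [1] * 10 + [0] * 10 + [0] * 50
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")  # accuracy=0.80 f1=0.75
```

Because F1 ignores true negatives, it is the more informative of the two when failed plans are the minority class, which is a common reason to report both.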

📝 Abstract

Cancer patients may undergo lengthy and painful chemotherapy treatments comprising several successive regimens or plans. Treatment inefficacy and other adverse events can lead to discontinuation (or failure) of these plans or to premature changes in them, which imposes significant physical, financial, and emotional toxicity on patients and their families. In this work, we build treatment failure models based on Real World Evidence (RWE) gathered from patient profiles in our oncology EMR/EHR system. We also describe our feature engineering pipeline, our experimental methods, and the insights about treatment failures obtained from the trained models. We report findings on the five primary cancer types with the most frequent treatment failures (or discontinuations), for which we build novel feature vectors from the clinical notes, diagnoses, and medications available in our oncology EMR. After applying a novel design exploration framework along three axes (performance, complexity, and explainability), we select boosted random forests because they provide a baseline accuracy of 80% and an F1 score of 75% with reduced model complexity, making them more interpretable to and usable by oncologists.
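The abstract does not say how the three design axes are traded off. One minimal way to reason about such an exploration, assuming every candidate model has already been scored on all three axes, is a Pareto filter that discards any model beaten on every axis at once. The axis proxies (tree-node count for complexity, a 0 to 1 rating for explainability), the `Candidate` type, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    f1: float              # performance: higher is better
    complexity: int        # hypothetical proxy, e.g. total tree nodes: lower is better
    explainability: float  # hypothetical proxy, e.g. clinician rating in [0, 1]: higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on all three axes and strictly better on at least one."""
    no_worse = (a.f1 >= b.f1 and a.complexity <= b.complexity
                and a.explainability >= b.explainability)
    better = (a.f1 > b.f1 or a.complexity < b.complexity
              or a.explainability > b.explainability)
    return no_worse and better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up scores, purely to exercise the code.
cands = [
    Candidate("model_a", 0.77, 50_000, 0.2),
    Candidate("model_b", 0.75, 3_000, 0.6),
    Candidate("model_c", 0.74, 20_000, 0.4),
    Candidate("model_d", 0.68, 200, 0.9),
]
print([c.name for c in pareto_front(cands)])  # ['model_a', 'model_b', 'model_d']
```

Here `model_c` is dropped because `model_b` is better on every axis; choosing among the survivors remains a judgment call that the filter itself does not make.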

Problem

Research questions and friction points this paper is trying to address.

Predict chemotherapy treatment failure using patient data

Build interpretable models to assist oncologists in decision-making

Analyze clinical notes and medications for treatment discontinuation patterns
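The summary does not describe the note-processing pipeline in detail. As an illustration only, the sketch below turns one patient's diagnoses, medications, and a clinical note into namespaced binary features and stacks them into a multi-hot matrix. The record fields (`diagnoses`, `medications`, `note`), the `NOTE_TERMS` keyword list, and the helper names are hypothetical, and plain keyword matching stands in for whatever note analysis the authors actually use.

```python
import re

# Hypothetical keyword list; the paper's actual note features are not specified here.
NOTE_TERMS = {"neuropathy", "nausea", "progression", "fatigue"}


def patient_features(record):
    """Turn one patient's EMR snapshot into a set of source-prefixed binary features."""
    feats = {f"dx:{code}" for code in record.get("diagnoses", [])}
    feats |= {f"med:{m.lower()}" for m in record.get("medications", [])}
    tokens = set(re.findall(r"[a-z]+", record.get("note", "").lower()))
    feats |= {f"note:{t}" for t in tokens & NOTE_TERMS}
    return feats


def vectorize(records):
    """Build a shared sorted vocabulary and a multi-hot matrix (one 0/1 row per patient)."""
    per_patient = [patient_features(r) for r in records]
    vocab = sorted(set().union(*per_patient))
    index = {f: i for i, f in enumerate(vocab)}
    rows = []
    for feats in per_patient:
        row = [0] * len(vocab)
        for f in feats:
            row[index[f]] = 1
        rows.append(row)
    return vocab, rows


# Hypothetical records, not real patients.
records = [
    {"diagnoses": ["C50.9"], "medications": ["Paclitaxel"],
     "note": "Grade 2 neuropathy reported."},
    {"diagnoses": ["C34.90"], "medications": ["Carboplatin", "Pemetrexed"],
     "note": "Imaging shows progression."},
]
vocab, rows = vectorize(records)
print(vocab)
print(rows)
```

Prefixing each feature with its source keeps a diagnosis code and a note term from colliding, and it lets an oncologist see at a glance which part of the record a feature came from.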

Innovation

Methods, ideas, or system contributions that make the work stand out.

Boosted random forests for treatment failure prediction

Feature engineering from clinical notes and EMR data

Design exploration balancing performance, complexity, and explainability
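The summary does not specify how boosting and random forests are combined. One plausible reading, sketched here under that assumption, is discrete AdaBoost whose weak learners are small random forests: each forest is a majority vote of decision stumps, and each stump is fit on a bootstrap sample drawn according to the current boosting weights, using a random subset of binary features. All function names and defaults are hypothetical, and the paper's model may differ (for example, deeper trees or another boosting scheme). The alpha-weighted count of features used by the stumps hints at why small models are easy to inspect.

```python
import math
import random


def fit_stump(X, y, idx, features):
    """Best single-feature rule on rows idx: predict s if x[j] == 1 else -s."""
    best = (math.inf, features[0], 1)
    for j in features:
        for s in (1, -1):
            err = sum(1 for i in idx if (s if X[i][j] == 1 else -s) != y[i])
            if err < best[0]:
                best = (err, j, s)
    return best[1], best[2]


def stump_predict(stump, x):
    j, s = stump
    return s if x[j] == 1 else -s


def fit_forest(X, y, w, n_trees, max_features, rng):
    """A small random forest of stumps, each fit on a weighted bootstrap sample."""
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), weights=w, k=n)         # boosting by resampling
        feats = sorted(rng.sample(range(d), max_features))  # random feature subset
        forest.append(fit_stump(X, y, idx, feats))
    return forest


def forest_predict(forest, x):
    """Majority vote of the stumps; with an even n_trees, ties fall to -1."""
    return 1 if sum(stump_predict(t, x) for t in forest) > 0 else -1


def fit_boosted_forest(X, y, n_rounds=10, n_trees=5, max_features=None, seed=0):
    """Discrete AdaBoost with small random forests as weak learners.
    Labels must be +1 (treatment failure) or -1 (no failure)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    max_features = max_features or max(1, int(math.sqrt(d)))
    w = [1.0 / n] * n
    model = []  # list of (alpha, forest)
    for _ in range(n_rounds):
        forest = fit_forest(X, y, w, n_trees, max_features, rng)
        h = [forest_predict(forest, x) for x in X]
        err = sum(wi for wi, hi, yi in zip(w, h, y) if hi != yi)
        if err >= 0.5:
            break  # no better than chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        model.append((alpha, forest))
        if err == 0:
            break  # training data already fit perfectly
        # Up-weight misclassified plans so the next forest focuses on them.
        w = [wi * math.exp(-alpha * yi * hi) for wi, hi, yi in zip(w, h, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted forest votes (an empty model predicts +1)."""
    score = sum(alpha * forest_predict(forest, x) for alpha, forest in model)
    return 1 if score >= 0 else -1


def feature_importance(model):
    """Alpha-weighted share of stumps that split on each feature index."""
    imp = {}
    for alpha, forest in model:
        for j, _ in forest:
            imp[j] = imp.get(j, 0.0) + alpha / len(forest)
    return imp


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 alone separates the classes.
    X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
    y = [1, 1, -1, -1, 1, -1]
    model = fit_boosted_forest(X, y, n_rounds=5, n_trees=5)
    print([predict(model, x) for x in X], feature_importance(model))
```

Keeping the weak learners as stumps makes every vote traceable to one feature, which is the kind of complexity versus explainability trade-off the design exploration weighs.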

🔎 Similar Papers

Predictive Model Development to Identify Failed Healing in Patients after Non–Union Fracture Surgery