Response-Aware Multimodal Learning for Post-Treatment Visual Acuity Forecasting

📅 2026-05-30

🤖 AI Summary

This study addresses the challenge of predicting long-term visual acuity trajectories in patients with diabetic macular edema after anti-VEGF therapy using only early observational data. The authors propose ReVA, a multimodal prediction model that integrates baseline and one-month post-treatment optical coherence tomography (OCT) images, OCT-derived biomarkers, and clinical variables. The approach is treatment-response-aware: a spatial attention module preserves localized, prognosis-relevant imaging features, and a dependency-aware tabular encoder models interactions among the structured clinical variables. The fused representation yields patient-specific visual acuity forecasts at 3, 6, 12, 18, and 24 months after the start of therapy. On the 24-month prediction task, the model achieves a mean absolute error (MAE) of 0.1246, a root mean square error (RMSE) of 0.1621, and an R² of 0.6064, and the authors report consistent performance across all forecast horizons.
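To make the data flow in the summary concrete, the following is a minimal pure-Python sketch of attention-weighted spatial pooling followed by multimodal fusion and a five-horizon output. It is not the authors' implementation: the function names, the single shared scoring vector `score_weights`, the use of a month-1 minus baseline difference as the response signal, and the plain linear head are all assumptions made for illustration. Only the five forecast horizons come from the source.

```python
import math

# Forecast horizons (months) named in the source.
HORIZONS_MONTHS = [3, 6, 12, 18, 24]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention_pool(feature_map, score_weights):
    """Pool a flattened grid of feature vectors with attention weights.

    feature_map: list of per-location feature vectors (one per grid cell).
    score_weights: hypothetical learned vector that scores each location.
    Returns the pooled vector and the attention weights.
    """
    scores = [sum(f * w for f, w in zip(vec, score_weights)) for vec in feature_map]
    attn = softmax(scores)
    dim = len(feature_map[0])
    pooled = [sum(a * vec[d] for a, vec in zip(attn, feature_map)) for d in range(dim)]
    return pooled, attn


def forecast(baseline_map, month1_map, tabular, score_weights, head_weights, head_bias):
    """Fuse both OCT time points with tabular variables and predict one value per horizon."""
    base_vec, _ = spatial_attention_pool(baseline_map, score_weights)
    m1_vec, _ = spatial_attention_pool(month1_map, score_weights)
    # Assumed early-response signal: change from baseline to month 1.
    delta = [m - b for b, m in zip(base_vec, m1_vec)]
    fused = base_vec + m1_vec + delta + list(tabular)
    # Assumed linear head: one weight row and one bias per horizon.
    values = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(head_weights, head_bias)]
    return dict(zip(HORIZONS_MONTHS, values))
```

In this sketch the attention weights sum to one over spatial locations, so regions that score higher contribute more to the pooled vector instead of being averaged away by global pooling. A real model would learn the scoring and head parameters and would replace the concatenation of tabular values with a learned tabular encoder.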

📝 Abstract

Long-term visual acuity (VA) outcomes after anti-VEGF therapy are central to patient counseling, expectation setting, and follow-up planning in diabetic macular edema (DME). However, in clinical practice, physicians must often estimate long-term visual trajectories based only on early post-treatment findings, making reliable prognostication difficult. Although prior OCT-based learning approaches have largely focused on short-term response or single-endpoint prediction, modeling VA trajectories across multiple future time points from early longitudinal observations remains insufficiently explored. In this study, we assembled a real-world cohort of 188 anti-VEGF-treated DME patients with paired baseline and month-1 OCT scans, along with tabular OCT-derived biomarkers and non-imaging clinical variables. Using only these early data, we formulate a multi-horizon VA forecasting problem aimed at predicting visual outcomes at 3, 6, 12, 18, and 24 months, reflecting clinically meaningful follow-up intervals. We propose ReVA, a response-aware multimodal framework that integrates structural features from baseline and month-1 OCT with the tabular variables to capture baseline disease status and early treatment response. ReVA uses spatial attention to preserve localized prognostic imaging features and a dependency-aware tabular encoder to model interactions among clinical variables. These multimodal representations are fused to predict patient-specific long-term visual acuity trajectories. The proposed framework achieves MAE=0.1246, RMSE=0.1621, and R^2=0.6064 for 24-month VA prediction, with consistent performance across all forecast horizons. Our findings show that incorporating early treatment-response signals enables clinically meaningful long-term visual acuity forecasting, supporting data-driven decision support for routine anti-VEGF management.
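For readers less familiar with the three reported metrics, the sketch below shows how MAE, RMSE, and R² are conventionally computed for a single forecast horizon. The function name `regression_metrics` is hypothetical, and the standard textbook definitions are assumed; the source does not state the exact evaluation protocol (for example, how folds or splits are aggregated).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # undefined if y_true is constant
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Toy values invented for illustration only; they are not data from the study.
metrics_24m = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 5.0])
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same predictions, which is consistent with the reported 0.1621 versus 0.1246. Applying the same function separately at each of the five horizons would give the per-horizon comparison the abstract refers to.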

Problem

Research questions and friction points this paper is trying to address.

visual acuity forecasting

anti-VEGF therapy

diabetic macular edema

long-term prognosis

multimodal learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

response-aware learning

multimodal fusion

longitudinal forecasting