🤖 AI Summary
It remains unclear whether attention maps from foundation models in computational pathology reflect genuine biology, which limits clinical trust and regulatory approval. This work proposes a hypothesis-free evaluation framework that uses co-registered spatial transcriptomics (Visium) data from 18 samples to quantitatively assess the biological coherence of attention maps from five pathology foundation models and a ResNet50 baseline, each trained to predict five molecular alterations in glioblastoma. Attention maps are more strongly enriched for multi-gene transcriptional pathways (Cohen's d = 0.329) than for individual genes (d = 0.055). Moreover, different encoders attend to distinct biological compartments, and the encoder rankings obtained internally on CPTAC invert under external validation on the independent TCGA cohort, exposing limitations of current evaluation paradigms.
📝 Abstract
Whether attention maps from pathology foundation models capture genuine biology remains unknown, yet this question is critical for clinical trust and regulatory approval. We propose a spatial transcriptomics-based framework for orthogonal, hypothesis-free evaluation of attention and apply it to five pathology foundation models (CONCH v1.5, UNI v2, Virchow2, GigaPath, H-Optimus-1) and a ResNet50 baseline. Using attention-based multiple instance learning, we train single-task and multi-task models to predict five molecular alterations in glioblastoma on the CPTAC cohort, validate on an independent TCGA cohort, and evaluate biological coherence of attention maps against 87 transcriptional signatures using co-registered Visium spatial transcriptomics data from 18 samples. Internally, no single encoder dominates across all tasks, and external validation inverts internal performance rankings. Attention maps show a five-fold enrichment gradient from pathways (Cohen's d=0.329) to individual genes (d=0.055), indicating that attention captures emergent multi-gene transcriptional programs rather than individual molecular events. Spatially smooth attention maps do not imply biological coherence, and different encoders attend to distinct biological compartments. Our framework provides objective, quantitative assessment of what foundation models learn from histopathology, moving the field beyond qualitative saliency map review.
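The enrichment values quoted above are Cohen's d effect sizes. The following is a minimal sketch of how such a score could be computed for one transcriptional signature. It rests on assumptions the abstract does not state: that every Visium spot carries one attention weight and one signature score, and that enrichment compares the highest-attention spots with the remaining spots. The function names and the `top_fraction` threshold are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def attention_enrichment(attention, signature_scores, top_fraction=0.25):
    """Hypothetical enrichment score for one signature.

    Assumption: spots are split into the top `top_fraction` by attention
    weight versus all remaining spots, and the signature scores of the
    two groups are compared with Cohen's d.
    """
    if len(attention) != len(signature_scores):
        raise ValueError("attention and signature_scores must align per spot")
    # Rank spot indices from highest to lowest attention.
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    k = max(2, int(round(top_fraction * len(order))))
    high = [signature_scores[i] for i in order[:k]]
    low = [signature_scores[i] for i in order[k:]]
    return cohens_d(high, low)
```

Under this reading, a positive d means the signature scores higher in the spots the model attends to most. By the conventional rule of thumb (about 0.2 small, 0.5 medium), 0.329 is a small-to-moderate effect and 0.055 is close to none.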