AI Summary
This study addresses a central trade-off in drug discovery: structure-based screening scales well but lacks biological context, whereas phenotypic screening is biologically informative but costly and hard to scale. To bridge the two, the authors propose DECODE, a framework that embeds biological semantics directly into chemical representations. Using a limited set of paired transcriptomic and morphological data as supervisory signals during training, DECODE learns a biological fingerprint encoder that needs only chemical structures at inference and explicitly filters experimental noise. This enables zero-shot, structure-based in silico biological profiling. In zero-shot mechanism-of-action (MOA) prediction, DECODE achieves over 20% relative improvement over chemical baselines, and in external validation it yields a 6-fold increase in hit rates for novel anticancer compounds.
Abstract
Motivation: The scalable identification of bioactive compounds is essential for contemporary drug discovery. This process faces a key trade-off: structural screening offers scalability but lacks biological context, whereas high-content phenotypic profiling provides deep biological insights but is resource-intensive. The primary challenge is to extract robust biological signals from noisy data and encode them into representations that do not require biological data at inference.
Results: This study presents DECODE (DEcomposing Cellular Observations of Drug Effects), a framework that bridges this gap by endowing chemical representations with intrinsic biological semantics, enabling structure-based in silico biological profiling. During training, DECODE uses limited paired transcriptomic and morphological data as supervisory signals, which allows it to extract a measurement-invariant biological fingerprint from chemical structures and to explicitly filter experimental noise. Our evaluations demonstrate that DECODE retrieves functionally similar drugs in zero-shot settings, with over 20% relative improvement over chemical baselines in mechanism-of-action (MOA) prediction. Furthermore, the framework achieves a 6-fold increase in hit rates for novel anti-cancer agents during external validation.
Availability and implementation: The code and datasets of DECODE are available at https://github.com/lian-xiao/DECODE.