🤖 AI Summary
Identifying disease-associated genes from gene expression data still relies heavily on manual, expert-driven analysis, which limits scalability. Method: We propose GenoTEX, a benchmark for evaluating automated analysis of gene expression data across dataset selection, preprocessing, and statistical analysis, with code and results annotated by bioinformaticians. We formalize domain-expert practices as quantifiable tasks for LLM agents and introduce GenoAgent, a team of LLM-based agents that collaborate through a multi-step programming workflow with flexible self-correction. Contribution/Results: On GenoTEX, GenoAgent serves as a baseline and shows the potential of LLM agents for end-to-end gene expression analysis. Error analysis identifies semantic understanding and domain-logic modeling as the primary bottlenecks. This work establishes a reproducible, evaluable benchmark and a methodological paradigm for biomedical AI agents.
📝 Abstract
Recent advancements in machine learning have significantly improved the identification of disease-associated genes from gene expression datasets. However, these processes often require extensive expertise and manual effort, limiting their scalability. Large Language Model (LLM)-based agents have shown promise in automating these tasks due to their increasing problem-solving abilities. To support the evaluation and development of such methods, we introduce GenoTEX, a benchmark dataset for the automated analysis of gene expression data. GenoTEX provides annotated code and results for solving a wide range of gene identification problems, encompassing dataset selection, preprocessing, and statistical analysis, in a pipeline that follows computational genomics standards. The benchmark includes expert-curated annotations from bioinformaticians to ensure accuracy and reliability. To provide baselines for these tasks, we present GenoAgent, a team of LLM-based agents that adopt a multi-step programming workflow with flexible self-correction, to collaboratively analyze gene expression datasets. Our experiments demonstrate the potential of LLM-based methods in analyzing genomic data, while error analysis highlights the challenges and areas for future improvement. We propose GenoTEX as a promising resource for benchmarking and enhancing automated methods for gene expression data analysis. The benchmark is available at https://github.com/Liu-Hy/GenoTex.
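The abstract does not say how GenoAgent implements its self-correction. The sketch below is therefore a hypothetical illustration of one common pattern: a generate, execute, review loop that runs once for each pipeline stage. Only the three stage names come from the abstract. Every function and class name (`run_step_with_self_correction`, `run_pipeline`, `StepResult`, the `toy_*` stubs) is an assumption. In a real system, `write_code` and `review` would call an LLM, and `execute` would run the generated code in a sandbox.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Stage names taken from the abstract; everything else here is hypothetical.
STAGES = ("dataset selection", "preprocessing", "statistical analysis")


@dataclass
class StepResult:
    code: str
    output: Optional[str]
    error: Optional[str]
    attempts: int
    accepted: bool


def run_step_with_self_correction(
    task: str,
    write_code: Callable[[str, Optional[str]], str],  # (task, feedback) -> code
    review: Callable[[str, Optional[str], Optional[str]], Optional[str]],  # None = approved
    execute: Callable[[str], str],  # runs code, returns output or raises
    max_attempts: int = 3,
) -> StepResult:
    """Draft code, run it, and feed reviewer comments back until approved."""
    feedback: Optional[str] = None
    code, output, error = "", None, None
    for attempt in range(1, max_attempts + 1):
        code = write_code(task, feedback)
        try:
            output, error = execute(code), None
        except Exception as exc:  # execution errors become reviewer input
            output, error = None, f"{type(exc).__name__}: {exc}"
        feedback = review(code, output, error)
        if feedback is None:
            return StepResult(code, output, error, attempt, accepted=True)
    return StepResult(code, output, error, max_attempts, accepted=False)


def run_pipeline(stages: Sequence[str], write_code, review, execute,
                 max_attempts: int = 3) -> Dict[str, StepResult]:
    """Run the stages in order and stop at the first stage that is not accepted."""
    results: Dict[str, StepResult] = {}
    for stage in stages:
        result = run_step_with_self_correction(stage, write_code, review, execute, max_attempts)
        results[stage] = result
        if not result.accepted:
            break  # later stages depend on this one's output
    return results


# Toy stubs standing in for LLM agents and a code executor.
def toy_writer(task: str, feedback: Optional[str]) -> str:
    # The first draft is buggy; the revision responds to the reviewer's feedback.
    return "1 / 0" if feedback is None else "sum([1, 2, 3])"


def toy_execute(code: str) -> str:
    return str(eval(code, {"__builtins__": {"sum": sum}}))


def toy_review(code: str, output: Optional[str], error: Optional[str]) -> Optional[str]:
    return f"Fix the error: {error}" if error else None


if __name__ == "__main__":
    for stage, res in run_pipeline(STAGES, toy_writer, toy_review, toy_execute).items():
        print(stage, res.accepted, res.attempts, res.output)
```

With these stubs, each stage fails on its first draft, receives the error as feedback, and is accepted on the second attempt. Stopping the pipeline at the first rejected stage reflects the assumption that preprocessing depends on dataset selection, and statistical analysis depends on preprocessing.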