🤖 AI Summary
Current single-cell annotation methods suffer from limited automation, poor alignment with human cognitive reasoning, and the suboptimal performance of general-purpose large language models (LLMs) on this task.
Method: We propose a knowledge-driven, graph-augmented LLM framework. It constructs a cell-type–marker knowledge graph, integrates differential-gene-guided, graph-based retrieval-augmented generation (RAG), and performs multi-task fine-tuning of the LLM, augmented by a semantic similarity alignment optimization strategy.
Contribution/Results: This work pioneers a deep synergy between domain-specific knowledge graphs and LLMs, explicitly emulating human annotation cognition during inference. Evaluated across 11 tissue datasets, our method achieves up to a +0.21 improvement in human expert evaluation scores and a 6.1% gain in semantic consistency, significantly outperforming general-purpose LLMs.
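The retrieval step described above can be sketched as follows. This is a minimal, hypothetical illustration, assuming the knowledge graph stores cell-type-to-marker edges and that candidates are ranked by overlap with a cluster's differentially expressed genes; the gene symbols, cell types, and scoring function are illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch of differential-gene-guided graph retrieval over a
# toy cell-type–marker knowledge graph. Marker sets are illustrative only.

# Toy knowledge graph: each cell type node links to its marker-gene nodes.
KG = {
    "T cell": {"CD3D", "CD3E", "IL7R"},
    "B cell": {"CD79A", "MS4A1", "CD19"},
    "NK cell": {"GNLY", "NKG7", "KLRD1"},
}

def retrieve_candidates(diff_genes, kg, top_k=2):
    """Rank cell types by Jaccard overlap between a cluster's differential
    genes and each cell type's marker set in the knowledge graph."""
    genes = set(diff_genes)
    scores = {}
    for cell_type, markers in kg.items():
        union = genes | markers
        scores[cell_type] = len(genes & markers) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# The retrieved entities would then be injected into the LLM prompt as
# grounding context for annotation.
hits = retrieve_candidates(["CD3D", "IL7R", "CCR7"], KG)
```

In a real pipeline the Jaccard score would likely be replaced or combined with graph-walk or embedding-based relevance, but the retrieve-then-prompt structure is the same.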
📝 Abstract
To enable precise and fully automated cell type annotation with large language models (LLMs), we developed a graph-structured feature–marker database and retrieve the entities linked to differentially expressed genes to support cell-type reconstruction. We further designed a multi-task workflow to optimize the annotation process. Compared to general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types, while aligning more closely with the cognitive logic of manual annotation.
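The semantic-similarity metric reported above can be illustrated with a small sketch. This assumes predicted and reference cell-type labels are embedded by some sentence encoder and compared by cosine similarity; the vectors below are made-up placeholders, not outputs of the paper's model.

```python
# Hypothetical sketch of semantic-consistency scoring between a predicted
# and a reference cell-type label, given precomputed label embeddings.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Illustrative embeddings; in practice these would come from an encoder
# applied to label strings like "CD8+ T cell" vs "cytotoxic T cell".
pred_emb = [0.9, 0.1, 0.2]
ref_emb = [0.8, 0.2, 0.1]
score = cosine(pred_emb, ref_emb)
```

A higher score indicates that the predicted label, even when not string-identical to the reference, names a semantically equivalent cell type.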