🤖 AI Summary
The rapid growth of the astronomical literature threatens to outpace manual labeling of telescope references. This puts at risk the telescope bibliographies used to measure the scientific impact of facilities and archives. To address this, the authors present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references in paper text. A modified version of amc achieves a macro-F1 score of 0.84 on the held-out test set of the TRACS Kaggle challenge. The tool is not limited to TRACS: it was originally developed to identify papers featuring scientific results from NASA missions. The authors also show how amc can interrogate historical datasets to surface potential label errors.
📝 Abstract
Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure that we can measure the scientific impact of our facilities and archives. However, the growing rate of publications threatens to outpace our ability to manually label astronomical literature. We therefore present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references by processing large quantities of paper text. A modified version of amc performs well on the TRACS Kaggle challenge, achieving a macro $F_1$ score of 0.84 on the held-out test set. Beyond TRACS, amc is useful for other telescopes: we originally developed the software to identify papers featuring scientific results from NASA missions. Additionally, we investigate how amc can be used to interrogate historical datasets and surface potential label errors. Our work demonstrates that LLM-based applications offer powerful and scalable assistance for library sciences.
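For readers unfamiliar with the metric: macro $F_1$ computes $F_1 = 2PR/(P+R)$ separately for each of the $K$ categories (with $P$ the precision and $R$ the recall for that category). It then takes the unweighted mean, $\frac{1}{K}\sum_k F_{1,k}$, so rare categories count as much as common ones. The sketch below is a minimal pure-Python illustration. The category names are hypothetical placeholders, not the actual TRACS label set, and the exact TRACS scoring procedure may differ, for example in how it handles classes with no predictions.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro-F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Default: score every class that appears in either the labels or predictions.
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    f1_scores = []
    for label in labels:
        prec_denom = tp[label] + fp[label]
        rec_denom = tp[label] + fn[label]
        # Convention assumed here: undefined precision/recall counts as 0.
        precision = tp[label] / prec_denom if prec_denom else 0.0
        recall = tp[label] / rec_denom if rec_denom else 0.0
        if precision + recall == 0:
            f1_scores.append(0.0)
        else:
            f1_scores.append(2 * precision * recall / (precision + recall))

    # Macro averaging: every class gets equal weight, regardless of frequency.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for four papers (not TRACS data).
y_true = ["science", "science", "mention", "none"]
y_pred = ["science", "mention", "mention", "none"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Here "science" and "mention" each score $F_1 = 2/3$ and "none" scores 1, so the macro average is $7/9 \approx 0.778$. A micro average would instead pool all counts and could hide poor performance on an infrequent category.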