amc: The Automated Mission Classifier for Telescope Bibliographies

📅 2025-12-11
🤖 AI Summary
The rapid growth of astronomical literature threatens to outpace manual labeling of telescope references, putting at risk the bibliographies used to measure the scientific impact of facilities and archives. The authors present the Automated Mission Classifier (amc), an LLM-based tool that identifies and categorizes telescope references in paper text. A modified version of amc achieves a macro F1 score of 0.84 on the held-out test set of the TRACS Kaggle challenge. The tool was initially developed to identify papers featuring scientific results from NASA missions, and it can also be used to interrogate historical datasets and surface potential label errors.

📝 Abstract
Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure that we can measure the scientific impact of our facilities and archives. However, the growing rate of publications threatens to outpace our ability to manually label astronomical literature. We therefore present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references by processing large quantities of paper text. A modified version of amc performs well on the TRACS Kaggle challenge, achieving a macro $F_1$ score of 0.84 on the held-out test set. amc is valuable for other telescopes beyond TRACS; we developed the initial software for identifying papers that featured scientific results by NASA missions. Additionally, we investigate how amc can also be used to interrogate historical datasets and surface potential label errors. Our work demonstrates that LLM-based applications offer powerful and scalable assistance for library sciences.
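
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a macro $F_1$ score is computed in the single-label case: the per-class $F_1$ values are averaged with equal weight, so rare classes count as much as common ones. The function name `macro_f1` and the label strings are illustrative assumptions; the exact scoring setup of the TRACS challenge is not described here and may differ (for example, it may be multi-label).

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, chosen only for illustration.
example_true = ["science", "science", "mention", "none"]
example_pred = ["science", "mention", "mention", "none"]
example_score = macro_f1(example_true, example_pred)
```

In this toy example the per-class scores are 2/3, 2/3, and 1, so the macro average is 7/9.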
Problem

Research questions and friction points this paper is trying to address.

Automates classification of telescope references in literature
Addresses manual labeling limitations in astronomy publications
Enables scalable bibliographies for measuring scientific impact
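
As a rough illustration of the classification task listed above, the sketch below pulls short excerpts around telescope mentions from a paper's text and assembles them into a prompt for an LLM. This is not the amc implementation: the function names, the alias list, the window size, the prompt wording, and the answer categories are all hypothetical, and no LLM call is made.

```python
import re


def extract_mentions(text, aliases, window=80):
    """Return snippets of `text` surrounding each case-insensitive alias match."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b",
        re.IGNORECASE,
    )
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end].strip())
    return snippets


def build_prompt(telescope, snippets):
    """Assemble a plain-text prompt; the categories are hypothetical."""
    excerpts = "\n".join(f"- {s}" for s in snippets)
    return (
        f"Telescope: {telescope}\n"
        f"Excerpts:\n{excerpts}\n"
        "Question: How does this paper use the telescope? "
        "Answer with one of: science, mention, none."
    )


# Hypothetical paper text and aliases, for illustration only.
paper_text = (
    "We analyze archival HST imaging of the cluster. "
    "Follow-up observations with Hubble are planned."
)
mentions = extract_mentions(paper_text, ["HST", "Hubble"], window=20)
prompt = build_prompt("HST", mentions)
```

Sending only the excerpts near a mention, rather than the full paper, is one way to keep prompts short when processing large quantities of text; whether amc does this is an assumption.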
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses large language models for telescope reference classification
Achieves a macro F1 score of 0.84 on the held-out TRACS Kaggle challenge test set
Identifies papers featuring NASA mission science results and surfaces potential label errors in historical datasets
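
One plausible way to surface potential label errors, sketched below, is to flag records where a classifier's prediction disagrees with the historical label and the classifier is confident. The source does not describe the procedure amc actually uses; the `Record` fields, the confidence score, and the 0.9 threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    paper_id: str          # hypothetical identifier
    historical_label: str  # label stored in the existing bibliography
    predicted_label: str   # label proposed by the classifier
    confidence: float      # assumed classifier confidence in [0, 1]


def flag_possible_label_errors(records, min_confidence=0.9):
    """Return confident disagreements, most confident first, for human review."""
    flagged = [
        r
        for r in records
        if r.predicted_label != r.historical_label
        and r.confidence >= min_confidence
    ]
    return sorted(flagged, key=lambda r: r.confidence, reverse=True)


# Hypothetical records, for illustration only.
records = [
    Record("paper-1", "science", "science", 0.99),
    Record("paper-2", "none", "science", 0.95),
    Record("paper-3", "science", "mention", 0.60),
    Record("paper-4", "mention", "none", 0.97),
]
to_review = flag_possible_label_errors(records)
```

The flagged records are candidates for review rather than confirmed errors, since the classifier itself can be wrong.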
John F. Wu
Space Telescope Science Institute, Johns Hopkins University
galaxies, astrophysics, machine learning, interpretable AI
Joshua E. G. Peek
Space Telescope Science Institute
Sophie J. Miller
Space Telescope Science Institute
Jenny Novacescu
Space Telescope Science Institute
Achu J. Usha
Space Telescope Science Institute
Christopher A. Wilkinson
Space Telescope Science Institute