amc: The Automated Mission Classifier for Telescope Bibliographies

📅 2025-12-11

🤖 AI Summary

The growing rate of astronomical publications is making manual telescope attribution unsustainable, which threatens the robustness of telescope bibliographies and the accuracy of scientific impact assessment. To address this, the authors present the Automated Mission Classifier (amc), a large language model (LLM)-based tool that identifies and categorizes telescope references in paper text. A modified version of amc achieves a macro-F1 score of 0.84 on the held-out test set of the TRACS Kaggle challenge. The tool is not limited to TRACS: its initial version was developed to identify papers featuring scientific results from NASA missions. The authors also investigate how amc can be used to interrogate historical datasets and surface potential label errors.

📝 Abstract

Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure that we can measure the scientific impact of our facilities and archives. However, the growing rate of publications threatens to outpace our ability to manually label astronomical literature. We therefore present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references by processing large quantities of paper text. A modified version of amc performs well on the TRACS Kaggle challenge, achieving a macro $F_1$ score of 0.84 on the held-out test set. amc is valuable for other telescopes beyond TRACS; we developed the initial software for identifying papers that featured scientific results by NASA missions. Additionally, we investigate how amc can also be used to interrogate historical datasets and surface potential label errors. Our work demonstrates that LLM-based applications offer powerful and scalable assistance for library sciences.
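The macro $F_1$ score quoted above averages per-class $F_1$ scores with equal weight, so rare classes count as much as common ones. The snippet below is a minimal sketch of that computation for a single-label setting. The label names and the five example papers are hypothetical, and the exact averaging used by the TRACS challenge (for example, across telescopes and categories) is not specified here.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Classes are taken as every label that appears in either list.
    A class with no true or predicted instances contributes 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for five papers; not data from the TRACS challenge.
y_true = ["science", "mention", "science", "instrumentation", "mention"]
y_pred = ["science", "science", "science", "instrumentation", "mention"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

In this toy example the per-class scores are 0.8 for "science", 2/3 for "mention", and 1.0 for "instrumentation", so a single misclassified "mention" paper lowers the macro average noticeably.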

Problem

Research questions and friction points this paper is trying to address.

Automates classification of telescope references in literature

Addresses manual labeling limitations in astronomy publications

Enables scalable bibliographies for measuring scientific impact

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses large language models for telescope reference classification

Achieves a macro F1 score of 0.84 on the held-out TRACS Kaggle challenge test set

Identifies papers featuring NASA mission science results and surfaces potential label errors in historical datasets
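One simple way to picture the label-error use case is to compare a classifier's predictions against the historical labels and flag confident disagreements for human review. The sketch below illustrates that idea only; it is not the authors' procedure. The function name `flag_disagreements`, the per-paper confidence values, the 0.9 threshold, and the paper identifiers are all hypothetical.

```python
def flag_disagreements(historical, predicted, confidence, threshold=0.9):
    """List papers whose predicted label confidently contradicts the old one.

    historical: dict of paper id -> label stored in the existing bibliography
    predicted:  dict of paper id -> label proposed by the classifier
    confidence: dict of paper id -> classifier confidence in [0, 1] (assumed)
    Returns sorted (paper_id, old_label, new_label) tuples for human review.
    """
    flagged = []
    for paper_id, old_label in historical.items():
        new_label = predicted.get(paper_id)
        if new_label is None or new_label == old_label:
            continue  # no prediction, or the two labels agree
        if confidence.get(paper_id, 0.0) >= threshold:
            flagged.append((paper_id, old_label, new_label))
    return sorted(flagged)


# Hypothetical records; identifiers and labels are illustrative only.
historical = {"paper-1": "science", "paper-2": "mention", "paper-3": "science"}
predicted = {"paper-1": "science", "paper-2": "science", "paper-3": "mention"}
confidence = {"paper-1": 0.99, "paper-2": 0.95, "paper-3": 0.60}

candidates = flag_disagreements(historical, predicted, confidence)
print(candidates)
```

Flagged items are candidates for manual inspection rather than confirmed errors, since the classifier itself may be the one that is wrong.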
