Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection

📅 2026-01-10
🏛️ arXiv.org
🤖 AI Summary
Modern spectroscopic surveys generate vast volumes of data, yet manual visual inspection for validating rare astrophysical object candidates remains highly inefficient. To address this challenge, this work proposes Spec-o3, described as the first tool-augmented vision-language agent that emulates astronomers' spectral analysis workflow through multimodal interleaved chain-of-thought reasoning. By combining cold-start supervised fine-tuning, outcome-based reinforcement learning, and dedicated spectral analysis tools, Spec-o3 raises the macro-F1 score on five rare-object identification tasks from the LAMOST survey from 28.3 to 76.5. The method outperforms both state-of-the-art vision-language models and specialized deep learning approaches, and it generalizes well in cross-survey transfer experiments on SDSS and DESI data.

📝 Abstract
Because deep learning classifiers offer limited generalization and interpretability, the final vetting of rare celestial object candidates still relies on expert visual inspection, a labor-intensive manual process in which astronomers use specialized tools to analyze spectra and construct reliable catalogs. This practice has become the primary bottleneck, as it cannot scale with the data deluge from modern spectroscopic surveys. To bridge this gap, we propose Spec-o3, a tool-augmented vision-language agent that performs astronomer-aligned spectral inspection via interleaved multimodal chain-of-thought reasoning. Spec-o3 is trained with a two-stage post-training recipe: cold-start supervised fine-tuning on expert inspection trajectories, followed by outcome-based reinforcement learning on rare-type verification tasks. Evaluated on five rare-object identification tasks from LAMOST, Spec-o3 establishes a new state of the art, boosting the macro-F1 score from 28.3 to 76.5 with a 7B-parameter base model and outperforming both proprietary VLMs and specialized deep models. Crucially, the agent demonstrates strong generalization to unseen inspection tasks across survey shifts (from LAMOST to SDSS/DESI). Expert evaluations confirm that its reasoning traces are coherent and physically consistent, supporting transparent and trustworthy decision-making. Code, data, and models are available at https://github.com/Maxwell-Jia/spec-o3.
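For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-F1 score over several verification tasks could be computed. It assumes that each task is a binary accept/reject decision and that macro-F1 is the unweighted mean of the per-task F1 scores; the abstract does not spell out the exact averaging, and the task names and labels below are hypothetical placeholders, not data from the paper.

```python
from typing import Dict, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (1 = confirmed rare object)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # By convention, return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def macro_f1(tasks: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Unweighted mean of per-task F1 (assumed averaging scheme)."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in tasks.values()]
    return sum(scores) / len(scores)


# Hypothetical toy labels for two made-up tasks (not results from the paper).
toy_tasks = {
    "task_a": ([1, 1, 0, 0], [1, 0, 1, 0]),
    "task_b": ([1, 0], [1, 0]),
}
score = macro_f1(toy_tasks)
```

Because each task contributes equally to the mean regardless of its size, a model that fails on one rare class is penalized as heavily as one that fails on a common class, which is why macro averaging is a natural fit for rare-object vetting.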
Problem

Research questions and friction points this paper is trying to address.

rare celestial object
spectral inspection
manual vetting
data deluge
astronomical surveys
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-augmented agent
multimodal chain-of-thought
spectral inspection
reinforcement learning
cross-survey generalization
Minghui Jia
Institute of Automation, CAS, Beijing, China
Qichao Zhang
Institute of Automation, Chinese Academy of Sciences
Artificial intelligence, reinforcement learning, game theory, adaptive dynamic programming
Ali Luo
National Astronomical Observatories, CAS, Beijing, China
Linjing Li
Institute of Automation, CAS, Beijing, China
Shuo Ye
Huazhong University of Science and Technology
Deep learning, Computer vision, Fine-Grained Image Analysis
Hailing Lu
National Astronomical Observatories, CAS, Beijing, China
Wen Hou
National Astronomical Observatories, CAS, Beijing, China
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart driving, robotics