Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification

📅 2024-04-16

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

This study addresses the automatic classification of stellar light curves from the Kepler and K2 missions, focusing on three variable-star classes: Cepheids (including the rare Type II Cepheids), RR Lyrae stars, and eclipsing binaries. We propose StarWhisper LC, a series of three LLM-based models (an LLM, a multimodal LLM, and a large audio language model) fine-tuned with prompt engineering and customized training methods. We also evaluate how observational cadence and phase distribution affect classification performance. With AutoDL optimization, the 1D-Convolution+BiLSTM architecture reaches 94% accuracy and the Swin Transformer reaches 99%, including 83% accuracy on Type II Cepheids, which make up only 0.02% of the dataset. The StarWhisper LC models reach accuracies around 90%. Reducing observation duration by up to 14% and sampling points by up to 21% lowers accuracy by no more than 10%.

📝 Abstract

Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM) based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing AutoDL optimization, we achieve striking performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, reaching accuracies of 94% and 99% respectively, with the latter demonstrating a notable 83% accuracy in discerning the elusive Type II Cepheids, which comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), an innovative series comprising three LLM-based models: a large language model (LLM), a multimodal large language model (MLLM), and a large audio language model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, the StarWhisper LC series exhibits high accuracies around 90%, significantly reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes two detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14% in observation duration and 21% in sampling points can be realized without compromising accuracy by more than 10%.
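To make the last sentence of the abstract concrete, the following is a minimal sketch of what shortening the observation duration, thinning the sampling points, and phase-folding a light curve could look like. It is an illustration only, not the authors' pipeline: the function names, the random thinning strategy, and the synthetic sinusoidal light curve with its 0.55-day period are all assumptions introduced here.

```python
import numpy as np

def shorten_duration(time, flux, fraction):
    """Keep only the first (1 - fraction) of the observing baseline."""
    t_end = time.min() + (1.0 - fraction) * (time.max() - time.min())
    keep = time <= t_end
    return time[keep], flux[keep]

def thin_samples(time, flux, fraction, seed=0):
    """Randomly drop `fraction` of the samples, preserving time order."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(len(time) * (1.0 - fraction)))
    idx = np.sort(rng.choice(len(time), size=n_keep, replace=False))
    return time[idx], flux[idx]

def phase_fold(time, period, epoch=0.0):
    """Map observation times to phases in [0, 1)."""
    return ((time - epoch) / period) % 1.0

# Hypothetical synthetic light curve (not Kepler/K2 data):
# 80 days of evenly spaced samples of a sinusoidal variable.
time = np.linspace(0.0, 80.0, 4000)
period = 0.55  # assumed period in days, chosen for illustration
flux = 1.0 + 0.1 * np.sin(2.0 * np.pi * time / period)

# 14% shorter baseline and 21% fewer points, the limits quoted in the abstract.
t_short, f_short = shorten_duration(time, flux, 0.14)
t_thin, f_thin = thin_samples(time, flux, 0.21)

# Phase coverage of the thinned curve, which a classifier would then see.
phase = phase_fold(t_thin, period)
```

Degraded curves produced this way could be passed to a trained classifier to measure how accuracy changes with duration, sampling, and phase coverage. How the paper itself selects which segments or points to remove is not stated in the abstract.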

Problem

Research questions and friction points this paper is trying to address.

Classify variable star light curves

Optimize deep-learning models for astronomy

Enhance accuracy with LLM-based methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoDL optimization for 1D-Convolution+BiLSTM

Swin Transformer achieves 99% accuracy

StarWhisper LC series of three LLM-based models (LLM, MLLM, LALM)
