Cognitively Layered Data Synthesis for Domain Adaptation of LLMs to Space Situational Awareness

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the difficulty of transferring large language models (LLMs) to space situational awareness (SSA). The difficulty stems from insufficient alignment with the structure of mission chains, a lack of higher-order cognitive supervision, and poor correspondence between data quality criteria and engineering specifications. The authors propose BD-FDG, a framework that applies Bloom's taxonomy to domain-adaptive data generation (reportedly for the first time) and produces samples with a continuous difficulty gradient across nine question categories and six cognitive levels. Domain knowledge is organized in a knowledge tree, and a multidimensional automated quality-assessment pipeline filters the output, yielding the SSA-SFT dataset of approximately 230,000 samples. SSA-LLM-8B, fine-tuned from Qwen3-8B on this dataset, achieves relative BLEU-1 improvements of 144% (without chain-of-thought) and 176% (with chain-of-thought) on the in-domain test set, wins 82.21% of arena comparisons against the baseline, and largely preserves general benchmark performance (MMLU-Pro, MATH-500).
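To make the reported percentages concrete, the following is a minimal sketch of how a relative improvement is usually computed. It assumes the standard definition, (tuned - baseline) / baseline; the function names are illustrative and the source does not give the underlying BLEU-1 scores, so no absolute values are implied.

```python
def relative_improvement(baseline: float, tuned: float) -> float:
    """Relative improvement of `tuned` over `baseline`, in percent.

    Assumes the usual definition; the source does not spell out its formula.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (tuned - baseline) / baseline * 100.0


def score_ratio(improvement_pct: float) -> float:
    """Tuned-to-baseline ratio implied by a relative improvement in percent."""
    return 1.0 + improvement_pct / 100.0


# Ratios implied by the reported figures (percentages taken from the summary).
NO_THINK_RATIO = score_ratio(144.0)
THINK_RATIO = score_ratio(176.0)
```

Under this definition, a 144% improvement means the fine-tuned score is 2.44 times the baseline score, and a 176% improvement means 2.76 times.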

📝 Abstract
Large language models (LLMs) demonstrate exceptional performance on general-purpose tasks; however, transferring them to complex engineering domains such as space situational awareness (SSA) remains challenging owing to insufficient structural alignment with mission chains, the absence of higher-order cognitive supervision, and poor correspondence between data quality criteria and engineering specifications. The core bottleneck is the construction of high-quality supervised fine-tuning (SFT) datasets. To this end, we propose BD-FDG (Bloom's Taxonomy-based Domain-specific Fine-tuning Data Generation), a framework that addresses incomplete knowledge coverage, shallow cognitive depth, and limited quality controllability through three mechanisms: structured knowledge organization, cognitively layered question modeling, and automated quality control. The framework uses a knowledge tree to ensure structured corpus coverage, designs a question generation scheme spanning nine categories and six cognitive levels from Remember to Create to produce samples with a continuous difficulty gradient, and applies a multidimensional scoring pipeline to enforce domain rigor and consistency. Using BD-FDG, we construct SSA-SFT, a domain dataset of approximately 230K samples, and fine-tune Qwen3-8B to obtain SSA-LLM-8B. Experiments show that SSA-LLM-8B achieves relative BLEU-1 improvements of 144% (no-think) and 176% (think) on the domain test set and a win rate of 82.21% over the baseline in arena comparisons, while largely preserving general benchmark performance (MMLU-Pro, MATH-500). These results validate SFT data construction driven by cognitive layering as an effective paradigm for complex engineering domains and provide a transferable framework for domain-specific LLM adaptation.
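The three mechanisms named in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the four intermediate level names follow the revised Bloom's taxonomy (the abstract names only Remember and Create), the nine category names are placeholders because the abstract does not list them, the toy knowledge tree is invented, and the quality gate with its threshold is a hypothetical stand-in for the multidimensional scoring pipeline.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import product


class BloomLevel(IntEnum):
    """Six cognitive levels, ordered from lowest to highest (revised Bloom's taxonomy)."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


# Placeholder names: the abstract says there are nine categories but does not name them.
QUESTION_CATEGORIES = [f"category_{i}" for i in range(1, 10)]

# Hypothetical toy knowledge tree; an empty dict marks a leaf topic.
TOY_TREE = {"SSA": {"orbit determination": {}, "conjunction assessment": {}}}


@dataclass(frozen=True)
class GenerationSlot:
    """One (topic, question category, cognitive level) cell to generate samples for."""
    knowledge_node: str
    category: str
    level: BloomLevel


def leaf_paths(tree: dict, prefix: tuple = ()) -> list:
    """Return slash-joined paths to every leaf of a nested-dict knowledge tree."""
    paths = []
    for name, child in tree.items():
        path = prefix + (name,)
        if child:
            paths.extend(leaf_paths(child, path))
        else:
            paths.append("/".join(path))
    return paths


def enumerate_slots(knowledge_nodes: list) -> list:
    """Cross every leaf topic with every category and level (an assumed full grid)."""
    return [
        GenerationSlot(node, category, level)
        for node, category, level in product(knowledge_nodes, QUESTION_CATEGORIES, BloomLevel)
    ]


def passes_quality_gate(scores: dict, threshold: float = 0.8) -> bool:
    """Hypothetical gate: keep a sample only if every scoring dimension meets the threshold."""
    return bool(scores) and min(scores.values()) >= threshold
```

With the toy tree, `enumerate_slots(leaf_paths(TOY_TREE))` yields 2 × 9 × 6 = 108 slots. Whether BD-FDG populates every cell of such a grid, and how its scores are combined, is not stated in the abstract.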
Problem

Research questions and friction points this paper is trying to address.

domain adaptation
large language models
space situational awareness
supervised fine-tuning
cognitive depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

cognitive layering
domain adaptation
structured knowledge organization
Bloom's Taxonomy
quality-controlled data generation
Authors
Ding Linghu
Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing, China
Cheng Wang
Shenzhen Institutes of Advanced Technology
Neuroscience
Da Fan
Columbia University
Weather forecast, AI4Science, Machine Learning, Climate modeling
Wei Shi
University of Science and Technology of China
Mechanistic Interpretability
Kaifeng Yin
China Academy of Space Technology, Beijing, China
Xiaoliang Xue
Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing, China
Fan Yang
China Academy of Space Technology, Beijing, China
Haiyi Ren
State Key Laboratory of Space Information System and Integrated Application, Beijing, China
Cong Zhang
Qian Xuesen Laboratory of Space Technology, China Academy of Space Technology, Beijing, China