Text-Based Approaches to Item Alignment to Content Standards in Large-Scale Reading & Writing Tests

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
In large-scale assessment development, aligning test items with content standards (domain/skill-level taxonomies) often relies on subjective, labor-intensive manual annotation. This paper proposes an automated alignment method based on fine-tuned small language models (SLMs), integrating multilingual E5-large-instruct embeddings with supervised learning and conducting semantic analysis via cosine similarity, KL divergence, and 2D projection. Experiments demonstrate that augmenting item text data substantially improves SLM performance, enabling superior skill-level alignment accuracy compared to conventional embedding approaches. Semantic analysis further reveals non-negligible semantic overlap among certain skills in SAT/PSAT assessments, contributing to misclassification. The proposed framework offers a scalable, interpretable, and lightweight technical pathway for enhancing the efficiency and rigor of validity evidence generation in test construction.
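The semantic analyses mentioned above (pairwise cosine similarity and KL divergence of embedding distributions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embeddings are random placeholders rather than multilingual-E5-large-instruct vectors, and the histogram-based KL estimate is an assumption, since the paper does not specify how the divergence between embedding distributions is computed.

```python
import numpy as np

# Hypothetical item embeddings for two skills (random stand-ins for
# the multilingual-E5-large-instruct vectors used in the paper)
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 8)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = rng.normal(size=(6, 8)); b /= np.linalg.norm(b, axis=1, keepdims=True)

def mean_pairwise_cosine(x, y):
    # For unit-normalized rows, cosine similarity reduces to a dot product;
    # the mean over all item pairs summarizes how close two skills are
    return float((x @ y.T).mean())

def kl_divergence(x, y, bins=10, eps=1e-9):
    # Histogram the embedding values of each skill over a shared range,
    # then compute D_KL(P || Q) between the smoothed, normalized histograms
    lo = min(x.min(), y.min()); hi = max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))
```

In this framing, a high mean cosine similarity and a low KL divergence between two skills would both signal the kind of semantic overlap the paper reports for certain SAT/PSAT skills.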

📝 Abstract
Aligning test items to content standards is a critical step in test development for collecting validity evidence based on content. Item alignment has typically been conducted by human experts; this judgmental process can be subjective and time-consuming. This study investigated the performance of fine-tuned small language models (SLMs) for automated item alignment using data from a large-scale standardized reading and writing test for college admissions. Separate SLMs were trained for alignment at the domain level and the skill level, with 10 skills mapped to 4 content domains. Model performance was evaluated on multiple criteria using two test datasets, and the impact of the type and size of the training input data was investigated. Results showed that including more item text led to substantially better model performance, surpassing the improvement induced by increasing sample size alone. For comparison, supervised machine learning models were trained on embeddings from the multilingual-E5-large-instruct model. The fine-tuned SLMs consistently outperformed the embedding-based supervised machine learning models, particularly for the more fine-grained skill-level alignment. To better understand model misclassifications, multiple semantic similarity analyses were conducted, including pairwise cosine similarity, Kullback-Leibler divergence of embedding distributions, and two-dimensional projections of item embeddings. These analyses consistently showed that certain skills in the SAT and PSAT were semantically too close, providing evidence for the observed misclassifications.
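As a rough illustration of the embedding-based supervised baseline described in the abstract, the sketch below trains a nearest-centroid classifier on synthetic stand-in embeddings. This is a simplified assumption, not the paper's method: the study used embeddings from multilingual-E5-large-instruct and its own supervised models, while all data and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_items, dim, n_skills = 200, 32, 10

# Synthetic stand-ins for item embeddings: each item is drawn near the
# center of its skill class (balanced labels so every skill is present)
centers = rng.normal(size=(n_skills, dim))
labels = np.arange(n_items) % n_skills
X = centers[labels] + 0.5 * rng.normal(size=(n_items, dim))

# Simple supervised baseline: assign each held-out item to the skill
# whose training-set centroid is nearest in embedding space
split = 150
X_tr, y_tr, X_te, y_te = X[:split], labels[:split], X[split:], labels[split:]
fitted = np.stack([X_tr[y_tr == k].mean(axis=0) for k in range(n_skills)])
pred = np.argmin(np.linalg.norm(X_te[:, None, :] - fitted[None], axis=2), axis=1)
acc = float((pred == y_te).mean())
```

With well-separated synthetic classes this baseline scores highly; the paper's point is that on real item text, where some skills overlap semantically, fine-tuned SLMs beat such embedding-based models, especially at the skill level.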
Problem

Research questions and friction points this paper is trying to address.

Automating alignment of test items to content standards
Overcoming subjectivity in human expert alignment process
Evaluating fine-tuned language models for educational assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned small language models automate item alignment
Models outperform embedding-based supervised machine learning
Semantic similarity analysis explains misclassification between skills
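The two-dimensional projection of item embeddings used to visualize skill overlap can be sketched with PCA computed via SVD. The paper does not specify its projection technique, so PCA here is an assumption, and the embeddings are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical embeddings for items from two semantically close skills:
# a small mean shift makes the clusters overlap, as the paper observes
a = rng.normal(size=(20, 16)) + 0.2
b = rng.normal(size=(20, 16))
X = np.vstack([a, b])

# PCA via SVD: center the data, then project onto the top 2
# right-singular vectors to get 2D coordinates for plotting
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ vt[:2].T  # shape (40, 2)
```

Plotting `proj` colored by skill would show the degree of cluster overlap; heavily intermixed clusters correspond to the skills the models confuse.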
Yanbin Fu
University of Maryland, College Park, MD, USA

Hong Jiao
University of Maryland, College Park
Educational measurement, psychometrics

Tianyi Zhou
University of Maryland, College Park, MD, USA

Robert W. Lissitz
University of Maryland, College Park, MD, USA

Nan Zhang
University of Maryland, College Park, MD, USA

Ming Li
University of Maryland, College Park, MD, USA

Qingshu Xu
University of Maryland, College Park, MD, USA

Sydney Peters
University of Maryland, College Park, MD, USA