🤖 AI Summary
To address the intra- and inter-rater variability inherent in human scoring of second-language (L2) English proficiency, this paper applies deep learning to both the speech signal and its transcription. Spoken proficiency classification is explored with diverse architectures, including a 2D CNN, a frequency-based CNN, ResNet, and a pretrained wav2vec 2.0 model; text-based proficiency assessment is addressed by fine-tuning BERT under resource constraints; and spontaneous dialogue assessment, which involves long-form audio and speaker interactions, is handled through separate applications of wav2vec 2.0 and BERT. Experiments on the EFCamDat and ANGLISH datasets and a private dataset highlight the potential of deep learning, especially the pretrained wav2vec 2.0 model, for robust automated L2 proficiency evaluation.
📝 Abstract
Second-language (L2) English proficiency is usually evaluated perceptually by English teachers or expert evaluators, with inherent intra- and inter-rater variability. This paper explores deep learning techniques for comprehensive L2 proficiency assessment, addressing both the speech signal and its corresponding transcription. We analyze spoken proficiency classification using diverse architectures, including a 2D CNN, a frequency-based CNN, ResNet, and a pretrained wav2vec 2.0 model. Additionally, we examine text-based proficiency assessment by fine-tuning a BERT language model within resource constraints. Finally, we tackle the complex task of spontaneous dialogue assessment, managing long-form audio and speaker interactions through separate applications of the wav2vec 2.0 and BERT models. Results from experiments on the EFCamDat and ANGLISH datasets and a private dataset highlight the potential of deep learning, especially the pretrained wav2vec 2.0 model, for robust automated L2 proficiency evaluation.
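As a minimal sketch of the speech-based classification pipeline described above, frame-level acoustic embeddings (such as wav2vec 2.0 hidden states) can be mean-pooled over time and passed through a linear softmax head over proficiency classes. All shapes, the six-class label set, and the randomly initialized head are illustrative assumptions, not the paper's actual trained models:

```python
import numpy as np

def classify_proficiency(frame_embeddings, weights, bias):
    """Mean-pool frame-level acoustic embeddings (e.g. wav2vec 2.0
    hidden states) and apply a linear softmax head over proficiency
    classes. Shapes and parameters are illustrative, not the paper's."""
    pooled = frame_embeddings.mean(axis=0)      # (dim,) utterance vector
    logits = pooled @ weights + bias            # (num_classes,)
    exp = np.exp(logits - logits.max())         # numerically stable softmax
    return exp / exp.sum()

# Hypothetical example: 200 frames of 768-dim embeddings,
# 6 CEFR-style classes (A1-C2), randomly initialized head.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 768))
W = rng.normal(scale=0.01, size=(768, 6))
b = np.zeros(6)
probs = classify_proficiency(emb, W, b)
print(probs.shape)
```

In practice the pooled representation would come from a pretrained encoder and the head would be trained on labeled learner speech; the same pool-then-classify pattern applies to BERT token embeddings for the text modality.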