🤖 AI Summary
Small-scale language models, such as those trained under BabyLM Challenge data constraints, suffer from fixed masking strategies and weak morphological generalization in masked language modeling (MLM) pretraining. Method: This paper proposes a dynamic masking mechanism that adjusts per-token masking probabilities in real time based on prediction difficulty, together with subword-level embedding enhancement to explicitly model morphological variation. Contribution/Results: The approach jointly improves the modeling of lexical morphology and contextual semantics without compromising training efficiency. Evaluated within the BabyLM Challenge framework, it consistently outperforms standard MLM baselines across multiple (Super)GLUE tasks. Notably, it achieves substantial gains in the strictly constrained small-model track, demonstrating that adaptive masking and fine-grained morphological representation are critical for language understanding in resource-limited models.
📝 Abstract
We describe our strategy for the 2025 edition of the BabyLM Challenge. Our main contribution is an improved form of Masked Language Modeling (MLM) that adapts tokens' masking probabilities according to the model's ability to predict them. The results show a substantial performance increase over standard MLM on (Super)GLUE tasks. We also incorporate sub-token embeddings, finding that they improve the model's morphological generalization. Our submission beats the baseline in the strict-small track.
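To make the core idea concrete, here is a minimal sketch of difficulty-adaptive masking. This is an illustration of the general technique, not the authors' exact method: the `AdaptiveMasker` class, its EMA-based difficulty score, and all parameter names are assumptions for the example. Each token's masking probability is scaled by a running estimate of how hard the model finds that token to predict, while the batch-average rate is kept near a fixed base rate.

```python
import random

class AdaptiveMasker:
    """Sketch of adaptive MLM masking: tokens the model predicts
    poorly (high loss) are masked more often. Illustrative only;
    the paper's actual mechanism may differ."""

    def __init__(self, vocab_size, base_rate=0.15, ema_decay=0.9):
        self.base_rate = base_rate      # target average masking rate
        self.ema_decay = ema_decay      # smoothing for difficulty scores
        self.difficulty = [1.0] * vocab_size  # start uniform

    def update(self, token_id, loss):
        # Track prediction difficulty as an exponential moving
        # average of the per-token training loss.
        d = self.difficulty[token_id]
        self.difficulty[token_id] = self.ema_decay * d + (1 - self.ema_decay) * loss

    def mask_probs(self, token_ids):
        # Rescale difficulties so the mean masking probability of the
        # sequence stays at base_rate (capped at 1.0 per token).
        scores = [self.difficulty[t] for t in token_ids]
        mean = sum(scores) / len(scores)
        return [min(1.0, self.base_rate * s / mean) for s in scores]

    def choose_masks(self, token_ids, rng=random):
        # Sample a boolean mask from the per-token probabilities.
        return [rng.random() < p for p in self.mask_probs(token_ids)]
```

With uniform difficulties this reduces to standard 15% MLM masking; as training losses accumulate, hard tokens are masked more often and easy ones less, which is the adaptive behavior the abstract describes.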