🤖 AI Summary
To address the challenges of deploying large language models (LLMs) on resource-constrained edge devices and in cost-sensitive settings, this paper proposes an efficient small language model (SLM) with 1.3 billion parameters. The method integrates three key innovations: (1) maximal-update parameterization (μP), enabling hyperparameters tuned on a 20M-parameter proxy to transfer directly to the full model; (2) a three-phase Warmup–Stable–Decay training curriculum, with a switch from AdamW to the Muon optimizer at the start of the Decay phase; and (3) architectural and systems optimizations, including tied word embeddings and FP8 mixed-precision training over a 1.4T-token corpus. Evaluated on 13 reasoning benchmarks, the optimizer switch alone improves the average score by 4.58% with all other hyperparameters held fixed, while the model remains deployable on edge hardware. All training code and checkpoints are publicly released.
📝 Abstract
Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present **Xmodel-2.5**, a 1.3-billion-parameter small language model designed as a *drop-in agent core*. Training with maximal-update parameterization (μP) allows hyper-parameters tuned on a 20M-parameter proxy to transfer directly to the full model, even under the parameter-tied *tie-word-embedding* architecture. A 1.4T-token Warmup–Stable–Decay curriculum is used, and we further show that **switching from AdamW to Muon during the decay phase** improves the 13-task reasoning average by 4.58% while keeping every other hyper-parameter fixed, verifying that early AdamW stability can be paired with late Muon sharpening for better downstream performance. FP8 mixed-precision training balances accuracy and throughput. All checkpoints, recipes, and evaluation code are released under the Apache-2.0 license (https://huggingface.co/XiaoduoAILab/Xmodel-2.5 and https://huggingface.co/XiaoduoAILab/Xmodel-2.5-history for training checkpoints). Training code and evaluation harness: https://github.com/XiaoduoAILab/Xmodel-2.5.
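The Warmup–Stable–Decay curriculum described in the abstract can be sketched as a plain learning-rate function: a linear warmup to the peak rate, a long constant plateau, and a final anneal. The phase fractions and rates below are illustrative assumptions, not the paper's published hyperparameters; the boundary where decay begins is also where the paper switches from AdamW to Muon.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
           min_lr: float = 1e-5, warmup_frac: float = 0.01,
           decay_frac: float = 0.2) -> float:
    """Warmup-Stable-Decay schedule sketch (illustrative values, not the
    paper's actual settings)."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps  # decay phase (and, in the
    # paper, the AdamW -> Muon optimizer switch) starts here
    if step < warmup_steps:
        # linear warmup from 0 to the peak learning rate
        return peak_lr * step / max(warmup_steps, 1)
    if step < stable_end:
        # stable phase: hold the peak learning rate constant
        return peak_lr
    # decay phase: linear anneal from peak_lr down to min_lr
    progress = (step - stable_end) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress
```

In practice a function like this can be wrapped in a per-step scheduler (e.g. PyTorch's `LambdaLR`); the key property is the long flat plateau, which lets training continue from any stable-phase checkpoint before committing to a decay run.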