AI Summary
To address the low inference efficiency and limited interpretability of large language models (LLMs) for Russian, this paper proposes the first hybrid reasoning paradigm tailored to Russian, integrating direct answer generation with interpretable reasoning-path synthesis. Methodologically, the authors design a Cyrillic-optimized dense tokenizer to improve Russian linguistic modeling, adapt the EAGLE architecture for speculative decoding to substantially reduce latency, and perform end-to-end training via instruction tuning coupled with reasoning-trajectory supervision. Key contributions include: (1) open-sourcing high-quality resources, namely T-Wix (500K Russian instruction samples), T-Math (a dedicated reasoning benchmark), and lightweight EAGLE weights; (2) achieving both high accuracy and accelerated inference across diverse domains, with empirically validated latency reduction; and (3) releasing a web-based demo that demonstrates the dual-mode (reasoning vs. non-reasoning) framework. All artifacts are publicly available to support reproducible and extensible Russian AI research.
Abstract
We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports both direct answering and reasoning-trace generation, using a Cyrillic-dense tokenizer and an adapted EAGLE speculative-decoding pipeline to reduce latency. To enable reproducible and extensible research, we release the model weights, the T-Wix corpus of 500K Russian instruction samples, the T-Math reasoning benchmark, and the EAGLE weights on Hugging Face. These resources allow users to study Russian-language reasoning and to extend or adapt both the model and the inference pipeline. A public web demo exposes the reasoning and non-reasoning modes and illustrates the speedups achieved by our inference stack across domains. T-pro 2.0 thus serves as an accessible open system for building and evaluating efficient, practical Russian LLM applications.
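The latency reduction comes from speculative decoding, the draft-and-verify scheme that EAGLE-style methods build on: a cheap drafter proposes several tokens, and the expensive target model checks them in a single pass, keeping the accepted prefix. The following is a minimal toy sketch of that control flow only; the "models", token values, and acceptance rule are invented for illustration and have nothing to do with T-pro 2.0's actual drafter or weights.

```python
# Toy sketch of speculative (draft-and-verify) decoding.
# Both "models" are stand-ins: tokens are just integers, and the
# acceptance rule is arbitrary, chosen only to make the loop visible.

def draft_model(prefix, k):
    """Cheap drafter: proposes the next k tokens (here: counts upward)."""
    return [prefix[-1] + 1 + i for i in range(k)]

def target_model(prefix, proposals):
    """Expensive verifier: accepts proposals until it disagrees, then
    emits its own next token instead (here it dislikes odd tokens > 5)."""
    accepted = []
    for tok in proposals:
        if tok > 5 and tok % 2 == 1:   # verifier disagrees with the draft
            accepted.append(tok + 1)   # substitute its own token and stop
            break
        accepted.append(tok)
    return accepted

def speculative_decode(prompt, max_len, k=4):
    """Each round accepts at least one token, so fewer verifier calls
    are needed than with one-token-at-a-time decoding."""
    seq = list(prompt)
    while len(seq) < max_len:
        proposals = draft_model(seq, k)
        seq.extend(target_model(seq, proposals))
    return seq[:max_len]

print(speculative_decode([0], 10))  # → [0, 1, 2, 3, 4, 5, 6, 8, 10, 12]
```

When the drafter agrees with the verifier often (as in the early rounds above, where all four proposals are accepted at once), most tokens cost only a cheap draft step plus a shared verification pass, which is the source of the speedup the abstract refers to.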