T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

πŸ“… 2025-12-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the low inference efficiency and poor interpretability of large language models (LLMs) for Russian, this paper proposes the first hybrid reasoning paradigm tailored to Russian, integrating direct answer generation with interpretable reasoning path synthesis. Methodologically, we design a Cyrillic-optimized dense tokenizer to enhance Russian linguistic modeling; adapt the EAGLE architecture for speculative decoding to significantly reduce latency; and perform end-to-end training via instruction tuning coupled with reasoning trajectory supervision. Key contributions include: (1) open-sourcing high-quality resourcesβ€”T-Wix (500K Russian instruction samples), T-Math (a dedicated reasoning benchmark), and lightweight EAGLE weights; (2) achieving both high accuracy and accelerated inference across diverse domains (with empirically validated latency reduction); and (3) releasing a web-based demo system demonstrating superior performance of the dual-mode (non-reasoning vs. reasoning) framework. All artifacts are publicly available to support reproducible and extensible Russian AI research.

πŸ“ Abstract
We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports direct answering and reasoning-trace generation, using a Cyrillic-dense tokenizer and an adapted EAGLE speculative-decoding pipeline to reduce latency. To enable reproducible and extensible research, we release the model weights, the T-Wix 500k instruction corpus, the T-Math reasoning benchmark, and the EAGLE weights on Hugging Face. These resources allow users to study Russian-language reasoning and to extend or adapt both the model and the inference pipeline. A public web demo exposes reasoning and non-reasoning modes and illustrates the speedups achieved by our inference stack across domains. T-pro 2.0 thus serves as an accessible open system for building and evaluating efficient, practical Russian LLM applications.
Problem

Research questions and friction points this paper is trying to address.

Develops an efficient Russian LLM for hybrid reasoning
Reduces inference latency with adapted speculative decoding
Provides open resources for Russian language reasoning research
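The latency reduction mentioned above relies on speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them in a single pass, keeping the longest accepted prefix (EAGLE refines this by drafting from the target model's own hidden states). Below is a minimal sketch of the accept/correct loop with toy stand-ins for both models; the token rules are invented purely for illustration and are not the paper's actual models.

```python
def target_model(prefix):
    # Toy "expensive" model: the true next token is last token + 1 (mod 10).
    return (prefix[-1] + 1) % 10

def draft_model(prefix, k):
    # Toy "cheap" draft: usually right, but we inject a wrong guess at
    # position 2 to demonstrate rejection and correction.
    guesses = [(prefix[-1] + i + 1) % 10 for i in range(k)]
    if len(guesses) >= 3:
        guesses[2] = (guesses[2] + 5) % 10
    return guesses

def speculative_step(prefix, k=4):
    # Verify the drafted tokens left to right; on the first mismatch,
    # substitute the target's token and stop. Every step emits at least
    # one token per expensive verification pass.
    proposed = draft_model(prefix, k)
    accepted = []
    for tok in proposed:
        if tok == target_model(prefix + accepted):
            accepted.append(tok)  # draft agreed with target: keep it for free
        else:
            accepted.append(target_model(prefix + accepted))  # correct and stop
            break
    return accepted

tokens = [0]
for _ in range(3):
    tokens += speculative_step(tokens)
print(tokens)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Each `speculative_step` here yields three tokens (two accepted drafts plus one correction) for one verification pass, which is the source of the wall-clock speedup when drafting is much cheaper than verification.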
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cyrillic-dense tokenizer for Russian language optimization
Adapted EAGLE speculative-decoding pipeline to reduce latency
Open-weight model with reasoning-trace generation capability
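A Cyrillic-dense tokenizer matters because every Cyrillic letter costs two bytes in UTF-8, so byte-level fallback roughly doubles sequence length for Russian text. The toy comparison below uses a hypothetical four-entry vocabulary (not the model's real tokenizer) to show how whole-word Cyrillic entries shorten the token sequence.

```python
text = "Привет, мир"  # "Hello, world" in Russian

# Byte-level fallback: each Cyrillic letter encodes to 2 UTF-8 bytes.
byte_tokens = list(text.encode("utf-8"))

# Hypothetical Cyrillic-dense vocabulary: frequent Russian words are
# single tokens, so the same text becomes far shorter.
vocab = {"Привет": 0, ",": 1, " ": 2, "мир": 3}

def tokenize(s, vocab):
    # Greedy longest-match segmentation over the toy vocabulary.
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):
            if s[i:j] in vocab:
                out.append(vocab[s[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {s[i]!r}")
    return out

dense_tokens = tokenize(text, vocab)
print(len(byte_tokens), len(dense_tokens))  # → 20 4
```

Shorter sequences mean fewer forward passes per response, so tokenizer density compounds with speculative decoding to cut end-to-end latency.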
👥 Authors
Dmitrii Stoianov (T-Tech, Moscow, Russia)
Danil Taranets (T-Tech, Moscow, Russia)
Olga Tsymboi (T-Tech)
Ramil Latypov (T-Tech, Moscow, Russia)
Almaz Dautov (T-Tech, Moscow, Russia)
Vladislav Kruglikov (T-Tech, Moscow, Russia)
Nikita Surkov (T-Tech, Moscow, Russia)
German Abramov (T-Tech, Moscow, Russia)
Pavel Gein (T-Tech, Moscow, Russia)
Dmitry Abulkhanov (Huawei Noah's Ark Lab)
Mikhail Gashkov (T-Tech, Moscow, Russia)
Viktor Zelenkovskiy (T-Tech, Moscow, Russia)
Artem Batalov (T-Tech, Moscow, Russia)
Aleksandr Medvedev (T-Tech, Moscow, Russia)
Anatolii Potapov (T-Tech, Moscow, Russia)