AI Summary
To address the low inference efficiency and limited interpretability of large language models (LLMs) for Russian, this paper proposes the first hybrid reasoning paradigm tailored to Russian, integrating direct answer generation with interpretable reasoning-path synthesis. Methodologically, the authors design a Cyrillic-optimized dense tokenizer to improve Russian linguistic modeling, adapt the EAGLE architecture for speculative decoding to substantially reduce latency, and perform end-to-end training via instruction tuning coupled with reasoning-trajectory supervision. Key contributions include: (1) open-sourcing high-quality resources, namely T-Wix (500K Russian instruction samples), T-Math (a dedicated reasoning benchmark), and lightweight EAGLE weights; (2) achieving both high accuracy and accelerated inference across diverse domains, with empirically validated latency reduction; and (3) releasing a web-based demo that demonstrates the dual-mode (reasoning vs. non-reasoning) framework. All artifacts are publicly available to support reproducible and extensible Russian AI research.
Abstract
We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports both direct answering and reasoning-trace generation, using a Cyrillic-dense tokenizer and an adapted EAGLE speculative-decoding pipeline to reduce latency. To enable reproducible and extensible research, we release the model weights, the T-Wix corpus of 500K Russian instruction samples, the T-Math reasoning benchmark, and the EAGLE weights on Hugging Face. These resources allow users to study Russian-language reasoning and to extend or adapt both the model and the inference pipeline. A public web demo exposes the reasoning and non-reasoning modes and illustrates the speedups achieved by our inference stack across domains. T-pro 2.0 thus serves as an accessible open system for building and evaluating efficient, practical Russian LLM applications.
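The latency reduction comes from speculative decoding, the draft-and-verify scheme that EAGLE-style methods build on: a cheap drafter proposes several tokens, and the expensive target model checks them in a single pass, keeping the accepted prefix. The following is a minimal toy sketch of that control flow only; the "models", token values, and acceptance rule are invented for illustration and have nothing to do with T-pro 2.0's actual drafter or weights.

```python
# Toy sketch of speculative (draft-and-verify) decoding.
# Both "models" are stand-ins: tokens are just integers, and the
# acceptance rule is arbitrary, chosen only to make the loop visible.

def draft_model(prefix, k):
    """Cheap drafter: proposes the next k tokens (here: counts upward)."""
    return [prefix[-1] + 1 + i for i in range(k)]

def target_model(prefix, proposals):
    """Expensive verifier: accepts proposals until it disagrees, then
    emits its own next token instead (here it dislikes odd tokens > 5)."""
    accepted = []
    for tok in proposals:
        if tok > 5 and tok % 2 == 1:   # verifier disagrees with the draft
            accepted.append(tok + 1)   # substitute its own token and stop
            break
        accepted.append(tok)
    return accepted

def speculative_decode(prompt, max_len, k=4):
    """Each round accepts at least one token, so fewer verifier calls
    are needed than with one-token-at-a-time decoding."""
    seq = list(prompt)
    while len(seq) < max_len:
        proposals = draft_model(seq, k)
        seq.extend(target_model(seq, proposals))
    return seq[:max_len]

print(speculative_decode([0], 10))  # → [0, 1, 2, 3, 4, 5, 6, 8, 10, 12]
```

When the drafter agrees with the verifier often (as in the early rounds above, where all four proposals are accepted at once), most tokens cost only a cheap draft step plus a shared verification pass, which is the source of the speedup the abstract refers to.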