🤖 AI Summary
Training large language models (LLMs) incurs prohibitive computational costs, hindering both research and deployment. While FP8 training promises substantial savings, no existing framework offers an open-source, end-to-end solution, impeding practical adoption. To address this, we propose the first open-source FP8 training framework supporting both continued pretraining and supervised fine-tuning. Our method introduces a fine-grained mixed-precision quantization strategy that adaptively selects quantization granularities—per-tensor, per-channel, or per-group—for weights, activations, and gradients, balancing numerical stability and hardware efficiency. Evaluated on a 160B-token corpus, our approach reduces training time by up to 22%, lowers peak memory consumption by 14%, and improves throughput by 19% relative to BF16 baselines, while preserving inference accuracy. This work establishes the first efficient, robust, and fully open-source FP8 LLM training pipeline, bridging a critical gap in scalable LLM development.
📝 Abstract
The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by the lack of a comprehensive, open-source training recipe. To bridge this gap, we introduce an end-to-end FP8 training recipe that seamlessly integrates continual pre-training and supervised fine-tuning. Our methodology employs a fine-grained, hybrid-granularity quantization strategy to maintain numerical fidelity while maximizing computational efficiency. Through extensive experiments, including continual pre-training on a 160B-token corpus, we demonstrate that our recipe is not only remarkably stable but also essentially lossless, achieving performance on par with the BF16 baseline across a suite of reasoning benchmarks. Crucially, this is achieved with substantial efficiency improvements, including up to a 22% reduction in training time, a 14% decrease in peak memory usage, and a 19% increase in throughput. Our results establish FP8 as a practical and robust alternative to BF16, and we will release the accompanying code to further democratize large-scale model training.
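To make the hybrid-granularity idea concrete, the sketch below simulates FP8 E4M3 scaling at the three granularities named above. This is illustrative only: the function names (`fp8_scales`, `fake_quant`) and the integer-rounding stand-in for the hardware FP8 cast are our own assumptions, not the released recipe's API. The common thread is that each scaling block maps its own max |value| to the FP8 dynamic range, so finer granularity isolates outliers and reduces quantization error.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def fp8_scales(x: np.ndarray, granularity: str, group_size: int = 128) -> np.ndarray:
    """Per-block scale factors mapping each block's max |x| onto the FP8 range."""
    if granularity == "per_tensor":
        amax = np.abs(x).max()                       # one scale for the whole tensor
        return np.full_like(x, FP8_E4M3_MAX / max(amax, 1e-12))
    if granularity == "per_channel":
        amax = np.abs(x).max(axis=1, keepdims=True)  # one scale per row (channel)
        return np.broadcast_to(FP8_E4M3_MAX / np.maximum(amax, 1e-12), x.shape)
    if granularity == "per_group":
        rows, cols = x.shape
        g = x.reshape(rows, cols // group_size, group_size)
        amax = np.abs(g).max(axis=2, keepdims=True)  # one scale per contiguous group
        scales = FP8_E4M3_MAX / np.maximum(amax, 1e-12)
        return np.broadcast_to(scales, g.shape).reshape(rows, cols)
    raise ValueError(f"unknown granularity: {granularity}")

def fake_quant(x: np.ndarray, granularity: str, group_size: int = 128) -> np.ndarray:
    """Scale into FP8 range, round+clip (a stand-in for the FP8 cast), rescale back."""
    s = fp8_scales(x, granularity, group_size)
    q = np.clip(np.round(x * s), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q / s
```

A tensor with a single large outlier shows the trade-off: per-tensor scaling stretches one scale over everything, inflating error on the small values, while per-group scaling confines the outlier's influence to its own group — at the cost of storing more scale factors, which is why an adaptive choice per tensor type matters for hardware efficiency.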