InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Training large language models (LLMs) incurs prohibitive computational costs, hindering both research and deployment. Existing FP8 training frameworks lack open-source, end-to-end solutions, impeding practical adoption. To address this, we propose the first open-source FP8 training framework supporting both continued pretraining and supervised fine-tuning. Our method introduces a fine-grained mixed-precision quantization strategy that adaptively selects quantization granularities—per-tensor, per-channel, or per-group—for weights, activations, and gradients, balancing numerical stability and hardware efficiency. Evaluated on a 160B-token corpus, our approach reduces training time by 22%, peak memory consumption by 14%, and improves throughput by 19% relative to BF16 baselines, while preserving inference accuracy. This work establishes the first efficient, robust, and fully open-source FP8 LLM training pipeline, bridging a critical gap in scalable LLM development.

Technology Category

Application Category

📝 Abstract
The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by the lack of a comprehensive, open-source training recipe. To bridge this gap, we introduce an end-to-end FP8 training recipe that seamlessly integrates continual pre-training and supervised fine-tuning. Our methodology employs a fine-grained, hybrid-granularity quantization strategy to maintain numerical fidelity while maximizing computational efficiency. Through extensive experiments, including the continue pre-training of models on a 160B-token corpus, we demonstrate that our recipe is not only remarkably stable but also essentially lossless, achieving performance on par with the BF16 baseline across a suite of reasoning benchmarks. Crucially, this is achieved with substantial efficiency improvements, including up to a 22% reduction in training time, a 14% decrease in peak memory usage, and a 19% increase in throughput. Our results establish FP8 as a practical and robust alternative to BF16, and we will release the accompanying code to further democratize large-scale model training.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of large language model training
Developing comprehensive open-source FP8 training methodology
Maintaining model performance while improving training efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end FP8 training recipe integration
Fine-grained hybrid-granularity quantization strategy
Maintains numerical fidelity while maximizing efficiency
🔎 Similar Papers
No similar papers found.
Wenjun Wang
Wenjun Wang
Tianjin University
Data MiningSocial NetworkComplex NetworkSmart City
S
Shuo Cai
The Hong Kong Polytechnic University
Congkai Xie
Congkai Xie
Reallm Labs
M
Mingfa Feng
InfiX.ai
Y
Yiming Zhang
The Hong Kong Polytechnic University
Z
Zhen Li
The Hong Kong Polytechnic University, InfiX.ai
K
Kejing Yang
InfiX.ai
M
Ming Li
The Hong Kong Polytechnic University
Jiannong Cao
Jiannong Cao
IEEE Fellow; Chair Professor, Hong Kong Polytechnic University
Distributed computingMobile and pervasive computingWireless sensor networksCloud computingBig Data
Y
Yuan Xie
The Hong Kong University of Science and Technology
Hongxia Yang
Hongxia Yang
Professor, HK Polytechnic University
Machine LearningGenerative AICognitive IntelligenceStatistical Modeling