MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

📅 2026-06-11
🤖 AI Summary
This work aims to raise AI performance on highly challenging mathematical proof problems to the level of top human competitors and beyond. To this end, we introduce MaxProof, a framework that combines generative-verifier reinforcement learning with population-level test-time scaling. By integrating a defense-in-depth generative verifier, critique-conditioned proof repair, population-level search, and tournament-based selection, MaxProof produces complete competition-level proofs end to end. With this framework, the M3 model scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both and substantially advancing the frontier of AI mathematical reasoning.
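The page does not say how the defense-in-depth verifier is built. As a minimal sketch, assume it stacks several verification passes and accepts a proof only when every pass accepts it; the class name, the layer functions, and the toy string checks below are hypothetical stand-ins for model-based checks. Under an independence assumption, k layers that each wrongly accept a flawed proof with probability p jointly do so with probability p^k, which illustrates one way layering can push the false-positive rate down.

```python
from dataclasses import dataclass
from typing import Callable, List

# A verification layer maps a proof to True (accept) or False (reject).
Layer = Callable[[str], bool]


@dataclass
class DefenseInDepthVerifier:
    """Hypothetical layered verifier: a proof is accepted only if every layer accepts it."""

    layers: List[Layer]

    def verify(self, proof: str) -> bool:
        # all() short-circuits, so later (possibly costlier) layers only run
        # on proofs that survived the earlier ones.
        return all(layer(proof) for layer in self.layers)


def conjunctive_false_positive_rate(per_layer_fpr: List[float]) -> float:
    """False-positive rate of the AND of all layers, ASSUMING independent errors."""
    rate = 1.0
    for p in per_layer_fpr:
        rate *= p
    return rate


# Toy string checks standing in for model-based verification passes.
def non_empty(proof: str) -> bool:
    return bool(proof.strip())


def ends_with_qed(proof: str) -> bool:
    return proof.rstrip().endswith("QED")


def no_handwaving(proof: str) -> bool:
    return "obviously" not in proof.lower()


verifier = DefenseInDepthVerifier([non_empty, ends_with_qed, no_handwaving])
print(verifier.verify("By induction on n, the bound holds. QED"))  # True
print(verifier.verify("Obviously the claim holds. QED"))           # False
print(conjunctive_false_positive_rate([0.1, 0.1, 0.1]))
```

Requiring every layer to agree trades false positives for false negatives: a correct proof that trips any single layer is rejected. Positively correlated layer errors, which are likely when all layers share one underlying model, push the real false-positive rate above the p^k product, so that figure is best read as an optimistic estimate.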
📝 Abstract
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single released M3 model. At test time, MaxProof uses the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection. With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
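The abstract describes one model playing four roles (generator, verifier, refiner, ranker) over a population of candidate proofs. The loop below is a minimal sketch of that control flow under stated assumptions, not the paper's procedure: the function `maxproof_style_search`, its hyperparameters, the keep-top-k-by-score population update, and the single-elimination bracket are all hypothetical choices, and the toy stubs merely stand in for model calls so the sketch runs.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    proof: str
    score: float   # verifier's confidence that the proof is correct
    critique: str  # verifier feedback used to condition repair


def maxproof_style_search(
    generate: Callable[[], str],
    verify: Callable[[str], Tuple[float, str]],
    refine: Callable[[str, str], str],
    prefer: Callable[[Candidate, Candidate], Candidate],
    population_size: int = 8,
    rounds: int = 2,
    keep: int = 4,
) -> Candidate:
    """Hypothetical generate -> verify -> refine -> tournament loop."""
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    def evaluate(proof: str) -> Candidate:
        score, critique = verify(proof)
        return Candidate(proof, score, critique)

    # 1. Generator: sample an initial population of candidate proofs.
    population: List[Candidate] = [evaluate(generate()) for _ in range(population_size)]

    for _ in range(rounds):
        # 2. Refiner: repair each candidate conditioned on its critique.
        repaired = [evaluate(refine(c.proof, c.critique)) for c in population]
        # 3. Keep the top-scoring candidates among originals and repairs.
        population = sorted(population + repaired, key=lambda c: c.score, reverse=True)[:keep]

    # 4. Ranker: single-elimination tournament of pairwise comparisons.
    bracket = list(population)
    while len(bracket) > 1:
        next_round = [prefer(a, b) for a, b in zip(bracket[0::2], bracket[1::2])]
        if len(bracket) % 2 == 1:
            next_round.append(bracket[-1])  # odd candidate out gets a bye
        bracket = next_round
    return bracket[0]


# Toy stubs standing in for the model's four roles (illustrative only).
rng = random.Random(0)


def toy_generate() -> str:
    return f"draft-{rng.randint(0, 99)}"


def toy_verify(proof: str) -> Tuple[float, str]:
    # Each "+fix" repair raises the toy score, capped at 1.0.
    return min(1.0, 0.3 + 0.3 * proof.count("+fix")), "tighten step 2"


def toy_refine(proof: str, critique: str) -> str:
    return proof + "+fix"


def toy_prefer(a: Candidate, b: Candidate) -> Candidate:
    return a if a.score >= b.score else b


best = maxproof_style_search(toy_generate, toy_verify, toy_refine, toy_prefer)
print(best.proof, round(best.score, 2))
```

Pairwise comparison in the final stage only asks the ranker which of two proofs is better, which is often easier to judge reliably than an absolute score; a knockout bracket over n survivors needs only n - 1 such comparisons.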
Problem

Research questions and friction points this paper is trying to address.

mathematical proof
test-time scaling
proof generation
proof verification
competition-level mathematics
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time scaling
generative-verifier RL
proof verification
population-based search
mathematical reasoning
Authors
Jiacheng Chen (The Chinese University of Hong Kong): Natural Language Processing, Reinforcement Learning, Optimization
Xinyu Zhang (Fudan University)
Shunkai Zhang (Peking University)
Yanmohan Wang (MiniMax, Tsinghua University)
Lin Li (MiniMax)
Tiancheng Qin (MiniMax)
Qin Wang (ETH Zurich): Domain Adaptation, Computer Vision
Zhengmao Zhu (MiniMax)
Tianle Li (MiniMax)
Jingyang Li (PhD Student, National University of Singapore): Optimization, Deep Learning
Zehan Li (PhD, UTHealth Houston): AI for Mental Health, Psychiatry, Biomedical Informatics, LLMs, Clinical Phenotyping
Binyang Jiang (MiniMax)
Jin Zhu (MiniMax)
Han Ding (MiniMax)
Fei Yu (MiniMax)
Chenyu Du (MiniMax)
Zijian Song (MiniMax)
Jiayuan Song (MiniMax)
Zhi Zhang (MiniMax)
Yunan Huang (MiniMax)
Weiyu Cheng (MiniMax)
Pengyu Zhao (Peking University): Neural Architecture Search, Recommender System, 360-degree Video
Yu Cheng (Professor of Computer Science and Engineering, The Chinese University of Hong Kong): Deep Generative Models, Multimodal Learning, Model Compression