🤖 AI Summary
This white paper introduces Nemotron 3, a family of open models targeting strong agentic capabilities, long-context reasoning, and high inference throughput. The family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture with context lengths of up to 1M tokens. The two larger models, Super and Ultra, are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality, along with MTP (Multi-Token Prediction) layers for faster text generation. All models are post-trained with multi-environment reinforcement learning, enabling reasoning, multi-step tool use, and granular control of the reasoning budget. The three variants cover distinct deployment points: Nano (accuracy leadership at low inference cost), Super (collaborative agents and high-volume workloads), and Ultra (state-of-the-art accuracy and reasoning performance). Model weights, pre- and post-training software, recipes, and all data for which redistribution rights are held will be released openly.
📝 Abstract
We introduce the Nemotron 3 family of models: Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained using multi-environment reinforcement learning, enabling reasoning, multi-step tool use, and granular control of the reasoning budget. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper; Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.
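As background on the Mixture-of-Experts design mentioned above, the sketch below shows generic top-k expert routing: a gating network scores each token, the top-k experts are selected, and their outputs are mixed by softmax gate weights. This is an illustrative toy only, not the Nemotron 3 implementation; the expert count, k, dimensions, and linear experts are all assumptions for the example.

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Toy top-k MoE layer: route each token to its k highest-scoring
    experts and mix their outputs with softmax gate weights.
    x: (tokens, dim); gate_w: (dim, n_experts); experts: list of (dim, dim)."""
    logits = x @ gate_w                        # gating scores, (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]  # indices of top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()                           # softmax over the selected experts
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out

# Tiny demonstration with random weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
dim, n_experts = 8, 4
x = rng.normal(size=(3, dim))
gate_w = rng.normal(size=(dim, n_experts))
experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
y = topk_moe(x, gate_w, experts)
print(y.shape)  # (3, 8)
```

Because only k of the n experts run per token, compute per token stays near that of a much smaller dense layer while total parameter count grows with n, which is the throughput/capacity trade-off the abstract alludes to.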