PithTrain: A Compact and Agent-Native MoE Training System

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Mixture-of-Experts (MoE) training frameworks are expensive to adapt to new architectures and system optimizations, and they are not designed to be understood, operated, or extended efficiently by AI coding agents. This work presents PithTrain, a compact MoE training framework built on four agent-native design principles. It introduces agent-task efficiency (ATE) as a new evaluation dimension, together with the ATE-Bench benchmark of real-world training-framework tasks. PithTrain matches the throughput of production frameworks while improving agent-task efficiency on ATE-Bench, with up to 62% fewer Agent Turns and up to 64% less Active GPU Time.

📝 Abstract

Mixture-of-Experts (MoE) has become the dominant architecture for frontier language models. To meet this demand, production frameworks have built optimized MoE training stacks over years of engineering effort. Yet evolving these stacks for new architectures and system optimizations remains expensive. AI coding agents could automate parts of training-framework development and accelerate this evolution, but applying them to existing frameworks carries hidden costs that today's throughput-only evaluations do not capture. We name this missing dimension agent-task efficiency (ATE): the cost of using coding agents to understand, operate, and extend a framework. Grounded in four agent-native design principles, we build PithTrain, a compact, agent-native MoE training framework. We further introduce ATE-Bench, a benchmark covering real-world training-framework tasks. Our evaluation shows that PithTrain matches the throughput of production frameworks and, on ATE-Bench, enables higher agent-task efficiency, with up to 62% fewer Agent Turns and 64% less Active GPU Time.

Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts

training framework

AI coding agents

agent-task efficiency

framework evolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts

Agent-Native

Agent-Task Efficiency