Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training

📅 2026-01-25

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing energy optimizations for large model training typically target either dynamic or static energy in isolation, and so miss the savings available from treating the two together. This paper introduces Kareus, a training system that optimizes both through a partition-based multi-objective optimization framework. Kareus co-optimizes fine-grained kernel scheduling and frequency scaling, which jointly and interdependently affect both kinds of energy, to improve the trade-off between training time and energy. It decomposes the otherwise intractable joint problem into local, partition-based subproblems and uses a multi-pass optimization algorithm to find execution schedules. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
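
To illustrate why dynamic and static energy have to be considered together, here is a minimal sketch of a toy energy model. It is not the model used by Kareus. It assumes a compute-bound kernel whose run time is inversely proportional to clock frequency, dynamic power that grows with the cube of frequency (a common textbook approximation), and constant static power. The function names, coefficients, and frequency values are all hypothetical.

```python
def step_energy(freq, work=1.0, dyn_coeff=1.0, static_power=2.0):
    """Return (time, dynamic_energy, static_energy) for one unit of work.

    Assumptions (illustrative only, not taken from the paper):
    - run time scales as work / freq (compute-bound kernel)
    - dynamic power scales as dyn_coeff * freq**3
    - static power is constant while the kernel runs
    """
    time = work / freq
    dynamic_energy = dyn_coeff * freq ** 3 * time
    static_energy = static_power * time
    return time, dynamic_energy, static_energy


def total_energy(freq, **kwargs):
    """Sum of dynamic and static energy at a given relative frequency."""
    _, dynamic_energy, static_energy = step_energy(freq, **kwargs)
    return dynamic_energy + static_energy


# Hypothetical relative frequency settings to compare.
freqs = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]
best = min(freqs, key=total_energy)

for f in freqs:
    t, dyn, stat = step_energy(f)
    print(f"freq={f:.1f} time={t:.2f} dynamic={dyn:.2f} static={stat:.2f}")
print("lowest total energy at freq =", best)
```

Under these assumptions, lowering the frequency cuts dynamic energy but stretches run time, so static energy grows. Total energy is lowest at an intermediate setting, which is why optimizing only one component can land on the wrong operating point.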

📝 Abstract

The computing demand of AI is growing at an unprecedented rate, but energy supply is not keeping pace. As a result, energy has become an expensive, contended resource that requires explicit management and optimization. Although recent works have made significant progress in large model training optimization, they focus only on a single aspect of energy consumption: dynamic or static energy. We find that fine-grained kernel scheduling and frequency scaling jointly and interdependently impact both dynamic and static energy consumption. Based on this finding, we design Kareus, a training system that pushes the time-energy tradeoff frontier by optimizing both aspects. Kareus decomposes the intractable joint optimization problem into local, partition-based subproblems. It then uses a multi-pass multi-objective optimization algorithm to find execution schedules that push the time-energy tradeoff frontier. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
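
The time-energy tradeoff frontier mentioned in the abstract can be read as a Pareto frontier over candidate execution schedules. The sketch below shows that general idea only. It is not the Kareus algorithm, and the helper names and the (time, energy) values are hypothetical.

```python
def pareto_frontier(points):
    """Keep the (time, energy) points that no other point dominates.

    A point is dominated if another point is no slower and uses no more
    energy. After sorting by time, a point belongs on the frontier only
    if it uses strictly less energy than every faster point seen so far.
    """
    frontier = []
    best_energy = float("inf")
    for time, energy in sorted(points):
        if energy < best_energy:
            frontier.append((time, energy))
            best_energy = energy
    return frontier


def min_energy_within(frontier, time_budget):
    """Return the lowest-energy frontier point meeting the time budget."""
    feasible = [(t, e) for t, e in frontier if t <= time_budget]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Hypothetical (iteration time, energy) pairs for candidate schedules.
candidates = [(10.0, 50.0), (11.0, 42.0), (12.0, 45.0), (13.0, 38.0), (10.5, 55.0)]

frontier = pareto_frontier(candidates)
print(frontier)
print(min_energy_within(frontier, time_budget=12.0))
```

Read this way, the two headline results are two views of the same frontier: fix the training time and compare energy, or fix the energy and compare training time.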

Problem

Research questions and friction points this paper is trying to address.

dynamic energy

static energy

large model training

energy optimization

joint optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

joint energy optimization

dynamic and static energy

kernel scheduling