AI Summary
To address the conflicting objectives of performance, energy efficiency, and thermal safety when scheduling AI workloads on heterogeneous chiplet-based Processing-in-Memory (PIM) architectures, this paper proposes THERMOS, the first thermal-aware Multi-Objective Reinforcement Learning (MORL) scheduling framework tailored for chiplet-scale PIM. The method jointly models execution time, dynamic power consumption, and on-chip thermal evolution, and a single trained policy can generate Pareto-optimal schedules at runtime. It is technology-agnostic, supporting diverse memory technologies including ReRAM, SRAM, and FeFET. Experimental evaluation shows that, compared to baseline scheduling approaches, the framework achieves an average speedup of up to 1.89× and reduces average energy consumption by up to 57%, while incurring only 0.14% runtime overhead and 0.022% additional energy cost. This work marks the first holistic co-optimization of performance, energy efficiency, and thermal safety at the chiplet granularity in PIM systems.
Abstract
Chiplet-based integration enables large-scale systems that combine diverse technologies, offering higher yield, lower cost, and scalability, which makes them well suited to AI workloads. Processing-in-Memory (PIM) has emerged as a promising solution for AI inference, leveraging technologies such as ReRAM, SRAM, and FeFET, each offering unique advantages and trade-offs. A heterogeneous chiplet-based PIM architecture can harness the complementary strengths of these technologies to deliver higher performance and energy efficiency. However, scheduling AI workloads across such a heterogeneous system is challenging due to competing performance objectives, dynamic workload characteristics, and power and thermal constraints. To address this need, we propose THERMOS, a thermally aware, multi-objective scheduling framework for AI workloads on heterogeneous multi-chiplet PIM architectures. THERMOS trains a single multi-objective reinforcement learning (MORL) policy that can achieve Pareto-optimal execution time, energy, or a balanced objective at runtime, depending on the target preferences. Comprehensive evaluations show that THERMOS achieves up to 89% faster average execution time and 57% lower average energy consumption than baseline AI workload scheduling algorithms, with only 0.14% runtime and 0.022% energy overhead.
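The idea of choosing among Pareto-optimal trade-offs according to a runtime preference can be illustrated with a small sketch. This is not the THERMOS policy, which is a learned MORL agent. It is a hand-written stand-in that assumes each candidate chiplet assignment already has estimated execution time, energy, and peak temperature. All names (`Candidate`, `select`, the candidate labels) and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One hypothetical way to map a workload onto chiplets (made-up estimates)."""
    name: str
    exec_time: float   # estimated execution time, arbitrary units
    energy: float      # estimated energy, arbitrary units
    peak_temp: float   # estimated peak temperature, degrees C


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in both objectives and better in one."""
    no_worse = a.exec_time <= b.exec_time and a.energy <= b.energy
    better = a.exec_time < b.exec_time or a.energy < b.energy
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def select(cands: List[Candidate], w_time: float, w_energy: float,
           temp_limit: float) -> Candidate:
    """Pick a thermally safe Pareto point using a linear preference weighting."""
    safe = [c for c in cands if c.peak_temp <= temp_limit]
    if not safe:
        raise ValueError("no candidate satisfies the thermal limit")
    front = pareto_front(safe)
    # Normalise each objective by its maximum so the weights are comparable.
    t_max = max(c.exec_time for c in front)
    e_max = max(c.energy for c in front)
    return min(front, key=lambda c: w_time * c.exec_time / t_max
               + w_energy * c.energy / e_max)


# Illustrative values only; they are not taken from the paper.
CANDIDATES = [
    Candidate("reram_heavy", exec_time=10.0, energy=2.0, peak_temp=70.0),
    Candidate("sram_heavy", exec_time=4.0, energy=8.0, peak_temp=75.0),
    Candidate("mixed", exec_time=6.0, energy=4.0, peak_temp=72.0),
    Candidate("hot", exec_time=3.0, energy=9.0, peak_temp=95.0),
    Candidate("dominated", exec_time=7.0, energy=5.0, peak_temp=71.0),
]

if __name__ == "__main__":
    for w_time, w_energy in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
        choice = select(CANDIDATES, w_time, w_energy, temp_limit=85.0)
        print(f"w_time={w_time}, w_energy={w_energy}: {choice.name}")
```

With these made-up numbers, a time-only preference selects `sram_heavy`, an energy-only preference selects `reram_heavy`, and an equal weighting selects `mixed`. The fastest candidate, `hot`, is never chosen because it exceeds the assumed 85 degree limit. A learned policy would replace the explicit enumeration and filtering shown here.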