🤖 AI Summary
Scheduling AI workloads on heterogeneous chiplet-based Processing-in-Memory (PIM) architectures involves conflicting objectives: performance, energy efficiency, and thermal safety. This paper proposes THERMOS, a thermal-aware Multi-Objective Reinforcement Learning (MORL) scheduling framework that it presents as the first tailored to chiplet-scale PIM. The method jointly accounts for execution time, energy consumption, and power and thermal constraints, and trains a single policy that can be steered at runtime toward Pareto-optimal execution time, energy, or a balance of the two. It targets heterogeneous systems that combine memory technologies such as ReRAM, SRAM, and FeFET. Compared to baseline AI workload scheduling algorithms, the framework achieves up to 89% faster average execution time and 57% lower average energy consumption, while incurring only 0.14% runtime overhead and 0.022% energy overhead.
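To make "Pareto-optimal" concrete, the following is a minimal sketch of a Pareto filter over candidate schedules, where each candidate is scored by execution time and energy and lower is better for both. The function names and the numeric values are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple

# Each candidate schedule is summarized as (execution_time, energy); lower is better.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (time, energy) scores for four candidate schedules.
candidates = [(1.0, 4.0), (2.0, 1.0), (2.5, 3.0), (1.5, 2.0)]
front = pareto_front(candidates)
```

Here `(2.5, 3.0)` is dropped because `(2.0, 1.0)` is both faster and cheaper in energy; the remaining three candidates each represent a different trade-off.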
📝 Abstract
Chiplet-based integration enables large-scale systems that combine diverse technologies, offering higher yield, lower cost, and scalability that make such systems well suited to AI workloads. Processing-in-Memory (PIM) has emerged as a promising solution for AI inference, leveraging technologies such as ReRAM, SRAM, and FeFET, each offering unique advantages and trade-offs. A heterogeneous chiplet-based PIM architecture can harness the complementary strengths of these technologies to deliver higher performance and energy efficiency. However, scheduling AI workloads across such a heterogeneous system is challenging due to competing performance objectives, dynamic workload characteristics, and power and thermal constraints. To address this need, we propose THERMOS, a thermally aware, multi-objective scheduling framework for AI workloads on heterogeneous multi-chiplet PIM architectures. THERMOS trains a single multi-objective reinforcement learning (MORL) policy that can achieve Pareto-optimal execution time, energy, or a balanced objective at runtime, depending on the target preferences. Comprehensive evaluations show that THERMOS achieves up to 89% faster average execution time and 57% lower average energy consumption than baseline AI workload scheduling algorithms, with only 0.14% runtime and 0.022% energy overhead.
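One common way to let a single policy serve several runtime preferences is to scalarize the objectives with a preference vector and penalize thermal violations. The sketch below illustrates that general idea only; it is not the reward used by THERMOS, and the names `StepCost` and `scalarized_reward`, the 85 °C limit, the penalty weight, and the candidate values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StepCost:
    """Hypothetical cost of one scheduling decision."""
    exec_time: float    # execution time, arbitrary normalized units
    energy: float       # energy, arbitrary normalized units
    peak_temp_c: float  # peak chiplet temperature in degrees Celsius


def scalarized_reward(
    cost: StepCost,
    preference: Tuple[float, float],
    temp_limit_c: float = 85.0,     # assumed thermal limit, not from the paper
    thermal_penalty: float = 10.0,  # assumed penalty per degree of overshoot
) -> float:
    """Linear scalarization: weighted negative cost minus a thermal penalty."""
    w_time, w_energy = preference
    if w_time < 0 or w_energy < 0 or abs(w_time + w_energy - 1.0) > 1e-9:
        raise ValueError("preference weights must be non-negative and sum to 1")
    reward = -(w_time * cost.exec_time + w_energy * cost.energy)
    overshoot = max(0.0, cost.peak_temp_c - temp_limit_c)
    return reward - thermal_penalty * overshoot


# Two hypothetical placements of the same layer: one fast, one energy-frugal.
fast = StepCost(exec_time=1.0, energy=4.0, peak_temp_c=70.0)
frugal = StepCost(exec_time=2.0, energy=1.0, peak_temp_c=60.0)


def best(preference: Tuple[float, float]) -> StepCost:
    """Pick the candidate with the highest scalarized reward for this preference."""
    return max((fast, frugal), key=lambda c: scalarized_reward(c, preference))
```

With the preference `(1.0, 0.0)` the fast placement wins, while `(0.0, 1.0)` selects the frugal one, so changing the preference vector alone changes the chosen schedule. In a preference-conditioned MORL setup the same vector would typically also be fed to the policy as an input.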