🤖 AI Summary
This study proposes the first empirical investigation of the relationship between self-admitted technical debt (SATD) and runtime energy consumption in agentic AI frameworks, addressing a gap between code quality and energy efficiency. Five open-source agentic frameworks will execute a standardized task suite in a strictly controlled environment. SATD instances will be extracted through automated Python-based comment mining and categorized by large language model (LLM) classification using a fine-tuned prompt, while runtime energy will be measured at the hardware level. The study will examine how much technical debt these frameworks carry, how runtime energy consumption varies across architectures, and whether a framework's technical debt correlates with its task-level energy consumption. The results are intended to show whether automated code quality analysis can serve as an early-warning proxy for energy inefficiency, providing empirical grounding for sustainability practices in agentic system development and for green software engineering.
📝 Abstract
Context: Every agentic AI system shipped to production carries two hidden risks: accumulated Technical Debt (TD) and unmonitored runtime energy costs. While functional benchmarking is common, the empirical link between internal structural quality (specifically TD) and dynamic energy consumption during execution remains unexplored, creating a blind spot for practitioners and organizations managing sustainability and operational budgets at scale. Goal: We propose a confirmatory empirical study correlating Self-Admitted Technical Debt (SATD) with hardware-level runtime energy consumption across agentic frameworks, to determine whether code quality can drive energy-aware design decisions. Method: We will evaluate five open-source agentic frameworks by executing a standardized task suite in a strictly controlled environment. SATD will be extracted via automated Python-based comment mining and categorized via LLM-based classification using a fine-tuned prompt, while runtime energy will be measured at the hardware level. Our study will investigate three core research questions: (RQ1) the presence of TD within these frameworks; (RQ2) the variance in runtime energy consumption across architectures; and (RQ3) the statistical correlation between a framework's TD and its task-level energy consumption. Conclusion: The findings will establish whether automated source code analysis can serve as a reliable, early-warning proxy for energy-efficient framework selection, thereby advancing both green software engineering and agentic AI quality research.