Watts and Debts of Agentic Frameworks: An Empirical Study (Registered Report)

📅 2026-06-09
🤖 AI Summary
This registered report proposes an empirical study of the relationship between self-admitted technical debt (SATD) and runtime energy consumption in agentic AI frameworks, a link between code quality and energy efficiency that the authors describe as unexplored. Five open-source agentic frameworks will execute a standardized task suite in a strictly controlled environment; SATD will be extracted through automated Python-based comment mining and categorized by an LLM-based classifier using a fine-tuned prompt, while runtime energy will be measured at the hardware level. As a registered report, the work presents a study plan rather than results: the planned analysis will characterize the SATD present in each framework, compare runtime energy consumption across architectures, and test whether a framework's technical debt correlates with its task-level energy consumption. If such a correlation holds, automated source code analysis could serve as an early-warning proxy for energy inefficiency and inform energy-aware framework selection, supporting both green software engineering and agentic AI quality research.
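The comment mining step mentioned above can be illustrated with a minimal sketch. It assumes SATD candidates are found by matching comment tokens against a keyword pattern; the keyword list, the `mine_satd_comments` helper, and the sample source are hypothetical and are not taken from the study, which additionally categorizes the mined comments with an LLM.

```python
import io
import re
import tokenize

# Hypothetical keyword list: the study's actual SATD patterns are not given in the source.
SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX|workaround)\b", re.IGNORECASE)


def mine_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) for every comment matching SATD_PATTERN."""
    hits = []
    # tokenize separates real comments from '#' characters inside string literals.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and SATD_PATTERN.search(tok.string):
            hits.append((tok.start[0], tok.string.lstrip("#").strip()))
    return hits


# Made-up sample source, for illustration only.
SAMPLE = '''\
def plan(steps):
    # TODO: cache tool results instead of re-calling the model
    url = "http://example.com/#FIXME"
    return steps  # plain explanatory comment
'''

if __name__ == "__main__":
    for line_no, text in mine_satd_comments(SAMPLE):
        print(line_no, text)
```

Using the standard `tokenize` module rather than a line-based regular expression avoids counting the `#FIXME` fragment inside the URL string as a comment.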
📝 Abstract
Context: Every agentic AI system shipped to production carries two hidden risks: accumulated Technical Debt (TD) and unmonitored runtime energy costs. While functional benchmarking is common, the empirical link between internal structural quality (specifically TD) and dynamic energy consumption during execution remains unexplored, creating a blind spot for practitioners and organizations managing sustainability and operational budgets at scale. Goal: We propose a confirmatory empirical study correlating Self-Admitted Technical Debt (SATD) with hardware-level runtime energy consumption across agentic frameworks, to determine whether code quality can drive energy-aware design decisions. Method: We will evaluate five open-source agentic frameworks by executing a standardized task suite in a strictly controlled environment. SATD will be extracted via automated Python-based comment mining and categorized via LLM-based classification using a fine-tuned prompt, while runtime energy will be measured at the hardware level. Our study will investigate three core research questions: (RQ1) the presence of TD within these frameworks; (RQ2) the variance in runtime energy consumption across architectures; and (RQ3) the statistical correlation between a framework's TD and its task-level energy consumption. Conclusion: The findings will establish whether automated source code analysis can serve as a reliable, early-warning proxy for energy-efficient framework selection, thereby advancing both green software engineering and agentic AI quality research.
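As an illustration of the kind of analysis RQ3 calls for, the sketch below computes a rank correlation between SATD counts and energy readings. The abstract does not name a correlation statistic, so the choice of Spearman's rho is an assumption, and the numbers are placeholder values invented for the example, not measurements from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values only (one pair per hypothetical observation).
satd_counts = [3, 10, 7, 1, 5]
energy_joules = [40.0, 90.0, 95.0, 20.0, 60.0]

if __name__ == "__main__":
    print(round(spearman_rho(satd_counts, energy_joules), 3))
```

A rank-based statistic is used here because it does not assume a linear relationship or normally distributed energy readings. With only a handful of observations, any coefficient of this kind would carry wide uncertainty.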
Problem

Research questions and friction points this paper is trying to address.

Technical Debt
Energy Consumption
Agentic Frameworks
Sustainability
Runtime Costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Technical Debt
Energy Consumption
Agentic AI
Self-Admitted Technical Debt
Green Software Engineering