🤖 AI Summary
This study addresses a gap in energy- and carbon-aware scheduling of scientific workflows on heterogeneous clusters: the runtime and energy predictions such scheduling needs are typically available only for workflows that have already been executed. The authors propose Augur, a lightweight approach that profiles both the available cluster infrastructure and the workflow at hand to predict task runtime and energy consumption before execution. It covers new or modified workflows, different input data, and alternative infrastructure while relying on only minimal historical execution data. In an evaluation on public and private cloud infrastructure, Augur predicts total workflow energy with a median error of 16.3 ± 15.3% relative to Ichnos (an energy estimation method based on fitted power models) and 18.2 ± 14.7% relative to Intel RAPL. It also outperforms two state-of-the-art methods in predicting both task runtime and total workflow energy.
📝 Abstract
Scientific workflows are widely used to process large quantities of data, leading to significant energy consumption and carbon emissions. To reduce this environmental impact, energy- and carbon-aware scheduling approaches could be employed. However, such methods require runtime and energy predictions, which are typically only available for workflows that have been executed previously. Meanwhile, scientists may execute new or modified workflows, use workflows with different input data, or run them on alternative infrastructure. To address this critical gap, we propose Augur, a novel method to predict the energy consumption of scientific workflow tasks prior to execution. By efficiently profiling both the available cluster infrastructure and the workflow at hand, Augur is capable of predicting the overall energy consumption of the workflow with a median prediction error of $16.3\pm15.3\%$ compared to Ichnos, an energy estimation method that uses fitted power models, and $18.2\pm14.7\%$ compared to Intel RAPL, as observed in our experimental evaluation on public and private cloud infrastructure. Relying on only minimal historical execution data, Augur outperforms two state-of-the-art methods in predicting both task runtime and total workflow energy, providing a robust foundation for energy-efficient and carbon-aware scientific data analysis.
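
To make the headline metric concrete, the following is a minimal sketch of how a median percentage error against a reference energy estimate could be computed. It is not Augur's evaluation code: the function name `relative_errors_pct`, the sample energy values, and the choice of a standard deviation for the spread are all assumptions, since the abstract does not state what the ± term denotes.

```python
from statistics import median, pstdev


def relative_errors_pct(predicted, reference):
    """Absolute percentage error of each prediction against its reference value."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [100.0 * abs(p - r) / r for p, r in zip(predicted, reference)]


# Hypothetical total-energy values in kilojoules for four workflow runs.
# These numbers are invented for illustration and are not from the paper.
predicted_kj = [110.0, 90.0, 240.0, 300.0]
reference_kj = [100.0, 100.0, 200.0, 400.0]

errors = relative_errors_pct(predicted_kj, reference_kj)
median_error = median(errors)

# Assumption: the spread is reported as a population standard deviation.
spread = pstdev(errors)

print(f"median error: {median_error:.1f}% (spread {spread:.1f}%)")
```

In this sketch the reference values stand in for what a tool such as Ichnos or Intel RAPL would report for the executed workflow, and the predicted values stand in for estimates made before execution.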