AI Summary
HPX suffers from suboptimal C++ Executor performance on heterogeneous hardware due to static resource allocation. To address this, we propose a cores-aware and chunking-aware adaptive executor model that dynamically monitors runtime load, models scheduling overhead, and heuristically adjusts task chunking and core binding, enabling online optimization for both compute- and memory-bound workloads within HPX. Our design fully conforms to the standard C++20 Executor interface and requires no modifications to user code. Experimental evaluation across diverse hardware configurations and representative parallel workloads demonstrates speedups of 1.4–2.3× over baseline static strategies, confirming substantial performance gains while preserving portability and standards compliance.
Abstract
C++ Executors simplify the development of parallel algorithms by abstracting concurrency management across hardware architectures. They are designed to facilitate portability and uniformity of user-facing interfaces; however, in some cases they may lead to performance inefficiencies due to suboptimal resource allocation for a particular workload or failure to leverage certain hardware-specific capabilities. To mitigate these inefficiencies, we have developed a strategy based on cores and chunking (workload partitioning) and integrated it into HPX's executor API. This strategy dynamically optimizes workload distribution and resource allocation based on runtime metrics and overheads. In this paper, we introduce the model behind this strategy and evaluate its efficiency by testing its implementation (as an HPX executor) on both compute-bound and memory-bound workloads. The results show speedups across all tests, configurations, and workloads studied, offering improved performance through a familiar and user-friendly C++ executor API.