🤖 AI Summary
This work addresses the scheduling of partially replicable task chains (e.g., the digital communication standards implemented in SDR) on heterogeneous multicore platforms, with the goal of maximizing throughput while minimizing power consumption. We formulate the problem as a pipelined workflow scheduling problem on two types of resources (big and little cores) that combines pipelined and replicated parallelism. To solve it, we propose (i) FERTAC and 2CATAC, two greedy heuristics, and (ii) HeRAD, an optimal dynamic programming algorithm. Experiments show that, on average, FERTAC and 2CATAC produce periods less than 10% worse than HeRAD's while using fewer than two extra cores. With an implementation of the DVB-S2 standard running on StreamPU, the three strategies achieve throughputs within 8% of their theoretical maximums on average.
📝 Abstract
The arrival of heterogeneous (or hybrid) multicore architectures on parallel platforms has brought new performance opportunities for applications and efficiency opportunities for systems. They have also increased the challenges related to thread scheduling, as tasks' execution times vary depending on whether they are placed on big (performance) cores or little (efficient) ones. In this paper, we focus on the challenges that heterogeneous multicore processors bring to partially-replicable task chains, such as the ones that implement digital communication standards in Software-Defined Radio (SDR). Our objective is to maximize the throughput of these task chains while also minimizing their power consumption. We model this problem as a pipelined workflow scheduling problem using pipelined and replicated parallelism on two types of resources, with the objectives of minimizing the period and using as many little cores as necessary. We propose two greedy heuristics (FERTAC and 2CATAC) and one optimal dynamic programming solution (HeRAD) to the problem. We evaluate our solutions and compare the quality of their schedules (in period and resource utilization) and their execution times using synthetic task chains and an implementation of the DVB-S2 communication standard running on StreamPU. Our results demonstrate the benefits and drawbacks of the different proposed solutions. On average, FERTAC and 2CATAC achieve near-optimal solutions, with periods less than 10% worse than the optimal (HeRAD) while using fewer than 2 extra cores. These three scheduling strategies now enable programmers and users of StreamPU to transparently make use of heterogeneous multicore processors and achieve throughputs that differ from their theoretical maximums by less than 8% on average.
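To make the period objective concrete, here is a minimal Python sketch of one possible cost model and an exhaustive dynamic program that minimizes the period under a budget of big and little cores. It is not HeRAD, FERTAC, or 2CATAC. The model is an assumption made for illustration: each task has one weight per core type, a stage is a contiguous interval of tasks mapped to cores of a single type, a stage may be replicated only if all of its tasks are replicable, and the period is the maximum stage period. All names (`stage_period`, `optimal_period`, `w_big`, `w_little`) are hypothetical, and the secondary objective on core usage is ignored.

```python
from functools import lru_cache
from math import inf


def stage_period(weights, replicable, i, j, cores):
    """Period of a stage running tasks i..j-1 on `cores` cores of one type.

    Assumed model: a stage can be replicated over several cores only if
    every task in it is replicable; otherwise it must run on one core.
    """
    work = sum(weights[i:j])
    if cores == 1:
        return work
    if all(replicable[i:j]):
        return work / cores  # replicas process successive items in parallel
    return inf  # a sequential task forbids replication of its stage


def optimal_period(w_big, w_little, replicable, big, little):
    """Minimal period of the chain using at most `big` big cores and
    `little` little cores (hypothetical exhaustive DP, not HeRAD)."""
    n = len(w_big)

    @lru_cache(maxsize=None)
    def best(j, b, l):
        # Minimal period for the first j tasks with b big and l little cores left.
        if j == 0:
            return 0.0
        result = inf
        for i in range(j):  # the last stage covers tasks i..j-1
            for k in range(1, b + 1):  # last stage on k big cores
                p = stage_period(w_big, replicable, i, j, k)
                if p < result:  # otherwise this choice cannot improve
                    result = min(result, max(p, best(i, b - k, l)))
            for k in range(1, l + 1):  # last stage on k little cores
                p = stage_period(w_little, replicable, i, j, k)
                if p < result:
                    result = min(result, max(p, best(i, b, l - k)))
        return result

    return best(n, big, little)


if __name__ == "__main__":
    # Three tasks; the middle one is sequential (not replicable).
    print(optimal_period([4, 2, 6], [8, 4, 12], [True, False, True], 2, 2))
```

With two big and two little cores, the sketch finds a period of 6 for this chain, for example by replicating the first task on both little cores and giving each remaining task its own big core. The exhaustive search grows quickly with the chain length and the core counts, which is why cheaper heuristics are attractive when scheduling must be fast.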