Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency

📅 2025-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Hardware prefetchers that share limited resources often suffer from request conflicts, and existing selection mechanisms have two critical deficiencies: inaccurate demand request allocation and coarse-grained decision making. This paper proposes a mechanism that combines dynamic, demand-aware request allocation with fine-grained prefetcher selection. It introduces customized request routing, which the summary describes as novel for prefetching, to enable timing-sensitive prefetch table management and low-overhead pattern matching. It also presents Alecto, a lightweight, rule-based algorithm with adaptive decision logic that routes demand requests to prefetchers in real time. In the evaluation, Alecto outperforms the RL-based Bandit baseline by 2.76% in single-core and 7.56% in 8-core configurations, outperforms Bandit by 5.25% on memory-intensive benchmarks, reduces prefetcher table access energy by 48%, and adds less than 1 KB of storage overhead.

📝 Abstract

Hardware prefetching plays a critical role in hiding off-chip DRAM latency. The complexity of applications results in a wide variety of memory access patterns, prompting the development of numerous cache-prefetching algorithms. Consequently, commercial processors often employ a hybrid of these algorithms to enhance overall prefetching performance. Nonetheless, since these prefetchers share hardware resources, conflicts arising from competing prefetching requests can negate the benefits of hardware prefetching. Several prefetcher selection algorithms have been proposed to mitigate these conflicts. However, these prior solutions suffer from two limitations. First, the input demand request allocation is inaccurate. Second, the prefetcher selection criteria are coarse-grained. In this paper, we address both limitations by introducing Alecto, an efficient and widely applicable prefetcher selection algorithm that tailors the demand requests for each prefetcher. Every demand request is first sent to Alecto, which identifies suitable prefetchers, and is then routed to those prefetchers for training and prefetching. Our analysis shows that Alecto is adept at not only harmonizing prefetching accuracy, coverage, and timeliness but also significantly enhancing the utilization of the prefetcher table, which is vital for temporal prefetching. Alecto outperforms Bandit, the state-of-the-art RL-based prefetcher selection algorithm, by 2.76% in single-core and 7.56% in eight-core configurations. For memory-intensive benchmarks, Alecto outperforms Bandit by 5.25%. Alecto consistently delivers state-of-the-art performance in scheduling various types of cache prefetchers. In addition to the performance improvement, Alecto reduces the energy consumed by accesses to the prefetchers' tables by 48%, while adding less than 1 KB of storage overhead.
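The routing idea in the abstract (each demand request goes to a selector first, and only then to the prefetchers judged suitable) can be illustrated with a small sketch. This is not Alecto's actual algorithm, which this page does not specify: the `DemandRouter` class, the per-PC saturating score, the `block_at` and `cap` thresholds, and both toy prefetchers are assumptions made purely for illustration.

```python
from collections import defaultdict


class NextLinePrefetcher:
    """Toy prefetcher (hypothetical): always predicts the next block."""

    def __init__(self):
        self.table_accesses = 0  # how often this prefetcher was trained

    def train_and_predict(self, pc, addr):
        self.table_accesses += 1
        return addr + 1


class StridePrefetcher:
    """Toy prefetcher (hypothetical): predicts addr + last stride per PC."""

    def __init__(self):
        self.table_accesses = 0
        self.last_addr = {}

    def train_and_predict(self, pc, addr):
        self.table_accesses += 1
        prev = self.last_addr.get(pc)
        self.last_addr[pc] = addr
        return None if prev is None else addr + (addr - prev)


class DemandRouter:
    """Assumed selector: sends a demand request only to prefetchers
    whose per-PC score has not dropped to the blocking threshold."""

    def __init__(self, prefetchers, block_at=-3, cap=3):
        self.prefetchers = prefetchers      # name -> prefetcher object
        self.block_at = block_at            # score at which routing stops
        self.cap = cap                      # saturating upper bound
        self.score = defaultdict(int)       # (pc, name) -> usefulness score
        self.pending = defaultdict(set)     # predicted addr -> {(pc, name)}

    def access(self, pc, addr):
        # Reward every (pc, prefetcher) pair that predicted this address.
        for key in self.pending.pop(addr, ()):
            self.score[key] = min(self.cap, self.score[key] + 2)
        issued = []
        for name, pf in self.prefetchers.items():
            key = (pc, name)
            if self.score[key] <= self.block_at:
                continue  # blocked: no training, no table access
            target = pf.train_and_predict(pc, addr)
            if target is not None:
                self.score[key] -= 1  # each issued prefetch costs one point
                self.pending[target].add(key)
                issued.append((name, target))
        return issued


stride, next_line = StridePrefetcher(), NextLinePrefetcher()
router = DemandRouter({"stride": stride, "next_line": next_line})
for i in range(10):
    router.access(pc=0x400, addr=100 + 8 * i)  # constant stride of 8
print(stride.table_accesses, next_line.table_accesses)  # prints: 10 3
```

On this stride-8 stream the next-line prefetcher never produces a useful prefetch, so the router stops sending it requests for that PC after three attempts, while the stride prefetcher keeps being trained. That is the sense in which per-request routing can cut table accesses in this toy model. The sketch omits things a hardware design would need, such as re-probing blocked prefetchers and bounding the pending set.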

Problem

Research questions and friction points this paper is trying to address.

Competing requests from prefetchers that share hardware resources can negate the benefits of prefetching

Prior selection algorithms allocate input demand requests to prefetchers inaccurately

Prior prefetcher selection criteria are coarse-grained

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic request allocation for prefetcher selection

Tailored demand requests for each prefetcher

Efficient prefetcher table utilization enhancement
