Compiler Support for Speculation in Decoupled Access/Execute Architectures

📅 2025-01-23

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Irregular code is bottlenecked by memory and communication latency, and conventional decoupled access/execute (DAE) architectures lose decoupling when the access slice depends on control decisions made in the execute slice. Method: This paper extends DAE to control-dependent code with compiler-driven speculation. The access slice issues memory requests speculatively, and the execute slice poisons mis-speculated requests, so no replays or explicit synchronization are needed. The transformation operates on the control-flow graph, handles arbitrary reducible control flow, and is proven to preserve sequential consistency. Contribution/Results: The approach lifts a key limitation of traditional DAE in control-dependent code and applies to a range of hardware, including CPU/GPU prefetchers, CGRAs, and domain-specific accelerators, enabling DAE on irregular programs that previously could not be decoupled.
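The central idea of issuing memory requests in the access slice before the controlling branch is resolved can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not the authors' compiler transformation: the loop, the function names (access_slice, execute_slice, run_decoupled) and the two queues are hypothetical, unbounded deques stand in for hardware FIFOs, and every speculatively computed address is assumed safe to load. The access slice loads A[idx[i]] in every iteration; the execute slice dequeues every value and ignores those whose branch was not taken.

```python
from collections import deque


def sequential(A, idx, B, threshold):
    """Reference: the original loop executed in program order."""
    acc = 0
    for i in range(len(idx)):
        if B[i] > threshold:  # branch depends on loaded data
            acc += A[idx[i]]
    return acc


def access_slice(A, idx, B, b_q, a_q):
    """Hypothetical access slice: never waits for the branch outcome.

    The load of A[idx[i]] is control dependent on B[i] > threshold,
    which is evaluated in the execute slice. Instead of stalling, the
    access slice speculates that the branch is taken and issues the
    load in every iteration. Assumes every idx[i] is a valid index,
    so a speculative load can never fault.
    """
    for i in range(len(idx)):
        b_q.append(B[i])       # non-speculative load
        a_q.append(A[idx[i]])  # speculative load


def execute_slice(n, threshold, b_q, a_q):
    """Hypothetical execute slice: consumes one value per queue per iteration."""
    acc = 0
    for _ in range(n):
        b = b_q.popleft()
        a = a_q.popleft()  # always dequeue to stay in step with the access slice
        if b > threshold:
            acc += a       # speculation was correct: use the value
        # otherwise the value was mis-speculated and is simply dropped
    return acc


def run_decoupled(A, idx, B, threshold):
    b_q, a_q = deque(), deque()
    # Running the access slice to completion first is one legal schedule:
    # it never reads anything produced by the execute slice.
    access_slice(A, idx, B, b_q, a_q)
    return execute_slice(len(idx), threshold, b_q, a_q)


A = [10, 20, 30, 40]
idx = [3, 0, 2, 1]
B = [5, -1, 7, 0]
print(run_decoupled(A, idx, B, 0), sequential(A, idx, B, 0))  # 70 70
```

Because the access slice has no dependence on the execute slice, it can run arbitrarily far ahead; in this model a mis-speculated load costs memory bandwidth but does not affect the result.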


📝 Abstract

Irregular codes are bottlenecked by memory and communication latency. Decoupled access/execute (DAE) is a common technique to tackle this problem. It relies on the compiler to separate memory address generation from the rest of the program; however, such a separation is not always possible due to control and data dependencies between the access and execute slices, resulting in a loss of decoupling. In this paper, we present compiler support for speculation in DAE architectures that preserves decoupling in the face of control dependencies. We speculate memory requests in the access slice and poison mis-speculations in the execute slice without the need for replays or synchronization. Our transformation works on arbitrary, reducible control flow and is proven to preserve sequential consistency. We show that our approach applies to a wide range of architectural work on CPU/GPU prefetchers, CGRAs, and accelerators, enabling DAE on a wider range of codes than before.
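Speculative stores need more care than loads, because a mis-speculated store must never reach memory. The Python model below shows one plausible scheme consistent with the abstract's description of poisoning mis-speculations in the execute slice: the access slice sends every store address before the branch resolves, the execute slice sends a matching (value, poison) pair, and a memory-side unit drops poisoned stores. All names are hypothetical, and the model assumes the loop never reads out, so no load can alias a pending store; the paper's actual mechanism, including its treatment of aliasing, may differ.

```python
from collections import deque


def sequential(A, idx, threshold, n_out):
    """Reference: the original loop with a control-dependent store."""
    out = [0] * n_out
    for i in range(len(A)):
        v = A[i]
        if v > threshold:
            out[idx[i]] = 2 * v
    return out


def access_slice(A, idx, load_q, st_addr_q):
    """Hypothetical access slice: sends every store address speculatively."""
    for i in range(len(A)):
        load_q.append(A[i])       # load that feeds the branch
        st_addr_q.append(idx[i])  # store address, sent before the branch resolves


def execute_slice(n, threshold, load_q, st_val_q):
    """Hypothetical execute slice: pairs each address with (value, poison)."""
    for _ in range(n):
        v = load_q.popleft()
        taken = v > threshold
        # A value is sent in every iteration so it lines up with its address;
        # poison=True marks a store whose branch was not taken.
        st_val_q.append((2 * v, not taken))


def commit_stores(out, st_addr_q, st_val_q):
    """Hypothetical memory-side unit: commits stores in program order."""
    while st_addr_q:
        addr = st_addr_q.popleft()
        value, poisoned = st_val_q.popleft()
        if not poisoned:
            out[addr] = value
        # a poisoned store is discarded: no replay, no rollback


def run_decoupled_stores(A, idx, threshold, n_out):
    out = [0] * n_out
    load_q, st_addr_q, st_val_q = deque(), deque(), deque()
    access_slice(A, idx, load_q, st_addr_q)
    execute_slice(len(A), threshold, load_q, st_val_q)
    commit_stores(out, st_addr_q, st_val_q)
    return out


A = [3, -2, 5, 1]
idx = [0, 0, 1, 0]
print(run_decoupled_stores(A, idx, 2, 2))  # [6, 10]
print(sequential(A, idx, 2, 2))            # [6, 10]
```

In this example, committing the poisoned stores as well would leave out[0] equal to 2 instead of 6, which is why the memory-side unit must drop them rather than write them.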

Problem

Research questions and friction points this paper is trying to address.

Memory and Communication Latency

Irregular Code Processing

Loss of Decoupling under Control Dependencies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compiler-Driven Speculation in DAE

Arbitrary Reducible Control Flow

Hardware Platform Versatility
