🤖 AI Summary
Modern compilers (e.g., LLVM, GCC) struggle to fully exploit SIMD parallelism (e.g., AVX, RVV) because of complex control-flow analysis, limited IR expressiveness, and fragmented vectorization pipelines. This work proposes a scalable vectorization framework. First, it extends LLVM IR with two novel representations: Structured IR (SIR), which explicitly encodes control-flow structure, and Vector IR (VIR), which precisely models data dependencies. Second, it integrates CFG reconstruction, fine-grained dependency analysis, and cross-IR pattern matching to improve both the accuracy of vectorization-opportunity detection and the completeness of the search space. Evaluation on standard benchmarks shows speedups of up to 53% over LLVM and 58% over GCC, while significantly improving vectorization coverage and end-to-end compilation efficiency.
📝 Abstract
Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit available vectorization opportunities due to disjoint vectorization passes and limited extensibility. Although recent work on heuristics and intermediate representation (IR) design has attempted to address these problems, efficiently simplifying control-flow analysis and accurately identifying vectorization opportunities remain challenging tasks.
To address these issues, we introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents instruction dependencies through data dependency analysis. Leveraging the detailed dependency information provided by VIR, we develop a flexible and extensible vectorization framework. This approach substantially improves interoperability across vectorization passes and expands the search space for identifying isomorphic instructions, ultimately enhancing both the scope and efficiency of automatic vectorization. Experimental evaluations demonstrate that our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.