🤖 AI Summary
Existing benchmark suites often lack structural complexity, semantic validity, and cross-language coverage. This limits their usefulness for systematic performance evaluation of compilers, runtimes, and hardware, a goal distinct from bug detection.
Method: This paper introduces a formal L-system–based methodology for generating large-scale, semantically correct, structurally intricate artificial benchmarks across C, C++, Julia, and Go. It adapts biological L-system rewriting mechanisms to program generation, integrating multi-language grammar embeddings, iterative production rules, and compiler behavior modeling to ensure syntactic well-formedness, scalability, and semantic legality.
Contribution/Results: The generated benchmarks support six case studies: a comparison of Clang and GCC; a comparison of C, C++, Julia, and Go; the historical evolution of GCC in terms of code quality; the effects of profile-guided optimization (PGO) in GCC; the asymptotic behavior of the phases of Clang's compilation pipeline; and a comparison of the data structures in GLib. Together they show how L-system-generated programs can serve performance evaluation, a task that bug-oriented fuzzers such as CSmith were not designed for.
📝 Abstract
L-systems are a mathematical formalism proposed by biologist Aristid Lindenmayer with the aim of simulating organic structures such as trees, snowflakes, flowers, and other branching phenomena. They are implemented as a formal language that defines how patterns can be iteratively rewritten. This paper describes how such a formalism can be used to create artificial programs written in programming languages such as C, C++, Julia, and Go. These programs, being large and complex, can be used to test the performance of compilers, operating systems, and computer architectures. This paper demonstrates the usefulness of these benchmarks through six case studies: a comparison between clang and gcc; a comparison between C, C++, Julia, and Go; a study of the historical evolution of gcc in terms of code quality; a look into the effects of profile-guided optimizations in gcc; an analysis of the asymptotic behavior of the different phases of clang's compilation pipeline; and a comparison between the many data structures available in the GNOME Library (GLib). These case studies demonstrate the benefits of the L-system approach to creating benchmarks, compared with fuzzers such as CSmith, which were designed to uncover bugs in compilers rather than to evaluate their performance.
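To make the idea of iterative rewriting concrete, the following minimal sketch applies a single L-system production in parallel and then interprets the resulting word as a small C program. The alphabet, the production `S -> a[S]S`, and the function names `rewrite` and `render_c` are illustrative assumptions; they are not the grammar or the tooling described in the paper.

```python
# Hypothetical L-system, chosen only for illustration:
#   S  placeholder that is expanded on every iteration
#   a  a simple statement
#   [  opens a nested loop, ] closes it
RULES = {"S": "a[S]S"}


def rewrite(axiom: str, rules: dict, iterations: int) -> str:
    """Apply all productions in parallel, `iterations` times."""
    word = axiom
    for _ in range(iterations):
        # Symbols without a production are copied unchanged.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def render_c(word: str) -> str:
    """Interpret an L-system word as the body of a C `main` function."""
    lines = ["#include <stdio.h>", "", "int main(void) {", "    long x = 0;"]
    depth = 1  # current nesting level, used for indentation and loop names
    for symbol in word:
        indent = "    " * depth
        if symbol == "a":
            lines.append(f"{indent}x += 1;")
        elif symbol == "[":
            var = f"i{depth}"  # one loop variable per nesting level
            lines.append(f"{indent}for (int {var} = 0; {var} < 2; {var}++) {{")
            depth += 1
        elif symbol == "]":
            depth -= 1
            lines.append("    " * depth + "}")
        elif symbol == "S":
            # Unexpanded placeholders still become a valid statement.
            lines.append(f"{indent}x ^= 1;")
    lines.append('    printf("%ld\\n", x);')
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)


# rewrite("S", RULES, 2) -> "a[a[S]S]a[S]S"
program = render_c(rewrite("S", RULES, 3))
```

Because every `[` in the production is paired with a `]`, each rewriting step preserves balanced nesting, so the rendered program stays syntactically well-formed while its size grows with the number of iterations. Printing `program` and passing the result to a C compiler is one way to try the sketch.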