An Empirical Comparison of General Context-Free Parsers

📅 2026-06-07

🤖 AI Summary

This study addresses the long-standing trade-off between expressiveness and performance in parsing by systematically evaluating general context-free parsers against deterministic baselines. Deterministic parsers such as LL(1) and LR(1) constrain language design; generalized parsers accept any context-free grammar but have lacked a comprehensive empirical assessment. The authors implement six generalized algorithms (CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR), together with LL(1) and LR(1) baselines, in a unified Rust framework, and benchmark them on 22 grammars ranging from arithmetic expressions to full C++ and Java specifications. The analysis shows that the overhead of generalized parsing is substantially lower than commonly assumed: on deterministic grammars, GLR-family parsers show a median slowdown of about 3x relative to LR(1), with low variance, which makes them a practical default for real-world tools that need full context-free expressiveness.
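To make "general context-free parsing" concrete, here is a minimal Earley recognizer sketch in Rust, the language the authors used. It is not the paper's implementation: the names `Sym`, `Rule`, `add`, and `earley_recognize` are hypothetical, it only recognizes (no parse-tree extraction), and it assumes a grammar with no epsilon productions so the classic predict/scan/complete loop needs no nullable-symbol handling. The demo grammar E → E + E | n is ambiguous and left-recursive, so no LL(1) or LR(1) parser accepts it as written, yet Earley handles it directly.

```rust
use std::collections::HashSet;

/// Grammar symbol: a terminal character or a nonterminal id (hypothetical encoding).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Sym {
    T(char),
    N(usize),
}

/// A production `lhs -> rhs`, where `lhs` is a nonterminal id.
struct Rule {
    lhs: usize,
    rhs: Vec<Sym>,
}

/// Earley item: (rule index, dot position, origin position).
type Item = (usize, usize, usize);

/// Adds `item` to chart set `k` unless it is already present.
fn add(chart: &mut [Vec<Item>], seen: &mut [HashSet<Item>], k: usize, item: Item) {
    if seen[k].insert(item) {
        chart[k].push(item);
    }
}

/// Returns true if `input` derives from `start`. Assumes no epsilon productions.
fn earley_recognize(rules: &[Rule], start: usize, input: &[char]) -> bool {
    let n = input.len();
    let mut chart: Vec<Vec<Item>> = vec![Vec::new(); n + 1];
    let mut seen: Vec<HashSet<Item>> = vec![HashSet::new(); n + 1];

    // Seed set 0 with every rule for the start symbol.
    for (r, rule) in rules.iter().enumerate() {
        if rule.lhs == start {
            add(&mut chart, &mut seen, 0, (r, 0, 0));
        }
    }
    for k in 0..=n {
        let mut i = 0;
        // chart[k] can grow while we process it, so walk it by index.
        while i < chart[k].len() {
            let (r, dot, origin) = chart[k][i];
            i += 1;
            match rules[r].rhs.get(dot) {
                // Predict: start every rule for the expected nonterminal at k.
                Some(&Sym::N(nt)) => {
                    for (r2, rule2) in rules.iter().enumerate() {
                        if rule2.lhs == nt {
                            add(&mut chart, &mut seen, k, (r2, 0, k));
                        }
                    }
                }
                // Scan: a matching terminal moves the dot into set k + 1.
                Some(&Sym::T(c)) => {
                    if k < n && input[k] == c {
                        add(&mut chart, &mut seen, k + 1, (r, dot + 1, origin));
                    }
                }
                // Complete: advance items in set `origin` that were waiting for this lhs.
                None => {
                    let lhs = rules[r].lhs;
                    let waiting: Vec<Item> = chart[origin]
                        .iter()
                        .copied()
                        .filter(|&(r2, d2, _)| rules[r2].rhs.get(d2) == Some(&Sym::N(lhs)))
                        .collect();
                    for (r2, d2, o2) in waiting {
                        add(&mut chart, &mut seen, k, (r2, d2 + 1, o2));
                    }
                }
            }
        }
    }
    // Accept if a completed start rule spans the whole input.
    chart[n].iter().any(|&(r, dot, origin)| {
        origin == 0 && rules[r].lhs == start && dot == rules[r].rhs.len()
    })
}

fn main() {
    // E -> E '+' E | 'n'   (nonterminal 0 is E)
    let rules = vec![
        Rule { lhs: 0, rhs: vec![Sym::N(0), Sym::T('+'), Sym::N(0)] },
        Rule { lhs: 0, rhs: vec![Sym::T('n')] },
    ];
    for s in ["n+n+n", "n+", "+n"] {
        let input: Vec<char> = s.chars().collect();
        println!("{s}: {}", earley_recognize(&rules, 0, &input));
    }
}
```

Deduplicating items per chart set is what keeps left recursion from looping forever; a production implementation would also build a shared parse forest rather than a yes/no answer.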

📝 Abstract

Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers deployed in practice are deterministic (LL or LR), forcing developers not only to contort their grammars to fit the parser, but also to simplify the very languages they design, sacrificing expressiveness for the sake of parseability. General context-free parsers eliminate this constraint. Yet, despite decades of algorithmic development, no rigorous head-to-head comparison exists across the major families of parsing algorithms. We present the first unified, controlled benchmark of six generalized parsing algorithms (CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR) plus deterministic LL(1) and LR(1) baselines, all implemented in Rust with shared data structures and parse-tree extraction, and evaluated on 22 grammars ranging from simple expressions to full C++ and Java. Our results show that the cost of generality is lower than widely assumed. On deterministic grammars, the GLR family incurs only a 3x median slowdown over LR(1), with narrow, predictable variance. GLR is the clear performance winner among generalized parsers and a practical default choice for software engineering tools.
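The headline "3x median slowdown" is an aggregate over grammars. A minimal sketch of one plausible aggregation follows, assuming a per-grammar ratio of general-parser time to LR(1) time and then a median over grammars; the exact procedure is not spelled out here, `median_slowdown` is a hypothetical helper, and the timings in `main` are toy values, not results from the paper.

```rust
/// Median of per-grammar slowdown ratios (general-parser time / LR(1) time).
/// Returns None for empty or mismatched inputs. Hypothetical helper.
fn median_slowdown(general_ms: &[f64], lr1_ms: &[f64]) -> Option<f64> {
    if general_ms.is_empty() || general_ms.len() != lr1_ms.len() {
        return None;
    }
    // One ratio per grammar.
    let mut ratios: Vec<f64> = general_ms
        .iter()
        .zip(lr1_ms)
        .map(|(g, l)| g / l)
        .collect();
    // total_cmp gives a total order on f64, so sorting never panics on NaN.
    ratios.sort_by(|a, b| a.total_cmp(b));
    let mid = ratios.len() / 2;
    Some(if ratios.len() % 2 == 1 {
        ratios[mid]
    } else {
        (ratios[mid - 1] + ratios[mid]) / 2.0
    })
}

fn main() {
    // Toy timings in milliseconds, NOT data from the paper.
    let glr = [3.0, 6.0, 30.0];
    let lr1 = [1.0, 2.0, 5.0];
    // Ratios are 3.0, 3.0, 6.0, so this prints "median slowdown: Some(3.0)".
    println!("median slowdown: {:?}", median_slowdown(&glr, &lr1));
}
```

Using the median rather than the mean keeps a single grammar with an unusually large ratio from inflating the headline number.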

Problem

Research questions and friction points this paper is trying to address.

general context-free parsing

parser comparison

grammar expressiveness

deterministic parsing

parsing performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

generalized parsing

GLR

empirical benchmark