🤖 AI Summary
This paper investigates the computational complexity of learning regular expressions in the PAC model and in the membership query (MQ) framework. For improper learning, it establishes two fundamental hardness results: (1) PAC learning regular expressions is computationally hard even under the uniform distribution; (2) learning with membership queries is hard in the distribution-free setting, and remains hard even under the uniform distribution when regular expressions are extended with operations such as intersection or complement. Crucially, these results do not follow from classical hardness results for DFAs/NFAs. Methodologically, the work integrates tools from computational complexity theory, formal language theory, and automata theory to derive rigorous lower bounds under both distribution-free and uniform-distribution assumptions. The main contribution is the first rigorous demonstration that regular expressions are inherently hard to learn across multiple standard learning models, thereby resolving a long-standing theoretical gap in computational learning theory.
📝 Abstract
Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions in the PAC model and with membership queries. We show that PAC learning is hard even under the uniform distribution on the hypercube, and also prove hardness of distribution-free learning with membership queries. Furthermore, if regular expressions are extended with complement or intersection, we establish hardness of learning with membership queries even under the uniform distribution. We emphasize that these results do not follow from existing hardness results for learning DFAs or NFAs, since the descriptive complexity of regular languages can differ exponentially between DFAs, NFAs, and regular expressions.
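The last point, that representation size can differ exponentially between DFAs, NFAs, and regular expressions, can be seen concretely with a classic witness language (this example is standard automata-theory folklore, not taken from the paper): L_n, the binary strings whose n-th symbol from the end is 1. An NFA needs only n+1 states, while the subset construction yields a DFA with 2^n states (all distinguishable, so this is minimal). A minimal Python sketch:

```python
# Sketch (assumed example, not from the paper): exponential NFA -> DFA blowup
# for L_n = binary strings whose n-th symbol from the end is 1.
# NFA states: 0 = start (self-loop on 0/1), then a chain 1..n; state n accepts.

def nfa_size(n):
    # Start state plus a chain of n states that counts down from the guessed 1.
    return n + 1

def dfa_states(n):
    # Determinize via the subset construction and count reachable subsets.
    # The DFA must remember which of the last n symbols were 1, giving 2^n states.
    def step(states, sym):
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)          # start state loops on both symbols
                if sym == '1':
                    nxt.add(1)      # guess: this 1 is n-th from the end
            elif q < n:
                nxt.add(q + 1)      # advance along the chain on any symbol
        return frozenset(nxt)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for sym in '01':
            t = step(s, sym)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return len(seen)

for n in (1, 2, 3, 4, 5):
    print(f"n={n}: NFA states={nfa_size(n)}, DFA states={dfa_states(n)}")
```

Here the NFA grows linearly while the determinized DFA grows as 2^n, which is why hardness results for one representation do not automatically transfer to another.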