🤖 AI Summary
This work addresses the problem that blurred task boundaries in natural language confound correlational assessments of learnability, obscuring the true relationship between data frequency and learnability. To overcome this, the authors build a formal-language testbed based on probabilistic finite automata and introduce an algebraic object, the “binning semiring”, to control how often a targeted property occurs in a sampled corpus. They formulate the experimental pipeline as a causal graphical model, which, combined with controllable corpus generation, enables causal interventions on the training data. They also derive decomposed Kullback–Leibler divergence metrics to measure the learnability of specific sub-tasks. Experiments show that evaluating learnability without causal intervention leads to incorrect conclusions because of confounders, supporting the need for causal methods in learnability analysis and serving as a warning about correlational pitfalls in natural-language settings.
📝 Abstract
Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.
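To make the setting concrete, the following is a minimal sketch of sampling a corpus from a probabilistic finite automaton and counting how often one targeted transition occurs. The toy automaton, the function names, and the choice of a single transition as the "targeted property" are all assumptions made for illustration; none of them come from the paper. This plain sampler only observes the frequency of the property. Controlling that frequency is the role of the binning semiring, which is not reproduced here.

```python
import random

# Hypothetical toy PFA (not from the paper): state -> list of
# (probability, symbol, next_state). next_state None means "stop here".
PFA = {
    0: [(0.5, "a", 0), (0.3, "b", 1), (0.2, None, None)],
    1: [(0.6, "b", 1), (0.4, None, None)],
}


def sample_string(pfa, rng, start=0):
    """Sample one string; return it with the list of transitions taken."""
    state, symbols, used = start, [], []
    while True:
        arcs = pfa[state]
        weights = [p for p, _, _ in arcs]
        # Pick one outgoing arc in proportion to its probability.
        idx = rng.choices(range(len(arcs)), weights=weights)[0]
        _, symbol, nxt = arcs[idx]
        if nxt is None:  # stopping arc: the string is complete
            return "".join(symbols), used
        symbols.append(symbol)
        used.append((state, symbol, nxt))
        state = nxt


def sample_corpus(pfa, target, n, seed=0):
    """Sample n strings and count occurrences of the target transition."""
    rng = random.Random(seed)
    corpus, count = [], 0
    for _ in range(n):
        string, used = sample_string(pfa, rng)
        corpus.append(string)
        count += used.count(target)
    return corpus, count


if __name__ == "__main__":
    corpus, count = sample_corpus(PFA, target=(0, "b", 1), n=1000)
    print(f"target transition used {count} times in {len(corpus)} strings")
```

In this sketch the count of the targeted transition is a by-product of sampling, so it co-varies with other corpus statistics such as string length. That coupling is the kind of confounding the abstract warns about when learnability is evaluated correlationally.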