🤖 AI Summary
This study addresses the scarcity of syntactic acceptability annotation data by constructing the largest publicly available English syntactic acceptability dataset to date: 1,000 sentences drawn from syntax textbooks and the journal Linguistic Inquiry, each annotated for both grammaticality and acceptability. Grammaticality labels were extracted from the literature, and acceptability labels were obtained through rigorously designed crowdsourcing experiments. Preliminary analyses show that grammaticality and acceptability judgments converge in about 83% of cases, and that "in-between" acceptability states, neither fully acceptable nor fully unacceptable, occur frequently; both findings corroborate existing research. A novel finding is that machine learning models struggle to predict grammaticality yet perform considerably better at predicting acceptability, suggesting a dissociation between formal grammatical constraints and native speakers' intuitive judgments. The dataset provides an empirical resource for evaluating NLP models, testing linguistic theories, and advancing interdisciplinary cognitive modeling of syntactic judgment.
📝 Abstract
We present a preview of the Syntactic Acceptability Dataset, a resource being designed for both syntax and computational linguistics research. In its current form, the dataset comprises 1,000 English sequences from the syntactic discourse: half from textbooks and half from the journal Linguistic Inquiry, the latter to ensure representation of the contemporary discourse. Each entry is labeled with its grammatical status ("well-formedness" according to syntactic formalisms), extracted from the literature, as well as its acceptability status ("intuitive goodness" as determined by native speakers), obtained through crowdsourcing under the highest experimental standards. Even in its preliminary form, this dataset is the largest of its kind that is publicly accessible. We also offer preliminary analyses addressing three debates in linguistics and computational linguistics: we observe that grammaticality and acceptability judgments converge in about 83% of the cases and that "in-betweenness" occurs frequently, corroborating existing research. We also find that while machine learning models struggle to predict grammaticality, they perform considerably better at predicting acceptability; this is a novel finding. Future work will focus on expanding the dataset.
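To make the dual-label design concrete, here is a minimal sketch of how the convergence rate between the two annotation layers might be computed on such a dataset. The field names, example sentences, and labels below are illustrative assumptions, not taken from the actual dataset or its schema.

```python
# Hypothetical sketch: each entry carries a grammaticality label (from the
# literature) and an acceptability label (from crowdsourcing); convergence
# is the share of entries where the two labels agree. All data is invented.

entries = [
    {"sentence": "The cat sat on the mat.",                 "grammatical": True,  "acceptable": True},
    {"sentence": "*Cat the mat on sat.",                    "grammatical": False, "acceptable": False},
    {"sentence": "?The horse raced past the barn fell.",    "grammatical": True,  "acceptable": False},
    {"sentence": "Colorless green ideas sleep furiously.",  "grammatical": True,  "acceptable": True},
]

# Count entries whose grammaticality and acceptability labels agree.
agree = sum(e["grammatical"] == e["acceptable"] for e in entries)
rate = agree / len(entries)
print(f"convergence: {rate:.0%}")  # 3 of 4 toy entries agree, i.e. 75%
```

On the real 1,000-sentence dataset, the same computation over binarized labels would yield the roughly 83% convergence figure reported above; graded ("in-between") acceptability judgments would first need a thresholding step before such a binary comparison.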