🤖 AI Summary
This work proposes a lightweight, interpretable method for detecting text generated by large language models based on syntactic dependency structures. Using only dependency parse labels with traditional machine learning classifiers, the approach achieves competitive performance across monolingual, multi-generator, and multilingual settings. The study establishes a novel non-neural, linguistically grounded baseline and reveals systematic differences in syntactic patterns between human-written and AI-generated texts. It also finds that the detector tends to over-predict the machine-generated class in unseen domains, highlighting limits on its cross-domain generalization.
📝 Abstract
As large language models (LLMs) become increasingly prevalent, reliable methods for detecting AI-generated text are critical for mitigating potential risks. We introduce DependencyAI, a simple and interpretable approach for detecting AI-generated text using only the labels of linguistic dependency relations. Our method achieves competitive performance across monolingual, multi-generator, and multilingual settings. To increase interpretability, we analyze feature importance to reveal syntactic structures that distinguish AI-generated from human-written text. We also observe that the detector systematically over-predicts the machine-generated class for certain generators on unseen domains, suggesting that generator-specific writing styles may affect cross-domain generalization. Overall, our results demonstrate that dependency relations alone provide a robust signal for AI-generated text detection, establishing DependencyAI as a strong, linguistically grounded, interpretable, and non-neural baseline.