🤖 AI Summary
Addressing the challenges of detecting AI-generated Arabic text (detection difficulty, scarce resources, and the lack of systematic investigation), this paper introduces the first comprehensive Arabic-oriented AI-text detection framework. Methodologically, it integrates multi-prompt generation, cross-architecture model comparison (ALLaM, Jais, Llama, GPT-4), and stylistic analysis across academic and social media domains to extract domain-specific stylistic features; it further proposes a fine-tuned BERT-based detector augmented by linguistically informed stylometric feature engineering. Experiments demonstrate up to a 99.9% F1-score on formal Arabic texts and robust coverage of mainstream Arabic LLMs, while cross-domain evaluation confirms that generalization across domains remains challenging. Contributions include: (1) identifying detectable linguistic signatures unique to Arabic AI-generated text; (2) establishing the first open-source Arabic AI-fingerprint benchmark; and (3) laying the methodological foundation for language-adapted detection systems.
📝 Abstract
Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges for information integrity across critical domains including education, social media, and academia: they enable sophisticated misinformation campaigns, compromise healthcare guidance, and facilitate targeted propaganda. The challenge is especially severe in under-explored, low-resource languages such as Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text, examining multiple generation strategies (generation from the title only, content-aware generation, and text refinement) across diverse model architectures (ALLaM, Jais, Llama, and GPT-4) in academic and social media domains. Our stylometric analysis reveals distinctive linguistic patterns that differentiate human-written from machine-generated Arabic text across these varied contexts. Despite their human-like qualities, we demonstrate that LLMs produce detectable signatures in their Arabic outputs, with domain-specific characteristics that vary significantly between contexts. Based on these insights, we developed BERT-based detection models that achieve exceptional performance in formal contexts (up to 99.9% F1-score) with strong precision across model architectures. Our cross-domain analysis confirms the generalization challenges previously reported in the literature. To the best of our knowledge, this work is the most comprehensive investigation of Arabic machine-generated text to date: it uniquely combines multiple prompt-generation methods, diverse model architectures, and in-depth stylometric analysis across varied textual domains, establishing a foundation for developing robust, linguistically informed detection systems essential for preserving information integrity in Arabic-language contexts.
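To illustrate the kind of stylometric signals such an analysis can rely on, the sketch below computes a few simple surface features often used in stylometry (lexical diversity, word length, punctuation density). This is a minimal, hypothetical example, not the paper's actual feature set or detector:

```python
# Illustrative sketch (not the paper's code): simple stylometric features of
# the kind used to contrast human-written and machine-generated Arabic text.
import re

def stylometric_features(text: str) -> dict:
    """Compute a few simple, language-agnostic stylometric features."""
    tokens = text.split()
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        # Lexical diversity: unique tokens over total tokens.
        "type_token_ratio": len(set(tokens)) / n,
        # Average token length in characters.
        "avg_word_len": sum(len(t) for t in tokens) / n,
        # Punctuation marks per token (Latin plus Arabic comma/semicolon/qmark).
        "punct_density": len(re.findall(r"[.,!?;:\u060C\u061B\u061F]", text)) / n,
    }

feats = stylometric_features("هذا نص تجريبي قصير، كتبه إنسان.")
```

In a full pipeline, feature vectors like this would be concatenated with contextual embeddings (e.g. from a fine-tuned BERT model) before classification.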