AI Summary
To address the lack of a systematic framework and robust criteria for detecting AI-generated text, this paper proposes the Hierarchical AI-text Risk Taxonomy (HART) and a two-dimensional detection paradigm that decouples content from expression. First, it defines four hierarchical risk levels for AI-generated text. Second, it decomposes a text into two orthogonal dimensions, semantic content and linguistic expression, and shows that content-level features are robust to paraphrasing attacks and therefore serve as intrinsic discriminative cues. Third, it achieves fine-grained detection through hierarchical task design, dual-channel modeling, contrastive learning of content invariance, and multi-granularity expression analysis. On the Level-2 and RAID benchmarks, the method achieves AUROC scores of 0.849 and 0.886, improving on existing detectors by 0.144 and 0.079 AUROC (14.4 and 7.9 points), respectively. The code and datasets are publicly released.
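For readers unfamiliar with the metric, AUROC is the probability that a detector scores a randomly chosen AI-generated text higher than a randomly chosen human-written one, with ties counted as half. The following is a minimal sketch of that definition. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not data or code from the paper.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: 1 for AI-generated, 0 for human-written (assumed convention).
    scores: detector outputs, higher meaning "more likely AI-generated".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three AI-generated texts, three human-written texts.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auroc = auroc(labels, scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the reported 0.849 and 0.886 should be read.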
Abstract
The widespread use of LLMs creates a pressing need to detect AI involvement in text. Existing studies address such detection in scattered contexts, leaving a systematic, unified approach unexplored. In this paper, we present HART, a hierarchical framework of AI risk levels, each corresponding to a detection task. To address these tasks, we propose a novel 2D detection method that decouples a text into content and language expression. Our findings show that content is resistant to surface-level changes and can therefore serve as a key feature for detection. Experiments demonstrate that the 2D method significantly outperforms existing detectors, improving AUROC from 0.705 to 0.849 for level-2 detection and from 0.807 to 0.886 on RAID. We release our data and code at https://github.com/baoguangsheng/truth-mirror.
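The reported gains can be read either as absolute AUROC differences or as relative improvements over the baseline, and the two differ noticeably. The sketch below works through the arithmetic using only the four AUROC values quoted in the abstract. The dictionary layout and the helper name `gains` are illustrative assumptions.

```python
# (baseline AUROC, 2D method AUROC) as quoted in the abstract.
results = {
    "level-2": (0.705, 0.849),
    "RAID": (0.807, 0.886),
}


def gains(baseline, ours):
    """Return (absolute AUROC gain, gain relative to the baseline)."""
    absolute = ours - baseline
    relative = absolute / baseline
    return absolute, relative


summary = {name: gains(b, o) for name, (b, o) in results.items()}
for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute:.3f} AUROC absolute, {relative:.1%} relative")
```

The absolute gains are 0.144 and 0.079 AUROC, while the relative gains are roughly 20.4% and 9.8%, so figures of 14.4 and 7.9 are best read as absolute points rather than relative percentages.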