DETree: DEtecting Human-AI Collaborative Texts via Tree-Structured Hierarchical Representation Learning

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Human-AI collaborative text generation (e.g., AI draft plus human editing, human draft plus AI rewriting, or multi-model cascaded refinement) is increasingly prevalent and produces complex, ambiguous textual features. These challenge existing binary or coarse-grained multiclass detection methods, which fail to capture the intrinsic hierarchical relationships among generation patterns. Method: We propose a tree-structured hierarchical representation learning framework that (1) constructs a generation-process affinity tree to encode semantic hierarchies of collaboration modes; (2) designs a hierarchy-aware loss function to align textual representations with the tree topology; and (3) employs tree neural networks for fine-grained, layered modeling. Contribution/Results: Evaluated on our newly constructed RealBench benchmark, our method improves mixed-text detection accuracy, generalizes better to out-of-distribution data, and remains robust under few-shot settings, supporting fine-grained AI content provenance.

📝 Abstract
Detecting AI-involved text is essential for combating misinformation, plagiarism, and academic misconduct. However, AI text generation includes diverse collaborative processes (AI-written text edited by humans, human-written text edited by AI, and AI-generated text refined by other AI), where various or even new LLMs could be involved. Texts generated through these varied processes exhibit complex characteristics, presenting significant challenges for detection. Current methods model these processes rather crudely, primarily employing binary classification (purely human vs. AI-involved) or multi-classification (treating human-AI collaboration as a new class). We observe that representations of texts generated through different processes exhibit inherent clustering relationships. Therefore, we propose DETree, a novel approach that models the relationships among different processes as a Hierarchical Affinity Tree structure, and introduces a specialized loss function that aligns text representations with this tree. To facilitate this learning, we developed RealBench, a comprehensive benchmark dataset that automatically incorporates a wide spectrum of hybrid texts produced through various human-AI collaboration processes. Our method improves performance in hybrid text detection tasks and significantly enhances robustness and generalization in out-of-distribution scenarios, particularly in few-shot learning conditions, further demonstrating the promise of training-based approaches in OOD settings. Our code and dataset are available at https://github.com/heyongxin233/DETree.
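To make the idea of a tree over generation processes concrete, here is a minimal sketch of how such a tree could be stored and queried for how closely two processes are related. The node names and the tree shape below are illustrative assumptions, not the Hierarchical Affinity Tree built in the paper, and the path-length distance is one common choice rather than the authors' definition.

```python
# Hypothetical toy tree, stored as child -> parent.
# Node names and shape are assumptions made for illustration only.
PARENT = {
    "human": "root",
    "ai_involved": "root",
    "ai_written": "ai_involved",
    "ai_edited_by_human": "ai_involved",
    "human_edited_by_ai": "ai_involved",
    "ai_refined_by_ai": "ai_written",
}


def path_to_root(node):
    """Return the nodes from `node` up to and including the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Number of edges between `node` and the root."""
    return len(path_to_root(node)) - 1


def lowest_common_ancestor(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def tree_distance(a, b):
    """Number of edges on the path between a and b in the tree."""
    shared = lowest_common_ancestor(a, b)
    return depth(a) + depth(b) - 2 * depth(shared)
```

In this toy tree, two AI-involved processes that share a parent are closer to each other than either is to purely human text, which is the kind of structure a flat multi-class label set cannot express.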
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-involved texts in diverse human-AI collaborative processes
Addressing limitations of current binary or multi-class classification methods
Improving detection robustness for out-of-distribution and few-shot scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree-structured hierarchical representation learning for text detection
Hierarchical Affinity Tree modeling human-AI collaboration relationships
Specialized loss function aligning text representations with tree structure
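
One way to picture "aligning text representations with tree structure" is a loss that asks embedding similarity to track closeness in the tree. The sketch below is an assumption-laden illustration, not the DETree loss: it maps a tree distance to a target cosine similarity and penalizes the squared gap. The function names, the linear target mapping, and the toy distance function are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def hierarchy_alignment_loss(embeddings, labels, tree_distance, max_distance):
    """Mean squared gap between pairwise cosine similarity and a tree target.

    Assumed target mapping (not from the paper): a pair at tree distance d
    should have cosine similarity 1 - 2 * d / max_distance, so identical
    classes map to +1 and the most distant classes map to -1.
    """
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            d = tree_distance(labels[i], labels[j])
            target = 1.0 - 2.0 * d / max_distance
            gap = cosine(embeddings[i], embeddings[j]) - target
            total += gap * gap
            pairs += 1
    return total / pairs if pairs else 0.0


def toy_distance(a, b):
    """Hypothetical two-leaf tree: same class -> 0, different class -> 2."""
    return 0 if a == b else 2


# Two texts from different classes whose embeddings coincide are penalized.
loss = hierarchy_alignment_loss(
    [[1.0, 0.0], [1.0, 0.0]], ["human", "ai"], toy_distance, max_distance=2
)
```

A trainable version would compute the same quantity over encoder outputs in an autodiff framework; the pure-Python form here only shows the shape of the objective.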
Yongxin He
Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; State Key Lab of AI Safety, Beijing, China; University of Chinese Academy of Sciences, CAS, Beijing, China
Shan Zhang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, CAS, Beijing, China
Yixuan Cao
Shenzhen University
Lei Ma
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, CAS, Beijing, China
Ping Luo
National University of Defense Technology