RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

📅 2024-06-07

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of detecting AI-generated content (AIGC) when aligned multimodal data is unavailable. It introduces RU-AI, a large-scale tri-modal (text, image, voice) AIGC detection dataset of 1,475,370 instances, together with a noise variant for robustness testing. The dataset is built by adding AI-generated counterparts to three public datasets: Flickr8K, COCO, and Places205. The key contributions are: (1) a large aligned tri-modal dataset of natural and AI-generated content; (2) a noise variant of the dataset for testing detector robustness; and (3) experiments showing that current state-of-the-art detectors still struggle to achieve accurate and robust detection on RU-AI. The source code and datasets are publicly available.

📝 Abstract

The capability of recent generative AI models to create realistic, human-like content is significantly transforming the ways in which people communicate, create and work. Machine-generated content is a double-edged sword. On one hand, it can benefit society when used appropriately. On the other hand, it may mislead people and pose threats to society, especially when mixed with natural content created by humans. Hence, there is an urgent need to develop effective methods to detect machine-generated content. However, the lack of aligned multimodal datasets has inhibited the development of such methods, particularly in triple-modality settings (e.g., text, image, and voice). In this paper, we introduce RU-AI, a new large-scale multimodal dataset for robust and effective detection of machine-generated content in text, image and voice. Our dataset is constructed on the basis of three large publicly available datasets, Flickr8K, COCO and Places205, by adding their corresponding AI duplicates, resulting in a total of 1,475,370 instances. In addition, we created a noise variant of the dataset for testing the robustness of detection models. We conducted extensive experiments with current SOTA detection methods on our dataset. The results reveal that existing models still struggle to achieve accurate and robust detection on it. We hope that this new dataset can promote research in the field of machine-generated content detection, fostering the responsible use of generative AI. The source code and datasets are available at https://github.com/ZhihaoZhang97/RU-AI.
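
To make the idea of "aligned" real and AI instances concrete, the following is a minimal sketch of how one item from a source dataset might expand into labeled instances across the three modalities. The field names, the path layout, and the assumption that every item has exactly one real and one AI instance per modality are illustrative only and are not taken from the RU-AI repository.

```python
from dataclasses import dataclass
from typing import List

# The three modalities named in the abstract.
MODALITIES = ("text", "image", "voice")


@dataclass(frozen=True)
class Instance:
    source: str    # origin dataset, e.g. "Flickr8K"
    item_id: str   # hypothetical id shared by all aligned instances of one item
    modality: str  # one of MODALITIES
    origin: str    # "real" or "ai"
    ref: str       # hypothetical path; the actual storage layout is not specified here

    @property
    def label(self) -> int:
        # Binary detection target: 1 = machine-generated, 0 = natural.
        return 1 if self.origin == "ai" else 0


def expand_item(source: str, item_id: str) -> List[Instance]:
    """Expand one aligned item into a real and an AI instance per modality (assumed layout)."""
    return [
        Instance(source, item_id, modality, origin, f"{source}/{origin}/{modality}/{item_id}")
        for modality in MODALITIES
        for origin in ("real", "ai")
    ]


records = expand_item("Flickr8K", "0001")
for record in records:
    print(record.modality, record.origin, record.label)
```

Keeping a shared `item_id` across modalities is what would let a detector be trained or evaluated on matched real and AI content rather than on unrelated samples.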

Problem

Research questions and friction points this paper is trying to address.

Detect machine-generated content effectively

Address lack of multimodal datasets

Improve robustness in triple-modality settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large multimodal dataset creation

Triple-modality content detection enhancement

Robustness testing with noise variants
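
Robustness testing with a noise variant can be sketched as a comparison of detector accuracy on clean inputs and on perturbed copies of the same inputs. The example below is a toy illustration, not the paper's protocol: the Gaussian perturbation, the feature vectors, and `toy_detector` are all assumptions standing in for the actual noise types and detection models.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (feature vector, label) where label 1 = AI-generated and 0 = real.
Sample = Tuple[List[float], int]
Detector = Callable[[Sequence[float]], int]


def add_gaussian_noise(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Return a copy of x with zero-mean Gaussian noise added (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def accuracy(detector: Detector, data: Sequence[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if detector(x) == y)
    return correct / len(data)


def robustness_gap(
    detector: Detector, data: Sequence[Sample], sigma: float, seed: int = 0
) -> Tuple[float, float, float]:
    """Return (clean accuracy, noisy accuracy, clean minus noisy)."""
    rng = random.Random(seed)  # seeded so the noisy copy is reproducible
    noisy = [(add_gaussian_noise(x, sigma, rng), y) for x, y in data]
    clean_acc = accuracy(detector, data)
    noisy_acc = accuracy(detector, noisy)
    return clean_acc, noisy_acc, clean_acc - noisy_acc


def toy_detector(x: Sequence[float]) -> int:
    # Hypothetical detector: flags a sample as AI when its mean feature exceeds 0.5.
    return 1 if sum(x) / len(x) > 0.5 else 0


toy_data: List[Sample] = [
    ([0.2, 0.3, 0.1], 0),
    ([0.8, 0.9, 0.7], 1),
    ([0.4, 0.3, 0.4], 0),
    ([0.6, 0.7, 0.6], 1),
]

clean_acc, noisy_acc, gap = robustness_gap(toy_detector, toy_data, sigma=0.5)
print(f"clean={clean_acc:.2f} noisy={noisy_acc:.2f} gap={gap:.2f}")
```

A gap near zero would suggest the detector is insensitive to this perturbation, while a large positive gap would indicate the kind of fragility a noise variant is meant to expose.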

🔎 Similar Papers

Detecting Multimedia Generated by Large AI Models: A Survey