AI-Generated Content (AIGC) for Various Data Modalities: A Survey

📅 2023-08-27

📈 Citations: 25

✨ Influential: 0

🤖 AI Summary

This survey addresses the growing complexity and fragmentation of AI-generated content (AIGC) research across diverse modalities. We systematically review single-modality generative methods and cross-modal translation techniques (for example, text-to-image and audio-to-video) spanning eight modalities: text, image, video, 3D shape, 3D scene, 3D human avatar, 3D motion, and audio. Methodologically, we organize all modalities under a unified analytical framework grounded in foundational architectures: GANs, VAEs, diffusion models, autoregressive Transformers, and multimodal alignment models (e.g., CLIP). Our key contributions include: (i) a taxonomy of cross-modal generation paradigms; (ii) a horizontal comparison framework across modalities; (iii) a synthesis of 120+ representative works; and (iv) a consolidated analysis of datasets, evaluation metrics, shared challenges, and performance bottlenecks, accompanied by comparative results for various modalities. The survey provides systematic guidance for AIGC technology selection, benchmark development, and future research directions.

📝 Abstract

AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the potential of recent works, AIGC developments -- especially in Machine Learning (ML) and Deep Learning (DL) -- have been attracting significant attention, and this survey focuses on comprehensively reviewing such advancements in ML/DL. AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape, 3D scene, 3D human avatar, 3D motion, and audio -- each presenting unique characteristics and challenges. Furthermore, there have been significant developments in cross-modality AIGC methods, where generative methods receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D, and audio. This paper provides a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we discuss the typical applications of AIGC methods in various domains, challenges, and future research directions.

Problem

Research questions and friction points this paper is trying to address.

Artificial Intelligence Generated Content

Multimodal Creation

Cross-modal Conversion

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-generated content

machine learning evolution

media type conversion
