The DeepSpeak-Agentic Dataset

📅 2026-06-02

🤖 AI Summary

This study addresses the lack of high-quality multimodal dialogue datasets for recognizing and analyzing interactions between humans and embodied AI agents. The authors introduce DeepSpeak-Agentic, a dataset of over 37 hours of audiovisual recordings of semi-structured human-agent conversations, intended as a benchmark for embodied AI agents. They also present a scalable data-capture system that creates agents, pairs them with human crowd workers, records the conversations automatically, and identifies and separates the human and the agent in the combined stream. The dataset supports research on AI-generated voices and faces, human-agent interaction, and the automatic forensic identification of AI agents.

📝 Abstract

We present DeepSpeak-Agentic, a dataset of videos comprising over 37 hours of semi-structured conversations between a human and an embodied AI agent. We use this dataset to evaluate the automatic forensic identification (audio, video, or text) of AI agents, study the nature of human-agent interactions, and provide a benchmark for future advances in the large language models and AI-generated voices and faces that power embodied AI agents. We also contribute a scalable data-capture system that creates agents, automatically pairs them with human crowd workers, records audiovisual conversations across specified scenarios, and identifies and separates the human and agent in the combined stream.

Problem

Research questions and friction points this paper is trying to address.

embodied AI agents

automatic forensic identification

human-agent interaction

AI-generated voices

AI-generated faces

Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied AI agent

automatic forensic identification

scalable data-capture system