Agent for User: Testing Multi-User Interactive Features in TikTok

📅 2025-04-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Testing multi-user interactive features—such as live streaming and voice calls—in social apps (e.g., TikTok) is challenging due to the need for synchronized multi-device coordination, role-aware behavior modeling, and real-time interaction simulation. To address this, we propose the first LLM-driven multi-agent collaborative testing paradigm tailored for multi-user interaction scenarios. Within a virtual device farm, we deploy dedicated LLM agents for each user role, enabling cross-device, role-specific behavioral modeling and task orchestration. Our approach integrates action-sequence modeling with cross-device coordinated control. Evaluated on 24 multi-user tasks, it achieves 75% task coverage and 85.9% action similarity, reducing manual testing effort by 87%. Deployed on TikTok’s internal testing platform, the method uncovered 26 interaction-related defects.
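The per-role agents and round-robin cross-device coordination described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names (`RoleAgent`, `run_task`, the placeholder observation strings) are hypothetical, and the real system would query an LLM to pick each UI action from the role prompt and screen state.

```python
from dataclasses import dataclass, field

@dataclass
class RoleAgent:
    """An agent bound to one virtual device and one user role."""
    role: str
    actions: list = field(default_factory=list)

    def next_action(self, task: str, observation: str) -> str:
        # Placeholder: the real system would ask an LLM to choose
        # the next UI action from role, task, and screen state.
        action = f"{self.role}: act on '{observation}' for '{task}'"
        self.actions.append(action)
        return action

def run_task(task: str, roles: list[str]) -> dict[str, list]:
    """Allocate one agent per role (one per device) and step them
    in turn so cross-device interactions stay ordered."""
    agents = {r: RoleAgent(r) for r in roles}
    for step in range(2):  # round-robin: each agent acts once per step
        for agent in agents.values():
            agent.next_action(task, f"screen@step{step}")
    return {r: a.actions for r, a in agents.items()}

if __name__ == "__main__":
    log = run_task("start live stream", ["host", "viewer"])
    for role, acts in log.items():
        print(role, len(acts))
```

The round-robin loop is one simple way to keep a deterministic interleaving across devices; the paper's task orchestration is likely more sophisticated.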

📝 Abstract
TikTok, a widely-used social media app boasting over a billion monthly active users, requires effective app quality assurance for its intricate features. Feature testing is crucial in achieving this goal. However, the multi-user interactive features within the app, such as live streaming, voice calls, etc., pose significant challenges for developers, who must handle simultaneous device management and user interaction coordination. To address this, we introduce a novel multi-agent approach, powered by Large Language Models (LLMs), to automate the testing of multi-user interactive app features. In detail, we build a virtual device farm that allocates the necessary number of devices for a given multi-user interactive task. For each device, we deploy an LLM-based agent that simulates a user, thereby mimicking user interactions to collaboratively automate the testing process. The evaluations on 24 multi-user interactive tasks within the TikTok app showcase its capability to cover 75% of tasks with 85.9% action similarity and offer 87% time savings for developers. Additionally, we integrated our approach into the real-world TikTok testing platform, aiding in the detection of 26 multi-user interactive bugs.
Problem

Research questions and friction points this paper is trying to address.

Automating testing for TikTok's multi-user interactive features
Managing simultaneous device and user interaction coordination
Detecting bugs in live streaming and voice call features
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agents simulate user interactions
Virtual device farm manages multiple devices
Automates testing with high action similarity
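The reported 85.9% action similarity compares generated action traces against reference traces. The paper's exact metric is not given on this page; one minimal, hypothetical way to score two traces is a longest-matching-subsequence ratio:

```python
from difflib import SequenceMatcher

def action_similarity(generated: list[str], reference: list[str]) -> float:
    """Ratio of matching action subsequences between two traces
    (a stand-in for the paper's unspecified similarity metric)."""
    return SequenceMatcher(a=generated, b=reference).ratio()

# Hypothetical traces for a live-streaming task.
gen = ["open_app", "tap_live", "invite_guest", "end_stream"]
ref = ["open_app", "tap_live", "start_stream", "invite_guest", "end_stream"]
print(round(action_similarity(gen, ref), 2))  # → 0.89
```

`SequenceMatcher.ratio()` returns `2*M/T`, where `M` is the number of matched elements and `T` the combined trace length, so a missed action in an otherwise aligned trace is penalized proportionally.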
Sidong Feng
PhD Student, Monash University
Human Computer Interaction, Software Engineering, Deep Learning
Changhao Du
Jilin University, China
Huaxiao Liu
Jilin University, China
Qingnan Wang
Jilin University, China
Zhengwei Lv
Bytedance, China
Gang Huo
Bytedance, China
Xu Yang
Bytedance, China
Chunyang Chen
Professor at Department of Computer Science, Technical University of Munich
Software Engineering, Deep Learning, Human Computer Interaction, LLM4SE, GUI