CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis

📅 2024-06-18
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work presents the first systematic study of collaborative writing among multiple large language models (LLMs) in open-ended tasks, focusing on story generation, management, and authorship attribution by ensembles of 1–5 open-source instruction-tuned LLMs (e.g., Llama, Mistral). We formalize three novel tasks: multi-LLM author attribution, contribution identification, and stylistic consistency assessment. To support this, we introduce the first publicly available dataset of multi-author stories generated exclusively by LLMs (32K+ samples) and adapt the PAN evaluation protocol for rigorous benchmarking. Empirical results demonstrate a substantial performance degradation of existing authorship analysis methods in multi-LLM settings. We release the dataset, source code, and evaluation framework to establish a foundational benchmark for detecting LLM collaboration misuse, upholding academic integrity, and enabling copyright governance in AI-generated content.

📝 Abstract
The rise of unifying frameworks that enable seamless interoperability of Large Language Models (LLMs) has made LLM-LLM collaboration for open-ended tasks a possibility. Despite this, there have not been efforts to explore such collaborative writing. We take the next step beyond human-LLM collaboration to explore this multi-LLM scenario by generating the first exclusively LLM-generated collaborative stories dataset called CollabStory. We focus on single-author to multi-author (up to 5 LLMs) scenarios, where multiple LLMs co-author stories. We generate over 32k stories using open-source instruction-tuned LLMs. Further, we take inspiration from the PAN tasks that have set the standard for human-human multi-author writing tasks and analysis. We extend their authorship-related tasks for multi-LLM settings and present baselines for LLM-LLM collaboration. We find that current baselines are not able to handle this emerging scenario. Thus, CollabStory is a resource that could help propel an understanding as well as the development of new techniques to discern the use of multiple LLMs. This is crucial to study in the context of writing tasks since LLM-LLM collaboration could potentially overwhelm ongoing challenges related to plagiarism detection, credit assignment, maintaining academic integrity in educational settings, and addressing copyright infringement concerns. We make our dataset and code available at https://github.com/saranya-venkatraman/CollabStory.

Problem

Research questions and friction points this paper is trying to address.

Exploring multi-LLM collaborative story generation
Analyzing authorship in LLM co-authored stories
Addressing challenges in plagiarism detection and copyright

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-LLM collaborative story generation
Authorship analysis extension
Open-source instruction-tuned LLMs
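
To make the multi-LLM co-authoring setup more concrete, here is a minimal sketch of how several authors might take turns extending one story while per-segment authorship is recorded. It is an illustration under stated assumptions, not the paper's pipeline: the turn-taking scheme, the stub authors (`make_stub_author`), the `Segment` type, and the labels produced by `authorship_labels` are all hypothetical names introduced here, and no real model is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: story so far in, continuation out.
Author = Callable[[str], str]


@dataclass
class Segment:
    author: str  # which (stub) model wrote this part
    text: str    # the text it contributed


def make_stub_author(name: str) -> Author:
    """Return a deterministic fake 'LLM' so the sketch runs without any model."""
    def write(story_so_far: str) -> str:
        n_words = len(story_so_far.split())
        return f"[{name} continues after {n_words} words.]"
    return write


def co_write(prompt: str, authors: Dict[str, Author]) -> List[Segment]:
    """Each author appends one segment in turn, conditioned on all prior text.

    Assumption: a simple sequential hand-off; the source does not specify
    how turns are actually assigned.
    """
    segments: List[Segment] = []
    story = prompt
    for name, write in authors.items():
        part = write(story)
        segments.append(Segment(author=name, text=part))
        story = story + " " + part
    return segments


def authorship_labels(segments: List[Segment]) -> dict:
    """Derive illustrative labels: distinct-author count and change positions."""
    names = [s.author for s in segments]
    changes = [i for i in range(1, len(names)) if names[i] != names[i - 1]]
    return {"num_authors": len(set(names)), "change_points": changes}


authors = {n: make_stub_author(n) for n in ["model_a", "model_b", "model_c"]}
segments = co_write("Once upon a time", authors)
labels = authorship_labels(segments)
```

Keeping the author name attached to each segment is what would let the same generated story serve several authorship tasks (for example, counting authors or locating hand-off points) without regenerating text. Swapping a stub for a real model call would only require another function with the same `Author` signature.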