SwarmChat: An LLM-Based, Context-Aware Multimodal Interaction System for Robotic Swarms

📅 2025-09-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional human–swarm interaction (HSI) suffers from unintuitive, non-adaptive interfaces, resulting in delayed decision-making, high cognitive load, and inflexible command input. To address these limitations, this paper proposes a large language model (LLM)-driven, context-aware multimodal HSI system. The system employs a four-module collaborative architecture—intent recognition, task planning, modality selection, and state feedback—that integrates text, speech, and teleoperation inputs while enabling dynamic switching between predefined and natural-language commands. By leveraging LLMs for enhanced contextual understanding and real-time closed-loop state awareness, the system significantly reduces user cognitive load. Preliminary evaluation demonstrates superior performance over baseline methods in intent recognition accuracy, command execution success rate, and user satisfaction. The approach advances HSI toward greater naturalness, adaptability, and collaborative efficiency.

Technology Category

Application Category

📝 Abstract
Traditional Human-Swarm Interaction (HSI) methods often lack intuitive real-time adaptive interfaces, making decision making slower and increasing cognitive load while limiting command flexibility. To solve this, we present SwarmChat, a context-aware, multimodal interaction system powered by Large Language Models (LLMs). SwarmChat enables users to issue natural language commands to robotic swarms using multiple modalities, such as text, voice, or teleoperation. The system integrates four LLM-based modules: Context Generator, Intent Recognition, Task Planner, and Modality Selector. These modules collaboratively generate context from keywords, detect user intent, adapt commands based on real-time robot state, and suggest optimal communication modalities. Its three-layer architecture offers a dynamic interface with both fixed and customizable command options, supporting flexible control while optimizing cognitive effort. The preliminary evaluation also shows that the SwarmChat's LLM modules provide accurate context interpretation, relevant intent recognition, and effective command delivery, achieving high user satisfaction.
Problem

Research questions and friction points this paper is trying to address.

Addresses lack of intuitive real-time adaptive interfaces in Human-Swarm Interaction
Enables natural language command of robotic swarms via multiple modalities
Generates context, recognizes intent, and plans tasks using LLM modules
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered multimodal interaction system
Four LLM modules for context and intent
Dynamic three-layer architecture for control
🔎 Similar Papers
No similar papers found.
E
Ettilla Mohiuddin Eumi
School of Systems & Computing, UNSW Canberra, Canberra ACT 2600, Australia
H
Hussein Abbass
School of Systems & Computing, UNSW Canberra, Canberra ACT 2600, Australia
Nadine Marcus
Nadine Marcus
Associate Professor, School of Computer Science, University of NSW
eLearningCognitive Load TheoryInstructional AnimationsHuman Computer InteractionEducational Psychology