AI Summary
This study addresses the challenge of real-time leader-follower role identification in human-robot interaction under resource-constrained settings, such as mobile and assistive robotics. It presents the first systematic evaluation of small language models (SLMs) for this task, introducing a new benchmark that integrates both real-world and synthetic data. Focusing on the Qwen2.5-0.5B model, the work compares prompt engineering and fine-tuning strategies under zero-shot and one-shot settings. Experimental results demonstrate that zero-shot fine-tuning achieves 86.66% accuracy with a low inference latency of only 22.2 milliseconds per sample, significantly outperforming baseline approaches. In contrast, the one-shot setting suffers performance degradation due to increased context length, highlighting the critical impact of context size on edge deployment feasibility.
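The 22.2 ms figure refers to per-sample inference latency. A minimal sketch of how such a measurement might be taken, where `classify_role` is a hypothetical stand-in for the fine-tuned Qwen2.5-0.5B forward pass (the real model call is not reproduced here):

```python
import time

def classify_role(utterance: str) -> str:
    # Stand-in for the SLM forward pass; a real setup would tokenize the
    # utterance and run the fine-tuned Qwen2.5-0.5B model instead.
    return "leader" if "follow me" in utterance.lower() else "follower"

def mean_latency_ms(samples, n_warmup: int = 2) -> float:
    # Warm-up calls are excluded so one-time setup cost does not skew the mean.
    for s in samples[:n_warmup]:
        classify_role(s)
    start = time.perf_counter()
    for s in samples:
        classify_role(s)
    return (time.perf_counter() - start) * 1000.0 / len(samples)

samples = ["Follow me to the kitchen.", "Okay, I'll go where you go."]
print(f"{mean_latency_ms(samples):.3f} ms/sample")
```

Averaging over many samples after a warm-up phase is the usual way to get a stable per-sample latency estimate on edge hardware.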
Abstract
Leader-follower interaction is an important paradigm in human-robot interaction (HRI), yet assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies, prompt engineering and fine-tuning, each studied under zero-shot and one-shot interaction modes and compared against an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming the baseline and prompt-engineered approaches. However, results also show performance degradation in one-shot modes, where the increased context length strains the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability at the edge.
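The zero-shot prompt-engineering setup described above can be sketched as a prompt template plus a parser that maps the model's raw completion to a role label. The template wording and label names below are illustrative assumptions, not the paper's exact prompt:

```python
# Illustrative zero-shot prompt construction and output parsing for
# leader/follower role classification; the template and fallback rule
# are assumptions, not the authors' exact design.

LABELS = ("leader", "follower")

def build_prompt(utterance: str) -> str:
    # Zero-shot: the prompt contains instructions only, no in-context example.
    return (
        "Classify the speaker's role in this human-robot interaction as "
        "'leader' or 'follower'.\n"
        f"Utterance: {utterance}\n"
        "Role:"
    )

def parse_label(completion: str) -> str:
    # Take the first recognized label in the raw completion; fall back to
    # 'follower' when neither label appears.
    text = completion.lower()
    for label in LABELS:
        if label in text:
            return label
    return "follower"

print(build_prompt("Let's head to the charging dock, stay behind me."))
print(parse_label(" Leader.\nExplanation: the speaker gives directions."))
```

A one-shot variant would prepend a worked example to the prompt, which lengthens the context; the abstract notes that this added length is precisely what degrades the 0.5B model's classification reliability.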