A Component-Based Survey of Interactions between Large Language Models and Multi-Armed Bandits

📅 2026-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the decision-making challenges faced by large language models (LLMs) in training, retrieval-augmented generation, and personalization, as well as the representational limitations of multi-armed bandits (MABs) in action definition and environment modeling. It presents the first systematic, component-level analysis of the bidirectional interaction mechanisms between LLMs and MABs. By establishing a component-wise analytical framework and conducting a comprehensive literature review, the study introduces a mutual enhancement paradigm ("LLM-enhanced MAB" and "MAB-enhanced LLM"), characterizing representative architectures, performance trends, and key challenges. To foster further research, the authors also release an open-access literature index repository.

📝 Abstract
Large language models (LLMs) have become powerful and widely used systems for language understanding and generation, while multi-armed bandit (MAB) algorithms provide a principled framework for adaptive decision-making under uncertainty. This survey explores the potential at the intersection of these two fields. To our knowledge, it is the first survey to systematically review the bidirectional interaction between large language models and multi-armed bandits at the component level. We highlight the bidirectional benefits: MAB algorithms address critical LLM challenges, spanning from pre-training to retrieval-augmented generation (RAG) and personalization. Conversely, LLMs enhance MAB systems by redefining core components such as arm definition and environment modeling, thereby improving decision-making in sequential tasks. We analyze existing LLM-enhanced bandit systems and bandit-enhanced LLM systems, providing insights into their design, methodologies, and performance. Key challenges and representative findings are identified to help guide future research. An accompanying GitHub repository that indexes relevant literature is available at https://github.com/bucky1119/Awesome-LLM-Bandit-Interaction.
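To ground the bandit side of the abstract, the sketch below shows a minimal UCB1 bandit where the arms could be LLM-proposed candidates (e.g. prompt templates), standing in for the "arm definition" component the survey discusses. The arm names and reward function are illustrative assumptions, not taken from the paper.

```python
import math
import random

def ucb1(reward_fn, arms, horizon=1000):
    """Minimal UCB1: pick the arm maximizing empirical mean + exploration bonus."""
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        # Play each arm once before applying the UCB rule.
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(arms, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += r
    return max(arms, key=lambda a: sums[a] / counts[a])

# Hypothetical LLM-proposed arms (e.g. candidate prompt templates).
arms = ["template_a", "template_b", "template_c"]
random.seed(0)
# Stochastic reward: template_b has the highest expected payoff.
best = ucb1(lambda a: random.random() * (0.8 if a == "template_b" else 0.5), arms)
```

In an "LLM-enhanced MAB" setup in the survey's sense, the fixed `arms` list would instead be generated or refined by an LLM, while the bandit loop above handles the exploration-exploitation trade-off unchanged.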
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Multi-Armed Bandits
Component-Level Interaction
Bidirectional Integration
Adaptive Decision-Making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Multi-Armed Bandits
Component-Level Interaction
Retrieval-Augmented Generation
Sequential Decision-Making
Siguang Chen
College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Chunli Lv
College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Agricultural Machinery Monitoring and Big Data Application, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
Miao Xie
Alibaba Group
Recommender Systems
Social Network Analysis
Software Engineering